Scikit Learn Tutorial and Cheat Sheet

An easy-to-follow scikit-learn tutorial to help you get started with machine learning in Python. This handy scikit-learn cheat sheet lists commonly used functions and classes, each with a brief description.
Preprocessing
Function Description
sklearn.preprocessing.StandardScaler Standardize features by removing the mean and scaling to unit variance
sklearn.preprocessing.Imputer Imputation transformer for completing missing values (removed in scikit-learn 0.22; use sklearn.impute.SimpleImputer in current releases)
sklearn.preprocessing.LabelBinarizer Binarize labels in a one-vs-all fashion
sklearn.preprocessing.OneHotEncoder Encode categorical features using a one-hot (a.k.a. one-of-K) scheme
sklearn.preprocessing.PolynomialFeatures Generate polynomial and interaction features
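
As a rough illustration of how the preprocessing transformers listed above are typically used, here is a minimal sketch. The toy feature matrix and the string labels are made up for the example; every transformer follows the same fit_transform pattern.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer, PolynomialFeatures, StandardScaler

# Toy feature matrix (made-up values): 3 samples, 2 features.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# StandardScaler: each column ends up with zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# PolynomialFeatures(degree=2): output columns are 1, a, b, a^2, a*b, b^2.
X_poly = PolynomialFeatures(degree=2).fit_transform(X)

# LabelBinarizer: one indicator column per class, in sorted class order.
y_bin = LabelBinarizer().fit_transform(["cat", "dog", "bird"])

print(X_scaled)
print(X_poly.shape)
print(y_bin)
```

In a real project the scaler would normally be fitted on the training split only and then applied to the test split with transform.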


Regression
Function Description
sklearn.tree.DecisionTreeRegressor A decision tree regressor
sklearn.svm.SVR Epsilon-Support Vector Regression
sklearn.linear_model.LinearRegression Ordinary least squares Linear Regression
sklearn.linear_model.Lasso Linear model trained with an L1 prior as regularizer (a.k.a. the lasso)
sklearn.linear_model.SGDRegressor Linear model fitted by minimizing a regularized empirical loss with SGD
sklearn.linear_model.ElasticNet Linear regression with combined L1 and L2 priors as regularizer
sklearn.ensemble.RandomForestRegressor A random forest regressor
sklearn.ensemble.GradientBoostingRegressor Gradient Boosting for regression
sklearn.neural_network.MLPRegressor Multi-layer Perceptron regressor
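
The regressors above share the same fit / predict interface. The sketch below compares ordinary least squares with the lasso on synthetic, noise-free data; the data, the true coefficients, and alpha=0.1 are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic data (assumed for the example): the target depends only on the
# first two of three features and contains no noise.
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(100, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5

# Ordinary least squares recovers the true coefficients on noise-free data.
ols = LinearRegression().fit(X, y)

# The L1 penalty shrinks coefficients toward zero; alpha sets its strength.
lasso = Lasso(alpha=0.1).fit(X, y)

print("OLS coefficients:  ", ols.coef_, "intercept:", ols.intercept_)
print("Lasso coefficients:", lasso.coef_)
```

Any other regressor in the table could be swapped in for Lasso here without changing the surrounding code.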


Classification
Function Description
sklearn.neural_network.MLPClassifier Multi-layer Perceptron classifier
sklearn.tree.DecisionTreeClassifier A decision tree classifier
sklearn.svm.SVC C-Support Vector Classification
sklearn.linear_model.LogisticRegression Logistic Regression (a.k.a. logit, MaxEnt) classifier
sklearn.linear_model.SGDClassifier Linear classifiers (SVM, logistic regression, etc.) with SGD training
sklearn.naive_bayes.GaussianNB Gaussian Naive Bayes
sklearn.neighbors.KNeighborsClassifier Classifier implementing the k-nearest neighbors vote
sklearn.ensemble.RandomForestClassifier A random forest classifier
sklearn.ensemble.GradientBoostingClassifier Gradient Boosting for classification
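
A minimal sketch of the usual classification workflow with two of the classifiers above. The choice of the iris dataset, the 70/30 split, and the hyperparameters are assumptions made for the example, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold out 30% of the samples for testing, keeping class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

models = {
    "knn": KNeighborsClassifier(n_neighbors=5),
    "forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# fit() returns the estimator itself, so score() can be chained;
# for classifiers, score() reports mean accuracy on the given data.
scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}
print(scores)
```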


Clustering
Function Description
sklearn.cluster.KMeans K-Means clustering
sklearn.cluster.DBSCAN Perform DBSCAN clustering from vector array or distance matrix
sklearn.cluster.AgglomerativeClustering Agglomerative clustering
sklearn.cluster.SpectralBiclustering Spectral bi-clustering
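
To show how the clustering estimators above are called, the following sketch runs KMeans and DBSCAN on two well-separated synthetic blobs. The blob positions, eps, and min_samples are assumed values picked so that the two groups are easy to separate.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

# Two tight, well-separated blobs of 20 points each (synthetic data).
rng = np.random.RandomState(0)
blob_a = rng.normal(loc=0.0, scale=0.1, size=(20, 2))
blob_b = rng.normal(loc=5.0, scale=0.1, size=(20, 2))
X = np.vstack([blob_a, blob_b])

# KMeans needs the number of clusters up front.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN infers the number of clusters from density; label -1 marks noise.
dbscan_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)

print(kmeans_labels)
print(dbscan_labels)
```

Cluster label numbers are arbitrary, so compare groupings rather than the label values themselves.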


Dimensionality Reduction
Function Description
sklearn.decomposition.PCA Principal component analysis (PCA)
sklearn.decomposition.LatentDirichletAllocation Latent Dirichlet Allocation with online variational Bayes algorithm
sklearn.decomposition.SparseCoder Sparse coding
sklearn.decomposition.DictionaryLearning Dictionary learning
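
As a small example of the decomposition API above, this sketch projects nearly one-dimensional 3-D data onto two principal components. The data is synthetic and constructed so that almost all variance lies along a single direction.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic 3-D points lying almost exactly on a line, plus small noise.
rng = np.random.RandomState(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t, -t]) + rng.normal(scale=0.01, size=(200, 3))

# Keep the two directions of largest variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

# Fraction of the total variance captured by each kept component.
print(pca.explained_variance_ratio_)
```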


Model Selection
Function Description
sklearn.model_selection.KFold K-Folds cross-validator
sklearn.model_selection.StratifiedKFold Stratified K-Folds cross-validator
sklearn.model_selection.TimeSeriesSplit Time Series cross-validator
sklearn.model_selection.train_test_split Split arrays or matrices into random train and test subsets
sklearn.model_selection.GridSearchCV Exhaustive search over specified parameter values for an estimator
sklearn.model_selection.cross_val_score Evaluate a score by cross-validation
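
The model selection tools above are usually combined. The sketch below scores an SVC with stratified 5-fold cross-validation and then searches over a small grid of C values; the dataset, the estimator, and the grid are assumptions chosen for brevity.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# A reusable splitter: 5 shuffled folds that preserve class proportions.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One accuracy score per fold for a default SVC.
fold_scores = cross_val_score(SVC(), X, y, cv=cv)

# Try every value in the grid with the same folds and keep the best one.
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=cv)
search.fit(X, y)

print(fold_scores)
print(search.best_params_, search.best_score_)
```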


Metrics
Function Description
sklearn.metrics.accuracy_score Classification Metric: Accuracy classification score
sklearn.metrics.log_loss Classification Metric: Log loss, a.k.a logistic loss or cross-entropy loss
sklearn.metrics.roc_auc_score Classification Metric: Compute the area under the receiver operating characteristic curve (ROC AUC) from prediction scores
sklearn.metrics.mean_absolute_error Regression Metric: Mean absolute error regression loss
sklearn.metrics.r2_score Regression Metric: R^2 (coefficient of determination) regression score
sklearn.metrics.label_ranking_loss Ranking Metric: Compute Ranking loss measure
sklearn.metrics.mutual_info_score Clustering Metric: Mutual Information between two clusterings
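
The metric functions above all take the true values first and the predictions (or scores) second. A short sketch on hand-made values, small enough to check by hand:

```python
from sklearn.metrics import (
    accuracy_score,
    mean_absolute_error,
    r2_score,
    roc_auc_score,
)

# Classification: 3 of 4 predicted labels match the true labels.
y_true_cls = [0, 1, 1, 0]
y_pred_cls = [0, 1, 0, 0]
acc = accuracy_score(y_true_cls, y_pred_cls)

# ROC AUC works on scores, not hard labels. Here every positive sample
# scores higher than every negative sample.
y_score = [0.1, 0.9, 0.4, 0.35]
auc = roc_auc_score(y_true_cls, y_score)

# Regression: absolute errors are 0.5, 0.5, 0.0 and 1.0.
y_true_reg = [3.0, -0.5, 2.0, 7.0]
y_pred_reg = [2.5, 0.0, 2.0, 8.0]
mae = mean_absolute_error(y_true_reg, y_pred_reg)
r2 = r2_score(y_true_reg, y_pred_reg)

print(acc, auc, mae, r2)
```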


Miscellaneous
Function Description
sklearn.datasets.load_boston Load and return the Boston house prices data set (regression; deprecated in scikit-learn 1.0 and removed in 1.2)
sklearn.datasets.make_classification Generate a random n-class classification problem
sklearn.feature_extraction.FeatureHasher Implements feature hashing, a.k.a the hashing trick
sklearn.feature_selection.SelectKBest Select features according to the k highest scores
sklearn.pipeline.Pipeline Pipeline of transforms with a final estimator
sklearn.semi_supervised.LabelPropagation Label Propagation classifier for semi-supervised learning
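
Several entries above fit together naturally: a Pipeline can chain scaling, SelectKBest feature selection, and a final classifier, trained here on data from make_classification. The step names ("scale", "select", "clf"), k=3, and the dataset parameters are arbitrary choices for this sketch.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary problem: 10 features, of which 3 are informative.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, random_state=0
)

# Each step is a (name, estimator) pair; all but the last must be transformers.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(k=3)),
    ("clf", LogisticRegression()),
])

# fit() runs fit_transform on each transformer, then fits the final estimator.
pipe.fit(X, y)

# Boolean mask of the features that SelectKBest kept.
kept = pipe.named_steps["select"].get_support()
print(kept)
print(pipe.predict(X[:5]))
```

Because the pipeline is itself an estimator, it can be passed directly to cross_val_score or GridSearchCV.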

