Scikit Learn Tutorial and Cheat Sheet

Scikit Learn Cheat Sheet Python Machine Learning

An easy-to-follow scikit-learn tutorial to help you get started with machine learning in Python. This handy scikit-learn cheat sheet lists each function together with a brief description.
Pre-Processing
Function Description
sklearn.preprocessing.StandardScaler Standardize features by removing the mean and scaling to unit variance
sklearn.impute.SimpleImputer Imputation transformer for completing missing values (replaces sklearn.preprocessing.Imputer, which was removed in scikit-learn 0.22)
sklearn.preprocessing.LabelBinarizer Binarize labels in a one-vs-all fashion
sklearn.preprocessing.OneHotEncoder Encode categorical features using a one-hot (a.k.a. one-of-K) scheme
sklearn.preprocessing.PolynomialFeatures Generate polynomial and interaction features
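
As a rough illustration of the pre-processing transformers listed above, here is a minimal sketch on made-up toy arrays. The data values are assumptions chosen only for readability, and the imputation step uses sklearn.impute.SimpleImputer, the class that recent scikit-learn releases provide for this job.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelBinarizer, PolynomialFeatures, StandardScaler

# Toy feature matrix (made-up values, for illustration only)
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# StandardScaler: each column is shifted to mean 0 and scaled to unit variance
X_scaled = StandardScaler().fit_transform(X)

# PolynomialFeatures(degree=2): columns become 1, a, b, a^2, a*b, b^2
X_poly = PolynomialFeatures(degree=2).fit_transform(X)

# LabelBinarizer: one indicator column per class (one-vs-all encoding)
y_bin = LabelBinarizer().fit_transform(["cat", "dog", "bird", "cat"])

# SimpleImputer: replace NaN with the column mean
X_missing = np.array([[1.0, np.nan], [3.0, 4.0]])
X_filled = SimpleImputer(strategy="mean").fit_transform(X_missing)
```

Each transformer follows the same fit / transform pattern, so any of them can be fitted on training data and then applied unchanged to new data.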

 

Regression
Function Description
sklearn.tree.DecisionTreeRegressor A decision tree regressor
sklearn.svm.SVR Epsilon-Support Vector Regression
sklearn.linear_model.LinearRegression Ordinary least squares Linear Regression
sklearn.linear_model.Lasso Linear model trained with an L1 prior as regularizer (a.k.a. the lasso)
sklearn.linear_model.SGDRegressor Linear model fitted by minimizing a regularized empirical loss with SGD
sklearn.linear_model.ElasticNet Linear regression with combined L1 and L2 priors as regularizer
sklearn.ensemble.RandomForestRegressor A random forest regressor
sklearn.ensemble.GradientBoostingRegressor Gradient Boosting for regression
sklearn.neural_network.MLPRegressor Multi-layer Perceptron regressor
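
The regressors above share the same fit / predict interface. The sketch below compares ordinary least squares with the lasso on synthetic, noise-free data; the generating formula and the alpha value are assumptions picked for illustration, not recommended settings.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic data (assumed for illustration): y = 3*x0 - 2*x1 + 1, no noise
rng = np.random.RandomState(0)
X = rng.rand(100, 2)
y = 3 * X[:, 0] - 2 * X[:, 1] + 1

# Ordinary least squares recovers the generating coefficients
ols = LinearRegression().fit(X, y)

# The L1 penalty shrinks the coefficients toward zero
lasso = Lasso(alpha=0.01).fit(X, y)

print("OLS coefficients:  ", ols.coef_, "intercept:", ols.intercept_)
print("Lasso coefficients:", lasso.coef_)
```

Swapping in another regressor from the table, such as RandomForestRegressor, only requires changing the constructor call.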

 

Classification
Function Description
sklearn.neural_network.MLPClassifier Multi-layer Perceptron classifier
sklearn.tree.DecisionTreeClassifier A decision tree classifier
sklearn.svm.SVC C-Support Vector Classification
sklearn.linear_model.LogisticRegression Logistic Regression (a.k.a. logit, MaxEnt) classifier
sklearn.linear_model.SGDClassifier Linear classifiers (SVM, logistic regression, etc.) with SGD training
sklearn.naive_bayes.GaussianNB Gaussian Naïve Bayes
sklearn.neighbors.KNeighborsClassifier Classifier implementing the k-nearest neighbors vote
sklearn.ensemble.RandomForestClassifier A random forest classifier
sklearn.ensemble.GradientBoostingClassifier Gradient Boosting for classification
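
Because every classifier above exposes fit, predict and score, several can be compared in a loop. The following is a minimal sketch on a synthetic data set; the sample sizes, the dictionary keys and the choice of three models are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem (illustrative sizes)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Hypothetical short names mapped to three of the listed classifiers
models = {
    "logreg": LogisticRegression(),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "tree": DecisionTreeClassifier(random_state=0),
}

# Fit each model and record its accuracy on the held-out split
test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = model.score(X_test, y_test)

print(test_accuracy)
```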

 

Clustering
Function Description
sklearn.cluster.KMeans K-Means clustering
sklearn.cluster.DBSCAN Perform DBSCAN clustering from vector array or distance matrix
sklearn.cluster.AgglomerativeClustering Agglomerative clustering
sklearn.cluster.SpectralBiclustering Spectral bi-clustering
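
To make the clustering estimators more concrete, here is a small sketch that runs KMeans and DBSCAN on two hand-made, well-separated groups of points. The coordinates and the eps / min_samples values are assumptions chosen so that both algorithms find the same two groups.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

# Two well-separated groups of three points each (made-up coordinates)
X = np.array(
    [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float
)

# KMeans needs the number of clusters up front
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# DBSCAN infers the number of clusters from density; label -1 marks noise
db = DBSCAN(eps=2.0, min_samples=2).fit(X)

print("KMeans labels:", km.labels_)
print("DBSCAN labels:", db.labels_)
```

The cluster label numbers themselves are arbitrary; only which points share a label is meaningful.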

 

Dimensionality Reduction
Function Description
sklearn.decomposition.PCA Principal component analysis (PCA)
sklearn.decomposition.LatentDirichletAllocation Latent Dirichlet Allocation with online variational Bayes algorithm
sklearn.decomposition.SparseCoder Sparse coding
sklearn.decomposition.DictionaryLearning Dictionary learning
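
As a sketch of how PCA from the table above is typically used, the example below projects synthetic 3-D points that lie almost on a straight line down to two components. The data-generating recipe is an assumption made purely for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic 3-D points lying close to a single line, plus small noise
rng = np.random.RandomState(0)
t = rng.randn(200)
X = np.column_stack([t, 2 * t, -t]) + 0.01 * rng.randn(200, 3)

# Keep the two directions of largest variance
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Fraction of total variance captured by each kept component
print(pca.explained_variance_ratio_)
```

Since the points are nearly one-dimensional, the first component should account for almost all of the variance.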

 

Model Selection
Function Description
sklearn.model_selection.KFold K-Folds cross-validator
sklearn.model_selection.StratifiedKFold Stratified K-Folds cross-validator
sklearn.model_selection.TimeSeriesSplit Time Series cross-validator
sklearn.model_selection.train_test_split Split arrays or matrices into random train and test subsets
sklearn.model_selection.GridSearchCV Exhaustive search over specified parameter values for an estimator
sklearn.model_selection.cross_val_score Evaluate a score by cross-validation
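
The model selection helpers above are usually combined: hold out a test set, cross-validate on the rest, then search over hyperparameters. A minimal sketch follows, assuming a synthetic data set and an arbitrary grid of C values for LogisticRegression.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    GridSearchCV,
    KFold,
    cross_val_score,
    train_test_split,
)

# Synthetic data (illustrative sizes)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hold out 25% of the samples for a final test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 5-fold cross-validation on the training portion
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(), X_train, y_train, cv=cv)

# Exhaustive search over a small, arbitrary grid of C values
search = GridSearchCV(LogisticRegression(), {"C": [0.1, 1.0, 10.0]}, cv=cv)
search.fit(X_train, y_train)

print("CV scores:", scores)
print("Best params:", search.best_params_)
print("Held-out accuracy:", search.score(X_test, y_test))
```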

 

Metrics
Function Description
sklearn.metrics.accuracy_score Classification Metric: Accuracy classification score
sklearn.metrics.log_loss Classification Metric: Log loss, a.k.a logistic loss or cross-entropy loss
sklearn.metrics.roc_auc_score Classification Metric: Compute the area under the receiver operating characteristic curve (ROC AUC) from prediction scores
sklearn.metrics.mean_absolute_error Regression Metric: Mean absolute error regression loss
sklearn.metrics.r2_score Regression Metric: R^2 (coefficient of determination) regression score
sklearn.metrics.label_ranking_loss Ranking Metric: Compute Ranking loss measure
sklearn.metrics.mutual_info_score Clustering Metric: Mutual Information between two clusterings
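
The metric functions above all take the true values first and the predictions second. Here is a small worked sketch on hand-made labels and scores; the numbers are assumptions chosen so the results are easy to verify by hand.

```python
from sklearn.metrics import (
    accuracy_score,
    mean_absolute_error,
    r2_score,
    roc_auc_score,
)

# Hand-made classification example
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]                # hard label predictions
y_score = [0.1, 0.4, 0.35, 0.8]      # predicted scores for class 1

acc = accuracy_score(y_true, y_pred)     # 3 of 4 labels correct -> 0.75
auc = roc_auc_score(y_true, y_score)     # 3 of 4 pos/neg pairs ranked correctly -> 0.75

# Hand-made regression examples
mae = mean_absolute_error([3.0, 5.0], [2.5, 5.5])   # both errors are 0.5 -> 0.5
r2 = r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])     # perfect prediction -> 1.0

print(acc, auc, mae, r2)
```

Note that roc_auc_score expects scores or probabilities rather than hard labels.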

 

Miscellaneous
Function Description
sklearn.datasets.load_boston Load and return the Boston house prices data set (regression); deprecated in scikit-learn 1.0 and removed in 1.2
sklearn.datasets.make_classification Generate a random n-class classification problem
sklearn.feature_extraction.FeatureHasher Implements feature hashing, a.k.a the hashing trick
sklearn.feature_selection.SelectKBest Select features according to the k highest scores
sklearn.pipeline.Pipeline Pipeline of transforms with a final estimator
sklearn.semi_supervised.LabelPropagation Label Propagation classifier for semi-supervised learning
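
Several of the utilities above fit together naturally. The sketch below chains scaling, feature selection and a classifier in a Pipeline on data from make_classification; the step names, k=3 and the f_classif scoring function are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Random classification problem: 10 features, 3 of them informative
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, random_state=0
)

# Transforms run in order; the final step is the estimator
pipe = Pipeline(
    [
        ("scale", StandardScaler()),
        ("select", SelectKBest(f_classif, k=3)),
        ("clf", LogisticRegression()),
    ]
)
pipe.fit(X, y)

# Number of features kept by the selection step
n_kept = int(pipe.named_steps["select"].get_support().sum())
print("Features kept:", n_kept)
```

Wrapping the steps in a Pipeline means a single fit or predict call applies every transform in sequence, which also keeps cross-validation free of leakage from the test folds.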

 
