# Supervised Learning in scikit-learn

## Overview

Supervised learning algorithms learn patterns from labeled training data to make predictions on new data. Scikit-learn provides comprehensive implementations for both classification and regression, organized into 17 major categories.

## Linear Models

### Regression

- **LinearRegression**: Ordinary least squares regression
- **Ridge**: L2-regularized regression, good for multicollinearity
- **Lasso**: L1-regularized regression, performs feature selection
- **ElasticNet**: Combined L1/L2 regularization
- **LassoLars**: Lasso fit with the Least Angle Regression algorithm
- **BayesianRidge**: Bayesian ridge regression with regularization strengths estimated from the data

**Linear Regression (`sklearn.linear_model.LinearRegression`)**
- Ordinary least squares regression
- Fast, interpretable, no hyperparameters
- Use when: Linear relationships, interpretability matters
- Example:

```python
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

**Ridge Regression (`sklearn.linear_model.Ridge`)**
- L2 regularization to prevent overfitting
- Key parameter: `alpha` (regularization strength, default=1.0)
- Use when: Multicollinearity present, need regularization
- Example:

```python
from sklearn.linear_model import Ridge

model = Ridge(alpha=1.0)
model.fit(X_train, y_train)
```

**Lasso (`sklearn.linear_model.Lasso`)**
- L1 regularization with feature selection
- Key parameter: `alpha` (regularization strength)
- Use when: Want sparse models, feature selection
- Can reduce some coefficients to exactly zero
- Example:

```python
from sklearn.linear_model import Lasso

model = Lasso(alpha=0.1)
model.fit(X_train, y_train)

# Check which features were selected
print(f"Non-zero coefficients: {sum(model.coef_ != 0)}")
```

**ElasticNet (`sklearn.linear_model.ElasticNet`)**
- Combines L1 and L2 regularization
- Key parameters: `alpha`, `l1_ratio` (0=Ridge, 1=Lasso)
- Use when: Need both feature selection and regularization
- Example:

```python
from sklearn.linear_model import ElasticNet

model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(X_train, y_train)
```

### Classification

- **LogisticRegression**: Binary and multiclass classification
- **RidgeClassifier**: Ridge regression adapted for classification
- **SGDClassifier**: Linear classifiers trained with stochastic gradient descent

**Use cases**: Baseline models, interpretable predictions, high-dimensional data, when linear relationships are expected

**Key parameters (linear models in general)**:
- `alpha`: Regularization strength (higher = more regularization)
- `fit_intercept`: Whether to calculate the intercept
- `solver`: Optimization algorithm ('lbfgs', 'saga', 'liblinear')

**Logistic Regression (`sklearn.linear_model.LogisticRegression`)**
- Binary and multiclass classification
- Key parameters: `C` (inverse regularization strength), `penalty` ('l1', 'l2', 'elasticnet')
- Returns probability estimates
- Use when: Need probabilistic predictions, interpretability
- Example:

```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(C=1.0, max_iter=1000)
model.fit(X_train, y_train)
probas = model.predict_proba(X_test)
```

**Stochastic Gradient Descent (SGD)**
- `SGDClassifier`, `SGDRegressor`
- Efficient for large-scale learning
- Key parameters: `loss`, `penalty`, `alpha`, `learning_rate`
- Use when: Very large datasets (>10^4 samples)
- Example:

```python
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(loss='log_loss', max_iter=1000, tol=1e-3)
model.fit(X_train, y_train)
```

## Support Vector Machines (SVM)

- **SVC**: Support Vector Classification
- **SVR**: Support Vector Regression
- **LinearSVC**: Linear SVM using liblinear (faster for large datasets)
- **OneClassSVM**: Unsupervised outlier detection

**Use cases**: Complex non-linear decision boundaries, high-dimensional spaces, when a clear margin of separation exists

**Key parameters**:
- `kernel`: 'linear', 'poly', 'rbf', 'sigmoid'
- `C`: Regularization parameter (lower = more regularization)
- `gamma`: Kernel coefficient ('scale', 'auto', or a float)
- `degree`: Polynomial degree (for the poly kernel)

**Performance tip**: SVMs don't scale well beyond tens of thousands of samples. Use LinearSVC for large datasets with a linear kernel.

**SVC (`sklearn.svm.SVC`)**
- Classification with kernel methods
- Key parameters: `C`, `kernel` ('linear', 'rbf', 'poly'), `gamma`
- Use when: Small to medium datasets, complex decision boundaries
- Note: Does not scale well to large datasets
- Example:

```python
from sklearn.svm import SVC

# Linear kernel for linearly separable data
model_linear = SVC(kernel='linear', C=1.0)

# RBF kernel for non-linear data
model_rbf = SVC(kernel='rbf', C=1.0, gamma='scale')
model_rbf.fit(X_train, y_train)
```

**SVR (`sklearn.svm.SVR`)**
- Regression with kernel methods
- Similar parameters to SVC
- Additional parameter: `epsilon` (width of the epsilon-insensitive tube)
- Example:

```python
from sklearn.svm import SVR

model = SVR(kernel='rbf', C=1.0, epsilon=0.1)
model.fit(X_train, y_train)
```

## Decision Trees

- **DecisionTreeClassifier**: Classification tree
- **DecisionTreeRegressor**: Regression tree
- **ExtraTreeClassifier/Regressor**: Extremely randomized tree

**Use cases**: Non-linear relationships, feature importance analysis, interpretable rules, handling mixed data types

**DecisionTreeClassifier / DecisionTreeRegressor**
- Non-parametric model that learns decision rules from the data
- Key parameters:
  - `max_depth`: Maximum tree depth (controls overfitting)
  - `min_samples_split`: Minimum samples required to split a node
  - `min_samples_leaf`: Minimum samples required in a leaf node
  - `max_features`: Number of features to consider for splits
  - `criterion`: 'gini', 'entropy' for classification; 'squared_error', 'absolute_error' for regression
- Use when: Need an interpretable model, non-linear relationships, mixed feature types
- Prone to overfitting: limit `max_depth`, increase `min_samples_split`/`min_samples_leaf`, use cost-complexity pruning via `ccp_alpha`, or use ensembles
- Example:

```python
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(
    max_depth=5,
    min_samples_split=20,
    min_samples_leaf=10,
    criterion='gini'
)
model.fit(X_train, y_train)

# Visualize the tree
from sklearn.tree import plot_tree
plot_tree(model, feature_names=feature_names, class_names=class_names)
```

## Ensemble Methods

### Random Forests

- **RandomForestClassifier**: Ensemble of decision trees
- **RandomForestRegressor**: Regression variant

**Use cases**: Robust general-purpose algorithm, reduces overfitting compared with single trees, handles non-linear relationships

**RandomForestClassifier / RandomForestRegressor**
- Ensemble of decision trees trained with bagging
- Key parameters:
  - `n_estimators`: Number of trees (default=100; higher = better but slower)
  - `max_depth`: Maximum tree depth
  - `max_features`: Features to consider per split ('sqrt', 'log2', int, or float)
  - `min_samples_split`, `min_samples_leaf`: Control tree growth
  - `bootstrap`: Whether to use bootstrap samples
  - `n_jobs`: Parallel processing (-1 uses all cores)
- Use when: High accuracy needed, can afford the computation
- Provides feature importances
- Example:

```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    max_features='sqrt',
    n_jobs=-1  # Use all CPU cores
)
model.fit(X_train, y_train)

# Feature importance
importances = model.feature_importances_
```

### Gradient Boosting

- **HistGradientBoostingClassifier/Regressor**: Histogram-based, fast for large datasets (>10k samples)
- **GradientBoostingClassifier/Regressor**: Traditional implementation, better for small datasets

**Use cases**: High-performance predictions, winning Kaggle competitions, structured/tabular data

**GradientBoostingClassifier / GradientBoostingRegressor**
- Sequential ensemble that builds each tree on the residuals of the previous ones
- Key parameters:
  - `n_estimators`: Number of boosting stages
  - `learning_rate`: Shrinks the contribution of each tree
  - `max_depth`: Depth of individual trees (typically 3-8)
  - `subsample`: Fraction of samples used for each tree (enables stochastic gradient boosting)
  - `n_iter_no_change`, `validation_fraction`: Enable early stopping when the validation score stops improving
- Use when: Need high accuracy, can afford training time
- Often achieves the best performance on tabular data
- Example:

```python
from sklearn.ensemble import GradientBoostingClassifier

model = GradientBoostingClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=3,
    subsample=0.8
)
model.fit(X_train, y_train)
```

**Performance tip**: HistGradientBoosting is orders of magnitude faster on large datasets.

**HistGradientBoostingClassifier / HistGradientBoostingRegressor**
- Faster gradient boosting based on a histogram algorithm
- Native support for missing values and categorical features
- Key parameters: Similar to GradientBoosting, with `max_iter` in place of `n_estimators`
- Use when: Large datasets, need faster training
- Example:

```python
from sklearn.ensemble import HistGradientBoostingClassifier

model = HistGradientBoostingClassifier(
    max_iter=100,
    learning_rate=0.1,
    max_depth=None,  # No limit by default
    categorical_features='from_dtype'  # Auto-detect categorical columns
)
model.fit(X_train, y_train)
```

### Other Ensemble Methods

**AdaBoost (`AdaBoostClassifier` / `AdaBoostRegressor`)**
- Adaptive boosting that focuses on misclassified samples
- Use cases: Boosting weak learners; less prone to overfitting than many other methods
- Use when: A simple boosting approach is needed
- Key parameters:
  - `estimator`: Base estimator (default: DecisionTreeClassifier with max_depth=1)
  - `n_estimators`: Number of boosting iterations
  - `learning_rate`: Weight applied to each classifier
- Example:

```python
from sklearn.ensemble import AdaBoostClassifier

model = AdaBoostClassifier(n_estimators=50, learning_rate=1.0)
model.fit(X_train, y_train)
```

**Voting Classifier / Regressor**
- Combines predictions from multiple different model types
- Types: 'hard' (majority vote) or 'soft' (average of predicted probabilities)
- Use when: Want to ensemble different model types
- Example:

```python
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

model = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression()),
        ('dt', DecisionTreeClassifier()),
        ('svc', SVC(probability=True))
    ],
    voting='soft'
)
model.fit(X_train, y_train)
```

**Bagging (`BaggingClassifier` / `BaggingRegressor`)**
- Bootstrap aggregating with any base estimator
- Use cases: Reducing the variance of unstable models, parallel ensemble creation
- Key parameters:
  - `estimator`: Base estimator to fit
  - `n_estimators`: Number of estimators
  - `max_samples`: Samples to draw per estimator
  - `bootstrap`: Whether to sample with replacement
- Example: see the sketch below

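A minimal Bagging sketch in the style of the other examples here (not part of the original list); it assumes `X_train` and `y_train` are already defined, and scikit-learn >= 1.2 for the `estimator` keyword:

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Bag 50 decision trees, each trained on an 80% bootstrap sample of the data
model = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=50,
    max_samples=0.8,
    bootstrap=True,
    n_jobs=-1
)
model.fit(X_train, y_train)
```
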
**Stacking Classifier / Regressor**
- Trains a meta-model on the predictions of base models
- More sophisticated than voting
- Use cases: Combining diverse models, leveraging the strengths of different model types
- Key parameter: `final_estimator` (the meta-learner)
- Example:

```python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

model = StackingClassifier(
    estimators=[
        ('dt', DecisionTreeClassifier()),
        ('svc', SVC())
    ],
    final_estimator=LogisticRegression()
)
model.fit(X_train, y_train)
```

## Nearest Neighbors

- **KNeighborsClassifier/Regressor**: K-nearest neighbors
- **RadiusNeighborsClassifier/Regressor**: Radius-based neighbors
- **NearestCentroid**: Classification using class centroids

**Use cases**: Simple baseline, irregular decision boundaries, when interpretability isn't critical

**KNeighborsClassifier / KNeighborsRegressor**
- Non-parametric method based on distances to the training samples
- Key parameters:
  - `n_neighbors`: Number of neighbors (default=5, typically 3-11)
  - `weights`: 'uniform' or 'distance' (distance-weighted voting)
  - `metric`: Distance metric ('euclidean', 'manhattan', 'minkowski', etc.)
  - `algorithm`: 'auto', 'ball_tree', 'kd_tree', 'brute'
- Use when: Small dataset, simple baseline needed
- Slow prediction on large datasets
- Example:

```python
from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier(n_neighbors=5, weights='distance')
model.fit(X_train, y_train)
```

## Naive Bayes

- **GaussianNB**: Continuous features (assumes a Gaussian distribution per class)
- **MultinomialNB**: Discrete counts (text classification)
- **BernoulliNB**: Binary/boolean features
- **CategoricalNB**: Categorical features
- **ComplementNB**: Adapted for imbalanced datasets

**GaussianNB, MultinomialNB, BernoulliNB**
- Probabilistic classifiers based on Bayes' theorem with a feature-independence assumption
- Fast training and prediction
- Key parameters (discrete variants): `alpha` (Laplace/Lidstone smoothing), `fit_prior` (whether to learn class prior probabilities)
- Use when: Text classification, fast baseline, probabilistic predictions, small training sets
- Example:

```python
from sklearn.naive_bayes import GaussianNB, MultinomialNB

# For continuous features
model_gaussian = GaussianNB()

# For text/count data
model_multinomial = MultinomialNB(alpha=1.0)  # alpha is the smoothing parameter
model_multinomial.fit(X_train, y_train)
```

## Neural Networks

- **MLPClassifier**: Multi-layer perceptron classifier
- **MLPRegressor**: Multi-layer perceptron regressor

**Use cases**: Complex non-linear patterns, when gradient boosting is too slow, deep feature learning

**MLPClassifier / MLPRegressor**
- Multi-layer perceptron (feedforward neural network)
- Key parameters:
  - `hidden_layer_sizes`: Tuple of hidden layer sizes, e.g., (100, 50)
  - `activation`: 'relu', 'tanh', 'logistic'
  - `solver`: 'adam', 'sgd', 'lbfgs'
  - `alpha`: L2 regularization parameter
  - `learning_rate`: Learning rate schedule ('constant', 'invscaling', 'adaptive')
  - `early_stopping`: Stop when the validation score stops improving
- Use when: Complex non-linear patterns, large datasets
- **Important**: Feature scaling is critical for neural networks. Always use StandardScaler or similar.
- Example:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Scale features first
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

model = MLPClassifier(
    hidden_layer_sizes=(100, 50),
    activation='relu',
    solver='adam',
    alpha=0.0001,
    max_iter=1000
)
model.fit(X_train_scaled, y_train)
```

## Linear/Quadratic Discriminant Analysis

- **LinearDiscriminantAnalysis**: Linear decision boundary; can also be used for dimensionality reduction
- **QuadraticDiscriminantAnalysis**: Quadratic decision boundary

**Use cases**: When classes have Gaussian distributions, dimensionality reduction, when covariance assumptions hold (see the sketch below)

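A minimal LDA sketch (added for illustration, not from the original list), assuming `X_train`, `y_train`, and `X_test` are defined as in the earlier examples:

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

model = LinearDiscriminantAnalysis()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# LDA can also project the data onto at most (n_classes - 1) discriminant axes
X_train_lda = model.transform(X_train)
```
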
## Gaussian Processes

- **GaussianProcessClassifier**: Probabilistic classification
- **GaussianProcessRegressor**: Probabilistic regression with uncertainty estimates

**Use cases**: When uncertainty quantification is important, small datasets, smooth function approximation (see the sketch below)

**Key parameters**:
- `kernel`: Covariance function (RBF, Matern, RationalQuadratic, etc.)
- `alpha`: Noise level added to the diagonal of the kernel matrix

**Limitation**: Doesn't scale well to large datasets (O(n³) complexity)

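A minimal regression sketch (illustrative, not part of the original list), assuming `X_train`, `y_train`, and `X_test` exist:

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# alpha adds noise to the kernel diagonal (also helps numerical stability)
model = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)
model.fit(X_train, y_train)

# return_std=True returns a per-point uncertainty estimate alongside the mean
y_mean, y_std = model.predict(X_test, return_std=True)
```
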
## Stochastic Gradient Descent

- **SGDClassifier**: Linear classifiers trained with SGD
- **SGDRegressor**: Linear regressors trained with SGD

**Use cases**: Very large datasets (>100k samples), online learning, when data doesn't fit in memory (see also the SGD example under Linear Models)

**Key parameters**:
- `loss`: Loss function ('hinge', 'log_loss', 'squared_error', etc.)
- `penalty`: Regularization ('l2', 'l1', 'elasticnet')
- `alpha`: Regularization strength
- `learning_rate`: Learning rate schedule

## Semi-Supervised Learning

- **SelfTrainingClassifier**: Self-training with any base classifier
- **LabelPropagation**: Label propagation through a similarity graph
- **LabelSpreading**: Label spreading (modified label propagation)

**Use cases**: When labeled data is scarce but unlabeled data is abundant (see the sketch below)

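A minimal self-training sketch (illustrative, not from the original doc); it assumes that unlabeled rows are marked with `-1` in `y_train`, as scikit-learn's semi-supervised API expects:

```python
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.linear_model import LogisticRegression

# Unlabeled samples must be marked with the label -1 in y_train
model = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.9)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```
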
## Feature Selection

- **VarianceThreshold**: Remove low-variance features
- **SelectKBest**: Select the K highest-scoring features
- **SelectPercentile**: Select the top percentile of features
- **RFE**: Recursive feature elimination
- **RFECV**: RFE with cross-validation
- **SelectFromModel**: Select features based on model importances
- **SequentialFeatureSelector**: Forward/backward feature selection

**Use cases**: Reducing dimensionality, removing irrelevant features, improving interpretability, reducing overfitting (see the sketch below)

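An illustrative sketch combining two of the selectors above (assumes `X_train`, `y_train`; the choice of `k=10` features is arbitrary):

```python
from sklearn.feature_selection import SelectKBest, f_classif, RFE
from sklearn.linear_model import LogisticRegression

# Univariate selection: keep the 10 features with the highest ANOVA F-score
selector = SelectKBest(score_func=f_classif, k=10)
X_train_selected = selector.fit_transform(X_train, y_train)

# Model-based selection: recursively eliminate features using a linear model
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10)
X_train_rfe = rfe.fit_transform(X_train, y_train)
```
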
## Probability Calibration

- **CalibratedClassifierCV**: Calibrate classifier probabilities

**Use cases**: When probability estimates are important (not just class predictions), especially with SVM and Naive Bayes (see the sketch below)

**Methods**:
- `sigmoid`: Platt scaling
- `isotonic`: Isotonic regression (more flexible, needs more data)

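A minimal calibration sketch (illustrative; assumes `X_train`, `y_train`, `X_test`):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC

# Wrap an uncalibrated classifier; 'sigmoid' = Platt scaling, 'isotonic' also available
model = CalibratedClassifierCV(LinearSVC(), method='sigmoid', cv=5)
model.fit(X_train, y_train)
probas = model.predict_proba(X_test)
```
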
## Multi-Output Methods

- **MultiOutputClassifier**: Fit one classifier per target
- **MultiOutputRegressor**: Fit one regressor per target
- **ClassifierChain**: Models dependencies between targets
- **RegressorChain**: Regression variant

**Use cases**: Predicting multiple related targets simultaneously (see the sketch below)

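A minimal multi-output sketch (illustrative; assumes `Y_train` is a 2-D array with one column per target):

```python
from sklearn.multioutput import MultiOutputClassifier
from sklearn.ensemble import RandomForestClassifier

# Fits one independent classifier per column of Y_train
model = MultiOutputClassifier(RandomForestClassifier(n_estimators=100))
model.fit(X_train, Y_train)
predictions = model.predict(X_test)  # shape: (n_samples, n_targets)
```
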
## Specialized Regression

- **IsotonicRegression**: Monotonic (order-preserving) regression
- **QuantileRegressor**: Quantile regression for prediction intervals (see the sketch below)

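An illustrative sketch of quantile regression for a rough 80% prediction interval (assumes `X_train`, `y_train`, `X_test`; note that `alpha` here is the L1 penalty, not the quantile):

```python
from sklearn.linear_model import QuantileRegressor

# Fit the 10th and 90th percentile regressors to bracket the target
lower = QuantileRegressor(quantile=0.1, alpha=0.0)
upper = QuantileRegressor(quantile=0.9, alpha=0.0)
lower.fit(X_train, y_train)
upper.fit(X_train, y_train)

interval_low = lower.predict(X_test)
interval_high = upper.predict(X_test)
```
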
## Algorithm Selection Guide

**Start with**:
1. **LogisticRegression** (classification) or **LinearRegression/Ridge** (regression) as a fast baseline
2. **RandomForestClassifier/Regressor** as a good default for general non-linear problems
3. **HistGradientBoostingClassifier/Regressor** when the best accuracy is needed (Gradient Boosting, Random Forests, and Stacking typically give the highest accuracy)

**Consider dataset size**:
- Small (<1k samples): KNN, SVM, Decision Trees, Gaussian Processes (almost any algorithm works)
- Medium (1k-100k): Random Forests, Gradient Boosting, Linear Models, Neural Networks
- Large (>100k): SGD, Linear Models, HistGradientBoosting, LinearSVC

**Consider interpretability**:
- High: Linear Models, Decision Trees, Naive Bayes
- Medium: Random Forests (feature importance), rule extraction
- Low (black box acceptable): Gradient Boosting, Neural Networks, SVM with RBF kernel

**Consider training and prediction speed**:
- Fast training: Linear Models, Naive Bayes, Decision Trees, KNN
- Medium training: Random Forests (parallelizable), SVM (small data)
- Slow training: Gradient Boosting, Neural Networks, SVM (large data), Gaussian Processes
- Fast prediction: Linear Models, Naive Bayes
- Slow prediction: KNN (on large datasets), SVM

**Consider feature types**:
- Continuous: Most algorithms work well
- Categorical: Trees, HistGradientBoosting (native support)
- Mixed: Trees, Gradient Boosting
- Text: Naive Bayes, Linear Models with TF-IDF