Learning Tasks: Classification, Regression, Clustering, and Similarity Search
This reference provides comprehensive details on supervised and unsupervised learning tasks for time series collections.
Time Series Classification
Time series classification (TSC) assigns labels to entire sequences. aeon provides several algorithm families, each with different strengths.
Algorithm Categories
1. Convolution-Based Classifiers
Transform time series using random convolutional kernels:
ROCKET (RandOm Convolutional KErnel Transform)
- Ultra-fast feature extraction via random kernels
- 10,000+ kernels generate discriminative features
- Linear classifier on extracted features
from aeon.classification.convolution_based import RocketClassifier
clf = RocketClassifier(num_kernels=10000, n_jobs=-1)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
probabilities = clf.predict_proba(X_test)
Variants:
- MiniRocketClassifier: Faster, streamlined version (see the sketch after this list)
- MultiRocketClassifier: Multivariate extensions
- Arsenal: Ensemble of ROCKET transformers
- Hydra: Dictionary-based convolution variant
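All variants share the same fit/predict interface, so swapping one in is a one-line change. A minimal sketch using MiniRocketClassifier (assuming it lives alongside RocketClassifier in aeon.classification.convolution_based, and reusing the arrays from the example above):
from aeon.classification.convolution_based import MiniRocketClassifier
# Drop-in replacement for RocketClassifier with a faster transform
clf = MiniRocketClassifier(n_jobs=-1)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)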
2. Deep Learning Classifiers
Neural networks specialized for time series:
InceptionTime
- Ensemble of Inception modules
- Captures patterns at multiple scales
- State-of-the-art on UCR benchmarks
from aeon.classification.deep_learning import InceptionTimeClassifier
clf = InceptionTimeClassifier(n_epochs=200, batch_size=64)
clf.fit(X_train, y_train)
Other architectures:
- ResNetClassifier: Residual connections
- FCNClassifier: Fully Convolutional Networks
- CNNClassifier: Standard convolutional architecture
- LITEClassifier: Lightweight networks
- MLPClassifier: Multi-layer perceptrons
- TapNetClassifier: Attentional prototype networks
3. Dictionary-Based Classifiers
Symbolic representations and bag-of-words approaches:
BOSS (Bag of SFA Symbols)
- Converts series to symbolic words
- Histogram-based classification
- Effective for shape patterns
from aeon.classification.dictionary_based import BOSSEnsemble
clf = BOSSEnsemble(max_ensemble_size=500)
clf.fit(X_train, y_train)
Other dictionary methods:
- TemporalDictionaryEnsemble (TDE): Enhanced BOSS with temporal information
- WEASEL: Word ExtrAction for time SEries cLassification
- MUSE: Multivariate Unsupervised Symbols and dErivatives (multivariate WEASEL)
- MrSEQL: Multiple Representations SEQuence Learner
4. Distance-Based Classifiers
Leverage time series-specific distance metrics:
K-Nearest Neighbors with DTW
- Dynamic Time Warping handles temporal shifts
- Effective for shape-based similarity
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
clf = KNeighborsTimeSeriesClassifier(
distance="dtw",
n_neighbors=5
)
clf.fit(X_train, y_train)
Other distance methods:
- ElasticEnsemble: Ensemble of elastic distances
- ProximityForest: Tree-based with elastic measures
- ProximityTree: Single tree variant
- ShapeDTW: DTW with shape descriptors
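The elastic measures these classifiers build on are also exposed directly in aeon.distances, which is handy for checking what a warped comparison returns before committing to a classifier. A minimal sketch using dtw_distance (the optional window argument bounds the warping path as a fraction of series length):
import numpy as np
from aeon.distances import dtw_distance
a = np.sin(np.linspace(0, 2 * np.pi, 100))
b = np.sin(np.linspace(0, 2 * np.pi, 100) + 0.5)  # phase-shifted copy
print(dtw_distance(a, b))              # unconstrained DTW
print(dtw_distance(a, b, window=0.1))  # 10% Sakoe-Chiba band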
5. Feature-Based Classifiers
Extract statistical and domain-specific features:
Catch22
- 22 time series features
- CAnonical Time-series CHaracteristics
- Fast and interpretable
from aeon.classification.feature_based import Catch22Classifier
from sklearn.ensemble import RandomForestClassifier
clf = Catch22Classifier(estimator=RandomForestClassifier())
clf.fit(X_train, y_train)
Other feature methods:
- FreshPRINCEClassifier: Fresh Pipelines with Random Interval and Catch22 Features
- SignatureClassifier: Path signature features
- TSFreshClassifier: Comprehensive feature extraction (slower, more features)
- SummaryClassifier: Simple summary statistics
6. Interval-Based Classifiers
Analyze discriminative time intervals:
Time Series Forest (TSF)
- Random intervals + summary statistics
- Random forest on extracted features
from aeon.classification.interval_based import TimeSeriesForestClassifier
clf = TimeSeriesForestClassifier(n_estimators=500)
clf.fit(X_train, y_train)
Other interval methods:
- CanonicalIntervalForest (CIF): Canonical Interval Forest
- DrCIF: Diverse Representation CIF
- RISE: Random Interval Spectral Ensemble
- RandomIntervalClassifier: Basic random interval approach
- SupervisedTimeSeriesForest (STSF): Supervised interval selection
7. Shapelet-Based Classifiers
Discover discriminative subsequences:
Shapelets: Small subsequences that best distinguish classes
from aeon.classification.shapelet_based import ShapeletTransformClassifier
clf = ShapeletTransformClassifier(
n_shapelet_samples=10000,
max_shapelets=20
)
clf.fit(X_train, y_train)
Other shapelet methods:
- LearningShapeletClassifier: Gradient-based shapelet learning
- SASTClassifier: Scalable and Accurate Subsequence Transform
8. Hybrid Ensembles
Combine multiple algorithm families:
HIVE-COTE (Hierarchical Vote Collective of Transformation-based Ensembles)
- State-of-the-art accuracy
- Combines shapelets, intervals, dictionaries, and spectral features
- V2 uses ROCKET and improved components
from aeon.classification.hybrid import HIVECOTEV2
clf = HIVECOTEV2(n_jobs=-1) # Slow but highly accurate
clf.fit(X_train, y_train)
Algorithm Selection Guide
Fast and accurate (default choice):
RocketClassifier or MiniRocketClassifier
Maximum accuracy (slow):
HIVECOTEV2 or InceptionTimeClassifier
Interpretable:
Catch22Classifier or ShapeletTransformClassifier
Multivariate focus:
MultiRocketClassifier or MUSE
Small datasets:
KNeighborsTimeSeriesClassifier with DTW
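When the guide leaves a tie, a quick benchmark on your own data settles it. A minimal sketch comparing two of the recommendations above on the ArrowHead dataset (any estimators with the common interface can be dropped into the dictionary):
from aeon.classification.convolution_based import RocketClassifier
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
from aeon.datasets import load_arrow_head
from sklearn.metrics import accuracy_score
X_train, y_train = load_arrow_head(split="train")
X_test, y_test = load_arrow_head(split="test")
candidates = {
    "rocket": RocketClassifier(n_jobs=-1),
    "knn-dtw": KNeighborsTimeSeriesClassifier(distance="dtw"),
}
for name, clf in candidates.items():
    clf.fit(X_train, y_train)
    print(name, accuracy_score(y_test, clf.predict(X_test)))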
Classification Workflow
from aeon.classification.convolution_based import RocketClassifier
from aeon.datasets import load_arrow_head
from sklearn.metrics import accuracy_score, classification_report
# Load data
X_train, y_train = load_arrow_head(split="train")
X_test, y_test = load_arrow_head(split="test")
# Train classifier
clf = RocketClassifier(n_jobs=-1)
clf.fit(X_train, y_train)
# Evaluate
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
print(classification_report(y_test, y_pred))
Time Series Regression
Time series regression predicts continuous values from sequences. Most classification algorithms have regression equivalents.
Regression Algorithms
Available regressors mirror classification structure:
- RocketRegressor, MiniRocketRegressor, MultiRocketRegressor
- InceptionTimeRegressor, ResNetRegressor, FCNRegressor
- KNeighborsTimeSeriesRegressor
- Catch22Regressor, FreshPRINCERegressor
- TimeSeriesForestRegressor, DrCIFRegressor
Regression Workflow
from aeon.regression.convolution_based import RocketRegressor
from sklearn.metrics import mean_squared_error, r2_score
# Train regressor (y_train holds continuous targets)
reg = RocketRegressor(num_kernels=10000)
reg.fit(X_train, y_train)
# Predict and evaluate
y_pred = reg.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.3f}, R²: {r2:.3f}")
Time Series Clustering
Clustering groups similar time series without labels.
Clustering Algorithms
TimeSeriesKMeans
- K-means with time series distances
- Supports DTW, Euclidean, and other metrics
from aeon.clustering import TimeSeriesKMeans
clusterer = TimeSeriesKMeans(
n_clusters=3,
distance="dtw",
n_init=10
)
clusterer.fit(X_collection)
labels = clusterer.labels_
TimeSeriesKMedoids
- Uses actual series as cluster centers
- More robust to outliers
from aeon.clustering import TimeSeriesKMedoids
clusterer = TimeSeriesKMedoids(
n_clusters=3,
distance="euclidean"
)
clusterer.fit(X_collection)
Other clustering methods:
- TimeSeriesKernelKMeans: Kernel-based clustering
- ElasticSOM: Self-organizing maps with elastic distances
Clustering Workflow
from aeon.clustering import TimeSeriesKMeans
# Cluster time series
clusterer = TimeSeriesKMeans(n_clusters=4, distance="dtw")
clusterer.fit(X_train)
# Get cluster labels
labels = clusterer.predict(X_test)
# Compute cluster centers
centers = clusterer.cluster_centers_
# Evaluate clustering quality (if ground truth available)
from sklearn.metrics import adjusted_rand_score
ari = adjusted_rand_score(y_true, labels)
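Without ground truth, an internal measure is the usual fallback. A sketch using sklearn's silhouette score over a precomputed DTW distance matrix (assuming aeon's dtw_pairwise_distance; any pairwise elastic distance works the same way):
from aeon.distances import dtw_pairwise_distance
from sklearn.metrics import silhouette_score
D = dtw_pairwise_distance(X_test)  # (n_cases, n_cases) distance matrix
score = silhouette_score(D, labels, metric="precomputed")
print(f"Silhouette: {score:.3f}")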
Similarity Search
Similarity search finds nearest-neighbor subsequences and repeated patterns (motifs) within or across time series.
Key Concepts
Motifs: Frequently repeated subsequences within a time series
Matrix Profile: Data structure encoding the nearest-neighbor distance for every subsequence
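To make the idea concrete, here is a brute-force matrix profile in plain NumPy. It is quadratic and intended only for intuition; library implementations use fast algorithms such as STOMP:
import numpy as np

def naive_matrix_profile(x, m):
    """Brute-force matrix profile of a 1D series x for window size m."""
    n = len(x) - m + 1
    subs = np.lib.stride_tricks.sliding_window_view(x, m).astype(float)
    # z-normalize each subsequence so matches are based on shape, not level
    subs = (subs - subs.mean(axis=1, keepdims=True)) / (subs.std(axis=1, keepdims=True) + 1e-8)
    profile = np.empty(n)
    excl = m // 2  # exclusion zone: ignore trivial self-matches
    for i in range(n):
        d = np.linalg.norm(subs - subs[i], axis=1)
        d[max(0, i - excl):i + excl + 1] = np.inf
        profile[i] = d.min()
    return profile

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 20, 500)) + 0.1 * rng.standard_normal(500)
mp = naive_matrix_profile(x, m=50)
print("motif index:", np.argmin(mp))    # lowest value: best repeated pattern
print("discord index:", np.argmax(mp))  # highest value: most anomalous window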
Similarity Search Methods
Matrix Profile
- Efficient motif discovery
- Change point detection
- Anomaly detection
from aeon.similarity_search import MatrixProfile
import numpy as np
mp = MatrixProfile(window_size=50)
profile = mp.fit_transform(X_series)
# Find top motif (lowest profile value)
motif_idx = np.argmin(profile)
Query Search
- Find nearest neighbors to a query subsequence
- Useful for template matching
from aeon.similarity_search import QuerySearch
searcher = QuerySearch(distance="euclidean")
distances, indices = searcher.search(X_series, query_subsequence)
Similarity Search Workflow
from aeon.similarity_search import MatrixProfile
import numpy as np
# Compute matrix profile
mp = MatrixProfile(window_size=100)
profile, profile_index = mp.fit_transform(X_series)
# Find top-k motifs (lowest profile values)
k = 3
motif_indices = np.argsort(profile)[:k]
# Find anomalies (highest profile values)
anomaly_indices = np.argsort(profile)[-k:]
Ensemble and Composition Tools
Voting Ensembles
from aeon.classification.ensemble import WeightedEnsembleClassifier
from aeon.classification.convolution_based import RocketClassifier
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
ensemble = WeightedEnsembleClassifier(
estimators=[
('rocket', RocketClassifier()),
('knn', KNeighborsTimeSeriesClassifier())
]
)
ensemble.fit(X_train, y_train)
Pipelines
from sklearn.pipeline import Pipeline
from aeon.transformations.collection import Catch22
from sklearn.ensemble import RandomForestClassifier
pipeline = Pipeline([
('features', Catch22()),
('classifier', RandomForestClassifier())
])
pipeline.fit(X_train, y_train)
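Because the pipeline is a standard sklearn object, hyperparameters of its steps can be tuned through the step-name prefix convention:
from sklearn.model_selection import GridSearchCV
# 'classifier__' routes the parameter to the RandomForestClassifier step
param_grid = {"classifier__n_estimators": [100, 500]}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_)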
Model Selection and Validation
Cross-Validation
from sklearn.model_selection import cross_val_score
from aeon.classification.convolution_based import RocketClassifier
clf = RocketClassifier()
scores = cross_val_score(clf, X_train, y_train, cv=5)
print(f"CV Accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
Grid Search
from sklearn.model_selection import GridSearchCV
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
param_grid = {
'n_neighbors': [1, 3, 5, 7],
'distance': ['dtw', 'euclidean', 'erp']
}
clf = KNeighborsTimeSeriesClassifier()
grid_search = GridSearchCV(clf, param_grid, cv=5)
grid_search.fit(X_train, y_train)
print(f"Best params: {grid_search.best_params_}")
Discovery Functions
Find available estimators programmatically:
from aeon.utils.discovery import all_estimators
# Get all classifiers
classifiers = all_estimators(type_filter="classifier")
# Get all regressors
regressors = all_estimators(type_filter="regressor")
# Get all clusterers
clusterers = all_estimators(type_filter="clusterer")
# Filter by tag (e.g., multivariate capable)
mv_classifiers = all_estimators(
type_filter="classifier",
filter_tags={"capability:multivariate": True}
)
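all_estimators returns (name, class) pairs, so results can be inspected or instantiated directly:
# Print the first few discovered classifiers and where they live
for name, cls in classifiers[:5]:
    print(name, cls.__module__)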