Support for aeon for time-series analysis and machine learning

Timothy Kassis
2025-10-25 21:25:50 -07:00
parent 6cefe6f4cc
commit b83942845c
5 changed files with 2645 additions and 0 deletions


@@ -0,0 +1,224 @@
---
name: aeon
description: Time series machine learning toolkit for classification, regression, clustering, forecasting, anomaly detection, segmentation, and similarity search. Use this skill when working with temporal data, performing time series analysis, building predictive models on sequential data, or implementing workflows that involve distance metrics (DTW), transformations (ROCKET, Catch22), or deep learning for time series. Applicable for tasks like ECG classification, stock price forecasting, sensor anomaly detection, or activity recognition from wearable devices.
---
# Aeon
## Overview
Aeon is a comprehensive Python toolkit for time series machine learning, providing state-of-the-art algorithms and classical techniques for analyzing temporal data. Use this skill when working with sequential/temporal data across seven primary learning tasks: classification, regression, clustering, forecasting, anomaly detection, segmentation, and similarity search.
## When to Use This Skill
Apply this skill when:
- Classifying or predicting from time series data (e.g., ECG classification, activity recognition)
- Forecasting future values in temporal sequences (e.g., stock prices, energy demand)
- Detecting anomalies in sensor streams or operational data
- Clustering temporal patterns or discovering motifs
- Segmenting time series into meaningful regions (change point detection)
- Computing distances between time series using specialized metrics (DTW, MSM, ERP)
- Extracting features from temporal data using ROCKET, Catch22, TSFresh, or shapelets
- Building deep learning models for time series with specialized architectures
## Core Capabilities
### 1. Time Series Classification
Classify labeled time series using diverse algorithm families:
- **Convolution-based**: ROCKET, MiniRocket, MultiRocket, Arsenal, Hydra
- **Deep learning**: InceptionTime, ResNet, FCN, TimeCNN, LITE
- **Dictionary-based**: BOSS, TDE, WEASEL, MrSEQL (symbolic representations)
- **Distance-based**: KNN with elastic distances, Elastic Ensemble, Proximity Forest
- **Feature-based**: Catch22, FreshPRINCE, Signature classifiers
- **Interval-based**: CIF, DrCIF, RISE, Random Interval variants
- **Shapelet-based**: Learning Shapelet, SAST
- **Hybrid ensembles**: HIVE-COTE V1/V2
Example:
```python
from aeon.classification.convolution_based import RocketClassifier
from aeon.datasets import load_arrow_head
X_train, y_train = load_arrow_head(split="train")
X_test, y_test = load_arrow_head(split="test")
clf = RocketClassifier()
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```
### 2. Time Series Regression
Predict continuous values from time series using adapted classification algorithms:
```python
from aeon.regression.convolution_based import RocketRegressor
reg = RocketRegressor()
reg.fit(X_train, y_train_continuous)
predictions = reg.predict(X_test)
```
### 3. Forecasting
Predict future values using statistical and deep learning models:
- Statistical: ARIMA, ETS, Theta, TAR, AutoTAR, TVP
- Naive baselines: NaiveForecaster with seasonal strategies
- Deep learning: TCN (Temporal Convolutional Networks)
- Regression-based: RegressionForecaster with sliding windows
Example:
```python
from aeon.forecasting.naive import NaiveForecaster
forecaster = NaiveForecaster(strategy="last")
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3]) # forecast 3 steps ahead
```
### 4. Anomaly Detection
Identify outliers in time series data:
- **Distance-based**: KMeansAD, CBLOF, LOF, STOMP, LeftSTAMPi, MERLIN, ROCKAD
- **Distribution-based**: COPOD, DWT_MLEAD
- **Outlier detection**: IsolationForest, OneClassSVM, STRAY
- **Collection adapters**: ClassificationAdapter, OutlierDetectionAdapter
Example:
```python
from aeon.anomaly_detection import STOMP
detector = STOMP(window_size=50)
anomaly_scores = detector.fit_predict(X_series)
```
### 5. Clustering
Group similar time series without labels:
```python
from aeon.clustering import TimeSeriesKMeans
clusterer = TimeSeriesKMeans(n_clusters=3, distance="dtw")
clusterer.fit(X_collection)
labels = clusterer.predict(X_new)
```
### 6. Segmentation
Divide time series into distinct regions or identify change points:
```python
from aeon.segmentation import ClaSPSegmenter
segmenter = ClaSPSegmenter()
change_points = segmenter.fit_predict(X_series)
```
### 7. Similarity Search
Find motifs and nearest neighbors in time series collections using specialized distance metrics and matrix profile techniques.
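A minimal sketch of motif discovery, mirroring the `MatrixProfile` usage documented in references/learning_tasks.md (the class name and return value are assumed from that reference):
```python
from aeon.similarity_search import MatrixProfile
import numpy as np
# Low matrix-profile values mark motifs; high values mark discords/anomalies
mp = MatrixProfile(window_size=50)
profile = mp.fit_transform(X_series)
motif_idx = np.argmin(profile)  # start index of the best-conserved subsequence
```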
### 8. Transformations
Preprocess and extract features from time series:
- **Collection transformers**: ROCKET, Catch22, TSFresh, Shapelet, SAX, PAA, SFA
- **Series transformers**: Moving Average, Box-Cox, PCA, Fourier, Savitzky-Golay
- **Channel operations**: Selection, scoring, balancing
- **Data balancing**: SMOTE, ADASYN
Example:
```python
from aeon.transformations.collection.convolution_based import Rocket
rocket = Rocket(num_kernels=10000)
X_transformed = rocket.fit_transform(X_train)
```
### 9. Distance Metrics
Compute specialized time series distances:
- **Warping**: DTW, WDTW, DDTW, WDDTW, Shape DTW, ADTW
- **Edit distances**: ERP, EDR, LCSS, TWE
- **Standard**: Euclidean, Manhattan, Minkowski, Squared
- **Specialized**: MSM, SBD
Example:
```python
from aeon.distances import dtw_distance, pairwise_distance
dist = dtw_distance(series1, series2)
dist_matrix = pairwise_distance(X_collection, metric="dtw")
```
## Installation
Install aeon using pip:
```bash
# Core dependencies only
pip install -U aeon
# All optional dependencies
pip install -U "aeon[all_extras]"
```
Or using conda:
```bash
conda create -n aeon-env -c conda-forge aeon
conda activate aeon-env
```
**Requirements**: Python 3.9, 3.10, 3.11, or 3.12
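Verify the installation with a quick check:
```python
import aeon
print(aeon.__version__)
```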
## Data Format
Aeon uses standardized data shapes:
- **Collections**: `(n_cases, n_channels, n_timepoints)` as NumPy arrays or pandas DataFrames
- **Single series**: NumPy arrays or pandas Series
- **Variable-length**: Supported with padding or specialized handling
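For example, a collection of 10 univariate series of length 100 is a plain 3D array (synthetic data for illustration):
```python
import numpy as np
# Collection: (n_cases, n_channels, n_timepoints)
X_collection = np.random.randn(10, 1, 100)
# Single series: 1D array, or (n_channels, n_timepoints) if multivariate
x_series = np.random.randn(100)
```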
Load example datasets:
```python
from aeon.datasets import load_arrow_head, load_airline
# Classification dataset
X_train, y_train = load_arrow_head(split="train")
# Forecasting dataset
y = load_airline()
```
## Workflow Patterns
### Pipeline Construction
Combine transformers and estimators using scikit-learn pipelines:
```python
from sklearn.pipeline import Pipeline
from aeon.transformations.collection import Catch22
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
pipeline = Pipeline([
    ('features', Catch22()),
    ('classifier', KNeighborsTimeSeriesClassifier())
])
pipeline.fit(X_train, y_train)
```
### Discovery and Tags
Find estimators programmatically:
```python
from aeon.utils.discovery import all_estimators
# Find all classifiers
classifiers = all_estimators(type_filter="classifier")
# Find all forecasters
forecasters = all_estimators(type_filter="forecaster")
```
## References
The skill includes modular reference files with comprehensive details:
### references/learning_tasks.md
In-depth coverage of classification, regression, clustering, and similarity search, including algorithm categories, use cases, and code patterns.
### references/temporal_analysis.md
Detailed information on forecasting, anomaly detection, and segmentation tasks with model descriptions and workflows.
### references/core_modules.md
Comprehensive documentation of transformations, distances, networks, datasets, and benchmarking utilities.
### references/workflows.md
Common workflow patterns, pipeline examples, cross-validation strategies, and integration with scikit-learn.
Load these reference files as needed for detailed information on specific modules or workflows.


@@ -0,0 +1,749 @@
# Core Modules: Transformations, Distances, Networks, Datasets, and Benchmarking
This reference provides comprehensive details on foundational modules that support aeon's learning tasks.
## Transformations
Transformations convert time series into alternative representations for feature extraction, preprocessing, or visualization.
### Two Types of Transformers
**Collection Transformers**: Process entire collections of time series
- Input: `(n_cases, n_channels, n_timepoints)`
- Output: Features, transformed collections, or tabular data
**Series Transformers**: Work on individual time series
- Input: Single time series
- Output: Transformed single series
### Collection-Level Transformations
#### ROCKET (RAndom Convolutional KErnel Transform)
Fast feature extraction via random convolutional kernels:
```python
from aeon.transformations.collection.convolution_based import Rocket
rocket = Rocket(num_kernels=10000, n_jobs=-1)
X_transformed = rocket.fit_transform(X_train)
# Output shape: (n_cases, 2 * num_kernels)
```
**Variants:**
```python
from aeon.transformations.collection.convolution_based import (
    MiniRocket,
    MultiRocket,
    Hydra
)
# MiniRocket: Faster, streamlined version
minirocket = MiniRocket(num_kernels=10000)
X_features = minirocket.fit_transform(X_train)
# MultiRocket: Multivariate extensions
multirocket = MultiRocket(num_kernels=10000)
X_features = multirocket.fit_transform(X_train)
# Hydra: Dictionary-based convolution
hydra = Hydra(n_kernels=8)
X_features = hydra.fit_transform(X_train)
```
#### Catch22
22 canonical time series features:
```python
from aeon.transformations.collection.feature_based import Catch22
catch22 = Catch22(n_jobs=-1)
X_features = catch22.fit_transform(X_train)
# Output shape: (n_cases, 22)
```
**Feature categories:**
- Distribution (mean, variance, skewness)
- Autocorrelation properties
- Entropy measures
- Nonlinear dynamics
- Spectral properties
#### TSFresh
Comprehensive feature extraction (779 features):
```python
from aeon.transformations.collection.feature_based import TSFresh
tsfresh = TSFresh(
    default_fc_parameters="comprehensive",
    n_jobs=-1
)
X_features = tsfresh.fit_transform(X_train)
```
**Warning**: Slow on large datasets; use Catch22 as a faster alternative.
#### FreshPRINCE
Fresh Pipelines with Random Interval and Catch22 Features:
```python
from aeon.transformations.collection.feature_based import FreshPRINCE
freshprince = FreshPRINCE(n_intervals=50, n_jobs=-1)
X_features = freshprince.fit_transform(X_train)
```
#### Shapelet Transform
Extract discriminative subsequences:
```python
from aeon.transformations.collection.shapelet_based import ShapeletTransform
shapelet = ShapeletTransform(
    n_shapelet_samples=10000,
    max_shapelets=20,
    n_jobs=-1
)
X_features = shapelet.fit_transform(X_train, y_train)
# Requires labels for supervised shapelet discovery
```
**Random Shapelet Transform**:
```python
from aeon.transformations.collection.shapelet_based import RandomShapeletTransform
rst = RandomShapeletTransform(n_shapelets=1000)
X_features = rst.fit_transform(X_train)
```
#### SAST (Scalable and Accurate Subsequence Transform)
Shapelet-based transform built from subsequences sampled from reference series:
```python
from aeon.transformations.collection.shapelet_based import SAST
sast = SAST(window_size=0.1, n_shapelets=100)
X_features = sast.fit_transform(X_train, y_train)
```
#### Symbolic Representations
**SAX (Symbolic Aggregate approXimation)**:
```python
from aeon.transformations.collection.dictionary_based import SAX
sax = SAX(n_segments=8, alphabet_size=4)
X_symbolic = sax.fit_transform(X_train)
```
**PAA (Piecewise Aggregate Approximation)**:
```python
from aeon.transformations.collection.dictionary_based import PAA
paa = PAA(n_segments=10)
X_approximated = paa.fit_transform(X_train)
```
**SFA (Symbolic Fourier Approximation)**:
```python
from aeon.transformations.collection.dictionary_based import SFA
sfa = SFA(word_length=8, alphabet_size=4)
X_symbolic = sfa.fit_transform(X_train)
```
#### Channel Selection and Operations
**Channel Selection**:
```python
from aeon.transformations.collection.channel_selection import ChannelSelection
selector = ChannelSelection(channels=[0, 2, 5])
X_selected = selector.fit_transform(X_train)
```
**Channel Scoring**:
```python
from aeon.transformations.collection.channel_selection import ChannelScorer
scorer = ChannelScorer()
scores = scorer.fit_transform(X_train, y_train)
```
#### Data Balancing
**SMOTE (Synthetic Minority Over-sampling)**:
```python
from aeon.transformations.collection.smote import SMOTE
smote = SMOTE(k_neighbors=5)
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)
```
**ADASYN**:
```python
from aeon.transformations.collection.smote import ADASYN
adasyn = ADASYN(n_neighbors=5)
X_resampled, y_resampled = adasyn.fit_resample(X_train, y_train)
```
### Series-Level Transformations
#### Smoothing Filters
**Moving Average**:
```python
from aeon.transformations.series.moving_average import MovingAverage
ma = MovingAverage(window_size=5)
X_smoothed = ma.fit_transform(X_series)
```
**Exponential Smoothing**:
```python
from aeon.transformations.series.exponent import ExponentTransformer
exp_smooth = ExponentTransformer(power=0.5)
X_smoothed = exp_smooth.fit_transform(X_series)
```
**Savitzky-Golay Filter**:
```python
from aeon.transformations.series.savgol import SavitzkyGolay
savgol = SavitzkyGolay(window_length=11, polyorder=3)
X_smoothed = savgol.fit_transform(X_series)
```
**Gaussian Filter**:
```python
from aeon.transformations.series.gaussian import GaussianFilter
gaussian = GaussianFilter(sigma=2.0)
X_smoothed = gaussian.fit_transform(X_series)
```
#### Statistical Transforms
**Box-Cox Transformation**:
```python
from aeon.transformations.series.boxcox import BoxCoxTransformer
boxcox = BoxCoxTransformer()
X_transformed = boxcox.fit_transform(X_series)
```
**AutoCorrelation**:
```python
from aeon.transformations.series.acf import AutoCorrelationTransformer
acf = AutoCorrelationTransformer(n_lags=40)
X_acf = acf.fit_transform(X_series)
```
**PCA (Principal Component Analysis)**:
```python
from aeon.transformations.series.pca import PCATransformer
pca = PCATransformer(n_components=3)
X_reduced = pca.fit_transform(X_series)
```
#### Approximation Methods
**Discrete Fourier Transform (DFT)**:
```python
from aeon.transformations.series.fourier import FourierTransform
dft = FourierTransform()
X_freq = dft.fit_transform(X_series)
```
**Piecewise Linear Approximation (PLA)**:
```python
from aeon.transformations.series.pla import PLA
pla = PLA(n_segments=10)
X_approx = pla.fit_transform(X_series)
```
#### Anomaly Detection Transform
**DOBIN (Distance-based Outlier BasIs using Neighbors)**:
```python
from aeon.transformations.series.dobin import DOBIN
dobin = DOBIN()
X_transformed = dobin.fit_transform(X_series)
```
### Transformation Pipelines
Chain transformers together:
```python
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from aeon.transformations.collection import Catch22
pipeline = Pipeline([
    ('features', Catch22()),          # tabular features per series
    ('reduce', PCA(n_components=10))  # sklearn PCA on the feature table
])
X_transformed = pipeline.fit_transform(X_train)
```
## Distance Metrics
Specialized distance functions for time series similarity measurement.
### Distance Categories
#### Warping-Based Distances
**DTW (Dynamic Time Warping)**:
```python
from aeon.distances import dtw_distance, dtw_pairwise_distance
# Compute distance between two series
dist = dtw_distance(series1, series2, window=0.2)
# Pairwise distances for a collection
dist_matrix = dtw_pairwise_distance(X_collection)
# Get alignment path
from aeon.distances import dtw_alignment_path
path = dtw_alignment_path(series1, series2)
# Get cost matrix
from aeon.distances import dtw_cost_matrix
cost = dtw_cost_matrix(series1, series2)
```
**DTW Variants**:
```python
from aeon.distances import (
    wdtw_distance,      # Weighted DTW
    ddtw_distance,      # Derivative DTW
    wddtw_distance,     # Weighted Derivative DTW
    adtw_distance,      # Amerced DTW
    shape_dtw_distance  # Shape DTW
)
# Weighted DTW (penalize warping)
dist = wdtw_distance(series1, series2, g=0.05)
# Derivative DTW (compare shapes)
dist = ddtw_distance(series1, series2)
# Shape DTW (with shape descriptors)
dist = shape_dtw_distance(series1, series2)
```
**DTW Parameters**:
- `window`: Sakoe-Chiba band constraint (0.0-1.0)
- `g`: Penalty weight for warping distances
#### Edit Distances
**ERP (Edit distance with Real Penalty)**:
```python
from aeon.distances import erp_distance
dist = erp_distance(series1, series2, g=0.0, window=None)
```
**EDR (Edit Distance on Real sequences)**:
```python
from aeon.distances import edr_distance
dist = edr_distance(series1, series2, epsilon=0.1, window=None)
```
**LCSS (Longest Common SubSequence)**:
```python
from aeon.distances import lcss_distance
dist = lcss_distance(series1, series2, epsilon=1.0, window=None)
```
**TWE (Time Warp Edit)**:
```python
from aeon.distances import twe_distance
dist = twe_distance(series1, series2, penalty=0.1, stiffness=0.001)
```
#### Standard Metrics
```python
from aeon.distances import (
    euclidean_distance,
    manhattan_distance,
    minkowski_distance,
    squared_distance
)
# Euclidean distance
dist = euclidean_distance(series1, series2)
# Manhattan (L1) distance
dist = manhattan_distance(series1, series2)
# Minkowski distance
dist = minkowski_distance(series1, series2, p=3)
# Squared Euclidean
dist = squared_distance(series1, series2)
```
#### Specialized Distances
**MSM (Move-Split-Merge)**:
```python
from aeon.distances import msm_distance
dist = msm_distance(series1, series2, c=1.0)
```
**SBD (Shape-Based Distance)**:
```python
from aeon.distances import sbd_distance
dist = sbd_distance(series1, series2)
```
### Unified Distance Interface
```python
from aeon.distances import distance, pairwise_distance
# Compute any distance by name
dist = distance(series1, series2, metric="dtw", window=0.1)
# Pairwise distance matrix
dist_matrix = pairwise_distance(X_collection, metric="euclidean")
# Get available distance names
from aeon.distances import get_distance_function_names
available_distances = get_distance_function_names()
```
### Distance Selection Guide
**Fast and accurate**:
- Euclidean for aligned series
- Squared for even faster computation
**Handle temporal shifts**:
- DTW for general warping
- WDTW to penalize excessive warping
**Shape-based similarity**:
- DDTW or Shape DTW
- SBD for normalized shape comparison
**Robust to noise**:
- ERP, EDR, or LCSS
**Multivariate**:
- DTW supports multivariate via independent/dependent alignment
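For multivariate input, the distance functions accept `(n_channels, n_timepoints)` arrays directly; a minimal sketch (random data, arbitrary `window`):
```python
import numpy as np
from aeon.distances import dtw_distance
# Two 3-channel series of length 100
a = np.random.randn(3, 100)
b = np.random.randn(3, 100)
dist = dtw_distance(a, b, window=0.2)
```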
## Deep Learning Networks
Neural network architectures specialized for time series.
### Network Architectures
#### InceptionTime
Ensemble of Inception modules capturing multi-scale patterns:
```python
from aeon.networks import InceptionNetwork
from aeon.classification.deep_learning import InceptionTimeClassifier
# Use via classifier
clf = InceptionTimeClassifier(
    n_epochs=200,
    batch_size=64,
    n_ensemble=5
)
# Or use network directly
network = InceptionNetwork(
    n_classes=3,
    n_channels=1,
    n_timepoints=100
)
```
#### ResNet
Residual networks with skip connections:
```python
from aeon.networks import ResNetNetwork
from aeon.classification.deep_learning import ResNetClassifier
clf = ResNetClassifier(
    n_epochs=200,
    batch_size=64,
    n_res_blocks=3
)
```
#### FCN (Fully Convolutional Network)
```python
from aeon.networks import FCNNetwork
from aeon.classification.deep_learning import FCNClassifier
clf = FCNClassifier(
    n_epochs=200,
    batch_size=64,
    n_conv_layers=3
)
```
#### CNN
Standard convolutional architecture:
```python
from aeon.classification.deep_learning import CNNClassifier
clf = CNNClassifier(
    n_epochs=100,
    batch_size=32,
    kernel_size=7,
    n_filters=32
)
```
#### TapNet
Attentional prototype networks:
```python
from aeon.classification.deep_learning import TapNetClassifier
clf = TapNetClassifier(
    n_epochs=200,
    batch_size=64
)
```
#### MLP (Multi-Layer Perceptron)
```python
from aeon.classification.deep_learning import MLPClassifier
clf = MLPClassifier(
    n_epochs=100,
    batch_size=32,
    hidden_layer_sizes=[500]
)
```
#### LITE (Light Inception with boosTing tEchnique)
Lightweight ensemble network:
```python
from aeon.classification.deep_learning import LITEClassifier
clf = LITEClassifier(
    n_epochs=100,
    batch_size=64
)
```
### Training Configuration
```python
from aeon.classification.deep_learning import InceptionTimeClassifier
clf = InceptionTimeClassifier(
    n_epochs=200,
    batch_size=64,
    learning_rate=0.001,
    use_bias=True,
    verbose=1
)
clf.fit(X_train, y_train)
```
**Common parameters:**
- `n_epochs`: Training iterations
- `batch_size`: Samples per gradient update
- `learning_rate`: Optimizer learning rate
- `verbose`: Training output verbosity
- `callbacks`: Keras callbacks (early stopping, etc.)
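Since `callbacks` is listed among the common parameters, early stopping can be wired in with a standard Keras callback (a sketch; the parameter values are illustrative):
```python
from tensorflow.keras.callbacks import EarlyStopping
from aeon.classification.deep_learning import FCNClassifier
# Stop when the training loss plateaus and restore the best weights
early_stop = EarlyStopping(monitor="loss", patience=20, restore_best_weights=True)
clf = FCNClassifier(n_epochs=200, batch_size=64, callbacks=[early_stop])
clf.fit(X_train, y_train)
```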
## Datasets
Load built-in datasets and access UCR/UEA archives.
### Built-in Datasets
```python
from aeon.datasets import (
    load_arrow_head,
    load_airline,
    load_gunpoint,
    load_italy_power_demand,
    load_basic_motions,
    load_japanese_vowels
)
# Classification dataset
X_train, y_train = load_arrow_head(split="train")
X_test, y_test = load_arrow_head(split="test")
# Forecasting dataset (univariate series)
y = load_airline()
# Multivariate classification
X_train, y_train = load_basic_motions(split="train")
print(X_train.shape) # (n_cases, n_channels, n_timepoints)
```
### UCR/UEA Archives
Access 100+ benchmark datasets:
```python
from aeon.datasets import load_from_tsfile, load_classification
# Load UCR/UEA dataset by name
X_train, y_train = load_classification("GunPoint", split="train")
X_test, y_test = load_classification("GunPoint", split="test")
# Load from local .ts file
X, y = load_from_tsfile("data/my_dataset_TRAIN.ts")
```
### Dataset Information
```python
from aeon.datasets import get_dataset_meta_data
# Get metadata about a dataset
info = get_dataset_meta_data("GunPoint")
print(info)
# {'n_cases': 150, 'n_timepoints': 150, 'n_classes': 2, ...}
```
### Custom Dataset Format
Save/load custom datasets in aeon format:
```python
from aeon.datasets import write_to_tsfile, load_from_tsfile
# Save
write_to_tsfile(
    X_train,
    "my_dataset_TRAIN.ts",
    y=y_train,
    problem_name="MyDataset"
)
# Load
X, y = load_from_tsfile("my_dataset_TRAIN.ts")
```
## Benchmarking
Tools for reproducible evaluation and comparison.
### Benchmarking Utilities
```python
from aeon.benchmarking import benchmark_estimator
from aeon.classification.convolution_based import RocketClassifier
# Benchmark a classifier on multiple datasets
results = benchmark_estimator(
    estimator=RocketClassifier(),
    datasets=["GunPoint", "ArrowHead", "ItalyPowerDemand"],
    n_resamples=10
)
```
### Result Storage and Comparison
```python
from aeon.benchmarking import (
    write_results_to_csv,
    read_results_from_csv,
    compare_results
)
# Save results
write_results_to_csv(results, "results.csv")
# Load and compare
results_rocket = read_results_from_csv("results_rocket.csv")
results_inception = read_results_from_csv("results_inception.csv")
comparison = compare_results(
    [results_rocket, results_inception],
    estimator_names=["ROCKET", "InceptionTime"]
)
```
### Critical Difference Diagrams
Visualize statistical significance of differences:
```python
from aeon.benchmarking.results_plotting import plot_critical_difference_diagram
plot_critical_difference_diagram(
    results_dict={
        'ROCKET': results_rocket,
        'InceptionTime': results_inception,
        'BOSS': results_boss
    },
    dataset_names=["GunPoint", "ArrowHead", "ItalyPowerDemand"]
)
```
## Discovery and Tags
### Finding Estimators
```python
from aeon.utils.discovery import all_estimators
# Get all classifiers
classifiers = all_estimators(type_filter="classifier")
# Get all transformers
transformers = all_estimators(type_filter="transformer")
# Filter by capability tags
multivariate_classifiers = all_estimators(
    type_filter="classifier",
    filter_tags={"capability:multivariate": True}
)
```
### Checking Estimator Tags
```python
from aeon.utils.tags import all_tags_for_estimator
from aeon.classification.convolution_based import RocketClassifier
tags = all_tags_for_estimator(RocketClassifier)
print(tags)
# {'capability:multivariate': True, 'X_inner_type': ['numpy3D'], ...}
```
### Common Tags
- `capability:multivariate`: Handles multivariate series
- `capability:unequal_length`: Handles variable-length series
- `capability:missing_values`: Handles missing data
- `algorithm_type`: Algorithm family (e.g., "convolution", "distance")
- `python_dependencies`: Required packages


@@ -0,0 +1,442 @@
# Learning Tasks: Classification, Regression, Clustering, and Similarity Search
This reference provides comprehensive details on supervised and unsupervised learning tasks for time series collections.
## Time Series Classification
Time series classification (TSC) assigns labels to entire sequences. Aeon provides diverse algorithm families with unique strengths.
### Algorithm Categories
#### 1. Convolution-Based Classifiers
Transform time series using random convolutional kernels:
**ROCKET (RAndom Convolutional KErnel Transform)**
- Ultra-fast feature extraction via random kernels
- 10,000+ kernels generate discriminative features
- Linear classifier on extracted features
```python
from aeon.classification.convolution_based import RocketClassifier
clf = RocketClassifier(num_kernels=10000, n_jobs=-1)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
probabilities = clf.predict_proba(X_test)
```
**Variants:**
- `MiniRocketClassifier`: Faster, streamlined version
- `MultiRocketClassifier`: Multivariate extensions
- `Arsenal`: Ensemble of ROCKET transformers
- `Hydra`: Dictionary-based convolution variant
#### 2. Deep Learning Classifiers
Neural networks specialized for time series:
**InceptionTime**
- Ensemble of Inception modules
- Captures patterns at multiple scales
- State-of-the-art on UCR benchmarks
```python
from aeon.classification.deep_learning import InceptionTimeClassifier
clf = InceptionTimeClassifier(n_epochs=200, batch_size=64)
clf.fit(X_train, y_train)
```
**Other architectures:**
- `ResNetClassifier`: Residual connections
- `FCNClassifier`: Fully Convolutional Networks
- `CNNClassifier`: Standard convolutional architecture
- `LITEClassifier`: Lightweight networks
- `MLPClassifier`: Multi-layer perceptrons
- `TapNetClassifier`: Attentional prototype networks
#### 3. Dictionary-Based Classifiers
Symbolic representations and bag-of-words approaches:
**BOSS (Bag of SFA Symbols)**
- Converts series to symbolic words
- Histogram-based classification
- Effective for shape patterns
```python
from aeon.classification.dictionary_based import BOSSEnsemble
clf = BOSSEnsemble(max_ensemble_size=500)
clf.fit(X_train, y_train)
```
**Other dictionary methods:**
- `TemporalDictionaryEnsemble (TDE)`: Enhanced BOSS with temporal info
- `WEASEL`: Word ExtrAction for time SEries cLassification
- `MUSE`: Multivariate Unsupervised Symbols and dErivatives
- `MrSEQL`: Multiple Representations SEQuence Learner
#### 4. Distance-Based Classifiers
Leverage time series-specific distance metrics:
**K-Nearest Neighbors with DTW**
- Dynamic Time Warping handles temporal shifts
- Effective for shape-based similarity
```python
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
clf = KNeighborsTimeSeriesClassifier(
    distance="dtw",
    n_neighbors=5
)
clf.fit(X_train, y_train)
```
**Other distance methods:**
- `ElasticEnsemble`: Ensemble of elastic distances
- `ProximityForest`: Tree-based with elastic measures
- `ProximityTree`: Single tree variant
- `ShapeDTW`: DTW with shape descriptors
#### 5. Feature-Based Classifiers
Extract statistical and domain-specific features:
**Catch22**
- 22 time series features
- Canonical Time-series CHaracteristics
- Fast and interpretable
```python
from aeon.classification.feature_based import Catch22Classifier
from sklearn.ensemble import RandomForestClassifier
clf = Catch22Classifier(estimator=RandomForestClassifier())
clf.fit(X_train, y_train)
```
**Other feature methods:**
- `FreshPRINCEClassifier`: Fresh Pipelines with Random Interval and Catch22 Features
- `SignatureClassifier`: Path signature features
- `TSFreshClassifier`: Comprehensive feature extraction (slower, more features)
- `SummaryClassifier`: Simple summary statistics
#### 6. Interval-Based Classifiers
Analyze discriminative time intervals:
**Time Series Forest (TSF)**
- Random intervals + summary statistics
- Random forest on extracted features
```python
from aeon.classification.interval_based import TimeSeriesForestClassifier
clf = TimeSeriesForestClassifier(n_estimators=500)
clf.fit(X_train, y_train)
```
**Other interval methods:**
- `CanonicalIntervalForest (CIF)`: Canonical Interval Forest
- `DrCIF`: Diverse Representation CIF
- `RISE`: Random Interval Spectral Ensemble
- `RandomIntervalClassifier`: Basic random interval approach
- `STSF`: Supervised Time Series Forest
#### 7. Shapelet-Based Classifiers
Discover discriminative subsequences:
**Shapelets**: Small subsequences that best distinguish classes
```python
from aeon.classification.shapelet_based import ShapeletTransformClassifier
clf = ShapeletTransformClassifier(
    n_shapelet_samples=10000,
    max_shapelets=20
)
clf.fit(X_train, y_train)
```
**Other shapelet methods:**
- `LearningShapeletClassifier`: Gradient-based learning
- `SASTClassifier`: Scalable and Accurate Subsequence Transform
#### 8. Hybrid Ensembles
Combine multiple algorithm families:
**HIVE-COTE (Hierarchical Vote Collective of Transformation-based Ensembles)**
- State-of-the-art accuracy
- Combines shapelets, intervals, dictionaries, and spectral features
- V2 uses ROCKET and improved components
```python
from aeon.classification.hybrid import HIVECOTEV2
clf = HIVECOTEV2(n_jobs=-1) # Slow but highly accurate
clf.fit(X_train, y_train)
```
### Algorithm Selection Guide
**Fast and accurate (default choice):**
- `RocketClassifier` or `MiniRocketClassifier`
**Maximum accuracy (slow):**
- `HIVECOTEV2` or `InceptionTimeClassifier`
**Interpretable:**
- `Catch22Classifier` or `ShapeletTransformClassifier`
**Multivariate focus:**
- `MultiRocketClassifier` or `MUSE`
**Small datasets:**
- `KNeighborsTimeSeriesClassifier` with DTW
### Classification Workflow
```python
from aeon.classification.convolution_based import RocketClassifier
from aeon.datasets import load_arrow_head
from sklearn.metrics import accuracy_score, classification_report
# Load data
X_train, y_train = load_arrow_head(split="train")
X_test, y_test = load_arrow_head(split="test")
# Train classifier
clf = RocketClassifier(n_jobs=-1)
clf.fit(X_train, y_train)
# Evaluate
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
print(classification_report(y_test, y_pred))
```
## Time Series Regression
Time series regression predicts continuous values from sequences. Most classification algorithms have regression equivalents.
### Regression Algorithms
Available regressors mirror classification structure:
- `RocketRegressor`, `MiniRocketRegressor`, `MultiRocketRegressor`
- `InceptionTimeRegressor`, `ResNetRegressor`, `FCNRegressor`
- `KNeighborsTimeSeriesRegressor`
- `Catch22Regressor`, `FreshPRINCERegressor`
- `TimeSeriesForestRegressor`, `DrCIFRegressor`
### Regression Workflow
```python
from aeon.regression.convolution_based import RocketRegressor
from sklearn.metrics import mean_squared_error, r2_score
# Train regressor
reg = RocketRegressor(num_kernels=10000)
reg.fit(X_train, y_train_continuous)
# Predict and evaluate
y_pred = reg.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.3f}, R²: {r2:.3f}")
```
## Time Series Clustering
Clustering groups similar time series without labels.
### Clustering Algorithms
**TimeSeriesKMeans**
- K-means with time series distances
- Supports DTW, Euclidean, and other metrics
```python
from aeon.clustering import TimeSeriesKMeans
clusterer = TimeSeriesKMeans(
    n_clusters=3,
    distance="dtw",
    n_init=10
)
clusterer.fit(X_collection)
labels = clusterer.labels_
```
**TimeSeriesKMedoids**
- Uses actual series as cluster centers
- More robust to outliers
```python
from aeon.clustering import TimeSeriesKMedoids
clusterer = TimeSeriesKMedoids(
    n_clusters=3,
    distance="euclidean"
)
clusterer.fit(X_collection)
```
**Other clustering methods:**
- `TimeSeriesKernelKMeans`: Kernel-based clustering
- `ElasticSOM`: Self-organizing maps with elastic distances
### Clustering Workflow
```python
from aeon.clustering import TimeSeriesKMeans
from aeon.distances import dtw_distance
import numpy as np
# Cluster time series
clusterer = TimeSeriesKMeans(n_clusters=4, distance="dtw")
clusterer.fit(X_train)
# Get cluster labels
labels = clusterer.predict(X_test)
# Compute cluster centers
centers = clusterer.cluster_centers_
# Evaluate clustering quality (if ground truth available)
from sklearn.metrics import adjusted_rand_score
ari = adjusted_rand_score(y_true, labels)
```
## Similarity Search
Similarity search finds motifs, nearest neighbors, and repeated patterns.
### Key Concepts
**Motifs**: Frequently repeated subsequences within a time series
**Matrix Profile**: Data structure encoding nearest neighbor distances for all subsequences
### Similarity Search Methods
**Matrix Profile**
- Efficient motif discovery
- Change point detection
- Anomaly detection
```python
from aeon.similarity_search import MatrixProfile
import numpy as np
mp = MatrixProfile(window_size=50)
profile = mp.fit_transform(X_series)
# Find top motif
motif_idx = np.argmin(profile)
```
**Query Search**
- Find nearest neighbors to a query subsequence
- Useful for template matching
```python
from aeon.similarity_search import QuerySearch
searcher = QuerySearch(distance="euclidean")
distances, indices = searcher.search(X_series, query_subsequence)
```
### Similarity Search Workflow
```python
from aeon.similarity_search import MatrixProfile
import numpy as np
# Compute matrix profile
mp = MatrixProfile(window_size=100)
profile, profile_index = mp.fit_transform(X_series)
# Find top-k motifs (lowest profile values)
k = 3
motif_indices = np.argsort(profile)[:k]
# Find anomalies (highest profile values)
anomaly_indices = np.argsort(profile)[-k:]
```
## Ensemble and Composition Tools
### Voting Ensembles
```python
from aeon.classification.ensemble import WeightedEnsembleClassifier
from aeon.classification.convolution_based import RocketClassifier
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
ensemble = WeightedEnsembleClassifier(
    estimators=[
        ('rocket', RocketClassifier()),
        ('knn', KNeighborsTimeSeriesClassifier())
    ]
)
ensemble.fit(X_train, y_train)
```
### Pipelines
```python
from sklearn.pipeline import Pipeline
from aeon.transformations.collection import Catch22
from sklearn.ensemble import RandomForestClassifier
pipeline = Pipeline([
    ('features', Catch22()),
    ('classifier', RandomForestClassifier())
])
pipeline.fit(X_train, y_train)
```
## Model Selection and Validation
### Cross-Validation
```python
from sklearn.model_selection import cross_val_score
from aeon.classification.convolution_based import RocketClassifier
clf = RocketClassifier()
scores = cross_val_score(clf, X_train, y_train, cv=5)
print(f"CV Accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```
### Grid Search
```python
from sklearn.model_selection import GridSearchCV
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
param_grid = {
    'n_neighbors': [1, 3, 5, 7],
    'distance': ['dtw', 'euclidean', 'erp']
}
clf = KNeighborsTimeSeriesClassifier()
grid_search = GridSearchCV(clf, param_grid, cv=5)
grid_search.fit(X_train, y_train)
print(f"Best params: {grid_search.best_params_}")
```
## Discovery Functions
Find available estimators programmatically:
```python
from aeon.utils.discovery import all_estimators
# Get all classifiers
classifiers = all_estimators(type_filter="classifier")
# Get all regressors
regressors = all_estimators(type_filter="regressor")
# Get all clusterers
clusterers = all_estimators(type_filter="clusterer")
# Filter by tag (e.g., multivariate capable)
mv_classifiers = all_estimators(
    type_filter="classifier",
    filter_tags={"capability:multivariate": True}
)
```


@@ -0,0 +1,596 @@
# Temporal Analysis: Forecasting, Anomaly Detection, and Segmentation
This reference provides comprehensive details on forecasting future values, detecting anomalies, and segmenting time series.
## Forecasting
Forecasting predicts future values in a time series based on historical patterns.
### Forecasting Concepts
**Forecasting horizon (fh)**: Number of steps ahead to predict
- Relative (steps ahead): `fh=[1, 2, 3]` predicts the next three steps; equivalently `fh=ForecastingHorizon([1, 2, 3], is_relative=True)`
- Absolute: pass actual time index values with `is_relative=False`
**Exogenous variables**: External features that influence predictions
### Statistical Forecasters
#### ARIMA (AutoRegressive Integrated Moving Average)
Classical time series model combining AR, differencing, and MA components:
```python
from aeon.forecasting.arima import ARIMA
forecaster = ARIMA(
    order=(1, 1, 1),               # (p, d, q)
    seasonal_order=(1, 1, 1, 12),  # (P, D, Q, s)
    suppress_warnings=True
)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3, 4, 5])
```
**Parameters:**
- `p`: AR order (lags)
- `d`: Differencing order
- `q`: MA order (moving average)
- `P, D, Q, s`: Seasonal components
#### ETS (Exponential Smoothing)
State space model for trend and seasonality:
```python
from aeon.forecasting.ets import ETS
forecaster = ETS(
    error="add",
    trend="add",
    seasonal="add",
    sp=12  # seasonal period
)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3])
```
**Model types:**
- Error: "add" (additive) or "mul" (multiplicative)
- Trend: "add", "mul", or None
- Seasonal: "add", "mul", or None
#### Theta Forecaster
Simple, effective method using exponential smoothing:
```python
from aeon.forecasting.theta import ThetaForecaster
import numpy as np
forecaster = ThetaForecaster(deseasonalize=True, sp=12)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=np.arange(1, 13))
```
#### TAR (Threshold AutoRegressive)
Non-linear autoregressive model with regime switching:
```python
from aeon.forecasting.tar import TAR
forecaster = TAR(
    delay=1,
    threshold=0.0,
    order_below=2,
    order_above=2
)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3])
```
**AutoTAR**: Automatically optimizes threshold:
```python
from aeon.forecasting.tar import AutoTAR
forecaster = AutoTAR(max_order=5)
forecaster.fit(y_train)
```
#### TVP (Time-Varying Parameter)
Kalman filter-based forecaster with dynamic coefficients:
```python
from aeon.forecasting.tvp import TVP
forecaster = TVP(
    order=2,
    use_exog=False
)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3])
```
### Naive Baselines
Simple forecasting strategies for benchmarking:
```python
from aeon.forecasting.naive import NaiveForecaster
# Last value
forecaster = NaiveForecaster(strategy="last")
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3])
# Seasonal naive (use value from same season last year)
forecaster = NaiveForecaster(strategy="seasonal_last", sp=12)
forecaster.fit(y_train)
# Mean
forecaster = NaiveForecaster(strategy="mean")
forecaster.fit(y_train)
# Drift (linear trend from first to last)
forecaster = NaiveForecaster(strategy="drift")
forecaster.fit(y_train)
```
**Strategies:**
- `"last"`: Repeat last observed value
- `"mean"`: Use mean of training data
- `"seasonal_last"`: Repeat value from previous season
- `"drift"`: Linear extrapolation
### Deep Learning Forecasters
#### TCN (Temporal Convolutional Network)
Deep learning with dilated causal convolutions:
```python
from aeon.forecasting.deep_learning import TCNForecaster
forecaster = TCNForecaster(
    n_epochs=100,
    batch_size=32,
    kernel_size=3,
    n_filters=64,
    dilation_rate=2
)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3, 4, 5])
```
### Regression-Based Forecasting
Transform forecasting into a supervised learning problem:
```python
from aeon.forecasting.compose import RegressionForecaster
from sklearn.ensemble import RandomForestRegressor
forecaster = RegressionForecaster(
    regressor=RandomForestRegressor(n_estimators=100),
    window_length=10
)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3])
```
### Forecasting Workflow
```python
from aeon.forecasting.arima import ARIMA
from aeon.datasets import load_airline
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np
# Load data
y = load_airline()
# Split train/test
split_point = int(len(y) * 0.8)
y_train, y_test = y[:split_point], y[split_point:]
# Fit forecaster
forecaster = ARIMA(order=(2, 1, 2), suppress_warnings=True)
forecaster.fit(y_train)
# Predict
fh = np.arange(1, len(y_test) + 1)
y_pred = forecaster.predict(fh=fh)
# Evaluate
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"MAE: {mae:.3f}, RMSE: {rmse:.3f}")
```
### Forecasting with Exogenous Variables
```python
from aeon.forecasting.arima import ARIMA
# X contains exogenous features
forecaster = ARIMA(order=(1, 1, 1))
forecaster.fit(y_train, X=X_train)
# Must provide future exogenous values
y_pred = forecaster.predict(fh=[1, 2, 3], X=X_future)
```
### Multi-Step Forecasting Strategies
**Direct**: Train a separate model for each horizon
**Recursive**: Use predictions as inputs for the next step (sketched from scratch below)
**DirRec**: Combine both strategies
```python
from aeon.forecasting.compose import DirectReductionForecaster
from sklearn.linear_model import Ridge
forecaster = DirectReductionForecaster(
    regressor=Ridge(),
    window_length=10
)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3, 4, 5])
```
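For contrast, the recursive strategy is simple to sketch from scratch with a plain scikit-learn regressor, feeding each prediction back in as the newest input (an illustration of the idea, not an aeon API):
```python
import numpy as np
from sklearn.linear_model import Ridge
def recursive_forecast(y, window_length, horizon):
    # Sliding-window training set: predict y[t] from the preceding window
    X = np.array([y[i:i + window_length] for i in range(len(y) - window_length)])
    model = Ridge().fit(X, y[window_length:])
    history = list(y[-window_length:])
    preds = []
    for _ in range(horizon):
        y_hat = model.predict(np.asarray(history[-window_length:]).reshape(1, -1))[0]
        preds.append(y_hat)   # recursive step: the prediction becomes an input
        history.append(y_hat)
    return np.array(preds)
```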
## Anomaly Detection
Anomaly detection identifies unusual patterns or outliers in time series data.
### Anomaly Detection Types
**Point anomalies**: Single unusual values
**Contextual anomalies**: Values anomalous given context
**Collective anomalies**: Sequences of unusual behavior
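A synthetic illustration of the three types (values chosen purely for demonstration):
```python
import numpy as np
rng = np.random.default_rng(0)
t = np.arange(500)
y = np.sin(2 * np.pi * t / 50) + 0.1 * rng.standard_normal(500)
y[100] += 4.0        # point anomaly: a single extreme value
y[200:215] = y[199]  # collective anomaly: an unusual flat run
y[300] = 1.0         # contextual anomaly: in range globally, wrong for its phase
```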
### Distance-Based Anomaly Detectors
#### STOMP (Scalable Time series Ordered-search Matrix Profile)
Matrix profile-based anomaly detection:
```python
from aeon.anomaly_detection import STOMP
import numpy as np
detector = STOMP(window_size=50)
anomaly_scores = detector.fit_predict(X_series)
# High scores indicate anomalies
threshold = np.percentile(anomaly_scores, 95)
anomalies = anomaly_scores > threshold
```
#### LeftSTAMPi
Incremental matrix profile for streaming data:
```python
from aeon.anomaly_detection import LeftSTAMPi
detector = LeftSTAMPi(window_size=50)
anomaly_scores = detector.fit_predict(X_series)
```
#### MERLIN
Matrix profile with range constraints:
```python
from aeon.anomaly_detection import MERLIN
detector = MERLIN(window_size=50, k=3)
anomaly_scores = detector.fit_predict(X_series)
```
#### KMeansAD
K-means clustering-based anomaly detection:
```python
from aeon.anomaly_detection import KMeansAD
detector = KMeansAD(n_clusters=5, window_size=50)
anomaly_scores = detector.fit_predict(X_series)
```
#### CBLOF (Cluster-Based Local Outlier Factor)
```python
from aeon.anomaly_detection import CBLOF
detector = CBLOF(n_clusters=8, alpha=0.9)
anomaly_scores = detector.fit_predict(X_series)
```
#### LOF (Local Outlier Factor)
Density-based outlier detection:
```python
from aeon.anomaly_detection import LOF
detector = LOF(n_neighbors=20, window_size=50)
anomaly_scores = detector.fit_predict(X_series)
```
#### ROCKAD
ROCKET-based anomaly detection:
```python
from aeon.anomaly_detection import ROCKAD
detector = ROCKAD(num_kernels=1000, window_size=50)
anomaly_scores = detector.fit_predict(X_series)
```
### Distribution-Based Anomaly Detectors
#### COPOD (Copula-Based Outlier Detection)
```python
from aeon.anomaly_detection import COPOD
detector = COPOD(window_size=50)
anomaly_scores = detector.fit_predict(X_series)
```
#### DWT_MLEAD
Discrete Wavelet Transform with Machine Learning:
```python
from aeon.anomaly_detection import DWT_MLEAD
detector = DWT_MLEAD(window_size=50, wavelet='db4')
anomaly_scores = detector.fit_predict(X_series)
```
### Outlier Detection Methods
#### IsolationForest
Ensemble tree-based isolation:
```python
from aeon.anomaly_detection import IsolationForest
detector = IsolationForest(
    n_estimators=100,
    window_size=50,
    contamination=0.1
)
anomaly_scores = detector.fit_predict(X_series)
```
#### OneClassSVM
Support vector machine for novelty detection:
```python
from aeon.anomaly_detection import OneClassSVM
detector = OneClassSVM(
    kernel='rbf',
    nu=0.1,
    window_size=50
)
anomaly_scores = detector.fit_predict(X_series)
```
#### STRAY (Search TRace AnomalY)
```python
from aeon.anomaly_detection import STRAY
detector = STRAY(alpha=0.05)
anomaly_scores = detector.fit_predict(X_series)
```
### Collection Anomaly Detection
Detect anomalous time series within a collection:
```python
from aeon.anomaly_detection import ClassificationAdapter
from aeon.classification.convolution_based import RocketClassifier
detector = ClassificationAdapter(
    classifier=RocketClassifier()
)
detector.fit(X_normal) # Train on normal data
anomaly_labels = detector.predict(X_test) # 1 = anomaly, 0 = normal
```
### Anomaly Detection Workflow
```python
from aeon.anomaly_detection import STOMP
import numpy as np
import matplotlib.pyplot as plt
# Detect anomalies
detector = STOMP(window_size=100)
anomaly_scores = detector.fit_predict(X_series)
# Identify anomalies (top 5%)
threshold = np.percentile(anomaly_scores, 95)
anomaly_indices = np.where(anomaly_scores > threshold)[0]
# Visualize
plt.figure(figsize=(12, 6))
plt.subplot(2, 1, 1)
plt.plot(X_series[0, 0, :])
plt.scatter(anomaly_indices, X_series[0, 0, anomaly_indices],
            color='red', label='Anomalies', zorder=5)
plt.legend()
plt.title('Time Series with Detected Anomalies')
plt.subplot(2, 1, 2)
plt.plot(anomaly_scores)
plt.axhline(threshold, color='red', linestyle='--', label='Threshold')
plt.legend()
plt.title('Anomaly Scores')
plt.tight_layout()
plt.show()
```
## Segmentation
Segmentation divides time series into distinct regions or identifies change points.
### Segmentation Concepts
**Change points**: Locations where statistical properties change
**Segments**: Homogeneous regions between change points
**Applications**: Regime detection, event identification, structural breaks
### Segmentation Algorithms
#### ClaSP (Classification Score Profile)
Discover change points using classification performance:
```python
from aeon.segmentation import ClaSPSegmenter
segmenter = ClaSPSegmenter(
    n_segments=3,
    period_length=10
)
change_points = segmenter.fit_predict(X_series)
print(f"Change points at indices: {change_points}")
```
**How it works:**
- Slides a window over the series
- Computes classification score for left vs. right segments
- High scores indicate change points
#### FLUSS (Fast Low-cost Unipotent Semantic Segmentation)
Matrix profile-based segmentation:
```python
from aeon.segmentation import FLUSSSegmenter
segmenter = FLUSSSegmenter(
    n_segments=5,
    window_size=50
)
change_points = segmenter.fit_predict(X_series)
```
#### BinSeg (Binary Segmentation)
Recursive splitting for change point detection:
```python
from aeon.segmentation import BinSegSegmenter
segmenter = BinSegSegmenter(
    n_segments=4,
    model="l2"  # cost function
)
change_points = segmenter.fit_predict(X_series)
```
**Models:**
- `"l2"`: Least squares (continuous data)
- `"l1"`: Absolute deviation (robust to outliers)
- `"rbf"`: Radial basis function
- `"ar"`: Autoregressive model
#### HMM (Hidden Markov Model) Segmentation
Probabilistic state-based segmentation:
```python
from aeon.segmentation import HMMSegmenter
segmenter = HMMSegmenter(
    n_states=3,
    covariance_type="full"
)
segmenter.fit(X_series)
states = segmenter.predict(X_series)
```
### Segmentation Workflow
```python
from aeon.segmentation import ClaSPSegmenter
import numpy as np
import matplotlib.pyplot as plt
# Detect change points
segmenter = ClaSPSegmenter(n_segments=4)
change_points = segmenter.fit_predict(X_series)
# Visualize segments
plt.figure(figsize=(12, 4))
plt.plot(X_series[0, 0, :])
for cp in change_points:
    plt.axvline(cp, color='red', linestyle='--', alpha=0.7)
plt.title('Time Series Segmentation')
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()
# Extract segments
segments = []
prev_cp = 0
for cp in np.append(change_points, len(X_series[0, 0, :])):
    segment = X_series[0, 0, prev_cp:cp]
    segments.append(segment)
    prev_cp = cp
```
### Multi-Variate Segmentation
```python
from aeon.segmentation import ClaSPSegmenter
# X_multivariate has shape (1, n_channels, n_timepoints)
segmenter = ClaSPSegmenter(n_segments=3)
change_points = segmenter.fit_predict(X_multivariate)
```
## Combining Forecasting, Anomaly Detection, and Segmentation
### Robust Forecasting with Anomaly Detection
```python
from aeon.forecasting.arima import ARIMA
from aeon.anomaly_detection import IsolationForest
import numpy as np
# Detect and remove anomalies
detector = IsolationForest(window_size=50, contamination=0.1)
anomaly_scores = detector.fit_predict(X_series)
normal_mask = anomaly_scores < np.percentile(anomaly_scores, 90)
# Forecast on cleaned data
y_clean = y_train[normal_mask]
forecaster = ARIMA(order=(2, 1, 2))
forecaster.fit(y_clean)
y_pred = forecaster.predict(fh=[1, 2, 3])
```
### Segmentation-Based Forecasting
```python
from aeon.segmentation import ClaSPSegmenter
from aeon.forecasting.arima import ARIMA
# Segment time series
segmenter = ClaSPSegmenter(n_segments=3)
change_points = segmenter.fit_predict(X_series)
# Forecast using most recent segment
last_segment_start = change_points[-1]
y_recent = y_train[last_segment_start:]
forecaster = ARIMA(order=(1, 1, 1))
forecaster.fit(y_recent)
y_pred = forecaster.predict(fh=[1, 2, 3])
```
## Discovery Functions
Find available forecasters, detectors, and segmenters:
```python
from aeon.utils.discovery import all_estimators
# Get all forecasters
forecasters = all_estimators(type_filter="forecaster")
# Get all anomaly detectors
detectors = all_estimators(type_filter="anomaly-detector")
# Get all segmenters
segmenters = all_estimators(type_filter="segmenter")
```


@@ -0,0 +1,634 @@
# Common Workflows and Integration Patterns
This reference provides end-to-end workflows, best practices, and integration patterns for using aeon effectively.
## Complete Classification Workflow
### Basic Classification Pipeline
```python
# 1. Import required modules
from aeon.classification.convolution_based import RocketClassifier
from aeon.datasets import load_arrow_head
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# 2. Load and inspect data
X_train, y_train = load_arrow_head(split="train")
X_test, y_test = load_arrow_head(split="test")
print(f"Training shape: {X_train.shape}") # (n_cases, n_channels, n_timepoints)
print(f"Unique classes: {np.unique(y_train)}")
# 3. Train classifier
clf = RocketClassifier(num_kernels=10000, n_jobs=-1)
clf.fit(X_train, y_train)
# 4. Make predictions
y_pred = clf.predict(X_test)
y_proba = clf.predict_proba(X_test)
# 5. Evaluate performance
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
# 6. Visualize confusion matrix
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()
```
### Feature Extraction + Classifier Pipeline
```python
from sklearn.pipeline import Pipeline
from aeon.transformations.collection import Catch22
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
# Create pipeline
pipeline = Pipeline([
    ('features', Catch22(n_jobs=-1)),
    ('classifier', RandomForestClassifier(n_estimators=500, n_jobs=-1))
])
# Cross-validation
scores = cross_val_score(pipeline, X_train, y_train, cv=5, scoring='accuracy')
print(f"CV Accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
# Train on full training set
pipeline.fit(X_train, y_train)
# Evaluate on test set
accuracy = pipeline.score(X_test, y_test)
print(f"Test Accuracy: {accuracy:.3f}")
```
### Multi-Algorithm Comparison
```python
from aeon.classification.convolution_based import RocketClassifier, MiniRocketClassifier
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
from aeon.classification.feature_based import Catch22Classifier
from aeon.classification.interval_based import TimeSeriesForestClassifier
import time
classifiers = {
    'ROCKET': RocketClassifier(num_kernels=10000),
    'MiniRocket': MiniRocketClassifier(),
    'KNN-DTW': KNeighborsTimeSeriesClassifier(distance='dtw', n_neighbors=5),
    'Catch22': Catch22Classifier(),
    'TSF': TimeSeriesForestClassifier(n_estimators=200)
}
results = {}
for name, clf in classifiers.items():
    start_time = time.time()
    clf.fit(X_train, y_train)
    train_time = time.time() - start_time
    start_time = time.time()
    accuracy = clf.score(X_test, y_test)
    test_time = time.time() - start_time
    results[name] = {
        'accuracy': accuracy,
        'train_time': train_time,
        'test_time': test_time
    }
# Display results
import pandas as pd
df_results = pd.DataFrame(results).T
df_results = df_results.sort_values('accuracy', ascending=False)
print(df_results)
```
## Complete Forecasting Workflow
### Univariate Forecasting
```python
from aeon.forecasting.arima import ARIMA
from aeon.forecasting.naive import NaiveForecaster
from aeon.datasets import load_airline
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np
import matplotlib.pyplot as plt
# 1. Load data
y = load_airline()
# 2. Train/test split (temporal)
split_point = int(len(y) * 0.8)
y_train, y_test = y[:split_point], y[split_point:]
# 3. Create baseline (naive forecaster)
baseline = NaiveForecaster(strategy="last")
baseline.fit(y_train)
y_pred_baseline = baseline.predict(fh=np.arange(1, len(y_test) + 1))
# 4. Train ARIMA model
forecaster = ARIMA(order=(2, 1, 2), seasonal_order=(1, 1, 1, 12))
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=np.arange(1, len(y_test) + 1))
# 5. Evaluate
mae_baseline = mean_absolute_error(y_test, y_pred_baseline)
mae_arima = mean_absolute_error(y_test, y_pred)
rmse_baseline = np.sqrt(mean_squared_error(y_test, y_pred_baseline))
rmse_arima = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"Baseline - MAE: {mae_baseline:.2f}, RMSE: {rmse_baseline:.2f}")
print(f"ARIMA - MAE: {mae_arima:.2f}, RMSE: {rmse_arima:.2f}")
# 6. Visualize
plt.figure(figsize=(12, 6))
plt.plot(y_train.index, y_train, label='Train', alpha=0.7)
plt.plot(y_test.index, y_test, label='Test (Actual)', alpha=0.7)
plt.plot(y_test.index, y_pred, label='ARIMA Forecast', linestyle='--')
plt.plot(y_test.index, y_pred_baseline, label='Baseline', linestyle=':', alpha=0.5)
plt.legend()
plt.title('Forecasting Results')
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()
```
### Forecast with Confidence Intervals
```python
from aeon.forecasting.arima import ARIMA
forecaster = ARIMA(order=(2, 1, 2))
forecaster.fit(y_train)
# Predict with prediction intervals
y_pred = forecaster.predict(fh=np.arange(1, len(y_test) + 1))
pred_interval = forecaster.predict_interval(
    fh=np.arange(1, len(y_test) + 1),
    coverage=0.95
)
# Visualize with confidence bands
plt.figure(figsize=(12, 6))
plt.plot(y_test.index, y_test, label='Actual')
plt.plot(y_test.index, y_pred, label='Forecast')
plt.fill_between(
    y_test.index,
    pred_interval.iloc[:, 0],
    pred_interval.iloc[:, 1],
    alpha=0.3,
    label='95% Confidence'
)
plt.legend()
plt.show()
```
### Multi-Step Ahead Forecasting
```python
from aeon.forecasting.compose import DirectReductionForecaster
from sklearn.ensemble import GradientBoostingRegressor
# Convert to supervised learning problem
forecaster = DirectReductionForecaster(
    regressor=GradientBoostingRegressor(n_estimators=100),
    window_length=12
)
forecaster.fit(y_train)
# Forecast multiple steps
fh = np.arange(1, 13) # 12 months ahead
y_pred = forecaster.predict(fh=fh)
```
## Complete Anomaly Detection Workflow
```python
from aeon.anomaly_detection import STOMP
from aeon.datasets import load_airline
import numpy as np
import matplotlib.pyplot as plt
# 1. Load data
y = load_airline()
X_series = y.values.reshape(1, 1, -1) # Convert to aeon format
# 2. Detect anomalies
detector = STOMP(window_size=50)
anomaly_scores = detector.fit_predict(X_series)
# 3. Identify anomalies (top 5%)
threshold = np.percentile(anomaly_scores, 95)
anomaly_indices = np.where(anomaly_scores > threshold)[0]
# 4. Visualize
fig, axes = plt.subplots(2, 1, figsize=(14, 8), sharex=True)
# Plot time series with anomalies
axes[0].plot(y.values, label='Time Series')
axes[0].scatter(
    anomaly_indices,
    y.values[anomaly_indices],
    color='red',
    s=100,
    label='Anomalies',
    zorder=5
)
axes[0].set_ylabel('Value')
axes[0].legend()
axes[0].set_title('Time Series with Detected Anomalies')
# Plot anomaly scores
axes[1].plot(anomaly_scores, label='Anomaly Score')
axes[1].axhline(threshold, color='red', linestyle='--', label='Threshold')
axes[1].set_xlabel('Time')
axes[1].set_ylabel('Score')
axes[1].legend()
axes[1].set_title('Anomaly Scores')
plt.tight_layout()
plt.show()
# 5. Extract anomalous segments
print(f"Found {len(anomaly_indices)} anomalies")
for idx in anomaly_indices[:5]: # Show first 5
print(f"Anomaly at index {idx}, value: {y.values[idx]:.2f}")
```
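The fixed 95th-percentile cutoff flags 5% of scores by construction, whether or not the series contains that many anomalies. A more adaptive alternative is the modified z-score on the matrix profile (the Iglewicz-Hoaglin rule), a minimal sketch of which follows, reusing `anomaly_scores` from above:
```python
# Robust thresholding via the median absolute deviation (MAD)
median = np.median(anomaly_scores)
mad = np.median(np.abs(anomaly_scores - median))
robust_z = 0.6745 * (anomaly_scores - median) / mad  # modified z-score
mad_anomalies = np.where(robust_z > 3.5)[0]  # 3.5 is the common rule-of-thumb cutoff
print(f"MAD-based threshold flags {len(mad_anomalies)} points")
```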
## Complete Clustering Workflow
```python
from aeon.clustering import TimeSeriesKMeans
from aeon.datasets import load_basic_motions
from sklearn.metrics import silhouette_score
import numpy as np
import matplotlib.pyplot as plt
# 1. Load data
X_train, y_train = load_basic_motions(split="train")
# 2. Determine optimal number of clusters (elbow method)
inertias = []
silhouettes = []
K = range(2, 11)
for k in K:
clusterer = TimeSeriesKMeans(n_clusters=k, distance="euclidean", n_init=5)
labels = clusterer.fit_predict(X_train)
inertias.append(clusterer.inertia_)
silhouettes.append(silhouette_score(X_train.reshape(len(X_train), -1), labels))
# Plot elbow curve
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
axes[0].plot(K, inertias, 'bo-')
axes[0].set_xlabel('Number of Clusters')
axes[0].set_ylabel('Inertia')
axes[0].set_title('Elbow Method')
axes[1].plot(K, silhouettes, 'ro-')
axes[1].set_xlabel('Number of Clusters')
axes[1].set_ylabel('Silhouette Score')
axes[1].set_title('Silhouette Analysis')
plt.tight_layout()
plt.show()
# 3. Cluster with the k chosen from the elbow/silhouette plots above,
# switching to DTW distance for the final shape-aware clustering
optimal_k = 4
clusterer = TimeSeriesKMeans(n_clusters=optimal_k, distance="dtw", n_init=10)
labels = clusterer.fit_predict(X_train)
# 4. Visualize clusters
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.ravel()
for cluster_id in range(optimal_k):
cluster_indices = np.where(labels == cluster_id)[0]
ax = axes[cluster_id]
# Plot all series in cluster
for idx in cluster_indices[:20]: # Plot up to 20 series
ax.plot(X_train[idx, 0, :], alpha=0.3, color='blue')
# Plot cluster center
ax.plot(clusterer.cluster_centers_[cluster_id, 0, :],
color='red', linewidth=2, label='Center')
ax.set_title(f'Cluster {cluster_id} (n={len(cluster_indices)})')
ax.legend()
plt.tight_layout()
plt.show()
```
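Because `load_basic_motions` ships with class labels, the internal metrics above can be complemented by an external check; the adjusted Rand index measures chance-corrected agreement between the discovered clusters and the true classes:
```python
from sklearn.metrics import adjusted_rand_score

ari = adjusted_rand_score(y_train, labels)
print(f"Adjusted Rand Index vs. true labels: {ari:.3f}")  # 1.0 = perfect, ~0.0 = chance
```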
## Cross-Validation Strategies
### Standard K-Fold Cross-Validation
```python
from sklearn.model_selection import cross_val_score, StratifiedKFold
from aeon.classification.convolution_based import RocketClassifier
clf = RocketClassifier()
# Stratified K-Fold (preserves class distribution)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(clf, X_train, y_train, cv=cv, scoring='accuracy')
print(f"Cross-validation scores: {scores}")
print(f"Mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```
### Time Series Cross-Validation (for forecasting)
```python
from sklearn.model_selection import TimeSeriesSplit
from aeon.forecasting.arima import ARIMA
from sklearn.metrics import mean_squared_error
import numpy as np
# Time-aware split (no future data leakage)
tscv = TimeSeriesSplit(n_splits=5)
mse_scores = []
for train_idx, test_idx in tscv.split(y):
y_train_cv, y_test_cv = y.iloc[train_idx], y.iloc[test_idx]
forecaster = ARIMA(order=(2, 1, 2))
forecaster.fit(y_train_cv)
fh = np.arange(1, len(y_test_cv) + 1)
y_pred = forecaster.predict(fh=fh)
mse = mean_squared_error(y_test_cv, y_pred)
mse_scores.append(mse)
print(f"CV MSE: {np.mean(mse_scores):.3f} (+/- {np.std(mse_scores):.3f})")
```
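`TimeSeriesSplit` grows the training window on each fold (an expanding window). Setting `max_train_size` caps it, giving a rolling window instead, which can be preferable when older observations are no longer representative:
```python
# Rolling-window variant: each fold trains on at most the last 48 observations
tscv_rolling = TimeSeriesSplit(n_splits=5, max_train_size=48)
for train_idx, test_idx in tscv_rolling.split(y):
    print(f"train [{train_idx[0]}..{train_idx[-1]}], test [{test_idx[0]}..{test_idx[-1]}]")
```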
## Hyperparameter Tuning
### Grid Search
```python
from sklearn.model_selection import GridSearchCV
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
# Define parameter grid
param_grid = {
'n_neighbors': [1, 3, 5, 7, 9],
'distance': ['dtw', 'euclidean', 'erp', 'msm'],
    'distance_params': [{'window': 0.1}, {'window': 0.2}, None]  # window applies to elastic distances
}
# Grid search with cross-validation
clf = KNeighborsTimeSeriesClassifier()
grid_search = GridSearchCV(
clf,
param_grid,
cv=5,
scoring='accuracy',
n_jobs=-1,
verbose=2
)
grid_search.fit(X_train, y_train)
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best CV score: {grid_search.best_score_:.3f}")
print(f"Test accuracy: {grid_search.score(X_test, y_test):.3f}")
```
### Random Search
```python
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint
import numpy as np
param_distributions = {
'n_neighbors': randint(1, 20),
'distance': ['dtw', 'euclidean', 'ddtw'],
'distance_params': [{'window': w} for w in np.linspace(0.0, 0.5, 10)]
}
clf = KNeighborsTimeSeriesClassifier()
random_search = RandomizedSearchCV(
clf,
param_distributions,
n_iter=50,
cv=5,
scoring='accuracy',
n_jobs=-1,
random_state=42
)
random_search.fit(X_train, y_train)
print(f"Best parameters: {random_search.best_params_}")
```
## Integration with scikit-learn
### Using aeon in scikit-learn Pipelines
```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from aeon.transformations.collection import Catch22
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
pipeline = Pipeline([
('features', Catch22()),
('scaler', StandardScaler()),
('feature_selection', SelectKBest(f_classif, k=15)),
('classifier', RandomForestClassifier(n_estimators=500))
])
pipeline.fit(X_train, y_train)
accuracy = pipeline.score(X_test, y_test)
```
### Voting Ensemble with scikit-learn
```python
from sklearn.ensemble import VotingClassifier
from aeon.classification.convolution_based import RocketClassifier
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
from aeon.classification.feature_based import Catch22Classifier
ensemble = VotingClassifier(
estimators=[
('rocket', RocketClassifier()),
('knn', KNeighborsTimeSeriesClassifier()),
('catch22', Catch22Classifier())
],
voting='soft',
n_jobs=-1
)
ensemble.fit(X_train, y_train)
accuracy = ensemble.score(X_test, y_test)
```
### Stacking with Meta-Learner
```python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from aeon.classification.convolution_based import MiniRocketClassifier
from aeon.classification.interval_based import TimeSeriesForestClassifier
stacking = StackingClassifier(
estimators=[
('minirocket', MiniRocketClassifier()),
('tsf', TimeSeriesForestClassifier(n_estimators=100))
],
final_estimator=LogisticRegression(),
cv=5
)
stacking.fit(X_train, y_train)
accuracy = stacking.score(X_test, y_test)
```
## Data Preprocessing
### Handling Variable-Length Series
```python
from aeon.transformations.collection import PaddingTransformer
# Pad all series to the length of the longest (pad_length=None) with zeros;
# X_variable_length is a collection of unequal-length series
padder = PaddingTransformer(pad_length=None, fill_value=0)
X_padded = padder.fit_transform(X_variable_length)
```
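Padding is not always necessary: some aeon estimators accept unequal-length collections directly. If your aeon version exposes estimator tags, you can check before transforming (the tag name below is taken from aeon's tag registry and may differ across versions, so treat it as an assumption):
```python
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier

# True if the estimator can consume unequal-length series without padding
print(KNeighborsTimeSeriesClassifier.get_class_tag("capability:unequal_length"))
```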
### Handling Missing Values
```python
from aeon.transformations.series import Imputer
imputer = Imputer(method='mean')
X_imputed = imputer.fit_transform(X_with_missing)
```
### Normalization
```python
from aeon.transformations.collection import Normalizer
normalizer = Normalizer(method='z-score')
X_normalized = normalizer.fit_transform(X_train)
```
## Model Persistence
### Saving and Loading Models
```python
import pickle
from aeon.classification.convolution_based import RocketClassifier
# Train and save
clf = RocketClassifier()
clf.fit(X_train, y_train)
with open('rocket_model.pkl', 'wb') as f:
pickle.dump(clf, f)
# Load and predict
with open('rocket_model.pkl', 'rb') as f:
loaded_clf = pickle.load(f)
predictions = loaded_clf.predict(X_test)
```
### Using joblib (recommended for large models)
```python
import joblib
# Save
joblib.dump(clf, 'rocket_model.joblib')
# Load
loaded_clf = joblib.load('rocket_model.joblib')
```
## Visualization Utilities
### Plotting Time Series
```python
from aeon.visualisation import plot_series
import matplotlib.pyplot as plt
# Plot multiple series
fig, ax = plt.subplots(figsize=(12, 6))
plot_series(
    X_train[0], X_train[1], X_train[2],
    labels=['Series 1', 'Series 2', 'Series 3'],
    ax=ax
)
plt.title('Time Series Visualization')
plt.show()
```
### Plotting Distance Matrices
```python
from aeon.distances import pairwise_distance
import seaborn as sns
dist_matrix = pairwise_distance(X_train[:50], metric="dtw")
plt.figure(figsize=(10, 8))
sns.heatmap(dist_matrix, cmap='viridis', square=True)
plt.title('DTW Distance Matrix')
plt.show()
```
## Performance Optimization Tips
1. **Use n_jobs=-1** for parallel processing:
```python
clf = RocketClassifier(num_kernels=10000, n_jobs=-1)
```
2. **Use MiniRocket instead of ROCKET** for much faster training (the MiniRocket paper reports up to ~75x speed-ups on larger datasets at similar accuracy); see the timing sketch after this list:
```python
clf = MiniRocketClassifier()  # near-ROCKET accuracy at a fraction of the fit time
```
3. **Reduce num_kernels** for faster training:
```python
clf = RocketClassifier(num_kernels=2000) # Default is 10000
```
4. **Use Catch22 instead of TSFresh**:
```python
transform = Catch22()  # 22 canonical features vs. several hundred from TSFresh
```
5. **Window constraints for DTW**:
```python
clf = KNeighborsTimeSeriesClassifier(
distance='dtw',
    distance_params={'window': 0.1}  # Sakoe-Chiba band: warp at most 10% of length
)
```
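Rather than relying on rules of thumb, it is cheap to measure these trade-offs on your own data; this minimal timing sketch assumes the `X_train`/`y_train` from the classification examples above:
```python
import time
from aeon.classification.convolution_based import RocketClassifier, MiniRocketClassifier

for name, clf in [("ROCKET", RocketClassifier()), ("MiniRocket", MiniRocketClassifier())]:
    start = time.perf_counter()
    clf.fit(X_train, y_train)
    print(f"{name}: fit in {time.perf_counter() - start:.2f}s")
```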
## Best Practices
1. **Preserve temporal ordering** in train/test splits for forecasting so no future data leaks into training
2. **Use stratified splits** for classification to maintain class balance
3. **Start with fast algorithms** (ROCKET, MiniRocket) before trying slow ones
4. **Use cross-validation** to estimate generalization performance
5. **Benchmark against naive baselines** to establish minimum performance
6. **Normalize/standardize** when using distance-based methods (see the sketch after this list)
7. **Match the distance metric to the data**: elastic measures (DTW, MSM) for phase-shifted patterns, Euclidean when alignment is already fixed
8. **Save trained models** to avoid retraining
9. **Monitor training time** and computational resources
10. **Visualize results** to understand model behavior
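As a concrete instance of practice 6, a minimal sketch that reuses classes shown earlier (the default `Normalizer` settings are assumed to z-normalize each series):
```python
from aeon.transformations.collection import Normalizer
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier

# z-normalize per series so DTW compares shape rather than scale or offset
X_train_norm = Normalizer().fit_transform(X_train)
knn = KNeighborsTimeSeriesClassifier(distance="dtw", distance_params={"window": 0.1})
knn.fit(X_train_norm, y_train)
# Apply the same fitted transform to X_test before calling predict
```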