Support for aeon for time-series analysis and machine learning

This commit is contained in:
Timothy Kassis
2025-10-25 21:25:50 -07:00
parent 6cefe6f4cc
commit b83942845c
5 changed files with 2645 additions and 0 deletions

View File

@@ -0,0 +1,749 @@
# Core Modules: Transformations, Distances, Networks, Datasets, and Benchmarking
This reference provides comprehensive details on foundational modules that support aeon's learning tasks.
## Transformations
Transformations convert time series into alternative representations for feature extraction, preprocessing, or visualization.
### Two Types of Transformers
**Collection Transformers**: Process entire collections of time series
- Input: `(n_cases, n_channels, n_timepoints)`
- Output: Features, transformed collections, or tabular data
**Series Transformers**: Work on individual time series
- Input: Single time series
- Output: Transformed single series
### Collection-Level Transformations
#### ROCKET (RAndom Convolutional KErnel Transform)
Fast feature extraction via random convolutional kernels:
```python
from aeon.transformations.collection.convolution_based import Rocket
rocket = Rocket(num_kernels=10000, n_jobs=-1)
X_transformed = rocket.fit_transform(X_train)
# Output shape: (n_cases, 2 * num_kernels)
```
**Variants:**
```python
from aeon.transformations.collection.convolution_based import (
MiniRocket,
MultiRocket,
Hydra
)
# MiniRocket: Faster, streamlined version
minirocket = MiniRocket(num_kernels=10000)
X_features = minirocket.fit_transform(X_train)
# MultiRocket: Multivariate extensions
multirocket = MultiRocket(num_kernels=10000)
X_features = multirocket.fit_transform(X_train)
# Hydra: Dictionary-based convolution
hydra = Hydra(n_kernels=8)
X_features = hydra.fit_transform(X_train)
```
#### Catch22
22 canonical time series features:
```python
from aeon.transformations.collection.feature_based import Catch22
catch22 = Catch22(n_jobs=-1)
X_features = catch22.fit_transform(X_train)
# Output shape: (n_cases, 22)
```
**Feature categories:**
- Distribution (mean, variance, skewness)
- Autocorrelation properties
- Entropy measures
- Nonlinear dynamics
- Spectral properties
#### TSFresh
Comprehensive feature extraction (779 features):
```python
from aeon.transformations.collection.feature_based import TSFresh
tsfresh = TSFresh(
default_fc_parameters="comprehensive",
n_jobs=-1
)
X_features = tsfresh.fit_transform(X_train)
```
**Warning**: Slow on large datasets; use Catch22 for faster alternative
#### FreshPRINCE
Fresh Pipelines with Random Interval and Catch22 Features:
```python
from aeon.transformations.collection.feature_based import FreshPRINCE
freshprince = FreshPRINCE(n_intervals=50, n_jobs=-1)
X_features = freshprince.fit_transform(X_train)
```
#### Shapelet Transform
Extract discriminative subsequences:
```python
from aeon.transformations.collection.shapelet_based import ShapeletTransform
shapelet = ShapeletTransform(
n_shapelet_samples=10000,
max_shapelets=20,
n_jobs=-1
)
X_features = shapelet.fit_transform(X_train, y_train)
# Requires labels for supervised shapelet discovery
```
**Random Shapelet Transform**:
```python
from aeon.transformations.collection.shapelet_based import RandomShapeletTransform
rst = RandomShapeletTransform(n_shapelets=1000)
X_features = rst.fit_transform(X_train)
```
#### SAST (Shapelet-Attention Subsequence Transform)
Attention-based shapelet discovery:
```python
from aeon.transformations.collection.shapelet_based import SAST
sast = SAST(window_size=0.1, n_shapelets=100)
X_features = sast.fit_transform(X_train, y_train)
```
#### Symbolic Representations
**SAX (Symbolic Aggregate approXimation)**:
```python
from aeon.transformations.collection.dictionary_based import SAX
sax = SAX(n_segments=8, alphabet_size=4)
X_symbolic = sax.fit_transform(X_train)
```
**PAA (Piecewise Aggregate Approximation)**:
```python
from aeon.transformations.collection.dictionary_based import PAA
paa = PAA(n_segments=10)
X_approximated = paa.fit_transform(X_train)
```
**SFA (Symbolic Fourier Approximation)**:
```python
from aeon.transformations.collection.dictionary_based import SFA
sfa = SFA(word_length=8, alphabet_size=4)
X_symbolic = sfa.fit_transform(X_train)
```
#### Channel Selection and Operations
**Channel Selection**:
```python
from aeon.transformations.collection.channel_selection import ChannelSelection
selector = ChannelSelection(channels=[0, 2, 5])
X_selected = selector.fit_transform(X_train)
```
**Channel Scoring**:
```python
from aeon.transformations.collection.channel_selection import ChannelScorer
scorer = ChannelScorer()
scores = scorer.fit_transform(X_train, y_train)
```
#### Data Balancing
**SMOTE (Synthetic Minority Over-sampling)**:
```python
from aeon.transformations.collection.smote import SMOTE
smote = SMOTE(k_neighbors=5)
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)
```
**ADASYN**:
```python
from aeon.transformations.collection.smote import ADASYN
adasyn = ADASYN(n_neighbors=5)
X_resampled, y_resampled = adasyn.fit_resample(X_train, y_train)
```
### Series-Level Transformations
#### Smoothing Filters
**Moving Average**:
```python
from aeon.transformations.series.moving_average import MovingAverage
ma = MovingAverage(window_size=5)
X_smoothed = ma.fit_transform(X_series)
```
**Exponential Smoothing**:
```python
from aeon.transformations.series.exponent import ExponentTransformer
exp_smooth = ExponentTransformer(power=0.5)
X_smoothed = exp_smooth.fit_transform(X_series)
```
**Savitzky-Golay Filter**:
```python
from aeon.transformations.series.savgol import SavitzkyGolay
savgol = SavitzkyGolay(window_length=11, polyorder=3)
X_smoothed = savgol.fit_transform(X_series)
```
**Gaussian Filter**:
```python
from aeon.transformations.series.gaussian import GaussianFilter
gaussian = GaussianFilter(sigma=2.0)
X_smoothed = gaussian.fit_transform(X_series)
```
#### Statistical Transforms
**Box-Cox Transformation**:
```python
from aeon.transformations.series.boxcox import BoxCoxTransformer
boxcox = BoxCoxTransformer()
X_transformed = boxcox.fit_transform(X_series)
```
**AutoCorrelation**:
```python
from aeon.transformations.series.acf import AutoCorrelationTransformer
acf = AutoCorrelationTransformer(n_lags=40)
X_acf = acf.fit_transform(X_series)
```
**PCA (Principal Component Analysis)**:
```python
from aeon.transformations.series.pca import PCATransformer
pca = PCATransformer(n_components=3)
X_reduced = pca.fit_transform(X_series)
```
#### Approximation Methods
**Discrete Fourier Transform (DFT)**:
```python
from aeon.transformations.series.fourier import FourierTransform
dft = FourierTransform()
X_freq = dft.fit_transform(X_series)
```
**Piecewise Linear Approximation (PLA)**:
```python
from aeon.transformations.series.pla import PLA
pla = PLA(n_segments=10)
X_approx = pla.fit_transform(X_series)
```
#### Anomaly Detection Transform
**DOBIN (Distance-based Outlier BasIs using Neighbors)**:
```python
from aeon.transformations.series.dobin import DOBIN
dobin = DOBIN()
X_transformed = dobin.fit_transform(X_series)
```
### Transformation Pipelines
Chain transformers together:
```python
from sklearn.pipeline import Pipeline
from aeon.transformations.collection import Catch22, PCA
pipeline = Pipeline([
('features', Catch22()),
('reduce', PCA(n_components=10))
])
X_transformed = pipeline.fit_transform(X_train)
```
## Distance Metrics
Specialized distance functions for time series similarity measurement.
### Distance Categories
#### Warping-Based Distances
**DTW (Dynamic Time Warping)**:
```python
from aeon.distances import dtw_distance, dtw_pairwise_distance
# Compute distance between two series
dist = dtw_distance(series1, series2, window=0.2)
# Pairwise distances for a collection
dist_matrix = dtw_pairwise_distance(X_collection)
# Get alignment path
from aeon.distances import dtw_alignment_path
path = dtw_alignment_path(series1, series2)
# Get cost matrix
from aeon.distances import dtw_cost_matrix
cost = dtw_cost_matrix(series1, series2)
```
**DTW Variants**:
```python
from aeon.distances import (
wdtw_distance, # Weighted DTW
ddtw_distance, # Derivative DTW
wddtw_distance, # Weighted Derivative DTW
adtw_distance, # Amerced DTW
shape_dtw_distance # Shape DTW
)
# Weighted DTW (penalize warping)
dist = wdtw_distance(series1, series2, g=0.05)
# Derivative DTW (compare shapes)
dist = ddtw_distance(series1, series2)
# Shape DTW (with shape descriptors)
dist = shape_dtw_distance(series1, series2)
```
**DTW Parameters**:
- `window`: Sakoe-Chiba band constraint (0.0-1.0)
- `g`: Penalty weight for warping distances
#### Edit Distances
**ERP (Edit distance with Real Penalty)**:
```python
from aeon.distances import erp_distance
dist = erp_distance(series1, series2, g=0.0, window=None)
```
**EDR (Edit Distance on Real sequences)**:
```python
from aeon.distances import edr_distance
dist = edr_distance(series1, series2, epsilon=0.1, window=None)
```
**LCSS (Longest Common SubSequence)**:
```python
from aeon.distances import lcss_distance
dist = lcss_distance(series1, series2, epsilon=1.0, window=None)
```
**TWE (Time Warp Edit)**:
```python
from aeon.distances import twe_distance
dist = twe_distance(series1, series2, penalty=0.1, stiffness=0.001)
```
#### Standard Metrics
```python
from aeon.distances import (
euclidean_distance,
manhattan_distance,
minkowski_distance,
squared_distance
)
# Euclidean distance
dist = euclidean_distance(series1, series2)
# Manhattan (L1) distance
dist = manhattan_distance(series1, series2)
# Minkowski distance
dist = minkowski_distance(series1, series2, p=3)
# Squared Euclidean
dist = squared_distance(series1, series2)
```
#### Specialized Distances
**MSM (Move-Split-Merge)**:
```python
from aeon.distances import msm_distance
dist = msm_distance(series1, series2, c=1.0)
```
**SBD (Shape-Based Distance)**:
```python
from aeon.distances import sbd_distance
dist = sbd_distance(series1, series2)
```
### Unified Distance Interface
```python
from aeon.distances import distance, pairwise_distance
# Compute any distance by name
dist = distance(series1, series2, metric="dtw", window=0.1)
# Pairwise distance matrix
dist_matrix = pairwise_distance(X_collection, metric="euclidean")
# Get available distance names
from aeon.distances import get_distance_function_names
available_distances = get_distance_function_names()
```
### Distance Selection Guide
**Fast and accurate**:
- Euclidean for aligned series
- Squared for even faster computation
**Handle temporal shifts**:
- DTW for general warping
- WDTW to penalize excessive warping
**Shape-based similarity**:
- DDTW or Shape DTW
- SBD for normalized shape comparison
**Robust to noise**:
- ERP, EDR, or LCSS
**Multivariate**:
- DTW supports multivariate via independent/dependent alignment
## Deep Learning Networks
Neural network architectures specialized for time series.
### Network Architectures
#### InceptionTime
Ensemble of Inception modules capturing multi-scale patterns:
```python
from aeon.networks import InceptionNetwork
from aeon.classification.deep_learning import InceptionTimeClassifier
# Use via classifier
clf = InceptionTimeClassifier(
n_epochs=200,
batch_size=64,
n_ensemble=5
)
# Or use network directly
network = InceptionNetwork(
n_classes=3,
n_channels=1,
n_timepoints=100
)
```
#### ResNet
Residual networks with skip connections:
```python
from aeon.networks import ResNetNetwork
from aeon.classification.deep_learning import ResNetClassifier
clf = ResNetClassifier(
n_epochs=200,
batch_size=64,
n_res_blocks=3
)
```
#### FCN (Fully Convolutional Network)
```python
from aeon.networks import FCNNetwork
from aeon.classification.deep_learning import FCNClassifier
clf = FCNClassifier(
n_epochs=200,
batch_size=64,
n_conv_layers=3
)
```
#### CNN
Standard convolutional architecture:
```python
from aeon.classification.deep_learning import CNNClassifier
clf = CNNClassifier(
n_epochs=100,
batch_size=32,
kernel_size=7,
n_filters=32
)
```
#### TapNet
Attentional prototype networks:
```python
from aeon.classification.deep_learning import TapNetClassifier
clf = TapNetClassifier(
n_epochs=200,
batch_size=64
)
```
#### MLP (Multi-Layer Perceptron)
```python
from aeon.classification.deep_learning import MLPClassifier
clf = MLPClassifier(
n_epochs=100,
batch_size=32,
hidden_layer_sizes=[500]
)
```
#### LITE (Light Inception with boosTing tEchnique)
Lightweight ensemble network:
```python
from aeon.classification.deep_learning import LITEClassifier
clf = LITEClassifier(
n_epochs=100,
batch_size=64
)
```
### Training Configuration
```python
from aeon.classification.deep_learning import InceptionTimeClassifier
clf = InceptionTimeClassifier(
n_epochs=200,
batch_size=64,
learning_rate=0.001,
use_bias=True,
verbose=1
)
clf.fit(X_train, y_train)
```
**Common parameters:**
- `n_epochs`: Training iterations
- `batch_size`: Samples per gradient update
- `learning_rate`: Optimizer learning rate
- `verbose`: Training output verbosity
- `callbacks`: Keras callbacks (early stopping, etc.)
## Datasets
Load built-in datasets and access UCR/UEA archives.
### Built-in Datasets
```python
from aeon.datasets import (
load_arrow_head,
load_airline,
load_gunpoint,
load_italy_power_demand,
load_basic_motions,
load_japanese_vowels
)
# Classification dataset
X_train, y_train = load_arrow_head(split="train")
X_test, y_test = load_arrow_head(split="test")
# Forecasting dataset (univariate series)
y = load_airline()
# Multivariate classification
X_train, y_train = load_basic_motions(split="train")
print(X_train.shape) # (n_cases, n_channels, n_timepoints)
```
### UCR/UEA Archives
Access 100+ benchmark datasets:
```python
from aeon.datasets import load_from_tsfile, load_classification
# Load UCR/UEA dataset by name
X_train, y_train = load_classification("GunPoint", split="train")
X_test, y_test = load_classification("GunPoint", split="test")
# Load from local .ts file
X, y = load_from_tsfile("data/my_dataset_TRAIN.ts")
```
### Dataset Information
```python
from aeon.datasets import get_dataset_meta_data
# Get metadata about a dataset
info = get_dataset_meta_data("GunPoint")
print(info)
# {'n_cases': 150, 'n_timepoints': 150, 'n_classes': 2, ...}
```
### Custom Dataset Format
Save/load custom datasets in aeon format:
```python
from aeon.datasets import write_to_tsfile, load_from_tsfile
# Save
write_to_tsfile(
X_train,
"my_dataset_TRAIN.ts",
y=y_train,
problem_name="MyDataset"
)
# Load
X, y = load_from_tsfile("my_dataset_TRAIN.ts")
```
## Benchmarking
Tools for reproducible evaluation and comparison.
### Benchmarking Utilities
```python
from aeon.benchmarking import benchmark_estimator
# Benchmark a classifier on multiple datasets
results = benchmark_estimator(
estimator=RocketClassifier(),
datasets=["GunPoint", "ArrowHead", "ItalyPowerDemand"],
n_resamples=10
)
```
### Result Storage and Comparison
```python
from aeon.benchmarking import (
write_results_to_csv,
read_results_from_csv,
compare_results
)
# Save results
write_results_to_csv(results, "results.csv")
# Load and compare
results_rocket = read_results_from_csv("results_rocket.csv")
results_inception = read_results_from_csv("results_inception.csv")
comparison = compare_results(
[results_rocket, results_inception],
estimator_names=["ROCKET", "InceptionTime"]
)
```
### Critical Difference Diagrams
Visualize statistical significance of differences:
```python
from aeon.benchmarking.results_plotting import plot_critical_difference_diagram
plot_critical_difference_diagram(
results_dict={
'ROCKET': results_rocket,
'InceptionTime': results_inception,
'BOSS': results_boss
},
dataset_names=["GunPoint", "ArrowHead", "ItalyPowerDemand"]
)
```
## Discovery and Tags
### Finding Estimators
```python
from aeon.utils.discovery import all_estimators
# Get all classifiers
classifiers = all_estimators(type_filter="classifier")
# Get all transformers
transformers = all_estimators(type_filter="transformer")
# Filter by capability tags
multivariate_classifiers = all_estimators(
type_filter="classifier",
filter_tags={"capability:multivariate": True}
)
```
### Checking Estimator Tags
```python
from aeon.utils.tags import all_tags_for_estimator
from aeon.classification.convolution_based import RocketClassifier
tags = all_tags_for_estimator(RocketClassifier)
print(tags)
# {'capability:multivariate': True, 'X_inner_type': ['numpy3D'], ...}
```
### Common Tags
- `capability:multivariate`: Handles multivariate series
- `capability:unequal_length`: Handles variable-length series
- `capability:missing_values`: Handles missing data
- `algorithm_type`: Algorithm family (e.g., "convolution", "distance")
- `python_dependencies`: Required packages

View File

@@ -0,0 +1,442 @@
# Learning Tasks: Classification, Regression, Clustering, and Similarity Search
This reference provides comprehensive details on supervised and unsupervised learning tasks for time series collections.
## Time Series Classification
Time series classification (TSC) assigns labels to entire sequences. Aeon provides diverse algorithm families with unique strengths.
### Algorithm Categories
#### 1. Convolution-Based Classifiers
Transform time series using random convolutional kernels:
**ROCKET (RAndom Convolutional KErnel Transform)**
- Ultra-fast feature extraction via random kernels
- 10,000+ kernels generate discriminative features
- Linear classifier on extracted features
```python
from aeon.classification.convolution_based import RocketClassifier
clf = RocketClassifier(num_kernels=10000, n_jobs=-1)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
probabilities = clf.predict_proba(X_test)
```
**Variants:**
- `MiniRocketClassifier`: Faster, streamlined version
- `MultiRocketClassifier`: Multivariate extensions
- `Arsenal`: Ensemble of ROCKET transformers
- `Hydra`: Dictionary-based convolution variant
#### 2. Deep Learning Classifiers
Neural networks specialized for time series:
**InceptionTime**
- Ensemble of Inception modules
- Captures patterns at multiple scales
- State-of-the-art on UCR benchmarks
```python
from aeon.classification.deep_learning import InceptionTimeClassifier
clf = InceptionTimeClassifier(n_epochs=200, batch_size=64)
clf.fit(X_train, y_train)
```
**Other architectures:**
- `ResNetClassifier`: Residual connections
- `FCNClassifier`: Fully Convolutional Networks
- `CNNClassifier`: Standard convolutional architecture
- `LITEClassifier`: Lightweight networks
- `MLPClassifier`: Multi-layer perceptrons
- `TapNetClassifier`: Attentional prototype networks
#### 3. Dictionary-Based Classifiers
Symbolic representations and bag-of-words approaches:
**BOSS (Bag of SFA Symbols)**
- Converts series to symbolic words
- Histogram-based classification
- Effective for shape patterns
```python
from aeon.classification.dictionary_based import BOSSEnsemble
clf = BOSSEnsemble(max_ensemble_size=500)
clf.fit(X_train, y_train)
```
**Other dictionary methods:**
- `TemporalDictionaryEnsemble (TDE)`: Enhanced BOSS with temporal info
- `WEASEL`: Word ExtrAction for time SEries cLassification
- `MUSE`: MUltivariate Symbolic Extension
- `MrSEQL`: Multiple Representations SEQuence Learner
#### 4. Distance-Based Classifiers
Leverage time series-specific distance metrics:
**K-Nearest Neighbors with DTW**
- Dynamic Time Warping handles temporal shifts
- Effective for shape-based similarity
```python
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
clf = KNeighborsTimeSeriesClassifier(
distance="dtw",
n_neighbors=5
)
clf.fit(X_train, y_train)
```
**Other distance methods:**
- `ElasticEnsemble`: Ensemble of elastic distances
- `ProximityForest`: Tree-based with elastic measures
- `ProximityTree`: Single tree variant
- `ShapeDTW`: DTW with shape descriptors
#### 5. Feature-Based Classifiers
Extract statistical and domain-specific features:
**Catch22**
- 22 time series features
- Canonical Time-series CHaracteristics
- Fast and interpretable
```python
from aeon.classification.feature_based import Catch22Classifier
clf = Catch22Classifier(estimator=RandomForestClassifier())
clf.fit(X_train, y_train)
```
**Other feature methods:**
- `FreshPRINCEClassifier`: Fresh Pipelines with Random Interval and Catch22 Features
- `SignatureClassifier`: Path signature features
- `TSFreshClassifier`: Comprehensive feature extraction (slower, more features)
- `SummaryClassifier`: Simple summary statistics
#### 6. Interval-Based Classifiers
Analyze discriminative time intervals:
**Time Series Forest (TSF)**
- Random intervals + summary statistics
- Random forest on extracted features
```python
from aeon.classification.interval_based import TimeSeriesForestClassifier
clf = TimeSeriesForestClassifier(n_estimators=500)
clf.fit(X_train, y_train)
```
**Other interval methods:**
- `CanonicalIntervalForest (CIF)`: Canonical Interval Forest
- `DrCIF`: Diverse Representation CIF
- `RISE`: Random Interval Spectral Ensemble
- `RandomIntervalClassifier`: Basic random interval approach
- `STSF`: Shapelet Transform Interval Forest
#### 7. Shapelet-Based Classifiers
Discover discriminative subsequences:
**Shapelets**: Small subsequences that best distinguish classes
```python
from aeon.classification.shapelet_based import ShapeletTransformClassifier
clf = ShapeletTransformClassifier(
n_shapelet_samples=10000,
max_shapelets=20
)
clf.fit(X_train, y_train)
```
**Other shapelet methods:**
- `LearningShapeletClassifier`: Gradient-based learning
- `SASTClassifier`: Shapelet-Attention Subsequence Transform
#### 8. Hybrid Ensembles
Combine multiple algorithm families:
**HIVE-COTE (Hierarchical Vote Collective of Transformation-based Ensembles)**
- State-of-the-art accuracy
- Combines shapelets, intervals, dictionaries, and spectral features
- V2 uses ROCKET and improved components
```python
from aeon.classification.hybrid import HIVECOTEV2
clf = HIVECOTEV2(n_jobs=-1) # Slow but highly accurate
clf.fit(X_train, y_train)
```
### Algorithm Selection Guide
**Fast and accurate (default choice):**
- `RocketClassifier` or `MiniRocketClassifier`
**Maximum accuracy (slow):**
- `HIVECOTEV2` or `InceptionTimeClassifier`
**Interpretable:**
- `Catch22Classifier` or `ShapeletTransformClassifier`
**Multivariate focus:**
- `MultiRocketClassifier` or `MUSE`
**Small datasets:**
- `KNeighborsTimeSeriesClassifier` with DTW
### Classification Workflow
```python
from aeon.classification.convolution_based import RocketClassifier
from aeon.datasets import load_arrow_head
from sklearn.metrics import accuracy_score, classification_report
# Load data
X_train, y_train = load_arrow_head(split="train")
X_test, y_test = load_arrow_head(split="test")
# Train classifier
clf = RocketClassifier(n_jobs=-1)
clf.fit(X_train, y_train)
# Evaluate
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
print(classification_report(y_test, y_pred))
```
## Time Series Regression
Time series regression predicts continuous values from sequences. Most classification algorithms have regression equivalents.
### Regression Algorithms
Available regressors mirror classification structure:
- `RocketRegressor`, `MiniRocketRegressor`, `MultiRocketRegressor`
- `InceptionTimeRegressor`, `ResNetRegressor`, `FCNRegressor`
- `KNeighborsTimeSeriesRegressor`
- `Catch22Regressor`, `FreshPRINCERegressor`
- `TimeSeriesForestRegressor`, `DrCIFRegressor`
### Regression Workflow
```python
from aeon.regression.convolution_based import RocketRegressor
from sklearn.metrics import mean_squared_error, r2_score
# Train regressor
reg = RocketRegressor(num_kernels=10000)
reg.fit(X_train, y_train_continuous)
# Predict and evaluate
y_pred = reg.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.3f}, R²: {r2:.3f}")
```
## Time Series Clustering
Clustering groups similar time series without labels.
### Clustering Algorithms
**TimeSeriesKMeans**
- K-means with time series distances
- Supports DTW, Euclidean, and other metrics
```python
from aeon.clustering import TimeSeriesKMeans
clusterer = TimeSeriesKMeans(
n_clusters=3,
distance="dtw",
n_init=10
)
clusterer.fit(X_collection)
labels = clusterer.labels_
```
**TimeSeriesKMedoids**
- Uses actual series as cluster centers
- More robust to outliers
```python
from aeon.clustering import TimeSeriesKMedoids
clusterer = TimeSeriesKMedoids(
n_clusters=3,
distance="euclidean"
)
clusterer.fit(X_collection)
```
**Other clustering methods:**
- `TimeSeriesKernelKMeans`: Kernel-based clustering
- `ElasticSOM`: Self-organizing maps with elastic distances
### Clustering Workflow
```python
from aeon.clustering import TimeSeriesKMeans
from aeon.distances import dtw_distance
import numpy as np
# Cluster time series
clusterer = TimeSeriesKMeans(n_clusters=4, distance="dtw")
clusterer.fit(X_train)
# Get cluster labels
labels = clusterer.predict(X_test)
# Compute cluster centers
centers = clusterer.cluster_centers_
# Evaluate clustering quality (if ground truth available)
from sklearn.metrics import adjusted_rand_score
ari = adjusted_rand_score(y_true, labels)
```
## Similarity Search
Similarity search finds motifs, nearest neighbors, and repeated patterns.
### Key Concepts
**Motifs**: Frequently repeated subsequences within a time series
**Matrix Profile**: Data structure encoding nearest neighbor distances for all subsequences
### Similarity Search Methods
**Matrix Profile**
- Efficient motif discovery
- Change point detection
- Anomaly detection
```python
from aeon.similarity_search import MatrixProfile
mp = MatrixProfile(window_size=50)
profile = mp.fit_transform(X_series)
# Find top motif
motif_idx = np.argmin(profile)
```
**Query Search**
- Find nearest neighbors to a query subsequence
- Useful for template matching
```python
from aeon.similarity_search import QuerySearch
searcher = QuerySearch(distance="euclidean")
distances, indices = searcher.search(X_series, query_subsequence)
```
### Similarity Search Workflow
```python
from aeon.similarity_search import MatrixProfile
import numpy as np
# Compute matrix profile
mp = MatrixProfile(window_size=100)
profile, profile_index = mp.fit_transform(X_series)
# Find top-k motifs (lowest profile values)
k = 3
motif_indices = np.argsort(profile)[:k]
# Find anomalies (highest profile values)
anomaly_indices = np.argsort(profile)[-k:]
```
## Ensemble and Composition Tools
### Voting Ensembles
```python
from aeon.classification.ensemble import WeightedEnsembleClassifier
from aeon.classification.convolution_based import RocketClassifier
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
ensemble = WeightedEnsembleClassifier(
estimators=[
('rocket', RocketClassifier()),
('knn', KNeighborsTimeSeriesClassifier())
]
)
ensemble.fit(X_train, y_train)
```
### Pipelines
```python
from sklearn.pipeline import Pipeline
from aeon.transformations.collection import Catch22
from sklearn.ensemble import RandomForestClassifier
pipeline = Pipeline([
('features', Catch22()),
('classifier', RandomForestClassifier())
])
pipeline.fit(X_train, y_train)
```
## Model Selection and Validation
### Cross-Validation
```python
from sklearn.model_selection import cross_val_score
from aeon.classification.convolution_based import RocketClassifier
clf = RocketClassifier()
scores = cross_val_score(clf, X_train, y_train, cv=5)
print(f"CV Accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```
### Grid Search
```python
from sklearn.model_selection import GridSearchCV
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
param_grid = {
'n_neighbors': [1, 3, 5, 7],
'distance': ['dtw', 'euclidean', 'erp']
}
clf = KNeighborsTimeSeriesClassifier()
grid_search = GridSearchCV(clf, param_grid, cv=5)
grid_search.fit(X_train, y_train)
print(f"Best params: {grid_search.best_params_}")
```
## Discovery Functions
Find available estimators programmatically:
```python
from aeon.utils.discovery import all_estimators
# Get all classifiers
classifiers = all_estimators(type_filter="classifier")
# Get all regressors
regressors = all_estimators(type_filter="regressor")
# Get all clusterers
clusterers = all_estimators(type_filter="clusterer")
# Filter by tag (e.g., multivariate capable)
mv_classifiers = all_estimators(
type_filter="classifier",
filter_tags={"capability:multivariate": True}
)
```

View File

@@ -0,0 +1,596 @@
# Temporal Analysis: Forecasting, Anomaly Detection, and Segmentation
This reference provides comprehensive details on forecasting future values, detecting anomalies, and segmenting time series.
## Forecasting
Forecasting predicts future values in a time series based on historical patterns.
### Forecasting Concepts
**Forecasting horizon (fh)**: Number of steps ahead to predict
- Absolute: `fh=[1, 2, 3]` (predict steps 1, 2, 3)
- Relative: `fh=ForecastingHorizon([1, 2, 3], is_relative=True)`
**Exogenous variables**: External features that influence predictions
### Statistical Forecasters
#### ARIMA (AutoRegressive Integrated Moving Average)
Classical time series model combining AR, differencing, and MA components:
```python
from aeon.forecasting.arima import ARIMA
forecaster = ARIMA(
order=(1, 1, 1), # (p, d, q)
seasonal_order=(1, 1, 1, 12), # (P, D, Q, s)
suppress_warnings=True
)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3, 4, 5])
```
**Parameters:**
- `p`: AR order (lags)
- `d`: Differencing order
- `q`: MA order (moving average)
- `P, D, Q, s`: Seasonal components
#### ETS (Exponential Smoothing)
State space model for trend and seasonality:
```python
from aeon.forecasting.ets import ETS
forecaster = ETS(
error="add",
trend="add",
seasonal="add",
sp=12 # seasonal period
)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3])
```
**Model types:**
- Error: "add" (additive) or "mul" (multiplicative)
- Trend: "add", "mul", or None
- Seasonal: "add", "mul", or None
#### Theta Forecaster
Simple, effective method using exponential smoothing:
```python
from aeon.forecasting.theta import ThetaForecaster
forecaster = ThetaForecaster(deseasonalize=True, sp=12)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=np.arange(1, 13))
```
#### TAR (Threshold AutoRegressive)
Non-linear autoregressive model with regime switching:
```python
from aeon.forecasting.tar import TAR
forecaster = TAR(
delay=1,
threshold=0.0,
order_below=2,
order_above=2
)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3])
```
**AutoTAR**: Automatically optimizes threshold:
```python
from aeon.forecasting.tar import AutoTAR
forecaster = AutoTAR(max_order=5)
forecaster.fit(y_train)
```
#### TVP (Time-Varying Parameter)
Kalman filter-based forecaster with dynamic coefficients:
```python
from aeon.forecasting.tvp import TVP
forecaster = TVP(
order=2,
use_exog=False
)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3])
```
### Naive Baselines
Simple forecasting strategies for benchmarking:
```python
from aeon.forecasting.naive import NaiveForecaster
# Last value
forecaster = NaiveForecaster(strategy="last")
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3])
# Seasonal naive (use value from same season last year)
forecaster = NaiveForecaster(strategy="seasonal_last", sp=12)
forecaster.fit(y_train)
# Mean
forecaster = NaiveForecaster(strategy="mean")
forecaster.fit(y_train)
# Drift (linear trend from first to last)
forecaster = NaiveForecaster(strategy="drift")
forecaster.fit(y_train)
```
**Strategies:**
- `"last"`: Repeat last observed value
- `"mean"`: Use mean of training data
- `"seasonal_last"`: Repeat value from previous season
- `"drift"`: Linear extrapolation
### Deep Learning Forecasters
#### TCN (Temporal Convolutional Network)
Deep learning with dilated causal convolutions:
```python
from aeon.forecasting.deep_learning import TCNForecaster
forecaster = TCNForecaster(
n_epochs=100,
batch_size=32,
kernel_size=3,
n_filters=64,
dilation_rate=2
)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3, 4, 5])
```
### Regression-Based Forecasting
Transform forecasting into a supervised learning problem:
```python
from aeon.forecasting.compose import RegressionForecaster
from sklearn.ensemble import RandomForestRegressor
forecaster = RegressionForecaster(
regressor=RandomForestRegressor(n_estimators=100),
window_length=10
)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3])
```
### Forecasting Workflow
```python
from aeon.forecasting.arima import ARIMA
from aeon.datasets import load_airline
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np
# Load data
y = load_airline()
# Split train/test
split_point = int(len(y) * 0.8)
y_train, y_test = y[:split_point], y[split_point:]
# Fit forecaster
forecaster = ARIMA(order=(2, 1, 2), suppress_warnings=True)
forecaster.fit(y_train)
# Predict
fh = np.arange(1, len(y_test) + 1)
y_pred = forecaster.predict(fh=fh)
# Evaluate
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"MAE: {mae:.3f}, RMSE: {rmse:.3f}")
```
### Forecasting with Exogenous Variables
```python
from aeon.forecasting.arima import ARIMA
# X contains exogenous features
forecaster = ARIMA(order=(1, 1, 1))
forecaster.fit(y_train, X=X_train)
# Must provide future exogenous values
y_pred = forecaster.predict(fh=[1, 2, 3], X=X_future)
```
### Multi-Step Forecasting Strategies
**Direct**: Train separate model for each horizon
**Recursive**: Use predictions as inputs for next step
**DirRec**: Combine both strategies
```python
from aeon.forecasting.compose import DirectReductionForecaster
from sklearn.linear_model import Ridge
forecaster = DirectReductionForecaster(
regressor=Ridge(),
window_length=10
)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3, 4, 5])
```
## Anomaly Detection
Anomaly detection identifies unusual patterns or outliers in time series data.
### Anomaly Detection Types
**Point anomalies**: Single unusual values
**Contextual anomalies**: Values anomalous given context
**Collective anomalies**: Sequences of unusual behavior
### Distance-Based Anomaly Detectors
#### STOMP (Scalable Time series Ordered-search Matrix Profile)
Matrix profile-based anomaly detection:
```python
from aeon.anomaly_detection import STOMP
detector = STOMP(window_size=50)
anomaly_scores = detector.fit_predict(X_series)
# High scores indicate anomalies
threshold = np.percentile(anomaly_scores, 95)
anomalies = anomaly_scores > threshold
```
#### LeftSTAMPi
Incremental matrix profile for streaming data:
```python
from aeon.anomaly_detection import LeftSTAMPi
detector = LeftSTAMPi(window_size=50)
anomaly_scores = detector.fit_predict(X_series)
```
#### MERLIN
Matrix profile with range constraints:
```python
from aeon.anomaly_detection import MERLIN
detector = MERLIN(window_size=50, k=3)
anomaly_scores = detector.fit_predict(X_series)
```
#### KMeansAD
K-means clustering-based anomaly detection:
```python
from aeon.anomaly_detection import KMeansAD
detector = KMeansAD(n_clusters=5, window_size=50)
anomaly_scores = detector.fit_predict(X_series)
```
#### CBLOF (Cluster-Based Local Outlier Factor)
```python
from aeon.anomaly_detection import CBLOF
detector = CBLOF(n_clusters=8, alpha=0.9)
anomaly_scores = detector.fit_predict(X_series)
```
#### LOF (Local Outlier Factor)
Density-based outlier detection:
```python
from aeon.anomaly_detection import LOF
detector = LOF(n_neighbors=20, window_size=50)
anomaly_scores = detector.fit_predict(X_series)
```
#### ROCKAD
ROCKET-based anomaly detection:
```python
from aeon.anomaly_detection import ROCKAD
detector = ROCKAD(num_kernels=1000, window_size=50)
anomaly_scores = detector.fit_predict(X_series)
```
### Distribution-Based Anomaly Detectors
#### COPOD (Copula-Based Outlier Detection)
```python
from aeon.anomaly_detection import COPOD
detector = COPOD(window_size=50)
anomaly_scores = detector.fit_predict(X_series)
```
#### DWT_MLEAD
Discrete Wavelet Transform with Machine Learning:
```python
from aeon.anomaly_detection import DWT_MLEAD
detector = DWT_MLEAD(window_size=50, wavelet='db4')
anomaly_scores = detector.fit_predict(X_series)
```
### Outlier Detection Methods
#### IsolationForest
Ensemble tree-based isolation:
```python
from aeon.anomaly_detection import IsolationForest
detector = IsolationForest(
n_estimators=100,
window_size=50,
contamination=0.1
)
anomaly_scores = detector.fit_predict(X_series)
```
#### OneClassSVM
Support vector machine for novelty detection:
```python
from aeon.anomaly_detection import OneClassSVM
detector = OneClassSVM(
kernel='rbf',
nu=0.1,
window_size=50
)
anomaly_scores = detector.fit_predict(X_series)
```
#### STRAY (Search TRace AnomalY)
```python
from aeon.anomaly_detection import STRAY
detector = STRAY(alpha=0.05)
anomaly_scores = detector.fit_predict(X_series)
```
### Collection Anomaly Detection
Detect anomalous time series within a collection:
```python
from aeon.anomaly_detection import ClassificationAdapter
from aeon.classification.convolution_based import RocketClassifier
detector = ClassificationAdapter(
classifier=RocketClassifier()
)
detector.fit(X_normal) # Train on normal data
anomaly_labels = detector.predict(X_test) # 1 = anomaly, 0 = normal
```
### Anomaly Detection Workflow
```python
from aeon.anomaly_detection import STOMP
import numpy as np
import matplotlib.pyplot as plt
# Detect anomalies
detector = STOMP(window_size=100)
anomaly_scores = detector.fit_predict(X_series)
# Identify anomalies (top 5%)
threshold = np.percentile(anomaly_scores, 95)
anomaly_indices = np.where(anomaly_scores > threshold)[0]
# Visualize
plt.figure(figsize=(12, 6))
plt.subplot(2, 1, 1)
plt.plot(X_series[0, 0, :])
plt.scatter(anomaly_indices, X_series[0, 0, anomaly_indices],
color='red', label='Anomalies', zorder=5)
plt.legend()
plt.title('Time Series with Detected Anomalies')
plt.subplot(2, 1, 2)
plt.plot(anomaly_scores)
plt.axhline(threshold, color='red', linestyle='--', label='Threshold')
plt.legend()
plt.title('Anomaly Scores')
plt.tight_layout()
plt.show()
```
## Segmentation
Segmentation divides time series into distinct regions or identifies change points.
### Segmentation Concepts
**Change points**: Locations where statistical properties change
**Segments**: Homogeneous regions between change points
**Applications**: Regime detection, event identification, structural breaks
### Segmentation Algorithms
#### ClaSP (Classification Score Profile)
Discover change points using classification performance:
```python
from aeon.segmentation import ClaSPSegmenter
segmenter = ClaSPSegmenter(
n_segments=3,
period_length=10
)
change_points = segmenter.fit_predict(X_series)
print(f"Change points at indices: {change_points}")
```
**How it works:**
- Slides a window over the series
- Computes classification score for left vs. right segments
- High scores indicate change points
#### FLUSS (Fast Low-cost Unipotent Semantic Segmentation)
Matrix profile-based segmentation:
```python
from aeon.segmentation import FLUSSSegmenter
segmenter = FLUSSSegmenter(
n_segments=5,
window_size=50
)
change_points = segmenter.fit_predict(X_series)
```
#### BinSeg (Binary Segmentation)
Recursive splitting for change point detection:
```python
from aeon.segmentation import BinSegSegmenter
segmenter = BinSegSegmenter(
n_segments=4,
model="l2" # cost function
)
change_points = segmenter.fit_predict(X_series)
```
**Models:**
- `"l2"`: Least squares (continuous data)
- `"l1"`: Absolute deviation (robust to outliers)
- `"rbf"`: Radial basis function
- `"ar"`: Autoregressive model
#### HMM (Hidden Markov Model) Segmentation
Probabilistic state-based segmentation:
```python
from aeon.segmentation import HMMSegmenter
segmenter = HMMSegmenter(
n_states=3,
covariance_type="full"
)
segmenter.fit(X_series)
states = segmenter.predict(X_series)
```
### Segmentation Workflow
```python
from aeon.segmentation import ClaSPSegmenter
import matplotlib.pyplot as plt
# Detect change points
segmenter = ClaSPSegmenter(n_segments=4)
change_points = segmenter.fit_predict(X_series)
# Visualize segments
plt.figure(figsize=(12, 4))
plt.plot(X_series[0, 0, :])
for cp in change_points:
plt.axvline(cp, color='red', linestyle='--', alpha=0.7)
plt.title('Time Series Segmentation')
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()
# Extract segments
segments = []
prev_cp = 0
for cp in np.append(change_points, len(X_series[0, 0, :])):
segment = X_series[0, 0, prev_cp:cp]
segments.append(segment)
prev_cp = cp
```
### Multi-Variate Segmentation
```python
from aeon.segmentation import ClaSPSegmenter
# X_multivariate has shape (1, n_channels, n_timepoints)
segmenter = ClaSPSegmenter(n_segments=3)
change_points = segmenter.fit_predict(X_multivariate)
```
## Combining Forecasting, Anomaly Detection, and Segmentation
### Robust Forecasting with Anomaly Detection
```python
from aeon.forecasting.arima import ARIMA
from aeon.anomaly_detection import IsolationForest
# Detect and remove anomalies
detector = IsolationForest(window_size=50, contamination=0.1)
anomaly_scores = detector.fit_predict(X_series)
normal_mask = anomaly_scores < np.percentile(anomaly_scores, 90)
# Forecast on cleaned data
y_clean = y_train[normal_mask]
forecaster = ARIMA(order=(2, 1, 2))
forecaster.fit(y_clean)
y_pred = forecaster.predict(fh=[1, 2, 3])
```
### Segmentation-Based Forecasting
```python
from aeon.segmentation import ClaSPSegmenter
from aeon.forecasting.arima import ARIMA
# Segment time series
segmenter = ClaSPSegmenter(n_segments=3)
change_points = segmenter.fit_predict(X_series)
# Forecast using most recent segment
last_segment_start = change_points[-1]
y_recent = y_train[last_segment_start:]
forecaster = ARIMA(order=(1, 1, 1))
forecaster.fit(y_recent)
y_pred = forecaster.predict(fh=[1, 2, 3])
```
## Discovery Functions
Find available forecasters, detectors, and segmenters:
```python
from aeon.utils.discovery import all_estimators
# Get all forecasters
forecasters = all_estimators(type_filter="forecaster")
# Get all anomaly detectors
detectors = all_estimators(type_filter="anomaly-detector")
# Get all segmenters
segmenters = all_estimators(type_filter="segmenter")
```

View File

@@ -0,0 +1,634 @@
# Common Workflows and Integration Patterns
This reference provides end-to-end workflows, best practices, and integration patterns for using aeon effectively.
## Complete Classification Workflow
### Basic Classification Pipeline
```python
# 1. Import required modules
from aeon.classification.convolution_based import RocketClassifier
from aeon.datasets import load_arrow_head
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
# 2. Load and inspect data
X_train, y_train = load_arrow_head(split="train")
X_test, y_test = load_arrow_head(split="test")
print(f"Training shape: {X_train.shape}") # (n_cases, n_channels, n_timepoints)
print(f"Unique classes: {np.unique(y_train)}")
# 3. Train classifier
clf = RocketClassifier(num_kernels=10000, n_jobs=-1)
clf.fit(X_train, y_train)
# 4. Make predictions
y_pred = clf.predict(X_test)
y_proba = clf.predict_proba(X_test)
# 5. Evaluate performance
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
# 6. Visualize confusion matrix
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()
```
### Feature Extraction + Classifier Pipeline
```python
from sklearn.pipeline import Pipeline
from aeon.transformations.collection import Catch22
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
# Create pipeline
pipeline = Pipeline([
('features', Catch22(n_jobs=-1)),
('classifier', RandomForestClassifier(n_estimators=500, n_jobs=-1))
])
# Cross-validation
scores = cross_val_score(pipeline, X_train, y_train, cv=5, scoring='accuracy')
print(f"CV Accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
# Train on full training set
pipeline.fit(X_train, y_train)
# Evaluate on test set
accuracy = pipeline.score(X_test, y_test)
print(f"Test Accuracy: {accuracy:.3f}")
```
### Multi-Algorithm Comparison
```python
from aeon.classification.convolution_based import RocketClassifier, MiniRocketClassifier
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
from aeon.classification.feature_based import Catch22Classifier
from aeon.classification.interval_based import TimeSeriesForestClassifier
import time
classifiers = {
'ROCKET': RocketClassifier(num_kernels=10000),
'MiniRocket': MiniRocketClassifier(),
'KNN-DTW': KNeighborsTimeSeriesClassifier(distance='dtw', n_neighbors=5),
'Catch22': Catch22Classifier(),
'TSF': TimeSeriesForestClassifier(n_estimators=200)
}
results = {}
for name, clf in classifiers.items():
start_time = time.time()
clf.fit(X_train, y_train)
train_time = time.time() - start_time
start_time = time.time()
accuracy = clf.score(X_test, y_test)
test_time = time.time() - start_time
results[name] = {
'accuracy': accuracy,
'train_time': train_time,
'test_time': test_time
}
# Display results
import pandas as pd
df_results = pd.DataFrame(results).T
df_results = df_results.sort_values('accuracy', ascending=False)
print(df_results)
```
## Complete Forecasting Workflow
### Univariate Forecasting
```python
from aeon.forecasting.arima import ARIMA
from aeon.forecasting.naive import NaiveForecaster
from aeon.datasets import load_airline
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np
import matplotlib.pyplot as plt
# 1. Load data
y = load_airline()
# 2. Train/test split (temporal)
split_point = int(len(y) * 0.8)
y_train, y_test = y[:split_point], y[split_point:]
# 3. Create baseline (naive forecaster)
baseline = NaiveForecaster(strategy="last")
baseline.fit(y_train)
y_pred_baseline = baseline.predict(fh=np.arange(1, len(y_test) + 1))
# 4. Train ARIMA model
forecaster = ARIMA(order=(2, 1, 2), seasonal_order=(1, 1, 1, 12))
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=np.arange(1, len(y_test) + 1))
# 5. Evaluate
mae_baseline = mean_absolute_error(y_test, y_pred_baseline)
mae_arima = mean_absolute_error(y_test, y_pred)
rmse_baseline = np.sqrt(mean_squared_error(y_test, y_pred_baseline))
rmse_arima = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"Baseline - MAE: {mae_baseline:.2f}, RMSE: {rmse_baseline:.2f}")
print(f"ARIMA - MAE: {mae_arima:.2f}, RMSE: {rmse_arima:.2f}")
# 6. Visualize
plt.figure(figsize=(12, 6))
plt.plot(y_train.index, y_train, label='Train', alpha=0.7)
plt.plot(y_test.index, y_test, label='Test (Actual)', alpha=0.7)
plt.plot(y_test.index, y_pred, label='ARIMA Forecast', linestyle='--')
plt.plot(y_test.index, y_pred_baseline, label='Baseline', linestyle=':', alpha=0.5)
plt.legend()
plt.title('Forecasting Results')
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()
```
### Forecast with Confidence Intervals
```python
from aeon.forecasting.arima import ARIMA
forecaster = ARIMA(order=(2, 1, 2))
forecaster.fit(y_train)
# Predict with prediction intervals
y_pred = forecaster.predict(fh=np.arange(1, len(y_test) + 1))
pred_interval = forecaster.predict_interval(
fh=np.arange(1, len(y_test) + 1),
coverage=0.95
)
# Visualize with confidence bands
plt.figure(figsize=(12, 6))
plt.plot(y_test.index, y_test, label='Actual')
plt.plot(y_test.index, y_pred, label='Forecast')
plt.fill_between(
y_test.index,
pred_interval.iloc[:, 0],
pred_interval.iloc[:, 1],
alpha=0.3,
label='95% Confidence'
)
plt.legend()
plt.show()
```
### Multi-Step Ahead Forecasting
```python
from aeon.forecasting.compose import DirectReductionForecaster
from sklearn.ensemble import GradientBoostingRegressor
# Convert to supervised learning problem
forecaster = DirectReductionForecaster(
regressor=GradientBoostingRegressor(n_estimators=100),
window_length=12
)
forecaster.fit(y_train)
# Forecast multiple steps
fh = np.arange(1, 13) # 12 months ahead
y_pred = forecaster.predict(fh=fh)
```
## Complete Anomaly Detection Workflow
```python
from aeon.anomaly_detection import STOMP
from aeon.datasets import load_airline
import numpy as np
import matplotlib.pyplot as plt
# 1. Load data
y = load_airline()
X_series = y.values.reshape(1, 1, -1) # Convert to aeon format
# 2. Detect anomalies
detector = STOMP(window_size=50)
anomaly_scores = detector.fit_predict(X_series)
# 3. Identify anomalies (top 5%)
threshold = np.percentile(anomaly_scores, 95)
anomaly_indices = np.where(anomaly_scores > threshold)[0]
# 4. Visualize
fig, axes = plt.subplots(2, 1, figsize=(14, 8), sharex=True)
# Plot time series with anomalies
axes[0].plot(y.values, label='Time Series')
axes[0].scatter(
anomaly_indices,
y.values[anomaly_indices],
color='red',
s=100,
label='Anomalies',
zorder=5
)
axes[0].set_ylabel('Value')
axes[0].legend()
axes[0].set_title('Time Series with Detected Anomalies')
# Plot anomaly scores
axes[1].plot(anomaly_scores, label='Anomaly Score')
axes[1].axhline(threshold, color='red', linestyle='--', label='Threshold')
axes[1].set_xlabel('Time')
axes[1].set_ylabel('Score')
axes[1].legend()
axes[1].set_title('Anomaly Scores')
plt.tight_layout()
plt.show()
# 5. Extract anomalous segments
print(f"Found {len(anomaly_indices)} anomalies")
for idx in anomaly_indices[:5]: # Show first 5
print(f"Anomaly at index {idx}, value: {y.values[idx]:.2f}")
```
## Complete Clustering Workflow
```python
from aeon.clustering import TimeSeriesKMeans
from aeon.datasets import load_basic_motions
from sklearn.metrics import silhouette_score, davies_bouldin_score
import matplotlib.pyplot as plt
# 1. Load data
X_train, y_train = load_basic_motions(split="train")
# 2. Determine optimal number of clusters (elbow method)
inertias = []
silhouettes = []
K = range(2, 11)
for k in K:
clusterer = TimeSeriesKMeans(n_clusters=k, distance="euclidean", n_init=5)
labels = clusterer.fit_predict(X_train)
inertias.append(clusterer.inertia_)
silhouettes.append(silhouette_score(X_train.reshape(len(X_train), -1), labels))
# Plot elbow curve
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
axes[0].plot(K, inertias, 'bo-')
axes[0].set_xlabel('Number of Clusters')
axes[0].set_ylabel('Inertia')
axes[0].set_title('Elbow Method')
axes[1].plot(K, silhouettes, 'ro-')
axes[1].set_xlabel('Number of Clusters')
axes[1].set_ylabel('Silhouette Score')
axes[1].set_title('Silhouette Analysis')
plt.tight_layout()
plt.show()
# 3. Cluster with optimal k
optimal_k = 4
clusterer = TimeSeriesKMeans(n_clusters=optimal_k, distance="dtw", n_init=10)
labels = clusterer.fit_predict(X_train)
# 4. Visualize clusters
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.ravel()
for cluster_id in range(optimal_k):
cluster_indices = np.where(labels == cluster_id)[0]
ax = axes[cluster_id]
# Plot all series in cluster
for idx in cluster_indices[:20]: # Plot up to 20 series
ax.plot(X_train[idx, 0, :], alpha=0.3, color='blue')
# Plot cluster center
ax.plot(clusterer.cluster_centers_[cluster_id, 0, :],
color='red', linewidth=2, label='Center')
ax.set_title(f'Cluster {cluster_id} (n={len(cluster_indices)})')
ax.legend()
plt.tight_layout()
plt.show()
```
## Cross-Validation Strategies
### Standard K-Fold Cross-Validation
```python
from sklearn.model_selection import cross_val_score, StratifiedKFold
from aeon.classification.convolution_based import RocketClassifier
clf = RocketClassifier()
# Stratified K-Fold (preserves class distribution)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(clf, X_train, y_train, cv=cv, scoring='accuracy')
print(f"Cross-validation scores: {scores}")
print(f"Mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```
### Time Series Cross-Validation (for forecasting)
```python
from sklearn.model_selection import TimeSeriesSplit
from aeon.forecasting.arima import ARIMA
from sklearn.metrics import mean_squared_error
import numpy as np
# Time-aware split (no future data leakage)
tscv = TimeSeriesSplit(n_splits=5)
mse_scores = []
for train_idx, test_idx in tscv.split(y):
y_train_cv, y_test_cv = y.iloc[train_idx], y.iloc[test_idx]
forecaster = ARIMA(order=(2, 1, 2))
forecaster.fit(y_train_cv)
fh = np.arange(1, len(y_test_cv) + 1)
y_pred = forecaster.predict(fh=fh)
mse = mean_squared_error(y_test_cv, y_pred)
mse_scores.append(mse)
print(f"CV MSE: {np.mean(mse_scores):.3f} (+/- {np.std(mse_scores):.3f})")
```
## Hyperparameter Tuning
### Grid Search
```python
from sklearn.model_selection import GridSearchCV
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
# Define parameter grid
param_grid = {
'n_neighbors': [1, 3, 5, 7, 9],
'distance': ['dtw', 'euclidean', 'erp', 'msm'],
'distance_params': [{'window': 0.1}, {'window': 0.2}, None]
}
# Grid search with cross-validation
clf = KNeighborsTimeSeriesClassifier()
grid_search = GridSearchCV(
clf,
param_grid,
cv=5,
scoring='accuracy',
n_jobs=-1,
verbose=2
)
grid_search.fit(X_train, y_train)
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best CV score: {grid_search.best_score_:.3f}")
print(f"Test accuracy: {grid_search.score(X_test, y_test):.3f}")
```
### Random Search
```python
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint, uniform
param_distributions = {
'n_neighbors': randint(1, 20),
'distance': ['dtw', 'euclidean', 'ddtw'],
'distance_params': [{'window': w} for w in np.linspace(0.0, 0.5, 10)]
}
clf = KNeighborsTimeSeriesClassifier()
random_search = RandomizedSearchCV(
clf,
param_distributions,
n_iter=50,
cv=5,
scoring='accuracy',
n_jobs=-1,
random_state=42
)
random_search.fit(X_train, y_train)
print(f"Best parameters: {random_search.best_params_}")
```
## Integration with scikit-learn
### Using aeon in scikit-learn Pipelines
```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from aeon.transformations.collection import Catch22
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
pipeline = Pipeline([
('features', Catch22()),
('scaler', StandardScaler()),
('feature_selection', SelectKBest(f_classif, k=15)),
('classifier', RandomForestClassifier(n_estimators=500))
])
pipeline.fit(X_train, y_train)
accuracy = pipeline.score(X_test, y_test)
```
### Voting Ensemble with scikit-learn
```python
from sklearn.ensemble import VotingClassifier
from aeon.classification.convolution_based import RocketClassifier
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
from aeon.classification.feature_based import Catch22Classifier
ensemble = VotingClassifier(
estimators=[
('rocket', RocketClassifier()),
('knn', KNeighborsTimeSeriesClassifier()),
('catch22', Catch22Classifier())
],
voting='soft',
n_jobs=-1
)
ensemble.fit(X_train, y_train)
accuracy = ensemble.score(X_test, y_test)
```
### Stacking with Meta-Learner
```python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from aeon.classification.convolution_based import MiniRocketClassifier
from aeon.classification.interval_based import TimeSeriesForestClassifier
stacking = StackingClassifier(
estimators=[
('minirocket', MiniRocketClassifier()),
('tsf', TimeSeriesForestClassifier(n_estimators=100))
],
final_estimator=LogisticRegression(),
cv=5
)
stacking.fit(X_train, y_train)
accuracy = stacking.score(X_test, y_test)
```
## Data Preprocessing
### Handling Variable-Length Series
```python
from aeon.transformations.collection import PaddingTransformer
# Pad series to equal length
padder = PaddingTransformer(pad_length=None, fill_value=0)
X_padded = padder.fit_transform(X_variable_length)
```
### Handling Missing Values
```python
from aeon.transformations.series import Imputer
imputer = Imputer(method='mean')
X_imputed = imputer.fit_transform(X_with_missing)
```
### Normalization
```python
from aeon.transformations.collection import Normalizer
normalizer = Normalizer(method='z-score')
X_normalized = normalizer.fit_transform(X_train)
```
## Model Persistence
### Saving and Loading Models
```python
import pickle
from aeon.classification.convolution_based import RocketClassifier
# Train and save
clf = RocketClassifier()
clf.fit(X_train, y_train)
with open('rocket_model.pkl', 'wb') as f:
pickle.dump(clf, f)
# Load and predict
with open('rocket_model.pkl', 'rb') as f:
loaded_clf = pickle.load(f)
predictions = loaded_clf.predict(X_test)
```
### Using joblib (recommended for large models)
```python
import joblib
# Save
joblib.dump(clf, 'rocket_model.joblib')
# Load
loaded_clf = joblib.load('rocket_model.joblib')
```
## Visualization Utilities
### Plotting Time Series
```python
from aeon.visualisation import plot_series
import matplotlib.pyplot as plt
# Plot multiple series
fig, ax = plt.subplots(figsize=(12, 6))
plot_series(X_train[0], X_train[1], X_train[2], labels=['Series 1', 'Series 2', 'Series 3'], ax=ax)
plt.title('Time Series Visualization')
plt.show()
```
### Plotting Distance Matrices
```python
from aeon.distances import pairwise_distance
import seaborn as sns
dist_matrix = pairwise_distance(X_train[:50], metric="dtw")
plt.figure(figsize=(10, 8))
sns.heatmap(dist_matrix, cmap='viridis', square=True)
plt.title('DTW Distance Matrix')
plt.show()
```
## Performance Optimization Tips
1. **Use n_jobs=-1** for parallel processing:
```python
clf = RocketClassifier(num_kernels=10000, n_jobs=-1)
```
2. **Use MiniRocket instead of ROCKET** for faster training:
```python
clf = MiniRocketClassifier() # 75% faster
```
3. **Reduce num_kernels** for faster training:
```python
clf = RocketClassifier(num_kernels=2000) # Default is 10000
```
4. **Use Catch22 instead of TSFresh**:
```python
transform = Catch22() # Much faster, fewer features
```
5. **Window constraints for DTW**:
```python
clf = KNeighborsTimeSeriesClassifier(
distance='dtw',
distance_params={'window': 0.1} # Constrain warping
)
```
## Best Practices
1. **Always use train/test split** with time series ordering preserved
2. **Use stratified splits** for classification to maintain class balance
3. **Start with fast algorithms** (ROCKET, MiniRocket) before trying slow ones
4. **Use cross-validation** to estimate generalization performance
5. **Benchmark against naive baselines** to establish minimum performance
6. **Normalize/standardize** when using distance-based methods
7. **Use appropriate distance metrics** for your data characteristics
8. **Save trained models** to avoid retraining
9. **Monitor training time** and computational resources
10. **Visualize results** to understand model behavior