Improve the aeon skill

This commit is contained in:
Timothy Kassis
2025-11-03 15:59:29 -08:00
parent 862445f531
commit 094d5aa9f1
17 changed files with 2493 additions and 2563 deletions


@@ -42,7 +42,7 @@
- **ESM (Evolutionary Scale Modeling)** - State-of-the-art protein language models from EvolutionaryScale for protein design, structure prediction, and representation learning. Includes ESM3 (1.4B-98B parameter multimodal generative models for simultaneous reasoning across sequence, structure, and function with chain-of-thought generation, inverse folding, and function-conditioned design) and ESM C (300M-6B parameter efficient embedding models 3x faster than ESM2 for similarity analysis, classification, and feature extraction). Supports local inference with open weights and cloud-based Forge API for scalable batch processing. Use cases: novel protein design, structure prediction from sequence, sequence design from structure, protein embeddings, function annotation, variant generation, and directed evolution workflows
## Machine Learning & Deep Learning
- **aeon** - Comprehensive scikit-learn compatible Python toolkit for time series machine learning providing state-of-the-art algorithms across 7 domains: classification (13 algorithm categories including ROCKET variants, deep learning with InceptionTime/ResNet/FCN, distance-based with DTW/ERP/LCSS, shapelet-based, dictionary methods like BOSS/WEASEL, and hybrid ensembles HIVECOTE), regression (9 categories mirroring classification approaches), clustering (k-means/k-medoids with temporal distances, deep learning autoencoders, spectral methods), forecasting (ARIMA, ETS, Theta, Threshold Autoregressive, TCN, DeepAR), anomaly detection (STOMP/MERLIN matrix profile, clustering-based CBLOF/KMeans, isolation methods, copula-based COPOD), segmentation (ClaSP, FLUSS, HMM, binary segmentation), and similarity search (MASS algorithm, STOMP motif discovery, approximate nearest neighbors). Includes 40+ distance metrics (elastic: DTW/DDTW/WDTW/Shape-DTW, edit-based: ERP/EDR/LCSS/TWE/MSM, lock-step: Euclidean/Manhattan), extensive transformations (ROCKET/MiniRocket/MultiRocket for features, Catch22/TSFresh for statistics, SAX/PAA for symbolic representation, shapelet transforms, wavelets, matrix profile), 20+ deep learning architectures (FCN, ResNet, InceptionTime, TCN, autoencoders with attention mechanisms), comprehensive benchmarking tools (UCR/UEA archives with 100+ datasets, published results repository, statistical testing), and performance-optimized implementations using numba. Features progressive model complexity from fast baselines (MiniRocket: <1 second training, 0.95+ accuracy on many benchmarks) to state-of-the-art ensembles (HIVECOTE V2), GPU acceleration support, and extensive visualization utilities. Use cases: physiological signal classification (ECG, EEG), industrial sensor monitoring, financial forecasting, change point detection, pattern discovery, activity recognition from wearables, predictive maintenance, climate time series analysis, and any sequential data requiring specialized temporal modeling beyond standard ML
- **PufferLib** - High-performance reinforcement learning library achieving 1M-4M steps/second through optimized vectorization, native multi-agent support, and efficient PPO training (PuffeRL). Use this skill for RL training on any environment (Gymnasium, PettingZoo, Atari, Procgen), creating custom PufferEnv environments, developing policies (CNN, LSTM, multi-input architectures), optimizing parallel simulation performance, or scaling multi-agent systems. Includes Ocean suite (20+ environments), seamless framework integration with automatic space flattening, zero-copy vectorization with shared memory buffers, distributed training support, and comprehensive reference guides for training workflows, environment development, vectorization optimization, policy architectures, and third-party integrations
- **PyMC** - Bayesian statistical modeling and probabilistic programming
- **PyMOO** - Multi-objective optimization with evolutionary algorithms


@@ -1,224 +1,368 @@
---
name: aeon
description: This skill should be used for time series machine learning tasks including classification, regression, clustering, forecasting, anomaly detection, segmentation, and similarity search. Use when working with temporal data, sequential patterns, or time-indexed observations requiring specialized algorithms beyond standard ML approaches. Particularly suited for univariate and multivariate time series analysis with scikit-learn compatible APIs.
---
# Aeon Time Series Machine Learning
## Overview
Aeon is a scikit-learn compatible Python toolkit for time series machine learning. It provides state-of-the-art algorithms for classification, regression, clustering, forecasting, anomaly detection, segmentation, and similarity search.
## When to Use This Skill
Apply this skill when:
- Classifying or predicting from time series data
- Detecting anomalies or change points in temporal sequences
- Clustering similar time series patterns
- Forecasting future values
- Finding repeated patterns (motifs) or unusual subsequences (discords)
- Comparing time series with specialized distance metrics
- Extracting features from temporal data
## Installation
```bash
pip install aeon
```
## Core Capabilities
### 1. Time Series Classification
Categorize time series into predefined classes. See `references/classification.md` for complete algorithm catalog.
**Quick Start:**
```python
from aeon.classification.convolution_based import RocketClassifier
from aeon.datasets import load_classification
# Load data
X_train, y_train = load_classification("GunPoint", split="train")
X_test, y_test = load_classification("GunPoint", split="test")
# Train classifier
clf = RocketClassifier(n_kernels=10000)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```
**Algorithm Selection:**
- **Speed + Performance**: `MiniRocketClassifier`, `Arsenal`
- **Maximum Accuracy**: `HIVECOTEV2`, `InceptionTimeClassifier`
- **Interpretability**: `ShapeletTransformClassifier`, `Catch22Classifier`
- **Small Datasets**: `KNeighborsTimeSeriesClassifier` with DTW distance
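A minimal sketch of trading speed for accuracy with two entries from this list (class locations follow `references/classification.md`; the HIVE-COTE time limit is an illustrative setting):
```python
from aeon.classification.convolution_based import MiniRocketClassifier
from aeon.classification.hybrid import HIVECOTEV2
# Fast baseline first
fast = MiniRocketClassifier()
fast.fit(X_train, y_train)
print("MiniRocket:", fast.score(X_test, y_test))
# Heavier ensemble if accuracy matters more than runtime
strong = HIVECOTEV2(time_limit_in_minutes=5)
strong.fit(X_train, y_train)
print("HIVE-COTE V2:", strong.score(X_test, y_test))
```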
### 2. Time Series Regression
Predict continuous values from time series. See `references/regression.md` for algorithms.
**Quick Start:**
```python
from aeon.regression.convolution_based import RocketRegressor
from aeon.datasets import load_regression
X_train, y_train = load_regression("Covid3Month", split="train")
X_test, y_test = load_regression("Covid3Month", split="test")
reg = RocketRegressor()
reg.fit(X_train, y_train)
predictions = reg.predict(X_test)
```
### 3. Time Series Clustering
Group similar time series without labels. See `references/clustering.md` for methods.
**Quick Start:**
```python
from aeon.clustering import TimeSeriesKMeans
clusterer = TimeSeriesKMeans(
n_clusters=3,
distance="dtw",
averaging_method="ba"
)
labels = clusterer.fit_predict(X_train)
centers = clusterer.cluster_centers_
```
### 4. Forecasting
Predict future time series values. See `references/forecasting.md` for forecasters.
**Quick Start:**
```python
from aeon.forecasting.arima import ARIMA
forecaster = ARIMA(order=(1, 1, 1))
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3, 4, 5])
```
### 5. Anomaly Detection
Identify unusual patterns or outliers. See `references/anomaly_detection.md` for detectors.
**Quick Start:**
```python
import numpy as np
from aeon.anomaly_detection import STOMP
detector = STOMP(window_size=50)
anomaly_scores = detector.fit_predict(y)
# Higher scores indicate anomalies
threshold = np.percentile(anomaly_scores, 95)
anomalies = anomaly_scores > threshold
```
### 6. Segmentation
Partition time series into regions with change points. See `references/segmentation.md`.
**Quick Start:**
```python
from aeon.segmentation import ClaSPSegmenter
segmenter = ClaSPSegmenter()
change_points = segmenter.fit_predict(y)
```
### 7. Similarity Search
Find similar patterns within or across time series. See `references/similarity_search.md`.
**Quick Start:**
```python
from aeon.similarity_search import StompMotif
# Find recurring patterns
motif_finder = StompMotif(window_size=50, k=3)
motifs = motif_finder.fit_predict(y)
```
## Feature Extraction and Transformations
Transform time series for feature engineering. See `references/transformations.md`.
**ROCKET Features:**
```python
from aeon.transformations.collection.convolution_based import Rocket
rocket = Rocket()
X_features = rocket.fit_transform(X_train)
# Use features with any sklearn classifier
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier()
clf.fit(X_features, y_train)
```
**Statistical Features:**
```python
from aeon.transformations.collection.feature_based import Catch22
catch22 = Catch22()
X_features = catch22.fit_transform(X_train)
```
**Preprocessing:**
```python
from aeon.transformations.collection import MinMaxScaler, Normalizer
scaler = Normalizer() # Z-normalization
X_normalized = scaler.fit_transform(X_train)
```
## Distance Metrics
Specialized temporal distance measures. See `references/distances.md` for complete catalog.
**Usage:**
```python
from aeon.distances import dtw_distance, dtw_pairwise_distance
# Single distance
distance = dtw_distance(x, y, window=0.1)
# Pairwise distances
distance_matrix = dtw_pairwise_distance(X_train)
# Use with classifiers
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
clf = KNeighborsTimeSeriesClassifier(
n_neighbors=5,
distance="dtw",
distance_params={"window": 0.2}
)
```
**Available Distances:**
- **Elastic**: DTW, DDTW, WDTW, ERP, EDR, LCSS, TWE, MSM
- **Lock-step**: Euclidean, Manhattan, Minkowski
- **Shape-based**: Shape DTW, SBD
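All of these are also reachable through the name-based interface; a minimal sketch (the `metric` keyword follows its usage elsewhere in this skill and may be named differently across aeon versions):
```python
import numpy as np
from aeon.distances import distance, pairwise_distance
x = np.random.random((1, 100))
y = np.random.random((1, 100))
# Pick a measure by name instead of importing each function
d = distance(x, y, metric="msm")
D = pairwise_distance(X_train, metric="erp")
```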
## Deep Learning Networks
Neural architectures for time series. See `references/networks.md`.
**Architectures:**
- Convolutional: `FCNClassifier`, `ResNetClassifier`, `InceptionTimeClassifier`
- Recurrent: `RecurrentNetwork`, `TCNNetwork`
- Autoencoders: `AEFCNClusterer`, `AEResNetClusterer`
**Usage:**
```python
from aeon.classification.deep_learning import InceptionTimeClassifier
clf = InceptionTimeClassifier(n_epochs=100, batch_size=32)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
```
## Datasets and Benchmarking
Load standard benchmarks and evaluate performance. See `references/datasets_benchmarking.md`.
**Load Datasets:**
```python
from aeon.datasets import load_classification, load_regression
# Classification
X_train, y_train = load_classification("ArrowHead", split="train")
# Regression
X_train, y_train = load_regression("Covid3Month", split="train")
```
**Benchmarking:**
```python
from aeon.benchmarking import get_estimator_results
# Compare with published results
published = get_estimator_results("ROCKET", "GunPoint")
```
## Common Workflows
### Classification Pipeline
```python
from aeon.transformations.collection import Normalizer
from aeon.classification.convolution_based import RocketClassifier
from sklearn.pipeline import Pipeline
pipeline = Pipeline([
('normalize', Normalizer()),
('classify', RocketClassifier())
])
pipeline.fit(X_train, y_train)
accuracy = pipeline.score(X_test, y_test)
```
### Feature Extraction + Traditional ML
```python
from aeon.transformations.collection.convolution_based import Rocket
from sklearn.ensemble import GradientBoostingClassifier
# Extract features
rocket = Rocket()
X_train_features = rocket.fit_transform(X_train)
X_test_features = rocket.transform(X_test)
# Train traditional ML
clf = GradientBoostingClassifier()
clf.fit(X_train_features, y_train)
predictions = clf.predict(X_test_features)
```
### Anomaly Detection with Visualization
```python
from aeon.anomaly_detection import STOMP
import matplotlib.pyplot as plt
import numpy as np
detector = STOMP(window_size=50)
scores = detector.fit_predict(y)
plt.figure(figsize=(15, 5))
plt.subplot(2, 1, 1)
plt.plot(y, label='Time Series')
plt.subplot(2, 1, 2)
plt.plot(scores, label='Anomaly Scores', color='red')
plt.axhline(np.percentile(scores, 95), color='k', linestyle='--')
plt.show()
```
## Best Practices
### Data Preparation
1. **Normalize**: Most algorithms benefit from z-normalization
```python
from aeon.transformations.collection import Normalizer
normalizer = Normalizer()
X_train = normalizer.fit_transform(X_train)
X_test = normalizer.transform(X_test)
```
2. **Handle Missing Values**: Impute before analysis
```python
from aeon.transformations.collection import SimpleImputer
imputer = SimpleImputer(strategy='mean')
X_train = imputer.fit_transform(X_train)
```
3. **Check Data Format**: Aeon expects shape `(n_cases, n_channels, n_timepoints)`
### Model Selection
1. **Start Simple**: Begin with ROCKET variants before deep learning
2. **Use Validation**: Split training data for hyperparameter tuning (see the sketch after this list)
3. **Compare Baselines**: Test against simple methods (1-NN Euclidean, Naive)
4. **Consider Resources**: ROCKET for speed, deep learning if GPU available
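For point 2, a minimal tuning sketch: aeon estimators expose scikit-learn's `get_params`/`set_params`, so standard search tools apply (the parameter grid is illustrative):
```python
from sklearn.model_selection import GridSearchCV
from aeon.classification.convolution_based import RocketClassifier
# Tune on the training split only; keep the test split untouched
search = GridSearchCV(RocketClassifier(), param_grid={"n_kernels": [1000, 10000]}, cv=3)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```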
### Algorithm Selection Guide
**For Fast Prototyping:**
- Classification: `MiniRocketClassifier`
- Regression: `MiniRocketRegressor`
- Clustering: `TimeSeriesKMeans` with Euclidean
**For Maximum Accuracy:**
- Classification: `HIVECOTEV2`, `InceptionTimeClassifier`
- Regression: `InceptionTimeRegressor`
- Forecasting: `ARIMA`, `TCNForecaster`
**For Interpretability:**
- Classification: `ShapeletTransformClassifier`, `Catch22Classifier`
- Features: `Catch22`, `TSFresh`
**For Small Datasets:**
- Distance-based: `KNeighborsTimeSeriesClassifier` with DTW
- Avoid: Deep learning (requires large data)
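To browse candidates programmatically rather than from this guide, use the estimator registry:
```python
from aeon.utils.discovery import all_estimators
# All classifiers, optionally filtered by capability tags
classifiers = all_estimators(type_filter="classifier")
multivariate = all_estimators(
    type_filter="classifier",
    filter_tags={"capability:multivariate": True}
)
print(f"{len(classifiers)} classifiers available")
```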
## Reference Documentation
Detailed information available in `references/`:
- `classification.md` - All classification algorithms
- `regression.md` - Regression methods
- `clustering.md` - Clustering algorithms
- `forecasting.md` - Forecasting approaches
- `anomaly_detection.md` - Anomaly detection methods
- `segmentation.md` - Segmentation algorithms
- `similarity_search.md` - Pattern matching and motif discovery
- `transformations.md` - Feature extraction and preprocessing
- `distances.md` - Time series distance metrics
- `networks.md` - Deep learning architectures
- `datasets_benchmarking.md` - Data loading and evaluation tools
## Additional Resources
- Documentation: https://www.aeon-toolkit.org/
- GitHub: https://github.com/aeon-toolkit/aeon
- Examples: https://www.aeon-toolkit.org/en/stable/examples.html
- API Reference: https://www.aeon-toolkit.org/en/stable/api_reference.html


@@ -0,0 +1,154 @@
# Anomaly Detection
Aeon provides anomaly detection methods for identifying unusual patterns in time series at both series and collection levels.
## Collection Anomaly Detectors
Detect anomalous time series within a collection:
- `ClassificationAdapter` - Adapts classifiers for anomaly detection
- Train on normal data, flag outliers during prediction
- **Use when**: Have labeled normal data, want classification-based approach
- `OutlierDetectionAdapter` - Wraps sklearn outlier detectors
- Works with IsolationForest, LOF, OneClassSVM
- **Use when**: Want to use sklearn anomaly detectors on collections
## Series Anomaly Detectors
Detect anomalous points or subsequences within a single time series.
### Distance-Based Methods
Use similarity metrics to identify anomalies:
- `CBLOF` - Cluster-Based Local Outlier Factor
- Clusters data, identifies outliers based on cluster properties
- **Use when**: Anomalies form sparse clusters
- `KMeansAD` - K-means based anomaly detection
- Distance to nearest cluster center indicates anomaly
- **Use when**: Normal patterns cluster well
- `LeftSTAMPi` - Left STAMP incremental
- Matrix profile for online anomaly detection
- **Use when**: Streaming data, need online detection
- `STOMP` - Scalable Time series Ordered-search Matrix Profile
- Computes matrix profile for subsequence anomalies
- **Use when**: Discord discovery, motif detection
- `MERLIN` - Matrix profile-based method
- Efficient matrix profile computation
- **Use when**: Large time series, need scalability
- `LOF` - Local Outlier Factor adapted for time series
- Density-based outlier detection
- **Use when**: Anomalies in low-density regions
- `ROCKAD` - ROCKET-based semi-supervised detection
- Uses ROCKET features for anomaly identification
- **Use when**: Have some labeled data, want feature-based approach
### Distribution-Based Methods
Analyze statistical distributions:
- `COPOD` - Copula-Based Outlier Detection
- Models marginal and joint distributions
- **Use when**: Multi-dimensional time series, complex dependencies
- `DWT_MLEAD` - Discrete Wavelet Transform with Maximum Likelihood Estimation anomaly detection
- Decomposes series into frequency bands
- **Use when**: Anomalies at specific frequencies
### Isolation-Based Methods
Use isolation principles:
- `IsolationForest` - Random forest-based isolation
- Anomalies easier to isolate than normal points
- **Use when**: High-dimensional data, no assumptions about distribution
- `OneClassSVM` - Support vector machine for novelty detection
- Learns boundary around normal data
- **Use when**: Well-defined normal region, need robust boundary
- `STRAY` - Streaming Robust Anomaly Detection
- Robust to data distribution changes
- **Use when**: Streaming data, distribution shifts
### External Library Integration
- `PyODAdapter` - Bridges PyOD library to aeon
- Access 40+ PyOD anomaly detectors
- **Use when**: Need specific PyOD algorithm
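A minimal sketch of the adapter route, assuming PyOD is installed (the `window_size` argument and the LOF choice are illustrative):
```python
import numpy as np
from pyod.models.lof import LOF
from aeon.anomaly_detection import PyODAdapter
y = np.sin(np.linspace(0, 20, 200))
# Slide a PyOD detector over the series in windows
detector = PyODAdapter(LOF(), window_size=25)
scores = detector.fit_predict(y)
```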
## Quick Start
```python
from aeon.anomaly_detection import STOMP
import numpy as np
# Create time series with anomaly
y = np.concatenate([
np.sin(np.linspace(0, 10, 100)),
[5.0], # Anomaly spike
np.sin(np.linspace(10, 20, 100))
])
# Detect anomalies
detector = STOMP(window_size=10)
anomaly_scores = detector.fit_predict(y)
# Higher scores indicate more anomalous points
threshold = np.percentile(anomaly_scores, 95)
anomalies = anomaly_scores > threshold
```
## Point vs Subsequence Anomalies
- **Point anomalies**: Single unusual values
- Use: COPOD, DWT_MLEAD, IsolationForest
- **Subsequence anomalies** (discords): Unusual patterns
- Use: STOMP, LeftSTAMPi, MERLIN
- **Collective anomalies**: Groups of points forming unusual pattern
- Use: Matrix profile methods, clustering-based
## Evaluation Metrics
Specialized metrics for anomaly detection:
```python
from aeon.benchmarking.metrics.anomaly_detection import (
range_precision,
range_recall,
range_f_score,
roc_auc_score
)
# Range-based metrics account for window detection
precision = range_precision(y_true, y_pred, alpha=0.5)
recall = range_recall(y_true, y_pred, alpha=0.5)
f1 = range_f_score(y_true, y_pred, alpha=0.5)
```
## Algorithm Selection
- **Speed priority**: KMeansAD, IsolationForest
- **Accuracy priority**: STOMP, COPOD
- **Streaming data**: LeftSTAMPi, STRAY
- **Discord discovery**: STOMP, MERLIN
- **Multi-dimensional**: COPOD, PyODAdapter
- **Semi-supervised**: ROCKAD, OneClassSVM
- **No training data**: IsolationForest, STOMP
## Best Practices
1. **Normalize data**: Many methods sensitive to scale
2. **Choose window size**: For matrix profile methods, window size critical
3. **Set threshold**: Use percentile-based or domain-specific thresholds
4. **Validate results**: Visualize detections to verify meaningfulness
5. **Handle seasonality**: Detrend/deseasonalize before detection
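For point 5, a minimal sketch of removing a trend before detection (scipy's `detrend` used for illustration):
```python
import numpy as np
from scipy.signal import detrend
from aeon.anomaly_detection import STOMP
rng = np.random.default_rng(0)
y = np.linspace(0, 5, 300) + np.sin(np.linspace(0, 30, 300)) + rng.normal(0, 0.1, 300)
# Detrend first so discord scores reflect shape changes, not level drift
scores = STOMP(window_size=20).fit_predict(detrend(y))
```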


@@ -0,0 +1,144 @@
# Time Series Classification
Aeon provides 13 categories of time series classifiers with scikit-learn compatible APIs.
## Convolution-Based Classifiers
Apply random convolutional transformations for efficient feature extraction:
- `Arsenal` - Ensemble of ROCKET classifiers with varied kernels
- `HydraClassifier` - Multi-resolution convolution with dilation
- `RocketClassifier` - Random convolution kernels with ridge regression
- `MiniRocketClassifier` - Simplified ROCKET variant for speed
- `MultiRocketClassifier` - Combines multiple ROCKET variants
**Use when**: Need fast, scalable classification with strong performance across diverse datasets.
## Deep Learning Classifiers
Neural network architectures optimized for temporal sequences:
- `FCNClassifier` - Fully convolutional network
- `ResNetClassifier` - Residual networks with skip connections
- `InceptionTimeClassifier` - Multi-scale inception modules
- `TimeCNNClassifier` - Standard CNN for time series
- `MLPClassifier` - Multi-layer perceptron baseline
- `EncoderClassifier` - Generic encoder wrapper
- `DisjointCNNClassifier` - Shapelet-focused architecture
**Use when**: Large datasets available, need end-to-end learning, or complex temporal patterns.
## Dictionary-Based Classifiers
Transform time series into symbolic representations:
- `BOSSEnsemble` - Bag-of-SFA-Symbols with ensemble voting
- `TemporalDictionaryEnsemble` - Multiple dictionary methods combined
- `WEASEL` - Word ExtrAction for time SEries cLassification
- `MrSEQLClassifier` - Multiple symbolic sequence learning
**Use when**: Need interpretable models, sparse patterns, or symbolic reasoning.
## Distance-Based Classifiers
Leverage specialized time series distance metrics:
- `KNeighborsTimeSeriesClassifier` - k-NN with temporal distances (DTW, LCSS, ERP, etc.)
- `ElasticEnsemble` - Combines multiple elastic distance measures
- `ProximityForest` - Tree ensemble using distance-based splits
**Use when**: Small datasets, need similarity-based classification, or interpretable decisions.
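A minimal sketch of the k-NN route, using the GunPoint split from the quick start below (a 20% Sakoe-Chiba window is assumed to keep DTW tractable):
```python
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
knn = KNeighborsTimeSeriesClassifier(
    n_neighbors=1,
    distance="dtw",
    distance_params={"window": 0.2}
)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))
```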
## Feature-Based Classifiers
Extract statistical and signature features before classification:
- `Catch22Classifier` - 22 canonical time-series characteristics
- `TSFreshClassifier` - Automated feature extraction via tsfresh
- `SignatureClassifier` - Path signature transformations
- `SummaryClassifier` - Summary statistics extraction
- `FreshPRINCEClassifier` - Combines multiple feature extractors
**Use when**: Need interpretable features, domain expertise available, or feature engineering approach.
## Interval-Based Classifiers
Extract features from random or supervised intervals:
- `CanonicalIntervalForestClassifier` - Random interval features with decision trees
- `DrCIFClassifier` - Diverse Representation CIF with catch22 features
- `TimeSeriesForestClassifier` - Random intervals with summary statistics
- `RandomIntervalClassifier` - Simple interval-based approach
- `RandomIntervalSpectralEnsembleClassifier` - Spectral features from intervals
- `SupervisedTimeSeriesForest` - Supervised interval selection
**Use when**: Discriminative patterns occur in specific time windows.
## Shapelet-Based Classifiers
Identify discriminative subsequences (shapelets):
- `ShapeletTransformClassifier` - Discovers and uses discriminative shapelets
- `LearningShapeletClassifier` - Learns shapelets via gradient descent
- `SASTClassifier` - Scalable and Accurate Subsequence Transform
- `RDSTClassifier` - Random dilated shapelet transform
**Use when**: Need interpretable discriminative patterns or phase-invariant features.
## Hybrid Classifiers
Combine multiple classification paradigms:
- `HIVECOTEV1` - Hierarchical Vote Collective of Transformation-based Ensembles (version 1)
- `HIVECOTEV2` - Enhanced version with updated components
**Use when**: Maximum accuracy required, computational resources available.
## Early Classification
Make predictions before observing entire time series:
- `TEASER` - Two-tier Early and Accurate Series Classifier
- `ProbabilityThresholdEarlyClassifier` - Prediction when confidence exceeds threshold
**Use when**: Real-time decisions needed, or observations have cost.
## Ordinal Classification
Handle ordered class labels:
- `OrdinalTDE` - Temporal dictionary ensemble for ordinal outputs
**Use when**: Classes have natural ordering (e.g., severity levels).
## Composition Tools
Build custom pipelines and ensembles:
- `ClassifierPipeline` - Chain transformers with classifiers
- `WeightedEnsembleClassifier` - Weighted combination of classifiers
- `SklearnClassifierWrapper` - Adapt sklearn classifiers for time series
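Because aeon estimators follow the scikit-learn API, a plain sklearn `Pipeline` also works as a composition sketch (Catch22 plus a random forest, chosen for illustration; data as in the quick start below):
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from aeon.transformations.collection.feature_based import Catch22
pipe = Pipeline([
    ("features", Catch22()),
    ("classify", RandomForestClassifier())
])
pipe.fit(X_train, y_train)
```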
## Quick Start
```python
from aeon.classification.convolution_based import RocketClassifier
from aeon.datasets import load_classification
# Load data
X_train, y_train = load_classification("GunPoint", split="train")
X_test, y_test = load_classification("GunPoint", split="test")
# Train and predict
clf = RocketClassifier()
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```
## Algorithm Selection
- **Speed priority**: MiniRocketClassifier, Arsenal
- **Accuracy priority**: HIVECOTEV2, InceptionTimeClassifier
- **Interpretability**: ShapeletTransformClassifier, Catch22Classifier
- **Small data**: KNeighborsTimeSeriesClassifier, Distance-based methods
- **Large data**: Deep learning classifiers, ROCKET variants


@@ -0,0 +1,123 @@
# Time Series Clustering
Aeon provides clustering algorithms adapted for temporal data with specialized distance metrics and averaging methods.
## Partitioning Algorithms
Standard k-means/k-medoids adapted for time series:
- `TimeSeriesKMeans` - K-means with temporal distance metrics (DTW, Euclidean, etc.)
- `TimeSeriesKMedoids` - Uses actual time series as cluster centers
- `TimeSeriesKShape` - Shape-based clustering algorithm
- `TimeSeriesKernelKMeans` - Kernel-based variant for nonlinear patterns
**Use when**: Known number of clusters, spherical cluster shapes expected.
## Large Dataset Methods
Efficient clustering for large collections:
- `TimeSeriesCLARA` - Clustering Large Applications with sampling
- `TimeSeriesCLARANS` - Randomized search variant of CLARA
**Use when**: Dataset too large for standard k-medoids, need scalability.
## Elastic Distance Clustering
Specialized for alignment-based similarity:
- `KASBA` - K-means with shift-invariant elastic averaging
- `ElasticSOM` - Self-organizing map using elastic distances
**Use when**: Time series have temporal shifts or warping.
## Spectral Methods
Graph-based clustering:
- `KSpectralCentroid` - Spectral clustering with centroid computation
**Use when**: Non-convex cluster shapes, need graph-based approach.
## Deep Learning Clustering
Neural network-based clustering with auto-encoders:
- `AEFCNClusterer` - Fully convolutional auto-encoder
- `AEResNetClusterer` - Residual network auto-encoder
- `AEDCNNClusterer` - Dilated CNN auto-encoder
- `AEDRNNClusterer` - Dilated RNN auto-encoder
- `AEBiGRUClusterer` - Bidirectional GRU auto-encoder
- `AEAttentionBiGRUClusterer` - Attention-enhanced BiGRU auto-encoder
**Use when**: Large datasets, need learned representations, or complex patterns.
## Feature-Based Clustering
Transform to feature space before clustering:
- `Catch22Clusterer` - Clusters on 22 canonical features
- `SummaryClusterer` - Uses summary statistics
- `TSFreshClusterer` - Automated tsfresh features
**Use when**: Raw time series not informative, need interpretable features.
## Composition
Build custom clustering pipelines:
- `ClustererPipeline` - Chain transformers with clusterers
## Averaging Methods
Compute cluster centers for time series:
- `mean_average` - Arithmetic mean
- `ba_average` - Barycentric averaging with DTW
- `kasba_average` - Shift-invariant averaging
- `shift_invariant_average` - General shift-invariant method
**Use when**: Need representative cluster centers for visualization or initialization.
## Quick Start
```python
from aeon.clustering import TimeSeriesKMeans
from aeon.datasets import load_classification
# Load data (using classification data for clustering)
X_train, _ = load_classification("GunPoint", split="train")
# Cluster time series
clusterer = TimeSeriesKMeans(
n_clusters=3,
distance="dtw", # Use DTW distance
averaging_method="ba" # Barycentric averaging
)
labels = clusterer.fit_predict(X_train)
centers = clusterer.cluster_centers_
```
## Algorithm Selection
- **Speed priority**: TimeSeriesKMeans with Euclidean distance
- **Temporal alignment**: KASBA, TimeSeriesKMeans with DTW
- **Large datasets**: TimeSeriesCLARA, TimeSeriesCLARANS
- **Complex patterns**: Deep learning clusterers
- **Interpretability**: Catch22Clusterer, SummaryClusterer
- **Non-convex clusters**: KSpectralCentroid
## Distance Metrics
Compatible distance metrics include:
- Euclidean, Manhattan, Minkowski (lock-step)
- DTW, DDTW, WDTW (elastic with alignment)
- ERP, EDR, LCSS (edit-based)
- MSM, TWE (specialized elastic)
## Evaluation
Use clustering metrics from sklearn or aeon benchmarking:
- Silhouette score
- Davies-Bouldin index
- Calinski-Harabasz index
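A minimal sketch of internal evaluation that respects the clusterer's metric, using a precomputed DTW distance matrix (`labels` from the fitted clusterer above):
```python
from sklearn.metrics import silhouette_score
from aeon.distances import dtw_pairwise_distance
D = dtw_pairwise_distance(X_train)
score = silhouette_score(D, labels, metric="precomputed")
```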


@@ -1,749 +0,0 @@
# Core Modules: Transformations, Distances, Networks, Datasets, and Benchmarking
This reference provides comprehensive details on foundational modules that support aeon's learning tasks.
## Transformations
Transformations convert time series into alternative representations for feature extraction, preprocessing, or visualization.
### Two Types of Transformers
**Collection Transformers**: Process entire collections of time series
- Input: `(n_cases, n_channels, n_timepoints)`
- Output: Features, transformed collections, or tabular data
**Series Transformers**: Work on individual time series
- Input: Single time series
- Output: Transformed single series
### Collection-Level Transformations
#### ROCKET (RAndom Convolutional KErnel Transform)
Fast feature extraction via random convolutional kernels:
```python
from aeon.transformations.collection.convolution_based import Rocket
rocket = Rocket(num_kernels=10000, n_jobs=-1)
X_transformed = rocket.fit_transform(X_train)
# Output shape: (n_cases, 2 * num_kernels)
```
**Variants:**
```python
from aeon.transformations.collection.convolution_based import (
MiniRocket,
MultiRocket,
Hydra
)
# MiniRocket: Faster, streamlined version
minirocket = MiniRocket(num_kernels=10000)
X_features = minirocket.fit_transform(X_train)
# MultiRocket: Multivariate extensions
multirocket = MultiRocket(num_kernels=10000)
X_features = multirocket.fit_transform(X_train)
# Hydra: Dictionary-based convolution
hydra = Hydra(n_kernels=8)
X_features = hydra.fit_transform(X_train)
```
#### Catch22
22 canonical time series features:
```python
from aeon.transformations.collection.feature_based import Catch22
catch22 = Catch22(n_jobs=-1)
X_features = catch22.fit_transform(X_train)
# Output shape: (n_cases, 22)
```
**Feature categories:**
- Distribution (mean, variance, skewness)
- Autocorrelation properties
- Entropy measures
- Nonlinear dynamics
- Spectral properties
#### TSFresh
Comprehensive feature extraction (779 features):
```python
from aeon.transformations.collection.feature_based import TSFresh
tsfresh = TSFresh(
default_fc_parameters="comprehensive",
n_jobs=-1
)
X_features = tsfresh.fit_transform(X_train)
```
**Warning**: Slow on large datasets; use Catch22 for faster alternative
#### FreshPRINCE
Fresh Pipelines with Random Interval and Catch22 Features:
```python
from aeon.transformations.collection.feature_based import FreshPRINCE
freshprince = FreshPRINCE(n_intervals=50, n_jobs=-1)
X_features = freshprince.fit_transform(X_train)
```
#### Shapelet Transform
Extract discriminative subsequences:
```python
from aeon.transformations.collection.shapelet_based import ShapeletTransform
shapelet = ShapeletTransform(
n_shapelet_samples=10000,
max_shapelets=20,
n_jobs=-1
)
X_features = shapelet.fit_transform(X_train, y_train)
# Requires labels for supervised shapelet discovery
```
**Random Shapelet Transform**:
```python
from aeon.transformations.collection.shapelet_based import RandomShapeletTransform
rst = RandomShapeletTransform(n_shapelets=1000)
X_features = rst.fit_transform(X_train)
```
#### SAST (Shapelet-Attention Subsequence Transform)
Attention-based shapelet discovery:
```python
from aeon.transformations.collection.shapelet_based import SAST
sast = SAST(window_size=0.1, n_shapelets=100)
X_features = sast.fit_transform(X_train, y_train)
```
#### Symbolic Representations
**SAX (Symbolic Aggregate approXimation)**:
```python
from aeon.transformations.collection.dictionary_based import SAX
sax = SAX(n_segments=8, alphabet_size=4)
X_symbolic = sax.fit_transform(X_train)
```
**PAA (Piecewise Aggregate Approximation)**:
```python
from aeon.transformations.collection.dictionary_based import PAA
paa = PAA(n_segments=10)
X_approximated = paa.fit_transform(X_train)
```
**SFA (Symbolic Fourier Approximation)**:
```python
from aeon.transformations.collection.dictionary_based import SFA
sfa = SFA(word_length=8, alphabet_size=4)
X_symbolic = sfa.fit_transform(X_train)
```
#### Channel Selection and Operations
**Channel Selection**:
```python
from aeon.transformations.collection.channel_selection import ChannelSelection
selector = ChannelSelection(channels=[0, 2, 5])
X_selected = selector.fit_transform(X_train)
```
**Channel Scoring**:
```python
from aeon.transformations.collection.channel_selection import ChannelScorer
scorer = ChannelScorer()
scores = scorer.fit_transform(X_train, y_train)
```
#### Data Balancing
**SMOTE (Synthetic Minority Over-sampling)**:
```python
from aeon.transformations.collection.smote import SMOTE
smote = SMOTE(k_neighbors=5)
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)
```
**ADASYN**:
```python
from aeon.transformations.collection.smote import ADASYN
adasyn = ADASYN(n_neighbors=5)
X_resampled, y_resampled = adasyn.fit_resample(X_train, y_train)
```
### Series-Level Transformations
#### Smoothing Filters
**Moving Average**:
```python
from aeon.transformations.series.moving_average import MovingAverage
ma = MovingAverage(window_size=5)
X_smoothed = ma.fit_transform(X_series)
```
**Exponential Smoothing**:
```python
from aeon.transformations.series.exponent import ExponentTransformer
exp_smooth = ExponentTransformer(power=0.5)
X_smoothed = exp_smooth.fit_transform(X_series)
```
**Savitzky-Golay Filter**:
```python
from aeon.transformations.series.savgol import SavitzkyGolay
savgol = SavitzkyGolay(window_length=11, polyorder=3)
X_smoothed = savgol.fit_transform(X_series)
```
**Gaussian Filter**:
```python
from aeon.transformations.series.gaussian import GaussianFilter
gaussian = GaussianFilter(sigma=2.0)
X_smoothed = gaussian.fit_transform(X_series)
```
#### Statistical Transforms
**Box-Cox Transformation**:
```python
from aeon.transformations.series.boxcox import BoxCoxTransformer
boxcox = BoxCoxTransformer()
X_transformed = boxcox.fit_transform(X_series)
```
**AutoCorrelation**:
```python
from aeon.transformations.series.acf import AutoCorrelationTransformer
acf = AutoCorrelationTransformer(n_lags=40)
X_acf = acf.fit_transform(X_series)
```
**PCA (Principal Component Analysis)**:
```python
from aeon.transformations.series.pca import PCATransformer
pca = PCATransformer(n_components=3)
X_reduced = pca.fit_transform(X_series)
```
#### Approximation Methods
**Discrete Fourier Transform (DFT)**:
```python
from aeon.transformations.series.fourier import FourierTransform
dft = FourierTransform()
X_freq = dft.fit_transform(X_series)
```
**Piecewise Linear Approximation (PLA)**:
```python
from aeon.transformations.series.pla import PLA
pla = PLA(n_segments=10)
X_approx = pla.fit_transform(X_series)
```
#### Anomaly Detection Transform
**DOBIN (Distance-based Outlier BasIs using Neighbors)**:
```python
from aeon.transformations.series.dobin import DOBIN
dobin = DOBIN()
X_transformed = dobin.fit_transform(X_series)
```
### Transformation Pipelines
Chain transformers together:
```python
from sklearn.pipeline import Pipeline
from aeon.transformations.collection import Catch22, PCA
pipeline = Pipeline([
('features', Catch22()),
('reduce', PCA(n_components=10))
])
X_transformed = pipeline.fit_transform(X_train)
```
## Distance Metrics
Specialized distance functions for time series similarity measurement.
### Distance Categories
#### Warping-Based Distances
**DTW (Dynamic Time Warping)**:
```python
from aeon.distances import dtw_distance, dtw_pairwise_distance
# Compute distance between two series
dist = dtw_distance(series1, series2, window=0.2)
# Pairwise distances for a collection
dist_matrix = dtw_pairwise_distance(X_collection)
# Get alignment path
from aeon.distances import dtw_alignment_path
path = dtw_alignment_path(series1, series2)
# Get cost matrix
from aeon.distances import dtw_cost_matrix
cost = dtw_cost_matrix(series1, series2)
```
**DTW Variants**:
```python
from aeon.distances import (
wdtw_distance, # Weighted DTW
ddtw_distance, # Derivative DTW
wddtw_distance, # Weighted Derivative DTW
adtw_distance, # Amerced DTW
shape_dtw_distance # Shape DTW
)
# Weighted DTW (penalize warping)
dist = wdtw_distance(series1, series2, g=0.05)
# Derivative DTW (compare shapes)
dist = ddtw_distance(series1, series2)
# Shape DTW (with shape descriptors)
dist = shape_dtw_distance(series1, series2)
```
**DTW Parameters**:
- `window`: Sakoe-Chiba band constraint (0.0-1.0)
- `g`: Penalty weight for warping distances
#### Edit Distances
**ERP (Edit distance with Real Penalty)**:
```python
from aeon.distances import erp_distance
dist = erp_distance(series1, series2, g=0.0, window=None)
```
**EDR (Edit Distance on Real sequences)**:
```python
from aeon.distances import edr_distance
dist = edr_distance(series1, series2, epsilon=0.1, window=None)
```
**LCSS (Longest Common SubSequence)**:
```python
from aeon.distances import lcss_distance
dist = lcss_distance(series1, series2, epsilon=1.0, window=None)
```
**TWE (Time Warp Edit)**:
```python
from aeon.distances import twe_distance
dist = twe_distance(series1, series2, penalty=0.1, stiffness=0.001)
```
#### Standard Metrics
```python
from aeon.distances import (
euclidean_distance,
manhattan_distance,
minkowski_distance,
squared_distance
)
# Euclidean distance
dist = euclidean_distance(series1, series2)
# Manhattan (L1) distance
dist = manhattan_distance(series1, series2)
# Minkowski distance
dist = minkowski_distance(series1, series2, p=3)
# Squared Euclidean
dist = squared_distance(series1, series2)
```
#### Specialized Distances
**MSM (Move-Split-Merge)**:
```python
from aeon.distances import msm_distance
dist = msm_distance(series1, series2, c=1.0)
```
**SBD (Shape-Based Distance)**:
```python
from aeon.distances import sbd_distance
dist = sbd_distance(series1, series2)
```
### Unified Distance Interface
```python
from aeon.distances import distance, pairwise_distance
# Compute any distance by name
dist = distance(series1, series2, metric="dtw", window=0.1)
# Pairwise distance matrix
dist_matrix = pairwise_distance(X_collection, metric="euclidean")
# Get available distance names
from aeon.distances import get_distance_function_names
available_distances = get_distance_function_names()
```
### Distance Selection Guide
**Fast and accurate**:
- Euclidean for aligned series
- Squared for even faster computation
**Handle temporal shifts**:
- DTW for general warping
- WDTW to penalize excessive warping
**Shape-based similarity**:
- DDTW or Shape DTW
- SBD for normalized shape comparison
**Robust to noise**:
- ERP, EDR, or LCSS
**Multivariate**:
- DTW supports multivariate via independent/dependent alignment
## Deep Learning Networks
Neural network architectures specialized for time series.
### Network Architectures
#### InceptionTime
Ensemble of Inception modules capturing multi-scale patterns:
```python
from aeon.networks import InceptionNetwork
from aeon.classification.deep_learning import InceptionTimeClassifier
# Use via classifier
clf = InceptionTimeClassifier(
n_epochs=200,
batch_size=64,
n_ensemble=5
)
# Or use network directly
network = InceptionNetwork(
n_classes=3,
n_channels=1,
n_timepoints=100
)
```
#### ResNet
Residual networks with skip connections:
```python
from aeon.networks import ResNetNetwork
from aeon.classification.deep_learning import ResNetClassifier
clf = ResNetClassifier(
n_epochs=200,
batch_size=64,
n_res_blocks=3
)
```
#### FCN (Fully Convolutional Network)
```python
from aeon.networks import FCNNetwork
from aeon.classification.deep_learning import FCNClassifier
clf = FCNClassifier(
n_epochs=200,
batch_size=64,
n_conv_layers=3
)
```
#### CNN
Standard convolutional architecture:
```python
from aeon.classification.deep_learning import CNNClassifier
clf = CNNClassifier(
n_epochs=100,
batch_size=32,
kernel_size=7,
n_filters=32
)
```
#### TapNet
Attentional prototype networks:
```python
from aeon.classification.deep_learning import TapNetClassifier
clf = TapNetClassifier(
n_epochs=200,
batch_size=64
)
```
#### MLP (Multi-Layer Perceptron)
```python
from aeon.classification.deep_learning import MLPClassifier
clf = MLPClassifier(
n_epochs=100,
batch_size=32,
hidden_layer_sizes=[500]
)
```
#### LITE (Light Inception with boosTing tEchnique)
Lightweight ensemble network:
```python
from aeon.classification.deep_learning import LITEClassifier
clf = LITEClassifier(
n_epochs=100,
batch_size=64
)
```
### Training Configuration
```python
from aeon.classification.deep_learning import InceptionTimeClassifier
clf = InceptionTimeClassifier(
n_epochs=200,
batch_size=64,
learning_rate=0.001,
use_bias=True,
verbose=1
)
clf.fit(X_train, y_train)
```
**Common parameters:**
- `n_epochs`: Training iterations
- `batch_size`: Samples per gradient update
- `learning_rate`: Optimizer learning rate
- `verbose`: Training output verbosity
- `callbacks`: Keras callbacks (early stopping, etc.)
## Datasets
Load built-in datasets and access UCR/UEA archives.
### Built-in Datasets
```python
from aeon.datasets import (
load_arrow_head,
load_airline,
load_gunpoint,
load_italy_power_demand,
load_basic_motions,
load_japanese_vowels
)
# Classification dataset
X_train, y_train = load_arrow_head(split="train")
X_test, y_test = load_arrow_head(split="test")
# Forecasting dataset (univariate series)
y = load_airline()
# Multivariate classification
X_train, y_train = load_basic_motions(split="train")
print(X_train.shape) # (n_cases, n_channels, n_timepoints)
```
### UCR/UEA Archives
Access 100+ benchmark datasets:
```python
from aeon.datasets import load_from_tsfile, load_classification
# Load UCR/UEA dataset by name
X_train, y_train = load_classification("GunPoint", split="train")
X_test, y_test = load_classification("GunPoint", split="test")
# Load from local .ts file
X, y = load_from_tsfile("data/my_dataset_TRAIN.ts")
```
### Dataset Information
```python
from aeon.datasets import get_dataset_meta_data
# Get metadata about a dataset
info = get_dataset_meta_data("GunPoint")
print(info)
# {'n_cases': 150, 'n_timepoints': 150, 'n_classes': 2, ...}
```
### Custom Dataset Format
Save/load custom datasets in aeon format:
```python
from aeon.datasets import write_to_tsfile, load_from_tsfile
# Save
write_to_tsfile(
X_train,
"my_dataset_TRAIN.ts",
y=y_train,
problem_name="MyDataset"
)
# Load
X, y = load_from_tsfile("my_dataset_TRAIN.ts")
```
## Benchmarking
Tools for reproducible evaluation and comparison.
### Benchmarking Utilities
```python
from aeon.benchmarking import benchmark_estimator
# Benchmark a classifier on multiple datasets
results = benchmark_estimator(
estimator=RocketClassifier(),
datasets=["GunPoint", "ArrowHead", "ItalyPowerDemand"],
n_resamples=10
)
```
### Result Storage and Comparison
```python
from aeon.benchmarking import (
write_results_to_csv,
read_results_from_csv,
compare_results
)
# Save results
write_results_to_csv(results, "results.csv")
# Load and compare
results_rocket = read_results_from_csv("results_rocket.csv")
results_inception = read_results_from_csv("results_inception.csv")
comparison = compare_results(
[results_rocket, results_inception],
estimator_names=["ROCKET", "InceptionTime"]
)
```
### Critical Difference Diagrams
Visualize statistical significance of differences:
```python
from aeon.benchmarking.results_plotting import plot_critical_difference_diagram
plot_critical_difference_diagram(
results_dict={
'ROCKET': results_rocket,
'InceptionTime': results_inception,
'BOSS': results_boss
},
dataset_names=["GunPoint", "ArrowHead", "ItalyPowerDemand"]
)
```
## Discovery and Tags
### Finding Estimators
```python
from aeon.utils.discovery import all_estimators
# Get all classifiers
classifiers = all_estimators(type_filter="classifier")
# Get all transformers
transformers = all_estimators(type_filter="transformer")
# Filter by capability tags
multivariate_classifiers = all_estimators(
type_filter="classifier",
filter_tags={"capability:multivariate": True}
)
```
### Checking Estimator Tags
```python
from aeon.utils.tags import all_tags_for_estimator
from aeon.classification.convolution_based import RocketClassifier
tags = all_tags_for_estimator(RocketClassifier)
print(tags)
# {'capability:multivariate': True, 'X_inner_type': ['numpy3D'], ...}
```
### Common Tags
- `capability:multivariate`: Handles multivariate series
- `capability:unequal_length`: Handles variable-length series
- `capability:missing_values`: Handles missing data
- `algorithm_type`: Algorithm family (e.g., "convolution", "distance")
- `python_dependencies`: Required packages


@@ -0,0 +1,387 @@
# Datasets and Benchmarking
Aeon provides comprehensive tools for loading datasets and benchmarking time series algorithms.
## Dataset Loading
### Task-Specific Loaders
**Classification Datasets**:
```python
from aeon.datasets import load_classification
# Load train/test split
X_train, y_train = load_classification("GunPoint", split="train")
X_test, y_test = load_classification("GunPoint", split="test")
# Load entire dataset
X, y = load_classification("GunPoint")
```
**Regression Datasets**:
```python
from aeon.datasets import load_regression
X_train, y_train = load_regression("Covid3Month", split="train")
X_test, y_test = load_regression("Covid3Month", split="test")
# Bulk download
from aeon.datasets import download_all_regression
download_all_regression() # Downloads Monash TSER archive
```
**Forecasting Datasets**:
```python
from aeon.datasets import load_forecasting
# Load from forecastingdata.org
y, X = load_forecasting("airline", return_X_y=True)
```
**Anomaly Detection Datasets**:
```python
from aeon.datasets import load_anomaly_detection
X, y = load_anomaly_detection("NAB_realKnownCause")
```
### File Format Loaders
**Load from .ts files**:
```python
from aeon.datasets import load_from_ts_file
X, y = load_from_ts_file("path/to/data.ts")
```
**Load from .tsf files**:
```python
from aeon.datasets import load_from_tsf_file
df, metadata = load_from_tsf_file("path/to/data.tsf")
```
**Load from ARFF files**:
```python
from aeon.datasets import load_from_arff_file
X, y = load_from_arff_file("path/to/data.arff")
```
**Load from TSV files**:
```python
from aeon.datasets import load_from_tsv_file
data = load_from_tsv_file("path/to/data.tsv")
```
**Load TimeEval CSV**:
```python
from aeon.datasets import load_from_timeeval_csv_file
X, y = load_from_timeeval_csv_file("path/to/timeeval.csv")
```
### Writing Datasets
**Write to .ts format**:
```python
from aeon.datasets import write_to_ts_file
write_to_ts_file(X, "output.ts", y=y, problem_name="MyDataset")
```
**Write to ARFF format**:
```python
from aeon.datasets import write_to_arff_file
write_to_arff_file(X, "output.arff", y=y)
```
## Built-in Datasets
Aeon includes several benchmark datasets for quick testing:
### Classification
- `ArrowHead` - Shape classification
- `GunPoint` - Gesture recognition
- `ItalyPowerDemand` - Energy demand
- `BasicMotions` - Motion classification
- And 100+ more from UCR/UEA archives
### Regression
- `Covid3Month` - COVID forecasting
- Various datasets from Monash TSER archive
### Segmentation
- Time series segmentation datasets
- Human activity data
- Sensor data collections
### Special Collections
- `RehabPile` - Rehabilitation data (classification & regression)
## Dataset Metadata
Get information about datasets:
```python
from aeon.datasets import get_dataset_meta_data
metadata = get_dataset_meta_data("GunPoint")
print(metadata)
# {'n_train': 50, 'n_test': 150, 'length': 150, 'n_classes': 2, ...}
```
## Benchmarking Tools
### Loading Published Results
Access pre-computed benchmark results:
```python
from aeon.benchmarking import get_estimator_results, get_available_estimators
# Get results for specific algorithm on dataset
results = get_estimator_results(
estimator_name="ROCKET",
dataset_name="GunPoint"
)
# List estimators with published results for a task
estimators = get_available_estimators(task="classification")
```
### Resampling Strategies
Create reproducible train/test splits:
```python
from aeon.benchmarking import stratified_resample
# Stratified resampling maintaining class distribution
X_train, X_test, y_train, y_test = stratified_resample(
X, y,
random_state=42,
test_size=0.3
)
```
### Performance Metrics
Specialized metrics for time series tasks:
**Anomaly Detection Metrics**:
```python
from aeon.benchmarking.metrics.anomaly_detection import (
range_precision,
range_recall,
range_f_score,
range_roc_auc_score
)
# Range-based metrics for window detection
precision = range_precision(y_true, y_pred, alpha=0.5)
recall = range_recall(y_true, y_pred, alpha=0.5)
f1 = range_f_score(y_true, y_pred, alpha=0.5)
auc = range_roc_auc_score(y_true, y_scores)
```
**Clustering Metrics**:
```python
from aeon.benchmarking.metrics.clustering import clustering_accuracy
# Clustering accuracy with label matching
accuracy = clustering_accuracy(y_true, y_pred)
```
**Segmentation Metrics**:
```python
from aeon.benchmarking.metrics.segmentation import (
count_error,
hausdorff_error
)
# Number of change points difference
count_err = count_error(y_true, y_pred)
# Maximum distance between predicted and true change points
hausdorff_err = hausdorff_error(y_true, y_pred)
```
### Statistical Testing
Post-hoc analysis for algorithm comparison:
```python
from aeon.benchmarking import (
nemenyi_test,
wilcoxon_test
)
# Nemenyi test for multiple algorithms
results = nemenyi_test(scores_matrix, alpha=0.05)
# Pairwise Wilcoxon signed-rank test
stat, p_value = wilcoxon_test(scores_alg1, scores_alg2)
```
## Benchmark Collections
### UCR/UEA Time Series Archives
Access to comprehensive benchmark repositories:
```python
from aeon.datasets import load_classification

# Classification: 112 univariate + 30 multivariate datasets
X_train, y_train = load_classification("Chinatown", split="train")
# Automatically downloads from timeseriesclassification.com
```
### Monash Forecasting Archive
```python
from aeon.datasets import load_forecasting

# Load forecasting datasets
y = load_forecasting("nn5_daily")
```
### Published Benchmark Results
Pre-computed results from major competitions:
- 2017 Univariate Bake-off
- 2021 Multivariate Classification
- 2023 Univariate Bake-off
## Workflow Example
Complete benchmarking workflow:
```python
from aeon.datasets import load_classification
from aeon.classification.convolution_based import RocketClassifier
from aeon.benchmarking import get_estimator_results
from sklearn.metrics import accuracy_score
# Load dataset
dataset_name = "GunPoint"
X_train, y_train = load_classification(dataset_name, split="train")
X_test, y_test = load_classification(dataset_name, split="test")
# Train model
clf = RocketClassifier(n_kernels=10000, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
# Evaluate
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
# Compare with published results
published = get_estimator_results(estimators=["ROCKET"], datasets=[dataset_name])
print(f"Published ROCKET accuracy: {published['ROCKET'][dataset_name]:.4f}")
```
## Best Practices
### 1. Use Standard Splits
For reproducibility, use provided train/test splits:
```python
# Good: Use standard splits
X_train, y_train = load_classification("GunPoint", split="train")
X_test, y_test = load_classification("GunPoint", split="test")
# Avoid: Creating custom splits
X, y = load_classification("GunPoint")
X_train, X_test, y_train, y_test = train_test_split(X, y)
```
### 2. Set Random Seeds
Ensure reproducibility:
```python
clf = RocketClassifier(random_state=42)
splits = stratified_resample_data(X_train, y_train, X_test, y_test, random_state=42)
```
### 3. Report Multiple Metrics
Don't rely on a single metric:
```python
from sklearn.metrics import accuracy_score, f1_score, precision_score
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred, average='weighted')
precision = precision_score(y_test, y_pred, average='weighted')
```
### 4. Cross-Validation
For robust evaluation on small datasets:
```python
from sklearn.model_selection import cross_val_score
scores = cross_val_score(
clf, X_train, y_train,
cv=5,
scoring='accuracy'
)
print(f"CV Accuracy: {scores.mean():.4f} (+/- {scores.std():.4f})")
```
### 5. Compare Against Baselines
Always compare with simple baselines:
```python
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
# Simple baseline: 1-NN with Euclidean distance
baseline = KNeighborsTimeSeriesClassifier(n_neighbors=1, distance="euclidean")
baseline.fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)
print(f"Baseline: {baseline_acc:.4f}")
print(f"Your model: {accuracy:.4f}")
```
### 6. Statistical Significance
Test if improvements are statistically significant:
```python
from aeon.benchmarking import wilcoxon_test
# Run on multiple datasets
accuracies_alg1 = [0.85, 0.92, 0.78, 0.88]
accuracies_alg2 = [0.83, 0.90, 0.76, 0.86]
stat, p_value = wilcoxon_test(accuracies_alg1, accuracies_alg2)
if p_value < 0.05:
print("Difference is statistically significant")
```
## Dataset Discovery
Find datasets matching criteria:
```python
# List all available classification datasets
from aeon.datasets import get_available_datasets, get_dataset_meta_data
datasets = get_available_datasets("classification")
print(f"Found {len(datasets)} classification datasets")
# Filter by properties
univariate_datasets = [
d for d in datasets
if get_dataset_meta_data(d)['n_channels'] == 1
]
```

View File

@@ -0,0 +1,256 @@
# Distance Metrics
Aeon provides specialized distance functions for measuring similarity between time series, compatible with both aeon and scikit-learn estimators.
## Distance Categories
### Elastic Distances
Allow flexible temporal alignment between series:
**Dynamic Time Warping Family:**
- `dtw` - Classic Dynamic Time Warping
- `ddtw` - Derivative DTW (compares derivatives)
- `wdtw` - Weighted DTW (penalizes warping by location)
- `wddtw` - Weighted Derivative DTW
- `shape_dtw` - Shape-based DTW
**Edit-Based:**
- `erp` - Edit distance with Real Penalty
- `edr` - Edit Distance on Real sequences
- `lcss` - Longest Common SubSequence
- `twe` - Time Warp Edit distance
**Specialized:**
- `msm` - Move-Split-Merge distance
- `adtw` - Amerced DTW
- `sbd` - Shape-Based Distance
**Use when**: Time series may have temporal shifts, speed variations, or phase differences.
### Lock-Step Distances
Compare time series point-by-point without alignment:
- `euclidean` - Euclidean distance (L2 norm)
- `manhattan` - Manhattan distance (L1 norm)
- `minkowski` - Generalized Minkowski distance (Lp norm)
- `squared` - Squared Euclidean distance
**Use when**: Series already aligned, need computational speed, or no temporal warping expected.
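To see why the distinction matters, compare the two families on a pair of phase-shifted sine waves: DTW re-aligns the peaks, while Euclidean distance penalizes the shift (illustrative sketch):
```python
import numpy as np
from aeon.distances import dtw_distance, euclidean_distance

t = np.linspace(0, 2 * np.pi, 100)
x = np.sin(t)
y = np.sin(t + 0.5)  # same shape, shifted in phase

print(euclidean_distance(x, y))  # penalized by the misalignment
print(dtw_distance(x, y))        # much smaller: warping re-aligns the peaks
```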
## Usage Patterns
### Computing Single Distance
```python
from aeon.distances import dtw_distance
# Distance between two time series
distance = dtw_distance(x, y)
# With window constraint (Sakoe-Chiba band)
distance = dtw_distance(x, y, window=0.1)
```
### Pairwise Distance Matrix
```python
from aeon.distances import dtw_pairwise_distance
# All pairwise distances in collection
X = [series1, series2, series3, series4]
distance_matrix = dtw_pairwise_distance(X)
# Cross-collection distances
distance_matrix = dtw_pairwise_distance(X_train, X_test)
```
### Cost Matrix and Alignment Path
```python
from aeon.distances import dtw_cost_matrix, dtw_alignment_path
# Get full cost matrix
cost_matrix = dtw_cost_matrix(x, y)
# Get optimal alignment path
path, dist = dtw_alignment_path(x, y)
# path is a list of index pairs: [(0, 0), (1, 1), (2, 1), (2, 2), ...]
```
### Using with Estimators
```python
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
# Use DTW distance in classifier
clf = KNeighborsTimeSeriesClassifier(
n_neighbors=5,
distance="dtw",
distance_params={"window": 0.2}
)
clf.fit(X_train, y_train)
```
## Distance Parameters
### Window Constraints
Limit warping path deviation (improves speed and prevents pathological warping):
```python
# Sakoe-Chiba band: window as fraction of series length
dtw_distance(x, y, window=0.1) # Allow 10% deviation
# Itakura parallelogram: slopes constrain path
dtw_distance(x, y, itakura_max_slope=2.0)
```
### Normalization
z-normalize series before computing a distance when amplitude and offset should not matter:
```python
# z-normalize each series, then compare shape only
x_norm = (x - x.mean()) / x.std()
y_norm = (y - y.mean()) / y.std()
distance = dtw_distance(x_norm, y_norm)
```
### Distance-Specific Parameters
```python
from aeon.distances import erp_distance, twe_distance, lcss_distance

# ERP: penalty for gaps
distance = erp_distance(x, y, g=0.5)
# TWE: stiffness and penalty parameters
distance = twe_distance(x, y, nu=0.001, lmbda=1.0)
# LCSS: epsilon threshold for matching
distance = lcss_distance(x, y, epsilon=0.5)
```
## Algorithm Selection
### By Use Case:
**Temporal misalignment**: DTW, DDTW, WDTW
**Speed variations**: DTW with window constraint
**Shape similarity**: Shape DTW, SBD
**Edit operations**: ERP, EDR, LCSS
**Derivative matching**: DDTW
**Computational speed**: Euclidean, Manhattan
**Outlier robustness**: Manhattan, LCSS
### By Computational Cost:
**Fastest**: Euclidean (O(n))
**Fast**: Constrained DTW (O(nw) where w is window)
**Medium**: Full DTW (O(n²))
**Slower**: Complex elastic distances (ERP, TWE, MSM)
## Quick Reference Table
| Distance | Alignment | Speed | Robustness | Interpretability |
|----------|-----------|-------|------------|------------------|
| Euclidean | Lock-step | Very Fast | Low | High |
| DTW | Elastic | Medium | Medium | Medium |
| DDTW | Elastic | Medium | High | Medium |
| WDTW | Elastic | Medium | Medium | Medium |
| ERP | Edit-based | Slow | High | Low |
| LCSS | Edit-based | Slow | Very High | Low |
| Shape DTW | Elastic | Medium | Medium | High |
## Best Practices
### 1. Normalization
Most distances are sensitive to scale; normalize when appropriate:
```python
from aeon.transformations.collection import Normalizer
normalizer = Normalizer()
X_normalized = normalizer.fit_transform(X)
```
### 2. Window Constraints
For DTW variants, use window constraints for speed and better generalization:
```python
# Start with 10-20% window
distance = dtw_distance(x, y, window=0.1)
```
### 3. Series Length
- Equal-length required: Most lock-step distances
- Unequal-length supported: Elastic distances (DTW, ERP, etc.)
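For example, elastic distances accept series of different lengths, while lock-step distances require equal lengths (illustrative sketch):
```python
import numpy as np
from aeon.distances import dtw_distance

rng = np.random.default_rng(42)
x = rng.standard_normal(100)
y = rng.standard_normal(80)  # different length

d = dtw_distance(x, y)       # fine: elastic alignment handles unequal lengths
# euclidean_distance(x, y)   # would fail: point-by-point comparison
```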
### 4. Multivariate Series
Most distances support multivariate time series:
```python
# x.shape = (n_channels, n_timepoints)
distance = dtw_distance(x_multivariate, y_multivariate)
```
### 5. Performance Optimization
- Use numba-compiled implementations (default in aeon)
- Consider lock-step distances if alignment not needed
- Use windowed DTW instead of full DTW
- Precompute distance matrices for repeated use
### 6. Choosing the Right Distance
```python
# Quick decision guide (illustrative pseudocode; the flags are placeholders)
if series_aligned:
    distance = "euclidean"
elif need_speed:
    distance = "dtw"     # with a window constraint
elif temporal_shifts_expected:
    distance = "dtw"     # or "shape_dtw"
elif outliers_present:
    distance = "lcss"    # or "manhattan"
elif derivatives_matter:
    distance = "ddtw"    # or "wddtw"
```
## Integration with scikit-learn
Aeon distances work with sklearn estimators:
```python
from sklearn.neighbors import KNeighborsClassifier
from aeon.distances import dtw_pairwise_distance
# Precompute the train distance matrix
X_train_distances = dtw_pairwise_distance(X_train)
# Use with sklearn
clf = KNeighborsClassifier(metric='precomputed')
clf.fit(X_train_distances, y_train)
# Prediction needs test-to-train distances
X_test_distances = dtw_pairwise_distance(X_test, X_train)
y_pred = clf.predict(X_test_distances)
```
## Available Distance Functions
Get list of all available distances:
```python
from aeon.distances import get_distance_function_names
print(get_distance_function_names())
# ['dtw', 'ddtw', 'wdtw', 'euclidean', 'erp', 'edr', ...]
```
Retrieve specific distance function:
```python
from aeon.distances import get_distance_function
distance_func = get_distance_function("dtw")
result = distance_func(x, y, window=0.1)
```

View File

@@ -0,0 +1,140 @@
# Time Series Forecasting
Aeon provides forecasting algorithms for predicting future time series values.
## Naive and Baseline Methods
Simple forecasting strategies for comparison:
- `NaiveForecaster` - Multiple strategies: last value, mean, seasonal naive
- Parameters: `strategy` ("last", "mean", "seasonal"), `sp` (seasonal period)
- **Use when**: Establishing baselines or simple patterns
## Statistical Models
Classical time series forecasting methods:
### ARIMA
- `ARIMA` - AutoRegressive Integrated Moving Average
- Parameters: `p` (AR order), `d` (differencing), `q` (MA order)
- **Use when**: Linear patterns, stationary or difference-stationary series
### Exponential Smoothing
- `ETS` - Error-Trend-Seasonal decomposition
- Parameters: `error`, `trend`, `seasonal` types
- **Use when**: Trend and seasonal patterns present
### Threshold Autoregressive
- `TAR` - Threshold Autoregressive model for regime switching
- `AutoTAR` - Automated threshold discovery
- **Use when**: Series exhibits different behaviors in different regimes
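A regime-switching sketch; the module path `aeon.forecasting.tar` is assumed here (check the API reference):
```python
from aeon.forecasting.tar import AutoTAR  # path assumed

forecaster = AutoTAR()  # searches for the threshold automatically
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3])
```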
### Theta Method
- `Theta` - Classical Theta forecasting
- Parameters: `theta`, `weights` for decomposition
- **Use when**: Simple but effective baseline needed
### Time-Varying Parameter
- `TVP` - Time-varying parameter model with Kalman filtering
- **Use when**: Parameters change over time
## Deep Learning Forecasters
Neural networks for complex temporal patterns:
- `TCNForecaster` - Temporal Convolutional Network
- Dilated convolutions for large receptive fields
- **Use when**: Long sequences, need non-recurrent architecture
- `DeepARNetwork` - Probabilistic forecasting with RNNs
- Provides prediction intervals
- **Use when**: Need probabilistic forecasts, uncertainty quantification
## Regression-Based Forecasting
Apply regression to lagged features:
- `RegressionForecaster` - Wraps regressors for forecasting
- Parameters: `window_length`, `horizon`
- **Use when**: Want to use any regressor as forecaster
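A reduction sketch: the wrapper slides a window over the history and trains the regressor on (window, next value) pairs. The module path and parameter names below are assumptions; verify them against the API reference:
```python
from aeon.forecasting.compose import RegressionForecaster  # path assumed
from sklearn.ensemble import RandomForestRegressor

# Any sklearn-style regressor becomes a forecaster via lagged windows
forecaster = RegressionForecaster(
    regressor=RandomForestRegressor(n_estimators=100),
    window_length=10
)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3])
```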
## Quick Start
```python
from aeon.forecasting.naive import NaiveForecaster
from aeon.forecasting.arima import ARIMA
import numpy as np
# Create time series
y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# Naive baseline
naive = NaiveForecaster(strategy="last")
naive.fit(y)
forecast_naive = naive.predict(fh=[1, 2, 3])
# ARIMA model
arima = ARIMA(order=(1, 1, 1))
arima.fit(y)
forecast_arima = arima.predict(fh=[1, 2, 3])
```
## Forecasting Horizon
The forecasting horizon (`fh`) specifies which future time points to predict:
```python
# Relative horizon (next 3 steps)
fh = [1, 2, 3]
# Absolute horizon (specific time indices)
from aeon.forecasting.base import ForecastingHorizon
fh = ForecastingHorizon([11, 12, 13], is_relative=False)
```
## Model Selection
- **Baseline**: NaiveForecaster with seasonal strategy
- **Linear patterns**: ARIMA
- **Trend + seasonality**: ETS
- **Regime changes**: TAR, AutoTAR
- **Complex patterns**: TCNForecaster
- **Probabilistic**: DeepARNetwork
- **Long sequences**: TCNForecaster
- **Short sequences**: ARIMA, ETS
## Evaluation Metrics
Use standard forecasting error metrics, e.g. from scikit-learn:
```python
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    mean_absolute_percentage_error
)
# Calculate error
mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
mape = mean_absolute_percentage_error(y_true, y_pred)
```
## Exogenous Variables
Many forecasters support exogenous features:
```python
# Train with exogenous variables
forecaster.fit(y, X=X_train)
# Predict requires future exogenous values
y_pred = forecaster.predict(fh=[1, 2, 3], X=X_test)
```
## Base Classes
- `BaseForecaster` - Abstract base for all forecasters
- `BaseDeepForecaster` - Base for deep learning forecasters
Extend these to implement custom forecasting algorithms.
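A minimal sketch of the extension pattern, assuming the template uses underscore hooks (`_fit`, `_predict`) as in aeon's estimator templates; confirm the exact contract in the extension docs:
```python
import numpy as np
from aeon.forecasting.base import BaseForecaster

class MeanForecaster(BaseForecaster):
    """Toy forecaster that predicts the training mean at every step.

    Illustrative only: the hook names and signatures here are
    assumptions, not the definitive BaseForecaster contract.
    """

    def _fit(self, y, exog=None):
        self._mean = float(np.mean(y))
        return self

    def _predict(self, y=None, exog=None):
        return self._mean
```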

View File

@@ -1,442 +0,0 @@
# Learning Tasks: Classification, Regression, Clustering, and Similarity Search
This reference provides comprehensive details on supervised and unsupervised learning tasks for time series collections.
## Time Series Classification
Time series classification (TSC) assigns labels to entire sequences. Aeon provides diverse algorithm families with unique strengths.
### Algorithm Categories
#### 1. Convolution-Based Classifiers
Transform time series using random convolutional kernels:
**ROCKET (RAndom Convolutional KErnel Transform)**
- Ultra-fast feature extraction via random kernels
- 10,000+ kernels generate discriminative features
- Linear classifier on extracted features
```python
from aeon.classification.convolution_based import RocketClassifier
clf = RocketClassifier(n_kernels=10000, n_jobs=-1)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
probabilities = clf.predict_proba(X_test)
```
**Variants:**
- `MiniRocketClassifier`: Faster, streamlined version
- `MultiRocketClassifier`: Multivariate extensions
- `Arsenal`: Ensemble of ROCKET transformers
- `Hydra`: Dictionary-based convolution variant
#### 2. Deep Learning Classifiers
Neural networks specialized for time series:
**InceptionTime**
- Ensemble of Inception modules
- Captures patterns at multiple scales
- State-of-the-art on UCR benchmarks
```python
from aeon.classification.deep_learning import InceptionTimeClassifier
clf = InceptionTimeClassifier(n_epochs=200, batch_size=64)
clf.fit(X_train, y_train)
```
**Other architectures:**
- `ResNetClassifier`: Residual connections
- `FCNClassifier`: Fully Convolutional Networks
- `CNNClassifier`: Standard convolutional architecture
- `LITEClassifier`: Lightweight networks
- `MLPClassifier`: Multi-layer perceptrons
- `TapNetClassifier`: Attentional prototype networks
#### 3. Dictionary-Based Classifiers
Symbolic representations and bag-of-words approaches:
**BOSS (Bag of SFA Symbols)**
- Converts series to symbolic words
- Histogram-based classification
- Effective for shape patterns
```python
from aeon.classification.dictionary_based import BOSSEnsemble
clf = BOSSEnsemble(max_ensemble_size=500)
clf.fit(X_train, y_train)
```
**Other dictionary methods:**
- `TemporalDictionaryEnsemble (TDE)`: Enhanced BOSS with temporal info
- `WEASEL`: Word ExtrAction for time SEries cLassification
- `MUSE`: MUltivariate Symbolic Extension
- `MrSEQL`: Multiple Representations SEQuence Learner
#### 4. Distance-Based Classifiers
Leverage time series-specific distance metrics:
**K-Nearest Neighbors with DTW**
- Dynamic Time Warping handles temporal shifts
- Effective for shape-based similarity
```python
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
clf = KNeighborsTimeSeriesClassifier(
distance="dtw",
n_neighbors=5
)
clf.fit(X_train, y_train)
```
**Other distance methods:**
- `ElasticEnsemble`: Ensemble of elastic distances
- `ProximityForest`: Tree-based with elastic measures
- `ProximityTree`: Single tree variant
- `ShapeDTW`: DTW with shape descriptors
#### 5. Feature-Based Classifiers
Extract statistical and domain-specific features:
**Catch22**
- 22 time series features
- Canonical Time-series CHaracteristics
- Fast and interpretable
```python
from aeon.classification.feature_based import Catch22Classifier
from sklearn.ensemble import RandomForestClassifier
clf = Catch22Classifier(estimator=RandomForestClassifier())
clf.fit(X_train, y_train)
```
**Other feature methods:**
- `FreshPRINCEClassifier`: Fresh Pipelines with Random Interval and Catch22 Features
- `SignatureClassifier`: Path signature features
- `TSFreshClassifier`: Comprehensive feature extraction (slower, more features)
- `SummaryClassifier`: Simple summary statistics
#### 6. Interval-Based Classifiers
Analyze discriminative time intervals:
**Time Series Forest (TSF)**
- Random intervals + summary statistics
- Random forest on extracted features
```python
from aeon.classification.interval_based import TimeSeriesForestClassifier
clf = TimeSeriesForestClassifier(n_estimators=500)
clf.fit(X_train, y_train)
```
**Other interval methods:**
- `CanonicalIntervalForest (CIF)`: Canonical Interval Forest
- `DrCIF`: Diverse Representation CIF
- `RISE`: Random Interval Spectral Ensemble
- `RandomIntervalClassifier`: Basic random interval approach
- `STSF`: Shapelet Transform Interval Forest
#### 7. Shapelet-Based Classifiers
Discover discriminative subsequences:
**Shapelets**: Small subsequences that best distinguish classes
```python
from aeon.classification.shapelet_based import ShapeletTransformClassifier
clf = ShapeletTransformClassifier(
n_shapelet_samples=10000,
max_shapelets=20
)
clf.fit(X_train, y_train)
```
**Other shapelet methods:**
- `LearningShapeletClassifier`: Gradient-based learning
- `SASTClassifier`: Shapelet-Attention Subsequence Transform
#### 8. Hybrid Ensembles
Combine multiple algorithm families:
**HIVE-COTE (Hierarchical Vote Collective of Transformation-based Ensembles)**
- State-of-the-art accuracy
- Combines shapelets, intervals, dictionaries, and spectral features
- V2 uses ROCKET and improved components
```python
from aeon.classification.hybrid import HIVECOTEV2
clf = HIVECOTEV2(n_jobs=-1) # Slow but highly accurate
clf.fit(X_train, y_train)
```
### Algorithm Selection Guide
**Fast and accurate (default choice):**
- `RocketClassifier` or `MiniRocketClassifier`
**Maximum accuracy (slow):**
- `HIVECOTEV2` or `InceptionTimeClassifier`
**Interpretable:**
- `Catch22Classifier` or `ShapeletTransformClassifier`
**Multivariate focus:**
- `MultiRocketClassifier` or `MUSE`
**Small datasets:**
- `KNeighborsTimeSeriesClassifier` with DTW
### Classification Workflow
```python
from aeon.classification.convolution_based import RocketClassifier
from aeon.datasets import load_arrow_head
from sklearn.metrics import accuracy_score, classification_report
# Load data
X_train, y_train = load_arrow_head(split="train")
X_test, y_test = load_arrow_head(split="test")
# Train classifier
clf = RocketClassifier(n_jobs=-1)
clf.fit(X_train, y_train)
# Evaluate
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
print(classification_report(y_test, y_pred))
```
## Time Series Regression
Time series regression predicts continuous values from sequences. Most classification algorithms have regression equivalents.
### Regression Algorithms
Available regressors mirror classification structure:
- `RocketRegressor`, `MiniRocketRegressor`, `MultiRocketRegressor`
- `InceptionTimeRegressor`, `ResNetRegressor`, `FCNRegressor`
- `KNeighborsTimeSeriesRegressor`
- `Catch22Regressor`, `FreshPRINCERegressor`
- `TimeSeriesForestRegressor`, `DrCIFRegressor`
### Regression Workflow
```python
from aeon.regression.convolution_based import RocketRegressor
from sklearn.metrics import mean_squared_error, r2_score
# Train regressor
reg = RocketRegressor(n_kernels=10000)
reg.fit(X_train, y_train_continuous)
# Predict and evaluate
y_pred = reg.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.3f}, R²: {r2:.3f}")
```
## Time Series Clustering
Clustering groups similar time series without labels.
### Clustering Algorithms
**TimeSeriesKMeans**
- K-means with time series distances
- Supports DTW, Euclidean, and other metrics
```python
from aeon.clustering import TimeSeriesKMeans
clusterer = TimeSeriesKMeans(
n_clusters=3,
distance="dtw",
n_init=10
)
clusterer.fit(X_collection)
labels = clusterer.labels_
```
**TimeSeriesKMedoids**
- Uses actual series as cluster centers
- More robust to outliers
```python
from aeon.clustering import TimeSeriesKMedoids
clusterer = TimeSeriesKMedoids(
n_clusters=3,
distance="euclidean"
)
clusterer.fit(X_collection)
```
**Other clustering methods:**
- `TimeSeriesKernelKMeans`: Kernel-based clustering
- `ElasticSOM`: Self-organizing maps with elastic distances
### Clustering Workflow
```python
from aeon.clustering import TimeSeriesKMeans
from aeon.distances import dtw_distance
import numpy as np
# Cluster time series
clusterer = TimeSeriesKMeans(n_clusters=4, distance="dtw")
clusterer.fit(X_train)
# Get cluster labels
labels = clusterer.predict(X_test)
# Compute cluster centers
centers = clusterer.cluster_centers_
# Evaluate clustering quality (if ground truth available)
from sklearn.metrics import adjusted_rand_score
ari = adjusted_rand_score(y_true, labels)
```
## Similarity Search
Similarity search finds motifs, nearest neighbors, and repeated patterns.
### Key Concepts
**Motifs**: Frequently repeated subsequences within a time series
**Matrix Profile**: Data structure encoding nearest neighbor distances for all subsequences
### Similarity Search Methods
**Matrix Profile**
- Efficient motif discovery
- Change point detection
- Anomaly detection
```python
from aeon.similarity_search import MatrixProfile
import numpy as np
mp = MatrixProfile(window_size=50)
profile = mp.fit_transform(X_series)
# Find top motif
motif_idx = np.argmin(profile)
```
**Query Search**
- Find nearest neighbors to a query subsequence
- Useful for template matching
```python
from aeon.similarity_search import QuerySearch
searcher = QuerySearch(distance="euclidean")
distances, indices = searcher.search(X_series, query_subsequence)
```
### Similarity Search Workflow
```python
from aeon.similarity_search import MatrixProfile
import numpy as np
# Compute matrix profile
mp = MatrixProfile(window_size=100)
profile, profile_index = mp.fit_transform(X_series)
# Find top-k motifs (lowest profile values)
k = 3
motif_indices = np.argsort(profile)[:k]
# Find anomalies (highest profile values)
anomaly_indices = np.argsort(profile)[-k:]
```
## Ensemble and Composition Tools
### Voting Ensembles
```python
from aeon.classification.ensemble import WeightedEnsembleClassifier
from aeon.classification.convolution_based import RocketClassifier
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
ensemble = WeightedEnsembleClassifier(
estimators=[
('rocket', RocketClassifier()),
('knn', KNeighborsTimeSeriesClassifier())
]
)
ensemble.fit(X_train, y_train)
```
### Pipelines
```python
from sklearn.pipeline import Pipeline
from aeon.transformations.collection import Catch22
from sklearn.ensemble import RandomForestClassifier
pipeline = Pipeline([
('features', Catch22()),
('classifier', RandomForestClassifier())
])
pipeline.fit(X_train, y_train)
```
## Model Selection and Validation
### Cross-Validation
```python
from sklearn.model_selection import cross_val_score
from aeon.classification.convolution_based import RocketClassifier
clf = RocketClassifier()
scores = cross_val_score(clf, X_train, y_train, cv=5)
print(f"CV Accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```
### Grid Search
```python
from sklearn.model_selection import GridSearchCV
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
param_grid = {
'n_neighbors': [1, 3, 5, 7],
'distance': ['dtw', 'euclidean', 'erp']
}
clf = KNeighborsTimeSeriesClassifier()
grid_search = GridSearchCV(clf, param_grid, cv=5)
grid_search.fit(X_train, y_train)
print(f"Best params: {grid_search.best_params_}")
```
## Discovery Functions
Find available estimators programmatically:
```python
from aeon.utils.discovery import all_estimators
# Get all classifiers
classifiers = all_estimators(type_filter="classifier")
# Get all regressors
regressors = all_estimators(type_filter="regressor")
# Get all clusterers
clusterers = all_estimators(type_filter="clusterer")
# Filter by tag (e.g., multivariate capable)
mv_classifiers = all_estimators(
type_filter="classifier",
filter_tags={"capability:multivariate": True}
)
```

View File

@@ -0,0 +1,289 @@
# Deep Learning Networks
Aeon provides neural network architectures specifically designed for time series tasks. These networks serve as building blocks for classification, regression, clustering, and forecasting.
## Core Network Architectures
### Convolutional Networks
**FCNNetwork** - Fully Convolutional Network
- Three convolutional blocks with batch normalization
- Global average pooling for dimensionality reduction
- **Use when**: Need simple yet effective CNN baseline
**ResNetNetwork** - Residual Network
- Residual blocks with skip connections
- Prevents vanishing gradients in deep networks
- **Use when**: Deep networks needed, training stability important
**InceptionNetwork** - Inception Modules
- Multi-scale feature extraction with parallel convolutions
- Different kernel sizes capture patterns at various scales
- **Use when**: Patterns exist at multiple temporal scales
**TimeCNNNetwork** - Standard CNN
- Basic convolutional architecture
- **Use when**: Simple CNN sufficient, interpretability valued
**DisjointCNNNetwork** - Separate Pathways
- Disjoint convolutional pathways
- **Use when**: Different feature extraction strategies needed
**DCNNNetwork** - Dilated CNN
- Dilated convolutions for large receptive fields
- **Use when**: Long-range dependencies without many layers
### Recurrent Networks
**RecurrentNetwork** - RNN/LSTM/GRU
- Configurable cell type (RNN, LSTM, GRU)
- Sequential modeling of temporal dependencies
- **Use when**: Sequential dependencies critical, variable-length series
### Temporal Convolutional Network
**TCNNetwork** - Temporal Convolutional Network
- Dilated causal convolutions
- Large receptive field without recurrence
- **Use when**: Long sequences, need parallelizable architecture
### Multi-Layer Perceptron
**MLPNetwork** - Basic Feedforward
- Simple fully-connected layers
- Flattens time series before processing
- **Use when**: Baseline needed, computational limits, or simple patterns
## Encoder-Based Architectures
Networks designed for representation learning and clustering.
### Autoencoder Variants
**EncoderNetwork** - Generic Encoder
- Flexible encoder structure
- **Use when**: Custom encoding needed
**AEFCNNetwork** - FCN-based Autoencoder
- Fully convolutional encoder-decoder
- **Use when**: Need convolutional representation learning
**AEResNetNetwork** - ResNet Autoencoder
- Residual blocks in encoder-decoder
- **Use when**: Deep autoencoding with skip connections
**AEDCNNNetwork** - Dilated CNN Autoencoder
- Dilated convolutions for compression
- **Use when**: Need large receptive field in autoencoder
**AEDRNNNetwork** - Dilated RNN Autoencoder
- Dilated recurrent connections
- **Use when**: Sequential patterns with long-range dependencies
**AEBiGRUNetwork** - Bidirectional GRU
- Bidirectional recurrent encoding
- **Use when**: Context from both directions helpful
**AEAttentionBiGRUNetwork** - Attention + BiGRU
- Attention mechanism on BiGRU outputs
- **Use when**: Need to focus on important time steps
## Specialized Architectures
**LITENetwork** - Lightweight Inception Time Ensemble
- Efficient inception-based architecture
- LITEMV variant for multivariate series
- **Use when**: Need efficiency with strong performance
**DeepARNetwork** - Probabilistic Forecasting
- Autoregressive RNN for forecasting
- Produces probabilistic predictions
- **Use when**: Need forecast uncertainty quantification
## Usage with Estimators
Networks are typically used within estimators, not directly:
```python
from aeon.classification.deep_learning import FCNClassifier
from aeon.regression.deep_learning import ResNetRegressor
from aeon.clustering.deep_learning import AEFCNClusterer
# Classification with FCN
clf = FCNClassifier(n_epochs=100, batch_size=16)
clf.fit(X_train, y_train)
# Regression with ResNet
reg = ResNetRegressor(n_epochs=100)
reg.fit(X_train, y_train)
# Clustering with autoencoder
clusterer = AEFCNClusterer(n_clusters=3, n_epochs=100)
labels = clusterer.fit_predict(X_train)
```
## Custom Network Configuration
Many networks accept configuration parameters:
```python
# Configure FCN layers
clf = FCNClassifier(
n_epochs=200,
batch_size=32,
kernel_size=[7, 5, 3], # Kernel sizes for each layer
n_filters=[128, 256, 128], # Filters per layer
learning_rate=0.001
)
```
## Base Classes
- `BaseDeepLearningNetwork` - Abstract base for all networks
- `BaseDeepRegressor` - Base for deep regression
- `BaseDeepClassifier` - Base for deep classification
- `BaseDeepForecaster` - Base for deep forecasting
Extend these to implement custom architectures.
## Training Considerations
### Hyperparameters
Key hyperparameters to tune:
- `n_epochs` - Training iterations (50-200 typical)
- `batch_size` - Samples per batch (16-64 typical)
- `learning_rate` - Step size (0.0001-0.01)
- Network-specific: layers, filters, kernel sizes
### Callbacks
Many networks support callbacks for training monitoring:
```python
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
clf = FCNClassifier(
n_epochs=200,
callbacks=[
EarlyStopping(patience=20, restore_best_weights=True),
ReduceLROnPlateau(patience=10, factor=0.5)
]
)
```
### GPU Acceleration
Deep learning networks benefit from GPU:
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0' # Use first GPU
# Networks automatically use GPU if available
clf = InceptionTimeClassifier(n_epochs=100)
clf.fit(X_train, y_train)
```
## Architecture Selection
### By Task:
**Classification**: InceptionNetwork, ResNetNetwork, FCNNetwork
**Regression**: InceptionNetwork, ResNetNetwork, TCNNetwork
**Forecasting**: TCNNetwork, DeepARNetwork, RecurrentNetwork
**Clustering**: AEFCNNetwork, AEResNetNetwork, AEAttentionBiGRUNetwork
### By Data Characteristics:
**Long sequences**: TCNNetwork, DCNNNetwork (dilated convolutions)
**Short sequences**: MLPNetwork, FCNNetwork
**Multivariate**: InceptionNetwork, FCNNetwork, LITENetwork
**Variable length**: RecurrentNetwork with masking
**Multi-scale patterns**: InceptionNetwork
### By Computational Resources:
**Limited compute**: MLPNetwork, LITENetwork
**Moderate compute**: FCNNetwork, TimeCNNNetwork
**High compute available**: InceptionNetwork, ResNetNetwork
**GPU available**: Any deep network (major speedup)
## Best Practices
### 1. Data Preparation
Normalize input data:
```python
from aeon.transformations.collection import Normalizer
normalizer = Normalizer()
X_train_norm = normalizer.fit_transform(X_train)
X_test_norm = normalizer.transform(X_test)
```
### 2. Training/Validation Split
Use validation set for early stopping:
```python
from sklearn.model_selection import train_test_split
X_train_fit, X_val, y_train_fit, y_val = train_test_split(
X_train, y_train, test_size=0.2, stratify=y_train
)
clf = FCNClassifier(n_epochs=200)
# Pass a validation set if the estimator supports it (check the fit signature)
clf.fit(X_train_fit, y_train_fit, validation_data=(X_val, y_val))
```
### 3. Start Simple
Begin with simpler architectures before complex ones:
1. Try MLPNetwork or FCNNetwork first
2. If insufficient, try ResNetNetwork or InceptionNetwork
3. Consider ensembles if single models insufficient
### 4. Hyperparameter Tuning
Use grid search or random search:
```python
from sklearn.model_selection import GridSearchCV
param_grid = {
'n_epochs': [100, 200],
'batch_size': [16, 32],
'learning_rate': [0.001, 0.0001]
}
clf = FCNClassifier()
grid = GridSearchCV(clf, param_grid, cv=3)
grid.fit(X_train, y_train)
```
### 5. Regularization
Prevent overfitting:
- Use dropout (if network supports)
- Early stopping
- Data augmentation (if available)
- Reduce model complexity
### 6. Reproducibility
Set random seeds:
```python
import numpy as np
import random
import tensorflow as tf
seed = 42
np.random.seed(seed)
random.seed(seed)
tf.random.set_seed(seed)
```

View File

@@ -0,0 +1,118 @@
# Time Series Regression
Aeon provides time series regressors across 9 categories for predicting continuous values from temporal sequences.
## Convolution-Based Regressors
Apply convolutional kernels for feature extraction:
- `HydraRegressor` - Multi-resolution dilated convolutions
- `RocketRegressor` - Random convolutional kernels
- `MiniRocketRegressor` - Simplified ROCKET for speed
- `MultiRocketRegressor` - Combined ROCKET variants
- `MultiRocketHydraRegressor` - Merges ROCKET and Hydra approaches
**Use when**: Need fast regression with strong baseline performance.
## Deep Learning Regressors
Neural architectures for end-to-end temporal regression:
- `FCNRegressor` - Fully convolutional network
- `ResNetRegressor` - Residual blocks with skip connections
- `InceptionTimeRegressor` - Multi-scale inception modules
- `TimeCNNRegressor` - Standard CNN architecture
- `RecurrentRegressor` - RNN/LSTM/GRU variants
- `MLPRegressor` - Multi-layer perceptron
- `EncoderRegressor` - Generic encoder wrapper
- `LITERegressor` - Lightweight inception time ensemble
- `DisjointCNNRegressor` - Specialized CNN architecture
**Use when**: Large datasets, complex patterns, or need feature learning.
## Distance-Based Regressors
k-nearest neighbors with temporal distance metrics:
- `KNeighborsTimeSeriesRegressor` - k-NN with DTW, LCSS, ERP, or other distances
**Use when**: Small datasets, local similarity patterns, or interpretable predictions.
## Feature-Based Regressors
Extract statistical features before regression:
- `Catch22Regressor` - 22 canonical time-series characteristics
- `FreshPRINCERegressor` - Pipeline combining multiple feature extractors
- `SummaryRegressor` - Summary statistics features
- `TSFreshRegressor` - Automated tsfresh feature extraction
**Use when**: Need interpretable features or domain-specific feature engineering.
## Hybrid Regressors
Combine multiple approaches:
- `RISTRegressor` - Randomized Interval-Shapelet Transformation
**Use when**: Benefit from combining interval and shapelet methods.
## Interval-Based Regressors
Extract features from time intervals:
- `CanonicalIntervalForestRegressor` - Random intervals with decision trees
- `DrCIFRegressor` - Diverse Representation CIF
- `TimeSeriesForestRegressor` - Random interval ensemble
- `RandomIntervalRegressor` - Simple interval-based approach
- `RandomIntervalSpectralEnsembleRegressor` - Spectral interval features
- `QUANTRegressor` - Quantile-based interval features
**Use when**: Predictive patterns occur in specific time windows.
## Shapelet-Based Regressors
Use discriminative subsequences for prediction:
- `RDSTRegressor` - Random Dilated Shapelet Transform
**Use when**: Need phase-invariant discriminative patterns.
## Composition Tools
Build custom regression pipelines:
- `RegressorPipeline` - Chain transformers with regressors
- `RegressorEnsemble` - Weighted ensemble with learnable weights
- `SklearnRegressorWrapper` - Adapt sklearn regressors for time series
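A composition sketch mirroring the pipeline pattern used elsewhere in these docs, pairing a feature transform with a plain sklearn regressor (assumes collections `X_train`/`y_train` shaped as in the quick start below):
```python
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor
from aeon.transformations.collection import Catch22

pipeline = Pipeline([
    ('features', Catch22()),             # extract 22 canonical features
    ('regressor', RandomForestRegressor())
])
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)
```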
## Utilities
- `DummyRegressor` - Baseline strategies (mean, median)
- `BaseRegressor` - Abstract base for custom regressors
- `BaseDeepRegressor` - Base for deep learning regressors
## Quick Start
```python
from aeon.regression.convolution_based import RocketRegressor
from aeon.datasets import load_regression
# Load data
X_train, y_train = load_regression("Covid3Month", split="train")
X_test, y_test = load_regression("Covid3Month", split="test")
# Train and predict
reg = RocketRegressor()
reg.fit(X_train, y_train)
predictions = reg.predict(X_test)
```
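As a sanity check, compare against the `DummyRegressor` baseline listed under Utilities (sketch; the import path and `strategy` argument are assumptions):
```python
from aeon.regression import DummyRegressor  # path assumed
from sklearn.metrics import mean_squared_error

# Mean-predicting baseline; any real model should beat this
baseline = DummyRegressor(strategy="mean")
baseline.fit(X_train, y_train)
print("Baseline MSE:", mean_squared_error(y_test, baseline.predict(X_test)))
print("ROCKET MSE:  ", mean_squared_error(y_test, predictions))
```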
## Algorithm Selection
- **Speed priority**: MiniRocketRegressor
- **Accuracy priority**: InceptionTimeRegressor, MultiRocketHydraRegressor
- **Interpretability**: Catch22Regressor, SummaryRegressor
- **Small data**: KNeighborsTimeSeriesRegressor
- **Large data**: Deep learning regressors, ROCKET variants
- **Interval patterns**: DrCIFRegressor, CanonicalIntervalForestRegressor

View File

@@ -0,0 +1,163 @@
# Time Series Segmentation
Aeon provides algorithms to partition time series into regions with distinct characteristics, identifying change points and boundaries.
## Segmentation Algorithms
### Binary Segmentation
- `BinSegmenter` - Recursive binary segmentation
- Iteratively splits series at most significant change points
- Parameters: `n_segments`, `cost_function`
- **Use when**: Known number of segments, hierarchical structure
### Classification-Based
- `ClaSPSegmenter` - Classification Score Profile
- Uses classification performance to identify boundaries
- Discovers segments where classification distinguishes neighbors
- **Use when**: Segments have different temporal patterns
### Fast Pattern-Based
- `FLUSSSegmenter` - Fast Low-cost Unipotent Semantic Segmentation
- Efficient semantic segmentation using arc crossings
- Based on matrix profile
- **Use when**: Large time series, need speed and pattern discovery
### Information Theory
- `InformationGainSegmenter` - Information gain maximization
- Finds boundaries maximizing information gain
- **Use when**: Statistical differences between segments
### Gaussian Modeling
- `GreedyGaussianSegmenter` - Greedy Gaussian approximation
- Models segments as Gaussian distributions
- Incrementally adds change points
- **Use when**: Segments follow Gaussian distributions
### Hierarchical Agglomerative
- `EAggloSegmenter` - Bottom-up merging approach
- Estimates change points via agglomeration
- **Use when**: Want hierarchical segmentation structure
### Hidden Markov Models
- `HMMSegmenter` - HMM with Viterbi decoding
- Probabilistic state-based segmentation
- **Use when**: Segments represent hidden states
### Dimensionality-Based
- `HidalgoSegmenter` - Heterogeneous Intrinsic Dimensionality Algorithm
- Detects changes in local dimensionality
- **Use when**: Dimensionality shifts between segments
### Baseline
- `RandomSegmenter` - Random change point generation
- **Use when**: Need null hypothesis baseline
## Quick Start
```python
from aeon.segmentation import ClaSPSegmenter
import numpy as np
# Create time series with regime changes
y = np.concatenate([
np.sin(np.linspace(0, 10, 100)), # Segment 1
np.cos(np.linspace(0, 10, 100)), # Segment 2
np.sin(2 * np.linspace(0, 10, 100)) # Segment 3
])
# Segment the series
segmenter = ClaSPSegmenter()
change_points = segmenter.fit_predict(y)
print(f"Detected change points: {change_points}")
```
## Output Format
Segmenters return change point indices:
```python
# change_points = [100, 200] # Boundaries between segments
# This divides series into: [0:100], [100:200], [200:end]
```
## Algorithm Selection
- **Speed priority**: FLUSSSegmenter, BinSegmenter
- **Accuracy priority**: ClaSPSegmenter, HMMSegmenter
- **Known segment count**: BinSegmenter with n_segments parameter
- **Unknown segment count**: ClaSPSegmenter, InformationGainSegmenter
- **Pattern changes**: FLUSSSegmenter, ClaSPSegmenter
- **Statistical changes**: InformationGainSegmenter, GreedyGaussianSegmenter
- **State transitions**: HMMSegmenter
## Common Use Cases
### Regime Change Detection
Identify when time series behavior fundamentally changes:
```python
from aeon.segmentation import InformationGainSegmenter
segmenter = InformationGainSegmenter(k=3) # Up to 3 change points
change_points = segmenter.fit_predict(stock_prices)
```
### Activity Segmentation
Segment sensor data into activities:
```python
from aeon.segmentation import ClaSPSegmenter
segmenter = ClaSPSegmenter()
boundaries = segmenter.fit_predict(accelerometer_data)
```
### Seasonal Boundary Detection
Find season transitions in time series:
```python
from aeon.segmentation import HMMSegmenter
segmenter = HMMSegmenter(n_states=4) # 4 seasons
segments = segmenter.fit_predict(temperature_data)
```
## Evaluation Metrics
Use segmentation quality metrics:
```python
from aeon.benchmarking.metrics.segmentation import (
count_error,
hausdorff_error
)
# Count error: difference in number of change points
count_err = count_error(y_true, y_pred)
# Hausdorff: maximum distance between predicted and true points
hausdorff_err = hausdorff_error(y_true, y_pred)
```
## Best Practices
1. **Normalize data**: Ensures change detection not dominated by scale
2. **Choose appropriate metric**: Different algorithms optimize different criteria
3. **Validate segments**: Visualize to verify meaningful boundaries
4. **Handle noise**: Consider smoothing before segmentation (see the sketch after this list)
5. **Domain knowledge**: Use expected segment count if known
6. **Parameter tuning**: Adjust sensitivity parameters (thresholds, penalties)
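For item 4, a simple moving-average smooth before segmenting can suppress noise-driven change points (sketch using plain numpy; `y` as in the quick start above):
```python
import numpy as np
from aeon.segmentation import ClaSPSegmenter

def moving_average(y, w=25):
    # Centered moving average; w controls the amount of smoothing
    return np.convolve(y, np.ones(w) / w, mode="same")

change_points = ClaSPSegmenter().fit_predict(moving_average(y))
```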
## Visualization
```python
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 4))
plt.plot(y, label='Time Series')
for cp in change_points:
plt.axvline(cp, color='r', linestyle='--', label='Change Point')
plt.legend()
plt.show()
```

View File

@@ -0,0 +1,187 @@
# Similarity Search
Aeon provides tools for finding similar patterns within and across time series, including subsequence search, motif discovery, and approximate nearest neighbors.
## Subsequence Nearest Neighbors (SNN)
Find most similar subsequences within a time series.
### MASS Algorithm
- `MassSNN` - Mueen's Algorithm for Similarity Search
- Fast normalized cross-correlation for similarity
- Computes distance profile efficiently
- **Use when**: Need exact nearest neighbor distances, large series
### STOMP-Based Motif Discovery
- `StompMotif` - Discovers recurring patterns (motifs)
- Finds top-k most similar subsequence pairs
- Based on matrix profile computation
- **Use when**: Want to discover repeated patterns
### Brute Force Baseline
- `DummySNN` - Exhaustive distance computation
- Computes all pairwise distances
- **Use when**: Small series, need exact baseline
## Collection-Level Search
Find similar time series across collections.
### Approximate Nearest Neighbors (ANN)
- `RandomProjectionIndexANN` - Locality-sensitive hashing
- Uses random projections with cosine similarity
- Builds index for fast approximate search
- **Use when**: Large collection, speed more important than exactness
## Quick Start: Motif Discovery
```python
from aeon.similarity_search import StompMotif
import numpy as np
# Create time series with repeated patterns
pattern = np.sin(np.linspace(0, 2*np.pi, 50))
y = np.concatenate([
pattern + np.random.normal(0, 0.1, 50),
np.random.normal(0, 1, 100),
pattern + np.random.normal(0, 0.1, 50),
np.random.normal(0, 1, 100)
])
# Find top-3 motifs
motif_finder = StompMotif(window_size=50, k=3)
motifs = motif_finder.fit_predict(y)
# motifs contains indices of motif occurrences
for i, (idx1, idx2) in enumerate(motifs):
print(f"Motif {i+1} at positions {idx1} and {idx2}")
```
## Quick Start: Subsequence Search
```python
from aeon.similarity_search import MassSNN
import numpy as np
# Time series to search within
y = np.sin(np.linspace(0, 20, 500))
# Query subsequence
query = np.sin(np.linspace(0, 2, 50))
# Find nearest subsequences
searcher = MassSNN()
distances = searcher.fit_transform(y, query)
# Find best match
best_match_idx = np.argmin(distances)
print(f"Best match at index {best_match_idx}")
```
## Quick Start: Approximate NN on Collections
```python
from aeon.similarity_search import RandomProjectionIndexANN
from aeon.datasets import load_classification
# Load time series collection
X_train, _ = load_classification("GunPoint", split="train")
# Build index
ann = RandomProjectionIndexANN(n_projections=8, n_bits=4)
ann.fit(X_train)
# Find approximate nearest neighbors
query = X_train[0]
neighbors, distances = ann.kneighbors(query, k=5)
```
## Matrix Profile
The matrix profile is a fundamental data structure for many similarity search tasks:
- **Distance Profile**: Distances from a query to all subsequences
- **Matrix Profile**: Minimum distance for each subsequence to any other
- **Motif**: Pair of subsequences with minimum distance
- **Discord**: Subsequence with maximum minimum distance (anomaly)
```python
from aeon.similarity_search import StompMotif
import numpy as np
# Compute matrix profile and find motifs/discords
mp = StompMotif(window_size=50)
mp.fit(y)
# Access matrix profile
profile = mp.matrix_profile_
profile_indices = mp.matrix_profile_index_
# Find discords (anomalies)
discord_idx = np.argmax(profile)
```
## Algorithm Selection
- **Exact subsequence search**: MassSNN
- **Motif discovery**: StompMotif
- **Anomaly detection**: Matrix profile (see anomaly_detection.md)
- **Fast approximate search**: RandomProjectionIndexANN
- **Small data**: DummySNN for exact results
## Use Cases
### Pattern Matching
Find where a pattern occurs in a long series:
```python
# Find heartbeat pattern in ECG data
searcher = MassSNN()
distances = searcher.fit_transform(ecg_data, heartbeat_pattern)
occurrences = np.where(distances < threshold)[0]
```
### Motif Discovery
Identify recurring patterns:
```python
# Find repeated behavioral patterns
motif_finder = StompMotif(window_size=100, k=5)
motifs = motif_finder.fit_predict(activity_data)
```
### Time Series Retrieval
Find similar time series in database:
```python
# Build searchable index
ann = RandomProjectionIndexANN()
ann.fit(time_series_database)
# Query for similar series
neighbors = ann.kneighbors(query_series, k=10)
```
## Best Practices
1. **Window size**: Critical parameter for subsequence methods
- Too small: Captures noise
- Too large: Misses fine-grained patterns
- Rule of thumb: 10-20% of series length
2. **Normalization**: Most methods assume z-normalized subsequences
- Handles amplitude variations
- Focus on shape similarity
3. **Distance metrics**: Different metrics for different needs
- Euclidean: Fast, shape-based
- DTW: Handles temporal warping
- Cosine: Scale-invariant
4. **Exclusion zone**: For motif discovery, exclude trivial matches (see the sketch after this list)
- Typically set to 0.5-1.0 × window_size
- Prevents finding overlapping occurrences
5. **Performance**:
- MASS is O(n log n) vs O(n²) brute force
- ANN trades accuracy for speed
- GPU acceleration available for some methods
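For item 4, a sketch of applying an exclusion zone by hand when reading matches off a distance profile (assumes `distances` comes from the MassSNN example above):
```python
import numpy as np

def top_k_matches(distances, k=3, exclusion=25):
    # Mask +/- `exclusion` points around each hit so overlapping,
    # near-identical subsequences are not reported twice
    d = np.asarray(distances, dtype=float).copy()
    hits = []
    for _ in range(k):
        i = int(np.argmin(d))
        hits.append(i)
        d[max(0, i - exclusion):i + exclusion] = np.inf
    return hits
```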

View File

@@ -1,596 +0,0 @@
# Temporal Analysis: Forecasting, Anomaly Detection, and Segmentation
This reference provides comprehensive details on forecasting future values, detecting anomalies, and segmenting time series.
## Forecasting
Forecasting predicts future values in a time series based on historical patterns.
### Forecasting Concepts
**Forecasting horizon (fh)**: Number of steps ahead to predict
- Absolute: `fh=[1, 2, 3]` (predict steps 1, 2, 3)
- Relative: `fh=ForecastingHorizon([1, 2, 3], is_relative=True)`
**Exogenous variables**: External features that influence predictions
### Statistical Forecasters
#### ARIMA (AutoRegressive Integrated Moving Average)
Classical time series model combining AR, differencing, and MA components:
```python
from aeon.forecasting.arima import ARIMA
forecaster = ARIMA(
order=(1, 1, 1), # (p, d, q)
seasonal_order=(1, 1, 1, 12), # (P, D, Q, s)
suppress_warnings=True
)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3, 4, 5])
```
**Parameters:**
- `p`: AR order (lags)
- `d`: Differencing order
- `q`: MA order (moving average)
- `P, D, Q, s`: Seasonal components
#### ETS (Exponential Smoothing)
State space model for trend and seasonality:
```python
from aeon.forecasting.ets import ETS
forecaster = ETS(
error="add",
trend="add",
seasonal="add",
sp=12 # seasonal period
)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3])
```
**Model types:**
- Error: "add" (additive) or "mul" (multiplicative)
- Trend: "add", "mul", or None
- Seasonal: "add", "mul", or None
#### Theta Forecaster
Simple, effective method using exponential smoothing:
```python
from aeon.forecasting.theta import ThetaForecaster
import numpy as np
forecaster = ThetaForecaster(deseasonalize=True, sp=12)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=np.arange(1, 13))
```
#### TAR (Threshold AutoRegressive)
Non-linear autoregressive model with regime switching:
```python
from aeon.forecasting.tar import TAR
forecaster = TAR(
delay=1,
threshold=0.0,
order_below=2,
order_above=2
)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3])
```
**AutoTAR**: Automatically optimizes threshold:
```python
from aeon.forecasting.tar import AutoTAR
forecaster = AutoTAR(max_order=5)
forecaster.fit(y_train)
```
#### TVP (Time-Varying Parameter)
Kalman filter-based forecaster with dynamic coefficients:
```python
from aeon.forecasting.tvp import TVP
forecaster = TVP(
order=2,
use_exog=False
)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3])
```
### Naive Baselines
Simple forecasting strategies for benchmarking:
```python
from aeon.forecasting.naive import NaiveForecaster
# Last value
forecaster = NaiveForecaster(strategy="last")
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3])
# Seasonal naive (use value from same season last year)
forecaster = NaiveForecaster(strategy="seasonal_last", sp=12)
forecaster.fit(y_train)
# Mean
forecaster = NaiveForecaster(strategy="mean")
forecaster.fit(y_train)
# Drift (linear trend from first to last)
forecaster = NaiveForecaster(strategy="drift")
forecaster.fit(y_train)
```
**Strategies:**
- `"last"`: Repeat last observed value
- `"mean"`: Use mean of training data
- `"seasonal_last"`: Repeat value from previous season
- `"drift"`: Linear extrapolation
### Deep Learning Forecasters
#### TCN (Temporal Convolutional Network)
Deep learning with dilated causal convolutions:
```python
from aeon.forecasting.deep_learning import TCNForecaster
forecaster = TCNForecaster(
n_epochs=100,
batch_size=32,
kernel_size=3,
n_filters=64,
dilation_rate=2
)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3, 4, 5])
```
### Regression-Based Forecasting
Transform forecasting into a supervised learning problem:
```python
from aeon.forecasting.compose import RegressionForecaster
from sklearn.ensemble import RandomForestRegressor
forecaster = RegressionForecaster(
regressor=RandomForestRegressor(n_estimators=100),
window_length=10
)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3])
```
### Forecasting Workflow
```python
from aeon.forecasting.arima import ARIMA
from aeon.datasets import load_airline
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np
# Load data
y = load_airline()
# Split train/test
split_point = int(len(y) * 0.8)
y_train, y_test = y[:split_point], y[split_point:]
# Fit forecaster
forecaster = ARIMA(order=(2, 1, 2), suppress_warnings=True)
forecaster.fit(y_train)
# Predict
fh = np.arange(1, len(y_test) + 1)
y_pred = forecaster.predict(fh=fh)
# Evaluate
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"MAE: {mae:.3f}, RMSE: {rmse:.3f}")
```
### Forecasting with Exogenous Variables
```python
from aeon.forecasting.arima import ARIMA
# X contains exogenous features
forecaster = ARIMA(order=(1, 1, 1))
forecaster.fit(y_train, X=X_train)
# Must provide future exogenous values
y_pred = forecaster.predict(fh=[1, 2, 3], X=X_future)
```
### Multi-Step Forecasting Strategies
**Direct**: Train separate model for each horizon
**Recursive**: Use predictions as inputs for next step
**DirRec**: Combine both strategies
```python
from aeon.forecasting.compose import DirectReductionForecaster
from sklearn.linear_model import Ridge
forecaster = DirectReductionForecaster(
regressor=Ridge(),
window_length=10
)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3, 4, 5])
```
## Anomaly Detection
Anomaly detection identifies unusual patterns or outliers in time series data.
### Anomaly Detection Types
**Point anomalies**: Single unusual values
**Contextual anomalies**: Values anomalous given context
**Collective anomalies**: Sequences of unusual behavior
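A synthetic series containing the first and third types (illustrative):
```python
import numpy as np

rng = np.random.default_rng(0)
y = np.sin(np.linspace(0, 20, 500)) + rng.normal(0, 0.1, 500)
y[120] += 5.0      # point anomaly: one extreme value
y[300:320] = 0.0   # collective anomaly: an unusual flat run
```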
### Distance-Based Anomaly Detectors
#### STOMP (Scalable Time series Ordered-search Matrix Profile)
Matrix profile-based anomaly detection:
```python
from aeon.anomaly_detection import STOMP
import numpy as np
detector = STOMP(window_size=50)
anomaly_scores = detector.fit_predict(X_series)
# High scores indicate anomalies
threshold = np.percentile(anomaly_scores, 95)
anomalies = anomaly_scores > threshold
```
#### LeftSTAMPi
Incremental matrix profile for streaming data:
```python
from aeon.anomaly_detection import LeftSTAMPi
detector = LeftSTAMPi(window_size=50)
anomaly_scores = detector.fit_predict(X_series)
```
#### MERLIN
Matrix profile with range constraints:
```python
from aeon.anomaly_detection import MERLIN
detector = MERLIN(window_size=50, k=3)
anomaly_scores = detector.fit_predict(X_series)
```
#### KMeansAD
K-means clustering-based anomaly detection:
```python
from aeon.anomaly_detection import KMeansAD
detector = KMeansAD(n_clusters=5, window_size=50)
anomaly_scores = detector.fit_predict(X_series)
```
#### CBLOF (Cluster-Based Local Outlier Factor)
```python
from aeon.anomaly_detection import CBLOF
detector = CBLOF(n_clusters=8, alpha=0.9)
anomaly_scores = detector.fit_predict(X_series)
```
#### LOF (Local Outlier Factor)
Density-based outlier detection:
```python
from aeon.anomaly_detection import LOF
detector = LOF(n_neighbors=20, window_size=50)
anomaly_scores = detector.fit_predict(X_series)
```
#### ROCKAD
ROCKET-based anomaly detection:
```python
from aeon.anomaly_detection import ROCKAD
detector = ROCKAD(num_kernels=1000, window_size=50)
anomaly_scores = detector.fit_predict(X_series)
```
### Distribution-Based Anomaly Detectors
#### COPOD (Copula-Based Outlier Detection)
```python
from aeon.anomaly_detection import COPOD
detector = COPOD(window_size=50)
anomaly_scores = detector.fit_predict(X_series)
```
#### DWT_MLEAD
Discrete Wavelet Transform with Machine Learning:
```python
from aeon.anomaly_detection import DWT_MLEAD
detector = DWT_MLEAD(window_size=50, wavelet='db4')
anomaly_scores = detector.fit_predict(X_series)
```
### Outlier Detection Methods
#### IsolationForest
Ensemble tree-based isolation:
```python
from aeon.anomaly_detection import IsolationForest
detector = IsolationForest(
n_estimators=100,
window_size=50,
contamination=0.1
)
anomaly_scores = detector.fit_predict(X_series)
```
#### OneClassSVM
Support vector machine for novelty detection:
```python
from aeon.anomaly_detection import OneClassSVM
detector = OneClassSVM(
kernel='rbf',
nu=0.1,
window_size=50
)
anomaly_scores = detector.fit_predict(X_series)
```
#### STRAY (Search TRace AnomalY)
```python
from aeon.anomaly_detection import STRAY
detector = STRAY(alpha=0.05)
anomaly_scores = detector.fit_predict(X_series)
```
### Collection Anomaly Detection
Detect anomalous time series within a collection:
```python
from aeon.anomaly_detection import ClassificationAdapter
from aeon.classification.convolution_based import RocketClassifier
detector = ClassificationAdapter(
classifier=RocketClassifier()
)
detector.fit(X_normal) # Train on normal data
anomaly_labels = detector.predict(X_test) # 1 = anomaly, 0 = normal
```
### Anomaly Detection Workflow
```python
from aeon.anomaly_detection import STOMP
import numpy as np
import matplotlib.pyplot as plt
# Detect anomalies
detector = STOMP(window_size=100)
anomaly_scores = detector.fit_predict(X_series)
# Identify anomalies (top 5%)
threshold = np.percentile(anomaly_scores, 95)
anomaly_indices = np.where(anomaly_scores > threshold)[0]
# Visualize
plt.figure(figsize=(12, 6))
plt.subplot(2, 1, 1)
plt.plot(X_series[0, 0, :])
plt.scatter(anomaly_indices, X_series[0, 0, anomaly_indices],
color='red', label='Anomalies', zorder=5)
plt.legend()
plt.title('Time Series with Detected Anomalies')
plt.subplot(2, 1, 2)
plt.plot(anomaly_scores)
plt.axhline(threshold, color='red', linestyle='--', label='Threshold')
plt.legend()
plt.title('Anomaly Scores')
plt.tight_layout()
plt.show()
```
## Segmentation
Segmentation divides time series into distinct regions or identifies change points.
### Segmentation Concepts
**Change points**: Locations where statistical properties change
**Segments**: Homogeneous regions between change points
**Applications**: Regime detection, event identification, structural breaks
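A minimal synthetic example makes these concepts concrete (plain numpy, no aeon required): a series whose mean shifts at index 100 has one change point and two homogeneous segments.
```python
import numpy as np

rng = np.random.default_rng(42)
# Two regimes: mean 0.0 for the first 100 points, mean 3.0 afterwards
segment_a = rng.normal(0.0, 1.0, 100)
segment_b = rng.normal(3.0, 1.0, 100)
series = np.concatenate([segment_a, segment_b])
true_change_point = 100  # a good segmenter should report an index near here
```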
### Segmentation Algorithms
#### ClaSP (Classification Score Profile)
Discover change points using classification performance:
```python
from aeon.segmentation import ClaSPSegmenter
segmenter = ClaSPSegmenter(
    n_cps=3,
    period_length=10
)
change_points = segmenter.fit_predict(X_series)
print(f"Change points at indices: {change_points}")
```
**How it works:**
- Slides a window over the series
- Computes classification score for left vs. right segments
- High scores indicate change points
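The following is an illustrative sketch of that idea using plain sklearn, not aeon's optimized implementation; the window size and classifier are arbitrary choices here:
```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def classification_score_profile(series, window=10):
    # Represent the series as overlapping windows (one feature vector each)
    n = len(series) - window + 1
    X = np.stack([series[i:i + window] for i in range(n)])
    scores = np.full(n, np.nan)
    for split in range(window, n - window):
        y = (np.arange(n) >= split).astype(int)  # label windows left/right of the split
        clf = KNeighborsClassifier(n_neighbors=3)
        scores[split] = cross_val_score(clf, X, y, cv=3).mean()
    return scores  # peaks mark candidate change points
```
This brute-force version is quadratic in the series length; it is only meant to show why high classification scores indicate change points.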
#### FLUSS (Fast Low-cost Unipotent Semantic Segmentation)
Matrix profile-based segmentation:
```python
from aeon.segmentation import FLUSSSegmenter
segmenter = FLUSSSegmenter(
    n_regimes=5,
    period_length=50
)
change_points = segmenter.fit_predict(X_series)
```
#### BinSeg (Binary Segmentation)
Recursive splitting for change point detection:
```python
from aeon.segmentation import BinSegSegmenter
segmenter = BinSegSegmenter(
n_segments=4,
model="l2" # cost function
)
change_points = segmenter.fit_predict(X_series)
```
**Models:**
- `"l2"`: Least squares (continuous data)
- `"l1"`: Absolute deviation (robust to outliers)
- `"rbf"`: Radial basis function
- `"ar"`: Autoregressive model
#### HMM (Hidden Markov Model) Segmentation
Probabilistic state-based segmentation:
```python
from aeon.segmentation import HMMSegmenter
segmenter = HMMSegmenter(
n_states=3,
covariance_type="full"
)
segmenter.fit(X_series)
states = segmenter.predict(X_series)
```
### Segmentation Workflow
```python
from aeon.segmentation import ClaSPSegmenter
import numpy as np
import matplotlib.pyplot as plt
# Detect change points
segmenter = ClaSPSegmenter(n_cps=4)
change_points = segmenter.fit_predict(X_series)
# Visualize segments
plt.figure(figsize=(12, 4))
plt.plot(X_series[0, 0, :])
for cp in change_points:
plt.axvline(cp, color='red', linestyle='--', alpha=0.7)
plt.title('Time Series Segmentation')
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()
# Extract segments
segments = []
prev_cp = 0
for cp in np.append(change_points, len(X_series[0, 0, :])):
segment = X_series[0, 0, prev_cp:cp]
segments.append(segment)
prev_cp = cp
```
### Multi-Variate Segmentation
```python
from aeon.segmentation import ClaSPSegmenter
# X_multivariate has shape (1, n_channels, n_timepoints)
segmenter = ClaSPSegmenter(n_cps=3)
change_points = segmenter.fit_predict(X_multivariate)
```
## Combining Forecasting, Anomaly Detection, and Segmentation
### Robust Forecasting with Anomaly Detection
```python
from aeon.forecasting.arima import ARIMA
from aeon.anomaly_detection import IsolationForest
import numpy as np
# Detect and remove anomalies
detector = IsolationForest(window_size=50, contamination=0.1)
anomaly_scores = detector.fit_predict(X_series)
# Keep the lowest-scoring 90% of points; assumes anomaly_scores aligns 1:1 with y_train
normal_mask = anomaly_scores < np.percentile(anomaly_scores, 90)
# Forecast on cleaned data
y_clean = y_train[normal_mask]
forecaster = ARIMA(order=(2, 1, 2))
forecaster.fit(y_clean)
y_pred = forecaster.predict(fh=[1, 2, 3])
```
### Segmentation-Based Forecasting
```python
from aeon.segmentation import ClaSPSegmenter
from aeon.forecasting.arima import ARIMA
# Segment time series
segmenter = ClaSPSegmenter(n_cps=3)
change_points = segmenter.fit_predict(X_series)
# Forecast using most recent segment
last_segment_start = change_points[-1]
y_recent = y_train[last_segment_start:]
forecaster = ARIMA(order=(1, 1, 1))
forecaster.fit(y_recent)
y_pred = forecaster.predict(fh=[1, 2, 3])
```
## Discovery Functions
Find available forecasters, detectors, and segmenters:
```python
from aeon.utils.discovery import all_estimators
# Get all forecasters
forecasters = all_estimators(type_filter="forecaster")
# Get all anomaly detectors
detectors = all_estimators(type_filter="anomaly-detector")
# Get all segmenters
segmenters = all_estimators(type_filter="segmenter")
```

View File

@@ -0,0 +1,246 @@
# Transformations
Aeon provides extensive transformation capabilities for preprocessing, feature extraction, and representation learning from time series data.
## Transformation Types
Aeon distinguishes between:
- **CollectionTransformers**: Transform multiple time series (collections)
- **SeriesTransformers**: Transform individual time series
## Collection Transformers
### Convolution-Based Feature Extraction
Fast, scalable feature generation using random kernels:
- `RocketTransformer` - Random convolutional kernels
- `MiniRocketTransformer` - Simplified ROCKET for speed
- `MultiRocketTransformer` - Enhanced ROCKET variant
- `HydraTransformer` - Multi-resolution dilated convolutions
- `MultiRocketHydraTransformer` - Combines ROCKET and Hydra
- `ROCKETGPU` - GPU-accelerated variant
**Use when**: Need fast, scalable features for any ML algorithm, strong baseline performance.
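All of these follow the standard `fit_transform` API. A minimal usage sketch with the names listed above (in some aeon versions the class is simply `MiniRocket`):
```python
from aeon.transformations.collection.convolution_based import MiniRocketTransformer
import numpy as np

X = np.random.default_rng(0).normal(size=(20, 1, 100))  # (n_cases, n_channels, n_timepoints)
transformer = MiniRocketTransformer()
X_features = transformer.fit_transform(X)  # tabular features for any sklearn estimator
```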
### Statistical Feature Extraction
Domain-agnostic features based on time series characteristics:
- `Catch22` - 22 canonical time-series characteristics
- `TSFresh` - Comprehensive automated feature extraction (100+ features)
- `TSFreshRelevant` - Feature extraction with relevance filtering
- `SevenNumberSummary` - Descriptive statistics (mean, std, quantiles)
**Use when**: Need interpretable features, domain-agnostic approach, or feeding traditional ML.
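For example, Catch22 reduces each series to 22 interpretable summary features (a sketch, using the import path used elsewhere in this document):
```python
from aeon.transformations.collection import Catch22
import numpy as np

X = np.random.default_rng(1).normal(size=(20, 1, 100))
catch22 = Catch22()
X_features = catch22.fit_transform(X)  # one row of 22 features per series
```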
### Dictionary-Based Representations
Symbolic approximations for discrete representations:
- `SAX` - Symbolic Aggregate approXimation
- `PAA` - Piecewise Aggregate Approximation
- `SFA` - Symbolic Fourier Approximation
- `SFAFast` - Optimized SFA
- `SFAWhole` - SFA on entire series (no windowing)
- `BORF` - Bag-of-Receptive-Fields
**Use when**: Need discrete/symbolic representation, dimensionality reduction, interpretability.
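A discretization sketch with SAX (the module path and the `n_segments`/`alphabet_size` parameter names are assumptions):
```python
from aeon.transformations.collection.dictionary_based import SAX
import numpy as np

X = np.random.default_rng(2).normal(size=(10, 1, 128))
sax = SAX(n_segments=8, alphabet_size=4)  # 8 PAA segments mapped to a 4-letter alphabet
X_symbolic = sax.fit_transform(X)
```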
### Shapelet-Based Features
Discriminative subsequence extraction:
- `RandomShapeletTransform` - Random discriminative shapelets
- `RandomDilatedShapeletTransform` - Dilated shapelets for multi-scale
- `SAST` - Scalable And Accurate Subsequence Transform
- `RSAST` - Randomized SAST
**Use when**: Need interpretable discriminative patterns, phase-invariant features.
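Shapelet transforms are supervised, so `fit` requires class labels; a sketch (the `n_shapelet_samples` parameter name is an assumption):
```python
from aeon.transformations.collection.shapelet_based import RandomShapeletTransform
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 1, 100))
y = np.array([0, 1] * 10)
st = RandomShapeletTransform(n_shapelet_samples=100)
X_features = st.fit_transform(X, y)  # distances from each series to the kept shapelets
```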
### Interval-Based Features
Statistical summaries from time intervals:
- `RandomIntervals` - Features from random intervals
- `SupervisedIntervals` - Supervised interval selection
- `QUANTTransformer` - Quantile-based interval features
**Use when**: Predictive patterns localized to specific windows.
### Preprocessing Transformations
Data preparation and normalization:
- `MinMaxScaler` - Scale to [0, 1] range
- `Normalizer` - Z-normalization (zero mean, unit variance)
- `Centerer` - Center to zero mean
- `SimpleImputer` - Fill missing values
- `DownsampleTransformer` - Reduce temporal resolution
- `Tabularizer` - Convert time series to tabular format
**Use when**: Need standardization, missing value handling, format conversion.
### Specialized Transformations
Advanced analysis methods:
- `MatrixProfile` - Computes distance profiles for pattern discovery
- `DWTTransformer` - Discrete Wavelet Transform
- `AutocorrelationFunctionTransformer` - ACF computation
- `Dobin` - Distance-based Outlier BasIs using Neighbors
- `SignatureTransformer` - Path signature methods
- `PLATransformer` - Piecewise Linear Approximation
### Class Imbalance Handling
- `ADASYN` - Adaptive Synthetic Sampling
- `SMOTE` - Synthetic Minority Over-sampling
- `OHIT` - Over-sampling with Highly Imbalanced Time series
**Use when**: Classification with imbalanced classes.
### Pipeline Composition
- `CollectionTransformerPipeline` - Chain multiple transformers
## Series Transformers
Transform individual time series (e.g., for preprocessing in forecasting).
### Statistical Analysis
- `AutoCorrelationSeriesTransformer` - Autocorrelation
- `StatsModelsACF` - ACF using statsmodels
- `StatsModelsPACF` - Partial autocorrelation
### Smoothing and Filtering
- `ExponentialSmoothing` - Exponentially weighted moving average
- `MovingAverage` - Simple or weighted moving average
- `SavitzkyGolayFilter` - Polynomial smoothing
- `GaussianFilter` - Gaussian kernel smoothing
- `BKFilter` - Baxter-King bandpass filter
- `DiscreteFourierApproximation` - Fourier-based filtering
**Use when**: Need noise reduction, trend extraction, or frequency filtering.
### Dimensionality Reduction
- `PCASeriesTransformer` - Principal component analysis
- `PLASeriesTransformer` - Piecewise Linear Approximation
### Transformations
- `BoxCoxTransformer` - Variance stabilization
- `LogTransformer` - Logarithmic scaling
- `ClaSPTransformer` - Classification Score Profile
### Pipeline Composition
- `SeriesTransformerPipeline` - Chain series transformers
## Quick Start: Feature Extraction
```python
from aeon.transformations.collection.convolution_based import RocketTransformer
from aeon.classification.sklearn import RotationForest
from aeon.datasets import load_classification
# Load data
X_train, y_train = load_classification("GunPoint", split="train")
X_test, y_test = load_classification("GunPoint", split="test")
# Extract ROCKET features
rocket = RocketTransformer()
X_train_features = rocket.fit_transform(X_train)
X_test_features = rocket.transform(X_test)
# Use with any sklearn classifier
clf = RotationForest()
clf.fit(X_train_features, y_train)
accuracy = clf.score(X_test_features, y_test)
```
## Quick Start: Preprocessing Pipeline
```python
from aeon.transformations.collection import (
MinMaxScaler,
SimpleImputer,
CollectionTransformerPipeline
)
# Build preprocessing pipeline
pipeline = CollectionTransformerPipeline([
('imputer', SimpleImputer(strategy='mean')),
('scaler', MinMaxScaler())
])
X_transformed = pipeline.fit_transform(X_train)
```
## Quick Start: Series Smoothing
```python
from aeon.transformations.series import MovingAverage
# Smooth individual time series
smoother = MovingAverage(window_size=5)
y_smoothed = smoother.fit_transform(y)
```
## Algorithm Selection
### For Feature Extraction:
- **Speed + Performance**: MiniRocketTransformer
- **Interpretability**: Catch22, TSFresh
- **Dimensionality reduction**: PAA, SAX, PCA
- **Discriminative patterns**: Shapelet transforms
- **Comprehensive features**: TSFresh (with longer runtime)
### For Preprocessing:
- **Normalization**: Normalizer, MinMaxScaler
- **Smoothing**: MovingAverage, SavitzkyGolayFilter
- **Missing values**: SimpleImputer
- **Frequency analysis**: DWTTransformer, Fourier methods
### For Symbolic Representation:
- **Fast approximation**: PAA
- **Alphabet-based**: SAX
- **Frequency-based**: SFA, SFAFast
## Best Practices
1. **Fit on training data only**: Avoid data leakage
```python
transformer.fit(X_train)
X_train_tf = transformer.transform(X_train)
X_test_tf = transformer.transform(X_test)
```
2. **Pipeline composition**: Chain transformers for complex workflows
```python
pipeline = CollectionTransformerPipeline([
('imputer', SimpleImputer()),
('scaler', Normalizer()),
('features', RocketTransformer())
])
```
3. **Feature selection**: TSFresh can generate many features; consider selection
```python
from sklearn.feature_selection import SelectKBest
selector = SelectKBest(k=100)
X_selected = selector.fit_transform(X_features, y)
```
4. **Memory considerations**: Some transformers are memory-intensive on large datasets
- Use MiniRocket instead of ROCKET for speed
- Consider downsampling for very long series
- Use ROCKETGPU for GPU acceleration
5. **Domain knowledge**: Choose transformations that match the domain:
- Periodic data: Fourier-based methods
- Noisy data: Smoothing filters
- Spike detection: Wavelet transforms

View File

@@ -1,634 +0,0 @@
# Common Workflows and Integration Patterns
This reference provides end-to-end workflows, best practices, and integration patterns for using aeon effectively.
## Complete Classification Workflow
### Basic Classification Pipeline
```python
# 1. Import required modules
from aeon.classification.convolution_based import RocketClassifier
from aeon.datasets import load_arrow_head
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
# 2. Load and inspect data
X_train, y_train = load_arrow_head(split="train")
X_test, y_test = load_arrow_head(split="test")
print(f"Training shape: {X_train.shape}") # (n_cases, n_channels, n_timepoints)
print(f"Unique classes: {np.unique(y_train)}")
# 3. Train classifier
clf = RocketClassifier(num_kernels=10000, n_jobs=-1)
clf.fit(X_train, y_train)
# 4. Make predictions
y_pred = clf.predict(X_test)
y_proba = clf.predict_proba(X_test)
# 5. Evaluate performance
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
# 6. Visualize confusion matrix
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()
```
### Feature Extraction + Classifier Pipeline
```python
from sklearn.pipeline import Pipeline
from aeon.transformations.collection import Catch22
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
# Create pipeline
pipeline = Pipeline([
('features', Catch22(n_jobs=-1)),
('classifier', RandomForestClassifier(n_estimators=500, n_jobs=-1))
])
# Cross-validation
scores = cross_val_score(pipeline, X_train, y_train, cv=5, scoring='accuracy')
print(f"CV Accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
# Train on full training set
pipeline.fit(X_train, y_train)
# Evaluate on test set
accuracy = pipeline.score(X_test, y_test)
print(f"Test Accuracy: {accuracy:.3f}")
```
### Multi-Algorithm Comparison
```python
from aeon.classification.convolution_based import RocketClassifier, MiniRocketClassifier
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
from aeon.classification.feature_based import Catch22Classifier
from aeon.classification.interval_based import TimeSeriesForestClassifier
import time
classifiers = {
'ROCKET': RocketClassifier(num_kernels=10000),
'MiniRocket': MiniRocketClassifier(),
'KNN-DTW': KNeighborsTimeSeriesClassifier(distance='dtw', n_neighbors=5),
'Catch22': Catch22Classifier(),
'TSF': TimeSeriesForestClassifier(n_estimators=200)
}
results = {}
for name, clf in classifiers.items():
start_time = time.time()
clf.fit(X_train, y_train)
train_time = time.time() - start_time
start_time = time.time()
accuracy = clf.score(X_test, y_test)
test_time = time.time() - start_time
results[name] = {
'accuracy': accuracy,
'train_time': train_time,
'test_time': test_time
}
# Display results
import pandas as pd
df_results = pd.DataFrame(results).T
df_results = df_results.sort_values('accuracy', ascending=False)
print(df_results)
```
## Complete Forecasting Workflow
### Univariate Forecasting
```python
from aeon.forecasting.arima import ARIMA
from aeon.forecasting.naive import NaiveForecaster
from aeon.datasets import load_airline
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np
import matplotlib.pyplot as plt
# 1. Load data
y = load_airline()
# 2. Train/test split (temporal)
split_point = int(len(y) * 0.8)
y_train, y_test = y[:split_point], y[split_point:]
# 3. Create baseline (naive forecaster)
baseline = NaiveForecaster(strategy="last")
baseline.fit(y_train)
y_pred_baseline = baseline.predict(fh=np.arange(1, len(y_test) + 1))
# 4. Train ARIMA model
forecaster = ARIMA(order=(2, 1, 2), seasonal_order=(1, 1, 1, 12))
forecaster.fit(y_train)
y_pred = forecaster.predict(fh=np.arange(1, len(y_test) + 1))
# 5. Evaluate
mae_baseline = mean_absolute_error(y_test, y_pred_baseline)
mae_arima = mean_absolute_error(y_test, y_pred)
rmse_baseline = np.sqrt(mean_squared_error(y_test, y_pred_baseline))
rmse_arima = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"Baseline - MAE: {mae_baseline:.2f}, RMSE: {rmse_baseline:.2f}")
print(f"ARIMA - MAE: {mae_arima:.2f}, RMSE: {rmse_arima:.2f}")
# 6. Visualize
plt.figure(figsize=(12, 6))
plt.plot(y_train.index, y_train, label='Train', alpha=0.7)
plt.plot(y_test.index, y_test, label='Test (Actual)', alpha=0.7)
plt.plot(y_test.index, y_pred, label='ARIMA Forecast', linestyle='--')
plt.plot(y_test.index, y_pred_baseline, label='Baseline', linestyle=':', alpha=0.5)
plt.legend()
plt.title('Forecasting Results')
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()
```
### Forecast with Confidence Intervals
```python
from aeon.forecasting.arima import ARIMA
forecaster = ARIMA(order=(2, 1, 2))
forecaster.fit(y_train)
# Predict with prediction intervals
y_pred = forecaster.predict(fh=np.arange(1, len(y_test) + 1))
pred_interval = forecaster.predict_interval(
fh=np.arange(1, len(y_test) + 1),
coverage=0.95
)
# Visualize with confidence bands
plt.figure(figsize=(12, 6))
plt.plot(y_test.index, y_test, label='Actual')
plt.plot(y_test.index, y_pred, label='Forecast')
plt.fill_between(
y_test.index,
pred_interval.iloc[:, 0],
pred_interval.iloc[:, 1],
alpha=0.3,
label='95% Confidence'
)
plt.legend()
plt.show()
```
### Multi-Step Ahead Forecasting
```python
from aeon.forecasting.compose import DirectReductionForecaster
from sklearn.ensemble import GradientBoostingRegressor
# Convert to supervised learning problem
forecaster = DirectReductionForecaster(
regressor=GradientBoostingRegressor(n_estimators=100),
window_length=12
)
forecaster.fit(y_train)
# Forecast multiple steps
fh = np.arange(1, 13) # 12 months ahead
y_pred = forecaster.predict(fh=fh)
```
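Under the hood, a direct reduction builds one supervised training set per forecast horizon. A minimal sketch of that windowing (illustrative only, not aeon's internals):
```python
import numpy as np

def make_direct_training_set(y, window=12, horizon=1):
    # Each row of X holds the previous `window` values; the target is the
    # value `horizon` steps ahead. A direct strategy fits one regressor per horizon.
    X, targets = [], []
    for t in range(window, len(y) - horizon + 1):
        X.append(y[t - window:t])
        targets.append(y[t + horizon - 1])
    return np.asarray(X), np.asarray(targets)

X_lag, y_h1 = make_direct_training_set(np.asarray(y_train), window=12, horizon=1)
```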
## Complete Anomaly Detection Workflow
```python
from aeon.anomaly_detection import STOMP
from aeon.datasets import load_airline
import numpy as np
import matplotlib.pyplot as plt
# 1. Load data
y = load_airline()
X_series = y.values.reshape(1, 1, -1) # Convert to aeon format
# 2. Detect anomalies
detector = STOMP(window_size=50)
anomaly_scores = detector.fit_predict(X_series)
# 3. Identify anomalies (top 5%)
threshold = np.percentile(anomaly_scores, 95)
anomaly_indices = np.where(anomaly_scores > threshold)[0]
# 4. Visualize
fig, axes = plt.subplots(2, 1, figsize=(14, 8), sharex=True)
# Plot time series with anomalies
axes[0].plot(y.values, label='Time Series')
axes[0].scatter(
anomaly_indices,
y.values[anomaly_indices],
color='red',
s=100,
label='Anomalies',
zorder=5
)
axes[0].set_ylabel('Value')
axes[0].legend()
axes[0].set_title('Time Series with Detected Anomalies')
# Plot anomaly scores
axes[1].plot(anomaly_scores, label='Anomaly Score')
axes[1].axhline(threshold, color='red', linestyle='--', label='Threshold')
axes[1].set_xlabel('Time')
axes[1].set_ylabel('Score')
axes[1].legend()
axes[1].set_title('Anomaly Scores')
plt.tight_layout()
plt.show()
# 5. Extract anomalous segments
print(f"Found {len(anomaly_indices)} anomalies")
for idx in anomaly_indices[:5]: # Show first 5
print(f"Anomaly at index {idx}, value: {y.values[idx]:.2f}")
```
## Complete Clustering Workflow
```python
from aeon.clustering import TimeSeriesKMeans
from aeon.datasets import load_basic_motions
from sklearn.metrics import silhouette_score, davies_bouldin_score
import matplotlib.pyplot as plt
import numpy as np
# 1. Load data
X_train, y_train = load_basic_motions(split="train")
# 2. Determine optimal number of clusters (elbow method)
inertias = []
silhouettes = []
K = range(2, 11)
for k in K:
clusterer = TimeSeriesKMeans(n_clusters=k, distance="euclidean", n_init=5)
labels = clusterer.fit_predict(X_train)
inertias.append(clusterer.inertia_)
silhouettes.append(silhouette_score(X_train.reshape(len(X_train), -1), labels))
# Plot elbow curve
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
axes[0].plot(K, inertias, 'bo-')
axes[0].set_xlabel('Number of Clusters')
axes[0].set_ylabel('Inertia')
axes[0].set_title('Elbow Method')
axes[1].plot(K, silhouettes, 'ro-')
axes[1].set_xlabel('Number of Clusters')
axes[1].set_ylabel('Silhouette Score')
axes[1].set_title('Silhouette Analysis')
plt.tight_layout()
plt.show()
# 3. Cluster with optimal k
optimal_k = 4
clusterer = TimeSeriesKMeans(n_clusters=optimal_k, distance="dtw", n_init=10)
labels = clusterer.fit_predict(X_train)
# 4. Visualize clusters
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.ravel()
for cluster_id in range(optimal_k):
cluster_indices = np.where(labels == cluster_id)[0]
ax = axes[cluster_id]
# Plot all series in cluster
for idx in cluster_indices[:20]: # Plot up to 20 series
ax.plot(X_train[idx, 0, :], alpha=0.3, color='blue')
# Plot cluster center
ax.plot(clusterer.cluster_centers_[cluster_id, 0, :],
color='red', linewidth=2, label='Center')
ax.set_title(f'Cluster {cluster_id} (n={len(cluster_indices)})')
ax.legend()
plt.tight_layout()
plt.show()
```
## Cross-Validation Strategies
### Standard K-Fold Cross-Validation
```python
from sklearn.model_selection import cross_val_score, StratifiedKFold
from aeon.classification.convolution_based import RocketClassifier
clf = RocketClassifier()
# Stratified K-Fold (preserves class distribution)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(clf, X_train, y_train, cv=cv, scoring='accuracy')
print(f"Cross-validation scores: {scores}")
print(f"Mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```
### Time Series Cross-Validation (for forecasting)
```python
from sklearn.model_selection import TimeSeriesSplit
from aeon.forecasting.arima import ARIMA
from sklearn.metrics import mean_squared_error
import numpy as np
# Time-aware split (no future data leakage)
tscv = TimeSeriesSplit(n_splits=5)
mse_scores = []
for train_idx, test_idx in tscv.split(y):
y_train_cv, y_test_cv = y.iloc[train_idx], y.iloc[test_idx]
forecaster = ARIMA(order=(2, 1, 2))
forecaster.fit(y_train_cv)
fh = np.arange(1, len(y_test_cv) + 1)
y_pred = forecaster.predict(fh=fh)
mse = mean_squared_error(y_test_cv, y_pred)
mse_scores.append(mse)
print(f"CV MSE: {np.mean(mse_scores):.3f} (+/- {np.std(mse_scores):.3f})")
```
## Hyperparameter Tuning
### Grid Search
```python
from sklearn.model_selection import GridSearchCV
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
# Define parameter grid
param_grid = {
'n_neighbors': [1, 3, 5, 7, 9],
'distance': ['dtw', 'euclidean', 'erp', 'msm'],
'distance_params': [{'window': 0.1}, {'window': 0.2}, None]
}
# Grid search with cross-validation
clf = KNeighborsTimeSeriesClassifier()
grid_search = GridSearchCV(
clf,
param_grid,
cv=5,
scoring='accuracy',
n_jobs=-1,
verbose=2
)
grid_search.fit(X_train, y_train)
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best CV score: {grid_search.best_score_:.3f}")
print(f"Test accuracy: {grid_search.score(X_test, y_test):.3f}")
```
### Random Search
```python
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint
import numpy as np
param_distributions = {
'n_neighbors': randint(1, 20),
'distance': ['dtw', 'euclidean', 'ddtw'],
'distance_params': [{'window': w} for w in np.linspace(0.0, 0.5, 10)]
}
clf = KNeighborsTimeSeriesClassifier()
random_search = RandomizedSearchCV(
clf,
param_distributions,
n_iter=50,
cv=5,
scoring='accuracy',
n_jobs=-1,
random_state=42
)
random_search.fit(X_train, y_train)
print(f"Best parameters: {random_search.best_params_}")
```
## Integration with scikit-learn
### Using aeon in scikit-learn Pipelines
```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from aeon.transformations.collection import Catch22
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
pipeline = Pipeline([
('features', Catch22()),
('scaler', StandardScaler()),
('feature_selection', SelectKBest(f_classif, k=15)),
('classifier', RandomForestClassifier(n_estimators=500))
])
pipeline.fit(X_train, y_train)
accuracy = pipeline.score(X_test, y_test)
```
### Voting Ensemble with scikit-learn
```python
from sklearn.ensemble import VotingClassifier
from aeon.classification.convolution_based import RocketClassifier
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
from aeon.classification.feature_based import Catch22Classifier
ensemble = VotingClassifier(
estimators=[
('rocket', RocketClassifier()),
('knn', KNeighborsTimeSeriesClassifier()),
('catch22', Catch22Classifier())
],
voting='soft',
n_jobs=-1
)
ensemble.fit(X_train, y_train)
accuracy = ensemble.score(X_test, y_test)
```
### Stacking with Meta-Learner
```python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from aeon.classification.convolution_based import MiniRocketClassifier
from aeon.classification.interval_based import TimeSeriesForestClassifier
stacking = StackingClassifier(
estimators=[
('minirocket', MiniRocketClassifier()),
('tsf', TimeSeriesForestClassifier(n_estimators=100))
],
final_estimator=LogisticRegression(),
cv=5
)
stacking.fit(X_train, y_train)
accuracy = stacking.score(X_test, y_test)
```
## Data Preprocessing
### Handling Variable-Length Series
```python
from aeon.transformations.collection import PaddingTransformer
# Pad series to equal length
padder = PaddingTransformer(pad_length=None, fill_value=0)
X_padded = padder.fit_transform(X_variable_length)
```
### Handling Missing Values
```python
from aeon.transformations.series import Imputer
imputer = Imputer(method='mean')
X_imputed = imputer.fit_transform(X_with_missing)
```
### Normalization
```python
from aeon.transformations.collection import Normalizer
normalizer = Normalizer(method='z-score')
X_normalized = normalizer.fit_transform(X_train)
```
## Model Persistence
### Saving and Loading Models
```python
import pickle
from aeon.classification.convolution_based import RocketClassifier
# Train and save
clf = RocketClassifier()
clf.fit(X_train, y_train)
with open('rocket_model.pkl', 'wb') as f:
pickle.dump(clf, f)
# Load and predict
with open('rocket_model.pkl', 'rb') as f:
loaded_clf = pickle.load(f)
predictions = loaded_clf.predict(X_test)
```
### Using joblib (recommended for large models)
```python
import joblib
# Save
joblib.dump(clf, 'rocket_model.joblib')
# Load
loaded_clf = joblib.load('rocket_model.joblib')
```
## Visualization Utilities
### Plotting Time Series
```python
from aeon.visualisation import plot_series
import matplotlib.pyplot as plt
# Plot multiple series
fig, ax = plt.subplots(figsize=(12, 6))
plot_series(X_train[0], X_train[1], X_train[2], labels=['Series 1', 'Series 2', 'Series 3'], ax=ax)
plt.title('Time Series Visualization')
plt.show()
```
### Plotting Distance Matrices
```python
from aeon.distances import pairwise_distance
import seaborn as sns
dist_matrix = pairwise_distance(X_train[:50], metric="dtw")
plt.figure(figsize=(10, 8))
sns.heatmap(dist_matrix, cmap='viridis', square=True)
plt.title('DTW Distance Matrix')
plt.show()
```
## Performance Optimization Tips
1. **Use n_jobs=-1** for parallel processing:
```python
clf = RocketClassifier(num_kernels=10000, n_jobs=-1)
```
2. **Use MiniRocket instead of ROCKET** for faster training:
```python
clf = MiniRocketClassifier()  # up to ~75x faster than full ROCKET
```
3. **Reduce num_kernels** for faster training:
```python
clf = RocketClassifier(num_kernels=2000) # Default is 10000
```
4. **Use Catch22 instead of TSFresh**:
```python
transform = Catch22() # Much faster, fewer features
```
5. **Window constraints for DTW**:
```python
clf = KNeighborsTimeSeriesClassifier(
distance='dtw',
distance_params={'window': 0.1} # Constrain warping
)
```
## Best Practices
1. **Always use train/test split** with time series ordering preserved
2. **Use stratified splits** for classification to maintain class balance
3. **Start with fast algorithms** (ROCKET, MiniRocket) before trying slow ones
4. **Use cross-validation** to estimate generalization performance
5. **Benchmark against naive baselines** to establish minimum performance (see the sketch after this list)
6. **Normalize/standardize** when using distance-based methods
7. **Use appropriate distance metrics** for your data characteristics
8. **Save trained models** to avoid retraining
9. **Monitor training time** and computational resources
10. **Visualize results** to understand model behavior
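For practice 5, a majority-class dummy model provides the floor that any real classifier must beat (sklearn's `DummyClassifier`; `X_train`/`X_test` follow the shapes used in the workflows above):
```python
from sklearn.dummy import DummyClassifier
from aeon.classification.convolution_based import RocketClassifier

# Flatten to 2D for the sklearn baseline (it ignores the feature values anyway)
X_train_flat = X_train.reshape(len(X_train), -1)
X_test_flat = X_test.reshape(len(X_test), -1)

baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train_flat, y_train)
print(f"Baseline accuracy: {baseline.score(X_test_flat, y_test):.3f}")

clf = RocketClassifier()
clf.fit(X_train, y_train)
print(f"ROCKET accuracy: {clf.score(X_test, y_test):.3f}")
```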