mirror of
https://github.com/K-Dense-AI/claude-scientific-skills.git
synced 2026-03-28 07:33:45 +08:00
Support for aeon for time-series analysis and machine learning
749
scientific-packages/aeon/references/core_modules.md
Normal file
@@ -0,0 +1,749 @@
|
||||
# Core Modules: Transformations, Distances, Networks, Datasets, and Benchmarking
|
||||
|
||||
This reference provides comprehensive details on foundational modules that support aeon's learning tasks.
|
||||
|
||||
## Transformations
|
||||
|
||||
Transformations convert time series into alternative representations for feature extraction, preprocessing, or visualization.
|
||||
|
||||
### Two Types of Transformers
|
||||
|
||||
**Collection Transformers**: Process entire collections of time series
|
||||
- Input: `(n_cases, n_channels, n_timepoints)`
|
||||
- Output: Features, transformed collections, or tabular data
|
||||
|
||||
**Series Transformers**: Work on individual time series
|
||||
- Input: Single time series
|
||||
- Output: Transformed single series
|
||||
|
||||
### Collection-Level Transformations
|
||||
|
||||
#### ROCKET (RAndom Convolutional KErnel Transform)
|
||||
|
||||
Fast feature extraction via random convolutional kernels:
|
||||
|
||||
```python
from aeon.transformations.collection.convolution_based import Rocket

# `n_kernels` is the current parameter name; older aeon releases call it `num_kernels`.
rocket = Rocket(n_kernels=10000, n_jobs=-1)
X_transformed = rocket.fit_transform(X_train)
# Output shape: (n_cases, 2 * n_kernels)
```
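
The output has two columns per kernel because each kernel's convolution output is pooled twice: once with the maximum and once with the proportion of positive values (PPV). The following is a plain-Python sketch of that idea only, not aeon's implementation; the kernel lengths, dilations, and helper names (`random_kernel`, `apply_kernel`, `rocket_like_features`) are illustrative assumptions.

```python
import random


def random_kernel(rng):
    """Draw one kernel: centred weights, a bias, and a dilation (illustrative choices)."""
    length = rng.choice([7, 9, 11])
    weights = [rng.gauss(0.0, 1.0) for _ in range(length)]
    mean_w = sum(weights) / length
    weights = [w - mean_w for w in weights]
    bias = rng.uniform(-1.0, 1.0)
    dilation = rng.choice([1, 2, 4])
    return weights, bias, dilation


def apply_kernel(series, weights, bias, dilation):
    """Unpadded dilated convolution, pooled into (max, PPV)."""
    span = (len(weights) - 1) * dilation
    outputs = []
    for start in range(len(series) - span):
        total = bias
        for j, w in enumerate(weights):
            total += w * series[start + j * dilation]
        outputs.append(total)
    max_value = max(outputs)
    ppv = sum(1 for o in outputs if o > 0) / len(outputs)
    return max_value, ppv


def rocket_like_features(series, n_kernels, seed=0):
    """Concatenate the two pooled features of every random kernel."""
    rng = random.Random(seed)
    features = []
    for _ in range(n_kernels):
        weights, bias, dilation = random_kernel(rng)
        features.extend(apply_kernel(series, weights, bias, dilation))
    return features


series = [float((t % 10) - 5) for t in range(100)]
features = rocket_like_features(series, n_kernels=50)
print(len(features))  # 100: two features per kernel
```

The kernels are never trained; only the linear classifier fitted on these features learns anything, which is where the speed comes from.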
|
||||
|
||||
**Variants:**
|
||||
```python
from aeon.transformations.collection.convolution_based import (
    HydraTransformer,
    MiniRocket,
    MultiRocket,
)

# MiniRocket: faster, almost deterministic variant that keeps only PPV features
minirocket = MiniRocket(n_kernels=10000)
X_features = minirocket.fit_transform(X_train)

# MultiRocket: adds first-order differences and extra pooling operators
multirocket = MultiRocket(n_kernels=10000)
X_features = multirocket.fit_transform(X_train)

# Hydra: competing convolutional kernels counted dictionary-style
# (exposed as HydraTransformer; may require the optional PyTorch dependency)
hydra = HydraTransformer(n_kernels=8)
X_features = hydra.fit_transform(X_train)
```
|
||||
|
||||
#### Catch22
|
||||
|
||||
22 canonical time series features:
|
||||
|
||||
```python
|
||||
from aeon.transformations.collection.feature_based import Catch22
|
||||
|
||||
catch22 = Catch22(n_jobs=-1)
|
||||
X_features = catch22.fit_transform(X_train)
|
||||
# Output shape: (n_cases, 22)
|
||||
```
|
||||
|
||||
**Feature categories:**
|
||||
- Distribution shape (histogram modes, outlier timing); mean and standard deviation are only added with `catch24=True`
|
||||
- Autocorrelation properties
|
||||
- Entropy measures
|
||||
- Nonlinear dynamics
|
||||
- Spectral properties
|
||||
|
||||
#### TSFresh
|
||||
|
||||
Comprehensive feature extraction (779 features):
|
||||
|
||||
```python
|
||||
from aeon.transformations.collection.feature_based import TSFresh
|
||||
|
||||
tsfresh = TSFresh(
|
||||
default_fc_parameters="comprehensive",
|
||||
n_jobs=-1
|
||||
)
|
||||
X_features = tsfresh.fit_transform(X_train)
|
||||
```
|
||||
|
||||
**Warning**: Slow on large datasets; use Catch22 for faster alternative
|
||||
|
||||
#### FreshPRINCE
|
||||
|
||||
A pipeline of TSFresh features followed by a rotation forest (Fresh Pipeline with RotatIoN forest Classifier). It is exposed as an estimator rather than a transformer:

```python
from aeon.classification.feature_based import FreshPRINCEClassifier

# Requires the optional tsfresh dependency
freshprince = FreshPRINCEClassifier(n_estimators=200, n_jobs=-1)
freshprince.fit(X_train, y_train)
```
|
||||
|
||||
#### Shapelet Transform
|
||||
|
||||
Extract discriminative subsequences:
|
||||
|
||||
```python
from aeon.transformations.collection.shapelet_based import RandomShapeletTransform

shapelet = RandomShapeletTransform(
    n_shapelet_samples=10000,
    max_shapelets=20,
    n_jobs=-1,
)
X_features = shapelet.fit_transform(X_train, y_train)
# Requires labels for supervised shapelet discovery
```

**Random Dilated Shapelet Transform**:

```python
from aeon.transformations.collection.shapelet_based import (
    RandomDilatedShapeletTransform,
)

rdst = RandomDilatedShapeletTransform(max_shapelets=1000)
X_features = rdst.fit_transform(X_train, y_train)
```
|
||||
|
||||
#### SAST (Scalable and Accurate Subsequence Transform)

Uses the subsequences of a few reference series per class as shapelet candidates:

```python
from aeon.transformations.collection.shapelet_based import SAST

sast = SAST(stride=1, nb_inst_per_class=1)
X_features = sast.fit_transform(X_train, y_train)
```
|
||||
|
||||
#### Symbolic Representations
|
||||
|
||||
**SAX (Symbolic Aggregate approXimation)**:
|
||||
```python
|
||||
from aeon.transformations.collection.dictionary_based import SAX
|
||||
|
||||
sax = SAX(n_segments=8, alphabet_size=4)
|
||||
X_symbolic = sax.fit_transform(X_train)
|
||||
```
|
||||
|
||||
**PAA (Piecewise Aggregate Approximation)**:
|
||||
```python
|
||||
from aeon.transformations.collection.dictionary_based import PAA
|
||||
|
||||
paa = PAA(n_segments=10)
|
||||
X_approximated = paa.fit_transform(X_train)
|
||||
```
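
SAX builds on PAA: the series is z-normalised, reduced to segment means, and each mean is then mapped to a letter using breakpoints that cut a standard normal distribution into equally likely regions. The sketch below shows those steps in plain Python for a four-letter alphabet. It is an illustration under simplifying assumptions (series length divisible by `n_segments`, hypothetical helper names), not aeon's `SAX` or `PAA` code.

```python
import statistics

# Breakpoints splitting a standard normal into 4 equiprobable regions.
BREAKPOINTS_4 = [-0.6745, 0.0, 0.6745]


def z_normalise(series):
    """Rescale to zero mean and unit variance (constant series map to zeros)."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    if std == 0:
        return [0.0 for _ in series]
    return [(v - mean) / std for v in series]


def paa(series, n_segments):
    """Mean of each equal-width segment; assumes the length divides evenly."""
    width = len(series) // n_segments
    return [
        statistics.fmean(series[i * width:(i + 1) * width])
        for i in range(n_segments)
    ]


def sax_word(series, n_segments, breakpoints=BREAKPOINTS_4, alphabet="abcd"):
    """PAA on the z-normalised series, then one letter per segment mean."""
    reduced = paa(z_normalise(series), n_segments)
    word = []
    for value in reduced:
        symbol_index = sum(1 for b in breakpoints if value > b)
        word.append(alphabet[symbol_index])
    return "".join(word)


ramp = [float(t) for t in range(16)]
print(paa(ramp, n_segments=4))       # [1.5, 5.5, 9.5, 13.5]
print(sax_word(ramp, n_segments=4))  # abcd
```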
|
||||
|
||||
**SFA (Symbolic Fourier Approximation)**:
|
||||
```python
|
||||
from aeon.transformations.collection.dictionary_based import SFA
|
||||
|
||||
sfa = SFA(word_length=8, alphabet_size=4)
|
||||
X_symbolic = sfa.fit_transform(X_train)
|
||||
```
|
||||
|
||||
#### Channel Selection and Operations
|
||||
|
||||
**Channel Selection**:
|
||||
```python
|
||||
from aeon.transformations.collection.channel_selection import ChannelSelection
|
||||
|
||||
selector = ChannelSelection(channels=[0, 2, 5])
|
||||
X_selected = selector.fit_transform(X_train)
|
||||
```
|
||||
|
||||
**Channel Scoring**:
|
||||
```python
|
||||
from aeon.transformations.collection.channel_selection import ChannelScorer
|
||||
|
||||
scorer = ChannelScorer()
|
||||
scores = scorer.fit_transform(X_train, y_train)
|
||||
```
|
||||
|
||||
#### Data Balancing
|
||||
|
||||
**SMOTE (Synthetic Minority Over-sampling)**:
|
||||
```python
|
||||
from aeon.transformations.collection.smote import SMOTE
|
||||
|
||||
smote = SMOTE(k_neighbors=5)
|
||||
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)
|
||||
```
|
||||
|
||||
**ADASYN**:
|
||||
```python
|
||||
from aeon.transformations.collection.smote import ADASYN
|
||||
|
||||
adasyn = ADASYN(n_neighbors=5)
|
||||
X_resampled, y_resampled = adasyn.fit_resample(X_train, y_train)
|
||||
```
|
||||
|
||||
### Series-Level Transformations
|
||||
|
||||
#### Smoothing Filters
|
||||
|
||||
**Moving Average**:
|
||||
```python
|
||||
from aeon.transformations.series.moving_average import MovingAverage
|
||||
|
||||
ma = MovingAverage(window_size=5)
|
||||
X_smoothed = ma.fit_transform(X_series)
|
||||
```
|
||||
|
||||
**Power Transform** (rescales values; not a smoothing filter):

```python
from aeon.transformations.series.exponent import ExponentTransformer

# power=0.5 takes the square root of each value
power_tf = ExponentTransformer(power=0.5)
X_powered = power_tf.fit_transform(X_series)
```
|
||||
|
||||
**Savitzky-Golay Filter**:
|
||||
```python
|
||||
from aeon.transformations.series.savgol import SavitzkyGolay
|
||||
|
||||
savgol = SavitzkyGolay(window_length=11, polyorder=3)
|
||||
X_smoothed = savgol.fit_transform(X_series)
|
||||
```
|
||||
|
||||
**Gaussian Filter**:
|
||||
```python
|
||||
from aeon.transformations.series.gaussian import GaussianFilter
|
||||
|
||||
gaussian = GaussianFilter(sigma=2.0)
|
||||
X_smoothed = gaussian.fit_transform(X_series)
|
||||
```
|
||||
|
||||
#### Statistical Transforms
|
||||
|
||||
**Box-Cox Transformation**:
|
||||
```python
|
||||
from aeon.transformations.series.boxcox import BoxCoxTransformer
|
||||
|
||||
boxcox = BoxCoxTransformer()
|
||||
X_transformed = boxcox.fit_transform(X_series)
|
||||
```
|
||||
|
||||
**AutoCorrelation**:
|
||||
```python
|
||||
from aeon.transformations.series.acf import AutoCorrelationTransformer
|
||||
|
||||
acf = AutoCorrelationTransformer(n_lags=40)
|
||||
X_acf = acf.fit_transform(X_series)
|
||||
```
|
||||
|
||||
**PCA (Principal Component Analysis)**:
|
||||
```python
|
||||
from aeon.transformations.series.pca import PCATransformer
|
||||
|
||||
pca = PCATransformer(n_components=3)
|
||||
X_reduced = pca.fit_transform(X_series)
|
||||
```
|
||||
|
||||
#### Approximation Methods
|
||||
|
||||
**Discrete Fourier Transform (DFT)**:
|
||||
```python
|
||||
from aeon.transformations.series.fourier import FourierTransform
|
||||
|
||||
dft = FourierTransform()
|
||||
X_freq = dft.fit_transform(X_series)
|
||||
```
|
||||
|
||||
**Piecewise Linear Approximation (PLA)**:
|
||||
```python
|
||||
from aeon.transformations.series.pla import PLA
|
||||
|
||||
pla = PLA(n_segments=10)
|
||||
X_approx = pla.fit_transform(X_series)
|
||||
```
|
||||
|
||||
#### Anomaly Detection Transform
|
||||
|
||||
**DOBIN (Distance-based Outlier BasIs using Neighbors)**:
|
||||
```python
|
||||
from aeon.transformations.series.dobin import DOBIN
|
||||
|
||||
dobin = DOBIN()
|
||||
X_transformed = dobin.fit_transform(X_series)
|
||||
```
|
||||
|
||||
### Transformation Pipelines
|
||||
|
||||
Chain transformers together:
|
||||
|
||||
```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline

from aeon.transformations.collection.feature_based import Catch22

# Catch22 returns a 2D feature table, so any scikit-learn step can follow it
pipeline = Pipeline([
    ("features", Catch22()),
    ("reduce", PCA(n_components=10)),
])
X_transformed = pipeline.fit_transform(X_train)
```
|
||||
|
||||
## Distance Metrics
|
||||
|
||||
Specialized distance functions for time series similarity measurement.
|
||||
|
||||
### Distance Categories
|
||||
|
||||
#### Warping-Based Distances
|
||||
|
||||
**DTW (Dynamic Time Warping)**:
|
||||
```python
from aeon.distances import (
    dtw_alignment_path,
    dtw_cost_matrix,
    dtw_distance,
    dtw_pairwise_distance,
)

# Compute distance between two series
dist = dtw_distance(series1, series2, window=0.2)

# Pairwise distances for a collection
dist_matrix = dtw_pairwise_distance(X_collection)

# Get alignment path (returned together with the distance)
path, dist = dtw_alignment_path(series1, series2)

# Get cost matrix
cost = dtw_cost_matrix(series1, series2)
```
|
||||
|
||||
**DTW Variants**:
|
||||
```python
|
||||
from aeon.distances import (
|
||||
wdtw_distance, # Weighted DTW
|
||||
ddtw_distance, # Derivative DTW
|
||||
wddtw_distance, # Weighted Derivative DTW
|
||||
adtw_distance, # Amerced DTW
|
||||
shape_dtw_distance # Shape DTW
|
||||
)
|
||||
|
||||
# Weighted DTW (penalize warping)
|
||||
dist = wdtw_distance(series1, series2, g=0.05)
|
||||
|
||||
# Derivative DTW (compare shapes)
|
||||
dist = ddtw_distance(series1, series2)
|
||||
|
||||
# Shape DTW (with shape descriptors)
|
||||
dist = shape_dtw_distance(series1, series2)
|
||||
```
|
||||
|
||||
**DTW Parameters**:
|
||||
- `window`: Sakoe-Chiba band constraint (0.0-1.0)
|
||||
- `g`: Penalty weight for warping distances
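
To make the `window` parameter concrete, here is a minimal plain-Python sketch of DTW with a Sakoe-Chiba band. It assumes univariate series and a squared pointwise cost, and the function name `dtw_distance_sketch` is hypothetical; it illustrates the recurrence and is not aeon's implementation.

```python
import math


def dtw_distance_sketch(x, y, window=None):
    """DTW with squared pointwise cost between two univariate lists.

    window is the Sakoe-Chiba band as a fraction of the series length;
    None leaves the warping path unconstrained.
    """
    n, m = len(x), len(y)
    if window is None:
        band = max(n, m)
    else:
        # Keep the band wide enough to reach the final cell (n, m).
        band = max(int(window * max(n, m)), abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            cost[i][j] = d + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


a = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
b = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]  # same bump, one step earlier

print(dtw_distance_sketch(a, b))              # 0.0: warping absorbs the shift
print(dtw_distance_sketch(a, b, window=0.0))  # 4.0: zero band is squared Euclidean
```

A narrower band both speeds up the computation and prevents alignments that stretch one series implausibly far.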
|
||||
|
||||
#### Edit Distances
|
||||
|
||||
**ERP (Edit distance with Real Penalty)**:
|
||||
```python
|
||||
from aeon.distances import erp_distance
|
||||
|
||||
dist = erp_distance(series1, series2, g=0.0, window=None)
|
||||
```
|
||||
|
||||
**EDR (Edit Distance on Real sequences)**:
|
||||
```python
|
||||
from aeon.distances import edr_distance
|
||||
|
||||
dist = edr_distance(series1, series2, epsilon=0.1, window=None)
|
||||
```
|
||||
|
||||
**LCSS (Longest Common SubSequence)**:
|
||||
```python
|
||||
from aeon.distances import lcss_distance
|
||||
|
||||
dist = lcss_distance(series1, series2, epsilon=1.0, window=None)
|
||||
```
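
The role of `epsilon` is easiest to see in code: two points count as a match when they differ by at most `epsilon`, and the distance is one minus the matched fraction. The following plain-Python sketch assumes univariate series and no window constraint; `lcss_distance_sketch` is a hypothetical name, not the aeon function.

```python
def lcss_distance_sketch(x, y, epsilon=1.0):
    """1 - (longest common subsequence length / shorter series length)."""
    n, m = len(x), len(y)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(x[i - 1] - y[j - 1]) <= epsilon:
                # Points match: extend the common subsequence.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # No match: carry over the best length seen so far.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return 1.0 - table[n][m] / min(n, m)


clean = [1.0, 2.0, 3.0, 4.0, 5.0]
spiky = [1.0, 2.0, 50.0, 4.0, 5.0]  # one outlier

print(round(lcss_distance_sketch(clean, spiky, epsilon=0.5), 2))  # 0.2
```

The outlier costs exactly one unmatched point no matter how large it is, which is why LCSS is considered robust to noise.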
|
||||
|
||||
**TWE (Time Warp Edit)**:
|
||||
```python
from aeon.distances import twe_distance

# nu: stiffness, lmbda: penalty for edit operations
dist = twe_distance(series1, series2, nu=0.001, lmbda=0.1)
```
|
||||
|
||||
#### Standard Metrics
|
||||
|
||||
```python
|
||||
from aeon.distances import (
|
||||
euclidean_distance,
|
||||
manhattan_distance,
|
||||
minkowski_distance,
|
||||
squared_distance
|
||||
)
|
||||
|
||||
# Euclidean distance
|
||||
dist = euclidean_distance(series1, series2)
|
||||
|
||||
# Manhattan (L1) distance
|
||||
dist = manhattan_distance(series1, series2)
|
||||
|
||||
# Minkowski distance
|
||||
dist = minkowski_distance(series1, series2, p=3)
|
||||
|
||||
# Squared Euclidean
|
||||
dist = squared_distance(series1, series2)
|
||||
```
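
These four metrics are closely related: Manhattan and Euclidean are the Minkowski distance with `p=1` and `p=2`, and the squared distance is Euclidean without the final square root. A small plain-Python illustration, using a hypothetical helper rather than the aeon functions:

```python
def minkowski_sketch(x, y, p):
    """Minkowski distance of order p between two equal-length lists."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


x = [0.0, 3.0, 0.0]
y = [4.0, 0.0, 0.0]

manhattan = minkowski_sketch(x, y, p=1)
euclidean = minkowski_sketch(x, y, p=2)
squared = sum((a - b) ** 2 for a, b in zip(x, y))

print(manhattan)  # 7.0
print(euclidean)  # 5.0
print(squared)    # 25.0
```

Because the square root is monotonic, ranking neighbours by squared distance gives the same order as ranking by Euclidean distance, so the cheaper form is enough for nearest-neighbour search.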
|
||||
|
||||
#### Specialized Distances
|
||||
|
||||
**MSM (Move-Split-Merge)**:
|
||||
```python
|
||||
from aeon.distances import msm_distance
|
||||
|
||||
dist = msm_distance(series1, series2, c=1.0)
|
||||
```
|
||||
|
||||
**SBD (Shape-Based Distance)**:
|
||||
```python
|
||||
from aeon.distances import sbd_distance
|
||||
|
||||
dist = sbd_distance(series1, series2)
|
||||
```
|
||||
|
||||
### Unified Distance Interface
|
||||
|
||||
```python
|
||||
from aeon.distances import distance, pairwise_distance
|
||||
|
||||
# Compute any distance by name
|
||||
dist = distance(series1, series2, metric="dtw", window=0.1)
|
||||
|
||||
# Pairwise distance matrix
|
||||
dist_matrix = pairwise_distance(X_collection, metric="euclidean")
|
||||
|
||||
# Get available distance names
|
||||
from aeon.distances import get_distance_function_names
|
||||
available_distances = get_distance_function_names()
|
||||
```
|
||||
|
||||
### Distance Selection Guide
|
||||
|
||||
**Fast and accurate**:
|
||||
- Euclidean for aligned series
|
||||
- Squared for even faster computation
|
||||
|
||||
**Handle temporal shifts**:
|
||||
- DTW for general warping
|
||||
- WDTW to penalize excessive warping
|
||||
|
||||
**Shape-based similarity**:
|
||||
- DDTW or Shape DTW
|
||||
- SBD for normalized shape comparison
|
||||
|
||||
**Robust to noise**:
|
||||
- ERP, EDR, or LCSS
|
||||
|
||||
**Multivariate**:
|
||||
- DTW supports multivariate via independent/dependent alignment
|
||||
|
||||
## Deep Learning Networks
|
||||
|
||||
Neural network architectures specialized for time series.
|
||||
|
||||
### Network Architectures
|
||||
|
||||
#### InceptionTime
|
||||
Ensemble of Inception modules capturing multi-scale patterns:
|
||||
|
||||
```python
from aeon.classification.deep_learning import InceptionTimeClassifier
from aeon.networks import InceptionNetwork

# Use via classifier
clf = InceptionTimeClassifier(
    n_epochs=200,
    batch_size=64,
    n_classifiers=5,  # size of the ensemble
)

# Or build the Keras layers directly; the network holds only the architecture,
# so the input shape (n_timepoints, n_channels) is passed at build time
network = InceptionNetwork()
input_layer, output_layer = network.build_network(input_shape=(100, 1))
```
|
||||
|
||||
#### ResNet
|
||||
Residual networks with skip connections:
|
||||
|
||||
```python
from aeon.classification.deep_learning import ResNetClassifier

clf = ResNetClassifier(
    n_epochs=200,
    batch_size=64,
    n_residual_blocks=3,
)
```
|
||||
|
||||
#### FCN (Fully Convolutional Network)
|
||||
```python
from aeon.classification.deep_learning import FCNClassifier

clf = FCNClassifier(
    n_epochs=200,
    batch_size=64,
    n_layers=3,
)
```
|
||||
|
||||
#### CNN
|
||||
Standard convolutional architecture:
|
||||
|
||||
```python
|
||||
from aeon.classification.deep_learning import CNNClassifier
|
||||
|
||||
clf = CNNClassifier(
|
||||
n_epochs=100,
|
||||
batch_size=32,
|
||||
kernel_size=7,
|
||||
n_filters=32
|
||||
)
|
||||
```
|
||||
|
||||
#### TapNet
|
||||
Attentional prototype networks:
|
||||
|
||||
```python
|
||||
from aeon.classification.deep_learning import TapNetClassifier
|
||||
|
||||
clf = TapNetClassifier(
|
||||
n_epochs=200,
|
||||
batch_size=64
|
||||
)
|
||||
```
|
||||
|
||||
#### MLP (Multi-Layer Perceptron)
|
||||
```python
|
||||
from aeon.classification.deep_learning import MLPClassifier
|
||||
|
||||
clf = MLPClassifier(
|
||||
n_epochs=100,
|
||||
batch_size=32,
|
||||
hidden_layer_sizes=[500]
|
||||
)
|
||||
```
|
||||
|
||||
#### LITE (Light Inception with boosTing tEchnique)
|
||||
Lightweight ensemble network:
|
||||
|
||||
```python
|
||||
from aeon.classification.deep_learning import LITEClassifier
|
||||
|
||||
clf = LITEClassifier(
|
||||
n_epochs=100,
|
||||
batch_size=64
|
||||
)
|
||||
```
|
||||
|
||||
### Training Configuration
|
||||
|
||||
```python
|
||||
from aeon.classification.deep_learning import InceptionTimeClassifier
|
||||
|
||||
clf = InceptionTimeClassifier(
|
||||
n_epochs=200,
|
||||
batch_size=64,
|
||||
learning_rate=0.001,
|
||||
use_bias=True,
|
||||
verbose=1
|
||||
)
|
||||
clf.fit(X_train, y_train)
|
||||
```
|
||||
|
||||
**Common parameters:**
|
||||
- `n_epochs`: Training iterations
|
||||
- `batch_size`: Samples per gradient update
|
||||
- `learning_rate`: Optimizer learning rate
|
||||
- `verbose`: Training output verbosity
|
||||
- `callbacks`: Keras callbacks (early stopping, etc.)
|
||||
|
||||
## Datasets
|
||||
|
||||
Load built-in datasets and access UCR/UEA archives.
|
||||
|
||||
### Built-in Datasets
|
||||
|
||||
```python
|
||||
from aeon.datasets import (
|
||||
load_arrow_head,
|
||||
load_airline,
|
||||
load_gunpoint,
|
||||
load_italy_power_demand,
|
||||
load_basic_motions,
|
||||
load_japanese_vowels
|
||||
)
|
||||
|
||||
# Classification dataset
|
||||
X_train, y_train = load_arrow_head(split="train")
|
||||
X_test, y_test = load_arrow_head(split="test")
|
||||
|
||||
# Forecasting dataset (univariate series)
|
||||
y = load_airline()
|
||||
|
||||
# Multivariate classification
|
||||
X_train, y_train = load_basic_motions(split="train")
|
||||
print(X_train.shape) # (n_cases, n_channels, n_timepoints)
|
||||
```
|
||||
|
||||
### UCR/UEA Archives
|
||||
|
||||
Access 100+ benchmark datasets:
|
||||
|
||||
```python
|
||||
from aeon.datasets import load_from_tsfile, load_classification
|
||||
|
||||
# Load UCR/UEA dataset by name
|
||||
X_train, y_train = load_classification("GunPoint", split="train")
|
||||
X_test, y_test = load_classification("GunPoint", split="test")
|
||||
|
||||
# Load from local .ts file
|
||||
X, y = load_from_tsfile("data/my_dataset_TRAIN.ts")
|
||||
```
|
||||
|
||||
### Dataset Information
|
||||
|
||||
```python
|
||||
from aeon.datasets import get_dataset_meta_data
|
||||
|
||||
# Get metadata about a dataset
|
||||
info = get_dataset_meta_data("GunPoint")
|
||||
print(info)
|
||||
# {'n_cases': 150, 'n_timepoints': 150, 'n_classes': 2, ...}
|
||||
```
|
||||
|
||||
### Custom Dataset Format
|
||||
|
||||
Save/load custom datasets in aeon format:
|
||||
|
||||
```python
|
||||
from aeon.datasets import write_to_tsfile, load_from_tsfile
|
||||
|
||||
# Save
|
||||
write_to_tsfile(
|
||||
X_train,
|
||||
"my_dataset_TRAIN.ts",
|
||||
y=y_train,
|
||||
problem_name="MyDataset"
|
||||
)
|
||||
|
||||
# Load
|
||||
X, y = load_from_tsfile("my_dataset_TRAIN.ts")
|
||||
```
|
||||
|
||||
## Benchmarking
|
||||
|
||||
Tools for reproducible evaluation and comparison.
|
||||
|
||||
### Benchmarking Utilities
|
||||
|
||||
```python
|
||||
from aeon.benchmarking import benchmark_estimator
|
||||
|
||||
# Benchmark a classifier on multiple datasets
|
||||
results = benchmark_estimator(
|
||||
estimator=RocketClassifier(),
|
||||
datasets=["GunPoint", "ArrowHead", "ItalyPowerDemand"],
|
||||
n_resamples=10
|
||||
)
|
||||
```
|
||||
|
||||
### Result Storage and Comparison
|
||||
|
||||
```python
|
||||
from aeon.benchmarking import (
|
||||
write_results_to_csv,
|
||||
read_results_from_csv,
|
||||
compare_results
|
||||
)
|
||||
|
||||
# Save results
|
||||
write_results_to_csv(results, "results.csv")
|
||||
|
||||
# Load and compare
|
||||
results_rocket = read_results_from_csv("results_rocket.csv")
|
||||
results_inception = read_results_from_csv("results_inception.csv")
|
||||
|
||||
comparison = compare_results(
|
||||
[results_rocket, results_inception],
|
||||
estimator_names=["ROCKET", "InceptionTime"]
|
||||
)
|
||||
```
|
||||
|
||||
### Critical Difference Diagrams
|
||||
|
||||
Visualize statistical significance of differences:
|
||||
|
||||
```python
|
||||
from aeon.benchmarking.results_plotting import plot_critical_difference_diagram
|
||||
|
||||
plot_critical_difference_diagram(
|
||||
results_dict={
|
||||
'ROCKET': results_rocket,
|
||||
'InceptionTime': results_inception,
|
||||
'BOSS': results_boss
|
||||
},
|
||||
dataset_names=["GunPoint", "ArrowHead", "ItalyPowerDemand"]
|
||||
)
|
||||
```
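
A critical difference diagram is built on the average rank of each estimator across datasets. The sketch below computes those ranks in plain Python. The estimator names and accuracies are made-up toy values for illustration only, and `average_ranks` is a hypothetical helper, not part of aeon.

```python
def average_ranks(scores):
    """Average rank per estimator; rank 1 is best, ties share the mean rank.

    scores maps an estimator name to a list of accuracies, one per dataset.
    """
    names = list(scores)
    n_datasets = len(next(iter(scores.values())))
    totals = {name: 0.0 for name in names}
    for d in range(n_datasets):
        column = sorted(names, key=lambda name: -scores[name][d])
        position = 0
        while position < len(column):
            top = scores[column[position]][d]
            tied = [n for n in column if scores[n][d] == top]
            # Ranks position+1 .. position+len(tied) are averaged.
            shared = position + (len(tied) + 1) / 2
            for n in tied:
                totals[n] += shared
            position += len(tied)
    return {name: totals[name] / n_datasets for name in names}


# Toy accuracies on three datasets (not real benchmark results).
toy_scores = {
    "clf_a": [0.90, 0.80, 0.70],
    "clf_b": [0.85, 0.80, 0.75],
    "clf_c": [0.80, 0.60, 0.65],
}
print(average_ranks(toy_scores))
# {'clf_a': 1.5, 'clf_b': 1.5, 'clf_c': 3.0}
```

In the diagram, estimators whose average ranks are not significantly different are joined by a bar, so `clf_a` and `clf_b` here would be natural candidates for one group.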
|
||||
|
||||
## Discovery and Tags
|
||||
|
||||
### Finding Estimators
|
||||
|
||||
```python
from aeon.utils.discovery import all_estimators

# Get all classifiers
classifiers = all_estimators(type_filter="classifier")

# Get all transformers
transformers = all_estimators(type_filter="transformer")

# Filter by capability tags
multivariate_classifiers = all_estimators(
    type_filter="classifier",
    tag_filter={"capability:multivariate": True},
)
```
|
||||
|
||||
### Checking Estimator Tags
|
||||
|
||||
```python
from aeon.classification.convolution_based import RocketClassifier

# Tags are stored on the estimator class itself
tags = RocketClassifier.get_class_tags()
print(tags)
# e.g. {'capability:multivariate': True, 'X_inner_type': ['numpy3D'], ...}
```
|
||||
|
||||
### Common Tags
|
||||
|
||||
- `capability:multivariate`: Handles multivariate series
|
||||
- `capability:unequal_length`: Handles variable-length series
|
||||
- `capability:missing_values`: Handles missing data
|
||||
- `algorithm_type`: Algorithm family (e.g., "convolution", "distance")
|
||||
- `python_dependencies`: Required packages
|
||||
442
scientific-packages/aeon/references/learning_tasks.md
Normal file
@@ -0,0 +1,442 @@
|
||||
# Learning Tasks: Classification, Regression, Clustering, and Similarity Search
|
||||
|
||||
This reference provides comprehensive details on supervised and unsupervised learning tasks for time series collections.
|
||||
|
||||
## Time Series Classification
|
||||
|
||||
Time series classification (TSC) assigns labels to entire sequences. Aeon provides diverse algorithm families with unique strengths.
|
||||
|
||||
### Algorithm Categories
|
||||
|
||||
#### 1. Convolution-Based Classifiers
|
||||
Transform time series using random convolutional kernels:
|
||||
|
||||
**ROCKET (RAndom Convolutional KErnel Transform)**
|
||||
- Ultra-fast feature extraction via random kernels
|
||||
- 10,000+ kernels generate discriminative features
|
||||
- Linear classifier on extracted features
|
||||
|
||||
```python
from aeon.classification.convolution_based import RocketClassifier

# `n_kernels` is the current parameter name; older aeon releases call it `num_kernels`.
clf = RocketClassifier(n_kernels=10000, n_jobs=-1)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
probabilities = clf.predict_proba(X_test)
```
|
||||
|
||||
**Variants:**
|
||||
- `MiniRocketClassifier`: Faster, streamlined version
|
||||
- `MultiRocketClassifier`: Adds first-order differences and extra pooling operators
|
||||
- `Arsenal`: Ensemble of ROCKET transformers
|
||||
- `HydraClassifier`: Dictionary-based convolution variant
|
||||
|
||||
#### 2. Deep Learning Classifiers
|
||||
Neural networks specialized for time series:
|
||||
|
||||
**InceptionTime**
|
||||
- Ensemble of Inception modules
|
||||
- Captures patterns at multiple scales
|
||||
- State-of-the-art on UCR benchmarks
|
||||
|
||||
```python
|
||||
from aeon.classification.deep_learning import InceptionTimeClassifier
|
||||
|
||||
clf = InceptionTimeClassifier(n_epochs=200, batch_size=64)
|
||||
clf.fit(X_train, y_train)
|
||||
```
|
||||
|
||||
**Other architectures:**
|
||||
- `ResNetClassifier`: Residual connections
|
||||
- `FCNClassifier`: Fully Convolutional Networks
|
||||
- `CNNClassifier`: Standard convolutional architecture
|
||||
- `LITEClassifier`: Lightweight networks
|
||||
- `MLPClassifier`: Multi-layer perceptrons
|
||||
- `TapNetClassifier`: Attentional prototype networks
|
||||
|
||||
#### 3. Dictionary-Based Classifiers
|
||||
Symbolic representations and bag-of-words approaches:
|
||||
|
||||
**BOSS (Bag of SFA Symbols)**
|
||||
- Converts series to symbolic words
|
||||
- Histogram-based classification
|
||||
- Effective for shape patterns
|
||||
|
||||
```python
|
||||
from aeon.classification.dictionary_based import BOSSEnsemble
|
||||
|
||||
clf = BOSSEnsemble(max_ensemble_size=500)
|
||||
clf.fit(X_train, y_train)
|
||||
```
|
||||
|
||||
**Other dictionary methods:**
|
||||
- `TemporalDictionaryEnsemble (TDE)`: Enhanced BOSS with temporal info
|
||||
- `WEASEL`: Word ExtrAction for time SEries cLassification
|
||||
- `MUSE`: MUltivariate Symbolic Extension
|
||||
- `MrSEQL`: Multiple Representations SEQuence Learner
|
||||
|
||||
#### 4. Distance-Based Classifiers
|
||||
Leverage time series-specific distance metrics:
|
||||
|
||||
**K-Nearest Neighbors with DTW**
|
||||
- Dynamic Time Warping handles temporal shifts
|
||||
- Effective for shape-based similarity
|
||||
|
||||
```python
|
||||
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
|
||||
|
||||
clf = KNeighborsTimeSeriesClassifier(
|
||||
distance="dtw",
|
||||
n_neighbors=5
|
||||
)
|
||||
clf.fit(X_train, y_train)
|
||||
```
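
The classifier itself is simple: rank the training series by distance to the query and take a majority vote among the closest ones. Below is a plain-Python sketch with a pluggable distance function. Squared Euclidean stands in for DTW to keep the example short, and the data, labels, and function names are illustrative assumptions.

```python
from collections import Counter


def squared_euclidean(x, y):
    """Stand-in distance; any function taking two series would work here."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def knn_predict(train_X, train_y, query, distance, n_neighbors=1):
    """Majority label among the n_neighbors training series closest to query."""
    ranked = sorted(
        range(len(train_X)), key=lambda i: distance(train_X[i], query)
    )
    votes = Counter(train_y[i] for i in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


train_X = [
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 2.0, 1.0, 1.0],
]
train_y = ["bump", "bump", "flat", "flat"]
query = [0.0, 0.0, 0.9, 0.0]

print(knn_predict(train_X, train_y, query, squared_euclidean, n_neighbors=3))
# bump
```

There is no training step beyond storing the data, which is why this approach suits small datasets but becomes slow as the training set grows.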
|
||||
|
||||
**Other distance methods:**
|
||||
- `ElasticEnsemble`: Ensemble of elastic distances
|
||||
- `ProximityForest`: Tree-based with elastic measures
|
||||
- `ProximityTree`: Single tree variant
|
||||
- `ShapeDTW`: DTW with shape descriptors
|
||||
|
||||
#### 5. Feature-Based Classifiers
|
||||
Extract statistical and domain-specific features:
|
||||
|
||||
**Catch22**
|
||||
- 22 time series features
|
||||
- Canonical Time-series CHaracteristics
|
||||
- Fast and interpretable
|
||||
|
||||
```python
from sklearn.ensemble import RandomForestClassifier

from aeon.classification.feature_based import Catch22Classifier

clf = Catch22Classifier(estimator=RandomForestClassifier())
clf.fit(X_train, y_train)
```
|
||||
|
||||
**Other feature methods:**
|
||||
- `FreshPRINCEClassifier`: TSFresh features with a rotation forest (Fresh Pipeline with RotatIoN forest Classifier)
|
||||
- `SignatureClassifier`: Path signature features
|
||||
- `TSFreshClassifier`: Comprehensive feature extraction (slower, more features)
|
||||
- `SummaryClassifier`: Simple summary statistics
|
||||
|
||||
#### 6. Interval-Based Classifiers
|
||||
Analyze discriminative time intervals:
|
||||
|
||||
**Time Series Forest (TSF)**
|
||||
- Random intervals + summary statistics
|
||||
- Random forest on extracted features
|
||||
|
||||
```python
|
||||
from aeon.classification.interval_based import TimeSeriesForestClassifier
|
||||
|
||||
clf = TimeSeriesForestClassifier(n_estimators=500)
|
||||
clf.fit(X_train, y_train)
|
||||
```
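
The feature extraction behind an interval forest can be sketched in a few lines: draw random intervals, then summarise each one by its mean, standard deviation, and slope. This plain-Python version shows the transform for a single series only. The helper names and the interval sampling scheme are assumptions, not aeon's exact procedure.

```python
import random
import statistics


def interval_features(series, start, end):
    """Mean, standard deviation, and least-squares slope of series[start:end]."""
    window = series[start:end]
    mean = statistics.fmean(window)
    std = statistics.pstdev(window)
    t_mean = (len(window) - 1) / 2
    numerator = sum((t - t_mean) * (v - mean) for t, v in enumerate(window))
    denominator = sum((t - t_mean) ** 2 for t in range(len(window)))
    return [mean, std, numerator / denominator]


def random_intervals(n_timepoints, n_intervals, min_length=3, seed=0):
    """Sample (start, end) pairs with at least min_length points each."""
    rng = random.Random(seed)
    intervals = []
    for _ in range(n_intervals):
        length = rng.randint(min_length, n_timepoints)
        start = rng.randint(0, n_timepoints - length)
        intervals.append((start, start + length))
    return intervals


def tsf_like_transform(series, intervals):
    """Concatenate the three summary features of every interval."""
    features = []
    for start, end in intervals:
        features.extend(interval_features(series, start, end))
    return features


series = [0.5 * t for t in range(20)]
intervals = random_intervals(len(series), n_intervals=4)
row = tsf_like_transform(series, intervals)
print(len(row))  # 12: three features for each of the four intervals
```

In the full algorithm each tree of the forest draws its own intervals, so different trees look at different parts of the series.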
|
||||
|
||||
**Other interval methods:**
|
||||
- `CanonicalIntervalForest (CIF)`: Time Series Forest extended with catch22 features
|
||||
- `DrCIF`: Diverse Representation CIF
|
||||
- `RISE`: Random Interval Spectral Ensemble
|
||||
- `RandomIntervalClassifier`: Basic random interval approach
|
||||
- `STSF`: Supervised Time Series Forest
|
||||
|
||||
#### 7. Shapelet-Based Classifiers
|
||||
Discover discriminative subsequences:
|
||||
|
||||
**Shapelets**: Small subsequences that best distinguish classes
|
||||
|
||||
```python
|
||||
from aeon.classification.shapelet_based import ShapeletTransformClassifier
|
||||
|
||||
clf = ShapeletTransformClassifier(
|
||||
n_shapelet_samples=10000,
|
||||
max_shapelets=20
|
||||
)
|
||||
clf.fit(X_train, y_train)
|
||||
```
|
||||
|
||||
**Other shapelet methods:**
|
||||
- `LearningShapeletClassifier`: Gradient-based learning
|
||||
- `SASTClassifier`: Scalable and Accurate Subsequence Transform
|
||||
|
||||
#### 8. Hybrid Ensembles
|
||||
Combine multiple algorithm families:
|
||||
|
||||
**HIVE-COTE (Hierarchical Vote Collective of Transformation-based Ensembles)**
|
||||
- State-of-the-art accuracy
|
||||
- Combines shapelets, intervals, dictionaries, and spectral features
|
||||
- V2 uses ROCKET and improved components
|
||||
|
||||
```python
|
||||
from aeon.classification.hybrid import HIVECOTEV2
|
||||
|
||||
clf = HIVECOTEV2(n_jobs=-1) # Slow but highly accurate
|
||||
clf.fit(X_train, y_train)
|
||||
```
|
||||
|
||||
### Algorithm Selection Guide
|
||||
|
||||
**Fast and accurate (default choice):**
|
||||
- `RocketClassifier` or `MiniRocketClassifier`
|
||||
|
||||
**Maximum accuracy (slow):**
|
||||
- `HIVECOTEV2` or `InceptionTimeClassifier`
|
||||
|
||||
**Interpretable:**
|
||||
- `Catch22Classifier` or `ShapeletTransformClassifier`
|
||||
|
||||
**Multivariate focus:**
|
||||
- `MultiRocketClassifier` or `MUSE`
|
||||
|
||||
**Small datasets:**
|
||||
- `KNeighborsTimeSeriesClassifier` with DTW
|
||||
|
||||
### Classification Workflow
|
||||
|
||||
```python
|
||||
from aeon.classification.convolution_based import RocketClassifier
|
||||
from aeon.datasets import load_arrow_head
|
||||
from sklearn.metrics import accuracy_score, classification_report
|
||||
|
||||
# Load data
|
||||
X_train, y_train = load_arrow_head(split="train")
|
||||
X_test, y_test = load_arrow_head(split="test")
|
||||
|
||||
# Train classifier
|
||||
clf = RocketClassifier(n_jobs=-1)
|
||||
clf.fit(X_train, y_train)
|
||||
|
||||
# Evaluate
|
||||
y_pred = clf.predict(X_test)
|
||||
accuracy = accuracy_score(y_test, y_pred)
|
||||
print(f"Accuracy: {accuracy:.3f}")
|
||||
print(classification_report(y_test, y_pred))
|
||||
```
|
||||
|
||||
## Time Series Regression
|
||||
|
||||
Time series regression predicts continuous values from sequences. Most classification algorithms have regression equivalents.
|
||||
|
||||
### Regression Algorithms
|
||||
|
||||
Available regressors mirror classification structure:
|
||||
- `RocketRegressor`, `MiniRocketRegressor`, `MultiRocketRegressor`
|
||||
- `InceptionTimeRegressor`, `ResNetRegressor`, `FCNRegressor`
|
||||
- `KNeighborsTimeSeriesRegressor`
|
||||
- `Catch22Regressor`, `FreshPRINCERegressor`
|
||||
- `TimeSeriesForestRegressor`, `DrCIFRegressor`
|
||||
|
||||
### Regression Workflow
|
||||
|
||||
```python
|
||||
from aeon.regression.convolution_based import RocketRegressor
|
||||
from sklearn.metrics import mean_squared_error, r2_score
|
||||
|
||||
# Train regressor
|
||||
reg = RocketRegressor(n_kernels=10000)  # `num_kernels` in older aeon releases
|
||||
reg.fit(X_train, y_train_continuous)
|
||||
|
||||
# Predict and evaluate
|
||||
y_pred = reg.predict(X_test)
|
||||
mse = mean_squared_error(y_test, y_pred)
|
||||
r2 = r2_score(y_test, y_pred)
|
||||
print(f"MSE: {mse:.3f}, R²: {r2:.3f}")
|
||||
```
|
||||
|
||||
## Time Series Clustering
|
||||
|
||||
Clustering groups similar time series without labels.
|
||||
|
||||
### Clustering Algorithms
|
||||
|
||||
**TimeSeriesKMeans**
|
||||
- K-means with time series distances
|
||||
- Supports DTW, Euclidean, and other metrics
|
||||
|
||||
```python
|
||||
from aeon.clustering import TimeSeriesKMeans
|
||||
|
||||
clusterer = TimeSeriesKMeans(
|
||||
n_clusters=3,
|
||||
distance="dtw",
|
||||
n_init=10
|
||||
)
|
||||
clusterer.fit(X_collection)
|
||||
labels = clusterer.labels_
|
||||
```
|
||||
|
||||
**TimeSeriesKMedoids**
|
||||
- Uses actual series as cluster centers
|
||||
- More robust to outliers
|
||||
|
||||
```python
|
||||
from aeon.clustering import TimeSeriesKMedoids
|
||||
|
||||
clusterer = TimeSeriesKMedoids(
|
||||
n_clusters=3,
|
||||
distance="euclidean"
|
||||
)
|
||||
clusterer.fit(X_collection)
|
||||
```
|
||||
|
||||
**Other clustering methods:**
|
||||
- `TimeSeriesKernelKMeans`: Kernel-based clustering
|
||||
- `ElasticSOM`: Self-organizing maps with elastic distances
|
||||
|
||||
### Clustering Workflow
|
||||
|
||||
```python
from sklearn.metrics import adjusted_rand_score

from aeon.clustering import TimeSeriesKMeans

# Cluster time series
clusterer = TimeSeriesKMeans(n_clusters=4, distance="dtw")
clusterer.fit(X_train)

# Assign unseen series to the learned clusters
labels = clusterer.predict(X_test)

# Inspect cluster centers
centers = clusterer.cluster_centers_

# Evaluate clustering quality (if ground truth for X_test is available)
ari = adjusted_rand_score(y_test, labels)
```
|
||||
|
||||
## Similarity Search
|
||||
|
||||
Similarity search finds motifs, nearest neighbors, and repeated patterns.
|
||||
|
||||
### Key Concepts
|
||||
|
||||
- **Motifs**: Frequently repeated subsequences within a time series
- **Matrix Profile**: For every subsequence of a fixed length, the distance to its nearest non-overlapping neighbor in the series; low values mark motifs, high values mark anomalies
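
As an illustration of the matrix profile concept, the following brute-force sketch computes it in pure Python (requires Python 3.8+ for `math.dist`). The function names and the exclusion-zone choice of `m // 2` are assumptions made for this example. Production algorithms such as STOMP obtain the same result far more efficiently.

```python
import math


def znorm(window):
    """Z-normalize a window; constant windows map to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))
    return [(v - mean) / std if std > 0 else 0.0 for v in window]


def naive_matrix_profile(series, m):
    """Brute-force O(n^2 * m) matrix profile for subsequence length m."""
    windows = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    excl = max(1, m // 2)  # assumed exclusion zone around each position
    profile = []
    for i, w in enumerate(windows):
        best = math.inf
        for j, v in enumerate(windows):
            if abs(i - j) <= excl:
                continue  # skip trivial matches with overlapping neighbors
            best = min(best, math.dist(w, v))
        profile.append(best)
    return profile


series = [0, 1, 0, 5, 9, 5, 0, 1, 0, 2, 2, 3]
profile = naive_matrix_profile(series, m=3)
motif_start = min(range(len(profile)), key=profile.__getitem__)
print(motif_start, profile[motif_start])
```

Because the windows are z-normalized, the bump `[5, 9, 5]` matches `[0, 1, 0]` in shape even though the amplitudes differ.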
|
||||
|
||||
### Similarity Search Methods
|
||||
|
||||
**Matrix Profile**
|
||||
- Efficient motif discovery
|
||||
- Change point detection
|
||||
- Anomaly detection
|
||||
|
||||
```python
|
||||
import numpy as np
from aeon.similarity_search import MatrixProfile
|
||||
|
||||
mp = MatrixProfile(window_size=50)
|
||||
profile = mp.fit_transform(X_series)
|
||||
|
||||
# Find top motif
|
||||
motif_idx = np.argmin(profile)
|
||||
```
|
||||
|
||||
**Query Search**
|
||||
- Find nearest neighbors to a query subsequence
|
||||
- Useful for template matching
|
||||
|
||||
```python
|
||||
from aeon.similarity_search import QuerySearch
|
||||
|
||||
searcher = QuerySearch(distance="euclidean")
|
||||
distances, indices = searcher.search(X_series, query_subsequence)
|
||||
```
|
||||
|
||||
### Similarity Search Workflow
|
||||
|
||||
```python
|
||||
from aeon.similarity_search import MatrixProfile
|
||||
import numpy as np
|
||||
|
||||
# Compute matrix profile
|
||||
mp = MatrixProfile(window_size=100)
|
||||
profile, profile_index = mp.fit_transform(X_series)
|
||||
|
||||
# Find top-k motifs (lowest profile values)
|
||||
k = 3
|
||||
motif_indices = np.argsort(profile)[:k]
|
||||
|
||||
# Find anomalies (highest profile values)
|
||||
anomaly_indices = np.argsort(profile)[-k:]
|
||||
```
|
||||
|
||||
## Ensemble and Composition Tools
|
||||
|
||||
### Voting Ensembles
|
||||
```python
|
||||
from aeon.classification.ensemble import WeightedEnsembleClassifier
|
||||
from aeon.classification.convolution_based import RocketClassifier
|
||||
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
|
||||
|
||||
ensemble = WeightedEnsembleClassifier(
|
||||
estimators=[
|
||||
('rocket', RocketClassifier()),
|
||||
('knn', KNeighborsTimeSeriesClassifier())
|
||||
]
|
||||
)
|
||||
ensemble.fit(X_train, y_train)
|
||||
```
|
||||
|
||||
### Pipelines
|
||||
```python
|
||||
from sklearn.pipeline import Pipeline
|
||||
from aeon.transformations.collection import Catch22
|
||||
from sklearn.ensemble import RandomForestClassifier
|
||||
|
||||
pipeline = Pipeline([
|
||||
('features', Catch22()),
|
||||
('classifier', RandomForestClassifier())
|
||||
])
|
||||
pipeline.fit(X_train, y_train)
|
||||
```
|
||||
|
||||
## Model Selection and Validation
|
||||
|
||||
### Cross-Validation
|
||||
```python
|
||||
from sklearn.model_selection import cross_val_score
|
||||
from aeon.classification.convolution_based import RocketClassifier
|
||||
|
||||
clf = RocketClassifier()
|
||||
scores = cross_val_score(clf, X_train, y_train, cv=5)
|
||||
print(f"CV Accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
|
||||
```
|
||||
|
||||
### Grid Search
|
||||
```python
|
||||
from sklearn.model_selection import GridSearchCV
|
||||
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
|
||||
|
||||
param_grid = {
|
||||
'n_neighbors': [1, 3, 5, 7],
|
||||
'distance': ['dtw', 'euclidean', 'erp']
|
||||
}
|
||||
|
||||
clf = KNeighborsTimeSeriesClassifier()
|
||||
grid_search = GridSearchCV(clf, param_grid, cv=5)
|
||||
grid_search.fit(X_train, y_train)
|
||||
print(f"Best params: {grid_search.best_params_}")
|
||||
```
|
||||
|
||||
## Discovery Functions
|
||||
|
||||
Find available estimators programmatically:
|
||||
|
||||
```python
|
||||
from aeon.utils.discovery import all_estimators
|
||||
|
||||
# Get all classifiers
|
||||
classifiers = all_estimators(type_filter="classifier")
|
||||
|
||||
# Get all regressors
|
||||
regressors = all_estimators(type_filter="regressor")
|
||||
|
||||
# Get all clusterers
|
||||
clusterers = all_estimators(type_filter="clusterer")
|
||||
|
||||
# Filter by tag (e.g., multivariate capable)
|
||||
mv_classifiers = all_estimators(
|
||||
type_filter="classifier",
|
||||
filter_tags={"capability:multivariate": True}
|
||||
)
|
||||
```
|
||||
596
scientific-packages/aeon/references/temporal_analysis.md
Normal file
@@ -0,0 +1,596 @@
|
||||
# Temporal Analysis: Forecasting, Anomaly Detection, and Segmentation
|
||||
|
||||
This reference provides comprehensive details on forecasting future values, detecting anomalies, and segmenting time series.
|
||||
|
||||
## Forecasting
|
||||
|
||||
Forecasting predicts future values in a time series based on historical patterns.
|
||||
|
||||
### Forecasting Concepts
|
||||
|
||||
**Forecasting horizon (fh)**: Number of steps ahead to predict
|
||||
- Absolute: `fh=[1, 2, 3]` (predict steps 1, 2, 3)
|
||||
- Relative: `fh=ForecastingHorizon([1, 2, 3], is_relative=True)`
|
||||
|
||||
**Exogenous variables**: External features that influence predictions
|
||||
|
||||
### Statistical Forecasters
|
||||
|
||||
#### ARIMA (AutoRegressive Integrated Moving Average)
|
||||
Classical time series model combining AR, differencing, and MA components:
|
||||
|
||||
```python
|
||||
from aeon.forecasting.arima import ARIMA
|
||||
|
||||
forecaster = ARIMA(
|
||||
order=(1, 1, 1), # (p, d, q)
|
||||
seasonal_order=(1, 1, 1, 12), # (P, D, Q, s)
|
||||
suppress_warnings=True
|
||||
)
|
||||
forecaster.fit(y_train)
|
||||
y_pred = forecaster.predict(fh=[1, 2, 3, 4, 5])
|
||||
```
|
||||
|
||||
**Parameters:**
|
||||
- `p`: AR order (lags)
|
||||
- `d`: Differencing order
|
||||
- `q`: MA order (moving average)
|
||||
- `P, D, Q, s`: Seasonal components
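
The differencing order `d` is the number of times the series is differenced before the AR and MA terms are fitted. A minimal sketch, assuming a plain Python list as input (the helper name `difference` is hypothetical):

```python
def difference(y, d=1):
    """Apply first differencing d times."""
    for _ in range(d):
        y = [b - a for a, b in zip(y, y[1:])]
    return y


quadratic = [1, 4, 9, 16, 25]
print(difference(quadratic, d=1))  # linear trend remains
print(difference(quadratic, d=2))  # constant: trend removed
```

Each round of differencing shortens the series by one observation, which is one reason to keep `d` small.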
|
||||
|
||||
#### ETS (Exponential Smoothing)
|
||||
State space model for trend and seasonality:
|
||||
|
||||
```python
|
||||
from aeon.forecasting.ets import ETS
|
||||
|
||||
forecaster = ETS(
|
||||
error="add",
|
||||
trend="add",
|
||||
seasonal="add",
|
||||
sp=12 # seasonal period
|
||||
)
|
||||
forecaster.fit(y_train)
|
||||
y_pred = forecaster.predict(fh=[1, 2, 3])
|
||||
```
|
||||
|
||||
**Model types:**
|
||||
- Error: "add" (additive) or "mul" (multiplicative)
|
||||
- Trend: "add", "mul", or None
|
||||
- Seasonal: "add", "mul", or None
|
||||
|
||||
#### Theta Forecaster
|
||||
Simple, effective method using exponential smoothing:
|
||||
|
||||
```python
|
||||
import numpy as np
from aeon.forecasting.theta import ThetaForecaster
|
||||
|
||||
forecaster = ThetaForecaster(deseasonalize=True, sp=12)
|
||||
forecaster.fit(y_train)
|
||||
y_pred = forecaster.predict(fh=np.arange(1, 13))
|
||||
```
|
||||
|
||||
#### TAR (Threshold AutoRegressive)
|
||||
Non-linear autoregressive model with regime switching:
|
||||
|
||||
```python
|
||||
from aeon.forecasting.tar import TAR
|
||||
|
||||
forecaster = TAR(
|
||||
delay=1,
|
||||
threshold=0.0,
|
||||
order_below=2,
|
||||
order_above=2
|
||||
)
|
||||
forecaster.fit(y_train)
|
||||
y_pred = forecaster.predict(fh=[1, 2, 3])
|
||||
```
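
The regime-switching idea can be sketched in a few lines. This is an illustrative two-regime model without intercepts, not aeon's `TAR` implementation. The function name, the coefficient layout, and the convention that a value strictly above the threshold selects the upper regime are assumptions.

```python
def tar_one_step(history, threshold, coef_below, coef_above, delay=1):
    """One-step forecast from a two-regime threshold AR model."""
    # The value observed `delay` steps back decides which regime is active
    regime_value = history[-delay]
    coefs = coef_above if regime_value > threshold else coef_below
    # coefs[0] multiplies the most recent value, coefs[1] the one before it
    lags = history[::-1][:len(coefs)]
    return sum(c * v for c, v in zip(coefs, lags))


below = tar_one_step([4.0, -2.0], 0.0, coef_below=[0.5, 0.25], coef_above=[-0.5, 0.0])
above = tar_one_step([4.0, 2.0], 0.0, coef_below=[0.5, 0.25], coef_above=[-0.5, 0.0])
print(below, above)
```

The same history length yields different dynamics depending only on which side of the threshold the delayed value falls.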
|
||||
|
||||
**AutoTAR**: Automatically optimizes threshold:
|
||||
```python
|
||||
from aeon.forecasting.tar import AutoTAR
|
||||
|
||||
forecaster = AutoTAR(max_order=5)
|
||||
forecaster.fit(y_train)
|
||||
```
|
||||
|
||||
#### TVP (Time-Varying Parameter)
|
||||
Kalman filter-based forecaster with dynamic coefficients:
|
||||
|
||||
```python
|
||||
from aeon.forecasting.tvp import TVP
|
||||
|
||||
forecaster = TVP(
|
||||
order=2,
|
||||
use_exog=False
|
||||
)
|
||||
forecaster.fit(y_train)
|
||||
y_pred = forecaster.predict(fh=[1, 2, 3])
|
||||
```
|
||||
|
||||
### Naive Baselines
|
||||
|
||||
Simple forecasting strategies for benchmarking:
|
||||
|
||||
```python
|
||||
from aeon.forecasting.naive import NaiveForecaster
|
||||
|
||||
# Last value
|
||||
forecaster = NaiveForecaster(strategy="last")
|
||||
forecaster.fit(y_train)
|
||||
y_pred = forecaster.predict(fh=[1, 2, 3])
|
||||
|
||||
# Seasonal naive (use value from same season last year)
|
||||
forecaster = NaiveForecaster(strategy="seasonal_last", sp=12)
|
||||
forecaster.fit(y_train)
|
||||
|
||||
# Mean
|
||||
forecaster = NaiveForecaster(strategy="mean")
|
||||
forecaster.fit(y_train)
|
||||
|
||||
# Drift (linear trend from first to last)
|
||||
forecaster = NaiveForecaster(strategy="drift")
|
||||
forecaster.fit(y_train)
|
||||
```
|
||||
|
||||
**Strategies:**
|
||||
- `"last"`: Repeat last observed value
|
||||
- `"mean"`: Use mean of training data
|
||||
- `"seasonal_last"`: Repeat value from previous season
|
||||
- `"drift"`: Linear extrapolation
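
The four strategies are simple enough to write out directly. The sketch below is a plain-Python illustration of what each one computes, not aeon's `NaiveForecaster`. The function name and signature are hypothetical.

```python
def naive_forecast(y, steps, strategy="last", sp=1):
    """Return `steps` forecasts from a list of observations."""
    if strategy == "last":
        return [y[-1]] * steps
    if strategy == "mean":
        return [sum(y) / len(y)] * steps
    if strategy == "seasonal_last":
        last_season = y[-sp:]  # repeat the most recent full season
        return [last_season[h % sp] for h in range(steps)]
    if strategy == "drift":
        slope = (y[-1] - y[0]) / (len(y) - 1)  # line through first and last point
        return [y[-1] + slope * h for h in range(1, steps + 1)]
    raise ValueError(f"unknown strategy: {strategy}")


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(naive_forecast(y, 3, "last"))
print(naive_forecast(y, 4, "seasonal_last", sp=3))
print(naive_forecast(y, 3, "drift"))
```

Any more complex forecaster should beat these baselines on held-out data before it is trusted.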
|
||||
|
||||
### Deep Learning Forecasters
|
||||
|
||||
#### TCN (Temporal Convolutional Network)
|
||||
Deep learning with dilated causal convolutions:
|
||||
|
||||
```python
|
||||
from aeon.forecasting.deep_learning import TCNForecaster
|
||||
|
||||
forecaster = TCNForecaster(
|
||||
n_epochs=100,
|
||||
batch_size=32,
|
||||
kernel_size=3,
|
||||
n_filters=64,
|
||||
dilation_rate=2
|
||||
)
|
||||
forecaster.fit(y_train)
|
||||
y_pred = forecaster.predict(fh=[1, 2, 3, 4, 5])
|
||||
```
|
||||
|
||||
### Regression-Based Forecasting
|
||||
|
||||
Transform forecasting into a supervised learning problem:
|
||||
|
||||
```python
|
||||
from aeon.forecasting.compose import RegressionForecaster
|
||||
from sklearn.ensemble import RandomForestRegressor
|
||||
|
||||
forecaster = RegressionForecaster(
|
||||
regressor=RandomForestRegressor(n_estimators=100),
|
||||
window_length=10
|
||||
)
|
||||
forecaster.fit(y_train)
|
||||
y_pred = forecaster.predict(fh=[1, 2, 3])
|
||||
```
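
The reduction to supervised learning amounts to building a table of lagged windows and next-step targets. A minimal sketch of that step, with a hypothetical helper name `make_windows`:

```python
def make_windows(y, window_length):
    """Turn a series into (lag window, next value) training pairs."""
    X, targets = [], []
    for start in range(len(y) - window_length):
        X.append(y[start:start + window_length])
        targets.append(y[start + window_length])
    return X, targets


X, targets = make_windows([1, 2, 3, 4, 5], window_length=2)
print(X)
print(targets)
```

Any tabular regressor can then be fitted on `X` and `targets`; a longer `window_length` gives the model more context but fewer training rows.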
|
||||
|
||||
### Forecasting Workflow
|
||||
|
||||
```python
|
||||
from aeon.forecasting.arima import ARIMA
|
||||
from aeon.datasets import load_airline
|
||||
from sklearn.metrics import mean_absolute_error, mean_squared_error
|
||||
import numpy as np
|
||||
|
||||
# Load data
|
||||
y = load_airline()
|
||||
|
||||
# Split train/test
|
||||
split_point = int(len(y) * 0.8)
|
||||
y_train, y_test = y[:split_point], y[split_point:]
|
||||
|
||||
# Fit forecaster
|
||||
forecaster = ARIMA(order=(2, 1, 2), suppress_warnings=True)
|
||||
forecaster.fit(y_train)
|
||||
|
||||
# Predict
|
||||
fh = np.arange(1, len(y_test) + 1)
|
||||
y_pred = forecaster.predict(fh=fh)
|
||||
|
||||
# Evaluate
|
||||
mae = mean_absolute_error(y_test, y_pred)
|
||||
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
|
||||
print(f"MAE: {mae:.3f}, RMSE: {rmse:.3f}")
|
||||
```
|
||||
|
||||
### Forecasting with Exogenous Variables
|
||||
|
||||
```python
|
||||
from aeon.forecasting.arima import ARIMA
|
||||
|
||||
# X contains exogenous features
|
||||
forecaster = ARIMA(order=(1, 1, 1))
|
||||
forecaster.fit(y_train, X=X_train)
|
||||
|
||||
# Must provide future exogenous values
|
||||
y_pred = forecaster.predict(fh=[1, 2, 3], X=X_future)
|
||||
```
|
||||
|
||||
### Multi-Step Forecasting Strategies
|
||||
|
||||
- **Direct**: Train a separate model for each horizon
- **Recursive**: Use predictions as inputs for the next step
- **DirRec**: Combine both strategies
|
||||
|
||||
```python
|
||||
from aeon.forecasting.compose import DirectReductionForecaster
|
||||
from sklearn.linear_model import Ridge
|
||||
|
||||
forecaster = DirectReductionForecaster(
|
||||
regressor=Ridge(),
|
||||
window_length=10
|
||||
)
|
||||
forecaster.fit(y_train)
|
||||
y_pred = forecaster.predict(fh=[1, 2, 3, 4, 5])
|
||||
```
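
For contrast with the direct strategy, the recursive strategy described above can be sketched as a loop that feeds each prediction back into the input window. The names `recursive_forecast` and `linear_extrapolation` are hypothetical, and the latter is only a stand-in for a fitted one-step regressor.

```python
def recursive_forecast(y, one_step_model, window_length, steps):
    """Forecast `steps` ahead by reusing a one-step model on its own output."""
    window = list(y[-window_length:])
    preds = []
    for _ in range(steps):
        nxt = one_step_model(window)
        preds.append(nxt)
        window = window[1:] + [nxt]  # slide the window over the new prediction
    return preds


def linear_extrapolation(window):
    """Stand-in for a fitted regressor: continue the last observed slope."""
    return 2 * window[-1] - window[-2]


preds = recursive_forecast([1, 2, 3, 4], linear_extrapolation, window_length=2, steps=3)
print(preds)
```

Recursive forecasting needs only one model, but errors can accumulate because later steps depend on earlier predictions.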
|
||||
|
||||
## Anomaly Detection
|
||||
|
||||
Anomaly detection identifies unusual patterns or outliers in time series data.
|
||||
|
||||
### Anomaly Detection Types
|
||||
|
||||
- **Point anomalies**: Single unusual values
- **Contextual anomalies**: Values that are anomalous given their context
- **Collective anomalies**: Sequences of unusual behavior
|
||||
|
||||
### Distance-Based Anomaly Detectors
|
||||
|
||||
#### STOMP (Scalable Time series Ordered-search Matrix Profile)
|
||||
Matrix profile-based anomaly detection:
|
||||
|
||||
```python
|
||||
import numpy as np
from aeon.anomaly_detection import STOMP
|
||||
|
||||
detector = STOMP(window_size=50)
|
||||
anomaly_scores = detector.fit_predict(X_series)
|
||||
|
||||
# High scores indicate anomalies
|
||||
threshold = np.percentile(anomaly_scores, 95)
|
||||
anomalies = anomaly_scores > threshold
|
||||
```
|
||||
|
||||
#### LeftSTAMPi
|
||||
Incremental matrix profile for streaming data:
|
||||
|
||||
```python
|
||||
from aeon.anomaly_detection import LeftSTAMPi
|
||||
|
||||
detector = LeftSTAMPi(window_size=50)
|
||||
anomaly_scores = detector.fit_predict(X_series)
|
||||
```
|
||||
|
||||
#### MERLIN
|
||||
Matrix profile with range constraints:
|
||||
|
||||
```python
|
||||
from aeon.anomaly_detection import MERLIN
|
||||
|
||||
detector = MERLIN(window_size=50, k=3)
|
||||
anomaly_scores = detector.fit_predict(X_series)
|
||||
```
|
||||
|
||||
#### KMeansAD
|
||||
K-means clustering-based anomaly detection:
|
||||
|
||||
```python
|
||||
from aeon.anomaly_detection import KMeansAD
|
||||
|
||||
detector = KMeansAD(n_clusters=5, window_size=50)
|
||||
anomaly_scores = detector.fit_predict(X_series)
|
||||
```
|
||||
|
||||
#### CBLOF (Cluster-Based Local Outlier Factor)
|
||||
```python
|
||||
from aeon.anomaly_detection import CBLOF
|
||||
|
||||
detector = CBLOF(n_clusters=8, alpha=0.9)
|
||||
anomaly_scores = detector.fit_predict(X_series)
|
||||
```
|
||||
|
||||
#### LOF (Local Outlier Factor)
|
||||
Density-based outlier detection:
|
||||
|
||||
```python
|
||||
from aeon.anomaly_detection import LOF
|
||||
|
||||
detector = LOF(n_neighbors=20, window_size=50)
|
||||
anomaly_scores = detector.fit_predict(X_series)
|
||||
```
|
||||
|
||||
#### ROCKAD
|
||||
ROCKET-based anomaly detection:
|
||||
|
||||
```python
|
||||
from aeon.anomaly_detection import ROCKAD
|
||||
|
||||
detector = ROCKAD(num_kernels=1000, window_size=50)
|
||||
anomaly_scores = detector.fit_predict(X_series)
|
||||
```
|
||||
|
||||
### Distribution-Based Anomaly Detectors
|
||||
|
||||
#### COPOD (Copula-Based Outlier Detection)
|
||||
```python
|
||||
from aeon.anomaly_detection import COPOD
|
||||
|
||||
detector = COPOD(window_size=50)
|
||||
anomaly_scores = detector.fit_predict(X_series)
|
||||
```
|
||||
|
||||
#### DWT_MLEAD
|
||||
Discrete wavelet transform combined with maximum likelihood estimation:
|
||||
|
||||
```python
|
||||
from aeon.anomaly_detection import DWT_MLEAD
|
||||
|
||||
detector = DWT_MLEAD(window_size=50, wavelet='db4')
|
||||
anomaly_scores = detector.fit_predict(X_series)
|
||||
```
|
||||
|
||||
### Outlier Detection Methods
|
||||
|
||||
#### IsolationForest
|
||||
Ensemble tree-based isolation:
|
||||
|
||||
```python
|
||||
from aeon.anomaly_detection import IsolationForest
|
||||
|
||||
detector = IsolationForest(
|
||||
n_estimators=100,
|
||||
window_size=50,
|
||||
contamination=0.1
|
||||
)
|
||||
anomaly_scores = detector.fit_predict(X_series)
|
||||
```
|
||||
|
||||
#### OneClassSVM
|
||||
Support vector machine for novelty detection:
|
||||
|
||||
```python
|
||||
from aeon.anomaly_detection import OneClassSVM
|
||||
|
||||
detector = OneClassSVM(
|
||||
kernel='rbf',
|
||||
nu=0.1,
|
||||
window_size=50
|
||||
)
|
||||
anomaly_scores = detector.fit_predict(X_series)
|
||||
```
|
||||
|
||||
#### STRAY (Search and TRace AnomalY)
|
||||
```python
|
||||
from aeon.anomaly_detection import STRAY
|
||||
|
||||
detector = STRAY(alpha=0.05)
|
||||
anomaly_scores = detector.fit_predict(X_series)
|
||||
```
|
||||
|
||||
### Collection Anomaly Detection
|
||||
|
||||
Detect anomalous time series within a collection:
|
||||
|
||||
```python
|
||||
from aeon.anomaly_detection import ClassificationAdapter
|
||||
from aeon.classification.convolution_based import RocketClassifier
|
||||
|
||||
detector = ClassificationAdapter(
|
||||
classifier=RocketClassifier()
|
||||
)
|
||||
detector.fit(X_normal) # Train on normal data
|
||||
anomaly_labels = detector.predict(X_test) # 1 = anomaly, 0 = normal
|
||||
```
|
||||
|
||||
### Anomaly Detection Workflow
|
||||
|
||||
```python
|
||||
from aeon.anomaly_detection import STOMP
|
||||
import numpy as np
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
# Detect anomalies
|
||||
detector = STOMP(window_size=100)
|
||||
anomaly_scores = detector.fit_predict(X_series)
|
||||
|
||||
# Identify anomalies (top 5%)
|
||||
threshold = np.percentile(anomaly_scores, 95)
|
||||
anomaly_indices = np.where(anomaly_scores > threshold)[0]
|
||||
|
||||
# Visualize
|
||||
plt.figure(figsize=(12, 6))
|
||||
plt.subplot(2, 1, 1)
|
||||
plt.plot(X_series[0, 0, :])
|
||||
plt.scatter(anomaly_indices, X_series[0, 0, anomaly_indices],
|
||||
color='red', label='Anomalies', zorder=5)
|
||||
plt.legend()
|
||||
plt.title('Time Series with Detected Anomalies')
|
||||
|
||||
plt.subplot(2, 1, 2)
|
||||
plt.plot(anomaly_scores)
|
||||
plt.axhline(threshold, color='red', linestyle='--', label='Threshold')
|
||||
plt.legend()
|
||||
plt.title('Anomaly Scores')
|
||||
plt.tight_layout()
|
||||
plt.show()
|
||||
```
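
After thresholding, the flagged indices often form consecutive runs, and it is usually more useful to report these as intervals than as individual points. A small post-processing sketch follows; the helper name `group_consecutive` is hypothetical and the input is assumed to be sorted.

```python
def group_consecutive(indices):
    """Merge sorted indices into inclusive (start, end) runs."""
    runs = []
    for idx in indices:
        if runs and idx == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], idx)  # extend the current run
        else:
            runs.append((idx, idx))  # start a new run
    return runs


runs = group_consecutive([3, 4, 5, 10, 42, 43])
print(runs)
```

Runs of length one correspond to point anomalies, while longer runs are candidates for collective anomalies.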
|
||||
|
||||
## Segmentation
|
||||
|
||||
Segmentation divides time series into distinct regions or identifies change points.
|
||||
|
||||
### Segmentation Concepts
|
||||
|
||||
- **Change points**: Locations where statistical properties change
- **Segments**: Homogeneous regions between change points
- **Applications**: Regime detection, event identification, structural breaks
|
||||
|
||||
### Segmentation Algorithms
|
||||
|
||||
#### ClaSP (Classification Score Profile)
|
||||
Discover change points using classification performance:
|
||||
|
||||
```python
|
||||
from aeon.segmentation import ClaSPSegmenter
|
||||
|
||||
segmenter = ClaSPSegmenter(
|
||||
n_segments=3,
|
||||
period_length=10
|
||||
)
|
||||
change_points = segmenter.fit_predict(X_series)
|
||||
print(f"Change points at indices: {change_points}")
|
||||
```
|
||||
|
||||
**How it works:**
|
||||
- Slides a window over the series
|
||||
- Computes classification score for left vs. right segments
|
||||
- High scores indicate change points
|
||||
|
||||
#### FLUSS (Fast Low-cost Unipotent Semantic Segmentation)
|
||||
Matrix profile-based segmentation:
|
||||
|
||||
```python
|
||||
from aeon.segmentation import FLUSSSegmenter
|
||||
|
||||
segmenter = FLUSSSegmenter(
|
||||
n_segments=5,
|
||||
window_size=50
|
||||
)
|
||||
change_points = segmenter.fit_predict(X_series)
|
||||
```
|
||||
|
||||
#### BinSeg (Binary Segmentation)
|
||||
Recursive splitting for change point detection:
|
||||
|
||||
```python
|
||||
from aeon.segmentation import BinSegSegmenter
|
||||
|
||||
segmenter = BinSegSegmenter(
|
||||
n_segments=4,
|
||||
model="l2" # cost function
|
||||
)
|
||||
change_points = segmenter.fit_predict(X_series)
|
||||
```
|
||||
|
||||
**Models:**
|
||||
- `"l2"`: Least squares (continuous data)
|
||||
- `"l1"`: Absolute deviation (robust to outliers)
|
||||
- `"rbf"`: Radial basis function
|
||||
- `"ar"`: Autoregressive model
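
To show what a cost function does inside binary segmentation, the sketch below evaluates the least-squares (`"l2"`) cost for every candidate split and keeps the best one. It covers a single split only, whereas the full algorithm recurses into the resulting segments. The function names and the `min_size` parameter are assumptions for illustration.

```python
def l2_cost(segment):
    """Sum of squared deviations from the segment mean."""
    mean = sum(segment) / len(segment)
    return sum((v - mean) ** 2 for v in segment)


def best_split(series, min_size=2):
    """Index that minimizes the combined cost of the left and right parts."""
    candidates = range(min_size, len(series) - min_size + 1)
    return min(candidates, key=lambda t: l2_cost(series[:t]) + l2_cost(series[t:]))


series = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
cp = best_split(series)
print(cp)
```

Swapping `l2_cost` for an absolute-deviation cost gives the more outlier-robust `"l1"` behavior described in the list above.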
|
||||
|
||||
#### HMM (Hidden Markov Model) Segmentation
|
||||
Probabilistic state-based segmentation:
|
||||
|
||||
```python
|
||||
from aeon.segmentation import HMMSegmenter
|
||||
|
||||
segmenter = HMMSegmenter(
|
||||
n_states=3,
|
||||
covariance_type="full"
|
||||
)
|
||||
segmenter.fit(X_series)
|
||||
states = segmenter.predict(X_series)
|
||||
```
|
||||
|
||||
### Segmentation Workflow
|
||||
|
||||
```python
|
||||
from aeon.segmentation import ClaSPSegmenter
|
||||
import numpy as np
import matplotlib.pyplot as plt
|
||||
|
||||
# Detect change points
|
||||
segmenter = ClaSPSegmenter(n_segments=4)
|
||||
change_points = segmenter.fit_predict(X_series)
|
||||
|
||||
# Visualize segments
|
||||
plt.figure(figsize=(12, 4))
|
||||
plt.plot(X_series[0, 0, :])
|
||||
for cp in change_points:
    plt.axvline(cp, color='red', linestyle='--', alpha=0.7)
|
||||
plt.title('Time Series Segmentation')
|
||||
plt.xlabel('Time')
|
||||
plt.ylabel('Value')
|
||||
plt.show()
|
||||
|
||||
# Extract segments
|
||||
segments = []
|
||||
prev_cp = 0
|
||||
for cp in np.append(change_points, len(X_series[0, 0, :])):
    segment = X_series[0, 0, prev_cp:cp]
    segments.append(segment)
    prev_cp = cp
|
||||
```
|
||||
|
||||
### Multi-Variate Segmentation
|
||||
|
||||
```python
|
||||
from aeon.segmentation import ClaSPSegmenter
|
||||
|
||||
# X_multivariate has shape (1, n_channels, n_timepoints)
|
||||
segmenter = ClaSPSegmenter(n_segments=3)
|
||||
change_points = segmenter.fit_predict(X_multivariate)
|
||||
```
|
||||
|
||||
## Combining Forecasting, Anomaly Detection, and Segmentation
|
||||
|
||||
### Robust Forecasting with Anomaly Detection
|
||||
|
||||
```python
|
||||
from aeon.forecasting.arima import ARIMA
|
||||
from aeon.anomaly_detection import IsolationForest
import numpy as np
|
||||
|
||||
# Detect and remove anomalies
|
||||
detector = IsolationForest(window_size=50, contamination=0.1)
|
||||
anomaly_scores = detector.fit_predict(X_series)
|
||||
normal_mask = anomaly_scores < np.percentile(anomaly_scores, 90)
|
||||
|
||||
# Forecast on cleaned data
|
||||
y_clean = y_train[normal_mask]
|
||||
forecaster = ARIMA(order=(2, 1, 2))
|
||||
forecaster.fit(y_clean)
|
||||
y_pred = forecaster.predict(fh=[1, 2, 3])
|
||||
```
|
||||
|
||||
### Segmentation-Based Forecasting
|
||||
|
||||
```python
|
||||
from aeon.segmentation import ClaSPSegmenter
|
||||
from aeon.forecasting.arima import ARIMA
|
||||
|
||||
# Segment time series
|
||||
segmenter = ClaSPSegmenter(n_segments=3)
|
||||
change_points = segmenter.fit_predict(X_series)
|
||||
|
||||
# Forecast using most recent segment
|
||||
last_segment_start = change_points[-1]
|
||||
y_recent = y_train[last_segment_start:]
|
||||
|
||||
forecaster = ARIMA(order=(1, 1, 1))
|
||||
forecaster.fit(y_recent)
|
||||
y_pred = forecaster.predict(fh=[1, 2, 3])
|
||||
```
|
||||
|
||||
## Discovery Functions
|
||||
|
||||
Find available forecasters, detectors, and segmenters:
|
||||
|
||||
```python
|
||||
from aeon.utils.discovery import all_estimators
|
||||
|
||||
# Get all forecasters
|
||||
forecasters = all_estimators(type_filter="forecaster")
|
||||
|
||||
# Get all anomaly detectors
|
||||
detectors = all_estimators(type_filter="anomaly-detector")
|
||||
|
||||
# Get all segmenters
|
||||
segmenters = all_estimators(type_filter="segmenter")
|
||||
```
|
||||
634
scientific-packages/aeon/references/workflows.md
Normal file
@@ -0,0 +1,634 @@
|
||||
# Common Workflows and Integration Patterns
|
||||
|
||||
This reference provides end-to-end workflows, best practices, and integration patterns for using aeon effectively.
|
||||
|
||||
## Complete Classification Workflow
|
||||
|
||||
### Basic Classification Pipeline
|
||||
|
||||
```python
|
||||
# 1. Import required modules
|
||||
from aeon.classification.convolution_based import RocketClassifier
|
||||
from aeon.datasets import load_arrow_head
|
||||
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
|
||||
import numpy as np
import matplotlib.pyplot as plt
|
||||
import seaborn as sns
|
||||
|
||||
# 2. Load and inspect data
|
||||
X_train, y_train = load_arrow_head(split="train")
|
||||
X_test, y_test = load_arrow_head(split="test")
|
||||
|
||||
print(f"Training shape: {X_train.shape}") # (n_cases, n_channels, n_timepoints)
|
||||
print(f"Unique classes: {np.unique(y_train)}")
|
||||
|
||||
# 3. Train classifier
|
||||
clf = RocketClassifier(num_kernels=10000, n_jobs=-1)
|
||||
clf.fit(X_train, y_train)
|
||||
|
||||
# 4. Make predictions
|
||||
y_pred = clf.predict(X_test)
|
||||
y_proba = clf.predict_proba(X_test)
|
||||
|
||||
# 5. Evaluate performance
|
||||
accuracy = accuracy_score(y_test, y_pred)
|
||||
print(f"Accuracy: {accuracy:.3f}")
|
||||
print("\nClassification Report:")
|
||||
print(classification_report(y_test, y_pred))
|
||||
|
||||
# 6. Visualize confusion matrix
|
||||
cm = confusion_matrix(y_test, y_pred)
|
||||
plt.figure(figsize=(8, 6))
|
||||
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
|
||||
plt.title('Confusion Matrix')
|
||||
plt.ylabel('True Label')
|
||||
plt.xlabel('Predicted Label')
|
||||
plt.show()
|
||||
```
|
||||
|
||||
### Feature Extraction + Classifier Pipeline
|
||||
|
||||
```python
|
||||
from sklearn.pipeline import Pipeline
|
||||
from aeon.transformations.collection import Catch22
|
||||
from sklearn.ensemble import RandomForestClassifier
|
||||
from sklearn.model_selection import cross_val_score
|
||||
|
||||
# Create pipeline
|
||||
pipeline = Pipeline([
|
||||
('features', Catch22(n_jobs=-1)),
|
||||
('classifier', RandomForestClassifier(n_estimators=500, n_jobs=-1))
|
||||
])
|
||||
|
||||
# Cross-validation
|
||||
scores = cross_val_score(pipeline, X_train, y_train, cv=5, scoring='accuracy')
|
||||
print(f"CV Accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
|
||||
|
||||
# Train on full training set
|
||||
pipeline.fit(X_train, y_train)
|
||||
|
||||
# Evaluate on test set
|
||||
accuracy = pipeline.score(X_test, y_test)
|
||||
print(f"Test Accuracy: {accuracy:.3f}")
|
||||
```
|
||||
|
||||
### Multi-Algorithm Comparison
|
||||
|
||||
```python
|
||||
from aeon.classification.convolution_based import RocketClassifier, MiniRocketClassifier
|
||||
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
|
||||
from aeon.classification.feature_based import Catch22Classifier
|
||||
from aeon.classification.interval_based import TimeSeriesForestClassifier
|
||||
import time
|
||||
|
||||
classifiers = {
|
||||
'ROCKET': RocketClassifier(num_kernels=10000),
|
||||
'MiniRocket': MiniRocketClassifier(),
|
||||
'KNN-DTW': KNeighborsTimeSeriesClassifier(distance='dtw', n_neighbors=5),
|
||||
'Catch22': Catch22Classifier(),
|
||||
'TSF': TimeSeriesForestClassifier(n_estimators=200)
|
||||
}
|
||||
|
||||
results = {}
|
||||
for name, clf in classifiers.items():
    # Time the training step
    start_time = time.time()
    clf.fit(X_train, y_train)
    train_time = time.time() - start_time

    # Time prediction and scoring on the test set
    start_time = time.time()
    accuracy = clf.score(X_test, y_test)
    test_time = time.time() - start_time

    results[name] = {
        'accuracy': accuracy,
        'train_time': train_time,
        'test_time': test_time
    }
|
||||
|
||||
# Display results
|
||||
import pandas as pd
|
||||
df_results = pd.DataFrame(results).T
|
||||
df_results = df_results.sort_values('accuracy', ascending=False)
|
||||
print(df_results)
|
||||
```
|
||||
|
||||
## Complete Forecasting Workflow
|
||||
|
||||
### Univariate Forecasting
|
||||
|
||||
```python
|
||||
from aeon.forecasting.arima import ARIMA
|
||||
from aeon.forecasting.naive import NaiveForecaster
|
||||
from aeon.datasets import load_airline
|
||||
from sklearn.metrics import mean_absolute_error, mean_squared_error
|
||||
import numpy as np
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
# 1. Load data
|
||||
y = load_airline()
|
||||
|
||||
# 2. Train/test split (temporal)
|
||||
split_point = int(len(y) * 0.8)
|
||||
y_train, y_test = y[:split_point], y[split_point:]
|
||||
|
||||
# 3. Create baseline (naive forecaster)
|
||||
baseline = NaiveForecaster(strategy="last")
|
||||
baseline.fit(y_train)
|
||||
y_pred_baseline = baseline.predict(fh=np.arange(1, len(y_test) + 1))
|
||||
|
||||
# 4. Train ARIMA model
|
||||
forecaster = ARIMA(order=(2, 1, 2), seasonal_order=(1, 1, 1, 12))
|
||||
forecaster.fit(y_train)
|
||||
y_pred = forecaster.predict(fh=np.arange(1, len(y_test) + 1))
|
||||
|
||||
# 5. Evaluate
|
||||
mae_baseline = mean_absolute_error(y_test, y_pred_baseline)
|
||||
mae_arima = mean_absolute_error(y_test, y_pred)
|
||||
rmse_baseline = np.sqrt(mean_squared_error(y_test, y_pred_baseline))
|
||||
rmse_arima = np.sqrt(mean_squared_error(y_test, y_pred))
|
||||
|
||||
print(f"Baseline - MAE: {mae_baseline:.2f}, RMSE: {rmse_baseline:.2f}")
|
||||
print(f"ARIMA - MAE: {mae_arima:.2f}, RMSE: {rmse_arima:.2f}")
|
||||
|
||||
# 6. Visualize
|
||||
plt.figure(figsize=(12, 6))
|
||||
plt.plot(y_train.index, y_train, label='Train', alpha=0.7)
|
||||
plt.plot(y_test.index, y_test, label='Test (Actual)', alpha=0.7)
|
||||
plt.plot(y_test.index, y_pred, label='ARIMA Forecast', linestyle='--')
|
||||
plt.plot(y_test.index, y_pred_baseline, label='Baseline', linestyle=':', alpha=0.5)
|
||||
plt.legend()
|
||||
plt.title('Forecasting Results')
|
||||
plt.xlabel('Time')
|
||||
plt.ylabel('Value')
|
||||
plt.show()
|
||||
```
|
||||
|
||||
### Forecast with Confidence Intervals
|
||||
|
||||
```python
|
||||
import numpy as np
import matplotlib.pyplot as plt
from aeon.forecasting.arima import ARIMA
|
||||
|
||||
forecaster = ARIMA(order=(2, 1, 2))
|
||||
forecaster.fit(y_train)
|
||||
|
||||
# Predict with prediction intervals
|
||||
y_pred = forecaster.predict(fh=np.arange(1, len(y_test) + 1))
|
||||
pred_interval = forecaster.predict_interval(
|
||||
fh=np.arange(1, len(y_test) + 1),
|
||||
coverage=0.95
|
||||
)
|
||||
|
||||
# Visualize with confidence bands
|
||||
plt.figure(figsize=(12, 6))
|
||||
plt.plot(y_test.index, y_test, label='Actual')
|
||||
plt.plot(y_test.index, y_pred, label='Forecast')
|
||||
plt.fill_between(
|
||||
y_test.index,
|
||||
pred_interval.iloc[:, 0],
|
||||
pred_interval.iloc[:, 1],
|
||||
alpha=0.3,
|
||||
label='95% Confidence'
|
||||
)
|
||||
plt.legend()
|
||||
plt.show()
|
||||
```
|
||||
|
||||
### Multi-Step Ahead Forecasting
|
||||
|
||||
```python
|
||||
import numpy as np
from aeon.forecasting.compose import DirectReductionForecaster
|
||||
from sklearn.ensemble import GradientBoostingRegressor
|
||||
|
||||
# Convert to supervised learning problem
|
||||
forecaster = DirectReductionForecaster(
|
||||
regressor=GradientBoostingRegressor(n_estimators=100),
|
||||
window_length=12
|
||||
)
|
||||
forecaster.fit(y_train)
|
||||
|
||||
# Forecast multiple steps
|
||||
fh = np.arange(1, 13) # 12 months ahead
|
||||
y_pred = forecaster.predict(fh=fh)
|
||||
```
|
||||
|
||||
## Complete Anomaly Detection Workflow

```python
from aeon.anomaly_detection import STOMP
from aeon.datasets import load_airline
import numpy as np
import matplotlib.pyplot as plt

# 1. Load data
y = load_airline()
# 1D array: aeon anomaly detectors take a single series, not a 3D collection
X_series = y.values.astype(float)

# 2. Detect anomalies
detector = STOMP(window_size=50)
anomaly_scores = detector.fit_predict(X_series)

# 3. Identify anomalies (top 5%)
threshold = np.percentile(anomaly_scores, 95)
anomaly_indices = np.where(anomaly_scores > threshold)[0]

# 4. Visualize
fig, axes = plt.subplots(2, 1, figsize=(14, 8), sharex=True)

# Plot time series with anomalies
axes[0].plot(y.values, label='Time Series')
axes[0].scatter(
    anomaly_indices,
    y.values[anomaly_indices],
    color='red',
    s=100,
    label='Anomalies',
    zorder=5
)
axes[0].set_ylabel('Value')
axes[0].legend()
axes[0].set_title('Time Series with Detected Anomalies')

# Plot anomaly scores
axes[1].plot(anomaly_scores, label='Anomaly Score')
axes[1].axhline(threshold, color='red', linestyle='--', label='Threshold')
axes[1].set_xlabel('Time')
axes[1].set_ylabel('Score')
axes[1].legend()
axes[1].set_title('Anomaly Scores')

plt.tight_layout()
plt.show()

# 5. Report the anomalous points
print(f"Found {len(anomaly_indices)} anomalies")
for idx in anomaly_indices[:5]:  # Show first 5
    print(f"Anomaly at index {idx}, value: {y.values[idx]:.2f}")
```

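Flagged points that sit next to each other usually belong to the same event, so it can help to group them into contiguous segments before reporting. The following is a minimal sketch of that grouping. The helper `group_segments`, its `max_gap` parameter, and the toy `scores` array are assumptions made for illustration and are not part of aeon.

```python
import numpy as np


def group_segments(indices, max_gap=1):
    """Group sorted indices into (start, end) runs; end is inclusive.

    Indices whose difference is at most `max_gap` fall into the same run.
    """
    indices = np.asarray(indices, dtype=int)
    if indices.size == 0:
        return []
    # Positions where the jump to the next index exceeds max_gap
    breaks = np.where(np.diff(indices) > max_gap)[0]
    starts = np.concatenate(([0], breaks + 1))
    ends = np.concatenate((breaks, [indices.size - 1]))
    return [(int(indices[s]), int(indices[e])) for s, e in zip(starts, ends)]


# Toy scores standing in for detector output (hypothetical values)
scores = np.array([0.1, 0.2, 0.9, 0.95, 0.1, 0.1, 0.8, 0.1, 0.1, 0.1])
threshold = 0.5
flagged = np.where(scores > threshold)[0]
segments = group_segments(flagged)
print(segments)
```

In the workflow above, `anomaly_indices` would take the place of `flagged`. Raising `max_gap` merges segments separated by a few unflagged points.
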
## Complete Clustering Workflow

```python
from aeon.clustering import TimeSeriesKMeans
from aeon.datasets import load_basic_motions
from sklearn.metrics import silhouette_score, davies_bouldin_score
import matplotlib.pyplot as plt
import numpy as np

# 1. Load data
X_train, y_train = load_basic_motions(split="train")

# 2. Determine optimal number of clusters (elbow method)
inertias = []
silhouettes = []
K = range(2, 11)

for k in K:
    clusterer = TimeSeriesKMeans(n_clusters=k, distance="euclidean", n_init=5)
    labels = clusterer.fit_predict(X_train)
    inertias.append(clusterer.inertia_)
    silhouettes.append(silhouette_score(X_train.reshape(len(X_train), -1), labels))

# Plot elbow curve
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
axes[0].plot(K, inertias, 'bo-')
axes[0].set_xlabel('Number of Clusters')
axes[0].set_ylabel('Inertia')
axes[0].set_title('Elbow Method')

axes[1].plot(K, silhouettes, 'ro-')
axes[1].set_xlabel('Number of Clusters')
axes[1].set_ylabel('Silhouette Score')
axes[1].set_title('Silhouette Analysis')
plt.tight_layout()
plt.show()

# 3. Cluster with optimal k
optimal_k = 4  # Read off the elbow and silhouette plots above
clusterer = TimeSeriesKMeans(n_clusters=optimal_k, distance="dtw", n_init=10)
labels = clusterer.fit_predict(X_train)

# 4. Visualize clusters
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.ravel()

for cluster_id in range(optimal_k):
    cluster_indices = np.where(labels == cluster_id)[0]
    ax = axes[cluster_id]

    # Plot all series in cluster
    for idx in cluster_indices[:20]:  # Plot up to 20 series
        ax.plot(X_train[idx, 0, :], alpha=0.3, color='blue')

    # Plot cluster center
    ax.plot(clusterer.cluster_centers_[cluster_id, 0, :],
            color='red', linewidth=2, label='Center')
    ax.set_title(f'Cluster {cluster_id} (n={len(cluster_indices)})')
    ax.legend()

plt.tight_layout()
plt.show()
```

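The workflow fixes `optimal_k` by eye. One hedged alternative is to take the candidate with the highest silhouette score. The sketch below assumes the `K` and `silhouettes` sequences from step 2; the helper name `pick_k_by_silhouette` and the scores shown are hypothetical.

```python
def pick_k_by_silhouette(ks, silhouettes):
    """Return the candidate k with the highest silhouette score."""
    if len(ks) != len(silhouettes):
        raise ValueError("ks and silhouettes must have the same length")
    best_index = max(range(len(ks)), key=lambda i: silhouettes[i])
    return ks[best_index]


# Hypothetical scores for k = 2..6, for illustration only
ks = list(range(2, 7))
silhouettes = [0.31, 0.42, 0.55, 0.48, 0.40]
optimal_k = pick_k_by_silhouette(ks, silhouettes)
print(optimal_k)
```

If the selected value differs from 4, the 2x2 subplot grid in step 4 needs to be resized to match.
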
## Cross-Validation Strategies

### Standard K-Fold Cross-Validation

```python
from sklearn.model_selection import cross_val_score, StratifiedKFold
from aeon.classification.convolution_based import RocketClassifier

clf = RocketClassifier()

# Stratified K-Fold (preserves class distribution)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(clf, X_train, y_train, cv=cv, scoring='accuracy')

print(f"Cross-validation scores: {scores}")
print(f"Mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

### Time Series Cross-Validation (for forecasting)

```python
from sklearn.model_selection import TimeSeriesSplit
from aeon.forecasting.arima import ARIMA
from sklearn.metrics import mean_squared_error
import numpy as np

# y is assumed to be a pandas Series ordered in time (e.g. the airline data)

# Time-aware split (no future data leakage)
tscv = TimeSeriesSplit(n_splits=5)
mse_scores = []

for train_idx, test_idx in tscv.split(y):
    y_train_cv, y_test_cv = y.iloc[train_idx], y.iloc[test_idx]

    forecaster = ARIMA(order=(2, 1, 2))
    forecaster.fit(y_train_cv)

    fh = np.arange(1, len(y_test_cv) + 1)
    y_pred = forecaster.predict(fh=fh)

    mse = mean_squared_error(y_test_cv, y_pred)
    mse_scores.append(mse)

print(f"CV MSE: {np.mean(mse_scores):.3f} (+/- {np.std(mse_scores):.3f})")
```

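To see why a time-aware split avoids leakage, it helps to look at the indices it produces. The sketch below builds expanding-window folds in plain Python. `expanding_window_splits` is a hypothetical stand-in written for illustration, not scikit-learn's implementation, and the sizing rule it uses is an assumption.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) with the training window growing over time."""
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("Too few samples for the requested number of splits")
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        train = list(range(0, test_start))
        test = list(range(test_start, test_start + test_size))
        yield train, test


splits = list(expanding_window_splits(n_samples=12, n_splits=3))
for train, test in splits:
    # Every training index precedes every test index, so nothing leaks from the future
    print(f"train 0..{train[-1]}  ->  test {test[0]}..{test[-1]}")
```

Each fold trains on everything observed so far and tests on the block that follows, which mirrors how a forecaster is used in practice.
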
## Hyperparameter Tuning

### Grid Search

```python
from sklearn.model_selection import GridSearchCV
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier

# Define parameter grid
param_grid = {
    'n_neighbors': [1, 3, 5, 7, 9],
    'distance': ['dtw', 'euclidean', 'erp', 'msm'],
    'distance_params': [{'window': 0.1}, {'window': 0.2}, None]
}

# Grid search with cross-validation
clf = KNeighborsTimeSeriesClassifier()
grid_search = GridSearchCV(
    clf,
    param_grid,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    verbose=2
)

grid_search.fit(X_train, y_train)

print(f"Best parameters: {grid_search.best_params_}")
print(f"Best CV score: {grid_search.best_score_:.3f}")
print(f"Test accuracy: {grid_search.score(X_test, y_test):.3f}")
```

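Grid search cost grows multiplicatively with each added parameter, which matters for slow elastic distances. The sketch below enumerates the candidates from the grid above using only the standard library, so the number of model fits can be estimated before running the search. The variable names are illustrative.

```python
from itertools import product

param_grid = {
    "n_neighbors": [1, 3, 5, 7, 9],
    "distance": ["dtw", "euclidean", "erp", "msm"],
    "distance_params": [{"window": 0.1}, {"window": 0.2}, None],
}

# Every combination of the listed values is one candidate
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

n_folds = 5
n_fits = len(candidates) * n_folds  # excludes the final refit on the full training set
print(len(candidates), n_fits)
```

The enumeration also shows that `window` values get paired with `euclidean`, where a warping window has no natural meaning. `GridSearchCV` accepts a list of dictionaries, which is one way to keep such combinations out of the search.
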
### Random Search

```python
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint, uniform
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier

param_distributions = {
    'n_neighbors': randint(1, 20),
    'distance': ['dtw', 'euclidean', 'ddtw'],
    'distance_params': [{'window': w} for w in np.linspace(0.0, 0.5, 10)]
}

clf = KNeighborsTimeSeriesClassifier()
random_search = RandomizedSearchCV(
    clf,
    param_distributions,
    n_iter=50,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    random_state=42
)

random_search.fit(X_train, y_train)
print(f"Best parameters: {random_search.best_params_}")
```

## Integration with scikit-learn

### Using aeon in scikit-learn Pipelines

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from aeon.transformations.collection.feature_based import Catch22
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier

pipeline = Pipeline([
    ('features', Catch22()),
    ('scaler', StandardScaler()),
    ('feature_selection', SelectKBest(f_classif, k=15)),
    ('classifier', RandomForestClassifier(n_estimators=500))
])

pipeline.fit(X_train, y_train)
accuracy = pipeline.score(X_test, y_test)
```

### Voting Ensemble with scikit-learn

```python
from sklearn.ensemble import VotingClassifier
from aeon.classification.convolution_based import RocketClassifier
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
from aeon.classification.feature_based import Catch22Classifier

ensemble = VotingClassifier(
    estimators=[
        ('rocket', RocketClassifier()),
        ('knn', KNeighborsTimeSeriesClassifier()),
        ('catch22', Catch22Classifier())
    ],
    voting='soft',
    n_jobs=-1
)

ensemble.fit(X_train, y_train)
accuracy = ensemble.score(X_test, y_test)
```

### Stacking with Meta-Learner

```python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from aeon.classification.convolution_based import MiniRocketClassifier
from aeon.classification.interval_based import TimeSeriesForestClassifier

stacking = StackingClassifier(
    estimators=[
        ('minirocket', MiniRocketClassifier()),
        ('tsf', TimeSeriesForestClassifier(n_estimators=100))
    ],
    final_estimator=LogisticRegression(),
    cv=5
)

stacking.fit(X_train, y_train)
accuracy = stacking.score(X_test, y_test)
```

## Data Preprocessing

### Handling Variable-Length Series

```python
from aeon.transformations.collection import PaddingTransformer

# Pad series to equal length
padder = PaddingTransformer(pad_length=None, fill_value=0)
X_padded = padder.fit_transform(X_variable_length)
```

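For intuition about what padding does to the data, here is a minimal NumPy sketch. It assumes that each series is an array of shape `(n_channels, n_timepoints)` and that padding extends every series on the right to the length of the longest one. The helper `pad_collection` is hypothetical and is not aeon's implementation.

```python
import numpy as np


def pad_collection(series_list, fill_value=0.0):
    """Right-pad (n_channels, n_timepoints) arrays to a common length and stack them."""
    n_channels = series_list[0].shape[0]
    max_len = max(s.shape[1] for s in series_list)
    padded = np.full((len(series_list), n_channels, max_len), fill_value, dtype=float)
    for i, s in enumerate(series_list):
        # Copy the observed values; the remainder keeps fill_value
        padded[i, :, : s.shape[1]] = s
    return padded


X_variable_length = [np.ones((1, 3)), np.ones((1, 5)), np.ones((1, 4))]
X_padded = pad_collection(X_variable_length)
print(X_padded.shape)
```

The result is a regular 3D array, `(n_cases, n_channels, n_timepoints)`, which is the shape that estimators requiring equal-length input expect.
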
### Handling Missing Values

```python
from aeon.transformations.series import Imputer

imputer = Imputer(method='mean')
X_imputed = imputer.fit_transform(X_with_missing)
```

### Normalization

```python
from aeon.transformations.collection import Normalizer

normalizer = Normalizer(method='z-score')
X_normalized = normalizer.fit_transform(X_train)
```

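Z-score normalization of a collection is usually applied per series, so that each series ends up with zero mean and unit variance along the time axis. The sketch below shows that computation in NumPy. The function `z_normalize` and its `eps` parameter are assumptions for illustration, not aeon's API.

```python
import numpy as np


def z_normalize(X, eps=1e-8):
    """Z-normalise each channel of each case along the time axis.

    X has shape (n_cases, n_channels, n_timepoints).
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=-1, keepdims=True)
    std = X.std(axis=-1, keepdims=True)
    # eps guards against division by zero for constant series
    return (X - mean) / (std + eps)


rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(4, 2, 50))
X_norm = z_normalize(X)
print(X_norm.mean(axis=-1).round(6))
```

Because the statistics are computed per series, nothing is learned from the training set, and the same function can be applied to test data without leakage.
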
## Model Persistence

### Saving and Loading Models

```python
import pickle
from aeon.classification.convolution_based import RocketClassifier

# Train and save
clf = RocketClassifier()
clf.fit(X_train, y_train)

with open('rocket_model.pkl', 'wb') as f:
    pickle.dump(clf, f)

# Load and predict
with open('rocket_model.pkl', 'rb') as f:
    loaded_clf = pickle.load(f)

predictions = loaded_clf.predict(X_test)
```

### Using joblib (recommended for large models)

```python
import joblib

# Save
joblib.dump(clf, 'rocket_model.joblib')

# Load
loaded_clf = joblib.load('rocket_model.joblib')
```

## Visualization Utilities

### Plotting Time Series

```python
from aeon.visualisation import plot_series
import matplotlib.pyplot as plt

# Plot multiple series
fig, ax = plt.subplots(figsize=(12, 6))
plot_series(X_train[0], X_train[1], X_train[2], labels=['Series 1', 'Series 2', 'Series 3'], ax=ax)
plt.title('Time Series Visualization')
plt.show()
```

### Plotting Distance Matrices

```python
from aeon.distances import pairwise_distance
import matplotlib.pyplot as plt
import seaborn as sns

dist_matrix = pairwise_distance(X_train[:50], metric="dtw")

plt.figure(figsize=(10, 8))
sns.heatmap(dist_matrix, cmap='viridis', square=True)
plt.title('DTW Distance Matrix')
plt.show()
```

## Performance Optimization Tips

1. **Use n_jobs=-1** for parallel processing:
   ```python
   clf = RocketClassifier(num_kernels=10000, n_jobs=-1)
   ```

2. **Use MiniRocket instead of ROCKET** for faster training:
   ```python
   clf = MiniRocketClassifier()  # Reported as up to 75x faster than ROCKET on larger datasets
   ```

3. **Reduce num_kernels** for faster training:
   ```python
   clf = RocketClassifier(num_kernels=2000)  # Default is 10000
   ```

4. **Use Catch22 instead of TSFresh**:
   ```python
   transform = Catch22()  # Much faster, fewer features
   ```

5. **Window constraints for DTW**:
   ```python
   clf = KNeighborsTimeSeriesClassifier(
       distance='dtw',
       distance_params={'window': 0.1}  # Constrain warping
   )
   ```

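The window constraint in tip 5 speeds up DTW because only cost-matrix cells near the diagonal are evaluated. The following pure-Python sketch illustrates the idea with a band whose width is a fraction of the series length. It is a teaching example under that assumption, not aeon's implementation, and the function name `dtw_distance` is used here only for illustration.

```python
import math


def dtw_distance(x, y, window=None):
    """Squared-cost DTW between two 1D sequences with an optional band.

    `window` is a fraction of the longer series length; cells further than
    that from the diagonal are never evaluated.
    """
    n, m = len(x), len(y)
    if window is None:
        band = max(n, m)
    else:
        # Never narrower than |n - m|, otherwise no valid path exists
        band = max(int(window * max(n, m)), abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


a = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
b = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0]  # same bump, shifted by two steps

print(dtw_distance(a, b))              # unconstrained: warping absorbs the shift
print(dtw_distance(a, b, window=0.0))  # zero-width band: point-by-point comparison
```

A narrower band means fewer cells and a faster computation, at the cost of no longer matching features that are shifted by more than the band allows.
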
## Best Practices

1. **Always use train/test split** with time series ordering preserved
2. **Use stratified splits** for classification to maintain class balance
3. **Start with fast algorithms** (ROCKET, MiniRocket) before trying slow ones
4. **Use cross-validation** to estimate generalization performance
5. **Benchmark against naive baselines** to establish minimum performance
6. **Normalize/standardize** when using distance-based methods
7. **Use appropriate distance metrics** for your data characteristics
8. **Save trained models** to avoid retraining
9. **Monitor training time** and computational resources
10. **Visualize results** to understand model behavior

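As an illustration of the naive-baseline practice, the sketch below computes the accuracy of always predicting the most frequent training label. A classifier that cannot beat this number has learned little. The helper `majority_class_accuracy` and the labels are hypothetical; scikit-learn's `DummyClassifier(strategy="most_frequent")` provides the same baseline as an estimator.

```python
from collections import Counter


def majority_class_accuracy(y_train, y_test):
    """Accuracy of always predicting the most frequent training label."""
    majority_label, _ = Counter(y_train).most_common(1)[0]
    hits = sum(1 for label in y_test if label == majority_label)
    return majority_label, hits / len(y_test)


# Hypothetical labels, for illustration only
y_train = ["walk", "walk", "run", "walk", "stand"]
y_test = ["walk", "run", "walk", "stand"]
label, baseline = majority_class_accuracy(y_train, y_test)
print(label, baseline)
```
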