Support for aeon for time-series analysis and machine learning

Timothy Kassis
2025-10-25 21:25:50 -07:00
parent 6cefe6f4cc
commit b83942845c
5 changed files with 2645 additions and 0 deletions

@@ -0,0 +1,749 @@
# Core Modules: Transformations, Distances, Networks, Datasets, and Benchmarking
This reference provides comprehensive details on foundational modules that support aeon's learning tasks.
## Transformations
Transformations convert time series into alternative representations for feature extraction, preprocessing, or visualization.
### Two Types of Transformers
**Collection Transformers**: Process entire collections of time series
- Input: `(n_cases, n_channels, n_timepoints)`
- Output: Features, transformed collections, or tabular data
**Series Transformers**: Work on individual time series
- Input: Single time series
- Output: Transformed single series
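A minimal sketch of the shape contract, using Catch22 (introduced below) on synthetic data:
```python
import numpy as np
from aeon.transformations.collection.feature_based import Catch22
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 1, 100))  # (n_cases, n_channels, n_timepoints)
# Collection transformer: whole collection in, tabular features out
X_features = Catch22().fit_transform(X)
print(X_features.shape)  # (10, 22)
# Series transformers instead take a single (n_channels, n_timepoints) array
```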
### Collection-Level Transformations
#### ROCKET (RandOm Convolutional KErnel Transform)
Fast feature extraction via random convolutional kernels:
```python
from aeon.transformations.collection.convolution_based import Rocket
rocket = Rocket(n_kernels=10000, n_jobs=-1)  # `num_kernels` in older aeon releases
X_transformed = rocket.fit_transform(X_train)
# Output shape: (n_cases, 2 * n_kernels)
```
**Variants:**
```python
from aeon.transformations.collection.convolution_based import (
    MiniRocket,
    MultiRocket,
    HydraTransformer
)
# MiniRocket: faster, streamlined variant
minirocket = MiniRocket(n_kernels=10000)
X_features = minirocket.fit_transform(X_train)
# MultiRocket: adds extra pooling operators and first-order differences
multirocket = MultiRocket(n_kernels=10000)
X_features = multirocket.fit_transform(X_train)
# Hydra: competing convolutional kernels (dictionary-style)
hydra = HydraTransformer(n_kernels=8, n_groups=64)
X_features = hydra.fit_transform(X_train)
```
#### Catch22
22 canonical time series features:
```python
from aeon.transformations.collection.feature_based import Catch22
catch22 = Catch22(n_jobs=-1)
X_features = catch22.fit_transform(X_train)
# Output shape: (n_cases, 22)
```
**Feature categories:**
- Distribution (mean, variance, skewness)
- Autocorrelation properties
- Entropy measures
- Nonlinear dynamics
- Spectral properties
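For multivariate collections, the 22 features are computed per channel, so the output width scales with `n_channels`; a short sketch:
```python
import numpy as np
from aeon.transformations.collection.feature_based import Catch22
X_multi = np.random.default_rng(1).standard_normal((10, 3, 100))  # 3 channels
X_features = Catch22().fit_transform(X_multi)
print(X_features.shape)  # (10, 66): 22 features per channel
```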
#### TSFresh
Comprehensive feature extraction (779 features):
```python
from aeon.transformations.collection.feature_based import TSFresh
tsfresh = TSFresh(
    default_fc_parameters="comprehensive",
    n_jobs=-1
)
X_features = tsfresh.fit_transform(X_train)
```
**Warning**: Slow on large datasets; use Catch22 or a lighter TSFresh preset (shown below) as a faster alternative
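The smaller TSFresh presets trade feature coverage for speed:
```python
from aeon.transformations.collection.feature_based import TSFresh
# "efficient" drops the most expensive features; "minimal" keeps only a handful
tsfresh_fast = TSFresh(default_fc_parameters="efficient", n_jobs=-1)
X_features = tsfresh_fast.fit_transform(X_train)
```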
#### FreshPRINCE
The Fresh Pipelines with RotatIoN forest ClassifiEr: TSFresh features fed into a Rotation Forest. It is a classifier rather than a transformer:
```python
from aeon.classification.feature_based import FreshPRINCEClassifier
freshprince = FreshPRINCEClassifier(n_estimators=200, n_jobs=-1)
freshprince.fit(X_train, y_train)
```
#### Shapelet Transform
Extract discriminative subsequences:
```python
from aeon.transformations.collection.shapelet_based import RandomShapeletTransform
shapelet = RandomShapeletTransform(
    n_shapelet_samples=10000,
    max_shapelets=20,
    n_jobs=-1
)
X_features = shapelet.fit_transform(X_train, y_train)
# Requires labels for supervised shapelet discovery
```
**Random Dilated Shapelet Transform (RDST)**:
```python
from aeon.transformations.collection.shapelet_based import RandomDilatedShapeletTransform
rdst = RandomDilatedShapeletTransform(max_shapelets=1000)
X_features = rdst.fit_transform(X_train, y_train)
```
#### SAST (Scalable and Accurate Subsequence Transform)
Subsequence transform that draws shapelet candidates from a few reference series per class:
```python
from aeon.transformations.collection.shapelet_based import SAST
sast = SAST()  # defaults draw candidates from one reference instance per class
X_features = sast.fit_transform(X_train, y_train)
```
#### Symbolic Representations
**SAX (Symbolic Aggregate approXimation)**:
```python
from aeon.transformations.collection.dictionary_based import SAX
sax = SAX(n_segments=8, alphabet_size=4)
X_symbolic = sax.fit_transform(X_train)
```
**PAA (Piecewise Aggregate Approximation)**:
```python
from aeon.transformations.collection.dictionary_based import PAA
paa = PAA(n_segments=10)
X_approximated = paa.fit_transform(X_train)
```
**SFA (Symbolic Fourier Approximation)**:
```python
from aeon.transformations.collection.dictionary_based import SFA
sfa = SFA(word_length=8, alphabet_size=4)
X_symbolic = sfa.fit_transform(X_train)
```
#### Channel Selection and Operations
**Channel Selection** (supervised, elbow-based):
```python
from aeon.transformations.collection.channel_selection import ElbowClassSum
selector = ElbowClassSum()
X_selected = selector.fit_transform(X_train, y_train)
```
**Channel Scoring**:
```python
from aeon.transformations.collection.channel_selection import ChannelScorer
scorer = ChannelScorer(proportion=0.4)  # keep the top 40% of channels by score
X_selected = scorer.fit_transform(X_train, y_train)
```
#### Data Balancing
**SMOTE (Synthetic Minority Over-sampling)**:
```python
from aeon.transformations.collection.imbalance import SMOTE  # oversamplers live in the imbalance module
smote = SMOTE(k_neighbors=5)
# fit_transform returns the rebalanced collection and labels
X_resampled, y_resampled = smote.fit_transform(X_train, y_train)
```
**ADASYN**:
```python
from aeon.transformations.collection.imbalance import ADASYN
adasyn = ADASYN(n_neighbors=5)
X_resampled, y_resampled = adasyn.fit_transform(X_train, y_train)
```
### Series-Level Transformations
#### Smoothing Filters
**Moving Average**:
```python
from aeon.transformations.series import MovingAverageSeriesTransformer
ma = MovingAverageSeriesTransformer(window_size=5)
X_smoothed = ma.fit_transform(X_series)
```
**Exponential Smoothing**:
```python
from aeon.transformations.series import ExpSmoothingSeriesTransformer
exp_smooth = ExpSmoothingSeriesTransformer(alpha=0.2)  # smoothing factor in (0, 1]
X_smoothed = exp_smooth.fit_transform(X_series)
```
**Savitzky-Golay Filter**:
```python
from aeon.transformations.series import SGSeriesTransformer
savgol = SGSeriesTransformer(window_length=11, polyorder=3)
X_smoothed = savgol.fit_transform(X_series)
```
**Gaussian Filter**:
```python
from aeon.transformations.series import GaussSeriesTransformer
gaussian = GaussSeriesTransformer(sigma=2.0)
X_smoothed = gaussian.fit_transform(X_series)
```
#### Statistical Transforms
**Box-Cox Transformation**:
```python
from aeon.transformations.series import BoxCoxTransformer
boxcox = BoxCoxTransformer()
X_transformed = boxcox.fit_transform(X_series)
```
**AutoCorrelation**:
```python
from aeon.transformations.series import AutoCorrelationSeriesTransformer
acf = AutoCorrelationSeriesTransformer(n_lags=40)
X_acf = acf.fit_transform(X_series)
```
**PCA (Principal Component Analysis)**:
```python
from aeon.transformations.series import PCASeriesTransformer
pca = PCASeriesTransformer(n_components=3)
X_reduced = pca.fit_transform(X_series)
```
#### Approximation Methods
**Discrete Fourier Transform (DFT)**:
```python
from aeon.transformations.series import DFTSeriesTransformer
dft = DFTSeriesTransformer()
X_freq = dft.fit_transform(X_series)
```
**Piecewise Linear Approximation (PLA)**:
```python
from aeon.transformations.series import PLASeriesTransformer
pla = PLASeriesTransformer()
X_approx = pla.fit_transform(X_series)
```
#### Anomaly Detection Transform
**DOBIN (Distance-based Outlier BasIs using Neighbors)**:
```python
from aeon.transformations.series import Dobin
dobin = Dobin()
X_transformed = dobin.fit_transform(X_series)
```
### Transformation Pipelines
Chain transformers together:
```python
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from aeon.transformations.collection.feature_based import Catch22
pipeline = Pipeline([
    ('features', Catch22()),  # 3D collection -> 2D feature table
    ('reduce', PCA(n_components=10))
])
X_transformed = pipeline.fit_transform(X_train)
```
## Distance Metrics
Specialized distance functions for time series similarity measurement.
### Distance Categories
#### Warping-Based Distances
**DTW (Dynamic Time Warping)**:
```python
from aeon.distances import dtw_distance, dtw_pairwise_distance
# Compute distance between two series
dist = dtw_distance(series1, series2, window=0.2)
# Pairwise distances for a collection
dist_matrix = dtw_pairwise_distance(X_collection)
# Get alignment path
from aeon.distances import dtw_alignment_path
path = dtw_alignment_path(series1, series2)
# Get cost matrix
from aeon.distances import dtw_cost_matrix
cost = dtw_cost_matrix(series1, series2)
```
**DTW Variants**:
```python
from aeon.distances import (
    wdtw_distance,       # Weighted DTW
    ddtw_distance,       # Derivative DTW
    wddtw_distance,      # Weighted Derivative DTW
    adtw_distance,       # Amerced DTW
    shape_dtw_distance   # Shape DTW
)
# Weighted DTW (penalize warping)
dist = wdtw_distance(series1, series2, g=0.05)
# Derivative DTW (compare shapes)
dist = ddtw_distance(series1, series2)
# Shape DTW (with shape descriptors)
dist = shape_dtw_distance(series1, series2)
```
**DTW Parameters**:
- `window`: Sakoe-Chiba band width as a fraction of series length (0.0-1.0)
- `g`: Weighting/penalty parameter (e.g., controls the warping penalty in WDTW/WDDTW)
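Tightening the band can only raise the optimal alignment cost, which makes the effect easy to check; a small sketch on synthetic data:
```python
import numpy as np
from aeon.distances import dtw_distance
rng = np.random.default_rng(0)
x = rng.standard_normal(100)
y = rng.standard_normal(100)
full = dtw_distance(x, y)                 # unconstrained warping (default)
banded = dtw_distance(x, y, window=0.05)  # paths must stay within a 5% band
assert banded >= full  # fewer admissible paths can never lower the cost
```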
#### Edit Distances
**ERP (Edit distance with Real Penalty)**:
```python
from aeon.distances import erp_distance
dist = erp_distance(series1, series2, g=0.0, window=None)
```
**EDR (Edit Distance on Real sequences)**:
```python
from aeon.distances import edr_distance
dist = edr_distance(series1, series2, epsilon=0.1, window=None)
```
**LCSS (Longest Common SubSequence)**:
```python
from aeon.distances import lcss_distance
dist = lcss_distance(series1, series2, epsilon=1.0, window=None)
```
**TWE (Time Warp Edit)**:
```python
from aeon.distances import twe_distance
dist = twe_distance(series1, series2, nu=0.001, lmbda=1.0)  # nu: stiffness, lmbda: edit penalty
```
#### Standard Metrics
```python
from aeon.distances import (
    euclidean_distance,
    manhattan_distance,
    minkowski_distance,
    squared_distance
)
# Euclidean distance
dist = euclidean_distance(series1, series2)
# Manhattan (L1) distance
dist = manhattan_distance(series1, series2)
# Minkowski distance
dist = minkowski_distance(series1, series2, p=3)
# Squared Euclidean
dist = squared_distance(series1, series2)
```
#### Specialized Distances
**MSM (Move-Split-Merge)**:
```python
from aeon.distances import msm_distance
dist = msm_distance(series1, series2, c=1.0)
```
**SBD (Shape-Based Distance)**:
```python
from aeon.distances import sbd_distance
dist = sbd_distance(series1, series2)
```
### Unified Distance Interface
```python
from aeon.distances import distance, pairwise_distance
# Compute any distance by name
dist = distance(series1, series2, method="dtw", window=0.1)  # older aeon releases used metric=
# Pairwise distance matrix
dist_matrix = pairwise_distance(X_collection, method="euclidean")
# Get available distance names
from aeon.distances import get_distance_function_names
available_distances = get_distance_function_names()
```
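The pairwise matrix also plugs into scikit-learn estimators that accept precomputed distances; a sketch using agglomerative clustering (the cluster count is illustrative):
```python
from sklearn.cluster import AgglomerativeClustering
from aeon.distances import pairwise_distance
D = pairwise_distance(X_collection, method="dtw")
# "precomputed" requires a non-ward linkage such as "average"
clusterer = AgglomerativeClustering(n_clusters=3, metric="precomputed", linkage="average")
labels = clusterer.fit_predict(D)
```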
### Distance Selection Guide
**Fast and accurate**:
- Euclidean for aligned series
- Squared for even faster computation
**Handle temporal shifts**:
- DTW for general warping
- WDTW to penalize excessive warping
**Shape-based similarity**:
- DDTW or Shape DTW
- SBD for normalized shape comparison
**Robust to noise**:
- ERP, EDR, or LCSS
**Multivariate**:
- DTW supports multivariate via independent/dependent alignment
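These distances plug directly into distance-based estimators; for example, a 1-NN classifier with a band-constrained DTW:
```python
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
knn = KNeighborsTimeSeriesClassifier(
    n_neighbors=1,
    distance="dtw",
    distance_params={"window": 0.1}
)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
```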
## Deep Learning Networks
Neural network architectures specialized for time series.
### Network Architectures
#### InceptionTime
Ensemble of Inception modules capturing multi-scale patterns:
```python
from aeon.networks import InceptionNetwork
from aeon.classification.deep_learning import InceptionTimeClassifier
# Use via classifier
clf = InceptionTimeClassifier(
    n_epochs=200,
    batch_size=64,
    n_classifiers=5  # ensemble size
)
# Or build the underlying network directly (returns Keras input/output layers)
network = InceptionNetwork()
input_layer, output_layer = network.build_network(input_shape=(100, 1))  # (n_timepoints, n_channels)
```
#### ResNet
Residual networks with skip connections:
```python
from aeon.networks import ResNetNetwork
from aeon.classification.deep_learning import ResNetClassifier
clf = ResNetClassifier(
    n_epochs=200,
    batch_size=64,
    n_residual_blocks=3
)
```
#### FCN (Fully Convolutional Network)
```python
from aeon.networks import FCNNetwork
from aeon.classification.deep_learning import FCNClassifier
clf = FCNClassifier(
    n_epochs=200,
    batch_size=64,
    n_layers=3
)
```
#### CNN
Standard convolutional architecture:
```python
from aeon.classification.deep_learning import TimeCNNClassifier  # CNNClassifier in older releases
clf = TimeCNNClassifier(
    n_epochs=100,
    batch_size=32,
    kernel_size=7,
    n_filters=32
)
```
#### TapNet
Attentional prototype networks:
```python
from aeon.classification.deep_learning import TapNetClassifier
clf = TapNetClassifier(
    n_epochs=200,
    batch_size=64
)
```
#### MLP (Multi-Layer Perceptron)
```python
from aeon.classification.deep_learning import MLPClassifier
clf = MLPClassifier(
    n_epochs=100,
    batch_size=32
)
```
#### LITE (Light Inception with boosTing tEchniques)
Lightweight ensemble network:
```python
from aeon.classification.deep_learning import LITETimeClassifier
clf = LITETimeClassifier(
    n_epochs=100,
    batch_size=64
)
```
### Training Configuration
```python
from aeon.classification.deep_learning import InceptionTimeClassifier
clf = InceptionTimeClassifier(
    n_epochs=200,
    batch_size=64,
    learning_rate=0.001,
    use_bias=True,
    verbose=1
)
clf.fit(X_train, y_train)
```
**Common parameters:**
- `n_epochs`: Training iterations
- `batch_size`: Samples per gradient update
- `learning_rate`: Optimizer learning rate
- `verbose`: Training output verbosity
- `callbacks`: Keras callbacks (early stopping, etc.)
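For example, early stopping can be wired in through standard Keras callbacks (a sketch assuming the TensorFlow backend; no validation split is made by default, so the training loss is monitored and the patience value is illustrative):
```python
import tensorflow as tf
from aeon.classification.deep_learning import FCNClassifier
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="loss", patience=20, restore_best_weights=True
)
clf = FCNClassifier(n_epochs=500, batch_size=64, callbacks=[early_stop])
clf.fit(X_train, y_train)
```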
## Datasets
Load built-in datasets and access UCR/UEA archives.
### Built-in Datasets
```python
from aeon.datasets import (
    load_arrow_head,
    load_airline,
    load_gunpoint,
    load_italy_power_demand,
    load_basic_motions,
    load_japanese_vowels
)
# Classification dataset
X_train, y_train = load_arrow_head(split="train")
X_test, y_test = load_arrow_head(split="test")
# Forecasting dataset (univariate series)
y = load_airline()
# Multivariate classification
X_train, y_train = load_basic_motions(split="train")
print(X_train.shape) # (n_cases, n_channels, n_timepoints)
```
### UCR/UEA Archives
Access 100+ benchmark datasets:
```python
from aeon.datasets import load_from_tsfile, load_classification
# Load UCR/UEA dataset by name
X_train, y_train = load_classification("GunPoint", split="train")
X_test, y_test = load_classification("GunPoint", split="test")
# Load from local .ts file
X, y = load_from_tsfile("data/my_dataset_TRAIN.ts")
```
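Datasets with unequal-length series come back as a list of 2D arrays rather than a single 3D array; for example, with the variable-length archive problem PLAID:
```python
X, y = load_classification("PLAID")
print(type(X))     # list
print(X[0].shape)  # (n_channels, n_timepoints_0); lengths vary per case
```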
### Dataset Information
```python
from aeon.datasets import load_classification
# Metadata is returned alongside the data when requested
X, y, meta = load_classification("GunPoint", return_metadata=True)
print(meta)
# e.g. {'problemname': 'GunPoint', 'univariate': True, 'equallength': True, ...}
```
### Custom Dataset Format
Save/load custom datasets in aeon format:
```python
from aeon.datasets import write_to_tsfile, load_from_tsfile
# Save
write_to_tsfile(
    X_train,
    "my_dataset_TRAIN.ts",
    y=y_train,
    problem_name="MyDataset"
)
# Load
X, y = load_from_tsfile("my_dataset_TRAIN.ts")
```
## Benchmarking
Tools for reproducible evaluation and comparison.
### Benchmarking Utilities
```python
from sklearn.metrics import accuracy_score
from aeon.classification.convolution_based import RocketClassifier
from aeon.datasets import load_classification
# Benchmark a classifier across several datasets with a simple loop
results = {}
for name in ["GunPoint", "ArrowHead", "ItalyPowerDemand"]:
    X_train, y_train = load_classification(name, split="train")
    X_test, y_test = load_classification(name, split="test")
    clf = RocketClassifier().fit(X_train, y_train)
    results[name] = accuracy_score(y_test, clf.predict(X_test))
```
### Loading Published Reference Results
Reference results for many published estimators can be fetched (from timeseriesclassification.com) and compared against your own:
```python
from aeon.benchmarking.results_loaders import (
    get_estimator_results,
    get_estimator_results_as_array
)
# Per-dataset accuracies as a nested dict
results = get_estimator_results(
    estimators=["ROCKET", "InceptionTime"],
    datasets=["GunPoint", "ArrowHead", "ItalyPowerDemand"]
)
# Or as a (n_datasets, n_estimators) score array, handy for plotting
scores, datasets = get_estimator_results_as_array(
    estimators=["ROCKET", "InceptionTime"],
    datasets=["GunPoint", "ArrowHead", "ItalyPowerDemand"]
)
```
### Critical Difference Diagrams
Visualize statistical significance of differences:
```python
from aeon.visualisation import plot_critical_difference
# `scores` is the (n_datasets, n_estimators) array from the results loaders
fig, ax = plot_critical_difference(
    scores,
    ["ROCKET", "InceptionTime"]
)
```
## Discovery and Tags
### Finding Estimators
```python
from aeon.utils.discovery import all_estimators
# Get all classifiers
classifiers = all_estimators(type_filter="classifier")
# Get all transformers
transformers = all_estimators(type_filter="transformer")
# Filter by capability tags
multivariate_classifiers = all_estimators(
    type_filter="classifier",
    filter_tags={"capability:multivariate": True}
)
```
### Checking Estimator Tags
```python
from aeon.utils.tags import all_tags_for_estimator
from aeon.classification.convolution_based import RocketClassifier
tags = all_tags_for_estimator(RocketClassifier)
print(tags)
# {'capability:multivariate': True, 'X_inner_type': ['numpy3D'], ...}
```
### Common Tags
- `capability:multivariate`: Handles multivariate series
- `capability:unequal_length`: Handles variable-length series
- `capability:missing_values`: Handles missing data
- `algorithm_type`: Algorithm family (e.g., "convolution", "distance")
- `python_dependencies`: Required packages
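To check a single capability on a specific estimator before fitting, the class-level tag accessor is a convenient shortcut (a sketch assuming the tag interface shown above):
```python
from aeon.classification.convolution_based import RocketClassifier
# True if the estimator handles multivariate collections
print(RocketClassifier.get_class_tag("capability:multivariate"))
```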