# Learning Tasks: Classification, Regression, Clustering, and Similarity Search

This reference provides comprehensive details on supervised and unsupervised learning tasks for time series collections.

## Time Series Classification

Time series classification (TSC) assigns labels to entire sequences. Aeon provides diverse algorithm families with unique strengths.

### Algorithm Categories
#### 1. Convolution-Based Classifiers

Transform time series using random convolutional kernels:

**ROCKET (RAndom Convolutional KErnel Transform)**
- Ultra-fast feature extraction via random kernels
- 10,000+ kernels generate discriminative features
- Linear classifier on extracted features

```python
from aeon.classification.convolution_based import RocketClassifier

clf = RocketClassifier(num_kernels=10000, n_jobs=-1)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
probabilities = clf.predict_proba(X_test)
```

**Variants:**
- `MiniRocketClassifier`: Faster, streamlined version
- `MultiRocketClassifier`: Multivariate extensions
- `Arsenal`: Ensemble of ROCKET transformers
- `Hydra`: Dictionary-based convolution variant
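The kernel-transform idea can be sketched in plain NumPy. The toy helper below (illustrative only, not aeon code) draws a few random zero-mean kernels, convolves each series with them, and pools two statistics per kernel: the proportion of positive values (PPV) and the maximum. This is the core of ROCKET's feature map; the real transform also randomizes dilation, padding, and bias scales.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_kernel_features(X, n_kernels=100):
    """Toy ROCKET-style transform: convolve each series with random
    zero-mean kernels and keep two pooled statistics per kernel
    (PPV and max)."""
    n_cases, n_timepoints = X.shape
    features = np.zeros((n_cases, 2 * n_kernels))
    for k in range(n_kernels):
        length = rng.choice([7, 9, 11])
        weights = rng.normal(size=length)
        weights -= weights.mean()          # zero-mean kernel
        bias = rng.uniform(-1, 1)
        for i in range(n_cases):
            conv = np.convolve(X[i], weights, mode="valid") + bias
            features[i, 2 * k] = (conv > 0).mean()  # PPV
            features[i, 2 * k + 1] = conv.max()     # max pooling
    return features

X = rng.normal(size=(8, 100))              # 8 toy univariate series
F = random_kernel_features(X, n_kernels=10)
print(F.shape)  # (8, 20)
```

A linear model (ridge or logistic regression) fitted on `F` then plays the role of the final classifier, which is why the transform stays fast even with thousands of kernels.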
#### 2. Deep Learning Classifiers

Neural networks specialized for time series:

**InceptionTime**
- Ensemble of Inception modules
- Captures patterns at multiple scales
- State-of-the-art on UCR benchmarks

```python
from aeon.classification.deep_learning import InceptionTimeClassifier

clf = InceptionTimeClassifier(n_epochs=200, batch_size=64)
clf.fit(X_train, y_train)
```

**Other architectures:**
- `ResNetClassifier`: Residual connections
- `FCNClassifier`: Fully Convolutional Networks
- `CNNClassifier`: Standard convolutional architecture
- `LITEClassifier`: Lightweight networks
- `MLPClassifier`: Multi-layer perceptrons
- `TapNetClassifier`: Attentional prototype networks
#### 3. Dictionary-Based Classifiers

Symbolic representations and bag-of-words approaches:

**BOSS (Bag of SFA Symbols)**
- Converts series to symbolic words
- Histogram-based classification
- Effective for shape patterns

```python
from aeon.classification.dictionary_based import BOSSEnsemble

clf = BOSSEnsemble(max_ensemble_size=500)
clf.fit(X_train, y_train)
```

**Other dictionary methods:**
- `TemporalDictionaryEnsemble` (TDE): Enhanced BOSS with temporal info
- `WEASEL`: Word ExtrAction for time SEries cLassification
- `MUSE`: MUltivariate Symbolic Extension
- `MrSEQL`: Multiple Representations SEQuence Learner
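The "series to symbolic words" step can be illustrated with a toy sketch (not aeon code). Each sliding window is z-normalized, pooled down to a few piecewise means, and each mean is mapped to a letter using Gaussian breakpoints, a SAX-style shortcut; BOSS proper derives its symbols from Fourier coefficients via SFA. The resulting word histogram is what a dictionary classifier compares between series.

```python
import numpy as np
from collections import Counter

# Gaussian breakpoints for a 4-letter alphabet (as in SAX)
BREAKPOINTS = np.array([-0.67, 0.0, 0.67])

def series_to_histogram(series, window=16, word_length=4):
    """Toy bag-of-words pipeline: z-normalize each sliding window,
    reduce it to `word_length` piecewise means, and map each mean
    to a letter by breakpoint binning."""
    words = []
    for start in range(len(series) - window + 1):
        w = series[start:start + window]
        w = (w - w.mean()) / (w.std() + 1e-8)               # z-normalize
        segments = w.reshape(word_length, -1).mean(axis=1)  # piecewise means
        letters = np.digitize(segments, BREAKPOINTS)        # symbols 0-3
        words.append("".join("abcd"[i] for i in letters))
    return Counter(words)  # the histogram a classifier would compare

rng = np.random.default_rng(1)
hist = series_to_histogram(rng.normal(size=128))
print(sum(hist.values()))  # 113 sliding windows -> 113 words
```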
#### 4. Distance-Based Classifiers

Leverage time series-specific distance metrics:

**K-Nearest Neighbors with DTW**
- Dynamic Time Warping handles temporal shifts
- Effective for shape-based similarity

```python
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier

clf = KNeighborsTimeSeriesClassifier(
    distance="dtw",
    n_neighbors=5
)
clf.fit(X_train, y_train)
```

**Other distance methods:**
- `ElasticEnsemble`: Ensemble of elastic distances
- `ProximityForest`: Tree-based with elastic measures
- `ProximityTree`: Single tree variant
- `ShapeDTW`: DTW with shape descriptors
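To make "handles temporal shifts" concrete, here is the classic O(n*m) dynamic-programming form of DTW in plain NumPy (a sketch; aeon's `aeon.distances.dtw_distance` provides an optimized version with warping-window constraints). For a phase-shifted signal, the warped cost falls well below the point-by-point squared Euclidean cost.

```python
import numpy as np

def dtw(a, b):
    """Unconstrained DTW with squared pointwise cost, computed by
    filling the cumulative-cost matrix D row by row."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # best of match, insertion, deletion
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

a = np.sin(np.linspace(0, 2 * np.pi, 50))
b = np.sin(np.linspace(0, 2 * np.pi, 50) + 0.5)  # phase-shifted copy
print(dtw(a, b) < np.sum((a - b) ** 2))  # True: warping absorbs the shift
```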
#### 5. Feature-Based Classifiers

Extract statistical and domain-specific features:

**Catch22**
- 22 curated time series features
- Name stands for CAnonical Time-series CHaracteristics
- Fast and interpretable

```python
from aeon.classification.feature_based import Catch22Classifier
from sklearn.ensemble import RandomForestClassifier

clf = Catch22Classifier(estimator=RandomForestClassifier())
clf.fit(X_train, y_train)
```

**Other feature methods:**
- `FreshPRINCEClassifier`: Fresh Pipelines with Random Interval and Catch22 Features
- `SignatureClassifier`: Path signature features
- `TSFreshClassifier`: Comprehensive feature extraction (slower, more features)
- `SummaryClassifier`: Simple summary statistics
#### 6. Interval-Based Classifiers

Analyze discriminative time intervals:

**Time Series Forest (TSF)**
- Random intervals + summary statistics
- Random forest on extracted features

```python
from aeon.classification.interval_based import TimeSeriesForestClassifier

clf = TimeSeriesForestClassifier(n_estimators=500)
clf.fit(X_train, y_train)
```

**Other interval methods:**
- `CanonicalIntervalForest` (CIF): TSF extended with catch22 interval features
- `DrCIF`: Diverse Representation CIF
- `RISE`: Random Interval Spectral Ensemble
- `RandomIntervalClassifier`: Basic random interval approach
- `STSF`: Supervised Time Series Forest
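The "random intervals + summary statistics" recipe can be sketched in a few lines of NumPy (a toy helper, not aeon code): for each random interval, keep the mean, standard deviation, and linear slope of every series restricted to that interval, then train an ordinary forest on the resulting table.

```python
import numpy as np

rng = np.random.default_rng(2)

def interval_features(X, n_intervals=5):
    """Toy TSF-style transform: summary statistics of each series
    over a few randomly chosen intervals."""
    n_cases, n_timepoints = X.shape
    feats = []
    for _ in range(n_intervals):
        start = rng.integers(0, n_timepoints - 3)
        end = rng.integers(start + 3, n_timepoints + 1)
        seg = X[:, start:end]
        t = np.arange(end - start)
        slope = np.polyfit(t, seg.T, 1)[0]  # per-series linear trend
        feats += [seg.mean(axis=1), seg.std(axis=1), slope]
    return np.column_stack(feats)

X = rng.normal(size=(10, 60))  # 10 toy series of length 60
F = interval_features(X)
print(F.shape)  # (10, 15): 5 intervals x 3 statistics
```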
#### 7. Shapelet-Based Classifiers

Discover discriminative subsequences:

**Shapelets**: Small subsequences that best distinguish classes

```python
from aeon.classification.shapelet_based import ShapeletTransformClassifier

clf = ShapeletTransformClassifier(
    n_shapelet_samples=10000,
    max_shapelets=20
)
clf.fit(X_train, y_train)
```

**Other shapelet methods:**
- `LearningShapeletClassifier`: Gradient-based shapelet learning
- `SASTClassifier`: Scalable and Accurate Subsequence Transform
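The primitive underneath all shapelet methods is the distance from a candidate subsequence to a series: the minimum z-normalized Euclidean distance over all sliding windows. A sketch in plain NumPy (illustrative, not aeon code); series containing a close match to a discriminative shapelet get a small value, which is the feature the classifier thresholds on.

```python
import numpy as np

def shapelet_distance(series, shapelet):
    """Minimum z-normalized Euclidean distance between `shapelet`
    and any window of `series` of the same length."""
    L = len(shapelet)
    s = (shapelet - shapelet.mean()) / (shapelet.std() + 1e-8)
    best = np.inf
    for start in range(len(series) - L + 1):
        w = series[start:start + L]
        w = (w - w.mean()) / (w.std() + 1e-8)
        best = min(best, np.sqrt(np.sum((w - s) ** 2)))
    return best

t = np.linspace(0, 1, 30)
bump = np.exp(-((t - 0.5) ** 2) / 0.01)  # a bump-shaped candidate shapelet
series_with_bump = np.concatenate([np.zeros(40), bump, np.zeros(30)])
print(shapelet_distance(series_with_bump, bump) < 1e-6)  # True: exact match found
```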
#### 8. Hybrid Ensembles

Combine multiple algorithm families:

**HIVE-COTE (Hierarchical Vote Collective of Transformation-based Ensembles)**
- State-of-the-art accuracy
- Combines shapelets, intervals, dictionaries, and spectral features
- V2 uses ROCKET and improved components

```python
from aeon.classification.hybrid import HIVECOTEV2

clf = HIVECOTEV2(n_jobs=-1)  # Slow but highly accurate
clf.fit(X_train, y_train)
```
### Algorithm Selection Guide

**Fast and accurate (default choice):**
- `RocketClassifier` or `MiniRocketClassifier`

**Maximum accuracy (slow):**
- `HIVECOTEV2` or `InceptionTimeClassifier`

**Interpretable:**
- `Catch22Classifier` or `ShapeletTransformClassifier`

**Multivariate focus:**
- `MultiRocketClassifier` or `MUSE`

**Small datasets:**
- `KNeighborsTimeSeriesClassifier` with DTW
### Classification Workflow

```python
from aeon.classification.convolution_based import RocketClassifier
from aeon.datasets import load_arrow_head
from sklearn.metrics import accuracy_score, classification_report

# Load data
X_train, y_train = load_arrow_head(split="train")
X_test, y_test = load_arrow_head(split="test")

# Train classifier
clf = RocketClassifier(n_jobs=-1)
clf.fit(X_train, y_train)

# Evaluate
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
print(classification_report(y_test, y_pred))
```
## Time Series Regression

Time series regression predicts continuous values from sequences. Most classification algorithms have regression equivalents.

### Regression Algorithms

Available regressors mirror the classification structure:
- `RocketRegressor`, `MiniRocketRegressor`, `MultiRocketRegressor`
- `InceptionTimeRegressor`, `ResNetRegressor`, `FCNRegressor`
- `KNeighborsTimeSeriesRegressor`
- `Catch22Regressor`, `FreshPRINCERegressor`
- `TimeSeriesForestRegressor`, `DrCIFRegressor`
### Regression Workflow

```python
from aeon.regression.convolution_based import RocketRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Train regressor (targets are continuous values, one per series)
reg = RocketRegressor(num_kernels=10000)
reg.fit(X_train, y_train_continuous)

# Predict and evaluate
y_pred = reg.predict(X_test)
mse = mean_squared_error(y_test_continuous, y_pred)
r2 = r2_score(y_test_continuous, y_pred)
print(f"MSE: {mse:.3f}, R²: {r2:.3f}")
```
## Time Series Clustering

Clustering groups similar time series without labels.

### Clustering Algorithms

**TimeSeriesKMeans**
- K-means with time series distances
- Supports DTW, Euclidean, and other metrics

```python
from aeon.clustering import TimeSeriesKMeans

clusterer = TimeSeriesKMeans(
    n_clusters=3,
    distance="dtw",
    n_init=10
)
clusterer.fit(X_collection)
labels = clusterer.labels_
```

**TimeSeriesKMedoids**
- Uses actual series as cluster centers
- More robust to outliers

```python
from aeon.clustering import TimeSeriesKMedoids

clusterer = TimeSeriesKMedoids(
    n_clusters=3,
    distance="euclidean"
)
clusterer.fit(X_collection)
```

**Other clustering methods:**
- `TimeSeriesKernelKMeans`: Kernel-based clustering
- `ElasticSOM`: Self-organizing maps with elastic distances
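To see why "actual series as cluster centers" matters, here is a minimal PAM-style k-medoids loop in plain NumPy (a toy sketch, not aeon's implementation), operating on a precomputed distance matrix with simple farthest-point initialization:

```python
import numpy as np

def k_medoids(D, k, n_iter=10):
    """Toy k-medoids on a precomputed distance matrix D. Medoids are
    indices of real data points, so any elastic distance matrix can
    be plugged in without needing an averaging procedure."""
    medoids = [0]
    for _ in range(k - 1):  # farthest-point initialization
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)  # assignment step
        for c in range(k):                          # medoid update step
            members = np.where(labels == c)[0]
            if len(members):
                within = D[np.ix_(members, members)].sum(axis=1)
                medoids[c] = members[np.argmin(within)]
    return medoids, labels

# Two well-separated groups of short "series", Euclidean distance matrix
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 0.1, (5, 20)), rng.normal(5, 0.1, (5, 20))])
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
medoids, labels = k_medoids(D, k=2)
print(labels[0] != labels[5])  # True: each group forms its own cluster
```

K-means, by contrast, must average series to form centroids, which is ill-defined under warping distances; that is why medoid-based methods pair naturally with elastic measures.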
### Clustering Workflow

```python
from aeon.clustering import TimeSeriesKMeans
from sklearn.metrics import adjusted_rand_score

# Cluster time series
clusterer = TimeSeriesKMeans(n_clusters=4, distance="dtw")
clusterer.fit(X_train)

# Assign cluster labels to new series
labels = clusterer.predict(X_test)

# Inspect cluster centers
centers = clusterer.cluster_centers_

# Evaluate clustering quality (if ground truth is available)
ari = adjusted_rand_score(y_true, labels)
```
## Similarity Search

Similarity search finds motifs, nearest neighbors, and repeated patterns.

### Key Concepts

- **Motifs**: Frequently repeated subsequences within a time series
- **Matrix Profile**: A data structure recording, for every subsequence, the distance to its nearest neighboring subsequence
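The matrix profile defined above can be computed brute-force in a few lines (a toy sketch, not aeon code): for each z-normalized subsequence, take the smallest distance to any other subsequence outside an exclusion zone that suppresses trivial self-matches. Production implementations (STOMP, SCRIMP) achieve the same result far faster with FFT-based tricks. Motifs sit at the minima of the profile, anomalies at the maxima.

```python
import numpy as np

def naive_matrix_profile(series, window):
    """Brute-force matrix profile with z-normalized Euclidean distance
    and a `window`-wide exclusion zone around each index."""
    n = len(series) - window + 1
    subs = np.array([series[i:i + window] for i in range(n)])
    subs = (subs - subs.mean(axis=1, keepdims=True)) / (
        subs.std(axis=1, keepdims=True) + 1e-8)
    profile = np.full(n, np.inf)
    for i in range(n):
        for j in range(n):
            if abs(i - j) >= window:  # skip trivial self-matches
                d = np.sqrt(np.sum((subs[i] - subs[j]) ** 2))
                profile[i] = min(profile[i], d)
    return profile

rng = np.random.default_rng(3)
x = rng.normal(size=200)
x[30:50] = x[130:150] = np.sin(np.linspace(0, 3, 20))  # planted motif pair
mp = naive_matrix_profile(x, window=20)
print(int(np.argmin(mp)) in (30, 130))  # True: the motif scores lowest
```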
### Similarity Search Methods

**Matrix Profile**
- Efficient motif discovery
- Change point detection
- Anomaly detection

```python
import numpy as np

from aeon.similarity_search import MatrixProfile

mp = MatrixProfile(window_size=50)
profile = mp.fit_transform(X_series)

# Find top motif (lowest profile value)
motif_idx = np.argmin(profile)
```

**Query Search**
- Find nearest neighbors to a query subsequence
- Useful for template matching

```python
from aeon.similarity_search import QuerySearch

searcher = QuerySearch(distance="euclidean")
distances, indices = searcher.search(X_series, query_subsequence)
```
### Similarity Search Workflow

```python
from aeon.similarity_search import MatrixProfile
import numpy as np

# Compute matrix profile
mp = MatrixProfile(window_size=100)
profile, profile_index = mp.fit_transform(X_series)

# Find top-k motifs (lowest profile values)
k = 3
motif_indices = np.argsort(profile)[:k]

# Find anomalies (highest profile values)
anomaly_indices = np.argsort(profile)[-k:]
```
## Ensemble and Composition Tools

### Voting Ensembles

```python
from aeon.classification.ensemble import WeightedEnsembleClassifier
from aeon.classification.convolution_based import RocketClassifier
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier

ensemble = WeightedEnsembleClassifier(
    estimators=[
        ('rocket', RocketClassifier()),
        ('knn', KNeighborsTimeSeriesClassifier())
    ]
)
ensemble.fit(X_train, y_train)
```
### Pipelines

```python
from sklearn.pipeline import Pipeline
from aeon.transformations.collection import Catch22
from sklearn.ensemble import RandomForestClassifier

pipeline = Pipeline([
    ('features', Catch22()),
    ('classifier', RandomForestClassifier())
])
pipeline.fit(X_train, y_train)
```
## Model Selection and Validation

### Cross-Validation

```python
from sklearn.model_selection import cross_val_score
from aeon.classification.convolution_based import RocketClassifier

clf = RocketClassifier()
scores = cross_val_score(clf, X_train, y_train, cv=5)
print(f"CV Accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

### Grid Search

```python
from sklearn.model_selection import GridSearchCV
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier

param_grid = {
    'n_neighbors': [1, 3, 5, 7],
    'distance': ['dtw', 'euclidean', 'erp']
}

clf = KNeighborsTimeSeriesClassifier()
grid_search = GridSearchCV(clf, param_grid, cv=5)
grid_search.fit(X_train, y_train)
print(f"Best params: {grid_search.best_params_}")
```
## Discovery Functions

Find available estimators programmatically:

```python
from aeon.utils.discovery import all_estimators

# Get all classifiers
classifiers = all_estimators(type_filter="classifier")

# Get all regressors
regressors = all_estimators(type_filter="regressor")

# Get all clusterers
clusterers = all_estimators(type_filter="clusterer")

# Filter by tag (e.g., multivariate capable)
mv_classifiers = all_estimators(
    type_filter="classifier",
    filter_tags={"capability:multivariate": True}
)
```