Add neuropixels-analysis skill for extracellular electrophysiology

Adds a comprehensive toolkit for analyzing Neuropixels high-density neural recordings, built on SpikeInterface and following Allen Institute and IBL best practices.

Features:
- Data loading from SpikeGLX, Open Ephys, and NWB formats
- Preprocessing pipelines (filtering, phase shift, CAR, bad channel detection)
- Motion/drift estimation and correction
- Spike sorting integration (Kilosort4, SpykingCircus2, Mountainsort5)
- Quality metrics computation (SNR, ISI violations, presence ratio)
- Automated curation using Allen/IBL criteria
- AI-assisted visual curation for uncertain units
- Export to Phy and NWB formats

Supports Neuropixels 1.0 and 2.0 probes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 16:58:56 +08:00 · 2025-12-17 11:06:28 -05:00
parent 4fb9c053f7
commit 312f18ae60
21 changed files with 5358 additions and 1 deletions
--- a/scientific-skills/neuropixels-analysis/AUTOMATED_CURATION.md
+++ b/scientific-skills/neuropixels-analysis/AUTOMATED_CURATION.md
@@ -0,0 +1,358 @@
+# Automated Curation Reference
+
+Guide to automated spike sorting curation using Bombcell, UnitRefine, and other tools.
+
+## Why Automated Curation?
+
+Manual curation is:
+- **Slow**: Hours per recording session
+- **Subjective**: Inter-rater variability
+- **Non-reproducible**: Hard to standardize
+
+Automated tools provide consistent, reproducible quality classification.
+
+## Available Tools
+
+| Tool | Function | Language | Integration |
+|------|---------------|----------|-------------|
+| **Bombcell** | 4-class (single/multi/noise/non-somatic) | Python/MATLAB | SpikeInterface, Phy |
+| **UnitRefine** | Machine learning-based | Python | SpikeInterface |
+| **SpikeInterface QM** | Threshold-based | Python | Native |
+| **UnitMatch** | Cross-session tracking | Python/MATLAB | Kilosort, Bombcell |
+
+## Bombcell
+
+### Overview
+
+Bombcell classifies units into 4 categories:
+1. **Single somatic units** - Well-isolated single neurons
+2. **Multi-unit activity (MUA)** - Mixed neuronal signals
+3. **Noise** - Non-neural artifacts
+4. **Non-somatic** - Axonal or dendritic signals
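+
+One way to picture a four-class scheme like this is as a cascade: a unit is first checked for a non-neural waveform, then for a non-somatic shape, and only then for isolation quality. The sketch below illustrates such a cascade in plain Python. The metric names (`n_peaks`, `spatial_decay_slope`, `pre_peak_to_trough_ratio`, `rpv_fraction`) and all threshold values are hypothetical placeholders, not Bombcell's parameter names or defaults.
+
+```python
+# Hypothetical thresholds for illustration only (not Bombcell defaults)
+THRESHOLDS = {
+    "max_n_peaks": 2,                     # extra peaks suggest a noisy waveform
+    "min_spatial_decay_slope": 0.1,       # too-flat decay across channels suggests noise
+    "max_pre_peak_to_trough_ratio": 3.0,  # dominant pre-trough peak suggests a non-somatic origin
+    "max_rpv_fraction": 0.1,              # refractory period violations suggest MUA
+    "min_presence_ratio": 0.7,
+}
+
+
+def classify_unit(m: dict) -> str:
+    """Return 'noise', 'non-somatic', 'mua', or 'good' for one unit's metrics."""
+    # 1. Noise: the waveform does not look neural
+    if (m["n_peaks"] > THRESHOLDS["max_n_peaks"]
+            or m["spatial_decay_slope"] < THRESHOLDS["min_spatial_decay_slope"]):
+        return "noise"
+    # 2. Non-somatic: a large peak before the trough dominates the shape
+    if m["pre_peak_to_trough_ratio"] > THRESHOLDS["max_pre_peak_to_trough_ratio"]:
+        return "non-somatic"
+    # 3. MUA: neural, but poorly isolated or only intermittently present
+    if (m["rpv_fraction"] > THRESHOLDS["max_rpv_fraction"]
+            or m["presence_ratio"] < THRESHOLDS["min_presence_ratio"]):
+        return "mua"
+    # 4. Everything that passes all checks is treated as a single somatic unit
+    return "good"
+```
+
+Ordering the checks this way means a noisy waveform is never labeled MUA just because it also has refractory violations.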
+
+### Installation
+
+```bash
+# Python
+pip install bombcell
+
+# Or development version
+git clone https://github.com/Julie-Fabre/bombcell.git
+cd bombcell/py_bombcell
+pip install -e .
+```
+
+### Basic Usage (Python)
+
+```python
+import bombcell as bc
+
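+# Kilosort output folder and the matching raw SpikeGLX binary (placeholders)
+kilosort_folder = '/path/to/kilosort/output'
+raw_data_path = '/path/to/recording.ap.bin'
+
+# Run Bombcell.
+# NOTE: this call signature is illustrative; check the Bombcell README for
+# the current Python entry point and its parameter names.
+results = bc.run_bombcell(
+    kilosort_folder,
+    raw_data_path,
+    sample_rate=30000,   # Neuropixels AP-band sampling rate (Hz)
+    n_channels=384       # recording channels; SpikeGLX .ap.bin files also store a sync channel (385 total)
+)
+
+# Get classifications
+unit_labels = results['unit_labels']
+# Expected classes: 'good' (single somatic unit), 'mua', 'noise', and non-somatic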
+```
+
+### Integration with SpikeInterface
+
+```python
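+import spikeinterface.full as si
+import bombcell as bc
+
+# After preprocessing, spike sort (recent SpikeInterface releases use `folder=`;
+# older releases used `output_folder=`)
+sorting = si.run_sorter('kilosort4', recording, folder='ks4/')
+
+# Create analyzer and compute required extensions.
+# Each extension's parents must be computed first ('random_spikes' before 'waveforms').
+analyzer = si.create_sorting_analyzer(sorting, recording, sparse=True)
+analyzer.compute('random_spikes')
+analyzer.compute('waveforms')
+analyzer.compute('templates')
+analyzer.compute('spike_amplitudes')
+
+# Export to Phy format (Bombcell reads Kilosort/Phy-style folders)
+si.export_to_phy(analyzer, output_folder='phy_export/')
+
+# Run Bombcell on the Phy export.
+# NOTE: `run_bombcell_phy` is an illustrative name; check the Bombcell docs
+# for the supported entry point.
+results = bc.run_bombcell_phy('phy_export/')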
+```
+
+### Bombcell Metrics
+
+Bombcell computes metrics along these lines for classification (the names below are descriptive; Bombcell's own output columns may be named differently):
+
+| Metric | Description | Used For |
+|--------|-------------|----------|
+| `peak_trough_ratio` | Waveform shape | Somatic vs non-somatic |
+| `spatial_decay` | Amplitude across channels | Noise detection |
+| `refractory_period_violations` | ISI violations | Single vs multi |
+| `presence_ratio` | Temporal stability | Unit quality |
+| `waveform_duration` | Peak-to-trough time | Cell type |
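+
+Two of these shape metrics are straightforward to compute from a unit's template on its peak channel. The sketch below is a minimal NumPy version, assuming a 1-D peak-channel template with a negative-going trough followed by a positive repolarization peak. `waveform_shape_metrics` is a hypothetical helper, not a Bombcell function, and Bombcell's own definitions may treat edge cases (for example positive-spiking units) differently.
+
+```python
+import numpy as np
+
+
+def waveform_shape_metrics(template: np.ndarray, sampling_frequency: float):
+    """Return (peak-to-trough duration in ms, |peak| / |trough| ratio).
+
+    Assumes a 1-D peak-channel template whose main deflection is a trough.
+    """
+    trough_idx = int(np.argmin(template))
+    # Repolarization peak: the maximum at or after the trough
+    peak_idx = trough_idx + int(np.argmax(template[trough_idx:]))
+    duration_ms = (peak_idx - trough_idx) / sampling_frequency * 1000.0
+    ratio = abs(template[peak_idx]) / abs(template[trough_idx])
+    return duration_ms, float(ratio)
+
+
+# Toy template: trough at sample 10, repolarization peak at sample 25
+template = np.zeros(60)
+template[10], template[25] = -100.0, 30.0
+print(waveform_shape_metrics(template, sampling_frequency=30000.0))
+```
+
+At 30 kHz, the 15-sample gap in the toy template corresponds to a 0.5 ms peak-to-trough duration, and the peak/trough ratio is 0.3.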
+
+### Custom Thresholds
+
+```python
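+# Customize classification thresholds (illustrative names and values; check
+# Bombcell's default parameter set for the real parameter names)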
+custom_params = {
+    'isi_threshold': 0.01,          # ISI violation threshold
+    'presence_threshold': 0.9,       # Minimum presence ratio
+    'amplitude_threshold': 20,       # Minimum amplitude (μV)
+    'spatial_decay_threshold': 40,   # Spatial decay (μm)
+}
+
+results = bc.run_bombcell(
+    kilosort_folder,
+    raw_data_path,
+    **custom_params
+)
+```
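+
+The `presence_threshold` above is compared against a presence ratio. A common definition, used here as an assumption, is the fraction of fixed-width time bins in which the unit fires at least one spike; SpikeInterface's `presence_ratio` metric works on this principle with 60 s bins by default. The helper below is a hypothetical NumPy sketch of that definition, not the library implementation.
+
+```python
+import numpy as np
+
+
+def presence_ratio(spike_times_s, total_duration_s, bin_duration_s=60.0):
+    """Fraction of fixed-width time bins containing at least one spike."""
+    n_bins = int(np.ceil(total_duration_s / bin_duration_s))
+    edges = np.arange(n_bins + 1) * bin_duration_s
+    counts, _ = np.histogram(spike_times_s, bins=edges)
+    return float(np.mean(counts > 0))
+
+
+# 10-minute recording (10 bins of 60 s) with spikes in only 3 bins
+spike_times = np.array([5.0, 65.0, 70.0, 125.0])
+print(presence_ratio(spike_times, total_duration_s=600.0))  # 0.3
+```
+
+A unit that drifts off the probe halfway through a session scores about 0.5 here even if its waveform is clean, which is why presence ratio is checked separately from SNR.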
+
+## SpikeInterface Auto-Curation
+
+### Threshold-Based Curation
+
+```python
+# Compute quality metrics (SNR also needs 'noise_levels' and 'templates')
+analyzer.compute('noise_levels')
+analyzer.compute('quality_metrics')
+qm = analyzer.get_extension('quality_metrics').get_data()
+
+# Define curation function
+def auto_curate(qm):
+    labels = {}
+    for unit_id in qm.index:
+        row = qm.loc[unit_id]
+
+        # Classification logic (NaN metrics fail every comparison and end up 'unsorted')
+        if row['snr'] < 2 or row['presence_ratio'] < 0.5:
+            labels[unit_id] = 'noise'
+        elif row['isi_violations_ratio'] > 0.1:
+            labels[unit_id] = 'mua'
+        elif (row['snr'] > 5 and
+              row['isi_violations_ratio'] < 0.01 and
+              row['presence_ratio'] > 0.9):
+            labels[unit_id] = 'good'
+        else:
+            labels[unit_id] = 'unsorted'
+
+    return labels
+
+unit_labels = auto_curate(qm)
+
+# Filter by label
+good_unit_ids = [u for u, l in unit_labels.items() if l == 'good']
+sorting_curated = sorting.select_units(good_unit_ids)
+```
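+
+The `isi_violations_ratio` column used above estimates contamination from refractory-period violations. Below is a minimal sketch of the Hill et al. (2011)-style formula that SpikeInterface documents for this metric, assuming a 1.5 ms refractory threshold and no censored period. The function is a hypothetical re-implementation for intuition; check the SpikeInterface source for exact edge-case handling.
+
+```python
+import numpy as np
+
+
+def isi_violations_ratio(spike_times_s, total_duration_s,
+                         isi_threshold_s=0.0015, min_isi_s=0.0):
+    """Refractory-period contamination estimate (Hill et al. 2011 style)."""
+    spike_times_s = np.sort(np.asarray(spike_times_s, dtype=float))
+    n_spikes = spike_times_s.size
+    if n_spikes < 2:
+        return float("nan")  # not enough spikes to form an ISI
+    n_violations = int(np.sum(np.diff(spike_times_s) < isi_threshold_s))
+    # Window in which a violating spike could fall, summed over all spikes
+    violation_time = 2 * n_spikes * (isi_threshold_s - min_isi_s)
+    total_rate = n_spikes / total_duration_s
+    violation_rate = n_violations / violation_time
+    return violation_rate / total_rate
+
+
+# A clean spike train has no sub-1.5 ms intervals
+clean = np.arange(0.0, 10.0, 0.1)
+print(isi_violations_ratio(clean, total_duration_s=10.0))  # 0.0
+```
+
+Because the spike count appears squared in the denominator, the same number of violations counts far more heavily against a sparsely firing unit.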
+
+### Using SpikeInterface Curation Module
+
+```python
+import numpy as np
+from spikeinterface.curation import CurationSorting
+
+# Wrap sorting for curation (keeps a history of edits)
+curation = CurationSorting(sorting)
+
+# Remove noise units
+noise_units = qm[qm['snr'] < 2].index.tolist()
+curation.remove_units(noise_units)
+
+# Candidate merges based on template similarity
+# (the matrix still covers all units, including the ones removed above)
+analyzer.compute('template_similarity')
+similarity = analyzer.get_extension('template_similarity').get_data()
+
+# Find highly similar pairs: the strict upper triangle skips the diagonal
+# (self-similarity) and mirrored duplicates; indices are positions in analyzer.unit_ids
+threshold = 0.9
+similar_pairs = np.argwhere(np.triu(similarity > threshold, k=1))
+# Merge pairs only after manual review, e.g. curation.merge([unit_a, unit_b])
+
+# Get curated sorting
+sorting_curated = curation.sorting
+```
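+
+Two details are easy to get wrong when turning a similarity matrix into merge candidates: the diagonal (each unit compared with itself) always exceeds the threshold, and matrix indices are positions rather than unit IDs. The hypothetical helper below handles both, assuming the matrix rows and columns follow the order of the unit IDs passed in (for an analyzer, `analyzer.unit_ids`).
+
+```python
+import numpy as np
+
+
+def find_similar_pairs(similarity, unit_ids, threshold=0.9):
+    """List (unit_a, unit_b, score) for each unordered pair above threshold.
+
+    Only the strict upper triangle is searched, so the diagonal and the
+    mirrored (b, a) duplicates are skipped.
+    """
+    similarity = np.asarray(similarity)
+    rows, cols = np.nonzero(np.triu(similarity > threshold, k=1))
+    return [(unit_ids[i], unit_ids[j], float(similarity[i, j]))
+            for i, j in zip(rows, cols)]
+
+
+similarity = np.array([[1.00, 0.95, 0.20],
+                       [0.95, 1.00, 0.50],
+                       [0.20, 0.50, 1.00]])
+print(find_similar_pairs(similarity, unit_ids=[3, 7, 9]))  # [(3, 7, 0.95)]
+```
+
+The returned list is only a shortlist for review; high template similarity alone does not prove two units are the same neuron.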
+
+## UnitMatch: Cross-Session Tracking
+
+Track the same neurons across recording days.
+
+### Installation
+
+```bash
+# Python port (distributed as UnitMatchPy; confirm the package name in the repo README)
+pip install UnitMatchPy
+# Or from source
+git clone https://github.com/EnnyvanBeest/UnitMatch.git
+
+### Usage
+
+```python
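+# NOTE: the class and method names in this example are illustrative; the
+# Python port (UnitMatchPy) may organize the API differently, so follow its
+# README and example notebooks for the actual calls.
+from unitmatch import UnitMatch
+
+# Kilosort output folders, one per session, after running Bombcell on each
+session_folders = [
+    '/path/to/session1/kilosort/',
+    '/path/to/session2/kilosort/',
+    '/path/to/session3/kilosort/',
+]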
+
+# Run UnitMatch
+um = UnitMatch(session_folders)
+um.run()
+
+# Get matching results
+matches = um.get_matches()
+# Returns DataFrame with unit IDs matched across sessions
+
+# Assign unique IDs
+unique_ids = um.get_unique_ids()
+```
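+
+Turning pairwise matches into one ID per neuron is a connected-components problem. The sketch below uses a union-find structure over `(session, unit)` pairs; it is a generic technique shown for intuition, not UnitMatch's own ID-assignment logic, and the input format is an assumption. Transitive chaining means a single false match can join two distinct neurons, so a dedicated tool may resolve such conflicts differently.
+
+```python
+def assign_unique_ids(matches):
+    """Group (session, unit) nodes connected by pairwise matches.
+
+    matches: iterable of ((session_a, unit_a), (session_b, unit_b)) tuples.
+    Returns {(session, unit): group_id} with group ids 0, 1, 2, ...
+    """
+    parent = {}
+
+    def find(x):
+        parent.setdefault(x, x)
+        while parent[x] != x:
+            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
+            x = parent[x]
+        return x
+
+    def union(a, b):
+        ra, rb = find(a), find(b)
+        if ra != rb:
+            parent[rb] = ra
+
+    for a, b in matches:
+        union(a, b)
+
+    group_ids, result = {}, {}
+    for node in sorted(parent):
+        root = find(node)
+        result[node] = group_ids.setdefault(root, len(group_ids))
+    return result
+
+
+matches = [((1, 5), (2, 3)), ((2, 3), (3, 8)), ((1, 6), (2, 4))]
+print(assign_unique_ids(matches))
+# {(1, 5): 0, (1, 6): 1, (2, 3): 0, (2, 4): 1, (3, 8): 0}
+```
+
+Units that were never matched do not appear in the output and need their own IDs.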
+
+### Integration with Workflow
+
+```python
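+# Typical workflow:
+# 1. Spike sort each session
+# 2. Run Bombcell for quality control
+# 3. Run UnitMatch for cross-session tracking
+import spikeinterface.full as si
+import bombcell as bc
+from unitmatch import UnitMatch  # illustrative import; see the UnitMatch docs
+
+# rec1/rec2 are preprocessed recordings; raw1_path/raw2_path point to the raw
+# .ap.bin files. When Kilosort runs through SpikeInterface, its native output
+# files land in <folder>/sorter_output/.
+
+# Session 1
+sorting1 = si.run_sorter('kilosort4', rec1, folder='session1/ks4/')
+labels1 = bc.run_bombcell('session1/ks4/sorter_output/', raw1_path)
+
+# Session 2
+sorting2 = si.run_sorter('kilosort4', rec2, folder='session2/ks4/')
+labels2 = bc.run_bombcell('session2/ks4/sorter_output/', raw2_path)
+
+# Track units across sessions (illustrative API)
+um = UnitMatch(['session1/ks4/sorter_output/', 'session2/ks4/sorter_output/'])
+um.run()
+matches = um.get_matches()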
+```
+
+## Semi-Automated Workflow
+
+Combine automated and manual curation:
+
+```python
+# Step 1: Automated classification
+analyzer.compute('quality_metrics')
+qm = analyzer.get_extension('quality_metrics').get_data()
+
+# Auto-label obvious cases
+auto_labels = {}
+for unit_id in qm.index:
+    row = qm.loc[unit_id]
+    if row['snr'] < 1.5:
+        auto_labels[unit_id] = 'noise'
+    elif row['snr'] > 8 and row['isi_violations_ratio'] < 0.005:
+        auto_labels[unit_id] = 'good'
+    else:
+        auto_labels[unit_id] = 'needs_review'
+
+# Step 2: Export uncertain units for manual review
+needs_review = [u for u, l in auto_labels.items() if l == 'needs_review']
+
+# Export only uncertain units to Phy
+sorting_review = sorting.select_units(needs_review)
+analyzer_review = si.create_sorting_analyzer(sorting_review, recording)
+analyzer_review.compute('random_spikes')
+analyzer_review.compute('waveforms')
+analyzer_review.compute('templates')
+si.export_to_phy(analyzer_review, output_folder='phy_review/')
+
+# Manual review in Phy: phy template-gui phy_review/params.py
+
+# Step 3: Load manual labels and merge
+phy_sorting = si.read_phy('phy_review/')
+manual_labels = dict(zip(phy_sorting.unit_ids, phy_sorting.get_property('quality')))
+# Phy cluster IDs may not match the original unit IDs; map them back before
+# combining auto + manual labels for the final result
+```
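+
+The final merge step can be as simple as letting manual labels override automated ones for the units that were reviewed. The hypothetical helper below assumes both inputs are dicts keyed by the same (original) unit IDs, so any Phy cluster IDs must be mapped back first; it also reports units still marked `'needs_review'`.
+
+```python
+def combine_labels(auto_labels, manual_labels):
+    """Overlay manual labels on automated ones; return (final, unresolved)."""
+    final = dict(auto_labels)
+    for unit_id, label in manual_labels.items():
+        if unit_id not in final:
+            # Usually a sign that Phy IDs were not mapped back to original IDs
+            raise KeyError(f"unit {unit_id!r} not found in auto_labels")
+        final[unit_id] = label
+    unresolved = [u for u, l in final.items() if l == "needs_review"]
+    return final, unresolved
+
+
+auto = {0: "noise", 1: "good", 2: "needs_review", 3: "needs_review"}
+manual = {2: "mua"}
+final, unresolved = combine_labels(auto, manual)
+print(final)       # {0: 'noise', 1: 'good', 2: 'mua', 3: 'needs_review'}
+print(unresolved)  # [3]
+```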
+
+## Comparison of Methods
+
+| Method | Pros | Cons |
+|--------|------|------|
+| **Manual (Phy)** | Gold standard, flexible | Slow, subjective |
+| **SpikeInterface QM** | Fast, reproducible | Simple thresholds only |
+| **Bombcell** | Multi-class, validated | Requires waveform extraction |
+| **UnitRefine** | ML-based, learns from data | Needs training data |
+
+## Best Practices
+
+1. **Always visualize** - Don't blindly trust automated results
+2. **Document thresholds** - Record exact parameters used
+3. **Validate** - Compare automated vs manual on subset
+4. **Be conservative** - When in doubt, exclude the unit
+5. **Report methods** - Include curation criteria in publications
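+
+For the validation step (point 3), raw percent agreement can look high simply because most units share one label; Cohen's kappa corrects for agreement expected by chance. The sketch below computes both from two label dicts. The function name and input format are assumptions for illustration.
+
+```python
+from collections import Counter
+
+
+def label_agreement(auto, manual):
+    """Percent agreement and Cohen's kappa over units labeled by both methods."""
+    common = sorted(set(auto) & set(manual))
+    n = len(common)
+    if n == 0:
+        raise ValueError("no units labeled by both methods")
+    a = [auto[u] for u in common]
+    m = [manual[u] for u in common]
+    observed = sum(x == y for x, y in zip(a, m)) / n
+    # Chance agreement from each method's marginal label frequencies
+    ca, cm = Counter(a), Counter(m)
+    expected = sum(ca[k] * cm[k] for k in set(ca) | set(cm)) / (n * n)
+    kappa = (observed - expected) / (1 - expected) if expected < 1 else 1.0
+    return observed, kappa
+
+
+auto = {1: "good", 2: "good", 3: "mua", 4: "noise"}
+manual = {1: "good", 2: "mua", 3: "mua", 4: "noise"}
+observed, kappa = label_agreement(auto, manual)
+print(f"agreement={observed:.2f}, kappa={kappa:.2f}")  # agreement=0.75, kappa=0.64
+```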
+
+## Pipeline Example
+
+```python
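+import spikeinterface.full as si
+
+
+def curate_sorting(sorting, recording, output_dir):
+    """Complete curation pipeline."""
+
+    # Create analyzer, saved to disk so computed extensions persist
+    analyzer = si.create_sorting_analyzer(sorting, recording, sparse=True,
+                                          format='binary_folder',
+                                          folder=f'{output_dir}/analyzer')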
+
+    # Compute required extensions
+    analyzer.compute('random_spikes', max_spikes_per_unit=500)
+    analyzer.compute('waveforms')
+    analyzer.compute('templates')
+    analyzer.compute('noise_levels')
+    analyzer.compute('spike_amplitudes')
+    analyzer.compute('quality_metrics')
+
+    qm = analyzer.get_extension('quality_metrics').get_data()
+
+    # Auto-classify
+    labels = {}
+    for unit_id in qm.index:
+        row = qm.loc[unit_id]
+
+        if row['snr'] < 2:
+            labels[unit_id] = 'noise'
+        elif row['isi_violations_ratio'] > 0.1 or row['presence_ratio'] < 0.8:
+            labels[unit_id] = 'mua'
+        elif (row['snr'] > 5 and
+              row['isi_violations_ratio'] < 0.01 and
+              row['presence_ratio'] > 0.9 and
+              row['amplitude_cutoff'] < 0.1):
+            labels[unit_id] = 'good'
+        else:
+            labels[unit_id] = 'unsorted'
+
+    # Summary
+    from collections import Counter
+    print("Classification summary:")
+    print(Counter(labels.values()))
+
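+    # Save labels (JSON object keys must be strings)
+    import json
+    import os
+    os.makedirs(output_dir, exist_ok=True)
+    with open(f'{output_dir}/unit_labels.json', 'w') as f:
+        json.dump({str(u): l for u, l in labels.items()}, f, indent=2)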
+
+    # Return good units
+    good_ids = [u for u, l in labels.items() if l == 'good']
+    return sorting.select_units(good_ids), labels
+
+# Usage
+sorting_curated, labels = curate_sorting(sorting, recording, 'output/')
+```
+
+## References
+
+- [Bombcell GitHub](https://github.com/Julie-Fabre/bombcell)
+- [UnitMatch GitHub](https://github.com/EnnyvanBeest/UnitMatch)
+- [SpikeInterface Curation](https://spikeinterface.readthedocs.io/en/stable/modules/curation.html)
+- Fabre et al. (2023) "Bombcell: automated curation and cell classification"
+- van Beest et al. (2024) "UnitMatch: tracking neurons across days with high-density probes"