# Automated Curation Reference
Guide to automated spike sorting curation using Bombcell, UnitRefine, and other tools.
## Why Automated Curation?
Manual curation is:
- **Slow**: Hours per recording session
- **Subjective**: Inter-rater variability
- **Non-reproducible**: Hard to standardize
Automated tools provide consistent, reproducible quality classification.
## Available Tools
| Tool | Classification | Language | Integration |
|---|---|---|---|
| Bombcell | 4-class (single/multi/noise/non-somatic) | Python/MATLAB | SpikeInterface, Phy |
| UnitRefine | Machine learning-based | Python | SpikeInterface |
| SpikeInterface QM | Threshold-based | Python | Native |
| UnitMatch | Cross-session tracking | Python/MATLAB | Kilosort, Bombcell |
## Bombcell

### Overview

Bombcell classifies units into four categories:

- **Single somatic units** - Well-isolated single neurons
- **Multi-unit activity (MUA)** - Mixed neuronal signals
- **Noise** - Non-neural artifacts
- **Non-somatic** - Axonal or dendritic signals
### Installation

```bash
# Python package
pip install bombcell

# Or the development version
git clone https://github.com/Julie-Fabre/bombcell.git
cd bombcell/py_bombcell
pip install -e .
```
### Basic Usage (Python)

```python
import bombcell as bc

# Sorted data (Kilosort output) and the raw recording
kilosort_folder = '/path/to/kilosort/output'
raw_data_path = '/path/to/recording.ap.bin'

# Run Bombcell
results = bc.run_bombcell(
    kilosort_folder,
    raw_data_path,
    sample_rate=30000,
    n_channels=384,
)

# Get classifications
unit_labels = results['unit_labels']
# 'good' = single unit, 'mua' = multi-unit, 'noise' = noise
```
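A common next step is to keep only the well-isolated units. A minimal sketch, assuming `unit_labels` is a dict mapping unit IDs to the label strings above (the exact return structure may differ between Bombcell versions):

```python
from collections import Counter

# Summarize how many units fell into each class
print(Counter(unit_labels.values()))

# Keep only the well-isolated single units for downstream analysis
good_unit_ids = [u for u, label in unit_labels.items() if label == 'good']
```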
### Integration with SpikeInterface

```python
import spikeinterface.full as si

# After spike sorting
sorting = si.run_sorter('kilosort4', recording, output_folder='ks4/')

# Create analyzer and compute required extensions
analyzer = si.create_sorting_analyzer(sorting, recording, sparse=True)
analyzer.compute('random_spikes')  # waveforms require sampled spikes
analyzer.compute('waveforms')
analyzer.compute('templates')
analyzer.compute('spike_amplitudes')

# Export to Phy format (Bombcell can read this)
si.export_to_phy(analyzer, output_folder='phy_export/')

# Run Bombcell on the Phy export
import bombcell as bc
results = bc.run_bombcell_phy('phy_export/')
```
### Bombcell Metrics

Bombcell computes specific metrics for classification:

| Metric | Description | Used For |
|---|---|---|
| `peak_trough_ratio` | Waveform shape | Somatic vs non-somatic |
| `spatial_decay` | Amplitude across channels | Noise detection |
| `refractory_period_violations` | ISI violations | Single vs multi-unit |
| `presence_ratio` | Temporal stability | Unit quality |
| `waveform_duration` | Peak-to-trough time | Cell type |
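To make these concrete, here is a minimal, illustrative sketch of two of the simpler metrics, presence ratio and refractory-period violations, computed from a unit's spike times; Bombcell's own implementations include additional corrections:

```python
import numpy as np

def presence_ratio(spike_times_s, t_start_s, t_stop_s, n_bins=100):
    """Fraction of time bins containing at least one spike."""
    edges = np.linspace(t_start_s, t_stop_s, n_bins + 1)
    counts, _ = np.histogram(spike_times_s, bins=edges)
    return np.mean(counts > 0)

def isi_violation_rate(spike_times_s, refractory_ms=2.0):
    """Fraction of inter-spike intervals shorter than the refractory period."""
    isis = np.diff(np.sort(spike_times_s))
    return np.mean(isis < refractory_ms / 1000.0)
```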
### Custom Thresholds

```python
# Customize classification thresholds
custom_params = {
    'isi_threshold': 0.01,           # ISI violation threshold
    'presence_threshold': 0.9,       # Minimum presence ratio
    'amplitude_threshold': 20,       # Minimum amplitude (μV)
    'spatial_decay_threshold': 40,   # Spatial decay (μm)
}

results = bc.run_bombcell(
    kilosort_folder,
    raw_data_path,
    **custom_params,
)
```
## SpikeInterface Auto-Curation

### Threshold-Based Curation

```python
# Compute quality metrics
analyzer.compute('quality_metrics')
qm = analyzer.get_extension('quality_metrics').get_data()

# Define curation function
def auto_curate(qm):
    labels = {}
    for unit_id in qm.index:
        row = qm.loc[unit_id]
        # Classification logic
        if row['snr'] < 2 or row['presence_ratio'] < 0.5:
            labels[unit_id] = 'noise'
        elif row['isi_violations_ratio'] > 0.1:
            labels[unit_id] = 'mua'
        elif (row['snr'] > 5 and
              row['isi_violations_ratio'] < 0.01 and
              row['presence_ratio'] > 0.9):
            labels[unit_id] = 'good'
        else:
            labels[unit_id] = 'unsorted'
    return labels

unit_labels = auto_curate(qm)

# Filter by label
good_unit_ids = [u for u, l in unit_labels.items() if l == 'good']
sorting_curated = sorting.select_units(good_unit_ids)
```
### Using the SpikeInterface Curation Module

```python
import numpy as np
from spikeinterface.curation import (
    CurationSorting,
    MergeUnitsSorting,
    SplitUnitSorting,
)

# Wrap sorting for curation
curation = CurationSorting(sorting)

# Remove noise units
noise_units = qm[qm['snr'] < 2].index.tolist()
curation.remove_units(noise_units)

# Merge similar units (based on template similarity)
analyzer.compute('template_similarity')
similarity = analyzer.get_extension('template_similarity').get_data()

# Find highly similar pairs; the upper triangle excludes self-similarity
# on the diagonal and duplicate (i, j)/(j, i) pairs
threshold = 0.9
similar_pairs = np.argwhere(np.triu(similarity, k=1) > threshold)
# Merge pairs only after manual review

# Get the curated sorting
sorting_curated = curation.sorting
```
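If a candidate pair survives review, `CurationSorting` can apply the merge. A sketch under the assumption that the indices in `similar_pairs` refer to positions in `analyzer.unit_ids`:

```python
# Hypothetical example: merge the first reviewed pair
unit_ids = analyzer.unit_ids
i, j = similar_pairs[0]
curation.merge([unit_ids[i], unit_ids[j]])
```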
## UnitMatch: Cross-Session Tracking

Track the same neurons across recording days.

### Installation

```bash
pip install unitmatch

# Or from source
git clone https://github.com/EnnyvanBeest/UnitMatch.git
```
### Usage

```python
from unitmatch import UnitMatch

# After running Bombcell on multiple sessions
session_folders = [
    '/path/to/session1/kilosort/',
    '/path/to/session2/kilosort/',
    '/path/to/session3/kilosort/',
]

# Run UnitMatch
um = UnitMatch(session_folders)
um.run()

# Get matching results
matches = um.get_matches()
# Returns a DataFrame with unit IDs matched across sessions

# Assign unique IDs to tracked units
unique_ids = um.get_unique_ids()
```
### Integration with Workflow

```python
# Typical workflow:
#   1. Spike sort each session
#   2. Run Bombcell for quality control
#   3. Run UnitMatch for cross-session tracking

# Session 1
sorting1 = si.run_sorter('kilosort4', rec1, output_folder='session1/ks4/')
labels1 = bc.run_bombcell('session1/ks4/', raw1_path)

# Session 2
sorting2 = si.run_sorter('kilosort4', rec2, output_folder='session2/ks4/')
labels2 = bc.run_bombcell('session2/ks4/', raw2_path)

# Track units across sessions
um = UnitMatch(['session1/ks4/', 'session2/ks4/'])
um.run()
matches = um.get_matches()
```
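A natural follow-up is to restrict tracking to units that passed curation in every session. A sketch assuming `matches` is a DataFrame with one unit-ID column per session (the column names `session1` and `session2` are hypothetical) and that `labels1`/`labels2` map unit IDs to Bombcell labels:

```python
# Keep only matches where both units were classified as single units
good_matches = matches[
    matches['session1'].map(labels1).eq('good')
    & matches['session2'].map(labels2).eq('good')
]
```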
## Semi-Automated Workflow

Combine automated and manual curation:

```python
# Step 1: Automated classification
analyzer.compute('quality_metrics')
qm = analyzer.get_extension('quality_metrics').get_data()

# Auto-label the obvious cases
auto_labels = {}
for unit_id in qm.index:
    row = qm.loc[unit_id]
    if row['snr'] < 1.5:
        auto_labels[unit_id] = 'noise'
    elif row['snr'] > 8 and row['isi_violations_ratio'] < 0.005:
        auto_labels[unit_id] = 'good'
    else:
        auto_labels[unit_id] = 'needs_review'

# Step 2: Export uncertain units for manual review
needs_review = [u for u, l in auto_labels.items() if l == 'needs_review']

# Export only the uncertain units to Phy
sorting_review = sorting.select_units(needs_review)
analyzer_review = si.create_sorting_analyzer(sorting_review, recording)
analyzer_review.compute('random_spikes')  # waveforms require sampled spikes
analyzer_review.compute('waveforms')
analyzer_review.compute('templates')
si.export_to_phy(analyzer_review, output_folder='phy_review/')
# Manual review in Phy: phy template-gui phy_review/params.py

# Step 3: Load manual labels and merge (see the sketch below)
manual_labels = si.read_phy('phy_review/').get_property('quality')
```
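The final merge step, with manual labels overriding the automatic ones, can look like this minimal sketch (note that `export_to_phy` may renumber units, so verify the unit-ID mapping for your export):

```python
# Manual labels take precedence for the reviewed units
sorting_manual = si.read_phy('phy_review/')
manual_quality = sorting_manual.get_property('quality')

final_labels = dict(auto_labels)
for unit_id, label in zip(sorting_manual.unit_ids, manual_quality):
    final_labels[unit_id] = label
```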
## Comparison of Methods
| Method | Pros | Cons |
|---|---|---|
| Manual (Phy) | Gold standard, flexible | Slow, subjective |
| SpikeInterface QM | Fast, reproducible | Simple thresholds only |
| Bombcell | Multi-class, validated | Requires waveform extraction |
| UnitRefine | ML-based, learns from data | Needs training data |
## Best Practices

- **Always visualize** - Don't blindly trust automated results
- **Document thresholds** - Record the exact parameters used
- **Validate** - Compare automated vs manual labels on a subset (see the sketch after this list)
- **Be conservative** - When in doubt, exclude the unit
- **Report methods** - Include curation criteria in publications
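One simple way to run that validation is to manually label a subset of units and measure agreement with the automated labels. A minimal sketch, assuming both are dicts keyed by unit ID (`manual_subset_labels` is a hypothetical name for your hand-labeled subset):

```python
# Fraction of manually reviewed units where the auto label agrees
shared = set(auto_labels) & set(manual_subset_labels)
agreement = sum(auto_labels[u] == manual_subset_labels[u] for u in shared) / len(shared)
print(f'Auto/manual agreement: {agreement:.1%} on {len(shared)} units')
```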
## Pipeline Example

```python
import json
from collections import Counter

import spikeinterface.full as si

def curate_sorting(sorting, recording, output_dir):
    """Complete curation pipeline."""
    # Create analyzer (saved to disk so extensions persist)
    analyzer = si.create_sorting_analyzer(sorting, recording, sparse=True,
                                          format='binary_folder',
                                          folder=f'{output_dir}/analyzer')

    # Compute required extensions
    analyzer.compute('random_spikes', max_spikes_per_unit=500)
    analyzer.compute('waveforms')
    analyzer.compute('templates')
    analyzer.compute('noise_levels')
    analyzer.compute('spike_amplitudes')
    analyzer.compute('quality_metrics')
    qm = analyzer.get_extension('quality_metrics').get_data()

    # Auto-classify
    labels = {}
    for unit_id in qm.index:
        row = qm.loc[unit_id]
        if row['snr'] < 2:
            labels[unit_id] = 'noise'
        elif row['isi_violations_ratio'] > 0.1 or row['presence_ratio'] < 0.8:
            labels[unit_id] = 'mua'
        elif (row['snr'] > 5 and
              row['isi_violations_ratio'] < 0.01 and
              row['presence_ratio'] > 0.9 and
              row['amplitude_cutoff'] < 0.1):
            labels[unit_id] = 'good'
        else:
            labels[unit_id] = 'unsorted'

    # Summary
    print('Classification summary:')
    print(Counter(labels.values()))

    # Save labels (cast keys to str so numpy unit IDs serialize cleanly)
    with open(f'{output_dir}/unit_labels.json', 'w') as f:
        json.dump({str(u): l for u, l in labels.items()}, f)

    # Return good units
    good_ids = [u for u, l in labels.items() if l == 'good']
    return sorting.select_units(good_ids), labels

# Usage
sorting_curated, labels = curate_sorting(sorting, recording, 'output/')
```
## References

- [Bombcell GitHub](https://github.com/Julie-Fabre/bombcell)
- [UnitMatch GitHub](https://github.com/EnnyvanBeest/UnitMatch)
- [SpikeInterface curation module](https://spikeinterface.readthedocs.io/en/latest/modules/curation.html)
- Fabre et al. (2023), "Bombcell: automated curation and cell classification"
- van Beest et al. (2024), "UnitMatch: tracking neurons across days with high-density probes"