mirror of
https://github.com/K-Dense-AI/claude-scientific-skills.git
synced 2026-03-28 07:33:45 +08:00
187 lines
5.2 KiB
Markdown
187 lines
5.2 KiB
Markdown
# Common Use Cases for IDC
|
|
|
|
**Tested with:** idc-index 0.11.9 (IDC data version v23)
|
|
|
|
This guide provides complete end-to-end workflow examples for common IDC use cases. Each use case demonstrates the full workflow from query to download with best practices.
|
|
|
|
## When to Use This Guide
|
|
|
|
Load this guide when you need:
|
|
- Complete end-to-end workflow examples for training dataset creation
|
|
- Patterns for multi-step data selection and download workflows
|
|
- Examples of license-aware data handling for commercial use
|
|
- Visualization workflows for data preview before download
|
|
|
|
For core API patterns (query, download, visualize, citations), see the "Core Capabilities" section in the main SKILL.md.
|
|
|
|
## Prerequisites
|
|
|
|
```bash
|
|
pip install --upgrade idc-index
|
|
```
|
|
|
|
## Use Case 1: Find and Download Lung CT Scans for Deep Learning
|
|
|
|
**Objective:** Build training dataset of lung CT scans from NLST collection
|
|
|
|
**Steps:**
|
|
```python
|
|
from idc_index import IDCClient
|
|
|
|
client = IDCClient()
|
|
|
|
# 1. Query for lung CT scans with specific criteria
|
|
query = """
|
|
SELECT
|
|
PatientID,
|
|
SeriesInstanceUID,
|
|
SeriesDescription
|
|
FROM index
|
|
WHERE collection_id = 'nlst'
|
|
AND Modality = 'CT'
|
|
AND BodyPartExamined = 'CHEST'
|
|
AND license_short_name = 'CC BY 4.0'
|
|
ORDER BY PatientID
|
|
LIMIT 100
|
|
"""
|
|
|
|
results = client.sql_query(query)
|
|
print(f"Found {len(results)} series from {results['PatientID'].nunique()} patients")
|
|
|
|
# 2. Download data organized by patient
|
|
client.download_from_selection(
|
|
seriesInstanceUID=list(results['SeriesInstanceUID'].values),
|
|
downloadDir="./training_data",
|
|
dirTemplate="%collection_id/%PatientID/%SeriesInstanceUID"
|
|
)
|
|
|
|
# 3. Save manifest for reproducibility
|
|
results.to_csv('training_manifest.csv', index=False)
|
|
```
|
|
|
|
## Use Case 2: Query Brain MRI by Manufacturer for Quality Study
|
|
|
|
**Objective:** Compare image quality across different MRI scanner manufacturers
|
|
|
|
**Steps:**
|
|
```python
|
|
from idc_index import IDCClient
|
|
import pandas as pd
|
|
|
|
client = IDCClient()
|
|
|
|
# Query for brain MRI grouped by manufacturer
|
|
query = """
|
|
SELECT
|
|
Manufacturer,
|
|
ManufacturerModelName,
|
|
COUNT(DISTINCT SeriesInstanceUID) as num_series,
|
|
COUNT(DISTINCT PatientID) as num_patients
|
|
FROM index
|
|
WHERE Modality = 'MR'
|
|
AND BodyPartExamined LIKE '%BRAIN%'
|
|
GROUP BY Manufacturer, ManufacturerModelName
|
|
HAVING num_series >= 10
|
|
ORDER BY num_series DESC
|
|
"""
|
|
|
|
manufacturers = client.sql_query(query)
|
|
print(manufacturers)
|
|
|
|
# Download sample from each manufacturer for comparison
|
|
for _, row in manufacturers.head(3).iterrows():
|
|
mfr = row['Manufacturer']
|
|
model = row['ManufacturerModelName']
|
|
|
|
query = f"""
|
|
SELECT SeriesInstanceUID
|
|
FROM index
|
|
WHERE Manufacturer = '{mfr}'
|
|
AND ManufacturerModelName = '{model}'
|
|
AND Modality = 'MR'
|
|
AND BodyPartExamined LIKE '%BRAIN%'
|
|
LIMIT 5
|
|
"""
|
|
|
|
series = client.sql_query(query)
|
|
client.download_from_selection(
|
|
seriesInstanceUID=list(series['SeriesInstanceUID'].values),
|
|
downloadDir=f"./quality_study/{mfr.replace(' ', '_')}"
|
|
)
|
|
```
|
|
|
|
## Use Case 3: Visualize Series Without Downloading
|
|
|
|
**Objective:** Preview imaging data before committing to download
|
|
|
|
```python
|
|
from idc_index import IDCClient
|
|
import webbrowser
|
|
|
|
client = IDCClient()
|
|
|
|
series_list = client.sql_query("""
|
|
SELECT SeriesInstanceUID, PatientID, SeriesDescription
|
|
FROM index
|
|
WHERE collection_id = 'acrin_nsclc_fdg_pet' AND Modality = 'PT'
|
|
LIMIT 10
|
|
""")
|
|
|
|
# Preview each in browser
|
|
for _, row in series_list.iterrows():
|
|
viewer_url = client.get_viewer_URL(seriesInstanceUID=row['SeriesInstanceUID'])
|
|
print(f"Patient {row['PatientID']}: {row['SeriesDescription']}")
|
|
print(f" View at: {viewer_url}")
|
|
# webbrowser.open(viewer_url) # Uncomment to open automatically
|
|
```
|
|
|
|
For additional visualization options, see the [IDC Portal getting started guide](https://learn.canceridc.dev/portal/getting-started) or [SlicerIDCBrowser](https://github.com/ImagingDataCommons/SlicerIDCBrowser) for 3D Slicer integration.
|
|
|
|
## Use Case 4: License-Aware Batch Download for Commercial Use
|
|
|
|
**Objective:** Download only CC-BY licensed data suitable for commercial applications
|
|
|
|
**Steps:**
|
|
```python
|
|
from idc_index import IDCClient
|
|
|
|
client = IDCClient()
|
|
|
|
# Query ONLY for CC BY licensed data (allows commercial use with attribution)
|
|
query = """
|
|
SELECT
|
|
SeriesInstanceUID,
|
|
collection_id,
|
|
PatientID,
|
|
Modality
|
|
FROM index
|
|
WHERE license_short_name LIKE 'CC BY%'
|
|
AND license_short_name NOT LIKE '%NC%'
|
|
AND Modality IN ('CT', 'MR')
|
|
AND BodyPartExamined IN ('CHEST', 'BRAIN', 'ABDOMEN')
|
|
LIMIT 200
|
|
"""
|
|
|
|
cc_by_data = client.sql_query(query)
|
|
|
|
print(f"Found {len(cc_by_data)} CC BY licensed series")
|
|
print(f"Collections: {cc_by_data['collection_id'].unique()}")
|
|
|
|
# Download with license verification
|
|
client.download_from_selection(
|
|
seriesInstanceUID=list(cc_by_data['SeriesInstanceUID'].values),
|
|
downloadDir="./commercial_dataset",
|
|
dirTemplate="%collection_id/%Modality/%PatientID/%SeriesInstanceUID"
|
|
)
|
|
|
|
# Save license information
|
|
cc_by_data.to_csv('commercial_dataset_manifest_CC-BY_ONLY.csv', index=False)
|
|
```
|
|
|
|
## Resources
|
|
|
|
- Main SKILL.md for core API patterns (query, download, visualize)
|
|
- `references/clinical_data_guide.md` for clinical data integration workflows
|
|
- `references/sql_patterns.md` for additional SQL query patterns
|
|
- `references/index_tables_guide.md` for complex join patterns
|