Files
claude-scientific-skills/scientific-skills/imaging-data-commons/references/use_cases.md
2026-02-10 18:12:49 -05:00

187 lines
5.2 KiB
Markdown

# Common Use Cases for IDC
**Tested with:** idc-index 0.11.9 (IDC data version v23)
This guide provides complete end-to-end workflow examples for common IDC use cases. Each use case demonstrates the full workflow from query to download with best practices.
## When to Use This Guide
Load this guide when you need:
- Complete end-to-end workflow examples for training dataset creation
- Patterns for multi-step data selection and download workflows
- Examples of license-aware data handling for commercial use
- Visualization workflows for data preview before download
For core API patterns (query, download, visualize, citations), see the "Core Capabilities" section in the main SKILL.md.
## Prerequisites
```bash
pip install --upgrade idc-index
```
## Use Case 1: Find and Download Lung CT Scans for Deep Learning
**Objective:** Build training dataset of lung CT scans from NLST collection
**Steps:**
```python
from idc_index import IDCClient
client = IDCClient()
# 1. Query for lung CT scans with specific criteria
query = """
SELECT
PatientID,
SeriesInstanceUID,
SeriesDescription
FROM index
WHERE collection_id = 'nlst'
AND Modality = 'CT'
AND BodyPartExamined = 'CHEST'
AND license_short_name = 'CC BY 4.0'
ORDER BY PatientID
LIMIT 100
"""
results = client.sql_query(query)
print(f"Found {len(results)} series from {results['PatientID'].nunique()} patients")
# 2. Download data organized by patient
client.download_from_selection(
seriesInstanceUID=list(results['SeriesInstanceUID'].values),
downloadDir="./training_data",
dirTemplate="%collection_id/%PatientID/%SeriesInstanceUID"
)
# 3. Save manifest for reproducibility
results.to_csv('training_manifest.csv', index=False)
```
## Use Case 2: Query Brain MRI by Manufacturer for Quality Study
**Objective:** Compare image quality across different MRI scanner manufacturers
**Steps:**
```python
from idc_index import IDCClient
import pandas as pd
client = IDCClient()
# Query for brain MRI grouped by manufacturer
query = """
SELECT
Manufacturer,
ManufacturerModelName,
COUNT(DISTINCT SeriesInstanceUID) as num_series,
COUNT(DISTINCT PatientID) as num_patients
FROM index
WHERE Modality = 'MR'
AND BodyPartExamined LIKE '%BRAIN%'
GROUP BY Manufacturer, ManufacturerModelName
HAVING num_series >= 10
ORDER BY num_series DESC
"""
manufacturers = client.sql_query(query)
print(manufacturers)
# Download sample from each manufacturer for comparison
for _, row in manufacturers.head(3).iterrows():
mfr = row['Manufacturer']
model = row['ManufacturerModelName']
query = f"""
SELECT SeriesInstanceUID
FROM index
WHERE Manufacturer = '{mfr}'
AND ManufacturerModelName = '{model}'
AND Modality = 'MR'
AND BodyPartExamined LIKE '%BRAIN%'
LIMIT 5
"""
series = client.sql_query(query)
client.download_from_selection(
seriesInstanceUID=list(series['SeriesInstanceUID'].values),
downloadDir=f"./quality_study/{mfr.replace(' ', '_')}"
)
```
## Use Case 3: Visualize Series Without Downloading
**Objective:** Preview imaging data before committing to download
```python
from idc_index import IDCClient
import webbrowser
client = IDCClient()
series_list = client.sql_query("""
SELECT SeriesInstanceUID, PatientID, SeriesDescription
FROM index
WHERE collection_id = 'acrin_nsclc_fdg_pet' AND Modality = 'PT'
LIMIT 10
""")
# Preview each in browser
for _, row in series_list.iterrows():
viewer_url = client.get_viewer_URL(seriesInstanceUID=row['SeriesInstanceUID'])
print(f"Patient {row['PatientID']}: {row['SeriesDescription']}")
print(f" View at: {viewer_url}")
# webbrowser.open(viewer_url) # Uncomment to open automatically
```
For additional visualization options, see the [IDC Portal getting started guide](https://learn.canceridc.dev/portal/getting-started) or [SlicerIDCBrowser](https://github.com/ImagingDataCommons/SlicerIDCBrowser) for 3D Slicer integration.
## Use Case 4: License-Aware Batch Download for Commercial Use
**Objective:** Download only CC-BY licensed data suitable for commercial applications
**Steps:**
```python
from idc_index import IDCClient
client = IDCClient()
# Query ONLY for CC BY licensed data (allows commercial use with attribution)
query = """
SELECT
SeriesInstanceUID,
collection_id,
PatientID,
Modality
FROM index
WHERE license_short_name LIKE 'CC BY%'
AND license_short_name NOT LIKE '%NC%'
AND Modality IN ('CT', 'MR')
AND BodyPartExamined IN ('CHEST', 'BRAIN', 'ABDOMEN')
LIMIT 200
"""
cc_by_data = client.sql_query(query)
print(f"Found {len(cc_by_data)} CC BY licensed series")
print(f"Collections: {cc_by_data['collection_id'].unique()}")
# Download with license verification
client.download_from_selection(
seriesInstanceUID=list(cc_by_data['SeriesInstanceUID'].values),
downloadDir="./commercial_dataset",
dirTemplate="%collection_id/%Modality/%PatientID/%SeriesInstanceUID"
)
# Save license information
cc_by_data.to_csv('commercial_dataset_manifest_CC-BY_ONLY.csv', index=False)
```
## Resources
- Main SKILL.md for core API patterns (query, download, visualize)
- `references/clinical_data_guide.md` for clinical data integration workflows
- `references/sql_patterns.md` for additional SQL query patterns
- `references/index_tables_guide.md` for complex join patterns