Merge pull request #46 from fedorov/update-idc-v1.3.0

update imaging-data-commons skill to v1.3.1
2026-03-27 07:09:27 +08:00 · 2026-02-16 10:24:23 -08:00
parent 3a5f2e2227 5a471d9c36
commit 326b043b8f
6 changed files with 1214 additions and 436 deletions
--- a/scientific-skills/imaging-data-commons/SKILL.md
+++ b/scientific-skills/imaging-data-commons/SKILL.md
@@ -3,9 +3,10 @@ name: imaging-data-commons
 description: Query and download public cancer imaging data from NCI Imaging Data Commons using idc-index. Use for accessing large-scale radiology (CT, MR, PET) and pathology datasets for AI training or research. No authentication required. Query by metadata, visualize in browser, check licenses.
 license: This skill is provided under the MIT License. IDC data itself has individual licensing (mostly CC-BY, some CC-NC) that must be respected when using the data.
 metadata:
-    version: 1.2.0
+    version: 1.3.1
    skill-author: Andrey Fedorov, @fedorov
-    idc-index: "0.11.7"
+    idc-index: "0.11.9"
+    idc-data-version: "v23"
    repository: https://github.com/ImagingDataCommons/idc-claude-skill
 ---

@@ -15,20 +16,39 @@ metadata:

 Use the `idc-index` Python package to query and download public cancer imaging data from the National Cancer Institute Imaging Data Commons (IDC). No authentication required for data access.

+**Current IDC Data Version: v23** (always verify with `IDCClient().get_idc_version()`)
+
 **Primary tool:** `idc-index` ([GitHub](https://github.com/imagingdatacommons/idc-index))

-**Check current data scale for the latest version:**
+**CRITICAL - Check package version and upgrade if needed (run this FIRST):**
+
+```python
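+import idc_index
+
+REQUIRED_VERSION = "0.11.9"  # Must match metadata.idc-index in this file
+installed = idc_index.__version__
+
+def release_tuple(version):
+    # Compare numerically; as plain strings, "0.11.10" would sort before "0.11.9"
+    return tuple(int(part) for part in version.split(".")[:3] if part.isdigit())
+
+if release_tuple(installed) < release_tuple(REQUIRED_VERSION):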
+    print(f"Upgrading idc-index from {installed} to {REQUIRED_VERSION}...")
+    import subprocess
+    subprocess.run(["pip3", "install", "--upgrade", "--break-system-packages", "idc-index"], check=True)
+    print("Upgrade complete. Restart Python to use new version.")
+else:
+    print(f"idc-index {installed} meets requirement ({REQUIRED_VERSION})")
+```
+
+**Verify IDC data version and check current data scale:**

 ```python
 from idc_index import IDCClient
 client = IDCClient()

-# get IDC data version
-print(client.get_idc_version())
+# Verify IDC data version (should be "v23")
+print(f"IDC data version: {client.get_idc_version()}")

 # Get collection count and total series
 stats = client.sql_query("""
-    SELECT   
+    SELECT
        COUNT(DISTINCT collection_id) as collections,
        COUNT(DISTINCT analysis_result_id) as analysis_results,
        COUNT(DISTINCT PatientID) as patients,
@@ -54,6 +74,30 @@ print(stats)
 - Checking data licenses before use in research or commercial applications
 - Visualizing medical images in a browser without local DICOM viewer software

+## Quick Navigation
+
+**Core Sections (inline):**
+- IDC Data Model - Collection and analysis result hierarchy
+- Index Tables - Available tables and joining patterns
+- Installation - Package setup and version verification
+- Core Capabilities - Essential API patterns (query, download, visualize, license, citations, batch)
+- Best Practices - Usage guidelines
+- Troubleshooting - Common issues and solutions
+
+**Reference Guides (load on demand):**
+
+| Guide | When to Load |
+|-------|--------------|
+| `index_tables_guide.md` | Complex JOINs, schema discovery, DataFrame access |
+| `use_cases.md` | End-to-end workflow examples (training datasets, batch downloads) |
+| `sql_patterns.md` | Quick SQL patterns for filter discovery, annotations, size estimation |
+| `clinical_data_guide.md` | Clinical/tabular data, imaging+clinical joins, value mapping |
+| `cloud_storage_guide.md` | Direct S3/GCS access, versioning, UUID mapping |
+| `dicomweb_guide.md` | DICOMweb endpoints, PACS integration |
+| `digital_pathology_guide.md` | Slide microscopy (SM), annotations (ANN), pathology workflows |
+| `bigquery_guide.md` | Full DICOM metadata, private elements (requires GCP) |
+| `cli_guide.md` | Command-line tools (`idc download`, manifest files) |
+
 ## IDC Data Model

 IDC adds two grouping levels above the standard DICOM hierarchy (Patient → Study → Series → Instance):
@@ -75,6 +119,8 @@ Use `collection_id` to find original imaging data, may include annotations depos

 The `idc-index` package provides multiple metadata index tables, accessible via SQL or as pandas DataFrames.

+**Complete index table documentation:** See https://idc-index.readthedocs.io/en/latest/indices_reference.html for a quick check of available tables and columns without executing any code.
+
 **Important:** Use `client.indices_overview` to get current table descriptions and column schemas. This is the authoritative source for available columns and their types — always query it when writing SQL or exploring data structure.

 ### Available Tables
@@ -89,6 +135,9 @@ The `idc-index` package provides multiple metadata index tables, accessible via
 | `sm_index` | 1 row = 1 slide microscopy series | fetch_index() | Slide Microscopy (pathology) series metadata |
 | `sm_instance_index` | 1 row = 1 slide microscopy instance | fetch_index() | Instance-level (SOPInstanceUID) metadata for slide microscopy |
 | `seg_index` | 1 row = 1 DICOM Segmentation series | fetch_index() | Segmentation metadata: algorithm, segment count, reference to source image series |
+| `ann_index` | 1 row = 1 DICOM ANN series | fetch_index() | Microscopy Bulk Simple Annotations series metadata; references annotated image series |
+| `ann_group_index` | 1 row = 1 annotation group | fetch_index() | Detailed annotation group metadata: graphic type, annotation count, property codes, algorithm |
+| `contrast_index` | 1 row = 1 series with contrast info | fetch_index() | Contrast agent metadata: agent name, ingredient, administration route (CT, MR, PT, XA, RF) |

 **Auto** = loaded automatically when `IDCClient()` is instantiated
 **fetch_index()** = requires `client.fetch_index("table_name")` to load
@@ -107,140 +156,13 @@ The `idc-index` package provides multiple metadata index tables, accessible via
 | `source_DOI` | index, analysis_results_index | Link by publication DOI |
 | `crdc_series_uuid` | index, prior_versions_index | Link by CRDC unique identifier |
 | `Modality` | index, prior_versions_index | Filter by imaging modality |
-| `SeriesInstanceUID` | index, seg_index | Link segmentation series to its index metadata |
+| `SeriesInstanceUID` | index, seg_index, ann_index, ann_group_index, contrast_index | Link segmentation/annotation/contrast series to its index metadata |
 | `segmented_SeriesInstanceUID` | seg_index → index | Link segmentation to its source image series (join seg_index.segmented_SeriesInstanceUID = index.SeriesInstanceUID) |
+| `referenced_SeriesInstanceUID` | ann_index → index | Link annotation to its source image series (join ann_index.referenced_SeriesInstanceUID = index.SeriesInstanceUID) |

 **Note:** `Subjects`, `Updated`, and `Description` appear in multiple tables but have different meanings (counts vs identifiers, different update contexts).

-**Example joins:**
-```python
-from idc_index import IDCClient
-client = IDCClient()
-
-# Join index with collections_index to get cancer types
-client.fetch_index("collections_index")
-result = client.sql_query("""
-    SELECT i.SeriesInstanceUID, i.Modality, c.CancerTypes, c.TumorLocations
-    FROM index i
-    JOIN collections_index c ON i.collection_id = c.collection_id
-    WHERE i.Modality = 'MR'
-    LIMIT 10
-""")
-
-# Join index with sm_index for slide microscopy details
-client.fetch_index("sm_index")
-result = client.sql_query("""
-    SELECT i.collection_id, i.PatientID, s.ObjectiveLensPower, s.min_PixelSpacing_2sf
-    FROM index i
-    JOIN sm_index s ON i.SeriesInstanceUID = s.SeriesInstanceUID
-    LIMIT 10
-""")
-
-# Join seg_index with index to find segmentations and their source images
-client.fetch_index("seg_index")
-result = client.sql_query("""
-    SELECT
-        s.SeriesInstanceUID as seg_series,
-        s.AlgorithmName,
-        s.total_segments,
-        src.collection_id,
-        src.Modality as source_modality,
-        src.BodyPartExamined
-    FROM seg_index s
-    JOIN index src ON s.segmented_SeriesInstanceUID = src.SeriesInstanceUID
-    WHERE s.AlgorithmType = 'AUTOMATIC'
-    LIMIT 10
-""")
-```
-
-### Accessing Index Tables
-
-**Via SQL (recommended for filtering/aggregation):**
-```python
-from idc_index import IDCClient
-client = IDCClient()
-
-# Query the primary index (always available)
-results = client.sql_query("SELECT * FROM index WHERE Modality = 'CT' LIMIT 10")
-
-# Fetch and query additional indices
-client.fetch_index("collections_index")
-collections = client.sql_query("SELECT collection_id, CancerTypes, TumorLocations FROM collections_index")
-
-client.fetch_index("analysis_results_index")
-analysis = client.sql_query("SELECT * FROM analysis_results_index LIMIT 5")
-```
-
-**As pandas DataFrames (direct access):**
-```python
-# Primary index (always available after client initialization)
-df = client.index
-
-# Fetch and access on-demand indices
-client.fetch_index("sm_index")
-sm_df = client.sm_index
-```
-
-### Discovering Table Schemas (Essential for Query Writing)
-
-The `indices_overview` dictionary contains complete schema information for all tables. **Always consult this when writing queries or exploring data structure.**
-
-**DICOM attribute mapping:** Many columns are populated directly from DICOM attributes in the source files. The column description in the schema indicates when a column corresponds to a DICOM attribute (e.g., "DICOM Modality attribute" or references a DICOM tag). This allows leveraging DICOM knowledge when querying — standard DICOM attribute names like `PatientID`, `StudyInstanceUID`, `Modality`, `BodyPartExamined` work as expected.
-
-```python
-from idc_index import IDCClient
-client = IDCClient()
-
-# List all available indices with descriptions
-for name, info in client.indices_overview.items():
-    print(f"\n{name}:")
-    print(f"  Installed: {info['installed']}")
-    print(f"  Description: {info['description']}")
-
-# Get complete schema for a specific index (columns, types, descriptions)
-schema = client.indices_overview["index"]["schema"]
-print(f"\nTable: {schema['table_description']}")
-print("\nColumns:")
-for col in schema['columns']:
-    desc = col.get('description', 'No description')
-    # Description indicates if column is from DICOM attribute
-    print(f"  {col['name']} ({col['type']}): {desc}")
-
-# Find columns that are DICOM attributes (check description for "DICOM" reference)
-dicom_cols = [c['name'] for c in schema['columns'] if 'DICOM' in c.get('description', '').upper()]
-print(f"\nDICOM-sourced columns: {dicom_cols}")
-```
-
-**Alternative: use `get_index_schema()` method:**
-```python
-schema = client.get_index_schema("index")
-# Returns same schema dict: {'table_description': ..., 'columns': [...]}
-```
-
-### Key Columns in Primary `index` Table
-
-Most common columns for queries (use `indices_overview` for complete list and descriptions):
-
-| Column | Type | DICOM | Description |
-|--------|------|-------|-------------|
-| `collection_id` | STRING | No | IDC collection identifier |
-| `analysis_result_id` | STRING | No | If applicable, indicates what analysis results collection given series is part of |
-| `source_DOI` | STRING | No | DOI linking to dataset details; use for learning more about the content and for attribution (see citations below) |
-| `PatientID` | STRING | Yes | Patient identifier |
-| `StudyInstanceUID` | STRING | Yes | DICOM Study UID |
-| `SeriesInstanceUID` | STRING | Yes | DICOM Series UID — use for downloads/viewing |
-| `Modality` | STRING | Yes | Imaging modality (CT, MR, PT, SM, etc.) |
-| `BodyPartExamined` | STRING | Yes | Anatomical region |
-| `SeriesDescription` | STRING | Yes | Description of the series |
-| `Manufacturer` | STRING | Yes | Equipment manufacturer |
-| `StudyDate` | STRING | Yes | Date study was performed |
-| `PatientSex` | STRING | Yes | Patient sex |
-| `PatientAge` | STRING | Yes | Patient age at time of study |
-| `license_short_name` | STRING | No | License type (CC BY 4.0, CC BY-NC 4.0, etc.) |
-| `series_size_MB` | FLOAT | No | Size of series in megabytes |
-| `instanceCount` | INTEGER | No | Number of DICOM instances in series |
-
-**DICOM = Yes**: Column value extracted from the DICOM attribute with the same name. Refer to the [DICOM standard](https://dicom.nema.org/medical/dicom/current/output/chtml/part06/chapter_6.html) for numeric tag mappings. Use standard DICOM knowledge for expected values and formats.
+For detailed join examples, schema discovery patterns, key columns reference, and DataFrame access, see `references/index_tables_guide.md`.

 ### Clinical Data Access

@@ -301,7 +223,13 @@ pip install --upgrade idc-index

 **Important:** New IDC data release will always trigger a new version of `idc-index`. Always use `--upgrade` flag while installing, unless an older version is needed for reproducibility.

-**Tested with:** idc-index 0.11.7 (IDC data version v23)
+**IMPORTANT:** IDC data version v23 is current. Always verify your version:
+```python
+print(client.get_idc_version())  # Should return "v23"
+```
+If you see an older version, upgrade with: `pip install --upgrade idc-index`
+
+**Tested with:** idc-index 0.11.9 (IDC data version v23)

 **Optional (for data analysis):**
 ```bash
@@ -484,6 +412,15 @@ client.download_from_selection(
 # Results in: ./data/flat/*.dcm
 ```

+**Downloaded file names:**
+
+Individual DICOM files are named using their CRDC instance UUID: `<crdc_instance_uuid>.dcm` (e.g., `0d73f84e-70ae-4eeb-96a0-1c613b5d9229.dcm`). This UUID-based naming:
+- Enables version tracking (UUIDs change when file content changes)
+- Matches cloud storage organization (`s3://idc-open-data/<crdc_series_uuid>/<crdc_instance_uuid>.dcm`)
+- Differs from DICOM UIDs (SOPInstanceUID) which are preserved inside the file metadata
+
+To identify files, use the `crdc_instance_uuid` column in queries or read DICOM metadata (SOPInstanceUID) from the files.
+
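As a rough illustration of the naming convention above, the sketch below recovers the `crdc_instance_uuid` from each downloaded file name using only the standard library. The helper name `uuids_from_download_dir` is hypothetical and not part of `idc-index`; files whose name does not parse as a UUID are skipped rather than guessed at.

```python
import uuid
from pathlib import Path


def uuids_from_download_dir(download_dir):
    """Map crdc_instance_uuid -> file path for every *.dcm file under download_dir."""
    mapping = {}
    for path in sorted(Path(download_dir).rglob("*.dcm")):
        try:
            # File names follow <crdc_instance_uuid>.dcm, so the stem should parse as a UUID
            instance_uuid = str(uuid.UUID(path.stem))
        except ValueError:
            continue  # Not a UUID-named file; skip it
        mapping[instance_uuid] = path
    return mapping
```

The returned keys can then be compared with `crdc_instance_uuid` values returned by a query, which avoids opening each file just to identify it.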
 ### Command-Line Download

 The `idc download` command provides command-line access to download functionality without writing Python code. Available after installing `idc-index`.
@@ -705,6 +642,13 @@ For queries requiring full DICOM metadata, complex JOINs, clinical data tables,

 See `references/bigquery_guide.md` for setup, table schemas, query patterns, private element access, and cost optimization.

+**Before using BigQuery**, always check if a specialized index table already has the metadata you need:
+1. Use `client.indices_overview` or the [idc-index indices reference](https://idc-index.readthedocs.io/en/latest/indices_reference.html) to discover all available tables and their columns
+2. Fetch the relevant index: `client.fetch_index("table_name")`
+3. Query locally with `client.sql_query()` (free, no GCP account needed)
+
+Common specialized indices: `seg_index` (segmentations), `ann_index` / `ann_group_index` (microscopy annotations), `sm_index` (slide microscopy), `collections_index` (collection metadata). Only use BigQuery if you need private DICOM elements or attributes not in any index.
+
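One way to run the "check the indices first" step is to search the schema dictionary for a keyword before reaching for BigQuery. The sketch below assumes `client.indices_overview` has the nested shape used in the schema-discovery pattern (table name, then `schema`, then `columns` entries with `name` and `description`). The helper `find_columns` and the toy dictionary are hypothetical and only stand in for the real schema.

```python
def find_columns(indices_overview, keyword):
    """Return (table, column) pairs whose column name or description mentions keyword."""
    keyword = keyword.lower()
    matches = []
    for table_name, info in indices_overview.items():
        schema = info.get("schema") or {}
        for col in schema.get("columns", []):
            name = col.get("name", "")
            description = col.get("description") or ""
            if keyword in f"{name} {description}".lower():
                matches.append((table_name, name))
    return matches


# Toy stand-in for client.indices_overview (illustrative values, not the real IDC schema)
toy_overview = {
    "index": {"installed": True, "schema": {"columns": [
        {"name": "Modality", "type": "STRING", "description": "DICOM Modality attribute"},
    ]}},
    "seg_index": {"installed": False, "schema": {"columns": [
        {"name": "AlgorithmName", "type": "STRING", "description": "Segmentation algorithm"},
    ]}},
}
print(find_columns(toy_overview, "algorithm"))
```

In a live session the same function would be called as `find_columns(client.indices_overview, "algorithm")`. Any table it reports can then be loaded with `client.fetch_index()` and queried locally.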
 ### 8. Tool Selection Guide

 | Task | Tool | Reference |
@@ -782,166 +726,15 @@ sitk.WriteImage(smoothed, "processed_volume.nii.gz")

 ## Common Use Cases

-### Use Case 1: Find and Download Lung CT Scans for Deep Learning
-
-**Objective:** Build training dataset of lung CT scans from NLST collection
-
-**Steps:**
-```python
-from idc_index import IDCClient
-
-client = IDCClient()
-
-# 1. Query for lung CT scans with specific criteria
-query = """
-SELECT
-  PatientID,
-  SeriesInstanceUID,
-  SeriesDescription
-FROM index
-WHERE collection_id = 'nlst'
-  AND Modality = 'CT'
-  AND BodyPartExamined = 'CHEST'
-  AND license_short_name = 'CC BY 4.0'
-ORDER BY PatientID
-LIMIT 100
-"""
-
-results = client.sql_query(query)
-print(f"Found {len(results)} series from {results['PatientID'].nunique()} patients")
-
-# 2. Download data organized by patient
-client.download_from_selection(
-    seriesInstanceUID=list(results['SeriesInstanceUID'].values),
-    downloadDir="./training_data",
-    dirTemplate="%collection_id/%PatientID/%SeriesInstanceUID"
-)
-
-# 3. Save manifest for reproducibility
-results.to_csv('training_manifest.csv', index=False)
-```
-
-### Use Case 2: Query Brain MRI by Manufacturer for Quality Study
-
-**Objective:** Compare image quality across different MRI scanner manufacturers
-
-**Steps:**
-```python
-from idc_index import IDCClient
-import pandas as pd
-
-client = IDCClient()
-
-# Query for brain MRI grouped by manufacturer
-query = """
-SELECT
-  Manufacturer,
-  ManufacturerModelName,
-  COUNT(DISTINCT SeriesInstanceUID) as num_series,
-  COUNT(DISTINCT PatientID) as num_patients
-FROM index
-WHERE Modality = 'MR'
-  AND BodyPartExamined LIKE '%BRAIN%'
-GROUP BY Manufacturer, ManufacturerModelName
-HAVING num_series >= 10
-ORDER BY num_series DESC
-"""
-
-manufacturers = client.sql_query(query)
-print(manufacturers)
-
-# Download sample from each manufacturer for comparison
-for _, row in manufacturers.head(3).iterrows():
-    mfr = row['Manufacturer']
-    model = row['ManufacturerModelName']
-
-    query = f"""
-    SELECT SeriesInstanceUID
-    FROM index
-    WHERE Manufacturer = '{mfr}'
-      AND ManufacturerModelName = '{model}'
-      AND Modality = 'MR'
-      AND BodyPartExamined LIKE '%BRAIN%'
-    LIMIT 5
-    """
-
-    series = client.sql_query(query)
-    client.download_from_selection(
-        seriesInstanceUID=list(series['SeriesInstanceUID'].values),
-        downloadDir=f"./quality_study/{mfr.replace(' ', '_')}"
-    )
-```
-
-### Use Case 3: Visualize Series Without Downloading
-
-**Objective:** Preview imaging data before committing to download
-
-```python
-from idc_index import IDCClient
-import webbrowser
-
-client = IDCClient()
-
-series_list = client.sql_query("""
-    SELECT SeriesInstanceUID, PatientID, SeriesDescription
-    FROM index
-    WHERE collection_id = 'acrin_nsclc_fdg_pet' AND Modality = 'PT'
-    LIMIT 10
-""")
-
-# Preview each in browser
-for _, row in series_list.iterrows():
-    viewer_url = client.get_viewer_URL(seriesInstanceUID=row['SeriesInstanceUID'])
-    print(f"Patient {row['PatientID']}: {row['SeriesDescription']}")
-    print(f"  View at: {viewer_url}")
-    # webbrowser.open(viewer_url)  # Uncomment to open automatically
-```
-
-For additional visualization options, see the [IDC Portal getting started guide](https://learn.canceridc.dev/portal/getting-started) or [SlicerIDCBrowser](https://github.com/ImagingDataCommons/SlicerIDCBrowser) for 3D Slicer integration.
-
-### Use Case 4: License-Aware Batch Download for Commercial Use
-
-**Objective:** Download only CC-BY licensed data suitable for commercial applications
-
-**Steps:**
-```python
-from idc_index import IDCClient
-
-client = IDCClient()
-
-# Query ONLY for CC BY licensed data (allows commercial use with attribution)
-query = """
-SELECT
-  SeriesInstanceUID,
-  collection_id,
-  PatientID,
-  Modality
-FROM index
-WHERE license_short_name LIKE 'CC BY%'
-  AND license_short_name NOT LIKE '%NC%'
-  AND Modality IN ('CT', 'MR')
-  AND BodyPartExamined IN ('CHEST', 'BRAIN', 'ABDOMEN')
-LIMIT 200
-"""
-
-cc_by_data = client.sql_query(query)
-
-print(f"Found {len(cc_by_data)} CC BY licensed series")
-print(f"Collections: {cc_by_data['collection_id'].unique()}")
-
-# Download with license verification
-client.download_from_selection(
-    seriesInstanceUID=list(cc_by_data['SeriesInstanceUID'].values),
-    downloadDir="./commercial_dataset",
-    dirTemplate="%collection_id/%Modality/%PatientID/%SeriesInstanceUID"
-)
-
-# Save license information
-cc_by_data.to_csv('commercial_dataset_manifest_CC-BY_ONLY.csv', index=False)
-```
+See `references/use_cases.md` for complete end-to-end workflow examples including:
+- Building deep learning training datasets from lung CT scans
+- Comparing image quality across scanner manufacturers
+- Previewing data in browser before downloading
+- License-aware batch downloads for commercial use

 ## Best Practices

+- **Verify IDC version before generating responses** - Always call `client.get_idc_version()` at the start of a session to confirm you're using the expected data version (currently v23). If using an older version, recommend `pip install --upgrade idc-index`
 - **Check licenses before use** - Always query the `license_short_name` field and respect licensing terms (CC BY vs CC BY-NC)
 - **Generate citations for attribution** - Use `citations_from_selection()` to get properly formatted citations from `source_DOI` values; include these in publications
 - **Start with small queries** - Use `LIMIT` clause when exploring to avoid long downloads and understand data structure
@@ -989,142 +782,14 @@ cc_by_data.to_csv('commercial_dataset_manifest_CC-BY_ONLY.csv', index=False)

 ## Common SQL Query Patterns

-Quick reference for common queries. For detailed examples with context, see the Core Capabilities section above.
+See `references/sql_patterns.md` for quick-reference SQL patterns including:
+- Filter value discovery (modalities, body parts, manufacturers)
+- Annotation and segmentation queries (including seg_index, ann_index joins)
+- Slide microscopy queries (sm_index patterns)
+- Download size estimation
+- Clinical data linking

-### Discover available filter values
-```python
-# What modalities exist?
-client.sql_query("SELECT DISTINCT Modality FROM index")
-
-# What body parts for a specific modality?
-client.sql_query("""
-    SELECT DISTINCT BodyPartExamined, COUNT(*) as n
-    FROM index WHERE Modality = 'CT' AND BodyPartExamined IS NOT NULL
-    GROUP BY BodyPartExamined ORDER BY n DESC
-""")
-
-# What manufacturers for MR?
-client.sql_query("""
-    SELECT DISTINCT Manufacturer, COUNT(*) as n
-    FROM index WHERE Modality = 'MR'
-    GROUP BY Manufacturer ORDER BY n DESC
-""")
-```
-
-### Find annotations and segmentations
-
-**Note:** Not all image-derived objects belong to analysis result collections. Some annotations are deposited alongside original images. Use DICOM Modality or SOPClassUID to find all derived objects regardless of collection type.
-
-```python
-# Find ALL segmentations and structure sets by DICOM Modality
-# SEG = DICOM Segmentation, RTSTRUCT = Radiotherapy Structure Set
-client.sql_query("""
-    SELECT collection_id, Modality, COUNT(*) as series_count
-    FROM index
-    WHERE Modality IN ('SEG', 'RTSTRUCT')
-    GROUP BY collection_id, Modality
-    ORDER BY series_count DESC
-""")
-
-# Find segmentations for a specific collection (includes non-analysis-result items)
-client.sql_query("""
-    SELECT SeriesInstanceUID, SeriesDescription, analysis_result_id
-    FROM index
-    WHERE collection_id = 'tcga_luad' AND Modality = 'SEG'
-""")
-
-# List analysis result collections (curated derived datasets)
-client.fetch_index("analysis_results_index")
-client.sql_query("""
-    SELECT analysis_result_id, analysis_result_title, Collections, Modalities
-    FROM analysis_results_index
-""")
-
-# Find analysis results for a specific source collection
-client.sql_query("""
-    SELECT analysis_result_id, analysis_result_title
-    FROM analysis_results_index
-    WHERE Collections LIKE '%tcga_luad%'
-""")
-
-# Use seg_index for detailed DICOM Segmentation metadata
-client.fetch_index("seg_index")
-
-# Get segmentation statistics by algorithm
-client.sql_query("""
-    SELECT AlgorithmName, AlgorithmType, COUNT(*) as seg_count
-    FROM seg_index
-    WHERE AlgorithmName IS NOT NULL
-    GROUP BY AlgorithmName, AlgorithmType
-    ORDER BY seg_count DESC
-    LIMIT 10
-""")
-
-# Find segmentations for specific source images (e.g., chest CT)
-client.sql_query("""
-    SELECT
-        s.SeriesInstanceUID as seg_series,
-        s.AlgorithmName,
-        s.total_segments,
-        s.segmented_SeriesInstanceUID as source_series
-    FROM seg_index s
-    JOIN index src ON s.segmented_SeriesInstanceUID = src.SeriesInstanceUID
-    WHERE src.Modality = 'CT' AND src.BodyPartExamined = 'CHEST'
-    LIMIT 10
-""")
-
-# Find TotalSegmentator results with source image context
-client.sql_query("""
-    SELECT
-        seg_info.collection_id,
-        COUNT(DISTINCT s.SeriesInstanceUID) as seg_count,
-        SUM(s.total_segments) as total_segments
-    FROM seg_index s
-    JOIN index seg_info ON s.SeriesInstanceUID = seg_info.SeriesInstanceUID
-    WHERE s.AlgorithmName LIKE '%TotalSegmentator%'
-    GROUP BY seg_info.collection_id
-    ORDER BY seg_count DESC
-""")
-```
-
-### Query slide microscopy data
-```python
-# sm_index has detailed metadata; join with index for collection_id
-client.fetch_index("sm_index")
-client.sql_query("""
-    SELECT i.collection_id, COUNT(*) as slides,
-           MIN(s.min_PixelSpacing_2sf) as min_resolution
-    FROM sm_index s
-    JOIN index i ON s.SeriesInstanceUID = i.SeriesInstanceUID
-    GROUP BY i.collection_id
-    ORDER BY slides DESC
-""")
-```
-
-### Estimate download size
-```python
-# Size for specific criteria
-client.sql_query("""
-    SELECT SUM(series_size_MB) as total_mb, COUNT(*) as series_count
-    FROM index
-    WHERE collection_id = 'nlst' AND Modality = 'CT'
-""")
-```
-
-### Link to clinical data
-```python
-client.fetch_index("clinical_index")
-
-# Find collections with clinical data and their tables
-client.sql_query("""
-    SELECT collection_id, table_name, COUNT(DISTINCT column_label) as columns
-    FROM clinical_index
-    GROUP BY collection_id, table_name
-    ORDER BY collection_id
-""")
-```
-
-See `references/clinical_data_guide.md` for complete patterns including value mapping and patient cohort selection.
+For segmentation and annotation details, also see `references/digital_pathology_guide.md`.

 ## Related Skills

@@ -1134,8 +799,7 @@ The following skills complement IDC workflows for downstream analysis and visual
 - **pydicom** - Read, write, and manipulate downloaded DICOM files. Use for extracting pixel data, reading metadata, anonymization, and format conversion. Essential for working with IDC radiology data (CT, MR, PET).

 ### Pathology and Slide Microscopy
- **histolab** - Lightweight tile extraction and preprocessing for whole slide images. Use for basic slide processing, tissue detection, and dataset preparation from IDC slide microscopy data.
- **pathml** - Full-featured computational pathology toolkit. Use for advanced WSI analysis including multiplexed imaging, nucleus segmentation, and ML model training on pathology data downloaded from IDC.
+See `references/digital_pathology_guide.md` for DICOM-compatible tools (highdicom, wsidicom, TIA-Toolbox, Slim viewer).

 ### Metadata Visualization
 - **matplotlib** - Low-level plotting for full customization. Use for creating static figures summarizing IDC query results (bar charts of modalities, histograms of series counts, etc.).
@@ -1159,11 +823,8 @@ columns = [(c['name'], c['type'], c.get('description', '')) for c in schema['col

 ### Reference Documentation

- **clinical_data_guide.md** - Clinical/tabular data navigation, value mapping, and joining with imaging data
- **cloud_storage_guide.md** - Direct cloud bucket access (S3/GCS), file organization, CRDC UUIDs, versioning, and reproducibility
- **cli_guide.md** - Complete idc-index command-line interface reference (`idc download`, `idc download-from-manifest`, `idc download-from-selection`)
- **bigquery_guide.md** - Advanced BigQuery usage guide for complex metadata queries
- **dicomweb_guide.md** - DICOMweb endpoint URLs, code examples, and Google Healthcare API implementation details
+See the Quick Navigation section at the top for the full list of reference guides with decision triggers.
+
 - **[indices_reference](https://idc-index.readthedocs.io/en/latest/indices_reference.html)** - External documentation for index tables (may be ahead of the installed version)

 ### External Links
--- a/scientific-skills/imaging-data-commons/references/clinical_data_guide.md
+++ b/scientific-skills/imaging-data-commons/references/clinical_data_guide.md
@@ -0,0 +1,324 @@
+# Clinical Data Guide for IDC
+
+**Tested with:** idc-index 0.11.7 (IDC data version v23)
+
+Clinical data (demographics, diagnoses, therapies, lab tests, staging) accompanies many IDC imaging collections. This guide covers how to discover, access, and integrate clinical data with imaging data using `idc-index`.
+
+## When to Use This Guide
+
+Use this guide when you need to:
+- Find what clinical metadata is available for a collection
+- Filter patients by clinical criteria (e.g., cancer stage, treatment history)
+- Join clinical attributes with imaging data for cohort selection
+- Understand and decode coded values in clinical tables
+
+For basic clinical data access, see the "Clinical Data Access" section in the main SKILL.md. This guide provides detailed workflows and advanced patterns.
+
+## Prerequisites
+
+```bash
+pip install --upgrade idc-index
+```
+
+No BigQuery credentials required - clinical data is packaged with `idc-index`.
+
+## Understanding Clinical Data in IDC
+
+### What is Clinical Data?
+
+Clinical data refers to non-imaging information that accompanies medical images:
+- Patient demographics (age, sex, race)
+- Clinical history (diagnoses, surgeries, therapies)
+- Lab tests and pathology results
+- Cancer staging (clinical and pathological)
+- Treatment outcomes
+
+### Data Organization
+
+Clinical data in IDC comes from collection-specific spreadsheets provided by data submitters. IDC parses these into queryable tables accessible via `idc-index`.
+
+**Important characteristics:**
+- Clinical data is **not harmonized** across collections (terms and formats vary)
+- Not all collections have clinical data (check availability first)
+- All data is **anonymized** - `dicom_patient_id` links to imaging
+
+### The clinical_index Table
+
+The `clinical_index` serves as a dictionary/catalog of all available clinical data:
+
+| Column | Purpose | Use For |
+|--------|---------|---------|
+| `collection_id` | Collection identifier | Filtering by collection |
+| `table_name` | Full BigQuery table reference | BigQuery queries (if needed) |
+| `short_table_name` | Short name | `get_clinical_table()` method |
+| `column` | Column name in table | Selecting data columns |
+| `column_label` | Human-readable description | Searching for concepts |
+| `values` | Observed attribute values for the column | Interpreting coded values |
+
+### The `values` Column
+
+The `values` column contains an array of observed attribute values for the column defined in the `column` field. Each entry has:
+- **option_code**: The actual value observed in that column
+- **option_description**: Human-readable description of that value (from data dictionary if available, otherwise `None`)
+
+For ACRIN collections, value descriptions come from provided data dictionaries. For other collections, they are derived from inspection of the actual data values.
+
+**Note:** For columns with >20 unique values, the `values` array is left empty (`[]`) for simplicity.
+
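The two edge cases described above (a missing `option_description` and an empty `values` array) are easy to trip over when building a lookup table. The following is a minimal sketch of one way to handle them. The helper name `build_value_mapping` is hypothetical, falling back to the raw code when no description exists is a design choice rather than IDC behavior, and the `999` entry is made up for illustration.

```python
def build_value_mapping(values):
    """Turn a clinical_index `values` array into {option_code: description}."""
    mapping = {}
    for item in values:  # an empty array (columns with >20 unique values) yields {}
        code = str(item["option_code"])
        description = item.get("option_description")
        # Fall back to the raw code when no description is available
        mapping[code] = description if description is not None else code
    return mapping


sample_values = [
    {"option_code": ".M", "option_description": "Missing"},
    {"option_code": "110", "option_description": "Stage IA"},
    {"option_code": "999", "option_description": None},  # hypothetical undocumented code
]
print(build_value_mapping(sample_values))
```

An empty result is a signal that the column has too many distinct values to enumerate, so the raw values should be inspected directly in the clinical table instead.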
+## Core Workflow
+
+### Step 1: Fetch Clinical Index
+
+```python
+from idc_index import IDCClient
+
+client = IDCClient()
+client.fetch_index('clinical_index')
+
+# View available columns
+print(client.clinical_index.columns.tolist())
+```
+
+### Step 2: Discover Available Clinical Data
+
+```python
+# List all collections with clinical data
+collections_with_clinical = client.clinical_index["collection_id"].unique().tolist()
+print(f"{len(collections_with_clinical)} collections have clinical data")
+
+# Find clinical attributes for a specific collection
+nlst_columns = client.clinical_index[client.clinical_index['collection_id']=='nlst']
+nlst_columns[['short_table_name', 'column', 'column_label', 'values']]
+```
+
+### Step 3: Search for Specific Attributes
+
+```python
+# Search by keyword in column_label (case-insensitive)
+stage_attrs = client.clinical_index[
+    client.clinical_index["column_label"].str.contains("stage", case=False, na=False)
+]
+stage_attrs[["collection_id", "short_table_name", "column", "column_label"]]
+```
+
+### Step 4: Load Clinical Table
+
+```python
+# Load table using short_table_name
+nlst_canc_df = client.get_clinical_table("nlst_canc")
+
+# Examine structure
+print(f"Rows: {len(nlst_canc_df)}, Columns: {len(nlst_canc_df.columns)}")
+nlst_canc_df.head()
+```
+
+### Step 5: Map Coded Values to Descriptions
+
+Many clinical attributes use coded values. The `values` column in `clinical_index` contains an array of observed values with their descriptions (when available).
+
+```python
+# Get the clinical_index rows for NLST
+nlst_clinical_columns = client.clinical_index[client.clinical_index['collection_id']=='nlst']
+
+# Get observed values for a specific column
+# Filter to the row for 'clinical_stag' and extract the values array
+clinical_stag_values = nlst_clinical_columns[
+    nlst_clinical_columns['column']=='clinical_stag'
+]['values'].values[0]
+
+# View the observed values and their descriptions
+print(clinical_stag_values)
+# Output: array([{'option_code': '.M', 'option_description': 'Missing'},
+#                {'option_code': '110', 'option_description': 'Stage IA'},
+#                {'option_code': '120', 'option_description': 'Stage IB'}, ...])
+
+# Create mapping dictionary from codes to descriptions
+mapping_dict = {item['option_code']: item['option_description'] for item in clinical_stag_values}
+
+# Apply to DataFrame - convert column to string first for consistent matching
+nlst_canc_df['clinical_stag_meaning'] = nlst_canc_df['clinical_stag'].astype(str).map(mapping_dict)
+```
+
+### Step 6: Join with Imaging Data
+
+The `dicom_patient_id` column links clinical data to imaging. It matches the `PatientID` column in the imaging index.
+
+```python
+# Pandas merge approach
+import pandas as pd
+
+# Get NLST CT imaging data
+nlst_imaging = client.index[(client.index['collection_id']=='nlst') & (client.index['Modality']=='CT')]
+
+# Join with clinical data
+merged = pd.merge(
+    nlst_imaging[['PatientID', 'StudyInstanceUID']].drop_duplicates(),
+    nlst_canc_df[['dicom_patient_id', 'clinical_stag', 'clinical_stag_meaning']],
+    left_on='PatientID',
+    right_on='dicom_patient_id',
+    how='inner'
+)
+```
+
+```python
+# SQL join approach
+query = """
+SELECT
+  index.PatientID,
+  index.StudyInstanceUID,
+  index.Modality,
+  nlst_canc.clinical_stag
+FROM index
+JOIN nlst_canc ON index.PatientID = nlst_canc.dicom_patient_id
+WHERE index.collection_id = 'nlst' AND index.Modality = 'CT'
+"""
+results = client.sql_query(query)
+```
+
+## Common Use Cases
+
+### Use Case 1: Select Patients by Cancer Stage
+
+```python
+from idc_index import IDCClient
+import pandas as pd
+
+client = IDCClient()
+client.fetch_index('clinical_index')
+
+# Load clinical table
+nlst_canc = client.get_clinical_table("nlst_canc")
+
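+# Select Stage IV patients (code '400'); cast to string first, as in the
+# value-mapping step, so the comparison is made on consistent types
+stage_iv_patients = nlst_canc[nlst_canc['clinical_stag'].astype(str) == '400']['dicom_patient_id']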
+
+# Get CT imaging studies for these patients
+stage_iv_studies = pd.merge(
+    client.index[(client.index['collection_id']=='nlst') & (client.index['Modality']=='CT')],
+    stage_iv_patients,
+    left_on='PatientID',
+    right_on='dicom_patient_id',
+    how='inner'
+)['StudyInstanceUID'].drop_duplicates()
+
+print(f"Found {len(stage_iv_studies)} CT studies for Stage IV patients")
+```
+
+### Use Case 2: Find Collections with Specific Clinical Attributes
+
+```python
+# Find collections with chemotherapy information
+chemo_collections = client.clinical_index[
+    client.clinical_index["column_label"].str.contains("[Cc]hemotherapy", na=False)
+]["collection_id"].unique()
+
+print(f"Collections with chemotherapy data: {list(chemo_collections)}")
+```
+
+### Use Case 3: Examine Observed Values for a Clinical Attribute
+
+```python
+# Find what values have been observed for a specific attribute
+chemotherapy_rows = client.clinical_index[
+    (client.clinical_index["collection_id"] == "hcc_tace_seg") &
+    (client.clinical_index["column"] == "chemotherapy")
+]
+
+# Get the observed values array
+values_list = chemotherapy_rows["values"].tolist()
+print(values_list)
+# Output: [[{'option_code': 'Cisplastin', 'option_description': None},
+#           {'option_code': 'Cisplatin, Mitomycin-C', 'option_description': None}, ...]]
+```
+
+### Use Case 4: Generate Viewer URLs for Selected Patients
+
+```python
+# Reuses `client` and `stage_iv_patients` from Use Case 1
+
+# Get studies for a sample Stage IV patient
+sample_patient = stage_iv_patients.iloc[0]
+studies = client.index[client.index['PatientID'] == sample_patient]['StudyInstanceUID'].unique()
+
+# Generate viewer URL
+if len(studies) > 0:
+    viewer_url = client.get_viewer_URL(studyInstanceUID=studies[0])
+    print(viewer_url)
+```
+
+## Key Concepts
+
+### column vs column_label
+
+- **column**: Use for selecting data from tables (programmatic access)
+- **column_label**: Use for searching/understanding what data means (human-readable)
+
+Some collections (like `c4kc_kits`) have identical column and column_label. Others (like ACRIN collections) have cryptic column names but descriptive labels.
+
+### option_code vs option_description
+
+The `values` array contains observed attribute values:
+- **option_code**: The actual value observed in the column (what you filter on)
+- **option_description**: Human-readable description (from data dictionary if available, otherwise `None`)
+
+### dicom_patient_id
+
+Every clinical table includes `dicom_patient_id`, which matches the `PatientID` column in the imaging index. This is the key for joining clinical and imaging data.
+
+## Troubleshooting
+
+### Issue: Clinical table not found
+
+**Cause:** The table name is wrong, or the collection has no table with that name
+
+**Solution:** Query `clinical_index` first to find the available tables:
+```python
+client.clinical_index[client.clinical_index['collection_id']=='your_collection']['short_table_name'].unique()
+```
+
+### Issue: Empty values array
+
+**Cause:** The `values` array is left empty when a column has >20 unique values
+
+**Solution:** Load the clinical table and examine unique values directly:
+```python
+clinical_df = client.get_clinical_table("table_name")
+clinical_df['column_name'].unique()
+```
+
+### Issue: Coded values not in mapping
+
+**Cause:** Some observed values may have no entry in the mapping (e.g., empty strings, special codes such as `.M` when the dictionary omits them, or NaN, which `astype(str)` converts to the string `'nan'`)
+
+**Solution:** Handle unmapped values gracefully:
+```python
+df['meaning'] = df['code'].astype(str).map(mapping_dict).fillna('Unknown/Missing')
+```
+
+### Issue: No matching patients when joining
+
+**Cause:** Clinical data may include patients without images, or vice versa
+
+**Solution:** Verify patient overlap before joining:
+```python
+imaging_patients = set(client.index[client.index['collection_id']=='nlst']['PatientID'].unique())
+clinical_patients = set(clinical_df['dicom_patient_id'].unique())
+overlap = imaging_patients & clinical_patients
+print(f"Patients with both imaging and clinical data: {len(overlap)}")
+```
+
+## Resources
+
+**IDC Documentation:**
+- [Clinical data organization](https://learn.canceridc.dev/data/organization-of-data/clinical) - How clinical data is organized in IDC
+- [Clinical data dashboard](https://datastudio.google.com/u/0/reporting/04cf5976-4ea0-4fee-a749-8bfd162f2e87/page/p_s7mk6eybqc) - Visual summary of available clinical data
+- [idc-index clinical_index documentation](https://idc-index.readthedocs.io/en/latest/column_descriptions.html#clinical-index)
+
+**Related Guides:**
+- `bigquery_guide.md` - Advanced clinical queries via BigQuery
+- Main SKILL.md - Core IDC workflows
+
+**IDC Tutorials:**
+- [clinical_data_intro.ipynb](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/advanced_topics/clinical_data_intro.ipynb)
+- [exploring_clinical_data.ipynb](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/getting_started/exploring_clinical_data.ipynb)
+- [nlst_clinical_data.ipynb](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/collections_demos/nlst_clinical_data.ipynb)
--- a/scientific-skills/imaging-data-commons/references/digital_pathology_guide.md
+++ b/scientific-skills/imaging-data-commons/references/digital_pathology_guide.md
@@ -0,0 +1,254 @@
+# Digital Pathology Guide for IDC
+
+**Tested with:** IDC data version v23, idc-index 0.11.9
+
+For general IDC queries and downloads, use `idc-index` (see main SKILL.md). This guide covers slide microscopy (SM) imaging, microscopy bulk simple annotations (ANN), and segmentations (SEG) in the context of digital pathology in IDC.
+
+## Index Tables for Digital Pathology
+
+Five specialized index tables provide curated metadata without needing BigQuery:
+
+| Table | Row Granularity | Description |
+|-------|-----------------|-------------|
+| `sm_index` | 1 row = 1 SM series | Slide Microscopy series metadata: lens power, pixel spacing, image dimensions |
+| `sm_instance_index` | 1 row = 1 SM instance | Instance-level (SOPInstanceUID) metadata for individual slide images |
+| `seg_index` | 1 row = 1 SEG series | DICOM Segmentation metadata: algorithm, segment count, reference to source series. Used for both radiology and pathology — filter by source Modality to find pathology-specific segmentations |
+| `ann_index` | 1 row = 1 ANN series | Microscopy Bulk Simple Annotations series metadata; includes `referenced_SeriesInstanceUID` linking to the annotated slide |
+| `ann_group_index` | 1 row = 1 annotation group | Annotation group details: `AnnotationGroupLabel`, `GraphicType`, `NumberOfAnnotations`, `AlgorithmName`, property codes |
+
+All require `client.fetch_index("table_name")` before querying. Use `client.indices_overview` to inspect column schemas programmatically.
+
+## Slide Microscopy Queries
+
+### Basic SM metadata
+
+```python
+from idc_index import IDCClient
+client = IDCClient()
+
+# sm_index has detailed metadata; join with index for collection_id
+client.fetch_index("sm_index")
+client.sql_query("""
+    SELECT i.collection_id, COUNT(*) as slides,
+           MIN(s.min_PixelSpacing_2sf) as min_pixel_spacing
+    FROM sm_index s
+    JOIN index i ON s.SeriesInstanceUID = i.SeriesInstanceUID
+    GROUP BY i.collection_id
+    ORDER BY slides DESC
+""")
+```
+
+### Find SM series with specific properties
+
+```python
+# Find high-resolution slides with specific objective lens power
+client.fetch_index("sm_index")
+client.sql_query("""
+    SELECT
+        i.collection_id,
+        i.PatientID,
+        s.ObjectiveLensPower,
+        s.min_PixelSpacing_2sf
+    FROM sm_index s
+    JOIN index i ON s.SeriesInstanceUID = i.SeriesInstanceUID
+    WHERE s.ObjectiveLensPower >= 40
+    ORDER BY s.min_PixelSpacing_2sf
+    LIMIT 20
+""")
+```
+
+## Annotation Queries (ANN)
+
+DICOM Microscopy Bulk Simple Annotations (Modality = 'ANN') are annotations **on** slide microscopy images. They appear in `ann_index` (series-level) and `ann_group_index` (group-level detail). Each ANN series references the slide it annotates via `referenced_SeriesInstanceUID`.
+
+### Basic annotation discovery
+
+```python
+# Find annotation series and their referenced images
+client.fetch_index("ann_index")
+client.fetch_index("ann_group_index")
+
+client.sql_query("""
+    SELECT
+        a.SeriesInstanceUID as ann_series,
+        a.AnnotationCoordinateType,
+        a.referenced_SeriesInstanceUID as source_series
+    FROM ann_index a
+    LIMIT 10
+""")
+```
+
+### Annotation group statistics
+
+```python
+# Get annotation group details (graphic types, counts, algorithms)
+client.sql_query("""
+    SELECT
+        GraphicType,
+        SUM(NumberOfAnnotations) as total_annotations,
+        COUNT(*) as group_count
+    FROM ann_group_index
+    GROUP BY GraphicType
+    ORDER BY total_annotations DESC
+""")
+```
+
+### Find annotations with source slide context
+
+```python
+# Find annotations with their source slide microscopy context
+client.sql_query("""
+    SELECT
+        i.collection_id,
+        g.GraphicType,
+        g.AnnotationPropertyType_CodeMeaning,
+        g.AlgorithmName,
+        g.NumberOfAnnotations
+    FROM ann_group_index g
+    JOIN ann_index a ON g.SeriesInstanceUID = a.SeriesInstanceUID
+    JOIN index i ON a.referenced_SeriesInstanceUID = i.SeriesInstanceUID
+    WHERE g.AlgorithmName IS NOT NULL
+    LIMIT 10
+""")
+```
+
+## Segmentations on Slide Microscopy
+
+DICOM Segmentations (Modality = 'SEG') are used for both radiology (e.g., organ segmentations on CT) and pathology (e.g., tissue region segmentations on whole slide images). Use `seg_index.segmented_SeriesInstanceUID` to find the source series, then filter by source Modality to isolate pathology segmentations.
+
+```python
+# Find segmentations whose source is a slide microscopy image
+client.fetch_index("seg_index")
+client.fetch_index("sm_index")
+client.sql_query("""
+    SELECT
+        seg.SeriesInstanceUID as seg_series,
+        seg.AlgorithmName,
+        seg.total_segments,
+        src.collection_id,
+        src.Modality as source_modality
+    FROM seg_index seg
+    JOIN index src ON seg.segmented_SeriesInstanceUID = src.SeriesInstanceUID
+    WHERE src.Modality = 'SM'
+    LIMIT 20
+""")
+```
+
+## Filter by AnnotationGroupLabel
+
+`AnnotationGroupLabel` is the most direct column for finding annotation groups by name or semantic content. Use `LIKE` with wildcards for text search.
+
+### Simple label filtering
+
+```python
+# Find annotation groups by label (e.g., groups mentioning "blast")
+client.fetch_index("ann_group_index")
+client.sql_query("""
+    SELECT
+        g.SeriesInstanceUID,
+        g.AnnotationGroupLabel,
+        g.GraphicType,
+        g.NumberOfAnnotations,
+        g.AlgorithmName
+    FROM ann_group_index g
+    WHERE LOWER(g.AnnotationGroupLabel) LIKE '%blast%'
+    ORDER BY g.NumberOfAnnotations DESC
+""")
+```
+
+### Label filtering with collection context
+
+```python
+# Find annotation groups matching a label within a specific collection
+client.fetch_index("ann_index")
+client.fetch_index("ann_group_index")
+client.sql_query("""
+    SELECT
+        i.collection_id,
+        g.AnnotationGroupLabel,
+        g.GraphicType,
+        g.NumberOfAnnotations,
+        g.AnnotationPropertyType_CodeMeaning
+    FROM ann_group_index g
+    JOIN ann_index a ON g.SeriesInstanceUID = a.SeriesInstanceUID
+    JOIN index i ON a.SeriesInstanceUID = i.SeriesInstanceUID
+    WHERE i.collection_id = 'your_collection_id'
+      AND LOWER(g.AnnotationGroupLabel) LIKE '%keyword%'
+    ORDER BY g.NumberOfAnnotations DESC
+""")
+```
+
+## Annotations on Slide Microscopy (SM + ANN Cross-Reference)
+
+When looking for annotations related to slide microscopy data, use both SM and ANN tables together. The `ann_index.referenced_SeriesInstanceUID` links each annotation series to its source slide.
+
+```python
+# Find slide microscopy images and their annotations in a collection
+client.fetch_index("sm_index")
+client.fetch_index("ann_index")
+client.fetch_index("ann_group_index")
+client.sql_query("""
+    SELECT
+        i.collection_id,
+        s.ObjectiveLensPower,
+        g.AnnotationGroupLabel,
+        g.NumberOfAnnotations,
+        g.GraphicType
+    FROM ann_group_index g
+    JOIN ann_index a ON g.SeriesInstanceUID = a.SeriesInstanceUID
+    JOIN sm_index s ON a.referenced_SeriesInstanceUID = s.SeriesInstanceUID
+    JOIN index i ON a.SeriesInstanceUID = i.SeriesInstanceUID
+    WHERE i.collection_id = 'your_collection_id'
+    ORDER BY g.NumberOfAnnotations DESC
+""")
+```
+
+## Join Patterns
+
+### SM join (slide microscopy details with collection context)
+
+```python
+client.fetch_index("sm_index")
+result = client.sql_query("""
+    SELECT i.collection_id, i.PatientID, s.ObjectiveLensPower, s.min_PixelSpacing_2sf
+    FROM index i
+    JOIN sm_index s ON i.SeriesInstanceUID = s.SeriesInstanceUID
+    LIMIT 10
+""")
+```
+
+### ANN join (annotation groups with collection context)
+
+```python
+client.fetch_index("ann_index")
+client.fetch_index("ann_group_index")
+result = client.sql_query("""
+    SELECT
+        i.collection_id,
+        g.AnnotationGroupLabel,
+        g.GraphicType,
+        g.NumberOfAnnotations,
+        a.referenced_SeriesInstanceUID as source_series
+    FROM ann_group_index g
+    JOIN ann_index a ON g.SeriesInstanceUID = a.SeriesInstanceUID
+    JOIN index i ON a.SeriesInstanceUID = i.SeriesInstanceUID
+    LIMIT 10
+""")
+```
+
+## Related Tools
+
+The following tools work with DICOM format for digital pathology workflows:
+
+**Python Libraries:**
+- [highdicom](https://github.com/ImagingDataCommons/highdicom) - High-level DICOM abstractions for Python. Create and read DICOM Segmentations (SEG), Structured Reports (SR), and parametric maps for pathology and radiology. Developed by IDC.
+- [wsidicom](https://github.com/imi-bigpicture/wsidicom) - Python package for reading DICOM WSI datasets. Parses metadata into easy-to-use dataclasses for whole slide image analysis.
+- [TIA-Toolbox](https://github.com/TissueImageAnalytics/tiatoolbox) - End-to-end computational pathology library with DICOM support via `DICOMWSIReader`. Provides tile extraction, feature extraction, and pretrained deep learning models.
+- [EZ-WSI-DICOMweb](https://github.com/GoogleCloudPlatform/EZ-WSI-DICOMweb) - Extract image patches from DICOM whole slide images via DICOMweb. Designed for AI/ML workflows with cloud DICOM stores.
+
+**Viewers:**
+- [Slim](https://github.com/ImagingDataCommons/slim) - Web-based DICOM slide microscopy viewer and annotation tool. Supports brightfield and multiplexed immunofluorescence imaging via DICOMweb. Developed by IDC.
+- [QuPath](https://qupath.github.io/) - Cross-platform open source software for whole slide image analysis. Supports DICOM WSI via Bio-Formats and OpenSlide (v0.4.0+).
+
+**Conversion:**
+- [dicom_wsi](https://github.com/Steven-N-Hart/dicom_wsi) - Python implementation for converting proprietary WSI formats to DICOM-compliant files.
--- a/scientific-skills/imaging-data-commons/references/index_tables_guide.md
+++ b/scientific-skills/imaging-data-commons/references/index_tables_guide.md
@@ -0,0 +1,146 @@
+# Index Tables Guide for IDC
+
+**Tested with:** idc-index 0.11.9 (IDC data version v23)
+
+This guide covers the structure and access patterns for IDC index tables: programmatic schema discovery, DataFrame access, and join column references. For the overview of available tables and their purposes, see the "Index Tables" section in the main SKILL.md.
+
+**Complete index table documentation:** https://idc-index.readthedocs.io/en/latest/indices_reference.html
+
+## When to Use This Guide
+
+Load this guide when you need to:
+- Discover table schemas and column types programmatically
+- Access index tables as pandas DataFrames (not via SQL)
+- Understand key columns and join relationships between tables
+
+For SQL query examples (filter discovery, finding annotations, size estimation), see `references/sql_patterns.md`.
+
+## Prerequisites
+
+```bash
+pip install --upgrade idc-index
+```
+
+## Accessing Index Tables
+
+### Via SQL (recommended for filtering/aggregation)
+
+```python
+from idc_index import IDCClient
+client = IDCClient()
+
+# Query the primary index (always available)
+results = client.sql_query("SELECT * FROM index WHERE Modality = 'CT' LIMIT 10")
+
+# Fetch and query additional indices
+client.fetch_index("collections_index")
+collections = client.sql_query("SELECT collection_id, CancerTypes, TumorLocations FROM collections_index")
+
+client.fetch_index("analysis_results_index")
+analysis = client.sql_query("SELECT * FROM analysis_results_index LIMIT 5")
+```
+
+### As pandas DataFrames (direct access)
+
+```python
+# Primary index (always available after client initialization)
+df = client.index
+
+# Fetch and access on-demand indices
+client.fetch_index("sm_index")
+sm_df = client.sm_index
+```
+
+## Discovering Table Schemas
+
+The `indices_overview` dictionary contains complete schema information for all tables. **Always consult this when writing queries or exploring data structure.**
+
+**DICOM attribute mapping:** Many columns are populated directly from DICOM attributes in the source files. The column description in the schema indicates when a column corresponds to a DICOM attribute (e.g., "DICOM Modality attribute" or references a DICOM tag). This allows leveraging DICOM knowledge when querying — standard DICOM attribute names like `PatientID`, `StudyInstanceUID`, `Modality`, `BodyPartExamined` work as expected.
+
+```python
+from idc_index import IDCClient
+client = IDCClient()
+
+# List all available indices with descriptions
+for name, info in client.indices_overview.items():
+    print(f"\n{name}:")
+    print(f"  Installed: {info['installed']}")
+    print(f"  Description: {info['description']}")
+
+# Get complete schema for a specific index (columns, types, descriptions)
+schema = client.indices_overview["index"]["schema"]
+print(f"\nTable: {schema['table_description']}")
+print("\nColumns:")
+for col in schema['columns']:
+    desc = col.get('description', 'No description')
+    # Description indicates if column is from DICOM attribute
+    print(f"  {col['name']} ({col['type']}): {desc}")
+
+# Find columns that are DICOM attributes (check description for "DICOM" reference)
+dicom_cols = [c['name'] for c in schema['columns'] if 'DICOM' in c.get('description', '').upper()]
+print(f"\nDICOM-sourced columns: {dicom_cols}")
+```
+
+**Alternative: use `get_index_schema()` method:**
+```python
+schema = client.get_index_schema("index")
+# Returns same schema dict: {'table_description': ..., 'columns': [...]}
+```
+
+## Key Columns Reference
+
+The most commonly used columns in the primary `index` table (use `indices_overview` for the complete list and descriptions):
+
+| Column | Type | DICOM | Description |
+|--------|------|-------|-------------|
+| `collection_id` | STRING | No | IDC collection identifier |
+| `analysis_result_id` | STRING | No | If applicable, the analysis results collection that the series belongs to |
+| `source_DOI` | STRING | No | DOI linking to dataset details; use for learning more about the content and for attribution (citations are covered in the main SKILL.md) |
+| `PatientID` | STRING | Yes | Patient identifier |
+| `StudyInstanceUID` | STRING | Yes | DICOM Study UID |
+| `SeriesInstanceUID` | STRING | Yes | DICOM Series UID — use for downloads/viewing |
+| `Modality` | STRING | Yes | Imaging modality (CT, MR, PT, SM, SEG, ANN, RTSTRUCT, etc.) |
+| `BodyPartExamined` | STRING | Yes | Anatomical region |
+| `SeriesDescription` | STRING | Yes | Description of the series |
+| `Manufacturer` | STRING | Yes | Equipment manufacturer |
+| `StudyDate` | STRING | Yes | Date study was performed |
+| `PatientSex` | STRING | Yes | Patient sex |
+| `PatientAge` | STRING | Yes | Patient age at time of study |
+| `license_short_name` | STRING | No | License type (CC BY 4.0, CC BY-NC 4.0, etc.) |
+| `series_size_MB` | FLOAT | No | Size of series in megabytes |
+| `instanceCount` | INTEGER | No | Number of DICOM instances in series |
+
+**DICOM = Yes**: Column value extracted from the DICOM attribute with the same name. Refer to the [DICOM standard](https://dicom.nema.org/medical/dicom/current/output/chtml/part06/chapter_6.html) for numeric tag mappings. Use standard DICOM knowledge for expected values and formats.
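+
+As one example of applying DICOM knowledge: assuming `PatientAge` values follow the DICOM Age String (AS) value representation (three digits followed by `D`, `W`, `M`, or `Y`, e.g. `063Y`), they could be converted to approximate years as sketched below. `parse_dicom_age` is a hypothetical helper, not part of `idc-index`, and the unit conversions are approximations:
+
+```python
+import re
+
+# Number of each unit per year (approximate for days and weeks)
+_UNITS_PER_YEAR = {"D": 365.25, "W": 365.25 / 7, "M": 12, "Y": 1}
+
+
+def parse_dicom_age(age):
+    """Convert a DICOM Age String such as '063Y' to approximate years.
+
+    Returns None for empty, missing, or malformed values.
+    """
+    if not age:
+        return None
+    match = re.fullmatch(r"(\d{3})([DWMY])", age.strip())
+    if match is None:
+        return None
+    count, unit = match.groups()
+    return int(count) / _UNITS_PER_YEAR[unit]
+
+
+print(parse_dicom_age("063Y"))
+print(parse_dicom_age("006M"))
+print(parse_dicom_age("n/a"))
+```
+
+Returning `None` for malformed input keeps the helper safe to apply to a whole column, where missing or non-conforming values are to be expected.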
+
+## Join Column Reference
+
+Use this table to identify join columns between index tables. Always call `client.fetch_index("table_name")` before using a table in SQL.
+
+| Table A | Table B | Join Condition |
+|---------|---------|----------------|
+| `index` | `collections_index` | `index.collection_id = collections_index.collection_id` |
+| `index` | `sm_index` | `index.SeriesInstanceUID = sm_index.SeriesInstanceUID` |
+| `index` | `seg_index` | `index.SeriesInstanceUID = seg_index.segmented_SeriesInstanceUID` |
+| `index` | `ann_index` | `index.SeriesInstanceUID = ann_index.SeriesInstanceUID` |
+| `ann_index` | `ann_group_index` | `ann_index.SeriesInstanceUID = ann_group_index.SeriesInstanceUID` |
+| `index` | `clinical_index` | `index.collection_id = clinical_index.collection_id` (collection-level only; link patients via `dicom_patient_id` in the clinical tables, which matches `index.PatientID`) |
+| `index` | `contrast_index` | `index.SeriesInstanceUID = contrast_index.SeriesInstanceUID` |
+
+For complete query examples using these joins, see `references/sql_patterns.md`.
+
+## Troubleshooting
+
+**Issue:** Column not found in table
+- **Cause:** Column name misspelled or doesn't exist in that table
+- **Solution:** Use `client.indices_overview["table_name"]["schema"]["columns"]` to list available columns
+
+**Issue:** DataFrame access returns None
+- **Cause:** Index not fetched or property name incorrect
+- **Solution:** Fetch first with `client.fetch_index()`, then access via property matching the index name
+
+## Resources
+
+- Complete index table documentation: https://idc-index.readthedocs.io/en/latest/indices_reference.html
+- `references/sql_patterns.md` for query examples using these tables
+- `references/clinical_data_guide.md` for clinical data workflows
+- `references/digital_pathology_guide.md` for pathology-specific indices
--- a/scientific-skills/imaging-data-commons/references/sql_patterns.md
+++ b/scientific-skills/imaging-data-commons/references/sql_patterns.md
@@ -0,0 +1,207 @@
+# SQL Query Patterns for IDC
+
+**Tested with:** idc-index 0.11.9 (IDC data version v23)
+
+Quick reference for common SQL query patterns when working with IDC data. For detailed examples with context, see the "Core Capabilities" section in the main SKILL.md.
+
+## When to Use This Guide
+
+Load this guide when you need quick-reference SQL patterns for:
+- Discovering available filter values (modalities, body parts, manufacturers)
+- Finding annotations and segmentations across collections
+- Querying slide microscopy and annotation data
+- Estimating download sizes before download
+- Linking imaging data to clinical data
+
+For table schemas, DataFrame access, and join column references, see `references/index_tables_guide.md`.
+
+## Prerequisites
+
+```bash
+pip install --upgrade idc-index
+```
+
+```python
+from idc_index import IDCClient
+client = IDCClient()
+```
+
+## Discover Available Filter Values
+
+```python
+# What modalities exist?
+client.sql_query("SELECT DISTINCT Modality FROM index")
+
+# What body parts for a specific modality?
+client.sql_query("""
+    SELECT BodyPartExamined, COUNT(*) as n
+    FROM index WHERE Modality = 'CT' AND BodyPartExamined IS NOT NULL
+    GROUP BY BodyPartExamined ORDER BY n DESC
+""")
+
+# What manufacturers for MR?
+client.sql_query("""
+    SELECT Manufacturer, COUNT(*) as n
+    FROM index WHERE Modality = 'MR'
+    GROUP BY Manufacturer ORDER BY n DESC
+""")
+```
+
+## Find Annotations and Segmentations
+
+**Note:** Not all image-derived objects belong to analysis result collections. Some annotations are deposited alongside original images. Use DICOM Modality or SOPClassUID to find all derived objects regardless of collection type.
+
+```python
+# Find ALL segmentations and structure sets by DICOM Modality
+# SEG = DICOM Segmentation, RTSTRUCT = Radiotherapy Structure Set
+client.sql_query("""
+    SELECT collection_id, Modality, COUNT(*) as series_count
+    FROM index
+    WHERE Modality IN ('SEG', 'RTSTRUCT')
+    GROUP BY collection_id, Modality
+    ORDER BY series_count DESC
+""")
+
+# Find segmentations for a specific collection (includes non-analysis-result items)
+client.sql_query("""
+    SELECT SeriesInstanceUID, SeriesDescription, analysis_result_id
+    FROM index
+    WHERE collection_id = 'tcga_luad' AND Modality = 'SEG'
+""")
+
+# List analysis result collections (curated derived datasets)
+client.fetch_index("analysis_results_index")
+client.sql_query("""
+    SELECT analysis_result_id, analysis_result_title, Collections, Modalities
+    FROM analysis_results_index
+""")
+
+# Find analysis results for a specific source collection
+client.sql_query("""
+    SELECT analysis_result_id, analysis_result_title
+    FROM analysis_results_index
+    WHERE Collections LIKE '%tcga_luad%'
+""")
+
+# Use seg_index for detailed DICOM Segmentation metadata
+client.fetch_index("seg_index")
+
+# Get segmentation statistics by algorithm
+client.sql_query("""
+    SELECT AlgorithmName, AlgorithmType, COUNT(*) as seg_count
+    FROM seg_index
+    WHERE AlgorithmName IS NOT NULL
+    GROUP BY AlgorithmName, AlgorithmType
+    ORDER BY seg_count DESC
+    LIMIT 10
+""")
+
+# Find segmentations for specific source images (e.g., chest CT)
+client.sql_query("""
+    SELECT
+        s.SeriesInstanceUID as seg_series,
+        s.AlgorithmName,
+        s.total_segments,
+        s.segmented_SeriesInstanceUID as source_series
+    FROM seg_index s
+    JOIN index src ON s.segmented_SeriesInstanceUID = src.SeriesInstanceUID
+    WHERE src.Modality = 'CT' AND src.BodyPartExamined = 'CHEST'
+    LIMIT 10
+""")
+
+# Count TotalSegmentator segmentation series and segments per collection
+client.sql_query("""
+    SELECT
+        seg_info.collection_id,
+        COUNT(DISTINCT s.SeriesInstanceUID) as seg_count,
+        SUM(s.total_segments) as total_segments
+    FROM seg_index s
+    JOIN index seg_info ON s.SeriesInstanceUID = seg_info.SeriesInstanceUID
+    WHERE s.AlgorithmName LIKE '%TotalSegmentator%'
+    GROUP BY seg_info.collection_id
+    ORDER BY seg_count DESC
+""")
+
+# Use ann_index and ann_group_index for Microscopy Bulk Simple Annotations
+# ann_group_index has AnnotationGroupLabel, GraphicType, NumberOfAnnotations, AlgorithmName
+client.fetch_index("ann_index")
+client.fetch_index("ann_group_index")
+client.sql_query("""
+    SELECT g.AnnotationGroupLabel, g.GraphicType, g.NumberOfAnnotations, i.collection_id
+    FROM ann_group_index g
+    JOIN ann_index a ON g.SeriesInstanceUID = a.SeriesInstanceUID
+    JOIN index i ON a.SeriesInstanceUID = i.SeriesInstanceUID
+    WHERE g.AlgorithmName IS NOT NULL
+    LIMIT 10
+""")
+# See references/digital_pathology_guide.md for AnnotationGroupLabel filtering, SM+ANN joins, and more
+```
+
+## Query Slide Microscopy and Annotation Data
+
+Use `sm_index` for slide microscopy metadata and `ann_index`/`ann_group_index` for annotations on slides (DICOM ANN objects). Filter annotation groups by `AnnotationGroupLabel` to find annotations by name.
+
+```python
+client.fetch_index("sm_index")
+client.fetch_index("ann_index")
+client.fetch_index("ann_group_index")
+
+# Example: find annotation groups by label within a collection
+client.sql_query("""
+    SELECT g.AnnotationGroupLabel, g.GraphicType, g.NumberOfAnnotations
+    FROM ann_group_index g
+    JOIN index i ON g.SeriesInstanceUID = i.SeriesInstanceUID
+    WHERE i.collection_id = 'your_collection_id'
+      AND LOWER(g.AnnotationGroupLabel) LIKE '%keyword%'
+""")
+```
+
+See `references/digital_pathology_guide.md` for SM queries, ANN filtering patterns, SM+ANN cross-references, and join examples.
+
+## Estimate Download Size
+
+```python
+# Size for specific criteria
+client.sql_query("""
+    SELECT SUM(series_size_MB) as total_mb, COUNT(*) as series_count
+    FROM index
+    WHERE collection_id = 'nlst' AND Modality = 'CT'
+""")
+```
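+
+Before starting a large download, the estimate can be compared with the free disk space. A rough sketch using only the standard library; `fits_on_disk` and its `safety_factor` are hypothetical, and treating one MB as 10**6 bytes is an assumption about how `series_size_MB` is computed:
+
+```python
+import shutil
+
+
+def fits_on_disk(total_mb, path=".", safety_factor=1.1):
+    """Return True if total_mb, plus a safety margin, fits in the free space at path."""
+    free_mb = shutil.disk_usage(path).free / 1_000_000  # assumes 1 MB = 10**6 bytes
+    return total_mb * safety_factor <= free_mb
+
+
+# total_mb would normally come from the SUM(series_size_MB) query above;
+# 250_000 is a made-up figure for illustration
+print(fits_on_disk(total_mb=250_000, path="."))
+```
+
+The safety margin leaves headroom for temporary files and for any difference between the indexed size and the size on disk.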
+
+## Link to Clinical Data
+
+```python
+client.fetch_index("clinical_index")
+
+# Find collections with clinical data and their tables
+client.sql_query("""
+    SELECT collection_id, table_name, COUNT(DISTINCT column_label) as columns
+    FROM clinical_index
+    GROUP BY collection_id, table_name
+    ORDER BY collection_id
+""")
+```
+
+See `references/clinical_data_guide.md` for complete patterns including value mapping and patient cohort selection.
+
+## Troubleshooting
+
+**Issue:** Query returns error "table not found"
+- **Cause:** Index not fetched before query
+- **Solution:** Call `client.fetch_index("table_name")` before using tables other than the primary `index`
+
+**Issue:** LIKE pattern not matching expected results
+- **Cause:** Case sensitivity or whitespace
+- **Solution:** Use `LOWER(column)` for case-insensitive matching, `TRIM()` for whitespace
+
+**Issue:** JOIN returns fewer rows than expected
+- **Cause:** NULL values in join columns or no matching records
+- **Solution:** Use `LEFT JOIN` to keep rows without matches, and test the join columns with `IS NULL` to find the missing values
+
+## Resources
+
+- `references/index_tables_guide.md` for table schemas, DataFrame access, and join column references
+- `references/clinical_data_guide.md` for clinical data patterns and value mapping
+- `references/digital_pathology_guide.md` for pathology-specific queries
+- `references/bigquery_guide.md` for advanced queries requiring full DICOM metadata
--- a/scientific-skills/imaging-data-commons/references/use_cases.md
+++ b/scientific-skills/imaging-data-commons/references/use_cases.md
@@ -0,0 +1,186 @@
+# Common Use Cases for IDC
+
+**Tested with:** idc-index 0.11.9 (IDC data version v23)
+
+This guide provides end-to-end workflow examples for common IDC use cases. Each one follows the workflow from query through download or browser preview, with best practices along the way.
+
+## When to Use This Guide
+
+Load this guide when you need:
+- Complete end-to-end workflow examples for training dataset creation
+- Patterns for multi-step data selection and download workflows
+- Examples of license-aware data handling for commercial use
+- Visualization workflows for data preview before download
+
+For core API patterns (query, download, visualize, citations), see the "Core Capabilities" section in the main SKILL.md.
+
+## Prerequisites
+
+```bash
+pip install --upgrade idc-index
+```
+
+## Use Case 1: Find and Download Lung CT Scans for Deep Learning
+
+**Objective:** Build a training dataset of lung CT scans from the NLST collection
+
+**Steps:**
+```python
+from idc_index import IDCClient
+
+client = IDCClient()
+
+# 1. Query for lung CT scans with specific criteria
+query = """
+SELECT
+  PatientID,
+  SeriesInstanceUID,
+  SeriesDescription
+FROM index
+WHERE collection_id = 'nlst'
+  AND Modality = 'CT'
+  AND BodyPartExamined = 'CHEST'
+  AND license_short_name = 'CC BY 4.0'
+ORDER BY PatientID
+LIMIT 100
+"""
+
+results = client.sql_query(query)
+print(f"Found {len(results)} series from {results['PatientID'].nunique()} patients")
+
+# 2. Download data organized by patient
+client.download_from_selection(
+    seriesInstanceUID=list(results['SeriesInstanceUID'].values),
+    downloadDir="./training_data",
+    dirTemplate="%collection_id/%PatientID/%SeriesInstanceUID"
+)
+
+# 3. Save manifest for reproducibility
+results.to_csv('training_manifest.csv', index=False)
+```
+
+## Use Case 2: Query Brain MRI by Manufacturer for Quality Study
+
+**Objective:** Compare image quality across different MRI scanner manufacturers
+
+**Steps:**
+```python
+from idc_index import IDCClient
+import pandas as pd
+
+client = IDCClient()
+
+# Query for brain MRI grouped by manufacturer
+query = """
+SELECT
+  Manufacturer,
+  ManufacturerModelName,
+  COUNT(DISTINCT SeriesInstanceUID) as num_series,
+  COUNT(DISTINCT PatientID) as num_patients
+FROM index
+WHERE Modality = 'MR'
+  AND BodyPartExamined LIKE '%BRAIN%'
+GROUP BY Manufacturer, ManufacturerModelName
+HAVING num_series >= 10
+ORDER BY num_series DESC
+"""
+
+manufacturers = client.sql_query(query)
+print(manufacturers)
+
+# Download sample from each manufacturer for comparison
+for _, row in manufacturers.dropna(subset=['Manufacturer', 'ManufacturerModelName']).head(3).iterrows():
+    mfr, model = row['Manufacturer'], row['ManufacturerModelName']
+    # Double single quotes so the values are safe inside SQL string literals
+    mfr_sql, model_sql = mfr.replace("'", "''"), model.replace("'", "''")
+    sample_query = f"""
+    SELECT SeriesInstanceUID
+    FROM index
+    WHERE Manufacturer = '{mfr_sql}'
+      AND ManufacturerModelName = '{model_sql}'
+      AND Modality = 'MR'
+      AND BodyPartExamined LIKE '%BRAIN%'
+    LIMIT 5
+    """
+
+    series = client.sql_query(sample_query)
+    client.download_from_selection(
+        seriesInstanceUID=list(series['SeriesInstanceUID'].values),
+        downloadDir=f"./quality_study/{mfr.replace(' ', '_')}"
+    )
+```
+
+## Use Case 3: Visualize Series Without Downloading
+
+**Objective:** Preview imaging data before committing to download
+
+```python
+from idc_index import IDCClient
+import webbrowser
+
+client = IDCClient()
+
+series_list = client.sql_query("""
+    SELECT SeriesInstanceUID, PatientID, SeriesDescription
+    FROM index
+    WHERE collection_id = 'acrin_nsclc_fdg_pet' AND Modality = 'PT'
+    LIMIT 10
+""")
+
+# Print a viewer URL for each series (open it in a browser to preview)
+for _, row in series_list.iterrows():
+    viewer_url = client.get_viewer_URL(seriesInstanceUID=row['SeriesInstanceUID'])
+    print(f"Patient {row['PatientID']}: {row['SeriesDescription']}")
+    print(f"  View at: {viewer_url}")
+    # webbrowser.open(viewer_url)  # Uncomment to open automatically
+```
+
+For additional visualization options, see the [IDC Portal getting started guide](https://learn.canceridc.dev/portal/getting-started) or [SlicerIDCBrowser](https://github.com/ImagingDataCommons/SlicerIDCBrowser) for 3D Slicer integration.
+
+## Use Case 4: License-Aware Batch Download for Commercial Use
+
+**Objective:** Download only CC-BY licensed data suitable for commercial applications
+
+**Steps:**
+```python
+from idc_index import IDCClient
+
+client = IDCClient()
+
+# Query ONLY for CC BY licensed data (allows commercial use with attribution)
+query = """
+SELECT
+  SeriesInstanceUID,
+  collection_id,
+  PatientID, Modality,
+  license_short_name
+FROM index
+WHERE license_short_name LIKE 'CC BY%'
+  AND license_short_name NOT LIKE '%NC%'
+  AND Modality IN ('CT', 'MR')
+  AND BodyPartExamined IN ('CHEST', 'BRAIN', 'ABDOMEN')
+LIMIT 200
+"""
+
+cc_by_data = client.sql_query(query)
+
+print(f"Found {len(cc_by_data)} CC BY licensed series")
+print(f"Collections: {cc_by_data['collection_id'].unique()}")
+
+# Download the license-filtered selection
+client.download_from_selection(
+    seriesInstanceUID=list(cc_by_data['SeriesInstanceUID'].values),
+    downloadDir="./commercial_dataset",
+    dirTemplate="%collection_id/%Modality/%PatientID/%SeriesInstanceUID"
+)
+
+# Save license information
+cc_by_data.to_csv('commercial_dataset_manifest_CC-BY_ONLY.csv', index=False)
+```
+
+## Resources
+
+- Main SKILL.md for core API patterns (query, download, visualize)
+- `references/clinical_data_guide.md` for clinical data integration workflows
+- `references/sql_patterns.md` for additional SQL query patterns
+- `references/index_tables_guide.md` for complex join patterns