update imaging-data-commons skill to v1.3.0

2026-03-27 07:09:27 +08:00 · 2026-02-10 18:12:49 -05:00
parent 3a5f2e2227
commit 2597540aa1
6 changed files with 1198 additions and 435 deletions
--- a/scientific-skills/imaging-data-commons/SKILL.md
+++ b/scientific-skills/imaging-data-commons/SKILL.md
@@ -3,9 +3,10 @@ name: imaging-data-commons
 description: Query and download public cancer imaging data from NCI Imaging Data Commons using idc-index. Use for accessing large-scale radiology (CT, MR, PET) and pathology datasets for AI training or research. No authentication required. Query by metadata, visualize in browser, check licenses.
 license: This skill is provided under the MIT License. IDC data itself has individual licensing (mostly CC-BY, some CC-NC) that must be respected when using the data.
 metadata:
-    version: 1.2.0
+    version: 1.3.0
    skill-author: Andrey Fedorov, @fedorov
-    idc-index: "0.11.7"
+    idc-index: "0.11.9"
+    idc-data-version: "v23"
    repository: https://github.com/ImagingDataCommons/idc-claude-skill
 ---

@@ -15,16 +16,20 @@ metadata:

 Use the `idc-index` Python package to query and download public cancer imaging data from the National Cancer Institute Imaging Data Commons (IDC). No authentication required for data access.

+**Current IDC Data Version: v23** (always verify with `IDCClient().get_idc_version()`)
+
 **Primary tool:** `idc-index` ([GitHub](https://github.com/imagingdatacommons/idc-index))

-**Check current data scale for the latest version:**
+**Verify version and check current data scale:**

 ```python
 from idc_index import IDCClient
 client = IDCClient()

-# get IDC data version
-print(client.get_idc_version())
+# IMPORTANT: Always verify you're using the latest IDC data version
+idc_version = client.get_idc_version()
+print(f"IDC data version: {idc_version}")  # Should be "v23"
+# If version is older than v23, upgrade: pip install --upgrade idc-index

 # Get collection count and total series
 stats = client.sql_query("""
@@ -54,6 +59,30 @@ print(stats)
 - Checking data licenses before use in research or commercial applications
 - Visualizing medical images in a browser without local DICOM viewer software

+## Quick Navigation
+
+**Core Sections (inline):**
+- IDC Data Model - Collection and analysis result hierarchy
+- Index Tables - Available tables and joining patterns
+- Installation - Package setup and version verification
+- Core Capabilities - Essential API patterns (query, download, visualize, license, citations, batch)
+- Best Practices - Usage guidelines
+- Troubleshooting - Common issues and solutions
+
+**Reference Guides (load on demand):**
+
+| Guide | When to Load |
+|-------|--------------|
+| `index_tables_guide.md` | Complex JOINs, schema discovery, DataFrame access |
+| `use_cases.md` | End-to-end workflow examples (training datasets, batch downloads) |
+| `sql_patterns.md` | Quick SQL patterns for filter discovery, annotations, size estimation |
+| `clinical_data_guide.md` | Clinical/tabular data, imaging+clinical joins, value mapping |
+| `cloud_storage_guide.md` | Direct S3/GCS access, versioning, UUID mapping |
+| `dicomweb_guide.md` | DICOMweb endpoints, PACS integration |
+| `digital_pathology_guide.md` | Slide microscopy (SM), annotations (ANN), pathology workflows |
+| `bigquery_guide.md` | Full DICOM metadata, private elements (requires GCP) |
+| `cli_guide.md` | Command-line tools (`idc download`, manifest files) |
+
 ## IDC Data Model

 IDC adds two grouping levels above the standard DICOM hierarchy (Patient → Study → Series → Instance):
@@ -75,6 +104,8 @@ Use `collection_id` to find original imaging data, may include annotations depos

 The `idc-index` package provides multiple metadata index tables, accessible via SQL or as pandas DataFrames.

+**Complete index table documentation:** Use https://idc-index.readthedocs.io/en/latest/indices_reference.html for quick check of available tables and columns without executing any code.
+
 **Important:** Use `client.indices_overview` to get current table descriptions and column schemas. This is the authoritative source for available columns and their types — always query it when writing SQL or exploring data structure.

 ### Available Tables
@@ -89,6 +120,9 @@ The `idc-index` package provides multiple metadata index tables, accessible via
 | `sm_index` | 1 row = 1 slide microscopy series | fetch_index() | Slide Microscopy (pathology) series metadata |
 | `sm_instance_index` | 1 row = 1 slide microscopy instance | fetch_index() | Instance-level (SOPInstanceUID) metadata for slide microscopy |
 | `seg_index` | 1 row = 1 DICOM Segmentation series | fetch_index() | Segmentation metadata: algorithm, segment count, reference to source image series |
+| `ann_index` | 1 row = 1 DICOM ANN series | fetch_index() | Microscopy Bulk Simple Annotations series metadata; references annotated image series |
+| `ann_group_index` | 1 row = 1 annotation group | fetch_index() | Detailed annotation group metadata: graphic type, annotation count, property codes, algorithm |
+| `contrast_index` | 1 row = 1 series with contrast info | fetch_index() | Contrast agent metadata: agent name, ingredient, administration route (CT, MR, PT, XA, RF) |

 **Auto** = loaded automatically when `IDCClient()` is instantiated
 **fetch_index()** = requires `client.fetch_index("table_name")` to load
@@ -107,140 +141,13 @@ The `idc-index` package provides multiple metadata index tables, accessible via
 | `source_DOI` | index, analysis_results_index | Link by publication DOI |
 | `crdc_series_uuid` | index, prior_versions_index | Link by CRDC unique identifier |
 | `Modality` | index, prior_versions_index | Filter by imaging modality |
-| `SeriesInstanceUID` | index, seg_index | Link segmentation series to its index metadata |
+| `SeriesInstanceUID` | index, seg_index, ann_index, ann_group_index, contrast_index | Link segmentation/annotation/contrast series to its index metadata |
 | `segmented_SeriesInstanceUID` | seg_index → index | Link segmentation to its source image series (join seg_index.segmented_SeriesInstanceUID = index.SeriesInstanceUID) |
+| `referenced_SeriesInstanceUID` | ann_index → index | Link annotation to its source image series (join ann_index.referenced_SeriesInstanceUID = index.SeriesInstanceUID) |

 **Note:** `Subjects`, `Updated`, and `Description` appear in multiple tables but have different meanings (counts vs identifiers, different update contexts).

-**Example joins:**
-```python
-from idc_index import IDCClient
-client = IDCClient()
-
-# Join index with collections_index to get cancer types
-client.fetch_index("collections_index")
-result = client.sql_query("""
-    SELECT i.SeriesInstanceUID, i.Modality, c.CancerTypes, c.TumorLocations
-    FROM index i
-    JOIN collections_index c ON i.collection_id = c.collection_id
-    WHERE i.Modality = 'MR'
-    LIMIT 10
-""")
-
-# Join index with sm_index for slide microscopy details
-client.fetch_index("sm_index")
-result = client.sql_query("""
-    SELECT i.collection_id, i.PatientID, s.ObjectiveLensPower, s.min_PixelSpacing_2sf
-    FROM index i
-    JOIN sm_index s ON i.SeriesInstanceUID = s.SeriesInstanceUID
-    LIMIT 10
-""")
-
-# Join seg_index with index to find segmentations and their source images
-client.fetch_index("seg_index")
-result = client.sql_query("""
-    SELECT
-        s.SeriesInstanceUID as seg_series,
-        s.AlgorithmName,
-        s.total_segments,
-        src.collection_id,
-        src.Modality as source_modality,
-        src.BodyPartExamined
-    FROM seg_index s
-    JOIN index src ON s.segmented_SeriesInstanceUID = src.SeriesInstanceUID
-    WHERE s.AlgorithmType = 'AUTOMATIC'
-    LIMIT 10
-""")
-```
-
-### Accessing Index Tables
-
-**Via SQL (recommended for filtering/aggregation):**
-```python
-from idc_index import IDCClient
-client = IDCClient()
-
-# Query the primary index (always available)
-results = client.sql_query("SELECT * FROM index WHERE Modality = 'CT' LIMIT 10")
-
-# Fetch and query additional indices
-client.fetch_index("collections_index")
-collections = client.sql_query("SELECT collection_id, CancerTypes, TumorLocations FROM collections_index")
-
-client.fetch_index("analysis_results_index")
-analysis = client.sql_query("SELECT * FROM analysis_results_index LIMIT 5")
-```
-
-**As pandas DataFrames (direct access):**
-```python
-# Primary index (always available after client initialization)
-df = client.index
-
-# Fetch and access on-demand indices
-client.fetch_index("sm_index")
-sm_df = client.sm_index
-```
-
-### Discovering Table Schemas (Essential for Query Writing)
-
-The `indices_overview` dictionary contains complete schema information for all tables. **Always consult this when writing queries or exploring data structure.**
-
-**DICOM attribute mapping:** Many columns are populated directly from DICOM attributes in the source files. The column description in the schema indicates when a column corresponds to a DICOM attribute (e.g., "DICOM Modality attribute" or references a DICOM tag). This allows leveraging DICOM knowledge when querying — standard DICOM attribute names like `PatientID`, `StudyInstanceUID`, `Modality`, `BodyPartExamined` work as expected.
-
-```python
-from idc_index import IDCClient
-client = IDCClient()
-
-# List all available indices with descriptions
-for name, info in client.indices_overview.items():
-    print(f"\n{name}:")
-    print(f"  Installed: {info['installed']}")
-    print(f"  Description: {info['description']}")
-
-# Get complete schema for a specific index (columns, types, descriptions)
-schema = client.indices_overview["index"]["schema"]
-print(f"\nTable: {schema['table_description']}")
-print("\nColumns:")
-for col in schema['columns']:
-    desc = col.get('description', 'No description')
-    # Description indicates if column is from DICOM attribute
-    print(f"  {col['name']} ({col['type']}): {desc}")
-
-# Find columns that are DICOM attributes (check description for "DICOM" reference)
-dicom_cols = [c['name'] for c in schema['columns'] if 'DICOM' in c.get('description', '').upper()]
-print(f"\nDICOM-sourced columns: {dicom_cols}")
-```
-
-**Alternative: use `get_index_schema()` method:**
-```python
-schema = client.get_index_schema("index")
-# Returns same schema dict: {'table_description': ..., 'columns': [...]}
-```
-
-### Key Columns in Primary `index` Table
-
-Most common columns for queries (use `indices_overview` for complete list and descriptions):
-
-| Column | Type | DICOM | Description |
-|--------|------|-------|-------------|
-| `collection_id` | STRING | No | IDC collection identifier |
-| `analysis_result_id` | STRING | No | If applicable, indicates what analysis results collection given series is part of |
-| `source_DOI` | STRING | No | DOI linking to dataset details; use for learning more about the content and for attribution (see citations below) |
-| `PatientID` | STRING | Yes | Patient identifier |
-| `StudyInstanceUID` | STRING | Yes | DICOM Study UID |
-| `SeriesInstanceUID` | STRING | Yes | DICOM Series UID — use for downloads/viewing |
-| `Modality` | STRING | Yes | Imaging modality (CT, MR, PT, SM, etc.) |
-| `BodyPartExamined` | STRING | Yes | Anatomical region |
-| `SeriesDescription` | STRING | Yes | Description of the series |
-| `Manufacturer` | STRING | Yes | Equipment manufacturer |
-| `StudyDate` | STRING | Yes | Date study was performed |
-| `PatientSex` | STRING | Yes | Patient sex |
-| `PatientAge` | STRING | Yes | Patient age at time of study |
-| `license_short_name` | STRING | No | License type (CC BY 4.0, CC BY-NC 4.0, etc.) |
-| `series_size_MB` | FLOAT | No | Size of series in megabytes |
-| `instanceCount` | INTEGER | No | Number of DICOM instances in series |
-
-**DICOM = Yes**: Column value extracted from the DICOM attribute with the same name. Refer to the [DICOM standard](https://dicom.nema.org/medical/dicom/current/output/chtml/part06/chapter_6.html) for numeric tag mappings. Use standard DICOM knowledge for expected values and formats.
+For detailed join examples, schema discovery patterns, key columns reference, and DataFrame access, see `references/index_tables_guide.md`.

 ### Clinical Data Access

@@ -301,7 +208,13 @@ pip install --upgrade idc-index

 **Important:** New IDC data release will always trigger a new version of `idc-index`. Always use `--upgrade` flag while installing, unless an older version is needed for reproducibility.

-**Tested with:** idc-index 0.11.7 (IDC data version v23)
+**IMPORTANT:** IDC data version v23 is current. Always verify your version:
+```python
+print(client.get_idc_version())  # Should return "v23"
+```
+If you see an older version, upgrade with: `pip install --upgrade idc-index`
+
+**Tested with:** idc-index 0.11.9 (IDC data version v23)

 **Optional (for data analysis):**
 ```bash
@@ -484,6 +397,15 @@ client.download_from_selection(
 # Results in: ./data/flat/*.dcm
 ```

+**Downloaded file names:**
+
+Individual DICOM files are named using their CRDC instance UUID: `<crdc_instance_uuid>.dcm` (e.g., `0d73f84e-70ae-4eeb-96a0-1c613b5d9229.dcm`). This UUID-based naming:
+- Enables version tracking (UUIDs change when file content changes)
+- Matches cloud storage organization (`s3://idc-open-data/<crdc_series_uuid>/<crdc_instance_uuid>.dcm`)
+- Differs from DICOM UIDs (SOPInstanceUID) which are preserved inside the file metadata
+
+To identify files, use the `crdc_instance_uuid` column in queries or read DICOM metadata (SOPInstanceUID) from the files.
+
 ### Command-Line Download

 The `idc download` command provides command-line access to download functionality without writing Python code. Available after installing `idc-index`.
@@ -705,6 +627,13 @@ For queries requiring full DICOM metadata, complex JOINs, clinical data tables,

 See `references/bigquery_guide.md` for setup, table schemas, query patterns, private element access, and cost optimization.

+**Before using BigQuery**, always check if a specialized index table already has the metadata you need:
+1. Use `client.indices_overview` or the [idc-index indices reference](https://idc-index.readthedocs.io/en/latest/indices_reference.html) to discover all available tables and their columns
+2. Fetch the relevant index: `client.fetch_index("table_name")`
+3. Query locally with `client.sql_query()` (free, no GCP account needed)
+
+Common specialized indices: `seg_index` (segmentations), `ann_index` / `ann_group_index` (microscopy annotations), `sm_index` (slide microscopy), `collections_index` (collection metadata). Only use BigQuery if you need private DICOM elements or attributes not in any index.
+
 ### 8. Tool Selection Guide

 | Task | Tool | Reference |
@@ -782,166 +711,15 @@ sitk.WriteImage(smoothed, "processed_volume.nii.gz")

 ## Common Use Cases

-### Use Case 1: Find and Download Lung CT Scans for Deep Learning
-
-**Objective:** Build training dataset of lung CT scans from NLST collection
-
-**Steps:**
-```python
-from idc_index import IDCClient
-
-client = IDCClient()
-
-# 1. Query for lung CT scans with specific criteria
-query = """
-SELECT
-  PatientID,
-  SeriesInstanceUID,
-  SeriesDescription
-FROM index
-WHERE collection_id = 'nlst'
-  AND Modality = 'CT'
-  AND BodyPartExamined = 'CHEST'
-  AND license_short_name = 'CC BY 4.0'
-ORDER BY PatientID
-LIMIT 100
-"""
-
-results = client.sql_query(query)
-print(f"Found {len(results)} series from {results['PatientID'].nunique()} patients")
-
-# 2. Download data organized by patient
-client.download_from_selection(
-    seriesInstanceUID=list(results['SeriesInstanceUID'].values),
-    downloadDir="./training_data",
-    dirTemplate="%collection_id/%PatientID/%SeriesInstanceUID"
-)
-
-# 3. Save manifest for reproducibility
-results.to_csv('training_manifest.csv', index=False)
-```
-
-### Use Case 2: Query Brain MRI by Manufacturer for Quality Study
-
-**Objective:** Compare image quality across different MRI scanner manufacturers
-
-**Steps:**
-```python
-from idc_index import IDCClient
-import pandas as pd
-
-client = IDCClient()
-
-# Query for brain MRI grouped by manufacturer
-query = """
-SELECT
-  Manufacturer,
-  ManufacturerModelName,
-  COUNT(DISTINCT SeriesInstanceUID) as num_series,
-  COUNT(DISTINCT PatientID) as num_patients
-FROM index
-WHERE Modality = 'MR'
-  AND BodyPartExamined LIKE '%BRAIN%'
-GROUP BY Manufacturer, ManufacturerModelName
-HAVING num_series >= 10
-ORDER BY num_series DESC
-"""
-
-manufacturers = client.sql_query(query)
-print(manufacturers)
-
-# Download sample from each manufacturer for comparison
-for _, row in manufacturers.head(3).iterrows():
-    mfr = row['Manufacturer']
-    model = row['ManufacturerModelName']
-
-    query = f"""
-    SELECT SeriesInstanceUID
-    FROM index
-    WHERE Manufacturer = '{mfr}'
-      AND ManufacturerModelName = '{model}'
-      AND Modality = 'MR'
-      AND BodyPartExamined LIKE '%BRAIN%'
-    LIMIT 5
-    """
-
-    series = client.sql_query(query)
-    client.download_from_selection(
-        seriesInstanceUID=list(series['SeriesInstanceUID'].values),
-        downloadDir=f"./quality_study/{mfr.replace(' ', '_')}"
-    )
-```
-
-### Use Case 3: Visualize Series Without Downloading
-
-**Objective:** Preview imaging data before committing to download
-
-```python
-from idc_index import IDCClient
-import webbrowser
-
-client = IDCClient()
-
-series_list = client.sql_query("""
-    SELECT SeriesInstanceUID, PatientID, SeriesDescription
-    FROM index
-    WHERE collection_id = 'acrin_nsclc_fdg_pet' AND Modality = 'PT'
-    LIMIT 10
-""")
-
-# Preview each in browser
-for _, row in series_list.iterrows():
-    viewer_url = client.get_viewer_URL(seriesInstanceUID=row['SeriesInstanceUID'])
-    print(f"Patient {row['PatientID']}: {row['SeriesDescription']}")
-    print(f"  View at: {viewer_url}")
-    # webbrowser.open(viewer_url)  # Uncomment to open automatically
-```
-
-For additional visualization options, see the [IDC Portal getting started guide](https://learn.canceridc.dev/portal/getting-started) or [SlicerIDCBrowser](https://github.com/ImagingDataCommons/SlicerIDCBrowser) for 3D Slicer integration.
-
-### Use Case 4: License-Aware Batch Download for Commercial Use
-
-**Objective:** Download only CC-BY licensed data suitable for commercial applications
-
-**Steps:**
-```python
-from idc_index import IDCClient
-
-client = IDCClient()
-
-# Query ONLY for CC BY licensed data (allows commercial use with attribution)
-query = """
-SELECT
-  SeriesInstanceUID,
-  collection_id,
-  PatientID,
-  Modality
-FROM index
-WHERE license_short_name LIKE 'CC BY%'
-  AND license_short_name NOT LIKE '%NC%'
-  AND Modality IN ('CT', 'MR')
-  AND BodyPartExamined IN ('CHEST', 'BRAIN', 'ABDOMEN')
-LIMIT 200
-"""
-
-cc_by_data = client.sql_query(query)
-
-print(f"Found {len(cc_by_data)} CC BY licensed series")
-print(f"Collections: {cc_by_data['collection_id'].unique()}")
-
-# Download with license verification
-client.download_from_selection(
-    seriesInstanceUID=list(cc_by_data['SeriesInstanceUID'].values),
-    downloadDir="./commercial_dataset",
-    dirTemplate="%collection_id/%Modality/%PatientID/%SeriesInstanceUID"
-)
-
-# Save license information
-cc_by_data.to_csv('commercial_dataset_manifest_CC-BY_ONLY.csv', index=False)
-```
+See `references/use_cases.md` for complete end-to-end workflow examples including:
+- Building deep learning training datasets from lung CT scans
+- Comparing image quality across scanner manufacturers
+- Previewing data in browser before downloading
+- License-aware batch downloads for commercial use

 ## Best Practices

+- **Verify IDC version before generating responses** - Always call `client.get_idc_version()` at the start of a session to confirm you're using the expected data version (currently v23). If using an older version, recommend `pip install --upgrade idc-index`
 - **Check licenses before use** - Always query the `license_short_name` field and respect licensing terms (CC BY vs CC BY-NC)
 - **Generate citations for attribution** - Use `citations_from_selection()` to get properly formatted citations from `source_DOI` values; include these in publications
 - **Start with small queries** - Use `LIMIT` clause when exploring to avoid long downloads and understand data structure
@@ -989,142 +767,14 @@ cc_by_data.to_csv('commercial_dataset_manifest_CC-BY_ONLY.csv', index=False)

 ## Common SQL Query Patterns

-Quick reference for common queries. For detailed examples with context, see the Core Capabilities section above.
+See `references/sql_patterns.md` for quick-reference SQL patterns including:
+- Filter value discovery (modalities, body parts, manufacturers)
+- Annotation and segmentation queries (including seg_index, ann_index joins)
+- Slide microscopy queries (sm_index patterns)
+- Download size estimation
+- Clinical data linking

-### Discover available filter values
-```python
-# What modalities exist?
-client.sql_query("SELECT DISTINCT Modality FROM index")
-
-# What body parts for a specific modality?
-client.sql_query("""
-    SELECT DISTINCT BodyPartExamined, COUNT(*) as n
-    FROM index WHERE Modality = 'CT' AND BodyPartExamined IS NOT NULL
-    GROUP BY BodyPartExamined ORDER BY n DESC
-""")
-
-# What manufacturers for MR?
-client.sql_query("""
-    SELECT DISTINCT Manufacturer, COUNT(*) as n
-    FROM index WHERE Modality = 'MR'
-    GROUP BY Manufacturer ORDER BY n DESC
-""")
-```
-
-### Find annotations and segmentations
-
-**Note:** Not all image-derived objects belong to analysis result collections. Some annotations are deposited alongside original images. Use DICOM Modality or SOPClassUID to find all derived objects regardless of collection type.
-
-```python
-# Find ALL segmentations and structure sets by DICOM Modality
-# SEG = DICOM Segmentation, RTSTRUCT = Radiotherapy Structure Set
-client.sql_query("""
-    SELECT collection_id, Modality, COUNT(*) as series_count
-    FROM index
-    WHERE Modality IN ('SEG', 'RTSTRUCT')
-    GROUP BY collection_id, Modality
-    ORDER BY series_count DESC
-""")
-
-# Find segmentations for a specific collection (includes non-analysis-result items)
-client.sql_query("""
-    SELECT SeriesInstanceUID, SeriesDescription, analysis_result_id
-    FROM index
-    WHERE collection_id = 'tcga_luad' AND Modality = 'SEG'
-""")
-
-# List analysis result collections (curated derived datasets)
-client.fetch_index("analysis_results_index")
-client.sql_query("""
-    SELECT analysis_result_id, analysis_result_title, Collections, Modalities
-    FROM analysis_results_index
-""")
-
-# Find analysis results for a specific source collection
-client.sql_query("""
-    SELECT analysis_result_id, analysis_result_title
-    FROM analysis_results_index
-    WHERE Collections LIKE '%tcga_luad%'
-""")
-
-# Use seg_index for detailed DICOM Segmentation metadata
-client.fetch_index("seg_index")
-
-# Get segmentation statistics by algorithm
-client.sql_query("""
-    SELECT AlgorithmName, AlgorithmType, COUNT(*) as seg_count
-    FROM seg_index
-    WHERE AlgorithmName IS NOT NULL
-    GROUP BY AlgorithmName, AlgorithmType
-    ORDER BY seg_count DESC
-    LIMIT 10
-""")
-
-# Find segmentations for specific source images (e.g., chest CT)
-client.sql_query("""
-    SELECT
-        s.SeriesInstanceUID as seg_series,
-        s.AlgorithmName,
-        s.total_segments,
-        s.segmented_SeriesInstanceUID as source_series
-    FROM seg_index s
-    JOIN index src ON s.segmented_SeriesInstanceUID = src.SeriesInstanceUID
-    WHERE src.Modality = 'CT' AND src.BodyPartExamined = 'CHEST'
-    LIMIT 10
-""")
-
-# Find TotalSegmentator results with source image context
-client.sql_query("""
-    SELECT
-        seg_info.collection_id,
-        COUNT(DISTINCT s.SeriesInstanceUID) as seg_count,
-        SUM(s.total_segments) as total_segments
-    FROM seg_index s
-    JOIN index seg_info ON s.SeriesInstanceUID = seg_info.SeriesInstanceUID
-    WHERE s.AlgorithmName LIKE '%TotalSegmentator%'
-    GROUP BY seg_info.collection_id
-    ORDER BY seg_count DESC
-""")
-```
-
-### Query slide microscopy data
-```python
-# sm_index has detailed metadata; join with index for collection_id
-client.fetch_index("sm_index")
-client.sql_query("""
-    SELECT i.collection_id, COUNT(*) as slides,
-           MIN(s.min_PixelSpacing_2sf) as min_resolution
-    FROM sm_index s
-    JOIN index i ON s.SeriesInstanceUID = i.SeriesInstanceUID
-    GROUP BY i.collection_id
-    ORDER BY slides DESC
-""")
-```
-
-### Estimate download size
-```python
-# Size for specific criteria
-client.sql_query("""
-    SELECT SUM(series_size_MB) as total_mb, COUNT(*) as series_count
-    FROM index
-    WHERE collection_id = 'nlst' AND Modality = 'CT'
-""")
-```
-
-### Link to clinical data
-```python
-client.fetch_index("clinical_index")
-
-# Find collections with clinical data and their tables
-client.sql_query("""
-    SELECT collection_id, table_name, COUNT(DISTINCT column_label) as columns
-    FROM clinical_index
-    GROUP BY collection_id, table_name
-    ORDER BY collection_id
-""")
-```
-
-See `references/clinical_data_guide.md` for complete patterns including value mapping and patient cohort selection.
+For segmentation and annotation details, also see `references/digital_pathology_guide.md`.

 ## Related Skills

@@ -1134,8 +784,7 @@ The following skills complement IDC workflows for downstream analysis and visual
 - **pydicom** - Read, write, and manipulate downloaded DICOM files. Use for extracting pixel data, reading metadata, anonymization, and format conversion. Essential for working with IDC radiology data (CT, MR, PET).

 ### Pathology and Slide Microscopy
- **histolab** - Lightweight tile extraction and preprocessing for whole slide images. Use for basic slide processing, tissue detection, and dataset preparation from IDC slide microscopy data.
- **pathml** - Full-featured computational pathology toolkit. Use for advanced WSI analysis including multiplexed imaging, nucleus segmentation, and ML model training on pathology data downloaded from IDC.
+See `references/digital_pathology_guide.md` for DICOM-compatible tools (highdicom, wsidicom, TIA-Toolbox, Slim viewer).

 ### Metadata Visualization
 - **matplotlib** - Low-level plotting for full customization. Use for creating static figures summarizing IDC query results (bar charts of modalities, histograms of series counts, etc.).
@@ -1159,11 +808,8 @@ columns = [(c['name'], c['type'], c.get('description', '')) for c in schema['col

 ### Reference Documentation

- **clinical_data_guide.md** - Clinical/tabular data navigation, value mapping, and joining with imaging data
- **cloud_storage_guide.md** - Direct cloud bucket access (S3/GCS), file organization, CRDC UUIDs, versioning, and reproducibility
- **cli_guide.md** - Complete idc-index command-line interface reference (`idc download`, `idc download-from-manifest`, `idc download-from-selection`)
- **bigquery_guide.md** - Advanced BigQuery usage guide for complex metadata queries
- **dicomweb_guide.md** - DICOMweb endpoint URLs, code examples, and Google Healthcare API implementation details
+See the Quick Navigation section at the top for the full list of reference guides with decision triggers.
+
 - **[indices_reference](https://idc-index.readthedocs.io/en/latest/indices_reference.html)** - External documentation for index tables (may be ahead of the installed version)

 ### External Links