Mirror of https://github.com/K-Dense-AI/claude-scientific-skills.git (synced 2026-01-26 16:58:56 +08:00)
Merge pull request #35 from fedorov/add-idc-clean
Added Imaging Data Commons skill
@@ -41,6 +41,7 @@
 "./scientific-skills/gget",
 "./scientific-skills/gtars",
 "./scientific-skills/histolab",
+"./scientific-skills/imaging-data-commons",
 "./scientific-skills/hypogenic",
 "./scientific-skills/lamindb",
 "./scientific-skills/markitdown",
1150
scientific-skills/imaging-data-commons/SKILL.md
Normal file
File diff suppressed because it is too large
@@ -0,0 +1,289 @@
# BigQuery Guide for IDC

**Tested with:** IDC data version v23

For most queries and downloads, use `idc-index` (see main SKILL.md). This guide covers BigQuery for advanced use cases requiring full DICOM metadata or complex joins.

## Prerequisites

**Requirements:**
1. Google account
2. Google Cloud project with billing enabled (first 1 TB/month free)
3. `google-cloud-bigquery` Python package or BigQuery console access

**Authentication setup:**
```bash
# Install Google Cloud SDK, then:
gcloud auth application-default login
```

## When to Use BigQuery

Use BigQuery instead of `idc-index` when you need:
- Full DICOM metadata (all 4000+ tags, not just the ~50 in idc-index)
- Complex joins across clinical data tables
- DICOM sequence attributes (nested structures)
- Queries on fields not in the idc-index mini-index

## Accessing IDC in BigQuery

### Dataset Structure

All IDC tables are in the `bigquery-public-data` BigQuery project.

**Current version (recommended for exploration):**
- `bigquery-public-data.idc_current.*`
- `bigquery-public-data.idc_current_clinical.*`

**Versioned datasets (recommended for reproducibility):**

- `bigquery-public-data.idc_v{IDC version}.*`
- `bigquery-public-data.idc_v{IDC version}_clinical.*`

Always use versioned datasets for reproducible research: `idc_current` follows the latest release, so the same query can return different results after an IDC update.

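The dataset naming scheme above can be captured in a small helper so that an analysis pins its version in one place. This is a minimal sketch: `idc_table` and its parameters are hypothetical names introduced here, not part of any IDC library, and the version number 23 is only the one this guide reports testing with.

```python
from typing import Optional

PROJECT = "bigquery-public-data"


def idc_table(table: str, version: Optional[int] = None, clinical: bool = False) -> str:
    """Return a fully qualified IDC table name.

    version=None selects the moving `idc_current` dataset; an integer
    selects the pinned `idc_v{version}` dataset.
    """
    dataset = "idc_current" if version is None else f"idc_v{int(version)}"
    if clinical:
        dataset += "_clinical"
    return f"{PROJECT}.{dataset}.{table}"


# Pin an analysis to one release instead of idc_current.
query = (
    "SELECT COUNT(DISTINCT SeriesInstanceUID) AS n "
    f"FROM `{idc_table('dicom_all', version=23)}`"
)
print(query)
```

Keeping the version in a single argument makes it easy to rerun the same analysis against a newer release and compare the results.
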
## Key Tables

### dicom_all
Primary table joining complete DICOM metadata with IDC-specific columns (collection_id, gcs_url, license). Contains all DICOM tags from `dicom_metadata` plus collection and administrative metadata. See [dicom_all.sql](https://github.com/ImagingDataCommons/etl_flow/blob/master/bq/generate_tables_and_views/derived_tables/BQ_Table_Building/derived_data_views/sql/dicom_all.sql) for the exact derivation.

```sql
SELECT
  collection_id,
  PatientID,
  StudyInstanceUID,
  SeriesInstanceUID,
  Modality,
  BodyPartExamined,
  SeriesDescription,
  gcs_url,
  license_short_name
FROM `bigquery-public-data.idc_current.dicom_all`
WHERE Modality = 'CT'
  AND BodyPartExamined = 'CHEST'
LIMIT 10
```

### Derived Tables

**segmentations** - DICOM Segmentation objects
```sql
SELECT *
FROM `bigquery-public-data.idc_current.segmentations`
LIMIT 10
```

**measurement_groups** - SR TID1500 measurement groups
**qualitative_measurements** - Coded evaluations
**quantitative_measurements** - Numeric measurements

### Collection Metadata

**original_collections_metadata** - Collection-level descriptions

```sql
SELECT
  collection_id,
  CancerTypes,
  TumorLocations,
  Subjects,
  src.source_doi,
  src.ImageTypes,
  src.license.license_short_name
FROM `bigquery-public-data.idc_current.original_collections_metadata`,
  UNNEST(Sources) AS src
WHERE CancerTypes LIKE '%Lung%'
```

## Common Query Patterns

### Find Collections by Criteria

```sql
SELECT
  collection_id,
  COUNT(DISTINCT PatientID) as patient_count,
  COUNT(DISTINCT SeriesInstanceUID) as series_count,
  ARRAY_AGG(DISTINCT Modality) as modalities
FROM `bigquery-public-data.idc_current.dicom_all`
WHERE BodyPartExamined LIKE '%BRAIN%'
GROUP BY collection_id
HAVING patient_count > 50
ORDER BY patient_count DESC
```

### Get Download URLs

```sql
SELECT
  SeriesInstanceUID,
  gcs_url
FROM `bigquery-public-data.idc_current.dicom_all`
WHERE collection_id = 'rider_pilot'
  AND Modality = 'CT'
```

### Find Studies with Multiple Modalities

```sql
SELECT
  StudyInstanceUID,
  ARRAY_AGG(DISTINCT Modality) as modalities,
  COUNT(DISTINCT SeriesInstanceUID) as series_count
FROM `bigquery-public-data.idc_current.dicom_all`
GROUP BY StudyInstanceUID
HAVING ARRAY_LENGTH(ARRAY_AGG(DISTINCT Modality)) > 1
LIMIT 100
```

### License Filtering

```sql
SELECT
  collection_id,
  license_short_name,
  COUNT(*) as instance_count
FROM `bigquery-public-data.idc_current.dicom_all`
WHERE license_short_name = 'CC BY 4.0'
GROUP BY collection_id, license_short_name
```

### Find Segmentations with Source Images

```sql
SELECT
  src.collection_id,
  seg.SeriesInstanceUID as seg_series,
  seg.SegmentedPropertyType,
  src.SeriesInstanceUID as source_series,
  src.Modality as source_modality
FROM `bigquery-public-data.idc_current.segmentations` seg
JOIN `bigquery-public-data.idc_current.dicom_all` src
  ON seg.segmented_SeriesInstanceUID = src.SeriesInstanceUID
WHERE src.collection_id = 'qin_prostate_repeatability'
LIMIT 10
```

## Using Query Results with idc-index

Combine BigQuery for complex queries with idc-index for downloads (no GCP auth needed for downloads):

```python
from google.cloud import bigquery
from idc_index import IDCClient

# Initialize BigQuery client
# Requires: pip install google-cloud-bigquery pandas db-dtypes (to_dataframe needs the last two)
# Auth: gcloud auth application-default login
# Project: needed for billing even on public datasets (free tier applies)
bq_client = bigquery.Client(project="your-gcp-project-id")

# Query for series with specific criteria
query = """
SELECT DISTINCT SeriesInstanceUID
FROM `bigquery-public-data.idc_current.dicom_all`
WHERE collection_id = 'tcga_luad'
  AND Modality = 'CT'
  AND Manufacturer = 'GE MEDICAL SYSTEMS'
LIMIT 100
"""

df = bq_client.query(query).to_dataframe()
print(f"Found {len(df)} GE CT series")

# Download with idc-index (no GCP auth required)
idc_client = IDCClient()
idc_client.download_from_selection(
    seriesInstanceUID=list(df['SeriesInstanceUID'].values),
    downloadDir="./tcga_luad_ge_ct"
)
```

## Cost and Optimization

**Pricing:** On-demand queries are billed by bytes scanned ($6.25 per TiB in US regions at the time of writing; first 1 TiB/month free). Most users stay within the free tier.

**Minimize data scanned:**
- Select only needed columns (not `SELECT *`)
- Filter early with `WHERE` clauses
- Use `LIMIT` when testing to keep results small, but note that it generally does not reduce the bytes scanned (or billed)
- Use `dicom_all` instead of `dicom_metadata` when possible (smaller)
- Preview queries in BQ console (free, shows bytes to scan)

**Check cost before running:**
```python
# Reuses `bq_client` and `query` from the previous section; a dry run is free.
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
query_job = bq_client.query(query, job_config=job_config)
print(f"Query will scan {query_job.total_bytes_processed / 1e9:.2f} GB")
```

**Use materialized tables:** IDC provides both views (`table_name_view`) and materialized tables (`table_name`). Always use the materialized tables (faster, lower cost).

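The byte count from a dry run can be turned into a rough dollar figure. The sketch below is illustrative only: `estimate_cost_usd` is a hypothetical helper, the price is passed in by the caller because on-demand rates change, and the rate used in the example is an assumption to be checked against current BigQuery pricing.

```python
TIB = 2**40  # bytes in one tebibyte


def estimate_cost_usd(bytes_scanned: int, price_per_tib: float, free_tib_left: float = 1.0) -> float:
    """Estimate on-demand cost after subtracting the remaining free-tier allowance."""
    billable_tib = max(0.0, bytes_scanned / TIB - free_tib_left)
    return billable_tib * price_per_tib


# Assumed rate; substitute the current on-demand price for your region.
ASSUMED_PRICE_PER_TIB = 6.25

# Example: a dry run reported 250 GB and this month's free tier is already used up.
cost = estimate_cost_usd(250 * 10**9, ASSUMED_PRICE_PER_TIB, free_tib_left=0.0)
print(f"Estimated cost: ${cost:.2f}")
```

In practice `bytes_scanned` would come from `query_job.total_bytes_processed` on a dry-run job.
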
## Clinical Data

Clinical data is stored in separate datasets with collection-specific tables. It was introduced in IDC v11, and not all collections have it.

**List available clinical tables:**
```sql
SELECT table_name
FROM `bigquery-public-data.idc_current_clinical.INFORMATION_SCHEMA.TABLES`
```

**Query clinical data for a collection:**
```sql
-- Example: TCGA-LUAD clinical data
SELECT *
FROM `bigquery-public-data.idc_current_clinical.tcga_luad_clinical`
LIMIT 10
```

**Join clinical with imaging data:**
```sql
SELECT
  d.PatientID,
  d.SeriesInstanceUID,
  d.Modality,
  c.age_at_diagnosis,
  c.pathologic_stage
FROM `bigquery-public-data.idc_current.dicom_all` d
JOIN `bigquery-public-data.idc_current_clinical.tcga_luad_clinical` c
  ON d.PatientID = c.dicom_patient_id
WHERE d.collection_id = 'tcga_luad'
  AND d.Modality = 'CT'
LIMIT 20
```

**Note:** Clinical table schemas vary by collection. Check column names with `INFORMATION_SCHEMA.COLUMNS` before querying.

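One way to follow the note above is to generate the schema lookup from Python before writing a join. This is a sketch under assumptions: `clinical_columns_query` and `list_clinical_columns` are hypothetical helper names, and `bq_client` is expected to be a `google.cloud.bigquery.Client` created as in the earlier example. Only the query-building part runs without credentials.

```python
def clinical_columns_query(table_name: str, dataset: str = "idc_current_clinical") -> str:
    """Build a query listing column names and types for one clinical table."""
    # Reject anything that is not a plain identifier before interpolating it.
    if not table_name.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table_name!r}")
    return (
        "SELECT column_name, data_type\n"
        f"FROM `bigquery-public-data.{dataset}.INFORMATION_SCHEMA.COLUMNS`\n"
        f"WHERE table_name = '{table_name}'\n"
        "ORDER BY ordinal_position"
    )


def list_clinical_columns(bq_client, table_name: str):
    """Run the schema query and return (column_name, data_type) pairs."""
    rows = bq_client.query(clinical_columns_query(table_name)).result()
    return [(row.column_name, row.data_type) for row in rows]


print(clinical_columns_query("tcga_luad_clinical"))
```

Checking the returned column list first avoids "Column not found" errors when a collection uses different field names than the TCGA-LUAD example.
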
## Important Notes

- Tables are read-only (public dataset)
- Schemas may change between IDC versions
- Use versioned datasets for reproducibility
- Some DICOM sequences >15 levels deep are not extracted
- Very large sequences (>1MB) may be truncated
- Always check data license before use

## Common Errors

**Issue: Billing must be enabled**
- Cause: BigQuery requires a billing-enabled GCP project
- Solution: Enable billing in Google Cloud Console or use idc-index mini-index instead

**Issue: Query exceeds resource limits**
- Cause: Query scans too much data or is too complex
- Solution: Add more specific WHERE filters, use LIMIT, break into smaller queries

**Issue: Column not found**
- Cause: Field name typo or not in selected table
- Solution: Check table schema first with `INFORMATION_SCHEMA.COLUMNS`

**Issue: Permission denied**
- Cause: Not authenticated to Google Cloud
- Solution: Run `gcloud auth application-default login` or set GOOGLE_APPLICATION_CREDENTIALS

## Resources

- [Understanding the BigQuery DICOM schema](https://docs.cloud.google.com/healthcare-api/docs/how-tos/dicom-bigquery-schema)
- [BigQuery Query Syntax](https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax)
- [Kaggle Intro to SQL](https://www.kaggle.com/learn/intro-to-sql)
- [Sample BigQuery queries of IDC data](https://github.com/ImagingDataCommons/idc-bigquery-cookbook)
@@ -0,0 +1,308 @@
# DICOMweb Guide for IDC

IDC provides DICOMweb access through Google Cloud Healthcare API DICOM stores. This guide covers the implementation specifics and usage patterns.

## When to Use DICOMweb

Use DICOMweb when you need:
- Integration with PACS systems or DICOMweb-compatible tools
- Streaming metadata without downloading full files
- Building custom viewers or web applications
- Using existing DICOMweb clients (the OHIF viewer, the `dicomweb-client` library, etc.)

For most use cases, `idc-index` is simpler and recommended. Use DICOMweb when you specifically need the DICOMweb protocol.

## Endpoints

### Public Proxy (No Authentication)

```
https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb
```

- Points to the latest IDC version automatically
- Daily quota applies (suitable for testing and moderate use)
- No authentication required
- Note: "viewer-only-no-downloads" in URL is legacy naming with no functional meaning

### Google Healthcare API (Requires Authentication)

```
https://healthcare.googleapis.com/v1/projects/nci-idc-data/locations/us-central1/datasets/idc/dicomStores/idc-store-v{VERSION}/dicomWeb
```

Replace `{VERSION}` with the IDC release number. To find the current version:

```python
from idc_index import IDCClient
client = IDCClient()
print(client.get_idc_version())  # e.g., "23" for v23
```

The Google Healthcare endpoint requires authentication and provides higher quotas. See [Authentication](#authentication-for-google-healthcare-api) section below.

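Filling in `{VERSION}` by hand is easy to get wrong, for example by producing `idc-store-vv23`. The following sketch builds the URL from the template above and accepts the release either as a bare number or with a leading `v`. `healthcare_dicomweb_url` is a hypothetical helper introduced for illustration, not part of `idc-index`.

```python
HEALTHCARE_TEMPLATE = (
    "https://healthcare.googleapis.com/v1/projects/nci-idc-data/locations/us-central1"
    "/datasets/idc/dicomStores/idc-store-v{version}/dicomWeb"
)


def healthcare_dicomweb_url(version) -> str:
    """Fill in the store version; accepts 23, "23" or "v23"."""
    number = str(version).strip().lower().lstrip("v")
    if not number.isdigit():
        raise ValueError(f"not an IDC release number: {version!r}")
    return HEALTHCARE_TEMPLATE.format(version=number)


print(healthcare_dicomweb_url("v23"))
```

The result can be used as `base_url` in any of the request examples in this guide, together with an `Authorization` header.
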
## Implementation Details

IDC DICOMweb is provided through Google Cloud Healthcare API DICOM stores. The implementation follows DICOM PS3.18 Web Services with specific characteristics documented in the [Google Healthcare DICOM conformance statement](https://docs.cloud.google.com/healthcare-api/docs/dicom).

### Supported Operations

| Service | Description | Supported |
|---------|-------------|-----------|
| QIDO-RS | Search for DICOM objects | Yes |
| WADO-RS | Retrieve DICOM objects and metadata | Yes |
| STOW-RS | Store DICOM objects | No (IDC is read-only) |

**Not supported:** URI Service, Worklist Service, Non-Patient Instance Service, Capabilities Transactions

### Searchable DICOM Tags (QIDO-RS)

The implementation supports a limited set of searchable tags:

| Level | Searchable Tags |
|-------|-----------------|
| Study | StudyInstanceUID, PatientName, PatientID, AccessionNumber, ReferringPhysicianName, StudyDate |
| Series | All study tags + SeriesInstanceUID, Modality |
| Instance | All series tags + SOPInstanceUID |

**Important:** Only exact matching is supported, except for:
- StudyDate: supports range queries
- PatientName: supports fuzzy matching

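As an illustration of the matching rules above, the sketch below assembles query parameters for a `/studies` search, using exact matching for `PatientID` and the DICOM range syntax (`YYYYMMDD-YYYYMMDD`) for `StudyDate`. `study_search_params` is a hypothetical helper; only the tag names and the `limit` parameter come from this guide.

```python
from datetime import date
from typing import Dict, Optional


def study_search_params(
    patient_id: Optional[str] = None,
    date_from: Optional[date] = None,
    date_to: Optional[date] = None,
    limit: int = 100,
) -> Dict[str, str]:
    """Build QIDO-RS query parameters for a /studies search."""
    params = {"limit": str(limit)}
    if patient_id is not None:
        params["PatientID"] = patient_id  # exact match only
    if date_from or date_to:
        start = date_from.strftime("%Y%m%d") if date_from else ""
        end = date_to.strftime("%Y%m%d") if date_to else ""
        # DICOM range matching: "start-end"; either side may be left open.
        params["StudyDate"] = f"{start}-{end}"
    return params


params = study_search_params(date_from=date(2000, 1, 1), date_to=date(2000, 12, 31))
print(params)
```

The resulting dictionary can be passed as `params=` to `requests.get(f"{base_url}/studies", ...)` with the `Accept: application/dicom+json` header.
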
### Query Limitations

- Maximum results: 5,000 for studies/series searches; 50,000 for instances
- Maximum offset: 1,000,000
- DICOM sequence tags larger than ~1 MB are not returned in metadata (BulkDataURI provided instead)

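Because each search returns a capped number of results, larger result sets have to be read page by page with `limit` and `offset`. The sketch below shows the loop in isolation. `paginate` and `fetch_page` are hypothetical names, and the fake store stands in for a real QIDO-RS request so the example runs offline.

```python
from typing import Callable, Iterator, List

MAX_OFFSET = 1_000_000  # maximum offset quoted in the list above


def paginate(fetch_page: Callable[[int, int], List[dict]], page_size: int = 1000) -> Iterator[dict]:
    """Yield results page by page until a short page or the offset cap is reached.

    `fetch_page(limit, offset)` performs one search request and returns the
    decoded JSON list (an empty list when nothing is left).
    """
    offset = 0
    while offset <= MAX_OFFSET:
        page = fetch_page(page_size, offset)
        yield from page
        if len(page) < page_size:
            break
        offset += page_size


# Stand-in for a real request: slices an in-memory list.
fake_store = [{"n": i} for i in range(25)]
results = list(paginate(lambda limit, offset: fake_store[offset:offset + limit], page_size=10))
print(len(results))
```

With a real endpoint, `fetch_page` would call `requests.get(..., params={"limit": limit, "offset": offset})` and return `response.json()` for a 200 response or `[]` for a 204.
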
## Code Examples

All examples use the public proxy endpoint. For authenticated access to Google Healthcare, see the [authentication section](#authentication-for-google-healthcare-api).

### Finding UIDs with idc-index

Use `idc-index` to discover data, then use DICOMweb for metadata access:

```python
from idc_index import IDCClient

client = IDCClient()

# Find studies of interest
results = client.sql_query("""
    SELECT StudyInstanceUID, SeriesInstanceUID, PatientID, Modality
    FROM index
    WHERE collection_id = 'tcga_luad' AND Modality = 'CT'
    LIMIT 5
""")

# Use these UIDs with DICOMweb
study_uid = results.iloc[0]['StudyInstanceUID']
series_uid = results.iloc[0]['SeriesInstanceUID']
print(f"Study: {study_uid}")
print(f"Series: {series_uid}")
```

### QIDO-RS: Search by UID

```python
import requests

base_url = "https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb"

# Search for a specific study
study_uid = "1.3.6.1.4.1.14519.5.2.1.6450.9002.307623500513044641407722230440"
response = requests.get(
    f"{base_url}/studies",
    params={"StudyInstanceUID": study_uid},
    headers={"Accept": "application/dicom+json"}
)

if response.status_code == 200:
    studies = response.json()
    print(f"Found {len(studies)} study")
```

### QIDO-RS: List Series in a Study

```python
import requests

base_url = "https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb"
study_uid = "1.3.6.1.4.1.14519.5.2.1.6450.9002.307623500513044641407722230440"

response = requests.get(
    f"{base_url}/studies/{study_uid}/series",
    headers={"Accept": "application/dicom+json"}
)

if response.status_code == 200:
    series_list = response.json()
    for series in series_list:
        # DICOM tags are returned as hex codes
        series_uid = series.get("0020000E", {}).get("Value", [None])[0]
        modality = series.get("00080060", {}).get("Value", [None])[0]
        description = series.get("0008103E", {}).get("Value", [""])[0]
        print(f"{modality}: {description}")
```

### QIDO-RS: List Instances in a Series

```python
import requests

base_url = "https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb"
study_uid = "1.3.6.1.4.1.14519.5.2.1.6450.9002.307623500513044641407722230440"
series_uid = "1.3.6.1.4.1.14519.5.2.1.6450.9002.217441095430480124587725641302"

response = requests.get(
    f"{base_url}/studies/{study_uid}/series/{series_uid}/instances",
    params={"limit": 10},
    headers={"Accept": "application/dicom+json"}
)

if response.status_code == 200:
    instances = response.json()
    print(f"Found {len(instances)} instances")
    for inst in instances[:3]:
        sop_uid = inst.get("00080018", {}).get("Value", [None])[0]
        print(f"  SOPInstanceUID: {sop_uid}")
```

### WADO-RS: Retrieve Series Metadata

```python
import requests

base_url = "https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb"
study_uid = "1.3.6.1.4.1.14519.5.2.1.6450.9002.307623500513044641407722230440"
series_uid = "1.3.6.1.4.1.14519.5.2.1.6450.9002.217441095430480124587725641302"

response = requests.get(
    f"{base_url}/studies/{study_uid}/series/{series_uid}/metadata",
    headers={"Accept": "application/dicom+json"}
)

if response.status_code == 200:
    instances = response.json()
    print(f"Retrieved metadata for {len(instances)} instances")

    # Extract image dimensions from first instance
    if instances:
        inst = instances[0]
        rows = inst.get("00280010", {}).get("Value", [None])[0]
        cols = inst.get("00280011", {}).get("Value", [None])[0]
        print(f"Image dimensions: {rows} x {cols}")
```

### Combined Workflow: idc-index Discovery + DICOMweb Metadata

```python
from idc_index import IDCClient
import requests

# Use idc-index for efficient discovery
idc = IDCClient()
results = idc.sql_query("""
    SELECT StudyInstanceUID, SeriesInstanceUID, Modality, SeriesDescription
    FROM index
    WHERE collection_id = 'nlst' AND Modality = 'CT'
    LIMIT 1
""")

study_uid = results.iloc[0]['StudyInstanceUID']
series_uid = results.iloc[0]['SeriesInstanceUID']
print(f"Found: {results.iloc[0]['SeriesDescription']}")

# Use DICOMweb to stream metadata without downloading files
base_url = "https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb"

response = requests.get(
    f"{base_url}/studies/{study_uid}/series/{series_uid}/metadata",
    headers={"Accept": "application/dicom+json"}
)

if response.status_code == 200:
    metadata = response.json()
    print(f"Retrieved metadata for {len(metadata)} instances without downloading files")
```

## Common DICOM Tags Reference

DICOMweb returns tags as hexadecimal codes. Common tags:

| Tag | Name | Description |
|-----|------|-------------|
| 00080018 | SOPInstanceUID | Unique instance identifier |
| 00080020 | StudyDate | Date study was performed |
| 00080060 | Modality | Imaging modality (CT, MR, PT, etc.) |
| 0008103E | SeriesDescription | Description of series |
| 00100020 | PatientID | Patient identifier |
| 0020000D | StudyInstanceUID | Unique study identifier |
| 0020000E | SeriesInstanceUID | Unique series identifier |
| 00280010 | Rows | Image height in pixels |
| 00280011 | Columns | Image width in pixels |

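The table above can be turned into a lookup so that code reads tags by name instead of by hex code. This is a minimal sketch: `TAGS` and `get_value` are hypothetical names, and the sample object is hand-written in the DICOM JSON shape (each element has a `vr` and, when non-empty, a `Value` list) and is not real IDC data.

```python
TAGS = {
    "SOPInstanceUID": "00080018",
    "StudyDate": "00080020",
    "Modality": "00080060",
    "SeriesDescription": "0008103E",
    "PatientID": "00100020",
    "StudyInstanceUID": "0020000D",
    "SeriesInstanceUID": "0020000E",
    "Rows": "00280010",
    "Columns": "00280011",
}


def get_value(dataset: dict, keyword: str, default=None):
    """Return the first value of a tag from one DICOM JSON object."""
    values = dataset.get(TAGS[keyword], {}).get("Value") or []
    return values[0] if values else default


sample = {
    "00080060": {"vr": "CS", "Value": ["CT"]},
    "00280010": {"vr": "US", "Value": [512]},
    "0008103E": {"vr": "LO"},  # attribute present but empty
}
print(get_value(sample, "Modality"), get_value(sample, "Rows"), get_value(sample, "SeriesDescription", ""))
```

Unlike chained `.get(...)[0]` calls, this helper also tolerates elements that are present without a `Value` list.
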
## Authentication for Google Healthcare API

To use the Google Healthcare endpoint with higher quotas:

```python
from google.auth import default
from google.auth.transport.requests import Request
import requests

# Get credentials (requires gcloud auth); request the cloud-platform scope explicitly
credentials, project = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(Request())

# Build authenticated request
base_url = "https://healthcare.googleapis.com/v1/projects/nci-idc-data/locations/us-central1/datasets/idc/dicomStores/idc-store-v23/dicomWeb"

response = requests.get(
    f"{base_url}/studies",
    params={"limit": 5},
    headers={
        "Authorization": f"Bearer {credentials.token}",
        "Accept": "application/dicom+json"
    }
)
```

**Prerequisites:**
1. Google Cloud SDK installed (`gcloud`)
2. Authenticated: `gcloud auth application-default login`
3. Account has access to public Google Cloud datasets

## Troubleshooting

### Issue: 400 Bad Request on search queries
- **Cause:** Using unsupported search parameters. The implementation only supports specific DICOM tags for filtering.
- **Solution:** Use UID-based queries (StudyInstanceUID, SeriesInstanceUID) or one of the searchable tags listed above. For filtering by any other attribute, use `idc-index` to discover UIDs first, then query DICOMweb with specific UIDs.

### Issue: 403 Forbidden on Google Healthcare endpoint
- **Cause:** Missing authentication or insufficient permissions
- **Solution:** Run `gcloud auth application-default login` and ensure your account has access

### Issue: 429 Too Many Requests
- **Cause:** Rate limit exceeded
- **Solution:** Add delays between requests, reduce `limit` values, or use authenticated endpoint for higher quotas

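"Add delays between requests" is commonly implemented as exponential backoff. The sketch below is one possible shape, not something prescribed by IDC: `get_with_backoff` is a hypothetical helper, the delays are arbitrary starting points, and the fake response objects replace real HTTP calls so the example runs offline.

```python
import time
from typing import Callable


def get_with_backoff(do_get: Callable[[], object], retries: int = 5,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep):
    """Call `do_get()` until it stops returning HTTP 429 or retries run out.

    `do_get` is any zero-argument callable returning an object with a
    `status_code` attribute, e.g. `lambda: requests.get(url, headers=headers)`.
    """
    response = do_get()
    for attempt in range(retries):
        if response.status_code != 429:
            break
        sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ... between tries
        response = do_get()
    return response


class FakeResponse:
    """Minimal stand-in for requests.Response."""
    def __init__(self, status_code):
        self.status_code = status_code


replies = iter([FakeResponse(429), FakeResponse(429), FakeResponse(200)])
waits = []  # records the requested delays instead of actually sleeping
final = get_with_backoff(lambda: next(replies), sleep=waits.append)
print(final.status_code, waits)
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting.
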
### Issue: 204 No Content for valid UIDs
- **Cause:** UID may be from an older IDC version not in current data
- **Solution:** Verify UID exists using `idc-index` query first. The proxy points to the latest IDC version.

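A 204 response has an empty body, so calling `response.json()` on it raises an error. The following sketch treats 204 as "no matches" explicitly. `decode_search_response` is a hypothetical helper, and raising on every other status is a design choice for the example, not documented behaviour of the service.

```python
def decode_search_response(status_code: int, json_loader):
    """Turn a QIDO-RS response into a list, treating 204 as 'no matches'.

    `json_loader` is a zero-argument callable such as `response.json`.
    """
    if status_code == 204:
        return []  # empty body; decoding it as JSON would fail
    if status_code == 200:
        return json_loader()
    raise RuntimeError(f"unexpected DICOMweb status: {status_code}")


# Offline demonstration with hand-written inputs.
print(decode_search_response(204, lambda: None))
found = decode_search_response(200, lambda: [{"0020000D": {"vr": "UI", "Value": ["1.2.3"]}}])
print(len(found))
```

With `requests`, the call would be `decode_search_response(response.status_code, response.json)`.
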
### Issue: Large metadata responses slow to parse
- **Cause:** Series with many instances returns large JSON
- **Solution:** Use `limit` parameter on instance queries, or query specific instances by SOPInstanceUID

### Issue: Response missing expected attributes
- **Cause:** DICOM sequences larger than ~1 MB are excluded from metadata responses
- **Solution:** Retrieve the full DICOM instance using WADO-RS instance retrieval if you need all attributes

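To see which attributes were left out of a metadata response, one can scan the DICOM JSON for elements that carry a `BulkDataURI` instead of a `Value`. This is a sketch: `find_bulkdata` is a hypothetical helper, and the sample object and its URI are placeholders rather than real IDC content.

```python
def find_bulkdata(dataset: dict, path: str = "") -> list:
    """List (tag path, BulkDataURI) pairs in one DICOM JSON object, including nested sequences."""
    found = []
    for tag, element in dataset.items():
        here = f"{path}/{tag}" if path else tag
        if "BulkDataURI" in element:
            found.append((here, element["BulkDataURI"]))
        if element.get("vr") == "SQ":
            # Sequence items are themselves DICOM JSON objects.
            for item in element.get("Value", []):
                found.extend(find_bulkdata(item, here))
    return found


sample = {
    "00080060": {"vr": "CS", "Value": ["SM"]},
    "00480105": {"vr": "SQ", "BulkDataURI": "https://example.invalid/bulk/00480105"},
}
print(find_bulkdata(sample))
```

An empty result means nothing was offloaded, so a missing attribute is then more likely absent from the instance itself.
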
## Resources

- [Google Healthcare DICOM Conformance Statement](https://docs.cloud.google.com/healthcare-api/docs/dicom)
- [DICOMweb Standard](https://www.dicomstandard.org/using/dicomweb)
- [dicomweb-client Python library](https://dicomweb-client.readthedocs.io/)
- [IDC Documentation](https://learn.canceridc.dev/)