Mirror of https://github.com/K-Dense-AI/claude-scientific-skills.git (synced 2026-03-27 07:09:27 +08:00)
Add critical VCF ingestion requirements

- VCFs must be single-sample (multi-sample not supported)
- Index files (.csi or .tbi) are required for all VCF/BCF files
- Add indexing examples with bcftools and tabix
- Document requirements prominently in both main skill and ingestion guide

@@ -63,7 +63,10 @@ import tiledbvcf
 ds = tiledbvcf.Dataset(uri="my_dataset", mode="w",
 cfg=tiledbvcf.ReadConfig(memory_budget=1024))
 
-# Ingest VCF files (can be run incrementally)
+# Ingest VCF files (must be single-sample with indexes)
+# Requirements:
+# - VCFs must be single-sample (not multi-sample)
+# - Must have indexes: .csi (bcftools) or .tbi (tabix)
 ds.ingest_samples(["sample1.vcf.gz", "sample2.vcf.gz"])
 ```
 
|
||||
@@ -100,6 +103,10 @@ ds.export(
 
 Create TileDB-VCF datasets and incrementally ingest variant data from multiple VCF/BCF files. This is appropriate for building population genomics databases and cohort studies.
 
+**Requirements:**
+- **Single-sample VCFs only**: Multi-sample VCFs are not supported
+- **Index files required**: VCF/BCF files must have indexes (.csi or .tbi)
+
 **Common operations:**
 - Create new datasets with optimized array schemas
 - Ingest single or multiple VCF/BCF files in parallel
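
The two requirements this commit documents (single-sample input and a `.csi` or `.tbi` index next to each file) can be checked before ingestion is attempted. The following is a minimal sketch, not part of the commit or of the TileDB-VCF API: the helper names `count_samples`, `has_index`, and `preflight` are hypothetical, it uses only the Python standard library, and it assumes text VCF input (`.vcf` or `.vcf.gz`), so binary BCF files are out of scope.

```python
import gzip
from pathlib import Path

# Fixed VCF columns that precede the per-sample columns:
# CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, plus FORMAT.
FIXED_COLUMNS = 9


def count_samples(vcf_path):
    """Count sample columns on the #CHROM header line of a .vcf or .vcf.gz file."""
    path = Path(vcf_path)
    # bgzip output is gzip-compatible, so gzip.open can read the header.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                columns = line.rstrip("\n").split("\t")
                return max(len(columns) - FIXED_COLUMNS, 0)
            if not line.startswith("#"):
                break  # reached records without seeing the header line
    raise ValueError(f"no #CHROM header line found in {path}")


def has_index(vcf_path):
    """Return True if a sibling .csi or .tbi index file exists."""
    return any(Path(f"{vcf_path}{ext}").exists() for ext in (".csi", ".tbi"))


def preflight(vcf_paths):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for vcf_path in vcf_paths:
        n_samples = count_samples(vcf_path)
        if n_samples != 1:
            problems.append(f"{vcf_path}: expected 1 sample, found {n_samples}")
        if not has_index(vcf_path):
            problems.append(f"{vcf_path}: missing .csi or .tbi index")
    return problems
```

If `preflight(["sample1.vcf.gz", "sample2.vcf.gz"])` returns an empty list, the same paths can be passed to `ds.ingest_samples(...)` as in the snippet above. A missing index can typically be created with `bcftools index sample1.vcf.gz` (writes `.csi`) or `tabix -p vcf sample1.vcf.gz` (writes `.tbi`).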