diff --git a/scientific-skills/tiledbvcf/SKILL.md b/scientific-skills/tiledbvcf/SKILL.md
index 5a9e652..8391795 100644
--- a/scientific-skills/tiledbvcf/SKILL.md
+++ b/scientific-skills/tiledbvcf/SKILL.md
@@ -63,7 +63,10 @@ import tiledbvcf
 ds = tiledbvcf.Dataset(uri="my_dataset", mode="w",
                        cfg=tiledbvcf.ReadConfig(memory_budget=1024))
 
-# Ingest VCF files (can be run incrementally)
+# Ingest VCF files (must be single-sample with indexes)
+# Requirements:
+# - VCFs must be single-sample (not multi-sample)
+# - Must have indexes: .csi (bcftools) or .tbi (tabix)
 ds.ingest_samples(["sample1.vcf.gz", "sample2.vcf.gz"])
 ```
 
@@ -100,6 +103,10 @@ ds.export(
 
 Create TileDB-VCF datasets and incrementally ingest variant data from multiple VCF/BCF files. This is appropriate for building population genomics databases and cohort studies.
 
+**Requirements:**
+- **Single-sample VCFs only**: Multi-sample VCFs are not supported
+- **Index files required**: VCF/BCF files must have indexes (.csi or .tbi)
+
 **Common operations:**
 - Create new datasets with optimized array schemas
 - Ingest single or multiple VCF/BCF files in parallel
diff --git a/scientific-skills/tiledbvcf/references/ingestion.md b/scientific-skills/tiledbvcf/references/ingestion.md
index b663165..32eba8a 100644
--- a/scientific-skills/tiledbvcf/references/ingestion.md
+++ b/scientific-skills/tiledbvcf/references/ingestion.md
@@ -2,6 +2,22 @@
 
 Complete guide to creating TileDB-VCF datasets and ingesting VCF/BCF files with optimal performance and reliability.
 
+## Important Requirements
+
+**Before ingesting VCF files, ensure they meet these requirements:**
+
+- **Single-sample VCFs only**: Multi-sample VCFs are not supported by TileDB-VCF
+- **Index files required**: All VCF/BCF files must have corresponding index files:
+  - `.csi` files (created with `bcftools index`)
+  - `.tbi` files (created with `tabix`)
+
+```bash
+# Create indexes if they don't exist
+bcftools index sample.vcf.gz  # Creates sample.vcf.gz.csi
+# OR
+tabix -p vcf sample.vcf.gz    # Creates sample.vcf.gz.tbi
+```
+
 ## Dataset Creation
 
 ### Basic Dataset Creation
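
Review note: the index requirement this patch documents could also be checked before calling `ds.ingest_samples`. The following is only a minimal sketch, assuming local files whose indexes sit next to them under the default names written by `bcftools index` (`<file>.csi`) and `tabix` (`<file>.tbi`). `find_missing_indexes` is a hypothetical helper for illustration and is not part of the tiledbvcf API.

```python
from pathlib import Path


def find_missing_indexes(vcf_paths):
    """Return the input paths that have neither a .csi nor a .tbi index.

    Hypothetical helper (not part of tiledbvcf). Assumes local files and
    the default index names: <file>.csi or <file>.tbi beside the VCF/BCF.
    """
    missing = []
    for path in map(Path, vcf_paths):
        # An index counts as present if either suffix exists on disk.
        has_index = any(
            Path(f"{path}{suffix}").exists() for suffix in (".csi", ".tbi")
        )
        if not has_index:
            missing.append(str(path))
    return missing
```

A caller might run this on the sample list and index any reported files before ingestion. It does not verify the single-sample requirement, which would need the VCF header to be read.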