Mirror of https://github.com/K-Dense-AI/claude-scientific-skills.git (synced 2026-03-27 07:09:27 +08:00)
Add critical VCF ingestion requirements

- VCFs must be single-sample (multi-sample not supported)
- Index files (.csi or .tbi) are required for all VCF/BCF files
- Add indexing examples with bcftools and tabix
- Document requirements prominently in both main skill and ingestion guide
@@ -63,7 +63,10 @@ import tiledbvcf
 ds = tiledbvcf.Dataset(uri="my_dataset", mode="w",
                        cfg=tiledbvcf.ReadConfig(memory_budget=1024))

-# Ingest VCF files (can be run incrementally)
+# Ingest VCF files (must be single-sample with indexes)
+# Requirements:
+# - VCFs must be single-sample (not multi-sample)
+# - Must have indexes: .csi (bcftools) or .tbi (tabix)
 ds.ingest_samples(["sample1.vcf.gz", "sample2.vcf.gz"])
 ```

@@ -100,6 +103,10 @@ ds.export(

 Create TileDB-VCF datasets and incrementally ingest variant data from multiple VCF/BCF files. This is appropriate for building population genomics databases and cohort studies.

+**Requirements:**
+- **Single-sample VCFs only**: Multi-sample VCFs are not supported
+- **Index files required**: VCF/BCF files must have indexes (.csi or .tbi)
+
 **Common operations:**
 - Create new datasets with optimized array schemas
 - Ingest single or multiple VCF/BCF files in parallel
@@ -2,6 +2,22 @@

 Complete guide to creating TileDB-VCF datasets and ingesting VCF/BCF files with optimal performance and reliability.

+## Important Requirements
+
+**Before ingesting VCF files, ensure they meet these requirements:**
+
+- **Single-sample VCFs only**: Multi-sample VCFs are not supported by TileDB-VCF
+- **Index files required**: All VCF/BCF files must have corresponding index files:
+  - `.csi` files (created with `bcftools index`)
+  - `.tbi` files (created with `tabix`)
+
+```bash
+# Create indexes if they don't exist
+bcftools index sample.vcf.gz  # Creates sample.vcf.gz.csi
+# OR
+tabix -p vcf sample.vcf.gz    # Creates sample.vcf.gz.tbi
+```
+
 ## Dataset Creation

 ### Basic Dataset Creation
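
The two requirements this commit documents (single-sample input and a `.csi` or `.tbi` index next to each file) could be checked before calling `ds.ingest_samples`. The following is a minimal sketch using only the Python standard library, not part of TileDB-VCF or of this commit. The helper names `has_index`, `sample_names`, and `preflight` are hypothetical, and the sample check assumes text or bgzipped VCF input; binary BCF files are only checked for an index.

```python
import gzip
from pathlib import Path


def has_index(vcf_path):
    """Return True if a .csi or .tbi index file sits next to the VCF/BCF file."""
    return any(Path(f"{vcf_path}{ext}").exists() for ext in (".csi", ".tbi"))


def sample_names(vcf_path):
    """Read sample names from the #CHROM header line of a plain or bgzipped VCF."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # Columns after the 9 fixed fields (CHROM..FORMAT) are sample names.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached variant records without seeing a #CHROM line
    return []


def preflight(vcf_paths):
    """Return (path, problem) tuples; an empty list means every check passed."""
    problems = []
    for path in vcf_paths:
        if not has_index(path):
            problems.append((path, "missing .csi/.tbi index"))
        if str(path).endswith(".bcf"):
            continue  # assumption: BCF is binary, so the header is not parsed here
        n = len(sample_names(path))
        if n != 1:
            problems.append((path, f"expected 1 sample, found {n}"))
    return problems
```

A caller might run `preflight(["sample1.vcf.gz", "sample2.vcf.gz"])` and proceed to ingestion only when the returned list is empty, creating any missing indexes with `bcftools index` or `tabix -p vcf` first.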