Fix TileDB-VCF installation instructions

- Correct installation method: Docker images, not pip packages
- Update examples to show Docker container usage
- Based on actual TileDB-VCF repository documentation
Jeremy Leipzig
2026-02-24 10:02:34 -07:00
parent 3c98f0cada
commit 18ecbc3b30

@@ -75,50 +75,58 @@ Use **open source TileDB-VCF** (this skill) when:
 ## Quick Start
 ### Installation
-```bash
-# Install from conda-forge
-conda install -c conda-forge tiledbvcf-py
-# Or from PyPI
-pip install tiledbvcf
+TileDB-VCF is distributed as Docker images, not pip packages:
+```bash
+# Pull Docker images
+docker pull tiledb/tiledbvcf-py # Python interface
+docker pull tiledb/tiledbvcf-cli # Command-line interface
+# Or build from source
+git clone https://github.com/TileDB-Inc/TileDB-VCF.git
+cd TileDB-VCF
+# See documentation for build instructions
 ```
 ### Basic Examples
-**Create and populate a dataset:**
-```python
-import tiledbvcf
-# Create a new dataset
-ds = tiledbvcf.Dataset(uri="my_dataset", mode="w",
-                       cfg=tiledbvcf.ReadConfig(memory_budget=1024))
-# Ingest VCF files (can be run incrementally)
-ds.ingest_samples(["sample1.vcf.gz", "sample2.vcf.gz"])
+**Create and populate a dataset (via Docker):**
+```bash
+# Create dataset
+docker run --rm -v $PWD:/data -u "$(id -u):$(id -g)" \
+  tiledb/tiledbvcf-cli tiledbvcf create -u my_dataset
+# Ingest VCF files
+docker run --rm -v $PWD:/data -u "$(id -u):$(id -g)" \
+  tiledb/tiledbvcf-cli tiledbvcf store \
+  -u my_dataset --samples sample1.vcf.gz,sample2.vcf.gz
 ```
-**Query variant data:**
+**Query variant data (Python in Docker):**
 ```python
+# Inside tiledb/tiledbvcf-py container
+import tiledbvcf
 # Open existing dataset for reading
 ds = tiledbvcf.Dataset(uri="my_dataset", mode="r")
 # Query specific regions and samples
 df = ds.read(
     attrs=["sample_name", "pos_start", "pos_end", "alleles", "fmt_GT"],
-    regions=["chr1:1000000-2000000", "chr2:500000-1500000"],
-    samples=["sample1", "sample2", "sample3"]
+    regions=["chr1:1000000-2000000"],
+    samples=["sample1", "sample2"]
 )
 print(df.head())
 ```
-**Export to VCF:**
-```python
-# Export query results as VCF
-ds.export_bcf(
-    uri="output.bcf",
-    regions=["chr1:1000000-2000000"],
-    samples=["sample1", "sample2"]
-)
+**Export to VCF (via CLI):**
+```bash
+# Export query results as BCF
+docker run --rm -v $PWD:/data \
+  tiledb/tiledbvcf-cli tiledbvcf export \
+  --uri my_dataset --regions "chr1:1000000-2000000" \
+  --sample-names "sample1,sample2" --output-format bcf
 ```
 ## Core Capabilities
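
Reviewer sketch (not part of the commit): the new Docker-based examples repeat the same `docker run --rm -v $PWD:/data -u ...` prefix for every CLI call. A small Python helper could assemble those argument lists so the prefix lives in one place. The helper name `docker_cli_argv`, its parameters, and the `/tmp/work` directory are hypothetical. The image name, subcommands, and flags are copied from the added lines of this diff and have not been checked against the published image, including whether `tiledbvcf` must be repeated after the image name. The code only builds and prints the commands; it does not run Docker.

```python
import os
import shlex


def docker_cli_argv(subcommand, args, image="tiledb/tiledbvcf-cli",
                    workdir=None, as_current_user=True):
    """Build the argv for one `docker run` call in the style used in this diff.

    Hypothetical helper: nothing is executed here, the caller decides
    whether to run the command or just print it.
    """
    workdir = workdir or os.getcwd()
    argv = ["docker", "run", "--rm", "-v", f"{workdir}:/data"]
    if as_current_user and hasattr(os, "getuid"):
        # Mirrors -u "$(id -u):$(id -g)" so files written to the mount
        # are owned by the invoking user (POSIX only).
        argv += ["-u", f"{os.getuid()}:{os.getgid()}"]
    # The diff passes `tiledbvcf` explicitly after the image name; whether
    # the image's entrypoint requires that is an unverified assumption.
    argv += [image, "tiledbvcf", subcommand, *args]
    return argv


create = docker_cli_argv("create", ["-u", "my_dataset"], workdir="/tmp/work")
store = docker_cli_argv(
    "store",
    ["-u", "my_dataset", "--samples", "sample1.vcf.gz,sample2.vcf.gz"],
    workdir="/tmp/work",
)

# Print copy-pasteable commands instead of running them.
print(shlex.join(create))
print(shlex.join(store))
```

Returning a list rather than a shell string keeps quoting out of the picture if the result is later handed to `subprocess.run`, and `shlex.join` is used only for display.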