Update TileDB-VCF installation with preferred conda/mamba method

- Add preferred conda environment setup with Python <3.10
- Include M1 Mac specific configuration (CONDA_SUBDIR=osx-64)
- Install tiledbvcf-py via mamba from tiledb channel
- Restore normal Python examples (not Docker-only)
- Keep Docker as alternative installation method
Jeremy Leipzig
2026-02-24 10:21:14 -07:00
parent 18ecbc3b30
commit 6fcc786915

@@ -76,57 +76,65 @@ Use **open source TileDB-VCF** (this skill) when:
### Installation
TileDB-VCF is distributed as Docker images, not pip packages:
**Preferred Method: Conda/Mamba**
```bash
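# On an M1 (Apple Silicon) Mac only: export this so conda resolves osx-64 packages
export CONDA_SUBDIR=osx-64
# Create and activate the conda environment
conda create -n tiledb-vcf "python<3.10"
conda activate tiledb-vcf
# On an M1 Mac only: pin the osx-64 subdir inside the new environment
conda config --env --set subdir osx-64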
# Mamba is a drop-in replacement for conda with a faster dependency solver
conda install -c conda-forge mamba
# Install TileDB-Py and TileDB-VCF, along with other useful libraries
mamba install -y -c conda-forge -c bioconda -c tiledb tiledb-py tiledbvcf-py pandas pyarrow numpy
```
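After the install finishes, it can help to confirm that the environment matches the steps above. The following is a minimal sketch, not part of TileDB-VCF itself: `python_version_ok` and `check_environment` are hypothetical helper names, and the import names `tiledb` and `tiledbvcf` are assumed to correspond to the `tiledb-py` and `tiledbvcf-py` packages.

```python
import importlib.util
import sys

# Import names assumed to correspond to the packages installed above.
EXPECTED_MODULES = ("tiledb", "tiledbvcf", "pandas", "pyarrow", "numpy")


def python_version_ok(version_info=sys.version_info):
    """Return True if the interpreter satisfies the python<3.10 constraint."""
    return tuple(version_info[:2]) < (3, 10)


def check_environment(modules=EXPECTED_MODULES):
    """Map each module name to True if it can be found in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print("Python < 3.10:", python_version_ok())
    for name, found in check_environment().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Run this inside the activated `tiledb-vcf` environment; any module reported as missing suggests the `mamba install` step did not complete.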
**Alternative: Docker Images**
```bash
# Pull Docker images
docker pull tiledb/tiledbvcf-py # Python interface
docker pull tiledb/tiledbvcf-cli # Command-line interface
# Or build from source
git clone https://github.com/TileDB-Inc/TileDB-VCF.git
cd TileDB-VCF
# See documentation for build instructions
```
### Basic Examples
**Create and populate a dataset (via Docker):**
```bash
# Create dataset
docker run --rm -v $PWD:/data -u "$(id -u):$(id -g)" \
tiledb/tiledbvcf-cli tiledbvcf create -u my_dataset
# Ingest VCF files
docker run --rm -v $PWD:/data -u "$(id -u):$(id -g)" \
tiledb/tiledbvcf-cli tiledbvcf store \
-u my_dataset --samples sample1.vcf.gz,sample2.vcf.gz
```
**Query variant data (Python in Docker):**
**Create and populate a dataset:**
```python
# Inside tiledb/tiledbvcf-py container
import tiledbvcf
# Create a new dataset (the memory budget is given in MB)
ds = tiledbvcf.Dataset(uri="my_dataset", mode="w",
                       cfg=tiledbvcf.ReadConfig(memory_budget_mb=1024))
ds.create_dataset()
# Ingest VCF files (can be run incrementally)
ds.ingest_samples(["sample1.vcf.gz", "sample2.vcf.gz"])
```
**Query variant data:**
```python
# Open existing dataset for reading
ds = tiledbvcf.Dataset(uri="my_dataset", mode="r")
# Query specific regions and samples
df = ds.read(
attrs=["sample_name", "pos_start", "pos_end", "alleles", "fmt_GT"],
regions=["chr1:1000000-2000000"],
samples=["sample1", "sample2"]
regions=["chr1:1000000-2000000", "chr2:500000-1500000"],
samples=["sample1", "sample2", "sample3"]
)
print(df.head())
```
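Region strings such as `"chr1:1000000-2000000"` are easier to get right when they are generated rather than typed by hand. The sketch below uses two hypothetical helpers, `format_region` and `tile_regions`, which are not part of the `tiledbvcf` API. It assumes 1-based, inclusive coordinates and splits a large interval into smaller windows whose strings can be passed as `regions`.

```python
def format_region(contig: str, start: int, end: int) -> str:
    """Build a 'contig:start-end' string (assumes 1-based, inclusive coordinates)."""
    if start < 1 or end < start:
        raise ValueError(f"invalid interval {start}-{end} on {contig}")
    return f"{contig}:{start}-{end}"


def tile_regions(contig: str, start: int, end: int, window: int) -> list:
    """Split [start, end] into consecutive windows of at most `window` bases."""
    if window < 1:
        raise ValueError("window must be a positive number of bases")
    regions = []
    pos = start
    while pos <= end:
        stop = min(pos + window - 1, end)  # clip the last window at `end`
        regions.append(format_region(contig, pos, stop))
        pos = stop + 1
    return regions


regions = tile_regions("chr1", 1_000_000, 2_000_000, 500_000)
print(regions)
```

The resulting list can be supplied as the `regions` argument of `ds.read(...)`; smaller windows keep each query's result set smaller at the cost of more queries.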
**Export to VCF (via CLI):**
```bash
# Export query results as BCF
docker run --rm -v $PWD:/data \
tiledb/tiledbvcf-cli tiledbvcf export \
--uri my_dataset --regions "chr1:1000000-2000000" \
--sample-names "sample1,sample2" --output-format bcf
**Export to VCF:**
```python
# Export query results as compressed BCF, one file per sample
ds.export(
    regions=["chr1:1000000-2000000"],
    samples=["sample1", "sample2"],
    output_format="b",
    output_dir=".",
)
```
## Core Capabilities