Update TileDB-VCF installation with preferred conda/mamba method

- Add preferred conda environment setup with Python <3.10
- Include M1 Mac specific configuration (CONDA_SUBDIR=osx-64)
- Install tiledbvcf-py via mamba from tiledb channel
- Restore normal Python examples (not Docker-only)
- Keep Docker as alternative installation method
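
A quick sanity check for the environment this commit describes could look like the sketch below. It uses only the Python standard library. The helper name `check_env` and the keys of the dictionary it returns are illustrative and are not part of TileDB-VCF. It reports whether the interpreter satisfies the `python<3.10` pin and whether a `tiledbvcf` module is discoverable, without actually importing it.

```python
import importlib.util
import sys


def check_env(version_info=None, module_name="tiledbvcf"):
    """Hypothetical helper: check the python<3.10 pin and module availability."""
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    return {
        # True when the interpreter version is below 3.10
        "python_ok": (major, minor) < (3, 10),
        # True when the module can be located on the import path
        "module_found": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    print(check_env())
```

Running this inside the activated `tiledb-vcf` environment should show both values as `True` if the install succeeded; it does not verify that the native library loads correctly.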
2026-03-27 07:09:27 +08:00 · 2026-02-24 10:21:14 -07:00
parent 18ecbc3b30
commit 6fcc786915
1 changed files with 39 additions and 31 deletions
--- a/scientific-skills/tiledbvcf/SKILL.md
+++ b/scientific-skills/tiledbvcf/SKILL.md
@@ -76,57 +76,65 @@ Use **open source TileDB-VCF** (this skill) when:

 ### Installation

-TileDB-VCF is distributed as Docker images, not pip packages:
-
+**Preferred Method: Conda/Mamba**
+```bash
+# On an M1 Mac, export CONDA_SUBDIR before creating the environment
+export CONDA_SUBDIR=osx-64
+
+# Create and activate the conda environment
+conda create -n tiledb-vcf "python<3.10"
+conda activate tiledb-vcf
+conda config --env --set subdir osx-64  # M1 Mac only: pin the subdir for this environment
+
+# Mamba is a faster and more reliable alternative to conda
+conda install -c conda-forge mamba
+
+# Install TileDB-Py and TileDB-VCF, along with other useful libraries
+mamba install -y -c conda-forge -c bioconda -c tiledb tiledb-py tiledbvcf-py pandas pyarrow numpy
+```
+
+**Alternative: Docker Images**
 ```bash
-# Pull Docker images
 docker pull tiledb/tiledbvcf-py     # Python interface
 docker pull tiledb/tiledbvcf-cli    # Command-line interface
-
-# Or build from source
-git clone https://github.com/TileDB-Inc/TileDB-VCF.git
-cd TileDB-VCF
-# See documentation for build instructions
 ```

 ### Basic Examples

-**Create and populate a dataset (via Docker):**
-```bash
-# Create dataset
-docker run --rm -v $PWD:/data -u "$(id -u):$(id -g)" \
-  tiledb/tiledbvcf-cli tiledbvcf create -u my_dataset
-
-# Ingest VCF files
-docker run --rm -v $PWD:/data -u "$(id -u):$(id -g)" \
-  tiledb/tiledbvcf-cli tiledbvcf store \
-  -u my_dataset --samples sample1.vcf.gz,sample2.vcf.gz
-```
-
-**Query variant data (Python in Docker):**
+**Create and populate a dataset:**
 ```python
-# Inside tiledb/tiledbvcf-py container
 import tiledbvcf

+# Create a new dataset
+ds = tiledbvcf.Dataset(uri="my_dataset", mode="w")
+ds.create_dataset()
+
+# Ingest VCF files (can be run incrementally)
+ds.ingest_samples(["sample1.vcf.gz", "sample2.vcf.gz"])
+```
+
+**Query variant data:**
+```python
 # Open existing dataset for reading
 ds = tiledbvcf.Dataset(uri="my_dataset", mode="r")

 # Query specific regions and samples
 df = ds.read(
    attrs=["sample_name", "pos_start", "pos_end", "alleles", "fmt_GT"],
-    regions=["chr1:1000000-2000000"],
-    samples=["sample1", "sample2"]
+    regions=["chr1:1000000-2000000", "chr2:500000-1500000"],
+    samples=["sample1", "sample2", "sample3"]
 )
 print(df.head())
 ```

-**Export to VCF (via CLI):**
-```bash
-# Export query results as BCF
-docker run --rm -v $PWD:/data \
-  tiledb/tiledbvcf-cli tiledbvcf export \
-  --uri my_dataset --regions "chr1:1000000-2000000" \
-  --sample-names "sample1,sample2" --output-format bcf
+**Export to VCF:**
+```python
+# Export query results as BCF
+ds.export(
+    regions=["chr1:1000000-2000000"],
+    samples=["sample1", "sample2"],
+    output_format="b", output_dir="."  # "b" = compressed BCF, written to output_dir
+)
 ```

 ## Core Capabilities