Update TileDB-VCF installation with preferred conda/mamba method

- Add preferred conda environment setup with Python <3.10
- Include M1 Mac specific configuration (CONDA_SUBDIR=osx-64)
- Install tiledbvcf-py via mamba from tiledb channel
- Restore normal Python examples (not Docker-only)
- Keep Docker as alternative installation method
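
One way to sanity-check an environment created with the conda/mamba steps in this change is a short standard-library script. This is only a sketch: the helper name `check_environment` and its result keys are hypothetical, and it checks only what this commit states (the Python <3.10 pin, the M1 Mac case, and whether the `tiledbvcf` module is importable).

```python
import importlib.util
import platform
import sys


def check_environment(module_name="tiledbvcf"):
    """Report how the running interpreter compares with the setup in this commit."""
    return {
        # The conda environment in this change is pinned to Python < 3.10.
        "python_ok": sys.version_info < (3, 10),
        # True only when the interpreter itself runs natively on Apple Silicon;
        # an interpreter already running from an osx-64 environment reports x86_64.
        "needs_osx64_subdir": platform.system() == "Darwin"
        and platform.machine() == "arm64",
        # find_spec returns None when a top-level module cannot be found.
        "module_installed": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Running it inside the activated `tiledb-vcf` environment should show whether the interpreter satisfies the version pin and whether the package was picked up by the install step.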
2026-03-27 07:09:27 +08:00 · 2026-02-24 10:21:14 -07:00
parent 18ecbc3b30
commit 6fcc786915
1 changed files with 39 additions and 31 deletions
--- a/scientific-skills/tiledbvcf/SKILL.md
+++ b/scientific-skills/tiledbvcf/SKILL.md
@@ -76,57 +76,65 @@ Use **open source TileDB-VCF** (this skill) when:
 ### Installation
-TileDB-VCF is distributed as Docker images, not pip packages:
+**Preferred Method: Conda/Mamba**
-
+```bash
 # On an M1 Mac, export CONDA_SUBDIR so the environment is created for osx-64
 export CONDA_SUBDIR=osx-64
 # Create the conda environment
 conda create -n tiledb-vcf "python<3.10"
 conda activate tiledb-vcf
 # On an M1 Mac, also pin the activated environment to osx-64
 conda config --env --set subdir osx-64
 # Mamba is a faster and more reliable alternative to conda
 conda install -c conda-forge mamba
 # Install TileDB-Py and TileDB-VCF, along with other useful libraries
 mamba install -y -c conda-forge -c bioconda -c tiledb tiledb-py tiledbvcf-py pandas pyarrow numpy
 ```
 **Alternative: Docker Images**
 ```bash
 # Pull Docker images
 docker pull tiledb/tiledbvcf-py     # Python interface
 docker pull tiledb/tiledbvcf-cli    # Command-line interface
 # Or build from source
 git clone https://github.com/TileDB-Inc/TileDB-VCF.git
 cd TileDB-VCF
 # See documentation for build instructions
 ```
 ### Basic Examples
-**Create and populate a dataset (via Docker):**
+**Create and populate a dataset:**
 ```bash
 # Create dataset
 docker run --rm -v $PWD:/data -u "$(id -u):$(id -g)" \
  tiledb/tiledbvcf-cli tiledbvcf create -u my_dataset
 # Ingest VCF files
 docker run --rm -v $PWD:/data -u "$(id -u):$(id -g)" \
  tiledb/tiledbvcf-cli tiledbvcf store \
  -u my_dataset --samples sample1.vcf.gz,sample2.vcf.gz
 ```
 **Query variant data (Python in Docker):**
 ```python
 # Inside tiledb/tiledbvcf-py container
 import tiledbvcf
 # Create a new dataset (opening in write mode alone does not create it)
 ds = tiledbvcf.Dataset(uri="my_dataset", mode="w",
                       cfg=tiledbvcf.ReadConfig(memory_budget_mb=1024))
 ds.create_dataset()
 # Ingest VCF files (can be run incrementally)
 ds.ingest_samples(["sample1.vcf.gz", "sample2.vcf.gz"])
 ```
 **Query variant data:**
 ```python
 # Open existing dataset for reading
 ds = tiledbvcf.Dataset(uri="my_dataset", mode="r")
 # Query specific regions and samples
 df = ds.read(
    attrs=["sample_name", "pos_start", "pos_end", "alleles", "fmt_GT"],
-    regions=["chr1:1000000-2000000"],
+    regions=["chr1:1000000-2000000", "chr2:500000-1500000"],
-    samples=["sample1", "sample2"]
+    samples=["sample1", "sample2", "sample3"]
 )
 print(df.head())
 ```
-**Export to VCF (via CLI):**
+**Export to VCF:**
-```bash
+```python
-# Export query results as BCF
+# Export query results as BCF
-docker run --rm -v $PWD:/data \
+ds.export_bcf(
-  tiledb/tiledbvcf-cli tiledbvcf export \
+    uri="output.bcf",
-  --uri my_dataset --regions "chr1:1000000-2000000" \
+    regions=["chr1:1000000-2000000"],
-  --sample-names "sample1,sample2" --output-format bcf
+    samples=["sample1", "sample2"]
 )
 ```
 ## Core Capabilities