mirror of
https://github.com/K-Dense-AI/claude-scientific-skills.git
synced 2026-03-27 07:09:27 +08:00
Update TileDB-VCF installation with preferred conda/mamba method
- Add preferred conda environment setup with Python <3.10
- Include M1 Mac specific configuration (CONDA_SUBDIR=osx-64)
- Install tiledbvcf-py via mamba from tiledb channel
- Restore normal Python examples (not Docker-only)
- Keep Docker as alternative installation method
@@ -76,57 +76,65 @@ Use **open source TileDB-VCF** (this skill) when:
|
|||||||
|
|
||||||
### Installation
|
### Installation
|
||||||
|
|
||||||
TileDB-VCF is distributed as Docker images, not pip packages:
|
**Preferred Method: Conda/Mamba**
|
||||||
|
```bash
|
||||||
|
# Create the conda environment
|
||||||
|
# (on an M1 Mac, run "export CONDA_SUBDIR=osx-64" first so osx-64 packages are resolved)
|
||||||
|
conda create -n tiledb-vcf "python<3.10"
|
||||||
|
conda activate tiledb-vcf
|
||||||
|
|
||||||
|
# On an M1 Mac, also pin the activated environment to osx-64
|
||||||
|
conda config --env --set subdir osx-64
|
||||||
|
|
||||||
|
# Mamba is a faster and more reliable alternative to conda
|
||||||
|
conda install -c conda-forge mamba
|
||||||
|
|
||||||
|
# Install TileDB-Py and TileDB-VCF, along with other useful libraries
|
||||||
|
mamba install -y -c conda-forge -c bioconda -c tiledb tiledb-py tiledbvcf-py pandas pyarrow numpy
|
||||||
|
```
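
Once the environment is active, a quick check that the interpreter and packages match what the commands above set up can save a confusing import error later. The following is a minimal sketch using only the standard library. The helper name `check_env` is hypothetical, and `tiledb` and `tiledbvcf` are assumed to be the import names provided by `tiledb-py` and `tiledbvcf-py`:

```python
import importlib.util
import sys


def check_env(modules=("tiledb", "tiledbvcf"), max_python=(3, 10)):
    """Report whether the interpreter is below max_python and which modules are importable."""
    report = {"python_ok": sys.version_info[:2] < max_python}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_env().items():
        print(f"{key}: {'ok' if ok else 'MISSING or unsupported'}")
```

Running this inside the `tiledb-vcf` environment should report every entry as ok if the installation succeeded.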
|
||||||
|
|
||||||
|
**Alternative: Docker Images**
|
||||||
```bash
|
```bash
|
||||||
# Pull Docker images
|
|
||||||
docker pull tiledb/tiledbvcf-py # Python interface
|
docker pull tiledb/tiledbvcf-py # Python interface
|
||||||
docker pull tiledb/tiledbvcf-cli # Command-line interface
|
docker pull tiledb/tiledbvcf-cli # Command-line interface
|
||||||
|
|
||||||
# Or build from source
|
|
||||||
git clone https://github.com/TileDB-Inc/TileDB-VCF.git
|
|
||||||
cd TileDB-VCF
|
|
||||||
# See documentation for build instructions
|
|
||||||
```
|
```
|
||||||
|
|
||||||
### Basic Examples
|
### Basic Examples
|
||||||
|
|
||||||
**Create and populate a dataset (via Docker):**
|
**Create and populate a dataset:**
|
||||||
```bash
|
|
||||||
# Create dataset
|
|
||||||
docker run --rm -v $PWD:/data -u "$(id -u):$(id -g)" \
|
|
||||||
tiledb/tiledbvcf-cli tiledbvcf create -u my_dataset
|
|
||||||
|
|
||||||
# Ingest VCF files
|
|
||||||
docker run --rm -v $PWD:/data -u "$(id -u):$(id -g)" \
|
|
||||||
tiledb/tiledbvcf-cli tiledbvcf store \
|
|
||||||
-u my_dataset --samples sample1.vcf.gz,sample2.vcf.gz
|
|
||||||
```
|
|
||||||
|
|
||||||
**Query variant data (Python in Docker):**
|
|
||||||
```python
|
```python
|
||||||
# Inside tiledb/tiledbvcf-py container
|
|
||||||
import tiledbvcf
|
import tiledbvcf
|
||||||
|
|
||||||
|
# Create a new dataset
|
||||||
|
ds = tiledbvcf.Dataset(uri="my_dataset", mode="w")
|
||||||
|
ds.create_dataset()
|
||||||
|
|
||||||
|
# Ingest VCF files (can be run incrementally)
|
||||||
|
ds.ingest_samples(["sample1.vcf.gz", "sample2.vcf.gz"])
|
||||||
|
```
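
Because ingestion can be run incrementally, a large cohort can be fed in batches rather than in a single call. The following is a minimal sketch of the batching logic only. The names `batched`, `ingest_in_batches`, and `batch_size` are hypothetical, and the `ingest` argument stands in for a callable such as `ds.ingest_samples`:

```python
from typing import Callable, Iterator, List, Sequence


def batched(paths: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of paths with at most batch_size items each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(paths), batch_size):
        yield list(paths[start:start + batch_size])


def ingest_in_batches(
    paths: Sequence[str],
    ingest: Callable[[List[str]], None],
    batch_size: int = 10,
) -> int:
    """Call ingest once per batch and return the number of batches processed."""
    count = 0
    for batch in batched(paths, batch_size):
        ingest(batch)  # e.g. ds.ingest_samples, passed in by the caller
        count += 1
    return count
```

With a dataset opened in write mode, this might be called as `ingest_in_batches(vcf_paths, ds.ingest_samples, batch_size=50)`, where `vcf_paths` is whatever list of VCF files the caller has collected.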
|
||||||
|
|
||||||
|
**Query variant data:**
|
||||||
|
```python
|
||||||
# Open existing dataset for reading
|
# Open existing dataset for reading
|
||||||
ds = tiledbvcf.Dataset(uri="my_dataset", mode="r")
|
ds = tiledbvcf.Dataset(uri="my_dataset", mode="r")
|
||||||
|
|
||||||
# Query specific regions and samples
|
# Query specific regions and samples
|
||||||
df = ds.read(
|
df = ds.read(
|
||||||
attrs=["sample_name", "pos_start", "pos_end", "alleles", "fmt_GT"],
|
attrs=["sample_name", "pos_start", "pos_end", "alleles", "fmt_GT"],
|
||||||
regions=["chr1:1000000-2000000"],
|
regions=["chr1:1000000-2000000", "chr2:500000-1500000"],
|
||||||
samples=["sample1", "sample2"]
|
samples=["sample1", "sample2", "sample3"]
|
||||||
)
|
)
|
||||||
print(df.head())
|
print(df.head())
|
||||||
```
|
```
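
The `regions` argument takes strings of the form `contig:start-end`, as in the query above. When intervals come from another source, a small helper can format and sanity-check them before they are passed to `ds.read`. This is a sketch only: `format_region` is a hypothetical helper, not part of `tiledbvcf`, and it assumes 1-based coordinates:

```python
def format_region(contig: str, start: int, end: int) -> str:
    """Format an interval as 'contig:start-end', rejecting empty or inverted ranges."""
    if not contig:
        raise ValueError("contig must be a non-empty string")
    # Assumption: coordinates are 1-based, so start must be at least 1
    if start < 1 or end < start:
        raise ValueError(f"invalid interval {start}-{end}")
    return f"{contig}:{start}-{end}"


intervals = [("chr1", 1_000_000, 2_000_000), ("chr2", 500_000, 1_500_000)]
regions = [format_region(*iv) for iv in intervals]
print(regions)  # ['chr1:1000000-2000000', 'chr2:500000-1500000']
```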
|
||||||
|
|
||||||
**Export to VCF (via CLI):**
|
**Export to VCF:**
|
||||||
```bash
|
```python
|
||||||
# Export query results as BCF
|
# Export query results as BCF
|
||||||
docker run --rm -v $PWD:/data \
|
ds.export_bcf(
|
||||||
tiledb/tiledbvcf-cli tiledbvcf export \
|
uri="output.bcf",
|
||||||
--uri my_dataset --regions "chr1:1000000-2000000" \
|
regions=["chr1:1000000-2000000"],
|
||||||
--sample-names "sample1,sample2" --output-format bcf
|
samples=["sample1", "sample2"]
|
||||||
|
)
|
||||||
```
|
```
|
||||||
|
|
||||||
## Core Capabilities
|
## Core Capabilities
|
||||||
|
|||||||