From c2b16829f616647ce5f6c6c52ae80ccdd3f9a51a Mon Sep 17 00:00:00 2001 From: "Haoxuan \"Orion\" Li" <92694268+OrionLi545@users.noreply.github.com> Date: Mon, 20 Oct 2025 20:51:50 -0700 Subject: [PATCH] Update SKILL.md files to add double quotation marks for all skills, ensuring clarity and consistency across all entries. --- scientific-databases/alphafold-database/SKILL.md | 2 +- scientific-databases/chembl-database/SKILL.md | 2 +- scientific-databases/clinpgx-database/SKILL.md | 2 +- scientific-databases/clinvar-database/SKILL.md | 2 +- scientific-databases/cosmic-database/SKILL.md | 2 +- scientific-databases/ena-database/SKILL.md | 2 +- scientific-databases/ensembl-database/SKILL.md | 2 +- scientific-databases/gene-database/SKILL.md | 2 +- scientific-databases/geo-database/SKILL.md | 2 +- scientific-databases/gwas-database/SKILL.md | 2 +- scientific-databases/hmdb-database/SKILL.md | 2 +- scientific-databases/kegg-database/SKILL.md | 2 +- scientific-databases/metabolomics-workbench-database/SKILL.md | 2 +- scientific-databases/pdb-database/SKILL.md | 2 +- scientific-databases/pubchem-database/SKILL.md | 2 +- scientific-databases/pubmed-database/SKILL.md | 2 +- .../reactome-database/reactome-database/SKILL.md | 2 +- scientific-databases/string-database/SKILL.md | 2 +- scientific-databases/uniprot-database/SKILL.md | 2 +- scientific-databases/zinc-database/SKILL.md | 2 +- scientific-packages/anndata/SKILL.md | 2 +- scientific-packages/arboreto/SKILL.md | 2 +- scientific-packages/astropy/SKILL.md | 2 +- scientific-packages/biomni/SKILL.md | 2 +- scientific-packages/biopython/SKILL.md | 2 +- scientific-packages/bioservices/SKILL.md | 2 +- scientific-packages/cellxgene-census/SKILL.md | 2 +- scientific-packages/cobrapy/SKILL.md | 2 +- scientific-packages/dask/SKILL.md | 2 +- scientific-packages/datamol/SKILL.md | 2 +- scientific-packages/deepchem/SKILL.md | 2 +- scientific-packages/deeptools/SKILL.md | 2 +- scientific-packages/diffdock/SKILL.md | 2 +- scientific-packages/etetoolkit/SKILL.md | 2 +- scientific-packages/flowio/SKILL.md | 2 +- scientific-packages/gget/SKILL.md | 2 +- scientific-packages/matchms/SKILL.md | 2 +- scientific-packages/matplotlib/SKILL.md | 2 +- scientific-packages/medchem/SKILL.md | 2 +- scientific-packages/molfeat/SKILL.md | 2 +- scientific-packages/polars/SKILL.md | 2 +- scientific-packages/pydeseq2/SKILL.md | 2 +- scientific-packages/pymatgen/SKILL.md | 2 +- scientific-packages/pymc/SKILL.md | 2 +- scientific-packages/pymoo/SKILL.md | 2 +- scientific-packages/pyopenms/SKILL.md | 2 +- scientific-packages/pysam/SKILL.md | 2 +- scientific-packages/pytdc/SKILL.md | 2 +- scientific-packages/pytorch-lightning/SKILL.md | 2 +- scientific-packages/rdkit/SKILL.md | 2 +- scientific-packages/reportlab/SKILL.md | 2 +- scientific-packages/scanpy/SKILL.md | 2 +- scientific-packages/scikit-bio/SKILL.md | 2 +- scientific-packages/scikit-learn/SKILL.md | 2 +- scientific-packages/seaborn/SKILL.md | 2 +- scientific-packages/statsmodels/SKILL.md | 2 +- scientific-packages/torch_geometric/SKILL.md | 2 +- scientific-packages/transformers/SKILL.md | 2 +- scientific-packages/umap-learn/SKILL.md | 2 +- scientific-packages/zarr-python/SKILL.md | 2 +- scientific-thinking/document-skills/pdf/SKILL.md | 2 +- scientific-thinking/exploratory-data-analysis/SKILL.md | 2 +- scientific-thinking/hypothesis-generation/SKILL.md | 2 +- scientific-thinking/peer-review/SKILL.md | 2 +- scientific-thinking/scientific-brainstorming/SKILL.md | 2 +- scientific-thinking/scientific-critical-thinking/SKILL.md | 2 
+- scientific-thinking/scientific-visualization/SKILL.md | 2 +- scientific-thinking/statistical-analysis/SKILL.md | 2 +- 68 files changed, 68 insertions(+), 68 deletions(-) diff --git a/scientific-databases/alphafold-database/SKILL.md b/scientific-databases/alphafold-database/SKILL.md index 37de662..dbca28a 100644 --- a/scientific-databases/alphafold-database/SKILL.md +++ b/scientific-databases/alphafold-database/SKILL.md @@ -1,6 +1,6 @@ --- name: alphafold-database -description: Access and analyze AlphaFold protein structure predictions from the DeepMind/EMBL-EBI database containing 200M+ AI-predicted protein structures. Use this skill for: retrieving protein structures by UniProt accession codes, downloading structure files (mmCIF, PDB), accessing confidence metrics (pLDDT scores, PAE matrices), bulk proteome downloads via Google Cloud Storage, structural analysis workflows, drug discovery target preparation, protein engineering studies, evolutionary structural comparisons, structural bioinformatics pipelines, molecular modeling preparation, protein function prediction from structure, structural genomics projects, comparative structural analysis, protein domain identification, binding site analysis, structural quality assessment, confidence score interpretation, batch processing multiple proteins, integrating AlphaFold predictions with experimental data, structural visualization preparation, protein-protein interaction analysis, conformational analysis, structural annotation workflows, homology modeling validation, structural feature extraction, protein classification by structure, structural motif identification, and any computational biology task requiring AI-predicted protein structures. +description: "Access and analyze AlphaFold protein structure predictions from the DeepMind/EMBL-EBI database containing 200M+ AI-predicted protein structures. Use this skill for: retrieving protein structures by UniProt accession codes, downloading structure files (mmCIF, PDB), accessing confidence metrics (pLDDT scores, PAE matrices), bulk proteome downloads via Google Cloud Storage, structural analysis workflows, drug discovery target preparation, protein engineering studies, evolutionary structural comparisons, structural bioinformatics pipelines, molecular modeling preparation, protein function prediction from structure, structural genomics projects, comparative structural analysis, protein domain identification, binding site analysis, structural quality assessment, confidence score interpretation, batch processing multiple proteins, integrating AlphaFold predictions with experimental data, structural visualization preparation, protein-protein interaction analysis, conformational analysis, structural annotation workflows, homology modeling validation, structural feature extraction, protein classification by structure, structural motif identification, and any computational biology task requiring AI-predicted protein structures." --- # AlphaFold Database diff --git a/scientific-databases/chembl-database/SKILL.md b/scientific-databases/chembl-database/SKILL.md index 6932272..817e289 100644 --- a/scientific-databases/chembl-database/SKILL.md +++ b/scientific-databases/chembl-database/SKILL.md @@ -1,6 +1,6 @@ --- name: chembl-database -description: Comprehensive toolkit for accessing and querying the ChEMBL database, the world's largest manually curated repository of bioactive drug-like molecules. 
Use this skill when you need to: search for compounds by name, structure, or molecular properties; retrieve bioactivity data (IC50, Ki, EC50, etc.) for drug targets; find inhibitors, agonists, or bioactive molecules for specific proteins; perform similarity and substructure searches using SMILES; analyze drug-target interactions and mechanisms of action; explore approved drugs and their indications; conduct structure-activity relationship (SAR) studies; identify kinase inhibitors or other drug classes; perform virtual screening based on molecular properties; retrieve pharmaceutical information and drug discovery data; analyze molecular properties like molecular weight, LogP, and drug-likeness; find compounds for drug repurposing studies; query target information for proteins, enzymes, and biological receptors; export bioactivity data for further analysis. This skill is essential for drug discovery, medicinal chemistry, pharmacology, cheminformatics, and any research involving small molecule therapeutics and their biological activities. +description: "Comprehensive toolkit for accessing and querying the ChEMBL database, the world's largest manually curated repository of bioactive drug-like molecules. Use this skill when you need to: search for compounds by name, structure, or molecular properties; retrieve bioactivity data (IC50, Ki, EC50, etc.) for drug targets; find inhibitors, agonists, or bioactive molecules for specific proteins; perform similarity and substructure searches using SMILES; analyze drug-target interactions and mechanisms of action; explore approved drugs and their indications; conduct structure-activity relationship (SAR) studies; identify kinase inhibitors or other drug classes; perform virtual screening based on molecular properties; retrieve pharmaceutical information and drug discovery data; analyze molecular properties like molecular weight, LogP, and drug-likeness; find compounds for drug repurposing studies; query target information for proteins, enzymes, and biological receptors; export bioactivity data for further analysis. This skill is essential for drug discovery, medicinal chemistry, pharmacology, cheminformatics, and any research involving small molecule therapeutics and their biological activities." --- # ChEMBL Database diff --git a/scientific-databases/clinpgx-database/SKILL.md b/scientific-databases/clinpgx-database/SKILL.md index 2c3d40f..48d2f17 100644 --- a/scientific-databases/clinpgx-database/SKILL.md +++ b/scientific-databases/clinpgx-database/SKILL.md @@ -1,6 +1,6 @@ --- name: clinpgx-database -description: Comprehensive toolkit for accessing ClinPGx (Clinical Pharmacogenomics Database), the successor to PharmGKB providing clinical pharmacogenomics information on how genetic variation affects drug response, metabolism, efficacy, and toxicity. 
Use this skill for pharmacogenomics research, clinical decision support, gene-drug interaction queries, CPIC guideline access, allele function and frequency analysis, drug metabolism pathway exploration, personalized medicine implementation, precision pharmacotherapy, adverse drug reaction prediction, genotype-guided dosing, clinical annotation retrieval, drug label analysis, variant interpretation, phenotype prediction, population pharmacogenomics, medication therapy management, clinical trial screening, pharmacogene panel analysis, CYP450 enzyme interactions, transporter gene effects, HLA-associated drug reactions, immunosuppressant dosing, oncology drug toxicity, antidepressant response, anticoagulant therapy, pain medication metabolism, antiviral screening, cardiovascular drug interactions, and PharmDOG clinical decision support tool integration. ClinPGx consolidates PharmGKB, CPIC (Clinical Pharmacogenetics Implementation Consortium), PharmCAT, DPWG guidelines, FDA/EMA drug labels, and provides REST API access to curated pharmacogenomic knowledge for evidence-based clinical practice. +description: "Comprehensive toolkit for accessing ClinPGx (Clinical Pharmacogenomics Database), the successor to PharmGKB providing clinical pharmacogenomics information on how genetic variation affects drug response, metabolism, efficacy, and toxicity. Use this skill for pharmacogenomics research, clinical decision support, gene-drug interaction queries, CPIC guideline access, allele function and frequency analysis, drug metabolism pathway exploration, personalized medicine implementation, precision pharmacotherapy, adverse drug reaction prediction, genotype-guided dosing, clinical annotation retrieval, drug label analysis, variant interpretation, phenotype prediction, population pharmacogenomics, medication therapy management, clinical trial screening, pharmacogene panel analysis, CYP450 enzyme interactions, transporter gene effects, HLA-associated drug reactions, immunosuppressant dosing, oncology drug toxicity, antidepressant response, anticoagulant therapy, pain medication metabolism, antiviral screening, cardiovascular drug interactions, and PharmDOG clinical decision support tool integration. ClinPGx consolidates PharmGKB, CPIC (Clinical Pharmacogenetics Implementation Consortium), PharmCAT, DPWG guidelines, FDA/EMA drug labels, and provides REST API access to curated pharmacogenomic knowledge for evidence-based clinical practice." --- # ClinPGx Database diff --git a/scientific-databases/clinvar-database/SKILL.md b/scientific-databases/clinvar-database/SKILL.md index fb73148..e04cc17 100644 --- a/scientific-databases/clinvar-database/SKILL.md +++ b/scientific-databases/clinvar-database/SKILL.md @@ -1,6 +1,6 @@ --- name: clinvar-database -description: Access and analyze ClinVar, NCBI's authoritative database of human genomic variants and their clinical significance classifications. 
Use this skill when: searching for pathogenic, benign, or VUS variants by gene name, chromosome position, or disease condition; interpreting clinical significance classifications (pathogenic, likely pathogenic, uncertain significance, likely benign, benign) and review status star ratings; querying variant pathogenicity for specific genes like BRCA1, BRCA2, TP53, CFTR; accessing ClinVar data programmatically via NCBI E-utilities API (esearch, esummary, efetch); downloading bulk ClinVar datasets from FTP in XML, VCF, or tab-delimited formats; annotating variant call files (VCF) with clinical significance; analyzing variant-condition relationships for genetic disease research; resolving conflicting variant interpretations between submitters; filtering variants by review status (expert panel, practice guidelines); building local ClinVar databases for genomic analysis pipelines; studying hereditary cancer variants, Mendelian disease mutations, or pharmacogenomic variants; performing variant interpretation for precision medicine applications; accessing ClinVar submission data and evidence criteria; tracking variant classification updates over time; or any task requiring authoritative clinical variant interpretation data from NCBI's ClinVar database. +description: "Access and analyze ClinVar, NCBI's authoritative database of human genomic variants and their clinical significance classifications. Use this skill when: searching for pathogenic, benign, or VUS variants by gene name, chromosome position, or disease condition; interpreting clinical significance classifications (pathogenic, likely pathogenic, uncertain significance, likely benign, benign) and review status star ratings; querying variant pathogenicity for specific genes like BRCA1, BRCA2, TP53, CFTR; accessing ClinVar data programmatically via NCBI E-utilities API (esearch, esummary, efetch); downloading bulk ClinVar datasets from FTP in XML, VCF, or tab-delimited formats; annotating variant call files (VCF) with clinical significance; analyzing variant-condition relationships for genetic disease research; resolving conflicting variant interpretations between submitters; filtering variants by review status (expert panel, practice guidelines); building local ClinVar databases for genomic analysis pipelines; studying hereditary cancer variants, Mendelian disease mutations, or pharmacogenomic variants; performing variant interpretation for precision medicine applications; accessing ClinVar submission data and evidence criteria; tracking variant classification updates over time; or any task requiring authoritative clinical variant interpretation data from NCBI's ClinVar database." --- # ClinVar Database diff --git a/scientific-databases/cosmic-database/SKILL.md b/scientific-databases/cosmic-database/SKILL.md index a1fb8e4..64b92ee 100644 --- a/scientific-databases/cosmic-database/SKILL.md +++ b/scientific-databases/cosmic-database/SKILL.md @@ -1,6 +1,6 @@ --- name: cosmic-database -description: Access and analyze COSMIC (Catalogue of Somatic Mutations in Cancer), the world's largest database of somatic cancer mutations. Use this skill for downloading cancer mutation datasets, accessing the Cancer Gene Census, retrieving mutational signatures, analyzing drug resistance mutations, working with structural variants and gene fusions, accessing copy number alterations, and integrating cancer genomics data into bioinformatics pipelines. 
Essential for cancer research, oncology drug discovery, precision medicine, tumor profiling, biomarker identification, cancer genomics analysis, somatic variant annotation, mutational signature analysis, cancer gene prioritization, drug resistance studies, and cancer cell line research. Supports both academic (free) and commercial (licensed) access with authentication required. +description: "Access and analyze COSMIC (Catalogue of Somatic Mutations in Cancer), the world's largest database of somatic cancer mutations. Use this skill for downloading cancer mutation datasets, accessing the Cancer Gene Census, retrieving mutational signatures, analyzing drug resistance mutations, working with structural variants and gene fusions, accessing copy number alterations, and integrating cancer genomics data into bioinformatics pipelines. Essential for cancer research, oncology drug discovery, precision medicine, tumor profiling, biomarker identification, cancer genomics analysis, somatic variant annotation, mutational signature analysis, cancer gene prioritization, drug resistance studies, and cancer cell line research. Supports both academic (free) and commercial (licensed) access with authentication required." --- # COSMIC Database diff --git a/scientific-databases/ena-database/SKILL.md b/scientific-databases/ena-database/SKILL.md index 9b7d2dd..2ab21b6 100644 --- a/scientific-databases/ena-database/SKILL.md +++ b/scientific-databases/ena-database/SKILL.md @@ -1,6 +1,6 @@ --- name: ena-database -description: Comprehensive toolkit for accessing, searching, and retrieving data from the European Nucleotide Archive (ENA) - the primary European repository for nucleotide sequence data. Provides programmatic API access for DNA/RNA sequences, genome assemblies, raw sequencing reads (FASTQ), samples, studies, experiments, runs, analyses, and taxonomic records. Use this skill for: retrieving genomic/transcriptomic data by accession numbers (ERR, SRR, PRJ, GCA, etc.), searching sequence databases, downloading raw sequencing data, accessing genome assemblies, finding samples and studies, building bioinformatics pipelines, performing sequence similarity searches, accessing taxonomic information, bulk data downloads, metadata extraction, and integrating ENA data into computational biology workflows. Supports multiple data formats (FASTQ, FASTA, BAM, CRAM, XML, JSON, TSV) and download methods (API, FTP, Aspera). Essential for genomics, transcriptomics, metagenomics, phylogenetics, and molecular biology research requiring access to European nucleotide sequence repositories. +description: "Comprehensive toolkit for accessing, searching, and retrieving data from the European Nucleotide Archive (ENA) - the primary European repository for nucleotide sequence data. Provides programmatic API access for DNA/RNA sequences, genome assemblies, raw sequencing reads (FASTQ), samples, studies, experiments, runs, analyses, and taxonomic records. Use this skill for: retrieving genomic/transcriptomic data by accession numbers (ERR, SRR, PRJ, GCA, etc.), searching sequence databases, downloading raw sequencing data, accessing genome assemblies, finding samples and studies, building bioinformatics pipelines, performing sequence similarity searches, accessing taxonomic information, bulk data downloads, metadata extraction, and integrating ENA data into computational biology workflows. Supports multiple data formats (FASTQ, FASTA, BAM, CRAM, XML, JSON, TSV) and download methods (API, FTP, Aspera). 
Essential for genomics, transcriptomics, metagenomics, phylogenetics, and molecular biology research requiring access to European nucleotide sequence repositories." --- # ENA Database diff --git a/scientific-databases/ensembl-database/SKILL.md b/scientific-databases/ensembl-database/SKILL.md index 087af10..8bb19c5 100644 --- a/scientific-databases/ensembl-database/SKILL.md +++ b/scientific-databases/ensembl-database/SKILL.md @@ -1,6 +1,6 @@ --- name: ensembl-database -description: Access and query the Ensembl genome database for comprehensive vertebrate genomic data analysis. Use this skill for gene lookups, sequence retrieval, variant analysis, comparative genomics, ortholog/paralog identification, genomic region analysis, assembly mapping, and VEP predictions. Handles gene symbols, Ensembl IDs, rsIDs, genomic coordinates, chromosome regions, cross-species comparisons, evolutionary analysis, regulatory elements, transcript/protein sequences, population genetics, phenotype associations, and genome assembly conversions. Supports REST API queries across 250+ species including human, mouse, zebrafish, and other vertebrates. Essential for genomics research, variant interpretation, evolutionary studies, gene annotation pipelines, and bioinformatics workflows requiring authoritative genomic reference data. +description: "Access and query the Ensembl genome database for comprehensive vertebrate genomic data analysis. Use this skill for gene lookups, sequence retrieval, variant analysis, comparative genomics, ortholog/paralog identification, genomic region analysis, assembly mapping, and VEP predictions. Handles gene symbols, Ensembl IDs, rsIDs, genomic coordinates, chromosome regions, cross-species comparisons, evolutionary analysis, regulatory elements, transcript/protein sequences, population genetics, phenotype associations, and genome assembly conversions. Supports REST API queries across 250+ species including human, mouse, zebrafish, and other vertebrates. Essential for genomics research, variant interpretation, evolutionary studies, gene annotation pipelines, and bioinformatics workflows requiring authoritative genomic reference data." --- # Ensembl Database diff --git a/scientific-databases/gene-database/SKILL.md b/scientific-databases/gene-database/SKILL.md index e827c91..4d46f10 100644 --- a/scientific-databases/gene-database/SKILL.md +++ b/scientific-databases/gene-database/SKILL.md @@ -1,6 +1,6 @@ --- name: gene-database -description: Access and query NCBI Gene database programmatically using E-utilities and Datasets API. Search genes by symbol, name, ID, or biological context across organisms. Retrieve comprehensive gene information including nomenclature, aliases, reference sequences (RefSeqs), chromosomal locations, Gene Ontology annotations, phenotypes, pathways, and cross-references. Perform batch gene lookups, validate gene lists, analyze gene functions, and access gene metadata. Handle gene symbol resolution, organism-specific queries, and gene identifier mapping. Use for gene annotation, functional analysis, pathway enrichment, variant interpretation, and genomic data integration workflows. Supports JSON, XML, GenBank, FASTA, and text output formats with rate limiting and error handling. +description: "Access and query NCBI Gene database programmatically using E-utilities and Datasets API. Search genes by symbol, name, ID, or biological context across organisms. 
Retrieve comprehensive gene information including nomenclature, aliases, reference sequences (RefSeqs), chromosomal locations, Gene Ontology annotations, phenotypes, pathways, and cross-references. Perform batch gene lookups, validate gene lists, analyze gene functions, and access gene metadata. Handle gene symbol resolution, organism-specific queries, and gene identifier mapping. Use for gene annotation, functional analysis, pathway enrichment, variant interpretation, and genomic data integration workflows. Supports JSON, XML, GenBank, FASTA, and text output formats with rate limiting and error handling." --- # Gene Database diff --git a/scientific-databases/geo-database/SKILL.md b/scientific-databases/geo-database/SKILL.md index dd428a3..66651e6 100644 --- a/scientific-databases/geo-database/SKILL.md +++ b/scientific-databases/geo-database/SKILL.md @@ -1,6 +1,6 @@ --- name: geo-database -description: Work with the Gene Expression Omnibus (GEO) database to search, retrieve, download, and analyze high-throughput gene expression and functional genomics data. Use this skill for microarray data analysis, RNA-seq datasets, gene expression profiling, accessing GEO accessions (GSE series, GSM samples, GPL platforms, GDS datasets), downloading SOFT/MINiML/Matrix files, querying expression experiments, performing differential expression analysis, accessing GEO metadata, batch processing multiple datasets, quality control of expression data, correlation analysis, clustering, meta-analysis across studies, biomarker discovery, drug response studies, disease biology research, transcriptomics analysis, or when needing programmatic access to functional genomics repositories. This skill covers GEOparse library usage, NCBI E-utilities API, FTP downloads, data preprocessing, statistical analysis, visualization, and integration with downstream analysis workflows. +description: "Work with the Gene Expression Omnibus (GEO) database to search, retrieve, download, and analyze high-throughput gene expression and functional genomics data. Use this skill for microarray data analysis, RNA-seq datasets, gene expression profiling, accessing GEO accessions (GSE series, GSM samples, GPL platforms, GDS datasets), downloading SOFT/MINiML/Matrix files, querying expression experiments, performing differential expression analysis, accessing GEO metadata, batch processing multiple datasets, quality control of expression data, correlation analysis, clustering, meta-analysis across studies, biomarker discovery, drug response studies, disease biology research, transcriptomics analysis, or when needing programmatic access to functional genomics repositories. This skill covers GEOparse library usage, NCBI E-utilities API, FTP downloads, data preprocessing, statistical analysis, visualization, and integration with downstream analysis workflows." --- # GEO Database diff --git a/scientific-databases/gwas-database/SKILL.md b/scientific-databases/gwas-database/SKILL.md index d1f1fcf..0932df0 100644 --- a/scientific-databases/gwas-database/SKILL.md +++ b/scientific-databases/gwas-database/SKILL.md @@ -1,6 +1,6 @@ --- name: gwas-database -description: Comprehensive toolkit for accessing and querying the GWAS Catalog (NHGRI-EBI database of published genome-wide association studies). 
Use this skill when you need to: find genetic variants (SNPs) associated with diseases or traits, retrieve SNP-trait associations with p-values and effect sizes, access GWAS summary statistics and genome-wide data, explore gene-disease relationships and pleiotropy, conduct genetic epidemiology research, perform systematic reviews of genetic associations, identify variants for polygenic risk scores, investigate population-specific genetic associations, access curated SNP-trait associations from thousands of GWAS publications, query by rs ID variants, search by disease/trait names, find variants in specific genes or chromosomal regions, retrieve study metadata and publication information, access full summary statistics for downstream analysis, cross-reference with Ensembl and other genomic databases, or perform meta-analyses of genetic associations. Essential for human genetics research, precision medicine applications, genomic research, pharmacogenomics, population genetics, and genetic risk prediction studies. +description: "Comprehensive toolkit for accessing and querying the GWAS Catalog (NHGRI-EBI database of published genome-wide association studies). Use this skill when you need to: find genetic variants (SNPs) associated with diseases or traits, retrieve SNP-trait associations with p-values and effect sizes, access GWAS summary statistics and genome-wide data, explore gene-disease relationships and pleiotropy, conduct genetic epidemiology research, perform systematic reviews of genetic associations, identify variants for polygenic risk scores, investigate population-specific genetic associations, access curated SNP-trait associations from thousands of GWAS publications, query by rs ID variants, search by disease/trait names, find variants in specific genes or chromosomal regions, retrieve study metadata and publication information, access full summary statistics for downstream analysis, cross-reference with Ensembl and other genomic databases, or perform meta-analyses of genetic associations. Essential for human genetics research, precision medicine applications, genomic research, pharmacogenomics, population genetics, and genetic risk prediction studies." --- # GWAS Catalog Database diff --git a/scientific-databases/hmdb-database/SKILL.md b/scientific-databases/hmdb-database/SKILL.md index 09c5f1d..7659a20 100644 --- a/scientific-databases/hmdb-database/SKILL.md +++ b/scientific-databases/hmdb-database/SKILL.md @@ -1,6 +1,6 @@ --- name: hmdb-database -description: Access and analyze the Human Metabolome Database (HMDB) for comprehensive metabolite information, metabolomics research, biomarker discovery, metabolite identification, pathway analysis, and clinical associations. Use this skill for: searching metabolites by name, HMDB ID, structure, or spectral data; retrieving chemical properties (SMILES, InChI, molecular weight, formula); accessing clinical biomarker data and disease associations; downloading bulk datasets (XML, SDF, CSV formats); analyzing metabolic pathways and enzyme associations; performing spectral matching for metabolite identification; accessing concentration ranges in biological fluids; integrating with external databases (KEGG, PubChem, ChEBI); and supporting untargeted metabolomics workflows. HMDB contains 220,945+ metabolite entries with chemical, biological, clinical, and analytical data including NMR/MS spectra, pathway information, and biomarker associations for human metabolomics research. 
+description: "Access and analyze the Human Metabolome Database (HMDB) for comprehensive metabolite information, metabolomics research, biomarker discovery, metabolite identification, pathway analysis, and clinical associations. Use this skill for: searching metabolites by name, HMDB ID, structure, or spectral data; retrieving chemical properties (SMILES, InChI, molecular weight, formula); accessing clinical biomarker data and disease associations; downloading bulk datasets (XML, SDF, CSV formats); analyzing metabolic pathways and enzyme associations; performing spectral matching for metabolite identification; accessing concentration ranges in biological fluids; integrating with external databases (KEGG, PubChem, ChEBI); and supporting untargeted metabolomics workflows. HMDB contains 220,945+ metabolite entries with chemical, biological, clinical, and analytical data including NMR/MS spectra, pathway information, and biomarker associations for human metabolomics research." --- # HMDB Database diff --git a/scientific-databases/kegg-database/SKILL.md b/scientific-databases/kegg-database/SKILL.md index 0e30112..6a525c7 100644 --- a/scientific-databases/kegg-database/SKILL.md +++ b/scientific-databases/kegg-database/SKILL.md @@ -1,6 +1,6 @@ --- name: kegg-database -description: Access and analyze the KEGG (Kyoto Encyclopedia of Genes and Genomes) database for comprehensive biological pathway analysis, molecular interaction networks, and cross-database integration. Use this skill for pathway enrichment analysis, gene-to-pathway mapping, metabolic pathway exploration, drug-drug interaction checking, compound structure retrieval, enzyme pathway analysis, disease pathway investigation, and identifier conversion between KEGG and external databases (UniProt, NCBI Gene, PubChem, ChEBI). Supports querying pathways, genes, compounds, enzymes, diseases, drugs, reactions, modules, and orthology groups across multiple organisms including human, mouse, yeast, E. coli, and fruit fly. Key operations include database information retrieval, entry listing and searching, detailed entry retrieval in multiple formats (FASTA sequences, MOL structures, pathway images, KGML XML, JSON), cross-referencing between databases, ID conversion, and drug interaction analysis. Essential for bioinformatics workflows involving pathway analysis, systems biology, drug discovery, metabolic engineering, comparative genomics, and functional annotation of genes and proteins. +description: "Access and analyze the KEGG (Kyoto Encyclopedia of Genes and Genomes) database for comprehensive biological pathway analysis, molecular interaction networks, and cross-database integration. Use this skill for pathway enrichment analysis, gene-to-pathway mapping, metabolic pathway exploration, drug-drug interaction checking, compound structure retrieval, enzyme pathway analysis, disease pathway investigation, and identifier conversion between KEGG and external databases (UniProt, NCBI Gene, PubChem, ChEBI). Supports querying pathways, genes, compounds, enzymes, diseases, drugs, reactions, modules, and orthology groups across multiple organisms including human, mouse, yeast, E. coli, and fruit fly. Key operations include database information retrieval, entry listing and searching, detailed entry retrieval in multiple formats (FASTA sequences, MOL structures, pathway images, KGML XML, JSON), cross-referencing between databases, ID conversion, and drug interaction analysis. 
Essential for bioinformatics workflows involving pathway analysis, systems biology, drug discovery, metabolic engineering, comparative genomics, and functional annotation of genes and proteins." --- # KEGG Database diff --git a/scientific-databases/metabolomics-workbench-database/SKILL.md b/scientific-databases/metabolomics-workbench-database/SKILL.md index 58fa56f..885d152 100644 --- a/scientific-databases/metabolomics-workbench-database/SKILL.md +++ b/scientific-databases/metabolomics-workbench-database/SKILL.md @@ -1,6 +1,6 @@ --- name: metabolomics-workbench-database -description: Comprehensive toolkit for accessing and analyzing metabolomics data through the Metabolomics Workbench REST API. This NIH-sponsored repository contains 4,200+ metabolomics studies with standardized RefMet nomenclature, experimental datasets, metabolite structures, and gene/protein associations. Use this skill for: querying metabolite structures and downloading molecular data (MOL files, PNG images), accessing study metadata and experimental results from GC-MS/LC-MS/NMR platforms, standardizing metabolite names using RefMet classification system, performing mass spectrometry searches by m/z values with ion adducts, filtering studies by analytical methods (LCMS/GCMS/NMR), ionization polarity (positive/negative), chromatography types (HILIC/RP/GC), species, sample sources, and diseases, retrieving gene and protein information related to metabolic pathways, cross-referencing metabolite identifiers across databases (PubChem, KEGG, HMDB), identifying compounds from MS data using exact mass calculations, exploring disease-specific metabolomics studies, accessing untargeted metabolomics datasets, and retrieving complete experimental data in JSON or TXT formats. Essential for metabolomics research, biomarker discovery, metabolic pathway analysis, and mass spectrometry data interpretation. +description: "Comprehensive toolkit for accessing and analyzing metabolomics data through the Metabolomics Workbench REST API. This NIH-sponsored repository contains 4,200+ metabolomics studies with standardized RefMet nomenclature, experimental datasets, metabolite structures, and gene/protein associations. Use this skill for: querying metabolite structures and downloading molecular data (MOL files, PNG images), accessing study metadata and experimental results from GC-MS/LC-MS/NMR platforms, standardizing metabolite names using RefMet classification system, performing mass spectrometry searches by m/z values with ion adducts, filtering studies by analytical methods (LCMS/GCMS/NMR), ionization polarity (positive/negative), chromatography types (HILIC/RP/GC), species, sample sources, and diseases, retrieving gene and protein information related to metabolic pathways, cross-referencing metabolite identifiers across databases (PubChem, KEGG, HMDB), identifying compounds from MS data using exact mass calculations, exploring disease-specific metabolomics studies, accessing untargeted metabolomics datasets, and retrieving complete experimental data in JSON or TXT formats. Essential for metabolomics research, biomarker discovery, metabolic pathway analysis, and mass spectrometry data interpretation." 
--- # Metabolomics Workbench Database diff --git a/scientific-databases/pdb-database/SKILL.md b/scientific-databases/pdb-database/SKILL.md index 44a1da3..fa4c827 100644 --- a/scientific-databases/pdb-database/SKILL.md +++ b/scientific-databases/pdb-database/SKILL.md @@ -1,6 +1,6 @@ --- name: pdb-database -description: Access and analyze the RCSB Protein Data Bank (PDB) - the global repository for 3D structural data of biological macromolecules including proteins, nucleic acids, complexes, and ligands. This skill enables searching structures by text, attributes, sequence similarity, and structural similarity; retrieving detailed metadata and experimental information; downloading coordinate files in PDB, mmCIF, and BinaryCIF formats; performing batch operations on multiple structures; and integrating PDB data into computational workflows. Use this skill for protein structure analysis, molecular visualization, drug discovery research, protein engineering, structural biology studies, crystallographic data analysis, homology modeling, ligand binding site analysis, structure-function relationship studies, evolutionary analysis, educational content creation, and any task requiring access to experimentally determined or computationally predicted macromolecular structures. Key capabilities include querying by organism, resolution, experimental method, deposition date, biological assembly information, and performing sequence/structure similarity searches across the entire PDB archive. +description: "Access and analyze the RCSB Protein Data Bank (PDB) - the global repository for 3D structural data of biological macromolecules including proteins, nucleic acids, complexes, and ligands. This skill enables searching structures by text, attributes, sequence similarity, and structural similarity; retrieving detailed metadata and experimental information; downloading coordinate files in PDB, mmCIF, and BinaryCIF formats; performing batch operations on multiple structures; and integrating PDB data into computational workflows. Use this skill for protein structure analysis, molecular visualization, drug discovery research, protein engineering, structural biology studies, crystallographic data analysis, homology modeling, ligand binding site analysis, structure-function relationship studies, evolutionary analysis, educational content creation, and any task requiring access to experimentally determined or computationally predicted macromolecular structures. Key capabilities include querying by organism, resolution, experimental method, deposition date, biological assembly information, and performing sequence/structure similarity searches across the entire PDB archive." --- # PDB Database diff --git a/scientific-databases/pubchem-database/SKILL.md b/scientific-databases/pubchem-database/SKILL.md index f44d869..b91820d 100644 --- a/scientific-databases/pubchem-database/SKILL.md +++ b/scientific-databases/pubchem-database/SKILL.md @@ -1,6 +1,6 @@ --- name: pubchem-database -description: Access and analyze chemical compound data from PubChem database using PubChemPy and PUG-REST API. 
Use this skill when you need to: search compounds by name/CID/SMILES/InChI/formula, retrieve molecular properties (MW/LogP/TPSA/H-bond counts), perform similarity searches with Tanimoto thresholds, conduct substructure searches for pharmacophores, convert between chemical formats (SMILES/InChI/SDF/JSON), generate 2D structure images, access bioactivity data from assays, get compound synonyms and annotations, screen compounds using Lipinski's Rule of Five, batch process multiple compounds, or find drug-like candidates. Handles 110M+ compounds and 270M+ bioactivities with rate limiting (5 req/sec, 400 req/min). Includes error handling for timeouts, not found errors, and missing properties. Supports both synchronous and asynchronous operations for large similarity/substructure searches. +description: "Access and analyze chemical compound data from PubChem database using PubChemPy and PUG-REST API. Use this skill when you need to: search compounds by name/CID/SMILES/InChI/formula, retrieve molecular properties (MW/LogP/TPSA/H-bond counts), perform similarity searches with Tanimoto thresholds, conduct substructure searches for pharmacophores, convert between chemical formats (SMILES/InChI/SDF/JSON), generate 2D structure images, access bioactivity data from assays, get compound synonyms and annotations, screen compounds using Lipinski's Rule of Five, batch process multiple compounds, or find drug-like candidates. Handles 110M+ compounds and 270M+ bioactivities with rate limiting (5 req/sec, 400 req/min). Includes error handling for timeouts, not found errors, and missing properties. Supports both synchronous and asynchronous operations for large similarity/substructure searches." --- # PubChem Database diff --git a/scientific-databases/pubmed-database/SKILL.md b/scientific-databases/pubmed-database/SKILL.md index 658cf12..6356fc2 100644 --- a/scientific-databases/pubmed-database/SKILL.md +++ b/scientific-databases/pubmed-database/SKILL.md @@ -1,6 +1,6 @@ --- name: pubmed-database -description: Comprehensive PubMed database expertise for searching, retrieving, and analyzing biomedical research literature. Use for literature searches, systematic reviews, meta-analyses, citation discovery, programmatic data access, and biomedical research workflows. Handles advanced search queries with Boolean operators, MeSH terms, field tags, publication type filters, and date ranges. Provides E-utilities API integration for automated workflows, batch processing, and large-scale data extraction. Supports citation management, export formats, search history, and related article discovery. Covers medicine, biology, pharmacology, genetics, clinical trials, epidemiology, drug discovery, disease research, treatment protocols, and all life sciences domains. Essential for researchers, clinicians, students, and anyone conducting biomedical literature analysis or evidence-based research. +description: "Comprehensive PubMed database expertise for searching, retrieving, and analyzing biomedical research literature. Use for literature searches, systematic reviews, meta-analyses, citation discovery, programmatic data access, and biomedical research workflows. Handles advanced search queries with Boolean operators, MeSH terms, field tags, publication type filters, and date ranges. Provides E-utilities API integration for automated workflows, batch processing, and large-scale data extraction. Supports citation management, export formats, search history, and related article discovery. 
Covers medicine, biology, pharmacology, genetics, clinical trials, epidemiology, drug discovery, disease research, treatment protocols, and all life sciences domains. Essential for researchers, clinicians, students, and anyone conducting biomedical literature analysis or evidence-based research." --- # PubMed Database diff --git a/scientific-databases/reactome-database/reactome-database/SKILL.md b/scientific-databases/reactome-database/reactome-database/SKILL.md index 79dc846..c3fb9c7 100644 --- a/scientific-databases/reactome-database/reactome-database/SKILL.md +++ b/scientific-databases/reactome-database/reactome-database/SKILL.md @@ -1,6 +1,6 @@ --- name: reactome-database -description: Comprehensive Reactome pathway database integration for biological pathway analysis, enrichment studies, molecular interaction queries, and gene expression analysis. Use this skill for pathway overrepresentation analysis, gene-to-pathway mapping, expression dataset analysis, disease pathway exploration, molecular interaction networks, pathway hierarchy queries, species comparison studies, pathway visualization, and statistical pathway enrichment. Supports REST API access to Content Service (data retrieval) and Analysis Service (computational analysis), plus reactome2py Python package integration. Handles gene symbols, UniProt IDs, Ensembl IDs, EntrezGene IDs, ChEBI IDs, and expression data formats. Provides pathway statistics, literature references, molecular entities, reactions, complexes, and pathway browser visualization links. +description: "Comprehensive Reactome pathway database integration for biological pathway analysis, enrichment studies, molecular interaction queries, and gene expression analysis. Use this skill for pathway overrepresentation analysis, gene-to-pathway mapping, expression dataset analysis, disease pathway exploration, molecular interaction networks, pathway hierarchy queries, species comparison studies, pathway visualization, and statistical pathway enrichment. Supports REST API access to Content Service (data retrieval) and Analysis Service (computational analysis), plus reactome2py Python package integration. Handles gene symbols, UniProt IDs, Ensembl IDs, EntrezGene IDs, ChEBI IDs, and expression data formats. Provides pathway statistics, literature references, molecular entities, reactions, complexes, and pathway browser visualization links." --- # Reactome Database diff --git a/scientific-databases/string-database/SKILL.md b/scientific-databases/string-database/SKILL.md index 2a8315f..3bab0ff 100644 --- a/scientific-databases/string-database/SKILL.md +++ b/scientific-databases/string-database/SKILL.md @@ -1,6 +1,6 @@ --- name: string-database -description: Access and analyze the STRING database for comprehensive protein-protein interaction (PPI) network analysis, functional enrichment, pathway analysis, and protein interaction discovery. This skill enables querying protein interactions, building interaction networks, performing Gene Ontology (GO) enrichment, KEGG pathway analysis, Pfam domain enrichment, protein-protein interaction enrichment testing, network visualization, interaction partner discovery, homology analysis, and identifier mapping across 5000+ species. 
Use for analyzing protein lists from experiments (differential expression, proteomics, mass spectrometry), validating protein networks, discovering novel protein interactions, pathway enrichment analysis, functional annotation, network connectivity analysis, cross-species protein comparison, protein family analysis, hub protein identification, network expansion from seed proteins, and systems biology studies. Provides access to 59.3 million proteins and 20+ billion interactions from experimental data, computational predictions, text-mining, and curated databases. Supports confidence scoring, multiple evidence types (experimental, coexpression, phylogenetic, genomic context), physical vs functional networks, and visualization capabilities. +description: "Access and analyze the STRING database for comprehensive protein-protein interaction (PPI) network analysis, functional enrichment, pathway analysis, and protein interaction discovery. This skill enables querying protein interactions, building interaction networks, performing Gene Ontology (GO) enrichment, KEGG pathway analysis, Pfam domain enrichment, protein-protein interaction enrichment testing, network visualization, interaction partner discovery, homology analysis, and identifier mapping across 5000+ species. Use for analyzing protein lists from experiments (differential expression, proteomics, mass spectrometry), validating protein networks, discovering novel protein interactions, pathway enrichment analysis, functional annotation, network connectivity analysis, cross-species protein comparison, protein family analysis, hub protein identification, network expansion from seed proteins, and systems biology studies. Provides access to 59.3 million proteins and 20+ billion interactions from experimental data, computational predictions, text-mining, and curated databases. Supports confidence scoring, multiple evidence types (experimental, coexpression, phylogenetic, genomic context), physical vs functional networks, and visualization capabilities." --- # STRING Database diff --git a/scientific-databases/uniprot-database/SKILL.md b/scientific-databases/uniprot-database/SKILL.md index 6916c27..4a0d41d 100644 --- a/scientific-databases/uniprot-database/SKILL.md +++ b/scientific-databases/uniprot-database/SKILL.md @@ -1,6 +1,6 @@ --- name: uniprot-database -description: Access and query the UniProt protein database for comprehensive protein information retrieval. Use this skill when you need to search for proteins by name, gene symbol, accession number, or functional terms; retrieve protein sequences in FASTA format; access detailed protein annotations including function, structure, interactions, and pathways; map protein identifiers between different databases (Ensembl, RefSeq, PDB, KEGG, GO terms); query Swiss-Prot (reviewed) or TrEMBL (unreviewed) protein entries; perform batch operations on multiple proteins; download protein datasets; analyze protein families and domains; investigate protein-protein interactions; explore evolutionary relationships through UniRef clusters; access protein structure predictions from AlphaFoldDB; retrieve Gene Ontology annotations; analyze protein modifications and post-translational sites; or work with protein sequences for bioinformatics analysis. This skill provides programmatic access to UniProtKB, UniRef, UniParc, and related databases through REST API endpoints with support for various output formats (JSON, TSV, FASTA, XML). 
+description: "Access and query the UniProt protein database for comprehensive protein information retrieval. Use this skill when you need to search for proteins by name, gene symbol, accession number, or functional terms; retrieve protein sequences in FASTA format; access detailed protein annotations including function, structure, interactions, and pathways; map protein identifiers between different databases (Ensembl, RefSeq, PDB, KEGG, GO terms); query Swiss-Prot (reviewed) or TrEMBL (unreviewed) protein entries; perform batch operations on multiple proteins; download protein datasets; analyze protein families and domains; investigate protein-protein interactions; explore evolutionary relationships through UniRef clusters; access protein structure predictions from AlphaFoldDB; retrieve Gene Ontology annotations; analyze protein modifications and post-translational sites; or work with protein sequences for bioinformatics analysis. This skill provides programmatic access to UniProtKB, UniRef, UniParc, and related databases through REST API endpoints with support for various output formats (JSON, TSV, FASTA, XML)." --- # UniProt Database diff --git a/scientific-databases/zinc-database/SKILL.md b/scientific-databases/zinc-database/SKILL.md index 0de7f8d..64be72e 100644 --- a/scientific-databases/zinc-database/SKILL.md +++ b/scientific-databases/zinc-database/SKILL.md @@ -1,6 +1,6 @@ --- name: zinc-database -description: Access and query the ZINC database containing 230+ million commercially-available compounds for virtual screening, drug discovery, and molecular docking studies. Use this skill when you need to: search for purchasable molecules by ZINC ID, SMILES, or supplier codes; perform structural similarity searches and analog discovery; retrieve random compound sets for screening libraries; find lead compounds for drug development; explore chemical space for virtual screening campaigns; download 3D-ready molecular structures for docking; verify compound availability from chemical suppliers; perform batch compound retrieval; generate screening libraries based on drug-likeness criteria; find commercially-available analogs of known active compounds; access the CartBlanche22 API for programmatic compound searches; filter compounds by molecular properties (LogP, MW, H-bond donors); retrieve compounds from specific catalogs or vendors; perform substructure searches; generate diverse compound sets for high-throughput screening; find fragment-like molecules for fragment-based drug discovery; access ready-to-dock 3D conformations; cross-reference supplier information; perform chemical space sampling; and integrate ZINC data with molecular docking workflows. This skill provides both web interface access and API endpoints for automated compound discovery and virtual screening applications. +description: "Access and query the ZINC database containing 230+ million commercially-available compounds for virtual screening, drug discovery, and molecular docking studies. 
Use this skill when you need to: search for purchasable molecules by ZINC ID, SMILES, or supplier codes; perform structural similarity searches and analog discovery; retrieve random compound sets for screening libraries; find lead compounds for drug development; explore chemical space for virtual screening campaigns; download 3D-ready molecular structures for docking; verify compound availability from chemical suppliers; perform batch compound retrieval; generate screening libraries based on drug-likeness criteria; find commercially-available analogs of known active compounds; access the CartBlanche22 API for programmatic compound searches; filter compounds by molecular properties (LogP, MW, H-bond donors); retrieve compounds from specific catalogs or vendors; perform substructure searches; generate diverse compound sets for high-throughput screening; find fragment-like molecules for fragment-based drug discovery; access ready-to-dock 3D conformations; cross-reference supplier information; perform chemical space sampling; and integrate ZINC data with molecular docking workflows. This skill provides both web interface access and API endpoints for automated compound discovery and virtual screening applications." --- # ZINC Database diff --git a/scientific-packages/anndata/SKILL.md b/scientific-packages/anndata/SKILL.md index 8e35f89..303be3b 100644 --- a/scientific-packages/anndata/SKILL.md +++ b/scientific-packages/anndata/SKILL.md @@ -1,6 +1,6 @@ --- name: anndata -description: Comprehensive AnnData (Annotated Data) manipulation for single-cell genomics, multi-omics, and structured scientific datasets. Use this skill for: loading/saving .h5ad files, creating AnnData objects from matrices/DataFrames, managing obs/var metadata, storing embeddings (PCA/UMAP/t-SNE) in obsm/varm, using layers for raw/normalized data, concatenating datasets with batch tracking, memory-efficient backed mode for large files, sparse matrix optimization, subsetting with views/copies, converting between formats (CSV/MTX/Loom/Zarr), single-cell RNA-seq workflows, batch integration, quality control filtering, dimensionality reduction storage, and scientific data management best practices. +description: "Comprehensive AnnData (Annotated Data) manipulation for single-cell genomics, multi-omics, and structured scientific datasets. Use this skill for: loading/saving .h5ad files, creating AnnData objects from matrices/DataFrames, managing obs/var metadata, storing embeddings (PCA/UMAP/t-SNE) in obsm/varm, using layers for raw/normalized data, concatenating datasets with batch tracking, memory-efficient backed mode for large files, sparse matrix optimization, subsetting with views/copies, converting between formats (CSV/MTX/Loom/Zarr), single-cell RNA-seq workflows, batch integration, quality control filtering, dimensionality reduction storage, and scientific data management best practices." --- # AnnData diff --git a/scientific-packages/arboreto/SKILL.md b/scientific-packages/arboreto/SKILL.md index d49bb5d..378b513 100644 --- a/scientific-packages/arboreto/SKILL.md +++ b/scientific-packages/arboreto/SKILL.md @@ -1,6 +1,6 @@ --- name: arboreto -description: Python toolkit for gene regulatory network (GRN) inference from gene expression data using machine learning algorithms. 
Use this skill for inferring transcription factor-target gene relationships, analyzing single-cell RNA-seq data, building regulatory networks, performing GRN inference from expression matrices, working with GRNBoost2 and GENIE3 algorithms, setting up distributed computing with Dask, integrating with pySCENIC workflows, comparing GRN inference methods, troubleshooting arboreto installation issues, handling large-scale genomic data analysis, and performing reproducible regulatory network analysis. Supports both GRNBoost2 (fast gradient boosting) and GENIE3 (Random Forest) algorithms with distributed computing capabilities via Dask for scalable analysis from single machines to multi-node clusters. +description: "Python toolkit for gene regulatory network (GRN) inference from gene expression data using machine learning algorithms. Use this skill for inferring transcription factor-target gene relationships, analyzing single-cell RNA-seq data, building regulatory networks, performing GRN inference from expression matrices, working with GRNBoost2 and GENIE3 algorithms, setting up distributed computing with Dask, integrating with pySCENIC workflows, comparing GRN inference methods, troubleshooting arboreto installation issues, handling large-scale genomic data analysis, and performing reproducible regulatory network analysis. Supports both GRNBoost2 (fast gradient boosting) and GENIE3 (Random Forest) algorithms with distributed computing capabilities via Dask for scalable analysis from single machines to multi-node clusters." --- # Arboreto - Gene Regulatory Network Inference diff --git a/scientific-packages/astropy/SKILL.md b/scientific-packages/astropy/SKILL.md index bc481ae..1ba89d5 100644 --- a/scientific-packages/astropy/SKILL.md +++ b/scientific-packages/astropy/SKILL.md @@ -1,6 +1,6 @@ --- name: astropy -description: Expert guidance for astronomical data analysis using the astropy Python library. Use this skill for FITS file operations (reading, writing, inspecting, modifying), coordinate transformations between celestial reference frames (ICRS, galactic, FK5, ecliptic, horizontal), cosmological distance and age calculations, astronomical time systems (UTC, TAI, TT, TDB), physical units and dimensional analysis, astronomical data tables with specialized column types, model fitting to astronomical data, World Coordinate System (WCS) transformations between pixel and sky coordinates, robust statistical analysis of astronomical datasets, and visualization of astronomical images with proper scaling. Essential for tasks involving celestial coordinates, astronomical file formats, photometry, spectroscopy, catalog matching, time series analysis, image processing, cosmological calculations, or any astronomy-specific Python computations requiring astropy's specialized tools and data structures. +description: "Expert guidance for astronomical data analysis using the astropy Python library. Use this skill for FITS file operations (reading, writing, inspecting, modifying), coordinate transformations between celestial reference frames (ICRS, galactic, FK5, ecliptic, horizontal), cosmological distance and age calculations, astronomical time systems (UTC, TAI, TT, TDB), physical units and dimensional analysis, astronomical data tables with specialized column types, model fitting to astronomical data, World Coordinate System (WCS) transformations between pixel and sky coordinates, robust statistical analysis of astronomical datasets, and visualization of astronomical images with proper scaling. 
Essential for tasks involving celestial coordinates, astronomical file formats, photometry, spectroscopy, catalog matching, time series analysis, image processing, cosmological calculations, or any astronomy-specific Python computations requiring astropy's specialized tools and data structures." --- # Astropy diff --git a/scientific-packages/biomni/SKILL.md b/scientific-packages/biomni/SKILL.md index 73df432..85c8c3c 100644 --- a/scientific-packages/biomni/SKILL.md +++ b/scientific-packages/biomni/SKILL.md @@ -1,6 +1,6 @@ --- name: biomni -description: Use this skill for autonomous biomedical research execution across genomics, proteomics, drug discovery, and computational biology. Biomni is an AI agent that combines LLM reasoning with retrieval-augmented planning and code generation to autonomously execute complex biomedical tasks. Use when you need: CRISPR guide RNA design and screening experiments, single-cell RNA-seq analysis workflows, molecular ADMET property prediction, GWAS analysis, protein structure prediction, disease classification from multi-omics data, pathway analysis, drug repurposing, biomarker discovery, variant interpretation, cell type annotation, or any biomedical computational task requiring automated code generation, data analysis, and scientific reasoning. The agent autonomously decomposes tasks, retrieves relevant biomedical knowledge from its 11GB knowledge base, generates and executes analysis code, and provides comprehensive results. Ideal for researchers needing automated execution of complex biomedical workflows without manual coding. +description: "Use this skill for autonomous biomedical research execution across genomics, proteomics, drug discovery, and computational biology. Biomni is an AI agent that combines LLM reasoning with retrieval-augmented planning and code generation to autonomously execute complex biomedical tasks. Use when you need: CRISPR guide RNA design and screening experiments, single-cell RNA-seq analysis workflows, molecular ADMET property prediction, GWAS analysis, protein structure prediction, disease classification from multi-omics data, pathway analysis, drug repurposing, biomarker discovery, variant interpretation, cell type annotation, or any biomedical computational task requiring automated code generation, data analysis, and scientific reasoning. The agent autonomously decomposes tasks, retrieves relevant biomedical knowledge from its 11GB knowledge base, generates and executes analysis code, and provides comprehensive results. Ideal for researchers needing automated execution of complex biomedical workflows without manual coding." --- # Biomni diff --git a/scientific-packages/biopython/SKILL.md b/scientific-packages/biopython/SKILL.md index a37082e..f250ff8 100644 --- a/scientific-packages/biopython/SKILL.md +++ b/scientific-packages/biopython/SKILL.md @@ -1,6 +1,6 @@ --- name: biopython -description: Use BioPython for computational molecular biology and bioinformatics tasks. 
Essential for: sequence manipulation (DNA/RNA/protein transcription, translation, complement, reverse complement), reading/writing biological file formats (FASTA, FASTQ, GenBank, EMBL, Swiss-Prot, PDB, Clustal, PHYLIP, NEXUS), NCBI database access (Entrez searches, downloads from GenBank/PubMed/Protein databases), BLAST sequence similarity searches and result parsing, pairwise and multiple sequence alignments, phylogenetic tree construction and analysis (UPGMA, Neighbor-Joining), protein structure analysis (PDB parsing, secondary structure, structural alignment), sequence property calculations (GC content, melting temperature, molecular weight, isoelectric point), format conversion between biological file types, restriction enzyme analysis, motif discovery, population genetics calculations, and any task requiring Bio.Seq, Bio.SeqIO, Bio.Entrez, Bio.Blast, Bio.Align, Bio.Phylo, Bio.PDB, Bio.SeqUtils, or other BioPython modules. +description: "Use BioPython for computational molecular biology and bioinformatics tasks. Essential for: sequence manipulation (DNA/RNA/protein transcription, translation, complement, reverse complement), reading/writing biological file formats (FASTA, FASTQ, GenBank, EMBL, Swiss-Prot, PDB, Clustal, PHYLIP, NEXUS), NCBI database access (Entrez searches, downloads from GenBank/PubMed/Protein databases), BLAST sequence similarity searches and result parsing, pairwise and multiple sequence alignments, phylogenetic tree construction and analysis (UPGMA, Neighbor-Joining), protein structure analysis (PDB parsing, secondary structure, structural alignment), sequence property calculations (GC content, melting temperature, molecular weight, isoelectric point), format conversion between biological file types, restriction enzyme analysis, motif discovery, population genetics calculations, and any task requiring Bio.Seq, Bio.SeqIO, Bio.Entrez, Bio.Blast, Bio.Align, Bio.Phylo, Bio.PDB, Bio.SeqUtils, or other BioPython modules." --- # BioPython diff --git a/scientific-packages/bioservices/SKILL.md b/scientific-packages/bioservices/SKILL.md index bea1422..f85fe11 100644 --- a/scientific-packages/bioservices/SKILL.md +++ b/scientific-packages/bioservices/SKILL.md @@ -1,6 +1,6 @@ --- name: bioservices -description: Python toolkit for programmatic access to 40+ biological web services and databases including UniProt, KEGG, ChEBI, ChEMBL, PubChem, NCBI BLAST, PSICQUIC, QuickGO, BioMart, ArrayExpress, ENA, PDB, Pfam, Reactome, and many others. Use this skill for retrieving protein sequences and annotations, analyzing metabolic pathways and gene functions, searching compound databases, converting identifiers between biological databases (UniProt↔KEGG↔ChEMBL), running BLAST searches, querying gene ontology terms, accessing protein-protein interactions, mining genomic data, performing sequence alignments, cross-referencing compounds across databases, and integrating data from multiple bioinformatics resources in Python workflows. Essential for bioinformatics data retrieval, identifier mapping, pathway analysis, compound searches, sequence similarity analysis, and multi-database integration tasks. +description: "Python toolkit for programmatic access to 40+ biological web services and databases including UniProt, KEGG, ChEBI, ChEMBL, PubChem, NCBI BLAST, PSICQUIC, QuickGO, BioMart, ArrayExpress, ENA, PDB, Pfam, Reactome, and many others. 
Use this skill for retrieving protein sequences and annotations, analyzing metabolic pathways and gene functions, searching compound databases, converting identifiers between biological databases (UniProt↔KEGG↔ChEMBL), running BLAST searches, querying gene ontology terms, accessing protein-protein interactions, mining genomic data, performing sequence alignments, cross-referencing compounds across databases, and integrating data from multiple bioinformatics resources in Python workflows. Essential for bioinformatics data retrieval, identifier mapping, pathway analysis, compound searches, sequence similarity analysis, and multi-database integration tasks." --- # BioServices diff --git a/scientific-packages/cellxgene-census/SKILL.md b/scientific-packages/cellxgene-census/SKILL.md index a5f21fc..24c62cd 100644 --- a/scientific-packages/cellxgene-census/SKILL.md +++ b/scientific-packages/cellxgene-census/SKILL.md @@ -1,6 +1,6 @@ --- name: cellxgene-census -description: Access, query, and analyze single-cell genomics data from the CZ CELLxGENE Census containing 61+ million cells from human and mouse. Use this skill for single-cell RNA-seq analysis, cell type identification, gene expression queries, tissue-specific analysis, disease studies, cross-dataset integration, machine learning model training, and large-scale genomics workflows. Supports filtering by cell type, tissue, disease, donor, and gene expression patterns. Provides both in-memory (AnnData) and out-of-core processing for datasets of any size. Integrates with PyTorch for ML workflows, scanpy for standard single-cell analysis, and supports batch processing for computational efficiency. Essential for exploring cell type diversity, marker gene analysis, differential expression studies, multi-tissue comparisons, COVID-19 research, developmental biology, and population-scale genomics projects. +description: "Access, query, and analyze single-cell genomics data from the CZ CELLxGENE Census containing 61+ million cells from human and mouse. Use this skill for single-cell RNA-seq analysis, cell type identification, gene expression queries, tissue-specific analysis, disease studies, cross-dataset integration, machine learning model training, and large-scale genomics workflows. Supports filtering by cell type, tissue, disease, donor, and gene expression patterns. Provides both in-memory (AnnData) and out-of-core processing for datasets of any size. Integrates with PyTorch for ML workflows, scanpy for standard single-cell analysis, and supports batch processing for computational efficiency. Essential for exploring cell type diversity, marker gene analysis, differential expression studies, multi-tissue comparisons, COVID-19 research, developmental biology, and population-scale genomics projects." --- # CZ CELLxGENE Census diff --git a/scientific-packages/cobrapy/SKILL.md b/scientific-packages/cobrapy/SKILL.md index eb3295c..24da5ff 100644 --- a/scientific-packages/cobrapy/SKILL.md +++ b/scientific-packages/cobrapy/SKILL.md @@ -1,6 +1,6 @@ --- name: cobrapy -description: Python library for constraint-based reconstruction and analysis (COBRA) of metabolic models. Essential for systems biology, metabolic engineering, and computational biology tasks involving genome-scale metabolic models. 
Use for flux balance analysis (FBA), flux variability analysis (FVA), gene knockout simulations, reaction deletion studies, metabolic flux sampling, production envelope calculations, minimal media optimization, gapfilling metabolic networks, model reconstruction, metabolic pathway analysis, phenotype prediction, drug target identification, metabolic engineering design, and constraint-based modeling. Supports loading/saving models in SBML, JSON, YAML, and MATLAB formats. Handles metabolic networks, stoichiometric matrices, gene-protein-reaction rules, exchange reactions, cellular compartments, and metabolic flux distributions. Ideal for analyzing E. coli, yeast, human, and other organism metabolic models. +description: "Python library for constraint-based reconstruction and analysis (COBRA) of metabolic models. Essential for systems biology, metabolic engineering, and computational biology tasks involving genome-scale metabolic models. Use for flux balance analysis (FBA), flux variability analysis (FVA), gene knockout simulations, reaction deletion studies, metabolic flux sampling, production envelope calculations, minimal media optimization, gapfilling metabolic networks, model reconstruction, metabolic pathway analysis, phenotype prediction, drug target identification, metabolic engineering design, and constraint-based modeling. Supports loading/saving models in SBML, JSON, YAML, and MATLAB formats. Handles metabolic networks, stoichiometric matrices, gene-protein-reaction rules, exchange reactions, cellular compartments, and metabolic flux distributions. Ideal for analyzing E. coli, yeast, human, and other organism metabolic models." --- # COBRApy - Constraint-Based Reconstruction and Analysis diff --git a/scientific-packages/dask/SKILL.md b/scientific-packages/dask/SKILL.md index 991bbc6..0ba8680 100644 --- a/scientific-packages/dask/SKILL.md +++ b/scientific-packages/dask/SKILL.md @@ -1,6 +1,6 @@ --- name: dask -description: Toolkit for parallel and distributed computing in Python enabling larger-than-memory operations, parallel processing, and distributed computation. Use this skill when: (1) datasets exceed available RAM and need chunked processing, (2) pandas/NumPy operations are slow and need parallelization across cores, (3) processing multiple files (CSV, Parquet, JSON, logs) that collectively exceed memory, (4) building custom parallel workflows with task dependencies, (5) scaling from prototype pandas/NumPy code to production on larger data, (6) need distributed computing across multiple machines, (7) working with scientific datasets (HDF5, Zarr, NetCDF) larger than memory, (8) ETL pipelines processing terabytes of unstructured data, (9) parameter sweeps or embarrassingly parallel computations, (10) when simpler solutions (better algorithms, efficient formats, sampling) aren't sufficient. Supports DataFrames (parallel pandas), Arrays (parallel NumPy), Bags (parallel Python lists), Futures (task-based parallelization), and various schedulers. NOT suitable for: small datasets fitting in memory, single-file processing, simple computations without parallelism needs, when NumPy/Pandas already perform adequately, or when task overhead exceeds computation time. +description: "Toolkit for parallel and distributed computing in Python enabling larger-than-memory operations, parallel processing, and distributed computation. 
Use this skill when: (1) datasets exceed available RAM and need chunked processing, (2) pandas/NumPy operations are slow and need parallelization across cores, (3) processing multiple files (CSV, Parquet, JSON, logs) that collectively exceed memory, (4) building custom parallel workflows with task dependencies, (5) scaling from prototype pandas/NumPy code to production on larger data, (6) need distributed computing across multiple machines, (7) working with scientific datasets (HDF5, Zarr, NetCDF) larger than memory, (8) ETL pipelines processing terabytes of unstructured data, (9) parameter sweeps or embarrassingly parallel computations, (10) when simpler solutions (better algorithms, efficient formats, sampling) aren't sufficient. Supports DataFrames (parallel pandas), Arrays (parallel NumPy), Bags (parallel Python lists), Futures (task-based parallelization), and various schedulers. NOT suitable for: small datasets fitting in memory, single-file processing, simple computations without parallelism needs, when NumPy/Pandas already perform adequately, or when task overhead exceeds computation time." --- # Dask diff --git a/scientific-packages/datamol/SKILL.md b/scientific-packages/datamol/SKILL.md index 582f993..1050539 100644 --- a/scientific-packages/datamol/SKILL.md +++ b/scientific-packages/datamol/SKILL.md @@ -1,6 +1,6 @@ --- name: datamol -description: Complete molecular cheminformatics toolkit using datamol (Pythonic RDKit wrapper). Use for SMILES parsing/conversion, molecular standardization/sanitization, descriptor calculation, fingerprint generation, similarity analysis, clustering, diversity selection, scaffold extraction, molecular fragmentation (BRICS/RECAP), 3D conformer generation, chemical reactions, molecular visualization, file I/O (SDF/CSV/Excel), cloud storage access, batch processing with parallelization, drug-likeness filtering, virtual screening, SAR analysis, and machine learning feature generation. Essential for drug discovery, medicinal chemistry, chemical informatics, molecular property prediction, compound library analysis, lead optimization, and any computational chemistry workflows involving molecular data processing and analysis. +description: "Complete molecular cheminformatics toolkit using datamol (Pythonic RDKit wrapper). Use for SMILES parsing/conversion, molecular standardization/sanitization, descriptor calculation, fingerprint generation, similarity analysis, clustering, diversity selection, scaffold extraction, molecular fragmentation (BRICS/RECAP), 3D conformer generation, chemical reactions, molecular visualization, file I/O (SDF/CSV/Excel), cloud storage access, batch processing with parallelization, drug-likeness filtering, virtual screening, SAR analysis, and machine learning feature generation. Essential for drug discovery, medicinal chemistry, chemical informatics, molecular property prediction, compound library analysis, lead optimization, and any computational chemistry workflows involving molecular data processing and analysis." --- # Datamol Cheminformatics Skill diff --git a/scientific-packages/deepchem/SKILL.md b/scientific-packages/deepchem/SKILL.md index ca08fe3..194d754 100644 --- a/scientific-packages/deepchem/SKILL.md +++ b/scientific-packages/deepchem/SKILL.md @@ -1,6 +1,6 @@ --- name: deepchem -description: DeepChem toolkit for molecular machine learning, drug discovery, and materials science. 
Use for: molecular property prediction (solubility, toxicity, ADMET, binding affinity, drug-likeness), molecular featurization (fingerprints, descriptors, graph representations), graph neural networks (GCN, GAT, MPNN, AttentiveFP, DMPNN), MoleculeNet benchmark datasets (Tox21, BBBP, Delaney, HIV, ClinTox, FreeSolv, Lipophilicity), transfer learning with pretrained models (ChemBERTa, GROVER, MolFormer), materials property prediction (crystal structures, bandgap, formation energy), protein/DNA sequence analysis, molecular data loading (SMILES, SDF, FASTA), scaffold-based data splitting, molecular generation, hyperparameter optimization, model evaluation and comparison, custom model integration, and end-to-end drug discovery workflows. +description: "DeepChem toolkit for molecular machine learning, drug discovery, and materials science. Use for: molecular property prediction (solubility, toxicity, ADMET, binding affinity, drug-likeness), molecular featurization (fingerprints, descriptors, graph representations), graph neural networks (GCN, GAT, MPNN, AttentiveFP, DMPNN), MoleculeNet benchmark datasets (Tox21, BBBP, Delaney, HIV, ClinTox, FreeSolv, Lipophilicity), transfer learning with pretrained models (ChemBERTa, GROVER, MolFormer), materials property prediction (crystal structures, bandgap, formation energy), protein/DNA sequence analysis, molecular data loading (SMILES, SDF, FASTA), scaffold-based data splitting, molecular generation, hyperparameter optimization, model evaluation and comparison, custom model integration, and end-to-end drug discovery workflows." --- # DeepChem diff --git a/scientific-packages/deeptools/SKILL.md b/scientific-packages/deeptools/SKILL.md index ca602bd..33b6a3f 100644 --- a/scientific-packages/deeptools/SKILL.md +++ b/scientific-packages/deeptools/SKILL.md @@ -1,6 +1,6 @@ --- name: deeptools -description: deepTools is a comprehensive Python toolkit for analyzing next-generation sequencing (NGS) data including ChIP-seq, RNA-seq, ATAC-seq, MNase-seq, and other genomic experiments. Use this skill for: converting BAM files to bigWig/bedGraph coverage tracks with normalization (RPGC, CPM, RPKM); quality control analysis including sample correlation, PCA, fingerprint plots, coverage assessment, and fragment size analysis; creating heatmaps and profile plots around genomic features like TSS, gene bodies, or peak regions; comparing samples using log2 ratios and correlation analysis; enrichment analysis and peak region visualization; normalization and scaling of sequencing data; publication-quality visualization generation for genomic datasets. Key tools include bamCoverage, bamCompare, computeMatrix, plotHeatmap, plotProfile, plotFingerprint, plotCorrelation, multiBamSummary, and alignmentSieve. Essential for ChIP-seq quality control, RNA-seq coverage analysis, ATAC-seq processing with Tn5 correction, sample comparison workflows, and generating standardized genomic visualizations. Use when working with BAM files, bigWig files, BED region files, or when users request genomic data analysis, quality control assessment, sample correlation, heatmap generation, profile plotting, or publication-ready visualizations for sequencing experiments. +description: "deepTools is a comprehensive Python toolkit for analyzing next-generation sequencing (NGS) data including ChIP-seq, RNA-seq, ATAC-seq, MNase-seq, and other genomic experiments. 
Use this skill for: converting BAM files to bigWig/bedGraph coverage tracks with normalization (RPGC, CPM, RPKM); quality control analysis including sample correlation, PCA, fingerprint plots, coverage assessment, and fragment size analysis; creating heatmaps and profile plots around genomic features like TSS, gene bodies, or peak regions; comparing samples using log2 ratios and correlation analysis; enrichment analysis and peak region visualization; normalization and scaling of sequencing data; publication-quality visualization generation for genomic datasets. Key tools include bamCoverage, bamCompare, computeMatrix, plotHeatmap, plotProfile, plotFingerprint, plotCorrelation, multiBamSummary, and alignmentSieve. Essential for ChIP-seq quality control, RNA-seq coverage analysis, ATAC-seq processing with Tn5 correction, sample comparison workflows, and generating standardized genomic visualizations. Use when working with BAM files, bigWig files, BED region files, or when users request genomic data analysis, quality control assessment, sample correlation, heatmap generation, profile plotting, or publication-ready visualizations for sequencing experiments." --- # deepTools: NGS Data Analysis Toolkit diff --git a/scientific-packages/diffdock/SKILL.md b/scientific-packages/diffdock/SKILL.md index 9d081bd..29fb4e6 100644 --- a/scientific-packages/diffdock/SKILL.md +++ b/scientific-packages/diffdock/SKILL.md @@ -1,6 +1,6 @@ --- name: diffdock -description: This skill provides comprehensive guidance for using DiffDock, a state-of-the-art diffusion-based deep learning tool for molecular docking that predicts 3D binding poses of small molecule ligands to protein targets. Use this skill when users request molecular docking simulations, protein-ligand binding pose predictions, virtual screening campaigns, structure-based drug design, lead optimization, binding site identification, or computational drug discovery tasks. This skill applies to tasks involving PDB protein structure files, SMILES ligand strings, protein amino acid sequences, ligand structure files (SDF, MOL2), batch docking of compound libraries, confidence score interpretation, ensemble docking with multiple protein conformations, integration with scoring functions (GNINA, MM/GBSA), parameter optimization for specific ligand types, troubleshooting docking issues, or analyzing docking results and ranking predictions. DiffDock predicts binding poses and confidence scores but NOT binding affinity - always combine with scoring functions for affinity assessment. Suitable for small molecule ligands (100-1000 Da), drug-like compounds, and small peptides (<20 residues), but NOT for protein-protein docking, large peptides, covalent docking, or membrane proteins without caution. +description: "This skill provides comprehensive guidance for using DiffDock, a state-of-the-art diffusion-based deep learning tool for molecular docking that predicts 3D binding poses of small molecule ligands to protein targets. Use this skill when users request molecular docking simulations, protein-ligand binding pose predictions, virtual screening campaigns, structure-based drug design, lead optimization, binding site identification, or computational drug discovery tasks. 
This skill applies to tasks involving PDB protein structure files, SMILES ligand strings, protein amino acid sequences, ligand structure files (SDF, MOL2), batch docking of compound libraries, confidence score interpretation, ensemble docking with multiple protein conformations, integration with scoring functions (GNINA, MM/GBSA), parameter optimization for specific ligand types, troubleshooting docking issues, or analyzing docking results and ranking predictions. DiffDock predicts binding poses and confidence scores but NOT binding affinity - always combine with scoring functions for affinity assessment. Suitable for small molecule ligands (100-1000 Da), drug-like compounds, and small peptides (<20 residues), but NOT for protein-protein docking, large peptides, covalent docking, or membrane proteins without caution." --- # DiffDock: Molecular Docking with Diffusion Models diff --git a/scientific-packages/etetoolkit/SKILL.md b/scientific-packages/etetoolkit/SKILL.md index 1352aa2..15149d9 100644 --- a/scientific-packages/etetoolkit/SKILL.md +++ b/scientific-packages/etetoolkit/SKILL.md @@ -1,6 +1,6 @@ --- name: etetoolkit -description: Expert toolkit for phylogenetic and hierarchical tree analysis using ETE (Environment for Tree Exploration). Use this skill for any tree-related bioinformatics tasks including phylogenetic trees, gene trees, species trees, clustering dendrograms, taxonomic hierarchies, or evolutionary analysis. Key applications: tree manipulation (pruning, rerooting, format conversion between Newick/NHX/PhyloXML/NeXML), evolutionary event detection (orthology/paralogy identification, duplication/speciation events), tree comparison and topology analysis (Robinson-Foulds distances, consensus trees), NCBI taxonomy integration (taxonomic ID lookup, lineage retrieval, species tree construction), tree visualization and publication figures (PDF/SVG/PNG output, custom styling, interactive GUI), clustering analysis with heatmaps and validation metrics, sequence alignment integration, gene family analysis, and phylogenomic pipelines. Handles Newick formats 0-100, supports large trees with memory-efficient iteration, provides command-line tools for batch processing, and integrates with biological databases for comprehensive phylogenetic analysis workflows. +description: "Expert toolkit for phylogenetic and hierarchical tree analysis using ETE (Environment for Tree Exploration). Use this skill for any tree-related bioinformatics tasks including phylogenetic trees, gene trees, species trees, clustering dendrograms, taxonomic hierarchies, or evolutionary analysis. Key applications: tree manipulation (pruning, rerooting, format conversion between Newick/NHX/PhyloXML/NeXML), evolutionary event detection (orthology/paralogy identification, duplication/speciation events), tree comparison and topology analysis (Robinson-Foulds distances, consensus trees), NCBI taxonomy integration (taxonomic ID lookup, lineage retrieval, species tree construction), tree visualization and publication figures (PDF/SVG/PNG output, custom styling, interactive GUI), clustering analysis with heatmaps and validation metrics, sequence alignment integration, gene family analysis, and phylogenomic pipelines. Handles Newick formats 0-100, supports large trees with memory-efficient iteration, provides command-line tools for batch processing, and integrates with biological databases for comprehensive phylogenetic analysis workflows." 
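For orientation, a brief sketch of the kind of tree manipulation the etetoolkit description above enumerates, assuming the ete3 package; the Newick string, taxon names, and output file are invented for illustration.

from ete3 import Tree

# Parse a small Newick tree and inspect it
t = Tree("((Human:1,Chimp:1):1,(Mouse:1,Rat:1):1);")
print(t)                      # ASCII rendering of the topology
print(t.get_leaf_names())     # ['Human', 'Chimp', 'Mouse', 'Rat']

# Keep a subset of taxa, re-root, and export in Newick format 1
t.prune(["Human", "Chimp", "Mouse"])
t.set_outgroup("Mouse")
t.write(outfile="pruned.nw", format=1)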
--- # ETE Toolkit Skill diff --git a/scientific-packages/flowio/SKILL.md b/scientific-packages/flowio/SKILL.md index 08516c8..192ee7c 100644 --- a/scientific-packages/flowio/SKILL.md +++ b/scientific-packages/flowio/SKILL.md @@ -1,6 +1,6 @@ --- name: flowio -description: Python library for reading, writing, and manipulating Flow Cytometry Standard (FCS) files. Use this skill for: parsing FCS files (versions 2.0, 3.0, 3.1) to extract event data as NumPy arrays, reading FCS metadata and channel information, creating new FCS files from NumPy arrays, converting FCS data to CSV/DataFrame formats, handling multi-dataset FCS files, extracting scatter/fluorescence/time channels, batch processing multiple FCS files, filtering events and re-exporting, validating FCS file structure, accessing TEXT segment keywords, handling problematic files with offset discrepancies, memory-efficient metadata-only reading, and FCS file format conversion. Essential for flow cytometry data preprocessing, file format conversion, metadata extraction, and cytometry data pipeline operations. Supports both raw and preprocessed event data extraction with gain scaling and logarithmic transformations. +description: "Python library for reading, writing, and manipulating Flow Cytometry Standard (FCS) files. Use this skill for: parsing FCS files (versions 2.0, 3.0, 3.1) to extract event data as NumPy arrays, reading FCS metadata and channel information, creating new FCS files from NumPy arrays, converting FCS data to CSV/DataFrame formats, handling multi-dataset FCS files, extracting scatter/fluorescence/time channels, batch processing multiple FCS files, filtering events and re-exporting, validating FCS file structure, accessing TEXT segment keywords, handling problematic files with offset discrepancies, memory-efficient metadata-only reading, and FCS file format conversion. Essential for flow cytometry data preprocessing, file format conversion, metadata extraction, and cytometry data pipeline operations. Supports both raw and preprocessed event data extraction with gain scaling and logarithmic transformations." --- # FlowIO: Flow Cytometry Standard File Handler diff --git a/scientific-packages/gget/SKILL.md b/scientific-packages/gget/SKILL.md index 59a32f9..4a785cd 100644 --- a/scientific-packages/gget/SKILL.md +++ b/scientific-packages/gget/SKILL.md @@ -1,6 +1,6 @@ --- name: gget -description: Comprehensive bioinformatics toolkit for genomic database queries, sequence analysis, and molecular biology workflows. Use this skill for: gene information retrieval (Ensembl, UniProt, NCBI), sequence analysis (BLAST, BLAT, multiple sequence alignment), protein structure prediction (AlphaFold), gene expression analysis (ARCHS4, single-cell RNA-seq), enrichment analysis (Enrichr), disease and drug associations (OpenTargets), cancer genomics (cBioPortal, COSMIC), orthology analysis (Bgee), reference genome downloads, mutation analysis, and comparative genomics. Handles nucleotide/amino acid sequences, gene symbols, Ensembl IDs, UniProt accessions, PDB structures, mutation annotations, tissue expression data, and genomic annotations. Supports both command-line and Python interfaces with automatic database updates and comprehensive error handling for reliable bioinformatics analysis workflows. +description: "Comprehensive bioinformatics toolkit for genomic database queries, sequence analysis, and molecular biology workflows. 
Use this skill for: gene information retrieval (Ensembl, UniProt, NCBI), sequence analysis (BLAST, BLAT, multiple sequence alignment), protein structure prediction (AlphaFold), gene expression analysis (ARCHS4, single-cell RNA-seq), enrichment analysis (Enrichr), disease and drug associations (OpenTargets), cancer genomics (cBioPortal, COSMIC), orthology analysis (Bgee), reference genome downloads, mutation analysis, and comparative genomics. Handles nucleotide/amino acid sequences, gene symbols, Ensembl IDs, UniProt accessions, PDB structures, mutation annotations, tissue expression data, and genomic annotations. Supports both command-line and Python interfaces with automatic database updates and comprehensive error handling for reliable bioinformatics analysis workflows." --- # gget diff --git a/scientific-packages/matchms/SKILL.md b/scientific-packages/matchms/SKILL.md index a167334..190af72 100644 --- a/scientific-packages/matchms/SKILL.md +++ b/scientific-packages/matchms/SKILL.md @@ -1,6 +1,6 @@ --- name: matchms -description: Process and analyze mass spectrometry data using matchms, a Python library for spectral similarity calculations, metadata harmonization, and compound identification. Use this skill when: (1) Working with mass spectrometry data files (mzML, mzXML, MGF, MSP, JSON) - importing, exporting, or converting between formats; (2) Compound identification tasks - matching unknown spectra against reference libraries using cosine similarity, modified cosine, or neutral loss patterns; (3) Spectral data preprocessing - harmonizing metadata, normalizing intensities, filtering peaks by m/z or intensity, removing precursor peaks, or applying quality control filters; (4) Building reproducible workflows - creating standardized processing pipelines, batch processing multiple datasets, or implementing consistent analysis protocols; (5) Chemical structure analysis - deriving SMILES/InChI from spectra, adding molecular fingerprints, validating structural annotations, or comparing structural similarities; (6) Large-scale spectral comparisons - performing library-to-library comparisons, finding duplicate spectra, or clustering similar compounds; (7) Multi-metric scoring - combining spectral similarity with structural similarity or metadata matching for robust compound identification; (8) Quality control and validation - filtering low-quality spectra, validating precursor masses, ensuring metadata completeness, or generating identification reports. This skill is essential for metabolomics, proteomics, natural products research, environmental analysis, and any field requiring mass spectrometry data processing and compound identification. +description: "Process and analyze mass spectrometry data using matchms, a Python library for spectral similarity calculations, metadata harmonization, and compound identification. 
Use this skill when: (1) Working with mass spectrometry data files (mzML, mzXML, MGF, MSP, JSON) - importing, exporting, or converting between formats; (2) Compound identification tasks - matching unknown spectra against reference libraries using cosine similarity, modified cosine, or neutral loss patterns; (3) Spectral data preprocessing - harmonizing metadata, normalizing intensities, filtering peaks by m/z or intensity, removing precursor peaks, or applying quality control filters; (4) Building reproducible workflows - creating standardized processing pipelines, batch processing multiple datasets, or implementing consistent analysis protocols; (5) Chemical structure analysis - deriving SMILES/InChI from spectra, adding molecular fingerprints, validating structural annotations, or comparing structural similarities; (6) Large-scale spectral comparisons - performing library-to-library comparisons, finding duplicate spectra, or clustering similar compounds; (7) Multi-metric scoring - combining spectral similarity with structural similarity or metadata matching for robust compound identification; (8) Quality control and validation - filtering low-quality spectra, validating precursor masses, ensuring metadata completeness, or generating identification reports. This skill is essential for metabolomics, proteomics, natural products research, environmental analysis, and any field requiring mass spectrometry data processing and compound identification." --- # Matchms diff --git a/scientific-packages/matplotlib/SKILL.md b/scientific-packages/matplotlib/SKILL.md index b7f2413..82d4d02 100644 --- a/scientific-packages/matplotlib/SKILL.md +++ b/scientific-packages/matplotlib/SKILL.md @@ -1,6 +1,6 @@ --- name: matplotlib -description: Python's foundational data visualization library for creating publication-quality plots, charts, and scientific figures. Use this skill for any visualization task including line plots, scatter plots, bar charts, histograms, heatmaps, contour plots, 3D visualizations, subplots, animations, and statistical plots. Essential for data analysis visualization, scientific plotting, figure generation, plot customization, color mapping, exporting graphics (PNG, PDF, SVG), creating multi-panel figures, interactive plots, and integrating visualizations into reports, papers, presentations, or web applications. Covers both pyplot interface and object-oriented API with best practices for styling, layout management, accessibility, and performance optimization. +description: "Python's foundational data visualization library for creating publication-quality plots, charts, and scientific figures. Use this skill for any visualization task including line plots, scatter plots, bar charts, histograms, heatmaps, contour plots, 3D visualizations, subplots, animations, and statistical plots. Essential for data analysis visualization, scientific plotting, figure generation, plot customization, color mapping, exporting graphics (PNG, PDF, SVG), creating multi-panel figures, interactive plots, and integrating visualizations into reports, papers, presentations, or web applications. Covers both pyplot interface and object-oriented API with best practices for styling, layout management, accessibility, and performance optimization." 
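A minimal sketch of the object-oriented matplotlib API referred to in the description above; the data and the output file name are placeholders, not taken from the skill files.

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 200)

# Object-oriented interface: one Figure, one Axes
fig, ax = plt.subplots(figsize=(5, 3))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)", linestyle="--")
ax.set_xlabel("x (radians)")
ax.set_ylabel("amplitude")
ax.legend()
fig.tight_layout()
fig.savefig("waves.png", dpi=300)   # PNG/PDF/SVG chosen by file extension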
--- # Matplotlib diff --git a/scientific-packages/medchem/SKILL.md b/scientific-packages/medchem/SKILL.md index a270570..84554cd 100644 --- a/scientific-packages/medchem/SKILL.md +++ b/scientific-packages/medchem/SKILL.md @@ -1,6 +1,6 @@ --- name: medchem -description: Python library for medicinal chemistry filtering and compound prioritization in drug discovery workflows. Use medchem when you need to: apply drug-likeness rules (Lipinski Rule of Five, CNS rules, leadlike criteria, Veber rules, Oprea rules), detect structural alerts and problematic substructures (PAINS filters, NIBR alerts, Lilly demerits, common structural alerts), filter compound libraries by medicinal chemistry criteria, calculate molecular complexity metrics (Bertz, Whitlock, Barone), identify specific chemical groups (hinge binders, phosphate binders, Michael acceptors), apply property-based constraints (molecular weight, LogP, TPSA, rotatable bonds), screen large compound collections for drug-like properties, prioritize hits from virtual screening, optimize lead compounds during medicinal chemistry campaigns, validate compound libraries before biological testing, or perform batch processing of molecular datasets. Medchem integrates with RDKit and datamol, accepts SMILES strings and RDKit mol objects, provides parallel processing for large datasets, includes a query language for complex filtering criteria, and offers both functional and object-oriented APIs. Essential for computational medicinal chemistry, compound library management, hit-to-lead optimization, and drug discovery pipeline workflows. +description: "Python library for medicinal chemistry filtering and compound prioritization in drug discovery workflows. Use medchem when you need to: apply drug-likeness rules (Lipinski Rule of Five, CNS rules, leadlike criteria, Veber rules, Oprea rules), detect structural alerts and problematic substructures (PAINS filters, NIBR alerts, Lilly demerits, common structural alerts), filter compound libraries by medicinal chemistry criteria, calculate molecular complexity metrics (Bertz, Whitlock, Barone), identify specific chemical groups (hinge binders, phosphate binders, Michael acceptors), apply property-based constraints (molecular weight, LogP, TPSA, rotatable bonds), screen large compound collections for drug-like properties, prioritize hits from virtual screening, optimize lead compounds during medicinal chemistry campaigns, validate compound libraries before biological testing, or perform batch processing of molecular datasets. Medchem integrates with RDKit and datamol, accepts SMILES strings and RDKit mol objects, provides parallel processing for large datasets, includes a query language for complex filtering criteria, and offers both functional and object-oriented APIs. Essential for computational medicinal chemistry, compound library management, hit-to-lead optimization, and drug discovery pipeline workflows." --- # Medchem diff --git a/scientific-packages/molfeat/SKILL.md b/scientific-packages/molfeat/SKILL.md index 62fca4d..b9d14f5 100644 --- a/scientific-packages/molfeat/SKILL.md +++ b/scientific-packages/molfeat/SKILL.md @@ -1,6 +1,6 @@ --- name: molfeat -description: Comprehensive molecular featurization toolkit for converting chemical structures into numerical representations for machine learning. Use this skill when working with molecular data, SMILES strings, chemical fingerprints, molecular descriptors, or building QSAR/QSPR models. 
Provides access to 100+ featurizers including traditional fingerprints (ECFP, MACCS), molecular descriptors (RDKit, Mordred), and pretrained deep learning models (ChemBERTa, ChemGPT, GNN models) for cheminformatics and drug discovery tasks. Use molfeat for converting SMILES strings to machine learning features, molecular fingerprinting, chemical similarity analysis, virtual screening, QSAR model development, molecular property prediction, chemical space analysis, drug discovery pipelines, molecular machine learning, cheminformatics workflows, chemical data preprocessing, molecular representation learning, and any task requiring conversion of chemical structures to numerical features for computational analysis. +description: "Comprehensive molecular featurization toolkit for converting chemical structures into numerical representations for machine learning. Use this skill when working with molecular data, SMILES strings, chemical fingerprints, molecular descriptors, or building QSAR/QSPR models. Provides access to 100+ featurizers including traditional fingerprints (ECFP, MACCS), molecular descriptors (RDKit, Mordred), and pretrained deep learning models (ChemBERTa, ChemGPT, GNN models) for cheminformatics and drug discovery tasks. Use molfeat for converting SMILES strings to machine learning features, molecular fingerprinting, chemical similarity analysis, virtual screening, QSAR model development, molecular property prediction, chemical space analysis, drug discovery pipelines, molecular machine learning, cheminformatics workflows, chemical data preprocessing, molecular representation learning, and any task requiring conversion of chemical structures to numerical features for computational analysis." --- # Molfeat - Molecular Featurization Hub diff --git a/scientific-packages/polars/SKILL.md b/scientific-packages/polars/SKILL.md index 9217fcb..f8e8d79 100644 --- a/scientific-packages/polars/SKILL.md +++ b/scientific-packages/polars/SKILL.md @@ -1,6 +1,6 @@ --- name: polars -description: Use this skill for all Polars DataFrame operations, data manipulation, analysis, and processing tasks in Python. This includes: DataFrame creation and operations (select, filter, group_by, aggregations, joins, pivots, concatenation), lazy evaluation with LazyFrame for large datasets, data I/O (CSV, Parquet, JSON, Excel, databases), migrating from pandas to Polars, performance optimization, expression-based API usage, window functions, data transformations, statistical operations, and working with Apache Arrow-based data structures. Also use for questions about Polars syntax, best practices, query optimization, parallel processing, streaming data, type handling, null value management, and any data science or analytics workflows requiring fast DataFrame operations. +description: "Use this skill for all Polars DataFrame operations, data manipulation, analysis, and processing tasks in Python. This includes: DataFrame creation and operations (select, filter, group_by, aggregations, joins, pivots, concatenation), lazy evaluation with LazyFrame for large datasets, data I/O (CSV, Parquet, JSON, Excel, databases), migrating from pandas to Polars, performance optimization, expression-based API usage, window functions, data transformations, statistical operations, and working with Apache Arrow-based data structures. 
Also use for questions about Polars syntax, best practices, query optimization, parallel processing, streaming data, type handling, null value management, and any data science or analytics workflows requiring fast DataFrame operations." --- # Polars diff --git a/scientific-packages/pydeseq2/SKILL.md b/scientific-packages/pydeseq2/SKILL.md index 523ce47..f283cfd 100644 --- a/scientific-packages/pydeseq2/SKILL.md +++ b/scientific-packages/pydeseq2/SKILL.md @@ -1,6 +1,6 @@ --- name: pydeseq2 -description: Comprehensive toolkit for differential gene expression analysis using PyDESeq2, the Python implementation of DESeq2 for bulk RNA-seq data. Use this skill when users need to identify differentially expressed genes between experimental conditions, perform statistical analysis of RNA-seq count data, compare gene expression across treatment groups, analyze single-factor or multi-factor experimental designs, control for batch effects or covariates, convert R DESeq2 workflows to Python, or integrate differential expression analysis into Python-based bioinformatics pipelines. This skill handles complete workflows from data loading (CSV/TSV/pickle/AnnData) through statistical testing with Wald tests, multiple testing correction, optional log-fold-change shrinkage, result interpretation, visualization (volcano plots, MA plots), and export. Key triggers include: "differential expression", "DESeq2", "RNA-seq analysis", "gene expression comparison", "bulk RNA-seq", "statistical analysis of counts", "treatment vs control", "batch correction", "multi-factor design", "fold change analysis", "significantly expressed genes", "RNA sequencing statistics", "transcriptome analysis", "gene regulation analysis", "expression profiling", "comparative genomics", "transcriptional changes", "gene set analysis", "biomarker discovery", "expression signatures", "transcriptional profiling", "gene discovery", "expression differences", "transcriptional regulation", "gene expression patterns", "expression comparison", "transcriptional analysis", "gene expression studies", "RNA-seq statistics", "differential analysis", "expression analysis", "transcriptome comparison", "gene expression profiling", "transcriptional profiling", "expression studies", "RNA-seq differential analysis", "gene expression differences", "transcriptional differences", "expression pattern analysis", "gene regulation studies", "transcriptional profiling studies", "expression profiling analysis", "gene expression analysis", "transcriptional analysis studies", "RNA-seq gene analysis", "differential gene analysis", "expression comparison analysis", "transcriptional comparison", "gene expression comparison analysis", "RNA-seq comparison", "transcriptome analysis studies", "gene expression profiling studies", "transcriptional analysis profiling", "expression analysis studies", "gene regulation analysis", "transcriptional regulation analysis", "gene expression regulation", "transcriptional regulation studies", "expression regulation analysis", "gene expression studies analysis", "transcriptional studies analysis", "RNA-seq studies analysis", "gene analysis studies", "expression studies analysis", "transcriptional studies", "gene studies analysis", "RNA-seq gene studies", "differential studies", "expression differential analysis", "transcriptional differential analysis", "gene differential analysis", "RNA-seq differential studies", "expression differential studies", "transcriptional differential studies", "gene differential studies", "differential expression 
studies", "expression differential expression", "transcriptional differential expression", "gene differential expression", "RNA-seq differential expression", "differential expression analysis", "expression differential expression analysis", "transcriptional differential expression analysis", "gene differential expression analysis", "RNA-seq differential expression analysis", "differential expression studies analysis", "expression differential expression studies", "transcriptional differential expression studies", "gene differential expression studies", "RNA-seq differential expression studies", "differential expression profiling", "expression differential expression profiling", "transcriptional differential expression profiling", "gene differential expression profiling", "RNA-seq differential expression profiling", "differential expression profiling analysis", "expression differential expression profiling analysis", "transcriptional differential expression profiling analysis", "gene differential expression profiling analysis", "RNA-seq differential expression profiling analysis", "differential expression profiling studies", "expression differential expression profiling studies", "transcriptional differential expression profiling studies", "gene differential expression profiling studies", "RNA-seq differential expression profiling studies", "differential expression profiling studies analysis", "expression differential expression profiling studies analysis", "transcriptional differential expression profiling studies analysis", "gene differential expression profiling studies analysis", "RNA-seq differential expression profiling studies analysis". Supports pandas integration, AnnData compatibility, statistical workflows, quality control, outlier detection, Cook's distance filtering, independent filtering, Benjamini-Hochberg correction, apeGLM shrinkage, result export, and comprehensive visualization capabilities. +description: "Comprehensive toolkit for differential gene expression analysis using PyDESeq2, the Python implementation of DESeq2 for bulk RNA-seq data. Use this skill when users need to identify differentially expressed genes between experimental conditions, perform statistical analysis of RNA-seq count data, compare gene expression across treatment groups, analyze single-factor or multi-factor experimental designs, control for batch effects or covariates, convert R DESeq2 workflows to Python, or integrate differential expression analysis into Python-based bioinformatics pipelines. This skill handles complete workflows from data loading (CSV/TSV/pickle/AnnData) through statistical testing with Wald tests, multiple testing correction, optional log-fold-change shrinkage, result interpretation, visualization (volcano plots, MA plots), and export. 
Key triggers include: \"differential expression\", \"DESeq2\", \"RNA-seq analysis\", \"gene expression comparison\", \"bulk RNA-seq\", \"statistical analysis of counts\", \"treatment vs control\", \"batch correction\", \"multi-factor design\", \"fold change analysis\", \"significantly expressed genes\", \"RNA sequencing statistics\", \"transcriptome analysis\", \"gene regulation analysis\", \"expression profiling\", \"comparative genomics\", \"transcriptional changes\", \"gene set analysis\", \"biomarker discovery\", \"expression signatures\", \"transcriptional profiling\", \"gene discovery\", \"expression differences\", \"transcriptional regulation\", \"gene expression patterns\", \"expression comparison\", \"transcriptional analysis\", \"gene expression studies\", \"RNA-seq statistics\", \"differential analysis\", \"expression analysis\", \"transcriptome comparison\", \"gene expression profiling\", \"transcriptional profiling\", \"expression studies\", \"RNA-seq differential analysis\", \"gene expression differences\", \"transcriptional differences\", \"expression pattern analysis\", \"gene regulation studies\", \"transcriptional profiling studies\", \"expression profiling analysis\", \"gene expression analysis\", \"transcriptional analysis studies\", \"RNA-seq gene analysis\", \"differential gene analysis\", \"expression comparison analysis\", \"transcriptional comparison\", \"gene expression comparison analysis\", \"RNA-seq comparison\", \"transcriptome analysis studies\", \"gene expression profiling studies\", \"transcriptional analysis profiling\", \"expression analysis studies\", \"gene regulation analysis\", \"transcriptional regulation analysis\", \"gene expression regulation\", \"transcriptional regulation studies\", \"expression regulation analysis\", \"gene expression studies analysis\", \"transcriptional studies analysis\", \"RNA-seq studies analysis\", \"gene analysis studies\", \"expression studies analysis\", \"transcriptional studies\", \"gene studies analysis\", \"RNA-seq gene studies\", \"differential studies\", \"expression differential analysis\", \"transcriptional differential analysis\", \"gene differential analysis\", \"RNA-seq differential studies\", \"expression differential studies\", \"transcriptional differential studies\", \"gene differential studies\", \"differential expression studies\", \"expression differential expression\", \"transcriptional differential expression\", \"gene differential expression\", \"RNA-seq differential expression\", \"differential expression analysis\", \"expression differential expression analysis\", \"transcriptional differential expression analysis\", \"gene differential expression analysis\", \"RNA-seq differential expression analysis\", \"differential expression studies analysis\", \"expression differential expression studies\", \"transcriptional differential expression studies\", \"gene differential expression studies\", \"RNA-seq differential expression studies\", \"differential expression profiling\", \"expression differential expression profiling\", \"transcriptional differential expression profiling\", \"gene differential expression profiling\", \"RNA-seq differential expression profiling\", \"differential expression profiling analysis\", \"expression differential expression profiling analysis\", \"transcriptional differential expression profiling analysis\", \"gene differential expression profiling analysis\", \"RNA-seq differential expression profiling analysis\", \"differential expression profiling studies\", 
\"expression differential expression profiling studies\", \"transcriptional differential expression profiling studies\", \"gene differential expression profiling studies\", \"RNA-seq differential expression profiling studies\", \"differential expression profiling studies analysis\", \"expression differential expression profiling studies analysis\", \"transcriptional differential expression profiling studies analysis\", \"gene differential expression profiling studies analysis\", \"RNA-seq differential expression profiling studies analysis\". Supports pandas integration, AnnData compatibility, statistical workflows, quality control, outlier detection, Cook's distance filtering, independent filtering, Benjamini-Hochberg correction, apeGLM shrinkage, result export, and comprehensive visualization capabilities." --- # PyDESeq2 diff --git a/scientific-packages/pymatgen/SKILL.md b/scientific-packages/pymatgen/SKILL.md index 2f7dbd3..af6c280 100644 --- a/scientific-packages/pymatgen/SKILL.md +++ b/scientific-packages/pymatgen/SKILL.md @@ -1,6 +1,6 @@ --- name: pymatgen -description: Python Materials Genomics (pymatgen) toolkit for comprehensive materials science analysis and computational chemistry workflows. Use for crystal structure manipulation, molecular systems, materials property analysis, electronic structure calculations, phase diagram construction, surface and interface studies, thermodynamic stability analysis, symmetry operations, coordination environment analysis, band structure and density of states calculations, Materials Project database integration, file format conversion between 100+ formats (CIF, POSCAR, XYZ, VASP, Gaussian, Quantum ESPRESSO, etc.), high-throughput materials screening, computational workflow setup, diffraction pattern analysis, elastic properties, magnetic ordering, adsorption site finding, slab generation, Wulff shape construction, Pourbaix diagrams, reaction energy calculations, diffusion analysis, and integration with electronic structure codes. Essential for computational materials science, crystal structure analysis, materials discovery, DFT calculations, surface science, catalysis research, battery materials, semiconductor analysis, and any materials informatics applications requiring structure-property relationships. +description: "Python Materials Genomics (pymatgen) toolkit for comprehensive materials science analysis and computational chemistry workflows. Use for crystal structure manipulation, molecular systems, materials property analysis, electronic structure calculations, phase diagram construction, surface and interface studies, thermodynamic stability analysis, symmetry operations, coordination environment analysis, band structure and density of states calculations, Materials Project database integration, file format conversion between 100+ formats (CIF, POSCAR, XYZ, VASP, Gaussian, Quantum ESPRESSO, etc.), high-throughput materials screening, computational workflow setup, diffraction pattern analysis, elastic properties, magnetic ordering, adsorption site finding, slab generation, Wulff shape construction, Pourbaix diagrams, reaction energy calculations, diffusion analysis, and integration with electronic structure codes. Essential for computational materials science, crystal structure analysis, materials discovery, DFT calculations, surface science, catalysis research, battery materials, semiconductor analysis, and any materials informatics applications requiring structure-property relationships." 
 ---
 
 # Pymatgen - Python Materials Genomics
diff --git a/scientific-packages/pymc/SKILL.md b/scientific-packages/pymc/SKILL.md
index 4ec9c4f..24caa5a 100644
--- a/scientific-packages/pymc/SKILL.md
+++ b/scientific-packages/pymc/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: pymc-bayesian-modeling
-description: Comprehensive toolkit for Bayesian modeling, probabilistic programming, and statistical inference using PyMC. Use this skill for building, fitting, validating, and analyzing Bayesian models including linear regression, logistic regression, hierarchical/multilevel models, time series analysis, mixture models, count data models, and survival analysis. Essential for MCMC sampling, variational inference, model comparison using LOO/WAIC, prior and posterior predictive checks, uncertainty quantification, Bayesian hypothesis testing, parameter estimation with credible intervals, handling missing data, measurement error modeling, and hierarchical data structures. Includes diagnostic procedures for convergence checking, effective sample size assessment, divergence detection, and model validation. Use for Bayesian model selection, model averaging, posterior predictive simulation, and making predictions with uncertainty intervals. Covers both NUTS and variational inference methods, distribution selection for priors and likelihoods, non-centered parameterization for hierarchical models, and best practices for reproducible Bayesian analyses.
+description: "Comprehensive toolkit for Bayesian modeling, probabilistic programming, and statistical inference using PyMC. Use this skill for building, fitting, validating, and analyzing Bayesian models including linear regression, logistic regression, hierarchical/multilevel models, time series analysis, mixture models, count data models, and survival analysis. Essential for MCMC sampling, variational inference, model comparison using LOO/WAIC, prior and posterior predictive checks, uncertainty quantification, Bayesian hypothesis testing, parameter estimation with credible intervals, handling missing data, measurement error modeling, and hierarchical data structures. Includes diagnostic procedures for convergence checking, effective sample size assessment, divergence detection, and model validation. Use for Bayesian model selection, model averaging, posterior predictive simulation, and making predictions with uncertainty intervals. Covers both NUTS and variational inference methods, distribution selection for priors and likelihoods, non-centered parameterization for hierarchical models, and best practices for reproducible Bayesian analyses."
 ---
 
 # PyMC Bayesian Modeling
diff --git a/scientific-packages/pymoo/SKILL.md b/scientific-packages/pymoo/SKILL.md
index 385eb2f..9fda5df 100644
--- a/scientific-packages/pymoo/SKILL.md
+++ b/scientific-packages/pymoo/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: pymoo
-description: Comprehensive Python framework for solving optimization problems including single-objective, multi-objective (2-3 objectives), many-objective (4+ objectives), constrained, and dynamic optimization. Use this skill for evolutionary algorithms (NSGA-II, NSGA-III, MOEA/D, GA, DE, PSO, CMA-ES), Pareto front analysis, trade-off visualization, constraint handling (feasibility-first, penalty methods), multi-criteria decision making (MCDM), genetic operator customization, benchmark problem testing (ZDT, DTLZ, WFG), and optimization algorithm comparison. Essential for engineering design optimization, portfolio allocation, combinatorial problems, parameter tuning, hyperparameter optimization, feature selection, neural architecture search, resource allocation, scheduling problems, and any task requiring finding optimal solutions or analyzing solution trade-offs. Supports continuous, discrete, binary, and mixed-variable optimization with advanced visualization tools for high-dimensional results.
+description: "Comprehensive Python framework for solving optimization problems including single-objective, multi-objective (2-3 objectives), many-objective (4+ objectives), constrained, and dynamic optimization. Use this skill for evolutionary algorithms (NSGA-II, NSGA-III, MOEA/D, GA, DE, PSO, CMA-ES), Pareto front analysis, trade-off visualization, constraint handling (feasibility-first, penalty methods), multi-criteria decision making (MCDM), genetic operator customization, benchmark problem testing (ZDT, DTLZ, WFG), and optimization algorithm comparison. Essential for engineering design optimization, portfolio allocation, combinatorial problems, parameter tuning, hyperparameter optimization, feature selection, neural architecture search, resource allocation, scheduling problems, and any task requiring finding optimal solutions or analyzing solution trade-offs. Supports continuous, discrete, binary, and mixed-variable optimization with advanced visualization tools for high-dimensional results."
 ---
 
 # Pymoo - Multi-Objective Optimization in Python
diff --git a/scientific-packages/pyopenms/SKILL.md b/scientific-packages/pyopenms/SKILL.md
index b7db7fd..88bbf51 100644
--- a/scientific-packages/pyopenms/SKILL.md
+++ b/scientific-packages/pyopenms/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: pyopenms
-description: Toolkit for mass spectrometry data analysis with pyOpenMS, supporting proteomics and metabolomics workflows including LC-MS/MS data processing, peptide identification, feature detection, quantification, and chemical calculations. Use this skill when: (1) Working with mass spectrometry file formats (mzML, mzXML, FASTA, mzTab, mzIdentML, TraML, pepXML/protXML) and need to read, write, or convert between formats; (2) Processing raw LC-MS/MS data including spectral smoothing, peak picking, noise filtering, and signal processing; (3) Performing proteomics workflows such as peptide digestion simulation, theoretical fragmentation, modification analysis, and protein identification post-processing; (4) Conducting metabolomics analysis including feature detection, adduct annotation, isotope pattern matching, and small molecule identification; (5) Implementing quantitative proteomics pipelines with feature detection, alignment across samples, and statistical analysis; (6) Calculating chemical properties including molecular formulas, isotopic distributions, amino acid properties, and peptide masses; (7) Integrating with search engines (Comet, Mascot, MSGF+) and post-processing tools (Percolator, MSstats); (8) Building custom MS data analysis workflows that require low-level access to spectra, chromatograms, and peak data; (9) Performing quality control on MS data including TIC/BPC calculation, retention time analysis, and data validation; (10) When you need Python-based alternatives to vendor software for MS data processing and analysis.
+description: "Toolkit for mass spectrometry data analysis with pyOpenMS, supporting proteomics and metabolomics workflows including LC-MS/MS data processing, peptide identification, feature detection, quantification, and chemical calculations. Use this skill when: (1) Working with mass spectrometry file formats (mzML, mzXML, FASTA, mzTab, mzIdentML, TraML, pepXML/protXML) and need to read, write, or convert between formats; (2) Processing raw LC-MS/MS data including spectral smoothing, peak picking, noise filtering, and signal processing; (3) Performing proteomics workflows such as peptide digestion simulation, theoretical fragmentation, modification analysis, and protein identification post-processing; (4) Conducting metabolomics analysis including feature detection, adduct annotation, isotope pattern matching, and small molecule identification; (5) Implementing quantitative proteomics pipelines with feature detection, alignment across samples, and statistical analysis; (6) Calculating chemical properties including molecular formulas, isotopic distributions, amino acid properties, and peptide masses; (7) Integrating with search engines (Comet, Mascot, MSGF+) and post-processing tools (Percolator, MSstats); (8) Building custom MS data analysis workflows that require low-level access to spectra, chromatograms, and peak data; (9) Performing quality control on MS data including TIC/BPC calculation, retention time analysis, and data validation; (10) When you need Python-based alternatives to vendor software for MS data processing and analysis." --- # pyOpenMS diff --git a/scientific-packages/pysam/SKILL.md b/scientific-packages/pysam/SKILL.md index 751bd68..59cc15e 100644 --- a/scientific-packages/pysam/SKILL.md +++ b/scientific-packages/pysam/SKILL.md @@ -1,6 +1,6 @@ --- name: pysam -description: Toolkit for working with genomic data files in Python, including SAM/BAM/CRAM alignment files, VCF/BCF variant files, and FASTA/FASTQ sequence files. This skill should be used when reading, writing, or manipulating genomic datasets, performing operations like extracting reads from specific regions, calculating coverage, analyzing variants, querying reference sequences, processing sequencing data, or implementing bioinformatics pipelines that work with high-throughput sequencing data formats. +description: "Toolkit for working with genomic data files in Python, including SAM/BAM/CRAM alignment files, VCF/BCF variant files, and FASTA/FASTQ sequence files. This skill should be used when reading, writing, or manipulating genomic datasets, performing operations like extracting reads from specific regions, calculating coverage, analyzing variants, querying reference sequences, processing sequencing data, or implementing bioinformatics pipelines that work with high-throughput sequencing data formats." --- # Pysam diff --git a/scientific-packages/pytdc/SKILL.md b/scientific-packages/pytdc/SKILL.md index 2bf3223..db47df8 100644 --- a/scientific-packages/pytdc/SKILL.md +++ b/scientific-packages/pytdc/SKILL.md @@ -1,6 +1,6 @@ --- name: pytdc -description: PyTDC (Therapeutics Data Commons) provides AI-ready datasets and benchmarks for drug discovery, therapeutic machine learning, and pharmacological research. 
Use this skill for: loading curated drug discovery datasets (ADME properties like Caco2 permeability, HIA absorption, bioavailability, lipophilicity, solubility, BBB penetration, CYP metabolism; toxicity datasets like hERG cardiotoxicity, AMES mutagenicity, DILI liver injury, carcinogenicity; drug-target interaction datasets like BindingDB Kd/Ki/IC50, DAVIS, KIBA; drug-drug interaction prediction; protein-protein interactions; molecular generation and optimization; retrosynthesis prediction; benchmark evaluations with standardized metrics; data splitting strategies (scaffold, random, cold splits); molecular format conversions (SMILES, SELFIES, PyG, DGL, ECFP); oracle functions for molecular optimization; label transformations and unit conversions; entity retrieval (PubChem CID to SMILES, UniProt to sequence). Essential for therapeutic ML model development, pharmacological property prediction, drug discovery pipeline evaluation, molecular design optimization, and accessing standardized therapeutic datasets with proper train/validation/test splits. +description: "PyTDC (Therapeutics Data Commons) provides AI-ready datasets and benchmarks for drug discovery, therapeutic machine learning, and pharmacological research. Use this skill for: loading curated drug discovery datasets (ADME properties like Caco2 permeability, HIA absorption, bioavailability, lipophilicity, solubility, BBB penetration, CYP metabolism; toxicity datasets like hERG cardiotoxicity, AMES mutagenicity, DILI liver injury, carcinogenicity; drug-target interaction datasets like BindingDB Kd/Ki/IC50, DAVIS, KIBA; drug-drug interaction prediction; protein-protein interactions; molecular generation and optimization; retrosynthesis prediction; benchmark evaluations with standardized metrics; data splitting strategies (scaffold, random, cold splits); molecular format conversions (SMILES, SELFIES, PyG, DGL, ECFP); oracle functions for molecular optimization; label transformations and unit conversions; entity retrieval (PubChem CID to SMILES, UniProt to sequence). Essential for therapeutic ML model development, pharmacological property prediction, drug discovery pipeline evaluation, molecular design optimization, and accessing standardized therapeutic datasets with proper train/validation/test splits." --- # PyTDC (Therapeutics Data Commons) diff --git a/scientific-packages/pytorch-lightning/SKILL.md b/scientific-packages/pytorch-lightning/SKILL.md index 15e384c..964b400 100644 --- a/scientific-packages/pytorch-lightning/SKILL.md +++ b/scientific-packages/pytorch-lightning/SKILL.md @@ -1,6 +1,6 @@ --- name: pytorch-lightning -description: PyTorch Lightning deep learning framework skill for organizing PyTorch code and automating training workflows. 
Use this skill for: creating LightningModules with training_step/validation_step hooks, implementing DataModules for data loading and preprocessing, configuring Trainer with accelerators/devices/strategies, setting up distributed training (DDP/FSDP/DeepSpeed), implementing callbacks (ModelCheckpoint/EarlyStopping), configuring loggers (TensorBoard/WandB/MLflow), converting PyTorch code to Lightning format, optimizing performance with mixed precision/gradient accumulation, debugging with fast_dev_run/overfit_batches, checkpointing and resuming training, hyperparameter tuning with Tuner, handling multi-GPU/multi-node training, memory optimization for large models, experiment tracking and reproducibility, custom training loops, validation/testing workflows, prediction pipelines, and production deployment. Includes templates, API references, distributed training guides, and best practices for efficient deep learning development. +description: "PyTorch Lightning deep learning framework skill for organizing PyTorch code and automating training workflows. Use this skill for: creating LightningModules with training_step/validation_step hooks, implementing DataModules for data loading and preprocessing, configuring Trainer with accelerators/devices/strategies, setting up distributed training (DDP/FSDP/DeepSpeed), implementing callbacks (ModelCheckpoint/EarlyStopping), configuring loggers (TensorBoard/WandB/MLflow), converting PyTorch code to Lightning format, optimizing performance with mixed precision/gradient accumulation, debugging with fast_dev_run/overfit_batches, checkpointing and resuming training, hyperparameter tuning with Tuner, handling multi-GPU/multi-node training, memory optimization for large models, experiment tracking and reproducibility, custom training loops, validation/testing workflows, prediction pipelines, and production deployment. Includes templates, API references, distributed training guides, and best practices for efficient deep learning development." --- # PyTorch Lightning diff --git a/scientific-packages/rdkit/SKILL.md b/scientific-packages/rdkit/SKILL.md index 5979056..bd1d1c9 100644 --- a/scientific-packages/rdkit/SKILL.md +++ b/scientific-packages/rdkit/SKILL.md @@ -1,6 +1,6 @@ --- name: rdkit -description: Comprehensive cheminformatics toolkit for molecular manipulation, analysis, and visualization. Use this skill when working with chemical structures (SMILES, MOL files, SDF, InChI), calculating molecular descriptors (molecular weight, LogP, TPSA, HBD/HBA), performing substructure searches with SMARTS patterns, generating molecular fingerprints (Morgan, RDKit, MACCS), visualizing molecules, processing chemical reactions, conducting drug discovery workflows, generating 2D/3D coordinates, calculating molecular similarity, clustering compounds, standardizing molecules, analyzing pharmacophores, or any cheminformatics/computational chemistry tasks involving molecular data processing, structure-activity relationships, virtual screening, or chemical informatics analysis. +description: "Comprehensive cheminformatics toolkit for molecular manipulation, analysis, and visualization. 
Use this skill when working with chemical structures (SMILES, MOL files, SDF, InChI), calculating molecular descriptors (molecular weight, LogP, TPSA, HBD/HBA), performing substructure searches with SMARTS patterns, generating molecular fingerprints (Morgan, RDKit, MACCS), visualizing molecules, processing chemical reactions, conducting drug discovery workflows, generating 2D/3D coordinates, calculating molecular similarity, clustering compounds, standardizing molecules, analyzing pharmacophores, or any cheminformatics/computational chemistry tasks involving molecular data processing, structure-activity relationships, virtual screening, or chemical informatics analysis." --- # RDKit Cheminformatics Toolkit diff --git a/scientific-packages/reportlab/SKILL.md b/scientific-packages/reportlab/SKILL.md index aa9d561..6657053 100644 --- a/scientific-packages/reportlab/SKILL.md +++ b/scientific-packages/reportlab/SKILL.md @@ -1,6 +1,6 @@ --- name: reportlab -description: ReportLab PDF generation skill for creating professional PDF documents programmatically in Python. Use this skill for generating invoices, reports, certificates, labels, forms, charts, tables, barcodes, QR codes, and multi-page documents. Covers both Canvas API (low-level coordinate-based drawing) and Platypus (high-level flowing document layout). Includes text formatting, custom fonts, images, interactive forms, headers/footers, page breaks, and PDF features like bookmarks and encryption. Provides templates for invoices, reports, certificates, and labels. Supports all major barcode formats (Code128, EAN, UPC, QR) and chart types (bar, line, pie, scatter). Essential for document automation, billing systems, report generation, certificate creation, label printing, and any PDF output requiring precise layout control or professional formatting. +description: "ReportLab PDF generation skill for creating professional PDF documents programmatically in Python. Use this skill for generating invoices, reports, certificates, labels, forms, charts, tables, barcodes, QR codes, and multi-page documents. Covers both Canvas API (low-level coordinate-based drawing) and Platypus (high-level flowing document layout). Includes text formatting, custom fonts, images, interactive forms, headers/footers, page breaks, and PDF features like bookmarks and encryption. Provides templates for invoices, reports, certificates, and labels. Supports all major barcode formats (Code128, EAN, UPC, QR) and chart types (bar, line, pie, scatter). Essential for document automation, billing systems, report generation, certificate creation, label printing, and any PDF output requiring precise layout control or professional formatting." --- # ReportLab PDF Generation diff --git a/scientific-packages/scanpy/SKILL.md b/scientific-packages/scanpy/SKILL.md index 19e6753..81c4b4b 100644 --- a/scientific-packages/scanpy/SKILL.md +++ b/scientific-packages/scanpy/SKILL.md @@ -1,6 +1,6 @@ --- name: scanpy -description: Use this skill for comprehensive single-cell RNA-seq analysis with scanpy. Essential for: loading single-cell data (.h5ad, 10X Genomics, CSV, HDF5), performing quality control and filtering, normalization and preprocessing, dimensionality reduction (PCA, UMAP, t-SNE), clustering (Leiden, Louvain), marker gene identification, cell type annotation, trajectory inference, differential expression analysis, batch correction, gene set scoring, creating publication-quality visualizations, and complete scRNA-seq workflows. 
Use when analyzing single-cell genomics data, identifying cell populations, characterizing gene expression patterns, performing pseudotime analysis, comparing conditions or treatments, visualizing cellular heterogeneity, or conducting any single-cell omics analysis requiring scalable Python tools. +description: "Use this skill for comprehensive single-cell RNA-seq analysis with scanpy. Essential for: loading single-cell data (.h5ad, 10X Genomics, CSV, HDF5), performing quality control and filtering, normalization and preprocessing, dimensionality reduction (PCA, UMAP, t-SNE), clustering (Leiden, Louvain), marker gene identification, cell type annotation, trajectory inference, differential expression analysis, batch correction, gene set scoring, creating publication-quality visualizations, and complete scRNA-seq workflows. Use when analyzing single-cell genomics data, identifying cell populations, characterizing gene expression patterns, performing pseudotime analysis, comparing conditions or treatments, visualizing cellular heterogeneity, or conducting any single-cell omics analysis requiring scalable Python tools." --- # Scanpy: Single-Cell Analysis diff --git a/scientific-packages/scikit-bio/SKILL.md b/scientific-packages/scikit-bio/SKILL.md index 56cccdb..8eebd0e 100644 --- a/scientific-packages/scikit-bio/SKILL.md +++ b/scientific-packages/scikit-bio/SKILL.md @@ -1,6 +1,6 @@ --- name: scikit-bio -description: Comprehensive Python toolkit for biological data analysis and bioinformatics workflows. Handles DNA/RNA/protein sequence manipulation, sequence alignments (global/local), phylogenetic tree construction and analysis, microbial diversity metrics (alpha/beta diversity, UniFrac distances), ordination methods (PCoA, CCA, RDA), statistical hypothesis testing (PERMANOVA, ANOSIM, Mantel), and biological file format I/O. Use this skill for sequence analysis, alignment, phylogenetics, microbiome analysis, ecological community analysis, diversity calculations, ordination visualization, statistical testing on biological data, phylogenetic tree manipulation, protein embeddings, biological table processing, distance matrix calculations, and format conversion between 19+ biological file formats including FASTA, FASTQ, GenBank, Newick, BIOM, Clustal, PHYLIP, Stockholm, BLAST, GFF3, and more. +description: "Comprehensive Python toolkit for biological data analysis and bioinformatics workflows. Handles DNA/RNA/protein sequence manipulation, sequence alignments (global/local), phylogenetic tree construction and analysis, microbial diversity metrics (alpha/beta diversity, UniFrac distances), ordination methods (PCoA, CCA, RDA), statistical hypothesis testing (PERMANOVA, ANOSIM, Mantel), and biological file format I/O. Use this skill for sequence analysis, alignment, phylogenetics, microbiome analysis, ecological community analysis, diversity calculations, ordination visualization, statistical testing on biological data, phylogenetic tree manipulation, protein embeddings, biological table processing, distance matrix calculations, and format conversion between 19+ biological file formats including FASTA, FASTQ, GenBank, Newick, BIOM, Clustal, PHYLIP, Stockholm, BLAST, GFF3, and more." 
 ---
 
 # scikit-bio
diff --git a/scientific-packages/scikit-learn/SKILL.md b/scientific-packages/scikit-learn/SKILL.md
index 80de584..560adef 100644
--- a/scientific-packages/scikit-learn/SKILL.md
+++ b/scientific-packages/scikit-learn/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: scikit-learn
-description: Comprehensive machine learning toolkit using scikit-learn for Python. Use this skill for supervised learning (classification, regression), unsupervised learning (clustering, dimensionality reduction), data preprocessing (scaling, encoding, imputation, feature engineering), model evaluation (cross-validation, metrics, hyperparameter tuning), ML pipeline creation, anomaly detection, ensemble methods, feature selection, algorithm comparison, model deployment, and best practices. Covers RandomForest, SVM, LogisticRegression, KMeans, PCA, preprocessing pipelines, GridSearch, cross-validation, imbalanced data handling, mixed data types, text classification, and preventing data leakage. Essential for any machine learning project requiring predictive modeling, pattern recognition, or data analysis workflows.
+description: "Comprehensive machine learning toolkit using scikit-learn for Python. Use this skill for supervised learning (classification, regression), unsupervised learning (clustering, dimensionality reduction), data preprocessing (scaling, encoding, imputation, feature engineering), model evaluation (cross-validation, metrics, hyperparameter tuning), ML pipeline creation, anomaly detection, ensemble methods, feature selection, algorithm comparison, model deployment, and best practices. Covers RandomForest, SVM, LogisticRegression, KMeans, PCA, preprocessing pipelines, GridSearch, cross-validation, imbalanced data handling, mixed data types, text classification, and preventing data leakage. Essential for any machine learning project requiring predictive modeling, pattern recognition, or data analysis workflows."
 ---
 
 # Scikit-learn: Machine Learning in Python
diff --git a/scientific-packages/seaborn/SKILL.md b/scientific-packages/seaborn/SKILL.md
index f52f7e3..ea8764e 100644
--- a/scientific-packages/seaborn/SKILL.md
+++ b/scientific-packages/seaborn/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: seaborn
-description: Use seaborn for statistical data visualization, exploratory data analysis, and publication-quality plots. This skill covers creating scatter plots, line plots, histograms, KDE plots, box plots, violin plots, bar plots, heatmaps, correlation matrices, pair plots, joint plots, regression plots, categorical comparisons, distribution analysis, multi-panel figures, faceted visualizations, statistical estimation with confidence intervals, color palettes, themes, and matplotlib integration. Apply when visualizing relationships between variables, comparing distributions across categories, analyzing correlations, creating heatmaps, performing regression analysis, exploring multivariate data, generating small multiples, designing publication figures, or when matplotlib plots need statistical enhancements. Suitable for data exploration, statistical analysis, scientific visualization, and creating complex multi-panel figures with minimal code.
+description: "Use seaborn for statistical data visualization, exploratory data analysis, and publication-quality plots. This skill covers creating scatter plots, line plots, histograms, KDE plots, box plots, violin plots, bar plots, heatmaps, correlation matrices, pair plots, joint plots, regression plots, categorical comparisons, distribution analysis, multi-panel figures, faceted visualizations, statistical estimation with confidence intervals, color palettes, themes, and matplotlib integration. Apply when visualizing relationships between variables, comparing distributions across categories, analyzing correlations, creating heatmaps, performing regression analysis, exploring multivariate data, generating small multiples, designing publication figures, or when matplotlib plots need statistical enhancements. Suitable for data exploration, statistical analysis, scientific visualization, and creating complex multi-panel figures with minimal code."
 ---
 
 # Seaborn Statistical Visualization
diff --git a/scientific-packages/statsmodels/SKILL.md b/scientific-packages/statsmodels/SKILL.md
index 7dd21cc..4d4941b 100644
--- a/scientific-packages/statsmodels/SKILL.md
+++ b/scientific-packages/statsmodels/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: statsmodels
-description: Comprehensive statistical modeling and econometric analysis toolkit for Python. This skill should be used when you need to fit statistical models, perform hypothesis testing, conduct econometric analysis, or analyze time series data. Specifically, use this skill for: (1) Linear regression analysis including OLS, WLS, GLS, quantile regression, and mixed effects models for continuous outcomes with comprehensive diagnostics, robust standard errors, and influence analysis; (2) Generalized linear models (GLM) for non-normal outcomes including logistic regression for binary data, Poisson/Negative Binomial for count data, and Gamma regression for skewed continuous data; (3) Discrete choice models including logit/probit for binary outcomes, multinomial logit for categorical outcomes, and zero-inflated models for count data with excess zeros; (4) Time series analysis including ARIMA/SARIMAX for univariate forecasting, VAR/VARMAX for multivariate analysis, state space models, and exponential smoothing with stationarity testing and residual diagnostics; (5) Statistical testing and diagnostics including tests for heteroskedasticity, autocorrelation, normality, multicollinearity, influence detection, ANOVA, and hypothesis testing with appropriate corrections; (6) Model comparison and selection using AIC/BIC, likelihood ratio tests, and cross-validation; (7) Causal inference with instrumental variables, difference-in-differences, and regression discontinuity designs. This skill is essential when you need rigorous statistical inference, publication-ready results, detailed model diagnostics, or when working with econometric, biomedical, or social science data requiring proper statistical methodology. Always use this skill instead of basic regression when you need confidence intervals, p-values, diagnostic tests, or when assumptions need to be validated.
+description: "Comprehensive statistical modeling and econometric analysis toolkit for Python. This skill should be used when you need to fit statistical models, perform hypothesis testing, conduct econometric analysis, or analyze time series data. Specifically, use this skill for: (1) Linear regression analysis including OLS, WLS, GLS, quantile regression, and mixed effects models for continuous outcomes with comprehensive diagnostics, robust standard errors, and influence analysis; (2) Generalized linear models (GLM) for non-normal outcomes including logistic regression for binary data, Poisson/Negative Binomial for count data, and Gamma regression for skewed continuous data; (3) Discrete choice models including logit/probit for binary outcomes, multinomial logit for categorical outcomes, and zero-inflated models for count data with excess zeros; (4) Time series analysis including ARIMA/SARIMAX for univariate forecasting, VAR/VARMAX for multivariate analysis, state space models, and exponential smoothing with stationarity testing and residual diagnostics; (5) Statistical testing and diagnostics including tests for heteroskedasticity, autocorrelation, normality, multicollinearity, influence detection, ANOVA, and hypothesis testing with appropriate corrections; (6) Model comparison and selection using AIC/BIC, likelihood ratio tests, and cross-validation; (7) Causal inference with instrumental variables, difference-in-differences, and regression discontinuity designs. This skill is essential when you need rigorous statistical inference, publication-ready results, detailed model diagnostics, or when working with econometric, biomedical, or social science data requiring proper statistical methodology. Always use this skill instead of basic regression when you need confidence intervals, p-values, diagnostic tests, or when assumptions need to be validated."
 ---
 
 # Statsmodels: Statistical Modeling and Econometrics
diff --git a/scientific-packages/torch_geometric/SKILL.md b/scientific-packages/torch_geometric/SKILL.md
index 7ab5cad..e3b1353 100644
--- a/scientific-packages/torch_geometric/SKILL.md
+++ b/scientific-packages/torch_geometric/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: torch-geometric
-description: PyTorch Geometric (PyG) skill for building, training, and deploying Graph Neural Networks (GNNs) on structured data including graphs, 3D meshes, point clouds, and heterogeneous networks. Use this skill for graph-based machine learning tasks such as node classification, graph classification, link prediction, graph generation, geometric deep learning, and message passing on irregular structures. Essential for molecular property prediction, drug discovery, chemical informatics, social network analysis, citation networks, recommendation systems, 3D computer vision, protein structure analysis, knowledge graphs, fraud detection, traffic prediction, and any domain involving relational, geometric, or topological data. Supports large-scale graph processing, multi-GPU training, neighbor sampling, heterogeneous graphs, graph transforms, model explainability, and custom message passing layers. Includes comprehensive datasets, pre-built GNN architectures (GCN, GAT, GraphSAGE, etc.), and utilities for graph visualization and benchmarking.
+description: "PyTorch Geometric (PyG) skill for building, training, and deploying Graph Neural Networks (GNNs) on structured data including graphs, 3D meshes, point clouds, and heterogeneous networks. Use this skill for graph-based machine learning tasks such as node classification, graph classification, link prediction, graph generation, geometric deep learning, and message passing on irregular structures. Essential for molecular property prediction, drug discovery, chemical informatics, social network analysis, citation networks, recommendation systems, 3D computer vision, protein structure analysis, knowledge graphs, fraud detection, traffic prediction, and any domain involving relational, geometric, or topological data. Supports large-scale graph processing, multi-GPU training, neighbor sampling, heterogeneous graphs, graph transforms, model explainability, and custom message passing layers. Includes comprehensive datasets, pre-built GNN architectures (GCN, GAT, GraphSAGE, etc.), and utilities for graph visualization and benchmarking."
 ---
 
 # PyTorch Geometric (PyG)
diff --git a/scientific-packages/transformers/SKILL.md b/scientific-packages/transformers/SKILL.md
index 8f4a3cd..86f9389 100644
--- a/scientific-packages/transformers/SKILL.md
+++ b/scientific-packages/transformers/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: transformers
-description: Essential toolkit for Hugging Face Transformers library enabling state-of-the-art machine learning across natural language processing, computer vision, audio processing, and multimodal applications. Use this skill for: loading and using pretrained transformer models (BERT, GPT, T5, RoBERTa, DistilBERT, BART, T5, ViT, CLIP, Whisper, Llama, Mistral), implementing text generation and completion, fine-tuning models for custom tasks, text classification and sentiment analysis, question answering and reading comprehension, named entity recognition and token classification, text summarization and translation, image classification and object detection, speech recognition and audio processing, multimodal tasks combining text and images, parameter-efficient fine-tuning with LoRA and adapters, model quantization and optimization, training custom transformer models, implementing chat interfaces and conversational AI, working with tokenizers and text preprocessing, handling model inference and deployment, managing GPU memory and device allocation, implementing custom training loops, using pipelines for quick inference, working with Hugging Face Hub for model sharing, and any machine learning task involving transformer architectures or attention mechanisms.
+description: "Essential toolkit for Hugging Face Transformers library enabling state-of-the-art machine learning across natural language processing, computer vision, audio processing, and multimodal applications. Use this skill for: loading and using pretrained transformer models (BERT, GPT, T5, RoBERTa, DistilBERT, BART, T5, ViT, CLIP, Whisper, Llama, Mistral), implementing text generation and completion, fine-tuning models for custom tasks, text classification and sentiment analysis, question answering and reading comprehension, named entity recognition and token classification, text summarization and translation, image classification and object detection, speech recognition and audio processing, multimodal tasks combining text and images, parameter-efficient fine-tuning with LoRA and adapters, model quantization and optimization, training custom transformer models, implementing chat interfaces and conversational AI, working with tokenizers and text preprocessing, handling model inference and deployment, managing GPU memory and device allocation, implementing custom training loops, using pipelines for quick inference, working with Hugging Face Hub for model sharing, and any machine learning task involving transformer architectures or attention mechanisms."
 ---
 
 # Transformers
diff --git a/scientific-packages/umap-learn/SKILL.md b/scientific-packages/umap-learn/SKILL.md
index 47dc38a..3aefd9f 100644
--- a/scientific-packages/umap-learn/SKILL.md
+++ b/scientific-packages/umap-learn/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: umap-learn
-description: Comprehensive guide for UMAP (Uniform Manifold Approximation and Projection) - a fast, scalable dimensionality reduction technique for visualization, clustering, and machine learning. Use this skill for: dimensionality reduction of high-dimensional datasets (genes, proteins, images, text embeddings, sensor data), creating 2D/3D visualizations of complex data, preprocessing data for clustering algorithms (especially HDBSCAN), supervised and semi-supervised dimensionality reduction with labels, transforming new data using trained UMAP models, parametric UMAP with neural networks, feature engineering for downstream ML models, manifold learning and non-linear dimensionality reduction, comparing UMAP to t-SNE/PCA/other methods, inverse transforms and data reconstruction, aligned UMAP for temporal/batch data analysis. Triggers include: "dimensionality reduction", "UMAP", "manifold learning", "data visualization", "clustering preprocessing", "high-dimensional data", "embedding", "reduce dimensions", "2D visualization", "3D visualization", "supervised dimensionality reduction", "parametric UMAP", "transform new data", "feature engineering", "HDBSCAN clustering", "t-SNE alternative", "non-linear dimensionality reduction", "inverse transform", "data reconstruction", "aligned embeddings", "batch effect correction", "temporal data analysis".
+description: "Comprehensive guide for UMAP (Uniform Manifold Approximation and Projection) - a fast, scalable dimensionality reduction technique for visualization, clustering, and machine learning. Use this skill for: dimensionality reduction of high-dimensional datasets (genes, proteins, images, text embeddings, sensor data), creating 2D/3D visualizations of complex data, preprocessing data for clustering algorithms (especially HDBSCAN), supervised and semi-supervised dimensionality reduction with labels, transforming new data using trained UMAP models, parametric UMAP with neural networks, feature engineering for downstream ML models, manifold learning and non-linear dimensionality reduction, comparing UMAP to t-SNE/PCA/other methods, inverse transforms and data reconstruction, aligned UMAP for temporal/batch data analysis. Triggers include: \"dimensionality reduction\", \"UMAP\", \"manifold learning\", \"data visualization\", \"clustering preprocessing\", \"high-dimensional data\", \"embedding\", \"reduce dimensions\", \"2D visualization\", \"3D visualization\", \"supervised dimensionality reduction\", \"parametric UMAP\", \"transform new data\", \"feature engineering\", \"HDBSCAN clustering\", \"t-SNE alternative\", \"non-linear dimensionality reduction\", \"inverse transform\", \"data reconstruction\", \"aligned embeddings\", \"batch effect correction\", \"temporal data analysis\"."
 ---
 
 # UMAP-Learn
diff --git a/scientific-packages/zarr-python/SKILL.md b/scientific-packages/zarr-python/SKILL.md
index 15e227d..e2c74c7 100644
--- a/scientific-packages/zarr-python/SKILL.md
+++ b/scientific-packages/zarr-python/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: zarr-python
-description: Toolkit for working with Zarr, a Python library for chunked, compressed N-dimensional arrays optimized for cloud storage and large-scale scientific computing. Use this skill when working with large datasets that need efficient storage and parallel access, multidimensional arrays requiring chunking and compression, cloud-native data workflows (S3, GCS), or when integrating with NumPy, Dask, and Xarray for scientific computing tasks. Essential for handling datasets larger than memory, implementing parallel I/O operations, optimizing storage with compression, creating hierarchical data structures, converting between scientific data formats (HDF5, NetCDF, NumPy), managing cloud storage workflows, implementing chunked array operations, and building scalable scientific computing pipelines.
+description: "Toolkit for working with Zarr, a Python library for chunked, compressed N-dimensional arrays optimized for cloud storage and large-scale scientific computing. Use this skill when working with large datasets that need efficient storage and parallel access, multidimensional arrays requiring chunking and compression, cloud-native data workflows (S3, GCS), or when integrating with NumPy, Dask, and Xarray for scientific computing tasks. Essential for handling datasets larger than memory, implementing parallel I/O operations, optimizing storage with compression, creating hierarchical data structures, converting between scientific data formats (HDF5, NetCDF, NumPy), managing cloud storage workflows, implementing chunked array operations, and building scalable scientific computing pipelines."
 ---
 
 # Zarr Python
diff --git a/scientific-thinking/document-skills/pdf/SKILL.md b/scientific-thinking/document-skills/pdf/SKILL.md
index f6a22dd..875e57e 100644
--- a/scientific-thinking/document-skills/pdf/SKILL.md
+++ b/scientific-thinking/document-skills/pdf/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: pdf
-description: Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. When Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.
+description: "Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. When Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale."
 license: Proprietary. LICENSE.txt has complete terms
 ---
 
diff --git a/scientific-thinking/exploratory-data-analysis/SKILL.md b/scientific-thinking/exploratory-data-analysis/SKILL.md
index 3dca02f..7ffbdd3 100644
--- a/scientific-thinking/exploratory-data-analysis/SKILL.md
+++ b/scientific-thinking/exploratory-data-analysis/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: exploratory-data-analysis
-description: Comprehensive exploratory data analysis (EDA) toolkit for analyzing datasets and generating actionable insights. Use this skill when users provide data files and request analysis, exploration, insights, or understanding of their data. Handles CSV, Excel (.xlsx/.xls), JSON, Parquet, TSV, Feather, HDF5, and Pickle files. Automatically performs statistical analysis including distributions, correlations, outlier detection, missing data patterns, and data quality assessment. Generates professional visualizations (histograms, box plots, correlation heatmaps, scatter matrices) and comprehensive markdown reports with automated insights. Key triggers: "analyze this data", "explore this dataset", "what's in this file", "data insights", "statistical summary", "data visualization", "EDA", "exploratory analysis", "data profiling", "understand my data", "find patterns", "data quality", "missing data", "outliers", "correlations", "distributions". Always outputs structured markdown reports with embedded visualizations and actionable recommendations.
+description: "Comprehensive exploratory data analysis (EDA) toolkit for analyzing datasets and generating actionable insights. Use this skill when users provide data files and request analysis, exploration, insights, or understanding of their data. Handles CSV, Excel (.xlsx/.xls), JSON, Parquet, TSV, Feather, HDF5, and Pickle files. Automatically performs statistical analysis including distributions, correlations, outlier detection, missing data patterns, and data quality assessment. Generates professional visualizations (histograms, box plots, correlation heatmaps, scatter matrices) and comprehensive markdown reports with automated insights. Key triggers: \"analyze this data\", \"explore this dataset\", \"what's in this file\", \"data insights\", \"statistical summary\", \"data visualization\", \"EDA\", \"exploratory analysis\", \"data profiling\", \"understand my data\", \"find patterns\", \"data quality\", \"missing data\", \"outliers\", \"correlations\", \"distributions\". Always outputs structured markdown reports with embedded visualizations and actionable recommendations."
 ---
 
 # Exploratory Data Analysis
diff --git a/scientific-thinking/hypothesis-generation/SKILL.md b/scientific-thinking/hypothesis-generation/SKILL.md
index 23ba17b..e0bfdc4 100644
--- a/scientific-thinking/hypothesis-generation/SKILL.md
+++ b/scientific-thinking/hypothesis-generation/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: hypothesis-generation
-description: Generate robust, testable scientific hypotheses grounded in existing literature. Use this skill when users need to formulate hypotheses from observations, design experiments to test hypotheses, explore competing explanations for phenomena, develop testable predictions, or create mechanistic explanations across any scientific domain. This skill is essential for hypothesis formation, experimental design, developing testable predictions, proposing mechanistic explanations, generating alternative theories, designing studies to distinguish between competing hypotheses, creating falsifiable predictions, and systematically evaluating hypothesis quality. Apply when users ask about "why" something happens, need to explain observations, want to test theories, design experiments, propose mechanisms, generate predictions, or explore alternative explanations in biology, chemistry, physics, medicine, psychology, or any scientific field.
+description: "Generate robust, testable scientific hypotheses grounded in existing literature. Use this skill when users need to formulate hypotheses from observations, design experiments to test hypotheses, explore competing explanations for phenomena, develop testable predictions, or create mechanistic explanations across any scientific domain. This skill is essential for hypothesis formation, experimental design, developing testable predictions, proposing mechanistic explanations, generating alternative theories, designing studies to distinguish between competing hypotheses, creating falsifiable predictions, and systematically evaluating hypothesis quality. Apply when users ask about \"why\" something happens, need to explain observations, want to test theories, design experiments, propose mechanisms, generate predictions, or explore alternative explanations in biology, chemistry, physics, medicine, psychology, or any scientific field."
 ---
 
 # Scientific Hypothesis Generation
diff --git a/scientific-thinking/peer-review/SKILL.md b/scientific-thinking/peer-review/SKILL.md
index efff76b..b5249f1 100644
--- a/scientific-thinking/peer-review/SKILL.md
+++ b/scientific-thinking/peer-review/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: peer-review
-description: Comprehensive scientific peer review toolkit for evaluating manuscripts, papers, preprints, and research documents across all disciplines. Use this skill to conduct systematic peer review following established scientific standards, providing constructive feedback on methodology, statistical analysis, experimental design, data interpretation, reproducibility, ethical considerations, and scientific rigor. Includes structured evaluation workflows, reporting standards compliance checks, figure/data integrity assessment, and guidance for writing professional review reports. Applicable to original research articles, reviews, meta-analyses, methods papers, short reports, and preprints in biology, chemistry, physics, medicine, computational sciences, and interdisciplinary research. Essential for manuscript evaluation, grant review, conference paper assessment, and maintaining scientific quality standards.
+description: "Comprehensive scientific peer review toolkit for evaluating manuscripts, papers, preprints, and research documents across all disciplines. Use this skill to conduct systematic peer review following established scientific standards, providing constructive feedback on methodology, statistical analysis, experimental design, data interpretation, reproducibility, ethical considerations, and scientific rigor. Includes structured evaluation workflows, reporting standards compliance checks, figure/data integrity assessment, and guidance for writing professional review reports. Applicable to original research articles, reviews, meta-analyses, methods papers, short reports, and preprints in biology, chemistry, physics, medicine, computational sciences, and interdisciplinary research. Essential for manuscript evaluation, grant review, conference paper assessment, and maintaining scientific quality standards."
 ---
 
 # Scientific Critical Evaluation and Peer Review
diff --git a/scientific-thinking/scientific-brainstorming/SKILL.md b/scientific-thinking/scientific-brainstorming/SKILL.md
index a52a6cd..5af95ca 100644
--- a/scientific-thinking/scientific-brainstorming/SKILL.md
+++ b/scientific-thinking/scientific-brainstorming/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: scientific-brainstorming
-description: Structured conversational brainstorming partner for scientific research ideation and creative problem-solving. Activates when scientists need to: generate novel research ideas and hypotheses; explore interdisciplinary connections and cross-domain analogies; challenge research assumptions and conventional thinking; overcome creative blocks and mental barriers; develop innovative methodologies and experimental approaches; synthesize disparate concepts into coherent research directions; identify unexpected research opportunities and unexplored angles; brainstorm solutions to complex scientific problems; expand research scope beyond obvious approaches; connect findings across different scientific fields; develop collaborative research proposals; explore "what if" scenarios and alternative hypotheses; identify gaps in current scientific understanding; generate research questions from preliminary observations; develop creative approaches to experimental design; brainstorm applications of emerging technologies; explore unconventional data analysis methods; identify novel research collaborations; develop scientific communication strategies; and think through research problems from multiple fresh perspectives. This skill provides structured brainstorming workflows including divergent exploration, connection-making, critical evaluation, and synthesis phases, while maintaining conversational collaboration and domain-aware guidance across scientific disciplines.
+description: "Structured conversational brainstorming partner for scientific research ideation and creative problem-solving. Activates when scientists need to: generate novel research ideas and hypotheses; explore interdisciplinary connections and cross-domain analogies; challenge research assumptions and conventional thinking; overcome creative blocks and mental barriers; develop innovative methodologies and experimental approaches; synthesize disparate concepts into coherent research directions; identify unexpected research opportunities and unexplored angles; brainstorm solutions to complex scientific problems; expand research scope beyond obvious approaches; connect findings across different scientific fields; develop collaborative research proposals; explore \"what if\" scenarios and alternative hypotheses; identify gaps in current scientific understanding; generate research questions from preliminary observations; develop creative approaches to experimental design; brainstorm applications of emerging technologies; explore unconventional data analysis methods; identify novel research collaborations; develop scientific communication strategies; and think through research problems from multiple fresh perspectives. This skill provides structured brainstorming workflows including divergent exploration, connection-making, critical evaluation, and synthesis phases, while maintaining conversational collaboration and domain-aware guidance across scientific disciplines."
 ---
 
 # Scientific Brainstorming
diff --git a/scientific-thinking/scientific-critical-thinking/SKILL.md b/scientific-thinking/scientific-critical-thinking/SKILL.md
index f77f299..2008ca3 100644
--- a/scientific-thinking/scientific-critical-thinking/SKILL.md
+++ b/scientific-thinking/scientific-critical-thinking/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: scientific-critical-thinking
-description: Apply systematic scientific critical thinking to rigorously evaluate research methodology, statistical analyses, evidence quality, and scientific claims. Use this skill when: analyzing research papers for methodological flaws and biases; evaluating experimental designs for validity threats; assessing statistical methods, power, multiple comparisons, and effect sizes; identifying logical fallacies and cognitive biases in scientific arguments; reviewing evidence hierarchies and GRADE criteria; critiquing causal claims vs correlational findings; evaluating study quality using established frameworks (Cochrane ROB, Newcastle-Ottawa); detecting publication bias, p-hacking, and selective reporting; assessing confounding, selection bias, and measurement validity; reviewing research proposals and study protocols; evaluating media reports of scientific findings; conducting systematic literature reviews; determining confidence levels in scientific conclusions; distinguishing between exploratory and confirmatory findings; and providing constructive methodological feedback for improving research rigor.
+description: "Apply systematic scientific critical thinking to rigorously evaluate research methodology, statistical analyses, evidence quality, and scientific claims. Use this skill when: analyzing research papers for methodological flaws and biases; evaluating experimental designs for validity threats; assessing statistical methods, power, multiple comparisons, and effect sizes; identifying logical fallacies and cognitive biases in scientific arguments; reviewing evidence hierarchies and GRADE criteria; critiquing causal claims vs correlational findings; evaluating study quality using established frameworks (Cochrane ROB, Newcastle-Ottawa); detecting publication bias, p-hacking, and selective reporting; assessing confounding, selection bias, and measurement validity; reviewing research proposals and study protocols; evaluating media reports of scientific findings; conducting systematic literature reviews; determining confidence levels in scientific conclusions; distinguishing between exploratory and confirmatory findings; and providing constructive methodological feedback for improving research rigor."
 ---
 
 # Scientific Critical Thinking
diff --git a/scientific-thinking/scientific-visualization/SKILL.md b/scientific-thinking/scientific-visualization/SKILL.md
index d31f6d3..481aa24 100644
--- a/scientific-thinking/scientific-visualization/SKILL.md
+++ b/scientific-thinking/scientific-visualization/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: scientific-visualization
-description: Create publication-ready scientific figures, plots, charts, and visualizations using matplotlib, seaborn, and plotly. Use this skill for any scientific data visualization task including: creating figures for research papers and manuscripts; preparing plots for journal submission (Nature, Science, Cell, PLOS, PNAS, etc.); making publication-quality figures with proper resolution, fonts, and formatting; ensuring colorblind accessibility and accessibility compliance; creating multi-panel figures with consistent styling; visualizing statistical data with error bars, significance markers, and proper statistical representation; exporting figures in correct formats (PDF, EPS, TIFF, PNG) with appropriate DPI; following journal-specific requirements and style guidelines; improving existing figures to meet publication standards; creating figures that work in both color and grayscale; visualizing experimental results, data analysis outputs, statistical comparisons, time series, distributions, correlations, heatmaps, scatter plots, bar charts, line plots, box plots, violin plots, and other scientific plot types; ensuring figures are clear, accurate, accessible, and professional; applying proper typography, color palettes, and layout principles; creating figures for presentations, posters, and scientific communication; visualizing genomics data, microscopy images, experimental measurements, and research findings.
+description: "Create publication-ready scientific figures, plots, charts, and visualizations using matplotlib, seaborn, and plotly. Use this skill for any scientific data visualization task including: creating figures for research papers and manuscripts; preparing plots for journal submission (Nature, Science, Cell, PLOS, PNAS, etc.); making publication-quality figures with proper resolution, fonts, and formatting; ensuring colorblind accessibility and accessibility compliance; creating multi-panel figures with consistent styling; visualizing statistical data with error bars, significance markers, and proper statistical representation; exporting figures in correct formats (PDF, EPS, TIFF, PNG) with appropriate DPI; following journal-specific requirements and style guidelines; improving existing figures to meet publication standards; creating figures that work in both color and grayscale; visualizing experimental results, data analysis outputs, statistical comparisons, time series, distributions, correlations, heatmaps, scatter plots, bar charts, line plots, box plots, violin plots, and other scientific plot types; ensuring figures are clear, accurate, accessible, and professional; applying proper typography, color palettes, and layout principles; creating figures for presentations, posters, and scientific communication; visualizing genomics data, microscopy images, experimental measurements, and research findings."
 ---
 
 # Scientific Visualization
diff --git a/scientific-thinking/statistical-analysis/SKILL.md b/scientific-thinking/statistical-analysis/SKILL.md
index 7ed1603..c62df80 100644
--- a/scientific-thinking/statistical-analysis/SKILL.md
+++ b/scientific-thinking/statistical-analysis/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: statistical-analysis
-description: Comprehensive statistical analysis toolkit for rigorous academic research using Python. This skill handles hypothesis testing (t-tests, ANOVA, chi-square, non-parametric tests), regression analysis (linear, multiple, logistic), correlation analysis, Bayesian statistics, and power analysis. It provides systematic workflows for test selection, assumption checking, effect size calculation, diagnostic visualization, and APA-style reporting. Use this skill when you need to: analyze data statistically, choose appropriate statistical tests, check assumptions before analysis, calculate effect sizes and confidence intervals, conduct power analysis for study planning, perform hypothesis testing or regression analysis, interpret statistical results, create publication-ready statistical reports, handle assumption violations, conduct Bayesian analysis, or generate diagnostic plots and statistical visualizations. Essential for research data analysis, experimental design validation, statistical modeling, and academic reporting.
+description: "Comprehensive statistical analysis toolkit for rigorous academic research using Python. This skill handles hypothesis testing (t-tests, ANOVA, chi-square, non-parametric tests), regression analysis (linear, multiple, logistic), correlation analysis, Bayesian statistics, and power analysis. It provides systematic workflows for test selection, assumption checking, effect size calculation, diagnostic visualization, and APA-style reporting. Use this skill when you need to: analyze data statistically, choose appropriate statistical tests, check assumptions before analysis, calculate effect sizes and confidence intervals, conduct power analysis for study planning, perform hypothesis testing or regression analysis, interpret statistical results, create publication-ready statistical reports, handle assumption violations, conduct Bayesian analysis, or generate diagnostic plots and statistical visualizations. Essential for research data analysis, experimental design validation, statistical modeling, and academic reporting."
 ---
 
 # Statistical Analysis