diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json index d260186..d1bd958 100644 --- a/.claude-plugin/marketplace.json +++ b/.claude-plugin/marketplace.json @@ -7,7 +7,7 @@ }, "metadata": { "description": "Claude scientific skills from K-Dense Inc", - "version": "1.18.2" + "version": "1.18.3" }, "plugins": [ { diff --git a/scientific-packages/anndata/SKILL.md b/scientific-packages/anndata/SKILL.md index 0fa9c84..8e35f89 100644 --- a/scientific-packages/anndata/SKILL.md +++ b/scientific-packages/anndata/SKILL.md @@ -1,6 +1,6 @@ --- name: anndata -description: Work with AnnData objects for annotated data matrices, commonly used in single-cell genomics and other scientific domains. This skill should be used when working with .h5ad files, performing single-cell RNA-seq analysis, managing annotated datasets, concatenating multiple datasets, or working with sparse matrices and embeddings in a structured format. +description: Comprehensive AnnData (Annotated Data) manipulation for single-cell genomics, multi-omics, and structured scientific datasets. Use this skill for: loading/saving .h5ad files, creating AnnData objects from matrices/DataFrames, managing obs/var metadata, storing embeddings (PCA/UMAP/t-SNE) in obsm/varm, using layers for raw/normalized data, concatenating datasets with batch tracking, memory-efficient backed mode for large files, sparse matrix optimization, subsetting with views/copies, converting between formats (CSV/MTX/Loom/Zarr), single-cell RNA-seq workflows, batch integration, quality control filtering, dimensionality reduction storage, and scientific data management best practices. 
--- # AnnData diff --git a/scientific-packages/arboreto/SKILL.md b/scientific-packages/arboreto/SKILL.md index 6c4fb21..d49bb5d 100644 --- a/scientific-packages/arboreto/SKILL.md +++ b/scientific-packages/arboreto/SKILL.md @@ -1,6 +1,6 @@ --- name: arboreto -description: Toolkit for gene regulatory network (GRN) inference from expression data using machine learning. Use this skill when working with gene expression matrices to infer regulatory relationships, performing single-cell RNA-seq analysis, or integrating with pySCENIC workflows. Supports both GRNBoost2 (fast gradient boosting) and GENIE3 (Random Forest) algorithms with distributed computing via Dask. +description: Python toolkit for gene regulatory network (GRN) inference from gene expression data using machine learning algorithms. Use this skill for inferring transcription factor-target gene relationships, analyzing single-cell RNA-seq data, building regulatory networks, performing GRN inference from expression matrices, working with GRNBoost2 and GENIE3 algorithms, setting up distributed computing with Dask, integrating with pySCENIC workflows, comparing GRN inference methods, troubleshooting arboreto installation issues, handling large-scale genomic data analysis, and performing reproducible regulatory network analysis. Supports both GRNBoost2 (fast gradient boosting) and GENIE3 (Random Forest) algorithms with distributed computing capabilities via Dask for scalable analysis from single machines to multi-node clusters. --- # Arboreto - Gene Regulatory Network Inference diff --git a/scientific-packages/astropy/SKILL.md b/scientific-packages/astropy/SKILL.md index 07a87ed..bc481ae 100644 --- a/scientific-packages/astropy/SKILL.md +++ b/scientific-packages/astropy/SKILL.md @@ -1,6 +1,6 @@ --- name: astropy -description: Comprehensive toolkit for astronomical data analysis and computation using the astropy Python library. 
This skill should be used when working with astronomical data including FITS files, coordinate transformations, cosmological calculations, time systems, physical units, data tables, model fitting, WCS transformations, and visualization. Use this skill for tasks involving celestial coordinates, astronomical file formats, photometry, spectroscopy, or any astronomy-specific Python computations. +description: Expert guidance for astronomical data analysis using the astropy Python library. Use this skill for FITS file operations (reading, writing, inspecting, modifying), coordinate transformations between celestial reference frames (ICRS, galactic, FK5, ecliptic, horizontal), cosmological distance and age calculations, astronomical time systems (UTC, TAI, TT, TDB), physical units and dimensional analysis, astronomical data tables with specialized column types, model fitting to astronomical data, World Coordinate System (WCS) transformations between pixel and sky coordinates, robust statistical analysis of astronomical datasets, and visualization of astronomical images with proper scaling. Essential for tasks involving celestial coordinates, astronomical file formats, photometry, spectroscopy, catalog matching, time series analysis, image processing, cosmological calculations, or any astronomy-specific Python computations requiring astropy's specialized tools and data structures. --- # Astropy diff --git a/scientific-packages/biomni/SKILL.md b/scientific-packages/biomni/SKILL.md index 3ba134b..73df432 100644 --- a/scientific-packages/biomni/SKILL.md +++ b/scientific-packages/biomni/SKILL.md @@ -1,6 +1,6 @@ --- name: biomni -description: General-purpose biomedical AI agent for autonomously executing research tasks across diverse biomedical domains. 
Use this skill when working with biomedical data analysis, CRISPR screening, single-cell RNA-seq, molecular property prediction, genomics, proteomics, drug discovery, or any computational biology task requiring LLM-powered code generation and retrieval-augmented planning. +description: Use this skill for autonomous biomedical research execution across genomics, proteomics, drug discovery, and computational biology. Biomni is an AI agent that combines LLM reasoning with retrieval-augmented planning and code generation to autonomously execute complex biomedical tasks. Use when you need: CRISPR guide RNA design and screening experiments, single-cell RNA-seq analysis workflows, molecular ADMET property prediction, GWAS analysis, protein structure prediction, disease classification from multi-omics data, pathway analysis, drug repurposing, biomarker discovery, variant interpretation, cell type annotation, or any biomedical computational task requiring automated code generation, data analysis, and scientific reasoning. The agent autonomously decomposes tasks, retrieves relevant biomedical knowledge from its 11GB knowledge base, generates and executes analysis code, and provides comprehensive results. Ideal for researchers needing automated execution of complex biomedical workflows without manual coding. --- # Biomni diff --git a/scientific-packages/biopython/SKILL.md b/scientific-packages/biopython/SKILL.md index b0c7415..a37082e 100644 --- a/scientific-packages/biopython/SKILL.md +++ b/scientific-packages/biopython/SKILL.md @@ -1,6 +1,6 @@ --- name: biopython -description: Comprehensive toolkit for computational molecular biology using BioPython. Use this skill when working with biological sequences (DNA, RNA, protein), parsing sequence files (FASTA, GenBank, FASTQ), accessing NCBI databases (Entrez, BLAST), performing sequence alignments, building phylogenetic trees, analyzing protein structures (PDB), or any bioinformatics task requiring BioPython modules. 
+description: Use BioPython for computational molecular biology and bioinformatics tasks. Essential for: sequence manipulation (DNA/RNA/protein transcription, translation, complement, reverse complement), reading/writing biological file formats (FASTA, FASTQ, GenBank, EMBL, Swiss-Prot, PDB, Clustal, PHYLIP, NEXUS), NCBI database access (Entrez searches, downloads from GenBank/PubMed/Protein databases), BLAST sequence similarity searches and result parsing, pairwise and multiple sequence alignments, phylogenetic tree construction and analysis (UPGMA, Neighbor-Joining), protein structure analysis (PDB parsing, secondary structure, structural alignment), sequence property calculations (GC content, melting temperature, molecular weight, isoelectric point), format conversion between biological file types, restriction enzyme analysis, motif discovery, population genetics calculations, and any task requiring Bio.Seq, Bio.SeqIO, Bio.Entrez, Bio.Blast, Bio.Align, Bio.Phylo, Bio.PDB, Bio.SeqUtils, or other BioPython modules. --- # BioPython diff --git a/scientific-packages/bioservices/SKILL.md b/scientific-packages/bioservices/SKILL.md index 8905127..bea1422 100644 --- a/scientific-packages/bioservices/SKILL.md +++ b/scientific-packages/bioservices/SKILL.md @@ -1,6 +1,6 @@ --- name: bioservices -description: Toolkit for accessing 40+ biological web services and databases programmatically. Use when working with protein sequences, gene pathways (KEGG), identifier mapping (UniProt), compound databases (ChEBI, ChEMBL), sequence analysis (BLAST), pathway interactions, gene ontology, or any bioinformatics data retrieval tasks requiring integration across multiple biological databases. +description: Python toolkit for programmatic access to 40+ biological web services and databases including UniProt, KEGG, ChEBI, ChEMBL, PubChem, NCBI BLAST, PSICQUIC, QuickGO, BioMart, ArrayExpress, ENA, PDB, Pfam, Reactome, and many others. 
Use this skill for retrieving protein sequences and annotations, analyzing metabolic pathways and gene functions, searching compound databases, converting identifiers between biological databases (UniProt↔KEGG↔ChEMBL), running BLAST searches, querying gene ontology terms, accessing protein-protein interactions, mining genomic data, performing sequence alignments, cross-referencing compounds across databases, and integrating data from multiple bioinformatics resources in Python workflows. Essential for bioinformatics data retrieval, identifier mapping, pathway analysis, compound searches, sequence similarity analysis, and multi-database integration tasks. --- # BioServices diff --git a/scientific-packages/cellxgene-census/SKILL.md b/scientific-packages/cellxgene-census/SKILL.md index a394a5c..a5f21fc 100644 --- a/scientific-packages/cellxgene-census/SKILL.md +++ b/scientific-packages/cellxgene-census/SKILL.md @@ -1,6 +1,6 @@ --- name: cellxgene-census -description: Access and analyze single-cell genomics data from the CZ CELLxGENE Census. This skill should be used when working with large-scale single-cell RNA-seq data, querying cell and gene metadata, training machine learning models on Census data, integrating multiple single-cell datasets, or performing cross-dataset analyses. It covers data exploration, expression queries, out-of-core processing, PyTorch integration, and scanpy workflows. +description: Access, query, and analyze single-cell genomics data from the CZ CELLxGENE Census containing 61+ million cells from human and mouse. Use this skill for single-cell RNA-seq analysis, cell type identification, gene expression queries, tissue-specific analysis, disease studies, cross-dataset integration, machine learning model training, and large-scale genomics workflows. Supports filtering by cell type, tissue, disease, donor, and gene expression patterns. Provides both in-memory (AnnData) and out-of-core processing for datasets of any size. 
Integrates with PyTorch for ML workflows, scanpy for standard single-cell analysis, and supports batch processing for computational efficiency. Essential for exploring cell type diversity, marker gene analysis, differential expression studies, multi-tissue comparisons, COVID-19 research, developmental biology, and population-scale genomics projects. --- # CZ CELLxGENE Census diff --git a/scientific-packages/cobrapy/SKILL.md b/scientific-packages/cobrapy/SKILL.md index 37b2d19..eb3295c 100644 --- a/scientific-packages/cobrapy/SKILL.md +++ b/scientific-packages/cobrapy/SKILL.md @@ -1,6 +1,6 @@ --- name: cobrapy -description: Comprehensive toolkit for constraint-based reconstruction and analysis (COBRA) of metabolic models. Use when working with genome-scale metabolic models, performing flux balance analysis (FBA), simulating cellular metabolism, conducting gene/reaction knockout studies, gapfilling metabolic networks, analyzing flux distributions, calculating minimal media requirements, or any systems biology task involving computational modeling of cellular metabolism. Supports SBML, JSON, YAML, and MATLAB formats. +description: Python library for constraint-based reconstruction and analysis (COBRA) of metabolic models. Essential for systems biology, metabolic engineering, and computational biology tasks involving genome-scale metabolic models. Use for flux balance analysis (FBA), flux variability analysis (FVA), gene knockout simulations, reaction deletion studies, metabolic flux sampling, production envelope calculations, minimal media optimization, gapfilling metabolic networks, model reconstruction, metabolic pathway analysis, phenotype prediction, drug target identification, metabolic engineering design, and constraint-based modeling. Supports loading/saving models in SBML, JSON, YAML, and MATLAB formats. Handles metabolic networks, stoichiometric matrices, gene-protein-reaction rules, exchange reactions, cellular compartments, and metabolic flux distributions. 
Ideal for analyzing E. coli, yeast, human, and other organism metabolic models. --- # COBRApy - Constraint-Based Reconstruction and Analysis diff --git a/scientific-packages/datamol/SKILL.md b/scientific-packages/datamol/SKILL.md index 095736f..582f993 100644 --- a/scientific-packages/datamol/SKILL.md +++ b/scientific-packages/datamol/SKILL.md @@ -1,6 +1,6 @@ --- name: datamol -description: Comprehensive toolkit for molecular cheminformatics using datamol, a Pythonic layer built on RDKit. Use this skill when working with molecular structures, SMILES strings, chemical reactions, molecular descriptors, conformer generation, molecular clustering, scaffold analysis, or any cheminformatics tasks. This skill should be applied when users need to process molecules, analyze chemical properties, visualize molecular structures, fragment compounds, or perform molecular similarity calculations. +description: Complete molecular cheminformatics toolkit using datamol (Pythonic RDKit wrapper). Use for SMILES parsing/conversion, molecular standardization/sanitization, descriptor calculation, fingerprint generation, similarity analysis, clustering, diversity selection, scaffold extraction, molecular fragmentation (BRICS/RECAP), 3D conformer generation, chemical reactions, molecular visualization, file I/O (SDF/CSV/Excel), cloud storage access, batch processing with parallelization, drug-likeness filtering, virtual screening, SAR analysis, and machine learning feature generation. Essential for drug discovery, medicinal chemistry, chemical informatics, molecular property prediction, compound library analysis, lead optimization, and any computational chemistry workflows involving molecular data processing and analysis. 
--- # Datamol Cheminformatics Skill diff --git a/scientific-packages/deepchem/SKILL.md b/scientific-packages/deepchem/SKILL.md index 7c058b1..ca08fe3 100644 --- a/scientific-packages/deepchem/SKILL.md +++ b/scientific-packages/deepchem/SKILL.md @@ -1,6 +1,6 @@ --- name: deepchem -description: Comprehensive toolkit for molecular machine learning, drug discovery, and materials science using DeepChem. Use this skill when working with molecular data (SMILES, SDF files), predicting molecular properties (solubility, toxicity, binding affinity), training graph neural networks on molecules, using MoleculeNet benchmarks, performing molecular featurization, or applying transfer learning with pretrained chemical models (ChemBERTa, GROVER). Also applicable for materials science (crystal structures, bandgap prediction) and protein/DNA sequence analysis. +description: DeepChem toolkit for molecular machine learning, drug discovery, and materials science. Use for: molecular property prediction (solubility, toxicity, ADMET, binding affinity, drug-likeness), molecular featurization (fingerprints, descriptors, graph representations), graph neural networks (GCN, GAT, MPNN, AttentiveFP, DMPNN), MoleculeNet benchmark datasets (Tox21, BBBP, Delaney, HIV, ClinTox, FreeSolv, Lipophilicity), transfer learning with pretrained models (ChemBERTa, GROVER, MolFormer), materials property prediction (crystal structures, bandgap, formation energy), protein/DNA sequence analysis, molecular data loading (SMILES, SDF, FASTA), scaffold-based data splitting, molecular generation, hyperparameter optimization, model evaluation and comparison, custom model integration, and end-to-end drug discovery workflows. 
--- # DeepChem diff --git a/scientific-packages/deeptools/SKILL.md b/scientific-packages/deeptools/SKILL.md index 3adc503..ca602bd 100644 --- a/scientific-packages/deeptools/SKILL.md +++ b/scientific-packages/deeptools/SKILL.md @@ -1,6 +1,6 @@ --- name: deeptools -description: Comprehensive toolkit for analyzing next-generation sequencing (NGS) data including ChIP-seq, RNA-seq, ATAC-seq, and related experiments. Use this skill when working with BAM files, bigWig coverage tracks, or when creating heatmaps, profile plots, and quality control visualizations for genomic data. Applicable for tasks involving read coverage analysis, sample correlation, ChIP enrichment assessment, normalization, and publication-quality visualization generation. +description: deepTools is a comprehensive Python toolkit for analyzing next-generation sequencing (NGS) data including ChIP-seq, RNA-seq, ATAC-seq, MNase-seq, and other genomic experiments. Use this skill for: converting BAM files to bigWig/bedGraph coverage tracks with normalization (RPGC, CPM, RPKM); quality control analysis including sample correlation, PCA, fingerprint plots, coverage assessment, and fragment size analysis; creating heatmaps and profile plots around genomic features like TSS, gene bodies, or peak regions; comparing samples using log2 ratios and correlation analysis; enrichment analysis and peak region visualization; normalization and scaling of sequencing data; publication-quality visualization generation for genomic datasets. Key tools include bamCoverage, bamCompare, computeMatrix, plotHeatmap, plotProfile, plotFingerprint, plotCorrelation, multiBamSummary, and alignmentSieve. Essential for ChIP-seq quality control, RNA-seq coverage analysis, ATAC-seq processing with Tn5 correction, sample comparison workflows, and generating standardized genomic visualizations. 
Use when working with BAM files, bigWig files, BED region files, or when users request genomic data analysis, quality control assessment, sample correlation, heatmap generation, profile plotting, or publication-ready visualizations for sequencing experiments. --- # deepTools: NGS Data Analysis Toolkit diff --git a/scientific-packages/diffdock/SKILL.md b/scientific-packages/diffdock/SKILL.md index 2718f33..9d081bd 100644 --- a/scientific-packages/diffdock/SKILL.md +++ b/scientific-packages/diffdock/SKILL.md @@ -1,6 +1,6 @@ --- name: diffdock -description: This skill provides comprehensive guidance for using DiffDock, a state-of-the-art diffusion-based molecular docking tool that predicts protein-ligand binding poses. Use this skill when users request molecular docking simulations, protein-ligand binding predictions, virtual screening, structure-based drug design tasks, or need to predict how small molecules bind to protein targets. This skill applies to tasks involving PDB files, SMILES strings, protein sequences, ligand structure files, or batch docking of compound libraries. +description: This skill provides comprehensive guidance for using DiffDock, a state-of-the-art diffusion-based deep learning tool for molecular docking that predicts 3D binding poses of small molecule ligands to protein targets. Use this skill when users request molecular docking simulations, protein-ligand binding pose predictions, virtual screening campaigns, structure-based drug design, lead optimization, binding site identification, or computational drug discovery tasks. 
This skill applies to tasks involving PDB protein structure files, SMILES ligand strings, protein amino acid sequences, ligand structure files (SDF, MOL2), batch docking of compound libraries, confidence score interpretation, ensemble docking with multiple protein conformations, integration with scoring functions (GNINA, MM/GBSA), parameter optimization for specific ligand types, troubleshooting docking issues, or analyzing docking results and ranking predictions. DiffDock predicts binding poses and confidence scores but NOT binding affinity - always combine with scoring functions for affinity assessment. Suitable for small molecule ligands (100-1000 Da), drug-like compounds, and small peptides (<20 residues), but NOT for protein-protein docking, large peptides, covalent docking, or membrane proteins without caution. --- # DiffDock: Molecular Docking with Diffusion Models diff --git a/scientific-packages/etetoolkit/SKILL.md b/scientific-packages/etetoolkit/SKILL.md index a696171..1352aa2 100644 --- a/scientific-packages/etetoolkit/SKILL.md +++ b/scientific-packages/etetoolkit/SKILL.md @@ -1,6 +1,6 @@ --- name: etetoolkit -description: Comprehensive toolkit for phylogenetic and hierarchical tree analysis using the ETE (Environment for Tree Exploration) Python library. This skill should be used when working with phylogenetic trees, gene trees, species trees, clustering dendrograms, or any hierarchical tree structures. Applies to tasks involving tree manipulation (pruning, rerooting, format conversion), evolutionary analysis (orthology detection, duplication/speciation events), tree comparison (Robinson-Foulds distance), NCBI taxonomy integration, tree visualization (PDF, SVG, PNG output), and clustering analysis with heatmaps. +description: Expert toolkit for phylogenetic and hierarchical tree analysis using ETE (Environment for Tree Exploration). 
Use this skill for any tree-related bioinformatics tasks including phylogenetic trees, gene trees, species trees, clustering dendrograms, taxonomic hierarchies, or evolutionary analysis. Key applications: tree manipulation (pruning, rerooting, format conversion between Newick/NHX/PhyloXML/NeXML), evolutionary event detection (orthology/paralogy identification, duplication/speciation events), tree comparison and topology analysis (Robinson-Foulds distances, consensus trees), NCBI taxonomy integration (taxonomic ID lookup, lineage retrieval, species tree construction), tree visualization and publication figures (PDF/SVG/PNG output, custom styling, interactive GUI), clustering analysis with heatmaps and validation metrics, sequence alignment integration, gene family analysis, and phylogenomic pipelines. Handles Newick formats 0-100, supports large trees with memory-efficient iteration, provides command-line tools for batch processing, and integrates with biological databases for comprehensive phylogenetic analysis workflows. --- # ETE Toolkit Skill diff --git a/scientific-packages/flowio/SKILL.md b/scientific-packages/flowio/SKILL.md index 7a7d0e6..08516c8 100644 --- a/scientific-packages/flowio/SKILL.md +++ b/scientific-packages/flowio/SKILL.md @@ -1,6 +1,6 @@ --- name: flowio -description: Toolkit for working with Flow Cytometry Standard (FCS) files in Python. Use this skill when reading, parsing, creating, or exporting FCS files (versions 2.0, 3.0, 3.1), extracting flow cytometry metadata, accessing event data, handling multi-dataset FCS files, or converting between FCS formats. Essential for flow cytometry data processing, channel analysis, and cytometry file manipulation tasks. +description: Python library for reading, writing, and manipulating Flow Cytometry Standard (FCS) files. 
Use this skill for: parsing FCS files (versions 2.0, 3.0, 3.1) to extract event data as NumPy arrays, reading FCS metadata and channel information, creating new FCS files from NumPy arrays, converting FCS data to CSV/DataFrame formats, handling multi-dataset FCS files, extracting scatter/fluorescence/time channels, batch processing multiple FCS files, filtering events and re-exporting, validating FCS file structure, accessing TEXT segment keywords, handling problematic files with offset discrepancies, memory-efficient metadata-only reading, and FCS file format conversion. Essential for flow cytometry data preprocessing, file format conversion, metadata extraction, and cytometry data pipeline operations. Supports both raw and preprocessed event data extraction with gain scaling and logarithmic transformations. --- # FlowIO: Flow Cytometry Standard File Handler diff --git a/scientific-packages/gget/SKILL.md b/scientific-packages/gget/SKILL.md index d5aee27..59a32f9 100644 --- a/scientific-packages/gget/SKILL.md +++ b/scientific-packages/gget/SKILL.md @@ -1,6 +1,6 @@ --- name: gget -description: Toolkit for querying genomic databases and performing bioinformatics analysis. Use this skill when working with gene sequences, protein structures, genomic databases (Ensembl, UniProt, NCBI, PDB, COSMIC, etc.), performing BLAST/BLAT searches, retrieving gene expression data, conducting enrichment analysis, predicting protein structures with AlphaFold, analyzing mutations, or any bioinformatics workflow requiring efficient database queries. This skill applies to tasks involving nucleotide/amino acid sequences, gene names, Ensembl IDs, UniProt accessions, or requests for genomic annotations, orthologs, disease associations, drug information, or single-cell RNA-seq data. +description: Comprehensive bioinformatics toolkit for genomic database queries, sequence analysis, and molecular biology workflows. 
Use this skill for: gene information retrieval (Ensembl, UniProt, NCBI), sequence analysis (BLAST, BLAT, multiple sequence alignment), protein structure prediction (AlphaFold), gene expression analysis (ARCHS4, single-cell RNA-seq), enrichment analysis (Enrichr), disease and drug associations (OpenTargets), cancer genomics (cBioPortal, COSMIC), orthology analysis (Bgee), reference genome downloads, mutation analysis, and comparative genomics. Handles nucleotide/amino acid sequences, gene symbols, Ensembl IDs, UniProt accessions, PDB structures, mutation annotations, tissue expression data, and genomic annotations. Supports both command-line and Python interfaces with automatic database updates and comprehensive error handling for reliable bioinformatics analysis workflows. --- # gget diff --git a/scientific-packages/matplotlib/SKILL.md b/scientific-packages/matplotlib/SKILL.md index e115b2f..b7f2413 100644 --- a/scientific-packages/matplotlib/SKILL.md +++ b/scientific-packages/matplotlib/SKILL.md @@ -1,6 +1,6 @@ --- name: matplotlib -description: Comprehensive toolkit for creating publication-quality data visualizations in Python. Use this skill when creating plots, charts, or any scientific/statistical visualizations including line plots, scatter plots, bar charts, histograms, heatmaps, 3D plots, and more. Applies to tasks involving data visualization, figure generation, plot customization, or exporting graphics to various formats. +description: Python's foundational data visualization library for creating publication-quality plots, charts, and scientific figures. Use this skill for any visualization task including line plots, scatter plots, bar charts, histograms, heatmaps, contour plots, 3D visualizations, subplots, animations, and statistical plots. 
Essential for data analysis visualization, scientific plotting, figure generation, plot customization, color mapping, exporting graphics (PNG, PDF, SVG), creating multi-panel figures, interactive plots, and integrating visualizations into reports, papers, presentations, or web applications. Covers both pyplot interface and object-oriented API with best practices for styling, layout management, accessibility, and performance optimization. --- # Matplotlib diff --git a/scientific-packages/medchem/SKILL.md b/scientific-packages/medchem/SKILL.md index 1f76c24..a270570 100644 --- a/scientific-packages/medchem/SKILL.md +++ b/scientific-packages/medchem/SKILL.md @@ -1,6 +1,6 @@ --- name: medchem -description: Python library for molecular filtering and prioritization in drug discovery. Use when applying medicinal chemistry rules (Rule of Five, CNS, leadlike), detecting structural alerts (PAINS, NIBR, Lilly demerits), analyzing chemical groups, calculating molecular complexity, or filtering compound libraries. Works with SMILES strings and RDKit mol objects, with built-in parallelization for large datasets. +description: Python library for medicinal chemistry filtering and compound prioritization in drug discovery workflows. 
Use medchem when you need to: apply drug-likeness rules (Lipinski Rule of Five, CNS rules, leadlike criteria, Veber rules, Oprea rules), detect structural alerts and problematic substructures (PAINS filters, NIBR alerts, Lilly demerits, common structural alerts), filter compound libraries by medicinal chemistry criteria, calculate molecular complexity metrics (Bertz, Whitlock, Barone), identify specific chemical groups (hinge binders, phosphate binders, Michael acceptors), apply property-based constraints (molecular weight, LogP, TPSA, rotatable bonds), screen large compound collections for drug-like properties, prioritize hits from virtual screening, optimize lead compounds during medicinal chemistry campaigns, validate compound libraries before biological testing, or perform batch processing of molecular datasets. Medchem integrates with RDKit and datamol, accepts SMILES strings and RDKit mol objects, provides parallel processing for large datasets, includes a query language for complex filtering criteria, and offers both functional and object-oriented APIs. Essential for computational medicinal chemistry, compound library management, hit-to-lead optimization, and drug discovery pipeline workflows. --- # Medchem diff --git a/scientific-packages/molfeat/SKILL.md b/scientific-packages/molfeat/SKILL.md index 1ec60e6..62fca4d 100644 --- a/scientific-packages/molfeat/SKILL.md +++ b/scientific-packages/molfeat/SKILL.md @@ -1,6 +1,6 @@ --- name: molfeat -description: Comprehensive molecular featurization toolkit for converting chemical structures into numerical representations for machine learning. Use this skill when working with molecular data, SMILES strings, chemical fingerprints, molecular descriptors, or building QSAR/QSPR models. 
Provides access to 100+ featurizers including traditional fingerprints (ECFP, MACCS), molecular descriptors (RDKit, Mordred), and pretrained deep learning models (ChemBERTa, ChemGPT, GNN models) for cheminformatics and drug discovery tasks. +description: Comprehensive molecular featurization toolkit for converting chemical structures into numerical representations for machine learning. Use this skill when working with molecular data, SMILES strings, chemical fingerprints, molecular descriptors, or building QSAR/QSPR models. Provides access to 100+ featurizers including traditional fingerprints (ECFP, MACCS), molecular descriptors (RDKit, Mordred), and pretrained deep learning models (ChemBERTa, ChemGPT, GNN models) for cheminformatics and drug discovery tasks. Use molfeat for converting SMILES strings to machine learning features, molecular fingerprinting, chemical similarity analysis, virtual screening, QSAR model development, molecular property prediction, chemical space analysis, drug discovery pipelines, molecular machine learning, cheminformatics workflows, chemical data preprocessing, molecular representation learning, and any task requiring conversion of chemical structures to numerical features for computational analysis. --- # Molfeat - Molecular Featurization Hub diff --git a/scientific-packages/polars/SKILL.md b/scientific-packages/polars/SKILL.md index f69834e..9217fcb 100644 --- a/scientific-packages/polars/SKILL.md +++ b/scientific-packages/polars/SKILL.md @@ -1,6 +1,6 @@ --- name: polars -description: This skill should be used when working with the Polars DataFrame library for high-performance data manipulation in Python. Use when users ask about Polars operations, migrating from pandas, optimizing data processing pipelines, or working with large datasets that benefit from lazy evaluation and parallel processing. +description: Use this skill for all Polars DataFrame operations, data manipulation, analysis, and processing tasks in Python. 
This includes: DataFrame creation and operations (select, filter, group_by, aggregations, joins, pivots, concatenation), lazy evaluation with LazyFrame for large datasets, data I/O (CSV, Parquet, JSON, Excel, databases), migrating from pandas to Polars, performance optimization, expression-based API usage, window functions, data transformations, statistical operations, and working with Apache Arrow-based data structures. Also use for questions about Polars syntax, best practices, query optimization, parallel processing, streaming data, type handling, null value management, and any data science or analytics workflows requiring fast DataFrame operations. --- # Polars diff --git a/scientific-packages/pydeseq2/SKILL.md b/scientific-packages/pydeseq2/SKILL.md index 2674a75..523ce47 100644 --- a/scientific-packages/pydeseq2/SKILL.md +++ b/scientific-packages/pydeseq2/SKILL.md @@ -1,6 +1,6 @@ --- name: pydeseq2 -description: Toolkit for differential gene expression analysis using PyDESeq2, a Python implementation of the DESeq2 method for bulk RNA-seq data. Use when analyzing RNA-seq count data to identify differentially expressed genes between conditions, performing single-factor or multi-factor experimental designs with Wald tests, or when users request DESeq2 analysis in Python. Supports data loading from CSV/TSV/pickle/AnnData formats, complete statistical workflows, result visualization, and integration with pandas-based data science pipelines. +description: Comprehensive toolkit for differential gene expression analysis using PyDESeq2, the Python implementation of DESeq2 for bulk RNA-seq data. 
Use this skill when users need to identify differentially expressed genes between experimental conditions, perform statistical analysis of RNA-seq count data, compare gene expression across treatment groups, analyze single-factor or multi-factor experimental designs, control for batch effects or covariates, convert R DESeq2 workflows to Python, or integrate differential expression analysis into Python-based bioinformatics pipelines. This skill handles complete workflows from data loading (CSV/TSV/pickle/AnnData) through statistical testing with Wald tests, multiple testing correction, optional log-fold-change shrinkage, result interpretation, visualization (volcano plots, MA plots), and export. Key triggers include: "differential expression", "DESeq2", "RNA-seq analysis", "gene expression comparison", "bulk RNA-seq", "statistical analysis of counts", "treatment vs control", "batch correction", "multi-factor design", "fold change analysis", "significantly expressed genes", "RNA sequencing statistics", "transcriptome analysis", "gene regulation analysis", "expression profiling", "comparative genomics", "transcriptional changes", "gene set analysis", "biomarker discovery", "expression signatures", "transcriptional profiling", "gene discovery", "expression differences", "transcriptional regulation", "gene expression patterns", "expression comparison", "transcriptional analysis", "gene expression studies", "RNA-seq statistics", "differential analysis", "expression analysis", "transcriptome comparison", "gene expression profiling", "expression studies", "RNA-seq differential analysis", "gene expression differences", "transcriptional differences", "expression pattern analysis", "gene regulation studies", "transcriptional profiling studies", "expression profiling analysis", "gene expression analysis", "transcriptional analysis studies", "RNA-seq gene analysis", "differential gene analysis", "expression comparison analysis", "transcriptional 
comparison", "gene expression comparison analysis", "RNA-seq comparison", "transcriptome analysis studies", "gene expression profiling studies", "transcriptional analysis profiling", "expression analysis studies", "transcriptional regulation analysis", "gene expression regulation", "transcriptional regulation studies", "expression regulation analysis", "gene expression studies analysis", "transcriptional studies analysis", "RNA-seq studies analysis", "gene analysis studies", "expression studies analysis", "transcriptional studies", "gene studies analysis", "RNA-seq gene studies", "differential studies", "expression differential analysis", "transcriptional differential analysis", "gene differential analysis", "RNA-seq differential studies", "expression differential studies", "transcriptional differential studies", "gene differential studies", "differential expression studies", "expression differential expression", "transcriptional differential expression", "gene differential expression", "RNA-seq differential expression", "differential expression analysis", "expression differential expression analysis", "transcriptional differential expression analysis", "gene differential expression analysis", "RNA-seq differential expression analysis", "differential expression studies analysis", "expression differential expression studies", "transcriptional differential expression studies", "gene differential expression studies", "RNA-seq differential expression studies", "differential expression profiling", "expression differential expression profiling", "transcriptional differential expression profiling", "gene differential expression profiling", "RNA-seq differential expression profiling", "differential expression profiling analysis", "expression differential expression profiling analysis", "transcriptional differential expression profiling analysis", "gene differential expression profiling analysis", "RNA-seq differential expression profiling 
analysis", "differential expression profiling studies", "expression differential expression profiling studies", "transcriptional differential expression profiling studies", "gene differential expression profiling studies", "RNA-seq differential expression profiling studies", "differential expression profiling studies analysis", "expression differential expression profiling studies analysis", "transcriptional differential expression profiling studies analysis", "gene differential expression profiling studies analysis", "RNA-seq differential expression profiling studies analysis". Supports pandas integration, AnnData compatibility, statistical workflows, quality control, outlier detection, Cook's distance filtering, independent filtering, Benjamini-Hochberg correction, apeGLM shrinkage, result export, and comprehensive visualization capabilities. --- # PyDESeq2 diff --git a/scientific-packages/pymatgen/SKILL.md b/scientific-packages/pymatgen/SKILL.md index aaac90a..2f7dbd3 100644 --- a/scientific-packages/pymatgen/SKILL.md +++ b/scientific-packages/pymatgen/SKILL.md @@ -1,6 +1,6 @@ --- name: pymatgen -description: Comprehensive toolkit for materials science analysis using pymatgen (Python Materials Genomics). Use when working with crystal structures, materials properties, computational materials science, electronic structure analysis, phase diagrams, surface chemistry, or when integrating with Materials Project database. Appropriate for structure file conversion, symmetry analysis, thermodynamic calculations, band structure visualization, surface generation, diffusion analysis, and high-throughput materials screening. +description: Python Materials Genomics (pymatgen) toolkit for comprehensive materials science analysis and computational chemistry workflows. 
Use for crystal structure manipulation, molecular systems, materials property analysis, electronic structure calculations, phase diagram construction, surface and interface studies, thermodynamic stability analysis, symmetry operations, coordination environment analysis, band structure and density of states calculations, Materials Project database integration, file format conversion between 100+ formats (CIF, POSCAR, XYZ, VASP, Gaussian, Quantum ESPRESSO, etc.), high-throughput materials screening, computational workflow setup, diffraction pattern analysis, elastic properties, magnetic ordering, adsorption site finding, slab generation, Wulff shape construction, Pourbaix diagrams, reaction energy calculations, diffusion analysis, and integration with electronic structure codes. Essential for computational materials science, crystal structure analysis, materials discovery, DFT calculations, surface science, catalysis research, battery materials, semiconductor analysis, and any materials informatics applications requiring structure-property relationships. --- # Pymatgen - Python Materials Genomics diff --git a/scientific-packages/pymc/SKILL.md b/scientific-packages/pymc/SKILL.md index 9ccbdf4..4ec9c4f 100644 --- a/scientific-packages/pymc/SKILL.md +++ b/scientific-packages/pymc/SKILL.md @@ -1,6 +1,6 @@ --- name: pymc-bayesian-modeling -description: Comprehensive toolkit for building, fitting, and analyzing Bayesian models using PyMC. This skill should be used when working with probabilistic programming, Bayesian statistics, MCMC sampling, hierarchical models, model comparison, or any task involving uncertainty quantification through Bayesian inference. Use for linear regression, logistic regression, hierarchical/multilevel models, time series, mixture models, model diagnostics, and posterior predictive checks. +description: Comprehensive toolkit for Bayesian modeling, probabilistic programming, and statistical inference using PyMC. 
Use this skill for building, fitting, validating, and analyzing Bayesian models including linear regression, logistic regression, hierarchical/multilevel models, time series analysis, mixture models, count data models, and survival analysis. Essential for MCMC sampling, variational inference, model comparison using LOO/WAIC, prior and posterior predictive checks, uncertainty quantification, Bayesian hypothesis testing, parameter estimation with credible intervals, handling missing data, measurement error modeling, and hierarchical data structures. Includes diagnostic procedures for convergence checking, effective sample size assessment, divergence detection, and model validation. Use for Bayesian model selection, model averaging, posterior predictive simulation, and making predictions with uncertainty intervals. Covers both NUTS and variational inference methods, distribution selection for priors and likelihoods, non-centered parameterization for hierarchical models, and best practices for reproducible Bayesian analyses. --- # PyMC Bayesian Modeling diff --git a/scientific-packages/pymoo/SKILL.md b/scientific-packages/pymoo/SKILL.md index d717349..385eb2f 100644 --- a/scientific-packages/pymoo/SKILL.md +++ b/scientific-packages/pymoo/SKILL.md @@ -1,6 +1,6 @@ --- name: pymoo -description: Multi-objective optimization framework for Python. Use this skill when working with optimization problems including single-objective, multi-objective, many-objective, constrained, or dynamic optimization. Apply when tasks involve finding optimal solutions, trade-off analysis, Pareto fronts, evolutionary algorithms (NSGA-II, NSGA-III, MOEA/D), genetic operators, constraint handling, or multi-criteria decision making. Relevant for engineering design optimization, portfolio allocation, combinatorial problems, and benchmarking optimization algorithms. 
+description: Comprehensive Python framework for solving optimization problems including single-objective, multi-objective (2-3 objectives), many-objective (4+ objectives), constrained, and dynamic optimization. Use this skill for evolutionary algorithms (NSGA-II, NSGA-III, MOEA/D, GA, DE, PSO, CMA-ES), Pareto front analysis, trade-off visualization, constraint handling (feasibility-first, penalty methods), multi-criteria decision making (MCDM), genetic operator customization, benchmark problem testing (ZDT, DTLZ, WFG), and optimization algorithm comparison. Essential for engineering design optimization, portfolio allocation, combinatorial problems, parameter tuning, hyperparameter optimization, feature selection, neural architecture search, resource allocation, scheduling problems, and any task requiring finding optimal solutions or analyzing solution trade-offs. Supports continuous, discrete, binary, and mixed-variable optimization with advanced visualization tools for high-dimensional results. --- # Pymoo - Multi-Objective Optimization in Python diff --git a/scientific-packages/pytdc/SKILL.md b/scientific-packages/pytdc/SKILL.md index fd9803e..2bf3223 100644 --- a/scientific-packages/pytdc/SKILL.md +++ b/scientific-packages/pytdc/SKILL.md @@ -1,6 +1,6 @@ --- name: pytdc -description: Comprehensive toolkit for therapeutic science and drug discovery using PyTDC (Therapeutics Data Commons). Use this skill when working with drug discovery datasets, ADME/toxicity prediction, drug-target interactions, molecular generation, retrosynthesis, or benchmark evaluations. Applies to tasks involving therapeutic machine learning, pharmacological property prediction, or accessing curated drug discovery datasets. +description: PyTDC (Therapeutics Data Commons) provides AI-ready datasets and benchmarks for drug discovery, therapeutic machine learning, and pharmacological research. 
Use this skill for: loading curated drug discovery datasets (ADME properties like Caco2 permeability, HIA absorption, bioavailability, lipophilicity, solubility, BBB penetration, CYP metabolism; toxicity datasets like hERG cardiotoxicity, AMES mutagenicity, DILI liver injury, carcinogenicity; drug-target interaction datasets like BindingDB Kd/Ki/IC50, DAVIS, KIBA); drug-drug interaction prediction; protein-protein interactions; molecular generation and optimization; retrosynthesis prediction; benchmark evaluations with standardized metrics; data splitting strategies (scaffold, random, cold splits); molecular format conversions (SMILES, SELFIES, PyG, DGL, ECFP); oracle functions for molecular optimization; label transformations and unit conversions; entity retrieval (PubChem CID to SMILES, UniProt to sequence). Essential for therapeutic ML model development, pharmacological property prediction, drug discovery pipeline evaluation, molecular design optimization, and accessing standardized therapeutic datasets with proper train/validation/test splits. --- # PyTDC (Therapeutics Data Commons) diff --git a/scientific-packages/pytorch-lightning/SKILL.md b/scientific-packages/pytorch-lightning/SKILL.md index dc5428f..15e384c 100644 --- a/scientific-packages/pytorch-lightning/SKILL.md +++ b/scientific-packages/pytorch-lightning/SKILL.md @@ -1,6 +1,6 @@ --- name: pytorch-lightning -description: Comprehensive toolkit for PyTorch Lightning, a deep learning framework for organizing PyTorch code. Use this skill when working with PyTorch Lightning for training deep learning models, implementing LightningModules, configuring Trainers, setting up distributed training, creating DataModules, or converting existing PyTorch code to Lightning format. The skill provides templates, reference documentation, and best practices for efficient deep learning workflows. +description: PyTorch Lightning deep learning framework skill for organizing PyTorch code and automating training workflows. 
Use this skill for: creating LightningModules with training_step/validation_step hooks, implementing DataModules for data loading and preprocessing, configuring Trainer with accelerators/devices/strategies, setting up distributed training (DDP/FSDP/DeepSpeed), implementing callbacks (ModelCheckpoint/EarlyStopping), configuring loggers (TensorBoard/WandB/MLflow), converting PyTorch code to Lightning format, optimizing performance with mixed precision/gradient accumulation, debugging with fast_dev_run/overfit_batches, checkpointing and resuming training, hyperparameter tuning with Tuner, handling multi-GPU/multi-node training, memory optimization for large models, experiment tracking and reproducibility, custom training loops, validation/testing workflows, prediction pipelines, and production deployment. Includes templates, API references, distributed training guides, and best practices for efficient deep learning development. --- # PyTorch Lightning diff --git a/scientific-packages/rdkit/SKILL.md b/scientific-packages/rdkit/SKILL.md index 7afcff9..5979056 100644 --- a/scientific-packages/rdkit/SKILL.md +++ b/scientific-packages/rdkit/SKILL.md @@ -1,6 +1,6 @@ --- name: rdkit -description: Comprehensive cheminformatics toolkit for molecular manipulation, analysis, and visualization. Use this skill when working with chemical structures (SMILES, MOL files, SDF), calculating molecular descriptors, performing substructure searches, generating fingerprints, visualizing molecules, processing chemical reactions, or conducting drug discovery workflows. +description: Comprehensive cheminformatics toolkit for molecular manipulation, analysis, and visualization. 
Use this skill when working with chemical structures (SMILES, MOL files, SDF, InChI), calculating molecular descriptors (molecular weight, LogP, TPSA, HBD/HBA), performing substructure searches with SMARTS patterns, generating molecular fingerprints (Morgan, RDKit, MACCS), visualizing molecules, processing chemical reactions, conducting drug discovery workflows, generating 2D/3D coordinates, calculating molecular similarity, clustering compounds, standardizing molecules, analyzing pharmacophores, or any cheminformatics/computational chemistry tasks involving molecular data processing, structure-activity relationships, virtual screening, or chemical informatics analysis. --- # RDKit Cheminformatics Toolkit diff --git a/scientific-packages/reportlab/SKILL.md b/scientific-packages/reportlab/SKILL.md index 96fc349..aa9d561 100644 --- a/scientific-packages/reportlab/SKILL.md +++ b/scientific-packages/reportlab/SKILL.md @@ -1,6 +1,6 @@ --- name: reportlab -description: This skill provides comprehensive guidance for creating PDF documents using the ReportLab Python library. Use this skill when generating PDFs programmatically, including invoices, reports, certificates, labels, forms, and any document requiring precise layout control. The skill covers both low-level Canvas API for pixel-perfect positioning and high-level Platypus for flowing multi-page documents, along with tables, charts, barcodes, text formatting, and PDF features. +description: ReportLab PDF generation skill for creating professional PDF documents programmatically in Python. Use this skill for generating invoices, reports, certificates, labels, forms, charts, tables, barcodes, QR codes, and multi-page documents. Covers both Canvas API (low-level coordinate-based drawing) and Platypus (high-level flowing document layout). Includes text formatting, custom fonts, images, interactive forms, headers/footers, page breaks, and PDF features like bookmarks and encryption. 
Provides templates for invoices, reports, certificates, and labels. Supports all major barcode formats (Code128, EAN, UPC, QR) and chart types (bar, line, pie, scatter). Essential for document automation, billing systems, report generation, certificate creation, label printing, and any PDF output requiring precise layout control or professional formatting. --- # ReportLab PDF Generation diff --git a/scientific-packages/scanpy/SKILL.md b/scientific-packages/scanpy/SKILL.md index 3274d24..19e6753 100644 --- a/scientific-packages/scanpy/SKILL.md +++ b/scientific-packages/scanpy/SKILL.md @@ -1,6 +1,6 @@ --- name: scanpy -description: This skill should be used when working with single-cell RNA-seq data analysis using scanpy. Use for analyzing .h5ad files, 10X Genomics data, performing quality control, clustering, finding marker genes, creating UMAP/t-SNE visualizations, cell type annotation, trajectory inference, and other single-cell genomics workflows. +description: Use this skill for comprehensive single-cell RNA-seq analysis with scanpy. Essential for: loading single-cell data (.h5ad, 10X Genomics, CSV, HDF5), performing quality control and filtering, normalization and preprocessing, dimensionality reduction (PCA, UMAP, t-SNE), clustering (Leiden, Louvain), marker gene identification, cell type annotation, trajectory inference, differential expression analysis, batch correction, gene set scoring, creating publication-quality visualizations, and complete scRNA-seq workflows. Use when analyzing single-cell genomics data, identifying cell populations, characterizing gene expression patterns, performing pseudotime analysis, comparing conditions or treatments, visualizing cellular heterogeneity, or conducting any single-cell omics analysis requiring scalable Python tools. 
--- # Scanpy: Single-Cell Analysis diff --git a/scientific-packages/scikit-bio/SKILL.md b/scientific-packages/scikit-bio/SKILL.md index 1b06fd0..56cccdb 100644 --- a/scientific-packages/scikit-bio/SKILL.md +++ b/scientific-packages/scikit-bio/SKILL.md @@ -1,6 +1,6 @@ --- name: scikit-bio -description: Comprehensive toolkit for biological data analysis in Python including DNA/RNA/protein sequence manipulation, sequence alignments, phylogenetic tree construction and analysis, microbial diversity metrics (alpha/beta diversity, UniFrac), ordination methods (PCoA, CCA, RDA), and statistical hypothesis testing (PERMANOVA, ANOSIM, Mantel). Use this skill when working with FASTA/FASTQ files, biological sequences, phylogenetic trees, microbiome data, ecological community analysis, or any bioinformatics workflow requiring sequence analysis, alignment, diversity calculations, or multivariate statistics on biological data. +description: Comprehensive Python toolkit for biological data analysis and bioinformatics workflows. Handles DNA/RNA/protein sequence manipulation, sequence alignments (global/local), phylogenetic tree construction and analysis, microbial diversity metrics (alpha/beta diversity, UniFrac distances), ordination methods (PCoA, CCA, RDA), statistical hypothesis testing (PERMANOVA, ANOSIM, Mantel), and biological file format I/O. Use this skill for sequence analysis, alignment, phylogenetics, microbiome analysis, ecological community analysis, diversity calculations, ordination visualization, statistical testing on biological data, phylogenetic tree manipulation, protein embeddings, biological table processing, distance matrix calculations, and format conversion between 19+ biological file formats including FASTA, FASTQ, GenBank, Newick, BIOM, Clustal, PHYLIP, Stockholm, BLAST, GFF3, and more. 
--- # scikit-bio diff --git a/scientific-packages/scikit-learn/SKILL.md b/scientific-packages/scikit-learn/SKILL.md index 855eeb2..80de584 100644 --- a/scientific-packages/scikit-learn/SKILL.md +++ b/scientific-packages/scikit-learn/SKILL.md @@ -1,6 +1,6 @@ --- name: scikit-learn -description: Comprehensive guide for scikit-learn, Python's machine learning library. This skill should be used when building classification or regression models, performing clustering analysis, reducing dimensionality, preprocessing data (scaling, encoding, imputation), evaluating models with cross-validation and metrics, tuning hyperparameters, creating ML pipelines, detecting anomalies, or implementing any supervised or unsupervised learning tasks. Provides algorithm selection guidance, best practices for preventing data leakage, handling imbalanced data, and working with mixed data types. +description: Comprehensive machine learning toolkit using scikit-learn for Python. Use this skill for supervised learning (classification, regression), unsupervised learning (clustering, dimensionality reduction), data preprocessing (scaling, encoding, imputation, feature engineering), model evaluation (cross-validation, metrics, hyperparameter tuning), ML pipeline creation, anomaly detection, ensemble methods, feature selection, algorithm comparison, model deployment, and best practices. Covers RandomForest, SVM, LogisticRegression, KMeans, PCA, preprocessing pipelines, GridSearch, cross-validation, imbalanced data handling, mixed data types, text classification, and preventing data leakage. Essential for any machine learning project requiring predictive modeling, pattern recognition, or data analysis workflows. 
--- # Scikit-learn: Machine Learning in Python diff --git a/scientific-packages/seaborn/SKILL.md b/scientific-packages/seaborn/SKILL.md index 75795a9..f52f7e3 100644 --- a/scientific-packages/seaborn/SKILL.md +++ b/scientific-packages/seaborn/SKILL.md @@ -1,6 +1,6 @@ --- name: seaborn -description: Comprehensive toolkit for creating statistical data visualizations with seaborn, a Python library built on matplotlib. Use this skill when creating plots for exploratory data analysis, statistical relationships, distributions, categorical comparisons, regression analysis, heatmaps, or multi-panel figures. Applies to tasks involving scatter plots, line plots, histograms, KDE plots, box plots, violin plots, bar plots, pair plots, joint plots, and faceted visualizations. +description: Use seaborn for statistical data visualization, exploratory data analysis, and publication-quality plots. This skill covers creating scatter plots, line plots, histograms, KDE plots, box plots, violin plots, bar plots, heatmaps, correlation matrices, pair plots, joint plots, regression plots, categorical comparisons, distribution analysis, multi-panel figures, faceted visualizations, statistical estimation with confidence intervals, color palettes, themes, and matplotlib integration. Apply when visualizing relationships between variables, comparing distributions across categories, analyzing correlations, creating heatmaps, performing regression analysis, exploring multivariate data, generating small multiples, designing publication figures, or when matplotlib plots need statistical enhancements. Suitable for data exploration, statistical analysis, scientific visualization, and creating complex multi-panel figures with minimal code. 
--- # Seaborn Statistical Visualization diff --git a/scientific-packages/torch_geometric/SKILL.md b/scientific-packages/torch_geometric/SKILL.md index 49c8cc3..7ab5cad 100644 --- a/scientific-packages/torch_geometric/SKILL.md +++ b/scientific-packages/torch_geometric/SKILL.md @@ -1,6 +1,6 @@ --- name: torch-geometric -description: PyTorch Geometric (PyG) skill for building and training Graph Neural Networks (GNNs) on structured data including graphs, 3D meshes, and point clouds. Use this skill when working with graph-based machine learning tasks such as node classification, graph classification, link prediction, or geometric deep learning on irregular structures. Applies to molecular property prediction, social network analysis, citation networks, 3D vision, and any domain involving relational or geometric data. +description: PyTorch Geometric (PyG) skill for building, training, and deploying Graph Neural Networks (GNNs) on structured data including graphs, 3D meshes, point clouds, and heterogeneous networks. Use this skill for graph-based machine learning tasks such as node classification, graph classification, link prediction, graph generation, geometric deep learning, and message passing on irregular structures. Essential for molecular property prediction, drug discovery, chemical informatics, social network analysis, citation networks, recommendation systems, 3D computer vision, protein structure analysis, knowledge graphs, fraud detection, traffic prediction, and any domain involving relational, geometric, or topological data. Supports large-scale graph processing, multi-GPU training, neighbor sampling, heterogeneous graphs, graph transforms, model explainability, and custom message passing layers. Includes comprehensive datasets, pre-built GNN architectures (GCN, GAT, GraphSAGE, etc.), and utilities for graph visualization and benchmarking. 
--- # PyTorch Geometric (PyG) diff --git a/scientific-packages/transformers/SKILL.md b/scientific-packages/transformers/SKILL.md index 611ef18..8f4a3cd 100644 --- a/scientific-packages/transformers/SKILL.md +++ b/scientific-packages/transformers/SKILL.md @@ -1,6 +1,6 @@ --- name: transformers -description: Comprehensive toolkit for working with Hugging Face Transformers library for state-of-the-art machine learning across NLP, computer vision, audio, and multimodal tasks. Use this skill when working with pretrained models, fine-tuning transformers, implementing text generation, image classification, speech recognition, or any task involving transformer architectures like BERT, GPT, T5, Vision Transformers, CLIP, or Whisper. +description: Essential toolkit for Hugging Face Transformers library enabling state-of-the-art machine learning across natural language processing, computer vision, audio processing, and multimodal applications. Use this skill for: loading and using pretrained transformer models (BERT, GPT, T5, RoBERTa, DistilBERT, BART, ViT, CLIP, Whisper, Llama, Mistral), implementing text generation and completion, fine-tuning models for custom tasks, text classification and sentiment analysis, question answering and reading comprehension, named entity recognition and token classification, text summarization and translation, image classification and object detection, speech recognition and audio processing, multimodal tasks combining text and images, parameter-efficient fine-tuning with LoRA and adapters, model quantization and optimization, training custom transformer models, implementing chat interfaces and conversational AI, working with tokenizers and text preprocessing, handling model inference and deployment, managing GPU memory and device allocation, implementing custom training loops, using pipelines for quick inference, working with Hugging Face Hub for model sharing, and any machine learning task involving transformer architectures or attention 
mechanisms. --- # Transformers diff --git a/scientific-packages/umap-learn/SKILL.md b/scientific-packages/umap-learn/SKILL.md index 08daf9c..47dc38a 100644 --- a/scientific-packages/umap-learn/SKILL.md +++ b/scientific-packages/umap-learn/SKILL.md @@ -1,6 +1,6 @@ --- name: umap-learn -description: Guide for using UMAP (Uniform Manifold Approximation and Projection) for dimensionality reduction, visualization, and clustering. Use this skill when working with high-dimensional data that needs to be reduced for visualization, machine learning pipelines, or clustering tasks. Triggers include requests for dimensionality reduction, manifold learning, data visualization in 2D/3D, UMAP-based clustering, or supervised feature engineering. +description: Comprehensive guide for UMAP (Uniform Manifold Approximation and Projection) - a fast, scalable dimensionality reduction technique for visualization, clustering, and machine learning. Use this skill for: dimensionality reduction of high-dimensional datasets (genes, proteins, images, text embeddings, sensor data), creating 2D/3D visualizations of complex data, preprocessing data for clustering algorithms (especially HDBSCAN), supervised and semi-supervised dimensionality reduction with labels, transforming new data using trained UMAP models, parametric UMAP with neural networks, feature engineering for downstream ML models, manifold learning and non-linear dimensionality reduction, comparing UMAP to t-SNE/PCA/other methods, inverse transforms and data reconstruction, aligned UMAP for temporal/batch data analysis. 
Triggers include: "dimensionality reduction", "UMAP", "manifold learning", "data visualization", "clustering preprocessing", "high-dimensional data", "embedding", "reduce dimensions", "2D visualization", "3D visualization", "supervised dimensionality reduction", "parametric UMAP", "transform new data", "feature engineering", "HDBSCAN clustering", "t-SNE alternative", "non-linear dimensionality reduction", "inverse transform", "data reconstruction", "aligned embeddings", "batch effect correction", "temporal data analysis". --- # UMAP-Learn diff --git a/scientific-packages/zarr-python/SKILL.md b/scientific-packages/zarr-python/SKILL.md index 9ab7395..15e227d 100644 --- a/scientific-packages/zarr-python/SKILL.md +++ b/scientific-packages/zarr-python/SKILL.md @@ -1,6 +1,6 @@ --- name: zarr-python -description: Toolkit for working with Zarr, a Python library for chunked, compressed N-dimensional arrays optimized for cloud storage and large-scale scientific computing. Use this skill when working with large datasets that need efficient storage and parallel access, multidimensional arrays requiring chunking and compression, cloud-native data workflows (S3, GCS), or when integrating with NumPy, Dask, and Xarray for scientific computing tasks. +description: Toolkit for working with Zarr, a Python library for chunked, compressed N-dimensional arrays optimized for cloud storage and large-scale scientific computing. Use this skill when working with large datasets that need efficient storage and parallel access, multidimensional arrays requiring chunking and compression, cloud-native data workflows (S3, GCS), or when integrating with NumPy, Dask, and Xarray for scientific computing tasks. 
Essential for handling datasets larger than memory, implementing parallel I/O operations, optimizing storage with compression, creating hierarchical data structures, converting between scientific data formats (HDF5, NetCDF, NumPy), managing cloud storage workflows, implementing chunked array operations, and building scalable scientific computing pipelines. --- # Zarr Python