mirror of
https://github.com/K-Dense-AI/claude-scientific-skills.git
synced 2026-01-26 16:58:56 +08:00
Scientific Packages
Bioinformatics & Genomics
- AnnData - Annotated data matrices for single-cell genomics and h5ad files
- Arboreto - Gene regulatory network inference using GRNBoost2 and GENIE3
- BioPython - Sequence manipulation, NCBI database access, BLAST searches, alignments, and phylogenetics
- BioServices - Programmatic access to 40+ biological web services (KEGG, UniProt, ChEBI, ChEMBL)
- Cellxgene Census - Query and analyze large-scale single-cell RNA-seq data
- gget - Efficient genomic database queries (Ensembl, UniProt, NCBI, PDB, COSMIC)
- pysam - Read, write, and manipulate genomic data files (SAM/BAM/CRAM alignments, VCF/BCF variants, FASTA/FASTQ sequences), with support for pileup analysis and coverage calculations
- PyDESeq2 - Differential gene expression analysis for bulk RNA-seq data
- Scanpy - Single-cell RNA-seq analysis with clustering, marker genes, and UMAP/t-SNE visualization
Cheminformatics & Drug Discovery
- Datamol - Molecular manipulation and featurization with enhanced RDKit workflows
- DeepChem - Molecular machine learning, graph neural networks, and MoleculeNet benchmarks
- DiffDock - Diffusion-based molecular docking for protein-ligand binding prediction
- MedChem - Medicinal chemistry analysis, ADMET prediction, and drug-likeness assessment
- Molfeat - 100+ molecular featurizers including fingerprints, descriptors, and pretrained models
- PyTDC - Therapeutics Data Commons for drug discovery datasets and benchmarks
- RDKit - Cheminformatics toolkit for molecular I/O, descriptors, fingerprints, and SMARTS
- TorchDrug - PyTorch-based machine learning platform for drug discovery with 40+ datasets and 20+ GNN models for molecular property prediction, protein modeling, knowledge graph reasoning, molecular generation, and retrosynthesis planning
Proteomics & Mass Spectrometry
- matchms - Processing and similarity matching of mass spectrometry data with 40+ filters, spectral library matching (Cosine, Modified Cosine, Neutral Losses), metadata harmonization, molecular fingerprint comparison, and support for multiple file formats (MGF, MSP, mzML, JSON)
- pyOpenMS - Comprehensive mass spectrometry data analysis for proteomics and metabolomics (LC-MS/MS processing, peptide identification, feature detection, quantification, chemical calculations, and integration with search engines such as Comet, Mascot, and MS-GF+)
Machine Learning & Deep Learning
- PyMC - Bayesian statistical modeling and probabilistic programming
- PyMOO - Multi-objective optimization with evolutionary algorithms
- PyTorch Lightning - Deep learning framework that organizes PyTorch code to eliminate boilerplate while keeping full flexibility. Automates training workflows (40+ tasks, including epoch/batch iteration, optimizer steps, gradient management, and checkpointing) and supports multi-GPU/TPU training with DDP, FSDP, and DeepSpeed strategies. Provides LightningModule for model organization, Trainer for automation, LightningDataModule for data pipelines, callbacks for extensibility, and integrations with TensorBoard, Weights & Biases (wandb), and MLflow for experiment tracking
- scikit-learn - Machine learning algorithms, preprocessing, and model selection
- statsmodels - Statistical modeling and econometrics (OLS, GLM, logit/probit, ARIMA, time series forecasting, hypothesis testing, diagnostics)
- Torch Geometric - Graph Neural Networks for molecular and geometric data
- Transformers - State-of-the-art machine learning models for NLP, computer vision, audio, and multimodal tasks. Provides 1M+ pre-trained models accessible via pipelines (text-classification, NER, QA, summarization, translation, text-generation, image-classification, object-detection, ASR, VQA), comprehensive training via Trainer API with distributed training and mixed precision, flexible text generation with multiple decoding strategies (greedy, beam search, sampling), and Auto classes for automatic architecture selection (BERT, GPT, T5, ViT, BART, etc.)
- UMAP-learn - Dimensionality reduction and manifold learning
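
As a rough illustration of the "preprocessing and model selection" workflow mentioned for scikit-learn above, the following minimal sketch chains a scaler and a classifier and scores them with cross-validation. The synthetic dataset, the choice of logistic regression, and the variable names are assumptions made for this example only; they are not taken from the skill itself.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Preprocessing and estimator are combined so that scaling is fit
# only on the training portion of each cross-validation fold.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation returns one accuracy score per fold.
scores = cross_val_score(model, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.3f}")
```

Wrapping the scaler in the pipeline, rather than scaling the full dataset up front, keeps information from the held-out folds from leaking into preprocessing.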
Materials Science & Chemistry
- Astropy - Astronomy and astrophysics (coordinates, cosmology, FITS files)
- COBRApy - Constraint-based metabolic modeling and flux balance analysis
- Pymatgen - Materials structure analysis, phase diagrams, and electronic structure
Data Analysis & Visualization
- Dask - Parallel computing for larger-than-memory datasets with distributed DataFrames, Arrays, Bags, and Futures
- Matplotlib - Publication-quality plotting and visualization
- Polars - High-performance DataFrame operations with lazy evaluation
- Seaborn - Statistical data visualization with a dataset-oriented interface, automatic confidence intervals, publication-quality themes, colorblind-safe palettes, and comprehensive support for exploratory analysis, distribution comparisons, correlation matrices, regression plots, and multi-panel figures
- ReportLab - Programmatic PDF generation for reports and documents
Phylogenetics & Trees
- ETE Toolkit - Phylogenetic tree manipulation, visualization, and analysis
Genomics Tools
- deepTools - NGS data analysis (ChIP-seq, RNA-seq, ATAC-seq) with BAM/bigWig files
- FlowIO - Flow Cytometry Standard (FCS) file reading and manipulation
- scikit-bio - Bioinformatics sequence analysis and diversity metrics
- Zarr - Chunked, compressed N-dimensional array storage
Multi-omics & AI Agent Frameworks
- BIOMNI - Autonomous biomedical AI agent framework from the Stanford SNAP lab for executing complex research tasks across genomics, drug discovery, molecular biology, and clinical analysis. Combines LLM reasoning with code execution and ~11GB of integrated biomedical databases (Ensembl, NCBI Gene, UniProt, PDB, AlphaFold, ClinVar, OMIM, HPO, PubMed, KEGG, Reactome, GO). Supports multiple LLM providers (Claude, GPT-4, Gemini, Groq, Bedrock). Includes the A1 agent class for autonomous task decomposition, the BiomniEval1 benchmark framework, and MCP server integration. Use cases: CRISPR screening design, single-cell RNA-seq analysis, ADMET prediction, GWAS interpretation, rare disease diagnosis, protein structure analysis, literature synthesis, and multi-omics integration
Scientific Communication & Publishing
- Paper-2-Web - Autonomous pipeline for transforming academic papers into multiple promotional formats using the Paper2All system. Converts LaTeX or PDF papers into: (1) Paper2Web - interactive, layout-aware academic homepages with responsive design, interactive figures, and mobile support; (2) Paper2Video - professional presentation videos with slides, narration, cursor movements, and optional talking-head generation using Hallo2; (3) Paper2Poster - print-ready conference posters with custom dimensions, professional layouts, and institution branding. Supports GPT-4/GPT-4.1 models, batch processing, QR code generation, multi-language content, and quality assessment metrics. Use cases: conference materials, video abstracts, preprint enhancement, research promotion, poster sessions, and academic website creation