mirror of
https://github.com/K-Dense-AI/claude-scientific-skills.git
synced 2026-01-26 16:58:56 +08:00
Add utility skill to get system resources
@@ -29,6 +29,7 @@
|
||||
"./scientific-packages/deepchem",
|
||||
"./scientific-packages/deeptools",
|
||||
"./scientific-packages/diffdock",
|
||||
"./scientific-packages/esm",
|
||||
"./scientific-packages/etetoolkit",
|
||||
"./scientific-packages/flowio",
|
||||
"./scientific-packages/gget",
|
||||
@@ -49,9 +50,11 @@
|
||||
"./scientific-packages/rdkit",
|
||||
"./scientific-packages/reportlab",
|
||||
"./scientific-packages/scanpy",
|
||||
"./scientific-packages/scvi-tools",
|
||||
"./scientific-packages/scikit-bio",
|
||||
"./scientific-packages/scikit-learn",
|
||||
"./scientific-packages/seaborn",
|
||||
"./scientific-packages/shap",
|
||||
"./scientific-packages/statsmodels",
|
||||
"./scientific-packages/torch_geometric",
|
||||
"./scientific-packages/torchdrug",
|
||||
@@ -133,6 +136,12 @@
|
||||
"description": "Always Auto-invoked skill that creates/updates workspace AGENT.md to instruct the agent to always search for existing skills before attempting any scientific task",
|
||||
"source": "./scientific-helpers/scientific-context-initialization",
|
||||
"strict": false
|
||||
},
|
||||
{
|
||||
"name": "get-available-resources",
|
||||
"description": "Detects and reports available system resources (CPU cores, GPUs, memory, disk space) to inform computational approach decisions",
|
||||
"source": "./scientific-helpers/get-available-resources",
|
||||
"strict": false
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
26
README.md
@@ -2,7 +2,7 @@
|
||||
|
||||
[](LICENSE.md)
|
||||
[](https://github.com/K-Dense-AI/claude-scientific-skills)
|
||||
[](#what-s-included)
|
||||
[](#what-s-included)
|
||||
[](#what-s-included)
|
||||
|
||||
A comprehensive collection of ready-to-use scientific skills for Claude, curated by the K-Dense team.
|
||||
@@ -45,9 +45,9 @@ These skills enable Claude to work with specialized scientific libraries and dat
|
||||
| Category | Count | Description |
|
||||
|----------|-------|-------------|
|
||||
| 📊 **Scientific Databases** | 25 | PubMed, PubChem, UniProt, ChEMBL, COSMIC, AlphaFold DB, bioRxiv, and more |
|
||||
| 🔬 **Scientific Packages** | 43 | BioPython, RDKit, PyTorch, Scanpy, and specialized tools |
|
||||
| 🔬 **Scientific Packages** | 46 | BioPython, RDKit, PyTorch, Scanpy, scvi-tools, ESM, and specialized tools |
|
||||
| 🔌 **Scientific Integrations** | 6 | Benchling, DNAnexus, Opentrons, LabArchives, LatchBio, OMERO |
|
||||
| 🎯 **Context Initialization** | 1 | Auto-invoked skill to ensure Claude uses existing skills effectively |
|
||||
| 🛠️ **Scientific Helpers** | 2 | Context initialization and resource detection utilities |
|
||||
| 📚 **Documented Workflows** | 122 | Ready-to-use examples and reference materials |
|
||||
|
||||
---
|
||||
@@ -78,7 +78,7 @@ Then, to install a specific set of skills:
|
||||
2. Select **claude-scientific-skills**
|
||||
3. Choose from:
|
||||
- `scientific-databases` - Access to 25 scientific databases
|
||||
- `scientific-packages` - 43 specialized Python packages
|
||||
- `scientific-packages` - 46 specialized Python packages
|
||||
- `scientific-thinking` - Analysis tools and document processing
|
||||
- `scientific-integrations` - Lab automation and platform integrations
|
||||
- `scientific-context-initialization` - Ensures Claude searches for and uses existing skills
|
||||
@@ -248,15 +248,15 @@ network visualizations. Finally, search GEO for similar expression patterns acro
|
||||
---
|
||||
|
||||
### 🔬 Scientific Packages
|
||||
**43 specialized Python packages** organized by domain.
|
||||
**44 specialized Python packages** organized by domain.
|
||||
|
||||
📖 **[Full Package Documentation →](docs/scientific-packages.md)**
|
||||
|
||||
<details>
|
||||
<summary><strong>Bioinformatics & Genomics (11 packages)</strong></summary>
|
||||
<summary><strong>Bioinformatics & Genomics (12 packages)</strong></summary>
|
||||
|
||||
- AnnData, Arboreto, BioPython, BioServices, Cellxgene Census
|
||||
- deepTools, FlowIO, gget, pysam, PyDESeq2, Scanpy
|
||||
- deepTools, FlowIO, gget, pysam, PyDESeq2, Scanpy, scvi-tools
|
||||
|
||||
</details>
|
||||
|
||||
@@ -275,9 +275,9 @@ network visualizations. Finally, search GEO for similar expression patterns acro
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary><strong>Machine Learning & Deep Learning (8 packages)</strong></summary>
|
||||
<summary><strong>Machine Learning & Deep Learning (9 packages)</strong></summary>
|
||||
|
||||
- PyMC, PyMOO, PyTorch Lightning, scikit-learn, statsmodels
|
||||
- PyMC, PyMOO, PyTorch Lightning, scikit-learn, SHAP, statsmodels
|
||||
- Torch Geometric, Transformers, UMAP-learn
|
||||
|
||||
</details>
|
||||
@@ -344,6 +344,14 @@ network visualizations. Finally, search GEO for similar expression patterns acro
|
||||
|
||||
---
|
||||
|
||||
### 🛠️ Scientific Helpers
|
||||
**2 helper utilities** for enhanced scientific computing capabilities.
|
||||
|
||||
- **scientific-context-initialization** - Auto-invoked skill that creates/updates workspace AGENT.md to instruct Claude to search for and use existing skills before attempting any scientific task
|
||||
- **get-available-resources** - Detects available system resources (CPU cores, GPUs, memory, disk space) and generates strategic recommendations for computational approaches (parallel processing, out-of-core computing, GPU acceleration)
|
||||
|
||||
---
|
||||
|
||||
## 🤝 Contributing
|
||||
|
||||
We welcome contributions to expand and improve this scientific skills repository!
|
||||
|
||||
@@ -10,6 +10,7 @@
|
||||
- **pysam** - Read, write, and manipulate genomic data files (SAM/BAM/CRAM alignments, VCF/BCF variants, FASTA/FASTQ sequences) with pileup analysis, coverage calculations, and bioinformatics workflows
|
||||
- **PyDESeq2** - Differential gene expression analysis for bulk RNA-seq data
|
||||
- **Scanpy** - Single-cell RNA-seq analysis with clustering, marker genes, and UMAP/t-SNE visualization
|
||||
- **scvi-tools** - Probabilistic deep learning models for single-cell omics analysis. PyTorch-based framework providing variational autoencoders (VAEs) for dimensionality reduction, batch correction, differential expression, and data integration across modalities. Includes 25+ models: scVI/scANVI (RNA-seq integration and cell type annotation), totalVI (CITE-seq protein+RNA), MultiVI (multiome RNA+ATAC integration), PeakVI (ATAC-seq analysis), DestVI/Stereoscope/Tangram (spatial transcriptomics deconvolution), MethylVI (methylation), CytoVI (flow/mass cytometry), VeloVI (RNA velocity), contrastiveVI (perturbation studies), and Solo (doublet detection). Supports seamless integration with Scanpy/AnnData ecosystem, GPU acceleration, reference mapping (scArches), and probabilistic differential expression with uncertainty quantification
|
||||
|
||||
## Cheminformatics & Drug Discovery
|
||||
- **Datamol** - Molecular manipulation and featurization with enhanced RDKit workflows
|
||||
@@ -25,11 +26,15 @@
|
||||
- **matchms** - Processing and similarity matching of mass spectrometry data with 40+ filters, spectral library matching (Cosine, Modified Cosine, Neutral Losses), metadata harmonization, molecular fingerprint comparison, and support for multiple file formats (MGF, MSP, mzML, JSON)
|
||||
- **pyOpenMS** - Comprehensive mass spectrometry data analysis for proteomics and metabolomics (LC-MS/MS processing, peptide identification, feature detection, quantification, chemical calculations, and integration with search engines like Comet, Mascot, MSGF+)
|
||||
|
||||
## Protein Engineering & Design
|
||||
- **ESM (Evolutionary Scale Modeling)** - State-of-the-art protein language models from EvolutionaryScale for protein design, structure prediction, and representation learning. Includes ESM3 (1.4B-98B parameter multimodal generative models for simultaneous reasoning across sequence, structure, and function with chain-of-thought generation, inverse folding, and function-conditioned design) and ESM C (300M-6B parameter efficient embedding models 3x faster than ESM2 for similarity analysis, classification, and feature extraction). Supports local inference with open weights and cloud-based Forge API for scalable batch processing. Use cases: novel protein design, structure prediction from sequence, sequence design from structure, protein embeddings, function annotation, variant generation, and directed evolution workflows
|
||||
|
||||
## Machine Learning & Deep Learning
|
||||
- **PyMC** - Bayesian statistical modeling and probabilistic programming
|
||||
- **PyMOO** - Multi-objective optimization with evolutionary algorithms
|
||||
- **PyTorch Lightning** - Deep learning framework that organizes PyTorch code to eliminate boilerplate while maintaining full flexibility. Automates training workflows (40+ tasks including epoch/batch iteration, optimizer steps, gradient management, checkpointing), supports multi-GPU/TPU training with DDP/FSDP/DeepSpeed strategies, includes LightningModule for model organization, Trainer for automation, LightningDataModule for data pipelines, callbacks for extensibility, and integrations with TensorBoard, Wandb, MLflow for experiment tracking
|
||||
- **scikit-learn** - Machine learning algorithms, preprocessing, and model selection
|
||||
- **SHAP** - Model interpretability and explainability using Shapley values from game theory. Provides unified approach to explain any ML model with TreeExplainer (fast exact explanations for XGBoost/LightGBM/Random Forest), DeepExplainer (TensorFlow/PyTorch neural networks), KernelExplainer (model-agnostic), and LinearExplainer. Includes comprehensive visualizations (waterfall plots for individual predictions, beeswarm plots for global importance, scatter plots for feature relationships, bar/force/heatmap plots), supports model debugging, fairness analysis, feature engineering guidance, and production deployment
|
||||
- **statsmodels** - Statistical modeling and econometrics (OLS, GLM, logit/probit, ARIMA, time series forecasting, hypothesis testing, diagnostics)
|
||||
- **Torch Geometric** - Graph Neural Networks for molecular and geometric data
|
||||
- **Transformers** - State-of-the-art machine learning models for NLP, computer vision, audio, and multimodal tasks. Provides 1M+ pre-trained models accessible via pipelines (text-classification, NER, QA, summarization, translation, text-generation, image-classification, object-detection, ASR, VQA), comprehensive training via Trainer API with distributed training and mixed precision, flexible text generation with multiple decoding strategies (greedy, beam search, sampling), and Auto classes for automatic architecture selection (BERT, GPT, T5, ViT, BART, etc.)
|
||||
|
||||
271
scientific-helpers/get-available-resources/SKILL.md
Normal file
@@ -0,0 +1,271 @@
|
||||
---
|
||||
name: get-available-resources
|
||||
description: This skill should be used at the start of any computationally intensive scientific task to detect and report available system resources (CPU cores, GPUs, memory, disk space). It creates a JSON file with resource information and strategic recommendations that inform computational approach decisions such as whether to use parallel processing (joblib, multiprocessing), out-of-core computing (Dask, Zarr), GPU acceleration (PyTorch, JAX), or memory-efficient strategies. Use this skill before running analyses, training models, processing large datasets, or any task where resource constraints matter.
|
||||
---
|
||||
|
||||
# Get Available Resources
|
||||
|
||||
## Overview
|
||||
|
||||
Detect available computational resources and generate strategic recommendations for scientific computing tasks. This skill automatically identifies CPU capabilities, GPU availability (NVIDIA CUDA, AMD ROCm, Apple Silicon Metal), memory constraints, and disk space to help make informed decisions about computational approaches.
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
Use this skill proactively before any computationally intensive task:
|
||||
|
||||
- **Before data analysis**: Determine if datasets can be loaded into memory or require out-of-core processing
|
||||
- **Before model training**: Check if GPU acceleration is available and which backend to use
|
||||
- **Before parallel processing**: Identify optimal number of workers for joblib, multiprocessing, or Dask
|
||||
- **Before large file operations**: Verify sufficient disk space and appropriate storage strategies
|
||||
- **At project initialization**: Understand baseline capabilities for making architectural decisions
|
||||
|
||||
**Example scenarios:**
|
||||
- "Help me analyze this 50GB genomics dataset" → Use this skill first to determine if Dask/Zarr are needed
|
||||
- "Train a neural network on this data" → Use this skill to detect available GPUs and backends
|
||||
- "Process 10,000 files in parallel" → Use this skill to determine optimal worker count
|
||||
- "Run a computationally intensive simulation" → Use this skill to understand resource constraints
|
||||
|
||||
## How This Skill Works
|
||||
|
||||
### Resource Detection
|
||||
|
||||
The skill runs `scripts/detect_resources.py` to automatically detect:
|
||||
|
||||
1. **CPU Information**
|
||||
- Physical and logical core counts
|
||||
- Processor architecture and model
|
||||
- CPU frequency information
|
||||
|
||||
2. **GPU Information**
|
||||
- NVIDIA GPUs: Detects via nvidia-smi, reports VRAM, driver version, compute capability
|
||||
- AMD GPUs: Detects via rocm-smi
|
||||
- Apple Silicon: Detects M1/M2/M3/M4 chips with Metal support and unified memory
|
||||
|
||||
3. **Memory Information**
|
||||
- Total and available RAM
|
||||
- Current memory usage percentage
|
||||
- Swap space availability
|
||||
|
||||
4. **Disk Space Information**
|
||||
- Total and available disk space for working directory
|
||||
- Current usage percentage
|
||||
|
||||
5. **Operating System Information**
|
||||
- OS type (macOS, Linux, Windows)
|
||||
- OS version and release
|
||||
- Python version
|
||||
|
||||
### Output Format
|
||||
|
||||
The skill generates a `.claude_resources.json` file in the current working directory containing:
|
||||
|
||||
```json
{
  "timestamp": "2025-10-23T10:30:00",
  "os": {
    "system": "Darwin",
    "release": "25.0.0",
    "machine": "arm64"
  },
  "cpu": {
    "physical_cores": 8,
    "logical_cores": 8,
    "architecture": "arm64"
  },
  "memory": {
    "total_gb": 16.0,
    "available_gb": 8.5,
    "percent_used": 46.9
  },
  "disk": {
    "total_gb": 500.0,
    "available_gb": 200.0,
    "percent_used": 60.0
  },
  "gpu": {
    "nvidia_gpus": [],
    "amd_gpus": [],
    "apple_silicon": {
      "name": "Apple M2",
      "type": "Apple Silicon",
      "backend": "Metal",
      "unified_memory": true
    },
    "total_gpus": 1,
    "available_backends": ["Metal"]
  },
  "recommendations": {
    "parallel_processing": {
      "strategy": "high_parallelism",
      "suggested_workers": 6,
      "libraries": ["joblib", "multiprocessing", "dask"]
    },
    "memory_strategy": {
      "strategy": "moderate_memory",
      "libraries": ["dask", "zarr"],
      "note": "Consider chunking for datasets > 2GB"
    },
    "gpu_acceleration": {
      "available": true,
      "backends": ["Metal"],
      "suggested_libraries": ["pytorch-mps", "tensorflow-metal", "jax-metal"]
    },
    "large_data_handling": {
      "strategy": "disk_abundant",
      "note": "Sufficient space for large intermediate files"
    }
  }
}
```
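
As a rough illustration of consuming this file, the sketch below condenses a parsed report into a one-line summary. The helper name `summarize_resources` is hypothetical and not part of the skill; it assumes only the keys shown in the example JSON above and falls back to `?` or `none` when a key is missing.

```python
from typing import Any, Dict


def summarize_resources(resources: Dict[str, Any]) -> str:
    """Return a one-line summary of a parsed resource report (hypothetical helper)."""
    cpu = resources.get("cpu", {})
    memory = resources.get("memory", {})
    disk = resources.get("disk", {})
    gpu = resources.get("gpu", {})
    # An empty or missing backend list is reported as "none".
    backends = gpu.get("available_backends") or ["none"]
    return (
        f"{cpu.get('logical_cores', '?')} logical cores, "
        f"{memory.get('available_gb', '?')} GB RAM available, "
        f"{disk.get('available_gb', '?')} GB disk available, "
        f"GPU backends: {', '.join(backends)}"
    )


# Trimmed copy of the example report above.
example = {
    "cpu": {"physical_cores": 8, "logical_cores": 8},
    "memory": {"total_gb": 16.0, "available_gb": 8.5},
    "disk": {"total_gb": 500.0, "available_gb": 200.0},
    "gpu": {"total_gpus": 1, "available_backends": ["Metal"]},
}
print(summarize_resources(example))
# -> 8 logical cores, 8.5 GB RAM available, 200.0 GB disk available, GPU backends: Metal
```

In practice the dictionary would come from `json.load` on `.claude_resources.json` rather than from a literal.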
|
||||
|
||||
### Strategic Recommendations
|
||||
|
||||
The skill generates context-aware recommendations:
|
||||
|
||||
**Parallel Processing Recommendations:**
|
||||
- **High parallelism (8+ cores)**: Use Dask, joblib, or multiprocessing with workers = cores - 2
|
||||
- **Moderate parallelism (4-7 cores)**: Use joblib or multiprocessing with workers = cores - 1
|
||||
- **Sequential (< 4 cores)**: Prefer sequential processing to avoid overhead
|
||||
|
||||
**Memory Strategy Recommendations:**
|
||||
- **Memory constrained (< 4GB available)**: Use Zarr, Dask, or H5py for out-of-core processing
|
||||
- **Moderate memory (4-16GB available)**: Use Dask/Zarr for datasets > 2GB
|
||||
- **Memory abundant (> 16GB available)**: Can load most datasets into memory directly
|
||||
|
||||
**GPU Acceleration Recommendations:**
|
||||
- **NVIDIA GPUs detected**: Use PyTorch, TensorFlow, JAX, CuPy, or RAPIDS
|
||||
- **AMD GPUs detected**: Use PyTorch-ROCm or TensorFlow-ROCm
|
||||
- **Apple Silicon detected**: Use PyTorch with MPS backend, TensorFlow-Metal, or JAX-Metal
|
||||
- **No GPU detected**: Use CPU-optimized libraries
|
||||
|
||||
**Large Data Handling Recommendations:**
|
||||
- **Disk constrained (< 10GB)**: Use streaming or compression strategies
|
||||
- **Moderate disk (10-100GB)**: Use Zarr, H5py, or Parquet formats
|
||||
- **Disk abundant (> 100GB)**: Can create large intermediate files freely
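
The parallel-processing and memory thresholds listed above can be sketched as two small pure functions. The function names are illustrative assumptions, and the handling of exact boundary values (for example, exactly 16 GB counting as abundant) and of an unknown core count is a choice made for this sketch, not documented behaviour of the skill.

```python
from typing import Optional


def suggest_workers(logical_cores: Optional[int]) -> int:
    """Map a logical core count to a worker count using the thresholds above."""
    cores = logical_cores or 1  # treat an unknown core count as a single core
    if cores >= 8:
        return max(cores - 2, 1)  # high parallelism: leave two cores free
    if cores >= 4:
        return max(cores - 1, 1)  # moderate parallelism: leave one core free
    return 1  # sequential


def memory_strategy(available_gb: float) -> str:
    """Classify available RAM into the three documented strategy names."""
    if available_gb < 4:
        return "memory_constrained"
    if available_gb < 16:
        return "moderate_memory"
    return "memory_abundant"


print(suggest_workers(8), memory_strategy(8.5))
# -> 6 moderate_memory
```

Keeping the thresholds in small functions like these makes it easy to unit-test them or adjust them for shared machines where fewer cores should be claimed.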
|
||||
|
||||
## Usage Instructions
|
||||
|
||||
### Step 1: Run Resource Detection
|
||||
|
||||
Execute the detection script at the start of any computationally intensive task:
|
||||
|
||||
```bash
|
||||
python scripts/detect_resources.py
|
||||
```
|
||||
|
||||
Optional arguments:
|
||||
- `-o, --output <path>`: Specify custom output path (default: `.claude_resources.json`)
|
||||
- `-v, --verbose`: Print full resource information to stdout
|
||||
|
||||
### Step 2: Read and Apply Recommendations
|
||||
|
||||
After running detection, read the generated `.claude_resources.json` file to inform computational decisions:
|
||||
|
||||
```python
# Example: Use recommendations in code
import json

with open('.claude_resources.json', 'r') as f:
    resources = json.load(f)

# Check parallel processing strategy
if resources['recommendations']['parallel_processing']['strategy'] == 'high_parallelism':
    n_jobs = resources['recommendations']['parallel_processing']['suggested_workers']
    # Use joblib, Dask, or multiprocessing with n_jobs workers

# Check memory strategy
if resources['recommendations']['memory_strategy']['strategy'] == 'memory_constrained':
    # Use Dask, Zarr, or H5py for out-of-core processing
    import dask.array as da
    # Load data in chunks

# Check GPU availability
if resources['recommendations']['gpu_acceleration']['available']:
    backends = resources['recommendations']['gpu_acceleration']['backends']
    # Use appropriate GPU library based on available backend
```
|
||||
|
||||
### Step 3: Make Informed Decisions
|
||||
|
||||
Use the resource information and recommendations to make strategic choices:
|
||||
|
||||
**For data loading:**
|
||||
```python
memory_available_gb = resources['memory']['available_gb']
dataset_size_gb = 10

if dataset_size_gb > memory_available_gb * 0.5:
    # Dataset is large relative to memory, use Dask
    import dask.dataframe as dd
    df = dd.read_csv('large_file.csv')
else:
    # Dataset fits in memory, use pandas
    import pandas as pd
    df = pd.read_csv('large_file.csv')
```
|
||||
|
||||
**For parallel processing:**
|
||||
```python
from joblib import Parallel, delayed

n_jobs = resources['recommendations']['parallel_processing'].get('suggested_workers', 1)

results = Parallel(n_jobs=n_jobs)(
    delayed(process_function)(item) for item in data
)
```
|
||||
|
||||
**For GPU acceleration:**
|
||||
```python
import torch

if 'CUDA' in resources['gpu']['available_backends']:
    device = torch.device('cuda')
elif 'Metal' in resources['gpu']['available_backends']:
    device = torch.device('mps')
else:
    device = torch.device('cpu')

model = model.to(device)
```
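
The snippet above does not cover the `ROCm` backend that the detection script can also report. A minimal sketch of a fuller mapping follows; `pick_torch_device_name` is a hypothetical helper, and it relies on the fact that ROCm builds of PyTorch expose AMD GPUs through the `cuda` device type. It deliberately avoids importing `torch` so it can run anywhere.

```python
from typing import Iterable


def pick_torch_device_name(backends: Iterable[str]) -> str:
    """Map detected backend names to a PyTorch device string (hypothetical helper)."""
    available = set(backends)
    # ROCm builds of PyTorch reuse the "cuda" device type for AMD GPUs.
    if "CUDA" in available or "ROCm" in available:
        return "cuda"
    if "Metal" in available:
        return "mps"
    return "cpu"


print(pick_torch_device_name(["Metal"]))
# -> mps
```

The resource file only records which GPU tools were found on the machine, not whether the installed PyTorch build supports them, so it is still worth confirming at runtime with `torch.cuda.is_available()` or `torch.backends.mps.is_available()` before moving a model to the device.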
|
||||
|
||||
## Dependencies
|
||||
|
||||
The detection script requires the following Python packages:
|
||||
|
||||
```bash
|
||||
pip install psutil
|
||||
```
|
||||
|
||||
All other functionality uses Python standard library modules (json, os, platform, subprocess, sys, pathlib, typing, datetime).
|
||||
|
||||
## Platform Support
|
||||
|
||||
- **macOS**: Full support including Apple Silicon (M1/M2/M3/M4) GPU detection
|
||||
- **Linux**: Full support including NVIDIA (nvidia-smi) and AMD (rocm-smi) GPU detection
|
||||
- **Windows**: Full support including NVIDIA GPU detection
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Run early**: Execute resource detection at the start of projects or before major computational tasks
|
||||
2. **Re-run periodically**: System resources change over time (memory usage, disk space)
|
||||
3. **Check before scaling**: Verify resources before scaling up parallel workers or data sizes
|
||||
4. **Document decisions**: Keep the `.claude_resources.json` file in project directories to document resource-aware decisions
|
||||
5. **Use with versioning**: Different machines have different capabilities; resource files help maintain portability
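
One way to act on the "re-run periodically" advice is to check the report's `timestamp` field before trusting it. The sketch below is an assumption-laden illustration: the helper name `is_stale` and the 30-minute default are arbitrary choices, and it assumes the timestamp is a naive local-time ISO 8601 string as in the example output.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Optional


def is_stale(
    resources: Dict[str, Any],
    max_age: timedelta = timedelta(minutes=30),  # arbitrary default for this sketch
    now: Optional[datetime] = None,
) -> bool:
    """Return True if the report is missing, unparsable, or older than max_age."""
    stamp = resources.get("timestamp")
    if not stamp:
        return True
    try:
        recorded = datetime.fromisoformat(stamp)
    except (TypeError, ValueError):
        return True  # an unreadable timestamp is treated as stale
    current = now or datetime.now()
    return current - recorded > max_age


report = {"timestamp": "2025-10-23T10:30:00"}
print(is_stale(report, now=datetime(2025, 10, 23, 10, 45)))
# -> False
```

When the check returns `True`, re-running `python scripts/detect_resources.py` refreshes the file.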
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**GPU not detected:**
|
||||
- Ensure GPU drivers and their command-line tools are installed (`nvidia-smi` for NVIDIA, `rocm-smi` for AMD; Apple Silicon detection uses the built-in `sysctl` and `system_profiler`)
|
||||
- Check that GPU utilities are in system PATH
|
||||
- Verify GPU is not in use by other processes
|
||||
|
||||
**Script execution fails:**
|
||||
- Ensure psutil is installed: `pip install psutil`
|
||||
- Check Python version compatibility (Python 3.7+; the script calls `subprocess.run` with `capture_output=True`, which was added in Python 3.7)
|
||||
- Verify script has execute permissions: `chmod +x scripts/detect_resources.py`
|
||||
|
||||
**Inaccurate memory readings:**
|
||||
- Memory readings are snapshots; actual available memory changes constantly
|
||||
- Close other applications before detection for accurate "available" memory
|
||||
- Consider running detection multiple times and averaging results
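
The averaging suggestion can be sketched as follows. Both function names are hypothetical, as are the sample count and interval; the sampler is passed in as a callable so the averaging logic can be exercised without `psutil` installed.

```python
import time
from statistics import mean
from typing import Callable


def average_available_gb(
    sample_fn: Callable[[], float],
    samples: int = 5,
    interval_s: float = 1.0,
) -> float:
    """Average several readings from sample_fn, pausing interval_s between them."""
    if samples < 1:
        raise ValueError("samples must be at least 1")
    readings = []
    for i in range(samples):
        readings.append(sample_fn())
        if i < samples - 1:
            time.sleep(interval_s)  # no pause needed after the final reading
    return round(mean(readings), 2)


def psutil_available_gb() -> float:
    """Read available RAM in GB via psutil (third-party; imported lazily)."""
    import psutil

    return psutil.virtual_memory().available / (1024 ** 3)


# Deterministic demonstration with a fake sampler.
fake_readings = iter([8.0, 9.0, 10.0])
print(average_available_gb(lambda: next(fake_readings), samples=3, interval_s=0))
# -> 9.0
```

For a real measurement, pass `psutil_available_gb` as `sample_fn`.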
|
||||
401
scientific-helpers/get-available-resources/scripts/detect_resources.py
Executable file
@@ -0,0 +1,401 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
System Resource Detection Script
|
||||
|
||||
Detects available compute resources including CPU, GPU, memory, and disk space.
|
||||
Outputs a JSON file that Claude Code can use to make informed decisions about
|
||||
computational approaches (e.g., whether to use Dask, Zarr, Joblib, etc.).
|
||||
|
||||
Supports: macOS, Linux, Windows
|
||||
GPU Detection: NVIDIA (CUDA), AMD (ROCm), Apple Silicon (Metal)
|
||||
"""
|
||||
|
||||
import json
|
||||
import os
|
||||
import platform
|
||||
import psutil
|
||||
import subprocess
|
||||
import sys
|
||||
from pathlib import Path
|
||||
from typing import Dict, List, Any, Optional
|
||||
|
||||
|
||||
def get_cpu_info() -> Dict[str, Any]:
    """Detect CPU information."""
    cpu_info = {
        "physical_cores": psutil.cpu_count(logical=False),
        "logical_cores": psutil.cpu_count(logical=True),
        "max_frequency_mhz": None,
        "architecture": platform.machine(),
        "processor": platform.processor(),
    }

    # Get CPU frequency if available
    try:
        freq = psutil.cpu_freq()
        if freq:
            cpu_info["max_frequency_mhz"] = freq.max
            cpu_info["current_frequency_mhz"] = freq.current
    except Exception:
        pass

    return cpu_info
|
||||
|
||||
|
||||
def get_memory_info() -> Dict[str, Any]:
    """Detect memory information."""
    mem = psutil.virtual_memory()
    swap = psutil.swap_memory()

    return {
        "total_gb": round(mem.total / (1024**3), 2),
        "available_gb": round(mem.available / (1024**3), 2),
        "used_gb": round(mem.used / (1024**3), 2),
        "percent_used": mem.percent,
        "swap_total_gb": round(swap.total / (1024**3), 2),
        "swap_available_gb": round((swap.total - swap.used) / (1024**3), 2),
    }
|
||||
|
||||
|
||||
def get_disk_info(path: Optional[str] = None) -> Dict[str, Any]:
    """Detect disk space information for working directory or specified path."""
    if path is None:
        path = os.getcwd()

    try:
        disk = psutil.disk_usage(path)
        return {
            "path": path,
            "total_gb": round(disk.total / (1024**3), 2),
            "available_gb": round(disk.free / (1024**3), 2),
            "used_gb": round(disk.used / (1024**3), 2),
            "percent_used": disk.percent,
        }
    except Exception as e:
        # On failure the result carries only "path" and "error" (no size keys)
        return {
            "path": path,
            "error": str(e),
        }
|
||||
|
||||
|
||||
def detect_nvidia_gpus() -> List[Dict[str, Any]]:
    """Detect NVIDIA GPUs using nvidia-smi."""
    gpus = []

    try:
        # Try to run nvidia-smi
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=index,name,memory.total,memory.free,driver_version,compute_cap",
             "--format=csv,noheader,nounits"],
            capture_output=True,
            text=True,
            timeout=5
        )

        if result.returncode == 0:
            for line in result.stdout.strip().split('\n'):
                if line:
                    parts = [p.strip() for p in line.split(',')]
                    if len(parts) >= 6:
                        gpus.append({
                            "index": int(parts[0]),
                            "name": parts[1],
                            "memory_total_mb": float(parts[2]),
                            "memory_free_mb": float(parts[3]),
                            "driver_version": parts[4],
                            "compute_capability": parts[5],
                            "type": "NVIDIA",
                            "backend": "CUDA"
                        })
    except Exception:
        # Covers a missing nvidia-smi binary, timeouts, and unparsable output
        pass

    return gpus
|
||||
|
||||
|
||||
def detect_amd_gpus() -> List[Dict[str, Any]]:
    """Detect AMD GPUs using rocm-smi."""
    gpus = []

    try:
        # Try to run rocm-smi
        result = subprocess.run(
            ["rocm-smi", "--showid", "--showmeminfo", "vram"],
            capture_output=True,
            text=True,
            timeout=5
        )

        if result.returncode == 0:
            # Parse rocm-smi output (basic parsing, may need refinement)
            lines = result.stdout.strip().split('\n')
            gpu_index = 0
            for line in lines:
                if 'GPU' in line and 'DID' in line:
                    gpus.append({
                        "index": gpu_index,
                        "name": "AMD GPU",
                        "type": "AMD",
                        "backend": "ROCm",
                        "info": line.strip()
                    })
                    gpu_index += 1
    except Exception:
        # Covers a missing rocm-smi binary, timeouts, and unparsable output
        pass

    return gpus
|
||||
|
||||
|
||||
def detect_apple_silicon_gpu() -> Optional[Dict[str, Any]]:
    """Detect Apple Silicon GPU (M1/M2/M3/etc.)."""
    if platform.system() != "Darwin":
        return None

    try:
        # Check if running on Apple Silicon
        result = subprocess.run(
            ["sysctl", "-n", "machdep.cpu.brand_string"],
            capture_output=True,
            text=True,
            timeout=5
        )

        cpu_brand = result.stdout.strip()

        # Check for Apple Silicon (M1, M2, M3, etc.)
        if "Apple" in cpu_brand and any(chip in cpu_brand for chip in ["M1", "M2", "M3", "M4"]):
            # Get GPU core count if possible
            gpu_info = {
                "name": cpu_brand,
                "type": "Apple Silicon",
                "backend": "Metal",
                "unified_memory": True,  # Apple Silicon uses unified memory
            }

            # Try to get GPU core information
            try:
                result = subprocess.run(
                    ["system_profiler", "SPDisplaysDataType"],
                    capture_output=True,
                    text=True,
                    timeout=10
                )

                # Parse GPU core info from system_profiler
                for line in result.stdout.split('\n'):
                    if 'Chipset Model' in line:
                        gpu_info["chipset"] = line.split(':')[1].strip()
                    elif 'Total Number of Cores' in line:
                        try:
                            cores = line.split(':')[1].strip()
                            gpu_info["gpu_cores"] = cores
                        except IndexError:
                            # Line had no "key: value" separator
                            pass
            except Exception:
                pass

            return gpu_info

    except Exception:
        pass

    return None
|
||||
|
||||
|
||||
def get_gpu_info() -> Dict[str, Any]:
    """Detect all available GPUs."""
    gpu_info = {
        "nvidia_gpus": detect_nvidia_gpus(),
        "amd_gpus": detect_amd_gpus(),
        "apple_silicon": detect_apple_silicon_gpu(),
        "total_gpus": 0,
        "available_backends": []
    }

    # Count total GPUs and available backends
    if gpu_info["nvidia_gpus"]:
        gpu_info["total_gpus"] += len(gpu_info["nvidia_gpus"])
        gpu_info["available_backends"].append("CUDA")

    if gpu_info["amd_gpus"]:
        gpu_info["total_gpus"] += len(gpu_info["amd_gpus"])
        gpu_info["available_backends"].append("ROCm")

    if gpu_info["apple_silicon"]:
        gpu_info["total_gpus"] += 1
        gpu_info["available_backends"].append("Metal")

    return gpu_info
|
||||
|
||||
|
||||
def get_os_info() -> Dict[str, Any]:
    """Get operating system information."""
    return {
        "system": platform.system(),
        "release": platform.release(),
        "version": platform.version(),
        "machine": platform.machine(),
        "python_version": platform.python_version(),
    }
|
||||
|
||||
|
||||
def detect_all_resources(output_path: Optional[str] = None) -> Dict[str, Any]:
    """
    Detect all system resources and save to JSON.

    Args:
        output_path: Optional path to save JSON. Defaults to .claude_resources.json in cwd.

    Returns:
        Dictionary containing all resource information.
    """
    # Local import keeps this function self-contained
    from datetime import datetime

    if output_path is None:
        output_path = os.path.join(os.getcwd(), ".claude_resources.json")

    resources = {
        "timestamp": datetime.now().isoformat(),
        "os": get_os_info(),
        "cpu": get_cpu_info(),
        "memory": get_memory_info(),
        "disk": get_disk_info(),
        "gpu": get_gpu_info(),
    }

    # Add computational recommendations
    resources["recommendations"] = generate_recommendations(resources)

    # Save to JSON file
    with open(output_path, 'w') as f:
        json.dump(resources, f, indent=2)

    return resources
|
||||
|
||||
|
||||
def generate_recommendations(resources: Dict[str, Any]) -> Dict[str, Any]:
|
||||
"""
|
||||
Generate computational approach recommendations based on available resources.
|
||||
"""
|
||||
recommendations = {
|
||||
"parallel_processing": {},
|
||||
"memory_strategy": {},
|
||||
"gpu_acceleration": {},
|
||||
"large_data_handling": {}
|
||||
}
|
||||
|
||||
# CPU recommendations
|
||||
cpu_cores = resources["cpu"]["logical_cores"]
|
||||
if cpu_cores >= 8:
|
||||
recommendations["parallel_processing"]["strategy"] = "high_parallelism"
|
||||
recommendations["parallel_processing"]["suggested_workers"] = max(cpu_cores - 2, 1)
|
||||
recommendations["parallel_processing"]["libraries"] = ["joblib", "multiprocessing", "dask"]
|
||||
elif cpu_cores >= 4:
|
||||
recommendations["parallel_processing"]["strategy"] = "moderate_parallelism"
|
||||
recommendations["parallel_processing"]["suggested_workers"] = max(cpu_cores - 1, 1)
|
||||
recommendations["parallel_processing"]["libraries"] = ["joblib", "multiprocessing"]
|
||||
else:
|
||||
recommendations["parallel_processing"]["strategy"] = "sequential"
|
||||
recommendations["parallel_processing"]["note"] = "Limited cores, prefer sequential processing"
|
||||
|
||||
    # Memory recommendations
    available_memory_gb = resources["memory"]["available_gb"]
    total_memory_gb = resources["memory"]["total_gb"]

    if available_memory_gb < 4:
        recommendations["memory_strategy"]["strategy"] = "memory_constrained"
        recommendations["memory_strategy"]["libraries"] = ["zarr", "dask", "h5py"]
        recommendations["memory_strategy"]["note"] = "Use out-of-core processing for large datasets"
    elif available_memory_gb < 16:
        recommendations["memory_strategy"]["strategy"] = "moderate_memory"
        recommendations["memory_strategy"]["libraries"] = ["dask", "zarr"]
        recommendations["memory_strategy"]["note"] = "Consider chunking for datasets > 2GB"
    else:
        recommendations["memory_strategy"]["strategy"] = "memory_abundant"
        recommendations["memory_strategy"]["note"] = "Can load most datasets into memory"

    # GPU recommendations
    gpu_info = resources["gpu"]
    if gpu_info["total_gpus"] > 0:
        recommendations["gpu_acceleration"]["available"] = True
        recommendations["gpu_acceleration"]["backends"] = gpu_info["available_backends"]

        if "CUDA" in gpu_info["available_backends"]:
            recommendations["gpu_acceleration"]["suggested_libraries"] = [
                "pytorch", "tensorflow", "jax", "cupy", "rapids"
            ]
        elif "Metal" in gpu_info["available_backends"]:
            recommendations["gpu_acceleration"]["suggested_libraries"] = [
                "pytorch-mps", "tensorflow-metal", "jax-metal"
            ]
        elif "ROCm" in gpu_info["available_backends"]:
            recommendations["gpu_acceleration"]["suggested_libraries"] = [
                "pytorch-rocm", "tensorflow-rocm"
            ]
    else:
        recommendations["gpu_acceleration"]["available"] = False
        recommendations["gpu_acceleration"]["note"] = "No GPU detected, use CPU-based libraries"

    # Large data handling recommendations
    disk_available_gb = resources["disk"]["available_gb"]
    if disk_available_gb < 10:
        recommendations["large_data_handling"]["strategy"] = "disk_constrained"
        recommendations["large_data_handling"]["note"] = "Limited disk space, use streaming or compression"
    elif disk_available_gb < 100:
        recommendations["large_data_handling"]["strategy"] = "moderate_disk"
        recommendations["large_data_handling"]["libraries"] = ["zarr", "h5py", "parquet"]
    else:
        recommendations["large_data_handling"]["strategy"] = "disk_abundant"
        recommendations["large_data_handling"]["note"] = "Sufficient space for large intermediate files"

    return recommendations


def main():
    """Main entry point for CLI usage."""
    import argparse

    parser = argparse.ArgumentParser(
        description="Detect system resources for scientific computing"
    )
    parser.add_argument(
        "-o", "--output",
        default=".claude_resources.json",
        help="Output JSON file path (default: .claude_resources.json)"
    )
    parser.add_argument(
        "-v", "--verbose",
        action="store_true",
        help="Print resources to stdout"
    )

    args = parser.parse_args()

    print("🔍 Detecting system resources...")
    resources = detect_all_resources(args.output)

    print(f"✅ Resources detected and saved to: {args.output}")

    if args.verbose:
        print("\n" + "="*60)
        print(json.dumps(resources, indent=2))
        print("="*60)

    # Print summary
    print("\n📊 Resource Summary:")
    print(f" OS: {resources['os']['system']} {resources['os']['release']}")
    print(f" CPU: {resources['cpu']['logical_cores']} cores ({resources['cpu']['physical_cores']} physical)")
    print(f" Memory: {resources['memory']['total_gb']} GB total, {resources['memory']['available_gb']} GB available")
    print(f" Disk: {resources['disk']['total_gb']} GB total, {resources['disk']['available_gb']} GB available")

    if resources['gpu']['total_gpus'] > 0:
        print(f" GPU: {resources['gpu']['total_gpus']} detected ({', '.join(resources['gpu']['available_backends'])})")
    else:
        print(" GPU: None detected")

    print("\n💡 Recommendations:")
    recs = resources['recommendations']
    print(f" Parallel Processing: {recs['parallel_processing'].get('strategy', 'N/A')}")
    print(f" Memory Strategy: {recs['memory_strategy'].get('strategy', 'N/A')}")
    print(f" GPU Acceleration: {'Available' if recs['gpu_acceleration'].get('available') else 'Not Available'}")


if __name__ == "__main__":
    main()
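
A minimal sketch of how a downstream script might consume the report written above, assuming the JSON layout produced by `generate_recommendations`. The helper names `load_resources`, `pick_worker_count`, and `should_chunk` are hypothetical and not part of this commit, and the `example` dictionary is hand-written for illustration rather than real detection output. The `"sequential"` branch sets no `suggested_workers` key, so the lookup falls back to a default instead of indexing directly.

```python
import json
from typing import Any, Dict


def load_resources(path: str = ".claude_resources.json") -> Dict[str, Any]:
    """Read the JSON report written by the detection script (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def pick_worker_count(resources: Dict[str, Any]) -> int:
    """Return the suggested worker count, or 1 when none is recorded.

    The "sequential" strategy stores only a note, so .get() with a
    default avoids a KeyError in that case.
    """
    parallel = resources.get("recommendations", {}).get("parallel_processing", {})
    return int(parallel.get("suggested_workers", 1))


def should_chunk(resources: Dict[str, Any]) -> bool:
    """Return True when the memory strategy suggests out-of-core or chunked work."""
    memory = resources.get("recommendations", {}).get("memory_strategy", {})
    return memory.get("strategy") in ("memory_constrained", "moderate_memory")


# Hand-written illustrative input, not real output from the script.
example = {
    "recommendations": {
        "parallel_processing": {
            "strategy": "sequential",
            "note": "Limited cores, prefer sequential processing",
        },
        "memory_strategy": {"strategy": "moderate_memory"},
    }
}

workers = pick_worker_count(example)
chunked = should_chunk(example)
print(f"workers={workers}, chunked={chunked}")
```

In practice the dictionary would come from `load_resources()` after running the script; reading every key through `.get()` keeps the consumer tolerant of branches that omit optional keys.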