--- name: primekg description: Query the Precision Medicine Knowledge Graph (PrimeKG) for multiscale biological data including genes, drugs, diseases, phenotypes, and more. license: Unknown metadata: skill-author: K-Dense Inc. (PrimeKG original from Harvard MIMS) --- # PrimeKG Knowledge Graph Skill ## Overview PrimeKG is a precision medicine knowledge graph that integrates over 20 primary databases and high-quality scientific literature into a single resource. It contains over 100,000 nodes and 4 million edges across 29 relationship types, including drug-target, disease-gene, and phenotype-disease associations. **Key capabilities:** - Search for nodes (genes, proteins, drugs, diseases, phenotypes) - Retrieve direct neighbors (associated entities and clinical evidence) - Analyze local disease context (related genes, drugs, phenotypes) - Identify drug-disease paths (potential repurposing opportunities) **Data access:** Programmatic access via `query_primekg.py`. Data is stored at `C:\Users\eamon\Documents\Data\PrimeKG\kg.csv`. ## When to Use This Skill This skill should be used when: - **Knowledge-based drug discovery:** Identifying targets and mechanisms for diseases. - **Drug repurposing:** Finding existing drugs that might have evidence for new indications. - **Phenotype analysis:** Understanding how symptoms/phenotypes relate to diseases and genes. - **Multiscale biology:** Bridging the gap between molecular targets (genes) and clinical outcomes (diseases). - **Network pharmacology:** Investigating the broader network effects of drug-target interactions. ## Core Workflow ### 1. Search for Entities Find identifiers for genes, drugs, or diseases. ```python from scripts.query_primekg import search_nodes # Search for Alzheimer's disease nodes results = search_nodes("Alzheimer", node_type="disease") # Returns: [{"id": "EFO_0000249", "type": "disease", "name": "Alzheimer's disease", ...}] ``` ### 2. Get Neighbors (Direct Associations) Retrieve all connected nodes and relationship types. ```python from scripts.query_primekg import get_neighbors # Get all neighbors of a specific disease ID neighbors = get_neighbors("EFO_0000249") # Returns: List of neighbors like {"neighbor_name": "APOE", "relation": "disease_gene", ...} ``` ### 3. Analyze Disease Context A high-level function to summarize associations for a disease. ```python from scripts.query_primekg import get_disease_context # Comprehensive summary for a disease context = get_disease_context("Alzheimer's disease") # Access: context['associated_genes'], context['associated_drugs'], context['phenotypes'] ``` ## Relationship Types in PrimeKG The graph contains several key relationship types including: - `protein_protein`: Physical PPIs - `drug_protein`: Drug target/mechanism associations - `disease_gene`: Genetic associations - `drug_disease`: Indications and contraindications - `disease_phenotype`: Clinical signs and symptoms - `gwas`: Genome-wide association studies evidence ## Best Practices 1. **Use specific IDs:** When using `get_neighbors`, ensure you have the correct ID from `search_nodes`. 2. **Context first:** Use `get_disease_context` for a broad overview before diving into specific genes or drugs. 3. **Filter relationships:** Use the `relation_type` filter in `get_neighbors` to focus on specific evidence (e.g., only `drug_protein`). 4. **Multiscale integration:** Combine with `OpenTargets` for deeper genetic evidence or `Semantic Scholar` for the latest literature context. ## Resources ### Scripts - `scripts/query_primekg.py`: Core functions for searching and querying the knowledge graph. ### Data Path - Data: `/mnt/c/Users/eamon/Documents/Data/PrimeKG/kg.csv` - Total nodes: ~129,000 - Total edges: ~4,000,000 - Database: CSV-based, optimized for pandas querying.