---
name: pdb-database
description: Work with the RCSB Protein Data Bank (PDB) to search, retrieve, and analyze 3D structures of proteins, nucleic acids, and other biological macromolecules. Use this skill when working with protein structures, PDB IDs, crystallographic data, protein structure analysis, molecular visualization, structure-function relationships, or when needing to query or download structural biology data programmatically.
---

# PDB Database

## Overview

This skill provides tools and guidance for working with the RCSB Protein Data Bank (PDB), the worldwide repository for 3D structural data of biological macromolecules. The PDB contains over 200,000 experimentally determined structures of proteins, nucleic acids, and complex assemblies, along with computed structure models. Use this skill to search for structures, retrieve structural data, perform sequence and structure similarity searches, and integrate PDB data into computational workflows.

## Core Capabilities

### 1. Searching for Structures

Find PDB entries using various search criteria:

**Text Search:** Search by protein name, keywords, or descriptions

```python
from rcsbapi.search import TextQuery

query = TextQuery("hemoglobin")
results = list(query())
print(f"Found {len(results)} structures")
```

**Attribute Search:** Query specific properties (organism, resolution, method, etc.)
```python
from rcsbapi.search import AttributeQuery

# Find human protein structures
query = AttributeQuery(
    attribute="rcsb_entity_source_organism.scientific_name",
    operator="exact_match",
    value="Homo sapiens"
)
results = list(query())
```

**Sequence Similarity:** Find structures similar to a given sequence

```python
from rcsbapi.search import SeqSimilarityQuery

query = SeqSimilarityQuery(
    value="MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHHYREQIKRVKDSEDVPMVLVGNKCDLPSRTVDTKQAQDLARSYGIPFIETSAKTRQGVDDAFYTLVREIRKHKEKMSKDGKKKKKKSKTKCVIM",
    evalue_cutoff=0.1,
    identity_cutoff=0.9
)
results = list(query())
```

**Structure Similarity:** Find structures with similar 3D geometry

```python
from rcsbapi.search import StructSimilarityQuery

query = StructSimilarityQuery(
    structure_search_type="entry_id",
    entry_id="4HHB"  # Hemoglobin
)
results = list(query())
```

**Combining Queries:** Use logical operators to build complex searches

```python
from rcsbapi.search import AttributeQuery

# High-resolution human proteins
query1 = AttributeQuery(
    attribute="rcsb_entity_source_organism.scientific_name",
    operator="exact_match",
    value="Homo sapiens"
)
query2 = AttributeQuery(
    attribute="rcsb_entry_info.resolution_combined",
    operator="less",
    value=2.0
)
combined_query = query1 & query2  # AND operation
results = list(combined_query())
```

### 2. Retrieving Structure Data

Access detailed information about specific PDB entries:

**Basic Entry Information:**

```python
from rcsbapi.data import DataQuery

# Get entry-level data; only the fields listed in return_data_list are returned
query = DataQuery(
    input_type="entries",
    input_ids=["4HHB"],
    return_data_list=["struct.title", "exptl.method"]
)
entry_data = query.exec()["data"]["entries"][0]
print(entry_data["struct"]["title"])
print(entry_data["exptl"][0]["method"])
```

**Polymer Entity Information:**

```python
from rcsbapi.data import DataQuery

# Get protein/nucleic acid information
query = DataQuery(
    input_type="polymer_entities",
    input_ids=["4HHB_1"],
    return_data_list=["entity_poly.pdbx_seq_one_letter_code"]
)
entity_data = query.exec()["data"]["polymer_entities"][0]
print(entity_data["entity_poly"]["pdbx_seq_one_letter_code"])
```

**Using GraphQL for Flexible Queries:**

```python
import requests

# Custom GraphQL query
query = """
{
  entry(entry_id: "4HHB") {
    struct {
      title
    }
    exptl {
      method
    }
    rcsb_entry_info {
      resolution_combined
      deposited_atom_count
    }
  }
}
"""
# Send the raw query to the Data API GraphQL endpoint
response = requests.post("https://data.rcsb.org/graphql", json={"query": query})
response.raise_for_status()
data = response.json()["data"]
```

### 3. Downloading Structure Files

Retrieve coordinate files in various formats:

**Download Methods:**

- **PDB format** (legacy text format): `https://files.rcsb.org/download/{PDB_ID}.pdb`
- **mmCIF format** (modern standard): `https://files.rcsb.org/download/{PDB_ID}.cif`
- **BinaryCIF** (compressed binary): Use ModelServer API for efficient access
- **Biological assembly**: `https://files.rcsb.org/download/{PDB_ID}.pdb1` (for assembly 1)

**Example Download:**

```python
import requests

pdb_id = "4HHB"

# Download PDB format
pdb_url = f"https://files.rcsb.org/download/{pdb_id}.pdb"
response = requests.get(pdb_url)
response.raise_for_status()  # Avoid saving an error page as a structure file
with open(f"{pdb_id}.pdb", "w") as f:
    f.write(response.text)

# Download mmCIF format
cif_url = f"https://files.rcsb.org/download/{pdb_id}.cif"
response = requests.get(cif_url)
response.raise_for_status()
with open(f"{pdb_id}.cif", "w") as f:
    f.write(response.text)
```

### 4. Working with Structure Data

Common operations with retrieved structures:

**Parse and Analyze Coordinates:**

Use BioPython or other structural biology libraries to work with downloaded files:

```python
from Bio.PDB import PDBParser

# Expects 4HHB.pdb in the working directory (see the download example)
parser = PDBParser()
structure = parser.get_structure("protein", "4HHB.pdb")

# Iterate through atoms
for model in structure:
    for chain in model:
        for residue in chain:
            for atom in residue:
                print(atom.get_coord())
```

**Extract Metadata:**

```python
from rcsbapi.data import DataQuery

# Get experimental details
query = DataQuery(
    input_type="entries",
    input_ids=["4HHB"],
    return_data_list=[
        "rcsb_entry_info.resolution_combined",
        "exptl.method",
        "rcsb_accession_info.deposit_date"
    ]
)
data = query.exec()["data"]["entries"][0]

# resolution_combined is returned as a list of values
resolution = (data.get("rcsb_entry_info") or {}).get("resolution_combined")
method = (data.get("exptl") or [{}])[0].get("method")
deposition_date = (data.get("rcsb_accession_info") or {}).get("deposit_date")

print(f"Resolution: {resolution} Å")
print(f"Method: {method}")
print(f"Deposited: {deposition_date}")
```

### 5. Batch Operations

Process multiple structures efficiently:

```python
from rcsbapi.data import DataQuery

pdb_ids = ["4HHB", "1MBN", "1GZX"]  # Hemoglobin, myoglobin, etc.

# A single query retrieves the requested fields for all IDs
query = DataQuery(
    input_type="entries",
    input_ids=pdb_ids,
    return_data_list=[
        "struct.title",
        "rcsb_entry_info.resolution_combined",
        "polymer_entities.rcsb_entity_source_organism.scientific_name"
    ]
)

results = {}
try:
    for entry in query.exec()["data"]["entries"]:
        # Organism is stored per polymer entity; report the first one found
        entities = entry.get("polymer_entities") or [{}]
        organisms = entities[0].get("rcsb_entity_source_organism") or [{}]
        results[entry["rcsb_id"]] = {
            "title": entry["struct"]["title"],
            "resolution": (entry.get("rcsb_entry_info") or {}).get("resolution_combined"),
            "organism": organisms[0].get("scientific_name")
        }
except Exception as e:
    print(f"Error fetching entries: {e}")

# Display results
for pdb_id, info in results.items():
    print(f"\n{pdb_id}: {info['title']}")
    print(f"  Resolution: {info['resolution']} Å")
    print(f"  Organism: {info['organism']}")
```

## Python Package Installation

Install the official RCSB PDB Python API client:

```bash
# Current recommended package
pip install rcsb-api

# For legacy code (deprecated, use rcsb-api instead)
pip install rcsbsearchapi
```

The `rcsb-api` package provides unified access to both Search and Data APIs through the `rcsbapi.search` and `rcsbapi.data` modules.
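Before running the examples, it can help to confirm which of the two distributions is present in the active environment. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of `rcsb-api`.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; it only inspects local package
    metadata and never contacts PyPI or the RCSB servers.
    """
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Report the state of the recommended and the legacy distribution
for name in ("rcsb-api", "rcsbsearchapi"):
    version = installed_version(name)
    status = version if version is not None else "not installed"
    print(f"{name}: {status}")
```

If only `rcsbsearchapi` is reported, the import paths used in this document (`rcsbapi.search`, `rcsbapi.data`) will not resolve until `rcsb-api` is installed.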
## Common Use Cases

### Drug Discovery

- Search for structures of drug targets
- Analyze ligand binding sites
- Compare protein-ligand complexes
- Identify similar binding pockets

### Protein Engineering

- Find homologous structures for modeling
- Analyze sequence-structure relationships
- Compare mutant structures
- Study protein stability and dynamics

### Structural Biology Research

- Download structures for computational analysis
- Build structure-based alignments
- Analyze structural features (secondary structure, domains)
- Compare experimental methods and quality metrics

### Education and Visualization

- Retrieve structures for teaching
- Generate molecular visualizations
- Explore structure-function relationships
- Study evolutionary conservation

## Key Concepts

**PDB ID:** Unique 4-character identifier (e.g., "4HHB") for each structure entry. AlphaFold and ModelArchive entries start with "AF_" or "MA_" prefixes.

**mmCIF/PDBx:** Modern file format that uses key-value structure, replacing legacy PDB format for large structures.

**Biological Assembly:** The functional form of a macromolecule, which may contain multiple copies of chains from the asymmetric unit.

**Resolution:** Measure of detail in crystallographic structures (lower values = higher detail). Typical range: 1.5-3.5 Å for high-quality structures.

**Entity:** A unique molecular component in a structure (protein chain, DNA, ligand, etc.).

## Resources

This skill includes reference documentation in the `references/` directory:

### references/api_reference.md

Comprehensive API documentation covering:

- Detailed API endpoint specifications
- Advanced query patterns and examples
- Data schema reference
- Rate limiting and best practices
- Troubleshooting common issues

Use this reference when you need in-depth information about API capabilities, complex query construction, or detailed data schema information.
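As a quick offline companion to the identifier conventions under Key Concepts and the download URL patterns under Core Capabilities, the sketch below classifies an identifier and builds a file URL. The function names are hypothetical, and the regular expressions assume that classic entry IDs are a digit followed by three alphanumeric characters and that entity IDs append `_<number>` (as in "4HHB_1"); neither rule is taken from an official specification.

```python
import re

# Assumption: classic entry IDs look like "4HHB" (digit + 3 alphanumerics)
ENTRY_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")
# Assumption: entity IDs append an underscore and a number, e.g. "4HHB_1"
ENTITY_ID = re.compile(r"[0-9][A-Za-z0-9]{3}_[0-9]+")

FILE_BASE = "https://files.rcsb.org/download"
# File suffixes taken from the download URL patterns listed in this document
SUFFIXES = {"pdb": ".pdb", "cif": ".cif", "assembly1": ".pdb1"}


def classify_identifier(identifier: str) -> str:
    """Return 'entry', 'entity', 'computed_model', or 'unknown'."""
    text = identifier.strip().upper()
    if text.startswith(("AF_", "MA_")):
        return "computed_model"
    if ENTRY_ID.fullmatch(text):
        return "entry"
    if ENTITY_ID.fullmatch(text):
        return "entity"
    return "unknown"


def download_url(pdb_id: str, fmt: str = "cif") -> str:
    """Build a download URL for an experimental entry ID."""
    text = pdb_id.strip().upper()
    if classify_identifier(text) != "entry":
        raise ValueError(f"Not a 4-character entry ID: {pdb_id!r}")
    if fmt not in SUFFIXES:
        raise ValueError(f"Unsupported format: {fmt!r}")
    return f"{FILE_BASE}/{text}{SUFFIXES[fmt]}"


for example in ("4HHB", "4HHB_1"):
    print(example, "->", classify_identifier(example))
print(download_url("4hhb", "pdb"))
```

Validating identifiers before building URLs keeps malformed input from turning into failed HTTP requests later in a batch workflow.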
## Additional Resources

- **RCSB PDB Website:** https://www.rcsb.org
- **PDB-101 Educational Portal:** https://pdb101.rcsb.org
- **API Documentation:** https://www.rcsb.org/docs/programmatic-access/web-apis-overview
- **Python Package Docs:** https://rcsbapi.readthedocs.io/
- **Data API Documentation:** https://data.rcsb.org/
- **GitHub Repository:** https://github.com/rcsb/py-rcsb-api