Add more scientific skills

2026-03-28 07:33:45 +08:00 · 2025-10-19 14:12:02 -07:00
parent 78d5ac2b56
commit 660c8574d0
210 changed files with 88957 additions and 1 deletions
--- a/scientific-packages/biopython/references/core_modules.md
+++ b/scientific-packages/biopython/references/core_modules.md
@@ -0,0 +1,232 @@
+# BioPython Core Modules Reference
+
+This document provides detailed information about BioPython's core modules and their capabilities.
+
+## Sequence Handling
+
+### Bio.Seq - Sequence Objects
+
+Seq objects are BioPython's fundamental data structure for biological sequences, providing biological methods on top of string-like behavior.
+
+**Creation:**
+```python
+from Bio.Seq import Seq
+my_seq = Seq("AGTACACTGGT")
+```
+
+**Key Operations:**
+- String methods: `find()`, `count()`, `count_overlap()` (for overlapping patterns)
+- Complement/Reverse complement: Returns complementary sequences
+- Transcription: DNA → RNA (T → U)
+- Back transcription: RNA → DNA
+- Translation: DNA/RNA → protein with customizable genetic codes and stop codon handling
+
+**Use Cases:**
+- DNA/RNA sequence manipulation
+- Converting between nucleic acid types
+- Protein translation from coding sequences
+- Sequence searching and pattern counting
+
+### Bio.SeqRecord - Sequence Metadata
+
+SeqRecord wraps Seq objects with metadata like ID, description, and features.
+
+**Attributes:**
+- `seq`: The sequence itself (Seq object)
+- `id`: Unique identifier
+- `name`: Short name
+- `description`: Longer description
+- `features`: List of SeqFeature objects
+- `annotations`: Dictionary of additional information
+- `letter_annotations`: Per-letter annotations (e.g., quality scores)
+
+### Bio.SeqFeature - Sequence Annotations
+
+Manages sequence annotations and features such as genes, promoters, and coding regions.
+
+**Common Features:**
+- Gene locations
+- CDS (coding sequences)
+- Promoters and regulatory elements
+- Exons and introns
+- Protein domains
+
+## File Input/Output
+
+### Bio.SeqIO - Sequence File I/O
+
+Unified interface for reading and writing sequence files in multiple formats.
+
+**Supported Formats:**
+- FASTA/FASTQ: Standard sequence formats
+- GenBank/EMBL: Feature-rich annotation formats
+- Clustal/Stockholm/PHYLIP: Alignment formats
+- ABI/SFF: Trace and flowgram data
+- Swiss-Prot/PIR: Protein databases
+- PDB: Protein structure files
+
+**Key Functions:**
+
+**SeqIO.parse()** - Iterator for reading multiple records:
+```python
+from Bio import SeqIO
+for record in SeqIO.parse("file.fasta", "fasta"):
+    print(record.id, len(record.seq))
+```
+
+**SeqIO.read()** - Read single record:
+```python
+record = SeqIO.read("file.fasta", "fasta")
+```
+
+**SeqIO.write()** - Write sequences:
+```python
+SeqIO.write(sequences, "output.fasta", "fasta")
+```
+
+**SeqIO.convert()** - Direct format conversion:
+```python
+count = SeqIO.convert("input.gb", "genbank", "output.fasta", "fasta")
+```
+
+**SeqIO.index()** - Memory-efficient random access for large files:
+```python
+record_dict = SeqIO.index("large_file.fasta", "fasta")
+record = record_dict["seq_id"]  # a SeqRecord, parsed on demand
+```
+
+**SeqIO.to_dict()** - Load all records into dictionary (memory-based):
+```python
+record_dict = SeqIO.to_dict(SeqIO.parse("file.fasta", "fasta"))
+```
+
+**Common Patterns:**
+- Format conversion between FASTA, GenBank, FASTQ
+- Filtering sequences by length, ID, or content
+- Extracting subsequences
+- Batch processing large files with iterators
+
+### Bio.AlignIO - Multiple Sequence Alignment I/O
+
+Handles multiple sequence alignment files.
+
+**Key Functions:**
+- `write()`: Save alignments
+- `parse()`: Read multiple alignments
+- `read()`: Read single alignment
+- `convert()`: Convert between formats
+
+**Supported Formats:**
+- Clustal
+- PHYLIP (sequential and interleaved)
+- Stockholm
+- NEXUS
+- FASTA (aligned)
+- MAF (Multiple Alignment Format)
+
+## Sequence Alignment
+
+### Bio.Align - Alignment Tools
+
+**PairwiseAligner** - High-performance pairwise alignment:
+```python
+from Bio import Align
+aligner = Align.PairwiseAligner()
+aligner.mode = 'global'  # or 'local'
+aligner.match_score = 2
+aligner.mismatch_score = -1
+aligner.gap_score = -2.5
+alignments = aligner.align(seq1, seq2)
+```
+
+**CodonAligner** - Codon-aware alignment
+
+**MultipleSeqAlignment** - Container for MSA with column access
+
+### Bio.pairwise2 (Legacy)
+
+Legacy pairwise alignment module with functions like `pairwise2.align.globalxx()` and `pairwise2.align.localxx()`; prefer `Bio.Align.PairwiseAligner` for new code.
+
+## Sequence Analysis Utilities
+
+### Bio.SeqUtils - Sequence Analysis
+
+Collection of utility functions:
+
+**CheckSum** - Calculate sequence checksums (CRC32, CRC64, GCG)
+
+**MeltingTemp** - DNA melting temperature calculations:
+- Nearest-neighbor method
+- Wallace rule
+- GC content method
+
+**IsoelectricPoint** - Protein pI calculation
+
+**ProtParam** - Protein analysis:
+- Molecular weight
+- Aromaticity
+- Instability index
+- Secondary structure fractions
+
+**gc_fraction/GC_skew** - Calculate GC content (`gc_fraction`, which replaces the older `GC` function in recent releases) and GC skew over sequence windows
+
+### Bio.Data.CodonTable - Genetic Codes
+
+Access to NCBI genetic code tables:
+```python
+from Bio.Data import CodonTable
+standard_table = CodonTable.unambiguous_dna_by_id[1]
+print(standard_table.forward_table)  # codon to amino acid
+print(standard_table.back_table)     # amino acid to one representative codon
+print(standard_table.start_codons)
+print(standard_table.stop_codons)
+```
+
+**Available codes:**
+- Standard code (1)
+- Vertebrate mitochondrial (2)
+- Yeast mitochondrial (3)
+- And many more organism-specific codes
+
+## Sequence Motifs and Patterns
+
+### Bio.motifs - Sequence Motif Analysis
+
+Tools for working with sequence motifs:
+
+**Position Weight Matrices (PWM):**
+- Create PWM from aligned sequences
+- Calculate information content
+- Search sequences for motif matches
+- Generate consensus sequences
+
+**Position Specific Scoring Matrices (PSSM):**
+- Convert PWM to PSSM
+- Score sequences against motifs
+- Determine significance thresholds
+
+**Supported Formats:**
+- JASPAR
+- TRANSFAC
+- MEME
+- AlignAce
+
+### Bio.Restriction - Restriction Enzymes
+
+Comprehensive restriction enzyme database and analysis:
+
+**Capabilities:**
+- Search for restriction sites
+- Predict digestion products
+- Analyze restriction maps
+- Access enzyme properties (recognition site, cut positions, isoschizomers)
+
+**Example usage:**
+```python
+from Bio import Restriction
+from Bio.Seq import Seq
+
+seq = Seq("GAATTC...")
+enzyme = Restriction.EcoRI
+results = enzyme.search(seq)
+```
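Note on `core_modules.md`: the `Bio.Seq` section lists `count()` next to `count_overlap()` without showing how they differ. The sketch below illustrates the distinction in plain Python. It is not Biopython's implementation, and the helper name `count_overlapping` is invented for this illustration.

```python
def count_overlapping(sequence: str, pattern: str) -> int:
    """Count every position where pattern occurs, allowing overlaps."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    total = 0
    start = sequence.find(pattern)
    while start != -1:
        total += 1
        # Advance by one character so overlapping matches are still found.
        start = sequence.find(pattern, start + 1)
    return total


dna = "AAAA"
non_overlapping = dna.count("AA")            # str.count resumes after each match
overlapping = count_overlapping(dna, "AA")   # matches start at offsets 0, 1 and 2
print(non_overlapping, overlapping)
```

The non-overlapping count suits questions like "how many separate copies fit", while the overlapping count suits k-mer style questions where every start position matters.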
--- a/scientific-packages/biopython/references/database_tools.md
+++ b/scientific-packages/biopython/references/database_tools.md
@@ -0,0 +1,306 @@
+# BioPython Database Access and Search Tools
+
+This document covers BioPython's capabilities for accessing biological databases and performing sequence searches.
+
+## NCBI Database Access
+
+### Bio.Entrez - NCBI E-utilities Interface
+
+Provides programmatic access to NCBI databases including PubMed, GenBank, Protein, Nucleotide, and more.
+
+**Important:** Always set your email before using Entrez:
+```python
+from Bio import Entrez, SeqIO  # SeqIO parses fetched records in later examples
+Entrez.email = "your.email@example.com"
+```
+
+#### Core Query Functions
+
+**esearch** - Search databases and retrieve IDs:
+```python
+handle = Entrez.esearch(db="nucleotide", term="Homo sapiens[Organism] AND COX1")
+record = Entrez.read(handle)
+id_list = record["IdList"]
+```
+
+Parameters:
+- `db`: Database to search (nucleotide, protein, pubmed, etc.)
+- `term`: Search query
+- `retmax`: Maximum number of IDs to return
+- `sort`: Sort order (relevance, pub_date, etc.)
+- `usehistory`: Store results on server (useful for large queries)
+
+**efetch** - Retrieve full records:
+```python
+handle = Entrez.efetch(db="nucleotide", id="123456", rettype="gb", retmode="text")
+record = SeqIO.read(handle, "genbank")
+```
+
+Parameters:
+- `db`: Database name
+- `id`: Single ID or comma-separated list
+- `rettype`: Return type (gb, fasta, gp, xml, etc.)
+- `retmode`: Return mode (text, xml, asn.1)
+- Note: Biopython automatically switches to HTTP POST when more than 200 IDs are passed
+
+**elink** - Find related records across databases:
+```python
+handle = Entrez.elink(dbfrom="protein", db="gene", id="15718680")
+result = Entrez.read(handle)
+```
+
+Parameters:
+- `dbfrom`: Source database
+- `db`: Target database
+- `id`: ID(s) to link from
+- Depending on `cmd`, results can also include LinkOut providers or relevancy scores
+
+**esummary** - Get document summaries:
+```python
+handle = Entrez.esummary(db="protein", id="15718680")
+summary = Entrez.read(handle)
+print(summary[0]['Title'])
+```
+
+Returns quick overviews without full records.
+
+**einfo** - Get database statistics:
+```python
+handle = Entrez.einfo(db="nucleotide")
+info = Entrez.read(handle)
+```
+
+Provides field indices, term counts, update dates, and available links.
+
+**epost** - Upload ID lists to server:
+```python
+handle = Entrez.epost("nucleotide", id="123456,789012")
+result = Entrez.read(handle)
+webenv = result["WebEnv"]
+query_key = result["QueryKey"]
+```
+
+Useful for large queries split across multiple requests.
+
+**espell** - Get spelling suggestions:
+```python
+handle = Entrez.espell(term="brest cancer")
+result = Entrez.read(handle)
+print(result["CorrectedQuery"])  # "breast cancer"
+```
+
+**ecitmatch** - Convert citations to PubMed IDs:
+```python
+citation = "proc natl acad sci u s a|1991|88|3248|mann bj|"
+handle = Entrez.ecitmatch(db="pubmed", bdata=citation)
+```
+
+#### Data Processing Functions
+
+**Entrez.read()** - Parse XML to Python dictionary:
+```python
+handle = Entrez.esearch(db="protein", term="insulin")
+record = Entrez.read(handle)
+```
+
+**Entrez.parse()** - Generator for large XML results:
+```python
+handle = Entrez.efetch(db="protein", id=id_list, rettype="gp", retmode="xml")
+for record in Entrez.parse(handle):
+    process(record)
+```
+
+#### Common Workflows
+
+**Download sequences by accession:**
+```python
+handle = Entrez.efetch(db="nucleotide", id="NM_001301717", rettype="fasta", retmode="text")
+record = SeqIO.read(handle, "fasta")
+```
+
+**Search and download multiple sequences:**
+```python
+# Search
+search_handle = Entrez.esearch(db="nucleotide", term="human kinase", retmax="100")
+search_results = Entrez.read(search_handle)
+
+# Download
+fetch_handle = Entrez.efetch(db="nucleotide", id=search_results["IdList"], rettype="gb", retmode="text")
+for record in SeqIO.parse(fetch_handle, "genbank"):
+    print(record.id)
+```
+
+**Use WebEnv for large queries:**
+```python
+# Post IDs
+post_handle = Entrez.epost(db="nucleotide", id=",".join(large_id_list))
+post_result = Entrez.read(post_handle)
+
+# Fetch in batches
+batch_size = 500
+for start in range(0, len(large_id_list), batch_size):
+    fetch_handle = Entrez.efetch(
+        db="nucleotide",
+        rettype="fasta",
+        retmode="text",
+        retstart=start,
+        retmax=batch_size,
+        webenv=post_result["WebEnv"],
+        query_key=post_result["QueryKey"]
+    )
+    # Process batch
+```
+
+### Bio.GenBank - GenBank Format Parsing
+
+Low-level GenBank file parser (SeqIO is usually preferred).
+
+### Bio.SwissProt - Swiss-Prot/UniProt Parsing
+
+Parse Swiss-Prot and UniProtKB flat file format:
+```python
+from Bio import SwissProt
+with open("uniprot.dat") as handle:
+    for record in SwissProt.parse(handle):
+        print(record.entry_name, record.organism)
+```
+
+## Sequence Similarity Searches
+
+### Bio.Blast - BLAST Interface
+
+Tools for running BLAST searches and parsing results.
+
+#### Running BLAST
+
+**NCBI QBLAST (online):**
+```python
+from Bio.Blast import NCBIWWW
+result_handle = NCBIWWW.qblast("blastn", "nt", sequence)
+```
+
+Parameters:
+- Program: blastn, blastp, blastx, tblastn, tblastx
+- Database: nt, nr, refseq_rna, pdb, etc.
+- Sequence: string or Seq object
+- Additional parameters: `expect`, `word_size`, `hitlist_size`, `format_type`
+
+**Local BLAST:**
+Run standalone BLAST from command line, then parse results.
+
+#### Parsing BLAST Results
+
+**XML format (recommended):**
+```python
+from Bio.Blast import NCBIXML
+
+with open("blast_results.xml") as result_handle:
+    blast_records = NCBIXML.parse(result_handle)
+
+    for blast_record in blast_records:
+        for alignment in blast_record.alignments:
+            for hsp in alignment.hsps:
+                if hsp.expect < 0.001:
+                    print(f"Hit: {alignment.title}")
+                    print(f"Length: {alignment.length}")
+                    print(f"E-value: {hsp.expect}")
+                    print(f"Identities: {hsp.identities}/{hsp.align_length}")
+```
+
+**Functions:**
+- `NCBIXML.read()`: Single query
+- `NCBIXML.parse()`: Multiple queries (generator)
+
+**Key Record Attributes:**
+- `alignments`: List of matching sequences
+- `query`: Query sequence description (definition line)
+- `query_length`: Length of query
+
+**Alignment Attributes:**
+- `title`: Description of hit
+- `length`: Length of hit sequence
+- `hsps`: High-scoring segment pairs
+
+**HSP Attributes:**
+- `expect`: E-value
+- `score`, `bits`: Raw alignment score and bit score
+- `identities`: Number of identical residues
+- `positives`: Number of positive scoring matches
+- `gaps`: Number of gaps
+- `align_length`: Length of alignment
+- `query`: Aligned query sequence
+- `match`: Match indicators
+- `sbjct`: Aligned subject sequence
+- `query_start`, `query_end`: Query coordinates
+- `sbjct_start`, `sbjct_end`: Subject coordinates
+
+#### Common BLAST Workflows
+
+**Find homologs:**
+```python
+result = NCBIWWW.qblast("blastp", "nr", protein_sequence, expect=1e-10)
+with open("results.xml", "w") as out:
+    out.write(result.read())
+```
+
+**Filter results by criteria:**
+```python
+for alignment in blast_record.alignments:
+    for hsp in alignment.hsps:
+        if hsp.expect < 1e-5 and hsp.identities/hsp.align_length > 0.5:
+            # Process high-quality hits
+            pass
+```
+
+### Bio.SearchIO - Unified Search Results Parser
+
+Modern interface for parsing various search tool outputs (BLAST, HMMER, BLAT, etc.).
+
+**Key Functions:**
+- `read()`: Parse single query
+- `parse()`: Parse multiple queries (generator)
+- `write()`: Write results to file
+- `convert()`: Convert between formats
+
+**Supported Tools:**
+- BLAST (XML, tabular, plain text)
+- HMMER (hmmscan, hmmsearch, phmmer)
+- BLAT
+- FASTA
+- InterProScan
+- Exonerate
+
+**Example:**
+```python
+from Bio import SearchIO
+results = SearchIO.parse("blast_output.xml", "blast-xml")
+for result in results:
+    for hit in result:
+        if hit.hsps[0].evalue < 0.001:
+            print(hit.id, hit.hsps[0].evalue)
+```
+
+## Local Database Management
+
+### BioSQL - SQL Database Interface
+
+Store and manage biological sequences in SQL databases (PostgreSQL, MySQL, SQLite).
+
+**Features:**
+- Store SeqRecord objects with annotations
+- Efficient querying and retrieval
+- Cross-reference sequences
+- Track relationships between sequences
+
+**Example:**
+```python
+from BioSQL import BioSeqDatabase
+server = BioSeqDatabase.open_database(driver="MySQLdb", user="user", passwd="pass", host="localhost", db="bioseqdb")
+db = server["my_db"]
+
+# Store sequences (call server.commit() afterwards to persist them)
+db.load(SeqIO.parse("sequences.gb", "genbank"))
+
+# Query
+seq = db.lookup(accession="NC_005816")
+```
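Note on `database_tools.md`: the "Use WebEnv for large queries" workflow steps through the posted IDs in fixed-size windows. A minimal sketch of that windowing arithmetic follows, in plain Python with no network access. The helper name `batch_windows` and the totals are assumptions made for illustration; each yielded pair is what would be passed as `retstart` and `retmax` to `Entrez.efetch`.

```python
from typing import Iterator, Tuple


def batch_windows(total: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (retstart, retmax) pairs that together cover `total` records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, total, batch_size):
        # The final window is shortened so it does not run past the end.
        yield start, min(batch_size, total - start)


windows = list(batch_windows(total=1200, batch_size=500))
print(windows)
```

Capping the last window is optional when talking to the server, since a short final batch is simply returned as is, but it keeps progress reporting and record counts exact.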
--- a/scientific-packages/biopython/references/specialized_modules.md
+++ b/scientific-packages/biopython/references/specialized_modules.md
@@ -0,0 +1,612 @@
+# BioPython Specialized Analysis Modules
+
+This document covers BioPython's specialized modules for structural biology, phylogenetics, population genetics, and other advanced analyses.
+
+## Structural Bioinformatics
+
+### Bio.PDB - Protein Structure Analysis
+
+Comprehensive tools for handling macromolecular crystal structures.
+
+#### Structure Hierarchy
+
+PDB structures are organized hierarchically:
+- **Structure** → Models → Chains → Residues → Atoms
+
+```python
+from Bio.PDB import PDBParser
+
+parser = PDBParser()
+structure = parser.get_structure("protein", "1abc.pdb")
+
+# Navigate hierarchy
+for model in structure:
+    for chain in model:
+        for residue in chain:
+            for atom in residue:
+                print(atom.coord)  # xyz coordinates
+```
+
+#### Parsing Structure Files
+
+**PDB format:**
+```python
+from Bio.PDB import PDBParser
+parser = PDBParser(QUIET=True)
+structure = parser.get_structure("myprotein", "structure.pdb")
+```
+
+**mmCIF format:**
+```python
+from Bio.PDB import MMCIFParser
+parser = MMCIFParser(QUIET=True)
+structure = parser.get_structure("myprotein", "structure.cif")
+```
+
+**Fast mmCIF parser:**
+```python
+from Bio.PDB import FastMMCIFParser
+parser = FastMMCIFParser(QUIET=True)
+structure = parser.get_structure("myprotein", "structure.cif")
+```
+
+**MMTF format:**
+```python
+from Bio.PDB.mmtf import MMTFParser  # needs the optional mmtf-python package
+parser = MMTFParser()
+structure = parser.get_structure("structure.mmtf")
+```
+
+**Binary CIF:**
+```python
+from Bio.PDB.binary_cif import BinaryCIFParser
+parser = BinaryCIFParser()
+structure = parser.get_structure("myprotein", "structure.bcif")
+```
+
+#### Downloading Structures
+
+```python
+from Bio.PDB import PDBList
+pdbl = PDBList()
+
+# Download specific structure
+pdbl.retrieve_pdb_file("1ABC", file_format="pdb", pdir="structures/")
+
+# Download all obsolete entries
+pdbl.download_obsolete_entries(pdir="obsolete/")
+
+# Update local PDB mirror
+pdbl.update_pdb()
+```
+
+#### Structure Selection and Filtering
+
+```python
+# Select specific chains
+chain_A = structure[0]['A']
+
+# Select specific residues
+residue_10 = chain_A[10]
+
+# Select specific atoms
+ca_atom = residue_10['CA']
+
+# Iterate over specific atom types
+for atom in structure.get_atoms():
+    if atom.name == 'CA':  # Alpha carbons only
+        print(atom.coord)
+```
+
+**Structure selectors:**
+```python
+from Bio.PDB.Polypeptide import is_aa
+
+# Filter by residue type
+for residue in structure.get_residues():
+    if is_aa(residue):
+        print(f"Amino acid: {residue.resname}")
+```
+
+#### Secondary Structure Analysis
+
+**DSSP integration:**
+```python
+from Bio.PDB import DSSP
+
+# Requires DSSP program installed
+model = structure[0]
+dssp = DSSP(model, "structure.pdb")
+
+# Access secondary structure
+for key in dssp.keys():
+    secondary_structure = dssp[key][2]
+    accessibility = dssp[key][3]
+    print(f"Residue {key}: {secondary_structure}, accessible: {accessibility}")
+```
+
+DSSP codes:
+- H: Alpha helix
+- B: Beta bridge
+- E: Extended strand (beta sheet)
+- G: 3-10 helix
+- I: Pi helix
+- T: Turn
+- S: Bend
+- -: Coil
+
+#### Solvent Accessibility
+
+**Shrake-Rupley algorithm:**
+```python
+from Bio.PDB import ShrakeRupley
+
+sr = ShrakeRupley()
+sr.compute(structure, level="R")  # R=residue, A=atom, C=chain, M=model, S=structure
+
+for residue in structure.get_residues():
+    print(f"{residue.resname} {residue.id[1]}: {residue.sasa} Ų")
+```
+
+**NACCESS wrapper:**
+```python
+from Bio.PDB.NACCESS import NACCESS
+
+# Requires NACCESS program
+naccess = NACCESS(structure[0], "structure.pdb")
+for residue, data in naccess:
+    print(f"Residue {residue.id[1]}: {data['all_atoms_abs']} Ų")
+```
+
+**Half-sphere exposure:**
+```python
+from Bio.PDB.HSExposure import HSExposureCA
+
+# Computed from CA coordinates alone; no external program required
+model = structure[0]
+hse = HSExposureCA(model)
+# Results are also stored in each residue's xtra dictionary
+
+for chain in model:
+    for residue in chain:
+        if 'EXP_HSE_A_U' in residue.xtra:
+            hse_up = residue.xtra['EXP_HSE_A_U']
+            hse_down = residue.xtra['EXP_HSE_A_D']
+```
+
+#### Structural Alignment and Superimposition
+
+**Standard superimposition:**
+```python
+from Bio.PDB import Superimposer
+
+sup = Superimposer()
+sup.set_atoms(ref_atoms, alt_atoms)  # Lists of atoms to align
+sup.apply(structure2.get_atoms())  # Apply transformation
+
+print(f"RMSD: {sup.rms}")
+print(f"Rotation matrix: {sup.rotran[0]}")
+print(f"Translation vector: {sup.rotran[1]}")
+```
+
+**QCP (Quaternion Characteristic Polynomial) method:**
+```python
+from Bio.PDB import QCPSuperimposer
+
+qcp = QCPSuperimposer()
+qcp.set(ref_coords, alt_coords)
+qcp.run()
+print(f"RMSD: {qcp.get_rms()}")
+```
+
+#### Geometric Calculations
+
+**Distances and angles:**
+```python
+# Distance between atoms (subtracting two Atom objects returns the distance in Å)
+dist = atom1 - atom2
+
+# Angle between three atoms, in radians (expects Vector objects, not raw coordinates)
+from Bio.PDB import calc_angle
+angle = calc_angle(atom1.get_vector(), atom2.get_vector(), atom3.get_vector())
+
+# Dihedral angle, in radians
+from Bio.PDB import calc_dihedral
+dihedral = calc_dihedral(atom1.get_vector(), atom2.get_vector(), atom3.get_vector(), atom4.get_vector())
+```
+
+**Vector operations:**
+```python
+from Bio.PDB.vectors import Vector
+
+v1 = Vector(atom1.coord)
+v2 = Vector(atom2.coord)
+
+# Vector operations
+v3 = v1 + v2
+v4 = v1 - v2
+dot_product = v1 * v2
+cross_product = v1 ** v2
+magnitude = v1.norm()
+normalized = v1.normalized()
+```
+
+#### Internal Coordinates
+
+Advanced residue geometry representation:
+```python
+from Bio.PDB import internal_coords
+
+# Enable internal coordinates
+structure.atom_to_internal_coordinates()
+
+# Access phi, psi angles
+for residue in structure.get_residues():
+    if residue.internal_coord:
+        print(f"Phi: {residue.internal_coord.get_angle('phi')}")
+        print(f"Psi: {residue.internal_coord.get_angle('psi')}")
+```
+
+#### Writing Structures
+
+```python
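+from Bio.PDB import PDBIO, Select
+
+class ChainSelector(Select):
+    """Accept only the chain with the given ID."""
+    def __init__(self, chain_id):
+        self.chain_id = chain_id
+    def accept_chain(self, chain):
+        return chain.id == self.chain_id
+
+io = PDBIO()
+io.set_structure(structure)
+io.save("output.pdb")
+
+# Save specific selection
+io.save("chain_A.pdb", select=ChainSelector("A"))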
+```
+
+### Bio.SCOP - SCOP Database
+
+Access to Structural Classification of Proteins database.
+
+### Bio.KEGG - Pathway Analysis
+
+Interface to KEGG (Kyoto Encyclopedia of Genes and Genomes) databases:
+
+**Capabilities:**
+- Access pathway maps
+- Retrieve enzyme data
+- Get compound information
+- Query orthology relationships
+
+## Phylogenetics
+
+### Bio.Phylo - Phylogenetic Tree Analysis
+
+Comprehensive phylogenetic tree manipulation and analysis.
+
+#### Reading and Writing Trees
+
+**Supported formats:**
+- Newick: Simple, widely-used format
+- NEXUS: Rich metadata format
+- PhyloXML: XML-based with extensive annotations
+- NeXML: Modern XML standard
+
+```python
+from Bio import Phylo
+
+# Read tree
+tree = Phylo.read("tree.nwk", "newick")
+
+# Read multiple trees
+trees = list(Phylo.parse("trees.nex", "nexus"))
+
+# Write tree
+Phylo.write(tree, "output.nwk", "newick")
+```
+
+#### Tree Visualization
+
+**ASCII visualization:**
+```python
+Phylo.draw_ascii(tree)
+```
+
+**Matplotlib plotting:**
+```python
+import matplotlib.pyplot as plt
+Phylo.draw(tree)
+plt.show()
+
+# With customization
+fig, ax = plt.subplots(figsize=(10, 8))
+Phylo.draw(tree, axes=ax, do_show=False)
+ax.set_title("My Phylogenetic Tree")
+plt.show()
+```
+
+#### Tree Navigation and Manipulation
+
+**Find clades:**
+```python
+# Get all terminal nodes (leaves)
+terminals = tree.get_terminals()
+
+# Get all nonterminal nodes
+nonterminals = tree.get_nonterminals()
+
+# Find specific clade
+target = tree.find_any(name="Species_A")
+
+# Find all matching clades
+matches = tree.find_clades(terminal=True)
+```
+
+**Tree properties:**
+```python
+# Count terminals
+num_species = tree.count_terminals()
+
+# Get total branch length
+total_length = tree.total_branch_length()
+
+# Check if tree is bifurcating
+is_bifurcating = tree.is_bifurcating()
+
+# Get maximum distance from root to any node
+max_dist = max(tree.depths().values())
+```
+
+**Tree modification:**
+```python
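+# Prune tree to specific taxa (prune() removes one terminal per call)
+keep_taxa = {"Species_A", "Species_B", "Species_C"}
+for leaf in tree.get_terminals():
+    if leaf.name not in keep_taxa:
+        tree.prune(leaf)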
+
+# Collapse short branches
+tree.collapse_all(lambda c: c.branch_length is not None and c.branch_length < 0.01)
+
+# Ladderize (sort branches)
+tree.ladderize()
+
+# Root tree at midpoint
+tree.root_at_midpoint()
+
+# Root at specific clade
+outgroup = tree.find_any(name="Outgroup_species")
+tree.root_with_outgroup(outgroup)
+```
+
+**Calculate distances:**
+```python
+# Distance between two clades
+dist = tree.distance(clade1, clade2)
+
+# Distance from root
+root_dist = tree.distance(tree.root, terminal_clade)
+```
+
+#### Tree Construction
+
+**Distance-based methods:**
+```python
+from Bio.Phylo.TreeConstruction import DistanceTreeConstructor, DistanceCalculator
+from Bio import AlignIO
+
+# Load alignment
+aln = AlignIO.read("alignment.fasta", "fasta")
+
+# Calculate distance matrix
+calculator = DistanceCalculator('identity')
+dm = calculator.get_distance(aln)
+
+# Construct tree using UPGMA
+constructor = DistanceTreeConstructor()
+tree_upgma = constructor.upgma(dm)
+
+# Or using Neighbor-Joining
+tree_nj = constructor.nj(dm)
+```
+
+**Parsimony method:**
+```python
+from Bio.Phylo.TreeConstruction import ParsimonyScorer, NNITreeSearcher
+
+scorer = ParsimonyScorer()
+searcher = NNITreeSearcher(scorer)
+tree = searcher.search(starting_tree, alignment)
+```
+
+**Distance calculators:**
+- 'identity': Simple identity scoring
+- 'blastn': BLAST nucleotide scoring
+- 'blastp': BLAST protein scoring
+- 'dnafull': EMBOSS DNA scoring matrix
+- 'blosum62': BLOSUM62 protein matrix
+- 'pam250': PAM250 protein matrix
+
+#### Consensus Trees
+
+```python
+from Bio.Phylo.Consensus import majority_consensus, strict_consensus
+
+# Strict consensus
+consensus_strict = strict_consensus(trees)
+
+# Majority rule consensus
+consensus_majority = majority_consensus(trees, cutoff=0.5)
+
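+# Bootstrap consensus: resample the alignment 100 times and rebuild a tree each time
+from Bio.Phylo.Consensus import bootstrap_consensus
+bootstrap_tree = bootstrap_consensus(aln, 100, DistanceTreeConstructor(calculator, "nj"), majority_consensus)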
+```
+
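+#### External Tool Wrappers
+
+These wrappers build on `Bio.Application`, which recent Biopython releases deprecate in favor of calling the tools directly through `subprocess`.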
+
+**PhyML:**
+```python
+from Bio.Phylo.Applications import PhymlCommandline
+
+cmd = PhymlCommandline(input="alignment.phy", datatype="nt", model="HKY85", alpha="e", bootstrap=100)
+stdout, stderr = cmd()
+tree = Phylo.read("alignment.phy_phyml_tree.txt", "newick")
+```
+
+**RAxML:**
+```python
+from Bio.Phylo.Applications import RaxmlCommandline
+
+cmd = RaxmlCommandline(
+    sequences="alignment.phy",
+    model="GTRGAMMA",
+    name="mytree",
+    parsimony_seed=12345
+)
+stdout, stderr = cmd()
+```
+
+**FastTree:**
+```python
+from Bio.Phylo.Applications import FastTreeCommandline
+
+cmd = FastTreeCommandline(input="alignment.fasta", out="tree.nwk", gtr=True, gamma=True)
+stdout, stderr = cmd()
+```
+
+### Bio.Phylo.PAML - Evolutionary Analysis
+
+Interface to PAML (Phylogenetic Analysis by Maximum Likelihood):
+
+**CODEML - Codon-based analysis:**
+```python
+from Bio.Phylo.PAML import codeml
+
+cml = codeml.Codeml()
+cml.alignment = "alignment.phy"
+cml.tree = "tree.nwk"
+cml.out_file = "results.out"
+cml.working_dir = "./paml_wd"
+
+# Set parameters
+cml.set_options(
+    seqtype=1,      # Codon sequences
+    model=0,        # One omega ratio
+    NSsites=[0, 1, 2],  # Test different models
+    CodonFreq=2,    # F3x4 codon frequencies
+)
+
+results = cml.run()
+```
+
+**BaseML - Nucleotide-based analysis:**
+```python
+from Bio.Phylo.PAML import baseml
+
+bml = baseml.Baseml()
+bml.alignment = "alignment.phy"
+bml.tree = "tree.nwk"
+results = bml.run()
+```
+
+**YN00 - Yang-Nielsen method:**
+```python
+from Bio.Phylo.PAML import yn00
+
+yn = yn00.Yn00()
+yn.alignment = "alignment.phy"
+results = yn.run()
+```
+
+## Population Genetics
+
+### Bio.PopGen - Population Genetics Analysis
+
+Tools for population-level genetic analysis, built around GenePop file parsing and a wrapper for the external GenePop program.
+
+**Capabilities:**
+- Allele frequency calculations
+- Hardy-Weinberg equilibrium testing
+- Linkage disequilibrium analysis
+- F-statistics (FST, FIS, FIT)
+- Tajima's D
+- Population structure analysis
+
+## Clustering and Machine Learning
+
+### Bio.Cluster - Clustering Algorithms
+
+Statistical clustering for gene expression and other biological data:
+
+**Hierarchical clustering:**
+```python
+from Bio.Cluster import treecluster
+
+tree = treecluster(data, method='a', dist='e')
+# method: 'a'=average, 's'=single, 'm'=maximum, 'c'=centroid
+# dist: 'e'=Euclidean, 'c'=correlation, 'a'=absolute correlation
+```
+
+**k-means clustering:**
+```python
+from Bio.Cluster import kcluster
+
+clusterid, error, nfound = kcluster(data, nclusters=5, npass=100)
+```
+
+**Self-Organizing Maps (SOM):**
+```python
+from Bio.Cluster import somcluster
+
+clusterid, celldata = somcluster(data, nxgrid=3, nygrid=3)
+```
+
+**Principal Component Analysis:**
+```python
+from Bio.Cluster import pca
+
+columnmean, coordinates, components, eigenvalues = pca(data)
+```
+
+## Visualization
+
+### Bio.Graphics - Genomic Visualization
+
+Tools for creating publication-quality biological graphics.
+
+**GenomeDiagram - Circular and linear genome maps:**
+```python
+from Bio.Graphics import GenomeDiagram
+from Bio import SeqIO
+
+record = SeqIO.read("genome.gb", "genbank")
+
+gd_diagram = GenomeDiagram.Diagram("Genome Map")
+gd_track = gd_diagram.new_track(1, greytrack=True)
+gd_feature_set = gd_track.new_set()
+
+# Add features
+for feature in record.features:
+    if feature.type == "gene":
+        gd_feature_set.add_feature(feature, color="blue", label=True)
+
+gd_diagram.draw(format="linear", pagesize='A4', fragments=1)
+gd_diagram.write("genome_map.pdf", "PDF")
+```
+
+**Chromosomes - Chromosome visualization:**
+```python
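+from Bio.Graphics.BasicChromosome import Chromosome, ChromosomeSegment
+from reportlab.lib import colors
+
+chromosome = Chromosome("Chromosome 1")
+# Chromosome.add() takes segment objects; lengths follow gene1 (1000-2000) and gene2 (3000-4500)
+for length, fill in [(1000, colors.red), (1500, colors.blue)]:
+    segment = ChromosomeSegment()
+    segment.scale = length
+    segment.fill_color = fill
+    chromosome.add(segment)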
+```
+
+## Phenotype Analysis
+
+### Bio.phenotype - Phenotypic Microarray Analysis
+
+Tools for analyzing phenotypic microarray data (e.g., Biolog plates):
+
+**Capabilities:**
+- Parse PM plate data
+- Growth curve analysis
+- Compare phenotypic profiles
+- Calculate similarity metrics
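Note on `specialized_modules.md`: the "Geometric Calculations" section relies on two conventions that are easy to miss: a three-point angle is measured at the middle atom, and it is expressed in radians. The sketch below reproduces that geometry in plain Python so it can be checked by hand. The helper name `angle_at_vertex` and the coordinates are made up for illustration; this is not the `Bio.PDB` implementation.

```python
import math
from typing import Sequence


def angle_at_vertex(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> float:
    """Return the angle a-b-c in radians, measured at the middle point b."""
    ba = [p - q for p, q in zip(a, b)]
    bc = [p - q for p, q in zip(c, b)]
    norm_product = math.hypot(*ba) * math.hypot(*bc)
    if norm_product == 0.0:
        raise ValueError("the vertex must differ from both end points")
    cosine = sum(p * q for p, q in zip(ba, bc)) / norm_product
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, cosine)))


# Hypothetical coordinates standing in for three bonded atoms.
n_atom, ca_atom, c_atom = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)
distance = math.dist(n_atom, ca_atom)
angle = angle_at_vertex(n_atom, ca_atom, c_atom)
print(distance, math.degrees(angle))
```

Converting with `math.degrees` at the point of display, and keeping radians everywhere else, avoids mixing units when angles feed into further trigonometry.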