Add more scientific skills

2026-01-26 16:58:56 +08:00 · 2025-10-19 14:12:02 -07:00
parent 78d5ac2b56
commit 660c8574d0
210 changed files with 88957 additions and 1 deletions
--- a/scientific-packages/biopython/references/core_modules.md
+++ b/scientific-packages/biopython/references/core_modules.md
@@ -0,0 +1,232 @@
+# BioPython Core Modules Reference
+
+This document describes BioPython's core modules and what each one is used for.
+
+## Sequence Handling
+
+### Bio.Seq - Sequence Objects
+
+Seq objects are BioPython's fundamental data structure for biological sequences, providing biological methods on top of string-like behavior.
+
+**Creation:**
+```python
+from Bio.Seq import Seq
+my_seq = Seq("AGTACACTGGT")
+```
+
+**Key Operations:**
+- String methods: `find()`, `count()`, `count_overlap()` (for overlapping patterns)
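+- Complement/Reverse complement: `complement()` and `reverse_complement()` return the complementary strand, in the same or reversed orientation
+- Transcription: `transcribe()` converts DNA → RNA (T → U)
+- Back transcription: `back_transcribe()` converts RNA → DNA (U → T)
+- Translation: `translate()` converts DNA/RNA → protein with customizable genetic codes and stop codon handling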
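As a minimal sketch of the operations listed above, assuming a recent Biopython release is installed. The 12-base sequence is a made-up example, not taken from a real gene.

```python
from Bio.Seq import Seq

# Hypothetical coding sequence: ATG GCC ATT TAA (Met, Ala, Ile, stop)
coding_dna = Seq("ATGGCCATTTAA")

# String-like searching and counting
first_gcc = coding_dna.find("GCC")              # 0-based index of the first match
non_overlapping = Seq("AAAA").count("AA")       # behaves like str.count()
overlapping = Seq("AAAA").count_overlap("AA")   # also counts overlapping hits

# Strand operations
complement = coding_dna.complement()
reverse_complement = coding_dna.reverse_complement()

# Transcription and back transcription
mrna = coding_dna.transcribe()
dna_again = mrna.back_transcribe()

# Translation with the standard genetic code (NCBI table 1)
protein_with_stop = coding_dna.translate()           # keeps "*" for the stop codon
protein = coding_dna.translate(to_stop=True)         # stops before the first stop codon

print(complement, reverse_complement, mrna, protein_with_stop, protein)
```

The `count()` versus `count_overlap()` pair is the detail most likely to surprise: on `AAAA` the first finds two copies of `AA` and the second finds three.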
+
+**Use Cases:**
+- DNA/RNA sequence manipulation
+- Converting between nucleic acid types
+- Protein translation from coding sequences
+- Sequence searching and pattern counting
+
+### Bio.SeqRecord - Sequence Metadata
+
+SeqRecord wraps Seq objects with metadata like ID, description, and features.
+
+**Attributes:**
+- `seq`: The sequence itself (Seq object)
+- `id`: Unique identifier
+- `name`: Short name
+- `description`: Longer description
+- `features`: List of SeqFeature objects
+- `annotations`: Dictionary of additional information
+- `letter_annotations`: Per-letter annotations (e.g., quality scores)
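The attributes above can be set when building a record by hand. This is a small sketch assuming a recent Biopython release; the ID, name, description, and quality values are hypothetical placeholders.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Build a record manually; all metadata values here are invented for illustration
record = SeqRecord(
    Seq("ATGGCCATTTAA"),
    id="demo_1",
    name="demo",
    description="hypothetical example record",
    annotations={"molecule_type": "DNA"},
)

# Per-letter annotations must have exactly one entry per residue
record.letter_annotations["phred_quality"] = [30] * len(record.seq)

# Slicing a record slices the sequence and its per-letter annotations together
prefix = record[:6]

print(record.id, record.description)
print(prefix.seq, prefix.letter_annotations["phred_quality"])
```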
+
+### Bio.SeqFeature - Sequence Annotations
+
+Manages sequence annotations and features such as genes, promoters, and coding regions.
+
+**Common Features:**
+- Gene locations
+- CDS (coding sequences)
+- Promoters and regulatory elements
+- Exons and introns
+- Protein domains
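One way to see what a feature represents is to define a location and pull the matching subsequence out of its parent. A minimal sketch, assuming a recent Biopython release; the sequence, coordinates, and `gene` qualifier are hypothetical.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, FeatureLocation

# Hypothetical parent sequence: 3 flanking bases, a 12-base CDS, 3 flanking bases
parent = Seq("CCCATGGCCATTTAAGGG")

# Locations use 0-based, end-exclusive coordinates like Python slices
cds = SeqFeature(
    FeatureLocation(3, 15, strand=1),
    type="CDS",
    qualifiers={"gene": ["demoA"]},  # invented qualifier for illustration
)
cds_seq = cds.extract(parent)        # forward strand: same as parent[3:15]
cds_protein = cds_seq.translate()

# A reverse-strand feature is extracted as the reverse complement
minus_feature = SeqFeature(FeatureLocation(0, 3, strand=-1), type="misc_feature")
minus_seq = minus_feature.extract(parent)

print(cds_seq, cds_protein, minus_seq)
```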
+
+## File Input/Output
+
+### Bio.SeqIO - Sequence File I/O
+
+Unified interface for reading and writing sequence files in multiple formats.
+
+**Supported Formats:**
+- FASTA/FASTQ: Standard sequence formats
+- GenBank/EMBL: Feature-rich annotation formats
+- Clustal/Stockholm/PHYLIP: Alignment formats
+- ABI/SFF: Trace and flowgram data
+- Swiss-Prot/PIR: Protein databases
+- PDB: Sequences extracted from protein structure files
+
+**Key Functions:**
+
+**SeqIO.parse()** - Iterator for reading multiple records:
+```python
+from Bio import SeqIO
+for record in SeqIO.parse("file.fasta", "fasta"):
+    print(record.id, len(record.seq))
+```
+
+**SeqIO.read()** - Read single record:
+```python
+record = SeqIO.read("file.fasta", "fasta")
+```
+
+**SeqIO.write()** - Write sequences:
+```python
+SeqIO.write(sequences, "output.fasta", "fasta")
+```
+
+**SeqIO.convert()** - Direct format conversion:
+```python
+count = SeqIO.convert("input.gb", "genbank", "output.fasta", "fasta")
+```
+
+**SeqIO.index()** - Memory-efficient random access for large files:
+```python
+record_dict = SeqIO.index("large_file.fasta", "fasta")
+record = record_dict["seq_id"]  # a SeqRecord, parsed from disk on access
+```
+
+**SeqIO.to_dict()** - Load all records into a dictionary (everything is held in memory):
+```python
+record_dict = SeqIO.to_dict(SeqIO.parse("file.fasta", "fasta"))
+```
+
+**Common Patterns:**
+- Format conversion between FASTA, GenBank, FASTQ
+- Filtering sequences by length, ID, or content
+- Extracting subsequences
+- Batch processing large files with iterators
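The filtering and batch-processing patterns above can be combined by chaining a generator between `SeqIO.parse()` and `SeqIO.write()`, so only one record is held in memory at a time. This sketch uses in-memory `StringIO` handles and invented FASTA content so it runs without any files; the 8-base threshold is an arbitrary choice.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical FASTA input with one short and one long record
fasta_text = ">short\nACGT\n>long\nACGTACGTACGT\n"

# parse() yields records lazily; the generator filters them one by one
records = SeqIO.parse(StringIO(fasta_text), "fasta")
long_records = (rec for rec in records if len(rec.seq) >= 8)

# write() accepts any iterable of SeqRecord objects and returns how many it wrote
output = StringIO()
n_written = SeqIO.write(long_records, output, "fasta")

print(n_written)
print(output.getvalue())
```

With real data, the `StringIO` objects would be replaced by file names or open file handles.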
+
+### Bio.AlignIO - Multiple Sequence Alignment I/O
+
+Handles multiple sequence alignment files.
+
+**Key Functions:**
+- `write()`: Save alignments
+- `parse()`: Read multiple alignments
+- `read()`: Read single alignment
+- `convert()`: Convert between formats
+
+**Supported Formats:**
+- Clustal
+- PHYLIP (sequential and interleaved)
+- Stockholm
+- NEXUS
+- FASTA (aligned)
+- MAF (Multiple Alignment Format)
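A short sketch of reading one alignment and writing it in another format, assuming a recent Biopython release. The three aligned sequences are invented, and `StringIO` handles stand in for files.

```python
from io import StringIO

from Bio import AlignIO

# Hypothetical aligned FASTA: all rows must have the same length
aligned_fasta = ">seq_a\nACGT-A\n>seq_b\nAC-TTA\n>seq_c\nACGTTA\n"

# read() expects exactly one alignment in the input
alignment = AlignIO.read(StringIO(aligned_fasta), "fasta")
n_rows = len(alignment)
n_columns = alignment.get_alignment_length()

# Column access returns the residues of one alignment column as a string
third_column = alignment[:, 2]

# write() returns the number of alignments written
clustal_output = StringIO()
n_written = AlignIO.write(alignment, clustal_output, "clustal")

print(n_rows, n_columns, third_column)
print(clustal_output.getvalue())
```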
+
+## Sequence Alignment
+
+### Bio.Align - Alignment Tools
+
+**PairwiseAligner** - High-performance pairwise alignment:
+```python
+from Bio import Align
+aligner = Align.PairwiseAligner()
+aligner.mode = 'global'  # or 'local'
+aligner.match_score = 2
+aligner.mismatch_score = -1
+aligner.gap_score = -2.5
+alignments = aligner.align(seq1, seq2)
+```
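To make the effect of these scoring parameters concrete, here is a self-contained sketch with two short, made-up sequences. With five matching bases and two gap positions, the best global score is 5 × 2 + 2 × (−2.5) = 5.0.

```python
from Bio import Align

aligner = Align.PairwiseAligner()
aligner.mode = "global"
aligner.match_score = 2
aligner.mismatch_score = -1
aligner.gap_score = -2.5   # applied to every gap position

# Hypothetical input sequences
seq1 = "GATTACA"
seq2 = "GATCA"

# score() computes only the optimal score, without building alignments
best_score = aligner.score(seq1, seq2)

# align() returns an iterable of optimal alignments; take the first one
alignments = aligner.align(seq1, seq2)
best = next(iter(alignments))

print(best_score)
print(best)
```

When only the score is needed, `aligner.score()` avoids the cost of enumerating alignments.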
+
+**CodonAligner** - Codon-aware alignment
+
+**MultipleSeqAlignment** - Container for MSA with column access
+
+### Bio.pairwise2 (Legacy)
+
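+Legacy pairwise alignment module with functions like `pairwise2.align.globalxx()` and `pairwise2.align.localxx()`. Recent Biopython releases deprecate it in favor of `Bio.Align.PairwiseAligner`.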
+
+## Sequence Analysis Utilities
+
+### Bio.SeqUtils - Sequence Analysis
+
+Collection of utility functions:
+
+**CheckSum** - Calculate sequence checksums (CRC32, CRC64, GCG)
+
+**MeltingTemp** - DNA melting temperature calculations:
+- Nearest-neighbor method
+- Wallace rule
+- GC content method
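Of the three methods, the Wallace rule is simple enough to compute by hand: Tm = 4 × (G + C) + 2 × (A + T), a rule of thumb intended for short oligonucleotides. The plain-Python sketch below shows the arithmetic only; it is not Biopython's implementation (Biopython exposes the rule as `MeltingTemp.Tm_Wallace`), and the primer is a made-up example.

```python
def wallace_tm(primer: str) -> float:
    """Wallace rule: Tm = 4 * (G + C) + 2 * (A + T), in degrees Celsius."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    at = primer.count("A") + primer.count("T")
    return 4.0 * gc + 2.0 * at


# Hypothetical 12-mer with three of each base: 4 * 6 + 2 * 6 = 36
primer = "ATGCATGCATGC"
tm = wallace_tm(primer)
print(f"{primer}: {tm:.1f} C")
```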
+
+**IsoelectricPoint** - Protein pI calculation
+
+**ProtParam** - Protein analysis:
+- Molecular weight
+- Aromaticity
+- Instability index
+- Secondary structure fractions
+
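+**gc_fraction/GC_skew** - Calculate GC content for a sequence and GC skew over sequence windows (`gc_fraction` replaces the older `GC` function in recent Biopython releases)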
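For reference, GC content is (G + C) / length and GC skew is (G − C) / (G + C), computed per window. The following plain-Python sketch illustrates those two formulas; it is not Biopython's code, and the function names `gc_content` and `gc_skew_windows`, the window size, and the sequence are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew_windows(seq: str, window: int) -> list[float]:
    """GC skew, (G - C) / (G + C), for consecutive non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # A window with no G or C has undefined skew; report 0.0 by convention
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


sequence = "GGGCAAAACCCG"  # made-up example
print(gc_content(sequence))
print(gc_skew_windows(sequence, window=4))
```

A positive skew means G outnumbers C in that window; a negative skew means the opposite.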
+
+### Bio.Data.CodonTable - Genetic Codes
+
+Access to NCBI genetic code tables:
+```python
+from Bio.Data import CodonTable
+standard_table = CodonTable.unambiguous_dna_by_id[1]
+print(standard_table.forward_table)  # codon to amino acid
+print(standard_table.back_table)     # amino acid to one representative codon
+print(standard_table.start_codons)
+print(standard_table.stop_codons)
+```
+
+**Available codes:**
+- Standard code (1)
+- Vertebrate mitochondrial (2)
+- Yeast mitochondrial (3)
+- And many more organism-specific codes
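The practical difference between tables shows up when the same sequence is translated under two codes. A minimal sketch, assuming a recent Biopython release and using an invented 9-base sequence: `TGA` is a stop codon in the standard code but encodes tryptophan in the vertebrate mitochondrial code.

```python
from Bio.Data import CodonTable
from Bio.Seq import Seq

standard = CodonTable.unambiguous_dna_by_id[1]
vertebrate_mito = CodonTable.unambiguous_dna_by_id[2]

# TGA is a stop codon in table 1 but a sense codon in table 2
tga_is_standard_stop = "TGA" in standard.stop_codons
tga_in_mito = vertebrate_mito.forward_table["TGA"]

# Hypothetical sequence: ATG TGA TAA
orf = Seq("ATGTGATAA")
standard_protein = orf.translate(table=1)
mito_protein = orf.translate(table=2)

print(tga_is_standard_stop, tga_in_mito)
print(standard_protein, mito_protein)
```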
+
+## Sequence Motifs and Patterns
+
+### Bio.motifs - Sequence Motif Analysis
+
+Tools for working with sequence motifs:
+
+**Position Weight Matrices (PWM):**
+- Create PWM from aligned sequences
+- Calculate information content
+- Search sequences for motif matches
+- Generate consensus sequences
+
+**Position Specific Scoring Matrices (PSSM):**
+- Convert PWM to PSSM
+- Score sequences against motifs
+- Determine significance thresholds
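The path from aligned sequences to a scoring matrix can be sketched in plain Python. This illustrates the underlying arithmetic only and is not `Bio.motifs` code: every function name, the pseudocount of 1, the uniform background of 0.25, and the four motif instances are assumptions made for the example.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def count_matrix(instances):
    """Per-position base counts from equal-length aligned instances."""
    length = len(instances[0])
    if any(len(s) != length for s in instances):
        raise ValueError("all motif instances must have the same length")
    columns = []
    for position in range(length):
        column = Counter(s[position] for s in instances)
        columns.append({base: column.get(base, 0) for base in ALPHABET})
    return columns


def frequency_matrix(counts, pseudocount=0.0):
    """Position weight matrix: counts normalized to per-column frequencies."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(ALPHABET)
        pwm.append({b: (column[b] + pseudocount) / total for b in ALPHABET})
    return pwm


def information_content(pwm):
    """Bits per column: log2(4) minus the Shannon entropy of the column."""
    bits = []
    for column in pwm:
        entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
        bits.append(math.log2(len(ALPHABET)) - entropy)
    return bits


def consensus(counts):
    """Most frequent base per column (ties go to the first base in ALPHABET)."""
    return "".join(max(ALPHABET, key=lambda b: column[b]) for column in counts)


def log_odds_matrix(pwm, background=0.25):
    """PSSM: log2 ratio of motif frequency to background frequency."""
    return [{b: math.log2(col[b] / background) for b in ALPHABET} for col in pwm]


def score(pssm, sequence):
    """Sum of log-odds scores for a sequence the same length as the motif."""
    return sum(column[base] for column, base in zip(pssm, sequence))


instances = ["ACGT", "ACGA", "ACTT", "ACGT"]  # made-up motif instances
counts = count_matrix(instances)
bits = information_content(frequency_matrix(counts))
# A pseudocount keeps unseen bases from producing log2(0)
pssm = log_odds_matrix(frequency_matrix(counts, pseudocount=1.0))

print(consensus(counts), [round(b, 3) for b in bits])
print(score(pssm, "ACGT"), score(pssm, "TGCA"))
```

Fully conserved columns carry the maximum 2 bits, and sequences closer to the consensus receive higher log-odds scores.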
+
+**Supported Formats:**
+- JASPAR
+- TRANSFAC
+- MEME
+- AlignACE
+
+### Bio.Restriction - Restriction Enzymes
+
+Comprehensive restriction enzyme database and analysis:
+
+**Capabilities:**
+- Search for restriction sites
+- Predict digestion products
+- Analyze restriction maps
+- Access enzyme properties (recognition site, cut positions, isoschizomers)
+
+**Example usage:**
+```python
+from Bio import Restriction
+from Bio.Seq import Seq
+
+seq = Seq("GAATTC...")
+enzyme = Restriction.EcoRI
+results = enzyme.search(seq)
+```
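A complete, runnable variant of the search above, extended with a digestion step. It assumes a recent Biopython release and a made-up 12-base sequence containing a single EcoRI site; the sequence is treated as linear, which is the default.

```python
from Bio.Restriction import EcoRI
from Bio.Seq import Seq

# Hypothetical linear sequence with one GAATTC site
seq = Seq("AAAGAATTCAAA")

recognition_site = EcoRI.site          # recognition sequence as a string
cut_positions = EcoRI.search(seq)      # one entry per cut in the sequence
fragments = EcoRI.catalyse(seq)        # digestion products as Seq objects

print(recognition_site, cut_positions)
print([str(fragment) for fragment in fragments])
```

Joining the fragments in order reproduces the original sequence, which is a quick sanity check on any digest.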