mirror of
https://github.com/K-Dense-AI/claude-scientific-skills.git
synced 2026-01-26 16:58:56 +08:00
Add PyOpenms
This commit is contained in:
@@ -7,7 +7,7 @@
|
|||||||
},
|
},
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"description": "Claude scientific skills from K-Dense Inc",
|
"description": "Claude scientific skills from K-Dense Inc",
|
||||||
"version": "1.18.3"
|
"version": "1.19.0"
|
||||||
},
|
},
|
||||||
"plugins": [
|
"plugins": [
|
||||||
{
|
{
|
||||||
@@ -39,6 +39,7 @@
|
|||||||
"./scientific-packages/pymatgen",
|
"./scientific-packages/pymatgen",
|
||||||
"./scientific-packages/pymc",
|
"./scientific-packages/pymc",
|
||||||
"./scientific-packages/pymoo",
|
"./scientific-packages/pymoo",
|
||||||
|
"./scientific-packages/pyopenms",
|
||||||
"./scientific-packages/pytdc",
|
"./scientific-packages/pytdc",
|
||||||
"./scientific-packages/pytorch-lightning",
|
"./scientific-packages/pytorch-lightning",
|
||||||
"./scientific-packages/rdkit",
|
"./scientific-packages/rdkit",
|
||||||
|
|||||||
@@ -66,6 +66,9 @@ After installing the plugin, you can use the skill by just mentioning it. Additi
|
|||||||
- **PyTDC** - Therapeutics Data Commons for drug discovery datasets and benchmarks
|
- **PyTDC** - Therapeutics Data Commons for drug discovery datasets and benchmarks
|
||||||
- **RDKit** - Cheminformatics toolkit for molecular I/O, descriptors, fingerprints, and SMARTS
|
- **RDKit** - Cheminformatics toolkit for molecular I/O, descriptors, fingerprints, and SMARTS
|
||||||
|
|
||||||
|
**Proteomics & Mass Spectrometry:**
|
||||||
|
- **pyOpenMS** - Comprehensive mass spectrometry data analysis for proteomics and metabolomics (LC-MS/MS processing, peptide identification, feature detection, quantification, chemical calculations, and integration with search engines like Comet, Mascot, MSGF+)
|
||||||
|
|
||||||
**Machine Learning & Deep Learning:**
|
**Machine Learning & Deep Learning:**
|
||||||
- **PyMC** - Bayesian statistical modeling and probabilistic programming
|
- **PyMC** - Bayesian statistical modeling and probabilistic programming
|
||||||
- **PyMOO** - Multi-objective optimization with evolutionary algorithms
|
- **PyMOO** - Multi-objective optimization with evolutionary algorithms
|
||||||
@@ -157,7 +160,6 @@ After installing the plugin, you can use the skill by just mentioning it. Additi
|
|||||||
|
|
||||||
### Proteomics & Mass Spectrometry
|
### Proteomics & Mass Spectrometry
|
||||||
- **pyteomics** - Mass spectrometry data analysis
|
- **pyteomics** - Mass spectrometry data analysis
|
||||||
- **pyOpenMS** - OpenMS Python bindings for proteomics
|
|
||||||
- **matchms** - Processing and similarity matching of mass spectrometry data
|
- **matchms** - Processing and similarity matching of mass spectrometry data
|
||||||
- **MSstats** - Statistical analysis of quantitative proteomics
|
- **MSstats** - Statistical analysis of quantitative proteomics
|
||||||
|
|
||||||
|
|||||||
522
scientific-packages/pyopenms/SKILL.md
Normal file
522
scientific-packages/pyopenms/SKILL.md
Normal file
@@ -0,0 +1,522 @@
|
|||||||
|
---
|
||||||
|
name: pyopenms
|
||||||
|
description: Toolkit for mass spectrometry data analysis with pyOpenMS, supporting proteomics and metabolomics workflows including LC-MS/MS data processing, peptide identification, feature detection, quantification, and chemical calculations. Use this skill when: (1) Working with mass spectrometry file formats (mzML, mzXML, FASTA, mzTab, mzIdentML, TraML, pepXML/protXML) and need to read, write, or convert between formats; (2) Processing raw LC-MS/MS data including spectral smoothing, peak picking, noise filtering, and signal processing; (3) Performing proteomics workflows such as peptide digestion simulation, theoretical fragmentation, modification analysis, and protein identification post-processing; (4) Conducting metabolomics analysis including feature detection, adduct annotation, isotope pattern matching, and small molecule identification; (5) Implementing quantitative proteomics pipelines with feature detection, alignment across samples, and statistical analysis; (6) Calculating chemical properties including molecular formulas, isotopic distributions, amino acid properties, and peptide masses; (7) Integrating with search engines (Comet, Mascot, MSGF+) and post-processing tools (Percolator, MSstats); (8) Building custom MS data analysis workflows that require low-level access to spectra, chromatograms, and peak data; (9) Performing quality control on MS data including TIC/BPC calculation, retention time analysis, and data validation; (10) When you need Python-based alternatives to vendor software for MS data processing and analysis.
|
||||||
|
---
|
||||||
|
|
||||||
|
# pyOpenMS
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
pyOpenMS is an open-source Python library providing comprehensive tools for mass spectrometry data analysis in proteomics and metabolomics research. It offers Python bindings to the OpenMS C++ library, enabling efficient processing of LC-MS/MS data, peptide identification, feature detection, quantification, and integration with common proteomics tools like Comet, Mascot, MSGF+, Percolator, and MSstats.
|
||||||
|
|
||||||
|
Use this skill when working with mass spectrometry data analysis tasks, processing proteomics or metabolomics datasets, or implementing computational workflows for biomolecular identification and quantification.
|
||||||
|
|
||||||
|
## Core Capabilities
|
||||||
|
|
||||||
|
### 1. File I/O and Data Import/Export
|
||||||
|
|
||||||
|
Handle diverse mass spectrometry file formats efficiently:
|
||||||
|
|
||||||
|
**Supported Formats:**
|
||||||
|
- **mzML/mzXML**: Primary raw MS data formats (profile or centroid)
|
||||||
|
- **FASTA**: Protein/peptide sequence databases
|
||||||
|
- **mzTab**: Standardized reporting format for identification and quantification
|
||||||
|
- **mzIdentML**: Peptide and protein identification data
|
||||||
|
- **TraML**: Transition lists for targeted experiments
|
||||||
|
- **pepXML/protXML**: Search engine results
|
||||||
|
|
||||||
|
**Reading mzML Files:**
|
||||||
|
```python
|
||||||
|
import pyopenms as oms
|
||||||
|
|
||||||
|
# Load MS data
|
||||||
|
exp = oms.MSExperiment()
|
||||||
|
oms.MzMLFile().load("input_data.mzML", exp)
|
||||||
|
|
||||||
|
# Access basic information
|
||||||
|
print(f"Number of spectra: {exp.getNrSpectra()}")
|
||||||
|
print(f"Number of chromatograms: {exp.getNrChromatograms()}")
|
||||||
|
```
|
||||||
|
|
||||||
|
**Writing mzML Files:**
|
||||||
|
```python
|
||||||
|
# Save processed data
|
||||||
|
oms.MzMLFile().store("output_data.mzML", exp)
|
||||||
|
```
|
||||||
|
|
||||||
|
**File Encoding:** pyOpenMS automatically handles Base64 encoding, zlib compression, and Numpress compression internally.
|
||||||
|
|
||||||
|
### 2. MS Data Structures and Manipulation
|
||||||
|
|
||||||
|
Work with core mass spectrometry data structures. See `references/data_structures.md` for comprehensive details.
|
||||||
|
|
||||||
|
**MSSpectrum** - Individual mass spectrum:
|
||||||
|
```python
|
||||||
|
# Create spectrum with metadata
|
||||||
|
spectrum = oms.MSSpectrum()
|
||||||
|
spectrum.setRT(205.2) # Retention time in seconds
|
||||||
|
spectrum.setMSLevel(2) # MS2 spectrum
|
||||||
|
|
||||||
|
# Set peak data (m/z, intensity arrays)
|
||||||
|
mz_array = [100.5, 200.3, 300.7, 400.2]
|
||||||
|
intensity_array = [1000, 5000, 3000, 2000]
|
||||||
|
spectrum.set_peaks((mz_array, intensity_array))
|
||||||
|
|
||||||
|
# Add precursor information for MS2
|
||||||
|
precursor = oms.Precursor()
|
||||||
|
precursor.setMZ(450.5)
|
||||||
|
precursor.setCharge(2)
|
||||||
|
spectrum.setPrecursors([precursor])
|
||||||
|
```
|
||||||
|
|
||||||
|
**MSExperiment** - Complete LC-MS/MS run:
|
||||||
|
```python
|
||||||
|
# Create experiment and add spectra
|
||||||
|
exp = oms.MSExperiment()
|
||||||
|
exp.addSpectrum(spectrum)
|
||||||
|
|
||||||
|
# Access spectra
|
||||||
|
first_spectrum = exp.getSpectrum(0)
|
||||||
|
for spec in exp:
|
||||||
|
print(f"RT: {spec.getRT()}, MS Level: {spec.getMSLevel()}")
|
||||||
|
```
|
||||||
|
|
||||||
|
**MSChromatogram** - Extracted ion chromatogram:
|
||||||
|
```python
|
||||||
|
# Create chromatogram
|
||||||
|
chrom = oms.MSChromatogram()
|
||||||
|
chrom.set_peaks(([10.5, 11.2, 11.8], [1000, 5000, 3000])) # RT, intensity
|
||||||
|
exp.addChromatogram(chrom)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Efficient Peak Access:**
|
||||||
|
```python
|
||||||
|
# Get peaks as numpy arrays for fast processing
|
||||||
|
mz_array, intensity_array = spectrum.get_peaks()
|
||||||
|
|
||||||
|
# Modify and set back
|
||||||
|
intensity_array *= 2 # Double all intensities
|
||||||
|
spectrum.set_peaks((mz_array, intensity_array))
|
||||||
|
```
|
||||||
|
|
||||||
|
### 3. Chemistry and Peptide Handling
|
||||||
|
|
||||||
|
Perform chemical calculations for proteomics and metabolomics. See `references/chemistry.md` for detailed examples.
|
||||||
|
|
||||||
|
**Molecular Formulas and Mass Calculations:**
|
||||||
|
```python
|
||||||
|
# Create empirical formula
|
||||||
|
formula = oms.EmpiricalFormula("C6H12O6") # Glucose
|
||||||
|
print(f"Monoisotopic mass: {formula.getMonoWeight()}")
|
||||||
|
print(f"Average mass: {formula.getAverageWeight()}")
|
||||||
|
|
||||||
|
# Formula arithmetic
|
||||||
|
water = oms.EmpiricalFormula("H2O")
|
||||||
|
dehydrated = formula - water
|
||||||
|
|
||||||
|
# Isotope-specific formulas
|
||||||
|
heavy_carbon = oms.EmpiricalFormula("(13)C6H12O6")
|
||||||
|
```
|
||||||
|
|
||||||
|
**Isotopic Distributions:**
|
||||||
|
```python
|
||||||
|
# Generate coarse isotope pattern (unit mass resolution)
|
||||||
|
coarse_gen = oms.CoarseIsotopePatternGenerator()
|
||||||
|
pattern = coarse_gen.run(formula)
|
||||||
|
|
||||||
|
# Generate fine structure (high resolution)
|
||||||
|
fine_gen = oms.FineIsotopePatternGenerator(0.01) # 0.01 Da resolution
|
||||||
|
fine_pattern = fine_gen.run(formula)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Amino Acids and Residues:**
|
||||||
|
```python
|
||||||
|
# Access residue information
|
||||||
|
res_db = oms.ResidueDB()
|
||||||
|
leucine = res_db.getResidue("Leucine")
|
||||||
|
print(f"L monoisotopic mass: {leucine.getMonoWeight()}")
|
||||||
|
print(f"L formula: {leucine.getFormula()}")
|
||||||
|
print(f"L pKa: {leucine.getPka()}")
|
||||||
|
```
|
||||||
|
|
||||||
|
**Peptide Sequences:**
|
||||||
|
```python
|
||||||
|
# Create peptide sequence
|
||||||
|
peptide = oms.AASequence.fromString("PEPTIDE")
|
||||||
|
print(f"Peptide mass: {peptide.getMonoWeight()}")
|
||||||
|
print(f"Formula: {peptide.getFormula()}")
|
||||||
|
|
||||||
|
# Add modifications
|
||||||
|
modified = oms.AASequence.fromString("PEPTIDEM(Oxidation)")
|
||||||
|
print(f"Modified mass: {modified.getMonoWeight()}")
|
||||||
|
|
||||||
|
# Theoretical fragmentation
|
||||||
|
ions = []
|
||||||
|
for i in range(1, peptide.size()):
|
||||||
|
b_ion = peptide.getPrefix(i)
|
||||||
|
y_ion = peptide.getSuffix(i)
|
||||||
|
ions.append(('b', i, b_ion.getMonoWeight()))
|
||||||
|
ions.append(('y', i, y_ion.getMonoWeight()))
|
||||||
|
```
|
||||||
|
|
||||||
|
**Protein Digestion:**
|
||||||
|
```python
|
||||||
|
# Enzymatic digestion
|
||||||
|
dig = oms.ProteaseDigestion()
|
||||||
|
dig.setEnzyme("Trypsin")
|
||||||
|
dig.setMissedCleavages(2)
|
||||||
|
|
||||||
|
protein_seq = oms.AASequence.fromString("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEK")
|
||||||
|
peptides = []
|
||||||
|
dig.digest(protein_seq, peptides)
|
||||||
|
|
||||||
|
for pep in peptides:
|
||||||
|
print(f"{pep.toString()}: {pep.getMonoWeight():.2f} Da")
|
||||||
|
```
|
||||||
|
|
||||||
|
**Modifications:**
|
||||||
|
```python
|
||||||
|
# Access modification database
|
||||||
|
mod_db = oms.ModificationsDB()
|
||||||
|
oxidation = mod_db.getModification("Oxidation")
|
||||||
|
print(f"Oxidation mass diff: {oxidation.getDiffMonoMass()}")
|
||||||
|
print(f"Residues: {oxidation.getResidues()}")
|
||||||
|
```
|
||||||
|
|
||||||
|
### 4. Signal Processing and Filtering
|
||||||
|
|
||||||
|
Apply algorithms to process and filter MS data. See `references/algorithms.md` for comprehensive coverage.
|
||||||
|
|
||||||
|
**Spectral Smoothing:**
|
||||||
|
```python
|
||||||
|
# Gaussian smoothing
|
||||||
|
gauss_filter = oms.GaussFilter()
|
||||||
|
params = gauss_filter.getParameters()
|
||||||
|
params.setValue("gaussian_width", 0.2)
|
||||||
|
gauss_filter.setParameters(params)
|
||||||
|
gauss_filter.filterExperiment(exp)
|
||||||
|
|
||||||
|
# Savitzky-Golay filter
|
||||||
|
sg_filter = oms.SavitzkyGolayFilter()
|
||||||
|
sg_filter.filterExperiment(exp)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Peak Filtering:**
|
||||||
|
```python
|
||||||
|
# Keep only N largest peaks per spectrum
|
||||||
|
n_largest = oms.NLargest()
|
||||||
|
params = n_largest.getParameters()
|
||||||
|
params.setValue("n", 100) # Keep top 100 peaks
|
||||||
|
n_largest.setParameters(params)
|
||||||
|
n_largest.filterExperiment(exp)
|
||||||
|
|
||||||
|
# Threshold filtering
|
||||||
|
threshold_filter = oms.ThresholdMower()
|
||||||
|
params = threshold_filter.getParameters()
|
||||||
|
params.setValue("threshold", 1000.0) # Remove peaks below 1000 intensity
|
||||||
|
threshold_filter.setParameters(params)
|
||||||
|
threshold_filter.filterExperiment(exp)
|
||||||
|
|
||||||
|
# Window-based filtering
|
||||||
|
window_filter = oms.WindowMower()
|
||||||
|
params = window_filter.getParameters()
|
||||||
|
params.setValue("windowsize", 50.0) # 50 m/z windows
|
||||||
|
params.setValue("peakcount", 10) # Keep 10 highest per window
|
||||||
|
window_filter.setParameters(params)
|
||||||
|
window_filter.filterExperiment(exp)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Spectrum Normalization:**
|
||||||
|
```python
|
||||||
|
normalizer = oms.Normalizer()
|
||||||
|
normalizer.filterExperiment(exp)
|
||||||
|
```
|
||||||
|
|
||||||
|
**MS Level Filtering:**
|
||||||
|
```python
|
||||||
|
# Keep only MS2 spectra
|
||||||
|
exp.filterMSLevel(2)
|
||||||
|
|
||||||
|
# Filter by retention time range
|
||||||
|
exp.filterRT(100.0, 500.0) # Keep RT between 100-500 seconds
|
||||||
|
|
||||||
|
# Filter by m/z range
|
||||||
|
exp.filterMZ(400.0, 1500.0) # Keep m/z between 400-1500
|
||||||
|
```
|
||||||
|
|
||||||
|
### 5. Feature Detection and Quantification
|
||||||
|
|
||||||
|
Detect and quantify features in LC-MS data:
|
||||||
|
|
||||||
|
**Peak Picking (Centroiding):**
|
||||||
|
```python
|
||||||
|
# Convert profile data to centroid
|
||||||
|
picker = oms.PeakPickerHiRes()
|
||||||
|
params = picker.getParameters()
|
||||||
|
params.setValue("signal_to_noise", 1.0)
|
||||||
|
picker.setParameters(params)
|
||||||
|
|
||||||
|
exp_centroided = oms.MSExperiment()
|
||||||
|
picker.pickExperiment(exp, exp_centroided)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Feature Detection:**
|
||||||
|
```python
|
||||||
|
# Detect features across LC-MS runs
|
||||||
|
feature_finder = oms.FeatureFinderMultiplex()
|
||||||
|
|
||||||
|
features = oms.FeatureMap()
|
||||||
|
feature_finder.run(exp, features, params)
|
||||||
|
|
||||||
|
print(f"Found {features.size()} features")
|
||||||
|
for feature in features:
|
||||||
|
print(f"m/z: {feature.getMZ():.4f}, RT: {feature.getRT():.2f}, "
|
||||||
|
f"Intensity: {feature.getIntensity():.0f}")
|
||||||
|
```
|
||||||
|
|
||||||
|
**Feature Linking (Map Alignment):**
|
||||||
|
```python
|
||||||
|
# Link features across multiple samples
|
||||||
|
feature_grouper = oms.FeatureGroupingAlgorithmQT()
|
||||||
|
consensus_map = oms.ConsensusMap()
|
||||||
|
|
||||||
|
# Provide multiple feature maps from different samples
|
||||||
|
feature_maps = [features1, features2, features3]
|
||||||
|
feature_grouper.group(feature_maps, consensus_map)
|
||||||
|
```
|
||||||
|
|
||||||
|
### 6. Peptide Identification Workflows
|
||||||
|
|
||||||
|
Integrate with search engines and process identification results:
|
||||||
|
|
||||||
|
**Database Searching:**
|
||||||
|
```python
|
||||||
|
# Prepare parameters for search engine
|
||||||
|
params = oms.Param()
|
||||||
|
params.setValue("database", "uniprot_human.fasta")
|
||||||
|
params.setValue("precursor_mass_tolerance", 10.0) # ppm
|
||||||
|
params.setValue("fragment_mass_tolerance", 0.5) # Da
|
||||||
|
params.setValue("enzyme", "Trypsin")
|
||||||
|
params.setValue("missed_cleavages", 2)
|
||||||
|
|
||||||
|
# Variable modifications
|
||||||
|
params.setValue("variable_modifications", ["Oxidation (M)", "Phospho (STY)"])
|
||||||
|
|
||||||
|
# Fixed modifications
|
||||||
|
params.setValue("fixed_modifications", ["Carbamidomethyl (C)"])
|
||||||
|
```
|
||||||
|
|
||||||
|
**FDR Control:**
|
||||||
|
```python
|
||||||
|
# False discovery rate estimation
|
||||||
|
fdr = oms.FalseDiscoveryRate()
|
||||||
|
fdr_threshold = 0.01 # 1% FDR
|
||||||
|
|
||||||
|
# Apply to peptide identifications
|
||||||
|
protein_ids = []
|
||||||
|
peptide_ids = []
|
||||||
|
oms.IdXMLFile().load("search_results.idXML", protein_ids, peptide_ids)
|
||||||
|
|
||||||
|
fdr.apply(protein_ids, peptide_ids)
|
||||||
|
```
|
||||||
|
|
||||||
|
### 7. Metabolomics Workflows
|
||||||
|
|
||||||
|
Analyze small molecule data:
|
||||||
|
|
||||||
|
**Adduct Detection:**
|
||||||
|
```python
|
||||||
|
# Common metabolite adducts
|
||||||
|
adducts = ["[M+H]+", "[M+Na]+", "[M+K]+", "[M-H]-", "[M+Cl]-"]
|
||||||
|
|
||||||
|
# Feature annotation with adducts
|
||||||
|
for feature in features:
|
||||||
|
mz = feature.getMZ()
|
||||||
|
# Calculate neutral mass for each adduct hypothesis
|
||||||
|
for adduct in adducts:
|
||||||
|
# Annotation logic
|
||||||
|
pass
|
||||||
|
```
|
||||||
|
|
||||||
|
**Isotope Pattern Matching:**
|
||||||
|
```python
|
||||||
|
# Compare experimental to theoretical isotope patterns
|
||||||
|
experimental_pattern = [] # Extract from feature
|
||||||
|
theoretical = coarse_gen.run(formula)
|
||||||
|
|
||||||
|
# Calculate similarity score
|
||||||
|
similarity = compare_isotope_patterns(experimental_pattern, theoretical)
|
||||||
|
```
|
||||||
|
|
||||||
|
### 8. Quality Control and Visualization
|
||||||
|
|
||||||
|
Monitor data quality and visualize results:
|
||||||
|
|
||||||
|
**Basic Statistics:**
|
||||||
|
```python
|
||||||
|
# Calculate TIC (Total Ion Current)
|
||||||
|
tic_values = []
|
||||||
|
rt_values = []
|
||||||
|
for spectrum in exp:
|
||||||
|
if spectrum.getMSLevel() == 1:
|
||||||
|
tic = sum(spectrum.get_peaks()[1]) # Sum intensities
|
||||||
|
tic_values.append(tic)
|
||||||
|
rt_values.append(spectrum.getRT())
|
||||||
|
|
||||||
|
# Base peak chromatogram
|
||||||
|
bpc_values = []
|
||||||
|
for spectrum in exp:
|
||||||
|
if spectrum.getMSLevel() == 1:
|
||||||
|
max_intensity = max(spectrum.get_peaks()[1]) if spectrum.size() > 0 else 0
|
||||||
|
bpc_values.append(max_intensity)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Plotting (with pyopenms.plotting or matplotlib):**
|
||||||
|
```python
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
|
||||||
|
# Plot TIC
|
||||||
|
plt.figure(figsize=(10, 4))
|
||||||
|
plt.plot(rt_values, tic_values)
|
||||||
|
plt.xlabel('Retention Time (s)')
|
||||||
|
plt.ylabel('Total Ion Current')
|
||||||
|
plt.title('TIC')
|
||||||
|
plt.show()
|
||||||
|
|
||||||
|
# Plot single spectrum
|
||||||
|
spectrum = exp.getSpectrum(0)
|
||||||
|
mz, intensity = spectrum.get_peaks()
|
||||||
|
plt.stem(mz, intensity, basefmt=' ')
|
||||||
|
plt.xlabel('m/z')
|
||||||
|
plt.ylabel('Intensity')
|
||||||
|
plt.title(f'Spectrum at RT {spectrum.getRT():.2f}s')
|
||||||
|
plt.show()
|
||||||
|
```
|
||||||
|
|
||||||
|
## Common Workflows
|
||||||
|
|
||||||
|
### Complete LC-MS/MS Processing Pipeline
|
||||||
|
|
||||||
|
```python
|
||||||
|
import pyopenms as oms
|
||||||
|
|
||||||
|
# 1. Load data
|
||||||
|
exp = oms.MSExperiment()
|
||||||
|
oms.MzMLFile().load("raw_data.mzML", exp)
|
||||||
|
|
||||||
|
# 2. Filter and smooth
|
||||||
|
exp.filterMSLevel(1) # Keep only MS1 for feature detection
|
||||||
|
gauss = oms.GaussFilter()
|
||||||
|
gauss.filterExperiment(exp)
|
||||||
|
|
||||||
|
# 3. Peak picking
|
||||||
|
picker = oms.PeakPickerHiRes()
|
||||||
|
exp_centroid = oms.MSExperiment()
|
||||||
|
picker.pickExperiment(exp, exp_centroid)
|
||||||
|
|
||||||
|
# 4. Feature detection
|
||||||
|
ff = oms.FeatureFinderMultiplex()
|
||||||
|
features = oms.FeatureMap()
|
||||||
|
ff.run(exp_centroid, features, oms.Param())
|
||||||
|
|
||||||
|
# 5. Export results
|
||||||
|
oms.FeatureXMLFile().store("features.featureXML", features)
|
||||||
|
print(f"Detected {features.size()} features")
|
||||||
|
```
|
||||||
|
|
||||||
|
### Theoretical Peptide Mass Calculation
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Calculate masses for peptide with modifications
|
||||||
|
peptide = oms.AASequence.fromString("PEPTIDEK")
|
||||||
|
print(f"Unmodified [M+H]+: {peptide.getMonoWeight() + 1.007276:.4f}")
|
||||||
|
|
||||||
|
# With modification
|
||||||
|
modified = oms.AASequence.fromString("PEPTIDEM(Oxidation)K")
|
||||||
|
print(f"Oxidized [M+H]+: {modified.getMonoWeight() + 1.007276:.4f}")
|
||||||
|
|
||||||
|
# Calculate for different charge states
|
||||||
|
for z in [1, 2, 3]:
|
||||||
|
mz = (peptide.getMonoWeight() + z * 1.007276) / z
|
||||||
|
print(f"[M+{z}H]^{z}+: {mz:.4f}")
|
||||||
|
```
|
||||||
|
|
||||||
|
## Installation
|
||||||
|
|
||||||
|
Ensure pyOpenMS is installed before using this skill:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Via conda (recommended)
|
||||||
|
conda install -c bioconda pyopenms
|
||||||
|
|
||||||
|
# Via pip
|
||||||
|
pip install pyopenms
|
||||||
|
```
|
||||||
|
|
||||||
|
## Integration with Other Tools
|
||||||
|
|
||||||
|
pyOpenMS integrates seamlessly with:
|
||||||
|
|
||||||
|
- **Search Engines**: Comet, Mascot, MSGF+, MSFragger, Sage, SpectraST
|
||||||
|
- **Post-processing**: Percolator, MSstats, Epiphany
|
||||||
|
- **Metabolomics**: SIRIUS, CSI:FingerID
|
||||||
|
- **Data Analysis**: Pandas, NumPy, SciPy for downstream analysis
|
||||||
|
- **Visualization**: Matplotlib, Seaborn for plotting
|
||||||
|
|
||||||
|
## Resources
|
||||||
|
|
||||||
|
### references/
|
||||||
|
|
||||||
|
Detailed documentation on core concepts:
|
||||||
|
|
||||||
|
- **data_structures.md** - Comprehensive guide to MSExperiment, MSSpectrum, MSChromatogram, and peak data handling
|
||||||
|
- **algorithms.md** - Complete reference for signal processing, filtering, feature detection, and quantification algorithms
|
||||||
|
- **chemistry.md** - In-depth coverage of chemistry calculations, peptide handling, modifications, and isotope distributions
|
||||||
|
|
||||||
|
Load these references when needing detailed information about specific pyOpenMS capabilities.
|
||||||
|
|
||||||
|
## Best Practices
|
||||||
|
|
||||||
|
1. **File Format**: Always use mzML for raw MS data (standardized, well-supported)
|
||||||
|
2. **Peak Access**: Use `get_peaks()` and `set_peaks()` with numpy arrays for efficient processing
|
||||||
|
3. **Parameters**: Always check and configure algorithm parameters via `getParameters()` and `setParameters()`
|
||||||
|
4. **Memory**: For large datasets, process spectra iteratively rather than loading entire experiments
|
||||||
|
5. **Validation**: Check data integrity (MS levels, RT ordering, precursor information) after loading
|
||||||
|
6. **Modifications**: Use standard modification names from UniMod database
|
||||||
|
7. **Units**: RT in seconds, m/z in Thomson (Da/charge), intensity in arbitrary units
|
||||||
|
|
||||||
|
## Common Patterns
|
||||||
|
|
||||||
|
**Algorithm Application Pattern:**
|
||||||
|
```python
|
||||||
|
# 1. Instantiate algorithm
|
||||||
|
algorithm = oms.SomeAlgorithm()
|
||||||
|
|
||||||
|
# 2. Get and configure parameters
|
||||||
|
params = algorithm.getParameters()
|
||||||
|
params.setValue("parameter_name", value)
|
||||||
|
algorithm.setParameters(params)
|
||||||
|
|
||||||
|
# 3. Apply to data
|
||||||
|
algorithm.filterExperiment(exp) # or .process(), .run(), depending on algorithm
|
||||||
|
```
|
||||||
|
|
||||||
|
**File I/O Pattern:**
|
||||||
|
```python
|
||||||
|
# Read
|
||||||
|
data_container = oms.DataContainer() # MSExperiment, FeatureMap, etc.
|
||||||
|
oms.FileHandler().load("input.format", data_container)
|
||||||
|
|
||||||
|
# Process
|
||||||
|
# ... manipulate data_container ...
|
||||||
|
|
||||||
|
# Write
|
||||||
|
oms.FileHandler().store("output.format", data_container)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Getting Help
|
||||||
|
|
||||||
|
- **Documentation**: https://pyopenms.readthedocs.io/
|
||||||
|
- **API Reference**: Browse class documentation for detailed method signatures
|
||||||
|
- **OpenMS Website**: https://www.openms.org/
|
||||||
|
- **GitHub Issues**: https://github.com/OpenMS/OpenMS/issues
|
||||||
643
scientific-packages/pyopenms/references/algorithms.md
Normal file
643
scientific-packages/pyopenms/references/algorithms.md
Normal file
@@ -0,0 +1,643 @@
|
|||||||
|
# pyOpenMS Algorithms Reference
|
||||||
|
|
||||||
|
This document provides comprehensive coverage of algorithms available in pyOpenMS for signal processing, feature detection, and quantification.
|
||||||
|
|
||||||
|
## Algorithm Usage Pattern
|
||||||
|
|
||||||
|
Most pyOpenMS algorithms follow a consistent pattern:
|
||||||
|
|
||||||
|
```python
|
||||||
|
import pyopenms as oms
|
||||||
|
|
||||||
|
# 1. Instantiate algorithm
|
||||||
|
algorithm = oms.AlgorithmName()
|
||||||
|
|
||||||
|
# 2. Get parameters
|
||||||
|
params = algorithm.getParameters()
|
||||||
|
|
||||||
|
# 3. Modify parameters
|
||||||
|
params.setValue("parameter_name", value)
|
||||||
|
|
||||||
|
# 4. Set parameters back
|
||||||
|
algorithm.setParameters(params)
|
||||||
|
|
||||||
|
# 5. Apply to data
|
||||||
|
algorithm.filterExperiment(exp) # or .process(), .run(), etc.
|
||||||
|
```
|
||||||
|
|
||||||
|
## Signal Processing Algorithms
|
||||||
|
|
||||||
|
### Smoothing Filters
|
||||||
|
|
||||||
|
#### GaussFilter - Gaussian Smoothing
|
||||||
|
|
||||||
|
Applies Gaussian smoothing to reduce noise.
|
||||||
|
|
||||||
|
```python
|
||||||
|
gauss = oms.GaussFilter()
|
||||||
|
|
||||||
|
# Configure parameters
|
||||||
|
params = gauss.getParameters()
|
||||||
|
params.setValue("gaussian_width", 0.2) # Gaussian width (larger = more smoothing)
|
||||||
|
params.setValue("ppm_tolerance", 10.0) # PPM tolerance for spacing
|
||||||
|
params.setValue("use_ppm_tolerance", "true")
|
||||||
|
gauss.setParameters(params)
|
||||||
|
|
||||||
|
# Apply to experiment
|
||||||
|
gauss.filterExperiment(exp)
|
||||||
|
|
||||||
|
# Or apply to single spectrum
|
||||||
|
spectrum_smoothed = oms.MSSpectrum()
|
||||||
|
gauss.filter(spectrum, spectrum_smoothed)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Key Parameters:**
|
||||||
|
- `gaussian_width`: Width of Gaussian kernel (default: 0.2 Da)
|
||||||
|
- `ppm_tolerance`: Tolerance in ppm for spacing
|
||||||
|
- `use_ppm_tolerance`: Whether to use ppm instead of absolute spacing
|
||||||
|
|
||||||
|
#### SavitzkyGolayFilter
|
||||||
|
|
||||||
|
Applies Savitzky-Golay smoothing (polynomial fitting).
|
||||||
|
|
||||||
|
```python
|
||||||
|
sg_filter = oms.SavitzkyGolayFilter()
|
||||||
|
|
||||||
|
params = sg_filter.getParameters()
|
||||||
|
params.setValue("frame_length", 11) # Window size (must be odd)
|
||||||
|
params.setValue("polynomial_order", 3) # Polynomial degree
|
||||||
|
sg_filter.setParameters(params)
|
||||||
|
|
||||||
|
sg_filter.filterExperiment(exp)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Key Parameters:**
|
||||||
|
- `frame_length`: Size of smoothing window (must be odd)
|
||||||
|
- `polynomial_order`: Degree of polynomial (typically 2-4)
|
||||||
|
|
||||||
|
### Peak Filtering
|
||||||
|
|
||||||
|
#### NLargest - Keep Top N Peaks
|
||||||
|
|
||||||
|
Retains only the N most intense peaks per spectrum.
|
||||||
|
|
||||||
|
```python
|
||||||
|
n_largest = oms.NLargest()
|
||||||
|
|
||||||
|
params = n_largest.getParameters()
|
||||||
|
params.setValue("n", 100) # Keep top 100 peaks
|
||||||
|
params.setValue("threshold", 0.0) # Optional minimum intensity
|
||||||
|
n_largest.setParameters(params)
|
||||||
|
|
||||||
|
n_largest.filterExperiment(exp)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Key Parameters:**
|
||||||
|
- `n`: Number of peaks to keep per spectrum
|
||||||
|
- `threshold`: Minimum absolute intensity threshold
|
||||||
|
|
||||||
|
#### ThresholdMower - Intensity Threshold Filtering
|
||||||
|
|
||||||
|
Removes peaks below a specified intensity threshold.
|
||||||
|
|
||||||
|
```python
|
||||||
|
threshold_filter = oms.ThresholdMower()
|
||||||
|
|
||||||
|
params = threshold_filter.getParameters()
|
||||||
|
params.setValue("threshold", 1000.0) # Absolute intensity threshold
|
||||||
|
threshold_filter.setParameters(params)
|
||||||
|
|
||||||
|
threshold_filter.filterExperiment(exp)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Key Parameters:**
|
||||||
|
- `threshold`: Absolute intensity cutoff
|
||||||
|
|
||||||
|
#### WindowMower - Window-Based Peak Selection
|
||||||
|
|
||||||
|
Divides m/z range into windows and keeps top N peaks per window.
|
||||||
|
|
||||||
|
```python
|
||||||
|
window_mower = oms.WindowMower()
|
||||||
|
|
||||||
|
params = window_mower.getParameters()
|
||||||
|
params.setValue("windowsize", 50.0) # Window size in Da (or Thomson)
|
||||||
|
params.setValue("peakcount", 10) # Peaks to keep per window
|
||||||
|
params.setValue("movetype", "jump") # "jump" or "slide"
|
||||||
|
window_mower.setParameters(params)
|
||||||
|
|
||||||
|
window_mower.filterExperiment(exp)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Key Parameters:**
|
||||||
|
- `windowsize`: Size of m/z window (Da)
|
||||||
|
- `peakcount`: Number of peaks to retain per window
|
||||||
|
- `movetype`: "jump" (non-overlapping) or "slide" (overlapping windows)
|
||||||
|
|
||||||
|
#### BernNorm - Bernoulli Normalization
|
||||||
|
|
||||||
|
Statistical normalization based on Bernoulli distribution.
|
||||||
|
|
||||||
|
```python
|
||||||
|
bern_norm = oms.BernNorm()
|
||||||
|
|
||||||
|
params = bern_norm.getParameters()
|
||||||
|
params.setValue("threshold", 0.7) # Threshold for normalization
|
||||||
|
bern_norm.setParameters(params)
|
||||||
|
|
||||||
|
bern_norm.filterExperiment(exp)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Spectrum Normalization
|
||||||
|
|
||||||
|
#### Normalizer
|
||||||
|
|
||||||
|
Normalizes spectrum intensities to unit total intensity or maximum intensity.
|
||||||
|
|
||||||
|
```python
|
||||||
|
normalizer = oms.Normalizer()
|
||||||
|
|
||||||
|
params = normalizer.getParameters()
|
||||||
|
params.setValue("method", "to_one") # "to_one" or "to_TIC"
|
||||||
|
normalizer.setParameters(params)
|
||||||
|
|
||||||
|
normalizer.filterExperiment(exp)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Methods:**
|
||||||
|
- `to_one`: Normalize max peak to 1.0
|
||||||
|
- `to_TIC`: Normalize to total ion current = 1.0
|
||||||
|
|
||||||
|
#### Scaler
|
||||||
|
|
||||||
|
Scales intensities by a constant factor.
|
||||||
|
|
||||||
|
```python
|
||||||
|
scaler = oms.Scaler()
|
||||||
|
|
||||||
|
params = scaler.getParameters()
|
||||||
|
params.setValue("scaling", 1000.0) # Scaling factor
|
||||||
|
scaler.setParameters(params)
|
||||||
|
|
||||||
|
scaler.filterExperiment(exp)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Centroiding and Peak Picking
|
||||||
|
|
||||||
|
### PeakPickerHiRes - High-Resolution Peak Picking
|
||||||
|
|
||||||
|
Converts profile spectra to centroid mode for high-resolution data.
|
||||||
|
|
||||||
|
```python
|
||||||
|
picker = oms.PeakPickerHiRes()
|
||||||
|
|
||||||
|
params = picker.getParameters()
|
||||||
|
params.setValue("signal_to_noise", 1.0) # S/N threshold
|
||||||
|
params.setValue("spacing_difference", 1.5) # Peak spacing factor
|
||||||
|
params.setValue("sn_win_len", 20.0) # S/N window length
|
||||||
|
params.setValue("sn_bin_count", 30) # Bins for S/N estimation
|
||||||
|
params.setValue("ms1_only", "false") # Process only MS1
|
||||||
|
params.setValue("ms_levels", [1, 2]) # MS levels to process
|
||||||
|
picker.setParameters(params)
|
||||||
|
|
||||||
|
# Pick peaks
|
||||||
|
exp_centroided = oms.MSExperiment()
|
||||||
|
picker.pickExperiment(exp, exp_centroided)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Key Parameters:**
|
||||||
|
- `signal_to_noise`: Minimum signal-to-noise ratio
|
||||||
|
- `spacing_difference`: Minimum spacing between peaks
|
||||||
|
- `ms_levels`: List of MS levels to process
|
||||||
|
|
||||||
|
### PeakPickerWavelet - Wavelet-Based Peak Picking
|
||||||
|
|
||||||
|
Uses continuous wavelet transform for peak detection.
|
||||||
|
|
||||||
|
```python
|
||||||
|
wavelet_picker = oms.PeakPickerWavelet()
|
||||||
|
|
||||||
|
params = wavelet_picker.getParameters()
|
||||||
|
params.setValue("signal_to_noise", 1.0)
|
||||||
|
params.setValue("peak_width", 0.15) # Expected peak width (Da)
|
||||||
|
wavelet_picker.setParameters(params)
|
||||||
|
|
||||||
|
wavelet_picker.pickExperiment(exp, exp_centroided)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Feature Detection
|
||||||
|
|
||||||
|
### FeatureFinder Algorithms
|
||||||
|
|
||||||
|
Feature finders detect 2D features (m/z and RT) in LC-MS data.
|
||||||
|
|
||||||
|
#### FeatureFinderMultiplex
|
||||||
|
|
||||||
|
For multiplex labeling experiments (SILAC, dimethyl labeling).
|
||||||
|
|
||||||
|
```python
|
||||||
|
ff = oms.FeatureFinderMultiplex()
|
||||||
|
|
||||||
|
params = ff.getParameters()
|
||||||
|
params.setValue("algorithm:labels", "[]") # Empty for label-free
|
||||||
|
params.setValue("algorithm:charge", "2:4") # Charge range
|
||||||
|
params.setValue("algorithm:rt_typical", 40.0) # Expected feature RT width
|
||||||
|
params.setValue("algorithm:rt_min", 2.0) # Minimum RT width
|
||||||
|
params.setValue("algorithm:mz_tolerance", 10.0) # m/z tolerance (ppm)
|
||||||
|
params.setValue("algorithm:intensity_cutoff", 1000.0) # Minimum intensity
|
||||||
|
ff.setParameters(params)
|
||||||
|
|
||||||
|
# Run feature detection
|
||||||
|
features = oms.FeatureMap()
|
||||||
|
ff.run(exp, features, oms.Param())
|
||||||
|
|
||||||
|
print(f"Found {features.size()} features")
|
||||||
|
```
|
||||||
|
|
||||||
|
**Key Parameters:**
|
||||||
|
- `algorithm:charge`: Charge state range to consider
|
||||||
|
- `algorithm:rt_typical`: Expected peak width in RT dimension
|
||||||
|
- `algorithm:mz_tolerance`: Mass tolerance in ppm
|
||||||
|
- `algorithm:intensity_cutoff`: Minimum intensity threshold
|
||||||
|
|
||||||
|
#### FeatureFinderCentroided
|
||||||
|
|
||||||
|
For centroided data, identifies isotope patterns and traces over RT.
|
||||||
|
|
||||||
|
```python
|
||||||
|
ff_centroided = oms.FeatureFinderCentroided()
|
||||||
|
|
||||||
|
params = ff_centroided.getParameters()
|
||||||
|
params.setValue("mass_trace:mz_tolerance", 10.0) # ppm
|
||||||
|
params.setValue("mass_trace:min_spectra", 5) # Min consecutive spectra
|
||||||
|
params.setValue("isotopic_pattern:charge_low", 1)
|
||||||
|
params.setValue("isotopic_pattern:charge_high", 4)
|
||||||
|
params.setValue("seed:min_score", 0.5)
|
||||||
|
ff_centroided.setParameters(params)
|
||||||
|
|
||||||
|
features = oms.FeatureMap()
|
||||||
|
seeds = oms.FeatureMap() # Optional seed features
|
||||||
|
ff_centroided.run(exp, features, params, seeds)
|
||||||
|
```
|
||||||
|
|
||||||
|
#### FeatureFinderIdentification
|
||||||
|
|
||||||
|
Uses peptide identifications to guide feature detection.
|
||||||
|
|
||||||
|
```python
|
||||||
|
ff_id = oms.FeatureFinderIdentification()
|
||||||
|
|
||||||
|
params = ff_id.getParameters()
|
||||||
|
params.setValue("extract:mz_window", 10.0) # ppm
|
||||||
|
params.setValue("extract:rt_window", 60.0) # seconds
|
||||||
|
params.setValue("detect:peak_width", 30.0) # Expected peak width
|
||||||
|
ff_id.setParameters(params)
|
||||||
|
|
||||||
|
# Requires peptide identifications
|
||||||
|
protein_ids = []
|
||||||
|
peptide_ids = []
|
||||||
|
features = oms.FeatureMap()
|
||||||
|
|
||||||
|
ff_id.run(exp, protein_ids, peptide_ids, features)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Charge and Isotope Deconvolution
|
||||||
|
|
||||||
|
### Decharging and Charge State Deconvolution
|
||||||
|
|
||||||
|
#### FeatureDeconvolution
|
||||||
|
|
||||||
|
Resolves charge states and combines features.
|
||||||
|
|
||||||
|
```python
|
||||||
|
deconv = oms.FeatureDeconvolution()
|
||||||
|
|
||||||
|
params = deconv.getParameters()
|
||||||
|
params.setValue("charge_min", 1)
|
||||||
|
params.setValue("charge_max", 4)
|
||||||
|
params.setValue("q_value", 0.01) # FDR threshold
|
||||||
|
deconv.setParameters(params)
|
||||||
|
|
||||||
|
features_deconv = oms.FeatureMap()
|
||||||
|
consensus_map = oms.ConsensusMap()
|
||||||
|
deconv.compute(features, features_deconv, consensus_map)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Map Alignment
|
||||||
|
|
||||||
|
### MapAlignmentAlgorithm
|
||||||
|
|
||||||
|
Aligns retention times across multiple LC-MS runs.
|
||||||
|
|
||||||
|
#### MapAlignmentAlgorithmPoseClustering
|
||||||
|
|
||||||
|
Pose clustering-based RT alignment.
|
||||||
|
|
||||||
|
```python
|
||||||
|
aligner = oms.MapAlignmentAlgorithmPoseClustering()
|
||||||
|
|
||||||
|
params = aligner.getParameters()
|
||||||
|
params.setValue("max_num_peaks_considered", 1000)
|
||||||
|
params.setValue("pairfinder:distance_MZ:max_difference", 0.3) # Da
|
||||||
|
params.setValue("pairfinder:distance_RT:max_difference", 60.0) # seconds
|
||||||
|
aligner.setParameters(params)
|
||||||
|
|
||||||
|
# Align multiple feature maps
|
||||||
|
feature_maps = [features1, features2, features3]
|
||||||
|
transformations = []
|
||||||
|
|
||||||
|
# Create reference (e.g., use first map)
|
||||||
|
reference = oms.FeatureMap(feature_maps[0])
|
||||||
|
|
||||||
|
# Align others to reference
|
||||||
|
for fm in feature_maps[1:]:
|
||||||
|
transformation = oms.TransformationDescription()
|
||||||
|
aligner.align(fm, reference, transformation)
|
||||||
|
transformations.append(transformation)
|
||||||
|
|
||||||
|
# Apply transformation
|
||||||
|
transformer = oms.MapAlignmentTransformer()
|
||||||
|
transformer.transformRetentionTimes(fm, transformation)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Feature Linking
|
||||||
|
|
||||||
|
### FeatureGroupingAlgorithm
|
||||||
|
|
||||||
|
Links features across samples to create consensus features.
|
||||||
|
|
||||||
|
#### FeatureGroupingAlgorithmQT
|
||||||
|
|
||||||
|
Quality threshold-based feature linking.
|
||||||
|
|
||||||
|
```python
|
||||||
|
grouper = oms.FeatureGroupingAlgorithmQT()
|
||||||
|
|
||||||
|
params = grouper.getParameters()
|
||||||
|
params.setValue("distance_RT:max_difference", 60.0) # seconds
|
||||||
|
params.setValue("distance_MZ:max_difference", 10.0) # ppm
|
||||||
|
params.setValue("distance_MZ:unit", "ppm")
|
||||||
|
grouper.setParameters(params)
|
||||||
|
|
||||||
|
# Create consensus map
|
||||||
|
consensus_map = oms.ConsensusMap()
|
||||||
|
|
||||||
|
# Group features from multiple samples
|
||||||
|
feature_maps = [features1, features2, features3]
|
||||||
|
grouper.group(feature_maps, consensus_map)
|
||||||
|
|
||||||
|
print(f"Created {consensus_map.size()} consensus features")
|
||||||
|
```
|
||||||
|
|
||||||
|
#### FeatureGroupingAlgorithmKD
|
||||||
|
|
||||||
|
KD-tree based linking (faster for large datasets).
|
||||||
|
|
||||||
|
```python
|
||||||
|
grouper_kd = oms.FeatureGroupingAlgorithmKD()
|
||||||
|
|
||||||
|
params = grouper_kd.getParameters()
|
||||||
|
params.setValue("mz_unit", "ppm")
|
||||||
|
params.setValue("mz_tolerance", 10.0)
|
||||||
|
params.setValue("rt_tolerance", 30.0)
|
||||||
|
grouper_kd.setParameters(params)
|
||||||
|
|
||||||
|
consensus_map = oms.ConsensusMap()
|
||||||
|
grouper_kd.group(feature_maps, consensus_map)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Chromatographic Analysis
|
||||||
|
|
||||||
|
### ElutionPeakDetection
|
||||||
|
|
||||||
|
Detects elution peaks in chromatograms.
|
||||||
|
|
||||||
|
```python
|
||||||
|
epd = oms.ElutionPeakDetection()
|
||||||
|
|
||||||
|
params = epd.getParameters()
|
||||||
|
params.setValue("chrom_peak_snr", 3.0) # Signal-to-noise threshold
|
||||||
|
params.setValue("chrom_fwhm", 5.0) # Expected FWHM (seconds)
|
||||||
|
epd.setParameters(params)
|
||||||
|
|
||||||
|
# Apply to chromatograms
|
||||||
|
for chrom in exp.getChromatograms():
|
||||||
|
peaks = epd.detectPeaks(chrom)
|
||||||
|
```
|
||||||
|
|
||||||
|
### MRMFeatureFinderScoring
|
||||||
|
|
||||||
|
Scoring and peak picking for targeted (MRM/SRM) experiments.
|
||||||
|
|
||||||
|
```python
|
||||||
|
mrm_finder = oms.MRMFeatureFinderScoring()
|
||||||
|
|
||||||
|
params = mrm_finder.getParameters()
|
||||||
|
params.setValue("TransitionGroupPicker:min_peak_width", 2.0)
|
||||||
|
params.setValue("TransitionGroupPicker:recalculate_peaks", "true")
|
||||||
|
params.setValue("TransitionGroupPicker:PeakPickerMRM:signal_to_noise", 1.0)
|
||||||
|
mrm_finder.setParameters(params)
|
||||||
|
|
||||||
|
# Requires chromatograms
|
||||||
|
features = oms.FeatureMap()
|
||||||
|
mrm_finder.pickExperiment(chrom_exp, features, targets, transformation, swath_maps)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Quantification
|
||||||
|
|
||||||
|
### ProteinInference
|
||||||
|
|
||||||
|
Infers proteins from peptide identifications.
|
||||||
|
|
||||||
|
```python
|
||||||
|
protein_inference = oms.BasicProteinInferenceAlgorithm()
|
||||||
|
|
||||||
|
# Apply to identification results
|
||||||
|
protein_inference.run(peptide_ids, protein_ids)
|
||||||
|
```
|
||||||
|
|
||||||
|
### IsobaricQuantification
|
||||||
|
|
||||||
|
Quantification for isobaric labeling (TMT, iTRAQ).
|
||||||
|
|
||||||
|
```python
|
||||||
|
# For TMT/iTRAQ quantification
|
||||||
|
iso_quant = oms.IsobaricQuantification()
|
||||||
|
|
||||||
|
params = iso_quant.getParameters()
|
||||||
|
params.setValue("channel_116_description", "Sample1")
|
||||||
|
params.setValue("channel_117_description", "Sample2")
|
||||||
|
# ... configure all channels
|
||||||
|
iso_quant.setParameters(params)
|
||||||
|
|
||||||
|
# Run quantification
|
||||||
|
quant_method = oms.IsobaricQuantitationMethod.TMT_10PLEX
|
||||||
|
quant_info = oms.IsobaricQuantifierStatistics()
|
||||||
|
iso_quant.quantify(exp, quant_info)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Data Processing
|
||||||
|
|
||||||
|
### BaselineFiltering
|
||||||
|
|
||||||
|
Removes baseline from spectra.
|
||||||
|
|
||||||
|
```python
|
||||||
|
baseline = oms.TopHatFilter()
|
||||||
|
|
||||||
|
params = baseline.getParameters()
|
||||||
|
params.setValue("struc_elem_length", 3.0) # Structuring element size
|
||||||
|
params.setValue("struc_elem_unit", "Thomson")
|
||||||
|
baseline.setParameters(params)
|
||||||
|
|
||||||
|
baseline.filterExperiment(exp)
|
||||||
|
```
|
||||||
|
|
||||||
|
### SpectraMerger
|
||||||
|
|
||||||
|
Merges consecutive similar spectra.
|
||||||
|
|
||||||
|
```python
|
||||||
|
merger = oms.SpectraMerger()
|
||||||
|
|
||||||
|
params = merger.getParameters()
|
||||||
|
params.setValue("mz_binning_width", 0.05) # Binning width (Da)
|
||||||
|
params.setValue("sort_blocks", "RT_ascending")
|
||||||
|
merger.setParameters(params)
|
||||||
|
|
||||||
|
merger.mergeSpectra(exp)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Quality Control
|
||||||
|
|
||||||
|
### MzMLFileQuality
|
||||||
|
|
||||||
|
Analyzes mzML file quality.
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Calculate basic QC metrics
|
||||||
|
def calculate_qc_metrics(exp):
|
||||||
|
metrics = {
|
||||||
|
'n_spectra': exp.getNrSpectra(),
|
||||||
|
'n_ms1': sum(1 for s in exp if s.getMSLevel() == 1),
|
||||||
|
'n_ms2': sum(1 for s in exp if s.getMSLevel() == 2),
|
||||||
|
'rt_range': (exp.getMinRT(), exp.getMaxRT()),
|
||||||
|
'mz_range': (exp.getMinMZ(), exp.getMaxMZ()),
|
||||||
|
}
|
||||||
|
|
||||||
|
# Calculate TIC
|
||||||
|
tics = []
|
||||||
|
for spectrum in exp:
|
||||||
|
if spectrum.getMSLevel() == 1:
|
||||||
|
mz, intensity = spectrum.get_peaks()
|
||||||
|
tics.append(sum(intensity))
|
||||||
|
|
||||||
|
metrics['median_tic'] = np.median(tics)
|
||||||
|
metrics['mean_tic'] = np.mean(tics)
|
||||||
|
|
||||||
|
return metrics
|
||||||
|
```
|
||||||
|
|
||||||
|
## FDR Control
|
||||||
|
|
||||||
|
### FalseDiscoveryRate
|
||||||
|
|
||||||
|
Estimates and controls false discovery rate.
|
||||||
|
|
||||||
|
```python
|
||||||
|
fdr = oms.FalseDiscoveryRate()
|
||||||
|
|
||||||
|
params = fdr.getParameters()
|
||||||
|
params.setValue("add_decoy_peptides", "false")
|
||||||
|
params.setValue("add_decoy_proteins", "false")
|
||||||
|
fdr.setParameters(params)
|
||||||
|
|
||||||
|
# Apply to identifications
|
||||||
|
fdr.apply(protein_ids, peptide_ids)
|
||||||
|
|
||||||
|
# Filter by FDR threshold
|
||||||
|
fdr_threshold = 0.01
|
||||||
|
filtered_peptides = [p for p in peptide_ids if p.getMetaValue("q-value") <= fdr_threshold]
|
||||||
|
```
|
||||||
|
|
||||||
|
## Algorithm Selection Guide
|
||||||
|
|
||||||
|
### When to Use Which Algorithm
|
||||||
|
|
||||||
|
**For Smoothing:**
|
||||||
|
- Use `GaussFilter` for general-purpose smoothing
|
||||||
|
- Use `SavitzkyGolayFilter` for preserving peak shapes
|
||||||
|
|
||||||
|
**For Peak Picking:**
|
||||||
|
- Use `PeakPickerHiRes` for high-resolution Orbitrap/FT-ICR data
|
||||||
|
- Use `PeakPickerWavelet` for lower-resolution TOF data
|
||||||
|
|
||||||
|
**For Feature Detection:**
|
||||||
|
- Use `FeatureFinderCentroided` for label-free proteomics (DDA)
|
||||||
|
- Use `FeatureFinderMultiplex` for SILAC/dimethyl labeling
|
||||||
|
- Use `FeatureFinderIdentification` when you have ID information
|
||||||
|
- Use `MRMFeatureFinderScoring` for targeted (MRM/SRM) experiments
|
||||||
|
|
||||||
|
**For Feature Linking:**
|
||||||
|
- Use `FeatureGroupingAlgorithmQT` for small-medium datasets (<10 samples)
|
||||||
|
- Use `FeatureGroupingAlgorithmKD` for large datasets (>10 samples)
|
||||||
|
|
||||||
|
## Parameter Tuning Tips
|
||||||
|
|
||||||
|
1. **S/N Thresholds**: Start with 1-3 for clean data, increase for noisy data
|
||||||
|
2. **m/z Tolerance**: Use 5-10 ppm for high-resolution instruments, 0.5-1 Da for low-res
|
||||||
|
3. **RT Tolerance**: Typically 30-60 seconds depending on chromatographic stability
|
||||||
|
4. **Peak Width**: Measure from real data - varies by instrument and gradient length
|
||||||
|
5. **Charge States**: Set based on expected analytes (1-2 for metabolites, 2-4 for peptides)
|
||||||
|
|
||||||
|
## Common Algorithm Workflows
|
||||||
|
|
||||||
|
### Complete Proteomics Workflow
|
||||||
|
|
||||||
|
```python
|
||||||
|
# 1. Load data
|
||||||
|
exp = oms.MSExperiment()
|
||||||
|
oms.MzMLFile().load("raw.mzML", exp)
|
||||||
|
|
||||||
|
# 2. Smooth
|
||||||
|
gauss = oms.GaussFilter()
|
||||||
|
gauss.filterExperiment(exp)
|
||||||
|
|
||||||
|
# 3. Peak picking
|
||||||
|
picker = oms.PeakPickerHiRes()
|
||||||
|
exp_centroid = oms.MSExperiment()
|
||||||
|
picker.pickExperiment(exp, exp_centroid)
|
||||||
|
|
||||||
|
# 4. Feature detection
|
||||||
|
ff = oms.FeatureFinderCentroided()
|
||||||
|
features = oms.FeatureMap()
|
||||||
|
ff.run(exp_centroid, features, oms.Param(), oms.FeatureMap())
|
||||||
|
|
||||||
|
# 5. Save results
|
||||||
|
oms.FeatureXMLFile().store("features.featureXML", features)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Multi-Sample Quantification
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Load multiple samples
|
||||||
|
feature_maps = []
|
||||||
|
for filename in ["sample1.mzML", "sample2.mzML", "sample3.mzML"]:
|
||||||
|
exp = oms.MSExperiment()
|
||||||
|
oms.MzMLFile().load(filename, exp)
|
||||||
|
|
||||||
|
# Process and detect features
|
||||||
|
features = detect_features(exp) # Your processing function
|
||||||
|
feature_maps.append(features)
|
||||||
|
|
||||||
|
# Align retention times
|
||||||
|
align_feature_maps(feature_maps) # Implement alignment
|
||||||
|
|
||||||
|
# Link features
|
||||||
|
grouper = oms.FeatureGroupingAlgorithmQT()
|
||||||
|
consensus_map = oms.ConsensusMap()
|
||||||
|
grouper.group(feature_maps, consensus_map)
|
||||||
|
|
||||||
|
# Export quantification matrix
|
||||||
|
export_quant_matrix(consensus_map)
|
||||||
|
```
|
||||||
715
scientific-packages/pyopenms/references/chemistry.md
Normal file
715
scientific-packages/pyopenms/references/chemistry.md
Normal file
@@ -0,0 +1,715 @@
|
|||||||
|
# pyOpenMS Chemistry Reference
|
||||||
|
|
||||||
|
This document provides comprehensive coverage of chemistry-related functionality in pyOpenMS, including elements, isotopes, molecular formulas, amino acids, peptides, proteins, and modifications.
|
||||||
|
|
||||||
|
## Elements and Isotopes
|
||||||
|
|
||||||
|
### ElementDB - Element Database
|
||||||
|
|
||||||
|
Access atomic and isotopic data for all elements.
|
||||||
|
|
||||||
|
```python
|
||||||
|
import pyopenms as oms
|
||||||
|
|
||||||
|
# Get element database instance
|
||||||
|
element_db = oms.ElementDB()
|
||||||
|
|
||||||
|
# Get element by symbol
|
||||||
|
carbon = element_db.getElement("C")
|
||||||
|
nitrogen = element_db.getElement("N")
|
||||||
|
oxygen = element_db.getElement("O")
|
||||||
|
|
||||||
|
# Element properties
|
||||||
|
print(f"Carbon monoisotopic weight: {carbon.getMonoWeight()}")
|
||||||
|
print(f"Carbon average weight: {carbon.getAverageWeight()}")
|
||||||
|
print(f"Atomic number: {carbon.getAtomicNumber()}")
|
||||||
|
print(f"Symbol: {carbon.getSymbol()}")
|
||||||
|
print(f"Name: {carbon.getName()}")
|
||||||
|
```
|
||||||
|
|
||||||
|
### Isotope Information
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Get isotope distribution for an element
|
||||||
|
isotopes = carbon.getIsotopeDistribution()
|
||||||
|
|
||||||
|
# Access specific isotope
|
||||||
|
c12 = element_db.getElement("C", 12) # Carbon-12
|
||||||
|
c13 = element_db.getElement("C", 13) # Carbon-13
|
||||||
|
|
||||||
|
print(f"C-12 abundance: {isotopes.getContainer()[0].getIntensity()}")
|
||||||
|
print(f"C-13 abundance: {isotopes.getContainer()[1].getIntensity()}")
|
||||||
|
|
||||||
|
# Isotope mass
|
||||||
|
print(f"C-12 mass: {c12.getMonoWeight()}")
|
||||||
|
print(f"C-13 mass: {c13.getMonoWeight()}")
|
||||||
|
```
|
||||||
|
|
||||||
|
### Constants
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Physical constants
|
||||||
|
avogadro = oms.Constants.AVOGADRO
|
||||||
|
electron_mass = oms.Constants.ELECTRON_MASS_U
|
||||||
|
proton_mass = oms.Constants.PROTON_MASS_U
|
||||||
|
|
||||||
|
print(f"Avogadro's number: {avogadro}")
|
||||||
|
print(f"Electron mass: {electron_mass} u")
|
||||||
|
print(f"Proton mass: {proton_mass} u")
|
||||||
|
```
|
||||||
|
|
||||||
|
## Empirical Formulas
|
||||||
|
|
||||||
|
### EmpiricalFormula - Molecular Formulas
|
||||||
|
|
||||||
|
Represent and manipulate molecular formulas.
|
||||||
|
|
||||||
|
#### Creating Formulas
|
||||||
|
|
||||||
|
```python
|
||||||
|
# From string
|
||||||
|
glucose = oms.EmpiricalFormula("C6H12O6")
|
||||||
|
water = oms.EmpiricalFormula("H2O")
|
||||||
|
ammonia = oms.EmpiricalFormula("NH3")
|
||||||
|
|
||||||
|
# From element composition
|
||||||
|
formula = oms.EmpiricalFormula()
|
||||||
|
formula.setCharge(1) # Set charge state
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Formula Arithmetic
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Addition
|
||||||
|
sucrose = oms.EmpiricalFormula("C12H22O11")
|
||||||
|
hydrolyzed = sucrose + water # Hydrolysis adds water
|
||||||
|
|
||||||
|
# Subtraction
|
||||||
|
dehydrated = glucose - water # Dehydration removes water
|
||||||
|
|
||||||
|
# Multiplication
|
||||||
|
three_waters = water * 3 # 3 H2O = H6O3
|
||||||
|
|
||||||
|
# Division
|
||||||
|
formula_half = sucrose / 2 # Half the formula
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Mass Calculations
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Monoisotopic mass
|
||||||
|
mono_mass = glucose.getMonoWeight()
|
||||||
|
print(f"Glucose monoisotopic mass: {mono_mass:.6f} Da")
|
||||||
|
|
||||||
|
# Average mass
|
||||||
|
avg_mass = glucose.getAverageWeight()
|
||||||
|
print(f"Glucose average mass: {avg_mass:.6f} Da")
|
||||||
|
|
||||||
|
# Mass difference
|
||||||
|
mass_diff = (glucose - water).getMonoWeight()
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Elemental Composition
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Get element counts
|
||||||
|
formula = oms.EmpiricalFormula("C6H12O6")
|
||||||
|
|
||||||
|
# Access individual elements
|
||||||
|
n_carbon = formula.getNumberOf(element_db.getElement("C"))
|
||||||
|
n_hydrogen = formula.getNumberOf(element_db.getElement("H"))
|
||||||
|
n_oxygen = formula.getNumberOf(element_db.getElement("O"))
|
||||||
|
|
||||||
|
print(f"C: {n_carbon}, H: {n_hydrogen}, O: {n_oxygen}")
|
||||||
|
|
||||||
|
# String representation
|
||||||
|
print(f"Formula: {formula.toString()}")
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Isotope-Specific Formulas
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Specify specific isotopes using parentheses
|
||||||
|
labeled_glucose = oms.EmpiricalFormula("(13)C6H12O6") # All carbons are C-13
|
||||||
|
partially_labeled = oms.EmpiricalFormula("C5(13)CH12O6") # One C-13
|
||||||
|
|
||||||
|
# Deuterium labeling
|
||||||
|
deuterated = oms.EmpiricalFormula("C6D12O6") # D2O instead of H2O
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Charge States
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Set charge
|
||||||
|
formula = oms.EmpiricalFormula("C6H12O6")
|
||||||
|
formula.setCharge(1) # Positive charge
|
||||||
|
|
||||||
|
# Get charge
|
||||||
|
charge = formula.getCharge()
|
||||||
|
|
||||||
|
# Calculate m/z for charged molecule
|
||||||
|
mz = formula.getMonoWeight() / abs(charge) if charge != 0 else formula.getMonoWeight()
|
||||||
|
```
|
||||||
|
|
||||||
|
### Isotope Pattern Generation
|
||||||
|
|
||||||
|
Generate theoretical isotope patterns for formulas.
|
||||||
|
|
||||||
|
#### CoarseIsotopePatternGenerator
|
||||||
|
|
||||||
|
For unit mass resolution (low-resolution instruments).
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Create generator
|
||||||
|
coarse_gen = oms.CoarseIsotopePatternGenerator()
|
||||||
|
|
||||||
|
# Generate pattern
|
||||||
|
formula = oms.EmpiricalFormula("C6H12O6")
|
||||||
|
pattern = coarse_gen.run(formula)
|
||||||
|
|
||||||
|
# Access isotope peaks
|
||||||
|
iso_dist = pattern.getContainer()
|
||||||
|
for peak in iso_dist:
|
||||||
|
mass = peak.getMZ()
|
||||||
|
abundance = peak.getIntensity()
|
||||||
|
print(f"m/z: {mass:.4f}, Abundance: {abundance:.4f}")
|
||||||
|
```
|
||||||
|
|
||||||
|
#### FineIsotopePatternGenerator
|
||||||
|
|
||||||
|
For high-resolution instruments (hyperfine structure).
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Create generator with resolution
|
||||||
|
fine_gen = oms.FineIsotopePatternGenerator(0.01) # 0.01 Da resolution
|
||||||
|
|
||||||
|
# Generate fine pattern
|
||||||
|
fine_pattern = fine_gen.run(formula)
|
||||||
|
|
||||||
|
# Access fine isotope structure
|
||||||
|
for peak in fine_pattern.getContainer():
|
||||||
|
print(f"m/z: {peak.getMZ():.6f}, Abundance: {peak.getIntensity():.6f}")
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Isotope Pattern Matching
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Compare experimental to theoretical
|
||||||
|
def compare_isotope_patterns(experimental_mz, experimental_int, formula):
|
||||||
|
# Generate theoretical
|
||||||
|
coarse_gen = oms.CoarseIsotopePatternGenerator()
|
||||||
|
theoretical = coarse_gen.run(formula)
|
||||||
|
|
||||||
|
# Extract theoretical peaks
|
||||||
|
theo_peaks = theoretical.getContainer()
|
||||||
|
theo_mz = [p.getMZ() for p in theo_peaks]
|
||||||
|
theo_int = [p.getIntensity() for p in theo_peaks]
|
||||||
|
|
||||||
|
# Normalize both patterns
|
||||||
|
exp_int_norm = [i / max(experimental_int) for i in experimental_int]
|
||||||
|
theo_int_norm = [i / max(theo_int) for i in theo_int]
|
||||||
|
|
||||||
|
# Calculate similarity (e.g., cosine similarity)
|
||||||
|
# ... implement similarity calculation
|
||||||
|
return similarity_score
|
||||||
|
```
|
||||||
|
|
||||||
|
## Amino Acids and Residues
|
||||||
|
|
||||||
|
### Residue - Amino Acid Representation
|
||||||
|
|
||||||
|
Access properties of amino acids.
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Get residue database
|
||||||
|
res_db = oms.ResidueDB()
|
||||||
|
|
||||||
|
# Get specific residue
|
||||||
|
leucine = res_db.getResidue("Leucine")
|
||||||
|
# Or by one-letter code
|
||||||
|
leu = res_db.getResidue("L")
|
||||||
|
|
||||||
|
# Residue properties
|
||||||
|
print(f"Name: {leucine.getName()}")
|
||||||
|
print(f"Three-letter code: {leucine.getThreeLetterCode()}")
|
||||||
|
print(f"One-letter code: {leucine.getOneLetterCode()}")
|
||||||
|
print(f"Monoisotopic mass: {leucine.getMonoWeight():.6f}")
|
||||||
|
print(f"Average mass: {leucine.getAverageWeight():.6f}")
|
||||||
|
|
||||||
|
# Chemical formula
|
||||||
|
formula = leucine.getFormula()
|
||||||
|
print(f"Formula: {formula.toString()}")
|
||||||
|
|
||||||
|
# pKa values
|
||||||
|
print(f"pKa (N-term): {leucine.getPka()}")
|
||||||
|
print(f"pKa (C-term): {leucine.getPkb()}")
|
||||||
|
print(f"pKa (side chain): {leucine.getPkc()}")
|
||||||
|
|
||||||
|
# Side chain basicity/acidity
|
||||||
|
print(f"Basicity: {leucine.getBasicity()}")
|
||||||
|
print(f"Hydrophobicity: {leucine.getHydrophobicity()}")
|
||||||
|
```
|
||||||
|
|
||||||
|
### All Standard Amino Acids
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Iterate over all residues
|
||||||
|
for residue_name in ["Alanine", "Cysteine", "Aspartic acid", "Glutamic acid",
|
||||||
|
"Phenylalanine", "Glycine", "Histidine", "Isoleucine",
|
||||||
|
"Lysine", "Leucine", "Methionine", "Asparagine",
|
||||||
|
"Proline", "Glutamine", "Arginine", "Serine",
|
||||||
|
"Threonine", "Valine", "Tryptophan", "Tyrosine"]:
|
||||||
|
res = res_db.getResidue(residue_name)
|
||||||
|
print(f"{res.getOneLetterCode()}: {res.getMonoWeight():.4f} Da")
|
||||||
|
```
|
||||||
|
|
||||||
|
### Internal Residues vs. Termini
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Get internal residue mass (no terminal groups)
|
||||||
|
internal_mass = leucine.getInternalToFull()
|
||||||
|
|
||||||
|
# Get residue with N-terminal modification
|
||||||
|
n_terminal = res_db.getResidue("L[1]") # With NH2
|
||||||
|
|
||||||
|
# Get residue with C-terminal modification
|
||||||
|
c_terminal = res_db.getResidue("L[2]") # With COOH
|
||||||
|
```
|
||||||
|
|
||||||
|
## Peptide Sequences
|
||||||
|
|
||||||
|
### AASequence - Amino Acid Sequences
|
||||||
|
|
||||||
|
Represent and manipulate peptide sequences.
|
||||||
|
|
||||||
|
#### Creating Sequences
|
||||||
|
|
||||||
|
```python
|
||||||
|
# From string
|
||||||
|
peptide = oms.AASequence.fromString("PEPTIDE")
|
||||||
|
longer = oms.AASequence.fromString("MKTAYIAKQRQISFVK")
|
||||||
|
|
||||||
|
# Empty sequence
|
||||||
|
empty_seq = oms.AASequence()
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Sequence Properties
|
||||||
|
|
||||||
|
```python
|
||||||
|
peptide = oms.AASequence.fromString("PEPTIDE")
|
||||||
|
|
||||||
|
# Length
|
||||||
|
length = peptide.size()
|
||||||
|
print(f"Length: {length} residues")
|
||||||
|
|
||||||
|
# Mass
|
||||||
|
mono_mass = peptide.getMonoWeight()
|
||||||
|
avg_mass = peptide.getAverageWeight()
|
||||||
|
print(f"Monoisotopic mass: {mono_mass:.6f} Da")
|
||||||
|
print(f"Average mass: {avg_mass:.6f} Da")
|
||||||
|
|
||||||
|
# Formula
|
||||||
|
formula = peptide.getFormula()
|
||||||
|
print(f"Formula: {formula.toString()}")
|
||||||
|
|
||||||
|
# String representation
|
||||||
|
seq_str = peptide.toString()
|
||||||
|
print(f"Sequence: {seq_str}")
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Accessing Individual Residues
|
||||||
|
|
||||||
|
```python
|
||||||
|
peptide = oms.AASequence.fromString("PEPTIDE")
|
||||||
|
|
||||||
|
# Access by index
|
||||||
|
first_aa = peptide[0] # Returns Residue object
|
||||||
|
print(f"First amino acid: {first_aa.getOneLetterCode()}")
|
||||||
|
|
||||||
|
# Iterate
|
||||||
|
for i in range(peptide.size()):
|
||||||
|
residue = peptide[i]
|
||||||
|
print(f"Position {i}: {residue.getOneLetterCode()}")
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Modifications
|
||||||
|
|
||||||
|
Add post-translational modifications (PTMs) to sequences.
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Modifications in sequence string
|
||||||
|
# Format: AA(ModificationName)
|
||||||
|
oxidized_met = oms.AASequence.fromString("PEPTIDEM(Oxidation)")
|
||||||
|
phospho = oms.AASequence.fromString("PEPTIDES(Phospho)T(Phospho)")
|
||||||
|
|
||||||
|
# Multiple modifications
|
||||||
|
multi_mod = oms.AASequence.fromString("M(Oxidation)PEPTIDEK(Acetyl)")
|
||||||
|
|
||||||
|
# N-terminal modifications
|
||||||
|
n_term_acetyl = oms.AASequence.fromString("(Acetyl)PEPTIDE")
|
||||||
|
|
||||||
|
# C-terminal modifications
|
||||||
|
c_term_amide = oms.AASequence.fromString("PEPTIDE(Amidated)")
|
||||||
|
|
||||||
|
# Check mass change
|
||||||
|
unmodified = oms.AASequence.fromString("PEPTIDE")
|
||||||
|
modified = oms.AASequence.fromString("PEPTIDEM(Oxidation)")
|
||||||
|
mass_diff = modified.getMonoWeight() - unmodified.getMonoWeight()
|
||||||
|
print(f"Mass shift from oxidation: {mass_diff:.6f} Da")
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Sequence Manipulation
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Prefix (N-terminal fragment)
|
||||||
|
prefix = peptide.getPrefix(3) # First 3 residues
|
||||||
|
print(f"Prefix: {prefix.toString()}")
|
||||||
|
|
||||||
|
# Suffix (C-terminal fragment)
|
||||||
|
suffix = peptide.getSuffix(3) # Last 3 residues
|
||||||
|
print(f"Suffix: {suffix.toString()}")
|
||||||
|
|
||||||
|
# Subsequence
|
||||||
|
subseq = peptide.getSubsequence(2, 4) # Residues 2-4
|
||||||
|
print(f"Subsequence: {subseq.toString()}")
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Theoretical Fragmentation
|
||||||
|
|
||||||
|
Generate theoretical fragment ions for MS/MS.
|
||||||
|
|
||||||
|
```python
|
||||||
|
peptide = oms.AASequence.fromString("PEPTIDE")
|
||||||
|
|
||||||
|
# b-ions (N-terminal fragments)
|
||||||
|
b_ions = []
|
||||||
|
for i in range(1, peptide.size()):
|
||||||
|
b_fragment = peptide.getPrefix(i)
|
||||||
|
b_mass = b_fragment.getMonoWeight()
|
||||||
|
b_ions.append(('b', i, b_mass))
|
||||||
|
print(f"b{i}: {b_mass:.4f}")
|
||||||
|
|
||||||
|
# y-ions (C-terminal fragments)
|
||||||
|
y_ions = []
|
||||||
|
for i in range(1, peptide.size()):
|
||||||
|
y_fragment = peptide.getSuffix(i)
|
||||||
|
y_mass = y_fragment.getMonoWeight()
|
||||||
|
y_ions.append(('y', i, y_mass))
|
||||||
|
print(f"y{i}: {y_mass:.4f}")
|
||||||
|
|
||||||
|
# a-ions (b - CO)
|
||||||
|
a_ions = []
|
||||||
|
CO_mass = 27.994915 # CO loss
|
||||||
|
for ion_type, position, mass in b_ions:
|
||||||
|
a_mass = mass - CO_mass
|
||||||
|
a_ions.append(('a', position, a_mass))
|
||||||
|
|
||||||
|
# c-ions (b + NH3)
|
||||||
|
NH3_mass = 17.026549 # NH3 gain
|
||||||
|
c_ions = []
|
||||||
|
for ion_type, position, mass in b_ions:
|
||||||
|
c_mass = mass + NH3_mass
|
||||||
|
c_ions.append(('c', position, c_mass))
|
||||||
|
|
||||||
|
# z-ions (y - NH3)
|
||||||
|
z_ions = []
|
||||||
|
for ion_type, position, mass in y_ions:
|
||||||
|
z_mass = mass - NH3_mass
|
||||||
|
z_ions.append(('z', position, z_mass))
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Calculate m/z for Charge States
|
||||||
|
|
||||||
|
```python
|
||||||
|
peptide = oms.AASequence.fromString("PEPTIDE")
|
||||||
|
proton_mass = 1.007276
|
||||||
|
|
||||||
|
# [M+H]+
|
||||||
|
mz_1 = peptide.getMonoWeight() + proton_mass
|
||||||
|
print(f"[M+H]+: {mz_1:.4f}")
|
||||||
|
|
||||||
|
# [M+2H]2+
|
||||||
|
mz_2 = (peptide.getMonoWeight() + 2 * proton_mass) / 2
|
||||||
|
print(f"[M+2H]2+: {mz_2:.4f}")
|
||||||
|
|
||||||
|
# [M+3H]3+
|
||||||
|
mz_3 = (peptide.getMonoWeight() + 3 * proton_mass) / 3
|
||||||
|
print(f"[M+3H]3+: {mz_3:.4f}")
|
||||||
|
|
||||||
|
# General formula for any charge
|
||||||
|
def calculate_mz(sequence, charge):
|
||||||
|
proton_mass = 1.007276
|
||||||
|
return (sequence.getMonoWeight() + charge * proton_mass) / charge
|
||||||
|
|
||||||
|
for z in range(1, 5):
|
||||||
|
print(f"[M+{z}H]{z}+: {calculate_mz(peptide, z):.4f}")
|
||||||
|
```
|
||||||
|
|
||||||
|
## Protein Digestion
|
||||||
|
|
||||||
|
### ProteaseDigestion - Enzymatic Cleavage
|
||||||
|
|
||||||
|
Simulate enzymatic protein digestion.
|
||||||
|
|
||||||
|
#### Basic Digestion
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Create digestion object
|
||||||
|
dig = oms.ProteaseDigestion()
|
||||||
|
|
||||||
|
# Set enzyme
|
||||||
|
dig.setEnzyme("Trypsin") # Cleaves after K, R
|
||||||
|
|
||||||
|
# Other common enzymes:
|
||||||
|
# - "Trypsin" (K, R)
|
||||||
|
# - "Lys-C" (K)
|
||||||
|
# - "Arg-C" (R)
|
||||||
|
# - "Asp-N" (D)
|
||||||
|
# - "Glu-C" (E, D)
|
||||||
|
# - "Chymotrypsin" (F, Y, W, L)
|
||||||
|
|
||||||
|
# Set missed cleavages
|
||||||
|
dig.setMissedCleavages(0) # No missed cleavages
|
||||||
|
dig.setMissedCleavages(2) # Allow up to 2 missed cleavages
|
||||||
|
|
||||||
|
# Perform digestion
|
||||||
|
protein = oms.AASequence.fromString("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEK")
|
||||||
|
peptides = []
|
||||||
|
dig.digest(protein, peptides)
|
||||||
|
|
||||||
|
# Print results
|
||||||
|
for pep in peptides:
|
||||||
|
print(f"{pep.toString()}: {pep.getMonoWeight():.2f} Da")
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Advanced Digestion Options
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Get enzyme specificity
|
||||||
|
specificity = dig.getSpecificity()
|
||||||
|
# oms.EnzymaticDigestion.SPEC_FULL (both termini)
|
||||||
|
# oms.EnzymaticDigestion.SPEC_SEMI (one terminus)
|
||||||
|
# oms.EnzymaticDigestion.SPEC_NONE (no specificity)
|
||||||
|
|
||||||
|
# Set specificity for semi-tryptic search
|
||||||
|
dig.setSpecificity(oms.EnzymaticDigestion.SPEC_SEMI)
|
||||||
|
|
||||||
|
# Get cleavage sites
|
||||||
|
cleavage_residues = dig.getEnzyme().getCutAfterResidues()
|
||||||
|
restriction_residues = dig.getEnzyme().getRestriction()
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Filter Peptides by Properties
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Filter by mass range
|
||||||
|
min_mass = 600.0
|
||||||
|
max_mass = 4000.0
|
||||||
|
filtered = [p for p in peptides if min_mass <= p.getMonoWeight() <= max_mass]
|
||||||
|
|
||||||
|
# Filter by length
|
||||||
|
min_length = 6
|
||||||
|
max_length = 30
|
||||||
|
length_filtered = [p for p in peptides if min_length <= p.size() <= max_length]
|
||||||
|
|
||||||
|
# Combine filters
|
||||||
|
valid_peptides = [p for p in peptides
|
||||||
|
if min_mass <= p.getMonoWeight() <= max_mass
|
||||||
|
and min_length <= p.size() <= max_length]
|
||||||
|
```
|
||||||
|
|
||||||
|
## Modifications
|
||||||
|
|
||||||
|
### ModificationsDB - Modification Database
|
||||||
|
|
||||||
|
Access and apply post-translational modifications.
|
||||||
|
|
||||||
|
#### Accessing Modifications
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Get modifications database
|
||||||
|
mod_db = oms.ModificationsDB()
|
||||||
|
|
||||||
|
# Get specific modification
|
||||||
|
oxidation = mod_db.getModification("Oxidation")
|
||||||
|
phospho = mod_db.getModification("Phospho")
|
||||||
|
acetyl = mod_db.getModification("Acetyl")
|
||||||
|
|
||||||
|
# Modification properties
|
||||||
|
print(f"Name: {oxidation.getFullName()}")
|
||||||
|
print(f"Mass difference: {oxidation.getDiffMonoMass():.6f} Da")
|
||||||
|
print(f"Formula: {oxidation.getDiffFormula().toString()}")
|
||||||
|
|
||||||
|
# Affected residues
|
||||||
|
print(f"Residues: {oxidation.getResidues()}") # e.g., ['M']
|
||||||
|
|
||||||
|
# Specificity (N-term, C-term, anywhere)
|
||||||
|
print(f"Term specificity: {oxidation.getTermSpecificity()}")
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Common Modifications
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Oxidation (M)
|
||||||
|
oxidation = mod_db.getModification("Oxidation")
|
||||||
|
print(f"Oxidation: +{oxidation.getDiffMonoMass():.4f} Da")
|
||||||
|
|
||||||
|
# Phosphorylation (S, T, Y)
|
||||||
|
phospho = mod_db.getModification("Phospho")
|
||||||
|
print(f"Phospho: +{phospho.getDiffMonoMass():.4f} Da")
|
||||||
|
|
||||||
|
# Carbamidomethylation (C) - common alkylation
|
||||||
|
carbamido = mod_db.getModification("Carbamidomethyl")
|
||||||
|
print(f"Carbamidomethyl: +{carbamido.getDiffMonoMass():.4f} Da")
|
||||||
|
|
||||||
|
# Acetylation (K, N-term)
|
||||||
|
acetyl = mod_db.getModification("Acetyl")
|
||||||
|
print(f"Acetyl: +{acetyl.getDiffMonoMass():.4f} Da")
|
||||||
|
|
||||||
|
# Deamidation (N, Q)
|
||||||
|
deamid = mod_db.getModification("Deamidated")
|
||||||
|
print(f"Deamidation: +{deamid.getDiffMonoMass():.4f} Da")
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Searching Modifications
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Search modifications by mass
|
||||||
|
mass_tolerance = 0.01 # Da
|
||||||
|
target_mass = 15.9949 # Oxidation
|
||||||
|
|
||||||
|
# Get all modifications
|
||||||
|
all_mods = []
|
||||||
|
mod_db.getAllSearchModifications(all_mods)
|
||||||
|
|
||||||
|
# Find matching modifications
|
||||||
|
matching = []
|
||||||
|
for mod_name in all_mods:
|
||||||
|
mod = mod_db.getModification(mod_name)
|
||||||
|
if abs(mod.getDiffMonoMass() - target_mass) < mass_tolerance:
|
||||||
|
matching.append(mod)
|
||||||
|
print(f"Match: {mod.getFullName()} ({mod.getDiffMonoMass():.4f} Da)")
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Variable vs. Fixed Modifications
|
||||||
|
|
||||||
|
```python
|
||||||
|
# In search engines, specify:
|
||||||
|
# Fixed modifications: applied to all occurrences
|
||||||
|
fixed_mods = ["Carbamidomethyl (C)"]
|
||||||
|
|
||||||
|
# Variable modifications: optionally present
|
||||||
|
variable_mods = ["Oxidation (M)", "Phospho (S)", "Phospho (T)", "Phospho (Y)"]
|
||||||
|
```
|
||||||
|
|
||||||
|
## Ribonucleotides (RNA)
|
||||||
|
|
||||||
|
### Ribonucleotide - RNA Building Blocks
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Get ribonucleotide database
|
||||||
|
ribo_db = oms.RibonucleotideDB()
|
||||||
|
|
||||||
|
# Get specific ribonucleotide
|
||||||
|
adenine = ribo_db.getRibonucleotide("A")
|
||||||
|
uracil = ribo_db.getRibonucleotide("U")
|
||||||
|
guanine = ribo_db.getRibonucleotide("G")
|
||||||
|
cytosine = ribo_db.getRibonucleotide("C")
|
||||||
|
|
||||||
|
# Properties
|
||||||
|
print(f"Adenine mono mass: {adenine.getMonoWeight()}")
|
||||||
|
print(f"Formula: {adenine.getFormula().toString()}")
|
||||||
|
|
||||||
|
# Modified ribonucleotides
|
||||||
|
modified_ribo = ribo_db.getRibonucleotide("m6A") # N6-methyladenosine
|
||||||
|
```
|
||||||
|
|
||||||
|
## Practical Examples
|
||||||
|
|
||||||
|
### Calculate Peptide Mass with Modifications
|
||||||
|
|
||||||
|
```python
|
||||||
|
def calculate_peptide_mz(sequence_str, charge):
|
||||||
|
"""Calculate m/z for a peptide sequence string with modifications."""
|
||||||
|
peptide = oms.AASequence.fromString(sequence_str)
|
||||||
|
proton_mass = 1.007276
|
||||||
|
mz = (peptide.getMonoWeight() + charge * proton_mass) / charge
|
||||||
|
return mz
|
||||||
|
|
||||||
|
# Examples
|
||||||
|
print(calculate_peptide_mz("PEPTIDE", 2)) # Unmodified [M+2H]2+
|
||||||
|
print(calculate_peptide_mz("PEPTIDEM(Oxidation)", 2)) # With oxidation
|
||||||
|
print(calculate_peptide_mz("(Acetyl)PEPTIDEK(Acetyl)", 2)) # Acetylated
|
||||||
|
```
|
||||||
|
|
||||||
|
### Generate Complete Fragment Ion Series
|
||||||
|
|
||||||
|
```python
|
||||||
|
def generate_fragment_ions(sequence_str, charge_states=[1, 2]):
|
||||||
|
"""Generate comprehensive fragment ion list."""
|
||||||
|
peptide = oms.AASequence.fromString(sequence_str)
|
||||||
|
proton_mass = 1.007276
|
||||||
|
fragments = []
|
||||||
|
|
||||||
|
for i in range(1, peptide.size()):
|
||||||
|
# b and y ions
|
||||||
|
b_frag = peptide.getPrefix(i)
|
||||||
|
y_frag = peptide.getSuffix(i)
|
||||||
|
|
||||||
|
for z in charge_states:
|
||||||
|
b_mz = (b_frag.getMonoWeight() + z * proton_mass) / z
|
||||||
|
y_mz = (y_frag.getMonoWeight() + z * proton_mass) / z
|
||||||
|
|
||||||
|
fragments.append({
|
||||||
|
'type': 'b',
|
||||||
|
'position': i,
|
||||||
|
'charge': z,
|
||||||
|
'mz': b_mz
|
||||||
|
})
|
||||||
|
fragments.append({
|
||||||
|
'type': 'y',
|
||||||
|
'position': i,
|
||||||
|
'charge': z,
|
||||||
|
'mz': y_mz
|
||||||
|
})
|
||||||
|
|
||||||
|
return fragments
|
||||||
|
|
||||||
|
# Usage
|
||||||
|
ions = generate_fragment_ions("PEPTIDE", charge_states=[1, 2])
|
||||||
|
for ion in ions:
|
||||||
|
print(f"{ion['type']}{ion['position']}^{ion['charge']}+: {ion['mz']:.4f}")
|
||||||
|
```
|
||||||
|
|
||||||
|
### Digest Protein and Calculate Peptide Masses
|
||||||
|
|
||||||
|
```python
|
||||||
|
def digest_and_calculate(protein_seq_str, enzyme="Trypsin", missed_cleavages=2,
|
||||||
|
min_mass=600, max_mass=4000):
|
||||||
|
"""Digest protein and return valid peptides with masses."""
|
||||||
|
dig = oms.ProteaseDigestion()
|
||||||
|
dig.setEnzyme(enzyme)
|
||||||
|
dig.setMissedCleavages(missed_cleavages)
|
||||||
|
|
||||||
|
protein = oms.AASequence.fromString(protein_seq_str)
|
||||||
|
peptides = []
|
||||||
|
dig.digest(protein, peptides)
|
||||||
|
|
||||||
|
results = []
|
||||||
|
for pep in peptides:
|
||||||
|
mass = pep.getMonoWeight()
|
||||||
|
if min_mass <= mass <= max_mass:
|
||||||
|
results.append({
|
||||||
|
'sequence': pep.toString(),
|
||||||
|
'mass': mass,
|
||||||
|
'length': pep.size()
|
||||||
|
})
|
||||||
|
|
||||||
|
return results
|
||||||
|
|
||||||
|
# Usage
|
||||||
|
protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEK"
|
||||||
|
peptides = digest_and_calculate(protein)
|
||||||
|
for pep in peptides:
|
||||||
|
print(f"{pep['sequence']}: {pep['mass']:.2f} Da ({pep['length']} aa)")
|
||||||
|
```
|
||||||
560
scientific-packages/pyopenms/references/data_structures.md
Normal file
560
scientific-packages/pyopenms/references/data_structures.md
Normal file
@@ -0,0 +1,560 @@
|
|||||||
|
# pyOpenMS Data Structures Reference
|
||||||
|
|
||||||
|
This document provides comprehensive coverage of core data structures in pyOpenMS for representing mass spectrometry data.
|
||||||
|
|
||||||
|
## Core Hierarchy
|
||||||
|
|
||||||
|
```
|
||||||
|
MSExperiment # Top-level: Complete LC-MS/MS run
|
||||||
|
├── MSSpectrum[] # Collection of mass spectra
|
||||||
|
│ ├── Peak1D[] # Individual m/z, intensity pairs
|
||||||
|
│ └── SpectrumSettings # Metadata (RT, MS level, precursor)
|
||||||
|
└── MSChromatogram[] # Collection of chromatograms
|
||||||
|
├── ChromatogramPeak[] # RT, intensity pairs
|
||||||
|
└── ChromatogramSettings # Metadata
|
||||||
|
```
|
||||||
|
|
||||||
|
## MSSpectrum
|
||||||
|
|
||||||
|
Represents a single mass spectrum (1-dimensional peak data).
|
||||||
|
|
||||||
|
### Creation and Basic Properties
|
||||||
|
|
||||||
|
```python
|
||||||
|
import pyopenms as oms
|
||||||
|
|
||||||
|
# Create empty spectrum
|
||||||
|
spectrum = oms.MSSpectrum()
|
||||||
|
|
||||||
|
# Set metadata
|
||||||
|
spectrum.setRT(123.45) # Retention time in seconds
|
||||||
|
spectrum.setMSLevel(1) # MS level (1 for MS1, 2 for MS2, etc.)
|
||||||
|
spectrum.setNativeID("scan=1234") # Native ID from file
|
||||||
|
|
||||||
|
# Additional metadata
|
||||||
|
spectrum.setDriftTime(15.2) # Ion mobility drift time
|
||||||
|
spectrum.setName("MyScan") # Optional name
|
||||||
|
```
|
||||||
|
|
||||||
|
### Peak Data Management
|
||||||
|
|
||||||
|
**Setting Peaks (Method 1 - Lists):**
|
||||||
|
```python
|
||||||
|
mz_values = [100.5, 200.3, 300.7, 400.2, 500.1]
|
||||||
|
intensity_values = [1000, 5000, 3000, 2000, 1500]
|
||||||
|
|
||||||
|
spectrum.set_peaks((mz_values, intensity_values))
|
||||||
|
```
|
||||||
|
|
||||||
|
**Setting Peaks (Method 2 - NumPy arrays):**
|
||||||
|
```python
|
||||||
|
import numpy as np
|
||||||
|
|
||||||
|
mz_array = np.array([100.5, 200.3, 300.7, 400.2, 500.1])
|
||||||
|
intensity_array = np.array([1000, 5000, 3000, 2000, 1500])
|
||||||
|
|
||||||
|
spectrum.set_peaks((mz_array, intensity_array))
|
||||||
|
```
|
||||||
|
|
||||||
|
**Retrieving Peaks:**
|
||||||
|
```python
|
||||||
|
# Get as numpy arrays (efficient)
|
||||||
|
mz_array, intensity_array = spectrum.get_peaks()
|
||||||
|
|
||||||
|
# Check number of peaks
|
||||||
|
n_peaks = spectrum.size()
|
||||||
|
|
||||||
|
# Get individual peak (slower)
|
||||||
|
for i in range(spectrum.size()):
|
||||||
|
peak = spectrum[i]
|
||||||
|
mz = peak.getMZ()
|
||||||
|
intensity = peak.getIntensity()
|
||||||
|
```
|
||||||
|
|
||||||
|
### Precursor Information (for MS2/MSn spectra)
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Create precursor
|
||||||
|
precursor = oms.Precursor()
|
||||||
|
precursor.setMZ(456.789) # Precursor m/z
|
||||||
|
precursor.setCharge(2) # Precursor charge
|
||||||
|
precursor.setIntensity(50000) # Precursor intensity
|
||||||
|
precursor.setIsolationWindowLowerOffset(1.5) # Lower isolation window
|
||||||
|
precursor.setIsolationWindowUpperOffset(1.5) # Upper isolation window
|
||||||
|
|
||||||
|
# Set activation method
|
||||||
|
activation = oms.Activation()
|
||||||
|
activation.setActivationEnergy(35.0) # Collision energy
|
||||||
|
activation.setMethod(oms.Activation.ActivationMethod.CID)
|
||||||
|
precursor.setActivation(activation)
|
||||||
|
|
||||||
|
# Assign to spectrum
|
||||||
|
spectrum.setPrecursors([precursor])
|
||||||
|
|
||||||
|
# Retrieve precursor information
|
||||||
|
precursors = spectrum.getPrecursors()
|
||||||
|
if len(precursors) > 0:
|
||||||
|
prec = precursors[0]
|
||||||
|
print(f"Precursor m/z: {prec.getMZ()}")
|
||||||
|
print(f"Precursor charge: {prec.getCharge()}")
|
||||||
|
```
|
||||||
|
|
||||||
|
### Spectrum Metadata Access
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Check if spectrum is sorted by m/z
|
||||||
|
is_sorted = spectrum.isSorted()
|
||||||
|
|
||||||
|
# Sort spectrum by m/z
|
||||||
|
spectrum.sortByPosition()
|
||||||
|
|
||||||
|
# Sort by intensity
|
||||||
|
spectrum.sortByIntensity()
|
||||||
|
|
||||||
|
# Clear all peaks
|
||||||
|
spectrum.clear(False) # False = keep metadata, True = clear everything
|
||||||
|
|
||||||
|
# Get retention time
|
||||||
|
rt = spectrum.getRT()
|
||||||
|
|
||||||
|
# Get MS level
|
||||||
|
ms_level = spectrum.getMSLevel()
|
||||||
|
```
|
||||||
|
|
||||||
|
### Spectrum Types and Modes
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Set spectrum type
|
||||||
|
spectrum.setType(oms.SpectrumSettings.SpectrumType.CENTROID) # or PROFILE
|
||||||
|
|
||||||
|
# Get spectrum type
|
||||||
|
spec_type = spectrum.getType()
|
||||||
|
if spec_type == oms.SpectrumSettings.SpectrumType.CENTROID:
|
||||||
|
print("Centroid spectrum")
|
||||||
|
elif spec_type == oms.SpectrumSettings.SpectrumType.PROFILE:
|
||||||
|
print("Profile spectrum")
|
||||||
|
```
|
||||||
|
|
||||||
|
### Data Processing Annotations
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Add processing information
|
||||||
|
processing = oms.DataProcessing()
|
||||||
|
processing.setMetaValue("smoothing", "gaussian")
|
||||||
|
spectrum.setDataProcessing([processing])
|
||||||
|
```
|
||||||
|
|
||||||
|
## MSExperiment
|
||||||
|
|
||||||
|
Represents a complete LC-MS/MS experiment containing multiple spectra and chromatograms.
|
||||||
|
|
||||||
|
### Creation and Population
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Create empty experiment
|
||||||
|
exp = oms.MSExperiment()
|
||||||
|
|
||||||
|
# Add spectra
|
||||||
|
spectrum1 = oms.MSSpectrum()
|
||||||
|
spectrum1.setRT(100.0)
|
||||||
|
spectrum1.set_peaks(([100, 200], [1000, 2000]))
|
||||||
|
|
||||||
|
spectrum2 = oms.MSSpectrum()
|
||||||
|
spectrum2.setRT(200.0)
|
||||||
|
spectrum2.set_peaks(([100, 200], [1500, 2500]))
|
||||||
|
|
||||||
|
exp.addSpectrum(spectrum1)
|
||||||
|
exp.addSpectrum(spectrum2)
|
||||||
|
|
||||||
|
# Add chromatograms
|
||||||
|
chrom = oms.MSChromatogram()
|
||||||
|
chrom.set_peaks(([10.5, 11.0, 11.5], [1000, 5000, 3000]))
|
||||||
|
exp.addChromatogram(chrom)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Accessing Spectra and Chromatograms
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Get number of spectra and chromatograms
|
||||||
|
n_spectra = exp.getNrSpectra()
|
||||||
|
n_chroms = exp.getNrChromatograms()
|
||||||
|
|
||||||
|
# Access by index
|
||||||
|
first_spectrum = exp.getSpectrum(0)
|
||||||
|
last_spectrum = exp.getSpectrum(exp.getNrSpectra() - 1)
|
||||||
|
|
||||||
|
# Iterate over all spectra
|
||||||
|
for spectrum in exp:
|
||||||
|
rt = spectrum.getRT()
|
||||||
|
ms_level = spectrum.getMSLevel()
|
||||||
|
n_peaks = spectrum.size()
|
||||||
|
print(f"RT: {rt:.2f}s, MS{ms_level}, Peaks: {n_peaks}")
|
||||||
|
|
||||||
|
# Get all spectra as list
|
||||||
|
spectra = exp.getSpectra()
|
||||||
|
|
||||||
|
# Access chromatograms
|
||||||
|
chrom = exp.getChromatogram(0)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Filtering Operations
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Filter by MS level
|
||||||
|
exp.filterMSLevel(1) # Keep only MS1 spectra
|
||||||
|
exp.filterMSLevel(2) # Keep only MS2 spectra
|
||||||
|
|
||||||
|
# Filter by retention time range
|
||||||
|
exp.filterRT(100.0, 500.0) # Keep RT between 100-500 seconds
|
||||||
|
|
||||||
|
# Filter by m/z range (all spectra)
|
||||||
|
exp.filterMZ(300.0, 1500.0) # Keep m/z between 300-1500
|
||||||
|
|
||||||
|
# Filter by scan number
|
||||||
|
exp.filterScanNumber(100, 200) # Keep scans 100-200
|
||||||
|
```
|
||||||
|
|
||||||
|
### Metadata and Properties
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Set experiment metadata
|
||||||
|
exp.setMetaValue("operator", "John Doe")
|
||||||
|
exp.setMetaValue("instrument", "Q Exactive HF")
|
||||||
|
|
||||||
|
# Get metadata
|
||||||
|
operator = exp.getMetaValue("operator")
|
||||||
|
|
||||||
|
# Get RT range
|
||||||
|
rt_range = exp.getMinRT(), exp.getMaxRT()
|
||||||
|
|
||||||
|
# Get m/z range
|
||||||
|
mz_range = exp.getMinMZ(), exp.getMaxMZ()
|
||||||
|
|
||||||
|
# Clear all data
|
||||||
|
exp.clear(False) # False = keep metadata
|
||||||
|
```
|
||||||
|
|
||||||
|
### Sorting and Organization
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Sort spectra by retention time
|
||||||
|
exp.sortSpectra()
|
||||||
|
|
||||||
|
# Update ranges (call after modifications)
|
||||||
|
exp.updateRanges()
|
||||||
|
|
||||||
|
# Check if experiment is empty
|
||||||
|
is_empty = exp.empty()
|
||||||
|
|
||||||
|
# Reset (clear everything)
|
||||||
|
exp.reset()
|
||||||
|
```
|
||||||
|
|
||||||
|
## MSChromatogram
|
||||||
|
|
||||||
|
Represents an extracted or reconstructed chromatogram (retention time vs. intensity).
|
||||||
|
|
||||||
|
### Creation and Basic Usage
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Create chromatogram
|
||||||
|
chrom = oms.MSChromatogram()
|
||||||
|
|
||||||
|
# Set peaks (RT, intensity pairs)
|
||||||
|
rt_values = [10.0, 10.5, 11.0, 11.5, 12.0]
|
||||||
|
intensity_values = [1000, 5000, 8000, 6000, 2000]
|
||||||
|
chrom.set_peaks((rt_values, intensity_values))
|
||||||
|
|
||||||
|
# Get peaks
|
||||||
|
rt_array, int_array = chrom.get_peaks()
|
||||||
|
|
||||||
|
# Get size
|
||||||
|
n_points = chrom.size()
|
||||||
|
```
|
||||||
|
|
||||||
|
### Chromatogram Types
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Set chromatogram type
|
||||||
|
chrom.setChromatogramType(oms.ChromatogramSettings.ChromatogramType.SELECTED_ION_CURRENT_CHROMATOGRAM)
|
||||||
|
|
||||||
|
# Other types:
|
||||||
|
# - TOTAL_ION_CURRENT_CHROMATOGRAM
|
||||||
|
# - BASEPEAK_CHROMATOGRAM
|
||||||
|
# - SELECTED_ION_CURRENT_CHROMATOGRAM
|
||||||
|
# - SELECTED_REACTION_MONITORING_CHROMATOGRAM
|
||||||
|
```
|
||||||
|
|
||||||
|
### Metadata
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Set native ID
|
||||||
|
chrom.setNativeID("TIC")
|
||||||
|
|
||||||
|
# Set name
|
||||||
|
chrom.setName("Total Ion Current")
|
||||||
|
|
||||||
|
# Access
|
||||||
|
native_id = chrom.getNativeID()
|
||||||
|
name = chrom.getName()
|
||||||
|
```
|
||||||
|
|
||||||
|
### Precursor and Product Information (for SRM/MRM)
|
||||||
|
|
||||||
|
```python
|
||||||
|
# For targeted experiments
|
||||||
|
precursor = oms.Precursor()
|
||||||
|
precursor.setMZ(456.7)
|
||||||
|
chrom.setPrecursor(precursor)
|
||||||
|
|
||||||
|
product = oms.Product()
|
||||||
|
product.setMZ(789.4)
|
||||||
|
chrom.setProduct(product)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Peak1D and ChromatogramPeak
|
||||||
|
|
||||||
|
Individual peak data points.
|
||||||
|
|
||||||
|
### Peak1D (for mass spectra)
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Create individual peak
|
||||||
|
peak = oms.Peak1D()
|
||||||
|
peak.setMZ(456.789)
|
||||||
|
peak.setIntensity(10000)
|
||||||
|
|
||||||
|
# Access
|
||||||
|
mz = peak.getMZ()
|
||||||
|
intensity = peak.getIntensity()
|
||||||
|
|
||||||
|
# Set position and intensity
|
||||||
|
peak.setPosition([456.789])
|
||||||
|
peak.setIntensity(10000)
|
||||||
|
```
|
||||||
|
|
||||||
|
### ChromatogramPeak (for chromatograms)
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Create chromatogram peak
|
||||||
|
chrom_peak = oms.ChromatogramPeak()
|
||||||
|
chrom_peak.setRT(125.5)
|
||||||
|
chrom_peak.setIntensity(5000)
|
||||||
|
|
||||||
|
# Access
|
||||||
|
rt = chrom_peak.getRT()
|
||||||
|
intensity = chrom_peak.getIntensity()
|
||||||
|
```
|
||||||
|
|
||||||
|
## FeatureMap and Feature
|
||||||
|
|
||||||
|
For quantification results.
|
||||||
|
|
||||||
|
### Feature
|
||||||
|
|
||||||
|
Represents a detected LC-MS feature (peptide or metabolite signal).
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Create feature
|
||||||
|
feature = oms.Feature()
|
||||||
|
|
||||||
|
# Set properties
|
||||||
|
feature.setMZ(456.789)
|
||||||
|
feature.setRT(123.45)
|
||||||
|
feature.setIntensity(1000000)
|
||||||
|
feature.setCharge(2)
|
||||||
|
feature.setWidth(15.0) # RT width in seconds
|
||||||
|
|
||||||
|
# Set quality score
|
||||||
|
feature.setOverallQuality(0.95)
|
||||||
|
|
||||||
|
# Access
|
||||||
|
mz = feature.getMZ()
|
||||||
|
rt = feature.getRT()
|
||||||
|
intensity = feature.getIntensity()
|
||||||
|
charge = feature.getCharge()
|
||||||
|
```
|
||||||
|
|
||||||
|
### FeatureMap
|
||||||
|
|
||||||
|
Collection of features.
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Create feature map
|
||||||
|
feature_map = oms.FeatureMap()
|
||||||
|
|
||||||
|
# Add features
|
||||||
|
feature1 = oms.Feature()
|
||||||
|
feature1.setMZ(456.789)
|
||||||
|
feature1.setRT(123.45)
|
||||||
|
feature1.setIntensity(1000000)
|
||||||
|
|
||||||
|
feature_map.push_back(feature1)
|
||||||
|
|
||||||
|
# Get size
|
||||||
|
n_features = feature_map.size()
|
||||||
|
|
||||||
|
# Iterate
|
||||||
|
for feature in feature_map:
|
||||||
|
print(f"m/z: {feature.getMZ():.4f}, RT: {feature.getRT():.2f}")
|
||||||
|
|
||||||
|
# Access by index
|
||||||
|
first_feature = feature_map[0]
|
||||||
|
|
||||||
|
# Clear
|
||||||
|
feature_map.clear()
|
||||||
|
```
|
||||||
|
|
||||||
|
## PeptideIdentification and ProteinIdentification
|
||||||
|
|
||||||
|
For identification results.
|
||||||
|
|
||||||
|
### PeptideIdentification
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Create peptide identification
|
||||||
|
pep_id = oms.PeptideIdentification()
|
||||||
|
pep_id.setRT(123.45)
|
||||||
|
pep_id.setMZ(456.789)
|
||||||
|
|
||||||
|
# Create peptide hit
|
||||||
|
hit = oms.PeptideHit()
|
||||||
|
hit.setSequence(oms.AASequence.fromString("PEPTIDE"))
|
||||||
|
hit.setCharge(2)
|
||||||
|
hit.setScore(25.5)
|
||||||
|
hit.setRank(1)
|
||||||
|
|
||||||
|
# Add to identification
|
||||||
|
pep_id.setHits([hit])
|
||||||
|
pep_id.setHigherScoreBetter(True)
|
||||||
|
pep_id.setScoreType("XCorr")
|
||||||
|
|
||||||
|
# Access
|
||||||
|
hits = pep_id.getHits()
|
||||||
|
for hit in hits:
|
||||||
|
seq = hit.getSequence().toString()
|
||||||
|
score = hit.getScore()
|
||||||
|
print(f"Sequence: {seq}, Score: {score}")
|
||||||
|
```
|
||||||
|
|
||||||
|
### ProteinIdentification
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Create protein identification
|
||||||
|
prot_id = oms.ProteinIdentification()
|
||||||
|
|
||||||
|
# Create protein hit
|
||||||
|
prot_hit = oms.ProteinHit()
|
||||||
|
prot_hit.setAccession("P12345")
|
||||||
|
prot_hit.setSequence("MKTAYIAKQRQISFVK...")
|
||||||
|
prot_hit.setScore(100.5)
|
||||||
|
|
||||||
|
# Add to identification
|
||||||
|
prot_id.setHits([prot_hit])
|
||||||
|
prot_id.setScoreType("Mascot Score")
|
||||||
|
prot_id.setHigherScoreBetter(True)
|
||||||
|
|
||||||
|
# Search parameters
|
||||||
|
search_params = oms.ProteinIdentification.SearchParameters()
|
||||||
|
search_params.db = "uniprot_human.fasta"
|
||||||
|
search_params.enzyme = "Trypsin"
|
||||||
|
prot_id.setSearchParameters(search_params)
|
||||||
|
```
|
||||||
|
|
||||||
|
## ConsensusMap and ConsensusFeature
|
||||||
|
|
||||||
|
For linking features across multiple samples.
|
||||||
|
|
||||||
|
### ConsensusFeature
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Create consensus feature
|
||||||
|
cons_feature = oms.ConsensusFeature()
|
||||||
|
cons_feature.setMZ(456.789)
|
||||||
|
cons_feature.setRT(123.45)
|
||||||
|
cons_feature.setIntensity(5000000) # Combined intensity
|
||||||
|
|
||||||
|
# Access linked features
|
||||||
|
for handle in cons_feature.getFeatureList():
|
||||||
|
map_index = handle.getMapIndex()
|
||||||
|
feature_index = handle.getIndex()
|
||||||
|
intensity = handle.getIntensity()
|
||||||
|
```
|
||||||
|
|
||||||
|
### ConsensusMap
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Create consensus map
|
||||||
|
consensus_map = oms.ConsensusMap()
|
||||||
|
|
||||||
|
# Add consensus features
|
||||||
|
consensus_map.push_back(cons_feature)
|
||||||
|
|
||||||
|
# Iterate
|
||||||
|
for cons_feat in consensus_map:
|
||||||
|
mz = cons_feat.getMZ()
|
||||||
|
rt = cons_feat.getRT()
|
||||||
|
n_features = cons_feat.size() # Number of linked features
|
||||||
|
```
|
||||||
|
|
||||||
|
## Best Practices
|
||||||
|
|
||||||
|
1. **Use numpy arrays** for peak data when possible - much faster than individual peak access
|
||||||
|
2. **Sort spectra** by position (m/z) before searching or filtering
|
||||||
|
3. **Update ranges** after modifying MSExperiment: `exp.updateRanges()`
|
||||||
|
4. **Check MS level** before processing - different algorithms for MS1 vs MS2
|
||||||
|
5. **Validate precursor info** for MS2 spectra - ensure charge and m/z are set
|
||||||
|
6. **Use appropriate containers** - MSExperiment for raw data, FeatureMap for quantification
|
||||||
|
7. **Clear metadata carefully** - use `clear(False)` to preserve metadata when clearing peaks
|
||||||
|
|
||||||
|
## Common Patterns
|
||||||
|
|
||||||
|
### Create MS2 Spectrum with Precursor
|
||||||
|
|
||||||
|
```python
|
||||||
|
spectrum = oms.MSSpectrum()
|
||||||
|
spectrum.setRT(205.2)
|
||||||
|
spectrum.setMSLevel(2)
|
||||||
|
spectrum.set_peaks(([100, 200, 300], [1000, 5000, 3000]))
|
||||||
|
|
||||||
|
precursor = oms.Precursor()
|
||||||
|
precursor.setMZ(450.5)
|
||||||
|
precursor.setCharge(2)
|
||||||
|
spectrum.setPrecursors([precursor])
|
||||||
|
```
|
||||||
|
|
||||||
|
### Extract MS1 Spectra from Experiment
|
||||||
|
|
||||||
|
```python
|
||||||
|
ms1_exp = oms.MSExperiment()
|
||||||
|
for spectrum in exp:
|
||||||
|
if spectrum.getMSLevel() == 1:
|
||||||
|
ms1_exp.addSpectrum(spectrum)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Calculate Total Ion Current (TIC)
|
||||||
|
|
||||||
|
```python
|
||||||
|
tic_values = []
|
||||||
|
rt_values = []
|
||||||
|
for spectrum in exp:
|
||||||
|
if spectrum.getMSLevel() == 1:
|
||||||
|
mz, intensity = spectrum.get_peaks()
|
||||||
|
tic = np.sum(intensity)
|
||||||
|
tic_values.append(tic)
|
||||||
|
rt_values.append(spectrum.getRT())
|
||||||
|
```
|
||||||
|
|
||||||
|
### Find Spectrum Closest to RT
|
||||||
|
|
||||||
|
```python
|
||||||
|
target_rt = 125.5
|
||||||
|
closest_spectrum = None
|
||||||
|
min_diff = float('inf')
|
||||||
|
|
||||||
|
for spectrum in exp:
|
||||||
|
diff = abs(spectrum.getRT() - target_rt)
|
||||||
|
if diff < min_diff:
|
||||||
|
min_diff = diff
|
||||||
|
closest_spectrum = spectrum
|
||||||
|
```
|
||||||
Reference in New Issue
Block a user