From c127b737a5942a2aec51fed2c5f8fc456c7e6fd6 Mon Sep 17 00:00:00 2001
From: Vinayak Agarwal
Date: Mon, 5 Jan 2026 13:07:50 -0800
Subject: [PATCH] Remove extra md files from markitdown

---
 scientific-skills/markitdown/LICENSE.txt      |  22 --
 .../references/INSTALLATION_GUIDE.md          | 318 ----------------
 .../references/OPENROUTER_INTEGRATION.md      | 359 ------------------
 .../markitdown/references/QUICK_REFERENCE.md  | 309 ---------------
 .../markitdown/references/README.md           | 184 ---------
 5 files changed, 1192 deletions(-)
 delete mode 100644 scientific-skills/markitdown/LICENSE.txt
 delete mode 100644 scientific-skills/markitdown/references/INSTALLATION_GUIDE.md
 delete mode 100644 scientific-skills/markitdown/references/OPENROUTER_INTEGRATION.md
 delete mode 100644 scientific-skills/markitdown/references/QUICK_REFERENCE.md
 delete mode 100644 scientific-skills/markitdown/references/README.md

diff --git a/scientific-skills/markitdown/LICENSE.txt b/scientific-skills/markitdown/LICENSE.txt
deleted file mode 100644
index 72196cb..0000000
--- a/scientific-skills/markitdown/LICENSE.txt
+++ /dev/null
@@ -1,22 +0,0 @@
-MIT License
-
-Copyright (c) Microsoft Corporation.
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
-
diff --git a/scientific-skills/markitdown/references/INSTALLATION_GUIDE.md b/scientific-skills/markitdown/references/INSTALLATION_GUIDE.md
deleted file mode 100644
index 4bd1fc1..0000000
--- a/scientific-skills/markitdown/references/INSTALLATION_GUIDE.md
+++ /dev/null
@@ -1,318 +0,0 @@
-# MarkItDown Installation Guide
-
-## Prerequisites
-
-- Python 3.10 or higher
-- pip package manager
-- Virtual environment (recommended)
-
-## Basic Installation
-
-### Install All Features (Recommended)
-
-```bash
-pip install 'markitdown[all]'
-```
-
-This installs support for all file formats and features.
-
-### Install Specific Features
-
-If you only need certain file formats, you can install specific dependencies:
-
-```bash
-# PDF support only
-pip install 'markitdown[pdf]'
-
-# Office documents
-pip install 'markitdown[docx,pptx,xlsx]'
-
-# Multiple formats
-pip install 'markitdown[pdf,docx,pptx,xlsx,audio-transcription]'
-```
-
-### Install from Source
-
-```bash
-git clone https://github.com/microsoft/markitdown.git
-cd markitdown
-pip install -e 'packages/markitdown[all]'
-```
-
-## Optional Dependencies
-
-| Feature | Installation | Use Case |
-|---------|--------------|----------|
-| All formats | `pip install 'markitdown[all]'` | Everything |
-| PDF | `pip install 'markitdown[pdf]'` | PDF documents |
-| Word | `pip install 'markitdown[docx]'` | DOCX files |
-| PowerPoint | `pip install 'markitdown[pptx]'` | PPTX files |
-| Excel (new) | `pip install 'markitdown[xlsx]'` | XLSX files |
-| Excel (old) | `pip install 'markitdown[xls]'` | XLS files |
-| Outlook | `pip install 'markitdown[outlook]'` | MSG files |
-| Azure DI | `pip install 'markitdown[az-doc-intel]'` | Enhanced PDF |
-| Audio | `pip install 'markitdown[audio-transcription]'` | WAV/MP3 |
-| YouTube | `pip install 'markitdown[youtube-transcription]'` | YouTube videos |
-
-## System Dependencies
-
-### OCR Support (for scanned documents and images)
-
-#### macOS
-```bash
-brew install tesseract
-```
-
-#### Ubuntu/Debian
-```bash
-sudo apt-get update
-sudo apt-get install tesseract-ocr
-```
-
-#### Windows
-Download from: https://github.com/UB-Mannheim/tesseract/wiki
-
-### Poppler Utils (for advanced PDF operations)
-
-#### macOS
-```bash
-brew install poppler
-```
-
-#### Ubuntu/Debian
-```bash
-sudo apt-get install poppler-utils
-```
-
-## Verification
-
-Test your installation:
-
-```bash
-# Check version
-python -c "import markitdown; print('MarkItDown installed successfully')"
-
-# Test basic conversion
-echo "Test" > test.txt
-markitdown test.txt
-rm test.txt
-```
-
-## Virtual Environment Setup
-
-### Using venv
-
-```bash
-# Create virtual environment
-python -m venv markitdown-env
-
-# Activate (macOS/Linux)
-source markitdown-env/bin/activate
-
-# Activate (Windows)
-markitdown-env\Scripts\activate
-
-# Install
-pip install 'markitdown[all]'
-```
-
-### Using conda
-
-```bash
-# Create environment
-conda create -n markitdown python=3.12
-
-# Activate
-conda activate markitdown
-
-# Install
-pip install 'markitdown[all]'
-```
-
-### Using uv
-
-```bash
-# Create virtual environment
-uv venv --python=3.12 .venv
-
-# Activate
-source .venv/bin/activate
-
-# Install
-uv pip install 'markitdown[all]'
-```
-
-## AI Enhancement Setup (Optional)
-
-For AI-powered image descriptions using OpenRouter:
-
-### OpenRouter API
-
-OpenRouter provides unified access to multiple AI models (GPT-4, Claude, Gemini, etc.) through a single API.
-
-```bash
-# Install OpenAI SDK (required, already included with markitdown)
-pip install openai
-
-# Get API key from https://openrouter.ai/keys
-
-# Set API key
-export OPENROUTER_API_KEY="sk-or-v1-..."
-
-# Add to shell profile for persistence
-echo 'export OPENROUTER_API_KEY="sk-or-v1-..."' >> ~/.bashrc  # Linux
-echo 'export OPENROUTER_API_KEY="sk-or-v1-..."' >> ~/.zshrc   # macOS
-```
-
-**Why OpenRouter?**
-- Access to 100+ AI models through one API
-- Choose between GPT-4, Claude, Gemini, and more
-- Competitive pricing
-- No vendor lock-in
-- Simple OpenAI-compatible interface
-
-**Popular Models for Image Description:**
-- `anthropic/claude-sonnet-4.5` - **Recommended** - Best for scientific vision
-- `anthropic/claude-opus-4.5` - Excellent technical analysis
-- `openai/gpt-4o` - Good vision understanding
-- `google/gemini-pro-vision` - Cost-effective option
-
-See https://openrouter.ai/models for complete model list and pricing.
-
-## Azure Document Intelligence Setup (Optional)
-
-For enhanced PDF conversion:
-
-1. Create Azure Document Intelligence resource in Azure Portal
-2. Get endpoint and key
-3. Set environment variables:
-
-```bash
-export AZURE_DOCUMENT_INTELLIGENCE_KEY="your-key"
-export AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT="https://your-endpoint.cognitiveservices.azure.com/"
-```
-
-## Docker Installation (Alternative)
-
-```bash
-# Clone repository
-git clone https://github.com/microsoft/markitdown.git
-cd markitdown
-
-# Build image
-docker build -t markitdown:latest .
-
-# Run
-docker run --rm -i markitdown:latest < input.pdf > output.md
-```
-
-## Troubleshooting
-
-### Import Error
-```
-ModuleNotFoundError: No module named 'markitdown'
-```
-
-**Solution**: Ensure you're in the correct virtual environment and markitdown is installed:
-```bash
-pip install 'markitdown[all]'
-```
-
-### Missing Feature
-```
-Error: PDF conversion not supported
-```
-
-**Solution**: Install the specific feature:
-```bash
-pip install 'markitdown[pdf]'
-```
-
-### OCR Not Working
-
-**Solution**: Install Tesseract OCR (see System Dependencies above)
-
-### Permission Errors
-
-**Solution**: Use virtual environment or install with `--user` flag:
-```bash
-pip install --user 'markitdown[all]'
-```
-
-## Upgrading
-
-```bash
-# Upgrade to latest version
-pip install --upgrade 'markitdown[all]'
-
-# Check version
-pip show markitdown
-```
-
-## Uninstallation
-
-```bash
-pip uninstall markitdown
-```
-
-## Next Steps
-
-After installation:
-1. Read `QUICK_REFERENCE.md` for basic usage
-2. See `SKILL.md` for comprehensive guide
-3. Try example scripts in `scripts/` directory
-4. Check `assets/example_usage.md` for practical examples
-
-## Skill Scripts Setup
-
-To use the skill scripts:
-
-```bash
-# Navigate to scripts directory
-cd /Users/vinayak/Documents/claude-scientific-writer/.claude/skills/markitdown/scripts
-
-# Scripts are already executable, just run them
-python batch_convert.py --help
-python convert_with_ai.py --help
-python convert_literature.py --help
-```
-
-## Testing Installation
-
-Create a test file to verify everything works:
-
-```python
-# test_markitdown.py
-from markitdown import MarkItDown
-
-def test_basic():
-    md = MarkItDown()
-    # Create a simple test file
-    with open("test.txt", "w") as f:
-        f.write("Hello MarkItDown!")
-
-    # Convert it
-    result = md.convert("test.txt")
-    print("✓ Basic conversion works")
-    print(result.text_content)
-
-    # Cleanup
-    import os
-    os.remove("test.txt")
-
-if __name__ == "__main__":
-    test_basic()
-```
-
-Run it:
-```bash
-python test_markitdown.py
-```
-
-## Getting Help
-
-- **Documentation**: See `SKILL.md` and `README.md`
-- **GitHub Issues**: https://github.com/microsoft/markitdown/issues
-- **Examples**: `assets/example_usage.md`
-- **API Reference**: `references/api_reference.md`
-
diff --git a/scientific-skills/markitdown/references/OPENROUTER_INTEGRATION.md b/scientific-skills/markitdown/references/OPENROUTER_INTEGRATION.md
deleted file mode 100644
index f15af23..0000000
--- a/scientific-skills/markitdown/references/OPENROUTER_INTEGRATION.md
+++ /dev/null
@@ -1,359 +0,0 @@
-# OpenRouter Integration for MarkItDown
-
-## Overview
-
-This MarkItDown skill has been configured to use **OpenRouter** instead of direct OpenAI API access. OpenRouter provides a unified API gateway to access 100+ AI models from different providers through a single, OpenAI-compatible interface.
-
-## Why OpenRouter?
-
-### Benefits
-
-1. **Multiple Model Access**: Access GPT-4, Claude, Gemini, and 100+ other models through one API
-2. **No Vendor Lock-in**: Switch between models without code changes
-3. **Competitive Pricing**: Often better rates than going direct
-4. **Simple Migration**: OpenAI-compatible API means minimal code changes
-5. **Flexible Choice**: Choose the best model for each task
-
-### Popular Models for Image Description
-
-| Model | Provider | Use Case | Vision Support |
-|-------|----------|----------|----------------|
-| `anthropic/claude-sonnet-4.5` | Anthropic | **Recommended** - Best overall for scientific analysis | ✅ |
-| `anthropic/claude-opus-4.5` | Anthropic | Excellent technical analysis | ✅ |
-| `openai/gpt-4o` | OpenAI | Strong vision understanding | ✅ |
-| `openai/gpt-4-vision` | OpenAI | GPT-4 with vision | ✅ |
-| `google/gemini-pro-vision` | Google | Cost-effective option | ✅ |
-
-See https://openrouter.ai/models for the complete list.
-
-## Getting Started
-
-### 1. Get an API Key
-
-1. Visit https://openrouter.ai/keys
-2. Sign up or log in
-3. Create a new API key
-4. Copy the key (starts with `sk-or-v1-...`)
-
-### 2. Set Environment Variable
-
-```bash
-# Add to your environment
-export OPENROUTER_API_KEY="sk-or-v1-..."
-
-# Make it permanent
-echo 'export OPENROUTER_API_KEY="sk-or-v1-..."' >> ~/.zshrc  # macOS
-echo 'export OPENROUTER_API_KEY="sk-or-v1-..."' >> ~/.bashrc  # Linux
-
-# Reload shell
-source ~/.zshrc  # or source ~/.bashrc
-```
-
-### 3. Use in Python
-
-```python
-from markitdown import MarkItDown
-from openai import OpenAI
-
-# Initialize OpenRouter client (OpenAI-compatible)
-client = OpenAI(
-    api_key="your-openrouter-api-key",  # or use env var
-    base_url="https://openrouter.ai/api/v1"
-)
-
-# Create MarkItDown with AI support
-md = MarkItDown(
-    llm_client=client,
-    llm_model="anthropic/claude-sonnet-4.5"  # Choose your model
-)
-
-# Convert with AI-enhanced descriptions
-result = md.convert("presentation.pptx")
-print(result.text_content)
-```
-
-## Using the Scripts
-
-All skill scripts have been updated to use OpenRouter:
-
-### convert_with_ai.py
-
-```bash
-# Set API key
-export OPENROUTER_API_KEY="sk-or-v1-..."
-
-# Convert with default model (advanced vision model)
-python scripts/convert_with_ai.py paper.pdf output.md --prompt-type scientific
-
-# Use GPT-4o as alternative
-python scripts/convert_with_ai.py paper.pdf output.md \
-  --model openai/gpt-4o \
-  --prompt-type scientific
-
-# Use Gemini Pro Vision (cost-effective)
-python scripts/convert_with_ai.py slides.pptx output.md \
-  --model google/gemini-pro-vision \
-  --prompt-type presentation
-
-# List available prompt types
-python scripts/convert_with_ai.py --list-prompts
-```
-
-### Choosing the Right Model
-
-```bash
-# For scientific papers - use advanced vision model for technical analysis
-python scripts/convert_with_ai.py research.pdf output.md \
-  --model anthropic/claude-sonnet-4.5 \
-  --prompt-type scientific
-
-# For presentations - use advanced vision model
-python scripts/convert_with_ai.py slides.pptx output.md \
-  --model anthropic/claude-sonnet-4.5 \
-  --prompt-type presentation
-
-# For data visualizations - use advanced vision model
-python scripts/convert_with_ai.py charts.pdf output.md \
-  --model anthropic/claude-sonnet-4.5 \
-  --prompt-type data_viz
-
-# For medical images - use advanced vision model for detailed analysis
-python scripts/convert_with_ai.py xray.jpg output.md \
-  --model anthropic/claude-sonnet-4.5 \
-  --prompt-type medical
-```
-
-## Code Examples
-
-### Basic Usage
-
-```python
-from markitdown import MarkItDown
-from openai import OpenAI
-import os
-
-# Initialize OpenRouter client
-client = OpenAI(
-    api_key=os.environ.get("OPENROUTER_API_KEY"),
-    base_url="https://openrouter.ai/api/v1"
-)
-
-# Use advanced vision model for image descriptions
-md = MarkItDown(
-    llm_client=client,
-    llm_model="anthropic/claude-sonnet-4.5"
-)
-
-result = md.convert("document.pptx")
-print(result.text_content)
-```
-
-### Switching Models Dynamically
-
-```python
-from markitdown import MarkItDown
-from openai import OpenAI
-import os
-
-client = OpenAI(
-    api_key=os.environ["OPENROUTER_API_KEY"],
-    base_url="https://openrouter.ai/api/v1"
-)
-
-# Use different models for different file types
-def convert_with_best_model(filepath):
-    if filepath.endswith('.pdf'):
-        # Use advanced vision model for technical PDFs
-        md = MarkItDown(
-            llm_client=client,
-            llm_model="anthropic/claude-sonnet-4.5",
-            llm_prompt="Describe scientific figures with technical precision"
-        )
-    elif filepath.endswith('.pptx'):
-        # Use advanced vision model for presentations
-        md = MarkItDown(
-            llm_client=client,
-            llm_model="anthropic/claude-sonnet-4.5",
-            llm_prompt="Describe slide content and visual elements"
-        )
-    else:
-        # Use advanced vision model as default
-        md = MarkItDown(
-            llm_client=client,
-            llm_model="anthropic/claude-sonnet-4.5"
-        )
-
-    return md.convert(filepath)
-
-# Use it
-result = convert_with_best_model("paper.pdf")
-```
-
-### Custom Prompts per Model
-
-```python
-from markitdown import MarkItDown
-from openai import OpenAI
-
-client = OpenAI(
-    api_key="your-openrouter-api-key",
-    base_url="https://openrouter.ai/api/v1"
-)
-
-# Scientific analysis with advanced vision model
-scientific_prompt = """
-Analyze this scientific figure. Provide:
-1. Type of visualization and methodology
-2. Quantitative data points and trends
-3. Statistical significance
-4. Technical interpretation
-Be precise and use scientific terminology.
-"""
-
-md_scientific = MarkItDown(
-    llm_client=client,
-    llm_model="anthropic/claude-sonnet-4.5",
-    llm_prompt=scientific_prompt
-)
-
-# Visual analysis with advanced vision model
-visual_prompt = """
-Describe this image comprehensively:
-1. Main visual elements and composition
-2. Colors, layout, and design
-3. Text and labels
-4. Overall message
-"""
-
-md_visual = MarkItDown(
-    llm_client=client,
-    llm_model="anthropic/claude-sonnet-4.5",
-    llm_prompt=visual_prompt
-)
-```
-
-## Model Comparison
-
-### For Scientific Content
-
-**Recommended: anthropic/claude-sonnet-4.5**
-- Excellent at technical analysis
-- Superior reasoning capabilities
-- Best at understanding scientific figures
-- Most detailed and accurate explanations
-- Advanced vision capabilities
-
-**Alternative: openai/gpt-4o**
-- Good vision understanding
-- Fast processing
-- Good at charts and graphs
-
-### For Presentations
-
-**Recommended: anthropic/claude-sonnet-4.5**
-- Superior vision capabilities
-- Excellent at understanding slide layouts
-- Fast and reliable
-- Best technical comprehension
-
-### For Cost-Effectiveness
-
-**Recommended: google/gemini-pro-vision**
-- Lower cost per request
-- Good quality
-- Fast processing
-
-## Pricing Considerations
-
-OpenRouter pricing varies by model. Check current rates at https://openrouter.ai/models
-
-**Tips for Cost Optimization:**
-1. Use advanced vision models for best quality on complex scientific content
-2. Use cheaper models (Gemini) for simple images
-3. Batch process similar content with the same model
-4. Use appropriate prompts to get better results in fewer retries
-
-## Troubleshooting
-
-### API Key Issues
-
-```bash
-# Check if key is set
-echo $OPENROUTER_API_KEY
-
-# Should show: sk-or-v1-...
-# If empty, set it:
-export OPENROUTER_API_KEY="sk-or-v1-..."
-```
-
-### Model Not Found
-
-If you get a "model not found" error, check:
-1. Model name format: `provider/model-name`
-2. Model availability: https://openrouter.ai/models
-3. Vision support: Ensure model supports vision for image description
-
-### Rate Limits
-
-OpenRouter has rate limits. If you hit them:
-1. Add delays between requests
-2. Use batch processing scripts with `--workers` parameter
-3. Consider upgrading your OpenRouter plan
-
-## Migration Notes
-
-This skill was updated from direct OpenAI API to OpenRouter. Key changes:
-
-1. **Environment Variable**: `OPENAI_API_KEY` → `OPENROUTER_API_KEY`
-2. **Client Initialization**: Added `base_url="https://openrouter.ai/api/v1"`
-3. **Model Names**: `gpt-4o` → `openai/gpt-4o` (with provider prefix)
-4. **Script Updates**: All scripts now use OpenRouter by default
-
-## Resources
-
-- **OpenRouter Website**: https://openrouter.ai
-- **Get API Keys**: https://openrouter.ai/keys
-- **Model List**: https://openrouter.ai/models
-- **Pricing**: https://openrouter.ai/models (click on model for details)
-- **Documentation**: https://openrouter.ai/docs
-- **Support**: https://openrouter.ai/discord
-
-## Example Workflow
-
-Here's a complete workflow using OpenRouter:
-
-```bash
-# 1. Set up API key
-export OPENROUTER_API_KEY="sk-or-v1-your-key-here"
-
-# 2. Convert a scientific paper with Claude
-python scripts/convert_with_ai.py \
-  research_paper.pdf \
-  output.md \
-  --model anthropic/claude-opus-4.5 \
-  --prompt-type scientific
-
-# 3. Convert presentation with GPT-4o
-python scripts/convert_with_ai.py \
-  talk_slides.pptx \
-  slides.md \
-  --model openai/gpt-4o \
-  --prompt-type presentation
-
-# 4. Batch convert with cost-effective model
-python scripts/batch_convert.py \
-  images/ \
-  markdown_output/ \
-  --extensions .jpg .png
-```
-
-## Support
-
-For OpenRouter-specific issues:
-- Discord: https://openrouter.ai/discord
-- Email: support@openrouter.ai
-
-For MarkItDown skill issues:
-- Check documentation in this skill directory
-- Review examples in `assets/example_usage.md`
-
diff --git a/scientific-skills/markitdown/references/QUICK_REFERENCE.md b/scientific-skills/markitdown/references/QUICK_REFERENCE.md
deleted file mode 100644
index 09e2dc8..0000000
--- a/scientific-skills/markitdown/references/QUICK_REFERENCE.md
+++ /dev/null
@@ -1,309 +0,0 @@
-# MarkItDown Quick Reference
-
-## Installation
-
-```bash
-# All features
-pip install 'markitdown[all]'
-
-# Specific formats
-pip install 'markitdown[pdf,docx,pptx,xlsx]'
-```
-
-## Basic Usage
-
-```python
-from markitdown import MarkItDown
-
-md = MarkItDown()
-result = md.convert("file.pdf")
-print(result.text_content)
-```
-
-## Command Line
-
-```bash
-# Simple conversion
-markitdown input.pdf > output.md
-markitdown input.pdf -o output.md
-
-# With plugins
-markitdown --use-plugins file.pdf -o output.md
-```
-
-## Common Tasks
-
-### Convert PDF
-```python
-md = MarkItDown()
-result = md.convert("paper.pdf")
-```
-
-### Convert with AI
-```python
-from openai import OpenAI
-
-# Use OpenRouter for multiple model access
-client = OpenAI(
-    api_key="your-openrouter-api-key",
-    base_url="https://openrouter.ai/api/v1"
-)
-
-md = MarkItDown(
-    llm_client=client,
-    llm_model="anthropic/claude-sonnet-4.5"  # recommended for vision
-)
-result = md.convert("slides.pptx")
-```
-
-### Batch Convert
-```bash
-python scripts/batch_convert.py input/ output/ --extensions .pdf .docx
-```
-
-### Literature Conversion
-```bash
-python scripts/convert_literature.py papers/ markdown/ --create-index
-```
-
-## Supported Formats
-
-| Format | Extension | Notes |
-|--------|-----------|-------|
-| PDF | `.pdf` | Full text + OCR |
-| Word | `.docx` | Tables, formatting |
-| PowerPoint | `.pptx` | Slides + notes |
-| Excel | `.xlsx`, `.xls` | Tables |
-| Images | `.jpg`, `.png`, `.gif`, `.webp` | EXIF + OCR |
-| Audio | `.wav`, `.mp3` | Transcription |
-| HTML | `.html`, `.htm` | Clean conversion |
-| Data | `.csv`, `.json`, `.xml` | Structured |
-| Archives | `.zip` | Iterates contents |
-| E-books | `.epub` | Full text |
-| YouTube | URLs | Transcripts |
-
-## Optional Dependencies
-
-```bash
-[all]                    # All features
-[pdf]                    # PDF support
-[docx]                   # Word documents
-[pptx]                   # PowerPoint
-[xlsx]                   # Excel
-[xls]                    # Old Excel
-[outlook]                # Outlook messages
-[az-doc-intel]           # Azure Document Intelligence
-[audio-transcription]    # Audio files
-[youtube-transcription]  # YouTube videos
-```
-
-## AI-Enhanced Conversion
-
-### Scientific Papers
-```python
-from openai import OpenAI
-
-# Initialize OpenRouter client
-client = OpenAI(
-    api_key="your-openrouter-api-key",
-    base_url="https://openrouter.ai/api/v1"
-)
-
-md = MarkItDown(
-    llm_client=client,
-    llm_model="anthropic/claude-sonnet-4.5",  # recommended for scientific vision
-    llm_prompt="Describe scientific figures with technical precision"
-)
-result = md.convert("paper.pdf")
-```
-
-### Custom Prompts
-```python
-prompt = """
-Analyze this data visualization. Describe:
-- Type of chart/graph
-- Key trends and patterns
-- Notable data points
-"""
-
-md = MarkItDown(
-    llm_client=client,
-    llm_model="anthropic/claude-sonnet-4.5",
-    llm_prompt=prompt
-)
-```
-
-### Available Models via OpenRouter
-- `anthropic/claude-sonnet-4.5` - **Recommended for scientific vision**
-- `anthropic/claude-opus-4.5` - Advanced vision model
-- `openai/gpt-4o` - GPT-4 Omni (vision)
-- `openai/gpt-4-vision` - GPT-4 Vision
-- `google/gemini-pro-vision` - Gemini Pro Vision
-
-See https://openrouter.ai/models for full list
-
-## Azure Document Intelligence
-
-```python
-md = MarkItDown(docintel_endpoint="https://YOUR-ENDPOINT.cognitiveservices.azure.com/")
-result = md.convert("complex_layout.pdf")
-```
-
-## Batch Processing
-
-### Python
-```python
-from markitdown import MarkItDown
-from pathlib import Path
-
-md = MarkItDown()
-
-for file in Path("input/").glob("*.pdf"):
-    result = md.convert(str(file))
-    output = Path("output") / f"{file.stem}.md"
-    output.write_text(result.text_content)
-```
-
-### Script
-```bash
-# Parallel conversion
-python scripts/batch_convert.py input/ output/ --workers 8
-
-# Recursive
-python scripts/batch_convert.py input/ output/ -r
-```
-
-## Error Handling
-
-```python
-try:
-    result = md.convert("file.pdf")
-except FileNotFoundError:
-    print("File not found")
-except Exception as e:
-    print(f"Error: {e}")
-```
-
-## Streaming
-
-```python
-with open("large_file.pdf", "rb") as f:
-    result = md.convert_stream(f, file_extension=".pdf")
-```
-
-## Common Prompts
-
-### Scientific
-```
-Analyze this scientific figure. Describe:
-- Type of visualization
-- Key data points and trends
-- Axes, labels, and legends
-- Scientific significance
-```
-
-### Medical
-```
-Describe this medical image. Include:
-- Type of imaging (X-ray, MRI, CT, etc.)
-- Anatomical structures visible
-- Notable findings
-- Clinical relevance
-```
-
-### Data Visualization
-```
-Analyze this data visualization:
-- Chart type
-- Variables and axes
-- Data ranges
-- Key patterns and outliers
-```
-
-## Performance Tips
-
-1. **Reuse instance**: Create once, use many times
-2. **Parallel processing**: Use ThreadPoolExecutor for multiple files
-3. **Stream large files**: Use `convert_stream()` for big files
-4. **Choose right format**: Install only needed dependencies
-
-## Environment Variables
-
-```bash
-# OpenRouter for AI-enhanced conversions
-export OPENROUTER_API_KEY="sk-or-v1-..."
-
-# Azure Document Intelligence (optional)
-export AZURE_DOCUMENT_INTELLIGENCE_KEY="key..."
-export AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT="https://..."
-```
-
-## Scripts Quick Reference
-
-### batch_convert.py
-```bash
-python scripts/batch_convert.py INPUT OUTPUT [OPTIONS]
-
-Options:
-  --extensions .pdf .docx   File types to convert
-  --recursive, -r           Search subdirectories
-  --workers 4               Parallel workers
-  --verbose, -v             Detailed output
-  --plugins, -p             Enable plugins
-```
-
-### convert_with_ai.py
-```bash
-python scripts/convert_with_ai.py INPUT OUTPUT [OPTIONS]
-
-Options:
-  --api-key KEY             OpenRouter API key
-  --model MODEL             Model name (default: anthropic/claude-sonnet-4.5)
-  --prompt-type TYPE        Preset prompt (scientific, medical, etc.)
-  --custom-prompt TEXT      Custom prompt
-  --list-prompts            Show available prompts
-```
-
-### convert_literature.py
-```bash
-python scripts/convert_literature.py INPUT OUTPUT [OPTIONS]
-
-Options:
-  --organize-by-year, -y    Organize by year
-  --create-index, -i        Create index file
-  --recursive, -r           Search subdirectories
-```
-
-## Troubleshooting
-
-### Missing Dependencies
-```bash
-pip install 'markitdown[pdf]'  # Install PDF support
-```
-
-### Binary File Error
-```python
-# Wrong
-with open("file.pdf", "r") as f:
-
-# Correct
-with open("file.pdf", "rb") as f:  # Binary mode
-```
-
-### OCR Not Working
-```bash
-# macOS
-brew install tesseract
-
-# Ubuntu
-sudo apt-get install tesseract-ocr
-```
-
-## More Information
-
-- **Full Documentation**: See `SKILL.md`
-- **API Reference**: See `references/api_reference.md`
-- **Format Details**: See `references/file_formats.md`
-- **Examples**: See `assets/example_usage.md`
-- **GitHub**: https://github.com/microsoft/markitdown
-
diff --git a/scientific-skills/markitdown/references/README.md b/scientific-skills/markitdown/references/README.md
deleted file mode 100644
index 9769486..0000000
--- a/scientific-skills/markitdown/references/README.md
+++ /dev/null
@@ -1,184 +0,0 @@
-# MarkItDown Skill
-
-This skill provides comprehensive support for converting various file formats to Markdown using Microsoft's MarkItDown tool.
-
-## Overview
-
-MarkItDown is a Python tool that converts files and office documents to Markdown format. This skill includes:
-
-- Complete API documentation
-- Format-specific conversion guides
-- Utility scripts for batch processing
-- AI-enhanced conversion examples
-- Integration with scientific workflows
-
-## Contents
-
-### Main Skill File
-- **SKILL.md** - Complete guide to using MarkItDown with quick start, examples, and best practices
-
-### References
-- **api_reference.md** - Detailed API documentation, class references, and method signatures
-- **file_formats.md** - Format-specific details for all supported file types
-
-### Scripts
-- **batch_convert.py** - Batch convert multiple files with parallel processing
-- **convert_with_ai.py** - AI-enhanced conversion with custom prompts
-- **convert_literature.py** - Scientific literature conversion with metadata extraction
-
-### Assets
-- **example_usage.md** - Practical examples for common use cases
-
-## Installation
-
-```bash
-# Install with all features
-pip install 'markitdown[all]'
-
-# Or install specific features
-pip install 'markitdown[pdf,docx,pptx,xlsx]'
-```
-
-## Quick Start
-
-```python
-from markitdown import MarkItDown
-
-md = MarkItDown()
-result = md.convert("document.pdf")
-print(result.text_content)
-```
-
-## Supported Formats
-
-- **Documents**: PDF, DOCX, PPTX, XLSX, EPUB
-- **Images**: JPEG, PNG, GIF, WebP (with OCR)
-- **Audio**: WAV, MP3 (with transcription)
-- **Web**: HTML, YouTube URLs
-- **Data**: CSV, JSON, XML
-- **Archives**: ZIP files
-
-## Key Features
-
-### 1. AI-Enhanced Conversions
-Use AI models via OpenRouter to generate detailed image descriptions:
-
-```python
-from openai import OpenAI
-
-# OpenRouter provides access to 100+ AI models
-client = OpenAI(
-    api_key="your-openrouter-api-key",
-    base_url="https://openrouter.ai/api/v1"
-)
-
-md = MarkItDown(
-    llm_client=client,
-    llm_model="anthropic/claude-sonnet-4.5"  # recommended for vision
-)
-result = md.convert("presentation.pptx")
-```
-
-### 2. Batch Processing
-Convert multiple files efficiently:
-
-```bash
-python scripts/batch_convert.py papers/ output/ --extensions .pdf .docx
-```
-
-### 3. Scientific Literature
-Convert and organize research papers:
-
-```bash
-python scripts/convert_literature.py papers/ output/ --organize-by-year --create-index
-```
-
-### 4. Azure Document Intelligence
-Enhanced PDF conversion with Microsoft Document Intelligence:
-
-```python
-md = MarkItDown(docintel_endpoint="https://YOUR-ENDPOINT.cognitiveservices.azure.com/")
-result = md.convert("complex_document.pdf")
-```
-
-## Use Cases
-
-### Literature Review
-Convert research papers to Markdown for easier analysis and note-taking.
-
-### Data Extraction
-Extract tables from Excel files into Markdown format.
-
-### Presentation Processing
-Convert PowerPoint slides with AI-generated descriptions.
-
-### Document Analysis
-Process documents for LLM consumption with token-efficient Markdown.
-
-### YouTube Transcripts
-Fetch and convert YouTube video transcriptions.
-
-## Scripts Usage
-
-### Batch Convert
-```bash
-# Convert all PDFs in a directory
-python scripts/batch_convert.py input_dir/ output_dir/ --extensions .pdf
-
-# Recursive with multiple formats
-python scripts/batch_convert.py docs/ markdown/ --extensions .pdf .docx .pptx -r
-```
-
-### AI-Enhanced Conversion
-```bash
-# Convert with AI descriptions via OpenRouter
-export OPENROUTER_API_KEY="sk-or-v1-..."
-python scripts/convert_with_ai.py paper.pdf output.md --prompt-type scientific
-
-# Use different models
-python scripts/convert_with_ai.py image.png output.md --model anthropic/claude-sonnet-4.5
-
-# Use custom prompt
-python scripts/convert_with_ai.py image.png output.md --custom-prompt "Describe this diagram"
-```
-
-### Literature Conversion
-```bash
-# Convert papers with metadata extraction
-python scripts/convert_literature.py papers/ markdown/ --organize-by-year --create-index
-```
-
-## Integration with Scientific Writer
-
-This skill integrates seamlessly with the Scientific Writer CLI for:
-- Converting source materials for paper writing
-- Processing literature for reviews
-- Extracting data from various document formats
-- Preparing documents for LLM analysis
-
-## Resources
-
-- **MarkItDown GitHub**: https://github.com/microsoft/markitdown
-- **PyPI**: https://pypi.org/project/markitdown/
-- **OpenRouter**: https://openrouter.ai (AI model access)
-- **OpenRouter API Keys**: https://openrouter.ai/keys
-- **OpenRouter Models**: https://openrouter.ai/models
-- **License**: MIT
-
-## Requirements
-
-- Python 3.10+
-- Optional dependencies based on formats needed
-- OpenRouter API key (for AI-enhanced conversions) - Get at https://openrouter.ai/keys
-- Azure subscription (optional, for Document Intelligence)
-
-## Examples
-
-See `assets/example_usage.md` for comprehensive examples covering:
-- Basic conversions
-- Scientific workflows
-- AI-enhanced processing
-- Batch operations
-- Error handling
-- Integration patterns
-