Remove extra md files from markitdown

2026-01-26 16:58:56 +08:00 · 2026-01-05 13:07:50 -08:00
parent 5b52a8dd23
commit c127b737a5
5 changed files with 0 additions and 1192 deletions
--- a/scientific-skills/markitdown/LICENSE.txt
+++ b/scientific-skills/markitdown/LICENSE.txt
@@ -1,22 +0,0 @@
 MIT License
 Copyright (c) Microsoft Corporation.
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:
 The above copyright notice and this permission notice shall be included in all
 copies or substantial portions of the Software.
 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
--- a/scientific-skills/markitdown/references/INSTALLATION_GUIDE.md
+++ b/scientific-skills/markitdown/references/INSTALLATION_GUIDE.md
@@ -1,318 +0,0 @@
 # MarkItDown Installation Guide
 ## Prerequisites
 - Python 3.10 or higher
 - pip package manager
 - Virtual environment (recommended)
 ## Basic Installation
 ### Install All Features (Recommended)
 ```bash
 pip install 'markitdown[all]'
 ```
 This installs support for all file formats and features.
 ### Install Specific Features
 If you only need certain file formats, you can install specific dependencies:
 ```bash
 # PDF support only
 pip install 'markitdown[pdf]'
 # Office documents
 pip install 'markitdown[docx,pptx,xlsx]'
 # Multiple formats
 pip install 'markitdown[pdf,docx,pptx,xlsx,audio-transcription]'
 ```
 ### Install from Source
 ```bash
 git clone https://github.com/microsoft/markitdown.git
 cd markitdown
 pip install -e 'packages/markitdown[all]'
 ```
 ## Optional Dependencies
 | Feature | Installation | Use Case |
 |---------|--------------|----------|
 | All formats | `pip install 'markitdown[all]'` | Everything |
 | PDF | `pip install 'markitdown[pdf]'` | PDF documents |
 | Word | `pip install 'markitdown[docx]'` | DOCX files |
 | PowerPoint | `pip install 'markitdown[pptx]'` | PPTX files |
 | Excel (new) | `pip install 'markitdown[xlsx]'` | XLSX files |
 | Excel (old) | `pip install 'markitdown[xls]'` | XLS files |
 | Outlook | `pip install 'markitdown[outlook]'` | MSG files |
 | Azure DI | `pip install 'markitdown[az-doc-intel]'` | Enhanced PDF |
 | Audio | `pip install 'markitdown[audio-transcription]'` | WAV/MP3 |
 | YouTube | `pip install 'markitdown[youtube-transcription]'` | YouTube videos |
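To see which optional extras are already importable in the current environment, a stdlib-only check like the one below can help. It is a minimal sketch that assumes one representative module per extra (for example `pdfminer` for `[pdf]`, `mammoth` for `[docx]`, `pptx` for `[pptx]`). Those mappings are assumptions and may change between MarkItDown releases.

```python
import importlib.util

# ASSUMED mapping from MarkItDown extras to one module each extra installs.
# Verify against the package metadata of your installed MarkItDown version.
EXTRA_MODULES = {
    "pdf": "pdfminer",
    "docx": "mammoth",
    "pptx": "pptx",
    "xlsx": "openpyxl",
    "xls": "xlrd",
    "outlook": "olefile",
    "audio-transcription": "speech_recognition",
    "youtube-transcription": "youtube_transcript_api",
}


def check_extras(mapping=EXTRA_MODULES):
    """Return {extra: True/False} depending on whether its module can be found."""
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in mapping.items()
    }


if __name__ == "__main__":
    for extra, ok in check_extras().items():
        print(f"{extra:24} {'installed' if ok else 'missing'}")
```

The check only locates modules without importing them, so it is fast and has no side effects.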
 ## System Dependencies
 ### OCR Support (for scanned documents and images)
 #### macOS
 ```bash
 brew install tesseract
 ```
 #### Ubuntu/Debian
 ```bash
 sudo apt-get update
 sudo apt-get install tesseract-ocr
 ```
 #### Windows
 Download from: https://github.com/UB-Mannheim/tesseract/wiki
 ### Poppler Utils (for advanced PDF operations)
 #### macOS
 ```bash
 brew install poppler
 ```
 #### Ubuntu/Debian
 ```bash
 sudo apt-get install poppler-utils
 ```
 ## Verification
 Test your installation:
 ```bash
 # Check that the package imports
 python -c "import markitdown; print('MarkItDown installed successfully')"
 # Test basic conversion
 echo "Test" > test.txt
 markitdown test.txt
 rm test.txt
 ```
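The shell check above only confirms that the package imports. A short Python sketch can also report the installed version using the standard library's `importlib.metadata`. The helper name `installed_version` is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="markitdown"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"MarkItDown {v}" if v else "MarkItDown is not installed")
```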
 ## Virtual Environment Setup
 ### Using venv
 ```bash
 # Create virtual environment
 python -m venv markitdown-env
 # Activate (macOS/Linux)
 source markitdown-env/bin/activate
 # Activate (Windows)
 markitdown-env\Scripts\activate
 # Install
 pip install 'markitdown[all]'
 ```
 ### Using conda
 ```bash
 # Create environment
 conda create -n markitdown python=3.12
 # Activate
 conda activate markitdown
 # Install
 pip install 'markitdown[all]'
 ```
 ### Using uv
 ```bash
 # Create virtual environment
 uv venv --python=3.12 .venv
 # Activate
 source .venv/bin/activate
 # Install
 uv pip install 'markitdown[all]'
 ```
 ## AI Enhancement Setup (Optional)
 For AI-powered image descriptions using OpenRouter:
 ### OpenRouter API
 OpenRouter provides unified access to multiple AI models (GPT-4, Claude, Gemini, etc.) through a single API.
 ```bash
 # Install the OpenAI Python SDK (used as the OpenRouter client)
 pip install openai
 # Get API key from https://openrouter.ai/keys
 # Set API key
 export OPENROUTER_API_KEY="sk-or-v1-..."
 # Add to shell profile for persistence
 echo 'export OPENROUTER_API_KEY="sk-or-v1-..."' >> ~/.bashrc  # Linux
 echo 'export OPENROUTER_API_KEY="sk-or-v1-..."' >> ~/.zshrc   # macOS
 ```
 **Why OpenRouter?**
 - Access to 100+ AI models through one API
 - Choose between GPT-4, Claude, Gemini, and more
 - Competitive pricing
 - No vendor lock-in
 - Simple OpenAI-compatible interface
 **Popular Models for Image Description:**
 - `anthropic/claude-sonnet-4.5` - **Recommended** - Best for scientific vision
 - `anthropic/claude-opus-4.5` - Excellent technical analysis
 - `openai/gpt-4o` - Good vision understanding
 - `google/gemini-pro-vision` - Cost-effective option
 See https://openrouter.ai/models for complete model list and pricing.
 ## Azure Document Intelligence Setup (Optional)
 For enhanced PDF conversion:
 1. Create an Azure Document Intelligence resource in the Azure Portal
 2. Copy the resource's endpoint and key
 3. Set environment variables:
 ```bash
 export AZURE_DOCUMENT_INTELLIGENCE_KEY="your-key"
 export AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT="https://your-endpoint.cognitiveservices.azure.com/"
 ```
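In Python, the endpoint from the environment variable above can be passed to MarkItDown through its `docintel_endpoint` constructor argument. The sketch below only builds those keyword arguments; it does not cover how MarkItDown obtains the key. The helper name `docintel_kwargs` is hypothetical.

```python
import os


def docintel_kwargs(env=None):
    """Build MarkItDown keyword arguments from AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT.

    Returns an empty dict when the variable is unset or blank, so MarkItDown
    is constructed with its built-in converters only.
    """
    env = os.environ if env is None else env
    endpoint = env.get("AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT", "").strip()
    return {"docintel_endpoint": endpoint} if endpoint else {}


# Usage (requires markitdown[az-doc-intel]):
# from markitdown import MarkItDown
# md = MarkItDown(**docintel_kwargs())
```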
 ## Docker Installation (Alternative)
 ```bash
 # Clone repository
 git clone https://github.com/microsoft/markitdown.git
 cd markitdown
 # Build image
 docker build -t markitdown:latest .
 # Run
 docker run --rm -i markitdown:latest < input.pdf > output.md
 ```
 ## Troubleshooting
 ### Import Error
 ```
 ModuleNotFoundError: No module named 'markitdown'
 ```
 **Solution**: Ensure you're in the correct virtual environment and markitdown is installed:
 ```bash
 pip install 'markitdown[all]'
 ```
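To confirm that the interpreter running your code is the virtual environment you installed into, compare `sys.prefix` with `sys.base_prefix`; they differ inside a `venv` or `uv` environment. This is a minimal sketch. Conda environments are not detected this way, so check `sys.executable` instead if you use conda.

```python
import sys


def in_virtualenv():
    """True when running inside a venv-style environment (venv, virtualenv, uv)."""
    # Conda environments are NOT detected by this comparison.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Virtual environment active:", in_virtualenv())
```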
 ### Missing Feature
 ```
 Error: PDF conversion not supported
 ```
 **Solution**: Install the specific feature:
 ```bash
 pip install 'markitdown[pdf]'
 ```
 ### OCR Not Working
 **Solution**: Install Tesseract OCR (see System Dependencies above)
 ### Permission Errors
 **Solution**: Use virtual environment or install with `--user` flag:
 ```bash
 pip install --user 'markitdown[all]'
 ```
 ## Upgrading
 ```bash
 # Upgrade to latest version
 pip install --upgrade 'markitdown[all]'
 # Check version
 pip show markitdown
 ```
 ## Uninstallation
 ```bash
 pip uninstall markitdown
 ```
 ## Next Steps
 After installation:
 1. Read `QUICK_REFERENCE.md` for basic usage
 2. See `SKILL.md` for comprehensive guide
 3. Try the example scripts in the `scripts/` directory
 4. Check `assets/example_usage.md` for practical examples
 ## Skill Scripts Setup
 To use the skill scripts:
 ```bash
 # Navigate to scripts directory
 cd path/to/markitdown/scripts  # the skill's scripts/ directory
 # Show each script's options
 python batch_convert.py --help
 python convert_with_ai.py --help
 python convert_literature.py --help
 ```
 ## Testing Installation
 Create a test file to verify everything works:
 ```python
 # test_markitdown.py
 import os
 from markitdown import MarkItDown
 def test_basic():
     md = MarkItDown()
     # Create a simple test file
     with open("test.txt", "w") as f:
         f.write("Hello MarkItDown!")
     try:
         # Convert it
         result = md.convert("test.txt")
         print("✓ Basic conversion works")
         print(result.text_content)
     finally:
         # Clean up even if the conversion fails
         os.remove("test.txt")
 if __name__ == "__main__":
     test_basic()
 ```
 Run it:
 ```bash
 python test_markitdown.py
 ```
 ## Getting Help
 - **Documentation**: See `SKILL.md` and `README.md`
 - **GitHub Issues**: https://github.com/microsoft/markitdown/issues
 - **Examples**: `assets/example_usage.md`
 - **API Reference**: `references/api_reference.md`
--- a/scientific-skills/markitdown/references/OPENROUTER_INTEGRATION.md
+++ b/scientific-skills/markitdown/references/OPENROUTER_INTEGRATION.md
@@ -1,359 +0,0 @@
 # OpenRouter Integration for MarkItDown
 ## Overview
 This MarkItDown skill has been configured to use **OpenRouter** instead of direct OpenAI API access. OpenRouter provides a unified API gateway to access 100+ AI models from different providers through a single, OpenAI-compatible interface.
 ## Why OpenRouter?
 ### Benefits
 1. **Multiple Model Access**: Access GPT-4, Claude, Gemini, and 100+ other models through one API
 2. **No Vendor Lock-in**: Switch between models without code changes
 3. **Competitive Pricing**: Per-model rates are published at https://openrouter.ai/models, so costs can be compared before choosing
 4. **Simple Migration**: OpenAI-compatible API means minimal code changes
 5. **Flexible Choice**: Choose the best model for each task
 ### Popular Models for Image Description
 | Model | Provider | Use Case | Vision Support |
 |-------|----------|----------|----------------|
 | `anthropic/claude-sonnet-4.5` | Anthropic | **Recommended** - Best overall for scientific analysis | ✅ |
 | `anthropic/claude-opus-4.5` | Anthropic | Excellent technical analysis | ✅ |
 | `openai/gpt-4o` | OpenAI | Strong vision understanding | ✅ |
 | `openai/gpt-4-vision` | OpenAI | GPT-4 with vision | ✅ |
 | `google/gemini-pro-vision` | Google | Cost-effective option | ✅ |
 See https://openrouter.ai/models for the complete list.
 ## Getting Started
 ### 1. Get an API Key
 1. Visit https://openrouter.ai/keys
 2. Sign up or log in
 3. Create a new API key
 4. Copy the key (starts with `sk-or-v1-...`)
 ### 2. Set Environment Variable
 ```bash
 # Add to your environment
 export OPENROUTER_API_KEY="sk-or-v1-..."
 # Make it permanent
 echo 'export OPENROUTER_API_KEY="sk-or-v1-..."' >> ~/.zshrc  # macOS
 echo 'export OPENROUTER_API_KEY="sk-or-v1-..."' >> ~/.bashrc # Linux
 # Reload shell
 source ~/.zshrc  # or source ~/.bashrc
 ```
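Scripts can read the key from `OPENROUTER_API_KEY` and fail early with a clear message when it is missing. The `sk-or-v1-` prefix check mirrors the key format described above and is only a sanity check. `load_openrouter_key` is a hypothetical helper name.

```python
import os


def load_openrouter_key(env=None):
    """Return the OpenRouter key from the environment or raise a helpful error."""
    env = os.environ if env is None else env
    key = env.get("OPENROUTER_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENROUTER_API_KEY is not set; see https://openrouter.ai/keys")
    if not key.startswith("sk-or-v1-"):
        # Not fatal: the prefix is a convention, but a mismatch often means a wrong key
        print("Warning: OPENROUTER_API_KEY does not start with 'sk-or-v1-'")
    return key
```

The returned value can be passed as `api_key=` when creating the `OpenAI` client for OpenRouter.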
 ### 3. Use in Python
 ```python
 from markitdown import MarkItDown
 from openai import OpenAI
 # Initialize OpenRouter client (OpenAI-compatible)
 client = OpenAI(
    api_key="your-openrouter-api-key",  # or use env var
    base_url="https://openrouter.ai/api/v1"
 )
 # Create MarkItDown with AI support
 md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5"  # Choose your model
 )
 # Convert with AI-enhanced descriptions
 result = md.convert("presentation.pptx")
 print(result.text_content)
 ```
 ## Using the Scripts
 All skill scripts have been updated to use OpenRouter:
 ### convert_with_ai.py
 ```bash
 # Set API key
 export OPENROUTER_API_KEY="sk-or-v1-..."
 # Convert with the default model (anthropic/claude-sonnet-4.5)
 python scripts/convert_with_ai.py paper.pdf output.md --prompt-type scientific
 # Use GPT-4o as alternative
 python scripts/convert_with_ai.py paper.pdf output.md \
  --model openai/gpt-4o \
  --prompt-type scientific
 # Use Gemini Pro Vision (cost-effective)
 python scripts/convert_with_ai.py slides.pptx output.md \
  --model google/gemini-pro-vision \
  --prompt-type presentation
 # List available prompt types
 python scripts/convert_with_ai.py --list-prompts
 ```
 ### Choosing the Right Model
 ```bash
 # For scientific papers - use advanced vision model for technical analysis
 python scripts/convert_with_ai.py research.pdf output.md \
  --model anthropic/claude-sonnet-4.5 \
  --prompt-type scientific
 # For presentations - use advanced vision model
 python scripts/convert_with_ai.py slides.pptx output.md \
  --model anthropic/claude-sonnet-4.5 \
  --prompt-type presentation
 # For data visualizations - use advanced vision model
 python scripts/convert_with_ai.py charts.pdf output.md \
  --model anthropic/claude-sonnet-4.5 \
  --prompt-type data_viz
 # For medical images - use advanced vision model for detailed analysis
 python scripts/convert_with_ai.py xray.jpg output.md \
  --model anthropic/claude-sonnet-4.5 \
  --prompt-type medical
 ```
 ## Code Examples
 ### Basic Usage
 ```python
 from markitdown import MarkItDown
 from openai import OpenAI
 import os
 # Initialize OpenRouter client
 client = OpenAI(
    api_key=os.environ.get("OPENROUTER_API_KEY"),
    base_url="https://openrouter.ai/api/v1"
 )
 # Use advanced vision model for image descriptions
 md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5"
 )
 result = md.convert("document.pptx")
 print(result.text_content)
 ```
 ### Switching Models Dynamically
 ```python
 from markitdown import MarkItDown
 from openai import OpenAI
 import os
 client = OpenAI(
     api_key=os.environ["OPENROUTER_API_KEY"],
     base_url="https://openrouter.ai/api/v1"
 )
 # Per-file-type settings: edit "model" to route a file type to a different model
 SETTINGS = {
     ".pdf": {
         "model": "anthropic/claude-sonnet-4.5",
         "prompt": "Describe scientific figures with technical precision",
     },
     ".pptx": {
         "model": "anthropic/claude-sonnet-4.5",
         "prompt": "Describe slide content and visual elements",
     },
 }
 DEFAULT_MODEL = "anthropic/claude-sonnet-4.5"
 def convert_with_best_model(filepath):
     ext = os.path.splitext(filepath)[1].lower()
     settings = SETTINGS.get(ext, {"model": DEFAULT_MODEL})
     kwargs = {"llm_client": client, "llm_model": settings["model"]}
     if "prompt" in settings:
         # Only pass a custom prompt when one is configured for this file type
         kwargs["llm_prompt"] = settings["prompt"]
     md = MarkItDown(**kwargs)
     return md.convert(filepath)
 # Use it
 result = convert_with_best_model("paper.pdf")
 ```
 ### Custom Prompts per Model
 ```python
 from markitdown import MarkItDown
 from openai import OpenAI
 client = OpenAI(
    api_key="your-openrouter-api-key",
    base_url="https://openrouter.ai/api/v1"
 )
 # Scientific analysis with advanced vision model
 scientific_prompt = """
 Analyze this scientific figure. Provide:
 1. Type of visualization and methodology
 2. Quantitative data points and trends
 3. Statistical significance
 4. Technical interpretation
 Be precise and use scientific terminology.
 """
 md_scientific = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5",
    llm_prompt=scientific_prompt
 )
 # Visual analysis with advanced vision model
 visual_prompt = """
 Describe this image comprehensively:
 1. Main visual elements and composition
 2. Colors, layout, and design
 3. Text and labels
 4. Overall message
 """
 md_visual = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5",
    llm_prompt=visual_prompt
 )
 ```
 ## Model Comparison
 ### For Scientific Content
 **Recommended: anthropic/claude-sonnet-4.5**
 - Excellent at technical analysis
 - Superior reasoning capabilities
 - Best at understanding scientific figures
 - Most detailed and accurate explanations
 - Advanced vision capabilities
 **Alternative: openai/gpt-4o**
 - Good vision understanding
 - Fast processing
 - Good at charts and graphs
 ### For Presentations
 **Recommended: anthropic/claude-sonnet-4.5**
 - Superior vision capabilities
 - Excellent at understanding slide layouts
 - Fast and reliable
 - Best technical comprehension
 ### For Cost-Effectiveness
 **Recommended: google/gemini-pro-vision**
 - Lower cost per request
 - Good quality
 - Fast processing
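The recommendations above can be captured in a small lookup table so that scripts pick a model by use case. This is a sketch: the use-case keys are illustrative, the model IDs are the ones listed in this guide, and their availability should be confirmed at https://openrouter.ai/models.

```python
# Use case -> OpenRouter model ID, following the comparison above.
# The keys are illustrative names, not options of any script.
RECOMMENDED_MODELS = {
    "scientific": "anthropic/claude-sonnet-4.5",
    "presentation": "anthropic/claude-sonnet-4.5",
    "budget": "google/gemini-pro-vision",
}
DEFAULT_MODEL = "anthropic/claude-sonnet-4.5"  # default used by convert_with_ai.py


def pick_model(use_case):
    """Return the recommended model for a use case, or the default model."""
    return RECOMMENDED_MODELS.get(use_case, DEFAULT_MODEL)
```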
 ## Pricing Considerations
 OpenRouter pricing varies by model. Check current rates at https://openrouter.ai/models
 **Tips for Cost Optimization:**
 1. Use advanced vision models for best quality on complex scientific content
 2. Use cheaper models (Gemini) for simple images
 3. Batch process similar content with the same model
 4. Use appropriate prompts to get better results in fewer retries
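Because pricing is per token, a rough budget for a batch is the number of requests times the tokens per request times the per-token rate. The sketch below does that arithmetic. The token counts and prices in the usage line are made-up placeholders, not OpenRouter's actual rates.

```python
def estimate_cost(n_images, input_tokens, output_tokens,
                  input_price_per_m, output_price_per_m):
    """Estimate USD cost for n_images requests, given prices per million tokens."""
    per_request = (input_tokens * input_price_per_m
                   + output_tokens * output_price_per_m) / 1_000_000
    return n_images * per_request


# Placeholder numbers only: 200 images, 1,500 input and 300 output tokens each,
# at $3 (input) and $15 (output) per million tokens
print(f"${estimate_cost(200, 1500, 300, 3.0, 15.0):.2f}")  # prints $1.80
```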
 ## Troubleshooting
 ### API Key Issues
 ```bash
 # Check if key is set
 echo $OPENROUTER_API_KEY
 # Should show: sk-or-v1-...
 # If empty, set it:
 export OPENROUTER_API_KEY="sk-or-v1-..."
 ```
 ### Model Not Found
 If you get a "model not found" error, check:
 1. Model name format: `provider/model-name`
 2. Model availability: https://openrouter.ai/models
 3. Vision support: Ensure model supports vision for image description
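The first check, the `provider/model-name` format, can be done locally before any request is sent. This is a minimal sketch. The regular expression is an assumption about which characters model IDs use, and a match says nothing about whether the model exists or supports vision.

```python
import re

# ASSUMED shape of an OpenRouter model ID: "<provider>/<model-name>"
MODEL_ID = re.compile(r"^[a-z0-9][a-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._:-]*$")


def looks_like_model_id(name):
    """True if name has the provider/model-name shape (format check only)."""
    return bool(MODEL_ID.match(name))
```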
 ### Rate Limits
 OpenRouter has rate limits. If you hit them:
 1. Add delays between requests
 2. Lower the `--workers` value in the batch processing scripts to reduce parallel requests
 3. Consider upgrading your OpenRouter plan
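"Add delays between requests" is commonly implemented as retry with exponential backoff. The sketch below is generic and does not depend on MarkItDown or OpenRouter. Which exception signals a rate limit depends on your client library, so `retry_on` must be adjusted; with the `openai` SDK, for example, `openai.RateLimitError` is a natural choice.

```python
import random
import time


def with_backoff(call, retry_on=(Exception,), max_attempts=5,
                 base_delay=1.0, sleep=time.sleep):
    """Call call(); on a retryable error wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            delay = base_delay * (2 ** attempt)
            # Random jitter keeps parallel workers from retrying in lockstep
            sleep(delay + random.uniform(0, delay / 2))
```

Usage: `with_backoff(lambda: md.convert("slides.pptx"), retry_on=(openai.RateLimitError,))`. The `sleep` parameter exists so the delay can be replaced in tests.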
 ## Migration Notes
 This skill was updated from direct OpenAI API to OpenRouter. Key changes:
 1. **Environment Variable**: `OPENAI_API_KEY` → `OPENROUTER_API_KEY`
 2. **Client Initialization**: Added `base_url="https://openrouter.ai/api/v1"`
 3. **Model Names**: `gpt-4o` → `openai/gpt-4o` (with provider prefix)
 4. **Script Updates**: All scripts now use OpenRouter by default
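Change 3 can be automated when migrating existing configuration. Bare OpenAI model names get the `openai/` prefix, and names that already contain a provider are left alone. The sketch below uses a hypothetical helper name.

```python
def to_openrouter_model(name, default_provider="openai"):
    """Map a bare model name like 'gpt-4o' to 'openai/gpt-4o'; keep prefixed names."""
    name = name.strip()
    return name if "/" in name else f"{default_provider}/{name}"
```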
 ## Resources
 - **OpenRouter Website**: https://openrouter.ai
 - **Get API Keys**: https://openrouter.ai/keys
 - **Model List**: https://openrouter.ai/models
 - **Pricing**: https://openrouter.ai/models (click on model for details)
 - **Documentation**: https://openrouter.ai/docs
 - **Support**: https://openrouter.ai/discord
 ## Example Workflow
 Here's a complete workflow using OpenRouter:
 ```bash
 # 1. Set up API key
 export OPENROUTER_API_KEY="sk-or-v1-your-key-here"
 # 2. Convert a scientific paper with Claude
 python scripts/convert_with_ai.py \
  research_paper.pdf \
  output.md \
  --model anthropic/claude-opus-4.5 \
  --prompt-type scientific
 # 3. Convert presentation with GPT-4o
 python scripts/convert_with_ai.py \
  talk_slides.pptx \
  slides.md \
  --model openai/gpt-4o \
  --prompt-type presentation
 # 4. Batch convert a directory of images
 python scripts/batch_convert.py \
  images/ \
  markdown_output/ \
  --extensions .jpg .png
 ```
 ## Support
 For OpenRouter-specific issues:
 - Discord: https://openrouter.ai/discord
 - Email: support@openrouter.ai
 For MarkItDown skill issues:
 - Check documentation in this skill directory
 - Review examples in `assets/example_usage.md`
--- a/scientific-skills/markitdown/references/QUICK_REFERENCE.md
+++ b/scientific-skills/markitdown/references/QUICK_REFERENCE.md
@@ -1,309 +0,0 @@
 # MarkItDown Quick Reference
 ## Installation
 ```bash
 # All features
 pip install 'markitdown[all]'
 # Specific formats
 pip install 'markitdown[pdf,docx,pptx,xlsx]'
 ```
 ## Basic Usage
 ```python
 from markitdown import MarkItDown
 md = MarkItDown()
 result = md.convert("file.pdf")
 print(result.text_content)
 ```
 ## Command Line
 ```bash
 # Simple conversion
 markitdown input.pdf > output.md
 markitdown input.pdf -o output.md
 # With plugins
 markitdown --use-plugins file.pdf -o output.md
 ```
 ## Common Tasks
 ### Convert PDF
 ```python
 md = MarkItDown()
 result = md.convert("paper.pdf")
 ```
 ### Convert with AI
 ```python
 from openai import OpenAI
 # Use OpenRouter for multiple model access
 client = OpenAI(
    api_key="your-openrouter-api-key",
    base_url="https://openrouter.ai/api/v1"
 )
 md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5"  # recommended for vision
 )
 result = md.convert("slides.pptx")
 ```
 ### Batch Convert
 ```bash
 python scripts/batch_convert.py input/ output/ --extensions .pdf .docx
 ```
 ### Literature Conversion
 ```bash
 python scripts/convert_literature.py papers/ markdown/ --create-index
 ```
 ## Supported Formats
 | Format | Extension | Notes |
 |--------|-----------|-------|
 | PDF | `.pdf` | Full text + OCR |
 | Word | `.docx` | Tables, formatting |
 | PowerPoint | `.pptx` | Slides + notes |
 | Excel | `.xlsx`, `.xls` | Tables |
 | Images | `.jpg`, `.png`, `.gif`, `.webp` | EXIF + OCR |
 | Audio | `.wav`, `.mp3` | Transcription |
 | HTML | `.html`, `.htm` | Clean conversion |
 | Data | `.csv`, `.json`, `.xml` | Structured |
 | Archives | `.zip` | Iterates contents |
 | E-books | `.epub` | Full text |
 | YouTube | URLs | Transcripts |
 ## Optional Dependencies
 ```bash
 [all]                  # All features
 [pdf]                  # PDF support
 [docx]                 # Word documents
 [pptx]                 # PowerPoint
 [xlsx]                 # Excel
 [xls]                  # Old Excel
 [outlook]              # Outlook messages
 [az-doc-intel]         # Azure Document Intelligence
 [audio-transcription]  # Audio files
 [youtube-transcription] # YouTube videos
 ```
 ## AI-Enhanced Conversion
 ### Scientific Papers
 ```python
 from openai import OpenAI
 # Initialize OpenRouter client
 client = OpenAI(
    api_key="your-openrouter-api-key",
    base_url="https://openrouter.ai/api/v1"
 )
 md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5",  # recommended for scientific vision
    llm_prompt="Describe scientific figures with technical precision"
 )
 result = md.convert("paper.pdf")
 ```
 ### Custom Prompts
 ```python
 prompt = """
 Analyze this data visualization. Describe:
 - Type of chart/graph
 - Key trends and patterns
 - Notable data points
 """
 md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5",
    llm_prompt=prompt
 )
 ```
 ### Available Models via OpenRouter
 - `anthropic/claude-sonnet-4.5` - **Recommended for scientific vision**
 - `anthropic/claude-opus-4.5` - Advanced vision model
 - `openai/gpt-4o` - GPT-4 Omni (vision)
 - `openai/gpt-4-vision` - GPT-4 Vision
 - `google/gemini-pro-vision` - Gemini Pro Vision
 See https://openrouter.ai/models for full list
 ## Azure Document Intelligence
 ```python
 md = MarkItDown(docintel_endpoint="https://YOUR-ENDPOINT.cognitiveservices.azure.com/")
 result = md.convert("complex_layout.pdf")
 ```
 ## Batch Processing
 ### Python
 ```python
 from markitdown import MarkItDown
 from pathlib import Path
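 md = MarkItDown()
 out_dir = Path("output")
 out_dir.mkdir(parents=True, exist_ok=True)  # write_text() fails if the folder is missing
 for file in Path("input/").glob("*.pdf"):
     result = md.convert(str(file))
     output = out_dir / f"{file.stem}.md"
     output.write_text(result.text_content, encoding="utf-8")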
 ```
 ### Script
 ```bash
 # Parallel conversion
 python scripts/batch_convert.py input/ output/ --workers 8
 # Recursive
 python scripts/batch_convert.py input/ output/ -r
 ```
 ## Error Handling
 ```python
 try:
    result = md.convert("file.pdf")
 except FileNotFoundError:
    print("File not found")
 except Exception as e:
    print(f"Error: {e}")
 ```
 ## Streaming
 ```python
 with open("large_file.pdf", "rb") as f:
    result = md.convert_stream(f, file_extension=".pdf")
 ```
 ## Common Prompts
 ### Scientific
 ```
 Analyze this scientific figure. Describe:
 - Type of visualization
 - Key data points and trends
 - Axes, labels, and legends
 - Scientific significance
 ```
 ### Medical
 ```
 Describe this medical image. Include:
 - Type of imaging (X-ray, MRI, CT, etc.)
 - Anatomical structures visible
 - Notable findings
 - Clinical relevance
 ```
 ### Data Visualization
 ```
 Analyze this data visualization:
 - Chart type
 - Variables and axes
 - Data ranges
 - Key patterns and outliers
 ```
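The prompts above can be kept in a dictionary keyed by type, which is roughly what a `--prompt-type` option needs. This is a sketch: the keys mirror the headings above, and it is not taken from `convert_with_ai.py`.

```python
# Prompt registry built from the "Common Prompts" above (keys are illustrative)
PROMPTS = {
    "scientific": (
        "Analyze this scientific figure. Describe:\n"
        "- Type of visualization\n"
        "- Key data points and trends\n"
        "- Axes, labels, and legends\n"
        "- Scientific significance"
    ),
    "medical": (
        "Describe this medical image. Include:\n"
        "- Type of imaging (X-ray, MRI, CT, etc.)\n"
        "- Anatomical structures visible\n"
        "- Notable findings\n"
        "- Clinical relevance"
    ),
    "data_viz": (
        "Analyze this data visualization:\n"
        "- Chart type\n"
        "- Variables and axes\n"
        "- Data ranges\n"
        "- Key patterns and outliers"
    ),
}


def get_prompt(prompt_type):
    """Return the prompt text for a type, with a helpful error for unknown types."""
    try:
        return PROMPTS[prompt_type]
    except KeyError:
        raise ValueError(
            f"Unknown prompt type {prompt_type!r}; choose from {sorted(PROMPTS)}"
        ) from None
```

The result can be passed as `llm_prompt=get_prompt("data_viz")` when constructing `MarkItDown`.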
 ## Performance Tips
 1. **Reuse instance**: Create once, use many times
 2. **Parallel processing**: Use ThreadPoolExecutor for multiple files
 3. **Stream large files**: Use `convert_stream()` for big files
 4. **Choose right format**: Install only needed dependencies
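Tips 1 and 2 combine naturally: create one converter and map it over files with `concurrent.futures.ThreadPoolExecutor`. The sketch takes the conversion function as a parameter, so it can be tested without MarkItDown. Sharing one `MarkItDown` instance across threads is assumed to be safe here; verify this for your version, or create one instance per worker to be conservative.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def convert_all(paths, convert, workers=4):
    """Run convert(path) -> str concurrently; return ({path: text}, {path: error})."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(convert, p): p for p in paths}
        for fut in as_completed(futures):
            path = futures[fut]
            try:
                results[path] = fut.result()
            except Exception as exc:  # keep going when one file fails
                errors[path] = exc
    return results, errors


# Usage with MarkItDown:
# from pathlib import Path
# from markitdown import MarkItDown
# md = MarkItDown()
# texts, failures = convert_all(
#     [str(p) for p in Path("input").glob("*.pdf")],
#     lambda p: md.convert(p).text_content,
# )
```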
 ## Environment Variables
 ```bash
 # OpenRouter for AI-enhanced conversions
 export OPENROUTER_API_KEY="sk-or-v1-..."
 # Azure Document Intelligence (optional)
 export AZURE_DOCUMENT_INTELLIGENCE_KEY="key..."
 export AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT="https://..."
 ```
 ## Scripts Quick Reference
 ### batch_convert.py
 ```bash
 python scripts/batch_convert.py INPUT OUTPUT [OPTIONS]
 Options:
  --extensions .pdf .docx    File types to convert
  --recursive, -r            Search subdirectories
  --workers 4                Parallel workers
  --verbose, -v              Detailed output
  --plugins, -p              Enable plugins
 ```
 ### convert_with_ai.py
 ```bash
 python scripts/convert_with_ai.py INPUT OUTPUT [OPTIONS]
 Options:
  --api-key KEY              OpenRouter API key
  --model MODEL              Model name (default: anthropic/claude-sonnet-4.5)
  --prompt-type TYPE         Preset prompt (scientific, medical, etc.)
  --custom-prompt TEXT       Custom prompt
  --list-prompts             Show available prompts
 ```
 ### convert_literature.py
 ```bash
 python scripts/convert_literature.py INPUT OUTPUT [OPTIONS]
 Options:
  --organize-by-year, -y     Organize by year
  --create-index, -i         Create index file
  --recursive, -r            Search subdirectories
 ```
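One way to implement organize-by-year grouping is to extract a four-digit year from each file name. This is a hypothetical sketch, not necessarily how `convert_literature.py` works (it may read years from document metadata instead). Files without a recognizable year go under `unknown`.

```python
import re
from collections import defaultdict
from pathlib import Path

# A year 1900-2099 not embedded in a longer run of digits (hypothetical heuristic)
YEAR = re.compile(r"(?<!\d)(19|20)\d{2}(?!\d)")


def group_by_year(paths):
    """Return {year_or_'unknown': [paths]} based on the first year in each file name."""
    groups = defaultdict(list)
    for p in paths:
        m = YEAR.search(Path(p).stem)
        groups[m.group(0) if m else "unknown"].append(p)
    return dict(groups)
```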
 ## Troubleshooting
 ### Missing Dependencies
 ```bash
 pip install 'markitdown[pdf]'  # Install PDF support
 ```
 ### Binary File Error
 ```python
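 # Wrong: text mode cannot be used for binary files such as PDFs
 # with open("file.pdf", "r") as f:
 # Correct: open in binary mode
 with open("file.pdf", "rb") as f:
     result = md.convert_stream(f, file_extension=".pdf")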
 ```
 ### OCR Not Working
 ```bash
 # macOS
 brew install tesseract
 # Ubuntu
 sudo apt-get install tesseract-ocr
 ```
 ## More Information
 - **Full Documentation**: See `SKILL.md`
 - **API Reference**: See `references/api_reference.md`
 - **Format Details**: See `references/file_formats.md`
 - **Examples**: See `assets/example_usage.md`
 - **GitHub**: https://github.com/microsoft/markitdown
--- a/scientific-skills/markitdown/references/README.md
+++ b/scientific-skills/markitdown/references/README.md
@@ -1,184 +0,0 @@
 # MarkItDown Skill
 This skill provides comprehensive support for converting various file formats to Markdown using Microsoft's MarkItDown tool.
 ## Overview
 MarkItDown is a Python tool that converts files and office documents to Markdown format. This skill includes:
 - Complete API documentation
 - Format-specific conversion guides
 - Utility scripts for batch processing
 - AI-enhanced conversion examples
 - Integration with scientific workflows
 ## Contents
 ### Main Skill File
 - **SKILL.md** - Complete guide to using MarkItDown with quick start, examples, and best practices
 ### References
 - **api_reference.md** - Detailed API documentation, class references, and method signatures
 - **file_formats.md** - Format-specific details for all supported file types
 ### Scripts
 - **batch_convert.py** - Batch convert multiple files with parallel processing
 - **convert_with_ai.py** - AI-enhanced conversion with custom prompts
 - **convert_literature.py** - Scientific literature conversion with metadata extraction
 ### Assets
 - **example_usage.md** - Practical examples for common use cases
 ## Installation
 ```bash
 # Install with all features
 pip install 'markitdown[all]'
 # Or install specific features
 pip install 'markitdown[pdf,docx,pptx,xlsx]'
 ```
 ## Quick Start
 ```python
 from markitdown import MarkItDown
 md = MarkItDown()
 result = md.convert("document.pdf")
 print(result.text_content)
 ```
 ## Supported Formats
 - **Documents**: PDF, DOCX, PPTX, XLSX, EPUB
 - **Images**: JPEG, PNG, GIF, WebP (with OCR)
 - **Audio**: WAV, MP3 (with transcription)
 - **Web**: HTML, YouTube URLs
 - **Data**: CSV, JSON, XML
 - **Archives**: ZIP files
 ## Key Features
 ### 1. AI-Enhanced Conversions
 Use AI models via OpenRouter to generate detailed image descriptions:
 ```python
 from openai import OpenAI
 # OpenRouter provides access to 100+ AI models
 client = OpenAI(
    api_key="your-openrouter-api-key",
    base_url="https://openrouter.ai/api/v1"
 )
 md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5"  # recommended for vision
 )
 result = md.convert("presentation.pptx")
 ```
 ### 2. Batch Processing
 Convert multiple files efficiently:
 ```bash
 python scripts/batch_convert.py papers/ output/ --extensions .pdf .docx
 ```
 ### 3. Scientific Literature
 Convert and organize research papers:
 ```bash
 python scripts/convert_literature.py papers/ output/ --organize-by-year --create-index
 ```
 ### 4. Azure Document Intelligence
 Enhanced PDF conversion with Microsoft Document Intelligence:
 ```python
 md = MarkItDown(docintel_endpoint="https://YOUR-ENDPOINT.cognitiveservices.azure.com/")
 result = md.convert("complex_document.pdf")
 ```
 ## Use Cases
 ### Literature Review
 Convert research papers to Markdown for easier analysis and note-taking.
 ### Data Extraction
 Extract tables from Excel files into Markdown format.
 ### Presentation Processing
 Convert PowerPoint slides with AI-generated descriptions.
 ### Document Analysis
 Process documents for LLM consumption with token-efficient Markdown.
 ### YouTube Transcripts
 Fetch and convert YouTube video transcriptions.
 ## Scripts Usage
 ### Batch Convert
 ```bash
 # Convert all PDFs in a directory
 python scripts/batch_convert.py input_dir/ output_dir/ --extensions .pdf
 # Recursive with multiple formats
 python scripts/batch_convert.py docs/ markdown/ --extensions .pdf .docx .pptx -r
 ```
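The `--extensions` and `-r` options describe a file-selection step that can be sketched with `pathlib`. This is an illustrative sketch, not the script's actual code. `find_inputs` is a hypothetical name, and extension matching is made case-insensitive as a design choice.

```python
from pathlib import Path


def find_inputs(root, extensions, recursive=False):
    """Return sorted files under root whose suffix is in extensions (e.g. ['.pdf'])."""
    # Normalize to lowercase with a leading dot so ".PDF", "pdf" and ".pdf" all match
    wanted = {e.lower() if e.startswith(".") else "." + e.lower() for e in extensions}
    pattern = "**/*" if recursive else "*"
    return sorted(
        p for p in Path(root).glob(pattern)
        if p.is_file() and p.suffix.lower() in wanted
    )
```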
 ### AI-Enhanced Conversion
 ```bash
 # Convert with AI descriptions via OpenRouter
 export OPENROUTER_API_KEY="sk-or-v1-..."
 python scripts/convert_with_ai.py paper.pdf output.md --prompt-type scientific
 # Use different models
 python scripts/convert_with_ai.py image.png output.md --model anthropic/claude-sonnet-4.5
 # Use custom prompt
 python scripts/convert_with_ai.py image.png output.md --custom-prompt "Describe this diagram"
 ```
 ### Literature Conversion
 ```bash
 # Convert papers with metadata extraction
 python scripts/convert_literature.py papers/ markdown/ --organize-by-year --create-index
 ```
 ## Integration with Scientific Writer
 This skill works with the Scientific Writer CLI for:
 - Converting source materials for paper writing
 - Processing literature for reviews
 - Extracting data from various document formats
 - Preparing documents for LLM analysis
 ## Resources
 - **MarkItDown GitHub**: https://github.com/microsoft/markitdown
 - **PyPI**: https://pypi.org/project/markitdown/
 - **OpenRouter**: https://openrouter.ai (AI model access)
 - **OpenRouter API Keys**: https://openrouter.ai/keys
 - **OpenRouter Models**: https://openrouter.ai/models
 - **License**: MIT
 ## Requirements
 - Python 3.10+
 - Optional dependencies based on formats needed
 - OpenRouter API key (for AI-enhanced conversions) - Get at https://openrouter.ai/keys
 - Azure subscription (optional, for Document Intelligence)
 ## Examples
 See `assets/example_usage.md` for comprehensive examples covering:
 - Basic conversions
 - Scientific workflows
 - AI-enhanced processing
 - Batch operations
 - Error handling
 - Integration patterns