mirror of
https://github.com/K-Dense-AI/claude-scientific-skills.git
synced 2026-01-26 16:58:56 +08:00
Add skill author
@@ -1,318 +0,0 @@
|
||||
# MarkItDown Installation Guide
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Python 3.10 or higher
|
||||
- pip package manager
|
||||
- Virtual environment (recommended)
|
||||
|
||||
## Basic Installation
|
||||
|
||||
### Install All Features (Recommended)
|
||||
|
||||
```bash
|
||||
pip install 'markitdown[all]'
|
||||
```
|
||||
|
||||
This installs support for all file formats and features.
|
||||
|
||||
### Install Specific Features
|
||||
|
||||
If you only need certain file formats, you can install specific dependencies:
|
||||
|
||||
```bash
|
||||
# PDF support only
|
||||
pip install 'markitdown[pdf]'
|
||||
|
||||
# Office documents
|
||||
pip install 'markitdown[docx,pptx,xlsx]'
|
||||
|
||||
# Multiple formats
|
||||
pip install 'markitdown[pdf,docx,pptx,xlsx,audio-transcription]'
|
||||
```
|
||||
|
||||
### Install from Source
|
||||
|
||||
```bash
|
||||
git clone https://github.com/microsoft/markitdown.git
|
||||
cd markitdown
|
||||
pip install -e 'packages/markitdown[all]'
|
||||
```
|
||||
|
||||
## Optional Dependencies
|
||||
|
||||
| Feature | Installation | Use Case |
|---------|--------------|----------|
| All formats | `pip install 'markitdown[all]'` | Everything |
| PDF | `pip install 'markitdown[pdf]'` | PDF documents |
| Word | `pip install 'markitdown[docx]'` | DOCX files |
| PowerPoint | `pip install 'markitdown[pptx]'` | PPTX files |
| Excel (new) | `pip install 'markitdown[xlsx]'` | XLSX files |
| Excel (old) | `pip install 'markitdown[xls]'` | XLS files |
| Outlook | `pip install 'markitdown[outlook]'` | MSG files |
| Azure DI | `pip install 'markitdown[az-doc-intel]'` | Enhanced PDF |
| Audio | `pip install 'markitdown[audio-transcription]'` | WAV/MP3 |
| YouTube | `pip install 'markitdown[youtube-transcription]'` | YouTube videos |
|
||||
|
||||
## System Dependencies
|
||||
|
||||
### OCR Support (for scanned documents and images)
|
||||
|
||||
#### macOS
|
||||
```bash
|
||||
brew install tesseract
|
||||
```
|
||||
|
||||
#### Ubuntu/Debian
|
||||
```bash
|
||||
sudo apt-get update
|
||||
sudo apt-get install tesseract-ocr
|
||||
```
|
||||
|
||||
#### Windows
|
||||
Download from: https://github.com/UB-Mannheim/tesseract/wiki
|
||||
|
||||
### Poppler Utils (for advanced PDF operations)
|
||||
|
||||
#### macOS
|
||||
```bash
|
||||
brew install poppler
|
||||
```
|
||||
|
||||
#### Ubuntu/Debian
|
||||
```bash
|
||||
sudo apt-get install poppler-utils
|
||||
```
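
Whether these system tools are visible to Python can be checked with the standard library's `shutil.which`. This is a minimal sketch: it assumes the executables are named `tesseract` and `pdftoppm` (one of the Poppler utilities), and the helper name `find_tools` is hypothetical rather than part of MarkItDown.

```python
import shutil
from typing import Dict, Iterable, Optional


def find_tools(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# Report the two system dependencies described in this section.
for tool, path in find_tools(["tesseract", "pdftoppm"]).items():
    print(f"{tool}: {path or 'not found on PATH'}")
```

A `None` entry usually means the tool is not installed or the shell that launched Python has a different `PATH` from your terminal.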
|
||||
|
||||
## Verification
|
||||
|
||||
Test your installation:
|
||||
|
||||
```bash
|
||||
# Check that the package imports
|
||||
python -c "import markitdown; print('MarkItDown installed successfully')"
|
||||
|
||||
# Test basic conversion
|
||||
echo "Test" > test.txt
|
||||
markitdown test.txt
|
||||
rm test.txt
|
||||
```
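
The import check confirms only that the package loads. To read the installed version from Python, one option is the standard library's `importlib.metadata`. The helper name `installed_version` is an illustrative assumption, not a MarkItDown API.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "markitdown") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
print(f"markitdown {found}" if found else "markitdown is not installed")
```

This reads the distribution metadata, so it reports what `pip` installed into the active environment without importing the package itself.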
|
||||
|
||||
## Virtual Environment Setup
|
||||
|
||||
### Using venv
|
||||
|
||||
```bash
|
||||
# Create virtual environment
|
||||
python -m venv markitdown-env
|
||||
|
||||
# Activate (macOS/Linux)
|
||||
source markitdown-env/bin/activate
|
||||
|
||||
# Activate (Windows)
|
||||
markitdown-env\Scripts\activate
|
||||
|
||||
# Install
|
||||
pip install 'markitdown[all]'
|
||||
```
|
||||
|
||||
### Using conda
|
||||
|
||||
```bash
|
||||
# Create environment
|
||||
conda create -n markitdown python=3.12
|
||||
|
||||
# Activate
|
||||
conda activate markitdown
|
||||
|
||||
# Install
|
||||
pip install 'markitdown[all]'
|
||||
```
|
||||
|
||||
### Using uv
|
||||
|
||||
```bash
|
||||
# Create virtual environment
|
||||
uv venv --python=3.12 .venv
|
||||
|
||||
# Activate
|
||||
source .venv/bin/activate
|
||||
|
||||
# Install
|
||||
uv pip install 'markitdown[all]'
|
||||
```
|
||||
|
||||
## AI Enhancement Setup (Optional)
|
||||
|
||||
For AI-powered image descriptions using OpenRouter:
|
||||
|
||||
### OpenRouter API
|
||||
|
||||
OpenRouter provides unified access to multiple AI models (GPT-4, Claude, Gemini, etc.) through a single API.
|
||||
|
||||
```bash
|
||||
# Install the OpenAI SDK (needed for the OpenRouter client; skip if it is already installed)
|
||||
pip install openai
|
||||
|
||||
# Get API key from https://openrouter.ai/keys
|
||||
|
||||
# Set API key
|
||||
export OPENROUTER_API_KEY="sk-or-v1-..."
|
||||
|
||||
# Add to shell profile for persistence
|
||||
echo 'export OPENROUTER_API_KEY="sk-or-v1-..."' >> ~/.bashrc # Linux
|
||||
echo 'export OPENROUTER_API_KEY="sk-or-v1-..."' >> ~/.zshrc # macOS
|
||||
```
|
||||
|
||||
**Why OpenRouter?**
|
||||
- Access to 100+ AI models through one API
|
||||
- Choose between GPT-4, Claude, Gemini, and more
|
||||
- Competitive pricing
|
||||
- No vendor lock-in
|
||||
- Simple OpenAI-compatible interface
|
||||
|
||||
**Popular Models for Image Description:**
|
||||
- `anthropic/claude-sonnet-4.5` - **Recommended** - Best for scientific vision
|
||||
- `anthropic/claude-opus-4.5` - Excellent technical analysis
|
||||
- `openai/gpt-4o` - Good vision understanding
|
||||
- `google/gemini-pro-vision` - Cost-effective option
|
||||
|
||||
See https://openrouter.ai/models for complete model list and pricing.
|
||||
|
||||
## Azure Document Intelligence Setup (Optional)
|
||||
|
||||
For enhanced PDF conversion:
|
||||
|
||||
1. Create Azure Document Intelligence resource in Azure Portal
|
||||
2. Get endpoint and key
|
||||
3. Set environment variables:
|
||||
|
||||
```bash
|
||||
export AZURE_DOCUMENT_INTELLIGENCE_KEY="your-key"
|
||||
export AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT="https://your-endpoint.cognitiveservices.azure.com/"
|
||||
```
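
A minimal sketch of reading these variables from Python before constructing the converter. The `docintel_endpoint` argument is the one this skill's Quick Reference uses; the helpers `docintel_settings` and `build_converter` are hypothetical, and how the key is consumed depends on the installed MarkItDown version.

```python
import os
from typing import Mapping


def docintel_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the Azure Document Intelligence settings, failing early if any is missing."""
    names = ("AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT", "AZURE_DOCUMENT_INTELLIGENCE_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {"endpoint": env[names[0]], "key": env[names[1]]}


def build_converter(env: Mapping[str, str] = os.environ):
    """Create a MarkItDown instance pointed at the configured endpoint."""
    from markitdown import MarkItDown  # imported lazily so the check above works without it

    return MarkItDown(docintel_endpoint=docintel_settings(env)["endpoint"])
```

Calling `build_converter().convert("document.pdf")` would then fail with a clear message if either variable is unset, rather than partway through a conversion.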
|
||||
|
||||
## Docker Installation (Alternative)
|
||||
|
||||
```bash
|
||||
# Clone repository
|
||||
git clone https://github.com/microsoft/markitdown.git
|
||||
cd markitdown
|
||||
|
||||
# Build image
|
||||
docker build -t markitdown:latest .
|
||||
|
||||
# Run
|
||||
docker run --rm -i markitdown:latest < input.pdf > output.md
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Import Error
|
||||
```
|
||||
ModuleNotFoundError: No module named 'markitdown'
|
||||
```
|
||||
|
||||
**Solution**: Ensure you're in the correct virtual environment and markitdown is installed:
|
||||
```bash
|
||||
pip install 'markitdown[all]'
|
||||
```
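
To confirm which interpreter and environment are active before reinstalling, a short check such as the following can help. It is a sketch using only the standard library; `in_virtualenv` is a hypothetical helper name, and the test it performs detects `venv`-style environments but not conda environments.

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv, i.e. the prefix differs from the base interpreter."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

If the interpreter path is not the one inside your environment, `pip` probably installed the package somewhere else.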
|
||||
|
||||
### Missing Feature
|
||||
```
|
||||
Error: PDF conversion not supported
|
||||
```
|
||||
|
||||
**Solution**: Install the specific feature:
|
||||
```bash
|
||||
pip install 'markitdown[pdf]'
|
||||
```
|
||||
|
||||
### OCR Not Working
|
||||
|
||||
**Solution**: Install Tesseract OCR (see System Dependencies above)
|
||||
|
||||
### Permission Errors
|
||||
|
||||
**Solution**: Use virtual environment or install with `--user` flag:
|
||||
```bash
|
||||
pip install --user 'markitdown[all]'
|
||||
```
|
||||
|
||||
## Upgrading
|
||||
|
||||
```bash
|
||||
# Upgrade to latest version
|
||||
pip install --upgrade 'markitdown[all]'
|
||||
|
||||
# Check version
|
||||
pip show markitdown
|
||||
```
|
||||
|
||||
## Uninstallation
|
||||
|
||||
```bash
|
||||
pip uninstall markitdown
|
||||
```
|
||||
|
||||
## Next Steps
|
||||
|
||||
After installation:
|
||||
1. Read `QUICK_REFERENCE.md` for basic usage
|
||||
2. See `SKILL.md` for comprehensive guide
|
||||
3. Try example scripts in `scripts/` directory
|
||||
4. Check `assets/example_usage.md` for practical examples
|
||||
|
||||
## Skill Scripts Setup
|
||||
|
||||
To use the skill scripts:
|
||||
|
||||
```bash
|
||||
# Navigate to scripts directory
|
||||
cd .claude/skills/markitdown/scripts  # relative to your project root; adjust as needed
|
||||
|
||||
# Scripts are already executable, just run them
|
||||
python batch_convert.py --help
|
||||
python convert_with_ai.py --help
|
||||
python convert_literature.py --help
|
||||
```
|
||||
|
||||
## Testing Installation
|
||||
|
||||
Create a test file to verify everything works:
|
||||
|
||||
```python
# test_markitdown.py
import os

from markitdown import MarkItDown


def test_basic():
    md = MarkItDown()

    # Create a simple test file
    with open("test.txt", "w") as f:
        f.write("Hello MarkItDown!")

    # Convert it
    result = md.convert("test.txt")
    print("✓ Basic conversion works")
    print(result.text_content)

    # Cleanup
    os.remove("test.txt")


if __name__ == "__main__":
    test_basic()
```
|
||||
|
||||
Run it:
|
||||
```bash
|
||||
python test_markitdown.py
|
||||
```
|
||||
|
||||
## Getting Help
|
||||
|
||||
- **Documentation**: See `SKILL.md` and `README.md`
|
||||
- **GitHub Issues**: https://github.com/microsoft/markitdown/issues
|
||||
- **Examples**: `assets/example_usage.md`
|
||||
- **API Reference**: `references/api_reference.md`
|
||||
|
||||
@@ -1,22 +0,0 @@
|
||||
MIT License
|
||||
|
||||
Copyright (c) Microsoft Corporation.
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to deal
|
||||
in the Software without restriction, including without limitation the rights
|
||||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
copies of the Software, and to permit persons to whom the Software is
|
||||
furnished to do so, subject to the following conditions:
|
||||
|
||||
The above copyright notice and this permission notice shall be included in all
|
||||
copies or substantial portions of the Software.
|
||||
|
||||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
SOFTWARE.
|
||||
|
||||
@@ -1,359 +0,0 @@
|
||||
# OpenRouter Integration for MarkItDown
|
||||
|
||||
## Overview
|
||||
|
||||
This MarkItDown skill has been configured to use **OpenRouter** instead of direct OpenAI API access. OpenRouter provides a unified API gateway to access 100+ AI models from different providers through a single, OpenAI-compatible interface.
|
||||
|
||||
## Why OpenRouter?
|
||||
|
||||
### Benefits
|
||||
|
||||
1. **Multiple Model Access**: Access GPT-4, Claude, Gemini, and 100+ other models through one API
|
||||
2. **No Vendor Lock-in**: Switch between models without code changes
|
||||
3. **Competitive Pricing**: Often better rates than going direct
|
||||
4. **Simple Migration**: OpenAI-compatible API means minimal code changes
|
||||
5. **Flexible Choice**: Choose the best model for each task
|
||||
|
||||
### Popular Models for Image Description
|
||||
|
||||
| Model | Provider | Use Case | Vision Support |
|-------|----------|----------|----------------|
| `anthropic/claude-sonnet-4.5` | Anthropic | **Recommended** - Best overall for scientific analysis | ✅ |
| `anthropic/claude-opus-4.5` | Anthropic | Excellent technical analysis | ✅ |
| `openai/gpt-4o` | OpenAI | Strong vision understanding | ✅ |
| `openai/gpt-4-vision` | OpenAI | GPT-4 with vision | ✅ |
| `google/gemini-pro-vision` | Google | Cost-effective option | ✅ |
|
||||
|
||||
See https://openrouter.ai/models for the complete list.
|
||||
|
||||
## Getting Started
|
||||
|
||||
### 1. Get an API Key
|
||||
|
||||
1. Visit https://openrouter.ai/keys
|
||||
2. Sign up or log in
|
||||
3. Create a new API key
|
||||
4. Copy the key (starts with `sk-or-v1-...`)
|
||||
|
||||
### 2. Set Environment Variable
|
||||
|
||||
```bash
|
||||
# Add to your environment
|
||||
export OPENROUTER_API_KEY="sk-or-v1-..."
|
||||
|
||||
# Make it permanent
|
||||
echo 'export OPENROUTER_API_KEY="sk-or-v1-..."' >> ~/.zshrc # macOS
|
||||
echo 'export OPENROUTER_API_KEY="sk-or-v1-..."' >> ~/.bashrc # Linux
|
||||
|
||||
# Reload shell
|
||||
source ~/.zshrc # or source ~/.bashrc
|
||||
```
|
||||
|
||||
### 3. Use in Python
|
||||
|
||||
```python
from markitdown import MarkItDown
from openai import OpenAI

# Initialize OpenRouter client (OpenAI-compatible)
client = OpenAI(
    api_key="your-openrouter-api-key",  # or use env var
    base_url="https://openrouter.ai/api/v1"
)

# Create MarkItDown with AI support
md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5"  # Choose your model
)

# Convert with AI-enhanced descriptions
result = md.convert("presentation.pptx")
print(result.text_content)
```
|
||||
|
||||
## Using the Scripts
|
||||
|
||||
All skill scripts have been updated to use OpenRouter:
|
||||
|
||||
### convert_with_ai.py
|
||||
|
||||
```bash
# Set API key
export OPENROUTER_API_KEY="sk-or-v1-..."

# Convert with default model (advanced vision model)
python scripts/convert_with_ai.py paper.pdf output.md --prompt-type scientific

# Use GPT-4o as alternative
python scripts/convert_with_ai.py paper.pdf output.md \
  --model openai/gpt-4o \
  --prompt-type scientific

# Use Gemini Pro Vision (cost-effective)
python scripts/convert_with_ai.py slides.pptx output.md \
  --model google/gemini-pro-vision \
  --prompt-type presentation

# List available prompt types
python scripts/convert_with_ai.py --list-prompts
```
|
||||
|
||||
### Choosing the Right Model
|
||||
|
||||
```bash
# For scientific papers - use advanced vision model for technical analysis
python scripts/convert_with_ai.py research.pdf output.md \
  --model anthropic/claude-sonnet-4.5 \
  --prompt-type scientific

# For presentations - use advanced vision model
python scripts/convert_with_ai.py slides.pptx output.md \
  --model anthropic/claude-sonnet-4.5 \
  --prompt-type presentation

# For data visualizations - use advanced vision model
python scripts/convert_with_ai.py charts.pdf output.md \
  --model anthropic/claude-sonnet-4.5 \
  --prompt-type data_viz

# For medical images - use advanced vision model for detailed analysis
python scripts/convert_with_ai.py xray.jpg output.md \
  --model anthropic/claude-sonnet-4.5 \
  --prompt-type medical
```
|
||||
|
||||
## Code Examples
|
||||
|
||||
### Basic Usage
|
||||
|
||||
```python
from markitdown import MarkItDown
from openai import OpenAI
import os

# Initialize OpenRouter client
client = OpenAI(
    api_key=os.environ.get("OPENROUTER_API_KEY"),
    base_url="https://openrouter.ai/api/v1"
)

# Use advanced vision model for image descriptions
md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5"
)

result = md.convert("document.pptx")
print(result.text_content)
```
|
||||
|
||||
### Switching Models Dynamically
|
||||
|
||||
```python
from markitdown import MarkItDown
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["OPENROUTER_API_KEY"],
    base_url="https://openrouter.ai/api/v1"
)

# Use different models for different file types
def convert_with_best_model(filepath):
    if filepath.endswith('.pdf'):
        # Use advanced vision model for technical PDFs
        md = MarkItDown(
            llm_client=client,
            llm_model="anthropic/claude-sonnet-4.5",
            llm_prompt="Describe scientific figures with technical precision"
        )
    elif filepath.endswith('.pptx'):
        # Use advanced vision model for presentations
        md = MarkItDown(
            llm_client=client,
            llm_model="anthropic/claude-sonnet-4.5",
            llm_prompt="Describe slide content and visual elements"
        )
    else:
        # Use advanced vision model as default
        md = MarkItDown(
            llm_client=client,
            llm_model="anthropic/claude-sonnet-4.5"
        )

    return md.convert(filepath)

# Use it
result = convert_with_best_model("paper.pdf")
```
|
||||
|
||||
### Custom Prompts per Model
|
||||
|
||||
```python
from markitdown import MarkItDown
from openai import OpenAI

client = OpenAI(
    api_key="your-openrouter-api-key",
    base_url="https://openrouter.ai/api/v1"
)

# Scientific analysis with advanced vision model
scientific_prompt = """
Analyze this scientific figure. Provide:
1. Type of visualization and methodology
2. Quantitative data points and trends
3. Statistical significance
4. Technical interpretation
Be precise and use scientific terminology.
"""

md_scientific = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5",
    llm_prompt=scientific_prompt
)

# Visual analysis with advanced vision model
visual_prompt = """
Describe this image comprehensively:
1. Main visual elements and composition
2. Colors, layout, and design
3. Text and labels
4. Overall message
"""

md_visual = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5",
    llm_prompt=visual_prompt
)
```
|
||||
|
||||
## Model Comparison
|
||||
|
||||
### For Scientific Content
|
||||
|
||||
**Recommended: anthropic/claude-sonnet-4.5**
|
||||
- Excellent at technical analysis
|
||||
- Superior reasoning capabilities
|
||||
- Best at understanding scientific figures
|
||||
- Most detailed and accurate explanations
|
||||
- Advanced vision capabilities
|
||||
|
||||
**Alternative: openai/gpt-4o**
|
||||
- Good vision understanding
|
||||
- Fast processing
|
||||
- Good at charts and graphs
|
||||
|
||||
### For Presentations
|
||||
|
||||
**Recommended: anthropic/claude-sonnet-4.5**
|
||||
- Superior vision capabilities
|
||||
- Excellent at understanding slide layouts
|
||||
- Fast and reliable
|
||||
- Best technical comprehension
|
||||
|
||||
### For Cost-Effectiveness
|
||||
|
||||
**Recommended: google/gemini-pro-vision**
|
||||
- Lower cost per request
|
||||
- Good quality
|
||||
- Fast processing
|
||||
|
||||
## Pricing Considerations
|
||||
|
||||
OpenRouter pricing varies by model. Check current rates at https://openrouter.ai/models
|
||||
|
||||
**Tips for Cost Optimization:**
|
||||
1. Use advanced vision models for best quality on complex scientific content
|
||||
2. Use cheaper models (Gemini) for simple images
|
||||
3. Batch process similar content with the same model
|
||||
4. Use appropriate prompts to get better results in fewer retries
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### API Key Issues
|
||||
|
||||
```bash
|
||||
# Check if key is set
|
||||
echo $OPENROUTER_API_KEY
|
||||
|
||||
# Should show: sk-or-v1-...
|
||||
# If empty, set it:
|
||||
export OPENROUTER_API_KEY="sk-or-v1-..."
|
||||
```
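
The same check can be done from Python without echoing the secret. This is a minimal sketch: the `sk-or-v1-` prefix comes from the key format described in Getting Started, and `openrouter_key_status` is a hypothetical helper name.

```python
import os
from typing import Mapping


def openrouter_key_status(env: Mapping[str, str] = os.environ) -> str:
    """Describe the state of OPENROUTER_API_KEY without printing its value."""
    key = env.get("OPENROUTER_API_KEY", "")
    if not key:
        return "missing"
    if not key.startswith("sk-or-v1-"):
        return "unexpected prefix"
    return "ok"


print("OPENROUTER_API_KEY:", openrouter_key_status())
```

A result of `missing` inside a script, when `echo` shows the key in your terminal, usually means the variable was set but not exported.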
|
||||
|
||||
### Model Not Found
|
||||
|
||||
If you get a "model not found" error, check:
|
||||
1. Model name format: `provider/model-name`
|
||||
2. Model availability: https://openrouter.ai/models
|
||||
3. Vision support: Ensure model supports vision for image description
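
The first check in the list can be automated before any request is sent. The sketch below assumes that a model ID is a lowercase provider, a slash, and a model slug; the pattern is an approximation for catching missing prefixes, not OpenRouter's official grammar, and it cannot tell whether the model exists or supports vision.

```python
import re

# Assumed shape: lowercase provider, a slash, then a model slug.
MODEL_ID = re.compile(r"[a-z0-9][a-z0-9-]*/[A-Za-z0-9][A-Za-z0-9._:-]*")


def looks_like_model_id(name: str) -> bool:
    """Return True if `name` has the `provider/model-name` shape."""
    return MODEL_ID.fullmatch(name) is not None


for candidate in ("anthropic/claude-sonnet-4.5", "gpt-4o"):
    print(candidate, "->", looks_like_model_id(candidate))
```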
|
||||
|
||||
### Rate Limits
|
||||
|
||||
OpenRouter has rate limits. If you hit them:
|
||||
1. Add delays between requests
|
||||
2. Lower the `--workers` value when using the batch processing scripts
|
||||
3. Consider upgrading your OpenRouter plan
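
Adding delays between requests is often done with exponential backoff. The following is a generic sketch, not code from the skill scripts: `with_backoff` is a hypothetical helper, and it retries on any exception for simplicity, whereas in practice you would narrow the `except` clause to the rate-limit error raised by your client.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    retries: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on failure with delays of base_delay * 1, 2, 4, ..."""
    for attempt in range(retries + 1):
        try:
            return call()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

With a converter in hand, a call might look like `with_backoff(lambda: md.convert("slides.pptx"))`. The `sleep` parameter exists so the delay can be replaced in tests.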
|
||||
|
||||
## Migration Notes
|
||||
|
||||
This skill was updated from direct OpenAI API to OpenRouter. Key changes:
|
||||
|
||||
1. **Environment Variable**: `OPENAI_API_KEY` → `OPENROUTER_API_KEY`
|
||||
2. **Client Initialization**: Added `base_url="https://openrouter.ai/api/v1"`
|
||||
3. **Model Names**: `gpt-4o` → `openai/gpt-4o` (with provider prefix)
|
||||
4. **Script Updates**: All scripts now use OpenRouter by default
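
Changes 2 and 3 can be captured in two small helpers when migrating existing code. Both function names are hypothetical; the base URL and the `openai/` prefix are the ones listed above.

```python
def to_openrouter_model(name: str, default_provider: str = "openai") -> str:
    """Add a provider prefix to a bare model name; leave prefixed names untouched."""
    return name if "/" in name else f"{default_provider}/{name}"


def openrouter_client_kwargs(api_key: str) -> dict:
    """Keyword arguments for `openai.OpenAI(...)` when targeting OpenRouter."""
    return {"api_key": api_key, "base_url": "https://openrouter.ai/api/v1"}


print(to_openrouter_model("gpt-4o"))
```

Existing call sites would then become `OpenAI(**openrouter_client_kwargs(key))`, with model names passed through `to_openrouter_model`.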
|
||||
|
||||
## Resources
|
||||
|
||||
- **OpenRouter Website**: https://openrouter.ai
|
||||
- **Get API Keys**: https://openrouter.ai/keys
|
||||
- **Model List**: https://openrouter.ai/models
|
||||
- **Pricing**: https://openrouter.ai/models (click on model for details)
|
||||
- **Documentation**: https://openrouter.ai/docs
|
||||
- **Support**: https://openrouter.ai/discord
|
||||
|
||||
## Example Workflow
|
||||
|
||||
Here's a complete workflow using OpenRouter:
|
||||
|
||||
```bash
# 1. Set up API key
export OPENROUTER_API_KEY="sk-or-v1-your-key-here"

# 2. Convert a scientific paper with Claude
python scripts/convert_with_ai.py \
  research_paper.pdf \
  output.md \
  --model anthropic/claude-opus-4.5 \
  --prompt-type scientific

# 3. Convert presentation with GPT-4o
python scripts/convert_with_ai.py \
  talk_slides.pptx \
  slides.md \
  --model openai/gpt-4o \
  --prompt-type presentation

# 4. Batch convert a folder of images
python scripts/batch_convert.py \
  images/ \
  markdown_output/ \
  --extensions .jpg .png
```
|
||||
|
||||
## Support
|
||||
|
||||
For OpenRouter-specific issues:
|
||||
- Discord: https://openrouter.ai/discord
|
||||
- Email: support@openrouter.ai
|
||||
|
||||
For MarkItDown skill issues:
|
||||
- Check documentation in this skill directory
|
||||
- Review examples in `assets/example_usage.md`
|
||||
|
||||
@@ -1,309 +0,0 @@
|
||||
# MarkItDown Quick Reference
|
||||
|
||||
## Installation
|
||||
|
||||
```bash
|
||||
# All features
|
||||
pip install 'markitdown[all]'
|
||||
|
||||
# Specific formats
|
||||
pip install 'markitdown[pdf,docx,pptx,xlsx]'
|
||||
```
|
||||
|
||||
## Basic Usage
|
||||
|
||||
```python
|
||||
from markitdown import MarkItDown
|
||||
|
||||
md = MarkItDown()
|
||||
result = md.convert("file.pdf")
|
||||
print(result.text_content)
|
||||
```
|
||||
|
||||
## Command Line
|
||||
|
||||
```bash
|
||||
# Simple conversion
|
||||
markitdown input.pdf > output.md
|
||||
markitdown input.pdf -o output.md
|
||||
|
||||
# With plugins
|
||||
markitdown --use-plugins file.pdf -o output.md
|
||||
```
|
||||
|
||||
## Common Tasks
|
||||
|
||||
### Convert PDF
|
||||
```python
|
||||
md = MarkItDown()
|
||||
result = md.convert("paper.pdf")
|
||||
```
|
||||
|
||||
### Convert with AI
|
||||
```python
from markitdown import MarkItDown
from openai import OpenAI

# Use OpenRouter for multiple model access
client = OpenAI(
    api_key="your-openrouter-api-key",
    base_url="https://openrouter.ai/api/v1"
)

md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5"  # recommended for vision
)
result = md.convert("slides.pptx")
```
|
||||
|
||||
### Batch Convert
|
||||
```bash
|
||||
python scripts/batch_convert.py input/ output/ --extensions .pdf .docx
|
||||
```
|
||||
|
||||
### Literature Conversion
|
||||
```bash
|
||||
python scripts/convert_literature.py papers/ markdown/ --create-index
|
||||
```
|
||||
|
||||
## Supported Formats
|
||||
|
||||
| Format | Extension | Notes |
|--------|-----------|-------|
| PDF | `.pdf` | Full text + OCR |
| Word | `.docx` | Tables, formatting |
| PowerPoint | `.pptx` | Slides + notes |
| Excel | `.xlsx`, `.xls` | Tables |
| Images | `.jpg`, `.png`, `.gif`, `.webp` | EXIF + OCR |
| Audio | `.wav`, `.mp3` | Transcription |
| HTML | `.html`, `.htm` | Clean conversion |
| Data | `.csv`, `.json`, `.xml` | Structured |
| Archives | `.zip` | Iterates contents |
| E-books | `.epub` | Full text |
| YouTube | URLs | Transcripts |
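
When scanning a folder, it can help to filter files against this table before converting. The sketch below simply mirrors the extensions listed here; the set `SUPPORTED` and the helper `convertible` are hypothetical, and MarkItDown may accept other file types that this table does not list.

```python
from pathlib import Path
from typing import Iterable, List

# Extensions copied from the Supported Formats table.
SUPPORTED = {
    ".pdf", ".docx", ".pptx", ".xlsx", ".xls",
    ".jpg", ".png", ".gif", ".webp",
    ".wav", ".mp3", ".html", ".htm",
    ".csv", ".json", ".xml", ".zip", ".epub",
}


def convertible(paths: Iterable[str]) -> List[str]:
    """Keep only the paths whose extension (case-insensitive) appears in the table."""
    return [p for p in paths if Path(p).suffix.lower() in SUPPORTED]


print(convertible(["report.PDF", "notes.md", "book.epub"]))
```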
|
||||
|
||||
## Optional Dependencies
|
||||
|
||||
```text
|
||||
[all] # All features
|
||||
[pdf] # PDF support
|
||||
[docx] # Word documents
|
||||
[pptx] # PowerPoint
|
||||
[xlsx] # Excel
|
||||
[xls] # Old Excel
|
||||
[outlook] # Outlook messages
|
||||
[az-doc-intel] # Azure Document Intelligence
|
||||
[audio-transcription] # Audio files
|
||||
[youtube-transcription] # YouTube videos
|
||||
```
|
||||
|
||||
## AI-Enhanced Conversion
|
||||
|
||||
### Scientific Papers
|
||||
```python
from markitdown import MarkItDown
from openai import OpenAI

# Initialize OpenRouter client
client = OpenAI(
    api_key="your-openrouter-api-key",
    base_url="https://openrouter.ai/api/v1"
)

md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5",  # recommended for scientific vision
    llm_prompt="Describe scientific figures with technical precision"
)
result = md.convert("paper.pdf")
```
|
||||
|
||||
### Custom Prompts
|
||||
```python
prompt = """
Analyze this data visualization. Describe:
- Type of chart/graph
- Key trends and patterns
- Notable data points
"""

md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5",
    llm_prompt=prompt
)
```
|
||||
|
||||
### Available Models via OpenRouter
|
||||
- `anthropic/claude-sonnet-4.5` - **Recommended for scientific vision**
|
||||
- `anthropic/claude-opus-4.5` - Advanced vision model
|
||||
- `openai/gpt-4o` - GPT-4 Omni (vision)
|
||||
- `openai/gpt-4-vision` - GPT-4 Vision
|
||||
- `google/gemini-pro-vision` - Gemini Pro Vision
|
||||
|
||||
See https://openrouter.ai/models for full list
|
||||
|
||||
## Azure Document Intelligence
|
||||
|
||||
```python
|
||||
md = MarkItDown(docintel_endpoint="https://YOUR-ENDPOINT.cognitiveservices.azure.com/")
|
||||
result = md.convert("complex_layout.pdf")
|
||||
```
|
||||
|
||||
## Batch Processing
|
||||
|
||||
### Python
|
||||
```python
from markitdown import MarkItDown
from pathlib import Path

md = MarkItDown()

# Make sure the output folder exists before writing into it
output_dir = Path("output")
output_dir.mkdir(parents=True, exist_ok=True)

for file in Path("input/").glob("*.pdf"):
    result = md.convert(str(file))
    output = output_dir / f"{file.stem}.md"
    output.write_text(result.text_content, encoding="utf-8")
```
|
||||
|
||||
### Script
|
||||
```bash
|
||||
# Parallel conversion
|
||||
python scripts/batch_convert.py input/ output/ --workers 8
|
||||
|
||||
# Recursive
|
||||
python scripts/batch_convert.py input/ output/ -r
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
```python
try:
    result = md.convert("file.pdf")
except FileNotFoundError:
    print("File not found")
except Exception as e:
    print(f"Error: {e}")
```
|
||||
|
||||
## Streaming
|
||||
|
||||
```python
with open("large_file.pdf", "rb") as f:
    result = md.convert_stream(f, file_extension=".pdf")
```
|
||||
|
||||
## Common Prompts
|
||||
|
||||
### Scientific
|
||||
```
|
||||
Analyze this scientific figure. Describe:
|
||||
- Type of visualization
|
||||
- Key data points and trends
|
||||
- Axes, labels, and legends
|
||||
- Scientific significance
|
||||
```
|
||||
|
||||
### Medical
|
||||
```
|
||||
Describe this medical image. Include:
|
||||
- Type of imaging (X-ray, MRI, CT, etc.)
|
||||
- Anatomical structures visible
|
||||
- Notable findings
|
||||
- Clinical relevance
|
||||
```
|
||||
|
||||
### Data Visualization
|
||||
```
|
||||
Analyze this data visualization:
|
||||
- Chart type
|
||||
- Variables and axes
|
||||
- Data ranges
|
||||
- Key patterns and outliers
|
||||
```
|
||||
|
||||
## Performance Tips
|
||||
|
||||
1. **Reuse instance**: Create once, use many times
|
||||
2. **Parallel processing**: Use ThreadPoolExecutor for multiple files
|
||||
3. **Stream large files**: Use `convert_stream()` for big files
|
||||
4. **Choose right format**: Install only needed dependencies
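
Tips 1 and 2 combine naturally: create the converter once and fan the files out over a thread pool. This is a sketch under assumptions: `convert_many` is a hypothetical helper that takes any `convert` callable, and whether a single `MarkItDown` instance is safe to share across threads should be verified for your version.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def convert_many(
    paths: Iterable[str],
    convert: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Apply `convert` to every path in parallel and return {path: result}."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with paths
        return dict(zip(paths, pool.map(convert, paths)))
```

With MarkItDown, this might be called as `convert_many(files, lambda p: md.convert(p).text_content)`, where `md` is the single reused instance.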
|
||||
|
||||
## Environment Variables
|
||||
|
||||
```bash
|
||||
# OpenRouter for AI-enhanced conversions
|
||||
export OPENROUTER_API_KEY="sk-or-v1-..."
|
||||
|
||||
# Azure Document Intelligence (optional)
|
||||
export AZURE_DOCUMENT_INTELLIGENCE_KEY="key..."
|
||||
export AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT="https://..."
|
||||
```
|
||||
|
||||
## Scripts Quick Reference
|
||||
|
||||
### batch_convert.py
|
||||
```bash
|
||||
python scripts/batch_convert.py INPUT OUTPUT [OPTIONS]
|
||||
|
||||
Options:
|
||||
--extensions .pdf .docx File types to convert
|
||||
--recursive, -r Search subdirectories
|
||||
--workers 4 Parallel workers
|
||||
--verbose, -v Detailed output
|
||||
--plugins, -p Enable plugins
|
||||
```
|
||||
|
||||
### convert_with_ai.py
|
||||
```bash
|
||||
python scripts/convert_with_ai.py INPUT OUTPUT [OPTIONS]
|
||||
|
||||
Options:
|
||||
--api-key KEY OpenRouter API key
|
||||
--model MODEL Model name (default: anthropic/claude-sonnet-4.5)
|
||||
--prompt-type TYPE Preset prompt (scientific, medical, etc.)
|
||||
--custom-prompt TEXT Custom prompt
|
||||
--list-prompts Show available prompts
|
||||
```
|
||||
|
||||
### convert_literature.py
|
||||
```bash
|
||||
python scripts/convert_literature.py INPUT OUTPUT [OPTIONS]
|
||||
|
||||
Options:
|
||||
--organize-by-year, -y Organize by year
|
||||
--create-index, -i Create index file
|
||||
--recursive, -r Search subdirectories
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Missing Dependencies
|
||||
```bash
|
||||
pip install 'markitdown[pdf]' # Install PDF support
|
||||
```
|
||||
|
||||
### Binary File Error
|
||||
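```python
# Wrong: text mode cannot read binary formats such as PDF
with open("file.pdf", "r") as f:
    result = md.convert_stream(f, file_extension=".pdf")

# Correct: open the file in binary mode
with open("file.pdf", "rb") as f:
    result = md.convert_stream(f, file_extension=".pdf")
```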
|
||||
|
||||
### OCR Not Working
|
||||
```bash
|
||||
# macOS
|
||||
brew install tesseract
|
||||
|
||||
# Ubuntu
|
||||
sudo apt-get install tesseract-ocr
|
||||
```
|
||||
|
||||
## More Information
|
||||
|
||||
- **Full Documentation**: See `SKILL.md`
|
||||
- **API Reference**: See `references/api_reference.md`
|
||||
- **Format Details**: See `references/file_formats.md`
|
||||
- **Examples**: See `assets/example_usage.md`
|
||||
- **GitHub**: https://github.com/microsoft/markitdown
|
||||
|
||||
@@ -1,184 +0,0 @@
|
||||
# MarkItDown Skill
|
||||
|
||||
This skill provides comprehensive support for converting various file formats to Markdown using Microsoft's MarkItDown tool.
|
||||
|
||||
## Overview
|
||||
|
||||
MarkItDown is a Python tool that converts files and office documents to Markdown format. This skill includes:
|
||||
|
||||
- Complete API documentation
|
||||
- Format-specific conversion guides
|
||||
- Utility scripts for batch processing
|
||||
- AI-enhanced conversion examples
|
||||
- Integration with scientific workflows
|
||||
|
||||
## Contents
|
||||
|
||||
### Main Skill File
|
||||
- **SKILL.md** - Complete guide to using MarkItDown with quick start, examples, and best practices
|
||||
|
||||
### References
|
||||
- **api_reference.md** - Detailed API documentation, class references, and method signatures
|
||||
- **file_formats.md** - Format-specific details for all supported file types
|
||||
|
||||
### Scripts
|
||||
- **batch_convert.py** - Batch convert multiple files with parallel processing
|
||||
- **convert_with_ai.py** - AI-enhanced conversion with custom prompts
|
||||
- **convert_literature.py** - Scientific literature conversion with metadata extraction
|
||||
|
||||
### Assets
|
||||
- **example_usage.md** - Practical examples for common use cases
|
||||
|
||||
## Installation

```bash
# Install with all features
pip install 'markitdown[all]'

# Or install specific features
pip install 'markitdown[pdf,docx,pptx,xlsx]'
```

## Quick Start

```python
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("document.pdf")
print(result.text_content)
```

## Supported Formats

- **Documents**: PDF, DOCX, PPTX, XLSX, EPUB
- **Images**: JPEG, PNG, GIF, WebP (with OCR)
- **Audio**: WAV, MP3 (with transcription)
- **Web**: HTML, YouTube URLs
- **Data**: CSV, JSON, XML
- **Archives**: ZIP files

## Key Features

### 1. AI-Enhanced Conversions
Use AI models via OpenRouter to generate detailed image descriptions:

```python
from markitdown import MarkItDown
from openai import OpenAI

# OpenRouter provides access to 100+ AI models
client = OpenAI(
    api_key="your-openrouter-api-key",
    base_url="https://openrouter.ai/api/v1"
)

md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5"  # recommended for vision
)
result = md.convert("presentation.pptx")
```

### 2. Batch Processing
Convert multiple files efficiently:

```bash
python scripts/batch_convert.py papers/ output/ --extensions .pdf .docx
```
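
A rough idea of what parallel batch conversion involves is sketched below. This is not the code of `scripts/batch_convert.py`: the helper names (`find_inputs`, `convert_one`, `batch_convert`) and the thread-pool approach are assumptions for illustration, built only on the `MarkItDown().convert()` call and the `text_content` attribute shown in the Quick Start.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def find_inputs(root, extensions, recursive=False):
    """Collect files under root whose suffix matches one of the extensions."""
    pattern = "**/*" if recursive else "*"
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        p for p in Path(root).glob(pattern)
        if p.is_file() and p.suffix.lower() in wanted
    )


def convert_one(src, out_dir):
    """Convert a single file and write <stem>.md into out_dir."""
    from markitdown import MarkItDown  # imported lazily; requires markitdown

    result = MarkItDown().convert(str(src))
    dest = Path(out_dir) / (Path(src).stem + ".md")
    dest.write_text(result.text_content, encoding="utf-8")
    return dest


def batch_convert(root, out_dir, extensions, workers=4, recursive=False):
    """Convert all matching files using a pool of worker threads."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    files = find_inputs(root, extensions, recursive)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: convert_one(p, out_dir), files))
```

In this sketch, two inputs that share a stem (for example `a.pdf` and `a.docx`) would write to the same output file, so a real tool would need a naming scheme that avoids that collision.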

### 3. Scientific Literature
Convert and organize research papers:

```bash
python scripts/convert_literature.py papers/ output/ --organize-by-year --create-index
```

### 4. Azure Document Intelligence
Enhanced PDF conversion with Microsoft Document Intelligence:

```python
from markitdown import MarkItDown

md = MarkItDown(docintel_endpoint="https://YOUR-ENDPOINT.cognitiveservices.azure.com/")
result = md.convert("complex_document.pdf")
```

## Use Cases

### Literature Review
Convert research papers to Markdown for easier analysis and note-taking.

### Data Extraction
Extract tables from Excel files into Markdown format.

### Presentation Processing
Convert PowerPoint slides with AI-generated descriptions.

### Document Analysis
Process documents for LLM consumption with token-efficient Markdown.

### YouTube Transcripts
Fetch and convert YouTube video transcriptions.

## Scripts Usage

### Batch Convert
```bash
# Convert all PDFs in a directory
python scripts/batch_convert.py input_dir/ output_dir/ --extensions .pdf

# Recursive with multiple formats
python scripts/batch_convert.py docs/ markdown/ --extensions .pdf .docx .pptx -r
```
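
For readers who want to build a similar command-line tool, the flags used above could be declared with `argparse` roughly as follows. This is a sketch, not the parser of `scripts/batch_convert.py`: the long option `--recursive`, the default values, and the function name `build_parser` are assumptions; only `--extensions`, `-r`, and `--workers` appear in this documentation.

```python
import argparse


def build_parser():
    """Build a parser that accepts the command lines shown above."""
    parser = argparse.ArgumentParser(description="Batch convert files to Markdown")
    parser.add_argument("input_dir", help="directory containing source files")
    parser.add_argument("output_dir", help="directory for the Markdown output")
    # nargs="+" lets one flag take several values: --extensions .pdf .docx
    parser.add_argument("--extensions", nargs="+", default=[".pdf"])
    # "--recursive" is an assumed long form of the documented -r flag.
    parser.add_argument("-r", "--recursive", action="store_true")
    # The default of 4 workers is an assumption.
    parser.add_argument("--workers", type=int, default=4)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```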

### AI-Enhanced Conversion
```bash
# Convert with AI descriptions via OpenRouter
export OPENROUTER_API_KEY="sk-or-v1-..."
python scripts/convert_with_ai.py paper.pdf output.md --prompt-type scientific

# Use different models
python scripts/convert_with_ai.py image.png output.md --model anthropic/claude-sonnet-4.5

# Use custom prompt
python scripts/convert_with_ai.py image.png output.md --custom-prompt "Describe this diagram"
```
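
When calling MarkItDown from Python rather than through the script, the same `OPENROUTER_API_KEY` variable can be read from the environment instead of hard-coding the key. The sketch below assumes that approach; `openrouter_settings` and `build_converter` are hypothetical helper names, and the `llm_client` and `llm_model` arguments are the ones used in the Key Features example.

```python
import os

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"


def openrouter_settings(env=None):
    """Return OpenAI-client keyword arguments built from the environment."""
    env = os.environ if env is None else env
    key = env.get("OPENROUTER_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENROUTER_API_KEY is not set")
    return {"api_key": key, "base_url": OPENROUTER_BASE_URL}


def build_converter(model="anthropic/claude-sonnet-4.5"):
    """Create a MarkItDown instance backed by an OpenRouter client."""
    # Imported lazily so the settings helper works without these packages.
    from markitdown import MarkItDown
    from openai import OpenAI

    client = OpenAI(**openrouter_settings())
    return MarkItDown(llm_client=client, llm_model=model)
```

Failing early with a clear error when the variable is missing tends to be easier to debug than an authentication error returned later by the API.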

### Literature Conversion
```bash
# Convert papers with metadata extraction
python scripts/convert_literature.py papers/ markdown/ --organize-by-year --create-index
```
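
Organizing output by year implies deriving a year for each paper. One simple way to do that is sketched below, assuming filenames that contain a four-digit year such as `Smith_2021_deep_learning.pdf`. Both that naming convention and the helper names (`year_from_filename`, `target_path`) are assumptions for illustration, not the behavior of `scripts/convert_literature.py`.

```python
import re
from pathlib import Path

# A standalone four-digit year starting with 19 or 20 (assumed convention).
YEAR_RE = re.compile(r"(?<!\d)(19|20)\d{2}(?!\d)")


def year_from_filename(name):
    """Return the first plausible publication year in the file stem, or None."""
    match = YEAR_RE.search(Path(name).stem)
    return int(match.group(0)) if match else None


def target_path(out_dir, name):
    """Place the Markdown output in a per-year folder, or 'unknown'."""
    year = year_from_filename(name)
    folder = str(year) if year is not None else "unknown"
    return Path(out_dir) / folder / (Path(name).stem + ".md")
```

Files without a recognizable year fall into an `unknown` folder here, so nothing is silently dropped.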

## Integration with Scientific Writer

This skill integrates seamlessly with the Scientific Writer CLI for:
- Converting source materials for paper writing
- Processing literature for reviews
- Extracting data from various document formats
- Preparing documents for LLM analysis

## Resources

- **MarkItDown GitHub**: https://github.com/microsoft/markitdown
- **PyPI**: https://pypi.org/project/markitdown/
- **OpenRouter**: https://openrouter.ai (AI model access)
- **OpenRouter API Keys**: https://openrouter.ai/keys
- **OpenRouter Models**: https://openrouter.ai/models
- **License**: MIT

## Requirements

- Python 3.10+
- Optional dependencies based on formats needed
- OpenRouter API key (for AI-enhanced conversions) - Get at https://openrouter.ai/keys
- Azure subscription (optional, for Document Intelligence)

## Examples

See `assets/example_usage.md` for comprehensive examples covering:
- Basic conversions
- Scientific workflows
- AI-enhanced processing
- Batch operations
- Error handling
- Integration patterns

@@ -3,7 +3,8 @@ name: markitdown
description: "Convert files and office documents to Markdown. Supports PDF, DOCX, PPTX, XLSX, images (with OCR), audio (with transcription), HTML, CSV, JSON, XML, ZIP, YouTube URLs, EPubs and more."
allowed-tools: [Read, Write, Edit, Bash]
license: MIT
source: https://github.com/microsoft/markitdown
metadata:
    skill-author: K-Dense Inc.
---

# MarkItDown - File to Markdown Conversion

@@ -1,307 +0,0 @@
# MarkItDown Skill - Creation Summary

## Overview

A comprehensive skill for using Microsoft's MarkItDown tool has been created for the Claude Scientific Writer. This skill enables conversion of 15+ file formats to Markdown, optimized for LLM processing and scientific workflows.

## What Was Created

### Core Documentation

1. **SKILL.md** (Main skill file)
   - Complete guide to MarkItDown
   - Quick start examples
   - All supported formats
   - Advanced features (AI, Azure DI)
   - Best practices
   - Use cases and examples

2. **README.md**
   - Skill overview
   - Key features
   - Quick reference
   - Integration guide

3. **QUICK_REFERENCE.md**
   - Cheat sheet for common tasks
   - Quick syntax reference
   - Common commands
   - Troubleshooting tips

4. **INSTALLATION_GUIDE.md**
   - Step-by-step installation
   - System dependencies
   - Virtual environment setup
   - Optional features
   - Troubleshooting

### Reference Documentation

Located in `references/`:

1. **api_reference.md**
   - Complete API documentation
   - Class and method references
   - Custom converter development
   - Plugin system
   - Error handling
   - Breaking changes guide

2. **file_formats.md**
   - Detailed format-specific guides
   - 15+ supported formats
   - Format capabilities and limitations
   - Best practices per format
   - Example outputs

### Utility Scripts

Located in `scripts/`:

1. **batch_convert.py**
   - Parallel batch conversion
   - Multi-format support
   - Recursive directory search
   - Progress tracking
   - Error reporting
   - Command-line interface

2. **convert_with_ai.py**
   - AI-enhanced conversions
   - Predefined prompt types (scientific, medical, data viz, etc.)
   - Custom prompt support
   - Multiple model support
   - OpenRouter integration (advanced vision models)

3. **convert_literature.py**
   - Scientific literature conversion
   - Metadata extraction from filenames
   - Year-based organization
   - Automatic index generation
   - JSON catalog creation
   - Front matter support

### Assets

Located in `assets/`:

1. **example_usage.md**
   - 20+ practical examples
   - Basic conversions
   - Scientific workflows
   - AI-enhanced processing
   - Batch operations
   - Error handling patterns
   - Integration examples

### License

- **LICENSE.txt** - MIT License from Microsoft

## Skill Structure

```
.claude/skills/markitdown/
├── SKILL.md                    # Main skill documentation
├── README.md                   # Skill overview
├── QUICK_REFERENCE.md          # Quick reference guide
├── INSTALLATION_GUIDE.md       # Installation instructions
├── SKILL_SUMMARY.md            # This file
├── LICENSE.txt                 # MIT License
├── references/
│   ├── api_reference.md        # Complete API docs
│   └── file_formats.md         # Format-specific guides
├── scripts/
│   ├── batch_convert.py        # Batch conversion utility
│   ├── convert_with_ai.py      # AI-enhanced conversion
│   └── convert_literature.py   # Literature conversion
└── assets/
    └── example_usage.md        # Practical examples
```

## Capabilities

### File Format Support

- **Documents**: PDF, DOCX, PPTX, XLSX, XLS, EPUB
- **Images**: JPEG, PNG, GIF, WebP (with OCR)
- **Audio**: WAV, MP3 (with transcription)
- **Web**: HTML, YouTube URLs
- **Data**: CSV, JSON, XML
- **Archives**: ZIP files
- **Email**: Outlook MSG files

### Advanced Features

1. **AI Enhancement via OpenRouter**
   - Access to 100+ AI models through OpenRouter
   - Multiple preset prompts (scientific, medical, data viz)
   - Custom prompt support
   - Default: Advanced vision model (best for scientific vision)
   - Choose the best model for each task

2. **Azure Integration**
   - Azure Document Intelligence for complex PDFs
   - Enhanced layout understanding
   - Better table extraction

3. **Batch Processing**
   - Parallel conversion with configurable workers
   - Recursive directory processing
   - Progress tracking and error reporting
   - Format-specific organization

4. **Scientific Workflows**
   - Literature conversion with metadata
   - Automatic index generation
   - Year-based organization
   - Citation-friendly output

## Integration with Scientific Writer

The skill has been added to the Scientific Writer's skill catalog:

- **Location**: `.claude/skills/markitdown/`
- **Skill Number**: #5 in Document Manipulation Skills
- **SKILLS.md**: Updated with complete skill description

### Usage Examples

```
> Convert all PDFs in the literature folder to Markdown
> Convert this PowerPoint presentation to Markdown with AI-generated descriptions
> Extract tables from this Excel file
> Transcribe this lecture recording
```

## Scripts Usage

### Batch Convert
```bash
python scripts/batch_convert.py input_dir/ output_dir/ --extensions .pdf .docx --workers 4
```

### AI-Enhanced Convert
```bash
export OPENROUTER_API_KEY="sk-or-v1-..."
python scripts/convert_with_ai.py paper.pdf output.md \
  --model anthropic/claude-sonnet-4.5 \
  --prompt-type scientific
```

### Literature Convert
```bash
python scripts/convert_literature.py papers/ markdown/ --organize-by-year --create-index
```

## Key Features

1. **Token-Efficient Output**: Markdown optimized for LLM processing
2. **Comprehensive Format Support**: 15+ file types
3. **AI Enhancement**: Detailed image descriptions via OpenRouter (using the OpenAI-compatible client)
4. **OCR Support**: Extract text from scanned documents
5. **Audio Transcription**: Speech-to-text for audio files
6. **YouTube Support**: Video transcript extraction
7. **Plugin System**: Extensible architecture
8. **Batch Processing**: Efficient parallel conversion
9. **Error Handling**: Robust error management
10. **Scientific Focus**: Optimized for research workflows

## Installation

```bash
# Full installation
pip install 'markitdown[all]'

# Selective installation
pip install 'markitdown[pdf,docx,pptx,xlsx]'
```

## Quick Start

```python
from markitdown import MarkItDown

# Basic usage
md = MarkItDown()
result = md.convert("document.pdf")
print(result.text_content)

# With AI via OpenRouter
from openai import OpenAI
client = OpenAI(
    api_key="your-openrouter-api-key",
    base_url="https://openrouter.ai/api/v1"
)
md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5"  # or openai/gpt-4o
)
result = md.convert("presentation.pptx")
```

## Documentation Files

| File | Purpose | Lines |
|------|---------|-------|
| SKILL.md | Main documentation | 400+ |
| api_reference.md | API documentation | 500+ |
| file_formats.md | Format guides | 600+ |
| example_usage.md | Practical examples | 500+ |
| batch_convert.py | Batch conversion | 200+ |
| convert_with_ai.py | AI conversion | 200+ |
| convert_literature.py | Literature conversion | 250+ |
| QUICK_REFERENCE.md | Quick reference | 300+ |
| INSTALLATION_GUIDE.md | Installation guide | 300+ |

**Total**: 3,000+ lines of documentation and code

## Use Cases

1. **Literature Review**: Convert research papers to Markdown for analysis
2. **Data Extraction**: Extract tables from Excel/PDF for processing
3. **Presentation Processing**: Convert slides with AI descriptions
4. **Document Analysis**: Prepare documents for LLM consumption
5. **Lecture Transcription**: Convert audio recordings to text
6. **YouTube Analysis**: Extract video transcripts
7. **Archive Processing**: Batch convert document collections

## Next Steps

1. Install MarkItDown: `pip install 'markitdown[all]'`
2. Read `QUICK_REFERENCE.md` for common tasks
3. Try example scripts in `scripts/` directory
4. Explore `SKILL.md` for comprehensive guide
5. Check `example_usage.md` for practical examples

## Resources

- **MarkItDown GitHub**: https://github.com/microsoft/markitdown
- **PyPI**: https://pypi.org/project/markitdown/
- **OpenRouter**: https://openrouter.ai (AI model access)
- **OpenRouter API Keys**: https://openrouter.ai/keys
- **OpenRouter Models**: https://openrouter.ai/models
- **License**: MIT (Microsoft Corporation)
- **Python**: 3.10+ required
- **Skill Location**: `.claude/skills/markitdown/`

## Success Criteria

✅ Comprehensive skill documentation created
✅ Complete API reference provided
✅ Format-specific guides included
✅ Utility scripts implemented
✅ Practical examples documented
✅ Installation guide created
✅ Quick reference guide added
✅ Integration with Scientific Writer complete
✅ SKILLS.md updated
✅ Scripts made executable
✅ MIT License included

## Skill Status

**Status**: ✅ Complete and Ready to Use

The MarkItDown skill is fully integrated into the Claude Scientific Writer and ready for use. All documentation, scripts, and examples are in place.
