Add skill author

Timothy Kassis
2025-12-31 13:57:51 -08:00
parent 57bde764fe
commit 2621ee329d
164 changed files with 289 additions and 4988 deletions


@@ -1,318 +0,0 @@
# MarkItDown Installation Guide
## Prerequisites
- Python 3.10 or higher
- pip package manager
- Virtual environment (recommended)
## Basic Installation
### Install All Features (Recommended)
```bash
pip install 'markitdown[all]'
```
This installs support for all file formats and features.
### Install Specific Features
If you only need certain file formats, you can install specific dependencies:
```bash
# PDF support only
pip install 'markitdown[pdf]'
# Office documents
pip install 'markitdown[docx,pptx,xlsx]'
# Multiple formats
pip install 'markitdown[pdf,docx,pptx,xlsx,audio-transcription]'
```
### Install from Source
```bash
git clone https://github.com/microsoft/markitdown.git
cd markitdown
pip install -e 'packages/markitdown[all]'
```
## Optional Dependencies
| Feature | Installation | Use Case |
|---------|--------------|----------|
| All formats | `pip install 'markitdown[all]'` | Everything |
| PDF | `pip install 'markitdown[pdf]'` | PDF documents |
| Word | `pip install 'markitdown[docx]'` | DOCX files |
| PowerPoint | `pip install 'markitdown[pptx]'` | PPTX files |
| Excel (new) | `pip install 'markitdown[xlsx]'` | XLSX files |
| Excel (old) | `pip install 'markitdown[xls]'` | XLS files |
| Outlook | `pip install 'markitdown[outlook]'` | MSG files |
| Azure DI | `pip install 'markitdown[az-doc-intel]'` | Enhanced PDF |
| Audio | `pip install 'markitdown[audio-transcription]'` | WAV/MP3 |
| YouTube | `pip install 'markitdown[youtube-transcription]'` | YouTube videos |
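To see which optional features are usable in the current environment, one option is to probe for the third-party module behind each extra. The sketch below uses only the standard library. The extra-to-module mapping is an assumption (for example, that `xlsx` relies on `openpyxl`), not something this guide states, so check it against the package metadata of your installed version.

```python
from importlib.util import find_spec

# Hypothetical mapping from extras to the module each one is assumed to pull in.
ASSUMED_BACKENDS = {
    "docx": "mammoth",
    "pptx": "pptx",
    "xlsx": "openpyxl",
    "xls": "xlrd",
}


def available_features(backends=ASSUMED_BACKENDS):
    """Return {extra: bool} depending on whether its module can be imported."""
    return {extra: find_spec(module) is not None for extra, module in backends.items()}


if __name__ == "__main__":
    for extra, ok in available_features().items():
        print(f"{extra:5} {'available' if ok else 'missing'}")
```

A missing entry suggests installing the matching extra from the table above.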
## System Dependencies
### OCR Support (for scanned documents and images)
#### macOS
```bash
brew install tesseract
```
#### Ubuntu/Debian
```bash
sudo apt-get update
sudo apt-get install tesseract-ocr
```
#### Windows
Download from: https://github.com/UB-Mannheim/tesseract/wiki
### Poppler Utils (for advanced PDF operations)
#### macOS
```bash
brew install poppler
```
#### Ubuntu/Debian
```bash
sudo apt-get install poppler-utils
```
## Verification
Test your installation:
```bash
# Check that the package imports
python -c "import markitdown; print('MarkItDown installed successfully')"
# Test basic conversion
echo "Test" > test.txt
markitdown test.txt
rm test.txt
```
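If you also want the installed version number from Python, a small sketch using the standard library's `importlib.metadata` is shown here. The helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="markitdown"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version() or "markitdown is not installed in this environment")
```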
## Virtual Environment Setup
### Using venv
```bash
# Create virtual environment
python -m venv markitdown-env
# Activate (macOS/Linux)
source markitdown-env/bin/activate
# Activate (Windows)
markitdown-env\Scripts\activate
# Install
pip install 'markitdown[all]'
```
### Using conda
```bash
# Create environment
conda create -n markitdown python=3.12
# Activate
conda activate markitdown
# Install
pip install 'markitdown[all]'
```
### Using uv
```bash
# Create virtual environment
uv venv --python=3.12 .venv
# Activate
source .venv/bin/activate
# Install
uv pip install 'markitdown[all]'
```
## AI Enhancement Setup (Optional)
For AI-powered image descriptions using OpenRouter:
### OpenRouter API
OpenRouter provides unified access to multiple AI models (GPT-4, Claude, Gemini, etc.) through a single API.
```bash
# Install the OpenAI SDK if it is not already present in your environment
pip install openai
# Get API key from https://openrouter.ai/keys
# Set API key
export OPENROUTER_API_KEY="sk-or-v1-..."
# Add to shell profile for persistence
echo 'export OPENROUTER_API_KEY="sk-or-v1-..."' >> ~/.bashrc # Linux
echo 'export OPENROUTER_API_KEY="sk-or-v1-..."' >> ~/.zshrc # macOS
```
**Why OpenRouter?**
- Access to 100+ AI models through one API
- Choose between GPT-4, Claude, Gemini, and more
- Competitive pricing
- No vendor lock-in
- Simple OpenAI-compatible interface
**Popular Models for Image Description:**
- `anthropic/claude-sonnet-4.5` - **Recommended** - Best for scientific vision
- `anthropic/claude-opus-4.5` - Excellent technical analysis
- `openai/gpt-4o` - Good vision understanding
- `google/gemini-pro-vision` - Cost-effective option
See https://openrouter.ai/models for complete model list and pricing.
## Azure Document Intelligence Setup (Optional)
For enhanced PDF conversion:
1. Create Azure Document Intelligence resource in Azure Portal
2. Get endpoint and key
3. Set environment variables:
```bash
export AZURE_DOCUMENT_INTELLIGENCE_KEY="your-key"
export AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT="https://your-endpoint.cognitiveservices.azure.com/"
```
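Before starting a long conversion it can help to confirm that both variables are actually set. The following is a minimal sketch, assuming you read the values yourself and pass the endpoint on as `docintel_endpoint`; the helper name `load_docintel_settings` is hypothetical, and how the key is consumed depends on your setup.

```python
import os


def load_docintel_settings(env=os.environ):
    """Read the two variables exported above and fail early if either is missing."""
    names = ("AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT", "AZURE_DOCUMENT_INTELLIGENCE_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    endpoint = env[names[0]]
    if not endpoint.startswith("https://"):
        raise ValueError("Endpoint should be an https:// URL")
    return {"endpoint": endpoint, "key": env[names[1]]}
```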
## Docker Installation (Alternative)
```bash
# Clone repository
git clone https://github.com/microsoft/markitdown.git
cd markitdown
# Build image
docker build -t markitdown:latest .
# Run
docker run --rm -i markitdown:latest < input.pdf > output.md
```
## Troubleshooting
### Import Error
```
ModuleNotFoundError: No module named 'markitdown'
```
**Solution**: Ensure you're in the correct virtual environment and markitdown is installed:
```bash
pip install 'markitdown[all]'
```
### Missing Feature
```
Error: PDF conversion not supported
```
**Solution**: Install the specific feature:
```bash
pip install 'markitdown[pdf]'
```
### OCR Not Working
**Solution**: Install Tesseract OCR (see System Dependencies above)
### Permission Errors
**Solution**: Use virtual environment or install with `--user` flag:
```bash
pip install --user 'markitdown[all]'
```
## Upgrading
```bash
# Upgrade to latest version
pip install --upgrade 'markitdown[all]'
# Check version
pip show markitdown
```
## Uninstallation
```bash
pip uninstall markitdown
```
## Next Steps
After installation:
1. Read `QUICK_REFERENCE.md` for basic usage
2. See `SKILL.md` for comprehensive guide
3. Try example scripts in `scripts/` directory
4. Check `assets/example_usage.md` for practical examples
## Skill Scripts Setup
To use the skill scripts:
```bash
# Navigate to the scripts directory (path relative to the project root)
cd .claude/skills/markitdown/scripts
# Scripts are already executable, just run them
python batch_convert.py --help
python convert_with_ai.py --help
python convert_literature.py --help
```
## Testing Installation
Create a test file to verify everything works:
```python
# test_markitdown.py
import os

from markitdown import MarkItDown


def test_basic():
    md = MarkItDown()
    # Create a simple test file
    with open("test.txt", "w") as f:
        f.write("Hello MarkItDown!")
    # Convert it
    result = md.convert("test.txt")
    print("✓ Basic conversion works")
    print(result.text_content)
    # Cleanup
    os.remove("test.txt")


if __name__ == "__main__":
    test_basic()
```
Run it:
```bash
python test_markitdown.py
```
## Getting Help
- **Documentation**: See `SKILL.md` and `README.md`
- **GitHub Issues**: https://github.com/microsoft/markitdown/issues
- **Examples**: `assets/example_usage.md`
- **API Reference**: `references/api_reference.md`


@@ -1,22 +0,0 @@
MIT License
Copyright (c) Microsoft Corporation.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


@@ -1,359 +0,0 @@
# OpenRouter Integration for MarkItDown
## Overview
This MarkItDown skill has been configured to use **OpenRouter** instead of direct OpenAI API access. OpenRouter provides a unified API gateway to access 100+ AI models from different providers through a single, OpenAI-compatible interface.
## Why OpenRouter?
### Benefits
1. **Multiple Model Access**: Access GPT-4, Claude, Gemini, and 100+ other models through one API
2. **No Vendor Lock-in**: Switch between models without code changes
3. **Competitive Pricing**: Often better rates than going direct
4. **Simple Migration**: OpenAI-compatible API means minimal code changes
5. **Flexible Choice**: Choose the best model for each task
### Popular Models for Image Description
| Model | Provider | Use Case | Vision Support |
|-------|----------|----------|----------------|
| `anthropic/claude-sonnet-4.5` | Anthropic | **Recommended** - Best overall for scientific analysis | ✅ |
| `anthropic/claude-opus-4.5` | Anthropic | Excellent technical analysis | ✅ |
| `openai/gpt-4o` | OpenAI | Strong vision understanding | ✅ |
| `openai/gpt-4-vision` | OpenAI | GPT-4 with vision | ✅ |
| `google/gemini-pro-vision` | Google | Cost-effective option | ✅ |
See https://openrouter.ai/models for the complete list.
## Getting Started
### 1. Get an API Key
1. Visit https://openrouter.ai/keys
2. Sign up or log in
3. Create a new API key
4. Copy the key (starts with `sk-or-v1-...`)
### 2. Set Environment Variable
```bash
# Add to your environment
export OPENROUTER_API_KEY="sk-or-v1-..."
# Make it permanent
echo 'export OPENROUTER_API_KEY="sk-or-v1-..."' >> ~/.zshrc # macOS
echo 'export OPENROUTER_API_KEY="sk-or-v1-..."' >> ~/.bashrc # Linux
# Reload shell
source ~/.zshrc # or source ~/.bashrc
```
### 3. Use in Python
```python
from markitdown import MarkItDown
from openai import OpenAI
# Initialize OpenRouter client (OpenAI-compatible)
client = OpenAI(
    api_key="your-openrouter-api-key",  # or use env var
    base_url="https://openrouter.ai/api/v1"
)
# Create MarkItDown with AI support
md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5"  # Choose your model
)
# Convert with AI-enhanced descriptions
result = md.convert("presentation.pptx")
print(result.text_content)
```
## Using the Scripts
All skill scripts have been updated to use OpenRouter:
### convert_with_ai.py
```bash
# Set API key
export OPENROUTER_API_KEY="sk-or-v1-..."
# Convert with default model (advanced vision model)
python scripts/convert_with_ai.py paper.pdf output.md --prompt-type scientific
# Use GPT-4o as alternative
python scripts/convert_with_ai.py paper.pdf output.md \
--model openai/gpt-4o \
--prompt-type scientific
# Use Gemini Pro Vision (cost-effective)
python scripts/convert_with_ai.py slides.pptx output.md \
--model google/gemini-pro-vision \
--prompt-type presentation
# List available prompt types
python scripts/convert_with_ai.py --list-prompts
```
### Choosing the Right Model
```bash
# For scientific papers - use advanced vision model for technical analysis
python scripts/convert_with_ai.py research.pdf output.md \
--model anthropic/claude-sonnet-4.5 \
--prompt-type scientific
# For presentations - use advanced vision model
python scripts/convert_with_ai.py slides.pptx output.md \
--model anthropic/claude-sonnet-4.5 \
--prompt-type presentation
# For data visualizations - use advanced vision model
python scripts/convert_with_ai.py charts.pdf output.md \
--model anthropic/claude-sonnet-4.5 \
--prompt-type data_viz
# For medical images - use advanced vision model for detailed analysis
python scripts/convert_with_ai.py xray.jpg output.md \
--model anthropic/claude-sonnet-4.5 \
--prompt-type medical
```
## Code Examples
### Basic Usage
```python
from markitdown import MarkItDown
from openai import OpenAI
import os
# Initialize OpenRouter client
client = OpenAI(
    api_key=os.environ.get("OPENROUTER_API_KEY"),
    base_url="https://openrouter.ai/api/v1"
)
# Use advanced vision model for image descriptions
md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5"
)
result = md.convert("document.pptx")
print(result.text_content)
```
### Switching Models Dynamically
```python
from markitdown import MarkItDown
from openai import OpenAI
import os
client = OpenAI(
    api_key=os.environ["OPENROUTER_API_KEY"],
    base_url="https://openrouter.ai/api/v1"
)
# Use different models for different file types
def convert_with_best_model(filepath):
    if filepath.endswith('.pdf'):
        # Use advanced vision model for technical PDFs
        md = MarkItDown(
            llm_client=client,
            llm_model="anthropic/claude-sonnet-4.5",
            llm_prompt="Describe scientific figures with technical precision"
        )
    elif filepath.endswith('.pptx'):
        # Use advanced vision model for presentations
        md = MarkItDown(
            llm_client=client,
            llm_model="anthropic/claude-sonnet-4.5",
            llm_prompt="Describe slide content and visual elements"
        )
    else:
        # Use advanced vision model as default
        md = MarkItDown(
            llm_client=client,
            llm_model="anthropic/claude-sonnet-4.5"
        )
    return md.convert(filepath)
# Use it
result = convert_with_best_model("paper.pdf")
```
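When the branches differ only in their keyword arguments, a lookup table can replace the `if`/`elif` chain. The sketch below is one possible arrangement, not the skill's own implementation; the names `SETTINGS_BY_SUFFIX` and `settings_for` are invented for illustration, and the prompts are copied from the example above.

```python
from pathlib import Path

DEFAULT_MODEL = "anthropic/claude-sonnet-4.5"

# Per-extension overrides; anything not listed falls back to the default model.
SETTINGS_BY_SUFFIX = {
    ".pdf": {"llm_prompt": "Describe scientific figures with technical precision"},
    ".pptx": {"llm_prompt": "Describe slide content and visual elements"},
}


def settings_for(filepath, default_model=DEFAULT_MODEL):
    """Build the llm_* keyword arguments for a file based on its suffix."""
    suffix = Path(filepath).suffix.lower()
    settings = {"llm_model": default_model}
    settings.update(SETTINGS_BY_SUFFIX.get(suffix, {}))
    return settings


# Usage (requires markitdown and an OpenRouter client configured as shown earlier):
#   md = MarkItDown(llm_client=client, **settings_for("paper.pdf"))
```

Adding a new file type then means adding one dictionary entry. Unlike `str.endswith`, the suffix comparison here is case-insensitive, so `paper.PDF` is treated like `paper.pdf`.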
### Custom Prompts per Model
```python
from markitdown import MarkItDown
from openai import OpenAI
client = OpenAI(
    api_key="your-openrouter-api-key",
    base_url="https://openrouter.ai/api/v1"
)
# Scientific analysis with advanced vision model
scientific_prompt = """
Analyze this scientific figure. Provide:
1. Type of visualization and methodology
2. Quantitative data points and trends
3. Statistical significance
4. Technical interpretation
Be precise and use scientific terminology.
"""
md_scientific = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5",
    llm_prompt=scientific_prompt
)
# Visual analysis with advanced vision model
visual_prompt = """
Describe this image comprehensively:
1. Main visual elements and composition
2. Colors, layout, and design
3. Text and labels
4. Overall message
"""
md_visual = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5",
    llm_prompt=visual_prompt
)
```
## Model Comparison
### For Scientific Content
**Recommended: anthropic/claude-sonnet-4.5**
- Excellent at technical analysis
- Superior reasoning capabilities
- Best at understanding scientific figures
- Most detailed and accurate explanations
- Advanced vision capabilities
**Alternative: openai/gpt-4o**
- Good vision understanding
- Fast processing
- Good at charts and graphs
### For Presentations
**Recommended: anthropic/claude-sonnet-4.5**
- Superior vision capabilities
- Excellent at understanding slide layouts
- Fast and reliable
- Best technical comprehension
### For Cost-Effectiveness
**Recommended: google/gemini-pro-vision**
- Lower cost per request
- Good quality
- Fast processing
## Pricing Considerations
OpenRouter pricing varies by model. Check current rates at https://openrouter.ai/models
**Tips for Cost Optimization:**
1. Use advanced vision models for best quality on complex scientific content
2. Use cheaper models (Gemini) for simple images
3. Batch process similar content with the same model
4. Use appropriate prompts to get better results in fewer retries
## Troubleshooting
### API Key Issues
```bash
# Check if key is set
echo $OPENROUTER_API_KEY
# Should show: sk-or-v1-...
# If empty, set it:
export OPENROUTER_API_KEY="sk-or-v1-..."
```
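The same check can be done from Python before a client is created, so a missing key fails with a readable message instead of an authentication error later. This is a minimal sketch; the helper name `openrouter_key` is hypothetical, and the `sk-or-v1-` prefix check simply mirrors the key format described in this guide.

```python
import os


def openrouter_key(env=os.environ):
    """Return the OpenRouter key, raising a readable error if it looks wrong."""
    key = env.get("OPENROUTER_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENROUTER_API_KEY is not set")
    if not key.startswith("sk-or-v1-"):
        raise ValueError("OPENROUTER_API_KEY does not start with 'sk-or-v1-'")
    return key
```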
### Model Not Found
If you get a "model not found" error, check:
1. Model name format: `provider/model-name`
2. Model availability: https://openrouter.ai/models
3. Vision support: Ensure model supports vision for image description
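The first check in the list can be automated with a small format test. The pattern below is an assumption about what a `provider/model-name` identifier looks like; it only catches obvious mistakes such as a missing provider prefix and says nothing about whether the model exists or supports vision.

```python
import re

# Assumed shape: one provider segment, a slash, one model segment.
MODEL_ID = re.compile(r"[a-z0-9][a-z0-9._-]*/[a-z0-9][a-z0-9._:-]*")


def looks_like_model_id(name):
    """Return True if name has the provider/model-name shape."""
    return MODEL_ID.fullmatch(name) is not None
```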
### Rate Limits
OpenRouter has rate limits. If you hit them:
1. Add delays between requests
2. Lower the `--workers` value in the batch processing scripts to reduce the number of concurrent requests
3. Consider upgrading your OpenRouter plan
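One common way to add delays between requests is to retry with exponential backoff. The sketch below is generic and uses only the standard library; the helper name `call_with_backoff` is hypothetical. It retries on any exception for simplicity, whereas in practice you would probably narrow the `except` clause to the rate-limit error raised by your client.

```python
import random
import time


def call_with_backoff(fn, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff plus jitter on any exception."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)


# Usage, assuming `md` is a configured MarkItDown instance:
#   result = call_with_backoff(lambda: md.convert("slides.pptx"))
```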
## Migration Notes
This skill was updated from direct OpenAI API to OpenRouter. Key changes:
1. **Environment Variable**: `OPENAI_API_KEY` → `OPENROUTER_API_KEY`
2. **Client Initialization**: Added `base_url="https://openrouter.ai/api/v1"`
3. **Model Names**: `gpt-4o` → `openai/gpt-4o` (with provider prefix)
4. **Script Updates**: All scripts now use OpenRouter by default
## Resources
- **OpenRouter Website**: https://openrouter.ai
- **Get API Keys**: https://openrouter.ai/keys
- **Model List**: https://openrouter.ai/models
- **Pricing**: https://openrouter.ai/models (click on model for details)
- **Documentation**: https://openrouter.ai/docs
- **Support**: https://openrouter.ai/discord
## Example Workflow
Here's a complete workflow using OpenRouter:
```bash
# 1. Set up API key
export OPENROUTER_API_KEY="sk-or-v1-your-key-here"
# 2. Convert a scientific paper with Claude
python scripts/convert_with_ai.py \
research_paper.pdf \
output.md \
--model anthropic/claude-opus-4.5 \
--prompt-type scientific
# 3. Convert presentation with GPT-4o
python scripts/convert_with_ai.py \
talk_slides.pptx \
slides.md \
--model openai/gpt-4o \
--prompt-type presentation
# 4. Batch convert a folder of images
python scripts/batch_convert.py \
images/ \
markdown_output/ \
--extensions .jpg .png
```
## Support
For OpenRouter-specific issues:
- Discord: https://openrouter.ai/discord
- Email: support@openrouter.ai
For MarkItDown skill issues:
- Check documentation in this skill directory
- Review examples in `assets/example_usage.md`


@@ -1,309 +0,0 @@
# MarkItDown Quick Reference
## Installation
```bash
# All features
pip install 'markitdown[all]'
# Specific formats
pip install 'markitdown[pdf,docx,pptx,xlsx]'
```
## Basic Usage
```python
from markitdown import MarkItDown
md = MarkItDown()
result = md.convert("file.pdf")
print(result.text_content)
```
## Command Line
```bash
# Simple conversion
markitdown input.pdf > output.md
markitdown input.pdf -o output.md
# With plugins
markitdown --use-plugins file.pdf -o output.md
```
## Common Tasks
### Convert PDF
```python
md = MarkItDown()
result = md.convert("paper.pdf")
```
### Convert with AI
```python
from openai import OpenAI
# Use OpenRouter for multiple model access
client = OpenAI(
    api_key="your-openrouter-api-key",
    base_url="https://openrouter.ai/api/v1"
)
md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5"  # recommended for vision
)
result = md.convert("slides.pptx")
```
### Batch Convert
```bash
python scripts/batch_convert.py input/ output/ --extensions .pdf .docx
```
### Literature Conversion
```bash
python scripts/convert_literature.py papers/ markdown/ --create-index
```
## Supported Formats
| Format | Extension | Notes |
|--------|-----------|-------|
| PDF | `.pdf` | Full text + OCR |
| Word | `.docx` | Tables, formatting |
| PowerPoint | `.pptx` | Slides + notes |
| Excel | `.xlsx`, `.xls` | Tables |
| Images | `.jpg`, `.png`, `.gif`, `.webp` | EXIF + OCR |
| Audio | `.wav`, `.mp3` | Transcription |
| HTML | `.html`, `.htm` | Clean conversion |
| Data | `.csv`, `.json`, `.xml` | Structured |
| Archives | `.zip` | Iterates contents |
| E-books | `.epub` | Full text |
| YouTube | URLs | Transcripts |
## Optional Dependencies
```bash
[all] # All features
[pdf] # PDF support
[docx] # Word documents
[pptx] # PowerPoint
[xlsx] # Excel
[xls] # Old Excel
[outlook] # Outlook messages
[az-doc-intel] # Azure Document Intelligence
[audio-transcription] # Audio files
[youtube-transcription] # YouTube videos
```
## AI-Enhanced Conversion
### Scientific Papers
```python
from openai import OpenAI
# Initialize OpenRouter client
client = OpenAI(
    api_key="your-openrouter-api-key",
    base_url="https://openrouter.ai/api/v1"
)
md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5",  # recommended for scientific vision
    llm_prompt="Describe scientific figures with technical precision"
)
result = md.convert("paper.pdf")
```
### Custom Prompts
```python
prompt = """
Analyze this data visualization. Describe:
- Type of chart/graph
- Key trends and patterns
- Notable data points
"""
md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5",
    llm_prompt=prompt
)
```
### Available Models via OpenRouter
- `anthropic/claude-sonnet-4.5` - **Recommended for scientific vision**
- `anthropic/claude-opus-4.5` - Advanced vision model
- `openai/gpt-4o` - GPT-4 Omni (vision)
- `openai/gpt-4-vision` - GPT-4 Vision
- `google/gemini-pro-vision` - Gemini Pro Vision
See https://openrouter.ai/models for full list
## Azure Document Intelligence
```python
md = MarkItDown(docintel_endpoint="https://YOUR-ENDPOINT.cognitiveservices.azure.com/")
result = md.convert("complex_layout.pdf")
```
## Batch Processing
### Python
```python
from markitdown import MarkItDown
from pathlib import Path
md = MarkItDown()
for file in Path("input/").glob("*.pdf"):
    result = md.convert(str(file))
    output = Path("output") / f"{file.stem}.md"
    output.write_text(result.text_content)
```
### Script
```bash
# Parallel conversion
python scripts/batch_convert.py input/ output/ --workers 8
# Recursive
python scripts/batch_convert.py input/ output/ -r
```
## Error Handling
```python
try:
    result = md.convert("file.pdf")
except FileNotFoundError:
    print("File not found")
except Exception as e:
    print(f"Error: {e}")
```
## Streaming
```python
with open("large_file.pdf", "rb") as f:
    result = md.convert_stream(f, file_extension=".pdf")
```
## Common Prompts
### Scientific
```
Analyze this scientific figure. Describe:
- Type of visualization
- Key data points and trends
- Axes, labels, and legends
- Scientific significance
```
### Medical
```
Describe this medical image. Include:
- Type of imaging (X-ray, MRI, CT, etc.)
- Anatomical structures visible
- Notable findings
- Clinical relevance
```
### Data Visualization
```
Analyze this data visualization:
- Chart type
- Variables and axes
- Data ranges
- Key patterns and outliers
```
## Performance Tips
1. **Reuse instance**: Create once, use many times
2. **Parallel processing**: Use ThreadPoolExecutor for multiple files
3. **Stream large files**: Use `convert_stream()` for big files
4. **Choose right format**: Install only needed dependencies
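Tips 1 and 2 can be combined as in the sketch below, which runs a caller-supplied `convert` function over many paths with `ThreadPoolExecutor` and collects failures instead of stopping at the first one. The helper name `convert_many` is hypothetical, and whether a single `MarkItDown` instance can safely be shared across threads is an assumption you should verify for your version.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def convert_many(paths, convert, out_dir, workers=4):
    """Run convert(path) -> str for each path in parallel and write <stem>.md files.

    Returns (written, failed): a list of output paths and a {path: error} dict.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written, failed = [], {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(convert, str(p)): Path(p) for p in paths}
        for future in as_completed(futures):
            src = futures[future]
            try:
                target = out_dir / f"{src.stem}.md"
                target.write_text(future.result(), encoding="utf-8")
                written.append(target)
            except Exception as exc:  # keep going; report failures at the end
                failed[src] = exc
    return written, failed


# Usage with one shared instance, assuming markitdown is installed:
#   md = MarkItDown()
#   convert_many(Path("input").glob("*.pdf"),
#                lambda p: md.convert(p).text_content, "output")
```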
## Environment Variables
```bash
# OpenRouter for AI-enhanced conversions
export OPENROUTER_API_KEY="sk-or-v1-..."
# Azure Document Intelligence (optional)
export AZURE_DOCUMENT_INTELLIGENCE_KEY="key..."
export AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT="https://..."
```
## Scripts Quick Reference
### batch_convert.py
```bash
python scripts/batch_convert.py INPUT OUTPUT [OPTIONS]

Options:
  --extensions .pdf .docx    File types to convert
  --recursive, -r            Search subdirectories
  --workers 4                Parallel workers
  --verbose, -v              Detailed output
  --plugins, -p              Enable plugins
```
### convert_with_ai.py
```bash
python scripts/convert_with_ai.py INPUT OUTPUT [OPTIONS]

Options:
  --api-key KEY           OpenRouter API key
  --model MODEL           Model name (default: anthropic/claude-sonnet-4.5)
  --prompt-type TYPE      Preset prompt (scientific, medical, etc.)
  --custom-prompt TEXT    Custom prompt
  --list-prompts          Show available prompts
```
### convert_literature.py
```bash
python scripts/convert_literature.py INPUT OUTPUT [OPTIONS]

Options:
  --organize-by-year, -y    Organize by year
  --create-index, -i        Create index file
  --recursive, -r           Search subdirectories
```
## Troubleshooting
### Missing Dependencies
```bash
pip install 'markitdown[pdf]' # Install PDF support
```
### Binary File Error
```python
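# Wrong: text mode tries to decode the PDF bytes as text
with open("file.pdf", "r") as f:
    result = md.convert_stream(f, file_extension=".pdf")
# Correct: binary mode
with open("file.pdf", "rb") as f:
    result = md.convert_stream(f, file_extension=".pdf")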
```
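A quick way to confirm that a file is being read as bytes, and that it really is a PDF, is to inspect its first five bytes, since PDF files begin with the marker `%PDF-`. The helper name `looks_like_pdf` below is made up for illustration.

```python
def looks_like_pdf(path):
    """Open the file in binary mode and check for the '%PDF-' header."""
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"
```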
### OCR Not Working
```bash
# macOS
brew install tesseract
# Ubuntu
sudo apt-get install tesseract-ocr
```
## More Information
- **Full Documentation**: See `SKILL.md`
- **API Reference**: See `references/api_reference.md`
- **Format Details**: See `references/file_formats.md`
- **Examples**: See `assets/example_usage.md`
- **GitHub**: https://github.com/microsoft/markitdown


@@ -1,184 +0,0 @@
# MarkItDown Skill
This skill provides comprehensive support for converting various file formats to Markdown using Microsoft's MarkItDown tool.
## Overview
MarkItDown is a Python tool that converts files and office documents to Markdown format. This skill includes:
- Complete API documentation
- Format-specific conversion guides
- Utility scripts for batch processing
- AI-enhanced conversion examples
- Integration with scientific workflows
## Contents
### Main Skill File
- **SKILL.md** - Complete guide to using MarkItDown with quick start, examples, and best practices
### References
- **api_reference.md** - Detailed API documentation, class references, and method signatures
- **file_formats.md** - Format-specific details for all supported file types
### Scripts
- **batch_convert.py** - Batch convert multiple files with parallel processing
- **convert_with_ai.py** - AI-enhanced conversion with custom prompts
- **convert_literature.py** - Scientific literature conversion with metadata extraction
### Assets
- **example_usage.md** - Practical examples for common use cases
## Installation
```bash
# Install with all features
pip install 'markitdown[all]'
# Or install specific features
pip install 'markitdown[pdf,docx,pptx,xlsx]'
```
## Quick Start
```python
from markitdown import MarkItDown
md = MarkItDown()
result = md.convert("document.pdf")
print(result.text_content)
```
## Supported Formats
- **Documents**: PDF, DOCX, PPTX, XLSX, EPUB
- **Images**: JPEG, PNG, GIF, WebP (with OCR)
- **Audio**: WAV, MP3 (with transcription)
- **Web**: HTML, YouTube URLs
- **Data**: CSV, JSON, XML
- **Archives**: ZIP files
## Key Features
### 1. AI-Enhanced Conversions
Use AI models via OpenRouter to generate detailed image descriptions:
```python
from openai import OpenAI
# OpenRouter provides access to 100+ AI models
client = OpenAI(
    api_key="your-openrouter-api-key",
    base_url="https://openrouter.ai/api/v1"
)
md = MarkItDown(
    llm_client=client,
    llm_model="anthropic/claude-sonnet-4.5"  # recommended for vision
)
result = md.convert("presentation.pptx")
```
### 2. Batch Processing
Convert multiple files efficiently:
```bash
python scripts/batch_convert.py papers/ output/ --extensions .pdf .docx
```
### 3. Scientific Literature
Convert and organize research papers:
```bash
python scripts/convert_literature.py papers/ output/ --organize-by-year --create-index
```
### 4. Azure Document Intelligence
Enhanced PDF conversion with Microsoft Document Intelligence:
```python
md = MarkItDown(docintel_endpoint="https://YOUR-ENDPOINT.cognitiveservices.azure.com/")
result = md.convert("complex_document.pdf")
```
## Use Cases
### Literature Review
Convert research papers to Markdown for easier analysis and note-taking.
### Data Extraction
Extract tables from Excel files into Markdown format.
### Presentation Processing
Convert PowerPoint slides with AI-generated descriptions.
### Document Analysis
Process documents for LLM consumption with token-efficient Markdown.
### YouTube Transcripts
Fetch and convert YouTube video transcriptions.
## Scripts Usage
### Batch Convert
```bash
# Convert all PDFs in a directory
python scripts/batch_convert.py input_dir/ output_dir/ --extensions .pdf
# Recursive with multiple formats
python scripts/batch_convert.py docs/ markdown/ --extensions .pdf .docx .pptx -r
```
### AI-Enhanced Conversion
```bash
# Convert with AI descriptions via OpenRouter
export OPENROUTER_API_KEY="sk-or-v1-..."
python scripts/convert_with_ai.py paper.pdf output.md --prompt-type scientific
# Use different models
python scripts/convert_with_ai.py image.png output.md --model anthropic/claude-sonnet-4.5
# Use custom prompt
python scripts/convert_with_ai.py image.png output.md --custom-prompt "Describe this diagram"
```
### Literature Conversion
```bash
# Convert papers with metadata extraction
python scripts/convert_literature.py papers/ markdown/ --organize-by-year --create-index
```
## Integration with Scientific Writer
This skill integrates seamlessly with the Scientific Writer CLI for:
- Converting source materials for paper writing
- Processing literature for reviews
- Extracting data from various document formats
- Preparing documents for LLM analysis
## Resources
- **MarkItDown GitHub**: https://github.com/microsoft/markitdown
- **PyPI**: https://pypi.org/project/markitdown/
- **OpenRouter**: https://openrouter.ai (AI model access)
- **OpenRouter API Keys**: https://openrouter.ai/keys
- **OpenRouter Models**: https://openrouter.ai/models
- **License**: MIT
## Requirements
- Python 3.10+
- Optional dependencies based on formats needed
- OpenRouter API key (for AI-enhanced conversions) - Get at https://openrouter.ai/keys
- Azure subscription (optional, for Document Intelligence)
## Examples
See `assets/example_usage.md` for comprehensive examples covering:
- Basic conversions
- Scientific workflows
- AI-enhanced processing
- Batch operations
- Error handling
- Integration patterns


@@ -3,7 +3,8 @@ name: markitdown
description: "Convert files and office documents to Markdown. Supports PDF, DOCX, PPTX, XLSX, images (with OCR), audio (with transcription), HTML, CSV, JSON, XML, ZIP, YouTube URLs, EPubs and more."
allowed-tools: [Read, Write, Edit, Bash]
license: MIT
source: https://github.com/microsoft/markitdown
metadata:
    skill-author: K-Dense Inc.
---
# MarkItDown - File to Markdown Conversion


@@ -1,307 +0,0 @@
# MarkItDown Skill - Creation Summary
## Overview
A comprehensive skill for using Microsoft's MarkItDown tool has been created for the Claude Scientific Writer. This skill enables conversion of 15+ file formats to Markdown, optimized for LLM processing and scientific workflows.
## What Was Created
### Core Documentation
1. **SKILL.md** (Main skill file)
   - Complete guide to MarkItDown
   - Quick start examples
   - All supported formats
   - Advanced features (AI, Azure DI)
   - Best practices
   - Use cases and examples
2. **README.md**
   - Skill overview
   - Key features
   - Quick reference
   - Integration guide
3. **QUICK_REFERENCE.md**
   - Cheat sheet for common tasks
   - Quick syntax reference
   - Common commands
   - Troubleshooting tips
4. **INSTALLATION_GUIDE.md**
   - Step-by-step installation
   - System dependencies
   - Virtual environment setup
   - Optional features
   - Troubleshooting
### Reference Documentation
Located in `references/`:
1. **api_reference.md**
   - Complete API documentation
   - Class and method references
   - Custom converter development
   - Plugin system
   - Error handling
   - Breaking changes guide
2. **file_formats.md**
   - Detailed format-specific guides
   - 15+ supported formats
   - Format capabilities and limitations
   - Best practices per format
   - Example outputs
### Utility Scripts
Located in `scripts/`:
1. **batch_convert.py**
   - Parallel batch conversion
   - Multi-format support
   - Recursive directory search
   - Progress tracking
   - Error reporting
   - Command-line interface
2. **convert_with_ai.py**
   - AI-enhanced conversions
   - Predefined prompt types (scientific, medical, data viz, etc.)
   - Custom prompt support
   - Multiple model support
   - OpenRouter integration (advanced vision models)
3. **convert_literature.py**
   - Scientific literature conversion
   - Metadata extraction from filenames
   - Year-based organization
   - Automatic index generation
   - JSON catalog creation
   - Front matter support
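To illustrate what metadata extraction from filenames and year-based organization could look like, here is a minimal sketch. It does not reproduce `convert_literature.py`; the `Author_Year_Title` naming convention, the pattern, and both helper names are assumptions made purely for this example.

```python
import re
from pathlib import Path

# Hypothetical naming convention: Author_Year_Title.ext,
# e.g. "Smith_2021_Deep_Learning.pdf".
PATTERN = re.compile(r"(?P<author>[^_]+)_(?P<year>(?:19|20)\d{2})_(?P<title>.+)")


def parse_filename(path):
    """Return author/year/title parsed from the file stem, or None if no match."""
    match = PATTERN.fullmatch(Path(path).stem)
    if match is None:
        return None
    return {
        "author": match["author"],
        "year": int(match["year"]),
        "title": match["title"].replace("_", " "),
    }


def output_path(path, out_dir="markdown"):
    """Place the Markdown file under out_dir/<year>/, or out_dir/unknown/."""
    meta = parse_filename(path)
    folder = str(meta["year"]) if meta else "unknown"
    return Path(out_dir) / folder / f"{Path(path).stem}.md"
```

Files that do not follow the assumed convention are routed to an `unknown` folder rather than dropped.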
### Assets
Located in `assets/`:
1. **example_usage.md**
   - 20+ practical examples
   - Basic conversions
   - Scientific workflows
   - AI-enhanced processing
   - Batch operations
   - Error handling patterns
   - Integration examples
### License
- **LICENSE.txt** - MIT License from Microsoft
## Skill Structure
```
.claude/skills/markitdown/
├── SKILL.md                    # Main skill documentation
├── README.md                   # Skill overview
├── QUICK_REFERENCE.md          # Quick reference guide
├── INSTALLATION_GUIDE.md       # Installation instructions
├── SKILL_SUMMARY.md            # This file
├── LICENSE.txt                 # MIT License
├── references/
│   ├── api_reference.md        # Complete API docs
│   └── file_formats.md         # Format-specific guides
├── scripts/
│   ├── batch_convert.py        # Batch conversion utility
│   ├── convert_with_ai.py      # AI-enhanced conversion
│   └── convert_literature.py   # Literature conversion
└── assets/
    └── example_usage.md        # Practical examples
```
## Capabilities
### File Format Support
- **Documents**: PDF, DOCX, PPTX, XLSX, XLS, EPUB
- **Images**: JPEG, PNG, GIF, WebP (with OCR)
- **Audio**: WAV, MP3 (with transcription)
- **Web**: HTML, YouTube URLs
- **Data**: CSV, JSON, XML
- **Archives**: ZIP files
- **Email**: Outlook MSG files
### Advanced Features
1. **AI Enhancement via OpenRouter**
- Access to 100+ AI models through OpenRouter
- Multiple preset prompts (scientific, medical, data viz)
- Custom prompt support
   - Default: an advanced vision model suited to scientific figures
   - Choose the best model for each task
2. **Azure Integration**
- Azure Document Intelligence for complex PDFs
- Enhanced layout understanding
- Better table extraction
3. **Batch Processing**
- Parallel conversion with configurable workers
- Recursive directory processing
- Progress tracking and error reporting
- Format-specific organization
4. **Scientific Workflows**
- Literature conversion with metadata
- Automatic index generation
- Year-based organization
- Citation-friendly output
## Integration with Scientific Writer
The skill has been added to the Scientific Writer's skill catalog:
- **Location**: `.claude/skills/markitdown/`
- **Skill Number**: #5 in Document Manipulation Skills
- **SKILLS.md**: Updated with complete skill description
### Usage Examples
```
> Convert all PDFs in the literature folder to Markdown
> Convert this PowerPoint presentation to Markdown with AI-generated descriptions
> Extract tables from this Excel file
> Transcribe this lecture recording
```
## Scripts Usage
### Batch Convert
```bash
python scripts/batch_convert.py input_dir/ output_dir/ --extensions .pdf .docx --workers 4
```
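
The command above combines a recursive search, an extension filter, and a worker pool. A minimal sketch of how those pieces could fit together follows. It is not the code of `scripts/batch_convert.py`: the function names, the use of threads, and the `convert_one` callback are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Iterable


def find_inputs(input_dir: Path, extensions: Iterable[str]) -> list[Path]:
    """Recursively collect files whose suffix matches one of the extensions."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        p for p in input_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )


def batch_convert(
    input_dir: Path,
    output_dir: Path,
    extensions: Iterable[str],
    convert_one: Callable[[Path], str],  # hypothetical hook: path -> Markdown text
    workers: int = 4,
) -> tuple[list[Path], dict[Path, str]]:
    """Convert files in parallel; return (written outputs, errors by input)."""
    written: list[Path] = []
    errors: dict[Path, str] = {}
    sources = find_inputs(input_dir, extensions)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(convert_one, src): src for src in sources}
        for future in as_completed(futures):
            src = futures[future]
            try:
                markdown = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[src] = str(exc)
                continue
            # Mirror the input tree under output_dir, swapping the suffix.
            target = output_dir / src.relative_to(input_dir).with_suffix(".md")
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(markdown, encoding="utf-8")
            written.append(target)
    return sorted(written), errors
```

With MarkItDown installed, `convert_one` could be `lambda p: md.convert(str(p)).text_content`, where `md` is a `MarkItDown()` instance. Note that this sketch would overwrite one output if two inputs differ only by extension (for example `a.pdf` and `a.docx`).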
### AI-Enhanced Convert
```bash
export OPENROUTER_API_KEY="sk-or-v1-..."
python scripts/convert_with_ai.py paper.pdf output.md \
--model anthropic/claude-sonnet-4.5 \
--prompt-type scientific
```
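
The `--prompt-type` flag selects a predefined prompt, and the script also supports custom prompts. One way such a selection could work is sketched below. The preset keys and wording, and the `resolve_prompt` helper, are hypothetical and not copied from `scripts/convert_with_ai.py`.

```python
from typing import Optional

# Hypothetical presets: keys and wording are illustrative only.
PROMPT_PRESETS = {
    "scientific": "Describe this figure for a scientific reader, including axes, units, and trends.",
    "medical": "Describe this medical image, naming the modality and visible structures.",
    "data_viz": "Describe this chart: its type, the variables shown, and the main comparison.",
}


def resolve_prompt(prompt_type: str = "scientific",
                   custom_prompt: Optional[str] = None) -> str:
    """Return the custom prompt if one is given, else the named preset."""
    if custom_prompt:
        return custom_prompt.strip()
    try:
        return PROMPT_PRESETS[prompt_type]
    except KeyError:
        valid = ", ".join(sorted(PROMPT_PRESETS))
        raise ValueError(
            f"Unknown prompt type {prompt_type!r}; expected one of: {valid}"
        ) from None
```

The resolved string would then be passed to whatever call sends the image to the model. Recent MarkItDown releases document an `llm_prompt` argument for this purpose, but check the version you have installed before relying on it.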
### Literature Convert
```bash
python scripts/convert_literature.py papers/ markdown/ --organize-by-year --create-index
```
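
Metadata extraction from filenames and year-based indexing can be illustrated with a small sketch. It assumes a hypothetical `Author_Year_Title.ext` naming convention. The actual pattern used by `scripts/convert_literature.py` is not stated here, so treat the regular expression and function names as placeholders.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming convention (hypothetical): Author_Year_Title.ext
FILENAME_PATTERN = re.compile(
    r"^(?P<author>[^_]+)_(?P<year>(?:19|20)\d{2})_(?P<title>.+)$"
)


def parse_filename(path: Path) -> dict[str, str]:
    """Extract author, year, and title from a file name, if it matches."""
    match = FILENAME_PATTERN.match(path.stem)
    if match is None:
        # Unparseable names are kept and filed under an "unknown" year.
        return {"author": "", "year": "unknown", "title": path.stem}
    meta = match.groupdict()
    meta["title"] = meta["title"].replace("_", " ")
    return meta


def build_index(paths: list[Path]) -> str:
    """Render a Markdown index grouped by year, newest first, unknown last."""
    by_year: dict[str, list[dict[str, str]]] = defaultdict(list)
    for path in paths:
        meta = parse_filename(path)
        by_year[meta["year"]].append(meta)
    lines = ["# Literature Index", ""]
    ordered_years = sorted(by_year, key=lambda y: (y != "unknown", y), reverse=True)
    for year in ordered_years:
        lines.append(f"## {year}")
        for meta in sorted(by_year[year], key=lambda m: m["title"]):
            author = f" ({meta['author']})" if meta["author"] else ""
            lines.append(f"- {meta['title']}{author}")
        lines.append("")
    return "\n".join(lines)
```

The same `parse_filename` result could also drive the year-based output folders, for example by writing each converted file to `markdown/<year>/`.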
## Key Features
1. **Token-Efficient Output**: Markdown optimized for LLM processing
2. **Comprehensive Format Support**: 15+ file types
3. **AI Enhancement**: Detailed image descriptions from vision models accessed via OpenRouter
4. **OCR Support**: Extract text from scanned documents
5. **Audio Transcription**: Speech-to-text for audio files
6. **YouTube Support**: Video transcript extraction
7. **Plugin System**: Extensible architecture
8. **Batch Processing**: Efficient parallel conversion
9. **Error Handling**: Robust error management
10. **Scientific Focus**: Optimized for research workflows
## Installation
```bash
# Full installation
pip install 'markitdown[all]'
# Selective installation
pip install 'markitdown[pdf,docx,pptx,xlsx]'
```
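
For a selective installation, the extras you need follow from the file types you plan to convert. The sketch below derives an install command from a list of file names. The mapping uses the extras named in `INSTALLATION_GUIDE.md`. The helper name `pip_command_for` is hypothetical, and the sketch assumes that formats without a listed extra work with the base package.

```python
from pathlib import Path

# File suffix -> optional-dependency name, as listed in the installation guide.
EXTRAS_BY_SUFFIX = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".pptx": "pptx",
    ".xlsx": "xlsx",
    ".xls": "xls",
    ".msg": "outlook",
    ".wav": "audio-transcription",
    ".mp3": "audio-transcription",
}


def pip_command_for(paths) -> str:
    """Build a selective install command covering the given files."""
    extras = sorted({
        EXTRAS_BY_SUFFIX[Path(p).suffix.lower()]
        for p in paths
        if Path(p).suffix.lower() in EXTRAS_BY_SUFFIX
    })
    if not extras:
        # Assumption: suffixes without a listed extra need no optional dependency.
        return "pip install markitdown"
    joined = ",".join(extras)
    return f"pip install 'markitdown[{joined}]'"
```

For example, passing the names of every file in a literature folder yields one command that installs only the converters those files require.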
## Quick Start
```python
from markitdown import MarkItDown
# Basic usage
md = MarkItDown()
result = md.convert("document.pdf")
print(result.text_content)
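# With AI via OpenRouter (accessed through the OpenAI-compatible client)
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["OPENROUTER_API_KEY"],  # read the key from the environment
    base_url="https://openrouter.ai/api/v1"
)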
md = MarkItDown(
llm_client=client,
llm_model="anthropic/claude-sonnet-4.5" # or openai/gpt-4o
)
result = md.convert("presentation.pptx")
print(result.text_content)
```
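
The Quick Start calls raise if a file is missing or cannot be converted. A small wrapper that captures the failure instead is sketched here as one possible error-handling pattern. `ConversionOutcome` and `safe_convert` are hypothetical names, and the wrapper assumes only that the converter exposes `convert(path)` returning an object with `text_content`, as in the Quick Start.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ConversionOutcome:
    """Result of one conversion attempt: Markdown text or an error message."""
    path: str
    markdown: Optional[str] = None
    error: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.error is None


def safe_convert(converter: Any, path: str) -> ConversionOutcome:
    """Call converter.convert(path) and capture failures instead of raising."""
    try:
        result = converter.convert(path)
    except Exception as exc:  # broad on purpose: exception types vary by format
        return ConversionOutcome(path=path, error=f"{type(exc).__name__}: {exc}")
    return ConversionOutcome(path=path, markdown=result.text_content)
```

Usage would look like `outcome = safe_convert(md, "document.pdf")`, followed by a check of `outcome.ok` before reading `outcome.markdown`.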
## Documentation Files
| File | Purpose | Lines |
|------|---------|-------|
| SKILL.md | Main documentation | 400+ |
| api_reference.md | API documentation | 500+ |
| file_formats.md | Format guides | 600+ |
| example_usage.md | Practical examples | 500+ |
| batch_convert.py | Batch conversion | 200+ |
| convert_with_ai.py | AI conversion | 200+ |
| convert_literature.py | Literature conversion | 250+ |
| QUICK_REFERENCE.md | Quick reference | 300+ |
| INSTALLATION_GUIDE.md | Installation guide | 300+ |
**Total**: 3,000+ lines of documentation and code
## Use Cases
1. **Literature Review**: Convert research papers to Markdown for analysis
2. **Data Extraction**: Extract tables from Excel/PDF for processing
3. **Presentation Processing**: Convert slides with AI descriptions
4. **Document Analysis**: Prepare documents for LLM consumption
5. **Lecture Transcription**: Convert audio recordings to text
6. **YouTube Analysis**: Extract video transcripts
7. **Archive Processing**: Batch convert document collections
## Next Steps
1. Install MarkItDown: `pip install 'markitdown[all]'`
2. Read `QUICK_REFERENCE.md` for common tasks
3. Try the example scripts in the `scripts/` directory
4. Explore `SKILL.md` for the comprehensive guide
5. Check `example_usage.md` for practical examples
## Resources
- **MarkItDown GitHub**: https://github.com/microsoft/markitdown
- **PyPI**: https://pypi.org/project/markitdown/
- **OpenRouter**: https://openrouter.ai (AI model access)
- **OpenRouter API Keys**: https://openrouter.ai/keys
- **OpenRouter Models**: https://openrouter.ai/models
- **License**: MIT (Microsoft Corporation)
- **Python**: 3.10+ required
- **Skill Location**: `.claude/skills/markitdown/`
## Success Criteria
✅ Comprehensive skill documentation created
✅ Complete API reference provided
✅ Format-specific guides included
✅ Utility scripts implemented
✅ Practical examples documented
✅ Installation guide created
✅ Quick reference guide added
✅ Integration with Scientific Writer complete
✅ SKILLS.md updated
✅ Scripts made executable
✅ MIT License included
## Skill Status
**Status**: ✅ Complete and Ready to Use
The MarkItDown skill is fully integrated into the Claude Scientific Writer and ready for use. All documentation, scripts, and examples are in place.