Add support for generating scientific illustrations using Nano Banana Pro and Flux.2 Pro

Timothy Kassis
2025-11-30 09:30:58 -05:00
parent 8d82c83a1a
commit 90de96a99b
6 changed files with 561 additions and 6 deletions

@@ -0,0 +1,124 @@
---
name: generate-image
description: Generate or edit scientific illustrations, schematics and images. Also use if the user mentions specific models like "Flux" or "Nano Banana".
---
# Generate Image
Generate and edit high-quality images with the image generation models available through OpenRouter, including FLUX.2 Pro and Nano Banana Pro (Gemini 3 Pro Image).
## Quick Start
Use the `scripts/generate_image.py` script to generate or edit images:
```bash
# Generate a new image
python scripts/generate_image.py "A beautiful sunset over mountains"
# Edit an existing image
python scripts/generate_image.py "Make the sky purple" --input photo.jpg
```
Both commands save the result as `generated_image.png` in the current directory; use `--output` to choose a different path.
## API Key Setup
**CRITICAL**: The script requires an OpenRouter API key. Before running, check if the user has configured their API key:
1. Look for a `.env` file in the project directory or parent directories
2. Check for `OPENROUTER_API_KEY=<key>` in the `.env` file
3. If not found, tell the user to:
   - Get an API key from: https://openrouter.ai/keys
   - Create a `.env` file with `OPENROUTER_API_KEY=your-api-key-here`, or set the environment variable: `export OPENROUTER_API_KEY=your-api-key-here`

The script searches the current directory and each of its parent directories for a `.env` file, and prints setup instructions if no API key is found.
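
The lookup described above can be sketched as a small, testable function. This is a minimal sketch under stated assumptions, not the script's own code: `parse_env_text` and `resolve_api_key` are hypothetical names, and the precedence (explicit `--api-key`, then the `OPENROUTER_API_KEY` environment variable, then the nearest `.env` file) is an assumption chosen to match the setup steps listed here.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

KEY_NAME = "OPENROUTER_API_KEY"


def parse_env_text(text: str, name: str = KEY_NAME) -> Optional[str]:
    """Return the value assigned to `name` in .env-style text, or None."""
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Tolerate an optional "export " prefix, common in shell-style .env files
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        if line.startswith(name + "="):
            value = line.split("=", 1)[1].strip().strip('"').strip("'")
            if value:
                return value
    return None


def resolve_api_key(
    cli_key: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
    start: Optional[Path] = None,
) -> Optional[str]:
    """Hypothetical helper: explicit key, then environment variable, then nearest .env."""
    if cli_key:
        return cli_key
    env = os.environ if environ is None else environ
    if env.get(KEY_NAME):
        return env[KEY_NAME]
    # Walk from the start directory up to the filesystem root
    here = Path.cwd() if start is None else start
    for directory in [here, *here.parents]:
        env_file = directory / ".env"
        if env_file.is_file():
            key = parse_env_text(env_file.read_text(encoding="utf-8"))
            if key:
                return key
    return None
```

Passing `environ` and `start` explicitly keeps the function free of hidden global state, which makes the precedence easy to check without touching the real environment.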
## Model Selection

**Default model**: `google/gemini-3-pro-image-preview` (high quality, recommended)

**Available models for generation and editing**:

- `google/gemini-3-pro-image-preview` - High quality, supports generation + editing
- `black-forest-labs/flux.2-pro` - Fast, high quality, supports generation + editing

**Generation only**:

- `black-forest-labs/flux.2-dev` - Development version, generation only

Select based on:

- **Quality**: Use gemini-3-pro or flux.2-pro
- **Editing**: Use gemini-3-pro or flux.2-pro (both support image editing)
- **Cost**: Use flux.2-dev for generation only
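
The script forwards `--input` to whichever `--model` is given, so nothing stops an edit request from reaching a generation-only model. A minimal sketch of encoding the guidance above as a pre-flight check; `choose_model` and `check_model` are hypothetical helpers, not part of `generate_image.py`, and the model IDs are the ones listed here, which may drift from OpenRouter's current catalogue.

```python
# Model IDs copied from the list above; OpenRouter's catalogue may change.
DEFAULT_MODEL = "google/gemini-3-pro-image-preview"
EDIT_CAPABLE = {"google/gemini-3-pro-image-preview", "black-forest-labs/flux.2-pro"}
GENERATION_ONLY = {"black-forest-labs/flux.2-dev"}


def choose_model(editing: bool, prefer_low_cost: bool = False) -> str:
    """Pick a model ID following the quality/editing/cost guidance above."""
    if prefer_low_cost and not editing:
        # flux.2-dev is listed as the cost option, but it cannot edit
        return "black-forest-labs/flux.2-dev"
    return DEFAULT_MODEL


def check_model(model: str, editing: bool) -> None:
    """Raise ValueError when an edit is requested from a generation-only model."""
    if editing and model in GENERATION_ONLY:
        options = ", ".join(sorted(EDIT_CAPABLE))
        raise ValueError(f"{model} supports generation only; use one of: {options}")
```

Unknown model IDs pass `check_model` unchanged, since the script also accepts other OpenRouter image models whose capabilities this list does not record.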
## Common Usage Patterns
### Basic generation
```bash
python scripts/generate_image.py "Your prompt here"
```
### Specify model
```bash
python scripts/generate_image.py "A cat in space" --model "black-forest-labs/flux.2-pro"
```
### Custom output path
```bash
python scripts/generate_image.py "Abstract art" --output artwork.png
```
### Edit an existing image
```bash
python scripts/generate_image.py "Make the background blue" --input photo.jpg
```
### Edit with a specific model
```bash
python scripts/generate_image.py "Add sunglasses to the person" --input portrait.png --model "black-forest-labs/flux.2-pro"
```
### Edit with custom output
```bash
python scripts/generate_image.py "Remove the text from the image" --input screenshot.png --output cleaned.png
```
### Multiple images
Run the script once per image, giving each run its own `--output` path; otherwise every run overwrites `generated_image.png`:
```bash
python scripts/generate_image.py "Image 1 description" --output image1.png
python scripts/generate_image.py "Image 2 description" --output image2.png
```
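
When many images are scripted rather than typed, the main risk is two runs sharing an output name. A minimal sketch that assigns a distinct `--output` per prompt; the helper `build_batch_commands` and the `image1.png`, `image2.png`, ... naming scheme are hypothetical, and the script path is assumed to be relative to the working directory as in the examples above.

```python
import shlex
from typing import List, Sequence


def build_batch_commands(
    prompts: Sequence[str],
    script: str = "scripts/generate_image.py",
    stem: str = "image",
) -> List[List[str]]:
    """Return one argv list per prompt, each with a distinct --output path."""
    commands = []
    for index, prompt in enumerate(prompts, start=1):
        commands.append(["python", script, prompt, "--output", f"{stem}{index}.png"])
    return commands


if __name__ == "__main__":
    for argv in build_batch_commands(["Image 1 description", "Image 2 description"]):
        # Print shell-quoted commands; each argv can also be run with subprocess.run
        print(shlex.join(argv))
```

Each argv list can be passed to `subprocess.run(argv, check=True)`. Note that `check=True` only catches failures the script reports with exit status 1 (missing key, API errors); when the response contains no image the script prints a warning and exits normally, so it is worth confirming the output file exists afterwards.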
## Script Parameters
- `prompt` (required): Text description of the image to generate, or editing instructions
- `--input` or `-i`: Input image path for editing (enables edit mode)
- `--model` or `-m`: OpenRouter model ID (default: google/gemini-3-pro-image-preview)
- `--output` or `-o`: Output file path (default: generated_image.png)
- `--api-key`: OpenRouter API key (overrides .env file)
## Error Handling
The script provides clear error messages for:
- Missing API key (with setup instructions)
- API errors (with status codes)
- Unexpected response formats
- Missing dependencies (the `requests` library)

If the script fails, read the error message and address the issue before retrying.
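
"Unexpected response formats" refers to where the image sits inside the returned message: some models put it in a top-level `images` list, others in a `content` list of parts. The sketch below pulls out the first image URL from either shape, mirroring the script's logic so the parsing can be checked offline. `extract_image_url` is a hypothetical name, and also accepting parts typed `"image_url"` is an assumption that goes beyond the `"type": "image"` check the script performs.

```python
from typing import Any, Dict, List, Optional


def extract_image_url(message: Dict[str, Any]) -> Optional[str]:
    """Return the first image URL (usually a base64 data URL) in a chat message."""
    images: List[Dict[str, Any]] = list(message.get("images") or [])
    content = message.get("content")
    if not images and isinstance(content, list):
        # Assumption: treat both "image" and "image_url" parts as images
        images = [
            part for part in content
            if isinstance(part, dict) and part.get("type") in ("image", "image_url")
        ]
    for image in images:
        nested = image.get("image_url")
        if isinstance(nested, dict) and "url" in nested:
            return nested["url"]
        if "url" in image:
            return image["url"]
    return None
```

Returning `None` instead of raising leaves it to the caller to decide whether a text-only reply is an error, which matches how the script prints the response content and carries on.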
## Notes
- Images are returned as base64-encoded data URLs, which the script decodes and writes to the output path unchanged (no format conversion is performed, regardless of the file extension)
- The script supports both `images` and `content` response formats from different OpenRouter models
- Generation time varies by model (typically 5-30 seconds)
- For image editing, the input image is encoded as base64 and sent to the model
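- Supported input image formats: PNG, JPEG, GIF, WebP (the MIME type is inferred from the file extension; unrecognized extensions are sent as `image/png`)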
- Check OpenRouter pricing for cost information: https://openrouter.ai/models
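
Images travel as base64 data URLs in both directions: input images are encoded before sending, and results come back the same way. A minimal round-trip sketch; `to_data_url` and `from_data_url` are hypothetical names, and the extension table is copied from the script's own MIME mapping, including its fallback to `image/png`.

```python
import base64
from pathlib import Path
from typing import Tuple

# Same extension-to-MIME table the script uses; anything else falls back to PNG
MIME_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def to_data_url(data: bytes, filename: str) -> str:
    """Encode raw image bytes as a base64 data URL."""
    mime = MIME_TYPES.get(Path(filename).suffix.lower(), "image/png")
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def from_data_url(url: str) -> Tuple[str, bytes]:
    """Split a base64 data URL into (MIME type, decoded bytes)."""
    header, sep, payload = url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    mime = header[len("data:"):-len(";base64")]
    return mime, base64.b64decode(payload)
```

Because the script writes decoded bytes as-is, the MIME type returned by `from_data_url` is a reasonable way to pick a matching file extension if a model returns something other than PNG.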
## Image Editing Tips
- Be specific about what changes you want (e.g., "change the sky to sunset colors" vs "edit the sky")
- Reference specific elements in the image when possible
- For best results, use clear and detailed editing instructions
- Both Gemini 3 Pro and FLUX.2 Pro support image editing through OpenRouter
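
Both models receive an edit the same way: the script sends a single user message whose content is a list holding the instruction text and the input image as a data URL, whereas a plain generation sends the prompt string alone. The sketch below reproduces that request body so it can be inspected without calling the API; `build_request_body` is a hypothetical name, and the data URL in the usage block is a placeholder rather than a real image.

```python
import json
from typing import Any, Dict, Optional


def build_request_body(
    prompt: str, model: str, image_data_url: Optional[str] = None
) -> Dict[str, Any]:
    """Build the chat-completions JSON body the script sends to OpenRouter."""
    content: Any
    if image_data_url is None:
        # Generation: the prompt is sent as plain text
        content = prompt
    else:
        # Editing: multimodal content with the instruction text and the input image
        content = [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_data_url}},
        ]
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "modalities": ["image", "text"],
    }


if __name__ == "__main__":
    body = build_request_body(
        "Change the sky to sunset colors",
        "black-forest-labs/flux.2-pro",
        "data:image/png;base64,iVBORw0KGgo=",
    )
    print(json.dumps(body, indent=2))
```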

@@ -0,0 +1,281 @@
#!/usr/bin/env python3
"""
Generate and edit images using the OpenRouter API with various image generation models.
Supports models such as:
- google/gemini-3-pro-image-preview (generation and editing)
- black-forest-labs/flux.2-pro (generation and editing)
- black-forest-labs/flux.2-dev (generation only)
- Other image generation models available on OpenRouter
For image editing, provide an input image along with an editing prompt.
"""
import os
import sys
import json
import base64
import argparse
from pathlib import Path
from typing import Optional


def check_env_file() -> Optional[str]:
    """Return OPENROUTER_API_KEY from the environment or the nearest .env file.

    The environment variable takes precedence; otherwise the current directory
    and each of its parents are searched for a .env file defining the key.
    """
    env_key = os.environ.get("OPENROUTER_API_KEY", "").strip()
    if env_key:
        return env_key
    # Look for .env in the current directory and parent directories
    current_dir = Path.cwd()
    for parent in [current_dir] + list(current_dir.parents):
        env_file = parent / ".env"
        if env_file.is_file():
            with open(env_file, 'r', encoding='utf-8') as f:
                for line in f:
                    line = line.strip()
                    # Accept an optional "export " prefix
                    if line.startswith('export '):
                        line = line[len('export '):].lstrip()
                    if line.startswith('OPENROUTER_API_KEY='):
                        api_key = line.split('=', 1)[1].strip().strip('"').strip("'")
                        if api_key:
                            return api_key
    return None


def load_image_as_base64(image_path: str) -> str:
    """Load an image file and return it as a base64 data URL."""
    path = Path(image_path)
    if not path.exists():
        print(f"❌ Error: Image file not found: {image_path}")
        sys.exit(1)
    # Determine MIME type from extension
    ext = path.suffix.lower()
    mime_types = {
        '.png': 'image/png',
        '.jpg': 'image/jpeg',
        '.jpeg': 'image/jpeg',
        '.gif': 'image/gif',
        '.webp': 'image/webp',
    }
    mime_type = mime_types.get(ext, 'image/png')
    with open(path, 'rb') as f:
        image_data = f.read()
    base64_data = base64.b64encode(image_data).decode('utf-8')
    return f"data:{mime_type};base64,{base64_data}"


def save_base64_image(base64_data: str, output_path: str) -> None:
    """Save base64 encoded image to file."""
    # Remove data URL prefix if present
    if ',' in base64_data:
        base64_data = base64_data.split(',', 1)[1]
    # Decode and save
    image_data = base64.b64decode(base64_data)
    with open(output_path, 'wb') as f:
        f.write(image_data)


def generate_image(
    prompt: str,
    model: str = "google/gemini-3-pro-image-preview",
    output_path: str = "generated_image.png",
    api_key: Optional[str] = None,
    input_image: Optional[str] = None
) -> dict:
    """
    Generate or edit an image using the OpenRouter API.

    Args:
        prompt: Text description of the image to generate, or editing instructions
        model: OpenRouter model ID (default: google/gemini-3-pro-image-preview)
        output_path: Path to save the generated image
        api_key: OpenRouter API key (looked up via check_env_file() if not provided)
        input_image: Path to an input image for editing (optional)

    Returns:
        dict: Response from the OpenRouter API
    """
    try:
        import requests
    except ImportError:
        print("Error: 'requests' library not found. Install with: pip install requests")
        sys.exit(1)

    # Check for API key
    if not api_key:
        api_key = check_env_file()
    if not api_key:
        print("❌ Error: OPENROUTER_API_KEY not found!")
        print("\nPlease create a .env file in your project directory with:")
        print("OPENROUTER_API_KEY=your-api-key-here")
        print("\nOr set the environment variable:")
        print("export OPENROUTER_API_KEY=your-api-key-here")
        print("\nGet your API key from: https://openrouter.ai/keys")
        sys.exit(1)

    # Determine if this is generation or editing
    is_editing = input_image is not None
    if is_editing:
        print(f"✏️ Editing image with model: {model}")
        print(f"📷 Input image: {input_image}")
        print(f"📝 Edit prompt: {prompt}")
        # Load input image as base64
        image_data_url = load_image_as_base64(input_image)
        # Build multimodal message content for image editing
        message_content = [
            {
                "type": "text",
                "text": prompt
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": image_data_url
                }
            }
        ]
    else:
        print(f"🎨 Generating image with model: {model}")
        print(f"📝 Prompt: {prompt}")
        message_content = prompt

    # Make API request; the timeout keeps a stalled connection from hanging indefinitely
    try:
        response = requests.post(
            url="https://openrouter.ai/api/v1/chat/completions",
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json",
            },
            json={
                "model": model,
                "messages": [
                    {
                        "role": "user",
                        "content": message_content
                    }
                ],
                "modalities": ["image", "text"]
            },
            timeout=120,
        )
    except requests.exceptions.RequestException as e:
        print(f"❌ Request failed: {e}")
        sys.exit(1)

    # Check for errors
    if response.status_code != 200:
        print(f"❌ API Error ({response.status_code}): {response.text}")
        sys.exit(1)
    result = response.json()

    # Extract and save image
    if result.get("choices"):
        message = result["choices"][0]["message"]
        # Handle both 'images' and 'content' response formats
        images = []
        if message.get("images"):
            images = message["images"]
        elif message.get("content"):
            # Some models return content as array with image parts
            content = message["content"]
            if isinstance(content, list):
                for part in content:
                    if isinstance(part, dict) and part.get("type") == "image":
                        images.append(part)
        if images:
            # Save the first image
            image = images[0]
            if "image_url" in image:
                image_url = image["image_url"]["url"]
                save_base64_image(image_url, output_path)
                print(f"✅ Image saved to: {output_path}")
            elif "url" in image:
                save_base64_image(image["url"], output_path)
                print(f"✅ Image saved to: {output_path}")
            else:
                print(f"⚠️ Unexpected image format: {image}")
        else:
            print("⚠️ No image found in response")
            if message.get("content"):
                print(f"Response content: {message['content']}")
    else:
        print("❌ No choices in response")
        print(f"Response: {json.dumps(result, indent=2)}")
    return result


def main():
    parser = argparse.ArgumentParser(
        description="Generate or edit images using OpenRouter API",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
# Generate with default model (Gemini 3 Pro Image Preview)
python generate_image.py "A beautiful sunset over mountains"
# Use a specific model
python generate_image.py "A cat in space" --model "black-forest-labs/flux.2-pro"
# Specify output path
python generate_image.py "Abstract art" --output my_image.png
# Edit an existing image
python generate_image.py "Make the sky purple" --input photo.jpg --output edited.png
# Edit with a specific model
python generate_image.py "Add a hat to the person" --input portrait.png -m "black-forest-labs/flux.2-pro"
Popular image models:
- google/gemini-3-pro-image-preview (default, high quality, generation + editing)
- black-forest-labs/flux.2-pro (fast, high quality, generation + editing)
- black-forest-labs/flux.2-dev (development version)
"""
    )
    parser.add_argument(
        "prompt",
        type=str,
        help="Text description of the image to generate, or editing instructions"
    )
    parser.add_argument(
        "--model", "-m",
        type=str,
        default="google/gemini-3-pro-image-preview",
        help="OpenRouter model ID (default: google/gemini-3-pro-image-preview)"
    )
    parser.add_argument(
        "--output", "-o",
        type=str,
        default="generated_image.png",
        help="Output file path (default: generated_image.png)"
    )
    parser.add_argument(
        "--input", "-i",
        type=str,
        help="Input image path for editing (enables edit mode)"
    )
    parser.add_argument(
        "--api-key",
        type=str,
        help="OpenRouter API key (will check .env if not provided)"
    )
    args = parser.parse_args()
    generate_image(
        prompt=args.prompt,
        model=args.model,
        output_path=args.output,
        api_key=args.api_key,
        input_image=args.input
    )


if __name__ == "__main__":
    main()