Add support for generating scientific illustrations using Nano Banan Pro and Flux.2 Pro

2026-01-26 16:58:56 +08:00 · 2025-11-30 09:30:58 -05:00
parent 8d82c83a1a
commit 90de96a99b
6 changed files with 561 additions and 6 deletions
--- a/scientific-skills/generate-image/SKILL.md
+++ b/scientific-skills/generate-image/SKILL.md
@@ -0,0 +1,124 @@
+---
+name: generate-image
+description: Generate or edit scientific illustrations, schematics and images. Also use if the user mentions specific models like "Flux" or "Nano Banana".
+---
+
+# Generate Image
+
+Generate and edit high-quality images using OpenRouter's image generation models including FLUX.2 Pro and Nano Banana Pro (Gemini 3 Pro).
+
+## Quick Start
+
+Use the `scripts/generate_image.py` script to generate or edit images:
+
+```bash
+# Generate a new image
+python scripts/generate_image.py "A beautiful sunset over mountains"
+
+# Edit an existing image
+python scripts/generate_image.py "Make the sky purple" --input photo.jpg
+```
+
+This generates/edits an image and saves it as `generated_image.png` in the current directory.
+
+## API Key Setup
+
+**CRITICAL**: The script requires an OpenRouter API key. Before running, check if the user has configured their API key:
+
+1. Look for a `.env` file in the project directory or parent directories
+2. Check for `OPENROUTER_API_KEY=<key>` in the `.env` file
+3. If not found, inform the user they need to:
+   - Create a `.env` file with `OPENROUTER_API_KEY=your-api-key-here`
+   - Or set the environment variable: `export OPENROUTER_API_KEY=your-api-key-here`
+   - Get an API key from: https://openrouter.ai/keys
+
+The script will automatically detect the `.env` file and provide clear error messages if the API key is missing.
+
+## Model Selection
+
+**Default model**: `google/gemini-3-pro-image-preview` (high quality, recommended)
+
+**Available models for generation and editing**:
+- `google/gemini-3-pro-image-preview` - High quality, supports generation + editing
+- `black-forest-labs/flux.2-pro` - Fast, high quality, supports generation + editing
+
+**Generation only**:
+- `black-forest-labs/flux.2-dev` - Development version, generation only
+
+Select based on:
+- **Quality**: Use gemini-3-pro or flux.2-pro
+- **Editing**: Use gemini-3-pro or flux.2-pro (both support image editing)
+- **Cost**: Use flux.2-dev for generation only
+
+## Common Usage Patterns
+
+### Basic generation
+```bash
+python scripts/generate_image.py "Your prompt here"
+```
+
+### Specify model
+```bash
+python scripts/generate_image.py "A cat in space" --model "black-forest-labs/flux.2-pro"
+```
+
+### Custom output path
+```bash
+python scripts/generate_image.py "Abstract art" --output artwork.png
+```
+
+### Edit an existing image
+```bash
+python scripts/generate_image.py "Make the background blue" --input photo.jpg
+```
+
+### Edit with a specific model
+```bash
+python scripts/generate_image.py "Add sunglasses to the person" --input portrait.png --model "black-forest-labs/flux.2-pro"
+```
+
+### Edit with custom output
+```bash
+python scripts/generate_image.py "Remove the text from the image" --input screenshot.png --output cleaned.png
+```
+
+### Multiple images
+Run the script multiple times with different prompts or output paths:
+```bash
+python scripts/generate_image.py "Image 1 description" --output image1.png
+python scripts/generate_image.py "Image 2 description" --output image2.png
+```
+
+## Script Parameters
+
+- `prompt` (required): Text description of the image to generate, or editing instructions
+- `--input` or `-i`: Input image path for editing (enables edit mode)
+- `--model` or `-m`: OpenRouter model ID (default: google/gemini-3-pro-image-preview)
+- `--output` or `-o`: Output file path (default: generated_image.png)
+- `--api-key`: OpenRouter API key (overrides .env file)
+
+## Error Handling
+
+The script provides clear error messages for:
+- Missing API key (with setup instructions)
+- API errors (with status codes)
+- Unexpected response formats
+- Missing dependencies (requests library)
+
+If the script fails, read the error message and address the issue before retrying.
+
+## Notes
+
+- Images are returned as base64-encoded data URLs and automatically saved as PNG files
+- The script supports both `images` and `content` response formats from different OpenRouter models
+- Generation time varies by model (typically 5-30 seconds)
+- For image editing, the input image is encoded as base64 and sent to the model
+- Supported input image formats: PNG, JPEG, GIF, WebP
+- Check OpenRouter pricing for cost information: https://openrouter.ai/models
+
+## Image Editing Tips
+
+- Be specific about what changes you want (e.g., "change the sky to sunset colors" vs "edit the sky")
+- Reference specific elements in the image when possible
+- For best results, use clear and detailed editing instructions
+- Both Gemini 3 Pro and FLUX.2 Pro support image editing through OpenRouter
--- a/scientific-skills/generate-image/scripts/generate_image.py
+++ b/scientific-skills/generate-image/scripts/generate_image.py
@@ -0,0 +1,281 @@
+#!/usr/bin/env python3
+"""
+Generate and edit images using OpenRouter API with various image generation models.
+
+Supports models like:
+- google/gemini-3-pro-image-preview (generation and editing)
+- black-forest-labs/flux.2-pro (generation and editing)
+- black-forest-labs/flux.2-dev (generation)
+- And more image generation models available on OpenRouter
+
+For image editing, provide an input image along with an editing prompt.
+"""
+
+import sys
+import json
+import base64
+import argparse
+from pathlib import Path
+from typing import Optional
+
+
+def check_env_file() -> Optional[str]:
+    """Check if .env file exists and contains OPENROUTER_API_KEY."""
+    # Look for .env in current directory and parent directories
+    current_dir = Path.cwd()
+    for parent in [current_dir] + list(current_dir.parents):
+        env_file = parent / ".env"
+        if env_file.exists():
+            with open(env_file, 'r') as f:
+                for line in f:
+                    if line.startswith('OPENROUTER_API_KEY='):
+                        api_key = line.split('=', 1)[1].strip().strip('"').strip("'")
+                        if api_key:
+                            return api_key
+    return None
+
+
+def load_image_as_base64(image_path: str) -> str:
+    """Load an image file and return it as a base64 data URL."""
+    path = Path(image_path)
+    if not path.exists():
+        print(f"❌ Error: Image file not found: {image_path}")
+        sys.exit(1)
+    
+    # Determine MIME type from extension
+    ext = path.suffix.lower()
+    mime_types = {
+        '.png': 'image/png',
+        '.jpg': 'image/jpeg',
+        '.jpeg': 'image/jpeg',
+        '.gif': 'image/gif',
+        '.webp': 'image/webp',
+    }
+    mime_type = mime_types.get(ext, 'image/png')
+    
+    with open(path, 'rb') as f:
+        image_data = f.read()
+    
+    base64_data = base64.b64encode(image_data).decode('utf-8')
+    return f"data:{mime_type};base64,{base64_data}"
+
+
+def save_base64_image(base64_data: str, output_path: str) -> None:
+    """Save base64 encoded image to file."""
+    # Remove data URL prefix if present
+    if ',' in base64_data:
+        base64_data = base64_data.split(',', 1)[1]
+
+    # Decode and save
+    image_data = base64.b64decode(base64_data)
+    with open(output_path, 'wb') as f:
+        f.write(image_data)
+
+
+def generate_image(
+    prompt: str,
+    model: str = "google/gemini-3-pro-image-preview",
+    output_path: str = "generated_image.png",
+    api_key: Optional[str] = None,
+    input_image: Optional[str] = None
+) -> dict:
+    """
+    Generate or edit an image using OpenRouter API.
+
+    Args:
+        prompt: Text description of the image to generate, or editing instructions
+        model: OpenRouter model ID (default: google/gemini-3-pro-image-preview)
+        output_path: Path to save the generated image
+        api_key: OpenRouter API key (will check .env if not provided)
+        input_image: Path to an input image for editing (optional)
+
+    Returns:
+        dict: Response from OpenRouter API
+    """
+    try:
+        import requests
+    except ImportError:
+        print("Error: 'requests' library not found. Install with: pip install requests")
+        sys.exit(1)
+
+    # Check for API key
+    if not api_key:
+        api_key = check_env_file()
+
+    if not api_key:
+        print("❌ Error: OPENROUTER_API_KEY not found!")
+        print("\nPlease create a .env file in your project directory with:")
+        print("OPENROUTER_API_KEY=your-api-key-here")
+        print("\nOr set the environment variable:")
+        print("export OPENROUTER_API_KEY=your-api-key-here")
+        print("\nGet your API key from: https://openrouter.ai/keys")
+        sys.exit(1)
+
+    # Determine if this is generation or editing
+    is_editing = input_image is not None
+    
+    if is_editing:
+        print(f"✏️ Editing image with model: {model}")
+        print(f"📷 Input image: {input_image}")
+        print(f"📝 Edit prompt: {prompt}")
+        
+        # Load input image as base64
+        image_data_url = load_image_as_base64(input_image)
+        
+        # Build multimodal message content for image editing
+        message_content = [
+            {
+                "type": "text",
+                "text": prompt
+            },
+            {
+                "type": "image_url",
+                "image_url": {
+                    "url": image_data_url
+                }
+            }
+        ]
+    else:
+        print(f"🎨 Generating image with model: {model}")
+        print(f"📝 Prompt: {prompt}")
+        message_content = prompt
+
+    # Make API request
+    response = requests.post(
+        url="https://openrouter.ai/api/v1/chat/completions",
+        headers={
+            "Authorization": f"Bearer {api_key}",
+            "Content-Type": "application/json",
+        },
+        json={
+            "model": model,
+            "messages": [
+                {
+                    "role": "user",
+                    "content": message_content
+                }
+            ],
+            "modalities": ["image", "text"]
+        }
+    )
+
+    # Check for errors
+    if response.status_code != 200:
+        print(f"❌ API Error ({response.status_code}): {response.text}")
+        sys.exit(1)
+
+    result = response.json()
+
+    # Extract and save image
+    if result.get("choices"):
+        message = result["choices"][0]["message"]
+
+        # Handle both 'images' and 'content' response formats
+        images = []
+
+        if message.get("images"):
+            images = message["images"]
+        elif message.get("content"):
+            # Some models return content as array with image parts
+            content = message["content"]
+            if isinstance(content, list):
+                for part in content:
+                    if isinstance(part, dict) and part.get("type") == "image":
+                        images.append(part)
+
+        if images:
+            # Save the first image
+            image = images[0]
+            if "image_url" in image:
+                image_url = image["image_url"]["url"]
+                save_base64_image(image_url, output_path)
+                print(f"✅ Image saved to: {output_path}")
+            elif "url" in image:
+                save_base64_image(image["url"], output_path)
+                print(f"✅ Image saved to: {output_path}")
+            else:
+                print(f"⚠️ Unexpected image format: {image}")
+        else:
+            print("⚠️ No image found in response")
+            if message.get("content"):
+                print(f"Response content: {message['content']}")
+    else:
+        print("❌ No choices in response")
+        print(f"Response: {json.dumps(result, indent=2)}")
+
+    return result
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Generate or edit images using OpenRouter API",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+  # Generate with default model (Gemini 3 Pro Image Preview)
+  python generate_image.py "A beautiful sunset over mountains"
+
+  # Use a specific model
+  python generate_image.py "A cat in space" --model "black-forest-labs/flux.2-pro"
+
+  # Specify output path
+  python generate_image.py "Abstract art" --output my_image.png
+
+  # Edit an existing image
+  python generate_image.py "Make the sky purple" --input photo.jpg --output edited.png
+
+  # Edit with a specific model
+  python generate_image.py "Add a hat to the person" --input portrait.png -m "black-forest-labs/flux.2-pro"
+
+Popular image models:
+  - google/gemini-3-pro-image-preview (default, high quality, generation + editing)
+  - black-forest-labs/flux.2-pro (fast, high quality, generation + editing)
+  - black-forest-labs/flux.2-dev (development version)
+        """
+    )
+
+    parser.add_argument(
+        "prompt",
+        type=str,
+        help="Text description of the image to generate, or editing instructions"
+    )
+
+    parser.add_argument(
+        "--model", "-m",
+        type=str,
+        default="google/gemini-3-pro-image-preview",
+        help="OpenRouter model ID (default: google/gemini-3-pro-image-preview)"
+    )
+
+    parser.add_argument(
+        "--output", "-o",
+        type=str,
+        default="generated_image.png",
+        help="Output file path (default: generated_image.png)"
+    )
+
+    parser.add_argument(
+        "--input", "-i",
+        type=str,
+        help="Input image path for editing (enables edit mode)"
+    )
+
+    parser.add_argument(
+        "--api-key",
+        type=str,
+        help="OpenRouter API key (will check .env if not provided)"
+    )
+
+    args = parser.parse_args()
+
+    generate_image(
+        prompt=args.prompt,
+        model=args.model,
+        output_path=args.output,
+        api_key=args.api_key,
+        input_image=args.input
+    )
+
+
+if __name__ == "__main__":
+    main()