Add open-notebook skill: self-hosted NotebookLM alternative (issue #56)

Implements the open-notebook skill as a comprehensive integration for the
open-source, self-hosted alternative to Google NotebookLM. Addresses the
gap created by Google not providing a public NotebookLM API.

Developed using TDD with 44 tests covering skill structure, SKILL.md
frontmatter/content, reference documentation, example scripts, API
endpoint coverage, and marketplace.json registration.

Includes:
- SKILL.md with full documentation, code examples, and provider matrix
- references/api_reference.md covering all 20+ REST API endpoint groups
- references/examples.md with complete research workflow examples
- references/configuration.md with Docker, env vars, and security setup
- references/architecture.md with system design and data flow diagrams
- scripts/ with 3 example scripts (notebook, source, chat) + test suite
- marketplace.json updated to register the new skill

Closes #56

https://claude.ai/code/session_015CqcNWNYmDF9sqxKxziXcz
This commit is contained in:
Claude
2026-02-23 00:18:19 +00:00
parent f7585b7624
commit 259e01f7fd
10 changed files with 2599 additions and 0 deletions


@@ -0,0 +1,715 @@
# Open Notebook API Reference
## Base URL
```
http://localhost:5055/api
```
Interactive API documentation is available at `http://localhost:5055/docs` (Swagger UI) and `http://localhost:5055/redoc` (ReDoc).
## Authentication
If `OPEN_NOTEBOOK_PASSWORD` is configured, include the password in requests. The following routes are excluded from authentication: `/`, `/health`, `/docs`, `/openapi.json`, `/redoc`, `/api/auth/status`, `/api/config`.
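When a password is configured, clients must attach it to every non-excluded request. A minimal sketch, assuming a Bearer-token `Authorization` header — confirm the exact scheme against the Swagger UI at `/docs` for your deployment:

```python
import requests

BASE_URL = "http://localhost:5055/api"

def make_session(password=None):
    """Build a requests session, attaching the password if one is set.

    The Bearer scheme below is an assumption; check /api/auth/status
    or the interactive docs for the exact header your server expects.
    """
    session = requests.Session()
    if password:
        session.headers["Authorization"] = f"Bearer {password}"
    return session

if __name__ == "__main__":
    s = make_session("your-password")
    print(s.get(f"{BASE_URL}/notebooks").json())
```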
---
## Notebooks
### List Notebooks
```
GET /api/notebooks
```
**Query Parameters:**
| Parameter | Type | Description |
|-----------|------|-------------|
| `archived` | boolean | Filter by archived status |
| `order_by` | string | Sort field (default: `updated_at`) |
**Response:** Array of notebook objects with `source_count` and `note_count`.
### Create Notebook
```
POST /api/notebooks
```
**Request Body:**
```json
{
"name": "My Research",
"description": "Optional description"
}
```
### Get Notebook
```
GET /api/notebooks/{notebook_id}
```
### Update Notebook
```
PUT /api/notebooks/{notebook_id}
```
**Request Body:**
```json
{
"name": "Updated Name",
"description": "Updated description",
"archived": false
}
```
### Delete Notebook
```
DELETE /api/notebooks/{notebook_id}
```
**Query Parameters:**
| Parameter | Type | Description |
|-----------|------|-------------|
| `delete_sources` | boolean | Also delete exclusive sources (default: false) |
### Delete Preview
```
GET /api/notebooks/{notebook_id}/delete-preview
```
Returns counts of notes and sources that would be affected by deletion.
### Link Source to Notebook
```
POST /api/notebooks/{notebook_id}/sources/{source_id}
```
Idempotent operation to associate a source with a notebook.
### Unlink Source from Notebook
```
DELETE /api/notebooks/{notebook_id}/sources/{source_id}
```
---
## Sources
### List Sources
```
GET /api/sources
```
**Query Parameters:**
| Parameter | Type | Description |
|-----------|------|-------------|
| `notebook_id` | string | Filter by notebook |
| `limit` | integer | Number of results |
| `offset` | integer | Pagination offset |
| `order_by` | string | Sort field |
### Create Source
```
POST /api/sources
```
Accepts multipart form data for file uploads or JSON for URL/text sources.
**Form Parameters:**
| Parameter | Type | Description |
|-----------|------|-------------|
| `file` | file | Upload file (PDF, DOCX, audio, video) |
| `url` | string | Web URL to ingest |
| `text` | string | Raw text content |
| `notebook_id` | string | Associate with notebook |
| `process_async` | boolean | Process asynchronously (default: true) |
### Create Source (JSON)
```
POST /api/sources/json
```
Legacy JSON-based endpoint for source creation.
### Get Source
```
GET /api/sources/{source_id}
```
### Get Source Status
```
GET /api/sources/{source_id}/status
```
Poll processing status for asynchronously ingested sources.
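Polling can be wrapped in a small helper; the `completed`/`failed` status strings below are assumptions taken from the workflow examples in `references/examples.md`:

```python
import time
import requests

BASE_URL = "http://localhost:5055/api"

def is_terminal(status):
    """Whether a processing status needs no further polling.

    The exact status strings are assumptions based on the
    examples in references/examples.md.
    """
    return status in ("completed", "failed")

def wait_for_source(source_id, interval=5.0):
    """Poll /api/sources/{id}/status until processing finishes (sketch)."""
    while True:
        status = requests.get(
            f"{BASE_URL}/sources/{source_id}/status"
        ).json().get("status", "processing")
        if is_terminal(status):
            return status
        time.sleep(interval)
```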
### Update Source
```
PUT /api/sources/{source_id}
```
**Request Body:**
```json
{
"title": "Updated Title",
"topic": "Updated topic"
}
```
### Delete Source
```
DELETE /api/sources/{source_id}
```
### Download Source File
```
GET /api/sources/{source_id}/download
```
Returns the original uploaded file.
### Check Source File
```
HEAD /api/sources/{source_id}/download
```
### Retry Failed Source
```
POST /api/sources/{source_id}/retry
```
Requeue a failed source for processing.
### Get Source Insights
```
GET /api/sources/{source_id}/insights
```
Retrieve AI-generated insights for a source.
---
## Notes
### List Notes
```
GET /api/notes
```
**Query Parameters:**
| Parameter | Type | Description |
|-----------|------|-------------|
| `notebook_id` | string | Filter by notebook |
### Create Note
```
POST /api/notes
```
**Request Body:**
```json
{
"title": "My Note",
"content": "Note content...",
"note_type": "human",
"notebook_id": "notebook:abc123"
}
```
`note_type` must be `"human"` or `"ai"`. AI notes without titles get auto-generated titles.
### Get Note
```
GET /api/notes/{note_id}
```
### Update Note
```
PUT /api/notes/{note_id}
```
**Request Body:**
```json
{
"title": "Updated Title",
"content": "Updated content",
"note_type": "human"
}
```
### Delete Note
```
DELETE /api/notes/{note_id}
```
---
## Chat
### List Sessions
```
GET /api/chat/sessions
```
**Query Parameters:**
| Parameter | Type | Description |
|-----------|------|-------------|
| `notebook_id` | string | Filter by notebook |
### Create Session
```
POST /api/chat/sessions
```
**Request Body:**
```json
{
"notebook_id": "notebook:abc123",
"title": "Discussion Topic",
"model_override": "optional_model_id"
}
```
### Get Session
```
GET /api/chat/sessions/{session_id}
```
Returns session details with message history.
### Update Session
```
PUT /api/chat/sessions/{session_id}
```
### Delete Session
```
DELETE /api/chat/sessions/{session_id}
```
### Execute Chat
```
POST /api/chat/execute
```
**Request Body:**
```json
{
"session_id": "chat_session:abc123",
"message": "Your question here",
"context": {
"include_sources": true,
"include_notes": true
},
"model_override": "optional_model_id"
}
```
### Build Context
```
POST /api/chat/context
```
Build contextual data from sources and notes for a chat session.
---
## Search
### Search Knowledge Base
```
POST /api/search
```
**Request Body:**
```json
{
"query": "search terms",
"search_type": "vector",
"limit": 10,
"source_ids": [],
"note_ids": [],
"min_similarity": 0.7
}
```
`search_type` can be `"vector"` (requires embedding model) or `"text"` (keyword matching).
### Ask with Streaming
```
POST /api/search/ask
```
Returns Server-Sent Events with AI-generated answers based on knowledge base content.
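Consuming the stream from Python can be sketched with `requests`' line iterator. The `data:`-prefixed framing follows the SSE specification; the JSON payload shape is an assumption to verify against the live endpoint:

```python
import json
import requests

BASE_URL = "http://localhost:5055/api"

def parse_sse_line(line):
    """Parse one Server-Sent Events line into its JSON payload, or None.

    The "data:" field prefix follows the SSE spec; the payload being
    JSON is an assumption about this endpoint.
    """
    if line.startswith("data:"):
        return json.loads(line[len("data:"):].strip())
    return None

def ask_streaming(query):
    """Yield streamed answer chunks from /api/search/ask (sketch)."""
    with requests.post(f"{BASE_URL}/search/ask",
                       json={"query": query}, stream=True) as resp:
        resp.raise_for_status()
        for raw in resp.iter_lines(decode_unicode=True):
            payload = parse_sse_line(raw or "")
            if payload is not None:
                yield payload

if __name__ == "__main__":
    for chunk in ask_streaming("What are the key findings?"):
        print(chunk)
```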
### Ask Simple
```
POST /api/search/ask/simple
```
Non-streaming version that returns a complete response.
---
## Podcasts
### Generate Podcast
```
POST /api/podcasts/generate
```
**Request Body:**
```json
{
"notebook_id": "notebook:abc123",
"episode_profile_id": "episode_profile:xyz",
"speaker_profile_ids": ["speaker:a", "speaker:b"]
}
```
Returns a `job_id` for tracking generation progress.
### Get Job Status
```
GET /api/podcasts/jobs/{job_id}
```
### List Episodes
```
GET /api/podcasts/episodes
```
### Get Episode
```
GET /api/podcasts/episodes/{episode_id}
```
### Get Episode Audio
```
GET /api/podcasts/episodes/{episode_id}/audio
```
Streams the podcast audio file.
### Retry Failed Episode
```
POST /api/podcasts/episodes/{episode_id}/retry
```
### Delete Episode
```
DELETE /api/podcasts/episodes/{episode_id}
```
---
## Transformations
### List Transformations
```
GET /api/transformations
```
### Create Transformation
```
POST /api/transformations
```
**Request Body:**
```json
{
"name": "summarize",
"title": "Summarize Content",
"description": "Generate a concise summary",
"prompt": "Summarize the following text...",
"apply_default": false
}
```
### Execute Transformation
```
POST /api/transformations/execute
```
**Request Body:**
```json
{
"transformation_id": "transformation:abc",
"input_text": "Text to transform...",
"model_id": "model:xyz"
}
```
### Get Default Prompt
```
GET /api/transformations/default-prompt
```
### Update Default Prompt
```
PUT /api/transformations/default-prompt
```
### Get Transformation
```
GET /api/transformations/{transformation_id}
```
### Update Transformation
```
PUT /api/transformations/{transformation_id}
```
### Delete Transformation
```
DELETE /api/transformations/{transformation_id}
```
---
## Models
### List Models
```
GET /api/models
```
**Query Parameters:**
| Parameter | Type | Description |
|-----------|------|-------------|
| `model_type` | string | Filter by type (llm, embedding, stt, tts) |
### Create Model
```
POST /api/models
```
### Delete Model
```
DELETE /api/models/{model_id}
```
### Test Model
```
POST /api/models/{model_id}/test
```
### Get Default Models
```
GET /api/models/defaults
```
Returns default model assignments for seven service slots: chat, transformation, embedding, speech-to-text, text-to-speech, podcast, and summary.
### Update Default Models
```
PUT /api/models/defaults
```
### Get Providers
```
GET /api/models/providers
```
### Discover Models
```
GET /api/models/discover/{provider}
```
### Sync Models (Single Provider)
```
POST /api/models/sync/{provider}
```
### Sync All Models
```
POST /api/models/sync
```
### Auto-Assign Defaults
```
POST /api/models/auto-assign
```
Automatically populate empty default model slots using provider priority rankings.
### Get Model Count
```
GET /api/models/count/{provider}
```
### Get Models by Provider
```
GET /api/models/by-provider/{provider}
```
---
## Credentials
### Get Status
```
GET /api/credentials/status
```
### Get Environment Status
```
GET /api/credentials/env-status
```
### List Credentials
```
GET /api/credentials
```
**Query Parameters:**
| Parameter | Type | Description |
|-----------|------|-------------|
| `provider` | string | Filter by provider |
### List by Provider
```
GET /api/credentials/by-provider/{provider}
```
### Create Credential
```
POST /api/credentials
```
**Request Body:**
```json
{
"provider": "openai",
"name": "My OpenAI Key",
"api_key": "sk-...",
"base_url": null
}
```
### Get Credential
```
GET /api/credentials/{credential_id}
```
Note: API key values are never returned.
### Update Credential
```
PUT /api/credentials/{credential_id}
```
### Delete Credential
```
DELETE /api/credentials/{credential_id}
```
### Test Credential
```
POST /api/credentials/{credential_id}/test
```
### Discover Models via Credential
```
POST /api/credentials/{credential_id}/discover
```
### Register Models via Credential
```
POST /api/credentials/{credential_id}/register-models
```
---
## Error Responses
The API returns standard HTTP status codes with JSON error bodies:
| Status | Meaning |
|--------|---------|
| 400 | Invalid input |
| 401 | Authentication required |
| 404 | Resource not found |
| 422 | Configuration error |
| 429 | Rate limited |
| 500 | Internal server error |
| 502 | External service error |
**Error Response Format:**
```json
{
"detail": "Description of the error"
}
```


@@ -0,0 +1,163 @@
# Open Notebook Architecture
## System Overview
Open Notebook is built as a modern Python web application with a clear separation between frontend and backend, using Docker for deployment.
```
┌─────────────────────────────────────────────────────┐
│ Docker Compose │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌───────────┐ │
│ │ Next.js │ │ FastAPI │ │ SurrealDB │ │
│ │ Frontend │──│ Backend │──│ │ │
│  │  (port 8502) │  │  (port 5055) │  │(port 8000)│  │
│ └──────────────┘ └──────────────┘ └───────────┘ │
│ │ │
│ ┌─────┴─────┐ │
│ │ LangChain │ │
│ │ Esperanto │ │
│ └─────┬─────┘ │
│ │ │
│ ┌───────────┼───────────┐ │
│ │ │ │ │
│ ┌───┴───┐ ┌───┴───┐ ┌───┴───┐ │
│ │OpenAI │ │Claude │ │Ollama │ ... │
│ └───────┘ └───────┘ └───────┘ │
└─────────────────────────────────────────────────────┘
```
## Core Components
### FastAPI Backend
The REST API is built with FastAPI and organized into routers:
- **20 route modules** covering notebooks, sources, notes, chat, search, podcasts, transformations, models, credentials, embeddings, settings, and more
- Async/await throughout for non-blocking I/O
- Pydantic models for request/response validation
- Custom exception handlers mapping domain errors to HTTP status codes
- CORS middleware for cross-origin access
- Optional password authentication middleware
### SurrealDB
SurrealDB serves as the primary data store, providing both document and relational capabilities:
- **Document storage** for notebooks, sources, notes, transformations, and models
- **Relational references** for notebook-source associations
- **Full-text search** across indexed content
- **RocksDB** backend for persistent storage on disk
- Schema migrations run automatically on application startup
### LangChain Integration
AI features are powered by LangChain with the Esperanto multi-provider library:
- **LangGraph** manages conversational state for chat sessions
- **Embedding models** power vector search across content
- **LLM chains** drive transformations, note generation, and podcast scripting
- **Prompt templates** stored in the `prompts/` directory
### Esperanto Multi-Provider Library
Esperanto provides a unified interface to 16+ AI providers:
- Abstracts provider-specific API differences
- Supports LLM, embedding, speech-to-text, and text-to-speech capabilities
- Handles credential management and model discovery
- Enables runtime provider switching without code changes
### Next.js Frontend
The user interface is a React application built with Next.js:
- Responsive design for desktop and tablet use
- Real-time updates for chat and processing status
- File upload with progress tracking
- Audio player for podcast episodes
## Data Flow
### Source Ingestion
```
Upload/URL → Source Record Created → Processing Queue
┌──────────┼──────────┐
▼ ▼ ▼
Text Embedding Metadata
Extraction Generation Extraction
│ │ │
└──────────┼──────────┘
Source Updated
(searchable)
```
### Chat Execution
```
User Message → Build Context (sources + notes)
LangGraph State Machine
├─ Retrieve relevant context
├─ Format prompt with citations
└─ Stream LLM response
Response with
source citations
```
### Podcast Generation
```
Notebook Content → Episode Profile → Script Generation (LLM)
Speaker Assignment
Text-to-Speech
(per segment)
Audio Assembly
Episode Record
+ Audio File
```
## Key Design Decisions
1. **Multi-provider by default**: Not locked to any single AI provider, enabling cost optimization and capability matching
2. **Async processing**: Long-running operations (source ingestion, podcast generation) run asynchronously with status polling
3. **Self-hosted data**: All data stays on the user's infrastructure with encrypted credential storage
4. **REST-first API**: Every UI action is backed by an API endpoint for automation
5. **Docker-native**: Designed for containerized deployment with persistent volumes
## File Structure
```
open-notebook/
├── api/ # FastAPI REST API
│ ├── main.py # App setup, middleware, routers
│ ├── routers/ # Route handlers (20 modules)
│ ├── models.py # Pydantic request/response models
│ └── auth.py # Authentication middleware
├── open_notebook/ # Core library
│ ├── ai/ # AI integration (LangChain, Esperanto)
│ ├── database/ # SurrealDB operations
│ ├── domain/ # Domain models and business logic
│ ├── graphs/ # LangGraph chat and processing graphs
│ ├── podcasts/ # Podcast generation pipeline
│ └── utils/ # Shared utilities
├── frontend/ # Next.js React application
├── prompts/ # AI prompt templates
├── tests/ # Test suite
└── docker-compose.yml # Deployment configuration
```


@@ -0,0 +1,226 @@
# Open Notebook Configuration Guide
## Docker Deployment
Open Notebook is deployed as a Docker Compose stack with two main services: the application server and SurrealDB.
### Minimal docker-compose.yml
```yaml
version: "3.8"
services:
surrealdb:
image: surrealdb/surrealdb:latest
command: start --user root --pass root rocksdb://data/database.db
volumes:
- surrealdb_data:/data
ports:
- "8000:8000"
open-notebook:
image: ghcr.io/lfnovo/open-notebook:latest
depends_on:
- surrealdb
environment:
- OPEN_NOTEBOOK_ENCRYPTION_KEY=${OPEN_NOTEBOOK_ENCRYPTION_KEY}
- SURREAL_URL=ws://surrealdb:8000/rpc
- SURREAL_NAMESPACE=open_notebook
- SURREAL_DATABASE=open_notebook
ports:
- "8502:8502" # Frontend UI
- "5055:5055" # REST API
volumes:
- on_uploads:/app/uploads
volumes:
surrealdb_data:
on_uploads:
```
### Starting the Stack
```bash
# Set the encryption key (required)
export OPEN_NOTEBOOK_ENCRYPTION_KEY="your-secure-random-key"
# Start services
docker-compose up -d
# View logs
docker-compose logs -f open-notebook
# Stop services
docker-compose down
# Stop and remove data
docker-compose down -v
```
## Environment Variables
### Required
| Variable | Description |
|----------|-------------|
| `OPEN_NOTEBOOK_ENCRYPTION_KEY` | Secret key for encrypting stored API credentials. Must be set before first launch and kept consistent. |
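Any high-entropy random string works as the key. One way to generate it, sketched with the standard library:

```python
import secrets

def generate_key(n_bytes=32):
    """Return a URL-safe random string suitable for
    OPEN_NOTEBOOK_ENCRYPTION_KEY."""
    return secrets.token_urlsafe(n_bytes)

if __name__ == "__main__":
    # Paste the printed value into your .env or shell export.
    print(generate_key())
```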
### Database
| Variable | Default | Description |
|----------|---------|-------------|
| `SURREAL_URL` | `ws://surrealdb:8000/rpc` | SurrealDB WebSocket connection URL |
| `SURREAL_NAMESPACE` | `open_notebook` | SurrealDB namespace |
| `SURREAL_DATABASE` | `open_notebook` | SurrealDB database name |
| `SURREAL_USER` | `root` | SurrealDB username |
| `SURREAL_PASS` | `root` | SurrealDB password |
### Application
| Variable | Default | Description |
|----------|---------|-------------|
| `OPEN_NOTEBOOK_PASSWORD` | None | Optional password protection for the web UI |
| `UPLOAD_DIR` | `/app/uploads` | Directory for uploaded file storage |
### AI Provider Keys (Legacy)
API keys can also be supplied via environment variables for backward compatibility. The preferred method is the credentials API or the Settings UI.
| Variable | Provider |
|----------|----------|
| `OPENAI_API_KEY` | OpenAI |
| `ANTHROPIC_API_KEY` | Anthropic |
| `GOOGLE_API_KEY` | Google GenAI |
| `GROQ_API_KEY` | Groq |
| `MISTRAL_API_KEY` | Mistral |
| `ELEVENLABS_API_KEY` | ElevenLabs |
## AI Provider Configuration
### Via UI
1. Go to **Settings > API Keys**
2. Click **Add Credential**
3. Select provider, enter API key and optional base URL
4. Click **Test Connection** to verify
5. Click **Discover Models** to find available models
6. Select models to register
### Via API
```python
import requests
BASE_URL = "http://localhost:5055/api"
# 1. Create credential
cred = requests.post(f"{BASE_URL}/credentials", json={
"provider": "anthropic",
"name": "Anthropic Production",
"api_key": "sk-ant-..."
}).json()
# 2. Test connection
test = requests.post(f"{BASE_URL}/credentials/{cred['id']}/test").json()
assert test["success"]
# 3. Discover and register models
discovered = requests.post(
f"{BASE_URL}/credentials/{cred['id']}/discover"
).json()
requests.post(
f"{BASE_URL}/credentials/{cred['id']}/register-models",
json={"model_ids": [m["id"] for m in discovered["models"]]}
)
# 4. Auto-assign defaults
requests.post(f"{BASE_URL}/models/auto-assign")
```
### Using Ollama (Free Local Inference)
For free AI inference without API costs, use Ollama:
```yaml
# docker-compose-ollama.yml addition
services:
ollama:
image: ollama/ollama:latest
volumes:
- ollama_data:/root/.ollama
ports:
- "11434:11434"
```
Then configure Ollama as a provider with base URL `http://ollama:11434`.
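Registering Ollama through the credentials API can be sketched as follows. `ollama_credential` is an illustrative helper, and the placeholder `api_key` value is an assumption (Ollama needs no real key; some builds may accept a null value instead):

```python
import requests

BASE_URL = "http://localhost:5055/api"

def ollama_credential(base_url="http://ollama:11434"):
    """Build the credential payload for a local Ollama provider.

    The placeholder api_key is an assumption -- Ollama itself
    requires no key.
    """
    return {
        "provider": "ollama",
        "name": "Local Ollama",
        "api_key": "ollama",  # placeholder, not a real key
        "base_url": base_url,
    }

if __name__ == "__main__":
    cred = requests.post(f"{BASE_URL}/credentials",
                         json=ollama_credential()).json()
    requests.post(f"{BASE_URL}/credentials/{cred['id']}/test")
    requests.post(f"{BASE_URL}/models/sync/ollama")
```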
## Security Configuration
### Password Protection
Set `OPEN_NOTEBOOK_PASSWORD` to require authentication:
```bash
export OPEN_NOTEBOOK_PASSWORD="your-ui-password"
```
### Reverse Proxy (Nginx Example)
```nginx
server {
listen 443 ssl;
server_name notebook.example.com;
ssl_certificate /etc/ssl/certs/cert.pem;
ssl_certificate_key /etc/ssl/private/key.pem;
location / {
proxy_pass http://localhost:8502;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
}
location /api/ {
proxy_pass http://localhost:5055/api/;
proxy_set_header Host $host;
}
}
```
## Backup and Restore
### Backup SurrealDB Data
```bash
# Export database
docker exec surrealdb surreal export \
--conn ws://localhost:8000 \
--user root --pass root \
--ns open_notebook --db open_notebook \
/tmp/backup.surql
# Copy backup from container
docker cp surrealdb:/tmp/backup.surql ./backup.surql
```
### Backup Uploaded Files
```bash
# Copy upload volume contents
docker cp open-notebook:/app/uploads ./uploads_backup/
```
### Restore
```bash
# Import database backup
docker cp ./backup.surql surrealdb:/tmp/backup.surql
docker exec surrealdb surreal import \
--conn ws://localhost:8000 \
--user root --pass root \
--ns open_notebook --db open_notebook \
/tmp/backup.surql
```


@@ -0,0 +1,290 @@
# Open Notebook Examples
## Complete Research Workflow
This example demonstrates a full research workflow: creating a notebook, adding sources, generating notes, chatting with the AI, and searching across materials.
```python
import requests
import time
BASE_URL = "http://localhost:5055/api"
def complete_research_workflow():
"""End-to-end research workflow with Open Notebook."""
# 1. Create a research notebook
notebook = requests.post(f"{BASE_URL}/notebooks", json={
"name": "Drug Resistance in Cancer",
"description": "Review of mechanisms of drug resistance in solid tumors"
}).json()
notebook_id = notebook["id"]
print(f"Created notebook: {notebook_id}")
# 2. Add sources from URLs
urls = [
"https://www.nature.com/articles/s41568-020-0281-y",
"https://www.cell.com/cancer-cell/fulltext/S1535-6108(20)30211-8",
]
source_ids = []
for url in urls:
source = requests.post(f"{BASE_URL}/sources", data={
"url": url,
"notebook_id": notebook_id,
"process_async": "true"
}).json()
source_ids.append(source["id"])
print(f"Added source: {source['id']}")
# 3. Wait for processing to complete
for source_id in source_ids:
while True:
status = requests.get(
f"{BASE_URL}/sources/{source_id}/status"
).json()
if status.get("status") in ("completed", "failed"):
break
time.sleep(5)
print(f"Source {source_id}: {status['status']}")
# 4. Create a chat session and ask questions
session = requests.post(f"{BASE_URL}/chat/sessions", json={
"notebook_id": notebook_id,
"title": "Resistance Mechanisms"
}).json()
answer = requests.post(f"{BASE_URL}/chat/execute", json={
"session_id": session["id"],
"message": "What are the primary mechanisms of drug resistance in solid tumors?",
"context": {"include_sources": True, "include_notes": True}
}).json()
print(f"AI response: {answer}")
# 5. Search across materials
results = requests.post(f"{BASE_URL}/search", json={
"query": "efflux pump resistance mechanism",
"search_type": "vector",
"limit": 5
}).json()
print(f"Found {results['total']} search results")
# 6. Create a human note summarizing findings
note = requests.post(f"{BASE_URL}/notes", json={
"title": "Summary of Resistance Mechanisms",
"content": "Key findings from the literature...",
"note_type": "human",
"notebook_id": notebook_id
}).json()
print(f"Created note: {note['id']}")
if __name__ == "__main__":
complete_research_workflow()
```
## File Upload Example
```python
import requests
BASE_URL = "http://localhost:5055/api"
def upload_research_papers(notebook_id, file_paths):
"""Upload multiple research papers to a notebook."""
for path in file_paths:
with open(path, "rb") as f:
response = requests.post(
f"{BASE_URL}/sources",
data={
"notebook_id": notebook_id,
"process_async": "true",
},
files={"file": (path.split("/")[-1], f)},
)
if response.status_code == 200:
print(f"Uploaded: {path}")
else:
print(f"Failed: {path} - {response.text}")
# Usage
upload_research_papers("notebook:abc123", [
"papers/study_1.pdf",
"papers/study_2.pdf",
"papers/supplementary.docx",
])
```
## Podcast Generation Example
```python
import requests
import time
BASE_URL = "http://localhost:5055/api"
def generate_research_podcast(notebook_id):
"""Generate a podcast episode from notebook contents."""
# Get available episode and speaker profiles
# (these must be configured in the UI or via API first)
# Submit podcast generation job
job = requests.post(f"{BASE_URL}/podcasts/generate", json={
"notebook_id": notebook_id,
"episode_profile_id": "episode_profile:default",
"speaker_profile_ids": [
"speaker_profile:host",
"speaker_profile:expert"
]
}).json()
job_id = job["job_id"]
print(f"Podcast generation started: {job_id}")
# Poll for completion
while True:
status = requests.get(f"{BASE_URL}/podcasts/jobs/{job_id}").json()
print(f"Status: {status.get('status', 'processing')}")
if status.get("status") in ("completed", "failed"):
break
time.sleep(10)
if status["status"] == "completed":
# Download the audio
episode_id = status["episode_id"]
audio = requests.get(
f"{BASE_URL}/podcasts/episodes/{episode_id}/audio"
)
with open("research_podcast.mp3", "wb") as f:
f.write(audio.content)
print("Podcast saved to research_podcast.mp3")
if __name__ == "__main__":
generate_research_podcast("notebook:abc123")
```
## Custom Transformation Pipeline
```python
import requests
BASE_URL = "http://localhost:5055/api"
def create_and_run_transformations():
"""Create custom transformations and apply them to content."""
# Create a methodology extraction transformation
transform = requests.post(f"{BASE_URL}/transformations", json={
"name": "extract_methods",
"title": "Extract Methods",
"description": "Extract and structure methodology from papers",
"prompt": (
"Extract the methodology section from this text. "
"Organize into: Study Design, Sample Size, Statistical Methods, "
"and Key Variables. Format as structured markdown."
),
"apply_default": False,
}).json()
# Get models to find a suitable one
models = requests.get(f"{BASE_URL}/models", params={
"model_type": "llm"
}).json()
model_id = models[0]["id"]
# Execute the transformation
result = requests.post(f"{BASE_URL}/transformations/execute", json={
"transformation_id": transform["id"],
"input_text": "We conducted a randomized controlled trial with...",
"model_id": model_id,
}).json()
print(f"Extracted methods:\n{result['output']}")
if __name__ == "__main__":
create_and_run_transformations()
```
## Semantic Search with Filtering
```python
import requests
BASE_URL = "http://localhost:5055/api"
def advanced_search(notebook_id, query):
"""Perform filtered semantic search and get AI answers."""
# Get sources from a specific notebook
sources = requests.get(f"{BASE_URL}/sources", params={
"notebook_id": notebook_id
}).json()
source_ids = [s["id"] for s in sources]
# Vector search restricted to notebook sources
results = requests.post(f"{BASE_URL}/search", json={
"query": query,
"search_type": "vector",
"limit": 10,
"source_ids": source_ids,
"min_similarity": 0.75,
}).json()
print(f"Found {results['total']} results:")
for result in results["results"]:
print(f" - {result.get('title', 'Untitled')} "
f"(similarity: {result.get('similarity', 'N/A')})")
# Get an AI-powered answer
answer = requests.post(f"{BASE_URL}/search/ask/simple", json={
"query": query,
}).json()
print(f"\nAI Answer: {answer['response']}")
if __name__ == "__main__":
advanced_search("notebook:abc123", "CRISPR gene editing efficiency")
```
## Model Management
```python
import requests
BASE_URL = "http://localhost:5055/api"
def setup_ai_models():
"""Configure AI models for Open Notebook."""
# Check available providers
providers = requests.get(f"{BASE_URL}/models/providers").json()
print(f"Available providers: {providers}")
# Discover models from a provider
discovered = requests.get(
f"{BASE_URL}/models/discover/openai"
).json()
print(f"Discovered {len(discovered)} OpenAI models")
# Sync models to make them available
requests.post(f"{BASE_URL}/models/sync/openai")
# Auto-assign default models
requests.post(f"{BASE_URL}/models/auto-assign")
# Check current defaults
defaults = requests.get(f"{BASE_URL}/models/defaults").json()
print(f"Default models: {defaults}")
if __name__ == "__main__":
setup_ai_models()
```