Add open-notebook skill: self-hosted NotebookLM alternative (issue #56)

Implements the open-notebook skill as a comprehensive integration for the
open-source, self-hosted alternative to Google NotebookLM. Addresses the
gap created by Google not providing a public NotebookLM API.

Developed using TDD with 44 tests covering skill structure, SKILL.md
frontmatter/content, reference documentation, example scripts, API
endpoint coverage, and marketplace.json registration.

Includes:
- SKILL.md with full documentation, code examples, and provider matrix
- references/api_reference.md covering all 20+ REST API endpoint groups
- references/examples.md with complete research workflow examples
- references/configuration.md with Docker, env vars, and security setup
- references/architecture.md with system design and data flow diagrams
- scripts/ with 3 example scripts (notebook, source, chat) + test suite
- marketplace.json updated to register the new skill

Closes #56

https://claude.ai/code/session_015CqcNWNYmDF9sqxKxziXcz
This commit is contained in:
Claude
2026-02-23 00:18:19 +00:00
parent f7585b7624
commit 259e01f7fd
10 changed files with 2599 additions and 0 deletions


@@ -0,0 +1,715 @@
# Open Notebook API Reference
## Base URL
```
http://localhost:5055/api
```
Interactive API documentation is available at `http://localhost:5055/docs` (Swagger UI) and `http://localhost:5055/redoc` (ReDoc).
## Authentication
If `OPEN_NOTEBOOK_PASSWORD` is configured, include the password in requests. The following routes are excluded from authentication: `/`, `/health`, `/docs`, `/openapi.json`, `/redoc`, `/api/auth/status`, `/api/config`.
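When a password is configured, clients must attach it to every non-excluded request. A minimal sketch, assuming a Bearer-token `Authorization` header — confirm the exact scheme against the Swagger UI at `/docs` for your deployment:

```python
import requests

BASE_URL = "http://localhost:5055/api"

def make_session(password=None):
    """Build a requests session, attaching the password if one is set.

    The Bearer scheme below is an assumption; check /api/auth/status
    or the interactive docs for the exact header your server expects.
    """
    session = requests.Session()
    if password:
        session.headers["Authorization"] = f"Bearer {password}"
    return session

if __name__ == "__main__":
    s = make_session("your-password")
    print(s.get(f"{BASE_URL}/notebooks").json())
```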
---
## Notebooks
### List Notebooks
```
GET /api/notebooks
```
**Query Parameters:**
| Parameter | Type | Description |
|-----------|------|-------------|
| `archived` | boolean | Filter by archived status |
| `order_by` | string | Sort field (default: `updated_at`) |
**Response:** Array of notebook objects with `source_count` and `note_count`.
### Create Notebook
```
POST /api/notebooks
```
**Request Body:**
```json
{
"name": "My Research",
"description": "Optional description"
}
```
### Get Notebook
```
GET /api/notebooks/{notebook_id}
```
### Update Notebook
```
PUT /api/notebooks/{notebook_id}
```
**Request Body:**
```json
{
"name": "Updated Name",
"description": "Updated description",
"archived": false
}
```
### Delete Notebook
```
DELETE /api/notebooks/{notebook_id}
```
**Query Parameters:**
| Parameter | Type | Description |
|-----------|------|-------------|
| `delete_sources` | boolean | Also delete exclusive sources (default: false) |
### Delete Preview
```
GET /api/notebooks/{notebook_id}/delete-preview
```
Returns counts of notes and sources that would be affected by deletion.
### Link Source to Notebook
```
POST /api/notebooks/{notebook_id}/sources/{source_id}
```
Idempotent operation to associate a source with a notebook.
### Unlink Source from Notebook
```
DELETE /api/notebooks/{notebook_id}/sources/{source_id}
```
---
## Sources
### List Sources
```
GET /api/sources
```
**Query Parameters:**
| Parameter | Type | Description |
|-----------|------|-------------|
| `notebook_id` | string | Filter by notebook |
| `limit` | integer | Number of results |
| `offset` | integer | Pagination offset |
| `order_by` | string | Sort field |
### Create Source
```
POST /api/sources
```
Accepts multipart form data for file uploads or JSON for URL/text sources.
**Form Parameters:**
| Parameter | Type | Description |
|-----------|------|-------------|
| `file` | file | Upload file (PDF, DOCX, audio, video) |
| `url` | string | Web URL to ingest |
| `text` | string | Raw text content |
| `notebook_id` | string | Associate with notebook |
| `process_async` | boolean | Process asynchronously (default: true) |
### Create Source (JSON)
```
POST /api/sources/json
```
Legacy JSON-based endpoint for source creation.
### Get Source
```
GET /api/sources/{source_id}
```
### Get Source Status
```
GET /api/sources/{source_id}/status
```
Poll processing status for asynchronously ingested sources.
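Polling can be wrapped in a small helper; the `completed`/`failed` status strings below are assumptions taken from the workflow examples in `references/examples.md`:

```python
import time
import requests

BASE_URL = "http://localhost:5055/api"

def is_terminal(status):
    """Whether a processing status needs no further polling.

    The exact status strings are assumptions based on the
    examples in references/examples.md.
    """
    return status in ("completed", "failed")

def wait_for_source(source_id, interval=5.0):
    """Poll /api/sources/{id}/status until processing finishes (sketch)."""
    while True:
        status = requests.get(
            f"{BASE_URL}/sources/{source_id}/status"
        ).json().get("status", "processing")
        if is_terminal(status):
            return status
        time.sleep(interval)
```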
### Update Source
```
PUT /api/sources/{source_id}
```
**Request Body:**
```json
{
"title": "Updated Title",
"topic": "Updated topic"
}
```
### Delete Source
```
DELETE /api/sources/{source_id}
```
### Download Source File
```
GET /api/sources/{source_id}/download
```
Returns the original uploaded file.
### Check Source File
```
HEAD /api/sources/{source_id}/download
```
### Retry Failed Source
```
POST /api/sources/{source_id}/retry
```
Requeue a failed source for processing.
### Get Source Insights
```
GET /api/sources/{source_id}/insights
```
Retrieve AI-generated insights for a source.
---
## Notes
### List Notes
```
GET /api/notes
```
**Query Parameters:**
| Parameter | Type | Description |
|-----------|------|-------------|
| `notebook_id` | string | Filter by notebook |
### Create Note
```
POST /api/notes
```
**Request Body:**
```json
{
"title": "My Note",
"content": "Note content...",
"note_type": "human",
"notebook_id": "notebook:abc123"
}
```
`note_type` must be `"human"` or `"ai"`. AI notes without titles get auto-generated titles.
### Get Note
```
GET /api/notes/{note_id}
```
### Update Note
```
PUT /api/notes/{note_id}
```
**Request Body:**
```json
{
"title": "Updated Title",
"content": "Updated content",
"note_type": "human"
}
```
### Delete Note
```
DELETE /api/notes/{note_id}
```
---
## Chat
### List Sessions
```
GET /api/chat/sessions
```
**Query Parameters:**
| Parameter | Type | Description |
|-----------|------|-------------|
| `notebook_id` | string | Filter by notebook |
### Create Session
```
POST /api/chat/sessions
```
**Request Body:**
```json
{
"notebook_id": "notebook:abc123",
"title": "Discussion Topic",
"model_override": "optional_model_id"
}
```
### Get Session
```
GET /api/chat/sessions/{session_id}
```
Returns session details with message history.
### Update Session
```
PUT /api/chat/sessions/{session_id}
```
### Delete Session
```
DELETE /api/chat/sessions/{session_id}
```
### Execute Chat
```
POST /api/chat/execute
```
**Request Body:**
```json
{
"session_id": "chat_session:abc123",
"message": "Your question here",
"context": {
"include_sources": true,
"include_notes": true
},
"model_override": "optional_model_id"
}
```
### Build Context
```
POST /api/chat/context
```
Build contextual data from sources and notes for a chat session.
---
## Search
### Search Knowledge Base
```
POST /api/search
```
**Request Body:**
```json
{
"query": "search terms",
"search_type": "vector",
"limit": 10,
"source_ids": [],
"note_ids": [],
"min_similarity": 0.7
}
```
`search_type` can be `"vector"` (requires embedding model) or `"text"` (keyword matching).
### Ask with Streaming
```
POST /api/search/ask
```
Returns Server-Sent Events with AI-generated answers based on knowledge base content.
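Consuming the stream from Python can be sketched with `requests`' line iterator. The `data:`-prefixed framing follows the SSE specification; the JSON payload shape is an assumption to verify against the live endpoint:

```python
import json
import requests

BASE_URL = "http://localhost:5055/api"

def parse_sse_line(line):
    """Parse one Server-Sent Events line into its JSON payload, or None.

    The "data:" field prefix follows the SSE spec; the payload being
    JSON is an assumption about this endpoint.
    """
    if line.startswith("data:"):
        return json.loads(line[len("data:"):].strip())
    return None

def ask_streaming(query):
    """Yield streamed answer chunks from /api/search/ask (sketch)."""
    with requests.post(f"{BASE_URL}/search/ask",
                       json={"query": query}, stream=True) as resp:
        resp.raise_for_status()
        for raw in resp.iter_lines(decode_unicode=True):
            payload = parse_sse_line(raw or "")
            if payload is not None:
                yield payload

if __name__ == "__main__":
    for chunk in ask_streaming("What are the key findings?"):
        print(chunk)
```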
### Ask Simple
```
POST /api/search/ask/simple
```
Non-streaming version that returns a complete response.
---
## Podcasts
### Generate Podcast
```
POST /api/podcasts/generate
```
**Request Body:**
```json
{
"notebook_id": "notebook:abc123",
"episode_profile_id": "episode_profile:xyz",
"speaker_profile_ids": ["speaker:a", "speaker:b"]
}
```
Returns a `job_id` for tracking generation progress.
### Get Job Status
```
GET /api/podcasts/jobs/{job_id}
```
### List Episodes
```
GET /api/podcasts/episodes
```
### Get Episode
```
GET /api/podcasts/episodes/{episode_id}
```
### Get Episode Audio
```
GET /api/podcasts/episodes/{episode_id}/audio
```
Streams the podcast audio file.
### Retry Failed Episode
```
POST /api/podcasts/episodes/{episode_id}/retry
```
### Delete Episode
```
DELETE /api/podcasts/episodes/{episode_id}
```
---
## Transformations
### List Transformations
```
GET /api/transformations
```
### Create Transformation
```
POST /api/transformations
```
**Request Body:**
```json
{
"name": "summarize",
"title": "Summarize Content",
"description": "Generate a concise summary",
"prompt": "Summarize the following text...",
"apply_default": false
}
```
### Execute Transformation
```
POST /api/transformations/execute
```
**Request Body:**
```json
{
"transformation_id": "transformation:abc",
"input_text": "Text to transform...",
"model_id": "model:xyz"
}
```
### Get Default Prompt
```
GET /api/transformations/default-prompt
```
### Update Default Prompt
```
PUT /api/transformations/default-prompt
```
### Get Transformation
```
GET /api/transformations/{transformation_id}
```
### Update Transformation
```
PUT /api/transformations/{transformation_id}
```
### Delete Transformation
```
DELETE /api/transformations/{transformation_id}
```
---
## Models
### List Models
```
GET /api/models
```
**Query Parameters:**
| Parameter | Type | Description |
|-----------|------|-------------|
| `model_type` | string | Filter by type (llm, embedding, stt, tts) |
### Create Model
```
POST /api/models
```
### Delete Model
```
DELETE /api/models/{model_id}
```
### Test Model
```
POST /api/models/{model_id}/test
```
### Get Default Models
```
GET /api/models/defaults
```
Returns default model assignments for seven service slots: chat, transformation, embedding, speech-to-text, text-to-speech, podcast, and summary.
### Update Default Models
```
PUT /api/models/defaults
```
### Get Providers
```
GET /api/models/providers
```
### Discover Models
```
GET /api/models/discover/{provider}
```
### Sync Models (Single Provider)
```
POST /api/models/sync/{provider}
```
### Sync All Models
```
POST /api/models/sync
```
### Auto-Assign Defaults
```
POST /api/models/auto-assign
```
Automatically populate empty default model slots using provider priority rankings.
### Get Model Count
```
GET /api/models/count/{provider}
```
### Get Models by Provider
```
GET /api/models/by-provider/{provider}
```
---
## Credentials
### Get Status
```
GET /api/credentials/status
```
### Get Environment Status
```
GET /api/credentials/env-status
```
### List Credentials
```
GET /api/credentials
```
**Query Parameters:**
| Parameter | Type | Description |
|-----------|------|-------------|
| `provider` | string | Filter by provider |
### List by Provider
```
GET /api/credentials/by-provider/{provider}
```
### Create Credential
```
POST /api/credentials
```
**Request Body:**
```json
{
"provider": "openai",
"name": "My OpenAI Key",
"api_key": "sk-...",
"base_url": null
}
```
### Get Credential
```
GET /api/credentials/{credential_id}
```
Note: API key values are never returned.
### Update Credential
```
PUT /api/credentials/{credential_id}
```
### Delete Credential
```
DELETE /api/credentials/{credential_id}
```
### Test Credential
```
POST /api/credentials/{credential_id}/test
```
### Discover Models via Credential
```
POST /api/credentials/{credential_id}/discover
```
### Register Models via Credential
```
POST /api/credentials/{credential_id}/register-models
```
---
## Error Responses
The API returns standard HTTP status codes with JSON error bodies:
| Status | Meaning |
|--------|---------|
| 400 | Invalid input |
| 401 | Authentication required |
| 404 | Resource not found |
| 422 | Configuration error |
| 429 | Rate limited |
| 500 | Internal server error |
| 502 | External service error |
**Error Response Format:**
```json
{
"detail": "Description of the error"
}
```


@@ -0,0 +1,163 @@
# Open Notebook Architecture
## System Overview
Open Notebook is built as a modern Python web application with a clear separation between frontend and backend, using Docker for deployment.
```
┌─────────────────────────────────────────────────────┐
│ Docker Compose │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌───────────┐ │
│ │ Next.js │ │ FastAPI │ │ SurrealDB │ │
│ │ Frontend │──│ Backend │──│ │ │
│  │  (port 8502) │  │  (port 5055) │  │(port 8000)│  │
│ └──────────────┘ └──────────────┘ └───────────┘ │
│ │ │
│ ┌─────┴─────┐ │
│ │ LangChain │ │
│ │ Esperanto │ │
│ └─────┬─────┘ │
│ │ │
│ ┌───────────┼───────────┐ │
│ │ │ │ │
│ ┌───┴───┐ ┌───┴───┐ ┌───┴───┐ │
│ │OpenAI │ │Claude │ │Ollama │ ... │
│ └───────┘ └───────┘ └───────┘ │
└─────────────────────────────────────────────────────┘
```
## Core Components
### FastAPI Backend
The REST API is built with FastAPI and organized into routers:
- **20 route modules** covering notebooks, sources, notes, chat, search, podcasts, transformations, models, credentials, embeddings, settings, and more
- Async/await throughout for non-blocking I/O
- Pydantic models for request/response validation
- Custom exception handlers mapping domain errors to HTTP status codes
- CORS middleware for cross-origin access
- Optional password authentication middleware
### SurrealDB
SurrealDB serves as the primary data store, providing both document and relational capabilities:
- **Document storage** for notebooks, sources, notes, transformations, and models
- **Relational references** for notebook-source associations
- **Full-text search** across indexed content
- **RocksDB** backend for persistent storage on disk
- Schema migrations run automatically on application startup
### LangChain Integration
AI features are powered by LangChain with the Esperanto multi-provider library:
- **LangGraph** manages conversational state for chat sessions
- **Embedding models** power vector search across content
- **LLM chains** drive transformations, note generation, and podcast scripting
- **Prompt templates** stored in the `prompts/` directory
### Esperanto Multi-Provider Library
Esperanto provides a unified interface to 16+ AI providers:
- Abstracts provider-specific API differences
- Supports LLM, embedding, speech-to-text, and text-to-speech capabilities
- Handles credential management and model discovery
- Enables runtime provider switching without code changes
### Next.js Frontend
The user interface is a React application built with Next.js:
- Responsive design for desktop and tablet use
- Real-time updates for chat and processing status
- File upload with progress tracking
- Audio player for podcast episodes
## Data Flow
### Source Ingestion
```
Upload/URL → Source Record Created → Processing Queue
┌──────────┼──────────┐
▼ ▼ ▼
Text Embedding Metadata
Extraction Generation Extraction
│ │ │
└──────────┼──────────┘
Source Updated
(searchable)
```
### Chat Execution
```
User Message → Build Context (sources + notes)
LangGraph State Machine
├─ Retrieve relevant context
├─ Format prompt with citations
└─ Stream LLM response
Response with
source citations
```
### Podcast Generation
```
Notebook Content → Episode Profile → Script Generation (LLM)
Speaker Assignment
Text-to-Speech
(per segment)
Audio Assembly
Episode Record
+ Audio File
```
## Key Design Decisions
1. **Multi-provider by default**: Not locked to any single AI provider, enabling cost optimization and capability matching
2. **Async processing**: Long-running operations (source ingestion, podcast generation) run asynchronously with status polling
3. **Self-hosted data**: All data stays on the user's infrastructure with encrypted credential storage
4. **REST-first API**: Every UI action is backed by an API endpoint for automation
5. **Docker-native**: Designed for containerized deployment with persistent volumes
## File Structure
```
open-notebook/
├── api/ # FastAPI REST API
│ ├── main.py # App setup, middleware, routers
│ ├── routers/ # Route handlers (20 modules)
│ ├── models.py # Pydantic request/response models
│ └── auth.py # Authentication middleware
├── open_notebook/ # Core library
│ ├── ai/ # AI integration (LangChain, Esperanto)
│ ├── database/ # SurrealDB operations
│ ├── domain/ # Domain models and business logic
│ ├── graphs/ # LangGraph chat and processing graphs
│ ├── podcasts/ # Podcast generation pipeline
│ └── utils/ # Shared utilities
├── frontend/ # Next.js React application
├── prompts/ # AI prompt templates
├── tests/ # Test suite
└── docker-compose.yml # Deployment configuration
```


@@ -0,0 +1,226 @@
# Open Notebook Configuration Guide
## Docker Deployment
Open Notebook is deployed as a Docker Compose stack with two main services: the application server and SurrealDB.
### Minimal docker-compose.yml
```yaml
version: "3.8"
services:
surrealdb:
image: surrealdb/surrealdb:latest
command: start --user root --pass root rocksdb://data/database.db
volumes:
- surrealdb_data:/data
ports:
- "8000:8000"
open-notebook:
image: ghcr.io/lfnovo/open-notebook:latest
depends_on:
- surrealdb
environment:
- OPEN_NOTEBOOK_ENCRYPTION_KEY=${OPEN_NOTEBOOK_ENCRYPTION_KEY}
- SURREAL_URL=ws://surrealdb:8000/rpc
- SURREAL_NAMESPACE=open_notebook
- SURREAL_DATABASE=open_notebook
ports:
- "8502:8502" # Frontend UI
- "5055:5055" # REST API
volumes:
- on_uploads:/app/uploads
volumes:
surrealdb_data:
on_uploads:
```
### Starting the Stack
```bash
# Set the encryption key (required)
export OPEN_NOTEBOOK_ENCRYPTION_KEY="your-secure-random-key"
# Start services
docker-compose up -d
# View logs
docker-compose logs -f open-notebook
# Stop services
docker-compose down
# Stop and remove data
docker-compose down -v
```
## Environment Variables
### Required
| Variable | Description |
|----------|-------------|
| `OPEN_NOTEBOOK_ENCRYPTION_KEY` | Secret key for encrypting stored API credentials. Must be set before first launch and kept consistent. |
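Any high-entropy random string works as the key. One way to generate it, sketched with the standard library:

```python
import secrets

def generate_key(n_bytes=32):
    """Return a URL-safe random string suitable for
    OPEN_NOTEBOOK_ENCRYPTION_KEY."""
    return secrets.token_urlsafe(n_bytes)

if __name__ == "__main__":
    # Paste the printed value into your .env or shell export.
    print(generate_key())
```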
### Database
| Variable | Default | Description |
|----------|---------|-------------|
| `SURREAL_URL` | `ws://surrealdb:8000/rpc` | SurrealDB WebSocket connection URL |
| `SURREAL_NAMESPACE` | `open_notebook` | SurrealDB namespace |
| `SURREAL_DATABASE` | `open_notebook` | SurrealDB database name |
| `SURREAL_USER` | `root` | SurrealDB username |
| `SURREAL_PASS` | `root` | SurrealDB password |
### Application
| Variable | Default | Description |
|----------|---------|-------------|
| `OPEN_NOTEBOOK_PASSWORD` | None | Optional password protection for the web UI |
| `UPLOAD_DIR` | `/app/uploads` | Directory for uploaded file storage |
### AI Provider Keys (Legacy)
API keys can also be supplied via environment variables for backward compatibility. The preferred method is the credentials API or the Settings UI.
| Variable | Provider |
|----------|----------|
| `OPENAI_API_KEY` | OpenAI |
| `ANTHROPIC_API_KEY` | Anthropic |
| `GOOGLE_API_KEY` | Google GenAI |
| `GROQ_API_KEY` | Groq |
| `MISTRAL_API_KEY` | Mistral |
| `ELEVENLABS_API_KEY` | ElevenLabs |
## AI Provider Configuration
### Via UI
1. Go to **Settings > API Keys**
2. Click **Add Credential**
3. Select provider, enter API key and optional base URL
4. Click **Test Connection** to verify
5. Click **Discover Models** to find available models
6. Select models to register
### Via API
```python
import requests
BASE_URL = "http://localhost:5055/api"
# 1. Create credential
cred = requests.post(f"{BASE_URL}/credentials", json={
"provider": "anthropic",
"name": "Anthropic Production",
"api_key": "sk-ant-..."
}).json()
# 2. Test connection
test = requests.post(f"{BASE_URL}/credentials/{cred['id']}/test").json()
assert test["success"]
# 3. Discover and register models
discovered = requests.post(
f"{BASE_URL}/credentials/{cred['id']}/discover"
).json()
requests.post(
f"{BASE_URL}/credentials/{cred['id']}/register-models",
json={"model_ids": [m["id"] for m in discovered["models"]]}
)
# 4. Auto-assign defaults
requests.post(f"{BASE_URL}/models/auto-assign")
```
### Using Ollama (Free Local Inference)
For free AI inference without API costs, use Ollama:
```yaml
# docker-compose-ollama.yml addition
services:
ollama:
image: ollama/ollama:latest
volumes:
- ollama_data:/root/.ollama
ports:
- "11434:11434"
```
Then configure Ollama as a provider with base URL `http://ollama:11434`.
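Registering Ollama through the credentials API can be sketched as follows. `ollama_credential` is an illustrative helper, and the placeholder `api_key` value is an assumption (Ollama needs no real key; some builds may accept a null value instead):

```python
import requests

BASE_URL = "http://localhost:5055/api"

def ollama_credential(base_url="http://ollama:11434"):
    """Build the credential payload for a local Ollama provider.

    The placeholder api_key is an assumption -- Ollama itself
    requires no key.
    """
    return {
        "provider": "ollama",
        "name": "Local Ollama",
        "api_key": "ollama",  # placeholder, not a real key
        "base_url": base_url,
    }

if __name__ == "__main__":
    cred = requests.post(f"{BASE_URL}/credentials",
                         json=ollama_credential()).json()
    requests.post(f"{BASE_URL}/credentials/{cred['id']}/test")
    requests.post(f"{BASE_URL}/models/sync/ollama")
```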
## Security Configuration
### Password Protection
Set `OPEN_NOTEBOOK_PASSWORD` to require authentication:
```bash
export OPEN_NOTEBOOK_PASSWORD="your-ui-password"
```
### Reverse Proxy (Nginx Example)
```nginx
server {
listen 443 ssl;
server_name notebook.example.com;
ssl_certificate /etc/ssl/certs/cert.pem;
ssl_certificate_key /etc/ssl/private/key.pem;
location / {
proxy_pass http://localhost:8502;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
}
location /api/ {
proxy_pass http://localhost:5055/api/;
proxy_set_header Host $host;
}
}
```
## Backup and Restore
### Backup SurrealDB Data
```bash
# Export database
docker exec surrealdb surreal export \
--conn ws://localhost:8000 \
--user root --pass root \
--ns open_notebook --db open_notebook \
/tmp/backup.surql
# Copy backup from container
docker cp surrealdb:/tmp/backup.surql ./backup.surql
```
### Backup Uploaded Files
```bash
# Copy upload volume contents
docker cp open-notebook:/app/uploads ./uploads_backup/
```
### Restore
```bash
# Import database backup
docker cp ./backup.surql surrealdb:/tmp/backup.surql
docker exec surrealdb surreal import \
--conn ws://localhost:8000 \
--user root --pass root \
--ns open_notebook --db open_notebook \
/tmp/backup.surql
```


@@ -0,0 +1,290 @@
# Open Notebook Examples
## Complete Research Workflow
This example demonstrates a full research workflow: creating a notebook, adding sources, generating notes, chatting with the AI, and searching across materials.
```python
import requests
import time
BASE_URL = "http://localhost:5055/api"
def complete_research_workflow():
"""End-to-end research workflow with Open Notebook."""
# 1. Create a research notebook
notebook = requests.post(f"{BASE_URL}/notebooks", json={
"name": "Drug Resistance in Cancer",
"description": "Review of mechanisms of drug resistance in solid tumors"
}).json()
notebook_id = notebook["id"]
print(f"Created notebook: {notebook_id}")
# 2. Add sources from URLs
urls = [
"https://www.nature.com/articles/s41568-020-0281-y",
"https://www.cell.com/cancer-cell/fulltext/S1535-6108(20)30211-8",
]
source_ids = []
for url in urls:
source = requests.post(f"{BASE_URL}/sources", data={
"url": url,
"notebook_id": notebook_id,
"process_async": "true"
}).json()
source_ids.append(source["id"])
print(f"Added source: {source['id']}")
# 3. Wait for processing to complete
for source_id in source_ids:
while True:
status = requests.get(
f"{BASE_URL}/sources/{source_id}/status"
).json()
if status.get("status") in ("completed", "failed"):
break
time.sleep(5)
print(f"Source {source_id}: {status['status']}")
# 4. Create a chat session and ask questions
session = requests.post(f"{BASE_URL}/chat/sessions", json={
"notebook_id": notebook_id,
"title": "Resistance Mechanisms"
}).json()
answer = requests.post(f"{BASE_URL}/chat/execute", json={
"session_id": session["id"],
"message": "What are the primary mechanisms of drug resistance in solid tumors?",
"context": {"include_sources": True, "include_notes": True}
}).json()
print(f"AI response: {answer}")
# 5. Search across materials
results = requests.post(f"{BASE_URL}/search", json={
"query": "efflux pump resistance mechanism",
"search_type": "vector",
"limit": 5
}).json()
print(f"Found {results['total']} search results")
# 6. Create a human note summarizing findings
note = requests.post(f"{BASE_URL}/notes", json={
"title": "Summary of Resistance Mechanisms",
"content": "Key findings from the literature...",
"note_type": "human",
"notebook_id": notebook_id
}).json()
print(f"Created note: {note['id']}")
if __name__ == "__main__":
complete_research_workflow()
```
## File Upload Example
```python
import requests
BASE_URL = "http://localhost:5055/api"
def upload_research_papers(notebook_id, file_paths):
"""Upload multiple research papers to a notebook."""
for path in file_paths:
with open(path, "rb") as f:
response = requests.post(
f"{BASE_URL}/sources",
data={
"notebook_id": notebook_id,
"process_async": "true",
},
files={"file": (path.split("/")[-1], f)},
)
if response.status_code == 200:
print(f"Uploaded: {path}")
else:
print(f"Failed: {path} - {response.text}")
# Usage
upload_research_papers("notebook:abc123", [
"papers/study_1.pdf",
"papers/study_2.pdf",
"papers/supplementary.docx",
])
```
## Podcast Generation Example
```python
import requests
import time
BASE_URL = "http://localhost:5055/api"
def generate_research_podcast(notebook_id):
"""Generate a podcast episode from notebook contents."""
# Get available episode and speaker profiles
# (these must be configured in the UI or via API first)
# Submit podcast generation job
job = requests.post(f"{BASE_URL}/podcasts/generate", json={
"notebook_id": notebook_id,
"episode_profile_id": "episode_profile:default",
"speaker_profile_ids": [
"speaker_profile:host",
"speaker_profile:expert"
]
}).json()
job_id = job["job_id"]
print(f"Podcast generation started: {job_id}")
# Poll for completion
while True:
status = requests.get(f"{BASE_URL}/podcasts/jobs/{job_id}").json()
print(f"Status: {status.get('status', 'processing')}")
if status.get("status") in ("completed", "failed"):
break
time.sleep(10)
if status["status"] == "completed":
# Download the audio
episode_id = status["episode_id"]
audio = requests.get(
f"{BASE_URL}/podcasts/episodes/{episode_id}/audio"
)
with open("research_podcast.mp3", "wb") as f:
f.write(audio.content)
print("Podcast saved to research_podcast.mp3")
if __name__ == "__main__":
generate_research_podcast("notebook:abc123")
```
## Custom Transformation Pipeline
```python
import requests
BASE_URL = "http://localhost:5055/api"
def create_and_run_transformations():
"""Create custom transformations and apply them to content."""
# Create a methodology extraction transformation
transform = requests.post(f"{BASE_URL}/transformations", json={
"name": "extract_methods",
"title": "Extract Methods",
"description": "Extract and structure methodology from papers",
"prompt": (
"Extract the methodology section from this text. "
"Organize into: Study Design, Sample Size, Statistical Methods, "
"and Key Variables. Format as structured markdown."
),
"apply_default": False,
}).json()
# Get models to find a suitable one
models = requests.get(f"{BASE_URL}/models", params={
"model_type": "llm"
}).json()
model_id = models[0]["id"]
# Execute the transformation
result = requests.post(f"{BASE_URL}/transformations/execute", json={
"transformation_id": transform["id"],
"input_text": "We conducted a randomized controlled trial with...",
"model_id": model_id,
}).json()
print(f"Extracted methods:\n{result['output']}")
if __name__ == "__main__":
create_and_run_transformations()
```
## Semantic Search with Filtering
```python
import requests
BASE_URL = "http://localhost:5055/api"
def advanced_search(notebook_id, query):
"""Perform filtered semantic search and get AI answers."""
# Get sources from a specific notebook
sources = requests.get(f"{BASE_URL}/sources", params={
"notebook_id": notebook_id
}).json()
source_ids = [s["id"] for s in sources]
# Vector search restricted to notebook sources
results = requests.post(f"{BASE_URL}/search", json={
"query": query,
"search_type": "vector",
"limit": 10,
"source_ids": source_ids,
"min_similarity": 0.75,
}).json()
print(f"Found {results['total']} results:")
for result in results["results"]:
print(f" - {result.get('title', 'Untitled')} "
f"(similarity: {result.get('similarity', 'N/A')})")
# Get an AI-powered answer
answer = requests.post(f"{BASE_URL}/search/ask/simple", json={
"query": query,
}).json()
print(f"\nAI Answer: {answer['response']}")
if __name__ == "__main__":
advanced_search("notebook:abc123", "CRISPR gene editing efficiency")
```
## Model Management
```python
import requests
BASE_URL = "http://localhost:5055/api"
def setup_ai_models():
"""Configure AI models for Open Notebook."""
# Check available providers
providers = requests.get(f"{BASE_URL}/models/providers").json()
print(f"Available providers: {providers}")
# Discover models from a provider
discovered = requests.get(
f"{BASE_URL}/models/discover/openai"
).json()
print(f"Discovered {len(discovered)} OpenAI models")
# Sync models to make them available
requests.post(f"{BASE_URL}/models/sync/openai")
# Auto-assign default models
requests.post(f"{BASE_URL}/models/auto-assign")
# Check current defaults
defaults = requests.get(f"{BASE_URL}/models/defaults").json()
print(f"Default models: {defaults}")
if __name__ == "__main__":
setup_ai_models()
```