mirror of https://github.com/K-Dense-AI/claude-scientific-skills.git
synced 2026-03-27 07:09:27 +08:00

Add open-notebook skill: self-hosted NotebookLM alternative (issue #56)

Implements the open-notebook skill as a comprehensive integration for the open-source, self-hosted alternative to Google NotebookLM. Addresses the gap created by Google not providing a public NotebookLM API.

Developed using TDD with 44 tests covering skill structure, SKILL.md frontmatter/content, reference documentation, example scripts, API endpoint coverage, and marketplace.json registration.

Includes:
- SKILL.md with full documentation, code examples, and provider matrix
- references/api_reference.md covering all 20+ REST API endpoint groups
- references/examples.md with complete research workflow examples
- references/configuration.md with Docker, env vars, and security setup
- references/architecture.md with system design and data flow diagrams
- scripts/ with 3 example scripts (notebook, source, chat) + test suite
- marketplace.json updated to register the new skill

Closes #56

https://claude.ai/code/session_015CqcNWNYmDF9sqxKxziXcz

This commit is contained in:

715  scientific-skills/open-notebook/references/api_reference.md  Normal file
@@ -0,0 +1,715 @@
# Open Notebook API Reference

## Base URL

```
http://localhost:5055/api
```

Interactive API documentation is available at `http://localhost:5055/docs` (Swagger UI) and `http://localhost:5055/redoc` (ReDoc).

## Authentication

If `OPEN_NOTEBOOK_PASSWORD` is configured, include the password in requests. The following routes are excluded from authentication: `/`, `/health`, `/docs`, `/openapi.json`, `/redoc`, `/api/auth/status`, `/api/config`.
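For scripted access it helps to centralize how the password is attached to requests. The sketch below assumes a bearer-style `Authorization` header, which is an assumption rather than a documented contract; confirm the expected scheme against `/api/auth/status` on your deployment.

```python
def auth_headers(password):
    """Build request headers for a password-protected Open Notebook instance.

    Assumes the password travels as a bearer token (an assumption, not a
    documented contract); returns no headers when auth is not configured.
    """
    if not password:
        return {}
    return {"Authorization": f"Bearer {password}"}


# Usage (illustrative): attach once to a requests.Session so every call
# carries the credentials:
#   session = requests.Session()
#   session.headers.update(auth_headers(os.environ.get("OPEN_NOTEBOOK_PASSWORD")))
```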

---

## Notebooks

### List Notebooks

```
GET /api/notebooks
```

**Query Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `archived` | boolean | Filter by archived status |
| `order_by` | string | Sort field (default: `updated_at`) |

**Response:** Array of notebook objects with `source_count` and `note_count`.

### Create Notebook

```
POST /api/notebooks
```

**Request Body:**

```json
{
  "name": "My Research",
  "description": "Optional description"
}
```

### Get Notebook

```
GET /api/notebooks/{notebook_id}
```

### Update Notebook

```
PUT /api/notebooks/{notebook_id}
```

**Request Body:**

```json
{
  "name": "Updated Name",
  "description": "Updated description",
  "archived": false
}
```

### Delete Notebook

```
DELETE /api/notebooks/{notebook_id}
```

**Query Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `delete_sources` | boolean | Also delete exclusive sources (default: false) |

### Delete Preview

```
GET /api/notebooks/{notebook_id}/delete-preview
```

Returns counts of notes and sources that would be affected by deletion.

### Link Source to Notebook

```
POST /api/notebooks/{notebook_id}/sources/{source_id}
```

Idempotent operation to associate a source with a notebook.

### Unlink Source from Notebook

```
DELETE /api/notebooks/{notebook_id}/sources/{source_id}
```
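The delete-preview endpoint is most useful as a guard in front of the destructive delete. A minimal sketch of that pattern, with the HTTP calls injected so it stays transport-agnostic; the `note_count`/`source_count` keys are assumptions about the preview payload, which is only described above as "counts of notes and sources":

```python
def safe_delete_notebook(get_preview, do_delete, max_notes=0, max_sources=0):
    """Delete a notebook only when its delete-preview reports an
    acceptable blast radius.

    get_preview: () -> dict, e.g. GET .../delete-preview response JSON
    do_delete:   () -> None, e.g. DELETE .../notebooks/{notebook_id}
    Returns True when the delete was issued, False when vetoed.
    """
    preview = get_preview()
    if preview.get("note_count", 0) > max_notes:
        return False
    if preview.get("source_count", 0) > max_sources:
        return False
    do_delete()
    return True


# Wiring it to the API with requests (illustrative):
#   safe_delete_notebook(
#       lambda: requests.get(f"{BASE}/notebooks/{nb_id}/delete-preview").json(),
#       lambda: requests.delete(f"{BASE}/notebooks/{nb_id}"),
#   )
```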

---

## Sources

### List Sources

```
GET /api/sources
```

**Query Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `notebook_id` | string | Filter by notebook |
| `limit` | integer | Number of results |
| `offset` | integer | Pagination offset |
| `order_by` | string | Sort field |

### Create Source

```
POST /api/sources
```

Accepts multipart form data for file uploads or JSON for URL/text sources.

**Form Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `file` | file | Upload file (PDF, DOCX, audio, video) |
| `url` | string | Web URL to ingest |
| `text` | string | Raw text content |
| `notebook_id` | string | Associate with notebook |
| `process_async` | boolean | Process asynchronously (default: true) |

### Create Source (JSON)

```
POST /api/sources/json
```

Legacy JSON-based endpoint for source creation.

### Get Source

```
GET /api/sources/{source_id}
```

### Get Source Status

```
GET /api/sources/{source_id}/status
```

Poll processing status for asynchronously ingested sources.

### Update Source

```
PUT /api/sources/{source_id}
```

**Request Body:**

```json
{
  "title": "Updated Title",
  "topic": "Updated topic"
}
```

### Delete Source

```
DELETE /api/sources/{source_id}
```

### Download Source File

```
GET /api/sources/{source_id}/download
```

Returns the original uploaded file.

### Check Source File

```
HEAD /api/sources/{source_id}/download
```

### Retry Failed Source

```
POST /api/sources/{source_id}/retry
```

Requeue a failed source for processing.

### Get Source Insights

```
GET /api/sources/{source_id}/insights
```

Retrieve AI-generated insights for a source.
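The status route is designed for polling after an async ingest. A small generic poller keeps that loop out of application code; this is a sketch, with the status fetch injected and the `"completed"`/`"failed"` terminal values taken from the workflow examples elsewhere in this skill:

```python
import time


def wait_for_source(get_status, terminal=("completed", "failed"),
                    interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll get_status() until the reported status is terminal.

    get_status: () -> dict, e.g. GET /api/sources/{source_id}/status JSON
    Returns the final status dict; raises TimeoutError after `timeout` seconds.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status.get("status") in terminal:
            return status
        if waited >= timeout:
            raise TimeoutError(f"source still {status.get('status')!r} after {timeout}s")
        sleep(interval)
        waited += interval


# Usage (illustrative):
#   wait_for_source(lambda: requests.get(f"{BASE}/sources/{sid}/status").json())
```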

---

## Notes

### List Notes

```
GET /api/notes
```

**Query Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `notebook_id` | string | Filter by notebook |

### Create Note

```
POST /api/notes
```

**Request Body:**

```json
{
  "title": "My Note",
  "content": "Note content...",
  "note_type": "human",
  "notebook_id": "notebook:abc123"
}
```

`note_type` must be `"human"` or `"ai"`. AI notes without titles get auto-generated titles.

### Get Note

```
GET /api/notes/{note_id}
```

### Update Note

```
PUT /api/notes/{note_id}
```

**Request Body:**

```json
{
  "title": "Updated Title",
  "content": "Updated content",
  "note_type": "human"
}
```

### Delete Note

```
DELETE /api/notes/{note_id}
```

---

## Chat

### List Sessions

```
GET /api/chat/sessions
```

**Query Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `notebook_id` | string | Filter by notebook |

### Create Session

```
POST /api/chat/sessions
```

**Request Body:**

```json
{
  "notebook_id": "notebook:abc123",
  "title": "Discussion Topic",
  "model_override": "optional_model_id"
}
```

### Get Session

```
GET /api/chat/sessions/{session_id}
```

Returns session details with message history.

### Update Session

```
PUT /api/chat/sessions/{session_id}
```

### Delete Session

```
DELETE /api/chat/sessions/{session_id}
```

### Execute Chat

```
POST /api/chat/execute
```

**Request Body:**

```json
{
  "session_id": "chat_session:abc123",
  "message": "Your question here",
  "context": {
    "include_sources": true,
    "include_notes": true
  },
  "model_override": "optional_model_id"
}
```

### Build Context

```
POST /api/chat/context
```

Build contextual data from sources and notes for a chat session.
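A typical exchange is: create a session once, then call `/api/chat/execute` per turn. A payload builder keeps the context flags in one place; field names follow the request body shown above:

```python
def chat_payload(session_id, message, include_sources=True,
                 include_notes=True, model_override=None):
    """Build the request body for POST /api/chat/execute."""
    body = {
        "session_id": session_id,
        "message": message,
        "context": {
            "include_sources": include_sources,
            "include_notes": include_notes,
        },
    }
    if model_override:
        body["model_override"] = model_override
    return body


# Usage (illustrative):
#   requests.post(f"{BASE}/chat/execute",
#                 json=chat_payload(session["id"], "Your question here"))
```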

---

## Search

### Search Knowledge Base

```
POST /api/search
```

**Request Body:**

```json
{
  "query": "search terms",
  "search_type": "vector",
  "limit": 10,
  "source_ids": [],
  "note_ids": [],
  "min_similarity": 0.7
}
```

`search_type` can be `"vector"` (requires embedding model) or `"text"` (keyword matching).

### Ask with Streaming

```
POST /api/search/ask
```

Returns Server-Sent Events with AI-generated answers based on knowledge base content.

### Ask Simple

```
POST /api/search/ask/simple
```

Non-streaming version that returns a complete response.
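`POST /api/search/ask` streams Server-Sent Events, so the response must be consumed line by line rather than with `.json()`. A minimal extractor for the `data:` payloads of an SSE stream (a sketch; the structure inside each payload depends on the server and is not specified here):

```python
def iter_sse_data(lines):
    """Yield the payload of each `data:` line from an SSE stream.

    `lines` can be an iterable of bytes or str, e.g. response.iter_lines()
    from a requests call made with stream=True. Blank keep-alive lines and
    comment lines (starting with `:`) are skipped.
    """
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        if raw.startswith("data:"):
            yield raw[len("data:"):].strip()


# Usage (illustrative):
#   resp = requests.post(f"{BASE_URL}/search/ask", json={...}, stream=True)
#   for chunk in iter_sse_data(resp.iter_lines()):
#       print(chunk)
```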

---

## Podcasts

### Generate Podcast

```
POST /api/podcasts/generate
```

**Request Body:**

```json
{
  "notebook_id": "notebook:abc123",
  "episode_profile_id": "episode_profile:xyz",
  "speaker_profile_ids": ["speaker:a", "speaker:b"]
}
```

Returns a `job_id` for tracking generation progress.

### Get Job Status

```
GET /api/podcasts/jobs/{job_id}
```

### List Episodes

```
GET /api/podcasts/episodes
```

### Get Episode

```
GET /api/podcasts/episodes/{episode_id}
```

### Get Episode Audio

```
GET /api/podcasts/episodes/{episode_id}/audio
```

Streams the podcast audio file.

### Retry Failed Episode

```
POST /api/podcasts/episodes/{episode_id}/retry
```

### Delete Episode

```
DELETE /api/podcasts/episodes/{episode_id}
```
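Podcast generation can take minutes, so fixed-interval polling of `GET /api/podcasts/jobs/{job_id}` is wasteful early and sluggish late. A capped exponential backoff schedule is one way to pace the loop (a sketch; the interval values are arbitrary defaults, not recommendations from this API):

```python
def backoff_intervals(first=2.0, factor=2.0, cap=30.0):
    """Yield polling intervals: first, first*factor, ... capped at `cap`."""
    delay = first
    while True:
        yield delay
        delay = min(delay * factor, cap)


# Usage (illustrative):
#   for delay in backoff_intervals():
#       job = requests.get(f"{BASE_URL}/podcasts/jobs/{job_id}").json()
#       if job.get("status") in ("completed", "failed"):
#           break
#       time.sleep(delay)
```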

---

## Transformations

### List Transformations

```
GET /api/transformations
```

### Create Transformation

```
POST /api/transformations
```

**Request Body:**

```json
{
  "name": "summarize",
  "title": "Summarize Content",
  "description": "Generate a concise summary",
  "prompt": "Summarize the following text...",
  "apply_default": false
}
```

### Execute Transformation

```
POST /api/transformations/execute
```

**Request Body:**

```json
{
  "transformation_id": "transformation:abc",
  "input_text": "Text to transform...",
  "model_id": "model:xyz"
}
```

### Get Default Prompt

```
GET /api/transformations/default-prompt
```

### Update Default Prompt

```
PUT /api/transformations/default-prompt
```

### Get Transformation

```
GET /api/transformations/{transformation_id}
```

### Update Transformation

```
PUT /api/transformations/{transformation_id}
```

### Delete Transformation

```
DELETE /api/transformations/{transformation_id}
```
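Creating a transformation and executing it are separate calls, so a small helper can chain them. The HTTP call is injected to keep the flow testable; the request-body fields follow the shapes above, while the assumption that the create response carries an `id` field is mine:

```python
def create_and_run(post, definition, input_text, model_id):
    """Create a transformation, then execute it on input_text.

    post: (path, json_body) -> dict, e.g. a thin wrapper around
          requests.post(BASE_URL + path, json=body).json()
    definition: body for POST /api/transformations (name, title, prompt, ...)
    Returns the execute response.
    """
    created = post("/transformations", definition)
    return post("/transformations/execute", {
        "transformation_id": created["id"],   # assumes the create response returns an id
        "input_text": input_text,
        "model_id": model_id,
    })
```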

---

## Models

### List Models

```
GET /api/models
```

**Query Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `model_type` | string | Filter by type (llm, embedding, stt, tts) |

### Create Model

```
POST /api/models
```

### Delete Model

```
DELETE /api/models/{model_id}
```

### Test Model

```
POST /api/models/{model_id}/test
```

### Get Default Models

```
GET /api/models/defaults
```

Returns default model assignments for seven service slots: chat, transformation, embedding, speech-to-text, text-to-speech, podcast, and summary.

### Update Default Models

```
PUT /api/models/defaults
```

### Get Providers

```
GET /api/models/providers
```

### Discover Models

```
GET /api/models/discover/{provider}
```

### Sync Models (Single Provider)

```
POST /api/models/sync/{provider}
```

### Sync All Models

```
POST /api/models/sync
```

### Auto-Assign Defaults

```
POST /api/models/auto-assign
```

Automatically populate empty default model slots using provider priority rankings.

### Get Model Count

```
GET /api/models/count/{provider}
```

### Get Models by Provider

```
GET /api/models/by-provider/{provider}
```
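When wiring up defaults, it helps to see what is registered per capability. A grouping helper over the `GET /api/models` list (a sketch; it assumes each model object carries a `type` field matching the `model_type` filter values, which this reference does not spell out):

```python
from collections import defaultdict


def models_by_type(models):
    """Group model objects by capability type (llm, embedding, stt, tts).

    Assumes each object has a `type` key; anything without one lands
    under "unknown".
    """
    grouped = defaultdict(list)
    for model in models:
        grouped[model.get("type", "unknown")].append(model)
    return dict(grouped)


# Usage (illustrative):
#   grouped = models_by_type(requests.get(f"{BASE_URL}/models").json())
```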

---

## Credentials

### Get Status

```
GET /api/credentials/status
```

### Get Environment Status

```
GET /api/credentials/env-status
```

### List Credentials

```
GET /api/credentials
```

**Query Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `provider` | string | Filter by provider |

### List by Provider

```
GET /api/credentials/by-provider/{provider}
```

### Create Credential

```
POST /api/credentials
```

**Request Body:**

```json
{
  "provider": "openai",
  "name": "My OpenAI Key",
  "api_key": "sk-...",
  "base_url": null
}
```

### Get Credential

```
GET /api/credentials/{credential_id}
```

Note: API key values are never returned.

### Update Credential

```
PUT /api/credentials/{credential_id}
```

### Delete Credential

```
DELETE /api/credentials/{credential_id}
```

### Test Credential

```
POST /api/credentials/{credential_id}/test
```

### Discover Models via Credential

```
POST /api/credentials/{credential_id}/discover
```

### Register Models via Credential

```
POST /api/credentials/{credential_id}/register-models
```

---

## Error Responses

The API returns standard HTTP status codes with JSON error bodies:

| Status | Meaning |
|--------|---------|
| 400 | Invalid input |
| 401 | Authentication required |
| 404 | Resource not found |
| 422 | Configuration error |
| 429 | Rate limited |
| 500 | Internal server error |
| 502 | External service error |

**Error Response Format:**

```json
{
  "detail": "Description of the error"
}
```
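Client code usually wants the `detail` string surfaced when a call fails. A minimal response checker mirroring the format above (a sketch; it assumes the body has already been parsed as JSON):

```python
class OpenNotebookError(Exception):
    """API error carrying the HTTP status and the server's detail message."""

    def __init__(self, status, detail):
        super().__init__(f"{status}: {detail}")
        self.status = status
        self.detail = detail


def check(status_code, body):
    """Raise OpenNotebookError for non-2xx responses, else return the body.

    Error responses carry a `detail` field per the format above; anything
    else is stringified as a fallback.
    """
    if 200 <= status_code < 300:
        return body
    detail = body.get("detail", "unknown error") if isinstance(body, dict) else str(body)
    raise OpenNotebookError(status_code, detail)


# Usage (illustrative):
#   resp = requests.get(f"{BASE_URL}/notebooks/{nb_id}")
#   notebook = check(resp.status_code, resp.json())
```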
163  scientific-skills/open-notebook/references/architecture.md  Normal file
@@ -0,0 +1,163 @@
# Open Notebook Architecture

## System Overview

Open Notebook is built as a modern Python web application with a clear separation between frontend and backend, using Docker for deployment.

```
┌─────────────────────────────────────────────────────┐
│                   Docker Compose                    │
│                                                     │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────┐  │
│  │   Next.js    │  │   FastAPI    │  │ SurrealDB │  │
│  │   Frontend   │──│   Backend    │──│           │  │
│  │  (port 8502) │  │  (port 5055) │  │(port 8000)│  │
│  └──────────────┘  └──────────────┘  └───────────┘  │
│                           │                         │
│                     ┌─────┴─────┐                   │
│                     │ LangChain │                   │
│                     │ Esperanto │                   │
│                     └─────┬─────┘                   │
│                           │                         │
│               ┌───────────┼───────────┐             │
│               │           │           │             │
│           ┌───┴───┐   ┌───┴───┐   ┌───┴───┐         │
│           │OpenAI │   │Claude │   │Ollama │  ...    │
│           └───────┘   └───────┘   └───────┘         │
└─────────────────────────────────────────────────────┘
```

## Core Components

### FastAPI Backend

The REST API is built with FastAPI and organized into routers:

- **20 route modules** covering notebooks, sources, notes, chat, search, podcasts, transformations, models, credentials, embeddings, settings, and more
- Async/await throughout for non-blocking I/O
- Pydantic models for request/response validation
- Custom exception handlers mapping domain errors to HTTP status codes
- CORS middleware for cross-origin access
- Optional password authentication middleware

### SurrealDB

SurrealDB serves as the primary data store, providing both document and relational capabilities:

- **Document storage** for notebooks, sources, notes, transformations, and models
- **Relational references** for notebook-source associations
- **Full-text search** across indexed content
- **RocksDB** backend for persistent storage on disk
- Schema migrations run automatically on application startup

### LangChain Integration

AI features are powered by LangChain with the Esperanto multi-provider library:

- **LangGraph** manages conversational state for chat sessions
- **Embedding models** power vector search across content
- **LLM chains** drive transformations, note generation, and podcast scripting
- **Prompt templates** stored in the `prompts/` directory

### Esperanto Multi-Provider Library

Esperanto provides a unified interface to 16+ AI providers:

- Abstracts provider-specific API differences
- Supports LLM, embedding, speech-to-text, and text-to-speech capabilities
- Handles credential management and model discovery
- Enables runtime provider switching without code changes

### Next.js Frontend

The user interface is a React application built with Next.js:

- Responsive design for desktop and tablet use
- Real-time updates for chat and processing status
- File upload with progress tracking
- Audio player for podcast episodes

## Data Flow

### Source Ingestion

```
Upload/URL → Source Record Created → Processing Queue
                                            │
                                 ┌──────────┼──────────┐
                                 ▼          ▼          ▼
                               Text     Embedding   Metadata
                            Extraction  Generation  Extraction
                                 │          │          │
                                 └──────────┼──────────┘
                                            ▼
                                     Source Updated
                                      (searchable)
```

### Chat Execution

```
User Message → Build Context (sources + notes)
                     │
                     ▼
          LangGraph State Machine
                     │
                     ├─ Retrieve relevant context
                     ├─ Format prompt with citations
                     └─ Stream LLM response
                     │
                     ▼
               Response with
              source citations
```

### Podcast Generation

```
Notebook Content → Episode Profile → Script Generation (LLM)
                                              │
                                              ▼
                                     Speaker Assignment
                                              │
                                              ▼
                                       Text-to-Speech
                                       (per segment)
                                              │
                                              ▼
                                      Audio Assembly
                                              │
                                              ▼
                                      Episode Record
                                      + Audio File
```

## Key Design Decisions

1. **Multi-provider by default**: Not locked to any single AI provider, enabling cost optimization and capability matching
2. **Async processing**: Long-running operations (source ingestion, podcast generation) run asynchronously with status polling
3. **Self-hosted data**: All data stays on the user's infrastructure with encrypted credential storage
4. **REST-first API**: Every UI action is backed by an API endpoint for automation
5. **Docker-native**: Designed for containerized deployment with persistent volumes

## File Structure

```
open-notebook/
├── api/                   # FastAPI REST API
│   ├── main.py            # App setup, middleware, routers
│   ├── routers/           # Route handlers (20 modules)
│   ├── models.py          # Pydantic request/response models
│   └── auth.py            # Authentication middleware
├── open_notebook/         # Core library
│   ├── ai/                # AI integration (LangChain, Esperanto)
│   ├── database/          # SurrealDB operations
│   ├── domain/            # Domain models and business logic
│   ├── graphs/            # LangGraph chat and processing graphs
│   ├── podcasts/          # Podcast generation pipeline
│   └── utils/             # Shared utilities
├── frontend/              # Next.js React application
├── prompts/               # AI prompt templates
├── tests/                 # Test suite
└── docker-compose.yml     # Deployment configuration
```
226  scientific-skills/open-notebook/references/configuration.md  Normal file
@@ -0,0 +1,226 @@
# Open Notebook Configuration Guide

## Docker Deployment

Open Notebook is deployed as a Docker Compose stack with two main services: the application server and SurrealDB.

### Minimal docker-compose.yml

```yaml
version: "3.8"

services:
  surrealdb:
    image: surrealdb/surrealdb:latest
    command: start --user root --pass root rocksdb://data/database.db
    volumes:
      - surrealdb_data:/data
    ports:
      - "8000:8000"

  open-notebook:
    image: ghcr.io/lfnovo/open-notebook:latest
    depends_on:
      - surrealdb
    environment:
      - OPEN_NOTEBOOK_ENCRYPTION_KEY=${OPEN_NOTEBOOK_ENCRYPTION_KEY}
      - SURREAL_URL=ws://surrealdb:8000/rpc
      - SURREAL_NAMESPACE=open_notebook
      - SURREAL_DATABASE=open_notebook
    ports:
      - "8502:8502"  # Frontend UI
      - "5055:5055"  # REST API
    volumes:
      - on_uploads:/app/uploads

volumes:
  surrealdb_data:
  on_uploads:
```

### Starting the Stack

```bash
# Set the encryption key (required)
export OPEN_NOTEBOOK_ENCRYPTION_KEY="your-secure-random-key"

# Start services
docker-compose up -d

# View logs
docker-compose logs -f open-notebook

# Stop services
docker-compose down

# Stop and remove data
docker-compose down -v
```
## Environment Variables

### Required

| Variable | Description |
|----------|-------------|
| `OPEN_NOTEBOOK_ENCRYPTION_KEY` | Secret key for encrypting stored API credentials. Must be set before first launch and kept consistent. |

### Database

| Variable | Default | Description |
|----------|---------|-------------|
| `SURREAL_URL` | `ws://surrealdb:8000/rpc` | SurrealDB WebSocket connection URL |
| `SURREAL_NAMESPACE` | `open_notebook` | SurrealDB namespace |
| `SURREAL_DATABASE` | `open_notebook` | SurrealDB database name |
| `SURREAL_USER` | `root` | SurrealDB username |
| `SURREAL_PASS` | `root` | SurrealDB password |

### Application

| Variable | Default | Description |
|----------|---------|-------------|
| `OPEN_NOTEBOOK_PASSWORD` | None | Optional password protection for the web UI |
| `UPLOAD_DIR` | `/app/uploads` | Directory for uploaded file storage |

### AI Provider Keys (Legacy)

API keys can also be set via environment variables for legacy compatibility. The preferred method is using the credentials API or UI.

| Variable | Provider |
|----------|----------|
| `OPENAI_API_KEY` | OpenAI |
| `ANTHROPIC_API_KEY` | Anthropic |
| `GOOGLE_API_KEY` | Google GenAI |
| `GROQ_API_KEY` | Groq |
| `MISTRAL_API_KEY` | Mistral |
| `ELEVENLABS_API_KEY` | ElevenLabs |

## AI Provider Configuration

### Via UI

1. Go to **Settings > API Keys**
2. Click **Add Credential**
3. Select provider, enter API key and optional base URL
4. Click **Test Connection** to verify
5. Click **Discover Models** to find available models
6. Select models to register

### Via API

```python
import requests

BASE_URL = "http://localhost:5055/api"

# 1. Create credential
cred = requests.post(f"{BASE_URL}/credentials", json={
    "provider": "anthropic",
    "name": "Anthropic Production",
    "api_key": "sk-ant-..."
}).json()

# 2. Test connection
test = requests.post(f"{BASE_URL}/credentials/{cred['id']}/test").json()
assert test["success"]

# 3. Discover and register models
discovered = requests.post(
    f"{BASE_URL}/credentials/{cred['id']}/discover"
).json()

requests.post(
    f"{BASE_URL}/credentials/{cred['id']}/register-models",
    json={"model_ids": [m["id"] for m in discovered["models"]]}
)

# 4. Auto-assign defaults
requests.post(f"{BASE_URL}/models/auto-assign")
```

### Using Ollama (Free Local Inference)

For free AI inference without API costs, use Ollama:

```yaml
# docker-compose-ollama.yml addition
services:
  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
```

Then configure Ollama as a provider with base URL `http://ollama:11434`.

## Security Configuration

### Password Protection

Set `OPEN_NOTEBOOK_PASSWORD` to require authentication:

```bash
export OPEN_NOTEBOOK_PASSWORD="your-ui-password"
```

### Reverse Proxy (Nginx Example)

```nginx
server {
    listen 443 ssl;
    server_name notebook.example.com;

    ssl_certificate /etc/ssl/certs/cert.pem;
    ssl_certificate_key /etc/ssl/private/key.pem;

    location / {
        proxy_pass http://localhost:8502;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }

    location /api/ {
        proxy_pass http://localhost:5055/api/;
        proxy_set_header Host $host;
    }
}
```

## Backup and Restore

### Backup SurrealDB Data

```bash
# Export database
docker exec surrealdb surreal export \
  --conn ws://localhost:8000 \
  --user root --pass root \
  --ns open_notebook --db open_notebook \
  /tmp/backup.surql

# Copy backup from container
docker cp surrealdb:/tmp/backup.surql ./backup.surql
```

### Backup Uploaded Files

```bash
# Copy upload volume contents
docker cp open-notebook:/app/uploads ./uploads_backup/
```

### Restore

```bash
# Import database backup
docker cp ./backup.surql surrealdb:/tmp/backup.surql
docker exec surrealdb surreal import \
  --conn ws://localhost:8000 \
  --user root --pass root \
  --ns open_notebook --db open_notebook \
  /tmp/backup.surql
```
290  scientific-skills/open-notebook/references/examples.md  Normal file
@@ -0,0 +1,290 @@
|
||||
# Open Notebook Examples
|
||||
|
||||
## Complete Research Workflow
|
||||
|
||||
This example demonstrates a full research workflow: creating a notebook, adding sources, generating notes, chatting with the AI, and searching across materials.
|
||||
|
||||
```python
|
||||
import requests
|
||||
import time
|
||||
|
||||
BASE_URL = "http://localhost:5055/api"
|
||||
|
||||
|
||||
def complete_research_workflow():
|
||||
"""End-to-end research workflow with Open Notebook."""
|
||||
|
||||
# 1. Create a research notebook
|
||||
notebook = requests.post(f"{BASE_URL}/notebooks", json={
|
||||
"name": "Drug Resistance in Cancer",
|
||||
"description": "Review of mechanisms of drug resistance in solid tumors"
|
||||
}).json()
|
||||
notebook_id = notebook["id"]
|
||||
print(f"Created notebook: {notebook_id}")
|
||||
|
||||
    # 2. Add sources from URLs
    urls = [
        "https://www.nature.com/articles/s41568-020-0281-y",
        "https://www.cell.com/cancer-cell/fulltext/S1535-6108(20)30211-8",
    ]

    source_ids = []
    for url in urls:
        source = requests.post(f"{BASE_URL}/sources", data={
            "url": url,
            "notebook_id": notebook_id,
            "process_async": "true"
        }).json()
        source_ids.append(source["id"])
        print(f"Added source: {source['id']}")

    # 3. Wait for processing to complete
    for source_id in source_ids:
        while True:
            status = requests.get(
                f"{BASE_URL}/sources/{source_id}/status"
            ).json()
            if status.get("status") in ("completed", "failed"):
                break
            time.sleep(5)
        print(f"Source {source_id}: {status['status']}")

    # 4. Create a chat session and ask questions
    session = requests.post(f"{BASE_URL}/chat/sessions", json={
        "notebook_id": notebook_id,
        "title": "Resistance Mechanisms"
    }).json()

    answer = requests.post(f"{BASE_URL}/chat/execute", json={
        "session_id": session["id"],
        "message": "What are the primary mechanisms of drug resistance in solid tumors?",
        "context": {"include_sources": True, "include_notes": True}
    }).json()
    print(f"AI response: {answer}")

    # 5. Search across materials
    results = requests.post(f"{BASE_URL}/search", json={
        "query": "efflux pump resistance mechanism",
        "search_type": "vector",
        "limit": 5
    }).json()
    print(f"Found {results['total']} search results")

    # 6. Create a human note summarizing findings
    note = requests.post(f"{BASE_URL}/notes", json={
        "title": "Summary of Resistance Mechanisms",
        "content": "Key findings from the literature...",
        "note_type": "human",
        "notebook_id": notebook_id
    }).json()
    print(f"Created note: {note['id']}")


if __name__ == "__main__":
    complete_research_workflow()
```
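The step-3 polling loop can be factored into a reusable helper with a timeout, so a stuck or failed job never blocks a script forever. This is a convenience sketch built only on the `/sources/{source_id}/status` endpoint used above; the `timeout` and `poll_interval` defaults are illustrative choices, not values mandated by the API.

```python
import time

import requests

BASE_URL = "http://localhost:5055/api"


def wait_for_source(source_id, timeout=300, poll_interval=5):
    """Poll a source's processing status until it finishes or times out.

    Returns the final status string ("completed" or "failed").
    Raises TimeoutError if the source is still processing after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = requests.get(f"{BASE_URL}/sources/{source_id}/status").json()
        if status.get("status") in ("completed", "failed"):
            return status["status"]
        time.sleep(poll_interval)
    raise TimeoutError(f"Source {source_id} did not finish within {timeout}s")


# Usage (requires a running server):
#     final = wait_for_source("source:abc123")
```

Using `time.monotonic()` for the deadline keeps the timeout correct even if the system clock is adjusted mid-run.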
## File Upload Example

```python
import requests

BASE_URL = "http://localhost:5055/api"


def upload_research_papers(notebook_id, file_paths):
    """Upload multiple research papers to a notebook."""
    for path in file_paths:
        with open(path, "rb") as f:
            response = requests.post(
                f"{BASE_URL}/sources",
                data={
                    "notebook_id": notebook_id,
                    "process_async": "true",
                },
                files={"file": (path.split("/")[-1], f)},
            )
        if response.status_code == 200:
            print(f"Uploaded: {path}")
        else:
            print(f"Failed: {path} - {response.text}")


# Usage
upload_research_papers("notebook:abc123", [
    "papers/study_1.pdf",
    "papers/study_2.pdf",
    "papers/supplementary.docx",
])
```
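When papers live in a directory tree rather than a hand-written list, the same upload endpoint can be driven from `pathlib`. A sketch along those lines; `SUPPORTED_SUFFIXES` is an illustrative subset — check the server's documentation for the formats it actually accepts.

```python
from pathlib import Path

import requests

BASE_URL = "http://localhost:5055/api"

# Illustrative subset of formats; the server may accept more or fewer.
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".txt", ".md"}


def collect_uploadable_files(directory):
    """Return sorted paths under `directory` with a supported suffix."""
    return sorted(
        p for p in Path(directory).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


def upload_directory(notebook_id, directory):
    """Upload every supported file in a directory tree to a notebook."""
    for path in collect_uploadable_files(directory):
        with path.open("rb") as f:
            response = requests.post(
                f"{BASE_URL}/sources",
                data={"notebook_id": notebook_id, "process_async": "true"},
                files={"file": (path.name, f)},
            )
        print(f"{path}: {'ok' if response.ok else response.text}")


# Usage (requires a running server):
#     upload_directory("notebook:abc123", "papers/")
```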
## Podcast Generation Example

```python
import requests
import time

BASE_URL = "http://localhost:5055/api"


def generate_research_podcast(notebook_id):
    """Generate a podcast episode from notebook contents."""

    # Get available episode and speaker profiles
    # (these must be configured in the UI or via API first)

    # Submit podcast generation job
    job = requests.post(f"{BASE_URL}/podcasts/generate", json={
        "notebook_id": notebook_id,
        "episode_profile_id": "episode_profile:default",
        "speaker_profile_ids": [
            "speaker_profile:host",
            "speaker_profile:expert"
        ]
    }).json()
    job_id = job["job_id"]
    print(f"Podcast generation started: {job_id}")

    # Poll for completion
    while True:
        status = requests.get(f"{BASE_URL}/podcasts/jobs/{job_id}").json()
        print(f"Status: {status.get('status', 'processing')}")
        if status.get("status") in ("completed", "failed"):
            break
        time.sleep(10)

    if status["status"] == "completed":
        # Download the audio
        episode_id = status["episode_id"]
        audio = requests.get(
            f"{BASE_URL}/podcasts/episodes/{episode_id}/audio"
        )
        with open("research_podcast.mp3", "wb") as f:
            f.write(audio.content)
        print("Podcast saved to research_podcast.mp3")


if __name__ == "__main__":
    generate_research_podcast("notebook:abc123")
```
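For long episodes the audio file can be large, and reading the whole response into memory via `audio.content` is wasteful. Streaming the same episode-audio endpoint to disk keeps memory usage flat. A sketch; the 1 MiB chunk size is an illustrative default.

```python
import requests

BASE_URL = "http://localhost:5055/api"


def download_episode_audio(episode_id, dest_path, chunk_size=1 << 20):
    """Stream an episode's audio to disk in fixed-size chunks."""
    with requests.get(
        f"{BASE_URL}/podcasts/episodes/{episode_id}/audio", stream=True
    ) as response:
        response.raise_for_status()
        with open(dest_path, "wb") as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return dest_path


# Usage (requires a running server):
#     download_episode_audio("episode:abc123", "research_podcast.mp3")
```

Passing `stream=True` defers the body download until `iter_content` is consumed, so only one chunk is held in memory at a time.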
## Custom Transformation Pipeline

```python
import requests

BASE_URL = "http://localhost:5055/api"


def create_and_run_transformations():
    """Create custom transformations and apply them to content."""

    # Create a methodology extraction transformation
    transform = requests.post(f"{BASE_URL}/transformations", json={
        "name": "extract_methods",
        "title": "Extract Methods",
        "description": "Extract and structure methodology from papers",
        "prompt": (
            "Extract the methodology section from this text. "
            "Organize into: Study Design, Sample Size, Statistical Methods, "
            "and Key Variables. Format as structured markdown."
        ),
        "apply_default": False,
    }).json()

    # Get models to find a suitable one
    models = requests.get(f"{BASE_URL}/models", params={
        "model_type": "llm"
    }).json()
    model_id = models[0]["id"]

    # Execute the transformation
    result = requests.post(f"{BASE_URL}/transformations/execute", json={
        "transformation_id": transform["id"],
        "input_text": "We conducted a randomized controlled trial with...",
        "model_id": model_id,
    }).json()
    print(f"Extracted methods:\n{result['output']}")


if __name__ == "__main__":
    create_and_run_transformations()
```
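Once a transformation has been created, it can be applied to many texts in a loop using the same `/transformations/execute` endpoint. This batch helper is a convenience sketch; the transformation and model IDs shown in the usage comment are placeholders.

```python
import requests

BASE_URL = "http://localhost:5055/api"


def run_transformation_batch(transformation_id, model_id, texts):
    """Apply one saved transformation to many input texts, collecting outputs."""
    outputs = []
    for text in texts:
        result = requests.post(f"{BASE_URL}/transformations/execute", json={
            "transformation_id": transformation_id,
            "input_text": text,
            "model_id": model_id,
        }).json()
        outputs.append(result.get("output"))
    return outputs


# Usage (requires a running server; IDs are placeholders):
#     methods = run_transformation_batch(
#         "transformation:abc123",
#         "model:abc123",
#         ["Paper 1 methods text...", "Paper 2 methods text..."],
#     )
```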
## Semantic Search with Filtering

```python
import requests

BASE_URL = "http://localhost:5055/api"


def advanced_search(notebook_id, query):
    """Perform filtered semantic search and get AI answers."""

    # Get sources from a specific notebook
    sources = requests.get(f"{BASE_URL}/sources", params={
        "notebook_id": notebook_id
    }).json()
    source_ids = [s["id"] for s in sources]

    # Vector search restricted to notebook sources
    results = requests.post(f"{BASE_URL}/search", json={
        "query": query,
        "search_type": "vector",
        "limit": 10,
        "source_ids": source_ids,
        "min_similarity": 0.75,
    }).json()

    print(f"Found {results['total']} results:")
    for result in results["results"]:
        print(f"  - {result.get('title', 'Untitled')} "
              f"(similarity: {result.get('similarity', 'N/A')})")

    # Get an AI-powered answer
    answer = requests.post(f"{BASE_URL}/search/ask/simple", json={
        "query": query,
    }).json()
    print(f"\nAI Answer: {answer['response']}")


if __name__ == "__main__":
    advanced_search("notebook:abc123", "CRISPR gene editing efficiency")
```
## Model Management

```python
import requests

BASE_URL = "http://localhost:5055/api"


def setup_ai_models():
    """Configure AI models for Open Notebook."""

    # Check available providers
    providers = requests.get(f"{BASE_URL}/models/providers").json()
    print(f"Available providers: {providers}")

    # Discover models from a provider
    discovered = requests.get(
        f"{BASE_URL}/models/discover/openai"
    ).json()
    print(f"Discovered {len(discovered)} OpenAI models")

    # Sync models to make them available
    requests.post(f"{BASE_URL}/models/sync/openai")

    # Auto-assign default models
    requests.post(f"{BASE_URL}/models/auto-assign")

    # Check current defaults
    defaults = requests.get(f"{BASE_URL}/models/defaults").json()
    print(f"Default models: {defaults}")


if __name__ == "__main__":
    setup_ai_models()
```
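Before running any of the workflows above, it can help to confirm the server is reachable. The `/health` route is served from the API root and is excluded from authentication, so this check works even when `OPEN_NOTEBOOK_PASSWORD` is set. A minimal sketch:

```python
import requests

# Root server URL (no /api prefix: /health is served at the root)
BASE_URL = "http://localhost:5055"


def check_server(base_url=BASE_URL, timeout=5):
    """Return True if the Open Notebook API answers its /health endpoint."""
    try:
        response = requests.get(f"{base_url}/health", timeout=timeout)
        return response.ok
    except requests.RequestException:
        return False


# Usage:
#     if not check_server():
#         raise SystemExit(f"Open Notebook API is not reachable at {BASE_URL}")
```

Catching `requests.RequestException` covers both connection failures and timeouts, so the function always returns a boolean instead of raising.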