claude-scientific-skills/scientific-skills/open-notebook/references/architecture.md
2026-02-23 22:52:10 -05:00


# Open Notebook Architecture
## System Overview
Open Notebook is a web application with a Next.js frontend, a Python (FastAPI) backend, and a SurrealDB database. The three services are kept separate and deployed together with Docker Compose.
```
┌─────────────────────────────────────────────────────┐
│                   Docker Compose                    │
│                                                     │
│ ┌──────────────┐  ┌──────────────┐  ┌─────────────┐ │
│ │   Next.js    │  │   FastAPI    │  │  SurrealDB  │ │
│ │   Frontend   │──│   Backend    │──│             │ │
│ │ (port 8502)  │  │ (port 5055)  │  │ (port 8000) │ │
│ └──────────────┘  └──────────────┘  └─────────────┘ │
│                          │                          │
│                    ┌─────┴─────┐                    │
│                    │ LangChain │                    │
│                    │ Esperanto │                    │
│                    └─────┬─────┘                    │
│                          │                          │
│              ┌───────────┼───────────┐              │
│              │           │           │              │
│          ┌───┴───┐   ┌───┴───┐   ┌───┴───┐          │
│          │OpenAI │   │Claude │   │Ollama │ ...      │
│          └───────┘   └───────┘   └───────┘          │
└─────────────────────────────────────────────────────┘
```
## Core Components
### FastAPI Backend
The REST API is built with FastAPI and organized into routers:
- **20 route modules** covering notebooks, sources, notes, chat, search, podcasts, transformations, models, credentials, embeddings, settings, and more
- Async/await throughout for non-blocking I/O
- Pydantic models for request/response validation
- Custom exception handlers mapping domain errors to HTTP status codes
- CORS middleware for cross-origin access
- Optional password authentication middleware
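
To make the exception-mapping bullet concrete, here is a minimal, framework-free sketch of how domain errors might be translated into HTTP status codes. The exception classes and the `to_http_response` helper are hypothetical names chosen for illustration, not the project's own. In a FastAPI application a table like this would typically sit behind registered exception handlers.

```python
from http import HTTPStatus


# Hypothetical domain errors; the project's real exception names may differ.
class DomainError(Exception):
    """Base class for errors raised by business logic."""


class NotFoundError(DomainError):
    pass


class InvalidInputError(DomainError):
    pass


class AuthenticationError(DomainError):
    pass


# Specific classes come first; the base class acts as the fallback.
STATUS_BY_ERROR = {
    NotFoundError: HTTPStatus.NOT_FOUND,
    InvalidInputError: HTTPStatus.BAD_REQUEST,
    AuthenticationError: HTTPStatus.UNAUTHORIZED,
    DomainError: HTTPStatus.INTERNAL_SERVER_ERROR,
}


def to_http_response(exc: Exception) -> tuple[int, dict]:
    """Translate an exception into a (status code, JSON body) pair."""
    for error_type, status in STATUS_BY_ERROR.items():
        if isinstance(exc, error_type):
            return int(status), {"detail": str(exc)}
    # Unknown exceptions are reported without leaking internals.
    return int(HTTPStatus.INTERNAL_SERVER_ERROR), {"detail": "Internal server error"}
```

Keeping the mapping in one table means route handlers can raise domain errors freely and stay unaware of HTTP details.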
### SurrealDB
SurrealDB serves as the primary data store, providing both document and relational capabilities:
- **Document storage** for notebooks, sources, notes, transformations, and models
- **Relational references** for notebook-source associations
- **Full-text search** across indexed content
- **RocksDB** backend for persistent storage on disk
- Schema migrations run automatically on application startup
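
The startup-migration behaviour can be pictured with a small sketch. It assumes migrations are numbered and applied in order, which is a common scheme but not confirmed by this document; the `MIGRATIONS` list, the statements in it, and `run_pending` are all hypothetical.

```python
from typing import Callable

# Hypothetical migrations: (version, statement) pairs in ascending order.
MIGRATIONS: list[tuple[int, str]] = [
    (1, "DEFINE TABLE notebook SCHEMALESS;"),
    (2, "DEFINE TABLE source SCHEMALESS;"),
    (3, "DEFINE TABLE note SCHEMALESS;"),
]


def run_pending(current_version: int, execute: Callable[[str], None]) -> int:
    """Apply every migration newer than current_version; return the new version."""
    for version, statement in MIGRATIONS:
        if version > current_version:
            execute(statement)  # a real runner would send this to the database
            current_version = version
    return current_version


# Usage: a database at version 1 receives only migrations 2 and 3.
executed: list[str] = []
new_version = run_pending(1, executed.append)
```

Running the function again with the returned version applies nothing, which is what makes it safe to call on every startup.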
### LangChain Integration
AI features are powered by LangChain with the Esperanto multi-provider library:
- **LangGraph** manages conversational state for chat sessions
- **Embedding models** power vector search across content
- **LLM chains** drive transformations, note generation, and podcast scripting
- **Prompt templates** stored in the `prompts/` directory
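
As a rough illustration of what vector search does with embeddings, the sketch below ranks stored chunks by cosine similarity to a query vector. It uses toy three-dimensional vectors and plain Python; the identifiers and data are invented for the example, and the project's actual search path is not shown in this document.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k chunks whose embeddings are closest to the query."""
    ranked = sorted(
        chunks,
        key=lambda chunk_id: cosine_similarity(query, chunks[chunk_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy vectors; real embeddings come from an embedding model
# and have hundreds of dimensions.
chunks = {
    "source:1": [1.0, 0.0, 0.0],
    "source:2": [0.9, 0.1, 0.0],
    "note:1": [0.0, 1.0, 0.0],
}
best = top_k([1.0, 0.05, 0.0], chunks)  # ['source:1', 'source:2']
```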
### Esperanto Multi-Provider Library
Esperanto provides a unified interface to 16+ AI providers:
- Abstracts provider-specific API differences
- Supports LLM, embedding, speech-to-text, and text-to-speech capabilities
- Handles credential management and model discovery
- Enables runtime provider switching without code changes
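
The idea of a unified interface with runtime provider switching can be sketched as a small registry. This is not Esperanto's API: `LanguageModel`, `create_language_model`, and the two stand-in providers are hypothetical names used only to show the pattern.

```python
from typing import Callable, Protocol


class LanguageModel(Protocol):
    """The single interface callers depend on, whatever the provider."""

    def complete(self, prompt: str) -> str: ...


# Stand-in providers; real ones would call each vendor's API.
class EchoProvider:
    def __init__(self, model: str) -> None:
        self.model = model

    def complete(self, prompt: str) -> str:
        return f"[echo/{self.model}] {prompt}"


class UppercaseProvider:
    def __init__(self, model: str) -> None:
        self.model = model

    def complete(self, prompt: str) -> str:
        return f"[upper/{self.model}] {prompt.upper()}"


# Registry keyed by provider name, so the choice can come from configuration.
PROVIDERS: dict[str, Callable[[str], LanguageModel]] = {
    "echo": EchoProvider,
    "upper": UppercaseProvider,
}


def create_language_model(provider: str, model: str) -> LanguageModel:
    """Look up a provider by name at runtime and build a model client."""
    try:
        factory = PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"Unknown provider: {provider!r}") from None
    return factory(model)
```

Because callers only see `complete`, switching providers becomes a change to the string passed in, for example from a settings record, with no change to calling code.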
### Next.js Frontend
The user interface is a React application built with Next.js:
- Responsive design for desktop and tablet use
- Real-time updates for chat and processing status
- File upload with progress tracking
- Audio player for podcast episodes
## Data Flow
### Source Ingestion
```
Upload/URL → Source Record Created → Processing Queue
                                            │
                                 ┌──────────┼──────────┐
                                 ▼          ▼          ▼
                                Text    Embedding   Metadata
                            Extraction Generation Extraction
                                 │          │          │
                                 └──────────┼──────────┘
                                            ▼
                                     Source Updated
                                      (searchable)
```
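
The fan-out and fan-in in the ingestion diagram can be sketched with `asyncio.gather`. The `Source` dataclass, the status strings, and the three placeholder steps are assumptions made for this example; it follows the diagram as drawn and is not taken from the project's code.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Source:
    raw: str
    status: str = "queued"
    text: str = ""
    embedding: list[float] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


# Placeholder steps; real ones would parse files and call an embedding model.
async def extract_text(raw: str) -> str:
    return raw.strip()


async def generate_embedding(raw: str) -> list[float]:
    return [float(len(raw))]


async def extract_metadata(raw: str) -> dict:
    return {"words": len(raw.split())}


async def process_source(source: Source) -> Source:
    source.status = "processing"
    # Fan out: the three steps run concurrently, as in the diagram.
    source.text, source.embedding, source.metadata = await asyncio.gather(
        extract_text(source.raw),
        generate_embedding(source.raw),
        extract_metadata(source.raw),
    )
    # Fan in: only now is the source marked searchable.
    source.status = "searchable"
    return source


result = asyncio.run(process_source(Source(raw="  hello open notebook  ")))
```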
### Chat Execution
```
User Message → Build Context (sources + notes)
                              ▼
                   LangGraph State Machine
                    ├─ Retrieve relevant context
                    ├─ Format prompt with citations
                    └─ Stream LLM response
                              ▼
                        Response with
                       source citations
```
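
The prompt-formatting and streaming steps of the chat flow can be sketched in plain Python. This sketch does not use LangGraph: `build_prompt`, `fake_llm_stream`, and the canned reply are hypothetical stand-ins that only show how numbered context lets a streamed answer carry citations.

```python
import asyncio
from typing import AsyncIterator


def build_prompt(question: str, context: dict[str, str]) -> str:
    """Number each context item so the answer can cite it as [1], [2], ..."""
    lines = [
        f"[{i}] ({ref}) {text}"
        for i, (ref, text) in enumerate(context.items(), start=1)
    ]
    return (
        "Context:\n" + "\n".join(lines)
        + f"\n\nQuestion: {question}\nCite sources as [n]."
    )


async def fake_llm_stream(prompt: str) -> AsyncIterator[str]:
    """Stand-in for a streaming LLM call; yields a canned reply token by token."""
    for token in ["SurrealDB ", "stores ", "the ", "data ", "[1]."]:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield token


async def answer(question: str, context: dict[str, str]) -> str:
    prompt = build_prompt(question, context)
    parts = []
    async for token in fake_llm_stream(prompt):
        parts.append(token)  # a real server would forward each token to the client
    return "".join(parts)


context = {"source:1": "Open Notebook stores data in SurrealDB."}
reply = asyncio.run(answer("Where is data stored?", context))
```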
### Podcast Generation
```
Notebook Content → Episode Profile → Script Generation (LLM)
                                                ▼
                                       Speaker Assignment
                                                ▼
                                         Text-to-Speech
                                          (per segment)
                                                ▼
                                         Audio Assembly
                                                ▼
                                         Episode Record
                                          + Audio File
```
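
The later stages of the podcast pipeline (speaker assignment, per-segment synthesis, assembly) can be sketched as three small functions. Everything here is illustrative: the round-robin assignment is one possible policy, and `synthesize` returns fake bytes in place of a real text-to-speech call.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Segment:
    speaker: str
    text: str


def assign_speakers(lines: list[str], speakers: list[str]) -> list[Segment]:
    """Alternate speakers across script lines (round-robin)."""
    return [Segment(speaker, text) for speaker, text in zip(cycle(speakers), lines)]


def synthesize(segment: Segment) -> bytes:
    """Stand-in for a text-to-speech call; returns fake audio bytes."""
    return f"<{segment.speaker}:{segment.text}>".encode("utf-8")


def assemble(clips: list[bytes]) -> bytes:
    """Join per-segment clips into one episode file."""
    return b"".join(clips)


script = ["Welcome to the show.", "Today: notebooks.", "Let's begin."]
segments = assign_speakers(script, ["Host", "Guest"])
episode_audio = assemble([synthesize(segment) for segment in segments])
```

Synthesizing per segment keeps each text-to-speech request small and lets each speaker use a different voice before the clips are joined.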
## Key Design Decisions
1. **Multi-provider by default**: Not locked to any single AI provider, enabling cost optimization and capability matching
2. **Async processing**: Long-running operations (source ingestion, podcast generation) run asynchronously with status polling
3. **Self-hosted data**: All data stays on the user's infrastructure with encrypted credential storage
4. **REST-first API**: Every UI action is backed by an API endpoint for automation
5. **Docker-native**: Designed for containerized deployment with persistent volumes
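
Design decision 2 (asynchronous processing with status polling) can be sketched with `asyncio`. The in-memory `JOBS` table, the status strings, and the function names are assumptions for the example; a real deployment would keep job state in the database and poll over the REST API.

```python
import asyncio
import uuid

JOBS: dict[str, str] = {}  # job id -> "queued" | "running" | "completed" | "failed"


async def _run(job_id: str, seconds: float) -> None:
    JOBS[job_id] = "running"
    try:
        await asyncio.sleep(seconds)  # stands in for ingestion or podcast work
        JOBS[job_id] = "completed"
    except Exception:
        JOBS[job_id] = "failed"


def submit(seconds: float) -> tuple[str, asyncio.Task]:
    """Start work in the background and return a job id immediately."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = "queued"
    task = asyncio.create_task(_run(job_id, seconds))
    return job_id, task


async def wait_for(job_id: str, interval: float = 0.01, timeout: float = 2.0) -> str:
    """Poll the job status until it reaches a terminal state or times out."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while JOBS[job_id] not in ("completed", "failed"):
        if loop.time() > deadline:
            raise TimeoutError(f"job {job_id} still {JOBS[job_id]}")
        await asyncio.sleep(interval)
    return JOBS[job_id]


async def main() -> str:
    # Keep a reference to the task so it is not garbage-collected mid-run.
    job_id, _task = submit(0.05)
    return await wait_for(job_id)


final_status = asyncio.run(main())
```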
## File Structure
```
open-notebook/
├── api/                    # FastAPI REST API
│   ├── main.py             # App setup, middleware, routers
│   ├── routers/            # Route handlers (20 modules)
│   ├── models.py           # Pydantic request/response models
│   └── auth.py             # Authentication middleware
├── open_notebook/          # Core library
│   ├── ai/                 # AI integration (LangChain, Esperanto)
│   ├── database/           # SurrealDB operations
│   ├── domain/             # Domain models and business logic
│   ├── graphs/             # LangGraph chat and processing graphs
│   ├── podcasts/           # Podcast generation pipeline
│   └── utils/              # Shared utilities
├── frontend/               # Next.js React application
├── prompts/                # AI prompt templates
├── tests/                  # Test suite
└── docker-compose.yml      # Deployment configuration
```