Vinayak Agarwal f72b7f4521 Added parallel-web skill
Refactor research lookup skill to enhance backend routing and update documentation. The skill now intelligently selects between the Parallel Chat API and Perplexity sonar-pro-search based on query type. Added compatibility notes, license information, and improved descriptions for clarity. Removed outdated example scripts to streamline the codebase.
2026-03-01 07:36:19 -08:00


# Deep Research Guide
Comprehensive guide to using Parallel's Task API for deep research, including processor selection, output formats, structured schemas, and advanced patterns.
---
## Overview
Deep Research transforms natural language research queries into comprehensive intelligence reports. Unlike simple search, it performs multi-step web exploration across authoritative sources and synthesizes findings with inline citations and confidence levels.
**Key characteristics:**
- Multi-step, multi-source research
- Automatic citation and source attribution
- Structured or text output formats
- Asynchronous processing (30 seconds to 25+ minutes)
- Research basis with confidence levels per finding
---
## Processor Selection
Choosing the right processor is the most important decision. It determines research depth, speed, and cost.
### Decision Matrix
| Scenario | Recommended Processor | Why |
|----------|----------------------|-----|
| Quick background for a paper section | `pro-fast` | Fast, good depth, low cost |
| Comprehensive market research report | `ultra-fast` | Deep multi-source synthesis |
| Simple fact lookup or metadata | `base-fast` | Fast, low cost |
| Competitive landscape analysis | `pro-fast` | Good balance of depth and speed |
| Background for grant proposal | `pro-fast` | Thorough but timely |
| State-of-the-art review for a topic | `ultra-fast` | Maximum source coverage |
| Quick question during writing | `core-fast` | Sub-2-minute response |
| Breaking news or very recent events | `pro` (standard) | Freshest data prioritized |
| Large-scale data enrichment | `base-fast` | Cost-effective at scale |
### Processor Tiers Explained
**`pro-fast`** (default, recommended for most tasks):
- Latency: 30 seconds to 5 minutes
- Depth: Explores 10-20+ web sources
- Best for: Section-level research, background gathering, comparative analysis
- Cost: $0.10 per query
**`ultra-fast`** (for comprehensive research):
- Latency: 1 to 10 minutes
- Depth: Explores 20-50+ web sources, multiple reasoning steps
- Best for: Full reports, market analysis, complex multi-faceted questions
- Cost: $0.30 per query
**`core-fast`** (quick cross-referenced answers):
- Latency: 15 to 100 seconds
- Depth: Cross-references 5-10 sources
- Best for: Moderate complexity questions, verification tasks
- Cost: $0.025 per query
**`base-fast`** (simple enrichment):
- Latency: 15 to 50 seconds
- Depth: Standard web lookup, 3-5 sources
- Best for: Simple factual queries, metadata enrichment
- Cost: $0.01 per query
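The tiers above can be condensed into a small selection helper. This is an illustrative sketch of the decision matrix, not part of the SDK; the function name and thresholds are assumptions drawn from the source counts and prices listed above:

```python
def choose_processor(n_sources_needed: int, need_freshest: bool = False) -> str:
    """Pick a processor tier per the decision matrix (illustrative heuristic)."""
    if need_freshest:
        return "pro"        # standard tier: freshest data (breaking news, live data)
    if n_sources_needed <= 5:
        return "base-fast"  # simple lookup / enrichment, ~$0.01
    if n_sources_needed <= 10:
        return "core-fast"  # cross-referenced answer, ~$0.025
    if n_sources_needed <= 20:
        return "pro-fast"   # section-level research, ~$0.10
    return "ultra-fast"     # full report, 20-50+ sources, ~$0.30
```

Encoding the choice this way keeps processor selection consistent across a pipeline instead of leaving it to ad hoc judgment per call.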
### Standard vs Fast
- **Fast processors** (`-fast`): 2-5x faster with data fresh enough for most use; ideal for interactive work
- **Standard processors** (no suffix): prioritize maximum data freshness; better for background jobs
**Rule of thumb:** Always use `-fast` variants unless you specifically need the freshest possible data (breaking news, live financial data, real-time events).
---
## Output Formats
### Text Mode (Markdown Reports)
Returns a comprehensive markdown report with inline citations. Best for human consumption and document integration.
```python
from parallel_web import ParallelDeepResearch  # assumed: same parallel_web module as ParallelExtract

researcher = ParallelDeepResearch()
result = researcher.research(
query="Comprehensive analysis of mRNA vaccine technology platforms and their applications beyond COVID-19",
processor="pro-fast",
description="Focus on clinical trials, approved applications, pipeline developments, and key companies. Include market size data."
)
# result["output"] contains a full markdown report
# result["citations"] contains source URLs with excerpts
```
**When to use text mode:**
- Writing scientific documents (papers, reviews, reports)
- Background research for a topic
- Creating summaries for human readers
- When you need flowing prose, not structured data
**Guiding text output with `description`:**
The `description` parameter steers the report content:
```python
# Focus on specific aspects
result = researcher.research(
query="Electric vehicle battery technology landscape",
description="Focus on: (1) solid-state battery progress, (2) charging speed improvements, (3) cost per kWh trends, (4) key patents and IP. Format as a structured report with clear sections."
)
# Control length and depth
result = researcher.research(
query="AI in drug discovery",
description="Provide a concise 500-word executive summary covering key applications, notable successes, leading companies, and market projections."
)
```
### Auto-Schema Mode (Structured JSON)
Lets the processor determine the best output structure automatically. Returns structured JSON with per-field citations.
```python
result = researcher.research_structured(
query="Top 5 cloud computing companies: revenue, market share, key products, and recent developments",
processor="pro-fast",
)
# result["content"] contains structured data (dict)
# result["basis"] contains per-field citations with confidence
```
**When to use auto-schema:**
- Data extraction and enrichment
- Comparative analysis with specific fields
- When you need programmatic access to individual data points
- Integration with databases or spreadsheets
### Custom JSON Schema
Define exactly what fields you want returned:
```python
schema = {
"type": "object",
"properties": {
"market_size_2024": {
"type": "string",
"description": "Global market size in USD billions for 2024. Include source."
},
"growth_rate": {
"type": "string",
"description": "CAGR percentage for 2024-2030 forecast period."
},
"top_companies": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {"type": "string", "description": "Company name"},
"market_share": {"type": "string", "description": "Approximate market share percentage"},
"revenue": {"type": "string", "description": "Most recent annual revenue"}
},
"required": ["name", "market_share", "revenue"]
},
"description": "Top 5 companies by market share"
},
"key_trends": {
"type": "array",
"items": {"type": "string"},
"description": "Top 3-5 industry trends driving growth"
}
},
"required": ["market_size_2024", "growth_rate", "top_companies", "key_trends"],
"additionalProperties": False
}
result = researcher.research_structured(
query="Global cybersecurity market analysis",
output_schema=schema,
)
```
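Since the structured result should contain exactly what the schema's `required` lists demand, a quick sanity check on the parsed content catches schema drift early. A minimal standard-library sketch; the helper name is illustrative:

```python
def check_required(content: dict, schema: dict, path: str = "") -> list[str]:
    """Return dotted paths of required fields missing from a structured result."""
    missing = []
    for key in schema.get("required", []):
        if key not in content:
            missing.append(f"{path}{key}")
    # Recurse into nested object properties that are present
    for key, sub in schema.get("properties", {}).items():
        if sub.get("type") == "object" and key in content:
            missing += check_required(content[key], sub, f"{path}{key}.")
    return missing
```

An empty return list means every required top-level and nested field arrived; anything else is worth logging before the data flows downstream.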
---
## Writing Effective Research Queries
### Query Construction Framework
Structure your query as: **[Topic] + [Specific Aspect] + [Scope/Time] + [Output Expectations]**
**Good queries:**
```
"Comprehensive analysis of the global lithium-ion battery recycling market,
including market size, key players, regulatory drivers, and technology
approaches. Focus on 2023-2025 developments."

"Compare the efficacy, safety profiles, and cost-effectiveness of GLP-1
receptor agonists (semaglutide, tirzepatide, liraglutide) for type 2
diabetes management based on recent clinical trial data."

"Survey of federated learning approaches for healthcare AI, covering
privacy-preserving techniques, real-world deployments, regulatory
compliance, and performance benchmarks from 2023-2025 publications."
```
**Poor queries:**
```
"Tell me about batteries" # Too vague
"AI" # No specific aspect
"What's new?" # No topic at all
"Everything about quantum computing from all time" # Too broad
```
### Tips for Better Results
1. **Be specific about what you need**: "market size" vs "tell me about the market"
2. **Include time bounds**: "2024-2025" narrows to relevant data
3. **Name entities**: "semaglutide vs tirzepatide" vs "diabetes drugs"
4. **Specify output expectations**: "Include statistics, key players, and growth projections"
5. **Keep under 15,000 characters**: Concise queries work better than massive prompts
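The **[Topic] + [Specific Aspect] + [Scope/Time] + [Output Expectations]** framework can be mechanized so every query in a pipeline follows it and stays under the length limit. A sketch; the function is illustrative, not part of the SDK:

```python
MAX_QUERY_CHARS = 15_000  # queries beyond this tend to be rejected or degraded

def build_query(topic: str, aspects: list[str],
                timeframe: str = "", expectations: str = "") -> str:
    """Assemble a query from the [Topic]+[Aspect]+[Scope]+[Output] framework."""
    parts = [f"{topic}, covering {', '.join(aspects)}"]
    if timeframe:
        parts.append(f"Focus on {timeframe} developments")
    if expectations:
        parts.append(expectations)
    query = ". ".join(parts) + "."
    if len(query) > MAX_QUERY_CHARS:
        raise ValueError(f"Query is {len(query)} chars; keep under {MAX_QUERY_CHARS}")
    return query
```

For example, `build_query("Global lithium-ion battery recycling market", ["market size", "key players", "regulatory drivers"], timeframe="2023-2025", expectations="Include statistics and growth projections")` yields a query in the same shape as the good examples above.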
---
## Working with Research Basis
Every deep research result includes a **basis**: citations, reasoning, and confidence levels for each finding.
### Text Mode Basis
```python
result = researcher.research(query="...", processor="pro-fast")
# Citations are deduplicated and include URLs + excerpts
for citation in result["citations"]:
print(f"Source: {citation['title']}")
print(f"URL: {citation['url']}")
if citation.get("excerpts"):
print(f"Excerpt: {citation['excerpts'][0][:200]}")
```
### Structured Mode Basis
```python
result = researcher.research_structured(query="...", processor="pro-fast")
for basis_entry in result["basis"]:
print(f"Field: {basis_entry['field']}")
print(f"Confidence: {basis_entry['confidence']}")
print(f"Reasoning: {basis_entry['reasoning']}")
for cit in basis_entry["citations"]:
print(f" Source: {cit['url']}")
```
### Confidence Levels
| Level | Meaning | Action |
|-------|---------|--------|
| `high` | Multiple authoritative sources agree | Use directly |
| `medium` | Some supporting evidence, minor uncertainty | Use with caveat |
| `low` | Limited evidence, significant uncertainty | Verify independently |
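A common pattern is to gate findings on the table above before citing them. A sketch assuming the basis-entry shape shown earlier (`field` and `confidence` keys); the helper name is illustrative:

```python
def triage_by_confidence(basis: list[dict]) -> dict[str, list[str]]:
    """Bucket structured-result fields by confidence: use / caveat / verify."""
    action = {"high": "use", "medium": "caveat", "low": "verify"}
    buckets = {"use": [], "caveat": [], "verify": []}
    for entry in basis:
        # Unknown confidence values fall through to "verify" as the safe default
        buckets[action.get(entry["confidence"], "verify")].append(entry["field"])
    return buckets
```

Anything landing in the `verify` bucket should be cross-checked (e.g. with Extract) before it appears in a document.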
---
## Advanced Patterns
### Multi-Stage Research
Use different processors in sequence for progressively deeper research:
```python
# Stage 1: Quick overview with base-fast
overview = researcher.research(
query="What are the main approaches to quantum error correction?",
processor="base-fast",
)
# Stage 2: Deep dive on the most promising approach
deep_dive = researcher.research(
query=f"Detailed analysis of surface code quantum error correction: "
f"recent breakthroughs, implementation challenges, and leading research groups. "
f"Context: {overview['output'][:500]}",
processor="pro-fast",
)
```
### Comparative Research
```python
result = researcher.research(
query="Compare and contrast three leading large language model architectures: "
"GPT-4, Claude, and Gemini. Cover architecture differences, benchmark performance, "
"pricing, context window, and unique capabilities. Include specific benchmark scores.",
processor="pro-fast",
description="Create a structured comparison with a summary table. Include specific numbers and benchmarks."
)
```
### Research with Follow-Up Extraction
```python
# Step 1: Research to find relevant sources
research_result = researcher.research(
query="Most influential papers on attention mechanisms in 2024",
processor="pro-fast",
)
# Step 2: Extract full content from the most relevant sources
from parallel_web import ParallelExtract
extractor = ParallelExtract()
key_urls = [c["url"] for c in research_result["citations"][:5]]
for url in key_urls:
extracted = extractor.extract(
urls=[url],
objective="Key methodology, results, and conclusions",
)
```
---
## Performance Optimization
### Reducing Latency
1. **Use `-fast` processors**: 2-5x faster than standard
2. **Use `core-fast` for moderate queries**: Sub-2-minute for most questions
3. **Be specific in queries**: Vague queries require more exploration
4. **Set appropriate timeouts**: Don't over-wait
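For point 4, wrap polling in a hard deadline sized to the processor's latency band. A sketch with a stubbed `poll` callable; the real client's polling call may differ, and the timeout values are rough upper bounds taken from the latency figures above:

```python
import time

# Rough per-tier deadlines in seconds, derived from the latency ranges above
TIMEOUTS = {"base-fast": 60, "core-fast": 120, "pro-fast": 300, "ultra-fast": 600}

def wait_for_result(poll, processor: str, interval: float = 2.0):
    """Call poll() until it returns a non-None result or the tier's deadline passes."""
    deadline = time.monotonic() + TIMEOUTS[processor]
    while time.monotonic() < deadline:
        result = poll()  # should return None while the task is still running
        if result is not None:
            return result
        time.sleep(interval)
    raise TimeoutError(f"No result within {TIMEOUTS[processor]}s for {processor}")
```

Using `time.monotonic()` rather than wall-clock time keeps the deadline correct even if the system clock shifts mid-run.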
### Reducing Cost
1. **Start with `base-fast`**: Upgrade only if depth is insufficient
2. **Use `core-fast` for moderate complexity**: $0.025 vs $0.10 for pro
3. **Batch related queries**: One well-crafted query > multiple simple ones
4. **Cache results**: Store research output for reuse across sections
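Point 4 can be as simple as keying stored output by a hash of the query plus processor. A minimal file-based sketch; the cache location and function name are illustrative:

```python
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path(".research_cache")  # illustrative location

def cached_research(researcher, query: str, processor: str = "pro-fast") -> dict:
    """Return a cached result for (query, processor) if present, else run and store."""
    CACHE_DIR.mkdir(exist_ok=True)
    key = hashlib.sha256(f"{processor}:{query}".encode()).hexdigest()[:16]
    path = CACHE_DIR / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())
    result = researcher.research(query=query, processor=processor)
    path.write_text(json.dumps(result))
    return result
```

Hashing the processor into the key means upgrading a section from `base-fast` to `pro-fast` re-runs the research instead of serving the shallower cached report.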
### Maximizing Quality
1. **Use `pro-fast` or `ultra-fast`**: More sources = better synthesis
2. **Provide context**: "I'm writing a paper for Nature Medicine about..."
3. **Use `description` parameter**: Guide the output structure and focus
4. **Verify critical findings**: Cross-check with Search API or Extract
---
## Common Mistakes
| Mistake | Impact | Fix |
|---------|--------|-----|
| Query too vague | Scattered, unfocused results | Add specific aspects and time bounds |
| Query too long (>15K chars) | API rejection or degraded results | Summarize context, focus on key question |
| Wrong processor | Too slow or too shallow | Use decision matrix above |
| Not using `description` | Report structure not aligned with needs | Add description to guide output |
| Ignoring confidence levels | Using low-confidence data as fact | Check basis confidence before citing |
| Not verifying citations | Risk of outdated or misattributed data | Cross-check key citations with Extract |
---
## See Also
- [API Reference](api_reference.md) - Complete API parameter reference
- [Search Best Practices](search_best_practices.md) - For quick web searches
- [Extraction Patterns](extraction_patterns.md) - For reading specific URLs
- [Workflow Recipes](workflow_recipes.md) - Common multi-step patterns