# Deep Research Guide

Comprehensive guide to using Parallel's Task API for deep research, including processor selection, output formats, structured schemas, and advanced patterns.

---

## Overview

Deep Research transforms natural language research queries into comprehensive intelligence reports. Unlike simple search, it performs multi-step web exploration across authoritative sources and synthesizes findings with inline citations and confidence levels.

**Key characteristics:**

- Multi-step, multi-source research
- Automatic citation and source attribution
- Structured or text output formats
- Asynchronous processing (30 seconds to 25+ minutes)
- Research basis with confidence levels per finding

---

## Processor Selection

Choosing the right processor is the most important decision. It determines research depth, speed, and cost.

### Decision Matrix

| Scenario | Recommended Processor | Why |
|----------|----------------------|-----|
| Quick background for a paper section | `pro-fast` | Fast, good depth, low cost |
| Comprehensive market research report | `ultra-fast` | Deep multi-source synthesis |
| Simple fact lookup or metadata | `base-fast` | Fast, low cost |
| Competitive landscape analysis | `pro-fast` | Good balance of depth and speed |
| Background for grant proposal | `pro-fast` | Thorough but timely |
| State-of-the-art review for a topic | `ultra-fast` | Maximum source coverage |
| Quick question during writing | `core-fast` | Sub-2-minute response |
| Breaking news or very recent events | `pro` (standard) | Freshest data prioritized |
| Large-scale data enrichment | `base-fast` | Cost-effective at scale |

### Processor Tiers Explained

**`pro-fast`** (default, recommended for most tasks):

- Latency: 30 seconds to 5 minutes
- Depth: Explores 10-20+ web sources
- Best for: Section-level research, background gathering, comparative analysis
- Cost: $0.10 per query

**`ultra-fast`** (for comprehensive research):

- Latency: 1 to 10 minutes
- Depth: Explores 20-50+ web sources, multiple reasoning steps
- Best for: Full reports, market analysis, complex multi-faceted questions
- Cost: $0.30 per query

**`core-fast`** (quick cross-referenced answers):

- Latency: 15 to 100 seconds
- Depth: Cross-references 5-10 sources
- Best for: Moderate complexity questions, verification tasks
- Cost: $0.025 per query

**`base-fast`** (simple enrichment):

- Latency: 15 to 50 seconds
- Depth: Standard web lookup, 3-5 sources
- Best for: Simple factual queries, metadata enrichment
- Cost: $0.01 per query
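
To budget a batch of queries, the per-query costs listed above can be kept in a small lookup table. This is a minimal sketch: `COST_PER_QUERY` and `estimate_cost` are hypothetical helpers, not part of Parallel's API, and the figures are copied from this guide, so check current pricing before relying on them. `Decimal` avoids floating-point rounding in money arithmetic.

```python
from decimal import Decimal

# Per-query cost in USD for each -fast tier, copied from the tier list above.
# Hypothetical helper data, not part of Parallel's API; verify current pricing.
COST_PER_QUERY = {
    "base-fast": Decimal("0.01"),
    "core-fast": Decimal("0.025"),
    "pro-fast": Decimal("0.10"),
    "ultra-fast": Decimal("0.30"),
}


def estimate_cost(processor: str, n_queries: int) -> Decimal:
    """Total USD cost of running n_queries on one processor tier."""
    return COST_PER_QUERY[processor] * n_queries


# Enriching 1,000 records: base-fast vs pro-fast
print(estimate_cost("base-fast", 1000))  # 10.00
print(estimate_cost("pro-fast", 1000))   # 100.00
```

Costs scale linearly, so the tenfold per-query gap between `base-fast` and `pro-fast` carries straight through to batch budgets.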

### Standard vs Fast

- **Fast processors** (`-fast` suffix): 2-5x faster, very fresh data, ideal for interactive use
- **Standard processors** (no suffix): Highest data freshness, better for background jobs

**Rule of thumb:** Use `-fast` variants unless you specifically need the freshest possible data (breaking news, live financial data, real-time events).
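
The rule of thumb can be encoded as a tiny helper. `pick_processor` is hypothetical, and the standard tier names (`base`, `core`, `pro`, `ultra`) are assumed from the naming pattern above, where standard processors simply drop the `-fast` suffix.

```python
def pick_processor(tier: str, needs_freshest_data: bool = False) -> str:
    """Hypothetical helper: use the -fast variant unless freshness is critical.

    `tier` is a standard tier name such as "base", "core", "pro", or "ultra".
    """
    if tier.endswith("-fast"):
        raise ValueError("pass the standard tier name, e.g. 'pro', not 'pro-fast'")
    return tier if needs_freshest_data else f"{tier}-fast"


print(pick_processor("pro"))                            # pro-fast
print(pick_processor("pro", needs_freshest_data=True))  # pro
```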

---

## Output Formats

### Text Mode (Markdown Reports)

Returns a comprehensive markdown report with inline citations. Best for human consumption and document integration.

```python
# Assumes ParallelDeepResearch lives in the skill's parallel_web module,
# the same module ParallelExtract is imported from elsewhere in this guide
from parallel_web import ParallelDeepResearch

researcher = ParallelDeepResearch()

result = researcher.research(
    query="Comprehensive analysis of mRNA vaccine technology platforms and their applications beyond COVID-19",
    processor="pro-fast",
    description="Focus on clinical trials, approved applications, pipeline developments, and key companies. Include market size data.",
)

# result["output"] contains a full markdown report
# result["citations"] contains source URLs with excerpts
```
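
For document integration, the citation list can be turned into a numbered reference section. This sketch assumes each entry in `result["citations"]` is a dict with a `url` key and usually a `title` key (the keys the basis examples in this guide read); `format_references` is a hypothetical helper and the sample data is placeholder, not real API output.

```python
def format_references(citations: list[dict]) -> str:
    """Hypothetical helper: render citations as a numbered markdown reference list."""
    lines = []
    for i, cit in enumerate(citations, start=1):
        title = cit.get("title") or cit["url"]  # fall back to the URL if there is no title
        lines.append(f"{i}. [{title}]({cit['url']})")
    return "\n".join(lines)


# Placeholder data shaped like result["citations"]
sample = [
    {"title": "Example source A", "url": "https://example.com/a"},
    {"url": "https://example.org/b"},
]
print(format_references(sample))
```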

**When to use text mode:**

- Writing scientific documents (papers, reviews, reports)
- Background research for a topic
- Creating summaries for human readers
- When you need flowing prose, not structured data

**Guiding text output with `description`:**

The `description` parameter steers the report content:

```python
# Focus on specific aspects
result = researcher.research(
    query="Electric vehicle battery technology landscape",
    description="Focus on: (1) solid-state battery progress, (2) charging speed improvements, (3) cost per kWh trends, (4) key patents and IP. Format as a structured report with clear sections.",
)

# Control length and depth
result = researcher.research(
    query="AI in drug discovery",
    description="Provide a concise 500-word executive summary covering key applications, notable successes, leading companies, and market projections.",
)
```
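
When many queries share the same focus structure, the enumerated "Focus on: (1) ..., (2) ..." style used in the first example can be generated from a list. `build_description` is a hypothetical helper; it only builds the string you would pass as `description`.

```python
def build_description(focus: list[str], extra: str = "") -> str:
    """Hypothetical helper: build 'Focus on: (1) ..., (2) ....' plus optional instructions."""
    numbered = ", ".join(f"({i}) {item}" for i, item in enumerate(focus, start=1))
    text = f"Focus on: {numbered}."
    return f"{text} {extra}" if extra else text


desc = build_description(
    ["solid-state battery progress", "charging speed improvements"],
    extra="Format as a structured report with clear sections.",
)
print(desc)
```

The result is passed unchanged, e.g. `researcher.research(query=..., description=desc)`.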

### Auto-Schema Mode (Structured JSON)

Lets the processor determine the best output structure automatically. Returns structured JSON with per-field citations.

```python
result = researcher.research_structured(
    query="Top 5 cloud computing companies: revenue, market share, key products, and recent developments",
    processor="pro-fast",
)

# result["content"] contains structured data (dict)
# result["basis"] contains per-field citations with confidence
```

**When to use auto-schema:**

- Data extraction and enrichment
- Comparative analysis with specific fields
- When you need programmatic access to individual data points
- Integration with databases or spreadsheets
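
For spreadsheet or database integration, a list of records inside `result["content"]` can be written out with the standard `csv` module. Because the processor chooses the auto-schema structure, the `companies` key and field names below are assumptions for illustration only; `records_to_csv` is a hypothetical helper.

```python
import csv
import io


def records_to_csv(records: list[dict]) -> str:
    """Hypothetical helper: serialize flat dicts to CSV, using the union of keys as columns."""
    columns: list[str] = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)  # preserve first-seen column order
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # keys missing from a record become empty cells
    return buf.getvalue()


# Placeholder content; real auto-schema output may use different keys.
content = {"companies": [{"name": "Company A", "market_share": "30%"},
                         {"name": "Company B", "revenue": "$10B"}]}
print(records_to_csv(content["companies"]))
```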

### Custom JSON Schema

Define exactly what fields you want returned:

```python
schema = {
    "type": "object",
    "properties": {
        "market_size_2024": {
            "type": "string",
            "description": "Global market size in USD billions for 2024. Include source."
        },
        "growth_rate": {
            "type": "string",
            "description": "CAGR percentage for 2024-2030 forecast period."
        },
        "top_companies": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string", "description": "Company name"},
                    "market_share": {"type": "string", "description": "Approximate market share percentage"},
                    "revenue": {"type": "string", "description": "Most recent annual revenue"}
                },
                "required": ["name", "market_share", "revenue"]
            },
            "description": "Top 5 companies by market share"
        },
        "key_trends": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Top 3-5 industry trends driving growth"
        }
    },
    "required": ["market_size_2024", "growth_rate", "top_companies", "key_trends"],
    "additionalProperties": False
}

result = researcher.research_structured(
    query="Global cybersecurity market analysis",
    output_schema=schema,
)
```
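
Before loading results into a database, it can help to confirm that the returned object matches the schema's top level. This sketch checks only `required` and `additionalProperties` on the outermost object using the standard library; for full validation a library such as `jsonschema` is the usual choice. `check_top_level` and the sample data are hypothetical.

```python
def check_top_level(content: dict, schema: dict) -> list[str]:
    """Hypothetical helper: list top-level schema problems in `content` (empty if none)."""
    problems = []
    for field in schema.get("required", []):
        if field not in content:
            problems.append(f"missing required field: {field}")
    if schema.get("additionalProperties") is False:
        allowed = set(schema.get("properties", {}))
        for field in content:
            if field not in allowed:
                problems.append(f"unexpected field: {field}")
    return problems


# Trimmed placeholder schema with the same top-level rules as the example above
mini_schema = {
    "type": "object",
    "properties": {"growth_rate": {"type": "string"}, "key_trends": {"type": "array"}},
    "required": ["growth_rate", "key_trends"],
    "additionalProperties": False,
}
print(check_top_level({"growth_rate": "12%", "notes": "x"}, mini_schema))
# ['missing required field: key_trends', 'unexpected field: notes']
```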

---

## Writing Effective Research Queries

### Query Construction Framework

Structure your query as: **[Topic] + [Specific Aspect] + [Scope/Time] + [Output Expectations]**

**Good queries:**

```
"Comprehensive analysis of the global lithium-ion battery recycling market,
including market size, key players, regulatory drivers, and technology
approaches. Focus on 2023-2025 developments."

"Compare the efficacy, safety profiles, and cost-effectiveness of GLP-1
receptor agonists (semaglutide, tirzepatide, liraglutide) for type 2
diabetes management based on recent clinical trial data."

"Survey of federated learning approaches for healthcare AI, covering
privacy-preserving techniques, real-world deployments, regulatory
compliance, and performance benchmarks from 2023-2025 publications."
```

**Poor queries:**

```
"Tell me about batteries"  # Too vague
"AI"  # No specific aspect
"What's new?"  # No topic at all
"Everything about quantum computing from all time"  # Too broad
```
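
When queries are generated in code, the four-part framework can be applied mechanically. `build_query` is a hypothetical helper, and its joining wording is one reasonable choice rather than a required format; it reproduces the first good query above.

```python
def build_query(topic: str, aspects: list[str], scope: str = "", expectations: str = "") -> str:
    """Hypothetical helper: [Topic] + [Specific Aspect] + [Scope/Time] + [Output Expectations]."""
    if len(aspects) > 1:
        aspect_text = ", ".join(aspects[:-1]) + ", and " + aspects[-1]
    else:
        aspect_text = "".join(aspects)  # zero or one aspect
    parts = [f"{topic}, including {aspect_text}." if aspect_text else f"{topic}."]
    if scope:
        parts.append(f"Focus on {scope}.")
    if expectations:
        parts.append(expectations)
    return " ".join(parts)


q = build_query(
    "Comprehensive analysis of the global lithium-ion battery recycling market",
    ["market size", "key players", "regulatory drivers", "technology approaches"],
    scope="2023-2025 developments",
)
print(q)
```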

### Tips for Better Results

1. **Be specific about what you need**: ask for "market size", not "tell me about the market"
2. **Include time bounds**: "2024-2025" narrows to relevant data
3. **Name entities**: "semaglutide vs tirzepatide" rather than "diabetes drugs"
4. **Specify output expectations**: "Include statistics, key players, and growth projections"
5. **Keep under 15,000 characters**: Concise queries work better than massive prompts
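
Some of these tips can be checked before a query is submitted. Only the 15,000-character limit comes from this guide; the other checks are illustrative heuristics (a four-digit year as a proxy for time bounds, five words as a rough floor for specificity). `lint_query` is a hypothetical helper.

```python
import re

MAX_QUERY_CHARS = 15_000  # limit from tip 5 above


def lint_query(query: str, min_words: int = 5) -> list[str]:
    """Hypothetical helper: return warnings for a research query (empty list = no issues)."""
    warnings = []
    if len(query) > MAX_QUERY_CHARS:
        warnings.append(f"query is {len(query)} chars; keep it under {MAX_QUERY_CHARS}")
    if len(query.split()) < min_words:  # heuristic, not an API rule
        warnings.append("query is very short; name the specific aspect you need")
    if not re.search(r"\b(19|20)\d{2}\b", query):  # heuristic proxy for time bounds
        warnings.append("no year found; consider adding time bounds such as '2024-2025'")
    return warnings


print(lint_query("Tell me about batteries"))
```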

---

## Working with Research Basis

Every deep research result includes a **basis**: citations, reasoning, and confidence levels for each finding.

### Text Mode Basis

```python
result = researcher.research(query="...", processor="pro-fast")

# Citations are deduplicated and include URLs + excerpts
for citation in result["citations"]:
    print(f"Source: {citation['title']}")
    print(f"URL: {citation['url']}")
    if citation.get("excerpts"):
        print(f"Excerpt: {citation['excerpts'][0][:200]}")
```

### Structured Mode Basis

```python
result = researcher.research_structured(query="...", processor="pro-fast")

for basis_entry in result["basis"]:
    print(f"Field: {basis_entry['field']}")
    print(f"Confidence: {basis_entry['confidence']}")
    print(f"Reasoning: {basis_entry['reasoning']}")
    for cit in basis_entry["citations"]:
        print(f"  Source: {cit['url']}")
```

### Confidence Levels

| Level | Meaning | Action |
|-------|---------|--------|
| `high` | Multiple authoritative sources agree | Use directly |
| `medium` | Some supporting evidence, minor uncertainty | Use with caveat |
| `low` | Limited evidence, significant uncertainty | Verify independently |
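
The actions in this table can be applied to a structured result's basis entries. This sketch assumes each entry carries `field` and `confidence` keys, as in the structured-mode example above; `triage_basis` and the sample entries are hypothetical. Treating a missing confidence as `low` is a deliberately cautious design choice.

```python
# Action for each confidence level, taken from the table above.
ACTIONS = {
    "high": "use directly",
    "medium": "use with caveat",
    "low": "verify independently",
}


def triage_basis(basis: list[dict]) -> dict[str, list[str]]:
    """Hypothetical helper: group field names by the action their confidence calls for."""
    groups: dict[str, list[str]] = {action: [] for action in ACTIONS.values()}
    for entry in basis:
        # A missing or unrecognized confidence value is treated as low.
        action = ACTIONS.get(entry.get("confidence"), ACTIONS["low"])
        groups[action].append(entry["field"])
    return groups


# Placeholder basis entries, not real API output
sample_basis = [
    {"field": "market_size_2024", "confidence": "high"},
    {"field": "growth_rate", "confidence": "medium"},
    {"field": "top_companies", "confidence": "low"},
    {"field": "key_trends"},
]
print(triage_basis(sample_basis)["verify independently"])  # ['top_companies', 'key_trends']
```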

---

## Advanced Patterns

### Multi-Stage Research

Use different processors in sequence for progressively deeper research:

```python
# Stage 1: Quick overview with base-fast
overview = researcher.research(
    query="What are the main approaches to quantum error correction?",
    processor="base-fast",
)

# Stage 2: Deep dive on the most promising approach
deep_dive = researcher.research(
    query="Detailed analysis of surface code quantum error correction: "
          "recent breakthroughs, implementation challenges, and leading research groups. "
          f"Context: {overview['output'][:500]}",
    processor="pro-fast",
)
```
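
Slicing `overview['output'][:500]` can cut a word or citation in half. A small helper that trims at the last space before the limit keeps the carried-over context readable; `trim_context` is a hypothetical name, and 500 characters is simply the limit used in the stage 2 query.

```python
def trim_context(text: str, limit: int = 500) -> str:
    """Hypothetical helper: shorten text to about `limit` chars at a word boundary, adding '...'."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    last_space = cut.rfind(" ")
    # Fall back to a hard cut when the window contains no space.
    return (cut[:last_space] if last_space > 0 else cut).rstrip() + "..."


print(trim_context("surface codes need many physical qubits", limit=20))  # surface codes need...
```

In the stage 2 query this would replace the slice: `f"Context: {trim_context(overview['output'])}"`.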

### Comparative Research

```python
result = researcher.research(
    query="Compare and contrast three leading large language model architectures: "
          "GPT-4, Claude, and Gemini. Cover architecture differences, benchmark performance, "
          "pricing, context window, and unique capabilities. Include specific benchmark scores.",
    processor="pro-fast",
    description="Create a structured comparison with a summary table. Include specific numbers and benchmarks."
)
```

### Research with Follow-Up Extraction

```python
# Step 1: Research to find relevant sources
research_result = researcher.research(
    query="Most influential papers on attention mechanisms in 2024",
    processor="pro-fast",
)

# Step 2: Extract full content from the most relevant sources
from parallel_web import ParallelExtract

extractor = ParallelExtract()

key_urls = [c["url"] for c in research_result["citations"][:5]]
extracted = {}  # URL -> extraction result, kept for later use
for url in key_urls:
    extracted[url] = extractor.extract(
        urls=[url],
        objective="Key methodology, results, and conclusions",
    )
```
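
Taking the first five citations can return several pages from the same site. A variant that skips duplicate URLs and caps the number of sources per domain spreads extraction across more sources. `select_key_urls` and its per-domain cap are hypothetical choices, and the sample citations are placeholders.

```python
from urllib.parse import urlparse


def select_key_urls(citations: list[dict], limit: int = 5, per_domain: int = 1) -> list[str]:
    """Hypothetical helper: up to `limit` unique URLs, at most `per_domain` per host, in order."""
    chosen: list[str] = []
    per_host: dict[str, int] = {}
    for cit in citations:
        url = cit.get("url")
        if not url or url in chosen:
            continue  # skip entries without a URL and exact duplicates
        host = urlparse(url).netloc.lower()
        if per_host.get(host, 0) >= per_domain:
            continue
        chosen.append(url)
        per_host[host] = per_host.get(host, 0) + 1
        if len(chosen) == limit:
            break
    return chosen


sample = [
    {"url": "https://example.com/paper-1"},
    {"url": "https://example.com/paper-2"},
    {"url": "https://example.org/post"},
    {"title": "entry without a url"},
]
print(select_key_urls(sample))  # ['https://example.com/paper-1', 'https://example.org/post']
```

With this helper, step 2 would start from `key_urls = select_key_urls(research_result["citations"])`.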

---

## Performance Optimization

### Reducing Latency

1. **Use `-fast` processors**: 2-5x faster than standard
2. **Use `core-fast` for moderate queries**: Sub-2-minute for most questions
3. **Be specific in queries**: Vague queries require more exploration
4. **Set appropriate timeouts**: Match the timeout to the processor's expected latency instead of waiting indefinitely
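
One way to pick a timeout is to scale each processor's upper latency bound from the tier list by a safety margin. The 1.5x margin and the `timeout_for` helper are assumptions, and how the timeout is passed depends on the client you use.

```python
# Upper latency bound in seconds for each -fast tier, from the tier list above.
MAX_LATENCY_S = {
    "base-fast": 50,
    "core-fast": 100,
    "pro-fast": 300,
    "ultra-fast": 600,
}


def timeout_for(processor: str, margin: float = 1.5) -> int:
    """Hypothetical helper: client-side timeout = tier's upper latency bound times a margin."""
    return int(MAX_LATENCY_S[processor] * margin)


print(timeout_for("core-fast"))  # 150
print(timeout_for("pro-fast"))   # 450
```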

### Reducing Cost

1. **Start with `base-fast`**: Upgrade only if depth is insufficient
2. **Use `core-fast` for moderate complexity**: $0.025 vs $0.10 for `pro-fast`
3. **Batch related queries**: One well-crafted query usually beats several simple ones
4. **Cache results**: Store research output for reuse across sections
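
Caching can be as simple as storing output keyed on the request parameters. This sketch assumes `research()` accepts keyword arguments and returns a JSON-serializable dict, as in the examples above; `cached_research` and the one-file-per-request layout are hypothetical. Entries never expire, so avoid it for time-sensitive topics.

```python
import hashlib
import json
from pathlib import Path


def cached_research(researcher, cache_dir: str, **params) -> dict:
    """Hypothetical helper: call researcher.research(**params) once per unique parameter set."""
    # sort_keys makes the cache key independent of keyword-argument order
    key = hashlib.sha256(json.dumps(params, sort_keys=True).encode("utf-8")).hexdigest()
    path = Path(cache_dir) / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    result = researcher.research(**params)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result
```

Usage: `cached_research(researcher, ".research_cache", query="AI in drug discovery", processor="pro-fast")`.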

### Maximizing Quality

1. **Use `pro-fast` or `ultra-fast`**: More sources generally yield better synthesis
2. **Provide context**: "I'm writing a paper for Nature Medicine about..."
3. **Use the `description` parameter**: Guide the output structure and focus
4. **Verify critical findings**: Cross-check with the Search API or Extract

---

## Common Mistakes

| Mistake | Impact | Fix |
|---------|--------|-----|
| Query too vague | Scattered, unfocused results | Add specific aspects and time bounds |
| Query too long (>15K chars) | API rejection or degraded results | Summarize context, focus on key question |
| Wrong processor | Too slow or too shallow | Use decision matrix above |
| Not using `description` | Report structure not aligned with needs | Add description to guide output |
| Ignoring confidence levels | Using low-confidence data as fact | Check basis confidence before citing |
| Not verifying citations | Risk of outdated or misattributed data | Cross-check key citations with Extract |

---

## See Also

- [API Reference](api_reference.md) - Complete API parameter reference
- [Search Best Practices](search_best_practices.md) - For quick web searches
- [Extraction Patterns](extraction_patterns.md) - For reading specific URLs
- [Workflow Recipes](workflow_recipes.md) - Common multi-step patterns