mirror of
https://github.com/K-Dense-AI/claude-scientific-skills.git
synced 2026-03-27 07:09:27 +08:00
Added parallel-web skill
Refactor research lookup skill to enhance backend routing and update documentation. The skill now intelligently selects between the Parallel Chat API and Perplexity sonar-pro-search based on query type. Added compatibility notes, license information, and improved descriptions for clarity. Removed outdated example scripts to streamline the codebase.
scientific-skills/parallel-web/references/deep_research_guide.md
# Deep Research Guide

Comprehensive guide to using Parallel's Task API for deep research, including processor selection, output formats, structured schemas, and advanced patterns.

---

## Overview

Deep Research transforms natural language research queries into comprehensive intelligence reports. Unlike simple search, it performs multi-step web exploration across authoritative sources and synthesizes findings with inline citations and confidence levels.

**Key characteristics:**
- Multi-step, multi-source research
- Automatic citation and source attribution
- Structured or text output formats
- Asynchronous processing (30 seconds to 25+ minutes, depending on the processor)
- Research basis with confidence levels per finding

---

## Processor Selection

Choosing the right processor is the most important decision. It determines research depth, speed, and cost.

### Decision Matrix

| Scenario | Recommended Processor | Why |
|----------|----------------------|-----|
| Quick background for a paper section | `pro-fast` | Fast, good depth, low cost |
| Comprehensive market research report | `ultra-fast` | Deep multi-source synthesis |
| Simple fact lookup or metadata | `base-fast` | Fast, low cost |
| Competitive landscape analysis | `pro-fast` | Good balance of depth and speed |
| Background for grant proposal | `pro-fast` | Thorough but timely |
| State-of-the-art review for a topic | `ultra-fast` | Maximum source coverage |
| Quick question during writing | `core-fast` | Sub-2-minute response |
| Breaking news or very recent events | `pro` (standard) | Freshest data prioritized |
| Large-scale data enrichment | `base-fast` | Cost-effective at scale |

### Processor Tiers Explained

**`pro-fast`** (default, recommended for most tasks):
- Latency: 30 seconds to 5 minutes
- Depth: Explores 10-20+ web sources
- Best for: Section-level research, background gathering, comparative analysis
- Cost: $0.10 per query

**`ultra-fast`** (for comprehensive research):
- Latency: 1 to 10 minutes
- Depth: Explores 20-50+ web sources, multiple reasoning steps
- Best for: Full reports, market analysis, complex multi-faceted questions
- Cost: $0.30 per query

**`core-fast`** (quick cross-referenced answers):
- Latency: 15 seconds to 100 seconds
- Depth: Cross-references 5-10 sources
- Best for: Moderate complexity questions, verification tasks
- Cost: $0.025 per query

**`base-fast`** (simple enrichment):
- Latency: 15 to 50 seconds
- Depth: Standard web lookup, 3-5 sources
- Best for: Simple factual queries, metadata enrichment
- Cost: $0.01 per query

### Standard vs Fast

- **Fast processors** (`-fast`): 2-5x faster, very fresh data, ideal for interactive use
- **Standard processors** (no suffix): Highest data freshness, better for background jobs

**Rule of thumb:** Always use `-fast` variants unless you specifically need the freshest possible data (breaking news, live financial data, real-time events).

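The decision matrix and the rule of thumb above can be folded into a small lookup helper. The following is a minimal sketch and not part of the Parallel API: `choose_processor`, the `depth` labels, and the mapping are hypothetical names introduced here to mirror the tables in this guide. The only processor identifiers used are the ones listed above, with the `-fast` suffix dropped for standard variants as this guide describes.

```python
# Hypothetical helper: encodes the decision matrix from this guide.
# The depth labels are illustrative and are not Parallel API parameters.
DEPTH_TO_PROCESSOR = {
    "lookup": "base-fast",    # simple facts, metadata enrichment
    "moderate": "core-fast",  # cross-referenced answers, verification
    "section": "pro-fast",    # default: section-level research
    "report": "ultra-fast",   # comprehensive multi-source reports
}


def choose_processor(depth: str = "section", needs_freshest_data: bool = False) -> str:
    """Return a processor name for the requested research depth."""
    if depth not in DEPTH_TO_PROCESSOR:
        raise ValueError(
            f"unknown depth {depth!r}; expected one of {sorted(DEPTH_TO_PROCESSOR)}"
        )
    processor = DEPTH_TO_PROCESSOR[depth]
    if needs_freshest_data:
        # Standard processors carry no suffix, so strip "-fast".
        processor = processor[: -len("-fast")]
    return processor


if __name__ == "__main__":
    print(choose_processor())                                     # pro-fast
    print(choose_processor("report"))                             # ultra-fast
    print(choose_processor("section", needs_freshest_data=True))  # pro
```

Keeping the mapping in one dictionary means a change to the recommendations only touches a single place in calling code.
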
---

## Output Formats

### Text Mode (Markdown Reports)

Returns a comprehensive markdown report with inline citations. Best for human consumption and document integration.

```python
# Assumed import path, mirroring the `ParallelExtract` import used later in this guide
from parallel_web import ParallelDeepResearch

researcher = ParallelDeepResearch()

result = researcher.research(
    query="Comprehensive analysis of mRNA vaccine technology platforms and their applications beyond COVID-19",
    processor="pro-fast",
    description="Focus on clinical trials, approved applications, pipeline developments, and key companies. Include market size data."
)

# result["output"] contains a full markdown report
# result["citations"] contains source URLs with excerpts
```

**When to use text mode:**
- Writing scientific documents (papers, reviews, reports)
- Background research for a topic
- Creating summaries for human readers
- When you need flowing prose, not structured data

**Guiding text output with `description`:**

The `description` parameter steers the report content:

```python
# Focus on specific aspects
result = researcher.research(
    query="Electric vehicle battery technology landscape",
    description="Focus on: (1) solid-state battery progress, (2) charging speed improvements, (3) cost per kWh trends, (4) key patents and IP. Format as a structured report with clear sections."
)

# Control length and depth
result = researcher.research(
    query="AI in drug discovery",
    description="Provide a concise 500-word executive summary covering key applications, notable successes, leading companies, and market projections."
)
```

### Auto-Schema Mode (Structured JSON)

Lets the processor determine the best output structure automatically. Returns structured JSON with per-field citations.

```python
result = researcher.research_structured(
    query="Top 5 cloud computing companies: revenue, market share, key products, and recent developments",
    processor="pro-fast",
)

# result["content"] contains structured data (dict)
# result["basis"] contains per-field citations with confidence
```

**When to use auto-schema:**
- Data extraction and enrichment
- Comparative analysis with specific fields
- When you need programmatic access to individual data points
- Integration with databases or spreadsheets

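For the spreadsheet use case, structured output can be flattened to CSV with the standard library. This is a sketch under a stated assumption: the auto-generated structure is not documented here, so the example assumes you have already pulled a list of flat dictionaries out of `result["content"]`. The helper name `rows_to_csv` and the sample row are placeholders, not real research output.

```python
import csv
import io


def rows_to_csv(rows, columns):
    """Serialize a list of flat dicts to CSV text, keeping only `columns`."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # silently drop keys that are not in `columns`
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        # Missing fields become empty cells instead of raising KeyError.
        writer.writerow({col: row.get(col, "") for col in columns})
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder rows standing in for a list found inside result["content"].
    companies = [{"name": "ExampleCo", "revenue": "n/a", "notes": "ignored"}]
    print(rows_to_csv(companies, ["name", "revenue", "market_share"]))
```

Passing the column list explicitly keeps the CSV layout stable even when the processor returns extra or missing keys between runs.
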
### Custom JSON Schema

Define exactly what fields you want returned:

```python
schema = {
    "type": "object",
    "properties": {
        "market_size_2024": {
            "type": "string",
            "description": "Global market size in USD billions for 2024. Include source."
        },
        "growth_rate": {
            "type": "string",
            "description": "CAGR percentage for 2024-2030 forecast period."
        },
        "top_companies": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string", "description": "Company name"},
                    "market_share": {"type": "string", "description": "Approximate market share percentage"},
                    "revenue": {"type": "string", "description": "Most recent annual revenue"}
                },
                "required": ["name", "market_share", "revenue"]
            },
            "description": "Top 5 companies by market share"
        },
        "key_trends": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Top 3-5 industry trends driving growth"
        }
    },
    "required": ["market_size_2024", "growth_rate", "top_companies", "key_trends"],
    "additionalProperties": False
}

result = researcher.research_structured(
    query="Global cybersecurity market analysis",
    output_schema=schema,
)
```

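Before using structured output downstream, it can help to confirm that every field the schema marks as `required` actually came back. The sketch below is a deliberately small checker written for this guide; it is not a full JSON Schema validator and not part of the Parallel API. `missing_required` is a hypothetical helper, and the trimmed schema and sample content are placeholders.

```python
def missing_required(data, schema, path=""):
    """Return dotted paths of `required` fields that are absent from `data`."""
    missing = []
    schema_type = schema.get("type")
    if schema_type == "object" and isinstance(data, dict):
        for key in schema.get("required", []):
            if key not in data:
                missing.append(f"{path}{key}")
        # Recurse into the properties that are present.
        for key, sub_schema in schema.get("properties", {}).items():
            if key in data:
                missing.extend(missing_required(data[key], sub_schema, f"{path}{key}."))
    elif schema_type == "array" and isinstance(data, list):
        item_schema = schema.get("items", {})
        for index, item in enumerate(data):
            # path ends with "." (or is empty); drop it before adding the index.
            missing.extend(missing_required(item, item_schema, f"{path[:-1]}[{index}]."))
    return missing


if __name__ == "__main__":
    # Trimmed-down version of the schema above, for illustration only.
    schema = {
        "type": "object",
        "properties": {
            "growth_rate": {"type": "string"},
            "top_companies": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {"name": {"type": "string"}, "revenue": {"type": "string"}},
                    "required": ["name", "revenue"],
                },
            },
        },
        "required": ["growth_rate", "top_companies"],
    }
    content = {"top_companies": [{"name": "ExampleCo"}]}  # placeholder data
    print(missing_required(content, schema))
```

For anything beyond presence checks (types, `additionalProperties`, formats), a dedicated JSON Schema library is the more robust choice.
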
---

## Writing Effective Research Queries

### Query Construction Framework

Structure your query as: **[Topic] + [Specific Aspect] + [Scope/Time] + [Output Expectations]**

**Good queries:**
```
"Comprehensive analysis of the global lithium-ion battery recycling market,
including market size, key players, regulatory drivers, and technology
approaches. Focus on 2023-2025 developments."

"Compare the efficacy, safety profiles, and cost-effectiveness of GLP-1
receptor agonists (semaglutide, tirzepatide, liraglutide) for type 2
diabetes management based on recent clinical trial data."

"Survey of federated learning approaches for healthcare AI, covering
privacy-preserving techniques, real-world deployments, regulatory
compliance, and performance benchmarks from 2023-2025 publications."
```

**Poor queries:**
```
"Tell me about batteries"  # Too vague
"AI"  # No specific aspect
"What's new?"  # No topic at all
"Everything about quantum computing from all time"  # Too broad
```

### Tips for Better Results

1. **Be specific about what you need**: "market size" vs "tell me about the market"
2. **Include time bounds**: "2024-2025" narrows to relevant data
3. **Name entities**: "semaglutide vs tirzepatide" vs "diabetes drugs"
4. **Specify output expectations**: "Include statistics, key players, and growth projections"
5. **Keep under 15,000 characters**: Concise queries work better than massive prompts

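The framework and tips above can be applied mechanically when queries are assembled in code. Below is a minimal sketch: `build_query` and its parameter names are hypothetical and exist only for illustration. It joins the four parts of the framework and enforces the 15,000-character guideline before anything is sent.

```python
MAX_QUERY_CHARS = 15_000  # length guideline from the tips above


def build_query(topic, aspects, scope="", output_expectations=""):
    """Assemble [Topic] + [Specific Aspect] + [Scope/Time] + [Output Expectations]."""
    if not topic.strip():
        raise ValueError("topic must not be empty")
    if not aspects:
        raise ValueError("name at least one specific aspect")

    parts = [f"{topic.strip()}, covering {', '.join(aspects)}."]
    if scope:
        parts.append(f"Focus on {scope.strip()}.")
    if output_expectations:
        parts.append(f"Include {output_expectations.strip()}.")

    query = " ".join(parts)
    if len(query) >= MAX_QUERY_CHARS:
        raise ValueError(f"query is {len(query)} characters; keep it under {MAX_QUERY_CHARS}")
    return query


if __name__ == "__main__":
    print(build_query(
        "Global lithium-ion battery recycling market",
        ["market size", "key players", "regulatory drivers"],
        scope="2023-2025 developments",
        output_expectations="statistics and growth projections",
    ))
```

Rejecting an empty aspect list up front guards against the "too vague" pattern shown in the poor-query examples.
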
---

## Working with Research Basis

Every deep research result includes a **basis** -- citations, reasoning, and confidence levels for each finding.

### Text Mode Basis

```python
result = researcher.research(query="...", processor="pro-fast")

# Citations are deduplicated and include URLs + excerpts
for citation in result["citations"]:
    print(f"Source: {citation['title']}")
    print(f"URL: {citation['url']}")
    if citation.get("excerpts"):
        print(f"Excerpt: {citation['excerpts'][0][:200]}")
```

### Structured Mode Basis

```python
result = researcher.research_structured(query="...", processor="pro-fast")

for basis_entry in result["basis"]:
    print(f"Field: {basis_entry['field']}")
    print(f"Confidence: {basis_entry['confidence']}")
    print(f"Reasoning: {basis_entry['reasoning']}")
    for cit in basis_entry["citations"]:
        print(f"  Source: {cit['url']}")
```

### Confidence Levels

| Level | Meaning | Action |
|-------|---------|--------|
| `high` | Multiple authoritative sources agree | Use directly |
| `medium` | Some supporting evidence, minor uncertainty | Use with caveat |
| `low` | Limited evidence, significant uncertainty | Verify independently |

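The table above maps directly onto a triage step over the structured-mode basis. The sketch below is illustrative only: `triage_basis` is a hypothetical helper, the entries are assumed to carry the `field` and `confidence` keys shown in the structured-mode example, and treating a missing or unrecognized level as "verify independently" is a conservative choice made here, not documented API behavior.

```python
# Action per confidence level, taken from the table above.
ACTIONS = {
    "high": "use directly",
    "medium": "use with caveat",
    "low": "verify independently",
}


def triage_basis(basis):
    """Group field names by the action their confidence level calls for."""
    buckets = {action: [] for action in ACTIONS.values()}
    for entry in basis:
        level = str(entry.get("confidence", "")).lower()
        # Assumption: unknown or missing levels get the most cautious action.
        action = ACTIONS.get(level, "verify independently")
        buckets[action].append(entry.get("field", "<unknown>"))
    return buckets


if __name__ == "__main__":
    # Placeholder entries shaped like result["basis"] in structured mode.
    sample_basis = [
        {"field": "growth_rate", "confidence": "high"},
        {"field": "key_trends", "confidence": "medium"},
        {"field": "market_size_2024", "confidence": "low"},
    ]
    for action, fields in triage_basis(sample_basis).items():
        print(f"{action}: {fields}")
```
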
---

## Advanced Patterns

### Multi-Stage Research

Use different processors in sequence for progressively deeper research:

```python
# Stage 1: Quick overview with base-fast
overview = researcher.research(
    query="What are the main approaches to quantum error correction?",
    processor="base-fast",
)

# Stage 2: Deep dive on the most promising approach,
# passing the first 500 characters of the overview as context
deep_dive = researcher.research(
    query="Detailed analysis of surface code quantum error correction: "
          "recent breakthroughs, implementation challenges, and leading research groups. "
          f"Context: {overview['output'][:500]}",
    processor="pro-fast",
)
```

### Comparative Research

```python
result = researcher.research(
    query="Compare and contrast three leading large language model architectures: "
          "GPT-4, Claude, and Gemini. Cover architecture differences, benchmark performance, "
          "pricing, context window, and unique capabilities. Include specific benchmark scores.",
    processor="pro-fast",
    description="Create a structured comparison with a summary table. Include specific numbers and benchmarks."
)
```

### Research with Follow-Up Extraction

```python
from parallel_web import ParallelExtract

# Step 1: Research to find relevant sources
research_result = researcher.research(
    query="Most influential papers on attention mechanisms in 2024",
    processor="pro-fast",
)

# Step 2: Extract full content from the most relevant sources
extractor = ParallelExtract()

key_urls = [c["url"] for c in research_result["citations"][:5]]
extractions = []  # one extraction result per URL
for url in key_urls:
    extracted = extractor.extract(
        urls=[url],
        objective="Key methodology, results, and conclusions",
    )
    extractions.append(extracted)
```

---

## Performance Optimization

### Reducing Latency

1. **Use `-fast` processors**: 2-5x faster than standard
2. **Use `core-fast` for moderate queries**: Sub-2-minute for most questions
3. **Be specific in queries**: Vague queries require more exploration
4. **Set appropriate timeouts**: Match the timeout to the processor's expected latency rather than waiting indefinitely

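One way to size timeouts is to derive them from the upper latency bounds quoted in the processor tiers, plus a safety margin. The sketch below is generic and does not call the Parallel API: `timeout_for`, `wait_for`, the 1.5x margin, and the `poll` callable (any function that returns a result when ready and `None` otherwise) are assumptions introduced for illustration, since this guide does not show how results are polled.

```python
import time

# Upper latency bounds (seconds) from the processor tiers in this guide.
MAX_LATENCY_S = {
    "base-fast": 50,
    "core-fast": 100,
    "pro-fast": 300,
    "ultra-fast": 600,
}


def timeout_for(processor, margin=1.5):
    """Timeout in seconds: documented upper bound times a safety margin."""
    return MAX_LATENCY_S[processor] * margin


def wait_for(poll, timeout_s, interval_s=5.0, sleep=time.sleep, clock=time.monotonic):
    """Call `poll()` until it returns a non-None value or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        result = poll()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"no result after {timeout_s} seconds")
        sleep(interval_s)


if __name__ == "__main__":
    # Stand-in poller: reports "done" on the third call.
    attempts = iter([None, None, "done"])
    print(wait_for(lambda: next(attempts), timeout_for("core-fast"), interval_s=0.01))
```

Injecting `sleep` and `clock` as parameters keeps the loop testable without real waiting.
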
### Reducing Cost

1. **Start with `base-fast`**: Upgrade only if depth is insufficient
2. **Use `core-fast` for moderate complexity**: $0.025 per query vs $0.10 for `pro-fast`
3. **Batch related queries**: One well-crafted query beats several simple ones
4. **Cache results**: Store research output for reuse across sections

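The cost figures and the caching tip can be sketched together. The example below is illustrative: the per-query prices are the ones quoted in this guide, while `estimate_cost`, `ResearchCache`, and the `run` callable (standing in for a call such as `researcher.research`) are hypothetical names. The cache is in-memory only; persisting it to disk is left out for brevity.

```python
import hashlib
import json

# Per-query prices (USD) as quoted in the processor tiers above.
COST_PER_QUERY = {
    "base-fast": 0.01,
    "core-fast": 0.025,
    "pro-fast": 0.10,
    "ultra-fast": 0.30,
}


def estimate_cost(processor, n_queries):
    """Estimated spend in USD for `n_queries` on the given processor."""
    return round(COST_PER_QUERY[processor] * n_queries, 3)


class ResearchCache:
    """In-memory cache keyed by (processor, query)."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    def get_or_run(self, query, processor, run):
        # Hash a JSON-encoded pair so the key is unambiguous and compact.
        key = hashlib.sha256(json.dumps([processor, query]).encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            self._store[key] = run(query, processor)
        return self._store[key]


if __name__ == "__main__":
    def fake_run(query, processor):
        # Stand-in for a real research call.
        return {"output": f"report for {query!r} via {processor}"}

    cache = ResearchCache()
    cache.get_or_run("AI in drug discovery", "core-fast", fake_run)
    cache.get_or_run("AI in drug discovery", "core-fast", fake_run)  # served from cache
    print(cache.hits, estimate_cost("core-fast", 20), estimate_cost("pro-fast", 20))
```

Including the processor in the cache key prevents a shallow `base-fast` answer from being reused where a `pro-fast` report was requested.
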
### Maximizing Quality

1. **Use `pro-fast` or `ultra-fast`**: More sources = better synthesis
2. **Provide context**: "I'm writing a paper for Nature Medicine about..."
3. **Use `description` parameter**: Guide the output structure and focus
4. **Verify critical findings**: Cross-check with Search API or Extract

---

## Common Mistakes

| Mistake | Impact | Fix |
|---------|--------|-----|
| Query too vague | Scattered, unfocused results | Add specific aspects and time bounds |
| Query too long (>15K chars) | API rejection or degraded results | Summarize context, focus on key question |
| Wrong processor | Too slow or too shallow | Use the decision matrix above |
| Not using `description` | Report structure not aligned with needs | Add a description to guide output |
| Ignoring confidence levels | Using low-confidence data as fact | Check basis confidence before citing |
| Not verifying citations | Risk of outdated or misattributed data | Cross-check key citations with Extract |

---

## See Also

- [API Reference](api_reference.md) - Complete API parameter reference
- [Search Best Practices](search_best_practices.md) - For quick web searches
- [Extraction Patterns](extraction_patterns.md) - For reading specific URLs
- [Workflow Recipes](workflow_recipes.md) - Common multi-step patterns