# Deep Research Guide
Comprehensive guide to using Parallel's Task API for deep research, including processor selection, output formats, structured schemas, and advanced patterns.
## Overview
Deep Research transforms natural language research queries into comprehensive intelligence reports. Unlike simple search, it performs multi-step web exploration across authoritative sources and synthesizes findings with inline citations and confidence levels.
Key characteristics:
- Multi-step, multi-source research
- Automatic citation and source attribution
- Structured or text output formats
- Asynchronous processing (30 seconds to 25+ minutes)
- Research basis with confidence levels per finding
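Because processing is asynchronous and can run from under a minute to 25+ minutes, it helps to bound how long an interactive caller will wait. The helper below is a sketch, not part of the API; it assumes the blocking `researcher.research(...)` wrapper used in the examples throughout this guide:

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

def research_with_timeout(researcher, query, processor="pro-fast", timeout_s=600):
    """Run a blocking research call, giving up after timeout_s seconds.

    Returns the result dict, or None if the deadline passes. The
    underlying request may still complete on the server side.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(researcher.research, query=query, processor=processor)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return None
    finally:
        # Do not block on the worker thread; let a timed-out call finish in the background.
        pool.shutdown(wait=False)
```

For interactive use, pair a short `timeout_s` with a `-fast` processor; for background jobs, a generous timeout is usually fine.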
## Processor Selection
Choosing the right processor is the most important decision. It determines research depth, speed, and cost.
### Decision Matrix
| Scenario | Recommended Processor | Why |
|---|---|---|
| Quick background for a paper section | `pro-fast` | Fast, good depth, low cost |
| Comprehensive market research report | `ultra-fast` | Deep multi-source synthesis |
| Simple fact lookup or metadata | `base-fast` | Fast, low cost |
| Competitive landscape analysis | `pro-fast` | Good balance of depth and speed |
| Background for grant proposal | `pro-fast` | Thorough but timely |
| State-of-the-art review for a topic | `ultra-fast` | Maximum source coverage |
| Quick question during writing | `core-fast` | Sub-2-minute response |
| Breaking news or very recent events | `pro` (standard) | Freshest data prioritized |
| Large-scale data enrichment | `base-fast` | Cost-effective at scale |
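If you select processors programmatically, the decision matrix can be encoded as a small lookup. The helper below is purely illustrative (the depth labels and function name are made up for this sketch, not part of the API); it maps a rough depth requirement onto a tier and appends the `-fast` suffix for interactive use:

```python
def pick_processor(depth: str, interactive: bool = True) -> str:
    """Map a rough depth requirement onto a processor tier.

    depth: one of "simple", "moderate", "section", "report".
    interactive=False prefers standard processors for maximum data freshness.
    """
    tiers = {
        "simple": "base",     # fact lookup, metadata enrichment
        "moderate": "core",   # quick cross-referenced answers
        "section": "pro",     # section-level background research
        "report": "ultra",    # comprehensive multi-source reports
    }
    base = tiers[depth]
    return f"{base}-fast" if interactive else base
```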
### Processor Tiers Explained
**`pro-fast`** (default, recommended for most tasks):
- Latency: 30 seconds to 5 minutes
- Depth: Explores 10-20+ web sources
- Best for: Section-level research, background gathering, comparative analysis
- Cost: $0.10 per query
**`ultra-fast`** (for comprehensive research):
- Latency: 1 to 10 minutes
- Depth: Explores 20-50+ web sources, multiple reasoning steps
- Best for: Full reports, market analysis, complex multi-faceted questions
- Cost: $0.30 per query
**`core-fast`** (quick cross-referenced answers):
- Latency: 15 seconds to 100 seconds
- Depth: Cross-references 5-10 sources
- Best for: Moderate complexity questions, verification tasks
- Cost: $0.025 per query
**`base-fast`** (simple enrichment):
- Latency: 15 to 50 seconds
- Depth: Standard web lookup, 3-5 sources
- Best for: Simple factual queries, metadata enrichment
- Cost: $0.01 per query
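The per-query prices above make it easy to budget a research run before launching it. A minimal sketch (the helper itself is hypothetical; the prices come from the tier descriptions in this guide):

```python
# Per-query prices (USD) from the tier descriptions above
COST_PER_QUERY = {
    "pro-fast": 0.10,
    "ultra-fast": 0.30,
    "core-fast": 0.025,
    "base-fast": 0.01,
}

def estimate_cost(counts: dict) -> float:
    """Total cost in USD for a mix of queries, e.g. {"pro-fast": 3, "base-fast": 100}."""
    return round(sum(COST_PER_QUERY[p] * n for p, n in counts.items()), 4)
```

For example, three `pro-fast` queries plus one hundred `base-fast` enrichments come to $1.30.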
### Standard vs Fast
- **Fast processors** (`-fast` suffix): 2-5x faster, ideal for interactive use
- **Standard processors** (no suffix): highest data freshness, better for background jobs
**Rule of thumb:** Always use `-fast` variants unless you specifically need the freshest possible data (breaking news, live financial data, real-time events).
## Output Formats
### Text Mode (Markdown Reports)
Returns a comprehensive markdown report with inline citations. Best for human consumption and document integration.
```python
from parallel_web import ParallelDeepResearch  # import path assumed, matching the Extract example below

researcher = ParallelDeepResearch()
result = researcher.research(
    query="Comprehensive analysis of mRNA vaccine technology platforms and their applications beyond COVID-19",
    processor="pro-fast",
    description="Focus on clinical trials, approved applications, pipeline developments, and key companies. Include market size data.",
)

# result["output"] contains a full markdown report
# result["citations"] contains source URLs with excerpts
```
When to use text mode:
- Writing scientific documents (papers, reviews, reports)
- Background research for a topic
- Creating summaries for human readers
- When you need flowing prose, not structured data
**Guiding text output with `description`:**

The `description` parameter steers the report content:
```python
# Focus on specific aspects
result = researcher.research(
    query="Electric vehicle battery technology landscape",
    description="Focus on: (1) solid-state battery progress, (2) charging speed improvements, "
    "(3) cost per kWh trends, (4) key patents and IP. Format as a structured report with clear sections.",
)

# Control length and depth
result = researcher.research(
    query="AI in drug discovery",
    description="Provide a concise 500-word executive summary covering key applications, "
    "notable successes, leading companies, and market projections.",
)
```
### Auto-Schema Mode (Structured JSON)
Lets the processor determine the best output structure automatically. Returns structured JSON with per-field citations.
```python
result = researcher.research_structured(
    query="Top 5 cloud computing companies: revenue, market share, key products, and recent developments",
    processor="pro-fast",
)

# result["content"] contains structured data (dict)
# result["basis"] contains per-field citations with confidence
```
When to use auto-schema:
- Data extraction and enrichment
- Comparative analysis with specific fields
- When you need programmatic access to individual data points
- Integration with databases or spreadsheets
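Structured results are plain dicts, so feeding them into a spreadsheet or database is just a flattening step. The sketch below assumes a hypothetical result shape where one field holds a list of flat records (field names are illustrative, not guaranteed by the API):

```python
import csv
import io

def records_to_csv(content: dict, key: str) -> str:
    """Render a list-of-dicts field from a structured result as CSV text.

    Assumes content[key] is a non-empty list of flat dicts sharing the same keys.
    """
    rows = content[key]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```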
### Custom JSON Schema
Define exactly what fields you want returned:
```python
schema = {
    "type": "object",
    "properties": {
        "market_size_2024": {
            "type": "string",
            "description": "Global market size in USD billions for 2024. Include source.",
        },
        "growth_rate": {
            "type": "string",
            "description": "CAGR percentage for 2024-2030 forecast period.",
        },
        "top_companies": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string", "description": "Company name"},
                    "market_share": {"type": "string", "description": "Approximate market share percentage"},
                    "revenue": {"type": "string", "description": "Most recent annual revenue"},
                },
                "required": ["name", "market_share", "revenue"],
            },
            "description": "Top 5 companies by market share",
        },
        "key_trends": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Top 3-5 industry trends driving growth",
        },
    },
    "required": ["market_size_2024", "growth_rate", "top_companies", "key_trends"],
    "additionalProperties": False,
}

result = researcher.research_structured(
    query="Global cybersecurity market analysis",
    output_schema=schema,
)
```
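The API is expected to honor the schema, but a cheap local check catches truncated or partial responses before they enter a pipeline. This validator is a hand-rolled sketch (it only checks top-level `required` fields); it is not part of the client library:

```python
def missing_required(content: dict, schema: dict) -> list:
    """Return the names of top-level required fields absent from content."""
    return [field for field in schema.get("required", []) if field not in content]
```

Usage: if `missing_required(result["content"], schema)` is non-empty, re-run the query or fall back to a deeper processor rather than ingesting a partial record.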
## Writing Effective Research Queries
### Query Construction Framework
Structure your query as: **[Topic] + [Specific Aspect] + [Scope/Time] + [Output Expectations]**
**Good queries:**

```text
"Comprehensive analysis of the global lithium-ion battery recycling market,
including market size, key players, regulatory drivers, and technology
approaches. Focus on 2023-2025 developments."

"Compare the efficacy, safety profiles, and cost-effectiveness of GLP-1
receptor agonists (semaglutide, tirzepatide, liraglutide) for type 2
diabetes management based on recent clinical trial data."

"Survey of federated learning approaches for healthcare AI, covering
privacy-preserving techniques, real-world deployments, regulatory
compliance, and performance benchmarks from 2023-2025 publications."
```

**Poor queries:**

```text
"Tell me about batteries"                            # Too vague
"AI"                                                 # No specific aspect
"What's new?"                                        # No topic at all
"Everything about quantum computing from all time"   # Too broad
```
### Tips for Better Results
- **Be specific about what you need**: "market size" vs "tell me about the market"
- **Include time bounds**: "2024-2025" narrows to relevant data
- **Name entities**: "semaglutide vs tirzepatide" vs "diabetes drugs"
- **Specify output expectations**: "Include statistics, key players, and growth projections"
- **Keep under 15,000 characters**: concise queries work better than massive prompts
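The framework above can be wrapped in a small builder that also enforces the 15,000-character limit. A sketch with hypothetical parameter names, not part of the API:

```python
def build_query(topic: str, aspect: str, scope: str = "", expectations: str = "",
                limit: int = 15_000) -> str:
    """Assemble [Topic] + [Specific Aspect] + [Scope/Time] + [Output Expectations]."""
    parts = [p.strip() for p in (topic, aspect, scope, expectations) if p.strip()]
    query = " ".join(parts)
    if len(query) > limit:
        raise ValueError(f"query is {len(query)} chars; keep under {limit}")
    return query
```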
## Working with Research Basis
Every deep research result includes a basis: citations, reasoning, and confidence levels for each finding.
### Text Mode Basis
```python
result = researcher.research(query="...", processor="pro-fast")

# Citations are deduplicated and include URLs + excerpts
for citation in result["citations"]:
    print(f"Source: {citation['title']}")
    print(f"URL: {citation['url']}")
    if citation.get("excerpts"):
        print(f"Excerpt: {citation['excerpts'][0][:200]}")
```
### Structured Mode Basis
```python
result = researcher.research_structured(query="...", processor="pro-fast")

for basis_entry in result["basis"]:
    print(f"Field: {basis_entry['field']}")
    print(f"Confidence: {basis_entry['confidence']}")
    print(f"Reasoning: {basis_entry['reasoning']}")
    for cit in basis_entry["citations"]:
        print(f"  Source: {cit['url']}")
```
### Confidence Levels
| Level | Meaning | Action |
|---|---|---|
| `high` | Multiple authoritative sources agree | Use directly |
| `medium` | Some supporting evidence, minor uncertainty | Use with caveat |
| `low` | Limited evidence, significant uncertainty | Verify independently |
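When consuming structured results programmatically, it is worth bucketing fields by confidence so that `low` findings get routed to verification. A minimal sketch over basis entries shaped like those in the structured-mode example above (the helper itself is not part of the API):

```python
def triage_basis(basis: list) -> dict:
    """Bucket structured-mode basis entries by confidence level.

    Returns {"high": [...], "medium": [...], "low": [...]} of field names;
    anything in "low" should be verified independently before citing.
    """
    buckets = {"high": [], "medium": [], "low": []}
    for entry in basis:
        buckets[entry["confidence"]].append(entry["field"])
    return buckets
```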
## Advanced Patterns
### Multi-Stage Research
Use different processors in sequence for progressively deeper research:
```python
# Stage 1: Quick overview with base-fast
overview = researcher.research(
    query="What are the main approaches to quantum error correction?",
    processor="base-fast",
)

# Stage 2: Deep dive on the most promising approach
deep_dive = researcher.research(
    query=f"Detailed analysis of surface code quantum error correction: "
    f"recent breakthroughs, implementation challenges, and leading research groups. "
    f"Context: {overview['output'][:500]}",
    processor="pro-fast",
)
```
### Comparative Research
```python
result = researcher.research(
    query="Compare and contrast three leading large language model architectures: "
    "GPT-4, Claude, and Gemini. Cover architecture differences, benchmark performance, "
    "pricing, context window, and unique capabilities. Include specific benchmark scores.",
    processor="pro-fast",
    description="Create a structured comparison with a summary table. Include specific numbers and benchmarks.",
)
```
### Research with Follow-Up Extraction
```python
from parallel_web import ParallelExtract

# Step 1: Research to find relevant sources
research_result = researcher.research(
    query="Most influential papers on attention mechanisms in 2024",
    processor="pro-fast",
)

# Step 2: Extract full content from the most relevant sources
extractor = ParallelExtract()
key_urls = [c["url"] for c in research_result["citations"][:5]]

for url in key_urls:
    extracted = extractor.extract(
        urls=[url],
        objective="Key methodology, results, and conclusions",
    )
```
## Performance Optimization
### Reducing Latency
- **Use `-fast` processors**: 2-5x faster than standard
- **Use `core-fast` for moderate queries**: sub-2-minute for most questions
- **Be specific in queries**: vague queries require more exploration
- **Set appropriate timeouts**: don't over-wait
### Reducing Cost
- **Start with `base-fast`**: upgrade only if depth is insufficient
- **Use `core-fast` for moderate complexity**: $0.025 vs $0.10 for `pro-fast`
- **Batch related queries**: one well-crafted query beats multiple simple ones
- **Cache results**: store research output for reuse across sections
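A simple on-disk cache keyed by query and processor avoids paying twice for the same research. A sketch (the cache location, key scheme, and file format are arbitrary choices for illustration, not part of the API):

```python
import hashlib
import json
import pathlib

def cached_research(researcher, query, processor="pro-fast", cache_dir=".research_cache"):
    """Return a cached result for (query, processor) if present, else run and store it."""
    cache = pathlib.Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(f"{processor}:{query}".encode()).hexdigest()
    path = cache / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())
    result = researcher.research(query=query, processor=processor)
    path.write_text(json.dumps(result))
    return result
```

Note there is no expiry here; for freshness-sensitive topics, delete stale entries or add a timestamp check.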
### Maximizing Quality
- **Use `pro-fast` or `ultra-fast`**: more sources means better synthesis
- **Provide context**: "I'm writing a paper for Nature Medicine about..."
- **Use the `description` parameter**: guide the output structure and focus
- **Verify critical findings**: cross-check with the Search API or Extract
## Common Mistakes
| Mistake | Impact | Fix |
|---|---|---|
| Query too vague | Scattered, unfocused results | Add specific aspects and time bounds |
| Query too long (>15K chars) | API rejection or degraded results | Summarize context, focus on key question |
| Wrong processor | Too slow or too shallow | Use decision matrix above |
| Not using `description` | Report structure not aligned with needs | Add `description` to guide output |
| Ignoring confidence levels | Using low-confidence data as fact | Check basis confidence before citing |
| Not verifying citations | Risk of outdated or misattributed data | Cross-check key citations with Extract |
## See Also
- API Reference - Complete API parameter reference
- Search Best Practices - For quick web searches
- Extraction Patterns - For reading specific URLs
- Workflow Recipes - Common multi-step patterns