# Deep Research Guide
Comprehensive guide to using Parallel's Task API for deep research, including processor selection, output formats, structured schemas, and advanced patterns.
## Overview
Deep Research transforms natural language research queries into comprehensive intelligence reports. Unlike simple search, it performs multi-step web exploration across authoritative sources and synthesizes findings with inline citations and confidence levels.
Key characteristics:
- Multi-step, multi-source research
- Automatic citation and source attribution
- Structured or text output formats
- Asynchronous processing (30 seconds to 25+ minutes)
- Research basis with confidence levels per finding
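Because processing is asynchronous and can run from under a minute to 25+ minutes, it helps to bound how long an interactive caller will wait. The helper below is a sketch, not part of the API; it assumes the blocking `researcher.research(...)` wrapper used in the examples throughout this guide:

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

def research_with_timeout(researcher, query, processor="pro-fast", timeout_s=600):
    """Run a blocking research call, giving up after timeout_s seconds.

    Returns the result dict, or None if the deadline passes. The
    underlying request may still complete on the server side.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(researcher.research, query=query, processor=processor)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return None
    finally:
        # Do not block on the worker thread; let a timed-out call finish in the background.
        pool.shutdown(wait=False)
```

For interactive use, pair a short `timeout_s` with a `-fast` processor; for background jobs, a generous timeout is usually fine.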
## Processor Selection
Choosing the right processor is the most important decision. It determines research depth, speed, and cost.
### Decision Matrix
| Scenario | Recommended Processor | Why |
|---|---|---|
| Quick background for a paper section | `pro-fast` | Fast, good depth, low cost |
| Comprehensive market research report | `ultra-fast` | Deep multi-source synthesis |
| Simple fact lookup or metadata | `base-fast` | Fast, low cost |
| Competitive landscape analysis | `pro-fast` | Good balance of depth and speed |
| Background for grant proposal | `pro-fast` | Thorough but timely |
| State-of-the-art review for a topic | `ultra-fast` | Maximum source coverage |
| Quick question during writing | `core-fast` | Sub-2-minute response |
| Breaking news or very recent events | `pro` (standard) | Freshest data prioritized |
| Large-scale data enrichment | `base-fast` | Cost-effective at scale |
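If you select processors programmatically, the decision matrix can be encoded as a small lookup. The helper below is purely illustrative (the depth labels and function name are made up for this sketch, not part of the API); it maps a rough depth requirement onto a tier and appends the `-fast` suffix for interactive use:

```python
def pick_processor(depth: str, interactive: bool = True) -> str:
    """Map a rough depth requirement onto a processor tier.

    depth: one of "simple", "moderate", "section", "report".
    interactive=False prefers standard processors for maximum data freshness.
    """
    tiers = {
        "simple": "base",     # fact lookup, metadata enrichment
        "moderate": "core",   # quick cross-referenced answers
        "section": "pro",     # section-level background research
        "report": "ultra",    # comprehensive multi-source reports
    }
    base = tiers[depth]
    return f"{base}-fast" if interactive else base
```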
### Processor Tiers Explained
**`pro-fast`** (default, recommended for most tasks):
- Latency: 30 seconds to 5 minutes
- Depth: Explores 10-20+ web sources
- Best for: Section-level research, background gathering, comparative analysis
- Cost: $0.10 per query
**`ultra-fast`** (for comprehensive research):
- Latency: 1 to 10 minutes
- Depth: Explores 20-50+ web sources, multiple reasoning steps
- Best for: Full reports, market analysis, complex multi-faceted questions
- Cost: $0.30 per query
**`core-fast`** (quick cross-referenced answers):
- Latency: 15 seconds to 100 seconds
- Depth: Cross-references 5-10 sources
- Best for: Moderate complexity questions, verification tasks
- Cost: $0.025 per query
**`base-fast`** (simple enrichment):
- Latency: 15 to 50 seconds
- Depth: Standard web lookup, 3-5 sources
- Best for: Simple factual queries, metadata enrichment
- Cost: $0.01 per query
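The per-query prices above make it easy to budget a research run before launching it. A minimal sketch (the helper itself is hypothetical; the prices come from the tier descriptions in this guide):

```python
# Per-query prices (USD) from the tier descriptions above
COST_PER_QUERY = {
    "pro-fast": 0.10,
    "ultra-fast": 0.30,
    "core-fast": 0.025,
    "base-fast": 0.01,
}

def estimate_cost(counts: dict) -> float:
    """Total cost in USD for a mix of queries, e.g. {"pro-fast": 3, "base-fast": 100}."""
    return round(sum(COST_PER_QUERY[p] * n for p, n in counts.items()), 4)
```

For example, three `pro-fast` queries plus one hundred `base-fast` enrichments come to $1.30.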
### Standard vs Fast
- **Fast processors** (`-fast` suffix): 2-5x faster, ideal for interactive use
- **Standard processors** (no suffix): highest data freshness, better for background jobs
**Rule of thumb:** Always use `-fast` variants unless you specifically need the freshest possible data (breaking news, live financial data, real-time events).
## Output Formats
### Text Mode (Markdown Reports)
Returns a comprehensive markdown report with inline citations. Best for human consumption and document integration.
```python
from parallel_web import ParallelDeepResearch  # import path assumed, matching the Extract example below

researcher = ParallelDeepResearch()
result = researcher.research(
    query="Comprehensive analysis of mRNA vaccine technology platforms and their applications beyond COVID-19",
    processor="pro-fast",
    description="Focus on clinical trials, approved applications, pipeline developments, and key companies. Include market size data.",
)

# result["output"] contains a full markdown report
# result["citations"] contains source URLs with excerpts
```
When to use text mode:
- Writing scientific documents (papers, reviews, reports)
- Background research for a topic
- Creating summaries for human readers
- When you need flowing prose, not structured data
**Guiding text output with `description`:**

The `description` parameter steers the report content:
```python
# Focus on specific aspects
result = researcher.research(
    query="Electric vehicle battery technology landscape",
    description="Focus on: (1) solid-state battery progress, (2) charging speed improvements, "
    "(3) cost per kWh trends, (4) key patents and IP. Format as a structured report with clear sections.",
)

# Control length and depth
result = researcher.research(
    query="AI in drug discovery",
    description="Provide a concise 500-word executive summary covering key applications, "
    "notable successes, leading companies, and market projections.",
)
```
### Auto-Schema Mode (Structured JSON)
Lets the processor determine the best output structure automatically. Returns structured JSON with per-field citations.
```python
result = researcher.research_structured(
    query="Top 5 cloud computing companies: revenue, market share, key products, and recent developments",
    processor="pro-fast",
)

# result["content"] contains structured data (dict)
# result["basis"] contains per-field citations with confidence
```
When to use auto-schema:
- Data extraction and enrichment
- Comparative analysis with specific fields
- When you need programmatic access to individual data points
- Integration with databases or spreadsheets
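Structured results are plain dicts, so feeding them into a spreadsheet or database is just a flattening step. The sketch below assumes a hypothetical result shape where one field holds a list of flat records (field names are illustrative, not guaranteed by the API):

```python
import csv
import io

def records_to_csv(content: dict, key: str) -> str:
    """Render a list-of-dicts field from a structured result as CSV text.

    Assumes content[key] is a non-empty list of flat dicts sharing the same keys.
    """
    rows = content[key]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```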
### Custom JSON Schema
Define exactly what fields you want returned:
```python
schema = {
    "type": "object",
    "properties": {
        "market_size_2024": {
            "type": "string",
            "description": "Global market size in USD billions for 2024. Include source.",
        },
        "growth_rate": {
            "type": "string",
            "description": "CAGR percentage for 2024-2030 forecast period.",
        },
        "top_companies": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string", "description": "Company name"},
                    "market_share": {"type": "string", "description": "Approximate market share percentage"},
                    "revenue": {"type": "string", "description": "Most recent annual revenue"},
                },
                "required": ["name", "market_share", "revenue"],
            },
            "description": "Top 5 companies by market share",
        },
        "key_trends": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Top 3-5 industry trends driving growth",
        },
    },
    "required": ["market_size_2024", "growth_rate", "top_companies", "key_trends"],
    "additionalProperties": False,
}

result = researcher.research_structured(
    query="Global cybersecurity market analysis",
    output_schema=schema,
)
```
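The API is expected to honor the schema, but a cheap local check catches truncated or partial responses before they enter a pipeline. This validator is a hand-rolled sketch (it only checks top-level `required` fields); it is not part of the client library:

```python
def missing_required(content: dict, schema: dict) -> list:
    """Return the names of top-level required fields absent from content."""
    return [field for field in schema.get("required", []) if field not in content]
```

Usage: if `missing_required(result["content"], schema)` is non-empty, re-run the query or fall back to a deeper processor rather than ingesting a partial record.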
## Writing Effective Research Queries
### Query Construction Framework
Structure your query as: **[Topic] + [Specific Aspect] + [Scope/Time] + [Output Expectations]**
**Good queries:**

```text
"Comprehensive analysis of the global lithium-ion battery recycling market,
including market size, key players, regulatory drivers, and technology
approaches. Focus on 2023-2025 developments."

"Compare the efficacy, safety profiles, and cost-effectiveness of GLP-1
receptor agonists (semaglutide, tirzepatide, liraglutide) for type 2
diabetes management based on recent clinical trial data."

"Survey of federated learning approaches for healthcare AI, covering
privacy-preserving techniques, real-world deployments, regulatory
compliance, and performance benchmarks from 2023-2025 publications."
```

**Poor queries:**

```text
"Tell me about batteries"                            # Too vague
"AI"                                                 # No specific aspect
"What's new?"                                        # No topic at all
"Everything about quantum computing from all time"   # Too broad
```
### Tips for Better Results
- **Be specific about what you need**: "market size" vs "tell me about the market"
- **Include time bounds**: "2024-2025" narrows to relevant data
- **Name entities**: "semaglutide vs tirzepatide" vs "diabetes drugs"
- **Specify output expectations**: "Include statistics, key players, and growth projections"
- **Keep under 15,000 characters**: concise queries work better than massive prompts
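The framework above can be wrapped in a small builder that also enforces the 15,000-character limit. A sketch with hypothetical parameter names, not part of the API:

```python
def build_query(topic: str, aspect: str, scope: str = "", expectations: str = "",
                limit: int = 15_000) -> str:
    """Assemble [Topic] + [Specific Aspect] + [Scope/Time] + [Output Expectations]."""
    parts = [p.strip() for p in (topic, aspect, scope, expectations) if p.strip()]
    query = " ".join(parts)
    if len(query) > limit:
        raise ValueError(f"query is {len(query)} chars; keep under {limit}")
    return query
```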
## Working with Research Basis
Every deep research result includes a basis: citations, reasoning, and confidence levels for each finding.
### Text Mode Basis
```python
result = researcher.research(query="...", processor="pro-fast")

# Citations are deduplicated and include URLs + excerpts
for citation in result["citations"]:
    print(f"Source: {citation['title']}")
    print(f"URL: {citation['url']}")
    if citation.get("excerpts"):
        print(f"Excerpt: {citation['excerpts'][0][:200]}")
```
### Structured Mode Basis
```python
result = researcher.research_structured(query="...", processor="pro-fast")

for basis_entry in result["basis"]:
    print(f"Field: {basis_entry['field']}")
    print(f"Confidence: {basis_entry['confidence']}")
    print(f"Reasoning: {basis_entry['reasoning']}")
    for cit in basis_entry["citations"]:
        print(f"  Source: {cit['url']}")
```
### Confidence Levels
| Level | Meaning | Action |
|---|---|---|
| `high` | Multiple authoritative sources agree | Use directly |
| `medium` | Some supporting evidence, minor uncertainty | Use with caveat |
| `low` | Limited evidence, significant uncertainty | Verify independently |
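When consuming structured results programmatically, it is worth bucketing fields by confidence so that `low` findings get routed to verification. A minimal sketch over basis entries shaped like those in the structured-mode example above (the helper itself is not part of the API):

```python
def triage_basis(basis: list) -> dict:
    """Bucket structured-mode basis entries by confidence level.

    Returns {"high": [...], "medium": [...], "low": [...]} of field names;
    anything in "low" should be verified independently before citing.
    """
    buckets = {"high": [], "medium": [], "low": []}
    for entry in basis:
        buckets[entry["confidence"]].append(entry["field"])
    return buckets
```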
## Advanced Patterns
### Multi-Stage Research
Use different processors in sequence for progressively deeper research:
```python
# Stage 1: Quick overview with base-fast
overview = researcher.research(
    query="What are the main approaches to quantum error correction?",
    processor="base-fast",
)

# Stage 2: Deep dive on the most promising approach
deep_dive = researcher.research(
    query=f"Detailed analysis of surface code quantum error correction: "
    f"recent breakthroughs, implementation challenges, and leading research groups. "
    f"Context: {overview['output'][:500]}",
    processor="pro-fast",
)
```
### Comparative Research
```python
result = researcher.research(
    query="Compare and contrast three leading large language model architectures: "
    "GPT-4, Claude, and Gemini. Cover architecture differences, benchmark performance, "
    "pricing, context window, and unique capabilities. Include specific benchmark scores.",
    processor="pro-fast",
    description="Create a structured comparison with a summary table. Include specific numbers and benchmarks.",
)
```
### Research with Follow-Up Extraction
```python
from parallel_web import ParallelExtract

# Step 1: Research to find relevant sources
research_result = researcher.research(
    query="Most influential papers on attention mechanisms in 2024",
    processor="pro-fast",
)

# Step 2: Extract full content from the most relevant sources
extractor = ParallelExtract()
key_urls = [c["url"] for c in research_result["citations"][:5]]

for url in key_urls:
    extracted = extractor.extract(
        urls=[url],
        objective="Key methodology, results, and conclusions",
    )
```
## Performance Optimization
### Reducing Latency
- **Use `-fast` processors**: 2-5x faster than standard
- **Use `core-fast` for moderate queries**: sub-2-minute for most questions
- **Be specific in queries**: vague queries require more exploration
- **Set appropriate timeouts**: don't over-wait
### Reducing Cost
- **Start with `base-fast`**: upgrade only if depth is insufficient
- **Use `core-fast` for moderate complexity**: $0.025 vs $0.10 for `pro-fast`
- **Batch related queries**: one well-crafted query beats multiple simple ones
- **Cache results**: store research output for reuse across sections
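A simple on-disk cache keyed by query and processor avoids paying twice for the same research. A sketch (the cache location, key scheme, and file format are arbitrary choices for illustration, not part of the API):

```python
import hashlib
import json
import pathlib

def cached_research(researcher, query, processor="pro-fast", cache_dir=".research_cache"):
    """Return a cached result for (query, processor) if present, else run and store it."""
    cache = pathlib.Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(f"{processor}:{query}".encode()).hexdigest()
    path = cache / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())
    result = researcher.research(query=query, processor=processor)
    path.write_text(json.dumps(result))
    return result
```

Note there is no expiry here; for freshness-sensitive topics, delete stale entries or add a timestamp check.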
### Maximizing Quality
- **Use `pro-fast` or `ultra-fast`**: more sources means better synthesis
- **Provide context**: "I'm writing a paper for Nature Medicine about..."
- **Use the `description` parameter**: guide the output structure and focus
- **Verify critical findings**: cross-check with the Search API or Extract
## Common Mistakes
| Mistake | Impact | Fix |
|---|---|---|
| Query too vague | Scattered, unfocused results | Add specific aspects and time bounds |
| Query too long (>15K chars) | API rejection or degraded results | Summarize context, focus on key question |
| Wrong processor | Too slow or too shallow | Use decision matrix above |
| Not using `description` | Report structure not aligned with needs | Add `description` to guide output |
| Ignoring confidence levels | Using low-confidence data as fact | Check basis confidence before citing |
| Not verifying citations | Risk of outdated or misattributed data | Cross-check key citations with Extract |
## See Also
- API Reference - Complete API parameter reference
- Search Best Practices - For quick web searches
- Extraction Patterns - For reading specific URLs
- Workflow Recipes - Common multi-step patterns