Added parallel-web skill

Refactor research lookup skill to enhance backend routing and update documentation. The skill now intelligently selects between the Parallel Chat API and Perplexity sonar-pro-search based on query type. Added compatibility notes, license information, and improved descriptions for clarity. Removed outdated example scripts to streamline the codebase.
2026-03-27 07:09:27 +08:00 · 2026-03-01 07:36:19 -08:00
parent 29c869326e
commit f72b7f4521
13 changed files with 3969 additions and 769 deletions
--- a/scientific-skills/parallel-web/references/deep_research_guide.md
+++ b/scientific-skills/parallel-web/references/deep_research_guide.md
@@ -0,0 +1,362 @@
+# Deep Research Guide
+
+A comprehensive guide to using Parallel's Task API for deep research, covering processor selection, output formats, structured schemas, and advanced patterns.
+
+---
+
+## Overview
+
+Deep Research transforms natural language research queries into comprehensive intelligence reports. Unlike simple search, it performs multi-step web exploration across authoritative sources and synthesizes findings with inline citations and confidence levels.
+
+**Key characteristics:**
+- Multi-step, multi-source research
+- Automatic citation and source attribution
+- Structured or text output formats
+- Asynchronous processing (roughly 15 seconds to 25+ minutes, depending on the processor)
+- Research basis with confidence levels per finding
+
+---
+
+## Processor Selection
+
+Choosing the right processor is the most important decision. It determines research depth, speed, and cost.
+
+### Decision Matrix
+
+| Scenario | Recommended Processor | Why |
+|----------|----------------------|-----|
+| Quick background for a paper section | `pro-fast` | Fast, good depth, low cost |
+| Comprehensive market research report | `ultra-fast` | Deep multi-source synthesis |
+| Simple fact lookup or metadata | `base-fast` | Fast, low cost |
+| Competitive landscape analysis | `pro-fast` | Good balance of depth and speed |
+| Background for grant proposal | `pro-fast` | Thorough but timely |
+| State-of-the-art review for a topic | `ultra-fast` | Maximum source coverage |
+| Quick question during writing | `core-fast` | Sub-2-minute response |
+| Breaking news or very recent events | `pro` (standard) | Freshest data prioritized |
+| Large-scale data enrichment | `base-fast` | Cost-effective at scale |
+
+### Processor Tiers Explained
+
+**`pro-fast`** (default, recommended for most tasks):
+- Latency: 30 seconds to 5 minutes
+- Depth: Explores 10-20+ web sources
+- Best for: Section-level research, background gathering, comparative analysis
+- Cost: $0.10 per query
+
+**`ultra-fast`** (for comprehensive research):
+- Latency: 1 to 10 minutes
+- Depth: Explores 20-50+ web sources, multiple reasoning steps
+- Best for: Full reports, market analysis, complex multi-faceted questions
+- Cost: $0.30 per query
+
+**`core-fast`** (quick cross-referenced answers):
+- Latency: 15 to 100 seconds
+- Depth: Cross-references 5-10 sources
+- Best for: Moderate complexity questions, verification tasks
+- Cost: $0.025 per query
+
+**`base-fast`** (simple enrichment):
+- Latency: 15 to 50 seconds
+- Depth: Standard web lookup, 3-5 sources
+- Best for: Simple factual queries, metadata enrichment
+- Cost: $0.01 per query
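The tier figures above can be collected into a small lookup table for rough budgeting. This is a minimal sketch and not part of the skill: `Tier`, `TIERS`, `estimate_cost`, and `deepest_within` are hypothetical names, and the numbers are transcribed from the list above rather than read from any pricing API.

```python
from decimal import Decimal
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    cost_usd: Decimal    # price per query, as quoted in the tier list
    max_latency_s: int   # upper end of the quoted latency range, in seconds


# Values copied from the tier list above (assumed to be current).
TIERS = {
    "base-fast": Tier(Decimal("0.01"), 50),
    "core-fast": Tier(Decimal("0.025"), 100),
    "pro-fast": Tier(Decimal("0.10"), 300),
    "ultra-fast": Tier(Decimal("0.30"), 600),
}


def estimate_cost(processor: str, n_queries: int) -> Decimal:
    """Total cost in USD for n_queries run on one processor."""
    return TIERS[processor].cost_usd * n_queries


def deepest_within(max_wait_s: int) -> Optional[str]:
    """Most expensive (deepest) tier whose worst-case latency fits the wait budget."""
    fitting = [
        (tier.cost_usd, name)
        for name, tier in TIERS.items()
        if tier.max_latency_s <= max_wait_s
    ]
    return max(fitting)[1] if fitting else None
```

Using `Decimal` keeps the per-query prices exact when they are multiplied across a large batch.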
+
+### Standard vs Fast
+
+- **Fast processors** (`-fast` suffix): 2-5x faster than their standard counterparts, with very fresh data; ideal for interactive use
+- **Standard processors** (no suffix): Slower, but with the highest data freshness; better suited to background jobs
+
+**Rule of thumb:** Default to the `-fast` variants, and switch to a standard processor only when you specifically need the freshest possible data (breaking news, live financial data, real-time events).
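The rule of thumb can be written as a one-line helper. This is a sketch only: `resolve_processor` is a hypothetical name, and it assumes every tier exists in both a standard and a `-fast` variant, which this guide shows explicitly only for `pro`.

```python
def resolve_processor(tier: str, need_freshest: bool = False) -> str:
    """Return the processor name for a tier, using `-fast` unless freshness wins."""
    base = tier.removesuffix("-fast")  # accept "pro" or "pro-fast" (Python 3.9+)
    return base if need_freshest else f"{base}-fast"
```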
+
+---
+
+## Output Formats
+
+### Text Mode (Markdown Reports)
+
+Returns a comprehensive markdown report with inline citations. Best for human consumption and document integration.
+
+```python
+researcher = ParallelDeepResearch()
+
+result = researcher.research(
+    query="Comprehensive analysis of mRNA vaccine technology platforms and their applications beyond COVID-19",
+    processor="pro-fast",
+    description="Focus on clinical trials, approved applications, pipeline developments, and key companies. Include market size data."
+)
+
+# result["output"] contains a full markdown report
+# result["citations"] contains source URLs with excerpts
+```
+
+**When to use text mode:**
+- Writing scientific documents (papers, reviews, reports)
+- Background research for a topic
+- Creating summaries for human readers
+- When you need flowing prose, not structured data
+
+**Guiding text output with `description`:**
+
+The `description` parameter steers the report content:
+
+```python
+# Focus on specific aspects
+result = researcher.research(
+    query="Electric vehicle battery technology landscape",
+    description="Focus on: (1) solid-state battery progress, (2) charging speed improvements, (3) cost per kWh trends, (4) key patents and IP. Format as a structured report with clear sections."
+)
+
+# Control length and depth
+result = researcher.research(
+    query="AI in drug discovery",
+    description="Provide a concise 500-word executive summary covering key applications, notable successes, leading companies, and market projections."
+)
+```
+
+### Auto-Schema Mode (Structured JSON)
+
+Lets the processor determine the best output structure automatically. Returns structured JSON with per-field citations.
+
+```python
+result = researcher.research_structured(
+    query="Top 5 cloud computing companies: revenue, market share, key products, and recent developments",
+    processor="pro-fast",
+)
+
+# result["content"] contains structured data (dict)
+# result["basis"] contains per-field citations with confidence
+```
+
+**When to use auto-schema:**
+- Data extraction and enrichment
+- Comparative analysis with specific fields
+- When you need programmatic access to individual data points
+- Integration with databases or spreadsheets
+
+### Custom JSON Schema
+
+Define exactly what fields you want returned:
+
+```python
+schema = {
+    "type": "object",
+    "properties": {
+        "market_size_2024": {
+            "type": "string",
+            "description": "Global market size in USD billions for 2024. Include source."
+        },
+        "growth_rate": {
+            "type": "string",
+            "description": "CAGR percentage for 2024-2030 forecast period."
+        },
+        "top_companies": {
+            "type": "array",
+            "items": {
+                "type": "object",
+                "properties": {
+                    "name": {"type": "string", "description": "Company name"},
+                    "market_share": {"type": "string", "description": "Approximate market share percentage"},
+                    "revenue": {"type": "string", "description": "Most recent annual revenue"}
+                },
+                "required": ["name", "market_share", "revenue"]
+            },
+            "description": "Top 5 companies by market share"
+        },
+        "key_trends": {
+            "type": "array",
+            "items": {"type": "string"},
+            "description": "Top 3-5 industry trends driving growth"
+        }
+    },
+    "required": ["market_size_2024", "growth_rate", "top_companies", "key_trends"],
+    "additionalProperties": False
+}
+
+result = researcher.research_structured(
+    query="Global cybersecurity market analysis",
+    output_schema=schema,
+)
+```
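Before relying on a structured result, it can help to confirm that every `required` field came back. The following is a minimal sketch using only the standard library; `missing_required` is a hypothetical helper that checks `required` keys in nested objects and arrays and nothing else, so it is not a full JSON Schema validator.

```python
from typing import Any, List


def missing_required(schema: dict, data: Any, path: str = "") -> List[str]:
    """Return dotted paths of required fields that are absent from data."""
    missing: List[str] = []
    if schema.get("type") == "object" and isinstance(data, dict):
        # Report required keys that are absent at this level.
        for key in schema.get("required", []):
            if key not in data:
                missing.append(f"{path}{key}")
        # Recurse into the properties that are present.
        for key, sub_schema in schema.get("properties", {}).items():
            if key in data:
                missing += missing_required(sub_schema, data[key], f"{path}{key}.")
    elif schema.get("type") == "array" and isinstance(data, list):
        # Check every element against the array's item schema.
        for index, item in enumerate(data):
            missing += missing_required(schema.get("items", {}), item, f"{path}{index}.")
    return missing
```

Assuming the structured data is in `result["content"]`, as in the auto-schema example, `missing_required(schema, result["content"])` returns an empty list when every required field is present.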
+
+---
+
+## Writing Effective Research Queries
+
+### Query Construction Framework
+
+Structure your query as: **[Topic] + [Specific Aspect] + [Scope/Time] + [Output Expectations]**
+
+**Good queries:**
+```
+"Comprehensive analysis of the global lithium-ion battery recycling market,
+including market size, key players, regulatory drivers, and technology
+approaches. Focus on 2023-2025 developments."
+
+"Compare the efficacy, safety profiles, and cost-effectiveness of GLP-1
+receptor agonists (semaglutide, tirzepatide, liraglutide) for type 2
+diabetes management based on recent clinical trial data."
+
+"Survey of federated learning approaches for healthcare AI, covering
+privacy-preserving techniques, real-world deployments, regulatory
+compliance, and performance benchmarks from 2023-2025 publications."
+```
+
+**Poor queries:**
+```
+"Tell me about batteries"          # Too vague
+"AI"                                # No specific aspect
+"What's new?"                       # No topic at all
+"Everything about quantum computing from all time"  # Too broad
+```
+
+### Tips for Better Results
+
+1. **Be specific about what you need**: "market size" vs "tell me about the market"
+2. **Include time bounds**: "2024-2025" narrows to relevant data
+3. **Name entities**: "semaglutide vs tirzepatide" vs "diabetes drugs"
+4. **Specify output expectations**: "Include statistics, key players, and growth projections"
+5. **Keep under 15,000 characters**: Concise queries work better than massive prompts
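The framework and the length tip can be combined in a small builder. This is a sketch under stated assumptions: `build_query` and `MAX_QUERY_CHARS` are hypothetical names, and the sentence template is just one way to order the four parts.

```python
MAX_QUERY_CHARS = 15_000  # limit quoted in the tips above


def build_query(topic: str, aspects: list[str], scope: str, expectations: str) -> str:
    """Assemble a query as Topic + Specific Aspect + Scope/Time + Output Expectations."""
    if not topic.strip() or not aspects:
        raise ValueError("a topic and at least one specific aspect are required")
    query = (
        f"{topic.strip()}, covering {', '.join(aspects)}. "
        f"Focus on {scope.strip()}. {expectations.strip()}"
    )
    if len(query) > MAX_QUERY_CHARS:
        raise ValueError(f"query is {len(query)} characters; limit is {MAX_QUERY_CHARS}")
    return query
```

Rejecting an empty topic or aspect list up front catches the "too vague" cases from the poor-query examples before any API cost is incurred.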
+
+---
+
+## Working with Research Basis
+
+Every deep research result includes a **basis**: citations, reasoning, and confidence levels for each finding.
+
+### Text Mode Basis
+
+```python
+result = researcher.research(query="...", processor="pro-fast")
+
+# Citations are deduplicated and include URLs + excerpts
+for citation in result["citations"]:
+    print(f"Source: {citation['title']}")
+    print(f"URL: {citation['url']}")
+    if citation.get("excerpts"):
+        print(f"Excerpt: {citation['excerpts'][0][:200]}")
+```
+
+### Structured Mode Basis
+
+```python
+result = researcher.research_structured(query="...", processor="pro-fast")
+
+for basis_entry in result["basis"]:
+    print(f"Field: {basis_entry['field']}")
+    print(f"Confidence: {basis_entry['confidence']}")
+    print(f"Reasoning: {basis_entry['reasoning']}")
+    for cit in basis_entry["citations"]:
+        print(f"  Source: {cit['url']}")
+```
+
+### Confidence Levels
+
+| Level | Meaning | Action |
+|-------|---------|--------|
+| `high` | Multiple authoritative sources agree | Use directly |
+| `medium` | Some supporting evidence, minor uncertainty | Use with caveat |
+| `low` | Limited evidence, significant uncertainty | Verify independently |
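The table maps directly onto a triage step over the basis entries. This sketch assumes each entry has the `field` and `confidence` keys shown in the structured-mode example; `ACTIONS` and `triage_basis` are hypothetical names, and treating unknown levels as `low` is a conservative assumption rather than documented behavior.

```python
from collections import defaultdict

# Action per confidence level, as given in the table above.
ACTIONS = {
    "high": "use directly",
    "medium": "use with caveat",
    "low": "verify independently",
}


def triage_basis(basis: list[dict]) -> dict[str, list[str]]:
    """Group field names by the action their confidence level calls for."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for entry in basis:
        # Missing or unrecognized levels fall back to "low" (assumption).
        level = str(entry.get("confidence", "low")).lower()
        action = ACTIONS.get(level, ACTIONS["low"])
        grouped[action].append(entry["field"])
    return dict(grouped)
```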
+
+---
+
+## Advanced Patterns
+
+### Multi-Stage Research
+
+Use different processors in sequence for progressively deeper research:
+
+```python
+# Stage 1: Quick overview with base-fast
+overview = researcher.research(
+    query="What are the main approaches to quantum error correction?",
+    processor="base-fast",
+)
+
+# Stage 2: Deep dive on the most promising approach
+deep_dive = researcher.research(
+    query=f"Detailed analysis of surface code quantum error correction: "
+          f"recent breakthroughs, implementation challenges, and leading research groups. "
+          f"Context: {overview['output'][:500]}",
+    processor="pro-fast",
+)
+```
+
+### Comparative Research
+
+```python
+result = researcher.research(
+    query="Compare and contrast three leading large language model architectures: "
+          "GPT-4, Claude, and Gemini. Cover architecture differences, benchmark performance, "
+          "pricing, context window, and unique capabilities. Include specific benchmark scores.",
+    processor="pro-fast",
+    description="Create a structured comparison with a summary table. Include specific numbers and benchmarks."
+)
+```
+
+### Research with Follow-Up Extraction
+
+```python
+# Step 1: Research to find relevant sources
+research_result = researcher.research(
+    query="Most influential papers on attention mechanisms in 2024",
+    processor="pro-fast",
+)
+
+# Step 2: Extract full content from the most relevant sources
+from parallel_web import ParallelExtract
+extractor = ParallelExtract()
+
+key_urls = [c["url"] for c in research_result["citations"][:5]]
+# Keep every result, keyed by URL, instead of overwriting one variable per loop
+extracted = {
+    url: extractor.extract(urls=[url], objective="Key methodology, results, and conclusions")
+    for url in key_urls
+}
+```
+
+---
+
+## Performance Optimization
+
+### Reducing Latency
+
+1. **Use `-fast` processors**: 2-5x faster than standard
+2. **Use `core-fast` for moderate queries**: Sub-2-minute for most questions
+3. **Be specific in queries**: Vague queries require more exploration
+4. **Set appropriate timeouts**: Match the timeout to the processor's expected latency instead of waiting indefinitely
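Because processing is asynchronous, a bounded wait is one way to avoid over-waiting. The sketch below is generic and makes no claim about how the skill itself waits: `wait_for_result` is a hypothetical helper, and `poll` stands in for whatever call reports a finished result, returning `None` while the task is still running.

```python
import time
from typing import Callable, Optional


def wait_for_result(
    poll: Callable[[], Optional[dict]],
    timeout_s: float,
    interval_s: float = 5.0,
) -> dict:
    """Call poll() until it returns a result or timeout_s would be exceeded."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = poll()
        if result is not None:
            return result
        # Stop early if the next sleep would run past the deadline.
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError(f"no result within {timeout_s} seconds")
        time.sleep(interval_s)
```

A reasonable `timeout_s` is the upper end of the chosen processor's latency range plus a margin, for example a little over 300 seconds for `pro-fast`.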
+
+### Reducing Cost
+
+1. **Start with `base-fast`**: Upgrade only if depth is insufficient
+2. **Use `core-fast` for moderate complexity**: $0.025 per query vs $0.10 for `pro-fast`
+3. **Batch related queries**: One well-crafted query usually costs less than several simple ones
+4. **Cache results**: Store research output for reuse across sections
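Caching can be as simple as one JSON file per unique set of parameters. This is a minimal sketch: `cached_research` is a hypothetical wrapper, `run` is any callable that returns a dict (such as `researcher.research`), and it assumes the result is JSON-serializable, which this guide does not confirm.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cached_research(run: Callable[..., dict], cache_dir: Path, **params) -> dict:
    """Return the cached result for params, calling run(**params) only on a miss."""
    # Stable key: identical parameters always hash to the same file name.
    payload = json.dumps(params, sort_keys=True).encode("utf-8")
    path = cache_dir / f"{hashlib.sha256(payload).hexdigest()}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    result = run(**params)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result
```

The processor is part of the key, so upgrading from `base-fast` to `pro-fast` for the same query triggers a fresh call rather than returning the shallower cached report.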
+
+### Maximizing Quality
+
+1. **Use `pro-fast` or `ultra-fast`**: More sources = better synthesis
+2. **Provide context**: "I'm writing a paper for Nature Medicine about..."
+3. **Use `description` parameter**: Guide the output structure and focus
+4. **Verify critical findings**: Cross-check with Search API or Extract
+
+---
+
+## Common Mistakes
+
+| Mistake | Impact | Fix |
+|---------|--------|-----|
+| Query too vague | Scattered, unfocused results | Add specific aspects and time bounds |
+| Query too long (>15K chars) | API rejection or degraded results | Summarize context, focus on key question |
+| Wrong processor | Too slow or too shallow | Use the decision matrix above |
+| Not using `description` | Report structure not aligned with needs | Add description to guide output |
+| Ignoring confidence levels | Using low-confidence data as fact | Check basis confidence before citing |
+| Not verifying citations | Risk of outdated or misattributed data | Cross-check key citations with Extract |
+
+---
+
+## See Also
+
+- [API Reference](api_reference.md) - Complete API parameter reference
+- [Search Best Practices](search_best_practices.md) - For quick web searches
+- [Extraction Patterns](extraction_patterns.md) - For reading specific URLs
+- [Workflow Recipes](workflow_recipes.md) - Common multi-step patterns