Added parallel-web skill

Refactor research lookup skill to enhance backend routing and update documentation. The skill now intelligently selects between the Parallel Chat API and Perplexity sonar-pro-search based on query type. Added compatibility notes, license information, and improved descriptions for clarity. Removed outdated example scripts to streamline the codebase.
Vinayak Agarwal
2026-03-01 07:36:19 -08:00
parent 29c869326e
commit f72b7f4521
13 changed files with 3969 additions and 769 deletions

@@ -0,0 +1,156 @@
# Research Lookup Skill
This skill provides real-time research information lookup using Perplexity's Sonar Pro Search model through OpenRouter.
## Setup
1. **Get OpenRouter API Key:**
- Visit [openrouter.ai](https://openrouter.ai)
- Create account and generate API key
- Add credits to your account
2. **Configure Environment:**
```bash
export OPENROUTER_API_KEY="your_api_key_here"
```
3. **Test Setup:**
```bash
python scripts/research_lookup.py --model-info
```
## Usage
### Command Line Usage
```bash
# Single research query
python scripts/research_lookup.py "Recent advances in CRISPR gene editing 2024"
# Multiple queries with delay
python scripts/research_lookup.py --batch "CRISPR applications" "gene therapy trials" "ethical considerations"
# Claude Code integration (called automatically)
python lookup.py "your research query here"
```
### Claude Code Integration
The research lookup tool is automatically available in Claude Code when you:
1. **Ask research questions:** "Research recent advances in quantum computing"
2. **Request literature reviews:** "Find current studies on climate change impacts"
3. **Need citations:** "What are the latest papers on transformer attention mechanisms?"
4. **Want technical information:** "Standard protocols for flow cytometry"
## Features
- **Academic Focus:** Prioritizes peer-reviewed papers and reputable sources
- **Current Information:** Focuses on recent publications (2020-2024)
- **Complete Citations:** Provides full bibliographic information with DOIs
- **Multiple Formats:** Supports various query types and research needs
- **High Search Context:** Always uses high search context for deeper, more comprehensive research
- **Quality Prioritization:** Automatically prioritizes highly-cited papers from top venues
- **Cost Effective:** Typically $0.01-0.05 per research query
## Paper Quality Prioritization
This skill **always prioritizes high-impact, influential papers** over obscure publications. Results are ranked by:
### Citation-Based Ranking
| Paper Age | Citation Threshold | Classification |
|-----------|-------------------|----------------|
| 0-3 years | 20+ citations | Noteworthy |
| 0-3 years | 100+ citations | Highly Influential |
| 3-7 years | 100+ citations | Significant |
| 3-7 years | 500+ citations | Landmark |
| 7+ years | 500+ citations | Seminal |
| 7+ years | 1000+ citations | Foundational |
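The table above can be read as a lookup from paper age and citation count to a label. The following is a minimal sketch of that lookup, not the skill's actual ranking code: the name `classify_paper` is hypothetical, and the handling of boundary ages (a paper exactly 3 or 7 years old falls into the older band) is an assumption the table does not settle.

```python
from typing import Optional

# (min_age, max_age_exclusive, thresholds), highest citation threshold first.
# ASSUMPTION: ages of exactly 3 and 7 years belong to the older band.
TIERS = [
    (0, 3, [(100, "Highly Influential"), (20, "Noteworthy")]),
    (3, 7, [(500, "Landmark"), (100, "Significant")]),
    (7, float("inf"), [(1000, "Foundational"), (500, "Seminal")]),
]


def classify_paper(age_years: float, citations: int) -> Optional[str]:
    """Return the classification label, or None if no threshold is met."""
    for min_age, max_age, thresholds in TIERS:
        if min_age <= age_years < max_age:
            for min_citations, label in thresholds:
                if citations >= min_citations:
                    return label
            return None
    return None  # negative ages are treated as unclassifiable
```

For example, `classify_paper(5, 120)` falls in the 3-7 year band and clears only the 100-citation threshold, so it returns `"Significant"`.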
### Venue Quality Tiers
Papers from higher-tier venues are always preferred:
- **Tier 1 (Highest Priority):** Nature, Science, Cell, NEJM, Lancet, JAMA, PNAS, Nature Medicine, Nature Biotechnology
- **Tier 2 (High Priority):** High-impact journals (IF>10), top conferences (NeurIPS, ICML, ICLR for ML/AI)
- **Tier 3 (Good):** Respected specialized journals (IF 5-10)
- **Tier 4 (Use Sparingly):** Other peer-reviewed venues
### Author Reputation
The skill prefers papers from:
- Senior researchers with high h-index
- Established research groups at recognized institutions
- Authors with multiple publications in Tier-1 venues
- Researchers with recognized expertise (awards, editorial positions)
### Relevance Priority
1. Papers directly addressing the research question
2. Papers with applicable methods/data
3. Tangentially related papers (only from top venues or highly cited)
## Query Examples
### Academic Research
- "Recent systematic reviews on AI in medical diagnosis 2024"
- "Meta-analysis of randomized controlled trials for depression treatment"
- "Current state of quantum computing error correction research"
### Technical Methods
- "Standard protocols for immunohistochemistry in tissue samples"
- "Best practices for machine learning model validation"
- "Statistical methods for analyzing longitudinal data"
### Statistical Data
- "Global renewable energy adoption statistics 2024"
- "Prevalence of diabetes in different populations"
- "Market size for autonomous vehicles industry"
## Response Format
Each research result includes:
- **Summary:** Brief overview of key findings
- **Key Studies:** 3-5 most relevant recent papers
- **Citations:** Complete bibliographic information
- **Usage Stats:** Token usage for cost tracking
- **Timestamp:** When the research was performed
## Integration with Scientific Writing
This skill enhances the scientific writing process by providing:
1. **Literature Reviews:** Current research for introduction sections
2. **Methods Validation:** Verify protocols against current standards
3. **Results Context:** Compare findings with recent similar studies
4. **Discussion Support:** Latest evidence for arguments
5. **Citation Management:** Properly formatted references
## Troubleshooting
**"API key not found"**
- Ensure `OPENROUTER_API_KEY` environment variable is set
- Check that you have credits in your OpenRouter account
**"Model not available"**
- Verify your API key has access to Perplexity models
- Check OpenRouter status page for service issues
**"Rate limit exceeded"**
- Add delays between requests with the `--delay` option
- Check your OpenRouter account limits
**"No relevant results"**
- Try more specific or broader queries
- Include time frames (e.g., "2023-2024")
- Use academic keywords and technical terms
## Cost Management
- Monitor usage through OpenRouter dashboard
- Typical costs: $0.01-0.05 per research query
- Batch processing available for multiple queries
- Consider query specificity to optimize token usage
This skill is designed for academic and research purposes, providing high-quality, cited information to support scientific writing and research activities.

@@ -1,27 +1,35 @@
---
name: research-lookup
description: "Look up current research information using Perplexity's Sonar Pro Search or Sonar Reasoning Pro models through OpenRouter. Automatically selects the best model based on query complexity. Search academic papers, recent studies, technical documentation, and general research information with citations."
description: Look up current research information using the Parallel Chat API (primary) or Perplexity sonar-pro-search (academic paper searches). Automatically routes queries to the best backend. Use for finding papers, gathering research data, and verifying scientific information.
allowed-tools: Read Write Edit Bash
license: MIT license
compatibility: PARALLEL_API_KEY and OPENROUTER_API_KEY required
metadata:
skill-author: K-Dense Inc.
---
# Research Information Lookup
## Overview
This skill enables real-time research information lookup using Perplexity's Sonar models through OpenRouter. It intelligently selects between **Sonar Pro Search** (fast, efficient lookup) and **Sonar Reasoning Pro** (deep analytical reasoning) based on query complexity. The skill provides access to current academic literature, recent studies, technical documentation, and general research information with proper citations and source attribution.
This skill provides real-time research information lookup with **intelligent backend routing**:
- **Parallel Chat API** (`core` model): Default backend for all general research queries. Provides comprehensive, multi-source research reports with inline citations via the OpenAI-compatible Chat API at `https://api.parallel.ai`.
- **Perplexity sonar-pro-search** (via OpenRouter): Used only for academic-specific paper searches where scholarly database access is critical.
The skill automatically detects query type and routes to the optimal backend.
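Because the Parallel Chat API is described as OpenAI-compatible, a call to it can be sketched with the `openai` Python package. This is an illustrative sketch under stated assumptions, not the code in `research_lookup.py`: the helper names `build_chat_request` and `run_parallel_query` are hypothetical, and the base URL is used exactly as given above (whether a path suffix is needed is not confirmed by this document).

```python
import os

# Endpoint as stated in this document; ASSUMPTION: no extra path suffix is required.
PARALLEL_BASE_URL = "https://api.parallel.ai"


def build_chat_request(query: str) -> dict:
    """Build the keyword arguments for a Chat Completions call (hypothetical helper)."""
    return {
        "model": "core",  # model name taken from this document
        "messages": [{"role": "user", "content": query}],
    }


def run_parallel_query(query: str) -> str:
    """Send the query and return the response text (requires network and an API key)."""
    from openai import OpenAI  # third-party package: pip install openai

    client = OpenAI(
        api_key=os.environ["PARALLEL_API_KEY"],
        base_url=PARALLEL_BASE_URL,
    )
    response = client.chat.completions.create(**build_chat_request(query))
    return response.choices[0].message.content
```

Keeping the request construction separate from the network call makes the payload easy to inspect without spending credits.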
## When to Use This Skill
Use this skill when you need:
- **Current Research Information**: Latest studies, papers, and findings in a specific field
- **Current Research Information**: Latest studies, papers, and findings
- **Literature Verification**: Check facts, statistics, or claims against current research
- **Background Research**: Gather context and supporting evidence for scientific writing
- **Citation Sources**: Find relevant papers and studies to cite in manuscripts
- **Citation Sources**: Find relevant papers and studies to cite
- **Technical Documentation**: Look up specifications, protocols, or methodologies
- **Recent Developments**: Stay current with emerging trends and breakthroughs
- **Statistical Data**: Find recent statistics, survey results, or research findings
- **Expert Opinions**: Access insights from recent interviews, reviews, or commentary
- **Market/Industry Data**: Current statistics, trends, competitive intelligence
- **Recent Developments**: Emerging trends, breakthroughs, announcements
## Visual Enhancement with Scientific Schematics
@@ -30,269 +38,133 @@ Use this skill when you need:
If your document does not already contain schematics or diagrams:
- Use the **scientific-schematics** skill to generate AI-powered publication-quality diagrams
- Simply describe your desired diagram in natural language
- Nano Banana Pro will automatically generate, review, and refine the schematic
**For new documents:** Scientific schematics should be generated by default to visually represent key concepts, workflows, architectures, or relationships described in the text.
**How to generate schematics:**
```bash
python scripts/generate_schematic.py "your diagram description" -o figures/output.png
```
The AI will automatically:
- Create publication-quality images with proper formatting
- Review and refine through multiple iterations
- Ensure accessibility (colorblind-friendly, high contrast)
- Save outputs in the figures/ directory
---
**When to add schematics:**
- Research information flow diagrams
- Query processing workflow illustrations
- Model selection decision trees
- System integration architecture diagrams
- Information retrieval pipeline visualizations
- Knowledge synthesis frameworks
- Any complex concept that benefits from visualization
## Automatic Backend Selection
For detailed guidance on creating schematics, refer to the scientific-schematics skill documentation.
The skill automatically routes queries to the best backend based on content:
### Routing Logic
```
Query arrives
|
+-- Contains academic keywords? (papers, DOI, journal, peer-reviewed, etc.)
| YES --> Perplexity sonar-pro-search (academic search mode)
|
+-- Everything else (general research, market data, technical info, analysis)
--> Parallel Chat API (core model)
```
### Academic Keywords (Routes to Perplexity)
Queries containing these terms are routed to Perplexity for academic-focused search:
- Paper finding: `find papers`, `find articles`, `research papers on`, `published studies`
- Citations: `cite`, `citation`, `doi`, `pubmed`, `pmid`
- Academic sources: `peer-reviewed`, `journal article`, `scholarly`, `arxiv`, `preprint`
- Review types: `systematic review`, `meta-analysis`, `literature search`
- Paper quality: `foundational papers`, `seminal papers`, `landmark papers`, `highly cited`
### Everything Else (Routes to Parallel)
All other queries go to the Parallel Chat API (core model), including:
- General research questions
- Market and industry analysis
- Technical information and documentation
- Current events and recent developments
- Comparative analysis
- Statistical data retrieval
- Complex analytical queries
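The routing rule described above amounts to a keyword check with a fallback. Here is a minimal sketch of it, assuming case-insensitive whole-word matching with an optional plural `s`; the real matching rules in `research_lookup.py` are not shown in this document, and `choose_backend` is a hypothetical name.

```python
import re
from typing import Optional

# Keywords copied from the "Academic Keywords" list above.
ACADEMIC_KEYWORDS = [
    "find papers", "find articles", "research papers on", "published studies",
    "cite", "citation", "doi", "pubmed", "pmid",
    "peer-reviewed", "journal article", "scholarly", "arxiv", "preprint",
    "systematic review", "meta-analysis", "literature search",
    "foundational papers", "seminal papers", "landmark papers", "highly cited",
]

# ASSUMPTION: whole-word match with optional trailing "s", so "doing" does not hit "doi".
_ACADEMIC_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in ACADEMIC_KEYWORDS) + r")s?\b",
    re.IGNORECASE,
)


def choose_backend(query: str, force_backend: Optional[str] = None) -> str:
    """Return "perplexity" for academic queries and "parallel" for everything else."""
    if force_backend in ("parallel", "perplexity"):
        return force_backend  # mirrors the --force-backend override
    return "perplexity" if _ACADEMIC_RE.search(query) else "parallel"
```

Plain substring matching would be simpler but would route queries such as "What are teams doing about X" to Perplexity because of the embedded `doi`; the word-boundary pattern avoids that at the cost of missing inflected forms like "cited" on their own.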
### Manual Override
You can force a specific backend:
```bash
# Force the Parallel Chat API backend
python research_lookup.py "your query" --force-backend parallel
# Force Perplexity academic search
python research_lookup.py "your query" --force-backend perplexity
```
---
## Core Capabilities
### 1. Academic Research Queries
### 1. General Research Queries (Parallel Chat API)
**Search Academic Literature**: Query for recent papers, studies, and reviews in specific domains:
**Default backend.** Provides comprehensive, multi-source research with citations via the Chat API (`core` model).
```
Query Examples:
- "Recent advances in CRISPR gene editing 2024"
- "Latest clinical trials for Alzheimer's disease treatment"
- "Machine learning applications in drug discovery systematic review"
- "Climate change impacts on biodiversity meta-analysis"
- "Recent advances in CRISPR gene editing 2025"
- "Compare mRNA vaccines vs traditional vaccines for cancer treatment"
- "AI adoption in healthcare industry statistics"
- "Global renewable energy market trends and projections"
- "Explain the mechanism underlying gut microbiome and depression"
```
**Expected Response Format**:
- Summary of key findings from recent literature
- Citation of 3-5 most relevant papers with authors, titles, journals, and years
- Key statistics or findings highlighted
- Identification of research gaps or controversies
- Links to full papers when available
**Response includes:**
- Comprehensive research report in markdown
- Inline citations from authoritative web sources
- Structured sections with key findings
- Multiple perspectives and data points
- Source URLs for verification
### 2. Technical and Methodological Information
### 2. Academic Paper Search (Perplexity sonar-pro-search)
**Protocol and Method Lookups**: Find detailed procedures, specifications, and methodologies:
**Used for academic-specific queries.** Prioritizes scholarly databases and peer-reviewed sources.
```
Query Examples:
- "Find papers on transformer attention mechanisms in NeurIPS 2024"
- "Foundational papers on quantum error correction"
- "Systematic review of immunotherapy in non-small cell lung cancer"
- "Cite the original BERT paper and its most influential follow-ups"
- "Published studies on CRISPR off-target effects in clinical trials"
```
**Response includes:**
- Summary of key findings from academic literature
- 5-8 high-quality citations with authors, titles, journals, years, DOIs
- Citation counts and venue tier indicators
- Key statistics and methodology highlights
- Research gaps and future directions
### 3. Technical and Methodological Information
```
Query Examples:
- "Western blot protocol for protein detection"
- "RNA sequencing library preparation methods"
- "Statistical power analysis for clinical trials"
- "Machine learning model evaluation metrics"
- "Machine learning model evaluation metrics comparison"
```
**Expected Response Format**:
- Step-by-step procedures or protocols
- Required materials and equipment
- Critical parameters and considerations
- Troubleshooting common issues
- References to standard protocols or seminal papers
### 3. Statistical and Data Information
**Research Statistics**: Look up current statistics, survey results, and research data:
### 4. Statistical and Market Data
```
Query Examples:
- "Prevalence of diabetes in US population 2024"
- "Global renewable energy adoption statistics"
- "Prevalence of diabetes in US population 2025"
- "Global AI market size and growth projections"
- "COVID-19 vaccination rates by country"
- "AI adoption in healthcare industry survey"
```
**Expected Response Format**:
- Current statistics with dates and sources
- Methodology of data collection
- Confidence intervals or margins of error when available
- Comparison with previous years or benchmarks
- Citations to original surveys or studies
### 4. Citation and Reference Assistance
**Citation Finding**: Locate the most influential, highly-cited papers from reputable authors and prestigious venues:
```
Query Examples:
- "Foundational papers on transformer architecture" (expect: Vaswani et al. 2017 in NeurIPS, 90,000+ citations)
- "Seminal works in quantum computing" (expect: papers from Nature, Science by leading researchers)
- "Key studies on climate change mitigation" (expect: IPCC-cited papers, Nature Climate Change)
- "Landmark trials in cancer immunotherapy" (expect: NEJM, Lancet trials with 1000+ citations)
```
**Expected Response Format**:
- 5-10 most influential papers, **ranked by impact and relevance**
- Complete citation information (authors, title, journal, year, DOI)
- **Citation count** for each paper (approximate if exact unavailable)
- **Venue tier** indication (Nature, Science, Cell = Tier 1, etc.)
- Brief description of each paper's contribution
- **Author credentials** when notable (e.g., "from the Hinton lab", "Nobel laureate")
- Journal impact factors when relevant
**Quality Criteria for Citation Selection**:
- Prefer papers with **100+ citations** (for papers 3+ years old)
- Prioritize **Tier-1 journals** (Nature, Science, Cell, NEJM, Lancet)
- Include work from **recognized leaders** in the field
- Balance **foundational papers** (high citations, older) with **recent advances** (emerging, high-impact venues)
## Automatic Model Selection
This skill features **intelligent model selection** based on query complexity:
### Model Types
**1. Sonar Pro Search** (`perplexity/sonar-pro-search`)
- **Use Case**: Straightforward information lookup
- **Best For**:
- Simple fact-finding queries
- Recent publication searches
- Basic protocol lookups
- Statistical data retrieval
- **Speed**: Fast responses
- **Cost**: Lower cost per query
**2. Sonar Reasoning Pro** (`perplexity/sonar-reasoning-pro`)
- **Use Case**: Complex analytical queries requiring deep reasoning
- **Best For**:
- Comparative analysis ("compare X vs Y")
- Synthesis of multiple studies
- Evaluating trade-offs or controversies
- Explaining mechanisms or relationships
- Critical analysis and interpretation
- **Speed**: Slower but more thorough
- **Cost**: Higher cost per query, but provides deeper insights
### Complexity Assessment
The skill automatically detects query complexity using these indicators:
**Reasoning Keywords** (triggers Sonar Reasoning Pro):
- Analytical: `compare`, `contrast`, `analyze`, `analysis`, `evaluate`, `critique`
- Comparative: `versus`, `vs`, `vs.`, `compared to`, `differences between`, `similarities`
- Synthesis: `meta-analysis`, `systematic review`, `synthesis`, `integrate`
- Causal: `mechanism`, `why`, `how does`, `how do`, `explain`, `relationship`, `causal relationship`, `underlying mechanism`
- Theoretical: `theoretical framework`, `implications`, `interpret`, `reasoning`
- Debate: `controversy`, `conflicting`, `paradox`, `debate`, `reconcile`
- Trade-offs: `pros and cons`, `advantages and disadvantages`, `trade-off`, `tradeoff`, `trade offs`
- Complexity: `multifaceted`, `complex interaction`, `critical analysis`
**Complexity Scoring**:
- Reasoning keywords: 3 points each (heavily weighted)
- Multiple questions: 2 points per question mark
- Complex sentence structures: 1.5 points per clause indicator (and, or, but, however, whereas, although)
- Very long queries: 1 point if >150 characters
- **Threshold**: Queries scoring ≥3 points trigger Sonar Reasoning Pro
**Practical Result**: Even a single strong reasoning keyword (compare, explain, analyze, etc.) will trigger the more powerful Sonar Reasoning Pro model, ensuring you get deep analysis when needed.
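The scoring rules above can be sketched as a small function. This is an approximation for illustration only: it uses a subset of the reasoning keywords, counts each keyword at most once per query, and uses hypothetical names (`complexity_score`, `select_model`); the original implementation is not shown in this document.

```python
import re

# Subset of the reasoning keywords listed above (ASSUMPTION: each counts once per query).
REASONING_KEYWORDS = [
    "compare", "contrast", "analyze", "evaluate", "versus", "vs",
    "explain", "mechanism", "controversy", "trade-off",
]
CLAUSE_INDICATORS = ["and", "or", "but", "however", "whereas", "although"]
THRESHOLD = 3.0


def _has_word(text: str, word: str) -> bool:
    return re.search(r"\b" + re.escape(word) + r"\b", text) is not None


def _count_word(text: str, word: str) -> int:
    return len(re.findall(r"\b" + re.escape(word) + r"\b", text))


def complexity_score(query: str) -> float:
    q = query.lower()
    score = 3.0 * sum(1 for k in REASONING_KEYWORDS if _has_word(q, k))
    score += 2.0 * q.count("?")  # multiple questions
    score += 1.5 * sum(_count_word(q, w) for w in CLAUSE_INDICATORS)
    if len(query) > 150:  # very long query
        score += 1.0
    return score


def select_model(query: str) -> str:
    if complexity_score(query) >= THRESHOLD:
        return "perplexity/sonar-reasoning-pro"
    return "perplexity/sonar-pro-search"
```

Under these rules a query with two clause indicators and no reasoning keyword already reaches the threshold (2 × 1.5 = 3), which is worth keeping in mind when reading the "single strong keyword" remark above.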
**Example Query Classification**:
**Sonar Pro Search** (straightforward lookup):
- "Recent advances in CRISPR gene editing 2024"
- "Prevalence of diabetes in US population"
- "Western blot protocol for protein detection"
**Sonar Reasoning Pro** (complex analysis):
- "Compare and contrast mRNA vaccines vs traditional vaccines for cancer treatment"
- "Explain the mechanism underlying the relationship between gut microbiome and depression"
- "Analyze the controversy surrounding AI in medical diagnosis and evaluate trade-offs"
### Manual Override
You can force a specific model using the `force_model` parameter:
```python
# Force Sonar Pro Search for fast lookup
research = ResearchLookup(force_model='pro')
# Force Sonar Reasoning Pro for deep analysis
research = ResearchLookup(force_model='reasoning')
# Automatic selection (default)
research = ResearchLookup()
```
Command-line usage:
```bash
# Force Sonar Pro Search
python research_lookup.py "your query" --force-model pro
# Force Sonar Reasoning Pro
python research_lookup.py "your query" --force-model reasoning
# Automatic (no flag)
python research_lookup.py "your query"
# Save output to a file
python research_lookup.py "your query" -o results.txt
# Output as JSON (useful for programmatic access)
python research_lookup.py "your query" --json
# Combine: JSON output saved to file
python research_lookup.py "your query" --json -o results.json
```
## Technical Integration
### OpenRouter API Configuration
This skill integrates with OpenRouter (openrouter.ai) to access Perplexity's Sonar models:
**Model Specifications**:
- **Models**:
- `perplexity/sonar-pro-search` (fast lookup)
- `perplexity/sonar-reasoning-pro-online` (deep analysis)
- **Search Mode**: Academic/scholarly mode (prioritizes peer-reviewed sources)
- **Search Context**: Always uses `high` search context for deeper, more comprehensive research results
- **Context Window**: 200K+ tokens for comprehensive research
- **Capabilities**: Academic paper search, citation generation, scholarly analysis
- **Output**: Rich responses with citations and source links from academic databases
**API Requirements**:
- OpenRouter API key (set as `OPENROUTER_API_KEY` environment variable)
- Account with sufficient credits for research queries
- Proper attribution and citation of sources
**Academic Mode Configuration**:
- System message configured to prioritize scholarly sources
- Search focused on peer-reviewed journals and academic publications
- Enhanced citation extraction for academic references
- Preference for recent academic literature (2020-2024)
- Direct access to academic databases and repositories
### Response Quality and Reliability
**Source Verification**: The skill prioritizes:
- Peer-reviewed academic papers and journals
- Reputable institutional sources (universities, government agencies, NGOs)
- Recent publications (within last 2-3 years preferred)
- High-impact journals and conferences
- Primary research over secondary sources
**Citation Standards**: All responses include:
- Complete bibliographic information
- DOI or stable URLs when available
- Access dates for web sources
- Clear attribution of direct quotes or data
---
## Paper Quality and Popularity Prioritization
**CRITICAL**: When searching for papers, ALWAYS prioritize high-quality, influential papers over obscure or low-impact publications. Quality matters more than quantity.
**CRITICAL**: When searching for papers, ALWAYS prioritize high-quality, influential papers.
### Citation-Based Ranking
Prioritize papers based on citation count relative to their age:
| Paper Age | Citation Threshold | Classification |
|-----------|-------------------|----------------|
| 0-3 years | 20+ citations | Noteworthy |
@@ -302,305 +174,240 @@ Prioritize papers based on citation count relative to their age:
| 7+ years | 500+ citations | Seminal Work |
| 7+ years | 1000+ citations | Foundational |
**When reporting citations**: Always indicate approximate citation count when known (e.g., "cited 500+ times" or "highly cited").
### Venue Quality Tiers
Prioritize papers from higher-tier venues:
**Tier 1 - Premier Venues** (Always prefer):
- **General Science**: Nature, Science, Cell, PNAS
- **Medicine**: NEJM, Lancet, JAMA, BMJ
- **Field-Specific Flagships**: Nature Medicine, Nature Biotechnology, Nature Methods, Nature Genetics, Cell Stem Cell, Immunity
- **Top CS/AI**: NeurIPS, ICML, ICLR, ACL, CVPR (for ML/AI topics)
- **Field-Specific**: Nature Medicine, Nature Biotechnology, Nature Methods
- **Top CS/AI**: NeurIPS, ICML, ICLR, ACL, CVPR
**Tier 2 - High-Impact Specialized** (Strong preference):
- Journals with Impact Factor > 10
- Top conferences in subfields (e.g., EMNLP, NAACL, ECCV, MICCAI)
- Society flagship journals (e.g., Blood, Circulation, Gastroenterology)
- Top conferences in subfields (EMNLP, NAACL, ECCV, MICCAI)
**Tier 3 - Respected Specialized** (Include when relevant):
- Journals with Impact Factor 5-10
- Established conferences in the field
- Well-indexed specialized journals
**Tier 4 - Other Peer-Reviewed** (Use sparingly):
- Lower-impact journals, only if directly relevant and no better source exists
---
### Author Reputation Indicators
## Technical Integration
Prefer papers from established, reputable researchers:
### Environment Variables
- **Senior authors with high h-index** (>40 in established fields)
- **Multiple publications in Tier-1 venues**
- **Leadership positions** at recognized research institutions
- **Recognized expertise**: Awards, editorial positions, society fellows
- **First/last author on landmark papers** in the field
```bash
# Primary backend (Parallel Chat API) - REQUIRED
export PARALLEL_API_KEY="your_parallel_api_key"
### Direct Relevance Scoring
Always prioritize papers that directly address the research question:
1. **Primary Priority**: Papers directly addressing the exact research question
2. **Secondary Priority**: Papers with applicable methods, data, or conceptual frameworks
3. **Tertiary Priority**: Tangentially related papers (include ONLY if from Tier-1 venues or highly cited)
### Practical Application
When conducting research lookups:
1. **Start with the most influential papers** - Look for highly-cited, foundational work first
2. **Prioritize Tier-1 venues** - Nature, Science, Cell family journals, NEJM, Lancet for medical topics
3. **Check author credentials** - Prefer work from established research groups
4. **Balance recency with impact** - Recent highly-cited papers > older obscure papers > recent uncited papers
5. **Report quality indicators** - Include citation counts, journal names, and author affiliations in responses
**Example Quality-Focused Query Response**:
```
Key findings from high-impact literature:
1. Smith et al. (2023), Nature Medicine (IF: 82.9, cited 450+ times)
- Senior author: Prof. John Smith, Harvard Medical School
- Key finding: [finding]
2. Johnson & Lee (2024), Cell (IF: 64.5, cited 120+ times)
- From the renowned Lee Lab at Stanford
- Key finding: [finding]
3. Chen et al. (2022), NEJM (IF: 158.5, cited 890+ times)
- Landmark clinical trial (N=5,000)
- Key finding: [finding]
# Academic search backend (Perplexity) - REQUIRED for academic queries
export OPENROUTER_API_KEY="your_openrouter_api_key"
```
## Query Best Practices
### API Specifications
### 1. Model Selection Strategy
**Parallel Chat API:**
- Endpoint: `https://api.parallel.ai` (OpenAI SDK compatible)
- Model: `core` (60s-5min latency, complex multi-source synthesis)
- Output: Markdown text with inline citations
- Citations: Research basis with URLs, reasoning, and confidence levels
- Rate limits: 300 req/min
- Python package: `openai`
**For Simple Lookups (Sonar Pro Search)**:
- Recent papers on a specific topic
- Statistical data or prevalence rates
- Standard protocols or methodologies
- Citation finding for specific papers
- Factual information retrieval
**Perplexity sonar-pro-search:**
- Model: `perplexity/sonar-pro-search` (via OpenRouter)
- Search mode: Academic (prioritizes peer-reviewed sources)
- Search context: High (comprehensive research)
- Response time: 5-15 seconds
**For Complex Analysis (Sonar Reasoning Pro)**:
- Comparative studies and synthesis
- Mechanism explanations
- Controversy evaluation
- Trade-off analysis
- Theoretical frameworks
- Multi-faceted relationships
### Command-Line Usage
**Pro Tip**: The automatic selection is optimized for most use cases. Only use `force_model` if you have specific requirements or know the query needs deeper reasoning than detected.
```bash
# Auto-routed research (recommended) — ALWAYS save to sources/
python research_lookup.py "your query" -o sources/research_YYYYMMDD_HHMMSS_<topic>.md
### 2. Specific and Focused Queries
# Force specific backend — ALWAYS save to sources/
python research_lookup.py "your query" --force-backend parallel -o sources/research_<topic>.md
python research_lookup.py "your query" --force-backend perplexity -o sources/papers_<topic>.md
**Good Queries** (will trigger appropriate model):
- "Randomized controlled trials of mRNA vaccines for cancer treatment 2023-2024" → Sonar Pro Search
- "Compare the efficacy and safety of mRNA vaccines vs traditional vaccines for cancer treatment" → Sonar Reasoning Pro
- "Explain the mechanism by which CRISPR off-target effects occur and strategies to minimize them" → Sonar Reasoning Pro
# JSON output — ALWAYS save to sources/
python research_lookup.py "your query" --json -o sources/research_<topic>.json
**Poor Queries**:
- "Tell me about AI" (too broad)
- "Cancer research" (lacks specificity)
- "Latest news" (too vague)
### 3. Structured Query Format
**Recommended Structure**:
```
[Topic] + [Specific Aspect] + [Time Frame] + [Type of Information]
# Batch queries — ALWAYS save to sources/
python research_lookup.py --batch "query 1" "query 2" "query 3" -o sources/batch_research_<topic>.md
```
**Examples**:
- "CRISPR gene editing + off-target effects + 2024 + clinical trials"
- "Quantum computing + error correction + recent advances + review papers"
- "Renewable energy + solar efficiency + 2023-2024 + statistical data"
---
### 4. Follow-up Queries
## MANDATORY: Save All Results to Sources Folder
**Effective Follow-ups**:
- "Show me the full citation for the Smith et al. 2024 paper"
- "What are the limitations of this methodology?"
- "Find similar studies using different approaches"
- "What controversies exist in this research area?"
**Every research-lookup result MUST be saved to the project's `sources/` folder.**
This is non-negotiable. Research results are expensive to obtain and critical for reproducibility.
### Saving Rules
| Backend | `-o` Flag Target | Filename Pattern |
|---------|-----------------|------------------|
| Parallel Chat API | `sources/research_<topic>.md` | `research_YYYYMMDD_HHMMSS_<brief_topic>.md` |
| Perplexity (academic) | `sources/papers_<topic>.md` | `papers_YYYYMMDD_HHMMSS_<brief_topic>.md` |
| Batch queries | `sources/batch_research_<topic>.md` | `batch_research_YYYYMMDD_HHMMSS_<brief_topic>.md` |
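The filename patterns in the table can be generated rather than typed by hand. The sketch below is one possible way to build them, assuming the topic is reduced to lowercase words joined by underscores; `source_path` and the `PREFIXES` mapping are hypothetical names, not part of `research_lookup.py`.

```python
import re
from datetime import datetime

# Filename prefix per backend or mode, following the table above.
PREFIXES = {
    "parallel": "research",
    "perplexity": "papers",
    "batch": "batch_research",
}


def source_path(kind: str, topic: str, when: datetime, ext: str = "md") -> str:
    """Build a sources/ path such as sources/research_YYYYMMDD_HHMMSS_<brief_topic>.md."""
    # ASSUMPTION: the brief topic is lowercased and non-alphanumerics become underscores.
    slug = re.sub(r"[^a-z0-9]+", "_", topic.lower()).strip("_")
    stamp = when.strftime("%Y%m%d_%H%M%S")
    return f"sources/{PREFIXES[kind]}_{stamp}_{slug}.{ext}"
```

The returned string can be passed directly to the `-o` flag; use `ext="json"` together with `--json`.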
### How to Save
**CRITICAL: Every call to `research_lookup.py` MUST include the `-o` flag pointing to the `sources/` folder.**
**CRITICAL: Saved files MUST preserve all citations, source URLs, and DOIs.** The default text output automatically includes a `Sources` section (with title, date, URL for each source) and an `Additional References` section (with DOIs and academic URLs extracted from the response text). For maximum citation metadata, use `--json`.
```bash
# General research — save to sources/ (includes Sources + Additional References sections)
python research_lookup.py "Recent advances in CRISPR gene editing 2025" \
-o sources/research_20250217_143000_crispr_advances.md
# Academic paper search — save to sources/ (includes paper citations with DOIs)
python research_lookup.py "Find papers on transformer attention mechanisms in NeurIPS 2024" \
-o sources/papers_20250217_143500_transformer_attention.md
# JSON format for maximum citation metadata (full citation objects with URLs, DOIs, snippets)
python research_lookup.py "CRISPR clinical trials" --json \
-o sources/research_20250217_143000_crispr_trials.json
# Forced backend — save to sources/
python research_lookup.py "AI regulation landscape" --force-backend parallel \
-o sources/research_20250217_144000_ai_regulation.md
# Batch queries — save to sources/
python research_lookup.py --batch "mRNA vaccines efficacy" "mRNA vaccines safety" \
-o sources/batch_research_20250217_144500_mrna_vaccines.md
```
### Citation Preservation in Saved Files
Each output format preserves citations differently:
| Format | Citations Included | When to Use |
|--------|-------------------|-------------|
| Text (default) | `Sources (N):` section with `[title] (date) + URL` + `Additional References (N):` with DOIs and academic URLs | Standard use — human-readable with all citations |
| JSON (`--json`) | Full citation objects: `url`, `title`, `date`, `snippet`, `doi`, `type` | When you need maximum citation metadata |
**For Parallel backend**, saved files include: research report + Sources list (title, URL) + Additional References (DOIs, academic URLs).
**For Perplexity backend**, saved files include: academic summary + Sources list (title, date, URL, snippet) + Additional References (DOIs, academic URLs).
**Use `--json` when you need to:**
- Parse citation metadata programmatically
- Preserve full DOI and URL data for BibTeX generation
- Maintain the structured citation objects for cross-referencing
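As an illustration of programmatic parsing, the sketch below collects the unique DOIs from a file saved with `--json`. It assumes the file is a JSON list with one result object per query, each carrying a `citations` list whose DOI entries have `type` set to `doi`. The helper name `collect_dois` is hypothetical.

```python
import json
from pathlib import Path


def collect_dois(json_path):
    """Return the unique DOIs found in a --json result file, in first-seen order."""
    results = json.loads(Path(json_path).read_text(encoding="utf-8"))
    dois = []
    for result in results:  # assumed shape: one object per query
        # Failed queries carry no "citations" key, so default to an empty list.
        for citation in result.get("citations", []):
            doi = citation.get("doi")
            if citation.get("type") == "doi" and doi and doi not in dois:
                dois.append(doi)
    return dois
```

For example, `collect_dois("sources/research_20250217_143000_crispr_trials.json")` would return the DOIs to feed into a DOI-to-BibTeX step.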
### Why Save Everything
1. **Reproducibility**: Every citation and claim can be traced back to its raw research source
2. **Context Window Recovery**: If context is compacted, saved results can be re-read without re-querying
3. **Audit Trail**: The `sources/` folder documents exactly how all research information was gathered
4. **Reuse Across Sections**: Multiple sections can reference the same saved research without duplicate queries
5. **Cost Efficiency**: Check `sources/` for existing results before making new API calls
6. **Peer Review Support**: Reviewers can verify the research backing every citation
### Before Making a New Query, Check Sources First
Before calling `research_lookup.py`, check if a relevant result already exists:
```bash
ls sources/ # Check existing saved results
```
If a prior lookup covers the same topic, re-read the saved file instead of making a new API call.
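The same check can be scripted. This is a rough sketch, assuming results follow the naming convention above so that the topic appears in the filename. The helper `find_saved_results` is hypothetical and only matches on filenames, not on file contents.

```python
from pathlib import Path


def find_saved_results(keyword, folder="sources"):
    """List saved result files whose name mentions the keyword (case-insensitive)."""
    root = Path(folder)
    if not root.is_dir():
        return []
    # Filenames use underscores between words, so normalize spaces the same way.
    needle = keyword.lower().replace(" ", "_")
    return sorted(p for p in root.iterdir() if p.is_file() and needle in p.name.lower())


for match in find_saved_results("crispr"):
    print(match)
```

An empty result means no prior lookup matched by name. Opening the listed files is still needed to confirm they actually cover the question.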
### Logging
When saving research results, always log:
```
[HH:MM:SS] SAVED: Research lookup to sources/research_20250217_143000_crispr_advances.md (3,800 words, 8 citations)
[HH:MM:SS] SAVED: Paper search to sources/papers_20250217_143500_transformer_attention.md (6 papers found)
```
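A log line in this shape could be produced with a small formatter. The sketch below assumes a whitespace-based word count and a caller-supplied citation count. `format_saved_log` is a hypothetical name, not a function in the skill.

```python
from datetime import datetime


def format_saved_log(path, text, citation_count, label="Research lookup", now=None):
    """Build a '[HH:MM:SS] SAVED: ...' line for a saved research result."""
    now = now or datetime.now()
    words = len(text.split())  # whitespace-delimited word count
    return (
        f"[{now:%H:%M:%S}] SAVED: {label} to {path} "
        f"({words:,} words, {citation_count} citations)"
    )
```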
---
## Integration with Scientific Writing
This skill enhances scientific writing by providing:
1. **Literature Review Support**: Gather current research for introduction and discussion sections
2. **Methods Validation**: Verify protocols and procedures against current standards
3. **Results Contextualization**: Compare findings with recent similar studies
4. **Discussion Enhancement**: Support arguments with latest evidence
5. **Citation Management**: Provide properly formatted citations in multiple styles
## Error Handling and Limitations
**Known Limitations**:
- Information cutoff: Responses limited to training data (typically 2023-2024)
- Paywall content: May not access full text behind paywalls
- Emerging research: May miss very recent papers not yet indexed
- Specialized databases: Cannot access proprietary or restricted databases
**Error Conditions**:
- API rate limits or quota exceeded
- Network connectivity issues
- Malformed or ambiguous queries
- Model unavailability or maintenance
**Fallback Strategies**:
- Rephrase queries for better clarity
- Break complex queries into simpler components
- Use broader time frames if recent data unavailable
- Cross-reference with multiple query variations
## Usage Examples
### Example 1: Simple Literature Search (Sonar Pro Search)
**Query**: "Recent advances in transformer attention mechanisms 2024"
**Model Selected**: Sonar Pro Search (straightforward lookup)
**Response Includes**:
- Summary of 5 key papers from 2024
- Complete citations with DOIs
- Key innovations and improvements
- Performance benchmarks
- Future research directions
### Example 2: Comparative Analysis (Sonar Reasoning Pro)
**Query**: "Compare and contrast the advantages and limitations of transformer-based models versus traditional RNNs for sequence modeling"
**Model Selected**: Sonar Reasoning Pro (complex analysis required)
**Response Includes**:
- Detailed comparison across multiple dimensions
- Analysis of architectural differences
- Trade-offs in computational efficiency vs performance
- Use case recommendations
- Synthesis of evidence from multiple studies
- Discussion of ongoing debates in the field
### Example 3: Method Verification (Sonar Pro Search)
**Query**: "Standard protocols for flow cytometry analysis"
**Model Selected**: Sonar Pro Search (protocol lookup)
**Response Includes**:
- Step-by-step protocol from recent review
- Required controls and calibrations
- Common pitfalls and troubleshooting
- Reference to definitive methodology paper
- Alternative approaches with pros/cons
### Example 4: Mechanism Explanation (Sonar Reasoning Pro)
**Query**: "Explain the underlying mechanism of how mRNA vaccines trigger immune responses and why they differ from traditional vaccines"
**Model Selected**: Sonar Reasoning Pro (requires causal reasoning)
**Response Includes**:
- Detailed mechanistic explanation
- Step-by-step biological processes
- Comparative analysis with traditional vaccines
- Molecular-level interactions
- Integration of immunology and pharmacology concepts
- Evidence from recent research
### Example 5: Statistical Data (Sonar Pro Search)
**Query**: "Global AI adoption in healthcare statistics 2024"
**Model Selected**: Sonar Pro Search (data lookup)
**Response Includes**:
- Current adoption rates by region
- Market size and growth projections
- Survey methodology and sample size
- Comparison with previous years
- Citations to market research reports
## Performance and Cost Considerations
### Response Times
**Sonar Pro Search**:
- Typical response time: 5-15 seconds
- Best for rapid information gathering
- Suitable for batch queries
**Sonar Reasoning Pro**:
- Typical response time: 15-45 seconds
- Worth the wait for complex analytical queries
- Provides more thorough reasoning and synthesis
### Cost Optimization
**Automatic Selection Benefits**:
- Saves costs by using Sonar Pro Search for straightforward queries
- Reserves Sonar Reasoning Pro for queries that truly benefit from deeper analysis
- Optimizes the balance between cost and quality
**Manual Override Use Cases**:
- Force Sonar Pro Search when budget is constrained and speed is priority
- Force Sonar Reasoning Pro when working on critical research requiring maximum depth
- Use for specific sections of papers (e.g., Pro Search for methods, Reasoning for discussion)
**Best Practices**:
1. Trust the automatic selection for most use cases
2. Review query results - if Sonar Pro Search doesn't provide sufficient depth, rephrase with reasoning keywords
3. Use batch queries strategically - combine simple lookups to minimize total query count
4. For literature reviews, start with Sonar Pro Search for breadth, then use Sonar Reasoning Pro for synthesis
## Security and Ethical Considerations
**Responsible Use**:
- Verify all information against primary sources when possible
- Clearly attribute all data and quotes to original sources
- Avoid presenting AI-generated summaries as original research
- Respect copyright and licensing restrictions
- Use for research assistance, not to bypass paywalls or subscriptions
**Academic Integrity**:
- Always cite original sources, not the AI tool
- Use as a starting point for literature searches
- Follow institutional guidelines for AI tool usage
- Maintain transparency about research methods
1. **Literature Review Support**: Gather current research for introduction and discussion; **save to `sources/`**
2. **Methods Validation**: Verify protocols against current standards; **save to `sources/`**
3. **Results Contextualization**: Compare findings with recent similar studies; **save to `sources/`**
4. **Discussion Enhancement**: Support arguments with latest evidence; **save to `sources/`**
5. **Citation Management**: Provide properly formatted citations; **save to `sources/`**
## Complementary Tools
In addition to research-lookup, the scientific writer has access to **WebSearch** for:
- **Quick metadata verification**: Look up DOIs, publication years, journal names, volume/page numbers
- **Non-academic sources**: News, blogs, technical documentation, current events
- **General information**: Company info, product details, current statistics
- **Cross-referencing**: Verify citation details found through research-lookup
**When to use which tool:**
| Task | Tool |
|------|------|
| Find academic papers | research-lookup |
| Literature search | research-lookup |
| Deep analysis/comparison | research-lookup (Sonar Reasoning Pro) |
| Look up DOI/metadata | WebSearch |
| Verify publication year | WebSearch |
| Find journal volume/pages | WebSearch |
| Current events/news | WebSearch |
| Non-scholarly sources | WebSearch |
| General web search | `parallel-web` skill (`parallel_web.py search`) |
| Citation verification | `parallel-web` skill (`parallel_web.py extract`) |
| Deep research (any topic) | `research-lookup` or `parallel-web` skill |
| Academic paper search | `research-lookup` (auto-routes to Perplexity) |
| Google Scholar search | `citation-management` skill |
| PubMed search | `citation-management` skill |
| DOI to BibTeX | `citation-management` skill |
| Metadata verification | `parallel-web` skill (`parallel_web.py search` or `extract`) |
---
## Error Handling and Limitations
**Known Limitations:**
- Parallel Chat API (core model): Complex queries may take up to 5 minutes
- Perplexity: Subject to an information cutoff; may not access full text behind paywalls
- Both: Cannot access proprietary or restricted databases
**Fallback Behavior:**
- If the selected backend's API key is missing, the other backend is used instead
- If the request to the selected backend fails, a structured error response is returned (`success` set to false, with an `error` message)
- Rephrase queries for better results if initial response is insufficient
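The rephrasing strategy can be sketched as a loop over alternative phrasings that stops at the first result whose `success` field is true. This assumes a lookup callable that returns a result dictionary with a `success` key, as `ResearchLookup.lookup` does. The wrapper `first_successful` is hypothetical and not part of the skill.

```python
from typing import Any, Callable, Dict, List


def first_successful(
    lookup: Callable[[str], Dict[str, Any]], phrasings: List[str]
) -> Dict[str, Any]:
    """Try each phrasing in order; return the first successful result, else the last error."""
    if not phrasings:
        raise ValueError("phrasings must contain at least one query")
    result: Dict[str, Any] = {}
    for query in phrasings:
        result = lookup(query)
        if result.get("success"):
            return result
    # Every phrasing failed: hand back the last structured error response.
    return result
```

A call such as `first_successful(ResearchLookup().lookup, ["original query", "simpler rewording"])` keeps the number of API calls to the minimum needed to get a usable answer.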
---
## Usage Examples
### Example 1: General Research (Routes to Parallel)
**Query**: "Recent advances in transformer attention mechanisms 2025"
**Backend**: Parallel Chat API (core model)
**Response**: Comprehensive markdown report with citations from authoritative sources, covering recent papers, key innovations, and performance benchmarks.
### Example 2: Academic Paper Search (Routes to Perplexity)
**Query**: "Find papers on CRISPR off-target effects in clinical trials"
**Backend**: Perplexity sonar-pro-search (academic mode)
**Response**: Curated list of 5-8 high-impact papers with full citations, DOIs, citation counts, and venue tier indicators.
### Example 3: Comparative Analysis (Routes to Parallel)
**Query**: "Compare and contrast mRNA vaccines vs traditional vaccines for cancer treatment"
**Backend**: Parallel Chat API (core model)
**Response**: Detailed comparative report with data from multiple sources, structured analysis, and cited evidence.
### Example 4: Market Data (Routes to Parallel)
**Query**: "Global AI adoption in healthcare statistics 2025"
**Backend**: Parallel Chat API (core model)
**Response**: Current market data, adoption rates, growth projections, and regional analysis with source citations.
---
## Summary
This skill serves as a powerful research assistant with intelligent dual-model selection:
This skill serves as the primary research interface with intelligent dual-backend routing:
- **Automatic Intelligence**: Analyzes query complexity and selects the optimal model (Sonar Pro Search or Sonar Reasoning Pro)
- **Cost-Effective**: Uses faster, cheaper Sonar Pro Search for straightforward lookups
- **Deep Analysis**: Automatically engages Sonar Reasoning Pro for complex comparative, analytical, and theoretical queries
- **Flexible Control**: Manual override available when you know exactly what level of analysis you need
- **Academic Focus**: Both models configured to prioritize peer-reviewed sources and scholarly literature
- **Complementary WebSearch**: Use alongside WebSearch for metadata verification and non-academic sources
Whether you need quick fact-finding or deep analytical synthesis, this skill automatically adapts to deliver the right level of research support for your scientific writing needs.
- **Parallel Chat API** (default, `core` model): Comprehensive, multi-source research for any topic
- **Perplexity sonar-pro-search**: Academic-specific paper searches only
- **Automatic routing**: Detects academic queries and routes appropriately
- **Manual override**: Force any backend when needed
- **Complementary**: Works alongside `parallel-web` skill for web search and URL extraction


@@ -0,0 +1,566 @@
#!/usr/bin/env python3
"""
Research Information Lookup Tool
Routes research queries to the best backend:
- Parallel Chat API (core model): Default for all general research queries
- Perplexity sonar-pro-search (via OpenRouter): Academic-specific paper searches
Environment variables:
PARALLEL_API_KEY - Required for Parallel Chat API (primary backend)
OPENROUTER_API_KEY - Required for Perplexity academic searches (fallback)
"""
import os
import sys
import json
import re
import time
import requests
from datetime import datetime
from typing import Any, Dict, List, Optional
class ResearchLookup:
"""Research information lookup with intelligent backend routing.
Routes queries to the Parallel Chat API (default) or Perplexity
sonar-pro-search (academic paper searches only).
"""
ACADEMIC_KEYWORDS = [
"find papers", "find paper", "find articles", "find article",
"cite ", "citation", "citations for",
"doi ", "doi:", "pubmed", "pmid",
"journal article", "peer-reviewed",
"systematic review", "meta-analysis",
"literature search", "literature on",
"academic papers", "academic paper",
"research papers on", "research paper on",
"published studies", "published study",
"scholarly", "scholar",
"arxiv", "preprint",
"foundational papers", "seminal papers", "landmark papers",
"highly cited", "most cited",
]
PARALLEL_SYSTEM_PROMPT = (
"You are a deep research analyst. Provide a comprehensive, well-cited "
"research report on the user's topic. Include:\n"
"- Key findings with specific data, statistics, and quantitative evidence\n"
"- Detailed analysis organized by themes\n"
"- Multiple authoritative sources cited inline\n"
"- Methodologies and implications where relevant\n"
"- Future outlook and research gaps\n"
"Use markdown formatting with clear section headers. "
"Prioritize authoritative and recent sources."
)
CHAT_BASE_URL = "https://api.parallel.ai"
def __init__(self, force_backend: Optional[str] = None):
"""Initialize the research lookup tool.
Args:
force_backend: Force a specific backend ('parallel' or 'perplexity').
If None, backend is auto-selected based on query content.
"""
self.force_backend = force_backend
self.parallel_available = bool(os.getenv("PARALLEL_API_KEY"))
self.perplexity_available = bool(os.getenv("OPENROUTER_API_KEY"))
if not self.parallel_available and not self.perplexity_available:
raise ValueError(
"No API keys found. Set at least one of:\n"
" PARALLEL_API_KEY (for Parallel Chat API - primary)\n"
" OPENROUTER_API_KEY (for Perplexity academic search - fallback)"
)
def _select_backend(self, query: str) -> str:
"""Select the best backend for a query."""
if self.force_backend:
if self.force_backend == "perplexity" and self.perplexity_available:
return "perplexity"
if self.force_backend == "parallel" and self.parallel_available:
return "parallel"
query_lower = query.lower()
is_academic = any(kw in query_lower for kw in self.ACADEMIC_KEYWORDS)
if is_academic and self.perplexity_available:
return "perplexity"
if self.parallel_available:
return "parallel"
if self.perplexity_available:
return "perplexity"
raise ValueError("No backend available. Check API keys.")
# ------------------------------------------------------------------
# Parallel Chat API backend
# ------------------------------------------------------------------
def _get_chat_client(self):
"""Lazy-load and cache the OpenAI client for Parallel Chat API."""
if not hasattr(self, "_chat_client"):
try:
from openai import OpenAI
except ImportError:
raise ImportError(
"The 'openai' package is required for Parallel Chat API.\n"
"Install it with: pip install openai"
)
self._chat_client = OpenAI(
api_key=os.getenv("PARALLEL_API_KEY"),
base_url=self.CHAT_BASE_URL,
)
return self._chat_client
def _parallel_lookup(self, query: str) -> Dict[str, Any]:
"""Run research via the Parallel Chat API (core model)."""
timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
model = "core"
try:
client = self._get_chat_client()
print(f"[Research] Parallel Chat API (model={model})...", file=sys.stderr)
response = client.chat.completions.create(
model=model,
messages=[
{"role": "system", "content": self.PARALLEL_SYSTEM_PROMPT},
{"role": "user", "content": query},
],
stream=False,
)
content = ""
if response.choices and len(response.choices) > 0:
content = response.choices[0].message.content or ""
api_citations = self._extract_basis_citations(response)
text_citations = self._extract_citations_from_text(content)
return {
"success": True,
"query": query,
"response": content,
"citations": api_citations + text_citations,
"sources": api_citations,
"timestamp": timestamp,
"backend": "parallel",
"model": f"parallel-chat/{model}",
}
except Exception as e:
return {
"success": False,
"query": query,
"error": str(e),
"timestamp": timestamp,
"backend": "parallel",
"model": f"parallel-chat/{model}",
}
def _extract_basis_citations(self, response) -> List[Dict[str, str]]:
"""Extract citation sources from the Chat API research basis."""
citations = []
basis = getattr(response, "basis", None)
if not basis:
return citations
seen_urls = set()
if isinstance(basis, list):
for item in basis:
cits = (
item.get("citations", []) if isinstance(item, dict)
else getattr(item, "citations", None) or []
)
for cit in cits:
url = cit.get("url", "") if isinstance(cit, dict) else getattr(cit, "url", "")
if url and url not in seen_urls:
seen_urls.add(url)
title = cit.get("title", "") if isinstance(cit, dict) else getattr(cit, "title", "")
excerpts = cit.get("excerpts", []) if isinstance(cit, dict) else getattr(cit, "excerpts", [])
citations.append({
"type": "source",
"url": url,
"title": title,
"excerpts": excerpts,
})
return citations
# ------------------------------------------------------------------
# Perplexity academic search backend
# ------------------------------------------------------------------
def _perplexity_lookup(self, query: str) -> Dict[str, Any]:
"""Run academic search via Perplexity sonar-pro-search through OpenRouter."""
timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
api_key = os.getenv("OPENROUTER_API_KEY")
model = "perplexity/sonar-pro-search"
headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json",
"HTTP-Referer": "https://scientific-writer.local",
"X-Title": "Scientific Writer Research Tool",
}
research_prompt = self._format_academic_prompt(query)
messages = [
{
"role": "system",
"content": (
"You are an academic research assistant specializing in finding "
"HIGH-IMPACT, INFLUENTIAL research.\n\n"
"QUALITY PRIORITIZATION (CRITICAL):\n"
"- ALWAYS prefer highly-cited papers over obscure publications\n"
"- ALWAYS prioritize Tier-1 venues: Nature, Science, Cell, NEJM, Lancet, JAMA, PNAS\n"
"- ALWAYS prefer papers from established researchers\n"
"- Include citation counts when known (e.g., 'cited 500+ times')\n"
"- Quality matters more than quantity\n\n"
"VENUE HIERARCHY:\n"
"1. Nature/Science/Cell family, NEJM, Lancet, JAMA (highest)\n"
"2. High-impact specialized journals (IF>10), top conferences (NeurIPS, ICML, ICLR)\n"
"3. Respected field-specific journals (IF 5-10)\n"
"4. Other peer-reviewed sources (only if no better option)\n\n"
"Focus exclusively on scholarly sources. Prioritize recent literature (2020-2026) "
"and provide complete citations with DOIs."
),
},
{"role": "user", "content": research_prompt},
]
data = {
"model": model,
"messages": messages,
"max_tokens": 8000,
"temperature": 0.1,
"search_mode": "academic",
"search_context_size": "high",
}
try:
response = requests.post(
"https://openrouter.ai/api/v1/chat/completions",
headers=headers,
json=data,
timeout=90,
)
response.raise_for_status()
resp_json = response.json()
if "choices" in resp_json and len(resp_json["choices"]) > 0:
choice = resp_json["choices"][0]
if "message" in choice and "content" in choice["message"]:
content = choice["message"]["content"]
api_citations = self._extract_api_citations(resp_json, choice)
text_citations = self._extract_citations_from_text(content)
citations = api_citations + text_citations
return {
"success": True,
"query": query,
"response": content,
"citations": citations,
"sources": api_citations,
"timestamp": timestamp,
"backend": "perplexity",
"model": model,
"usage": resp_json.get("usage", {}),
}
else:
raise Exception("Invalid response format from API")
else:
raise Exception("No response choices received from API")
except Exception as e:
return {
"success": False,
"query": query,
"error": str(e),
"timestamp": timestamp,
"backend": "perplexity",
"model": model,
}
# ------------------------------------------------------------------
# Shared utilities
# ------------------------------------------------------------------
def _format_academic_prompt(self, query: str) -> str:
"""Format a query for academic research results via Perplexity."""
return f"""You are an expert research assistant. Please provide comprehensive, accurate research information for the following query: "{query}"
IMPORTANT INSTRUCTIONS:
1. Focus on ACADEMIC and SCIENTIFIC sources (peer-reviewed papers, reputable journals, institutional research)
2. Include RECENT information (prioritize 2020-2026 publications)
3. Provide COMPLETE citations with authors, title, journal/conference, year, and DOI when available
4. Structure your response with clear sections and proper attribution
5. Be comprehensive but concise - aim for 800-1200 words
6. Include key findings, methodologies, and implications when relevant
7. Note any controversies, limitations, or conflicting evidence
PAPER QUALITY PRIORITIZATION (CRITICAL):
8. ALWAYS prioritize HIGHLY-CITED papers over obscure publications
9. ALWAYS prioritize papers from TOP-TIER VENUES (Nature, Science, Cell, NEJM, Lancet, JAMA, PNAS)
10. PREFER papers from ESTABLISHED, REPUTABLE AUTHORS
11. For EACH citation include when available: citation count, venue tier, author credentials
12. PRIORITIZE papers that DIRECTLY address the research question
RESPONSE FORMAT:
- Start with a brief summary (2-3 sentences)
- Present key findings and studies in organized sections
- Rank papers by impact: most influential/cited first
- End with future directions or research gaps if applicable
- Include 5-8 high-quality citations
Remember: Quality over quantity. Prioritize influential, highly-cited papers from prestigious venues."""
def _extract_api_citations(self, response: Dict[str, Any], choice: Dict[str, Any]) -> List[Dict[str, str]]:
"""Extract citations from Perplexity API response fields."""
citations = []
search_results = (
response.get("search_results")
or choice.get("search_results")
or choice.get("message", {}).get("search_results")
or []
)
for result in search_results:
citation = {
"type": "source",
"title": result.get("title", ""),
"url": result.get("url", ""),
"date": result.get("date", ""),
}
if result.get("snippet"):
citation["snippet"] = result["snippet"]
citations.append(citation)
legacy_citations = (
response.get("citations")
or choice.get("citations")
or choice.get("message", {}).get("citations")
or []
)
for url in legacy_citations:
if isinstance(url, str):
citations.append({"type": "source", "url": url, "title": "", "date": ""})
elif isinstance(url, dict):
citations.append({
"type": "source",
"url": url.get("url", ""),
"title": url.get("title", ""),
"date": url.get("date", ""),
})
return citations
def _extract_citations_from_text(self, text: str) -> List[Dict[str, str]]:
"""Extract DOIs and academic URLs from response text as fallback."""
citations = []
doi_pattern = r'(?:doi[:\s]*|https?://(?:dx\.)?doi\.org/)(10\.[0-9]{4,}/[^\s\)\]\,\[\<\>]+)'
doi_matches = re.findall(doi_pattern, text, re.IGNORECASE)
seen_dois = set()
for doi in doi_matches:
doi_clean = doi.strip().rstrip(".,;:)]")
if doi_clean and doi_clean not in seen_dois:
seen_dois.add(doi_clean)
citations.append({
"type": "doi",
"doi": doi_clean,
"url": f"https://doi.org/{doi_clean}",
})
url_pattern = (
            # Use `*` (not `+`) after the scheme so bare hosts such as https://arxiv.org/... also match.
            r'https?://[^\s\)\]\,\<\>\"\']*(?:arxiv\.org|pubmed|ncbi\.nlm\.nih\.gov|'
r'nature\.com|science\.org|wiley\.com|springer\.com|ieee\.org|acm\.org)'
r'[^\s\)\]\,\<\>\"\']*'
)
url_matches = re.findall(url_pattern, text, re.IGNORECASE)
seen_urls = set()
for url in url_matches:
url_clean = url.rstrip(".")
if url_clean not in seen_urls:
seen_urls.add(url_clean)
citations.append({"type": "url", "url": url_clean})
return citations
# ------------------------------------------------------------------
# Public API
# ------------------------------------------------------------------
def lookup(self, query: str) -> Dict[str, Any]:
"""Perform a research lookup, routing to the best backend.
Parallel Chat API is used by default. Perplexity sonar-pro-search
is used only for academic-specific queries (paper searches, DOI lookups).
"""
backend = self._select_backend(query)
print(f"[Research] Backend: {backend} | Query: {query[:80]}...", file=sys.stderr)
if backend == "parallel":
return self._parallel_lookup(query)
else:
return self._perplexity_lookup(query)
def batch_lookup(self, queries: List[str], delay: float = 1.0) -> List[Dict[str, Any]]:
"""Perform multiple research lookups with delay between requests."""
results = []
for i, query in enumerate(queries):
if i > 0 and delay > 0:
time.sleep(delay)
result = self.lookup(query)
results.append(result)
print(f"[Research] Completed query {i+1}/{len(queries)}: {query[:50]}...", file=sys.stderr)
return results
# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------
def main():
"""Command-line interface for the research lookup tool."""
import argparse
parser = argparse.ArgumentParser(
description="Research Information Lookup Tool (Parallel Chat API + Perplexity)",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# General research (uses Parallel Chat API, core model)
python research_lookup.py "latest advances in quantum computing 2025"
# Academic paper search (auto-routes to Perplexity)
python research_lookup.py "find papers on CRISPR gene editing clinical trials"
# Force a specific backend
python research_lookup.py "topic" --force-backend parallel
python research_lookup.py "topic" --force-backend perplexity
# Save output to file
python research_lookup.py "topic" -o results.txt
# JSON output
python research_lookup.py "topic" --json -o results.json
""",
)
parser.add_argument("query", nargs="?", help="Research query to look up")
parser.add_argument("--batch", nargs="+", help="Run multiple queries")
parser.add_argument(
"--force-backend",
choices=["parallel", "perplexity"],
help="Force a specific backend (default: auto-select)",
)
parser.add_argument("-o", "--output", help="Write output to file")
parser.add_argument("--json", action="store_true", help="Output as JSON")
args = parser.parse_args()
output_file = None
if args.output:
output_file = open(args.output, "w", encoding="utf-8")
def write_output(text):
if output_file:
output_file.write(text + "\n")
else:
print(text)
has_parallel = bool(os.getenv("PARALLEL_API_KEY"))
has_perplexity = bool(os.getenv("OPENROUTER_API_KEY"))
if not has_parallel and not has_perplexity:
print("Error: No API keys found. Set at least one:", file=sys.stderr)
print(" export PARALLEL_API_KEY='...' (primary - Parallel Chat API)", file=sys.stderr)
print(" export OPENROUTER_API_KEY='...' (fallback - Perplexity academic)", file=sys.stderr)
if output_file:
output_file.close()
return 1
if not args.query and not args.batch:
parser.print_help()
if output_file:
output_file.close()
return 1
try:
research = ResearchLookup(force_backend=args.force_backend)
if args.batch:
print(f"Running batch research for {len(args.batch)} queries...", file=sys.stderr)
results = research.batch_lookup(args.batch)
else:
print(f"Researching: {args.query}", file=sys.stderr)
results = [research.lookup(args.query)]
if args.json:
write_output(json.dumps(results, indent=2, ensure_ascii=False, default=str))
if output_file:
output_file.close()
return 0
for i, result in enumerate(results):
if result["success"]:
write_output(f"\n{'='*80}")
write_output(f"Query {i+1}: {result['query']}")
write_output(f"Timestamp: {result['timestamp']}")
write_output(f"Backend: {result.get('backend', 'unknown')} | Model: {result.get('model', 'unknown')}")
write_output(f"{'='*80}")
write_output(result["response"])
sources = result.get("sources", [])
if sources:
write_output(f"\nSources ({len(sources)}):")
for j, source in enumerate(sources):
title = source.get("title", "Untitled")
url = source.get("url", "")
date = source.get("date", "")
date_str = f" ({date})" if date else ""
write_output(f" [{j+1}] {title}{date_str}")
if url:
write_output(f" {url}")
citations = result.get("citations", [])
text_citations = [c for c in citations if c.get("type") in ("doi", "url")]
if text_citations:
write_output(f"\nAdditional References ({len(text_citations)}):")
for j, citation in enumerate(text_citations):
if citation.get("type") == "doi":
write_output(f" [{j+1}] DOI: {citation.get('doi', '')} - {citation.get('url', '')}")
elif citation.get("type") == "url":
write_output(f" [{j+1}] {citation.get('url', '')}")
if result.get("usage"):
write_output(f"\nUsage: {result['usage']}")
else:
write_output(f"\nError in query {i+1}: {result['error']}")
if output_file:
output_file.close()
return 0
except Exception as e:
print(f"Error: {e}", file=sys.stderr)
if output_file:
output_file.close()
return 1
if __name__ == "__main__":
sys.exit(main())


@@ -1,208 +1,269 @@
#!/usr/bin/env python3
"""
Research Information Lookup Tool
Uses Perplexity's Sonar Pro Search model through OpenRouter for academic research queries.
Routes research queries to the best backend:
- Parallel Chat API (core model): Default for all general research queries
- Perplexity sonar-pro-search (via OpenRouter): Academic-specific paper searches
Environment variables:
PARALLEL_API_KEY - Required for Parallel Chat API (primary backend)
OPENROUTER_API_KEY - Required for Perplexity academic searches (fallback)
"""
import os
import sys
import json
import requests
import re
import time
import requests
from datetime import datetime
from typing import Dict, List, Optional, Any
from urllib.parse import quote
from typing import Any, Dict, List, Optional
class ResearchLookup:
"""Research information lookup using Perplexity Sonar models via OpenRouter."""
"""Research information lookup with intelligent backend routing.
# Available models
MODELS = {
"pro": "perplexity/sonar-pro", # Fast lookup, cost-effective
"reasoning": "perplexity/sonar-reasoning-pro", # Deep analysis with reasoning
}
Routes queries to the Parallel Chat API (default) or Perplexity
sonar-pro-search (academic paper searches only).
"""
# Keywords that indicate complex queries requiring reasoning model
REASONING_KEYWORDS = [
"compare", "contrast", "analyze", "analysis", "evaluate", "critique",
"versus", "vs", "vs.", "compared to", "differences between", "similarities",
"meta-analysis", "systematic review", "synthesis", "integrate",
"mechanism", "why", "how does", "how do", "explain", "relationship",
"theoretical framework", "implications", "interpret", "reasoning",
"controversy", "conflicting", "paradox", "debate", "reconcile",
"pros and cons", "advantages and disadvantages", "trade-off", "tradeoff",
ACADEMIC_KEYWORDS = [
"find papers", "find paper", "find articles", "find article",
"cite ", "citation", "citations for",
"doi ", "doi:", "pubmed", "pmid",
"journal article", "peer-reviewed",
"systematic review", "meta-analysis",
"literature search", "literature on",
"academic papers", "academic paper",
"research papers on", "research paper on",
"published studies", "published study",
"scholarly", "scholar",
"arxiv", "preprint",
"foundational papers", "seminal papers", "landmark papers",
"highly cited", "most cited",
]
def __init__(self, force_model: Optional[str] = None):
"""
Initialize the research lookup tool.
Args:
force_model: Optional model override ('pro' or 'reasoning').
If None, model is auto-selected based on query complexity.
"""
self.api_key = os.getenv("OPENROUTER_API_KEY")
if not self.api_key:
raise ValueError("OPENROUTER_API_KEY environment variable not set")
PARALLEL_SYSTEM_PROMPT = (
"You are a deep research analyst. Provide a comprehensive, well-cited "
"research report on the user's topic. Include:\n"
"- Key findings with specific data, statistics, and quantitative evidence\n"
"- Detailed analysis organized by themes\n"
"- Multiple authoritative sources cited inline\n"
"- Methodologies and implications where relevant\n"
"- Future outlook and research gaps\n"
"Use markdown formatting with clear section headers. "
"Prioritize authoritative and recent sources."
)
self.base_url = "https://openrouter.ai/api/v1"
self.force_model = force_model
self.headers = {
"Authorization": f"Bearer {self.api_key}",
CHAT_BASE_URL = "https://api.parallel.ai"
def __init__(self, force_backend: Optional[str] = None):
"""Initialize the research lookup tool.
Args:
force_backend: Force a specific backend ('parallel' or 'perplexity').
If None, backend is auto-selected based on query content.
"""
self.force_backend = force_backend
self.parallel_available = bool(os.getenv("PARALLEL_API_KEY"))
self.perplexity_available = bool(os.getenv("OPENROUTER_API_KEY"))
if not self.parallel_available and not self.perplexity_available:
raise ValueError(
"No API keys found. Set at least one of:\n"
" PARALLEL_API_KEY (for Parallel Chat API - primary)\n"
" OPENROUTER_API_KEY (for Perplexity academic search - fallback)"
)
def _select_backend(self, query: str) -> str:
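"""Select the best backend for a query.
A forced backend is honored only when its API key is set; otherwise the
query falls through to keyword-based auto-selection.
"""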
if self.force_backend:
if self.force_backend == "perplexity" and self.perplexity_available:
return "perplexity"
if self.force_backend == "parallel" and self.parallel_available:
return "parallel"
query_lower = query.lower()
is_academic = any(kw in query_lower for kw in self.ACADEMIC_KEYWORDS)
if is_academic and self.perplexity_available:
return "perplexity"
if self.parallel_available:
return "parallel"
if self.perplexity_available:
return "perplexity"
raise ValueError("No backend available. Check API keys.")
# ------------------------------------------------------------------
# Parallel Chat API backend
# ------------------------------------------------------------------
def _get_chat_client(self):
"""Lazy-load and cache the OpenAI client for Parallel Chat API."""
if not hasattr(self, "_chat_client"):
try:
from openai import OpenAI
except ImportError as exc:
raise ImportError(
"The 'openai' package is required for Parallel Chat API.\n"
"Install it with: pip install openai"
) from exc
self._chat_client = OpenAI(
api_key=os.getenv("PARALLEL_API_KEY"),
base_url=self.CHAT_BASE_URL,
)
return self._chat_client
def _parallel_lookup(self, query: str) -> Dict[str, Any]:
"""Run research via the Parallel Chat API (core model)."""
timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
model = "core"
try:
client = self._get_chat_client()
print(f"[Research] Parallel Chat API (model={model})...", file=sys.stderr)
response = client.chat.completions.create(
model=model,
messages=[
{"role": "system", "content": self.PARALLEL_SYSTEM_PROMPT},
{"role": "user", "content": query},
],
stream=False,
)
content = ""
if response.choices:
content = response.choices[0].message.content or ""
api_citations = self._extract_basis_citations(response)
text_citations = self._extract_citations_from_text(content)
return {
"success": True,
"query": query,
"response": content,
"citations": api_citations + text_citations,
"sources": api_citations,
"timestamp": timestamp,
"backend": "parallel",
"model": f"parallel-chat/{model}",
}
except Exception as e:
return {
"success": False,
"query": query,
"error": str(e),
"timestamp": timestamp,
"backend": "parallel",
"model": f"parallel-chat/{model}",
}
def _extract_basis_citations(self, response) -> List[Dict[str, Any]]:
"""Extract citation sources from the Chat API research basis."""
citations = []
basis = getattr(response, "basis", None)
if not basis:
return citations
seen_urls = set()
if isinstance(basis, list):
for item in basis:
cits = (
item.get("citations", []) if isinstance(item, dict)
else getattr(item, "citations", None) or []
)
for cit in cits:
url = cit.get("url", "") if isinstance(cit, dict) else getattr(cit, "url", "")
if url and url not in seen_urls:
seen_urls.add(url)
# Normalize missing or null fields so downstream formatting never sees None.
title = (cit.get("title") if isinstance(cit, dict) else getattr(cit, "title", None)) or ""
excerpts = (cit.get("excerpts") if isinstance(cit, dict) else getattr(cit, "excerpts", None)) or []
citations.append({
"type": "source",
"url": url,
"title": title,
"excerpts": excerpts,
})
return citations
# ------------------------------------------------------------------
# Perplexity academic search backend
# ------------------------------------------------------------------
def _perplexity_lookup(self, query: str) -> Dict[str, Any]:
"""Run academic search via Perplexity sonar-pro-search through OpenRouter."""
timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
api_key = os.getenv("OPENROUTER_API_KEY")
model = "perplexity/sonar-pro-search"
headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json",
"HTTP-Referer": "https://scientific-writer.local",
"X-Title": "Scientific Writer Research Tool"
"X-Title": "Scientific Writer Research Tool",
}
def _select_model(self, query: str) -> str:
"""
Select the appropriate model based on query complexity.
Args:
query: The research query
Returns:
Model identifier string
"""
if self.force_model:
return self.MODELS.get(self.force_model, self.MODELS["reasoning"])
# Check for reasoning keywords (case-insensitive)
query_lower = query.lower()
for keyword in self.REASONING_KEYWORDS:
if keyword in query_lower:
return self.MODELS["reasoning"]
# Check for multiple questions or complex structure
question_count = query.count("?")
if question_count >= 2:
return self.MODELS["reasoning"]
# Check for very long queries (likely complex)
if len(query) > 200:
return self.MODELS["reasoning"]
# Default to pro for simple lookups
return self.MODELS["pro"]
research_prompt = self._format_academic_prompt(query)
messages = [
{
"role": "system",
"content": (
"You are an academic research assistant specializing in finding "
"HIGH-IMPACT, INFLUENTIAL research.\n\n"
"QUALITY PRIORITIZATION (CRITICAL):\n"
"- ALWAYS prefer highly-cited papers over obscure publications\n"
"- ALWAYS prioritize Tier-1 venues: Nature, Science, Cell, NEJM, Lancet, JAMA, PNAS\n"
"- ALWAYS prefer papers from established researchers\n"
"- Include citation counts when known (e.g., 'cited 500+ times')\n"
"- Quality matters more than quantity\n\n"
"VENUE HIERARCHY:\n"
"1. Nature/Science/Cell family, NEJM, Lancet, JAMA (highest)\n"
"2. High-impact specialized journals (IF>10), top conferences (NeurIPS, ICML, ICLR)\n"
"3. Respected field-specific journals (IF 5-10)\n"
"4. Other peer-reviewed sources (only if no better option)\n\n"
"Focus exclusively on scholarly sources. Prioritize recent literature (2020-2026) "
"and provide complete citations with DOIs."
),
},
{"role": "user", "content": research_prompt},
]
def _make_request(self, messages: List[Dict[str, str]], model: str, **kwargs) -> Dict[str, Any]:
"""Make a request to the OpenRouter API with academic search mode."""
data = {
"model": model,
"messages": messages,
"max_tokens": 8000,
"temperature": 0.1, # Low temperature for factual research
# Perplexity-specific parameters for academic search
"search_mode": "academic", # Prioritize scholarly sources (peer-reviewed papers, journals)
"search_context_size": "high", # Always use high context for deeper research
**kwargs
"temperature": 0.1,
"search_mode": "academic",
"search_context_size": "high",
}
try:
response = requests.post(
f"{self.base_url}/chat/completions",
headers=self.headers,
"https://openrouter.ai/api/v1/chat/completions",
headers=headers,
json=data,
timeout=90 # Increased timeout for academic search
timeout=90,
)
response.raise_for_status()
return response.json()
except requests.exceptions.RequestException as e:
raise Exception(f"API request failed: {str(e)}")
resp_json = response.json()
def _format_research_prompt(self, query: str) -> str:
"""Format the query for optimal research results."""
return f"""You are an expert research assistant. Please provide comprehensive, accurate research information for the following query: "{query}"
IMPORTANT INSTRUCTIONS:
1. Focus on ACADEMIC and SCIENTIFIC sources (peer-reviewed papers, reputable journals, institutional research)
2. Include RECENT information (prioritize 2020-2026 publications)
3. Provide COMPLETE citations with authors, title, journal/conference, year, and DOI when available
4. Structure your response with clear sections and proper attribution
5. Be comprehensive but concise - aim for 800-1200 words
6. Include key findings, methodologies, and implications when relevant
7. Note any controversies, limitations, or conflicting evidence
PAPER QUALITY AND POPULARITY PRIORITIZATION (CRITICAL):
8. ALWAYS prioritize HIGHLY-CITED papers over obscure publications:
- Recent papers (0-3 years): prefer 20+ citations, highlight 100+ as highly influential
- Mid-age papers (3-7 years): prefer 100+ citations, highlight 500+ as landmark
- Older papers (7+ years): prefer 500+ citations, highlight 1000+ as foundational
9. ALWAYS prioritize papers from TOP-TIER VENUES:
- Tier 1 (highest priority): Nature, Science, Cell, NEJM, Lancet, JAMA, PNAS, Nature Medicine, Nature Biotechnology
- Tier 2 (high priority): High-impact specialized journals (IF>10), top conferences (NeurIPS, ICML, ICLR for AI/ML)
- Tier 3: Respected specialized journals (IF 5-10)
- Only cite lower-tier venues if directly relevant AND no better source exists
10. PREFER papers from ESTABLISHED, REPUTABLE AUTHORS:
- Senior researchers with high h-index and multiple high-impact publications
- Leading research groups at recognized institutions
- Authors with recognized expertise (awards, editorial positions)
11. For EACH citation, include when available:
- Approximate citation count (e.g., "cited 500+ times")
- Journal/venue tier indicator
- Notable author credentials if relevant
12. PRIORITIZE papers that DIRECTLY address the research question over tangentially related work
RESPONSE FORMAT:
- Start with a brief summary (2-3 sentences)
- Present key findings and studies in organized sections
- Rank papers by impact: most influential/cited first
- End with future directions or research gaps if applicable
- Include 5-8 high-quality citations, emphasizing Tier-1 venues and highly-cited papers
Remember: Quality over quantity. Prioritize influential, highly-cited papers from prestigious venues and established researchers."""
def lookup(self, query: str) -> Dict[str, Any]:
"""Perform a research lookup for the given query."""
timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
# Select model based on query complexity
model = self._select_model(query)
# Format the research prompt
research_prompt = self._format_research_prompt(query)
# Prepare messages for the API with system message for academic mode
messages = [
{
"role": "system",
"content": """You are an academic research assistant specializing in finding HIGH-IMPACT, INFLUENTIAL research.
QUALITY PRIORITIZATION (CRITICAL):
- ALWAYS prefer highly-cited papers over obscure publications
- ALWAYS prioritize Tier-1 venues: Nature, Science, Cell, NEJM, Lancet, JAMA, PNAS, and their family journals
- ALWAYS prefer papers from established researchers with strong publication records
- Include citation counts when known (e.g., "cited 500+ times")
- Quality matters more than quantity - 5 excellent papers beats 10 mediocre ones
VENUE HIERARCHY:
1. Nature/Science/Cell family, NEJM, Lancet, JAMA (highest priority)
2. High-impact specialized journals (IF>10), top ML conferences (NeurIPS, ICML, ICLR)
3. Respected field-specific journals (IF 5-10)
4. Other peer-reviewed sources (only if no better option exists)
Focus exclusively on scholarly sources: peer-reviewed journals, academic papers, research institutions. Prioritize recent academic literature (2020-2026) and provide complete citations with DOIs. Always indicate paper impact through citation counts and venue prestige."""
},
{"role": "user", "content": research_prompt}
]
try:
# Make the API request
response = self._make_request(messages, model)
# Extract the response content
if "choices" in response and len(response["choices"]) > 0:
choice = response["choices"][0]
if "choices" in resp_json and len(resp_json["choices"]) > 0:
choice = resp_json["choices"][0]
if "message" in choice and "content" in choice["message"]:
content = choice["message"]["content"]
# Extract citations from API response (Perplexity provides these)
api_citations = self._extract_api_citations(response, choice)
# Also extract citations from text as fallback
api_citations = self._extract_api_citations(resp_json, choice)
text_citations = self._extract_citations_from_text(content)
# Combine: prioritize API citations, add text citations if no duplicates
citations = api_citations + text_citations
return {
@@ -210,10 +271,11 @@ Focus exclusively on scholarly sources: peer-reviewed journals, academic papers,
"query": query,
"response": content,
"citations": citations,
"sources": api_citations, # Separate field for API-provided sources
"sources": api_citations,
"timestamp": timestamp,
"backend": "perplexity",
"model": model,
"usage": response.get("usage", {})
"usage": resp_json.get("usage", {}),
}
else:
raise Exception("Invalid response format from API")
@@ -226,22 +288,54 @@ Focus exclusively on scholarly sources: peer-reviewed journals, academic papers,
"query": query,
"error": str(e),
"timestamp": timestamp,
"model": model
"backend": "perplexity",
"model": model,
}
# ------------------------------------------------------------------
# Shared utilities
# ------------------------------------------------------------------
def _format_academic_prompt(self, query: str) -> str:
"""Format a query for academic research results via Perplexity."""
return f"""You are an expert research assistant. Please provide comprehensive, accurate research information for the following query: "{query}"
IMPORTANT INSTRUCTIONS:
1. Focus on ACADEMIC and SCIENTIFIC sources (peer-reviewed papers, reputable journals, institutional research)
2. Include RECENT information (prioritize 2020-2026 publications)
3. Provide COMPLETE citations with authors, title, journal/conference, year, and DOI when available
4. Structure your response with clear sections and proper attribution
5. Be comprehensive but concise - aim for 800-1200 words
6. Include key findings, methodologies, and implications when relevant
7. Note any controversies, limitations, or conflicting evidence
PAPER QUALITY PRIORITIZATION (CRITICAL):
8. ALWAYS prioritize HIGHLY-CITED papers over obscure publications
9. ALWAYS prioritize papers from TOP-TIER VENUES (Nature, Science, Cell, NEJM, Lancet, JAMA, PNAS)
10. PREFER papers from ESTABLISHED, REPUTABLE AUTHORS
11. For EACH citation include when available: citation count, venue tier, author credentials
12. PRIORITIZE papers that DIRECTLY address the research question
RESPONSE FORMAT:
- Start with a brief summary (2-3 sentences)
- Present key findings and studies in organized sections
- Rank papers by impact: most influential/cited first
- End with future directions or research gaps if applicable
- Include 5-8 high-quality citations
Remember: Quality over quantity. Prioritize influential, highly-cited papers from prestigious venues."""
def _extract_api_citations(self, response: Dict[str, Any], choice: Dict[str, Any]) -> List[Dict[str, str]]:
"""Extract citations from Perplexity API response fields."""
citations = []
# Perplexity returns citations in search_results field (new format)
# Check multiple possible locations where OpenRouter might place them
search_results = (
response.get("search_results") or
choice.get("search_results") or
choice.get("message", {}).get("search_results") or
[]
response.get("search_results")
or choice.get("search_results")
or choice.get("message", {}).get("search_results")
or []
)
for result in search_results:
citation = {
"type": "source",
@@ -249,162 +343,164 @@ Focus exclusively on scholarly sources: peer-reviewed journals, academic papers,
"url": result.get("url", ""),
"date": result.get("date", ""),
}
# Add snippet if available (newer API feature)
if result.get("snippet"):
citation["snippet"] = result.get("snippet")
citation["snippet"] = result["snippet"]
citations.append(citation)
# Also check for legacy citations field (backward compatibility)
legacy_citations = (
response.get("citations") or
choice.get("citations") or
choice.get("message", {}).get("citations") or
[]
response.get("citations")
or choice.get("citations")
or choice.get("message", {}).get("citations")
or []
)
for url in legacy_citations:
if isinstance(url, str):
# Legacy format was just URLs
citations.append({
"type": "source",
"url": url,
"title": "",
"date": ""
})
citations.append({"type": "source", "url": url, "title": "", "date": ""})
elif isinstance(url, dict):
citations.append({
"type": "source",
"url": url.get("url", ""),
"title": url.get("title", ""),
"date": url.get("date", "")
"date": url.get("date", ""),
})
return citations
def _extract_citations_from_text(self, text: str) -> List[Dict[str, str]]:
"""Extract potential citations from the response text as fallback."""
import re
"""Extract DOIs and academic URLs from response text as fallback."""
citations = []
# Look for DOI patterns first (most reliable)
# Matches: doi:10.xxx, DOI: 10.xxx, https://doi.org/10.xxx
doi_pattern = r'(?:doi[:\s]*|https?://(?:dx\.)?doi\.org/)(10\.[0-9]{4,}/[^\s\)\]\,\[\<\>]+)'
doi_matches = re.findall(doi_pattern, text, re.IGNORECASE)
seen_dois = set()
for doi in doi_matches:
# Clean up DOI - remove trailing punctuation and brackets
doi_clean = doi.strip().rstrip('.,;:)]')
doi_clean = doi.strip().rstrip(".,;:)]")
if doi_clean and doi_clean not in seen_dois:
seen_dois.add(doi_clean)
citations.append({
"type": "doi",
"doi": doi_clean,
"url": f"https://doi.org/{doi_clean}"
"url": f"https://doi.org/{doi_clean}",
})
# Look for URLs that might be sources
url_pattern = r'https?://[^\s\)\]\,\<\>\"\']+(?:arxiv\.org|pubmed|ncbi\.nlm\.nih\.gov|nature\.com|science\.org|wiley\.com|springer\.com|ieee\.org|acm\.org)[^\s\)\]\,\<\>\"\']*'
url_pattern = (
r'https?://[^\s\)\]\,\<\>\"\']*(?:arxiv\.org|pubmed|ncbi\.nlm\.nih\.gov|'
r'nature\.com|science\.org|wiley\.com|springer\.com|ieee\.org|acm\.org)'
r'[^\s\)\]\,\<\>\"\']*'
)
url_matches = re.findall(url_pattern, text, re.IGNORECASE)
seen_urls = set()
for url in url_matches:
url_clean = url.rstrip('.')
url_clean = url.rstrip(".")
if url_clean not in seen_urls:
seen_urls.add(url_clean)
citations.append({
"type": "url",
"url": url_clean
})
citations.append({"type": "url", "url": url_clean})
return citations
def batch_lookup(self, queries: List[str], delay: float = 1.0) -> List[Dict[str, Any]]:
"""Perform multiple research lookups with optional delay between requests."""
results = []
# ------------------------------------------------------------------
# Public API
# ------------------------------------------------------------------
def lookup(self, query: str) -> Dict[str, Any]:
"""Perform a research lookup, routing to the best backend.
Parallel Chat API is used by default. Perplexity sonar-pro-search
is used only for academic-specific queries (paper searches, DOI lookups).
"""
backend = self._select_backend(query)
print(f"[Research] Backend: {backend} | Query: {query[:80]}...", file=sys.stderr)
if backend == "parallel":
return self._parallel_lookup(query)
else:
return self._perplexity_lookup(query)
def batch_lookup(self, queries: List[str], delay: float = 1.0) -> List[Dict[str, Any]]:
"""Perform multiple research lookups with delay between requests."""
results = []
for i, query in enumerate(queries):
if i > 0 and delay > 0:
time.sleep(delay) # Rate limiting
time.sleep(delay)
result = self.lookup(query)
results.append(result)
# Print progress
print(f"[Research] Completed query {i+1}/{len(queries)}: {query[:50]}...")
print(f"[Research] Completed query {i+1}/{len(queries)}: {query[:50]}...", file=sys.stderr)
return results
def get_model_info(self) -> Dict[str, Any]:
"""Get information about available models from OpenRouter."""
try:
response = requests.get(
f"{self.base_url}/models",
headers=self.headers,
timeout=30
)
response.raise_for_status()
return response.json()
except Exception as e:
return {"error": str(e)}
# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------
def main():
"""Command-line interface for testing the research lookup tool."""
"""Command-line interface for the research lookup tool."""
import argparse
import sys
parser = argparse.ArgumentParser(description="Research Information Lookup Tool")
parser = argparse.ArgumentParser(
description="Research Information Lookup Tool (Parallel Chat API + Perplexity)",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# General research (uses Parallel Chat API, core model)
python research_lookup.py "latest advances in quantum computing 2025"
# Academic paper search (auto-routes to Perplexity)
python research_lookup.py "find papers on CRISPR gene editing clinical trials"
# Force a specific backend
python research_lookup.py "topic" --force-backend parallel
python research_lookup.py "topic" --force-backend perplexity
# Save output to file
python research_lookup.py "topic" -o results.txt
# JSON output
python research_lookup.py "topic" --json -o results.json
""",
)
parser.add_argument("query", nargs="?", help="Research query to look up")
parser.add_argument("--model-info", action="store_true", help="Show available models")
parser.add_argument("--batch", nargs="+", help="Run multiple queries")
parser.add_argument("--force-model", choices=["pro", "reasoning"],
help="Force specific model: 'pro' for fast lookup, 'reasoning' for deep analysis")
parser.add_argument("-o", "--output", help="Write output to file instead of stdout")
parser.add_argument("--json", action="store_true", help="Output results as JSON")
parser.add_argument(
"--force-backend",
choices=["parallel", "perplexity"],
help="Force a specific backend (default: auto-select)",
)
parser.add_argument("-o", "--output", help="Write output to file")
parser.add_argument("--json", action="store_true", help="Output as JSON")
args = parser.parse_args()
# Set up output destination
output_file = None
if args.output:
output_file = open(args.output, 'w', encoding='utf-8')
output_file = open(args.output, "w", encoding="utf-8")
def write_output(text):
"""Write to file or stdout."""
if output_file:
output_file.write(text + '\n')
output_file.write(text + "\n")
else:
print(text)
# Check for API key
if not os.getenv("OPENROUTER_API_KEY"):
print("Error: OPENROUTER_API_KEY environment variable not set", file=sys.stderr)
print("Please set it in your .env file or export it:", file=sys.stderr)
print(" export OPENROUTER_API_KEY='your_openrouter_api_key'", file=sys.stderr)
has_parallel = bool(os.getenv("PARALLEL_API_KEY"))
has_perplexity = bool(os.getenv("OPENROUTER_API_KEY"))
if not has_parallel and not has_perplexity:
print("Error: No API keys found. Set at least one:", file=sys.stderr)
print(" export PARALLEL_API_KEY='...' (primary - Parallel Chat API)", file=sys.stderr)
print(" export OPENROUTER_API_KEY='...' (fallback - Perplexity academic)", file=sys.stderr)
if output_file:
output_file.close()
return 1
if not args.query and not args.batch:
parser.print_help()
if output_file:
output_file.close()
return 1
try:
research = ResearchLookup(force_model=args.force_model)
if args.model_info:
write_output("Available models from OpenRouter:")
models = research.get_model_info()
if "data" in models:
for model in models["data"]:
if "perplexity" in model["id"].lower():
write_output(f" - {model['id']}: {model.get('name', 'N/A')}")
if output_file:
output_file.close()
return 0
if not args.query and not args.batch:
print("Error: No query provided. Use --model-info to see available models.", file=sys.stderr)
if output_file:
output_file.close()
return 1
research = ResearchLookup(force_backend=args.force_backend)
if args.batch:
print(f"Running batch research for {len(args.batch)} queries...", file=sys.stderr)
@@ -413,27 +509,24 @@ def main():
print(f"Researching: {args.query}", file=sys.stderr)
results = [research.lookup(args.query)]
# Output as JSON if requested
if args.json:
write_output(json.dumps(results, indent=2, ensure_ascii=False))
write_output(json.dumps(results, indent=2, ensure_ascii=False, default=str))
if output_file:
output_file.close()
return 0
# Display results in human-readable format
for i, result in enumerate(results):
if result["success"]:
write_output(f"\n{'='*80}")
write_output(f"Query {i+1}: {result['query']}")
write_output(f"Timestamp: {result['timestamp']}")
write_output(f"Model: {result['model']}")
write_output(f"Backend: {result.get('backend', 'unknown')} | Model: {result.get('model', 'unknown')}")
write_output(f"{'='*80}")
write_output(result["response"])
# Display API-provided sources first (most reliable)
sources = result.get("sources", [])
if sources:
write_output(f"\n📚 Sources ({len(sources)}):")
write_output(f"\nSources ({len(sources)}):")
for j, source in enumerate(sources):
title = source.get("title", "Untitled")
url = source.get("url", "")
@@ -443,11 +536,10 @@ def main():
if url:
write_output(f" {url}")
# Display additional text-extracted citations
citations = result.get("citations", [])
text_citations = [c for c in citations if c.get("type") in ("doi", "url")]
if text_citations:
write_output(f"\n🔗 Additional References ({len(text_citations)}):")
write_output(f"\nAdditional References ({len(text_citations)}):")
for j, citation in enumerate(text_citations):
if citation.get("type") == "doi":
write_output(f" [{j+1}] DOI: {citation.get('doi', '')} - {citation.get('url', '')}")
@@ -464,11 +556,11 @@ def main():
return 0
except Exception as e:
print(f"Error: {str(e)}", file=sys.stderr)
print(f"Error: {e}", file=sys.stderr)
if output_file:
output_file.close()
return 1
if __name__ == "__main__":
exit(main())
sys.exit(main())
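
The routing precedence in `_select_backend` can be sketched as a standalone function. This is a minimal illustration and not the code from this commit: `select_backend` is a hypothetical free function, the availability flags are passed in explicitly instead of being read from `PARALLEL_API_KEY` and `OPENROUTER_API_KEY`, and the keyword tuple is a shortened stand-in for `ACADEMIC_KEYWORDS`.

```python
from typing import Optional, Sequence

# Hypothetical, shortened stand-in for ResearchLookup.ACADEMIC_KEYWORDS.
ACADEMIC_KEYWORDS = ("find papers", "doi:", "pubmed", "arxiv", "peer-reviewed")


def select_backend(
    query: str,
    parallel_available: bool,
    perplexity_available: bool,
    force_backend: Optional[str] = None,
    keywords: Sequence[str] = ACADEMIC_KEYWORDS,
) -> str:
    """Return 'parallel' or 'perplexity' using the precedence described above."""
    available = {"parallel": parallel_available, "perplexity": perplexity_available}

    # 1. A forced backend wins, but only if its API key is present.
    #    Unknown or unavailable values fall through to auto-selection.
    if force_backend and available.get(force_backend, False):
        return force_backend

    # 2. Academic-looking queries go to Perplexity when it is available.
    query_lower = query.lower()
    if perplexity_available and any(kw in query_lower for kw in keywords):
        return "perplexity"

    # 3. Otherwise prefer Parallel, then fall back to Perplexity.
    if parallel_available:
        return "parallel"
    if perplexity_available:
        return "perplexity"
    raise ValueError("No backend available. Check API keys.")
```

With both keys set, a query such as "find papers on CRISPR" selects `perplexity` and a general query selects `parallel`. With only the Parallel key set, forcing `perplexity` still yields `parallel`, because the forced choice is ignored when its key is missing.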