Enhance citation management and literature review guidelines

- Updated SKILL.md in citation management to include best practices for identifying seminal and high-impact papers, emphasizing citation count thresholds, venue quality tiers, and author reputation indicators.
- Expanded literature review SKILL.md to prioritize high-impact papers, detailing citation metrics, journal tiers, and author reputation assessment.
- Added comprehensive evaluation strategies for paper impact and quality in literature_search_strategies.md, including citation count significance and journal impact factor guidance.
- Improved research lookup scripts to prioritize results based on citation count, venue prestige, and author reputation, enhancing the quality of research outputs.
Vinayak Agarwal
2026-01-05 13:01:10 -08:00
parent d243a12564
commit 3439a21f57
41 changed files with 11802 additions and 61 deletions

# CS Conference Writing Style Guide
Comprehensive writing guide for ACL, EMNLP, NAACL (NLP), CHI, CSCW (HCI), SIGKDD, WWW, SIGIR (data mining/IR), and other major CS conferences.
**Last Updated**: 2024
---
## Overview
CS conferences span diverse subfields with distinct writing cultures. This guide covers NLP, HCI, and data mining/IR venues, each with unique expectations and evaluation criteria.
---
# Part 1: NLP Conferences (ACL, EMNLP, NAACL)
## NLP Writing Philosophy
> "Strong empirical results on standard benchmarks with insightful analysis."
NLP papers balance empirical rigor with linguistic insight. Human evaluation is increasingly important alongside automatic metrics.
## Audience and Tone
### Target Reader
- NLP researchers and computational linguists
- Familiar with transformer architectures, standard benchmarks
- Expect reproducible results and error analysis
### Tone Characteristics
| Characteristic | Description |
|---------------|-------------|
| **Task-focused** | Clear problem definition |
| **Benchmark-oriented** | Standard datasets emphasized |
| **Analysis-rich** | Error analysis, qualitative examples |
| **Reproducible** | Full implementation details |
## Abstract (NLP Style)
### Structure
- **Task/problem** (1 sentence)
- **Limitation of prior work** (1 sentence)
- **Your approach** (1-2 sentences)
- **Results on benchmarks** (2 sentences)
- **Analysis finding** (optional, 1 sentence)
### Example Abstract
```
Coreference resolution remains challenging for pronouns with distant or
ambiguous antecedents. Prior neural approaches struggle with these
difficult cases due to limited context modeling. We introduce
LongContext-Coref, a retrieval-augmented coreference model that
dynamically retrieves relevant context from document history. On the
OntoNotes 5.0 benchmark, LongContext-Coref achieves 83.4 F1, improving
over the previous state-of-the-art by 1.2 points. On the challenging
WinoBias dataset, we reduce gender bias by 34% while maintaining
accuracy. Qualitative analysis reveals that our model successfully
resolves pronouns requiring world knowledge, a known weakness of
prior approaches.
```
## NLP Paper Structure
```
├── Introduction
│ ├── Task motivation
│ ├── Prior work limitations
│ ├── Your contribution
│ └── Contribution bullets
├── Related Work
├── Method
│ ├── Problem formulation
│ ├── Model architecture
│ └── Training procedure
├── Experiments
│ ├── Datasets (with statistics)
│ ├── Baselines
│ ├── Main results
│ ├── Analysis
│ │ ├── Error analysis
│ │ ├── Ablation study
│ │ └── Qualitative examples
│ └── Human evaluation (if applicable)
├── Discussion / Limitations
└── Conclusion
```
## NLP-Specific Requirements
### Datasets
- Use **standard benchmarks**: GLUE, SQuAD, CoNLL, OntoNotes
- Report **dataset statistics**: train/dev/test sizes
- **Data preprocessing**: Document all steps
### Evaluation Metrics
- **Task-appropriate metrics**: F1, BLEU, ROUGE, accuracy
- **Statistical significance**: Paired bootstrap, p-values
- **Multiple runs**: Report mean ± std across seeds
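The paired bootstrap mentioned above can be sketched in a few lines. This is a minimal Koehn-style resampling test over per-example scores; the function name and inputs are illustrative, not from any particular library:

```python
import random

def paired_bootstrap_p(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Estimate a one-sided p-value that system B's mean per-example
    score exceeds system A's, by resampling the test set with replacement."""
    assert len(scores_a) == len(scores_b)
    rng = random.Random(seed)
    n = len(scores_a)
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    not_better = 0
    for _ in range(n_resamples):
        # Resample the test set with replacement and recompute the mean delta
        sample_mean = sum(diffs[rng.randrange(n)] for _ in range(n)) / n
        if sample_mean <= 0:
            not_better += 1
    return not_better / n_resamples
```

In practice, report the resulting p-value alongside the mean ± std across seeds, and state the number of resamples.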
### Human Evaluation
Increasingly expected for generation tasks:
- **Annotator details**: Number, qualifications, agreement
- **Evaluation protocol**: Guidelines, interface, payment
- **Inter-annotator agreement**: Cohen's κ or Krippendorff's α
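For two annotators with categorical labels, Cohen's κ is simple enough to compute directly. A minimal sketch (for Krippendorff's α or more than two annotators, use an established package):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same items:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    chance = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    if chance == 1:
        return 1.0  # degenerate case: both annotators used a single label
    return (observed - chance) / (1 - chance)
```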
### Example Human Evaluation Table
```
Table 3: Human Evaluation Results (100 samples, 3 annotators)
─────────────────────────────────────────────────────────────
Method | Fluency | Coherence | Factuality | Overall
─────────────────────────────────────────────────────────────
Baseline | 3.8 | 3.2 | 3.5 | 3.5
GPT-3.5 | 4.2 | 4.0 | 3.7 | 4.0
Our Method | 4.4 | 4.3 | 4.1 | 4.3
─────────────────────────────────────────────────────────────
Inter-annotator κ = 0.72. Scale: 1-5 (higher is better).
```
## ACL-Specific Notes
- **ARR (ACL Rolling Review)**: Shared review system across ACL venues
- **Responsible NLP checklist**: Ethics, limitations, risks
- **Long (8 pages) vs. Short (4 pages)**: Different expectations
- **Findings papers**: Companion acceptance track (e.g., Findings of ACL) for technically sound papers below the main-conference bar
---
# Part 2: HCI Conferences (CHI, CSCW, UIST)
## HCI Writing Philosophy
> "Technology in service of humans—understand users first, then design and evaluate."
HCI papers are fundamentally **user-centered**. Technology novelty alone is insufficient; understanding human needs and demonstrating user benefit is essential.
## Audience and Tone
### Target Reader
- HCI researchers and practitioners
- UX designers and product developers
- Interdisciplinary (CS, psychology, design, social science)
### Tone Characteristics
| Characteristic | Description |
|---------------|-------------|
| **User-centered** | Focus on people, not technology |
| **Design-informed** | Grounded in design thinking |
| **Empirical** | User studies provide evidence |
| **Reflective** | Consider broader implications |
## HCI Abstract
### Focus on Users and Impact
```
Video calling has become essential for remote collaboration, yet
current interfaces poorly support the peripheral awareness that makes
in-person work effective. Through formative interviews with 24 remote
workers, we identified three key challenges: difficulty gauging
colleague availability, lack of ambient presence cues, and interruption
anxiety. We designed AmbientOffice, a peripheral display system that
conveys teammate presence through subtle ambient visualizations. In a
two-week deployment study with 18 participants across three distributed
teams, AmbientOffice increased spontaneous collaboration by 40% and
reduced perceived isolation (p<0.01). Participants valued the system's
non-intrusive nature and reported feeling more connected to remote
colleagues. We discuss implications for designing ambient awareness
systems and the tension between visibility and privacy in remote work.
```
## HCI Paper Structure
### Research Through Design / Systems Papers
```
├── Introduction
│ ├── Problem in human terms
│ ├── Why technology can help
│ └── Contribution summary
├── Related Work
│ ├── Domain background
│ ├── Prior systems
│ └── Theoretical frameworks
├── Formative Work (often)
│ ├── Interviews / observations
│ └── Design requirements
├── System Design
│ ├── Design rationale
│ ├── Implementation
│ └── Interface walkthrough
├── Evaluation
│ ├── Study design
│ ├── Participants
│ ├── Procedure
│ ├── Findings (quant + qual)
│ └── Limitations
├── Discussion
│ ├── Design implications
│ ├── Generalizability
│ └── Future work
└── Conclusion
```
### Qualitative / Interview Studies
```
├── Introduction
├── Related Work
├── Methods
│ ├── Participants
│ ├── Procedure
│ ├── Data collection
│ └── Analysis method (thematic, grounded theory, etc.)
├── Findings
│ ├── Theme 1 (with quotes)
│ ├── Theme 2 (with quotes)
│ └── Theme 3 (with quotes)
├── Discussion
│ ├── Implications for design
│ ├── Implications for research
│ └── Limitations
└── Conclusion
```
## HCI-Specific Requirements
### Participant Reporting
- **Demographics**: Age, gender, relevant experience
- **Recruitment**: How and where recruited
- **Compensation**: Payment amount and type
- **IRB approval**: Ethics board statement
### Quotes in Findings
Use direct quotes to ground findings:
```
Participants valued the ambient nature of the display. As P7 described:
"It's like having a window to my teammate's office. I don't need to
actively check it, but I know they're there." This passive awareness
reduced the barrier to initiating contact.
```
### Design Implications Section
Translate findings into actionable guidance:
```
**Implication 1: Support peripheral awareness without demanding attention.**
Ambient displays should be visible in peripheral vision but not require
active monitoring. Designers should consider calm technology principles.
**Implication 2: Balance visibility with privacy.**
Users want to share presence but fear surveillance. Systems should
provide granular controls and make visibility mutual.
```
## CHI-Specific Notes
- **Contribution types**: Empirical, artifact, methodological, theoretical
- **ACM format**: `acmart` document class with `sigchi` option
- **Accessibility**: Alt text, inclusive language expected
- **Contribution statement**: Required per-author contributions
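A minimal preamble for the ACM format might look like this; the title and metadata are placeholders, and the exact class options should be confirmed against the current CHI call for papers, since they change between cycles:

```latex
\documentclass[sigchi]{acmart}  % class option varies by cycle; check the CFP

\title{Paper Title Here}
\author{Author Name}
\affiliation{\institution{Institution}\country{Country}}

\begin{document}
\maketitle
% body ...
\end{document}
```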
---
# Part 3: Data Mining & IR (SIGKDD, WWW, SIGIR)
## Data Mining Writing Philosophy
> "Scalable methods for real-world data with demonstrated practical impact."
Data mining papers emphasize **scalability**, **real-world applicability**, and **solid experimental methodology**.
## Audience and Tone
### Target Reader
- Data scientists and ML engineers
- Industry researchers
- Applied ML practitioners
### Tone Characteristics
| Characteristic | Description |
|---------------|-------------|
| **Scalable** | Handle large datasets |
| **Practical** | Real-world applications |
| **Reproducible** | Datasets and code shared |
| **Industrial** | Industry datasets valued |
## KDD Abstract
### Emphasize Scale and Application
```
Fraud detection in e-commerce requires processing millions of
transactions in real-time while adapting to evolving attack patterns.
We present FraudShield, a graph neural network framework for real-time
fraud detection that scales to billion-edge transaction graphs. Unlike
prior methods that require full graph access, FraudShield uses
incremental updates with O(1) inference cost per transaction. On a
proprietary dataset of 2.3 billion transactions from a major e-commerce
platform, FraudShield achieves 94.2% precision at 80% recall,
outperforming production baselines by 12%. The system has been deployed
at [Company], processing 50K transactions per second and preventing
an estimated $400M in annual fraud losses. We release an anonymized
benchmark dataset and code.
```
## KDD Paper Structure
```
├── Introduction
│ ├── Problem and impact
│ ├── Technical challenges
│ ├── Your approach
│ └── Contributions
├── Related Work
├── Preliminaries
│ ├── Problem definition
│ └── Notation
├── Method
│ ├── Overview
│ ├── Technical components
│ └── Complexity analysis
├── Experiments
│ ├── Datasets (with scale statistics)
│ ├── Baselines
│ ├── Main results
│ ├── Scalability experiments
│ ├── Ablation study
│ └── Case study / deployment
└── Conclusion
```
## KDD-Specific Requirements
### Scalability
- **Dataset sizes**: Report number of nodes, edges, samples
- **Runtime analysis**: Wall-clock time comparisons
- **Complexity**: Time and space complexity stated
- **Scaling experiments**: Show performance vs. data size
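A scaling experiment of the kind listed above reduces to timing the method at increasing input sizes. A minimal wall-clock harness (names are illustrative):

```python
import time

def measure_scaling(fn, make_input, sizes, repeats=3):
    """Best-of-`repeats` wall-clock runtime of `fn` at each input size.

    Returns (size, seconds) rows suitable for a runtime-vs-scale table."""
    rows = []
    for n in sizes:
        data = make_input(n)  # build the input outside the timed region
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            fn(data)
            best = min(best, time.perf_counter() - t0)
        rows.append((n, best))
    return rows
```

For example, `measure_scaling(sorted, lambda n: list(range(n, 0, -1)), [1_000, 10_000, 100_000])` yields the rows for a runtime-vs-size table like Table 4.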
### Industrial Deployment
- **Case studies**: Real-world deployment stories
- **A/B tests**: Online evaluation results (if applicable)
- **Production metrics**: Business impact (if shareable)
### Example Scalability Table
```
Table 4: Scalability Comparison (runtime in seconds)
──────────────────────────────────────────────────────
Dataset | Nodes | Edges | GCN | GraphSAGE | Ours
──────────────────────────────────────────────────────
Cora | 2.7K | 5.4K | 0.3 | 0.2 | 0.1
Citeseer | 3.3K | 4.7K | 0.4 | 0.3 | 0.1
PubMed | 19.7K | 44.3K | 1.2 | 0.8 | 0.3
ogbn-arxiv | 169K | 1.17M | 8.4 | 4.2 | 1.6
ogbn-papers | 111M | 1.6B | OOM | OOM | 42.3
──────────────────────────────────────────────────────
```
---
# Part 4: Common Elements Across CS Venues
## Writing Quality
### Clarity
- **One idea per sentence**
- **Define terms before use**
- **Use consistent notation**
### Precision
- **Exact numbers**: "23.4%" not "about 20%"
- **Clear claims**: Avoid hedging unless necessary
- **Specific comparisons**: Name the baseline
## Contribution Bullets
Used across all CS venues:
```
Our contributions are:
• We identify [problem/insight]
• We propose [method name] that [key innovation]
• We demonstrate [results] on [benchmarks]
• We release [code/data] at [URL]
```
## Reproducibility Standards
All CS venues increasingly expect:
- **Code availability**: GitHub link (anonymous for review)
- **Data availability**: Public datasets or release plans
- **Full hyperparameters**: Training details complete
- **Random seeds**: Exact values for reproduction
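Reporting exact seeds only helps if every randomness source is actually seeded. A minimal helper (the framework calls in comments are assumptions; add whichever stacks you use):

```python
import os
import random

def set_seed(seed: int) -> None:
    """Seed Python's RNG and the hash seed; extend per framework."""
    random.seed(seed)
    # Note: PYTHONHASHSEED must be set before interpreter startup
    # to affect hash randomization.
    os.environ["PYTHONHASHSEED"] = str(seed)
    # Framework seeding (assumes numpy / PyTorch are installed):
    # np.random.seed(seed)
    # torch.manual_seed(seed); torch.cuda.manual_seed_all(seed)
```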
## Ethics and Broader Impact
### NLP (ACL/EMNLP)
- **Limitations section**: Required
- **Responsible NLP checklist**: Ethical considerations
- **Bias analysis**: For models affecting people
### HCI (CHI)
- **IRB/Ethics approval**: Required for human subjects
- **Informed consent**: Procedure described
- **Privacy considerations**: Data handling
### KDD/WWW
- **Societal impact**: Consider misuse potential
- **Privacy preservation**: For sensitive data
- **Fairness analysis**: When applicable
---
## Venue Comparison Table
| Aspect | ACL/EMNLP | CHI | KDD/WWW | SIGIR |
|--------|-----------|-----|---------|-------|
| **Focus** | NLP tasks | User studies | Scalable ML | IR/search |
| **Evaluation** | Benchmarks + human | User studies | Large-scale exp | Datasets |
| **Theory weight** | Moderate | Low | Moderate | Moderate |
| **Industry value** | High | Medium | Very high | High |
| **Page limit** | 8 long / 4 short | 10 + refs | 9 + refs | 10 + refs |
| **Review style** | ARR | Direct | Direct | Direct |
---
## Pre-Submission Checklist
### All CS Venues
- [ ] Clear contribution statement
- [ ] Strong baselines
- [ ] Reproducibility information complete
- [ ] Correct venue template
- [ ] Anonymized (if double-blind)
### NLP-Specific
- [ ] Standard benchmark results
- [ ] Error analysis included
- [ ] Human evaluation (for generation)
- [ ] Responsible NLP checklist
### HCI-Specific
- [ ] IRB approval stated
- [ ] Participant demographics
- [ ] Direct quotes in findings
- [ ] Design implications
### Data Mining-Specific
- [ ] Scalability experiments
- [ ] Dataset size statistics
- [ ] Runtime comparisons
- [ ] Complexity analysis
---
## See Also
- `venue_writing_styles.md` - Master style overview
- `ml_conference_style.md` - NeurIPS/ICML style guide
- `conferences_formatting.md` - Technical formatting requirements
- `reviewer_expectations.md` - What CS reviewers seek