Enhance citation management and literature review guidelines

- Updated SKILL.md in citation management to include best practices for identifying seminal and high-impact papers, emphasizing citation count thresholds, venue quality tiers, and author reputation indicators.
- Expanded literature review SKILL.md to prioritize high-impact papers, detailing citation metrics, journal tiers, and author reputation assessment.
- Added comprehensive evaluation strategies for paper impact and quality in literature_search_strategies.md, including citation count significance and journal impact factor guidance.
- Improved research lookup scripts to prioritize results based on citation count, venue prestige, and author reputation, enhancing the quality of research outputs.
Vinayak Agarwal
2026-01-05 13:01:10 -08:00
parent d243a12564
commit 3439a21f57
41 changed files with 11802 additions and 61 deletions

# CS Conference Writing Style Guide
Comprehensive writing guide for ACL, EMNLP, NAACL (NLP), CHI, CSCW (HCI), SIGKDD, WWW, SIGIR (data mining/IR), and other major CS conferences.
**Last Updated**: 2024
---
## Overview
CS conferences span diverse subfields with distinct writing cultures. This guide covers NLP, HCI, and data mining/IR venues, each with unique expectations and evaluation criteria.
---
# Part 1: NLP Conferences (ACL, EMNLP, NAACL)
## NLP Writing Philosophy
> "Strong empirical results on standard benchmarks with insightful analysis."
NLP papers balance empirical rigor with linguistic insight. Human evaluation is increasingly important alongside automatic metrics.
## Audience and Tone
### Target Reader
- NLP researchers and computational linguists
- Familiar with transformer architectures, standard benchmarks
- Expect reproducible results and error analysis
### Tone Characteristics
| Characteristic | Description |
|---------------|-------------|
| **Task-focused** | Clear problem definition |
| **Benchmark-oriented** | Standard datasets emphasized |
| **Analysis-rich** | Error analysis, qualitative examples |
| **Reproducible** | Full implementation details |
## Abstract (NLP Style)
### Structure
- **Task/problem** (1 sentence)
- **Limitation of prior work** (1 sentence)
- **Your approach** (1-2 sentences)
- **Results on benchmarks** (2 sentences)
- **Analysis finding** (optional, 1 sentence)
### Example Abstract
```
Coreference resolution remains challenging for pronouns with distant or
ambiguous antecedents. Prior neural approaches struggle with these
difficult cases due to limited context modeling. We introduce
LongContext-Coref, a retrieval-augmented coreference model that
dynamically retrieves relevant context from document history. On the
OntoNotes 5.0 benchmark, LongContext-Coref achieves 83.4 F1, improving
over the previous state-of-the-art by 1.2 points. On the challenging
WinoBias dataset, we reduce gender bias by 34% while maintaining
accuracy. Qualitative analysis reveals that our model successfully
resolves pronouns requiring world knowledge, a known weakness of
prior approaches.
```
## NLP Paper Structure
```
├── Introduction
│ ├── Task motivation
│ ├── Prior work limitations
│ ├── Your contribution
│ └── Contribution bullets
├── Related Work
├── Method
│ ├── Problem formulation
│ ├── Model architecture
│ └── Training procedure
├── Experiments
│ ├── Datasets (with statistics)
│ ├── Baselines
│ ├── Main results
│ ├── Analysis
│ │ ├── Error analysis
│ │ ├── Ablation study
│ │ └── Qualitative examples
│ └── Human evaluation (if applicable)
├── Discussion / Limitations
└── Conclusion
```
## NLP-Specific Requirements
### Datasets
- Use **standard benchmarks**: GLUE, SQuAD, CoNLL, OntoNotes
- Report **dataset statistics**: train/dev/test sizes
- **Data preprocessing**: Document all steps
### Evaluation Metrics
- **Task-appropriate metrics**: F1, BLEU, ROUGE, accuracy
- **Statistical significance**: Paired bootstrap, p-values
- **Multiple runs**: Report mean ± std across seeds
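The paired bootstrap mentioned above can be sketched in a few lines. This is a minimal Koehn-style resampling test over per-example scores; the function name and inputs are illustrative, not from any particular library:

```python
import random

def paired_bootstrap_p(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Estimate a one-sided p-value that system B's mean per-example
    score exceeds system A's, by resampling the test set with replacement."""
    assert len(scores_a) == len(scores_b)
    rng = random.Random(seed)
    n = len(scores_a)
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    not_better = 0
    for _ in range(n_resamples):
        # Resample the test set with replacement and recompute the mean delta
        sample_mean = sum(diffs[rng.randrange(n)] for _ in range(n)) / n
        if sample_mean <= 0:
            not_better += 1
    return not_better / n_resamples
```

In practice, report the resulting p-value alongside the mean ± std across seeds, and state the number of resamples.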
### Human Evaluation
Increasingly expected for generation tasks:
- **Annotator details**: Number, qualifications, agreement
- **Evaluation protocol**: Guidelines, interface, payment
- **Inter-annotator agreement**: Cohen's κ or Krippendorff's α
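For two annotators with categorical labels, Cohen's κ is simple enough to compute directly. A minimal sketch (for Krippendorff's α or more than two annotators, use an established package):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same items:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    chance = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    if chance == 1:
        return 1.0  # degenerate case: both annotators used a single label
    return (observed - chance) / (1 - chance)
```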
### Example Human Evaluation Table
```
Table 3: Human Evaluation Results (100 samples, 3 annotators)
─────────────────────────────────────────────────────────────
Method | Fluency | Coherence | Factuality | Overall
─────────────────────────────────────────────────────────────
Baseline | 3.8 | 3.2 | 3.5 | 3.5
GPT-3.5 | 4.2 | 4.0 | 3.7 | 4.0
Our Method | 4.4 | 4.3 | 4.1 | 4.3
─────────────────────────────────────────────────────────────
Inter-annotator κ = 0.72. Scale: 1-5 (higher is better).
```
## ACL-Specific Notes
- **ARR (ACL Rolling Review)**: Shared review system across ACL venues
- **Responsible NLP checklist**: Ethics, limitations, risks
- **Long (8 pages) vs. Short (4 pages)**: Different expectations
- **Findings papers**: Companion acceptance track (e.g., Findings of ACL) for technically sound papers below the main-conference bar
---
# Part 2: HCI Conferences (CHI, CSCW, UIST)
## HCI Writing Philosophy
> "Technology in service of humans—understand users first, then design and evaluate."
HCI papers are fundamentally **user-centered**. Technology novelty alone is insufficient; understanding human needs and demonstrating user benefit is essential.
## Audience and Tone
### Target Reader
- HCI researchers and practitioners
- UX designers and product developers
- Interdisciplinary (CS, psychology, design, social science)
### Tone Characteristics
| Characteristic | Description |
|---------------|-------------|
| **User-centered** | Focus on people, not technology |
| **Design-informed** | Grounded in design thinking |
| **Empirical** | User studies provide evidence |
| **Reflective** | Consider broader implications |
## HCI Abstract
### Focus on Users and Impact
```
Video calling has become essential for remote collaboration, yet
current interfaces poorly support the peripheral awareness that makes
in-person work effective. Through formative interviews with 24 remote
workers, we identified three key challenges: difficulty gauging
colleague availability, lack of ambient presence cues, and interruption
anxiety. We designed AmbientOffice, a peripheral display system that
conveys teammate presence through subtle ambient visualizations. In a
two-week deployment study with 18 participants across three distributed
teams, AmbientOffice increased spontaneous collaboration by 40% and
reduced perceived isolation (p<0.01). Participants valued the system's
non-intrusive nature and reported feeling more connected to remote
colleagues. We discuss implications for designing ambient awareness
systems and the tension between visibility and privacy in remote work.
```
## HCI Paper Structure
### Research Through Design / Systems Papers
```
├── Introduction
│ ├── Problem in human terms
│ ├── Why technology can help
│ └── Contribution summary
├── Related Work
│ ├── Domain background
│ ├── Prior systems
│ └── Theoretical frameworks
├── Formative Work (often)
│ ├── Interviews / observations
│ └── Design requirements
├── System Design
│ ├── Design rationale
│ ├── Implementation
│ └── Interface walkthrough
├── Evaluation
│ ├── Study design
│ ├── Participants
│ ├── Procedure
│ ├── Findings (quant + qual)
│ └── Limitations
├── Discussion
│ ├── Design implications
│ ├── Generalizability
│ └── Future work
└── Conclusion
```
### Qualitative / Interview Studies
```
├── Introduction
├── Related Work
├── Methods
│ ├── Participants
│ ├── Procedure
│ ├── Data collection
│ └── Analysis method (thematic, grounded theory, etc.)
├── Findings
│ ├── Theme 1 (with quotes)
│ ├── Theme 2 (with quotes)
│ └── Theme 3 (with quotes)
├── Discussion
│ ├── Implications for design
│ ├── Implications for research
│ └── Limitations
└── Conclusion
```
## HCI-Specific Requirements
### Participant Reporting
- **Demographics**: Age, gender, relevant experience
- **Recruitment**: How and where recruited
- **Compensation**: Payment amount and type
- **IRB approval**: Ethics board statement
### Quotes in Findings
Use direct quotes to ground findings:
```
Participants valued the ambient nature of the display. As P7 described:
"It's like having a window to my teammate's office. I don't need to
actively check it, but I know they're there." This passive awareness
reduced the barrier to initiating contact.
```
### Design Implications Section
Translate findings into actionable guidance:
```
**Implication 1: Support peripheral awareness without demanding attention.**
Ambient displays should be visible in peripheral vision but not require
active monitoring. Designers should consider calm technology principles.
**Implication 2: Balance visibility with privacy.**
Users want to share presence but fear surveillance. Systems should
provide granular controls and make visibility mutual.
```
## CHI-Specific Notes
- **Contribution types**: Empirical, artifact, methodological, theoretical
- **ACM format**: `acmart` document class with `sigchi` option
- **Accessibility**: Alt text, inclusive language expected
- **Contribution statement**: Required per-author contributions
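A minimal preamble for the ACM format might look like this; the title and metadata are placeholders, and the exact class options should be confirmed against the current CHI call for papers, since they change between cycles:

```latex
\documentclass[sigchi]{acmart}  % class option varies by cycle; check the CFP

\title{Paper Title Here}
\author{Author Name}
\affiliation{\institution{Institution}\country{Country}}

\begin{document}
\maketitle
% body ...
\end{document}
```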
---
# Part 3: Data Mining & IR (SIGKDD, WWW, SIGIR)
## Data Mining Writing Philosophy
> "Scalable methods for real-world data with demonstrated practical impact."
Data mining papers emphasize **scalability**, **real-world applicability**, and **solid experimental methodology**.
## Audience and Tone
### Target Reader
- Data scientists and ML engineers
- Industry researchers
- Applied ML practitioners
### Tone Characteristics
| Characteristic | Description |
|---------------|-------------|
| **Scalable** | Handle large datasets |
| **Practical** | Real-world applications |
| **Reproducible** | Datasets and code shared |
| **Industrial** | Industry datasets valued |
## KDD Abstract
### Emphasize Scale and Application
```
Fraud detection in e-commerce requires processing millions of
transactions in real-time while adapting to evolving attack patterns.
We present FraudShield, a graph neural network framework for real-time
fraud detection that scales to billion-edge transaction graphs. Unlike
prior methods that require full graph access, FraudShield uses
incremental updates with O(1) inference cost per transaction. On a
proprietary dataset of 2.3 billion transactions from a major e-commerce
platform, FraudShield achieves 94.2% precision at 80% recall,
outperforming production baselines by 12%. The system has been deployed
at [Company], processing 50K transactions per second and preventing
an estimated $400M in annual fraud losses. We release an anonymized
benchmark dataset and code.
```
## KDD Paper Structure
```
├── Introduction
│ ├── Problem and impact
│ ├── Technical challenges
│ ├── Your approach
│ └── Contributions
├── Related Work
├── Preliminaries
│ ├── Problem definition
│ └── Notation
├── Method
│ ├── Overview
│ ├── Technical components
│ └── Complexity analysis
├── Experiments
│ ├── Datasets (with scale statistics)
│ ├── Baselines
│ ├── Main results
│ ├── Scalability experiments
│ ├── Ablation study
│ └── Case study / deployment
└── Conclusion
```
## KDD-Specific Requirements
### Scalability
- **Dataset sizes**: Report number of nodes, edges, samples
- **Runtime analysis**: Wall-clock time comparisons
- **Complexity**: Time and space complexity stated
- **Scaling experiments**: Show performance vs. data size
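A scaling experiment of the kind listed above reduces to timing the method at increasing input sizes. A minimal wall-clock harness (names are illustrative):

```python
import time

def measure_scaling(fn, make_input, sizes, repeats=3):
    """Best-of-`repeats` wall-clock runtime of `fn` at each input size.

    Returns (size, seconds) rows suitable for a runtime-vs-scale table."""
    rows = []
    for n in sizes:
        data = make_input(n)  # build the input outside the timed region
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            fn(data)
            best = min(best, time.perf_counter() - t0)
        rows.append((n, best))
    return rows
```

For example, `measure_scaling(sorted, lambda n: list(range(n, 0, -1)), [1_000, 10_000, 100_000])` yields the rows for a runtime-vs-size table like Table 4.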
### Industrial Deployment
- **Case studies**: Real-world deployment stories
- **A/B tests**: Online evaluation results (if applicable)
- **Production metrics**: Business impact (if shareable)
### Example Scalability Table
```
Table 4: Scalability Comparison (runtime in seconds)
──────────────────────────────────────────────────────
Dataset | Nodes | Edges | GCN | GraphSAGE | Ours
──────────────────────────────────────────────────────
Cora | 2.7K | 5.4K | 0.3 | 0.2 | 0.1
Citeseer | 3.3K | 4.7K | 0.4 | 0.3 | 0.1
PubMed | 19.7K | 44.3K | 1.2 | 0.8 | 0.3
ogbn-arxiv | 169K | 1.17M | 8.4 | 4.2 | 1.6
ogbn-papers | 111M | 1.6B | OOM | OOM | 42.3
──────────────────────────────────────────────────────
```
---
# Part 4: Common Elements Across CS Venues
## Writing Quality
### Clarity
- **One idea per sentence**
- **Define terms before use**
- **Use consistent notation**
### Precision
- **Exact numbers**: "23.4%" not "about 20%"
- **Clear claims**: Avoid hedging unless necessary
- **Specific comparisons**: Name the baseline
## Contribution Bullets
Used across all CS venues:
```
Our contributions are:
• We identify [problem/insight]
• We propose [method name] that [key innovation]
• We demonstrate [results] on [benchmarks]
• We release [code/data] at [URL]
```
## Reproducibility Standards
All CS venues increasingly expect:
- **Code availability**: GitHub link (anonymous for review)
- **Data availability**: Public datasets or release plans
- **Full hyperparameters**: Training details complete
- **Random seeds**: Exact values for reproduction
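Reporting exact seeds only helps if every randomness source is actually seeded. A minimal helper (the framework calls in comments are assumptions; add whichever stacks you use):

```python
import os
import random

def set_seed(seed: int) -> None:
    """Seed Python's RNG and the hash seed; extend per framework."""
    random.seed(seed)
    # Note: PYTHONHASHSEED must be set before interpreter startup
    # to affect hash randomization.
    os.environ["PYTHONHASHSEED"] = str(seed)
    # Framework seeding (assumes numpy / PyTorch are installed):
    # np.random.seed(seed)
    # torch.manual_seed(seed); torch.cuda.manual_seed_all(seed)
```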
## Ethics and Broader Impact
### NLP (ACL/EMNLP)
- **Limitations section**: Required
- **Responsible NLP checklist**: Ethical considerations
- **Bias analysis**: For models affecting people
### HCI (CHI)
- **IRB/Ethics approval**: Required for human subjects
- **Informed consent**: Procedure described
- **Privacy considerations**: Data handling
### KDD/WWW
- **Societal impact**: Consider misuse potential
- **Privacy preservation**: For sensitive data
- **Fairness analysis**: When applicable
---
## Venue Comparison Table
| Aspect | ACL/EMNLP | CHI | KDD/WWW | SIGIR |
|--------|-----------|-----|---------|-------|
| **Focus** | NLP tasks | User studies | Scalable ML | IR/search |
| **Evaluation** | Benchmarks + human | User studies | Large-scale exp | Datasets |
| **Theory weight** | Moderate | Low | Moderate | Moderate |
| **Industry value** | High | Medium | Very high | High |
| **Page limit** | 8 long / 4 short | 10 + refs | 9 + refs | 10 + refs |
| **Review style** | ARR | Direct | Direct | Direct |
---
## Pre-Submission Checklist
### All CS Venues
- [ ] Clear contribution statement
- [ ] Strong baselines
- [ ] Reproducibility information complete
- [ ] Correct venue template
- [ ] Anonymized (if double-blind)
### NLP-Specific
- [ ] Standard benchmark results
- [ ] Error analysis included
- [ ] Human evaluation (for generation)
- [ ] Responsible NLP checklist
### HCI-Specific
- [ ] IRB approval stated
- [ ] Participant demographics
- [ ] Direct quotes in findings
- [ ] Design implications
### Data Mining-Specific
- [ ] Scalability experiments
- [ ] Dataset size statistics
- [ ] Runtime comparisons
- [ ] Complexity analysis
---
## See Also
- `venue_writing_styles.md` - Master style overview
- `ml_conference_style.md` - NeurIPS/ICML style guide
- `conferences_formatting.md` - Technical formatting requirements
- `reviewer_expectations.md` - What CS reviewers seek