# CS Conference Writing Style Guide
Comprehensive writing guide for ACL, EMNLP, and NAACL (NLP); CHI and CSCW (HCI); SIGKDD, WWW, and SIGIR (data mining/IR); and other major CS conferences.
Last Updated: 2024
## Overview
CS conferences span diverse subfields with distinct writing cultures. This guide covers NLP, HCI, and data mining/IR venues, each with unique expectations and evaluation criteria.
## Part 1: NLP Conferences (ACL, EMNLP, NAACL)
### NLP Writing Philosophy
"Strong empirical results on standard benchmarks with insightful analysis."
NLP papers balance empirical rigor with linguistic insight. Human evaluation is increasingly important alongside automatic metrics.
### Audience and Tone
#### Target Reader
- NLP researchers and computational linguists
- Familiar with transformer architectures, standard benchmarks
- Expect reproducible results and error analysis
#### Tone Characteristics
| Characteristic | Description |
|---|---|
| Task-focused | Clear problem definition |
| Benchmark-oriented | Standard datasets emphasized |
| Analysis-rich | Error analysis, qualitative examples |
| Reproducible | Full implementation details |
### Abstract (NLP Style)
#### Structure
- Task/problem (1 sentence)
- Limitation of prior work (1 sentence)
- Your approach (1-2 sentences)
- Results on benchmarks (2 sentences)
- Analysis finding (optional, 1 sentence)
#### Example Abstract
Coreference resolution remains challenging for pronouns with distant or
ambiguous antecedents. Prior neural approaches struggle with these
difficult cases due to limited context modeling. We introduce
LongContext-Coref, a retrieval-augmented coreference model that
dynamically retrieves relevant context from document history. On the
OntoNotes 5.0 benchmark, LongContext-Coref achieves 83.4 F1, improving
over the previous state-of-the-art by 1.2 points. On the challenging
WinoBias dataset, we reduce gender bias by 34% while maintaining
accuracy. Qualitative analysis reveals that our model successfully
resolves pronouns requiring world knowledge, a known weakness of
prior approaches.
### NLP Paper Structure
```
├── Introduction
│ ├── Task motivation
│ ├── Prior work limitations
│ ├── Your contribution
│ └── Contribution bullets
├── Related Work
├── Method
│ ├── Problem formulation
│ ├── Model architecture
│ └── Training procedure
├── Experiments
│ ├── Datasets (with statistics)
│ ├── Baselines
│ ├── Main results
│ ├── Analysis
│ │ ├── Error analysis
│ │ ├── Ablation study
│ │ └── Qualitative examples
│ └── Human evaluation (if applicable)
├── Discussion / Limitations
└── Conclusion
```
### NLP-Specific Requirements
#### Datasets
- Use standard benchmarks: GLUE, SQuAD, CoNLL, OntoNotes
- Report dataset statistics: train/dev/test sizes
- Data preprocessing: Document all steps
#### Evaluation Metrics
- Task-appropriate metrics: F1, BLEU, ROUGE, accuracy
- Statistical significance: Paired bootstrap tests with reported p-values (see the sketch below)
- Multiple runs: Report mean ± std across seeds
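For the significance test, a minimal paired-bootstrap sketch in Python (the per-example scores below are toy values, not from any real system):

```python
import numpy as np

def paired_bootstrap(scores_a, scores_b, n_resamples=10_000, seed=0):
    """One-sided paired bootstrap: p-value for 'system B beats system A'."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(scores_a), np.asarray(scores_b)  # paired per-example scores
    n = len(a)
    wins = 0
    for _ in range(n_resamples):
        idx = rng.integers(0, n, size=n)  # resample test examples with replacement
        if b[idx].mean() > a[idx].mean():
            wins += 1
    return 1.0 - wins / n_resamples  # fraction of resamples where B fails to beat A

# Toy per-example scores (e.g., per-document F1)
baseline = [0.71, 0.80, 0.65, 0.90, 0.74, 0.68, 0.83]
ours = [0.78, 0.82, 0.70, 0.91, 0.80, 0.75, 0.85]
print(f"p = {paired_bootstrap(baseline, ours):.4f}")
```

Averaging per-example scores works for decomposable metrics like accuracy or per-example F1; for corpus-level metrics such as BLEU, recompute the metric on each resampled test set instead.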
#### Human Evaluation
Increasingly expected for generation tasks:
- Annotator details: Number, qualifications, agreement
- Evaluation protocol: Guidelines, interface, payment
- Inter-annotator agreement: Cohen's κ or Krippendorff's α (see the sketch below)
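For two annotators with categorical ratings, Cohen's κ can be computed directly; a minimal sketch with toy labels (for more than two annotators, or for ordinal/missing ratings, Krippendorff's α via a library such as `krippendorff` is the usual choice):

```python
from collections import Counter

def cohens_kappa(labels_1, labels_2):
    """Cohen's kappa for two annotators rating the same items."""
    assert len(labels_1) == len(labels_2)
    n = len(labels_1)
    observed = sum(a == b for a, b in zip(labels_1, labels_2)) / n
    # Expected agreement if annotators were independent with these marginals
    c1, c2 = Counter(labels_1), Counter(labels_2)
    expected = sum(c1[k] * c2[k] for k in c1) / (n * n)
    return (observed - expected) / (1 - expected)

# Toy data: two annotators rating fluency as low/mid/high
a1 = ["high", "high", "mid", "low", "high", "mid"]
a2 = ["high", "mid", "mid", "low", "high", "high"]
print(f"kappa = {cohens_kappa(a1, a2):.2f}")
```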
#### Example Human Evaluation Table
```
Table 3: Human Evaluation Results (100 samples, 3 annotators)
─────────────────────────────────────────────────────────────
Method | Fluency | Coherence | Factuality | Overall
─────────────────────────────────────────────────────────────
Baseline | 3.8 | 3.2 | 3.5 | 3.5
GPT-3.5 | 4.2 | 4.0 | 3.7 | 4.0
Our Method | 4.4 | 4.3 | 4.1 | 4.3
─────────────────────────────────────────────────────────────
Inter-annotator κ = 0.72. Scale: 1-5 (higher is better).
```
### ACL-Specific Notes
- ARR (ACL Rolling Review): Shared review system across ACL venues
- Responsible NLP checklist: Ethics, limitations, risks
- Long (8 pages) vs. Short (4 pages): Different expectations
- Findings papers: Secondary acceptance track (e.g., Findings of ACL/EMNLP) for solid work below the main-conference bar
## Part 2: HCI Conferences (CHI, CSCW, UIST)
### HCI Writing Philosophy
"Technology in service of humans: understand users first, then design and evaluate."
HCI papers are fundamentally user-centered. Technology novelty alone is insufficient; understanding human needs and demonstrating user benefit is essential.
### Audience and Tone
#### Target Reader
- HCI researchers and practitioners
- UX designers and product developers
- Interdisciplinary (CS, psychology, design, social science)
#### Tone Characteristics
| Characteristic | Description |
|---|---|
| User-centered | Focus on people, not technology |
| Design-informed | Grounded in design thinking |
| Empirical | User studies provide evidence |
| Reflective | Consider broader implications |
### HCI Abstract
#### Focus on Users and Impact
Video calling has become essential for remote collaboration, yet
current interfaces poorly support the peripheral awareness that makes
in-person work effective. Through formative interviews with 24 remote
workers, we identified three key challenges: difficulty gauging
colleague availability, lack of ambient presence cues, and interruption
anxiety. We designed AmbientOffice, a peripheral display system that
conveys teammate presence through subtle ambient visualizations. In a
two-week deployment study with 18 participants across three distributed
teams, AmbientOffice increased spontaneous collaboration by 40% and
reduced perceived isolation (p<0.01). Participants valued the system's
non-intrusive nature and reported feeling more connected to remote
colleagues. We discuss implications for designing ambient awareness
systems and the tension between visibility and privacy in remote work.
### HCI Paper Structure
#### Research Through Design / Systems Papers
```
├── Introduction
│ ├── Problem in human terms
│ ├── Why technology can help
│ └── Contribution summary
├── Related Work
│ ├── Domain background
│ ├── Prior systems
│ └── Theoretical frameworks
├── Formative Work (often)
│ ├── Interviews / observations
│ └── Design requirements
├── System Design
│ ├── Design rationale
│ ├── Implementation
│ └── Interface walkthrough
├── Evaluation
│ ├── Study design
│ ├── Participants
│ ├── Procedure
│ ├── Findings (quant + qual)
│ └── Limitations
├── Discussion
│ ├── Design implications
│ ├── Generalizability
│ └── Future work
└── Conclusion
```
#### Qualitative / Interview Studies
```
├── Introduction
├── Related Work
├── Methods
│ ├── Participants
│ ├── Procedure
│ ├── Data collection
│ └── Analysis method (thematic, grounded theory, etc.)
├── Findings
│ ├── Theme 1 (with quotes)
│ ├── Theme 2 (with quotes)
│ └── Theme 3 (with quotes)
├── Discussion
│ ├── Implications for design
│ ├── Implications for research
│ └── Limitations
└── Conclusion
```
### HCI-Specific Requirements
#### Participant Reporting
- Demographics: Age, gender, relevant experience
- Recruitment: How and where recruited
- Compensation: Payment amount and type
- IRB approval: Ethics board statement
#### Quotes in Findings
Use direct quotes to ground findings:
Participants valued the ambient nature of the display. As P7 described:
"It's like having a window to my teammate's office. I don't need to
actively check it, but I know they're there." This passive awareness
reduced the barrier to initiating contact.
#### Design Implications Section
Translate findings into actionable guidance:
**Implication 1: Support peripheral awareness without demanding attention.**
Ambient displays should be visible in peripheral vision but not require
active monitoring. Designers should consider calm technology principles.
**Implication 2: Balance visibility with privacy.**
Users want to share presence but fear surveillance. Systems should
provide granular controls and make visibility mutual.
### CHI-Specific Notes
- Contribution types: Empirical, artifact, methodological, theoretical
- ACM format: `acmart` document class with `sigchi` option
- Accessibility: Alt text, inclusive language expected
- Contribution statement: Required per-author contributions
## Part 3: Data Mining & IR (SIGKDD, WWW, SIGIR)
### Data Mining Writing Philosophy
"Scalable methods for real-world data with demonstrated practical impact."
Data mining papers emphasize scalability, real-world applicability, and solid experimental methodology.
### Audience and Tone
#### Target Reader
- Data scientists and ML engineers
- Industry researchers
- Applied ML practitioners
#### Tone Characteristics
| Characteristic | Description |
|---|---|
| Scalable | Handle large datasets |
| Practical | Real-world applications |
| Reproducible | Datasets and code shared |
| Industrial | Industry datasets valued |
### KDD Abstract
#### Emphasize Scale and Application
Fraud detection in e-commerce requires processing millions of
transactions in real-time while adapting to evolving attack patterns.
We present FraudShield, a graph neural network framework for real-time
fraud detection that scales to billion-edge transaction graphs. Unlike
prior methods that require full graph access, FraudShield uses
incremental updates with O(1) inference cost per transaction. On a
proprietary dataset of 2.3 billion transactions from a major e-commerce
platform, FraudShield achieves 94.2% precision at 80% recall,
outperforming production baselines by 12%. The system has been deployed
at [Company], processing 50K transactions per second and preventing
an estimated $400M in annual fraud losses. We release an anonymized
benchmark dataset and code.
### KDD Paper Structure
```
├── Introduction
│ ├── Problem and impact
│ ├── Technical challenges
│ ├── Your approach
│ └── Contributions
├── Related Work
├── Preliminaries
│ ├── Problem definition
│ └── Notation
├── Method
│ ├── Overview
│ ├── Technical components
│ └── Complexity analysis
├── Experiments
│ ├── Datasets (with scale statistics)
│ ├── Baselines
│ ├── Main results
│ ├── Scalability experiments
│ ├── Ablation study
│ └── Case study / deployment
└── Conclusion
```
### KDD-Specific Requirements
#### Scalability
- Dataset sizes: Report number of nodes, edges, samples
- Runtime analysis: Wall-clock time comparisons
- Complexity: Time and space complexity stated
- Scaling experiments: Show performance vs. data size (see the sketch below)
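One simple way to produce the performance-vs-size curve is a timing harness run at doubling dataset sizes. A sketch, where `fit` and `make_data` are hypothetical stand-ins for your method and data pipeline:

```python
import time
import numpy as np

def measure_runtime(fit, make_data, sizes, repeats=3):
    """Median wall-clock runtime of `fit` at each dataset size."""
    results = []
    for n in sizes:
        data = make_data(n)
        runs = []
        for _ in range(repeats):
            start = time.perf_counter()
            fit(data)
            runs.append(time.perf_counter() - start)
        results.append((n, float(np.median(runs))))  # median damps noisy outliers
    return results

# Hypothetical stand-ins: replace with your actual method and data loader
def make_data(n):
    return np.random.default_rng(0).random((n, 64))

def fit(X):
    X @ X.T  # placeholder computation

for n, secs in measure_runtime(fit, make_data, sizes=[1_000, 2_000, 4_000, 8_000]):
    print(f"n={n:>6,}  runtime={secs:.3f}s")
```

Report these measurements alongside the complexity analysis so reviewers can check that empirical scaling matches the stated bounds.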
#### Industrial Deployment
- Case studies: Real-world deployment stories
- A/B tests: Online evaluation results (if applicable)
- Production metrics: Business impact (if shareable)
#### Example Scalability Table
```
Table 4: Scalability Comparison (runtime in seconds)
──────────────────────────────────────────────────────
Dataset | Nodes | Edges | GCN | GraphSAGE | Ours
──────────────────────────────────────────────────────
Cora | 2.7K | 5.4K | 0.3 | 0.2 | 0.1
Citeseer | 3.3K | 4.7K | 0.4 | 0.3 | 0.1
PubMed | 19.7K | 44.3K | 1.2 | 0.8 | 0.3
ogbn-arxiv | 169K | 1.17M | 8.4 | 4.2 | 1.6
ogbn-papers | 111M | 1.6B | OOM | OOM | 42.3
──────────────────────────────────────────────────────
```
## Part 4: Common Elements Across CS Venues
### Writing Quality
#### Clarity
- One idea per sentence
- Define terms before use
- Use consistent notation
#### Precision
- Exact numbers: "23.4%" not "about 20%"
- Clear claims: Avoid hedging unless necessary
- Specific comparisons: Name the baseline
### Contribution Bullets
Used across all CS venues:
```
Our contributions are:
• We identify [problem/insight]
• We propose [method name] that [key innovation]
• We demonstrate [results] on [benchmarks]
• We release [code/data] at [URL]
```
### Reproducibility Standards
All CS venues increasingly expect:
- Code availability: GitHub link (anonymous for review)
- Data availability: Public datasets or release plans
- Full hyperparameters: Training details complete
- Random seeds: Exact values for reproduction (see the sketch below)
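As a concrete example, a common seed-fixing helper for a PyTorch-based setup (an illustrative convention, not a venue requirement; adapt to your framework and report the exact seeds used):

```python
import os
import random

import numpy as np
import torch  # assumes a PyTorch project; swap in your framework's equivalents

def set_seed(seed: int = 42):
    """Fix the common sources of randomness for a reproducible run."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    # Trade speed for determinism in cuDNN kernels
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(42)
```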
### Ethics and Broader Impact
#### NLP (ACL/EMNLP)
- Limitations section: Required
- Responsible NLP checklist: Ethical considerations
- Bias analysis: For models affecting people
#### HCI (CHI)
- IRB/Ethics approval: Required for human subjects
- Informed consent: Procedure described
- Privacy considerations: Data handling
#### KDD/WWW
- Societal impact: Consider misuse potential
- Privacy preservation: For sensitive data
- Fairness analysis: When applicable
### Venue Comparison Table
| Aspect | ACL/EMNLP | CHI | KDD/WWW | SIGIR |
|---|---|---|---|---|
| Focus | NLP tasks | User studies | Scalable ML | IR/search |
| Evaluation | Benchmarks + human | User studies | Large-scale exp | Datasets |
| Theory weight | Moderate | Low | Moderate | Moderate |
| Industry value | High | Medium | Very high | High |
| Page limit | 8 long / 4 short | 10 + refs | 9 + refs | 10 + refs |
| Review style | ARR | Direct | Direct | Direct |
## Pre-Submission Checklist
### All CS Venues
- Clear contribution statement
- Strong baselines
- Reproducibility information complete
- Correct venue template
- Anonymized (if double-blind)
### NLP-Specific
- Standard benchmark results
- Error analysis included
- Human evaluation (for generation)
- Responsible NLP checklist
### HCI-Specific
- IRB approval stated
- Participant demographics
- Direct quotes in findings
- Design implications
### Data Mining-Specific
- Scalability experiments
- Dataset size statistics
- Runtime comparisons
- Complexity analysis
## See Also
- `venue_writing_styles.md` - Master style overview
- `ml_conference_style.md` - NeurIPS/ICML style guide
- `conferences_formatting.md` - Technical formatting requirements
- `reviewer_expectations.md` - What CS reviewers seek