# CS Conference Writing Style Guide

Comprehensive writing guide for ACL, EMNLP, NAACL (NLP), CHI, CSCW (HCI), SIGKDD, WWW, SIGIR (data mining/IR), and other major CS conferences.

**Last Updated**: 2024

---

## Overview

CS conferences span diverse subfields with distinct writing cultures. This guide covers NLP, HCI, and data mining/IR venues, each with unique expectations and evaluation criteria.

---

# Part 1: NLP Conferences (ACL, EMNLP, NAACL)

## NLP Writing Philosophy

> "Strong empirical results on standard benchmarks with insightful analysis."

NLP papers balance empirical rigor with linguistic insight. Human evaluation is increasingly important alongside automatic metrics.

## Audience and Tone

### Target Reader
- NLP researchers and computational linguists
- Familiar with transformer architectures, standard benchmarks
- Expect reproducible results and error analysis

### Tone Characteristics
| Characteristic | Description |
|---------------|-------------|
| **Task-focused** | Clear problem definition |
| **Benchmark-oriented** | Standard datasets emphasized |
| **Analysis-rich** | Error analysis, qualitative examples |
| **Reproducible** | Full implementation details |

## Abstract (NLP Style)

### Structure
- **Task/problem** (1 sentence)
- **Limitation of prior work** (1 sentence)
- **Your approach** (1-2 sentences)
- **Results on benchmarks** (2 sentences)
- **Analysis finding** (optional, 1 sentence)

### Example Abstract

```
Coreference resolution remains challenging for pronouns with distant or
ambiguous antecedents. Prior neural approaches struggle with these
difficult cases due to limited context modeling. We introduce
LongContext-Coref, a retrieval-augmented coreference model that
dynamically retrieves relevant context from document history. On the
OntoNotes 5.0 benchmark, LongContext-Coref achieves 83.4 F1, improving
over the previous state-of-the-art by 1.2 points. On the challenging
WinoBias dataset, we reduce gender bias by 34% while maintaining
accuracy. Qualitative analysis reveals that our model successfully
resolves pronouns requiring world knowledge, a known weakness of
prior approaches.
```

## NLP Paper Structure

```
├── Introduction
│   ├── Task motivation
│   ├── Prior work limitations
│   ├── Your contribution
│   └── Contribution bullets
├── Related Work
├── Method
│   ├── Problem formulation
│   ├── Model architecture
│   └── Training procedure
├── Experiments
│   ├── Datasets (with statistics)
│   ├── Baselines
│   ├── Main results
│   ├── Analysis
│   │   ├── Error analysis
│   │   ├── Ablation study
│   │   └── Qualitative examples
│   └── Human evaluation (if applicable)
├── Discussion / Limitations
└── Conclusion
```

## NLP-Specific Requirements

### Datasets
- Use **standard benchmarks**: GLUE, SQuAD, CoNLL, OntoNotes
- Report **dataset statistics**: train/dev/test sizes
- **Data preprocessing**: Document all steps

### Evaluation Metrics
- **Task-appropriate metrics**: F1, BLEU, ROUGE, accuracy
- **Statistical significance**: Paired bootstrap, p-values
- **Multiple runs**: Report mean ± std across seeds

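
The significance and multiple-run items above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a reference implementation: the function names `paired_bootstrap` and `mean_std` are hypothetical, and the bootstrap assumes a metric that averages over per-example scores (such as accuracy). Corpus-level metrics like BLEU or span-level F1 would need to be recomputed on each resample instead.

```python
import random
import statistics


def paired_bootstrap(scores_a, scores_b, n_resamples=10_000, seed=0):
    """One-sided paired bootstrap estimate for "system A beats system B".

    scores_a and scores_b hold per-example scores on the SAME test examples.
    Returns the fraction of resamples in which A's total score is not higher
    than B's, which serves as an approximate p-value.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired test needs scores on the same examples")
    rng = random.Random(seed)
    n = len(scores_a)
    not_better = 0
    for _ in range(n_resamples):
        # Resample test examples with replacement, keeping the pairing intact.
        idx = [rng.randrange(n) for _ in range(n)]
        delta = sum(scores_a[i] - scores_b[i] for i in idx)
        if delta <= 0:
            not_better += 1
    return not_better / n_resamples


def mean_std(values):
    """Mean and sample standard deviation, e.g. over runs with different seeds."""
    return statistics.mean(values), statistics.stdev(values)
```

A typical use would be `mean_std(dev_f1_per_seed)` for the "mean ± std" column and `paired_bootstrap(ours_correct, baseline_correct)` for the significance marker, where both argument names are placeholders for your own result lists.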

### Human Evaluation
Increasingly expected for generation tasks:
- **Annotator details**: Number, qualifications, agreement
- **Evaluation protocol**: Guidelines, interface, payment
- **Inter-annotator agreement**: Cohen's κ or Krippendorff's α

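
For the agreement statistic, a small sketch of Cohen's κ for two annotators may help clarify what is being reported. The function name `cohens_kappa` is an illustrative assumption. The sketch covers only the two-annotator, categorical-label case; with three or more annotators, a multi-rater statistic such as Krippendorff's α is the usual choice.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each annotator's label
    frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label
    # if each labelled independently with their own label distribution.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

Reporting the raw agreement rate alongside κ is often useful, since κ can look low on heavily skewed label distributions even when annotators rarely disagree.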

### Example Human Evaluation Table

```
Table 3: Human Evaluation Results (100 samples, 3 annotators)
─────────────────────────────────────────────────────────────
Method       | Fluency | Coherence | Factuality | Overall
─────────────────────────────────────────────────────────────
Baseline     | 3.8     | 3.2       | 3.5        | 3.5
GPT-3.5      | 4.2     | 4.0       | 3.7        | 4.0
Our Method   | 4.4     | 4.3       | 4.1        | 4.3
─────────────────────────────────────────────────────────────
Inter-annotator agreement (Fleiss' κ) = 0.72. Scale: 1-5 (higher is better).
```

## ACL-Specific Notes

- **ARR (ACL Rolling Review)**: Shared review system across ACL venues
- **Responsible NLP checklist**: Ethics, limitations, risks
- **Long (8 pages) vs. Short (4 pages)**: Different expectations
- **Findings papers**: Companion publication for solid work not selected for the main conference

---

# Part 2: HCI Conferences (CHI, CSCW, UIST)

## HCI Writing Philosophy

> "Technology in service of humans: understand users first, then design and evaluate."

HCI papers are fundamentally **user-centered**. Technology novelty alone is insufficient; understanding human needs and demonstrating user benefit is essential.

## Audience and Tone

### Target Reader
- HCI researchers and practitioners
- UX designers and product developers
- Interdisciplinary (CS, psychology, design, social science)

### Tone Characteristics
| Characteristic | Description |
|---------------|-------------|
| **User-centered** | Focus on people, not technology |
| **Design-informed** | Grounded in design thinking |
| **Empirical** | User studies provide evidence |
| **Reflective** | Consider broader implications |

## HCI Abstract

### Focus on Users and Impact

```
Video calling has become essential for remote collaboration, yet
current interfaces poorly support the peripheral awareness that makes
in-person work effective. Through formative interviews with 24 remote
workers, we identified three key challenges: difficulty gauging
colleague availability, lack of ambient presence cues, and interruption
anxiety. We designed AmbientOffice, a peripheral display system that
conveys teammate presence through subtle ambient visualizations. In a
two-week deployment study with 18 participants across three distributed
teams, AmbientOffice increased spontaneous collaboration by 40% and
reduced perceived isolation (p<0.01). Participants valued the system's
non-intrusive nature and reported feeling more connected to remote
colleagues. We discuss implications for designing ambient awareness
systems and the tension between visibility and privacy in remote work.
```

## HCI Paper Structure

### Research Through Design / Systems Papers

```
├── Introduction
│   ├── Problem in human terms
│   ├── Why technology can help
│   └── Contribution summary
├── Related Work
│   ├── Domain background
│   ├── Prior systems
│   └── Theoretical frameworks
├── Formative Work (often)
│   ├── Interviews / observations
│   └── Design requirements
├── System Design
│   ├── Design rationale
│   ├── Implementation
│   └── Interface walkthrough
├── Evaluation
│   ├── Study design
│   ├── Participants
│   ├── Procedure
│   ├── Findings (quant + qual)
│   └── Limitations
├── Discussion
│   ├── Design implications
│   ├── Generalizability
│   └── Future work
└── Conclusion
```

### Qualitative / Interview Studies

```
├── Introduction
├── Related Work
├── Methods
│   ├── Participants
│   ├── Procedure
│   ├── Data collection
│   └── Analysis method (thematic, grounded theory, etc.)
├── Findings
│   ├── Theme 1 (with quotes)
│   ├── Theme 2 (with quotes)
│   └── Theme 3 (with quotes)
├── Discussion
│   ├── Implications for design
│   ├── Implications for research
│   └── Limitations
└── Conclusion
```

## HCI-Specific Requirements

### Participant Reporting
- **Demographics**: Age, gender, relevant experience
- **Recruitment**: How and where recruited
- **Compensation**: Payment amount and type
- **IRB approval**: Ethics board statement

### Quotes in Findings
Use direct quotes to ground findings:
```
Participants valued the ambient nature of the display. As P7 described:
"It's like having a window to my teammate's office. I don't need to
actively check it, but I know they're there." This passive awareness
reduced the barrier to initiating contact.
```

### Design Implications Section
Translate findings into actionable guidance:
```
**Implication 1: Support peripheral awareness without demanding attention.**
Ambient displays should be visible in peripheral vision but not require
active monitoring. Designers should consider calm technology principles.

**Implication 2: Balance visibility with privacy.**
Users want to share presence but fear surveillance. Systems should
provide granular controls and make visibility mutual.
```

## CHI-Specific Notes

- **Contribution types**: Empirical, artifact, methodological, theoretical
- **ACM format**: `acmart` document class; recent CHI calls ask for the single-column `manuscript` option at submission rather than the older `sigchi` option, so check the current call
- **Accessibility**: Alt text, inclusive language expected
- **Contribution statement**: Required per-author contributions

---

# Part 3: Data Mining & IR (SIGKDD, WWW, SIGIR)

## Data Mining Writing Philosophy

> "Scalable methods for real-world data with demonstrated practical impact."

Data mining papers emphasize **scalability**, **real-world applicability**, and **solid experimental methodology**.

## Audience and Tone

### Target Reader
- Data scientists and ML engineers
- Industry researchers
- Applied ML practitioners

### Tone Characteristics
| Characteristic | Description |
|---------------|-------------|
| **Scalable** | Handle large datasets |
| **Practical** | Real-world applications |
| **Reproducible** | Datasets and code shared |
| **Industrial** | Industry datasets valued |

## KDD Abstract

### Emphasize Scale and Application

```
Fraud detection in e-commerce requires processing millions of
transactions in real-time while adapting to evolving attack patterns.
We present FraudShield, a graph neural network framework for real-time
fraud detection that scales to billion-edge transaction graphs. Unlike
prior methods that require full graph access, FraudShield uses
incremental updates with O(1) inference cost per transaction. On a
proprietary dataset of 2.3 billion transactions from a major e-commerce
platform, FraudShield achieves 94.2% precision at 80% recall,
outperforming production baselines by 12%. The system has been deployed
at [Company], processing 50K transactions per second and preventing
an estimated $400M in annual fraud losses. We release an anonymized
benchmark dataset and code.
```

## KDD Paper Structure

```
├── Introduction
│   ├── Problem and impact
│   ├── Technical challenges
│   ├── Your approach
│   └── Contributions
├── Related Work
├── Preliminaries
│   ├── Problem definition
│   └── Notation
├── Method
│   ├── Overview
│   ├── Technical components
│   └── Complexity analysis
├── Experiments
│   ├── Datasets (with scale statistics)
│   ├── Baselines
│   ├── Main results
│   ├── Scalability experiments
│   ├── Ablation study
│   └── Case study / deployment
└── Conclusion
```

## KDD-Specific Requirements

### Scalability
- **Dataset sizes**: Report number of nodes, edges, samples
- **Runtime analysis**: Wall-clock time comparisons
- **Complexity**: Time and space complexity stated
- **Scaling experiments**: Show performance vs. data size

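
One way to back the runtime and scaling items above with numbers is a small timing harness. The sketch below is illustrative only: `time_at_sizes` and `scaling_exponent` are hypothetical helper names, and the log-log slope is an empirical estimate that complements, but does not replace, the stated complexity analysis.

```python
import math
import time


def time_at_sizes(fn, make_input, sizes, repeats=3):
    """Return [(size, best wall-clock seconds)] for fn on inputs of each size."""
    results = []
    for n in sizes:
        data = make_input(n)  # build the input outside the timed region
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(data)
            best = min(best, time.perf_counter() - start)
        results.append((n, best))
    return results


def scaling_exponent(results):
    """Least-squares slope of log(time) against log(size).

    A slope near 1 suggests roughly linear scaling, near 2 quadratic.
    Requires at least two distinct sizes and strictly positive timings.
    """
    xs = [math.log(n) for n, _ in results]
    ys = [math.log(t) for _, t in results]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den
```

For example, `time_at_sizes(sorted, lambda n: list(range(n, 0, -1)), [10_000, 100_000, 1_000_000])` would produce the rows of a runtime-vs-size table. Taking the best of several repeats reduces noise from other processes, though reporting the spread is also reasonable.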

### Industrial Deployment
- **Case studies**: Real-world deployment stories
- **A/B tests**: Online evaluation results (if applicable)
- **Production metrics**: Business impact (if shareable)

### Example Scalability Table

```
Table 4: Scalability Comparison (runtime in seconds)
─────────────────────────────────────────────────────────────
Dataset          | Nodes  | Edges  | GCN  | GraphSAGE | Ours
─────────────────────────────────────────────────────────────
Cora             | 2.7K   | 5.4K   | 0.3  | 0.2       | 0.1
Citeseer         | 3.3K   | 4.7K   | 0.4  | 0.3       | 0.1
PubMed           | 19.7K  | 44.3K  | 1.2  | 0.8       | 0.3
ogbn-arxiv       | 169K   | 1.17M  | 8.4  | 4.2       | 1.6
ogbn-papers100M  | 111M   | 1.6B   | OOM  | OOM       | 42.3
─────────────────────────────────────────────────────────────
OOM = out of memory.
```

---

# Part 4: Common Elements Across CS Venues

## Writing Quality

### Clarity
- **One idea per sentence**
- **Define terms before use**
- **Use consistent notation**

### Precision
- **Exact numbers**: "23.4%" not "about 20%"
- **Clear claims**: Avoid hedging unless necessary
- **Specific comparisons**: Name the baseline

## Contribution Bullets

Used across all CS venues:
```
Our contributions are:
• We identify [problem/insight]
• We propose [method name] that [key innovation]
• We demonstrate [results] on [benchmarks]
• We release [code/data] at [URL]
```

## Reproducibility Standards

All CS venues increasingly expect:
- **Code availability**: GitHub link (anonymous for review)
- **Data availability**: Public datasets or release plans
- **Full hyperparameters**: Training details complete
- **Random seeds**: Exact values for reproduction

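
The seed and hyperparameter items above can be made concrete with a small run-logging pattern. This is a sketch under stated assumptions: `run_with_seeds` and `toy_experiment` are hypothetical names, and only Python's built-in `random` module is seeded here. A real training script would also need to seed whatever numerical or deep learning libraries it uses.

```python
import json
import random


def run_with_seeds(experiment, seeds, config):
    """Run experiment once per seed and record seed, config, and score together."""
    records = []
    for seed in seeds:
        rng = random.Random(seed)  # isolated generator, no hidden global state
        score = experiment(rng, **config)
        records.append({"seed": seed, "config": dict(config), "score": score})
    return records


def toy_experiment(rng, n_samples):
    """Stand-in for a training run: a noisy score driven only by rng."""
    return sum(rng.random() for _ in range(n_samples)) / n_samples


if __name__ == "__main__":
    runs = run_with_seeds(toy_experiment, seeds=[13, 42, 1234],
                          config={"n_samples": 100})
    # Storing this JSON next to the results lets others rerun the exact settings.
    print(json.dumps(runs, indent=2))
```

Passing an explicit generator into the experiment, rather than relying on global seeding, makes it easier to confirm that two runs with the same seed produce the same result.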

## Ethics and Broader Impact

### NLP (ACL/EMNLP)
- **Limitations section**: Required
- **Responsible NLP checklist**: Ethical considerations
- **Bias analysis**: For models affecting people

### HCI (CHI)
- **IRB/Ethics approval**: Required for human subjects
- **Informed consent**: Procedure described
- **Privacy considerations**: Data handling

### KDD/WWW
- **Societal impact**: Consider misuse potential
- **Privacy preservation**: For sensitive data
- **Fairness analysis**: When applicable

---

## Venue Comparison Table

| Aspect | ACL/EMNLP | CHI | KDD/WWW | SIGIR |
|--------|-----------|-----|---------|-------|
| **Focus** | NLP tasks | User studies | Scalable ML | IR/search |
| **Evaluation** | Benchmarks + human | User studies | Large-scale exp | Datasets |
| **Theory weight** | Moderate | Low | Moderate | Moderate |
| **Industry value** | High | Medium | Very high | High |
| **Page limit** | 8 long / 4 short | 10 + refs | 9 + refs | 10 + refs |
| **Review style** | ARR | Direct | Direct | Direct |

---

## Pre-Submission Checklist

### All CS Venues
- [ ] Clear contribution statement
- [ ] Strong baselines
- [ ] Reproducibility information complete
- [ ] Correct venue template
- [ ] Anonymized (if double-blind)

### NLP-Specific
- [ ] Standard benchmark results
- [ ] Error analysis included
- [ ] Human evaluation (for generation)
- [ ] Responsible NLP checklist

### HCI-Specific
- [ ] IRB approval stated
- [ ] Participant demographics
- [ ] Direct quotes in findings
- [ ] Design implications

### Data Mining-Specific
- [ ] Scalability experiments
- [ ] Dataset size statistics
- [ ] Runtime comparisons
- [ ] Complexity analysis

---

## See Also

- `venue_writing_styles.md` - Master style overview
- `ml_conference_style.md` - NeurIPS/ICML style guide
- `conferences_formatting.md` - Technical formatting requirements
- `reviewer_expectations.md` - What CS reviewers seek