# CS Conference Writing Style Guide

Comprehensive writing guide for ACL, EMNLP, NAACL (NLP), CHI, CSCW (HCI), SIGKDD, WWW, SIGIR (data mining/IR), and other major CS conferences.

**Last Updated**: 2024

---

## Overview

CS conferences span diverse subfields with distinct writing cultures. This guide covers NLP, HCI, and data mining/IR venues, each with its own expectations and evaluation criteria.

---
# Part 1: NLP Conferences (ACL, EMNLP, NAACL)

## NLP Writing Philosophy

> "Strong empirical results on standard benchmarks with insightful analysis."

NLP papers balance empirical rigor with linguistic insight. Human evaluation is increasingly important alongside automatic metrics.

## Audience and Tone

### Target Reader

- NLP researchers and computational linguists
- Familiar with transformer architectures and standard benchmarks
- Expect reproducible results and error analysis

### Tone Characteristics

| Characteristic | Description |
|---------------|-------------|
| **Task-focused** | Clear problem definition |
| **Benchmark-oriented** | Standard datasets emphasized |
| **Analysis-rich** | Error analysis, qualitative examples |
| **Reproducible** | Full implementation details |
## Abstract (NLP Style)

### Structure

- **Task/problem** (1 sentence)
- **Limitation of prior work** (1 sentence)
- **Your approach** (1-2 sentences)
- **Results on benchmarks** (2 sentences)
- **Analysis finding** (optional, 1 sentence)

### Example Abstract

```
Coreference resolution remains challenging for pronouns with distant or
ambiguous antecedents. Prior neural approaches struggle with these
difficult cases due to limited context modeling. We introduce
LongContext-Coref, a retrieval-augmented coreference model that
dynamically retrieves relevant context from document history. On the
OntoNotes 5.0 benchmark, LongContext-Coref achieves 83.4 F1, improving
over the previous state of the art by 1.2 points. On the challenging
WinoBias dataset, we reduce gender bias by 34% while maintaining
accuracy. Qualitative analysis reveals that our model successfully
resolves pronouns requiring world knowledge, a known weakness of
prior approaches.
```
## NLP Paper Structure

```
├── Introduction
│   ├── Task motivation
│   ├── Prior work limitations
│   ├── Your contribution
│   └── Contribution bullets
├── Related Work
├── Method
│   ├── Problem formulation
│   ├── Model architecture
│   └── Training procedure
├── Experiments
│   ├── Datasets (with statistics)
│   ├── Baselines
│   ├── Main results
│   ├── Analysis
│   │   ├── Error analysis
│   │   ├── Ablation study
│   │   └── Qualitative examples
│   └── Human evaluation (if applicable)
├── Discussion / Limitations
└── Conclusion
```
## NLP-Specific Requirements

### Datasets

- Use **standard benchmarks**: GLUE, SQuAD, CoNLL, OntoNotes
- Report **dataset statistics**: train/dev/test sizes
- **Data preprocessing**: Document all steps

### Evaluation Metrics

- **Task-appropriate metrics**: F1, BLEU, ROUGE, accuracy
- **Statistical significance**: Paired bootstrap, p-values
- **Multiple runs**: Report mean ± std across seeds
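The paired bootstrap mentioned above fits in a few lines. The sketch below is a minimal illustration of the common resampling formulation, not a reference implementation; `scores_a` and `scores_b` are assumed to be per-example scores for the two systems on the same test items.

```python
import random

def paired_bootstrap(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Approximate p-value for "system A outperforms system B".

    scores_a / scores_b hold per-example scores on the SAME test items,
    so differences are paired. We resample the test set with replacement
    and count how often A fails to beat B on the resample."""
    assert len(scores_a) == len(scores_b)
    rng = random.Random(seed)
    n = len(scores_a)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    not_better = 0
    for _ in range(n_resamples):
        resampled = sum(diffs[rng.randrange(n)] for _ in range(n))
        if resampled <= 0:  # A did not beat B on this resample
            not_better += 1
    return not_better / n_resamples


# A is correct on a strict superset of B's items, so A should win clearly.
a = [1, 1, 1, 1, 0] * 20   # 80% accuracy
b = [1, 1, 0, 0, 0] * 20   # 40% accuracy
print(paired_bootstrap(a, b))  # small p-value: difference is significant
```

A p-value below 0.05 here corresponds to A winning on at least 95% of resamples, which is the threshold most papers report.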
### Human Evaluation

Increasingly expected for generation tasks:

- **Annotator details**: Number, qualifications, agreement
- **Evaluation protocol**: Guidelines, interface, payment
- **Inter-annotator agreement**: Cohen's κ or Krippendorff's α

### Example Human Evaluation Table

```
Table 3: Human Evaluation Results (100 samples, 3 annotators)
─────────────────────────────────────────────────────────────
Method      | Fluency | Coherence | Factuality | Overall
─────────────────────────────────────────────────────────────
Baseline    |   3.8   |    3.2    |    3.5     |   3.5
GPT-3.5     |   4.2   |    4.0    |    3.7     |   4.0
Our Method  |   4.4   |    4.3    |    4.1     |   4.3
─────────────────────────────────────────────────────────────
Inter-annotator κ = 0.72. Scale: 1-5 (higher is better).
```
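Cohen's κ, as reported in table footnotes like the one above, is easy to compute for two annotators. A minimal sketch, deriving chance agreement from each annotator's marginal label frequencies:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    kappa = (p_observed - p_expected) / (1 - p_expected), where
    p_expected is chance agreement implied by each annotator's
    marginal label frequencies."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    p_obs = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_exp = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)


# Two annotators rating 8 items on a binary scale: 6/8 raw agreement.
ann1 = [1, 1, 1, 1, 0, 0, 0, 0]
ann2 = [1, 1, 1, 0, 0, 0, 0, 1]
print(cohens_kappa(ann1, ann2))  # 0.5: moderate agreement
```

In practice `sklearn.metrics.cohen_kappa_score` gives the same number (and supports weighted variants); Krippendorff's α, which handles more than two annotators and missing ratings, needs a dedicated implementation.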
## ACL-Specific Notes

- **ARR (ACL Rolling Review)**: Shared review system across ACL venues
- **Responsible NLP checklist**: Ethics, limitations, risks
- **Long (8 pages) vs. short (4 pages)**: Different expectations
- **Findings papers**: Lower-tier acceptance track

---
# Part 2: HCI Conferences (CHI, CSCW, UIST)

## HCI Writing Philosophy

> "Technology in service of humans—understand users first, then design and evaluate."

HCI papers are fundamentally **user-centered**. Technology novelty alone is insufficient; understanding human needs and demonstrating user benefit is essential.

## Audience and Tone

### Target Reader

- HCI researchers and practitioners
- UX designers and product developers
- Interdisciplinary (CS, psychology, design, social science)

### Tone Characteristics

| Characteristic | Description |
|---------------|-------------|
| **User-centered** | Focus on people, not technology |
| **Design-informed** | Grounded in design thinking |
| **Empirical** | User studies provide evidence |
| **Reflective** | Consider broader implications |
## HCI Abstract

### Focus on Users and Impact

```
Video calling has become essential for remote collaboration, yet
current interfaces poorly support the peripheral awareness that makes
in-person work effective. Through formative interviews with 24 remote
workers, we identified three key challenges: difficulty gauging
colleague availability, lack of ambient presence cues, and interruption
anxiety. We designed AmbientOffice, a peripheral display system that
conveys teammate presence through subtle ambient visualizations. In a
two-week deployment study with 18 participants across three distributed
teams, AmbientOffice increased spontaneous collaboration by 40% and
reduced perceived isolation (p<0.01). Participants valued the system's
non-intrusive nature and reported feeling more connected to remote
colleagues. We discuss implications for designing ambient awareness
systems and the tension between visibility and privacy in remote work.
```
## HCI Paper Structure

### Research Through Design / Systems Papers

```
├── Introduction
│   ├── Problem in human terms
│   ├── Why technology can help
│   └── Contribution summary
├── Related Work
│   ├── Domain background
│   ├── Prior systems
│   └── Theoretical frameworks
├── Formative Work (often)
│   ├── Interviews / observations
│   └── Design requirements
├── System Design
│   ├── Design rationale
│   ├── Implementation
│   └── Interface walkthrough
├── Evaluation
│   ├── Study design
│   ├── Participants
│   ├── Procedure
│   ├── Findings (quant + qual)
│   └── Limitations
├── Discussion
│   ├── Design implications
│   ├── Generalizability
│   └── Future work
└── Conclusion
```

### Qualitative / Interview Studies

```
├── Introduction
├── Related Work
├── Methods
│   ├── Participants
│   ├── Procedure
│   ├── Data collection
│   └── Analysis method (thematic, grounded theory, etc.)
├── Findings
│   ├── Theme 1 (with quotes)
│   ├── Theme 2 (with quotes)
│   └── Theme 3 (with quotes)
├── Discussion
│   ├── Implications for design
│   ├── Implications for research
│   └── Limitations
└── Conclusion
```
## HCI-Specific Requirements

### Participant Reporting

- **Demographics**: Age, gender, relevant experience
- **Recruitment**: How and where recruited
- **Compensation**: Payment amount and type
- **IRB approval**: Ethics board statement

### Quotes in Findings

Use direct quotes to ground findings:

```
Participants valued the ambient nature of the display. As P7 described:
"It's like having a window to my teammate's office. I don't need to
actively check it, but I know they're there." This passive awareness
reduced the barrier to initiating contact.
```

### Design Implications Section

Translate findings into actionable guidance:

```
**Implication 1: Support peripheral awareness without demanding attention.**
Ambient displays should be visible in peripheral vision but not require
active monitoring. Designers should consider calm technology principles.

**Implication 2: Balance visibility with privacy.**
Users want to share presence but fear surveillance. Systems should
provide granular controls and make visibility mutual.
```
## CHI-Specific Notes

- **Contribution types**: Empirical, artifact, methodological, theoretical
- **ACM format**: `acmart` document class (CHI now uses the single-column `manuscript` style for review; `sigconf` for publication)
- **Accessibility**: Alt text, inclusive language expected
- **Contribution statement**: Required per-author contributions

---
# Part 3: Data Mining & IR (SIGKDD, WWW, SIGIR)

## Data Mining Writing Philosophy

> "Scalable methods for real-world data with demonstrated practical impact."

Data mining papers emphasize **scalability**, **real-world applicability**, and **solid experimental methodology**.

## Audience and Tone

### Target Reader

- Data scientists and ML engineers
- Industry researchers
- Applied ML practitioners

### Tone Characteristics

| Characteristic | Description |
|---------------|-------------|
| **Scalable** | Handle large datasets |
| **Practical** | Real-world applications |
| **Reproducible** | Datasets and code shared |
| **Industrial** | Industry datasets valued |
## KDD Abstract

### Emphasize Scale and Application

```
Fraud detection in e-commerce requires processing millions of
transactions in real-time while adapting to evolving attack patterns.
We present FraudShield, a graph neural network framework for real-time
fraud detection that scales to billion-edge transaction graphs. Unlike
prior methods that require full graph access, FraudShield uses
incremental updates with O(1) inference cost per transaction. On a
proprietary dataset of 2.3 billion transactions from a major e-commerce
platform, FraudShield achieves 94.2% precision at 80% recall,
outperforming production baselines by 12%. The system has been deployed
at [Company], processing 50K transactions per second and preventing
an estimated $400M in annual fraud losses. We release an anonymized
benchmark dataset and code.
```
## KDD Paper Structure

```
├── Introduction
│   ├── Problem and impact
│   ├── Technical challenges
│   ├── Your approach
│   └── Contributions
├── Related Work
├── Preliminaries
│   ├── Problem definition
│   └── Notation
├── Method
│   ├── Overview
│   ├── Technical components
│   └── Complexity analysis
├── Experiments
│   ├── Datasets (with scale statistics)
│   ├── Baselines
│   ├── Main results
│   ├── Scalability experiments
│   ├── Ablation study
│   └── Case study / deployment
└── Conclusion
```
## KDD-Specific Requirements

### Scalability

- **Dataset sizes**: Report number of nodes, edges, samples
- **Runtime analysis**: Wall-clock time comparisons
- **Complexity**: Time and space complexity stated
- **Scaling experiments**: Show performance vs. data size
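A runtime-vs-size scaling experiment of the kind listed above can be run with a small timing harness. A sketch under one assumption: `run_fn(n)` is a hypothetical callback that builds an input of size `n` and runs the method once.

```python
import time

def scaling_curve(run_fn, sizes, repeats=3):
    """Measure wall-clock runtime of run_fn at each input size.

    Takes the best of `repeats` runs per size to damp timing noise;
    returns (size, seconds) pairs suitable for a runtime-vs-data-size
    plot or table."""
    results = []
    for n in sizes:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_fn(n)
            timings.append(time.perf_counter() - start)
        results.append((n, min(timings)))
    return results


# Example with a stand-in workload (sorting a reversed range).
curve = scaling_curve(lambda n: sorted(range(n, 0, -1)), [10_000, 100_000, 1_000_000])
for n, secs in curve:
    print(f"n={n:>9,}  {secs:.4f}s")
```

Plotting both axes on a log scale makes the empirical growth rate easy to compare against the stated complexity.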
### Industrial Deployment

- **Case studies**: Real-world deployment stories
- **A/B tests**: Online evaluation results (if applicable)
- **Production metrics**: Business impact (if shareable)

### Example Scalability Table

```
Table 4: Scalability Comparison (runtime in seconds)
──────────────────────────────────────────────────────
Dataset     | Nodes | Edges | GCN | GraphSAGE | Ours
──────────────────────────────────────────────────────
Cora        | 2.7K  | 5.4K  | 0.3 |    0.2    | 0.1
Citeseer    | 3.3K  | 4.7K  | 0.4 |    0.3    | 0.1
PubMed      | 19.7K | 44.3K | 1.2 |    0.8    | 0.3
ogbn-arxiv  | 169K  | 1.17M | 8.4 |    4.2    | 1.6
ogbn-papers | 111M  | 1.6B  | OOM |    OOM    | 42.3
──────────────────────────────────────────────────────
```

---
# Part 4: Common Elements Across CS Venues

## Writing Quality

### Clarity

- **One idea per sentence**
- **Define terms before use**
- **Use consistent notation**

### Precision

- **Exact numbers**: "23.4%" not "about 20%"
- **Clear claims**: Avoid hedging unless necessary
- **Specific comparisons**: Name the baseline

## Contribution Bullets

Used across all CS venues:

```
Our contributions are:
• We identify [problem/insight]
• We propose [method name] that [key innovation]
• We demonstrate [results] on [benchmarks]
• We release [code/data] at [URL]
```

## Reproducibility Standards

All CS venues increasingly expect:

- **Code availability**: GitHub link (anonymous for review)
- **Data availability**: Public datasets or release plans
- **Full hyperparameters**: Complete training details
- **Random seeds**: Exact values for reproduction
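The seed bookkeeping above can be sketched with the standard library alone. `run_experiment` here is a hypothetical stand-in for a training run; in practice you would also seed numpy (`np.random.seed`) and torch (`torch.manual_seed`, `torch.cuda.manual_seed_all`) when those libraries are in play.

```python
import os
import random
import statistics

def set_seed(seed: int) -> None:
    """Fix stdlib randomness sources; extend for numpy/torch as needed."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)

def run_experiment(seed: int) -> float:
    """Hypothetical stand-in for one training run's final metric."""
    set_seed(seed)
    return random.random()


# Report mean ± std across seeds, as reviewers expect.
scores = [run_experiment(seed) for seed in range(5)]
print(f"{statistics.mean(scores):.3f} ± {statistics.stdev(scores):.3f}")

# Same seed, same result — which is the property being promised.
assert run_experiment(0) == run_experiment(0)
```

Recording the exact seed list in the paper (or repo) lets readers reproduce both the individual runs and the aggregate numbers.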
## Ethics and Broader Impact

### NLP (ACL/EMNLP)

- **Limitations section**: Required
- **Responsible NLP checklist**: Ethical considerations
- **Bias analysis**: For models affecting people

### HCI (CHI)

- **IRB/Ethics approval**: Required for human subjects
- **Informed consent**: Procedure described
- **Privacy considerations**: Data handling

### KDD/WWW

- **Societal impact**: Consider misuse potential
- **Privacy preservation**: For sensitive data
- **Fairness analysis**: When applicable

---
## Venue Comparison Table

| Aspect | ACL/EMNLP | CHI | KDD/WWW | SIGIR |
|--------|-----------|-----|---------|-------|
| **Focus** | NLP tasks | User studies | Scalable ML | IR/search |
| **Evaluation** | Benchmarks + human | User studies | Large-scale exp | Datasets |
| **Theory weight** | Moderate | Low | Moderate | Moderate |
| **Industry value** | High | Medium | Very high | High |
| **Page limit** | 8 long / 4 short | 10 + refs | 9 + refs | 10 + refs |
| **Review style** | ARR | Direct | Direct | Direct |

---
## Pre-Submission Checklist

### All CS Venues

- [ ] Clear contribution statement
- [ ] Strong baselines
- [ ] Reproducibility information complete
- [ ] Correct venue template
- [ ] Anonymized (if double-blind)

### NLP-Specific

- [ ] Standard benchmark results
- [ ] Error analysis included
- [ ] Human evaluation (for generation)
- [ ] Responsible NLP checklist

### HCI-Specific

- [ ] IRB approval stated
- [ ] Participant demographics
- [ ] Direct quotes in findings
- [ ] Design implications

### Data Mining-Specific

- [ ] Scalability experiments
- [ ] Dataset size statistics
- [ ] Runtime comparisons
- [ ] Complexity analysis

---
## See Also

- `venue_writing_styles.md` - Master style overview
- `ml_conference_style.md` - NeurIPS/ICML style guide
- `conferences_formatting.md` - Technical formatting requirements
- `reviewer_expectations.md` - What CS reviewers seek