Overview
Parallel programs are notoriously difficult to get right. Data races, where two threads access the same memory location
concurrently and at least one access is a write, are subtle, non-deterministic bugs that can corrupt results, cause crashes,
or silently produce wrong answers in scientific simulations. Detecting and eliminating these bugs requires both powerful
analysis tools and rigorous benchmarks that can evaluate how well those tools work. Chunhua's correctness research addresses
both sides of this challenge.
DataRaceBench is a community benchmark suite specifically designed to evaluate data race detection tools.
It contains hundreds of microbenchmarks covering a comprehensive range of race patterns in OpenMP programs, from simple
shared-variable races to complex patterns involving reductions, atomic operations, and nested parallelism. By providing
ground-truth annotations (each microbenchmark is labeled as having or not having a race), DataRaceBench enables apples-to-apples
comparison of different detection tools and serves as a regression suite for tool developers. It has been adopted by
research groups worldwide and cited in dozens of publications.
AutoParBench extends this philosophy to automatic parallelization: a benchmark framework for evaluating
how well compilers and tools can automatically identify and parallelize sequential loops. More recently, Chunhua's group has
explored whether LLMs can detect data races, a fascinating intersection of AI and program correctness.
The SC-W 2023 paper found that current LLMs show surprising capability on simple race patterns but struggle with the
complex, context-dependent races that occur in real HPC codes, highlighting both the promise and the current limits of
AI-based program correctness analysis.