Overview
The challenge of parallel programming lies in the enormous gap between what a programmer expresses (a sequential algorithm
and its intended semantics) and what the hardware actually executes (thousands of concurrent threads with complex memory
hierarchies and synchronization requirements). Chunhua's research addresses this gap from multiple directions: building better
compilers for parallel programming models, automating parallelization, and extending these models to emerging hardware architectures.
A central contribution is XOMP, a complete OpenMP implementation built on top of the ROSE compiler infrastructure.
XOMP enables source-to-source translation of OpenMP programs, making it possible to experiment with new OpenMP features,
runtime optimizations, and custom parallelization strategies without modifying production compilers. This work laid the
groundwork for ROSE's automatic parallelization capabilities and continues to inform OpenMP standard development.
Earlier, Chunhua contributed to the OpenUH compiler at the University of Houston, building one of the most
complete and standards-compliant OpenMP implementations of its time.
As scientific workloads shifted to GPU accelerators, Chunhua extended this work with HOMP (Heterogeneous OpenMP),
targeting automatic offloading of OpenMP loops to GPU devices. More recently, research at CGO 2024 addressed
control-flow unmerging on GPUs, a critical optimization for avoiding thread divergence and maximizing
GPU utilization in irregular scientific codes. Together, these efforts represent a comprehensive body of work spanning
the full lifecycle of parallel programming model support.