Overview
The challenge of parallel programming lies in the enormous gap between what a programmer expresses (a sequential algorithm
and its intended semantics) and what the hardware actually executes (thousands of concurrent threads with complex memory
hierarchies and synchronization requirements). Chunhua's research addresses this gap from multiple directions: building better
compilers for parallel programming models, automating parallelization, and extending these models to emerging hardware architectures.
A central contribution is XOMP, a complete OpenMP implementation built on top of the ROSE compiler infrastructure.
XOMP enables source-to-source translation of OpenMP programs, making it possible to experiment with new OpenMP features,
runtime optimizations, and custom parallelization strategies without modifying production compilers. This work laid the
groundwork for ROSE's automatic parallelization capabilities and continues to inform OpenMP standard development.
Earlier, Chunhua contributed to the OpenUH compiler at the University of Houston, building one of the most
complete and standards-compliant OpenMP implementations of its time.
As scientific workloads shifted to GPU accelerators, Chunhua extended this work with HOMP (Heterogeneous OpenMP),
targeting automatic offloading of OpenMP loops to GPU devices. More recently, research at CGO 2024 addressed
control-flow unmerging on GPUs, a critical optimization for avoiding thread divergence and maximizing
GPU utilization in irregular scientific codes. Together, these efforts represent a comprehensive body of work spanning
the full lifecycle of parallel programming model support.