Research Area

HPC Performance Optimization

Closing the gap between theoretical peak performance and what scientific applications actually achieve — through compiler-guided analysis, AI-assisted optimization, intelligent data placement, and developer-facing tools that translate cryptic compiler reports into actionable engineering guidance for exascale machines.

πŸ“– Overview

High-performance computing applications running on the world's largest supercomputers — from LLNL's Sierra and El Capitan to Frontier and Aurora — routinely achieve only a fraction of their hardware's theoretical peak performance. Closing this performance gap requires a deep understanding of both the application's computational structure and the machine's memory hierarchy, parallelism model, and compiler optimization opportunities. Chunhua's HPC performance optimization research addresses this challenge at multiple levels: from compiler-level analysis of individual loops to system-level data placement strategies for heterogeneous memory architectures.

A central theme is making performance optimization accessible and actionable. Modern compilers generate voluminous optimization reports — thousands of lines of diagnostics explaining why loops weren't vectorized, why functions weren't inlined, why memory accesses were flagged as potential bottlenecks. This information is invaluable but requires expert interpretation. Chunhua's CompilerGPT and related tools use AI to translate these reports into plain-English explanations with concrete, prioritized recommendations — democratizing performance engineering.

Beyond individual code optimization, Chunhua's group has tackled data placement in heterogeneous memory with tools like XPlacer, which automatically recommends optimal placement of data structures across NUMA domains, GPU HBM, and host DRAM. This problem becomes critical at exascale, where memory bandwidth is often the binding performance constraint. Chunhua's work under the RAPIDS/SciDAC initiative and the SUPER Institute contributes directly to the DOE's exascale computing mission, developing the tools and methodologies that scientific application teams need to productively use next-generation hardware.

πŸ“„ Key Publications

2025 PLDI 2025

Reductive Analysis with Compiler-Guided LLMs for Code Optimizations

Chunhua Liao et al.

Presents a novel compiler-AI collaboration framework where compiler analysis progressively reduces program complexity to help LLMs identify and apply targeted code optimizations. Demonstrates significant speedups on real HPC benchmarks through automated identification of vectorization, parallelization, and memory access pattern improvements.

2025 C3PO-HPC @ ISC 2025

CompilerGPT: Leveraging LLMs for Compiler Optimization Reports

Chunhua Liao et al.

Demonstrates how LLMs can interpret compiler optimization diagnostics and generate actionable performance tuning recommendations. Evaluated on real HPC application codes from LLNL's production portfolio, showing substantial developer time savings in performance engineering workflows.

2020–2023 ICS / SC / ISC

XPlacer: Guided Optimal Placement of Data on Heterogeneous Memory Systems

Chunhua Liao et al. — LLNL

Introduces XPlacer, a tool that automatically analyzes application memory access patterns and recommends optimal data placement across heterogeneous memory systems — GPU HBM, NUMA host memory, and NVM. Critical for extracting peak performance from modern heterogeneous supercomputers where memory bandwidth is the binding constraint.

2015–2020 DOE SciDAC / RAPIDS

RAPIDS SciDAC Institute: Enabling Exascale Application Performance

Chunhua Liao, LLNL CASC Team, et al.

Contributions to the DOE SciDAC RAPIDS Institute for Computer Science and Data, developing compiler and runtime tools that help scientific application teams extract performance from pre-exascale and exascale platforms. Work encompasses auto-parallelization, memory optimization, and performance portability frameworks.

2013–2018 SUPER Institute / ASCR

SUPER Institute: Compiler and Runtime Support for Scalable Systems

Chunhua Liao, LLNL, ANL, LBNL, et al.

Multi-laboratory collaboration developing compiler technologies and runtime systems for scalable parallel computing. Chunhua's contributions focused on OpenMP extensions, auto-parallelization, and performance analysis tooling for DOE leadership-class systems.

πŸ’» Software & Tools

πŸ—ΊοΈ

XPlacer

Heterogeneous Memory Placement

A compiler-assisted tool that analyzes application memory access patterns and recommends optimal data placement across heterogeneous memory systems. Particularly valuable for GPU-accelerated HPC codes where choosing between HBM and host memory for each data structure can dramatically affect performance.

GPU · Memory · NUMA · Exascale
πŸ€–

CompilerGPT

AI Performance Assistant

LLM-powered tool that converts compiler optimization reports into plain-English explanations with actionable recommendations. Reduces performance engineering from a specialist skill to something any HPC developer can practice effectively with AI assistance.

LLM · Performance · Python

πŸ’‘ Impact & Insights

Performance optimization is ultimately a human activity. The compiler knows what's happening in the machine, but a developer must decide what to change. Our mission is to make compiler intelligence actionable for every HPC developer.
  • The performance gap problem at exascale is more acute than ever: with heterogeneous CPUs, GPUs, and novel memory tiers, even expert programmers struggle to fully exploit modern HPC hardware without systematic tool support.
  • Compiler-guided LLM optimization (PLDI 2025) demonstrates that the combination of precise compiler analysis and flexible LLM reasoning outperforms either approach alone β€” a key architectural insight for AI-assisted programming tools.
  • XPlacer's data placement approach addresses a performance bottleneck that is invisible to most developers but critical for GPU-accelerated codes: choosing the wrong memory tier can cost 10–100Γ— in bandwidth-bound applications.
  • The HPC-FAIR framework applies FAIR (Findable, Accessible, Interoperable, Reusable) principles to HPC performance data, enabling reproducible performance engineering and cross-site benchmarking.
  • Contributions to RAPIDS/SciDAC and the SUPER Institute represent long-term investment in national laboratory infrastructure β€” the tools and methods developed here directly enable the scientific missions of DOE's flagship computing facilities.