There Is No Optimal: Pareto Trade Offs in C++ — Compile-Time Search with Real Constraints
andrew drakeford
unsigned
In performance-critical C++, what does “optimal” mean? It is never a universal quantity. Game engines might need bit-exact determinism. In HPC, a solver may prioritise throughput, latency, or some combination of both. An embedded application might be more constrained by memory than by power consumption. Yet developers are often forced to hand-tune low-level code—or accept generic libraries that can’t reflect their real constraints.
What if you could define what “best” means for your context—and have a compile-time solver choose the best legal implementation for that definition? This talk presents a practical, reusable methodology using only C++20/23. You’ll learn how to:
- Declare what transformations are allowed (e.g., “reordering is permitted within a bounded numerical error”),
- Encode your trade-offs as policies (e.g., “determinism first, then speed, under 64 KB”),
- Ground decisions in measured profiles, not theoretical assumptions.
We will demonstrate by a progression through examples, how we can use the modern C++ machinery to represent and solve for this Pareto optimum: The examples are as follows:
- Variadic reduction: for calculating multiple statistics.
- Matrix chain multiplication: choose evaluation order to minimise work.
- Memory layout: To reduce cache misses, we reorder struct fields by access frequency, alignment, concurrency needs, and logical grouping—not simply by packing the largest elements first. We also consider necessary constraints.
- Sparse matrix–vector multiplication: We select hybrid storage formats based on sparsity patterns and evaluation approaches based on hardware constraints.
For each example, we use a small code framework to expose the intent. We use it to define and constrain the search space. It uses compile-time exploration (via dynamic programming) to identify the best candidate and construct an appropriate execution plan. Since the decision is made during compilation, the resulting binary executes a plan where the only selected path has no runtime dispatch.
We will also discuss techniques for controlling build times and briefly sketch how C++26 reflection could reduce boilerplate (e.g., introspecting members) and how executors might extend this approach to scheduling.
This isn’t about automating away expertise. It’s about giving expert developers a way to encode their judgment—then letting the compiler do the tedious searching.
andrew drakeford
Physics PhD who started out at BT Labs working with AI and acting as a design consultant for large scale systems in network management and billing. He has spent the last few decades designing and writing high performance quantitative libraries and applications in C++. He is interested in quant finance, machine learning, vectorisation, SIMD and HPC. He is a member of the UK C++ panel.