Stop Boiling Your Frogs: High Impact Fixes to Problems of Scale
Lara Bailey
When software engineers build and work on large systems, perfectly reasonable-looking code can exhibit unanticipated problems that surface only once the system is used at scale, perhaps far beyond what it was originally designed for.
The good news is that the solution to these problems is often surprisingly simple. A comparatively small engineering effort can then yield a huge benefit, perhaps even saving a project.
After this talk, attendees will come away with ideas and techniques for finding the places where they can make changes with the biggest impact in the systems that they work on.
Whether a system has a long history – perhaps operating in a very different environment from the one originally envisioned – or is relatively new and designed to solve a new problem in a new space, it will have intrinsic complexity defined by the problem it solves. It will also have incidental complexity in its implementation. It can be hard to measure how much of a system's complexity is incidental and whether there is significant scope to reduce it.
Like the proverbial frog in the pan of water, it can be easy to accept the current behavior of the system as a given. However, as the system is scaled up and the temperature rises, it is important to recognize the severity of the situation. Once we understand the situation, the solution is often relatively simple. We can then scoop the frog out of the water to safety.
The presenter will share past experiences to illustrate such successes, looking at the tools and techniques used to identify the problems, the technical details of the fixes, and the scale of the benefits realized. One example is a project for building data warehouses that was exhausting its available memory. The cause was a couple of C++ classes being used outside of their ideal design space. While they met the interface requirements, their performance characteristics caused more and more issues as the test data volumes were increased. They were easily replaced with ones better designed for the use case. Warehouse builds that had taken hours now took minutes and used an order of magnitude less memory.
Lara Bailey
Lara is a senior software engineer at Bloomberg. She works on infrastructure code in libraries and tools that support the principal messaging framework used throughout the company.