Mechanical sympathy – not as low-level as you think

Last week I had the pleasure of attending a one-day workshop on “Understanding Mechanical Sympathy” by Martin Thompson.

The special thing about this workshop, and about Martin’s work in general, is that it convinced me that performance optimization is not black magic or a bag of assembly tricks. If you think otherwise, I encourage you to watch his “95% of performance is about clean representative models” presentation.

Besides, he’s a great teacher – the knowledge is distilled, and complex things are reduced to simple basics and explained in a straightforward way. Last but not least, even though the exercises and presentations use Java, the lessons can be applied to any other programming language. A basic knowledge of Java was sufficient for the workshop.

So what did I learn?

Clean code leads to good performance

I learned that at uni and I’ve seen it before, even in Martin’s presentations, but only now has it finally clicked for me. Martin spent a few hours (literally!) explaining how CPUs work and how they have evolved across a few models. Before the workshop I took his word for it that clean code results in good performance; now I feel I also understand why.

It comes down to a few simple rules. For example, one must realize that a CPU is a mini distributed system. Every level down the architecture diagram (registers, L1 cache, L2, L3, etc.) is more expensive to communicate with in terms of latency. On the other hand, every level up can store less data. If your class doesn’t fit into that limited space, you might waste a few cycles getting the necessary data into the registers when it’s needed. Also, processors typically have more ALUs than, for example, branch (jump) units. They also have dedicated units for performing operations on matrices and vectors. All that comes down to the fact that CPUs are better at arithmetic than at evaluating logical statements. Conditionals (loops, ifs, etc.) are comparatively expensive.
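To make the “arithmetic is cheaper than conditionals” point concrete, here is a minimal sketch (not code from the workshop; the class and method names are made up for illustration). Both methods sum the values that are at least a threshold: the first uses an if, the second replaces it with a bit mask. Whether the second is actually faster depends on the data and on what the JIT does with the first version (it may well remove the branch on its own), so treat it as an illustration to measure, not a recipe.

```java
final class BranchFreeSum {

    // Straightforward version: one conditional branch per element.
    static long sumAtLeastBranching(int[] values, int threshold) {
        long sum = 0;
        for (int v : values) {
            if (v >= threshold) {
                sum += v;
            }
        }
        return sum;
    }

    // Branch-free version: the comparison becomes arithmetic.
    static long sumAtLeastBranchFree(int[] values, int threshold) {
        long sum = 0;
        for (int v : values) {
            // Computed in long so the subtraction cannot overflow.
            long diff = (long) v - threshold;
            // diff >> 63 is -1 (all bits set) when v < threshold, otherwise 0;
            // inverting it gives a mask that keeps v only when v >= threshold.
            long mask = ~(diff >> 63);
            sum += v & mask;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {5, -3, 200, 128, 127, 0};
        System.out.println(sumAtLeastBranching(data, 128));
        System.out.println(sumAtLeastBranchFree(data, 128));
    }
}
```

The first version is still the one I would write by default; the second only earns its place if a measurement on realistic data says so.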

Another important point is that CPUs try to optimize the execution of our code and make a few bets: they assume that things which are close to each other will be used together, that things which have been used recently will be used again, and that memory access will follow some kind of pattern. That improves the performance of “typical” (or should I rather say well-written?) code, but if your code doesn’t follow those rules, you’ll pay for it. To be fair, the CPUs don’t expect anything wicked. On a higher level, hardware-friendly code can be translated into a few basic rules of clean code: small classes, high cohesion, short functions, low cyclomatic complexity, keeping loops very short and simple, etc.
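Those bets are easy to poke at with a small experiment. The sketch below (again mine, with hypothetical names) sums the same flat array twice: once in the order the elements sit in memory, and once jumping a whole row ahead on every step. The results are identical; only the access pattern differs. The timing in main is deliberately crude and only meant to show the direction of the effect, assuming the grid is large enough not to fit in cache.

```java
final class TraversalOrder {

    // Visits elements in the order they are laid out in memory.
    static long sumRowByRow(int[] grid, int rows, int cols) {
        long sum = 0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                sum += grid[r * cols + c];
            }
        }
        return sum;
    }

    // Visits the same elements, but each step jumps `cols` ints ahead.
    static long sumColumnByColumn(int[] grid, int rows, int cols) {
        long sum = 0;
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < rows; r++) {
                sum += grid[r * cols + c];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int rows = 2048;
        int cols = 2048;
        int[] grid = new int[rows * cols];
        for (int i = 0; i < grid.length; i++) {
            grid[i] = i & 0xFF;
        }

        // A few repetitions so the JIT has a chance to compile both methods.
        for (int run = 0; run < 5; run++) {
            long t0 = System.nanoTime();
            long a = sumRowByRow(grid, rows, cols);
            long t1 = System.nanoTime();
            long b = sumColumnByColumn(grid, rows, cols);
            long t2 = System.nanoTime();
            System.out.printf("row-by-row: %d us, column-by-column: %d us (sums %d / %d)%n",
                    (t1 - t0) / 1_000, (t2 - t1) / 1_000, a, b);
        }
    }
}
```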

I’ve never heard anybody else make the argument that clean code leads to good performance, and I think that’s a great shame. I now regret that I didn’t use this argument in my previous job when advocating for improving code quality, and that I didn’t know how to measure it and show its impact. Those are a few things I want to learn now, after the workshop.

Things can go wrong when you measure performance

This is one of the areas I’d like to learn more about. Martin mentioned a few things to consider when measuring performance. Micro-benchmarks are generally hard to get right, and it is even harder to measure the things that actually matter.

He suggested focusing on higher-level tests, kind of end-to-end ones, which use realistic data and realistic use cases. The last two are very important, because compilers (JIT compilers in particular) tend to be smart and will optimize our code, e.g. remove paths that never get executed for our crappy test data even though they would be executed for realistic data.
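One classic way a hand-rolled benchmark goes wrong can be sketched in a few lines. In the hypothetical example below, the first timing loop throws the result of work away, so the JIT is free to optimize the call out and we may end up timing an empty loop. The second loop accumulates the results and publishes them to a volatile field, which forces the work to actually happen. This is only a sketch of the pitfall, assuming a JIT-compiled JVM; it is not a substitute for a proper harness such as JMH, and it does nothing about the “realistic data” problem.

```java
final class NaiveBenchmark {

    // Results are published here so the JIT cannot prove they are unused.
    static volatile long sink;

    // Some arbitrary work with a data-dependent path.
    static long work(int[] data) {
        long acc = 0;
        for (int v : data) {
            acc += (v % 3 == 0) ? v * 31L : v;
        }
        return acc;
    }

    // Pitfall: the result is ignored, so the call may be eliminated.
    static long timeDiscardingResult(int[] data, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            work(data);
        }
        return System.nanoTime() - start;
    }

    // Better: every result feeds into a value that escapes the method.
    static long timeConsumingResult(int[] data, int iterations) {
        long start = System.nanoTime();
        long acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc += work(data);
        }
        sink = acc;
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int[] data = new int[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        for (int run = 0; run < 5; run++) {
            System.out.printf("discarding: %d us, consuming: %d us%n",
                    timeDiscardingResult(data, 10_000) / 1_000,
                    timeConsumingResult(data, 10_000) / 1_000);
        }
    }
}
```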

Gil Tene talked more about such things in his “Priming Java for Speed at Market Open” presentation. Even if you don’t use Java, it’s worth watching to get an idea of what can really be happening when your code is executed, especially if you use JIT compilation.

Locks are evil and concurrency is hard

I knew that before, but I hadn’t realized that communication between threads using locks might be slower than communication over the network. Apart from the examples, we heard a few “tales from the trenches” which were funny and scary at the same time. Realizing that you got your basic data structure wrong after more than a year in production is not a situation anybody wants to end up in.

Pro-tip for work: if somebody suggests using concurrency and claims it’s easy, then it’s time to run! They’re dangerous.

I think it’s worth a deeper thought that one of the best-known experts on performance optimization and concurrency in Java is the one saying that. If LMAX could do with just a single thread for business logic, then 90%+ of the systems in this world could probably manage as well.
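The “single thread for business logic” idea can be sketched with nothing but the standard library. To be clear, this is not how LMAX built it (they used their own Disruptor for the hand-off) and the class below is a made-up example. The point is only that the state is touched by exactly one thread, so the business logic itself needs no locks; other threads just hand over commands. The executor’s internal queue is still a coordination point, so this shows the shape of the design, not its performance.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class SingleWriterAccount {

    // The only thread that ever reads or writes `balance`.
    private final ExecutorService logicThread = Executors.newSingleThreadExecutor();

    // No volatile, no lock: confined to logicThread.
    private long balance;

    // Callers on any thread enqueue a command; it runs on logicThread.
    Future<Long> deposit(long amount) {
        return logicThread.submit(() -> {
            balance += amount;
            return balance;
        });
    }

    // Reads go through the same thread, so they see a consistent value.
    Future<Long> balance() {
        return logicThread.submit(() -> balance);
    }

    void shutdown() throws InterruptedException {
        logicThread.shutdown();
        logicThread.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        SingleWriterAccount account = new SingleWriterAccount();
        Thread[] producers = new Thread[4];
        for (int t = 0; t < producers.length; t++) {
            producers[t] = new Thread(() -> {
                for (int i = 0; i < 1_000; i++) {
                    account.deposit(1);
                }
            });
            producers[t].start();
        }
        for (Thread producer : producers) {
            producer.join();
        }
        System.out.println("balance = " + account.balance().get());
        account.shutdown();
    }
}
```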

Martin ends his “The Quest for Low-latency with Concurrent Java” presentation with a quote attributed to Albert Einstein:
“Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius, and a lot of courage, to move in the opposite direction.” I’ll let you guess where concurrency fits in.

It’s all about the basics… and great teachers

We have a lot of hype in programming. Every now and then somebody writes that everything has changed, and people get worried about how they can ever keep up with the pace of change.

After Martin’s workshop I have one more example that hypes come and go. Performance seems to be an evergreen one: every year we have more data and want to process it faster, or we have more users and want to give them an even better experience. Luckily, it seems that principles don’t change that often. If you understand the underlying principles, then you can learn the syntax of a new language, master 95% of the API of a new framework, or understand the impact of new hardware features very quickly.

At the same time, a lot of the complexity we deal with in our daily work is unnecessary and generated by ourselves, because we overcomplicate things and copy-and-paste without understanding what happens under the hood. Maybe we know what to do and how to do it, but we don’t really understand why.

To be honest, hardware and all the “low-level stuff” was one of those things that I could never fully grasp. It just didn’t seem as relevant or as important as other things I could spend time on. Martin convinced me otherwise.

I’m not sure if performance will become my great passion, but even if I stop at the level of “20% of the knowledge that gives 80% of the results”, that will be great. At least now I understand where to start, and “hardware and all this low-level stuff” seems less scary than ever before.

Summary

If you ever have the opportunity to attend Martin’s workshop, go! It doesn’t matter what language you use or how high a level of abstraction you operate at – it all runs on the same hardware. Learning a little bit about it will make you a better developer.
