For the last week or so, I’ve been mostly fighting with the performance of the Rust compiler. The rustc performance graph from the RustBot page tells a lot of the story:
The wildly varying performance is largely due to a change I made to move the code for addition on vectors out of the compiler and into the core library. The compiler used to generate very optimized vector code. For example, when doing
v += [x] (that is, appending a single element to a vector in place), the compiler would avoid allocating a temporary vector for the
[x]. In effect, the compiler generated what amounts to an inline call to
My work involved adding an overloaded addition operator to Rust’s core library, and changing the compiler to use that version instead of its generated version. The landing of this change corresponds to about build number 70 or so on the graph, where there is a small but significant jump in the time rustc took. Due to the way we handle overloaded operators, the compiler must insert many more vector copies than it did before. It seemed too good to be true that I would only lose a small amount of performance.
And it was too good to be true. It turns out through a bunch of rebasing, I had managed to undo a small switch in
is_binopable that controlled whether my new code ever got called. In the meantime, Sully has been doing a lot of work on the way Rust handles vectors, and he landed a change around build 90 in the performance graph above that ended up causing Rust to use the library version of vector append instead. This was more like the performance decrease I was expecting.
In order to fix this, I had to track down all the places in the compiler where we were using the addition operator on vectors, and replace them with things like
vec::append_one. This was tedious work, and I did it in stages. Each spot where the execution time improves in the graph is where I landed a couple more of these changes. Once this was finally done, we can see that rustc’s performance was even slightly better than it was before. I believe this is because the library versions of these functions are able to remove some copies that rustc could not remove before. Sadly, the code is much uglier than it was before. We are discussing the ability to specify modes on the self argument, which will allow the compiler to again avoid unnecessary copies when using the plus operator. This should allow us to make using the plus operator just as fast as the underlying library functions.
The moral of the story is that copies can kill performance. Rust warns often when it is inserting copies you may not have expected. Pay attention to these warnings and you will write much faster code.