![]() ![]() It would be better to inline some copies of fib into itself which might even allow some small amount of CSE, but importantly would reduce the number of calls. Unfortunately, the result isn't too great as the initial part of the function is pretty heavy, and this bottlenecks computation. In other words, the function was transformed from: This loop repeatedly calls the `fib(n - 1)` part. The compiler has determined that the second call `fib(n - 2)` is "almost" tail-recursive (there is probably a better term for this), and so was able to convert that call into the loop you see at 10081 to 10095. It's also interesting to note that -Os generates an even smaller version with only one recursion, but that ends up slower (~6.5s). įWIW, this is what the function looks like at 0x10070:ġ00a2: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)Įdit: It's interesting to note that there's only one recursive call to the function, not two (you get two calls if you compile with -O1). This looks familiar, but I thought I had seen something similar in a blog form. only in cycles, which is even more intriguing.Īnother interesting fact: valgrind's branch simulator doesn't show a difference in branch mispredictions between both (it also shows a misprediction rate much larger than reality, it's likely based on an old model Edit: valgrind manual says: Cachegrind simulates branch predictors intended to be typical of mainstream desktop/server processors of around 2004.). Edit: actually, there isn't a difference in branch-misses for those. Interestingly, the C and C++ versions have different performance for the same reason: they produce the exact same machine code, at different addresses. I didn't know an address difference of 0x30 could influence the number of branch misses so much. I was starting to think, well, this might be related to instruction-cache lines. ![]() Then I looked at the assembly, and it made even less sense: the main function, fib::fib, which is where the vast majority of the time is spent, is identical, except for addresses. ![]() Then I looked at the code, and it began to make no sense. This yielded, interestingly, a ~500ms difference: ~6.45s vs ~6.95s (with LTO winning). before looking at the code, I compared the elapsed time between the fib.rs program compiled with `rustc -O` and `rustc -O -C lto`. ![]()
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |