Dec. 19th, 2011

deane
I was just reviewing some code which was copying a linear array of 16 values into a 4x4 matrix as follows:

for (int i = 0; i < 16; ++i) {
    matrix[i / 4][i % 4] = array[i];
}


That immediately joggled my inner optimizer. For non-negative integers, shifting right by 2 bits is equivalent to dividing by 4, but faster on most processors. Similarly, masking off everything except the lower two bits is equivalent to taking the value modulo 4, but again faster. So this should be significantly faster:

for (int i = 0; i < 16; ++i) {
    matrix[i >> 2][i & 0x3] = array[i];
}
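As a sanity check on that equivalence, here is a minimal sketch (not the code under review; the function names are made up for illustration) that wraps both loops in functions and confirms the two index calculations agree for every i from 0 to 15. The comment on negative values matters because i is a signed int: the shortcut only matches division and modulus when the value is non-negative.

```cpp
#include <cassert>

// Hypothetical helper: fill a 4x4 matrix using division and modulus.
void fill_div_mod(int (&matrix)[4][4], const int (&array)[16]) {
    for (int i = 0; i < 16; ++i) {
        matrix[i / 4][i % 4] = array[i];
    }
}

// Hypothetical helper: the same copy using shift and mask.
void fill_shift_mask(int (&matrix)[4][4], const int (&array)[16]) {
    for (int i = 0; i < 16; ++i) {
        matrix[i >> 2][i & 0x3] = array[i];
    }
}

// Returns true if both index calculations agree for every i in [0, 16).
bool indices_agree() {
    for (int i = 0; i < 16; ++i) {
        if ((i / 4) != (i >> 2) || (i % 4) != (i & 0x3)) {
            return false;
        }
    }
    return true;
}

// Note on negative values: since C++11, -1 / 4 is 0 and -1 % 4 is -1
// (division truncates toward zero), whereas -1 & 0x3 is 3 on
// two's-complement machines. So for a signed int the two forms are only
// interchangeable when the value is known to be non-negative.
```

Calling indices_agree() should return true, and the two fill functions should produce identical matrices for the same input array.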

But wait. Surely modern compilers are capable of making this kind of optimization themselves!

I tried it out and found that even at the highest level of optimization in g++, the second version is 60 times faster than the first one.
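For anyone who wants to try a comparison like this, the following is one possible timing harness. It is only a sketch under my own assumptions, not the test that produced the figure above: the names Timings, time_fill and compare_timings are hypothetical, the input is changed on every pass, and a volatile sink is written so that the optimizer is less likely to discard the loop entirely. Results will vary with compiler version, flags and hardware.

```cpp
#include <chrono>

// Hypothetical result type holding the elapsed time for each variant.
struct Timings {
    double div_mod_seconds;
    double shift_mask_seconds;
};

// Time `repetitions` calls of `fill`, which copies a 16-element array
// into a 4x4 matrix. Returns elapsed wall-clock time in seconds.
template <typename Fill>
double time_fill(Fill fill, long repetitions) {
    int array[16];
    for (int i = 0; i < 16; ++i) {
        array[i] = i;
    }
    int matrix[4][4] = {};
    volatile int sink = 0;  // keeps the results observable to the optimizer

    auto start = std::chrono::steady_clock::now();
    for (long r = 0; r < repetitions; ++r) {
        array[0] = static_cast<int>(r);  // vary the input on each pass
        fill(matrix, array);
        sink = matrix[0][0] + matrix[3][3];
    }
    auto stop = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double>(stop - start).count();
}

// Run both variants for the same number of repetitions.
Timings compare_timings(long repetitions) {
    auto div_mod = [](int (&m)[4][4], const int (&a)[16]) {
        for (int i = 0; i < 16; ++i) {
            m[i / 4][i % 4] = a[i];
        }
    };
    auto shift_mask = [](int (&m)[4][4], const int (&a)[16]) {
        for (int i = 0; i < 16; ++i) {
            m[i >> 2][i & 0x3] = a[i];
        }
    };
    return Timings{time_fill(div_mod, repetitions),
                   time_fill(shift_mask, repetitions)};
}
```

Calling compare_timings with a large repetition count and printing the two fields gives a rough comparison. It is worth repeating the run at different optimization levels, since a loop with a constant bound of 16 may be unrolled or folded in ways that change the picture considerably.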

Nice to see that some of that old learning is still useful today.
