deane: (Default)
I was just reviewing some code which was copying a linear array of 16 values into a 4x4 matrix as follows:

for (int i = 0; i < 16; ++i) {
    matrix[i / 4][i % 4] = array[i];
}


That immediately joggled my inner optimizer. For a non-negative integer, shifting right by two bits is equivalent to dividing by 4, but faster on most processors. Similarly, masking off all but the lowest two bits is equivalent to taking the value modulo 4, but again faster. So this should be significantly faster:

for (int i = 0; i < 16; ++i) {
    matrix[i >> 2][i & 0x3] = array[i];
}

But wait. Surely today's modern compilers are capable of detecting this level of optimization themselves!

I tried it out and found that even at the highest level of optimization in g++, the second version is 60 times faster than the first one.

Nice to see that some of that old learning is still useful today.

For the past two weeks I've been sorting out symbol visibility issues in our product.

By default, when an object file is placed into a shared library (or "DLL" in Windows-speak) all of its global symbols are exported by that library for use by others. This can lead to bloating of the library's global symbol table which in turn can slow down the startup of any program which links to that library. So it would be nice to be able to restrict the exported symbols to just those which we know will be needed by others.

The gcc compiler's __attribute__ declarator provides a way to do just that. To simplify the rest of this discussion, let's create the following macros:
#define DLL_EXPORT __attribute__ ((visibility("default")))
#define DLL_HIDDEN __attribute__ ((visibility("hidden")))
The first macro can be used to mark global symbols which should be exported by the shared library while the second can be used to mark those which should remain hidden inside the library. For example:
class DLL_EXPORT MyClass
{
public:
               void someFunc(int i);
    DLL_HIDDEN int  someOtherFunc();
};
The DLL_EXPORT in the class declaration says that all of the class's global symbols should be exported. The DLL_HIDDEN before someOtherFunc() overrides the class-level declaration for that one function, indicating that it should remain hidden.

If you've lived the cloistered life of a Windows programmer you might recognize this as being vaguely similar to Visual C++'s dllexport/dllimport declarations, and that's the problem: so did the people who originally implemented this stuff into our product.

The Visual C++ approach to symbol visibility is much more convoluted than gcc's. You must mark the class declaration as exported when compiling the code which implements it, but mark it as imported when compiling other code which uses it. You basically need two different versions of the class's header file: one for insiders and one for outsiders. In practice you can get by with just one copy of the header file by using a pair of macros for each shared library and half a dozen lines of preprocessor code. E.g.:
#ifdef COMPILING_MYDLL
# define MYDLL_EXPORT __declspec( dllexport )
#else
# define MYDLL_EXPORT __declspec( dllimport )
#endif

class MYDLL_EXPORT MyClass
{
   ...
};
Many Windows programmers consider this idiocy "clever" because they, quite literally, don't know any better.

But I digress.

Whoever implemented symbol visibility into our product was familiar with Visual C++'s craziness and tried to set up the gcc compiles the same way. The result was a schizophrenic setup where a symbol would be exported by its shared library but code in other libraries thought that it was hidden. Fortunately, the linker was smart enough to sort it out and only generated warnings, not errors. Unfortunately, the developers didn't take those warnings as a sign that perhaps they'd gotten it wrong.

That's where I entered the picture. I wasn't interested in this symbol visibility crap; I just wanted to write a couple of template classes which could automatically generate Python bindings for some of our existing classes. My code worked wonderfully on Linux where, as it turned out, we have the symbol visibility stuff disabled, but it refused to build on OS X where symbol visibility was enabled. There was nothing wrong with my template classes; they'd just brought the schizophrenia into the open in a way which the linker could no longer handle.

So instead of working on my project, which was already several weeks behind schedule, I spent two days learning about symbol visibility on Linux, OS X and Windows, followed by a couple more days to track down and fix all the places where we were doing it wrong.

To make matters worse, we use two different compilers on OS X: gcc, as mentioned earlier, and Intel's icpc compiler. As I fixed the errors of our own making it became apparent that there were bugs in the Intel compiler such that it generated incorrect symbol visibility in a number of different situations. Tracking those down and reporting them to Intel sucked up another couple of days.

Eventually I got it all fixed and my template classes began working on OS X. But when I tried to port them to Windows, Visual C++ didn't like them.

Of course it didn't. My templates are written in standard C++ and Microsoft never met a standard that they were capable of adhering to. So now I get to spend another gods-know-how-many days butchering my code into whatever Byzantine form Visual C++ will accept.

*sigh*
