deane: (Default)
2012-01-14 05:56 pm

Qt And OS X Run Loop Modes

I ran into an interesting bug on OS X this week. I had an object which was trying to calculate a value that had been requested of it. The calculation can take a long time so every now and then the object would check to see if the user had pressed the escape key to abort the calculation. But the very first time it tried to make the check, it would crash.

Using a combination of the debugger and print statements I was able to determine that the problem was that when it tried to check for the escape key, window redraw events were running and at least one of those was calling on the object to calculate the same value that it was already in the process of calculating. Since the object was non-re-entrant (i.e. it had to finish doing one thing before it could be asked to do another), that was resulting in the crash.
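The failure mode is easy to reproduce in miniature. Here's a minimal C++ sketch (the class and function names are my own invention, not from our codebase) of a guard that makes a re-entrant call into non-re-entrant code fail loudly instead of crashing unpredictably:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical sketch: a guard that detects a re-entrant call into
// non-re-entrant code and fails with a clear error instead of
// corrupting state.
class ReentrancyGuard {
public:
    explicit ReentrancyGuard(bool &busyFlag) : busy(busyFlag) {
        if (busy)
            throw std::logic_error("re-entrant call into non-re-entrant code");
        busy = true;
    }
    ~ReentrancyGuard() { busy = false; }
private:
    bool &busy;
};

static bool calcBusy = false;

// Stands in for the long-running calculation; depth > 0 simulates an
// event handler re-entering the calculation mid-flight.
long expensiveCalculation(int depth) {
    ReentrancyGuard guard(calcBusy);
    if (depth > 0)
        return expensiveCalculation(depth - 1);
    return 42;
}
```

A guard like this wouldn't have prevented the bug, but it would have turned a mystery crash into an immediate, self-describing error.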

The thing is, the check for the escape key shouldn't be allowing any other events to run. Most of our code is written in C++, but this bit was in Cocoa/Objective-C and looked like this:

NSUInteger mask = NSKeyDownMask|NSKeyUpMask;

NSEvent *event = [NSApp nextEventMatchingMask:mask untilDate:nil inMode:NSDefaultRunLoopMode dequeue:YES];

The first line defines a mask of the types of events we want to consider: just key presses and key releases. The second line asks the application's event loop (known in OS X jargon as a 'run loop') to return the first event matching the mask which is currently queued up for processing, if there is one.

Given that the mask setting should only allow key presses and releases to be processed, how the heck were window redraw events getting through?

It turns out that the answer lies in that 'inMode:NSDefaultRunLoopMode' bit. A "run loop mode" determines which input sources the loop will check for events (keyboard, mouse, etc) and, critically, which observers it should talk to while processing events. Observers are other bits of code which have asked to be alerted at various stages during the processing of events.

You can define your own run loop modes, but OS X has four pre-defined ones:
  • NSDefaultRunLoopMode
  • NSEventTrackingRunLoopMode
  • NSModalPanelRunLoopMode
  • NSConnectionReplyMode
Without going into the details of each one, it's sufficient for the purposes of this post to just note that the first three are so frequently used that they are grouped together as "common modes". Observers can subscribe to all three just by subscribing to 'NSRunLoopCommonModes'.

We use Nokia's Qt toolkit to implement our application's GUI. It turns out that Qt sets up an observer on the common run loop modes. That observer apparently does not (or cannot) check that we are running the loop with a mask which only allows for keystroke events. So it goes ahead and issues its own internally generated window redraw events, even when we are only asking for keystroke events. Thus the crash.

The fourth run loop mode, NSConnectionReplyMode, is intended for waiting for network connection events, but we don't really care about that. The important things are that it is not one of the common run loop modes and thus does not have a Qt observer sitting on it, and it accepts the keyboard as a valid input source. By using it, we can stop Qt from issuing unwanted GUI events while we check for key presses:

NSEvent *event = [NSApp nextEventMatchingMask:mask untilDate:nil inMode:NSConnectionReplyMode dequeue:YES];

It would be even better if we created our own run loop mode which had no observers at all attached to it, and whose only valid input source was the keyboard. Unfortunately, while I've been able to figure out how to create my own run loop mode, I haven't been able to figure out how to get hold of the 'port' for the keyboard so that I can add it to the run loop. So we'll have to make do with NSConnectionReplyMode for now.
2011-12-19 06:52 pm

An Oldie But Goodie

I was just reviewing some code which was copying a linear array of 16 values into a 4x4 matrix as follows:

for (int i = 0; i < 16; ++i) {
    matrix[i / 4][i % 4] = array[i];
}


That immediately jogged my inner optimizer. Shifting right by 2 bits is equivalent to dividing by 4, but faster on most processors. Similarly, masking off the lower two bits is equivalent to taking the remainder modulo 4, but again faster. So this should be significantly faster:

for (int i = 0; i < 16; ++i) {
    matrix[i >> 2][i & 0x3] = array[i];
}
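For the non-negative indices used here the two forms are interchangeable (signed division rounds toward zero, so the equivalence does not hold for negative values in general, which is one reason a compiler can't always substitute blindly). A quick sanity check, with hypothetical helper names:

```cpp
#include <cassert>

// Verify that the shift/mask version fills the 4x4 matrix with exactly
// the same layout as the divide/modulus version.
void copyDiv(const int array[16], int matrix[4][4]) {
    for (int i = 0; i < 16; ++i)
        matrix[i / 4][i % 4] = array[i];
}

void copyShift(const int array[16], int matrix[4][4]) {
    for (int i = 0; i < 16; ++i)
        matrix[i >> 2][i & 0x3] = array[i];
}

bool sameLayout() {
    int array[16];
    for (int i = 0; i < 16; ++i)
        array[i] = i * i;          // arbitrary distinct test values
    int a[4][4], b[4][4];
    copyDiv(array, a);
    copyShift(array, b);
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            if (a[r][c] != b[r][c])
                return false;
    return true;
}
```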

But wait. Surely today's modern compilers are capable of detecting this level of optimization themselves!

I tried it out and found that even at the highest level of optimization in g++, the second version is 60 times faster than the first one.

Nice to see that some of that old learning is still useful today.

2011-05-20 08:58 pm

A Flaw In Amdahl's Law?

Amdahl's law, when applied to parallel computing, basically says that when trying to speed up a computation you are limited by the sequential portion of the code. That is, those parts of the code which cannot be executed at the same time but must be executed sequentially place a limit on the improvement in performance which can be achieved by throwing more processors at the problem.
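The law itself is just arithmetic. If p is the fraction of the work that can be parallelized and n is the number of processors, the speedup is 1 / ((1 - p) + p / n), which can never exceed 1 / (1 - p) no matter how large n gets. A minimal sketch (the function name is my own):

```cpp
#include <cassert>
#include <cmath>

// Amdahl's law: with parallelizable fraction p and n processors, the
// overall speedup is 1 / ((1 - p) + p / n).  As n grows without bound
// the p/n term vanishes and the speedup approaches 1 / (1 - p).
double amdahlSpeedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}
```

With p = 0.9, even a million processors gets you just under a 10x speedup: the 10% sequential portion dominates.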


2011-03-21 10:25 pm

Job Security

I just came across a method in our API[1] in which the second parameter is optional. Looking at the code which implements that method, it always raises an error condition if the second parameter is not supplied.

Some option, huh?

I find stuff like this all the time. I could spend years just fixing little bits of crud like that, if there weren't more pressing tasks at hand. It's not exactly job security since there's nothing to stop the company from deciding that it's not worthwhile to fix those sorts of things, but it does mean that there will never be any shortage of work for me to do.


[1] Application Programming Interface. It's a way for our users to write their own applications which interact directly with the internals of our product.
2010-07-29 03:52 am

Musings On Symbol Visibility

For the past two weeks I've been sorting out symbol visibility issues in our product.

By default, when an object file is placed into a shared library (or "DLL" in Windows-speak) all of its global symbols are exported by that library for use by others. This can lead to bloating of the library's global symbol table which in turn can slow down the startup of any program which links to that library. So it would be nice to be able to restrict the exported symbols to just those which we know will be needed by others.

The gcc compiler's __attribute__ declarator provides a way to do just that. To simplify the rest of this discussion, let's create the following macros:
#define DLL_EXPORT __attribute__ ((visibility("default")))
#define DLL_HIDDEN __attribute__ ((visibility("hidden")))
The first macro can be used to mark global symbols which should be exported by the shared library while the second can be used to mark those which should remain hidden inside the library. For example:
class DLL_EXPORT MyClass
{
public:
               void someFunc(int i);
    DLL_HIDDEN int  someOtherFunc();
};
The DLL_EXPORT in the class declaration says that all of the class's global symbols should be exported. The DLL_HIDDEN before someOtherFunc() overrides the class-level declaration for that one function, indicating that it should remain hidden.
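To see the macros in action, here's a self-contained sketch (the Counter class is a made-up example, not from our product); on gcc and clang the attributes control which symbols a shared library exports, and they're harmless in an ordinary executable build:

```cpp
#include <cassert>

// gcc/clang visibility macros, as defined above.
#define DLL_EXPORT __attribute__ ((visibility("default")))
#define DLL_HIDDEN __attribute__ ((visibility("hidden")))

// The class-level DLL_EXPORT exports all of the class's symbols; the
// DLL_HIDDEN on internalTotal() overrides that for just one function.
class DLL_EXPORT Counter {
public:
    void add(int i) { total += i; }                          // exported
    DLL_HIDDEN int internalTotal() const { return total; }   // hidden
    int value() const { return internalTotal(); }            // exported
private:
    int total = 0;
};
```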

If you've lived the cloistered life of a Windows programmer you might recognize this as being vaguely similar to Visual C++'s dllexport/dllimport declarations, and that's the problem: so did the people who originally implemented this stuff in our product.

The Visual C++ approach to symbol visibility is much more convoluted than gcc's. You must mark the class declaration as exported when compiling the code which implements it, but mark it as imported when compiling other code which uses it. You basically need two different versions of the class's header file: one for insiders and one for outsiders. In practice you can get by with just one copy of the header file by using a pair of macros for each shared library and half a dozen lines of preprocessor code. E.g.:
#ifdef COMPILING_MYDLL
# define MYDLL_EXPORT __declspec( dllexport )
#else
# define MYDLL_EXPORT __declspec( dllimport )
#endif

class MYDLL_EXPORT MyClass
{
   ...
};
Many Windows programmers consider this idiocy "clever" because they, quite literally, don't know any better.

But I digress.

Whoever implemented symbol visibility in our product was familiar with Visual C++'s craziness and tried to set up the gcc compiles the same way. The result was a schizophrenic setup where a symbol would be exported by its shared library but code in other libraries thought that it was hidden. Fortunately, the linker was smart enough to sort it out and only generated warnings, not errors. Unfortunately, the developers didn't take those warnings as a sign that perhaps they'd gotten it wrong.

That's where I entered the picture. I wasn't interested in this symbol visibility crap; I just wanted to write a couple of template classes which could automatically generate Python bindings for some of our existing classes. My code worked wonderfully on Linux where, as it turned out, we have the symbol visibility stuff disabled, but it refused to build on OS X, where symbol visibility was enabled. There was nothing wrong with my template classes; they'd just brought the schizophrenia into the open in a way which the linker could no longer handle.

So instead of working on my project, which was already several weeks behind schedule, I spent two days learning about symbol visibility on Linux, OS X and Windows, followed by a couple more days to track down and fix all the places where we were doing it wrong.

To make matters worse, we use two different compilers on OS X: gcc, as mentioned earlier, and Intel's icpc compiler. As I fixed the errors of our own making it became apparent that there were bugs in the Intel compiler such that it generated incorrect symbol visibility in a number of different situations. Tracking those down and reporting them to Intel sucked up another couple of days.

Eventually I got it all fixed and my template classes began working on OS X. But when I tried to port them to Windows, Visual C++ didn't like them.

Of course it didn't. My templates are written in standard C++ and Microsoft never met a standard that they were capable of adhering to. So now I get to spend another gods-know-how-many days butchering my code into whatever Byzantine form Visual C++ will accept.

*sigh*