Hey AMD Devs,
I'm not sure if you know about the problems that were introduced with Catalyst 14.12 (compared to 14.9). I was able to track down at least one of them (the most important one) and wanted to let you know, so that you can find the cause more easily or fix it in a later Catalyst release. First of all, here are my observations so far: https://hashcat.net/forum/thread-3915.html
What I am talking about here is the problem that the first GPU always seems to be slower than all the others. In oclHashcat, there is a separate thread for each GPU, and the kernels for that GPU are queued from it. That's important because oclHashcat supports GPUs of different speeds and therefore has to do the synchronization on its own. Each GPU has its own context and its own command queue.
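To make the threading layout concrete, here is a rough sketch in C of the one-thread-per-GPU pattern described above. It is not oclHashcat's actual code: `device_worker_t`, `device_thread` and `run_device_workers` are hypothetical names, and the OpenCL calls are only indicated in comments so the sketch builds without the OpenCL headers.

```c
#include <pthread.h>

#define MAX_DEVICES 8

/* Hypothetical per-device state. In a real OpenCL host program this would
 * hold the cl_context and cl_command_queue created for this device only. */
typedef struct {
    int device_index;
    int batches_to_run;   /* amount of work assigned to this device */
    int batches_done;     /* written only by this device's own thread */
} device_worker_t;

/* Thread body: each device is driven by its own host thread. */
static void *device_thread(void *arg)
{
    device_worker_t *w = arg;

    /* A real worker would create its own context and command queue here
     * (clCreateContext, clCreateCommandQueue). */
    for (int i = 0; i < w->batches_to_run; i++) {
        /* Placeholder for: enqueue a kernel on this device's queue and
         * wait for it to finish. */
        w->batches_done++;
    }
    return NULL;
}

/* Starts one thread per device, waits for all of them, and returns the
 * total number of batches completed, or -1 on error. */
static int run_device_workers(int num_devices, int batches_per_device)
{
    pthread_t threads[MAX_DEVICES];
    device_worker_t workers[MAX_DEVICES];
    int started = 0;
    int total = 0;

    if (num_devices < 1 || num_devices > MAX_DEVICES)
        return -1;

    for (int i = 0; i < num_devices; i++) {
        workers[i] = (device_worker_t){ i, batches_per_device, 0 };
        if (pthread_create(&threads[i], NULL, device_thread, &workers[i]) != 0)
            break;
        started++;
    }

    /* Join only the threads that were actually started. */
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        total += workers[i].batches_done;
    }

    return started == num_devices ? total : -1;
}
```

Because no state is shared between the workers, a slow device does not hold back a fast one; the host only has to collect the results at the end.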
I played a bit with the GPU_* variables found in /usr/lib/libamdocl64.so and then, by luck, found a workaround for the first-GPU-is-slow problem: set GPU_NUM_COMPUTE_RINGS to 1.
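For anyone who wants to apply the workaround from inside a host program instead of exporting the variable in the shell, a minimal sketch in C follows. The helper name `apply_compute_rings_workaround` is made up for illustration, and the sketch assumes the OpenCL runtime reads the variable when it is initialized, so the call has to happen before the first OpenCL call in the process.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() under strict C modes */
#include <stdio.h>
#include <stdlib.h>

/* Sets GPU_NUM_COMPUTE_RINGS=1 for the current process.
 * Assumption: the runtime picks the value up at initialization, so this
 * must run before any OpenCL function is called.
 * Returns 0 on success, -1 on failure. */
static int apply_compute_rings_workaround(void)
{
    /* Third argument 1 = overwrite any value already in the environment. */
    if (setenv("GPU_NUM_COMPUTE_RINGS", "1", 1) != 0) {
        perror("setenv");
        return -1;
    }
    return 0;
}
```

Calling `apply_compute_rings_workaround()` at the very top of `main()` should have the same effect as setting the variable in the environment before launching the program.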