A bit of background. I'm working on a computational mathematics problem. The problem is such that it gets a nice boost from GPU optimizations. I have a working CUDA implementation and I'm trying to port it to AMD (since AMD chips appear to have an edge in operations per second, at least on paper).
The port is mostly going well. The two biggest issues so far were the limit on the number of work-units per kernel invocation (NVIDIA can do 32768x256, while my current Barts can only do 124x256; this is silly, since it's clearly not a hardware issue and is easy to fix) and the lack of 64-bit atomics (that took some coding to work around, but it's also surmountable).
Here's the issue I'm facing right now. My program has a lot of 64-bit integer divisions. Those are typically quite slow. But there's a workaround. I have this macro:
#define divide(a, b, c) ((__umul64hi((a), (b)) + (a)) >> (c))
Here 'a' and 'b' are uint64_t, and 'c' is uint8_t. '__umul64hi' computes the 128-bit product of two 64-bit unsigned integers and returns the upper 64 bits.
If the list of divisors is known in advance, we can precompute b & c and store them somewhere. On NVIDIA (and on x86, too), this speeds up the division by something like a factor of 3.
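To make the precomputation concrete, here is a minimal host-side sketch in C of one standard way to derive 'b' and 'c' for a given divisor (the Granlund-Montgomery style multiplier). It is not necessarily how the values are generated in the program described here. The names mulhi64, divisor_t, make_divisor and divide_by are hypothetical, and the code assumes a compiler with the unsigned __int128 extension (GCC or Clang). It also assumes a < 2^63 and d <= 2^63, because the plain 64-bit sum in the macro can wrap for larger dividends.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __umul64hi: upper 64 bits of the 128-bit product.
   Assumes the GCC/Clang unsigned __int128 extension. */
static uint64_t mulhi64(uint64_t a, uint64_t b) {
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* Precomputed pair for one divisor (illustrative type, not from the original code). */
typedef struct {
    uint64_t b;  /* multiplier */
    uint8_t  c;  /* shift count */
} divisor_t;

/* Assumption: 1 <= d <= 2^63, so the shift count stays below 64. */
static divisor_t make_divisor(uint64_t d) {
    assert(d >= 1 && d <= (UINT64_C(1) << 63));
    uint8_t c = 0;
    while ((UINT64_C(1) << c) < d) {
        c++;                              /* c = ceil(log2(d)) */
    }
    /* b = floor(2^64 * (2^c - d) / d) + 1, which always fits in 64 bits here */
    unsigned __int128 num = (unsigned __int128)((UINT64_C(1) << c) - d) << 64;
    divisor_t r = { (uint64_t)(num / d) + 1, c };
    return r;
}

/* Same arithmetic as the divide() macro. Assumption: a < 2^63, so that
   mulhi64(a, b) + a cannot wrap around in 64 bits. */
static uint64_t divide_by(uint64_t a, divisor_t m) {
    return (mulhi64(a, m.b) + a) >> m.c;
}
```

With these helpers, divide_by(a, make_divisor(d)) should agree with a / d for every a below 2^63; comparing the two on the CPU is a cheap way to validate a table of precomputed pairs before uploading it to the GPU.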
I tried to get this working on AMD, too. The closest fit to __umul64hi I could find is 'mul_hi' (defined in the OpenCL 1.1 spec, section 6.11.3), though I'm not sure if it's exactly what I need. But when I plug that one into my macro, I get a crash during the kernel compilation stage (see attached log).
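For reference, the OpenCL 1.1 spec describes mul_hi(x, y) as returning the high half of the product x * y, which for ulong arguments would be the same upper 64 bits that __umul64hi returns. As a cross-check (or a possible fallback while the compiler crash is unresolved), the same value can be built from 32-bit pieces. The sketch below is plain C; mulhi64_portable is a hypothetical name, and it uses only 64-bit arithmetic, so it should translate to an OpenCL kernel with ulong in place of uint64_t and uint in place of uint32_t.

```c
#include <stdint.h>

/* Hypothetical fallback: upper 64 bits of the 128-bit product a*b,
   computed from four 32x32 -> 64-bit partial products. */
static uint64_t mulhi64_portable(uint64_t a, uint64_t b) {
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t lo_lo = a_lo * b_lo;   /* contributes to bits 0..63   */
    uint64_t hi_lo = a_hi * b_lo;   /* contributes to bits 32..95  */
    uint64_t lo_hi = a_lo * b_hi;   /* contributes to bits 32..95  */
    uint64_t hi_hi = a_hi * b_hi;   /* contributes to bits 64..127 */

    /* Sum everything that lands in bits 32..63; the carry out of this
       sum (at most 2) belongs to the upper half. */
    uint64_t cross = (lo_lo >> 32) + (uint32_t)hi_lo + (uint32_t)lo_hi;

    return hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (cross >> 32);
}
```

This costs four multiplications instead of one, so it is likely slower than a native high-multiply, but it makes it possible to compare results against mul_hi on a device where both compile.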
So: is this the correct function to call? If it is, there's apparently a bug in the run-time compiler that ships with ATI Stream SDK 2.3.
(gdb) r
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6db4392 in ?? () from /lib/libc.so.6
(gdb) bt
#0 0x00007ffff6db4392 in ?? () from /lib/libc.so.6
#1 0x00007fffb9522e10 in ?? () from /home/eugene/Downloads/ati-stream-sdk-v2.3-lnx64/lib/x86_64/libatiocl64.so
#2 0x00007fffb9523534 in ?? () from /home/eugene/Downloads/ati-stream-sdk-v2.3-lnx64/lib/x86_64/libatiocl64.so
#3 0x00007fffb9524a40 in ?? () from /home/eugene/Downloads/ati-stream-sdk-v2.3-lnx64/lib/x86_64/libatiocl64.so
<snip>
#24 0x00007fffb8dec16a in ?? () from /home/eugene/Downloads/ati-stream-sdk-v2.3-lnx64/lib/x86_64/libatiocl64.so
#25 0x00007fffb8d7d77a in clBuildProgram () from /home/eugene/Downloads/ati-stream-sdk-v2.3-lnx64/lib/x86_64/libatiocl64.so