
Bug in AMD OpenCL implementation?

Question asked by ddemidov on Apr 16, 2013
Latest reply on Apr 19, 2013 by himanshu.gautam



I am not sure if this is the right place for OpenCL bug reports, so please forgive me if I am wrong. Here is the link to the simple program that should add two vectors multiple times: The source is also attached here for convenience.


This simple program, when compiled with


    g++ -std=c++0x -o vector_sum vector_sum.cpp -lOpenCL


outputs 4096 == 4096 on the NVIDIA and Intel OpenCL implementations. However, when it is executed on AMD GPUs (the ones I tested are an HD 7970 'Tahiti' and an HD 7770 'Cape Verde'), it may output 4096 == 4081, 4096 == 4082, or some other mismatched pair.


Adding a call to cl::CommandQueue::finish() after each kernel launch (rather than only once after the complete loop) solves the issue, but this should be unnecessary according to the standard.


Replacing the definition of global_size at line 99 with


    size_t global_size = alignup(N, workgroup_size);


also helps, but should be equally unnecessary.
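For readers without the attached source at hand: the definition of alignup is not shown in this post, so the following is only a sketch of what such a helper usually looks like, assuming it rounds N up to the next multiple of workgroup_size. The padding_items function is a hypothetical addition for illustration only; it counts the extra work-items a padded launch would create.

```cpp
#include <cassert>
#include <cstddef>

// Assumed behaviour (not taken from the attached source): round n up to the
// nearest multiple of alignment, e.g. a global size padded to the
// work-group size.
inline std::size_t alignup(std::size_t n, std::size_t alignment) {
    assert(alignment > 0);  // a zero alignment would divide by zero below
    return (n + alignment - 1) / alignment * alignment;
}

// Hypothetical helper: number of work-items launched beyond the n real
// elements when the global size is padded with alignup. A kernel launched
// this way needs a bounds check such as `if (get_global_id(0) < n)`.
inline std::size_t padding_items(std::size_t n, std::size_t alignment) {
    return alignup(n, alignment) - n;
}
```

With this definition, a size that is already a multiple of the work-group size is left unchanged, and any other size is padded up to the next multiple.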


The current operating system is Gentoo Linux, kernel version 3.7.1. The ati-drivers package is version 13.1. However, I have observed this behavior on several machines, across several consecutive versions of ati-drivers and several Linux kernels.


Is this a bug in AMD OpenCL, or am I doing something wrong?