I solved the first issue myself:
Because of the weakly consistent memory model, accesses to counter caused unexpected behavior.
It was fixed by using a mem_fence:
counter = (counter + 1) % 10;
That said, omitting the mem_fence caused some instability in the driver, which the development team may want to review.
I found no way to solve the second issue. Is it possible to force the use of C11 atomics in an OpenCL kernel launched on the CPU?
I hope you have fixed your second issue by now.
@"I might want to launch this kernel in the CPU, but when building the program, it does not recognize the #pragma."
The "unrecognized OpenCL extension" error appears when building the kernel because the cl_amd_c11_atomics extension may not be supported by the CPU device. It is an AMD vendor-specific extension that AMD supports on its GPUs. To find out whether AMD also supports it on the CPU, check the output of clinfo.
You can also query the supported extensions for a platform using clGetPlatformInfo() with the param_name parameter set to CL_PLATFORM_EXTENSIONS, or for a specific device using clGetDeviceInfo() with param_name set to CL_DEVICE_EXTENSIONS.
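Both queries return the extensions as a single space-separated string, so the host still has to search it. Here is a minimal sketch of that search in plain C. The helper name has_extension is my own (hypothetical, not part of the OpenCL API), and the string is assumed to have been filled in already by one of the calls above:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an OpenCL API function).
 * Returns true if `name` appears as a complete, space-delimited token in
 * `extensions`. A bare strstr() is not quite enough, because it would also
 * match a name that is only a prefix or substring of a longer one. */
bool has_extension(const char *extensions, const char *name)
{
    size_t len = strlen(name);
    if (len == 0)
        return false;

    const char *p = extensions;
    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at the beginning of the string or after a space... */
        bool starts_token = (p == extensions) || (p[-1] == ' ');
        /* ...and must end at a space or at the end of the string. */
        bool ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return true;
        p += 1; /* keep searching after this partial match */
    }
    return false;
}
```

On the host you would typically call clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 0, NULL, &size) to get the required size, allocate a buffer, call it again to fill the buffer, and then pass that buffer to has_extension(buffer, "cl_amd_c11_atomics") before deciding whether to build the kernel for that device.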
The better way to approach this case is to:
1. Query for the specific extension on the host side using clGetDeviceInfo() and, if it is not supported, return a custom message.
2. As per the OpenCL spec, each extension that affects kernel code compilation must define a macro with the name of the extension. This allows the kernel code to be compiled differently depending on whether the extension is supported. For example, the following check can be used to test whether a specific extension is supported. On the kernel side, add:
#ifndef cl_amd_c11_atomics
#error "cl_amd_c11_atomics extension not supported"
#endif
#pragma OPENCL EXTENSION cl_amd_c11_atomics : enable
Thank you for your answer.
As you pointed out, I checked and confirmed that the extension is not available on the CPU.
In the end, I implemented a plain (non-OpenCL) C++11 version of the algorithm to run on the CPU.
I appreciate your answer, as it could be useful in another context in the future.