Thanks for reporting this, it should be fixed in our upcoming release.
I have the same problem.
So we cannot run this kernel on the GPU for now? What is the source of the problem? I really need to run my kernel on the GPU, so I would like to know where the crash comes from in order to avoid it.
Our next release is coming soon and it includes the fix. The only workaround is to avoid using structures, which is not realistic.
Thank you for your quick answers.
I tried the new release but the crash still occurs... Any idea?
EDIT: I made a mistake, I was running the old release. Sorry. The bug is indeed fixed now, but I have a new one...
Anyway, I have a new issue.
I am running my kernel on a FireStream 9270 and get this build log:
Warning: W000: Barrier caused limited groupsize
Warning: W001: Dangerous Barrier Opt Detected!
Error: E005: Byte Addressable Stores Invalid!
I included the #pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable in my kernel code.
I think that the new SDK 2.1 supports the byte addressable store extension. Is it a known issue with the FireStream GPU? Any idea?
The FireStream 9270 does not support byte addressable stores, as the hardware is based on the RV770 chip and not on the Evergreen family of chips.
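Enabling the pragma in the kernel does not make the hardware support the extension, so it can help to check for it on the host before building. The following is only a sketch: it assumes the extension list has already been read into a string with clGetDeviceInfo() and CL_DEVICE_EXTENSIONS, and has_extension() is a hypothetical helper name, not part of the OpenCL API.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` appears as a whole space-separated token in
 * `extensions`, 0 otherwise. `extensions` is assumed to be the string
 * returned by clGetDeviceInfo(..., CL_DEVICE_EXTENSIONS, ...). */
int has_extension(const char *extensions, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = extensions;

    if (name_len == 0)
        return 0;

    while ((p = strstr(p, name)) != NULL) {
        /* Accept the match only if it is delimited by spaces or by the
         * ends of the string, so a longer name does not match. */
        int starts_token = (p == extensions) || (p[-1] == ' ');
        int ends_token = (p[name_len] == '\0') || (p[name_len] == ' ');
        if (starts_token && ends_token)
            return 1;
        p += 1;
    }
    return 0;
}
```

If has_extension(ext, "cl_khr_byte_addressable_store") returns 0 for the device, a kernel that writes single bytes to memory should be expected to fail to build, as in the log above.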
Ok, thank you for your help.
I have a new error and did not find any thread about it on the forum.
When I run my kernel on the GPU (FireStream), this error occurs:
Error: E010: Irreducible ControlFlow Detected
What does it mean?
Could you post the source code?
This means that some of the optimizations being applied produce irreducible control flow. This is illegal with our implementation at this time. Please send an example of this to email@example.com cc: Micah and I'll send it to the relevant engineer to look at.
Actually I cannot post or send the source code for confidentiality reasons.
But I tried to reduce the code to find where the error comes from and I found a clue. When the kernel has only a few functions, I get this error message:
in clEnqueueNDRangeKernel, CL_INVALID_WORK_GROUP_SIZE
And then, when I run the complete code, I get the ControlFlow error message. So I guess they are linked.
But what surprises me is the first error. Here is my configuration. In file.hpp:
#define BLOCK 10
#define BLOCK_WIDTH 11
#define BLOCK_HEIGHT 15
globalThreads[0] = (size_t)BLOCK_WIDTH*BLOCK;
globalThreads[1] = (size_t)BLOCK_HEIGHT*BLOCK;
localThreads = BLOCK;
I should also say that the code runs perfectly on the CPU. And the CLInfo sample program provides this information for the FireStream GPU:
Device type: CL_DEVICE_TYPE_GPU
Max compute units : 10
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Did I misunderstand anything?
1. The maximum workgroup size also depends on your kernel. You can query the maximum workgroup size for your kernel using the clGetKernelWorkGroupInfo() API call.
2. Since you said you can't send the whole kernel, narrow down which function/snippet is causing the control flow error and forward that to firstname.lastname@example.org.
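To illustrate point 1, here is a rough host-side pre-check of the launch configuration. It is only a sketch under assumptions: kernel_max_group_size stands for the value returned by clGetKernelWorkGroupInfo() with CL_KERNEL_WORK_GROUP_SIZE, max_item_sizes for CL_DEVICE_MAX_WORK_ITEM_SIZES, and check_ndrange() and its status names are hypothetical, not part of OpenCL. The rules shown (each global size divisible by the local size, work-group total within the limit) follow the OpenCL 1.x description of clEnqueueNDRangeKernel, but the runtime remains the authority.

```c
#include <stddef.h>

enum ndrange_status {
    NDRANGE_OK = 0,
    NDRANGE_BAD_DIM,        /* work_dim is 0 or a local size is 0 */
    NDRANGE_NOT_DIVISIBLE,  /* global[d] is not a multiple of local[d] */
    NDRANGE_ITEM_LIMIT,     /* local[d] exceeds the per-dimension limit */
    NDRANGE_GROUP_LIMIT     /* product of local sizes exceeds the kernel limit */
};

/* Checks a launch configuration against the limits queried beforehand.
 * All arrays must hold at least work_dim entries. */
enum ndrange_status check_ndrange(size_t work_dim,
                                  const size_t *global,
                                  const size_t *local,
                                  const size_t *max_item_sizes,
                                  size_t kernel_max_group_size)
{
    size_t group_size = 1;
    size_t d;

    if (work_dim == 0)
        return NDRANGE_BAD_DIM;

    for (d = 0; d < work_dim; ++d) {
        if (local[d] == 0)
            return NDRANGE_BAD_DIM;
        if (global[d] % local[d] != 0)
            return NDRANGE_NOT_DIVISIBLE;
        if (local[d] > max_item_sizes[d])
            return NDRANGE_ITEM_LIMIT;
        group_size *= local[d];
    }

    /* The kernel-specific limit can be lower than the device-wide one. */
    if (group_size > kernel_max_group_size)
        return NDRANGE_GROUP_LIMIT;

    return NDRANGE_OK;
}
```

Assuming a 2D launch of 110 x 150 work items with a 10 x 10 work group (the second local dimension is a guess, since only one localThreads line was posted), the check passes against a limit of 256 but reports NDRANGE_GROUP_LIMIT for any kernel whose limit is below 100.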
Ok, thank you very much omkaranathan.
I used clGetKernelWorkGroupInfo() and the kernel work group size is actually... 64!
But how is it possible to drop from 256 work items to 64? Is it due to memory usage, private memory I guess?
EDIT: OK, I figured out that using local memory, regardless of the size, means only 64 work items per work group!
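For anyone hitting the same limit, one way to adapt is to derive the work-group shape from the queried limit instead of hard-coding it. This is a minimal sketch, not an SDK routine: pick_local_2d() is a hypothetical helper that brute-forces the largest 2D work group whose sides divide the global sizes and whose product stays within max_group_size. It ignores the per-dimension limits and says nothing about which shape performs best.

```c
#include <stddef.h>

/* Picks local[0] x local[1] such that local[d] divides global[d] and
 * local[0] * local[1] <= max_group_size, maximizing the product.
 * Falls back to 1 x 1 when nothing larger fits. */
void pick_local_2d(const size_t global[2], size_t max_group_size,
                   size_t local[2])
{
    size_t best = 0;
    size_t lx, ly;

    local[0] = 1;
    local[1] = 1;

    for (lx = 1; lx <= global[0] && lx <= max_group_size; ++lx) {
        if (global[0] % lx != 0)
            continue;
        for (ly = 1; ly <= global[1] && lx * ly <= max_group_size; ++ly) {
            if (global[1] % ly != 0)
                continue;
            if (lx * ly > best) {
                best = lx * ly;
                local[0] = lx;
                local[1] = ly;
            }
        }
    }
}
```

With a global size of 110 x 150 and a limit of 64, the largest shape that fits has 60 work items per group. The result can then be passed as local_work_size to clEnqueueNDRangeKernel().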