I tried to compile my code in the Stream Kernel Analyzer (using Stream SDK 2.01)
and got the following error. I am new to OpenCL, so please help me.
OpenCL Compile Error: clBuildProgram failed (CL_BUILD_PROGRAM_FAILURE).
llc has failed
----------------------------------------
Stack dump:
0. Program arguments: C:\Program Files (x86)\ATI Stream\bin\x86\llc -mcpu=atir710 -mattr=mwgs-3-128-1-1 -regalloc=linearscan -mtriple=amdil-pc-amdopencl C:\Users\sonkanit\AppData\Local\Temp\OCL24A9.tmp.bc -f -o C:\Users\sonkanit\AppData\Local\Temp\OCL24A9.tmp.il
1. Running pass 'AMDIL DAG->DAG Pattern Instruction Selection' on function '@__OpenCL_transformKernel_kernel'
0072EFF5 (0x011AFC24 0x00000000 0x01D2E584 0x0186B358)
typedef struct
{
    uint primitiveIndex;   //Index of primitive in PrimitiveBuffer
    uint numPrimitives;    //Number of Primitives
    uint materialIndex;    //Index of material in matBuffer
    uint transformIndex;   //Index of transformation in tranBuffer
} RTObject;

typedef struct
{
    uchar primitiveType;   //Type of primitive 0 = ignore, 1 = triangle, 2 = sphere, ...
    float4 vertices[3];    //Vertex Positions
    float4 normals[3];     //Vertex Normals
    uint objectIndex;      //Index of object in objectBuffer
} RTPrimitive;

typedef struct
{
    float opacity;
    float ior;
    float reflectCoeff;
    float4 diffuse;
    float4 specular;
} RTMaterial;

typedef struct
{
    float16 matrix;
} RTTransformation;

typedef struct
{
    float4 position;  //position
    float4 color;     //color
    float falloff;    //attenuation
    uint objectID;    //area light object
} RTLight;

typedef struct
{
    float4 origin;
    float4 direction;
} RTRay;

__kernel void transformKernel(__global RTObject* objects,
                              __global RTTransformation* transformations,
                              __global RTPrimitive* primitives)
{
    uint i = get_global_id(0);
    RTPrimitive primitive = primitives[i];

    //Transform vertices
    float16 matrix = transformations[objects[primitive.objectIndex].transformIndex].matrix;
    float f = matrix.s0 * 2;
    primitives[i].vertices[0].s0 = f * primitive.vertices[0].s0;
}
Hi,
I have the same problem.
So we cannot run this kernel on the GPU for now? What could be the source of the problem? I really need to run my kernel on the GPU, so I would like to know where the crash comes from in order to avoid it.
Thanks.
Hi MicahVillmow,
Thank you for your quick answers.
I tried the new release but the crash still occurs... Any idea?
EDIT: My mistake, I was still running the old release. Sorry. With the new release the bug is indeed fixed, but I have run into a new issue.
I am running my kernel on a Firestream 9270 and get this build log :
Warning: W000: Barrier caused limited groupsize
Warning: W001: Dangerous Barrier Opt Detected!
...
Error: E005: Byte Addressable Stores Invalid!
I included #pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable in my kernel code.
I thought the new SDK 2.1 supports the byte addressable extension. Is this a known issue with the FireStream GPU? Any idea?
Ok, thank you for your help.
MicahVillmow,
A new error has occurred and I did not find any thread about it on the forum.
When I run my kernel on the GPU (FireStream), I get this error:
Error: E010: Irreducible ControlFlow Detected
What does it mean?
Could you post the source code?
Actually I cannot post or send the source code for confidentiality reason.
But I tried to reduce the code to find where the error comes from, and I found a clue. When the kernel has only a few functions, I get this error message:
in clEnqueueNDRangeKernel, CL_INVALID_WORK_GROUP_SIZE
And when I run the complete code I get the ControlFlow error message instead, so I guess they are linked.
But what surprises me is the first error. Here is my configuration. In file.hpp:
#define BLOCK 10
#define BLOCK_WIDTH 11
#define BLOCK_HEIGHT 15
size_t globalThreads[2];
size_t localThreads[2];
In file.cpp:
globalThreads[0] = (size_t)BLOCK_WIDTH * BLOCK;
globalThreads[1] = (size_t)BLOCK_HEIGHT * BLOCK;
localThreads[0] = BLOCK;

clEnqueueNDRangeKernel(commandQueue,
                       kernel,
                       2,             // work dimensions
                       NULL,          // global work offset
                       globalThreads,
                       localThreads,
                       0,
                       NULL,
                       &events[0]);
I should also say that the code runs perfectly on the CPU. The CLInfo sample program reports this for the FireStream GPU:
Device type: CL_DEVICE_TYPE_GPU
Max compute units : 10
Max work items dimension: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Did I misunderstand anything?
1. The maximum workgroup size also depends on your kernel. You can query the per-kernel maximum using the clGetKernelWorkGroupInfo() API call.
2. Since you said you can't send the whole kernel, narrow down which function/snippet is causing the control-flow error and forward that to streamdeveloper@amd.com.
Ok, thank you very much omkaranathan.
I used clGetKernelWorkGroupInfo and the kernel work group size is actually... 64!
But how is it possible to drop from 256 work-items down to 64? Is it due to memory usage, private memory I would guess?
EDIT: OK, I figured it out: using local memory, regardless of its size, limits you to only 64 work-items per work group!