I tried to compile my code in the Stream Kernel Analyzer (using Stream SDK 2.01)
and got the following error. I am new to OpenCL, so please help me.
OpenCL Compile Error: clBuildProgram failed (CL_BUILD_PROGRAM_FAILURE).
llc has failed
----------------------------------------
Stack dump:
0. Program arguments: C:\Program Files (x86)\ATI Stream\bin\x86\llc -mcpu=atir710 -mattr=mwgs-3-128-1-1 -regalloc=linearscan -mtriple=amdil-pc-amdopencl C:\Users\sonkanit\AppData\Local\Temp\OCL24A9.tmp.bc -f -o C:\Users\sonkanit\AppData\Local\Temp\OCL24A9.tmp.il
1. Running pass 'AMDIL DAG->DAG Pattern Instruction Selection' on function '@__OpenCL_transformKernel_kernel'
0072EFF5 (0x011AFC24 0x00000000 0x01D2E584 0x0186B358)
typedef struct
{
    uint primitiveIndex;   //Index of primitive in PrimitiveBuffer
    uint numPrimitives;    //Number of Primitives
    uint materialIndex;    //Index of material in matBuffer
    uint transformIndex;   //Index of transformation in tranBuffer
} RTObject;

typedef struct
{
    uchar primitiveType;   //Type of primitive 0 = ignore, 1 = triangle, 2 = sphere, ...
    float4 vertices[3];    //Vertex Positions
    float4 normals[3];     //Vertex Normals
    uint objectIndex;      //Index of object in objectBuffer
} RTPrimitive;

typedef struct
{
    float opacity;
    float ior;
    float reflectCoeff;
    float4 diffuse;
    float4 specular;
} RTMaterial;

typedef struct
{
    float16 matrix;
} RTTransformation;

typedef struct
{
    float4 position;  //position
    float4 color;     //color
    float falloff;    //attenuation
    uint objectID;    //area light object
} RTLight;

typedef struct
{
    float4 origin;
    float4 direction;
} RTRay;

__kernel void transformKernel(__global RTObject* objects,
                              __global RTTransformation* transformations,
                              __global RTPrimitive* primitives)
{
    uint i = get_global_id(0);
    RTPrimitive primitive = primitives[i];

    //Transform vertices
    float16 matrix = transformations[objects[primitive.objectIndex].transformIndex].matrix;
    float f = matrix.s0 * 2;
    primitives[i].vertices[0].s0 = f * primitive.vertices[0].s0;
}
Hi,
I have the same problem.
So we cannot run this kernel on the GPU for now? What could be the source of the problem? I really need to run my kernel on the GPU, so I would like to know where the crash comes from in order to avoid it.
Thanks.
Hi MicahVillmow,
Thank you for your quick answers.
I tried the new release but the crash still occurs... Any idea?
EDIT: My mistake, I was still running the old release. Sorry. With the new release the bug is indeed fixed, but I have run into a new issue.
I am running my kernel on a Firestream 9270 and get this build log :
Warning: W000: Barrier caused limited groupsize
Warning: W001: Dangerous Barrier Opt Detected!
...
Error: E005: Byte Addressable Stores Invalid!
I included #pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable in my kernel code.
I thought the new SDK 2.1 supports the byte addressable extension. Is this a known issue with the FireStream GPU? Any idea?
Ok, thank you for your help.
MicahVillmow,
A new error has occurred and I did not find any thread about it on the forum.
When I run my kernel on the GPU (FireStream), I get this error:
Error: E010: Irreducible ControlFlow Detected
What does it mean?
Could you post the source code?
Actually I cannot post or send the source code for confidentiality reason.
But I tried to reduce the code to find where the error comes from, and I found a clue. When the kernel has only a few functions, I get this error message:
in clEnqueueNDRangeKernel, CL_INVALID_WORK_GROUP_SIZE
And when I run the complete code I get the ControlFlow error message instead, so I guess they are linked.
But what surprises me is the first error. Here is my configuration. In file.hpp:
#define BLOCK 10
#define BLOCK_WIDTH 11
#define BLOCK_HEIGHT 15
size_t globalThreads[2];
size_t localThreads[2];
In file.cpp:
globalThreads[0] = (size_t)BLOCK_WIDTH * BLOCK;
globalThreads[1] = (size_t)BLOCK_HEIGHT * BLOCK;
localThreads[0] = BLOCK;

clEnqueueNDRangeKernel(commandQueue,
                       kernel,
                       2,             // work dimensions
                       NULL,          // global work offset
                       globalThreads,
                       localThreads,
                       0,
                       NULL,
                       &events[0]);
I should also say that the code runs perfectly on the CPU. The CLInfo sample program reports this for the FireStream GPU:
Device type: CL_DEVICE_TYPE_GPU
Max compute units : 10
Max work items dimension: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Did I misunderstand anything?
1. The maximum workgroup size also depends on your kernel. You can query the per-kernel maximum using the clGetKernelWorkGroupInfo() API call.
2. Since you said you can't send the whole kernel, narrow down which function/snippet is causing the control-flow error and forward that to streamdeveloper@amd.com.
Ok, thank you very much omkaranathan.
I used clGetKernelWorkGroupInfo and the kernel work group size is actually... 64!
But how is it possible to drop from 256 work-items down to 64? Is it due to memory usage, private memory I would guess?
EDIT: OK, I figured it out: using local memory, regardless of its size, limits you to only 64 work-items per work group!