13 Replies Latest reply on May 12, 2010 9:59 PM by renoViry

    Compiler crashes in my cl code



      I tried to compile my code in Stream Kernel Analyzer (using Stream SDK 2.01).

      I got the following error. I am new to OpenCL, so please help me.

      OpenCL Compile Error: clBuildProgram failed (CL_BUILD_PROGRAM_FAILURE).

      llc has failed



      Stack dump:
      0.      Program arguments: C:\Program Files (x86)\ATI Stream\bin\x86\llc -mcpu=a
      tir710 -mattr=mwgs-3-128-1-1 -regalloc=linearscan -mtriple=amdil-pc-amdopencl C:
      \Users\sonkanit\AppData\Local\Temp\OCL24A9.tmp.bc -f -o C:\Users\sonkanit\AppDat
      1.      Running pass 'AMDIL DAG->DAG Pattern Instruction Selection' on function
      0072EFF5 (0x011AFC24 0x00000000 0x01D2E584 0x0186B358)

      typedef struct
      {
          uint primitiveIndex;  //Index of primitive in PrimitiveBuffer
          uint numPrimitives;   //Number of Primitives
          uint materialIndex;   //Index of material in matBuffer
          uint transformIndex;  //Index of transformation in tranBuffer
      } RTObject;

      typedef struct
      {
          uchar primitiveType;  //Type of primitive 0 = ignore, 1 = triangle, 2 = sphere, ...
          float4 vertices[3];   //Vertex Positions
          float4 normals[3];    //Vertex Normals
          uint objectIndex;     //Index of object in objectBuffer
      } RTPrimitive;

      typedef struct
      {
          float opacity;
          float ior;
          float reflectCoeff;
          float4 diffuse;
          float4 specular;
      } RTMaterial;

      typedef struct
      {
          float16 matrix;
      } RTTransformation;

      typedef struct
      {
          float4 position;  //position
          float4 color;     //color
          float falloff;    //attenuation
          uint objectID;    //area light object
      } RTLight;

      typedef struct
      {
          float4 origin;
          float4 direction;
      } RTRay;

      __kernel void transformKernel(__global RTObject* objects,
                                    __global RTTransformation* transformations,
                                    __global RTPrimitive* primitives)
      {
          uint i = get_global_id(0);
          RTPrimitive primitive = primitives[i];

          //Transform vertices
          float16 matrix = transformations[objects[primitive.objectIndex].transformIndex].matrix;
          float f = matrix.s0 * 2;
          primitives[i].vertices[0].s0 = f * primitive.vertices[0].s0;
      }

        • Compiler crashes in my cl code
          Thanks for reporting this. It should be fixed in our upcoming release.
          • Compiler crashes in my cl code
            Our next release is coming soon and it has the fix. The only workaround is to not use structures, which is not realistic.
              • Compiler crashes in my cl code

                Hi MicahVillmow,

                Thank you for your quick answers.

                I tried the new release but the crash still occurs. Any idea?


                EDIT: I made a mistake: I was running the old release. Sorry. The bug is indeed fixed now, but I have a new one.

                  • Compiler crashes in my cl code

                    Anyway, I have a new issue.
                    I am running my kernel on a FireStream 9270 and get this build log:

                                     Warning: W000: Barrier caused limited groupsize
                                     Warning: W001: Dangerous Barrier Opt Detected!
                                     Error: E005: Byte Addressable Stores Invalid!

                    I included #pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable in my kernel code.

                    I thought that the new SDK 2.1 supported the byte-addressable store extension. Is this a known issue with the FireStream GPU? Any idea?

                • Compiler crashes in my cl code
                  The FireStream 9270 does not support byte-addressable stores, because the hardware is based on the RV770 chipset and not on the Evergreen family of chips.
                  • Compiler crashes in my cl code
                    This means that some optimizations are producing irreducible control flow, which is illegal in our implementation at this time. Please send an example of this to streamdeveloper@amd.com, cc: Micah, and I'll send it to the relevant engineer to look at.
                      • Compiler crashes in my cl code

                        Actually, I cannot post or send the source code for confidentiality reasons.

                        But I tried to reduce the code to find where the error comes from, and I found a clue. When the kernel has only a few functions, I get this error message:

                        in clEnqueueNDRangeKernel, CL_INVALID_WORK_GROUP_SIZE

                        And then, when I run the complete code, I get the ControlFlow error message. So I guess they are linked.

                        What surprises me is the first error. Here is my configuration. In file.hpp:

                        #define BLOCK 10 

                        #define BLOCK_WIDTH 11

                        #define BLOCK_HEIGHT 15

                        size_t      globalThreads[2];

                        size_t      localThreads[2];

                        In file.cpp:


                        globalThreads[0] = (size_t)BLOCK_WIDTH*BLOCK;

                        globalThreads[1] = (size_t)BLOCK_HEIGHT*BLOCK;


                        localThreads[0] = BLOCK;
                        localThreads[1] = BLOCK; 


                        I should also say that the code runs perfectly on the CPU. The CLInfo sample program reports this information for the FireStream GPU:

                        Device type: CL_DEVICE_TYPE_GPU
                        Max compute units : 10
                        Max work items dimension: 3
                        Max work items[0]: 256
                        Max work items[1]: 256
                        Max work items[2]: 256
                        Max work group size: 256

                        Did I misunderstand anything?

                          • Compiler crashes in my cl code

                            1. The maximum workgroup size also depends on your kernel. You can query the maximum workgroup size for your kernel with the clGetKernelWorkGroupInfo() API call.

                            2. Since you said you can't send the whole kernel, narrow down which function/snippet is causing the control flow error and forward that to streamdeveloper@amd.com.


                              • Compiler crashes in my cl code

                                OK, thank you very much omkaranathan.

                                I called clGetKernelWorkGroupInfo and the kernel work-group size is actually ... 64!

                                But how can it drop from 256 work-items to 64? Is it due to memory usage (private memory, I guess)?

                                EDIT: OK, I figured out that using local memory, regardless of the size, means only 64 work-items per work-group!