inducer77
Adept II

Tahiti compiler crash

Hi there,

the attached PyOpenCL script (really, just the embedded CL C) causes the 12.8 OpenCL compiler to segfault when targeting Tahiti.

Any help would be much appreciated.

Thanks!

Andreas

0 Likes
10 Replies
Wenju
Elite

Hi Andreas,

Compilation can fail if the kernel is too complex. I compiled the following kernel using AMD APP KernelAnalyzer:

#define myfloat4 float4
#define myfloat float
#define NqP 64
#define PADP 64
#define BSIZEP 32


__kernel void code(const int K,
                           __global const int      * restrict galnums,
                           __global const myfloat4 * restrict u
                           )
{
  /* advects (u.grad)(u,v,w,T) */
  /* shared register for 'r,s' plane */
  __local myfloat LPVL[NqP][NqP+PADP];

  volatile __local myfloat Lw[NqP];

  // u[:] -> uk[:]

  __local myfloat4  Lu1[NqP][NqP][NqP+PADP]; // use shared ?

  const unsigned int e = get_group_id(0);
  const unsigned int i = get_local_id(0);
  const unsigned int j = get_local_id(1);
  unsigned int k;

  int m;


  for(k=0;k<NqP;++k) {

    const int id = e*BSIZEP+k*NqP*NqP+j*NqP+i;

    const int gid = galnums[id];

    const myfloat sc = (Lw*Lw*Lw);

    myfloat4 uk = 0;

    if(gid>=0)
      uk = sc*u[gid];

    for(m=0;m<NqP;++m) Lu1 += LPVL*uk;
  }

  barrier(CLK_LOCAL_MEM_FENCE);

  for(k=0;k<NqP;++k) {
    myfloat4 tmp1 = 0;

    for(m=0;m<NqP;++m) tmp1 += LPVL*Lu1;

    barrier(CLK_LOCAL_MEM_FENCE);
    Lu1 = tmp1;
  }

}

The compilation failed, and I don't know whether the kernel is correct.

0 Likes

Thanks for your answer. That said, to me the only acceptable outcomes for a compiler are 1) a working binary or 2) an intelligible error message. Note how a segmentation fault does not fall into either of these categories.

0 Likes

Hi inducer77,

Your kernel failed to compile because it requires more than 2,100,000 bytes of local memory per workgroup.

GPUs divide their local memory between the compute units. Tahiti has 64K bytes on each of 32 CUs, for a total of 2,097,152 bytes. Each kernel (workgroup) can use a maximum of 32768 bytes, but if 2 kernels are running on the same CU, then all 64K will be used.

The kernel compiles fine if the local memory is within spec. Using the Kernel Analyzer, I got the error message:

calclCompile failedError: Creating kernel code failed!
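
For reference, the local-memory footprint can be worked out by hand from the three __local declarations in the kernel. The following is a minimal sketch in Python, assuming the usual OpenCL C sizes of 4 bytes for float and 16 bytes for float4; the helper name local_bytes is made up for illustration, and the limit is simply the per-workgroup figure quoted above.

```python
# Local-memory footprint of the three __local arrays declared in the kernel.
# Assumption: OpenCL C float is 4 bytes and float4 is 16 bytes.
SIZEOF_FLOAT = 4
SIZEOF_FLOAT4 = 16

NqP = 64
PADP = 64


def local_bytes(elem_size, *dims):
    """Bytes taken by a dense array with the given element size and dimensions."""
    total = elem_size
    for d in dims:
        total *= d
    return total


usage = {
    "LPVL": local_bytes(SIZEOF_FLOAT, NqP, NqP + PADP),        # myfloat  [64][128]
    "Lw": local_bytes(SIZEOF_FLOAT, NqP),                      # myfloat  [64]
    "Lu1": local_bytes(SIZEOF_FLOAT4, NqP, NqP, NqP + PADP),   # myfloat4 [64][64][128]
}
total = sum(usage.values())

# Per-workgroup figure quoted in this thread for Tahiti. On a real device the
# value would come from the CL_DEVICE_LOCAL_MEM_SIZE device query.
PER_WORKGROUP_LIMIT = 32768

for name, nbytes in usage.items():
    print(f"{name:5s} {nbytes:>10,d} bytes")
print(f"total {total:>10,d} bytes, limit {PER_WORKGROUP_LIMIT:,d} bytes")
```

Under these assumptions Lu1 alone needs 8,388,608 bytes, and LPVL by itself already takes the full 32768 bytes, so the kernel cannot fit regardless of how the remaining arrays are sized.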

inducer77
Adept II

Dear drallan,

thanks for your answer. It would still be great if the compiler (called from CL) didn't crash, but just said what its problem is...

Andreas

0 Likes

Are you sure that the segmentation fault happens while compiling the kernel, and not because your program is trying to run a nonexistent program binary (due to a compilation error that went undetected)? Perhaps that is a downside of using Python?

In the past I had a kernel which caused an actual crash at the program build stage, and it also caused a crash in Kernel Analyzer. (I believe the same code is used for building programs in both cases, even between Linux and Windows.)

When I copy/pasted the kernel into one of my C/C++ programs I get the same error as drallan logged from the build process:

calclCompile failedError: Creating kernel code failed!

The program build also returns the error code CL_BUILD_PROGRAM_FAILURE, and no segmentation fault occurs, since I have an if statement which exits the program if the kernel can't be built.

Sure enough, the error message could have been more sophisticated...

0 Likes

I am quite sure that the Python wrapper checks for all possible error conditions (and reports them). I wrote the thing, after all.

Now this may come down to a difference in driver versions--I'm using version 12.8 of the driver package as shipped by Debian, here:

http://packages.debian.org/experimental/amd-opencl-icd

Andreas

0 Likes

To be honest, I don't trust Python that much; when there is a problem, Python is often involved too, hehe.

I would think that the compilation is independent of the driver. Did you not install amd-app-sdk 2.7?

Try the attached sample, which contains test_kern.cpp and kernel.cl:

$ g++ test_kern.cpp -lOpenCL

$ ./a.out

...

blah blah

...

Building OpenCL executable...

Error: Failed to build program executable!

-11

calclCompile failedError: Creating kernel code failed!

$

If your platform and device are not the first ones listed, you might have to edit some variables in the code (it's a quick hack).

0 Likes

Are you able to see the error message with my code?

0 Likes

Thanks for your help, but given that you're name-calling other people's code for causing trouble, I'm surprised by your code's brittleness. Here's how it fails:

Available platforms: 2

-------------------------------------------------------------------------------

Platform ID - Name : 0* - Intel(R) OpenCL

Platform Profile: FULL_PROFILE

Platform Version: OpenCL 1.1 LINUX

Platform Vendor: Intel(R) Corporation

Platform Extensions: Intel(R) Corporation

Platform ID - Name : 1  - AMD Accelerated Parallel Processing

Platform Profile: FULL_PROFILE

Platform Version: OpenCL 1.2 AMD-APP (938.2)

Platform Vendor: Advanced Micro Devices, Inc.

Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

-------------------------------------------------------------------------------

* - Selected

Error: Failed to get device IDs!

(and even when convinced to pick the AMD platform, it would die with the same error, and even then it would likely pick just any GPU, and thereby not the right one--the machine in question has a Tahiti and a Cayman, with the Cayman as device 0.)

0 Likes

Like I said, you should select the right platform by changing some variables in the code. (If you did this, I don't understand why you posted output with the wrong platform selected. Did you forget to recompile?) You should have changed the platform ID to 1 at the beginning of the code, then recompiled:

#define PLATFORM_ID 1

As you can see from the output, it chose the platform with ID 0 (marked by *), which is Intel OpenCL and therefore does not contain any GPUs. It therefore can't get device IDs, and it correctly exits the program without a segmentation fault.
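
Hard-coded platform and device indices are what goes wrong on a machine with several platforms or GPUs, so one alternative is to select by name. The following is a minimal sketch in Python; pick_by_name is a hypothetical helper, and the SimpleNamespace objects are stand-ins for the listings in this thread rather than real OpenCL handles. With PyOpenCL, the candidate lists would presumably come from cl.get_platforms() and platform.get_devices(device_type=cl.device_type.GPU), whose entries also expose a name attribute.

```python
from types import SimpleNamespace


def pick_by_name(candidates, fragment):
    """Return the first candidate whose .name contains fragment (case-insensitive)."""
    wanted = fragment.lower()
    for candidate in candidates:
        if wanted in candidate.name.lower():
            return candidate
    names = ", ".join(c.name for c in candidates) or "none"
    raise LookupError(f"no match for {fragment!r}; available: {names}")


# Stand-ins that mimic the machine described above (assumed device names):
# two platforms, and two GPUs with the Cayman listed first.
platforms = [
    SimpleNamespace(name="Intel(R) OpenCL"),
    SimpleNamespace(name="AMD Accelerated Parallel Processing"),
]
gpus = [SimpleNamespace(name="Cayman"), SimpleNamespace(name="Tahiti")]

print(pick_by_name(platforms, "AMD").name)
print(pick_by_name(gpus, "tahiti").name)
```

Raising an error that lists the available names makes a wrong selection visible immediately, instead of silently falling back to index 0.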

As an example of a test case with NVIDIA and AMD platforms (from a machine with Nvidia GPUs only):

Available platforms: 2

-------------------------------------------------------------------------------

Platform ID - Name : 0* - NVIDIA CUDA

Platform Profile: FULL_PROFILE

Platform Version: OpenCL 1.1 CUDA 4.2.1

Platform Vendor: NVIDIA Corporation

Platform Extensions: NVIDIA Corporation

Platform ID - Name : 1  - AMD Accelerated Parallel Processing

Platform Profile: FULL_PROFILE

Platform Version: OpenCL 1.2 AMD-APP (923.1)

Platform Vendor: Advanced Micro Devices, Inc.

Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

-------------------------------------------------------------------------------

* - Selected

Available devices: 2 (can use only 1 for now!)

-------------------------------------------------------------------------------

1* - NVIDIA Corporation Tesla M2050...

2  - NVIDIA Corporation Tesla M2050...

-------------------------------------------------------------------------------

* - Selected

Device preferred float vector size 1

Trying to use OpenCL source file kernel.cl

Mapped the source file (1163 bytes) to 0x7f0ce0e0c000

Building OpenCL executable...

Error: Failed to build program executable!

-42

ptxas error   : Entry function 'code' uses too much shared data (0x808100 bytes, 0xc000 max)

I see now that NVIDIA's error is much nicer...
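
The ptxas message reports both figures in hexadecimal. A small sketch in Python that converts them to decimal; the regular expression is written against this one message only and is an assumption about its format, not a documented ptxas interface.

```python
import re

# Message copied verbatim from the build log above.
message = (
    "ptxas error   : Entry function 'code' uses too much shared data "
    "(0x808100 bytes, 0xc000 max)"
)

# Pull out the two hexadecimal figures: requested bytes and the allowed maximum.
match = re.search(r"\((0x[0-9a-fA-F]+) bytes, (0x[0-9a-fA-F]+) max\)", message)
if match is None:
    raise ValueError("message does not have the expected shape")

used, limit = (int(group, 16) for group in match.groups())
print(f"requested {used:,d} bytes, limit {limit:,d} bytes")
```

The requested size works out to 8,421,632 bytes against a limit of 49,152 bytes on this device.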

On this specific machine there are no AMD GPUs so I set:

#define PLATFORM_ID 1

#define DEV_TYPE "CPU"

Available platforms: 2

-------------------------------------------------------------------------------

Platform ID - Name : 0  - NVIDIA CUDA

Platform Profile: FULL_PROFILE

Platform Version: OpenCL 1.1 CUDA 4.2.1

Platform Vendor: NVIDIA Corporation

Platform Extensions: NVIDIA Corporation

Platform ID - Name : 1* - AMD Accelerated Parallel Processing

Platform Profile: FULL_PROFILE

Platform Version: OpenCL 1.2 AMD-APP (923.1)

Platform Vendor: Advanced Micro Devices, Inc.

Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

-------------------------------------------------------------------------------

* - Selected

Error: Failed to get device IDs!

-bash-4.1$ g++ test_kern.cpp -lOpenCL -I$OPENCL_INCLUDE

-bash-4.1$ ./a.out

Available platforms: 2

-------------------------------------------------------------------------------

Platform ID - Name : 0  - NVIDIA CUDA

Platform Profile: FULL_PROFILE

Platform Version: OpenCL 1.1 CUDA 4.2.1

Platform Vendor: NVIDIA Corporation

Platform Extensions: NVIDIA Corporation

Platform ID - Name : 1* - AMD Accelerated Parallel Processing

Platform Profile: FULL_PROFILE

Platform Version: OpenCL 1.2 AMD-APP (923.1)

Platform Vendor: Advanced Micro Devices, Inc.

Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

-------------------------------------------------------------------------------

* - Selected

Available devices: 1 (can use only 1 for now!)

-------------------------------------------------------------------------------

1* - GenuineIntel Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz...

-------------------------------------------------------------------------------

* - Selected

Device preferred float vector size 4

Trying to use OpenCL source file kernel.cl

Mapped the source file (1163 bytes) to 0x7f1411a2c000

Building OpenCL executable...

Error: Failed to create compute kernel!

-46
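
The bare numbers printed in these logs are OpenCL status codes. A minimal lookup sketch in Python covering only the three codes that appear in this thread; the names are taken from the standard cl.h header, and the describe helper is hypothetical.

```python
# Status codes that appear in the logs in this thread, with their cl.h names.
CL_ERROR_NAMES = {
    -11: "CL_BUILD_PROGRAM_FAILURE",  # AMD GPU build
    -42: "CL_INVALID_BINARY",         # NVIDIA build
    -46: "CL_INVALID_KERNEL_NAME",    # AMD CPU, kernel creation
}


def describe(code):
    """Return the symbolic name for a status code, or a fallback string."""
    return CL_ERROR_NAMES.get(code, f"unknown OpenCL error {code}")


for code in (-11, -42, -46):
    print(code, describe(code))
```

Printing the symbolic name next to the number makes it easier to tell a build failure apart from a failure at kernel creation.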

0 Likes