The problem happens only when my program is used with a binary kernel image compiled for the same device; building from kernel sources works fine.
My program uses FFT_Kernels.cl from your samples. The binary image was saved and loaded using your code in CLUtils.cpp,
converted to C, however.
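For reference, the file-handling side of my save/load mechanism looks roughly like the sketch below (the helper names are my own, not from CLUtils.cpp; in the real program the bytes come from clGetProgramInfo with CL_PROGRAM_BINARIES and the filename from clGetDeviceInfo with CL_DEVICE_NAME, e.g. FFT_Kernels_Pitcairn):

```c
#include <stdio.h>
#include <stdlib.h>

/* Write a program binary to disk; returns 0 on success.
 * (Hypothetical helper; the binary bytes would come from
 * clGetProgramInfo(prog, CL_PROGRAM_BINARIES, ...).) */
int save_binary(const char *path, const unsigned char *bin, size_t size)
{
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;
    size_t written = fwrite(bin, 1, size, f);
    fclose(f);
    return written == size ? 0 : -1;
}

/* Read a program binary back; caller frees *bin. Returns 0 on success.
 * The loaded bytes would then be handed to clCreateProgramWithBinary. */
int load_binary(const char *path, unsigned char **bin, size_t *size)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    fseek(f, 0, SEEK_END);
    long len = ftell(f);
    fseek(f, 0, SEEK_SET);
    if (len < 0) {
        fclose(f);
        return -1;
    }
    *bin = malloc((size_t)len);
    if (!*bin) {
        fclose(f);
        return -1;
    }
    *size = fread(*bin, 1, (size_t)len, f);
    fclose(f);
    return *size == (size_t)len ? 0 : -1;
}
```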
Running it through the debugger, gdb, shows:
setupCL () at dsp.c:252
252 fft_kernel = clCreateKernel(prog, "fft", &err);
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff3bc71fc in ?? () from /opt/AMDAPPSDK-2.9-1/lib/x86_64/libamdocl64.so
#0 0x00007ffff3bc71fc in ?? ()
#1 0x00007ffff3baa8de in clCreateKernel ()
#2 0x000000000040e0da in setupCL () at dsp.c:252
#3 0x000000000040a29f in main (argc=1, argv=0x7fffffffdd30) at sdmp.l:2097
(gdb) p prog
$6 = (cl_program) 0xdeacd0
Using fglrx-14.501.1003 on Ubuntu 14.04 x64, with SDK 2.9.1 (OpenCL 1.2).
Video card: Sapphire R9 200, codename Pitcairn. CPU: AMD FX(tm)-8320 Eight-Core Processor.
Did you try the command-line arguments "--dump" and "--load" to generate and consume the device binary for any of the SDK samples?
Please make sure that you are providing the correct device binary during "--load" or clCreateProgramWithBinary; the input device binary should match the target device.
I'm not sure what you mean by the --load and --dump command lines. I just use FFT_Kernels.cl from the samples to do my FFT; my program doesn't take these args. I have only one GPU device, and I save the binary image as FFT_Kernels_Pitcairn, where Pitcairn is the codename of my GPU. My app gets this name from clGetDeviceInfo with CL_DEVICE_NAME, both when it saves the image and when it loads it. Not a chance for a mistake :-(
Bear in mind that the same mechanism works fine with kernel sources...
Running file -s FFT_Kernels_Pitcairn reports a "corrupted ELF32 executable".
I finally understood that you meant running the samples with these args. Did it and it worked, so the problem was in my code. I compared my kernel image for FFT_Kernels.cl with the sample's and they were identical, so saving the image was fine. Looking at the loading part, the problem was that I was just loading the binary kernel image, not building it. I didn't suspect that a binary kernel image for a specific device needed further compilation or building :-(.
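For anyone hitting the same crash: even a pre-built device binary must go through clBuildProgram() before clCreateKernel(). A minimal sketch of the corrected loading sequence (error handling trimmed, variable names are my own):

```c
#include <CL/cl.h>
#include <stddef.h>

/* Create and build a program from a previously saved device binary.
 * bin/size are the bytes read back from disk (e.g. FFT_Kernels_Pitcairn);
 * the binary must have been produced for this same device. */
cl_program program_from_binary(cl_context ctx, cl_device_id device,
                               const unsigned char *bin, size_t size)
{
    cl_int err, binary_status;
    cl_program prog = clCreateProgramWithBinary(ctx, 1, &device, &size,
                                                &bin, &binary_status, &err);
    if (err != CL_SUCCESS || binary_status != CL_SUCCESS)
        return NULL;

    /* The step I was missing: the program object must still be built,
     * even when it was created from a binary rather than from source. */
    err = clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    if (err != CL_SUCCESS) {
        clReleaseProgram(prog);
        return NULL;
    }
    return prog;  /* now safe to call clCreateKernel(prog, "fft", &err) */
}
```

Without the clBuildProgram() call, clCreateKernel() was dereferencing an unbuilt program, which explains the SIGSEGV inside libamdocl64.so.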
Now everything works fine. Thank you so much:-)
One last question: is there a compiler option to generate 64-bit kernel code? I used the example's cflags and the resulting kernel image is ELF32. I tried adding the gcc option -m64, but nothing changed. I'm not even sure that the final kernel object is 32-bit, since it is always built in memory.
The bitness of the kernels depends on the bitness of the device. AFAIK you need to set the GPU_FORCE_64BIT_PTR=1 environment variable to build 64-bit code for the device.
It's good that you've identified the problem. As per the OpenCL spec, a kernel program needs to be built/compiled irrespective of whether it's created from source or from binary. That's why clBuildProgram() is a common step for both cases.
As far as I know, there is an environment variable named GPU_FORCE_64BIT_PTR which can be set to 1 to enable 64-bit addressing. For more useful information, you may check this thread: Cannot make OpenCL runtime expose more than 3 GB of RAM. Actually, CL 2.0 supports 64-bit addressing by default when the application is 64-bit.
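For completeness, the variable just needs to be set in the environment of the shell that launches the application (the ./myfft binary name below is a placeholder for your program):

```shell
# Ask the AMD runtime to generate 64-bit GPU code (AMD-specific variable).
export GPU_FORCE_64BIT_PTR=1
echo "$GPU_FORCE_64BIT_PTR"
# Then run the application from this same shell, e.g.:
# ./myfft
```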