I have an OpenCL program that reveals a bug in clBuildProgram on the AMD CPU device. After several recent code changes, my OpenCL kernel/program compiles fine on the Apple and NVIDIA platforms, but clBuildProgram crashes with a segmentation fault on the AMD platform / CPU device.
It is hard to guess what the problem might be. I believe this is a bug in the APP SDK. I am attaching a simple bug demonstration program, but I would prefer to send you the offending kernel source code by private message or email.
Build Command:
g++ -o build_bug_demo opencl_program_build.cpp bugDemoSupport.cpp -I $AMDAPPSDKROOT/include -L $AMDAPPSDKROOT/lib/x86_64 -lOpenCL
Run Command:
./build_bug_demo
(Use the -p option to select the platform if it is not the first one, or -h for help.)
Again, the attached files build a very simple kernel. Please message me for the actual offending OpenCL source code.
(I have also tried this with AMD APP SDK 2.9.1 (version 1445.5) with the same results.)
./build_bug_demo
Selected CL_PLATFORM_NAME: AMD Accelerated Parallel Processing
CL_DEVICE_NAME: AMD Opteron(tm) Processor 6140
CL_DRIVER_VERSION: 1214.3 (sse2)
Loading Source...
clCreateProgramWithSource...
clBuildProgram...
Segmentation fault
I was able to isolate the bug to a pretty simple code case: initializing an empty struct.
The source below demonstrates the segmentation fault.
struct GridDataStruct_defn
{
// empty struct
};
typedef struct GridDataStruct_defn GridDataStruct;
// Kernel block.
kernel void square( const global float* const restrict input, global float* const restrict output)
{
size_t i = get_global_id(0);
output[i] = input[i] * input[i];
const GridDataStruct gridDataStruct = { }; // Offending line
}
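A possible workaround, not confirmed anywhere in this thread, is to avoid the empty struct altogether. Standard C99, on which OpenCL C is based, does not define structs without members or empty initializer lists (both are compiler extensions), so giving the struct a placeholder member keeps the kernel in well-defined territory. The sketch below assumes a made-up member name (unused), and the block under #ifndef __OPENCL_VERSION__ is a hypothetical host-side shim that only exists so the same source can be compiled by an ordinary C compiler for a quick check. Whether this avoids the crash in the AMD compiler is untested.

```c
#ifndef __OPENCL_VERSION__
/* Hypothetical host-side shim: lets a plain C compiler accept the kernel. */
#include <stddef.h>
#define kernel
#define global
static size_t get_global_id(unsigned int dim) { (void)dim; return 0; }
#endif

struct GridDataStruct_defn
{
    int unused; /* placeholder member (assumption): struct is no longer empty */
};
typedef struct GridDataStruct_defn GridDataStruct;

/* Same kernel as above, with a non-empty struct and a non-empty initializer. */
kernel void square(const global float* const restrict input,
                   global float* const restrict output)
{
    size_t i = get_global_id(0);
    output[i] = input[i] * input[i];
    const GridDataStruct gridDataStruct = { 0 };
    (void)gridDataStruct; /* silence the unused-variable warning */
}
```

When the source is passed to clBuildProgram, __OPENCL_VERSION__ is predefined by the OpenCL C compiler, so the shim is skipped and only the struct and kernel are compiled.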
I ran your attached code, and everything compiles fine for me. Your program gave me the following output.
Selected CL_PLATFORM_NAME: NVIDIA CUDA
CL_DEVICE_NAME: GeForce GTX 260
CL_DRIVER_VERSION: 295.41
Loading Source...
clCreateProgramWithSource...
clBuildProgram...
Build complete.
Build-log ( 2 bytes):
The End
Are you sure you're using the kernel code in my second message? The code attached to my first message was supposed to work.
Yes, the kernel code in your second post builds fine for me. No segmentation fault.
Hi,
I was able to reproduce your issue (with the sample kernel code posted on Oct 22, 2014 8:20 PM) on Windows. However, when I compiled the same code with the OpenCL compiler flag "-cl-std=2.0" using the latest driver, it worked fine. If possible, could you please check and share your observations?
Regards,
I tried the build option -cl-std=2.0 as you suggested, but I still get a segmentation fault. I don't have a Windows machine to test with. I'm using the latest AMD APP SDK 2.9.1 on an AMD CPU running Linux.
I used that option with the latest driver that supports OpenCL 2.0. Could you please share your clinfo output?
Regards,
Oh, I see. I'm not sure how I would install the latest OpenCL 2.0 driver support for a Linux CPU device. I'm already running the most recent AMD APP SDK. I see now in the clinfo output that only OpenCL 1.2 is supported, so the cl-std build option was probably ignored anyway.
./AMDAPPSDK-2.9-1/bin/x86_64/clinfo
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.2 AMD-APP (1445.5)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_amd_hsa
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Device Type: CL_DEVICE_TYPE_CPU
Vendor ID: 1002h
Board name:
Max compute units: 32
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 2
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 4
Native vector width double: 2
Max clock frequency: 2599Mhz
Address bits: 64
Max memory allocation: 67754655744
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 65536
Global memory size: 271018622976
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Kernel Preferred work group size multiple: 1
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x00002ac94645cde0
Name: AMD Opteron(tm) Processor 6140
Vendor: AuthenticAMD
Device OpenCL C version: OpenCL C 1.2
Driver version: 1445.5 (sse2)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (1445.5)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_amd_svm cl_khr_gl_event
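Since the device above reports "OpenCL C 1.2", a host program could decide whether to request OpenCL C 2.0 before calling clBuildProgram. The following is a minimal sketch, assuming the version string was already obtained with clGetDeviceInfo and CL_DEVICE_OPENCL_C_VERSION; the function name pick_cl_std_option is hypothetical. Note that the OpenCL specification spells the option as -cl-std=CL2.0.

```c
#include <stdio.h>

/* Hypothetical helper: returns a -cl-std build option for the device,
   or an empty string when OpenCL C 2.0 should not be requested.
   version_string is expected to look like "OpenCL C 1.2 <vendor text>". */
const char *pick_cl_std_option(const char *version_string)
{
    int major = 0;
    int minor = 0;

    /* Unparseable or missing string: pass no -cl-std option at all. */
    if (version_string == NULL ||
        sscanf(version_string, "OpenCL C %d.%d", &major, &minor) != 2)
        return "";

    /* Only request CL2.0 when the device reports OpenCL C 2.x. */
    if (major == 2)
        return "-cl-std=CL2.0";

    return "";
}
```

The returned string can be passed as the options argument of clBuildProgram; for the Opteron 6140 device listed above it would be empty.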
That version of the OpenCL 2.0 driver will not work on your system. BTW, I was able to reproduce the issue with a CPU-only setup with APP SDK 2.9-1 on Windows 7. The same issue is also reproducible using the latest Catalyst driver (I used CodeXL to build the kernel code). So I guess it's a compiler bug. I've filed an internal bug report against it. If I get any update, I'll share it with you.
Regards,