    segmentation fault inside clBuildProgram (bug demonstration attached)


      I have an OpenCL program that exposes a bug in clBuildProgram on the AMD CPU device.  After several recent code changes, my OpenCL kernel/program compiles fine on the Apple and NVIDIA platforms, but clBuildProgram crashes with a segmentation fault on the AMD platform / CPU device.


      It is hard to guess what the problem might be.  I consider this a bug in the APP SDK.  I am attaching a simple bug demonstration program, but I would prefer to send you the offending kernel source code by private message or email.


      Build Command:

      g++ -o build_bug_demo opencl_program_build.cpp bugDemoSupport.cpp -I $AMDAPPSDKROOT/include -L $AMDAPPSDKROOT/lib/x86_64 -lOpenCL


      Run Command:


      (Or perhaps specify -p option to specify the platform if not first platform. Or -h for help)


      Again, the attached files build only a very simple kernel.  Please message me for the actual offending OpenCL source code.


      (I have also tried this with AMD APP SDK 2.9.1 (version 1445.5) with the same results.)



      Selected CL_PLATFORM_NAME: AMD Accelerated Parallel Processing

      CL_DEVICE_NAME: AMD Opteron(tm) Processor 6140

      CL_DRIVER_VERSION: 1214.3 (sse2)

      Loading Source...



      Segmentation fault
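While a crash like this is being investigated, one way to keep the host process alive is to run the build in a child process and check how the child terminated. The following is a minimal POSIX-only sketch, not part of the attached demo: `run_isolated` is a hypothetical helper name, and the `task` callable stands in for code that would create the context and call clBuildProgram. It is probably safest to make every OpenCL call inside the child rather than sharing a context across `fork()`.

```cpp
#include <csignal>
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs `task` in a forked child process.
// Returns 0 if the child exited normally with status 0,
// the signal number if the child was killed by a signal (e.g. SIGSEGV),
// or -1 on any other failure.
int run_isolated(const std::function<void()>& task) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;                 // fork itself failed
    }
    if (pid == 0) {
        task();                    // child: do the risky work here
        _exit(0);                  // skip atexit handlers in the child
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    if (WIFSIGNALED(status)) {
        return WTERMSIG(status);   // child crashed; report which signal
    }
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
        return 0;
    }
    return -1;
}
```

A caller could compare the return value against SIGSEGV and, for example, fall back to a different device or report the failing source file instead of crashing.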

          I was able to isolate the bug to a pretty simple code case:  initializing an empty struct.

          The source below demonstrates the segmentation fault.



          struct GridDataStruct_defn
          {
              // empty struct
          };

          typedef struct GridDataStruct_defn GridDataStruct;

          // Kernel block.
          kernel void square( const global float* const restrict input, global float* const restrict output)
          {
              size_t i = get_global_id(0);

              output[i] = input[i] * input[i];

              const GridDataStruct gridDataStruct = { }; // Offending line
          }
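A possible workaround, not confirmed anywhere in this thread and untested against the AMD compiler, is to avoid the empty struct altogether. Empty structs and empty initializer lists are extensions rather than ISO C99, which OpenCL C 1.2 is based on, so giving the struct one placeholder member keeps the kernel within the standard. The sketch below holds such a variant as a C++ string that could be passed to clCreateProgramWithSource; the member name `unused` and the constant name `kWorkaroundKernelSource` are made up for illustration.

```cpp
#include <string>

// Hypothetical variant of the kernel from the post above. The struct gets one
// placeholder member so that neither the struct nor its initializer is empty.
const std::string kWorkaroundKernelSource = R"CLC(
struct GridDataStruct_defn
{
    char unused; // placeholder member (assumption, not in the original)
};

typedef struct GridDataStruct_defn GridDataStruct;

kernel void square(const global float* const restrict input,
                   global float* const restrict output)
{
    size_t i = get_global_id(0);

    output[i] = input[i] * input[i];

    const GridDataStruct gridDataStruct = { 0 }; // non-empty initializer
}
)CLC";
```

The compiler should of course report an error rather than crash on the original source, so this only sidesteps the problem.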


              I ran your attached code, and everything compiles fine for me. Your program gave me the following output.


              Selected CL_PLATFORM_NAME: NVIDIA CUDA

              CL_DEVICE_NAME: GeForce GTX 260

              CL_DRIVER_VERSION: 295.41

              Loading Source...



              Build complete.



              Build-log ( 2 bytes):






              The End

                I was able to reproduce your issue on Windows with the sample kernel code you posted. However, when I compiled the same code with the OpenCL compiler flag "-cl-std=2.0" using the latest driver, it worked fine. If possible, could you please check and share your observations?


                    I tried the build option  -cl-std=2.0  as you suggested, but I still get a segmentation fault.  I don't have a Windows machine to test with.  I'm using the latest AMD APP SDK 2.9.1 on an AMD CPU running Linux.

                        I used that option with the latest driver that supports OpenCL 2.0. Could you please share your clinfo output?


                            Oh, I see.  I'm not sure how I would install a driver with OpenCL 2.0 support for a CPU device on Linux.  I'm already running the most recent AMD APP SDK.  I see now in the clinfo output that only OpenCL 1.2 is supported, so the cl-std build option was probably ignored anyway.



                            Number of platforms: 1

                              Platform Profile: FULL_PROFILE

                              Platform Version: OpenCL 1.2 AMD-APP (1445.5)

                              Platform Name: AMD Accelerated Parallel Processing

                              Platform Vendor: Advanced Micro Devices, Inc.

                              Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_amd_hsa



                              Platform Name: AMD Accelerated Parallel Processing

                            Number of devices: 1

                              Device Type: CL_DEVICE_TYPE_CPU

                              Vendor ID: 1002h

                              Board name: 

                              Max compute units: 32

                              Max work items dimensions: 3

                                Max work items[0]: 1024

                                Max work items[1]: 1024

                                Max work items[2]: 1024

                              Max work group size: 1024

                              Preferred vector width char: 16

                              Preferred vector width short: 8

                              Preferred vector width int: 4

                              Preferred vector width long: 2

                              Preferred vector width float: 4

                              Preferred vector width double: 2

                              Native vector width char: 16

                              Native vector width short: 8

                              Native vector width int: 4

                              Native vector width long: 2

                              Native vector width float: 4

                              Native vector width double: 2

                              Max clock frequency: 2599Mhz

                              Address bits: 64

                              Max memory allocation: 67754655744

                              Image support: Yes

                              Max number of images read arguments: 128

                              Max number of images write arguments: 8

                              Max image 2D width: 8192

                              Max image 2D height: 8192

                              Max image 3D width: 2048

                              Max image 3D height: 2048

                              Max image 3D depth: 2048

                              Max samplers within kernel: 16

                              Max size of kernel argument: 4096

                              Alignment (bits) of base address: 1024

                              Minimum alignment (bytes) for any datatype: 128

                              Single precision floating point capability

                                Denorms: Yes

                                Quiet NaNs: Yes

                                Round to nearest even: Yes

                                Round to zero: Yes

                                Round to +ve and infinity: Yes

                                IEEE754-2008 fused multiply-add: Yes

                              Cache type: Read/Write

                              Cache line size: 64

                              Cache size: 65536

                              Global memory size: 271018622976

                              Constant buffer size: 65536

                              Max number of constant args: 8

                              Local memory type: Global

                              Local memory size: 32768

                              Kernel Preferred work group size multiple: 1

                              Error correction support: 0

                              Unified memory for Host and Device: 1

                              Profiling timer resolution: 1

                              Device endianess: Little

                              Available: Yes

                              Compiler available: Yes

                              Execution capabilities: 

                                Execute OpenCL kernels: Yes

                                Execute native function: Yes

                              Queue properties: 

                                Out-of-Order: No

                                Profiling : Yes

                              Platform ID: 0x00002ac94645cde0

                              Name: AMD Opteron(tm) Processor 6140

                              Vendor: AuthenticAMD

                              Device OpenCL C version: OpenCL C 1.2

                              Driver version: 1445.5 (sse2)

                              Profile: FULL_PROFILE

                              Version: OpenCL 1.2 AMD-APP (1445.5)

                              Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_amd_svm cl_khr_gl_event
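The clinfo output above reports "OpenCL 1.2 AMD-APP (1445.5)", so requesting OpenCL C 2.0 cannot have taken effect here. A host program could check this before adding the option by parsing the version string returned for CL_PLATFORM_VERSION, CL_DEVICE_VERSION, or CL_DEVICE_OPENCL_C_VERSION. The following is a minimal sketch with hypothetical helper names (`parse_cl_version`, `supports_cl20`); it only parses the string and makes no OpenCL calls. Note that the OpenCL specification spells the build option as `-cl-std=CL2.0`.

```cpp
#include <cstdio>
#include <string>

// Parsed "major.minor" pair from an OpenCL version string.
struct ClVersion {
    int major_version = 0;
    int minor_version = 0;
    bool valid = false;
};

// Accepts both "OpenCL C 1.2 ..." (device OpenCL C version) and
// "OpenCL 1.2 ..." (platform or device version).
ClVersion parse_cl_version(const std::string& text) {
    ClVersion v;
    if (std::sscanf(text.c_str(), "OpenCL C %d.%d",
                    &v.major_version, &v.minor_version) == 2 ||
        std::sscanf(text.c_str(), "OpenCL %d.%d",
                    &v.major_version, &v.minor_version) == 2) {
        v.valid = true;
    }
    return v;
}

// True only if the string parses and reports version 2.0 or later.
bool supports_cl20(const std::string& text) {
    const ClVersion v = parse_cl_version(text);
    return v.valid && v.major_version >= 2;
}
```

With the strings from the clinfo output in this thread, `supports_cl20` returns false, which matches the conclusion that the option was not applicable on this installation.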