9 Replies Latest reply on Dec 9, 2014 1:10 AM by dipak

    segmentation fault inside clBuildProgram (bug demonstration attached)

    noah_r

      I have an OpenCL program that reveals a bug in clBuildProgram on the AMD CPU device.  After several recent code changes, my OpenCL kernel/program still compiles fine on the Apple and NVIDIA platforms, but clBuildProgram crashes with a segmentation fault on the AMD platform / CPU device.

       

      It is hard to guess what the problem might be.  I consider this a bug in the APP SDK.  I am attaching a simple bug demonstration program, but I would prefer to send you the offending kernel source code by private message or email.

       

      Build Command:

      g++ -o build_bug_demo opencl_program_build.cpp bugDemoSupport.cpp -I $AMDAPPSDKROOT/include -L $AMDAPPSDKROOT/lib/x86_64 -lOpenCL

       

      Run Command:

      ./build_bug_demo

      (Use the -p option to select the platform if it is not the first one, or -h for help.)

       

      Again, the attached files build a very simple kernel.  Please message me for the actual offending OpenCL source code.

       

      (I have also tried this with AMD APP SDK 2.9.1 (version 1445.5) with the same results.)

       

      ./build_bug_demo

      Selected CL_PLATFORM_NAME: AMD Accelerated Parallel Processing

      CL_DEVICE_NAME: AMD Opteron(tm) Processor 6140

      CL_DRIVER_VERSION: 1214.3 (sse2)

      Loading Source...

      clCreateProgramWithSource...

      clBuildProgram...

      Segmentation fault
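Since the crash happens inside the vendor library, one way to keep a host application alive while probing which source triggers it is to run the build step in a child process and check how the child exited. The following is a minimal POSIX sketch under that assumption. `IsolatedResult`, `runIsolated` and `crashedWithSegfault` are hypothetical names, not part of the attached demo, and the callable passed in is where the clCreateProgramWithSource / clBuildProgram calls would go.

```cpp
#include <csignal>
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Outcome of a step that ran in a child process (hypothetical helper type).
struct IsolatedResult {
    bool exited;   // true if the child returned normally
    int exitCode;  // meaningful only when exited is true
    int signalNo;  // terminating signal when exited is false, 0 if unknown
};

// Runs `step` in a forked child so that a crash inside it (for example a
// segmentation fault in a vendor compiler) cannot take down the caller.
IsolatedResult runIsolated(const std::function<bool()>& step) {
    const pid_t pid = fork();
    if (pid < 0) {
        return {false, -1, 0};  // fork itself failed
    }
    if (pid == 0) {
        // Child: run the step and report success or failure via the exit code.
        _exit(step() ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return {false, -1, 0};
    }
    if (WIFSIGNALED(status)) {
        return {false, -1, WTERMSIG(status)};
    }
    return {true, WEXITSTATUS(status), 0};
}

// True when the child was killed by SIGSEGV.
bool crashedWithSegfault(const IsolatedResult& r) {
    return !r.exited && r.signalNo == SIGSEGV;
}
```

With this in place, `crashedWithSegfault(runIsolated(buildStep))` being true would correspond to the "Segmentation fault" line in the output above. The child should probably create its own OpenCL context after the fork, since forking a process that has already initialized the runtime may not be safe.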

        • Re: segmentation fault inside clBuildProgram (bug demonstration attached)
          noah_r

          I was able to isolate the bug to a pretty simple code case:  initializing an empty struct.

          The source below demonstrates the segmentation fault.

           

           

          struct GridDataStruct_defn
          {
              // empty struct
          };

          typedef struct GridDataStruct_defn GridDataStruct;

          // Kernel block.
          kernel void square( const global float* const restrict input, global float* const restrict output)
          {
              size_t i = get_global_id(0);
              output[i] = input[i] * input[i];

              const GridDataStruct gridDataStruct = { }; // Offending line
          }
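Until the crash is fixed, one thing that might be worth trying is to avoid the empty struct and the empty initializer altogether. ISO C99, which OpenCL C is based on, does not allow a struct without members or an empty `{ }` initializer, although some compilers accept both as extensions. The sketch below keeps the kernel as posted but adds a dummy member. It is shown as a C++ raw string so it can be handed to clCreateProgramWithSource. `kWorkaroundSource` and the `placeholder` member are hypothetical names, and whether this avoids the segmentation fault on the AMD CPU device is an assumption that has not been verified here.

```cpp
#include <string>

// Hypothetical variant of the kernel source: the struct gets one dummy
// member and a non-empty initializer, so the compiler never sees an
// empty struct or an empty brace list.
const std::string kWorkaroundSource = R"CLC(
struct GridDataStruct_defn
{
    int placeholder; // dummy member so the struct is not empty
};

typedef struct GridDataStruct_defn GridDataStruct;

kernel void square(const global float* const restrict input,
                   global float* const restrict output)
{
    size_t i = get_global_id(0);
    output[i] = input[i] * input[i];

    const GridDataStruct gridDataStruct = { 0 }; // non-empty initializer
}
)CLC";
```

The string's `c_str()` pointer can be passed as the single source string to clCreateProgramWithSource in place of the original file contents.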

            • Re: segmentation fault inside clBuildProgram (bug demonstration attached)
              bilal

              I ran your attached code, and everything compiles fine for me. Your program gave me the following output.

               

              Selected CL_PLATFORM_NAME: NVIDIA CUDA

              CL_DEVICE_NAME: GeForce GTX 260

              CL_DRIVER_VERSION: 295.41

              Loading Source...

              clCreateProgramWithSource...

              clBuildProgram...

              Build complete.

               

               

              Build-log ( 2 bytes):

               

               

               

               

               

              The End

              • Re: segmentation fault inside clBuildProgram (bug demonstration attached)
                dipak

                Hi,

                I was able to reproduce your issue (with the sample kernel code posted) on Windows. However, when I compiled the same code with the OpenCL compiler flag "-cl-std=2.0" using the latest driver, it worked fine. If possible, could you please check and share your observations?


                Regards,

                  • Re: segmentation fault inside clBuildProgram (bug demonstration attached)
                    noah_r

                    I tried the build option -cl-std=2.0 as you suggested, but I still get a segmentation fault.  I don't have a Windows machine to test with.  I'm using the latest AMDAPPSDK 2.9.1 on an AMD CPU running Linux.

                      • Re: segmentation fault inside clBuildProgram (bug demonstration attached)
                        dipak

                        I used that option with the latest driver that supports OpenCL 2.0. Could you please share your clinfo output?


                        Regards,

                          • Re: segmentation fault inside clBuildProgram (bug demonstration attached)
                            noah_r

                            Oh, I see.  I'm not sure how I would install the latest OpenCL 2.0 driver support for a CPU device on Linux.  I'm already running the most recent AMDAPPSDK.  I see now in the clinfo output that only OpenCL 1.2 is supported, so the -cl-std build option was probably ignored anyway.

                             

                            ./AMDAPPSDK-2.9-1/bin/x86_64/clinfo

                            Number of platforms: 1

                              Platform Profile: FULL_PROFILE

                              Platform Version: OpenCL 1.2 AMD-APP (1445.5)

                              Platform Name: AMD Accelerated Parallel Processing

                              Platform Vendor: Advanced Micro Devices, Inc.

                              Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_amd_hsa

                             

                             

                              Platform Name: AMD Accelerated Parallel Processing

                            Number of devices: 1

                              Device Type: CL_DEVICE_TYPE_CPU

                              Vendor ID: 1002h

                              Board name: 

                              Max compute units: 32

                              Max work items dimensions: 3

                                Max work items[0]: 1024

                                Max work items[1]: 1024

                                Max work items[2]: 1024

                              Max work group size: 1024

                              Preferred vector width char: 16

                              Preferred vector width short: 8

                              Preferred vector width int: 4

                              Preferred vector width long: 2

                              Preferred vector width float: 4

                              Preferred vector width double: 2

                              Native vector width char: 16

                              Native vector width short: 8

                              Native vector width int: 4

                              Native vector width long: 2

                              Native vector width float: 4

                              Native vector width double: 2

                              Max clock frequency: 2599Mhz

                              Address bits: 64

                              Max memory allocation: 67754655744

                              Image support: Yes

                              Max number of images read arguments: 128

                              Max number of images write arguments: 8

                              Max image 2D width: 8192

                              Max image 2D height: 8192

                              Max image 3D width: 2048

                              Max image 3D height: 2048

                              Max image 3D depth: 2048

                              Max samplers within kernel: 16

                              Max size of kernel argument: 4096

                              Alignment (bits) of base address: 1024

                              Minimum alignment (bytes) for any datatype: 128

                              Single precision floating point capability

                                Denorms: Yes

                                Quiet NaNs: Yes

                                Round to nearest even: Yes

                                Round to zero: Yes

                                Round to +ve and infinity: Yes

                                IEEE754-2008 fused multiply-add: Yes

                              Cache type: Read/Write

                              Cache line size: 64

                              Cache size: 65536

                              Global memory size: 271018622976

                              Constant buffer size: 65536

                              Max number of constant args: 8

                              Local memory type: Global

                              Local memory size: 32768

                              Kernel Preferred work group size multiple: 1

                              Error correction support: 0

                              Unified memory for Host and Device: 1

                              Profiling timer resolution: 1

                              Device endianess: Little

                              Available: Yes

                              Compiler available: Yes

                              Execution capabilities: 

                                Execute OpenCL kernels: Yes

                                Execute native function: Yes

                              Queue properties: 

                                Out-of-Order: No

                                Profiling : Yes

                              Platform ID: 0x00002ac94645cde0

                              Name: AMD Opteron(tm) Processor 6140

                              Vendor: AuthenticAMD

                              Device OpenCL C version: OpenCL C 1.2

                              Driver version: 1445.5 (sse2)

                              Profile: FULL_PROFILE

                              Version: OpenCL 1.2 AMD-APP (1445.5)

                              Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_amd_svm cl_khr_gl_event
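The clinfo output above reports OpenCL 1.2 for both the platform and the device, which is consistent with the guess that a 2.0 build option cannot help here. A host program could make that check itself before choosing build options. This is a minimal sketch, assuming the version strings follow the `OpenCL <major>.<minor> ...` and `OpenCL C <major>.<minor> ...` layouts seen above. `ClVersion`, `parseClVersion` and `chooseBuildOptions` are hypothetical helpers, not part of the attached demo. Note that the OpenCL specification spells the option `-cl-std=CL2.0`.

```cpp
#include <cstdio>
#include <string>

// Parsed major/minor numbers of an OpenCL version string (hypothetical type).
struct ClVersion {
    int majorVer;
    int minorVer;
};

// Accepts both "OpenCL C 1.2 ..." (device OpenCL C version) and
// "OpenCL 1.2 AMD-APP (1445.5)" (platform or device version).
bool parseClVersion(const std::string& text, ClVersion& out) {
    int a = 0;
    int b = 0;
    if (std::sscanf(text.c_str(), "OpenCL C %d.%d", &a, &b) == 2 ||
        std::sscanf(text.c_str(), "OpenCL %d.%d", &a, &b) == 2) {
        out.majorVer = a;
        out.minorVer = b;
        return true;
    }
    return false;
}

// Requests OpenCL C 2.0 only when the device reports at least version 2.0;
// otherwise returns an empty option string and lets the compiler default.
std::string chooseBuildOptions(const std::string& deviceClCVersion) {
    ClVersion v{0, 0};
    if (parseClVersion(deviceClCVersion, v) && v.majorVer >= 2) {
        return "-cl-std=CL2.0";
    }
    return "";
}
```

For the device listed above, `chooseBuildOptions("OpenCL C 1.2")` yields an empty string, so no -cl-std option would be passed to clBuildProgram.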