
noah_r
Journeyman III

segmentation fault inside clBuildProgram (bug demonstration attached)

I have a particular OpenCL program that reveals a bug in clBuildProgram for the AMD CPU device. After several recent code changes, my OpenCL kernel/program still compiles fine on the Apple and NVIDIA platforms, but clBuildProgram causes a segmentation fault on the AMD platform's CPU device.

It is hard to guess what the problem might be, so I consider this a bug in the APP SDK. I am attaching a simple bug demonstration program, but I would prefer to send you the offending kernel source code by private message or email.

Build Command:

g++ -o build_bug_demo opencl_program_build.cpp bugDemoSupport.cpp -I $AMDAPPSDKROOT/include -L $AMDAPPSDKROOT/lib/x86_64 -lOpenCL

Run Command:

./build_bug_demo

(If the AMD platform is not the first platform, use the -p option to select it, or -h for help.)

Again, the attached files build a very simple kernel. Please message me for the actual offending OpenCL source code.

(I have also tried this with AMD APP SDK 2.9.1 (version 1445.5) with the same results.)

./build_bug_demo

Selected CL_PLATFORM_NAME: AMD Accelerated Parallel Processing

CL_DEVICE_NAME: AMD Opteron(tm) Processor 6140

CL_DRIVER_VERSION: 1214.3 (sse2)

Loading Source...

clCreateProgramWithSource...

clBuildProgram...

Segmentation fault
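Because the crash happens inside clBuildProgram, it takes the whole demo process down with it, which makes it tedious to try many kernel variants while reducing a case like this one. One hypothetical approach on Linux is to run each build attempt in a forked child process and inspect how the child exited. The sketch below assumes POSIX `fork`/`waitpid`; `run_isolated` and the three stand-in step functions are illustrative names only, and a real driver would place its clCreateProgramWithSource/clBuildProgram calls inside a step function.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* How an isolated step ended. */
typedef enum { STEP_OK, STEP_FAILED, STEP_CRASHED } step_result;

/* Hypothetical helper: run step(arg) in a forked child so that a crash inside
 * it (such as a segfault in a compiler call) cannot kill the caller.
 * step returns 0 for success and nonzero for failure. If the child is killed
 * by a signal, the signal number is stored in *signal_out (when non-NULL). */
static step_result run_isolated(int (*step)(void *), void *arg, int *signal_out)
{
    assert(step != NULL);
    fflush(NULL); /* flush stdio buffers so the child does not repeat them */
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return STEP_FAILED;
    }
    if (pid == 0) {
        /* Child: report the step's outcome through the exit status only. */
        _exit(step(arg) == 0 ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return STEP_FAILED;
    }
    if (WIFSIGNALED(status)) {
        if (signal_out != NULL)
            *signal_out = WTERMSIG(status);
        return STEP_CRASHED;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? STEP_OK : STEP_FAILED;
}

/* Illustrative stand-ins for build attempts (not OpenCL calls). */
static int good_step(void *arg) { (void)arg; return 0; }
static int failing_step(void *arg) { (void)arg; return 1; }
static int crashing_step(void *arg)
{
    (void)arg;
    signal(SIGSEGV, SIG_DFL); /* make sure the default (fatal) action applies */
    raise(SIGSEGV);
    return 0; /* not reached */
}
```

For example, `run_isolated(crashing_step, NULL, &sig)` returns `STEP_CRASHED` with `sig` set to `SIGSEGV`, while the calling process keeps running. Creating the OpenCL context inside the child, rather than before the fork, avoids sharing a multi-threaded driver runtime across `fork`.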

noah_r

I was able to isolate the bug to a pretty simple case: initializing an empty struct.

The source below demonstrates the segmentation fault.

struct GridDataStruct_defn
{
    // empty struct
};

typedef struct GridDataStruct_defn GridDataStruct;

// Kernel block.
kernel void square( const global float* const restrict input, global float* const restrict output)
{
    size_t i = get_global_id(0);
    output[i] = input[i] * input[i];
    const GridDataStruct gridDataStruct = { }; // Offending line
}
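As written, the reduced kernel is not strictly valid OpenCL C 1.2: C99, on which OpenCL C 1.2 is based, requires a struct to declare at least one member and an initializer list to contain at least one initializer (GCC accepts both forms as extensions). A conforming compiler should report a diagnostic rather than crash, but this suggests a possible workaround until the bug is fixed: give the struct a placeholder member and initialize it with `{ 0 }`. The sketch below shows that pattern in plain C, with a host loop standing in for the kernel; `square_host` and the `unused` member are hypothetical names, and the pattern has not been verified against the AMD compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical placeholder member: C99 (and OpenCL C 1.2) does not allow a
 * struct with no members, so keep one until real fields are added. */
struct GridDataStruct_defn
{
    int unused;
};
typedef struct GridDataStruct_defn GridDataStruct;

/* Host-side emulation of the "square" kernel: each loop iteration plays the
 * role of one work-item, with i taking the place of get_global_id(0). */
static void square_host(const float *restrict input, float *restrict output, size_t n)
{
    assert(input != NULL && output != NULL);
    for (size_t i = 0; i < n; ++i) {
        output[i] = input[i] * input[i];
        const GridDataStruct gridDataStruct = { 0 }; /* valid C99 initializer */
        (void)gridDataStruct;                         /* silence unused warning */
    }
}
```

In the kernel itself the same two changes apply: add a member to `struct GridDataStruct_defn` and replace `= { }` with `= { 0 }`.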

I ran your attached code, and everything compiles fine for me. Your program gave me the following output:

Selected CL_PLATFORM_NAME: NVIDIA CUDA

CL_DEVICE_NAME: GeForce GTX 260

CL_DRIVER_VERSION: 295.41

Loading Source...

clCreateProgramWithSource...

clBuildProgram...

Build complete.

Build-log ( 2 bytes):

The End

Are you sure you're using the kernel code from my second message? The code attached to my first message was supposed to build without problems.

Yes, the kernel code in your second post works fine for me. No segmentation fault.

Hi,

I was able to reproduce your issue on Windows with the sample kernel code you posted. However, when I compiled the same code with the OpenCL compiler flag "-cl-std=2.0" using the latest driver, it worked fine. If possible, could you check this and share your observations?


Regards,

I tried the build option -cl-std=2.0 as you suggested, but I still get a segmentation fault. I don't have a Windows machine to test with. I'm using the latest AMD APP SDK (2.9.1) on an AMD CPU running Linux.

I used that option with the latest driver, which supports OpenCL 2.0. Can you please share your clinfo output?


Regards,

Oh, I see. I'm not sure how I would install the latest OpenCL 2.0 driver for a Linux CPU. I'm already running the most recent AMD APP SDK. I see now in the clinfo output that only OpenCL 1.2 is supported, so the -cl-std build option was probably ignored anyway.

./AMDAPPSDK-2.9-1/bin/x86_64/clinfo

Number of platforms: 1

  Platform Profile: FULL_PROFILE

  Platform Version: OpenCL 1.2 AMD-APP (1445.5)

  Platform Name: AMD Accelerated Parallel Processing

  Platform Vendor: Advanced Micro Devices, Inc.

  Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_amd_hsa

  Platform Name: AMD Accelerated Parallel Processing

Number of devices: 1

  Device Type: CL_DEVICE_TYPE_CPU

  Vendor ID: 1002h

  Board name: 

  Max compute units: 32

  Max work items dimensions: 3

    Max work items[0]: 1024

    Max work items[1]: 1024

    Max work items[2]: 1024

  Max work group size: 1024

  Preferred vector width char: 16

  Preferred vector width short: 8

  Preferred vector width int: 4

  Preferred vector width long: 2

  Preferred vector width float: 4

  Preferred vector width double: 2

  Native vector width char: 16

  Native vector width short: 8

  Native vector width int: 4

  Native vector width long: 2

  Native vector width float: 4

  Native vector width double: 2

  Max clock frequency: 2599Mhz

  Address bits: 64

  Max memory allocation: 67754655744

  Image support: Yes

  Max number of images read arguments: 128

  Max number of images write arguments: 8

  Max image 2D width: 8192

  Max image 2D height: 8192

  Max image 3D width: 2048

  Max image 3D height: 2048

  Max image 3D depth: 2048

  Max samplers within kernel: 16

  Max size of kernel argument: 4096

  Alignment (bits) of base address: 1024

  Minimum alignment (bytes) for any datatype: 128

  Single precision floating point capability

    Denorms: Yes

    Quiet NaNs: Yes

    Round to nearest even: Yes

    Round to zero: Yes

    Round to +ve and infinity: Yes

    IEEE754-2008 fused multiply-add: Yes

  Cache type: Read/Write

  Cache line size: 64

  Cache size: 65536

  Global memory size: 271018622976

  Constant buffer size: 65536

  Max number of constant args: 8

  Local memory type: Global

  Local memory size: 32768

  Kernel Preferred work group size multiple: 1

  Error correction support: 0

  Unified memory for Host and Device: 1

  Profiling timer resolution: 1

  Device endianess: Little

  Available: Yes

  Compiler available: Yes

  Execution capabilities: 

    Execute OpenCL kernels: Yes

    Execute native function: Yes

  Queue properties: 

    Out-of-Order: No

    Profiling : Yes

  Platform ID: 0x00002ac94645cde0

  Name: AMD Opteron(tm) Processor 6140

  Vendor: AuthenticAMD

  Device OpenCL C version: OpenCL C 1.2

  Driver version: 1445.5 (sse2)

  Profile: FULL_PROFILE

  Version: OpenCL 1.2 AMD-APP (1445.5)

  Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_amd_svm cl_khr_gl_event
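In the clinfo output above, the line that governs `-cl-std` is `Device OpenCL C version: OpenCL C 1.2`; a host program can read the same string with `clGetDeviceInfo` and `CL_DEVICE_OPENCL_C_VERSION`. The OpenCL specification spells the option with a `CL` prefix (`-cl-std=CL1.1`, `-cl-std=CL1.2`, `-cl-std=CL2.0`), and a device reporting OpenCL C 1.2 cannot honor a 2.0 request however it is spelled. A minimal sketch of that version check, assuming the spec's `OpenCL C <major>.<minor> <vendor info>` string format; `parse_opencl_c_version` and `supports_cl_std` are hypothetical helper names:

```c
#include <assert.h>
#include <stdio.h>

/* Parse a CL_DEVICE_OPENCL_C_VERSION string of the form
 * "OpenCL C <major>.<minor> <vendor-specific information>".
 * Returns 1 on success, 0 if the string does not match that format. */
static int parse_opencl_c_version(const char *s, int *major, int *minor)
{
    assert(s != NULL && major != NULL && minor != NULL);
    return sscanf(s, "OpenCL C %d.%d", major, minor) == 2;
}

/* Hypothetical check: can "-cl-std=CL<want_major>.<want_minor>" take effect
 * on a device that reports this OpenCL C version? */
static int supports_cl_std(const char *device_c_version, int want_major, int want_minor)
{
    int major = 0, minor = 0;
    if (!parse_opencl_c_version(device_c_version, &major, &minor))
        return 0; /* unrecognized string: assume unsupported */
    return major > want_major || (major == want_major && minor >= want_minor);
}
```

With the string from this clinfo output, `supports_cl_std("OpenCL C 1.2", 2, 0)` returns 0, which is consistent with the observation that the 2.0 option had no effect on this machine.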

That OpenCL 2.0 driver version will not work on your system. BTW, I'm able to reproduce the issue on a CPU-only setup with APP SDK 2.9-1 on Windows 7. The same issue is also reproducible with the latest Catalyst driver (I used CodeXL to build the kernel code). So I guess it's a compiler bug. I've filed an internal bug report against it, and I'll share any updates with you.

Regards,