Archives Discussions

noah_r · ‎10-21-2014

I have an OpenCL program that exposes a bug in clBuildProgram on the AMD CPU device. After several recent code changes, my OpenCL kernel/program still compiles fine on the Apple and NVIDIA platforms, but a segmentation fault occurs inside clBuildProgram on the AMD platform's CPU device.

It is hard to guess what the problem might be. I consider this a bug in the APP SDK. I am attaching a simple bug demonstration program, but I would prefer to send you the offending kernel source code by private message or email.

Build Command:

g++ -o build_bug_demo opencl_program_build.cpp bugDemoSupport.cpp -I $AMDAPPSDKROOT/include -L $AMDAPPSDKROOT/lib/x86_64 -lOpenCL

Run Command:

./build_bug_demo

(Use the -p option to select the platform if it is not the first one, or -h for help.)

Again, the attached files build a very simple kernel. Please message me for the actual offending OpenCL source code.

(I have also tried this with AMD APP SDK 2.9.1 (version 1445.5) with the same results.)

./build_bug_demo

Selected CL_PLATFORM_NAME: AMD Accelerated Parallel Processing

CL_DEVICE_NAME: AMD Opteron(tm) Processor 6140

CL_DRIVER_VERSION: 1214.3 (sse2)

Loading Source...

clCreateProgramWithSource...

clBuildProgram...

Segmentation fault
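
Because the crash happens inside clBuildProgram, it takes the whole host process down with it. One way to keep a test program alive while chasing a bug like this is to run the risky step in a child process and look at how the child exited. The sketch below assumes a POSIX system. run_isolated and the two stand-in steps are hypothetical names, not part of the attached demo; in real use the step would wrap the clCreateProgramWithSource and clBuildProgram calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returned when fork() or waitpid() itself fails. */
#define RUN_ISOLATED_ERROR (-1000)

/* A step to run in isolation; returns 0 on success. */
typedef int (*step_fn)(void);

/* Runs `step` in a forked child process.
 * Returns 0 if the child exited cleanly, a positive exit status if the
 * step reported failure, the negated signal number if a signal killed
 * the child, or RUN_ISOLATED_ERROR if the process could not be managed. */
int run_isolated(step_fn step)
{
    assert(step != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return RUN_ISOLATED_ERROR;

    if (pid == 0) {
        /* Child: run the step and exit without returning to the caller. */
        _exit(step() == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
    }

    /* Parent: wait for the child and classify how it ended. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return RUN_ISOLATED_ERROR;
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return RUN_ISOLATED_ERROR;
}

/* Stand-in for a build that succeeds. */
int step_that_succeeds(void)
{
    return 0;
}

/* Stand-in for a build that crashes the way clBuildProgram does here. */
int step_that_crashes(void)
{
    raise(SIGSEGV);
    return 0;
}
```

With this in place the parent can print something like "build crashed with signal 11" and move on to the next platform or kernel variant instead of dying.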

noah_r · ‎10-22-2014

I was able to isolate the bug to a pretty simple code case: initializing an empty struct.

The source below demonstrates the segmentation fault.

struct GridDataStruct_defn
{
    // empty struct
};

typedef struct GridDataStruct_defn GridDataStruct;

// Kernel block.
kernel void square( const global float* const restrict input, global float* const restrict output)
{
    size_t i = get_global_id(0);
    output[i] = input[i] * input[i];
    const GridDataStruct gridDataStruct = { }; // Offending line
}
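
A possible workaround, not confirmed anywhere in this thread: OpenCL C is based on C99, and ISO C99 permits neither a struct with no members nor an empty initializer list (compilers that accept them do so as an extension). Giving the struct one placeholder member and an explicit initializer may therefore avoid whatever code path crashes. The sketch below assumes that is the trigger. The placeholder member, the host-side shim, and run_square_on_host are hypothetical additions for illustration, so that the kernel body can also be compiled and checked as plain C.

```c
#ifndef __OPENCL_VERSION__
/* Host-only shim (hypothetical): an OpenCL compiler defines
 * __OPENCL_VERSION__ and skips this part entirely. */
#include <assert.h>
#include <stddef.h>
#define kernel
#define global
static size_t fake_global_id = 0;
static size_t get_global_id(unsigned int dim)
{
    (void)dim;
    return fake_global_id;
}
#endif

struct GridDataStruct_defn
{
    int placeholder; /* hypothetical dummy member; keeps the struct non-empty */
};

typedef struct GridDataStruct_defn GridDataStruct;

/* Same kernel as in the post above, with a non-empty struct. */
kernel void square(const global float* const restrict input,
                   global float* const restrict output)
{
    size_t i = get_global_id(0);
    output[i] = input[i] * input[i];

    const GridDataStruct gridDataStruct = { 0 }; /* explicit initializer */
    (void)gridDataStruct; /* silence the unused-variable warning */
}

#ifndef __OPENCL_VERSION__
/* Host-only driver: calls the kernel body once per work-item index. */
void run_square_on_host(const float *input, float *output, size_t n)
{
    assert(input != NULL && output != NULL);
    for (fake_global_id = 0; fake_global_id < n; ++fake_global_id)
        square(input, output);
}
#endif
```

Whether this actually avoids the crash on APP SDK 2.9.1 is untested here; a compiler should reject or accept the original source, not segfault on it, so the bug report stands either way.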

bilal · ‎10-23-2014

I ran your attached code, and everything compiles fine for me. Your program gave me the following output:

Selected CL_PLATFORM_NAME: NVIDIA CUDA

CL_DEVICE_NAME: GeForce GTX 260

CL_DRIVER_VERSION: 295.41

Loading Source...

clCreateProgramWithSource...

clBuildProgram...

Build complete.

Build-log ( 2 bytes):

The End

noah_r · ‎10-23-2014

Are you sure you're using the kernel code in my second message? The code attached to my first message was supposed to work.

bilal · ‎10-24-2014

Yes, the kernel code in your second post runs fine for me. No segmentation fault.

dipak · ‎11-20-2014

Hi,

I was able to reproduce your issue (with the sample kernel code posted on Oct 22, 2014 8:20 PM) on Windows. However, when I compiled the same code with the OpenCL compiler flag "-cl-std=2.0" using the latest driver, it worked fine. If possible, could you check this and share what you observe?

Regards,

noah_r · ‎12-08-2014

I tried the build option -cl-std=2.0 as you suggested, but I still get a segmentation fault. I don't have a Windows machine to test with. I'm using the latest AMDAPPSDK 2.9.1 on an AMD CPU running Linux.

dipak · ‎12-08-2014

I used that option with the latest driver that supports OpenCL 2.0. Could you please share your clinfo output?

Regards,

noah_r · ‎12-08-2014

Oh, I see. I'm not sure how I would install the latest OpenCL 2.0 driver for a CPU device on Linux. I'm already running the most recent AMDAPPSDK. I see now in the clinfo output that only OpenCL 1.2 is supported, so the -cl-std build option was probably ignored anyway.

./AMDAPPSDK-2.9-1/bin/x86_64/clinfo

Number of platforms: 1

Platform Profile: FULL_PROFILE

Platform Version: OpenCL 1.2 AMD-APP (1445.5)

Platform Name: AMD Accelerated Parallel Processing

Platform Vendor: Advanced Micro Devices, Inc.

Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_amd_hsa

Platform Name: AMD Accelerated Parallel Processing

Number of devices: 1

Device Type: CL_DEVICE_TYPE_CPU

Vendor ID: 1002h

Board name:

Max compute units: 32

Max work items dimensions: 3

Max work items[0]: 1024

Max work items[1]: 1024

Max work items[2]: 1024

Max work group size: 1024

Preferred vector width char: 16

Preferred vector width short: 8

Preferred vector width int: 4

Preferred vector width long: 2

Preferred vector width float: 4

Preferred vector width double: 2

Native vector width char: 16

Native vector width short: 8

Native vector width int: 4

Native vector width long: 2

Native vector width float: 4

Native vector width double: 2

Max clock frequency: 2599Mhz

Address bits: 64

Max memory allocation: 67754655744

Image support: Yes

Max number of images read arguments: 128

Max number of images write arguments: 8

Max image 2D width: 8192

Max image 2D height: 8192

Max image 3D width: 2048

Max image 3D height: 2048

Max image 3D depth: 2048

Max samplers within kernel: 16

Max size of kernel argument: 4096

Alignment (bits) of base address: 1024

Minimum alignment (bytes) for any datatype: 128

Single precision floating point capability

Denorms: Yes

Quiet NaNs: Yes

Round to nearest even: Yes

Round to zero: Yes

Round to +ve and infinity: Yes

IEEE754-2008 fused multiply-add: Yes

Cache type: Read/Write

Cache line size: 64

Cache size: 65536

Global memory size: 271018622976

Constant buffer size: 65536

Max number of constant args: 8

Local memory type: Global

Local memory size: 32768

Kernel Preferred work group size multiple: 1

Error correction support: 0

Unified memory for Host and Device: 1

Profiling timer resolution: 1

Device endianess: Little

Available: Yes

Compiler available: Yes

Execution capabilities:

Execute OpenCL kernels: Yes

Execute native function: Yes

Queue properties:

Out-of-Order: No

Profiling : Yes

Platform ID: 0x00002ac94645cde0

Name: AMD Opteron(tm) Processor 6140

Vendor: AuthenticAMD

Device OpenCL C version: OpenCL C 1.2

Driver version: 1445.5 (sse2)

Profile: FULL_PROFILE

Version: OpenCL 1.2 AMD-APP (1445.5)

Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_amd_svm cl_khr_gl_event
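
The version check done by eye above can also be done in the host program before choosing build options. Below is a minimal sketch, assuming the version string has already been read with clGetDeviceInfo and CL_DEVICE_OPENCL_C_VERSION (the value clinfo prints as "Device OpenCL C version"). The two helper names are hypothetical. Note that the OpenCL specification spells the option -cl-std=CL2.0, with a CL prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Parses a CL_DEVICE_OPENCL_C_VERSION string such as "OpenCL C 1.2 ".
 * Returns 1 and fills *major / *minor on success, 0 if the string is
 * missing or does not start with the expected "OpenCL C <maj>.<min>". */
int parse_opencl_c_version(const char *version, int *major, int *minor)
{
    assert(major != NULL && minor != NULL);
    if (version == NULL)
        return 0;
    return sscanf(version, "OpenCL C %d.%d", major, minor) == 2;
}

/* Hypothetical helper: only request OpenCL C 2.0 when the device
 * reports at least that version; otherwise pass no -cl-std option. */
const char *pick_build_options(const char *device_c_version)
{
    int major = 0;
    int minor = 0;
    if (!parse_opencl_c_version(device_c_version, &major, &minor))
        return "";
    return (major >= 2) ? "-cl-std=CL2.0" : "";
}
```

For the device listed above, which reports "OpenCL C 1.2", this returns an empty option string, so the build would run with the compiler's default language version.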

dipak · ‎12-09-2014

That version of the OpenCL 2.0 driver will not work on your system. BTW, I'm able to reproduce it with a CPU-only setup with APP SDK 2.9-1 on Windows 7. The same issue is also reproducible using the latest Catalyst driver [I used CodeXL to build the kernel code]. So I guess it's a compiler bug. I've filed an internal bug report against it. If I get any update, I'll share it with you.

Regards,

segmentation fault inside clBuildProgram (bug demonstration attached)