
michaeltesch
Adept I

Bug in compiling double for Cypress on Linux?

Compilation segfaults when targeting Cypress, but not when targeting the CPU.

Any ideas? Is this a known bug?

 

this error:

 

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff36b4069 in ?? () from /home/tesch/src/AMD-APP-SDK-v2.5-RC2-lnx64/lib/x86_64/libamdocl64.so
Missing separate debuginfos, use: debuginfo-install glibc-2.14-5.x86_64 libX11-1.4.3-1.fc15.x86_64 libXau-1.0.6-2.fc15.x86_64 libXext-1.2.0-2.fc15.x86_64 libXinerama-1.1.1-2.fc15.x86_64 libgcc-4.6.1-9.fc15.x86_64 libgfortran-4.6.1-9.fc15.x86_64 libgomp-4.6.1-9.fc15.x86_64 libjpeg-turbo-1.1.1-1.fc15.x86_64 libpng-1.2.46-1.fc15.x86_64 libquadmath-4.6.1-9.fc15.x86_64 libstdc++-4.6.1-9.fc15.x86_64 libtiff-3.9.5-1.fc15.x86_64 libxcb-1.7-2.fc15.x86_64 numactl-2.0.7-1.fc15.x86_64 xorg-x11-drv-catalyst-libs-11.9-3.fc15.x86_64 zlib-1.2.5-3.fc15.x86_64
(gdb) bt
#0  0x00007ffff36b4069 in ?? () from /home/tesch/src/AMD-APP-SDK-v2.5-RC2-lnx64/lib/x86_64/libamdocl64.so
#1  0x00007ffff36b5c27 in ?? () from /home/tesch/src/AMD-APP-SDK-v2.5-RC2-lnx64/lib/x86_64/libamdocl64.so
#2  0x00007ffff36b6641 in ?? () from /home/tesch/src/AMD-APP-SDK-v2.5-RC2-lnx64/lib/x86_64/libamdocl64.so
#3  0x00007ffff3996e5f in ?? () from /home/tesch/src/AMD-APP-SDK-v2.5-RC2-lnx64/lib/x86_64/libamdocl64.so
#4  0x00007ffff3e28950 in ?? () from /home/tesch/src/AMD-APP-SDK-v2.5-RC2-lnx64/lib/x86_64/libamdocl64.so
#5  0x00007ffff3e28a62 in ?? () from /home/tesch/src/AMD-APP-SDK-v2.5-RC2-lnx64/lib/x86_64/libamdocl64.so
#6  0x00007ffff3e28bf9 in ?? () from /home/tesch/src/AMD-APP-SDK-v2.5-RC2-lnx64/lib/x86_64/libamdocl64.so
#7  0x00007ffff320656b in ?? () from /home/tesch/src/AMD-APP-SDK-v2.5-RC2-lnx64/lib/x86_64/libamdocl64.so
#8  0x00007ffff3208bc7 in ?? () from /home/tesch/src/AMD-APP-SDK-v2.5-RC2-lnx64/lib/x86_64/libamdocl64.so
#9  0x00007ffff3222f18 in ?? () from /home/tesch/src/AMD-APP-SDK-v2.5-RC2-lnx64/lib/x86_64/libamdocl64.so
#10 0x00007ffff324339f in ?? () from /home/tesch/src/AMD-APP-SDK-v2.5-RC2-lnx64/lib/x86_64/libamdocl64.so
#11 0x00007ffff320d25d in ?? () from /home/tesch/src/AMD-APP-SDK-v2.5-RC2-lnx64/lib/x86_64/libamdocl64.so
#12 0x00007ffff325f565 in ?? () from /home/tesch/src/AMD-APP-SDK-v2.5-RC2-lnx64/lib/x86_64/libamdocl64.so
#13 0x00007ffff320335d in clBuildProgram ()
   from /home/tesch/src/AMD-APP-SDK-v2.5-RC2-lnx64/lib/x86_64/libamdocl64.so
#14 0x00000000004038db in cl::Program::build (this=0xa2aec0,
    devices=std::vector of length 1, capacity 1 = {...}, options=0x407ab9 "-I.", notifyFptr=0, data=0x0)
    at /home/tesch/src/AMD-APP-SDK-v2.5-RC2-lnx64/include/CL/cl.hpp:2675
#15 0x0000000000402782 in clsetup (sourcecount=1) at Clutil.cpp:79

-----------

is caused by this code:

---------


//#define REAL_T_IS_FLOAT
#define REAL_T_IS_DOUBLE


#ifdef REAL_T_IS_FLOAT
typedef float real_t;
typedef float3 vec3_t;

#else

#pragma OPENCL EXTENSION cl_amd_fp64 : enable // subset of cl_khr_fp64
typedef double real_t;
typedef double3 vec3_t;
#endif


__kernel void
bloch_steps(int nsteps)
{
  real_t T2i;

  for (int i = 0; i < nsteps; i++) {
    vec3_t b1;
    real_t et2 = exp(-1 * T2i);
    real_t theta;

    theta = length(b1);
    b1 *= 1. / theta;
  }
}

 

----------

when compiling on this device (clinfo output):

[tesch@localhost mtsrc]$ ~/src/AMD-APP-SDK-v2.5-RC2-lnx64/bin/x86_64/clinfo
Number of platforms:                             2
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)
  Platform Name:                                 AMD Accelerated Parallel Processing
  Platform Vendor:                               Advanced Micro Devices, Inc.
  Platform Extensions:                           cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 1.1 LINUX
  Platform Name:                                 Intel(R) OpenCL
  Platform Vendor:                               Intel(R) Corporation
  Platform Extensions:                           cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_gl_sharing cl_khr_byte_addressable_store cl_intel_printf cl_ext_device_fission cl_khr_icd


  Platform Name:                                 AMD Accelerated Parallel Processing
Number of devices:                               2
  Device Type:                                   CL_DEVICE_TYPE_GPU
  Device ID:                                     4098
  Device Topology:                               PCI[ B#1, D#0, F#0 ]
  Max compute units:                             18
  Max work items dimensions:                     3
    Max work items[0]:                           256
    Max work items[1]:                           256
    Max work items[2]:                           256
  Max work group size:                           256
  Preferred vector width char:                   16
  Preferred vector width short:                  8
  Preferred vector width int:                    4
  Preferred vector width long:                   2
  Preferred vector width float:                  4
  Preferred vector width double:                 2
  Native vector width char:                      16
  Native vector width short:                     8
  Native vector width int:                       4
  Native vector width long:                      2
  Native vector width float:                     4
  Native vector width double:                    2
  Max clock frequency:                           725Mhz
  Address bits:                                  32
  Max memory allocation:                         134217728
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            8192
  Max image 2D height:                           8192
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    16
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              32768
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     No
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    None
  Cache line size:                               0
  Cache size:                                    0
  Global memory size:                            536870912
  Constant buffer size:                          65536
  Max number of constant args:                   8
  Local memory type:                             Scratchpad
  Local memory size:                             32768
  Kernel Preferred work group size multiple:     64
  Error correction support:                      0
  Unified memory for Host and Device:            0
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:                               
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue properties:                             
    Out-of-Order:                                No
    Profiling :                                  Yes
  Platform ID:                                   0x7f45fad3b060
  Name:                                          Cypress
  Vendor:                                        Advanced Micro Devices, Inc.
  Device OpenCL C version:                       OpenCL C 1.1
  Driver version:                                CAL 1.4.1546
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)
  Extensions:                                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt

 

-------------

but NOT when compiling for this device: (clinfo output continued)

-------------

  Device Type:                                   CL_DEVICE_TYPE_CPU
  Device ID:                                     4098
  Max compute units:                             6
  Max work items dimensions:                     3
    Max work items[0]:                           1024
    Max work items[1]:                           1024
    Max work items[2]:                           1024
  Max work group size:                           1024
  Preferred vector width char:                   16
  Preferred vector width short:                  8
  Preferred vector width int:                    4
  Preferred vector width long:                   2
  Preferred vector width float:                  4
  Preferred vector width double:                 0
  Native vector width char:                      16
  Native vector width short:                     8
  Native vector width int:                       4
  Native vector width long:                      2
  Native vector width float:                     4
  Native vector width double:                    0
  Max clock frequency:                           800Mhz
  Address bits:                                  64
  Max memory allocation:                         4215897088
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            8192
  Max image 2D height:                           8192
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    16
  Max size of kernel argument:                   4096
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             No
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    65536
  Global memory size:                            16863588352
  Constant buffer size:                          65536
  Max number of constant args:                   8
  Local memory type:                             Global
  Local memory size:                             32768
  Kernel Preferred work group size multiple:     1
  Error correction support:                      0
  Unified memory for Host and Device:            1
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:                               
    Execute OpenCL kernels:                      Yes
    Execute native function:                     Yes
  Queue properties:                             
    Out-of-Order:                                No
    Profiling :                                  Yes
  Platform ID:                                   0x7f45fad3b060
  Name:                                          AMD Phenom(tm) II X6 1100T Processor
  Vendor:                                        AuthenticAMD
  Device OpenCL C version:                       OpenCL C 1.1
  Driver version:                                2.0
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)
  Extensions:                                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_media_ops cl_amd_popcnt cl_amd_printf

2 Replies
arsenm
Adept III

This looks a lot like a crash I've seen that Micah says is fixed in an upcoming release. I bet if you comment out the length() call, it won't crash.


Yep, commenting it out does stop the crash. So does removing the for loop, or removing various other surrounding statements or declarations. I removed a ton of code from the original function to boil the offending code down to just this.

 

In the end my workaround is to compute the length with sqrt(a.x * a.x + a.y * a.y...). Still, it would be much better if it just worked like it's supposed to, and I didn't have to spend any time at all finding workarounds for compiler bugs.

 
