
OpenCL crash on Atom CPU

Question asked by tomaszrybak on Feb 18, 2012
Latest reply on Feb 20, 2012 by MicahVillmow

Hello.

I maintain the Debian PyOpenCL packages, which allow running OpenCL code from Python on both AMD and NVIDIA hardware. I have a problem running code on the CPU.

I am experiencing this problem on an Asus Eee PC 1215N with NVIDIA ION (Atom CPU with two cores and Hyper-Threading, plus a GeForce 9400M), running 64-bit Debian unstable with NVIDIA drivers 295.20 and AMD OpenCL from Catalyst 12-1.

clinfo says:

Number of platforms:                     2
  Platform Profile:                     FULL_PROFILE
  Platform Version:                     OpenCL 1.1 AMD-APP (851.4)
  Platform Name:                     AMD Accelerated Parallel Processing
  Platform Vendor:                     Advanced Micro Devices, Inc.
  Platform Extensions:                     cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
  Platform Profile:                     FULL_PROFILE
  Platform Version:                     OpenCL 1.1 CUDA 4.2.1
  Platform Name:                     NVIDIA CUDA
  Platform Vendor:                     NVIDIA Corporation
  Platform Extensions:                     cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll 


  Platform Name:                     AMD Accelerated Parallel Processing
Number of devices:                     1
  Device Type:                          CL_DEVICE_TYPE_CPU
  Device ID:                          4098
  Board name:                          
  Max compute units:                     4
  Max work items dimensions:                3
    Max work items[0]:                     1024
    Max work items[1]:                     1024
    Max work items[2]:                     1024
  Max work group size:                     1024
  Preferred vector width char:                16
  Preferred vector width short:                8
  Preferred vector width int:                4
  Preferred vector width long:                2
  Preferred vector width float:                4
  Preferred vector width double:           0
  Native vector width char:                16
  Native vector width short:                8
  Native vector width int:                4
  Native vector width long:                2
  Native vector width float:                4
  Native vector width double:                0
  Max clock frequency:                     1600Mhz
  Address bits:                          64
  Max memory allocation:                1843699712
  Image support:                     Yes
  Max number of images read arguments:           128
  Max number of images write arguments:           8
  Max image 2D width:                     8192
  Max image 2D height:                     8192
  Max image 3D width:                     2048
  Max image 3D height:                     2048
  Max image 3D depth:                     2048
  Max samplers within kernel:                16
  Max size of kernel argument:                4096
  Alignment (bits) of base address:           1024
  Minimum alignment (bytes) for any datatype:      128
  Single precision floating point capability
    Denorms:                          Yes
    Quiet NaNs:                          Yes
    Round to nearest even:                Yes
    Round to zero:                     Yes
    Round to +ve and infinity:                Yes
    IEEE754-2008 fused multiply-add:           Yes
  Cache type:                          Read/Write
  Cache line size:                     64
  Cache size:                          24576
  Global memory size:                     1843699712
  Constant buffer size:                     65536
  Max number of constant args:                8
  Local memory type:                     Global
  Local memory size:                     32768
  Kernel Preferred work group size multiple:      1
  Error correction support:                0
  Unified memory for Host and Device:           1
  Profiling timer resolution:                1
  Device endianess:                     Little
  Available:                          Yes
  Compiler available:                     Yes
  Execution capabilities:                     
    Execute OpenCL kernels:                Yes
    Execute native function:                Yes
  Queue properties:                     
    Out-of-Order:                     No
    Profiling :                          Yes
  Platform ID:                          0x7fccc089f100
  Name:                               Intel(R) Atom(TM) CPU  330   @ 1.60GHz
  Vendor:                          GenuineIntel
  Device OpenCL C version:                OpenCL C 1.1 
  Driver version:                     2.0
  Profile:                          FULL_PROFILE
  Version:                          OpenCL 1.1 AMD-APP (851.4)
  Extensions:                          cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt 


  Platform Name:                     NVIDIA CUDA
Number of devices:                     1
  Device Type:                          CL_DEVICE_TYPE_GPU
  Device ID:                          4318
  Max compute units:                     2
  Max work items dimensions:                3
    Max work items[0]:                     512
    Max work items[1]:                     512
    Max work items[2]:                     64
  Max work group size:                     512
  Preferred vector width char:                1
  Preferred vector width short:                1
  Preferred vector width int:                1
  Preferred vector width long:                1
  Preferred vector width float:                1
  Preferred vector width double:           0
  Native vector width char:                1
  Native vector width short:                1
  Native vector width int:                1
  Native vector width long:                1
  Native vector width float:                1
  Native vector width double:                0
  Max clock frequency:                     1100Mhz
  Address bits:                          32
  Max memory allocation:                134217728
  Image support:                     Yes
  Max number of images read arguments:           128
  Max number of images write arguments:           8
  Max image 2D width:                     4096
  Max image 2D height:                     16383
  Max image 3D width:                     2048
  Max image 3D height:                     2048
  Max image 3D depth:                     2048
  Max samplers within kernel:                16
  Max size of kernel argument:                4352
  Alignment (bits) of base address:           2048
  Minimum alignment (bytes) for any datatype:      128
  Single precision floating point capability
    Denorms:                          No
    Quiet NaNs:                          Yes
    Round to nearest even:                Yes
    Round to zero:                     Yes
    Round to +ve and infinity:                Yes
    IEEE754-2008 fused multiply-add:           Yes
  Cache type:                          None
  Cache line size:                     0
  Cache size:                          0
  Global memory size:                     265617408
  Constant buffer size:                     65536
  Max number of constant args:                9
  Local memory type:                     Scratchpad
  Local memory size:                     16384
  Kernel Preferred work group size multiple:      32
  Error correction support:                0
  Unified memory for Host and Device:           1
  Profiling timer resolution:                1000
  Device endianess:                     Little
  Available:                          Yes
  Compiler available:                     Yes
  Execution capabilities:                     
    Execute OpenCL kernels:                Yes
    Execute native function:                No
  Queue properties:                     
    Out-of-Order:                     Yes
    Profiling :                          Yes
  Platform ID:                          0x1386290
  Name:                               ION
  Vendor:                          NVIDIA Corporation
  Device OpenCL C version:                OpenCL C 1.0 
  Driver version:                     295.20
  Profile:                          FULL_PROFILE
  Version:                          OpenCL 1.0 CUDA
  Extensions:                          cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll  cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics 

I am running the PyOpenCL tests. They run fine on an E-350 (both GPU and CPU) and on the ION GPU (GeForce 9400M). They fail on the ION machine's CPU:

(gdb) bt
#0  0x00007fa2005a0475 in *__GI_raise (sig=)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007fa2005a36f0 in *__GI_abort () at abort.c:92
#2  0x00007fa1fa136909 in ?? () from /usr/lib/x86_64-linux-gnu/libamdocl64.so
#3  0x00007fa1fa135c3b in ?? () from /usr/lib/x86_64-linux-gnu/libamdocl64.so
#4  0x00007fa1fa135f15 in ?? () from /usr/lib/x86_64-linux-gnu/libamdocl64.so
#5  0x00007fa1fa12bb6e in ?? () from /usr/lib/x86_64-linux-gnu/libamdocl64.so
#6  0x00007fa1fa0fd5ab in clCreateCommandQueue ()
   from /usr/lib/x86_64-linux-gnu/libamdocl64.so
#7  0x00007fa1fc5f07f0 in boost::python::objects::make_holder<3>::apply, boost::mpl::vector3 >::execute(_object*, pyopencl::context const&, pyopencl::device const*, unsigned long) ()
   from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
#8  0x00007fa1fc5e9e95 in boost::python::objects::caller_py_function_impl > >::operator()(_object*, _object*) ()
   from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
#9  0x00007fa1fc34101f in operator() (kw=0x0, args=
    (, , None, 0), this=0x12e4730) at ./boost/python/object/py_function.hpp:143
#10 boost::python::objects::function::call (this=0x12e4720, args=
    (, ), 
    keywords=0x0) at libs/python/src/object/function.cpp:226
#11 0x00007fa1fc341278 in operator() (this=)
    at libs/python/src/object/function.cpp:585
#12 boost::detail::function::void_function_ref_invoker0::invoke(boost::detail::function::function_buffer &) (function_obj_ptr=)
    at ./boost/function/function_template.hpp:188
#13 0x00007fa1fc34a693 in operator() (this=)
    at ./boost/function/function_template.hpp:760
#14 boost::python::detail::exception_handler::operator() (
    this=, f=) at libs/python/src/errors.cpp:74
#15 0x00007fa1fc639963 in boost::detail::function::function_obj_invoker2, boost::_bi::list3, boost::arg<2>, boost::_bi::value > >, bool, boost::python::detail::exception_handler const&, boost::function0 const&>::invoke(boost::detail::function::function_buffer&, boost::python::detail::exception_handler const&, boost::function0 const&) ()
   from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
#16 0x00007fa1fc34a479 in operator() (a0=, 
    this=, a1=)
    at ./boost/function/function_template.hpp:760
#17 handle (f=, this=)
    at ./boost/python/detail/exception_handler.hpp:41
#18 boost::python::handle_exception_impl (f=)
    at libs/python/src/errors.cpp:24
#19 0x00007fa1fc33f514 in handle_exception (f=...) at ./boost/python/errors.hpp:29
#20 boost::python::objects::function_call (func=, 
    args=, kw=)
    at libs/python/src/object/function.cpp:626
#21 0x00000000004824c6 in PyObject_Call ()
#22 0x00000000005186e5 in instancemethod_call.8521 ()
#23 0x00000000004824c6 in PyObject_Call ()
#24 0x00000000005179ee in slot_tp_init.25626 ()
#25 0x000000000048212a in type_call.25275 ()
#26 0x00000000004824c6 in PyObject_Call ()
#27 0x00000000004c5e8a in PyEval_EvalFrameEx ()
#28 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#29 0x00000000005772f4 in function_call.15044 ()
#30 0x00000000004824c6 in PyObject_Call ()
#31 0x00000000004c7861 in PyEval_EvalFrameEx ()
#32 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#33 0x00000000005772f4 in function_call.15044 ()
#34 0x00000000004824c6 in PyObject_Call ()
#35 0x00000000004c7861 in PyEval_EvalFrameEx ()
#36 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
#37 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
#38 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#39 0x00000000005772f4 in function_call.15044 ()
#40 0x00000000004824c6 in PyObject_Call ()
#41 0x00000000004c7861 in PyEval_EvalFrameEx ()
#42 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#43 0x00000000004c5da8 in PyEval_EvalFrameEx ()
#44 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
#45 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#46 0x00000000005772f4 in function_call.15044 ()
#47 0x00000000004824c6 in PyObject_Call ()
#48 0x00000000004c7861 in PyEval_EvalFrameEx ()
#49 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
#50 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
#51 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#52 0x00000000005772f4 in function_call.15044 ()
#53 0x00000000004824c6 in PyObject_Call ()
#54 0x00000000004c7861 in PyEval_EvalFrameEx ()
#55 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#56 0x00000000004c5da8 in PyEval_EvalFrameEx ()
#57 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#58 0x00000000004c5da8 in PyEval_EvalFrameEx ()
#59 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#60 0x00000000005772f4 in function_call.15044 ()
#61 0x00000000004824c6 in PyObject_Call ()
#62 0x00000000005186e5 in instancemethod_call.8521 ()
#63 0x00000000004824c6 in PyObject_Call ()
#64 0x0000000000486086 in PyEval_CallObjectWithKeywords ()
#65 0x000000000044f19b in PyInstance_New ()
#66 0x00000000004824c6 in PyObject_Call ()
#67 0x00000000004c5e8a in PyEval_EvalFrameEx ()
#68 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#69 0x00000000004c5da8 in PyEval_EvalFrameEx ()
#70 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#71 0x00000000004c5da8 in PyEval_EvalFrameEx ()
#72 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#73 0x00000000004c5da8 in PyEval_EvalFrameEx ()
#74 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#75 0x00000000005772f4 in function_call.15044 ()
#76 0x00000000004824c6 in PyObject_Call ()
#77 0x00000000004c7861 in PyEval_EvalFrameEx ()
#78 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
#79 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
#80 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#81 0x00000000005772f4 in function_call.15044 ()
#82 0x00000000004824c6 in PyObject_Call ()
#83 0x00000000005186e5 in instancemethod_call.8521 ()
#84 0x00000000004824c6 in PyObject_Call ()
#85 0x000000000051a827 in instance_call.8662 ()
#86 0x00000000004824c6 in PyObject_Call ()
#87 0x00000000004c5e8a in PyEval_EvalFrameEx ()
#88 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
#89 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
#90 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#91 0x00000000004c5da8 in PyEval_EvalFrameEx ()
#92 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#93 0x0000000000577901 in PyRun_FileExFlags ()
#94 0x0000000000577b37 in PyRun_SimpleFileExFlags ()
#95 0x0000000000550497 in Py_Main ()
#96 0x00007fa20058cead in __libc_start_main (main=, 
    argc=, ubp_av=, init=, 
    fini=, rtld_fini=, stack_end=0x7ffff9be1018)
    at libc-start.c:228
#97 0x000000000041dea1 in _start ()

The innermost frames (#2 to #6) are calls into libamdocl64.so, entered through clCreateCommandQueue.
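Since the abort in the backtrace sits under clCreateCommandQueue, a smaller script that does nothing but create a queue on the AMD CPU device might help narrow this down. This is only a sketch, not something taken from the PyOpenCL test suite: the function name `create_cpu_queues` and the `"AMD"` platform-name filter are my own assumptions, and the `pyopencl` module is passed in as an argument so the function itself makes no OpenCL call until it is invoked.

```python
def create_cpu_queues(cl, platform_substring="AMD"):
    """Create one command queue per CPU device on matching platforms.

    `cl` is the imported pyopencl module. `platform_substring` is a
    hypothetical filter matched against the platform name reported
    by clinfo ("AMD Accelerated Parallel Processing" here).
    """
    queues = []
    for platform in cl.get_platforms():
        if platform_substring not in platform.name:
            continue
        for device in platform.get_devices(device_type=cl.device_type.CPU):
            context = cl.Context(devices=[device])
            print("creating queue on %s" % device.name)
            # In the backtrace, the abort happens inside this call (frame #6).
            queues.append(cl.CommandQueue(context, device))
    return queues

# Usage (needs PyOpenCL installed):
#   import pyopencl as cl
#   create_cpu_queues(cl)
```

If this aborts in the same way outside GDB, the problem is probably not specific to the test suite.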

When I run the example under GDB, everything runs fine, and several threads are created:

[New Thread 0x7fffec5bc700 (LWP 26152)]
[New Thread 0x7fffebdaa700 (LWP 26153)]
[New Thread 0x7fffeb598700 (LWP 26154)]
[New Thread 0x7fffead86700 (LWP 26155)]
[Thread 0x7fffead86700 (LWP 26155) exited]
[Thread 0x7fffec5bc700 (LWP 26152) exited]
[Thread 0x7fffeb598700 (LWP 26154) exited]
[Thread 0x7fffebdaa700 (LWP 26153) exited]
[Thread 0x7ffff7e8d700 (LWP 26151) exited]

The same happens for every test.

The problem occurs with both Python 2.6 and 2.7. I suspect a threading problem, either on the Python side or in AMD OpenCL.

 

BTW, what is the correct version of the AMD OpenCL libraries? The Debian package has version 12-1, amdcccle displays "Catalyst version 11.10", and clinfo shows AMD-APP (851.4). Which one should I give when describing the configuration of my computer?
