1 Reply Latest reply on Feb 20, 2012 12:34 PM by MicahVillmow

    OpenCL crash on Atom CPU

    tomaszrybak

      Hello.

      I am maintaining the Debian PyOpenCL packages. They allow running OpenCL code from Python on both AMD and NVIDIA hardware. I have a problem running code on the CPU.

      I am seeing this problem on an Asus eeePC 1215N with NVIDIA ION: an Atom CPU (two cores with HyperThreading) and a GeForce 9400M GPU, running 64-bit Debian unstable with NVIDIA driver 295.20 and AMD OpenCL from Catalyst 12-1.

      clinfo says:

      Number of platforms:                     2
        Platform Profile:                     FULL_PROFILE
        Platform Version:                     OpenCL 1.1 AMD-APP (851.4)
        Platform Name:                     AMD Accelerated Parallel Processing
        Platform Vendor:                     Advanced Micro Devices, Inc.
        Platform Extensions:                     cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
        Platform Profile:                     FULL_PROFILE
        Platform Version:                     OpenCL 1.1 CUDA 4.2.1
        Platform Name:                     NVIDIA CUDA
        Platform Vendor:                     NVIDIA Corporation
        Platform Extensions:                     cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll 
      
      
        Platform Name:                     AMD Accelerated Parallel Processing
      Number of devices:                     1
        Device Type:                          CL_DEVICE_TYPE_CPU
        Device ID:                          4098
        Board name:                          
        Max compute units:                     4
        Max work items dimensions:                3
          Max work items[0]:                     1024
          Max work items[1]:                     1024
          Max work items[2]:                     1024
        Max work group size:                     1024
        Preferred vector width char:                16
        Preferred vector width short:                8
        Preferred vector width int:                4
        Preferred vector width long:                2
        Preferred vector width float:                4
        Preferred vector width double:           0
        Native vector width char:                16
        Native vector width short:                8
        Native vector width int:                4
        Native vector width long:                2
        Native vector width float:                4
        Native vector width double:                0
        Max clock frequency:                     1600Mhz
        Address bits:                          64
        Max memory allocation:                1843699712
        Image support:                     Yes
        Max number of images read arguments:           128
        Max number of images write arguments:           8
        Max image 2D width:                     8192
        Max image 2D height:                     8192
        Max image 3D width:                     2048
        Max image 3D height:                     2048
        Max image 3D depth:                     2048
        Max samplers within kernel:                16
        Max size of kernel argument:                4096
        Alignment (bits) of base address:           1024
        Minimum alignment (bytes) for any datatype:      128
        Single precision floating point capability
          Denorms:                          Yes
          Quiet NaNs:                          Yes
          Round to nearest even:                Yes
          Round to zero:                     Yes
          Round to +ve and infinity:                Yes
          IEEE754-2008 fused multiply-add:           Yes
        Cache type:                          Read/Write
        Cache line size:                     64
        Cache size:                          24576
        Global memory size:                     1843699712
        Constant buffer size:                     65536
        Max number of constant args:                8
        Local memory type:                     Global
        Local memory size:                     32768
        Kernel Preferred work group size multiple:      1
        Error correction support:                0
        Unified memory for Host and Device:           1
        Profiling timer resolution:                1
        Device endianess:                     Little
        Available:                          Yes
        Compiler available:                     Yes
        Execution capabilities:                     
          Execute OpenCL kernels:                Yes
          Execute native function:                Yes
        Queue properties:                     
          Out-of-Order:                     No
          Profiling :                          Yes
        Platform ID:                          0x7fccc089f100
        Name:                               Intel(R) Atom(TM) CPU  330   @ 1.60GHz
        Vendor:                          GenuineIntel
        Device OpenCL C version:                OpenCL C 1.1 
        Driver version:                     2.0
        Profile:                          FULL_PROFILE
        Version:                          OpenCL 1.1 AMD-APP (851.4)
        Extensions:                          cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt 
      
      
        Platform Name:                     NVIDIA CUDA
      Number of devices:                     1
        Device Type:                          CL_DEVICE_TYPE_GPU
        Device ID:                          4318
        Max compute units:                     2
        Max work items dimensions:                3
          Max work items[0]:                     512
          Max work items[1]:                     512
          Max work items[2]:                     64
        Max work group size:                     512
        Preferred vector width char:                1
        Preferred vector width short:                1
        Preferred vector width int:                1
        Preferred vector width long:                1
        Preferred vector width float:                1
        Preferred vector width double:           0
        Native vector width char:                1
        Native vector width short:                1
        Native vector width int:                1
        Native vector width long:                1
        Native vector width float:                1
        Native vector width double:                0
        Max clock frequency:                     1100Mhz
        Address bits:                          32
        Max memory allocation:                134217728
        Image support:                     Yes
        Max number of images read arguments:           128
        Max number of images write arguments:           8
        Max image 2D width:                     4096
        Max image 2D height:                     16383
        Max image 3D width:                     2048
        Max image 3D height:                     2048
        Max image 3D depth:                     2048
        Max samplers within kernel:                16
        Max size of kernel argument:                4352
        Alignment (bits) of base address:           2048
        Minimum alignment (bytes) for any datatype:      128
        Single precision floating point capability
          Denorms:                          No
          Quiet NaNs:                          Yes
          Round to nearest even:                Yes
          Round to zero:                     Yes
          Round to +ve and infinity:                Yes
          IEEE754-2008 fused multiply-add:           Yes
        Cache type:                          None
        Cache line size:                     0
        Cache size:                          0
        Global memory size:                     265617408
        Constant buffer size:                     65536
        Max number of constant args:                9
        Local memory type:                     Scratchpad
        Local memory size:                     16384
        Kernel Preferred work group size multiple:      32
        Error correction support:                0
        Unified memory for Host and Device:           1
        Profiling timer resolution:                1000
        Device endianess:                     Little
        Available:                          Yes
        Compiler available:                     Yes
        Execution capabilities:                     
          Execute OpenCL kernels:                Yes
          Execute native function:                No
        Queue properties:                     
          Out-of-Order:                     Yes
          Profiling :                          Yes
        Platform ID:                          0x1386290
        Name:                               ION
        Vendor:                          NVIDIA Corporation
        Device OpenCL C version:                OpenCL C 1.0 
        Driver version:                     295.20
        Profile:                          FULL_PROFILE
        Version:                          OpenCL 1.0 CUDA
        Extensions:                          cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll  cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics 
      

      I am running the PyOpenCL tests. They pass on an E-350 (on both the GPU and the CPU) and on the ION GPU (GeForce 9400M), but they fail on the Atom CPU:

      (gdb) bt
      #0  0x00007fa2005a0475 in *__GI_raise (sig=)
          at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
      #1  0x00007fa2005a36f0 in *__GI_abort () at abort.c:92
      #2  0x00007fa1fa136909 in ?? () from /usr/lib/x86_64-linux-gnu/libamdocl64.so
      #3  0x00007fa1fa135c3b in ?? () from /usr/lib/x86_64-linux-gnu/libamdocl64.so
      #4  0x00007fa1fa135f15 in ?? () from /usr/lib/x86_64-linux-gnu/libamdocl64.so
      #5  0x00007fa1fa12bb6e in ?? () from /usr/lib/x86_64-linux-gnu/libamdocl64.so
      #6  0x00007fa1fa0fd5ab in clCreateCommandQueue ()
         from /usr/lib/x86_64-linux-gnu/libamdocl64.so
      #7  0x00007fa1fc5f07f0 in boost::python::objects::make_holder<3>::apply, boost::mpl::vector3 >::execute(_object*, pyopencl::context const&, pyopencl::device const*, unsigned long) ()
         from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
      #8  0x00007fa1fc5e9e95 in boost::python::objects::caller_py_function_impl > >::operator()(_object*, _object*) ()
         from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
      #9  0x00007fa1fc34101f in operator() (kw=0x0, args=
          (, , None, 0), this=0x12e4730) at ./boost/python/object/py_function.hpp:143
      #10 boost::python::objects::function::call (this=0x12e4720, args=
          (, ), 
          keywords=0x0) at libs/python/src/object/function.cpp:226
      #11 0x00007fa1fc341278 in operator() (this=)
          at libs/python/src/object/function.cpp:585
      #12 boost::detail::function::void_function_ref_invoker0::invoke(boost::detail::function::function_buffer &) (function_obj_ptr=)
          at ./boost/function/function_template.hpp:188
      #13 0x00007fa1fc34a693 in operator() (this=)
          at ./boost/function/function_template.hpp:760
      #14 boost::python::detail::exception_handler::operator() (
          this=, f=) at libs/python/src/errors.cpp:74
      #15 0x00007fa1fc639963 in boost::detail::function::function_obj_invoker2, boost::_bi::list3, boost::arg<2>, boost::_bi::value > >, bool, boost::python::detail::exception_handler const&, boost::function0 const&>::invoke(boost::detail::function::function_buffer&, boost::python::detail::exception_handler const&, boost::function0 const&) ()
         from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
      #16 0x00007fa1fc34a479 in operator() (a0=, 
          this=, a1=)
          at ./boost/function/function_template.hpp:760
      #17 handle (f=, this=)
          at ./boost/python/detail/exception_handler.hpp:41
      #18 boost::python::handle_exception_impl (f=)
          at libs/python/src/errors.cpp:24
      #19 0x00007fa1fc33f514 in handle_exception (f=...) at ./boost/python/errors.hpp:29
      #20 boost::python::objects::function_call (func=, 
          args=, kw=)
          at libs/python/src/object/function.cpp:626
      #21 0x00000000004824c6 in PyObject_Call ()
      #22 0x00000000005186e5 in instancemethod_call.8521 ()
      #23 0x00000000004824c6 in PyObject_Call ()
      #24 0x00000000005179ee in slot_tp_init.25626 ()
      #25 0x000000000048212a in type_call.25275 ()
      #26 0x00000000004824c6 in PyObject_Call ()
      #27 0x00000000004c5e8a in PyEval_EvalFrameEx ()
      #28 0x00000000004ccee6 in PyEval_EvalCodeEx ()
      #29 0x00000000005772f4 in function_call.15044 ()
      #30 0x00000000004824c6 in PyObject_Call ()
      #31 0x00000000004c7861 in PyEval_EvalFrameEx ()
      #32 0x00000000004ccee6 in PyEval_EvalCodeEx ()
      #33 0x00000000005772f4 in function_call.15044 ()
      #34 0x00000000004824c6 in PyObject_Call ()
      #35 0x00000000004c7861 in PyEval_EvalFrameEx ()
      #36 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
      #37 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
      #38 0x00000000004ccee6 in PyEval_EvalCodeEx ()
      #39 0x00000000005772f4 in function_call.15044 ()
      #40 0x00000000004824c6 in PyObject_Call ()
      #41 0x00000000004c7861 in PyEval_EvalFrameEx ()
      #42 0x00000000004ccee6 in PyEval_EvalCodeEx ()
      #43 0x00000000004c5da8 in PyEval_EvalFrameEx ()
      #44 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
      #45 0x00000000004ccee6 in PyEval_EvalCodeEx ()
      #46 0x00000000005772f4 in function_call.15044 ()
      #47 0x00000000004824c6 in PyObject_Call ()
      #48 0x00000000004c7861 in PyEval_EvalFrameEx ()
      #49 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
      #50 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
      #51 0x00000000004ccee6 in PyEval_EvalCodeEx ()
      #52 0x00000000005772f4 in function_call.15044 ()
      #53 0x00000000004824c6 in PyObject_Call ()
      #54 0x00000000004c7861 in PyEval_EvalFrameEx ()
      #55 0x00000000004ccee6 in PyEval_EvalCodeEx ()
      #56 0x00000000004c5da8 in PyEval_EvalFrameEx ()
      #57 0x00000000004ccee6 in PyEval_EvalCodeEx ()
      #58 0x00000000004c5da8 in PyEval_EvalFrameEx ()
      #59 0x00000000004ccee6 in PyEval_EvalCodeEx ()
      #60 0x00000000005772f4 in function_call.15044 ()
      #61 0x00000000004824c6 in PyObject_Call ()
      #62 0x00000000005186e5 in instancemethod_call.8521 ()
      #63 0x00000000004824c6 in PyObject_Call ()
      #64 0x0000000000486086 in PyEval_CallObjectWithKeywords ()
      #65 0x000000000044f19b in PyInstance_New ()
      #66 0x00000000004824c6 in PyObject_Call ()
      #67 0x00000000004c5e8a in PyEval_EvalFrameEx ()
      #68 0x00000000004ccee6 in PyEval_EvalCodeEx ()
      #69 0x00000000004c5da8 in PyEval_EvalFrameEx ()
      #70 0x00000000004ccee6 in PyEval_EvalCodeEx ()
      #71 0x00000000004c5da8 in PyEval_EvalFrameEx ()
      #72 0x00000000004ccee6 in PyEval_EvalCodeEx ()
      #73 0x00000000004c5da8 in PyEval_EvalFrameEx ()
      #74 0x00000000004ccee6 in PyEval_EvalCodeEx ()
      #75 0x00000000005772f4 in function_call.15044 ()
      #76 0x00000000004824c6 in PyObject_Call ()
      #77 0x00000000004c7861 in PyEval_EvalFrameEx ()
      #78 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
      #79 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
      #80 0x00000000004ccee6 in PyEval_EvalCodeEx ()
      #81 0x00000000005772f4 in function_call.15044 ()
      #82 0x00000000004824c6 in PyObject_Call ()
      #83 0x00000000005186e5 in instancemethod_call.8521 ()
      #84 0x00000000004824c6 in PyObject_Call ()
      #85 0x000000000051a827 in instance_call.8662 ()
      #86 0x00000000004824c6 in PyObject_Call ()
      #87 0x00000000004c5e8a in PyEval_EvalFrameEx ()
      #88 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
      #89 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
      #90 0x00000000004ccee6 in PyEval_EvalCodeEx ()
      #91 0x00000000004c5da8 in PyEval_EvalFrameEx ()
      #92 0x00000000004ccee6 in PyEval_EvalCodeEx ()
      #93 0x0000000000577901 in PyRun_FileExFlags ()
      #94 0x0000000000577b37 in PyRun_SimpleFileExFlags ()
      #95 0x0000000000550497 in Py_Main ()
      #96 0x00007fa20058cead in __libc_start_main (main=, 
          argc=, ubp_av=, init=, 
          fini=, rtld_fini=, stack_end=0x7ffff9be1018)
          at libc-start.c:228
      #97 0x000000000041dea1 in _start ()
      

      The innermost frames (#2 to #6) are in libamdocl64.so; the abort is raised from inside clCreateCommandQueue.
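With a backtrace this long, a small script can pick out the frames that belong to one shared library and report the innermost one with a resolved symbol. This is a sketch that assumes gdb's usual `#N  0xADDR in FUNC (...) from LIB` layout, where the `from LIB` part may wrap onto the next line; `parse_frames` and `innermost_named_frame` are hypothetical helper names.

```python
import re

# Matches gdb frame lines such as:
#   #6  0x00007fa1fa0fd5ab in clCreateCommandQueue ()
#   #10 boost::python::objects::function::call (this=...   (inlined, no address)
FRAME_RE = re.compile(r"^\s*#(\d+)\s+(?:0x[0-9a-f]+ in )?(\S+)")


def parse_frames(bt_text):
    """Return a list of (frame_number, function, library_or_None).

    gdb sometimes prints "from LIB" on the following line, so a line
    that starts with "from " is attached to the previous frame.
    """
    frames = []
    for line in bt_text.splitlines():
        stripped = line.strip()
        m = FRAME_RE.match(line)
        if m:
            lib = None
            if " from " in stripped:
                lib = stripped.rsplit(" from ", 1)[1]
            frames.append([int(m.group(1)), m.group(2), lib])
        elif stripped.startswith("from ") and frames:
            frames[-1][2] = stripped[len("from "):]
        # other continuation lines ("at file.c:NN", argument lists) are ignored
    return [tuple(f) for f in frames]


def innermost_named_frame(frames, library_suffix):
    """First frame (innermost first) inside the library with a known symbol."""
    for num, func, lib in frames:
        if lib and lib.endswith(library_suffix) and func != "??":
            return num, func
    return None
```

Fed the full backtrace above, `innermost_named_frame(parse_frames(bt), "libamdocl64.so")` should return `(6, 'clCreateCommandQueue')`, because frames #2 to #5 in that library have no symbols (`??`).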

      When I run the example under GDB, everything works fine, and it creates many threads:

      [New Thread 0x7fffec5bc700 (LWP 26152)]
      [New Thread 0x7fffebdaa700 (LWP 26153)]
      [New Thread 0x7fffeb598700 (LWP 26154)]
      [New Thread 0x7fffead86700 (LWP 26155)]
      [Thread 0x7fffead86700 (LWP 26155) exited]
      [Thread 0x7fffec5bc700 (LWP 26152) exited]
      [Thread 0x7fffeb598700 (LWP 26154) exited]
      [Thread 0x7fffebdaa700 (LWP 26153) exited]
      [Thread 0x7ffff7e8d700 (LWP 26151) exited]
      

      The same happens for every test.
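To check from a log like the one above that every thread the runtime started also exited, the `[New Thread ...]` and `[Thread ... exited]` lines can be matched by LWP id. A sketch, assuming gdb's message format exactly as shown; `thread_balance` is a hypothetical helper name.

```python
import re

# gdb thread messages, e.g. "[New Thread 0x7fffec5bc700 (LWP 26152)]"
NEW_RE = re.compile(r"\[New Thread (0x[0-9a-f]+) \(LWP (\d+)\)\]")
EXIT_RE = re.compile(r"\[Thread (0x[0-9a-f]+) \(LWP (\d+)\) exited\]")


def thread_balance(log_text):
    """Return (created, exited, still_running) as sets of LWP ids."""
    created = {int(m.group(2)) for m in NEW_RE.finditer(log_text)}
    exited = {int(m.group(2)) for m in EXIT_RE.finditer(log_text)}
    # threads announced in this log that never reported an exit
    return created, exited, created - exited
```

On the excerpt above, all four new threads (LWP 26152 to 26155) appear in the exited set, along with LWP 26151, which was started before the excerpt begins, so nothing is left running.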

      The problem occurs with both Python 2.6 and 2.7. I suspect a threading problem, either on the Python side or in AMD's OpenCL runtime.
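Since frame #6 of the backtrace is clCreateCommandQueue, a reproducer that only creates a context and a queue on the AMD CPU device may help narrow things down outside the test suite. The following is a sketch, not part of PyOpenCL's tests: it assumes the AMD platform name contains "AMD" (true for "AMD Accelerated Parallel Processing" above), and `find_cpu_device` is a hypothetical helper. Because the runtime calls abort(), a crash kills the process with SIGABRT instead of raising a Python exception, so the printed progress lines show how far it got.

```python
# Hypothetical reproducer sketch: create a context and a command queue on
# the AMD CPU device only. Assumes the AMD platform name contains "AMD".
import sys

try:
    import pyopencl as cl
except ImportError:  # lets the helpers load even without PyOpenCL
    cl = None


def find_cpu_device(platform_substring="AMD"):
    """Return (platform, device) for the first CPU device on a matching
    platform, or (None, None) if there is none."""
    try:
        platforms = cl.get_platforms()
    except cl.Error:  # e.g. no OpenCL ICD is registered
        return None, None
    for platform in platforms:
        if platform_substring not in platform.name:
            continue
        try:
            devices = platform.get_devices()
        except cl.Error:
            continue
        for device in devices:
            # device.type is a bit field; test for the CPU bit
            if device.type & cl.device_type.CPU:
                return platform, device
    return None, None


def main():
    if cl is None:
        print("PyOpenCL is not installed")
        return False
    platform, device = find_cpu_device()
    if device is None:
        print("no AMD CPU device found")
        return False
    print("platform:", platform.name, "|", platform.version)
    print("device:  ", device.name)
    ctx = cl.Context(devices=[device])
    print("context created")
    sys.stdout.flush()  # make output visible before a possible abort()
    # In the reported backtrace the abort happens inside clCreateCommandQueue
    cl.CommandQueue(ctx, device)
    print("command queue created")
    return True


if __name__ == "__main__":
    main()
```

Running it once directly and once under gdb (for example `gdb --args python repro.py`, where `repro.py` is a hypothetical file name) would show whether the GDB/no-GDB difference persists without the test harness.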

       

      BTW, which is the correct version of the AMD OpenCL libraries? The Debian package has version 12-1, amdcccle displays "Catalyst version 11.10", and clinfo shows AMD-APP (851.4). Which one should I give when describing my computer's configuration?
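Whichever name ends up in a bug report, the strings the installed runtime reports about itself can be collected with PyOpenCL; they are the values clinfo prints as `Platform Version`, `Version` and `Driver version`. A sketch assuming PyOpenCL is importable; `opencl_versions` is a hypothetical helper name.

```python
try:
    import pyopencl as cl
except ImportError:
    cl = None


def opencl_versions():
    """Collect the version strings each OpenCL platform/device reports.

    Returns a list of dicts; empty if PyOpenCL or an ICD is unavailable.
    """
    if cl is None:
        return []
    try:
        platforms = cl.get_platforms()
    except cl.Error:  # e.g. no OpenCL ICD is registered
        return []
    rows = []
    for platform in platforms:
        try:
            devices = platform.get_devices()
        except cl.Error:
            continue
        for device in devices:
            rows.append({
                "platform": platform.name,
                "platform_version": platform.version,  # clinfo "Platform Version"
                "device": device.name,
                "device_version": device.version,  # clinfo "Version"
                "driver_version": device.driver_version,  # clinfo "Driver version"
            })
    return rows


if __name__ == "__main__":
    for row in opencl_versions():
        print(row)
```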