Hello.
I maintain the Debian PyOpenCL packages. They allow running OpenCL code from Python on both AMD and NVIDIA hardware. I have a problem running code on the CPU.
I am experiencing this problem on an Asus eeePC 1215N with NVIDIA ION: an Atom CPU (two cores with Hyper-Threading) and a GeForce 9400M GPU, running 64-bit Debian unstable with NVIDIA drivers 295.20 and AMD OpenCL from Catalyst 12-1.
clinfo says:
Number of platforms: 2
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.1 AMD-APP (851.4)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.1 CUDA 4.2.1
Platform Name: NVIDIA CUDA
Platform Vendor: NVIDIA Corporation
Platform Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Device Type: CL_DEVICE_TYPE_CPU
Device ID: 4098
Board name:
Max compute units: 4
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 4
Native vector width double: 0
Max clock frequency: 1600Mhz
Address bits: 64
Max memory allocation: 1843699712
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 24576
Global memory size: 1843699712
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Kernel Preferred work group size multiple: 1
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x7fccc089f100
Name: Intel(R) Atom(TM) CPU 330 @ 1.60GHz
Vendor: GenuineIntel
Device OpenCL C version: OpenCL C 1.1
Driver version: 2.0
Profile: FULL_PROFILE
Version: OpenCL 1.1 AMD-APP (851.4)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt
Platform Name: NVIDIA CUDA
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4318
Max compute units: 2
Max work items dimensions: 3
Max work items[0]: 512
Max work items[1]: 512
Max work items[2]: 64
Max work group size: 512
Preferred vector width char: 1
Preferred vector width short: 1
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 0
Native vector width char: 1
Native vector width short: 1
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 0
Max clock frequency: 1100Mhz
Address bits: 32
Max memory allocation: 134217728
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 4096
Max image 2D height: 16383
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4352
Alignment (bits) of base address: 2048
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 265617408
Constant buffer size: 65536
Max number of constant args: 9
Local memory type: Scratchpad
Local memory size: 16384
Kernel Preferred work group size multiple: 32
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 1000
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 0x1386290
Name: ION
Vendor: NVIDIA Corporation
Device OpenCL C version: OpenCL C 1.0
Driver version: 295.20
Profile: FULL_PROFILE
Version: OpenCL 1.0 CUDA
Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
I am running the PyOpenCL tests. They run fine on an E-350 (on both the GPU and the CPU) and on the ION GPU (GeForce 9400M). They fail on the ION CPU:
(gdb) bt
#0 0x00007fa2005a0475 in *__GI_raise (sig=)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007fa2005a36f0 in *__GI_abort () at abort.c:92
#2 0x00007fa1fa136909 in ?? () from /usr/lib/x86_64-linux-gnu/libamdocl64.so
#3 0x00007fa1fa135c3b in ?? () from /usr/lib/x86_64-linux-gnu/libamdocl64.so
#4 0x00007fa1fa135f15 in ?? () from /usr/lib/x86_64-linux-gnu/libamdocl64.so
#5 0x00007fa1fa12bb6e in ?? () from /usr/lib/x86_64-linux-gnu/libamdocl64.so
#6 0x00007fa1fa0fd5ab in clCreateCommandQueue ()
from /usr/lib/x86_64-linux-gnu/libamdocl64.so
#7 0x00007fa1fc5f07f0 in boost::python::objects::make_holder<3>::apply, boost::mpl::vector3 >::execute(_object*, pyopencl::context const&, pyopencl::device const*, unsigned long) ()
from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
#8 0x00007fa1fc5e9e95 in boost::python::objects::caller_py_function_impl > >::operator()(_object*, _object*) ()
from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
#9 0x00007fa1fc34101f in operator() (kw=0x0, args=
(, , None, 0), this=0x12e4730) at ./boost/python/object/py_function.hpp:143
#10 boost::python::objects::function::call (this=0x12e4720, args=
(, ),
keywords=0x0) at libs/python/src/object/function.cpp:226
#11 0x00007fa1fc341278 in operator() (this=)
at libs/python/src/object/function.cpp:585
#12 boost::detail::function::void_function_ref_invoker0::invoke(boost::detail::function::function_buffer &) (function_obj_ptr=)
at ./boost/function/function_template.hpp:188
#13 0x00007fa1fc34a693 in operator() (this=)
at ./boost/function/function_template.hpp:760
#14 boost::python::detail::exception_handler::operator() (
this=, f=) at libs/python/src/errors.cpp:74
#15 0x00007fa1fc639963 in boost::detail::function::function_obj_invoker2, boost::_bi::list3, boost::arg<2>, boost::_bi::value > >, bool, boost::python::detail::exception_handler const&, boost::function0 const&>::invoke(boost::detail::function::function_buffer&, boost::python::detail::exception_handler const&, boost::function0 const&) ()
from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
#16 0x00007fa1fc34a479 in operator() (a0=,
this=, a1=)
at ./boost/function/function_template.hpp:760
#17 handle (f=, this=)
at ./boost/python/detail/exception_handler.hpp:41
#18 boost::python::handle_exception_impl (f=)
at libs/python/src/errors.cpp:24
#19 0x00007fa1fc33f514 in handle_exception (f=...) at ./boost/python/errors.hpp:29
#20 boost::python::objects::function_call (func=,
args=, kw=)
at libs/python/src/object/function.cpp:626
#21 0x00000000004824c6 in PyObject_Call ()
#22 0x00000000005186e5 in instancemethod_call.8521 ()
#23 0x00000000004824c6 in PyObject_Call ()
#24 0x00000000005179ee in slot_tp_init.25626 ()
#25 0x000000000048212a in type_call.25275 ()
#26 0x00000000004824c6 in PyObject_Call ()
#27 0x00000000004c5e8a in PyEval_EvalFrameEx ()
#28 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#29 0x00000000005772f4 in function_call.15044 ()
#30 0x00000000004824c6 in PyObject_Call ()
#31 0x00000000004c7861 in PyEval_EvalFrameEx ()
#32 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#33 0x00000000005772f4 in function_call.15044 ()
#34 0x00000000004824c6 in PyObject_Call ()
#35 0x00000000004c7861 in PyEval_EvalFrameEx ()
#36 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
#37 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
#38 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#39 0x00000000005772f4 in function_call.15044 ()
#40 0x00000000004824c6 in PyObject_Call ()
#41 0x00000000004c7861 in PyEval_EvalFrameEx ()
#42 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#43 0x00000000004c5da8 in PyEval_EvalFrameEx ()
#44 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
#45 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#46 0x00000000005772f4 in function_call.15044 ()
#47 0x00000000004824c6 in PyObject_Call ()
#48 0x00000000004c7861 in PyEval_EvalFrameEx ()
#49 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
#50 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
#51 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#52 0x00000000005772f4 in function_call.15044 ()
#53 0x00000000004824c6 in PyObject_Call ()
#54 0x00000000004c7861 in PyEval_EvalFrameEx ()
#55 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#56 0x00000000004c5da8 in PyEval_EvalFrameEx ()
#57 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#58 0x00000000004c5da8 in PyEval_EvalFrameEx ()
#59 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#60 0x00000000005772f4 in function_call.15044 ()
#61 0x00000000004824c6 in PyObject_Call ()
#62 0x00000000005186e5 in instancemethod_call.8521 ()
#63 0x00000000004824c6 in PyObject_Call ()
#64 0x0000000000486086 in PyEval_CallObjectWithKeywords ()
#65 0x000000000044f19b in PyInstance_New ()
#66 0x00000000004824c6 in PyObject_Call ()
#67 0x00000000004c5e8a in PyEval_EvalFrameEx ()
#68 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#69 0x00000000004c5da8 in PyEval_EvalFrameEx ()
#70 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#71 0x00000000004c5da8 in PyEval_EvalFrameEx ()
#72 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#73 0x00000000004c5da8 in PyEval_EvalFrameEx ()
#74 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#75 0x00000000005772f4 in function_call.15044 ()
#76 0x00000000004824c6 in PyObject_Call ()
#77 0x00000000004c7861 in PyEval_EvalFrameEx ()
#78 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
#79 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
#80 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#81 0x00000000005772f4 in function_call.15044 ()
#82 0x00000000004824c6 in PyObject_Call ()
#83 0x00000000005186e5 in instancemethod_call.8521 ()
#84 0x00000000004824c6 in PyObject_Call ()
#85 0x000000000051a827 in instance_call.8662 ()
#86 0x00000000004824c6 in PyObject_Call ()
#87 0x00000000004c5e8a in PyEval_EvalFrameEx ()
#88 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
#89 0x00000000004c5ff2 in PyEval_EvalFrameEx ()
#90 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#91 0x00000000004c5da8 in PyEval_EvalFrameEx ()
#92 0x00000000004ccee6 in PyEval_EvalCodeEx ()
#93 0x0000000000577901 in PyRun_FileExFlags ()
#94 0x0000000000577b37 in PyRun_SimpleFileExFlags ()
#95 0x0000000000550497 in Py_Main ()
#96 0x00007fa20058cead in __libc_start_main (main=,
argc=, ubp_av=, init=,
fini=, rtld_fini=, stack_end=0x7ffff9be1018)
at libc-start.c:228
#97 0x000000000041dea1 in _start ()
The innermost frames (#2 to #6) are calls into libamdocl64.so, entered through clCreateCommandQueue.
When I run the example under GDB, everything runs fine, and it creates many threads:
[New Thread 0x7fffec5bc700 (LWP 26152)]
[New Thread 0x7fffebdaa700 (LWP 26153)]
[New Thread 0x7fffeb598700 (LWP 26154)]
[New Thread 0x7fffead86700 (LWP 26155)]
[Thread 0x7fffead86700 (LWP 26155) exited]
[Thread 0x7fffec5bc700 (LWP 26152) exited]
[Thread 0x7fffeb598700 (LWP 26154) exited]
[Thread 0x7fffebdaa700 (LWP 26153) exited]
[Thread 0x7ffff7e8d700 (LWP 26151) exited]
This repeats for all tests.
The problem occurs with both Python 2.6 and 2.7. I suspect that there is some problem with threads, either on the Python side or on the AMD OpenCL side.
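Since the backtrace ends in clCreateCommandQueue, the failure can probably be narrowed down without the full test suite. Below is a minimal sketch of such a probe, not the actual PyOpenCL test code. The function name probe_command_queues is made up for illustration, and the pyopencl module is passed in as the cl argument. Because the abort() happens inside the native library, Python cannot catch it, so the probe prints and flushes the device name before each attempt; the last line printed identifies the device that crashed.

```python
import sys


def probe_command_queues(cl, out=sys.stdout):
    """Create a context and a command queue on every device of every platform.

    `cl` is the pyopencl module (passed in explicitly so the function has
    no import-time dependency). Returns (platform name, device name) pairs
    for every device on which queue creation returned normally.
    """
    created = []
    for platform in cl.get_platforms():
        for device in platform.get_devices():
            # Flush before the risky call: a native abort() kills the
            # process, so this line is the only trace left behind.
            out.write("creating queue on %s / %s\n" % (platform.name, device.name))
            out.flush()
            context = cl.Context([device])
            cl.CommandQueue(context, device)
            created.append((platform.name, device.name))
    return created


# Usage, assuming pyopencl is installed:
#   import pyopencl as cl
#   probe_command_queues(cl)
```

Running this both inside and outside GDB should show whether the difference in behaviour is tied to queue creation on the CPU device alone.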
BTW, what is the correct version of the AMD OpenCL libraries? The Debian package has version 12-1, amdcccle displays "Catalyst version 11.10", and clinfo shows AMD-APP (851.4). Which one should I give when describing the configuration of my computer?
Use the AMD-APP version for the OpenCL library information; we can figure out almost everything else from that. I've forwarded your crash to our runtime guys to look at.
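For bug reports, the AMD-APP number can be pulled out of the platform version string that clinfo prints. The following is a small sketch that assumes the string always has the form shown in the clinfo output above, "OpenCL 1.1 AMD-APP (851.4)"; the helper name amd_app_version is hypothetical.

```python
import re


def amd_app_version(version_string):
    """Return the AMD-APP version found in an OpenCL version string, or None."""
    # Assumed format: "... AMD-APP (<digits and dots>)"
    match = re.search(r"AMD-APP \(([\d.]+)\)", version_string)
    return match.group(1) if match else None


print(amd_app_version("OpenCL 1.1 AMD-APP (851.4)"))  # 851.4
print(amd_app_version("OpenCL 1.1 CUDA 4.2.1"))       # None
```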