I installed the amdgpu-pro 16.30 drivers on a 64-bit Ubuntu 16.04 server machine. Strangely, the clinfo utility reports 14 max compute units for both the R9 Nano and the RX 480:
$ clinfo
[snip]
Max compute units: 14
Shouldn't it be 64 for the R9 Nano and 36 for the RX 480?
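For anyone who wants to check their own machine, here is a minimal sketch that pulls the "Max compute units" value out of saved clinfo text and compares it with the counts expected in this thread (64 and 36). The function names, the dictionary labels, and the sample text are hypothetical; only the field name and the numbers come from the post above.

```python
import re
from typing import List

# Expected counts as stated in this thread; the keys are just illustrative labels.
EXPECTED_COMPUTE_UNITS = {"R9 Nano": 64, "RX 480": 36}


def max_compute_units(clinfo_text: str) -> List[int]:
    """Return every 'Max compute units' value found in clinfo output, in order."""
    pattern = re.compile(r"^\s*Max compute units:\s*(\d+)\s*$", re.MULTILINE)
    return [int(m.group(1)) for m in pattern.finditer(clinfo_text)]


def looks_wrong(reported: int, card: str) -> bool:
    """True when the reported count differs from the expected count for the card."""
    return reported != EXPECTED_COMPUTE_UNITS[card]


# Hypothetical excerpt shaped like the clinfo output quoted in this thread.
sample = """\
Device Type: CL_DEVICE_TYPE_GPU
Max compute units: 14
Max work items dimensions: 3
"""

for value in max_compute_units(sample):
    print(value, "suspicious" if looks_wrong(value, "RX 480") else "ok")
```

In practice the text would come from running clinfo and capturing its output; the sample string only stands in for that.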
Hi,
Could you please share the clinfo output?
Regards,
Right, here is the output for an RX 480. I don't know if this matters, but the clinfo utility is from amdgpu-pro-clinfo (not the AMD APP SDK) and it links against libOpenCL.so from amdgpu-pro-libopencl1 (not the AMD APP SDK).
$ ldd /usr/bin/clinfo
linux-vdso.so.1 => (0x00007ffc0897f000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fde13577000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fde1326e000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fde13069000)
libOpenCL.so.1 => /usr/lib/x86_64-linux-gnu/amdgpu-pro/libOpenCL.so.1 (0x00007fde12e62000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fde12c4c000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fde12a2e000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fde12665000)
/lib64/ld-linux-x86-64.so.2 (0x0000561b8e6c6000)
$ dpkg -S /usr/lib/x86_64-linux-gnu/amdgpu-pro/libOpenCL.so.1
amdgpu-pro-libopencl1:amd64: /usr/lib/x86_64-linux-gnu/amdgpu-pro/libOpenCL.so.1
$ dpkg -l amdgpu-pro-libopencl1:amd64
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=============================-===================-===================-================================================================
ii amdgpu-pro-libopencl1:amd64 16.30.3-315407 amd64 AMD OpenCL ICD Loader library
$ clinfo # the output below is for an RX 480
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (2117.7)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name:
Device Topology: PCI[ B#1, D#0, F#0 ]
Max compute units: 14
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 555Mhz
Address bits: 64
Max memory allocation: 4244635648
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 2048
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 8544440320
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Max pipe arguments: 0
Max pipe active reservations: 0
Max pipe packet size: 0
Max global variable size: 0
Max global variable preferred total size: 0
Max read/write image args: 0
Max on device events: 0
Queue on device max size: 0
Max on device queues: 0
Queue on device preferred size: 0
SVM capabilities:
Coarse grain buffer: No
Fine grain buffer: No
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: No
Profiling : No
Platform ID: 0x7f02276c08f8
Name: Ellesmere
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 1.2
Driver version: 2117.7 (VM)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (2117.7)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event
Thanks for sharing the information. I'll check with the concerned team and get back to you.
Regards,
Hi Marc,
It seems that a similar issue has already been reported to the driver team and they are working on it.
Regards,
I'm experiencing the same problem with the R9 Nano.
I have already raised this issue in the past.
Yup. This bug is very annoying. I am writing OpenCL code and because of this bug, I have no proper way of determining the optimal global work size based on the number of compute units...
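To illustrate why the wrong count hurts, here is a rough sketch of one common way to size a launch from the compute unit count, with a manual override as a stopgap while the driver misreports it. The heuristic (work-groups per compute unit times work-group size), the default of 4 groups per unit, and all names are assumptions for illustration, not anything AMD recommends; the 256 work-group size matches the "Max work group size" in the clinfo output earlier in the thread.

```python
from typing import Optional


def global_work_size(reported_cus: int,
                     local_size: int = 256,
                     groups_per_cu: int = 4,
                     cu_override: Optional[int] = None) -> int:
    """Pick a global work size that is a whole number of work-groups per compute unit.

    reported_cus  -- value the driver returns for the max compute units query
    local_size    -- work-group size (256 is the max reported in this thread)
    groups_per_cu -- assumed tuning knob: work-groups to keep in flight per unit
    cu_override   -- known-good count to use when the driver value is wrong
    """
    cus = cu_override if cu_override is not None else reported_cus
    if cus <= 0 or local_size <= 0 or groups_per_cu <= 0:
        raise ValueError("all sizes must be positive")
    # The product is always a multiple of local_size, as OpenCL 1.2 requires.
    return cus * groups_per_cu * local_size


# With the buggy value the launch is sized for far fewer units than the card has.
print(global_work_size(14))                  # sized for 14 units
print(global_work_size(14, cu_override=36))  # sized for the RX 480's 36 units
```

With the misreported 14 units the RX 480 launch comes out at well under half the size it would get with the correct 36, which is the kind of under-subscription described above.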
That's right. btw, I'm wondering where this magic number (14) comes from.
Hi Elias,
Somehow it slipped through the cracks. My apologies. Thanks for reviving it again.
Regards,
Thanks.
Is there a rough estimate of when a new release will be available? The current release is almost 3 months old.
Good news. AMDGPU-PRO 16.40 has just been released. It also supports RHEL 6.8 and 7.2.
AMDGPU-PRO Driver for Linux® – Release Notes
Regards,
That's good news, though the bug remains. The R9 Nano still reports 14 max compute units. In addition, the GPU of the AMD FX-7500 APU (Spectre) is reported to have 7 compute units, where 6 is the correct number.
Thanks for reporting.
Sorry, the issue has not been fixed yet. Please be patient.
Regards,
The bug still persists with the just-released 16.50 driver.
Please do what's necessary to see this annoying bug corrected.
It was fixed in 16.60, finally! clinfo now properly reports 64 for the R9 Nano and 36 for the RX 480.
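If your code has to run on machines with mixed driver versions, a small guard like the sketch below could decide whether to trust the reported compute unit count. It assumes the package version string starts with "major.minor" as in the dpkg output earlier in the thread (16.30.3-315407), and it takes 16.60 as the first fixed release purely on the basis of the post above; the function names are hypothetical.

```python
import re
from typing import Tuple

# First release reported in this thread to return correct counts.
FIXED_IN = (16, 60)


def parse_driver_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a string such as '16.30.3-315407'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def has_compute_unit_fix(version: str) -> bool:
    """True when the driver is at least the release reported as fixed."""
    return parse_driver_version(version) >= FIXED_IN


# The 16.30 string is from the dpkg output in this thread; '16.60' is a stand-in.
print(has_compute_unit_fix("16.30.3-315407"))
print(has_compute_unit_fix("16.60"))
```

On drivers older than that, falling back to a hard-coded count per card is probably the safest workaround.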