Hello. I'm trying to get OpenCL working on Linux Ubuntu 20.04.1 LTS on my 6900XT. I've tried these combinations:
1) Ubuntu kernel 5.4.0-54
   amdgpu driver 20.45-1188099 with included rocr dkms
   Result: GPU is detected but clinfo hangs part way through displaying the info. My OpenCL code hangs trying to create a context.
2) Ubuntu kernel 5.4.0-54
   amdgpu driver 20.45-1188099 with --no-dkms and rocm 220.127.116.11000-23 (rocm-opencl 3.6Beta-17-g875c1f8-rocm-rel-4.0-23)
   Result: GPU is not detected. clinfo shows the platform AMD Accelerated Parallel Processing but it has 0 devices.
3) Upstream kernel 5.10.0-051000
   rocm 18.104.22.168000-23 (rocm-opencl 3.6Beta-17-g875c1f8-rocm-rel-4.0-23)
   Result: rocminfo sees the GPU fine, clinfo hangs after displaying some of the data for the GPU just like in 1).
Here's the GPU data for rocminfo:
*******
Agent 2
*******
  Name: gfx1030
  Uuid: GPU-XX
  Marketing Name: Device 73bf
  Vendor Name: AMD
  Feature: KERNEL_DISPATCH
  Profile: BASE_PROFILE
  Float Round Mode: NEAR
  Max Queue Number: 128(0x80)
  Queue Min Size: 4096(0x1000)
  Queue Max Size: 131072(0x20000)
  Queue Type: MULTI
  Node: 1
  Device Type: GPU
  Cache Info:
    L1: 16(0x10) KB
  Chip ID: 29631(0x73bf)
  Cacheline Size: 64(0x40)
  Max Clock Freq. (MHz): 2660
  BDFID: 768
  Internal Node ID: 1
  Compute Unit: 80
  SIMDs per CU: 4
  Shader Engines: 8
  Shader Arrs. per Eng.: 2
  WatchPts on Addr. Ranges:4
  Features: KERNEL_DISPATCH
  Fast F16 Operation: FALSE
  Wavefront Size: 32(0x20)
  Workgroup Max Size: 1024(0x400)
  Workgroup Max Size per Dimension:
    x 1024(0x400)
    y 1024(0x400)
    z 1024(0x400)
  Max Waves Per CU: 64(0x40)
  Max Work-item Per CU: 2048(0x800)
  Grid Max Size: 4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x 4294967295(0xffffffff)
    y 4294967295(0xffffffff)
    z 4294967295(0xffffffff)
  Max fbarriers/Workgrp: 32
  Pool Info:
    Pool 1
      Segment: GLOBAL; FLAGS: COARSE GRAINED
      Size: 16760832(0xffc000) KB
      Allocatable: TRUE
      Alloc Granule: 4KB
      Alloc Alignment: 4KB
      Accessible by all: FALSE
    Pool 2
      Segment: GROUP
      Size: 64(0x40) KB
      Allocatable: FALSE
      Alloc Granule: 0KB
      Alloc Alignment: 0KB
      Accessible by all: FALSE
  ISA Info:
    ISA 1
      Name: amdgcn-amd-amdhsa--gfx1030
      Machine Models: HSA_MACHINE_MODEL_LARGE
      Profiles: HSA_PROFILE_BASE
      Default Rounding Mode: NEAR
      Default Rounding Mode: NEAR
      Fast f16: TRUE
      Workgroup Max Size: 1024(0x400)
      Workgroup Max Size per Dimension:
        x 1024(0x400)
        y 1024(0x400)
        z 1024(0x400)
      Grid Max Size: 4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x 4294967295(0xffffffff)
        y 4294967295(0xffffffff)
        z 4294967295(0xffffffff)
      FBarrier Max Size: 32
Here's the clinfo for the GPU:
  Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
  Device Type: CL_DEVICE_TYPE_GPU
  Vendor ID: 1002h
  Board name: Device 73bf
  Device Topology: PCI[ B#3, D#0, F#0 ]
  Max compute units: 40
  Max work items dimensions: 3
    Max work items: 1024
    Max work items: 1024
    Max work items: 1024
  Max work group size: 256
  Preferred vector width char: 4
  Preferred vector width short: 2
  Preferred vector width int: 1
  Preferred vector width long: 1
  Preferred vector width float: 1
  Preferred vector width double: 1
  Native vector width char: 4
  Native vector width short: 2
  Native vector width int: 1
  Native vector width long: 1
  Native vector width float: 1
  Native vector width double: 1
  Max clock frequency: 2660Mhz
  Address bits: 64
  Max memory allocation: 14588628168
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 8
  Max image 2D width: 16384
  Max image 2D height: 16384
  Max image 3D width: 16384
  Max image 3D height: 16384
  Max image 3D depth: 8192
  Max samplers within kernel: 29631
  Max size of kernel argument: 1024
  Alignment (bits) of base address: 1024
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: Yes
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: Read/Write
  Cache line size: 64
  Cache size: 16384
  Global memory size: 17163091968
  Constant buffer size: 14588628168
  Max number of constant args: 8
  Local memory type: Scratchpad
  Local memory size: 65536
  Max pipe arguments: 16
  Max pipe active reservations: 16
  Max pipe packet size: 1703726280
  Max global variable size: 14588628168
  Max global variable preferred total size: 17163091968
  Max read/write image args: 64
  Max on device events: 1024
  Queue on device max size: 8388608
  Max on device queues: 1
  Queue on device preferred size: 262144
  SVM capabilities:
    Coarse grain buffer: Yes
    Fine grain buffer: Yes
    Fine grain system: No
    Atomics: No
  Preferred platform atomic alignment: 0
  Preferred global atomic alignment: 0
  Preferred local atomic alignment: 0
<clinfo hangs here>
Has anyone gotten OpenCL working? Am I doing something wrong? I got this card specifically for a compute research project I'm doing so any help I can get on this is appreciated. I'm happy to provide any other info I can. Thanks!
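For anyone trying to reproduce this, here is a rough Python sketch of how the hang could be captured without freezing the terminal: run the tool under a timeout and keep whatever it printed before being killed. The helper name run_with_timeout and the 30 second default are my own choices, not anything from the driver or ROCm packages. Note that a tool that buffers its output may show less text here than it does in a terminal.

```python
import subprocess


def run_with_timeout(cmd, timeout_s=30):
    """Run cmd and return (finished, output).

    finished is False when the command had to be killed after timeout_s
    seconds; output then holds whatever was captured before the kill.
    """
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # interleave stderr with stdout
            timeout=timeout_s,
        )
        return True, done.stdout.decode(errors="replace")
    except subprocess.TimeoutExpired as exc:
        # exc.output is the partial capture; it can be None if nothing arrived.
        partial = exc.output or b""
        return False, partial.decode(errors="replace")
```

Calling run_with_timeout(["clinfo"]) should then return False plus the partial device listing in the hanging case, and the last line of that text shows how far the query got.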
A couple of corrections to my post above. My pyopencl code doesn't hang while creating the context; it hangs after that, when creating the command queue.
Also, I'm running the GPU with an FX-8350. According to the ROCm documentation, older GPUs require PCIe 3.0 atomics, which this CPU doesn't support, but I'm guessing the 6900XT is new enough not to require them.
I'm open to other OS/software combinations to get this working, so if anyone has this working under a different configuration I'd be happy to try it too! I've seen OpenCL benchmarks for this card, so it has to work somehow, right?
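To show exactly which call stalls, a minimal pyopencl sketch along these lines prints a marker before and after each step. It assumes the AMD platform is the first one returned and that the 6900XT is its first GPU device; run_stages and opencl_stages are just illustrative names, and this is a stripped-down stand-in rather than my actual project code.

```python
import sys


def run_stages(stages, log=sys.stderr):
    """Run (label, callable) pairs in order, announcing each one first.

    Each callable receives the previous stage's result. A "[start]" line
    without a matching "[ok]" line marks the call that hung or raised.
    """
    result = None
    for label, step in stages:
        print(f"[start] {label}", file=log, flush=True)
        result = step(result)
        print(f"[ok]    {label}", file=log, flush=True)
    return result


def opencl_stages():
    """Platform -> device -> context -> queue, one stage per call."""
    import pyopencl as cl  # imported lazily; only needed for the real run

    return [
        # Assumption: platform 0 is AMD Accelerated Parallel Processing.
        ("get_platforms", lambda _: cl.get_platforms()[0]),
        # Assumption: the first GPU device is the card under test.
        ("get_devices",
         lambda plat: plat.get_devices(device_type=cl.device_type.GPU)[0]),
        ("Context", lambda dev: cl.Context(devices=[dev])),
        ("CommandQueue", lambda ctx: cl.CommandQueue(ctx)),
    ]
```

Running run_stages(opencl_stages()) on the affected machine should, if my description above is right, print "[ok]" for Context and then stop after "[start] CommandQueue".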
Unfortunately I never solved this. I confirmed that the same problem exists on CentOS. I've been running my code on Windows, which isn't efficient for my workflow and also periodically produces bizarre results, so I've been waiting for AMD to release an update for the Linux driver. It has been 2 months, so hopefully it will happen soon.
Update: I downloaded the new 20.50 Linux drivers and installed them on a fresh install of Ubuntu, and clinfo still hangs. I was hoping that this basic functionality would work after 3 months, but no luck yet.
Good news: today's update to rocm-opencl/HSA-rocr 4.1 seems to have fixed it for me.
Working well now.
If memory serves, the 6xxx series requires ROCm to use OpenCL, so it might be an idea to try that, Terrence!
Ubuntu should have the update, I would have thought.
Thanks for the info! I tried installing rocm-dkms from the ROCm repo but then my card wasn't detected at all. I also tried installing rocm-opencl4.1.0 and hsa-rocr-dev4.1.0 among others but that didn't help either. Did I miss a step?
You need to remove the previous version first, as upgrading causes issues. You're best off purging all related rocm/hsa packages and then trying again.
I don't have hsa-rocr-dev installed, but I have the following on mine:
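As a rough sketch of the purge step on Ubuntu, something like the Python below lists installed packages whose names start with rocm or hsa and builds the apt-get purge command for review. The name pattern is only my guess at what counts as "related", so check the list before running anything; the function names are made up for this example.

```python
import re
import shlex
import subprocess

# Assumption: "related" packages are those whose names start with rocm or hsa.
PATTERN = re.compile(r"^(rocm|hsa)")


def matching_packages(package_names, pattern=PATTERN):
    """Return the sorted, de-duplicated names that match the pattern."""
    return sorted({name for name in package_names if pattern.search(name)})


def installed_packages():
    """List installed package names via dpkg-query (Debian/Ubuntu only)."""
    out = subprocess.run(
        ["dpkg-query", "-W", "--showformat=${Package}\\n"],
        check=True,
        stdout=subprocess.PIPE,
        text=True,
    ).stdout
    return out.split()


def purge_command(names):
    """Build, but do not run, the purge command for the given packages."""
    return ["sudo", "apt-get", "purge"] + list(names)


def show_purge_plan():
    """Print the command so it can be reviewed before being run by hand."""
    names = matching_packages(installed_packages())
    print(shlex.join(purge_command(names)) if names else "nothing to purge")
```

Calling show_purge_plan() only prints the command; nothing is removed until you run it yourself, which seems safer given that a wrong match could take out unrelated packages.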