Is there a way to access the fast 64KB Global Data Share (GDS) on Ellesmere (RX 480), either in OpenCL or in the GCN assembly?
cl_ext_atomic_counters_32 has been deprecated, and I have had no luck so far with CLRadeonExtender.
Update (1/4/2017 8:14 p.m. PST): It seems that you need the OpenCL 1.2 ABI to enable the GDS functionality. You can still create OpenCL 1.2 binaries for Ellesmere under Crimson drivers by using the "-legacy" build option, and I was able to run a program that uses the cl_ext_atomic_counters_32 extension and the counter32_t type by adding the "-Dcl_ext_atomic_counters_32" build option. CLRadeonExtender does not seem to be able to handle OpenCL 1.2 binaries for Ellesmere. Further testing is needed.
I made further progress after the first update. To summarize my findings as to how to access Global Data Share (GDS) on GCN2/3/4 devices with Crimson drivers:
(1) You need the OpenCL 1.2 ABI to enable the GDS functionality.
(2) You can still create OpenCL 1.2 binaries for GCN2/3/4 devices with Crimson drivers with the "-legacy" build option.
(3) In order to use the formally deprecated cl_ext_atomic_counters_32 extension, you need to specify the following build options: "-legacy -Dcl_ext_atomic_counters_32".
(4) CLRadeonExtender can be modified to handle OpenCL 1.2 binaries for GCN2/3/4 devices. See: Fixes for Ellesmere (RX 480). · zawawawa/CLRX-mirror@05ed08a · GitHub
Hope this helps.
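For anyone who wants to try points (2) and (3), here is a minimal host-side sketch in Python. It assumes the pyopencl package; the kernel name take_ticket and the helper build_legacy_program are made up for illustration, and the kernel body relies on counter32_t and atomic_inc as defined by the cl_ext_atomic_counters_32 extension. I have not verified it on every driver release, so treat it as a starting point rather than a reference.

```python
# Build options reported in this thread for Crimson drivers:
# "-legacy" selects the OpenCL 1.2 ABI, and the -D define re-enables the
# formally deprecated atomic counter extension.
LEGACY_BUILD_OPTIONS = ("-legacy", "-Dcl_ext_atomic_counters_32")

# Hypothetical kernel: each work-item takes a unique ticket from the counter.
KERNEL_SOURCE = r"""
#pragma OPENCL EXTENSION cl_ext_atomic_counters_32 : enable

__kernel void take_ticket(counter32_t counter, __global uint *tickets)
{
    tickets[get_global_id(0)] = atomic_inc(counter);
}
"""


def build_legacy_program(ctx, source=KERNEL_SOURCE, options=LEGACY_BUILD_OPTIONS):
    """Compile `source` for the devices in `ctx` with the legacy build options."""
    # Imported lazily so the constants above can be inspected without pyopencl.
    import pyopencl as cl

    return cl.Program(ctx, source).build(options=list(options))
```

With a context created by pyopencl.create_some_context(), calling build_legacy_program(ctx) should return the compiled program, or raise a build error if the driver rejects "-legacy".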
I also found out that you can access the GDS on RX 480 on Ubuntu 16.04.02 LTS with AMDGPU-Pro 16.60. All you have to do is set the m0 register to 0x1000. By default, you can only access 4096 bytes at a time. If anybody finds a way to lift this really annoying restriction, please do let me know.
Hi,
The lower 16 bits of the M0 register specify the size of the accessible GDS memory, so set them to 0xFFFF!
The upper 16 bits are the base offset into the physical GDS memory; just leave them at 0.
So basically the M0 reg does some basic memory protection.
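To make the bit layout described in this reply concrete, here is a small Python sketch that packs and unpacks an M0 value. The field positions come only from the description above (size in bits 15:0, base in bits 31:16); the function names pack_m0 and unpack_m0 are made up for illustration, and the sketch says nothing about which values a given driver will actually honor.

```python
def pack_m0(size, base=0):
    """Combine a GDS window size and base offset into a 32-bit M0 value.

    Assumed layout (from the reply above): size in bits 15:0, base in bits 31:16.
    """
    if not 0 <= size <= 0xFFFF:
        raise ValueError("size must fit in 16 bits")
    if not 0 <= base <= 0xFFFF:
        raise ValueError("base must fit in 16 bits")
    return (base << 16) | size


def unpack_m0(m0):
    """Split a 32-bit M0 value into (size, base) under the same assumed layout."""
    return m0 & 0xFFFF, (m0 >> 16) & 0xFFFF
```

For example, pack_m0(0xFFFF) gives 0x0000FFFF, the value suggested in this reply, while pack_m0(0x1000) gives 0x1000, which corresponds to a 4096-byte window starting at base 0.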
The thing is, no value greater than 0x1000 for m0 works on Linux with AMDGPU-Pro and RX 480.
I tried almost every possible value for m0 to no avail.
In fact, the Linux kernel module for AMDGPU sets hard-coded limits on the GDS segment size.
See gfx_v8_0_set_gds_init() in the following Linux kernel source file for example:
linux/gfx_v8_0.c at 5924bbecd0267d87c24110cbe2041b5075173a25 · torvalds/linux · GitHub
I tried to access configuration registers by bypassing the driver and sending PM4 packets directly to the GPU, but that didn't work.
You can find a list of such registers here:
linux/gfx_8_0_d.h at 5924bbecd0267d87c24110cbe2041b5075173a25 · torvalds/linux · GitHub
Do you know how to access them?