I bought a Vega 64 recently. From the specs, it has 23 TFLOPS fp16 throughput compared to 12 TFLOPS fp32, so I converted a portion of my Monte Carlo code to half, expecting a noticeable speed-up. Disappointingly, instead of gaining speed, I got a 5% slowdown.
The changes were made in a core function, which I believe is the bottleneck of the code (accounting for maybe 1/4 of the run-time); see the key changes here:
add half precision raytracer, support AMD Vega · fangq/mcxcl@0c11f79 · GitHub
In comparison, here is the float counterpart:
mcxcl/mcx_core.cl at master · fangq/mcxcl · GitHub
My kernel is compute-bound.
I don't know in what scenarios converting to half typically brings a speedup. In my case, were the conversions or the extra registers responsible for the drop? Any dos and don'ts when using half?
thanks
PS: the code can be tested by
git clone https://github.com/fangq/mcxcl.git
cd mcxcl
git checkout
cd src
make clean all
cd ../example/benchmark
./run_benchmark1.sh -G 1 -J "-DUSE_HALF"
Removing the -J "-DUSE_HALF" option runs the original fp32 code.
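For context, the switch works via the preprocessor: the macro passed with -J "-DUSE_HALF" selects the half-precision code paths at kernel compile time. A minimal sketch of this pattern (the typedef name here is hypothetical, not the actual mcxcl code):

```c
/* Hypothetical sketch of a -DUSE_HALF compile-time switch in an OpenCL kernel. */
#ifdef USE_HALF
  /* half storage/arithmetic requires the cl_khr_fp16 extension */
  #pragma OPENCL EXTENSION cl_khr_fp16 : enable
  typedef half4  real4;   /* hypothetical alias used by the kernel */
#else
  typedef float4 real4;
#endif
```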
Actually, the rapid packed math (RPM) feature, which improves FP16 performance, is currently not exposed to OpenCL under amdgpu-pro. That's why there may be no performance gain over FP32 in your case. At the moment, RPM is supported on the ROCm stack. The following thread suggests that ROCm 1.6.4 has the support: OpenCL rapid packed math support for Vega · Issue #219 · RadeonOpenCompute/ROCm · GitHub
Thanks, dipak. I installed ROCm on one of my Ubuntu 16.04 boxes; unfortunately, it does not support my kernel well. My code runs without a problem with the amdgpu-pro OpenCL driver (and previously the fglrx driver), but it now hangs with the ROCm libamdocl64.
Is there a way to enable RPM on amdgpu-pro, or is this simply not possible?
Currently, the compiler tool-chain under amdgpu-pro does not support packed math.
Thanks. I managed to get my code to work on ROCm for some specific simulation settings, but it still fails in most other tests. Even in the test that worked, the speed is about 10% of that with the amdgpu-pro driver.
Is there a place for reporting compatibility issues like these? I saw the GitHub repos for the different modules, but I'm not sure if there is a better place to report them.
Currently, ROCm-related issues are managed on GitHub only. You can report your problem here: Issues · RadeonOpenCompute/ROCm · GitHub. I can see many OpenCL-related issues posted there. Here is another place to report ROCm OpenCL issues: Issues · RadeonOpenCompute/ROCm-OpenCL-Runtime · GitHub
Regarding the performance question, please make sure that you're using FP16/INT16 data types and operations properly to enable packed math. For example, operations on vector types like half2 or short2 can benefit from RPM if supported by the compiler.
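To illustrate the half2 point, here is a minimal OpenCL kernel sketch (not from the thread; the kernel and variable names are hypothetical). With packed math, an operation on a half2 value can map to a single packed instruction covering both fp16 lanes, whereas scalar half operations are issued one at a time:

```c
/* Hypothetical OpenCL kernel fragment illustrating half2 usage.
   Requires the cl_khr_fp16 extension on the device. */
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

__kernel void axpy_half2(__global const half2 *x,
                         __global half2 *y,
                         const half a)
{
    size_t i = get_global_id(0);
    /* One fused multiply-add on a half2 processes two fp16 values;
       a compiler with RPM support can emit a packed instruction
       (e.g. v_pk_fma_f16 on Vega) for this. */
    y[i] = fma((half2)(a, a), x[i], y[i]);
}
```

Note that merely declaring half variables is not enough: data must be laid out and operated on as two-wide vectors for the compiler to have a chance to pack the operations.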
I am curious whether the latest amdgpu-pro now supports the half-precision hardware in the Vega 64, or whether there is a timeline for when this support will be added.
Currently, my code has lots of trouble with ROCm: very slow speed, and even infinite loops in many simulations. I am not sure it is worth the trouble of going the ROCm path.
thanks
I don't know its current support status under amdgpu-pro stack. I'll check and get back to you.
Half precision is supported on Vega with amdgpu-pro. What is not supported is packed F16 math, only scalar F16 operations are issued. There is no immediate plan for adding packed math support at this moment.
dipak wrote:
Half precision is supported on Vega with amdgpu-pro. What is not supported is packed F16 math, only scalar F16 operations are issued. There is no immediate plan for adding packed math support at this moment.
But why? Rapid Packed Math support has been promised since the Vega Technology Preview in January 2016. Why can't you, or are you not allowed to, enable it for OpenCL in AMDGPU-Pro and on Windows?
At this moment, RPM is only supported by the newer compiler toolchain under the ROCm stack. There is a plan to implement it on amdgpu-pro, but I can't give an ETA.
Hi dipak,
I just want to follow up on this previous issue: is there any confirmation or plan on whether fp16 support for Vega has been added to the amdgpu driver?
I am currently playing with ROCm 1.8.3; rocminfo does say fp16 is supported on my card, but I did not observe any speed improvement. The biggest issue with ROCm is that it has a 10-fold slowdown compared to the amdgpu driver.
thanks
any confirmation/plan if the fp16 support for vega has been added to amdgpu driver?
As I said earlier, half precision is already supported on Vega with amdgpu-pro. Are you referring to packed FP16 math support? If so, I need to check with the compiler team on the current status.
Thanks.