FangQ
Journeyman III

Disappointing opencl half-precision performance on vega - any advice?

I bought a Vega 64 recently. From the specs, it has 23 TFLOPS of fp16 throughput compared to 12 TFLOPS of fp32, so I converted a portion of my Monte Carlo code to half, expecting a noticeable speedup. Disappointingly, instead of gaining speed, I got a 5% slowdown.

The changes were made to a core function, which I believe is the bottleneck of the code (it may account for about 1/4 of the run-time); see the key changes in this commit:

add half precision raytracer, support AMD Vega · fangq/mcxcl@0c11f79 · GitHub

For comparison, here is the float counterpart:

mcxcl/mcx_core.cl at master · fangq/mcxcl · GitHub

My kernel is compute-bound.

I don't know in which scenarios converting to half typically brings a speedup. In my case, were the conversions or the extra registers responsible for the drop? Are there any dos and don'ts when using half?

thanks

PS: the code can be tested with

git clone https://github.com/fangq/mcxcl.git 
cd mcxcl
git checkout
cd src
make clean all
cd ../example/benchmark
./run_benchmark1.sh -G 1 -J "-DUSE_HALF"

Removing the -J "-DUSE_HALF" option runs the original fp32 code.
12 Replies
dipak
Staff

Re: Disappointing opencl half-precision performance on vega - any advice?

Actually, the rapid packed math (RPM) feature, which improves FP16 performance, is currently not exposed to OpenCL under amdgpu-pro. That's why there might be no performance gain compared to FP32 in your case. At this moment, RPM is supported on the ROCm stack. The following thread suggests that ROCm 1.6.4 has the support: OpenCL rapid packed math support for Vega · Issue #219 · RadeonOpenCompute/ROCm · GitHub

FangQ
Journeyman III

Thanks, dipak. I installed ROCm on one of my Ubuntu 16.04 boxes; unfortunately, it does not support my kernel well. My code runs without a problem with the amdgpu-pro OpenCL driver, and previously with the fglrx driver, but it now hangs with the ROCm libamdocl64.

Is there a way to enable RPM on amdgpu-pro, or is this simply not possible?

dipak
Staff

Currently, the compiler tool-chain under amdgpu-pro does not support packed math.

FangQ
Journeyman III

Thanks. I managed to get my code to work on ROCm for some specific simulation settings, but it still fails in most other tests. Even in the tests where it worked, the speed is about 10% of that with the amdgpu-pro driver.

Is there a place for reporting compatibility issues like these? I saw the GitHub repos for the different modules, but I am not sure if there is a better place to report.

dipak
Staff

Currently, ROCm-related issues are managed on GitHub only. You can report your problem here: Issues · RadeonOpenCompute/ROCm · GitHub. I can see many OpenCL-related issues posted there. Here is another place to report ROCm OpenCL issues: Issues · RadeonOpenCompute/ROCm-OpenCL-Runtime · GitHub

Regarding performance, please make sure that you're using FP16/INT16 datatypes and operations properly to enable packed math. For example, operations on vector types like half2 or short2 can benefit from RPM if supported by the compiler.

FangQ
Journeyman III

I am curious whether the latest amdgpu-pro now supports the half-precision hardware in Vega 64, or whether there is a timeline for when this support will be added.

Currently, my code has lots of trouble with ROCm: very slow speed, and even infinite loops in many simulations. I am not sure if it is worth the trouble to go the ROCm path.

thanks

dipak
Staff

I don't know its current support status under the amdgpu-pro stack. I'll check and get back to you.

dipak
Staff

Half precision is supported on Vega with amdgpu-pro. What is not supported is packed F16 math; only scalar F16 operations are issued. There is no immediate plan for adding packed math support at this moment.

bomby
Adept I

dipak wrote:

Half precision is supported on Vega with amdgpu-pro. What is not supported is packed F16 math; only scalar F16 operations are issued. There is no immediate plan for adding packed math support at this moment.

But why? Rapid Packed Math support has been promised since the Vega technology preview in January 2017. Why can't you, or why are you not allowed to, enable it for OpenCL in AMDGPU-Pro and on Windows?
