AnsweredAssumed Answered

Rocm performance low issue with HIP-Examples-master/gpu-burn

Question asked by j0hnny on Aug 7, 2018
Latest reply on Aug 8, 2018 by dipak

Hi,
I met with rocm1.8 +ubuntu18.04 LTS (16.04 same issue)+ 2 cards Ellesmere [Radeon RX 470/480] performance low issue.
I ran hip example code to burn gpu to check performance, the code I run is the HIP examples: HIP-Examples-master/gpu-burn. from GitHub - ROCm-Developer-Tools/HIP-Examples: Examples for HIP
While gpu burnning, rocm-smi showed that MCLK has only 500M, but SCLK increased to 1130MHz from 300MHz in idle.
I could use rocm-smi -d 1 --setmclk 2 to change MCLK to 2000M, rocm-smi showed it changed to 2000M but performance not increase at all, I think GPU not really worked at 2000MHz.

 So my question is: 1. Why gpu performance is slow under gpu-burn, what can I try to increase performance? 2. Is it related to 500M MCLK  ? 3. MCLK changed to 2000M not really work, right? 

idle:

GPU Temp AvgPwr SCLK MCLK Fan Perf SCLK OD MCLK OD
1 35c 30.178W 300Mhz 500Mhz 0.0% manual 0% 0%
0 34c 32.244W 300Mhz 500Mhz 0.0% manual 0% 0%

While burning:

GPU Temp AvgPwr SCLK MCLK Fan Perf SCLK OD MCLK OD
1 45c 104.171W 1130Mhz 500Mhz 0.0% manual 0% 0%
0 43c 98.192W 1130Mhz 500Mhz 0.0% manual 0% 0%

After rocm-smi -d 1 --setmclk 2

GPU Temp AvgPwr SCLK MCLK Fan Perf SCLK OD MCLK OD
1 38c 32.153W 300Mhz 2000Mhz 0.0% manual 0% 0%
0 36c 34.176W 300Mhz 2000Mhz 0.0% manual 0% 0%

 

About HIP examples gpu-burn it keeps doing dimension 512x512 matrix mul, like A[i] * B[i] =C[i], and all the 3 matrixs stays in the ddr by calling hipmalloc.  So for this case, gpu compute unit should access ddr memory alot. MCLK is critical for the performance.

Below is burn kernel example.


int BurnKernel::runComputeKernel()
{
     int err = 0;
     for (int i = 0; mRunKernel && i < mNumIterations; ++i) {
          hipLaunchKernel(
               /* Launch params /
               HIP_KERNEL_NAME(hip_sgemm_kernel),
               dim3(cRowSize/cBlockSize, cRowSize/cBlockSize, 1),
               dim3(cBlockSize,cBlockSize,1), 0, 0,
               /
Kernel params /
               cRowSize, cRowSize, cRowSize, cAlpha,
               mDeviceAdata, cRowSize,
               mDeviceBdata, cRowSize,
               cBeta,
               mDeviceCdata + i
cMatrixSize,
               cRowSize);
     }
     checkError(hipDeviceSynchronize(), "Sync");

     return err;

}
``

Outcomes