
Cat13.4: How to avoid the high CPU load for GPU kernels?

Question asked by Bdot on May 8, 2013
Latest reply on Dec 13, 2013 by Bdot

Hi,

 

Since Catalyst 13.4 and the 13.5 beta, my OpenCL GPU program consumes ~80% of one CPU core while in clFinish, waiting for a series of GPU kernels and a final clEnqueueReadBuffer. The stack of my main thread looks like this

[code]
ntdll.dll!NtWaitForSingleObject+0xa
KERNELBASE.dll!WaitForSingleObjectEx+0x9c
amdocl64.dll!clGetSamplerInfo+0x1031c
amdocl64.dll!clGetSamplerInfo+0x101f8
amdocl64.dll!clGetSamplerInfo+0x120ea
amdocl64.dll!clGetSamplerInfo+0x4b51
amdocl64.dll!clFinish+0x89
mfakto.exe!tf_class_opencl+0xf94
mfakto.exe!tf+0x583
mfakto.exe!main+0x117d
mfakto.exe!__tmainCRTStartup+0x11a
kernel32.dll!BaseThreadInitThunk+0xd
ntdll.dll!RtlUserThreadStart+0x21
[/code]

and is using 0.01% CPU.

 

However, there is another thread:

[code]
amdocl64.dll!clIcdGetPlatformIDsKHR+0x3e5
amdocl64.dll!clGetSamplerInfo+0x49cf
amdocl64.dll!clGetSamplerInfo+0x38af2
amdocl64.dll!clGetSamplerInfo+0x38d18
amdocl64.dll!clGetSamplerInfo+0x504e
amdocl64.dll!clGetSamplerInfo+0x5172
amdocl64.dll!clGetSamplerInfo+0x1ccf
kernel32.dll!BaseThreadInitThunk+0xd
ntdll.dll!RtlUserThreadStart+0x21
[/code]

that is using ~19% of total CPU (76% of one core on this quad-core machine). The upper part of the stack changes between samples, so the thread is not stuck in clIcdGetPlatformIDsKHR.
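The two figures above are related by simple arithmetic. A minimal sketch in C, assuming the monitoring tool reports load normalised over all four cores of the Phenom II X4 (the function name is hypothetical, chosen only for illustration):

```c
/*
 * Convert a system-wide CPU percentage (normalised over all cores)
 * into a percentage of a single core.
 * Assumption: the thread runs on one core at a time.
 */
static double percent_of_one_core(double system_wide_percent, int core_count)
{
    return system_wide_percent * core_count;
}
```

With the values from this post, percent_of_one_core(19.0, 4) gives 76.0.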

 

When I run a CPU-hungry program that consumes almost all CPU and starves my program, this thread's CPU load drops to almost nothing, but the GPU is no longer fed well: GPU load jumps between 70% and 98%, whereas it would normally be pegged at 100%.

 

After rolling back to Catalyst 13.3, the program's total CPU load is ~0.1-0.3%, and running a CPU hog has almost no effect on my program.

 

Is there anything special to be done on the newer drivers to make them leave the CPU alone? Is there any setting to get the CPU-behavior of the previous drivers?

 

My environment: HD5770+Phenom II X4 955, Win7-64. I got reports that the same happens with an APU and the integrated 6550D (also Win7-64).

 

Note: Replacing the final clFinish with a synchronous (blocking) final clEnqueueReadBuffer does not change the CPU load.
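Both waits mentioned in this post (clFinish and a blocking clEnqueueReadBuffer) block inside the driver. A third, application-side variant is to poll for completion and sleep between polls. The sketch below shows only that pattern, in plain C without any OpenCL calls. The names sleep_ms, is_done_fn, wait_with_sleep and done_after_n_calls are hypothetical. In an OpenCL program the callback would typically call clFlush once and then query clGetEventInfo with CL_EVENT_COMMAND_EXECUTION_STATUS until it reports CL_COMPLETE. Whether this reduces the load of the driver's own worker thread shown above is an untested assumption.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for nanosleep() on strict C compilers */
#endif
#include <stdbool.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
static void sleep_ms(unsigned ms) { Sleep(ms); }
#else
#include <time.h>
static void sleep_ms(unsigned ms)
{
    struct timespec ts;
    ts.tv_sec  = ms / 1000;
    ts.tv_nsec = (long)(ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}
#endif

/* Hypothetical callback: returns true once the awaited work has completed. */
typedef bool (*is_done_fn)(void *ctx);

/*
 * Poll is_done() and sleep interval_ms between polls instead of blocking
 * in the driver. Returns the number of polls performed (at least 1).
 */
static unsigned wait_with_sleep(is_done_fn is_done, void *ctx,
                                unsigned interval_ms)
{
    unsigned polls = 1;
    while (!is_done(ctx)) {
        sleep_ms(interval_ms);
        ++polls;
    }
    return polls;
}

/* Stand-in for a real completion check: reports done on the N-th call. */
static bool done_after_n_calls(void *ctx)
{
    int *remaining = (int *)ctx;
    return --(*remaining) <= 0;
}
```

The sleep interval is the trade-off: a longer interval costs less CPU but delays the moment the next batch of kernels can be queued, which could leave the GPU idle in the same way the CPU-hog experiment above did.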

 

... and could someone please give me a hint how I can get a proper code formatting in this forum? Thanks a lot!
