I have written a simple kernel that does a memory copy, and a loop that calls the kernel 500 times to test GPU copy speed. But while the code is running, I see that CPU usage is very high (50%; my CPU is dual core).
As I understand it, the kernel runs on the GPU, and the CPU has little work to do except controlling the loop, so CPU usage should be low. Am I right? Or is there something wrong with my code? Can someone help me with this? Thank you!
My code is below:
kernel void kernel_brookcopy(float4 input<>, out float4 output<>)
{
    // Copy each input element straight to the output stream.
    output = input;
}
The caller of the kernel is:
float4 in<480, 720>;
int iteration = 500;
int i = 0;
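
One way to check whether the host thread is really busy during the loop is to compare the process CPU time with the wall-clock time. Below is a minimal sketch in plain C, not Brook+ code. The names cpu_fraction, measure_cpu_fraction and spin_briefly are hypothetical helpers made up for illustration, and spin_briefly is only a stand-in for the 500-iteration kernel loop. It assumes a C11 compiler (for timespec_get) and a POSIX-style clock() that reports processor time; the Microsoft C runtime's clock() reports elapsed wall-clock time instead, so the ratio would not be meaningful there.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical type for the code under test, e.g. the kernel loop. */
typedef void (*workload_fn)(void);

/* Fraction of one core that was busy: CPU seconds / wall-clock seconds.
   Returns 0.0 if the wall-clock interval is not positive. */
double cpu_fraction(double cpu_seconds, double wall_seconds)
{
    if (wall_seconds <= 0.0)
        return 0.0;
    return cpu_seconds / wall_seconds;
}

/* Runs the workload once and returns its CPU-time / wall-time ratio. */
double measure_cpu_fraction(workload_fn work)
{
    struct timespec w0, w1;
    clock_t c0, c1;
    double cpu, wall;

    timespec_get(&w0, TIME_UTC);   /* wall clock before */
    c0 = clock();                  /* processor time before */
    work();
    c1 = clock();                  /* processor time after */
    timespec_get(&w1, TIME_UTC);   /* wall clock after */

    cpu  = (double)(c1 - c0) / CLOCKS_PER_SEC;
    wall = (double)(w1.tv_sec - w0.tv_sec)
         + (double)(w1.tv_nsec - w0.tv_nsec) / 1e9;
    return cpu_fraction(cpu, wall);
}

/* Stand-in workload (an assumption, not the real kernel loop):
   busy-waits until about 50 ms of processor time have been used. */
void spin_briefly(void)
{
    clock_t end = clock() + CLOCKS_PER_SEC / 20;
    while (clock() < end) {
        /* spin */
    }
}
```

Calling measure_cpu_fraction with a function that wraps the kernel loop gives a ratio to compare against the observed usage. A value close to 1.0 means one core was busy for the whole run, which on a dual-core machine shows up as about 50% total CPU usage. A value close to 0 means the host thread was mostly asleep while waiting.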