
bgamine
Journeyman III

How can I execute a kernel on one GPU core?

How can I launch a kernel on one GPU core? And how can I control the execution on the other cores?

When will "device fission" be available on GPUs?

thx

binying
Challenger

AMD's APP SDK provides various samples, including device fission...


Are you sure? Device fission for the GPU? You can read in the documentation that comes with the DeviceFission sample: "The Device Fission extension is supported only on the CPU."


In SDK 2.7, the DeviceFission documentation describes the command-line option "--device": Device on which the program is to be run. Acceptable values are cpu or gpu...


Yes, but if you look at the sample source code you will find that only the CPU can be divided into sub-devices.

I repeat my question: can I control thread affinity to GPU cores in OpenCL?


The device must support the cl_ext_device_fission extension. AFAIK no GPU supports this extension, so you can't.
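One way to check this on a given machine is to query CL_DEVICE_EXTENSIONS for each device and look for the extension name. A minimal sketch, assuming a single platform and with error checking dropped for brevity:

#include <stdio.h>
#include <string.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id devices[8];
    cl_uint num_devices = 0;
    char extensions[4096];

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 8, devices, &num_devices);

    for (cl_uint i = 0; i < num_devices; ++i) {
        char name[256];
        clGetDeviceInfo(devices[i], CL_DEVICE_NAME, sizeof(name), name, NULL);
        clGetDeviceInfo(devices[i], CL_DEVICE_EXTENSIONS,
                        sizeof(extensions), extensions, NULL);
        /* cl_ext_device_fission only appears here if the device can be partitioned */
        printf("%s: device fission %s\n", name,
               strstr(extensions, "cl_ext_device_fission") ? "supported" : "not supported");
    }
    return 0;
}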

yurtesen
Miniboss

Maybe you can simulate it by running your kernel with a global work size of 1?

http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clEnqueueNDRangeKernel.html
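A minimal sketch of that call, assuming the command queue and kernel already exist and the kernel arguments are set (the helper function name is just for illustration):

#include <CL/cl.h>

cl_int run_single_work_item(cl_command_queue queue, cl_kernel kernel)
{
    const size_t global_size = 1;  /* one work-item in total */
    const size_t local_size  = 1;  /* one work-item per work-group */

    cl_int err = clEnqueueNDRangeKernel(queue, kernel,
                                        1,            /* work dimensions */
                                        NULL,         /* no global offset */
                                        &global_size,
                                        &local_size,
                                        0, NULL, NULL);
    if (err == CL_SUCCESS)
        err = clFinish(queue);     /* wait for the kernel to complete */
    return err;
}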

Why do you want to do this?

Thank you for your suggestion, but when I set the global work size to 1, and the local work size to 1 as well, I see a GPU load of 100%. I don't really understand what is going on; I expected a load of 1/NbCores.

Any explanation for this behaviour?
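For reference, the compute-unit count the device itself reports (the "NbCores" in that expectation) can be queried with clGetDeviceInfo; a small sketch, assuming a valid cl_device_id is already at hand. Note that the load figure shown by monitoring tools is not necessarily busy units divided by this number:

#include <stdio.h>
#include <CL/cl.h>

void print_compute_units(cl_device_id device)
{
    cl_uint compute_units = 0;

    /* Number of parallel compute units on the device */
    clGetDeviceInfo(device, CL_DEVICE_MAX_COMPUTE_UNITS,
                    sizeof(compute_units), &compute_units, NULL);
    printf("CL_DEVICE_MAX_COMPUTE_UNITS = %u\n", compute_units);
}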


It might have something to do with how the load is calculated. My wild guess is that the GPU does not respond while it is running kernels, so the load is probably calculated from the fraction of time the GPU is unavailable. It would therefore show 100% even if all the threads were sleeping, as long as a kernel appears to be running constantly.

I am guessing that because if a task takes too long, it times out... If the driver could tell whether or not the GPU was making progress on a task, a timeout would not make much sense in my opinion.

Of course this is just a guess. What you could try is to run a kernel with a global/local size of 1 (you would probably need a for loop inside it), then run it again with a different global/local size, say 2 or 4, with the original loop distributed across the work-items based on the global size, and see how long each execution takes.
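A minimal sketch of that experiment using OpenCL profiling events; it assumes the command queue was created with CL_QUEUE_PROFILING_ENABLE and that the kernel splits its loop across work-items via get_global_size(0). The helper name is illustrative:

#include <stdio.h>
#include <CL/cl.h>

double time_kernel_ms(cl_command_queue queue, cl_kernel kernel, size_t work_items)
{
    cl_event evt;
    cl_ulong start = 0, end = 0;

    /* global == local here, so everything runs in a single work-group */
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                           &work_items, &work_items,
                           0, NULL, &evt);
    clWaitForEvents(1, &evt);

    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_START,
                            sizeof(start), &start, NULL);
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_END,
                            sizeof(end), &end, NULL);
    clReleaseEvent(evt);

    return (end - start) * 1e-6;   /* nanoseconds -> milliseconds */
}

/* Usage idea: compare time_kernel_ms(queue, kernel, 1) against
 * time_kernel_ms(queue, kernel, 4) and see how the runtime scales. */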

You still have not said why you need this?


You are correct. A GPU can run only one kernel/program at a time. Only the latest GCN-based GPUs have something called "rings", which makes it possible to run different kernels/programs on the GPU at the same time.
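For completeness, the usual way to at least give the runtime a chance to overlap two kernels is to submit them through separate command queues on the same device; whether they actually execute concurrently is entirely up to the hardware and driver, as described above. A sketch, assuming the context, device and both kernels already exist (the function name is illustrative):

#include <CL/cl.h>

void enqueue_two_kernels(cl_context context, cl_device_id device,
                         cl_kernel kernel_a, cl_kernel kernel_b,
                         size_t global_size)
{
    /* Two in-order queues on the same device */
    cl_command_queue q1 = clCreateCommandQueue(context, device, 0, NULL);
    cl_command_queue q2 = clCreateCommandQueue(context, device, 0, NULL);

    clEnqueueNDRangeKernel(q1, kernel_a, 1, NULL, &global_size, NULL, 0, NULL, NULL);
    clEnqueueNDRangeKernel(q2, kernel_b, 1, NULL, &global_size, NULL, 0, NULL, NULL);

    /* Wait for both kernels; on most pre-GCN GPUs they are simply serialized */
    clFinish(q1);
    clFinish(q2);

    clReleaseCommandQueue(q1);
    clReleaseCommandQueue(q2);
}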
