How can I launch a kernel on a single GPU core, and how can I control the execution on the other cores?
When will "device fission" be available on GPUs?
AMD's APP SDK provides various samples, including device fission...
Are you sure? Device fission for the GPU? You can read in the documentation of the DeviceFission sample: "The Device Fission extension is supported only on the CPU."
In SDK 2.7, the documentation of the DeviceFission sample lists the command-line option "--device": "Device on which the program is to be run. Acceptable values are cpu or gpu..."
Yes, but if you look at the sample source code, you can see that only the CPU can be divided into sub-devices.
I repeat my question: can I set thread affinity to GPU cores in OpenCL?
The device must support the cl_ext_device_fission extension. AFAIK no GPU supports this extension, so you can't.
Maybe you can simulate it by running your kernel with a global work size of 1?
Why do you want to do this?
Thank you for your suggestion, but when I set both the global work size and the local work size to 1, I see the GPU workload at 100%. I don't really understand what is going on; I expected a workload of 1/NbCores.
Any explanation for this behaviour?
It might have something to do with how the load is calculated. My wild guess is that the GPU does not respond while running kernels, so the load is probably calculated from how long the GPU was unavailable per unit of time. It would therefore show 100% even if most cores were idle, as long as a kernel appears to be running constantly.
I am guessing this because if a task takes too long, it times out... If the driver could actually tell whether the GPU is making progress in a task, in my opinion a timeout would not make sense.
Of course this is just a guess. What you could try is to run a kernel with a global/local size of 1 (you would probably need a for loop inside it), then run it again with a different global/local size, say 2 or 4, where the original loop is distributed among the work-items based on the global size, and compare how long the executions take.
You still did not say why you need this.
You are correct. A GPU can run only one kernel/program at a time. Only the latest GCN-based GPUs have something called "rings", which enables different kernels/programs to run on the GPU at the same time.