How can I launch a kernel on a single GPU core, and how can I control the execution on the other cores?
When will "device fission" be available on GPUs?
AMD's APP SDK provides various samples, including device fission...
Are you sure? Device fission for the GPU? You can read in the documentation of the DeviceFission sample: "The Device Fission extension is supported only on the CPU."
In SDK 2.7, the documentation of the DeviceFission sample lists the command-line option "--device": "Device on which the program is to be run. Acceptable values are cpu or gpu..."
Yes, but if you look at the sample source code, you can see that only the CPU can be divided into sub-devices.
I repeat my question: can I set thread affinity to GPU cores in OpenCL?
The device must support the cl_ext_device_fission extension. AFAIK no GPU supports this extension, so you can't.
Maybe you can simulate it by running your kernel with a global work size of 1?
Why do you want to do this?
Thank you for your suggestion, but when I set both the global work size and the local work size to 1, I see the GPU workload at 100%. I don't really understand what is going on; I expected a workload of 1/NbCores.
Any explanation for this behaviour?
It might have something to do with how the load is calculated. My wild guess is that the GPU does not respond while running kernels, so the load is probably calculated from how long the GPU was unavailable per unit of time. It would therefore show 100% even if most cores were idle, as long as a kernel appears to be running constantly.
I am guessing this because if a task takes too long, it times out... If the driver could actually tell whether the GPU is making progress in a task, in my opinion a timeout would not make sense.
Of course this is just a guess. What you could try is to run a kernel with a global/local size of 1 (you would probably need a for loop inside it), then run it again with a different global/local size, say 2 or 4, where the original loop is distributed among the work-items based on the global size, and compare how long the executions take.
You still did not say why you need this.
You are correct. A GPU can run only one kernel/program at a time. Only the latest GCN-based GPUs have something called "rings", which enables different kernels/programs to run on the GPU at the same time.