Do you know of any samples where the device fission extension is used?


For better cache usage in a raycaster, I am trying to have every core of the CPU work on its own column of the picture instead of on arbitrary work-groups.
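
To make the idea concrete, this is roughly how I picture the split. It is only a sketch in plain C with no OpenCL calls. `ColumnRange` and `columns_for_core` are names I made up for illustration, and I am assuming each core gets one contiguous strip of columns rather than interleaved single columns.

```c
#include <assert.h>

/* Half-open range [begin, end) of image columns assigned to one core.
   Hypothetical helper type, not part of any OpenCL API. */
typedef struct {
    int begin;
    int end;
} ColumnRange;

/* Split `width` columns into `num_cores` contiguous strips.
   The first `width % num_cores` strips get one extra column,
   so every column is covered exactly once. */
ColumnRange columns_for_core(int width, int num_cores, int core)
{
    assert(num_cores > 0 && core >= 0 && core < num_cores);

    int base  = width / num_cores;   /* columns every core gets */
    int extra = width % num_cores;   /* leftover columns to hand out */

    ColumnRange r;
    r.begin = core * base + (core < extra ? core : extra);
    r.end   = r.begin + base + (core < extra ? 1 : 0);
    return r;
}
```

With 8 cores and a 1024-pixel-wide image, core 0 would get columns 0 to 127 and core 7 would get columns 896 to 1023. The range for each core is what I would pass as the per-core kernel argument.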

There are a couple of things the documentation left me unsure about.


- Can I have 8 command queues, one for each core of the CPU, as well as a command queue for the parent device (the entire CPU)?

That way I could use the individual cores for the raycasting and then use the entire CPU for post-processing.

- Can I build the program with a single build call, passing all sub-devices and the parent device as parameters?

- If I use only one program, can I use only one instance of each kernel as well?

Could all the cores then run the same kernel with different arguments at the same time?

So when I want to start the kernel on all cores, I would first set the common kernel arguments and then, for every core, set only the per-core arguments, enqueue the kernel, and move on to the next core (change a kernel argument, enqueue, ...)?


- Can the device fission extension somehow be used with the OpenCL 1.1 C++ bindings and the current StreamSDK?

If I define USE_CL_DEVICE_FISSION, it won't compile because cl_ext.h is still at revision 10424 instead of 11702.


