I've been playing around with the clCreateSubDevices feature of OpenCL 1.2 and I've found an interesting difference between implementations.
With AMD, if you create a context with a device and _then_ create sub-devices, you can manipulate these sub-devices as if they were in the same context as their root device. For example, you can create command queues and launch kernels on them. In fact, this is what AMD's own fission example in the 3.0 beta APP SDK does.
However, Intel does not operate the same way: if you attempt to create a command queue on a sub-device that was not itself passed at context creation, the call fails.
The specification does not say anything about this case. It only mentions that sub-devices can be used anywhere devices can be used, although it does explicitly mention that you can create contexts with them.
I suspect AMD's behavior can be considered “wrong”, in the sense that the implicit addition of sub-devices to a context (which is in practice what is happening in this case) is not mentioned anywhere in the specification, and it would break any code that assumes that the devices that can operate in a context are exactly the ones listed in it. I suspect it also opens the door to dangerous bugs with respect to reference counting. On the other hand, conceptually there is nothing wrong in assuming that a sub-device could be used wherever its parent device could be used (it being a sub-device and all).
I'm perplexed. What should be considered the correct behavior?
Really interesting observation. Honestly, I'm not quite sure either. I've forwarded your question to some experienced folks. As soon as I get a reply from them, I'll share it with you.
Regards,
Thank you very much. I've also taken the initiative of posting the question on the Khronos forums, since this is more of a general question about an ambiguity in the specification than something specific to AMD. For reference, the link to the discussion is here (no answer there yet, though).
Good initiative. Hope the Khronos OpenCL team will take some action to clarify this ambiguity.
BTW, did you observe this different behaviour for both CPU and GPU devices, or only for a particular device type?
Regards,
Sorry for the late reply.
I've only played around on CPU for the time being.
Khronos has replied to my post, mentioning that sub-devices should not be available for manipulation unless they were passed as devices at context creation. So the AMD approach is more relaxed than the spec, which in itself is not a bad thing: "be strict in what you produce but liberal in what you accept" is a good strategy. However, the AMD SDK sample should probably be adjusted to rely on the standard behavior rather than assuming AMD's relaxed one.
Thanks for sharing the information. I will try to pass your suggestions on to the concerned folks.
Regards,