I've been playing around with the clCreateSubDevices feature of OpenCL 1.2 and I've found an interesting difference between implementations.
With AMD, if you create a context with a device and _then_ create sub-devices, you can manipulate these sub-devices as if they were in the same context as their root device. For example, you can create command queues and launch kernels on them. In fact, this is what AMD's own fission example in the 3.0 beta APP SDK does.
However, Intel does not operate the same way: if you attempt to create a command queue on a sub-device that was not itself passed to the context creation call, the queue creation fails.
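For anyone who wants to try this on their own platform, here is a minimal sketch of the sequence I mean. It assumes an OpenCL 1.2 SDK with headers at `CL/cl.h`, takes the first CPU device on the first platform (an arbitrary choice), and assumes the device supports `CL_DEVICE_PARTITION_BY_COUNTS`; none of that comes from AMD's sample, it is just my reduction of the case. It only prints what the implementation returns and makes no claim about which result is right.

```c
#define CL_TARGET_OPENCL_VERSION 120
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id root, sub;
    cl_uint nsub = 0;
    cl_int err;

    /* Assumption: first platform, first CPU device. */
    err = clGetPlatformIDs(1, &platform, NULL);
    if (err != CL_SUCCESS) {
        fprintf(stderr, "clGetPlatformIDs failed: %d\n", err);
        return 1;
    }
    err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &root, NULL);
    if (err != CL_SUCCESS) {
        fprintf(stderr, "clGetDeviceIDs failed: %d\n", err);
        return 1;
    }

    /* The context is created with the root device only. */
    cl_context ctx = clCreateContext(NULL, 1, &root, NULL, NULL, &err);
    if (err != CL_SUCCESS) {
        fprintf(stderr, "clCreateContext failed: %d\n", err);
        return 1;
    }

    /* Partition AFTER the context exists: one sub-device with one compute unit.
     * Assumption: the device supports partitioning by counts. */
    const cl_device_partition_property props[] = {
        CL_DEVICE_PARTITION_BY_COUNTS, 1,
        CL_DEVICE_PARTITION_BY_COUNTS_LIST_END, 0
    };
    err = clCreateSubDevices(root, props, 1, &sub, &nsub);
    if (err != CL_SUCCESS) {
        fprintf(stderr, "clCreateSubDevices failed: %d\n", err);
        clReleaseContext(ctx);
        return 1;
    }

    /* The step in question: a queue on a sub-device that was never
     * passed to clCreateContext. */
    cl_command_queue queue = clCreateCommandQueue(ctx, sub, 0, &err);
    printf("clCreateCommandQueue on sub-device: %s (error code %d)\n",
           queue ? "succeeded" : "failed", err);

    if (queue)
        clReleaseCommandQueue(queue);
    clReleaseDevice(sub);
    clReleaseContext(ctx);
    return 0;
}
```

Build it against the vendor's OpenCL library (for example with `-lOpenCL`) and compare the printed result across implementations.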
The specification does not say anything about this case; it only says that sub-devices can be used anywhere devices can be used, although it does explicitly mention that you can create contexts with them.
I suspect AMD's behavior can be considered “wrong”, in the sense that implicitly adding sub-devices to a context (which is in practice what is happening in this case) is mentioned nowhere in the specification, and it would break any code that assumes that the devices that can operate in a context are exactly the ones it was created with. It also introduces, I suspect, dangerous bugs with respect to reference counting. On the other hand, conceptually there is nothing wrong in assuming that a sub-device could be used wherever its parent device could be used (it being a sub-device and all).
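To make the two readings concrete, here is a toy model in plain C. It is not OpenCL code and does not describe how either vendor's runtime is implemented; the `device` and `context` structs and all function names are made up for illustration. The strict check accepts only devices literally listed in the context, while the permissive check also accepts any device whose ancestor is listed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types, not the OpenCL API. */
typedef struct device {
    const struct device *parent;   /* NULL for a root device */
} device;

typedef struct context {
    const device *const *devices;  /* devices passed at context creation */
    size_t count;
} context;

/* Strict reading: a device is usable only if it was listed at creation. */
bool in_context_strict(const context *ctx, const device *dev)
{
    for (size_t i = 0; i < ctx->count; ++i)
        if (ctx->devices[i] == dev)
            return true;
    return false;
}

/* Permissive reading: a device is usable if it, or any of its ancestors,
 * was listed at creation. */
bool in_context_permissive(const context *ctx, const device *dev)
{
    for (; dev != NULL; dev = dev->parent)
        if (in_context_strict(ctx, dev))
            return true;
    return false;
}

/* The scenario from this post: the context holds only the root device,
 * and we ask whether a sub-device created afterwards is usable. */
bool subdevice_usable(bool permissive)
{
    device root = { NULL };
    device sub = { &root };
    const device *listed[] = { &root };
    context ctx = { listed, 1 };

    return permissive ? in_context_permissive(&ctx, &sub)
                      : in_context_strict(&ctx, &sub);
}
```

Under the strict check the sub-device is rejected, and under the permissive one it is accepted. Code that enumerates the context's device list sees only the root device either way, which is exactly the mismatch I am worried about.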
I'm perplexed. What should be considered the correct behavior?