Main new features:
And now let me tell you again that SPIR is the most critical feature you should implement. 90% of the enterprises I know aren't using OpenCL because they don't want to distribute their kernel source with the app.
Yes, I'm kind of sceptical about that too. I didn't see recursion for non-kernel functions.
That doesn't follow the CUDA roadmap. Recursion for non-kernel functions has been available since CUDA 3.0 and Fermi chips, while dynamic parallelism was introduced in CUDA 5.0 with Kepler (compute capability 3.5). I thought dynamic parallelism would be harder to implement, so they would begin with classic recursion.
However, there's pretty good stuff in those specs: love the pipes, pointer-containing structs...
Why don't you distribute your code in binary or in amd_il format (in an .elf)? When it comes to reverse engineering, even the amd_il format looks like complete garbage, just like the disassembled ISA. Both need a lot of effort if someone wants to understand your program.
Dynamic parallelism and recursion are not the same thing. One is a capability of some combination of the compiler and the ISA (in terms of calling conventions, stacks and true function call/return behaviour) while the other is a capability of the queuing infrastructure.
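To make that distinction concrete, here is a rough host-side sketch in plain C (not OpenCL or CUDA code). All names in it (`sum_recursive`, `sum_queued`, `task_t`, `queue_t`, `QUEUE_CAPACITY`) are made up for illustration. The first function needs a real call stack and call/return support, which is the compiler/ISA capability. The second never calls itself: each task either does leaf work or pushes two child tasks onto a queue, which is loosely analogous to what device-side enqueue asks of the queuing infrastructure.

```c
#include <assert.h>
#include <stddef.h>

/* Recursion: relies on a call stack and true call/return behaviour. */
static unsigned sum_recursive(const unsigned *data, size_t lo, size_t hi)
{
    if (hi == lo) return 0;            /* empty range */
    if (hi - lo == 1) return data[lo]; /* leaf */
    size_t mid = lo + (hi - lo) / 2;
    return sum_recursive(data, lo, mid) + sum_recursive(data, mid, hi);
}

/* Queue-based version: a task describes a range of work to do later. */
typedef struct { size_t lo, hi; } task_t;

#define QUEUE_CAPACITY 1024 /* hypothetical fixed size, enough for n <= 512 */

typedef struct {
    task_t items[QUEUE_CAPACITY];
    size_t head, tail;
} queue_t;

static void enqueue(queue_t *q, task_t t)
{
    assert(q->tail < QUEUE_CAPACITY); /* simple linear queue, no wrap-around */
    q->items[q->tail++] = t;
}

/* One task: no self-call, it only adds a leaf value or enqueues children. */
static void run_task(queue_t *q, const unsigned *data, task_t t, unsigned *acc)
{
    if (t.hi - t.lo == 1) {
        *acc += data[t.lo];
        return;
    }
    size_t mid = t.lo + (t.hi - t.lo) / 2;
    enqueue(q, (task_t){ t.lo, mid });
    enqueue(q, (task_t){ mid, t.hi });
}

/* The "scheduler": drains the queue until no task is left. */
static unsigned sum_queued(const unsigned *data, size_t n)
{
    queue_t q = { .head = 0, .tail = 0 };
    unsigned acc = 0;
    if (n == 0) return 0;
    enqueue(&q, (task_t){ 0, n });
    while (q.head < q.tail)
        run_task(&q, data, q.items[q.head++], &acc);
    return acc;
}
```

Both functions compute the same sum, but only the first one depends on function call depth. The second one only depends on how much work the queue can hold, which is why the two features can appear in a spec independently of each other.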
I'm glad you like the changes we've made in the spec; it's been a lot of work to get it this far.
Don't forget subgroups in your list (as a KHR extension at the moment). People have long asked for wavefront parameters to be visible to the programmer, and subgroups should provide that in an abstract way. Remember also that a wavefront is a thread, so you can think about what that means for what you can do safely in the execution model.
Please do give feedback on the Khronos forums and Bugzilla. We will be making small modifications to the final release spec in a few months' time based on the feedback we receive on this public review draft, and we hope that will lead to a much stronger final specification.
Can you explain a bit more about this? What's the Generic Address Space? Is it for using void* and other pointers without having to specify __global/__local/__private? Why is this useful, and what are the applications, please?
And now we can fire kernels from kernels... but... are they executed thread by thread or...? If you fire a kernel from each thread ID, that's going to kill SIMD grouping... I don't understand how this is executed.
And I think the spec is lacking two important features we were demanding: a flag to tell OpenCL to disable the watchdog for long-running operations, and a way to use images of different sizes in an array.