Yay!
https://www.khronos.org/opencl/
Main new features:
And now let me tell you again that SPIR is the most critical feature you should implement. 90% of the enterprises I know aren't using OpenCL because they don't want to distribute their kernel source with the app.
Looks like a lot of awesome features
Also SPIR 1.2 spec is released for review. Great day for OpenCL people today.
I wonder if there is any current AMD GPU that can support OpenCL 2.0 in HW, as many people comment that Dynamic Parallelism is not possible on current HW.
Yes, I'm kind of sceptical about that too. I didn't see recursion for non-kernel functions.
That doesn't follow the CUDA roadmap. Recursion for non-kernel functions has been available since CUDA 3.0 and Fermi chips; dynamic parallelism was introduced in CUDA 5.0 and Kepler 3.5. I thought dynamic parallelism would be harder to implement, so they would begin with classic recursion.
However, pretty good stuff in those specs: love pipes, pointer-containing structs...
Roger
Dynamic parallelism and recursion are not the same thing. One is a capability of some combination of the compiler and the ISA (in terms of calling conventions, stacks and true function call/return behaviour) while the other is a capability of the queuing infrastructure.
OK, I read the specs more closely and I understand what you mean. I had in mind CUDA Dynamic Parallelism, which offers kernel function recursion: a parent kernel can block and wait for the execution of its child, then resume its own execution when the child has finished. It seems it's not that extensive in OpenCL 2.0.
Roger
Correct. It is a slightly lower level model where the user would have to split the kernel manually into the primary kernel and the post-enqueue continuation, which could then wait on the child because completion events nest. This is a slightly more generic approach given the range of architectures involved. A higher-level compiler could perform this transformation for you; that is just not specified as part of OpenCL 2.
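For anyone who wants to see that ordering concretely, here is a toy host-side model in plain C. It is not OpenCL code and makes no claim about how any driver schedules work: `Queue`, `enqueue`, `drain`, `run_demo` and the three task functions are made-up names for illustration only, standing in for a device queue, a device-side enqueue and completion events. It is a rough sketch under those assumptions, and it does not model nested completion events.

```c
#include <assert.h>
#include <string.h>

#define MAX_TASKS 16
#define NO_EVENT (-1)

typedef struct Queue Queue;
typedef void (*TaskFn)(Queue *q);

typedef struct {
    TaskFn fn;     /* the "kernel" body */
    int wait_on;   /* index of the task that must finish first, or NO_EVENT */
    int done;      /* completion event: set to 1 once the task has run */
} Task;

struct Queue {
    Task tasks[MAX_TASKS];
    int count;
    char log[128]; /* execution order, for inspection */
};

/* Append a label to the execution log without overflowing it. */
static void record(Queue *q, const char *s) {
    strncat(q->log, s, sizeof q->log - strlen(q->log) - 1);
}

/* Asynchronous enqueue: only records the task and returns its event index. */
static int enqueue(Queue *q, TaskFn fn, int wait_on) {
    assert(q->count < MAX_TASKS);
    Task t = { fn, wait_on, 0 };
    q->tasks[q->count] = t;
    return q->count++;
}

static void child(Queue *q)        { record(q, "child "); }
static void continuation(Queue *q) { record(q, "continuation "); }

/* Primary kernel: enqueues the child and the continuation, never blocks. */
static void parent(Queue *q) {
    record(q, "parent-begin ");
    int child_done = enqueue(q, child, NO_EVENT);
    enqueue(q, continuation, child_done); /* runs only after the child */
    record(q, "parent-end ");
}

/* Run tasks until nothing is runnable. The scan goes newest-first on
   purpose, so only the event dependency keeps the continuation behind
   the child. */
static void drain(Queue *q) {
    int progressed = 1;
    while (progressed) {
        progressed = 0;
        for (int i = q->count - 1; i >= 0; --i) {
            Task *t = &q->tasks[i];
            int ready = t->wait_on == NO_EVENT || q->tasks[t->wait_on].done;
            if (!t->done && ready) {
                t->fn(q);       /* may enqueue more tasks */
                t->done = 1;
                progressed = 1;
                break;          /* rescan: the queue may have grown */
            }
        }
    }
}

/* Host-side launch of the parent, then run everything to completion. */
void run_demo(Queue *q) {
    memset(q, 0, sizeof *q);
    enqueue(q, parent, NO_EVENT);
    drain(q);
}
```

Calling `run_demo(&q)` on a `Queue q` leaves `parent-begin parent-end child continuation ` in `q.log`: the parent finishes without waiting, and the continuation is held back until the child's event is complete.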
Why don't you distribute your code in binary or in amd_il format (in an .elf)? When it comes to reverse engineering, even the amd_il format looks like complete garbage, just like the disassembled ISA. Both need a lot of effort if someone wants to understand your program.
Cuz SPIR is better... it works with other vendors too
Maybe the AMD Radeon 7000 series can handle OpenCL 2.0?
I'm glad you like the changes we've made in the spec, it's been a lot of work to get it this far.
Don't forget subgroups in your list (as a KHR at the moment). People have long asked for wavefront parameters to be visible to the programmer. Subgroups should provide that in an abstract way. Remember also that a wavefront is a thread so you can think about what that means for what you can do safely in the execution model.
Please do give feedback on the Khronos forums and bugzilla. We will be making small modifications to the final release spec in a few months time based on feedback we receive on this public review draft and we hope that will lead to a much stronger final specification.
Btw....
Can you explain a bit more about this? What's the Generic Address Space? To use void* and other pointers without having to specify __global/__local/__private? Why is this useful, and what are the applications, pls?
And now we can fire kernels from kernels... but are they executed thread-by-thread or...? If you fire a kernel from each threadID, that's gonna kill SIMD grouping... I don't understand how this is executed.
And I think the spec is lacking two important features we were demanding: a flag to tell OpenCL to disable the watchdog for long-running operations, and a way to use different-size images in an array.
There are many use cases, but the obvious one is where you want to make a generic function to use for multiple address spaces:
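int foo(int *a);   // unqualified pointer: generic address space in OpenCL 2.0

kernel void bar(local int *a, global int *b)
{
    foo(a);   // local pointer converts implicitly to generic
    foo(b);   // global pointer converts implicitly to generic
}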
Launching kernels from kernels is like launching from the host - it's an asynchronous dispatch. The launching work items don't block. You get completely new threads once the current ones have completed (or concurrently if you happen to have cores available and the launchers are long running). There's no relationship between the original kernel and the launched kernel.
Disabling the watchdog on a per-launch basis may not be practical if it is an OS timer rather than something the driver has control of. I can't really comment on why some features got in and others didn't. There is obviously ongoing negotiation on priorities given the needs of different vendors, their customers and the capabilities of the planned devices.
The spec is provisional. FYI.
Note: "clCreateCommandQueue" is deprecated. We need to go with "clCreateCommandQueueWithProperties". Bit sad to see such a common, basic API getting deprecated.
Other deprecated APIs include clCreateSampler and clEnqueueTask.
"Shared Virtual Memory" capability was almost there in AMD cards. The "VM" string in the OpenCL driver suggests the ability to map OpenCL kernel pointers into host RAM. But now it is more formally exposed.
Also, it will help HSA runtimes and HSA-aware OpenCL applications.
- Bruhaspati