

bubu
Adept II

OpenCL 2.0 spec out!

Yay!

https://www.khronos.org/opencl/

Main new features:

  • Shared Virtual Memory
    Host and device kernels can directly share complex, pointer-containing data structures such as trees and linked lists, providing significant programming flexibility and eliminating costly data transfers between host and devices.
  • Dynamic Parallelism
    Device kernels can enqueue kernels to the same device with no host interaction, enabling flexible work scheduling paradigms and avoiding the need to transfer execution control and data between the device and host, often significantly offloading host processor bottlenecks.
  • Generic Address Space
    Functions can be written without specifying a named address space for arguments, especially useful for those arguments that are declared to be a pointer to a type, eliminating the need for multiple functions to be written for each named address space used in an application.
  • Images
    Improved image support including sRGB images and 3D image writes, the ability for kernels to read from and write to the same image, and the creation of OpenCL images from a mip-mapped or a multi-sampled OpenGL texture for improved OpenGL interop.
  • C11 Atomics
    A subset of C11 atomics and synchronization operations to enable assignments in one work-item to be visible to other work-items in a work-group, across work-groups executing on a device or for sharing data between the OpenCL device and host.
  • Pipes
    Pipes are memory objects that store data organized as a FIFO and OpenCL 2.0 provides built-in functions for kernels to read from or write to a pipe, providing straightforward programming of pipe data structures that can be highly optimized by OpenCL implementers.
  • Android Installable Client Driver Extension
    Enables OpenCL implementations to be discovered and loaded as a shared object on Android systems.

And now let me tell you again that SPIR is the most critical feature you should implement. 90% of the enterprises I know aren't using OpenCL because they don't want to distribute their kernel source with the app.

0 Likes
13 Replies
himanshu_gautam
Grandmaster

Looks like a lot of awesome features

Also, the SPIR 1.2 spec has been released for review. Great day for OpenCL people today.

0 Likes
nou
Exemplar

I wonder whether there is any current AMD GPU that can support OpenCL 2.0 in hardware, as many people comment that Dynamic Parallelism is not possible on current hardware.

0 Likes

Yes, I'm kind of sceptical about that too. I didn't see recursion for non-kernel functions.

That doesn't follow the CUDA roadmap. Recursion for non-kernel functions has been available since CUDA 3.0 on Fermi chips, while dynamic parallelism was introduced in CUDA 5.0 on Kepler (compute capability 3.5). I thought dynamic parallelism would be harder to implement, so they would begin with classic recursion.

However, there is pretty good stuff in those specs. I love pipes, pointer-containing structs...

Roger

0 Likes

Dynamic parallelism and recursion are not the same thing. One is a capability of some combination of the compiler and the ISA (in terms of calling conventions, stacks and true function call/return behaviour) while the other is a capability of the queuing infrastructure.

0 Likes

OK, I read the specs more closely and I understand what you mean. I had in mind CUDA Dynamic Parallelism, which offers kernel function recursion: a parent kernel can block and wait for its child to execute, then resume once the child has finished. It seems OpenCL 2.0 does not go that far.

Roger

0 Likes

Correct. It is a slightly lower-level model: the user would have to split the kernel manually into the primary kernel and a post-enqueue continuation, which could then wait on the child because completion events nest. This is a slightly more generic approach given the range of architectures involved. A higher-level compiler could perform this transformation for you; that is just not specified as part of OpenCL 2.0.
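The split described above can be modelled on the host in plain C. This is only a sketch of the control flow, not OpenCL code: the queue, the `enqueue` and `drain` helpers, and the `primary`/`child`/`continuation` names are all hypothetical. In real OpenCL C 2.0 the device-side launch is done with the `enqueue_kernel` built-in and `clk_event_t` events, which this model does not use.

```c
#include <stddef.h>

/* Plain C model of a device-side queue. None of these names are OpenCL API. */
#define MAX_TASKS 8
#define NO_EVENT  (-1)

typedef void (*task_fn)(int *data);

typedef struct {
    task_fn fn;      /* work to run */
    int    *data;    /* argument passed to the task */
    int     wait_on; /* index of the task that must finish first, or NO_EVENT */
    int     done;    /* completion flag, plays the role of an event */
} task;

static task   queue[MAX_TASKS];
static size_t queue_len = 0;

/* Asynchronous enqueue: records the task and returns its "event" (its index).
   The caller never blocks. Returns NO_EVENT if the queue is full. */
static int enqueue(task_fn fn, int *data, int wait_on)
{
    if (queue_len == MAX_TASKS)
        return NO_EVENT;
    queue[queue_len] = (task){ fn, data, wait_on, 0 };
    return (int)queue_len++;
}

/* Child kernel: doubles the value. */
static void child(int *data) { *data *= 2; }

/* Continuation: the part of the original kernel that needs the child's result. */
static void continuation(int *data) { *data += 1; }

/* Primary kernel: does the work before the launch, then enqueues the child
   and a continuation that waits on the child's event. It returns at once. */
static void primary(int *data)
{
    *data += 10;
    int child_done = enqueue(child, data, NO_EVENT);
    enqueue(continuation, data, child_done);
}

/* Scheduler: keep running every task whose dependency has completed. */
static void drain(void)
{
    int progressed = 1;
    while (progressed) {
        progressed = 0;
        for (size_t i = 0; i < queue_len; ++i) {
            task *t = &queue[i];
            if (t->done)
                continue;
            if (t->wait_on != NO_EVENT && !queue[t->wait_on].done)
                continue;
            t->fn(t->data);
            t->done = 1;
            progressed = 1;
        }
    }
}

/* Host entry point: launch the primary kernel and run the queue dry. */
int run_split_kernel(int start)
{
    int value = start;
    queue_len = 0;
    enqueue(primary, &value, NO_EVENT);
    drain();
    return value;
}
```

In this model `run_split_kernel(5)` gives 31: the primary adds 10, the child doubles, and the continuation adds 1 only after the child's event has completed. The primary never blocks, which is the difference from the CUDA-style parent that waits in place.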

0 Likes
realhet
Miniboss

Why don't you distribute your code in binary or in amd_il format (in an .elf)? When it comes to reverse engineering, even the amd_il format looks like complete garbage, just like the disassembled ISA. Both need a lot of effort if someone wants to understand your program.

0 Likes

Because SPIR is better: it works with other vendors too.

0 Likes
cusa123
Adept I

Maybe the AMD Radeon 7000 series can handle OpenCL 2.0?

0 Likes
LeeHowes
Staff

I'm glad you like the changes we've made in the spec, it's been a lot of work to get it this far.

Don't forget subgroups in your list (as a KHR extension at the moment). People have long asked for wavefront parameters to be visible to the programmer. Subgroups should provide that in an abstract way. Remember also that a wavefront is a thread, so you can think about what that means for what you can do safely in the execution model.

Please do give feedback on the Khronos forums and Bugzilla. We will be making small modifications to the final release spec in a few months' time based on feedback we receive on this public review draft, and we hope that will lead to a much stronger final specification.

0 Likes
bubu
Adept II

Btw....

Can you explain a bit more about this? What is the Generic Address Space? Is it to use void* and other pointers without having to specify __global/__local/__private? Why is this useful, and what are the applications, please?

And now we can fire kernels from kernels... but are they executed thread by thread, or...? If you fire a kernel from each thread ID, that is going to kill SIMD grouping. I don't understand how this is executed.

And I think the spec is lacking two important features we were asking for: a flag to tell OpenCL to disable the watchdog for long-running operations, and a way to use different-size images in an array.

0 Likes

There are many use cases, but the obvious one is where you want to make a generic function to use for multiple address spaces:

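int foo(int *a);   // unqualified pointer: generic address space in OpenCL 2.0

kernel void bar(local int *a, global int *b)
{
  foo(a);   // local pointer converts to generic
  foo(b);   // global pointer converts to generic
}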
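For contrast, here is a minimal sketch of what the generic address space saves. The names `sum_local`, `sum_global` and `sum_generic` are hypothetical and not from the spec. Under OpenCL 1.x rules a helper has to be written once per named address space; in OpenCL 2.0 (built with `-cl-std=CL2.0`) the version with an unqualified pointer can serve both. So that the same text can be checked with an ordinary host C compiler, the qualifiers are defined away when `__OPENCL_VERSION__` is absent; that macro trick is an assumption made purely for illustration.

```c
/* On a host C compiler the OpenCL qualifiers do not exist, so define them
   away. An OpenCL C compiler predefines __OPENCL_VERSION__ and skips this. */
#ifndef __OPENCL_VERSION__
#define kernel
#define global
#define local
#endif

/* OpenCL 1.x style: one copy of the helper per named address space. */
int sum_local(local const int *a, int n)
{
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += a[i];
    return s;
}

int sum_global(global const int *a, int n)
{
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* OpenCL 2.0 style: the unqualified pointer argument is in the generic
   address space, so one function accepts local and global pointers. */
int sum_generic(const int *a, int n)
{
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* The kernel passes pointers from two different address spaces to the
   same helper and stores both results in a global output buffer. */
kernel void bar(local int *a, global int *b, global int *out, int n)
{
    out[0] = sum_generic(a, n);
    out[1] = sum_generic(b, n);
}
```

The three helper bodies are identical. Only the qualifier on the pointer differs, and that duplication is what the generic address space removes.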

Launching kernels from kernels is like launching from the host - it's an asynchronous dispatch. The launching work items don't block. You get completely new threads once the current ones have completed (or concurrently if you happen to have cores available and the launchers are long running). There's no relationship between the original kernel and the launched kernel.

Disabling the watchdog on a per-launch basis may not be practical if it is an OS timer rather than something the driver has control of. I can't really comment on why some features got in and others didn't. There is obviously ongoing negotiation on priorities given the needs of different vendors, their customers and the capabilities of the planned devices.

0 Likes
himanshu_gautam
Grandmaster

The spec is provisional. FYI.

Note: "clCreateCommandQueue" is deprecated. We need to go with "clCreateCommandQueueWithProperties" instead. A bit sad to see such a common, basic API getting deprecated.

Other deprecated APIs include clCreateSampler and clEnqueueTask.

"Shared Virtual Memory" capability was almost there in AMD cards. The "VM" string in the OpenCL driver suggests the ability to map OpenCL kernel pointers into host RAM. But now it is more formally exposed.

Also, it will help HSA runtimes and HSA-aware OpenCL applications.

- Bruhaspati

0 Likes