I'm very interested in the AMD HSA stack, perhaps as much as when I first heard about the CUDA stack on Nvidia GPUs.
So naturally I have lots of questions; sorry in advance if they have already been answered or are more or less obvious.
The first is about OpenCL support: will it be OpenCL 2.0, or 1.2 at the start? If it's only 1.2 at the start, how far are we from an OpenCL 2.0 HSA stack on Linux?
I also see HSAFoundation/CLOC and HSAFoundation/HLC-HSAIL-Development-LLVM on GitHub as projects to generate HSAIL from OpenCL kernel files. One is closed source and the other open source, and of course I'm interested in the open source one. So right now, how far apart are the closed source and open source ones in maturity, feature set, etc.? Are they both OpenCL 2.0 ready, or only the proprietary one?
I don't know if this makes sense, but is there any interest in publishing CLOC (the binary one) for Windows? That could be useful if the Windows OpenCL implementation accepts HSAIL kernels directly, assuming CLOC gets bug fixes and features faster than OpenCL in the Catalyst Windows drivers.
I also don't know whether this stack will interoperate with Mesa and/or fglrx, or how OpenCL/OpenGL interop is handled (i.e. how efficient it is) in each case, on the Mesa and fglrx drivers.
As I said on Twitter, I hope AMD can enable this stack on its dGPUs, ideally from GCN 1.0 (cards like the 7970) onwards, so that owners can use it with the radeonsi drivers and get good OpenCL support without the fglrx binaries.
SPIR 2.0 also looks interesting: it lets languages other than OpenCL C target GPUs with functionality equivalent to OpenCL 2.0 (like dynamic parallelism). It would be good to have a path for this on the HSA stack, i.e. a translator that generates HSAIL from SPIR 2.0. This is an indirect way of asking for cl_khr_spir extension support to be expanded to SPIR 2.0 in the OpenCL HSA implementation.
To finish, I'm very interested in the dynamic parallelism feature (the GPU can launch kernels from kernels). Is it already supported on the HSA open source stack? I ask because I raised it with one of the early projects using the HSA stack (C++ AMP for Linux) and they say the HSA stack isn't ready for dynamic parallelism (see multicoreware / cppamp-driver-ng / issues / #20 - Request for support dynamic parallelism in OCL 2.0...: "I'm sorry that as of now, none of available OpenCL 2.0 driver or HSA runtime have infrastructure ready to implement dynamic parallelism"). If that's the case, I hope we get a simple HSA sample showing how to use dynamic parallelism: how to express it in HSAIL, how to create a device queue from the HSA RT API, how to choose an async or sync launch for the nested GPU call, etc.
Lastly, it seems OpenCL 2.0 still doesn't match the feature set of CUDA on Fermi (CUDA CC >= 2.0) GPUs, which supported true recursion in GPU functions, function pointers, pointers to pointers, etc.
All of that is supported, and judging from the PTX they don't use any tricks (e.g. expanding recursive function calls, etc.).
It seems the HSA hardware spec should support all of that, so it would be interesting to know whether it already works in the initial implementation of the HSA stack or will be supported later.
If so, I hope that with the whole effort being open source, someone interested (if not AMD) can enable all of these relaxations in an improved OpenCL 2.0 implementation, one that lifts the related restrictions in section 6.9 of the OpenCL C language spec:
*Pointers to functions are not allowed
*Recursion is not supported.
Lifting these restrictions is perhaps not very useful from a performance point of view, but it can be useful for easing the porting of large or intricate CPU codebases. An example is the MulticoreWare C++ AMP Linux port, which I think AMD is "closely" supporting. Their HSA support page says they can already use it:
see (in the HSA-specific extensions section) multicoreware / cppamp-driver-ng / wiki / HSA Support Status
I hope you can answer, or at least share what you think about all these topics.
I started trying to answer this but found that nearly all of the questions related to our OpenCL implementation rather than HSA itself. I'll see if we can move this post over to the OpenCL part of the forum or, failing that, will tell our OpenCL folks about it.
Regarding dynamic parallelism, I'm not sure what is felt to be missing from the HSA stack, but I will try to find out and reply separately.