OpenCL

neworderofjamie
Adept I

Re: OpenCL compiler bug

OK, the Linux machine with the 5700 XT has been upgraded to AMDGPU-Pro 20.40, and the behaviour is also unchanged.

neworderofjamie
Adept I

Re: OpenCL compiler bug

Hi there,

Just wondering if there's been any progress on the OpenCL team's investigation of this issue?

Thanks

dipak
Staff

Re: OpenCL compiler bug

At this moment, I don't have any update that I can share with you. I'll let you know if I get any information on this issue.

Thanks.

german
Staff

Re: OpenCL compiler bug

I believe the app violates the OCL 1.2 spec. It's not allowed to pass pointers into one kernel, save them, and reuse them in another kernel; this isn't CUDA. Even with OCL 2.0 SVM, the app still has to pass the SVM allocations hidden inside other buffers to every kernel via clSetKernelExecInfo; only fine-grained system SVM doesn't require that.

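__kernel void buildNeuronUpdate1Kernel(__global struct MergedNeuronUpdateGroup1 *group, unsigned int idx, __global float* x ...) {
   // Saves the address of buffer x inside a struct that lives in another buffer
   group[idx].x = x;
}

__kernel void updateNeuronsKernel() {
   // Arguments and the definitions of group and lid are elided in this excerpt.
   // This kernel dereferences the saved pointers, but the buffers behind
   // group->x and group->inSynInSyn0 are never passed to it.
   group->x[lid] = group->inSynInSyn0[lid];
}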

neworderofjamie
Adept I

Re: OpenCL compiler bug

Sorry for taking so long to respond to this thread; I was busy with other things and then got locked out of the new forum (thanks @dipak for helping me with that). Could you clarify whether @german's response is the outcome of the OpenCL team's investigation of this issue or just another opinion on it?

german
Staff

Re: OpenCL compiler bug

I work in the OCL team, and it was the outcome of my investigation. The app can't hide pointers to memory objects inside an arbitrary memory location and fetch them later in another kernel without notifying the OCL runtime about those extra memory objects. Doing that without any notification is a feature of fine-grained system SVM only.
Potentially the app can use OCL sub-buffers, but it must pass the parent buffer to the kernel that fetches the pointers to the sub-buffers, even if that kernel doesn't access the parent directly.

neworderofjamie
Adept I

Re: OpenCL compiler bug

OK, that's good to know. I was always somewhat worried that this approach violated the spec, although I still don't really understand what's actually going wrong. The pointers we're 'hiding' are in device memory, and it all works fine if you insert a lot of flushes, so it's not as if the pointers are somehow virtualized and thus not transferable between kernels.

Additionally, none of the solutions I can think of are very satisfactory 😕 As I understand it, coarse-grained SVM would let us build the data structures we need, but we have no need for most data to be accessible from the host, and we really want to remain in control of copying data between host and device. I guess we could switch to HIP, where this kind of approach presumably works, but then we'd lose support for 90% of AMD hardware. Any suggestions would be much appreciated...

german
Staff

Re: OpenCL compiler bug

The runtime needs to know about all memory objects in order for the Windows video memory manager (VidMm) to work properly; there are also runtime optimizations that need knowledge of all the memory objects in use. Much older HW (from when OCL 1.0-1.2 was designed) wouldn't work even with flushes, because Windows required memory addresses to be patched upon submission to the HW. Flushes (or rather clFinish()) serialize execution, which disables some optimizations and/or changes the timing of memory accesses.
I already mentioned a possible solution: the app can use sub-buffers, but the original parent object must be passed to the fetch kernel. However, it's not a 100% robust solution, and the app may still need clFlush() before and after the kernel that fetches the saved pointers, but at least the OCL runtime will be able to pass proper usage information to VidMm.

neworderofjamie
Adept I

Re: OpenCL compiler bug

Thank you for the additional information. After reading the documentation on clCreateSubBuffer, can you clarify how I could use that in my case? Are you suggesting allocating a single large buffer which I pass to every kernel and then, from that, creating sub-buffers which I point to in the structs?
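A minimal sketch of the host-side arithmetic for this scheme, assuming each pointer field gets its own non-overlapping region of one parent buffer. In OpenCL 1.2, each region's origin and size would go into a `cl_buffer_region` passed to `clCreateSubBuffer` with `CL_BUFFER_CREATE_TYPE_REGION`, and an origin that is not aligned to the device's `CL_DEVICE_MEM_BASE_ADDR_ALIGN` value (reported in bits) is rejected with `CL_MISALIGNED_SUB_BUFFER_OFFSET`. The helpers `align_up` and `layout_sub_buffers` are hypothetical names, and the code only computes offsets; it makes no OpenCL calls.

```c
#include <stddef.h>

/* Round offset up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: pack `count` regions with the given byte sizes into one
 * parent buffer. base_align_bits is the CL_DEVICE_MEM_BASE_ADDR_ALIGN value
 * (in bits). Writes each region's origin to origins[i] and returns the total
 * parent-buffer size in bytes.
 */
static size_t layout_sub_buffers(const size_t *sizes, size_t count,
                                 unsigned int base_align_bits, size_t *origins)
{
    size_t align = base_align_bits / 8; /* convert bits to bytes */
    if (align == 0) {
        align = 1;
    }

    size_t offset = 0;
    for (size_t i = 0; i < count; ++i) {
        offset = align_up(offset, align); /* each origin must be aligned */
        origins[i] = offset;
        offset += sizes[i];               /* regions never overlap */
    }
    return offset;
}
```

With a 1024-bit alignment and region sizes of 100, 256 and 4 bytes, the origins come out as 0, 128 and 384, and the parent buffer needs 388 bytes. Packing regions back to back keeps the padding below one alignment unit per region.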

german
Staff

Re: OpenCL compiler bug

That's correct. However, things are a bit more complicated than that. On gfx10 HW (Navi) it should work. On gfx9 HW it may still require an extra clFlush(), since the compiler can detect whether the global resource has read/write access, and the runtime could then treat it as a no-op. To avoid that, the app may need some arbitrary single-DWORD write to the global memory object (the parent buffer) in the kernel that has access to the hidden objects.
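A sketch of the fetch kernel under this workaround, assuming the struct layout below. The `numNeurons` field, the `scratchIndex` argument, and a DWORD at the end of the parent buffer reserved outside every sub-buffer region are all hypothetical. The kernel receives the parent buffer as an argument even though it reaches the data only through the saved pointers, and one work-item performs a single DWORD write to it. The `#ifndef __OPENCL_VERSION__` block is an assumed shim that lets the same source compile as plain C for a host-side check; an OpenCL C compiler predefines `__OPENCL_VERSION__` and skips it. As noted earlier in the thread, this is not a 100% robust solution, and the OpenCL 1.2 text on `clCreateSubBuffer` leaves concurrent access to a buffer and its sub-buffers undefined.

```c
/* Hypothetical shim: only used when this file is compiled as plain C. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
static size_t sim_global_id; /* emulated work-item index for host testing */
static size_t get_global_id(unsigned int dim) { (void)dim; return sim_global_id; }
#endif

/* Assumed layout: x and inSynInSyn0 point into sub-buffers of one parent. */
struct MergedNeuronUpdateGroup1 {
    __global float *x;
    __global float *inSynInSyn0;
    unsigned int numNeurons;
};

/*
 * parent: the whole parent buffer, passed so the runtime sees it as used by
 * this kernel. parent[scratchIndex] is assumed to be a reserved DWORD that
 * lies outside all sub-buffer regions.
 */
__kernel void updateNeuronsKernel(__global struct MergedNeuronUpdateGroup1 *group,
                                  __global unsigned int *parent,
                                  unsigned int scratchIndex)
{
    const size_t lid = get_global_id(0);

    /* Single DWORD write so the parent buffer has read/write access here */
    if (lid == 0) {
        parent[scratchIndex] = 1u;
    }

    /* Fetch data through the pointers saved by an earlier kernel */
    if (lid < group->numNeurons) {
        group->x[lid] = group->inSynInSyn0[lid];
    }
}
```

On the host, the parent `cl_mem` would be set as an extra kernel argument alongside the struct buffer, with clFlush() before and after the enqueue if gfx9 still misbehaves.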
