OpenCL

neworderofjamie
Adept I

Re: OpenCL compiler bug

Apologies once again for taking such a long time to reply. I have now implemented your suggestion and, on the 5700 XT, I can remove all the flushes and am getting competitive performance, which is fantastic. However, I have encountered one issue. For simplicity, I wanted to allocate a single large buffer of CL_DEVICE_MAX_MEM_ALLOC_SIZE bytes but, when I do this, performance drops hugely until I reduce the buffer size a bit (around 4GB is fine; I haven't managed to find the exact threshold). Is this a known issue, or am I misunderstanding CL_DEVICE_MAX_MEM_ALLOC_SIZE? For reference, clinfo shows the following related values:

Global memory size (CL_DEVICE_GLOBAL_MEM_SIZE) 8573157376 (7.984GiB)

Max memory allocation (CL_DEVICE_MAX_MEM_ALLOC_SIZE) 7059013632 (6.574GiB)

Max size for global variable (CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE) 6353112064 (5.917GiB)

Thanks again for all your help.

Jamie
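
To put the three clinfo figures side by side, here is a small sketch that converts them to GiB and expresses each as a share of global memory. The numbers are copied from the post above; the helper names are made up for illustration. It only restates the reported values and says nothing about how the driver derives them.

```python
GIB = 1 << 30  # bytes per GiB

# Values copied from the clinfo output quoted in the post.
device_info = {
    "CL_DEVICE_GLOBAL_MEM_SIZE": 8573157376,
    "CL_DEVICE_MAX_MEM_ALLOC_SIZE": 7059013632,
    "CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE": 6353112064,
}


def to_gib(nbytes):
    """Convert a byte count to GiB."""
    return nbytes / GIB


def fraction_of_global(name):
    """Share of CL_DEVICE_GLOBAL_MEM_SIZE taken by the named value."""
    return device_info[name] / device_info["CL_DEVICE_GLOBAL_MEM_SIZE"]


for name, nbytes in device_info.items():
    print(f"{name}: {to_gib(nbytes):.3f} GiB, "
          f"{fraction_of_global(name):.1%} of global memory")
```

The reported maximum allocation is well above 4 GiB and is roughly 82% of global memory, so a buffer of that size leaves comparatively little headroom for anything else resident on the device.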

german
Staff

Re: OpenCL compiler bug

I can't say much without investigating. In general, Windows doesn't support a single allocation larger than 4GB, and the runtime needs extra logic to handle that case, but the split is enabled even for much smaller allocations. Usually a performance drop occurs when the runtime can't fit the allocation inside device memory and falls back to system memory. Check a memory monitor to see whether something else is consuming GPU memory on your system.

neworderofjamie
Adept I

Re: OpenCL compiler bug

Thanks for your rapid response. The allocation coming from system memory would totally explain this, but this is on Linux and CL_DEVICE_GLOBAL_FREE_MEMORY_AMD reports that 7.922 GiB is free. I can try to make a minimal reproducible example if that helps. Also, on Windows, would the 32-bit limit be reflected in the CL_DEVICE_MAX_MEM_ALLOC_SIZE device info?
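
One way to cross-check the free-memory figure outside OpenCL on Linux is to read the VRAM counters that the amdgpu kernel driver exposes in sysfs. The sketch below assumes the GPU is `card0` and that `mem_info_vram_total` and `mem_info_vram_used` are present under its device directory; the card index differs between systems, and the function names are made up for illustration.

```python
from pathlib import Path

GIB = 1 << 30  # bytes per GiB


def read_vram_counters(device_dir):
    """Return (total, used) VRAM in bytes, or None if the counters are absent."""
    device_dir = Path(device_dir)
    try:
        total = int((device_dir / "mem_info_vram_total").read_text())
        used = int((device_dir / "mem_info_vram_used").read_text())
    except (OSError, ValueError):
        # Not an amdgpu device, wrong card index, or unreadable files.
        return None
    return total, used


def describe(device_dir="/sys/class/drm/card0/device"):
    """Summarise VRAM usage; the default path is an assumption (card0)."""
    counters = read_vram_counters(device_dir)
    if counters is None:
        return f"no amdgpu VRAM counters under {device_dir}"
    total, used = counters
    return (f"VRAM used {used / GIB:.3f} GiB of {total / GIB:.3f} GiB "
            f"({(total - used) / GIB:.3f} GiB free)")


print(describe())
```

Running this before and after creating the large buffer would show whether the allocation is actually reflected in VRAM usage, which is the question behind the system-memory fallback theory.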

german
Staff

Re: OpenCL compiler bug

Yes, a 32-bit binary should limit CL_DEVICE_MAX_MEM_ALLOC_SIZE.
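
As a rough illustration of why a 32-bit process cannot be offered the full value: the maximum allocation reported earlier in the thread does not fit in a 32-bit `size_t`. This sketch only checks the arithmetic; how the runtime actually picks the limit for a 32-bit binary is not described in this thread, and the helper names are made up.

```python
import struct

# CL_DEVICE_MAX_MEM_ALLOC_SIZE as reported by clinfo earlier in the thread.
MAX_ALLOC = 7059013632


def max_size_t(pointer_bits):
    """Largest value an unsigned integer of the given width can hold."""
    return (1 << pointer_bits) - 1


def fits(nbytes, pointer_bits):
    """True if nbytes is representable at the given pointer width."""
    return nbytes <= max_size_t(pointer_bits)


# Pointer width of the running interpreter, used here as a stand-in for the host binary.
host_bits = struct.calcsize("P") * 8
print(f"host process is {host_bits}-bit")
print("fits in 32-bit size_t:", fits(MAX_ALLOC, 32))
print("fits in 64-bit size_t:", fits(MAX_ALLOC, 64))
```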

neworderofjamie
Adept I

Re: OpenCL compiler bug

So, on 64-bit Linux, what can I do to investigate why a CL_DEVICE_MAX_MEM_ALLOC_SIZE byte buffer would appear to be allocated in system memory?

german
Staff

Re: OpenCL compiler bug

That's my guess; I don't know for sure. Is it really the first allocation in the app? So you allocate >4GB and then a kernel has low performance?

neworderofjamie
Adept I

Re: OpenCL compiler bug

It is indeed the first allocation in the app, but the size at which everything slows down (found after some binary searching) is actually 4645191681 bytes, which doesn't seem to have any significance in binary or any relation to any of the device info values.
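
A quick way to inspect a threshold like this is to print it in hex and split it into whole MiB plus a remainder. The sketch below uses the value from the post; the helper name is made up for illustration.

```python
MIB = 1 << 20  # bytes per MiB

# First buffer size at which the slowdown was reported.
THRESHOLD = 4645191681


def decompose(nbytes):
    """Return (hex string, whole MiB, leftover bytes) for a byte count."""
    whole_mib, extra = divmod(nbytes, MIB)
    return hex(nbytes), whole_mib, extra


hex_value, whole_mib, extra = decompose(THRESHOLD)
print(hex_value, whole_mib, extra)  # prints: 0x114e00001 4430 1
```

Read this way, the reported value is exactly 4430 MiB plus one byte, so 4430 MiB would be the largest size that still ran fast. Whether that boundary means anything to the driver is not established anywhere in this thread.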

german
Staff

Re: OpenCL compiler bug

After the app allocates the memory, just run clEnqueueFillBuffer() (use a clear pattern size of 4 or 8 bytes) and measure the performance. Do you see the drop with > 4645191681 bytes?
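
The suggested test could be sketched roughly as follows, using pyopencl as a stand-in for the C host API (an assumption; the original application is not shown in this thread). It allocates a buffer, fills it with a zero pattern via `enqueue_fill_buffer` (the wrapper for clEnqueueFillBuffer), and reports bandwidth from event profiling timestamps. The function names are made up, pyopencl is imported lazily, and an OpenCL 1.2 or newer device is assumed.

```python
def fill_size(nbytes, pattern_size):
    """Largest size <= nbytes that is a multiple of pattern_size.

    clEnqueueFillBuffer requires the fill size to be a multiple of the
    pattern size, so odd buffer sizes are rounded down for the fill only.
    """
    return nbytes - (nbytes % pattern_size)


def bandwidth_gib_s(nbytes, elapsed_ns):
    """Convert bytes filled and elapsed nanoseconds to GiB/s."""
    return (nbytes / (1 << 30)) / (elapsed_ns * 1e-9)


def time_fill(nbytes, pattern_size=4, repeats=5):
    """Allocate nbytes on the device, fill it, return best bandwidth in GiB/s."""
    import pyopencl as cl  # assumption: pyopencl is installed

    ctx = cl.create_some_context(interactive=False)
    queue = cl.CommandQueue(
        ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)
    buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=nbytes)
    pattern = bytes(pattern_size)  # all-zero clear pattern of 4 or 8 bytes
    size = fill_size(nbytes, pattern_size)

    best_ns = None
    for _ in range(repeats):
        evt = cl.enqueue_fill_buffer(queue, buf, pattern, 0, size)
        evt.wait()
        elapsed = evt.profile.end - evt.profile.start  # nanoseconds
        best_ns = elapsed if best_ns is None else min(best_ns, elapsed)

    buf.release()
    return bandwidth_gib_s(size, best_ns)
```

Comparing `time_fill(4645191680)` with `time_fill(4645191681)` would show whether the drop appears with nothing but the fill running. The buffer is still allocated at the full requested size; only the filled range is rounded down to a multiple of the pattern size.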

neworderofjamie
Adept I

Re: OpenCL compiler bug

That would have been an excellent test but... after the machine got rebooted, the issue no longer occurs. Thanks again for your help.
