OpenCL

lanaka · 10-30-2023

Hello,

I am currently working on my master's thesis, in which we want to improve the performance of the Java Vector API by offloading work to an integrated GPU (iGPU). To do this, we are customising the JDK and using OpenCL to run code on the iGPU. Our current solution creates coarse-grain SVM buffers, copies the Java array elements into them and executes the kernel on these buffers. With this approach, most of the execution time is spent copying data into the SVM buffers.

- Are there CPUs or iGPUs that support fine-grain system SVM?
- In other discussions, I have read that performance decreases as one moves up the SVM levels (coarse-grain buffer -> fine-grain buffer -> fine-grain system). Do you agree with our assumption that, in our case, fine-grain system SVM would nevertheless increase performance significantly, since it would remove the need to copy the data?

Another idea is to provide SVM buffers that Java developers can use instead of Java arrays. With coarse-grained buffers, however, the buffer has to be mapped and unmapped every time the user reads or writes an element, which increases execution time.

- Would using fine-grain buffers significantly reduce those access times?

Thank you very much for your help and best regards.

dipak · 10-31-2023

Hi @lanaka,

Thanks for your query. I have forwarded it to the OpenCL team.

Also, I have whitelisted you and moved the post to the OpenCL forum.

Thanks.

german · 10-31-2023

Hello,

-> Are there CPUs and iGPUs that support the SVM fine-grain system?

APUs can support fine-grain system SVM, but only if IOMMUv2 is enabled in the base driver. IOMMUv2 gives the GPU access to the CPU page tables.

-> Do you agree with our assumption that, in our case, an SVM fine-grain system will increase the performance significantly?
APUs use the same system memory for both fine-grain and coarse-grain SVM. Hence, GPU access performance should be about the same, but map() is faster with fine-grain memory.
In general, on APUs the app can avoid SVM altogether and achieve the same goal with regular buffer allocations created with CL_MEM_USE_HOST_PTR or CL_MEM_ALLOC_HOST_PTR. CL_MEM_USE_HOST_PTR should allow the app to use the Java allocation directly, and map() performance will be fast with either flag.

-> Would using fine-grain buffers significantly reduce those access times?
Yes. The host can access fine-grain buffers without map()/unmap(), so they don't have that performance penalty.

OpenCL SVM fine-grain System with iGPU