OpenCL

binghy
Adept II

Re: Host<->Device Memory Transfer during Kernel-Run?

Hi,

I have been looking more closely at zero-copy buffers and their features (both device-resident and host-resident zero-copy buffers).

Even though the Transfer Overlap sample runs fine on my system, with no problems with buffer flags, I found that creating a buffer from scratch in another VS project with the CL_MEM_USE_PERSISTENT_MEM_AMD flag results in an error (flag not defined).

Since my laptop configuration should support zero-copy buffers (GPU: AMD Radeon R9 M290X (GCN), driver version: 1268.1 (VM), OS: Windows 7, system architecture: 64-bit), I assumed the CL_MEM_USE_PERSISTENT_MEM_AMD flag was already defined in OpenCL. However, I checked the cl.h header file and it is not declared there.

So, is it just a matter of declaring a struct and all the flags, including CL_MEM_USE_PERSISTENT_MEM_AMD (as in the Transfer Overlap sample), or am I missing something else (such as include paths or other header files that the sample uses and that must be added to the VS project to make the CL_MEM_USE_PERSISTENT_MEM_AMD flag available)?
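A minimal sketch of the flag handling, assuming the flag comes from the extension header rather than cl.h: as far as I know, both the AMD APP SDK and the Khronos headers declare CL_MEM_USE_PERSISTENT_MEM_AMD as `(1 << 6)` in `CL/cl_ext.h`, under the `cl_amd_device_memory_flags` extension. So that the sketch compiles without the SDK, the `cl_mem_flags` typedef and flag values are mirrored locally (an assumption to be replaced by `#include <CL/cl_ext.h>` in a real project). The helper `persistent_buffer_flags` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins so this sketch compiles without the OpenCL SDK.
   In a real project, delete them and #include <CL/cl_ext.h>
   (which pulls in <CL/cl.h>). Values mirror the Khronos headers. */
typedef uint64_t cl_mem_flags;

#define CL_MEM_READ_WRITE (1 << 0)
#define CL_MEM_WRITE_ONLY (1 << 1)
#define CL_MEM_READ_ONLY  (1 << 2)

/* The AMD extension flag is declared in cl_ext.h, not cl.h. The guard
   keeps the build working if the header already defined it. */
#ifndef CL_MEM_USE_PERSISTENT_MEM_AMD
#define CL_MEM_USE_PERSISTENT_MEM_AMD (1 << 6)
#endif

/* Hypothetical helper: combine one access qualifier with the AMD
   persistent-memory flag (device-resident, host-visible allocation). */
static cl_mem_flags persistent_buffer_flags(cl_mem_flags access)
{
    /* Exactly one of the three access qualifiers is expected. */
    assert(access == CL_MEM_READ_WRITE || access == CL_MEM_WRITE_ONLY ||
           access == CL_MEM_READ_ONLY);
    return access | CL_MEM_USE_PERSISTENT_MEM_AMD;
}
```

With the real headers in place, the helper's result would go into the `flags` argument of `clCreateBuffer`, for example `clCreateBuffer(context, persistent_buffer_flags(CL_MEM_READ_ONLY), size, NULL, &err)`, where `context`, `size`, and `err` are the caller's own variables and `host_ptr` is NULL because no host-pointer flag is set.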

Thanks

Marco

tzachi_cohen
Staff

Mid-kernel CPU writes do not work because CPU writes to persistent memory are not GPU cache coherent. That is, if the address happens to be cached in the GPU L1/L2, the cache lines will not be invalidated. It is safe to write to persistent memory only outside kernel execution boundaries, where the associated caches are flushed and invalidated. This is also true for host-memory zero-copy buffers.
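To make the coherence argument concrete, here is a deliberately simplified toy model in C, not a description of real GCN hardware: a single "GPU cache line" sits in front of one shared memory word. CPU writes go straight to memory and never touch the cache, so a GPU load that hits the cached line keeps returning the old value until the kernel boundary invalidates it. All `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one memory word shared by CPU and GPU, plus a
   single-entry GPU cache holding a copy of that word. */
typedef struct {
    uint32_t memory;      /* persistent / zero-copy buffer contents */
    uint32_t cache_value; /* GPU-side cached copy */
    bool     cache_valid; /* whether the GPU cache holds the word */
} toy_system;

/* CPU store: updates memory only; the GPU cache is NOT invalidated. */
static void toy_cpu_write(toy_system *s, uint32_t v)
{
    s->memory = v;
}

/* Ordinary GPU load: served from the cache on a hit, filled on a miss. */
static uint32_t toy_gpu_load(toy_system *s)
{
    if (!s->cache_valid) {
        s->cache_value = s->memory;
        s->cache_valid = true;
    }
    return s->cache_value;
}

/* Kernel boundary: the GPU never writes in this toy, so there are no
   dirty lines to flush; only the invalidation is modeled. */
static void toy_kernel_boundary(toy_system *s)
{
    s->cache_valid = false;
}
```

Reading the word, having the CPU change it mid-"kernel", and reading again returns the stale value; only after `toy_kernel_boundary` does the GPU observe the update, which is why host writes are safe only between kernel executions.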

There is a way to make CPU writes visible to the GPU during kernel execution; however, it is not for everyone.

The GCN load/store ISA instructions (MUBUF/MTBUF) have a bit called 'SLC', which stands for System Level Coherent. If you turn it on, the GPU bypasses its L1/L2 caches for that specific load/store operation, making it coherent with the CPU.

You can patch your kernel ISA to flip the bit on.

The OpenCL compiler turns it on in only one case: atomic operations that read back the previous value from memory. You can verify this behavior by inspecting the ISA when compiling kernels in CodeXL's analysis mode.
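A deliberately simplified toy model in C can illustrate the SLC bit and the atomic case. This is a sketch under stated assumptions, not real ISA semantics: an `slc` flag on a load makes it bypass the cache and read memory directly, and a value-returning atomic (assumed, per the explanation above, to be compiled with SLC set) is modeled as operating on memory rather than on the cached copy. All `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy GPU view of one host-visible word with a single cache entry. */
typedef struct {
    uint32_t memory;      /* host-visible buffer word (CPU writes here) */
    uint32_t cache_value; /* GPU-side cached copy */
    bool     cache_valid;
} toy_gpu;

/* Load with an SLC-like flag: when set, skip the cache entirely. */
static uint32_t toy_load(toy_gpu *g, bool slc)
{
    if (slc)
        return g->memory;          /* sees the latest CPU write */
    if (!g->cache_valid) {
        g->cache_value = g->memory;
        g->cache_valid = true;
    }
    return g->cache_value;         /* may be stale */
}

/* Value-returning atomic add, modeled as acting directly on memory
   (as if SLC were set). The toy also drops the cached copy so that
   later ordinary loads refetch instead of seeing a stale value. */
static uint32_t toy_atomic_fetch_add(toy_gpu *g, uint32_t delta)
{
    uint32_t old = g->memory;
    g->memory = old + delta;
    g->cache_valid = false;
    return old;
}
```

One implication, if the compiler behaves as described, is that a value-returning atomic that adds zero reads the CPU-written value without patching the ISA; whether that holds for a given kernel and driver is worth confirming in the CodeXL ISA output.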

Tzachi

dipak
Staff

I see you've created a separate thread [Re: CL_MEM_USE_PERSISTENT_MEM_AMD flag implementation] for the same query. Please check that thread for the responses.

P.S. We generally encourage forum users to create a new thread, rather than revive an old one, when a new query or observation is not directly related to the old thread. Thanks for doing so.

Regards,

binghy
Adept II

Yes, I did, as requested. Sorry for posting the question in this old topic.

I guess you guys want to build a kind of well-ordered "web encyclopedia" through this forum.

Thanks for your work.

Regards
