Here are my steps:
1. Create a buffer using clCreateBuffer with flag
2. Get a pointer to the buffer with clEnqueueMapBuffer
3. Set the value at the pointer to x
4. Unmap the buffer with clEnqueueUnmapMemObject
5. Set the kernel args and run the kernel
6. Do some stuff on the host
7. Set the value at the pointer to y
8. Wait for the kernel run to finish
As I said, step 7 works on the CPU but not on the GPU...
I think it's currently impossible to transfer data between the CPU and the GPU during a kernel run. And I think the kernel has already finished before step 7.
There is a special sample in the SDK which covers this use case, but it needs zero copy support, which on Linux is supported only for Southern Islands (aka HD 7xxx) GPUs.
As I understand it, it should work like this:
1. Create the buffers with CL_MEM_USE_PERSISTENT_MEM_AMD
2. Enqueue a blocking map of the first buffer
3. Enqueue the kernel with the second buffer
4. Write to the first buffer
5. Enqueue unmap
But the data is not transferred during the kernel run.
With zero copy buffers, map/unmap returns almost immediately. You write to the mapped buffer with normal memcpy()/memset().
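To make the difference concrete, here is a toy model in plain C. It makes no OpenCL calls, and zc_buffer, zc_map and zc_unmap are hypothetical names. It assumes that a zero copy map hands back a pointer into the buffer's own storage, while a non-zero-copy map has to copy through a staging area on map and again on unmap, so only the latter moves bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: no OpenCL calls. "device" stands in for the buffer's
 * backing memory; "staging" is a separate host copy used when the buffer
 * is not zero copy. All names are hypothetical. */
typedef struct {
    unsigned char *device;
    unsigned char *staging;
    size_t size;
    int zero_copy;        /* 1: the mapped pointer aliases device memory */
    size_t bytes_copied;  /* total bytes moved by map and unmap */
} zc_buffer;

static int zc_init(zc_buffer *b, size_t size, int zero_copy) {
    b->device = calloc(size, 1);
    b->staging = zero_copy ? NULL : calloc(size, 1);
    b->size = size;
    b->zero_copy = zero_copy;
    b->bytes_copied = 0;
    return b->device != NULL && (zero_copy || b->staging != NULL);
}

/* Map: zero copy returns the buffer's own storage, so nothing is copied. */
static unsigned char *zc_map(zc_buffer *b) {
    assert(b->device != NULL);
    if (b->zero_copy)
        return b->device;
    memcpy(b->staging, b->device, b->size);  /* "device -> host" copy */
    b->bytes_copied += b->size;
    return b->staging;
}

/* Unmap: only the copying variant has to push the host's changes back. */
static void zc_unmap(zc_buffer *b) {
    if (b->zero_copy)
        return;
    memcpy(b->device, b->staging, b->size);  /* "host -> device" copy */
    b->bytes_copied += b->size;
}

static void zc_free(zc_buffer *b) {
    free(b->device);
    free(b->staging);
}
```

In this model, filling 64 bytes through a zero copy mapping moves 0 bytes, while the copying variant moves 128 (one copy in each direction).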
As far as I understood the transfer overlap example, they run two kernels to exchange data...
map buffer 1
memset to buffer 1, overlapping with kernel 2
unmap buffer 1
map buffer 2
launch kernel for buffer 1
memset to buffer 2, overlapping with kernel 1
unmap buffer 2
map buffer 1
launch kernel for buffer 2
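The schedule above can be checked with a small C simulation. It is only a sketch under assumptions: map, unmap and launch just flip flags (pp_map, pp_launch and the rest are hypothetical names), and an in-order queue is assumed, so mapping a buffer waits for the kernel that last used it. What it illustrates is that the host only ever writes the buffer no kernel is using, while the kernel on the other buffer may still be running.

```c
#include <assert.h>

/* Toy model of the two-buffer ping-pong schedule. Nothing here calls OpenCL;
 * "map", "unmap" and "launch" only flip flags so the ordering rules can be
 * checked. Names are hypothetical. */
typedef struct {
    int mapped[2];   /* 1 while the host holds a mapped pointer */
    int busy[2];     /* 1 while a kernel using this buffer is in flight */
    int overlaps;    /* host writes made while the other buffer's kernel ran */
    int violations;  /* rule breaks: host touching a buffer a kernel uses */
} pingpong;

static void pp_map(pingpong *s, int b) {
    s->busy[b] = 0;      /* assumed in-order queue: map waits for kernel on b */
    s->mapped[b] = 1;
}

static void pp_unmap(pingpong *s, int b) { s->mapped[b] = 0; }

static void pp_host_write(pingpong *s, int b) {
    if (!s->mapped[b] || s->busy[b])
        s->violations++;
    if (s->busy[1 - b])
        s->overlaps++;   /* CPU work overlaps the other buffer's kernel */
}

static void pp_launch(pingpong *s, int b) {
    if (s->mapped[b])
        s->violations++; /* a mapped buffer must not be passed to a kernel */
    s->busy[b] = 1;
}

/* Runs `iterations` rounds of: write cur, unmap cur, map other, launch cur. */
static pingpong pp_run(int iterations) {
    pingpong s = {{0, 0}, {0, 0}, 0, 0};
    assert(iterations >= 0);
    pp_map(&s, 0);
    for (int i = 0; i < iterations; i++) {
        int cur = i % 2, other = 1 - cur;
        pp_host_write(&s, cur);
        pp_unmap(&s, cur);
        pp_map(&s, other);
        pp_launch(&s, cur);
    }
    return s;
}
```

With 6 rounds the model reports 0 violations and 5 overlapped host writes; the first write has no running kernel to overlap with.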
Because a memory object has to be unmapped to make sure the data is written, I guess it is impossible to transfer data during the kernel run...
I have been looking more closely at zero copy buffers and their features (both device-resident and host-resident zero copy buffers).
Even though the Transfer Overlap sample runs fine on my system, with no problems with the buffer flags, I found that creating a buffer from scratch in another VS project with the flag CL_MEM_USE_PERSISTENT_MEM_AMD results in an error (flag not defined).
Since my laptop configuration should support zero copy buffers (GPU: AMD Radeon HD R9M290X (GCN), Driver Version: 1268.1 (VM), OS: Win7, System Architecture: 64-bit machine), I was convinced the CL_MEM_USE_PERSISTENT_MEM_AMD flag was already defined in OpenCL. I checked the cl.h header file and there is no declaration of it.
So, is it just a matter of declaring a struct and all the flags, including CL_MEM_USE_PERSISTENT_MEM_AMD (as in the Transfer Overlap sample), or am I missing something else (such as paths or other header files used by that sample that have to be included in the VS project for the CL_MEM_USE_PERSISTENT_MEM_AMD flag to be available)?
I guess you've created a separate new thread [Re: CL_MEM_USE_PERSISTENT_MEM_AMD flag implementation] for the same query. To see the responses, please check that thread.
P.S. We usually encourage forum users to create a new thread instead of reviving an old one when a new query/observation is not directly related to the old thread. Thanks for doing so.
Yes, I did, as requested. Sorry for posting the question in this old topic.
I guess you guys want to build a kind of ordered "web encyclopedia" through this forum.
Thanks for the work.
The reason mid-kernel CPU writes do not work is that CPU writes to persistent memory are not GPU cache coherent. That is, if the address happens to be cached in the GPU L1/L2, the cache lines will not be invalidated. It is safe to write to persistent memory only outside kernel execution boundaries, where the associated caches are flushed and invalidated. This is also true for host memory zero copy buffers.
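A toy C model of this effect is sketched below. It is not real hardware behaviour: it keeps just one cached copy per word and only models GPU reads, and toy_system, toy_gpu_load and toy_kernel_boundary are hypothetical names. It shows why a mid-kernel CPU write can go unnoticed: once the GPU has cached a word, it keeps returning the old value until the caches are invalidated at a kernel boundary.

```c
#include <assert.h>
#include <string.h>

/* Toy model of a GPU cache that is not coherent with CPU writes.
 * One cached copy per word stands in for L1/L2; names are hypothetical. */
#define TOY_WORDS 16

typedef struct {
    int memory[TOY_WORDS];  /* persistent / zero copy memory */
    int line[TOY_WORDS];    /* cached copy seen by the GPU */
    int valid[TOY_WORDS];   /* 1 if line[i] holds a cached value */
} toy_system;

static void toy_reset(toy_system *s) { memset(s, 0, sizeof *s); }

/* CPU store goes straight to memory; nothing invalidates the GPU cache. */
static void toy_cpu_write(toy_system *s, int addr, int value) {
    assert(addr >= 0 && addr < TOY_WORDS);
    s->memory[addr] = value;
}

/* Ordinary GPU load: served from the cache once the line is present. */
static int toy_gpu_load(toy_system *s, int addr) {
    assert(addr >= 0 && addr < TOY_WORDS);
    if (!s->valid[addr]) {
        s->line[addr] = s->memory[addr];
        s->valid[addr] = 1;
    }
    return s->line[addr];
}

/* Models a kernel boundary: the cached lines are invalidated. */
static void toy_kernel_boundary(toy_system *s) {
    memset(s->valid, 0, sizeof s->valid);
}
```

Writing 1 from the CPU, loading it on the GPU, and then writing 2 from the CPU still gives 1 on the next GPU load; only after toy_kernel_boundary() does the load return 2.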
There is a way to make CPU writes visible to the GPU mid kernel execution, however it is not for everyone.
The GCN load/store ISA instructions (MUBUF/MTBUF) have a bit called 'SLC', which stands for System Level Coherent. If you turn it on, the GPU will bypass its L1/L2 caches for that specific load/store operation, making it coherent with the CPU.
You can patch your kernel ISA to flip the bit on.
The OpenCL compiler turns it on in a single case: atomic operations that read back the previous value from memory. You can verify this behavior by inspecting the ISA when compiling kernels in CodeXL's analysis mode.
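The effect of the SLC bit can be sketched with the same kind of toy model: a hypothetical slc flag on a load makes it skip the cache and read memory directly, so it observes a CPU write made mid-kernel. This is a model of the behaviour described above, not GCN ISA and not an OpenCL API; slc_system and slc_gpu_load are made-up names.

```c
#include <assert.h>
#include <string.h>

/* Toy model of the SLC bit on a GPU load. With slc set the load skips the
 * cache and reads memory, so it sees CPU writes made mid-kernel. One cached
 * copy per word stands in for L1/L2. Names are hypothetical. */
#define SLC_WORDS 8

typedef struct {
    int memory[SLC_WORDS];
    int cache[SLC_WORDS];
    int valid[SLC_WORDS];
} slc_system;

static void slc_reset(slc_system *s) { memset(s, 0, sizeof *s); }

static void slc_cpu_write(slc_system *s, int addr, int value) {
    assert(addr >= 0 && addr < SLC_WORDS);
    s->memory[addr] = value;   /* does not touch the GPU cache */
}

/* slc == 0: normal cached load; slc == 1: bypass the cache entirely. */
static int slc_gpu_load(slc_system *s, int addr, int slc) {
    assert(addr >= 0 && addr < SLC_WORDS);
    if (slc)
        return s->memory[addr];
    if (!s->valid[addr]) {
        s->cache[addr] = s->memory[addr];
        s->valid[addr] = 1;
    }
    return s->cache[addr];
}
```

After a cached load of 0 and a CPU write of 42, a normal load still returns 0 while a load with slc set returns 42. The trade-off in the model mirrors the real one: every SLC load goes to memory, so it loses the benefit of the cache.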
You can't modify the data during the kernel execution, if that's what you're trying to achieve.
The API won't even allow you to do it:
e.g. you can only write to a mapped buffer while it is mapped, and you can't pass a mapped buffer to clEnqueueNDRangeKernel.
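These two usage rules can be written down as a tiny checker. Here they are modelled as error returns (the RULE_* codes and rule_* functions are hypothetical, not OpenCL API); a real OpenCL implementation is not guaranteed to report such misuse as an error, so this is only a way to state the rules precisely.

```c
#include <assert.h>

/* Toy checker for two usage rules: the host may write only while the buffer
 * is mapped, and a mapped buffer must not be handed to a kernel. The RULE_*
 * codes and rule_* functions are hypothetical, not OpenCL API. */
enum { RULE_OK = 0, RULE_ERR_NOT_MAPPED = -1, RULE_ERR_STILL_MAPPED = -2 };

typedef struct {
    int mapped;  /* 1 between map and unmap */
    int value;   /* stand-in for the buffer contents */
} rule_mem;

static int rule_map(rule_mem *m) {
    assert(m != 0);
    m->mapped = 1;
    return RULE_OK;
}

static int rule_unmap(rule_mem *m) {
    if (!m->mapped)
        return RULE_ERR_NOT_MAPPED;
    m->mapped = 0;
    return RULE_OK;
}

/* Host writes are allowed only through a live mapping. */
static int rule_host_write(rule_mem *m, int v) {
    if (!m->mapped)
        return RULE_ERR_NOT_MAPPED;
    m->value = v;
    return RULE_OK;
}

/* Launching a kernel on a still-mapped buffer is rejected in this model. */
static int rule_enqueue_kernel(const rule_mem *m) {
    return m->mapped ? RULE_ERR_STILL_MAPPED : RULE_OK;
}
```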
Even with zero-copy buffers there are things like caches to flush and synchronise, MMU tables, etc.
Your example is invalid API use, as you cannot access the pointer after calling unmap. Unmap should be treated the same way as free(): once you've called it, that pointer cannot be reused.
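Treating the mapped pointer like freed memory suggests a corrected order for the original steps: wait for the kernel, map again, write y, unmap, and only then launch the kernel that should see y. The C model below (gen_buffer, gb_map and the other gb_* names are hypothetical) tags each mapping with a generation number, so a write through a pointer from an earlier map is rejected. OpenCL does not check this for you; the model just makes the rule visible.

```c
#include <assert.h>

/* Toy model: a mapped pointer is valid only until unmap, much like a pointer
 * after free(). Each map hands out a handle tagged with the buffer's current
 * generation; unmap bumps the generation so stale handles are rejected.
 * All names are hypothetical. */
typedef struct { int data; unsigned gen; int mapped; } gen_buffer;
typedef struct { int *ptr; unsigned gen; } map_handle;

static map_handle gb_map(gen_buffer *b) {
    assert(!b->mapped);
    b->mapped = 1;
    map_handle h = { &b->data, b->gen };
    return h;
}

static void gb_unmap(gen_buffer *b) {
    assert(b->mapped);
    b->mapped = 0;
    b->gen++;                 /* every earlier handle is now stale */
}

/* Returns 0 on success, -1 if the handle is used after its unmap. */
static int gb_write(gen_buffer *b, map_handle h, int v) {
    if (!b->mapped || h.gen != b->gen)
        return -1;
    *h.ptr = v;
    return 0;
}

/* A "kernel" that just reads the buffer; it may only run while unmapped. */
static int gb_kernel_read(const gen_buffer *b) {
    assert(!b->mapped);
    return b->data;
}
```

In the model, writing y through the pointer from the first map fails, while mapping again after the kernel has finished lets the next kernel see y.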
So you can transfer memory whilst another kernel is running, but you cannot modify the memory that the executing kernel is using. And this is a good thing.