Archives Discussions

smatovic · ‎07-11-2012

Heyho,

I'm trying to transfer some data between the host and the OpenCL device while a kernel is running.

Using CL_MEM_USE_PERSISTENT_MEM_AMD, I figured out that I can get a pointer to a buffer.

This works on my CPU but not on the GPU (the program crashes).

Is it even possible to transfer data to the GPU during a kernel run?

My System:

Ubuntu Linux 12.04 64 bit, AMD X4 CPU, AMD HD 7750, Catalyst 12.6

Thanks in advance,

Srdja

notzed · ‎07-12-2012

You can't modify the data during the kernel execution, if that's what you're trying to achieve.

The API won't even allow you to do it:

e.g. you can only write to a mapped buffer while it's mapped, and you can't pass a mapped buffer to clEnqueueNDRangeKernel.

Even with zero-copy buffers there are things like caches to flush and synchronise, MMU tables, and so forth.

Your example is invalid API use, as you cannot access the pointer after calling unmap. Unmap should be treated the same way as free(): once you've called it, that pointer cannot be reused.

So, you can transfer memory whilst another kernel is in progress, but you cannot modify the memory that the executing one is using. And this is a good thing.

smatovic · ‎07-11-2012

Here are my steps:

1. Create a buffer with clCreateBuffer, using the flags CL_MEM_READ_WRITE | CL_MEM_USE_PERSISTENT_MEM_AMD

2. Get a pointer to the buffer with clEnqueueMapBuffer

3. Set the value behind the pointer to x

4. Unmap the buffer with clEnqueueUnmapMemObject

5. Set the kernel args and run the kernel

6. Do some stuff on the host

7. Set the value behind the pointer to y

8. Wait for the kernel run to finish

As I said, step 7 works on the CPU but not on the GPU...

--

Srdja
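
The rule from the accepted answer (the host may write only while the buffer is mapped, and a kernel may use it only while it is unmapped) can be made concrete with a small host-side model. This is only a sketch: `buf_state` and the `buf_*` helpers are hypothetical names, not OpenCL API calls, and the model deliberately simplifies things by treating a map request during a kernel run as illegal instead of letting it wait in the queue.

```c
#include <stdbool.h>

/* Hypothetical model, not part of OpenCL: who may touch the buffer right now. */
typedef enum {
    BUF_IDLE,      /* unmapped and not used by a running kernel */
    BUF_MAPPED,    /* between clEnqueueMapBuffer and clEnqueueUnmapMemObject */
    BUF_IN_KERNEL  /* passed to a kernel that has not finished yet */
} buf_state;

/* Each helper returns true if the operation is legal in the current state. */
static bool buf_map(buf_state *s)
{
    if (*s != BUF_IDLE) return false;
    *s = BUF_MAPPED;
    return true;
}

static bool buf_host_write(const buf_state *s)
{
    return *s == BUF_MAPPED;   /* the mapped pointer is only valid here */
}

static bool buf_unmap(buf_state *s)
{
    if (*s != BUF_MAPPED) return false;
    *s = BUF_IDLE;             /* like free(): the pointer is gone now */
    return true;
}

static bool buf_launch_kernel(buf_state *s)
{
    if (*s != BUF_IDLE) return false;
    *s = BUF_IN_KERNEL;
    return true;
}

static bool buf_wait_kernel(buf_state *s)
{
    if (*s != BUF_IN_KERNEL) return false;
    *s = BUF_IDLE;
    return true;
}

/* Replays the eight steps from the post above. Returns the number of the
 * first step that breaks the rule, or 0 if every step is legal. */
int first_invalid_step(void)
{
    buf_state s = BUF_IDLE;                 /* step 1: buffer created */
    if (!buf_map(&s))           return 2;
    if (!buf_host_write(&s))    return 3;   /* write x */
    if (!buf_unmap(&s))         return 4;
    if (!buf_launch_kernel(&s)) return 5;
    /* step 6: unrelated host work, nothing to check */
    if (!buf_host_write(&s))    return 7;   /* write y through the old pointer */
    if (!buf_wait_kernel(&s))   return 8;
    return 0;
}

/* Same work, reordered: wait for the kernel, then map again before writing y.
 * Returns the position of the first illegal operation, or 0 if all are legal. */
int first_invalid_step_reordered(void)
{
    buf_state s = BUF_IDLE;
    if (!buf_map(&s))           return 1;
    if (!buf_host_write(&s))    return 2;   /* write x */
    if (!buf_unmap(&s))         return 3;
    if (!buf_launch_kernel(&s)) return 4;
    if (!buf_wait_kernel(&s))   return 5;
    if (!buf_map(&s))           return 6;   /* fresh mapping, fresh pointer */
    if (!buf_host_write(&s))    return 7;   /* write y */
    if (!buf_unmap(&s))         return 8;
    return 0;
}
```

In this model `first_invalid_step()` stops at step 7, the write through the already unmapped pointer, while the reordered variant passes because y is written through a new mapping after the kernel has finished.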

Wenju · ‎07-12-2012

Hi Srdja,

I think it's impossible at present to transfer data between the CPU and the GPU during the kernel run. And I think the kernel has already finished before step 7.

nou · ‎07-12-2012

There is a special sample in the SDK which covers this use case, but it needs zero-copy support, which on Linux is supported only for Southern Islands, aka HD 7xxx GPUs.

Wenju · ‎07-12-2012

Hi nou,

Which sample?

nou · ‎07-12-2012

Transfer Overlap

As I understand it, it should work like this:

1. Create the buffers with CL_MEM_USE_PERSISTENT_MEM_AMD.

2. Enqueue a blocking map of the first buffer.

3. Enqueue a kernel that uses the second buffer.

4. clFlush

5. Write to the first buffer.

6. Enqueue the unmap.

7. clFinish

Wenju · ‎07-12-2012

But that doesn't mean the data is transferred during the kernel run.

nou · ‎07-12-2012

With zero-copy buffers, map/unmap returns almost immediately. You write to the mapped buffer with a normal memcpy() or memset().

smatovic · ‎07-12-2012

As far as I understood the Transfer Overlap example, it runs two kernels to exchange data...

map buffer 1

while(..)

{

memset to buffer 1, overlapping with kernel 2

unmap buffer 1

map buffer 2

launch kernel for buffer 1

memset to buffer 2, overlapping with kernel 1

unmap buffer 2

map buffer 1

launch kernel for buffer 2

}

Because a memory object has to be unmapped to ensure the data is written, I guess it is impossible to transfer data during the kernel run...

--

Srdja

binghy · ‎02-06-2015

Hi,

I have been looking into zero-copy buffers and their features (both device-resident and host-resident zero-copy buffers).

Although the Transfer Overlap sample runs fine on my system, with no problems regarding buffer flags, I found that creating a buffer from scratch in another VS project with the flag CL_MEM_USE_PERSISTENT_MEM_AMD results in an error (flag not defined).

Since my laptop configuration should support zero-copy buffers (GPU: AMD Radeon HD R9M290X (GCN), Driver Version: 1268.1 (VM), OS: Win7, System Architecture: 64-bit machine), I firmly believed the CL_MEM_USE_PERSISTENT_MEM_AMD flag was already defined in OpenCL. I checked the cl.h header file and there is no declaration for it.

So, is it just a matter of declaring a struct and all the flags, including CL_MEM_USE_PERSISTENT_MEM_AMD (as in the Transfer Overlap sample), or am I missing something else (like paths or other header files that belong to that sample and have to be included in the VS project to make the CL_MEM_USE_PERSISTENT_MEM_AMD flag available)?

Thanks

Marco
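
A "flag not defined" error for a vendor extension flag usually points to a missing header, not to missing hardware support. As a hedged sketch: AMD's SDK normally declares this flag in the extension header CL/cl_ext.h, not in cl.h, so including that header is usually enough. The fallback definition below is for headers that lack it; its value is an assumption taken from AMD's header and should be checked against the installed SDK before relying on it.

```c
/* Normally this is all that is needed in the project:
 *
 *     #include <CL/cl.h>
 *     #include <CL/cl_ext.h>
 *
 * The guard below only takes effect when the installed headers do not
 * declare the flag. The value (1 << 6) is assumed from AMD's cl_ext.h;
 * verify it against your SDK. */
#ifndef CL_MEM_USE_PERSISTENT_MEM_AMD
#define CL_MEM_USE_PERSISTENT_MEM_AMD (1 << 6)
#endif
```

The flag can then be OR-ed into the `flags` argument of clCreateBuffer, as in step 1 of the original post. Whether the driver honours it still depends on zero-copy support on the given platform.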

dipak · ‎02-09-2015

I guess you've created a separate new thread [Re: CL_MEM_USE_PERSISTENT_MEM_AMD flag implementation] for the same query. Please check that thread for the responses.

P.S. We usually encourage forum users to create a new thread instead of reviving an old one when the new query or observation is not directly related to the old thread. Thanks for doing so.

Regards,

binghy · ‎02-09-2015

Yes, I did, as requested. Sorry for posting the question here in this old topic.

I guess you guys want to build a kind of ordered web encyclopedia through this forum.

Thanks for your work.

Regards

tzachi_cohen · ‎02-09-2015

The reason mid-kernel CPU writes do not work is that CPU writes to persistent memory are not GPU cache coherent. That is, if the address happens to be cached in the GPU L1/L2, the cache lines will not be invalidated. It is safe to write to persistent memory only outside of kernel execution boundaries, where the associated caches are flushed and invalidated. This is also true for host memory zero-copy buffers.

There is a way to make CPU writes visible to the GPU in the middle of kernel execution; however, it is not for everyone.

The GCN load/store ISA instructions (MUBUF/MTBUF) have a bit called 'SLC', which stands for System Level Coherent. If you turn it on, the GPU will bypass its L1/L2 caches for that specific load/store operation, making it coherent with the CPU.

You can patch your kernel ISA to turn the bit on.

The OCL compiler will turn it on in a single case: atomic operations that read back the previous value from memory. You can verify this behavior by inspecting the ISA when compiling kernels in CodeXL analysis mode.

Tzachi
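
To illustrate what "an atomic operation that reads back the previous value" can look like in kernel source, here is a minimal sketch. The kernel name `read_flag_atomic` and the host-side helper are hypothetical. Whether the compiler actually sets the SLC bit for this code is not guaranteed by the sketch; it has to be confirmed in the generated ISA, as described above.

```c
#include <stddef.h>
#include <string.h>

/* OpenCL C source for a hypothetical kernel. atomic_add with 0 leaves
 * *flag unchanged but returns the value found in memory, so it is an
 * atomic with read-back instead of a plain (possibly cached) load. */
static const char *read_flag_kernel_src =
    "__kernel void read_flag_atomic(volatile __global int *flag,\n"
    "                               __global int *out)\n"
    "{\n"
    "    out[get_global_id(0)] = atomic_add(flag, 0);\n"
    "}\n";

/* Source length, as accepted by the lengths argument of
 * clCreateProgramWithSource. */
size_t read_flag_kernel_len(void)
{
    return strlen(read_flag_kernel_src);
}
```

Compiling a kernel like this and comparing its ISA against a version that uses a plain `*flag` load is one way to see whether the SLC bit differs between the two.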

Host<->Device Memory Transfer during Kernel-Run?