Hi, I'm a PhD student building a software router on an APU.
To take advantage of the APU's shared memory between CPU and GPU for parallel packet processing, I also need zero-copy between my app and the NIC. For that, I have to allocate physically contiguous memory (e.g., clSVMAlloc(2MB) -> 2 MB of physically contiguous memory). I tried to find out how clSVMAlloc allocates memory, but it seems to be closed source. How does clSVMAlloc actually map an allocation onto physical memory? Is it allocated contiguously, or the way a normal malloc would do it (one virtual range mapped onto scattered physical pages)? If it isn't contiguous, is there a way to make it contiguous? Any comments or advice would be a great help!
I have white-listed you, so you should be able to post in any of the AMD developer forums directly.
I am moving this post to the OpenCL forum, where you should receive helpful replies.
I can't use OpenCL 2.0 to check this myself, but you can easily do it yourself: a few well-placed printfs in your kernel will tell you whether the chunk is contiguous or not. My guess is that it works like malloc: contiguous for small allocations, scattered for larger ones, where "small" and "large" are measured by your buffer size relative to your total VRAM size.
At the GCN assembly level, you can access any part of the 48(?)-bit memory space without any restrictions (even the memory region of the running program).
On the APU, they may have inserted a transparent layer that remaps GPU memory pages to system memory pages, so that the memory only appears contiguous from the GPU side.
I think they must do something like this; otherwise the GPU could alter the CPU's protected memory.
But I'm only guessing.
The OCL runtime makes no guarantee about the physical contiguity of its allocations. However, if your app can somehow allocate contiguous memory outside of OCL, you can wrap that buffer in an OCL memory object using 'CL_MEM_USE_HOST_PTR'.
The resource will be zero-copy as long as the pointer is 256-byte aligned.
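A minimal sketch of preparing such a host buffer on Linux, assuming POSIX posix_memalign is available; the helper names alloc_zero_copy_host and is_zero_copy_aligned are hypothetical. Note that posix_memalign only provides the 256-byte alignment mentioned above, not physical contiguity, so the contiguous memory itself still has to come from another source. The resulting pointer would then be passed as the host_ptr argument of clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, size, ptr, &err).

```c
#define _POSIX_C_SOURCE 200112L
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Alignment quoted in the reply for zero-copy with CL_MEM_USE_HOST_PTR. */
#define ZC_ALIGN 256

/* Hypothetical helper: allocate `size` bytes aligned to ZC_ALIGN.
 * Returns NULL on failure; release with free().
 * This guarantees alignment only, NOT physical contiguity. */
void *alloc_zero_copy_host(size_t size)
{
    void *p = NULL;
    if (posix_memalign(&p, ZC_ALIGN, size) != 0)
        return NULL;
    return p;
}

/* Hypothetical helper: nonzero if `p` meets the ZC_ALIGN requirement. */
int is_zero_copy_aligned(const void *p)
{
    return ((uintptr_t)p % ZC_ALIGN) == 0;
}
```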
What operating system are you using? On Windows, you can allocate a contiguous 2MB chunk of memory from user space with 'VirtualAlloc' by specifying the large pages flag (MEM_LARGE_PAGES). I'm not sure how to do it on Linux.
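On Linux, a possible analogue is an anonymous huge page mapping: mmap with MAP_HUGETLB returns memory backed by huge pages, and on x86-64 the default 2 MB huge page is a single physically contiguous, 2 MB-aligned frame. A minimal sketch, assuming the default huge page size is 2 MB and that huge pages were reserved beforehand (for example by writing a count to /proc/sys/vm/nr_hugepages as root); alloc_huge_2mb and free_huge_2mb are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Assumption: the system's default huge page size is 2 MB (x86-64). */
#define HUGE_2MB ((size_t)2 << 20)

/* Hypothetical helper: map one huge page. Returns NULL if none is
 * available, e.g. when /proc/sys/vm/nr_hugepages is 0. */
void *alloc_huge_2mb(void)
{
    void *p = mmap(NULL, HUGE_2MB, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    /* Touch the mapping so the reserved huge page is actually faulted in. */
    ((volatile char *)p)[0] = 0;
    return p;
}

/* Hypothetical helper: unmap a page returned by alloc_huge_2mb(). */
void free_huge_2mb(void *p)
{
    if (p != NULL)
        munmap(p, HUGE_2MB);
}
```

Because the mapping is 2 MB-aligned, it also satisfies the 256-byte alignment required for zero-copy with CL_MEM_USE_HOST_PTR.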
Since I don't have access to the driver code behind clSVMAlloc(), I tried a quicker approach: I made several clSVMAlloc() calls in a row, then looked up each allocation's physical address in /proc/self/pagemap to see whether the difference between consecutive physical addresses equals the allocated size. The values match, which I believe means the memory is allocated contiguously.
Yes, I also guessed that contiguous allocation is the natural way to share memory between the CPU and GPU, and after some testing that appears to be the case.
Hi, I'm running Linux (Ubuntu 14.04). I wrote some test code to check whether the allocated memory is physically contiguous: I made several clSVMAlloc() calls in a row, then looked up each allocation's physical address in /proc/self/pagemap to see whether the difference between consecutive physical addresses equals the allocated size. The values match. Normally I would use mmap to map a file into memory to guarantee contiguity, but clSVMAlloc() seems to work fine too. CL_MEM_USE_HOST_PTR should also work, but I wanted to keep using SVM so that I can exploit fine-grained SVM buffers and atomics. Thanks.
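A stricter variant of the pagemap check described above walks every page of one buffer and confirms that each page frame number (PFN) is exactly one greater than the previous, since matching differences between the start addresses of consecutive allocations do not by themselves show that the pages inside each allocation are consecutive. This is a minimal sketch, assuming the pagemap layout from the kernel's pagemap documentation (one 64-bit entry per virtual page, bit 63 = present, bits 0-54 = PFN) and a process with CAP_SYS_ADMIN (e.g. run as root), because kernels since 4.0 hide PFNs from unprivileged processes. The helper names virt_to_phys and is_phys_contiguous are hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define PM_PRESENT  (1ULL << 63)        /* bit 63: page present in RAM */
#define PM_PFN_MASK ((1ULL << 55) - 1)  /* bits 0-54: page frame number */

/* Hypothetical helper: translate virtual address `va` to a physical address
 * using an open /proc/self/pagemap descriptor. Returns 0 if the entry cannot
 * be read, the page is not present, or the PFN is hidden (reported as 0). */
static uint64_t virt_to_phys(int fd, uintptr_t va)
{
    const uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    uint64_t entry;
    off_t off = (off_t)(va / page) * (off_t)sizeof entry;

    if (pread(fd, &entry, sizeof entry, off) != (ssize_t)sizeof entry)
        return 0;
    if (!(entry & PM_PRESENT) || (entry & PM_PFN_MASK) == 0)
        return 0;
    return (entry & PM_PFN_MASK) * page + (va % page);
}

/* Hypothetical helper: returns 1 if every page of [buf, buf + len) is backed
 * by consecutive physical frames, 0 if not, and -1 if it cannot be determined.
 * Each page is written with its own value first so that it is faulted in
 * (the buffer must be writable; its contents are left unchanged). */
int is_phys_contiguous(void *buf, size_t len)
{
    const uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    const uintptr_t lo = (uintptr_t)buf, end = lo + len;
    const uintptr_t base = lo & ~(page - 1);
    uint64_t first = 0;
    int result = 1;

    if (len == 0)
        return -1;
    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;

    for (uintptr_t va = base; va < end; va += page) {
        /* Touch a byte of this page that lies inside the buffer. */
        volatile char *p = (volatile char *)(va < lo ? lo : va);
        *p = *p;

        uint64_t pa = virt_to_phys(fd, va);
        if (pa == 0) {
            result = -1;
            break;
        }
        if (va == base)
            first = pa;
        else if (pa != first + (va - base)) {
            result = 0;
            break;
        }
    }
    close(fd);
    return result;
}
```

To test an SVM allocation, one would pass the pointer returned by clSVMAlloc(context, CL_MEM_READ_WRITE | CL_MEM_SVM_FINE_GRAIN_BUFFER, size, 0) together with the same size; a fine-grained buffer can be touched by the host without clEnqueueSVMMap.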