akhal
Journeyman III

Local, Private and Constant memories in OpenCL

Hello

I am trying OpenCL code on CPUs. I know that global memory is implemented as main memory (RAM) on CPUs and GPUs. But GPUs probably have on-chip local memory that implements the OpenCL "local memory and/or private memory". Is that true?

I also wonder how the constant, local and private OpenCL memories are implemented/handled on CPUs. Is it that constant memory is RAM and local + private memory are the CPU caches?

0 Likes
6 Replies
himanshu_gautam
Grandmaster

Originally posted by: akhal

I am trying OpenCL code on CPUs. I know that global memory is implemented as main memory (RAM) on CPUs and GPUs. But GPUs probably have on-chip local memory that implements the OpenCL "local memory and/or private memory". Is that true?

Yes, GPUs have an on-chip LDS (Local Data Share), which serves as OpenCL local memory (from 5xxx devices onward), and a separate private memory bank with each compute unit.

Originally posted by: akhal

I also wonder how the constant, local and private OpenCL memories are implemented/handled on CPUs. Is it that constant memory is RAM and local + private memory are the CPU caches?

I can't recall the answer, but you may be able to find it in other threads. Also, this may change from one vendor's implementation to another, so it is not something to rely on. If caches are used you will get a better speed-up, but otherwise your kernel will still run, though somewhat more slowly.

 

0 Likes

Thank you, but what about CPUs? How do they implement OpenCL local and private memory?

0 Likes

And what about OpenCL constant memory? How is it implemented on GPUs and CPUs?

0 Likes

Originally posted by: akhal

Thank you, but what about CPUs? How do they implement OpenCL local and private memory?

AFAIK, private memory is register memory in both cases.

On a CPU, local memory is emulated using the cache.
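
To picture what that could look like, here is a minimal sketch in plain C of how a CPU runtime might execute one work-group. It is my own illustration, not any vendor's actual implementation. The names GLOBAL_MEM, LOCAL_MEM, GROUP_SIZE, run_work_group and reverse_in_groups are all hypothetical. The assumptions are that the address-space qualifiers expand to nothing on a CPU, that "local" memory is an ordinary small array (which normally stays resident in cache), that "private" variables are automatic variables the compiler can keep in registers, and that a barrier is handled by splitting the kernel body into one loop before it and one loop after it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the OpenCL address-space qualifiers.
   Assumption: on a CPU they can expand to nothing, because every
   address space ends up in ordinary memory. */
#define GLOBAL_MEM
#define LOCAL_MEM

enum { GROUP_SIZE = 4 };  /* assumed work-group size for this sketch */

/* Emulates one work-group of a kernel that reverses elements within
   its group. The kernel body being emulated is roughly:
       scratch[lid] = in[gid];
       barrier();
       out[gid] = scratch[GROUP_SIZE - 1 - lid];                      */
static void run_work_group(GLOBAL_MEM const float *in,
                           GLOBAL_MEM float *out,
                           size_t group_id)
{
    LOCAL_MEM float scratch[GROUP_SIZE];  /* "local" memory: a plain array */
    size_t base = group_id * GROUP_SIZE;

    /* Code before the barrier, executed for every work-item. */
    for (size_t lid = 0; lid < GROUP_SIZE; ++lid) {
        float v = in[base + lid];  /* "private": automatic variable */
        scratch[lid] = v;
    }

    /* The barrier sits between the two loops: all writes to scratch
       are finished before any work-item reads from it. */
    for (size_t lid = 0; lid < GROUP_SIZE; ++lid) {
        out[base + lid] = scratch[GROUP_SIZE - 1 - lid];
    }
}

/* Runs the whole index range, one work-group after another. */
void reverse_in_groups(const float *in, float *out, size_t n)
{
    assert(n % GROUP_SIZE == 0);  /* sketch only handles whole groups */
    for (size_t g = 0; g < n / GROUP_SIZE; ++g)
        run_work_group(in, out, g);
}
```

Calling reverse_in_groups on the eight values 0..7 reverses each group of four separately. Nothing in it touches special hardware: scratch is just memory, and whether it lives in L1, L2 or RAM at any moment is up to the cache hierarchy.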

0 Likes

As I see it, on a CPU private memory is registers or L1 cache, local memory is L2 or L3 cache (depending on the architecture), and global/constant memory is RAM. But constant memory is roughly as fast and as small as local memory, so it might be stored in some cache.

The Bulldozer design is even more OpenCL-friendly: the L2 cache, which will probably hold local memory data, is way bigger than anything we have so far, at 2MB per module. Plus, the L3 cache has 2MB per module and can be seen by all modules (like global memory), so I can smell some nice optimizations here (a bigger constant memory space?). Depending on how much data you have in global memory, it would be nice if we could just bypass RAM access once the entire data set is copied to the L3 cache. Wait and see.

0 Likes

Aren't caches on a CPU fully transparent? IMHO all constant, local, private and global memory is allocated in normal RAM.

But using local memory can improve cache hits, since it reuses the same memory.
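
A rough C sketch of that reuse effect, under my own assumptions: the function sum3_tiled and the tile size TILE are hypothetical, and the 3-point sum is just an example workload. The same small buffer is reused for every tile, the way a runtime might reuse one local-memory allocation for every work-group, so its addresses are touched over and over and tend to stay in cache. Whether this beats reading the input directly depends on the access pattern, since the staging copy has its own cost.

```c
#include <stddef.h>
#include <string.h>

enum { TILE = 64 };  /* assumed tile size, small enough to sit in L1 */

/* out[i] = in[i-1] + in[i] + in[i+1], with out-of-range neighbours
   treated as 0. Each tile is staged into a small scratch buffer. */
void sum3_tiled(const float *in, float *out, size_t n)
{
    float tile[TILE + 2];  /* same addresses reused for every tile */

    for (size_t start = 0; start < n; start += TILE) {
        size_t len = (n - start < TILE) ? (n - start) : TILE;

        /* Stage the tile plus one halo element on each side. */
        tile[0] = (start > 0) ? in[start - 1] : 0.0f;
        memcpy(&tile[1], &in[start], len * sizeof(float));
        tile[len + 1] = (start + len < n) ? in[start + len] : 0.0f;

        /* Compute from the staged copy only. */
        for (size_t i = 0; i < len; ++i)
            out[start + i] = tile[i] + tile[i + 1] + tile[i + 2];
    }
}
```

The buffer is ordinary stack memory in RAM. It only behaves like fast local memory because the cache keeps it hot, which fits the idea that the caches stay transparent.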

0 Likes