kaatish
Journeyman III

Indirect memory access read on GPU

Hi,

I want a small lookup table (a 256-byte array) that is the same for every work-item. The table will be read very frequently: depending on the data a work-item reads, it must read a particular entry of the table. The access pattern is therefore random, and the compiler cannot know at compile time which entry will be read.

What is the most efficient way to do this? Would it work to put this array in texture memory so that it is cached?

Regards.

1 Solution
pesh
Adept I

You can also try __constant memory (it is the right memory type for your needs). This memory is cached, and since you only use 256 bytes, you will most likely get good cache hit rates and good performance. There are also no bank conflicts when work-items read from the same address.
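A minimal sketch of the __constant variant, assuming the table is passed to the kernel as a 256-byte buffer argument. The kernel and argument names are hypothetical. The #ifndef block is not part of the technique: it only supplies plain-C stand-ins for the OpenCL qualifiers and get_global_id so the same file can be compiled on the host for a quick check; an OpenCL compiler defines __OPENCL_VERSION__ and skips it.

```c
#ifndef __OPENCL_VERSION__
/* Plain-C stand-ins (assumption, host-side checking only). */
#include <assert.h>  /* used only by the host-side check */
#include <stddef.h>
typedef unsigned char uchar;
#define __kernel
#define __global
#define __constant const
static size_t sim_global_id; /* which work-item the host is simulating */
static size_t get_global_id(unsigned int dim) { (void)dim; return sim_global_id; }
#endif

/* Each work-item looks up one table entry chosen by its input byte. */
__kernel void lut_lookup_constant(__global const uchar *in,
                                  __global uchar *out,
                                  __constant uchar *lut) /* 256 entries */
{
    size_t gid = get_global_id(0);
    /* Data-dependent index: unknown at compile time, but a uchar
       can never exceed 255, so it always stays inside the table. */
    out[gid] = lut[in[gid]];
}
```

On the host, the table could be uploaded once with clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, 256, table, &err) and bound to the lut argument with clSetKernelArg (ctx, table and err are placeholder names). On some GPUs, constant-cache reads are fastest when all work-items in a wavefront read the same address, so a data-dependent index is exactly the case worth measuring.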

__local memory is a good choice as well, but you will need to initialize it in each work-group, which can cost some time.
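A minimal sketch of the __local variant, assuming the table starts in a __global buffer (kernel and argument names are hypothetical). Each work-group copies the 256 bytes into local memory, waits at a barrier, then does the data-dependent lookup; the copy is the per-work-group initialization cost mentioned above. As in any OpenCL kernel, every work-item in the group must reach the barrier. The #ifndef block only supplies plain-C stand-ins so the file compiles on the host; they simulate work-groups of size 1, where the barrier is trivially satisfied.

```c
#ifndef __OPENCL_VERSION__
/* Plain-C stand-ins (assumption, host-side checking only). */
#include <assert.h>  /* used only by the host-side check */
#include <stddef.h>
typedef unsigned char uchar;
#define __kernel
#define __global
#define __local static
#define CLK_LOCAL_MEM_FENCE 0
/* Serial simulation with work-groups of size 1, so the barrier is a no-op. */
static size_t sim_global_id;
static size_t get_global_id(unsigned int dim) { (void)dim; return sim_global_id; }
static size_t get_local_id(unsigned int dim) { (void)dim; return 0; }
static size_t get_local_size(unsigned int dim) { (void)dim; return 1; }
static void barrier(int flags) { (void)flags; }
#endif

#define LUT_SIZE 256

__kernel void lut_lookup_local(__global const uchar *in,
                               __global uchar *out,
                               __global const uchar *lut_global)
{
    __local uchar lut[LUT_SIZE];   /* one copy per work-group */
    size_t lid = get_local_id(0);
    size_t lsize = get_local_size(0);

    /* Cooperative copy: work-item lid loads entries lid, lid+lsize, ... */
    for (size_t i = lid; i < LUT_SIZE; i += lsize)
        lut[i] = lut_global[i];

    /* Wait until the whole table is in local memory before reading it. */
    barrier(CLK_LOCAL_MEM_FENCE);

    size_t gid = get_global_id(0);
    out[gid] = lut[in[gid]];
}
```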

I think you need to implement both variants and choose the one with better performance.

4 Replies
Wenju
Elite

Hi kaatish

I think using local memory is the best way.


Hi Wenju,

I think the problem with local memory is that bank/memory conflicts are quite likely when two work-items access the same word. Since the access pattern is unpredictable, I would have no way to avoid those conflicts.
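Whether reads from the table can conflict depends on how the table maps onto banks. The toy model below, in plain C, assumes a local memory of 32 banks, each serving one 4-byte word per pass, and assumes that lanes reading the same word are served by a single broadcast. Real bank counts, widths, and broadcast rules vary by GPU, so this is an illustration rather than a description of any particular AMD part; the function name lds_read_passes is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BANKS  32  /* assumption: bank count of the local memory */
#define BANK_BYTES 4   /* assumption: each bank serves one 4-byte word */

/* Toy model: number of serialized passes for one local-memory read in which
   lane k reads byte idx[k] of a 256-byte table. Each pass serves one word per
   bank; lanes that ask for the same word are assumed to share it (broadcast). */
static int lds_read_passes(const uint8_t *idx, size_t lanes)
{
    uint64_t words_in_bank[NUM_BANKS] = {0}; /* bit w set: word w requested */
    assert(idx != NULL || lanes == 0);

    for (size_t k = 0; k < lanes; k++) {
        unsigned word = idx[k] / BANK_BYTES;  /* 0..63 for a 256-byte table */
        words_in_bank[word % NUM_BANKS] |= UINT64_C(1) << word;
    }

    int passes = 0;
    for (int b = 0; b < NUM_BANKS; b++) {
        int distinct = 0;
        for (uint64_t m = words_in_bank[b]; m != 0; m &= m - 1)
            distinct++;  /* each iteration clears the lowest set bit */
        if (distinct > passes)
            passes = distinct;
    }
    return passes;
}
```

Under these assumptions a 256-byte table covers only 64 words, so each bank holds at most two of them, and any read pattern, however random, needs at most two passes; lanes hitting the same word cost nothing extra.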

Does texture memory give good performance with a random access pattern?

Wenju
Elite

Hi kaatish,

Don't worry about local memory reads; bank/memory conflicts occur only on writes.
