I am using a lookup table (LUT) in my kernel and just ran into a question: how do GPU cores access a common memory location such as a LUT?
Consider this scenario: I have a lookup table LUT located somewhere in global memory, and during execution my threads repeatedly access it to look up some result (let's say through 2D indexing, LUT[i][j]).

Now the question is: what happens if 100 cores (computational units) access this LUT at the same moment? Will the cores have to wait in a queue or something like that, since they all refer to the same memory object? If they don't have to wait in line, how does the GPU memory controller handle this? In particular, if the 100 cores refer to 100 different elements of the LUT, how does the GPU's address decoder unit work? Is there any special architecture inside the GPU that allows the cores to share the same memory bus?
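For illustration only, here is a minimal CUDA sketch of the access pattern described above. It assumes a float LUT stored row-major as a flat array in global memory, so LUT[i][j] becomes lut[i * LUT_COLS + j]. The names lutLookupKernel, lookupOnDevice, checkCuda, LUT_ROWS and LUT_COLS, as well as the 4x4 table size, are hypothetical. The sketch only reproduces the "many threads read one table" situation; it does not show how the hardware services those reads.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical LUT shape, chosen only for illustration.
constexpr int LUT_ROWS = 4;
constexpr int LUT_COLS = 4;

// Abort with a message if a CUDA runtime call fails.
static void checkCuda(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::abort();
    }
}

// Each thread reads one entry of the shared, read-only LUT in global memory.
// LUT[i][j] is stored row-major, so it lives at lut[i * LUT_COLS + j].
// Indices are assumed to be in range; no bounds checking is done here.
__global__ void lutLookupKernel(const float* __restrict__ lut,
                                const int* __restrict__ rows,
                                const int* __restrict__ cols,
                                float* __restrict__ out, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        out[tid] = lut[rows[tid] * LUT_COLS + cols[tid]];
    }
}

// Host helper: uploads the LUT and per-thread indices, launches one thread
// per lookup, and copies the results back.
std::vector<float> lookupOnDevice(const std::vector<float>& lut,
                                  const std::vector<int>& rows,
                                  const std::vector<int>& cols) {
    assert(lut.size() == static_cast<size_t>(LUT_ROWS * LUT_COLS));
    assert(rows.size() == cols.size());
    const int n = static_cast<int>(rows.size());
    if (n == 0) return {};  // a launch with zero blocks would be invalid

    float *dLut = nullptr, *dOut = nullptr;
    int *dRows = nullptr, *dCols = nullptr;
    checkCuda(cudaMalloc(&dLut, lut.size() * sizeof(float)), "cudaMalloc lut");
    checkCuda(cudaMalloc(&dRows, n * sizeof(int)), "cudaMalloc rows");
    checkCuda(cudaMalloc(&dCols, n * sizeof(int)), "cudaMalloc cols");
    checkCuda(cudaMalloc(&dOut, n * sizeof(float)), "cudaMalloc out");
    checkCuda(cudaMemcpy(dLut, lut.data(), lut.size() * sizeof(float),
                         cudaMemcpyHostToDevice), "copy lut");
    checkCuda(cudaMemcpy(dRows, rows.data(), n * sizeof(int),
                         cudaMemcpyHostToDevice), "copy rows");
    checkCuda(cudaMemcpy(dCols, cols.data(), n * sizeof(int),
                         cudaMemcpyHostToDevice), "copy cols");

    const int threadsPerBlock = 128;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    lutLookupKernel<<<blocks, threadsPerBlock>>>(dLut, dRows, dCols, dOut, n);
    checkCuda(cudaGetLastError(), "kernel launch");

    std::vector<float> out(n);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    checkCuda(cudaMemcpy(out.data(), dOut, n * sizeof(float),
                         cudaMemcpyDeviceToHost), "copy out");

    cudaFree(dLut);
    cudaFree(dRows);
    cudaFree(dCols);
    cudaFree(dOut);
    return out;
}
```

Calling lookupOnDevice with 100 identical (row, col) pairs models the case where all threads read the same LUT element at once; passing 100 distinct pairs models the case where they read 100 different elements.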