By chance, I created a device buffer for my kernel but never accessed it.
Creating this unused buffer changed the performance of my kernels.
The buffer was small.
Could this be a driver bug? I am using the latest Crimson driver on Windows 7 64-bit.
Update: it looks like what matters is the number of buffers allocated. If I keep allocating 1-byte buffers that are never used, performance oscillates.
Add one buffer and performance goes up; add a second and it drops back to where it was before; add a third and it goes up again; add a fourth and it drops back, and so on.
Very odd.
It is probably due to the virtual addresses assigned to these one-byte buffers. In one case, the virtual addresses cause the buffers to be distributed evenly across the GPU memory channels; in the other, some buffers are assigned to the same memory channel, and accesses to them are serialized.
This behavior may differ from one GPU to another, as each GPU has a different number of channels and a slightly different memory configuration.
Tzachi
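To make the channel explanation concrete, here is a minimal toy model in Python. It assumes a simple linear address-to-channel mapping, 2 channels, a 256-byte interleave and allocation granularity, a bump allocator, and that the unused 1-byte buffers are allocated between two buffers the kernel reads together. All of these parameters and names are hypothetical; real GCN chips use their own, more complex mappings. Under these assumptions, the model reproduces the alternating pattern described in the update.

```python
# Toy model of GPU memory-channel interleaving. All parameters are
# hypothetical; real GCN address-to-channel mappings differ per chip.
INTERLEAVE = 256    # bytes per channel stripe (assumed)
NUM_CHANNELS = 2    # number of memory channels (assumed)
ALIGN = 256         # allocation granularity of the toy allocator (assumed)


def channel_of(addr: int) -> int:
    """Map a byte address to a memory channel using simple linear interleaving."""
    return (addr // INTERLEAVE) % NUM_CHANNELS


class BumpAllocator:
    """Hands out consecutive addresses, rounding each size up to ALIGN."""

    def __init__(self) -> None:
        self.next_addr = 0

    def alloc(self, size: int) -> int:
        addr = self.next_addr
        self.next_addr += -(-size // ALIGN) * ALIGN  # round size up to ALIGN
        return addr


def hot_buffers_collide(num_dummies: int, hot_size: int = 512) -> bool:
    """Allocate hot buffer A, then unused 1-byte dummies, then hot buffer B.

    Returns True when A and B start on the same channel, so a kernel that
    reads both at the same offset would hit one channel for both reads.
    """
    heap = BumpAllocator()
    a = heap.alloc(hot_size)
    for _ in range(num_dummies):
        heap.alloc(1)  # never accessed by the kernel
    b = heap.alloc(hot_size)
    return channel_of(a) == channel_of(b)


if __name__ == "__main__":
    for n in range(5):
        state = "same channel" if hot_buffers_collide(n) else "different channels"
        print(f"{n} dummy buffer(s): {state}")
```

In this model, each extra 256-byte-rounded dummy shifts buffer B by one interleave unit, so B flips between A's channel and the other channel with every buffer added.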
Tzachi, does this apply to a whole GPU family, i.e. will all Cape Verde GPUs exhibit this behaviour?
Also, since it was so easy for me to change performance dramatically from userland, wouldn't it make sense to modify the driver so that it better manages the mapping of virtual memory addresses to GPU memory channels?
Is this topic discussed anywhere, so I can learn more?
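The idea of having the driver place buffers more carefully can be illustrated with a hypothetical channel-aware placement policy: pad each allocation so that consecutive buffers start on consecutive channels. This is only a sketch under toy assumptions (linear mapping, 4 channels, 256-byte interleave). It is not how the AMD driver actually allocates memory, and `staggered_bases` is an illustrative name.

```python
from __future__ import annotations

# Hypothetical channel-aware placement. Toy parameters, all assumed.
INTERLEAVE = 256    # bytes per channel stripe (assumed)
NUM_CHANNELS = 4    # number of memory channels (assumed)


def channel_of(addr: int) -> int:
    """Linear interleaving: consecutive 256-byte units rotate across channels."""
    return (addr // INTERLEAVE) % NUM_CHANNELS


def staggered_bases(sizes: list[int]) -> list[int]:
    """Return base addresses so buffer i starts on channel i % NUM_CHANNELS."""
    bases = []
    addr = 0  # first free byte after the previous buffer
    for i, size in enumerate(sizes):
        target = i % NUM_CHANNELS
        unit = -(-addr // INTERLEAVE)            # round up to an interleave boundary
        unit += (target - unit) % NUM_CHANNELS   # skip forward to the target channel
        base = unit * INTERLEAVE
        bases.append(base)
        addr = base + size
    return bases


if __name__ == "__main__":
    bases = staggered_bases([1024] * 4)
    print(bases)                           # [0, 1280, 2560, 3840]
    print([channel_of(b) for b in bases])  # [0, 1, 2, 3]
```

With four 1024-byte buffers, a plain bump allocator would start every buffer on channel 0 in this model (bases 0, 1024, 2048, 3072). The staggered layout spreads them across all four channels. The cost is up to `NUM_CHANNELS - 1` interleave units of padding per buffer, and a real allocator would also need to know which buffers a kernel accesses together.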
Yes, you can read all about memory optimization in AMD's OpenCL optimization guide. This affects all GCN cards. Usually, the more recent the card, the more channels it has, which helps avoid collisions.
Thanks. I will be upgrading to a 300- or 400-series card next year, so hopefully this channel issue will be less of a problem. I am just happy that I stumbled on this performance improvement (+20%).