Will I benefit from caching in local memory if my filter is only 3x3? Or even smaller - if I only access the X values in this pattern:
O X O
X O X
Using local memory will definitely help, as it reduces the number of global fetches by up to a factor of 4, i.e. n + 4 (at most) vs. 4n fetches, where n is the group size.
You can also use images, which fetch data through a 2D L1 texture cache, so that helps with spatially local fetches such as yours.
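To put rough numbers on the fetch counts, here is a small Python sketch (plain Python, not OpenCL) that counts reads for one work-group with and without a shared tile. It assumes a square w x w work-group and a four-neighbour cross stencil, because the pattern in the question is only partly shown; `count_fetches` and `CROSS` are names made up for this illustration. Under these assumptions the tile needs n + 4*sqrt(n) elements rather than exactly n + 4, so the exact halo depends on the group shape.

```python
def count_fetches(w, offsets):
    """Count global-memory reads for one w x w work-group.

    Returns (direct, cached):
      direct - reads when every work-item loads its own neighbours,
      cached - distinct elements the group touches, i.e. the reads
               needed to fill a local-memory tile once.
    """
    touched = set()
    direct = 0
    for y in range(w):
        for x in range(w):
            for dy, dx in offsets:
                touched.add((y + dy, x + dx))
                direct += 1
    return direct, len(touched)


# Assumed stencil: the four direct neighbours (up, left, right, down).
CROSS = [(-1, 0), (0, -1), (0, 1), (1, 0)]

for w in (4, 8, 16):
    direct, cached = count_fetches(w, CROSS)
    print(f"group {w}x{w}: direct={direct} cached={cached} "
          f"ratio={direct / cached:.2f}")
```

With this stencil the ratio is 2.00 for a 4x4 group and 3.20 for a 16x16 group, so the saving only approaches the factor of 4 as the group gets larger.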
Thanks. This raises another question:
Is it reasonable, then, to use both images and local memory caching?
I don't think both can be used together.
Use LDS in the case of linear access; it is slower only than the internal registers when fetching.
With images we prefer unordered 2D accesses, where the texture cache is more suitable.