I think using local memory is the best way.
I think the problem with local memory is that bank conflicts are quite likely when two work-items access the same word. Since the access pattern is unpredictable, I would not be able to avoid the conflicts.
Does texture memory give good performance with random access patterns?
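Don't worry about local memory reads. When several work-items read the same word, the hardware typically broadcasts it, so there is no conflict. Bank conflicts come from accesses to different words that fall in the same bank.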
You can also try __constant memory (it is the right memory type for your needs). This memory is cacheable, and if you only use 256 bytes you will most likely get good cache hit rates and good performance. There is also no bank conflict when work-items read from the same address.
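Something like this is what I mean, as a minimal sketch only. The kernel name, the argument names and the assumption that the table holds 256 one-byte entries indexed by the input bytes are all made up for illustration:

```c
// Hypothetical kernel: names and the 256 x uchar table layout are assumptions.
__kernel void lookup_constant(__constant uchar *lut,      // 256-byte read-only table
                              __global const uchar *in,   // input bytes, used as indices
                              __global uchar *out,
                              const uint n)
{
    size_t gid = get_global_id(0);
    if (gid < n) {
        // Data-dependent index; a uchar is always in the range 0..255.
        out[gid] = lut[in[gid]];
    }
}
```

On the host side the table would go into an ordinary read-only buffer passed as the first kernel argument.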
__local memory is a good choice as well, but you will need to initialize it in each work-group, which can cost you some time.
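A rough sketch of the __local variant, to show where the initialization cost comes from. Again the names and the 256-byte table size are assumptions, not something taken from your code:

```c
// Hypothetical kernel: names and the 256 x uchar table layout are assumptions.
__kernel void lookup_local(__global const uchar *lut_global, // table in global memory
                           __global const uchar *in,
                           __global uchar *out,
                           const uint n)
{
    __local uchar lut[256];

    // Every work-group copies the table: each work-item loads a strided share.
    for (size_t i = get_local_id(0); i < 256; i += get_local_size(0)) {
        lut[i] = lut_global[i];
    }
    // All work-items reach this barrier, so the full table is visible before use.
    barrier(CLK_LOCAL_MEM_FENCE);

    size_t gid = get_global_id(0);
    if (gid < n) {
        out[gid] = lut[in[gid]];
    }
}
```

The copy loop and the barrier are the per-work-group overhead. With larger work-groups each work-item copies fewer bytes, so the cost depends on your launch configuration.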
I think you need to implement both variants and choose the one that performs better.