There isn't really a "design pattern" for its use: it depends entirely on the problem being solved.
But there are some rules of thumb for when it's useful:
- when you can share data between threads
- when you need to access the same data often (i.e. a cache)
- when you can use it to re-arrange memory accesses to be 'memory friendly' (when they wouldn't otherwise be)
e.g. an 'algorithm friendly' work-group topology might not be 'memory friendly', but sometimes you can split the operation into two parts: a memory-friendly part which gathers data into local store, and an algorithm-friendly part which works on that data. Even if the algorithm only ever reads each element once, the staging step can still be a significant win, because the global reads happen in a coalesced order.
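A minimal sketch of that split, in OpenCL C. The kernel and buffer names are made up for illustration, and the "reversed read" just stands in for whatever scattered access pattern your algorithm actually wants:

```c
// Hypothetical example: each work-group cooperatively stages a tile of the
// input into local memory in a coalesced (memory-friendly) order, then the
// algorithm-friendly phase reads it from local store in its own order.
__kernel void tiled_op(__global const float *in,
                       __global float *out,
                       __local  float *tile)   // one float per work-item,
                                               // sized via clSetKernelArg
{
    const int lid   = get_local_id(0);
    const int lsz   = get_local_size(0);
    const int gbase = get_group_id(0) * lsz;

    // Memory-friendly phase: adjacent work-items load adjacent elements,
    // so the global reads coalesce.
    tile[lid] = in[gbase + lid];
    barrier(CLK_LOCAL_MEM_FENCE);   // everyone's load lands before any read

    // Algorithm-friendly phase: read the staged data in whatever order the
    // algorithm needs; local-memory reads are cheap. A reversed read here
    // would have been a scattered (uncoalesced) pattern against global memory.
    out[gbase + lid] = tile[lsz - 1 - lid];
}
```

The barrier is the crucial bit: without it, a work-item could read a tile slot before the work-item responsible for it has written it.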
IMHO local memory is the real key feature that makes OpenCL on GPUs worth it, but I find it one of the more challenging features to use effectively.
Without some code to look at, it's hard to say whether local memory would help with your particular problem. But unless it's a simple element-by-element array operation, the answer is: probably yes.