Global vs. local memory in OpenCL

I'm wondering what the main difference between these two types of memory is.

There are some OpenCL examples that could clarify this a bit (though they don't, at least not for me). For instance, the matrix transpose and matrix multiplication samples use local memory, whereas the Sobel filter and matrix convolution samples work without any local memory.

I know that fetching data from local memory is probably faster than fetching it from global memory, but copying data from global to local memory also takes time, doesn't it?
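One rough way to weigh the copy cost is to count global reads for an N x N matrix product computed by T x T work-groups. This assumes T divides N and counts only global reads, ignoring local-memory reads and hardware caches. Naively, each of the N^2 outputs reads a row of A and a column of B. With tiling, each of the (N/T)^2 work-groups copies N/T pairs of T x T tiles:

```latex
\begin{aligned}
R_{\text{naive}} &= N^2 \cdot 2N = 2N^3 \\
R_{\text{tiled}} &= \left(\tfrac{N}{T}\right)^2 \cdot \tfrac{N}{T} \cdot 2T^2 = \frac{2N^3}{T} \\
\frac{R_{\text{naive}}}{R_{\text{tiled}}} &= T
\end{aligned}
```

Under these assumptions, the one-time copy is amortized because each copied element is reused T times from local memory. With no reuse (T = 1 in this model), the copy saves nothing.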

So far I assume that local memory is more efficient when the user has to deal with a lot of data, whereas global memory is more effective when the user operates on each element individually and additionally needs access to that element's neighbourhood. Am I right?

Thanks in advance for any response.