Does loading data from shared memory reduce the latency of fetching data?
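Yes. Shared memory is on-chip, so a fetch from it has much lower latency than a fetch from off-chip global memory.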
Between a shared memory fetch and a global memory fetch, which one costs the least?
(In CUDA, loading data from shared memory can be nearly as fast as reading a register, provided there are no bank conflicts.)
A shared memory fetch costs the least.
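
To illustrate why this matters, here is a minimal sketch of the usual pattern: each block copies its slice of global memory into shared memory once, then every thread reuses neighbouring values from the shared copy instead of going back to global memory. The kernel name `sum3`, the `TILE` size of 256, and the all-ones test input are assumptions made for this example only, and the actual speedup depends on the GPU and on how much reuse the kernel has.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 256  // threads per block (an assumed value for this sketch)

// Each output is the sum of three neighbouring inputs. Every input value is
// read from global memory once per block and then reused from shared memory.
__global__ void sum3(const float *in, float *out, int n)
{
    // The block's TILE elements plus one halo cell on each side.
    __shared__ float tile[TILE + 2];

    int g = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    int s = threadIdx.x + 1;                        // tile index, shifted past the left halo

    // One global memory fetch per thread; out-of-range cells are padded with 0.
    tile[s] = (g < n) ? in[g] : 0.0f;
    if (threadIdx.x == 0)
        tile[0] = (g > 0 && g - 1 < n) ? in[g - 1] : 0.0f;
    if (threadIdx.x == blockDim.x - 1)
        tile[s + 1] = (g + 1 < n) ? in[g + 1] : 0.0f;

    // Wait until the whole tile is loaded before any thread reads from it.
    __syncthreads();

    // Three shared memory fetches instead of three global memory fetches.
    if (g < n)
        out[g] = tile[s - 1] + tile[s] + tile[s + 1];
}

int main()
{
    const int n = 1000;
    float *in = nullptr, *out = nullptr;

    // Managed memory keeps the host code short; cudaMalloc plus cudaMemcpy
    // would work just as well.
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    int blocks = (n + TILE - 1) / TILE;
    sum3<<<blocks, TILE>>>(in, out, n);  // blockDim.x must equal TILE
    cudaDeviceSynchronize();

    printf("out[0] = %f, out[1] = %f\n", out[0], out[1]);

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

The `__syncthreads()` call is what makes the shared copy safe to read: without it, a thread could fetch a neighbour's cell before that neighbour has written it.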