My code had problems with a couple of offsets. Now it runs fine.
Sorry I couldn't test it before, but I had the build problem with -g and printf, and CodeXL doesn't show local variables.
BTW the LDS avg implementation runs a bit slower than the global one. Since LDS is a precious resource, I'm dropping the LDS version.
Thanks for your time.