I changed the code to print the buffer that is being processed, and these are the findings:
With an HD 7750 (1GB), on Linux with 5GB of RAM and Catalyst 12.6: the clWrite and clMap versions both fail at 120 buffers (with error codes -5 and -12), which is about 2GB of memory, but there is no obvious slowdown before the failure.
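For anyone reading the raw numbers: in the standard CL/cl.h header, -5 is CL_OUT_OF_RESOURCES and -12 is CL_MAP_FAILURE. Below is a minimal sketch of a helper that prints the name next to the buffer index. The function names cl_error_name and report_failure are hypothetical, not taken from the test in question, and only a handful of codes are covered.

```c
#include <stdio.h>

/* Hypothetical helper (not from the original test): translate a few
   OpenCL status codes into the names used in CL/cl.h. */
const char *cl_error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -12: return "CL_MAP_FAILURE";
    default:  return "UNKNOWN_CL_ERROR";
    }
}

/* Hypothetical reporting routine: log which buffer failed and why. */
void report_failure(int buffer_index, int err)
{
    fprintf(stderr, "buffer %d failed: %d (%s)\n",
            buffer_index, err, cl_error_name(err));
}
```

Calling report_failure(120, -5) right after the failing enqueue call would make the log show the error name as well as the number.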
Nice to hear you have added some performance metrics to that test. Could you please share it in your GitHub repo (linked somewhere above)?
It is interesting to see what happens on an HD 7750 + 5GB RAM machine. But Catalyst 12.6 is quite old. Can you check with the 13.6 beta there? It would also be useful if you could test on Windows. Performance is expected to be better on Windows.