I have a seemingly simple problem, but the solution does not appear to be trivial.
I have more than 1 gigabyte of data (uints in an array). I apply a filter to them and write the result to another array.
I am doing this on an AMD Devastator GPU using OpenCL. According to clinfo, the max memory allocation on Devastator is 200540160 bytes, although the global memory size is 536870912 bytes. For the CPU device, clinfo shows a max memory allocation of 4146315264 bytes and a global memory size of 16585261056 bytes. So on the CPU side I have plenty of memory for my data.
The problem is that I cannot allocate a buffer (clCreateBuffer) with more than 32 million elements, which is to be expected given the max memory allocation on Devastator. How can I handle arrays of more than a gigabyte, or even a terabyte, of elements? Where and how should I store these elements (in CPU memory or in GPU memory), and how can the GPU access them?
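To make the question concrete, here is a minimal sketch of the kind of host-side loop I imagine: keep the full array in CPU memory and hand the device one chunk at a time. The names process_in_chunks and filter_chunk are hypothetical, and filter_chunk is only a CPU stand-in (it doubles each value) for the real steps, which I assume would be clEnqueueWriteBuffer, clEnqueueNDRangeKernel, and clEnqueueReadBuffer on a buffer no larger than one chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the real filter. On the GPU this step would be
   a write of the input chunk (clEnqueueWriteBuffer), a kernel launch
   (clEnqueueNDRangeKernel), and a read of the output chunk
   (clEnqueueReadBuffer). Here it simply doubles each value on the CPU. */
static void filter_chunk(const uint32_t *in, uint32_t *out, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        out[i] = in[i] * 2u;
}

/* Walk the host array in pieces of at most max_chunk_elems elements, so
   that only one chunk has to fit in a device buffer at any time.
   Returns the number of chunks processed. */
static size_t process_in_chunks(const uint32_t *in, uint32_t *out,
                                size_t total, size_t max_chunk_elems)
{
    assert(max_chunk_elems > 0);
    size_t chunks = 0;
    for (size_t offset = 0; offset < total; offset += max_chunk_elems) {
        size_t remaining = total - offset;
        size_t count = remaining < max_chunk_elems ? remaining : max_chunk_elems;
        filter_chunk(in + offset, out + offset, count);
        ++chunks;
    }
    return chunks;
}
```

With this shape the device buffers would be created once at the chunk size and reused for every iteration. Whether this is the right approach, or whether there is a better way to let the GPU reach data that lives in host memory, is exactly what I am unsure about.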
The problem gets worse in another scenario, where the output array must hold 10 times (or more) as many elements as the input array, because the number of input elements I can process at a time then drops by a factor of 4. The number of input elements I can send is now limited to 2 million.
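For reference, this is a rough sketch of how I would estimate an upper bound on the input chunk size from the clinfo figures alone. The helper max_input_elems is hypothetical, and it assumes the only constraint is that the output buffer must fit within the max memory allocation; the limits I actually hit in practice are lower than this bound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Upper bound on the number of input elements per chunk, assuming the
   output buffer (out_per_in output elements for every input element)
   must fit in a single allocation of max_alloc_bytes. Other allocations
   and runtime overhead are ignored, so the usable value may be smaller. */
static size_t max_input_elems(uint64_t max_alloc_bytes, size_t elem_size,
                              size_t out_per_in)
{
    assert(elem_size > 0 && out_per_in > 0);
    return (size_t)(max_alloc_bytes / elem_size / out_per_in);
}
```

For example, max_input_elems(200540160, sizeof(uint32_t), 10) would give the bound for the 10x output case on Devastator under these assumptions.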
Any suggestions to solve these problems?