I'm running a 5870 and can't allocate more than 20 inputs of float4 data alongside 1 float4 output.
The domain size is 1024x1024.
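So that should be 4 bytes/float and 4 floats per input or output element, i.e. 16 bytes per element per buffer.
With 24 inputs + 1 output = 25 buffers, that is 25*4*4 = 400 bytes per domain element.
Now, the domain size is 1024*1024, so the total is 1024*1024*400 = 419,430,400 bytes (400 MiB).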
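For anyone who wants to double-check the arithmetic, here is a quick sketch in plain Python. It has nothing to do with the actual runtime or driver; the function and constant names are made up for illustration, and it assumes each buffer is a tightly packed float4 array with no padding or alignment overhead.

```python
# Back-of-the-envelope memory footprint for the buffers described above.
# Assumption: buffers are tightly packed (no pitch/alignment padding).
BYTES_PER_FLOAT = 4
FLOATS_PER_ELEMENT = 4  # each element is a float4


def footprint_bytes(num_inputs, num_outputs, width, height):
    """Total bytes for all input and output buffers over the domain."""
    buffers = num_inputs + num_outputs
    bytes_per_domain_element = buffers * FLOATS_PER_ELEMENT * BYTES_PER_FLOAT
    return width * height * bytes_per_domain_element


total = footprint_bytes(num_inputs=24, num_outputs=1, width=1024, height=1024)
print(total, "bytes =", total / 2**20, "MiB")
```

This only confirms the raw data size; whatever the driver actually reserves per buffer could be larger.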
There is 1GB on the card so what's the problem? Am I missing something here?
The same kernel runs fine on the 4870 with Catalyst 9.4.
Note that I am currently running Catalyst 9.10.
I will also try the 4870 with Catalyst 9.10 and edit this post if needed.