I have a problem with AMD Stream.
I am writing an image-processing program and want to use AMD Stream to accelerate it. Because the code is optimized for the CPU, all my data is stored in large 1D arrays (more than 8192 elements, but fewer than 8192x8192). Is there any reasonable way to copy such an array into a 2D stream without any conversion? If that is impossible, could AMD implement such operations, for example copy 1D to 2D and 2D to 1D? I know it is not so hard at the low level, because I have used OpenGL for this purpose many times. I also noticed that readStream and writeStream only work synchronously. Could AMD implement asynchronous versions of these commands, for example readStreamAsync and writeStreamAsync?
You can copy a 1D array into a 2D stream without any problem. The Brook+ data transfer methods read/write work like a CPU memcpy and do a byte-by-byte copy.
You can also declare 1D streams of size > 8192. Brook+ virtualizes larger 1D streams with smaller 2D GPU buffers internally, but this has some performance overhead.
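As a rough illustration of what "works like memcpy" means for the layout, here is a minimal plain-C sketch. It does not call the Brook+ API at all; copy_1d_to_2d and at_2d are hypothetical helper names, and the row-major 2D buffer is an assumption standing in for the stream's storage.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a Brook+ call): copy n floats from a 1D array
   into a row-major width x height buffer with a single flat memcpy. */
static void copy_1d_to_2d(float *dst2d, size_t width, size_t height,
                          const float *src1d, size_t n)
{
    assert(n <= width * height);  /* the 2D buffer must be large enough */
    memcpy(dst2d, src1d, n * sizeof(float));
}

/* After the flat copy, 1D element i is found at row i / width,
   column i % width of the 2D view. */
static float at_2d(const float *buf2d, size_t width, size_t row, size_t col)
{
    return buf2d[row * width + col];
}
```

For example, with width 4 and height 3, element 9 of the 1D array is read back as at_2d(buf, 4, 2, 1), so no per-element conversion is needed.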
Thanks, I will try, but what size should I use? Does ATI Stream use the non_power_of_two extension for samplers? I know from my OpenGL experience that the GPU processes such samplers more slowly than power-of-two ones, due to the memory and cache structure of AMD GPUs.
For larger 1D streams (> 8192), Brook+ tries to use GPU buffers whose width is a power of 2. Otherwise, the GPU buffer is allocated with the same size as the stream.
Brook+ allows non-power-of-2 streams but, as you said, they are expected to be slower.
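To illustrate the idea of backing a long 1D stream with a 2D buffer of power-of-2 width, here is a minimal C sketch of the index arithmetic. It is not Brook+'s actual implementation: the fixed width of 8192 and the names VIRT_WIDTH, VIRT_SHIFT, virt_height and virt_coord are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed for illustration: a fixed power-of-2 width of 8192 = 2^13. */
#define VIRT_WIDTH 8192u
#define VIRT_SHIFT 13u

typedef struct {
    size_t row;
    size_t col;
} Coord2D;

/* Number of rows needed to hold n elements (rounded up). */
static size_t virt_height(size_t n)
{
    assert(n > 0 && n <= (size_t)VIRT_WIDTH * VIRT_WIDTH);
    return (n + VIRT_WIDTH - 1) / VIRT_WIDTH;
}

/* Map a 1D index to a 2D coordinate. Because the width is a power of 2,
   the division and modulo reduce to a shift and a mask. */
static Coord2D virt_coord(size_t i)
{
    Coord2D c;
    c.row = i >> VIRT_SHIFT;
    c.col = i & (VIRT_WIDTH - 1u);
    return c;
}
```

With a width that is not a power of 2, the same mapping needs a real division and modulo per access, which is one plausible source of the extra cost mentioned above.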
Thanks for the reply. I will try!
It is a little bit hard to migrate from OpenGL to ATI Stream 🙂
Would it be easier to just redo the representation into 2D, even on the CPU side? Just curious.
For the CPU part it can reduce cache efficiency! That is why I use a 1D array!
Since you control how you iterate through your data, it does not matter to the CPU: a 2D array is also stored linearly in memory. You only have to access the elements of your array in the appropriate order.
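A small C sketch of that point, assuming a row-major layout (the function names sum_1d and sum_2d_rowmajor are made up for this example): iterating rows in the outer loop and columns in the inner loop touches exactly the same addresses, in the same order, as a flat 1D loop.

```c
#include <assert.h>
#include <stddef.h>

/* Flat traversal of n elements. */
static float sum_1d(const float *a, size_t n)
{
    float s = 0.0f;
    assert(a != NULL);
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* 2D traversal of the same memory: the row is the outer loop and the
   column is the inner loop, so the address row * width + col increases
   by one element per step, just like the flat loop above. */
static float sum_2d_rowmajor(const float *a, size_t width, size_t height)
{
    float s = 0.0f;
    assert(a != NULL);
    for (size_t row = 0; row < height; row++)
        for (size_t col = 0; col < width; col++)
            s += a[row * width + col];
    return s;
}
```

Swapping the two loops (columns outside, rows inside) would stride through memory by width elements per step, and that is the access pattern that tends to hurt cache efficiency.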