I am performing image compression.
The image I is broken up into K code blocks {Bi}.
Each block has fixed size MxN pixels.
Each block is independently compressed.
All compressed blocks {Ci}, with compressed sizes {Pi}, are stored in a linear buffer B of size K * S, where S is a fixed slot size greater than every Pi (not to be confused with the block width M above). Block Ci therefore starts at offset i * S in B.
Now I would like to pack buffer B into buffer C, getting rid of the empty space at the end of each compressed code block Ci.
So, I need a kernel that will:
- for each block Ci, compute the sum of all Pk for k < i (call this offset_i), i.e. an exclusive prefix sum over the sizes
- copy the data for each Ci (Pi in size) from B into C, starting at offset_i
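To make the two steps concrete, here is a minimal serial sketch in C++ of the result I am after. It is only a CPU reference, not the kernel itself. The function name `pack_blocks` and the parameter `slot_size` (the fixed per-block slot size in B) are names I made up for illustration, and I am assuming all sizes are measured in bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Serial reference for the packing step (hypothetical helper, not a GPU kernel).
// B holds K fixed-size slots of slot_size bytes; sizes[i] is Pi (assumed in bytes).
std::vector<std::uint8_t> pack_blocks(const std::vector<std::uint8_t>& B,
                                      const std::vector<std::size_t>& sizes,
                                      std::size_t slot_size) {
    const std::size_t K = sizes.size();
    assert(B.size() >= K * slot_size);

    // Step 1: exclusive prefix sum, offsets[i] = sum of sizes[k] for k < i.
    std::vector<std::size_t> offsets(K);
    std::size_t total = 0;
    for (std::size_t i = 0; i < K; ++i) {
        assert(sizes[i] <= slot_size);  // every Pi must fit in its slot
        offsets[i] = total;
        total += sizes[i];
    }

    // Step 2: copy each Ci from its slot in B to offsets[i] in the packed buffer C.
    std::vector<std::uint8_t> C(total);
    for (std::size_t i = 0; i < K; ++i) {
        if (sizes[i] > 0) {
            std::memcpy(C.data() + offsets[i], B.data() + i * slot_size, sizes[i]);
        }
    }
    return C;
}
```

The copy loop has no dependencies between blocks, so I assume it maps directly onto parallel work items once the offsets are known. The offsets loop is the sequential part that I do not know how to express efficiently in a kernel.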
Any ideas on how to do this would be greatly appreciated!!