Could you post it in a more readable format? I had a hard time reading it. Maybe you can mail it to the e-mail address mentioned in my profile.
What is your system configuration? I have recently seen some issues with scatter on Vista.
Vista x86 SP1, Business Edition.
Catalyst 9.2 (because newer ones can't handle big streams).
Radeon HD4870 GPU.
This is a standalone sample that produces the same error:
1+1=0 ?? (On the CAL backend; the CPU backend computes correctly.)
>& g1=gpu_temp_coadd_old->domain(begin, end_old);
>& g2=gpu_temp_coadd->domain(begin, end);
//R: coadd block end
Originally posted by: MicahVillmow
Raistmer, try using something like pastebin (http://www.pastebin.com) to paste your code and provide a link. It allows for much easier reading than pasting code onto the forum directly.
For the case when size is two, it seems that you are writing to only the first two rows of the output, while in the host code you are reading back only the last row, which is going to be uninitialized. That's why you see zeros.
Some basics on Brook+ kernels, in case you don't know them already:
instance().x gives the column number, which takes values from 0 to size-1.
dest[threadID][ i ] means you are writing on row threadID and column i of dst. That would mean that you are writing sub-matrix from (0,0) to (1,1) of dst.
In the host code, you are reading from the last row of both the src and dst streams. As you can guess, the last row of the dst stream was not updated inside the kernel.
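To make the mismatch concrete, here is a minimal CPU-side sketch in plain C++ (not Brook+ code). It assumes the write pattern described above, dest[threadID][i] with both indices running from 0 to size-1, and uses hypothetical dst dimensions passed in as rows and cols. The names emulate_scatter and read_last_row are made up for illustration. The untouched elements are initialized to 0 here only so the result is deterministic; on the GPU they would simply be uninitialized.

```cpp
#include <cassert>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// CPU stand-in for the scatter pattern described in the post.
// Each "thread" threadID in [0, size) writes dest[threadID][i] for i in [0, size).
// rows and cols are hypothetical dst dimensions; only the top-left
// size x size block is written, everything else keeps its initial 0.
Matrix emulate_scatter(int rows, int cols, int size) {
    assert(size >= 0 && rows >= size && cols >= size);
    Matrix dst(rows, std::vector<float>(cols, 0.0f));
    for (int threadID = 0; threadID < size; ++threadID) {
        for (int i = 0; i < size; ++i) {
            dst[threadID][i] = 1.0f + 1.0f;  // the "1+1" from the sample
        }
    }
    return dst;
}

// What the host sees if it reads back only the last row of dst.
std::vector<float> read_last_row(const Matrix& dst) {
    assert(!dst.empty());
    return dst.back();
}
```

With, say, 6 rows, 2 columns and size 2, rows 0 and 1 hold 2 while the last row is never written, which would match the "1+1=0" symptom if the real dst is taller than size.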
Originally posted by: gaurav.garg
Some basics on Brook+ kernel, not sure if you know already -
instance().x gives the column number that is going to give value from 0 to size-1.
No, it is the first output stream that defines the domain of execution. So, in your case it is size * 3.
The column-row relationship is actually similar. The width (column number) is the first index in instance(), in the domain operator, and in the stream dimension pointer. You just need to take care with stream indexing, which is similar to C-style indexing.
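A small plain C++ sketch of that convention, as I read it: dimensions and positions list the width/column first, while element access is C-style, row first. Dims, Index2D and linear_offset are hypothetical helper names, not part of the Brook+ API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper types, not part of the Brook+ API.
struct Dims    { unsigned width; unsigned height; };  // width listed first
struct Index2D { unsigned x; unsigned y; };           // x = column, y = row

// C-style indexing: element [row][col] = [y][x] sits at row * width + col
// in a row-major buffer.
std::size_t linear_offset(Dims d, Index2D p) {
    assert(p.x < d.width && p.y < d.height);
    return static_cast<std::size_t>(p.y) * d.width + p.x;
}
```

So a position with x = 1, y = 2 in a stream of width 4 corresponds to stream[2][1], the tenth element in row-major order.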
Yes, you need to use the domain of execution. Regarding performance, I guess you would still see bad performance with a 2D non-128-bit scatter stream.
You need to change your kernel to use a 128-bit 1D scatter stream with size < 8192 to get better performance.
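As a rough host-side illustration of what that layout change means, the plain C++ sketch below packs scalar floats into 128-bit elements (four 32-bit floats each) and checks the resulting 1D length against the 8192 bound mentioned above. Float4 here is a stand-in struct, not the Brook+ float4 type, and pack_to_float4 and fits_1d_scatter are hypothetical names; the kernel itself would still have to be rewritten to scatter to such a stream.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a 128-bit element: four 32-bit floats.
struct Float4 { float v[4]; };
static_assert(sizeof(Float4) == 16, "Float4 should be 128 bits wide");

// Bound taken from the post: 1D scatter stream size should be < 8192.
constexpr std::size_t kMax1DElements = 8192;

// Pack scalar floats into 128-bit elements, zero-padding the last one.
std::vector<Float4> pack_to_float4(const std::vector<float>& src) {
    std::vector<Float4> out((src.size() + 3) / 4);  // value-initialized to 0
    for (std::size_t i = 0; i < src.size(); ++i) {
        out[i / 4].v[i % 4] = src[i];
    }
    return out;
}

// True if a 1D stream with this many elements stays under the bound.
bool fits_1d_scatter(std::size_t elements) {
    return elements < kMax1DElements;
}
```

For example, 10 floats pack into 3 such elements, with the last two lanes of the third element left as padding.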