Could you post it in a more readable format? I had a hard time reading it. Maybe you can mail it to the e-mail address mentioned in my profile.
Originally posted by: gaurav.garg
Could you post it in a more readable format? I had a hard time reading it. Maybe you can mail it to the e-mail address mentioned in my profile.
What is your system configuration? I have recently seen some issues with scatter on Vista.
Originally posted by: gaurav.garg
What is your system configuration? I have recently seen some issues with scatter on Vista.
Vista x86 SP1, Business Edition.
Catalyst 9.2 (because newer ones can't handle big streams).
Radeon HD4870 GPU.
This is a standalone sample that produces the same error:
1+1=0?? (On the CAL backend; the CPU backend computes correctly.)
int main(){
    unsigned int buf_size[2];
    unsigned int thread_num_coadd=3;
    buf_size[0]=4;
    buf_size[1]=thread_num_coadd;
    brook::Stream<float>* gpu_temp_coadd_old=NULL;
    brook::Stream<float>* gpu_temp_coadd=new brook::Stream<float>(2,buf_size);
    buf_size[0]=2;
    float cpu_temp[3][4];
    for(int i=0;i<thread_num_coadd;i++)
        for(int j=0;j<4;j++)
            cpu_temp
    gpu_temp_coadd->read(cpu_temp);
    int temp_coadd_working_length[]={2,2,2};
    brook::Stream<int> *gpu_temp_coadd_working_length=NULL;
#if 1
    fprintf(stderr,"buf_size(coadd loop) is (%u,%u)\n",buf_size[0],buf_size[1]);
#endif
    {
        if(gpu_temp_coadd_old) delete gpu_temp_coadd_old;
        gpu_temp_coadd_old=gpu_temp_coadd;
        gpu_temp_coadd=new brook::Stream<float>(2,buf_size);
        if(gpu_temp_coadd_working_length) delete gpu_temp_coadd_working_length;
        gpu_temp_coadd_working_length=new brook::Stream<int>(1,&thread_num_coadd);
        gpu_temp_coadd_working_length->read(temp_coadd_working_length);
        GPU_coadd_kernel3(*gpu_temp_coadd_old,*gpu_temp_coadd_working_length,*gpu_temp_coadd);
#if 1
        gpu_temp_coadd->finish();
#endif
        if(gpu_temp_coadd->error())
            fprintf(stderr,"ERROR: GPU_coadd_kernel3(coadd loop): %s\n",gpu_temp_coadd->errorLog());
#if 1
        if(true){
            float t1[4096];
            float t2[4096];
            float ta[3*4096];
            fprintf(stderr,"ARRAYS just after coadd:\n");
            unsigned int begin[]={0,2};
            unsigned int end[]={2,3};
            unsigned int end_old[]={2*2,3};
            brook::Stream<float>& g1=gpu_temp_coadd_old->domain(begin, end_old);
            g1.write(t1);
            if(g1.error()) fprintf(stderr,"ERROR: g1:%s\n",g1.errorLog());
            brook::Stream<float>& g2=gpu_temp_coadd->domain(begin, end);
            g2.write(t2);
            if(g2.error()) fprintf(stderr,"ERROR: g2:%s\n",g2.errorLog());
            g2.write(ta);
            if(g2.error()) fprintf(stderr,"ERROR: g2->ta:%s\n",g2.errorLog());
            for(int i=0;i<2;i++){
                fprintf(stderr,"Old[%d]=%.9g,old[%d]=%.9g,new[%d]=%.9g\n",2*i,t1[2*i],2*i+1,t1[2*i+1],i,t2[i]);
            }
            for(int i=0;i<2;i++){
                fprintf(stderr,"Old[%d]=%.9g,old[%d]=%.9g,new[%d]=%.9g\n",2*i,t1[2*i],2*i+1,t1[2*i+1],i,t2[i]);
            }
        }
#endif
    }
    //R: coadd block end
}
---------------
Originally posted by: MicahVillmow
Raistmer,
Try using something like pastebin (http://www.pastebin.com) to paste your code and provide a link. It allows for much easier reading than pasting code onto the forum directly.
For the case when size is two, it seems that you are writing only to the first two rows of the output, while in host code you are reading back only the last row, which is going to be uninitialized. That's why you see zeros.
Some basics on Brook+ kernels, in case you don't know them already:
instance().x gives the column number, which takes values from 0 to size-1.
dest[threadID][ i ] means you are writing to row threadID and column i of dst. That means you are writing the sub-matrix from (0,0) to (1,1) of dst.
In host code, you are reading from the last row of both the src and dst streams. As you can guess, the last row of the dst stream was not updated inside the kernel.
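To make the row/column picture concrete, here is a small host-only C++ sketch of the situation; it does not use Brook+ at all. It is written under assumptions: the kernel source is not in this thread, so the pairwise add (dst[t][i] = src[t][2i] + src[t][2i+1]) is only a guess based on the "1+1" in the report, and `Grid`, `simulate_scatter` and `read_row` are names made up for illustration. Zero-filling dst stands in for "never written"; real GPU memory could hold anything there.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only stand-in for a 2D stream: rows x cols floats, stored row-major.
struct Grid {
    std::size_t rows, cols;
    std::vector<float> data;
    Grid(std::size_t r, std::size_t c, float fill = 0.0f)
        : rows(r), cols(c), data(r * c, fill) {}
    float& at(std::size_t row, std::size_t col) {
        assert(row < rows && col < cols);
        return data[row * cols + col];
    }
};

// Assumed kernel behaviour (hypothetical): each of `size` threads writes
// dst[thread_id][i] for i in 0..size-1, so only rows 0..size-1 are touched.
void simulate_scatter(Grid& src, Grid& dst, std::size_t size) {
    for (std::size_t thread_id = 0; thread_id < size; ++thread_id)
        for (std::size_t i = 0; i < size; ++i)
            dst.at(thread_id, i) =
                src.at(thread_id, 2 * i) + src.at(thread_id, 2 * i + 1);
}

// Mimics reading back a single row, like a domain that selects only that row.
std::vector<float> read_row(Grid& g, std::size_t row) {
    std::vector<float> out;
    for (std::size_t col = 0; col < g.cols; ++col)
        out.push_back(g.at(row, col));
    return out;
}
```

With a 3x4 src filled with ones, a 3x2 dst and size = 2, rows 0 and 1 of dst come back as 2 while row 2 (the one selected by begin = {0,2}, end = {2,3}) is still at its initial value.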
Originally posted by: gaurav.garg
Some basics on Brook+ kernels, in case you don't know them already:
instance().x gives the column number, which takes values from 0 to size-1.
No, it is the first output stream that defines the domain of execution. So, in your case it is size * 3.
The column-row relationship is actually similar. The width (number of columns) is the first index in instance(), in the domain operator, and in the stream dimension pointer. You just need to take care with stream indexing, which is similar to C-style indexing.
Yes, you need to use the domain of execution. Regarding performance, I guess you would still see bad performance with a 2D non-128-bit scatter stream.
You need to change your kernel to use a 128-bit 1D scatter stream with size < 8192 to get better performance.
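A tiny host-side C++ sketch of that ordering, with no Brook+ calls involved: the dimension array is width-first, while a C array is indexed row-first. The helper name `linear_index` is made up for illustration, and the row-major layout is the usual C one, assumed here to match what the host buffer looks like.

```cpp
#include <cassert>
#include <cstddef>

// dims follows the width-first convention: dims[0] = width (columns),
// dims[1] = height (rows). A matching C array is declared float a[height][width]
// and element (x, y) is a[y][x].
std::size_t linear_index(const unsigned int dims[2], unsigned int x, unsigned int y) {
    assert(x < dims[0] && y < dims[1]);
    return static_cast<std::size_t>(y) * dims[0] + x;
}
```

For dims = {4, 3}, as in the sample above, column 1 of row 2 is a[2][1] in C terms and sits at offset 2*4 + 1 = 9 from the start of the buffer.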
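As a rough host-side illustration of what "128-bit 1D" means for the data layout, the sketch below packs plain floats into four-float elements and checks the element count against the limit mentioned above. It is plain C++, not Brook+ code: `Float4`, `pack_float4`, `fits_1d_scatter` and `kMax1DElements` are hypothetical names, and the 8192 figure is simply the one quoted in this post.

```cpp
#include <cstddef>
#include <vector>

// Four 32-bit floats = one 128-bit element.
struct Float4 { float x, y, z, w; };
static_assert(sizeof(Float4) == 4 * sizeof(float), "Float4 must be tightly packed");

// Limit taken from the post: the 1D scatter stream should have size < 8192.
const std::size_t kMax1DElements = 8192;

// Packs floats into Float4 elements, zero-padding the last one if needed.
std::vector<Float4> pack_float4(const std::vector<float>& in) {
    std::vector<Float4> out((in.size() + 3) / 4, Float4{0.0f, 0.0f, 0.0f, 0.0f});
    for (std::size_t i = 0; i < in.size(); ++i) {
        Float4& e = out[i / 4];
        switch (i % 4) {
            case 0: e.x = in[i]; break;
            case 1: e.y = in[i]; break;
            case 2: e.z = in[i]; break;
            default: e.w = in[i]; break;
        }
    }
    return out;
}

// True if n_floats, packed four per element, stays under the quoted limit.
bool fits_1d_scatter(std::size_t n_floats) {
    return (n_floats + 3) / 4 < kMax1DElements;
}
```

Six floats, for example, pack into two elements with the last two lanes left at zero; the kernel would then need to index elements and lanes rather than rows and columns.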