15 Replies Latest reply on Jul 1, 2009 11:03 AM by gaurav.garg

    Kernel can't add 4 numbers, please help!

    Raistmer
        • Kernel can't add 4 numbers, please help!
          gaurav.garg

          Could you post it in a more readable format? I had a hard time reading it. Maybe you can mail it to the e-mail address mentioned in my profile.

            • Kernel can't add 4 numbers, please help!
              Raistmer
              Originally posted by: gaurav.garg

              Could you post it in a more readable format? I had a hard time reading it. Maybe you can mail it to the e-mail address mentioned in my profile.


              Thanks for the offer, will do right now!

              [
              About posting in a more readable format: I edited the post many times, fighting with [ i ] being treated as italic and with < being treated as I don't even know what; it just eats the end of the line....
              If AMD representatives think that this forum engine is just right for developers, I understand why AMD still has no compiler of its own, nor more-or-less decent performance libraries....
              ]
                • Kernel can't add 4 numbers, please help!
                  gaurav.garg

                  What is your system configuration? I have recently seen some issues with scatter on Vista.

                   

                    • Kernel can't add 4 numbers, please help!
                      Raistmer

                       

                      Originally posted by: gaurav.garg What is your system configuration? I have recently seen some issues with scatter on Vista.

                       

                      Vista x86 SP1, Business Edition.

                      Catalyst 9.2 (because newer ones can't handle big streams).

                      Radeon HD4870 GPU.

                       

                        • Kernel can't add 4 numbers, please help!
                          Raistmer

                          This is a standalone sample that produces the same error:

                          1+1=0 ?? (On the CAL backend; the CPU backend computes correctly.)

                          // Note: the Brook+ runtime header and the generated header that declares
                          // GPU_coadd_kernel3 are not shown, as in the original post.
                          #include <cstdio>

                          int main(){
                              unsigned int buf_size[2];
                              unsigned int thread_num_coadd=3;
                              buf_size[0]=4;
                              buf_size[1]=thread_num_coadd;
                              brook::Stream<float>* gpu_temp_coadd_old=NULL;
                              brook::Stream<float>* gpu_temp_coadd=new brook::Stream<float>(2,buf_size);
                              buf_size[0]=2;

                              // Fill the 3x4 host array with ones and copy it into the stream.
                              float cpu_temp[3][4];
                              for(int i=0;i<thread_num_coadd;i++)
                                  for(int j=0;j<4;j++)
                                      cpu_temp[i][j]=1.0f;
                              gpu_temp_coadd->read(cpu_temp);

                              int temp_coadd_working_length[]={2,2,2};
                              brook::Stream<int> *gpu_temp_coadd_working_length=NULL;
                          #if 1
                              fprintf(stderr,"buf_size(coadd loop) is (%u,%u)\n",buf_size[0],buf_size[1]);
                          #endif
                              {
                                  // The previous output becomes the input; allocate a new output stream.
                                  if(gpu_temp_coadd_old)delete gpu_temp_coadd_old;
                                  gpu_temp_coadd_old=gpu_temp_coadd;
                                  gpu_temp_coadd=new brook::Stream<float>(2,buf_size);

                                  if(gpu_temp_coadd_working_length) delete gpu_temp_coadd_working_length;
                                  gpu_temp_coadd_working_length=new brook::Stream<int>(1,&thread_num_coadd);
                                  gpu_temp_coadd_working_length->read(temp_coadd_working_length);

                                  GPU_coadd_kernel3(*gpu_temp_coadd_old,*gpu_temp_coadd_working_length,*gpu_temp_coadd);
                          #if 1
                                  gpu_temp_coadd->finish();
                          #endif
                                  if(gpu_temp_coadd->error())
                                      fprintf(stderr,"ERROR: GPU_coadd_kernel3(coadd loop): %s\n",gpu_temp_coadd->errorLog());
                          #if 1
                                  if(true){
                                      float t1[4096];
                                      float t2[4096];
                                      float ta[3*4096];
                                      fprintf(stderr,"ARRAYS just after coadd:\n");

                                      // Both domains select row 2 only, i.e. the last row of each stream.
                                      unsigned int begin[]={0,2};
                                      unsigned int end[]={2,3};
                                      unsigned int end_old[]={2*2,3};

                                      brook::Stream<float>& g1=gpu_temp_coadd_old->domain(begin, end_old);
                                      g1.write(t1);
                                      if(g1.error())fprintf(stderr,"ERROR: g1:%s\n",g1.errorLog());

                                      brook::Stream<float>& g2=gpu_temp_coadd->domain(begin, end);
                                      g2.write(t2);
                                      if(g2.error())fprintf(stderr,"ERROR: g2:%s\n",g2.errorLog());
                                      g2.write(ta);
                                      if(g2.error())fprintf(stderr,"ERROR: g2->ta:%s\n",g2.errorLog());

                                      for(int i=0;i<2;i++){
                                          fprintf(stderr,"Old[%d]=%.9g,old[%d]=%.9g,new[%d]=%.9g\n",
                                                  2*i,t1[2*i],2*i+1,t1[2*i+1],i,t2[i]);
                                      }
                                      for(int i=0;i<2;i++){
                                          fprintf(stderr,"Old[%d]=%.9g,old[%d]=%.9g,new[%d]=%.9g\n",
                                                  2*i,t1[2*i],2*i+1,t1[2*i+1],i,t2[i]);
                                      }
                                  }
                          #endif
                              }
                              //R: coadd block end
                          }

                          ---------------

                  • Kernel can't add 4 numbers, please help!
                    MicahVillmow
                    Raistmer,
                    Try using something like pastebin(http://www.pastebin.com) to paste your code and provide a link. It allows for much easier reading than pasting code onto the forum directly.
                      • Kernel can't add 4 numbers, please help!
                        Raistmer
                        Originally posted by: MicahVillmow

                        Raistmer,

                        Try using something like pastebin(http://www.pastebin.com) to paste your code and provide a link. It allows for much easier reading than pasting code onto the forum directly.


                        OK, I will, because I need help with my own problem with the ATI Stream SDK (for now it looks like a fresh bug under Vista).
                        But the natural extension of such advice would be "try using other boards, and then try using another vendor's products"... Unfortunately, I have already bought 2 Radeons; I will think twice next time....
                          • Kernel can't add 4 numbers, please help!
                            Raistmer
                            Link to a standalone test case that shows the same problem (CAL backend, Vista; no problems on the CPU backend, Win2003x64).
                            1+1=0 according to the CAL version ;)

                            http://pastebin.com/meaaf6ed

                              • Kernel can't add 4 numbers, please help!
                                Raistmer
                                Possible workaround
                                (see the comments on the size variable):
                                http://pastebin.com/m4b983c48
                                  • Kernel can't add 4 numbers, please help!
                                    gaurav.garg

                                    When size is two, it seems that you are writing to only the first two rows of the output, while the host code reads back only the last row, which is left uninitialized. That's why you see zeros.

                                    Some basics on Brook+ kernels, in case you don't know them already:

                                    instance().x gives the column number, which ranges from 0 to size-1.

                                    dest[threadID][ i ] means you are writing to row threadID and column i of dst. That means you are writing the sub-matrix from (0,0) to (1,1) of dst.

                                    In the host code, you are reading from the last row of both the src and dst streams. As you can guess, the last row of the dst stream was not updated inside the kernel.

                                      • Kernel can't add 4 numbers, please help!
                                        Raistmer
                                        Originally posted by: gaurav.garg

                                        Some basics on Brook+ kernels, in case you don't know them already:


                                        instance().x gives the column number, which ranges from 0 to size-1.


                                        The column <-> row relation seems reversed in kernel code relative to host code.
                                        I use a 1D stream as the ordinary stream that defines the domain of execution, right?
                                        It should have only the x dimension greater than 1; the y dimension should be 1, correct?
                                        Are you suggesting that if I use instance().y I will receive the correct result in my case?


                                        This leads to a big question:
                                        What defines the domain of execution for a kernel like this?
                                        kernel a(float b[][],int c<>,out float d[][]);
                                        I thought the size of stream c would define how many invocations of this kernel are run.
                                        Should I use the dimensions of stream d instead to determine how many kernel invocations will be launched?