14 Replies Latest reply on Jun 9, 2009 2:27 PM by dabrunhosa

    Problems with Kernel Code

    dabrunhosa

      Hi, I'm having some problems compiling the code below.

      kernel void calcular_eq_calor(double input[],double dt,double dx,double t_final,double x_final,double alpha,int nx,int num_processos,out double output<>)
      {
          double i , m;
          double x , t , beta;

          t = 0.0;
          m = 0.0;
          beta = (alpha*dx*dx)/dt;

         while (t <= t_final)
         {
            // ======================== BEGIN ========================
            i = 1.0;
            x = (double) i*dx;
            while (x < x_final)
            {
               // ============== BEGIN =============
                output = (input[i+1.0] - (2.0 - beta)* input[i] + input[i-1.0])/beta;
                // =============== END ===============
                i++;
                x = (double) i*dx;
            }
            // ========================= END =========================
            m++;
            t = (double) m*dt;
         }
      }

      I want to be able to access input[i+1], input[i-1] and input[i], and put this data in the output. How can I do this?

       

        • Problems with Kernel Code
          gaurav.garg

          Looks like there is some problem with address translation code generation in brcc. You can compile .br file with -r flag to disable address translation if you are not using large 1D streams (width > 8192) or 3D streams.

            • Problems with Kernel Code
              dabrunhosa

              It worked. Now my problem is that I want to store each result in a different position in the output. The code below is my modification:

              kernel void calcular_eq_calor(double input[],double dt,double dx,double t_final,double x_final,double alpha,int nx,int num_processos,out double output[])
              {
                  double i , m;
                  double x , t , beta;

                  t = 0.0;
                  m = 0.0;
                  beta = (alpha*dx*dx)/dt;

                 while (t <= t_final)
                 {
                    // ======================== BEGIN ========================
                    i = 1.0;
                    x = (double) i*dx;
                    while (x < x_final)
                    {
                       // ============== BEGIN =============
                        output[i] = (input[i+1.0] - (2.0 - beta)* input[i] + input[i-1.0])/beta;
                        // =============== END ===============
                        i++;
                        x = (double) i*dx;
                    }
                    // ========================= END =========================
                    m++;
                    t = (double) m*dt;
                 }
              }

               

              The problem with the code above is the result of the operation. I run the same algorithm on the CPU and the result is that every position is 0. The GPU should give the same result, but it doesn't.

                • Problems with Kernel Code
                  gaurav.garg

                  Could you post your runtime code also?

                    • Problems with Kernel Code
                      dabrunhosa

                      Runtime code below:

                      #include "Eq_Calor.h"
                      #include "brookgenfiles/eq_calor_gpu.h"

                      CPerfCounter* timer;

                      // For the heat equation to be stable, the following condition must
                      // be satisfied: dt <= (dx*dx)/(2*alpha)
                      const double dt = 0.000005; // Time step
                      const double dx = 0.1;  // Spatial step
                      const double t_final = 10;
                      const double x_final = 1;
                      const double alpha = 1;
                      int nx = (int)(1/dx) + 1;


                      double _input[100000000] , _input2[100000000];
                      int _length;
                      float _count;
                      double* _output_gpu;
                      float* _output;
                      float _result;


                      Eq_Calor::Eq_Calor(int num_processos)
                      {
                          _output = NULL;
                          _output_gpu = NULL;
                          _result = 0.0f;
                          _length = num_processos;
                          _count = 0.0f;
                      }

                      ////////////////////////////////////////////////////////////////////////////////
                      //
                      //  \Equal CPU Code
                      //
                      ////////////////////////////////////////////////////////////////////////////////

                      void eq_calor_cpu(float *output)
                      {
                          long int i , m;
                          double x , t , beta;

                          t = 0;
                          m = 0;
                          beta = (alpha*dx*dx)/dt;

                         while (t <= t_final)
                         {
                            // ======================== BEGIN ========================
                            i = 1;
                            x = (double) i*dx;
                            while (x < x_final)
                            {
                               // ============== BEGIN =============
                                _input2[i] = (_input[i+1] - (2 - beta)*_input[i] + _input[i-1])/beta;
                                // =============== END ===============
                                i++;
                                x = (double) i*dx;
                            }
                            // Copy the new values into the vector U
                            for (i = 1 ; i < nx; i++)
                            {
                               _input[i] = _input2[i];
                            }
                            // ========================= END =========================
                            m++;
                            t = (double) m*dt;
                         }
                      }



                      void PreencherStream()
                      {
                          _output = (float*) malloc(sizeof(float));
                          _output_gpu = (double*) malloc(sizeof(double));
                          double x;
                          for(int i = 0; i< nx; i++)
                          {
                              x = (double) i*dx;
                              _input[i] =  exp(-((x - 0.5)*(x - 0.5))/(0.01));
                          }

                          // Dirichlet boundary condition
                         _input[0] = 0;
                         _input[nx] = 0;
                      }

                      ////////////////////////////////////////////////////////////////////////////////
                      //!
                      //! \brief  backend implementation for the sample
                      //!
                      ////////////////////////////////////////////////////////////////////////////////

                      bool Eq_Calor::run()
                      {
                          unsigned int retVal = 0;
                          timer = new CPerfCounter();
                          /////////////////////////////////////////////////////////////////////////
                          // Brook code block
                          /////////////////////////////////////////////////////////////////////////
                          {
                              unsigned int dim[] = {nx};
                              ::brook::Stream<double> inputStream(1, dim);
                              ::brook::Stream<double> outputStream(1, dim);

                              PreencherStream();

                              inputStream.read(&_input);
                             
                              printf ("\n\nInitial Condition\n\n");
                              for (int i = 0 ; i<= nx; i++)
                              {
                                  printf ("%.20f\n" , _input[i]);
                              }

                              timer->Start();
                              eq_calor_cpu(_output);
                              timer->Stop();
                              cout<<"\nElapsed: "<<timer->GetElapsedTime()<<" time units for the CPU run";

                              printf ("\n\nFINAL Result\n\n");
                              for (int i = 0 ; i <= nx; i++)
                              {
                                  printf ("%.20f\n" , _input[i]);
                              }

                              timer->Reset();

                              timer->Start();
                              calcular_eq_calor(inputStream,dt,dx,t_final,x_final,alpha,nx,_length,outputStream);
                              timer->Stop();
                              cout<<"\nElapsed: "<<timer->GetElapsedTime()<<" time units for the GPU run\n";

                              outputStream.write(_output_gpu);

                              printf ("\n\nFINAL Result\n\n");
                              for (int i = 0 ; i <= nx; i++)
                              {
                                  printf ("%.20f\n" , _output_gpu[i]);
                              }
                             
                              // Handle errors if any
                              if(outputStream.error())
                              {
                                  std::cout << "Error occured" << std::endl;
                                  std::cout << outputStream.errorLog() << std::endl;
                              }
                          }

                          /////////////////////////////////////////////////////////////////////////
                          // Print results
                          /////////////////////////////////////////////////////////////////////////

                          system("pause");
                             
                          return true;
                      }
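One detail in the host code above may be worth checking independently of the kernel: `malloc(sizeof(double))` reserves room for a single element, while the print loops index from 0 to nx. A minimal sketch of a buffer sized for every index used, assuming a `std::vector` is acceptable in place of the raw pointers (`make_initial_condition` is a hypothetical helper, not part of the original code):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical replacement for the malloc-based buffers: a vector sized for
// every index the host loops touch (0 .. nx inclusive, hence nx + 1 entries).
std::vector<double> make_initial_condition(int nx, double dx)
{
    assert(nx >= 1);
    std::vector<double> u(static_cast<std::size_t>(nx) + 1, 0.0);
    for (int i = 0; i < nx; ++i) {
        const double x = static_cast<double>(i) * dx;
        // Same Gaussian profile as in PreencherStream().
        u[static_cast<std::size_t>(i)] = std::exp(-((x - 0.5) * (x - 0.5)) / 0.01);
    }
    // Dirichlet boundary condition: both ends are held at zero.
    u[0] = 0.0;
    u[static_cast<std::size_t>(nx)] = 0.0;
    return u;
}
```

The vector's `data()` pointer could then be passed wherever the original code passes `_input` or `_output_gpu`, assuming the stream calls only need a pointer to contiguous storage.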

                    • Problems with Kernel Code
                      Gipsel

                       

                      Originally posted by: dabrunhosa

                      The problem with the code above is the result of the operation. I run the same algorithm on the CPU and the result is that every position is 0. The GPU should give the same result, but it doesn't.



                      Could you verify that the CPU code is doing what it is supposed to do? As I understand it, you want to replace the function "void eq_calor_cpu(float *output)" with a Brook equivalent, right?
                      Frankly, I don't think that function really works as posted.

                      But guessing at what you want to do, I think you can omit the inner while loop (over i or x, respectively) from your kernel, as Brook executes it implicitly in parallel (all elements of the execution domain are implicitly parallel). You don't need a scatter output for that. The outer loop (over t) has to be done in host code, calling the kernel several times with the input and output streams switched. The only thing you have to ensure is that the input stream has two more elements than the output stream.

                      It would look like this:

                      while(t<=t_final){
                        calcular_eq_calor.domainOffset(uint4(1, 0, 0, 0));
                        calcular_eq_calor.domainSize(uint4(nx-2, 0, 0, 0));
                        calcular_eq_calor(input, output, ...);

                        calcular_eq_calor.domainOffset(uint4(1, 0, 0, 0));
                        calcular_eq_calor.domainSize(uint4(nx-2, 0, 0, 0));
                        calcular_eq_calor(output, input, ...);
                        t+=2*dt;
                      }

                      And the kernel itself may be as simple as this:

                      kernel void calcular_eq_calor(double input[], out double output<>, double beta, double inverted_beta)
                      {
                          int j = instance().x;
                          output = (input[j+1] - (2.0 - beta)* input[j] + input[j-1]) * inverted_beta;
                      }
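The scheme described above (one output element per kernel instance, the time loop on the host, input and output swapped between calls) can be sketched on the CPU in plain C++. This is not Brook+ code: `step` and `run_steps` are hypothetical names, and the inner `for` only stands in for the implicitly parallel execution domain.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One explicit time step, written the way the kernel sees it: every interior
// index j is computed only from the old values in `in`, never from `out`.
void step(const std::vector<double>& in, std::vector<double>& out,
          double beta, double inverted_beta)
{
    assert(in.size() == out.size() && in.size() >= 3);
    for (std::size_t j = 1; j + 1 < in.size(); ++j) {
        out[j] = (in[j + 1] - (2.0 - beta) * in[j] + in[j - 1]) * inverted_beta;
    }
    // Boundary values are carried over unchanged.
    out.front() = in.front();
    out.back() = in.back();
}

// Host-side time loop: call the step repeatedly and swap the two buffers,
// so the output of one step becomes the input of the next.
std::vector<double> run_steps(std::vector<double> u, double beta, int steps)
{
    std::vector<double> next(u.size());
    const double inverted_beta = 1.0 / beta;
    for (int s = 0; s < steps; ++s) {
        step(u, next, beta, inverted_beta);
        std::swap(u, next);
    }
    return u;
}
```

Starting from {0, 0, 1, 0, 0} with beta = 4, a single step gives {0, 0.25, 0.5, 0.25, 0}. Because each step reads only the previous buffer, the order in which the interior indices are visited does not matter, which is what allows them to run in parallel.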

                        • Problems with Kernel Code
                          dabrunhosa

                          I think it works. What the function is supposed to do is observe a certain material over a period of time. The idea is to wait for the material to reach 0, where 0 means it has reached thermal equilibrium. I'm also setting positions 0 and nx of the material to be in equilibrium from the start.

                            • Problems with Kernel Code
                              Gipsel

                               

                              Originally posted by: dabrunhosa I think it works


                              I guess the missing indices for _input and _input2 are really confusing.

                              Did you have a look at the suggested code, or did I get your intention completely wrong?

                              PS:

                              The forum software appears to be complete crap: it just deletes the index i within square brackets (as it is reserved for italics). We need some code tags!

                                • Problems with Kernel Code
                                  dabrunhosa

                                  I'm sorry, it seems that the last time I visited the forum it didn't show your whole message. I will see if it works and post the result here. Thanks.

                                    • Problems with Kernel Code
                                      dabrunhosa

                                      Gipsel, the code that you sent me didn't work. The code below does work, but the problem is that for some reason it only executes the first iteration of the for loop.

                                      kernel void calcular_eq_calor(double input[], out double output[], double beta, double inverted_beta,int nx,double dt,double t_final)
                                      {
                                          double m ,beta_in;
                                          double t;

                                          int i = instance().x;
                                          int j = i + 1;
                                          int l = i -1;

                                          beta_in = beta - 2.0;

                                          t = 0.0;
                                          m = 0.0;

                                         for(; t <= t_final; m = m + 1.0)
                                         {
                                            output[i] = (input[j] - (beta_in)* input[i] + input[l]) * inverted_beta;
                                            t = m * dt;
                                         }

                                          if(i == (nx-1))
                                          {
                                              output[0] = 0.0;
                                              output[nx] = 0.0;
                                          }
                                      }

                                       

                                      The Host Code is :

                                       

                                      #include "Eq_Calor.h"
                                      #include "brookgenfiles/eq_calor_gpu.h"

                                      CPerfCounter* timer;

                                      // For the heat equation to be stable, the following condition must
                                      // be satisfied: dt <= (dx*dx)/(2*alpha)
                                      const double dt = 0.005; // Time step
                                      const double dx = 0.1;  // Spatial step
                                      const double t_final = 10;
                                      const double x_final = 1;
                                      const double alpha = 1;
                                      int nx = (int)(1/dx) + 1;
                                      double beta = (alpha*dx*dx)/dt;


                                      double _input[100000000] , _input2[100000000];
                                      int _length;
                                      float _count;
                                      double* _output_gpu;
                                      double* _input_gpu;
                                      float _result;


                                      Eq_Calor::Eq_Calor(int num_processos)
                                      {
                                          _input_gpu = NULL;
                                          _output_gpu = NULL;
                                          _result = 0.0f;
                                          _length = num_processos;
                                          _count = 0.0f;
                                      }

                                      ////////////////////////////////////////////////////////////////////////////////
                                      //
                                      //  \Equal CPU Code
                                      //
                                      ////////////////////////////////////////////////////////////////////////////////

                                      void eq_calor_cpu()
                                      {
                                          long int i , m;
                                          double x , t;

                                          t = 0;
                                          m = 0;

                                         while (t <= t_final)
                                         {
                                            // ======================== BEGIN ========================
                                            i = 1;
                                            x = (double) i*dx;
                                            while (x < x_final)
                                            {
                                               // ============== BEGIN =============
                                                _input2[i] = (_input[i+1] - (2 - beta)*_input[i] + _input[i-1])/beta;
                                                // =============== END ===============
                                                i++;
                                                x = (double) i*dx;
                                            }
                                            // Copy the new values into the vector U
                                            for (i = 1 ; i < nx; i++)
                                            {
                                               _input[i] = _input2[i];
                                            }
                                            // ========================= END =========================
                                            m++;
                                            t = (double) m*dt;
                                         }
                                      }



                                      void PreencherStream()
                                      {
                                          _input_gpu = (double*) malloc(sizeof(double));
                                          _output_gpu = (double*) malloc(sizeof(double));
                                          double x;
                                          for(int i = 0; i< nx; i++)
                                          {
                                              x = (double) i*dx;
                                              _input[i] =  exp(-((x - 0.5)*(x - 0.5))/(0.01));
                                              _input_gpu[i] = exp(-((x - 0.5)*(x - 0.5))/(0.01));
                                          }

                                          // Dirichlet boundary condition
                                         _input[0] = 0;
                                         _input[nx] = 0;
                                         _input_gpu[0] = 0;
                                         _input_gpu[nx] = 0;
                                      }

                                      ////////////////////////////////////////////////////////////////////////////////
                                      //!
                                      //! \brief  backend implementation for the sample
                                      //!
                                      ////////////////////////////////////////////////////////////////////////////////

                                      bool Eq_Calor::run()
                                      {
                                          unsigned int retVal = 0;
                                          timer = new CPerfCounter();
                                          double t = 0;
                                          double m = 0;
                                          double inverted_beta = beta / (beta*beta);
                                          /////////////////////////////////////////////////////////////////////////
                                          // Brook code block
                                          /////////////////////////////////////////////////////////////////////////
                                          {
                                              unsigned int dim[] = {nx};
                                              ::brook::Stream<double> inputStream(1, dim);
                                              ::brook::Stream<double> outputStream(1, dim);

                                              PreencherStream();

                                              inputStream.read(_input_gpu);
                                             
                                              printf ("\n\nInitial Condition\n\n");
                                              for (int i = 0 ; i<= nx; i++)
                                              {
                                                  printf ("%.20f\n" , _input[i]);
                                              }

                                              timer->Start();
                                              eq_calor_cpu();
                                              timer->Stop();
                                              cout<<"\nElapsed: "<<timer->GetElapsedTime()<<" time units for the CPU run";

                                              printf ("\n\nFINAL Result\n\n");
                                              for (int i = 0 ; i <= nx; i++)
                                              {
                                                  printf ("%.20f\n" , _input[i]);
                                              }

                                              timer->Reset();
                                             
                                              printf ("\n\nInitial Condition\n\n");
                                              for (int i = 0 ; i<= nx; i++)
                                              {
                                                  printf ("%.20f\n" , _input_gpu[i]);
                                              }

                                              timer->Start();
                                              //calcular_eq_calor.domainOffset(uint4(1, 0, 0, 0));
                                              //calcular_eq_calor.domainSize(uint4(nx-1, 0, 0, 0));
                                              calcular_eq_calor(inputStream,outputStream,beta,inverted_beta,nx,dt,t_final);
                                              timer->Stop();
                                              cout<<"\nElapsed: "<<timer->GetElapsedTime()<<" time units for the GPU run\n";

                                              outputStream.write(_output_gpu);

                                              printf ("\n\nFINAL Result\n\n");
                                              for (int i = 0 ; i < nx; i++)
                                              {
                                                  printf ("%.20f\n" , _output_gpu[i]);
                                              }
                                             
                                              // Handle errors if any
                                              if(outputStream.error())
                                              {
                                                  std::cout << "Error occured" << std::endl;
                                                  std::cout << outputStream.errorLog() << std::endl;
                                              }
                                          }

                                          /////////////////////////////////////////////////////////////////////////
                                          // Print results
                                          /////////////////////////////////////////////////////////////////////////

                                          system("pause");
                                             
                                          return true;
                                      }

                                       

                                      I'm sorry it took so long to reply.

                                        • Problems with Kernel Code
                                          Gipsel

                                           

                                          Originally posted by: dabrunhosa Gipsel, the code that you sent me didn't work. The code below does work, but the problem is that for some reason it only executes the first iteration of the for loop.


                                          Actually, my suggestion was so simple it has to work. What error do you see? Are you using SDK1.3? Then you can't use literal constants in kernel code (it's a bug) and have to declare variables (as you do in the latest version you posted).

                                          It is normal that you see no effect from the for loop in your kernel. You read the same values every time and compute the same result from them every time. You can't propagate changes over the whole output stream within a kernel; you need to call the kernel in a loop (as I suggested) to get this done (or restrict yourself to smaller streams and use the local data share).

                                          I guess you didn't get the basic principle of how a Brook kernel works on its execution domain. All elements of the execution domain are implicitly parallel. And you have to understand that in a "normal" Brook kernel there is no communication between different positions of the output stream. I still think my suggestion does what you want.
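The point about the in-kernel loop can be illustrated with a small CPU-side sketch. It assumes, as in the posted kernel, that the input is never written inside the loop; `element_after_loop` is a hypothetical helper that mimics what a single output element computes.

```cpp
#include <cassert>
#include <vector>

// What one output element computes when the update is repeated `iterations`
// times inside the kernel. `input` is never modified, so every pass reads
// the same three values and writes the same result.
double element_after_loop(const std::vector<double>& input, int i,
                          double beta, int iterations)
{
    assert(i >= 1 && i + 1 < static_cast<int>(input.size()));
    double output = 0.0;
    for (int m = 0; m < iterations; ++m) {
        output = (input[i + 1] - (2.0 - beta) * input[i] + input[i - 1]) / beta;
    }
    return output;
}
```

Calling it with 1 iteration or with 2000 iterations returns the same number, which is why the time loop has to live outside the kernel, with the output of one call fed back in as the input of the next.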

                                            • Problems with Kernel Code
                                              dabrunhosa

                                              Hi Gipsel, I'm using SDK 1.4. Before I posted this message, the code above was doing exactly what I expected.

                                              Host Code:

                                              while (t <= t_final)
                                              {
                                                          calcular_eq_calor.domainOffset(uint4(1, 0, 0, 0));
                                                          calcular_eq_calor.domainSize(uint4(nx-1, 0, 0, 0));
                                                          calcular_eq_calor(inputStream,outputStream,beta,inverted_beta);

                                                          calcular_eq_calor.domainOffset(uint4(1, 0, 0, 0));
                                                          calcular_eq_calor.domainSize(uint4(nx-1, 0, 0, 0));
                                                          calcular_eq_calor(outputStream,inputStream,beta,inverted_beta);

                                                          m = m + 2;
                                                          t += m * dt;
                                               }

                                               

                                              Kernel Code :

                                               

                                              kernel void calcular_eq_calor(double input[], out double output<>, double beta, double inverted_beta)
                                              {
                                                  double m ,beta_in;
                                                  double t;

                                                  int i = instance().x;
                                                  int j = i + 1;
                                                  int l = i -1;

                                                  beta_in = beta - 2.0;

                                                  t = 0.0;
                                                  m = 0.0;

                                                    output = (input[j] - (beta_in)* input[i] + input[l]) * inverted_beta;
                                              }
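One thing that may be worth checking when comparing this kernel against the CPU code is the sign of the middle term. The CPU version multiplies the centre value by -(2 - beta), while the kernel defines beta_in = beta - 2.0 and then multiplies by -(beta_in). A minimal sketch of the two expressions side by side (the function names are hypothetical):

```cpp
#include <cassert>

// Middle term as written in the CPU code: -(2 - beta) * u[i].
double center_term_cpu(double u, double beta)
{
    return -(2.0 - beta) * u;
}

// Middle term as written in the kernel: beta_in = beta - 2.0,
// then -(beta_in) * u[i].
double center_term_kernel(double u, double beta)
{
    const double beta_in = beta - 2.0;
    return -(beta_in) * u;
}
```

For u = 1 and beta = 5 the first returns 3 and the second returns -3, so the two versions agree only if beta_in is defined as 2.0 - beta or the sign in front of it is flipped.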

                                               

                                              But for some reason, while I was adjusting it so that the last element receives 0.0, the kernel started to give me the wrong result, even though I had not changed the previous code.

                                              I changed your original code because:

                                              1 - The domain size is actually nx - 1, because I want to begin at 1 and go all the way to 10.
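A quick way to see which indices each version touches is to count them on the CPU. The sketch below only reproduces the arithmetic from the posted host code (`nx = (int)(1/dx) + 1` and the inner loop `while (x < x_final)`); the function names are hypothetical, and how `domainOffset` and `domainSize` map to indices is an assumption that should be checked against the SDK documentation.

```cpp
#include <cassert>

// Number of grid points, computed as in the host code.
int grid_points(double dx)
{
    return static_cast<int>(1.0 / dx) + 1;
}

// Number of indices the CPU inner loop updates: it starts at i = 1 and runs
// while x = i * dx stays strictly below x_final.
int cpu_updated_points(double dx, double x_final)
{
    assert(dx > 0.0);
    int count = 0;
    for (long i = 1; static_cast<double>(i) * dx < x_final; ++i) {
        ++count;
    }
    return count;
}
```

For dx = 0.1 and x_final = 1 this gives nx = 11 and 9 updated points (i = 1 to 9) on the CPU. If a domain of size nx - 1 starting at offset 1 covers i = 1 to 10, the GPU run would also update index 10 and read index 11, so it may be worth confirming that both versions update the same range.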

                                                • Problems with Kernel Code
                                                  dabrunhosa

                                                  The code below, sorry.

                                                  • Problems with Kernel Code
                                                    Gipsel

                                                     

                                                    Originally posted by: dabrunhosa Hi Gipsel, I'm using the SDK 1.4.


                                                    I thought that bug with the literal constants was solved in 1.4. You should be able to get by without the variable declarations.

                                                     

                                                    Originally posted by: dabrunhosa Before I posted this message, the code [below] was doing exactly what I expected.

                                                    [..]

                                                    I changed your original code because:

                                                    1 - The domain size is actually nx - 1, because I want to begin at 1 and go all the way to 10.



                                                    You are right, the domain has to end at nx-1. I just didn't look carefully enough at your algorithm and the nx declaration.

                                                    So the problem is solved now and all is working correctly?

                                                    PS:

                                                     

                                                    Originally posted by: dabrunhosa
                                                    m = m + 2;
                                                    t += m * dt;


                                                    Are you sure it should not look like this:
                                                    m = m + 2;
                                                    t = m * dt;

                                                    At least that would be equivalent to your CPU code. Otherwise t grows quadratically instead of linearly!
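The difference between the two update rules can be checked by counting loop passes. This is a minimal sketch with a hypothetical `count_passes` helper; each pass stands for the two kernel calls in the host loop, so m advances by 2.

```cpp
#include <cassert>

// Counts how many times the host loop body runs.
// `accumulate` selects the update rule:
//   false: t = m * dt   (linear in m, matches the CPU code)
//   true : t += m * dt  (adds a growing increment on every pass)
int count_passes(double dt, double t_final, bool accumulate)
{
    assert(dt > 0.0);
    double t = 0.0;
    double m = 0.0;
    int passes = 0;
    while (t <= t_final) {
        ++passes;
        m = m + 2.0;
        if (accumulate) {
            t += m * dt;
        } else {
            t = m * dt;
        }
    }
    return passes;
}
```

With dt = 0.5 and t_final = 10, the linear rule runs 11 passes while the accumulating rule stops after 5, so the accumulating version performs far fewer time steps than the CPU reference.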

                                                      • Problems with Kernel Code
                                                        dabrunhosa

                                                        You are right, it should be: t = m * dt;

                                                        It's not working correctly. When I modified t_final, the CPU gave me one set of values and the GPU gave me another set, identical to the one from before I modified t_final.

                                                        I don't understand the ATI Stream SDK: if I modify a variable and recompile the program, do the changes take effect immediately? Try the test on your computer with t_final changed to 10. For the values below, the code seems to work:

                                                        const double dt = 0.005; // Time step
                                                        const double dx = 0.1;  // Spatial step
                                                        const double t_final = 50000;
                                                        const double x_final = 1;
                                                        const double alpha = 100000;
                                                        int nx = (int)(1/dx) + 1;
                                                        double beta = (alpha*dx*dx)/dt;
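The host code earlier in the thread quotes a stability condition in a comment: dt <= (dx*dx)/(2*alpha). A minimal sketch for checking a set of constants against that inequality as quoted (the function names are hypothetical, and the check is only as valid as the quoted condition is for this scheme):

```cpp
#include <cassert>

// Largest time step allowed by the condition quoted in the host code comment:
// dt <= (dx * dx) / (2 * alpha).
double max_stable_dt(double dx, double alpha)
{
    assert(dx > 0.0 && alpha > 0.0);
    return (dx * dx) / (2.0 * alpha);
}

// True when dt respects that limit.
bool satisfies_condition(double dt, double dx, double alpha)
{
    return dt <= max_stable_dt(dx, alpha);
}
```

For the first set of constants in the thread (dt = 0.000005, dx = 0.1, alpha = 1) the check passes. For dt = 0.005, dx = 0.1, alpha = 100000 it does not, since the limit works out to roughly 5e-8.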