1 Reply Latest reply on Oct 10, 2008 10:14 PM by udeepta@amd

    several problems on FS9170. Help me

    garrison

      1.  What does AMD think about OpenCL? Will AMD work hard to support it efficiently, or just provide a library for it and keep the emphasis on Brook+?

      2.  A problem about "domain":
       
      -----------------------------------------
      //kernel code
      kernel void copy(float s<>, out float d1<>, out float d2<>)
      {
        d1 = s;
        d2 = s;
      }
       
      //stream code
      float input<16,16>;
      float output1<16,16>;
      float output2<16,16>;
       
      copy(input.domain( int2(0,0), int2(15,15)),
              output1.domain(int2(0,1),int2(15,16)),
              output2.domain(int2(1,0),int2(16,15)));
      -------------------------------------------
       
      In this program, I want to copy the upper-left part of input to the lower-left part of output1 and to the upper-right part of output2. The result shows that output1 is right but output2 is wrong: the upper-left 15*15 block of input is copied into the lower-left corner of both output1 and output2. Does this mean that when I use ".domain" on output streams, every output stream uses the domain of the first one and ignores its own?

      3. I find that if I pass two or more scalar parameters to a kernel function, such as

      ----------------------------------------------
      "void kernel func(double a, double b, double s1<>,....)"
      -----------------------------------------------

      then inside the kernel the scalar parameter b is 0, no matter what value was passed in. Is this a bug?

      A small test case:

      ------------------------------------------------
      kernel void cons(double a, double b, out double c<>, out double d<>)
      {
        c = a;
        d = b;
      }

      int main(int argc, char** argv)
      {
          double in1;
          double in2;
          double* out1;
          double* out2;
          out1 = allocate_mat_d(4,4);
          out2 = allocate_mat_d(4,4);
          
          in1 = 2.0;
          in2 = 4.0;
         
          {
              double s1<4,4>;
              double s2<4,4>;

              cons(in1,in2,s1,s2);

              streamWrite(s1, out1);
              streamWrite(s2, out2);
          }
          print_mat_d("out1", "%lf ", (double*)out1,4,4);
          print_mat_d("out2", "%lf ", (double*)out2,4,4);
          return 0;
      }
      --------------------------------------------------
      The result is:
      out1
      2.000000 2.000000 2.000000 2.000000
      2.000000 2.000000 2.000000 2.000000
      2.000000 2.000000 2.000000 2.000000
      2.000000 2.000000 2.000000 2.000000

      out2
      0.000000 0.000000 0.000000 0.000000
      0.000000 0.000000 0.000000 0.000000
      0.000000 0.000000 0.000000 0.000000
      0.000000 0.000000 0.000000 0.000000

      4. Are there any limits on the stream size in a reduce kernel? I have a simple reduce kernel like this:

      ------------------------------------
      reduce void sum(double in<>, reduce double out<>)
      {
          out += in;
      }

      double * input;
      double output;
      input = allocate_mat_d(1023,1023);
      fill_mat_d(.....);
      double s<1023,1023>;
      streamRead(s,input);
      sum(s,output);
      -------------------------------------
      It says:
      "Failed to find usable kernel fragment to implement requested reduction."
      But it runs normally if I change the size to 1024*1024. How can I fix it?

      5. After installing version 1.2 of the SDK, none of my programs run. They all fail with:
      -------------------------------------------------------------------------
      XIO:  fatal IO error 0 (Success) on X server ":0.0"
       after 8 requests (8 known processed) with 0 events remaining.
      -------------------------------------------------------------------------
      What should I do?

        • several problems on FS9170. Help me
          udeepta@amd

          1. Brook+ is being actively developed as a simple, high-level API for GPGPU. AMD is participating in the OpenCL spec discussions, but I do not have any more details because there is no OpenCL spec yet.

          2. The domain interval is interpreted as half-open, [start, end). That is, domain(0,10) covers indices 0 to 9 inclusive.
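
To make the half-open rule concrete, here is a minimal CPU-side sketch in plain C (not Brook+ code). The helper names domain_extent and copy_domain are hypothetical, and the sketch assumes the first component of each int2 is the column (x) and the second is the row (y). It only illustrates which elements a [start, end) domain covers; it says nothing about how the runtime handles multiple output domains.

```c
#include <assert.h>

#define W 16
#define H 16

/* Number of elements covered by the half-open interval [start, end). */
static int domain_extent(int start, int end)
{
    assert(end >= start);
    return end - start;
}

/* Copy the sub-block [sx0, sx1) x [sy0, sy1) of src into dst, placing its
 * upper-left corner at column dx0, row dy0. This is a CPU stand-in for a
 * kernel call on .domain() views (hypothetical helper, not a Brook+ API). */
static void copy_domain(float src[H][W], int sx0, int sy0, int sx1, int sy1,
                        float dst[H][W], int dx0, int dy0)
{
    int w = domain_extent(sx0, sx1);
    int h = domain_extent(sy0, sy1);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            dst[dy0 + y][dx0 + x] = src[sy0 + y][sx0 + x];
}
```

Under this reading, domain(int2(0,0), int2(15,15)) covers a 15*15 block (indices 0 to 14 in each dimension), and domain(int2(1,0), int2(16,15)) covers columns 1 to 15 and rows 0 to 14, so both stay inside a 16*16 stream.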

          3. It is a bug; should get fixed in 1.3.

          4. Reduction only works for stream sizes whose prime factorization contains only 2, 3, 5, and 7.
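
As an illustration of this rule, 1023 = 3 * 11 * 31 contains the factors 11 and 31, while 1024 = 2^10 does not, which matches the behaviour reported in the question. The plain C sketch below checks a size against the rule and finds the next size that satisfies it. The function names are hypothetical, it is not the check the Brook+ runtime actually performs, and it assumes the rule applies to each stream dimension separately. Padding the stream up to that size with zeros (the identity for a sum) is one possible workaround, but that is an assumption and not something confirmed in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if n has no prime factors other than 2, 3, 5 and 7. */
static bool is_reducible_size(unsigned n)
{
    static const unsigned primes[] = {2, 3, 5, 7};
    if (n == 0)
        return false;
    for (int i = 0; i < 4; ++i)
        while (n % primes[i] == 0)
            n /= primes[i];   /* strip every occurrence of this prime */
    return n == 1;            /* anything left over is a larger prime factor */
}

/* Smallest size >= n that satisfies the rule (hypothetical padding target). */
static unsigned next_reducible_size(unsigned n)
{
    assert(n > 0);
    while (!is_reducible_size(n))
        ++n;
    return n;
}
```

For the sizes in the question, is_reducible_size(1023) is false, is_reducible_size(1024) is true, and next_reducible_size(1023) is 1024.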

          5. Were you able to get going?