24 Replies Latest reply on May 21, 2008 6:23 AM by michael.chu

    Scatter Question

    ryta1203
      Is it possible to scatter into a stream without using the scatter function? I am looking at the scatter-gather example in the SDK, and it's a very simple example.

      For instance, if I want to do something like this:

      a[0] = x * y / z + c[4]

      The problem is that the array sizes for a and c are not the same; a is 8 times larger.

      I need something like:

      a[indexof(c)+height*length*8] = x*y/z+c;

      where c is a stream with dimensions height*length

      I know I can do this:

      c = x*y/z+a[indexof(c)+height*length*8], so is there any way to do the opposite of that?
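For readers comparing the two access patterns, here is a minimal plain-C sketch (not Brook+) of the gather form that already works and the scatter form being asked about. The function names and the `offset` parameter are hypothetical; `offset` stands in for the height*length*8 term above, and the caller is assumed to size `a` so that every `i + offset` is in range.

```c
#include <stddef.h>

/* Gather: each output element c[i] READS from a computed address in a.
   This is the form the post says already works. */
void gather_offset(float *c, const float *a, size_t n, size_t offset,
                   float x, float y, float z)
{
    for (size_t i = 0; i < n; ++i)
        c[i] = x * y / z + a[i + offset];
}

/* Scatter: each input element c[i] WRITES to a computed address in a.
   This is the form the post is asking for. */
void scatter_offset(float *a, const float *c, size_t n, size_t offset,
                    float x, float y, float z)
{
    for (size_t i = 0; i < n; ++i)
        a[i + offset] = x * y / z + c[i];
}
```

The only difference is which side of the assignment carries the computed index, which is why gather maps naturally onto a per-output kernel while scatter needs explicit support.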
        • Scatter Question
          michael.chu
          Hi ryta1203,

          I believe you can also do scatter with the [] notation on the output stream.

          It is important to note that you will need to scatter in 128-bit chunks though.

          Michael.
            • Scatter Question
              michael.chu
              Take a look at the topic: Scatter stream base type has to be 128 bit
                • Scatter Question
                  ryta1203
                  Michael,

                  I'm not sure why this code crashes:

                  kernel void kern(float index<>, float d[], float a[], float b[], float size, out float4 c[])
                  {
                  if (d[index] == 2 || d[index] == 3)
                  {
                  c[index] = d[index];
                  }
                  }

                  index is of the same size as d and goes from 0 to size-1

                  Also, I want to write c.x back to a 1D array. Is this possible?

                  Is there an example of scatter using this method in the samples I can look at? All I've seen is the scatter using the scatterOp function.
                    • Scatter Question
                      michael.chu
                      Hi ryta1203,

                      I assume it is crashing for you when using the CAL backend? Does it work properly when you use the CPU backend? (just to make sure no value of index is out of bounds).

                      When you say you want to write c.x back to a 1D array, are you saying in a different kernel? Or that you wanted to also treat c as an input stream as well?

                      I believe the way you are doing it is the correct syntax. You need to scatter in 128-bit chunks.

                      The only thing that MIGHT be an issue is the assignment of a float to a float4.

                      Let me check with the Brook+ team to see if this could be the problem.

                      Michael.
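One way to rule out bad indices before trying the CAL backend is to check them on the host. The sketch below is plain C, not a Brook+ API; `count_bad_indices` and its parameters are hypothetical names, and it assumes the indices are stored as floats, as in the kernel above.

```c
#include <stddef.h>

/* Returns how many entries of `index` would fall outside a scatter target
   of `target_len` elements. */
size_t count_bad_indices(const float *index, size_t n, size_t target_len)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i) {
        float v = index[i];
        /* Negated form so that NaN is also counted as out of range. */
        if (!(v >= 0.0f && v < (float)target_len))
            ++bad;
    }
    return bad;
}
```

If this returns zero for the index stream and the kernel still crashes, out-of-bounds writes are unlikely to be the cause.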
                        • Scatter Question
                          ryta1203
                          I assume I am using the CAL backend since I haven't changed any settings and am using the defaults. It crashes even if I give it a static index value.

                          Basically, I want to input the stream C to the kernel as an array, then assign each element of the array a float value. Since you have to use float4 or double2, I would like to either assign this value to .x, or assign it to all of .xyzw and then streamWrite back just the .x.

                          I want to write all of the .x values of the stream C (which is passed to the kernel as an array) back to the 1D array I have created in main().

                            • Scatter Question
                              michael.chu
                              Hi ryta1203,

                              I noticed you are using the float type for the index.

                              Can you try using int instead?

                              Michael.
                                • Scatter Question
                                  ryta1203
                                  Michael,

                                  ints aren't supported yet, so how would I go about using them? As I described in the other thread, I tried multiple things, none of which worked.
                                    • Scatter Question
                                      michael.chu
                                      Hi ryta1203,

                                      As noted in the other scatter thread you posted, we'll post something soon, as soon as one of our AEs has had a chance to test out the sample from the engineers upstairs.

                                      Michael.
                                        • Scatter Question
                                          ryta1203
                                          Michael,

                                          I saw your other post too, that's great news, thanks a bunch!! Meanwhile, I will switch over to CAL since my app definitely needs scatter ability.
                                            • Scatter Question
                                              marcr
                                              Hi,

                                              There is now a scatter example available at:

                                              ftp://streamcomputing:streamcomputing@ftp-developer.amd.com/samples

                                              This code reflects the current scatter limitations (1D scatter target stream, 128 bit
                                              element size).

                                              Simply drop the scatter directory into your desktop and build.

                                              -- marcr
                                                • Scatter Question
                                                  ryta1203
                                                  Will we see the ability to scatter without the 128 bit element size limitation in an upcoming release?

                                                  It seems to me that the scatter ability is widely used/needed and without it Brook+ is very limited. If we can make a "feature request" this would be mine, the ability to scatter (even limited to a 1D array) without the 128 bit element size limitation.

                                                  As it stands now, if you want to scatter into a 1D array of "float", you have to create a wrapper function that copies all the 1D floats involved into the .x component of a 1D float4 array and then reads them back from float4.x into float. Is this correct? The scatter example doesn't deal with this issue, so I am assuming that this is true. This will incur some overhead, particularly for larger data sets.
                                                    • Scatter Question
                                                      michael.chu
                                                      Hi ryta1203,

                                                      Unfortunately, this is a function of what the hardware itself provides at the moment. The hardware is optimized to do 128-bit writes. It is definitely on my feature request list so that it is revisited when it is practical to do it.

                                                      At this moment, if you absolutely need to deal in floats instead of float4s then, yes, that is the sequence of operations you need to make.

                                                      Michael.
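The pack/unpack sequence confirmed above could look roughly like this on the host side. This is a plain-C sketch under assumptions: `float4_host`, `pack_x`, and `unpack_x` are hypothetical names, not part of the Brook+ SDK, and the unused .y/.z/.w lanes are simply zeroed.

```c
#include <stddef.h>

/* Host-side stand-in for a 128-bit element (four 32-bit floats). */
typedef struct { float x, y, z, w; } float4_host;

/* Copy each float into the .x lane of a 128-bit element before the
   buffer is handed to the scatter kernel. */
void pack_x(float4_host *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i].x = src[i];
        dst[i].y = 0.0f;
        dst[i].z = 0.0f;
        dst[i].w = 0.0f;
    }
}

/* Copy the .x lane back into a plain float array after the kernel ran. */
void unpack_x(float *dst, const float4_host *src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i].x;
}
```

The packed buffer is four times the size of the original data, which is where the extra transfer overhead for larger data sets comes from.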
                                                        • Scatter Question
                                                          michael.chu
                                                          Hi ryta1203,

                                                          I stand corrected... :-) I was told by an engineer on the team that actually the hardware is capable of scattering on a 32-bit granularity level. The request has already been made to the appropriate team to take a look at adding that capability to the tools.

                                                          I apologize for the confusion!

                                                          Michael.
                                                          • Scatter Question
                                                            ryta1203
                                                            Michael,

                                                            I've looked at the example and attempted to mimic a simple example of my own:

                                                            I could not get the "sample" to execute because of a missing MSVCP80.dll, so I don't know if it will run or not. I know it will compile, but then again, so does my code below which does not run.

                                                            kernel void kern(float4 a[], float4 b[], out float4 c[])
                                                            {
                                                            float idx = indexof(c);
                                                            c[idx] = a[idx]+b[idx];
                                                            }

                                                            This kernel, however, crashes the program. All of the array sizes (stream sizes) are the same, which in this example happens to be size 8.

                                                            The a and b arrays are initialized to 0-7, matching the array index (i.e. [0] = 0, [1] = 1, ..., [7] = 7).

                                                            The program crashes at kernel call.


                                                            Any suggestions would be much appreciated. I apologize, I'm not sure why I am having so many problems getting scatter to work.
                                                              • Scatter Question
                                                                marcr

                                                                Hi ryta1203,

                                                                Can you go to Project->Properties->Configuration Properties->
                                                                C/C++->Code Generation, and set "Runtime Library" to "Multi-threaded DLL (/MD)"?
                                                                That did the trick for me. It appears that this setting gets replaced with /MT when moving an existing project directory around, which then leads to the MSVCP80.dll error.

                                                                marcr

                                                                  • Scatter Question
                                                                    ryta1203
                                                                    marcr,

                                                                    This got rid of that error in "Release" but not in "Debug".

                                                                    In "Release", the sample still crashes, giving no output and a message box saying:

                                                                    "Debugging information for "scatter.exe" cannot be found or does not match. Binary was not built with debug information.

                                                                    Do you want to continue debugging?"

                                                                      • Scatter Question
                                                                        marcr

                                                                        Hi,

                                                                        It appears I made a mistake when uploading the original example, sorry about that.

                                                                        Can you please go to the ftp site again and grab either "scatter" or "hello_brook"?
                                                                        We've massaged those so that you can drop them onto your desktop and build
                                                                        any Release/Debug combination (but only Win32 on 32-bit systems, and x64 on 64-bit
                                                                        systems).

                                                                        Let me know how it goes.

                                                                        marcr
                                                                          • Scatter Question
                                                                            ryta1203
                                                                            marcr,

                                                                            Thanks. I will take a look at it and see if I can get my code working.

                                                                            • Scatter Question
                                                                              ryta1203
                                                                              Here is my code. I have changed all the Project Properties to be the same as in your scatter example. At this point I can't really think of a simpler example. This just crashes when it calls the kernel. It runs fine if BRT_RUNTIME (which I had to create, it's not created automatically) is set to "cpu" but not when it is set to "cal".

                                                                              #include <stdio.h>
                                                                              #include <stdlib.h>

                                                                              #define size1 2*2*2
                                                                              #define size2 2*2

                                                                              kernel void foo(float4 a[], float4 b[], out float4 c[])
                                                                              {
                                                                              float idx = indexof(c);
                                                                              c[idx] = a[idx]+b[idx];
                                                                              }

                                                                              int main()
                                                                              {
                                                                              int j=0;
                                                                              float num;
                                                                              float4 b<size1>;
                                                                              float4 c<size1>;
                                                                              float4 a<size1>;
                                                                              float4 g[size1];
                                                                              float4 h[size1];

                                                                              for (j=0;j < size1;j=j+1)
                                                                              {
                                                                              g[j].x= (float)j;
                                                                              g[j].y= (float)j;
                                                                              g[j].z= (float)j;
                                                                              g[j].w= (float)j;
                                                                              h[j].x=(float)j;
                                                                              h[j].y=(float)j;
                                                                              h[j].z=(float)j;
                                                                              h[j].w=(float)j;
                                                                              }

                                                                              streamRead(b, g);
                                                                              streamRead(a, h);
                                                                              foo(a, b, c);
                                                                              scanf_s("%f", &num);
                                                                              streamWrite(c, g);
                                                                              }

                                                                              What would be the reasons it would run on the cpu backend but not on cal? Also, I'm currently using a 2900 XT; I don't think that should matter, though, since it's R600.
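To check what a correct run of this program should produce, a plain-C reference of the kernel can help. A sketch, assuming a host-side struct `float4_host` and a function `foo_reference` (both hypothetical names, not from the SDK):

```c
#include <stddef.h>

/* Host-side stand-in for the float4 element type. */
typedef struct { float x, y, z, w; } float4_host;

/* Reference for the foo kernel: c[i] = a[i] + b[i], component-wise,
   with every element written at its own index. */
void foo_reference(const float4_host *a, const float4_host *b,
                   float4_host *c, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        c[i].x = a[i].x + b[i].x;
        c[i].y = a[i].y + b[i].y;
        c[i].z = a[i].z + b[i].z;
        c[i].w = a[i].w + b[i].w;
    }
}
```

With both inputs initialized to their own index, as in the program above, every component of element i should come out as 2*i.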
                                                                                • Scatter Question
                                                                                  marcr
                                                                                  This works fine on my system (and produces the correct result).
                                                                                  Can you run any Brook programs on your system at all?

                                                                                  Feel free to send your project file to streamdeveloper@amd.com.
                                                                                  I will build and run it on my system, then at least we know
                                                                                  if it's something in the project or your system.

                                                                                  Thanks,

                                                                                  -- marcr
                                                                                    • Scatter Question
                                                                                      ryta1203
                                                                                      marcr,

                                                                                      Yes, I only have problems when using scatter.

                                                                                      I have emailed my project file and code. I'm sure there is something I am missing, I'm just not sure what.

                                                                                      Thank you,

                                                                                      Ryan
                                                                                        • Scatter Question
                                                                                          michael.chu
                                                                                          Hi Ryan,

                                                                                          I just saw in a separate topic that you are using an R600 card.

                                                                                          You need an RV670 card (Radeon HD 3870 or FireStream 9170) to use scatter or DPFP. Those features were introduced in that GPU.

                                                                                          This might be what is causing your problem.

                                                                                          Michael.
                                                                                            • Scatter Question
                                                                                              ryta1203
                                                                                              Michael,

                                                                                              As I said in my email, I am using R600. I will go back and check the documentation/release notes again. I must have missed this.

                                                                                              EDIT: This is not in the release notes. The release notes (Mar-8) specify:

                                                                                              Scatter
                                                                                              -------

                                                                                              Scatter to 1-dimensional targets is supported. The syntax is similar to gather
                                                                                              operations, in that the stream is bound using square brackets instead of angle
                                                                                              brackets and elements are accessed in an array-like fashion.

                                                                                              Double Precision
                                                                                              ----------------

                                                                                              Double precision is supported on cards that have the necessary hardware
                                                                                              support. Brcc does not currently automatically promote or downcast between
                                                                                              float and double - the user must add explicit casts.

                                                                                              All floating-point literals are still single precision.


                                                                                              There is no mention of "necessary hardware support" under the "Scatter" section as there is in the "Double Precision" section. Can this be changed?

                                                                                              Thank you,

                                                                                              Ryan