15 Replies Latest reply on Jun 25, 2008 11:51 AM by bonissent

    3D streams

    bonissent
      3D streams as arrays

      I tried to generalize the matrix multiplication from the tutorial from 2D to 3D. I tried to streamWrite a 3D array into an inputStream (mimicking the code from the matrix multiplication), then in the kernel I use the float inputA[][][] notation, and out float result<>, where the result is a 3D stream. Then I thought I could use a float3 as indexof(result).xyz. Overoptimistic?

      This does not work (the program simply ends in the kernel) but if I replace the 3D by 2D arrays, it seems to be OK.

      Can you confirm that direct access to streams is allowed only for 2D? and if so, what would you suggest if I really want to handle 3D arrays?
        • 3D streams
          michael.chu
          Hi bonissent,

          I'm asking the engineers for you and will get back to you when I get their response.

          Michael.
          • 3D streams
            bonissent
            To make the question more interesting, below is my br code. It is neither long nor elaborate, but in my ignorance I would like to know:

            - why the 2D part seems to work and the 3D part fails. The last thing that gets printed is "Passe 34", just before entering the kernel;
            - why, although the 2D part seems to work, the returned value is always 0 when I would expect it to be 10 (the value set inside the kernel).

            ====================================== source code below ================================================

            kernel void interpolate3D(float dim, float VOLREC[][][], float IMAGE[][], float theta, out float result<>)
            {
            result = 10.;
            }
            kernel void interpolate2D(float dim, float VOLREC[][], float IMAGE[][], float theta, out float result<>)
            {
            result = 10.;
            }


            int main(int argc, char** argv)
            {
            int dim = 10;
            int i = 0;
            int j = 0;
            int k = 0;

            float VR3D<dim, dim, dim>;
            float Image3D<dim, dim>;
            float RES3D<dim, dim, dim>;

            float VR<dim, dim>;
            float Image<dim>;
            float RES<dim, dim>;

            float* inputVR;
            float* inputImage;
            float* inputVR3D;
            float* inputImage3D;

            float output[10][10];

            float* output3D;

            printf(" Start \n");

            inputVR = (float*)malloc(sizeof *inputVR*dim*dim);
            inputImage = (float*)malloc(sizeof *inputImage*dim);

            streamRead(VR, inputVR);
            streamRead(Image, inputImage);

            interpolate2D((float)dim, VR, Image, 0., RES);
            streamWrite(RES, output);

            for(i=0; i<dim; i++){
            for(j=0; j<dim; j++){
            printf(" Output %d %d : %d \n", i, j, output[i][j]);
            //printf(" Output %d %d : \n", i, j);
            }
            }

            inputVR3D = (float*)malloc(sizeof *inputVR3D*dim*dim*dim);
            printf(" Passe 31 \n");
            inputImage3D = (float*)malloc(sizeof *inputImage3D*dim*dim);
            printf(" Passe 32 \n");

            output3D = (float*)malloc(sizeof *output3D*dim*dim*dim);


            streamRead(VR3D, inputVR3D);
            printf(" Passe 33 \n");
            streamRead(Image3D, inputImage3D);
            printf(" Passe 34 \n");

            interpolate3D((float)dim, VR3D, Image3D, 0., RES3D);
            printf(" Passe 35 \n");
            streamWrite(RES3D, output);
            printf(" Done 3D\n");

            }
              • 3D streams
                michael.chu
                Hi bonissent,

                I checked with the engineers and 3D gather is not yet supported. It is on the list of features to look at and implement.

                In the meantime, is it possible to treat your 3D arrays as 2D arrays and do additional math to emulate that 3rd dimension inside of the kernel and apply it to the 2nd dimension's index?
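One way to read the suggestion above, as a minimal sketch in plain C rather than Brook+: stack the z-slices of the volume on top of each other so that a volume of nx * ny * nz elements becomes a 2D layout with nx columns and ny * nz rows. The type `Pos2D` and the helper `pack3d_to_2d` are hypothetical names introduced for illustration, and the choice of stacking along rows is an assumption; the same arithmetic would have to be repeated inside the kernel before the 2D gather.

```c
#include <assert.h>

/* Position of one element inside the 2D layout. */
typedef struct {
    int row;
    int col;
} Pos2D;

/*
 * Hypothetical helper: map element (x, y, z) of an nx * ny * nz volume to a
 * 2D layout of nx columns and ny * nz rows, with the z-slices stacked
 * vertically.
 */
static Pos2D pack3d_to_2d(int x, int y, int z, int nx, int ny, int nz)
{
    Pos2D p;
    assert(x >= 0 && x < nx);
    assert(y >= 0 && y < ny);
    assert(z >= 0 && z < nz);
    p.col = x;          /* x is kept as the column index */
    p.row = z * ny + y; /* slice z starts at row z * ny  */
    return p;
}
```

For a 10x10x10 volume, element (3, 4, 2) lands at row 24, column 3, and the whole volume occupies 100 rows of 10 columns.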

                By the way, are you using v1.0beta from the website or 0.9alpha? I wanted to make sure you were using v1.0beta to make sure your 2D issue wasn't due to something in the older SDK.

                Michael.
              • 3D streams
                bonissent
                Well, everything works now (apart from 3D of course). I realized that I was using the 2D output stream as a C array, which it is not. Access to the stream data needs to be done with a pointer, properly updated.
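As a rough illustration of the pointer-based access described above, here is a sketch in plain C. It assumes the result has already been copied into a flat, row-major buffer of floats; the copy itself is left out, and `result_at` is a hypothetical helper name, not part of Brook+.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: element (i, j) of a result stored in a flat,
 * row-major buffer that holds `width` floats per row.
 */
static float result_at(const float *buf, int width, int i, int j)
{
    assert(buf != NULL);
    assert(width > 0);
    assert(i >= 0 && j >= 0 && j < width);
    return buf[i * width + j]; /* row i starts at offset i * width */
}
```

Printing an element would then look like `printf(" Output %d %d : %f \n", i, j, result_at(output, dim, i, j));`, with `%f` rather than `%d` because the values are floats.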

                I also understood that I can do everything I need using 1D streams and computing the 3D indices myself from the 1D index.
                However, I see that you have a rather small limit on the stream sizes. This is no problem for tests, but it may not be appropriate for using the gigantic memory available on the FireStream. Do you plan to extend this limit of 8192 elements?
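The index computation mentioned above (recovering 3D indices from a 1D index) could look roughly like this in plain C. The names `Index3D`, `unflatten3d` and `flatten3d` are hypothetical, and the ordering with x varying fastest is an assumption; inside a kernel the linear index would presumably come from `indexof` on the output stream.

```c
#include <assert.h>

typedef struct {
    int x;
    int y;
    int z;
} Index3D;

/* Hypothetical helper: split a linear index into (x, y, z), x fastest. */
static Index3D unflatten3d(int linear, int nx, int ny, int nz)
{
    Index3D idx;
    assert(nx > 0 && ny > 0 && nz > 0);
    assert(linear >= 0 && linear < nx * ny * nz);
    idx.x = linear % nx;
    idx.y = (linear / nx) % ny;
    idx.z = linear / (nx * ny);
    return idx;
}

/* Inverse mapping: (x, y, z) back to the linear index. */
static int flatten3d(Index3D idx, int nx, int ny)
{
    return (idx.z * ny + idx.y) * nx + idx.x;
}
```

With a 10x10x10 volume, linear index 243 corresponds to (x, y, z) = (3, 4, 2), and flattening that triple gives 243 again. The 1000 elements of such a volume stay well under the 8192-element limit.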
                  • 3D streams
                    michael.chu
                    Hi bonissent,

                    I think the limit is 8192 for 1D and 8192x8192 for 2D.

                    This is an artifact of hardware implementation which we really should be abstracting for you. I am going to file a request to the engineers to see if they can abstract that in Brook+.

                    If you deal in 2D streams then that should allow you to operate on up to 1GB if you work with float4s.
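The 1GB figure can be checked with a little arithmetic, sketched here in plain C. The 8192 per-dimension limit is the value quoted in this thread and is hard-coded as an assumption, not queried from the hardware; `stream_bytes` and `fits_2d_limit` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Limit quoted in this thread; assumed, not queried from the device. */
#define MAX_STREAM_DIM 8192u
#define BYTES_PER_FLOAT4 16u /* four 32-bit floats */

/* Bytes needed for a width x height 2D stream with the given element size. */
static uint64_t stream_bytes(uint32_t width, uint32_t height, uint32_t elem_bytes)
{
    assert(elem_bytes > 0);
    return (uint64_t)width * height * elem_bytes;
}

/* Returns 1 if both dimensions are within the quoted per-dimension limit. */
static int fits_2d_limit(uint32_t width, uint32_t height)
{
    return width > 0 && height > 0 &&
           width <= MAX_STREAM_DIM && height <= MAX_STREAM_DIM;
}
```

An 8192x8192 stream of float4 elements comes to 8192 * 8192 * 16 = 1,073,741,824 bytes, which is exactly 1 GiB; the same stream of plain floats would take a quarter of that.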

                    Michael.
                      • 3D streams
                        nberger
                        Hi!

                        I strongly support abstraction of stream length issues; this would be very helpful for my application. Is there any chance that this will be available in the 1.0 release?

                        Thanks

                        Nik
                          • 3D streams
                            michael.chu
                            Hi Nik,

                            Unfortunately, the engineering team is already in testing at the moment for v1.0 so I probably won't be able to get that in.

                            It's on my list of proposed features and I'll bring it up with them after this release.

                            Michael.
                      • 3D streams
                        bonissent
                        After being diverted for a while, I come back to multiple dimensions. My understanding from the message above and from reading the documentation was that I am allowed to use 2D streams of up to 8192x8192. I need 4000x2000, so I expected to be safe. Yet the code fragment below has problems:

                        If I comment out the line labelled "B fails", it works fine; if I uncomment it and comment out line A instead, it crashes without any message, except for Windows asking if I want to notify Microsoft.

                        You will see that the 400x200 case works while the 4000x2000 case fails. Therefore there must be something I fail to understand about the maximum permitted sizes. Thanks for pointing out my mistake.

                        =========================== below the code ===========================
                        float VR<4000, 2000>;
                        float VR2<400, 200>;

                        float* inputVR;
                        float* inputVR2;

                        inputVR = (float *)malloc(4*8000000);
                        inputVR2 = (float *)malloc(4*80000);
                        memset(inputVR, 0, 4*8000000);
                        memset(inputVR2, 0, 4*80000);
                        printf("stream read 0\n");
                        streamRead(VR2, inputVR2); // A works
                        printf("stream read 1\n");
                        streamRead(VR, inputVR); // B fails
                        printf("stream read 2\n");

                        printf(" after stream read\n");
                        exit(0);
                          • 3D streams
                            marcr

                            Hi,

                            I just built and ran your example on a Windows XP 32 bit system with
                            a Radeon 3870, using an 8.4 driver and the CAL/Brook beta off of the
                            public SDK site. It ran fine. You might be running into some system
                            or gfx card specific limit. There are a couple of CAL example programs
                            that show how to query such limits, such as "throughput" and
                            "OpenCloseDevice", which look at the device attrib and device info structs.

                            -- marcr
                          • 3D streams
                            bonissent
                            This is new to me. In fact I always wondered whether one had to manage the memory, for example by clearing old unused streams. I understand now that it is my job to do that, via CAL coding. Is it not possible to do it from Brook+? I do not remember seeing it in the Brook+ documentation; did I miss it?
                                  • 3D streams
                                    marcr

                                    Regarding dynamic allocation of streams, this is currently not supported at the Brook+ level. It is implicit in the sense that a stream is deallocated at the end of the block in which it was created.

                                    If this is something essential to a given app, there may be a way to do it at the level of the C++ code produced by Brook+; I'd have to look into it. -- marc

                                  • 3D streams
                                    bonissent
                                    Sorry for not seeing the answer earlier. This is relatively important for me because I want to have my streams (some of them at least) as global variables so that they can stay inside the video card and I save the transfer time (some of my streams contain constants, and some contain accumulators). This is nice, but I do not know how to create such a stream with a size determined at execution time.


                                    Another question, somewhat unrelated: when I terminate my program with Ctrl-C, I receive several lines of the following message:

                                    CAL resrouce could not be freed <0>

                                    If it can help, the spelling is exactly that: "resrouce", not that I care much but... My understanding is that it is telling me that I did not clean up my streams, which is true. Is this a serious problem? I seem to live happily despite these warnings.