5 Replies Latest reply on Aug 11, 2009 6:12 PM by titanius

    Reduction kernels with structs

    lasagna
      To find the smallest item

      Hi everyone,

       

      I'm trying to write a kernel to find the 'smallest' item in a 4096 by 4096 matrix that consists out of structs called pair. It looks to me that a reduce kernel is perfect for this. However, when I use my struct, it doesn't work, while floats work perfectly as in the documentation. Basicly nothing happens to the result, all values are always 0 no matter the input data. (even if I set the result pair's values before calling the kernel)

      I have also tried to use a modified bitonic sort kernel to find the smallest item. This works, but it takes me about 14 seconds to sort a 4096 by 4096 matrix which is a bit much as I'm going to be calling it quite a lot.

      Does anyone have any idea's how to fix my code, or a better way to find the smallest item in a rather large matrix?

      My code is attached below.

       

      Kind regards and many thanks in advance!

       

      Rob

      /* brook file */ typedef struct stpair { uchar2 coordsY; uchar2 coordsZ; int difference; } pair; reduce void min_pair(pair a<>, reduce pair b) { if(a.difference < b.difference) b = a; } /* main.cpp */ #include <conio.h> #include "brook/brook.h" #include "kernels.h" int main() { unsigned int i_dimensions[] = {10, 10}; Stream<pair> stmTest(2, i_dimensions); /* create dummy data */ pair flTest[10][10]; for(int i=0; i<10; i++) for(int j=0; j<10; j++) { flTest[i][j].coordsY = uchar2((uchar)i, (uchar)j); flTest[i][j].coordsZ = uchar2((uchar)j, (uchar)i); flTest[i][j].difference = (10); /* difference value of 10 for all items */ } stmTest.read(flTest); /* create result pair to store smallest item */ pair result; result.coordsY = uchar2((uchar)1, (uchar)2); result.coordsZ = uchar2((uchar)3, (uchar)4); result.difference = 40; /* set to 40 to make sure every item is smaller */ /* call kernel */ min_pair(stmTest, result); printf("%d at %d %d", result.difference, result.coordsY.x, result.coordsY.y); /* printed result is always 40 at 1 2, nothing happens to pair result */ getch(); return 0; }

        • Reduction kernels with structs
          gaurav.garg

          Structures are not supported in reduction kernel.

            • Reduction kernels with structs
              lasagna

              Thanks Guarav, that's what I was afraid of. 

              Does anyone have any idea's how I could find these items other than sorting the entire matrix?

                • Reduction kernels with structs
                  titanius

                  Could you post more of what comparisons are done in your algorithm?

                   

                  Because, a single pass over the data should be enough to find the minimum rather than doing a sort. So, maybe i am missing something.

                    • Reduction kernels with structs
                      lasagna

                       

                      Originally posted by: titanius Could you post more of what comparisons are done in your algorithm?

                       

                       

                       

                      Because, a single pass over the data should be enough to find the minimum rather than doing a sort. So, maybe i am missing something.

                       

                       

                      Hi Titanius,

                      My goal is to simply find the 12 lowest pairs in a 4096 by 4096 matrix, by their difference value. The catch is that I need the 2 coordinates from the struct that correspond with them.

                      I have tried a single pass over the data with a double for loop, but at the size of 4096 by 4096 my driver crashes or I get an access violation error. It works fine up to about 3000²~ for me. Asides from that, it also feels like such a big iteration is sloppy when trying to work in parallel, so I checked out different possibilities

                      By sorting them, I could simply take the 12 first items and copy them. This of course takes a lot of time.

                      I was looking at a reduction kernel so I could get the smallest item, save it, increase it greatly in the matrix and iterate the process 12 times to get the 12 smallest items.

                      I really don't have a clue if this would be faster than sorting but I'm willing to try anything to gain performance. My next guess will be trying to figure out a way to chop up the data safely and look at those parts individually.

                       

                        • Reduction kernels with structs
                          titanius

                          I think Sort might have a large overhead. Roughly a larger O(n*log(n)) overhead compared to O(n) that you have with a single pass

                          I think you might have a better chance at a reduction kernel as with sorting you also have to move (lots of) data around but with 12 passes of minimums you wont have to

                           

                          If you want to go the sorting route, might i suggest a [descending] bubble sort http://en.wikipedia.org/wiki/Bubble_sort  (look at the alternative implementations section). With every pass you can make sure that the largest/smallest value will be fixed. So you can code up so that after the first 12 iterations, the 12 largest/smallest value will be bubble-sorted to their positions.

                           

                          edited: to add the second para.

                          edited: added bubblesorting way