10 Replies Latest reply on Sep 17, 2008 1:24 PM by MicahVillmow

    Segmentation fault for repeated execution of a reduction kernel on a 4850

    jopakastner

      Hi,

      the following brook+ code results in a segmentation fault on my 4850 with SDK 1.2 and Catalyst 8.8:


      reduce void sum(float a<>, reduce float b<>) {
          b = b + a;
      }

       

      int main() {
          float x<10>;
          float x_in[10];
          float s;
          int i;

          for (i=0; i<10; i++)   // fill all 10 elements of x_in
              x_in[i] = (float)i;

          streamRead(x,x_in);

          sum(x,s);
          sum(x,s);  // 2 calls to sum are OK
          sum(x,s);  // 3rd call --> segmentation fault

          return 0;
      }


      This error is somewhat weird, since two successive calls to the reduction kernel work fine, but the third call results in a segmentation fault. The same problem occurs if the reduction kernel is called inside a for or while loop. Has anyone experienced a similar problem?

      My configuration:
      - OpenSuSE 11.0 64bit
      - HD 4850
      - Stream SDK 1.2
      - Catalyst 8.8

       

      --- johannes

        • Segmentation fault for repeated execution of a reduction kernel on a 4850
          JiaweiOu

           


          I have a similar problem: the kernel function works only on its first invocation, and the second invocation causes a memory access error. There is no error message, so I don't know where the problem is, but the stack trace suggests that it may lie in the CAL layer.

          However, when I roll back to the 1.1 API, everything works fine.

            • Segmentation fault for repeated execution of a reduction kernel on a 4850
              moodz

              s should be declared as a stream, s<10>. You are attempting to sum a vector into a single float variable.

              e.g.

              int main() {
                  float x<10>;
                  float x_in[10];
                  float s<10>;
                  int i;

              The segfaults go away when you do this. 

               

               

                • Segmentation fault for repeated execution of a reduction kernel on a 4850
                  moodz

                  Actually ... declare s as 

                  float s<1>; and the segfault goes away.

                    • Segmentation fault for repeated execution of a reduction kernel on a 4850
                      jopakastner

                      Unfortunately, that is not the solution to the problem. I've already tried every possible declaration for s, but the segmentation fault always occurs in the third invocation of the reduction kernel. However, the code works fine in CPU emulation mode.

                      --- johannes

                        • Segmentation fault for repeated execution of a reduction kernel on a 4850
                          moodz

                           

                          Hey Johannes, below is the source that works on my platform.

                          If I don't declare s as a stream, it definitely segfaults.

                          --------------------------------------------------------------------------------

                          # Brook source code ..... sum.br

                          #include <stdio.h>
                          #include "common.h"

                          reduce void sum(float a<>, reduce float b<>)
                          {
                              b += a;
                          }

                          int main() {
                              float x<10>;
                              float x_in[10];
                              float s<1>;
                              int i = 0;

                              for (i=0; i<10; i++)   // fill all 10 elements of x_in
                                  x_in[i] = (float)i;

                              streamRead(x,x_in);

                              sum(x,s);
                              sum(x,s);  // 2 calls to sum are OK
                              sum(x,s);  // 3rd call --> segmentation fault

                              return 0;
                          }

                          -----------------------------------------

                          Makefile  ....

                           

                          ROOTDIR := ../../..
                          COMMONDIR := ../../common
                          vpath %.cpp $(COMMONDIR)
                          OUTPUTBASE := samples/bin
                          FILES := \
                                  common \
                                  sum \
                                  Timer
                          SDKLIBS := brook

                          GENERATE_EXECUTABLE := sum
                          include $(ROOTDIR)/samples/utils/build/main.mk
                          INCLUDEDIR += $(C_INCLUDE_FLAG)"$(COMMONDIR)"


                          -----------------------------------------------------------------------

                          Make results

                          [root@uncle05 sum]# make
                          make: Entering directory `/usr/local/amdbrook/samples/tests/sum'
                          mkdir -p depends
                          Rebuilding dependencies for ../../common/Timer.cpp
                          perl ../../../samples/utils/build/fastdep.pl -I. -I../../../sdk/include -I  "../../common" --obj-suffix='.o' --obj-prefix='built_d/' ../../common/Timer.cpp > depends/Timer.depend
                          mkdir -p depends
                          Rebuilding dependencies for sum.br
                          perl ../../../samples/utils/build/fastdep.pl -I. -I../../../sdk/include -I  "../../common" --obj-suffix='.o' --obj-prefix='built_d/' sum.br > depends/sum.depend
                          mkdir -p depends
                          Rebuilding dependencies for ../../common/common.cpp
                          perl ../../../samples/utils/build/fastdep.pl -I. -I../../../sdk/include -I  "../../common" --obj-suffix='.o' --obj-prefix='built_d/' ../../common/common.cpp > depends/common.depend
                          make: Leaving directory `/usr/local/amdbrook/samples/tests/sum'
                          make: Entering directory `/usr/local/amdbrook/samples/tests/sum'
                          mkdir -p ../../../samples/bin/lnx_x86_64
                          mkdir -p built_d
                          g++  -Wfloat-equal -Wpointer-arith  -g3 -ffor-scope  -o built_d/common.o -c  -I  ../../../sdk/include -I  "../../common" ../../common/common.cpp
                          ../../common/common.cpp: In function ‘int floatCompare(float, float)’:
                          ../../common/common.cpp:372: warning: passing ‘float’ for argument 1 to ‘int abs(int)’
                          ../../common/common.cpp: In function ‘int doubleCompare(double, double)’:
                          ../../common/common.cpp:886: warning: passing ‘double’ for argument 1 to ‘int abs(int)’
                          mkdir -p built_d
                          ../../../sdk/bin/brcc_d  -o built_d/sum sum.br
                          mkdir -p built_d
                          g++  -Wfloat-equal -Wpointer-arith  -g3 -ffor-scope  -o built_d/sum.o -c  -I  ../../../sdk/include -I  "../../common" -I  . built_d/sum.cpp
                          mkdir -p built_d
                          g++  -Wfloat-equal -Wpointer-arith  -g3 -ffor-scope  -o built_d/Timer.o -c  -I  ../../../sdk/include -I  "../../common" ../../common/Timer.cpp
                          Building ../../../samples/bin/lnx_x86_64/sum_d
                          g++ -o ../../../samples/bin/lnx_x86_64/sum_d built_d/common.o built_d/sum.o built_d/Timer.o -lpthread -L/usr/X11R6/lib     -L../../../sdk/lib  -l brook_d  -L../../../samples/bin/lnx_x86_64
                          make: Leaving directory `/usr/local/amdbrook/samples/tests/sum'
                          [root@uncle05 sum]#

                          -------------------------------------------------------------------------------------------------

                          I called sum 10 times and it doesn't segfault.

                          However, change s to s<8>, for example, and you get the following output:

                          [root@uncle05 sum]# ./../../bin/lnx_x86_64/sum_d
                          Assertion failure: calkernel.cpp (1027): Reduction output width is not an integer divisor of input width
                          sum_d: calbase.hpp:92: void CALAssertImpl(const char*, int, const char*): Assertion `0' failed.
                          Aborted
                          [root@uncle05 sum]#

                          The clue here is in the message itself: the output width must be an integer divisor of the input width.
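The rule in that assertion message can be written as a one-line check. This is only a sketch: `reduction_width_ok` is a hypothetical helper name, not part of Brook+ or CAL, and it encodes nothing beyond the wording of the message (output width must divide input width).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not a Brook+ or CAL function: returns true when
 * out_width is a non-zero integer divisor of in_width, which is the
 * condition named in the assertion message. */
bool reduction_width_ok(size_t in_width, size_t out_width)
{
    if (out_width == 0 || out_width > in_width)
        return false;
    return in_width % out_width == 0;
}
```

For an input width of 10, the widths 1, 2, 5 and 10 pass this check, while 3 and 8 do not.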

                          OK, after further testing and calling sum multiple times (up to eleven in this case), it would seem that the only safe value for s is s<10>: the reduction output width must equal the input width.

                          I am running this on an HD 4870, so you should be able to replicate the results.

                          moodz

                           

                           

                            • Segmentation fault for repeated execution of a reduction kernel on a 4850
                              moodz

                              oops ... s must be declared as s<10>  not s<1> as above .... my bad.

                                • Segmentation fault for repeated execution of a reduction kernel on a 4850
                                  jopakastner

                                  Thanks a lot for your help -- you're right: the length of the output stream obviously has to match the input length. However, in this case the reduction kernel is not very useful for my purpose: I want to calculate the abs sum of a large stream (~million elements), so I'd have to waste 999999 elements for nothing ;-)  I think reduction really needs some improvement ...
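For checking whatever the GPU eventually returns, a plain host-side reference for the absolute sum is easy to write. This is a minimal sketch in C, not Brook+ code: `abs_sum_ref` is a made-up name, and accumulating in double is an assumption chosen here to limit rounding error over about a million floats.

```c
#include <stddef.h>

/* Host-side reference (hypothetical helper): sum of |x[i]| for i in [0, n).
 * The accumulator is a double to reduce rounding error on long inputs. */
double abs_sum_ref(const float *x, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double v = (double)x[i];
        acc += (v < 0.0) ? -v : v;
    }
    return acc;
}
```

The result can be compared against the single output element of a reduction, with a tolerance that allows for the GPU summing in single precision and in a different order.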

                                   

                                    • Segmentation fault for repeated execution of a reduction kernel on a 4850
                                      udeepta@amd

                                      Guys, to confirm -- can you call

                                      float streamA<10>;
                                      float streamB<1>;

                                      sum(streamA, streamB);

                                      With this kernel

                                      reduce  void sum(float a<>, reduce float b<>)
                                      {
                                              b+=a;
                                      }

                                      AFAIK, this should work.

                                      float streamB<2> should work, as should float streamB<5>.

                                      float streamB<3> will not, since 10 is not an integral multiple of 3. Summing 3.3 elements of streamA to one element of streamB is ill defined. (Well, for most of us anyway.)
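To make the divisibility rule concrete, here is a small CPU reference in C. It rests on an assumption this thread does not confirm: that each output element receives the sum of one contiguous block of size(A)/size(B) input elements. The name `reduce_sum_ref` is invented for illustration and is not part of Brook+.

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference for a sum reduction from in_len to out_len elements.
 * Assumption: output element j is the sum of the j-th contiguous block
 * of in_len / out_len inputs. The assert mirrors the divisor rule. */
void reduce_sum_ref(const float *in, size_t in_len, float *out, size_t out_len)
{
    assert(out_len > 0 && in_len % out_len == 0);
    size_t block = in_len / out_len;
    for (size_t j = 0; j < out_len; ++j) {
        float acc = 0.0f;
        for (size_t k = 0; k < block; ++k)
            acc += in[j * block + k];
        out[j] = acc;
    }
}
```

With the input 0, 1, ..., 9, an output width of 1 gives 45, and an output width of 5 gives 1, 5, 9, 13, 17 under the block assumption. An output width of 3 trips the assert, which matches the "3.3 elements" problem described above.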

                                        • Segmentation fault for repeated execution of a reduction kernel on a 4850
                                          jopakastner

                                          Ok, I've tested this code with various combinations for the sizes of A and B.

                                          Config:
                                          HD 4850
                                          SDK 1.2
                                          Catalyst 8.8
                                          Linux 64 (openSuSE 10.3)

                                          Results:

                                          size(A)   size(B)   Result
                                          -------------------------------------------------------------------------------
                                          4         4         ok
                                          4         2         seg. fault during 4th invocation of sum
                                          4         1         seg. fault during 5th invocation
                                          -------------------------------------------------------------------------------
                                          8         8         ok
                                          8         4         seg. fault during 4th invocation
                                          8         2         seg. fault during 5th invocation
                                          8         1         Assertion failure: calcontext.cpp (1251)
                                          -------------------------------------------------------------------------------
                                          10        10        ok
                                          10        5         seg. fault during 3rd invocation
                                          10        2         seg. fault during 4th invocation
                                          10        1         seg. fault during 8th invocation
                                          -------------------------------------------------------------------------------
                                          16        16        ok
                                          16        8         seg. fault during 4th invocation
                                          16        4         seg. fault during 5th invocation
                                          16        2         Assertion failure: calcontext.cpp (1251)
                                          16        1         seg. fault during 4th invocation
                                          -------------------------------------------------------------------------------
                                          100       100       ok
                                          100       50        seg. fault during 4th invocation
                                          100       2         seg. fault during 4th invocation
                                          -------------------------------------------------------------------------------
                                          256       256       ok
                                          256       128       seg. fault during 4th invocation
                                          256       64        seg. fault during 5th invocation
                                          256       32        Assertion failure: calcontext.cpp (1251)
                                          256       16        seg. fault during 4th invocation
                                          256       8         seg. fault during 3rd invocation
                                          256       4         seg. fault during 5th invocation
                                          256       2         seg. fault during 4th invocation
                                          256       1         seg. fault during 3rd invocation
                                          -------------------------------------------------------------------------------

                                          The number of calls before a seg. fault is reproducible for each combination.
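One thing worth noting about the table: every size(B) in it divides size(A) evenly, so none of these failures is explained by the "integer divisor" assertion reported earlier in the thread. The sketch below checks that mechanically in C. The struct and function names are made up for illustration; only the size pairs come from the table.

```c
#include <stddef.h>

struct size_pair { unsigned a; unsigned b; };

/* Size combinations copied from the results table. */
const struct size_pair tested[] = {
    {4, 4}, {4, 2}, {4, 1},
    {8, 8}, {8, 4}, {8, 2}, {8, 1},
    {10, 10}, {10, 5}, {10, 2}, {10, 1},
    {16, 16}, {16, 8}, {16, 4}, {16, 2}, {16, 1},
    {100, 100}, {100, 50}, {100, 2},
    {256, 256}, {256, 128}, {256, 64}, {256, 32}, {256, 16},
    {256, 8}, {256, 4}, {256, 2}, {256, 1},
};

/* Counts the pairs whose output size is not an integer divisor of the
 * input size. */
size_t count_rule_violations(const struct size_pair *p, size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i)
        if (p[i].b == 0 || p[i].a % p[i].b != 0)
            ++bad;
    return bad;
}
```

Calling `count_rule_violations(tested, sizeof tested / sizeof tested[0])` returns 0, which supports treating the crashes as a separate problem from the width check.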

                                          Hope this helps!

                                          Greetings
                                          Johannes

                          • Segmentation fault for repeated execution of a reduction kernel on a 4850
                            MicahVillmow
                            Thanks for the bug report, we will work on getting this fixed.