Archives Discussions

jopakastner · ‎09-12-2008

Hi,

the following brook+ code results in a segmentation fault on my 4850 with SDK 1.2 and Catalyst 8.8:

reduce void sum(float a<>, reduce float b<>) {
    b = b + a;
}

int main() {
    float x<10>;
    float x_in[10];
    float s;
    int i;

    for (i=0; i<9; i++)
        x_in[i] = (float)i;

    streamRead(x,x_in);

    sum(x,s);
    sum(x,s); // 2 calls to sum are OK
    sum(x,s); // 3rd call --> segmentation fault

    return 0;
}

This error is somewhat weird: two successive calls to the reduction kernel work fine, but the third call results in a segmentation fault. The same problem occurs if the reduction kernel is called inside a for or while loop. Has anyone experienced a similar problem?

My configuration:
- OpenSuSE 11.0 64bit
- HD 4850
- Stream SDK 1.2
- Catalyst 8.8

--- johannes

JiaweiOu · ‎09-12-2008

I have a similar problem: the kernel function only works on the first invocation, and the second invocation causes a memory access error. There is no error message, so I don't know where the problem is, but the stack trace suggests that it may lie in the CAL layer.

However, when I roll back to the 1.1 API, everything works fine.

moodz · ‎09-12-2008

s should be declared as a stream, s<10> ... you are attempting to sum a vector into a single float variable.

e.g.

int main() {
    float x<10>;
    float x_in[10];
    float s<10>;
    int i;

The segfaults go away when you do this.

moodz · ‎09-12-2008

Actually ... declare s as

float s<1>; and the segfault goes away.

jopakastner · ‎09-15-2008

Unfortunately, that is not the solution to the problem. I've already tried every possible declaration for s, but the segmentation fault always occurs in the third invocation of the reduction kernel. However, the code works fine in CPU emulation mode.

--- johannes

moodz · ‎09-16-2008

Hey Johannes, below is the source that works on my platform ...

If I don't declare s as a stream, it definitely segfaults.

--------------------------------------------------------------------------------

# Brook source code ..... sum.br

#include <stdio.h>
#include "common.h"

reduce void sum(float a<>, reduce float b<>)
{
    b += a;
}

int main() {
    float x<10>;
    float x_in[10];
    float s<1>;
    int i = 0;

    for (i=0; i<9; i++)
        x_in[i] = (float)i;

    streamRead(x,x_in);

    sum(x,s);
    sum(x,s); // 2 calls to sum are OK
    sum(x,s); // 3rd call --> segmentation fault

    return 0;
}

-----------------------------------------

Makefile ....

ROOTDIR := ../../..
COMMONDIR := ../../common
vpath %.cpp $(COMMONDIR)
OUTPUTBASE := samples/bin
FILES := \
        common \
        sum \
        Timer
SDKLIBS := brook

GENERATE_EXECUTABLE := sum
include $(ROOTDIR)/samples/utils/build/main.mk
INCLUDEDIR += $(C_INCLUDE_FLAG)"$(COMMONDIR)"

Makefile (END)

-----------------------------------------------------------------------

Make results

[root@uncle05 sum]# make
make: Entering directory `/usr/local/amdbrook/samples/tests/sum'
mkdir -p depends
Rebuilding dependencies for ../../common/Timer.cpp
perl ../../../samples/utils/build/fastdep.pl -I. -I../../../sdk/include -I "../../common" --obj-suffix='.o' --obj-prefix='built_d/' ../../common/Timer.cpp > depends/Timer.depend
mkdir -p depends
Rebuilding dependencies for sum.br
perl ../../../samples/utils/build/fastdep.pl -I. -I../../../sdk/include -I "../../common" --obj-suffix='.o' --obj-prefix='built_d/' sum.br > depends/sum.depend
mkdir -p depends
Rebuilding dependencies for ../../common/common.cpp
perl ../../../samples/utils/build/fastdep.pl -I. -I../../../sdk/include -I "../../common" --obj-suffix='.o' --obj-prefix='built_d/' ../../common/common.cpp > depends/common.depend
make: Leaving directory `/usr/local/amdbrook/samples/tests/sum'
make: Entering directory `/usr/local/amdbrook/samples/tests/sum'
mkdir -p ../../../samples/bin/lnx_x86_64
mkdir -p built_d
g++ -Wfloat-equal -Wpointer-arith -g3 -ffor-scope -o built_d/common.o -c -I ../../../sdk/include -I "../../common" ../../common/common.cpp
../../common/common.cpp: In function ‘int floatCompare(float, float)’:
../../common/common.cpp:372: warning: passing ‘float’ for argument 1 to ‘int abs(int)’
../../common/common.cpp: In function ‘int doubleCompare(double, double)’:
../../common/common.cpp:886: warning: passing ‘double’ for argument 1 to ‘int abs(int)’
mkdir -p built_d
../../../sdk/bin/brcc_d -o built_d/sum sum.br
mkdir -p built_d
g++ -Wfloat-equal -Wpointer-arith -g3 -ffor-scope -o built_d/sum.o -c -I ../../../sdk/include -I "../../common" -I . built_d/sum.cpp
mkdir -p built_d
g++ -Wfloat-equal -Wpointer-arith -g3 -ffor-scope -o built_d/Timer.o -c -I ../../../sdk/include -I "../../common" ../../common/Timer.cpp
Building ../../../samples/bin/lnx_x86_64/sum_d
g++ -o ../../../samples/bin/lnx_x86_64/sum_d built_d/common.o built_d/sum.o built_d/Timer.o -lpthread -L/usr/X11R6/lib -L../../../sdk/lib -l brook_d -L../../../samples/bin/lnx_x86_64
make: Leaving directory `/usr/local/amdbrook/samples/tests/sum'
[root@uncle05 sum]#

-------------------------------------------------------------------------------------------------

I called sum 10 times and it doesn't segfault.

However, change s to s<8>, for example, and you get the following output:

[root@uncle05 sum]# ./../../bin/lnx_x86_64/sum_d
Assertion failure: calkernel.cpp (1027): Reduction output width is not an integer divisor of input width
sum_d: calbase.hpp:92: void CALAssertImpl(const char*, int, const char*): Assertion `0' failed.
Aborted
[root@uncle05 sum]#

The clue is in the message itself ... the output width must be an integer divisor of the input width.

Ok ... after further testing and calling sum multiple times (up to eleven in this case), it would seem that the only safe value for s is s<10> ... the reduction output width must equal the input width.

I am running this on a HD 4870 so you should be able to replicate results.

moodz
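
The divisibility rule quoted in the assertion message above can be checked on the host before a reduction is launched. The following is a minimal sketch in plain C, not Brook+ code; `reduction_width_ok` is a hypothetical helper name and is not part of the Brook+ or CAL API. It only encodes the rule from the assertion text and says nothing about the segmentation faults discussed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of Brook+ or CAL): returns true when
 * out_width is a non-zero integer divisor of in_width, which is the
 * condition named in the CAL assertion message. */
static bool reduction_width_ok(size_t in_width, size_t out_width)
{
    assert(in_width > 0);   /* this sketch assumes a non-empty input stream */
    if (out_width == 0 || out_width > in_width)
        return false;
    return in_width % out_width == 0;
}
```

With an input width of 10, this accepts output widths 1, 2, 5 and 10 and rejects 8, matching the assertion failure shown for s<8>.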

moodz · ‎09-16-2008

oops ... s must be declared as s<10> not s<1> as above .... my bad.

jopakastner · ‎09-16-2008

Thanks a lot for your help -- you're right: the length of the output stream obviously has to match the input length. However, in this case the reduction kernel is not very useful for my purpose: I want to calculate the abs sum of a large stream (~million elements), so I'd have to waste 999999 elements for nothing 😉 I think reduction really needs some improvement ...
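
For a sum of absolute values over a large stream, a CPU reference is useful for validating whatever the GPU reduction returns. The following is a minimal sketch in plain C, not Brook+ code; `abs_sum_reference` is a hypothetical name, and the choice of a double accumulator is an assumption made here to limit rounding error over roughly a million float elements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference (not Brook+): sums the absolute values of
 * n floats. Accumulates in double to reduce rounding error on long inputs. */
static double abs_sum_reference(const float *data, size_t n)
{
    assert(data != NULL || n == 0);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        double v = (double)data[i];
        acc += (v < 0.0) ? -v : v;   /* absolute value without needing libm */
    }
    return acc;
}
```

The GPU result can then be compared against this value with a tolerance, since a float reduction on the device will generally round differently.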

udeepta · ‎09-16-2008

Guys, to confirm -- can you call

float streamA<10>;
float streamB<1>;

sum(streamA, streamB);

With this kernel

reduce void sum(float a<>, reduce float b<>)
{
b+=a;
}

AFAIK, this should work.

float streamB<2> should work, as should float streamB<5>.

float streamB<3> will not, since 10 is not an integral multiple of 3. Summing 3.3 elements of streamA to one element of streamB is ill defined. (Well, for most of us anyway.)
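
To make the partial reduction concrete, here is a CPU sketch in plain C of what reducing 10 inputs to 5, 2 or 1 outputs could compute. It is not Brook+ code, `partial_sum` is a hypothetical name, and it assumes each output element sums a contiguous block of inputs; the real runtime may group input elements differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference (not Brook+). Assumption: output j sums the
 * contiguous block in[j*block .. (j+1)*block - 1], with block = in_n / out_n.
 * Returns 0 on success, -1 when out_n is not an integer divisor of in_n. */
static int partial_sum(const float *in, size_t in_n, float *out, size_t out_n)
{
    assert(in != NULL && out != NULL);
    if (out_n == 0 || in_n % out_n != 0)
        return -1;                      /* e.g. 10 inputs into 3 outputs */
    size_t block = in_n / out_n;
    for (size_t j = 0; j < out_n; j++) {
        float acc = 0.0f;
        for (size_t k = 0; k < block; k++)
            acc += in[j * block + k];
        out[j] = acc;
    }
    return 0;
}
```

Under that assumption, the inputs 0..9 reduced to 5 outputs give 1, 5, 9, 13, 17, and reduced to 1 output give 45, while a request for 3 outputs is rejected.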

jopakastner · ‎09-17-2008

Ok, I've tested this code with various combinations for the sizes of A and B.

Config:
HD 4850
SDK 1.2
Catalyst 8.8
Linux 64 (openSuSE 10.3)

Results:

size(A)   size(B)   Result
-------------------------------------------------------------------------------
4         4         ok
4         2         seg. fault during 4th invocation of sum
4         1         seg. fault during 5th invocation
-------------------------------------------------------------------------------
8         8         ok
8         4         seg. fault during 4th invocation
8         2         seg. fault during 5th invocation
8         1         Assertion failure: calcontext.cpp (1251)
-------------------------------------------------------------------------------
10        10        ok
10        5         seg. fault during 3rd invocation
10        2         seg. fault during 4th invocation
10        1         seg. fault during 8th invocation
-------------------------------------------------------------------------------
16        16        ok
16        8         seg. fault during 4th invocation
16        4         seg. fault during 5th invocation
16        2         Assertion failure: calcontext.cpp (1251)
16        1         seg. fault during 4th invocation
-------------------------------------------------------------------------------
100       100       ok
100       50        seg. fault during 4th invocation
100       2         seg. fault during 4th invocation
-------------------------------------------------------------------------------
256       256       ok
256       128       seg. fault during 4th invocation
256       64        seg. fault during 5th invocation
256       32        Assertion failure: calcontext.cpp (1251)
256       16        seg. fault during 4th invocation
256       8         seg. fault during 3rd invocation
256       4         seg. fault during 5th invocation
256       2         seg. fault during 4th invocation
256       1         seg. fault during 3rd invocation
-------------------------------------------------------------------------------

The number of calls before a seg. fault is reproducible for each combination.

Hope this helps!

Greetings
Johannes

MicahVillmow · ‎09-17-2008

Thanks for the bug report. We will work on getting this fixed.

Segmentation fault for repeated execution of a reduction kernel on a 4850