Hi,
the following brook+ code results in a segmentation fault on my 4850 with SDK 1.2 and Catalyst 8.8:
reduce void sum(float a<>, reduce float b<>) {
    b = b + a;
}
int main() {
    float x<10>;
    float x_in[10];
    float s;
    int i;
    for (i=0; i<10; i++)
        x_in[i] = (float)i;
    streamRead(x,x_in);
    sum(x,s);
    sum(x,s); // 2 calls to sum are OK
    sum(x,s); // 3rd call --> segmentation fault
    return 0;
}
This error is somewhat weird: two successive calls to the reduction kernel work fine, but the third call results in a segmentation fault. The same problem occurs if the reduction kernel is called inside a for or while loop. Has anyone experienced a similar problem?
My configuration:
- OpenSuSE 11.0 64bit
- HD 4850
- Stream SDK 1.2
- Catalyst 8.8
--- johannes
Originally posted by: jopakastner
[original post quoted in full]
I have a similar problem: the kernel function only works on the first invocation, and the second invocation causes a memory access error. There is no error message, so I don't know where the problem is, but the stack trace suggests that the problem may lie in the CAL layer.
However, when I roll back to the 1.1 API, everything works fine.
s should be declared as a stream, s<10> ... you are attempting to sum a vector into a single float variable.
e.g.
int main() {
float x<10>;
float x_in[10];
float s<10>;
int i;
The segfaults go away when you do this.
Actually ... declare s as
float s<1>; and the segfault goes away.
Unfortunately, that is not the solution to the problem. I've already tried every possible declaration for s, but the segmentation fault always occurs in the third invocation of the reduction kernel. However, the code works fine in CPU emulation mode.
--- johannes
Hey Johannes, below is the source that works on my platform ...
If I don't declare s as a stream, it definitely segfaults.
--------------------------------------------------------------------------------
# Brook source code ..... sum.br
#include <stdio.h>
#include "common.h"
reduce void sum(float a<>, reduce float b<>)
{
    b+=a;
}
int main() {
    float x<10>;
    float x_in[10];
    float s<1>;
    int i = 0;
    for (i=0; i<10; i++)
        x_in[i] = (float)i;
    streamRead(x,x_in);
    sum(x,s);
    sum(x,s); // 2 calls to sum are OK
    sum(x,s); // 3rd call --> segmentation fault
    return 0;
}
-----------------------------------------
Makefile ....
ROOTDIR := ../../..
COMMONDIR := ../../common
vpath %.cpp $(COMMONDIR)
OUTPUTBASE := samples/bin
FILES := \
common \
sum \
Timer
SDKLIBS := brook
GENERATE_EXECUTABLE := sum
include $(ROOTDIR)/samples/utils/build/main.mk
INCLUDEDIR += $(C_INCLUDE_FLAG)"$(COMMONDIR)"
-----------------------------------------------------------------------
Make results
[root@uncle05 sum]# make
make: Entering directory `/usr/local/amdbrook/samples/tests/sum'
mkdir -p depends
Rebuilding dependencies for ../../common/Timer.cpp
perl ../../../samples/utils/build/fastdep.pl -I. -I../../../sdk/include -I "../../common" --obj-suffix='.o' --obj-prefix='built_d/' ../../common/Timer.cpp > depends/Timer.depend
mkdir -p depends
Rebuilding dependencies for sum.br
perl ../../../samples/utils/build/fastdep.pl -I. -I../../../sdk/include -I "../../common" --obj-suffix='.o' --obj-prefix='built_d/' sum.br > depends/sum.depend
mkdir -p depends
Rebuilding dependencies for ../../common/common.cpp
perl ../../../samples/utils/build/fastdep.pl -I. -I../../../sdk/include -I "../../common" --obj-suffix='.o' --obj-prefix='built_d/' ../../common/common.cpp > depends/common.depend
make: Leaving directory `/usr/local/amdbrook/samples/tests/sum'
make: Entering directory `/usr/local/amdbrook/samples/tests/sum'
mkdir -p ../../../samples/bin/lnx_x86_64
mkdir -p built_d
g++ -Wfloat-equal -Wpointer-arith -g3 -ffor-scope -o built_d/common.o -c -I ../../../sdk/include -I "../../common" ../../common/common.cpp
../../common/common.cpp: In function ‘int floatCompare(float, float)’:
../../common/common.cpp:372: warning: passing ‘float’ for argument 1 to ‘int abs(int)’
../../common/common.cpp: In function ‘int doubleCompare(double, double)’:
../../common/common.cpp:886: warning: passing ‘double’ for argument 1 to ‘int abs(int)’
mkdir -p built_d
../../../sdk/bin/brcc_d -o built_d/sum sum.br
mkdir -p built_d
g++ -Wfloat-equal -Wpointer-arith -g3 -ffor-scope -o built_d/sum.o -c -I ../../../sdk/include -I "../../common" -I . built_d/sum.cpp
mkdir -p built_d
g++ -Wfloat-equal -Wpointer-arith -g3 -ffor-scope -o built_d/Timer.o -c -I ../../../sdk/include -I "../../common" ../../common/Timer.cpp
Building ../../../samples/bin/lnx_x86_64/sum_d
g++ -o ../../../samples/bin/lnx_x86_64/sum_d built_d/common.o built_d/sum.o built_d/Timer.o -lpthread -L/usr/X11R6/lib -L../../../sdk/lib -l brook_d -L../../../samples/bin/lnx_x86_64
make: Leaving directory `/usr/local/amdbrook/samples/tests/sum'
[root@uncle05 sum]#
-------------------------------------------------------------------------------------------------
I called sum 10 times and it doesn't segfault.
However, change s to s<8>, for example, and you get the following output:
[root@uncle05 sum]# ./../../bin/lnx_x86_64/sum_d
Assertion failure: calkernel.cpp (1027): Reduction output width is not an integer divisor of input width
sum_d: calbase.hpp:92: void CALAssertImpl(const char*, int, const char*): Assertion `0' failed.
Aborted
[root@uncle05 sum]#
The clue here is in the message: the output width must be an integer divisor of the input width.
OK ... after further testing and calling sum multiple times (up to eleven in this case), it would seem that the only safe declaration for s is s<10>: the reduction output width must equal the input width.
I am running this on an HD 4870, so you should be able to replicate the results.
moodz
oops ... s must be declared as s<10> not s<1> as above .... my bad.
Thanks a lot for your help -- you're right: the length of the output stream obviously has to match the input length. However, in this case the reduction kernel is not very useful for my purpose: I want to calculate the abs sum of a large stream (~million elements), so I'd have to waste 999999 elements for nothing 😉 I think reduction really needs some improvement ...
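For checking GPU results against a known-good value, the absolute sum described above can be computed on the host. This is only a plain C sketch, not Brook+: the name `abs_sum_ref` is hypothetical, and accumulating in double is an assumption made here to limit rounding error over roughly a million elements.

```c
#include <stddef.h>

/* Hypothetical host-side reference: sum of |data[i]| for i in [0, n).
 * Accumulates in double (an assumption, not something the SDK does). */
double abs_sum_ref(const float *data, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double v = (double)data[i];
        acc += (v < 0.0) ? -v : v;  /* absolute value without <math.h> */
    }
    return acc;
}
```

Comparing this value against the stream read back from the GPU should use a tolerance, since the GPU reduction works in single precision.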
Guys, to confirm -- can you call
float streamA<10>;
float streamB<1>;
sum(streamA, streamB);
With this kernel
reduce void sum(float a<>, reduce float b<>)
{
b+=a;
}
AFAIK, this should work.
float streamB<2> should work, as should float streamB<5>.
float streamB<3> will not, since 10 is not an integral multiple of 3. Summing 3.3 elements of streamA to one element of streamB is ill defined. (Well, for most of us anyway.)
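To make the divisor rule concrete, here is a minimal CPU-side sketch in plain C (not Brook+). It assumes that a partial reduction folds each block of in_len / out_len consecutive input elements into one output element; that block layout, the zero starting value, and the name `reduce_sum_ref` are illustrative assumptions, not taken from the SDK.

```c
#include <stddef.h>

/* Hypothetical CPU reference for a partial sum reduction.
 * Returns 0 on success, or -1 if out_len does not evenly divide in_len
 * (the condition the CAL assertion complains about). */
int reduce_sum_ref(const float *in, size_t in_len, float *out, size_t out_len)
{
    if (out_len == 0 || in_len % out_len != 0)
        return -1;

    size_t block = in_len / out_len;  /* inputs folded into each output */
    for (size_t o = 0; o < out_len; ++o) {
        float acc = 0.0f;             /* assumed starting value */
        for (size_t k = 0; k < block; ++k)
            acc += in[o * block + k];
        out[o] = acc;
    }
    return 0;
}
```

With a 10-element input, output lengths 1, 2, 5 and 10 are accepted by this sketch, and 3 is rejected.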
Ok, I've tested this code with various combinations for the sizes of A and B.
Config:
HD 4850
SDK 1.2
Catalyst 8.8
Linux 64 (openSuSE 10.3)
Results:
size(A)   size(B)   Result
-------------------------------------------------------------------------------
4         4         ok
4         2         seg. fault during 4th invocation of sum
4         1         seg. fault during 5th invocation
-------------------------------------------------------------------------------
8         8         ok
8         4         seg. fault during 4th invocation
8         2         seg. fault during 5th invocation
8         1         Assertion failure: calcontext.cpp (1251)
-------------------------------------------------------------------------------
10        10        ok
10        5         seg. fault during 3rd invocation
10        2         seg. fault during 4th invocation
10        1         seg. fault during 8th invocation
-------------------------------------------------------------------------------
16        16        ok
16        8         seg. fault during 4th invocation
16        4         seg. fault during 5th invocation
16        2         Assertion failure: calcontext.cpp (1251)
16        1         seg. fault during 4th invocation
-------------------------------------------------------------------------------
100       100       ok
100       50        seg. fault during 4th invocation
100       2         seg. fault during 4th invocation
-------------------------------------------------------------------------------
256       256       ok
256       128       seg. fault during 4th invocation
256       64        seg. fault during 5th invocation
256       32        Assertion failure: calcontext.cpp (1251)
256       16        seg. fault during 4th invocation
256       8         seg. fault during 3rd invocation
256       4         seg. fault during 5th invocation
256       2         seg. fault during 4th invocation
256       1         seg. fault during 3rd invocation
-------------------------------------------------------------------------------
The number of calls before a seg. fault occurs is reproducible for each combination.
Hope this helps!
Greetings
Johannes