Archives Discussions

michael_chu · ‎03-12-2009

Now that 1.4 has been released to the public, we would like feedback on it in order to improve future releases of the SDK. We would appreciate your help in providing feedback in this thread so that the information does not get buried in other threads. Please make sure you label each item as a 'Feature Request', 'Bug Report', 'Documentation' or 'Other'. As always, you can send an email to 'streamcomputing@amd.com' for general requests or 'streamdeveloper@amd.com' for development-related requests.

If you wish to file a Feature Request, please include a description of the feature request and the part of the SDK that this request applies to.

If you wish to file a Bug Report, please include the hardware you are running on, the operating system, the SDK version, the driver/Catalyst version, and if possible either a detailed description of how to reproduce the problem or a test case. A test case is preferable, as it can reduce the time it takes to determine the cause of the issue.

If you wish to file a Documentation request, please specify the document, what you believe is in error or what you believe should be added and which SDK the document is from.

Thank you for your feedback.
AMD Stream Computing Team

rahulgarg · ‎03-26-2009

Documentation request : Please publish L1 and L2 cache sizes if possible.

karx11erx · ‎04-24-2009

#define PI ((float) 3.14159265358979323846)

kernel float adjlon (float lon<> )
{
if (abs (lon) > PI) {
   lon += PI;
   lon -= 2 * PI * floor (lon / (2 * PI));
   lon += PI;
   }
return lon;
}

1>d:/projects/proj-4.6.1/src/pj_merc.br(37) : ERROR--1: Problem with variable in kernel: variable not defined
1> Statement: PI in abs(lon) > PI

karx11erx · ‎04-24-2009

Double post, because I get an error message every time I post instead of the post being displayed.

What kind of crap forum software is this?

karx11erx · ‎04-24-2009

Is there no atan() function in ATI's stream SDK?

gaurav_garg · ‎04-24-2009

The preprocessor is disabled by default. You need to compile the .br file with the -pp flag.

karx11erx · ‎04-24-2009

The preprocessor is disabled by default?

Oh my.

tan() doesn't seem to be supported either. That's poor. Makes ATI's stream computing useless for me. Too bad.

Well, that's a result too, as I am only just evaluating it.

Btw, what's this?

Here's my source code, Brook+ compiles, but then ...

#define PI       (float) 3.14159265358979323846
#define EPS       (float) 1.0e-12

typedef struct { float x, y; } XY;
typedef struct { float lam, phi; } LP;

typedef struct {
   int over;
   int geoc;
       int is_latlong;
       int is_geocent;
   double
       a, a_orig,
       es, es_orig,
       e,
       ra,
       one_es,
       rone_es,
       lam0, phi0,
       x0, y0,
       k0,
       to_meter, fr_meter;
       int datum_type;
           float datum_params[7];
           float from_greenwich;
           float long_wrap_center;
   } PJ;

kernel float adjlon (out float lon<> )
{
if (abs (lon) > PI) {
   lon += (PI);
   lon -= (2.0f * PI) * floor (lon / (2.0f * PI));
   lon += (PI);
   }
return lon;
}

kernel float atan (float f<> )
{
return f;
}

kernel float tan (float f<> )
{
return f;
}

kernel void pj_inv_pre (float x_in<>, float y_in<>, out float x_out<>, out float y_out<>, float to_meter, float x0, float y0, float ra)
{
x_out = (x_in * to_meter - x0) * ra;
y_out = (y_in * to_meter - y0) * ra;
}

kernel void merc_s_inverse (float x<>, float y<>, out float lam<>, out float phi<>, float k0)
{
phi = (PI / 2.0f) - 2.0f * atan (exp (-y / k0));
lam = x / k0;
}

kernel void pj_inv_post (float lam_in<>, float phi_in<>, out float lam_out<>, out float phi_out<>, float lam0, float one_es, int geoc, int over)
{
lam_out = lam_in + lam0;
if (!over)
   lam_out = adjlon (lam_out);
if (((float) geoc != 0.0f) && (abs (abs (phi_in) - (PI / 2.0f)) > EPS))
   phi_out = atan (one_es * tan (phi_in));
else
   phi_out = phi_in;
}

void pj_inv_par (PJ* P, float x[], float y[], float lam[], float phi[], int nCoord)
{
   float x_in <nCoord>, y_in <nCoord>, x_temp <nCoord>, y_temp <nCoord>;
   float lam_out <nCoord>, phi_out <nCoord>, lam_temp <nCoord>, phi_temp <nCoord>;

streamRead (x_in, x);
streamRead (y_in, y);
pj_inv_pre (x_in, y_in, x_temp, y_temp, P->to_meter, P->x0, P->y0, P->ra);
merc_s_inverse (x_temp, y_temp, lam_temp, phi_temp, P->k0);
pj_inv_post (lam_temp, phi_temp, lam_out, phi_out, P->lam0, P->one_es, P->geoc, P->over);
streamWrite (lam, lam_out);
streamWrite (phi, phi_out);
}

1>------ Build started: Project: proj, Configuration: Debug Win32 ------
1>Brook+ compilation
1>NOTICE: Parse error
1>While processing <buffer>:105
1>In compiler at zzerror()[parser.y:112]
1> message = syntax error
1>ERROR: Parse error. Expected declaration.
1>While processing <buffer>:105
1>In compiler at zzparse()[parser.y:198]
1> (yyvsp[0]) = ","
1>Aborting...
1>Problem with compiling d:\projects\proj-4.6.1\src\pj_merc_br_pj_inv_pre.hlsl
1>Error--:cal back end failed to compile kernel "pj_inv_pre"
1>NOTICE: Parse error
1>In compiler at zzerror()[parser.y:112]
1> message = syntax error
1>ERROR: Parse error. Expected declaration.
1>In compiler at zzparse()[parser.y:198]
1> (yyvsp[0]) = ";"
1>Aborting...
1>Problem with compiling d:\projects\proj-4.6.1\src\pj_merc_br_pj_inv_pre_addr.hlsl
1>Error--:cal back end failed to compile kernel "pj_inv_pre"
1>NOTICE: Parse error
1>While processing <buffer>:105
1>In compiler at zzerror()[parser.y:112]
1> message = syntax error
1>ERROR: Parse error. Expected declaration.
1>While processing <buffer>:105
1>In compiler at zzparse()[parser.y:198]
1> (yyvsp[0]) = ","
1>Aborting...
1>Problem with compiling d:\projects\proj-4.6.1\src\pj_merc_br_merc_s_inverse.hlsl
1>Error--:cal back end failed to compile kernel "merc_s_inverse"
1>NOTICE: Parse error
1>In compiler at zzerror()[parser.y:112]
1> message = syntax error
1>ERROR: Parse error. Expected declaration.
1>In compiler at zzparse()[parser.y:198]
1> (yyvsp[0]) = ";"
1>Aborting...
1>Problem with compiling d:\projects\proj-4.6.1\src\pj_merc_br_merc_s_inverse_addr.hlsl
1>Error--:cal back end failed to compile kernel "merc_s_inverse"
1>NOTICE: Parse error
1>While processing <buffer>:105
1>In compiler at zzerror()[parser.y:112]
1> message = syntax error
1>ERROR: Parse error. Expected declaration.
1>While processing <buffer>:105
1>In compiler at zzparse()[parser.y:198]
1> (yyvsp[0]) = ","
1>Aborting...
1>Problem with compiling d:\projects\proj-4.6.1\src\pj_merc_br_pj_inv_post.hlsl
1>Error--:cal back end failed to compile kernel "pj_inv_post"
1>NOTICE: Parse error
1>In compiler at zzerror()[parser.y:112]
1> message = syntax error
1>ERROR: Parse error. Expected declaration.
1>In compiler at zzparse()[parser.y:198]
1> (yyvsp[0]) = ";"
1>Aborting...
1>Problem with compiling d:\projects\proj-4.6.1\src\pj_merc_br_pj_inv_post_addr.hlsl
1>Error--:cal back end failed to compile kernel "pj_inv_post"
1>deleting file : d:\projects\proj-4.6.1\src\pj_merc_br.cpp
1>deleting file : d:\projects\proj-4.6.1\src\pj_merc_br.h
1>deleting file : d:\projects\proj-4.6.1\src\pj_merc_br_gpu.h
1>***Code generation found errors
1>Project : error PRJ0019: A tool returned an error code from "Brook+ compilation"
1>Build log was saved at "file://d:\projects\proj-4.6.1\VisualC\Debug\BuildLog.htm"
1>proj - 1 error(s), 0 warning(s)
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

I may have misread something, but from the Brook+ documentation I had the impression that compound data types (i.e. structs) would be supported by streams and kernels. Apparently they aren't, though.

double also gets rejected by the CAL compilation step that follows the Brook+ compilation step. Did I interpret that right?

Apart from the missing tan and atan functions, the following code doesn't compile at all:

#define HALFPI    1.5707963267948966
#define FORTPI    0.78539816339744833
#define PI        3.14159265358979323846
#define TWOPI    6.2831853071795864769
#define EPS        1.0e-12

typedef struct { double x, y; } XY;
typedef struct { double lam, phi; } LP;

typedef struct {
    int over;
    int geoc;
        int is_latlong;
        int is_geocent;
    double
        a, a_orig,
        es, es_orig,
        e,
        ra,
        one_es,
        rone_es,
        lam0, phi0,
        x0, y0,
        k0,
        to_meter, fr_meter;
        int datum_type;
            double datum_params[7];
            double from_greenwich;
            double long_wrap_center;
    } PJ;

kernel double adjlon (double lon<>)
{
if (fabs (lon) > SPI) {
    lon += ONEPI;
    lon -= TWOPI * floor (lon / TWOPI);
    lon -= ONEPI;
    }
return lon;
}

kernel void pj_inv_pre (XY xy_in<>, out XY xy_out<>, PJ P)
{
xy_out.x = (xy_in.x * P.to_meter - P.x0) * P.ra;
xy_out.y = (xy_in.y * P.to_meter - P.y0) * P.ra;
}

kernel void merc_s_inverse (XY xy<>, out LP lp<>, PJ P)
{
lp.phi = HALFPI - 2.0 * atan (exp (-xy.y / P.k0));
lp.lam = xy.x / P.k0;
}

kernel void pj_inv_post (LP lp_in<>, out LP lp_out<>, PJ P)
{
lp_out.lam = lp_in.lam + P.lam0;
if (!P.over)
    lp_out.lam = adjlon (lp_out.lam);
if (P.geoc && fabs (fabs (lp_in.phi) - HALFPI) > EPS)
    lp_out.phi = atan (P.one_es * tan (lp_in.phi));
else
    lp_out.phi = lp_in.phi;
}

void pj_inv_par (PJ P, XY xy[], LP lp[], int nCoord)
{
    XY xy_stream <nCoord>;
    XY xy_temp <nCoord>;
    LP lp_out <nCoord>;
    LP lp_temp <nCoord>;

streamRead (xy_stream, xy);
pj_inv_pre (xy_stream, xy_temp, P);
merc_s_inverse (xy_temp, lp_temp, P);
pj_inv_post (lp_temp, lp_out, P);
streamWrite (lp, lp_out);
}

gaurav_garg · ‎04-24-2009

Structures and doubles are both supported, but there seems to be a bug in structure compilation (declaring each member on its own, as in the code below, avoids it). Also, atan and tan are keywords, hence the compilation failure. Changing your code to the following:

#define PI        (float) 3.14159265358979323846
#define EPS        (float) 1.0e-12

typedef struct { float x; float y; } XY;
typedef struct { float lam; float phi; } LP;

typedef struct {
    int over;
    int geoc;
        int is_latlong;
          int is_geocent;
    double a;
    double a_orig;
    double es; double es_orig;
    double e;
    double ra;
    double one_es;
    double rone_es;
    double lam0;
    double phi0;
    double x0;
    double y0;
    double k0;
    double to_meter;
    double fr_meter;
        int datum_type;
            float datum_params[7];
            float from_greenwich;
            float long_wrap_center;
    } PJ;

kernel float adjlon (out float lon<> )
{
if (abs (lon) > PI) {
    lon += (PI);
    lon -= (2.0f * PI) * floor (lon / (2.0f * PI));
    lon += (PI);
    }
return lon;
}

kernel float atan_mine (float f<> )
{
return f;
}

kernel float tan_mine (float f<> )
{
return f;
}

kernel void pj_inv_pre (float x_in<>, float y_in<>, out float x_out<>, out float y_out<>, float to_meter, float x0, float y0, float ra)
{
x_out = (x_in * to_meter - x0) * ra;
y_out = (y_in * to_meter - y0) * ra;
}

kernel void merc_s_inverse (float x<>, float y<>, out float lam<>, out float phi<>, float k0)
{
phi = (PI / 2.0f) - 2.0f * atan_mine (exp (-y / k0));
lam = x / k0;
}

kernel void pj_inv_post (float lam_in<>, float phi_in<>, out float lam_out<>, out float phi_out<>, float lam0, float one_es, int geoc, int over)
{
lam_out = lam_in + lam0;
if (!over)
    lam_out = adjlon (lam_out);
if (((float) geoc != 0.0f) && (abs (abs (phi_in) - (PI / 2.0f)) > EPS))
    phi_out = atan_mine (one_es * tan_mine (phi_in));
else
    phi_out = phi_in;
}

void pj_inv_par (PJ* P, float x[], float y[], float lam[], float phi[], int nCoord)
{
    float x_in <nCoord>, y_in <nCoord>, x_temp <nCoord>, y_temp <nCoord>;
    float lam_out <nCoord>, phi_out <nCoord>, lam_temp <nCoord>, phi_temp <nCoord>;

streamRead (x_in, x);
streamRead (y_in, y);
pj_inv_pre (x_in, y_in, x_temp, y_temp, P->to_meter, P->x0, P->y0, P->ra);
merc_s_inverse (x_temp, y_temp, lam_temp, phi_temp, P->k0);
pj_inv_post (lam_temp, phi_temp, lam_out, phi_out, P->lam0, P->one_es, P->geoc, P->over);
streamWrite (lam, lam_out);
streamWrite (phi, phi_out);
}

works.

karx11erx · ‎04-24-2009

Gaurav,

thank you very much. I'd never have figured out the Brook+ struct declaration problem myself.

As far as tan and atan go: I was of course initially expecting built-in math functions with these names to be available, but when I compiled that code, I got error messages about the unknown identifiers tan and atan, so I added my own tan and atan functions just to make the code compile.

Edit:

I have just checked the Brook+ documentation, appendix A.5: tan and atan are not intrinsic Brook+ functions.

The following code looks valid to me, but the CAL backend compiler flags errors for it and deletes all output files:

#define HALFPI    1.5707963267948966
#define FORTPI    0.78539816339744833
#define PI        3.14159265358979323846
#define SPI     3.14159265359
#define TWOPI    6.2831853071795864769
#define EPS        1.0e-12
#define HUGE    1.0e308

typedef struct { double x, y; } XY;
typedef struct { double lam, phi; } LP;

typedef struct {
    int over;
    int geoc;
        int is_latlong;
        int is_geocent;
    double a;
        double a_orig;
    double es;
        double es_orig;
    double e;
    double ra;
    double one_es;
    double rone_es;
    double lam0;
    double phi0;
    double x0;
    double y0;
    double k0;
    double to_meter;
    double fr_meter;
    int datum_type;
        double datum_params[7];
        double from_greenwich;
        double long_wrap_center;
    } PJ;

kernel double fabs (double d)
{
return (d < 0.0) ? -d : d;
}

kernel double adjlon (out double lon<> )
{
if (fabs (lon) > SPI) {
    lon += PI;
    lon -= TWOPI * (double) floor ((float) (lon / TWOPI));
    lon -= PI;
    }
return lon;
}

kernel double tan (double phi<>)
{
    double cosphi = (double) cos ((float) phi);

return (cosphi == 0.0) ? HUGE : (double) sin ((float) phi) / cosphi;
}

kernel double atan (double phi<>)
{
    double phi2 = phi * phi;

return (phi + 0.43157974 * phi2 * phi) / (1.0 + 0.76443945 * phi2 + 0.05831938 * phi2 * phi2);
}

kernel void pj_inv_pre (double x_in<>, double y_in<>, out double x_out<>, out double y_out<>, double to_meter, double x0, double y0, double ra)
{
x_out = (x_in * to_meter - x0) * ra;
y_out = (y_in * to_meter - y0) * ra;
}

kernel void merc_s_inverse (double x<>, double y<>, out double lam<>, out double phi<>, double k0)
{
phi = HALFPI - 2.0 * atan ((double) exp ((float) (-y / k0)));
lam = x / k0;
}

kernel void pj_inv_post (double lam_in<>, double phi_in<>, out double lam_out<>, out double phi_out<>, double lam0, double one_es, int geoc, int over)
{
lam_out = lam_in + lam0;
if (!over)
    lam_out = adjlon (lam_out);
if (((double) geoc != 0.0) && (fabs (fabs (phi_in) - HALFPI) > EPS))
    phi_out = atan (one_es * tan (phi_in));
else
    phi_out = phi_in;
}

void pj_inv_par (PJ* P, double x[], double y[], double lam[], double phi[], int nCoord)
{
    double x_in <nCoord>, y_in<nCoord>, x_temp <nCoord>, y_temp <nCoord>;
    double lam_out <nCoord>, phi_out <nCoord>, lam_temp <nCoord>, phi_temp <nCoord>;

streamRead (x_in, x);
streamRead (y_in, y);
pj_inv_pre (x_in, y_in, x_temp, y_temp, P->to_meter, P->x0, P->y0, P->ra);
merc_s_inverse (x_temp, y_temp, lam_temp, phi_temp, P->k0);
pj_inv_post (lam_temp, phi_temp, lam_out, phi_out, P->lam0, P->one_es, P->geoc, P->over);
streamWrite (lam, lam_out);
streamWrite (phi, phi_out);
}

1>------ Build started: Project: proj, Configuration: Debug Win32 ------
1>Brook+ compilation
1>NOTICE: Parse error
1>While processing <buffer>:105
1>In compiler at zzerror()[parser.y:112]
1> message = syntax error
1>ERROR: Parse error. Expected declaration.
1>While processing <buffer>:105
1>In compiler at zzparse()[parser.y:198]
1> (yyvsp[0]) = ","
1>Aborting...
1>Problem with compiling d:\projects\proj-4.6.1\src\pj_merc_br_pj_inv_pre.hlsl
1>Error--:cal back end failed to compile kernel "pj_inv_pre"
1>NOTICE: Parse error
1>In compiler at zzerror()[parser.y:112]
1> message = syntax error
1>ERROR: Parse error. Expected declaration.
1>In compiler at zzparse()[parser.y:198]
1> (yyvsp[0]) = ";"
1>Aborting...
1>Problem with compiling d:\projects\proj-4.6.1\src\pj_merc_br_pj_inv_pre_addr.hlsl
1>Error--:cal back end failed to compile kernel "pj_inv_pre"
1>NOTICE: Parse error
1>While processing <buffer>:105
1>In compiler at zzerror()[parser.y:112]
1> message = syntax error
1>ERROR: Parse error. Expected declaration.
1>While processing <buffer>:105
1>In compiler at zzparse()[parser.y:198]
1> (yyvsp[0]) = ","
1>Aborting...
1>Problem with compiling d:\projects\proj-4.6.1\src\pj_merc_br_merc_s_inverse.hlsl
1>Error--:cal back end failed to compile kernel "merc_s_inverse"
1>NOTICE: Parse error
1>In compiler at zzerror()[parser.y:112]
1> message = syntax error
1>ERROR: Parse error. Expected declaration.
1>In compiler at zzparse()[parser.y:198]
1> (yyvsp[0]) = ";"
1>Aborting...
1>Problem with compiling d:\projects\proj-4.6.1\src\pj_merc_br_merc_s_inverse_addr.hlsl
1>Error--:cal back end failed to compile kernel "merc_s_inverse"
1>NOTICE: Parse error
1>While processing <buffer>:105
1>In compiler at zzerror()[parser.y:112]
1> message = syntax error
1>ERROR: Parse error. Expected declaration.
1>While processing <buffer>:105
1>In compiler at zzparse()[parser.y:198]
1> (yyvsp[0]) = ","
1>Aborting...
1>Problem with compiling d:\projects\proj-4.6.1\src\pj_merc_br_pj_inv_post.hlsl
1>Error--:cal back end failed to compile kernel "pj_inv_post"
1>NOTICE: Parse error
1>In compiler at zzerror()[parser.y:112]
1> message = syntax error
1>ERROR: Parse error. Expected declaration.
1>In compiler at zzparse()[parser.y:198]
1> (yyvsp[0]) = ";"
1>Aborting...
1>Problem with compiling d:\projects\proj-4.6.1\src\pj_merc_br_pj_inv_post_addr.hlsl
1>Error--:cal back end failed to compile kernel "pj_inv_post"
1>deleting file : d:\projects\proj-4.6.1\src\pj_merc_br.cpp
1>deleting file : d:\projects\proj-4.6.1\src\pj_merc_br.h
1>deleting file : d:\projects\proj-4.6.1\src\pj_merc_br_gpu.h
1>***Code generation found errors
1>Project : error PRJ0019: A tool returned an error code from "Brook+ compilation"
1>Build log was saved at "file://d:\projects\proj-4.6.1\VisualC\Debug\BuildLog.htm"
1>proj - 1 error(s), 0 warning(s)
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
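
A side note on the hand-written atan kernel in the post above: the rational approximation it uses tracks atan closely only for small arguments (roughly |x| <= 1), while merc_s_inverse passes it exp(-y / k0), which can be any positive number. A minimal host-side C sketch of one common fix, range reduction through the identity atan(x) = pi/2 - atan(1/x) for x > 0, is shown below. The names atan_core, atan_approx and HALF_PI are hypothetical, and this is an illustration under those assumptions rather than tested Brook+ code.

```c
#include <math.h>

#define HALF_PI 1.57079632679489661923   /* hypothetical name for this sketch */

/* Rational approximation with the coefficients from the post above;
 * only accurate for small arguments (about |x| <= 1). */
double atan_core(double x)
{
    double x2 = x * x;
    return (x + 0.43157974 * x2 * x)
         / (1.0 + 0.76443945 * x2 + 0.05831938 * x2 * x2);
}

/* Extend to all x by range reduction:
 * atan(x) = +pi/2 - atan(1/x) for x > 1, and -pi/2 - atan(1/x) for x < -1. */
double atan_approx(double x)
{
    if (x > 1.0)
        return HALF_PI - atan_core(1.0 / x);
    if (x < -1.0)
        return -HALF_PI - atan_core(1.0 / x);
    return atan_core(x);
}
```

Comparing both functions against the C library atan for an argument such as 10.0 shows the difference: the reduced version stays close, while the raw approximation is far off.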

gaurav_garg · ‎04-24-2009

Is there no atan() function in ATI's stream SDK?

atan() is not supported. The list of supported intrinsic functions is available in section A.5 of the Stream Computing User Guide.

rveldema · ‎03-17-2009

The CPU runtime doesn't seem to work. For example, here is a GDB session (Fedora 9) in which the program hangs in a mutex lock after falling back to the CPU.

The hello_brook example is from the brook distro, not something I created...

[veldema@faui21k lnx_x86_64]$ gdb ./hello_brook
GNU gdb Fedora (6.8-23.fc9)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
(gdb) r
Starting program: /usr/local/atibrook/samples/bin/CPP/lnx_x86_64/hello_brook
[Thread debugging using libthread_db enabled]
[New Thread 0x7fe1340eb720 (LWP 630)]
No protocol specified
Failed to initialize CAL. Falling back to CPU
^C
Program received signal SIGINT, Interrupt.
0x0000003d2220d744 in __lll_lock_wait () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install atistream-brook.x86_64
(gdb) where
#0 0x0000003d2220d744 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x0000003d22208ee4 in _L_lock_100 () from /lib64/libpthread.so.0
#2 0x0000003d22208901 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000000000a56a69 in brook::ThreadLock::lock () from /usr/local/atibrook/sdk/lib/libbrook.so
#4 0x0000000000a4e57b in brook::SystemRT::getDevices () from /usr/local/atibrook/sdk/lib/libbrook.so
#5 0x0000000000a4e6c1 in brook::SystemRT::getCurrentDevices () from /usr/local/atibrook/sdk/lib/libbrook.so
#6 0x0000000000a4e7a3 in brook::SystemRT::createStreamImpl () from /usr/local/atibrook/sdk/lib/libbrook.so
#7 0x000000000040317f in brook::Stream<float>::Stream ()
#8 0x0000000000406b55 in HelloBrook::run ()
#9 0x0000000000406e7a in main ()
(gdb)

ryta1203 · ‎03-17-2009

My apologies, I'm having a mental lapse from an unusual weekend still. Thanks.

rahulgarg · ‎03-17-2009

Documentation request : Performance guidelines for LDS in CAL.

There are no performance guidelines on how to use LDS in CAL. For example, it's unclear whether the address calculation needed to access a value in LDS is done by the TUs (in which case the net bandwidth from LDS would become TU-bound). Is the LDS arranged in some sort of banks? What is the latency of an LDS access?

ANogin · ‎03-19-2009

According to http://developer.amd.com/gpu/ATIStreamSDK/pages/ATIStreamSystemRequirements.aspx I need Catalyst 9.3 to use 1.4 SDK with AMD FireStream, however when I attempt to download the driver, I only get offered Catalyst 9.2. Is 9.3 available at all? What do I need to do in order to be able to use SDK 1.4 with AMD FireStream 9250 under 64 bit RHEL 5.3?

TIA!

Aleksey

LenAlox · ‎03-19-2009

I am not sure what is going on. We upgraded from 1.3 to 1.4. Everything was working fine under SDK 1.3, but after upgrading to 1.4 and trying to compile, I get the following messages:

brcc simple.br
g++ -L/usr/local/atibrook/sdk/lib -lbrook -I/usr/local/atibrook/sdk/include/ -g -o simple simple.cpp common/Timer.cpp
/usr/local/atibrook/sdk/lib/libbrook.so: undefined reference to `dlsym'
/usr/local/atibrook/sdk/lib/libbrook.so: undefined reference to `dlerror'
/usr/local/atibrook/sdk/lib/libbrook.so: undefined reference to `dlopen'
/usr/local/atibrook/sdk/lib/libbrook.so: undefined reference to `dlclose'
collect2: ld returned 1 exit status
make: *** [cp] Error 1

Any ideas what is going wrong? I am very sure we also upgraded to Catalyst 9.2. The system I am using is a 64-bit CentOS box.

ryta1203 · ‎03-20-2009

Feature Request: Give the ability to turn off different levels of optimizations (preferably altogether) within the CAL compiler. According to yourselves this currently does not exist.

Ceq · ‎03-22-2009

Which is the "official" way of doing this in SDK 1.4?

float4 a = float4(1.0f, 2.0f, 3.0f, 4.0f);

float b = 0.5f;

float4 a_mul_b;

...

a) a_mul_b = a * b.xxxx;

ERROR--1: Problem with expression in kernel: Illegal to use swizzle on scalar types

b) a_mul_b = a * b;

ERROR--1: In Binary expression: Mismatched operands: both must have same type and same number of components

c) a_mul_b = a * (float4)b;

ERROR--1: : explicit casting required to have same no of components

d) float4 t = float4(b, b, b, b);

a_mul_b = a * t;

But this results in longer, harder-to-read code, and you have to define all temporary variables at the beginning of each kernel.

Any other way or hint?
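
For readers who have not met the operation being asked for: workaround (d) replicates the scalar into all four components and then multiplies component by component. A plain C sketch of that idea follows. The float4 struct and the helpers splat4 and mul4 are hypothetical stand-ins written for this illustration, not Brook+ types or functions.

```c
/* Stand-in for a four-component vector type (hypothetical, for illustration). */
typedef struct { float x, y, z, w; } float4;

/* Replicate a scalar into all four components,
 * i.e. the "t = float4(b, b, b, b)" step of workaround (d). */
float4 splat4(float s)
{
    float4 v = { s, s, s, s };
    return v;
}

/* Component-wise product of two four-component vectors. */
float4 mul4(float4 a, float4 b)
{
    float4 r = { a.x * b.x, a.y * b.y, a.z * b.z, a.w * b.w };
    return r;
}
```

Calling mul4(a, splat4(b)) then plays the role of the a * b.xxxx expression from option (a).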

ryta1203 · ‎03-22-2009

Ceq,

Why not this:

kernel void foo(float4 a<>, float b<>, out float4 c<>)
{
c.x = a.x*b;
c.y = a.y*b;
c.z = a.z*b;
c.w = a.w*b;
}

This creates one bundle in the ISA according to SKA, just like your first example would. Yes, the code is a little longer, and it would be very nice to be able to use "a_mul_b = a * b"; I think you should be able to (at least according to the Brook+ compiler in SKA). But the above method produces no errors and the ISA looks clean. Have you tried turning off type checking and looking at the results?

Ceq · ‎03-22-2009

Thanks Ryta, my kernels are already quite large, doing that they would become quite cumbersome.

I agree it would be nice to have something like "a_mul_b = a * b" or "a_mul_b = a * b.xxxx" by default.

I will try disabling strong type checking, thanks.

By the way, using the last workaround (d) to compile in Brook 1.4 my code runs about 10% slower than 1.3.

Jawed · ‎03-22-2009

OK, I'm confused, why doesn't something like this work:

kernel void test(float4 a, float b, out float4 c<>)
{
c=a*b;
}

it works here in SKA.

I'm using the brcc.exe from 1.4. To make it work with SKA I copied Brook+ from:

C:\Program Files\ATI\ATI Brook+ 1.4.0_beta

into:

C:\Program Files\AMD\AMD CAL 1.2.0_beta

(deleted the 1.2 tree when I installed 1.4 - I didn't have 1.3 installed, so then copied 1.4 in there when I discovered SKA had stopped working).

Jawed

ryta1203 · ‎03-22-2009

Originally posted by: Ceq Thanks Ryta, my kernels are already quite large, doing that they would become quite cumbersome.

I agree it would be nice to have something like "a_mul_b = a * b" or "a_mul_b = a * b.xxxx" by default.
I will try disabling strong type checking, thanks.

By the way, using the last workaround (d) to compile in Brook 1.4 my code runs about 10% slower than 1.3.

I disabled strong type checking and my results were ok. I ran a very simple test so I'd like to know how your results turned out when you disabled strong type checking.

AMD: IMO, if you are going to allow float4 = float4*float then you should have this be ok even WITHOUT strong type checking disabled. JMO.

thesquiff · ‎03-23-2009

BRCC bug:

In a kernels.br file I started my kernel definition (kernel void etc.) on the first line of the file. The generated .cpp file then contained #line -1 statements, which the C++ compiler rejected (line -1 doesn't exist).

Workaround is simple - leave a blank line at the start of the .br file!

Ceq · ‎03-23-2009

Hi Ryta, disabling strong checking works... sometimes. It looks like there is a bug in BRCC 1.4 that makes the compiler abort. Commenting out unrelated code (even code that is never used) can fix the compilation. It's quite weird, so I'll try to extract a test case and send an example to AMD.

EDIT: The bug is unrelated to strong checking flag, it happens even without using it.

psyhtest · ‎05-06-2009

I'm trying to get started with the Stream SDK 1.4, but installing things under Ubuntu is certainly not easy. Luckily, much of the old advice (http://developer.amd.com/support/KnowledgeBase/Lists/KnowledgeBase/DispForm.aspx?ID=28) applies.

However, I'm using GCC 4.3, which strictly enforces (http://gcc.gnu.org/gcc-4.3/porting_to.html) the "int main(int m, char** c)" signature, and this breaks all the examples (CALuint needs to be replaced with CALint). I guess this is something you could easily update.

psyhtest · ‎05-12-2009

More problems with GCC 4.3. Attempting to compile Brook+ samples results in "error: explicit template specialization cannot have a storage class" (see http://gcc.gnu.org/gcc-4.3/porting_to.html).

gaurav_garg · ‎05-12-2009

Originally posted by: psyhtest More problems with GCC 4.3. Attempting to compile Brook+ samples results in "error: explicit template specialization cannot have a storage class" (see http://gcc.gnu.org/gcc-4.3/porting_to.html).

This error comes from the code generated for the CPU backend. A quick workaround is to compile the .br file with the -p cal option, which disables CPU backend generation and generates only the code required by the GPU runtime.

guenthernoack · ‎03-23-2009

Hi!

I think there's a bug in the Brook+ Specification (Appendix A of the Stream Computing User Guide). In A 4.1.2 (p A-9), the following is stated:

"Brook+ provides two mechanisms for specifying reductions: reduction variables and reduction functions."

As far as I understand it, reduction variables should be usable from within any kernel, given that a well-defined reduction function or associative assignment operator is used. However, Brook+ doesn't seem to support it. (I tried with the examples on page A-9 with Brook+ 1.3. It doesn't compile.)

The same goes for the example reduction functions min_reduce and max_reduce on page A-10. (They don't compile.)

I must admit that I also have problems understanding the example calc_sum_f on the same page. Is the kernel's result undefined because of calling f() on a, or is it undefined because it's not commutative? (Is the function f to be seen as a mathematics-style function or a C-style function?)

Best regards,

Günther

Ceq · ‎03-25-2009

Hello, in a previous post I said I was having compilation issues with Brook+ 1.4 as sometimes the compiler aborted compilation of complex code. I found out that it only was happening on a Core2 machine (Win32 MSVC), on my Athlon X2 it compiled fine, so I rebuilt the compiler BRCC and the problems went away. Maybe it was built or linked with some troublesome option.

I just wanted to report it just in case somebody had the same problem.

Peterp · ‎03-25-2009

I'm using Windows XP with a FireStream 9250. Can I install the hotfix http://support.amd.com/us/kbarticles/Pages/GPU-5-Catalyst93HOTFIXFireStrm.aspx over the 9.3 driver, or do I have to uninstall the old driver first?

dromit · ‎03-31-2009

Bug Report (Radeon HD 3850, Athlon XP, Windows XP SP3, atistream-1.4.0_beta-winxp32, Catalyst 9.2).
Every time I run any of the pre-built samples from the ATI Stream Brook+ package, I get the same error message: "unknown software exception (0xc000001d) .. at address 0x100123b2". A similar error occurs when I run brcc.exe. It seems that my CPU does not support some instructions (possibly from the SSE2 set). I guess some library (and brcc.exe) was built with SSE2 support. I don't think this instruction set is critical for accelerating calculations on the GPU 🙂 Can I run brcc.exe and programs using this library on my CPU?

scollange · ‎04-06-2009

Documentation request:

In the CAL Intermediate language specifications from v1.4, arithmetic behavior of instructions should be clarified. In particular:

* What is the difference between AND and IAND?

* IMIN, IMAX, UMIN, UMAX descriptions include discussions about NaN, signed zero and IEEE-754 compliance which is irrelevant for these integer operations.

* ITOF: which rounding mode is used? (Toward zero?)

* UTOF: description contains two mutually-exclusive statements: "If an integer is not represented exactly, the nearest representable value is used." and "Rounding is performed towards zero". Which one is true?

* UTOF: "Applications that require different rounding semantics can invoke the ROUND_* instructions before using UTOF." Calling ROUND_* on integers does not make sense.

* F2D: Discussion about rounding is irrelevant, a conversion from float to double is always exact.

* ADD, MUL: Which rounding mode is used? (Round to nearest-even?)

* MAD: "The lower 32 bits of the multiplication": does not make sense for floating-point arithmetic. How many roundings are performed, using which rounding modes? When are overflows/underflows detected?

* MIN, MAX: description seems to imply the same behavior whether or not the _ieee suffix is used. Is this really what is meant?

* RND: change "vertex" to "vector".

* D_MULADD: description suggests that the behavior is implementation-dependent. Could this behavior be documented precisely for each GPU?

Thanks.
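
To make the UTOF and F2D points above concrete, here is a small host-side C sketch. It assumes an IEEE-754 host whose default rounding mode is round-to-nearest-even, and the function names are invented for this illustration. It only shows that the two rounding statements quoted for UTOF give different results for the same input; it says nothing about what the GPU actually does.

```c
#include <stdint.h>

/* Conversion in the host's default rounding mode (assumed here to be
 * round-to-nearest-even, as on typical IEEE-754 hosts). */
float utof_nearest(uint32_t u)
{
    return (float)u;
}

/* Round toward zero: drop the low bits that do not fit into the
 * 24-bit significand of a float, then convert the exact remainder. */
float utof_toward_zero(uint32_t u)
{
    int shift = 0;
    uint32_t top = u;
    while (top >= (1u << 24)) {   /* keep at most 24 significant bits */
        top >>= 1;
        shift++;
    }
    return (float)(top << shift); /* exactly representable, so no rounding */
}

/* Widening float -> double is exact for every non-NaN float value. */
int f2d_is_exact(float f)
{
    double d = (double)f;
    return (float)d == f;
}
```

For the input 16777219 (2^24 + 3), the two conversions land on different neighbouring floats, which is why the specification needs to state a single rule.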

aoooooooooon · ‎04-15-2009

calInit() causes a reboot and a hang at POST if a monitor is enabled on both the 4850 and the 3850.
A workaround is to disable the 3850's monitor, but then the CAL runtime can't find the 3850.

CPU : Athlon64 X2 5600+ (Windsor)
Motherboard : GA-MA69G-S3H
Memory : 2GB + 2GB Dual channel
GPU : HD 4850 in x16 slot and HD 3850 in x4 slot
OS : Windows XP Professional SP3
Driver : Catalyst 9.1 or later
SDK : 1.3 or later

maxmkh · ‎04-19-2009

Hi,

I have Vista 64bit, radeon 4800 series and intel q9450.

BRT_RUNTIME = cpu does not help me debug. I was expecting to be able to step through all the lines of the kernel (as I usually do when debugging programs), but I'm not able to do that. The only effect was that the program ran much slower.

BRT_PERMIT_READ_WRITE_ALIASING does not work for me. I need to adapt the following code:

...

localVal = CV_IMAGE_ELEM(Poles,float,pty,ptx);

toAdd = (1-fx)*(1-fy);

CV_IMAGE_ELEM(Poles,float,pty,ptx) = localVal+toAdd;

...

where pty and ptx can be arbitrary values (but they don't go outside the array bounds).

I'm trying to pass the same stream to the kernel as both input and output, something like this:

In the main program:

poles_computition(detalls, solutionMap, Poles_in, dx2_int, dy2_int, dxdy_int, xdx2_ydxdy_int, ydy2_xdxdy_int, Poles_in, halfWindowSize, halfWindowSize, thresholdDet, (float)dimension[0], (float)dimension[1]);

and below is the code of the kernel

...

localVal = Poles_in[ind_y][ind_x];

toAdd = (1.0f-fx.x)*(1.0f-fy.x);

Poles_out[ind_y][ind_x] = localVal+toAdd;

...

but I got the wrong result.

I also tried reading from and writing to the same stream with a very simple kernel, but did not succeed.

Any ideas how to implement this?
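
One possible explanation for the wrong result, offered as an assumption since the thread does not confirm the cause: the CPU loop is a sequential scatter-add, while a data-parallel kernel may read every element as it was before the kernel started. If two work items target the same (pty, ptx), only one update survives. The host-side C sketch below simulates both behaviours; the function names and the fixed 64-element scratch buffer are inventions of this sketch.

```c
#include <string.h>

/* Sequential scatter-add: each update sees the result of the previous one. */
void scatter_add_sequential(float *poles, const int *idx, const float *add, int n)
{
    for (int i = 0; i < n; i++)
        poles[idx[i]] += add[i];
}

/* Simulated data-parallel behaviour (an assumed model of the hazard, not a
 * description of the Brook+ runtime): every work item reads the value the
 * buffer held before the kernel started, so when indices collide only the
 * last write survives. */
void scatter_add_snapshot(float *poles, int size, const int *idx, const float *add, int n)
{
    float before[64];                          /* sketch only: assumes size <= 64 */
    memcpy(before, poles, (size_t)size * sizeof(float));
    for (int i = 0; i < n; i++)
        poles[idx[i]] = before[idx[i]] + add[i];
}
```

With indices {1, 1, 2} and increments {0.5, 0.25, 1.0} on a zeroed buffer, the sequential version accumulates 0.75 in element 1, while the snapshot version keeps only 0.25 there.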

mbouzaidi · ‎05-11-2009

Originally posted by: michael.chu@amd.com Now that 1.4 has been released to the public, we would like feedback on it in order to improve future releases of the SDK. We would appreciate your help in providing feedback in this thread so that the information does not get buried in other threads. Please make sure you label each item as a 'Feature Request', 'Bug Report', 'Documentation' or 'Other'. As always, you can send an email to 'streamcomputing@amd.com' for general requests or 'streamdeveloper@amd.com' for development-related requests. If you wish to file a Feature Request, please include a description of the feature request and the part of the SDK that this request applies to.

Feature request: OpenGL interoperability
Description: Read from and write to OpenGL buffer objects/textures inside/outside kernels.
Applies to: CAL

WTrei · ‎05-14-2009

I started testing 1.4 a few days ago and I was somewhat disappointed that local arrays still aren't supported.

Is there any hope they will be available in 1.5?

WTrei · ‎05-14-2009

Sorry, there was an error-message when posting my message, so I did it a 2nd time. ^^

vvolkov · ‎05-20-2009

It seems there is no way to use LDS broadcast, which is a useful tool for many codes, such as in dense linear algebra. It's neither exposed explicitly in IL, nor generated by IL compiler when used implicitly. It is nearly impossible to use it in native assembly either as the assembler has bugs that prevent compiling valid programs.

It's also a pity that GDS is not exposed in CAL. It might provide performance advantages even if it did not support atomic operations.

Vasily

SDK 1.4 Feedback