SDK 1.4 Feedback

Now that 1.4 has been released to the public, we would like feedback on it in order to improve future releases of the SDK. We would appreciate your help in providing feedback in this thread so that the information does not get buried in other threads. Please make sure you label each item as 'Feature Request', 'Bug Report', 'Documentation' or 'Other'. As always, you can send an email to 'streamcomputing@amd.com' for general requests or 'streamdeveloper@amd.com' for development-related requests.

If you wish to file a Feature Request, please include a description of the feature request and the part of the SDK that this request applies to.

If you wish to file a Bug Report, please include the hardware you are running on, operating system, SDK version, driver/Catalyst version, and if possible either a detailed description of how to reproduce the problem or a test case. A test case is preferable, as it can reduce the time it takes to determine the cause of the issue.

If you wish to file a Documentation request, please specify the document, what you believe is in error or what you believe should be added and which SDK the document is from.

Thank you for your feedback.
AMD Stream Computing Team

0 Likes
115 Replies
tgm
Journeyman III

Where is libaticalcl.so? I can only find libamdcalcl.so in /usr/lib64. You have given this library three different names between v1.2 and v1.4. What a mess! Why?

0 Likes

Update your driver to Catalyst 9.2.

SDK 1.4 is intended to work with Catalyst 9.2 and later.

Catalyst 9.2 comes with the required aticalcl.so/aticalrt.so etc.

0 Likes

Could you confirm this under Linux?

0 Likes

tgm: I am using SDK 1.4 on Ubuntu 8.10 64-bit. I have installed Catalyst 9.2 and it's working for me.

0 Likes

Feature request for driver: Please increase the limit on pinned memory. On 64-bit Linux systems, the limit appears to be 16 MB. This restricts usage of calResCreate2D to small array sizes. How should I report this feature request to the driver team?

 

0 Likes

Originally posted by: rahulgarg tgm: I am using SDK 1.4 on Ubuntu 8.10 64-bit. I have installed Catalyst 9.2 and it's working for me.

 

Sorry, I didn't make my problem clear. The card in my system is a FireStream 9250. It seems that there is no support for it?

0 Likes

After looking at the release notes, it is still not clear to me whether one still has to call stream.error() after kernel calls to avoid the slowdown issue.

0 Likes

You no longer need to make this call. This has been fixed.

0 Likes

I've just installed Windows XP. Which driver do I have to install to use SDK 1.4? I have a FireStream 9250.

0 Likes

Originally posted by: rahulgarg Update your driver to catalyst 9.2.

 

SDK 1.4 is intended to work with Catalyst 9.2 and later.

 

Catalyst 9.2 comes with the required aticalcl.so/aticalrt.so etc.

 

I installed Catalyst 9.2 as well. Where is aticalcl.so located? My ACMLg1.0 makefile needs to define the directory that contains it.

0 Likes

Never mind.

0 Likes
maxmkh
Journeyman III

Hi,

I have Vista 64-bit, a Radeon 4800 series card and an Intel Q9450.

BRT_RUNTIME=cpu does not help with debugging. I was expecting to step through every line of the kernel (as I usually do when debugging programs), but I am not able to. The only effect is that the program runs much more slowly.

BRT_PERMIT_READ_WRITE_ALIASING does not work for me. I need to adapt the following code:

 

...

 

localVal = CV_IMAGE_ELEM(Poles,float,pty,ptx);

toAdd = (1-fx)*(1-fy);

CV_IMAGE_ELEM(Poles,float,pty,ptx) = localVal+toAdd;

 

...

where pty and ptx can be arbitrary values (but they don't go outside the array bounds).

I'm trying to pass the same stream to the kernel as both input and output, something like this:

In the main program:

poles_computition(detalls, solutionMap, Poles_in, dx2_int, dy2_int, dxdy_int, xdx2_ydxdy_int, ydy2_xdxdy_int, Poles_in, halfWindowSize, halfWindowSize, thresholdDet, (float)dimension[0], (float)dimension[1]);

and below is the code of the kernel

...

localVal = Poles_in[ind_y][ind_x];

toAdd = (1.0f-fx.x)*(1.0f-fy.x);

Poles_out[ind_y][ind_x] = localVal+toAdd;

...

 

but I got the wrong result.

I also tried reading and writing the same stream with a very simple kernel, but did not succeed.

Any ideas on how to implement this?

 

0 Likes

Sorry for the repetition. I tried posting three times and got an error every time.

0 Likes

You can debug with BRT_RUNTIME=cpu if you compile the .br file with the -nl flag.

========

If you are reading and writing at random places, you can get incorrect results with read-write aliasing. In your case, however, the read and the write seem to target the same location. Could you paste a test case that shows this issue? Also, could you try using a regular output stream and see if it works?

0 Likes
yuriy_v
Journeyman III

Out of curiosity: what is the target version or estimated availability for OpenCL 1.0 support?

The v1.3 description stated that OpenCL would be supported once the standard became available.

0 Likes
fesc2000
Adept I

I just upgraded to 1.4 beta and found out that the CPU backend doesn't seem to work any more. It seems to hang in streamRead().

Is this a known issue?

Thanks,

Felix.

0 Likes

Could you please paste the test case that shows the hang?

0 Likes

Below is a code snippet and a gdb backtrace. Nothing spectacular, and the sample binaries show the same behaviour.

This is a Debian system with no GPU (although I have Catalyst 9.2 installed). It used to work with SDK 1.3.

 

int grav_cal(int w, int h, int iterations, float *dens, float *grav, float *chg, char *fb)
{
    float gm, cm;

    {
        unsigned int i;
        float dens_stream<h,w>;
        float grav_stream<h,w>;
        float new_grav_stream<h,w>;
        float new_chg_stream<h,w>;
        float chg_stream<h,w>;
        float diff2<h,w>;
        char  fb_stream<h,w>;
        char  fb_in_stream<h,w>;

        streamRead(grav_stream, grav);
(...)

GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
(gdb) set args 1 1
(gdb) run
Starting program: /home/fesc/work/plansch/btest 1 1
[Thread debugging using libthread_db enabled]
[New Thread 0x40573b40 (LWP 6699)]
Failed to initialize CAL. Falling back to CPU
^C
Program received signal SIGINT, Interrupt.
[Switching to Thread 0x40573b40 (LWP 6699)]
0x4052e1b4 in __lll_lock_wait () from /lib/libpthread.so.0
(gdb) backtrace
#0  0x4052e1b4 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x405299e5 in _L_lock_89 () from /lib/libpthread.so.0
#2  0x405292f2 in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x40266c66 in pthread_mutex_lock () from /lib/libc.so.6
#4  0x40166d2b in brook::ThreadLock::lock () from /usr/local/atibrook/sdk/lib/libbrook.so
#5  0x4015defc in brook::SystemRT::getDevices () from /usr/local/atibrook/sdk/lib/libbrook.so
#6  0x4015e069 in brook::SystemRT::getCurrentDevices () from /usr/local/atibrook/sdk/lib/libbrook.so
#7  0x4015e152 in brook::SystemRT::createStreamImpl () from /usr/local/atibrook/sdk/lib/libbrook.so
#8  0x0804ad81 in brook::Stream<float>::Stream ()
#9  0x080497f2 in grav_cal ()
#10 0x0804b059 in main ()

 

0 Likes

Could you please send us the values for w and h?

Could you please also send us your system information?

0 Likes

In this particular example the dimensions are 32x32.

The system is Debian sid, 32-bit, gcc 4.3, X.org 7.3, E2160 CPU.

0 Likes

Hi!

I believe I found a Brook compiler bug. It was already present in the SDK v1.3-beta, but I hadn't managed to report it yet.

When assigning the value 0x80000000 to a uint in a kernel, the most significant bit is lost.

Can anyone confirm this?

I sort of 'fixed' it by passing the value as an argument to the kernel.

 

Edit: I see this has already been reported - This seems to be the same bug: http://forums.amd.com/devforum/messageview.cfm?catid=328&threadid=109176&enterthread=y

0 Likes

This has been fixed in 1.3.

See the following test case.

#include <stdio.h>

kernel void TestBrookKernel(int Input<>, out int Output<>)
{
    // Before the fix, any value with the most significant bit set
    // produced 0x00000000 at the output.
    Output = (int)0x80000000;
}

int main()
{
    int Input[64 * 64], Output[64 * 64];
    int i = 0, j = 0;
    int d_Input<64, 64>;
    int d_Output<64, 64>;

    streamRead(d_Input, Input);
    TestBrookKernel(d_Input, d_Output);
    streamWrite(d_Output, Output);
    for(i = 0; i < 64; i++)
    {
        for(j = 0; j < 64; j++)
        {
            if(Output[i * 64 + j] != (int) 0x80000000)
            {
                printf("Failed");
                return 0;
            }
        }
    }
    printf("Passed");
    return 0;
}

0 Likes

We are not able to reproduce this on our end.

See the test case below.

#include <stdio.h>
#include <stdlib.h>

int grav_cal(int w, int h, int iterations, float *grav, float *chg)
{
    float gm, cm;

    {
        unsigned int i;
        float dens_stream<h,w>;
        float grav_stream<h,w>;
        float new_grav_stream<h,w>;
        float new_chg_stream<h,w>;
        float chg_stream<h,w>;
        float diff2<h,w>;
        char  fb_stream<h,w>;
        char  fb_in_stream<h,w>;

        // Copy host data into the stream, then straight back out.
        streamRead(grav_stream, grav);
        streamWrite(grav_stream, chg);
    }
    return 0;
}

int main()
{
    int iterCount = 100;
    int i = 0;
    float* i0 = NULL;
    float* o0 = NULL;
    int width = 32, height = 32;

    i0 = (float*)malloc(sizeof(float) * width * height);
    o0 = (float*)malloc(sizeof(float) * width * height);

    // Fill the input with known values.
    for(i = 0; i < width * height; i++)
    {
        i0[i] = (float)(i + 1);
    }

    grav_cal(width, height, iterCount, i0, o0);

    // The output must match the input element by element.
    for(i = 0; i < width * height; i++)
    {
        if(i0[i] != o0[i])
        {
            printf("Failed!!");
            free(i0);
            free(o0);
            exit(0);
        }
    }

    printf("\nPass\n");

    free(i0);
    free(o0);

    return 0;
}

 

0 Likes

Below is a code snippet and a gdb backtrace. Nothing spectacular, and the sample binaries show the same behaviour.

 

This is a Debian system with no GPU (although I have Catalyst 9.2 installed). It used to work with SDK 1.3.

 

 

I think it is an OS-dependent problem. As Debian is not officially supported, this OS was not tested. Try exporting the environment variable BRT_RUNTIME=cpu (one way to tell the runtime to use the CPU backend explicitly) and see if you still get this problem. Let me know if it helps.

0 Likes

OK, thanks, that fixes it. I thought that compiling with "-p cpu" was sufficient.

 

0 Likes

dar
Journeyman III

Brook 1.3 issues a warning related to conditional expressions, e.g.,

test_int4.br(13) : WARN--1: conditional expression must have scalar type. On short vectors, assumes x components as condition
                 Statement: (int4 ) (imask == tmp) in tmp = ((int4 ) (imask == tmp)) ? (XXX4) : (tmp)

However, code relying on component-wise conditional evaluation worked and produced correct results, i.e., the warning appeared to be incorrect.

Brook 1.4 elevates this warning to an error, so code that previously worked no longer compiles, e.g.,

em_fdtd_gpu.br(96) : ERROR--1: : conditional expression must be a scalar data type.

        Statement: iz == mz4 in ex001 = iz == mz4 ? ex00o.yzwx : ex001
        Expression : iz, Type : float4
        Expression : mz4, Type : float4

 Note: Use built-in functions any() or all() when you are using relational operators on vector data types

As an example of code that worked under 1.3 despite the warning, consider the following:

kernel void test_int4_gpu_kern( int n, int4 s_src<>, out int4 s_dst<> ) {

    const int4 XXX4 = int4(65535,65535,65535,65535);
    int4 imask = int4(n,n,n,n);
    int4 tmp = s_src;

    /* works with brtvector.hpp patch */
    tmp = ((int4)(imask == tmp))? XXX4 : tmp;

    s_dst = tmp;
}

void
test_int4_gpu( int n, int m, int* ia, int* ib ) {


    int n4 = n/4;
    {
    int i;
    int4 s_ia<n4>;
    int4 s_ib<n4>;

    streamRead(s_ia,ia);
    test_int4_gpu_kern(m,s_ia,s_ib);
    streamWrite(s_ib,ib);
    }
}

int main(int argc, char** argv) {


    int i;
    int n = 40;
    int* ia = (int*)malloc(n*sizeof(int));
    int* ib = (int*)malloc(n*sizeof(int));

    for(i=0;i<n;i++) {
        ia[i] = i;
        ib[i] = 0;
    }

    for(int m = 0; m < 12; m++) {

        test_int4_gpu(n,m,ia,ib);

        printf("m=%d\n",m);
        for(i=0;i<n/4;i++) {
            printf("%d: %x %x %x %x -> %x %x %x %x\n",
                i,ia[i*4+0],ia[i*4+1],ia[i*4+2],ia[i*4+3],
                ib[i*4+0],ib[i*4+1],ib[i*4+2],ib[i*4+3]
            );
        }

    }

cleanup:

    if (ia) free(ia);
    if (ib) free(ib);

   return(0);

}

This test code "sweeps" a value through the int4 data and masks out matching values component-wise to 0xffff. Here is part of the output, showing component-wise application of the conditional expression.

m=0
0: 0 1 2 3 -> ffff 1 2 3
1: 4 5 6 7 -> 4 5 6 7
2: 8 9 a b -> 8 9 a b
3: c d e f -> c d e f
...
m=1
0: 0 1 2 3 -> 0 ffff 2 3
1: 4 5 6 7 -> 4 5 6 7
2: 8 9 a b -> 8 9 a b
3: c d e f -> c d e f
...
m=2
0: 0 1 2 3 -> 0 1 ffff 3
1: 4 5 6 7 -> 4 5 6 7
2: 8 9 a b -> 8 9 a b
3: c d e f -> c d e f
...

...

 

 

 

 

 

 

0 Likes
rick_weber
Adept II

Does 1.4 support double precision math functions? These are kinda important for scientific computing.

0 Likes

Originally posted by: rick.weber Does 1.4 support double precision math functions? These are kinda important for scientific computing.

 

 

I suppose it wouldn't kill me to read the release notes. It's not presently supported.

0 Likes

When I go to the drivers page for the FireStream 9170, all I see is Catalyst 8.51, which is what I currently have. If I install the 9.2 drivers for another card, will it work?

0 Likes
ryta1203
Journeyman III

1. Feature Request: Local arrays in a high-level language. I'm pretty sure that this is going to be necessary for OpenCL, so I imagine that you are already working on it for OpenCL.

2. Documentation: ISA examples for CAL. I can't seem to get ISA code generated by the AMUDISASM program (which generated both header and footer) to work; I keep getting a parser error. I've posted about this several times with little or no response other than "it's not recommended".

0 Likes
ryta1203
Journeyman III

Sorry, I almost forgot:

Feature Request/Other: PROFILER!!!!!!!!!!!!!!!!

0 Likes

Feature request: Profiler (as also suggested by ryta1203)

Specifically, I request that we get counters (in CAL) for:

a) Usage of texture units. The TUs are often the bottleneck, given the high ALU:TU ratio of the chips.

b) Cache hit ratio (this seems broken on 48xx cards, as far as I know). It would be useful to get separate hit ratios for the L1 and L2 caches. There are no published details about the cache, so it is often difficult to calculate theoretically what the hit ratios will be.

Without the counters, it is often very difficult to determine where the bottleneck in a system is. We are often reduced to trial and error to optimize kernels.

0 Likes
ryta1203
Journeyman III

Is there a particular reason that my app would run slower in 1.4 than in 1.3 using Brook+?

For 2000 iterations of a size of 1024x1024 (which runs at a total of about 87000 iterations), my code runs 1 second slower, that's about 40 seconds slower for the whole app.

That's quite a decrease in performance, any ideas?

0 Likes
ryta1203
Journeyman III

The variable name "size" no longer compiles; I get a "redefinition" error.

This was NOT a problem in 1.3 but IS a problem in 1.4, and it does NOT seem to be documented the way "line" and "transpose" are.

Just a note: it seems to be a "Documentation" error.

0 Likes

We did not see any compilation issue with the following kernel:

kernel void copy(float i<>, out float o<>)
{
    float size = 0.0f;
    o = i;
}

If the above kernel is not a representative one, could you please send us a test case that shows the compilation issue?

 

0 Likes
