I thought I'd do a quick and dirty comparison of the speed of the CPU vs CAL back-ends by using a large value for the Length variable in the hello_brook sample. I ran into some odd behaviour that I can't immediately figure out.
For values of Length up to 100,000 everything works fine. For values of Length > 100,000 up to some (undetermined) limit, the program returns "failed to get usable kernel fragment to implement requested reduction".
For very large values (Length=1,000,000) the CPU route returns the correct result (eventually) but the CAL route returns "There are 0.000000 elements...".
Have I done something stupid, or have I missed something fundamental? This is running on Win XP 32-bit, Radeon 4870 with 8.10 driver and the debug build of hello_brook.
I'm at work right now so I can't test this (no access to a 4870!) but reading through hello_brook.br I noticed that there's a difference from the documented way of specifying the reduction variable. In hello_brook.br we have:
reduce void hello_brook_sum(float input<>, reduce float val<>)
Here val is a stream with a single element. The documentation for reduction kernels shows the reduction variable being given as a simple data type, i.e. a plain float. I'll test this when I get home, but if anyone can confirm that this is the source of the issue I've noticed, that would be great.
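For comparison, the documented scalar-reduction form would look something like this (just a sketch based on the docs' style, reusing the hello_brook names; I haven't been able to compile it yet):

// Reduction variable declared as a plain scalar rather than a
// single-element stream (untested sketch):
reduce void hello_brook_sum(float input<>, reduce float val)
{
    val += input;
}

If the stream-style declaration in the sample is indeed the problem, swapping it for the scalar form above should be a one-line change to hello_brook.br.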
If this is the cause, are there many of these gotchas in Brook+, and are they being addressed in 1.3?