zpdixon
Journeyman III

HD 5870 and 5970 working :-)

I ordered an HD 5870 from Newegg on Tuesday (5 minutes after a script I wrote alerted me to its availability), received it today (Thursday), upgraded the ATI drivers on my 64-bit Linux GPGPU dev box to version 9.9, kept the SDK at version 1.4, and compiled a test program to measure the FLOPS rating:

2662 GFLOPS, or 98% of the max theoretical 2720 GFLOPS
This is 36% more FLOPS than my 4850 X2 cards, at 81% of the power consumption. Everything just worked on the first attempt even though the card is not yet "officially supported" by the 9.9 Linux drivers - I love it 🙂
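The 2720 GFLOPS peak and the 98% figure can be reproduced with a little arithmetic. The sketch below is only an illustration: it assumes the card's published specifications (1600 stream processors at 850 MHz, which the post does not state) and one multiply-add, counted as 2 FLOPs, per stream processor per clock. The function names are made up for this example.

```c
#include <assert.h>

/* Peak single-precision GFLOPS, ASSUMING each stream processor retires
   one multiply-add (2 FLOPs) per clock cycle. */
double peak_gflops(int stream_processors, double clock_mhz)
{
    assert(stream_processors > 0 && clock_mhz > 0.0);
    return stream_processors * 2.0 * clock_mhz / 1000.0;
}

/* Fraction of the theoretical peak reached by a measured rate. */
double efficiency(double measured_gflops, double peak)
{
    assert(peak > 0.0);
    return measured_gflops / peak;
}
```

Under those assumptions, peak_gflops(1600, 850.0) gives 2720, and efficiency(2662.0, 2720.0) is roughly 0.979, which rounds to the 98% quoted above.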

Update: I have two HD 5970s working too.


More about R800: the way it handles memory fetches differs a lot from R700. I'm using a simple IL construct like:

dcl_resource_id(0)_type(1d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
dcl_resource_id(1)_type(1d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
dcl_resource_id(2)_type(1d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
dcl_resource_id(3)_type(1d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)

...

sample_resource(1)_sampler(1) r11, r1.y000
sample_resource(2)_sampler(2) r61, r1.y000

... etc

And the test results don't look good at all. With a lot of memory reads, performance of the 5770 dropped to 285M while the 4770 shows 357M. That's R800 at 850 MHz with 800 SPs vs R700 at 750 MHz with 640 SPs, while the 5770's memory is clocked at 1200 MHz vs 800 MHz for the 4770 (GDDR5, 128-bit bus). So the 5770 is about 20% slower than the 4770 (put the other way, the 4770 is 25% faster), while in theory the 5770 should be about 40% faster.
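For reference, the ratios in this comparison can be checked directly from the numbers quoted in the post. This is a minimal sketch with made-up function names. It treats shader throughput as simply proportional to stream processors times core clock, which is an assumption and ignores everything else about the two architectures.

```c
#include <assert.h>

/* Relative shader throughput of card A over card B, ASSUMING throughput
   scales with stream processors times core clock. */
double compute_ratio(int sp_a, double mhz_a, int sp_b, double mhz_b)
{
    assert(sp_b > 0 && mhz_b > 0.0);
    return (sp_a * mhz_a) / (sp_b * mhz_b);
}

/* Percentage by which a differs from b (negative means a is lower). */
double percent_diff(double a, double b)
{
    assert(b != 0.0);
    return (a / b - 1.0) * 100.0;
}
```

compute_ratio(800, 850.0, 640, 750.0) is about 1.42, which is where the "40% faster in theory" estimate comes from. On the measured side, percent_diff(285.0, 357.0) is about -20 and percent_diff(357.0, 285.0) is about +25, so the 25% figure corresponds to the 4770's lead over the 5770. The memory clock alone gives percent_diff(1200.0, 800.0) = 50 in the 5770's favour.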

 

Is anybody else getting similar results? Any explanation for this? If the memory model changed for R800, what's the best way to do memory fetches? The CAL examples that come with OpenCL beta 4 use the same sample_resource() constructs, but those examples are very old.

 


Any plans to document uav_raw_load_id, uav_raw_store_id? To publish R800 ISA? To answer some questions on this forum?


empty_knapsack,
Our next release will have updated documentation that should cover all the newer hardware.

Our next release will have updated documentation that should cover all the newer hardware.

A Christmas present, i.e. coming this year?


Yeah, it would be nice if you could provide an ETA for the next release, Micah.

 

I've just tested the calCtxWaitForEvents() extension function and I like what it does: no more endless calCtxIsEventDone() polling is needed, so CPU load is now at 0%.

I still cannot figure out how to create a UAV buffer, though. I guess it should be done with calResAllocView, but it takes too many parameters.

 

If the next OpenCL release is still far off, is it possible to simply post an updated cal_ext.h file? I guess that would be enough to figure out how the new CAL extensions work.


I hope the next docs allow copying text and include an easy CAL/IL tutorial for beginners.


The UAV instructions are finally documented, but the CAL extensions aren't, so it's still impossible (or unknown how) to allocate a UAV buffer. Any plans to publish a fresh cal_ext.h file, the one that was used to compile OpenCL.DLL? The cal_ext.h dated 10-Dec-2009 that comes with the latest SDK still doesn't contain any of the new extensions, namely calResAllocView.


empty_knapsack,
Binding a UAV surface should work just like binding a global buffer surface, except that instead of using g[] with calModuleGetName you use uav#.

So instead of r = calModuleGetName(&progName, *ctx, *module, "g[]"); you would use
r = calModuleGetName(&progName, *ctx, *module, "uav0");
r = calModuleGetName(&progName, *ctx, *module, "uav1");
and so on, up to 8 UAVs on HD5XXX cards and 1 UAV on HD4XXX cards.
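When binding several UAVs in a loop, the symbol names can be generated instead of hard-coded. The helper below is a hypothetical sketch and not part of CAL. It only formats the "uav<N>" string and enforces the per-generation limits mentioned above. The calModuleGetName call is shown in a comment, using the same argument pattern as the post, because compiling it needs the CAL SDK headers.

```c
#include <stdio.h>

/* UAV limits as stated in the post above. */
enum { MAX_UAVS_HD5XXX = 8, MAX_UAVS_HD4XXX = 1 };

/* Hypothetical helper: writes "uav<index>" into buf.
   Returns 0 on success, -1 if index is out of range or buf is too small. */
int uav_symbol_name(char *buf, size_t len, unsigned index, unsigned max_uavs)
{
    if (index >= max_uavs)
        return -1;
    int n = snprintf(buf, len, "uav%u", index);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

// Intended use with CAL (not compiled here):
//   char name[8];
//   if (uav_symbol_name(name, sizeof name, i, MAX_UAVS_HD5XXX) == 0)
//       r = calModuleGetName(&progName, *ctx, *module, name);
```

Passing MAX_UAVS_HD4XXX as the limit rejects every index except 0, matching the single UAV available on HD4XXX cards.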


Binding isn't a problem. Allocating the buffer itself is.

Or am I wrong, and it's possible to bind a resource created with calResCreate2D/calResAllocLocal2D/etc. as a UAV buffer? I wasn't successful with it, but I probably made some mistake; I was under the impression that calResAllocView was strictly required to allocate a UAV buffer.


empty_knapsack,
calResAllocView is not used to create a resource; its usage is still experimental, which is why it is not exposed yet. The OpenCL runtime uses calResCreate*/calResAlloc* to create the UAV memory.

OK, thanks for the info, I'll run more tests with UAVs then.

 

Also, is calCtxWaitForEvents() experimental too, and so not exposed yet? It looks like the OpenCL layer uses it heavily; the waitForEvent functionality is a really nice addition.


OK, allocating a raw UAV should be no different from allocating a global buffer.