I ordered an HD 5870 from Newegg on Tuesday (5 minutes after a script I wrote sent me an alert indicating its availability) and received it today, Thursday. I upgraded the ATI drivers on my 64-bit Linux GPGPU dev box to version 9.9, kept the SDK at version 1.4, and compiled a test program to measure the FLOPS rating:
2662 GFLOPS, or 98% of the max theoretical 2720 GFLOPS
This is 36% more FLOPS than my 4850 X2 cards, at 81% of the power consumption. Everything just worked on the first attempt even though the card is not yet "officially supported" by the 9.9 Linux drivers - I love it 🙂
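As a rough sanity check of the 2720 GFLOPS theoretical figure, here is a minimal sketch. It assumes the HD 5870's published specs (1600 stream processors at 850 MHz, neither stated in the post) and one multiply-add, counted as 2 FLOPs, per stream processor per clock. The helper name `peak_gflops` is purely illustrative.

```python
def peak_gflops(stream_processors, clock_mhz, flops_per_sp_per_clock=2):
    """Theoretical single-precision peak in GFLOPS.

    Assumes each stream processor retires one multiply-add
    (counted as 2 FLOPs) per clock.
    """
    return stream_processors * clock_mhz * flops_per_sp_per_clock / 1000.0

# Assumed HD 5870 specs: 1600 stream processors, 850 MHz core clock.
peak = peak_gflops(1600, 850)

# Measured figure quoted in the post.
measured = 2662.0
efficiency = measured / peak

print(f"peak: {peak:.0f} GFLOPS, measured: {measured:.0f} GFLOPS, "
      f"efficiency: {efficiency:.1%}")
```

Under those assumptions the peak comes out to 2720 GFLOPS, and the measured 2662 GFLOPS is just under 98% of it.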
Update: I have two HD 5970s working too.
And more about R800. The way it handles memory fetches differs a lot from R700. I'm using a simple IL construct like:
dcl_resource_id(0)_type(1d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
dcl_resource_id(1)_type(1d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
dcl_resource_id(2)_type(1d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
dcl_resource_id(3)_type(1d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
...
sample_resource(1)_sampler(1) r11, r1.y000
sample_resource(2)_sampler(2) r61, r1.y000
... etc
And the test results don't look good at all. With a lot of memory reads, performance of the 5770 dropped to 285M while the 4770 shows 357M. That's R800 at 850 MHz with 800 SPs vs R700 at 750 MHz with 640 SPs, with memory clocked at 1200 MHz on the 5770 vs 800 MHz on the 4770 (both GDDR5 on a 128-bit bus). So the 5770 is about 20% slower than the 4770 (put the other way, the 4770 is 25% faster), while in theory the 5770 should be about 40% faster.
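To make the comparison explicit, here is a small sketch of the arithmetic using only the numbers quoted above. It treats "stream processors × core clock" as a crude proxy for theoretical ALU throughput, which is an assumption for illustration and ignores memory bandwidth entirely. The names `measured`, `specs`, and `alu_rate` are hypothetical.

```python
# Throughput figures quoted in the post ("M" is the post's own unit).
measured = {"HD 5770": 285.0, "HD 4770": 357.0}

# (stream processors, core clock in MHz) as quoted in the post.
specs = {"HD 5770": (800, 850), "HD 4770": (640, 750)}

def alu_rate(sp_count, clock_mhz):
    """Crude proxy for theoretical ALU throughput: SPs times clock."""
    return sp_count * clock_mhz

measured_ratio = measured["HD 5770"] / measured["HD 4770"]
theory_ratio = alu_rate(*specs["HD 5770"]) / alu_rate(*specs["HD 4770"])

print(f"measured 5770/4770:        {measured_ratio:.2f}")  # 0.80
print(f"theoretical ALU 5770/4770: {theory_ratio:.2f}")    # 1.42
```

So the 5770 lands at roughly 80% of the 4770's measured rate (the 4770 is about 25% faster), where the ALU proxy alone would predict about 1.4x.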
Is anybody else getting similar results? Any explanation for this? If the memory model changed for R800, what's the best way to do memory fetches? The CAL examples that come with OpenCL beta 4 use the same sample_resource() constructs, but those examples are way too old.
Any plans to document uav_raw_load_id, uav_raw_store_id? To publish R800 ISA? To answer some questions on this forum?
Our next release will have updated documentation that should cover all the newer hardware.
A Christmas present, i.e. coming this year?
Yeah, it would be nice if you were able to provide an ETA for the next release, Micah.
I've just tested the calCtxWaitForEvents() extension function and I like its functionality: no more endless calCtxIsEventDone() polling needed, so CPU load is now at 0%.
I still cannot figure out how to create a UAV buffer though. I guess it should be done with calResAllocView, but it takes too many params.
If the next OpenCL release is still far off, is it possible to simply post an updated cal_ext.h file? I guess that would be enough to figure out how the new CAL extensions work.
I hope the next docs allow copying text and include an easy CAL/IL tutorial for beginners.
The UAV instructions are finally documented, but the CAL extensions aren't, so it's still impossible (or unknown how) to allocate a UAV buffer. Any plans to publish a fresh cal_ext.h file, the one that was used to compile OpenCL.DLL? The cal_ext.h dated 10-Dec-2009 that comes with the latest SDK still doesn't contain any of the new extensions, namely calResAllocView.
Binding isn't a problem. Allocating the buffer itself is.
Or am I wrong, and it's possible to bind a resource created with calResCreate2D/calResAllocLocal2D/etc. as a UAV buffer? I wasn't successful with that, but I probably made some mistake. I was under the impression that calResAllocView is strictly required to allocate a UAV buffer.
OK, thanks for the info, I'll run more tests with UAV then.
Also, is calCtxWaitForEvents() also experimental and so not exposed yet? It looks like the OpenCL layer uses it heavily, and the waitForEvent functionality is a really nice addition.