Hi All!
I would like to ask people who have tried ATI Stream (with or without CAL) what performance they got (in GFlops, please).
I have tried CUDA (maybe it was foolish of me not to try ATI first), and I got at most (a real maximum) 25% of the performance declared in the specs.
Now I am trying to understand whether it is possible to reach the 1000 GFlops declared for Rxx, or whether that figure is a fake (sorry... :-)) like CUDA's.
I have read that some people really got 97%. I believe it.
I need to know:
Speed of one shader processor.
Speed with access to L1 cache,
Speed with access to L2 cache,
Speed with access to global memory.
If it is possible to attach a simple test application for VS2005 (or 2008), that would be simply perfect.
Thanks & regards,
Dmitry
PS. If someone gives me an example that reaches 95% of the declared performance, I will be the biggest fan of ATI and AMD!!!
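To make the percentages in this post concrete, here is a minimal sketch of the arithmetic behind "X% of declared performance". The function names and the sample measurement (5e11 operations in 2 seconds) are hypothetical and chosen only for illustration; the 1000 GFlops figure is the declared value quoted above.

```python
def achieved_gflops(flop_count, seconds):
    """Throughput in GFLOPS for flop_count floating-point operations done in seconds."""
    return flop_count / seconds / 1e9


def fraction_of_peak(measured_gflops, declared_gflops):
    """Measured throughput as a fraction of the declared (spec sheet) figure."""
    return measured_gflops / declared_gflops


# Hypothetical measurement: a kernel performs 5e11 operations in 2 seconds.
measured = achieved_gflops(5e11, 2.0)
print(measured)                            # 250.0
print(fraction_of_peak(measured, 1000.0))  # 0.25
```

The operation count has to come from the kernel itself (operations per element times number of elements), and the time should cover only the kernel execution, not data transfers, if the goal is to compare against the declared compute rate.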
[Reply by godsic, written in Cyrillic; the text was lost to an encoding error. Surviving fragments: "FLOAT_4", "95%".]
Originally posted by: godsic [same unreadable text; surviving fragments: "FLOAT_4", "95%"]
There is some problem with the fonts; I cannot read it. (Write in Latin letters, the forum does not understand Russian :-))
Theoretically, a shader that adds two FLOAT_4 values and then writes the result to a FLOAT_4 buffer will give you that very performance figure; it is marketing. The cache structure is not like a CPU's: the caches are smaller and organized differently. CAL is definitely better than CUDA, but it demands more from the programmer than just knowledge of C. Experience with a graphics API (DX or OpenGL) is desirable. If you have it, everything is simple and easy, although IL gets very tiresome. It immediately brings back memories of the good old Pentagon-1024.
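As a rough illustration of the kernel described above (add two FLOAT_4 values, write one FLOAT_4), the sketch below counts the arithmetic and the buffer traffic per element. It assumes 32-bit floats, one add per component, and that every operand is read from and written to a buffer with nothing reused from cache or registers; those assumptions and all names are mine, not taken from the CAL documentation.

```python
FLOATS_PER_FLOAT4 = 4   # a FLOAT_4 holds four components
BYTES_PER_FLOAT = 4     # assumption: 32-bit floats


def float4_add_cost(num_elements):
    """Return (flops, bytes_moved) for c[i] = a[i] + b[i] over FLOAT_4 elements."""
    flops = num_elements * FLOATS_PER_FLOAT4  # one add per component
    # Two FLOAT_4 reads plus one FLOAT_4 write per element.
    bytes_moved = num_elements * 3 * FLOATS_PER_FLOAT4 * BYTES_PER_FLOAT
    return flops, bytes_moved


def bandwidth_needed_gb_s(target_gflops):
    """Buffer bandwidth (GB/s) implied by a target rate, if nothing is cached."""
    flops, bytes_moved = float4_add_cost(1)
    return target_gflops * bytes_moved / flops


print(float4_add_cost(1_000_000))     # (4000000, 48000000)
print(bandwidth_needed_gb_s(1000.0))  # 12000.0
```

Under these assumptions the kernel moves 12 bytes per floating-point operation, so how close it gets to a declared GFlops figure depends on where the operands actually live (registers, cache, or global memory), which this count does not model.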
If so, that is absolutely great!!! 🙂
I have no experience with DX or OpenGL specifically, but I have a great deal of experience with various DSPs, so there should be no problems.
PS. I was very surprised when I became convinced that CUDA is a fake.
PS.PS. Do you happen to have a simple project as an example?
Unfortunately I do not have everything with me right now. I can send it this Wednesday, together with my optimization experience.
In the meantime you can look at /doc/Stream_programming_guide.pdf. It has a HelloCAL section with a simple application and a description of what is done and how. That is what I started with. The PDF with the IL specification (syntax and instructions) is in the same place.
You definitely need to read that too.
As for the API, it is so-so. Over time you will write classes with simplified calls for your particular task.
OK!
Thanks! 🙂
You're welcome. Our cause is just!
One more question.
Is it possible to write values from kernels directly to the frame buffer? Or, more generally, how can a framebuffer be used in ATI Stream? Is it possible at all?
You can try it with the DX interoperability exposed in ATI Stream.
Of course you can.
There are two aspects here:
1. Watch which names it assigns to the variables (buffers, constants, etc.). Those are exactly the names you then have to use in the CAL calls, or you can change them by hand in the IL code itself if you do not like them.
2. Brook+ -> IL can generate non-optimal code!!! AMD decided not to bother with optimizations. (GLSL and DX shaders are optimized!)
You can use SKA (Stream Kernel Analyzer; download it here in the GPU tools section). You write Brook+ in its left pane and click in the right pane to display the IL. It also analyzes the efficiency of the code (that is the main purpose of the program). Brook+ is very fond of converting types where it is not needed.