I found that there is now some support for extensions on the GPU:
- double support for the basic operators (+ - * /), though only on the 58xx and 59xx series; I wonder if the CPU is supported now
- ICD support, so NVIDIA and ATI can live in the same system as two platforms
- Stream Analyzer Profiler - I hope there will also be some tool for developers who do not use VS
- new documentation, like the performance optimization guide and the ISA for the 5xxx cards
I thought you were speaking of extensions like cl_khr_byte_addressable_store, but I ran my test program and found nothing new there. I was hoping, because of the new driver and so on... maybe next time.
EDIT: I think I missed something. There is a NEW SDK out!
I found something interesting. There is a line in the Release Notes (page 3, section 6.2, Runtime) that says:
"On 7xx-based GPUs, the maximum work-group size is 64. Specifying a larger size can result
in undefined behavior."
I have an EHA4850, which is an ATI RV770 GPU. I have an application that runs with a work-group size of 230, and I have no problems there. Which cards are meant, then? The 3xxx series?
I see. I tried something now: I used a work-group size of 50 instead of 230, and I did not get as much performance loss as I thought; in fact, I think it is almost nothing. Is it because the implicit barrier is holding back the run of the other solution? Or is the barrier ONLY optimized away at a group size of 64?
Another question: how come the maximum work-group size is 256 if I can only use 64 of it?
OK, it is because I got the 230 group size from the runtime: I set the group size to NullRange, and so on.
With 230 I get ~7.27 ms kernel time, and with 50 ~8.13 ms.
So, is it because of the barrier that there is so little difference between the two, or because the kernel does not do that much work?
I have attached my kernel if you'd like to see it; it's a simple masking algorithm for pictures.
And thank you for the quick and accurate answer!
/* Simple image-masking (N x N convolution) kernel.
   seged = pixels per colour channel (width * height)
   hatar = number of mask elements (N * N)
   cim   = address of the centre pixel in the padded input
   NOTE: the original post had `rMid[cim+seged_t]*maszk`, which does not
   compile (pointer used as index / multiplied as a value); indexing the
   offset table and the mask with i is the presumed intent. */
__kernel void maszkol(__global unsigned int *rOut,
                      __global unsigned int *gOut,
                      __global unsigned int *bOut,
                      __global unsigned int *rMid,
                      __global unsigned int *gMid,
                      __global unsigned int *bMid,
                      __global unsigned int *N,
                      __global unsigned int *width,
                      __global unsigned int *height,
                      __global int *seged_t,
                      __global float *maszk)
{
    unsigned int seged = *width * (*height);
    unsigned int hatar = *N * (*N);
    unsigned int cim = 0;
    unsigned int i = 0;
    float ertek = 0;

    /* The global range covers all three channels back to back;
       the switch picks the channel this work-item belongs to. */
    switch (get_global_id(0) / seged) {
    case 0:
        cim = ((*N / 2) * (*N - 1 + *width))
            + ((get_global_id(0) / (*width)) * (*N - 1))
            + (*N / 2) + get_global_id(0);
        for (i = 0; i < hatar; ++i)
            ertek += rMid[cim + seged_t[i]] * maszk[i];
        rOut[get_global_id(0)] = ertek;
        break;
    case 1:
        cim = ((*N / 2) * (*N - 1 + *width))
            + (((get_global_id(0) - seged) / (*width)) * (*N - 1))
            + (*N / 2) + get_global_id(0) - seged;
        for (i = 0; i < hatar; ++i)
            ertek += gMid[cim + seged_t[i]] * maszk[i];
        gOut[get_global_id(0) - seged] = ertek;
        break;
    case 2:
        cim = ((*N / 2) * (*N - 1 + *width))
            + (((get_global_id(0) - (2 * seged)) / (*width)) * (*N - 1))
            + (*N / 2) + get_global_id(0) - (2 * seged);
        for (i = 0; i < hatar; ++i)
            ertek += bMid[cim + seged_t[i]] * maszk[i];
        bOut[get_global_id(0) - (2 * seged)] = ertek;
        break;
    }
}