Hi, I've got a few questions about OpenCL and CAL for the ATI team. (And if possible, please don't give PR-type answers. Examples: "you can't compare OpenCL to Brook because OpenCL doesn't use texture units" or "OpenCL is only for the 5xxx family" when the compiler lists only a 4xxx target.)
1. Do you plan to release the new extensions used by OpenCL (and when)? Also, if possible, could you write a short description of what they do (some are obvious)?
Here is the list:
calExtGetProc: extid=8007 name=calConfig
calExtGetProc: extid=8005 name=calCtxCreatePrivateCounter
calExtGetProc: extid=8005 name=calCtxConfigPrivateCounter
calExtGetProc: extid=8005 name=calCtxGetPrivateCounter
calExtGetProc: extid=8008 name=calResAllocView
calExtGetProc: extid=8008 name=calResQueryInfo
calExtGetProc: extid=8008 name=calResMemCopy
calExtGetProc: extid=8009 name=calCtxWaitForEvents ( is it blocking ? )
calExtGetProc: extid=800B name=calMemCopyRaw
2. Are the devs going to implement an LDS optimization for the 4xxx family? Specifically, I'm thinking about detecting whether a kernel's writes to LDS match the pattern "LDS[const1*p + const2]=value" (where const2<const1). This would allow the use of native LDS. If the memory access doesn't follow this pattern, global memory would be used (as is done now).
Most kernels will probably use this access pattern anyway, and it would give a huge speed advantage (and some kernels could probably be converted by programmers if they knew about this optimization).
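To make the pattern in point 2 concrete, here is a minimal sketch in plain C (not OpenCL or CAL code). It assumes that p is the work-item's index within the group and that const1 is the per-work-item slot size. The names lds_index and lds_write_is_owner_private are hypothetical and only illustrate the check a compiler would have to prove.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: index written by work-item p, in the form const1*p + const2.
   stride plays the role of const1, offset the role of const2. */
size_t lds_index(size_t p, size_t stride, size_t offset)
{
    return stride * p + offset;
}

/* Hypothetical check: true when the index falls inside the slot owned by
   work-item p, i.e. index == stride*p + offset for some offset < stride. */
bool lds_write_is_owner_private(size_t index, size_t p, size_t stride)
{
    assert(stride > 0);          /* a zero-sized slot makes no sense */
    return index / stride == p;  /* integer division recovers the owner */
}
```

For example, with a stride of 4, work-item 3 owns indices 12 to 15, so an offset of 2 passes the check and an offset of 4 (which lands in work-item 4's slot) does not.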
3. Are the devs planning to implement the use of texture units for memory access (4xxx family)? Again, the problem is for the compiler to detect whether memory reads follow the pattern "value=some_const_pointer_parameter[width*y + x]" (where x<width, and width is some value that could be computed by the kernel, or a const, or a parameter). As the pointer is const, the memory cannot be written through it, and overlap with other parameters could be detected at run time (in which case the current compiler code would be used).
This optimization is quite important for writing efficient kernels (like matrix multiplication).
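The reason the x<width condition in point 3 matters can be sketched in plain C (again, not OpenCL or CAL code): when it holds, the linear index width*y + x maps back to exactly one (x, y) pair, which is what a 2D texture fetch would need. The type coord2d and the functions linear_index and linear_to_2d are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical 2D coordinate, as a texture fetch would address it. */
typedef struct {
    size_t x;
    size_t y;
} coord2d;

/* Row-major linear index, in the form width*y + x from the post. */
size_t linear_index(size_t x, size_t y, size_t width)
{
    assert(x < width);  /* the condition the compiler would have to prove */
    return width * y + x;
}

/* Recover (x, y) from a linear index; unique only because x < width. */
bool linear_to_2d(size_t index, size_t width, coord2d *out)
{
    if (width == 0 || out == NULL) {
        return false;
    }
    out->x = index % width;
    out->y = index / width;
    return true;
}
```

With a width of 16, the element at x=5, y=7 has linear index 117, and linear_to_2d turns 117 back into (5, 7).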
4. A CAL & 4xxx question. Accessing memory through the g variable (global buffer) generates code with the UNCACHED flag. Is it possible to change it to CACHED?
Example from some code:
07 TEX: ADDR(64) CNT(1)
9 RD_SCATTER R0, DWORD_PTR[0+R0.x], ELEM_SIZE(3) UNCACHED
The 7xx ISA documentation suggests that the CACHED flag should be available.
And one more thing: if you can't answer some or all of these questions, please say so.
I can also add that, for me, OpenCL is unusable without points 2 & 3 (I'm forced to use CAL or switch to another brand, which could be less hassle).