So, I'm kinda hoping that with the 7970 out, you guys might be willing to divulge the secret happy dance for making uav11 work. I thought perhaps it was that typeless read-only uav thing, and tried it because, well, basically what I need is a nice big constant buffer (OK, big is a relative term, but bigger than the constant buffer). I figured a cached read-only uav should be about perfect, and I suspect it will work rather nicely on the 7970s, since you said there is basically no more constant buffer.
So a follow-up would be: how would a typeless read-only uav be implemented on Caymans (wrt caching)? Speed is obviously "my thing". I have what may be a nice time/space tradeoff, but since the time is already fairly low, it may not be worth it if I can't get caching. If it's cached, I suspect I'll see a decent improvement! If it's only good for the 7970s, well, it may not do me much good yet...
You can add "_cached" to the loads, such as:
uav_raw_load_id(11)_cached r1011, r1010.xxxx
This will tell the compiler to use the cached resource. This will work for all GPUs HD54xx and up.
Edit: This doesn't work outside of OpenCL; sorry for the confusion. See post below.
I was under the impression that _cached was only supposed to work with uav11, which always seems to return 0s no matter what you do to it. That was what I took away from my last conversation about uav11, anyway. I think playing with my kernel has got it all messed up now, though, as simply reading from uav0 now shows me all 0s... so I'll get back on that tomorrow: get it working slowly with uav0, then try cached/uav11/typeless raw read-only, etc. again.
Yes, the example I showed uses uav11. As for getting 0s for the loads: it looks like the "cached" feature on HD5xxx/6xxx is only supported for OpenCL, as you won't be able to get access to uav11 otherwise. Sorry for the confusion.
HD77xx/79xx always use caching so you don't need to do anything special there.
Right, that was the reason for my question... as I put it before, I was hoping you could divulge the secret happy dance to get uav11 working. OpenCL is based on CAL, from everything I've been told, so there has to be a way... I'm not even really sure why AMD wants to keep this a secret, as everything else about the system is pretty open...
So if I had to guess....I'd guess I need to load aticaldd64.dll, GetExport calResAllocView and/or calResAllocViewSlice, but also typedef something for the view return value?
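For anyone wanting to try that first step, here is a minimal sketch in Python (ctypes) that only checks whether a library exports a given name. It assumes nothing about the functions themselves: whether aticaldd64.dll really exports calResAllocView or calResAllocViewSlice is just the guess above, their signatures are unknown, and try_get_export is a made-up helper name. It does not call anything it finds.

```python
import ctypes


def try_get_export(library_name, symbol_name):
    """Return a ctypes handle for symbol_name in library_name, or None.

    Hypothetical helper: it only tests that the export exists. It does not
    know the function's signature and never calls it.
    """
    try:
        library = ctypes.CDLL(library_name)
    except OSError:
        return None  # library is missing or cannot be loaded on this system
    try:
        return getattr(library, symbol_name)
    except AttributeError:
        return None  # library loaded, but it does not export this name


if __name__ == "__main__":
    # The DLL and export names below are the guesses from the post above.
    for name in ("calResAllocView", "calResAllocViewSlice"):
        handle = try_get_export("aticaldd64.dll", name)
        print(name, "found" if handle is not None else "not found")
```

Finding the export is the easy part; the argument list and the type of the returned view would still have to be worked out before it could be called safely.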
Well, if I have time, I guess I'll just have to try and RE it some more...that was only about 5 minutes of looking...so I bet I can figure it out...;)
Anyhow, then how about telling me whether it's worthwhile at all to switch to using typeless read-only uavs on Cayman/Turks? How is it implemented? If it's small enough, will it go into the constant buffer? Will it do something like caching on the typeless read-only uavs on Cayman hardware if uav11 isn't declared anywhere else? Or will it just be like any other memory read, and therefore a performance hit?
Edit: Wow, it's a Monday... stupid typo...
I thought you might say that... I had hoped, since it compiled, that there had been some effort to make it efficient on the slightly older stuff... guess it's back to reverse engineering... that unfortunately takes time, so it's a backburner issue...
Yeah, the EG/NI UAVs have all sorts of problems (caching, alignment, ro/wo/rw, etc.). Typeless UAVs are the replacements that fix all of those issues, but because of hardware/software constraints they are not ported backwards.