Originally posted by: notzed
Umm ok. Calm down mate! I know exactly where you're coming from, so there's no need to get so worked up. I really have 'been there, done that' with this; far more than is sensible.
This is why someone needs to come up with some sort of proper text accenting system, better and more acceptable than emoticons. You read that with accenting on the text as though I was "worked up". Anything but! I had figured use of the word soapbox would mean something different, but given your use of the word mate, I suppose you're not from the US? Though chillax would make me think California; either way, perhaps it means something different where you come from. Out here it's kind of a joke, saying this is an important issue to me, and I talk about it a lot, but people rarely listen, and though sometimes it may seem like I'm preaching, and therefore would care with religious fervor, the very fact that I bring out the word soapbox means I'm laughing at myself for it, and have come to grips with it being unpopular, and so I'm anything but worked up. Yes, that's a long explanation for a single word, but that's the meaning it carries around here. I also tried to indicate it was a joke by repeating it in such an obvious way with the pattern "statement", did I mention "statement", because if I didn't, "statement". That pattern was used by comedians, and US sitcoms, so it's supposed to have a humorous connotation, thus the intentional use....so relax and cheer up! Not everyone on internet forums is angry! I for one am usually happy and very easy going....only when I'm asking my own questions, and getting stonewalled with marketing type answers do I ever get "worked up", and I usually end up using words like "frustrated" to indicate it 🙂 When you read things, try not to put negative, snippy, or snarky inflections on things unless it's clear the person is being very negative. I tried my hardest to show that was not the tone I was using (see description of connotation of soapbox, and the repeating pattern again for how), but it seems I still failed!
Worst case is you think people you may never actually meet are friendlier than they are....that's not such a bad thing! Perhaps I've been duped with highly negative postings, taken positively. Does it have any effect on me? Nope!
As for "wrong", you applied it broadly to yourself, whereas I applied it only to your statement that this is not the case anymore, made in reply to my statement that you need to help the optimizer rather than trust the optimizer. Thus the inclusion of the quote. That statement was, is, and will continue to be wrong for some time, and that's not an opinion, that's fact. It's quite easy to write bad (bad as in slow) C/C++ code. I wasn't going to sugarcoat that, despite my not being "worked up", and I never will. As I said, it's a soapbox issue for me, with all the meaning given in the first paragraph, so I will continue to "preach" it, and I *KNOW* what I'm saying is factual, not subjective, not an opinion. I have code for the proof, but it was written for the company, on company time, so the company owns it, so I can't just post it, though perhaps if I find some more time somehow, I'll start an optimization blog...
See, this is why we need better text accenting schemes...all that *just* to explain how what I wrote should have been read, and how I read what was written...so much wasted text...
More on topic, I never argued against using pragma unroll, merely stated you shouldn't trust it. Remember, in C, pragma unroll is merely a hint, just like inline. OpenCL may be different, but given its C-like nature, I doubt it. So even this workaround for the stated problem is not necessarily a workaround should the optimizer (probably wrongly) decide not to unroll the loop. (Though AMD's optimizer is very aggressive as I've been stating, so chances are pretty good it will listen, or even just always listen regardless, but that's internal details, we only know what we're told.) I just want everyone to understand that neither it, nor other directives, optimization settings, or the optimizer itself are magic black boxes that through tremendous heat and pressure turn dung into diamonds. At best, I'd say it manages http://www.changingworldtech.com 's TCP turning dung into crude oil, but give it bad inputs and it will turn dung into dung.
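To make the "help the optimizer rather than trust it" point concrete, here is a minimal plain C sketch. It is not code from this thread, and the function names `sum_rolled` and `sum_unrolled4` are made up for illustration. The first version leaves any unrolling entirely to the compiler; the second writes out four loop bodies plus a remainder loop, so the unrolling no longer depends on a hint being honored.

```c
#include <stddef.h>

/* Rolled loop: whether this gets unrolled is the compiler's decision. */
float sum_rolled(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += x[i];
    return acc;
}

/* Hand-unrolled by 4 (hypothetical example). The additions happen in the
   same order as in sum_rolled, so both functions return the same value. */
float sum_unrolled4(const float *x, size_t n)
{
    float acc = 0.0f;
    size_t i = 0;

    /* Main part: four explicit loop bodies per trip. */
    for (; i + 4 <= n; i += 4) {
        acc += x[i];
        acc += x[i + 1];
        acc += x[i + 2];
        acc += x[i + 3];
    }

    /* Remainder: handles the last n % 4 elements. */
    for (; i < n; ++i)
        acc += x[i];

    return acc;
}
```

Whether the hand-unrolled version is actually faster depends on the compiler and the target, so it is worth measuring both rather than assuming.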
Since you have a little background...here's a little on mine...I tried starting programming at age 5 on a Commodore 64, but my brain wasn't really mature enough, however, I still showed interest, and by 8, I had my dad's old 286 with Borland C++ 2.0, and a book on programming, and off I went. Started college classes at age 13. Always focused on video games, which is synonymous with optimization. I could list the books I have on the topic, but it would make this far too long. I currently work at a job where optimization is a function I perform, though in most of my previous jobs it was still appreciated, and I'm at least decent at it. I bet Michael Abrash could squeeze another 10% out of my code, but that dude is something else...No, I didn't go into the game industry, so you won't see any Corrys in the credits (or if you do, it's not me). I decided $20K less/year, 80+ hour work weeks, and higher stress was just downright stupid compared to the defense industry where I had a job offer....I also have interests in theoretical physics, mechanical engineering, electrical engineering, and derivatives of combining all of the above, enough that I have a small research company attempting to build/experiment with one such derivative idea. Thus my always being way too busy!
Anyhow, keep it positive. Someone dig through the IL for the OP; Micah said he can't reproduce, so either it's fixed internally, or something else is up. There's a chance tonight I may get a chance to look at it if no one else does...I've got a lot to do...
Originally posted by: corry
Might I suggest then, doing AMD's work for them and characterizing the OpenCL->IL bug exactly?
Actually, as best I can recall (I'm away from my GPU workstation today), the swizzle/loop-unrolling bug I encountered was not going from OpenCL to IL, but instead the bug manifested itself going from IL to ISA. But to be honest, I didn't look much at the IL. I speculate that the "pragma unroll" bypasses the bug since by the time the IL->ISA compiler in Catalyst is working on my code, the four loop bodies, and the swizzles in them, have all been directly listed out four times. Without the "pragma unroll", the IL->ISA compiler is doing its own loop unrolling and messing up keeping track of the swizzle. (Oh, and the swizzle is being fully optimized away and does not appear in the ISA file; it's simply changing which field in the 4 element vector is being worked on per loop body.)
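For readers who have not run into this pattern, here is a rough plain C illustration of a loop whose swizzle can be "optimized away" once the loop is unrolled. This is only a guess at the general shape, not the kernel from the bug report: `vec4` is a stand-in for OpenCL's `float4`, `rotate_left` imitates a swizzle like `v = v.yzwx`, and all names are hypothetical.

```c
/* Stand-in for OpenCL float4 (assumption for illustration). */
typedef struct { float s[4]; } vec4;

/* Imitates the swizzle v.yzwx: every component moves one slot left. */
static vec4 rotate_left(vec4 v)
{
    vec4 r = {{ v.s[1], v.s[2], v.s[3], v.s[0] }};
    return r;
}

/* Rolled form: each iteration reads component 0, then swizzles. */
float scaled_sum_rolled(vec4 v)
{
    float acc = 0.0f;
    for (int i = 0; i < 4; ++i) {
        acc += (float)(i + 1) * v.s[0];
        v = rotate_left(v);
    }
    return acc;
}

/* Unrolled form: the swizzle is gone; each body just names a
   different component of the original vector. */
float scaled_sum_unrolled(vec4 v)
{
    float acc = 0.0f;
    acc += 1.0f * v.s[0];
    acc += 2.0f * v.s[1];
    acc += 3.0f * v.s[2];
    acc += 4.0f * v.s[3];
    return acc;
}
```

The two functions should agree for every input. An unroller that loses track of the swizzle would pair the weights with the wrong components, which is the kind of mismatch a CPU reference like this can expose when compared against GPU output.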
On a side note, AMD did respond to my bug report support ticket, but I've not heard back if they could reproduce it or if it's fixed yet, etc.
10/26/2011 update: At AMD's request, I submitted full code to reproduce the bug, along with my test results on various GPUs, Catalyst versions, and SDK versions. Hopefully AMD Catalyst developers can make sense of it and get this fixed. Actually, I do hope this is just one bug, and not multiple bugs that have a similar workaround.
I am also experiencing this bug with APP 2.5 in my application. If required, I could also try to create sample code showing this.
One slight pattern I saw is that it seems to get the correct number of iterations without the pragma if the compile-time-defined iteration count is something like 4 or 8, but with odd numbers - in my case 13 - it seems to get off track. Specifying "#pragma unroll 13" fixes it in that case.
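A small host-side sanity check along these lines can help pin down which trip counts misbehave. The sketch below is plain C, not OpenCL, and not the application code mentioned above; `ITERATIONS` and both function names are made up. The pragma spellings are the ones I know for host compilers (`#pragma unroll 13` for Clang, `#pragma GCC unroll 13` for GCC 8 and later); other compilers simply get no hint here.

```c
/* Compile-time trip count, matching the odd count from the report. */
#define ITERATIONS 13

/* No hint: the compiler decides on its own how to handle the loop. */
int count_iterations_plain(void)
{
    int count = 0;
    for (int i = 0; i < ITERATIONS; ++i)
        count += 1;
    return count;
}

/* With an explicit full-unroll hint equal to the trip count. */
int count_iterations_hinted(void)
{
    int count = 0;
#if defined(__clang__)
#pragma unroll 13
#elif defined(__GNUC__)
#pragma GCC unroll 13
#endif
    for (int i = 0; i < ITERATIONS; ++i)
        count += 1;
    return count;
}
```

Both functions should return 13. The same idea carries over to a kernel: have it write its iteration count to a buffer and compare that against the compile-time value for counts such as 4, 8, and 13, with and without the pragma.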