Hmmm such a shame. I was hoping for something more.
Yes, I too am interested in this AMD-optimised code. I can only assume this is the same code that was used when AMD said in January that BD would have 50% higher performance than a 4C i7. Unfortunately, it seems to only match the performance of the 8xx/9xx i7 range, and falls far behind in single-threaded work and gaming. Maybe they were mistakenly benchmarking on their OC'd 8.4 GHz chip?
I assume the only significant improvement would be to make use of AVX instructions, but how much that would improve your average OpenCL code is another matter.
My reply is slightly off topic, but the short version is: have you run your code through CodeAnalyst? It's free, so if you're going to complain about optimized code running slow, you'll probably want to.
It should show you if you're consistently generating cache misses or mispredicted branches, or not taking full advantage of out-of-order execution (getting your loads as early as possible, and not generating dependencies on stores), etc. That should make it pretty easy to speed up, especially if you're working in ASM. It will most certainly help C/C++ code as well, though; you just don't get 100% control over the hardware, so you have to be a little more creative in getting the compiler to do the right thing. :)
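To make the cache-miss point concrete, here is a minimal sketch in C of the kind of pattern a profiler tends to flag. The array size and the function names are made up for illustration; nothing here comes from the code being discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dimensions, chosen only for illustration. */
#define ROWS 512
#define COLS 512

/* Row-major walk: consecutive accesses touch adjacent addresses,
   so each cache line is fully used before the loop moves on. */
long sum_row_major(const int *m) {
    long total = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            total += m[r * COLS + c];
    return total;
}

/* Column-major walk over the same buffer: each access jumps
   COLS * sizeof(int) bytes, which tends to miss the cache more often. */
long sum_col_major(const int *m) {
    long total = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            total += m[r * COLS + c];
    return total;
}

/* Fill the buffer with a simple repeating pattern. */
void fill_matrix(int *m) {
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            m[r * COLS + c] = (int)((r * 31 + c) % 7);
}

/* Both traversals must agree; only their memory behaviour differs. */
void check_same_result(const int *m) {
    assert(sum_row_major(m) == sum_col_major(m));
}
```

Both functions return the same sum, so any timing difference between them comes from the access pattern alone. How large that difference is depends on the cache sizes and the compiler, so measure it on your own machine rather than taking a number from me.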
I have no idea how much time either major x86 chip vendor is spending on OpenCL for the CPU, but if I had to guess, I'd say Intel is probably doing more, because they're betting their GPU market on x86 with Knights Ferry. Then again, those x86 cores are stripped down, so the same optimization techniques may not apply...
Yeah, I'd say if I wanted something fast on the CPU, I'd look at CodeAnalyst and VTune (err, "Parallel Studio XE", so you've got their optimizing compiler as well), and generate specific versions of the code targeting specific microarchitectures... although I'd probably not waste my time with NetBurst... it's just all-around trash... can you tell I dislike NetBurst? :)
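For the "specific versions for specific microarchitectures" idea, a rough sketch of runtime dispatch in C might look like the following. Assumptions: the kernel (a dot product) and all function names are hypothetical, the "tuned" variant is only a stand-in that uses four independent accumulators rather than real AVX code, and the feature check relies on the GCC/Clang builtin `__builtin_cpu_supports`, falling back to the generic path on other compilers or non-x86 targets.

```c
#include <assert.h>
#include <stddef.h>

/* Common signature shared by every variant of the (hypothetical) kernel. */
typedef float (*dot_fn)(const float *a, const float *b, size_t n);

/* Portable baseline: one accumulator, one long dependency chain. */
float dot_generic(const float *a, const float *b, size_t n) {
    assert(a != NULL && b != NULL);
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Stand-in for a tuned variant: four independent accumulators give the
   out-of-order core more parallel work. A real build would put AVX code here. */
float dot_unrolled4(const float *a, const float *b, size_t n) {
    assert(a != NULL && b != NULL);
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    float sum = (s0 + s1) + (s2 + s3);
    for (; i < n; i++)          /* leftover elements when n is not a multiple of 4 */
        sum += a[i] * b[i];
    return sum;
}

/* Returns nonzero if the CPU reports AVX; always 0 where the builtin is unavailable. */
int cpu_has_avx(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") != 0;
#else
    return 0;
#endif
}

/* Pick a variant once at startup and call through the returned pointer. */
dot_fn select_dot(void) {
    return cpu_has_avx() ? dot_unrolled4 : dot_generic;
}
```

You'd call `select_dot()` once and keep the pointer around, so the feature check stays out of the hot loop. The two variants sum in a different order, so for general floating-point data the results can differ in the last bits.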
Edit: Or I could be wrong. Micah posted in a different thread comparing CPU and GPU performance on a Mac, with this link: http://dl.acm.org/citation.cfm?id=1854302 Seems AMD might be doing more to optimize OpenCL on the CPU :)
The short answer is yes, expect something in SDK 2.6.