Wow, really? No one? or No Where?
Originally posted by: corry Wow, really? No one? or No Where?
This is the right place. I'm not sure whether it is worth discussing, though, as CAL is going to be deprecated.
Ugh, I could go on for hours about this...really, and I'm still new to IL, and still making what I know to be newbie mistakes... Unfortunately, as far as my own tests have shown, and as closed and open source projects have experimentally proven, OpenCL is still up to 5x slower than hand-optimized IL. That's just the simple fact of the matter. I didn't push for AMD GPUs for the ease of development, I pushed for the performance: up to 3x better than the closest competition (hand-optimized IL vs. hand-optimized PTX), thanks mostly to some really clever hardware design decisions... So for me, regardless of AMD's deprecation, I'm with IL/CAL. I had hoped that with the deprecation the community using it would have moved elsewhere...in reality, it should never have been in the OpenCL forums, but I digress.
Anyhow, I'll just say this: I certainly hope IL/CAL haven't been deprecated in favor of OpenCL only. My hope is that it's more because of the APU movement, and that we should expect to see some sort of x86/IL hybrid instruction set soon. No, I don't expect a comment from AMD about that, as I am sure that if that's the case, given the x86 instruction set war going on, it would be a closely guarded secret. (Though AMD has been *much* more open than Intel...)
I guess I'll be looking to register on some of the project specific forums to see if I can't get some of the developers to talk. The hand optimized IL and ptx is usually closed source...including ISA binaries not IL code. I think some of them are on here, but certainly not all of them.
While we can't give a lot of details, we have already exposed our next generation IL language.
It is called FSA and was publicly discussed in the keynote at AFDS (AMD Fusion Developer Summit). Some more information about it can be found here:
Also, create an OpenCL program with -fbin-amdil, and you can use objdump to extract the IL, modify it, and place it back in the OpenCL binary. This will allow you to benefit from the memory/execution model of OpenCL but still get hand-optimized IL.
I had looked at that when it came out, but I don't remember seeing FSAIL in there before. Unfortunately, Google pulls up pretty much nothing. That's OK though, I had forgotten some of the stuff that was on that FSA roadmap, like the coherent shared memory/cache/tables...which means GPU and CPU operate in the same virtual address space...One wish granted ;) I also saw pre-emption and context switching...another wish granted! Also, on the subject of FSAIL, I saw you intend this to be an open standard. Of course, unless Intel buys nVidia, or someone else with an x86 license partners with, buys, or is acquired by nVidia...well, I've been less than impressed with Intel's graphics offerings. Perhaps with an ISA all spelled out for them that will change, but though I'm an Intel x86/SSE (especially SSE) fan, I just don't see much in the way of impressive graphics offerings coming from that direction...factor in that graphics patents are likely mostly held by AMD and nVidia...yeah, I suppose I could go on for hours about the future of all this as well...
I have seen your other posts about dumping the IL and patching it...I really don't like the idea of patching a production binary like that. Yes, I understand it's just IL text, it just seems like a bad idea...though I suspect it's the closest thing I'm ever going to get to inline IL, since I'm guessing all the focus over there is on APUs and FSAIL...I'll probably just tear apart some basics and see how the OpenCL compiler does it, like the memory model (though I think I have down how that works), and just keep it in IL for as long as I can...though I hate not upgrading and getting tied to versions almost as much as I dislike the idea of patching binaries...we'll see where it all lands...
The deprecation of CAL brought about the tension of developers having to wait for features until they make it into OpenCL (such as global sync or global data share), which does take a lot of time. If there are too many vendor specific hacks in an open standard, it becomes useless again. Patching binaries is yet again a pain in the @ss.
Unfortunately, shaking hands and finding a compromise for the good of the users has never been a strong part of industrial competition among IT companies, but let's hope for the best.
BTW, are there any rumors about Intel or NV joining the FSA? I cannot tell how good/bad it is for them. All people see from the outside is that NV clings to its golden egg of CUDA with 10 nails (like Blizz to WoW, which brings humongous profit each year, no matter how badly it should be replaced) and deliberately tries to sabotage the spread of OpenCL. (Those loyal to NV say it's not a priority to them, as CUDA is capable of everything their customers need. True in some sense, however broadening the horizon is never a bad thing.)
OpenCL seems like a good place for a vendor-OS-architecture independent coding API, however I do not fully see how OpenCL or FSA would build on top of or replace each other. And also, in what sense is this a competition/supplement/replacement of DirectX (e.g.)?
Originally posted by: Meteorhead The deprecation of CAL brought about the tension of developers having to wait for features until they make it into OpenCL (such as global sync or global data share), which does take a lot of time.
For me and others used to working at a low level, it's not about waiting for things to make it into OpenCL, it's the fact that even MSVC and gcc still make stupid mistakes and miss optimizations. That's because they look at the code for common patterns to speed up; they don't understand the problem and give you optimal code for it. For example, for fun, and to compare SSE intrinsics vs. asm with the MSVC optimizing compiler, I wrote a simple double-precision 4-element vector times a double-precision 16-element (4x4) matrix. Why doubles? To make it interesting. I used all 8 registers; the compiler, given intrinsics (which mapped 1-1 with my asm code), used 4. Mine ran twice as fast. In the C/C++ settings, I had all optimizations turned on. The straight C function, which compiled to use x87, ran nearly as fast as the intrinsic version. That's garbage. Why would I do that? Ever? I've got masm, and can write it up in asm real quick, and done. To me, the deprecation is like Intel mandating that for x86 licenses to be valid, x86 no longer be published, and all assemblers revoked. I'd be there dumbfounded...I realize not many devs still work at this level, and there are fewer and fewer every day, but there will always be a need!
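For anyone who wants to try the same kind of comparison, here is a minimal sketch of what an SSE2-intrinsics version of that vector-times-matrix test might look like. It is not the original code from the test described above: the function name `vec4_mul_mat4`, the row-vector convention (out = v * M), and the row-major layout are all assumptions made for illustration, and it makes no attempt to reproduce the hand-scheduled 8-register asm or the timings.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>   /* SSE2: __m128d holds two doubles */
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* Hypothetical helper: out = v * m, where v is a 1x4 row vector and
 * m is a 4x4 matrix stored row-major in 16 doubles (assumed layout). */
void vec4_mul_mat4(const double v[4], const double m[16], double out[4])
{
    assert(v != NULL && m != NULL && out != NULL);
#if HAVE_SSE2
    /* Two accumulators cover the four result columns (2 doubles each). */
    __m128d lo = _mm_setzero_pd();
    __m128d hi = _mm_setzero_pd();
    for (int i = 0; i < 4; ++i) {
        /* Broadcast v[i] and scale row i of the matrix by it. */
        __m128d s = _mm_set1_pd(v[i]);
        lo = _mm_add_pd(lo, _mm_mul_pd(s, _mm_loadu_pd(m + 4 * i)));
        hi = _mm_add_pd(hi, _mm_mul_pd(s, _mm_loadu_pd(m + 4 * i + 2)));
    }
    _mm_storeu_pd(out, lo);
    _mm_storeu_pd(out + 2, hi);
#else
    /* Plain C fallback so the sketch still builds on non-x86 targets. */
    double acc[4] = {0.0, 0.0, 0.0, 0.0};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            acc[j] += v[i] * m[4 * i + j];
    for (int j = 0; j < 4; ++j)
        out[j] = acc[j];
#endif
}
```

Compiling this with optimizations on and reading the generated assembly is one way to see how many XMM registers the compiler actually uses, compared with a hand-written version.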
Originally posted by: Meteorhead If there are too many vendor specific hacks in an open standard, it becomes useless again. Patching binaries is yet again a pain in the @ss.
Well, OpenGL seemed to solve this by taking the vendor specific functionality used most often and integrating it into the standard. I don't see why OpenCL can't do the same.
Originally posted by: Meteorhead Unfortunately, shaking hands and finding a compromise for the good of the users has never been a strong part of industrial competition among IT companies, but let's hope for the best.
Good luck with that...My only hope is that AMD, with an x86 license and great graphics capabilities, has enough weight to throw around to force the others into compliance...
Originally posted by: Meteorhead BTW, are there any rumors about Intel or NV joining the FSA? I cannot tell how good/bad it is for them. All people see from the outside is that NV clings to its golden egg of CUDA with 10 nails (like Blizz to WoW, which brings humongous profit each year, no matter how badly it should be replaced) and deliberately tries to sabotage the spread of OpenCL. (Those loyal to NV say it's not a priority to them, as CUDA is capable of everything their customers need. True in some sense, however broadening the horizon is never a bad thing.)
nVidia had been trying to acquire an x86 license a while ago, certainly not for FSAIL, but it shows they want to make CPUs as well. Presumably this was because they didn't want to lose the nForce market, but Intel shut the book on them there. There were rumors that VIA and nVidia were in discussions...I suspect, unless they had prior information about APUs, this was either to introduce the concept themselves (possible), or more likely because Intel graphics are a joke, yet Intel seems to think it can live without nVidia and AMD graphics and go it alone, and VIA has an x86 license... I would think Intel's attitude is going to have to change. However, MS, of all companies, has an agreement (according to a story on Slashdot months ago) to counter-offer any buy-out offers on nVidia, supposedly to protect the rights to the IP of the original Xbox. (Can we say, yeah right!?) So chances are the VIA talks aren't to buy them out (in either direction). VIA also obtained a patent in April to add a mad instruction to x87 (multiply-add, for those not familiar), which I found somewhat curious. Something must be cooking over there. Will it conform to FSAIL? Your guess is as good as mine! I would suspect that if nVidia found some way to make an x86 chip of their own, they would probably have their own closed model. I'll go ahead and take a shot at being a marketing person and make up what they might call it: how about nCuda, nTegrated CPU/CUDA, with the nForce platform name revived for the chipset to back it. That sounds about like what I'd expect from them. Again, TOTAL FABRICATION on my part :) I wouldn't expect them to play nice with an open standard...it's like saying Sony would play nice and develop an open standard for a change. (Beta...more recently the Memory Stick flash interface, MiniDiscs, Blu-ray (for which they at least brought down the licensing fees so others can produce players at reasonable prices), etc.)
I just don't see Intel being able to compete alone, nor nVidia alone. Intel and nVidia combined, yeah, they could adopt FSAIL and there would be competition.
Originally posted by: Meteorhead OpenCL seems like a good place for a vendor-OS-architecture independent coding API, however I do not fully see how OpenCL or FSA would build on top of or replace each other. And also, in what sense is this a competition/supplement/replacement of DirectX (e.g.)?
OpenCL is nice because it's open...but IMO that's all it has going for it...I'm a low level guy...MS has C++ AMP, which they say will be an open standard as well (shocking I know, like the aforementioned Sony comment!). To me though, it's just as much crap, because they actually specify no asm code allowed...they have DX11 shader assembly they could let us inline (since it's based on DX), but no...they can't let us do our jobs...
As for FSA vs OpenCL/DX/OpenGL, I think you're missing the point. OpenCL already can dispatch tasks to CPU and/or GPU, so one combined APU isn't going to make much difference to it. DirectX already makes use of the GPU, so it can continue to do so at the APU level. The "video driver" just maps calls to FSAIL rather than to a card connected via PCI Express. Same with OpenGL.
In essence, FSA brings the GPU closer to the CPU, and my hope is that they actually make it fully integrated. In a different link to a fuller presentation of the material (I lost it months ago, though it was on Slashdot as well), they show memory coherency, the same virtual address space, and context switching, like that one, but I also believe I saw them say that task scheduling will be brought down to the SIMD engine level. I don't remember how many SIMD units there are per SIMD engine, but it's still a lot. IIRC, Caymans have 24 SIMD engines; if so, the 6990 has 48 total, and 3072 stream processors. 3072 / 4 stream processors per SIMD unit = 768 SIMD units; 768 / 2 GPUs per 6990 = 384 SIMD units per GPU; 384 / 24 SIMD engines per GPU = 16. So that would mean a minimum massively parallel granularity of 16 4x32-bit SIMD units, or 64 stream processors, if my memory is serving me (who knows...more and more CRC errors as I get older :) ). That's still pretty parallel, but not so huge you have to go and look for tasks to fit it.
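The granularity arithmetic above can be written out as a small sketch. The input figures (3072 stream processors on a 6990, 2 GPUs per card, 24 SIMD engines per GPU, 4 stream processors per SIMD unit) are the ones recalled from memory in the post, not verified specifications, and the function names are made up for illustration.

```c
#include <assert.h>

/* SIMD units per SIMD engine, from card-level totals.
 * All inputs are assumed figures supplied by the caller. */
int simd_units_per_engine(int stream_processors, int gpus,
                          int engines_per_gpu, int lanes_per_unit)
{
    const int divisor = lanes_per_unit * gpus * engines_per_gpu;
    assert(divisor > 0);
    /* The totals should divide evenly if the recalled figures are consistent. */
    assert(stream_processors % divisor == 0);
    return stream_processors / divisor;
}

/* Stream processors per SIMD engine, i.e. the scheduling granularity
 * if tasks are dispatched at the SIMD engine level. */
int stream_processors_per_engine(int stream_processors, int gpus,
                                 int engines_per_gpu, int lanes_per_unit)
{
    return simd_units_per_engine(stream_processors, gpus,
                                 engines_per_gpu, lanes_per_unit)
           * lanes_per_unit;
}
```

With the figures from the post, `simd_units_per_engine(3072, 2, 24, 4)` gives 16 and `stream_processors_per_engine(3072, 2, 24, 4)` gives 64, matching the numbers worked out by hand.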
Lastly, with a unified architecture, it sounds like they are moving towards zero to very little overhead when switching between CPU and GPU...this opens so many doors at a low level, I can't even begin to express my excitement! Yeah, it's all speculation, but based on what AMD is saying, and not saying, I think I have reason to be excited. So DX, GL, and CL are just interfaces to FSA. Hopefully, FSAIL will also be another interface, much like x86 asm, to give us access to the full power of the entire system.....I can dream, can't I? :)