I highly doubt AMD wishes to insert restrictions into their HW similar to NV's, for the following reason: AMD does not follow a monolithic chip design (although GF116 is a viable chip). For AMD to compete in the top gaming and HPC segments, they have to create dual-GPU solutions, and up until now there have been no dual-GPU professional cards; only gaming cards are dual-GPU. FirePro cards are optimized for CAD programs, which are not optimized for multi-GPU use, so most likely there will be no multi-GPU FirePros in the future either.
If they were to insert restrictions into gaming HW for the sole reason of forcing people to buy FirePros, they would cut themselves off from the high-end HPC segment completely.
My guess goes with nou's: how the chips perform in DP will be class-dependent.
Originally posted by: Meteorhead
FirePro cards are optimized for CAD programs, which are not optimized for multi-GPU use, so most likely there will be no multi-GPU FirePros in the future either.
Unless they create a new lineup to compete directly with Teslas. Since AMD wants to make a name for itself in GPGPU, that would not be so surprising. NVIDIA has perhaps not had all the success they expected on the software side with CUDA (pushing every big software company to write CUDA plugins), but they certainly sell lots of Teslas for the new supercomputers out there. I don't know how many dollars that represents, but I can guess that AMD would like to compete in this segment too.
All this makes me think about some people who said things like: "VLIW is the strength of AMD, it will never disappear." Well, it seems not... I'm really interested in this new architecture and how it will improve things on the GPGPU side. I only wonder whether it will be inside the HD8000 or HD9000 series (or maybe under another name, why not!)
All this makes me think about some people who said things like: "VLIW is the strength of AMD, it will never disappear." Well, it seems not...
I don't think that's necessarily the case. What it does mean is that AMD feels there is a market they can compete in better by moving to this new architecture (i.e. becoming more similar to Nvidia). Like I said in another thread, I'd be really surprised if the first generation of these new cards can compete with the previous generation of VLIW cards on algorithms like matrix multiplication (MM), for example. My guess is that peak performance is going to go down, and the most optimized MM is currently getting over 90% of peak...
APUs are very interesting for the HPC world: they are small enough to fit in a 1U rack and they draw little power. However, without DP support the product has a big handicap.
I hope AMD will make a Fusion APU version with full DP support soon (an Opteron APU?).
I'd be really surprised if the first generation of these new cards can compete with the previous generation of VLIW cards on algorithms like MM
The problem is that MM is the perfectly parallel use case, but most of the algorithms we all use today are very, very far from this perfect fit. An architecture that is on average better than the current one will sell better. How much is an architecture worth whose peak performance is the fastest, but whose peak can only be reached in a very small number of cases? Sure, games are one of those cases, for now. But even there, shaders have become more and more complex, and the upcoming OpenGL compute pipeline will make shaders even more flexible, compute-friendly, and in some ways closer to OpenCL. AMD's architects have seen this coming for some time now. It's time to move on.

It would surprise me if the new cards were slower than the current ones for games. The new architecture will be built on a smaller process node, so it will give us more fps; AMD's marketing guys will sell it as the fastest architecture ever, and gamers will be happy. Plus, it will be faster for GPGPU, compilers will fit it better, and OpenCL will get closer and closer to CUDA. Better yet, the same kernel will show a smaller performance difference between AMD and NVIDIA than it does today, making code development easier and more portable. On top of that, we will have a price war in almost all segments. What else could people ask for?
I hate the whole "DP is not important for consumer uses" argument.
What if Intel/AMD took this to heart and chopped the 80-bit x87 FPU down to 32 bits to make a consumer-level CPU...
Fact is, most software uses the x87 instructions to perform all calculations in 80-bit extended precision and only casts back to double for storage.
Excel uses doubles; would people be happy if Microsoft released a consumer version that only used single precision?
Fact is, other than multimedia, games, and video compression, every other computation the average PC user does is in double precision.
If OpenCL wants to move out of these areas and into general computing, then double precision is a requirement, not an optional add-on.
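To make the Excel point concrete, here's a minimal Python sketch (the helper names are mine, not from any post above) that emulates single-precision accumulation by rounding through an IEEE-754 binary32 value after every add, summing a million one-cent entries:

```python
import struct

def to_f32(x):
    """Round a Python double to the nearest IEEE-754 single (binary32)."""
    return struct.unpack('f', struct.pack('f', x))[0]

def sum_single(values):
    """Accumulate in emulated single precision: round after every add."""
    acc = to_f32(0.0)
    for v in values:
        acc = to_f32(acc + to_f32(v))
    return acc

def sum_double(values):
    """Accumulate in native double precision."""
    acc = 0.0
    for v in values:
        acc += v
    return acc

cents = [0.01] * 1_000_000   # a million one-cent entries, spreadsheet-style
print(sum_single(cents))     # visibly drifts from 10000.00
print(sum_double(cents))     # correct to well under a thousandth of a cent
```

The double-precision sum stays accurate to far below a cent, while the single-precision sum drifts by an amount a bookkeeper would notice, which is exactly why "consumer" software like Excel computes in doubles.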
Originally posted by: MicahVillmow
but we do not view DP as professional/HPC-only use.
I guess I'm concerned that high DP performance will be reserved for HPC products, as with Nvidia. There is a huge price gap between the highest-end Nvidia consumer graphics card and the cheapest Tesla.
I fully accept that you (AMD) should try to differentiate your HPC parts. But I feel this should be on the basis of reliability, ECC, thermal design, suitability for packed blade use, fast detailed support, additional driver features (Infiniband performance), etc.
I agree also. A dual-GPU, double-size ECC VRAM, strictly front-to-back cooled HPC card, with a proper driver (Xorg-independent) and fit for close packing (meaning the cooler is 4 mm thinner than a double-width cooling solution), would indeed be welcome and worth the extra money.
Originally posted by: dravisher
What I'm still wondering, though, is how this affects global memory latencies. Basically my question is: if we feed a Cayman CU and a GCN CU with four wavefronts each, will GCN be more strangled by global memory latency than Cayman? With Cayman only a single wavefront actually executes at any one time, so there are others to switch to while waiting on global memory. With GCN all four wavefronts execute at the same time, so there is nothing to switch to (other than within the wavefronts). Would this mean we need more wavefronts per GCN CU to hide global memory latency than we do on Cayman? I find this interesting because needing more wavefronts per CU in practice increases pressure on both LDS and registers. The LDS has doubled, so that's fine, but the register file per CU has stayed the same size.
GCN will no longer waste registers like the VLIW chips do. Register allocation on the current chips is terrible, hence all the complaints about register spill.
So GPRs should prove less of a constraint on the number of hardware threads per SIMD, as the compiler won't be so profligate (fingers crossed).
Of course, if your algorithm is forced down to a small number of hardware threads per SIMD by a large workgroup size or a large local-memory allocation per work item, then you're stuck.
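The register/LDS trade-off being discussed can be sketched as a back-of-the-envelope occupancy calculation. This is my own illustrative sketch: the function name is mine, and the defaults (256 VGPRs per SIMD lane, 4 SIMDs and 64 KB of LDS per CU, a hardware cap of 10 wavefronts per SIMD, 64-wide wavefronts) are assumed GCN-style figures, not something quoted from the thread:

```python
def max_wavefronts_per_cu(vgprs_per_item, lds_per_group, group_size,
                          vgprs_per_simd=256, simds_per_cu=4,
                          waves_per_simd=10, lds_bytes=65536,
                          wave_size=64):
    """Rough wavefront-occupancy estimate for a GCN-style CU.

    All hardware numbers are illustrative assumptions, not vendor specs.
    """
    # Register limit: each wavefront needs vgprs_per_item VGPRs per lane,
    # out of a per-SIMD file of vgprs_per_simd registers per lane.
    waves_by_vgpr = (vgprs_per_simd // vgprs_per_item) * simds_per_cu

    # LDS limit: each workgroup (ceil(group_size / wave_size) wavefronts)
    # ties down lds_per_group bytes of the CU's shared LDS.
    waves_per_group = -(-group_size // wave_size)  # ceiling division
    if lds_per_group:
        waves_by_lds = (lds_bytes // lds_per_group) * waves_per_group
    else:
        waves_by_lds = waves_per_simd * simds_per_cu  # no LDS constraint

    hw_limit = waves_per_simd * simds_per_cu
    return min(waves_by_vgpr, waves_by_lds, hw_limit)

# A lean kernel (25 VGPRs, no LDS) reaches the hardware cap,
# while a register-hungry one (84 VGPRs) or a kernel hogging
# half the LDS per 256-item workgroup gets far fewer wavefronts.
print(max_wavefronts_per_cu(25, 0, 64))
print(max_wavefronts_per_cu(84, 0, 64))
print(max_wavefronts_per_cu(25, 32768, 256))
</n```

Whichever of the three limits binds first decides how many wavefronts are resident, and hence how much global memory latency the CU can hide.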