I am building a hardware acceleration demonstrator and plan on using a GPU for it.
Initially I was targeting the S9000, but since it has fewer CUs and a lower clock frequency, I am now considering the HD 7970 instead.
I can make do with just 3 GB of memory and no ECC, as this is just a demonstrator, but I am more latency bound than in need of very high throughput.
Therefore the GPU clock matters, and I don't particularly care if I waste resources, any resources; only the average number of cycles per iteration of the main loop matters.
This rules out running multiple wavefronts per SIMD unit, and it will probably leave many wavefronts underutilized, as I need about 1 KB of LUT (in LDS) per work unit, plus some extra LDS space for input/output management.
With a limit of 32 KB of LDS per workgroup, I will split the problem between two workgroups, each using 32 KB of LDS. Global memory will be used to reassemble the final output produced by the two workgroups sharing the same CU.
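The LDS budget described above can be checked with a quick calculation. This is only a rough sketch in Python, assuming 64 KB of LDS per CU and 64 work-items per wavefront (typical figures for this GPU generation, not stated in this post); `IO_BYTES` is a hypothetical placeholder for the input/output management space, whose real size is not given.

```python
# Rough LDS budget for one workgroup (all sizes in bytes).
LDS_PER_WORKGROUP = 32 * 1024   # per-workgroup limit quoted above
LDS_PER_CU = 64 * 1024          # assumption: LDS available per CU
WAVEFRONT_SIZE = 64             # assumption: work-items per wavefront
LUT_BYTES = 1024                # about 1 KB of LUT per work unit
IO_BYTES = 2048                 # hypothetical I/O management space


def work_units_per_workgroup(lds_limit, lut_bytes, io_bytes):
    """Number of per-work-unit LUTs that fit after reserving I/O space."""
    return (lds_limit - io_bytes) // lut_bytes


units = work_units_per_workgroup(LDS_PER_WORKGROUP, LUT_BYTES, IO_BYTES)
workgroups_per_cu = LDS_PER_CU // LDS_PER_WORKGROUP
lane_utilization = units / WAVEFRONT_SIZE

print(f"work units per workgroup: {units}")
print(f"workgroups per CU:        {workgroups_per_cu}")
print(f"wavefront lanes in use:   {lane_utilization:.0%}")
```

Under these assumptions, each workgroup fills less than half of a single wavefront, which is the underutilization mentioned above.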
Overclocking the HD 7970 to 1100 MHz and even 1200 MHz has been achieved, so my question is: which card from which vendor will work best?
Thanks in advance,
System: Dell R720xd with one PCIe Gen 3 x16 slot and the power supply needed for a 350 W+ GPU card.
OS: Windows 2008 R2, and Windows 2012 once the drivers are available.
You need 1 KB of LUTs per work-item. That is huge for a GPU.
I would not suggest accepting that large an LDS requirement and trying to run 2 wavefronts on a CU with only 32 work-items. A better way would be to re-architect the algorithm to bring down the LDS requirement. I had a similar issue with an algorithm, and tweaking it gave me a >2x performance improvement.
I am not aware of which vendor offers a better 7970 with more overclocking facilities. Maybe someone else can help there.
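To illustrate why shrinking the LUT pays off, here is a small Python sketch of how many work-items fit in one workgroup's LDS as the per-item LUT gets smaller. The 64-wide wavefront, the 256 work-item workgroup cap, and the list of smaller LUT sizes are assumptions for illustration only, and the extra LDS needed for input/output is ignored.

```python
# How many work-items fit in one workgroup's LDS as the per-item LUT shrinks.
LDS_PER_WORKGROUP = 32 * 1024   # 32 KB per-workgroup limit from the question
WAVEFRONT_SIZE = 64             # assumption: work-items per wavefront
MAX_WORKGROUP_SIZE = 256        # assumption: cap on work-items per workgroup


def full_wavefronts(lut_bytes):
    """Return (full wavefronts, work-items) one workgroup can host."""
    items = min(LDS_PER_WORKGROUP // lut_bytes, MAX_WORKGROUP_SIZE)
    return items // WAVEFRONT_SIZE, items


for lut_bytes in (1024, 512, 256, 128):   # hypothetical LUT sizes
    waves, items = full_wavefronts(lut_bytes)
    print(f"{lut_bytes:5d} B/LUT -> {items:3d} work-items, {waves} full wavefront(s)")
```

With these numbers, a 1 KB LUT cannot fill even one wavefront, while halving it already allows a full one.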