
boxerab
Challenger

OpenCL benchmarks for GCN cards

I am interested in getting an idea of how much of a speedup I can expect when I move from my olde HD 7700 to an R9 290X. I currently have a 2GB PCIe 3.0 HD 7700 running on Windows 7, and my kernel takes about 250 ms to run. This is an image compression kernel, so there is a lot of memory traffic between global and local memory. Is it reasonable to expect a 10X increase in performance when moving to the R9 290X?

I realize it is quite hard to tell without running the durn thang.

And, by the way, any progress on the board farm that was being discussed earlier?

Thanks,

Aaron

0 Likes
5 Replies
boxerab
Challenger

Compubench results show a roughly 5X speedup from the HD 7700 to the R9 290X on a variety of OpenCL benchmarks:

http://compubench.com/result.jsp

Strangely, the R9 295X2 scores about the same as the R9 290X. I am guessing that the test software doesn't scale to multiple devices.

0 Likes

They should really figure out how to hide the fact that it is a separate device and make it look like more compute units on a single card. Otherwise it's always going to haunt them in benchmarks, which are directly related to sales and uptake.

There are other advances in GCN that might affect your speedup, but if it is a case of straightforward memory bandwidth (you can verify your bottlenecks with CodeXL), your answer is likely based on the ratio of the cards' memory bandwidths. The number of compute units is probably the next most important factor.
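To put rough numbers on the ratio argument, here is a minimal back-of-the-envelope sketch. The spec figures are assumptions taken from published spec sheets for an HD 7770 and an R9 290X (the thread only says "HD 7700", so the exact model is a guess), and the helper `estimate` is a hypothetical name. It treats the kernel as purely bandwidth-bound or purely compute-unit-bound, which a real kernel will not be.

```python
# Back-of-the-envelope scaling estimate; all spec figures are assumptions,
# not measurements, and should be checked against the actual cards.
HD7770_BANDWIDTH_GBS = 72.0     # assumed: 128-bit GDDR5
R9_290X_BANDWIDTH_GBS = 320.0   # assumed: 512-bit GDDR5
HD7770_COMPUTE_UNITS = 10       # assumed
R9_290X_COMPUTE_UNITS = 44      # assumed

CURRENT_KERNEL_MS = 250.0       # runtime quoted in the original post


def estimate(current_ms, old_value, new_value):
    """Return (ratio, projected runtime in ms) if runtime scales with the ratio."""
    ratio = new_value / old_value
    return ratio, current_ms / ratio


bw_ratio, bw_ms = estimate(CURRENT_KERNEL_MS, HD7770_BANDWIDTH_GBS, R9_290X_BANDWIDTH_GBS)
cu_ratio, cu_ms = estimate(CURRENT_KERNEL_MS, HD7770_COMPUTE_UNITS, R9_290X_COMPUTE_UNITS)

print(f"bandwidth-bound:    {bw_ratio:.2f}x -> about {bw_ms:.1f} ms")
print(f"compute-unit-bound: {cu_ratio:.2f}x -> about {cu_ms:.1f} ms")
```

On these assumed figures both ratios come out a bit under 4.5X, so 10X looks optimistic unless something other than raw bandwidth or compute unit count is holding the kernel back on the older card.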

0 Likes

Thanks, Jason.  What other advances were you thinking of in comparing 290X with 7700?

One feature I would like to make use of is the 8 Asynchronous Compute Engines.

0 Likes

I'm going out on a limb here to say that I don't know, but I figure it only matters if you have multiple concurrent kernel dispatches (i.e. queue consumers). For my own work I probably wouldn't use more than 1-2, and could probably utilize all or close to all of the compute resources (memory bandwidth or compute units) with that many. I think another thing that GCN handles better is divergence, and it works a bit more predictably than the VLIW architectures. I'm just parroting what slides like these show, though: http://developer.amd.com/wordpress/media/2013/06/2620_final.pdf

0 Likes

Thanks. I am processing large volumes of video images, and some of my kernels are very serial because the underlying algorithm is serial, so they don't make full use of GPU resources. So, the more concurrent kernel dispatches the better.
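As a rough illustration of why concurrent dispatches help here, the sketch below uses an idealized model: each kernel on its own keeps only a fraction of the GPU busy, and independent dispatches overlap perfectly until the device is full. The function `ideal_concurrent_speedup` and the 25% utilization figure are hypothetical, and the model ignores memory bandwidth contention and scheduling overhead, so treat the result as an upper bound rather than a prediction.

```python
def ideal_concurrent_speedup(utilization, num_queues):
    """Idealized throughput gain from overlapping independent kernel dispatches.

    utilization: fraction of the GPU one kernel keeps busy alone (0 < u <= 1).
    num_queues:  number of independent dispatches in flight at once.
    Assumes perfect overlap and no contention, which real hardware won't give.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    if num_queues < 1:
        raise ValueError("num_queues must be at least 1")
    # The gain is capped either by the number of dispatches in flight
    # or by the point at which the device is fully occupied.
    return min(num_queues, 1.0 / utilization)


# Hypothetical example: a mostly serial kernel that fills a quarter of the GPU.
for queues in (1, 2, 4, 8):
    print(queues, "queues ->", ideal_concurrent_speedup(0.25, queues), "x")
```

Under this model, extra queues stop paying off once the overlapped kernels saturate the device, so the benefit depends on how little of the GPU each serial kernel uses.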

By the way, HD 7700 is GCN, not VLIW.  Thanks for the link.

Aaron

0 Likes