The HD 7900 has plenty of memory (3 GB) and plenty of compute units (32 CUs), so I wonder why it is slow.
I can think of a few possible causes:
- Is the branch granularity of NVIDIA GPUs half that of AMD GPUs (32-thread warp vs. 64-thread wavefront), so that the AMD GPU pays more branch penalties?
- A driver bug?
- Is this program heavily optimized for NVIDIA GPUs? Is the working buffer sized to fit the L2/texture cache of NVIDIA GPUs?
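
To put rough numbers on the first guess, here is a minimal sketch that estimates how often a group of threads diverges at a branch. It assumes, purely for illustration, that each thread takes the branch independently with the same probability `p`; real shaders are usually far more coherent than that (neighbouring pixels tend to take the same path), so this is an upper-bound style estimate and not a measurement of this demo. The function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Probability that a SIMD group of `group_size` threads diverges at a branch,
// i.e. that the threads do not all take the same path.
// ASSUMPTION: every thread takes the branch independently with probability p.
double divergence_probability(double p, int group_size) {
    assert(p >= 0.0 && p <= 1.0 && group_size > 0);
    const double all_taken = std::pow(p, group_size);
    const double none_taken = std::pow(1.0 - p, group_size);
    return 1.0 - all_taken - none_taken;
}
```

Under this model, with `p = 0.01` a 32-thread group diverges with probability of roughly 0.28 and a 64-thread group with roughly 0.47. So a wider wavefront can diverge noticeably more often on rare branches, but by itself that would not normally explain a very large performance gap.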
Could somebody please share their opinion?
We've been looking at this demo. I was hoping to update this thread once we had something to report. Unfortunately, although we're sure we can make it go faster, we've been unable to reproduce the poor performance you're seeing. On our test machines with a Radeon HD 7970, we're seeing framerates in the low 20s. Even with debug builds of our drivers and the application, we're able to produce framerates in the high teens.
Do other OpenGL applications perform acceptably on your machine?
Thank you for your reply.
I will report OpenGL benchmark scores the day after tomorrow.
Would SPECviewperf be a suitable OpenGL test?
I will also add the LEO-DEMO result for reference.
I am using the latest freeglut. Could that affect the result?
I have run some of AMD's OpenGL demos for this performance evaluation.
As far as I can tell, OpenGL programs in general run without problems.
1) AMD OpenGL parallaxMapping demo
If the number in the upper left is the FPS, then 2868.
2) AMD OpenGL alpha to coverage demo
If the number in the upper left is the FPS, then 2925.
3) AMD OpenGL fbo demo
If the number in the upper left is the FPS, then 2077.
4) Supplement: AMD's Forward+ demo
The following image and list are the results from GPU PerfStudio.
According to the results, glActiveTexture and glBindTexture appear to be the bottleneck.
Each function takes 16 to 32 milliseconds according to GPU PerfStudio's CPU time measurement.
The attached file contains the detailed capture data (CSV format).
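
To cross-check the CPU times above outside of the profiler, a minimal sketch like the following could be used. Both helper names are hypothetical and not part of the demo or of GPU PerfStudio. The first converts a per-frame CPU cost into the frame rate it would allow at most; the second measures the wall-clock time of any single call.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Upper bound on the frame rate if at least `cpu_ms` milliseconds of CPU time
// are spent in every frame.
double fps_ceiling(double cpu_ms) {
    assert(cpu_ms > 0.0);
    return 1000.0 / cpu_ms;
}

// Wall-clock time, in milliseconds, spent inside one invocation of `fn`.
template <typename Fn>
double time_call_ms(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, `time_call_ms([&] { glBindTexture(GL_TEXTURE_2D, tex); })` would time one bind, where `tex` stands for whatever texture name the application uses. If a call really costs 16 to 32 ms once per frame, `fps_ceiling` gives an upper limit of 62.5 to 31.25 FPS from that call alone. Note that CPU time attributed to a single OpenGL call may include time the driver spends waiting on earlier work, so the call that shows the cost is not necessarily the one that causes it.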