I am wondering why the voxel cone tracing demo is so slow on an HD7900 with AMD's OpenGL 4.3 driver.
I have tried the following voxel cone tracing demo.
It runs at less than 1 fps. This seems far too slow compared with NVIDIA GPUs.
How can I get more FPS?
1) When I change the voxel grid size from 256x256x256 to 16x16x16, the frame rate increases to 2 fps.
2) According to Geeks3D.com, even an older NVIDIA card is faster.
- GTX 680: 30fps
3) My environment is:
- HD7900 with 3GB video memory / PCI Express 2.0 x16
- Windows 7 64-bit
- core i7 firstname.lastname@example.orgGHz
- system memory 24GB
- I have installed the following driver. I have tried beta1 and beta2; the result is the same with both.
Thank you in advance.
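Item 1) above can be put into numbers. The following is only a rough sketch: the 4 bytes per voxel is an assumption (for example an RGBA8 texel), the function names are made up for illustration, and mip levels or extra buffers the demo may allocate are ignored.

```cpp
#include <cstdint>

// Number of voxels in an n x n x n grid.
constexpr std::uint64_t voxel_count(std::uint64_t n) {
    return n * n * n;
}

// Bytes for one such grid. The default of 4 bytes per voxel is an
// assumption (e.g. RGBA8); the demo's actual format may differ.
constexpr std::uint64_t grid_bytes(std::uint64_t n,
                                   std::uint64_t bytes_per_voxel = 4) {
    return voxel_count(n) * bytes_per_voxel;
}
```

Under that assumption a 256x256x256 grid is 64 MiB and a 16x16x16 grid is 16 KiB, which is 4096 times fewer voxels. Since the frame rate only went from under 1 fps to 2 fps, the slowdown does not appear to scale with voxel count or memory footprint.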
The HD7900 has 3GB of memory and 32 CUs, which should be enough, so I wonder why it is so slow.
I can think of a few possible causes:
- Is the branch granularity of NVIDIA GPUs half that of AMD GPUs (32-thread warp vs. 64-thread wavefront)?
Does this cause many branch penalties?
- A driver bug?
- Is this program heavily optimized for NVIDIA GPUs? Is the working buffer size tuned to fit the L2 texture cache of NVIDIA GPUs?
Could somebody please share their opinion?
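The branch granularity guess can be illustrated on the CPU. This is a minimal sketch, not a GPU measurement: it counts how many SIMD groups contain threads that disagree on a branch, for a hypothetical pattern in which the branch outcome flips every 32 threads. The function names and the pattern are assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Counts groups of `width` consecutive threads in which some threads take
// the branch and others do not. Such a group has to execute both sides.
std::size_t divergent_groups(const std::vector<bool>& takes_branch,
                             std::size_t width) {
    if (width == 0) return 0;  // avoid an endless loop on bad input
    std::size_t divergent = 0;
    for (std::size_t start = 0; start < takes_branch.size(); start += width) {
        std::size_t end = start + width;
        if (end > takes_branch.size()) end = takes_branch.size();
        bool any_taken = false;
        bool any_not_taken = false;
        for (std::size_t i = start; i < end; ++i) {
            if (takes_branch[i]) any_taken = true;
            else any_not_taken = true;
        }
        if (any_taken && any_not_taken) ++divergent;
    }
    return divergent;
}

// Hypothetical workload: the branch outcome flips every `run` threads.
std::vector<bool> alternating_pattern(std::size_t threads, std::size_t run) {
    std::vector<bool> pattern(threads);
    for (std::size_t i = 0; i < threads; ++i) {
        pattern[i] = ((i / run) % 2) == 0;
    }
    return pattern;
}
```

For 128 threads with runs of 32, `divergent_groups(pattern, 32)` finds no divergent group while `divergent_groups(pattern, 64)` finds two. Even in this worst case a divergent group only pays for both sides of the branch, so group width alone seems unlikely to explain a gap as large as under 1 fps versus 30 fps.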
We've been looking at this demo. I was hoping to update this thread when we have something to report. Unfortunately, although we're sure we can make it go faster, we've been unable to reproduce the poor performance you're seeing. On our test machines with Radeon HD 7970, we're seeing framerates in the low 20's. Even with debug builds of our drivers and the application, we're able to produce framerates in the high teens.
Do other OpenGL applications perform acceptably on your machine?
Thank you for your reply.
I will report OpenGL benchmark scores the day after tomorrow.
Would SPECviewperf be a suitable OpenGL test?
I will also add the LEO demo result for reference.
I am using the latest freeglut. Might this have an effect?
I have checked some of AMD's OpenGL demos for this performance evaluation.
As far as I can tell, there is no problem running OpenGL programs.
1) AMD OpenGL parallaxMapping demo
If the number in the upper left is the FPS, then 2868.
2) AMD OpenGL alpha to coverage demo
If the number in the upper left is the FPS, then 2925.
3) AMD OpenGL fbo demo
If the number in the upper left is the FPS, then 2077.
4) Supplement: AMD's Forward+ demo
The following image and list are the result from GPU PerfStudio.
According to the result, glActiveTexture and glBindTexture seem to be the bottleneck.
Each call takes 16 to 32 milliseconds according to GPU PerfStudio's CPU time measurement.
The attached file contains the detailed capture data (CSV format).
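If part of that cost comes from binding a texture that is already bound, one common mitigation is to cache the bound texture per unit and skip redundant calls. The sketch below is an assumption-laden illustration, not the demo's code: `TextureBindCache` is a hypothetical helper, the 16-unit limit is arbitrary, only one texture target is tracked, and the actual driver call is passed in as a callback so the example stays self-contained.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical helper: remembers which texture is bound on each unit and
// only forwards a bind when the binding would actually change.
class TextureBindCache {
public:
    using BindFn = std::function<void(std::size_t unit, std::uint32_t texture)>;

    explicit TextureBindCache(BindFn issue_bind)
        : issue_bind_(std::move(issue_bind)) {
        bound_.fill(-1);  // -1 means "unknown", distinct from texture 0
    }

    // Returns true if a real bind was issued, false if it was skipped
    // (already bound, or unit out of range for this sketch).
    bool bind(std::size_t unit, std::uint32_t texture) {
        if (unit >= bound_.size()) return false;
        if (bound_[unit] == static_cast<std::int64_t>(texture)) return false;
        issue_bind_(unit, texture);
        bound_[unit] = texture;
        return true;
    }

private:
    BindFn issue_bind_;
    std::array<std::int64_t, 16> bound_;  // 16 units is an arbitrary limit
};
```

In a real application the callback would call `glActiveTexture(GL_TEXTURE0 + unit)` followed by `glBindTexture(GL_TEXTURE_2D, texture)`. This would not help if the measured time is really the driver waiting inside those calls for earlier GPU work to finish, which a per-call time of 16 to 32 milliseconds may also indicate.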