ryta1203
Journeyman III

Shader Performance!?

Overall, which shader mode gives the best theoretical performance?

It seems from the docs that pixel shader mode provides better theoretical performance on current hardware. Is that correct?

0 Likes
17 Replies

Compute shader has a higher possibility of getting peak performance because it is not part of the graphics pipeline.
For example, when you run a pixel shader, a vertex/geometry shader must be executed first in order to generate the pixels, so there is overhead involved and it requires resources. In compute shader mode, you basically get all the resources on the chip. Actually achieving that peak performance is another matter, however.
0 Likes

So why is Brook+ code compiled into PS, not CS?
0 Likes

Good question!

0 Likes

Originally posted by: Raistmer So why is Brook+ code compiled into PS, not CS?


Simple answer: compatibility.

Pixel shader code (if not using special features like double precision) runs on all cards. Compute shaders run only on the HD4000 series.

0 Likes

That is, so-called "stream computing" via Brook+ can be applied to some older cards too?
I'm afraid not... or can it?
0 Likes

So why is Brook+ code compiled into PS, not CS?

I think the main reason is that CS mode has many more constraints than PS mode (CS allows only a single scatter stream and no color buffers).

0 Likes

Originally posted by: gaurav.garg

So why is Brook+ code compiled into PS, not CS?

I think the main reason is that CS mode has many more constraints than PS mode (CS allows only a single scatter stream and no color buffers).

Hm, AFAIK only one scatter stream is allowed in Brook+, so it has the same restriction 😞
0 Likes

But, PS mode can have multiple regular output streams (that is not allowed in CS mode).

0 Likes

Originally posted by: gaurav.garg But, PS mode can have multiple regular output streams (that is not allowed in CS mode).

 

What do you mean exactly by "regular"? To my knowledge you can write out as many outputs as you want to the global buffer.

Though I will say this, and it's one of the reasons I asked the question to begin with: there is a presentation by AMD (one of the presenters is Micah) that gives the output bandwidth, and in CS the bandwidth is much lower than in PS. Is this true?

0 Likes

What do you mean exactly by "regular"? To my knowledge you can write out as many outputs as you want to the global buffer.


Non-scatter output streams in Brook+, or color buffers in CAL PS mode.

0 Likes

Originally posted by: gaurav.garg
What do you mean exactly by "regular"? To my knowledge you can write out as many outputs as you want to the global buffer.


Non-scatter output streams in Brook+, or color buffers in CAL PS mode.

You can use burst write in CS to output multiple non-scatter streams, so I'm still confused by your statement. I'm sure it's something semantic.

0 Likes

If you write Brook code without AMD extensions, using the older Brook codebase, you can compile for the vast majority of graphics cards through the DX/OGL backends. Pixel shader mode can target all Radeon HD cards, while compute shader mode can target the HD4XXX series and later cards. So yes, compatibility is a reason.
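
As a rough illustration of the compatibility argument, the sketch below encodes only the two rules stated in this post: pixel shader mode targets all Radeon HD cards, and compute shader mode targets the HD4XXX series and later. The function name and the parsing of a series number out of a model string are assumptions made for illustration; neither Brook+ nor CAL provides such a helper.

```python
import re

def supported_shader_modes(card_name: str) -> set:
    """Return the shader modes usable on a card, per the rules in this thread.

    Hypothetical helper: expects names such as "HD4870" or "Radeon HD 3870".
    """
    match = re.search(r"HD\s*(\d{4})", card_name, flags=re.IGNORECASE)
    if match is None:
        # Not a Radeon HD name; the rules above say nothing about such cards.
        return set()
    modes = {"pixel"}                      # all Radeon HD cards
    series = int(match.group(1)) // 1000   # e.g. 4870 -> 4
    if series >= 4:
        modes.add("compute")               # HD4XXX series and later
    return modes
```

For example, `supported_shader_modes("HD4870")` includes both modes, while an HD3XXX name yields pixel shader mode only.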
0 Likes

So does Brook+ find out what card you are using and compile into either PS mode or CS mode accordingly?

0 Likes

I have a HD4870 card. How can I use compute shader via Brook+?

0 Likes

Originally posted by: ryta1203 So does Brook+ find out what card you are using and compile into either PS mode or CS mode accordingly?


No, it compiles to pixel shader code by default (just look at the generated .h file) and switches to compute shader only if you use some compute shader features. The hardware-dependent compilation is done by the Brook runtime (which calls the CAL compiler), but it does not change the shader mode.
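
The selection rule described above can be modeled in a few lines. This is a sketch of the stated behavior only, not Brook+ source code: the function and its parameters are hypothetical, and which kernel features count as "compute shader features" is not specified in this thread, so that is reduced to a single boolean.

```python
def select_shader_mode(uses_compute_features: bool, card_name: str = "") -> str:
    """Pick the shader mode the way the post above describes it.

    Hypothetical model: card_name is accepted only to make explicit that
    the detected hardware does not influence the choice of mode.
    """
    del card_name  # intentionally unused: the card does not change the mode
    # Pixel shader is the default; compute shader is used only on request.
    return "compute" if uses_compute_features else "pixel"
```

Under this model, a kernel that uses no compute shader features gets pixel shader mode even on an HD4870.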

0 Likes

to MicahVillmow:

What about the OpenCL implementation?

Will it favor capability, performance, or maybe automatic determination of the best way of running?

Why did it take four GPU generations (since X1XXX) for ATI to decide to reveal the real face of the GPU?

0 Likes

Originally posted by: godsic to MicahVillmow:

What about the OpenCL implementation?

Will it favor capability, performance, or maybe automatic determination of the best way of running?

Why did it take four GPU generations (since X1XXX) for ATI to decide to reveal the real face of the GPU?

I think it is because previous hardware doesn't have FP32 capabilities.

0 Likes