Tessellation and GPGPU

Discussion created by Meteorhead on Mar 24, 2011

I have a few questions about the practical usage of tessellation.

As far as I know (and correct me if I'm wrong), HW tessellation is used in the DX11 pipeline to reduce memory and memory bandwidth requirements. It inserts a "shader" stage into rendering that takes vertices as input and outputs some multiple of the input vertices, fitted onto a geometric surface given by any analytical function the programmer chooses. This calculation is done every frame, and only the subsequent shader stage sees the tessellator's output; unless it is explicitly stored, the output is discarded (which is where the bandwidth reduction comes from).

Tessellators could be used in many places in HPC where memory bandwidth is crucial and calculations are mostly done on lattices with very primitive geometry (spheres, geoids, tori, etc.) at high lattice density. In such cases it would be more efficient to redo the simple tessellation in every iteration step than to store a very high-resolution object in memory.
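To make the idea concrete, here is a minimal sketch in plain C of what a single work-item could do: compute one fine lattice vertex from a coarse patch on demand, and reduce over the fine lattice without ever storing it. Everything in it is an assumption made for illustration: the names `vec3`, `patch`, `sphere_point`, `tess_vertex` and `patch_sum_z` are hypothetical, the patch is a rectangle in a (u, v) parameter space, and the sphere uses a rational (inverse stereographic) parameterization only so that the sketch needs no math library. It is not how a fixed-function tessellator works internally.

```c
#include <assert.h>

typedef struct { double x, y, z; } vec3;

/* Hypothetical coarse patch: an axis-aligned rectangle in (u, v) parameter space. */
typedef struct { double u0, u1, v0, v1; } patch;

/* Analytical surface (assumption): inverse stereographic map of (u, v)
 * onto a sphere of the given radius. Every returned point satisfies
 * x*x + y*y + z*z == radius*radius up to rounding. */
static vec3 sphere_point(double radius, double u, double v)
{
    double s = u * u + v * v;
    double d = 1.0 + s;
    vec3 p = { radius * 2.0 * u / d,
               radius * 2.0 * v / d,
               radius * (1.0 - s) / d };
    return p;
}

/* Fine lattice vertex (i, j) of a patch whose edges are split into n segments.
 * Only the four patch bounds are read from memory; the vertex is recomputed. */
static vec3 tess_vertex(patch p, double radius, int n, int i, int j)
{
    assert(n > 0 && i >= 0 && i <= n && j >= 0 && j <= n);
    double u = p.u0 + ((double)i / n) * (p.u1 - p.u0);
    double v = p.v0 + ((double)j / n) * (p.v1 - p.v0);
    return sphere_point(radius, u, v);
}

/* Example reduction: sum of the z coordinate over all (n + 1)^2 fine
 * vertices of a patch, without materialising the fine lattice. */
static double patch_sum_z(patch p, double radius, int n)
{
    double sum = 0.0;
    for (int i = 0; i <= n; ++i)
        for (int j = 0; j <= n; ++j)
            sum += tess_vertex(p, radius, n, i, j).z;
    return sum;
}
```

In an OpenCL kernel the two loops would be replaced by the work-item indices, so the memory traffic per patch stays at four doubles regardless of the subdivision factor n; the cost is the extra arithmetic per vertex in every iteration.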

Access to tessellators only makes sense so long as they exist as a fixed-function unit on the HW. Fermi uses shaders for tessellation (correct me if I'm wrong), so there tessellation can be used to any extent at the expense of remaining shader capacity. AMD HW could benefit, however, and since DX12 is sure to rely on tessellation even more widely (for collision detection, etc.), it wouldn't be too far-fetched to incorporate tessellation access into OpenCL for GPGPU usage, as capable HW will surely continue to exist so long as games do. (And games will always exist.) Even if tessellators as a fixed-function unit cease to exist in order to give developers more freedom in resource allocation, adding an appropriate element to OpenCL could be useful. Implementations that lack HW acceleration of tessellation would simply ignore the special nature of the function.

I know this idea should be posted to some Khronos forum rather than here, but AMD has a somewhat bigger say in this matter than a single forum member. I would love to hear both dev and corporate feedback on the matter.

// Could look something like:
__tessel ZQ(__local double4* input, __private double4* output) {...}

__kernel void XY(...)
{
    . . .
    tess(A, B);
    . . .
}