
Compute Shader Problem

Question asked by nmanjofo on Nov 30, 2012

I'm implementing a simple N-body simulation using DX11 and a compute shader, running on a GTX 280. The theory behind it is based on this article:


I also noticed that such a simulation is already part of the MS DX SDK (nBodyGravityCS11), from which I took some inspiration.


The problem I encountered:



void body_body_interaction(inout float3 ai, float4 bi, float4 bj)
{
    // Vector from body i to body j (xyz = position, w = mass)
    float3 r = bj.xyz - bi.xyz;

    float distSqr = dot(r, r);
    distSqr += g_softeningFactorSq;

    float distInvCube = 1.0f / sqrt(distSqr * distSqr * distSqr);

    //ai += g_FG * bj.w * distInvCube * r; - NOT WORKING
    ai += g_FG * g_fParticleMass * distInvCube * r; //WORKS, g_fParticleMass can be either in cbuffer or global constant, both work
}



The variable bj (xyz = position, w = mass) is first loaded into shared memory, then GroupMemoryBarrierWithGroupSync() is called to sync the group.



for(uint block = 0; block < num_blocks; ++block)
{
    //Fetch positions to shared cache
    sh_Positions[indexGroup] = oldPar[block * BLOCK_SIZE + indexGroup].pos;

    //Wait until every thread in the group has written its element
    GroupMemoryBarrierWithGroupSync();

    //Inner loop, manually unrolled by 8
    for(uint i = 0; i < BLOCK_SIZE; i += 8)
    {
        body_body_interaction(accel, myParticle.pos, sh_Positions[i]);
        body_body_interaction(accel, myParticle.pos, sh_Positions[i+1]);
        body_body_interaction(accel, myParticle.pos, sh_Positions[i+2]);
        body_body_interaction(accel, myParticle.pos, sh_Positions[i+3]);
        body_body_interaction(accel, myParticle.pos, sh_Positions[i+4]);
        body_body_interaction(accel, myParticle.pos, sh_Positions[i+5]);
        body_body_interaction(accel, myParticle.pos, sh_Positions[i+6]);
        body_body_interaction(accel, myParticle.pos, sh_Positions[i+7]);
    }
}

If I use the mass stored in bj.w, I end up with NaNs as the result of the simulation, even after the very first step. The particle positions are correct, because when I take the particle mass from the cbuffer or from a global constant, the simulation works. I initialize all particle masses to the same number, equal to the g_fParticleMass constant in the shader.
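A CPU-side reference of the same interaction can help narrow down where the NaNs come from. The following is a minimal C++ sketch, not the code from the shader: the Float3/Float4 structs, bodyBodyInteractionRef and isFinite are hypothetical names, fg and softeningSq stand in for g_FG and g_softeningFactorSq, and the direction of r (from body i to body j) is an assumption.

```cpp
#include <cmath>

// Hypothetical plain structs mirroring the HLSL float4/float3 layout.
struct Float4 { float x, y, z, w; };   // xyz = position, w = mass
struct Float3 { float x, y, z; };

// CPU mirror of body_body_interaction, with the mass read from bj.w.
// fg and softeningSq are stand-ins for g_FG and g_softeningFactorSq.
inline void bodyBodyInteractionRef(Float3& ai, const Float4& bi, const Float4& bj,
                                   float fg, float softeningSq)
{
    // Assumed direction: vector from body i to body j.
    const float rx = bj.x - bi.x;
    const float ry = bj.y - bi.y;
    const float rz = bj.z - bi.z;

    const float distSqr = rx * rx + ry * ry + rz * rz + softeningSq;
    const float distInvCube = 1.0f / std::sqrt(distSqr * distSqr * distSqr);

    // Scale factor uses the per-particle mass stored in w.
    const float s = fg * bj.w * distInvCube;
    ai.x += s * rx;
    ai.y += s * ry;
    ai.z += s * rz;
}

// True only if no component is NaN or infinite.
inline bool isFinite(const Float3& v)
{
    return std::isfinite(v.x) && std::isfinite(v.y) && std::isfinite(v.z);
}
```

Running this over the particle buffer read back from the GPU shows whether a non-finite or unexpected w value is already present in the input, or whether it only appears inside the shader.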


The funny thing is that if I do the same thing in the MS example I mentioned above, the result is the same: I get no output and the buffer contains NaNs. Why am I unable to use the 4th vector component from shared memory in this case? It is initialized properly on the CPU side and then copied to the GPU (verified).


Full shader code here:


Thank you very much!