0 Replies Latest reply on Nov 30, 2012 3:01 AM by nmanjofo

    Compute Shader Problem


      I'm implementing a simple N-body simulation using DX11 and a compute shader, running on a GTX 280. The theory behind it is based on this article: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch31.html


      I also noticed that such a simulation is already part of the MS DX SDK (nBodyGravityCS11), from which I took some inspiration.


      The problem I encountered:



      void body_body_interaction(inout float3 ai, float4 bi, float4 bj)
      {
          float3 r = bj.xyz - bi.xyz;

          float distSqr = dot(r, r);
          distSqr += g_softeningFactorSq;

          float distInvCube = 1.0f / sqrt(distSqr * distSqr * distSqr);

          //ai += g_FG * bj.w * distInvCube * r; - NOT WORKING
          ai += g_FG * g_fParticleMass * distInvCube * r; //WORKS, g_fParticleMass can be either in cbuffer or global constant, both work
      }



      The variable bj (xyz = position, w = mass) is first loaded into shared memory, then GroupMemoryBarrierWithGroupSync() is called to sync the group.



      for(uint block = 0; block < num_blocks; ++block)
      {
          //Fetch positions to shared cache
          sh_Positions[indexGroup] = oldPar[block * BLOCK_SIZE + indexGroup].pos;

          //Sync the group so the whole tile is visible before it is read
          GroupMemoryBarrierWithGroupSync();

          //Inner loop manually unrolled by 8
          for(uint i = 0; i < BLOCK_SIZE; i += 8)
          {
              body_body_interaction(accel, myParticle.pos, sh_Positions[i]);
              body_body_interaction(accel, myParticle.pos, sh_Positions[i+1]);
              body_body_interaction(accel, myParticle.pos, sh_Positions[i+2]);
              body_body_interaction(accel, myParticle.pos, sh_Positions[i+3]);
              body_body_interaction(accel, myParticle.pos, sh_Positions[i+4]);
              body_body_interaction(accel, myParticle.pos, sh_Positions[i+5]);
              body_body_interaction(accel, myParticle.pos, sh_Positions[i+6]);
              body_body_interaction(accel, myParticle.pos, sh_Positions[i+7]);
          }
      }
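
      For reference, here is a minimal, self-contained sketch of how such a tiled loop is commonly structured in HLSL: one barrier after the tile is loaded and a second one before the next iteration overwrites it. It uses the per-particle mass from bj.w, which is the form I want to get working. The entry point name CSMain, the Particle layout, the cbuffer layout, g_numParticles, g_accelOut and the BLOCK_SIZE value are assumptions made for illustration; they are not copied from my full shader.

```hlsl
// Sketch only. BLOCK_SIZE, Particle, cbSim, g_numParticles and g_accelOut
// are hypothetical names/layouts chosen for this example.
#define BLOCK_SIZE 128

struct Particle
{
    float4 pos; // xyz = position, w = mass
    float4 vel;
};

cbuffer cbSim : register(b0)
{
    float g_FG;                // gravitational constant
    float g_softeningFactorSq; // assumed > 0 so distSqr is never zero
    uint  g_numParticles;      // hypothetical: total particle count
};

StructuredBuffer<Particle>   oldPar     : register(t0);
RWStructuredBuffer<float4>   g_accelOut : register(u0); // hypothetical output

groupshared float4 sh_Positions[BLOCK_SIZE];

void body_body_interaction(inout float3 ai, float4 bi, float4 bj)
{
    float3 r = bj.xyz - bi.xyz;
    float distSqr = dot(r, r) + g_softeningFactorSq;
    float distInvCube = 1.0f / sqrt(distSqr * distSqr * distSqr);
    ai += g_FG * bj.w * distInvCube * r; // mass taken from the w component
}

[numthreads(BLOCK_SIZE, 1, 1)]
void CSMain(uint3 dtid : SV_DispatchThreadID, uint indexGroup : SV_GroupIndex)
{
    Particle myParticle = oldPar[dtid.x];
    float3 accel = float3(0.0f, 0.0f, 0.0f);

    // Round up so a partially filled last tile is still processed.
    uint num_blocks = (g_numParticles + BLOCK_SIZE - 1) / BLOCK_SIZE;

    for (uint block = 0; block < num_blocks; ++block)
    {
        // Each thread loads one entry; slots past the end get zero mass
        // so they contribute nothing (given a non-zero softening term).
        uint src = block * BLOCK_SIZE + indexGroup;
        sh_Positions[indexGroup] = (src < g_numParticles)
            ? oldPar[src].pos
            : float4(0.0f, 0.0f, 0.0f, 0.0f);

        // Wait until the whole tile has been written.
        GroupMemoryBarrierWithGroupSync();

        for (uint i = 0; i < BLOCK_SIZE; ++i)
        {
            body_body_interaction(accel, myParticle.pos, sh_Positions[i]);
        }

        // Wait until every thread is done reading before the tile is reused.
        GroupMemoryBarrierWithGroupSync();
    }

    // Bounds check only at the end, so every thread reaches both barriers.
    if (dtid.x < g_numParticles)
    {
        g_accelOut[dtid.x] = float4(accel, 0.0f);
    }
}
```

      The bounds check is deliberately placed after the loop: returning early would make some threads skip the barriers, which is not allowed for GroupMemoryBarrierWithGroupSync().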






      If I use the mass stored in bj.w, I end up with NaNs as the result of the simulation, even after the very first step. The particle positions are correct, because when I take the particle mass from a cbuffer or from a global constant, the simulation works. I initialize all particle masses to the same value as the g_fParticleMass constant in the shader.


      The funny thing is that if I do the same thing in the MS sample I mentioned above, the result is the same: I get no output and the buffer contains NaNs. Why am I unable to use the 4th vector component from shared memory in this case? It is initialized properly on the CPU side and then copied to the GPU (verified).
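
      One check that might help narrow this down is a tiny diagnostic shader that does nothing but route the w component through shared memory and write it out, so the values can be read back and inspected on the CPU. This is only a sketch: the entry point CSDebugMass, the output buffer g_debugOut, the Particle layout and the BLOCK_SIZE value are hypothetical and would have to match the real shader.

```hlsl
// Diagnostic sketch only. CSDebugMass, g_debugOut, the Particle layout and
// BLOCK_SIZE are assumptions for illustration, not part of the full shader.
#define BLOCK_SIZE 128

struct Particle
{
    float4 pos; // xyz = position, w = mass (as assumed in the post)
    float4 vel;
};

StructuredBuffer<Particle> oldPar     : register(t0);
RWStructuredBuffer<float2> g_debugOut : register(u0); // hypothetical

groupshared float4 sh_Positions[BLOCK_SIZE];

[numthreads(BLOCK_SIZE, 1, 1)]
void CSDebugMass(uint3 dtid : SV_DispatchThreadID, uint indexGroup : SV_GroupIndex)
{
    // Value as read straight from the input buffer.
    float4 direct = oldPar[dtid.x].pos;

    // Same value written to shared memory, as in the simulation loop.
    sh_Positions[indexGroup] = direct;
    GroupMemoryBarrierWithGroupSync();

    // Read a different thread's slot so the value really comes from
    // shared memory and not from this thread's own registers.
    uint neighbour = (indexGroup + 1) % BLOCK_SIZE;
    float4 cached = sh_Positions[neighbour];

    // x = mass read directly, y = neighbour's mass read via shared memory.
    g_debugOut[dtid.x] = float2(direct.w, cached.w);
}
```

      If every particle is initialized with the same mass, both components should come back equal to that mass for every thread. A mismatch in x would point at the buffer contents or the struct layout, while a mismatch only in y would point at the shared-memory path.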


      Full shader code here: http://pastebin.com/SJhs8ntt


      Thank you very much!