0 Replies Latest reply on Nov 30, 2012 3:01 AM by nmanjofo

    Compute Shader Problem

    nmanjofo

      I'm implementing a simple N-body simulation using DX11 and a compute shader, running on a GTX 280. The theory behind it is based on this article: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch31.html

       

      I also noticed that such a simulation is already part of the MS DX SDK (nBodyGravityCS11), from which I took some inspiration.

       

      The problem I encountered:

       

       

      void body_body_interaction(inout float3 ai, float4 bi, float4 bj)
      {
          float3 r = bj.xyz - bi.xyz;

          float distSqr = dot(r, r);
          distSqr += g_softeningFactorSq;

          float distInvCube = 1.0f / sqrt(distSqr * distSqr * distSqr);

          //ai += g_FG * bj.w * distInvCube * r; - NOT WORKING
          ai += g_FG * g_fParticleMass * distInvCube * r; //WORKS, g_fParticleMass can be either in cbuffer or global constant, both work
      }

       

      The variable bj (xyz = position, w = mass) is first loaded into shared memory, then GroupMemoryBarrierWithGroupSync() is called to synchronize the group.

       

      [loop]
      for(uint block = 0; block < num_blocks; ++block)
      {
          //Fetch positions to shared cache
          sh_Positions[indexGroup] = oldPar[block * BLOCK_SIZE + indexGroup].pos;
          GroupMemoryBarrierWithGroupSync();

          [unroll]
          for(uint i = 0; i < BLOCK_SIZE; i += 8)
          {
              body_body_interaction(accel, myParticle.pos, sh_Positions[i]);
              body_body_interaction(accel, myParticle.pos, sh_Positions[i+1]);
              body_body_interaction(accel, myParticle.pos, sh_Positions[i+2]);
              body_body_interaction(accel, myParticle.pos, sh_Positions[i+3]);
              body_body_interaction(accel, myParticle.pos, sh_Positions[i+4]);
              body_body_interaction(accel, myParticle.pos, sh_Positions[i+5]);
              body_body_interaction(accel, myParticle.pos, sh_Positions[i+6]);
              body_body_interaction(accel, myParticle.pos, sh_Positions[i+7]);
          }

          GroupMemoryBarrierWithGroupSync();
      }

       

      If I use the mass stored in bj.w, the simulation produces NaNs, even after the very first step. The particle positions themselves are correct, because the simulation works when I take the particle mass from a cbuffer or from a global constant. I initialize all particle masses to the same number, which equals the g_fParticleMass constant in the shader.
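      For comparison, the same arithmetic can be mirrored on the CPU. The following is only a rough C++ sketch under stated assumptions: Float3, Float4, kFG and kSofteningSq are hypothetical stand-ins for the shader's types and constants (the actual values of g_FG and g_softeningFactorSq are not shown here). It illustrates that with finite inputs and a non-zero softening term the formula stays finite, while a NaN in the w component is enough to turn the whole acceleration into NaN.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical CPU-side mirrors of the shader types (illustrative names).
struct Float3 { float x, y, z; };
struct Float4 { float x, y, z, w; };  // xyz = position, w = mass

// Assumed placeholder values; the real g_FG and g_softeningFactorSq may differ.
constexpr float kFG = 6.673e-11f;
constexpr float kSofteningSq = 0.01f;

// Same arithmetic as body_body_interaction, using the mass stored in bj.w.
void bodyBodyInteraction(Float3& ai, const Float4& bi, const Float4& bj) {
    const float rx = bj.x - bi.x;
    const float ry = bj.y - bi.y;
    const float rz = bj.z - bi.z;

    const float distSqr = rx * rx + ry * ry + rz * rz + kSofteningSq;
    const float distInvCube = 1.0f / std::sqrt(distSqr * distSqr * distSqr);

    const float scale = kFG * bj.w * distInvCube;
    ai.x += scale * rx;
    ai.y += scale * ry;
    ai.z += scale * rz;
}

// True only if all three components are finite (neither NaN nor infinity).
bool isFinite3(const Float3& v) {
    return std::isfinite(v.x) && std::isfinite(v.y) && std::isfinite(v.z);
}

// Small self-check: finite mass gives a finite result, NaN mass poisons it.
void checkNaNPropagation() {
    const Float4 bi{0.0f, 0.0f, 0.0f, 1.0f};
    const Float4 bj{1.0f, 2.0f, 3.0f, 1.0f};

    Float3 good{0.0f, 0.0f, 0.0f};
    bodyBodyInteraction(good, bi, bj);
    assert(isFinite3(good));

    Float4 badMass = bj;
    badMass.w = std::numeric_limits<float>::quiet_NaN();
    Float3 bad{0.0f, 0.0f, 0.0f};
    bodyBodyInteraction(bad, bi, badMass);
    assert(!isFinite3(bad));
}
```

      Since the accumulation is a running sum, a single non-finite w anywhere in a tile would be enough to make the final acceleration NaN in this model.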

       

      The funny thing is that if I do the same thing in the MS example I mentioned above, the result is exactly the same: I get no output and the buffer contains NaNs. Why am I unable to use the 4th vector component from shared memory in this case? It is initialized properly on the CPU side and then copied to the GPU (verified).
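      A host-side check of the buffer contents could look roughly like the C++ sketch below. It assumes the particle data has already been read back into a std::vector; PosMass and firstNonFinite are hypothetical names used only for illustration, and the readback itself (staging buffer, Map/Unmap) is not shown.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side layout of one element: xyz = position, w = mass.
struct PosMass { float x, y, z, w; };

// Returns the index of the first element that has any non-finite component
// (NaN or infinity), or data.size() if every component is finite.
std::size_t firstNonFinite(const std::vector<PosMass>& data) {
    for (std::size_t i = 0; i < data.size(); ++i) {
        const PosMass& p = data[i];
        if (!std::isfinite(p.x) || !std::isfinite(p.y) ||
            !std::isfinite(p.z) || !std::isfinite(p.w)) {
            return i;
        }
    }
    return data.size();
}
```

      Running such a scan on the buffer both before the first dispatch and after it would show whether the w components are already non-finite on upload or only become so in the shader.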

       

      Full shader code here: http://pastebin.com/SJhs8ntt

       

      Thank you very much!