6 Replies Latest reply on May 25, 2011 11:02 AM by himanshu.gautam

    Merging data from several GPU ?

    spectral

      Hi,

I'm running a kernel on several GPUs and a CPU. This kernel simply updates some pixel colors.

The kernel works on a clTask that has 'x,y' coordinates. But several tasks can have the same pixel coordinates (even on the same GPU).

So, I'm searching for an efficient way to merge all these tasks' colors (from several GPUs and even from the CPU) into one buffer.

Do you have an idea how to do this?

       

      Thanks

typedef struct { int x; int y; int4 color; } clTask;

        • Merging data from several GPU ?
          laobrasuca

you mean better than creating all the buffers (one per device) as pointers to a single host memory buffer?

            • Merging data from several GPU ?
              spectral

Imagine I have 4 GPUs and each one has 1000 clTasks.

So, I have to compute the average of the colors:

for (int i = 0; i < taskCount; i++)
{
    color[i] = (gpu[0].color[i] + gpu[1].color[i] + gpu[2].color[i] + gpu[3].color[i]) * 0.25f;
}

So, there are several methods:

1 - I retrieve all the tasks into RAM, then use the CPU to average
2 - Maybe I can use a "shared" memory mechanism to execute a kernel to average all
3 - Use an OpenGL texture to merge everything

I have no other idea! Maybe there is a way to transfer some memory between GPUs without going through the host?

                • Merging data from several GPU ?
                  laobrasuca

                   

Originally posted by: viewon01
Imagine I have 4 GPUs and each one has 1000 clTasks. So, I have to compute the average of the colors:

for (int i = 0; i < taskCount; i++)
{
    color[i] = (gpu[0].color[i] + gpu[1].color[i] + gpu[2].color[i] + gpu[3].color[i]) * 0.25f;
}

So, there are several methods:

1 - I retrieve all the tasks into RAM, then use the CPU to average
2 - Maybe I can use a "shared" memory mechanism to execute a kernel to average all
3 - Use an OpenGL texture to merge everything

I believe what I said above is something close to option 2, with the shared buffer being the host memory.

I have no other idea! Maybe there is a way to transfer some memory between GPUs without going through the host?

I've been watching this thread expecting someone more expert to comment (and maybe they will), but for now I think option 1 is the best one, especially if you have at least a 4-core processor. In that case you can dispatch each GPU job to a different thread and, once everything is done, do the averaging on the CPU, since the amount of data seems quite low (if it were large, you could instead upload the data to one of your GPUs and run a reduction-style average there).

Option 3 would probably require memory sharing between GPUs, and I think that to share memory between GPUs you need to pass through host memory, unless you do something like CrossFire. I don't think OpenCL can be used with a CrossFire setup, but OpenGL can, so maybe textures would work, though I don't know how the driver shares memory. Another option might be wglShareLists to share texture or buffer objects, but I don't know whether you can access one GPU's memory from another GPU.

By the way, how does clCreateBuffer act when you have more than one device in your context? Does it allocate the same amount of memory on each device? The function doesn't take a device list as an argument, only the context.

                   

                    • Merging data from several GPU ?
                      himanshu.gautam

I agree with laobrasuca that it's better to do the final merging on the CPU.

                      This is what we do in the reduction sample.

                       

                        • Merging data from several GPU ?
                          spectral

Thanks, but I'm not sure, because:

1 - I need to read all the buffers back from the GPUs...

2 - I don't use OpenCL to compute the average, so no benefit from parallel processing

3 - I have to send the new buffer back

So, why not:

1 - Create a new buffer on the most powerful GPU
2 - Transfer each buffer to it one by one
3 - Use OpenCL to average

Then I can continue to use this buffer.

The problem is phase (2), because it requires a read and a write! I suppose it is not possible to do this in one operation. Just as we can read data from GPU memory to CPU memory, is there a way to read memory from one GPU directly to another GPU?