1 Reply Latest reply on May 2, 2011 7:16 AM by nou

    Splitting huge dataset across multiple GPUs and communicating



      I have a simulation with a huge 3D dataset which I cannot fit into a single GPU. I have a machine with 4 GPUs which I want to work together. I split the dataset into 4 sub-cubes and want only a single sub-cube to be allocated on each device. For each simulation step I have to communicate ghost layers between the devices. What is the best way to do this?

      - Create 1 context with 4 GPUs? Will every GPU get all sub-cubes allocated, since a buffer is created for a context and not for a device?
      - Create 4 contexts with 1 GPU each? Is it possible to synchronize between contexts?

      What commands should I use to transfer data directly from one GPU to another?


          AMD's OpenCL implementation allocates a buffer on a device on first use, so you can create 8 GB of buffers. If you don't pass CL_MEM_COPY_HOST_PTR, they are not allocated at all; if you do, the data is copied to host memory. They are allocated on a device only when you enqueue a kernel that uses them.

          They stay there until you release them. If you enqueue kernels that use the same buffer on multiple devices, that buffer is allocated on every device that uses it.

          More in the AMD OpenCL Programming Guide, 4.5.

          I am not sure, but clEnqueueWriteBuffer/clEnqueueReadBuffer also seem to allocate the buffer on the device.
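
          For the ghost-layer exchange itself, here is a rough sketch in plain C of one possible approach: staging each boundary layer through host memory. It is only a model under assumptions, not tested OpenCL code. It assumes the volume is split into slabs along z (not the sub-cubes from the question) with one ghost layer on each side, and the memcpy calls stand in for a clEnqueueReadBuffer on the owning device's queue followed by a clEnqueueWriteBuffer on the neighbour's queue. All names and sizes (NDEV, NX, NY, NZ, layer_at, exchange_ghosts) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: each device owns a slab of NZ interior layers plus
   one ghost layer below (index 0) and one above (index NZ + 1). */
enum { NDEV = 4, NX = 4, NY = 4, NZ = 2 };
enum { LAYER = NX * NY, SLAB = (NZ + 2) * LAYER };

/* Pointer to the first element of layer z inside one slab. */
float *layer_at(float *slab, int z)
{
    assert(z >= 0 && z < NZ + 2);
    return slab + (size_t)z * LAYER;
}

/* Exchange ghost layers between neighbouring slabs through a host staging
   buffer. In real host code the copy out of a slab would be a read from the
   owning device and the copy into a slab a write to the neighbouring device
   (assumption: host-staged transfer, one layer at a time). */
void exchange_ghosts(float slabs[NDEV][SLAB])
{
    float staging[LAYER];

    for (int d = 0; d + 1 < NDEV; ++d) {
        /* Top interior layer of slab d -> bottom ghost layer of slab d + 1. */
        memcpy(staging, layer_at(slabs[d], NZ), sizeof staging);
        memcpy(layer_at(slabs[d + 1], 0), staging, sizeof staging);

        /* Bottom interior layer of slab d + 1 -> top ghost layer of slab d. */
        memcpy(staging, layer_at(slabs[d + 1], 1), sizeof staging);
        memcpy(layer_at(slabs[d], NZ + 1), staging, sizeof staging);
    }
}
```

          Because each slab's buffer would only ever be used on its own device's queue, this pattern should keep one slab per GPU under the allocate-on-first-use behaviour described above. The outermost ghost layers (bottom of the first slab, top of the last) are left untouched here; boundary conditions would have to fill them.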