4 Replies Latest reply on Jun 27, 2008 12:53 PM by Nexis

    Does CAL copy elements from cpu to gpu one by one?

    zellux

      I wrote a program to test whether it was true.

      My program worked like this

      calResMap((CALvoid**)&fdata, &pitch, inputRes, 0);
      for (int i = 0; i < LENGTH; ++i) {
          fdata[index[i]] = inputData[i];
      }

      The inputData[] array was initialized with random floats, and the index[] array determined the order in which the data were written to the GPU (sequential or random). LENGTH was set to 300,000.

      In the first run, it copied an array of floats to the GPU sequentially, i.e., index[i] = i.

      In the second run, it copied the same data to the GPU in random order, i.e., index[] was first shuffled with random_shuffle.

      But both runs took nearly the same time. Does that mean CAL copies floats to the GPU one by one?

      Can I make it copy a chunk of data to the GPU at once, with an operation like DMA, to improve performance?

      Many thanks.
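On the CPU side, one thing that can be tried for the sequential case is replacing the per-element loop with a block copy. The sketch below assumes that the pointer returned by calResMap can be written like ordinary host memory and that pitch is measured in elements rather than bytes. Neither assumption is confirmed in this thread. copy_rows and its parameters are hypothetical names, and whether the driver then moves the data by DMA is not something this code controls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a tightly packed host array into a mapped buffer whose rows are
 * `pitch` elements apart. `dst` stands in for the pointer returned by
 * calResMap (assumption); `pitch` is assumed to be in elements. */
static void copy_rows(float *dst, size_t pitch,
                      const float *src, size_t width, size_t height)
{
    assert(pitch >= width);
    for (size_t row = 0; row < height; ++row) {
        /* One memcpy per row instead of one assignment per element. */
        memcpy(dst + row * pitch, src + row * width, width * sizeof(float));
    }
}
```

For a 1D resource this reduces to a single memcpy of width floats (height = 1).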

        • Does CAL copy elements from cpu to gpu one by one?
          michael.chu
          Hi zellux,

          I assume inputRes is a local resource (local to the GPU)?

          If so, did you do an unmap as well? I think the automatic copy back doesn't happen until after you unmap it from the CPU side.

          Michael.
            • Does CAL copy elements from cpu to gpu one by one?
              zellux

              inputRes is local to the CPU, and I am testing the speed of copying the array to the GPU.

              I called calResUnmap immediately after the transfer.

              Here's the code snippet:

                  clock_t start = clock();
                  calCtxGetMem(&inputMem, ctx, inputRes);
                  calResMap((CALvoid**)&fdata, &pitch, inputRes, 0);
                  for (int i = 0; i < LENGTH; ++i) {
                      fdata[index[i]] = inputData[i];
                  }
                  calResUnmap(inputRes);
                  printf("Elapsed time: %ld\n", (long)(clock() - start));

              The elapsed time grows linearly as length grows, which is expected.

                • Does CAL copy elements from cpu to gpu one by one?
                  michael.chu
                  Can you try timing the following:
                  - that section of code but with the loop commented out,
                  - that section of code but with the CAL calls commented out.

                  I'm curious to see what is dominating the time, the loop or the CAL calls.

                  Michael.
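The second measurement suggested above (the loop without any CAL calls) can be sketched on plain host memory as follows. This is only an illustration: fill_sequential, shuffle, and time_scatter are hypothetical names, dst is an ordinary array standing in for the mapped pointer, and the Fisher-Yates shuffle is a plain-C substitute for random_shuffle.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Fill index[] with 0..n-1, i.e. the sequential order index[i] = i. */
static void fill_sequential(int *index, int n)
{
    for (int i = 0; i < n; ++i) {
        index[i] = i;
    }
}

/* Fisher-Yates shuffle. rand() is enough for a sketch, but it only
 * reaches every position when RAND_MAX is at least n - 1. */
static void shuffle(int *index, int n)
{
    assert(n - 1 <= RAND_MAX);
    for (int i = n - 1; i > 0; --i) {
        int j = rand() % (i + 1);
        int tmp = index[i];
        index[i] = index[j];
        index[j] = tmp;
    }
}

/* Time only the scatter loop, with no CAL calls involved.
 * Returns the elapsed CPU time in seconds. */
static double time_scatter(float *dst, const float *src,
                           const int *index, int n)
{
    clock_t start = clock();
    for (int i = 0; i < n; ++i) {
        dst[index[i]] = src[i];
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_scatter once after fill_sequential and once after shuffle, with n = 300000, gives the loop-only cost for both orders. Subtracting that from the full measurement shows how much time the CAL calls account for.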
                    • Does CAL copy elements from cpu to gpu one by one?
                      Nexis

                       

                      Originally posted by: michael.chu@amd.com
                      Can you try timing the following:
                      - that section of code but with the loop commented out,
                      - that section of code but with the CAL calls commented out.
                      I'm curious to see what is dominating the time, the loop or the CAL calls.
                      Michael.


                      I timed the following code:

                      calResMap((void**)&dataPtr, &pitch, resLocal, 0);
                      memset(dataPtr, 0, pitch * sizeof(float) * 4);
                      calResUnmap(resLocal);
                       

                      For a 1D resource of 8192 Float4, I get:

                      calResMap -> 155 us
                      memset -> 67 us
                      calResUnmap -> 85 us

                      For a 1D resource of 1 Float4, I get:

                      calResMap -> 61 us
                      memset -> 0.8 us
                      calResUnmap -> 35 us

                      This is quite slow. What is taking so much time? And why does calResMap take about twice as long as calResUnmap?
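Single-shot timings at the microsecond scale tend to be noisy, so one way to firm up numbers like these is to average each step over many repetitions. The following is a minimal sketch, assuming a C11 toolchain that provides timespec_get. mean_us, clear_step, and struct buffer are hypothetical names, and memset on a host buffer stands in for the map, write, and unmap steps, which would each be wrapped in their own step function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Difference b - a in microseconds. */
static double elapsed_us(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) * 1e6
         + (double)(b.tv_nsec - a.tv_nsec) / 1e3;
}

/* Run step(arg) `reps` times and return the mean wall-clock time per
 * call in microseconds. timespec_get reads the system clock, which is
 * not guaranteed to be monotonic. */
static double mean_us(void (*step)(void *), void *arg, int reps)
{
    struct timespec start, end;
    assert(reps > 0);
    timespec_get(&start, TIME_UTC);
    for (int i = 0; i < reps; ++i) {
        step(arg);
    }
    timespec_get(&end, TIME_UTC);
    return elapsed_us(start, end) / reps;
}

/* Example step: clear a host buffer, standing in for the memset above. */
struct buffer {
    float *data;
    size_t bytes;
};

static void clear_step(void *arg)
{
    struct buffer *b = arg;
    memset(b->data, 0, b->bytes);
}
```

With a struct buffer b describing the memory to clear, mean_us(clear_step, &b, 1000) returns the average cost of one memset in microseconds.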