
zellux
Journeyman III

Does CAL copy elements from CPU to GPU one by one?

I wrote a program to test whether it was true.

My program worked like this:

    calResMap((CALvoid**)&fdata, &pitch, inputRes, 0);
    for (int i = 0; i < LENGTH; ++i) {
        fdata[index[i]] = inputData[i];  // index[] decides the write order
    }

The inputData[] array was initialized with random floats, and the index[] array determined the order in which the data were written to the GPU (sequentially or randomly). LENGTH was set to 300,000.

In the first run, it copied an array of floats to the GPU sequentially, i.e., index[i] = i.

In the second run, it copied the same data to the GPU in random order, i.e., index[] was first shuffled with random_shuffle.
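The two runs can be reproduced on the CPU side alone with a sketch like the one below. It assumes the mapped pointer fdata can be treated as an ordinary float buffer, so no CAL calls are made. make_index and scatter are hypothetical names. std::shuffle with std::mt19937 stands in for std::random_shuffle, which was deprecated in C++14 and removed in C++17.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Builds index[] for one run: index[i] = i for the sequential run,
// or a random permutation of 0..length-1 for the shuffled run.
std::vector<std::size_t> make_index(std::size_t length, bool shuffled,
                                    unsigned seed = 42) {
    std::vector<std::size_t> index(length);
    std::iota(index.begin(), index.end(), std::size_t{0});
    if (shuffled) {
        std::mt19937 rng(seed);  // fixed seed keeps runs repeatable
        std::shuffle(index.begin(), index.end(), rng);
    }
    return index;
}

// The loop from the post; dst plays the role of the mapped pointer fdata.
// Each iteration is a single CPU store into dst at position index[i].
void scatter(float* dst, const std::vector<float>& inputData,
             const std::vector<std::size_t>& index) {
    for (std::size_t i = 0; i < inputData.size(); ++i) {
        dst[index[i]] = inputData[i];
    }
}
```

Timing scatter once with make_index(300000, false) and once with make_index(300000, true), writing into a plain std::vector<float>, should show how much of the measured time comes from the write order itself, independent of CAL.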

But both runs seemed to take nearly the same time. Does that mean CAL copies the floats to the GPU one by one?

Can I make it copy a chunk of data to the GPU at once, with a DMA-like operation, to improve performance?
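When the source data are already in the order the resource expects, the per-element loop can be replaced by bulk copies into the mapped pointer. A minimal sketch, assuming pitch is counted in elements (as the memset call later in the thread implies) and that the mapped pointer behaves like ordinary writable memory; copy_pitched is a hypothetical helper. It only changes how the CPU writes into the mapping and is not a CAL DMA API.

```cpp
#include <cstddef>
#include <cstring>

// Copies a width x height block of floats from a tightly packed source
// into a destination whose rows start `pitch` elements apart (pitch >= width).
// One memcpy per row lets the C library use wide sequential stores instead
// of one float assignment per loop iteration. Padding between rows is untouched.
void copy_pitched(float* dst, std::size_t pitch,
                  const float* src, std::size_t width, std::size_t height) {
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(dst + row * pitch, src + row * width,
                    width * sizeof(float));
    }
}
```

For a 1D resource, height is 1 and the call reduces to a single memcpy of width floats.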

Many thanks.


Hi zellux,

I assume inputRes is a local resource (local to the GPU)?

If so, did you do an unmap as well? I think the automatic copy back doesn't happen until after you unmap it from the CPU side.

Michael.
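If, as suggested above, the data only reach the GPU after the unmap, a scope guard makes it hard to skip calResUnmap on an early return. A minimal C++ sketch; ScopedMap is a hypothetical class, and the map/unmap callables are where calls such as calResMap and calResUnmap would go (that wiring is an assumption and is not shown).

```cpp
#include <functional>
#include <utility>

// Hypothetical RAII guard: runs `map` on construction and `unmap` on
// destruction, so the unmap happens on every exit path from the scope.
class ScopedMap {
public:
    ScopedMap(std::function<void*()> map, std::function<void()> unmap)
        : ptr_(map()), unmap_(std::move(unmap)) {}
    ~ScopedMap() {
        if (unmap_) unmap_();
    }
    ScopedMap(const ScopedMap&) = delete;             // exactly one unmap
    ScopedMap& operator=(const ScopedMap&) = delete;

    // Pointer returned by the map callable, valid until the guard is destroyed.
    void* get() const { return ptr_; }

private:
    void* ptr_;
    std::function<void()> unmap_;
};
```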

inputRes is local to the CPU, and I am testing how fast the array is copied to the GPU.

I called calResUnmap immediately after the transfer.

Here's the code snippet:

    clock_t start = clock();
    calCtxGetMem(&inputMem, ctx, inputRes);
    calResMap((CALvoid**)&fdata, &pitch, inputRes, 0);
    for (int i = 0; i < LENGTH; ++i) {
        fdata[index[i]] = inputData[i];
    }
    calResUnmap(inputRes);
    // clock_t is not necessarily int, so cast before printing with %ld
    printf("Elapsed time: %ld ticks\n", (long)(clock() - start));

The elapsed time grows linearly with LENGTH, which is expected.
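clock() returns the processor time used by the program in ticks, with CLOCKS_PER_SEC ticks per second, so the printed difference is a tick count rather than milliseconds. Processor time may also leave out time the program spends blocked, for example waiting on the driver. A minimal sketch of the conversion; clock_ms is a hypothetical helper.

```cpp
#include <ctime>

// Converts a difference of two clock() readings to milliseconds of
// processor time (not wall-clock time).
double clock_ms(std::clock_t start, std::clock_t end) {
    return 1000.0 * static_cast<double>(end - start) / CLOCKS_PER_SEC;
}
```

For wall-clock time, std::chrono::steady_clock is the usual C++ alternative.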


Can you try timing the following:
- that section of code but with the loop commented out,
- that section of code but with the CAL calls commented out.

I'm curious to see what is dominating the time, the loop or the CAL calls.

Michael.
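The experiment above can also be run in a single pass by recording a split time after each phase. A minimal sketch using std::chrono::steady_clock, a monotonic wall clock, so time spent blocked inside a call still counts; LapTimer is a hypothetical class, not part of CAL.

```cpp
#include <chrono>
#include <string>
#include <utility>
#include <vector>

// Records named split times: each lap() stores the microseconds elapsed
// since the previous lap() (or since construction for the first one).
class LapTimer {
public:
    using Clock = std::chrono::steady_clock;

    LapTimer() : last_(Clock::now()) {}

    void lap(std::string name) {
        Clock::time_point now = Clock::now();
        long long us = std::chrono::duration_cast<std::chrono::microseconds>(
                           now - last_).count();
        laps_.emplace_back(std::move(name), us);
        last_ = now;
    }

    const std::vector<std::pair<std::string, long long>>& laps() const {
        return laps_;
    }

private:
    Clock::time_point last_;
    std::vector<std::pair<std::string, long long>> laps_;
};
```

Calling lap("map") after calResMap, lap("loop") after the write loop and lap("unmap") after calResUnmap would give the three phase times separately.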

Originally posted by: michael.chu@amd.com


I timed the following code:

    calResMap((void**)&dataPtr, &pitch, resLocal, 0);
    // pitch is in float4 elements, so one row is pitch * 16 bytes
    memset(dataPtr, 0, pitch * sizeof(float) * 4);
    calResUnmap(resLocal);

For a 1D resource of 8192 float4 elements, I get:

calResMap -> 155 µs
memset -> 67 µs
calResUnmap -> 85 µs

For a 1D resource of a single float4, I get:

calResMap -> 61 µs
memset -> 0.8 µs
calResUnmap -> 35 µs

This is quite slow. What is taking so much time? And why does calResMap take almost twice as long as calResUnmap?
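One way to read the two measurements is as a fixed per-call cost plus a per-byte cost. A rough sketch that fits a straight line through the two reported points (16 bytes for one float4, 131072 bytes for 8192 float4 elements); fit_two_points and LinearCost are hypothetical names, and two points cannot show that the cost is actually linear in size.

```cpp
// Two-point linear model: t(bytes) = fixed_us + bytes / bytes_per_us.
struct LinearCost {
    double fixed_us;      // intercept: cost of a call that moves almost no data
    double bytes_per_us;  // inverse slope, i.e. throughput (1 byte/us = 1 MB/s)
};

// Fits the line through (bytes_small, us_small) and (bytes_large, us_large).
LinearCost fit_two_points(double bytes_small, double us_small,
                          double bytes_large, double us_large) {
    double us_per_byte = (us_large - us_small) / (bytes_large - bytes_small);
    LinearCost c;
    c.bytes_per_us = 1.0 / us_per_byte;
    c.fixed_us = us_small - us_per_byte * bytes_small;
    return c;
}
```

With the numbers above, this gives an intercept of about 61 µs and a throughput of roughly 1.4 GB/s for calResMap, and about 35 µs and 2.6 GB/s for calResUnmap. If the model holds, most of the time for small resources is fixed per-call overhead rather than data movement.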
