Archives Discussions

pavandsp · 03-06-2010

Hi,

When I run this simple multiplier code on the CPU, the output is correct. When I run it on the GPU, with the context and command-queue API calls changed accordingly, the output is wrong.

Kernel: multiply an 8x8 matrix by 2, i.e. A*2.

The function actually contains another algorithm that is not working, so I commented it out and am trying this multiply first, to get a simple function working on the GPU.

lines = Len = 8; globalThreads[0] = 8; globalThreads[1] = 8;

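status = clEnqueueNDRangeKernel(
    commandQueue,
    kernel,
    2,              // work_dim
    NULL,           // global_work_offset
    globalThreads,  // global_work_size
    NULL,           // local_work_size (localThreads)
    0,              // num_events_in_wait_list
    NULL,           // event_wait_list
    &events[0]);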

__kernel void myKernel(const float x,
                       const float y,
                       const int lines,
                       const int Len, // width
                       __global unsigned char * output,
                       __global unsigned char * input)
{
    uint tx = get_global_id(0); // column index
    uint ty = get_global_id(1); // row index
    output[(ty * Len) + tx] = input[(ty * Len) + tx] * 2;
}
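For comparison, here is a minimal sketch in plain C of what I expect the kernel to compute when it is launched over the 8x8 global range. The names my_kernel_item and emulate_ndrange are hypothetical helpers written for this post, not part of OpenCL; the loops simply visit every (tx, ty) pair once, which is my assumption about how the 2D range maps onto the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Body of myKernel for one work-item (tx, ty), rewritten as plain C.
   The unused x, y and lines arguments of the real kernel are left out. */
static void my_kernel_item(size_t tx, size_t ty, int len,
                           unsigned char *output, const unsigned char *input)
{
    size_t idx = ty * (size_t)len + tx;
    output[idx] = (unsigned char)(input[idx] * 2);
}

/* Hypothetical helper: calls the kernel body once for every (tx, ty)
   of a 2D global range of size {len, lines}. */
void emulate_ndrange(const unsigned char *input, unsigned char *output,
                     int lines, int len)
{
    assert(lines > 0 && len > 0);
    for (size_t ty = 0; ty < (size_t)lines; ty++)
        for (size_t tx = 0; tx < (size_t)len; tx++)
            my_kernel_item(tx, ty, len, output, input);
}
```

Called with lines = Len = 8 on the input below and a zeroed output buffer, this writes the first 64 bytes and leaves the rest of the buffer untouched.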

Details: the GPU is an ATI RV710; the CPU is an AMD CPU.

Input:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 .

Output on CPU (the output buffer is 3 times the size of the input). Correct:
0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100 102 104 106 108 110 112 114 116 118 120 122 124 126 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Output on GPU (the output buffer is 3 times the size of the input). Wrong:
6 0 0 0 14 0 0 0 22 0 0 0 30 0 0 0 38 0 0 0 46 0 0 0 54 0 0 0 62 0 0 0 70 0 0 0 78 0 0 0 86 0 0 0 94 0 0 0 102 0 0 0 110 0 0 0 118 0 0 0 126 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.
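Looking at the GPU output, the first 64 bytes seem to follow a pattern: in each group of four bytes, the first byte holds twice the last input value of that group (6 = 2*3, 14 = 2*7, ..., 126 = 2*63) and the other three bytes are zero. The sketch below only checks for that pattern; matches_last_of_four_pattern is a hypothetical helper name. One possible reading, which is an assumption and not something I have confirmed, is that the device does not handle single-byte stores the way the CPU does, so it may be worth checking whether cl_khr_byte_addressable_store appears in the device's CL_DEVICE_EXTENSIONS string.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if, in every group of 4 bytes among the first n bytes of gpu_out,
   byte 0 equals 2 * (last input byte of the group) and bytes 1..3 are zero.
   Returns 0 otherwise. */
int matches_last_of_four_pattern(const unsigned char *input,
                                 const unsigned char *gpu_out, size_t n)
{
    assert(n % 4 == 0);
    for (size_t g = 0; g < n; g += 4) {
        if (gpu_out[g] != (unsigned char)(input[g + 3] * 2))
            return 0;
        if (gpu_out[g + 1] != 0 || gpu_out[g + 2] != 0 || gpu_out[g + 3] != 0)
            return 0;
    }
    return 1;
}
```

This describes the symptom only; it does not by itself show why the device produces it.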

I am not sure what is happening. I think the maximum work-item sizes and the maximum work-group size are within limits, because my size is only 8x8.

Also, I am not clear about:

1. Global work items and their relation to parallelism.

2. Work items: how many pixel elements will be processed in one work item, and where do I get this information?
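My current understanding, which may be wrong, is that the number of work-items is the product of the global sizes passed to clEnqueueNDRangeKernel, and that the number of elements per work-item is decided by the kernel code itself. In my kernel each work-item reads and writes exactly one element, at the index built from get_global_id(0) and get_global_id(1). A small sketch of that arithmetic, with hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>

/* Total work-items in an NDRange: product of the global sizes
   over all work_dim dimensions (1 to 3). */
size_t total_work_items(const size_t *global_size, unsigned work_dim)
{
    assert(work_dim >= 1 && work_dim <= 3);
    size_t total = 1;
    for (unsigned d = 0; d < work_dim; d++)
        total *= global_size[d];
    return total;
}

/* Elements handled by each work-item, assuming the elements are split
   evenly and the kernel is written to process that many per work-item. */
size_t elements_per_work_item(size_t num_elements, size_t num_work_items)
{
    assert(num_work_items > 0 && num_elements % num_work_items == 0);
    return num_elements / num_work_items;
}
```

For globalThreads = {8, 8} this gives 64 work-items, and with 64 input bytes that is one element per work-item. How many of those actually run at the same time is up to the device.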

Thanks in Advance

Pavan

code working in CPU and not in GPU