Hi,
When I run this simple multiplier kernel on the CPU, the output is correct. When I run it on the GPU, with the context and command-queue API calls changed accordingly, the output is wrong.
Kernel: multiply an 8x8 matrix by 2, i.e. A*2.
The function actually contains another algorithm that is not working, so I commented it out and am trying this multiply first, to get a simple function working on the GPU.
lines = Len = 8; globalThreads[0] = 8; globalThreads[1] = 8;
status = clEnqueueNDRangeKernel(
    commandQueue,
    kernel, 2, NULL,
    globalThreads,
    NULL, // localThreads
    0,
    NULL,
    &events[0]);
__kernel void myKernel(const float x,
                       const float y,
                       const int lines,
                       const int Len, // width
                       __global unsigned char * output,
                       __global unsigned char * input)
{
    // One work-item per matrix element: tx is the column, ty is the row.
    uint tx = get_global_id(0);
    uint ty = get_global_id(1);
    output[(ty * Len) + tx] = input[(ty * Len) + tx] * 2;
}
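For reference, this is what I expect the 8x8 NDRange to compute, written as a minimal plain-C sketch. It assumes one work-item per element, as in the kernel above, and simply loops over the two global dimensions. The names my_kernel_body and run_reference are made up for illustration and are not part of my real host code.

```c
#include <assert.h>
#include <stddef.h>

enum { LINES = 8, LEN = 8 };  /* same sizes as globalThreads[1] and globalThreads[0] */

/* Hypothetical stand-in for one work-item with global id (tx, ty). */
static void my_kernel_body(size_t tx, size_t ty, size_t len,
                           unsigned char *output, const unsigned char *input)
{
    output[ty * len + tx] = (unsigned char)(input[ty * len + tx] * 2);
}

/* Emulates the 2D NDRange sequentially: LEN * LINES = 64 work-items,
   each one handling exactly one element of the matrix. */
static void run_reference(unsigned char *output, const unsigned char *input)
{
    assert(output != NULL && input != NULL);
    for (size_t ty = 0; ty < LINES; ++ty) {
        for (size_t tx = 0; tx < LEN; ++tx) {
            my_kernel_body(tx, ty, LEN, output, input);
        }
    }
}
```

Only the first 64 bytes of the output buffer are written by this loop; the rest of the buffer is left untouched.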
Details: GPU: ATI RV710. CPU: AMD.
Input:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 .
Output on CPU (the output buffer is 3 times the size of the input). Correct:
0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100 102 104 106 108 110 112 114 116 118 120 122 124 126 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Output on GPU (the output buffer is 3 times the size of the input). Wrong:
6 0 0 0 14 0 0 0 22 0 0 0 30 0 0 0 38 0 0 0 46 0 0 0 54 0 0 0 62 0 0 0 70 0 0 0 78 0 0 0 86 0 0 0 94 0 0 0 102 0 0 0 110 0 0 0 118 0 0 0 126 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.
I am not sure what is happening. I think the maximum work-item sizes and the maximum work-group size are within the device limits, because my problem size is only 8x8.
Also, I am not clear about:
1. Global work-items and their relation to parallelism.
2. Work-items: how many pixel elements will be processed by one work-item, and where do I get this information?
Thanks in advance,
Pavan