Stib

Disturbing numbers

Discussion created by Stib on Nov 10, 2009
Latest reply on Nov 11, 2009 by Stib
Some VERY disturbing numbers...

I ran the same kernel 10 times on both my CPU and my GPU, with 2 different work-group settings each time. The output I get is wrong. Very wrong!

Is my hardware broken, or what else could be the problem?

The kernel:

    __kernel void hello(__global int *out)
    {
        out[0]=get_global_id(0);
        out[1]=get_local_id(0);
        out[2]=get_global_id(1);
        out[3]=get_local_id(1);
        out[4]=get_global_id(2);
        out[5]=get_local_id(2);
        out[6]+=1;
    }

...and my full output, with every detail:

Available platforms: 1
Platform profile: FULL_PROFILE
Platform version: OpenCL 1.0 ATI-Stream-v2.0-beta4
Platform name: ATI Stream
Platform vendor: Advanced Micro Devices, Inc.
Platform extensions:
Number of devices: 2

Device #1: CPU
Vendor ID: 4098
Max compute units: 2
Max work item dimensions: 3
Max work item sizes: ( 1024, 1024, 1024 )
Max work group size: 1024
Max clock requency: 3005 MHz
Address bits: 64
Max mem alloc size: 1024 MB
Image support: FALSE
Max parameter size: 4096 byte
Mem base addr align: 1024 bit
Min data type align size: 128 byte
Denorms supported: TRUE
INF and quiet NaNs supported: TRUE
Round to nearest supported: TRUE
Round to zero supported: FALSE
Round to +/- inf supported: FALSE
IEEE754-2008 FMA supported: FALSE
Global mem cache type: Read and Write
Size of global mem cache line: 64 byte
Size of global mem cache: 64 KB
Global mem size: 3072 MB
Max size of a const buff alloc: 64 KB
Max number of const args: 8
Local memory type: Global
Local memory size: 32 KB
Error correction support: FALSE
Resolution of device timer: 1 nanosec
Litle endian device: TRUE
Device available: TRUE
Compiler available: TRUE
Can execute OpenCL kernels: TRUE
Can execute native kernels: FALSE
Out of order exec enabled: FALSE
Queue profiling enabled: TRUE
Associated platform: ATI Stream
Device name: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz
Vendor: GenuineIntel
Driver version: 1.0
Supported profile: FULL_PROFILE
Supported OpenCL version: OpenCL 1.0 ATI-Stream-v2.0-beta4
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store
Context reference count: 1

Device #2: GPU
Vendor ID: 4098
Max compute units: 10
Max work item dimensions: 3
Max work item sizes: ( 256, 256, 256 )
Max work group size: 256
Max clock requency: 625 MHz
Address bits: 32
Max mem alloc size: 128 MB
Image support: FALSE
Max parameter size: 1024 byte
Mem base addr align: 32768 bit
Min data type align size: 128 byte
Denorms supported: FALSE
INF and quiet NaNs supported: TRUE
Round to nearest supported: TRUE
Round to zero supported: FALSE
Round to +/- inf supported: FALSE
IEEE754-2008 FMA supported: FALSE
Global mem cache type: None
Size of global mem cache line: 0 byte
Size of global mem cache: 0 KB
Global mem size: 128 MB
Max size of a const buff alloc: 64 KB
Max number of const args: 8
Local memory type: Global
Local memory size: 16 KB
Error correction support: FALSE
Resolution of device timer: 1 nanosec
Litle endian device: TRUE
Device available: TRUE
Compiler available: TRUE
Can execute OpenCL kernels: TRUE
Can execute native kernels: FALSE
Out of order exec enabled: FALSE
Queue profiling enabled: TRUE
Associated platform: ATI Stream
Device name: ATI RV770
Vendor: Advanced Micro Devices, Inc.
Driver version: CAL 1.4.467
Supported profile: FULL_PROFILE
Supported OpenCL version: OpenCL 1.0 ATI-Stream-v2.0-beta4
Extensions:
Context reference count: 1

Loading and compiling CL source
Kernel Function name: hello
Number of kernel args: 1
Kernel reference count: 1

#1 -------------------------------------------------------
Device #1 global size: ( 12, 12, 12 ) local size: ( 6, 6, 6 )
  globalID dim1: 11 localID dim1: 5
  globalID dim2: 11 localID dim2: 5
  globalID dim3: 11 localID dim3: 5
  number of iterations (12^3): 1164 ?
  'hello' run time: 2.9981e-005 sec
Device #2 global size: ( 12, 12, 12 ) local size: ( 6, 6, 6 )
  globalID dim1: 9 localID dim1: 3
  globalID dim2: 10 localID dim2: 4
  globalID dim3: 11 localID dim3: 5
  number of iterations (12^3): 8 ?
  'hello' run time: 4.2464e-005 sec
Device #1 global size: ( 6, 6, 6 ) local size: ( 6, 6, 6 )
  globalID dim1: 5 localID dim1: 5
  globalID dim2: 5 localID dim2: 5
  globalID dim3: 5 localID dim3: 5
  number of iterations (6^3): 216 ?
  'hello' run time: 5.111e-006 sec
Device #2 global size: ( 6, 6, 6 ) local size: ( 6, 6, 6 )
  globalID dim1: 3 localID dim1: 3
  globalID dim2: 4 localID dim2: 4
  globalID dim3: 5 localID dim3: 5
  number of iterations (6^3): 1 ?
  'hello' run time: 2.6532e-005 sec

#2 -------------------------------------------------------
Device #1 global size: ( 12, 12, 12 ) local size: ( 6, 6, 6 )
  globalID dim1: 5 localID dim1: 5
  globalID dim2: 11 localID dim2: 5
  globalID dim3: 11 localID dim3: 5
  number of iterations (12^3): 1255 ?
  'hello' run time: 2.7256e-005 sec
Device #2 global size: ( 12, 12, 12 ) local size: ( 6, 6, 6 )
  globalID dim1: 9 localID dim1: 3
  globalID dim2: 10 localID dim2: 4
  globalID dim3: 11 localID dim3: 5
  number of iterations (12^3): 8 ?
  'hello' run time: 4.2752e-005 sec
Device #1 global size: ( 6, 6, 6 ) local size: ( 6, 6, 6 )
  globalID dim1: 5 localID dim1: 5
  globalID dim2: 5 localID dim2: 5
  globalID dim3: 5 localID dim3: 5
  number of iterations (6^3): 216 ?
  'hello' run time: 3.748e-006 sec
Device #2 global size: ( 6, 6, 6 ) local size: ( 6, 6, 6 )
  globalID dim1: 3 localID dim1: 3
  globalID dim2: 4 localID dim2: 4
  globalID dim3: 5 localID dim3: 5
  number of iterations (6^3): 1 ?
  'hello' run time: 2.6604e-005 sec

#3 -------------------------------------------------------
Device #1 global size: ( 12, 12, 12 ) local size: ( 6, 6, 6 )
  globalID dim1: 11 localID dim1: 5
  globalID dim2: 11 localID dim2: 5
  globalID dim3: 11 localID dim3: 5
  number of iterations (12^3): 1091 ?
  'hello' run time: 2.6233e-005 sec
Device #2 global size: ( 12, 12, 12 ) local size: ( 6, 6, 6 )
  globalID dim1: 9 localID dim1: 3
  globalID dim2: 10 localID dim2: 4
  globalID dim3: 11 localID dim3: 5
  number of iterations (12^3): 7 ?
  'hello' run time: 4.2707e-005 sec
Device #1 global size: ( 6, 6, 6 ) local size: ( 6, 6, 6 )
  globalID dim1: 5 localID dim1: 5
  globalID dim2: 5 localID dim2: 5
  globalID dim3: 5 localID dim3: 5
  number of iterations (6^3): 216 ?
  'hello' run time: 5.451e-006 sec
Device #2 global size: ( 6, 6, 6 ) local size: ( 6, 6, 6 )
  globalID dim1: 3 localID dim1: 3
  globalID dim2: 4 localID dim2: 4
  globalID dim3: 5 localID dim3: 5
  number of iterations (6^3): 1 ?
  'hello' run time: 2.6561e-005 sec

#4 -------------------------------------------------------
Device #1 global size: ( 12, 12, 12 ) local size: ( 6, 6, 6 )
  globalID dim1: 11 localID dim1: 5
  globalID dim2: 11 localID dim2: 5
  globalID dim3: 11 localID dim3: 5
  number of iterations (12^3): 1046 ?
  'hello' run time: 2.6575e-005 sec
Device #2 global size: ( 12, 12, 12 ) local size: ( 6, 6, 6 )
  globalID dim1: 9 localID dim1: 3
  globalID dim2: 10 localID dim2: 4
  globalID dim3: 11 localID dim3: 5
  number of iterations (12^3): 8 ?
  'hello' run time: 4.2721e-005 sec
Device #1 global size: ( 6, 6, 6 ) local size: ( 6, 6, 6 )
  globalID dim1: 5 localID dim1: 5
  globalID dim2: 5 localID dim2: 5
  globalID dim3: 5 localID dim3: 5
  number of iterations (6^3): 216 ?
  'hello' run time: 5.452e-006 sec
Device #2 global size: ( 6, 6, 6 ) local size: ( 6, 6, 6 )
  globalID dim1: 3 localID dim1: 3
  globalID dim2: 4 localID dim2: 4
  globalID dim3: 5 localID dim3: 5
  number of iterations (6^3): 1 ?
  'hello' run time: 2.663e-005 sec

#5 -------------------------------------------------------
Device #1 global size: ( 12, 12, 12 ) local size: ( 6, 6, 6 )
  globalID dim1: 11 localID dim1: 5
  globalID dim2: 11 localID dim2: 5
  globalID dim3: 11 localID dim3: 5
  number of iterations (12^3): 1194 ?
  'hello' run time: 2.4871e-005 sec
Device #2 global size: ( 12, 12, 12 ) local size: ( 6, 6, 6 )
  globalID dim1: 9 localID dim1: 3
  globalID dim2: 10 localID dim2: 4
  globalID dim3: 11 localID dim3: 5
  number of iterations (12^3): 8 ?
  'hello' run time: 4.267e-005 sec
Device #1 global size: ( 6, 6, 6 ) local size: ( 6, 6, 6 )
  globalID dim1: 5 localID dim1: 5
  globalID dim2: 5 localID dim2: 5
  globalID dim3: 5 localID dim3: 5
  number of iterations (6^3): 216 ?
  'hello' run time: 5.11e-006 sec
Device #2 global size: ( 6, 6, 6 ) local size: ( 6, 6, 6 )
  globalID dim1: 3 localID dim1: 3
  globalID dim2: 4 localID dim2: 4
  globalID dim3: 5 localID dim3: 5
  number of iterations (6^3): 1 ?
  'hello' run time: 2.6582e-005 sec

#6 -------------------------------------------------------
Device #1 global size: ( 12, 12, 12 ) local size: ( 6, 6, 6 )
  globalID dim1: 5 localID dim1: 5
  globalID dim2: 11 localID dim2: 5
  globalID dim3: 11 localID dim3: 5
  number of iterations (12^3): 1209 ?
  'hello' run time: 2.8278e-005 sec
Device #2 global size: ( 12, 12, 12 ) local size: ( 6, 6, 6 )
  globalID dim1: 9 localID dim1: 3
  globalID dim2: 10 localID dim2: 4
  globalID dim3: 11 localID dim3: 5
  number of iterations (12^3): 8 ?
  'hello' run time: 4.2902e-005 sec
Device #1 global size: ( 6, 6, 6 ) local size: ( 6, 6, 6 )
  globalID dim1: 5 localID dim1: 5
  globalID dim2: 5 localID dim2: 5
  globalID dim3: 5 localID dim3: 5
  number of iterations (6^3): 216 ?
  'hello' run time: 5.11e-006 sec
Device #2 global size: ( 6, 6, 6 ) local size: ( 6, 6, 6 )
  globalID dim1: 3 localID dim1: 3
  globalID dim2: 4 localID dim2: 4
  globalID dim3: 5 localID dim3: 5
  number of iterations (6^3): 1 ?
  'hello' run time: 2.6563e-005 sec

#7 -------------------------------------------------------
Device #1 global size: ( 12, 12, 12 ) local size: ( 6, 6, 6 )
  globalID dim1: 11 localID dim1: 5
  globalID dim2: 11 localID dim2: 5
  globalID dim3: 11 localID dim3: 5
  number of iterations (12^3): 1031 ?
  'hello' run time: 2.5893e-005 sec
Device #2 global size: ( 12, 12, 12 ) local size: ( 6, 6, 6 )
  globalID dim1: 9 localID dim1: 3
  globalID dim2: 10 localID dim2: 4
  globalID dim3: 11 localID dim3: 5
  number of iterations (12^3): 8 ?
  'hello' run time: 4.2632e-005 sec
Device #1 global size: ( 6, 6, 6 ) local size: ( 6, 6, 6 )
  globalID dim1: 5 localID dim1: 5
  globalID dim2: 5 localID dim2: 5
  globalID dim3: 5 localID dim3: 5
  number of iterations (6^3): 216 ?
  'hello' run time: 4.089e-006 sec
Device #2 global size: ( 6, 6, 6 ) local size: ( 6, 6, 6 )
  globalID dim1: 3 localID dim1: 3
  globalID dim2: 4 localID dim2: 4
  globalID dim3: 5 localID dim3: 5
  number of iterations (6^3): 1 ?
  'hello' run time: 2.6476e-005 sec

#8 -------------------------------------------------------
Device #1 global size: ( 12, 12, 12 ) local size: ( 6, 6, 6 )
  globalID dim1: 11 localID dim1: 5
  globalID dim2: 11 localID dim2: 5
  globalID dim3: 11 localID dim3: 5
  number of iterations (12^3): 1073 ?
  'hello' run time: 2.5552e-005 sec
Device #2 global size: ( 12, 12, 12 ) local size: ( 6, 6, 6 )
  globalID dim1: 9 localID dim1: 3
  globalID dim2: 10 localID dim2: 4
  globalID dim3: 11 localID dim3: 5
  number of iterations (12^3): 8 ?
  'hello' run time: 4.2632e-005 sec
Device #1 global size: ( 6, 6, 6 ) local size: ( 6, 6, 6 )
  globalID dim1: 5 localID dim1: 5
  globalID dim2: 5 localID dim2: 5
  globalID dim3: 5 localID dim3: 5
  number of iterations (6^3): 216 ?
  'hello' run time: 6.132e-006 sec
Device #2 global size: ( 6, 6, 6 ) local size: ( 6, 6, 6 )
  globalID dim1: 3 localID dim1: 3
  globalID dim2: 4 localID dim2: 4
  globalID dim3: 5 localID dim3: 5
  number of iterations (6^3): 1 ?
  'hello' run time: 2.6524e-005 sec

#9 -------------------------------------------------------
Device #1 global size: ( 12, 12, 12 ) local size: ( 6, 6, 6 )
  globalID dim1: 5 localID dim1: 5
  globalID dim2: 11 localID dim2: 5
  globalID dim3: 11 localID dim3: 5
  number of iterations (12^3): 1248 ?
  'hello' run time: 2.8959e-005 sec
Device #2 global size: ( 12, 12, 12 ) local size: ( 6, 6, 6 )
  globalID dim1: 9 localID dim1: 3
  globalID dim2: 10 localID dim2: 4
  globalID dim3: 11 localID dim3: 5
  number of iterations (12^3): 8 ?
  'hello' run time: 4.2656e-005 sec
Device #1 global size: ( 6, 6, 6 ) local size: ( 6, 6, 6 )
  globalID dim1: 5 localID dim1: 5
  globalID dim2: 5 localID dim2: 5
  globalID dim3: 5 localID dim3: 5
  number of iterations (6^3): 216 ?
  'hello' run time: 5.451e-006 sec
Device #2 global size: ( 6, 6, 6 ) local size: ( 6, 6, 6 )
  globalID dim1: 3 localID dim1: 3
  globalID dim2: 4 localID dim2: 4
  globalID dim3: 5 localID dim3: 5
  number of iterations (6^3): 1 ?
  'hello' run time: 2.6531e-005 sec

#10 -------------------------------------------------------
Device #1 global size: ( 12, 12, 12 ) local size: ( 6, 6, 6 )
  globalID dim1: 11 localID dim1: 5
  globalID dim2: 11 localID dim2: 5
  globalID dim3: 11 localID dim3: 5
  number of iterations (12^3): 1150 ?
  'hello' run time: 2.5552e-005 sec
Device #2 global size: ( 12, 12, 12 ) local size: ( 6, 6, 6 )
  globalID dim1: 9 localID dim1: 3
  globalID dim2: 10 localID dim2: 4
  globalID dim3: 11 localID dim3: 5
  number of iterations (12^3): 8 ?
  'hello' run time: 4.2712e-005 sec
Device #1 global size: ( 6, 6, 6 ) local size: ( 6, 6, 6 )
  globalID dim1: 5 localID dim1: 5
  globalID dim2: 5 localID dim2: 5
  globalID dim3: 5 localID dim3: 5
  number of iterations (6^3): 216 ?
  'hello' run time: 5.111e-006 sec
Device #2 global size: ( 6, 6, 6 ) local size: ( 6, 6, 6 )
  globalID dim1: 3 localID dim1: 3
  globalID dim2: 4 localID dim2: 4
  globalID dim3: 5 localID dim3: 5
  number of iterations (6^3): 1 ?
  'hello' run time: 2.6555e-005 sec

Done
Passed!
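
For comparison, here is a minimal sketch in plain C (not OpenCL) of the values the kernel would leave behind if every work-item ran strictly one after another and the counter started at zero. The names `hello_result` and `hello_sequential` are hypothetical and not part of the original program. The sketch assumes a cubic NDRange with no global offset, so that the local ID is simply the global ID modulo the local size, and it assumes dimension 0 varies fastest, which OpenCL does not promise.

```c
#include <assert.h>

/* Hypothetical reference structure: the seven values the kernel stores. */
typedef struct {
    int last_global_id[3]; /* out[0], out[2], out[4] */
    int last_local_id[3];  /* out[1], out[3], out[5] */
    int count;             /* out[6] */
} hello_result;

/* Emulates the 'hello' kernel on the host for a cubic NDRange, running
   the work-items sequentially. Assumption: no global offset, so
   local id == global id % local size. */
hello_result hello_sequential(int global_size, int local_size)
{
    hello_result r = { {0, 0, 0}, {0, 0, 0}, 0 };

    /* OpenCL 1.0 requires the global size to be a multiple of the local size. */
    assert(local_size > 0 && global_size % local_size == 0);

    for (int z = 0; z < global_size; ++z) {
        for (int y = 0; y < global_size; ++y) {
            for (int x = 0; x < global_size; ++x) {
                r.last_global_id[0] = x;
                r.last_local_id[0]  = x % local_size;
                r.last_global_id[1] = y;
                r.last_local_id[1]  = y % local_size;
                r.last_global_id[2] = z;
                r.last_local_id[2]  = z % local_size;
                r.count += 1; /* corresponds to out[6]+=1 */
            }
        }
    }
    return r;
}
```

Calling `hello_sequential(12, 6)` yields a count of 1728 (12^3) with final global IDs of 11 and local IDs of 5, and `hello_sequential(6, 6)` yields 216. On a real device the work-items of the posted kernel all write the same seven elements concurrently, so this sequential baseline is only a point of comparison and not a prediction of what the device reports.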
