Results differ between GPU and CPU devices depending on data.

Question asked by inferrna on Oct 17, 2013
Latest reply on Nov 15, 2013 by inferrna

I wrote a simple propagation example implemented both for the CPU and for OpenCL. The OpenCL results differ from the CPU results depending on the data structure and on the GPU used. The first device, Devastator (7560D), gives the right result only when all layers have the same size. The second device, Pitcairn (HD 7870), gives a wrong result when all layers have a size equal to WORK_GROUP_SIZE (256 for both). The wrong layer is always the third one (the second of the mutable layers). My question is: is something wrong with my code or with the fglrx driver, or is this task impossible in OpenCL, so that I must make a separate call for each propagation?
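
For reference, here is a minimal sketch of what the CPU side of such a comparison might look like. It is not the code from the attachment: the function names propagate_cpu and count_mismatches are hypothetical, and the data layout (all layers stored back to back in values, row-major weight matrices stored back to back in conns, layer sizes in sz) is an assumption inferred from the indexing in the kernel below.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical CPU reference, not the attached test_prop.c.
 * Assumed layout (inferred from the kernel's indexing):
 *   values: all layers back to back, layer 0 first
 *   conns : per-layer weight matrices back to back, row-major;
 *           w[nr*szp + pnr] links neuron pnr of the previous layer
 *           to neuron nr of the current layer
 *   sz    : number of neurons in each layer
 */
static void propagate_cpu(float *values, const float *conns,
                          const unsigned int *sz, unsigned int nlayers)
{
    assert(nlayers >= 1);
    float *pvalues = values;   /* previous layer */
    const float *w = conns;    /* weights feeding the current layer */

    for (unsigned int k = 1; k < nlayers; k++) {
        unsigned int szp = sz[k - 1];
        float *cvalues = pvalues + szp;   /* current layer follows the previous one */

        for (unsigned int nr = 0; nr < sz[k]; nr++) {
            float sum = 0.0f;
            for (unsigned int pnr = 0; pnr < szp; pnr++)
                sum += w[nr * szp + pnr] * pvalues[pnr];
            cvalues[nr] = sum;
        }

        w += sz[k] * szp;      /* advance to the next layer's weights */
        pvalues = cvalues;
    }
}

/* Counts entries whose absolute difference exceeds tol,
 * e.g. to compare a CPU result buffer against one read back from the GPU. */
static unsigned int count_mismatches(const float *a, const float *b,
                                     unsigned int n, float tol)
{
    unsigned int bad = 0;
    for (unsigned int i = 0; i < n; i++)
        if (fabsf(a[i] - b[i]) > tol)
            bad++;
    return bad;
}
```

A comparison with a small tolerance rather than exact equality is used here because a parallel reduction on the GPU adds the terms in a different order than the sequential loop, so small rounding differences are expected even when both versions are correct.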


The attached test program shows the difference between the CPU and GPU results.

Compile options: gcc -O0  -std=c99 test_prop.c test_prop_cl.c -o test_prop -lm -lOpenCL -ggdb -L/usr/lib/


This is the first variant of the kernel; it gives the same results as the version in the attachment.

__kernel void test_prop( __global float* values, __global __read_only float* conns, __global unsigned int* sz)
{
    __global float *pvalues = values;
    __global float *cvalues = values;
    __local float sums[256];
    float sum;
    unsigned int i, nr, pnr, cn, clc, k, offc=0, lcn, szp;
    for(k=1; k<3; k++){
        szp = sz[k-1];

        clc = sz[k]*szp;
        cn  = select((uint)0, (uint)get_global_id(0), get_global_id(0)<clc);
        lcn = get_local_id(0);

            nr  = cn / szp; //Current neuron
            pnr = cn % szp; //Prev layer's connected neuron.
            sums[lcn] = conns[cn+offc]*pvalues[pnr];

            for(i=2; i<=szp; i<<=1){
                sum = select((float) 0.0, sums[lcn + i/2], (uint)(clc % i == 0));
                sums[lcn] += sum;
            }

            cvalues[nr] = sums[lcn];

        sums[lcn] = 0;

        offc += clc;
        pvalues = cvalues;