notyou

Wrong result past a certain data size unless I printf in the kernel???

Discussion created by notyou on Nov 26, 2011
Latest reply on Dec 1, 2011 by notyou

I realize this sounds incredibly weird, but once I go past a certain input size (N = 2561 in this particular example), the results I read back are never correct unless I print the value from inside the kernel while it executes.

In the attached code (a binomial lattice, if it helps), if I run with i = 2561 (the number of steps), the value read back from the buffer is 0.15..., which is exactly what I expect. If I step up to 2562, I get a very large number, in the thousands, which slowly increases. What I don't understand is that if I then enable the printf in the kernel, the same buffer (with absolutely no changes to my main.cpp file) holds the correct value, 0.15..., although execution takes an incredibly long time because of the printf. If it helps, I am also calling clFinish() after each iteration to force it to update (for testing).

Does anyone have any idea what could be going on? PS. I can provide the main.cpp file as well if needed. Thanks.

-Matt

#pragma OPENCL EXTENSION cl_amd_printf : enable

__kernel void InitializeValues(__global float *optionValues, float u, int N)
{
    int globalID = get_global_id(0) + 1;
    float myYValue, myOptionValue;

    if ( globalID <= N )
    {
        myYValue = pow(u, globalID);
        myOptionValue = fmax(myYValue - 1.0f, 0.0f);
        optionValues[globalID] = myOptionValue;
    }

    //if ( globalID == N )
    //    printf("ID: %i\toptVal: %f\n", globalID, optionValues[globalID]);
}

__kernel void LookbackOpenCL(__local float *sharedOptionValues,
                             __global float *optionValues,
                             __global float *tempOptionValues,
                             int N, float u, float d, float pu, float pd,
                             float disc, int i, __global float* result)
{
    int globalID = get_global_id(0);
    int localID = get_local_id(0);
    float myYValue, tempOptionValue;

    if ( globalID < i )
    {
        sharedOptionValues[localID] = optionValues[globalID];
        if ( localID == 255 || globalID == i - 1 )
            sharedOptionValues[localID + 1] = optionValues[globalID + 1];
        myYValue = pow(u, globalID);
    }

    // EDIT:
    barrier(CLK_LOCAL_MEM_FENCE);

    if ( globalID < i )
    {
        if ( globalID == 0 )
            tempOptionValue = fmax(myYValue - 1.0f, ((pu * sharedOptionValues[1] * d) + (pd * sharedOptionValues[0] * u)) * disc);
        else
            tempOptionValue = fmax(myYValue - 1.0f, ((pu * sharedOptionValues[localID + 1] * d) + (pd * optionValues[globalID - 1] * u)) * disc);
        tempOptionValues[globalID] = tempOptionValue;
    }

    if ( globalID == 0 )
        *result = optionValues[0];
    //error here - for some reason, when I don't print the value it comes out incorrectly
    //if I do print, the value is as expected.

    // just a random thread
    //if ( globalID == 0 )
    //    printf("OptionValues[0]: %f\n", optionValues[0]);
}
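
In case it helps anyone reproduce this, here is a rough CPU sketch in C++ of what the two kernels above compute, which could be used to check the values read back from the device at a given step. It is only a sketch under assumptions: the function names initializeValuesCpu and lookbackStepCpu are made up for illustration (they are not in my main.cpp), optionValues[0] is assumed to start at 0 because InitializeValues never writes it, and only one launch of LookbackOpenCL is mirrored, not the host loop around it.

```cpp
#include <cmath>
#include <vector>

// Hypothetical CPU mirror of InitializeValues: fills indices 1..N.
// Index 0 is never written by the kernel; it is ASSUMED to be 0 here.
std::vector<float> initializeValuesCpu(int N, float u)
{
    std::vector<float> optionValues(N + 1, 0.0f);
    for (int j = 1; j <= N; ++j)
    {
        float myYValue = std::pow(u, static_cast<float>(j));
        optionValues[j] = std::fmax(myYValue - 1.0f, 0.0f);
    }
    return optionValues;
}

// Hypothetical CPU mirror of a single LookbackOpenCL launch for step count i.
// Returns only the i entries the kernel would write to tempOptionValues.
// Requires optionValues.size() >= i + 1, because entry j reads index j + 1.
std::vector<float> lookbackStepCpu(const std::vector<float> &optionValues,
                                   int i, float u, float d,
                                   float pu, float pd, float disc)
{
    std::vector<float> tempOptionValues(i, 0.0f);
    for (int j = 0; j < i; ++j)
    {
        float myYValue = std::pow(u, static_cast<float>(j));
        float up = optionValues[j + 1];
        // Work-item 0 reuses its own entry as the "down" neighbour,
        // exactly as the kernel does with sharedOptionValues[0].
        float down = (j == 0) ? optionValues[0] : optionValues[j - 1];
        tempOptionValues[j] = std::fmax(myYValue - 1.0f,
                                        ((pu * up * d) + (pd * down * u)) * disc);
    }
    return tempOptionValues;
}
```

Comparing the output of lookbackStepCpu against tempOptionValues read back after one kernel launch at i = 2561 and at i = 2562 should show whether the first wrong entry appears at a work-group boundary or somewhere else.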
