OpenCL

Raistmer
Adept II

Catalyst 11.11 is broken too

Since Linux came up: our Linux porting team sees the same problems with Cat 11.10 and 11.11 that I see on Windows. The app produces invalid results under these two drivers; earlier versions work OK. Maybe this will give AMD engineers more hints about what they broke in the latest Catalysts. So it's not a Windows-specific issue.
Meteorhead
Challenger

I also thought of reverting to an older Catalyst, but on Ubuntu the older drivers regularly crashed the machine when I tried to run multi-GPU applications. 11.11 is the first that is stable in that sense, but unfortunately it fails to boot a significant fraction of the time.

FrodoTheGiant
Journeyman III

Originally posted by: Meteorhead

Since the topic has been idle for a while now (don't know if my previous post was really that effective, or people just simply lost interest in dissing QoS)...


I totally agree with you, but I held off posting so that your comment would stay the last one in this thread. The most visible.

So I was hoping someone from AMD might read it - and respond.

But ... nothing. I am done with AMD.

himanshu_gautam
Grandmaster

Hi Everybody,

It is really sad that fixing these reported issues is taking so much time. I will try to find out their status and post an update here. I would like to assure you that these issues are being looked into, according to their priority.

To summarize the issues, so that I am not missing anything:

Raistmer reports a correctness issue in the correlation function in SETI@home. This happens on both Windows and Linux.

Meteorhead is trying to run multi-GPU (2 cards) configurations, which seem to be working. But the driver causes boot problems on both Ubuntu and SLC5.

FrodoTheGiant
Journeyman III

@himanshu.gautam: Please also add the "100% CPU load bug" to your list.

This has been broken for the last 5 Catalyst releases, including the newest one.

On (Windows?) systems with more than one graphics card installed, you always get a CPU load of 100% when running OpenCL applications. It should be ~2-3% (and was in older Catalyst versions).

freighter
Journeyman III

Originally posted by: himanshu.gautam Hi Everybody,...

To summarize the issues, so that I am not missing anything:...



And not to forget the SETI@home issue with ATI GPUs that have a max. workgroup size of 128 (HD43xx - HD46xx): they crash on Windows (tested with drivers Cat 10.12 through 11.11) but work OK on Linux (driver <= Cat 11.9). This is a different problem from the incorrectness issue.

gat3way
Journeyman III

Are you sure it works OK on linux? When using local memory, workgroup size is forced to 64 on 4xxx. It might work in some cases though, but in most cases you need to write a separate 4xxx kernel, otherwise it produces wrong results. I did not know it crashes on windows though, that sounds bad...

Raistmer
Adept II

Originally posted by: gat3way

Are you sure it works OK on linux? When using local memory, workgroup size is forced to 64 on 4xxx. It might work in some cases though, but in most cases you need to write a separate 4xxx kernel, otherwise it produces wrong results. I did not know it crashes on windows though, that sounds bad...



Actually, HD4xxx exposes no local memory at all. It's emulated via global memory, so using it for performance reasons makes no sense.
The only place in the program where it could be used, even on HD4xxx, is Apple's oclFFT library.
Raistmer
Adept II

About those crashes: so far (~10 identical runs completed) there are 2 locations in the program's computational loop where the crash occurs.
But the number of loop iterations before the crash varies. Also, in 2 cases incorrect data was detected (so the memory buffer was damaged before the crash).
If AMD wants an executable for testing, I can provide it.
This is a separate problem (a different modification of the application) from the one under discussion in this topic.
Raistmer
Adept II

Hope this speeds up AMD's fix:

I narrowed the error down to a single kernel invocation:

Results of kernel call under Cat 11.2:

INFO: Setting kernel argument:FindAutoCorrelation_reduce1_kernel_cl ok
INFO: Enqueuing FindAutoCorrelation_reduce1_kernel_cl ok
INFO: clReadBuffer(gpu_AutoCorrelationResults,PowerBin) ok
fully reduced:
i=0, power=7.331,bin=9861
i=1, power=8.084,bin=1.983e+004
i=2, power=8.756,bin=4.039e+004
i=3, power=8.399,bin=1.305e+004
i=4, power=8.169,bin=3.056e+004
i=5, power=8.838,bin=1.377e+004
i=6, power=7.687,bin=6.322e+004
i=7, power=6.46,bin=4.259e+004

Under Cat 11.11:

INFO: Setting kernel argument:FindAutoCorrelation_reduce1_kernel_cl ok
INFO: Enqueuing FindAutoCorrelation_reduce1_kernel_cl ok
INFO: clReadBuffer(gpu_AutoCorrelationResults,PowerBin) ok
fully reduced:
i=0, power=150,bin=2.456e+004
i=1, power=135.1,bin=3.156e+004
i=2, power=139.5,bin=3.984e+004
i=3, power=141.9,bin=2.395e+004
i=4, power=139.9,bin=3.459e+004
i=5, power=141.9,bin=4.022e+004
i=6, power=140.4,bin=3.872e+004
i=7, power=138.1,bin=2.452e+004

With power values this large, the program ends up reporting far too many false positives.

The code:

{
    int i=0;
    size_t localThreads[1]={1};
    size_t globalThreads[1]={8};//R: hardwired number of FFTs for now
    err = clSetKernelArg(FindAutoCorrelation_reduce1_kernel_cl,i++,sizeof(cl_mem),(void *)&FFTbuf);
    err |= clSetKernelArg(FindAutoCorrelation_reduce1_kernel_cl,i++,sizeof(cl_mem),(void *)&gpu_AutoCorrelationResults);
    if(err) fprintf(stderr,"ERROR: Setting kernel argument:FindAutoCorrelation_reduce1_kernel_cl: %d\n",err);
#if OCL_VERBOSE
    fprintf(stderr,"INFO: Setting kernel argument:FindAutoCorrelation_reduce1_kernel_cl ok\n");
#endif
    err = clEnqueueNDRangeKernel(cq,FindAutoCorrelation_reduce1_kernel_cl, 1, NULL,globalThreads, localThreads, 0,NULL,NULL);
    if(err) fprintf(stderr,"ERROR: Enqueueing FindAutoCorrelation_reduce1_kernel_cl: %d\n",err);
#if OCL_VERBOSE
    fprintf(stderr,"INFO: Enqueuing FindAutoCorrelation_reduce1_kernel_cl ok\n");
#endif
}
#endif
#if OCL_WDM
err=clFlush(cq);
if(err)fprintf(stderr,"ERROR: submitting kernels autocorr search: %d\n",err);
#if __Win32
Sleep(OCL_WDM_SLEEP);
#elif _GNU_SOURCE
usleep(OCL_WDM_SLEEP);
#endif
#endif
err=clEnqueueReadBuffer(cq,gpu_AutoCorrelationResults,CL_TRUE,0,sizeof(cl_float2)*8,PowerBin,0, NULL,NULL);
if(err) fprintf(stderr,"ERROR: clReadBuffer(gpu_AutoCorrelationResults,PowerBin): %d\n",err);
#if OCL_VERBOSE
fprintf(stderr,"INFO: clReadBuffer(gpu_AutoCorrelationResults,PowerBin) ok\n");
#endif
#if 1 //dump
fprintf(stderr,"fully reduced:\n");
for(int i=0;i<8;i++)fprintf(stderr,"i=%d, power=%.4g,bin=%.4g\n",i,PowerBin[2*i],PowerBin[2*i+1]);
exit(0);
#endif
}

The kernel:

__kernel void FindAutoCorrelation_reduce1_kernel_cl(__global float* PartialSumMaxIdx, __global float2* result){
    const int sid=256*get_global_id(0);//only 8 workitems here
    float full_max=0.0f;
    float idx=0.0f;
    float Sum=0.0f;
    for(int i=0;i<256;i++){
        Sum+=PartialSumMaxIdx[3*(sid+i)];
        float tmp=PartialSumMaxIdx[(sid+i)*3+1];
        if(tmp>full_max){
            full_max=tmp;
            idx=PartialSumMaxIdx[(sid+i)*3+2];
        }
    }
    result[get_global_id(0)].x=full_max*(128.0f*1024.0f)/Sum;//peak value
    result[get_global_id(0)].y=idx;//bin
}
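
For anyone who wants to check a driver without the full application, one option is to run the same reduction on the CPU and compare it with what the GPU returns. The following is only a minimal sketch of that idea, not part of the SETI@home code: the names reduce1_reference, count_mismatches and REDUCE_CHUNK are made up for illustration, and the relative tolerance is an assumption, since GPU division and host division may round slightly differently. It assumes the input buffer has the same layout the kernel reads, i.e. (sum, max, idx) triples, 256 per work-item.

```c
#include <assert.h>
#include <stddef.h>

/* Number of (sum, max, idx) triples each work-item reduces; matches the
   hard-coded 256 in the kernel above. The macro name is hypothetical. */
#define REDUCE_CHUNK 256

/* Host-side mirror of FindAutoCorrelation_reduce1_kernel_cl (hypothetical
   helper). partial holds n_items * REDUCE_CHUNK triples; result receives
   n_items (power, bin) pairs, laid out like PowerBin in the host code. */
static void reduce1_reference(const float *partial, float *result,
                              size_t n_items)
{
    assert(partial != NULL && result != NULL);
    for (size_t g = 0; g < n_items; g++) {
        const size_t sid = REDUCE_CHUNK * g;
        float full_max = 0.0f, idx = 0.0f, sum = 0.0f;
        for (size_t i = 0; i < REDUCE_CHUNK; i++) {
            const float *t = &partial[3 * (sid + i)];
            sum += t[0];                 /* accumulate partial sums */
            if (t[1] > full_max) {       /* track the largest peak  */
                full_max = t[1];
                idx = t[2];              /* and the bin it came from */
            }
        }
        /* Same normalisation as the kernel; like the kernel, this does
           not guard against sum == 0. */
        result[2 * g]     = full_max * (128.0f * 1024.0f) / sum;
        result[2 * g + 1] = idx;
    }
}

/* Counts (power, bin) pairs where the GPU result disagrees with the
   reference: power outside the relative tolerance, or a different bin.
   rel_tol is an assumed tolerance, to be tuned by the caller. */
static int count_mismatches(const float *gpu, const float *ref,
                            size_t n_items, float rel_tol)
{
    int bad = 0;
    for (size_t g = 0; g < n_items; g++) {
        float diff = gpu[2 * g] - ref[2 * g];
        float mag  = ref[2 * g];
        if (diff < 0.0f) diff = -diff;
        if (mag  < 0.0f) mag  = -mag;
        if (diff > rel_tol * mag || gpu[2 * g + 1] != ref[2 * g + 1])
            bad++;
    }
    return bad;
}
```

To use it, read FFTbuf back to the host just before the kernel is enqueued, call reduce1_reference on that copy with n_items = 8, and pass PowerBin together with the reference output to count_mismatches. A non-zero count under one driver and zero under another would point at the kernel or its compilation rather than at the input data.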
