cancel
Showing results for 
Search instead for 
Did you mean: 

OpenCL

Raistmer
Adept II

Catalyst 11.11 is broken too

So, what is the status of this bug report ?
Now, when Cat11.12 ready for preview, what you get with this new driver or any newer internal builds? Will it fix bug or again too old to contain any fixes? ...
0 Likes
Reply
himanshu_gautam
Grandmaster

Catalyst 11.11 is broken too

Hi Raistmer,

Thanks for sending the code. I was trying to see what type of outputs the code generate on my system. The Kernel posted appears to be running fine with the internal driver i have installed on my machine for some set of values.

I wil try to install 11.11b performance driver and try that too. It would be helpful if you can give what kind of values the FFTBuf buffer contain.

 

0 Likes
Reply
Raistmer
Adept II

Catalyst 11.11 is broken too

Originally posted by: himanshu.gautam

Hi Raistmer,


Thanks for sending the code. I was trying to see what type of outputs the code generate on my system. The Kernel posted appears to be running fine with the internal driver i have installed on my machine for some set of values.


I wil try to install 11.11b performance driver and try that too. It would be helpful if you can give what kind of values the FFTBuf buffer contain.


 



Hi Himanshu,
Here I attached full logs for Cat 11.2 and Cat 11.11 (as one can see, they are identical before very last call, different part I cited in prev post) along with debug build of app itself and test workunit .
I will post whole code that generates dumps in log. I think you don't need all other kernels cause they work OK and data they generate listed in logs.

EDIT:
and here goes the code.
So you can easely compare log outputs with places in code that generated corresponding outputs.

void PC_FindAutoCorrelation_cl(float* PowerBin){ // Because positive and negative delays are the same // we only have to look at half the points //Jason: Use 4N-FFT method for Type 2 Discrete Cosine Tranform for now, to match fftw's REDFT10 // 1 Autocorrelation from global powerspectrum at fft_num*ac_fft_len (fft_num*ul_NumDataPoints ) //R: all transforms on whole 2D matrix instead of single power specrum. #if 1 //for dump const int ac_fftlen=128*1024; float* b=new float[ac_fftlen*4*2]; if(b==NULL)fprintf(stderr,"Can't allocate buffer\n"); #endif //Step 1: Preprocessing - repack relevant powerspectrum into a 4N array with 'real-even symmetry' { size_t localThreads[2]={64,1};//R: TODO tune size_t globalThreads[2]={128*1024,8};//R: hardwired FFT len and number of FFTs for now #if 1 //dump err=clEnqueueReadBuffer(cq,gpu_PowerSpectrum,CL_TRUE,sizeof(float)*ac_fftlen*1,sizeof(float)*ac_fftlen,b,0, NULL,NULL); if(err)fprintf(stderr,"ERROR: ReadBuffer:%d\n",err); fprintf(stderr,"power spectrum:\n"); for(int i=0;i<ac_fftlen;i++) fprintf(stderr,"i=%d,x=%g\n",i,b); #endif int i=0; err = clSetKernelArg(RepackInput_kernel_cl,i++,sizeof(cl_mem),(void *)&gpu_PowerSpectrum); err |= clSetKernelArg(RepackInput_kernel_cl,i++,sizeof(cl_mem),(void *)&gpu_WorkData); //int size=128*1024; //err |=clSetKernelArg(RepackInput_kernel_cl,i++,sizeof(cl_int),(void *)&size); if(err) fprintf(stderr,"ERROR: Setting kernel argument:RepackInput_kernel_cl: %d\n",err); #if OCL_VERBOSE fprintf(stderr,"INFO: Setting kernel argument:RepackInput_kernel_cl ok\n"); #endif err = clEnqueueNDRangeKernel(cq,RepackInput_kernel_cl, 2, NULL,globalThreads, localThreads, 0,NULL,NULL); if(err) fprintf(stderr,"ERROR: RepackInput_kernel_cl: %d\n",err); #if OCL_VERBOSE fprintf(stderr,"INFO: RepackInput_kernel_cl ok\n"); #endif } #if 1 //dump err=clEnqueueReadBuffer(cq,gpu_WorkData,CL_TRUE,1*4*2*sizeof(float)*ac_fftlen,4*2*sizeof(float)*ac_fftlen,b,0, NULL,NULL); if(err)fprintf(stderr,"ERROR: ReadBuffer:%d\n",err); fprintf(stderr,"after inflation:\n"); for(int i=0;i<4*ac_fftlen;i++) fprintf(stderr,"i=%d,x=%g, y=%g\n",i,b[2*i],b[2*i+1]); #endif //Step 2: Process the 4N-FFT (Complex to Complex, size is 4 * ac_fft_len) err |= clFFT_ExecuteInterleaved_mb(cq, autocorr_plan,8, clFFT_Forward, gpu_WorkData,gpu_WorkData, FFTbuf, 0, NULL, NULL); if(err) fprintf(stderr,"ERROR: autocorr fft: %d\n",err); #if OCL_VERBOSE fprintf(stderr,"INFO: autocorr fft ok\n"); #endif #if 1 //dump err=clEnqueueReadBuffer(cq,gpu_WorkData,CL_TRUE,1*4*2*sizeof(float)*ac_fftlen,4*2*sizeof(float)*ac_fftlen,b,0, NULL,NULL); if(err)fprintf(stderr,"ERROR: ReadBuffer:%d\n",err); fprintf(stderr,"after fft:\n"); for(int i=0;i<4*ac_fftlen;i++) fprintf(stderr,"i=%d,x=%g, y=%g\n",i,b[2*i],b[2*i+1]); #endif #if 0 //R: this step removed now //Step 3: Postprocess the FFT result (Scale, take powers & normalize), discarding unused data packing into AutoCorr_in first half for VRAM reuse { size_t localThreads[2]={64,1};//R: TODO tune size_t globalThreads[2]={32*1024,8};//R: hardwired FFT len and number of FFTs for now int i=0; err = clSetKernelArg(RepackOutput_kernel_cl,i++,sizeof(cl_mem),(void *)&gpu_WorkData); err |= clSetKernelArg(RepackOutput_kernel_cl,i++,sizeof(cl_mem),(void *)&FFTbuf); //int size=64*1024; //err |=clSetKernelArg(RepackOutput_kernel_cl,i++,sizeof(cl_int),(void *)&size); if(err) fprintf(stderr,"ERROR: Setting kernel argument:RepackOutput_kernel_cl: %d\n",err); err = clEnqueueNDRangeKernel(cq,RepackOutput_kernel_cl, 2, NULL,globalThreads, localThreads, 0,NULL,NULL); if(err) fprintf(stderr,"ERROR: RepackOutput_kernel_cl: %d\n",err); } #endif #if 0 err=clEnqueueReadBuffer(cq,FFTbuf,CL_TRUE,1*64*1024*sizeof(float),sizeof(float)*ac_fftlen/2,b,0, NULL,NULL); if(err)fprintf(stderr,"ERROR: ReadBuffer:%d\n",err); fprintf(stderr,"after packing:\n"); for(int j=0;j<ac_fftlen/2;j++) fprintf(stderr,"AutoCorrelation[%d]=%.4g\n",j,b); //fprintf(stderr,"i=%d, x=%g, y=%g\n",i,b[2*i],b[2*i+1]); exit(0); #endif #if 0 //USE_OPENCL_NV || USE_OPENCL_HD5xxx {size_t localThreads[2]={256,1}; size_t globalThreads[2]={256,8};//R: hardwired number of FFTs for now int i=0; err = clSetKernelArg(FindAutoCorrelation_kernel1_cl,i++,sizeof(cl_mem),(void *)&gpu_WorkData); err |= clSetKernelArg(FindAutoCorrelation_kernel1_cl,i++,sizeof(cl_mem),(void *)&gpu_AutoCorrelationResults); if(err) fprintf(stderr,"ERROR: Setting kernel argument:FindAutoCorrelation_kernel1_cl: %d\n",err); err = clEnqueueNDRangeKernel(cq,FindAutoCorrelation_kernel1_cl, 2, NULL,globalThreads, localThreads, 0,NULL,NULL); if(err) fprintf(stderr,"ERROR: FindAutoCorrelation_reduce0_kernel_cl: %d\n",err); } #else // const int len = 64*1024;//autocorr_fft_len/2; // const int blksize = UNSTDMAX(4, UNSTDMIN(pow2a((unsigned int) sqrt((float) (len / 32)) * 32), 512)); {size_t localThreads[2]={64,1}; size_t globalThreads[2]={256,8};//R: hardwired number of FFTs for now int i=0; err = clSetKernelArg(FindAutoCorrelation_reduce0_kernel_cl,i++,sizeof(cl_mem),(void *)&gpu_WorkData); err |= clSetKernelArg(FindAutoCorrelation_reduce0_kernel_cl,i++,sizeof(cl_mem),(void *)&FFTbuf); if(err) fprintf(stderr,"ERROR: Setting kernel argument:FindAutoCorrelation_reduce0_kernel_cl: %d\n",err); #if OCL_VERBOSE fprintf(stderr,"INFO: Setting kernel argument:FindAutoCorrelation_reduce0_kernel_cl ok\n"); #endif err = clEnqueueNDRangeKernel(cq,FindAutoCorrelation_reduce0_kernel_cl, 2, NULL,globalThreads, localThreads, 0,NULL,NULL); if(err) fprintf(stderr,"ERROR: FindAutoCorrelation_reduce0_kernel_cl: %d\n",err); #if OCL_VERBOSE fprintf(stderr,"INFO: FindAutoCorrelation_reduce0_kernel_cl ok\n"); #endif } #if 1 //dump err=clEnqueueReadBuffer(cq,gpu_WorkData,CL_TRUE,0*256*sizeof(float),sizeof(float)*256*8*3,b,0, NULL,NULL); if(err)fprintf(stderr,"ERROR: ReadBuffer:%d\n",err); fprintf(stderr,"partially reduced:\n"); for(int j=0;j<256*8;j++) fprintf(stderr,"j=%d,partial_sum=%.4g,partial_max=%.4g, idx=%.4g\n",j,b[3*j],b[3*j+1],b[3*j+2]); //fprintf(stderr,"i=%d, x=%g, y=%g\n",i,b[2*i],b[2*i+1]); //exit(0); #endif { int i=0; size_t localThreads[1]={1}; size_t globalThreads[1]={8};//R: hardwired number of FFTs for now err = clSetKernelArg(FindAutoCorrelation_reduce1_kernel_cl,i++,sizeof(cl_mem),(void *)&FFTbuf); err |= clSetKernelArg(FindAutoCorrelation_reduce1_kernel_cl,i++,sizeof(cl_mem),(void *)&gpu_AutoCorrelationResults); if(err) fprintf(stderr,"ERROR: Setting kernel argument:FindAutoCorrelation_reduce1_kernel_cl: %d\n",err); #if OCL_VERBOSE fprintf(stderr,"INFO: Setting kernel argument:FindAutoCorrelation_reduce1_kernel_cl ok\n"); #endif err = clEnqueueNDRangeKernel(cq,FindAutoCorrelation_reduce1_kernel_cl, 1, NULL,globalThreads, localThreads, 0,NULL,NULL); if(err) fprintf(stderr,"ERROR: Enqueueing FindAutoCorrelation_reduce1_kernel_cl: %d\n",err); #if OCL_VERBOSE fprintf(stderr,"INFO: Enqueuing FindAutoCorrelation_reduce1_kernel_cl ok\n"); #endif } #endif #if OCL_WDM err=clFlush(cq); if(err)fprintf(stderr,"ERROR: submitting kernels autocorr search: %d\n",err); #if __Win32 Sleep(OCL_WDM_SLEEP); #elif _GNU_SOURCE usleep(OCL_WDM_SLEEP); #endif #endif err=clEnqueueReadBuffer(cq,gpu_AutoCorrelationResults,CL_TRUE,0,sizeof(cl_float2)*8,PowerBin,0, NULL,NULL); if(err) fprintf(stderr,"ERROR: clReadBuffer(gpu_AutoCorrelationResults,PowerBin): %d\n",err); #if OCL_VERBOSE fprintf(stderr,"INFO: clReadBuffer(gpu_AutoCorrelationResults,PowerBin) ok\n"); #endif #if 1 //dump fprintf(stderr,"fully reduced:\n"); for(int i=0;i<8;i++)fprintf(stderr,"i=%d, power=%.4g,bin=%.4g\n",i,PowerBin[2*i],PowerBin[2*i+1]); exit(0); #endif }

0 Likes
Reply
himanshu_gautam
Grandmaster

Catalyst 11.11 is broken too

Hi raistmer,

I am not able to access the logs you provided link for, as the link is non-english.

Can you please provide a english link.

0 Likes
Reply
Raistmer
Adept II

Catalyst 11.11 is broken too

Originally posted by: himanshu.gautam

Hi raistmer,




I am not able to access the logs you provided link for, as the link is non-english.




Can you please provide a english link.



There was "english" link on site before, perhaps hoster changed site design... OK, if Google's translate didn't help I will attach file to my bug report ticket. And find another hoster in the future.
0 Likes
Reply
Raistmer
Adept II

Catalyst 11.11 is broken too

0 Likes
Reply
himanshu_gautam
Grandmaster

Catalyst 11.11 is broken too

Got them 🙂

0 Likes
Reply
freighter
Journeyman III

Catalyst 11.11 is broken too

Originally posted by: freighter
Originally posted by: himanshu.gautam Hi Everybody,...

 

To summarize the issues, so that I am not missing anything:...



 

And not to forget the seti@home-issue with ATI GPUs with max. workgroup size of 128 (HD43xx - HD46xx), which will crash on windows (tested drivers Cat 10.12 till 11.11) but do work ok on linux (driver <= Cat 11.9). This is a different problem than the incorrectness issue.

 

Just for update : Tested again today with Cat 11.12 : still see crashing host with a total freeze (openSuse 11.3 64bit) or a bluescreen (windows XPsp3), so no change of that issue yet.

0 Likes
Reply
nou
Exemplar

Catalyst 11.11 is broken too

windows XP is not supported with SDK 2.6

0 Likes
Reply
freighter
Journeyman III

Catalyst 11.11 is broken too

Originally posted by: nou windows XP is not supported with SDK 2.6


The OpenCL runtime components are still included in driver Cat 11.12 for winXPpro 32bit. So, i guess, running OpenCL apps on XP IS still supported, isn't it ?

0 Likes
Reply