Raistmer,
I'm curious what project (some BOINC project perhaps?) your participants are generating incorrect results for. You don't give a googleable name, and you don't give a URL.
If you want AMD to help out, you need to be a bit more understanding that they are dealing with hundreds (thousands?) of different users/developers. To expect them to just remember you and your unnamed project is a bit much.
Without details on what is broken, and without any hope of AMD reproducing the problem, how can they tell whether your problem is real or simply PEBKAC? (Google it...)
I am also frustrated with the slow pace of support from AMD for OpenCL issues. But being rude in the forums won't help your case.
-- Tim
PEBKAC: Poorly Educated Bored Know-all AMD Coder
... and AMD seems to have lots of those in their driver department.
Originally posted by: timattox
Without details on what is broken, and without any hope of AMD reproducing the problem, how can they tell whether your problem is real or simply PEBKAC? (Google it...)
-- Tim
Originally posted by: Raistmer 3) Well, I can say what is broken - AutoCorrelation search is broken and produces overflow in results (too many reported false positives). But I hardly think this app-specific info can help in driver debugging.
Raistmer,
Would it be possible to send the AutoCorrelation search code, so that we can look into the issue?
Originally posted by: genaganna Originally posted by: Raistmer 3) Well, I can say what is broken - AutoCorrelation search is broken and produces overflow in results (too many reported false positives). But I hardly think this app-specific info can help in driver debugging.
Raistmer,
Would it be possible to send the AutoCorrelation search code, so that we can look into the issue?
void PC_FindAutoCorrelation_cl(float* PowerBin){
    // Because positive and negative delays are the same
    // we only have to look at half the points
    //Jason: Use 4N-FFT method for Type 2 Discrete Cosine Transform for now, to match fftw's REDFT10
    // 1 Autocorrelation from global powerspectrum at fft_num*ac_fft_len (fft_num*ul_NumDataPoints )
    //R: all transforms on whole 2D matrix instead of single power spectrum.
#if 0
    const int ac_fftlen=128*1024;
    float* b=new float[ac_fftlen*4*2];
    if(b==NULL)fprintf(stderr,"Can't allocate buffer\n");
#endif
    //Step 1: Preprocessing - repack relevant powerspectrum into a 4N array with 'real-even symmetry'
    {
        size_t localThreads[2]={64,1};//R: TODO tune
        size_t globalThreads[2]={128*1024,8};//R: hardwired FFT len and number of FFTs for now
#if 0
        err=clEnqueueReadBuffer(cq,gpu_PowerSpectrum,CL_TRUE,sizeof(float)*ac_fftlen*1,sizeof(float)*ac_fftlen,b,0, NULL,NULL);
        if(err)fprintf(stderr,"ERROR: ReadBuffer:%d\n",err);
        fprintf(stderr,"power spectrum:\n");
        for(int i=0;i<ac_fftlen;i++)
            fprintf(stderr,"i=%d,x=%g\n",i,b[i]);
#endif
        int i=0;
        err = clSetKernelArg(RepackInput_kernel_cl,i++,sizeof(cl_mem),(void *)&gpu_PowerSpectrum);
        err |= clSetKernelArg(RepackInput_kernel_cl,i++,sizeof(cl_mem),(void *)&gpu_WorkData);
        //int size=128*1024;
        //err |=clSetKernelArg(RepackInput_kernel_cl,i++,sizeof(cl_int),(void *)&size);
        if(err) fprintf(stderr,"ERROR: Setting kernel argument:RepackInput_kernel_cl: %d\n",err);
#if OCL_VERBOSE
        else fprintf(stderr,"INFO: Setting kernel argument:RepackInput_kernel_cl ok\n");
#endif
        err = clEnqueueNDRangeKernel(cq,RepackInput_kernel_cl, 2, NULL,globalThreads, localThreads, 0,NULL,NULL);
        if(err) fprintf(stderr,"ERROR: RepackInput_kernel_cl: %d\n",err);
#if OCL_VERBOSE
        else fprintf(stderr,"INFO: RepackInput_kernel_cl ok\n");
#endif
    }
#if 0
    err=clEnqueueReadBuffer(cq,gpu_WorkData,CL_TRUE,1*4*2*sizeof(float)*ac_fftlen,4*2*sizeof(float)*ac_fftlen,b,0, NULL,NULL);
    if(err)fprintf(stderr,"ERROR: ReadBuffer:%d\n",err);
    fprintf(stderr,"after inflation:\n");
    for(int i=0;i<4*ac_fftlen;i++)
        fprintf(stderr,"i=%d,x=%g, y=%g\n",i,b[2*i],b[2*i+1]);
#endif
    //Step 2: Process the 4N-FFT (Complex to Complex, size is 4 * ac_fft_len)
    err |= clFFT_ExecuteInterleaved_mb(cq, autocorr_plan,8, clFFT_Forward, gpu_WorkData,gpu_WorkData, FFTbuf, 0, NULL, NULL);
    if(err) fprintf(stderr,"ERROR: autocorr fft: %d\n",err);
#if OCL_VERBOSE
    else fprintf(stderr,"INFO: autocorr fft ok\n");
#endif
#if 0
    err=clEnqueueReadBuffer(cq,gpu_WorkData,CL_TRUE,1*4*2*sizeof(float)*ac_fftlen,4*2*sizeof(float)*ac_fftlen,b,0, NULL,NULL);
    if(err)fprintf(stderr,"ERROR: ReadBuffer:%d\n",err);
    fprintf(stderr,"after fft:\n");
    for(int i=0;i<4*ac_fftlen;i++)
        fprintf(stderr,"i=%d,x=%g, y=%g\n",i,b[2*i],b[2*i+1]);
#endif
#if 0 //R: this step removed now
    //Step 3: Postprocess the FFT result (Scale, take powers & normalize), discarding unused data packing into AutoCorr_in first half for VRAM reuse
    {
        size_t localThreads[2]={64,1};//R: TODO tune
        size_t globalThreads[2]={32*1024,8};//R: hardwired FFT len and number of FFTs for now
        int i=0;
        err = clSetKernelArg(RepackOutput_kernel_cl,i++,sizeof(cl_mem),(void *)&gpu_WorkData);
        err |= clSetKernelArg(RepackOutput_kernel_cl,i++,sizeof(cl_mem),(void *)&FFTbuf);
        //int size=64*1024;
        //err |=clSetKernelArg(RepackOutput_kernel_cl,i++,sizeof(cl_int),(void *)&size);
        if(err) fprintf(stderr,"ERROR: Setting kernel argument:RepackOutput_kernel_cl: %d\n",err);
        err = clEnqueueNDRangeKernel(cq,RepackOutput_kernel_cl, 2, NULL,globalThreads, localThreads, 0,NULL,NULL);
        if(err) fprintf(stderr,"ERROR: RepackOutput_kernel_cl: %d\n",err);
    }
#endif
#if 0
    err=clEnqueueReadBuffer(cq,FFTbuf,CL_TRUE,1*64*1024*sizeof(float),sizeof(float)*ac_fftlen/2,b,0, NULL,NULL);
    if(err)fprintf(stderr,"ERROR: ReadBuffer:%d\n",err);
    fprintf(stderr,"after packing:\n");
    for(int j=0;j<ac_fftlen/2;j++)
        fprintf(stderr,"AutoCorrelation[%d]=%.4g\n",j,b[j]);
    //fprintf(stderr,"i=%d, x=%g, y=%g\n",i,b[2*i],b[2*i+1]);
    exit(0);
#endif
#if 0 //USE_OPENCL_NV || USE_OPENCL_HD5xxx
    {
        size_t localThreads[2]={256,1};
        size_t globalThreads[2]={256,8};//R: hardwired number of FFTs for now
        int i=0;
        err = clSetKernelArg(FindAutoCorrelation_kernel1_cl,i++,sizeof(cl_mem),(void *)&gpu_WorkData);
        err |= clSetKernelArg(FindAutoCorrelation_kernel1_cl,i++,sizeof(cl_mem),(void *)&gpu_AutoCorrelationResults);
        if(err) fprintf(stderr,"ERROR: Setting kernel argument:FindAutoCorrelation_kernel1_cl: %d\n",err);
        err = clEnqueueNDRangeKernel(cq,FindAutoCorrelation_kernel1_cl, 2, NULL,globalThreads, localThreads, 0,NULL,NULL);
        if(err) fprintf(stderr,"ERROR: FindAutoCorrelation_reduce0_kernel_cl: %d\n",err);
    }
#else
    // const int len = 64*1024;//autocorr_fft_len/2;
    // const int blksize = UNSTDMAX(4, UNSTDMIN(pow2a((unsigned int) sqrt((float) (len / 32)) * 32), 512));
    {
        size_t localThreads[2]={64,1};
        size_t globalThreads[2]={256,8};//R: hardwired number of FFTs for now
        int i=0;
        err = clSetKernelArg(FindAutoCorrelation_reduce0_kernel_cl,i++,sizeof(cl_mem),(void *)&gpu_WorkData);
        err |= clSetKernelArg(FindAutoCorrelation_reduce0_kernel_cl,i++,sizeof(cl_mem),(void *)&FFTbuf);
        if(err) fprintf(stderr,"ERROR: Setting kernel argument:FindAutoCorrelation_reduce0_kernel_cl: %d\n",err);
#if OCL_VERBOSE
        else fprintf(stderr,"INFO: Setting kernel argument:FindAutoCorrelation_reduce0_kernel_cl ok\n");
#endif
        err = clEnqueueNDRangeKernel(cq,FindAutoCorrelation_reduce0_kernel_cl, 2, NULL,globalThreads, localThreads, 0,NULL,NULL);
        if(err) fprintf(stderr,"ERROR: FindAutoCorrelation_reduce0_kernel_cl: %d\n",err);
#if OCL_VERBOSE
        else fprintf(stderr,"INFO: FindAutoCorrelation_reduce0_kernel_cl ok\n");
#endif
    }
#if 0
    err=clEnqueueReadBuffer(cq,gpu_WorkData,CL_TRUE,0*256*sizeof(float),sizeof(float)*256*8*3,b,0, NULL,NULL);
    if(err)fprintf(stderr,"ERROR: ReadBuffer:%d\n",err);
    fprintf(stderr,"partially reduced:\n");
    for(int j=0;j<256*8;j++)
        fprintf(stderr,"j=%d,partial_sum=%.4g,partial_max=%.4g, idx=%.4g\n",j,b[3*j],b[3*j+1],b[3*j+2]);
    //fprintf(stderr,"i=%d, x=%g, y=%g\n",i,b[2*i],b[2*i+1]);
    exit(0);
#endif
    {
        int i=0;
        size_t localThreads[1]={1};
        size_t globalThreads[1]={8};//R: hardwired number of FFTs for now
        err = clSetKernelArg(FindAutoCorrelation_reduce1_kernel_cl,i++,sizeof(cl_mem),(void *)&FFTbuf);
        err |= clSetKernelArg(FindAutoCorrelation_reduce1_kernel_cl,i++,sizeof(cl_mem),(void *)&gpu_AutoCorrelationResults);
        if(err) fprintf(stderr,"ERROR: Setting kernel argument:FindAutoCorrelation_reduce1_kernel_cl: %d\n",err);
#if OCL_VERBOSE
        else fprintf(stderr,"INFO: Setting kernel argument:FindAutoCorrelation_reduce1_kernel_cl ok\n");
#endif
        err = clEnqueueNDRangeKernel(cq,FindAutoCorrelation_reduce1_kernel_cl, 1, NULL,globalThreads, localThreads, 0,NULL,NULL);
        if(err) fprintf(stderr,"ERROR: Enqueueing FindAutoCorrelation_reduce1_kernel_cl: %d\n",err);
#if OCL_VERBOSE
        else fprintf(stderr,"INFO: Enqueuing FindAutoCorrelation_reduce1_kernel_cl ok\n");
#endif
    }
#endif
#if OCL_WDM
    err=clFlush(cq);
    if(err)fprintf(stderr,"ERROR: submitting kernels autocorr search: %d\n",err);
#if __Win32
    Sleep(OCL_WDM_SLEEP);
#elif _GNU_SOURCE
    usleep(OCL_WDM_SLEEP);
#endif
#endif
    err=clEnqueueReadBuffer(cq,gpu_AutoCorrelationResults,CL_TRUE,0,sizeof(cl_float2)*8,PowerBin,0, NULL,NULL);
    if(err) fprintf(stderr,"ERROR: clReadBuffer(gpu_AutoCorrelationResults,PowerBin): %d\n",err);
#if OCL_VERBOSE
    else fprintf(stderr,"INFO: clReadBuffer(gpu_AutoCorrelationResults,PowerBin) ok\n");
#endif
#if 0
    for(int i=0;i<8;i++)fprintf(stderr,"i=%d, power=%.4g,bin=%.4g\n",i,PowerBin[2*i],PowerBin[2*i+1]);
    exit(0);
#endif
}
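For readers who want to check Step 1 on the CPU: RepackInput_kernel_cl itself is not posted in this thread, so the following is only a sketch of the textbook 4N layout that the "real-even symmetry" comment seems to describe. Sample x[j] is placed at the odd positions 2j+1 and 4N-(2j+1) of an interleaved complex array, and everything else is zero. With that layout, the real part of output k of a length-4N forward DFT is 2*sum_j x[j]*cos(pi*(j+1/2)*k/N), which is how FFTW defines REDFT10. The names repack_real_even_4n and is_real_even are hypothetical helpers, not part of the application.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical CPU reference, NOT the RepackInput_kernel_cl from the thread.
 * Repack n real samples into 4n interleaved complex values (re, im):
 * x[j] goes to positions 2j+1 and 4n-(2j+1); all other real parts and all
 * imaginary parts are zero. `out` must hold 2 * 4n floats. */
static void repack_real_even_4n(const float *x, size_t n, float *out)
{
    size_t len = 4 * n;
    assert(x != NULL && out != NULL && n > 0);
    memset(out, 0, 2 * len * sizeof(float));
    for (size_t j = 0; j < n; ++j) {
        size_t m = 2 * j + 1;        /* odd position for sample j */
        out[2 * m] = x[j];           /* real part at m            */
        out[2 * (len - m)] = x[j];   /* mirrored real part        */
    }
}

/* Returns 1 if the interleaved array y of `len` complex values is real
 * (all imaginary parts zero) and even (y[m] == y[(len - m) % len]). */
static int is_real_even(const float *y, size_t len)
{
    for (size_t m = 0; m < len; ++m) {
        size_t mirror = (len - m) % len;
        if (y[2 * m] != y[2 * mirror] || y[2 * m + 1] != 0.0f)
            return 0;
    }
    return 1;
}
```

Reading gpu_WorkData back after the repack kernel (as the "after inflation" debug block does) and running a check like is_real_even on it would be one way to tell whether the wrong results start before or after the FFT.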
Originally posted by: Raistmer (but it uses oclFFT on size of 512k. Quite possible that one of FFT kernels failed) EDIT: full function as is:
Thank you for posting some code. Could you please also post the oclFFT kernels, or tell us with which options they were generated?
Originally posted by: genaganna Originally posted by: Raistmer (but it uses oclFFT on size of 512k. Quite possible that one of FFT kernels failed) EDIT: full function as is:
Thank you for posting some code. Could you please also post the oclFFT kernels, or tell us with which options they were generated?
cl_int clFFT_ExecuteInterleaved_mb( cl_command_queue queue, clFFT_Plan Plan, cl_int batchSize, clFFT_Direction dir,
                                    cl_mem data_in, cl_mem data_out, cl_mem temp_buf,
                                    cl_int num_events, cl_event *event_list, cl_event *event )
{
    int s;
    cl_fft_plan *plan = (cl_fft_plan *) Plan;
    if(plan->format != clFFT_InterleavedComplexFormat)
        return CL_INVALID_VALUE;
    cl_int err=CL_SUCCESS;
    size_t gWorkItems, lWorkItems;
    int inPlaceDone;
    cl_int isInPlace = data_in == data_out ? 1 : 0;
    cl_mem memObj[3];
    memObj[0] = data_in;
    memObj[1] = data_out;
    memObj[2] = temp_buf;
    cl_fft_kernel_info *kernelInfo = plan->kernel_info;
    int numKernels = plan->num_kernels;
    //fprintf(stderr,"ExecuteInterleaved_mb: plan->num_kernels =%d\n",(int)plan->num_kernels);
    int numKernelsOdd = numKernels & 1;
    int currRead = 0;
    int currWrite = 1;
#if 0
    cl_event fft_event=NULL;
    unsigned int pass=0;
#endif
    // at least one external dram shuffle (transpose) required
    if(plan->temp_buffer_needed)
    {
        // in-place transform
        if(isInPlace)
        {
            inPlaceDone = 0;
            currRead = 1;
            currWrite = 2;
        }
        else
        {
            currWrite = (numKernels & 1) ? 1 : 2;
        }
        while(kernelInfo){
            if( isInPlace && numKernelsOdd && !inPlaceDone && kernelInfo->in_place_possible)
            {
                currWrite = currRead;
                inPlaceDone = 1;
            }
#if 0
            if(fft_event){
                cl_ulong start,end;
                err=clWaitForEvents(1,&fft_event);
                err|=clGetEventProfilingInfo (fft_event,CL_PROFILING_COMMAND_START,sizeof(cl_ulong),&start,NULL);
                err|=clGetEventProfilingInfo (fft_event,CL_PROFILING_COMMAND_END,sizeof(cl_ulong),&end,NULL);
                //Counters<T_oclFFT1_ns,cl_ulong>::update(end-start);
                fprintf(stderr,"Pass %u: kernel took: %.2e ns, s=%d\n",pass,float(end-start),batchSize);
                err|=clReleaseEvent(fft_event);fft_event=NULL;
                if(err != CL_SUCCESS) fprintf(stderr,"ERROR: mb oclFFT_1 event: %d\n",err);
                else fprintf(stderr,"INFO: mb oclFFT_1 event done ok.\n");
            }
#endif
            s = batchSize;
            getKernelWorkDimensions(plan, kernelInfo, &s, &gWorkItems, &lWorkItems);
            //fprintf(stderr,"After getKernelWorkDimensions:\nbatchSize s =%d, gWorkItems =%d, lWorkItems =%d, dir =%d\n", s, gWorkItems, lWorkItems, dir);
            err |= clSetKernelArg(kernelInfo->kernel, 0, sizeof(cl_mem), &memObj[currRead]);
            err |= clSetKernelArg(kernelInfo->kernel, 1, sizeof(cl_mem), &memObj[currWrite]);
            err |= clSetKernelArg(kernelInfo->kernel, 2, sizeof(cl_int), &dir);
            err |= clSetKernelArg(kernelInfo->kernel, 3, sizeof(cl_int), &s);
            err |= clEnqueueNDRangeKernel(queue, kernelInfo->kernel, 1, NULL, &gWorkItems, &lWorkItems, 0, NULL,NULL/*&fft_event*/);
            if(err){
                fprintf(stderr,"Error in mb oclFFT_1: %d\n",err);
                return err;
            }
#if OCL_VERBOSE
            else fprintf(stderr,"INFO: in mb oclFFT_1 ok\n");
#endif
            currRead = (currWrite == 1) ? 1 : 2;
            currWrite = (currWrite == 1) ? 2 : 1;
            kernelInfo = kernelInfo->next;
#if 0
            pass++;
#endif
        }
    }
    // no dram shuffle (transpose required) transform
    // all kernels can execute in-place.
    else {
        while(kernelInfo){
#if 0
            if(fft_event){
                cl_ulong start,end;
                err=clWaitForEvents(1,&fft_event);
                err|=clGetEventProfilingInfo (fft_event,CL_PROFILING_COMMAND_START,sizeof(cl_ulong),&start,NULL);
                err|=clGetEventProfilingInfo (fft_event,CL_PROFILING_COMMAND_END,sizeof(cl_ulong),&end,NULL);
                //Counters<T_oclFFT2_ns,cl_ulong>::update(end-start);
                fprintf(stderr,"Pass %u: kernel took: %.2e ns, s=%d\n",pass,float(end-start),batchSize);
                err|=clReleaseEvent(fft_event);fft_event=NULL;
                if(err != CL_SUCCESS) fprintf(stderr,"ERROR: oclFFT_2 event.\n");
                else fprintf(stderr,"INFO: mb oclFFT_2 event done ok\n");
            }
#endif
            s = batchSize;
            getKernelWorkDimensions(plan, kernelInfo, &s, &gWorkItems, &lWorkItems);
            //fprintf(stderr,"After getKernelWorkDimensions:\nbatchSize s =%d, gWorkItems =%d, lWorkItems =%d, dir =%d\n", s, gWorkItems, lWorkItems, dir);
            err |= clSetKernelArg(kernelInfo->kernel, 0, sizeof(cl_mem), &memObj[currRead]);
            err |= clSetKernelArg(kernelInfo->kernel, 1, sizeof(cl_mem), &memObj[currWrite]);
            err |= clSetKernelArg(kernelInfo->kernel, 2, sizeof(cl_int), &dir);
            err |= clSetKernelArg(kernelInfo->kernel, 3, sizeof(cl_int), &s);
            err |= clEnqueueNDRangeKernel(queue, kernelInfo->kernel, 1, NULL, &gWorkItems, &lWorkItems, 0, NULL,NULL/*&fft_event*/);
            if(err){
                fprintf(stderr,"Error in mb oclFFT_2: %d\n",err);
                return err;
            }
#if OCL_VERBOSE
            else fprintf(stderr,"INFO: in mb oclFFT_2 ok\n");
#endif
            currRead = 1;
            currWrite = 1;
            //fprintf(stderr,"INFO: before mb oclFFT_2 kernel_info->next\n");
            kernelInfo = kernelInfo->next;
            //fprintf(stderr,"INFO: after mb oclFFT_2 kernel_info->next\n");
#if 0
            pass++;
#endif
        }
    }
#if 0
    if(fft_event){
        cl_ulong start,end;
        err=clWaitForEvents(1,&fft_event);
        err|=clGetEventProfilingInfo (fft_event,CL_PROFILING_COMMAND_START,sizeof(cl_ulong),&start,NULL);
        err|=clGetEventProfilingInfo (fft_event,CL_PROFILING_COMMAND_END,sizeof(cl_ulong),&end,NULL);
        //Counters<T_oclFFT1_ns,cl_ulong>::update(end-start);
        fprintf(stderr,"Final pass: kernel took: %.2e ns, s=%d\n",float(end-start),batchSize);
        err|=clReleaseEvent(fft_event);fft_event=NULL;
        if(err != CL_SUCCESS)fprintf(stderr,"ERROR: Final mb oclFFT event: %d\n",err);
        else fprintf(stderr,"INFO: Final mb oclFFT event done ok.\n",err);
    }
#endif
    // if(batchSize>1) Sleep(2);
    // else Sleep(1);
    //fprintf(stderr,"ExecuteInterleaved_mb: about to return with err=%d\n",(int)err);
    return err;
}
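A small aid for anyone following the buffer ping-pong in the temp_buffer_needed branch of clFFT_ExecuteInterleaved_mb: the sketch below replays only the currRead/currWrite bookkeeping on the host, with no OpenCL calls, and reports which buffer the last pass writes to (0 = data_in, 1 = data_out, 2 = temp_buf). The function name last_written_buffer and the in_place_possible array are hypothetical; the array stands in for the kernelInfo->in_place_possible flags, whose real values depend on the generated plan and are not shown in this thread.

```c
#include <assert.h>

/* Host-only replay of the buffer selection in the temp_buffer_needed branch.
 * Returns the memObj index written by the final pass:
 * 0 = data_in, 1 = data_out, 2 = temp_buf.
 * in_place_possible[k] is an assumed stand-in for the per-kernel flag. */
static int last_written_buffer(int num_kernels, int is_in_place,
                               const int *in_place_possible)
{
    int num_kernels_odd = num_kernels & 1;
    int in_place_done = 0;
    int curr_read = 0, curr_write = 1, last = -1;

    assert(num_kernels > 0 && in_place_possible != 0);
    if (is_in_place) {
        curr_read = 1;
        curr_write = 2;
    } else {
        curr_write = num_kernels_odd ? 1 : 2;
    }
    for (int k = 0; k < num_kernels; ++k) {
        if (is_in_place && num_kernels_odd && !in_place_done && in_place_possible[k]) {
            curr_write = curr_read;   /* run this one pass in place */
            in_place_done = 1;
        }
        last = curr_write;            /* the kernel for pass k would run here */
        curr_read  = (curr_write == 1) ? 1 : 2;
        curr_write = (curr_write == 1) ? 2 : 1;
    }
    return last;
}
```

With these update rules, an in-place call with an odd number of passes finishes in data_out only if at least one pass reports in_place_possible; otherwise the replay ends in temp_buf. Whether that case can occur for the 512k plan used here is not known from the thread, so printing plan->num_kernels (the commented-out fprintf does this) would be the first thing to check.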
Originally posted by: Raistmer any news on topic?
Only some more bad news, this time with the AMD Catalyst 11.10 release x86_64 Linux driver.
Everything that Raistmer reported earlier in this thread for the Windows drivers also affects the Linux version. Windows and Linux share the same codebase.
The autocorrelation kernels produce the same type of incorrect results on openSUSE 11.4 64-bit and Kubuntu 10.10 64-bit.
Another problem was also introduced with Catalyst 11.10 when running the Linux OpenCL version for HD4xxx GPUs with a max workgroup size of 128. Running this application version causes a severe host system freeze (bluescreen in ati2dvag.dll). It has affected all Windows versions (tested since Cat 10.12) and now newly appears on 64-bit Linux too (tested OK up to Cat 11.9). As Linux does not throw "bluescreens", the computer becomes completely unresponsive while still showing the frozen desktop, and needs to be rebooted. This happens reproducibly at the same place in the code (see below: POINT OF FAILURE). It was tested on openSUSE 11.3 64-bit with an HD4670, 1GB, OpenCL 1.0 device.
....
localThreads[0]=32; //uje: needed for lower HD4xxx GPUs
#else
localThreads[0]=host.GetWGSize();
#endif
//fprintf(stderr,"localThreads[0]=%d\n",localThreads[0]);
err = clEnqueueNDRangeKernel(cq,PC_find_spike_kernel_cl,
1,//R: 1D execution domain used, each work item works with 4 data elements
NULL,globalThreads,
localThreads,
0,NULL,NULL);
if(err) fprintf(stderr,"ERROR: Enqueueing kernel:PC_find_spike_kernel_cl:%d\n",err);
#if OCL_VERBOSE
else fprintf(stderr,"INFO: Enqueueing kernel: PC_find_spike_kernel_cl done ok\n");
#endif
//clFinish(cq);
//R: checking if any CPU reprocessing/logging needed and if yes, retrieve results
cl_uint cpu_result_flag[RESULT_SIZE];
fprintf(stderr,"INFO: Before clEnqueueReadBuffer\n");
#if OCL_WDM
err=clFlush(cq);
if(err)fprintf(stderr,"ERROR: submitting kernels for non-strip Spike search: %d\n",err);
#if __Win32
Sleep(OCL_WDM_SLEEP);
#elif _GNU_SOURCE
usleep(OCL_WDM_SLEEP);
#endif
#endif
err=clEnqueueReadBuffer(cq,gpu_result_flag,CL_TRUE,0,
sizeof(cl_uint)*((globalThreads[0]<RESULT_SIZE)?globalThreads[0]:RESULT_SIZE),
&cpu_result_flag,0, NULL,NULL); //<---------------------------------------POINT OF FAILURE
if(err) fprintf(stderr,"ERROR: ReadBuffer(gpu_result_flag,spike):%d\n",err);
#if OCL_VERBOSE
else fprintf(stderr,"INFO: ReadBuffer(gpu_result_flag,spike) done ok\n");
#endif
fprintf(stderr,"spike search results (main path): ");
....
Retested on openSUSE 11.3 64-bit with Cat 11.11; my host still crashes in the same way as described earlier.
How about a patch for this issue?
If you have previously installed SDK 2.5, please make sure you remove everything its installer placed in /etc/ld.so.conf.d
Otherwise you are using the libOpenCL.so from your SDK2.5 directory, not the one provided with Catalyst 11.11. I haven't tested it yet (I am redoing the offline compilation of all my kernels). The difference is two new targets (Scrapper and Devastator), faster compilation time, mad24 broken for some reason (didn't investigate that though, just replaced it with a slower equivalent).
Well, apparently Catalyst 11.11 download link is broken too 🙂
Yep. This is a bad thing to do I know, but it worked before:
#define getglobalid(a) (mad24(get_group_id(0), 64, get_local_id(0)))
...
found_ind[getglobalid(0)] = 1;
Error is:
/tmp/OCLYsVbCt.cl(786): error: more than one instance of overloaded function
"mad24" matches the argument list:
function "mad24(int, int, int) C++"
function "mad24(uint, uint, uint) C++"
argument types are: (uint, int, uint)
found_ind[getglobalid(0)] = 1;
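For anyone hitting the same error: the message says the call passes (uint, int, uint), so neither the all-int nor the all-uint overload of mad24 matches exactly. A minimal sketch of one way around it is to make all three arguments the same type, here by casting to unsigned int and using the literal 64u. The macro line itself is meant to be valid in an OpenCL C kernel; the functions around it are host-side stand-ins (assumptions, for testing with a plain C compiler only) for the OpenCL built-ins get_group_id, get_local_id and mad24.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side stand-ins for OpenCL built-ins. These are assumptions made so
 * the macro can be compiled and checked outside a kernel; in real kernel
 * code the built-ins are provided by the OpenCL C compiler. */
static size_t fake_group_id = 3;
static size_t fake_local_id = 5;
static size_t get_group_id(unsigned int dim) { (void)dim; return fake_group_id; }
static size_t get_local_id(unsigned int dim) { (void)dim; return fake_local_id; }
static unsigned int mad24(unsigned int a, unsigned int b, unsigned int c)
{
    return a * b + c;   /* same result as mad24 for operands within 24 bits */
}

/* All three arguments are unsigned, so only the uint overload applies.
 * 64 is the work-group size hard-coded in the original macro. */
#define getglobalid(a) \
    (mad24((unsigned int)get_group_id(0), 64u, (unsigned int)get_local_id(0)))
```

Casting all three arguments to int instead would select the other overload; which one is preferable depends on how the index is used afterwards.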
Thanks, Micah!
There is a fix. It is called NVIDIA.
My company decided last week to dump AMD and switch to NVIDIA.
AMD is simply too unreliable when it comes to serious GPGPU computing.
2 FrodoTheGiant:
Can you stop hysterics?
Originally posted by: FrodoTheGiant
There is a fix. It is called NVIDIA.
My company decided last week to dump AMD and switch to NVIDIA.
AMD is simply too unreliable when it comes to serious GPGPU computing.
$10^5 is spare change to AMD. If the rumors are real, the loss of their contract with Apple will be significant. If you have ever worked for a big corporation, you know that AMD engineers are, and should be, spending more time on Indeed than fixing anything. Why? Fixing bugs won't protect you from a layoff. What we need for OpenCL to succeed are killer applications used by consumers.
I feel bad for the engineers being axed, but not for the corporations making dumb decisions.
Originally posted by: NURBS
$10^5 is spare change to AMD.
Frodo has already stated that his company decided to shift from AMD to NV. If he isn't working for MS, Apple or some other big company, AMD really doesn't have to care (given that $10^5 is small money). It is one sad, true story like many others.
I am lucky enough to work at a research center and not at a company which is very result-oriented. I have the luxury to keep ever experimenting with alternative methods and not use factory standards.
Long ago I convinced people to invest money in a multi-GPU development node, used for exploring multi-GPU applications on the AMD side. 3X HD5970 is a lot of money for a small Hungarian research project. I have faith in the AMD engineers, who create kick@ss HW, but as Raistmer has said, it is all negated by the poor drivers. I really hate it when people write in full capitals, or when people are vulgar, but as a good friend of mine said, vulgarity has its place:
Two damned GPU generations have passed (having HD7xxx just around the corner), and our 3X HD5970s ARE STILL NOT SUPPORTED!! Really... is AMD really serious about that?? Even if I have my faith in AMD, how should I convince others to invest money in it? Let me try to guess what comes into Frodo's mind: "NOHOW! Do a favor for everyone and keep them away from AMD!"
Unfortunately for AMD, I believe I can achieve more if I keep on pushing AMD, and keep showing either its impotence or its merits. It is all up to "you".
And for the higher ups. Please-please-please Micah, Genaganna or anyone else who reads this. Do convey these comments as quotations. Not summaries, or some light version, but word by word:
Dear AMD,
you're going to your doom if you keep on like this. 2 ****ing years for a GPU to be supported?? By the time I can leverage my HW, it is surpassed by competition mid-class. INVEST MORE INTO SW DEVELOPMENT!! A LOT MORE!!!!!!
Regards,
a fan.
Edit: and don't get me wrong. This is not advice... this is an ultimatum from the users (developers) as a collective.
Edit2: Not to mention I opened a topic concerning a rather fatal bug of Kernel Analyzer over 3 weeks ago, and I still got no response from anybody, although I'm pretty sure that at least one person knows where KA stores data that has to be cleaned after reinstall.
Originally posted by: Raistmer
Needless to say about disgusting software support of most advanced AMD hardware. I have tester with 2 HD6990, 2-core GPUs. He barely can use them.
It is really a slap in the face for those users spending big money on dual GPU cards like the 5970 or 6990 - and then can't use them because AMD doesn't support them, drops support or breaks drivers.
How would anyone trust AMD again?
Why would anyone ever buy a high-end AMD GPU again, given the history of scorched earth they have with existing users?
Since the topic has been idle for a while now (I don't know if my previous post was really that effective, or people simply lost interest in dissing QoS), I thought I would post how one can waste a complete day installing Linux.
... ... ... I wanted to write a little short story, to put things nicely and not just bitch about things that don't work, but as I was thinking, I figured there's really no point, so I'll just make it short: 11.11 fails even to launch the Ubuntu desktop 80% of the time. The desktop simply freezes a few seconds after the background is shown, with the mouse still movable for a while. Naturally this prevents remote rebooting, as most of the time the computer will not be able to boot.
For other reasons, I got fed up with Ubuntu, so I installed SLC5 (Scientific Linux CERN, a Red Hat based distro). After installing a few things and restarting the computer in between, once udev has been loaded at boot time and the GUI boot window should open, the screen changes resolution (very noticeable on the CRT monitor in the computer room), and then just stays blank and the boot halts, just like on Ubuntu (10.04.3, both 64-bit). I have done this twice already (I thought it was the fault of trying to get auto-login working, but even after undoing the changes with a live Ubuntu stick, it still fails to boot).
If I pull the computer out of the rack, I've got myself a $3000 doorstop. Or I could heat my office with it...
What is the conclusion? Life would be much better, if I could install a headless node, with no GUI whatsoever and be able to use the GPUs without having to use this crap XServer which crashes at every second corner. Yet again, one thing the community has been asking for over a year now.
Peace.
Originally posted by: Meteorhead Since the topic has been idle for a while now (don't know if my previous post was really that effective, or people just simply lost interest in dising QoS)..."
I totally agree with you, but did not post it so that your comment remained the last in this thread. The most visible.
So I was hoping someone from AMD might read it - and respond.
But ... nothing. I am done with AMD.
Hi Everybody,
It is really sad that it is taking so much time to fix these reported issues. I will try to find out their status and post an update here. I would like to assure you that these issues are being looked into, based on their priorities.
To summarize the issues, so that I am not missing anything:
Raistmer reports an issue with the correctness of the autocorrelation function in SETI@home. This happens on both Windows and Linux.
Meteorhead is trying to run multi-GPU (x2 cards) configurations, which seem to be working. But the driver is causing boot issues on both Ubuntu and SLC5.
@himanshu.gautam: Please also add the "100% CPU load bug" to your list.
This is broken since the last 5 Catalyst releases. Including the newest one.
On (Windows?) systems with more than one graphics card installed, you always have a CPU load of 100% when running OpenCL applications. It should be ~2-3% (and was in older Catalyst versions).
Originally posted by: himanshu.gautam Hi Everybody,...
To summarize the issues, so that I am not missing anything:...
And not to forget the seti@home-issue with ATI GPUs with max. workgroup size of 128 (HD43xx - HD46xx), which will crash on windows (tested drivers Cat 10.12 till 11.11) but do work ok on linux (driver <= Cat 11.9). This is a different problem than the incorrectness issue.
Are you sure it works OK on linux? When using local memory, workgroup size is forced to 64 on 4xxx. It might work in some cases though, but in most cases you need to write a separate 4xxx kernel, otherwise it produces wrong results. I did not know it crashes on windows though, that sounds bad...
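To make the workgroup-size point concrete, here is a rough sketch of how a host could avoid requesting a local size larger than the kernel supports on a given device. It assumes the limit has already been queried with clGetKernelWorkGroupInfo using CL_KERNEL_WORK_GROUP_SIZE; the two helper names clamp_local_size and round_up_global are hypothetical and not taken from the SETI@home code. It only covers the launch parameters: a kernel that indexes local memory by a fixed group size would still need a separate variant, as noted above.

```c
#include <assert.h>
#include <stddef.h>

/* Clamp the local size the application would like to use to the maximum
 * the kernel supports on this device. kernel_max is assumed to come from
 * clGetKernelWorkGroupInfo(..., CL_KERNEL_WORK_GROUP_SIZE, ...). */
static size_t clamp_local_size(size_t requested, size_t kernel_max)
{
    assert(requested > 0 && kernel_max > 0);
    return requested < kernel_max ? requested : kernel_max;
}

/* OpenCL 1.x requires the global size to be a multiple of the local size,
 * so round it up. The kernel must then ignore work items past the real
 * problem size. */
static size_t round_up_global(size_t global, size_t local)
{
    assert(local > 0);
    return ((global + local - 1) / local) * local;
}
```

For example, a request for 128 work items per group on a kernel limited to 64 would be launched with a local size of 64, and a global size of 1000 would be padded to 1024.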
Originally posted by: gat3way
Are you sure it works OK on linux? When using local memory, workgroup size is forced to 64 on 4xxx. It might work in some cases though, but in most cases you need to write a separate 4xxx kernel, otherwise it produces wrong results. I did not know it crashes on windows though, that sounds bad...
Originally posted by: freighter Originally posted by: himanshu.gautam Hi Everybody,...
To summarize the issues, so that I am not missing anything:...
And not to forget the seti@home-issue with ATI GPUs with max. workgroup size of 128 (HD43xx - HD46xx), which will crash on windows (tested drivers Cat 10.12 till 11.11) but do work ok on linux (driver <= Cat 11.9). This is a different problem than the incorrectness issue.
Just an update: tested again today with Cat 11.12. The host still crashes with a total freeze (openSUSE 11.3 64-bit) or a bluescreen (Windows XP SP3), so no change on that issue yet.
windows XP is not supported with SDK 2.6
Originally posted by: nou windows XP is not supported with SDK 2.6
The OpenCL runtime components are still included in the Cat 11.12 driver for WinXP Pro 32-bit. So I guess running OpenCL apps on XP IS still supported, isn't it?