Hi,
I have tried to run the clpp GPU radix sort on an HD6950 and after one sort, the computer locks up!
To test, you can simply download the latest version from SVN (not the packaged one), run it... and you will have to reboot your computer!
It is a CRITICAL problem... so please do your best to fix it!
Thanks
Can someone from the AMD team test it?
My computer locks up each time I start the sort algorithm! (I know that I'm not the only one to run into this kind of problem!!!)
You can download the latest version from SVN here:
Thanks Micah,
I will try, but where and how do I set GPU_BARRIER_DETECTION=true?
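For readers with the same question: an environment variable such as GPU_BARRIER_DETECTION is normally set in the shell before launching the program (for example `set GPU_BARRIER_DETECTION=true` in a Windows command prompt), or from the host program before its first OpenCL call. Below is a minimal sketch in C of the second option. It assumes the runtime reads the variable when it initialises, which this thread does not confirm, and `set_env_before_opencl` is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200112L  /* exposes setenv() on POSIX systems */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: sets NAME=VALUE for the current process and
   returns 1 if the value is then visible through getenv(), else 0.
   Call it before the first OpenCL API call. Whether the AMD runtime
   honours GPU_BARRIER_DETECTION at that point is an assumption. */
static int set_env_before_opencl(const char *name, const char *value)
{
    int rc;
    const char *seen;

    assert(name != NULL && value != NULL);
#ifdef _WIN32
    rc = _putenv_s(name, value);   /* returns 0 on success */
#else
    rc = setenv(name, value, 1);   /* returns 0 on success; 1 = overwrite */
#endif
    if (rc != 0)
        return 0;

    seen = getenv(name);
    return seen != NULL && strcmp(seen, value) == 0;
}
```

A host program would call `set_env_before_opencl("GPU_BARRIER_DETECTION", "true")` at the top of `main`, before creating any OpenCL context.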
Hi Micah,
I have done some tests:
1) I tested only the 'scan' algorithm, and the computer still locks up.
2) I set the environment variable GPU_BARRIER_DETECTION, but it has NO effect.
3) I removed all the barriers, and then the computer DOES NOT lock up. So the problem is related to a local barrier.
What can I do to help you find the problem? Maybe someone at AMD can take a closer look? Tell me what to do.
Thanks
BTW: I have also updated to Catalyst 11.6, but it changes nothing 😞
I have also tested with the Catalyst 11.7 preview, the one delivered with gDEBugger.
First, I still have the problem, and there are even some regressions!!!!
I have tested my main software with it (this one does not contain any _sync_ operation) and it also locks up the computer!
There is a terrible bug somewhere 😞
Tell me how I can help you debug this.
I have seen that other people also have their computers freezing:
http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=150237&enterthread=y
Worse, I have tested my application with Catalyst 11.7 and now even the CPU version crashes... I run it and it exits immediately!!!!!!!
viewon01,
I am looking into this issue. I will let you know if I find something fishy in your code.
Anyway, as you said before, the code doesn't crash once you remove the barriers. I would suggest removing the barriers incrementally, one by one, to figure out which barrier gets stuck.
Also, have you tried gDEBugger? It can be helpful.
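The one-by-one approach suggested above can be made less tedious by switching individual barriers on and off at build time instead of editing the kernel each time. Below is a rough host-side sketch in C. It assumes every barrier in the kernel is rewritten as `DEBUG_BARRIER(n)` with a distinct `n` below 32. The macro, `ENABLED_BARRIERS`, `kBarrierPrelude` and `make_build_options` are hypothetical names, not part of clpp.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* OpenCL C prelude to prepend to the kernel source (hypothetical).
   DEBUG_BARRIER(n) becomes a real barrier only if bit n of
   ENABLED_BARRIERS is set. The test is on a compile-time constant,
   so every work-item takes the same branch. */
static const char *kBarrierPrelude =
    "#ifndef ENABLED_BARRIERS\n"
    "#define ENABLED_BARRIERS 0xFFFFFFFFu\n"
    "#endif\n"
    "#define DEBUG_BARRIER(n) \\\n"
    "    do { if (ENABLED_BARRIERS & (1u << (n))) \\\n"
    "             barrier(CLK_LOCAL_MEM_FENCE); } while (0)\n";

/* Writes a build-options string such as "-D ENABLED_BARRIERS=0x3u"
   into out. Returns 1 on success, 0 if the buffer is too small. */
static int make_build_options(unsigned mask, char *out, size_t cap)
{
    int n = snprintf(out, cap, "-D ENABLED_BARRIERS=0x%Xu", mask);
    return n > 0 && (size_t)n < cap;
}
```

The resulting string would go in the `options` argument of `clBuildProgram`. Disabling a barrier makes the kernel's results unreliable, so this only helps to locate the barrier that hangs; it is not a fix.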
Thanks,
So I'll wait for your feedback. Thanks for your help.
Hi Himanshu,
Have you found something? Maybe the problem is somewhere else?
Still no news?
I have tried to find the bug, but maybe I have missed a specific architecture constraint (like the one that Micah is talking about). Or it is a driver bug!
Can you help ?
Thanks
Any news ?
How can I help to debug please ?
Thanks
Originally posted by: viewon01
Any news ?
How can I help to debug please ?
Thanks
Viewon01,
It looks like they have not followed the OpenCL specification. I am getting the following errors from clBuildProgram. I tested with SDK 2.5 and the 11.7 driver.
==============================================
Platform[AMD Accelerated Parallel Processing] Device[Cypress]
--------------- Satish radix sort Key
Error: Failed to build program executable!
C:\Users\Naganna\AppData\Local\Temp\OCL850A.tmp.cl(100): error: non-kernel
function: variable with automatic storage duration cannot be stored
in a named address space
__local uint localBuffer[TPG*2];
^
C:\Users\Naganna\AppData\Local\Temp\OCL850A.tmp.cl(104): error: identifier
"localBuffer" is undefined
uint4 localBits = inclusive_scan_128(localBuffer, tid, block, lane, initialValue, bitsOnCount);
^
C:\Users\Naganna\AppData\Local\Temp\OCL850A.tmp.cl(128): error: mixed
vector-scalar operation not allowed unless
up-convertable(scalar-type=>vector-element-type)
localBits += localBuffer[block + 4 - 1];
^
C:\Users\Naganna\AppData\Local\Temp\OCL850A.tmp.cl(308): warning: type
qualifier is meaningless on cast type
const int4 tid4 = ((const int4)tid) + (const int4)(0,WGZ,WGZ_x2,WGZ_x3);
^
C:\Users\Naganna\AppData\Local\Temp\OCL850A.tmp.cl(309): warning: type
qualifier is meaningless on cast type
const int4 gid4 = tid4 + ((const int4)groupId<<2);
^
3 errors detected in the compilation of "C:\Users\Naganna\AppData\Local\Temp\OCL850A.tmp.cl".
Internal error: compiler frontend invocation failed. Make sure ATISTREAMSDKROOT is set
Program build failure
Assertion failed: clStatus == CL_SUCCESS, file c:\users\naganna\downloads\clpp_v1_beta3\clpp\src\clpp\clppprogram.cpp, line 180
==============================================
Please ask the clpp developers to fix all the OpenCL compilation errors on AMD OpenCL.
Hi,
The fix has been done. Can you help fix the GPU issue, please?
Thanks
It looks like the latest driver, Catalyst 11.9, fixes the problem.
Can you confirm?
Originally posted by: erwincoumans
It looks like the latest driver, Catalyst 11.9, fixes the problem.
Can you confirm?
I still see those errors with the internal libraries. Please ask the clpp developers to fix the issue.
Error log
==============================================
Platform[AMD Accelerated Parallel Processing] Device[Cypress]
--------------- Satish radix sort Key
Error: Failed to build program executable!
C:\Users\Naganna\AppData\Local\Temp\OCLC91D.tmp.cl(100): error: variable with
automatic storage duration cannot be stored in the named address
space
__local uint localBuffer[TPG*2];
^
C:\Users\Naganna\AppData\Local\Temp\OCLC91D.tmp.cl(104): error: identifier
"localBuffer" is undefined
uint4 localBits = inclusive_scan_128(localBuffer, tid, block, lane, initialValue, bitsOnCount);
^
C:\Users\Naganna\AppData\Local\Temp\OCLC91D.tmp.cl(128): error: mixed
vector-scalar operation not allowed unless
up-convertable(scalar-type=>vector-element-type)
localBits += localBuffer[block + 4 - 1];
^
C:\Users\Naganna\AppData\Local\Temp\OCLC91D.tmp.cl(308): warning: type
qualifier is meaningless on cast type
const int4 tid4 = ((const int4)tid) + (const int4)(0,WGZ,WGZ_x2,WGZ_x3);
^
C:\Users\Naganna\AppData\Local\Temp\OCLC91D.tmp.cl(309): warning: type
qualifier is meaningless on cast type
const int4 gid4 = tid4 + ((const int4)groupId<<2);
^
3 errors detected in the compilation of "C:\Users\Naganna\AppData\Local\Temp\OCLC91D.tmp.cl".
Internal error: clc compiler invocation failed.
Program build failure
Assertion failed: clStatus == CL_SUCCESS, file c:\users\naganna\desktop\clpp_v1_beta3\clpp\src\clpp\clppprogram.cpp, line 180
Has anyone got this problem solved yet? What a pity that the very helpful clpp can't be used on AMD GPUs!
To be fair, it looks like this should not compile on any OpenCL hardware, as it breaks the OpenCL spec. I haven't read it yet, but you shouldn't be able to declare __local memory in a non-kernel function, as this would cause all sorts of confusion.
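For what it's worth, the usual way around the "variable with automatic storage duration cannot be stored in a named address space" error is to declare the __local array at kernel function scope and pass a pointer to it into the helper. The sketch below illustrates that pattern only and is not the clpp code. `neighbour_sum` and `demo` are made-up names, and the value of `TPG` is assumed (only its name appears in the build log). The shim at the top exists solely so the snippet can also be compiled as plain C and checked with a single emulated work-item; an OpenCL compiler skips it.

```c
/* Shim: lets this OpenCL C snippet also compile as plain C for a
   single-work-item smoke test. An OpenCL compiler defines
   __OPENCL_VERSION__ and skips this part. */
#ifndef __OPENCL_VERSION__
#include <assert.h>
#define __kernel
#define __global
#define __local
#define CLK_LOCAL_MEM_FENCE 1
typedef unsigned int uint;
static uint get_local_id(uint dim)   { (void)dim; return 0; }
static uint get_local_size(uint dim) { (void)dim; return 1; }
static uint get_global_id(uint dim)  { (void)dim; return 0; }
static void barrier(int flags)       { (void)flags; }
#endif

#define TPG 128  /* threads per group; value assumed for illustration */

/* Helper (non-kernel) function: it may receive a __local pointer,
   but it must not declare a __local variable itself. */
uint neighbour_sum(__local uint *scratch, uint value)
{
    uint lid = get_local_id(0);
    uint n   = get_local_size(0);

    scratch[lid] = value;
    barrier(CLK_LOCAL_MEM_FENCE);  /* reached by every work-item */
    return scratch[lid] + scratch[(lid + 1) % n];
}

__kernel void demo(__global const uint *in, __global uint *out)
{
    /* Declared at kernel function scope, as the spec requires. */
    __local uint localBuffer[TPG * 2];
    uint gid = get_global_id(0);

    out[gid] = neighbour_sum(localBuffer, in[gid]);
}
```

The work-group size must not exceed the size of `localBuffer`. The barrier is also kept outside any branch so that all work-items in the group reach it.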