Hi,
I'm the author of Pyrit and have yet another possible bug in ATI's OpenCL implementation to ask about. The upstream bug is tracked at http://code.google.com/p/pyrit/issues/detail?id=123
A user reports that his setup crashes with a HD5970. The cause of the error is a self-test in my code that checks if the GPU is actually computing correct results; this catches the second GPU on the HD5970 producing bogus results (see comment #3).
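A minimal sketch of such a self-test, for illustration only (Python 3). It assumes the device computes WPA pairwise master keys, i.e. PBKDF2-HMAC-SHA1 over the password with the ESSID as salt, 4096 iterations and 32 bytes of output. The names `reference_pmk`, `self_test`, `cpu_solve` and `broken_solve` are hypothetical and are not Pyrit's actual code.

```python
import hashlib


def reference_pmk(essid, password):
    # CPU reference value. ASSUMPTION: the device computes
    # PBKDF2-HMAC-SHA1 with 4096 iterations and a 32-byte result.
    return hashlib.pbkdf2_hmac("sha1", password, essid, 4096, 32)


def self_test(solve, essid=b"bla", passwords=(b"blablabla",) * 3):
    """Return True if solve(essid, passwords) matches the CPU reference."""
    expected = tuple(reference_pmk(essid, pw) for pw in passwords)
    return tuple(solve(essid, passwords)) == expected


def cpu_solve(essid, passwords):
    # Stand-in for a correctly working device (hypothetical).
    return tuple(reference_pmk(essid, pw) for pw in passwords)


def broken_solve(essid, passwords):
    # Stand-in for a device that returns bogus data without any error.
    return tuple(b"\x00" * 32 for _ in passwords)
```

A check like this is the only thing that catches the failure described here, because every API call on the faulty device still reports success.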
What puzzles me is that there is a *ton* of error checking in my OpenCL code, which checks every single API-call. No error is reported, all functions return CL_SUCCESS...
Does someone have an idea what might cause this problem?
One possible explanation for this might be that Pyrit uses one host-thread for every GPU, possibly calling the OpenCL-library from different threads at the same time. This *should* not be a problem as there are no global variables (all data is local to the calling thread) and the OpenCL-library is *supposed* to be thread-safe.
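The threading layout described above can be sketched roughly as follows (Python 3). This is not Pyrit's code: `run_per_device` and its `serialize` flag are hypothetical, and `work` stands in for whatever a thread does with its device. Setting `serialize=True` forces the calls to run one at a time, which is one way to test whether concurrent calls into the library are to blame.

```python
import queue
import threading


def run_per_device(device_ids, work, serialize=False):
    """Run work(device_id) in one thread per device and collect the results."""
    results = queue.Queue()
    lock = threading.Lock()

    def worker(device_id):
        # All state created here is local to this thread; the only
        # shared objects are the thread-safe queue and the optional lock.
        if serialize:
            with lock:
                value = work(device_id)
        else:
            value = work(device_id)
        results.put((device_id, value))

    threads = [threading.Thread(target=worker, args=(d,)) for d in device_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(results.get() for _ in device_ids)
```

If the wrong results disappear with `serialize=True`, a locking problem becomes likely; if they persist, the cause is probably elsewhere.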
You can find the whole source-code at http://code.google.com/p/pyrit/source/browse/#svn/trunk/cpyrit_opencl
Update: The problem also occurs when only the second GPU is used. In that case only one host thread calls into the OpenCL library, so it's probably not a locking issue.
More information about this:
* I've dumped the .il and the .isa file that get generated for both GPUs on the HD5970. They are identical.
* clFinish() takes a very long time to return on the second GPU (several seconds) and finally returns CL_SUCCESS. On the first GPU the same call takes only a few milliseconds.
* CL_EVENT_COMMAND_EXECUTION_STATUS for clEnqueueNDRangeKernel() is always CL_COMPLETE
* The kernel output on the second GPU is always wrong, but it is identical every time the kernel is called, regardless of the input (!).
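Two of the observations above (a slow blocking call, and output that ignores the input) can be turned into small host-side checks. The following is only a sketch in Python 3; `timed_call` and `output_depends_on_input` are hypothetical helpers, and `solve` stands in for any callable that runs the kernel on one device.

```python
import time


def timed_call(fn, *args):
    """Call fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def output_depends_on_input(solve, essid, passwords_a, passwords_b):
    """Return True if two different inputs produce different outputs.

    A False result for genuinely different inputs suggests the kernel
    never read its input buffer or that the output buffer is stale.
    """
    return tuple(solve(essid, passwords_a)) != tuple(solve(essid, passwords_b))
```

Comparing the elapsed time per device makes the difference between milliseconds and seconds visible without relying on the return codes.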
CAL itself isn't thread-safe, as stated in its documentation, so I wouldn't be surprised if there are numerous problems with OpenCL as well.
Also, it's usually better to write directly to streamdeveloper@amd.com rather than use these forums to report problems/bugs.
To clarify: the problem turned out to persist when the second GPU is used alone and only from the main thread.
Try export DISPLAY=:0.1 so that only the second GPU is used.
Run some CAL samples to see whether they give incorrect results too. It is possible that this is a hardware issue.
Simple testcase:
* Download http://pyrit.googlecode.com/files/cpyrit-opencl-0.3.0.tar.gz
* Compile and install the python module via setup.py
* Run the following command from a console:
python -c "from cpyrit import _cpyrit_opencl; print [(i, _cpyrit_opencl.OpenCLDevice(0,i).solve('bla',('blablabla',)*3)) for i in xrange(_cpyrit_opencl.OpenCLPlatform(0).numDevices)]"
Example output:
[(0,
('\xb8\xb7\xf5E^\xfe\xb1\x8b\xf5\x98\xce\x95z$%\xd4\x05\xdb4\xb6E\xeb\xf4\xfd\x92\xf0\xa18\x18\xae\xaa\x1c',
'\xb8\xb7\xf5E^\xfe\xb1\x8b\xf5\x98\xce\x95z$%\xd4\x05\xdb4\xb6E\xeb\xf4\xfd\x92\xf0\xa18\x18\xae\xaa\x1c',
'\xb8\xb7\xf5E^\xfe\xb1\x8b\xf5\x98\xce\x95z$%\xd4\x05\xdb4\xb6E\xeb\xf4\xfd\x92\xf0\xa18\x18\xae\xaa\x1c')),
(1,
('\xb8\xb7\xf5E^\xfe\xb1\x8b\xf5\x98\xce\x95z$%\xd4\x05\xdb4\xb6E\xeb\xf4\xfd\x92\xf0\xa18\x18\xae\xaa\x1c',
'\xb8\xb7\xf5E^\xfe\xb1\x8b\xf5\x98\xce\x95z$%\xd4\x05\xdb4\xb6E\xeb\xf4\xfd\x92\xf0\xa18\x18\xae\xaa\x1c',
'\xb8\xb7\xf5E^\xfe\xb1\x8b\xf5\x98\xce\x95z$%\xd4\x05\xdb4\xb6E\xeb\xf4\xfd\x92\xf0\xa18\x18\xae\xaa\x1c')),
(2,
('\xc1\xed\xb1dJ\x93\xcb5sg0b\xfd\xc2\x8eO5\xef\xea*\x9c\xff\xc7I\xfc\x16\x14\x0e\xe3\x8f\xa3\xa8',
'\x89XU"\xfa\xfb\xc7\x96\xda\x8c\xc5\xdd\xf4\xe5\xefG\xa4\xb5(B\xc2\x7f\xf6\xbc\xb7\xe0\xfdg\r\x99v<',
'(h|L\xce\xceq\xd46#\xa2{\xcaq\xf7\xd7\xd7\xa0IT\xd1\r\xe4\x00\x1b(DCT\x18\xc1Q'))]
The results for device 2 (the second GPU) are not equal to the results for device 0 (CPU) and device 1 (first GPU).
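For longer device lists the comparison can be automated. A minimal sketch in Python 3, assuming the results have the same shape as the example output, i.e. a list of (device index, tuple of byte strings) pairs; `mismatching_devices` is a hypothetical helper and not part of cpyrit.

```python
def mismatching_devices(results, reference_index=0):
    """Return the indices of devices whose output differs from the reference.

    results: iterable of (device_index, output_tuple) pairs.
    reference_index: device assumed to be correct (the CPU in the example).
    """
    by_index = dict(results)
    reference = by_index[reference_index]
    return [i for i, out in sorted(by_index.items()) if out != reference]


# Placeholder data with the same structure as the example output.
example = [
    (0, (b"\xb8\xb7", b"\xb8\xb7", b"\xb8\xb7")),
    (1, (b"\xb8\xb7", b"\xb8\xb7", b"\xb8\xb7")),
    (2, (b"\xc1\xed", b"\x89X", b"(h")),
]
print(mismatching_devices(example))
```

Using the CPU device as the reference only makes sense if it is trusted to be correct, which is an assumption here.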
I can report that the issue has not been fixed in 10.3-final. The second GPU still produces random, invalid results without the API indicating any error condition.
I know that ATI is used to ignoring its user base. It would be *so* sweet if you'd at least inform developers like us about known problems, upcoming changes or the lack thereof. This is really frustrating.
Does this issue still persist in the current ATI Stream SDK/Catalyst driver?