Raistmer, are you seeing this under Windows or Linux? (and if Windows, which version?)
Originally posted by: michael.chu Raistmer, are you seeing this under Windows or Linux? (and if Windows, which version?)
Hi,
porting Raistmer's app to 64-bit Linux (openSUSE 11, GPU HD5670), I get a repeatable error. Instead of an OpenCL error message, the application stops with a SIGSEGV. After searching for other causes for some days, I've run out of options other than to report this issue as a bug in SDK v2.1 or the Cat 10.4 driver.
On an identical computer (see below, except GPU HD4670) the application produces wrong results, but does not error out.
I identified this call as the source of the error: clEnqueueNDRangeKernel(...) in bool Science::find_single_pulse_range_cl(...), using the same kernels file as the Windows app.
GDB (backtrace) and Valgrind both point to problems within /usr/lib64/libaticaldd.so.
Output from GDB:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7e41950 (LWP 4469)]
0x00007ffff45d2424 in ?? () from /usr/lib64/libaticaldd.so
First of thousands of reports from Valgrind:
==4483== Conditional jump or move depends on uninitialised value(s)
==4483== at 0x84BA337: (within /usr/lib64/libaticaldd.so)
==4483== by 0x84B9BF7: (within /usr/lib64/libaticaldd.so)
==4483== by 0x84AEF94: (within /usr/lib64/libaticaldd.so)
==4483== by 0x849EDDE: (within /usr/lib64/libaticaldd.so)
==4483== by 0x83B4FEE: (within /usr/lib64/libaticaldd.so)
==4483== by 0x84F0D8E: (within /usr/lib64/libaticaldd.so)
==4483== by 0x84E9F40: (within /usr/lib64/libaticaldd.so)
==4483== by 0x84FAC54: (within /usr/lib64/libaticaldd.so)
==4483== by 0x660B562: (within ~/development/ati-stream-sdk-v2.1-lnx64/lib/x86_64/libatiocl64.so)
==4483== by 0x65ECA19: (within ~/development/ati-stream-sdk-v2.1-lnx64/lib/x86_64/libatiocl64.so)
==4483== by 0x66169C9: (within ~/development/ati-stream-sdk-v2.1-lnx64/lib/x86_64/libatiocl64.so)
==4483== by 0x65DBDB4: clGetPlatformInfo (in ~/development/ati-stream-sdk-v2.1-lnx64/lib/x86_64/libatiocl64.so)
Hardware:
        Desktop1           Desktop2
CPU:    Intel T7200        Intel T7200
GPU:    HD5670, 1GB        HD4670, 1GB
RAM:    2GB                2GB
MB:     ASUS N4LVM-DH
OS:     openSuse11, 64bit
Urs
Originally posted by: Raistmer First of all, it's not my own observation; I received the report from a beta tester of my app. AFAIK it was Windows 7 x64. I asked the GPU owner to describe the situation himself; I hope he can provide a more detailed description.
Actually it is XP Pro-64, there is also a thread on the game side of the forums.
http://forums.amd.com/game/messageview.cfm?catid=279&threadid=132505&enterthread=y
When I removed all of the drivers and the SDK and attempted to reinstall 10.4, there was no option for the driver in the installer. I tried both the preview and the final release. I'm hoping that the 10.5 preview comes out soon.
Hi,
I'm the one seeing the 5970 behaviour.
Setup:
Windows 7 x64, 2x 5970, Cat 10.4. All GPUs reported as cat 10.4 by the app.
The application never utilizes any of the GPUs, as the driver restarts. This goes on indefinitely.
Morten
Originally posted by: Raistmer First of all, it's not my own observation; I received the report from a beta tester of my app. AFAIK it was Windows 7 x64. I asked the GPU owner to describe the situation himself; I hope he can provide a more detailed description.
My setup is an HD5770 and an Nvidia 9800GTX+, with Cat 10.4, SDK2.1_x86_64 and FW197.57, on Windows 7 64-bit Premium. In this config Raistmer's OpenCL app crashes, and even his Brook+ version crashes (it worked fine with Cat 10.2, SDK2.01_x86_64 and 197.25); GPU-Z crashes as well.
It isn't until I physically remove the 9800GTX+ and its drivers that I can get Raistmer's OpenCL app to function correctly; GPU-Z then also works.
If I install Cat 10.2 and SDK2.1_x86_64 (with the 9800GTX+ fitted), Raistmer's Brook+ app works and GPU-Z works, but the OpenCL app causes an ATI driver restart and doesn't progress.
Claggy
freighter,
Could you post the source code (a compilable testcase)?
Morten,
Which application is causing driver restart? Are you able to run the samples?
Originally posted by: omkaranathan Morten,
Which application is causing driver restart? Are you able to run the samples?
Hi,
ap_5.05_win_x86_SSE2_ATI_r420.exe.
What samples are you referring to?
No work unit is ever started, as the GPU driver restarts when the CPU is done pre-processing and passes the work to the GPU.
Morten
Thanks to Raistmer I finally got an answer as to which applications you were referring to:
HelloCl and Clinfo both executed with no errors logged.
Originally posted by: omkaranathan freighter, Could you post the source code (a compilable testcase)?
Will put a testcase together ...
EDIT: Sorry, this took a while, and the testcase does not produce the exact backtrace that the application did, but I tried to get as close as possible.
(size 110.3KiB): http://www.echtbaer.de/download/Astropulse/uje_test_case_linux.tar.bz2
md5sum : 196a929895fbdbd11be87173ada62e8d
With new driver version 10.5 the reported problem is still unresolved.
10.5 did solve my problems as I am able to run the OpenCL code on my 5830.
I also updated my system to Win7-64 due to some compatibility issues.
Raistmer,
The issue has been passed to the developers and they are looking into it.
Cat 10.5 didn't fix my problems: with the Nvidia GPU also installed (257.15 drivers) and SDK2.1 installed, Brook+ and ATI OpenCL apps still crash.
I reverted back to Cat 10.2, and Brook+ apps now work again.
Claggy
Edit: HelloCl and Clinfo both crash with Cat 10.5 & 257.15; they run O.K. with Cat 10.2, but with:
Error : Bytes mismatch!
Error : glSharing mismatch!
Error : images mismatch!
Failed!
freighter,
The application is sending an invalid memory object.
You are typecasting a data array to a memory object:
gpu_power = (cl_mem)&power;
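For context, here is a minimal sketch of the usual alternative to that cast. A cl_mem handle has to come from the OpenCL runtime (for example from clCreateBuffer); a host pointer cast to cl_mem is not a valid memory object. The helper name bind_power_buffer, the float element type, and the parameter names are assumptions made for illustration, since the thread does not show how power is declared or which kernel argument it feeds.

```c
#include <stddef.h>
#include <CL/cl.h>

/* Assumption: `power` is a host array of `count` floats, and kernel argument
 * `arg_index` expects a __global float* buffer. Neither is shown in the thread. */
cl_int bind_power_buffer(cl_context ctx, cl_kernel kernel, cl_uint arg_index,
                         float *power, size_t count, cl_mem *gpu_power)
{
    cl_int err = CL_SUCCESS;

    /* Ask the runtime for a real memory object and copy the host data into it. */
    *gpu_power = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                count * sizeof(float), power, &err);
    if (err != CL_SUCCESS)
        return err;

    /* Pass the address of the cl_mem handle, not the host pointer itself. */
    err = clSetKernelArg(kernel, arg_index, sizeof(cl_mem), gpu_power);
    if (err != CL_SUCCESS) {
        clReleaseMemObject(*gpu_power);
        *gpu_power = NULL;
    }
    return err;
}
```

On success the caller owns *gpu_power and should release it with clReleaseMemObject once the kernel has finished.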
Originally posted by: omkaranathan freighter,
The application is sending an invalid memory object.
You are doing typecast of data array to a memory object
gpu_power = (cl_mem)&power;
Yes, I know that; I did it this way to get a possibly partly matching backtrace.
A version that reads the data array in the correct way, using a buffer, is also in test_case.cpp, uncommented, but that way there is no error and no backtrace.
This is the closest that I can get to matching a backtrace to the real problem. As I wrote before, I was not able to reproduce the exact problem with such a short testcase.
omkaranathan, please have a look into test_case.h. Try to combine the two backtraces there. Both mem addresses match up to a certain address.
Sorry if it is a bit more complicated to get to the problem.
As you are doing illegal casting, this testcase is not sufficient to reproduce your problem. Please send your code to streamdeveloper@amd.com
Still got this issue, with the Nvidia GPU fitted in the 1st slot and the HD5770 fitted in the 2nd, Cat 10.6 or Cat 10.7 installed with the latest Nvidia drivers,
SDK_2.1 installed with Cat 10.6, SDK_2.2 installed with Cat 10.7:
Brook+ and OpenCL apps still crash. GPU-Z crashes when the SDK is installed and is O.K. when it is not. At least the latest version of BOINC now ignores the ATI GPU when it detects a SIGSEGV during ATI GPU detection.
Is there any timescale for a fix?
Claggy
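One way a host application can survive a driver that segfaults during GPU detection is to run the detection in a child process and inspect how the child ended. The following is a minimal POSIX sketch of that idea, not BOINC's actual implementation; run_probe_isolated, probe_ok, probe_crash and PROBE_SPAWN_ERROR are hypothetical names, and probe_crash only simulates a crash with raise().

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical probe signature: returns 0 on success, nonzero on failure. */
typedef int (*probe_fn)(void);

/* Hypothetical sentinel for "could not start or wait for the child". */
#define PROBE_SPAWN_ERROR (-1000)

/* Runs `probe` in a child process so that a crash cannot take down the caller.
 * Returns 0 if the probe succeeded, 1 if it failed cleanly, or the negated
 * signal number (e.g. -SIGSEGV) if the child was killed by a signal. */
int run_probe_isolated(probe_fn probe)
{
    pid_t pid = fork();
    if (pid < 0)
        return PROBE_SPAWN_ERROR;

    if (pid == 0)
        _exit(probe() == 0 ? 0 : 1);   /* child: report result via exit status */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return PROBE_SPAWN_ERROR;

    if (WIFSIGNALED(status))
        return -WTERMSIG(status);      /* e.g. -SIGSEGV from a faulty driver */
    return WEXITSTATUS(status);
}

/* Stand-in probes for illustration only. */
int probe_ok(void) { return 0; }

int probe_crash(void)
{
    signal(SIGSEGV, SIG_DFL);          /* make sure the default action applies */
    raise(SIGSEGV);                    /* simulate a crash inside the driver */
    return 0;
}
```

A caller would treat a return value of -SIGSEGV as "skip this GPU" and carry on, instead of crashing together with the driver library.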