edwintorok
Journeyman III

linux: segmentation fault with FindNumDevices /PCIeSpeedTest


Using fglrx 9.6, and ATIStream 1.4beta on Debian (unstable) x86_64 Linux:

/usr/local/atical/bin/lnx64/FindNumDevices
Supported CAL Runtime Version: 1.3.185
Found CAL Runtime Version: 1.4.317
Use -? for help
Segmentation fault

Any other program that uses CAL (such as PCIeSpeedTest) segfaults too. This happens with both the 32-bit and 64-bit variants, running on:

Linux debian 2.6.30.1 #126 SMP PREEMPT Thu Jul 16 12:40:15 EEST 2009 x86_64 GNU/Linux

Here is a stack trace (not much use without the debug info):
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff77ea15a in ?? () from /usr/lib/libaticaldd.so
(gdb) bt
#0  0x00007ffff77ea15a in ?? () from /usr/lib/libaticaldd.so
#1  0x00007ffff77ea095 in ?? () from /usr/lib/libaticaldd.so
#2  0x00007ffff77e68de in ?? () from /usr/lib/libaticaldd.so
#3  0x00007ffff77de565 in ?? () from /usr/lib/libaticaldd.so
#4  0x00007ffff7734353 in ?? () from /usr/lib/libaticaldd.so
#5  0x00007ffff782277a in ?? () from /usr/lib/libaticaldd.so
#6  0x00007ffff781d33f in ?? () from /usr/lib/libaticaldd.so
#7  0x00007ffff782b049 in ?? () from /usr/lib/libaticaldd.so
#8  0x0000000000403565 in ?? ()
#9  0x0000003fe7e1e5a6 in __libc_start_main () from /lib/libc.so.6
#10 0x00000000004015ca in ?? ()
#11 0x00007fffffffe2c8 in ?? ()
#12 0x000000000000001c in ?? ()
#13 0x0000000000000001 in ?? ()
#14 0x00007fffffffe594 in ?? ()
#15 0x0000000000000000 in ?? ()
(gdb)

 ldd /usr/local/atical/bin/lnx64/FindNumDevices
        linux-vdso.so.1 =>  (0x00007fffeb1e1000)
        libaticalrt.so => /usr/lib/libaticalrt.so (0x00007f96cc55d000)
        libaticalcl.so => /usr/lib/libaticalcl.so (0x00007f96cc43e000)
        libXext.so.6 => /usr/lib/libXext.so.6 (0x0000003fea200000)
        libX11.so.6 => /usr/lib/libX11.so.6 (0x0000003fe9e00000)
        libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x0000003feaa00000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x0000003fe8a00000)
        libdl.so.2 => /lib/libdl.so.2 (0x0000003fe8600000)
        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x0000003fea600000)
        libc.so.6 => /lib/libc.so.6 (0x0000003fe7e00000)
        librt.so.1 => /lib/librt.so.1 (0x0000003feae00000)
        libm.so.6 => /lib/libm.so.6 (0x0000003fe8200000)
        libXau.so.6 => /usr/lib/libXau.so.6 (0x0000003fe9a00000)
        libxcb.so.1 => /usr/lib/libxcb.so.1 (0x0000003fe9200000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003fe7a00000)
        libXdmcp.so.6 => /usr/lib/libXdmcp.so.6 (0x0000003fe9600000)

 

Relevant messages from the kernel:

[11136.492007] FindNumDevices:21637 conflicting memory types d0000000-e0000000 uncached-minus<->write-combining
[11136.492012] reserve_memtype failed 0xd0000000-0xe0000000, track uncached-minus, req write-combining
[11136.492015] [fglrx:KCL_MEM_VM_MapRegion] *ERROR* remap_pfn_range failed
[11136.492114] FindNumDevices[21637]: segfault at 3e8 ip 00007f3435fad15a sp 00007fff6d1e9070 error 4 in libaticaldd.so[7f3435e24000+4fc000]
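
To make the kernel messages easier to compare with the strace, here is a small Python sketch that pulls the address range and memory types out of the reserve_memtype line. The function name and the regular expression are my own and only match the exact wording logged above; other kernel versions may word the message differently.

```python
import re

# Matches the "reserve_memtype failed" line exactly as logged above.
RESERVE_RE = re.compile(
    r"reserve_memtype failed 0x([0-9a-f]+)-0x([0-9a-f]+), "
    r"track ([\w-]+), req ([\w-]+)"
)

def parse_reserve_memtype_failure(line):
    """Return the range, its size in bytes, and both memory types, or None."""
    m = RESERVE_RE.search(line)
    if m is None:
        return None
    start = int(m.group(1), 16)
    end = int(m.group(2), 16)
    return {
        "start": start,
        "end": end,
        "size": end - start,   # length of the range in bytes
        "track": m.group(3),   # type reported after "track"
        "req": m.group(4),     # type reported after "req"
    }
```

For the line above, the size comes out as 268435456 bytes (256 MiB), which is the same length as the mmap call that fails with EAGAIN in the strace.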

 

Here is a strace too:

open("/dev/dri/card0", O_RDWR)          = 5
ioctl(5, DECODER_SET_PICTURE, 0x7fff4bfdfee0) = -1 EINVAL (Invalid argument)
ioctl(5, DECODER_GET_CAPABILITIES, 0x7fff4bfdfed0) = 0
ioctl(5, DECODER_GET_CAPABILITIES, 0x7fff4bfdfed0) = 0
ioctl(5, DECODER_GET_STATUS or DEVFSDIOC_SET_EVENT_MASK, 0x7fff4bfe0210) = 0
poll([{fd=4, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=4, revents=POLLOUT}])
writev(4, [{"\212\v\3\0\0\0\0\0\33\0\0\0", 12}, {NULL, 0}, {"", 0}], 3) = 12
poll([{fd=4, events=POLLIN}], 1, -1)    = 1 ([{fd=4, revents=POLLIN}])
read(4, "\1\202\f\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\254\21\1\0\0\0\0\0", 4096) = 32
read(4, 0x1096244, 4096)                = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=4, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=4, revents=POLLOUT}])
writev(4, [{"\212\n\2\0\0\0\0\0", 8}, {NULL, 0}, {"", 0}], 3) = 8
poll([{fd=4, events=POLLIN}], 1, -1)    = 1 ([{fd=4, revents=POLLIN}])
read(4, "\1\0\r\0D\1\0\0\0pO\0\0\0\0\0\0\0\0\0\0\0\0\20\0\34\0\0\20\5\0\0"..., 4096) = 1328
read(4, 0x1096244, 4096)                = -1 EAGAIN (Resource temporarily unavailable)
mmap(NULL, 268435456, PROT_READ|PROT_WRITE, MAP_SHARED, 5, 0x4f7000) = -1 EAGAIN (Resource temporarily unavailable)
ioctl(5, 0x80146454, 0x108be48)         = 0
ioctl(5, 0x80046457, 0x7fff4bfe028c)    = 0
ioctl(5, 0x80046446, 0x7fff4bfe02dc)    = 0
ioctl(5, 0xc03064a6, 0x7fff4bfe0230)    = 0
ioctl(5, 0xc03064a6, 0x7fff4bfe0230)    = 0
ioctl(5, 0xc03064a6, 0x7fff4bfe0230)    = 0
ioctl(5, 0xc03064a6, 0x7fff4bfe0230)    = 0
ioctl(5, 0xc03064a6, 0x7fff4bfe0230)    = 0
ioctl(5, 0xc03064a6, 0x7fff4bfe0230)    = 0
ioctl(5, 0xc03064a6, 0x7fff4bfe0200)    = 0
ioctl(5, 0xc03064a6, 0x7fff4bfe0200)    = 0
ioctl(5, 0xc03064a6, 0x7fff4bfe0210)    = 0
ioctl(5, 0x4004645e, 0x7fff4bfe0314)    = 0
ioctl(5, 0xc03064a6, 0x7fff4bfe0240)    = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++

 

Please tell me if you need further information, and if you know of any workarounds.

 

0 Likes
6 Replies
edwintorok
Journeyman III

FWIW, booting the kernel with 'nopat' on the kernel command line allows FindNumDevices to work.

 

This looks like a bug in the fglrx driver when PAT is enabled in the kernel.
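
To confirm that the running kernel was actually booted with the option, you can look at /proc/cmdline. A minimal Python sketch; the helper names are my own, and it only checks for the bare 'nopat' word on the command line, not whether PAT is otherwise unavailable on the CPU.

```python
def has_nopat(cmdline):
    """Return True if the boot command line contains the bare 'nopat' option."""
    return "nopat" in cmdline.split()

def running_kernel_has_nopat(path="/proc/cmdline"):
    """Check the command line the running Linux kernel was booted with."""
    with open(path) as f:
        return has_nopat(f.read())
```

Calling running_kernel_has_nopat() after a reboot shows whether the bootloader change took effect.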

0 Likes

Thank you for the solution!

I spent hours trying to figure it out.

 

This workaround also fixes the segfault with Catalyst 9.12, kernel 2.6.31, and ATI Stream 2.0.

0 Likes

FWIW, I see the same problem on 2.6.32.4 with the OpenCL samples from the ATI Stream SDK 2.0.

I symlinked the atiocl stuff to /usr/lib/OpenCL/vendors, but I still get:

[35255.967654] HelloCL:11090 conflicting memory types d0000000-e0000000 uncached-minus<->write-combining
[35255.967657] reserve_memtype failed 0xd0000000-0xe0000000, track uncached-minus, req write-combining
[35255.967660] [fglrx:KCL_MEM_VM_MapRegion] *ERROR* remap_pfn_range failed

With 'nopat' on the kernel command line, HelloCL works.

 

However, if I try something more complicated (such as the AES example), it causes a kernel panic (keyboard LEDs blinking, no response to anything, and of course no panic message since I am in X).

I was doing something like:

./AESEncryptDecrypt -x 100 --device gpu

^C

./AESEncryptDecrypt -x 10 --device gpu

 

It panicked there.

 

0 Likes

HelloCL uses the default CPU device.

Did you run the AES example, interrupt it with Ctrl+C, and then run it again? You can't do that. It freezes the whole system if you interrupt the program during kernel execution.
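
If you control the host program, one way to avoid killing it mid-kernel is to hold Ctrl+C back until the GPU call has returned. Below is a minimal Python sketch of that idea. It assumes that letting the kernel finish is enough to avoid the freeze, which nothing in this thread confirms, and defer_sigint is a hypothetical helper, not part of any SDK. It must be used from the main thread.

```python
import signal
from contextlib import contextmanager

@contextmanager
def defer_sigint():
    """Hold back Ctrl+C (SIGINT) until the wrapped block has finished."""
    pending = []

    def remember(signum, frame):
        # Only record the interrupt; do not stop the running block.
        pending.append(signum)

    previous = signal.signal(signal.SIGINT, remember)
    try:
        yield
    finally:
        # Always restore whatever handler was installed before.
        signal.signal(signal.SIGINT, previous)
    if pending:
        # Deliver the interrupt now that the block is done.
        raise KeyboardInterrupt
```

Wrapping the enqueue-and-wait part of the host code in "with defer_sigint():" lets that work complete first, after which the usual KeyboardInterrupt is raised.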

0 Likes

Yes, I interrupted the AES example with Ctrl-C.

If that freezes the entire system, then it is a serious bug in AMD's OpenCL implementation.

I am running a userspace program, which must not be able to panic the kernel or freeze the entire system. This is not Windows.

 

0 Likes

edwin,
This is a known issue that we are working on fixing.
0 Likes