Archives Discussions

matszpk · ‎12-02-2015

Hi. I have run into a severe problem that appears whenever an OpenCL application runs non-stop: the GUI (X server) hangs and stops responding. I am using the KDE Plasma 5 environment on the openSUSE Leap 42.1 distro. The KDE desktop runs under the XRender or OpenGL compositor (with desktop effects enabled). I have a Radeon HD 7850, and it is running under the latest Radeon Crimson 15.11 drivers.

Thanks to remote access, I was able to extract an excerpt of the logs (using journalctl):

Nov 26 16:17:24 gigas sshd[2498]: pam_unix(sshd:session): session opened for user root by (uid=0)

Nov 26 16:19:12 gigas kernel: <6>[fglrx] ASIC hang happened

Nov 26 16:19:12 gigas kernel: CPU: 1 PID: 1439 Comm: X Tainted: P O 4.1.12-1-default #1

Nov 26 16:19:12 gigas kernel: Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Z77-DS3H, BIOS F8 08/21/2012

Nov 26 16:19:12 gigas kernel: 0000000000000000 00000001000129a2 ffffffff81658898 0000000000000000

Nov 26 16:19:12 gigas kernel: ffffffffa048437c 0000000000000000 ffffffffa05523f7 ffff88040aab7c28

Nov 26 16:19:12 gigas kernel: ffffffffa0552356 ffffc90003141620 0000000000000001 ffffc90003140020

Nov 26 16:19:12 gigas kernel: Call Trace:

Nov 26 16:19:12 gigas kernel: [<ffffffff8100559c>] dump_trace+0x8c/0x340

Nov 26 16:19:12 gigas kernel: [<ffffffff8100594c>] show_stack_log_lvl+0xfc/0x1a0

Nov 26 16:19:12 gigas kernel: [<ffffffff81006ea1>] show_stack+0x21/0x50

Nov 26 16:19:12 gigas kernel: [<ffffffff81658898>] dump_stack+0x47/0x67

Nov 26 16:19:12 gigas kernel: [<ffffffffa048437c>] firegl_hardwareHangRecovery+0x1c/0x30 [fglrx]

Nov 26 16:19:12 gigas kernel: [<ffffffffa05523f7>] _ZN4Asic9WaitUntil15ResetASICIfHungEv+0x37/0x40 [fglrx]

Nov 26 16:19:12 gigas kernel: [<ffffffffa0552356>] _ZN4Asic9WaitUntil15WaitForCompleteEv+0xc6/0x130 [fglrx]

Nov 26 16:19:12 gigas kernel: [<ffffffffa054ec95>] _ZN4Asic19PM4ElapsedTimeStampEj14_LARGE_INTEGER12_QS_CP_RING_+0xd5/0x160 [fglrx]

Nov 26 16:19:12 gigas kernel: [<ffffffffa05575d9>] _ZN15ExecutableUnits35flush_all_and_invalidate_HDP_cachesE12_QS_CP_RING_+0xc9/0xf

Nov 26 16:19:12 gigas kernel: [<ffffffffa05574be>] _ZN15ExecutableUnits8ringIdleE12_QS_CP_RING_+0x5e/0xb0 [fglrx]

Nov 26 16:19:12 gigas kernel: [<ffffffffa052cacd>] _Z17uQSPm4SynchronizemP18_QS_SYNC_PACKET_IN+0x4d/0x50 [fglrx]

Nov 26 16:19:12 gigas kernel: [<ffffffffa0527492>] _Z8uCWDDEQCmjjPvjS_+0x652/0x12c0 [fglrx]

Nov 26 16:19:12 gigas kernel: [<ffffffffa051d5ea>] CMMQS_uCWDDEQC+0xa/0x10 [fglrx]

Nov 26 16:19:12 gigas kernel: [<ffffffffa04b225f>] firegl_cmmqs_CWDDE_32+0x36f/0x480 [fglrx]

Nov 26 16:19:12 gigas kernel: [<ffffffffa04b0ace>] firegl_cmmqs_CWDDE32+0x8e/0x140 [fglrx]

Nov 26 16:19:12 gigas kernel: [<ffffffffa047e8d4>] firegl_ioctl+0x1f4/0x260 [fglrx]

Nov 26 16:19:12 gigas kernel: [<ffffffffa046c1ae>] ip_firegl_unlocked_ioctl+0xe/0x20 [fglrx]

Nov 26 16:19:12 gigas kernel: [<ffffffff811f0f4f>] do_vfs_ioctl+0x2ff/0x510

Nov 26 16:19:12 gigas kernel: [<ffffffff811f11e1>] SyS_ioctl+0x81/0xa0

Nov 26 16:19:12 gigas kernel: [<ffffffff8165f032>] system_call_fastpath+0x16/0x75

Nov 26 16:19:12 gigas kernel: [<00007f302d171be7>] 0x7f302d171be7

Nov 26 16:19:12 gigas kernel: pubdev:0xffffffffa123d440, num of device:1 , name:fglrx, major 15, minor 30.

Nov 26 16:19:12 gigas kernel: device 0 : 0xffff880036c6c000 .

Nov 26 16:19:12 gigas kernel: Asic ID:0x6819, revision:0x15, MMIOReg:0xffffc90003080000.

Nov 26 16:19:12 gigas kernel: FB phys addr: 0xe0000000, MC :0xf400000000, Total FB size :0x40000000.

Nov 26 16:19:12 gigas kernel: gart table MC:0xf40f7b8000, Physical:0xef7b8000, size:0x547000.

Nov 26 16:19:12 gigas kernel: mc_node :FB, total 1 zones

Nov 26 16:19:12 gigas kernel: MC start:0xf400000000, Physical:0xe0000000, size:0xfd00000.

Nov 26 16:19:12 gigas kernel: Mapped heap -- Offset:0x0, size:0xf7b4000, reference count:40, mapping count:0,

Nov 26 16:19:12 gigas kernel: Mapped heap -- Offset:0x0, size:0x1000000, reference count:1, mapping count:0,

Nov 26 16:19:12 gigas kernel: Mapped heap -- Offset:0xf7b4000, size:0x4000, reference count:1, mapping count:0,

Nov 26 16:19:12 gigas kernel: Mapped heap -- Offset:0xf7b8000, size:0x548000, reference count:1, mapping count:0,

Nov 26 16:19:12 gigas kernel: mc_node :INV_FB, total 1 zones

Nov 26 16:19:12 gigas kernel: MC start:0xf40fd00000, Physical:0xefd00000, size:0x30300000.

Nov 26 16:19:12 gigas kernel: Mapped heap -- Offset:0x302ee000, size:0x12000, reference count:1, mapping count:0,

Nov 26 16:19:12 gigas kernel: mc_node :GART_USWC, total 4 zones

Nov 26 16:19:12 gigas kernel: MC start:0xff80900000, Physical:0x0, size:0x78000000.

Nov 26 16:19:12 gigas kernel: Mapped heap -- Offset:0x5000000, size:0x1800000, reference count:2, mapping count:0,

Nov 26 16:19:12 gigas kernel: Mapped heap -- Offset:0x3800000, size:0x1800000, reference count:2, mapping count:0,

Nov 26 16:19:12 gigas kernel: Mapped heap -- Offset:0x2000000, size:0x1800000, reference count:17, mapping count:0,

Nov 26 16:19:12 gigas kernel: Mapped heap -- Offset:0x0, size:0x2000000, reference count:27, mapping count:0,

Nov 26 16:19:12 gigas kernel: mc_node :GART_CACHEABLE, total 4 zones

Nov 26 16:19:12 gigas kernel: MC start:0xff50400000, Physical:0x0, size:0x30500000.

Nov 26 16:19:12 gigas kernel: Mapped heap -- Offset:0x6100000, size:0x600000, reference count:2, mapping count:0,

Nov 26 16:19:12 gigas kernel: Mapped heap -- Offset:0x5b00000, size:0x600000, reference count:2, mapping count:0,

Nov 26 16:19:12 gigas kernel: Mapped heap -- Offset:0x3200000, size:0x400000, reference count:2, mapping count:0,

Nov 26 16:19:12 gigas kernel: Mapped heap -- Offset:0x4d00000, size:0x600000, reference count:2, mapping count:0,

Nov 26 16:19:12 gigas kernel: Mapped heap -- Offset:0x4700000, size:0x600000, reference count:2, mapping count:0,

Nov 26 16:19:12 gigas kernel: Mapped heap -- Offset:0x3e00000, size:0x900000, reference count:2, mapping count:0,

Nov 26 16:19:12 gigas kernel: Mapped heap -- Offset:0x3800000, size:0x600000, reference count:2, mapping count:0,

Nov 26 16:19:12 gigas kernel: Mapped heap -- Offset:0x2300000, size:0x900000, reference count:2, mapping count:0,

Nov 26 16:19:12 gigas kernel: Mapped heap -- Offset:0x1d00000, size:0x600000, reference count:2, mapping count:0,

Nov 26 16:19:12 gigas kernel: Mapped heap -- Offset:0x1400000, size:0x900000, reference count:2, mapping count:0,

Nov 26 16:19:12 gigas kernel: Mapped heap -- Offset:0xb00000, size:0x900000, reference count:11, mapping count:0,

Nov 26 16:19:12 gigas kernel: Mapped heap -- Offset:0x200000, size:0x900000, reference count:4, mapping count:0,

Nov 26 16:19:12 gigas kernel: Mapped heap -- Offset:0x0, size:0x200000, reference count:41, mapping count:0,

Nov 26 16:19:12 gigas kernel: Mapped heap -- Offset:0xef000, size:0x11000, reference count:1, mapping count:0,

Nov 26 16:19:12 gigas kernel: mc_node :PEER_FB_GART, total 1 zones

Nov 26 16:19:12 gigas kernel: MC start:0xfff8900000, Physical:0x0, size:0x1000.

Nov 26 16:19:12 gigas kernel: GRBM : 0xa0003028, SRBM : 0x20004ec0 .

Nov 26 16:19:12 gigas kernel: CP_RB_BASE : 0xff809000, CP_RB_RPTR : 0x1a6e0 , CP_RB_WPTR :0x1a780.

Nov 26 16:19:12 gigas kernel: CP_IB1_BUFSZ:0xe0, CP_IB1_BASE_HI:0xff, CP_IB1_BASE_LO:0x80d9d000.

Nov 26 16:19:12 gigas kernel: last submit IB buffer -- MC :0xff80d9d000,phys:0x4058bc000.

Nov 26 16:19:12 gigas kernel: Dump the trace queue.

Nov 26 16:19:12 gigas kernel: End of dump


That problem has occurred many times while I was crunching a BOINC project that uses the GPU (an OpenCL app). Can anybody help solve this severe problem?
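For anyone who wants to check their own journal for the same event, here is a minimal sketch that scans kernel log lines for the fglrx hang report and pairs each one with the process named on the following "Comm:" line. The function name `find_asic_hangs` and the regular expressions are illustrative assumptions based only on the line layout in the excerpt above; other driver versions may log differently.

```python
import re

# Syslog-style prefix as seen in the excerpt: "Nov 26 16:19:12 gigas kernel: ..."
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+kernel:\s+(.*)$")
# The process name follows "Comm:" in the kernel's stack dump header.
COMM_RE = re.compile(r"\bComm:\s+(\S+)")

def find_asic_hangs(lines):
    """Return a list of (timestamp, process_name) for each fglrx hang report.

    process_name is None when no "Comm:" line follows the hang message.
    """
    hangs = []
    pending = None  # timestamp of a hang whose "Comm:" line has not been seen yet
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a kernel message (e.g. sshd), skip it
        stamp, _host, msg = m.groups()
        if "[fglrx] ASIC hang happened" in msg:
            if pending is not None:
                hangs.append((pending, None))
            pending = stamp
        elif pending is not None:
            c = COMM_RE.search(msg)
            if c:
                hangs.append((pending, c.group(1)))
                pending = None
    if pending is not None:
        hangs.append((pending, None))
    return hangs

# A few lines copied from the excerpt above.
SAMPLE = """\
Nov 26 16:17:24 gigas sshd[2498]: pam_unix(sshd:session): session opened for user root by (uid=0)
Nov 26 16:19:12 gigas kernel: <6>[fglrx] ASIC hang happened
Nov 26 16:19:12 gigas kernel: CPU: 1 PID: 1439 Comm: X Tainted: P O 4.1.12-1-default #1
Nov 26 16:19:12 gigas kernel: Call Trace:
""".splitlines()

hangs = find_asic_hangs(SAMPLE)
for stamp, comm in hangs:
    print(f"{stamp}: ASIC hang, reported in process {comm}")
```

The same function could be fed the lines of `journalctl -k` output instead of the embedded sample, which makes it easy to see how often the hang occurs and whether it is always reported from the X process.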

matszpk · ‎12-07-2015

Can anybody help me or explain how to avoid that problem? Any help will be appreciated.

Meteorhead · ‎12-08-2015

That is because currently all GPU drivers work in a way where compute tasks have a higher priority than graphics tasks. If you are running a heavy GPGPU task, desktop rendering may become jerky or even come to a complete halt, because the GPU discards desktop render calls that it has no time to process.

This can be avoided if the task you are running is made up of granular kernels (those that complete in a few milliseconds) or if the application does not flood the GPU with compute tasks. If it is caused by BOINC, then you have no control over what happens (AFAIK) and must not run GPGPU BOINC tasks if you wish to use the computer. This is a limitation that is deeply rooted in all drivers and APIs. Not even the newest APIs support prioritizing tasks, because the GPU scheduler does not support it in HW.

I too feel this is a great limitation and was hoping to see the issue addressed in DX12, Vulkan, OpenCL 2.1, etc., but I saw no indication in any of the provisional specs that a priority value could be assigned to a task issued to the GPU.
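To illustrate what "granular kernels" could look like when you do control the application, here is a minimal sketch that splits one large 1-D launch into smaller chunks. It only computes the chunk boundaries; in a real OpenCL host program each (offset, size) pair would be passed as the global_work_offset and global_work_size arguments of clEnqueueNDRangeKernel, with a flush or finish between chunks so the GPU gets a chance to service the desktop. The helper name `split_ndrange` and the example sizes are hypothetical, and how small a chunk must be to keep the desktop responsive depends on the kernel and the hardware.

```python
def split_ndrange(global_size, chunk_size, local_size):
    """Yield (offset, size) pairs that cover [0, global_size) in chunks.

    Every chunk is a whole number of work-groups, so each size stays
    divisible by local_size.
    """
    if global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of local_size")
    # Round the requested chunk down to whole work-groups (at least one).
    chunk = max(local_size, (chunk_size // local_size) * local_size)
    offset = 0
    while offset < global_size:
        size = min(chunk, global_size - offset)
        yield offset, size
        offset += size

# Example: 1024 work-items, work-groups of 64, at most about 300 items per launch.
chunks = list(split_ndrange(global_size=1024, chunk_size=300, local_size=64))
for offset, size in chunks:
    print(f"enqueue chunk: global_work_offset={offset}, global_work_size={size}")
```

Because get_global_id() includes the global offset, a kernel that indexes its data with it should not need changes to run in chunks, although that is worth verifying for each kernel.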

Hope I was clear enough.

bsp2020 · ‎12-08-2015

I thought the task-switching limitation is a hardware issue and not something the API/driver can change. Isn't that one of the main purposes of AMD developing HSA (Requirement: Kernel agent context switching), so that GPGPU tasks can be deployed to any server?

As far as I know, VI architecture devices (Carrizo, Tonga, Fiji) are the first GPU architecture to support general context switching. Kaveri supported limited context switching.

Meteorhead · ‎12-08-2015

Forgive me, indeed HSA is capable of doing it, but it does not really benefit the end user as long as there is no way of controlling it programmatically. HSAIL can currently be generated from C++ AMP and OpenCL 2.0. There is an LLVM -> HSAIL converter, and GCC is about to be able to emit HSAIL; GCC also has OpenMP 4.0 and OpenACC front ends. Clang has a fork that compiles CUDA to PTX, and because it builds atop LLVM, in theory CUDA -> HSAIL is also doable. The new HCC initiative will also be able to do that.

Now which of these APIs allows me to pass an integer priority value to clEnqueueNDRangeKernel, for instance? None of them. Neither do the graphics APIs. Even though there were lots of requests for Windows 10 to show GPU usage in Task Manager, it is not implemented. I am not familiar with WDDM 2.0, but I doubt there is a way to prioritize tasks.

HSA does define a set of darn useful QoS features, but they are really of no use if they remain an implementation detail, something like GDS (Global Data Share) since the HD5000 series. It could have been put to such good use in my simulations, but it's inaccessible. It accelerates global atomics in an implicit manner. I have a feeling that HSA features will remain that way too. There will be no API in which you could use it to its full extent. The reason is that on the PC side of the world, Intel and Nvidia have no interest in supporting HSA. As long as this is true, no PC-compatible API will be polluted with capabilities that can only be implemented on AMD HW.

We just have to live with it, knowing that the HW is capable of soooo much more than what end-users actually see.

bsp2020 · ‎12-08-2015

I'm just speculating here, but I thought task priority management is the responsibility of the OS. So I think it is natural to expect that when the HSA driver is released (see the "Linux Driver and Runtime Focused on the Needs of HPC Cluster-Class Computing" section of http://www.amd.com/en-us/press-releases/Pages/boltzmann-initiative-2015nov16.aspx), task management (including priority setting) will be handled by the OS. If it is supported, I don't expect it to be through programming APIs/environments such as OpenMP/OpenCL/OpenACC.

I think AMD is concentrating on the Linux HSA software stack first. I would not expect Windows support until well after the Linux stack is completed. The last mention of HSA stuff for Windows I remember is this post (https://community.amd.com/message/1303153#1303153 see my reply to sarobi), and I think proper HSA support is not something you can retrofit into an OS using a driver, which is why I believe they dropped their effort to support it in Windows at the time. Also, I think HSA can bring more immediate benefits to the HPC market. Consumer market benefits will take longer to materialize.

I find the Boltzmann Initiative very interesting. It will make it easy to port HPC applications to run on AMD hardware and really remove the major barrier to entry into the HPC market. I think AMD did not have the resources to port all the HPC applications themselves and had a tough time finding partners who were willing to port the software to the AMD platform using OpenCL. As far as I can tell, the Boltzmann Initiative should be able to remove that barrier in one fell swoop.

A few interesting facts I found. I'm speculating with my rose colored glasses on. So, take these with a large grain of salt.

1. From http://www.amd.com/en-us/press-releases/Pages/boltzmann-initiative-2015nov16.aspx, New Tools Target an Unprecedented 28 Teraflops of Processing at Less Than a Kilowatt by 2016. An early access program for the "Boltzmann Initiative" tools is planned for Q1 2016.

2. From multicoreware / hcc / wiki / Home — Bitbucket, HSAIL and BRIG for HSA devices: AMD Kaveri APU, AMD Fiji dGPU

So, I think Fiji is capable of 2:1 double precision and AMD will use 8 of them to release a 28 TFLOPS double-precision box that uses less than 1 kW, because

1. R9 Nano is capable of about 8 single-precision TFLOPS at 175 W, running at up to 1000 MHz

2. Running Fiji at around 850 MHz will reduce the performance to about 7 single-precision TFLOPS and should bring the power consumption down further.

3. Putting two Fiji GPUs on a single board will also help reduce power consumption.

So I think it's possible that AMD can release a dual-Fiji FirePro board that runs at 850 MHz and consumes less than 250 W. If Fiji is capable of a 2:1 single/double precision performance ratio, AMD can release 28 TFLOPS in a single box that consumes less than 1 kW using Fiji. If AMD had to use their next-generation GPU, which is supposed to improve performance/watt by a further 2X, to release a 28 TFLOPS box, they wouldn't be able to meet the 2016 time frame. So I'm pretty sure that Fiji is capable of a 2:1 single/double precision performance ratio. The only thing I need to figure out is whether Fiji can work with HBM2 or not. That 4GB memory size limit is the only thing that will hold it back if Fiji can't support HBM2. Or maybe Hynix will release 32Gbit HBM1 memory so that Fiji can use 16GB of VRAM.

It is also probable that with the Boltzmann Initiative/HSA framework, users will not suffer from the GPU locking up when working with long-running kernels, allowing it to be deployed widely in the cloud. ISC 2016 should be an interesting event to watch.
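The arithmetic behind this estimate can be sketched in a few lines. The 2:1 single/double precision ratio, the 8 GPUs, and the 250 W per dual-GPU board are the speculative assumptions from the post, not confirmed specifications. The shader count of 4096 and the convention of 2 floating-point operations per shader per clock (one fused multiply-add) are added here to reproduce the "about 8" and "about 7" TFLOPS figures.

```python
STREAM_PROCESSORS = 4096     # Fiji shader count (figure added here, not in the post)
FLOPS_PER_SP_PER_CLOCK = 2   # one fused multiply-add counts as two operations

def sp_tflops(clock_mhz):
    """Peak single-precision TFLOPS of one GPU at the given clock."""
    return STREAM_PROCESSORS * FLOPS_PER_SP_PER_CLOCK * clock_mhz * 1e6 / 1e12

# Speculative assumptions taken from the post above.
DP_RATIO = 0.5        # assumed 2:1 single/double precision ratio
GPUS = 8              # eight Fiji GPUs in one box
GPUS_PER_BOARD = 2    # dual-Fiji boards
BOARD_WATTS = 250     # assumed upper bound per board

sp_1000 = sp_tflops(1000)                              # clock of the 175 W card
sp_850 = sp_tflops(850)                                # reduced clock
box_dp = GPUS * sp_850 * DP_RATIO                      # double-precision total
box_watts = (GPUS // GPUS_PER_BOARD) * BOARD_WATTS     # GPU boards only

print(f"SP per GPU at 1000 MHz: {sp_1000:.2f} TFLOPS")
print(f"SP per GPU at 850 MHz:  {sp_850:.2f} TFLOPS")
print(f"DP for {GPUS} GPUs at 2:1:    {box_dp:.2f} TFLOPS")
print(f"Board power budget:     {box_watts} W")
```

With the post's rounded figure of 7 TFLOPS per GPU, the total is 8 x 7 x 0.5 = 28 TFLOPS. The power figure covers only the four GPU boards, so the host system would have to fit within whatever margin the boards leave under 1 kW.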

Brian

realhet · ‎12-08-2015

>"We just have to live with it, knowing that the HW is capable of soooo much more than what end-users actually see."

We don't have to, as we have GCN asm. IMO it has been a stable platform since 2012; you only have to make sure that you have the right driver version for the binary, and it works reliably.

And unlike Evergreen, there is much more in it than the GDS.

matszpk · ‎12-08-2015

Thank you for the advice. I switched to the IGP in the CPU. I simply ran two X servers: one for the desktop and a second for GPGPU. Of course, OpenCL doesn't need an X environment; however, one GPU BOINC application had problems with the OpenCL environment prepared that way.

As for the problem: the logs point to the driver/kernel area. I was very often crunching on the GPU and working in the GUI simultaneously without any problems on an older openSUSE release (13.2) with older Catalyst drivers. I was aware that the desktop could freeze or hang when some OpenCL application ran a long kernel; however, I encountered such problems rarely.

I would like to get help from AMD to solve that problem. I assume it is a driver/kernel bug or an X server bug.

matszpk · ‎01-12-2016

Is there any progress in resolving that problem? Have any fixes been made in the driver? Please notify me about any progress. Any help will be appreciated.

matszpk · ‎03-21-2016

The driver hang has occurred again, this time with only an OpenCL application running and no X servers running (Crimson 15.12 driver). I now have 2 graphics cards: the old HD 7850 and a new R7 360 (Bonaire), and both are crunching. I am crunching the moowrapper (distributed.net) project using its OpenCL app. I was encountering similar problems while crunching with X servers running (one client hangs, or the X servers themselves report problems). Of course, the temperature of the two graphics cards is higher (the old card runs at 65-70 degrees, the new one at close to 80). However, I suspect it is a driver problem. My clgpustress didn't fail during 10 minutes of stress testing both graphics cards.


GUI is hanging up when an OpenCL app is heavily working.