14 Replies Latest reply on Mar 25, 2016 1:34 AM by nibal

    Memory corruption in latest crimson driver 15.302?

    nibal

      Using Ubuntu 14.04 and valgrind:

       

      ==00:00:01:30.014 4949== Invalid write of size 8

      ==00:00:01:30.014 4949== at 0x4C2F5F3: memcpy@GLIBC_2.2.5 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)

      ==00:00:01:30.014 4949== by 0xB3B6154: ??? (in /usr/lib/libamdocl64.so)

      ==00:00:01:30.014 4949== by 0xB3B899A: ??? (in /usr/lib/libamdocl64.so)

      ==00:00:01:30.014 4949== by 0xB3BB911: ??? (in /usr/lib/libamdocl64.so)

      ==00:00:01:30.015 4949== by 0xB3C5F98: ??? (in /usr/lib/libamdocl64.so)

      ==00:00:01:30.015 4949== by 0xB3C6667: ??? (in /usr/lib/libamdocl64.so)

      ==00:00:01:30.015 4949== by 0xB3C6838: ??? (in /usr/lib/libamdocl64.so)

      ==00:00:01:30.015 4949== by 0xB329CFB: ??? (in /usr/lib/libamdocl64.so)

      ==00:00:01:30.015 4949== by 0xB35182C: ??? (in /usr/lib/libamdocl64.so)

      ==00:00:01:30.015 4949== by 0xB351BD6: ??? (in /usr/lib/libamdocl64.so)

      ==00:00:01:30.015 4949== by 0xB2F2DAC: ??? (in /usr/lib/libamdocl64.so)

      ==00:00:01:30.015 4949== by 0xB2F312C: ??? (in /usr/lib/libamdocl64.so)

      ==00:00:01:30.015 4949== by 0xB29115E: ??? (in /usr/lib/libamdocl64.so)

      ==00:00:01:30.015 4949== by 0xB30115B: ??? (in /usr/lib/libamdocl64.so)

      ==00:00:01:30.015 4949== by 0x60BA181: start_thread (pthread_create.c:312)

      ==00:00:01:30.015 4949== by 0x63CA47C: clone (clone.S:111)

      ==00:00:01:30.015 4949== Address 0x7f126ed63000 is not stack'd, malloc'd or (recently) free'd

       

      Could be a false positive, but I'm getting some unexplained crashes:(

        • Re: Memory corruption in latest crimson driver 15.302?
          nibal

          Actually this is much worse than I thought. This is real. The same corruption existed in Catalyst 15.201, 15.101 and everything in between. Not only did it destabilize the ocl part of the program, it also affected everything else it came in contact with in the same program. Please fix urgently. Is there a place to download older Catalyst releases?

          I will have to comment out all ocl parts and stop linking to the libraries until it is fixed.

              • Re: Memory corruption in latest crimson driver 15.302?
                gstoner

                Hi Nibal

                  I am having the team look into this. I will get back to you by the end of the week.

                 

                Greg

                  • Re: Memory corruption in latest crimson driver 15.302?
                    nibal

                    Hi Greg,

                     

                    And thanks for helping out.

                    This is a tough corruption to track. Since it is very reproducible in my system, I will try to limit it to specific ocl calls and update ticket.

                     

                    BR

                    Nikos

                      • Re: Memory corruption in latest crimson driver 15.302?
                        gstoner

                        What I need is: which motherboard, processor, system BIOS version and GPU you have (and, if possible, the VBIOS number for the GPU), and which OS and version (for Linux, the kernel version) you are running. Also, if you have a test app that causes the issue, please get it to us.

                         

                        greg

                          • Re: Memory corruption in latest crimson driver 15.302?
                            nibal

                            My info so far:

                             

                            Motherboard: Gigabyte Technology Co., Ltd. 970A-UD3P

                            BIOS: UEFI DualBIOS, American Megatrends Inc. version: F1

                            CPU: AMD FX(tm)-8320 Eight-Core Processor, @1.4 GHz

                            GPU: AMD Radeon (TM) R9 270, Pitcairn, Curacao Pro, Platform ID: 0x7f7227b45a18 (as reported by clinfo)

                            OS: Ubuntu 14.04 x64, 3.13.0-49 generic

                            ocl SDK: 3.0, working ocl 1.2

                             

                            Working on test app (Need to reboot).

                             

                            BR

                            Nikos

                              • Re: Memory corruption in latest crimson driver 15.302?
                                nibal

                                Using printfs and the valgrind output, I was able to bracket the invalid write between the NDRangeKernel call and completion of the kernel.

                                But here is the catch: it happens only the first time the kernel is executed.

                                My kernel is a slightly modified version of the kernel in your FFT sample.

                                Unfortunately, validation of your FFT sample will take more time. Each time I run it through valgrind, it crashes my PC. I do not crash my PC when running FFT alone, but I do not run it for long, so the corruption may not show. I will have to compile the latest valgrind from sources and retest.

                                The pattern suggests that this is not specific to the kernel itself (otherwise it would appear on every kernel pass), but general to the kernel mechanism. I hope it can be reproduced with any kernel. I'm compiling with defaults (ocl 1.2).

                                 

                                BR,

                                Nikos

                                  • Re: Memory corruption in latest crimson driver 15.302?
                                    nibal

                                    It doesn't show in your FFT sample. Will have to create a test app with my kernel

                                      • Re: Memory corruption in latest crimson driver 15.302?
                                        nibal

                                        Hi Greg,

                                         

                                        Please use the attached fft.tgz to recreate the problem. val.out includes 2 more invalid reads that were not in the original valgrind report; you might want to check on them, too. These contain the full stack trace. Instructions for recreating the bug:

                                         

                                        -> tar -xzvf fft.tgz      // This will create a directory fft/ with the sources

                                        -> cd fft

                                        -> make db

                                        -> fft                    // Optional. This terminates with a core dump on my system. Be careful: on yours it could crash your PC

                                        -> make clean

                                        -> make db

                                        -> script

                                        -> valgrind fft           // Best to use the latest valgrind, 3.11.0, built from sources; otherwise it might crash your PC. Can be interrupted with <ctrl-C>, but in my case it core dumps before I get the chance to, and generates vgcore.<pid>

                                        -> exit                   // Ends the script session

                                         

                                        Let me know if you can recreate problem.

                                         

                                        TIA

                                        Nikos

                                          • Re: Memory corruption in latest crimson driver 15.302?
                                            german

                                            1. The freqs array reallocation in the code looks broken: realloc() is given an element count rather than a size in bytes. The code below:

                                                freqs[fidx].hz = sig[i].hz;
                                                freqs[fidx++].ts = ts;
                                                if (fidx >= maxfreqs) {
                                                    maxfreqs += 16;
                                                    freqs = realloc(freqs, maxfreqs);
                                                }

                                            should be something like:

                                                if (fidx == (maxfreqs-1)) {
                                                    maxfreqs += 16;
                                                    freqs = realloc(freqs, maxfreqs * sizeof(freq_t));
                                                }
                                                freqs[fidx].hz = sig[i].hz;
                                                freqs[fidx++].ts = ts;

                                            2. You call run_fft() with pass=8, and that accesses the cl_event ndr that was already destroyed on pass=7.

                                            You destroy ndr here (pass=7):

                                                if (pass == MAXPASS - 1) {
                                                    if ((err = waitForEventAndRelease(&ndr)) != SUCCESS)

                                            Then you access the destroyed object and corrupt memory here (pass=8):

                                                if (pass && (err = waitForEventAndRelease(&ndr)) != SUCCESS)
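
The two points above can be illustrated with a minimal sketch. This is not the code from fft.tgz: `freq_list`, `freq_list_push`, `mock_event` and `wait_and_release_once` are hypothetical names, the `double` field types of `freq_t` are assumed (the thread only shows the fields `hz` and `ts`), and `mock_event` is a plain struct standing in for a `cl_event` so the example builds without OpenCL. It shows realloc() being given a byte count, and a release helper that clears the caller's handle so a later pass cannot touch a released event.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed element layout; only the field names hz and ts come from the thread. */
typedef struct {
    double hz;
    double ts;
} freq_t;

/* Hypothetical growable array wrapper (not part of the original test app). */
typedef struct {
    freq_t *items;
    size_t  count;    /* elements in use */
    size_t  capacity; /* elements allocated */
} freq_list;

/* Append one entry, growing by 16 elements when the array is full.
 * Returns 0 on success, -1 if realloc fails (the list stays valid). */
static int freq_list_push(freq_list *l, double hz, double ts)
{
    assert(l != NULL);
    if (l->count == l->capacity) {
        size_t new_cap = l->capacity + 16;
        /* realloc takes a size in BYTES, so scale by the element size. */
        freq_t *tmp = realloc(l->items, new_cap * sizeof(freq_t));
        if (tmp == NULL)
            return -1;
        l->items = tmp;
        l->capacity = new_cap;
    }
    l->items[l->count].hz = hz;
    l->items[l->count].ts = ts;
    l->count++;
    return 0;
}

/* Hypothetical stand-in for a reference-counted event handle. */
typedef struct {
    int refs;
} mock_event;

/* Release the event and clear the caller's handle.
 * Returns 1 if an event was released, 0 if the handle was already empty,
 * so a second call on the same handle is harmless. */
static int wait_and_release_once(mock_event **ev)
{
    if (ev == NULL || *ev == NULL)
        return 0;
    (*ev)->refs--;
    if ((*ev)->refs == 0)
        free(*ev);
    *ev = NULL;
    return 1;
}
```

With a helper like this, the pass=8 call would see an empty handle and skip the wait instead of dereferencing freed memory. Whether that matches the intended control flow of run_fft() is something only the original code can tell.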

                                              • Re: Memory corruption in latest crimson driver 15.302?
                                                nibal

                                                Hi,

                                                 

                                                Oops, sorry about that. These 2 issues, which caused the additional invalid reads, were artifacts of the test file. I should have checked it more carefully before shipping it out, but it was quite complex. However, fixing them still leaves the original memory corruption. I imagine you must have recreated it by now.

                                                 

                                                Thank you for your feedback,

                                                Nikos

                                                  • Re: Memory corruption in latest crimson driver 15.302?
                                                    gstoner

                                                    When the team fixed these two issues, the corruption was no longer there.

                                                     

                                                    It only crashed with the two issues present.

                                                     

                                                    Greg

                                                     


                                                      • Re: Memory corruption in latest crimson driver 15.302?
                                                        nibal

                                                        These were artifacts of the test file as noted initially. I'm still getting the original problem:

                                                         

                                                        ==00:00:00:49.169 4191== Invalid write of size 8
                                                        ==00:00:00:49.170 4191== at 0x4C2F0F3: memcpy@GLIBC_2.2.5 (vg_replace_strmem.c:1017)
                                                        ==00:00:00:49.170 4191== by 0x68E8154: ??? (in /usr/lib/libamdocl64.so)
                                                        ==00:00:00:49.170 4191== by 0x68EA99A: ??? (in /usr/lib/libamdocl64.so)
                                                        ==00:00:00:49.170 4191== by 0x68ED911: ??? (in /usr/lib/libamdocl64.so)
                                                        ==00:00:00:49.170 4191== by 0x68F7F98: ??? (in /usr/lib/libamdocl64.so)
                                                        ==00:00:00:49.170 4191== by 0x68F8667: ??? (in /usr/lib/libamdocl64.so)
                                                        ==00:00:00:49.170 4191== by 0x68F8838: ??? (in /usr/lib/libamdocl64.so)
                                                        ==00:00:00:49.170 4191== by 0x685BCFB: ??? (in /usr/lib/libamdocl64.so)
                                                        ==00:00:00:49.170 4191== by 0x688382C: ??? (in /usr/lib/libamdocl64.so)
                                                        ==00:00:00:49.170 4191== by 0x6883BD6: ??? (in /usr/lib/libamdocl64.so)
                                                        ==00:00:00:49.170 4191== by 0x6824DAC: ??? (in /usr/lib/libamdocl64.so)
                                                        ==00:00:00:49.170 4191== by 0x682512C: ??? (in /usr/lib/libamdocl64.so)
                                                        ==00:00:00:49.170 4191== by 0x67C315E: ??? (in /usr/lib/libamdocl64.so)
                                                        ==00:00:00:49.170 4191== by 0x683315B: ??? (in /usr/lib/libamdocl64.so)
                                                        ==00:00:00:49.170 4191== by 0x4E3F181: start_thread (pthread_create.c:312)
                                                        ==00:00:00:49.170 4191== by 0x565C47C: clone (clone.S:111)
                                                        ==00:00:00:49.170 4191== Address 0x7fbf95aaf000 is not stack'd, malloc'd or (recently) free'd

                                                         

                                                         

                                                        Is this a false positive? Do you not see it in your valgrind?

                                                         

                                                         

                                                        BR,

                                                        Nikos