6 Replies Latest reply on Apr 18, 2018 2:57 AM by fsadough

    S9100: gpu-z not %100 functional + problems

    beemerrbiker

      Using driver 1800.12 and having problems.

      First off, GPU-Z is not showing temperature or clock, though the card appears to be working fine.

      s9100tmp.png

      Secondly, I seem to be having problems running more than one task concurrently. About 1 out of every 4 tasks fails when running 8 concurrent tasks. Another user with an S9150 (different driver, Windows 8.1) has no failures running the same tasks. When I reduce the number of concurrent tasks, the percentage of failed tasks decreases. It is not clear to me what the problem is, and I was wondering if there is a diagnostic program that I can run. I am not an expert on AMD and OpenCL. I have some experience with CUDA, which comes with some diagnostic tools. Is there a user-friendly diagnostic I can download for AMD and OpenCL? I am not interested in the development system unless it has a diagnostic already built. I have Windows 10 x64 and the "free" Visual Studio package.

      Thanks for looking!

        • Re: S9100: gpu-z not %100 functional + problems
          fsadough

          The FirePro S9100 is a server GPU. What is your server model and spec? I am not familiar with the 1800.12 driver. Could you provide the link you downloaded it from?

          • Re: S9100: gpu-z not %100 functional + problems
            beemerrbiker

            I put the S9100 in an older (Core 2 Quad) Compaq that has built-in video, as the S9100 had no video output even though a mini DP connector is present. I have three S9000s that work fine using Adrenalin 17.12, one of which is shown below.

            When booting Windows 10 x64, I let Microsoft install what it thought was best for the S9100, and it put in 1800.8:

            • Using SSE4.1 path
            • Found 1 platform
            • Platform 0 information:
            •   Name:       AMD Accelerated Parallel Processing
            •   Version:    OpenCL 2.0 AMD-APP (1800.8)
            •   Vendor:     Advanced Micro Devices, Inc.

            I had problems, as mentioned in the first post, so I went to the AMD site and manually selected the "FirePro S Series" driver:

            amd2.png

            That gave me "15.201.2401", which was not any better, as it reported 1800.12:

            • Using SSE4.1 path
            • Found 1 platform
            • Platform 0 information:
            •   Name:       AMD Accelerated Parallel Processing
            •   Version:    OpenCL 2.0 AMD-APP (1800.12)
            •   Vendor:     Advanced Micro Devices, Inc.

            The above drivers are dated 2015.

            However, when looking at the release notes for the W9100 driver, I noticed that it also supports the FirePro S series (quote: "+all other AMD FirePro™ S series products"). I downloaded and installed 18.Q1.1, and GPU-Z finally started working. I can now also monitor temperatures using TThrottle.

            9100stats3.png

            There are a few strange things, such as the VDDC voltage occasionally jumping to over 800,000, but that is likely a bug in how GPU-Z samples, or maybe latches, the sensor. I also noticed that the GPU clock varies considerably. Perhaps the CPU is not feeding data to the video board fast enough?

            The driver version numbering is not consistent. I expected "18" but got "2527.9" instead. It is dated 3/22/2018, so it is much newer than the 2015 drivers that the manual S9100 search turned up.

            • Using SSE4.1 path
            • Found 1 platform
            • Platform 0 information:
            •   Name:       AMD Accelerated Parallel Processing
            •   Version:    OpenCL 2.1 AMD-APP (2527.9)
            •   Vendor:     Advanced Micro Devices, Inc.
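
            Since the version numbering is confusing, here is a minimal Python sketch of how the three platform version strings quoted in this post could be pulled apart and compared. It assumes the layout "OpenCL <major>.<minor> AMD-APP (<build>.<revision>)" and assumes that the number in parentheses orders like a pair of integers; neither is documented here, and the function name `parse_version` is only illustrative.

```python
import re

# Version strings copied from the three platform listings in this post.
VERSION_STRINGS = [
    "OpenCL 2.0 AMD-APP (1800.8)",
    "OpenCL 2.0 AMD-APP (1800.12)",
    "OpenCL 2.1 AMD-APP (2527.9)",
]

# Assumed layout: "OpenCL <major>.<minor> AMD-APP (<build>.<revision>)".
PATTERN = re.compile(r"OpenCL (\d+)\.(\d+) AMD-APP \((\d+)\.(\d+)\)")


def parse_version(text):
    """Return ((opencl_major, opencl_minor), (build, revision)), or None if the text does not match."""
    match = PATTERN.fullmatch(text.strip())
    if match is None:
        return None
    cl_major, cl_minor, build, revision = (int(g) for g in match.groups())
    return (cl_major, cl_minor), (build, revision)


parsed = [parse_version(s) for s in VERSION_STRINGS]

# Compare builds as integer tuples (assumption), not as decimals:
# read as a decimal, 1800.8 would wrongly sort above 1800.12.
newest = max(parsed, key=lambda p: p[1])

for opencl, build in parsed:
    print(f"OpenCL {opencl[0]}.{opencl[1]}  AMD-APP build {build[0]}.{build[1]}")
print("newest build:", newest[1])
```

            The parenthesized number is the OpenCL runtime build, which is a different scheme from the package names such as "15.201.2401" or "18.Q1.1", so a mismatch between the two is expected.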

             

            While GPU-Z is now working, I still have a problem running more than one task concurrently and am looking into it. The board seems to work perfectly if I run only one task. The more concurrent tasks I run, the higher the percentage of failed work units. I was mistaken about the other user having no problems, but the number of failed tasks when he ran 5 concurrent tasks is insignificant compared to what I had when running 20 concurrent tasks. Each task models possible movements of stars in the Milky Way galaxy, as sort of explained here: http://milkyway.cs.rpi.edu/milkyway/science.php

            The app makes extensive use of double precision arithmetic and can only run on graphics boards supporting DP. I have processed over 40,000 units at 4 concurrent tasks per HD7950 and 5 per S9000 and had only 2 failures. I got 90 failures within an hour running 20 concurrent tasks on my S9100. This should not have happened. Currently I am running only 1 task at a time on the S9100 and there are no failures. I am not sure what the problem is, but I suspect the driver, as all these boards are Tahiti class processors.
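
            To put the numbers in this thread side by side, here is a rough Python sketch of the failure-rate arithmetic. It uses only figures quoted in the posts: 2 failures in roughly 40,000 units on the HD7950 and S9000 boards, and about 1 failure in 4 tasks on the S9100 at 8 concurrent tasks (reported under the earlier driver). The helper `failure_rate` is illustrative. The 90 failures in an hour at 20 concurrent tasks cannot be turned into a rate because the total number of tasks run in that hour is not given.

```python
def failure_rate(failed, total):
    """Return the fraction of tasks that failed; total must be positive."""
    if total <= 0:
        raise ValueError("total must be positive")
    return failed / total


# 2 failures in a bit over 40,000 units on the HD7950 and S9000 boards.
# "Over 40,000" means the true rate is slightly lower than this.
baseline = failure_rate(2, 40_000)

# Roughly 1 task in 4 failed on the S9100 at 8 concurrent tasks (first post).
s9100_at_8 = failure_rate(1, 4)

print(f"baseline failure rate:  {baseline:.4%}")
print(f"S9100 at 8 concurrent:  {s9100_at_8:.0%}")
print(f"ratio:                  {s9100_at_8 / baseline:,.0f}x")
```

            A gap of this size is far outside normal run-to-run variation, which is consistent with the suspicion that something specific to the S9100 setup is at fault.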