
OpenCL

Adept I

W8100 is slower in x64 executable - uses more VGPRs and SGPRs

Hello everybody,

I am developing an OpenCL ODE solver library and have run into the following problem. I hope someone can provide some input.

My system (Windows 8.1 x64) has a FirePro W8100 (first PCI slot) and a 7970 GHz Edition (second PCI slot). I have installed the latest driver for each card.

W8100 : 14.502.1019-WHQL-FirePro-WindowsX32X64

7970 : AMD-Catalyst-15.6-Beta-Software-Suite-Win8.1-64Bit-June22

The OpenCL driver for both devices is 1642.5 (OpenCL 1.2).

I am using Visual Studio 2013 Pro to compile my OpenCL programs. I compile the programs for x86 and x64 targets.

When I run the executables on the 7970, the x86 and x64 versions behave identically: the execution time is the same, and CodeXL shows that the kernel uses the same number of VGPRs and SGPRs.

The problem appears when I run the executables on the W8100: the x64 version takes considerably more time to run, and the kernel uses more VGPRs and SGPRs.

Any idea why this happens and how I can fix it?

Many thanks!!!

CodeXL screenshots:

7970_x86 https://www.dropbox.com/s/wktzusnnltg4nee/7970_x86.png?dl=0

7970_x64 https://www.dropbox.com/s/sr40dwx61s0v7c8/7970_x64.png?dl=0

W8100_x86 https://www.dropbox.com/s/s312pflf9puoszc/W8100_x86.png?dl=0

W8100_x64 https://www.dropbox.com/s/5ulngea7retxxgy/W8100_x64.png?dl=0

Accepted Solutions
Adept I

Dipak suggested compiling the kernel with the -cl-std=CL2.0 option in the x64 build, which can reduce the number of registers used.

Many thanks Dipak!
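
For readers who want to try the same workaround, here is a minimal sketch of how the option might be applied only to the 64-bit build. It assumes the standard spelling of the option, `-cl-std=CL2.0`, and that the driver accepts it for the target device. The helper name `kernel_build_options` is hypothetical and does not come from the thread; the string it returns is meant to be passed as the `options` argument of `clBuildProgram`.

```c
#include <stddef.h>

/* Hypothetical helper (not from the thread): choose the kernel build options
   from the host pointer size, so that only the 64-bit executable asks the
   OpenCL compiler for OpenCL C 2.0. */
static const char *kernel_build_options(size_t host_pointer_bytes)
{
    return host_pointer_bytes == 8 ? "-cl-std=CL2.0" : "";
}

/* Intended use with the OpenCL host API (shown as a comment so that this
 * sketch compiles without the OpenCL SDK):
 *
 *   const char *opts = kernel_build_options(sizeof(void *));
 *   cl_int err = clBuildProgram(program, 1, &device, opts, NULL, NULL);
 *   if (err != CL_SUCCESS) {
 *       // inspect CL_PROGRAM_BUILD_LOG via clGetProgramBuildInfo
 *   }
 */
```

If the driver does not support OpenCL C 2.0 for the device, the build is expected to fail, so checking the build log after `clBuildProgram` is worthwhile.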

14 Replies
Staff

Welcome! I have whitelisted you and moved this to the OpenCL forum.

Adept I

Thank you very much jtrudeau!

Adept I

I ran some more tests using the latest driver and an older one. The table below shows that with the latest driver the runtime (in seconds) and the number of VGPRs and SGPRs are higher for the x64 executable.

   

                            x86                                           x64
Driver version              runtime (s)  VGPRs  SGPRs  waves  occupancy   runtime (s)  VGPRs  SGPRs  waves  occupancy
13.352.1014  1.2-1411.4     24.126       160    32     4      10%         23.743       160    32     4      10%
14.502.1019  2.0-1642.5     23.832       92     30     4      20%         98.82        195    59     4      10%
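
The occupancy values reported above are consistent with VGPR usage being the limiting resource. As a rough sketch, assuming the usual GCN limits (256 VGPRs per SIMD lane, allocated in blocks of 4, and at most 10 wavefronts per SIMD; none of these numbers are stated in the thread), occupancy can be estimated from the VGPR count alone. The function names are hypothetical, and SGPR and LDS limits, which a profiler also takes into account, are ignored here.

```c
#include <assert.h>

/* Assumed GCN limits (not taken from the thread). */
enum {
    VGPRS_PER_SIMD     = 256, /* VGPRs available to each lane of a SIMD */
    VGPR_GRANULARITY   = 4,   /* VGPRs are allocated in blocks of 4 */
    MAX_WAVES_PER_SIMD = 10   /* hardware cap on resident wavefronts */
};

/* Wavefronts per SIMD permitted by VGPR usage alone. */
static int waves_per_simd_from_vgprs(int vgprs)
{
    assert(vgprs >= 0);
    if (vgprs == 0)
        return MAX_WAVES_PER_SIMD;
    /* Round the request up to the allocation granularity. */
    int allocated = (vgprs + VGPR_GRANULARITY - 1) / VGPR_GRANULARITY
                    * VGPR_GRANULARITY;
    int waves = VGPRS_PER_SIMD / allocated;
    return waves > MAX_WAVES_PER_SIMD ? MAX_WAVES_PER_SIMD : waves;
}

/* Occupancy as a percentage of the maximum wavefronts per SIMD. */
static int occupancy_percent_from_vgprs(int vgprs)
{
    return 100 * waves_per_simd_from_vgprs(vgprs) / MAX_WAVES_PER_SIMD;
}
```

Under these assumptions a kernel using 160 VGPRs gets 10% occupancy and one using 92 VGPRs gets 20%, which matches the values reported for those register counts. Any kernel that needs more than 128 VGPRs is limited to a single wavefront per SIMD.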

Staff

Thanks for reporting this.

I am not sure whether your issue is connected to the one reported in "Catalyst 14.12 OpenCL problems". At that time, we verified that issue only for the x64 version (OS: Windows 7 64-bit), not for x86, so I cannot say anything about the x86 numbers. If the two issues are linked, your x86 information will be useful for that one as well.

I suggest trying the following combinations and sharing your observations. The results will help narrow down the cause.

1) Test one card at a time.

2) Reverse the order of the cards in the multi-GPU setup, i.e. Tahiti as the first GPU and the FirePro as the second.

3) Try some other Catalyst versions, for example 14.9.

Depending on what you observe, we may need a reproducible test case to investigate the issue at our end.

BTW, I was unable to open the CodeXL screenshots. Please check the links.

Regards,

Adept I

Hello dipak,

Thanks for your input.

I will do the tests that you suggested and get back to you in a couple of days.

I believe the links are fine. If you see a login screen, press Esc.

Many thanks!

Staff

Hi,

Just to let you know, a new version of Catalyst (15.7, display driver version 15.20.x) was released recently. According to the release notes, it includes many performance optimizations and improvements over the earlier Omega drivers. Please check whether your issue still occurs with this version.

Regards,

Adept I

Hello,

New drivers are always good news.

I haven't installed them yet, but I tested my code with the FirePro as the second card and then with only that card in the machine, and observed the same behavior: the x64 version is slower and uses more registers. The performance of the code running on the 7970 is not affected by any configuration. The only workaround I have found so far is to install the older version of the FirePro drivers.

A couple more questions. It is my understanding that I cannot install the latest version of Catalyst on the FirePro card and have to install the drivers that are for FirePro cards only, right?

I can test the new driver only on the 7970. Is that correct?

Many thanks!

Staff

Yes, you're right. The Catalyst 15.7 driver is mainly for desktop graphics cards and APUs, not for the FirePro series (a detailed list of supported devices can be found under the "supported products" tab). It seems that you're already using the latest driver for your FirePro card.

It's good that you were able to isolate the problem. From your description, it seems to be related only to the FirePro card and its driver, and has no connection with the multi-GPU setup or the other card. Could you please share a reproducible test case?

FYI: There is a FirePro Development community specifically for questions or issues about developing code on AMD FirePro cards.

Regards,

Adept I

Hello,

Because the code is part of a publication under preparation, I will send it to you in a private message.

Many thanks
