elavram
Adept I

W8100 is slower in x64 executable - uses more VGPRs and SGPRs

Hello everybody,

I am developing an OpenCL ODE solver library and have run into the following problem. I hope someone can provide some input.

In my system (Windows 8.1 x64) I have a FirePro W8100 (first PCI slot) and a 7970 GHz Edition (second PCI slot). I have installed the latest driver for both cards.

W8100 : 14.502.1019-WHQL-FirePro-WindowsX32X64

7970 : AMD-Catalyst-15.6-Beta-Software-Suite-Win8.1-64Bit-June22

The OpenCL driver for both devices is 1642.5 (OpenCL 1.2).

I am using Visual Studio 2013 Pro to compile my OpenCL programs. I compile the programs for x86 and x64 targets.

When I run the executables on the 7970, there is no difference between the x86 and x64 versions: both have the same execution time, and CodeXL shows that the kernel uses the same number of VGPRs and SGPRs.

The problem appears when I run the executables on the W8100. The x64 build takes considerably more time to run, and the kernel uses more VGPRs and SGPRs than in the x86 build.

Any idea why this happens and how I can fix it?

Many thanks!!!

CodeXL screenshots:

7970_x86 https://www.dropbox.com/s/wktzusnnltg4nee/7970_x86.png?dl=0

7970_x64 https://www.dropbox.com/s/sr40dwx61s0v7c8/7970_x64.png?dl=0

W8100_x86 https://www.dropbox.com/s/s312pflf9puoszc/W8100_x86.png?dl=0

W8100_x64 https://www.dropbox.com/s/5ulngea7retxxgy/W8100_x64.png?dl=0

1 Solution

Dipak suggested compiling the kernel in the x64 build with the -cl-std=CL2.0 option, which can reduce the number of registers used.

Many thanks Dipak!
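For anyone hitting the same issue, here is a minimal sketch of how the option could be added only in 64-bit host builds. The helper names `host_is_64bit` and `make_build_options` are hypothetical and not part of any OpenCL API. The sketch also assumes the driver accepts the OpenCL spelling `-cl-std=CL2.0`, and it uses `-cl-mad-enable` only as a placeholder for whatever options you already pass.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: 1 when this host binary is 64-bit, else 0. */
static int host_is_64bit(void)
{
    return sizeof(void *) == 8;
}

/* Hypothetical helper: writes the kernel build options into buf.
 * "-cl-std=CL2.0" is appended only when is_64bit is nonzero.
 * Returns 0 on success, -1 if buf is too small (buf is then empty). */
static int make_build_options(char *buf, size_t cap,
                              const char *base, int is_64bit)
{
    const char *extra = is_64bit ? "-cl-std=CL2.0" : "";
    const char *sep;
    size_t need;

    assert(buf != NULL && cap > 0);
    if (base == NULL)
        base = "";

    /* A separating space is needed only when both parts are non-empty. */
    sep = (base[0] != '\0' && extra[0] != '\0') ? " " : "";

    need = strlen(base) + strlen(sep) + strlen(extra) + 1;
    if (need > cap) {
        buf[0] = '\0';
        return -1;
    }

    strcpy(buf, base);
    strcat(buf, sep);
    strcat(buf, extra);
    return 0;
}
```

The resulting string would go in the `options` argument (the fourth parameter) of `clBuildProgram`, for example after calling `make_build_options(opts, sizeof opts, "-cl-mad-enable", host_is_64bit())`. Whether the flag actually lowers VGPR/SGPR usage on a given driver is something to confirm in CodeXL.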
