I'm having trouble with the Windows drivers provided by AMD. I have now built two machines; let's call them Alpha (1x HD5450, 3x HD6990) with 7 GPUs, and Beta (8x HD7970) with 8 GPUs. I got Alpha working under Windows after some trouble, but I still haven't got Beta working under Windows, so any help is appreciated.
When building Alpha, each time with 64-bit Windows 7 Professional:
Now I'm building Beta, again each time with 64-bit Windows 7 Professional:
Alas, Windows Update does not offer to install HD7970 drivers, so the fourth step, which is what solved it for Alpha, doesn't work here. Under Linux (Ubuntu 12.04), OpenCL sees all 8 cards, so there's no hardware problem. It's purely a problem with AMD's graphics drivers for Windows (or at least for Windows 7 64-bit).
It's also not a problem with the APP SDK, since the problems already arise when installing just the graphics drivers. Device Manager lists 5 HD7970 adapters and 3 standard VGA display adapters.
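To compare what each OS actually exposes to OpenCL, a quick device listing can help. This is a minimal sketch that assumes the third-party pyopencl package (not mentioned in the thread); `gpu_names` is a hypothetical helper that keeps only devices whose type bitfield has the GPU bit set. Note that each HD6990 is a dual-GPU card and should show up as two devices.

```python
"""Sketch: list the GPUs each OpenCL platform reports (compare Linux vs Windows).

Assumption: the third-party pyopencl package is installed (pip install pyopencl).
"""


def gpu_names(platforms, gpu_type):
    """Return (platform name, device name) pairs for every GPU device."""
    found = []
    for platform in platforms:
        # Ask for all device types and filter on the type bitfield ourselves.
        for device in platform.get_devices():
            if device.type & gpu_type:
                found.append((platform.name, device.name))
    return found


def main():
    try:
        import pyopencl as cl
    except ImportError:
        print("pyopencl is not installed; try: pip install pyopencl")
        return
    gpus = gpu_names(cl.get_platforms(), cl.device_type.GPU)
    for platform_name, device_name in gpus:
        print(f"{platform_name}: {device_name}")
    print(f"{len(gpus)} GPU device(s) visible to OpenCL")


if __name__ == "__main__":
    main()
```

Running it under Ubuntu and under Windows on the same machine gives directly comparable totals; for Beta, the Linux count should be 8.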
I'm out of ideas here. Are there any generic drivers for Windows, like the ones the Linux people have made for Linux? Or do you have any other suggestions for making all 8 cards work under Windows?
Oh, this might be worthwhile: when I tried to disable and re-enable the unrecognized device through Device Manager, I got another BSOD, but this time the machine did not reboot automatically, so I could take a picture of it: http://dl.dropbox.com/u/3060536/bsod.jpg
It seems atikmdag.sys is the culprit. I hope this helps in debugging the drivers.
Please uninstall all the driver files with Driver Sweeper from Guru3D
http://downloads.guru3d.com/Guru3D---Driver-Sweeper-%28Setup%29_d1655.html#download
This tool is very helpful and really cleans out everything. Some files can always remain, even if you uninstall the driver with AMD's uninstall function.
Please report whether it helped or not.
EDIT:
You can also try this:
Start the system in secured mode
Press Windows+R
Enter: services.msc
Disable "ATI External Event Utility"
Restart the system and check whether the problem is solved
The last possibility is that your DDR3 RAM is faulty. Check the memory with Memtest from Linux; if I remember correctly, it's called memtest86+.
You can also remove all DIMMs but one and test with only that one.
EDIT2:
OK, another possible solution is to reduce the core/memory clocks of the GPUs.
How big is your power supply?
EDIT3:
OK, and one more:
http://board.zuxxez.com/showthread.php?t=35435
As far as I can see, atikmdag.sys is a file from MS. As I understand it, it handles the responses from the GPU. I remember having a problem when I ran an OpenCL program on the GPU that took longer than 2 seconds: the driver got reset.
Just try changing the values in the registry. Perhaps it helps: with so many GPUs, the system may not get a response from the cards in time.
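A minimal sketch of what "changing the values in the registry" could look like, assuming the values meant are Microsoft's documented TDR keys `TdrDelay` (timeout in seconds, default 2) and `TdrLevel` under `HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers`; the linked thread was not checked, so this is an assumption. The hypothetical helper `tdr_reg_text` only builds the text of a .reg file; it does not touch the registry itself.

```python
"""Hypothetical helper: build a .reg file that raises the Windows TDR timeout.

Assumption: the registry values in question are the documented TDR keys
TdrDelay and TdrLevel; nothing here is specific to AMD's driver.
"""

KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def tdr_reg_text(delay_seconds=10, disable_detection=False):
    """Return .reg text setting TdrDelay and, optionally, TdrLevel=0."""
    if not 1 <= delay_seconds <= 0xFFFFFFFF:
        raise ValueError("delay_seconds must fit in a REG_DWORD and be >= 1")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY}]",
        # REG_DWORD values are written as 8 hex digits in .reg syntax.
        f'"TdrDelay"=dword:{delay_seconds:08x}',
    ]
    if disable_detection:
        # TdrLevel 0 turns timeout detection off entirely; use with care.
        lines.append('"TdrLevel"=dword:00000000')
    # .reg files conventionally use Windows (CRLF) line endings.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(tdr_reg_text(delay_seconds=10))
```

The printed text can be saved as a .reg file and imported with regedit; as far as I know, the new values are only read after a reboot.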
Skysnake wrote:
Please uninstall all the driver files with Driver Sweeper from Guru3D
http://downloads.guru3d.com/Guru3D---Driver-Sweeper-%28Setup%29_d1655.html#download
This tool is very helpful and really cleans out everything. Some files can always remain, even if you uninstall the driver with AMD's uninstall function.
Please report whether it helped or not.
Uninstall, and then do what? I have tried it on a clean vanilla install of Windows several times, so I don't see how 'just' uninstalling them would help.
Start the system in secured mode
Press Windows+R
Enter: services.msc
Disable "ATI External Event Utility"
Restart the system and check whether the problem is solved
You mean safe mode? What exactly would I be disabling here?
I will reinstall Windows and try this once I have time.
The last possibility is that your DDR3 RAM is faulty. Check the memory with Memtest from Linux; if I remember correctly, it's called memtest86+.
You can also remove all DIMMs but one and test with only that one.
There are no memory problems. It's a problem with the Windows drivers provided by AMD.
EDIT2:
OK, another possible solution is to reduce the core/memory clocks of the GPUs.
How big is your power supply?
I have roughly 300 W available per GPU, so that's really not the problem. Under Linux, I can run my computations overclocked to 1150 MHz without any problem.
EDIT3:
OK, and one more:
http://board.zuxxez.com/showthread.php?t=35435
As far as I can see, atikmdag.sys is a file from MS. As I understand it, it handles the responses from the GPU. I remember having a problem when I ran an OpenCL program on the GPU that took longer than 2 seconds: the driver got reset.
Just try changing the values in the registry. Perhaps it helps: with so many GPUs, the system may not get a response from the cards in time.
Indeed, the watchdog timer resets the driver after 2 seconds, but it's not clear to me how this could ever fix the problem that only 5 of the 8 cards are recognized. But I will try, thanks.
It's just everything I found.
And of course your memory should be OK, but I read that this kind of failure can be caused by memory problems. I don't know how much Linux and Windows differ in handling such problems.
I hope something helps. If not, it will become really, really, really hard to solve.
Let me know if you need more help. I also have a contact at Microsoft, so perhaps that could help.
Actually, that might be very handy. What solved it for the other machine (1x HD5450 and 3x HD6990) was letting Windows Update provide the graphics drivers instead of AMD Catalyst. The drivers installed by Windows Update recognized all GPUs flawlessly out of the box.
So it would be very helpful if you could get more information on the difference between the two machines I described:
Phew, you have really hard questions.
As I told you, as far as I know the problematic file is a Microsoft file. So it could be that Windows updates this file when it loads the default driver, and that this doesn't happen when you install Catalyst.
If you want, I can call or email my Microsoft contact.
But please give me as much information as possible.
Skysnake wrote:
Phew, you have really hard questions.
Of course, they are unsolvable for us. But perhaps someone with inside information from MS can answer them. And yes, it is quite possible that the problematic file gets updated, but then it remains a mystery to me why the update was triggered on the first machine and not on the second.
If you want, I can call or email my Microsoft contact.
But please give me as much information as possible.
That would be very useful. What more information do you want? I can give you my email address if you want faster replies than in this topic. But beyond what is listed here, I wouldn't know what to add.
(Except maybe that the motherboard is an MSI Marshal Big Bang (B3) with chipset/CPU/Ethernet drivers installed.)
Yeah, that would be helpful. You can find my email address in my profile. Just send me a short email so that I have your address and we can exchange information faster.
I will call my contact later and ask whether they can help.
EDIT:
OK, the email to MS is sent. I will post a note when there is an answer.
But I also have one more idea!
Can you remove one of the HD7970 cards and insert one of the 6990s or the small 5450? Perhaps this helps.
Yeah, that would be helpful. You can find my email address in my profile. Just send me a short email so that I have your address and we can exchange information faster.
Done.
But I also have one more idea!
Can you remove one of the HD7970 cards and insert one of the 6990s or the small 5450? Perhaps this helps.
Could you tell me why you expect it to help? I don't own the 6990s; I only borrowed them for testing, and getting them back now is rather difficult. Also, the machine now sits 2 m up on a server rack in our research unit's datacenter for more serious stress testing. So while it's not impossible, it would be quite a hassle to try this.
The reason I hope this could help is that the FASTRA guys also added a different card to make their machine bootable, and your Windows machine with the 6990s also works with one different card in it.
So perhaps it also works with 7x HD7970 and 1x something else.
It's just something you could try. Perhaps it works; it is definitely worth trying. No answer from MS so far.