Hi and Merry Christmas,
Try rerunning your benchmarks while running in another console (in a script
It could also give you a useful baseline for your cards if you try it before
running your tests.
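Here is a minimal sketch of such a monitoring script. It assumes Python 3.7+ and that `amdconfig --odgc --odgt` (the flags used later in this thread) is on the PATH. The function name `sample_gpu_state`, the 1-second interval and the sample count are hypothetical choices. Start it in a second console, start the benchmark in the first, and compare the readings before, during and after the run.

```python
import subprocess
import time

# Flags taken from this thread: --odgc prints clocks, --odgt prints GPU load.
AMDCONFIG_CMD = ["amdconfig", "--odgc", "--odgt"]


def sample_gpu_state(cmd=AMDCONFIG_CMD, interval=1.0, samples=10):
    """Run `cmd` `samples` times, `interval` seconds apart.

    Returns a list of (elapsed_seconds, stdout_text) tuples so the
    clock/load readings can later be lined up with benchmark timings.
    """
    start = time.monotonic()
    readings = []
    for i in range(samples):
        result = subprocess.run(cmd, capture_output=True, text=True)
        readings.append((time.monotonic() - start, result.stdout))
        if i < samples - 1:
            time.sleep(interval)
    return readings


if __name__ == "__main__":
    # The child inherits this shell's environment, so set DISPLAY
    # (e.g. DISPLAY=:0) before starting the script from a remote shell.
    for elapsed, text in sample_gpu_state(samples=60):
        print(f"--- t = {elapsed:6.1f} s")
        print(text.rstrip())
```

Writing the elapsed time next to each reading makes it easy to see whether the clocks drop at the same moment the time per call goes up.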
I suspected power management or throttling might be to blame, because the cards are the same, but I never tested this directly. The benchmarks that produce these timings are 7 seconds long, and whether I run them back to back or extend them, the timing per call doesn't really change after the first second or so (you can indeed see the numbers change while the cards ramp up). Unfortunately, the M290Xs are still "not supported" by amdconfig, despite 14.12 and the drivers having worked well for a while (I've posted this bug to one of these AMD boards before):
DISPLAY=:0 amdconfig --odgc
amdconfig: No supported adapters detected
DISPLAY=:0 amdconfig --odgt
amdconfig: No supported adapters detected
OpenCL obviously works on the GPUs, as does amdcccle. I'm remote right now, hence the DISPLAY=:0. These same commands work over remote desktop on an otherwise identical configuration with a single AMD Radeon HD 7900 Series; I just tested that.
If it's of value, I can post some timing plots showing a ramp-down in time per call, but I can't figure out anything else. In CodeXL, the performance counters were showing lower values; I believe occupancy went down from 100% to 70%. I'm not 100% sure that was the case, but I've been trying to figure this out on and off for 2 weeks.
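To check whether the slowdown is more than a ramp-up effect, one option is to drop the first second of per-call timings (the warm-up described above) and compare the steady-state mean for each card. This is a minimal sketch, assuming the benchmark can report an ordered list of per-call durations in seconds. The name `steady_state_mean` and the 1-second cutoff are hypothetical.

```python
from statistics import mean


def steady_state_mean(call_durations, warmup_seconds=1.0):
    """Mean per-call time after discarding calls that finish within
    the first `warmup_seconds` of the run (while clocks ramp up).

    `call_durations` is an ordered list of per-call times in seconds,
    recorded back to back by the benchmark loop.
    """
    elapsed = 0.0
    steady = []
    for d in call_durations:
        elapsed += d
        if elapsed > warmup_seconds:
            steady.append(d)
    if not steady:
        raise ValueError("run too short: no calls after warm-up")
    return mean(steady)
```

Computing this for GPU0 and GPU1 over the same 7-second run gives one slowdown ratio per kernel, which is easier to compare across kernels than the raw curves.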
Too bad amdconfig doesn't support your cards. Independent
monitoring/validation of your problem would be very useful. Can you get
some figures from CodeXL? I suspect a hardware issue. I use a Sapphire R9
277 card with Ubuntu 14.04 x64. amdconfig is compatible with my setup. I am
using webmail (gmail) to reply, so I'm using full Xorg with 4 virtual
Default Adapter - Supported device 6811
                            Core (MHz)   Memory (MHz)
           Current Clocks :    300           150
             Current Peak :    945          1400
  Configurable Peak Range :
                 GPU load :    0%
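The point of monitoring is to see whether the clocks stay at peak during a run, so it helps to turn this output into numbers. This is a minimal sketch, assuming the field labels and column order are exactly as shown above (core first, then memory, both in MHz). The function name and dictionary keys are hypothetical. Output with different spacing or several adapters is not handled.

```python
import re

# e.g. "Current Clocks : 300 150" -> core MHz, memory MHz
_CLOCK_RE = re.compile(r"^\s*(Current Clocks|Current Peak)\s*:\s*(\d+)\s+(\d+)")
# e.g. "GPU load : 0%"
_LOAD_RE = re.compile(r"^\s*GPU load\s*:\s*(\d+)\s*%")


def parse_amdconfig(text):
    """Extract clocks and load from `amdconfig --odgc --odgt` output.

    Returns a dict with (core_mhz, memory_mhz) tuples for the current and
    peak clocks and an int percentage for the GPU load. Lines without
    numbers (such as an empty "Configurable Peak Range") are ignored.
    """
    info = {}
    for line in text.splitlines():
        m = _CLOCK_RE.match(line)
        if m:
            key = m.group(1).lower().replace(" ", "_")
            info[key] = (int(m.group(2)), int(m.group(3)))
            continue
        m = _LOAD_RE.match(line)
        if m:
            info["gpu_load_pct"] = int(m.group(1))
    return info
```

If readings taken during a benchmark showed the core clock dropping from 945 MHz toward 300 MHz, that would suggest power management rather than the kernel itself.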
Sometimes the load might spike to 3%, but this is spurious and too brief to
identify what is causing it. What I'm trying to say is that all of Ubuntu's
resource use doesn't amount to anything for my (and hopefully your) card. So I suspect
a hardware issue, card or bus. Can you try switching the cards and report
But I missed that you have an xfire (CrossFire) setup. I don't know about it and don't use
it, but I imagine it must have some overhead that has to run somewhere.
Could that account for the 30% lost on your first card?
If that is the case, maybe for OpenCL it is better to use 2 independent
cards (processors) instead of a single crossfired one.
CodeXL will have to wait until next week, as I can't run it remotely: it crashes on GL-related problems with x2go, and x2go doesn't support newer GLX protocols anyway.
Re CrossFire: AFAIK it's not as complicated as it once was, just point-to-point communication over the PCIe bus, so I doubt there's any overhead there. I considered that maybe the card knows it's driving the display and reserves some of its resources, or that there's a hardware issue, but I was hoping the AMD team could help me poke around better. It's a real bummer that amdconfig still isn't working. Judging from my desktops, where I've used amdconfig to watch GPU load, I'd agree with you that light desktop use generally doesn't add much load on average. I actually replicated the timings with Xorg shut down (going headless, so to speak).
By switching cards, I suppose you mean switching which one drives the display? I don't think I can do that, since it's a laptop.
I meant swapping the cards' PCI slots, not just the monitor cable. I didn't know
about the laptop, and I imagine that is tough to do.
You can switch display cards through the "Devices" section in
I would imagine that xfire is a bit more than point-to-point communication
over PCIe (schedulers, load balancers, cache mappers?).
I've collected profiling information for both GPUs. I can't post the kernel source code here, but I've replicated the timing behavior across a few of my kernels, all of which have different workflows.
GPU1 is the m290 whose timings are good and consistent with my desktop R9 290. GPU0 is the m290 that runs a few times slower.
I've noticed that most fields of the profiles are the same, but it is easy to see that the last few columns, from VALUBusy to LDSBankConflict (1.17 on GPU0 and 3.32 on GPU1), show significant differences, differing by factors of 2-3, which is roughly how many times slower the kernels run on that card.
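Comparing the two profiler exports column by column makes this pattern easier to spot. This is a minimal sketch, assuming each GPU's counters for one kernel have been loaded into a dict keyed by the CodeXL column name, with non-negative values. The function name and the factor-of-2 threshold are hypothetical; the threshold is chosen to match the 2-3x slowdown described here.

```python
def flag_counter_differences(gpu0, gpu1, min_ratio=2.0):
    """Return {counter: (gpu0_value, gpu1_value, ratio)} for counters whose
    values differ by at least `min_ratio` in either direction.

    `gpu0` and `gpu1` map profiler counter names (e.g. "LDSBankConflict")
    to non-negative numeric values for the same kernel on each GPU.
    Only counters present in both dicts are compared.
    """
    flagged = {}
    for name in sorted(gpu0.keys() & gpu1.keys()):
        a, b = float(gpu0[name]), float(gpu1[name])
        if a == 0.0 or b == 0.0:
            # Zero on one side only gives an unbounded ratio; flag it anyway.
            if a != b:
                flagged[name] = (a, b, float("inf"))
            continue
        ratio = max(a, b) / min(a, b)
        if ratio >= min_ratio:
            flagged[name] = (a, b, ratio)
    return flagged
```

For example, `flag_counter_differences({"LDSBankConflict": 1.17}, {"LDSBankConflict": 3.32})` flags LDSBankConflict with a ratio of about 2.8.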
Pre-post update: I just stumbled on this thread, which works around the problem by setting the environment variable GPU_NUM_COMPUTE_RINGS=1: http://devgurus.amd.com/thread/169896
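To try the workaround on a single benchmark without changing the whole session, the variable can be set for one child process only. This is a minimal sketch. It assumes the AMD OpenCL runtime reads GPU_NUM_COMPUTE_RINGS when it initializes, so the variable is set for a freshly started process rather than changed inside a running one. The helper name is hypothetical.

```python
import os
import subprocess


def run_with_single_compute_ring(cmd):
    """Run `cmd` with GPU_NUM_COMPUTE_RINGS=1 set only for that process.

    Assumption: the OpenCL runtime reads the variable at startup, so it is
    placed in a fresh child's environment instead of being changed inside
    an already-running OpenCL program.
    """
    env = dict(os.environ, GPU_NUM_COMPUTE_RINGS="1")
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
```

From a shell, the equivalent is prefixing the command, e.g. `GPU_NUM_COMPUTE_RINGS=1 ./benchmark` (where `./benchmark` stands in for your own program).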
I guess at this point: wtf @ regression testing on AMD's side.
An update on this: the workaround should not be relied on. I've observed incorrect results from GPU0 for several kernels, and for iterative kernels this can also lead to deadlock. I've also noticed that some time later the workaround's effect on timing seems to go away, and GPU0 becomes slow to execute again.
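Given those symptoms, a timing taken with the workaround is only meaningful if the results are checked and hangs are caught. This is a minimal sketch of two test-harness helpers. It assumes the kernel output can be copied back into a flat list of floats and compared against a trusted reference, such as a CPU run. The function names and tolerances are hypothetical and should be matched to the kernel's precision.

```python
import math
import threading


def results_match(gpu_values, reference_values, rel_tol=1e-5, abs_tol=1e-7):
    """True if every GPU result is close to the reference (e.g. CPU) result."""
    if len(gpu_values) != len(reference_values):
        return False
    return all(
        math.isclose(g, r, rel_tol=rel_tol, abs_tol=abs_tol)
        for g, r in zip(gpu_values, reference_values)
    )


def run_with_timeout(fn, timeout_s):
    """Run fn() in a daemon thread and raise TimeoutError if it has not
    finished after timeout_s seconds (e.g. an iterative kernel that hangs).

    Python cannot kill the hung thread; this only lets the harness report
    the hang instead of blocking forever.
    """
    outcome = {}

    def target():
        try:
            outcome["value"] = fn()
        except BaseException as exc:  # re-raised in the calling thread
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        raise TimeoutError(f"no result after {timeout_s} s (possible deadlock)")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```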