Hello, using the latest Linux driver, I see much worse performance for float16 and double16. Is this expected?
Measured with clpeak:
driver_version | 1642.5 (sse2) -> 1702.3 (sse2)
float16        | 9.96633 -> 1.91836 Mflops
double16       | 2.37198 -> 0.640011 Mflops
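The word "much" can be put in numbers by taking the ratio old/new for each data type in the clpeak table above. The sketch below is plain Python over the values copied from that table. The `results` dictionary and the `slowdown` helper are illustrative names and are not part of clpeak. Because it is a ratio, the result does not depend on the unit clpeak reports.

```python
# clpeak results copied from the table above: (driver 1642.5, driver 1702.3)
results = {
    "float16": (9.96633, 1.91836),
    "double16": (2.37198, 0.640011),
}

def slowdown(old: float, new: float) -> float:
    """Return how many times slower the new driver is (old / new)."""
    return old / new

for name, (old, new) in results.items():
    # Also show what fraction of the old throughput remains with the new driver.
    print(f"{name}: {slowdown(old, new):.2f}x slower "
          f"({100 * new / old:.1f}% of the old throughput)")
```

With the reported numbers, float16 drops by roughly 5.2x and double16 by roughly 3.7x between driver 1642.5 and 1702.3.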
clinfo differences:
SVM capabilities:
- Coarse grain buffer: Yes
- Fine grain buffer: Yes
- Fine grain system: Yes
- Atomics: Yes
+ Coarse grain buffer: No
+ Fine grain buffer: No
+ Fine grain system: No
+ Atomics: No
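The four SVM lines in the clinfo diff above presumably come from the OpenCL 2.0 `CL_DEVICE_SVM_CAPABILITIES` bitfield, which is returned by `clGetDeviceInfo`. The sketch below decodes that bitfield into the same Yes/No lines. Read this way, the diff means the reported value went from all four bits set (`0xF`) to `0`. The bit values match the `CL_DEVICE_SVM_*` flags in the Khronos `cl.h` header. The raw values passed in are hypothetical and were not read from a device. `describe_svm` is an illustrative helper, not an OpenCL or clinfo function.

```python
from typing import List

# Bit values of the cl_device_svm_capabilities flags (Khronos OpenCL 2.0 cl.h).
CL_DEVICE_SVM_COARSE_GRAIN_BUFFER = 1 << 0
CL_DEVICE_SVM_FINE_GRAIN_BUFFER = 1 << 1
CL_DEVICE_SVM_FINE_GRAIN_SYSTEM = 1 << 2
CL_DEVICE_SVM_ATOMICS = 1 << 3

# Labels in the order clinfo prints them.
SVM_FLAGS = [
    ("Coarse grain buffer", CL_DEVICE_SVM_COARSE_GRAIN_BUFFER),
    ("Fine grain buffer", CL_DEVICE_SVM_FINE_GRAIN_BUFFER),
    ("Fine grain system", CL_DEVICE_SVM_FINE_GRAIN_SYSTEM),
    ("Atomics", CL_DEVICE_SVM_ATOMICS),
]

def describe_svm(caps: int) -> List[str]:
    """Turn a raw CL_DEVICE_SVM_CAPABILITIES value into clinfo-style lines."""
    return [f"{label}: {'Yes' if caps & bit else 'No'}" for label, bit in SVM_FLAGS]

# Hypothetical raw values matching the two clinfo outputs above.
for driver, caps in [("1642.5", 0xF), ("1702.3", 0x0)]:
    print(f"Driver {driver}:")
    for line in describe_svm(caps):
        print("  " + line)
```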
Any pointers will be appreciated. Thanks!
Could you please provide a reproducible test case? Please also mention the setup details.
Hi
I just ran clpeak on 14.12 and on 15.5.
There are two issues:
1) The SVM capabilities have changed.
2) On the CPU, the performance for double16 and float16 has dropped. I guess you could replicate it with any GPU.
Thanks!
root@qt5022:~# uname -a
Linux qt5022 4.0.0 #1 SMP Fri Jun 5 15:50:44 CEST 2015 x86_64 GNU/Linux
root@qt5022:~# lspci -d 1002:9806 -vvv
00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Wrestler [Radeon HD 6320] (prog-if 00 [VGA controller])
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Wrestler [Radeon HD 6320]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 32 bytes
	Interrupt: pin A routed to IRQ 45
	Region 0: Memory at a0000000 (32-bit, prefetchable) [size=256M]
	Region 1: I/O ports at 2000 [size=256]
	Region 2: Memory at d0200000 (32-bit, non-prefetchable) [size=256K]
	Expansion ROM at <unassigned> [disabled]
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v2) Root Complex Integrated Endpoint, MSI 00
		DevCap: MaxPayload 128 bytes, PhantFunc 0
			ExtTag+ RBE+
		DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee0300c Data: 4143
	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Kernel driver in use: fglrx_pci
	Kernel modules: fglrx
Thanks Ricardo. We'll check and get back to you.
Hi Ricardo,
I did find much lower GFLOPS for double16, but the GFLOPS for float16 were almost the same (please see the attached files). I'll report this to the driver team.
FYI: currently, on the AMD platform, OpenCL 2.0 features such as SVM and device-side enqueue are not supported on CPUs. So I guess the difference in the reported SVM capabilities has no effect on performance.
Regards,
Hello
FYI: I just tried 15.7 and I still get the error.
Regards!
Hello
With 15.9, I get exactly the same error. Do you have everything you need to replicate it on your side? Can I help somehow? Is somebody looking into this?
Regards!
Yes, the issue has already been reported to the engineering team and they are working on it. As soon as I get an update, I'll share it with you. Please be patient.
Regards,
What would be a reasonable timeframe for fixing this bug?
Thanks
I checked, and the issue is still open. Sorry, I can't comment on any timeline at the moment.
Regards,
Update:
The dev team has identified the likely cause of the performance drop above: most probably, some optimizations were disabled for CPU devices, which is expected. However, they can't provide a timeframe for the fix right now.
Regards,
Hi again, Dipak,
Any news to share? At least to celebrate the 9-month anniversary of the bug?
BTW, it is also failing on 15.12.