I want to use 16 nodes, each with one HD4870x2 card (32 GPUs in total), to run HPL. I find that Q can only be equal to 2! P is then much bigger than Q, so the benchmark result is far below Rpeak. How can I solve this problem?
When Q is greater than 2, for example P*Q = 8*4, the run fails with error output like the following:
gpu1          Total     Available   Last Request
Local:        1024 MB   1003 MB     16777216 ( 16 MB)   FAILED
Remote (NC):  1788 MB   1770 MB     0 ( 0 MB)           FAILED
Remote (C):   508 MB    507 MB      0 ( 0 MB)           FAILED
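
For reference, here is a small sketch that lists the process grids available for this setup. It assumes one MPI process per GPU, so P*Q = 32; that mapping is my assumption and may not match how the GPU build of HPL assigns processes. The helper name process_grids is made up for illustration. The general HPL tuning advice is to keep P and Q roughly equal with Q slightly larger than P, which is why 16*2 is a poor shape.

```python
def process_grids(nprocs):
    """Return every (P, Q) pair with P * Q == nprocs, most square grids first."""
    # Each divisor p of nprocs gives one candidate grid (p, nprocs // p).
    grids = [(p, nprocs // p) for p in range(1, nprocs + 1) if nprocs % p == 0]
    # Smaller |P - Q| means a more square grid; the sort is stable,
    # so among equally square grids the one with P < Q comes first.
    return sorted(grids, key=lambda pq: abs(pq[0] - pq[1]))


# Assumption: 32 MPI processes, one per GPU (16 nodes x 2 GPUs).
for p, q in process_grids(32):
    print(f"P={p:2d}  Q={q:2d}")
```

With 32 processes the most square candidates are 4*8 and 8*4, so the grids I would like to use are exactly the ones where Q is greater than 2.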