7 Replies Latest reply on Feb 3, 2014 2:28 PM by artdensmore

    clMAGMA with HD5850 gpu, WinXP32, VS2008 + MKL

    artdensmore

      I would appreciate any suggestions for resolving some bugs in my clMAGMA build, which uses HD5850 GPUs:

       

      1) Compiling and running  with AMD Sempron CPU:

       

      testing_Xgesv_gpu, with N not an integer multiple of 4, yields a relative compute error of about 5x10^-3, regardless of X=z/d/c/s.  (All have an appropriate error of 10^-17 or 10^-8 when N is a multiple of 4.)  I added a LAPACK sgesv solution to the testing code and found that the sgesv_gpu solution differs substantially from the LAPACK sgesv solution when N is not a multiple of 4, which accounts for the 5x10^-3 relative error.
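      The comparison between the GPU and LAPACK solutions described above can be sketched as follows. This is only an illustration, not the code in the clMAGMA testers: `rel_max_diff` is a hypothetical helper, and it uses a simple max-norm relative difference, which is an assumption rather than the error measure the testers print.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of clMAGMA): relative max-norm difference
 * max|x[i] - ref[i]| / max|ref[i]| between a GPU solution x and a
 * reference solution ref (e.g. from LAPACK sgesv), both of length n.
 * The caller must make sure ref is not identically zero. */
double rel_max_diff(const float *x, const float *ref, size_t n)
{
    assert(x != NULL && ref != NULL && n > 0);
    double max_diff = 0.0, max_ref = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double d = (double)x[i] - (double)ref[i];
        double r = (double)ref[i];
        if (d < 0.0) d = -d;            /* |x[i] - ref[i]| */
        if (r < 0.0) r = -r;            /* |ref[i]|        */
        if (d > max_diff) max_diff = d;
        if (r > max_ref)  max_ref  = r;
    }
    return max_diff / max_ref;
}
```

      Calling `rel_max_diff(x_gpu, x_lapack, n)` for N = 4, 5, 6, 7, 8 would show directly whether the disagreement appears only when N is not a multiple of 4.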

       

      My clMAGMA build is detailed in this post on MAGMA forum: http://icl.cs.utk.edu/magma/forum/viewtopic.php?f=2&t=727&sid=4bc73532664b1156d8cfec04099c23e1

       

       

       

      2) Compiling and running with Intel Celeron CPU (everything else supposedly the same as above):

       

      testing_Xgesvd (X=z,c,d,s) all run fine, with relative compute error 10^-16 (z/d) or 10^-7 (c/s).

       

      "testing_zgesv_gpu.exe -N n -R r", with n a multiple of 4, runs fine, with relative compute error 10^-16.

      But, "testing_zgesv_gpu.exe -N n -R r", with n not a multiple of 4 yields relative compute error of about 10^-2.

       

      "testing_Xgesv_gpu.exe -N n -R r", with X=c/d/s and n and r of any value, all yield relative compute error "1.#Re+000". I added some diagnostics to the testing_sgesv_gpu program, and it appears that only the 4th row of x (Ax=b) is transferred from the GPU back to the host when N is from 4 to 7, only the 8th row when N is from 8 to 11, and so on.  That is, the one row that arrives is always N rounded down to a multiple of 4. Since only one row of the solution matrix has valid data, the final result computed from the entire solution matrix is nonsense and unprintable (1.#Re+000).
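      One way to make this kind of diagnostic repeatable is to pre-fill the host buffer with a sentinel before the GPU-to-host copy and then check which rows were overwritten. The sketch below is an assumption about how such a check could look, not the diagnostics actually added to the tester; `SENTINEL`, `fill_sentinel`, `count_written`, `first_written_row` and `observed_row` are all hypothetical names, and the sentinel approach assumes the true solution never contains that exact value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel: exactly representable and unlikely in real data. */
#define SENTINEL (-12345.0f)

/* Fill the host buffer before copying the solution back from the GPU. */
void fill_sentinel(float *x, size_t n)
{
    assert(x != NULL);
    for (size_t i = 0; i < n; ++i) x[i] = SENTINEL;
}

/* Count rows that no longer hold the sentinel, i.e. rows the copy wrote. */
size_t count_written(const float *x, size_t n)
{
    size_t written = 0;
    for (size_t i = 0; i < n; ++i)
        if (x[i] != SENTINEL) ++written;
    return written;
}

/* 1-based index of the first written row, or 0 if nothing was written. */
size_t first_written_row(const float *x, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (x[i] != SENTINEL) return i + 1;
    return 0;
}

/* Pattern reported in the post: the only row that arrives (1-based) is N
 * rounded down to a multiple of 4. The post gives no data for N < 4;
 * this function returns 0 in that case. */
size_t observed_row(size_t n)
{
    return (n / 4) * 4;
}
```

      If `count_written` returns 1 and `first_written_row` matches `observed_row(N)` for a range of N, that would confirm the pattern and point at the transfer (or its size and offset arguments) rather than at the solver itself.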

       

      "testing_zgetrf_gpu.exe -M m -N n", with any m and n, yields a relative compute error of 3e-2, but

      "testing_Xgetrf_gpu.exe -M m -N n", with X=c/d/s and any m and n, yields a good relative compute error of about 10^-9, 10^-18, and 10^-9 respectively.

       

      Also, "testing_Xpotrf_gpu.exe -N n", with any n and X=z/d/c/s, always yields a relative compute error ranging from about 1 to 100, or QNAN.  (This might be the same problem mentioned above, with the solution not being transferred correctly from the GPU back to the host.)

       


       

       

       

      3) With my builds, clMAGMA seems able to allocate only up to about 135 MB of GPU memory (the card has 1 GB of GDDR5), which limits the matrix size to about 2500.  Since 135 MB is such a small fraction of the GPU's 1 GB of memory, I'm wondering whether the cause might be that the same GPUs are also used as the Windows primary display device.  Is Windows, by using the GPU as the primary display device, reserving most of the memory on the HD5850 card?  If so, can an HD5850 be installed in a WinXP32 system without having to serve as the primary video device?  (The AMD Catalyst drivers apparently disable the video driver on the motherboard automatically.)
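      As a rough sanity check on how the matrix size relates to the memory figure, the arithmetic can be sketched as below. The helper names are hypothetical, and several things are assumptions not stated in the post: that the matrix is square, that 1 MB means 10^6 bytes, and which precision the size of 2500 refers to (16 bytes per element for double complex, 8 for double).

```c
#include <assert.h>

/* Hypothetical helper: bytes needed for one dense n x n matrix. */
unsigned long long matrix_bytes(unsigned long long n,
                                unsigned long long elem_size)
{
    return n * n * elem_size;
}

/* Hypothetical helper: largest n such that one n x n matrix with the
 * given element size fits in budget_bytes. A plain loop is enough for
 * the sizes discussed here. */
unsigned long long max_square_dim(unsigned long long budget_bytes,
                                  unsigned long long elem_size)
{
    assert(elem_size > 0);
    unsigned long long n = 0;
    while (matrix_bytes(n + 1, elem_size) <= budget_bytes)
        ++n;
    return n;
}
```

      Under these assumptions, `matrix_bytes(2500, 16)` is 100,000,000 bytes for a single double-complex matrix, and `max_square_dim(135000000ULL, 16)` is 2904, so a limit near 2500 is at least in the range one would expect if a matrix plus some workspace has to fit in roughly 135 MB.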