ACML-GPU v1.0 is now available at:
http://developer.amd.com/gpu/acmlgpu/Pages/default.aspx
It can be downloaded without having to email streamcomputing@amd.com.
You will need Catalyst 9.2 or later to run ACML-GPU v1.0.
Only 64-bit versions are posted. Will there be a 32-bit version?
For now, only the 64-bit versions are available. I will relay the interest to the ACML team for a 32-bit version.
Michael.
Hi, Michael:
Two questions regarding the ACML-GPU library.
(1) Last time I called, they told me it only works with certain processors (I think they mentioned the 3870). Does it work with all of the GPUs that support stream processing? (I would like to try my existing 3450 before buying a more powerful card.)
(2) What I want to do is use the GPU's processing power without doing much GPU-specific programming. Can I take my regular C program and simply link it against the proper routines in ACML-GPU to speed it up?
Appreciate your help.
P.S. I am also interested in a 32-bit Windows version of the library, since I don't have Windows x64.
Hi twinclouds,
It should work on the cards supported by the SDK. It uses CAL, so it should work on whatever hardware CAL works on. DGEMM will only work on cards with double-precision floating-point (DPFP) hardware support, though.
With ACML-GPU, you can accelerate SGEMM and DGEMM without coding in a GPU-specific language. The calls are simply normal C/Fortran function calls from the perspective of the programmer.
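To illustrate what "normal C function calls" looks like in practice, here is a minimal sketch of a single-precision matrix multiply. It assumes the ACML C interface declares `sgemm` in `acml.h` with the argument order used below; please check the header shipped with your download. `USE_ACML`, `GEMM`, `sgemm_reference` and `multiply_example` are hypothetical names made up for this sketch. Without `USE_ACML` defined, a plain CPU stand-in with the same argument list is used, so the code compiles even without the library installed.

```c
#ifdef USE_ACML
#include <acml.h>              /* assumed header name from the ACML install */
#define GEMM sgemm             /* assumed ACML C entry point */
#else
/* Hypothetical stand-in with the same argument list as the assumed sgemm.
   Computes C = alpha*A*B + beta*C for column-major, non-transposed A and B. */
static void sgemm_reference(char transa, char transb, int m, int n, int k,
                            float alpha, float *a, int lda,
                            float *b, int ldb,
                            float beta, float *c, int ldc)
{
    (void)transa; (void)transb;   /* only the 'N','N' case is handled here */
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float sum = 0.0f;
            for (int p = 0; p < k; ++p)
                sum += a[i + p * lda] * b[p + j * ldb];
            c[i + j * ldc] = alpha * sum + beta * c[i + j * ldc];
        }
    }
}
#define GEMM sgemm_reference
#endif

/* Multiply a 2x3 matrix by a 3x2 matrix.
   The 2x2 result is written to c in column-major order. */
void multiply_example(float c[4])
{
    float a[6] = {1, 4, 2, 5, 3, 6};     /* A = [1 2 3; 4 5 6]       */
    float b[6] = {7, 9, 11, 8, 10, 12};  /* B = [7 8; 9 10; 11 12]   */

    c[0] = c[1] = c[2] = c[3] = 0.0f;

    /* m = 2, n = 2, k = 3; leading dimensions are the row counts. */
    GEMM('N', 'N', 2, 2, 3, 1.0f, a, 2, b, 3, 0.0f, c, 2);
}
```

Calling `multiply_example(c)` from your own `main` fills `c` with the product in column-major order. The point of the sketch is that the call site stays the same whether the work is done by the stand-in or by the library; nothing GPU-specific appears in the program.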
I have already let them know of the interest in a 32-bit version.
Michael.
Michael:
Thank you very much for your prompt reply. I will try it out and let you know how it works.
One more question that maybe you can answer: does stream processing work on integrated graphics, e.g. the 780G chipset? Thanks.
It's not officially on our supported list yet (I'm aiming for it to be there in the next month or 2... it's a testing matrix thing...).
That being said, it's going to behave much like an ATI Radeon HD 2400. The newer discrete cards are going to offer a huge boost in performance when you run SGEMM or DGEMM.
Michael.
Thanks for your reply.
Yes. I understand the power of higher grade cards. Right now, I just want to see if it works before I get into the speed.
You are very helpful. Thank you very much.
Hi, Michael:
Now I am trying to use the ACML-GPU 1.0 library on Linux. The first thing I did was try to run the makefile in the example directly. It told me that it cannot find the laticalc.a library. I have installed the 9.2 video driver. Is there anything else I need to install, e.g. SDK 1.4? It says it is optional, though.
Another question: the manual says that ACML-GPU 1.0 needs gcc/gfortran 4.1.2. The default compilers on Ubuntu 8.10 are version 4.3. I installed gcc 4.1.2 as well, but gfortran 4.1.2 is not supported on 8.10. I can go back to an earlier version of Linux if necessary, but for now I would just like to compile the example to begin with.
Thanks in advance for your help.
Please try with Catalyst 9.3. There are prebuilt binary examples in the ACML download; they should run out of the box if the installation went fine.
Do you mean 9.2 because I think 9.2 is not available yet?
Sorry, I meant 9.3 is not available yet.
As someone suggested earlier, you don't need a full installation of gfortran 4.1. Instead, search for a deb of libgfortran1, extract it into a folder, and then include that folder in your library path.
Thanks. Will try.
I am not sure where to ask this and don't know if anyone has asked it before. Can a program using ACML-GPU run in a virtual machine under VMware Server?
Dear Michael:
I eventually made the ACML-GPU examples work on my Linux machine. What a relief!
Actually, some of the programs in the example folder are in Fortran, so gfortran-4.1.2 is needed if one wants to compile them. I downgraded my Ubuntu to 8.04, for which gcc 4.1.2 and gfortran-4.1.2 are both available, and everything works.
What I really want to use are the FFT routines in the ACML library. It looks like they are not available in the ACML-GPU library yet. If that is true, do you know when they will be available? If possible, I would like to get some early versions of them to test. I will be happy to report the results back once I try them out. Mainly, I am interested in the 1D FFT routines.
Thanks in advance for your help.
I recently wrote a program to benchmark NVIDIA's CUDA FFT (CUFFT) routine. The program ran 204800 pairs of 2048-point FFTs and IFFTs. It took about 17 seconds on a 9600GT. This is similar to the routines in the FFTW library on a 2.66 GHz Core 2 CPU. Very disappointing. Actually, ACML scored the best. It only took 7 seconds. I also found independent confirmation on the Web that CUFFT only becomes faster when the FFT size is larger than 32768 points.
I hope the ACML-GPU FFT will be better when it is ready.
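For anyone who wants to compare against these timings, the small sketch below turns the quoted numbers (204800 pairs of 2048-point transforms in about 17 seconds and about 7 seconds) into time per pair and a rough GFLOP/s figure. The function names are made up for illustration. The GFLOP/s estimate assumes the conventional 5*N*log2(N) operation count for a complex transform of a power-of-two size; the post does not say whether the transforms were complex, so treat that part as a rough guide only.

```c
/* log2 of a power of two, computed with shifts so no math library is needed. */
static int log2_pow2(unsigned n)
{
    int l = 0;
    while (n > 1) {
        n >>= 1;
        ++l;
    }
    return l;
}

/* Time per FFT/IFFT pair in microseconds. */
double microseconds_per_pair(double total_seconds, double pairs)
{
    return total_seconds / pairs * 1e6;
}

/* Estimated GFLOP/s, assuming 5*N*log2(N) operations per transform
   (a conventional estimate, not a measured operation count). */
double estimated_gflops(double total_seconds, double pairs, unsigned n)
{
    double flops_per_transform = 5.0 * (double)n * (double)log2_pow2(n);
    double transforms = 2.0 * pairs;   /* one FFT plus one IFFT per pair */
    return flops_per_transform * transforms / total_seconds / 1e9;
}
```

With the numbers above, `microseconds_per_pair(17.0, 204800.0)` gives roughly 83 microseconds per pair and `microseconds_per_pair(7.0, 204800.0)` roughly 34. Under the stated operation-count assumption, that works out to about 2.7 and 6.6 GFLOP/s respectively.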