Dear Stream SDK users,

I am proud to announce the release of the Vienna Computing Library (ViennaCL), an open source (MIT license) scientific computing library written in C++ based on OpenCL. It allows simple, high-level access to the vast computing resources available on parallel architectures such as GPUs and is primarily focused on common linear algebra operations (BLAS level 1 and 2) and the solution of large systems of equations by means of iterative methods. At present, the following iterative solvers are implemented:

* Conjugate Gradient (CG)

* Stabilized BiConjugate Gradient (BiCGStab)

* Generalized Minimum Residual (GMRES)

An optional ILU preconditioner is available; at present it is precomputed on the CPU, so it may not yield an overall performance gain.

The library interface is similar to that of the ublas library shipped with Boost. The iterative solvers can be used either on the CPU with ublas types or on the GPU with ViennaCL types. Consequently, only a few code changes are necessary in existing simulators to get the iterative solvers running on the GPU.
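For illustration, a sketch of what that switch looks like. This is not compilable as-is without the ViennaCL headers, Boost, and a working OpenCL setup; the type and function names follow ViennaCL's ublas-like interface, but please check the documentation on the project homepage for the exact signatures.

```cpp
// Sketch only -- assumes ViennaCL headers, Boost ublas, and an OpenCL runtime.
#include <boost/numeric/ublas/vector.hpp>
#include <boost/numeric/ublas/matrix_sparse.hpp>
#include "viennacl/vector.hpp"
#include "viennacl/compressed_matrix.hpp"
#include "viennacl/linalg/cg.hpp"

namespace ublas = boost::numeric::ublas;

void solve_on_cpu_and_gpu(ublas::compressed_matrix<float> const& A,
                          ublas::vector<float> const& b) {
    // CPU: run the CG solver directly on the ublas types.
    ublas::vector<float> x_cpu =
        viennacl::linalg::solve(A, b, viennacl::linalg::cg_tag());

    // GPU: copy the data into ViennaCL types, then make the same call.
    viennacl::compressed_matrix<float> vcl_A(b.size(), b.size());
    viennacl::vector<float> vcl_b(b.size());
    viennacl::copy(A, vcl_A);
    viennacl::copy(b, vcl_b);
    viennacl::vector<float> x_gpu =
        viennacl::linalg::solve(vcl_A, vcl_b, viennacl::linalg::cg_tag());
}
```

The solver call itself is identical in both cases; only the types of the matrix and vectors change, which is what keeps the required code changes small.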

At present, ViennaCL does not provide double precision with Stream SDK 2.1, because the SDK does not yet implement all of the functionality defined in the OpenCL double precision extension. Moreover, the current version of ViennaCL targets only GPUs via OpenCL, not CPUs. This will most likely change in the next revision.

More information can be found on the project homepage located at http://viennacl.sourceforge.net/ (remark for the forum rules: we don't earn anything if you click on that link)

If you have any questions, feel free to ask them here :-)

Best regards,

Karli

Hi, I have some questions.

Are the CPU implementations optimized? Specifically, are the calculations blocked to fit in cache? The ATLAS BLAS library is an example of this kind of tuning. And do the CPU implementations use all of the cores? I ask because the usual wisdom is that CPUs have the advantage when arithmetic intensity is low, due to their very high effective memory bandwidth.

The other thing I wonder about is performance with dense matrices, specifically something like a covariance matrix (which is symmetric positive definite and dense).