Archives Discussions

michael_chu · ‎05-13-2010

http://developer.amd.com/documentation/articles/Pages/OpenCL-Optimization-Case-Study.aspx

This article discusses performance optimizations for AMD GPUs and CPUs, using a simple yet widely used, computationally intensive kernel as a case study: Diagonal Sparse Matrix Vector Multiplication. We look at several topics that come up during OpenCL™ performance optimization and apply them to our case study:

Translating C code to OpenCL™
Choosing data structures for dense, aligned memory accesses
Using local, on-chip memory
Vectorizing the computation for higher efficiency
Using OpenCL™ images to improve effective memory bandwidth
Parallelism for multicore processors

At the end of our journey, we'll have a high-performance kernel for both the AMD Radeon™ HD 5870 GPU and the AMD Phenom™ II X4 965 CPU.
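For readers who have not seen the kernel before, here is a minimal plain-C sketch of diagonal (DIA) sparse matrix-vector multiplication, the kind of starting point one would translate to OpenCL™. It is not taken from the article: the function name `dia_spmv`, the argument order, and the storage layout (each stored diagonal padded to length `n`, with `data[d * n + row]` holding the entry at row `row` and column `row + offsets[d]`) are all assumptions made for illustration.

```c
#include <assert.h>

/* Compute y = A * x for an n-by-n matrix A stored in diagonal (DIA) format.
 *
 * Assumed (hypothetical) layout:
 *   offsets[d]         distance of stored diagonal d from the main diagonal
 *   data[d * n + row]  value of A[row][row + offsets[d]]
 * Padding entries that fall outside the matrix are never read.
 */
void dia_spmv(int n, int ndiags, const int *offsets,
              const float *data, const float *x, float *y)
{
    assert(n >= 0 && ndiags >= 0);

    for (int row = 0; row < n; ++row) {
        float sum = 0.0f;
        for (int d = 0; d < ndiags; ++d) {
            int col = row + offsets[d];
            /* Skip the parts of a diagonal that lie outside the matrix. */
            if (col >= 0 && col < n)
                sum += data[d * n + row] * x[col];
        }
        y[row] = sum;
    }
}
```

Each iteration of the outer loop is independent of the others, so a straightforward OpenCL™ translation would typically assign one row to each work-item and take the row index from `get_global_id(0)` instead of the loop counter. In this layout, consecutive rows read consecutive elements of `data` for a given diagonal, which is one way to get the dense, aligned accesses mentioned in the list above.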

OpenCL™ allows developers to write portable, high-performance code that can target both GPUs and CPUs. OpenCL™ unlocks the performance capabilities of today's parallel processors, although, as with any other programming environment, achieving high performance requires careful attention to how the code is mapped to the hardware platform and executed. Since performance is a prime motivation for using OpenCL™, performance optimization is a natural part of learning how to program in OpenCL™.

cuorematto · ‎06-24-2010

Really a great article from Bryan Catanzaro. I liked it and printed it. In the future I want to experiment a little with GPU performance improvements.

The translation from C code to OpenCL is very, very good. I hope future articles will include more examples of translating C to OpenCL.

kathleenr · ‎03-18-2011

Excellent article. I have got it bookmarked.

kiddoman · ‎03-31-2011

That is very kind of you! It helps me a lot.

galmok · ‎04-28-2011

Do we really need 2 identical threads, both stickied?

MicahVillmow · ‎04-28-2011

galmok,
Thanks for bringing this to our attention. I've requested that the other post be unstickied.

OpenCL Optimization Case Study: Diagonal Sparse Matrix Vector Multiplication