Showing results for 
Search instead for 
Did you mean: 

Archives Discussions

Adept I

Best GPU algorithm for solving 4x4 linear equation system


What is the best GPU algorithm for solving a linear equation system Ax = b, where A is a 4x4 or 3x3 matrix?

The equation is to be solved in one work item.

Thank you in advance!

Vis Cocoa

10 Replies
Adept I

I'm not aware of a backslash operator available yet for OpenCL.  Does anyone know of one?  If so, we'd love to add it to Jacket and ArrayFire.

Also, I assume you're going to do many 4x4 or 3x3 matrices in a big batch.  Doing just one of these is going to be slower on the GPU than it would be on the CPU (i.e. not enough data to get benefit from exploiting data-parallelism).


I have just got an idea for 3x3

[a b c] x = d

det [a b c] = dot (a, cross (b, c));

det [d b c] = dot (d, cross (b, c));

det [a d c] = dot (a, cross (d, c));

det [a b d] = dot (a, cross (b, d));

x = (float3)(det[d b c], det [a d c], det [a b d]) / det [a b c];

Seems to be an efficient way? Any better ideas?


Your 3x3 method assumes the matrix is invertible.  Is that guaranteed for your case?


Hi Jeff,

Nice to see your reply. No, the matrix is not guaranteed to be invertible. I wonder what if A is a singular matrix? I hope it results in an infinite float number

I have not make the kernel work yet. I am currently struggling with printf, which sometimes prints wrong characters, sometimes does not work at all. I appreciate if you give me some ideas on printing floats.

Thank you!

Vis Cocoa


Row reduction is guaranteed to work even for singular matrices, but then I don't have a GPU friendly algorithm for that!


Thanks Jeff.

For Ax = b, if A is singular, the equation has either no solutions or infinitely many solutions.

I am working on a ray tracer. I can safely treat both cases as no solutions.

I have tested the triple product based method. It is more efficient than my previous method using scalar calculations


Since you said you are working on a ray tracer I assume the 4x4 matrix would be a transformation matrix in which case it would be guaranteed to be invertible (that is if the transformation is linear, which I think it is, you would have to check). In this case I would do something similar to your 3x3 method of going for the determinant but for a 4x4 matrix, I suggest you use mathcad or matlab with the simbolic toolbox to make things easier and less error prone. Iterative methods for a matrix this size are not efficient, by the second iteration you probably could have already solved it with gauss, LU, etc..


thank you John!

Adept I

Left multiplying by the transpose of A and solving the thus resulting system should work (this should turn your system into a symmetric definite one). Given that it's a small system, your best bet is to just solve it on the CPU, doing LU decomp or Gauss-Jordan elimination or whatever. Alternately you have iterative methods that should map reasonably well to a parallel machine (still, if you just want to solve one system, just use the CPU and bog standard serial C/C++/Java/somethingsomething). Look at something like CG or GMRES, or, for lots of useless but trivially parallelizable work, Jacobi. If you have more systems to solve, or perhaps more right-hand side, parallel solves should be obvious for most of the aforementioned methods.

Hi AlexV,

Thank you very much for so many good ideas. I need to solve a small equation system in each work item, rather than solving a big equation for all work items. Therefore, it would be inefficient to move the work to CPU.

When working on CPUs, I like Singular-Value-Decomposition based method. A CPU core is much more powerful than a GPU core