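OpenCL C does not support recursion: a function running on the device cannot call itself, directly or indirectly, and many implementations simply inline device functions into the kernel at compile time. You can recurse on the CPU instead and launch a GPU kernel from each CPU call. For your example, though, you don't even want to use repeated matrix multiplication to compute A^k, where A is n×n (A has to be square for A^k to be defined). If A is diagonalizable, compute an eigenvalue decomposition A = V D V^-1, raise the eigenvalues on the diagonal of D to the k-th power, and recombine with the eigenvector matrices: A^k = V D^k V^-1.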
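To illustrate the "recurse on the CPU, call the GPU from each call" idea, here is a minimal host-side sketch in Python. It assumes a square matrix stored as a list of lists, and `gpu_matmul` is a hypothetical stand-in for enqueueing your own matrix-multiply kernel; it is written in plain Python here only so the sketch runs on its own. The repeated-squaring recursion is my choice for the example, not something your code has to use.

```python
def gpu_matmul(a, b):
    """Hypothetical stand-in for a device kernel launch.

    In a real program this would enqueue a matrix-multiply kernel and
    read back (or keep on the device) the result buffer.
    """
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)]
        for row in a
    ]


def identity(n):
    """n x n identity matrix, used as A^0."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def matrix_power(a, k):
    """Compute A^k with the recursion on the host.

    Each level of recursion makes one or two "kernel" calls, so the
    device code itself never needs to call itself.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if k == 0:
        return identity(len(a))
    half = matrix_power(a, k // 2)      # recursion happens on the CPU
    squared = gpu_matmul(half, half)    # work happens in the kernel
    return gpu_matmul(squared, a) if k % 2 else squared


if __name__ == "__main__":
    print(matrix_power([[1, 1], [1, 0]], 10))
```

With repeated squaring the recursion depth is about log2(k), so this needs far fewer kernel launches than multiplying by A k times. The eigendecomposition route is still preferable when A is diagonalizable and well conditioned.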