I'm new to OpenCL and I'm trying to implement the power iteration method (described over here)
for matrices larger than 100000x100000!
Actually, I have no idea how to implement this,
because a work-group is limited by CL_DEVICE_MAX_WORK_GROUP_SIZE (so I can't make one work-group with 1000000 work-items).
But at each iteration step I need to synchronize and normalize the vector.
1) So is it possible to do all the calculations inside one kernel? (I think the answer is no if the matrix size is larger than CL_DEVICE_MAX_WORK_GROUP_SIZE.)
2) Can I put the "while" loop in the host code? And is it still profitable to use the GPU in that case?
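1) Does iteration k+1 depend on the result of iteration k, so that you need global synchronization? A barrier inside a kernel only synchronizes the work-items of one work-group, so global synchronization is done by splitting the work into separate kernel executions.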
2) Yes, 100000 work-items can be profitable. If you are enqueueing small kernels, it is best to enqueue several of them before calling any synchronization,
so your OpenCL driver can batch them together and run them more efficiently.