Archives Discussions

jski · ‎05-31-2011

I've been reading about NDRange and am wondering how best to map it to the problem at hand. An NDRange is a 1-, 2-, or 3-dimensional index space in which each point corresponds to one kernel instance (a work-item). It appears to map well onto the architectural layout of the GPU.
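As a rough illustration of that idea, here is a minimal sketch in plain Python (no OpenCL calls) that emulates a 2D NDRange serially. The names `matmul_work_item` and `run_ndrange_2d` are hypothetical and exist only for this example. In a real OpenCL kernel the two indices would come from `get_global_id(0)` and `get_global_id(1)`, and the work-items would run in parallel rather than in a loop.

```python
def matmul_work_item(a, b, c, row, col):
    """Body of one hypothetical work-item: compute a single element of C."""
    inner = len(b)  # shared dimension of A (columns) and B (rows)
    c[row][col] = sum(a[row][k] * b[k][col] for k in range(inner))


def run_ndrange_2d(global_size, kernel, *args):
    """Emulate a 2D NDRange serially: call the kernel once per index pair."""
    rows, cols = global_size
    for row in range(rows):
        for col in range(cols):
            kernel(*args, row, col)


a = [[1, 2], [3, 4], [5, 6]]      # 3 x 2
b = [[7, 8, 9], [10, 11, 12]]     # 2 x 3
c = [[0] * 3 for _ in range(3)]   # 3 x 3 result, one element per work-item

# The NDRange has the same shape as the output matrix C.
run_ndrange_2d((3, 3), matmul_work_item, a, b, c)
print(c)
```

The point of the sketch is only that the NDRange is sized to the output: one index pair per element of C.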

If I have two 10K by 10K matrices and wish to multiply them, I would undoubtedly choose a 2D NDRange. Should it be as large as possible? And since these matrices exceed the capacity of the GPU, how should I best map the A and B matrices onto the 2D NDRange available?

---jski

himanshu_gautam · ‎05-31-2011

That is an out-of-core matrix multiplication problem you are talking about.

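The answer is that you will have to divide the matrices into blocks (say, divide the A matrix into blocks of rows and the B matrix into blocks of columns). Then send these blocks to the GPU one pair at a time and multiply them; each pair yields one block of the result.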
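The blocking scheme described above might look roughly like the following sketch, again in plain Python rather than OpenCL host code. Assuming single-precision values, each 10K by 10K matrix takes about 400 MB, so the block size would be chosen so that one row block of A, one column block of B, and the resulting tile of C fit in device memory together. The functions `multiply_block` and `blocked_matmul` and the `block` parameter are hypothetical names for this example; `multiply_block` stands in for a buffer upload, one kernel launch over a 2D NDRange the size of the tile, and a read-back.

```python
def multiply_block(a_rows, b_cols):
    """Stand-in for one kernel launch on a pair of blocks.

    a_rows is a list of rows of A; b_cols is a list of columns of B.
    Returns the corresponding tile of C.
    """
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a_rows]


def blocked_matmul(a, b, block):
    """Multiply A by B one (row block, column block) pair at a time."""
    n_rows, n_cols = len(a), len(b[0])
    b_columns = [list(col) for col in zip(*b)]  # B stored column by column
    c = [[0] * n_cols for _ in range(n_rows)]
    for r0 in range(0, n_rows, block):
        a_rows = a[r0:r0 + block]               # row block of A
        for c0 in range(0, n_cols, block):
            b_cols = b_columns[c0:c0 + block]   # column block of B
            tile = multiply_block(a_rows, b_cols)
            # Copy the finished tile into its place in C.
            for i, tile_row in enumerate(tile):
                c[r0 + i][c0:c0 + len(tile_row)] = tile_row
    return c


a = [[i + j for j in range(4)] for i in range(5)]      # 5 x 4
b = [[i * j + 1 for j in range(5)] for i in range(4)]  # 4 x 5
c = blocked_matmul(a, b, block=2)
print(c)
```

Because the block size need not divide the matrix dimensions evenly, the last block in each direction is simply smaller. A real implementation would likely also tile the shared dimension and overlap transfers with computation, which this sketch does not attempt.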

How best to map NDRange to the problem at hand?