Multithreaded Brook+ algorithm from a nested for loop

Hi,

I am new to Brook+ programming. I read the Brook+ programming guide but could not figure out how to create multiple threads in a kernel to take advantage of the GPU.

The goal is to convert an algorithm with 4 nested for loops into a multithreaded Brook+ program to improve performance.
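To make the question concrete, here is a made-up example in plain C (not my real algorithm; all names and sizes are placeholders I chose for illustration). As far as I understand the guide, a Brook+ kernel runs once per element of the output stream, so I assume the two outer loops would become the output domain and the two inner loops would stay inside the kernel body. The second version below only mimics that split on the CPU; it is not Brook+ code.

```c
/* Hypothetical sizes, chosen so every in[y + i][x + j] index is in range. */
#define OUT_H 2
#define OUT_W 3
#define K     2
#define IN_H  (OUT_H + K - 1)
#define IN_W  (OUT_W + K - 1)

/* Shape of the starting point: 4 nested for loops on the CPU. */
void nested_loops(float in[IN_H][IN_W], float w[K][K],
                  float out[OUT_H][OUT_W])
{
    for (int y = 0; y < OUT_H; y++)
        for (int x = 0; x < OUT_W; x++) {
            float sum = 0.0f;
            for (int i = 0; i < K; i++)
                for (int j = 0; j < K; j++)
                    sum += in[y + i][x + j] * w[i][j];
            out[y][x] = sum;
        }
}

/* Work for ONE output element (y, x): only the two inner loops remain.
 * My assumption is that this is roughly what a kernel body would hold. */
float one_element(float in[IN_H][IN_W], float w[K][K], int y, int x)
{
    float sum = 0.0f;
    for (int i = 0; i < K; i++)
        for (int j = 0; j < K; j++)
            sum += in[y + i][x + j] * w[i][j];
    return sum;
}

/* The two outer loops now only pick which output element to compute.
 * Each call is independent of the others, so on a GPU these iterations
 * could in principle run as separate threads. */
void per_element(float in[IN_H][IN_W], float w[K][K],
                 float out[OUT_H][OUT_W])
{
    for (int y = 0; y < OUT_H; y++)
        for (int x = 0; x < OUT_W; x++)
            out[y][x] = one_element(in, w, y, x);
}
```

Both functions fill `out` with the same values; the difference is only that the second one isolates the independent per-element work.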

Is there anything I can read to learn how to do this?