Shared memory was not supported in Brook+ until SDK 1.3, and the application was written before that. There are not many samples that use shared memory either, because there are some limitations on using shared memory on current ATI GPUs.
But I think that, even with the current limitations, it should be possible to write a better algorithm for binomial option pricing.
Thanks for getting back.
1. One more question. Since kernels can spawn sub-kernels, what if I spawn a sub-kernel that has more threads than the current one? Will it work? (Anyway, the sub-kernel concept looks cool.)
2. When I use a "read-write" stream, how does synchronization happen between threads? (Or is a thread allowed to modify only the content allocated to it? That must be the case, I think.)
3. When I use "read-write" global memory, say for a refined binomial sample, how does synchronization happen between threads? This is very important for binomial pricing, because the same buffer has to be back-calculated and reduced. It is essentially a reduction, but one that can be done in parallel and smartly.
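To make the dependency in question 3 concrete, here is a minimal sequential sketch in plain Python (not Brook+ code). It assumes a Cox-Ross-Rubinstein tree and a European call; the function name and parameters are hypothetical and chosen only for illustration. Each backward step overwrites node j using the old values of nodes j and j+1 in the same buffer, which is exactly why parallel threads would need a synchronization point between reading and writing.

```python
import math

def binomial_call_price(s0, strike, rate, sigma, maturity, steps):
    """European call on a Cox-Ross-Rubinstein tree, reduced in one buffer.

    Illustrative sketch only: names and parameters are assumptions,
    not taken from the SDK sample discussed in this thread.
    """
    dt = maturity / steps
    up = math.exp(sigma * math.sqrt(dt))
    down = 1.0 / up
    disc = math.exp(-rate * dt)                      # one-step discount factor
    p = (math.exp(rate * dt) - down) / (up - down)   # risk-neutral up probability

    # Option payoffs at expiry, one entry per leaf of the tree.
    values = [max(s0 * up**j * down**(steps - j) - strike, 0.0)
              for j in range(steps + 1)]

    # Back-calculate: every level shrinks the live part of the buffer by one.
    for level in range(steps, 0, -1):
        for j in range(level):
            # Reads the OLD values[j] and values[j + 1]. Sequentially this is
            # safe because j + 1 has not been overwritten yet; with one thread
            # per j it is a race unless reads and writes are separated.
            values[j] = disc * (p * values[j + 1] + (1.0 - p) * values[j])
    return values[0]
```

For example, `binomial_call_price(100.0, 100.0, 0.05, 0.2, 1.0, 200)` prices an at-the-money one-year call; with enough steps the result should approach the Black-Scholes value for the same inputs.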
A search for "Synchronization" in the user guide only turns up the application's synchronization requirements for CAL calls.
Thanks & Best Regards,
1. Kernels can NOT spawn sub-kernels; they can only call them like a function. The sub-kernel gets inlined in the generated assembly.
Brook+ and CAL support two different modes of execution: pixel mode and compute mode (the latter is similar to the CUDA execution model). There is no synchronization mechanism in pixel mode, but in compute mode you can synchronize between threads within a group (similar to a CUDA block). Use syncGroup() in Brook+ and fence_threads in CAL IL.
Look at section 2.17.6 of the Stream Computing User Guide for Brook+, and at the Intermediate Language specification for the CAL IL instructions.
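As a rough illustration of what a group-level barrier buys you for a tree reduction, here is a small simulation in plain Python. It is only an analogy: `threading.Barrier` stands in for the role that syncGroup() (or fence_threads in CAL IL) would play within a thread group, and the function name and arguments are hypothetical. It is not Brook+ or CAL code and says nothing about their actual syntax or performance.

```python
import threading

def reduce_tree_with_barrier(leaves, p, disc):
    """Back-calculate a binomial tree with one worker thread per node.

    Assumption-laden sketch: the barrier mimics a group-wide sync point.
    """
    values = list(leaves)
    n = len(values) - 1              # number of backward steps
    if n == 0:
        return values[0]
    barrier = threading.Barrier(n)   # one party per worker 0..n-1

    def worker(j):
        for level in range(n, 0, -1):
            active = j < level       # nodes beyond the live range sit idle
            if active:
                # Phase 1: read both inputs into a private value.
                new = disc * (p * values[j + 1] + (1.0 - p) * values[j])
            barrier.wait()           # everyone has finished reading
            if active:
                # Phase 2: now it is safe to overwrite the shared buffer.
                values[j] = new
            barrier.wait()           # everyone has finished writing

    threads = [threading.Thread(target=worker, args=(j,)) for j in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return values[0]
```

Note that idle workers still reach both barriers on every level; on a GPU the same rule applies, since all threads of a group have to hit the synchronization point. The two-phase read/sync/write/sync pattern only covers threads inside one group, which matches the restriction described above.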