Archives Discussions

sarnath · ‎06-01-2009

1 kernel for each time-step??

I just downloaded the AMD Stream SDK.

I see a few things in the binomial option pricing SDK sample:

1. Kernels call other kernels inside. This is interesting, and the user guide says it's possible too. Just great!

2. There is a separate GPU kernel for each number of time-steps (4, 8, 12, etc.). This is just ridiculous. Isn't there a better way to do this binomial thing on ATI?

3. More time-steps usually give more accurate results and are desirable. 200 time-steps is normal. Even 1000 time-steps would be good (though I don't have practical knowledge of how bankers use it). More time-steps bring this discrete tree closer to a continuous model like Black-Scholes and yield more accurate results. So, if I were to run option pricing for different numbers of time-steps, would I need to write kernel after kernel? There must be a better way, guys...
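To make the point about time-steps concrete, here is a minimal CPU-side sketch in Python (not Brook+ and not the SDK sample's code) of a Cox-Ross-Rubinstein binomial tree for a European call, where the number of time-steps is just a runtime argument. The function names, the parameter values in the usage note, and the choice of a European call are assumptions made for illustration only.

```python
import math


def binomial_call(spot, strike, rate, vol, expiry, steps):
    """European call on a Cox-Ross-Rubinstein tree with `steps` time-steps.

    Hypothetical helper for illustration; `steps` is an ordinary loop bound.
    """
    dt = expiry / steps
    up = math.exp(vol * math.sqrt(dt))      # up-move factor
    down = 1.0 / up                         # down-move factor
    growth = math.exp(rate * dt)
    p = (growth - down) / (up - down)       # risk-neutral up probability
    disc = 1.0 / growth                     # one-step discount factor

    # Payoffs at expiry: node j has j up-moves and (steps - j) down-moves.
    values = [max(spot * up ** j * down ** (steps - j) - strike, 0.0)
              for j in range(steps + 1)]

    # Walk back through the tree, one level per time-step, in place.
    for level in range(steps, 0, -1):
        for j in range(level):
            values[j] = disc * (p * values[j + 1] + (1.0 - p) * values[j])
    return values[0]


def black_scholes_call(spot, strike, rate, vol, expiry):
    """Closed-form Black-Scholes call price, used only as a reference."""
    sqrt_t = math.sqrt(expiry)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * expiry) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t

    def norm_cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return spot * norm_cdf(d1) - strike * math.exp(-rate * expiry) * norm_cdf(d2)
```

Calling binomial_call(100, 100, 0.05, 0.2, 1.0, steps) with steps set to 4, 200 and 1000 and comparing against black_scholes_call(100, 100, 0.05, 0.2, 1.0) shows the tree price moving toward the continuous-model price as the step count grows, with the same code serving every step count.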

We have developed option pricing models using CUDA and it is just a breeze. The SDK sample is just plain discouraging. A better one would be good!

gaurav_garg · ‎06-01-2009

Until SDK 1.3, shared memory was not supported in Brook+, and the application was written before that. There are not many samples on shared memory either, as there are some limitations on using shared memory on current ATI GPUs.

But I think that, even with the current limitations, writing a better algorithm for binomial option pricing should be possible.

sarnath · ‎06-01-2009

Dear Gaurav,

Thanks for getting back.

1. One more question. Since kernels can spawn sub-kernels: what if I spawn a sub-kernel that has more threads than the current one? Will it work? (Anyway, the sub-kernel concept looks cool.)

2. When I use a "read-write" stream, how does synchronization happen between threads? (Or is a thread allowed to modify only the content allocated to it? That must be the case, I think.)

3. When I use "read-write" global memory, say for a refined binomial SDK sample, how does synchronization happen between threads? This is very important for binomial pricing, because the same buffer has to be back-calculated and reduced. It is essentially a reduction, but one that can be done in parallel and smartly.

A search for "Synchronization" in the user guide only tells me about the application's synchronization requirements for CAL calls.

Thanks & Best Regards,

Sarnath

gaurav_garg · ‎06-01-2009

Hi Sarnath,

1. Kernels can NOT spawn sub-kernels; they can only call them like a function. The sub-kernel gets inlined in the assembly.

Brook+ and CAL support different modes of execution: pixel mode and compute mode (similar to the CUDA execution model). There is no synchronization mechanism in pixel mode, but in compute mode you can synchronize between threads within a group (similar to a CUDA block). Use syncGroup() in Brook+ and fence_threads in CAL IL.

Look at section 2.17.6 of the Stream Computing User Guide for Brook+, and at the Intermediate Language specification for CAL IL instructions.
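As a rough illustration of why group-level synchronization matters for the in-place backward pass, here is a small CPU sketch in Python. It is not Brook+ or CAL IL code: Python's threading.Barrier merely stands in for a group barrier such as syncGroup(), one Python thread plays the role of one GPU thread in a single group, and the function name backward_induct_with_barrier is hypothetical.

```python
import threading


def backward_induct_with_barrier(leaves, p, disc):
    """Reduce a binomial tree in place, one thread per node.

    Assumption: threading.Barrier is only a stand-in for a GPU group barrier.
    `leaves` holds the payoffs at expiry, `p` is the up probability and
    `disc` is the one-step discount factor.
    """
    values = list(leaves)
    steps = len(values) - 1
    if steps == 0:
        return values[0]

    barrier = threading.Barrier(steps)

    def worker(tid):
        for level in range(steps, 0, -1):
            active = tid < level
            if active:
                # Read both children into a private variable first.
                new_value = disc * (p * values[tid + 1] + (1.0 - p) * values[tid])
            barrier.wait()  # all reads for this level are done
            if active:
                values[tid] = new_value
            barrier.wait()  # all writes are done before the next level starts

    threads = [threading.Thread(target=worker, args=(tid,)) for tid in range(steps)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return values[0]
```

Every thread reaches both barriers on every level, including threads that have gone idle, because a barrier only releases once the whole group has arrived. Without the first barrier, a neighbour could overwrite values[tid + 1] before this thread has read it, which is exactly the hazard the shared read-write buffer raises.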
