I am using Brook+ to compute on streams of type `double`. However, the `sin`, `cos`, and `sqrt` functions only support type `float`, so their results differ slightly from those computed on the CPU. Because my kernel function performs many such computations, the errors accumulate and the final GPU results are very different from the CPU results.
I would also like to implement my own `sin`, `cos`, and `sqrt` functions in the kernel. But kernel functions don't support local arrays, structs, or unions, so I can't see how to implement them.
So what should I do to solve this precision problem? My program's precision requirements are strict.