With ratGPU this is the render I get using the AMD CPU implementation:
and this is the correct one, from the GPU and also from the Intel CPU implementation.
So it seems your compiler is getting some ray signs wrong, probably due to some kind of optimization applied in the drivers?
I have no idea. But if you are using OpenCL, can you try "-cl-opt-disable" and see whether some optimization is causing it?
Apparently, what breaks the result is the -cl-fast-relaxed-math flag.
If I use -cl-opt-disable or an empty options string, then it works OK.
I think it's a bug in your OpenCL CPU implementation because:
1. It works ok with the Radeon GPU implementation.
2. It works ok with other implementations like Intel or NVIDIA.
Btw, I'm using Windows 7 x64 and Catalyst 13.9/13.11 beta, a Radeon 7790 and ratGPU 0.6.0.
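One way to narrow this down further might be to build the same kernel with the individual flags that make up -cl-fast-relaxed-math. Per the OpenCL specification, -cl-fast-relaxed-math sets -cl-finite-math-only and -cl-unsafe-math-optimizations, and the latter includes -cl-no-signed-zeros and -cl-mad-enable. The table below is only a sketch of such a bisection; the names kBuildOptionTrials, build_option_trial_count and build_option_trial are made up for illustration, and nobody in this thread has reported results for the intermediate flags.

```c
#include <assert.h>
#include <stddef.h>

/* Build-option strings to try one at a time, from least to most aggressive.
 * Only the first two and the last one have reported outcomes in this thread;
 * the others are untested suggestions for bisecting the problem. */
static const char *const kBuildOptionTrials[] = {
    "",                              /* reported to render correctly */
    "-cl-opt-disable",               /* reported to render correctly */
    "-cl-mad-enable",                /* untested here */
    "-cl-no-signed-zeros",           /* untested here */
    "-cl-finite-math-only",          /* untested here */
    "-cl-unsafe-math-optimizations", /* untested here */
    "-cl-fast-relaxed-math",         /* reported to render incorrectly */
};

/* Number of option strings in the table. */
size_t build_option_trial_count(void)
{
    return sizeof(kBuildOptionTrials) / sizeof(kBuildOptionTrials[0]);
}

/* Returns the i-th option string; i must be below the trial count. */
const char *build_option_trial(size_t i)
{
    assert(i < build_option_trial_count());
    return kBuildOptionTrials[i];
}
```

Each string would be passed as the options argument of clBuildProgram, with only the AMD CPU device enabled, and the render compared against the known-good one. The first string that breaks the image would point at the responsible optimization.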
What a runtime does for "-cl-fast-relaxed-math" can be vendor dependent.
Anyway, I will go check...
Thanks for testing and posting the result here,
Best,
Bruhaspati
I hear that "-cl-fast-relaxed-math" is supposed to be more accurate on CPU than on GPU.
Can you confirm that what you are seeing is just the opposite?
Also, any repro case would be helpful.
I visited the ratGPU page.
My proxy blocked the download, but I guess you are supplying only binaries.
Any chance you can give us a small repro case?
Also, what speed gain do you get from the -cl-fast-relaxed-math option on each platform?
That's probably an indication of how much accuracy is lost.
Best,
Bruhaspati
Debugging it, I can clearly see that the sign of some operations is inverted. Values that should be X are -X or 0.0f, causing the bug.
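To put numbers on that, something like the following host-side check could compare a known-good output buffer (for example from the GPU or Intel device) against the AMD CPU output. It is a rough sketch, not code from ratGPU; SignReport and compare_signs are hypothetical names. It looks only at signs and exact zeros, so the small precision differences that relaxed math is allowed to introduce are not counted.

```c
#include <assert.h>
#include <stddef.h>

/* Counts of the two symptoms described above. */
typedef struct {
    size_t compared;     /* elements whose reference value is nonzero and not NaN */
    size_t sign_flipped; /* suspect value has the opposite sign (X became -X) */
    size_t zeroed;       /* suspect value is exactly 0.0f (X became 0.0f) */
} SignReport;

/* Compare n floats from a known-good run (ref) with a suspect run (out). */
SignReport compare_signs(const float *ref, const float *out, size_t n)
{
    SignReport r = {0, 0, 0};
    assert(ref != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        /* Skip zero and NaN references: they carry no usable sign. */
        if (ref[i] == 0.0f || ref[i] != ref[i])
            continue;
        r.compared++;
        if (out[i] == 0.0f)
            r.zeroed++;
        else if (out[i] == out[i] && (ref[i] > 0.0f) != (out[i] > 0.0f))
            r.sign_flipped++; /* out is not NaN and lies on the other side of zero */
    }
    return r;
}
```

Dumping an intermediate buffer (ray directions, for instance) from both devices and running it through this check should show which stage of the kernel first produces flipped or zeroed values.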
Fast relaxed math improves performance by around 10%, because I'm doing lots of MAD operations as well as 1/sqrt and 1/x operations.
Repro case: download ratGPU 0.6.0 for Windows and test it yourself with Catalyst 13.9/13.11 beta, with only the AMD CPU OpenCL device enabled.
ratGPU is available as a deb package or an EXE installer.
Does it install the sources as well?
Also, I need a compact test case. If you think the signs are inverted, can you create a small test case that shows the problem?
I am planning to develop a repro-case template with the necessary host-code to launch a kernel and get output.
Maybe you can then just plug in your kernel fragment, replace some host stubs, and provide us with a repro case.
Do you think that would help?
For now, I have attached a repro case I wrote while tracking down a problem in another thread.
Maybe this can give you a head start in writing your own simple repro case.
Thanks,
Best,
Bruhaspati
Thanks for the small code template. I'll try to locate which part is causing the problem, but it's going to take me some time because the kernel is quite complex.
I had no idea, but I have now learned a lot from this forum. Special thanks to devgurus.amd.com.
Hi bubu
I'm reviving this thread. Do you still see this issue?
--Prasad