joej
Adept I

OpenCL Driver Bug: Fury X, 32-bit

I get wrong results from a simple prefix-sum kernel with the 32-bit build on Fiji. The 64-bit build works, and Tahiti works with both.

I created a minimal project to reproduce:

GitHub - JoeJGit/OpenCL_Fiji_Bug_Report: Expose a 32bit driver bug

The project is in the zip file (I don't know how to upload a complete directory).

Please take a look. I have a similar problem with Vulkan but have not yet been able to reproduce it in a small test. See bug report.txt for details.

4 Replies
dipak
Big Boss

Hi Joe,

Thanks for reporting the issue.

It seems to be a compiler optimization issue. I see the same error even with an x64 build on my Hawaii card. After some experiments, I found the following workarounds:

1) Disable optimization during the kernel build, i.e. pass the optimization flag "-O0" or "-cl-opt-disable".

OR

2) In the PrefixSum() kernel (prefix_sum.cl), declare the following variables inside the loop as "volatile":

```c
for (uint step = 0; step < 8; step++)
{
    volatile uint mask = ...;
    volatile uint rd_id = ...;
    volatile uint wr_id = ...;
    ....
}
```
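One way to tell a compiler problem from a logic error in the kernel is to compare the GPU result against a CPU reference. The sketch below is hypothetical: the post elides the real `mask`, `rd_id` and `wr_id` expressions, so the index formulas here are an assumed work-efficient LDS scan pattern, with 256 elements handled by 128 simulated work-items over the 8 steps of the loop above. `workgroup_scan_emulated`, `scan_reference`, `SCAN_N` and `SCAN_STEPS` are illustrative names.

```c
/* Hypothetical CPU emulation of one work-group's inclusive prefix sum.
   The index formulas are an assumption, not taken from prefix_sum.cl.
   SCAN_N elements are processed by SCAN_N / 2 simulated work-items
   in log2(SCAN_N) steps. */
#define SCAN_N 256u
#define SCAN_STEPS 8u

void workgroup_scan_emulated(unsigned data[SCAN_N])
{
    for (unsigned step = 0; step < SCAN_STEPS; step++) {
        unsigned mask = (1u << step) - 1u;
        /* Blocks of size 2^(step+1): each work-item adds the last element of
           a block's lower half (already scanned) into one element of its
           upper half. Reads hit only lower halves and writes only upper
           halves, so emulating the work-items sequentially is safe. */
        for (unsigned lid = 0; lid < SCAN_N / 2u; lid++) {
            unsigned rd_id = ((lid >> step) << (step + 1u)) + mask;
            unsigned wr_id = rd_id + 1u + (lid & mask);
            data[wr_id] += data[rd_id];
        }
    }
}

/* Straightforward sequential inclusive scan, used as the known-good result. */
void scan_reference(unsigned data[SCAN_N])
{
    for (unsigned i = 1; i < SCAN_N; i++)
        data[i] += data[i - 1];
}
```

Running the same input through both functions and comparing element by element should give identical arrays. On the GPU, the steps would normally be separated by `barrier(CLK_LOCAL_MEM_FENCE)` so that each step reads the previous step's LDS writes.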

Could you please try the above workarounds and share your observation?

Regards,


Hi Dipak, yes, both suggested workarounds work for me.

Disabling optimization also solved another issue that had just shown up in the x64 build.

That's a good stress test: it's a graphics app processing 60,000 work-groups per frame, so I can see that it stays stable with large workloads over time.

Work-groups are 64, 128, or 256 threads wide, and I only need to disable the optimizer for the 128- and 256-wide groups.
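The per-size workaround can be kept in one place on the host. A minimal sketch, assuming one `cl_program` is built per work-group size; `build_options_for_group_size` is a hypothetical helper, and its mapping (optimizer off only for 128- and 256-wide groups) simply mirrors the observation above.

```c
#include <stddef.h>

/* Hypothetical helper: choose OpenCL build options per work-group size.
   Only the 128- and 256-wide variants get the optimizer disabled;
   every other size keeps the default optimizations (empty option string). */
const char *build_options_for_group_size(size_t wg_size)
{
    switch (wg_size) {
    case 128:
    case 256:
        return "-cl-opt-disable";
    default:
        return "";
    }
}
```

The result would be passed as the `options` argument, e.g. `clBuildProgram(program, 1, &device, build_options_for_group_size(wg), NULL, NULL)`, so the 64-wide variant still gets the optimized build. `-cl-opt-disable` is the standard OpenCL build option; `-O0` was the other flag suggested in this thread.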

Let me know if you want me to track down the origin of this other bug; maybe I can create a second test case.

Do you think the same compiler issue can explain similar bugs in Vulkan?

In Vulkan the behaviour is very different: no errors show up for the first 10 to 30 frames, then they start appearing with increasing frequency.

And if I remember correctly, the errors also happen with a work-group size of 64, which is a single wavefront on GCN, so it's not necessarily just a wavefront synchronization issue.

EDIT:

The Vulkan bug has magically disappeared. I did not change the shader and can't remember any relevant changes in the project, so I have no idea why it's gone.

My guess is that another shader, which ran before it and has since been replaced, may have caused some kind of corruption, or it was something else entirely.


Thank you, Joe, for the confirmation. I'll open a ticket for that optimization issue.

> Let me know if you want me to track down the origin of this different bug, maybe i can create a second test case.

Sure, you can share the test case. If it's a different bug, please create a new thread for it; that will help us track it in the future.

Regards,


It turned out my second OpenCL bug was my own fault.

One more issue, maybe not worth its own thread:

I often use half floats to compress data in LDS.

Usually this gives me a speedup close to 2x because it increases occupancy.

But on complex shaders, VGPR usage increases substantially (according to CodeXL), and compression causes a slowdown.

It looks like the compiler does not free the temporary registers used by convert_float4 for the conversion.
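The trade-off between LDS savings and extra VGPRs can be made concrete with a rough occupancy model. The sketch below assumes commonly documented GCN limits (4 SIMDs per CU, at most 10 wavefronts per SIMD, 256 VGPRs per lane allocated in granules of 4, 64 KiB of LDS per CU, 64-wide wavefronts) and ignores SGPRs and other placement constraints. The function names and the example numbers are hypothetical, not measurements from this project.

```c
/* Simplified GCN occupancy model. Assumed limits, not queried from a device.
   SGPRs and work-group placement constraints are ignored. */
#define SIMDS_PER_CU 4u
#define MAX_WAVES_PER_SIMD 10u
#define VGPRS_PER_LANE 256u
#define VGPR_GRANULE 4u
#define LDS_BYTES_PER_CU 65536u
#define WAVE_SIZE 64u

static unsigned min_u(unsigned a, unsigned b) { return a < b ? a : b; }

/* Wavefronts per SIMD allowed by the register budget. */
unsigned waves_limited_by_vgprs(unsigned vgprs)
{
    /* Round the VGPR count up to the allocation granule. */
    unsigned alloc = (vgprs + VGPR_GRANULE - 1u) / VGPR_GRANULE * VGPR_GRANULE;
    if (alloc == 0u)
        return MAX_WAVES_PER_SIMD;
    return min_u(MAX_WAVES_PER_SIMD, VGPRS_PER_LANE / alloc);
}

/* Wavefronts per SIMD allowed by the LDS usage of a wg_size-wide work-group. */
unsigned waves_limited_by_lds(unsigned lds_bytes_per_group, unsigned wg_size)
{
    if (lds_bytes_per_group == 0u)
        return MAX_WAVES_PER_SIMD;
    unsigned groups_per_cu = LDS_BYTES_PER_CU / lds_bytes_per_group;
    unsigned waves_per_group = (wg_size + WAVE_SIZE - 1u) / WAVE_SIZE;
    return min_u(MAX_WAVES_PER_SIMD, groups_per_cu * waves_per_group / SIMDS_PER_CU);
}

/* Resulting occupancy: the tighter of the two limits. */
unsigned occupancy(unsigned vgprs, unsigned lds_bytes_per_group, unsigned wg_size)
{
    return min_u(waves_limited_by_vgprs(vgprs),
                 waves_limited_by_lds(lds_bytes_per_group, wg_size));
}
```

For example, a 256-wide group storing 16 KiB of floats in LDS is limited to 4 waves per SIMD. Storing halves (8 KiB) raises the LDS limit to 8, which pays off at 32 VGPRs (`occupancy(32, 8192, 256)` gives 8). If the conversions push the kernel to 72 VGPRs, however, the register limit drops occupancy to 3, below the uncompressed version.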

The same code in Vulkan still benefits from compression, and there the Vulkan shader is 4 times faster than the OpenCL kernel.

(Vulkan is generally faster, but usually only by about 10-20%.)
