

Journeyman III

OpenCL -O2 makes program incorrect: Help!

Hello AMD Forums,

This is my first post, and I'm also new to OpenCL, so hopefully my mistake here isn't too "n00b". My program works perfectly at the -O0 optimization level, but as soon as I go to -O1 or -O2, the output is completely wrong. The program is an OpenCL-based PuyoPuyo AI. In short, when you match four "puyos" of the same color, they pop and the puyos above them fall down, which can create a chain reaction. This GIF is a good example of how these chain reactions can occur: . Generally speaking, the bigger your chain, the more damage you do to the opponent.

There's a lot going on here (falling and scoring for example), but I've simplified the code down to the "pop" routine only. I've removed as much code as possible in the attached "" Visual Studio 2015 project. Please let me know if there are any issues with this .zip file or the project.

My hardware is an R9 290X and my drivers are "Radeon version 17.10.3".


The output includes a ton of "printf" statements, but the very first 4 statements are all you need to see the difference between -O2 and -O0. In -O2, the first few lines are:

Err is: 0

A 80000000, 80000000 00000000 00000000

Color Table: e0800000 04001500 e5600020

B 80000000, 80000000 00000000 00000000

C 80000000, 80000000 80000000 80000000

Without going into how the algorithm works... the difference between line B and line C is intriguing. The printf calls in the OpenCL "" file really show the mystery.

          printf("B %08x, %08x %08x %08x", pickedBit, groupTable[0], groupTable[1], groupTable[2]);
          while (didGrow) {
               didGrow = false;

               // Why is "Printout C" wrong when compiled with -O2?
               printf("C %08x, %08x %08x %08x", pickedBit, groupTable[0], groupTable[1], groupTable[2]);

This is simple enough: there's absolutely no change to "groupTable[1]" between "printf B" and "printf C". So why would these variables change with nothing but a "while" loop in between them?

Again, this doesn't happen with -O0. My only guess at the moment is that this might be a compiler bug? But I'm wondering if there was anything else that I could have done that would have made this output.

Thank you for your time. If there are any questions on how the code is supposed to work, please ask. I would like to get to the bottom of this if at all possible. In particular, the full code uses zero scratch registers at -O2, so I would rather use the optimized -O2 build. But if -O2 is wrong, I guess I'll be forced to keep my code at -O0.


Edit: some notes: This GIF: is represented by


And I've verified that they are the same. "B" for Blue, "G" for Green, "R" for Red, and "Y" for Yellow. Within the code however, I refer to these colors as "A, B, C, and D". Red is A, Green is B, Blue is C, and Yellow is D.

The algorithm starts looking for groups of 4 in the bottom left, then works "up" and "over". The "color table" is a local copy of the current color that it's trying to match, stored as a bitblock. ColorTable[0] == "e0800000" represents column 1 and column 2. The leading hex digit E is 1110 in binary, which corresponds to Blue-Blue-Blue in column 1 (starting at the bottom).
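For readers following along, here is a minimal host-side C sketch of how such a bitblock could be decoded. The layout is an assumption, not something taken from the attached project: each 32-bit word is assumed to hold two 16-bit columns (first column in the high half), with the most significant bit of each column being the bottom cell. The names `column_bits` and `cell_is_set` are hypothetical helpers for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED layout (not confirmed by the post): each 32-bit word holds two
   16-bit columns, the first column in the high half, and within a column
   the most significant bit is the bottom cell. */
enum { COLUMNS = 6, BITS_PER_COLUMN = 16 };

/* Extract the 16-bit mask of one column (0-based) from a 3-word table. */
static uint16_t column_bits(const uint32_t table[3], int column)
{
    assert(column >= 0 && column < COLUMNS);
    uint32_t word = table[column / 2];
    return (uint16_t)((column % 2 == 0) ? (word >> 16) : (word & 0xFFFFu));
}

/* Return 1 if the cell at (column, row) is set; row 0 is the bottom row. */
static int cell_is_set(const uint32_t table[3], int column, int row)
{
    assert(row >= 0 && row < BITS_PER_COLUMN);
    return (column_bits(table, column) >> (BITS_PER_COLUMN - 1 - row)) & 1;
}
```

Under that assumed layout, the color table from the printout (e0800000 04001500 e5600020) has its bottom three cells set in the first column, which matches the Blue-Blue-Blue description.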

Hopefully this edit will help anybody who is trying to understand what the code is doing. What I expect is for "groupTable[0]" to be 0x80000000, while groupTable[1] and groupTable[2] need to be 0x00000000, at least in the first iteration of the loop. Ultimately, groupTable[] will eventually equal (0xE0000000, 0x0, 0x0), representing the 3 blues in the bottom left (which isn't a large enough group to pop).
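As a CPU-side cross-check of that expectation, the following is a rough sketch of a bitboard flood fill in plain C. It is not the kernel from the attached project: the layout (two 16-bit columns per 32-bit word, first column in the high half, most significant bit at the bottom) is an assumption, and `flood_group` and `count_cells` are hypothetical names. Only the `didGrow` loop shape is borrowed from the snippet above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED layout: two 16-bit columns per word, first column in the high
   half, most significant bit of each column = bottom cell. */

/* Grow `group` (seeded with one or more cells) to every connected cell
   of the same color. `color` is the bitblock of the color being matched. */
static void flood_group(const uint32_t color[3], uint32_t group[3])
{
    for (int w = 0; w < 3; ++w)
        assert((group[w] & ~color[w]) == 0); /* seed must be that color */

    bool didGrow = true;
    while (didGrow) {
        didGrow = false;
        uint32_t next[3];
        for (int w = 0; w < 3; ++w) {
            uint32_t left_word  = (w > 0) ? group[w - 1] : 0u;
            uint32_t right_word = (w < 2) ? group[w + 1] : 0u;
            /* Vertical neighbours; masks stop bits crossing columns. */
            uint32_t up   = (group[w] >> 1) & 0x7FFF7FFFu;
            uint32_t down = (group[w] << 1) & 0xFFFEFFFEu;
            /* Horizontal neighbours: same row, adjacent column. */
            uint32_t sideways = (group[w] >> 16) | (group[w] << 16)
                              | (left_word << 16) | (right_word >> 16);
            next[w] = (group[w] | up | down | sideways) & color[w];
        }
        for (int w = 0; w < 3; ++w) {
            if (next[w] != group[w])
                didGrow = true;
            group[w] = next[w];
        }
    }
}

/* Count the cells in a group; a group needs at least 4 cells to pop. */
static int count_cells(const uint32_t group[3])
{
    int count = 0;
    for (int w = 0; w < 3; ++w)
        for (uint32_t bits = group[w]; bits != 0; bits &= bits - 1)
            ++count;
    return count;
}
```

Seeding this with (0x80000000, 0x0, 0x0) against the color table from the printout should settle on (0xE0000000, 0x0, 0x0), a group of 3. A reference like this gives something concrete to compare the -O0 and -O2 kernel output against.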

4 Replies
Big Boss

Thanks for reporting the issue and sharing the repro. I observed the same behavior when I ran the project on my Carrizo laptop. It seems to be an optimization problem. I'll report it to the appropriate team.

P.S. You've been whitelisted now.



Thanks for the quick response.

Has the team responsible figured out whether it's an optimization problem yet? I looked through the simplified code again, and I personally didn't see anything that seemed to be "undefined" with regard to OpenCL. It's all just bit operations without many pointers going anywhere. And since the workgroup size is {1, 1, 1}, I'm certain there aren't any strange multithreading issues going on.


A ticket (under OCL compiler issues) has already been opened for the problem, which looks like an optimization issue from my initial observation. Once I get an update from the team concerned, I'll share it with you.



I put a little more thought into this. If I set the compiler to OpenCL 1.2 mode with the flags "-g -O1", the kernel runs as OpenCL 1.2 and therefore works with the CodeXL debugger. The "printf" statements seem to take up too many resources, so I had to disable all of them to work with the CodeXL debugger.

Once those two steps are complete, the debugger is indeed useful for looking at the values. However, I can't seem to inspect the entire "groupTable" array.

Instead, I set a watch expression on "expansionMask" and step through the inner "while(didGrow)" loop. However, this crashes the CodeXL debugger after a few iterations.

Just in case this information is helpful to the compiler team, I thought I'd share.