
corry
Adept III

ret_logicalnz

What could I be doing that makes my kernel run about 10,000 times too fast when I use this call? I checked, and nothing is being optimized out. I'm trying to provide an early termination without using if statements, since that would require several nested if statements, which is just silly. Originally, I was using break to break out of the main function's loop, which works for eliminating a single if statement, but not an if-else. Returning control to the loop so it can continue processing would work, but alas, the results are incorrect, and the run time is minuscule. Something is going wrong... Help?

Yes, that's two questions in one day... I think I need more sleep so I can solve these on my own.
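In C terms, what the question is after looks roughly like this minimal sketch, assuming three hypothetical early-out checks (the conditions and the `step_nested`/`step_guarded` names are illustrative, not from the original kernel). Both functions compute the same result, but the guard-clause version stays flat, which is the shape a conditional return such as ret_logicalnz is meant to allow in IL.

```c
#include <assert.h>

/* Hypothetical per-iteration work with three early-out checks.
   The conditions are illustrative only. */

/* Nested form: every later stage sits one level deeper. */
int step_nested(int x)
{
    int acc = 0;
    if (x % 2 != 0) {              /* stage 1 runs only for odd x */
        acc += 1;
        if (x % 3 != 0) {          /* stage 2 skipped for multiples of 3 */
            acc += 10;
            if (x % 5 != 0) {      /* stage 3 skipped for multiples of 5 */
                acc += 100;
            }
        }
    }
    return acc;
}

/* Guard-clause form: each check returns early, so the code stays flat. */
int step_guarded(int x)
{
    int acc = 0;
    if (x % 2 == 0) return acc;
    acc += 1;
    if (x % 3 == 0) return acc;
    acc += 10;
    if (x % 5 == 0) return acc;
    acc += 100;
    return acc;
}

/* Checks that both forms agree over a range of inputs. */
void check_step_equivalence(int lo, int hi)
{
    for (int x = lo; x <= hi; ++x)
        assert(step_nested(x) == step_guarded(x));
}
```

For example, step_guarded(1) is 111, step_guarded(3) stops after the first stage with 1, and step_guarded(5) stops after the second with 11.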



LeeHowes
Staff

Just a timing error? It sounds like you're not measuring the kernel execution at all and only measuring the enqueue. Have you definitely wrapped timing code around everything to the point of waiting for the kernel event to be marked complete (or, better for testing, the readback)?
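To illustrate the difference Lee describes, here is a minimal, portable C sketch that models an asynchronous launch with a hypothetical fake device (`fake_launch` and `fake_is_done` only stand in for calCtxRunProgramGrid and calCtxIsEventDone; they are not CAL functions). It also swaps the Windows QueryPerformanceCounter timer for POSIX clock_gettime. Stopping the clock right after the launch returns measures only the enqueue; stopping it after the event reports done measures the kernel.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for an asynchronous device: "launching" only records
   when the work will be finished, the way an enqueue returns immediately. */
typedef struct {
    double done_at;  /* monotonic time (seconds) at which the fake kernel finishes */
} FakeEvent;

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

static FakeEvent fake_launch(double kernel_seconds)
{
    FakeEvent ev = { now_seconds() + kernel_seconds };
    return ev;                       /* returns at once, like an enqueue */
}

static bool fake_is_done(const FakeEvent *ev)
{
    return now_seconds() >= ev->done_at;
}

/* Misleading: stops the clock as soon as the launch call returns. */
double time_enqueue_only(double kernel_seconds)
{
    double start = now_seconds();
    (void)fake_launch(kernel_seconds);
    return now_seconds() - start;
}

/* Correct: keeps the clock running until the completion event reports done. */
double time_until_complete(double kernel_seconds)
{
    double start = now_seconds();
    FakeEvent ev = fake_launch(kernel_seconds);
    while (!fake_is_done(&ev)) {
        /* spin, like the calCtxIsEventDone polling loop */
    }
    return now_seconds() - start;
}

/* Sanity check: completion-based timing can never undercut the kernel time. */
void check_timing(double kernel_seconds)
{
    assert(time_until_complete(kernel_seconds) >= kernel_seconds * 0.999);
}
```

With a 0.05 s fake kernel, time_until_complete reports at least 0.05 s, while time_enqueue_only reports only the near-zero cost of the launch call itself.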


The relevant CAL code looks something like the attached code...

Hmm, there's no attach-code button anymore... I wonder if code tags work... will try it.

I guess code tags don't work... Is there still a way to post pretty-printed code?

LARGE_INTEGER Start, Finish, Freq;   // Windows high-resolution timer values
QueryPerformanceCounter(&Start);
LastResult = calCtxRunProgramGrid(&KernelFinishedEvent, DeviceContext, &pg);
if (LastResult != CAL_RESULT_OK)
{
    printf("Ugh, Failed to run, error was %d, %s\n", LastResult, calGetErrorString());
    return;
}
// Stop the clock only after the kernel's event reports completion
while (calCtxIsEventDone(DeviceContext, KernelFinishedEvent) == CAL_RESULT_PENDING)
{
    //printf("Sleeping....waiting....\n");
    // Nah, let's grab 100% CPU for this spin loop just to see if it makes a difference in timing...
    //Sleep(0); // Give up our time slice...
}
QueryPerformanceCounter(&Finish);
QueryPerformanceFrequency(&Freq);
unsigned long long time = Finish.QuadPart - Start.QuadPart;
double dtime = (double)time / ((double)Freq.QuadPart);   // elapsed seconds

The results it produces are completely illogical as well. The ret_logicalnz is a performance enhancement, so I can take it out and still get correct results, albeit somewhat more slowly than if ret were working as it should. Everything works until I try to return from this call to the loop.

I even removed the wacky thing I was doing before, abusing the break command. I had it in a function outside of main to break the loop in main, which, to my great satisfaction, worked. However, I thought something might be getting confused, so I moved it up into main, inside the loop. Tried the ret_logicalnz, and there it goes again: illogical results, done in 0.02 seconds.

Want something even weirder? I can get the ISA if I use the Dump environment variable, but the Kernel Analyzer in 11.12 refuses to compile the code. It doesn't give any error; it just doesn't show the ISA or any statistics. Yeah, I'm scratching my head on this one... If there isn't something completely obvious, I may have to write up a test case and see if that lets you guys see what I'm doing wrong!


OK, so here's some more behavior characterization. Normally, I expect the ret to be taken, hence the performance optimization. ret_logicalnz seems to return from the entire program. No amount of ret_dyn changes that behavior. If I switch the ret_logicalnz to ret_logicalz, it runs to the maximum value each thread is allowed to run to, as expected.

I guess I need this to be an actual function somehow, or get ret to act as ret, not as break. I'm a fan of inlining, obviously. However, in this case it seems to be hindering proper flow control. Any example of ret_dyn and ret_logicalz/nz should demonstrate the problem. I'm running driver 12.1, but was previously running 11.12 with the same issue. Not really sure where to go from here, short of nested ifs, dropping the optimization, or finding out there is a ret_dyn_logicalnz.

Either way, ret_dyn doesn't work, and ret_logicalnz anywhere in my program exits the program as soon as the condition is hit.
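A rough C model of the two behaviors described above, assuming a hypothetical per-thread loop whose helper bails out of the current iteration early (the `i % 4 == 0` condition and the function names are made up for illustration). The intended version returns only from the helper; the observed version behaves as if the inlined return ended the whole kernel, which would account for both the wrong results and the tiny run time.

```c
#include <assert.h>

/* Hypothetical stand-in for the early-termination check: skip the
   expensive part of iterations where i is a multiple of 4. */
static void helper(int i, int *work)
{
    if (i % 4 == 0)
        return;          /* intended: leave only the helper */
    *work += 1;          /* the expensive part being skipped */
}

/* Intended semantics: the loop keeps going after the helper returns early. */
int run_intended(int max_iters)
{
    int work = 0;
    for (int i = 1; i <= max_iters; ++i)
        helper(i, &work);
    return work;
}

/* Behavior reported in the thread: once the inlined conditional return
   fires, it ends the whole kernel, so the loop never resumes. */
int run_observed(int max_iters)
{
    int work = 0;
    for (int i = 1; i <= max_iters; ++i) {
        if (i % 4 == 0)
            return work; /* whole "kernel" exits here */
        work += 1;
    }
    return work;
}

/* With 10 iterations, the intended loop does 8 units of work (skipping
   i = 4 and 8); the observed one stops for good at i = 4 after 3 units. */
void check_models(void)
{
    assert(run_intended(10) == 8);
    assert(run_observed(10) == 3);
}
```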


So, in case anyone else has this particular issue...

The workaround I used was to make my main function's loop more of a do-while loop. Basically, the last thing it did in the loop was call the function that performed the check of whether to terminate early. Thus, a continue (yes, inside a function with no loops) would do the same thing ret should do. Prior to that, I had my re-initialization code after the call to the check, which, with continue, would never get called and would botch everything up. Just added it to my list of quirks to work around.
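The reordering can be sketched in C as follows, assuming hypothetical per-iteration scratch state (the `state`/`total` variables and the `i % 3 == 0` early-out are illustrative). In the original order, the continue that stands in for ret skips the re-initialization at the bottom of the loop; moving the re-initialization to the top and making the check the last thing in the body makes continue equivalent to returning from the check.

```c
#include <assert.h>

/* Hypothetical kernel loop: each iteration adds i into per-iteration scratch
   state, and an inlined early-out check skips iterations where i % 3 == 0. */

/* Original order: the re-initialization sits after the inlined check. */
int run_buggy(int n)
{
    int total = 0;
    int state = 0;
    for (int i = 1; i <= n; ++i) {
        state += i;             /* main work; assumes state was reset */
        if (i % 3 == 0)         /* inlined check */
            continue;           /* stands in for ret, but skips the reset below */
        total += state;         /* rest of the check function */
        state = 0;              /* re-initialization, never reached after continue */
    }
    return total;
}

/* Workaround order: reset first, check last, so continue acts like a return. */
int run_fixed(int n)
{
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        int state = 0;          /* re-initialization moved to the top */
        state += i;             /* main work */
        if (i % 3 == 0)         /* inlined check is the last thing in the body */
            continue;
        total += state;         /* rest of the check function */
    }
    return total;
}

void check_reorder(void)
{
    /* Only iterations 1, 2, 4, 5 should count: 1 + 2 + 4 + 5 = 12. */
    assert(run_fixed(6) == 12);
    /* The skipped reset leaks i = 3 into iteration 4: 1 + 2 + (3 + 4) + 5 = 15. */
    assert(run_buggy(6) == 15);
}
```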

corry
Adept III

So let me also add that not only has this *NOT* been fixed, but callnz/callz also do *NOT* work.

The call is completely missing, and the function is optimized out. So here's another request for FSAIL: fix it, or remove it, and/or let us control the optimizer and prevent things from being inlined...


If the problem does not show up in OpenCL, it won't be fixed. CAL has been deprecated.


Well, you tell me; I'm not the OpenCL compiler author. Do you use callnz/callz/ret_logicalz/ret_logicalnz when you compile OpenCL code? Is FSAIL going to have these instructions? Seems to me this one is a little too basic for your boilerplate answer... Or is OpenCL no longer compiling to IL?


I cannot speak about FSAIL, but we do have some internal apps that use ret_* instructions. So they are known to work in OpenCL.

As for CALL instruction, there are some situations where the CALL can be dropped/ignored. This is documented in the IL spec. If you have a sufficiently complex program, this might be a problem. These limits are guaranteed to never be hit by OpenCL.


Let me amend my statement. ANY function call, regardless of how it's called or how you return from it, *WILL NOT WORK* if it cannot be inlined. I.e., if it's called based on a condition, it's broken, because that doesn't inline...

So something like

if_logicalnz r0.x
call 4
endif

must be replaced with

if_logicalz r0.x
mov r6, r7
blah blah blah
endif

Why? Because the ret in function 4 kills the whole program.

This really smacks of a simple optimization bug... something that should be pretty simple to fix... and I imagine even for OpenCL it's imposing some serious constraints on how you generate your code... Hell, I haven't looked at people's OpenCL problems; maybe you'll find this is one of your OpenCL bugs...
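For comparison, the same manual inlining in C. This is a minimal sketch, assuming a hypothetical callee (`func4`, named after "function 4" above) with an early return. When its body is pasted into the caller's conditional, the early return has to become "skip the rest of the inlined body" (an if/else here), not a return from the caller; otherwise the code after the call never runs.

```c
#include <assert.h>

/* Hypothetical callee standing in for "function 4": it returns early for
   negative inputs and otherwise does some more work. */
static int func4(int v)
{
    if (v < 0)
        return v;            /* early return: should leave func4 only */
    return v * 2 + 1;
}

/* Caller with a conditional call, like "if_logicalnz r0.x / call 4 / endif". */
int caller_with_call(int flag, int v)
{
    int r = v;
    if (flag)
        r = func4(v);
    return r + 1000;         /* code after the call must still run */
}

/* Same caller with func4 pasted in by hand: the early return becomes an
   if/else inside the inlined region instead of a return from the caller. */
int caller_inlined(int flag, int v)
{
    int r = v;
    if (flag) {
        if (v < 0)
            r = v;           /* func4's early-return path */
        else
            r = v * 2 + 1;   /* func4's normal path */
    }
    return r + 1000;
}

/* Both versions must agree for every flag and input in the range. */
void check_inlining(int lo, int hi)
{
    for (int flag = 0; flag <= 1; ++flag)
        for (int v = lo; v <= hi; ++v)
            assert(caller_with_call(flag, v) == caller_inlined(flag, v));
}
```

For example, caller_inlined(1, 3) gives 1007 and caller_inlined(1, -2) gives 998: in both cases the trailing + 1000 still runs after the inlined body.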


This type of code will never be generated by OpenCL because of other constraints that AMDIL has in relation to OpenCL. That being said, are you using 'ret' or 'ret_dyn'? 'ret' is a DX9 instruction and 'ret_dyn' is a DX10/11 instruction, so it is more likely to behave correctly.
