dasmurkel
Journeyman III

Strange behavior in kernel - Losing variables during kernel run

Hi,

I've come across some very strange kernel behavior when running a certain kernel on my Radeon Device. Interestingly, the exact same code performs fine using the CPU-Device, either with the AMD-APP SDK or the Intel SDK (I have a Core i5).

The kernel looks as follows:

kernel void
incident_bitmask (const uint size,
                  const uint intersectionBufferPerRay,
                  const global uint* intersectionCount,
                  const global intersection* intersections,
                  const global ray* rays,
                  const global transmitter* sourceTransmitters,
                  const global wall* walls,
                  const global building* buildings,
                  float rx_planeHeight,
                  global uint* bitmask,
                  global incidentPoint* incidents)
{
  const uint id = get_global_id (0);
  if (id >= size)
    {
      return;
    }

// ... Some stuff ...
  printf ("ID: %i, intersectionBufPerRay: %i\n", id, intersectionBufferPerRay);
  const uint resultBufferSize = 2 * intersectionBufferPerRay + 1;

// ... Lots more stuff, branches, loops and function calls ...


  printf ("resultBufferSize for ID %i: %i\n", id, resultBufferSize);

}

I run this kernel with a global size of 6 and obviously the "size" parameter is also set to 6. This is my output:

ID: 0, intersectionBufPerRay: 100
resultBufferSize for ID 0: 201
ID: 1, intersectionBufPerRay: 100
resultBufferSize for ID 1: 201
ID: 2, intersectionBufPerRay: 100
resultBufferSize for ID 2: 201
ID: 3, intersectionBufPerRay: 100
resultBufferSize for ID 0: 201
ID: 4, intersectionBufPerRay: 100
resultBufferSize for ID 0: 201
ID: 5, intersectionBufPerRay: 100
resultBufferSize for ID 0: 201
ID: 6, intersectionBufPerRay: 100
resultBufferSize for ID 0: 201

Please note that the IDs are printed correctly by the first printf, but only IDs 0-2 are correct in the second printf. The remaining IDs (which were set to the correct global ID at the beginning of the kernel and are const, so there is no way I changed them) are printed as zero.

Another thing is that this only happens sometimes, and that the range for which the values are correct changes. Sometimes everything is fine, sometimes IDs 0-3 are correct and the rest are wrong, and sometimes "intersectionBufferPerRay" also prints zero.

Environment: Ubuntu 12.10, Catalyst 13.4, Radeon HD 7850

Any help is greatly appreciated!

himanshu_gautam
Grandmaster

Hi,

From the above alone, it is difficult to say anything.

There are some points that need to be checked:

1. You say the global size you set is 6. In that case the ID range should be 0 to 5, not 6, so I suspect there is a problem in your host program.

2. You mention that the results are not correct in the second run. Does that mean you are executing the program a second time, or are you calling the kernel in a loop?

To understand all this, please post the sample code.

Thank you for your answer.

The kernel together with its helper functions is very long, and since I couldn't find any environment for code in the forum editor, I cropped it.

As to your points:

1. Global range and "size" are, of course, the same. But it should not matter anyway.

2. I meant reruns of the whole program.

I did, however, find the problem and you are correct, it couldn't have been found with my short code snippet. At some point, I declared some functions inline. That messed up the kernel. Once I removed the "inline" keyword, the results were correct (again).

Then I looked for "inline" in the standard and didn't find it mentioned anywhere, neither as allowed nor as excluded from the C99 subset. So I feel there should be a compiler warning if "inline" is used but not supported. If it is supported, did I just find my first compiler bug?

If requested, I can still post the entire code.
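
For illustration only (this is not taken from the attached kernel): OpenCL C is based on C99, where "inline" is a function specifier that acts as a hint to the compiler. Below is a minimal sketch of a small helper declared "static inline". The helper name result_buffer_size is hypothetical; it simply wraps the 2 * intersectionBufferPerRay + 1 computation from the kernel above. It is written as plain C so that it can be compiled and checked on the host. Whether a particular OpenCL compiler handles "inline" correctly is a separate question, and the one this thread is about.

```c
/* Hypothetical helper, not part of the original kernel: wraps the
 * resultBufferSize computation so the inline specifier can be shown.
 * "static inline" gives the function internal linkage, so a definition
 * is always available in this translation unit even if the compiler
 * decides not to inline the call. */
static inline unsigned int
result_buffer_size (unsigned int intersection_buffer_per_ray)
{
  return 2u * intersection_buffer_per_ray + 1u;
}
```

With the value from the output above, result_buffer_size (100) evaluates to 201.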

Hi,

Could you please post your code? I have written a sample with an inline function and it works fine for me.

Check the code below. The inline function F1 just receives an argument and returns it.

/* Trivial inline helper: returns its argument unchanged. */
inline int F1(unsigned int val)
{
    return val;
}

__kernel void templateKernel(__global  unsigned int * output,
                             __global  unsigned int * input,
                             const     unsigned int multiplier)
{
    uint tid = get_global_id(0);
    if(tid==0)printf(" tid:%u", tid);
    output[tid] = input[tid] * multiplier;
    /* Route the result through the inline function. */
    output[tid] =   F1(output[tid]);
}

Sure. Please see the attachment. The main kernel is in incidentBitmask.cl

As I said, it is a pretty big kernel (at least by my standards). The trouble starts when I put "inline" in front of these four functions:

createReceiveIP

createBuildinglessReceiveIP

createWallIP

createRoofIP

I've used other functions inline, too, without problems. But in this case I just tried again and the four "inline"s make the difference between correct and totally bogus.

I also added the IL and ISA from KernelAnalyzer for both the correctly running version and the inlined version. Maybe it helps you track this down. Even though I can't really make sense of it, diff tells me that there are quite a few differences between the versions.

If you need any more material or have any questions, please let me know. Also, I would be interested in what caused this, being a curious computer engineer and all.

Best regards,

Matthias

Hi, could you please check the same with the latest driver, i.e. 13.8 beta? Also, please check on a non-GCN card if possible and let us know the results.

Hi,

I tried it on a HD 6670 Device (Turks, VLIW5, if I'm not mistaken?). It produces the same issue. Everything is fine without the "inline" but with "inline" the results are completely wrong.

As for the test with the newest beta driver: I'm currently in the final stage of my Diploma thesis, which is about OpenCL, and I can't afford to lose a day trying to reinstall my system because I messed up the driver. "Never touch a running system" is my motto right now, sorry.

I will get back to you when the thesis is finished and I have the time to experiment. That should be by the beginning of October. If there is anything else helpful I can provide you with, please let me know.

One more thing I did try, because this thread made me think of it, was to pass "-cl-opt-disable" to clBuildProgram. With that option, the results are correct, even with "inline". Hope this helps at least a little.

Best regards,

Matthias
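
The "-cl-opt-disable" experiment mentioned above can be sketched on the host side roughly as follows. This is a minimal sketch, not code from the attached project: the helper names build_options and build_program and the HAVE_OPENCL macro are made up for this example, and an already created cl_program and cl_device_id are assumed. "-cl-opt-disable" itself is a standard OpenCL build option that disables all optimizations.

```c
#include <stdio.h>

#ifdef HAVE_OPENCL   /* hypothetical macro: define it when an OpenCL SDK is available */
#include <CL/cl.h>
#endif

/* Hypothetical helper: returns the option string to hand to clBuildProgram. */
static const char *
build_options (int disable_optimizations)
{
  return disable_optimizations ? "-cl-opt-disable" : "";
}

#ifdef HAVE_OPENCL
/* Hypothetical helper: builds an existing program for a single device,
 * optionally with optimizations disabled, and prints the build log if
 * the build fails. */
static cl_int
build_program (cl_program program, cl_device_id device,
               int disable_optimizations)
{
  cl_int err = clBuildProgram (program, 1, &device,
                               build_options (disable_optimizations),
                               NULL, NULL);
  if (err != CL_SUCCESS)
    {
      char log[4096];
      size_t len = 0;
      /* Only print the log if it fit into the fixed-size buffer. */
      if (clGetProgramBuildInfo (program, device, CL_PROGRAM_BUILD_LOG,
                                 sizeof log, log, &len) == CL_SUCCESS)
        {
          fprintf (stderr, "build failed (%d):\n%.*s\n",
                   (int) err, (int) len, log);
        }
    }
  return err;
}
#endif
```

Building once with build_program (program, device, 0) and once with build_program (program, device, 1), then comparing the kernel output, is one way to check whether a wrong result depends on the optimizer.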

Sorry, this slipped off our radar. I will work on this and let you know.

Thanks for posting the repro case.

Hi,

Your ZIP does not contain a compilable piece of code.

It has dependencies on your project's header files like "api/public_enums.hpp"

Unless I can compile it, run it and reproduce the bad results, I cannot take this up with the engineering team.

Please provide a small, self-contained package that is minimal but sufficient to reproduce the problem.

Thanks,

Bruhaspati

Reviving the thread.  Do you see the problem with the latest drivers?

--Prasad
