cancel
Showing results for 
Search instead for 
Did you mean: 

Archives Discussions

mz24cn
Adept II

Bug report: Hawaii returns no results when printf removed

I will post the details later. Could any AMD staff give me an email address so I can send you the files to reproduce the bug. A exe file (running on Windows 8/10 x86_64, no dll required) and an opencl source file are enough (since I don't wanna send host side code)?

The correct results should be six numbers (returned by queue.enqueueReadBuffer). Hawaii returns zero results when printf (in opencl source) removed. If run again, it returns correct results. If the printf was reserved, both running return correct results. Spectre integrated GPU (A10-7870K) returns correct results whatever whether printf is removed.

printf was removed: (comment out one line in opencl source: if (env->flags) printf("found: GID=%ld\n", GID);)

Hawaii:

        build time: 10,372ms.

        run time: 187,969 microseconds.

        Found 0 results:

        run again: 5,587 microseconds.

        Found 6 results: 6,590 4,830 6,625 4,785 1,931 1,846

Hawaii:

        build time: 10,366ms.

        run time: 189,547 microseconds.

        Found 0 results:

        run again: 5,499 microseconds.

        Found 6 results: 6,590 6,625 4,785 4,830 1,931 1,846

Spectre:

        build time: 10,356ms.

        run time: 202,173 microseconds.

        Found 6 results: 1,846 1,931 4,830 4,785 6,590 6,625

        run again: 24,345 microseconds.

        Found 6 results: 1,846 1,931 4,785 4,830 6,590 6,625

printf was reserved: (env->flags differs in first run and second run, so, only printf once for each GPU)

Hawaii:

        build time: 9,831ms.

found: GID=1846

found: GID=1931

found: GID=4785

found: GID=4830

found: GID=6590

found: GID=6625

        run time: 807,821 microseconds.

        Found 6 results: 1,931 1,846 4,830 4,785 6,590 6,625

        run again: 594,972 microseconds.

        Found 6 results: 1,846 1,931 4,830 4,785 6,590 6,625

Hawaii:

        build time: 10,906ms.

found: GID=1846

found: GID=1931

found: GID=4785

found: GID=4830

found: GID=6590

found: GID=6625

        run time: 860,503 microseconds.

        Found 6 results: 1,931 1,846 4,830 4,785 6,590 6,625

        run again: 634,349 microseconds.

        Found 6 results: 1,931 1,846 4,785 4,830 6,590 6,625

Spectre:

        build time: 10,818ms.

found: GID=1846

found: GID=1931

found: GID=4785

found: GID=4830

found: GID=6590

found: GID=6625

        run time: 759,770 microseconds.

        Found 6 results: 1,931 1,846 4,830 4,785 6,590 6,625

        run again: 570,127 microseconds.

        Found 6 results: 1,931 1,846 4,830 4,785 6,590 6,625

0 Likes
1 Solution

  1. if (GID == 0) {
  2.   env->null = env->self - env->delta;
  3.   //env->null = env->self;
  4.   dest[0] = 0;
  5.   }
  6.   barrier(CLK_GLOBAL_MEM_FENCE);
  7.   if (GID == 1846) {
  8.   ++dest[0];
  9.   }

barrier() function can be only applicable to work-items within a work-group, not among work-items belonging from different work-groups.

In your case, work-item 0 (i.e GID == 0) and work-item 1846 (i.e GID == 1846) are from two different work-groups, so, there is no effect of barrier statement between these two work-items. However, both the work-items are trying to update the same memory value i.e. dest[0], which is totally undefined and the final value depends on the order of their execution. It also depends on the capability of particular GPU. Say, you may get the expected result from a low-end GPU which can run only 4 WGs at a time. Whereas a more powerful gpu, which can run 20 WGs at a time, may produce completely unpredictable result.

So, in order to produce expected result, both the work-items under the same barrier statement should belong to same work-group.

Regards,

View solution in original post

0 Likes
9 Replies
dipak
Big Boss

I've sent you a private message and shared my email address. Kindly check your message box.

Regards,

0 Likes

Hi dipak,

    Fortunately I find the root cause and reproduce it in a simple VS2013 project. I attched it.

    Note:

1, Comment out the line env->null = env->self - env->delta; and enable the line env->null = env->self; the two results are different for Hawaii/Spectre, but both are wrong for Hawaii/Spectre/Cypress. The correct behaviour is both lines should not change the results (value of dest[0]);

2, The wrong results are different for each device, even between two Hawaii devices (R9 295X2), and unstable;

3, Change GID == 1846 to be GID == 0 or comment out both env->null lines, the results are correct now (dest[0]=1);

4, The OpenCL driver version is lastest 1800.5. I compiled in x86_64 and opencl 1.2.

Please run and explain the results. Thank you, dipak~

0 Likes

I found a way to work around the issue.

Change the line if (GID == 0) to be if (get_local_id(0) == 0) . In this way Hawaii always returns correct results in actural code. However this time Spectre may return wrong results in low  probability (see attached picture). In the simple project above, Spectre returns wrong results in far smaller probability but wrong results still can be observed.

QQ截图20150714215925.jpg

0 Likes

Since get_global_id(0) was changed to be get_local_id(0), there is no globally synchronization between work items, I have to change the code which allocates memory only once for all items to be allocating memory {work_item_number} times for each item itself. That means atomicity operations change from once to {work_item_number} times: A performace issue is waiting there: Why Hawaii/Spectre (R9 290X/A10-7870K R7) slower five ~ ten times than Intel/NVidia on atomic adding...

0 Likes

Hi, dipak​ Have you tried the project?

0 Likes

Sorry, I couldn't check it last week as I was busy with some other stuff. I'll check it this week.

Regards,

0 Likes

  1. if (GID == 0) {
  2.   env->null = env->self - env->delta;
  3.   //env->null = env->self;
  4.   dest[0] = 0;
  5.   }
  6.   barrier(CLK_GLOBAL_MEM_FENCE);
  7.   if (GID == 1846) {
  8.   ++dest[0];
  9.   }

barrier() function can be only applicable to work-items within a work-group, not among work-items belonging from different work-groups.

In your case, work-item 0 (i.e GID == 0) and work-item 1846 (i.e GID == 1846) are from two different work-groups, so, there is no effect of barrier statement between these two work-items. However, both the work-items are trying to update the same memory value i.e. dest[0], which is totally undefined and the final value depends on the order of their execution. It also depends on the capability of particular GPU. Say, you may get the expected result from a low-end GPU which can run only 4 WGs at a time. Whereas a more powerful gpu, which can run 20 WGs at a time, may produce completely unpredictable result.

So, in order to produce expected result, both the work-items under the same barrier statement should belong to same work-group.

Regards,

0 Likes

Thank you, @dipak. It seems I misunderstand the meanings of CLK_GLOBAL_MEM_FENCE. OpenCL cannot synchronize work items beyond work group, right?

0 Likes

Yeah, most of the synchronize operations are limited to work-group only. It makes OpenCL program flexible and scalable. Though there are few atomic functions which work at global level i.e. across multiple work groups.

0 Likes