

richard_vega
Journeyman III

Maximum length for a for loop in a kernel?

Is there an upper limit on the length of a for loop in an OpenCL kernel? I am wondering because I have a kernel that needs to run a very long for loop, but when the length of the loop exceeds about 100,000,000, my whole dang computer crashes. The internals of the loop are very simple, and the code produces the correct results for lower loop bounds. Any help would be greatly appreciated. Until then, I will simply try to find a clever workaround to reduce the length of the loop.

- Richard Vega

0 Likes
8 Replies
nou
Exemplar

Maybe you need to disable TDR? http://msdn.microsoft.com/en-us/library/windows/hardware/ff569918%28v=vs.85%29.aspx

But it is recommended to break your loop into smaller chunks.
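
A minimal host-side sketch of what "smaller chunks" could look like, in plain C. It only works out which iteration range each kernel launch would cover; the names `launch_range`, `launches_needed`, and `launch_at` are made up for illustration and are not from the poster's code.

```c
#include <assert.h>
#include <stdint.h>

/* One kernel launch covering iterations [start, start + count).
 * Hypothetical helper type, not part of OpenCL. */
typedef struct {
    uint64_t start;
    uint64_t count;
} launch_range;

/* Number of launches needed to cover `total` iterations when each
 * launch runs at most `max_per_launch` iterations (rounds up). */
static uint64_t launches_needed(uint64_t total, uint64_t max_per_launch)
{
    assert(max_per_launch > 0);
    return total / max_per_launch + (total % max_per_launch != 0);
}

/* Iteration range for launch number `i` (0-based).
 * The last launch may be shorter than the others. */
static launch_range launch_at(uint64_t total, uint64_t max_per_launch,
                              uint64_t i)
{
    launch_range r;
    assert(max_per_launch > 0);
    r.start = i * max_per_launch;
    assert(r.start < total);
    uint64_t left = total - r.start;
    r.count = left < max_per_launch ? left : max_per_launch;
    return r;
}
```

In a real host program, each range would presumably be passed to the kernel as arguments and enqueued with `clEnqueueNDRangeKernel`, with a `clFinish` between launches so that no single launch runs for long. Whether that is enough depends on how the kernel carries state from one chunk to the next.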

0 Likes

Hi richard_vega,

Apart from register usage inside the loop, I don't think there is any upper limit on the length of a for loop.

You can also try disabling TDR if your kernel takes too long to complete the loop (i.e., more than about 2 seconds, depending on your OS).

For reference, see a similar earlier issue: Shared Virtual Memory Can not be larger than about 30M

To help us understand the issue better, could you please share your kernel code with us?

Thanks,

AMD_Support

0 Likes

I do not understand what TDR is or how to disable it. I have attached everything that you would need to compile and run my code. The problem occurs in line 11 of the kernel triangle.cl. Let me describe what is going on. The code needs to plot a total of 1,000,000,000 points. It is run with the number of threads as a command-line argument (e.g. ./parSeirp 1000). The number of points is divided by the number of threads, and each kernel plots that many points. For reference, a serial implementation with one for loop going to 1,000,000,000 runs in roughly 30 seconds on a CPU. It seems curious to me that when I change line 11 to use 100,000,000 instead of 1,000,000,000, the code runs fine for threads > ~30. When I use 1,000,000,000, the code crashes regardless of the number of threads, even if I set the number of threads to 1,000, which would mean that the upper limit of the for loop is only 1,000,000. I am very confused now. Also, I have tried running on various machines. On some machines it does not crash, but it does not finish either (at least not within ~10 min).
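
For readers following along without the attachment, here is a rough C sketch of the "points divided by threads" arithmetic described above. It is not the code from triangle.cl (which is not shown here); `iterations_for` is a hypothetical helper, and the remainder handling is an assumption about what one would want when the point count is not evenly divisible.

```c
#include <assert.h>
#include <stdint.h>

/* Iterations assigned to work-item `id` when `total` points are shared
 * by `threads` work-items. Plain integer division would silently drop
 * (total % threads) points, so the first (total % threads) work-items
 * take one extra point and the counts add up to exactly `total`. */
static uint64_t iterations_for(uint64_t total, uint64_t threads,
                               uint64_t id)
{
    assert(threads > 0 && id < threads);
    uint64_t base = total / threads;   /* share every work-item gets  */
    uint64_t extra = total % threads;  /* leftover points to hand out */
    return base + (id < extra ? 1 : 0);
}
```

With 1,000,000,000 points and 1,000 threads this gives 1,000,000 iterations per work-item, matching the figure quoted in the post.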

0 Likes

Your "thread" count, i.e. the global work size, needs to be 2000 items or more. Modern high-end GPUs now contain 2048 processing elements (stream processors), so to fully utilize such a GPU you need to enqueue the kernel with at least 2048 work-items. If you enqueue the kernel with a work size of 30, then about 99% of the GPU sits idle.

0 Likes

I understand that threads = 30 was highly inefficient. It was my lower bound. Attached is a plot of the ratio of run times for CPU/GPU when using 100,000,000 points (i.e. when line 11 states iter = 100000000/threads). The x axis is the number of threads. I see that the GPU run time decreases as the number of threads is increased, but it seems to level out well below 2000. For reference, I am using a Radeon HD 6770M. This was not really my problem, however. The issue is that when I plot 1,000,000,000 points, the computer crashes, even if the number of threads is > 1000, which means the for loop would only go to 1,000,000 per kernel.

0 Likes

The 2000 figure is for high-end cards. Your GPU has only 480 units, so it needs only about 480 work-items, which roughly corresponds to your graph. The reason you see a crash is that the GPU can't interrupt a running kernel until it finishes, and Windows has a TDR timer that resets the GPU if it doesn't respond within 2-5 seconds. Disable TDR or break your kernel into smaller parts. The recommendation is not to run a single kernel for too long.
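
One rough way to pick a chunk size that stays under a time limit like the one described above, sketched in C. It assumes kernel run time scales roughly linearly with the loop length, which may not hold in practice; `max_iterations_within` is a hypothetical helper and any timing numbers fed to it would have to come from an actual measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Largest per-launch iteration count whose estimated run time stays
 * within `budget_ms`, extrapolated from one timed launch that ran
 * `measured_iters` iterations in `measured_ms` milliseconds.
 * Assumes run time grows linearly with the iteration count.
 * Integer division rounds down, which errs on the safe side. */
static uint64_t max_iterations_within(uint64_t measured_iters,
                                      uint64_t measured_ms,
                                      uint64_t budget_ms)
{
    assert(measured_iters > 0 && measured_ms > 0);
    return measured_iters * budget_ms / measured_ms;
}
```

For example, if a test launch of 1,000,000 iterations took 50 ms (a made-up figure), a 500 ms budget would allow about 10,000,000 iterations per launch under this linear assumption. Leaving a generous margin below the actual timeout seems prudent.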

0 Likes

I am running on a Mac, not Windows.

0 Likes

Then you are on the wrong forum. AMD doesn't support Mac OS. You must ask Apple for support.

0 Likes