Hello.
I made a program including loop which calls a simple function.
I had made it with "decrement" loop at first, but it didn't work correctly.
I remade it with "increment" loop, it worked.
I think this is a bug of optimizer.
Sample of bad case is here:
void Replace( unsigned short a[], unsigned short To, unsigned short From )
{
// simple load, and, not and, or, store procedures.
}
void Sample( __local unsigned short img[128*128] )
{
unsigned short L[64]; unsigned short R[64];
// after initialization of L, R.....
unsigned int m;
for( m=63; m<64; --m ){ // <- This is the point. An increment loop works correctly.
unsigned short To = L
unsigned short From = R
if( From == To || !From || !To ){ continue; }
// some function calls......
Replace( R, To, From );
// subsequent processes.....
}
}
My environment:
Driver Package: 9.012-121219a-152192C-ATI
Catalyst Version: 13.1
Radeon HD 7970
for( m=63; m<64; --m ){
For m is 63, while m < 64, subtract one from m? It's going to loop forever, the bounds make no sense. Is that a typo in your post or an actual bug in your code?
You could do:
for( unsigned int m = 63; m > 0; --m )
but don't do:
for( unsigned int m = 63; m >= 0; --m )
with an unsigned int, whatever you do, because that also will loop forever
Hello, Lee. Thank you for your reply.
Sorry, I'm careless of my explanation.
I mean the result of replacement with "decrement loop" is not equal what I expected.
After change into "increment loop", it worked as I expected.
The replacement code is like this. (simple replacement)
void Replace( unsigned short dst[], const unsigned short To, const unsigned short From )
{
const ushort8 vFrom = (ushort8)( From );
const ushort8 vTo = (ushort8)( To );
size_t i;
const size_t ip = 64 >> 3;
ushort8 flag, val;
for( i=0; i<ip; ++i ){
val = vload8( i, dst );
flag = convert_ushort8( val == vFrom );
vstore8( ( vTo & flag ) | ( val & ~flag ), i, dst );
}
}
-------------------------------------------------------
You know, m is "unsigned int".
So, it is between 0 and 4294967295.
When m = 0, m goes to 4294967295 after decrement.
for( unsigned int m=63; m<64; --m ) doesn't loop forever, and I confirmed that.
I'm still not entirely sure I understand your question. The new loop only goes from 0 to 16 (>>3). What exactly is the difference in the iteration spaces of the two loops? Have you tried just writing i out and seeing what the values you get are? You could try using printf if it doesn't change the way the compiler generates the code too significantly.
I tried these 3 loops.
The combination "Decremental Loop" and "Function Call" didn't worked correctly.
So that I thought "Optimizer is something wrong."
1. Good :
for( unsigned int m=63; m<64; --m ){ // <- Decremental Loop
unsigned short To = L
unsigned short From = R
if( From == To || !From || !To ){ continue; }
// replacement without function cal
for( unsigned int mm = m; mm<64; --mm ){ R[mm] = R[mm] == R
}
2. Bad :
for( unsigned int m=63; m<64; --m ){ // <- Decremental Loop
unsigned short To = L
unsigned short From = R
if( From == To || !From || !To ){ continue; }
Replace( R, To, From ); // <- Replacement Function Call
}
3. Good :
for( unsigned int m=0; m<64; ++m ){ // <- Incremental Loop
unsigned short To = L
unsigned short From = R
if( From == To || !From || !To ){ continue; }
Replace( R, To, From ); // <- Replacement Function Call
}
Your for loops are quite crazy . Why to use unsigned integers for loops, when ints can do the job.
also I do not see "to" and "from" variables used in case 1. Also in case 2 & 3, I see the same set of 64 values being read, processed and written back, for every iteration for variable "m". Surely going backward or forward may give different results.
Please attach a zip file with the whole code.
Hello.
This is image labeling program.
The problem happens in Combine* kernel, WorkCombineH and WorkCombineV functions.
(attached file is already modified to incremental loop.)
Input: unsigned char[256 *4096]
Output: unsigned short[256*4096]
Host Calls kernels in this order:
1. LabelTo128x128
2. Add128x256
3. Border128x256
4. Combine128x256
5. Add256x256
6. Border256x256
7. Combine256x256
8. Add256x512
9. Border256x512
10. Combine256x512
11. Add256x1024
12. Border256x1024
13. Combine256x1024
14. Add256x2048
15. Border256x2048
16. Combine256x2048
17. Add256x4096
18. Border256x4096
19. Combine256x4096
20. InitInfo
21. CreateInfo
22. CreateLrRef
23. LrCombine
-----------------------------------------------------------------------------------------------------------
I always use unsigned int for array access.
The reasons are here:
1. It never point negative (illegal) array index.
Unsigned value is always 0 or larger than 0.
2. Both increment loop and decrement loop can use same loop condition, "iterator < array length".
for( unsigned int ui = 0; ui < ARRAY_LENGTH; ++ui ) // incremental
for( unsigned int ui = ARRAY_LENGTH - 1; ui < ARRAY_LENGTH; --ui ) // decremental
Can you post a narrowed down version of the testcase. The attached code is quite complicated and it is hard to say , if the issue is in Runtime or the application itself. Anyways It is hard to run these kernels without the host code.