Hello!
I am designing an intra-prediction mode selector for an H.264 encoder. I stripped the code down to isolate the error, so it no longer does anything useful.
The problem is: sometimes it returns the correct values and sometimes it doesn't, with a success rate of roughly 50%. It is as though it sometimes didn't manage to finish what it was doing.
I tried assigning fixed values to predL within Intra_16x16_Vertical, and that works. I also tried assigning the p array to predModes, and that returns the correct values too.
Also, in my host program I called clFinish after every single operation, but that didn't help.
Please help me,
Thanks in advance.
#define NA 511

void fetchPredictionSamples16(int *predSamples, __global int *frame, int frameWidth, int CurrMbAddr)
{
    int frameWidthInMbs = frameWidth >> 4;
    int mbAddrA, mbAddrB, mbAddrC;

    if ((CurrMbAddr % frameWidthInMbs) == 0) {
        mbAddrA = -1;
    } else {
        mbAddrA = CurrMbAddr - 1;
    }

    if (CurrMbAddr < frameWidthInMbs) {
        mbAddrB = -1;
    } else {
        mbAddrB = CurrMbAddr - frameWidthInMbs;
    }

    // predSamples[0]:
    if ((mbAddrA == -1) || (mbAddrB == -1)) {
        predSamples[0] = NA;
    } else {
        mbAddrC = mbAddrB - 1;
        int xF = ((mbAddrC % frameWidthInMbs) << 4) + 15;
        int yF = ((mbAddrC / frameWidthInMbs) << 4) + 15;
        int frameIdx = yF*frameWidth + xF;
        predSamples[0] = frame[frameIdx];
    }

    for (int i = 1; i < 17; i++) {
        if (mbAddrA == -1) {
            predSamples[i] = NA;
        } else {
            int xF = ((mbAddrA % frameWidthInMbs) << 4) + 15;
            int yF = ((mbAddrA / frameWidthInMbs) << 4) + (i-1);
            int frameIdx = yF*frameWidth + xF;
            predSamples[i] = frame[frameIdx];
        }
    }

    for (int i = 17; i < 33; i++) {
        if (mbAddrB == -1) {
            predSamples[i] = NA;
        } else {
            int xF = ((mbAddrB % frameWidthInMbs) << 4) + (i - 17);
            int yF = ((mbAddrB / frameWidthInMbs) << 4) + 15;
            int frameIdx = yF*frameWidth + xF;
            predSamples[i] = frame[frameIdx];
        }
    }
}

void Intra_16x16_Vertical(int p[33], int predL[16][16])
{
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++) {
            predL[y][x] = p[x+17];
        }
    }
}

__kernel void GetIntra16x16PredModes(global int *frame, int frameWidth, global int *predModes)
{
    uint CurrMbAddr = get_global_id(0);
    int predL[16][16];
    int p[33];

    fetchPredictionSamples16(p, frame, frameWidth, CurrMbAddr);
    Intra_16x16_Vertical(p, predL);

    predModes[CurrMbAddr] = predL[15][15];
}
Btw, this only happens if I target the GPU; it works fine on the CPU. I have an HD 5830.
Is it possible that this problem is caused by insufficient power supplied to the GPU? The PSU's max output is 400W, there's only one HDD and one optical drive in the PC.
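Since the reduced kernel only writes predL[15][15] (which is p[32], the bottom-right sample of the macroblock directly above, or NA in the first macroblock row), the GPU output can be checked against a plain C reference on the host. The following is only a sketch of such a check, not part of the original program: the function names expectedPredMode and countMismatches are hypothetical, and it assumes the frame and the predModes buffer have already been read back into host memory.

```c
#include <assert.h>

#define NA 511  /* same "not available" marker as in the kernel */

/* Hypothetical helper: value the reduced kernel should write to
   predModes[mbAddr]. That value is predL[15][15] == p[32], i.e. the
   bottom-right sample of the macroblock above, or NA if there is none. */
int expectedPredMode(const int *frame, int frameWidth, int mbAddr)
{
    assert(frameWidth >= 16);  /* the host is said to guarantee this */
    int frameWidthInMbs = frameWidth >> 4;

    if (mbAddr < frameWidthInMbs)
        return NA;             /* first macroblock row: no neighbour above */

    int mbAddrB = mbAddr - frameWidthInMbs;
    int xF = ((mbAddrB % frameWidthInMbs) << 4) + 15;
    int yF = ((mbAddrB / frameWidthInMbs) << 4) + 15;
    return frame[yF * frameWidth + xF];
}

/* Hypothetical helper: number of macroblocks whose GPU result
   differs from the host reference. */
int countMismatches(const int *frame, int frameWidth,
                    const int *gpuPredModes, int numMbs)
{
    int bad = 0;
    for (int mb = 0; mb < numMbs; mb++) {
        if (gpuPredModes[mb] != expectedPredMode(frame, frameWidth, mb))
            bad++;
    }
    return bad;
}
```

Running the kernel several times on the same frame and logging countMismatches for each run would show whether the failures hit the same macroblocks every time or move around, which may help tell a logic error from a hardware or driver problem.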
Could you provide a compilable test case? It's easier to reproduce and track down the issue that way.
Wow, you guys are too kind.
Yes well, this program is anything but robust. It's not for professional use; I'm only experimenting with OpenCL for a university paper. Either way, frameWidth is never expected to be less than 16. This is ensured in the host program.
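For anyone reusing the kernel, the host-side guard mentioned above could look roughly like the sketch below. The names frameDimsAreValid and macroblockCount are hypothetical, and the multiple-of-16 and height checks are extra assumptions that go beyond what is stated in this thread (the kernel computes frameWidth >> 4, so a width that is not a multiple of 16 would silently drop the partial macroblock column).

```c
#include <assert.h>

/* Hypothetical guard: returns 1 if the frame dimensions are safe for
   the kernel's macroblock arithmetic, 0 otherwise.
   Assumption: both dimensions must be multiples of 16. */
int frameDimsAreValid(int frameWidth, int frameHeight)
{
    return frameWidth >= 16 && frameHeight >= 16 &&
           (frameWidth % 16) == 0 && (frameHeight % 16) == 0;
}

/* Hypothetical helper: number of 16x16 macroblocks in the frame,
   i.e. the global work size for a one-work-item-per-macroblock launch. */
int macroblockCount(int frameWidth, int frameHeight)
{
    assert(frameDimsAreValid(frameWidth, frameHeight));
    return (frameWidth >> 4) * (frameHeight >> 4);
}
```

With a check like this in front of the kernel launch, frameWidthInMbs can never be zero inside the kernel, so the modulo and division by it stay well defined.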
I installed the GPU in another computer (with a stronger PSU) and now the problem seems to have gone away. So unfortunately, I cannot provide you with a compilable test case because I can't try it on the initial computer any more.
Anyhow, thanks a lot for your effort.