
OpenCL

kbala
Adept I

Strange behavior of a kernel, need fresh ideas

I have lost two days debugging, or rather trying to debug, my kernel. Basically, the kernel looks like this (it is part of the Dagger-Hashimoto initialization):

1. copy from global to private memory

2. process the private data

3. copy from private to local memory

4. process the local data

5. copy from local to private memory

6. process the private data

7. copy from private to global memory

After two days of digging, it seems to me that the problem is somewhere between steps 4 and 6.

If I compile the kernel with "-cl-opt-disable", everything works fine. With optimization enabled, it doesn't.

If I put a barrier or mem_fence between steps 4 and 5 and/or between steps 5 and 6, nothing changes. But if I instead put a conditional printf (one that will never fire) between steps 5 and 6, everything works fine again.

I have tried everything that crossed my mind.

The line that is problematic looks like this:

for (uint word_id = 0; word_id < 16; word_id++)
    state.Words[word_id] = sharedBlocks[groupId][threadId][word_id];

// workgroup = 64
// uint sharedBlocks[4][16][16]
// groupId [0..3]
// threadId [0..15]

I looked at the ISA and found (I hope I'm reading it right) that the compiler copied only 10 of the 16 words. I don't know why.

(This is in line with the following experiment: the loop works fine for word_id = 0, 1, 2, 3, 6, 7, 10, 11, 14, 15, but not for word_id = 4, 5, 8, 9, 12, 13.)
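
One way to read the failing indices is to group the 16 words into 16-byte chunks of four words each. The plain C sketch below (host-side, not kernel code; the function names are made up for illustration) encodes the indices reported above and checks them against that grouping: the failing words are exactly the first two words of every chunk except the first. This only describes the pattern; it does not by itself explain the cause.

```c
#include <stdbool.h>
#include <stdint.h>

/* Indices reported as failing in the experiment above. */
static bool reported_failing(uint32_t word_id)
{
    static const uint32_t failing[] = { 4, 5, 8, 9, 12, 13 };
    for (uint32_t i = 0; i < sizeof failing / sizeof failing[0]; i++)
        if (failing[i] == word_id)
            return true;
    return false;
}

/* Hypothetical helper: true if word_id is one of the first two words
   (the low 8 bytes) of a 16-byte chunk other than chunk 0. */
static bool low_half_of_later_chunk(uint32_t word_id)
{
    uint32_t chunk = word_id / 4;   /* which 16-byte chunk */
    uint32_t lane  = word_id % 4;   /* position inside the chunk */
    return chunk >= 1 && lane < 2;
}

/* Returns true if the two predicates agree for all 16 words. */
bool pattern_matches(void)
{
    for (uint32_t w = 0; w < 16; w++)
        if (reported_failing(w) != low_half_of_later_chunk(w))
            return false;
    return true;
}
```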

However, if I change those two lines to something like this:

for (uint word_id = 0; word_id < 8; word_id++)
    ((ulong*)(&state))[word_id] = ((ulong*)(sharedBlocks[groupId][threadId]))[word_id];

then it works again: the compiler copies all 16 words.

However something like this:

for (uint word_id = 0; word_id < 16; word_id++)
    ((uint*)(&state))[word_id] = ((uint*)(sharedBlocks[groupId][threadId]))[word_id];

doesn't work.
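
For comparison, here is a plain C sketch (host-side, not kernel code) of what the two loops are expected to do: copying a 64-byte block as 16 32-bit words or as 8 64-bit words should produce the same bytes. The union `block_t` and the function names are assumptions for illustration; the union replaces the pointer casts so that the sketch stays within C's aliasing rules.

```c
#include <stdint.h>
#include <string.h>

/* 64-byte block viewed either as 32-bit or as 64-bit words. */
typedef union {
    uint32_t words[16];
    uint64_t dwords[8];
} block_t;

/* Mirrors the 16-iteration uint loop. */
void copy_as_words(block_t *dst, const block_t *src)
{
    for (uint32_t word_id = 0; word_id < 16; word_id++)
        dst->words[word_id] = src->words[word_id];
}

/* Mirrors the 8-iteration ulong loop. */
void copy_as_dwords(block_t *dst, const block_t *src)
{
    for (uint32_t word_id = 0; word_id < 8; word_id++)
        dst->dwords[word_id] = src->dwords[word_id];
}

/* Returns 1 if both copies reproduce the source exactly. */
int copies_agree(void)
{
    block_t src, a, b;
    for (uint32_t i = 0; i < 16; i++)
        src.words[i] = 0x01010101u * (i + 1);   /* distinct test pattern */
    memset(&a, 0, sizeof a);
    memset(&b, 0, sizeof b);
    copy_as_words(&a, &src);
    copy_as_dwords(&b, &src);
    return memcmp(&a, &src, sizeof src) == 0 &&
           memcmp(&b, &src, sizeof src) == 0;
}
```

On a correct compiler both variants are interchangeable, which is why a difference between them in the kernel points at code generation rather than at the loop itself.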

I'm really puzzled.

Any ideas?

7 Replies
dipak
Staff

Re: Strange behavior of a kernel, need fresh ideas

Thanks for reporting it.

From your description, it looks like a compiler optimization issue. To investigate, we need a minimal test case (host code + kernel) that reproduces the problem. Please provide a repro and include the setup details (OS, GPU, driver, etc.).

kbala
Adept I

Re: Strange behavior of a kernel, need fresh ideas

I pulled out a piece of code that I'm not sure is right.

Honestly, I doubt that the problem is in the compiler; more likely I'm just blind and cannot see what is wrong.

I'm sending the project (VS2017). The OS is Windows 10 and the GPU is an RX 550 with the latest driver. I had the same problem with an RX 480 and an older driver.

Almost the same code works on NVIDIA CUDA, so I suspect that I violated the OpenCL C++ standard somewhere, but I don't know where.

I hope you have an answer.

Thanks.

dipak
Staff

Re: Strange behavior of a kernel, need fresh ideas

After a quick look at the kernel code, I suspect the declaration below might be causing the problem. Please see my comments to the right of the code.

typedef union
{
    ulong words[KECCAK_BYTES / sizeof(uint)];   // ----> causes conversion between ulong and uint;
                                                //       expected declaration: uint words[KECCAK_BYTES / sizeof(uint)];
    ulong dwords[KECCAK_BYTES / sizeof(ulong)];
    block64_t block;
} state_t;
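
To make the size mismatch concrete, here is a plain C sketch (host-side, not kernel code) that compares the union as posted with the suggested declaration. `KECCAK_BYTES` is never given in the thread, so the value 200 is an assumption; the `block64_t` member is left out because its definition is not shown, and the type and function names are made up for illustration. `uint32_t` and `uint64_t` stand in for OpenCL's `uint` and `ulong`.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed value: the thread never states KECCAK_BYTES. */
#define KECCAK_BYTES 200

/* As posted: the element count is computed for 4-byte words,
   but the element type is 8 bytes wide. */
typedef union {
    uint64_t words[KECCAK_BYTES / sizeof(uint32_t)];   /* 50 x 8 bytes */
    uint64_t dwords[KECCAK_BYTES / sizeof(uint64_t)];  /* 25 x 8 bytes */
} state_as_posted_t;

/* As suggested: element type and element count agree. */
typedef union {
    uint32_t words[KECCAK_BYTES / sizeof(uint32_t)];   /* 50 x 4 bytes */
    uint64_t dwords[KECCAK_BYTES / sizeof(uint64_t)];  /* 25 x 8 bytes */
} state_suggested_t;

size_t posted_size(void)    { return sizeof(state_as_posted_t); }
size_t suggested_size(void) { return sizeof(state_suggested_t); }
```

Under this assumption the posted union is twice as large as intended, and `words[i]` no longer overlays the same bytes as the 32-bit view would.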

Thanks.

kbala
Adept I

Re: Strange behavior of a kernel, need fresh ideas

Thank you Dipak.

I'm sorry, I made a typo when I tried to simplify the original structures.

Your answer helped me to look for an error elsewhere and I found it here:

#define BLOCK64_BYTES 64

#define BLOCK64_WORDS (BLOCK64_BYTES / sizeof (uint))

#define BLOCK64_DWORDS (BLOCK64_BYTES / sizeof (ulong))

#define BLOCK64_QWORDS (BLOCK64_BYTES / sizeof (uint4))

typedef union

{

uint mWords [BLOCK64_WORDS];

ulong mDWords [BLOCK64_DWORDS];

uint4 mQWords [BLOCK64_QWORDS]; // <---- without this line everything works fine

} tBlock64;

Can the error be due to automatic alignment?

This time I'm sending a slightly extended version. I would be grateful for your help once more. I just need to understand what the problem is.

kbala
Adept I

Re: Strange behavior of a kernel, need fresh ideas

Addition:

The error does not appear if we use a user-defined tUInt4:

typedef struct

{

uint x;

uint y;

uint u;

uint w;

} tUInt4;

typedef union

{

uint mWords[BLOCK64_WORDS];

ulong mDWords[BLOCK64_DWORDS];

tUInt4 mQWords[BLOCK64_QWORDS];

} tBlock64;
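
One observable difference between the two unions is alignment: OpenCL's built-in `uint4` is aligned to 16 bytes, while a struct of four `uint` fields only needs 4-byte alignment. The plain C11 sketch below (host-side, not kernel code) models this. `vec_uint4` is a stand-in for the built-in `uint4`, and all type and function names are assumptions for illustration. Whether this difference is what triggers the failure is not established here.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK64_BYTES 64

/* Stand-in for OpenCL's built-in uint4 (16 bytes, 16-byte aligned). */
typedef struct { alignas(16) uint32_t x; uint32_t y, z, w; } vec_uint4;

/* Same layout as the user-defined tUInt4: four plain 32-bit fields. */
typedef struct { uint32_t x, y, u, w; } plain_uint4;

/* Union with the vector-like member (the failing variant). */
typedef union {
    uint32_t  mWords[BLOCK64_BYTES / sizeof(uint32_t)];
    uint64_t  mDWords[BLOCK64_BYTES / sizeof(uint64_t)];
    vec_uint4 mQWords[BLOCK64_BYTES / sizeof(vec_uint4)];
} block_with_vector;

/* Union with the plain struct member (the working variant). */
typedef union {
    uint32_t    mWords[BLOCK64_BYTES / sizeof(uint32_t)];
    uint64_t    mDWords[BLOCK64_BYTES / sizeof(uint64_t)];
    plain_uint4 mQWords[BLOCK64_BYTES / sizeof(plain_uint4)];
} block_with_struct;

size_t vector_union_align(void) { return alignof(block_with_vector); }
size_t struct_union_align(void) { return alignof(block_with_struct); }
size_t vector_union_size(void)  { return sizeof(block_with_vector); }
size_t struct_union_size(void)  { return sizeof(block_with_struct); }
```

Both unions occupy 64 bytes; only the required alignment differs, so a compiler is free to use wider, aligned loads and stores for the first variant.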

dipak
Staff

Re: Strange behavior of a kernel, need fresh ideas

Hi Karlo,

I tested the latest code. From what I observed, it looks like a compiler optimization problem specific to OpenCL 2.0. If the same kernel is built for OpenCL 1.2 (with or without optimization), it produces the expected result. I will report this problem to the concerned team.
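
For anyone who wants to repeat this comparison, the standard build options are `-cl-std=CL1.2` or `-cl-std=CL2.0`, optionally combined with `-cl-opt-disable`. The C sketch below only composes the options string; `make_build_options` is a hypothetical helper, not part of the OpenCL API. The resulting string would be passed as the `options` argument of `clBuildProgram`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes an OpenCL build-options string into buf.
   cl_std is e.g. "CL1.2" or "CL2.0"; disable_opt != 0 appends
   -cl-opt-disable. Returns snprintf's result (the length the full
   string needs, or a negative value on error). */
int make_build_options(char *buf, size_t len, const char *cl_std, int disable_opt)
{
    return snprintf(buf, len, "-cl-std=%s%s", cl_std,
                    disable_opt ? " -cl-opt-disable" : "");
}

/* Intended use (requires the OpenCL headers, not shown here):
   clBuildProgram(program, 1, &device, buf, NULL, NULL); */
```

Building the same kernel with all four combinations (1.2 or 2.0, optimized or not) narrows the failure down to a single configuration.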

Thanks.

kbala
Adept I

Re: Strange behavior of a kernel, need fresh ideas

So the mystery is more or less solved.

Thanks.
