Mikey
Journeyman III

How to get more build info

Hi. I have CL code that works fine on the CPU but does not work on the GPU. My CL code has gotten quite big, so there's no point posting about 1k lines here...

I'm getting this output:

Stack dump:
0.      Program arguments: C:\Program Files (x86)\ATI Stream\bin\x86\llc -mcpu=atir770 -mattr=mwgs-3-256-1-1 -regalloc=linearscan -march=amdil C:\Users\Mikey\AppData\Local\Temp\OCLA304.tmp.bc -f -o C:\Users\Mikey\AppData\Local\Temp\OCLA304.tmp.il
1.      Running pass 'AMDIL Load Store Setup Pass' on function '@__OpenCL_main_kernel'
005410A5 (0x01B2E454 0x52DC4749 0x01B2E454 0x01A42F10)
00541162 (0x0080D8A0 0xFFFFFFFF 0x00000000 0x01B99FC4)
00598CD1 (0x01A6CBD8 0x0119FD84 0x0079CF29 0xFFFFFFFF)
00593D5B (0x01B2E400 0x00000000 0x01B2E4A0 0x01A42F10)
00541B44 (0x01A6FD4C 0x01B2E454 0x00000000 0x012F00B8)
Compilation log: C:\Users\Mikey\AppData\Local\Temp\OCLA17D.tmp.cl(956): warning: null (zero) character in input line ignored
  }
   ^

How can I find out what's wrong? Can I somehow tell the compiler not to delete those files?

 

--- btw.

I have a weird problem. At some point in the code I have to insert a meaningless function call (like cos(1)); without it the kernel returns the wrong result...

0 Likes
9 Replies
genaganna
Journeyman III

Please send your code to streamdeveloper@amd.com. Please also send your system configuration (OS, CPU, GPU, SDK version, and driver version).

0 Likes

OK.

I think it could be a problem with memory. I've got a struct with uint a[64] and uint b[32], and in both cases it crashes (on the GPU) when I use an index i where i >= 16.

Is it not allowed to have more than 16 elements in an array?

0 Likes

It is allowed. Could you provide a test case which reproduces your problem?

0 Likes

I couldn't generate the same output with a simpler case. However, I could reproduce something else: 'Link failed'.

typedef struct S {
    uint bitLength[32];
    ulong8 hash;
} S;

void f(struct S * const sptr) {
    int size = 32; // change this to 16 - no crash
    for (int i = 0; i < size; i++)
        sptr->bitLength[i] = 0U; // without this line size can be greater than 16
    sptr->hash = (ulong8)(1UL);
}

kernel void main() {
    struct S s;
    f(&s);
}

0 Likes

Hi Mikey,

I had a look at the test case you sent. In your code, as shown below, you are trying to use array indexing to access a vector element, which is illegal.

ulong8 block; // mu(buffer)
ulong8 state; // the cipher state
ulong8 L;
unsigned int *buffer = sp->buffer;
for (int i = 0; i < 8; i++, buffer += 8) {
    block[i] = (((ulong)buffer[0] ) << 56)

0 Likes

Thank you, omkaranathan.

I've already found that mistake. I thought that using [] was OK because it had been working in most cases (especially on the CPU). But now I see that it was just by sheer chance (a struct with values allocated one next to another).

0 Likes

Mikey,

Are you facing any more issues after solving that one?

0 Likes

No, I have successfully completed my program. However, it runs rather slowly, as much as 10 times slower on the GPU than on the CPU, so I will keep trying to make it faster.

I'm aware that the SDK isn't finished yet, so I'm not giving up easily.

Thanks for your interest!

0 Likes

OK, I have to take those words back!

I've just read another thread about optimization and decided to check whether my local group size was OK. Well, it wasn't. But now it is, and that makes my program 24 times faster. I have to admit, that's quite awesome!
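
For anyone who lands here with the same slowdown: the post does not say which sizes were used, so the following is only a rough host-side sketch in plain C, not the code from this thread. It assumes a 1D range and OpenCL 1.x, where the global work size passed to clEnqueueNDRangeKernel has to be a multiple of the local work size. The helper names (round_up_global, group_count) and the sizes in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: pad the number of work-items up to the next
 * multiple of the chosen local (work-group) size. */
static size_t round_up_global(size_t work_items, size_t local_size)
{
    assert(local_size != 0);
    size_t rem = work_items % local_size;
    return rem == 0 ? work_items : work_items + (local_size - rem);
}

/* Hypothetical helper: number of work-groups that will be launched
 * for the padded global size. */
static size_t group_count(size_t work_items, size_t local_size)
{
    return round_up_global(work_items, local_size) / local_size;
}
```

With, say, 1000 work-items and a local size of 64, round_up_global returns 1024 and group_count returns 16. Because the padded range is larger than the real data, the kernel would need a guard such as `if (get_global_id(0) >= n) return;` at the top, where n is the real element count passed in as a kernel argument.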

0 Likes