OpenCL

webmaster128
Adept I

OpenCL compilation hangs forever

Hi all,

I am trying to compile this project for an AMD GPU: GitHub - webmaster128/lisk-vanity: A tool to generate short Lisk addresses with GPU support

The .cl files are in lisk-vanity/src/opencl at master · webmaster128/lisk-vanity · GitHub, and they are concatenated as follows:

lisk-vanity/gpu.rs at 9515c00c01adbc1eb8c68d3b2acf245b99c675ca · webmaster128/lisk-vanity · GitHub
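A minimal sketch of such a concatenation step in Python, assuming the kernel sources are already loaded as (file name, text) pairs in the desired order. The helper name `concat_sources` is hypothetical and not taken from the repository. Prefixing each file with a `#line` directive (OpenCL C supports the C99 preprocessor directives) should let compiler diagnostics name the original file and line instead of a line in the merged string:

```python
def concat_sources(sources: list[tuple[str, str]]) -> str:
    """Join (file_name, text) pairs into one OpenCL C program string.

    Hypothetical helper: each file is preceded by a `#line 1 "name"` directive
    so that compiler diagnostics can refer to the original file and line.
    """
    parts = []
    for name, text in sources:
        parts.append(f'#line 1 "{name}"\n')
        # Ensure the next #line directive starts on its own line.
        parts.append(text if text.endswith("\n") else text + "\n")
    return "".join(parts)
```

Whether a given compiler (ROCm, HSAIL, NVIDIA) actually reports the remapped names is worth checking per toolchain.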

Unfortunately, the compilation never terminates.

- It works fine on NVIDIA GPUs and Apple/Intel CPUs.
- It shows proper errors when there are syntax errors.
- It compiles when I comment out enough code; which code I comment out does not matter.
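Since the hang seems to depend on how much code is compiled rather than on which code, one way to find the threshold is to bisect over the amount of code kept. The sketch below assumes a hypothetical `builds_ok(n)` callback that builds the program with only the first `n` functions (or files) included and returns False on an error or on a build exceeding a timeout. It also assumes the outcome is monotonic in `n`, which matches the observation above but is not guaranteed:

```python
from typing import Callable, Optional


def first_failing_size(n_max: int, builds_ok: Callable[[int], bool]) -> Optional[int]:
    """Return the smallest n in 1..n_max for which builds_ok(n) is False.

    Assumes builds_ok is monotonic: once a size fails, every larger size fails.
    Returns None if even the full program (n_max) builds.
    """
    if builds_ok(n_max):
        return None
    lo, hi = 0, n_max  # invariant: size lo is treated as OK, size hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if builds_ok(mid):
            lo = mid
        else:
            hi = mid
    return hi
```

This needs about log2(n_max) + 1 builds, but each failing build can cost a full timeout.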

This is how to reproduce on Ubuntu:

# Install rust

curl https://sh.rustup.rs -sSf | sh -s -- --default-toolchain nightly

source $HOME/.cargo/env

git clone https://github.com/webmaster128/lisk-vanity && cd lisk-vanity

export RUSTFLAGS='-L /opt/rocm/opencl/lib/x86_64/'

cargo build

./target/debug/lisk-vanity --gpu --threads 0

System/Driver is AMD-ROCm1.9.224+TensorFlow1.10 Ubuntu16.04 x64

dipak
Staff

Re: OpenCL compilation hangs forever

Thank you for reporting this compilation issue. We will check it and get back to you.

Please share the GPU details where you observed the above problem.

P.S. I have whitelisted you.

webmaster128
Adept I

Re: OpenCL compilation hangs forever

The issue occurs on these three GPUs from GPUEater.com:
- Radeon RX 580 (8 GB)
- Radeon RX Vega 56 (8 GB)
- Radeon Vega Frontier Edition (16 GB)

If you need more details, please let me know.

webmaster128
Adept I

Re: OpenCL compilation hangs forever

Running the script lisk-vanity/get-ocl-line.py at master · webmaster128/lisk-vanity · GitHub after checking out the above git repo allows merging all code into one .cl file. Maybe that helps for debugging.
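When a diagnostic refers to a line of the merged file (such as `kernel.cl:26418` in the rga output further down this thread), it has to be mapped back to the original source file. A minimal sketch, assuming the files were concatenated back-to-back with no extra lines in between and that the per-file line counts are known. `locate` is a hypothetical helper, not necessarily what get-ocl-line.py does:

```python
from bisect import bisect_right
from itertools import accumulate


def locate(merged_line: int, files: list[tuple[str, int]]) -> tuple[str, int]:
    """Map a 1-based line of the merged .cl file to (file_name, line_in_file).

    `files` lists (file_name, line_count) in concatenation order; the files are
    assumed to be joined with no separator lines.
    """
    # starts[i] = number of merged lines that precede file i
    starts = [0, *accumulate(count for _, count in files)]
    if not 1 <= merged_line <= starts[-1]:
        raise ValueError(f"line {merged_line} is outside 1..{starts[-1]}")
    # bisect_right skips over empty files that share the same start offset
    i = bisect_right(starts, merged_line - 1) - 1
    return files[i][0], merged_line - starts[i]
```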

dipak
Staff

Re: OpenCL compilation hangs forever

Thanks for your inputs. I just did a quick check with CodeXL and observed a build error for those devices. I will do some more tests and let you know my findings.

dipak
Staff

Re: OpenCL compilation hangs forever

I observed a build error on all CI+ devices. For example, CodeXL reports the following build error for Vega (gfx900):

Error in hsa_operand section, at offset 397288:
Address offset exceeds variable size
LLVM ERROR:
Brig container validation has failed in BRIGAsmPrinter.cpp

When I checked the error message with the compiler team, they suspected a user-side error, because this message occurs when the source statically addresses an array out of bounds. So I would suggest checking the kernel files for any statically addressed out-of-bounds array access.

Thanks.
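As a first pass over the merged kernel, a crude text scan can flag literal indices that are at or beyond a literal array size declared elsewhere in the file. This is only a heuristic sketch (the `find_static_oob` helper and its regular expressions are hypothetical). It ignores scopes, comments, macros, array typedefs such as `bignum25519`, multi-variable declarations, and non-literal indices, so it can both miss real problems and report false positives:

```python
import re

# Declaration like "uint32_t t[10];" or "uint32_t t[3] = {...}" (a few keywords excluded).
DECL = re.compile(
    r"\b(?!(?:return|else|case|goto)\b)[A-Za-z_]\w*[ \t]+\**[ \t]*"
    r"([A-Za-z_]\w*)[ \t]*\[[ \t]*(\d+)[ \t]*\][ \t]*(?:;|,|\)|=[ \t]*\{)"
)
# Any subscript with a literal index, like "t[10]".
ACCESS = re.compile(r"\b([A-Za-z_]\w*)[ \t]*\[[ \t]*(\d+)[ \t]*\]")


def find_static_oob(source: str) -> list[tuple[int, str, int, int]]:
    """Return (line_number, array_name, index, declared_size) for suspicious accesses."""
    lines = source.splitlines()
    sizes: dict[str, int] = {}
    for line in lines:
        for m in DECL.finditer(line):
            sizes[m.group(1)] = int(m.group(2))  # last declaration wins; scopes ignored

    hits = []
    for lineno, line in enumerate(lines, start=1):
        decl_positions = {m.start(1) for m in DECL.finditer(line)}
        for m in ACCESS.finditer(line):
            if m.start(1) in decl_positions:
                continue  # this is the declaration itself, not an access
            name, index = m.group(1), int(m.group(2))
            if name in sizes and index >= sizes[name]:
                hits.append((lineno, name, index, sizes[name]))
    return hits
```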

webmaster128
Adept I

Re: OpenCL compilation hangs forever

Thanks for the feedback! I will try CodeXL myself to see if it helps me find the place. hsa_operand and BRIGAsmPrinter.cpp are not in my code, and if offset 397288 is a byte offset into kernel.cl, I do not see anything suspicious there.

But even if there were an issue in my program, the compiler must terminate and report an error. Instead, clGetProgramBuildInfo with CL_PROGRAM_BUILD_STATUS keeps returning CL_BUILD_IN_PROGRESS.
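If the build is started asynchronously (clBuildProgram with a notification callback, in which case it may return before the build completes), the host can at least avoid waiting forever by polling the status against a deadline. The sketch keeps the OpenCL call abstract: `get_status` is a hypothetical callable that would wrap clGetProgramBuildInfo(..., CL_PROGRAM_BUILD_STATUS, ...) in whatever binding is used, and the status strings stand in for the real CL_BUILD_* constants:

```python
import time
from typing import Callable

IN_PROGRESS = "CL_BUILD_IN_PROGRESS"  # placeholder for the real constant


def wait_for_build(get_status: Callable[[], str], timeout_s: float,
                   interval_s: float = 0.05) -> str:
    """Poll get_status() until it leaves IN_PROGRESS or timeout_s elapses.

    Returns the final status, or "TIMEOUT" if the build is still in progress
    when the deadline passes. This only stops the host from waiting; it does
    not stop the compiler.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status != IN_PROGRESS:
            return status
        if time.monotonic() >= deadline:
            return "TIMEOUT"
        time.sleep(interval_s)
```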

dipak
Staff

Re: OpenCL compilation hangs forever

> hsa_operand and BRIGAsmPrinter.cpp is nothing that is in my code. If offset 397288 is a number of bytes in the kernel.cl, I do not see anything there.

That error message includes information about the compiler-internal section that detected and triggered the error, so please ignore those details. As I said earlier, the error message essentially indicates a statically addressed out-of-bounds array access.

Actually, I got the error on a Windows 10 setup; I don't have a ROCm setup to verify it. On Windows, the compiler toolchain (HSAIL) is different from the ROCm one. The error message above comes from the HSAIL compiler, which diagnoses the out-of-bounds access early. According to the compiler team, the ROCm toolchain has no such diagnostic, but the error is still there, so a hang is an expected outcome.

I agree with you that the compiler should not crash. However, please note that the compiler toolchain on ROCm is new and still improving, so I expect this to be fixed in the future. In any case, I've reported it to the relevant team.

Thanks.

webmaster128
Adept I

Re: OpenCL compilation hangs forever

I can reproduce the issue on a different Linux machine using rga-2.0.1. With the default language level, I get a bunch of errors regarding the __generic address space. However, the code was designed for OpenCL 1.2, which has no __generic. Setting the language level to 1.2 leads to the behaviour described in the original post: the compiler hangs forever.

Default

./rga -s rocm-cl -c gfx900 --isa test_isa.txt --livereg regs.txt kernel.cl

Target GPU detected:
gfx900 (Vega)
    Radeon (TM) Pro WX 9100
    Radeon Instinct MI25
    Radeon Instinct MI25 MxGPU
    Radeon Pro SSG
    Radeon RX Vega
    Radeon Vega Frontier Edition
Building for gfx900... failed.
Error (reported by the ROCm OpenCL Compiler):
kernel.cl:26418:17: error: passing '__generic uint32_t *' (aka '__generic unsigned int *') to parameter of type 'uint32_t *' (aka 'unsigned int *') changes address space of pointer
        curve25519_mul(r->x, p->x, p->t);
                      ^~~~
kernel.cl:25888:28: note: passing argument to parameter 'out' here
curve25519_mul(bignum25519 out, const bignum25519 a, const bignum25519 b) {
                          ^
kernel.cl:26418:23: error: passing 'const __generic uint32_t *' (aka 'const __generic unsigned int *') to parameter of type 'const uint32_t *' (aka 'const unsigned int *') changes address space of pointer
        curve25519_mul(r->x, p->x, p->t);
                            ^~~~


The same happens when explicitly adding --OpenCLoption "-cl-std=CL2.0".

OpenCL 1.2

./rga -s rocm-cl -c gfx900 --OpenCLoption "-cl-std=CL1.2" --isa test_isa.txt --livereg regs.txt kernel.cl

Target GPU detected:
gfx900 (Vega)
    Radeon (TM) Pro WX 9100
    Radeon Instinct MI25
    Radeon Instinct MI25 MxGPU
    Radeon Pro SSG
    Radeon RX Vega
    Radeon Vega Frontier Edition

(No further output for minutes.)
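To tell a hang apart from a slow build or a real error without watching the terminal, rga runs like the ones above can be wrapped in a timeout. A minimal sketch using Python's subprocess module. The helper name and the 600-second limit in the commented example are arbitrary choices, and killing rga may leave any compiler processes it spawned still running:

```python
import subprocess


def run_with_watchdog(cmd: list[str], timeout_s: float) -> tuple[str, str]:
    """Run cmd and classify the outcome as "ok", "error" or "hang".

    Returns (outcome, captured stdout+stderr). On timeout, subprocess.run kills
    the direct child only; processes it spawned may keep running.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired as exc:
        partial = exc.stdout or b""
        # Output attached to TimeoutExpired can be bytes even with text=True.
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return "hang", partial
    outcome = "ok" if proc.returncode == 0 else "error"
    return outcome, proc.stdout + proc.stderr


# Example (not executed here): compare both language levels from this thread.
# for std in ("CL1.2", "CL2.0"):
#     cmd = ["./rga", "-s", "rocm-cl", "-c", "gfx900",
#            "--OpenCLoption", f"-cl-std={std}", "kernel.cl"]
#     print(std, run_with_watchdog(cmd, timeout_s=600)[0])
```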

webmaster128
Adept I

Re: OpenCL compilation hangs forever

After annotating some function argument pointers (Comparing master...opencl2.0 · webmaster128/lisk-vanity · GitHub), the compile errors for OpenCL 2.0 disappear, and I have the same problem with both 1.2 and 2.0: the compiler hangs forever.

I will try Windows and its compiler toolchain (HSAIL), hoping for a hint as to which part of the code contains the "statically addressed out of bound array access".
