OpenCL

thenumbernine
Journeyman III

OpenCL compiler / clBuildProgram / LLVM ERROR

Hello, I'm trying to report an LLVM ERROR bug. I wasn't sure which GitHub repo it belongs to, so I thought I'd post it on the developer forums here. I would put this in the OpenCL discussion, but it seems that, being a new user, I can't. Please point me to which of the many ROCm GitHub repos I should file this bug in.

Here is the failing OpenCL code, along with the LLVM errors it produces:

//LLVM ERROR: Cannot select: 0x555a2eca6b90: i32 = GlobalAddress<[3 x %struct.real3] addrspace(5)* @constinit> 0
typedef float real;
typedef struct { real x, y, z; } real3;
typedef struct { real3 v[3]; } real3x3;
kernel void failedKernel(global real3x3 * const U) {
	U[0] = (real3x3){.v={
		(real3){.x=0, .y=0, .z=0},
		(real3){.x=0, .y=0, .z=0},
		(real3){.x=0, .y=0, .z=0},
	}};
}


//LLVM ERROR: Cannot select: 0x55bc4ae975a0: i32 = GlobalAddress<[3 x %struct.real3] addrspace(5)* @constinit> 0
typedef float real;
typedef struct { real s0, s1, s2; } real3;
typedef struct { real3 v[3]; } real3x3;
kernel void failedKernel(global real3x3 * const U) {
	U[0] = (real3x3){.v={
		(real3){.s0=0, .s1=0, .s2=0},
		(real3){.s0=0, .s1=0, .s2=0},
		(real3){.s0=0, .s1=0, .s2=0},
	}};
}


//LLVM ERROR: Cannot select: 0x56030681c910: i32 = GlobalAddress<[3 x %struct.real3] addrspace(5)* @constinit> 0
typedef float real;
typedef struct { real s[3]; } real3;
typedef struct { real3 v[3]; } real3x3;
kernel void failedKernel(global real3x3 * const U) {
	int i = get_global_id(0);
	U[i] = (real3x3){.v={
		(real3){.s={0,0,0}},
		(real3){.s={0,0,0}},
		(real3){.s={0,0,0}},
	}};
}


//LLVM ERROR: Cannot select: 0x562e18268880: i32 = GlobalAddress<[9 x float] addrspace(5)* @constinit> 0
typedef float real;
typedef struct { real s[9]; } real3x3;	//if the array size is 4 or less then it works
kernel void failedKernel(global real3x3 * const U) {
	U[0] = (real3x3){.s={0,0,0,0,0,0,0,0,0}};
}

 

 


Here are some equivalent test cases that compile fine:

//works:
typedef float real;
typedef union {
	struct { real x, y, z; };
	struct { real s0, s1, s2; };
	real s[3];
} real3;
typedef union {
	real s[9];
	real3 v[3];
	struct {real3 v0,v1,v2;};
	struct {real3 x,y,z;};
} real3x3;
kernel void failedKernel(global real3x3 * const U) {
	U[0] = (real3x3){
		.x=((real3){.x=0, .y=0, .z=0}),
		.y=((real3){.x=0, .y=0, .z=0}),
		.z=((real3){.x=0, .y=0, .z=0}),
	};
}


//works:
typedef float real;
typedef union {
	struct { real x, y, z; };
	struct { real s0, s1, s2; };
	real s[3];
} real3;
typedef union {
	real s[9];
	real3 v[3];
	struct {real3 v0,v1,v2;};
	struct {real3 x,y,z;};
} real3x3;
kernel void failedKernel(global real3x3 * const U) {
	int i = get_global_id(0);
	U[i] = (real3x3){
		.x=((real3){.s0=0, .s1=0, .s2=0}),
		.y=((real3){.s0=0, .s1=0, .s2=0}),
		.z=((real3){.s0=0, .s1=0, .s2=0}),
	};
}


//works:
typedef float real;
typedef union {
	struct { real x, y, z; };
	struct { real s0, s1, s2; };
	real s[3];
} real3;
typedef union {
	real s[9];
	real3 v[3];
	struct {real3 v0,v1,v2;};
	struct {real3 x,y,z;};
} real3x3;
kernel void failedKernel(global real3x3 * const U) {
	int i = get_global_id(0);
	U[i] = (real3x3){
		.x=((real3){.s={0,0,0}}),
		.y=((real3){.s={0,0,0}}),
		.z=((real3){.s={0,0,0}}),
	};
}


//works
typedef float real;
typedef struct { real s[4]; } real3x3;	//if the array size is 4 or less then it works
kernel void failedKernel(global real3x3 * const U) {
	U[0] = (real3x3){.s={0,0,0,0}};
}
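
For anyone hitting the same error, one more variant that may be worth trying is to skip the compound literal entirely and store the elements one at a time. This is only a sketch: it has not been tested on the affected driver, and the idea that it avoids the `@constinit` global named in the error message is an assumption, not something confirmed in this thread. The `zeroKernel` name and the two `#define`s (which only let the same source build as plain C on the host for a quick check) are hypothetical additions.

```c
/* Hypothetical host shim: when this is NOT compiled as OpenCL C, blank out
   the OpenCL qualifiers so the same source also builds as plain C. */
#ifndef __OPENCL_VERSION__
#define kernel
#define global
#endif

typedef float real;
typedef struct { real s[9]; } real3x3;

/* Zero the first matrix element by element instead of assigning a
   compound literal that contains an array member. */
kernel void zeroKernel(global real3x3 * const U) {
	for (int j = 0; j < 9; ++j) {
		U[0].s[j] = 0;
	}
}
```

The same member-wise pattern should carry over to the `real3 v[3]` layouts above, at the cost of being more verbose than the union-based versions.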

 

 


clinfo says:

Number of platforms                     1
  Platform Name                         AMD Accelerated Parallel Processing
  Platform Vendor                       Advanced Micro Devices, Inc.
  Platform Version                      OpenCL 2.2 AMD-APP (3361.0)
  Platform Profile                      FULL_PROFILE
  Platform Extensions                   cl_khr_icd cl_amd_event_callback
  Platform Host timer resolution        1ns
  Platform Extensions function suffix   AMD

  Platform Name                         AMD Accelerated Parallel Processing
Number of devices                       2
  Device Name                           gfx1010:xnack-
  Device Vendor                         Advanced Micro Devices, Inc.
  Device Vendor ID                      0x1002
  Device Version                        OpenCL 2.0
  Driver Version                        3361.0 (HSA1.1,LC)
  Device OpenCL C Version               OpenCL C 2.0
  Device Type                           GPU
  Device Board Name (AMD)               AMD Radeon RX 5600M
  Device Topology (AMD)                 PCI-E, 03:00.0

I am using Ubuntu 20.04.3 LTS, kernel 5.11.0-46
cat /opt/rocm-4.5.2/.info/* says everything is version 4.5.2-164
I installed my OpenCL from amdgpu-install_21.40.2.40502-1_all.deb
using amdgpu-install --usecase=opencl --opencl=rocr --vulkan=amdvlk

0 Likes
9 Replies
dipak
Big Boss

Thank you for reporting the issue. I have whitelisted you and moved the post to the OpenCL forum.

Device Type GPU
Device Board Name (AMD) AMD Radeon RX 5600M
..
I am using Ubuntu 20.04.3 LTS, kernel 5.11.0-46
cat /opt/rocm-4.5.2/.info/* says everything is version 4.5.2-164
I installed my OpenCL from amdgpu-install_21.40.2.40502-1_all.deb
using amdgpu-install --usecase=opencl --opencl=rocr --vulkan=amdvlk

From the above information, it looks like you are using the AMDGPU-Pro 21.40 driver linked below on a Radeon RX 5600M. Can you please confirm the configuration?

https://www.amd.com/en/support/kb/release-notes/rn-amdgpu-unified-linux-21-40-2

 

Thanks.

0 Likes

Yup.  Thanks much.

0 Likes

According to the driver page below, there is no Linux GPU driver (AMDGPU-Pro) available for the RX 5600M.

https://www.amd.com/en/support/graphics/amd-radeon-5000m-series/amd-radeon-rx-5000m-series/amd-radeo...

Also, the AMDGPU-Pro 21.40 release notes do not include the RX 5600M in the "Product Family Compatibility List". So it looks like the installed AMDGPU-Pro driver is not compatible with this product.

Thanks.

 

0 Likes

I just tried building the kernel code with the "Radeon GPU Analyzer (RGA)" tool (part of the "Radeon Developer Tool Suite", available here: https://gpuopen.com/tools/), and it compiled fine for the RX 5600M device (gfx1010).

Thanks.

 

0 Likes

Mind if I repeat a question I asked in the original post: which ROCm GitHub repo does this LLVM error pertain to?

I am asking because I would like to compare the version of that repo that I am using against the version Radeon GPU Analyzer is using, so I can tell how long it will be until I get the fixed version.

0 Likes

Looks like I missed this post.

So if the 5600M isn't compatible with this AMD Ubuntu driver, can you point me to which AMD Ubuntu driver is compatible with the 5600M?  Last I checked, none were.

Also, I am curious which 5600M-specific instructions generated by this LLVM code are causing this error. Everything else in the OpenCL compiler is working fine, except for the erroneous LLVM code generated for this edge case, for which I showed a workaround in the original post.

0 Likes

So if the 5600M isn't compatible with this AMD Ubuntu driver, can you point me to which AMD Ubuntu driver is compatible with the 5600M?  Last I checked, none were.

According to the pages below, it looks like the RX 5600M is not officially supported by AMDGPU-Pro or ROCm.

https://www.amd.com/en/support/graphics/amd-radeon-5000m-series/amd-radeon-rx-5000m-series/amd-radeo...

https://github.com/RadeonOpenCompute/ROCm#hardware-and-software-support

 

Also I am curious what 5600M-specific instructions are being generated by this LLVM code - specifically that are causing this error?  ...

As I mentioned, I was unable to reproduce the issue with the latest offline OpenCL compiler shipped with the RGA package. 

Please let us know if the issue is reproducible with the latest Adrenalin driver available on the above driver page.

Thanks. 

 

0 Likes

Yes, all the same failing code fails in exactly the same way on Windows, producing exactly the same error messages, and all the passing code passes on Windows.

Like I said, point me to the GitHub repo and I'll file the bug myself. I just didn't feel like sifting through all the ROCm repos, especially when the llvm-project repo is 2 GB or more.

Why did you test it on gfx1011 when I told you the bug was on gfx1010?  Shouldn't you have tried on gfx1010?

Here's my Windows clinfo:

Number of platforms: 1
  Platform Profile: FULL_PROFILE
  Platform Version: OpenCL 2.1 AMD-APP (3075.12)
  Platform Name: AMD Accelerated Parallel Processing
  Platform Vendor: Advanced Micro Devices, Inc.
  Platform Extensions: cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices


  Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
  Device Type: CL_DEVICE_TYPE_GPU
  Vendor ID: 1002h
  Board name: AMD Radeon(TM) RX 5600M Series
  Device Topology: PCI[ B#3, D#0, F#0 ]
  Max compute units: 18
  Max work items dimensions: 3
    Max work items[0]: 1024
    Max work items[1]: 1024
    Max work items[2]: 1024
  Max work group size: 256
  Preferred vector width char: 4
  Preferred vector width short: 2
  Preferred vector width int: 1
  Preferred vector width long: 1
  Preferred vector width float: 1
  Preferred vector width double: 1
  Native vector width char: 4
  Native vector width short: 2
  Native vector width int: 1
  Native vector width long: 1
  Native vector width float: 1
  Native vector width double: 1
  Max clock frequency: 818Mhz
  Address bits: 64
  Max memory allocation: 5233652531
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 64
  Max image 2D width: 16384
  Max image 2D height: 16384
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 1024
  Alignment (bits) of base address: 2048
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: Yes
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: Read/Write
  Cache line size: 64
  Cache size: 16384
  Global memory size: 6425673728
  Constant buffer size: 5233652531
  Max number of constant args: 8
  Local memory type: Scratchpad
  Local memory size: 65536
  Max pipe arguments: 16
  Max pipe active reservations: 16
  Max pipe packet size: 938685235
  Max global variable size: 4710287104
  Max global variable preferred total size: 6425673728
  Max read/write image args: 64
  Max on device events: 1024
  Queue on device max size: 8388608
  Max on device queues: 1
  Queue on device preferred size: 262144
  SVM capabilities:
    Coarse grain buffer: Yes
    Fine grain buffer: Yes
    Fine grain system: No
    Atomics: No
  Preferred platform atomic alignment: 0
  Preferred global atomic alignment: 0
  Preferred local atomic alignment: 0
  Kernel Preferred work group size multiple: 32
  Error correction support: 0
  Unified memory for Host and Device: 0
  Profiling timer resolution: 1
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: No
  Queue on Host properties:
    Out-of-Order: No
    Profiling : Yes
  Queue on Device properties:
    Out-of-Order: Yes
    Profiling : Yes
  Platform ID: 00007FFF6FBAEFD0
  Name: gfx1010
  Vendor: Advanced Micro Devices, Inc.
  Device OpenCL C version: OpenCL C 2.0
  Driver version: 3075.12 (PAL,LC)
  Profile: FULL_PROFILE
  Version: OpenCL 2.0 AMD-APP (3075.12)
  Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_amd_liquid_flash cl_amd_copy_buffer_p2p cl_amd_planar_yuv


  Device Type: CL_DEVICE_TYPE_GPU
  Vendor ID: 1002h
  Board name: AMD Radeon(TM) Graphics
  Device Topology: PCI[ B#7, D#0, F#0 ]
  Max compute units: 7
  Max work items dimensions: 3
    Max work items[0]: 1024
    Max work items[1]: 1024
    Max work items[2]: 1024
  Max work group size: 256
  Preferred vector width char: 4
  Preferred vector width short: 2
  Preferred vector width int: 1
  Preferred vector width long: 1
  Preferred vector width float: 1
  Preferred vector width double: 1
  Native vector width char: 4
  Native vector width short: 2
  Native vector width int: 1
  Native vector width long: 1
  Native vector width float: 1
  Native vector width double: 1
  Max clock frequency: 1600Mhz
  Address bits: 64
  Max memory allocation: 2350330675
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 64
  Max image 2D width: 16384
  Max image 2D height: 16384
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 1024
  Alignment (bits) of base address: 2048
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: No
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: Read/Write
  Cache line size: 64
  Cache size: 16384
  Global memory size: 3301965824
  Constant buffer size: 2350330675
  Max number of constant args: 8
  Local memory type: Scratchpad
  Local memory size: 32768
  Max pipe arguments: 16
  Max pipe active reservations: 16
  Max pipe packet size: 2350330675
  Max global variable size: 2115297536
  Max global variable preferred total size: 3301965824
  Max read/write image args: 64
  Max on device events: 1024
  Queue on device max size: 8388608
  Max on device queues: 1
  Queue on device preferred size: 262144
  SVM capabilities:
    Coarse grain buffer: Yes
    Fine grain buffer: Yes
    Fine grain system: No
    Atomics: No
  Preferred platform atomic alignment: 0
  Preferred global atomic alignment: 0
  Preferred local atomic alignment: 0
  Kernel Preferred work group size multiple: 64
  Error correction support: 0
  Unified memory for Host and Device: 1
  Profiling timer resolution: 1
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: No
  Queue on Host properties:
    Out-of-Order: No
    Profiling : Yes
  Queue on Device properties:
    Out-of-Order: Yes
    Profiling : Yes
  Platform ID: 00007FFF6FBAEFD0
  Name: gfx902
  Vendor: Advanced Micro Devices, Inc.
  Device OpenCL C version: OpenCL C 2.0
  Driver version: 3075.12 (PAL,HSAIL)
  Profile: FULL_PROFILE
  Version: OpenCL 2.0 AMD-APP (3075.12)
  Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_amd_liquid_flash cl_amd_copy_buffer_p2p cl_amd_planar_yuv

0 Likes

Platform Version: OpenCL 2.1 AMD-APP (3075.12)
...
Driver version: 3075.12 (PAL,LC)

From the above clinfo output, it seems like an old driver.  Can you please check the driver version from the Radeon software settings and share the information? 

As per the Adrenalin 22.2.1 Release Notes: 

"The Radeon™ Software Adrenalin 22.2.1 installation package contains the following:

Radeon™ Software Adrenalin 22.2.1 Driver Version 21.40.23.07 for Windows® 10 and Windows® 11 (Windows Driver Store Version 30.0.14023.7007)."

Just for reference, here is a post where a user shared the clinfo output for Adrenalin 22.1.2. It says "Platform Version OpenCL 2.1 AMD-APP (3354.13)". So the platform/driver version for Adrenalin 22.2.1 is expected to be the same or higher.
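
To make that comparison concrete, here is a minimal sketch that pulls the build number out of a "Platform Version" string and checks it against a reference. The function names are hypothetical, and treating the number in parentheses as a numerically ordered major.minor pair is an assumption about the AMD-APP numbering, not documented behaviour.

```c
#include <stdio.h>
#include <string.h>

/* Pulls the AMD-APP build number out of a platform version string such as
   "OpenCL 2.1 AMD-APP (3075.12)". Returns 1 on success, 0 if not found. */
int parse_amd_app_build(const char *version, int *major, int *minor) {
	const char *p = strstr(version, "AMD-APP (");
	if (p == NULL) {
		return 0;
	}
	return sscanf(p, "AMD-APP (%d.%d)", major, minor) == 2;
}

/* Returns 1 if `installed` is the same as or newer than `reference`,
   0 if it is older, and -1 if either string cannot be parsed.
   Assumption: builds order numerically by major, then minor. */
int amd_app_build_at_least(const char *installed, const char *reference) {
	int imaj, imin, rmaj, rmin;
	if (!parse_amd_app_build(installed, &imaj, &imin) ||
	    !parse_amd_app_build(reference, &rmaj, &rmin)) {
		return -1;
	}
	if (imaj != rmaj) {
		return imaj > rmaj;
	}
	return imin >= rmin;
}
```

With the two strings from this thread, `amd_app_build_at_least("OpenCL 2.1 AMD-APP (3075.12)", "OpenCL 2.1 AMD-APP (3354.13)")` returns 0, meaning the Windows clinfo output above reports an older build than the 22.1.2 reference.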

One more point: I also observed this LLVM error when I tried an older version of the RGA package. The error is not reproducible with the latest RGA package, which provides a more recent OpenCL compiler. So I would suggest testing the kernel with the latest driver/compiler.

 

Why did you test it on gfx1011 when I told you the bug was on gfx1010?  Shouldn't you have tried on gfx1010?

I indeed checked the kernel for gfx1010 and it compiled fine (see the attached screenshot). Sorry, I wrongly mentioned it as gfx1011. I'll correct it in the original post.

Thanks.

0 Likes