OpenCL

RX 6900XT: Memory issue with small OpenCL program for 2D-CT

Hello all, thanks in advance for your time and effort. For a university assignment on computed tomography, I wrote a small OpenCL program (~1000 lines in total) that reads a square greyscale image, computes its Radon transform, filters the Radon transform and computes the corresponding back projection.

So far I have tested the program on my 2017 12" MacBook, my Linux machine (Ubuntu 20.04, i9 9900K, 6900XT, 32GB of RAM) and the Windows machine of a friend (Windows 10, Ryzen 5 1500X, GTX 1050, 16GB of RAM), with mixed results. The code runs mostly fine on my MacBook and perfectly fine on my friend's Nvidia GTX 1050 with current OpenCL drivers for Windows (tested with image resolutions 64x64, 128x128, 256x256, 512x512, 1024x1024, 2048x2048 and 4096x4096), which makes me quite confident in my code. However, both the OpenCL CPU driver for his Ryzen 5 1500X and the OpenCL driver for my 6900XT (from driver 21.10 on Ubuntu 21.10) refuse to execute with image sizes of 1024x1024 and above, throwing a memory related core dump error message.

I am not sure whether this is a driver issue or an issue with my source code, and I have limited experience with OpenCL development in general. If it is of any help, I can post further details and give you my code to test.

17 Replies
dipak
Big Boss

Hi @FriedrichGuenther ,

I've moved the post to the OpenCL forum.

Please provide a minimal test-case that reproduces the issue. Also share the clinfo output.

 

Thanks.


@dipak Hello dipak, I don't think a minimal example that reproduces the issue would help much: I would just be deleting parts of the main program, since the program already fails in the first OpenCL kernel, "compute_sinogram.cl". If you insist, though, I will try to create one.

Here is a link to a GitHub repository containing my code, the clinfo output, as well as screenshots of the core dump on my AMD card and of a successful run on my CPU. All the additional info is in the folder "Debug".

If there is anything else I can do, please tell me! Thanks a lot in advance for your time.

Edit: I forgot to mention that the new revision of the program never runs on the AMD GPU. Before, I could run it with resolutions below 1024x1024; now it crashes in the first function every time.


Hi @FriedrichGuenther ,

Thank you for providing the repro and other information. We will look into this and get back to you.

 

Thanks.

Hi @FriedrichGuenther ,

"the OpenCL driver for my 6900XT (from driver 21.10 on Ubuntu 21.10) ... throwing a memory related core dump error message."

It looks like a more recent AMDGPU-Pro driver (21.20) is available here: amd-radeon-rx-6900-xt . Please try this latest driver and share your observation.

Note: As per the driver release notes, Ubuntu 21.10 is not officially supported. The supported Ubuntu versions are listed below.

  • Ubuntu 20.04.2
  • Ubuntu 18.04.5 HWE

 

Thanks.


@dipak I will try the newer driver and report back to you. Also, I am running Ubuntu 20.04; I just mistyped it before. Sorry for that!

Edit: I just uninstalled the old driver and installed the new one. Still the same issue. What happens if you just run the code? Do you get output?


Hi @FriedrichGuenther ,

Thanks for the quick update.

Based on the screenshot of the core dump, it looks to me like a runtime/compiler issue. I'll report it to the OpenCL team.

By the way, the code ran fine when I tried it on a Windows laptop with Vega device.

 

Thanks.

@dipak Hey, thank you very much for your effort! That is partially comforting. Hopefully there is an easy fix for this: in two weeks' time I will present the software and would love to see how fast my AMD card can run it. 🙂

As a slightly off-topic question: I want to switch from buffers to image2d_t memory objects but cannot get it working. Can I ask about this here, or is your OpenCL forum meant for AMD-related topics and issues only?


Yes, you can also post queries related to OpenCL programming to get feedback and suggestions from other community members.

 

Thanks.

@dipak I have come up with a very bare-bones alternative to the kernel that crashes consistently on my AMD GPU (and also on the Radeon VII of a friend who is running Linux as well). The alternative does work on my GPU, but it is too simple for my purposes: the image quality is significantly worse. Could this kernel help with the debugging process?

You can provide the new kernel and related host code.

 

Thanks.

@dipak I put the two archives "OpenCL_PP.zip" and "CT_OpenCL.zip" in the folder "Debug" of the GitHub repository linked earlier. In "OpenCL_PP.zip" you will find updated host code and kernels for pixel-wise computation, as opposed to row-wise computation, of the output data. The kernel "compute_sinogram_prec.cl" is a pixel-wise version of the "compute_sinogram.cl" kernel that caused trouble before, and it still doesn't work on my AMD GPU. The kernel "compute_sinogram_fast.cl" is much more primitive (integration by trapezoidal rule, nearest neighbour interpolation as opposed to bilinear in the other kernel), but does work.
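
For readers unfamiliar with the terms, here is a minimal C++ sketch of the difference between nearest neighbour and bilinear lookup, and of the trapezoidal rule. It is not taken from the repository: the `Image` struct, the function names and the zero padding outside the image are assumptions made purely for illustration, and a real kernel would read from buffers rather than a `std::vector`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical row-major square greyscale image (not the repository's layout).
struct Image {
    std::size_t n;          // width == height
    std::vector<float> px;  // n * n values
    float at(long x, long y) const {
        // Assumption: everything outside the image counts as 0.
        const long size = static_cast<long>(n);
        if (x < 0 || y < 0 || x >= size || y >= size) return 0.0f;
        return px[static_cast<std::size_t>(y) * n + static_cast<std::size_t>(x)];
    }
};

// Nearest neighbour lookup: round to the closest pixel.
float sample_nearest(const Image& img, float x, float y) {
    return img.at(std::lround(x), std::lround(y));
}

// Bilinear lookup: blend the four surrounding pixels by distance.
float sample_bilinear(const Image& img, float x, float y) {
    const float fx = std::floor(x), fy = std::floor(y);
    const long x0 = static_cast<long>(fx), y0 = static_cast<long>(fy);
    const float tx = x - fx, ty = y - fy;
    const float top = (1.0f - tx) * img.at(x0, y0) + tx * img.at(x0 + 1, y0);
    const float bottom = (1.0f - tx) * img.at(x0, y0 + 1) + tx * img.at(x0 + 1, y0 + 1);
    return (1.0f - ty) * top + ty * bottom;
}

// Composite trapezoidal rule over equally spaced samples with step h.
float trapezoid(const std::vector<float>& f, float h) {
    if (f.size() < 2) return 0.0f;
    float sum = 0.5f * (f.front() + f.back());
    for (std::size_t i = 1; i + 1 < f.size(); ++i) sum += f[i];
    return sum * h;
}
```

Sampling halfway between pixels shows the difference: the nearest neighbour lookup jumps from one pixel value to the next, while the bilinear lookup returns a weighted average, which is presumably why the simpler kernel gives visibly worse image quality.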

In the archive "CT_OpenCL.zip" you find a version of the program ported to image2d_t memory objects, whose "compute_sinogram_img.cl" kernel does the same as "compute_sinogram_prec.cl" from the other archive, but works on my AMD GPU.

If I can be of any further assistance, please let me know. 🙂

Hi @FriedrichGuenther ,

Thanks for the above information.

From the clinfo output, it seems that multiple OpenCL platforms are available. As suggested by the team concerned, could you please check the issue without any non-AMD OpenCL platforms and share your findings?

 

Thanks.


@dipak On the POCL platform as well as on my MacBook, all kernels work without an issue. (Almost: running the code on CPUs at small resolutions, i.e. 64x64 and 128x128, produces pixel garbage, while at higher resolutions it works flawlessly.)

On the Nvidia platform (Windows, GTX 1050) of a friend, the kernels work as well.

Thanks for the information. Just to clarify my last post: the team concerned suggested uninstalling/removing POCL and any other non-AMD driver, and then checking the issue.

Thanks.


@dipak I have now set up a fresh install of Ubuntu with the AMD drivers, OpenCL headers, the C++ wrapper and clinfo. I have observed the following:

  • The "compute_sinogram.cl" kernel no longer crashes instantly: it works at 64x64, 128x128, 256x256 and 512x512, and crashes at 1024x1024. The error message is similar to the old one:

Memory access fault by GPU node-1 (Agent handle: 0xd460e0) on address 0x7f6ab0600000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)

After installing clang on the old installation with POCL still installed, I could also run the kernel "compute_sinogram.cl" for 64x64 up to 512x512. This is why I reinstalled this evening; I was too thrilled.

  • Using "compute_sinogram_fast.cl" I can run my benchmark from 64x64 to 4096x4096 and observe vastly improved performance using buffers. I have also noticed said uplift after installing the clang compiler on my old installation.
  • A further performance optimisation for the kernel "fast_ram_lak_filter.cl" (replacing "shift = k%2" with "shift = k&1") results in the following error message:

LLVM ERROR: Cannot select: 0x1f2d9e8: i1 = mul 0x1f311d0, 0x1f2d570
  0x1f311d0: i1 = truncate 0x1f2e478
    0x1f2e478: i32,ch = CopyFromReg 0x1e3ede8, Register:i32 %8
      0x254e270: i32 = Register %8
  0x1f2d570: i1 = truncate 0x254e750
    0x254e750: i16,ch = CopyFromReg 0x1e3ede8, Register:i16 %5
      0x254ea28: i16 = Register %5
In function: fast_ram_lak_filter
Aborted (core dumped)

I provided the new (and mostly final) version of the project on GitHub in the .zip file "Rev7.zip" in the folder "Debug". In the "Output" folder contained in Rev7, I put some logs of my program to illustrate my findings.

One final note: I asked two friends, one with a Radeon VII and one with an RX 570, to run the program. The friend with the Radeon VII reports the same issue with "compute_sinogram.cl"; the friend with the RX 570 could run said kernel without an issue up to 4096x4096 (fresh install of Ubuntu 20.04.2, AMD drivers 21.20, --opencl=legacy).


Hi @FriedrichGuenther ,

Thanks for the update. 

Memory access fault by GPU node-1 (Agent handle: 0xd460e0) on address 0x7f6ab0600000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)

Based on the earlier discussions, I'm just trying to list the exact steps that reproduce the above memory issue. Please correct me if I missed any point.

Steps:

1. Setup: RX6900XT + Ubuntu 20.04 + latest AMDGPU-Pro (21.20) driver + OpenCL headers and C++ wrapper

2. Download the repro available here: github repository -->"Debug"-->"Rev7.zip" 

3. Extract the source code, build and run.

4. Observation: The "compute_sinogram.cl" kernel works fine for image resolutions 64x64, 128x128, 256x256 and 512x512, but crashes at 1024x1024 with the above memory error message.

Thanks.

@dipak Thanks for the summary, the steps are correct. Sorry if it was confusing. Here is a short and hopefully clear summary of the information I have gathered:

  • Using pinned buffers (CL_MEM_ALLOC_HOST_PTR) makes the kernel "compute_sinogram.cl" crash immediately, regardless of resolution.
  • Testing with a bisection-based approach, I verified that without pinned buffers the kernel first crashes at resolution 1024x1024.
  • The same problem seems to be present on a friend's Linux system with a Radeon VII (Vega 20 based), but not on another friend's system with an RX 570 (Polaris 20 based, latest AMD drivers, --opencl=legacy).
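
A bisection over the image size, as mentioned in the list above, could be sketched roughly as follows in C++. This is not the procedure actually used; `first_failing_size` and the `runs_ok` callback are hypothetical names, and the sketch assumes that failures are monotonic (once a size fails, every larger size fails as well). Because the fault aborts the whole process, a real `runs_ok` would probably have to launch the program in a child process and inspect its exit status.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Returns the smallest size in (lo, hi] for which runs_ok(size) is false.
// Preconditions (assumed, not checked): runs_ok(lo) is true, runs_ok(hi) is
// false, and every size above the first failing one fails as well.
std::size_t first_failing_size(std::size_t lo, std::size_t hi,
                               const std::function<bool(std::size_t)>& runs_ok) {
    while (hi - lo > 1) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (runs_ok(mid)) {
            lo = mid;  // mid still works, so the first failure lies above it
        } else {
            hi = mid;  // mid fails, so the first failure is mid or below
        }
    }
    return hi;
}
```

With a known good size of 512 and a known bad size of 4096, this needs about a dozen runs to pin down the first failing size instead of thousands.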

Additionally, attempted performance tuning of a different kernel results in crashes:

  • Changing one line in the kernel "fast_ram_lak_filter.cl" (namely "shift = k%2;" to "shift = k&1;") causes this kernel to crash regardless of resolution. The same kernel for images, i.e. "fast_ram_lak_filter_img.cl", can handle the optimisation
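
For a non-negative `k`, the two expressions give the same value, so the rewrite itself is valid source code. The minimal C++ sketch below illustrates this; the function names are hypothetical, and the actual type of `k` in the kernel is not shown in this thread. The only case where the two forms differ is a negative signed `k`, under the truncating division rules of C++ and C99 (on which OpenCL C is based).

```cpp
#include <cassert>

// Parity via remainder: for a signed int this yields -1, 0 or 1.
int parity_mod(int k) { return k % 2; }

// Parity via bit mask: this yields 0 or 1 (two's complement assumed).
int parity_and(int k) { return k & 1; }
```

If `k` is an unsigned index or is known never to be negative, either form should be safe to use.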

All kernels and code can be found in the aforementioned GitHub repository. If you provide me with older drivers, I am willing to test the problem with those.