OpenCL

dipak
Staff

Re: RX 6900XT: Memory issue with small OpenCL program for 2D-CT

You can provide the new kernel and related host code.

 

Thanks.

 


Re: RX 6900XT: Memory issue with small OpenCL program for 2D-CT

@dipak I put the two archives "OpenCL_PP.zip" and "CT_OpenCL.zip" in the folder "Debug" of the GitHub repository linked on the previous page. In "OpenCL_PP.zip" you will find updated host code and kernels that compute the output data pixel-wise rather than row-wise. The kernel "compute_sinogram_prec.cl" is a pixel-wise version of the "compute_sinogram.cl" kernel that caused trouble before, and it still doesn't work on my AMD GPU. The kernel "compute_sinogram_fast.cl" is much more primitive (integration by the trapezoidal rule and nearest-neighbour interpolation, as opposed to bilinear interpolation in the other kernel), but it does work.
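For readers who have not opened the archives, the difference between the two interpolation schemes and the trapezoidal rule can be sketched in plain C++. This is a minimal illustration, not the code from the repository: the `Image` struct, the function names and the clamping of coordinates to the image bounds are all assumptions made for this example, and the thread does not say whether the actual kernels clamp their indices.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical row-major image, laid out like a flat OpenCL buffer.
struct Image {
    std::size_t width;
    std::size_t height;
    std::vector<float> data;  // width * height values

    float at(std::size_t x, std::size_t y) const { return data[y * width + x]; }
};

// Clamp a coordinate into [0, extent - 1] so every lookup stays in bounds.
// (Assumption for this sketch; the original kernels may handle edges differently.)
float clamp_coord(float v, std::size_t extent) {
    return std::min(std::max(v, 0.0f), static_cast<float>(extent - 1));
}

// Nearest-neighbour interpolation: read the single closest pixel.
float sample_nearest(const Image& img, float x, float y) {
    assert(img.width > 0 && img.height > 0);
    const auto xi = static_cast<std::size_t>(std::round(clamp_coord(x, img.width)));
    const auto yi = static_cast<std::size_t>(std::round(clamp_coord(y, img.height)));
    return img.at(xi, yi);
}

// Bilinear interpolation: blend the four pixels surrounding (x, y).
float sample_bilinear(const Image& img, float x, float y) {
    assert(img.width > 0 && img.height > 0);
    const float fx = clamp_coord(x, img.width);
    const float fy = clamp_coord(y, img.height);
    const auto x0 = static_cast<std::size_t>(fx);
    const auto y0 = static_cast<std::size_t>(fy);
    const std::size_t x1 = std::min(x0 + 1, img.width - 1);
    const std::size_t y1 = std::min(y0 + 1, img.height - 1);
    const float tx = fx - static_cast<float>(x0);
    const float ty = fy - static_cast<float>(y0);
    const float top = (1.0f - tx) * img.at(x0, y0) + tx * img.at(x1, y0);
    const float bottom = (1.0f - tx) * img.at(x0, y1) + tx * img.at(x1, y1);
    return (1.0f - ty) * top + ty * bottom;
}

// Trapezoidal rule over equally spaced samples along one ray.
float integrate_trapezoid(const std::vector<float>& samples, float step) {
    if (samples.size() < 2) return 0.0f;
    float sum = 0.5f * (samples.front() + samples.back());
    for (std::size_t i = 1; i + 1 < samples.size(); ++i) sum += samples[i];
    return sum * step;
}
```

Bilinear sampling reads four neighbouring pixels per lookup while nearest-neighbour reads one, so the two kernels have quite different memory access patterns even though they compute a similar quantity.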

In the archive "CT_OpenCL.zip" you find a version of the program ported to image2d_t memory objects, whose "compute_sinogram_img.cl" kernel does the same as "compute_sinogram_prec.cl" from the other archive, but works on my AMD GPU.

If I can be of any further assistance, please let me know. 🙂

 

dipak
Staff

Re: RX 6900XT: Memory issue with small OpenCL program for 2D-CT

Hi @FriedrichGuenther ,

Thanks for the above information.

From the clinfo output, it seems like there are multiple OpenCL platforms available. As the concerned team suggested, could you please check the issue without any non-AMD OpenCL platforms and share your findings?

 

Thanks.


Re: RX 6900XT: Memory issue with small OpenCL program for 2D-CT

@dipak On the POCL platform as well as on my MacBook, all kernels work without issue, with one exception: running the code on CPUs at small resolutions (64x64 and 128x128) produces pixel garbage, whereas higher resolutions work flawlessly.

 

On a friend's Nvidia platform (Windows, GTX 1050), the kernels work as well.

dipak
Staff

Re: RX 6900XT: Memory issue with small OpenCL program for 2D-CT

Thanks for the information. Just to clarify my last post: the concerned team suggested uninstalling POCL and any other non-AMD OpenCL driver, and then checking the issue again.

Thanks.


Re: RX 6900XT: Memory issue with small OpenCL program for 2D-CT

@dipak I have now set up a fresh install of Ubuntu with the AMD drivers, OpenCL headers, C++ wrapper and clinfo. I have observed the following:

  • The "compute_sinogram.cl" kernel no longer crashes instantly. It works at 64x64, 128x128, 256x256 and 512x512, but crashes at 1024x1024. The error message is similar to the old one:

 

 

Memory access fault by GPU node-1 (Agent handle: 0xd460e0) on address 0x7f6ab0600000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)

 

 

After installing clang on the old installation, with POCL still installed, I could also run the kernel "compute_sinogram.cl" from 64x64 up to 512x512. That is why I reinstalled this evening; I was too thrilled.

  • Using "compute_sinogram_fast.cl" I can run my benchmark from 64x64 to 4096x4096 and observe vastly improved performance using buffers. I had already noticed the same uplift after installing the clang compiler on my old installation.
  • A further performance optimisation for the kernel "fast_ram_lak_filter.cl" (replacing "shift = k%2" with "shift = k&1") results in the following error message:

 

 

LLVM ERROR: Cannot select: 0x1f2d9e8: i1 = mul 0x1f311d0, 0x1f2d570
  0x1f311d0: i1 = truncate 0x1f2e478
    0x1f2e478: i32,ch = CopyFromReg 0x1e3ede8, Register:i32 %8
      0x254e270: i32 = Register %8
  0x1f2d570: i1 = truncate 0x254e750
    0x254e750: i16,ch = CopyFromReg 0x1e3ede8, Register:i16 %5
      0x254ea28: i16 = Register %5
In function: fast_ram_lak_filter
Aborted (core dumped)

 

 

I provided the new (and mostly final) version of the project on GitHub in the file "Rev7.zip" in the folder "Debug". In the Output folder contained in Rev7, I put some logs of my program to illustrate my findings.

 

One final note: I asked two friends, one with a Radeon VII and one with an RX 570, to run the program. The friend with the Radeon VII reports the same issue with "compute_sinogram.cl"; the friend with the RX 570 could run that kernel without issue up to 4096x4096 (fresh install of Ubuntu 20.04.02, AMD drivers 21.20, --opencl=legacy).

dipak
Staff

Re: RX 6900XT: Memory issue with small OpenCL program for 2D-CT

Hi @FriedrichGuenther ,

Thanks for the update. 

Memory access fault by GPU node-1 (Agent handle: 0xd460e0) on address 0x7f6ab0600000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)

Based on the earlier discussions, I'm just trying to list the exact steps that reproduce the above memory issue. Please correct me if I missed any point.

Steps:

1. Setup: RX 6900XT + Ubuntu 20.04 + latest AMDGPU-Pro (21.20) driver + OpenCL headers and C++ wrapper

2. Download the repro available here: github repository -->"Debug"-->"Rev7.zip" 

3. Extract the source code, build and run.

4. Observation: The "compute_sinogram.cl" kernel works fine for image-resolution 64x64, 128x128, 256x256 and 512x512, but crashes for 1024x1024 with the above memory error message.

 

Thanks.

 

 

Re: RX 6900XT: Memory issue with small OpenCL program for 2D-CT

@dipak Thanks for the summary; the steps are correct. Sorry if my earlier posts were confusing. Here is a short summary of the information I have gathered so far, which I hope is helpful:

  • Using pinned buffers (CL_MEM_ALLOC_HOST_PTR) makes the kernel "compute_sinogram.cl" crash immediately, regardless of resolution.
  • Using a bisection-based approach, I verified that without pinned buffers the kernel first crashes at resolution 1024x1024.
  • The same problem seems to be present on a friend's Linux system with a Radeon VII (Vega 20 based), but not on another friend's system with an RX 570 (Polaris 20 based, latest AMD drivers, --opencl=legacy).

Additionally, attempted performance tuning of a different kernel results in crashes:

  • Changing one line in the kernel "fast_ram_lak_filter.cl" (namely "shift = k%2;" to "shift = k&1;") causes this kernel to crash regardless of resolution. The same kernel for images, i.e. "fast_ram_lak_filter_img.cl", can handle the optimisation
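As background on the one-line change, the two expressions can be compared on the host in plain C++. This is a hedged sketch: the function names are made up for the example, and the type of `k` in the actual kernel is not stated in the thread. For non-negative or unsigned `k` the two forms give the same result; for negative signed `k` they differ, because the remainder takes the sign of the dividend.

```cpp
#include <cassert>

// Parity via remainder: the result has the sign of k, so -3 % 2 is -1.
int shift_mod(int k) { return k % 2; }

// Parity via bit mask: with two's-complement integers the result is 0 or 1.
int shift_and(int k) { return k & 1; }

// True when both forms agree for the given k (always the case for k >= 0).
bool same_shift(int k) { return shift_mod(k) == shift_and(k); }
```

This comparison only shows when the rewrite is value-preserving. It does not explain the "Cannot select" message, which is raised inside the compiler backend while building the kernel, before any value of `k` is evaluated.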

All kernels and code can be found in the aforementioned GitHub repository. If you can provide me with older drivers, I am willing to test the problem with those.
