
Larosquilla
Journeyman III

Only batch size 1 worked

Platform: ROCm 6.1.3
OS: WSL2 Ubuntu 22.04
GPU: AMD Radeon RX 7900 XTX

Hi everyone,

I'm currently working on training a PyTorch deep learning model with an AMD GPU on WSL2. I successfully installed ROCm and all necessary packages, following this guide: https://rocm.docs.amd.com/projects/radeon/en/latest/docs/install/wsl/install-pytorch.html

My project involves an image classification model with images of approximately 300x300 pixels. When I set batch_size=1, training runs smoothly (a minimal sketch of my loader setup follows the stats below). Here are the utilization stats from my monitoring panel:

  • CPU i7-12: 50%
  • RAM: 16GB out of 32GB
  • GPU: 20%
  • GPU Memory: 4GB out of 24GB
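
For reference, this is roughly the kind of setup involved; the dataset path, transform, and worker count are placeholders, and batch_size is the only thing I change between runs:

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Placeholder pipeline: ~300x300 RGB images from an ImageFolder-style directory.
transform = transforms.Compose([
    transforms.Resize((300, 300)),
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder("data/train", transform=transform)

# batch_size=1 trains fine; anything larger triggers the pin-memory error below.
loader = DataLoader(dataset, batch_size=1, shuffle=True,
                    num_workers=4, pin_memory=True)

device = torch.device("cuda")  # the ROCm build exposes the GPU through the CUDA API
for images, labels in loader:
    images, labels = images.to(device), labels.to(device)
    # ... forward/backward pass ...
```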

The issue arises when I try to increase the batch size beyond 1. I receive the following error:

RuntimeError: Caught RuntimeError in pin memory thread for device 0.

HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing HIP_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.

I’ve already configured WSL2 to allocate 128GB of memory to the virtual machine, but the batch size is still limited to 1.

The configuration steps I followed are described here: https://learn.microsoft.com/en-us/windows/wsl/wsl-config

Does anyone have suggestions on how to unlock or increase GPU usage to support larger batch sizes? Any advice would be greatly appreciated!

3 Replies
DigitalHumunculus
Journeyman III

Your GPU is hardly being utilized. I had a similar issue running a generative modeling program last year on my machine (i7-10700K, 32GB RAM, RX 6700 XT with 12GB VRAM); the under-utilization was caused by improperly updated PyTorch files. After passing command-line arguments to bypass the PyTorch update/git pull step, I could only run small (600x600 px) single batches at 100% CPU utilization, at roughly 0.0001 iterations per second, and a single generation took about 5 minutes. Linux has become more viable as support has improved, but it can still cause issues with programs designed to run on Windows. What fixed it for me was making sure the PyTorch files update before the program opens the VM environment; otherwise the CPU gets picked as the main compute device. A quick way to confirm which device is actually being used is sketched below. If no other option works, I would suggest switching the OS to Windows.
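
For example, a quick check along these lines (a minimal sketch, assuming a ROCm build of PyTorch, which exposes the GPU through the CUDA API) will tell you which device is actually going to be used:

```python
import torch

# On a ROCm build, torch.cuda.* maps to HIP; if this prints False,
# training will quietly fall back to the CPU.
print("GPU available:", torch.cuda.is_available())

if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    print("HIP version:", torch.version.hip)  # None on a CUDA-only build

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Training would run on:", device)
```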

 

I am not a professional programmer or computer scientist, but I feel this opinion may help steer you in the right direction. Good luck and AMD-speed.

0 Likes
hcveeh
Journeyman III

I had similar issues with GPU under-utilization running generative models, which I solved by ensuring PyTorch files updated properly before opening the VM. Switching to Windows might help if updates don’t resolve it. Good luck!

0 Likes
Mona
Adept I

It looks like WSL2 with ROCm has some limitations in GPU memory handling, especially with larger batch sizes. A few things to try:

  • Set HIP_LAUNCH_BLOCKING=1 to make kernel launches synchronous, which can sometimes help pin down memory errors: run export HIP_LAUNCH_BLOCKING=1 in your shell (or set it in the environment) before starting the script.
  • Disable pinned memory in your DataLoader by passing pin_memory=False; this often resolves the "pin memory thread" error on WSL.
  • Increase the shared memory size by remounting /dev/shm with a larger size (e.g., sudo mount -o remount,size=16G /dev/shm), which helps DataLoader workers.

WSL2 support for ROCm is still evolving, so keeping an eye on updates could help too. A short sketch of the first two changes is below. Let us know if this makes a difference!
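
For instance, a minimal sketch of those first two changes (the stand-in dataset and the batch size are just placeholders):

```python
import os

# Set before any HIP work so kernel launches run synchronously and
# errors surface at the call that actually caused them.
os.environ["HIP_LAUNCH_BLOCKING"] = "1"

import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset: 3-channel 300x300 images, as in the original post.
train_dataset = TensorDataset(torch.randn(64, 3, 300, 300),
                              torch.randint(0, 10, (64,)))

loader = DataLoader(
    train_dataset,
    batch_size=8,        # try values larger than 1 again after these changes
    shuffle=True,
    num_workers=2,
    pin_memory=False,    # skips the "pin memory thread" path that fails on WSL2
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for images, labels in loader:
    images, labels = images.to(device), labels.to(device)
    # ... forward/backward pass ...
```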

0 Likes