Dear All,
I am trying to use tensorflow-rocm (version 2.13.0.570) to run in Jupyter on Ubuntu 22.04.
When I check whether TensorFlow "sees" the GPU using:
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))
I get the following message:
2024-01-06 12:19:43.135928: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:838] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-01-06 12:19:43.215412: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:838] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-01-06 12:19:43.215471: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:838] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-01-06 12:19:43.215488: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2015] Ignoring visible gpu device (device: 0, name: Radeon RX 7900 XT, pci bus id: 0000:0b:00.0) with AMDGPU version : gfx1100. The supported AMDGPU versions are gfx1030, gfx900, gfx906, gfx908, gfx90a, gfx940, gfx941, gfx942.
As I understand it, this version of TensorFlow does not support my GPU.
Is tensorflow-rocm 2.13.0.570 compatible with the RX 7900 XT? If so, what do I need to do to make it work?
I have already tried uninstalling and reinstalling ROCm and tensorflow-rocm.
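To make the log message concrete, here is a minimal sketch that pulls the device architecture and the supported list out of the "Ignoring visible gpu device" line and compares them. The function name `parse_ignored_device` is made up for illustration, and the sketch assumes TensorFlow does an exact-name lookup, which the log suggests but does not prove.

```python
import re

# Log line copied from the post above (timestamp and source-file prefix omitted).
LOG_LINE = (
    "Ignoring visible gpu device (device: 0, name: Radeon RX 7900 XT, "
    "pci bus id: 0000:0b:00.0) with AMDGPU version : gfx1100. "
    "The supported AMDGPU versions are gfx1030, gfx900, gfx906, gfx908, "
    "gfx90a, gfx940, gfx941, gfx942."
)


def parse_ignored_device(line):
    """Return (device_arch, supported_archs), or None if the line does not match."""
    arch = re.search(r"AMDGPU version\s*:\s*(gfx\w+)", line)
    supported = re.search(r"supported AMDGPU versions are (.+?)\.?$", line)
    if not arch or not supported:
        return None
    # Split the comma-separated list and trim the surrounding spaces.
    names = [name.strip() for name in supported.group(1).split(",")]
    return arch.group(1), names


device_arch, supported_archs = parse_ignored_device(LOG_LINE)
print(device_arch, device_arch in supported_archs)
```

For the 2.13.0.570 message, gfx1100 does not appear in the list at all, which matches the reading that this build was not compiled with support for the card.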
I have the same issue with the next version they posted, tensorflow-rocm 2.14 for ROCm 6.0. In this case there seems to be a typo, as the error reads:
Radeon RX 7900 XTX, pci bus id: 0000:03:00.0) with AMDGPU version : gfx1100. The supported AMDGPU versions are gfx1030gfx1100, gfx900, gfx906, gfx908, gfx90a, gfx940, gfx941, gfx942.
It seems they forgot the separator (a comma and a space) between gfx1030 and gfx1100. AMD, please fix this problem.
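A likely mechanism for the fused name, consistent with the missing-comma fix reported later in this thread: in C++, adjacent string literals are joined into one string, so a list with a missing comma silently loses an entry. Python has the same rule, so the effect can be reproduced in a few lines. The lists below are illustrative only and are not copied from the TensorFlow source.

```python
# Illustrative lists only; the real list lives in TensorFlow's C++ source.
# Note the missing comma after "gfx1030" in the first list: adjacent string
# literals are concatenated into a single string, as they are in C++.
supported_with_typo = ["gfx1030" "gfx1100", "gfx900", "gfx906"]
supported_fixed = ["gfx1030", "gfx1100", "gfx900", "gfx906"]

for label, archs in [("typo", supported_with_typo), ("fixed", supported_fixed)]:
    # Exact-name membership test, assumed to resemble the check TensorFlow performs.
    print(label, archs, "gfx1030" in archs, "gfx1100" in archs)
```

With the typo, neither gfx1030 nor gfx1100 matches. That would explain why both the RX 7900 cards and the gfx1030 card reported further down the thread are ignored.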
I installed version 2.14.0.600 and I have the same issue as you:
Ignoring visible gpu device (device: 0, name: Radeon RX 7900 XT, pci bus id: 0000:0b:00.0) with AMDGPU version : gfx1030. The supported AMDGPU versions are gfx1030gfx1100, gfx900, gfx906, gfx908, gfx90a, gfx940, gfx941, gfx942.
AMD: please add the missing comma.
I wonder whether we can recompile tensorflow-rocm from scratch to correct this typo manually. I am searching for a guide, so if you see anything, please let me know. Thanks!
Today I wrote a letter to AMD about the problems I found. I tested the solution to this problem on two versions of TensorFlow, compiling both from source, and everything went well.
The solution is in item 2 of the letter below.
Here is my letter to AMD:
Good afternoon.
You have many problems with ROCm and its documentation.
These are some of the issues I have identified that have a huge impact:
1) On the page https://github.com/ROCm/tensorflow-upstream/blob/develop-upstream/rocm_docs/tensorflow-rocm-release....
The link leads to a non-existent version:
https://pypi.org/project/tensorflow-rocm/2.14.0.602
2) There is a problem with a typo in the source code (a comma is missing), which can be solved like this:
sed -i 's/"gfx1030" /"gfx1030",/g' tensorflow/compiler/xla/stream_executor/device_description.h
Otherwise, there will be an error when using ROCm with a 7900 XTX video card.
Here is a link to fix it:
https://gist.github.com/briansp2020/1e8c3e5735087398ebfd9514f26a0007
This problem is being discussed on your forum and no one has been able to solve it since January!
https://community.amd.com/t5/discussions/rocm-6-0-0-tensorflow-not-working-on-rx-7900-xt/m-p/657519
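For readers who want to see what the sed expression in item 2 does before running it, here is a rough Python equivalent applied to a made-up line. The sample string is hypothetical (the real contents of device_description.h may be laid out differently), and `add_missing_comma` is just an illustrative name.

```python
import re


def add_missing_comma(source_text):
    """Mirror the sed expression s/"gfx1030" /"gfx1030",/g from the letter."""
    # Replace the quoted name followed by a space with the quoted name and a comma.
    return re.sub(r'"gfx1030" ', '"gfx1030",', source_text)


# Hypothetical excerpt; the actual header may be formatted differently.
sample = '    "gfx1030" "gfx1100", "gfx900",'
patched = add_missing_comma(sample)
print(patched)
```

The command edits a file in the TensorFlow source tree, and the letter's author reports compiling from source. Patching the header on its own does not change an already installed tensorflow-rocm wheel, so the fix only takes effect after a rebuild.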
Hello, I ran into a similar problem. The error is:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:2266] Ignoring visible gpu device (device: 0, name: AMD Radeon RX 6800 XT, pci bus id: 0000:0a:00.0) with AMDGPU version: gfx1030. The supported AMDGPU versions are gfx1030gfx1100, gfx900, gfx906, gfx908, gfx90a, gfx940, gfx941, gfx942.
I used the command you recommended:
sed -i 's/"gfx1030" /"gfx1030",/g' tensorflow/compiler/xla/stream_executor/device_description.h
I ran the code again after rebooting the system and still encountered the error. Do I need to reinstall, recompile, or apply some other fix?
I was thinking the same thing. 🙂
Any news from AMD?
It's a bummer this still hasn't been fixed...
Yep, tried it yesterday; uninstalled and reinstalled it just to make sure.
Same thing...
I thought AI was a big deal...
Any news from AMD yet?
Or tell me how I can add the missing comma and I will try it.
Was buying the 7900xtx the wrong choice?