

drajarshi
Adept I

Reading power sensors with GPU offloading on V520

Hello

I am using an AWS shared instance running Amazon Linux 2 with a V520 GPU. With version 20.20 of the amdgpu-pro driver loaded in the instance (specifically amdgpu-pro-20.20-1184451-rhel-7.8), I am able to read the power sensor using an ioctl call.

Now, I also need to have AOMP installed on the instance since I need to get GPU offloading to work. My goal is to have GPU offloading in action and measure the GPU power during the offload. 

Therefore, I installed AOMP_11.12.0. However, this replaced the existing amdgpu driver with a new version. 

GPU offloading now works, but the power sensor returns zero values. The ioctl call for the power reading succeeds, yet the value returned is always zero.

I assume this is because the new driver that AOMP installs does not populate the power sensor with actual values.

Please suggest the simplest way to get both a) GPU power readings and b) GPU offloading working concurrently.

Thanks and Regards,

Rajarshi Das

 

0 Likes
1 Reply
drajarshi
Adept I

What I have been able to achieve so far:

a) After opening the device '/dev/dri/card0', I call the DRM_IOCTL_AMDGPU_INFO ioctl to read the AMDGPU_INFO_SENSOR_GPU_AVG_POWER sensor.

With this, I am able to see the power readings.

This works with the RHEL 7.8 amdgpu driver v20.20.
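
For reference, a minimal sketch of the read described in a) might look like the following. It assumes the kernel uapi header is available as <drm/amdgpu_drm.h> (some installs ship it as <libdrm/amdgpu_drm.h> instead); the helper names fill_power_request and read_avg_power are hypothetical, and the assumption that the value comes back in watts should be checked against the driver in use.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/amdgpu_drm.h>   /* assumed header location, see note above */

/* Fill an AMDGPU_INFO request that asks for the average GPU power sensor.
 * The kernel writes the result into the buffer pointed to by `out`. */
static void fill_power_request(struct drm_amdgpu_info *req, uint32_t *out)
{
    memset(req, 0, sizeof(*req));
    req->return_pointer = (uint64_t)(uintptr_t)out;
    req->return_size = sizeof(*out);
    req->query = AMDGPU_INFO_SENSOR;
    req->sensor_info.type = AMDGPU_INFO_SENSOR_GPU_AVG_POWER;
}

/* Read the sensor once from an already opened DRM device.
 * Returns 0 on success, or a negative errno value (for example -EINVAL). */
static int read_avg_power(int fd, uint32_t *power)
{
    struct drm_amdgpu_info req;

    *power = 0;   /* a zero after a successful call is then a genuine zero reading */
    fill_power_request(&req, power);
    if (ioctl(fd, DRM_IOCTL_AMDGPU_INFO, &req) < 0)
        return -errno;
    return 0;
}
```

A caller would open the device with open("/dev/dri/card0", O_RDWR), pass the descriptor to read_avg_power, and close it afterwards. Returning the negative errno keeps the "call succeeded but value is zero" case separate from the "call failed" case.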

I run this ioctl call repeatedly in an infinite (interruptible) loop, capturing the time before each call.
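
The loop has roughly the shape sketched below. This is an illustration under assumptions, not the exact program: sensor_fn, sample_loop, fake_sensor and on_sigint are hypothetical names, the sensor read is passed in as a callback so the loop can be dry-run without a GPU, and CLOCK_MONOTONIC is one possible choice of clock.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Callback that performs one sensor read: returns 0 or a negative errno. */
typedef int (*sensor_fn)(void *ctx, uint32_t *value);

/* Set by the signal handler so Ctrl-C ends the loop cleanly. */
static volatile sig_atomic_t stop_requested = 0;

static void on_sigint(int sig)
{
    (void)sig;
    stop_requested = 1;
}

/* Stand-in sensor for dry runs: counts calls and reports a fixed value. */
static int fake_sensor(void *ctx, uint32_t *value)
{
    int *calls = ctx;
    (*calls)++;
    *value = 42;
    return 0;
}

/* Sample until interrupted, or until max_samples reads when max_samples > 0.
 * The timestamp is taken before each read. Returns the number of reads. */
static long sample_loop(sensor_fn read_fn, void *ctx, long max_samples,
                        long interval_ms, FILE *out)
{
    struct timespec ts;
    struct timespec pause = { interval_ms / 1000,
                              (interval_ms % 1000) * 1000000L };
    long taken = 0;

    while (!stop_requested && (max_samples <= 0 || taken < max_samples)) {
        uint32_t value = 0;
        clock_gettime(CLOCK_MONOTONIC, &ts);   /* time before the call */
        int rc = read_fn(ctx, &value);
        fprintf(out, "%lld.%09ld rc=%d value=%u\n",
                (long long)ts.tv_sec, ts.tv_nsec, rc, (unsigned)value);
        taken++;
        nanosleep(&pause, NULL);
    }
    return taken;
}
```

In a real run the program would call signal(SIGINT, on_sigint) once, then pass a callback that wraps the DRM_IOCTL_AMDGPU_INFO call, with max_samples set to 0 for an open-ended capture. Logging the return code next to each timestamp makes it possible to line up the first failing read with the end of the offload run.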

b) I downloaded the aomp-11.12.0 tarball and built the veccopy example with the clang/lld inside it, setting the path instead of installing it.

I am able to run the veccopy sample program and capture the time before and after its invocation.

a) and b) run (almost) simultaneously from two consoles.

I see a few power readings before the ioctls start to fail with EINVAL. This happens around the time veccopy finishes.
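
To pin down when the readings change, one option is to log only the transitions between good readings, zero readings, and failed calls. The sketch below is a hypothetical helper (sample_state, classify_sample and report_transition are illustrative names) and assumes each read is reported as 0 on success or a negative errno value on failure.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

enum sample_state { SAMPLE_OK, SAMPLE_ZERO, SAMPLE_FAILED };

/* Classify one read: rc is 0 on success or a negative errno value. */
static enum sample_state classify_sample(int rc, unsigned value)
{
    if (rc < 0)
        return SAMPLE_FAILED;
    return value == 0 ? SAMPLE_ZERO : SAMPLE_OK;
}

/* Print a line only when the state differs from the previous sample,
 * so the log shows exactly when readings went to zero or began failing.
 * Returns the new state, to be passed back in on the next call. */
static enum sample_state report_transition(enum sample_state prev, int rc,
                                           unsigned value, double t_sec,
                                           FILE *out)
{
    enum sample_state now = classify_sample(rc, value);

    if (now != prev) {
        if (now == SAMPLE_FAILED)
            fprintf(out, "%.6f: ioctl started failing: %s\n",
                    t_sec, strerror(-rc));
        else if (now == SAMPLE_ZERO)
            fprintf(out, "%.6f: ioctl succeeds but reads zero\n", t_sec);
        else
            fprintf(out, "%.6f: readings valid again (%u)\n", t_sec, value);
    }
    return now;
}
```

Comparing the timestamp of the first "started failing" line with the end time recorded for veccopy would show whether the failure begins during the offload or only once it exits.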

At this point, I need to reboot the instance to get back to a good state and repeat the above.

I have still not been able to figure out why the ioctls stop working after the veccopy run completes.

These are the messages in dmesg output when I issue the ioctl call (and it fails):

[ 1685.007425] Msg issuing pre-check failed and SMU may be not in the right state!
[ 1685.013868] Failed to export SMU metrics table!
[ 1685.017480] Msg issuing pre-check failed and SMU may be not in the right state!
[ 1685.023633] Failed to export SMU metrics table!

Any pointers on what might be going wrong are highly appreciated.

Thanks.
