What I have been able to achieve so far:
a) After opening the device '/dev/dri/card0', I call the DRM_IOCTL_AMDGPU_INFO ioctl to read the AMDGPU_INFO_SENSOR_GPU_AVG_POWER sensor.
With this, I am able to see the power readings.
This works with the RHEL 7.8 amdgpu driver v20.20.
I run this ioctl call repeatedly in an infinite (interruptible) loop, capturing the time before each call.
b) I downloaded the aomp-11.12.0 tarball and built the veccopy example with the clang/lld inside it, by setting the path to them rather than installing it.
I am able to run the veccopy sample program and capture the time before and after its invocation.
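The loop in a) can be sketched in C roughly as follows. This is a minimal sketch under a few assumptions: it uses the kernel UAPI header `<drm/amdgpu_drm.h>` (libdrm ships a copy as well), and `read_avg_power_watts` and `poll_avg_power` are hypothetical names. The request mirrors what libdrm's `amdgpu_query_sensor_info()` sends: `query = AMDGPU_INFO_SENSOR` with `sensor_info.type = AMDGPU_INFO_SENSOR_GPU_AVG_POWER` and a 32-bit result, which as far as I can tell the kernel reports in whole watts.

```c
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() and nanosleep() */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <sys/ioctl.h>
#include <drm/amdgpu_drm.h>       /* assumption: kernel UAPI header is installed */

/* Query the average GPU power through DRM_IOCTL_AMDGPU_INFO.
 * Returns 0 on success, or the negated errno (e.g. -EINVAL) on failure. */
int read_avg_power_watts(int fd, uint32_t *watts)
{
    assert(watts != NULL);

    struct drm_amdgpu_info req;
    memset(&req, 0, sizeof req);
    req.return_pointer = (uint64_t)(uintptr_t)watts;  /* kernel writes the result here */
    req.return_size = sizeof *watts;
    req.query = AMDGPU_INFO_SENSOR;
    req.sensor_info.type = AMDGPU_INFO_SENSOR_GPU_AVG_POWER;

    if (ioctl(fd, DRM_IOCTL_AMDGPU_INFO, &req) != 0)
        return -errno;
    return 0;
}

/* Sample the sensor `samples` times (0 = until the first failure), taking a
 * CLOCK_REALTIME timestamp just before each ioctl. Stops at the first error
 * and returns it (negated errno), so the failure time is printed too. */
int poll_avg_power(int fd, unsigned samples, long interval_ms)
{
    const struct timespec pause = {
        .tv_sec = interval_ms / 1000,
        .tv_nsec = (interval_ms % 1000) * 1000000L,
    };

    for (unsigned i = 0; samples == 0 || i < samples; i++) {
        struct timespec now;
        uint32_t watts = 0;

        clock_gettime(CLOCK_REALTIME, &now);
        int rc = read_avg_power_watts(fd, &watts);
        if (rc != 0) {
            fprintf(stderr, "%lld.%09ld ioctl failed: %s\n",
                    (long long)now.tv_sec, now.tv_nsec, strerror(-rc));
            return rc;
        }
        printf("%lld.%09ld %u W\n",
               (long long)now.tv_sec, now.tv_nsec, (unsigned)watts);
        nanosleep(&pause, NULL);
    }
    return 0;
}
```

From `main()`, one would open the device with `open("/dev/dri/card0", O_RDWR)` and call e.g. `poll_avg_power(fd, 0, 100)` to sample every 100 ms until the first error. Stopping at the first failure (rather than retrying) keeps the exact time the readings stopped. As far as I can tell from the upstream `amdgpu_kms.c`, this query returns EINVAL whenever the underlying power sensor read fails, which would fit the SMU errors shown in dmesg.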
a) and b) run (almost) simultaneously from 2 consoles.
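To line the veccopy run up against the power samples from the other console, it helps to timestamp it with the same clock. A minimal sketch, assuming veccopy is launched through a small wrapper; `run_timed` and `stamp` are hypothetical helpers. CLOCK_REALTIME is used so the stamps are comparable with other CLOCK_REALTIME timestamps, such as those printed by `date +%s.%N`.

```c
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Print "<seconds>.<nanoseconds> <label>" using CLOCK_REALTIME. */
static void stamp(const char *label)
{
    struct timespec now;
    clock_gettime(CLOCK_REALTIME, &now);
    printf("%lld.%09ld %s\n", (long long)now.tv_sec, now.tv_nsec, label);
    fflush(stdout);               /* flush before fork() so output is not duplicated */
}

/* Run argv[0] (searched in PATH) and timestamp its start and end.
 * Returns the child's exit status, 127 if exec failed, or -1 on other errors. */
int run_timed(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    stamp("start");
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);               /* only reached if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    stamp("end");
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `char *cmd[] = {"./veccopy", NULL}; run_timed(cmd);` where `./veccopy` is an assumed path to the built sample.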
I am able to see a few power readings before the ioctls start to fail with EINVAL. This happens around the time that the veccopy finishes.
At that point, I have to reboot the instance to get back to a good state before I can repeat the above.
I have still not been able to figure out why the ioctls stop working after the veccopy run completes.
These are the messages in the dmesg output when I issue the ioctl call and it fails:
[ 1685.007425] Msg issuing pre-check failed and SMU may be not in the right state!
[ 1685.013868] Failed to export SMU metrics table!
[ 1685.017480] Msg issuing pre-check failed and SMU may be not in the right state!
[ 1685.023633] Failed to export SMU metrics table!
Any pointers on what might be going wrong would be highly appreciated.
Thanks.