Hello,
I am trying to write a C++ application that benchmarks a set of functions and collects power/energy information during their execution using the AMDPowerProfile API (part of uProf).
In an attempt to avoid polluting the benchmark results with the PowerProfile driver constantly executing, I set the driver sample period to a large value and only process the captured PowerProfile samples once the function being benchmarked exits. My first call to AMDTPwrReadAllEnabledCounters works as expected - returning 12 samples with RecordIDs 1 through 12 and with the expected elapsed time difference.
However, my second call to AMDTPwrReadAllEnabledCounters returns 12 samples but with RecordIDs 14 through 25. RecordID 13 appears to be missing. The elapsed time difference between the samples with RecordIDs 12 and 14 is also double what I would expect. This trend continues with each subsequent call to AMDTPwrReadAllEnabledCounters skipping one RecordID.
Does anyone know of a workaround for this? I've already tried indexing one entry past the returned count, in case the count had an off-by-one error.
I've put together a modified version of the CollectAllCounters example which demonstrates this behavior and have included the output I see when running it. The development platform is a Ryzen 2990WX on Ubuntu 18.04 LTS with AMD uProf 2.0.493. I recently tried upgrading to uProf 3.1.35 but the behavior persists.
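For readers who want to check for the same symptom, here is a minimal sketch of a RecordID gap check. It does not use the AMDPowerProfile headers: `SampleInfo`, `Gap` and `findGaps` are hypothetical names made up for illustration, and the two fields stand in for whatever record ID and elapsed time the real sample structure exposes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the fields of interest in a returned sample.
// This is NOT the actual sample structure from the AMDPowerProfile API.
struct SampleInfo {
    std::uint64_t recordId;
    std::uint64_t elapsedMs;
};

// One run of consecutive missing record IDs.
struct Gap {
    std::uint64_t firstMissingId;
    std::uint64_t missingCount;
};

// Scans samples in arrival order and reports every jump in record ID
// larger than 1. lastSeenId is carried across calls so that a gap
// between two separate reads is caught as well.
std::vector<Gap> findGaps(const std::vector<SampleInfo>& samples,
                          std::uint64_t& lastSeenId) {
    std::vector<Gap> gaps;
    for (const SampleInfo& s : samples) {
        // Assumption: record IDs are strictly increasing.
        assert(s.recordId > lastSeenId);
        if (s.recordId > lastSeenId + 1) {
            gaps.push_back({lastSeenId + 1, s.recordId - lastSeenId - 1});
        }
        lastSeenId = s.recordId;
    }
    return gaps;
}
```

Start with `lastSeenId` set to 0 and pass the same variable to every call. Feeding it IDs 1 through 12 and then 14 through 25 reports a single gap starting at 13.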
Thanks!
Hi,
Thank you for reporting this issue. I assume it appears when you use the Power Profiler APIs. I have a few questions to help understand it:
1. Does this happen when you use AMDuProfCLI?
2. What sampling rate are you setting?
We will also try to reproduce the issue on our end and update you soon.
Thanks
Hi,
Thanks so much for your response. I have not tried it with the AMDuProfCLI (since I needed this to run as part of a separate application) but can give it a try. Is there a set of CLI options that you would suggest I use to reveal whether this behavior is present?
The driver sampling period is set to 100 ms. However, the thread generally runs for over 2 seconds before the samples are accessed, which leads to more than 12 samples being collected.
Thanks again!
Hi,
You can run any CLI option that collects a power or energy counter. A sampling period of 100 ms should be fine. Can you check the timestamps in the generated CSV/TXT file? Let me know in case samples are missing there as well.
When you call AMDTPwrStartProfiling, the driver starts collecting data. The timestamp of the first sample is the actual time at which the driver started collecting. AMDuProf has an internal buffer that stores a few samples; in your case, 12 samples are stored. We recommend keeping the interval between reads short to avoid sample loss.
Thanks
Hi,
We tried to reproduce the issue you mentioned. In your code, you retrieve the counter data (AMDTPwrReadAllEnabledCounters) every 10 sec (line 189: usleep(threadWakeInterval * 1000)). However, your sampling period is 100 ms.
Please choose a shorter sleep time to avoid missing samples. You can try usleep(1000000) // 1 sec
Let me know in case this doesn't solve your issue.
Thanks
Thanks,
My hope was to avoid running another thread to collect sample data while the benchmark is running. Most of the benchmarks last for over a second. I could increase the sampling period but this increases the blind spot at the end of my benchmark (between the last sampled point and the end of the benchmark's execution).
Are you aware of any method by which I can read the counter values at the end of the benchmark without periodically sampling (i.e., reset the counters at the start of the benchmark and read their values at the end)? For my application, knowing average/cumulative values of counters over the course of the benchmark is sufficient.
Thanks again!
P.S. I'll check the CLI once our system finishes some long running tasks.
Hello,
I was able to run the CLI command and the report does not appear to be missing samples.
Here is the command I ran: AMDuProfCLI timechart --event Energy -t 100 -d 10 -o timechart
I should mention that the example program I attached in the original question is a simplified example to show the behavior we were seeing. The actual code is part of a larger application which does not start a separate thread to periodically fetch samples (in an attempt to avoid introducing additional interrupts & disturbing the cache while a benchmark is running).
For our application we don't actually need to have a timechart of the counters - just knowing the average/cumulative values of counters over the course of the benchmark is sufficient. Is it possible to use the AMDPowerProfile API to reset the counters at the start of a benchmark and read them at the end? We tried changing the mode from AMDT_PWR_MODE_TIMELINE_ONLINE to AMDT_PWR_MODE_INSTANT_COUNTER a little while ago but it did not seem to work for this particular Ryzen CPU.
Thanks again!
Hi,
Thanks for reporting this issue. We are looking into it. We will get back to you with our analysis asap.
Regards,
Aalok Agarwal
AMDuProf Team
Hi,
Thanks for writing to us.
Since you are not much concerned about the sampling period, I suggest increasing it to 500 ms or higher. In that case, even if you call AMDTPwrReadAllEnabledCounters less frequently, there will be fewer sample records and a lower probability of losing samples.
To explain how it works in AMDuProf, consider the following example:
AMDuProfCLI timechart --event Energy -t 100 -d 10 -o timechart
1. The AMD driver collects the average power value (since the last read) at EXACTLY 100 ms intervals and stores it in a ring buffer.
2. APPROXIMATELY every 100 ms, the user-space AMDuProfCLI program reads these samples from the ring buffer and writes them to a txt/csv file. Since you used -d 10, it runs for 10 sec in this manner.
The ring buffer has a fixed size that depends on the platform and some other parameters, so the number of samples it can hold depends on that size. If AMDTPwrReadAllEnabledCounters is not called for a long time, the ring buffer fills up and starts overwriting.
In your case (example code), you selected 100 ms (the -t 100 option) as the sampling rate and are trying to consume data from AMDTPwrReadAllEnabledCounters every 10 sec. That means the ring buffer needs to hold at least 100 samples.
In your program, you have also selected all available counters, and in that case the ring buffer is not large enough to hold all the samples.
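The overwrite behavior described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `RecordRing` and `samplesPerRead` are hypothetical names, the capacity is arbitrary, and the real driver buffer's size, layout and choice of which record to drop are not documented in this thread and may differ (this version drops the oldest record).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy fixed-capacity ring buffer that overwrites the oldest entry when full.
// Illustrative only; it is not the driver's actual data structure.
class RecordRing {
public:
    explicit RecordRing(std::size_t capacity) : slots_(capacity) {
        assert(capacity > 0);
    }

    // Producer side: called once per sample period.
    void push(std::uint64_t recordId) {
        // When the buffer is full, this index equals head_, i.e. the oldest slot.
        slots_[(head_ + count_) % slots_.size()] = recordId;
        if (count_ < slots_.size()) {
            ++count_;
        } else {
            head_ = (head_ + 1) % slots_.size();  // the oldest record is lost
        }
    }

    // Consumer side: returns everything still buffered, oldest first,
    // and leaves the buffer empty.
    std::vector<std::uint64_t> drain() {
        std::vector<std::uint64_t> out;
        out.reserve(count_);
        for (std::size_t i = 0; i < count_; ++i) {
            out.push_back(slots_[(head_ + i) % slots_.size()]);
        }
        head_ = 0;
        count_ = 0;
        return out;
    }

private:
    std::vector<std::uint64_t> slots_;
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};

// Number of samples the producer generates between two reads.
constexpr std::uint64_t samplesPerRead(std::uint64_t readIntervalMs,
                                       std::uint64_t samplePeriodMs) {
    return readIntervalMs / samplePeriodMs;
}
```

With a capacity of 12, pushing 13 records before draining returns only 12 of them. `samplesPerRead(10000, 100)` gives the 100 samples mentioned above, which is why a 10 sec read interval needs a buffer of at least that size.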
The following can be done:
1. Increase the sample period from 100 ms to 500 ms or 1 sec
2. Collect data more frequently by calling AMDTPwrReadAllEnabledCounters
3. Reduce the number of counters being profiled
Regarding the second part of your question:
AMDuProf always takes the difference from the previous read. For example, if you collect counter data at 100 ms, 200 ms and 300 ms, the sample collected at 200 ms is the average over the interval from 100 ms to 200 ms. So you don't need to reset anything. Hope that is clear.
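Because each sample is an average over its own interval, a whole-benchmark figure can be rebuilt by weighting each sample by its interval length. The sketch below assumes the counter reports average power in watts and that no samples were lost. `IntervalSample`, `RunTotals` and `summarize` are hypothetical names, not part of the AMDPowerProfile API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-interval sample: average power over the interval
// that ends at this sample. Not an AMDPowerProfile type.
struct IntervalSample {
    double intervalMs;
    double avgPowerWatts;
};

struct RunTotals {
    double energyJoules;
    double avgPowerWatts;
    double durationMs;
};

// Time-weighted aggregation: energy is the sum of power * interval,
// and the overall average power is total energy / total duration.
RunTotals summarize(const std::vector<IntervalSample>& samples) {
    double energy = 0.0;
    double duration = 0.0;
    for (const IntervalSample& s : samples) {
        assert(s.intervalMs >= 0.0);
        energy += s.avgPowerWatts * (s.intervalMs / 1000.0);
        duration += s.intervalMs;
    }
    const double avg = duration > 0.0 ? energy / (duration / 1000.0) : 0.0;
    return {energy, avg, duration};
}
```

For a counter that already reports energy per interval, the values would be summed directly instead of being multiplied by the interval length.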
Thanks
Thanks for the explanation, rajeebbarman! The fact that the driver uses a fixed size ring buffer explains the behavior I was seeing. I'll make the suggested modifications to my benchmark program.