This is an offshoot of the more general thread
I'm making this a separate question because it specifically involves Instruction Based Sampling.
I've been writing a kernel driver to run the IBS Ops facility. It runs on only one core at a time, with interrupts masked. It uses a timer to periodically check the Control MSR (c001_1033) to see whether a sample has been taken; if so, it clears the sample bit and the enable bit.
A little later the driver reads the sample data from the other IBS Ops MSRs, and then sets the enable bit in the Control MSR again.
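For anyone who wants to see the sequence concretely, here is a rough sketch of the poll / stop / re-enable cycle described above. It is not my actual driver code. The MSR is simulated with a plain variable so the logic can be compiled and tried in user mode; `msr_read`, `msr_write`, `fake_ibs_op_ctl`, `ibs_op_poll_and_stop` and `ibs_op_rearm` are hypothetical names made up for this illustration. The bit positions (enable = bit 17, sample valid = bit 18, period field = bits 15:0) are my reading of AMD's documentation and should be checked against the manual for your CPU family. The period field is also an assumption on my part, since the description above only mentions the enable bit.

```c
#include <assert.h>
#include <stdint.h>

/* IBS Op Control MSR (the c001_1033 register mentioned above). */
#define MSR_IBS_OP_CTL  0xC0011033u

/* Assumed bit layout; verify against AMD's manual for your CPU family. */
#define IBS_OP_MAX_CNT  0x000000000000FFFFull  /* bits 15:0, sampling period field */
#define IBS_OP_EN       (1ull << 17)           /* enable bit */
#define IBS_OP_VAL      (1ull << 18)           /* sample (valid) bit */

/* Hypothetical stand-in for the hardware register. A real Windows driver
   would access the MSR with the __readmsr/__writemsr intrinsics instead. */
static uint64_t fake_ibs_op_ctl;

uint64_t msr_read(uint32_t msr)              { (void)msr; return fake_ibs_op_ctl; }
void     msr_write(uint32_t msr, uint64_t v) { (void)msr; fake_ibs_op_ctl = v; }

/* Timer step: if a sample is pending, clear the sample bit and the enable
   bit. Returns 1 if a sample was pending, 0 otherwise. */
int ibs_op_poll_and_stop(void)
{
    uint64_t ctl = msr_read(MSR_IBS_OP_CTL);
    if (!(ctl & IBS_OP_VAL))
        return 0;
    msr_write(MSR_IBS_OP_CTL, ctl & ~(IBS_OP_VAL | IBS_OP_EN));
    return 1;
}

/* Later step, after the sample data MSRs have been read: program the
   period field and set the enable bit again. */
void ibs_op_rearm(uint64_t max_cnt)
{
    uint64_t ctl;

    assert((max_cnt & ~IBS_OP_MAX_CNT) == 0);  /* must fit in bits 15:0 */
    ctl = msr_read(MSR_IBS_OP_CTL);
    ctl &= ~(IBS_OP_MAX_CNT | IBS_OP_VAL);
    ctl |= max_cnt | IBS_OP_EN;
    msr_write(MSR_IBS_OP_CTL, ctl);
}
```

In this sketch the timer routine would call `ibs_op_poll_and_stop()`, and whenever it returns 1 the driver would read the sample MSRs and then call `ibs_op_rearm()`. That last write is the only step that turns counting back on.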
After doing this for a while, the computer freezes, just as described in the above thread. The Event Viewer shows the same Kernel-Power error (event 41, task 63) that other people have encountered.
The more often I enable the hardware, the sooner the freeze occurs.
Now I have a switch in the driver that omits the write to the MSR that enables counting. When I set this switch, the driver runs happily indefinitely.
I take this as evidence that when IBS is counting, it could be overheating the CPU, or putting it into some other condition that draws a lot of power and eventually triggers the shutdown.
Interestingly, most of the earlier reports of freezes involve ASUS PRIME motherboards, but I also saw one for Gigabyte. There was some talk about a BIOS bug regarding power state transitions. Could it be that different motherboards have the same BIOS bug? I suppose that could happen if AMD supplied the code.
Anyway, I upgraded my BIOS to the January 2018 version from ASUS.
There's also some talk about power plans. I downloaded the Ryzen Balanced plan and selected it. I also selected High Performance and Power Saver plans. I got the freeze with all of them.
Sorry, I posted the above before I was done.
I haven't yet tried slowing down my memory. I doubt that this is related, but I'll try it since other people did.
I've run Ryzen Master while doing the testing. I see the CPU running the driver (and the target program) going up to a higher speed. However, I don't see any increase in temperature at all. It maintains a steady 27 C. If there's a temperature problem, I suppose it would show up in the histogram.
My questions are:
1. What's the problem here? Is the CPU doing an emergency shutdown? Is it overheating? Why would IBS make so big a difference? Is there a power plan that would prevent the problem?
2. Would someone with the right diagnostic hardware like to run my driver program? I can provide a user side application and a kernel driver.
3. Would anyone with this problem with an ASUS motherboard, who solved the problem, please report what you changed.
4. I hope to get an AMD hardware and/or BIOS and/or HAL engineer involved in this discussion.