I've continued to monitor and faff with this and am still undecided about it.
At the weekend I had some regular crashes in Dirt Rally 2, decided it could only be the CPU, and contacted OC to start the RMA process. I got a prompt reply back from OC asking for a bit more info, and this made me want to do just a tiny bit more testing, with some interesting (to me at least) results.
At stock settings - XMP off, most stuff on 'auto', processor virtualisation on (I need this CPU for VMs) - it crashes with WHEA bugcheck error 0x00000124.
If I enable PBO and set the PBO limits to Motherboard, with no other changes, it *doesn't* crash.
If I enable XMP with PBO limits set to Motherboard, it crashes with a WHEA Bus/Interconnect Error (on various APIC IDs).
If I leave XMP enabled and set the IF frequency down to 3200 (from 3600, which is what it's set to when I enable XMP), it *doesn't* crash.
Note that in each case the crashes only ever happen when gaming. Crashes happen within about 15 mins; the configurations that don't crash are stable for at least a couple of hours.
So it appears there are 2 triggers for these crashes: the CPU isn't stable with IF set to 3600, and the CPU isn't stable with PBO limits set to auto. With XMP on, IF at 3200 and PBO limits set to Motherboard (which means a crazy PPT limit like 500 W, EDC at like 200 A or something), the thing appears to be stable in gaming.
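To sanity-check that reading, here is a rough sketch that writes the four runs down as data and tests the two-trigger theory against them. The names (`Run`, `predicted_crash`, `mismatches`) are made up for illustration, the IF value for the XMP-off runs is left as `None` because I didn't note it, and the IF numbers are just what the BIOS shows. It only checks that the theory is consistent with what I've seen; it doesn't prove anything about the hardware.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Run:
    """One test configuration and whether it crashed while gaming."""
    xmp: bool
    pbo_limits: str            # "auto" or "motherboard"
    if_setting: Optional[int]  # IF value as shown in the BIOS; None = not recorded
    crashed: bool


# The four runs described above, in order.
RUNS = [
    Run(xmp=False, pbo_limits="auto", if_setting=None, crashed=True),
    Run(xmp=False, pbo_limits="motherboard", if_setting=None, crashed=False),
    Run(xmp=True, pbo_limits="motherboard", if_setting=3600, crashed=True),
    Run(xmp=True, pbo_limits="motherboard", if_setting=3200, crashed=False),
]


def predicted_crash(run: Run) -> bool:
    """Two-trigger theory: PBO limits on auto OR IF at 3600 causes a crash."""
    return run.pbo_limits == "auto" or run.if_setting == 3600


def mismatches(runs: List[Run]) -> List[Run]:
    """Return the runs where the theory disagrees with what was observed."""
    return [r for r in runs if predicted_crash(r) != r.crashed]


if __name__ == "__main__":
    bad = mismatches(RUNS)
    if bad:
        print(f"theory fails for: {bad}")
    else:
        print("theory fits all recorded runs")
```

Worth noting that the combination of XMP on, IF at 3200 and PBO limits on auto isn't in the list, so the two triggers haven't been fully separated from each other yet.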
To answer some of the questions here, the GPU is hooked up to 2 separate PCIe power connectors (as it was before), no daisy-chaining. It's not overclocked. The board is an MSI X570-A Pro. I know this has relatively weak power delivery gubbins but I would still have expected it to be able to run a supported CPU at stock settings.
I don't have access to another board to test. I've not tried a positive voltage offset.