
Relying on _OSC to be accurate about CPPC v2 support breaks scheduling on heterogenous-core Intel systems with buggy firmware

Message ID d01b0a1f-bd33-47fe-ab41-43843d8a374f@kfocus.org (mailing list archive)
State Superseded, archived

Commit Message

Aaron Rainbolt June 16, 2024, 9:15 p.m. UTC
My name is Aaron Rainbolt, and I am working as a developer with Kubuntu 
Focus.

In commit 7feec7430eddd, the `acpi_cppc_processor_probe()` function was
modified to check the CPPC v2 bit in _OSC to determine whether CPPC v2
support is present on the system. If this bit is not set,
`cpc_supported_by_cpu()` (defined in arch/x86/kernel/acpi/cppc.c) checks
the processor against a particular set of CPUs to see whether it
supports CPPC v2 even though the BIOS does not report it. If this
function returns false, CPPC v2 is considered absent.
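
The decision described above can be modelled in a few lines of userspace
C. This is only a sketch for illustration, not the kernel code itself:
`cppc_probe_gate()` and its two parameters are hypothetical names, where
the first parameter stands in for the kernel's
`osc_sb_cppc2_support_acked` flag and the second for the return value of
`cpc_supported_by_cpu()`.

```c
#include <stdbool.h>
#include <errno.h>

/*
 * Userspace model of the gate in acpi_cppc_processor_probe().
 * Hypothetical names: osc_cppc2_acked mirrors the kernel flag
 * osc_sb_cppc2_support_acked, cpu_quirk_match mirrors the result of
 * cpc_supported_by_cpu().
 * Returns 0 if probing may continue to _CPC parsing, -ENODEV otherwise.
 */
static int cppc_probe_gate(bool osc_cppc2_acked, bool cpu_quirk_match)
{
	/* Firmware acknowledged CPPC v2 in _OSC: carry on. */
	if (osc_cppc2_acked)
		return 0;

	/* Not acknowledged: only CPUs on the quirk list may continue. */
	if (!cpu_quirk_match)
		return -ENODEV;

	return 0;
}
```

On the affected machines the inputs are effectively `(false, false)`, so
the model returns `-ENODEV` and CPPC v2 is treated as absent even though
the hardware supports it.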

While this works well on systems where the firmware accurately reports
CPPC v2 support in _OSC, it causes a severe performance regression
when using the new EEVDF scheduler on some machines. So far we've noted
this issue on certain machines with i5-13500H processors, and have seen
some reports of the same issue on other hardware. All machines
encountering this issue had two things in common:

* They use heterogenous-core Intel processors
* They have buggy or misconfigured firmware. In the clearest cases, this 
firmware fails to report CPPC v2 support in _OSC even though CPPC v2 works.

When both of these are true, the EEVDF scheduler will often
schedule processes on efficiency cores rather than performance cores,
resulting in badly impaired single-core performance (my workplace was
seeing 50% lower Geekbench 5 scores on some systems because of this
bug). Some examples of the bug online can be seen here:

* Kernel.org Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=218195
* Same issue, same author, Star Labs Firmware bug tracker: 
https://github.com/StarLabsLtd/firmware/issues/143
* Similar but less-clear issue on the Manjaro forums: 
https://forum.manjaro.org/t/linux-kernel-6-6-lts-cpu-regression-on-i7-alderlake/157474
* Similar but less-clear issue on the Gentoo forums: 
https://forums.gentoo.org/viewtopic-p-8819389.html?sid=5997f89fd5a202b6db8396fba0b45821 
(resolved by enabling Intel SpeedStep - I suspect the poster meant Intel 
SpeedShift here, though I can't be certain)

To test whether _OSC misreporting CPPC v2 support was the issue, I
recompiled the latest kernel for Ubuntu 24.04 with the following test
patch (included in the Patch section at the end of this message):


This essentially ignores the result of the _OSC bit check and continues
on to parsing the ACPI table regardless. This immediately resolves the
problem in our testing - CPPC v2 appears enabled when inspected under
/sys and /proc, and single-core performance improves dramatically.
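
One way to do the /sys part of that check from a program is sketched
below. The message does not say which files were inspected, so the path
used in the usage note is an assumption: on kernels where the CPPC driver
has probed a CPU, per-CPU values are normally exposed under
/sys/devices/system/cpu/cpuN/acpi_cppc/. `read_sysfs_u64()` is a
hypothetical helper name.

```c
#include <stdio.h>

/*
 * Read one unsigned integer from a sysfs-style text file.
 * Returns 0 and stores the value in *out on success, or -1 if the file
 * cannot be opened or does not start with a number.
 */
static int read_sysfs_u64(const char *path, unsigned long long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;

	ok = (fscanf(f, "%llu", out) == 1);
	fclose(f);

	return ok ? 0 : -1;
}
```

Calling it with "/sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf"
(assumed path) should fail on a kernel that rejected CPPC for that CPU
and succeed once the probe is allowed to continue.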

Looking through the mailing list archives, it does not appear that simply
ignoring this bit is safe in the long run - apparently it can mess
something up with USB4? (See
https://marc.info/?l=linux-acpi&m=165704566017713&w=2 - I've CC'd Mario
Limonciello on this.)

Some ideas I have for potential long-term fixes:

* Perhaps add a kernel parameter such as "force_cppc_v2" that will allow 
the user to choose whether to ignore this check or not? This isn't 
ideal, but it would work, I think.
* The `cpc_supported_by_cpu()` function appears to be used to work 
around this very bug for select AMD and Hygon CPUs. Would it be possible 
to add heterogenous-core Intel CPUs to this function so that the _OSC 
CPPC v2 bit is overridden for all such processors?
* (Long shot) Make the new scheduler not need CPPC v2?

While not ideal, I think the kernel parameter solution is the safest, 
and it is also sufficient for Kubuntu Focus's purposes. I'll work on a 
patch that uses that strategy if no one objects or has better suggestions.
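
As a rough userspace model of how such an opt-in override could behave
(a sketch under stated assumptions, not a proposed patch): the
"force_cppc_v2" token comes from the idea above, while
`cmdline_has_flag()` and `cppc_gate_with_override()` are hypothetical
names. A real kernel patch would use the kernel's own parameter handling
instead of scanning a string.

```c
#include <stdbool.h>
#include <errno.h>
#include <string.h>

/*
 * Return true if `flag` appears as a whole space-separated token in
 * `cmdline`. Hypothetical helper, userspace model only.
 */
static bool cmdline_has_flag(const char *cmdline, const char *flag)
{
	size_t len = strlen(flag);
	const char *p = cmdline;

	if (len == 0)
		return false;

	while ((p = strstr(p, flag)) != NULL) {
		bool starts = (p == cmdline) || (p[-1] == ' ');
		bool ends = (p[len] == '\0') || (p[len] == ' ') ||
			    (p[len] == '\n');

		if (starts && ends)
			return true;
		p += len;
	}
	return false;
}

/*
 * Model of the probe gate with an explicit user override.
 * Returns 0 if _CPC parsing may proceed, -ENODEV otherwise.
 */
static int cppc_gate_with_override(bool osc_cppc2_acked,
				   bool cpu_quirk_match, bool force)
{
	if (osc_cppc2_acked || cpu_quirk_match || force)
		return 0;

	return -ENODEV;
}
```

Because the default stays unchanged unless the user passes the flag,
systems with accurate firmware behave exactly as before.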

Thanks for your help!

Patch

--- cppc_acpi_old.c     2024-06-16 15:27:44.214202299 -0500
+++ cppc_acpi.c 2024-06-16 00:29:51.684020493 -0500
@@ -679,8 +679,13 @@ 

        if (!osc_sb_cppc2_support_acked) {
                pr_debug("CPPC v2 _OSC not acked\n");
+               /* KFOCUS TEST PATCH
+                * Some machines have a BIOS bug that causes
+                * this code path to be mistakenly hit. Ignore
+                * it and continue regardless.
                if (!cpc_supported_by_cpu())
                        return -ENODEV;
+               */
        }

        /* Parse the ACPI _CPC table for this CPU. */