[v8,00/15] Intel Hardware P-States (HWP) support

Message ID 20230807185119.98333-1-jandryuk@gmail.com (mailing list archive)

Jason Andryuk Aug. 7, 2023, 6:51 p.m. UTC
Hi,

This patch series adds Hardware-Controlled Performance States (HWP) for
Intel processors to Xen.

v2 was only partially reviewed, so v3 is mostly a reposting of v2.  In v2 &
v3, I think I addressed all comments for v1.  I kept patch 11 "xenpm:
Factor out a non-fatal cpuid_parse variant", with a v2 comment
explaining why I keep it.

v3 adds "xen/x86: Tweak PDC bits when using HWP".  Qubes testing revealed
an issue where enabling HWP can crash firmware code (maybe SMM).  This
requires a Linux change to get the PDC bits from Xen and pass them to
ACPI.  Roger has a patch [0] to set the PDC bits.  Roger's 3-patch
series was tested with "xen/x86: Tweak PDC bits when using HWP" on
affected hardware and allowed proper operation.

v4:
There is a large amount of renaming from HWP/hwp to CPPC/cppc in the series.
The driver remains hwp_ prefixed since it is dealing with the hardware
interface.  The sysctl, xc and xenpm interfaces were renamed to cppc to
be the generic ACPI CPPC (Collaborative Processor Performance Control)
interface.

struct xen_get_cpufreq_para was re-organized in a binary compatible
fashion to nest scaling governor options.  This allows the cppc support
to use uint32_t's for its parameters.
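
As a rough illustration of what "binary compatible" nesting means here,
the sketch below compares a flat layout with a nested one.  The struct
and field names (flat_para, nested_para, cppc_sketch) and NAME_LEN are
simplified stand-ins chosen for this example, not the actual definitions
in xen/include/public/sysctl.h.  The point is only that wrapping the
governor fields in one arm of a union leaves existing offsets and the
total size unchanged, while a second arm can reuse the same bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NAME_LEN 16 /* assumed length, for illustration only */

/* Hypothetical "before" layout: governor fields sit directly in the struct. */
struct flat_para {
    uint32_t cpuinfo_cur_freq;
    char     scaling_driver[NAME_LEN];
    uint32_t scaling_cur_freq;
    char     scaling_governor[NAME_LEN];
    uint32_t scaling_max_freq;
    uint32_t scaling_min_freq;
    int32_t  turbo_enabled;
};

/* Hypothetical CPPC-style parameters, all uint32_t. */
struct cppc_sketch {
    uint32_t lowest;
    uint32_t highest;
    uint32_t minimum;
    uint32_t maximum;
    uint32_t desired;
    uint32_t energy_perf;
};

/* Hypothetical "after" layout: governor fields nested in a union arm. */
struct nested_para {
    uint32_t cpuinfo_cur_freq;
    char     scaling_driver[NAME_LEN];
    union {
        struct {
            uint32_t scaling_cur_freq;
            char     scaling_governor[NAME_LEN];
            uint32_t scaling_max_freq;
            uint32_t scaling_min_freq;
        } s;
        struct cppc_sketch cppc; /* must not be larger than 's' */
    } u;
    int32_t  turbo_enabled;
};

/* Compile-time checks that old consumers still see the same layout. */
_Static_assert(sizeof(struct flat_para) == sizeof(struct nested_para),
               "total size changed");
_Static_assert(offsetof(struct flat_para, scaling_governor) ==
               offsetof(struct nested_para, u.s.scaling_governor),
               "governor offset changed");

/* Run-time version of the same comparison; returns 1 if layouts agree. */
int layouts_match(void)
{
    return offsetof(struct flat_para, scaling_cur_freq) ==
               offsetof(struct nested_para, u.s.scaling_cur_freq) &&
           offsetof(struct flat_para, scaling_min_freq) ==
               offsetof(struct nested_para, u.s.scaling_min_freq) &&
           offsetof(struct flat_para, turbo_enabled) ==
               offsetof(struct nested_para, turbo_enabled);
}
```

The scheme only holds while the new union arm is no larger than the
fields it overlays, which is why the static assertions are worth keeping
next to any such definition.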

HWP is now enabled with a top-level cpufreq=hwp option.  It will
fall back to cpufreq=xen if hwp is unavailable.  This seems like the most
user-friendly option.  Since the user was trying to specify *some*
cpufreq, we should give them the best that we can instead of disabling
the functionality.

"xenpm: Factor out a non-fatal cpuid_parse variant" was dropped.
set-cpufreq-cppc expects either a cpu number or none specified, which
implies all.

Some patches were re-arranged - "xen/x86: Tweak PDC bits when using HWP"
now comes immediately after "cpufreq: Add Hardware P-State (HWP) driver".

The implementation of "cpufreq: Allow restricting to internal governors
only" changed, so I removed Jan's Ack.

v5:
HWP is enabled with a top-level cpufreq=hwp option.  There is no fallback
by default, but a cpufreq=hwp;xen syntax is now supported.  That tries
hwp first.  If HWP registration is unsuccessful, then xen registration
is performed as a fallback.
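
The ordered fallback described above could be sketched roughly as
follows.  This is not the code from the series: try_drivers(),
register_fn and no_hwp_register() are hypothetical names, and the sketch
treats each ';'-separated token as a bare driver name, ignoring any
':'-style sub-options.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical registration hook: returns 0 on success, nonzero on failure. */
typedef int (*register_fn)(const char *driver);

/*
 * Try each ';'-separated driver name in order and stop at the first one
 * that registers.  Returns the index of the driver that succeeded, or -1
 * if none did.
 */
int try_drivers(const char *opt, register_fn reg)
{
    char name[16];
    int idx = 0;

    while (*opt) {
        size_t len = strcspn(opt, ";");

        /* Skip empty or over-long tokens rather than truncating them. */
        if (len > 0 && len < sizeof(name)) {
            memcpy(name, opt, len);
            name[len] = '\0';
            if (reg(name) == 0)
                return idx;
        }
        idx++;
        opt += len;
        if (*opt == ';')
            opt++;
    }
    return -1;
}

/* Example hook simulating a machine without HWP: only "xen" registers. */
int no_hwp_register(const char *driver)
{
    return strcmp(driver, "xen") == 0 ? 0 : -1;
}
```

With the simulated hook, try_drivers("hwp;xen", no_hwp_register) falls
through to the second entry, while a plain "hwp" reports failure, which
mirrors the "no fallback by default" behaviour.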

More changes from Jan's feedback.  They are typically minor and
documented in individual patches.

v6:
After some discussion, the cpufreq=hwp;xen syntax is retained, with the
global applicability of "verbose" mentioned in the docs.  Minor tweaks
elsewhere as documented in individual patches.

v7:
Minor tweaks from review feedback - documented in respective patches.
Jan previously asked for a toolstack side review / ack of
"[PATCH 04/15] xen/sysctl: Nest cpufreq scaling options"

v8:
Minor tweaks from review feedback - documented in respective patches.

Previous cover letter:

With HWP, the processor makes its own determinations for frequency
selection, though users can set some parameters and preferences.  There
is also Turbo Boost which dynamically pushes the max frequency if
possible.

The existing governors don't work with HWP since they select frequencies
and HWP doesn't expose those.  Therefore a dummy hwp-internal governor is
used that doesn't do anything.

xenpm get-cpufreq-para is extended to show HWP parameters, and
set-cpufreq-cppc is added to set them.
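
For readers unfamiliar with the parameters involved, here is a small
sketch of how they pack into a single 64-bit request value.  The field
positions follow my reading of the IA32_HWP_REQUEST layout in the Intel
SDM (minimum 7:0, maximum 15:8, desired 23:16, energy/performance
preference 31:24, activity window 41:32).  The struct and helper names
are hypothetical and this is not the driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the tunables; not the series' data type. */
struct hwp_req_sketch {
    uint8_t  min_perf;        /* bits 7:0   */
    uint8_t  max_perf;        /* bits 15:8  */
    uint8_t  desired;         /* bits 23:16, 0 lets hardware choose */
    uint8_t  energy_perf;     /* bits 31:24, 0 = performance, 255 = energy */
    uint16_t activity_window; /* bits 41:32, only the low 10 bits are used */
};

/* Pack the fields into the 64-bit value that would be written to the MSR. */
uint64_t hwp_req_pack(const struct hwp_req_sketch *r)
{
    return (uint64_t)r->min_perf |
           ((uint64_t)r->max_perf << 8) |
           ((uint64_t)r->desired << 16) |
           ((uint64_t)r->energy_perf << 24) |
           ((uint64_t)(r->activity_window & 0x3ff) << 32);
}

/* Extract the energy/performance preference from a raw request value. */
uint8_t hwp_req_energy_perf(uint64_t raw)
{
    return (uint8_t)((raw >> 24) & 0xff);
}
```

For example, min_perf = 1, max_perf = 0xff, desired = 0 and
energy_perf = 64 pack to 0x4000ff01.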

A lightly loaded OpenXT laptop showed ~1W power savings according to
powertop.  A mostly idle Fedora system (dom0 only) showed more modest
power savings.

This is for a 10th gen 6-core 1600 MHz base 4900 MHz max CPU.  In the
default balance mode, Turbo Boost doesn't exceed 4GHz.  Tweaking the
energy_perf preference with `xenpm set-cpufreq-para balance ene:64`,
I've seen the CPU hit 4.7GHz before throttling down and bouncing around
between 4.3 and 4.5 GHz.  Curiously the other cores read ~4GHz when
Turbo Boost takes effect.  This was done after pinning all dom0 cores,
and using taskset to pin to vCPU/pCPU 11 and running a bash tight loop.

HWP support defaults to disabled, leaving the system running with the
existing HWP configuration - Xen doesn't reconfigure it by default.  It
can be enabled with cpufreq=hwp.

Hardware Duty Cycling (HDC) is another feature to autonomously power down
things.  It defaults to enabled when HWP is enabled, but HDC can be
disabled on the command line: cpufreq=xen:hwp,no-hdc

I've only tested on 8th gen and 10th gen systems with activity window
and energy_perf support.  So the paths for CPUs lacking those features
are untested.

Fast MSR support was removed in v2.  The model specific checking was not
done properly, and I don't have hardware to test with.  Since writes are
expected to be infrequent, I just removed the code.

This changes the sysctl_pm_op hypercall, so that wants review.

Regards,
Jason

[0] https://lore.kernel.org/xen-devel/20221121102113.41893-3-roger.pau@citrix.com/

Jason Andryuk (15):
  cpufreq: Allow restricting to internal governors only
  cpufreq: Add perf_freq to cpuinfo
  cpufreq: Export intel_feature_detect
  xen/sysctl: Nest cpufreq scaling options
  pmstat&xenpm: Re-arrage for cpufreq union
  cpufreq: Add Hardware P-State (HWP) driver
  xen/x86: Tweak PDC bits when using HWP
  xenpm: Change get-cpufreq-para output for hwp
  cpufreq: Export HWP parameters to userspace as CPPC
  libxc: Include cppc_para in definitions
  xenpm: Print HWP/CPPC parameters
  xen: Add SET_CPUFREQ_HWP xen_sysctl_pm_op
  libxc: Add xc_set_cpufreq_cppc
  xenpm: Add set-cpufreq-cppc subcommand
  CHANGELOG: Add Intel HWP entry

 CHANGELOG.md                                 |   1 +
 docs/misc/xen-command-line.pandoc            |  20 +-
 tools/include/xenctrl.h                      |  28 +-
 tools/libs/ctrl/xc_pm.c                      |  81 ++-
 tools/misc/xenpm.c                           | 392 ++++++++++-
 xen/arch/x86/acpi/cpufreq/Makefile           |   1 +
 xen/arch/x86/acpi/cpufreq/cpufreq.c          |  31 +-
 xen/arch/x86/acpi/cpufreq/hwp.c              | 658 +++++++++++++++++++
 xen/arch/x86/acpi/lib.c                      |   5 +
 xen/arch/x86/cpu/mcheck/mce_intel.c          |   6 +
 xen/arch/x86/include/asm/cpufeature.h        |  12 +-
 xen/arch/x86/include/asm/msr-index.h         |  16 +-
 xen/drivers/acpi/pmstat.c                    | 100 +--
 xen/drivers/cpufreq/cpufreq.c                |  60 +-
 xen/drivers/cpufreq/cpufreq_misc_governors.c |   9 +
 xen/drivers/cpufreq/cpufreq_ondemand.c       |   3 +
 xen/drivers/cpufreq/utility.c                |   1 +
 xen/include/acpi/cpufreq/cpufreq.h           |  22 +
 xen/include/acpi/pdc_intel.h                 |   1 +
 xen/include/public/sysctl.h                  | 144 +++-
 20 files changed, 1471 insertions(+), 120 deletions(-)
 create mode 100644 xen/arch/x86/acpi/cpufreq/hwp.c