[v7,00/12] Introduce In Field Scan driver

Message ID 20220506225410.1652287-1-tony.luck@intel.com (mailing list archive)

Message

Luck, Tony May 6, 2022, 10:53 p.m. UTC
TL;DR this driver loads scan test files that can check whether silicon
in a CPU core is still running correctly. It is expected that these tests
would be run several times per day to catch problems as silicon ages.

Changes since v6

Thomas Gleixner
---------------
"struct workqueue_struct *ifs_wq; Seems to be unused."

True. Deleted.

"static bool oscan_enabled = true; What changes this?"

The code that cleared it was deleted, so this is dropped too.

"Please add curly brackets as these are not one-line statements"

Added

"cpumask_first(topology_sibling_cpumask(cpu)); Shouldn't that be cpu_smt_mask()?"

Changed (and several other places)

"take up to 200 milliseconds before it retires.  200ms per test chunk?"

Updated comment to note that 200ms is for all chunks.

"Documentation lost in the intertubes"

Dredged up the version from v3 series and changed:
1) Fixed pathnames now that this is a virtual misc device instead of a
platform device.
2) Put all the text into a "/** DOC:" comment section in ifs.h with just
a "kernel-doc:: drivers/platform/x86/intel/ifs/ifs.h" in the ifs.rst
file under Documentation/x86.
3) Added a "big fat warning" (in all CAPS) pointing out that a core test
can take up to 200 milliseconds, so admins must take extra steps if they
are running latency-sensitive workloads.
4) Added a note that all HT threads of a core must be online to run a
test.

Tony Luck
---------
Off-by-one on retries check (#define set to 5, but tried 6 times). Fixed.
Fixed kerneldoc description of "integrity_cap_bit" (was missing a ":").
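
For readers unfamiliar with this class of bug, here is a minimal userspace
sketch of how a retry limit of 5 can end up allowing 6 attempts. This is not
the driver's code: MAX_RETRIES, try_once() and both counting functions are
hypothetical names made up for illustration, and the comparison shown is only
one plausible way such an off-by-one can arise.

```c
#include <stdbool.h>

/* Hypothetical limit; the cover letter only says the #define was 5. */
#define MAX_RETRIES 5

/* Hypothetical stand-in for one test attempt; always reports failure. */
static bool try_once(void)
{
	return false;
}

/* Off-by-one form: ">" lets the loop run one extra time. */
static int attempts_with_off_by_one(void)
{
	int attempts = 0;
	int retries = 0;

	for (;;) {
		attempts++;
		if (try_once())
			break;			/* success, stop retrying */
		if (++retries > MAX_RETRIES)
			break;			/* only trips on the 6th failure */
	}
	return attempts;
}

/* Corrected form: ">=" stops after exactly MAX_RETRIES attempts. */
static int attempts_with_fix(void)
{
	int attempts = 0;
	int retries = 0;

	for (;;) {
		attempts++;
		if (try_once())
			break;			/* success, stop retrying */
		if (++retries >= MAX_RETRIES)
			break;			/* trips on the 5th failure */
	}
	return attempts;
}
```

With an attempt that never succeeds, attempts_with_off_by_one() returns 6 and
attempts_with_fix() returns 5.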

Jithu Joseph (7):
  x86/microcode/intel: Expose collect_cpu_info_early() for IFS
  platform/x86/intel/ifs: Read IFS firmware image
  platform/x86/intel/ifs: Check IFS Image sanity
  platform/x86/intel/ifs: Authenticate and copy to secured memory
  platform/x86/intel/ifs: Add scan test support
  platform/x86/intel/ifs: Add IFS sysfs interface
  platform/x86/intel/ifs: add ABI documentation for IFS

Peter Zijlstra (1):
  stop_machine: Add stop_core_cpuslocked() for per-core operations

Tony Luck (4):
  x86/msr-index: Define INTEGRITY_CAPABILITIES MSR
  platform/x86/intel/ifs: Add stub driver for In-Field Scan
  trace: platform/x86/intel/ifs: Add trace point to track Intel IFS
    operations
  Documentation: In-Field Scan

 .../ABI/testing/sysfs-platform-intel-ifs      |  39 +++
 Documentation/x86/ifs.rst                     |   2 +
 Documentation/x86/index.rst                   |   1 +
 MAINTAINERS                                   |   8 +
 arch/x86/include/asm/cpu.h                    |  18 ++
 arch/x86/include/asm/msr-index.h              |   7 +
 arch/x86/kernel/cpu/intel.c                   |  32 +++
 arch/x86/kernel/cpu/microcode/intel.c         |  59 +---
 drivers/platform/x86/intel/Kconfig            |   1 +
 drivers/platform/x86/intel/Makefile           |   1 +
 drivers/platform/x86/intel/ifs/Kconfig        |  13 +
 drivers/platform/x86/intel/ifs/Makefile       |   3 +
 drivers/platform/x86/intel/ifs/core.c         |  73 +++++
 drivers/platform/x86/intel/ifs/ifs.h          | 234 +++++++++++++++
 drivers/platform/x86/intel/ifs/load.c         | 266 ++++++++++++++++++
 drivers/platform/x86/intel/ifs/runtest.c      | 252 +++++++++++++++++
 drivers/platform/x86/intel/ifs/sysfs.c        | 149 ++++++++++
 include/linux/stop_machine.h                  |  16 ++
 include/trace/events/intel_ifs.h              |  41 +++
 kernel/stop_machine.c                         |  19 ++
 20 files changed, 1182 insertions(+), 52 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-platform-intel-ifs
 create mode 100644 Documentation/x86/ifs.rst
 create mode 100644 drivers/platform/x86/intel/ifs/Kconfig
 create mode 100644 drivers/platform/x86/intel/ifs/Makefile
 create mode 100644 drivers/platform/x86/intel/ifs/core.c
 create mode 100644 drivers/platform/x86/intel/ifs/ifs.h
 create mode 100644 drivers/platform/x86/intel/ifs/load.c
 create mode 100644 drivers/platform/x86/intel/ifs/runtest.c
 create mode 100644 drivers/platform/x86/intel/ifs/sysfs.c
 create mode 100644 include/trace/events/intel_ifs.h


base-commit: 672c0c5173427e6b3e2a9bbb7be51ceeec78093a

Comments

Hans de Goede May 11, 2022, 3:51 p.m. UTC | #1
Hi,

On 5/7/22 00:53, Tony Luck wrote:
> TL;DR this driver loads scan test files that can check whether silicon
> in a CPU core is still running correctly. It is expected that these tests
> would be run several times per day to catch problems as silicon ages.

Thank you for your patch-series, I've applied the series to my
review-hans branch:
https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git/log/?h=review-hans

Note it will show up in my review-hans branch once I've pushed my
local branch there, which might take a while.

Once I've run some tests on this branch the patches there will be
added to the platform-drivers-x86/for-next branch and eventually
will be included in the pdx86 pull-request to Linus for the next
merge-window.

Regards,

Hans



