mbox series

[v5,00/21] Change ghes to use HEST-based offsets and add support for error inject

Message ID cover.1740653898.git.mchehab+huawei@kernel.org (mailing list archive)
Headers show
Series Change ghes to use HEST-based offsets and add support for error inject | expand

Message

Mauro Carvalho Chehab Feb. 27, 2025, 11:03 a.m. UTC
Now that the ghes preparation patches were merged, let's add support
for error injection.

On this version, HEST table got added to ACPI tables testing for aarch64 virt.

There are also some patch reorder to help reviewers to check the changes.

The code itself is almost identical to v4, with just a few minor nits addressed.

---
v5:
- make checkpatch happier;
- HEST table is now tested;
- some changes at HEST spec documentation to align with code changes;
- extra care was taken with regards to git bisectability.

v4:
- added an extra comment for AcpiGhesState structure;
- patches reordered;
- no functional changes, just code shift between the patches in this series.

v3:
- addressed more nits;
- hest_add_le now points to the beginning of HEST table;
- removed HEST from tests/data/acpi;
- added an extra patch to not use fw_cfg with virt-10.0 for hw_error_le

v2: 
- address some nits;
- improved ags cleanup patch and removed ags.present field;
- added some missing le*_to_cpu() calls;
- update date at copyright for new files to 2024-2025;
- qmp command changed to: inject-ghes-v2-error ans since updated to 10.0;
- added HEST and DSDT tables after the changes to make check target happy.
  (two patches: first one whitelisting such tables; second one removing from
   whitelist and updating/adding such tables to tests/data/acpi)


Mauro Carvalho Chehab (21):
  tests/acpi: virt: add an empty HEST file
  tests/qtest/bios-tables-test: extend to also check HEST table
  tests/acpi: virt: update HEST file with its current data
  acpi/ghes: Cleanup the code which gets ghes ged state
  acpi/ghes: prepare to change the way HEST offsets are calculated
  acpi/ghes: add a firmware file with HEST address
  acpi/ghes: Use HEST table offsets when preparing GHES records
  acpi/ghes: don't hard-code the number of sources for HEST table
  acpi/ghes: add a notifier to notify when error data is ready
  acpi/ghes: create an ancillary acpi_ghes_get_state() function
  acpi/generic_event_device: Update GHES migration to cover hest addr
  acpi/generic_event_device: add logic to detect if HEST addr is
    available
  acpi/generic_event_device: add an APEI error device
  tests/acpi: virt: allow acpi table changes at DSDT and HEST tables
  arm/virt: Wire up a GED error device for ACPI / GHES
  qapi/acpi-hest: add an interface to do generic CPER error injection
  tests/acpi: virt: update HEST table to accept two sources
  tests/acpi: virt: and update DSDT table to add the new GED device
  docs: hest: add new "etc/acpi_table_hest_addr" and update workflow
  acpi/generic_event_device.c: enable use_hest_addr for QEMU 10.x
  scripts/ghes_inject: add a script to generate GHES error inject

 MAINTAINERS                                   |  10 +
 docs/specs/acpi_hest_ghes.rst                 |  28 +-
 hw/acpi/Kconfig                               |   5 +
 hw/acpi/aml-build.c                           |  10 +
 hw/acpi/generic_event_device.c                |  43 ++
 hw/acpi/ghes-stub.c                           |   7 +-
 hw/acpi/ghes.c                                | 231 ++++--
 hw/acpi/ghes_cper.c                           |  38 +
 hw/acpi/ghes_cper_stub.c                      |  19 +
 hw/acpi/meson.build                           |   2 +
 hw/arm/virt-acpi-build.c                      |  36 +-
 hw/arm/virt.c                                 |  19 +-
 hw/core/machine.c                             |   2 +
 include/hw/acpi/acpi_dev_interface.h          |   1 +
 include/hw/acpi/aml-build.h                   |   2 +
 include/hw/acpi/generic_event_device.h        |   1 +
 include/hw/acpi/ghes.h                        |  52 +-
 include/hw/arm/virt.h                         |   2 +
 qapi/acpi-hest.json                           |  35 +
 qapi/meson.build                              |   1 +
 qapi/qapi-schema.json                         |   1 +
 scripts/arm_processor_error.py                | 476 ++++++++++++
 scripts/ghes_inject.py                        |  51 ++
 scripts/qmp_helper.py                         | 702 ++++++++++++++++++
 target/arm/kvm.c                              |   7 +-
 tests/data/acpi/aarch64/virt/DSDT             | Bin 5196 -> 5240 bytes
 .../data/acpi/aarch64/virt/DSDT.acpihmatvirt  | Bin 5282 -> 5326 bytes
 tests/data/acpi/aarch64/virt/DSDT.memhp       | Bin 6557 -> 6601 bytes
 tests/data/acpi/aarch64/virt/DSDT.pxb         | Bin 7679 -> 7723 bytes
 tests/data/acpi/aarch64/virt/DSDT.topology    | Bin 5398 -> 5442 bytes
 tests/data/acpi/aarch64/virt/HEST             | Bin 0 -> 224 bytes
 tests/qtest/bios-tables-test.c                |   2 +-
 32 files changed, 1692 insertions(+), 91 deletions(-)
 create mode 100644 hw/acpi/ghes_cper.c
 create mode 100644 hw/acpi/ghes_cper_stub.c
 create mode 100644 qapi/acpi-hest.json
 create mode 100644 scripts/arm_processor_error.py
 create mode 100755 scripts/ghes_inject.py
 create mode 100755 scripts/qmp_helper.py
 create mode 100644 tests/data/acpi/aarch64/virt/HEST

Comments

Igor Mammedov Feb. 27, 2025, 1:30 p.m. UTC | #1
On Thu, 27 Feb 2025 12:03:30 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Now that the ghes preparation patches were merged, let's add support
> for error injection.
> 
> On this version, HEST table got added to ACPI tables testing for aarch64 virt.
> 
> There are also some patch reorder to help reviewers to check the changes.
> 
> The code itself is almost identical to v4, with just a few minor nits addressed.

series still has checkpatch errors 'line over 80' which are not false positive,
it needs to be fixed

> 
> ---
> v5:
> - make checkpatch happier;
> - HEST table is now tested;
> - some changes at HEST spec documentation to align with code changes;
> - extra care was taken with regards to git bisectability.
> 
> v4:
> - added an extra comment for AcpiGhesState structure;
> - patches reordered;
> - no functional changes, just code shift between the patches in this series.
> 
> v3:
> - addressed more nits;
> - hest_add_le now points to the beginning of HEST table;
> - removed HEST from tests/data/acpi;
> - added an extra patch to not use fw_cfg with virt-10.0 for hw_error_le
> 
> v2: 
> - address some nits;
> - improved ags cleanup patch and removed ags.present field;
> - added some missing le*_to_cpu() calls;
> - update date at copyright for new files to 2024-2025;
> - qmp command changed to: inject-ghes-v2-error ans since updated to 10.0;
> - added HEST and DSDT tables after the changes to make check target happy.
>   (two patches: first one whitelisting such tables; second one removing from
>    whitelist and updating/adding such tables to tests/data/acpi)
> 
> 
> Mauro Carvalho Chehab (21):
>   tests/acpi: virt: add an empty HEST file
>   tests/qtest/bios-tables-test: extend to also check HEST table
>   tests/acpi: virt: update HEST file with its current data
>   acpi/ghes: Cleanup the code which gets ghes ged state
>   acpi/ghes: prepare to change the way HEST offsets are calculated
>   acpi/ghes: add a firmware file with HEST address
>   acpi/ghes: Use HEST table offsets when preparing GHES records
>   acpi/ghes: don't hard-code the number of sources for HEST table
>   acpi/ghes: add a notifier to notify when error data is ready
>   acpi/ghes: create an ancillary acpi_ghes_get_state() function
>   acpi/generic_event_device: Update GHES migration to cover hest addr
>   acpi/generic_event_device: add logic to detect if HEST addr is
>     available
>   acpi/generic_event_device: add an APEI error device
>   tests/acpi: virt: allow acpi table changes at DSDT and HEST tables
>   arm/virt: Wire up a GED error device for ACPI / GHES
>   qapi/acpi-hest: add an interface to do generic CPER error injection
>   tests/acpi: virt: update HEST table to accept two sources
>   tests/acpi: virt: and update DSDT table to add the new GED device
>   docs: hest: add new "etc/acpi_table_hest_addr" and update workflow
>   acpi/generic_event_device.c: enable use_hest_addr for QEMU 10.x
>   scripts/ghes_inject: add a script to generate GHES error inject
> 
>  MAINTAINERS                                   |  10 +
>  docs/specs/acpi_hest_ghes.rst                 |  28 +-
>  hw/acpi/Kconfig                               |   5 +
>  hw/acpi/aml-build.c                           |  10 +
>  hw/acpi/generic_event_device.c                |  43 ++
>  hw/acpi/ghes-stub.c                           |   7 +-
>  hw/acpi/ghes.c                                | 231 ++++--
>  hw/acpi/ghes_cper.c                           |  38 +
>  hw/acpi/ghes_cper_stub.c                      |  19 +
>  hw/acpi/meson.build                           |   2 +
>  hw/arm/virt-acpi-build.c                      |  36 +-
>  hw/arm/virt.c                                 |  19 +-
>  hw/core/machine.c                             |   2 +
>  include/hw/acpi/acpi_dev_interface.h          |   1 +
>  include/hw/acpi/aml-build.h                   |   2 +
>  include/hw/acpi/generic_event_device.h        |   1 +
>  include/hw/acpi/ghes.h                        |  52 +-
>  include/hw/arm/virt.h                         |   2 +
>  qapi/acpi-hest.json                           |  35 +
>  qapi/meson.build                              |   1 +
>  qapi/qapi-schema.json                         |   1 +
>  scripts/arm_processor_error.py                | 476 ++++++++++++
>  scripts/ghes_inject.py                        |  51 ++
>  scripts/qmp_helper.py                         | 702 ++++++++++++++++++
>  target/arm/kvm.c                              |   7 +-
>  tests/data/acpi/aarch64/virt/DSDT             | Bin 5196 -> 5240 bytes
>  .../data/acpi/aarch64/virt/DSDT.acpihmatvirt  | Bin 5282 -> 5326 bytes
>  tests/data/acpi/aarch64/virt/DSDT.memhp       | Bin 6557 -> 6601 bytes
>  tests/data/acpi/aarch64/virt/DSDT.pxb         | Bin 7679 -> 7723 bytes
>  tests/data/acpi/aarch64/virt/DSDT.topology    | Bin 5398 -> 5442 bytes
>  tests/data/acpi/aarch64/virt/HEST             | Bin 0 -> 224 bytes
>  tests/qtest/bios-tables-test.c                |   2 +-
>  32 files changed, 1692 insertions(+), 91 deletions(-)
>  create mode 100644 hw/acpi/ghes_cper.c
>  create mode 100644 hw/acpi/ghes_cper_stub.c
>  create mode 100644 qapi/acpi-hest.json
>  create mode 100644 scripts/arm_processor_error.py
>  create mode 100755 scripts/ghes_inject.py
>  create mode 100755 scripts/qmp_helper.py
>  create mode 100644 tests/data/acpi/aarch64/virt/HEST
>
Mauro Carvalho Chehab Feb. 27, 2025, 3:13 p.m. UTC | #2
Em Thu, 27 Feb 2025 14:30:28 +0100
Igor Mammedov <imammedo@redhat.com> escreveu:

> On Thu, 27 Feb 2025 12:03:30 +0100
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > Now that the ghes preparation patches were merged, let's add support
> > for error injection.
> > 
> > On this version, HEST table got added to ACPI tables testing for aarch64 virt.
> > 
> > There are also some patch reorder to help reviewers to check the changes.
> > 
> > The code itself is almost identical to v4, with just a few minor nits addressed.  
> 
> series still has checkpatch errors 'line over 80' which are not false positive,
> it needs to be fixed

The long line warnings are at the patch adding the Python script. IMO,
all but one are false positives:

1. Long lines at patch description because of the tool output example added
   inside the commit description:

	ERROR: line over 90 characters
	#148: FILE: scripts/arm_processor_error.py:83:
	+[Hardware Error]:     bus error, operation type: Generic read (type of instruction or data request cannot be determined)

	ERROR: line over 90 characters
	#153: FILE: scripts/arm_processor_error.py:88:
	+[Hardware Error]:     Program execution can be restarted reliably at the PC associated with the error.

	WARNING: line over 80 characters
	#170: FILE: scripts/arm_processor_error.py:105:
	+[Hardware Error]:    00000000: 13 7b 04 05 01                                   .{...

	WARNING: line over 80 characters
	#174: FILE: scripts/arm_processor_error.py:109:
	+[Firmware Warn]: GHES: Unhandled processor error type 0x10: micro-architectural error

	ERROR: line over 90 characters
	#175: FILE: scripts/arm_processor_error.py:110:
	+[Firmware Warn]: GHES: Unhandled processor error type 0x14: TLB error|micro-architectural error

   IMO, breaking command output at the description is a bad practice.

2. Big strings at help message:

	WARNING: line over 80 characters
	#261: FILE: scripts/arm_processor_error.py:196:
	+                           help="Power State Coordination Interface - PSCI state")

	ERROR: line over 90 characters
	#276: FILE: scripts/arm_processor_error.py:211:
	+                        help="Number of errors: 0: Single error, 1: Multiple errors, 2-65535: Error count if known")

	WARNING: line over 80 characters
	#278: FILE: scripts/arm_processor_error.py:213:
	+                        help="Error information (UEFI 2.10 tables N.18 to N.20)")

	ERROR: line over 90 characters
	#287: FILE: scripts/arm_processor_error.py:222:
	+                        help="Type of the context (0=ARM32 GPR, 5=ARM64 EL1, other values supported)")


	WARNING: line over 80 characters
	#1046: FILE: scripts/qmp_helper.py:442:
	+                           help="Marks the timestamp as precise if --timestamp is used")

	WARNING: line over 80 characters
	#1048: FILE: scripts/qmp_helper.py:444:
	+                           help=f"General Error Data Block flags: {gedb_flags_bits}")

   Those might be changed if we add one variable per string to store the
   help lines, at the expense of doing some code obfuscation.

   I don't think doing it is a good idea.

3. Long class function names that are part of Python's standard library:

	ERROR: line over 90 characters
	#576: FILE: scripts/ghes_inject.py:29:
	+    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter,

   We can't change the big name of the argparse formatter. The only
   possible fix would be to obfuscate it by doing:

	format = argparse.ArgumentDefaultsHelpFormatter,
	parser = argparse.ArgumentParser(formatter_class=format,

   IMO this is a bad practice.

4. False-positive warning disable for pylint coding style tool:

	ERROR: line over 90 characters
	#805: FILE: scripts/qmp_helper.py:201:
	+        data.extend(value.to_bytes(num_bytes, byteorder="little"))  # pylint: disable=E1101

	WARNING: line over 80 characters
	#1028: FILE: scripts/qmp_helper.py:424:
	+        g_gen = parser.add_argument_group("Generic Error Data")  # pylint: disable=E1101

   AFAIKT, those need to be at the same line for pylint to process them
   properly.

5. A long name inside an indented block:

	WARNING: line over 80 characters
	#1109: FILE: scripts/qmp_helper.py:505:
	+                                                   value=args.gen_err_valid_bits,

   Again the only solution would be to obfuscate the argument, like:

	a = args.gen_err_valid_bits

							    value=a,

   Not nice, IMHO.

Now, there is one warning that I is not a false positive, which I ended
missing:

	WARNING: line over 80 characters
	#1227: FILE: scripts/qmp_helper.py:623:
	+            ret = self.send_cmd("qom-get", args, may_open=True, return_error=False)

I'll fix it at the next respin.

Regards,
Mauro