[RFC,0/5] Change ghes driver to use HEST-based offsets

Message ID cover.1727782588.git.mchehab+huawei@kernel.org (mailing list archive)

Message

Mauro Carvalho Chehab Oct. 1, 2024, 11:42 a.m. UTC
This RFC series was part of the previous PR to add generic error injection
support on GHES.

It contains only the changes to the math used to calculate offsets in the
HEST table and the hardware_errors firmware file.

The first patch adds a new firmware file to store HEST address.
The second patch makes use of it.
The remaining ones add migration support.

PS.: I'm sending this as an RFC because the procedure described in the
pseudo-migration section of:

	https://www.linux-kvm.org/page/Migration

didn't work. I tried using two different QEMU versions to check a
real-life case, and also using a single QEMU to load a
virt-9.1 state on a virt-9.2 machine.

For instance, trying to restore a virt-9.1 state on virt-9.2 gave me
this error:

	(qemu) qemu: Machine type received is 'virt-9.1' and local is 'virt-9.2'
	qemu: load of migration failed: Invalid argument

Still, running virt-9.1 used the old math (offsets from the hardware_errors
firmware file), while running virt-9.2 used the new math based on the HEST
address.
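
The split between the two code paths can be pictured with a small model. This is only an illustrative sketch, not the code from the patches: `GhesAddrs`, `offset_base`, and the field names are hypothetical, the addresses are arbitrary placeholders, and the only behaviour taken from this cover letter is that virt-9.1 starts from the hardware_errors firmware file while virt-9.2 starts from the HEST address.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GhesAddrs:
    """Guest addresses known to the error-injection code (hypothetical layout)."""
    hw_error_addr: int               # address of the hardware_errors firmware file
    hest_addr: Optional[int] = None  # address of the HEST table; None if not recorded


def offset_base(addrs: GhesAddrs) -> str:
    """Pick which address the offset math starts from."""
    if addrs.hest_addr is not None:
        return "hest"              # new math (virt-9.2 in this thread)
    return "hardware_errors"       # old math (virt-9.1 in this thread)


# Placeholder addresses, chosen only to make the example concrete.
virt_9_1 = GhesAddrs(hw_error_addr=0x1000)
virt_9_2 = GhesAddrs(hw_error_addr=0x1000, hest_addr=0x2000)
```

Treating "no HEST address recorded" as the signal for the old math is an assumption of this sketch; the series itself has a dedicated patch for detecting whether the HEST address is available.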

Mauro Carvalho Chehab (5):
  acpi/ghes: add a firmware file with HEST address
  acpi/ghes: Use HEST table offsets when preparing GHES records
  acpi/generic_event_device: Update GHES migration to cover hest addr
  acpi/generic_event_device: add logic to detect if HEST addr is
    available
  arm/virt-acpi-build: Properly handle virt-9.1

 hw/acpi/generic_event_device.c |  30 +++++++++
 hw/acpi/ghes.c                 | 108 ++++++++++++++++++++++++++++++---
 hw/arm/virt-acpi-build.c       |  30 +++++++--
 hw/core/machine.c              |   4 +-
 include/hw/acpi/ghes.h         |   2 +
 5 files changed, 159 insertions(+), 15 deletions(-)

Comments

Igor Mammedov Oct. 2, 2024, 1:45 p.m. UTC | #1
On Tue,  1 Oct 2024 13:42:45 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> This RFC series was part of the previous PR to add generic error injection
> support on GHES.
> 
> It contains only the changes of the math used to calculate offsets at
> HEST table and hardware_error firmware file.
> 
> The first patch adds a new firmware file to store HEST address.
> The second patch makes use of it.
> The remaining ones add migration support.
> 
> PS.: I'm sending this as an RFC because the procedure described in the
> pseudo-migration section of:
> 
> 	https://www.linux-kvm.org/page/Migration
> 
> Didn't work. I tried to use two different QEMU versions to check a
> real life case and also to use just one QEMU and trying to load a
> virt-9.1 state on a virt-9.2 machine. 
> 
> For instance, trying to restore a virt-9.1 state on virt-9.2 gave me
> this error:
> 
> 	(qemu) qemu: Machine type received is 'virt-9.1' and local is 'virt-9.2'
> 	qemu: load of migration failed: Invalid argument

That's expected: the idea is to keep the machine type (virt-X) ABI stable, so
that it works the same way on old and new QEMU.
Migration is meant to move a VM of the same machine type to a new/another QEMU instance.

i.e. try to migrate

qemu-9.1 -M virt-9.1  => qemu-9.2 -M virt-9.1
and vice versa.
Migration should succeed, and memory error injection should still work
the old way in both instances (I no longer recall how to simulate SEA;
perhaps the original author left a description of how to do that somewhere on the mailing list).
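
The rule described above can be modelled in a few lines. This is a toy sketch, not QEMU code: `can_load` and the tuple layout are hypothetical, and the only behaviour taken from this thread is that the machine type must match on both sides while the QEMU version may differ.

```python
def can_load(src_machine: str, dst_machine: str) -> bool:
    """A saved state is loadable only when both sides use the same machine type."""
    return src_machine == dst_machine


# (source QEMU, source machine type, destination QEMU, destination machine type)
cases = [
    ("9.1", "virt-9.1", "9.2", "virt-9.1"),  # same machine type, newer QEMU
    ("9.2", "virt-9.1", "9.1", "virt-9.1"),  # same machine type, older QEMU
    ("9.2", "virt-9.1", "9.2", "virt-9.2"),  # same QEMU, different machine type
]

# The QEMU versions are deliberately ignored: only the machine type matters here.
results = [can_load(src_m, dst_m) for _, src_m, _, dst_m in cases]
```

The third case corresponds to the "Machine type received is 'virt-9.1' and local is 'virt-9.2'" failure quoted above.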

virt-9.2 is never meant to be 

> Yet, running virt-9.1 used the old math code (offsets from hardware_errors firmware
> file) while running virt-9.2 executed the new math code using HEST address.
> 
> Mauro Carvalho Chehab (5):
>   acpi/ghes: add a firmware file with HEST address
>   acpi/ghes: Use HEST table offsets when preparing GHES records
>   acpi/generic_event_device: Update GHES migration to cover hest addr
>   acpi/generic_event_device: add logic to detect if HEST addr is
>     available
>   arm/virt-acpi-build: Properly handle virt-9.1
> 
>  hw/acpi/generic_event_device.c |  30 +++++++++
>  hw/acpi/ghes.c                 | 108 ++++++++++++++++++++++++++++++---
>  hw/arm/virt-acpi-build.c       |  30 +++++++--
>  hw/core/machine.c              |   4 +-
>  include/hw/acpi/ghes.h         |   2 +
>  5 files changed, 159 insertions(+), 15 deletions(-)
>
Mauro Carvalho Chehab Nov. 13, 2024, 6:54 a.m. UTC | #2
On Wed, 2 Oct 2024 15:45:34 +0200
Igor Mammedov <imammedo@redhat.com> wrote:

> On Tue,  1 Oct 2024 13:42:45 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > This RFC series was part of the previous PR to add generic error injection
> > support on GHES.
> > 
> > It contains only the changes of the math used to calculate offsets at
> > HEST table and hardware_error firmware file.
> > 
> > The first patch adds a new firmware file to store HEST address.
> > The second patch makes use of it.
> > The remaining ones add migration support.
> > 
> > PS.: I'm sending this as an RFC because the procedure described in the
> > pseudo-migration section of:
> > 
> > 	https://www.linux-kvm.org/page/Migration
> > 
> > Didn't work. I tried to use two different QEMU versions to check a
> > real life case and also to use just one QEMU and trying to load a
> > virt-9.1 state on a virt-9.2 machine. 
> > 
> > For instance, trying to restore a virt-9.1 state on virt-9.2 gave me
> > this error:
> > 
> > 	(qemu) qemu: Machine type received is 'virt-9.1' and local is 'virt-9.2'
> > 	qemu: load of migration failed: Invalid argument  
> 
> that's expected (idea is to keep machine type (virt-X) ABI stable so
> it would work the same way on old and new QEMU)
> migration is meant to move VM of the same machine type to a new/another QEMU instance.

I found a couple of issues and, after the fixes, it can successfully
migrate both virt-9.1 and virt-9.2 machines. 

> 
> i.e try migrate 
> 
> qemu-9.1 -M virt-9.1  => qemu-9.2 -M virt-9.1
> and vice-versa
> migration should succeed and memory error injection should still work
> the old way in both instances (I don't recall anymore how to simulate SEA,
> perhaps original author left a description how to do that somewhere on mail-list).

Those work as well, but I had to pass -cpu cortex-a57 to both 9.1
and 9.2, as using -cpu max caused qemu to refuse loading the guest.

I tested with both:

	qemu-9.1 -M virt-9.1 -cpu cortex-a57 => qemu-9.2 -M virt-9.1 -cpu cortex-a57
	qemu-9.2 -M virt-9.1 -cpu cortex-a57 => qemu-9.1 -M virt-9.1 -cpu cortex-a57
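
For reference, the two test directions above can be written out as argument lists. This is a sketch under assumptions: the binary paths, the port number, and the helper name `migration_pair` are placeholders and not what was actually run; disks, kernels, and other guest options are omitted. The `-incoming tcp:0:PORT` option and the `migrate -d` monitor command follow the linux-kvm.org Migration page cited earlier in the thread.

```python
def migration_pair(src_bin, dst_bin, machine="virt-9.1", cpu="cortex-a57", port=4444):
    """Build source/destination argv lists plus the monitor command to start migration."""
    common = ["-M", machine, "-cpu", cpu]   # must be identical on both sides
    src = [src_bin, *common]
    dst = [dst_bin, *common, "-incoming", f"tcp:0:{port}"]
    monitor_cmd = f"migrate -d tcp:localhost:{port}"  # typed in the source monitor
    return src, dst, monitor_cmd


# Placeholder paths for the two builds; adjust to the local setup.
QEMU_9_1 = "./qemu-9.1/qemu-system-aarch64"
QEMU_9_2 = "./qemu-9.2/qemu-system-aarch64"

forward = migration_pair(QEMU_9_1, QEMU_9_2)   # 9.1 => 9.2
backward = migration_pair(QEMU_9_2, QEMU_9_1)  # 9.2 => 9.1
```

Keeping the machine and CPU options in one shared list makes it harder to start the two sides with mismatched settings.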

I'll address your other comments on the series and post a new version
today.

Thanks,
Mauro
Mauro Carvalho Chehab Nov. 13, 2024, 6:57 a.m. UTC | #3
On Wed, 13 Nov 2024 07:54:18 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> On Wed, 2 Oct 2024 15:45:34 +0200
> Igor Mammedov <imammedo@redhat.com> wrote:
> 
> > On Tue,  1 Oct 2024 13:42:45 +0200
> > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> >   
> > > This RFC series was part of the previous PR to add generic error injection
> > > support on GHES.
> > > 
> > > It contains only the changes of the math used to calculate offsets at
> > > HEST table and hardware_error firmware file.
> > > 
> > > The first patch adds a new firmware file to store HEST address.
> > > The second patch makes use of it.
> > > The remaining ones add migration support.
> > > 
> > > PS.: I'm sending this as an RFC because the procedure described in the
> > > pseudo-migration section of:
> > > 
> > > 	https://www.linux-kvm.org/page/Migration
> > > 
> > > Didn't work. I tried to use two different QEMU versions to check a
> > > real life case and also to use just one QEMU and trying to load a
> > > virt-9.1 state on a virt-9.2 machine. 
> > > 
> > > For instance, trying to restore a virt-9.1 state on virt-9.2 gave me
> > > this error:
> > > 
> > > 	(qemu) qemu: Machine type received is 'virt-9.1' and local is 'virt-9.2'
> > > 	qemu: load of migration failed: Invalid argument    
> > 
> > that's expected (idea is to keep machine type (virt-X) ABI stable so
> > it would work the same way on old and new QEMU)
> > migration is meant to move VM of the same machine type to a new/another QEMU instance.  
> 
> I found a couple of issues and, after the fixes, it can successfully
> migrate both virt-9.1 and virt-9.2 machines. 
> 
> > 
> > i.e try migrate 
> > 
> > qemu-9.1 -M virt-9.1  => qemu-9.2 -M virt-9.1
> > and vice-versa
> > migration should succeed and memory error injection should still work
> > the old way in both instances (I don't recall anymore how to simulate SEA,
> > perhaps original author left a description how to do that somewhere on mail-list).  
> 
> Those work as well, but I had to pass -cpu cortex-a57 to both 9.1
> and 9.2, as using -cpu max caused qemu to refuse loading the guest.
> 
> I tested with both:
> 
> 	qemu-9.1 -M virt-9.1 -cpu cortex-a57 => qemu-9.2 -M virt-9.1 -cpu cortex-a57
> 	qemu-9.2 -M virt-9.1 -cpu cortex-a57 => qemu-9.1 -M virt-9.1 -cpu cortex-a57

Forgot to mention, but I modified qemu-9.1 to use GPIO instead of SEA, as
it is a lot easier to do the tests using the error injection logic.
Also, I don't know how to test SEA errors.

> I'll address your other comments to the series and post a new version
> today.
> 
> Thanks,
> Mauro



Thanks,
Mauro