[v4,0/5] Some fixes about vgic-its

Message ID 20241107214137.428439-1-jingzhangos@google.com (mailing list archive)

Message

Jing Zhang Nov. 7, 2024, 9:41 p.m. UTC
This patch series addresses a critical issue in the VGIC ITS tables'
save/restore mechanism, accompanied by a comprehensive selftest for bug
reproduction and verification.

The fix originally comes from Kunkun Jiang [1].

The identified bug manifests as a failure in VM suspend/resume operations.
The root cause lies in the repeated suspend attempts often required for
successful VM suspension, coupled with concurrent device interrupt registration
and freeing. This concurrency leads to inconsistencies in ITS mappings before
the save operation, potentially leaving orphaned Device Translation Entries
(DTEs) and Interrupt Translation Entries (ITEs) in the respective tables.

During the subsequent restore operation, encountering these orphaned entries
can result in two error scenarios:
* EINVAL Error: If an orphaned entry lacks a corresponding collection ID, the
  restore operation fails with an EINVAL error.
* Mapping Corruption: If an orphaned entry possesses a valid collection ID, the
  restore operation may succeed but with incorrect or lost mappings,
  compromising system integrity.
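
To make the two scenarios above concrete, here is a small userspace toy
model. It is only a sketch and is not code from this series: every name in
it (toy_its, toy_save, toy_discard, toy_restore, toy_scenario) is
hypothetical, and the entry layout is invented for illustration. It assumes
that save only writes entries that are currently live, so an entry freed
between two save attempts stays behind in the table unless it is explicitly
cleared.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: names and entry layout are hypothetical, not the kernel's. */
#define TOY_NR_ENTRIES 4
#define TOY_NR_COLLS   2
#define TOY_VALID      (1u << 31)

struct toy_its {
	bool     live[TOY_NR_ENTRIES];      /* in-kernel view of mapped entries */
	uint32_t coll[TOY_NR_ENTRIES];      /* collection ID of each live entry */
	bool     coll_mapped[TOY_NR_COLLS]; /* which collections are mapped */
	uint32_t table[TOY_NR_ENTRIES];     /* stand-in for the table in guest memory */
};

/* Save: write out live entries only; stale table slots are left untouched. */
static void toy_save(struct toy_its *its)
{
	for (int i = 0; i < TOY_NR_ENTRIES; i++)
		if (its->live[i])
			its->table[i] = TOY_VALID | its->coll[i];
}

/* Free an entry; clear_entry models wiping the saved slot at the same time. */
static void toy_discard(struct toy_its *its, int idx, bool clear_entry)
{
	assert(idx >= 0 && idx < TOY_NR_ENTRIES);
	its->live[idx] = false;
	if (clear_entry)
		its->table[idx] = 0;
}

/* Restore: rebuild live entries from the table, or fail with -EINVAL. */
static int toy_restore(struct toy_its *its)
{
	int restored = 0;

	for (int i = 0; i < TOY_NR_ENTRIES; i++) {
		uint32_t entry = its->table[i];
		uint32_t coll = entry & ~TOY_VALID;

		if (!(entry & TOY_VALID))
			continue;
		if (coll >= TOY_NR_COLLS || !its->coll_mapped[coll])
			return -EINVAL; /* orphan refers to an unmapped collection */
		its->live[i] = true;
		its->coll[i] = coll;
		restored++;
	}
	return restored;
}

/* Two save attempts with one entry freed in between, then a restore. */
static int toy_scenario(bool same_coll, bool clear_entry)
{
	struct toy_its its;

	memset(&its, 0, sizeof(its));
	its.live[0] = true;
	its.coll[0] = 0;
	its.live[1] = true;
	its.coll[1] = same_coll ? 0 : 1;
	its.coll_mapped[0] = true;
	its.coll_mapped[its.coll[1]] = true;

	toy_save(&its);                    /* first suspend attempt */
	toy_discard(&its, 1, clear_entry); /* entry freed in between */
	if (!same_coll)
		its.coll_mapped[1] = false;    /* its collection goes away too */
	toy_save(&its);                    /* second suspend attempt */

	memset(its.live, 0, sizeof(its.live)); /* restore starts from empty state */
	return toy_restore(&its);
}
```

In this model toy_scenario(false, false) returns -EINVAL, and
toy_scenario(true, false) returns 2 even though only one entry was live at
the last save, i.e. a stale mapping is restored. With clear_entry set, both
variants return 1.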

The provided selftest facilitates the reproduction of both error scenarios:
* EINVAL Reproduction: Execute ./vgic_its_tables without any options.
* Mapping Corruption Reproduction: Execute ./vgic_its_tables -s
  The -s option enforces identical collection IDs for all mappings.
* Workaround: Execute ./vgic_its_tables -c to clear the tables before the save
  operation. With this option, the selftest runs successfully on a host
  without the fix.

---

* v3 -> v4:
  - Added two helper functions for table entry read/write in guest memory.
  - Moved the selftest to be the first patch so it can easily be run on a
    host without the fix.

* v2 -> v3:
  - Rebased to v6.12-rc6
  - Fixed some typos
  - Added a selftest for bug reproduction and verification

* v1 -> v2:
  - Replaced BUG_ON() with KVM_BUG_ON()

[1] https://lore.kernel.org/linux-arm-kernel/20240704142319.728-1-jiangkunkun@huawei.com

---

Jing Zhang (2):
  KVM: selftests: aarch64: Add VGIC selftest for save/restore ITS table
    mappings
  KVM: arm64: vgic-its: Add read/write helpers on ITS table entries.

Kunkun Jiang (3):
  KVM: arm64: vgic-its: Add a data length check in vgic_its_save_*
  KVM: arm64: vgic-its: Clear DTE when MAPD unmaps a device
  KVM: arm64: vgic-its: Clear ITE when DISCARD frees an ITE

 arch/arm64/kvm/vgic/vgic-its.c                |  31 +-
 arch/arm64/kvm/vgic/vgic.h                    |  23 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/aarch64/vgic_its_tables.c   | 565 ++++++++++++++++++
 .../kvm/include/aarch64/gic_v3_its.h          |   3 +-
 .../testing/selftests/kvm/include/kvm_util.h  |   4 +-
 .../selftests/kvm/lib/aarch64/gic_v3_its.c    |  24 +-
 7 files changed, 631 insertions(+), 20 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/aarch64/vgic_its_tables.c


base-commit: 59b723cd2adbac2a34fc8e12c74ae26ae45bf230

Comments

Oliver Upton Nov. 11, 2024, 8:40 p.m. UTC | #1
On Thu, 7 Nov 2024 13:41:32 -0800, Jing Zhang wrote:
> This patch series addresses a critical issue in the VGIC ITS tables'
> save/restore mechanism, accompanied by a comprehensive selftest for bug
> reproduction and verification.
> 
> The fix is originally from Kunkun Jiang at [1].
> 
> The identified bug manifests as a failure in VM suspend/resume operations.
> The root cause lies in the repeated suspend attempts often required for
> successful VM suspension, coupled with concurrent device interrupt registration
> and freeing. This concurrency leads to inconsistencies in ITS mappings before
> the save operation, potentially leaving orphaned Device Translation Entries
> (DTEs) and Interrupt Translation Entries (ITEs) in the respective tables.
> 
> [...]

Taking the immediate fixes for now, selftest might need a bit more work
(will review soon). Note that I squashed patch 2 + 3 together as well.

Applied to kvmarm/next, thanks!

[3/5] KVM: arm64: vgic-its: Add a data length check in vgic_its_save_*
      https://git.kernel.org/kvmarm/kvmarm/c/7fe28d7e68f9
[4/5] KVM: arm64: vgic-its: Clear DTE when MAPD unmaps a device
      https://git.kernel.org/kvmarm/kvmarm/c/e9649129d33d
[5/5] KVM: arm64: vgic-its: Clear ITE when DISCARD frees an ITE
      https://git.kernel.org/kvmarm/kvmarm/c/7602ffd1d5e8

--
Best,
Oliver