Message ID | 20231102153549.53984-1-imbrenda@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2,1/1] KVM: s390: vsie: fix wrong VIR 37 when MSO is used | expand |
On 02.11.23 16:35, Claudio Imbrenda wrote: > When the host invalidates a guest page, it will also check if the page > was used to map the prefix of any guest CPUs, in which case they are > stopped and marked as needing a prefix refresh. Upon starting the > affected CPUs again, their prefix pages are explicitly faulted in and > revalidated if they had been invalidated. A bit in the PGSTEs indicates > whether or not a page might contain a prefix. The bit is allowed to > overindicate. Pages above 2G are skipped, because they cannot be > prefixes, since KVM runs all guests with MSO = 0. > > The same applies for nested guests (VSIE). When the host invalidates a > guest page that maps the prefix of the nested guest, it has to stop the > affected nested guest CPUs and mark them as needing a prefix refresh. > The same PGSTE bit used for the guest prefix is also used for the > nested guest. Pages above 2G are skipped like for normal guests, which > is the source of the bug. > > The nested guest runs is the guest primary address space. The guest > could be running the nested guest using MSO != 0. If the MSO + prefix > for the nested guest is above 2G, the check for nested prefix will skip > it. This will cause the invalidation notifier to not stop the CPUs of > the nested guest and not mark them as needing refresh. When the nested > guest is run again, its prefix will not be refreshed, since it has not > been marked for refresh. This will cause a fatal validity intercept > with VIR code 37. > > Fix this by removing the check for 2G for nested guests. Now all > invalidations of pages with the notify bit set will always scan the > existing VSIE shadow state descriptors. > > This allows to catch invalidations of nested guest prefix mappings even > when the prefix is above 2G in the guest virtual address space. > > Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> > --- > arch/s390/kvm/vsie.c | 4 ---- > 1 file changed, 4 deletions(-) > > diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c > index 61499293c2ac..e55f489e1fb7 100644 > --- a/arch/s390/kvm/vsie.c > +++ b/arch/s390/kvm/vsie.c > @@ -587,10 +587,6 @@ void kvm_s390_vsie_gmap_notifier(struct gmap *gmap, unsigned long start, > > if (!gmap_is_shadow(gmap)) > return; > - if (start >= 1UL << 31) > - /* We are only interested in prefix pages */ > - return; > - > /* > * Only new shadow blocks are added to the list during runtime, > * therefore we can safely reference them all the time. Right, mso is 64bit, the prefix is 18bit (shifted by 13) -> 31bit. Reviewed-by: David Hildenbrand <david@redhat.com> Does it make sense to remember the maximum prefix address across all shadows (or if the mso was ever 0), so we can optimize for the mso == 0 case and not have to go through all vsie pages on all notification? Sounds like a reasonable optimization on top.
Quoting Claudio Imbrenda (2023-11-02 16:35:49) > When the host invalidates a guest page, it will also check if the page > was used to map the prefix of any guest CPUs, in which case they are > stopped and marked as needing a prefix refresh. Upon starting the > affected CPUs again, their prefix pages are explicitly faulted in and > revalidated if they had been invalidated. A bit in the PGSTEs indicates > whether or not a page might contain a prefix. The bit is allowed to > overindicate. Pages above 2G are skipped, because they cannot be > prefixes, since KVM runs all guests with MSO = 0. > > The same applies for nested guests (VSIE). When the host invalidates a > guest page that maps the prefix of the nested guest, it has to stop the > affected nested guest CPUs and mark them as needing a prefix refresh. > The same PGSTE bit used for the guest prefix is also used for the > nested guest. Pages above 2G are skipped like for normal guests, which > is the source of the bug. > > The nested guest runs is the guest primary address space. The guest > could be running the nested guest using MSO != 0. If the MSO + prefix > for the nested guest is above 2G, the check for nested prefix will skip > it. This will cause the invalidation notifier to not stop the CPUs of > the nested guest and not mark them as needing refresh. When the nested > guest is run again, its prefix will not be refreshed, since it has not > been marked for refresh. This will cause a fatal validity intercept > with VIR code 37. > > Fix this by removing the check for 2G for nested guests. Now all > invalidations of pages with the notify bit set will always scan the > existing VSIE shadow state descriptors. > > This allows to catch invalidations of nested guest prefix mappings even > when the prefix is above 2G in the guest virtual address space. > > Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Tested-by: Nico Boehr <nrb@linux.ibm.com> Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
On Thu, 2 Nov 2023 21:38:43 +0100 David Hildenbrand <david@redhat.com> wrote: [...] > > /* > > * Only new shadow blocks are added to the list during runtime, > > * therefore we can safely reference them all the time. > > Right, mso is 64bit, the prefix is 18bit (shifted by 13) -> 31bit. > > Reviewed-by: David Hildenbrand <david@redhat.com> thanks > > Does it make sense to remember the maximum prefix address across all > shadows (or if the mso was ever 0), so we can optimize for the mso == 0 > case and not have to go through all vsie pages on all notification? > Sounds like a reasonable optimization on top. yes, but it adds complexity to already complex code
On 03.11.23 15:21, Claudio Imbrenda wrote: > On Thu, 2 Nov 2023 21:38:43 +0100 > David Hildenbrand <david@redhat.com> wrote: > > [...] > >>> /* >>> * Only new shadow blocks are added to the list during runtime, >>> * therefore we can safely reference them all the time. >> >> Right, mso is 64bit, the prefix is 18bit (shifted by 13) -> 31bit. >> >> Reviewed-by: David Hildenbrand <david@redhat.com> > > thanks > >> >> Does it make sense to remember the maximum prefix address across all >> shadows (or if the mso was ever 0), so we can optimize for the mso == 0 >> case and not have to go through all vsie pages on all notification? >> Sounds like a reasonable optimization on top. > > yes, but it adds complexity to already complex code > Such an optimization would be very simplistic.
diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c index 61499293c2ac..e55f489e1fb7 100644 --- a/arch/s390/kvm/vsie.c +++ b/arch/s390/kvm/vsie.c @@ -587,10 +587,6 @@ void kvm_s390_vsie_gmap_notifier(struct gmap *gmap, unsigned long start, if (!gmap_is_shadow(gmap)) return; - if (start >= 1UL << 31) - /* We are only interested in prefix pages */ - return; - /* * Only new shadow blocks are added to the list during runtime, * therefore we can safely reference them all the time.
When the host invalidates a guest page, it will also check if the page was used to map the prefix of any guest CPUs, in which case they are stopped and marked as needing a prefix refresh. Upon starting the affected CPUs again, their prefix pages are explicitly faulted in and revalidated if they had been invalidated. A bit in the PGSTEs indicates whether or not a page might contain a prefix. The bit is allowed to overindicate. Pages above 2G are skipped, because they cannot be prefixes, since KVM runs all guests with MSO = 0. The same applies for nested guests (VSIE). When the host invalidates a guest page that maps the prefix of the nested guest, it has to stop the affected nested guest CPUs and mark them as needing a prefix refresh. The same PGSTE bit used for the guest prefix is also used for the nested guest. Pages above 2G are skipped like for normal guests, which is the source of the bug. The nested guest runs is the guest primary address space. The guest could be running the nested guest using MSO != 0. If the MSO + prefix for the nested guest is above 2G, the check for nested prefix will skip it. This will cause the invalidation notifier to not stop the CPUs of the nested guest and not mark them as needing refresh. When the nested guest is run again, its prefix will not be refreshed, since it has not been marked for refresh. This will cause a fatal validity intercept with VIR code 37. Fix this by removing the check for 2G for nested guests. Now all invalidations of pages with the notify bit set will always scan the existing VSIE shadow state descriptors. This allows to catch invalidations of nested guest prefix mappings even when the prefix is above 2G in the guest virtual address space. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> --- arch/s390/kvm/vsie.c | 4 ---- 1 file changed, 4 deletions(-)