
[v2,1/1] KVM: s390: vsie: fix wrong VIR 37 when MSO is used

Message ID 20231102153549.53984-1-imbrenda@linux.ibm.com (mailing list archive)
State New, archived
Series [v2,1/1] KVM: s390: vsie: fix wrong VIR 37 when MSO is used

Commit Message

Claudio Imbrenda Nov. 2, 2023, 3:35 p.m. UTC
When the host invalidates a guest page, it will also check if the page
was used to map the prefix of any guest CPUs, in which case they are
stopped and marked as needing a prefix refresh. Upon starting the
affected CPUs again, their prefix pages are explicitly faulted in and
revalidated if they had been invalidated. A bit in the PGSTEs indicates
whether or not a page might contain a prefix. The bit is allowed to
overindicate. Pages above 2G are skipped, because they cannot be
prefixes, since KVM runs all guests with MSO = 0.

The same applies for nested guests (VSIE). When the host invalidates a
guest page that maps the prefix of the nested guest, it has to stop the
affected nested guest CPUs and mark them as needing a prefix refresh.
The same PGSTE bit used for the guest prefix is also used for the
nested guest. Pages above 2G are skipped like for normal guests, which
is the source of the bug.

The nested guest runs in the guest's primary address space. The guest
could be running the nested guest using MSO != 0. If the MSO + prefix
for the nested guest is above 2G, the check for nested prefix will skip
it. This will cause the invalidation notifier to not stop the CPUs of
the nested guest and not mark them as needing refresh. When the nested
guest is run again, its prefix will not be refreshed, since it has not
been marked for refresh. This will cause a fatal validity intercept
with VIR code 37.
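
To make the failure mode concrete, here is a minimal, self-contained sketch in plain C (not the kernel code itself; the helper name and the example values are made up for illustration) showing how MSO + prefix can land above 2G and slip past the old skip condition:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TWO_GB (1ULL << 31)

/* Hypothetical stand-in for the old "skip everything above 2G" check. */
static bool old_check_skips(uint64_t invalidated_guest_addr)
{
	return invalidated_guest_addr >= TWO_GB;
}

int main(void)
{
	uint64_t mso    = 4ULL << 30;   /* guest runs its nested guest with MSO = 4G */
	uint64_t prefix = 0x2000;       /* nested guest prefix, well below 2G on its own */
	uint64_t mapped = mso + prefix; /* guest address actually backing the nested prefix */

	if (old_check_skips(mapped))
		printf("prefix page at %#llx skipped -> no refresh -> VIR 37\n",
		       (unsigned long long)mapped);
	return 0;
}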

Fix this by removing the check for 2G for nested guests. Now all
invalidations of pages with the notify bit set will always scan the
existing VSIE shadow state descriptors.

This makes it possible to catch invalidations of nested guest prefix mappings even
when the prefix is above 2G in the guest virtual address space.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/kvm/vsie.c | 4 ----
 1 file changed, 4 deletions(-)

Comments

David Hildenbrand Nov. 2, 2023, 8:38 p.m. UTC | #1
On 02.11.23 16:35, Claudio Imbrenda wrote:
> When the host invalidates a guest page, it will also check if the page
> was used to map the prefix of any guest CPUs, in which case they are
> stopped and marked as needing a prefix refresh. Upon starting the
> affected CPUs again, their prefix pages are explicitly faulted in and
> revalidated if they had been invalidated. A bit in the PGSTEs indicates
> whether or not a page might contain a prefix. The bit is allowed to
> overindicate. Pages above 2G are skipped, because they cannot be
> prefixes, since KVM runs all guests with MSO = 0.
> 
> The same applies for nested guests (VSIE). When the host invalidates a
> guest page that maps the prefix of the nested guest, it has to stop the
> affected nested guest CPUs and mark them as needing a prefix refresh.
> The same PGSTE bit used for the guest prefix is also used for the
> nested guest. Pages above 2G are skipped like for normal guests, which
> is the source of the bug.
> 
> The nested guest runs in the guest's primary address space. The guest
> could be running the nested guest using MSO != 0. If the MSO + prefix
> for the nested guest is above 2G, the check for nested prefix will skip
> it. This will cause the invalidation notifier to not stop the CPUs of
> the nested guest and not mark them as needing refresh. When the nested
> guest is run again, its prefix will not be refreshed, since it has not
> been marked for refresh. This will cause a fatal validity intercept
> with VIR code 37.
> 
> Fix this by removing the check for 2G for nested guests. Now all
> invalidations of pages with the notify bit set will always scan the
> existing VSIE shadow state descriptors.
> 
> This makes it possible to catch invalidations of nested guest prefix mappings even
> when the prefix is above 2G in the guest virtual address space.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> ---
>   arch/s390/kvm/vsie.c | 4 ----
>   1 file changed, 4 deletions(-)
> 
> diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
> index 61499293c2ac..e55f489e1fb7 100644
> --- a/arch/s390/kvm/vsie.c
> +++ b/arch/s390/kvm/vsie.c
> @@ -587,10 +587,6 @@ void kvm_s390_vsie_gmap_notifier(struct gmap *gmap, unsigned long start,
>   
>   	if (!gmap_is_shadow(gmap))
>   		return;
> -	if (start >= 1UL << 31)
> -		/* We are only interested in prefix pages */
> -		return;
> -
>   	/*
>   	 * Only new shadow blocks are added to the list during runtime,
>   	 * therefore we can safely reference them all the time.

Right, mso is 64bit, the prefix is 18bit (shifted by 13) -> 31bit.

Reviewed-by: David Hildenbrand <david@redhat.com>

Does it make sense to remember the maximum prefix address across all 
shadows (or if the mso was ever 0), so we can optimize for the mso == 0 
case and not have to go through all vsie pages on all notification? 
Sounds like a reasonable optimization on top.
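
For illustration only, a rough sketch of what such bound tracking could look like (the struct and function names below are hypothetical, not existing KVM code, and shrinking the bound when shadows are torn down is deliberately glossed over):

#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL

/* Hypothetical bookkeeping; none of these names exist in KVM. */
struct vsie_prefix_bound {
	uint64_t max_prefix_end;  /* highest mso + prefix area seen across all shadows */
};

/* Record where a shadow SCB's prefix lands whenever it is (re)pinned. */
static void note_shadow_prefix(struct vsie_prefix_bound *b,
			       uint64_t mso, uint64_t prefix)
{
	uint64_t end = mso + prefix + 2 * PAGE_SIZE; /* prefix area spans two pages */

	if (end > b->max_prefix_end)
		b->max_prefix_end = end;
}

/* Fast path for the notifier: ranges above the bound cannot hit any prefix. */
static bool range_may_hit_prefix(const struct vsie_prefix_bound *b,
				 uint64_t start)
{
	return start < b->max_prefix_end;
}
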
Nico Boehr Nov. 3, 2023, 6:50 a.m. UTC | #2
Quoting Claudio Imbrenda (2023-11-02 16:35:49)
> When the host invalidates a guest page, it will also check if the page
> was used to map the prefix of any guest CPUs, in which case they are
> stopped and marked as needing a prefix refresh. Upon starting the
> affected CPUs again, their prefix pages are explicitly faulted in and
> revalidated if they had been invalidated. A bit in the PGSTEs indicates
> whether or not a page might contain a prefix. The bit is allowed to
> overindicate. Pages above 2G are skipped, because they cannot be
> prefixes, since KVM runs all guests with MSO = 0.
> 
> The same applies for nested guests (VSIE). When the host invalidates a
> guest page that maps the prefix of the nested guest, it has to stop the
> affected nested guest CPUs and mark them as needing a prefix refresh.
> The same PGSTE bit used for the guest prefix is also used for the
> nested guest. Pages above 2G are skipped like for normal guests, which
> is the source of the bug.
> 
> The nested guest runs in the guest's primary address space. The guest
> could be running the nested guest using MSO != 0. If the MSO + prefix
> for the nested guest is above 2G, the check for nested prefix will skip
> it. This will cause the invalidation notifier to not stop the CPUs of
> the nested guest and not mark them as needing refresh. When the nested
> guest is run again, its prefix will not be refreshed, since it has not
> been marked for refresh. This will cause a fatal validity intercept
> with VIR code 37.
> 
> Fix this by removing the check for 2G for nested guests. Now all
> invalidations of pages with the notify bit set will always scan the
> existing VSIE shadow state descriptors.
> 
> This makes it possible to catch invalidations of nested guest prefix mappings even
> when the prefix is above 2G in the guest virtual address space.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>

Tested-by: Nico Boehr <nrb@linux.ibm.com>
Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
Claudio Imbrenda Nov. 3, 2023, 2:21 p.m. UTC | #3
On Thu, 2 Nov 2023 21:38:43 +0100
David Hildenbrand <david@redhat.com> wrote:

[...]

> >   	/*
> >   	 * Only new shadow blocks are added to the list during runtime,
> >   	 * therefore we can safely reference them all the time.  
> 
> Right, mso is 64bit, the prefix is 18bit (shifted by 13) -> 31bit.
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>

thanks

> 
> Does it make sense to remember the maximum prefix address across all 
> shadows (or if the mso was ever 0), so we can optimize for the mso == 0 
> case and not have to go through all vsie pages on all notification? 
> Sounds like a reasonable optimization on top.
 
yes, but it adds complexity to already complex code
David Hildenbrand Nov. 3, 2023, 2:22 p.m. UTC | #4
On 03.11.23 15:21, Claudio Imbrenda wrote:
> On Thu, 2 Nov 2023 21:38:43 +0100
> David Hildenbrand <david@redhat.com> wrote:
> 
> [...]
> 
>>>    	/*
>>>    	 * Only new shadow blocks are added to the list during runtime,
>>>    	 * therefore we can safely reference them all the time.
>>
>> Right, mso is 64bit, the prefix is 18bit (shifted by 13) -> 31bit.
>>
>> Reviewed-by: David Hildenbrand <david@redhat.com>
> 
> thanks
> 
>>
>> Does it make sense to remember the maximum prefix address across all
>> shadows (or if the mso was ever 0), so we can optimize for the mso == 0
>> case and not have to go through all vsie pages on all notification?
>> Sounds like a reasonable optimization on top.
>   
> yes, but it adds complexity to already complex code
> 

Such an optimization would be very simplistic.

Patch

diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
index 61499293c2ac..e55f489e1fb7 100644
--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -587,10 +587,6 @@  void kvm_s390_vsie_gmap_notifier(struct gmap *gmap, unsigned long start,
 
 	if (!gmap_is_shadow(gmap))
 		return;
-	if (start >= 1UL << 31)
-		/* We are only interested in prefix pages */
-		return;
-
 	/*
 	 * Only new shadow blocks are added to the list during runtime,
 	 * therefore we can safely reference them all the time.