Message ID | 20230421085036.52511-2-imbrenda@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: s390: pv: fix asynchronous teardown for small VMs | expand |
On 4/21/23 10:50, Claudio Imbrenda wrote: > On machines without the Destroy Secure Configuration Fast UVC, the > topmost level of page tables is set aside and freed asynchronously > as last step of the asynchronous teardown. > > Each gmap has a host_to_guest radix tree mapping host (userspace) > addresses (with 1M granularity) to gmap segment table entries (pmds). > > If a guest is smaller than 2GB, the topmost level of page tables is the > segment table (i.e. there are only 2 levels). Replacing it means that > the pointers in the host_to_guest mapping would become stale and cause > all kinds of nasty issues. > > This patch fixes the issue by disallowing asynchronous teardown for > guests with only 2 levels of page tables. Userspace should (and already > does) try using the normal destroy if the asynchronous one fails. > > Update s390_replace_asce so it refuses to replace segment type ASCEs. > This is still needed in case the normal destroy VM fails. > > Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> > Fixes: fb491d5500a7 ("KVM: s390: pv: asynchronous destroy for reboot") Since QEMU will do a normal PV disable on a rc != 0 this should work out just fine. The less code to fix this, the better. Reviewed-by: Janosch Frank <frankja@linux.ibm.com> > --- > arch/s390/kvm/pv.c | 5 +++++ > arch/s390/mm/gmap.c | 7 +++++++ > 2 files changed, 12 insertions(+) > > diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c > index e032ebbf51b9..3ce5f4351156 100644 > --- a/arch/s390/kvm/pv.c > +++ b/arch/s390/kvm/pv.c > @@ -314,6 +314,11 @@ int kvm_s390_pv_set_aside(struct kvm *kvm, u16 *rc, u16 *rrc) > */ > if (kvm->arch.pv.set_aside) > return -EINVAL; > + > + /* Guest with segment type ASCE, refuse to destroy asynchronously */ > + if ((kvm->arch.gmap->asce & _ASCE_TYPE_MASK) == _ASCE_TYPE_SEGMENT) > + return -EINVAL; > + > priv = kzalloc(sizeof(*priv), GFP_KERNEL); > if (!priv) > return -ENOMEM; > diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c > index 5a716bdcba05..2267cf9819b2 100644 > --- a/arch/s390/mm/gmap.c > +++ b/arch/s390/mm/gmap.c > @@ -2833,6 +2833,9 @@ EXPORT_SYMBOL_GPL(s390_unlist_old_asce); > * s390_replace_asce - Try to replace the current ASCE of a gmap with a copy > * @gmap: the gmap whose ASCE needs to be replaced > * > + * If the ASCE is a SEGMENT type then this function will return -EINVAL, > + * otherwise the pointers in the host_to_guest radix tree will keep pointing > + * to the wrong pages, causing use-after-free and memory corruption. > * If the allocation of the new top level page table fails, the ASCE is not > * replaced. > * In any case, the old ASCE is always removed from the gmap CRST list. > @@ -2847,6 +2850,10 @@ int s390_replace_asce(struct gmap *gmap) > > s390_unlist_old_asce(gmap); > > + /* Replacing segment type ASCEs would cause serious issues */ > + if ((gmap->asce & _ASCE_TYPE_MASK) == _ASCE_TYPE_SEGMENT) > + return -EINVAL; > + > page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER); > if (!page) > return -ENOMEM;
Claudio Imbrenda <imbrenda@linux.ibm.com> writes: > On machines without the Destroy Secure Configuration Fast UVC, the > topmost level of page tables is set aside and freed asynchronously > as last step of the asynchronous teardown. > > Each gmap has a host_to_guest radix tree mapping host (userspace) > addresses (with 1M granularity) to gmap segment table entries (pmds). > > If a guest is smaller than 2GB, the topmost level of page tables is the > segment table (i.e. there are only 2 levels). Replacing it means that > the pointers in the host_to_guest mapping would become stale and cause > all kinds of nasty issues. > > This patch fixes the issue by disallowing asynchronous teardown for > guests with only 2 levels of page tables. Userspace should (and already > does) try using the normal destroy if the asynchronous one fails. > > Update s390_replace_asce so it refuses to replace segment type ASCEs. > This is still needed in case the normal destroy VM fails. > > Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> > Fixes: fb491d5500a7 ("KVM: s390: pv: asynchronous destroy for reboot") > --- > arch/s390/kvm/pv.c | 5 +++++ > arch/s390/mm/gmap.c | 7 +++++++ > 2 files changed, 12 insertions(+) > > diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c > index e032ebbf51b9..3ce5f4351156 100644 > --- a/arch/s390/kvm/pv.c > +++ b/arch/s390/kvm/pv.c > @@ -314,6 +314,11 @@ int kvm_s390_pv_set_aside(struct kvm *kvm, u16 *rc, u16 *rrc) > */ > if (kvm->arch.pv.set_aside) > return -EINVAL; > + > + /* Guest with segment type ASCE, refuse to destroy asynchronously */ > + if ((kvm->arch.gmap->asce & _ASCE_TYPE_MASK) == _ASCE_TYPE_SEGMENT) > + return -EINVAL; > + > priv = kzalloc(sizeof(*priv), GFP_KERNEL); > if (!priv) > return -ENOMEM; > diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c > index 5a716bdcba05..2267cf9819b2 100644 > --- a/arch/s390/mm/gmap.c > +++ b/arch/s390/mm/gmap.c > @@ -2833,6 +2833,9 @@ EXPORT_SYMBOL_GPL(s390_unlist_old_asce); > * s390_replace_asce - Try to replace the current ASCE of a gmap with a copy > * @gmap: the gmap whose ASCE needs to be replaced > * > + * If the ASCE is a SEGMENT type then this function will return -EINVAL, > + * otherwise the pointers in the host_to_guest radix tree will keep pointing > + * to the wrong pages, causing use-after-free and memory corruption. > * If the allocation of the new top level page table fails, the ASCE is not > * replaced. > * In any case, the old ASCE is always removed from the gmap CRST list. > @@ -2847,6 +2850,10 @@ int s390_replace_asce(struct gmap *gmap) > > s390_unlist_old_asce(gmap); > > + /* Replacing segment type ASCEs would cause serious issues */ > + if ((gmap->asce & _ASCE_TYPE_MASK) == _ASCE_TYPE_SEGMENT) > + return -EINVAL; As discussed... not sure if this is a valid scenario or if it can be considered a bug if it happens. > + > page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER); > if (!page) > return -ENOMEM; > -- > 2.40.0 IMO, much better. Reviewed-by: Marc Hartmayer <mhartmay@linux.ibm.com>
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c index e032ebbf51b9..3ce5f4351156 100644 --- a/arch/s390/kvm/pv.c +++ b/arch/s390/kvm/pv.c @@ -314,6 +314,11 @@ int kvm_s390_pv_set_aside(struct kvm *kvm, u16 *rc, u16 *rrc) */ if (kvm->arch.pv.set_aside) return -EINVAL; + + /* Guest with segment type ASCE, refuse to destroy asynchronously */ + if ((kvm->arch.gmap->asce & _ASCE_TYPE_MASK) == _ASCE_TYPE_SEGMENT) + return -EINVAL; + priv = kzalloc(sizeof(*priv), GFP_KERNEL); if (!priv) return -ENOMEM; diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c index 5a716bdcba05..2267cf9819b2 100644 --- a/arch/s390/mm/gmap.c +++ b/arch/s390/mm/gmap.c @@ -2833,6 +2833,9 @@ EXPORT_SYMBOL_GPL(s390_unlist_old_asce); * s390_replace_asce - Try to replace the current ASCE of a gmap with a copy * @gmap: the gmap whose ASCE needs to be replaced * + * If the ASCE is a SEGMENT type then this function will return -EINVAL, + * otherwise the pointers in the host_to_guest radix tree will keep pointing + * to the wrong pages, causing use-after-free and memory corruption. * If the allocation of the new top level page table fails, the ASCE is not * replaced. * In any case, the old ASCE is always removed from the gmap CRST list. @@ -2847,6 +2850,10 @@ int s390_replace_asce(struct gmap *gmap) s390_unlist_old_asce(gmap); + /* Replacing segment type ASCEs would cause serious issues */ + if ((gmap->asce & _ASCE_TYPE_MASK) == _ASCE_TYPE_SEGMENT) + return -EINVAL; + page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER); if (!page) return -ENOMEM;
On machines without the Destroy Secure Configuration Fast UVC, the topmost level of page tables is set aside and freed asynchronously as last step of the asynchronous teardown. Each gmap has a host_to_guest radix tree mapping host (userspace) addresses (with 1M granularity) to gmap segment table entries (pmds). If a guest is smaller than 2GB, the topmost level of page tables is the segment table (i.e. there are only 2 levels). Replacing it means that the pointers in the host_to_guest mapping would become stale and cause all kinds of nasty issues. This patch fixes the issue by disallowing asynchronous teardown for guests with only 2 levels of page tables. Userspace should (and already does) try using the normal destroy if the asynchronous one fails. Update s390_replace_asce so it refuses to replace segment type ASCEs. This is still needed in case the normal destroy VM fails. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Fixes: fb491d5500a7 ("KVM: s390: pv: asynchronous destroy for reboot") --- arch/s390/kvm/pv.c | 5 +++++ arch/s390/mm/gmap.c | 7 +++++++ 2 files changed, 12 insertions(+)