Message ID | 20221005163258.117232-2-nrb@linux.ibm.com |
---|---|
State | Superseded |
Series | KVM: s390: pv: fix clock comparator late after suspend/resume |
On Wed, 5 Oct 2022 18:32:57 +0200 Nico Boehr <nrb@linux.ibm.com> wrote:

> When running under PV, the guest's TOD clock is under control of the
> ultravisor and the hypervisor isn't allowed to change it. Hence, don't
> allow userspace to change the guest's TOD clock by returning
> -EOPNOTSUPP.
>
> When userspace changes the guest's TOD clock, KVM updates its
> kvm.arch.epoch field and, in addition, the epoch field in all state
> descriptions of all VCPUs.
>
> But, under PV, the ultravisor will ignore the epoch field in the state
> description and simply overwrite it on next SIE exit with the actual
> guest epoch. This leads to KVM having an incorrect view of the guest's
> TOD clock: it has updated its internal kvm.arch.epoch field, but the
> ultravisor ignores the field in the state description.
>
> Whenever a guest is now waiting for a clock comparator, KVM will
> incorrectly calculate the time when the guest should wake up, possibly
> causing the guest to sleep for much longer than expected.
>
> With this change, kvm_s390_set_tod() will now take the kvm->lock to be
> able to call kvm_s390_pv_is_protected(). Since kvm_s390_set_tod_clock()
> also takes kvm->lock, use __kvm_s390_set_tod_clock() instead.
>
> Fixes: 0f3035047140 ("KVM: s390: protvirt: Do only reset registers that are accessible")
> Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
> Signed-off-by: Nico Boehr <nrb@linux.ibm.com>

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>

> ---
>  arch/s390/kvm/kvm-s390.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index b7ef0b71014d..0a8019b14c8f 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -1207,6 +1207,8 @@ static int kvm_s390_vm_get_migration(struct kvm *kvm,
>  	return 0;
>  }
>
> +static void __kvm_s390_set_tod_clock(struct kvm *kvm, const struct kvm_s390_vm_tod_clock *gtod);
> +
>  static int kvm_s390_set_tod_ext(struct kvm *kvm, struct kvm_device_attr *attr)
>  {
>  	struct kvm_s390_vm_tod_clock gtod;
> @@ -1216,7 +1218,7 @@ static int kvm_s390_set_tod_ext(struct kvm *kvm, struct kvm_device_attr *attr)
>
>  	if (!test_kvm_facility(kvm, 139) && gtod.epoch_idx)
>  		return -EINVAL;
> -	kvm_s390_set_tod_clock(kvm, &gtod);
> +	__kvm_s390_set_tod_clock(kvm, &gtod);
>
>  	VM_EVENT(kvm, 3, "SET: TOD extension: 0x%x, TOD base: 0x%llx",
>  		gtod.epoch_idx, gtod.tod);
> @@ -1247,7 +1249,7 @@ static int kvm_s390_set_tod_low(struct kvm *kvm, struct kvm_device_attr *attr)
>  			   sizeof(gtod.tod)))
>  		return -EFAULT;
>
> -	kvm_s390_set_tod_clock(kvm, &gtod);
> +	__kvm_s390_set_tod_clock(kvm, &gtod);
>  	VM_EVENT(kvm, 3, "SET: TOD base: 0x%llx", gtod.tod);
>  	return 0;
>  }
> @@ -1259,6 +1261,12 @@ static int kvm_s390_set_tod(struct kvm *kvm, struct kvm_device_attr *attr)
>  	if (attr->flags)
>  		return -EINVAL;
>
> +	mutex_lock(&kvm->lock);
> +	if (kvm_s390_pv_is_protected(kvm)) {
> +		ret = -EOPNOTSUPP;
> +		goto out_unlock;
> +	}
> +
>  	switch (attr->attr) {
>  	case KVM_S390_VM_TOD_EXT:
>  		ret = kvm_s390_set_tod_ext(kvm, attr);
> @@ -1273,6 +1281,9 @@ static int kvm_s390_set_tod(struct kvm *kvm, struct kvm_device_attr *attr)
>  		ret = -ENXIO;
>  		break;
>  	}
> +
> +out_unlock:
> +	mutex_unlock(&kvm->lock);
>  	return ret;
>  }
>
On 10/5/22 18:32, Nico Boehr wrote:
> When running under PV, the guest's TOD clock is under control of the
> ultravisor and the hypervisor isn't allowed to change it. Hence, don't
> allow userspace to change the guest's TOD clock by returning
> -EOPNOTSUPP.
>
> When userspace changes the guest's TOD clock, KVM updates its
> kvm.arch.epoch field and, in addition, the epoch field in all state
> descriptions of all VCPUs.
>
> But, under PV, the ultravisor will ignore the epoch field in the state
> description and simply overwrite it on next SIE exit with the actual
> guest epoch. This leads to KVM having an incorrect view of the guest's
> TOD clock: it has updated its internal kvm.arch.epoch field, but the
> ultravisor ignores the field in the state description.
>
> Whenever a guest is now waiting for a clock comparator, KVM will
> incorrectly calculate the time when the guest should wake up, possibly
> causing the guest to sleep for much longer than expected.
>
> With this change, kvm_s390_set_tod() will now take the kvm->lock to be
> able to call kvm_s390_pv_is_protected(). Since kvm_s390_set_tod_clock()
> also takes kvm->lock, use __kvm_s390_set_tod_clock() instead.
>
> Fixes: 0f3035047140 ("KVM: s390: protvirt: Do only reset registers that are accessible")
> Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
> Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
> ---
>  arch/s390/kvm/kvm-s390.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)

This will ONLY result in a warning and there's no way that this can
result in QEMU crashing, right?

>
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index b7ef0b71014d..0a8019b14c8f 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -1207,6 +1207,8 @@ static int kvm_s390_vm_get_migration(struct kvm *kvm,
>  	return 0;
>  }
>
> +static void __kvm_s390_set_tod_clock(struct kvm *kvm, const struct kvm_s390_vm_tod_clock *gtod);
> +
>  static int kvm_s390_set_tod_ext(struct kvm *kvm, struct kvm_device_attr *attr)
>  {
>  	struct kvm_s390_vm_tod_clock gtod;
> @@ -1216,7 +1218,7 @@ static int kvm_s390_set_tod_ext(struct kvm *kvm, struct kvm_device_attr *attr)
>
>  	if (!test_kvm_facility(kvm, 139) && gtod.epoch_idx)
>  		return -EINVAL;
> -	kvm_s390_set_tod_clock(kvm, &gtod);
> +	__kvm_s390_set_tod_clock(kvm, &gtod);
>
>  	VM_EVENT(kvm, 3, "SET: TOD extension: 0x%x, TOD base: 0x%llx",
>  		gtod.epoch_idx, gtod.tod);
> @@ -1247,7 +1249,7 @@ static int kvm_s390_set_tod_low(struct kvm *kvm, struct kvm_device_attr *attr)
>  			   sizeof(gtod.tod)))
>  		return -EFAULT;
>
> -	kvm_s390_set_tod_clock(kvm, &gtod);
> +	__kvm_s390_set_tod_clock(kvm, &gtod);
>  	VM_EVENT(kvm, 3, "SET: TOD base: 0x%llx", gtod.tod);
>  	return 0;
>  }
> @@ -1259,6 +1261,12 @@ static int kvm_s390_set_tod(struct kvm *kvm, struct kvm_device_attr *attr)
>  	if (attr->flags)
>  		return -EINVAL;
>

Add comment:
For a protected guest the TOD is managed by the Ultravisor so trying to
change it will never bring the expected results.

-EOPNOTSUPP is a new return code for the tod attribute, therefore
programs using it might need a fix to be able to handle it.

And as -EOPNOTSUPP has never been used before you'll also need to
update: Documentation/virt/kvm/devices/vm.rst

> +	mutex_lock(&kvm->lock);
> +	if (kvm_s390_pv_is_protected(kvm)) {
> +		ret = -EOPNOTSUPP;
> +		goto out_unlock;
> +	}
> +
>  	switch (attr->attr) {
>  	case KVM_S390_VM_TOD_EXT:
>  		ret = kvm_s390_set_tod_ext(kvm, attr);
> @@ -1273,6 +1281,9 @@ static int kvm_s390_set_tod(struct kvm *kvm, struct kvm_device_attr *attr)
>  		ret = -ENXIO;
>  		break;
>  	}
> +
> +out_unlock:
> +	mutex_unlock(&kvm->lock);
>  	return ret;
>  }
>
Quoting Janosch Frank (2022-10-10 17:20:10)
[...]
> This will ONLY result in a warning and there's no way that this can
> result in QEMU crashing, right?

Yes, QEMU code in hw/s390x/tod-kvm.c just sets an Error pointer which is
then passed to warn_report(). So no crash is possible.

> >
> > diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> > index b7ef0b71014d..0a8019b14c8f 100644
> > --- a/arch/s390/kvm/kvm-s390.c
> > +++ b/arch/s390/kvm/kvm-s390.c
> > @@ -1207,6 +1207,8 @@ static int kvm_s390_vm_get_migration(struct kvm *kvm,
> >  	return 0;
> >  }
> >
> > +static void __kvm_s390_set_tod_clock(struct kvm *kvm, const struct kvm_s390_vm_tod_clock *gtod);
> > +
> >  static int kvm_s390_set_tod_ext(struct kvm *kvm, struct kvm_device_attr *attr)
> >  {
> >  	struct kvm_s390_vm_tod_clock gtod;
> > @@ -1216,7 +1218,7 @@ static int kvm_s390_set_tod_ext(struct kvm *kvm, struct kvm_device_attr *attr)
> >
> >  	if (!test_kvm_facility(kvm, 139) && gtod.epoch_idx)
> >  		return -EINVAL;
> > -	kvm_s390_set_tod_clock(kvm, &gtod);
> > +	__kvm_s390_set_tod_clock(kvm, &gtod);
> >
> >  	VM_EVENT(kvm, 3, "SET: TOD extension: 0x%x, TOD base: 0x%llx",
> >  		gtod.epoch_idx, gtod.tod);
> > @@ -1247,7 +1249,7 @@ static int kvm_s390_set_tod_low(struct kvm *kvm, struct kvm_device_attr *attr)
> >  			   sizeof(gtod.tod)))
> >  		return -EFAULT;
> >
> > -	kvm_s390_set_tod_clock(kvm, &gtod);
> > +	__kvm_s390_set_tod_clock(kvm, &gtod);
> >  	VM_EVENT(kvm, 3, "SET: TOD base: 0x%llx", gtod.tod);
> >  	return 0;
> >  }
> > @@ -1259,6 +1261,12 @@ static int kvm_s390_set_tod(struct kvm *kvm, struct kvm_device_attr *attr)
> >  	if (attr->flags)
> >  		return -EINVAL;
> >
>
> Add comment:
> For a protected guest the TOD is managed by the Ultravisor so trying to
> change it will never bring the expected results.

Yes, good point. Done.

> -EOPNOTSUPP is a new return code for the tod attribute, therefore
> programs using it might need a fix to be able to handle it.

Hmm, yes indeed. Another alternative to consider might be -EINVAL. That
is already specified as a return for KVM_S390_VM_TOD_HIGH and
KVM_S390_VM_TOD_EXT (in different circumstances though). However, it's
missing from KVM_S390_VM_TOD_LOW...

> And as -EOPNOTSUPP has never been used before you'll also need to
> update: Documentation/virt/kvm/devices/vm.rst

Yeah, I will update the docs and use -EOPNOTSUPP for now. If someone
argues for -EINVAL, I can still change it.
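Nico's point that the new error code only surfaces as a warning in QEMU can be illustrated with a small caller-side sketch. It does not reproduce QEMU's `Error`/`warn_report()` code in hw/s390x/tod-kvm.c; `handle_set_tod_result()` and `enum tod_outcome` are hypothetical names, and the sketch assumes the caller has already turned a failed `ioctl()` into a negative errno value. The idea is only that `-EOPNOTSUPP` gets its own, non-fatal branch next to the pre-existing failure codes.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical outcome classification for a caller restoring the guest TOD. */
enum tod_outcome { TOD_SET, TOD_UNSUPPORTED, TOD_FAILED };

/*
 * ret is 0 or a negative errno value, as returned by the kernel handler;
 * a userspace caller would derive it from errno after a failed ioctl().
 */
static enum tod_outcome handle_set_tod_result(int ret)
{
	assert(ret <= 0);

	if (ret == 0)
		return TOD_SET;
	if (ret == -EOPNOTSUPP) {
		/* New with this patch: protected (PV) guests reject TOD changes. */
		fprintf(stderr, "warning: guest TOD cannot be set (protected guest)\n");
		return TOD_UNSUPPORTED;
	}
	/* Pre-existing failures such as -EINVAL or -EFAULT: warn, do not abort. */
	fprintf(stderr, "warning: setting guest TOD failed: %s\n", strerror(-ret));
	return TOD_FAILED;
}
```

Whether the kernel ends up returning `-EOPNOTSUPP` or `-EINVAL`, a caller written this way degrades to a warning rather than terminating the VM.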
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index b7ef0b71014d..0a8019b14c8f 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1207,6 +1207,8 @@ static int kvm_s390_vm_get_migration(struct kvm *kvm,
 	return 0;
 }

+static void __kvm_s390_set_tod_clock(struct kvm *kvm, const struct kvm_s390_vm_tod_clock *gtod);
+
 static int kvm_s390_set_tod_ext(struct kvm *kvm, struct kvm_device_attr *attr)
 {
 	struct kvm_s390_vm_tod_clock gtod;
@@ -1216,7 +1218,7 @@ static int kvm_s390_set_tod_ext(struct kvm *kvm, struct kvm_device_attr *attr)

 	if (!test_kvm_facility(kvm, 139) && gtod.epoch_idx)
 		return -EINVAL;
-	kvm_s390_set_tod_clock(kvm, &gtod);
+	__kvm_s390_set_tod_clock(kvm, &gtod);

 	VM_EVENT(kvm, 3, "SET: TOD extension: 0x%x, TOD base: 0x%llx",
 		gtod.epoch_idx, gtod.tod);
@@ -1247,7 +1249,7 @@ static int kvm_s390_set_tod_low(struct kvm *kvm, struct kvm_device_attr *attr)
 			   sizeof(gtod.tod)))
 		return -EFAULT;

-	kvm_s390_set_tod_clock(kvm, &gtod);
+	__kvm_s390_set_tod_clock(kvm, &gtod);
 	VM_EVENT(kvm, 3, "SET: TOD base: 0x%llx", gtod.tod);
 	return 0;
 }
@@ -1259,6 +1261,12 @@ static int kvm_s390_set_tod(struct kvm *kvm, struct kvm_device_attr *attr)
 	if (attr->flags)
 		return -EINVAL;

+	mutex_lock(&kvm->lock);
+	if (kvm_s390_pv_is_protected(kvm)) {
+		ret = -EOPNOTSUPP;
+		goto out_unlock;
+	}
+
 	switch (attr->attr) {
 	case KVM_S390_VM_TOD_EXT:
 		ret = kvm_s390_set_tod_ext(kvm, attr);
@@ -1273,6 +1281,9 @@ static int kvm_s390_set_tod(struct kvm *kvm, struct kvm_device_attr *attr)
 		ret = -ENXIO;
 		break;
 	}
+
+out_unlock:
+	mutex_unlock(&kvm->lock);
 	return ret;
 }
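The locking change follows the common kernel pattern of a locked entry point plus an unlocked `__` helper: the commit message notes that `kvm->lock` must be held to call `kvm_s390_pv_is_protected()`, so the setters switch to `__kvm_s390_set_tod_clock()` instead of the variant that takes the lock itself. The userspace sketch below models that shape with a pthread mutex. `struct vm`, `set_tod()` and `set_tod_clock_locked()` are hypothetical stand-ins for `struct kvm`, `kvm_s390_set_tod()` and `__kvm_s390_set_tod_clock()`, and the epoch is simplified to `guest_tod - host_tod` (no epoch index, no per-vCPU state descriptions).

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kvm, reduced to what this sketch needs. */
struct vm {
	pthread_mutex_t lock;	/* models kvm->lock; not recursive */
	bool lock_held;		/* crude stand-in for lockdep tracking */
	bool protected_guest;	/* models kvm_s390_pv_is_protected() */
	uint64_t epoch;		/* models kvm->arch.epoch */
};

/* Unlocked helper (kernel naming: __kvm_s390_set_tod_clock); caller holds vm->lock. */
static void set_tod_clock_locked(struct vm *vm, uint64_t guest_tod, uint64_t host_tod)
{
	assert(vm->lock_held);			/* cf. lockdep_assert_held() */
	vm->epoch = guest_tod - host_tod;	/* wraps modulo 2^64 */
}

/* Locked entry point (kernel naming: kvm_s390_set_tod). */
static int set_tod(struct vm *vm, uint64_t guest_tod, uint64_t host_tod)
{
	int ret = 0;

	pthread_mutex_lock(&vm->lock);
	vm->lock_held = true;
	/* The protection check and the epoch update share one critical section. */
	if (vm->protected_guest) {
		ret = -EOPNOTSUPP;	/* the guest TOD belongs to the ultravisor */
		goto out_unlock;
	}
	/* Calling a variant that locks vm->lock again here would self-deadlock. */
	set_tod_clock_locked(vm, guest_tod, host_tod);
out_unlock:
	vm->lock_held = false;
	pthread_mutex_unlock(&vm->lock);
	return ret;
}
```

Assuming anything that changes the protection state also takes the same lock, a guest cannot become protected between the check and the update. Re-acquiring a non-recursive mutex inside the critical section, as the old `kvm_s390_set_tod_clock()` call would have done, deadlocks instead (and is undefined behaviour for a default pthread mutex).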
When running under PV, the guest's TOD clock is under control of the
ultravisor and the hypervisor isn't allowed to change it. Hence, don't
allow userspace to change the guest's TOD clock by returning
-EOPNOTSUPP.

When userspace changes the guest's TOD clock, KVM updates its
kvm.arch.epoch field and, in addition, the epoch field in all state
descriptions of all VCPUs.

But, under PV, the ultravisor will ignore the epoch field in the state
description and simply overwrite it on next SIE exit with the actual
guest epoch. This leads to KVM having an incorrect view of the guest's
TOD clock: it has updated its internal kvm.arch.epoch field, but the
ultravisor ignores the field in the state description.

Whenever a guest is now waiting for a clock comparator, KVM will
incorrectly calculate the time when the guest should wake up, possibly
causing the guest to sleep for much longer than expected.

With this change, kvm_s390_set_tod() will now take the kvm->lock to be
able to call kvm_s390_pv_is_protected(). Since kvm_s390_set_tod_clock()
also takes kvm->lock, use __kvm_s390_set_tod_clock() instead.

Fixes: 0f3035047140 ("KVM: s390: protvirt: Do only reset registers that are accessible")
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
---
 arch/s390/kvm/kvm-s390.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)
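The late wake-up described in the commit message comes down to TOD arithmetic. The sketch below is loosely modelled on how KVM derives a waiting vCPU's sleep time: the guest's current TOD is taken as host TOD plus the epoch, and the vCPU sleeps for the difference to the clock comparator. `wakeup_delay_ns()` is a hypothetical name, and the model ignores the multiple-epoch facility. If KVM's epoch lags the one the ultravisor actually uses, KVM underestimates the guest's current time and oversleeps by exactly that lag.

```c
#include <assert.h>
#include <stdint.h>

/* One TOD clock unit is 1/4096 microsecond (bit 51 of the TOD clock = 1 us). */
#define TOD_UNITS_PER_US 4096ULL

/* 4096 TOD units are 1000 ns, hence the 125/512 factor used below. */
static_assert(TOD_UNITS_PER_US * 125 / 512 == 1000, "TOD to ns factor");

/* ns = tod * 125 / 512, split into high and low parts to avoid overflow. */
static uint64_t tod_to_ns(uint64_t tod)
{
	return ((tod >> 9) * 125) + (((tod & 0x1ff) * 125) >> 9);
}

/*
 * Hypothetical model of the sleep-time calculation: the guest's current TOD
 * is host TOD + epoch, and the vCPU sleeps until the clock comparator (ckc).
 */
static uint64_t wakeup_delay_ns(uint64_t host_tod, uint64_t epoch, uint64_t ckc)
{
	uint64_t guest_now = host_tod + epoch;	/* wraps modulo 2^64 */

	if (ckc <= guest_now)
		return 0;			/* comparator already passed */
	return tod_to_ns(ckc - guest_now);
}
```

For example, with a comparator set 1 ms ahead of the guest's real time and a KVM epoch 2 ms behind the ultravisor's, the computed delay is 3 ms instead of 1 ms.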