Message ID | 20220301143340.111129-1-imbrenda@linux.ibm.com (mailing list archive)
---|---
State | New, archived
Series | [v1,1/1] KVM: s390x: fix SCK locking
On 3/1/22 15:33, Claudio Imbrenda wrote:
> When handling the SCK instruction, the kvm lock is taken, even though
> the vcpu lock is already being held. [...]
>
> Fixes: 6a3f95a6b04c ("KVM: s390: Intercept SCK instruction")
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>

Reviewed-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>

> [...]
> +int kvm_s390_try_set_tod_clock(struct kvm *kvm, const struct kvm_s390_vm_tod_clock *gtod)

Why int instead of bool?

> +{
> +	if (!mutex_trylock(&kvm->lock))
> +		return 0;
> +	__kvm_s390_set_tod_clock(kvm, gtod);
> 	mutex_unlock(&kvm->lock);
> +	return 1;
> }
[...]
On Wed, 2 Mar 2022 11:15:23 +0100
Janis Schoetterl-Glausch <scgl@linux.ibm.com> wrote:

> On 3/1/22 15:33, Claudio Imbrenda wrote:
> > When handling the SCK instruction, the kvm lock is taken, even though
> > the vcpu lock is already being held. [...]
> >
> > +int kvm_s390_try_set_tod_clock(struct kvm *kvm, const struct kvm_s390_vm_tod_clock *gtod)
>
> Why int instead of bool?

to be consistent with mutex_trylock, which also returns int

> [...]
On 01.03.22 15:33, Claudio Imbrenda wrote:
> When handling the SCK instruction, the kvm lock is taken, even though
> the vcpu lock is already being held. [...]
>
> Fixes: 6a3f95a6b04c ("KVM: s390: Intercept SCK instruction")
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>

> [...]
On 01.03.22 15:33, Claudio Imbrenda wrote:
> When handling the SCK instruction, the kvm lock is taken, even though
> the vcpu lock is already being held. [...]
>
> Fixes: 6a3f95a6b04c ("KVM: s390: Intercept SCK instruction")
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>

Thanks, applied. I will also add cc:stable when queueing.

> [...]
On 01.03.22 15:33, Claudio Imbrenda wrote:
> When handling the SCK instruction, the kvm lock is taken, even though
> the vcpu lock is already being held. [...]
>
> 	VCPU_EVENT(vcpu, 3, "SCK: setting guest TOD to 0x%llx", gtod.tod);
> -	kvm_s390_set_tod_clock(vcpu->kvm, &gtod);
> +	/*
> +	 * To set the TOD clock we need to take the kvm lock, but we are
> +	 * already holding the vcpu lock, and the usual lock order is the
> +	 * opposite. Therefore we use trylock instead of lock, and if the
> +	 * kvm lock cannot be taken, we retry the instruction and return
> +	 * -EAGAIN to userspace, thus freeing the vcpu lock.
> +	 * The SCK instruction is considered legacy and at this point it's
> +	 * not worth the effort to find a nicer solution.
> +	 */

To comply more with the usual comment style (no "we"/"us") and to give more
context on why SCK is legacy, I will slightly modify the comment before
sending it out:

/*
 * To set the TOD clock the kvm lock must be taken, but the vcpu
 * lock is already held in handle_set_clock. The usual lock order
 * is the opposite.
 * As SCK is deprecated and should not be used in several cases
 * like the existence of the multiple epoch facility or TOD clock
 * steering (see Principles of Operation) a slow path can be used.
 * If the lock can not be taken via try_lock, the instruction will
 * be retried via -EAGAIN at a later point in time.
 */

Ok with everybody?

> +	if (!kvm_s390_try_set_tod_clock(vcpu->kvm, &gtod)) {
> +		kvm_s390_retry_instr(vcpu);
> +		return -EAGAIN;
> +	}
>
> 	kvm_s390_set_psw_cc(vcpu, 0);
> 	return 0;
On 3/14/22 14:33, Christian Borntraeger wrote:
> On 01.03.22 15:33, Claudio Imbrenda wrote:
>> When handling the SCK instruction, the kvm lock is taken, even though
>> the vcpu lock is already being held. [...]
>
> To comply more with the usual comment style (no "we"/"us") and to give more
> context on why SCK is legacy, I will slightly modify the comment before
> sending it out:
>
> /*
>  * To set the TOD clock the kvm lock must be taken, but the vcpu
>  * lock is already held in handle_set_clock. The usual lock order
>  * is the opposite.
>  * As SCK is deprecated and should not be used in several cases
>  * like the existence of the multiple epoch facility or TOD clock
>  * steering (see Principles of Operation) a slow path can be used.
>  * If the lock can not be taken via try_lock, the instruction will
>  * be retried via -EAGAIN at a later point in time.
>  */

I think you'd want commas around that clause, i.e.

 * As SCK is deprecated and should not be used in several cases,
 * for example when the multiple-epoch or the TOD-clock-steering
 * facility is installed (see Principles of Operation),
 * a slow path can be used.

> Ok with everybody?
On Mon, 14 Mar 2022 15:02:13 +0100
Janis Schoetterl-Glausch <scgl@linux.ibm.com> wrote:

> On 3/14/22 14:33, Christian Borntraeger wrote:
> > On 01.03.22 15:33, Claudio Imbrenda wrote:
> >> [...]
> >
> > To comply more with the usual comment style (no "we"/"us") and to give
> > more context on why SCK is legacy, I will slightly modify the comment
> > before sending it out:
> > [...]
>
> I think you'd want commas around that clause, i.e.
>
>  * As SCK is deprecated and should not be used in several cases,
>  * for example when the multiple-epoch or the TOD-clock-steering
>  * facility is installed (see Principles of Operation),
>  * a slow path can be used.

+1, looks good with the commas

> > Ok with everybody?
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 2296b1ff1e02..4e3db4004bfd 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -3869,14 +3869,12 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-void kvm_s390_set_tod_clock(struct kvm *kvm,
-			    const struct kvm_s390_vm_tod_clock *gtod)
+static void __kvm_s390_set_tod_clock(struct kvm *kvm, const struct kvm_s390_vm_tod_clock *gtod)
 {
 	struct kvm_vcpu *vcpu;
 	union tod_clock clk;
 	unsigned long i;
 
-	mutex_lock(&kvm->lock);
 	preempt_disable();
 
 	store_tod_clock_ext(&clk);
@@ -3897,7 +3895,22 @@ void kvm_s390_set_tod_clock(struct kvm *kvm,
 
 	kvm_s390_vcpu_unblock_all(kvm);
 	preempt_enable();
+}
+
+void kvm_s390_set_tod_clock(struct kvm *kvm, const struct kvm_s390_vm_tod_clock *gtod)
+{
+	mutex_lock(&kvm->lock);
+	__kvm_s390_set_tod_clock(kvm, gtod);
+	mutex_unlock(&kvm->lock);
+}
+
+int kvm_s390_try_set_tod_clock(struct kvm *kvm, const struct kvm_s390_vm_tod_clock *gtod)
+{
+	if (!mutex_trylock(&kvm->lock))
+		return 0;
+	__kvm_s390_set_tod_clock(kvm, gtod);
 	mutex_unlock(&kvm->lock);
+	return 1;
 }
 
 /**
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 098831e815e6..f2c910763d7f 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -349,8 +349,8 @@ int kvm_s390_handle_sigp(struct kvm_vcpu *vcpu);
 int kvm_s390_handle_sigp_pei(struct kvm_vcpu *vcpu);
 
 /* implemented in kvm-s390.c */
-void kvm_s390_set_tod_clock(struct kvm *kvm,
-			    const struct kvm_s390_vm_tod_clock *gtod);
+void kvm_s390_set_tod_clock(struct kvm *kvm, const struct kvm_s390_vm_tod_clock *gtod);
+int kvm_s390_try_set_tod_clock(struct kvm *kvm, const struct kvm_s390_vm_tod_clock *gtod);
 long kvm_arch_fault_in_page(struct kvm_vcpu *vcpu, gpa_t gpa, int writable);
 int kvm_s390_store_status_unloaded(struct kvm_vcpu *vcpu, unsigned long addr);
 int kvm_s390_vcpu_store_status(struct kvm_vcpu *vcpu, unsigned long addr);
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 417154b314a6..7f3e7990ef82 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -102,7 +102,19 @@ static int handle_set_clock(struct kvm_vcpu *vcpu)
 		return kvm_s390_inject_prog_cond(vcpu, rc);
 
 	VCPU_EVENT(vcpu, 3, "SCK: setting guest TOD to 0x%llx", gtod.tod);
-	kvm_s390_set_tod_clock(vcpu->kvm, &gtod);
+	/*
+	 * To set the TOD clock we need to take the kvm lock, but we are
+	 * already holding the vcpu lock, and the usual lock order is the
+	 * opposite. Therefore we use trylock instead of lock, and if the
+	 * kvm lock cannot be taken, we retry the instruction and return
+	 * -EAGAIN to userspace, thus freeing the vcpu lock.
+	 * The SCK instruction is considered legacy and at this point it's
+	 * not worth the effort to find a nicer solution.
+	 */
+	if (!kvm_s390_try_set_tod_clock(vcpu->kvm, &gtod)) {
+		kvm_s390_retry_instr(vcpu);
+		return -EAGAIN;
+	}
 
 	kvm_s390_set_psw_cc(vcpu, 0);
 	return 0;
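For context on how the -EAGAIN reaches userspace: with this change, KVM_RUN can
now fail with EAGAIN while the guest is retrying the SCK instruction, and the
VMM is expected to simply re-enter the vcpu. The fragment below is only a
hypothetical sketch of such a run loop; the run_vcpu helper, its error handling
and the vcpu file-descriptor setup are illustrative, not part of this patch or
of any particular VMM.

/* Minimal, illustrative KVM run loop that retries on EAGAIN/EINTR. */
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int run_vcpu(int vcpu_fd, struct kvm_run *run)
{
	for (;;) {
		if (ioctl(vcpu_fd, KVM_RUN, 0) < 0) {
			/* Transient conditions such as the SCK retry: re-enter the guest. */
			if (errno == EAGAIN || errno == EINTR)
				continue;
			return -errno;	/* real error */
		}
		switch (run->exit_reason) {
		case KVM_EXIT_HLT:
			return 0;
		default:
			/* I/O, MMIO, etc. would be handled here (omitted). */
			break;
		}
	}
}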
When handling the SCK instruction, the kvm lock is taken, even though the
vcpu lock is already being held. The normal locking order is kvm lock
first and then vcpu lock. This can (and in some circumstances does) lead
to deadlocks.

The function kvm_s390_set_tod_clock is called both by the SCK handler and
by some IOCTLs to set the clock. The IOCTLs do not hold the vcpu lock, so
they can safely take the kvm lock. The SCK handler holds the vcpu lock,
but also needs to acquire the kvm lock without relinquishing the vcpu
lock.

The solution is to factor out the code to set the clock and provide two
wrappers. One is called like the original function and does the locking;
the other is called kvm_s390_try_set_tod_clock and uses trylock to try to
acquire the kvm lock. This new wrapper is then used in the SCK handler.
If locking fails, -EAGAIN is returned, which is eventually propagated to
userspace, thus also freeing the vcpu lock and allowing for forward
progress.

This is not the most efficient or elegant way to solve this issue, but
the SCK instruction is deprecated and its performance is not critical.

The goal of this patch is just to provide a simple but correct way to fix
the bug.

Fixes: 6a3f95a6b04c ("KVM: s390: Intercept SCK instruction")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/kvm/kvm-s390.c | 19 ++++++++++++++++---
 arch/s390/kvm/kvm-s390.h |  4 ++--
 arch/s390/kvm/priv.c     | 14 +++++++++++++-
 3 files changed, 31 insertions(+), 6 deletions(-)
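The lock inversion described in the first paragraph can be reproduced outside
the kernel. The sketch below uses pthread mutexes as stand-ins for the kvm and
vcpu locks purely to illustrate the ABBA ordering; it is not the actual KVM
code path, and the names normal_order_path and sck_path are made up for this
example.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t kvm_lock = PTHREAD_MUTEX_INITIALIZER;	/* stand-in for the kvm lock */
static pthread_mutex_t vcpu_lock = PTHREAD_MUTEX_INITIALIZER;	/* stand-in for the vcpu lock */

/* A task following the normal order: kvm lock first, then vcpu lock. */
static void *normal_order_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&kvm_lock);
	pthread_mutex_lock(&vcpu_lock);		/* may block on sck_path below... */
	pthread_mutex_unlock(&vcpu_lock);
	pthread_mutex_unlock(&kvm_lock);
	return NULL;
}

/* Pre-patch SCK handler: vcpu lock already held, then the kvm lock is taken. */
static void *sck_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&vcpu_lock);
	pthread_mutex_lock(&kvm_lock);		/* ...while this blocks on normal_order_path: deadlock */
	pthread_mutex_unlock(&kvm_lock);
	pthread_mutex_unlock(&vcpu_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, normal_order_path, NULL);
	pthread_create(&b, NULL, sck_path, NULL);
	pthread_join(a, NULL);	/* with unlucky interleaving, this never returns */
	pthread_join(b, NULL);
	puts("no deadlock this time");
	return 0;
}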