Message ID | 20230823124140.3839373-1-mimu@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: s390: fix gisa destroy operation might lead to cpu stalls | expand |
On Wed, Aug 23, 2023 at 02:41:40PM +0200, Michael Mueller wrote: ... > diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c > index 9bd0a873f3b1..73153bea6c24 100644 > --- a/arch/s390/kvm/interrupt.c > +++ b/arch/s390/kvm/interrupt.c > @@ -3205,8 +3205,10 @@ void kvm_s390_gisa_destroy(struct kvm *kvm) > if (gi->alert.mask) > KVM_EVENT(3, "vm 0x%pK has unexpected iam 0x%02x", > kvm, gi->alert.mask); > - while (gisa_in_alert_list(gi->origin)) > - cpu_relax(); > + while (gisa_in_alert_list(gi->origin)) { > + KVM_EVENT(3, "vm 0x%pK gisa in alert list during destroy", kvm); > + process_gib_alert_list(); process_gib_alert_list() has two nested loops and neither of them does cpu_relax(). I guess, those are needed instead of one you remove? > + } > hrtimer_cancel(&gi->timer); > gi->origin = NULL; > VM_EVENT(kvm, 3, "gisa 0x%pK destroyed", gisa);
On 23.08.23 15:23, Alexander Gordeev wrote: > On Wed, Aug 23, 2023 at 02:41:40PM +0200, Michael Mueller wrote: > ... >> diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c >> index 9bd0a873f3b1..73153bea6c24 100644 >> --- a/arch/s390/kvm/interrupt.c >> +++ b/arch/s390/kvm/interrupt.c >> @@ -3205,8 +3205,10 @@ void kvm_s390_gisa_destroy(struct kvm *kvm) >> if (gi->alert.mask) >> KVM_EVENT(3, "vm 0x%pK has unexpected iam 0x%02x", >> kvm, gi->alert.mask); >> - while (gisa_in_alert_list(gi->origin)) >> - cpu_relax(); >> + while (gisa_in_alert_list(gi->origin)) { >> + KVM_EVENT(3, "vm 0x%pK gisa in alert list during destroy", kvm); >> + process_gib_alert_list(); > > process_gib_alert_list() has two nested loops and neither of them > does cpu_relax(). I guess, those are needed instead of one you remove? Calling function process_gib_alert_list() guarantees the gisa is taken out of the alert list immediately and thus the potential endless loop on gisa_in_alert_list() is solved. The issue surfaced with the following patch that accidently disabled the GAL interrupt processing on the host that normaly handles the alert list. The patch has been reverted from devel and will be re-applied in v2. 88a096a7a460 Revert "s390/airq: remove lsi_mask from airq_struct" a9d17c5d8813 s390/airq: remove lsi_mask from airq_struct Does that make sense for you? > >> + } >> hrtimer_cancel(&gi->timer); >> gi->origin = NULL; >> VM_EVENT(kvm, 3, "gisa 0x%pK destroyed", gisa);
On Wed, Aug 23, 2023 at 04:09:26PM +0200, Michael Mueller wrote: > > > On 23.08.23 15:23, Alexander Gordeev wrote: > > On Wed, Aug 23, 2023 at 02:41:40PM +0200, Michael Mueller wrote: > > ... > > > diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c > > > index 9bd0a873f3b1..73153bea6c24 100644 > > > --- a/arch/s390/kvm/interrupt.c > > > +++ b/arch/s390/kvm/interrupt.c > > > @@ -3205,8 +3205,10 @@ void kvm_s390_gisa_destroy(struct kvm *kvm) > > > if (gi->alert.mask) > > > KVM_EVENT(3, "vm 0x%pK has unexpected iam 0x%02x", > > > kvm, gi->alert.mask); > > > - while (gisa_in_alert_list(gi->origin)) > > > - cpu_relax(); > > > + while (gisa_in_alert_list(gi->origin)) { > > > + KVM_EVENT(3, "vm 0x%pK gisa in alert list during destroy", kvm); > > > + process_gib_alert_list(); > > > > process_gib_alert_list() has two nested loops and neither of them > > does cpu_relax(). I guess, those are needed instead of one you remove? > > Calling function process_gib_alert_list() guarantees the gisa > is taken out of the alert list immediately and thus the potential > endless loop on gisa_in_alert_list() is solved. The issue surfaced > with the following patch that accidently disabled the GAL interrupt > processing on the host that normaly handles the alert list. > The patch has been reverted from devel and will be re-applied in v2. > > 88a096a7a460 Revert "s390/airq: remove lsi_mask from airq_struct" > a9d17c5d8813 s390/airq: remove lsi_mask from airq_struct > > Does that make sense for you? Not really. If process_gib_alert_list() does guarantee the removal, then it should be a condition, not the loop. But I am actually not into this code. Just wanted to point out that cpu_relax() is removed from this loop and the two other loops within process_gib_alert_list() do not have it either. So up to Christian, Janosch and Claudio.
On Wed, 23 Aug 2023 18:01:59 +0200 Alexander Gordeev <agordeev@linux.ibm.com> wrote: > On Wed, Aug 23, 2023 at 04:09:26PM +0200, Michael Mueller wrote: > > > > > > On 23.08.23 15:23, Alexander Gordeev wrote: > > > On Wed, Aug 23, 2023 at 02:41:40PM +0200, Michael Mueller wrote: > > > ... > > > > diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c > > > > index 9bd0a873f3b1..73153bea6c24 100644 > > > > --- a/arch/s390/kvm/interrupt.c > > > > +++ b/arch/s390/kvm/interrupt.c > > > > @@ -3205,8 +3205,10 @@ void kvm_s390_gisa_destroy(struct kvm *kvm) > > > > if (gi->alert.mask) > > > > KVM_EVENT(3, "vm 0x%pK has unexpected iam 0x%02x", > > > > kvm, gi->alert.mask); > > > > - while (gisa_in_alert_list(gi->origin)) > > > > - cpu_relax(); > > > > + while (gisa_in_alert_list(gi->origin)) { > > > > + KVM_EVENT(3, "vm 0x%pK gisa in alert list during destroy", kvm); > > > > + process_gib_alert_list(); > > > > > > process_gib_alert_list() has two nested loops and neither of them > > > does cpu_relax(). I guess, those are needed instead of one you remove? > > > > Calling function process_gib_alert_list() guarantees the gisa > > is taken out of the alert list immediately and thus the potential > > endless loop on gisa_in_alert_list() is solved. The issue surfaced > > with the following patch that accidently disabled the GAL interrupt > > processing on the host that normaly handles the alert list. > > The patch has been reverted from devel and will be re-applied in v2. > > > > 88a096a7a460 Revert "s390/airq: remove lsi_mask from airq_struct" > > a9d17c5d8813 s390/airq: remove lsi_mask from airq_struct > > > > Does that make sense for you? > > Not really. If process_gib_alert_list() does guarantee the removal, > then it should be a condition, not the loop. this is actually a good question. why is it still a loop? > > But I am actually not into this code. Just wanted to point out that > cpu_relax() is removed from this loop and the two other loops within > process_gib_alert_list() do not have it either. > > So up to Christian, Janosch and Claudio.
On Wed, Aug 23, 2023 at 06:01:59PM +0200, Alexander Gordeev wrote: > On Wed, Aug 23, 2023 at 04:09:26PM +0200, Michael Mueller wrote: > > Does that make sense for you? > > Not really. If process_gib_alert_list() does guarantee the removal, > then it should be a condition, not the loop. > > But I am actually not into this code. Just wanted to point out that > cpu_relax() is removed from this loop and the two other loops within > process_gib_alert_list() do not have it either. Not sure if you are mainly referring to the missing cpu_relax(), however: any chance you missed that cpu_relax() translates only to barrier() on s390? So it really doesn't "relax" anything. cpu_relax() used to be a diagnose 0x44 (aka voluntary yield), but that caused many problems, therefore we removed that logic, and the only thing remaining is a no-op with compiler barrier semantics.
On 23.08.23 18:16, Claudio Imbrenda wrote: > On Wed, 23 Aug 2023 18:01:59 +0200 > Alexander Gordeev <agordeev@linux.ibm.com> wrote: > >> On Wed, Aug 23, 2023 at 04:09:26PM +0200, Michael Mueller wrote: >>> >>> >>> On 23.08.23 15:23, Alexander Gordeev wrote: >>>> On Wed, Aug 23, 2023 at 02:41:40PM +0200, Michael Mueller wrote: >>>> ... >>>>> diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c >>>>> index 9bd0a873f3b1..73153bea6c24 100644 >>>>> --- a/arch/s390/kvm/interrupt.c >>>>> +++ b/arch/s390/kvm/interrupt.c >>>>> @@ -3205,8 +3205,10 @@ void kvm_s390_gisa_destroy(struct kvm *kvm) >>>>> if (gi->alert.mask) >>>>> KVM_EVENT(3, "vm 0x%pK has unexpected iam 0x%02x", >>>>> kvm, gi->alert.mask); >>>>> - while (gisa_in_alert_list(gi->origin)) >>>>> - cpu_relax(); >>>>> + while (gisa_in_alert_list(gi->origin)) { >>>>> + KVM_EVENT(3, "vm 0x%pK gisa in alert list during destroy", kvm); >>>>> + process_gib_alert_list(); >>>> >>>> process_gib_alert_list() has two nested loops and neither of them >>>> does cpu_relax(). I guess, those are needed instead of one you remove? >>> >>> Calling function process_gib_alert_list() guarantees the gisa >>> is taken out of the alert list immediately and thus the potential >>> endless loop on gisa_in_alert_list() is solved. The issue surfaced >>> with the following patch that accidently disabled the GAL interrupt >>> processing on the host that normaly handles the alert list. >>> The patch has been reverted from devel and will be re-applied in v2. >>> >>> 88a096a7a460 Revert "s390/airq: remove lsi_mask from airq_struct" >>> a9d17c5d8813 s390/airq: remove lsi_mask from airq_struct >>> >>> Does that make sense for you? >> >> Not really. If process_gib_alert_list() does guarantee the removal, >> then it should be a condition, not the loop. > > this is actually a good question. why is it still a loop? The reason for the loop aproach was that I was not sure if any late incoming interrupts would bring the gisa back into the alert list by the firmware. I verified that this cannot happen if the mask that restores the interruption alert mask (IAM) is properly set to 0x00 by the last device driver de-registration before gisa destruction. In addition I now enforce it to be 0x00 if not already done. (That would be a bug.) That finally means the *if in_alert_list then process_alert_list* is sufficient. I will send a v2. > >> >> But I am actually not into this code. Just wanted to point out that >> cpu_relax() is removed from this loop and the two other loops within >> process_gib_alert_list() do not have it either. >> >> So up to Christian, Janosch and Claudio. >
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c index 9bd0a873f3b1..73153bea6c24 100644 --- a/arch/s390/kvm/interrupt.c +++ b/arch/s390/kvm/interrupt.c @@ -3205,8 +3205,10 @@ void kvm_s390_gisa_destroy(struct kvm *kvm) if (gi->alert.mask) KVM_EVENT(3, "vm 0x%pK has unexpected iam 0x%02x", kvm, gi->alert.mask); - while (gisa_in_alert_list(gi->origin)) - cpu_relax(); + while (gisa_in_alert_list(gi->origin)) { + KVM_EVENT(3, "vm 0x%pK gisa in alert list during destroy", kvm); + process_gib_alert_list(); + } hrtimer_cancel(&gi->timer); gi->origin = NULL; VM_EVENT(kvm, 3, "gisa 0x%pK destroyed", gisa);