
KVM: s390: fix gisa destroy operation might lead to cpu stalls

Message ID 20230823124140.3839373-1-mimu@linux.ibm.com (mailing list archive)
State New, archived
Series KVM: s390: fix gisa destroy operation might lead to cpu stalls

Commit Message

Michael Mueller Aug. 23, 2023, 12:41 p.m. UTC
A GISA cannot be destroyed as long as it is linked in the GIB alert list,
as this would break the alert list. Just waiting for its removal from
the list, triggered by another vm, is not sufficient as it might be the
only vm. The cpu stall shown below might occur when GIB alerts
are delayed; it is fixed by calling process_gib_alert_list() instead of
waiting. At this time the vcpus of the vm are already destroyed and thus
no vcpu can be kicked to enter the SIE again if for some reason an
interrupt is pending for that vm.

The situation can now be observed in the kvm-trace:

 00 01692784587:752383 3 - 0014 000003ff80165b58  vm 0x000000008a880000 created by pid 1462
 00 01692784634:287555 3 - 0007 000003ff80172c14  vm 0x000000008a880000 gisa in alert list during destroy
 00 01692784634:322955 3 - 0002 000003ff8016219a  vm 0x000000008a880000 destroyed

CPU stall caused by kvm_s390_gisa_destroy():

 [ 4915.311372] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 14-.... } 24533 jiffies s: 5269 root: 0x1/.
 [ 4915.311390] rcu: blocking rcu_node structures (internal RCU debug): l=1:0-15:0x4000/.
 [ 4915.311394] Task dump for CPU 14:
 [ 4915.311395] task:qemu-system-s39 state:R  running task     stack:0     pid:217198 ppid:1      flags:0x00000045
 [ 4915.311399] Call Trace:
 [ 4915.311401]  [<0000038003a33a10>] 0x38003a33a10
 [ 4933.861321] rcu: INFO: rcu_sched self-detected stall on CPU
 [ 4933.861332] rcu: 	14-....: (42008 ticks this GP) idle=53f4/1/0x4000000000000000 softirq=61530/61530 fqs=14031
 [ 4933.861353] rcu: 	(t=42008 jiffies g=238109 q=100360 ncpus=18)
 [ 4933.861357] CPU: 14 PID: 217198 Comm: qemu-system-s39 Not tainted 6.5.0-20230816.rc6.git26.a9d17c5d8813.300.fc38.s390x #1
 [ 4933.861360] Hardware name: IBM 8561 T01 703 (LPAR)
 [ 4933.861361] Krnl PSW : 0704e00180000000 000003ff804bfc66 (kvm_s390_gisa_destroy+0x3e/0xe0 [kvm])
 [ 4933.861414]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
 [ 4933.861416] Krnl GPRS: 0000000000000000 00000372000000fc 00000002134f8000 000000000d5f5900
 [ 4933.861419]            00000002f5ea1d18 00000002f5ea1d18 0000000000000000 0000000000000000
 [ 4933.861420]            00000002134fa890 00000002134f8958 000000000d5f5900 00000002134f8000
 [ 4933.861422]            000003ffa06acf98 000003ffa06858b0 0000038003a33c20 0000038003a33bc8
 [ 4933.861430] Krnl Code: 000003ff804bfc58: ec66002b007e	cij	%r6,0,6,000003ff804bfcae
                           000003ff804bfc5e: b904003a		lgr	%r3,%r10
                          #000003ff804bfc62: a7f40005		brc	15,000003ff804bfc6c
                          >000003ff804bfc66: e330b7300204	lg	%r3,10032(%r11)
                           000003ff804bfc6c: 58003000		l	%r0,0(%r3)
                           000003ff804bfc70: ec03fffb6076	crj	%r0,%r3,6,000003ff804bfc66
                           000003ff804bfc76: e320b7600271	lay	%r2,10080(%r11)
                           000003ff804bfc7c: c0e5fffea339	brasl	%r14,000003ff804942ee
 [ 4933.861444] Call Trace:
 [ 4933.861445]  [<000003ff804bfc66>] kvm_s390_gisa_destroy+0x3e/0xe0 [kvm]
 [ 4933.861460] ([<00000002623523de>] free_unref_page+0xee/0x148)
 [ 4933.861507]  [<000003ff804aea98>] kvm_arch_destroy_vm+0x50/0x120 [kvm]
 [ 4933.861521]  [<000003ff8049d374>] kvm_destroy_vm+0x174/0x288 [kvm]
 [ 4933.861532]  [<000003ff8049d4fe>] kvm_vm_release+0x36/0x48 [kvm]
 [ 4933.861542]  [<00000002623cd04a>] __fput+0xea/0x2a8
 [ 4933.861547]  [<00000002620d5bf8>] task_work_run+0x88/0xf0
 [ 4933.861551]  [<00000002620b0aa6>] do_exit+0x2c6/0x528
 [ 4933.861556]  [<00000002620b0f00>] do_group_exit+0x40/0xb8
 [ 4933.861557]  [<00000002620b0fa6>] __s390x_sys_exit_group+0x2e/0x30
 [ 4933.861559]  [<0000000262d481f4>] __do_syscall+0x1d4/0x200
 [ 4933.861563]  [<0000000262d59028>] system_call+0x70/0x98
 [ 4933.861565] Last Breaking-Event-Address:
 [ 4933.861566]  [<0000038003a33b60>] 0x38003a33b60

Fixes: 9f30f6216378 ("KVM: s390: add gib_alert_irq_handler()")
Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
---
 arch/s390/kvm/interrupt.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Alexander Gordeev Aug. 23, 2023, 1:23 p.m. UTC | #1
On Wed, Aug 23, 2023 at 02:41:40PM +0200, Michael Mueller wrote:
...
> diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
> index 9bd0a873f3b1..73153bea6c24 100644
> --- a/arch/s390/kvm/interrupt.c
> +++ b/arch/s390/kvm/interrupt.c
> @@ -3205,8 +3205,10 @@ void kvm_s390_gisa_destroy(struct kvm *kvm)
>  	if (gi->alert.mask)
>  		KVM_EVENT(3, "vm 0x%pK has unexpected iam 0x%02x",
>  			  kvm, gi->alert.mask);
> -	while (gisa_in_alert_list(gi->origin))
> -		cpu_relax();
> +	while (gisa_in_alert_list(gi->origin)) {
> +		KVM_EVENT(3, "vm 0x%pK gisa in alert list during destroy", kvm);
> +		process_gib_alert_list();

process_gib_alert_list() has two nested loops and neither of them
does cpu_relax(). I guess, those are needed instead of one you remove?

> +	}
>  	hrtimer_cancel(&gi->timer);
>  	gi->origin = NULL;
>  	VM_EVENT(kvm, 3, "gisa 0x%pK destroyed", gisa);
Michael Mueller Aug. 23, 2023, 2:09 p.m. UTC | #2
On 23.08.23 15:23, Alexander Gordeev wrote:
> On Wed, Aug 23, 2023 at 02:41:40PM +0200, Michael Mueller wrote:
> ...
>> diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
>> index 9bd0a873f3b1..73153bea6c24 100644
>> --- a/arch/s390/kvm/interrupt.c
>> +++ b/arch/s390/kvm/interrupt.c
>> @@ -3205,8 +3205,10 @@ void kvm_s390_gisa_destroy(struct kvm *kvm)
>>   	if (gi->alert.mask)
>>   		KVM_EVENT(3, "vm 0x%pK has unexpected iam 0x%02x",
>>   			  kvm, gi->alert.mask);
>> -	while (gisa_in_alert_list(gi->origin))
>> -		cpu_relax();
>> +	while (gisa_in_alert_list(gi->origin)) {
>> +		KVM_EVENT(3, "vm 0x%pK gisa in alert list during destroy", kvm);
>> +		process_gib_alert_list();
> 
> process_gib_alert_list() has two nested loops and neither of them
> does cpu_relax(). I guess, those are needed instead of one you remove?

Calling function process_gib_alert_list() guarantees the gisa
is taken out of the alert list immediately and thus the potential
endless loop on gisa_in_alert_list() is solved. The issue surfaced
with the following patch that accidentally disabled the GAL interrupt
processing on the host that normally handles the alert list.
The patch has been reverted from devel and will be re-applied in v2.

88a096a7a460 Revert "s390/airq: remove lsi_mask from airq_struct"
a9d17c5d8813 s390/airq: remove lsi_mask from airq_struct

Does that make sense for you?
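
For readers unfamiliar with the alert list, the claim above can be
illustrated with a minimal userspace sketch. It is a model, not the
kernel code: the struct layout and the names model_gisa, model_gib,
model_in_alert_list() and model_process_alert_list() are assumptions
made for illustration only. The model assumes that an entry whose next
pointer refers to itself is not in the list, and that draining the list
resets every entry to that state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: an entry whose next pointer refers to itself is unlinked. */
struct model_gisa {
	struct model_gisa *next_alert;
};

/* Hypothetical model of the list anchor; NULL means the list is empty. */
struct model_gib {
	struct model_gisa *alert_list_origin;
};

static void model_gisa_init(struct model_gisa *g)
{
	g->next_alert = g;
}

static int model_in_alert_list(const struct model_gisa *g)
{
	return g->next_alert != g;
}

/* Stand-in for the entry being linked at the head of the list. */
static void model_enqueue(struct model_gib *gib, struct model_gisa *g)
{
	assert(!model_in_alert_list(g));	/* must not be linked twice */
	g->next_alert = gib->alert_list_origin;	/* NULL marks the tail */
	gib->alert_list_origin = g;
}

/* Detach the whole list and unlink every entry; returns the number drained. */
static int model_process_alert_list(struct model_gib *gib)
{
	struct model_gisa *g = gib->alert_list_origin;
	int drained = 0;

	gib->alert_list_origin = NULL;
	while (g) {
		struct model_gisa *next = g->next_alert;

		g->next_alert = g;	/* entry is no longer in the list */
		g = next;
		drained++;
	}
	return drained;
}
```

In this model, one call to model_process_alert_list() leaves no entry
linked, which is the property the reply relies on. Whether an entry can
be linked again afterwards is a separate question that the model does
not answer.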

> 
>> +	}
>>   	hrtimer_cancel(&gi->timer);
>>   	gi->origin = NULL;
>>   	VM_EVENT(kvm, 3, "gisa 0x%pK destroyed", gisa);
Alexander Gordeev Aug. 23, 2023, 4:01 p.m. UTC | #3
On Wed, Aug 23, 2023 at 04:09:26PM +0200, Michael Mueller wrote:
> 
> 
> On 23.08.23 15:23, Alexander Gordeev wrote:
> > On Wed, Aug 23, 2023 at 02:41:40PM +0200, Michael Mueller wrote:
> > ...
> > > diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
> > > index 9bd0a873f3b1..73153bea6c24 100644
> > > --- a/arch/s390/kvm/interrupt.c
> > > +++ b/arch/s390/kvm/interrupt.c
> > > @@ -3205,8 +3205,10 @@ void kvm_s390_gisa_destroy(struct kvm *kvm)
> > >   	if (gi->alert.mask)
> > >   		KVM_EVENT(3, "vm 0x%pK has unexpected iam 0x%02x",
> > >   			  kvm, gi->alert.mask);
> > > -	while (gisa_in_alert_list(gi->origin))
> > > -		cpu_relax();
> > > +	while (gisa_in_alert_list(gi->origin)) {
> > > +		KVM_EVENT(3, "vm 0x%pK gisa in alert list during destroy", kvm);
> > > +		process_gib_alert_list();
> > 
> > process_gib_alert_list() has two nested loops and neither of them
> > does cpu_relax(). I guess, those are needed instead of one you remove?
> 
> Calling function process_gib_alert_list() guarantees the gisa
> is taken out of the alert list immediately and thus the potential
> endless loop on gisa_in_alert_list() is solved. The issue surfaced
> with the following patch that accidentally disabled the GAL interrupt
> processing on the host that normally handles the alert list.
> The patch has been reverted from devel and will be re-applied in v2.
> 
> 88a096a7a460 Revert "s390/airq: remove lsi_mask from airq_struct"
> a9d17c5d8813 s390/airq: remove lsi_mask from airq_struct
> 
> Does that make sense for you?

Not really. If process_gib_alert_list() does guarantee the removal,
then it should be a condition, not the loop.

But I am actually not into this code. Just wanted to point out that
cpu_relax() is removed from this loop and the two other loops within
process_gib_alert_list() do not have it either.

So up to Christian, Janosch and Claudio.
Claudio Imbrenda Aug. 23, 2023, 4:16 p.m. UTC | #4
On Wed, 23 Aug 2023 18:01:59 +0200
Alexander Gordeev <agordeev@linux.ibm.com> wrote:

> On Wed, Aug 23, 2023 at 04:09:26PM +0200, Michael Mueller wrote:
> > 
> > 
> > On 23.08.23 15:23, Alexander Gordeev wrote:  
> > > On Wed, Aug 23, 2023 at 02:41:40PM +0200, Michael Mueller wrote:
> > > ...  
> > > > diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
> > > > index 9bd0a873f3b1..73153bea6c24 100644
> > > > --- a/arch/s390/kvm/interrupt.c
> > > > +++ b/arch/s390/kvm/interrupt.c
> > > > @@ -3205,8 +3205,10 @@ void kvm_s390_gisa_destroy(struct kvm *kvm)
> > > >   	if (gi->alert.mask)
> > > >   		KVM_EVENT(3, "vm 0x%pK has unexpected iam 0x%02x",
> > > >   			  kvm, gi->alert.mask);
> > > > -	while (gisa_in_alert_list(gi->origin))
> > > > -		cpu_relax();
> > > > +	while (gisa_in_alert_list(gi->origin)) {
> > > > +		KVM_EVENT(3, "vm 0x%pK gisa in alert list during destroy", kvm);
> > > > +		process_gib_alert_list();  
> > > 
> > > process_gib_alert_list() has two nested loops and neither of them
> > > does cpu_relax(). I guess, those are needed instead of one you remove?  
> > 
> > Calling function process_gib_alert_list() guarantees the gisa
> > is taken out of the alert list immediately and thus the potential
> > endless loop on gisa_in_alert_list() is solved. The issue surfaced
> > > with the following patch that accidentally disabled the GAL interrupt
> > > processing on the host that normally handles the alert list.
> > The patch has been reverted from devel and will be re-applied in v2.
> > 
> > 88a096a7a460 Revert "s390/airq: remove lsi_mask from airq_struct"
> > a9d17c5d8813 s390/airq: remove lsi_mask from airq_struct
> > 
> > Does that make sense for you?  
> 
> Not really. If process_gib_alert_list() does guarantee the removal,
> then it should be a condition, not the loop.

this is actually a good question. why is it still a loop?

> 
> But I am actually not into this code. Just wanted to point out that
> cpu_relax() is removed from this loop and the two other loops within
> process_gib_alert_list() do not have it either.
> 
> So up to Christian, Janosch and Claudio.
Heiko Carstens Aug. 23, 2023, 7:29 p.m. UTC | #5
On Wed, Aug 23, 2023 at 06:01:59PM +0200, Alexander Gordeev wrote:
> On Wed, Aug 23, 2023 at 04:09:26PM +0200, Michael Mueller wrote:
> > Does that make sense for you?
> 
> Not really. If process_gib_alert_list() does guarantee the removal,
> then it should be a condition, not the loop.
> 
> But I am actually not into this code. Just wanted to point out that
> cpu_relax() is removed from this loop and the two other loops within
> process_gib_alert_list() do not have it either.

Not sure if you are mainly referring to the missing cpu_relax(), however:
any chance you missed that cpu_relax() translates only to barrier() on
s390? So it really doesn't "relax" anything. cpu_relax() used to be a
diagnose 0x44 (aka voluntary yield), but that caused many problems,
therefore we removed that logic, and the only thing remaining is a no-op
with compiler barrier semantics.
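
To illustrate what "compiler barrier semantics" means here, a small
userspace sketch follows. It is a stand-in, not the s390 kernel
definition: model_barrier() uses the common GCC/Clang empty-asm idiom
with a memory clobber, and model_cpu_relax() and model_poll() are
hypothetical names. The barrier only forces the compiler to reload
*flag on each iteration; it does not yield the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* GCC/Clang idiom: emits no instruction, but the compiler must not
 * cache memory values across it. */
#define model_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for a cpu_relax() that is only a barrier. */
static inline void model_cpu_relax(void)
{
	model_barrier();
}

/*
 * Poll *flag up to max_spins times. Returns the number of relax calls
 * made before the flag was seen non-zero, or -1 if it never was.
 */
static int model_poll(const int *flag, int max_spins)
{
	int spins;

	assert(flag != NULL);
	for (spins = 0; spins < max_spins; spins++) {
		if (*flag)
			return spins;
		model_cpu_relax();	/* re-read *flag next time, nothing more */
	}
	return -1;
}
```

A loop of this shape keeps the CPU fully busy while it waits, so
removing or adding such a call changes what the compiler may optimize,
not how long the CPU spins.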
Michael Mueller Aug. 24, 2023, 10:09 a.m. UTC | #6
On 23.08.23 18:16, Claudio Imbrenda wrote:
> On Wed, 23 Aug 2023 18:01:59 +0200
> Alexander Gordeev <agordeev@linux.ibm.com> wrote:
> 
>> On Wed, Aug 23, 2023 at 04:09:26PM +0200, Michael Mueller wrote:
>>>
>>>
>>> On 23.08.23 15:23, Alexander Gordeev wrote:
>>>> On Wed, Aug 23, 2023 at 02:41:40PM +0200, Michael Mueller wrote:
>>>> ...
>>>>> diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
>>>>> index 9bd0a873f3b1..73153bea6c24 100644
>>>>> --- a/arch/s390/kvm/interrupt.c
>>>>> +++ b/arch/s390/kvm/interrupt.c
>>>>> @@ -3205,8 +3205,10 @@ void kvm_s390_gisa_destroy(struct kvm *kvm)
>>>>>    	if (gi->alert.mask)
>>>>>    		KVM_EVENT(3, "vm 0x%pK has unexpected iam 0x%02x",
>>>>>    			  kvm, gi->alert.mask);
>>>>> -	while (gisa_in_alert_list(gi->origin))
>>>>> -		cpu_relax();
>>>>> +	while (gisa_in_alert_list(gi->origin)) {
>>>>> +		KVM_EVENT(3, "vm 0x%pK gisa in alert list during destroy", kvm);
>>>>> +		process_gib_alert_list();
>>>>
>>>> process_gib_alert_list() has two nested loops and neither of them
>>>> does cpu_relax(). I guess, those are needed instead of one you remove?
>>>
>>> Calling function process_gib_alert_list() guarantees the gisa
>>> is taken out of the alert list immediately and thus the potential
>>> endless loop on gisa_in_alert_list() is solved. The issue surfaced
>>> with the following patch that accidentally disabled the GAL interrupt
>>> processing on the host that normally handles the alert list.
>>> The patch has been reverted from devel and will be re-applied in v2.
>>>
>>> 88a096a7a460 Revert "s390/airq: remove lsi_mask from airq_struct"
>>> a9d17c5d8813 s390/airq: remove lsi_mask from airq_struct
>>>
>>> Does that make sense for you?
>>
>> Not really. If process_gib_alert_list() does guarantee the removal,
>> then it should be a condition, not the loop.
> 
> this is actually a good question. why is it still a loop?

The reason for the loop approach was that I was not sure whether late
incoming interrupts could cause the firmware to bring the gisa back
into the alert list.

I verified that this cannot happen if the mask that restores the
interruption alert mask (IAM) is properly set to 0x00 by the last
device driver de-registration before gisa destruction.

In addition, I now enforce it to be 0x00 if that has not already been
done. (That would be a bug.) That finally means the *if in_alert_list
then process_alert_list* is sufficient.

I will send a v2.
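
The conclusion above can be sketched as a small userspace model. This
is not the v2 patch, which is not part of this thread; every name here
(model_gi, model_late_interrupt(), model_destroy() and so on) is
hypothetical. The model assumes that a zero alert mask prevents the
entry from being linked again, so a single check followed by one
processing call is enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the per-vm state discussed above. */
struct model_gi {
	uint8_t alert_mask;	/* stand-in for the mask that restores the IAM */
	bool in_alert_list;	/* stand-in for "linked in the alert list" */
	int process_calls;	/* how often the list was processed */
};

/* Assumption: a late interrupt can only relink the entry if the mask is non-zero. */
static bool model_late_interrupt(struct model_gi *gi)
{
	if (gi->alert_mask == 0)
		return false;
	gi->in_alert_list = true;
	return true;
}

/* Stand-in for processing the alert list, which unlinks the entry. */
static void model_process_alert_list(struct model_gi *gi)
{
	gi->in_alert_list = false;
	gi->process_calls++;
}

/* Destroy path in the model: force the mask to zero, then check once. */
static void model_destroy(struct model_gi *gi)
{
	assert(gi != NULL);
	gi->alert_mask = 0;	/* enforce 0x00 if not already done */
	if (gi->in_alert_list)	/* a condition, not a loop */
		model_process_alert_list(gi);
}
```

Under the stated assumption, a late interrupt arriving after
model_destroy() cannot relink the entry, which is why the loop reduces
to a plain condition in this model.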

> 
>>
>> But I am actually not into this code. Just wanted to point out that
>> cpu_relax() is removed from this loop and the two other loops within
>> process_gib_alert_list() do not have it either.
>>
>> So up to Christian, Janosch and Claudio.
>

Patch

diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 9bd0a873f3b1..73153bea6c24 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -3205,8 +3205,10 @@  void kvm_s390_gisa_destroy(struct kvm *kvm)
 	if (gi->alert.mask)
 		KVM_EVENT(3, "vm 0x%pK has unexpected iam 0x%02x",
 			  kvm, gi->alert.mask);
-	while (gisa_in_alert_list(gi->origin))
-		cpu_relax();
+	while (gisa_in_alert_list(gi->origin)) {
+		KVM_EVENT(3, "vm 0x%pK gisa in alert list during destroy", kvm);
+		process_gib_alert_list();
+	}
 	hrtimer_cancel(&gi->timer);
 	gi->origin = NULL;
 	VM_EVENT(kvm, 3, "gisa 0x%pK destroyed", gisa);