Message ID | 20200608094619.28336-1-paul@xen.org (mailing list archive)
---|---
State | New, archived
Series | [PATCH-for-4.14] ioreq: handle pending emulation racing with ioreq server destruction
On 08.06.2020 11:46, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@amazon.com>
>
> When an emulation request is initiated in hvm_send_ioreq() the guest vcpu is
> blocked on an event channel until that request is completed. If, however,
> the emulator is killed whilst that emulation is pending then the ioreq
> server may be destroyed. Thus when the vcpu is awoken the code in
> handle_hvm_io_completion() will find no pending request to wait for, but will
> leave the internal vcpu io_req.state set to IOREQ_READY and the vcpu shutdown
> deferral flag in place (because hvm_io_assist() will never be called). The
> emulation request is then completed anyway. This means that any subsequent call
> to hvmemul_do_io() will find an unexpected value in io_req.state and will
> return X86EMUL_UNHANDLEABLE, which in some cases will result in continuous
> re-tries.
>
> This patch fixes the issue by moving the setting of io_req.state and clearing
> of shutdown deferral (as well as MSI-X write completion) out of hvm_io_assist()
> and directly into handle_hvm_io_completion().
>
> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with a question:

> --- a/xen/arch/x86/hvm/ioreq.c
> +++ b/xen/arch/x86/hvm/ioreq.c
> @@ -109,15 +109,7 @@ static void hvm_io_assist(struct hvm_ioreq_vcpu *sv, uint64_t data)
>      ioreq_t *ioreq = &v->arch.hvm.hvm_io.io_req;
>
>      if ( hvm_ioreq_needs_completion(ioreq) )
> -    {
> -        ioreq->state = STATE_IORESP_READY;
>          ioreq->data = data;
> -    }
> -    else
> -        ioreq->state = STATE_IOREQ_NONE;
> -
> -    msix_write_completion(v);
> -    vcpu_end_shutdown_deferral(v);
>
>      sv->pending = false;
>  }

With this, is the function worth keeping at all?

Also I assume the patch has your implied R-a-b?

Jan
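With the hunk above applied, hvm_io_assist() is reduced to roughly the following. This is only a sketch reconstructed from the quoted diff; the struct vcpu *v local is assumed from the surrounding code, which the hunk does not show.

static void hvm_io_assist(struct hvm_ioreq_vcpu *sv, uint64_t data)
{
    struct vcpu *v = sv->vcpu;    /* assumed local; not visible in the hunk */
    ioreq_t *ioreq = &v->arch.hvm.hvm_io.io_req;

    /* Only the data copy remains guarded by the completion check... */
    if ( hvm_ioreq_needs_completion(ioreq) )
        ioreq->data = data;

    /* ...and clearing the pending flag is the sole other side effect. */
    sv->pending = false;
}

Seen in this reduced form, the function does little more than record the emulator's response and clear sv->pending, which is what prompts the question of whether it is worth keeping at all.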
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 08 June 2020 11:58
> To: Paul Durrant <paul@xen.org>
> Cc: xen-devel@lists.xenproject.org; Paul Durrant <pdurrant@amazon.com>; Marek Marczykowski-Górecki
> <marmarek@invisiblethingslab.com>
> Subject: Re: [PATCH-for-4.14] ioreq: handle pending emulation racing with ioreq server destruction
>
> On 08.06.2020 11:46, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> >
> > When an emulation request is initiated in hvm_send_ioreq() the guest vcpu is
> > blocked on an event channel until that request is completed. If, however,
> > the emulator is killed whilst that emulation is pending then the ioreq
> > server may be destroyed. Thus when the vcpu is awoken the code in
> > handle_hvm_io_completion() will find no pending request to wait for, but will
> > leave the internal vcpu io_req.state set to IOREQ_READY and the vcpu shutdown
> > deferral flag in place (because hvm_io_assist() will never be called). The
> > emulation request is then completed anyway. This means that any subsequent call
> > to hvmemul_do_io() will find an unexpected value in io_req.state and will
> > return X86EMUL_UNHANDLEABLE, which in some cases will result in continuous
> > re-tries.
> >
> > This patch fixes the issue by moving the setting of io_req.state and clearing
> > of shutdown deferral (as well as MSI-X write completion) out of hvm_io_assist()
> > and directly into handle_hvm_io_completion().
> >
> > Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > Signed-off-by: Paul Durrant <pdurrant@amazon.com>
>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> with a question:
>
> > --- a/xen/arch/x86/hvm/ioreq.c
> > +++ b/xen/arch/x86/hvm/ioreq.c
> > @@ -109,15 +109,7 @@ static void hvm_io_assist(struct hvm_ioreq_vcpu *sv, uint64_t data)
> >      ioreq_t *ioreq = &v->arch.hvm.hvm_io.io_req;
> >
> >      if ( hvm_ioreq_needs_completion(ioreq) )
> > -    {
> > -        ioreq->state = STATE_IORESP_READY;
> >          ioreq->data = data;
> > -    }
> > -    else
> > -        ioreq->state = STATE_IOREQ_NONE;
> > -
> > -    msix_write_completion(v);
> > -    vcpu_end_shutdown_deferral(v);
> >
> >      sv->pending = false;
> >  }
>
> With this, is the function worth keeping at all?

I did think about that, but it is called in more than one place. So,
in the interest of trying to make back-porting straightforward, I
thought it best to keep it simple.

>
> Also I assume the patch has your implied R-a-b?
>

Since it has your R-b now, yes :-)

Paul
On 08.06.2020 13:04, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 08 June 2020 11:58
>> To: Paul Durrant <paul@xen.org>
>> Cc: xen-devel@lists.xenproject.org; Paul Durrant <pdurrant@amazon.com>; Marek Marczykowski-Górecki
>> <marmarek@invisiblethingslab.com>
>> Subject: Re: [PATCH-for-4.14] ioreq: handle pending emulation racing with ioreq server destruction
>>
>> On 08.06.2020 11:46, Paul Durrant wrote:
>>> From: Paul Durrant <pdurrant@amazon.com>
>>>
>>> When an emulation request is initiated in hvm_send_ioreq() the guest vcpu is
>>> blocked on an event channel until that request is completed. If, however,
>>> the emulator is killed whilst that emulation is pending then the ioreq
>>> server may be destroyed. Thus when the vcpu is awoken the code in
>>> handle_hvm_io_completion() will find no pending request to wait for, but will
>>> leave the internal vcpu io_req.state set to IOREQ_READY and the vcpu shutdown
>>> deferral flag in place (because hvm_io_assist() will never be called). The
>>> emulation request is then completed anyway. This means that any subsequent call
>>> to hvmemul_do_io() will find an unexpected value in io_req.state and will
>>> return X86EMUL_UNHANDLEABLE, which in some cases will result in continuous
>>> re-tries.
>>>
>>> This patch fixes the issue by moving the setting of io_req.state and clearing
>>> of shutdown deferral (as well as MSI-X write completion) out of hvm_io_assist()
>>> and directly into handle_hvm_io_completion().
>>>
>>> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>>> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
>>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>> with a question:
>>
>>> --- a/xen/arch/x86/hvm/ioreq.c
>>> +++ b/xen/arch/x86/hvm/ioreq.c
>>> @@ -109,15 +109,7 @@ static void hvm_io_assist(struct hvm_ioreq_vcpu *sv, uint64_t data)
>>>      ioreq_t *ioreq = &v->arch.hvm.hvm_io.io_req;
>>>
>>>      if ( hvm_ioreq_needs_completion(ioreq) )
>>> -    {
>>> -        ioreq->state = STATE_IORESP_READY;
>>>          ioreq->data = data;
>>> -    }
>>> -    else
>>> -        ioreq->state = STATE_IOREQ_NONE;
>>> -
>>> -    msix_write_completion(v);
>>> -    vcpu_end_shutdown_deferral(v);
>>>
>>>      sv->pending = false;
>>>  }
>>
>> With this, is the function worth keeping at all?
>
> I did think about that, but it is called in more than one place. So,
> in the interest of trying to make back-porting straightforward, I
> thought it best to keep it simple.
>
>>
>> Also I assume the patch has your implied R-a-b?
>>
>
> Since it has your R-b now, yes :-)

Thanks. As said in the other thread, I think the change here will mask
a problem elsewhere (presumably in qemu). I'm therefore unsure whether
we want to apply it right away, rather than first understanding the
root cause of Marek's specific problem.

Jan
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 08 June 2020 12:10
> To: paul@xen.org
> Cc: xen-devel@lists.xenproject.org; 'Paul Durrant' <pdurrant@amazon.com>; 'Marek Marczykowski-Górecki'
> <marmarek@invisiblethingslab.com>
> Subject: Re: [PATCH-for-4.14] ioreq: handle pending emulation racing with ioreq server destruction
>
> On 08.06.2020 13:04, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 08 June 2020 11:58
> >> To: Paul Durrant <paul@xen.org>
> >> Cc: xen-devel@lists.xenproject.org; Paul Durrant <pdurrant@amazon.com>; Marek Marczykowski-Górecki
> >> <marmarek@invisiblethingslab.com>
> >> Subject: Re: [PATCH-for-4.14] ioreq: handle pending emulation racing with ioreq server destruction
> >>
> >> On 08.06.2020 11:46, Paul Durrant wrote:
> >>> From: Paul Durrant <pdurrant@amazon.com>
> >>>
> >>> When an emulation request is initiated in hvm_send_ioreq() the guest vcpu is
> >>> blocked on an event channel until that request is completed. If, however,
> >>> the emulator is killed whilst that emulation is pending then the ioreq
> >>> server may be destroyed. Thus when the vcpu is awoken the code in
> >>> handle_hvm_io_completion() will find no pending request to wait for, but will
> >>> leave the internal vcpu io_req.state set to IOREQ_READY and the vcpu shutdown
> >>> deferral flag in place (because hvm_io_assist() will never be called). The
> >>> emulation request is then completed anyway. This means that any subsequent call
> >>> to hvmemul_do_io() will find an unexpected value in io_req.state and will
> >>> return X86EMUL_UNHANDLEABLE, which in some cases will result in continuous
> >>> re-tries.
> >>>
> >>> This patch fixes the issue by moving the setting of io_req.state and clearing
> >>> of shutdown deferral (as well as MSI-X write completion) out of hvm_io_assist()
> >>> and directly into handle_hvm_io_completion().
> >>>
> >>> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> >>> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> >>
> >> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> >> with a question:
> >>
> >>> --- a/xen/arch/x86/hvm/ioreq.c
> >>> +++ b/xen/arch/x86/hvm/ioreq.c
> >>> @@ -109,15 +109,7 @@ static void hvm_io_assist(struct hvm_ioreq_vcpu *sv, uint64_t data)
> >>>      ioreq_t *ioreq = &v->arch.hvm.hvm_io.io_req;
> >>>
> >>>      if ( hvm_ioreq_needs_completion(ioreq) )
> >>> -    {
> >>> -        ioreq->state = STATE_IORESP_READY;
> >>>          ioreq->data = data;
> >>> -    }
> >>> -    else
> >>> -        ioreq->state = STATE_IOREQ_NONE;
> >>> -
> >>> -    msix_write_completion(v);
> >>> -    vcpu_end_shutdown_deferral(v);
> >>>
> >>>      sv->pending = false;
> >>>  }
> >>
> >> With this, is the function worth keeping at all?
> >
> > I did think about that, but it is called in more than one place. So,
> > in the interest of trying to make back-porting straightforward, I
> > thought it best to keep it simple.
> >
> >>
> >> Also I assume the patch has your implied R-a-b?
> >>
> >
> > Since it has your R-b now, yes :-)
>
> Thanks. As said in the other thread, I think the change here will mask
> a problem elsewhere (presumably in qemu). I'm therefore unsure whether
> we want to apply it right away, rather than first understanding the
> root cause of Marek's specific problem.
>

I think it ought to be applied right away since an emulator could die
at any time and we don't really want the vcpu constantly bouncing in
this case.

Also, thinking of using emulators other than qemu, it's not
necessarily a bug if they deregister at an arbitrary time... In fact
it takes my mind back to PCMCIA cards, when device drivers had to be
robust to them disappearing during an I/O cycle.

Paul

> Jan
On 08.06.2020 13:21, Paul Durrant wrote:
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 08 June 2020 12:10
>>
>> As said in the other thread, I think the change here will mask
>> a problem elsewhere (presumably in qemu). I'm therefore unsure whether
>> we want to apply it right away, rather than first understanding the
>> root cause of Marek's specific problem.
>>
>
> I think it ought to be applied right away since an emulator could die
> at any time and we don't really want the vcpu constantly bouncing in
> this case. Also, thinking of using emulators other than qemu, it's not
> necessarily a bug if they deregister at an arbitrary time...

Oh, sure, I fully agree we need the change regardless of the possible
qemu misbehavior. But if we commit this change right away, will there
be anyone still caring about / looking into the qemu side issue?

Jan
On 08.06.2020 13:04, Paul Durrant wrote:
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 08 June 2020 11:58
>>
>> On 08.06.2020 11:46, Paul Durrant wrote:
>>> --- a/xen/arch/x86/hvm/ioreq.c
>>> +++ b/xen/arch/x86/hvm/ioreq.c
>>> @@ -109,15 +109,7 @@ static void hvm_io_assist(struct hvm_ioreq_vcpu *sv, uint64_t data)
>>>      ioreq_t *ioreq = &v->arch.hvm.hvm_io.io_req;
>>>
>>>      if ( hvm_ioreq_needs_completion(ioreq) )
>>> -    {
>>> -        ioreq->state = STATE_IORESP_READY;
>>>          ioreq->data = data;
>>> -    }
>>> -    else
>>> -        ioreq->state = STATE_IOREQ_NONE;
>>> -
>>> -    msix_write_completion(v);
>>> -    vcpu_end_shutdown_deferral(v);
>>>
>>>      sv->pending = false;
>>>  }
>>
>> With this, is the function worth keeping at all?
>
> I did think about that, but it is called in more than one place. So,
> in the interest of trying to make back-porting straightforward, I
> thought it best to keep it simple.

Fair enough, but going forward I still think it would be nice to get
rid of the function. To do this sufficiently cleanly, the main
question I have is: Why is hvm_wait_for_io() implemented as a loop?
Hasn't this become pointless with the XSA-262 fix? Switching this to
a do {} while(false) construct (seeing that the only caller has
already checked sv->pending) would look to allow moving what is now
in hvm_io_assist() immediately past this construct, at the expense
of a local variable holding ~0ul initially and then in the common
case p->data.

Thoughts? (I'll be happy to make such a patch, I'm just checking
whether I'm overlooking something crucial.)

Jan
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 09 June 2020 16:08
> To: paul@xen.org
> Cc: xen-devel@lists.xenproject.org; 'Paul Durrant' <pdurrant@amazon.com>; 'Marek Marczykowski-Górecki'
> <marmarek@invisiblethingslab.com>
> Subject: Re: [PATCH-for-4.14] ioreq: handle pending emulation racing with ioreq server destruction
>
> On 08.06.2020 13:04, Paul Durrant wrote:
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 08 June 2020 11:58
> >>
> >> On 08.06.2020 11:46, Paul Durrant wrote:
> >>> --- a/xen/arch/x86/hvm/ioreq.c
> >>> +++ b/xen/arch/x86/hvm/ioreq.c
> >>> @@ -109,15 +109,7 @@ static void hvm_io_assist(struct hvm_ioreq_vcpu *sv, uint64_t data)
> >>>      ioreq_t *ioreq = &v->arch.hvm.hvm_io.io_req;
> >>>
> >>>      if ( hvm_ioreq_needs_completion(ioreq) )
> >>> -    {
> >>> -        ioreq->state = STATE_IORESP_READY;
> >>>          ioreq->data = data;
> >>> -    }
> >>> -    else
> >>> -        ioreq->state = STATE_IOREQ_NONE;
> >>> -
> >>> -    msix_write_completion(v);
> >>> -    vcpu_end_shutdown_deferral(v);
> >>>
> >>>      sv->pending = false;
> >>>  }
> >>
> >> With this, is the function worth keeping at all?
> >
> > I did think about that, but it is called in more than one place. So,
> > in the interest of trying to make back-porting straightforward, I
> > thought it best to keep it simple.
>
> Fair enough, but going forward I still think it would be nice to get
> rid of the function. To do this sufficiently cleanly, the main
> question I have is: Why is hvm_wait_for_io() implemented as a loop?

I guess the idea is that it should tolerate the emulator kicking the
event channel spuriously. I don't know whether this was the case at
one time, but it seems reasonable to be robust against waking up
from wait_on_xen_event_channel() before state has made it to
IORESP_READY.

> Hasn't this become pointless with the XSA-262 fix? Switching this to
> a do {} while(false) construct (seeing that the only caller has
> already checked sv->pending) would look to allow moving what is now
> in hvm_io_assist() immediately past this construct, at the expense
> of a local variable holding ~0ul initially and then in the common
> case p->data.

That sounds ok. We can do that even with the current while loop by
just setting sv->pending to false when we see state == IORESP_READY.
After the loop terminates then we can grab the result and set state
back to IOREQ_NONE.

>
> Thoughts? (I'll be happy to make such a patch, I'm just checking
> whether I'm overlooking something crucial.)
>

Only that I don't think we should use do {} while(false) just in case
of early wakeup.

Paul

> Jan
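A minimal sketch of the shape Paul describes here: keep the while loop, let it do nothing but wait and clear sv->pending, and perform the completion work once it has terminated. The XSA-262 state-transition checks, the recheck label and any barriers of the real hvm_wait_for_io() are deliberately omitted, and the sv->ioreq_evtchn field name is an assumption, so this illustrates the idea rather than the follow-up patches mentioned later in the thread.

/*
 * Sketch only: the loop merely waits for a terminal state and clears
 * sv->pending; the result is collected and the ioreq slot handed back
 * after the loop, and hvm_io_assist() (which with the fix above only
 * records the data) runs last.
 */
static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
{
    uint64_t data = ~0ul;                 /* default if the server went away */

    while ( sv->pending )
    {
        switch ( p->state )
        {
        case STATE_IOREQ_NONE:            /* emulator died mid-request */
        case STATE_IORESP_READY:          /* response is available */
            sv->pending = false;          /* just terminate the loop */
            break;

        default:                          /* IOREQ_READY / IOREQ_INPROCESS */
            wait_on_xen_event_channel(sv->ioreq_evtchn, /* field name assumed */
                                      p->state == STATE_IORESP_READY ||
                                      p->state == STATE_IOREQ_NONE);
            break;
        }
    }

    if ( p->state == STATE_IORESP_READY )
    {
        data = p->data;                   /* grab the result ... */
        p->state = STATE_IOREQ_NONE;      /* ... and hand the slot back */
    }

    hvm_io_assist(sv, data);

    return true;
}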
On 09.06.2020 17:28, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 09 June 2020 16:08
>> To: paul@xen.org
>> Cc: xen-devel@lists.xenproject.org; 'Paul Durrant' <pdurrant@amazon.com>; 'Marek Marczykowski-Górecki'
>> <marmarek@invisiblethingslab.com>
>> Subject: Re: [PATCH-for-4.14] ioreq: handle pending emulation racing with ioreq server destruction
>>
>> On 08.06.2020 13:04, Paul Durrant wrote:
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: 08 June 2020 11:58
>>>>
>>>> On 08.06.2020 11:46, Paul Durrant wrote:
>>>>> --- a/xen/arch/x86/hvm/ioreq.c
>>>>> +++ b/xen/arch/x86/hvm/ioreq.c
>>>>> @@ -109,15 +109,7 @@ static void hvm_io_assist(struct hvm_ioreq_vcpu *sv, uint64_t data)
>>>>>      ioreq_t *ioreq = &v->arch.hvm.hvm_io.io_req;
>>>>>
>>>>>      if ( hvm_ioreq_needs_completion(ioreq) )
>>>>> -    {
>>>>> -        ioreq->state = STATE_IORESP_READY;
>>>>>          ioreq->data = data;
>>>>> -    }
>>>>> -    else
>>>>> -        ioreq->state = STATE_IOREQ_NONE;
>>>>> -
>>>>> -    msix_write_completion(v);
>>>>> -    vcpu_end_shutdown_deferral(v);
>>>>>
>>>>>      sv->pending = false;
>>>>>  }
>>>>
>>>> With this, is the function worth keeping at all?
>>>
>>> I did think about that, but it is called in more than one place. So,
>>> in the interest of trying to make back-porting straightforward, I
>>> thought it best to keep it simple.
>>
>> Fair enough, but going forward I still think it would be nice to get
>> rid of the function. To do this sufficiently cleanly, the main
>> question I have is: Why is hvm_wait_for_io() implemented as a loop?
>
> I guess the idea is that it should tolerate the emulator kicking the
> event channel spuriously. I don't know whether this was the case at
> one time, but it seems reasonable to be robust against waking up
> from wait_on_xen_event_channel() before state has made it to
> IORESP_READY.

But such wakeup is taken care of by "goto recheck", not by the main
loop in the function.

Jan

>> Hasn't this become pointless with the XSA-262 fix? Switching this to
>> a do {} while(false) construct (seeing that the only caller has
>> already checked sv->pending) would look to allow moving what is now
>> in hvm_io_assist() immediately past this construct, at the expense
>> of a local variable holding ~0ul initially and then in the common
>> case p->data.
>
> That sounds ok. We can do that even with the current while loop by
> just setting sv->pending to false when we see state == IORESP_READY.
> After the loop terminates then we can grab the result and set state
> back to IOREQ_NONE.
>
>>
>> Thoughts? (I'll be happy to make such a patch, I'm just checking
>> whether I'm overlooking something crucial.)
>>
>
> Only that I don't think we should use do {} while(false) just in case
> of early wakeup.
>
> Paul
>
>> Jan
On 09.06.2020 17:44, Paul Durrant wrote:
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 09 June 2020 16:33
>>
>> On 09.06.2020 17:28, Paul Durrant wrote:
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: 09 June 2020 16:08
>>>>
>>>> On 08.06.2020 13:04, Paul Durrant wrote:
>>>>>> From: Jan Beulich <jbeulich@suse.com>
>>>>>> Sent: 08 June 2020 11:58
>>>>>>
>>>>>> On 08.06.2020 11:46, Paul Durrant wrote:
>>>>>>> --- a/xen/arch/x86/hvm/ioreq.c
>>>>>>> +++ b/xen/arch/x86/hvm/ioreq.c
>>>>>>> @@ -109,15 +109,7 @@ static void hvm_io_assist(struct hvm_ioreq_vcpu *sv, uint64_t data)
>>>>>>>      ioreq_t *ioreq = &v->arch.hvm.hvm_io.io_req;
>>>>>>>
>>>>>>>      if ( hvm_ioreq_needs_completion(ioreq) )
>>>>>>> -    {
>>>>>>> -        ioreq->state = STATE_IORESP_READY;
>>>>>>>          ioreq->data = data;
>>>>>>> -    }
>>>>>>> -    else
>>>>>>> -        ioreq->state = STATE_IOREQ_NONE;
>>>>>>> -
>>>>>>> -    msix_write_completion(v);
>>>>>>> -    vcpu_end_shutdown_deferral(v);
>>>>>>>
>>>>>>>      sv->pending = false;
>>>>>>>  }
>>>>>>
>>>>>> With this, is the function worth keeping at all?
>>>>>
>>>>> I did think about that, but it is called in more than one place. So,
>>>>> in the interest of trying to make back-porting straightforward, I
>>>>> thought it best to keep it simple.
>>>>
>>>> Fair enough, but going forward I still think it would be nice to get
>>>> rid of the function. To do this sufficiently cleanly, the main
>>>> question I have is: Why is hvm_wait_for_io() implemented as a loop?
>>>
>>> I guess the idea is that it should tolerate the emulator kicking the
>>> event channel spuriously. I don't know whether this was the case at
>>> one time, but it seems reasonable to be robust against waking up
>>> from wait_on_xen_event_channel() before state has made it to
>>> IORESP_READY.
>>
>> But such wakeup is taken care of by "goto recheck", not by the main
>> loop in the function.
>
> Oh yes, sorry, I misread. It would be nice to get rid of the goto.

I've done this in a pair of patches - first one doing as I suggested,
second one doing as you suggested: getting rid of the label and
converting the fake loop put in place by the 1st one into a real loop
again. I think it's better in two steps. But this isn't for 4.14, so
I won't submit the pair right away.

Jan
diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index c55c4bc4bc..724007016d 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -109,15 +109,7 @@ static void hvm_io_assist(struct hvm_ioreq_vcpu *sv, uint64_t data)
     ioreq_t *ioreq = &v->arch.hvm.hvm_io.io_req;
 
     if ( hvm_ioreq_needs_completion(ioreq) )
-    {
-        ioreq->state = STATE_IORESP_READY;
         ioreq->data = data;
-    }
-    else
-        ioreq->state = STATE_IOREQ_NONE;
-
-    msix_write_completion(v);
-    vcpu_end_shutdown_deferral(v);
 
     sv->pending = false;
 }
@@ -209,6 +201,12 @@ bool handle_hvm_io_completion(struct vcpu *v)
         }
     }
 
+    vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
+        STATE_IORESP_READY : STATE_IOREQ_NONE;
+
+    msix_write_completion(v);
+    vcpu_end_shutdown_deferral(v);
+
     io_completion = vio->io_completion;
     vio->io_completion = HVMIO_no_completion;
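The new lines in handle_hvm_io_completion() above hinge on hvm_ioreq_needs_completion(), which is not shown anywhere in this thread. The sketch below is an approximation of the kind of check the helper performs in the 4.14 timeframe and should not be taken as the exact upstream definition.

/*
 * Approximation only: a completion is needed when the vcpu's internal
 * request is still marked IOREQ_READY and the emulator has to supply a
 * value, i.e. it is not a data_is_ptr transfer and not a PIO write.
 */
static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
{
    return ioreq->state == STATE_IOREQ_READY &&
           !ioreq->data_is_ptr &&
           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
}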