Message ID | 20201124070106.26854-4-jgross@suse.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | xen/events: further locking adjustments | expand |
On 24.11.2020 08:01, Juergen Gross wrote: > Two cpus entering evtchn_fifo_set_pending() for the same event channel > can race in case the first one gets interrupted after setting > EVTCHN_FIFO_PENDING and when the other one manages to set > EVTCHN_FIFO_LINKED before the first one is testing that bit. This can > lead to evtchn_check_pollers() being called before the event is put > properly into the queue, resulting eventually in the guest not seeing > the event pending and thus blocking forever afterwards. > > Note that commit 5f2df45ead7c1195 ("xen/evtchn: rework per event channel > lock") made the race just more obvious, while the fifo event channel > implementation had this race from the beginning when an unmask operation > was running in parallel with an event channel send operation. Ah yes, but then also only for inter-domain channels, as it was only in that case that the "wrong" domain's event lock was held. IOW there was a much earlier change already where this issue got widened (when the per-channel locking got introduced). This then got reduced to the original scope by XSA-343's adding of locking to evtchn_unmask(). (Not sure how much of this history wants actually adding here. I'm writing it down not the least to make sure I have a complete enough picture.) > For avoiding this race the queue locking in evtchn_fifo_set_pending() > needs to be reworked to cover the test of EVTCHN_FIFO_PENDING, > EVTCHN_FIFO_MASKED and EVTCHN_FIFO_LINKED, too. Perhaps mention the prior possible (and imo more natural) alternative of taking consistent per-channel locks would have been an alternative, until they got converted to rw ones? > Additionally when an > event channel needs to change queues both queues need to be locked > initially. Since this was (afaict) intentionally not the case before, I think I would want to see a word spent on the "why", perhaps better in a code comment than here. Even more so that you delete a respective comment justifying the possible race as permissible. And I have to admit right now I'm still uncertain both ways, i.e. I neither have a clear understanding of why it would have been considered fine the other way around before, nor why the double locking is strictly needed. > Fixes: 5f2df45ead7c1195 ("xen/evtchn: rework per event channel lock") > Fixes: 88910061ec615b2d ("evtchn: add FIFO-based event channel hypercalls and port ops") > Signed-off-by: Juergen Gross <jgross@suse.com> I guess at least this one wants a Reported-by. > @@ -204,6 +175,48 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) > return; > } > > + for ( try = 0; ; try++ ) > + { > + union evtchn_fifo_lastq lastq; > + struct vcpu *old_v; I think this one can have const added. > + lastq.raw = read_atomic(&evtchn->fifo_lastq); > + old_v = d->vcpu[lastq.last_vcpu_id]; > + > + q = &v->evtchn_fifo->queue[evtchn->priority]; > + old_q = &old_v->evtchn_fifo->queue[lastq.last_priority]; > + > + if ( q <= old_q ) > + { > + spin_lock_irqsave(&q->lock, flags); > + if ( q != old_q ) > + spin_lock(&old_q->lock); > + } > + else > + { > + spin_lock_irqsave(&old_q->lock, flags); > + spin_lock(&q->lock); > + } Since the vast majority of cases is going to be q == old_q, would it be worth structuring this like if ( q == old_q ) spin_lock_irqsave(&q->lock, flags); else if ( q < old_q ) { spin_lock_irqsave(&q->lock, flags); spin_lock(&old_q->lock); } else { spin_lock_irqsave(&old_q->lock, flags); spin_lock(&q->lock); } saving (on average) half a conditional in this most common case? (This is specifically different from the double locking in event_channel.c, where the common case is to have different entities. In fact double_evtchn_{,un}lock() look to pointlessly check for chn1 == chn2 - I guess I'll make a patch.) > + lastq.raw = read_atomic(&evtchn->fifo_lastq); > + old_v = d->vcpu[lastq.last_vcpu_id]; > + if ( q == &v->evtchn_fifo->queue[evtchn->priority] && > + old_q == &old_v->evtchn_fifo->queue[lastq.last_priority] ) > + break; > + > + if ( q != old_q ) > + spin_unlock(&old_q->lock); > + spin_unlock_irqrestore(&q->lock, flags); > + > + if ( try == 3 ) > + { > + gprintk(XENLOG_WARNING, > + "dom%d port %d lost event (too many queue changes)\n", > + d->domain_id, evtchn->port); > + return; Originally evtchn_check_pollers() would still have been called in this case. Wouldn't you better retain this, or else justify the possibly observable change in behavior? > @@ -228,22 +239,8 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) > goto done; > } This if() right above here can, aiui, in principle be moved out of the surrounding if(), at which point ... > - /* > - * No locking around getting the queue. This may race with > - * changing the priority but we are allowed to signal the > - * event once on the old priority. > - */ > - q = &v->evtchn_fifo->queue[evtchn->priority]; > - > - old_q = lock_old_queue(d, evtchn, &flags); > - if ( !old_q ) > - goto done; ... with all of this gone ... > if ( guest_test_and_set_bit(d, EVTCHN_FIFO_LINKED, word) ) > - { > - spin_unlock_irqrestore(&old_q->lock, flags); > goto done; > - } ... this could become part of the outer if(), replacing the 2nd guest_test_bit() there. (Possibly, if deemed worthwhile at all, to be carried out in a separate follow-on patch, to keep focus here on the actual goal.) Jan
On 24.11.20 15:02, Jan Beulich wrote: > On 24.11.2020 08:01, Juergen Gross wrote: >> Two cpus entering evtchn_fifo_set_pending() for the same event channel >> can race in case the first one gets interrupted after setting >> EVTCHN_FIFO_PENDING and when the other one manages to set >> EVTCHN_FIFO_LINKED before the first one is testing that bit. This can >> lead to evtchn_check_pollers() being called before the event is put >> properly into the queue, resulting eventually in the guest not seeing >> the event pending and thus blocking forever afterwards. >> >> Note that commit 5f2df45ead7c1195 ("xen/evtchn: rework per event channel >> lock") made the race just more obvious, while the fifo event channel >> implementation had this race from the beginning when an unmask operation >> was running in parallel with an event channel send operation. > > Ah yes, but then also only for inter-domain channels, as it was > only in that case that the "wrong" domain's event lock was held. > IOW there was a much earlier change already where this issue > got widened (when the per-channel locking got introduced). This > then got reduced to the original scope by XSA-343's adding of > locking to evtchn_unmask(). (Not sure how much of this history > wants actually adding here. I'm writing it down not the least to > make sure I have a complete enough picture.) I think we both agree that this race was possible for quite some time. And I even think one customer bug I've been looking into recently might be exactly this problem (a dom0 was occasionally hanging in cross-cpu function calls, but switching to 2-level events made the problem disappear). > >> For avoiding this race the queue locking in evtchn_fifo_set_pending() >> needs to be reworked to cover the test of EVTCHN_FIFO_PENDING, >> EVTCHN_FIFO_MASKED and EVTCHN_FIFO_LINKED, too. > > Perhaps mention the prior possible (and imo more natural) > alternative of taking consistent per-channel locks would have > been an alternative, until they got converted to rw ones? Okay (with reasoning why this is no simple option due to the lock needed to be taken with interrupts on and off). > >> Additionally when an >> event channel needs to change queues both queues need to be locked >> initially. > > Since this was (afaict) intentionally not the case before, I > think I would want to see a word spent on the "why", perhaps > better in a code comment than here. Even more so that you > delete a respective comment justifying the possible race as > permissible. And I have to admit right now I'm still uncertain > both ways, i.e. I neither have a clear understanding of why it > would have been considered fine the other way around before, > nor why the double locking is strictly needed. I need the double locking to avoid someone entering the locked region when dropping the lock for the old queue and taking the one for the new queue, as this would open the same race window again. > >> Fixes: 5f2df45ead7c1195 ("xen/evtchn: rework per event channel lock") >> Fixes: 88910061ec615b2d ("evtchn: add FIFO-based event channel hypercalls and port ops") >> Signed-off-by: Juergen Gross <jgross@suse.com> > > I guess at least this one wants a Reported-by. Oh, right. Sorry for missing that. > >> @@ -204,6 +175,48 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) >> return; >> } >> >> + for ( try = 0; ; try++ ) >> + { >> + union evtchn_fifo_lastq lastq; >> + struct vcpu *old_v; > > I think this one can have const added. Yes. > >> + lastq.raw = read_atomic(&evtchn->fifo_lastq); >> + old_v = d->vcpu[lastq.last_vcpu_id]; >> + >> + q = &v->evtchn_fifo->queue[evtchn->priority]; >> + old_q = &old_v->evtchn_fifo->queue[lastq.last_priority]; >> + >> + if ( q <= old_q ) >> + { >> + spin_lock_irqsave(&q->lock, flags); >> + if ( q != old_q ) >> + spin_lock(&old_q->lock); >> + } >> + else >> + { >> + spin_lock_irqsave(&old_q->lock, flags); >> + spin_lock(&q->lock); >> + } > > Since the vast majority of cases is going to be q == old_q, would > it be worth structuring this like > > if ( q == old_q ) > spin_lock_irqsave(&q->lock, flags); > else if ( q < old_q ) > { > spin_lock_irqsave(&q->lock, flags); > spin_lock(&old_q->lock); > } > else > { > spin_lock_irqsave(&old_q->lock, flags); > spin_lock(&q->lock); > } > > saving (on average) half a conditional in this most common > case? (This is specifically different from the double locking in Fine with me. > event_channel.c, where the common case is to have different > entities. In fact double_evtchn_{,un}lock() look to pointlessly > check for chn1 == chn2 - I guess I'll make a patch.) > >> + lastq.raw = read_atomic(&evtchn->fifo_lastq); >> + old_v = d->vcpu[lastq.last_vcpu_id]; >> + if ( q == &v->evtchn_fifo->queue[evtchn->priority] && >> + old_q == &old_v->evtchn_fifo->queue[lastq.last_priority] ) >> + break; >> + >> + if ( q != old_q ) >> + spin_unlock(&old_q->lock); >> + spin_unlock_irqrestore(&q->lock, flags); >> + >> + if ( try == 3 ) >> + { >> + gprintk(XENLOG_WARNING, >> + "dom%d port %d lost event (too many queue changes)\n", >> + d->domain_id, evtchn->port); >> + return; > > Originally evtchn_check_pollers() would still have been called > in this case. Wouldn't you better retain this, or else justify > the possibly observable change in behavior? I could retain it, but without having set the event to be pending I don't see the value in doing so. > >> @@ -228,22 +239,8 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) >> goto done; >> } > > This if() right above here can, aiui, in principle be moved out > of the surrounding if(), at which point ... It can even be moved out of the locked region. > >> - /* >> - * No locking around getting the queue. This may race with >> - * changing the priority but we are allowed to signal the >> - * event once on the old priority. >> - */ >> - q = &v->evtchn_fifo->queue[evtchn->priority]; >> - >> - old_q = lock_old_queue(d, evtchn, &flags); >> - if ( !old_q ) >> - goto done; > > ... with all of this gone ... > >> if ( guest_test_and_set_bit(d, EVTCHN_FIFO_LINKED, word) ) >> - { >> - spin_unlock_irqrestore(&old_q->lock, flags); >> goto done; >> - } > > ... this could become part of the outer if(), replacing the 2nd > guest_test_bit() there. (Possibly, if deemed worthwhile at all, > to be carried out in a separate follow-on patch, to keep focus > here on the actual goal.) Will add a patch doing that. Juergen
On 24.11.2020 15:49, Jürgen Groß wrote: > On 24.11.20 15:02, Jan Beulich wrote: >> On 24.11.2020 08:01, Juergen Gross wrote: >>> Two cpus entering evtchn_fifo_set_pending() for the same event channel >>> can race in case the first one gets interrupted after setting >>> EVTCHN_FIFO_PENDING and when the other one manages to set >>> EVTCHN_FIFO_LINKED before the first one is testing that bit. This can >>> lead to evtchn_check_pollers() being called before the event is put >>> properly into the queue, resulting eventually in the guest not seeing >>> the event pending and thus blocking forever afterwards. >>> >>> Note that commit 5f2df45ead7c1195 ("xen/evtchn: rework per event channel >>> lock") made the race just more obvious, while the fifo event channel >>> implementation had this race from the beginning when an unmask operation >>> was running in parallel with an event channel send operation. >> >> Ah yes, but then also only for inter-domain channels, as it was >> only in that case that the "wrong" domain's event lock was held. >> IOW there was a much earlier change already where this issue >> got widened (when the per-channel locking got introduced). This >> then got reduced to the original scope by XSA-343's adding of >> locking to evtchn_unmask(). (Not sure how much of this history >> wants actually adding here. I'm writing it down not the least to >> make sure I have a complete enough picture.) > > I think we both agree that this race was possible for quite some time. > And I even think one customer bug I've been looking into recently > might be exactly this problem (a dom0 was occasionally hanging in > cross-cpu function calls, but switching to 2-level events made the > problem disappear). IPIs weren't affected earlier on (i.e. in any released version), if my analysis above is correct. >>> Additionally when an >>> event channel needs to change queues both queues need to be locked >>> initially. >> >> Since this was (afaict) intentionally not the case before, I >> think I would want to see a word spent on the "why", perhaps >> better in a code comment than here. Even more so that you >> delete a respective comment justifying the possible race as >> permissible. And I have to admit right now I'm still uncertain >> both ways, i.e. I neither have a clear understanding of why it >> would have been considered fine the other way around before, >> nor why the double locking is strictly needed. > > I need the double locking to avoid someone entering the locked region > when dropping the lock for the old queue and taking the one for the > new queue, as this would open the same race window again. Well, that's what have already said. Thing is that the code prior to your change gives the impression as if this race was benign. >>> + lastq.raw = read_atomic(&evtchn->fifo_lastq); >>> + old_v = d->vcpu[lastq.last_vcpu_id]; >>> + if ( q == &v->evtchn_fifo->queue[evtchn->priority] && >>> + old_q == &old_v->evtchn_fifo->queue[lastq.last_priority] ) >>> + break; >>> + >>> + if ( q != old_q ) >>> + spin_unlock(&old_q->lock); >>> + spin_unlock_irqrestore(&q->lock, flags); >>> + >>> + if ( try == 3 ) >>> + { >>> + gprintk(XENLOG_WARNING, >>> + "dom%d port %d lost event (too many queue changes)\n", >>> + d->domain_id, evtchn->port); >>> + return; >> >> Originally evtchn_check_pollers() would still have been called >> in this case. Wouldn't you better retain this, or else justify >> the possibly observable change in behavior? > > I could retain it, but without having set the event to be pending > I don't see the value in doing so. But that's part of my concern - you now don't set PENDING when bailing here. Jan
On 24.11.20 17:32, Jan Beulich wrote: > On 24.11.2020 15:49, Jürgen Groß wrote: >> On 24.11.20 15:02, Jan Beulich wrote: >>> On 24.11.2020 08:01, Juergen Gross wrote: >>>> Two cpus entering evtchn_fifo_set_pending() for the same event channel >>>> can race in case the first one gets interrupted after setting >>>> EVTCHN_FIFO_PENDING and when the other one manages to set >>>> EVTCHN_FIFO_LINKED before the first one is testing that bit. This can >>>> lead to evtchn_check_pollers() being called before the event is put >>>> properly into the queue, resulting eventually in the guest not seeing >>>> the event pending and thus blocking forever afterwards. >>>> >>>> Note that commit 5f2df45ead7c1195 ("xen/evtchn: rework per event channel >>>> lock") made the race just more obvious, while the fifo event channel >>>> implementation had this race from the beginning when an unmask operation >>>> was running in parallel with an event channel send operation. >>> >>> Ah yes, but then also only for inter-domain channels, as it was >>> only in that case that the "wrong" domain's event lock was held. >>> IOW there was a much earlier change already where this issue >>> got widened (when the per-channel locking got introduced). This >>> then got reduced to the original scope by XSA-343's adding of >>> locking to evtchn_unmask(). (Not sure how much of this history >>> wants actually adding here. I'm writing it down not the least to >>> make sure I have a complete enough picture.) >> >> I think we both agree that this race was possible for quite some time. >> And I even think one customer bug I've been looking into recently >> might be exactly this problem (a dom0 was occasionally hanging in >> cross-cpu function calls, but switching to 2-level events made the >> problem disappear). > > IPIs weren't affected earlier on (i.e. in any released version), > if my analysis above is correct. I don't think it is correct. An unmask operation in parallel with set_pending will have had the same race for IPIs. > >>>> Additionally when an >>>> event channel needs to change queues both queues need to be locked >>>> initially. >>> >>> Since this was (afaict) intentionally not the case before, I >>> think I would want to see a word spent on the "why", perhaps >>> better in a code comment than here. Even more so that you >>> delete a respective comment justifying the possible race as >>> permissible. And I have to admit right now I'm still uncertain >>> both ways, i.e. I neither have a clear understanding of why it >>> would have been considered fine the other way around before, >>> nor why the double locking is strictly needed. >> >> I need the double locking to avoid someone entering the locked region >> when dropping the lock for the old queue and taking the one for the >> new queue, as this would open the same race window again. > > Well, that's what have already said. Thing is that the code > prior to your change gives the impression as if this race was > benign. The race regarding a queue change, yes. But not the race I'm fixing with this patch. I need to make sure that only one caller is inside the big if clause for a specific event. And dropping the lock inside this clause would violate that assumption. > >>>> + lastq.raw = read_atomic(&evtchn->fifo_lastq); >>>> + old_v = d->vcpu[lastq.last_vcpu_id]; >>>> + if ( q == &v->evtchn_fifo->queue[evtchn->priority] && >>>> + old_q == &old_v->evtchn_fifo->queue[lastq.last_priority] ) >>>> + break; >>>> + >>>> + if ( q != old_q ) >>>> + spin_unlock(&old_q->lock); >>>> + spin_unlock_irqrestore(&q->lock, flags); >>>> + >>>> + if ( try == 3 ) >>>> + { >>>> + gprintk(XENLOG_WARNING, >>>> + "dom%d port %d lost event (too many queue changes)\n", >>>> + d->domain_id, evtchn->port); >>>> + return; >>> >>> Originally evtchn_check_pollers() would still have been called >>> in this case. Wouldn't you better retain this, or else justify >>> the possibly observable change in behavior? >> >> I could retain it, but without having set the event to be pending >> I don't see the value in doing so. > > But that's part of my concern - you now don't set PENDING when > bailing here. Hmm, I'm not sure this will really help, as the event still won't be LINKED, what is necessary for the guest to recognize the event to be pending. OTOH if the event is masked setting the PENDING bit would ensure proper handling later, so it is better to do that, even with the risk to have the same old race again in this case (which might have the same effect as not setting PENDING, but with a much lower probability). I'll change the locking to handle that properly. Juergen
On 25.11.2020 06:23, Jürgen Groß wrote: > On 24.11.20 17:32, Jan Beulich wrote: >> On 24.11.2020 15:49, Jürgen Groß wrote: >>> On 24.11.20 15:02, Jan Beulich wrote: >>>> On 24.11.2020 08:01, Juergen Gross wrote: >>>>> Two cpus entering evtchn_fifo_set_pending() for the same event channel >>>>> can race in case the first one gets interrupted after setting >>>>> EVTCHN_FIFO_PENDING and when the other one manages to set >>>>> EVTCHN_FIFO_LINKED before the first one is testing that bit. This can >>>>> lead to evtchn_check_pollers() being called before the event is put >>>>> properly into the queue, resulting eventually in the guest not seeing >>>>> the event pending and thus blocking forever afterwards. >>>>> >>>>> Note that commit 5f2df45ead7c1195 ("xen/evtchn: rework per event channel >>>>> lock") made the race just more obvious, while the fifo event channel >>>>> implementation had this race from the beginning when an unmask operation >>>>> was running in parallel with an event channel send operation. >>>> >>>> Ah yes, but then also only for inter-domain channels, as it was >>>> only in that case that the "wrong" domain's event lock was held. >>>> IOW there was a much earlier change already where this issue >>>> got widened (when the per-channel locking got introduced). This >>>> then got reduced to the original scope by XSA-343's adding of >>>> locking to evtchn_unmask(). (Not sure how much of this history >>>> wants actually adding here. I'm writing it down not the least to >>>> make sure I have a complete enough picture.) >>> >>> I think we both agree that this race was possible for quite some time. >>> And I even think one customer bug I've been looking into recently >>> might be exactly this problem (a dom0 was occasionally hanging in >>> cross-cpu function calls, but switching to 2-level events made the >>> problem disappear). >> >> IPIs weren't affected earlier on (i.e. in any released version), >> if my analysis above is correct. > > I don't think it is correct. > > An unmask operation in parallel with set_pending will have had the > same race for IPIs. Why? When FIFO locks were introduced, the event lock got acquired around the call to evtchn_unmask(), and IPIs got sent with that lock similarly held. Likewise after XSA-343 evtchn_unmask() as well as the sending of IPIs acquire the per-channel lock (which at that point was still an ordinary spin lock). >>>>> Additionally when an >>>>> event channel needs to change queues both queues need to be locked >>>>> initially. >>>> >>>> Since this was (afaict) intentionally not the case before, I >>>> think I would want to see a word spent on the "why", perhaps >>>> better in a code comment than here. Even more so that you >>>> delete a respective comment justifying the possible race as >>>> permissible. And I have to admit right now I'm still uncertain >>>> both ways, i.e. I neither have a clear understanding of why it >>>> would have been considered fine the other way around before, >>>> nor why the double locking is strictly needed. >>> >>> I need the double locking to avoid someone entering the locked region >>> when dropping the lock for the old queue and taking the one for the >>> new queue, as this would open the same race window again. >> >> Well, that's what have already said. Thing is that the code >> prior to your change gives the impression as if this race was >> benign. > > The race regarding a queue change, yes. But not the race I'm fixing with > this patch. I need to make sure that only one caller is inside the big > if clause for a specific event. And dropping the lock inside this clause > would violate that assumption. IOW the presumed wrong assumption back then was that the function would always be called with a lock already held which excludes the region to be entered twice for the same channel. But - was this a wrong assumption at the time? Thinking about this again I now actually come to the conclusion that my analysis above was wrong in the other direction: Even inter-domain channels did have consistent locking (of the other side's event lock), preventing any such race there. Which implies that imo one of the Fixes: tags wants dropping, as the race became possible only when "downgrading" some of the involved locks to rw ones. Obviously my "evtchn: convert vIRQ lock to an r/w one" then extends this race to vIRQ-s. Jan
On 25.11.20 08:42, Jan Beulich wrote: > On 25.11.2020 06:23, Jürgen Groß wrote: >> On 24.11.20 17:32, Jan Beulich wrote: >>> On 24.11.2020 15:49, Jürgen Groß wrote: >>>> On 24.11.20 15:02, Jan Beulich wrote: >>>>> On 24.11.2020 08:01, Juergen Gross wrote: >>>>>> Two cpus entering evtchn_fifo_set_pending() for the same event channel >>>>>> can race in case the first one gets interrupted after setting >>>>>> EVTCHN_FIFO_PENDING and when the other one manages to set >>>>>> EVTCHN_FIFO_LINKED before the first one is testing that bit. This can >>>>>> lead to evtchn_check_pollers() being called before the event is put >>>>>> properly into the queue, resulting eventually in the guest not seeing >>>>>> the event pending and thus blocking forever afterwards. >>>>>> >>>>>> Note that commit 5f2df45ead7c1195 ("xen/evtchn: rework per event channel >>>>>> lock") made the race just more obvious, while the fifo event channel >>>>>> implementation had this race from the beginning when an unmask operation >>>>>> was running in parallel with an event channel send operation. >>>>> >>>>> Ah yes, but then also only for inter-domain channels, as it was >>>>> only in that case that the "wrong" domain's event lock was held. >>>>> IOW there was a much earlier change already where this issue >>>>> got widened (when the per-channel locking got introduced). This >>>>> then got reduced to the original scope by XSA-343's adding of >>>>> locking to evtchn_unmask(). (Not sure how much of this history >>>>> wants actually adding here. I'm writing it down not the least to >>>>> make sure I have a complete enough picture.) >>>> >>>> I think we both agree that this race was possible for quite some time. >>>> And I even think one customer bug I've been looking into recently >>>> might be exactly this problem (a dom0 was occasionally hanging in >>>> cross-cpu function calls, but switching to 2-level events made the >>>> problem disappear). >>> >>> IPIs weren't affected earlier on (i.e. in any released version), >>> if my analysis above is correct. >> >> I don't think it is correct. >> >> An unmask operation in parallel with set_pending will have had the >> same race for IPIs. > > Why? When FIFO locks were introduced, the event lock got acquired > around the call to evtchn_unmask(), and IPIs got sent with that > lock similarly held. Likewise after XSA-343 evtchn_unmask() as > well as the sending of IPIs acquire the per-channel lock (which at > that point was still an ordinary spin lock). Oh, I think we are talking about different paths. I'm talking about EVTCHNOP_unmask. There is no lock involved when calling evtchn_unmask(). > >>>>>> Additionally when an >>>>>> event channel needs to change queues both queues need to be locked >>>>>> initially. >>>>> >>>>> Since this was (afaict) intentionally not the case before, I >>>>> think I would want to see a word spent on the "why", perhaps >>>>> better in a code comment than here. Even more so that you >>>>> delete a respective comment justifying the possible race as >>>>> permissible. And I have to admit right now I'm still uncertain >>>>> both ways, i.e. I neither have a clear understanding of why it >>>>> would have been considered fine the other way around before, >>>>> nor why the double locking is strictly needed. >>>> >>>> I need the double locking to avoid someone entering the locked region >>>> when dropping the lock for the old queue and taking the one for the >>>> new queue, as this would open the same race window again. >>> >>> Well, that's what have already said. Thing is that the code >>> prior to your change gives the impression as if this race was >>> benign. >> >> The race regarding a queue change, yes. But not the race I'm fixing with >> this patch. I need to make sure that only one caller is inside the big >> if clause for a specific event. And dropping the lock inside this clause >> would violate that assumption. > > IOW the presumed wrong assumption back then was that the function > would always be called with a lock already held which excludes > the region to be entered twice for the same channel. But - was > this a wrong assumption at the time? Thinking about this again I > now actually come to the conclusion that my analysis above was > wrong in the other direction: Even inter-domain channels did have > consistent locking (of the other side's event lock), preventing > any such race there. Which implies that imo one of the Fixes: tags > wants dropping, as the race became possible only when "downgrading" > some of the involved locks to rw ones. Obviously my "evtchn: > convert vIRQ lock to an r/w one" then extends this race to vIRQ-s. No. See my remark regarding unmask. Juergen
On 25.11.2020 09:08, Jürgen Groß wrote: > On 25.11.20 08:42, Jan Beulich wrote: >> On 25.11.2020 06:23, Jürgen Groß wrote: >>> On 24.11.20 17:32, Jan Beulich wrote: >>>> On 24.11.2020 15:49, Jürgen Groß wrote: >>>>> On 24.11.20 15:02, Jan Beulich wrote: >>>>>> On 24.11.2020 08:01, Juergen Gross wrote: >>>>>>> Two cpus entering evtchn_fifo_set_pending() for the same event channel >>>>>>> can race in case the first one gets interrupted after setting >>>>>>> EVTCHN_FIFO_PENDING and when the other one manages to set >>>>>>> EVTCHN_FIFO_LINKED before the first one is testing that bit. This can >>>>>>> lead to evtchn_check_pollers() being called before the event is put >>>>>>> properly into the queue, resulting eventually in the guest not seeing >>>>>>> the event pending and thus blocking forever afterwards. >>>>>>> >>>>>>> Note that commit 5f2df45ead7c1195 ("xen/evtchn: rework per event channel >>>>>>> lock") made the race just more obvious, while the fifo event channel >>>>>>> implementation had this race from the beginning when an unmask operation >>>>>>> was running in parallel with an event channel send operation. >>>>>> >>>>>> Ah yes, but then also only for inter-domain channels, as it was >>>>>> only in that case that the "wrong" domain's event lock was held. >>>>>> IOW there was a much earlier change already where this issue >>>>>> got widened (when the per-channel locking got introduced). This >>>>>> then got reduced to the original scope by XSA-343's adding of >>>>>> locking to evtchn_unmask(). (Not sure how much of this history >>>>>> wants actually adding here. I'm writing it down not the least to >>>>>> make sure I have a complete enough picture.) >>>>> >>>>> I think we both agree that this race was possible for quite some time. >>>>> And I even think one customer bug I've been looking into recently >>>>> might be exactly this problem (a dom0 was occasionally hanging in >>>>> cross-cpu function calls, but switching to 2-level events made the >>>>> problem disappear). >>>> >>>> IPIs weren't affected earlier on (i.e. in any released version), >>>> if my analysis above is correct. >>> >>> I don't think it is correct. >>> >>> An unmask operation in parallel with set_pending will have had the >>> same race for IPIs. >> >> Why? When FIFO locks were introduced, the event lock got acquired >> around the call to evtchn_unmask(), and IPIs got sent with that >> lock similarly held. Likewise after XSA-343 evtchn_unmask() as >> well as the sending of IPIs acquire the per-channel lock (which at >> that point was still an ordinary spin lock). > > Oh, I think we are talking about different paths. > > I'm talking about EVTCHNOP_unmask. There is no lock involved when > calling evtchn_unmask(). Above I said "When FIFO locks were introduced, the event lock got acquired around the call to evtchn_unmask()" and further "Likewise after XSA-343 evtchn_unmask() ..." I can't see how we're talking of different paths here. The situation has changed from back then (lock in the callers of evtchn_unmask()) to after XSA-343 (lock in evtchn_unmask()), but there was suitable locking. There was a (large) window in time prior to XSA-343 where there was indeed no locking, but that was introduced by the conversion to per-channel locks and addressed by one of the XSA-343 changes. The issue then got re-introduced by converting spin_lock() to read_lock(). Jan
On 25.11.20 09:25, Jan Beulich wrote: > On 25.11.2020 09:08, Jürgen Groß wrote: >> On 25.11.20 08:42, Jan Beulich wrote: >>> On 25.11.2020 06:23, Jürgen Groß wrote: >>>> On 24.11.20 17:32, Jan Beulich wrote: >>>>> On 24.11.2020 15:49, Jürgen Groß wrote: >>>>>> On 24.11.20 15:02, Jan Beulich wrote: >>>>>>> On 24.11.2020 08:01, Juergen Gross wrote: >>>>>>>> Two cpus entering evtchn_fifo_set_pending() for the same event channel >>>>>>>> can race in case the first one gets interrupted after setting >>>>>>>> EVTCHN_FIFO_PENDING and when the other one manages to set >>>>>>>> EVTCHN_FIFO_LINKED before the first one is testing that bit. This can >>>>>>>> lead to evtchn_check_pollers() being called before the event is put >>>>>>>> properly into the queue, resulting eventually in the guest not seeing >>>>>>>> the event pending and thus blocking forever afterwards. >>>>>>>> >>>>>>>> Note that commit 5f2df45ead7c1195 ("xen/evtchn: rework per event channel >>>>>>>> lock") made the race just more obvious, while the fifo event channel >>>>>>>> implementation had this race from the beginning when an unmask operation >>>>>>>> was running in parallel with an event channel send operation. >>>>>>> >>>>>>> Ah yes, but then also only for inter-domain channels, as it was >>>>>>> only in that case that the "wrong" domain's event lock was held. >>>>>>> IOW there was a much earlier change already where this issue >>>>>>> got widened (when the per-channel locking got introduced). This >>>>>>> then got reduced to the original scope by XSA-343's adding of >>>>>>> locking to evtchn_unmask(). (Not sure how much of this history >>>>>>> wants actually adding here. I'm writing it down not the least to >>>>>>> make sure I have a complete enough picture.) >>>>>> >>>>>> I think we both agree that this race was possible for quite some time. >>>>>> And I even think one customer bug I've been looking into recently >>>>>> might be exactly this problem (a dom0 was occasionally hanging in >>>>>> cross-cpu function calls, but switching to 2-level events made the >>>>>> problem disappear). >>>>> >>>>> IPIs weren't affected earlier on (i.e. in any released version), >>>>> if my analysis above is correct. >>>> >>>> I don't think it is correct. >>>> >>>> An unmask operation in parallel with set_pending will have had the >>>> same race for IPIs. >>> >>> Why? When FIFO locks were introduced, the event lock got acquired >>> around the call to evtchn_unmask(), and IPIs got sent with that >>> lock similarly held. Likewise after XSA-343 evtchn_unmask() as >>> well as the sending of IPIs acquire the per-channel lock (which at >>> that point was still an ordinary spin lock). >> >> Oh, I think we are talking about different paths. >> >> I'm talking about EVTCHNOP_unmask. There is no lock involved when >> calling evtchn_unmask(). > > Above I said "When FIFO locks were introduced, the event lock got > acquired around the call to evtchn_unmask()" and further "Likewise > after XSA-343 evtchn_unmask() ..." I can't see how we're talking > of different paths here. The situation has changed from back then > (lock in the callers of evtchn_unmask()) to after XSA-343 (lock in > evtchn_unmask()), but there was suitable locking. There was a > (large) window in time prior to XSA-343 where there was indeed no > locking, but that was introduced by the conversion to per-channel > locks and addressed by one of the XSA-343 changes. The issue then > got re-introduced by converting spin_lock() to read_lock(). Okay, then I misunderstood your claim. I thought you meant there was always suitable locking before the rwlock change. So I need to modify the Fixes: tag. Juergen
diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c index 79090c04ca..a57d459cc2 100644 --- a/xen/common/event_fifo.c +++ b/xen/common/event_fifo.c @@ -87,38 +87,6 @@ static void evtchn_fifo_init(struct domain *d, struct evtchn *evtchn) d->domain_id, evtchn->port); } -static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d, - struct evtchn *evtchn, - unsigned long *flags) -{ - struct vcpu *v; - struct evtchn_fifo_queue *q, *old_q; - unsigned int try; - union evtchn_fifo_lastq lastq; - - for ( try = 0; try < 3; try++ ) - { - lastq.raw = read_atomic(&evtchn->fifo_lastq); - v = d->vcpu[lastq.last_vcpu_id]; - old_q = &v->evtchn_fifo->queue[lastq.last_priority]; - - spin_lock_irqsave(&old_q->lock, *flags); - - v = d->vcpu[lastq.last_vcpu_id]; - q = &v->evtchn_fifo->queue[lastq.last_priority]; - - if ( old_q == q ) - return old_q; - - spin_unlock_irqrestore(&old_q->lock, *flags); - } - - gprintk(XENLOG_WARNING, - "dom%d port %d lost event (too many queue changes)\n", - d->domain_id, evtchn->port); - return NULL; -} - static int try_set_link(event_word_t *word, event_word_t *w, uint32_t link) { event_word_t new, old; @@ -190,6 +158,9 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) event_word_t *word; unsigned long flags; bool_t was_pending; + struct evtchn_fifo_queue *q, *old_q; + unsigned int try; + bool linked = true; port = evtchn->port; word = evtchn_fifo_word_from_port(d, port); @@ -204,6 +175,48 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) return; } + for ( try = 0; ; try++ ) + { + union evtchn_fifo_lastq lastq; + struct vcpu *old_v; + + lastq.raw = read_atomic(&evtchn->fifo_lastq); + old_v = d->vcpu[lastq.last_vcpu_id]; + + q = &v->evtchn_fifo->queue[evtchn->priority]; + old_q = &old_v->evtchn_fifo->queue[lastq.last_priority]; + + if ( q <= old_q ) + { + spin_lock_irqsave(&q->lock, flags); + if ( q != old_q ) + spin_lock(&old_q->lock); + } + else + { + spin_lock_irqsave(&old_q->lock, flags); + spin_lock(&q->lock); + } + + lastq.raw = read_atomic(&evtchn->fifo_lastq); + old_v = d->vcpu[lastq.last_vcpu_id]; + if ( q == &v->evtchn_fifo->queue[evtchn->priority] && + old_q == &old_v->evtchn_fifo->queue[lastq.last_priority] ) + break; + + if ( q != old_q ) + spin_unlock(&old_q->lock); + spin_unlock_irqrestore(&q->lock, flags); + + if ( try == 3 ) + { + gprintk(XENLOG_WARNING, + "dom%d port %d lost event (too many queue changes)\n", + d->domain_id, evtchn->port); + return; + } + } + was_pending = guest_test_and_set_bit(d, EVTCHN_FIFO_PENDING, word); /* @@ -212,9 +225,7 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) if ( !guest_test_bit(d, EVTCHN_FIFO_MASKED, word) && !guest_test_bit(d, EVTCHN_FIFO_LINKED, word) ) { - struct evtchn_fifo_queue *q, *old_q; event_word_t *tail_word; - bool_t linked = 0; /* * Control block not mapped. The guest must not unmask an @@ -228,22 +239,8 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) goto done; } - /* - * No locking around getting the queue. This may race with - * changing the priority but we are allowed to signal the - * event once on the old priority. - */ - q = &v->evtchn_fifo->queue[evtchn->priority]; - - old_q = lock_old_queue(d, evtchn, &flags); - if ( !old_q ) - goto done; - if ( guest_test_and_set_bit(d, EVTCHN_FIFO_LINKED, word) ) - { - spin_unlock_irqrestore(&old_q->lock, flags); goto done; - } /* * If this event was a tail, the old queue is now empty and @@ -262,8 +259,8 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) lastq.last_priority = q->priority; write_atomic(&evtchn->fifo_lastq, lastq.raw); - spin_unlock_irqrestore(&old_q->lock, flags); - spin_lock_irqsave(&q->lock, flags); + spin_unlock(&old_q->lock); + old_q = q; } /* @@ -276,6 +273,7 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) * If the queue is empty (i.e., we haven't linked to the new * event), head must be updated. */ + linked = false; if ( q->tail ) { tail_word = evtchn_fifo_word_from_port(d, q->tail); @@ -284,15 +282,18 @@ static void evtchn_fifo_set_pending(struct vcpu *v, struct evtchn *evtchn) if ( !linked ) write_atomic(q->head, port); q->tail = port; - - spin_unlock_irqrestore(&q->lock, flags); - - if ( !linked - && !guest_test_and_set_bit(d, q->priority, - &v->evtchn_fifo->control_block->ready) ) - vcpu_mark_events_pending(v); } + done: + if ( q != old_q ) + spin_unlock(&old_q->lock); + spin_unlock_irqrestore(&q->lock, flags); + + if ( !linked && + !guest_test_and_set_bit(d, q->priority, + &v->evtchn_fifo->control_block->ready) ) + vcpu_mark_events_pending(v); + if ( !was_pending ) evtchn_check_pollers(d, port); }
Two cpus entering evtchn_fifo_set_pending() for the same event channel can race in case the first one gets interrupted after setting EVTCHN_FIFO_PENDING and when the other one manages to set EVTCHN_FIFO_LINKED before the first one is testing that bit. This can lead to evtchn_check_pollers() being called before the event is put properly into the queue, resulting eventually in the guest not seeing the event pending and thus blocking forever afterwards. Note that commit 5f2df45ead7c1195 ("xen/evtchn: rework per event channel lock") made the race just more obvious, while the fifo event channel implementation had this race from the beginning when an unmask operation was running in parallel with an event channel send operation. For avoiding this race the queue locking in evtchn_fifo_set_pending() needs to be reworked to cover the test of EVTCHN_FIFO_PENDING, EVTCHN_FIFO_MASKED and EVTCHN_FIFO_LINKED, too. Additionally when an event channel needs to change queues both queues need to be locked initially. Fixes: 5f2df45ead7c1195 ("xen/evtchn: rework per event channel lock") Fixes: 88910061ec615b2d ("evtchn: add FIFO-based event channel hypercalls and port ops") Signed-off-by: Juergen Gross <jgross@suse.com> --- xen/common/event_fifo.c | 115 ++++++++++++++++++++-------------------- 1 file changed, 58 insertions(+), 57 deletions(-)