diff mbox

xen: credit2: fix spinlock irq-safety violation

Message ID 1505907844.3483.12.camel@citrix.com (mailing list archive)
State New, archived
Headers show

Commit Message

Dario Faggioli Sept. 20, 2017, 11:44 a.m. UTC
On Wed, 2017-09-20 at 11:04 +0100, George Dunlap wrote:
> On 09/20/2017 10:19 AM, George Dunlap wrote:
> > > diff --git a/xen/common/sched_credit2.c
> > > b/xen/common/sched_credit2.c
> > > index 32234ac..7a550db 100644
> > > --- a/xen/common/sched_credit2.c
> > > +++ b/xen/common/sched_credit2.c
> > > @@ -2923,13 +2923,13 @@ csched2_free_domdata(const struct
> > > scheduler *ops, void *data)
> > >  
> > >      write_lock_irqsave(&prv->lock, flags);
> > >  
> > > -    kill_timer(sdom->repl_timer);
> > > -    xfree(sdom->repl_timer);
> > > -
> > >      list_del_init(&sdom->sdom_elem);
> > >  
> > >      write_unlock_irqrestore(&prv->lock, flags);
> > >  
> > > +    kill_timer(sdom->repl_timer);
> > > +    xfree(sdom->repl_timer);
> > 
> > Any particular reason for moving the kill_timer() as well as the
> > xfree
> > outside the lock?  What happens if the timer goes off after the
> > irqrestore but before the kill_timer?
> 
> Looks like if the timer fires, nothing terribly bad will happen; it
> will
> just do a useless budget replenishment.
> 
It's just that it has not reason to be there, as nothing that it does
is serialized by prv->lock, so it only makes the critical section (with
IRQ disabled) longer, for no reason, which is bad, as this being a
write_lock(), it'll stop readers too.

I think it was (my) mistake to put it there in the first place.

> It looks like kill_timer() disables irqs, so it could be moved inside
> the critical section.  I'm inclined to say we should do so.  I don't
> anticipate the budget replenishment ever to need to walk the domain
> list, but should that change, this would be a bear of a bug to find.
> 
Indeed it's rather unlikely for the replenishment handler to have to
use sdom_elem to go through the list of domains.

IAC, if you're concerned about that, I'd much rather put both
kill_timer() and xfree() before the critical section, rather than
after, like in the attached patch.

Dario

Comments

Jan Beulich Sept. 20, 2017, 11:59 a.m. UTC | #1
>>> On 20.09.17 at 13:44, <dario.faggioli@citrix.com> wrote:
> IAC, if you're concerned about that, I'd much rather put both
> kill_timer() and xfree() before the critical section, rather than
> after, like in the attached patch.

Hmm, killing the timer upfront is certainly fine, but is freeing the
data before removing the element from the list safe not only
currently, but also going forward?

Jan
George Dunlap Sept. 20, 2017, 1:49 p.m. UTC | #2
On 09/20/2017 12:59 PM, Jan Beulich wrote:
>>>> On 20.09.17 at 13:44, <dario.faggioli@citrix.com> wrote:
>> IAC, if you're concerned about that, I'd much rather put both
>> kill_timer() and xfree() before the critical section, rather than
>> after, like in the attached patch.
> 
> Hmm, killing the timer upfront is certainly fine, but is freeing the
> data before removing the element from the list safe not only
> currently, but also going forward?

I agree with Jan -- if you don't want to put the kill_timer() in the
critical section, put it beforehand; but don't free the structure until
after the sdom struct has been removed from the list.

Sorry to be picky, but I'm positive I'm not going to remember this in
six months' time, and I'd just rather be safe.

 -George
Dario Faggioli Sept. 21, 2017, 12:29 a.m. UTC | #3
On Wed, 2017-09-20 at 14:49 +0100, George Dunlap wrote:
> On 09/20/2017 12:59 PM, Jan Beulich wrote:
> > Hmm, killing the timer upfront is certainly fine, but is freeing
> > the
> > data before removing the element from the list safe not only
> > currently, but also going forward?
> 
> I agree with Jan -- if you don't want to put the kill_timer() in the
> critical section, put it beforehand; but don't free the structure
> until
> after the sdom struct has been removed from the list.
> 
In general, I agree, and so I'll do this.

In this case, considering what list we are talking about, and what it
is used for right now (i.e., only for debug dump!), there are no
dependencies between these two operations.

And in any scenario that I can anticipate, where such a dependency
would come into being, the level of restructuring of the code that is
necessary to use the list in any really useful way, would be
significant, and there may be a few other cases where a similar
dependency would also surface and become an issue, and that will
probably lead us to consider/remember this code here as well.

So, IOW, I wouldn't consider this a problem, in the specific case. But
since the net effect of this patch and what George suggests is the
same, and since I appreciate the value that it has, in principle, doing
changes like these with this approach, I'm fine with killing before and
freeing after the critical sections, and I'll send a patch for that

Thanks and Regards,
Dario
diff mbox

Patch

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 32234ac..32b0363 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -2921,11 +2921,11 @@  csched2_free_domdata(const struct scheduler *ops, void *data)
     struct csched2_dom *sdom = data;
     struct csched2_private *prv = csched2_priv(ops);
 
-    write_lock_irqsave(&prv->lock, flags);
-
     kill_timer(sdom->repl_timer);
     xfree(sdom->repl_timer);
 
+    write_lock_irqsave(&prv->lock, flags);
+
     list_del_init(&sdom->sdom_elem);
 
     write_unlock_irqrestore(&prv->lock, flags);