diff mbox

xen: credit2: fix spinlock irq-safety violation

Message ID 150578708843.32006.17195420852157192880.stgit@Solace.fritz.box (mailing list archive)
State New, archived
Headers show

Commit Message

Dario Faggioli Sept. 19, 2017, 2:11 a.m. UTC
In commit ad4b3e1e9df34 ("xen: credit2: implement
utilization cap") xfree() was being called (for
deallocating the budget replenishment timer, during
domain destruction) inside an IRQ disabled critical
section.

That must not happen, as it uses the mem-pool's lock,
which needs to be taken with IRQ enabled. And, in fact,
we crash (in debug builds):

(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Xen BUG at spinlock.c:47
(XEN) ****************************************

Let's, therefore, kill and deallocate the timer outside of
the critical sections, when IRQs are enabled.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: osstest service owner <osstest-admin@xenproject.org>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
This was spotted by OSSTest's flight 113562:

 http://logs.test-lab.xenproject.org/osstest/logs/113562/
 http://logs.test-lab.xenproject.org/osstest/logs/113562/test-amd64-amd64-xl-credit2/serial-godello0.log
---
 xen/common/sched_credit2.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Wei Liu Sept. 19, 2017, 8:23 a.m. UTC | #1
On Tue, Sep 19, 2017 at 04:11:28AM +0200, Dario Faggioli wrote:
> In commit ad4b3e1e9df34 ("xen: credit2: implement
> utilization cap") xfree() was being called (for
> deallocating the budget replenishment timer, during
> domain destruction) inside an IRQ disabled critical
> section.
> 
> That must not happen, as it uses the mem-pool's lock,
> which needs to be taken with IRQ enabled. And, in fact,
> we crash (in debug builds):
> 
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) Xen BUG at spinlock.c:47
> (XEN) ****************************************
> 
> Let's, therefore, kill and deallocate the timer outside of
> the critical sections, when IRQs are enabled.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monné Sept. 20, 2017, 9:16 a.m. UTC | #2
On Tue, Sep 19, 2017 at 04:11:28AM +0200, Dario Faggioli wrote:
> In commit ad4b3e1e9df34 ("xen: credit2: implement
> utilization cap") xfree() was being called (for
> deallocating the budget replenishment timer, during
> domain destruction) inside an IRQ disabled critical
> section.
> 
> That must not happen, as it uses the mem-pool's lock,
> which needs to be taken with IRQ enabled. And, in fact,
> we crash (in debug builds):
> 
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) Xen BUG at spinlock.c:47
> (XEN) ****************************************
> 
> Let's, therefore, kill and deallocate the timer outside of
> the critical sections, when IRQs are enabled.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

It would be good to get this committed sooner rather than later, so
the push gate can be unblocked.

Thanks, Roger.
George Dunlap Sept. 20, 2017, 9:19 a.m. UTC | #3
On 09/19/2017 03:11 AM, Dario Faggioli wrote:
> In commit ad4b3e1e9df34 ("xen: credit2: implement
> utilization cap") xfree() was being called (for
> deallocating the budget replenishment timer, during
> domain destruction) inside an IRQ disabled critical
> section.
> 
> That must not happen, as it uses the mem-pool's lock,
> which needs to be taken with IRQ enabled. And, in fact,
> we crash (in debug builds):
> 
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) Xen BUG at spinlock.c:47
> (XEN) ****************************************
> 
> Let's, therefore, kill and deallocate the timer outside of
> the critical sections, when IRQs are enabled.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: osstest service owner <osstest-admin@xenproject.org>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
> This was spotted by OSSTest's flight 113562:
> 
>  http://logs.test-lab.xenproject.org/osstest/logs/113562/
>  http://logs.test-lab.xenproject.org/osstest/logs/113562/test-amd64-amd64-xl-credit2/serial-godello0.log
> ---
>  xen/common/sched_credit2.c |    6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 32234ac..7a550db 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -2923,13 +2923,13 @@ csched2_free_domdata(const struct scheduler *ops, void *data)
>  
>      write_lock_irqsave(&prv->lock, flags);
>  
> -    kill_timer(sdom->repl_timer);
> -    xfree(sdom->repl_timer);
> -
>      list_del_init(&sdom->sdom_elem);
>  
>      write_unlock_irqrestore(&prv->lock, flags);
>  
> +    kill_timer(sdom->repl_timer);
> +    xfree(sdom->repl_timer);

Any particular reason for moving the kill_timer() as well as the xfree
outside the lock?  What happens if the timer goes off after the
irqrestore but before the kill_timer?

 -George
George Dunlap Sept. 20, 2017, 10:04 a.m. UTC | #4
On 09/20/2017 10:19 AM, George Dunlap wrote:
> On 09/19/2017 03:11 AM, Dario Faggioli wrote:
>> In commit ad4b3e1e9df34 ("xen: credit2: implement
>> utilization cap") xfree() was being called (for
>> deallocating the budget replenishment timer, during
>> domain destruction) inside an IRQ disabled critical
>> section.
>>
>> That must not happen, as it uses the mem-pool's lock,
>> which needs to be taken with IRQ enabled. And, in fact,
>> we crash (in debug builds):
>>
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) Xen BUG at spinlock.c:47
>> (XEN) ****************************************
>>
>> Let's, therefore, kill and deallocate the timer outside of
>> the critical sections, when IRQs are enabled.
>>
>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>> ---
>> Cc: osstest service owner <osstest-admin@xenproject.org>
>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>> Cc: Wei Liu <wei.liu2@citrix.com>
>> ---
>> This was spotted by OSSTest's flight 113562:
>>
>>  http://logs.test-lab.xenproject.org/osstest/logs/113562/
>>  http://logs.test-lab.xenproject.org/osstest/logs/113562/test-amd64-amd64-xl-credit2/serial-godello0.log
>> ---
>>  xen/common/sched_credit2.c |    6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
>> index 32234ac..7a550db 100644
>> --- a/xen/common/sched_credit2.c
>> +++ b/xen/common/sched_credit2.c
>> @@ -2923,13 +2923,13 @@ csched2_free_domdata(const struct scheduler *ops, void *data)
>>  
>>      write_lock_irqsave(&prv->lock, flags);
>>  
>> -    kill_timer(sdom->repl_timer);
>> -    xfree(sdom->repl_timer);
>> -
>>      list_del_init(&sdom->sdom_elem);
>>  
>>      write_unlock_irqrestore(&prv->lock, flags);
>>  
>> +    kill_timer(sdom->repl_timer);
>> +    xfree(sdom->repl_timer);
> 
> Any particular reason for moving the kill_timer() as well as the xfree
> outside the lock?  What happens if the timer goes off after the
> irqrestore but before the kill_timer?

Looks like if the timer fires, nothing terribly bad will happen; it will
just do a useless budget replenishment.

It looks like kill_timer() disables irqs, so it could be moved inside
the critical section.  I'm inclined to say we should do so.  I don't
anticipate the budget replenishment ever to need to walk the domain
list, but should that change, this would be a bear of a bug to find.

 -George
diff mbox

Patch

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 32234ac..7a550db 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -2923,13 +2923,13 @@  csched2_free_domdata(const struct scheduler *ops, void *data)
 
     write_lock_irqsave(&prv->lock, flags);
 
-    kill_timer(sdom->repl_timer);
-    xfree(sdom->repl_timer);
-
     list_del_init(&sdom->sdom_elem);
 
     write_unlock_irqrestore(&prv->lock, flags);
 
+    kill_timer(sdom->repl_timer);
+    xfree(sdom->repl_timer);
+
     xfree(data);
 }