diff mbox

[v1,1/1] KVM: PPC: disable preemption when using hard_irq_disable()

Message ID 1373436139-27998-1-git-send-email-tiejun.chen@windriver.com (mailing list archive)
State New, archived
Headers show

Commit Message

Tiejun Chen July 10, 2013, 6:02 a.m. UTC
We should ensure the preemption cannot occur while calling get_paca()
insdide hard_irq_disable(), otherwise the paca_struct may be the
wrong one just after. And btw, we may update timing stats in this case.

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
---
 arch/powerpc/kvm/booke.c |    2 ++
 1 file changed, 2 insertions(+)

Comments

Alexander Graf July 10, 2013, 9:49 a.m. UTC | #1
On 10.07.2013, at 08:02, Tiejun Chen wrote:

> We should ensure the preemption cannot occur while calling get_paca()
> insdide hard_irq_disable(), otherwise the paca_struct may be the
> wrong one just after. And btw, we may update timing stats in this case.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
> ---
> arch/powerpc/kvm/booke.c |    2 ++
> 1 file changed, 2 insertions(+)
> 
> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> index dcc94f0..9dae25d 100644
> --- a/arch/powerpc/kvm/booke.c
> +++ b/arch/powerpc/kvm/booke.c
> @@ -839,6 +839,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
> 	WARN_ON(local_paca->irq_happened != 0);
> #endif
> 
> +	preempt_disable();
> 	/*
> 	 * We enter with interrupts disabled in hardware, but
> 	 * we need to call hard_irq_disable anyway to ensure that
> @@ -848,6 +849,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
> 
> 	/* update before a new last_exit_type is rewritten */
> 	kvmppc_update_timing_stats(vcpu);
> +	preempt_enable();

All of the code here is already called with interrupts disabled. I don't see how we could preempt then?


Alex

> 
> 	/* restart interrupts if they were meant for the host */
> 	kvmppc_restart_interrupt(vcpu, exit_nr);
> -- 
> 1.7.9.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Wood July 10, 2013, 7:15 p.m. UTC | #2
On 07/10/2013 01:02:19 AM, Tiejun Chen wrote:
> We should ensure the preemption cannot occur while calling get_paca()
> insdide hard_irq_disable(), otherwise the paca_struct may be the
> wrong one just after. And btw, we may update timing stats in this  
> case.

The soft-ee mechanism depends on accessing the PACA directly via r13 to  
avoid this.  We probably should be using inline asm to read was_enabled  
rather than hoping the compiler doesn't do anything silly.

Plus what Alex said, regarding this patch specifically.

-Scott
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tiejun Chen July 11, 2013, 2:48 a.m. UTC | #3
On 07/10/2013 05:49 PM, Alexander Graf wrote:
>
> On 10.07.2013, at 08:02, Tiejun Chen wrote:
>
>> We should ensure the preemption cannot occur while calling get_paca()
>> insdide hard_irq_disable(), otherwise the paca_struct may be the
>> wrong one just after. And btw, we may update timing stats in this case.
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
>> ---
>> arch/powerpc/kvm/booke.c |    2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
>> index dcc94f0..9dae25d 100644
>> --- a/arch/powerpc/kvm/booke.c
>> +++ b/arch/powerpc/kvm/booke.c
>> @@ -839,6 +839,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
>> 	WARN_ON(local_paca->irq_happened != 0);
>> #endif
>>
>> +	preempt_disable();
>> 	/*
>> 	 * We enter with interrupts disabled in hardware, but
>> 	 * we need to call hard_irq_disable anyway to ensure that
>> @@ -848,6 +849,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
>>
>> 	/* update before a new last_exit_type is rewritten */
>> 	kvmppc_update_timing_stats(vcpu);
>> +	preempt_enable();
>
> All of the code here is already called with interrupts disabled. I don't see how we could preempt then?

But the kernel still check this in preempt case,

#define get_paca()      ((void) debug_smp_processor_id(), local_paca)

then trigger that known call trace as I observed :)

BUG: using smp_processor_id() in preemptible [00000000] code: qemu-system-ppc/2065
caller is .kvmppc_handle_exit+0x48/0x810
CPU: 0 PID: 2065 Comm: qemu-system-ppc Not tainted 3.10.0-172036-ge2daa28-dirty #116
Call Trace:
[c0000001fc637570] [c00000000000835c] .show_stack+0x7c/0x1f0 (unreliable)
[c0000001fc637640] [c000000000673a0c] .dump_stack+0x28/0x3c
[c0000001fc6376b0] [c0000000002f04d8] .debug_smp_processor_id+0x108/0x120
[c0000001fc637740] [c0000000000444e8] .kvmppc_handle_exit+0x48/0x810
[c0000001fc6377f0] [c000000000047c80] .kvmppc_resume_host+0xa4/0xf8

Tiejun

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tiejun Chen July 11, 2013, 3 a.m. UTC | #4
On 07/11/2013 03:15 AM, Scott Wood wrote:
> On 07/10/2013 01:02:19 AM, Tiejun Chen wrote:
>> We should ensure the preemption cannot occur while calling get_paca()
>> insdide hard_irq_disable(), otherwise the paca_struct may be the
>> wrong one just after. And btw, we may update timing stats in this case.
>
> The soft-ee mechanism depends on accessing the PACA directly via r13 to avoid
> this.  We probably should be using inline asm to read was_enabled rather than

Yes.

> hoping the compiler doesn't do anything silly.

Do you recommend I should directly replace get_paca() with local_paca inside 
hard_irq_disable()?

#define hard_irq_disable()      do {                    \
         u8 _was_enabled = get_paca()->soft_enabled;     \

->	u8 _was_enabled = local_paca->soft_enabled;

But is this safe for all scenarios?

Tiejun

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf July 11, 2013, 9:49 a.m. UTC | #5
On 11.07.2013, at 04:48, tiejun.chen wrote:

> On 07/10/2013 05:49 PM, Alexander Graf wrote:
>> 
>> On 10.07.2013, at 08:02, Tiejun Chen wrote:
>> 
>>> We should ensure the preemption cannot occur while calling get_paca()
>>> insdide hard_irq_disable(), otherwise the paca_struct may be the
>>> wrong one just after. And btw, we may update timing stats in this case.
>>> 
>>> Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
>>> ---
>>> arch/powerpc/kvm/booke.c |    2 ++
>>> 1 file changed, 2 insertions(+)
>>> 
>>> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
>>> index dcc94f0..9dae25d 100644
>>> --- a/arch/powerpc/kvm/booke.c
>>> +++ b/arch/powerpc/kvm/booke.c
>>> @@ -839,6 +839,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
>>> 	WARN_ON(local_paca->irq_happened != 0);
>>> #endif
>>> 
>>> +	preempt_disable();
>>> 	/*
>>> 	 * We enter with interrupts disabled in hardware, but
>>> 	 * we need to call hard_irq_disable anyway to ensure that
>>> @@ -848,6 +849,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
>>> 
>>> 	/* update before a new last_exit_type is rewritten */
>>> 	kvmppc_update_timing_stats(vcpu);
>>> +	preempt_enable();
>> 
>> All of the code here is already called with interrupts disabled. I don't see how we could preempt then?
> 
> But the kernel still check this in preempt case,
> 
> #define get_paca()      ((void) debug_smp_processor_id(), local_paca)
> 
> then trigger that known call trace as I observed :)
> 
> BUG: using smp_processor_id() in preemptible [00000000] code: qemu-system-ppc/2065

# define preemptible()	(preempt_count() == 0 && !irqs_disabled())

So we are only hitting this BUG() when either preempt_count is 0 (which your patch is trying to fix) and at the same time interrupts are enabled. But wait - interrupts are disabled, aren't they? Let's check.

#define irqs_disabled()                                 \
        ({                                              \
                unsigned long _flags;                   \
                raw_local_save_flags(_flags);           \
                raw_irqs_disabled_flags(_flags);        \
        })

#define raw_irqs_disabled_flags(flags)                  \
        ({                                              \
                typecheck(unsigned long, flags);        \
                arch_irqs_disabled_flags(flags);        \
        })

static inline unsigned long arch_local_save_flags(void)
{
        unsigned long flags;

        asm volatile(
                "lbz %0,%1(13)"
                : "=r" (flags)
                : "i" (offsetof(struct paca_struct, soft_enabled)));

        return flags;
}

static inline bool arch_irqs_disabled_flags(unsigned long flags)
{
        return flags == 0;
}


So we're running with soft_enabled == 0 here which means that irqs_disabled() also returns 0.

Ben, is soft_enabled == 0; hard_enabled == 1 a valid combination that may ever occur?


Alex

> caller is .kvmppc_handle_exit+0x48/0x810
> CPU: 0 PID: 2065 Comm: qemu-system-ppc Not tainted 3.10.0-172036-ge2daa28-dirty #116
> Call Trace:
> [c0000001fc637570] [c00000000000835c] .show_stack+0x7c/0x1f0 (unreliable)
> [c0000001fc637640] [c000000000673a0c] .dump_stack+0x28/0x3c
> [c0000001fc6376b0] [c0000000002f04d8] .debug_smp_processor_id+0x108/0x120
> [c0000001fc637740] [c0000000000444e8] .kvmppc_handle_exit+0x48/0x810
> [c0000001fc6377f0] [c000000000047c80] .kvmppc_resume_host+0xa4/0xf8
> 
> Tiejun
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Herrenschmidt July 11, 2013, 12:28 p.m. UTC | #6
On Thu, 2013-07-11 at 11:49 +0200, Alexander Graf wrote:
> Ben, is soft_enabled == 0; hard_enabled == 1 a valid combination that
> may ever occur?

Yes of course, that's what we call "soft disabled" :-) It's even the
whole point of doing lazy disable...

Ben.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf July 11, 2013, 12:47 p.m. UTC | #7
On 11.07.2013, at 14:28, Benjamin Herrenschmidt wrote:

> On Thu, 2013-07-11 at 11:49 +0200, Alexander Graf wrote:
>> Ben, is soft_enabled == 0; hard_enabled == 1 a valid combination that
>> may ever occur?
> 
> Yes of course, that's what we call "soft disabled" :-) It's even the
> whole point of doing lazy disable...

Meh. Of course it's soft_enabled = 1; hard_enabled = 0.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Herrenschmidt July 11, 2013, 12:54 p.m. UTC | #8
On Thu, 2013-07-11 at 14:47 +0200, Alexander Graf wrote:
> > Yes of course, that's what we call "soft disabled" :-) It's even the
> > whole point of doing lazy disable...
> 
> Meh. Of course it's soft_enabled = 1; hard_enabled = 0.

That doesn't happen in "normal" C code. It happens under very specific
circumstances, such as in the guts of entry_64.S, in areas where we just
want to temporarily mask HW interrupts without changing the SW state
(and thus without having to deal with replays etc...).

We typically also do that right before going to idle on some processors
where we come back from idle with interrupts hard enabled, possibly
right in an interrupt vector.

Typically that's a state that makes some sense on KVM entry. From a
Linux perspective, you enter KVM with interrupts enabled. But you
temporarily hard disable on the way down while doing the low level
context switch.

This works well as long as you know you have no pending replay event.
That should be fine if you do a direct transition from soft+hard enabled
to hard disabled (without soft disabling). In that case there should be
nothing in irq_happened.

It's equivalent to returning to userspace from the kernel. We are
soft-enabled, but the code in ret_from_except hard disables while
mucking around with TIF_FLAGS etc... until the final rfid

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf July 11, 2013, 1:07 p.m. UTC | #9
On 11.07.2013, at 14:54, Benjamin Herrenschmidt wrote:

> On Thu, 2013-07-11 at 14:47 +0200, Alexander Graf wrote:
>>> Yes of course, that's what we call "soft disabled" :-) It's even the
>>> whole point of doing lazy disable...
>> 
>> Meh. Of course it's soft_enabled = 1; hard_enabled = 0.
> 
> That doesn't happen in "normal" C code. It happens under very specific
> circumstances, such as in the guts of entry_64.S, in areas where we just
> want to temporarily mask HW interrupts without changing the SW state
> (and thus without having to deal with replays etc...).
> 
> We typically also do that right before going to idle on some processors
> where we come back from idle with interrupts hard enabled, possibly
> right in an interrupt vector.
> 
> Typically that's a state that makes some sense on KVM entry. From a
> Linux perspective, you enter KVM with interrupts enabled. But you
> temporarily hard disable on the way down while doing the low level
> context switch.
> 
> This works well as long as you know you have no pending replay event.
> That should be fine if you do a direct transition from soft+hard enabled
> to hard disabled (without soft disabling). In that case there should be
> nothing in irq_happened.
> 
> It's equivalent to returning to userspace from the kernel. We are
> soft-enabled, but the code in ret_from_except hard disables while
> mucking around with TIF_FLAGS etc... until the final rfid

Ok, let me quickly explain the problem.

We are leaving host context, switching slowly into guest context. During that transition we call get_paca() indirectly (apparently by another call to hard_disable() which sounds bogus, but that's another story).

get_paca() warns when we're preemptible. We're only not preemptible when either preempt is disabled or irqs are disabled. Irqs are disabled, but arch_irqs_disabled() doesn't know, because it only checks for soft disabled IRQs.

So we can fix this either by setting IRQs as soft disabled as well or by disabling preemption until we enter the guest for real. Any preferences?


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Herrenschmidt July 12, 2013, 12:19 a.m. UTC | #10
On Thu, 2013-07-11 at 15:07 +0200, Alexander Graf wrote:
> Ok, let me quickly explain the problem.
> 
> We are leaving host context, switching slowly into guest context.
> During that transition we call get_paca() indirectly (apparently by
> another call to hard_disable() which sounds bogus, but that's another
> story).
> 
> get_paca() warns when we're preemptible. We're only not preemptible
> when either preempt is disabled or irqs are disabled. Irqs are
> disabled, but arch_irqs_disabled() doesn't know, because it only
> checks for soft disabled IRQs.
> 
> So we can fix this either by setting IRQs as soft disabled as well or
> by disabling preemption until we enter the guest for real. Any
> preferences?

Well, if you hard disable first (ie, direct transition from full enabled
to hard disabled), you know you have nothing lazy pending in
irq_pending, then it's ok to mess around with local_paca->soft_enabled
to make it "look disabled".

IE. Call hard_irq_disable(), then only turn local_paca->soft_enabled
back on late in the process, some time before the final rfi(d).

That works as long as you had a transition from full enabled to full
disabled and don't hard re-enable in the process. IE, You are certain
that there is nothing pending in irq_happened.

HOWEVER !

If you do that, you *ALSO* need to clear irq_happened. You must *NEVER*
leave PACA_IRQ_HARD_DIS set in irq_happened if you are soft-enabled, and
since the above means that you *will* be seen as soft enabled on the way
out of the guest, you can kaboom.

BTW. I'm fine with a patch that does:

#define hard_irq_disable()	do {			\
	u8 _was_enabled = get_paca()->soft_enabled;	\
	__hard_irq_disable();				\
-	get_paca()->soft_enabled = 0;			\
+	local_paca->soft_enabled = 0;			\

In fact we should probably do it anyway.

Cheers,
Ben.



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tiejun Chen July 12, 2013, 2:13 a.m. UTC | #11
On 07/12/2013 08:19 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2013-07-11 at 15:07 +0200, Alexander Graf wrote:
>> Ok, let me quickly explain the problem.
>>
>> We are leaving host context, switching slowly into guest context.
>> During that transition we call get_paca() indirectly (apparently by
>> another call to hard_disable() which sounds bogus, but that's another
>> story).
>>
>> get_paca() warns when we're preemptible. We're only not preemptible
>> when either preempt is disabled or irqs are disabled. Irqs are
>> disabled, but arch_irqs_disabled() doesn't know, because it only
>> checks for soft disabled IRQs.
>>
>> So we can fix this either by setting IRQs as soft disabled as well or
>> by disabling preemption until we enter the guest for real. Any
>> preferences?
>
> Well, if you hard disable first (ie, direct transition from full enabled
> to hard disabled), you know you have nothing lazy pending in
> irq_pending, then it's ok to mess around with local_paca->soft_enabled
> to make it "look disabled".
>
> IE. Call hard_irq_disable(), then only turn local_paca->soft_enabled
> back on late in the process, some time before the final rfi(d).
>
> That works as long as you had a transition from full enabled to full
> disabled and don't hard re-enable in the process. IE, You are certain
> that there is nothing pending in irq_happened.
>
> HOWEVER !
>
> If you do that, you *ALSO* need to clear irq_happened. You must *NEVER*
> leave PACA_IRQ_HARD_DIS set in irq_happened if you are soft-enabled, and
> since the above means that you *will* be seen as soft enabled on the way
> out of the guest, you can kaboom.
>
> BTW. I'm fine with a patch that does:
>
> #define hard_irq_disable()	do {			\
> 	u8 _was_enabled = get_paca()->soft_enabled;	\

Current problem I met is issued from the above line.

> 	__hard_irq_disable();				\
> -	get_paca()->soft_enabled = 0;			\

Not here.

If I'm misunderstanding what you guys means, please correct me since this is a 
long discussion thread. I have to reread that carefully.

Tiejun

> +	local_paca->soft_enabled = 0;			\
>
> In fact we should probably do it anyway.
>
> Cheers,
> Ben.
>
>
>
>

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Herrenschmidt July 12, 2013, 3:57 a.m. UTC | #12
On Fri, 2013-07-12 at 10:13 +0800, tiejun.chen wrote:
> > #define hard_irq_disable()    do {                    \
> >       u8 _was_enabled = get_paca()->soft_enabled;     \
> 
> Current problem I met is issued from the above line.
> 
> >       __hard_irq_disable();                           \
> > -     get_paca()->soft_enabled = 0;                   \
> 
> Not here.
> 
> If I'm misunderstanding what you guys means, please correct me since this is a 
> long discussion thread. I have to reread that carefully.

Then make it
	u8 _was_enabled;
	__hard_irq_disable();
	was_enabled = local_paca->....

Once you have hard disabled, using local_paca directly *should* be safe
(minus that gcc problem I mentioned).

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index dcc94f0..9dae25d 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -839,6 +839,7 @@  int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	WARN_ON(local_paca->irq_happened != 0);
 #endif
 
+	preempt_disable();
 	/*
 	 * We enter with interrupts disabled in hardware, but
 	 * we need to call hard_irq_disable anyway to ensure that
@@ -848,6 +849,7 @@  int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 
 	/* update before a new last_exit_type is rewritten */
 	kvmppc_update_timing_stats(vcpu);
+	preempt_enable();
 
 	/* restart interrupts if they were meant for the host */
 	kvmppc_restart_interrupt(vcpu, exit_nr);