Message ID | 1373436139-27998-1-git-send-email-tiejun.chen@windriver.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 10.07.2013, at 08:02, Tiejun Chen wrote: > We should ensure the preemption cannot occur while calling get_paca() > insdide hard_irq_disable(), otherwise the paca_struct may be the > wrong one just after. And btw, we may update timing stats in this case. > > Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com> > --- > arch/powerpc/kvm/booke.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c > index dcc94f0..9dae25d 100644 > --- a/arch/powerpc/kvm/booke.c > +++ b/arch/powerpc/kvm/booke.c > @@ -839,6 +839,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, > WARN_ON(local_paca->irq_happened != 0); > #endif > > + preempt_disable(); > /* > * We enter with interrupts disabled in hardware, but > * we need to call hard_irq_disable anyway to ensure that > @@ -848,6 +849,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, > > /* update before a new last_exit_type is rewritten */ > kvmppc_update_timing_stats(vcpu); > + preempt_enable(); All of the code here is already called with interrupts disabled. I don't see how we could preempt then? Alex > > /* restart interrupts if they were meant for the host */ > kvmppc_restart_interrupt(vcpu, exit_nr); > -- > 1.7.9.5 > > -- > To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/10/2013 01:02:19 AM, Tiejun Chen wrote: > We should ensure the preemption cannot occur while calling get_paca() > insdide hard_irq_disable(), otherwise the paca_struct may be the > wrong one just after. And btw, we may update timing stats in this > case. The soft-ee mechanism depends on accessing the PACA directly via r13 to avoid this. We probably should be using inline asm to read was_enabled rather than hoping the compiler doesn't do anything silly. Plus what Alex said, regarding this patch specifically. -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/10/2013 05:49 PM, Alexander Graf wrote: > > On 10.07.2013, at 08:02, Tiejun Chen wrote: > >> We should ensure the preemption cannot occur while calling get_paca() >> insdide hard_irq_disable(), otherwise the paca_struct may be the >> wrong one just after. And btw, we may update timing stats in this case. >> >> Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com> >> --- >> arch/powerpc/kvm/booke.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c >> index dcc94f0..9dae25d 100644 >> --- a/arch/powerpc/kvm/booke.c >> +++ b/arch/powerpc/kvm/booke.c >> @@ -839,6 +839,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, >> WARN_ON(local_paca->irq_happened != 0); >> #endif >> >> + preempt_disable(); >> /* >> * We enter with interrupts disabled in hardware, but >> * we need to call hard_irq_disable anyway to ensure that >> @@ -848,6 +849,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, >> >> /* update before a new last_exit_type is rewritten */ >> kvmppc_update_timing_stats(vcpu); >> + preempt_enable(); > > All of the code here is already called with interrupts disabled. I don't see how we could preempt then? But the kernel still check this in preempt case, #define get_paca() ((void) debug_smp_processor_id(), local_paca) then trigger that known call trace as I observed :) BUG: using smp_processor_id() in preemptible [00000000] code: qemu-system-ppc/2065 caller is .kvmppc_handle_exit+0x48/0x810 CPU: 0 PID: 2065 Comm: qemu-system-ppc Not tainted 3.10.0-172036-ge2daa28-dirty #116 Call Trace: [c0000001fc637570] [c00000000000835c] .show_stack+0x7c/0x1f0 (unreliable) [c0000001fc637640] [c000000000673a0c] .dump_stack+0x28/0x3c [c0000001fc6376b0] [c0000000002f04d8] .debug_smp_processor_id+0x108/0x120 [c0000001fc637740] [c0000000000444e8] .kvmppc_handle_exit+0x48/0x810 [c0000001fc6377f0] [c000000000047c80] .kvmppc_resume_host+0xa4/0xf8 Tiejun -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/11/2013 03:15 AM, Scott Wood wrote: > On 07/10/2013 01:02:19 AM, Tiejun Chen wrote: >> We should ensure the preemption cannot occur while calling get_paca() >> insdide hard_irq_disable(), otherwise the paca_struct may be the >> wrong one just after. And btw, we may update timing stats in this case. > > The soft-ee mechanism depends on accessing the PACA directly via r13 to avoid > this. We probably should be using inline asm to read was_enabled rather than Yes. > hoping the compiler doesn't do anything silly. Do you recommend I should directly replace get_paca() with local_paca inside hard_irq_disable()? #define hard_irq_disable() do { \ u8 _was_enabled = get_paca()->soft_enabled; \ -> u8 _was_enabled = local_paca->soft_enabled; But is this safe for all scenarios? Tiejun -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11.07.2013, at 04:48, tiejun.chen wrote: > On 07/10/2013 05:49 PM, Alexander Graf wrote: >> >> On 10.07.2013, at 08:02, Tiejun Chen wrote: >> >>> We should ensure the preemption cannot occur while calling get_paca() >>> insdide hard_irq_disable(), otherwise the paca_struct may be the >>> wrong one just after. And btw, we may update timing stats in this case. >>> >>> Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com> >>> --- >>> arch/powerpc/kvm/booke.c | 2 ++ >>> 1 file changed, 2 insertions(+) >>> >>> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c >>> index dcc94f0..9dae25d 100644 >>> --- a/arch/powerpc/kvm/booke.c >>> +++ b/arch/powerpc/kvm/booke.c >>> @@ -839,6 +839,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, >>> WARN_ON(local_paca->irq_happened != 0); >>> #endif >>> >>> + preempt_disable(); >>> /* >>> * We enter with interrupts disabled in hardware, but >>> * we need to call hard_irq_disable anyway to ensure that >>> @@ -848,6 +849,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, >>> >>> /* update before a new last_exit_type is rewritten */ >>> kvmppc_update_timing_stats(vcpu); >>> + preempt_enable(); >> >> All of the code here is already called with interrupts disabled. I don't see how we could preempt then? > > But the kernel still check this in preempt case, > > #define get_paca() ((void) debug_smp_processor_id(), local_paca) > > then trigger that known call trace as I observed :) > > BUG: using smp_processor_id() in preemptible [00000000] code: qemu-system-ppc/2065 # define preemptible() (preempt_count() == 0 && !irqs_disabled()) So we are only hitting this BUG() when either preempt_count is 0 (which your patch is trying to fix) and at the same time interrupts are enabled. But wait - interrupts are disabled, aren't they? Let's check. #define irqs_disabled() \ ({ \ unsigned long _flags; \ raw_local_save_flags(_flags); \ raw_irqs_disabled_flags(_flags); \ }) #define raw_irqs_disabled_flags(flags) \ ({ \ typecheck(unsigned long, flags); \ arch_irqs_disabled_flags(flags); \ }) static inline unsigned long arch_local_save_flags(void) { unsigned long flags; asm volatile( "lbz %0,%1(13)" : "=r" (flags) : "i" (offsetof(struct paca_struct, soft_enabled))); return flags; } static inline bool arch_irqs_disabled_flags(unsigned long flags) { return flags == 0; } So we're running with soft_enabled == 0 here which means that irqs_disabled() also returns 0. Ben, is soft_enabled == 0; hard_enabled == 1 a valid combination that may ever occur? Alex > caller is .kvmppc_handle_exit+0x48/0x810 > CPU: 0 PID: 2065 Comm: qemu-system-ppc Not tainted 3.10.0-172036-ge2daa28-dirty #116 > Call Trace: > [c0000001fc637570] [c00000000000835c] .show_stack+0x7c/0x1f0 (unreliable) > [c0000001fc637640] [c000000000673a0c] .dump_stack+0x28/0x3c > [c0000001fc6376b0] [c0000000002f04d8] .debug_smp_processor_id+0x108/0x120 > [c0000001fc637740] [c0000000000444e8] .kvmppc_handle_exit+0x48/0x810 > [c0000001fc6377f0] [c000000000047c80] .kvmppc_resume_host+0xa4/0xf8 > > Tiejun > > -- > To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2013-07-11 at 11:49 +0200, Alexander Graf wrote: > Ben, is soft_enabled == 0; hard_enabled == 1 a valid combination that > may ever occur? Yes of course, that's what we call "soft disabled" :-) It's even the whole point of doing lazy disable... Ben. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11.07.2013, at 14:28, Benjamin Herrenschmidt wrote: > On Thu, 2013-07-11 at 11:49 +0200, Alexander Graf wrote: >> Ben, is soft_enabled == 0; hard_enabled == 1 a valid combination that >> may ever occur? > > Yes of course, that's what we call "soft disabled" :-) It's even the > whole point of doing lazy disable... Meh. Of course it's soft_enabled = 1; hard_enabled = 0. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2013-07-11 at 14:47 +0200, Alexander Graf wrote: > > Yes of course, that's what we call "soft disabled" :-) It's even the > > whole point of doing lazy disable... > > Meh. Of course it's soft_enabled = 1; hard_enabled = 0. That doesn't happen in "normal" C code. It happens under very specific circumstances, such as in the guts of entry_64.S, in areas where we just want to temporarily mask HW interrupts without changing the SW state (and thus without having to deal with replays etc...). We typically also do that right before going to idle on some processors where we come back from idle with interrupts hard enabled, possibly right in an interrupt vector. Typically that's a state that makes some sense on KVM entry. From a Linux perspective, you enter KVM with interrupts enabled. But you temporarily hard disable on the way down while doing the low level context switch. This works well as long as you know you have no pending replay event. That should be fine if you do a direct transition from soft+hard enabled to hard disabled (without soft disabling). In that case there should be nothing in irq_happened. It's equivalent to returning to userspace from the kernel. We are soft-enabled, but the code in ret_from_except hard disables while mucking around with TIF_FLAGS etc... until the final rfid Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11.07.2013, at 14:54, Benjamin Herrenschmidt wrote: > On Thu, 2013-07-11 at 14:47 +0200, Alexander Graf wrote: >>> Yes of course, that's what we call "soft disabled" :-) It's even the >>> whole point of doing lazy disable... >> >> Meh. Of course it's soft_enabled = 1; hard_enabled = 0. > > That doesn't happen in "normal" C code. It happens under very specific > circumstances, such as in the guts of entry_64.S, in areas where we just > want to temporarily mask HW interrupts without changing the SW state > (and thus without having to deal with replays etc...). > > We typically also do that right before going to idle on some processors > where we come back from idle with interrupts hard enabled, possibly > right in an interrupt vector. > > Typically that's a state that makes some sense on KVM entry. From a > Linux perspective, you enter KVM with interrupts enabled. But you > temporarily hard disable on the way down while doing the low level > context switch. > > This works well as long as you know you have no pending replay event. > That should be fine if you do a direct transition from soft+hard enabled > to hard disabled (without soft disabling). In that case there should be > nothing in irq_happened. > > It's equivalent to returning to userspace from the kernel. We are > soft-enabled, but the code in ret_from_except hard disables while > mucking around with TIF_FLAGS etc... until the final rfid Ok, let me quickly explain the problem. We are leaving host context, switching slowly into guest context. During that transition we call get_paca() indirectly (apparently by another call to hard_disable() which sounds bogus, but that's another story). get_paca() warns when we're preemptible. We're only not preemptible when either preempt is disabled or irqs are disabled. Irqs are disabled, but arch_irqs_disabled() doesn't know, because it only checks for soft disabled IRQs. So we can fix this either by setting IRQs as soft disabled as well or by disabling preemption until we enter the guest for real. Any preferences? Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2013-07-11 at 15:07 +0200, Alexander Graf wrote: > Ok, let me quickly explain the problem. > > We are leaving host context, switching slowly into guest context. > During that transition we call get_paca() indirectly (apparently by > another call to hard_disable() which sounds bogus, but that's another > story). > > get_paca() warns when we're preemptible. We're only not preemptible > when either preempt is disabled or irqs are disabled. Irqs are > disabled, but arch_irqs_disabled() doesn't know, because it only > checks for soft disabled IRQs. > > So we can fix this either by setting IRQs as soft disabled as well or > by disabling preemption until we enter the guest for real. Any > preferences? Well, if you hard disable first (ie, direct transition from full enabled to hard disabled), you know you have nothing lazy pending in irq_pending, then it's ok to mess around with local_paca->soft_enabled to make it "look disabled". IE. Call hard_irq_disable(), then only turn local_paca->soft_enabled back on late in the process, some time before the final rfi(d). That works as long as you had a transition from full enabled to full disabled and don't hard re-enable in the process. IE, You are certain that there is nothing pending in irq_happened. HOWEVER ! If you do that, you *ALSO* need to clear irq_happened. You must *NEVER* leave PACA_IRQ_HARD_DIS set in irq_happened if you are soft-enabled, and since the above means that you *will* be seen as soft enabled on the way out of the guest, you can kaboom. BTW. I'm fine with a patch that does: #define hard_irq_disable() do { \ u8 _was_enabled = get_paca()->soft_enabled; \ __hard_irq_disable(); \ - get_paca()->soft_enabled = 0; \ + local_paca->soft_enabled = 0; \ In fact we should probably do it anyway. Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/12/2013 08:19 AM, Benjamin Herrenschmidt wrote: > On Thu, 2013-07-11 at 15:07 +0200, Alexander Graf wrote: >> Ok, let me quickly explain the problem. >> >> We are leaving host context, switching slowly into guest context. >> During that transition we call get_paca() indirectly (apparently by >> another call to hard_disable() which sounds bogus, but that's another >> story). >> >> get_paca() warns when we're preemptible. We're only not preemptible >> when either preempt is disabled or irqs are disabled. Irqs are >> disabled, but arch_irqs_disabled() doesn't know, because it only >> checks for soft disabled IRQs. >> >> So we can fix this either by setting IRQs as soft disabled as well or >> by disabling preemption until we enter the guest for real. Any >> preferences? > > Well, if you hard disable first (ie, direct transition from full enabled > to hard disabled), you know you have nothing lazy pending in > irq_pending, then it's ok to mess around with local_paca->soft_enabled > to make it "look disabled". > > IE. Call hard_irq_disable(), then only turn local_paca->soft_enabled > back on late in the process, some time before the final rfi(d). > > That works as long as you had a transition from full enabled to full > disabled and don't hard re-enable in the process. IE, You are certain > that there is nothing pending in irq_happened. > > HOWEVER ! > > If you do that, you *ALSO* need to clear irq_happened. You must *NEVER* > leave PACA_IRQ_HARD_DIS set in irq_happened if you are soft-enabled, and > since the above means that you *will* be seen as soft enabled on the way > out of the guest, you can kaboom. > > BTW. I'm fine with a patch that does: > > #define hard_irq_disable() do { \ > u8 _was_enabled = get_paca()->soft_enabled; \ Current problem I met is issued from the above line. > __hard_irq_disable(); \ > - get_paca()->soft_enabled = 0; \ Not here. If I'm misunderstanding what you guys means, please correct me since this is a long discussion thread. I have to reread that carefully. Tiejun > + local_paca->soft_enabled = 0; \ > > In fact we should probably do it anyway. > > Cheers, > Ben. > > > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2013-07-12 at 10:13 +0800, tiejun.chen wrote: > > #define hard_irq_disable() do { \ > > u8 _was_enabled = get_paca()->soft_enabled; \ > > Current problem I met is issued from the above line. > > > __hard_irq_disable(); \ > > - get_paca()->soft_enabled = 0; \ > > Not here. > > If I'm misunderstanding what you guys means, please correct me since this is a > long discussion thread. I have to reread that carefully. Then make it u8 _was_enabled; __hard_irq_disable(); was_enabled = local_paca->.... Once you have hard disabled, using local_paca directly *should* be safe (minus that gcc problem I mentioned). Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index dcc94f0..9dae25d 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -839,6 +839,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, WARN_ON(local_paca->irq_happened != 0); #endif + preempt_disable(); /* * We enter with interrupts disabled in hardware, but * we need to call hard_irq_disable anyway to ensure that @@ -848,6 +849,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, /* update before a new last_exit_type is rewritten */ kvmppc_update_timing_stats(vcpu); + preempt_enable(); /* restart interrupts if they were meant for the host */ kvmppc_restart_interrupt(vcpu, exit_nr);
We should ensure the preemption cannot occur while calling get_paca() insdide hard_irq_disable(), otherwise the paca_struct may be the wrong one just after. And btw, we may update timing stats in this case. Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com> --- arch/powerpc/kvm/booke.c | 2 ++ 1 file changed, 2 insertions(+)