Message ID | 20170714040437.22034-1-nikunj@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, 2017-07-14 at 09:34 +0530, Nikunj A Dadhania wrote: > Rebooting a SMP TCG guest is broken for both single/multi threaded TCG. > > When reset happens, all the CPUs are in halted state. First CPU is brought out > of reset and secondary CPUs would be initialized by the guest kernel using a > rtas call start-cpu. > > However, in case of TCG, decrementer interrupts keep on coming and waking the > secondary CPUs up. > > These secondary CPUs would see the decrementer interrupt pending, which makes > cpu::has_work() to bring them out of wait loop and start executing > tcg_exec_cpu(). > > The problem with this is all the CPUs wake up and start booting SLOF image, > causing the following exception(4 CPUs TCG VM): This patch isn't right.... > --- a/target/ppc/translate_init.c > +++ b/target/ppc/translate_init.c > @@ -8536,7 +8536,8 @@ static bool cpu_has_work_POWER7(CPUState *cs) > } > if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) && > (env->spr[SPR_LPCR] & LPCR_P7_PECE1)) { > - return true; > + /* Return true only when MSR_EE is enabled */ > + return msr_ee; > } No, the HW will not check MSR_EE, in fact, MSR_EE can purposefully be left off and we might expect a wakeup still. I think the right fix is for reset to clear the LPCR PECE bits. Cheers Ben. > > if ((env->pending_interrupts & (1u << PPC_INTERRUPT_MCK)) && > (env->spr[SPR_LPCR] & LPCR_P7_PECE2)) { > @@ -8693,7 +8694,8 @@ static bool cpu_has_work_POWER8(CPUState *cs) > } > if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) && > (env->spr[SPR_LPCR] & LPCR_P8_PECE3)) { > - return true; > + /* Return true only when MSR_EE is enabled */ > + return msr_ee; > } > if ((env->pending_interrupts & (1u << PPC_INTERRUPT_MCK)) && > (env->spr[SPR_LPCR] & LPCR_P8_PECE4)) { > @@ -8876,7 +8878,8 @@ static bool cpu_has_work_POWER9(CPUState *cs) > /* Decrementer Exception */ > if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) && > (env->spr[SPR_LPCR] & LPCR_DEE)) { > - return true; > + /* Return true only when MSR_EE is enabled */ > + return msr_ee; > } > /* Machine Check or Hypervisor Maintenance Exception */ > if ((env->pending_interrupts & (1u << PPC_INTERRUPT_MCK |
On 15-Jul-2017 8:10 AM, "Benjamin Herrenschmidt" <benh@au1.ibm.com> wrote: On Fri, 2017-07-14 at 09:34 +0530, Nikunj A Dadhania wrote: > Rebooting a SMP TCG guest is broken for both single/multi threaded TCG. > > When reset happens, all the CPUs are in halted state. First CPU is brought out > of reset and secondary CPUs would be initialized by the guest kernel using a > rtas call start-cpu. > > However, in case of TCG, decrementer interrupts keep on coming and waking the > secondary CPUs up. > > These secondary CPUs would see the decrementer interrupt pending, which makes > cpu::has_work() to bring them out of wait loop and start executing > tcg_exec_cpu(). > > The problem with this is all the CPUs wake up and start booting SLOF image, > causing the following exception(4 CPUs TCG VM): This patch isn't right.... > --- a/target/ppc/translate_init.c > +++ b/target/ppc/translate_init.c > @@ -8536,7 +8536,8 @@ static bool cpu_has_work_POWER7(CPUState *cs) > } > if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) && > (env->spr[SPR_LPCR] & LPCR_P7_PECE1)) { > - return true; > + /* Return true only when MSR_EE is enabled */ > + return msr_ee; > } No, the HW will not check MSR_EE, in fact, MSR_EE can purposefully be left off and we might expect a wakeup still. Here we had a decr timer callback which is started during machine init and remains on even during reset. In case of HW, does the decr stop on reset? I think the right fix is for reset to clear the LPCR PECE bits. Ok, let me try that out. Regards Nikunj
diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c index 783bf98..8b98076 100644 --- a/target/ppc/translate_init.c +++ b/target/ppc/translate_init.c @@ -8536,7 +8536,8 @@ static bool cpu_has_work_POWER7(CPUState *cs) } if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) && (env->spr[SPR_LPCR] & LPCR_P7_PECE1)) { - return true; + /* Return true only when MSR_EE is enabled */ + return msr_ee; } if ((env->pending_interrupts & (1u << PPC_INTERRUPT_MCK)) && (env->spr[SPR_LPCR] & LPCR_P7_PECE2)) { @@ -8693,7 +8694,8 @@ static bool cpu_has_work_POWER8(CPUState *cs) } if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) && (env->spr[SPR_LPCR] & LPCR_P8_PECE3)) { - return true; + /* Return true only when MSR_EE is enabled */ + return msr_ee; } if ((env->pending_interrupts & (1u << PPC_INTERRUPT_MCK)) && (env->spr[SPR_LPCR] & LPCR_P8_PECE4)) { @@ -8876,7 +8878,8 @@ static bool cpu_has_work_POWER9(CPUState *cs) /* Decrementer Exception */ if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) && (env->spr[SPR_LPCR] & LPCR_DEE)) { - return true; + /* Return true only when MSR_EE is enabled */ + return msr_ee; } /* Machine Check or Hypervisor Maintenance Exception */ if ((env->pending_interrupts & (1u << PPC_INTERRUPT_MCK |