[v2] spapr: ignore decr interrupts when MSR_EE is disabled

Message ID	20170714040437.22034-1-nikunj@linux.vnet.ibm.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org> Gateway: Authorized Use Only! Violators will be prosecuted for <qemu-devel@nongnu.org> from <nikunj@linux.vnet.ibm.com>; Fri, 14 Jul 2017 14:06:12 +1000 Gateway: Authorized Use Only! Violators will be prosecuted; Fri, 14 Jul 2017 14:06:09 +1000 From: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> To: qemu-ppc@nongnu.org, david@gibson.dropbear.id.au Date: Fri, 14 Jul 2017 09:34:37 +0530 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Message-Id: <20170714040437.22034-1-nikunj@linux.vnet.ibm.com> Content-Transfer-Encoding: quoted-printable Subject: [Qemu-devel] [PATCH v2] spapr: ignore decr interrupts when MSR_EE is disabled Precedence: list Cc: bharata@linux.vnet.ibm.com, qemu-devel@nongnu.org, Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>, clg@kaod.org Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>

Message ID

20170714040437.22034-1-nikunj@linux.vnet.ibm.com (mailing list archive)

State

New, archived

Headers

From: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
To: qemu-ppc@nongnu.org, david@gibson.dropbear.id.au
Date: Fri, 14 Jul 2017 09:34:37 +0530
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Message-Id: <20170714040437.22034-1-nikunj@linux.vnet.ibm.com>
Content-Transfer-Encoding: quoted-printable
Subject: [Qemu-devel] [PATCH v2] spapr: ignore decr interrupts when MSR_EE
	is disabled
Precedence: list
Cc: bharata@linux.vnet.ibm.com, qemu-devel@nongnu.org,
	Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>, clg@kaod.org
Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Sender: "Qemu-devel"
	<qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>

Commit Message

Nikunj A. Dadhania July 14, 2017, 4:04 a.m. UTC

Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.

When reset happens, all the CPUs are in halted state. First CPU is brought out
of reset and secondary CPUs would be initialized by the guest kernel using a
rtas call start-cpu.

However, in case of TCG, decrementer interrupts keep on coming and waking the
secondary CPUs up.

These secondary CPUs would see the decrementer interrupt pending, which makes
cpu::has_work() to bring them out of wait loop and start executing
tcg_exec_cpu().

The problem with this is all the CPUs wake up and start booting SLOF image,
causing the following exception(4 CPUs TCG VM):

[   81.440850] reboot: Restarting system

SLOF
S
SLOF
SLOFLOF[0[0m **********************************************************************
QEMU Starting
 Build Date = Mar  3 2017 13:29:19
 FW Version = git-66d250ef0fd06bb8
[0m **********************************************************************
QEMU Starting
 Build Date = Mar  3 2017 13:29:19
 FW Version = git-66d250ef0fd06bb8
[0m *************************************m**********[?25l **********************************************************************
QEMU Starting
 Build Date = Mar  3 2017 13:29:19
 FW Version = git-66d250ef0fd06bb8
***********************
QEMU Starting
 Build Date = Mar  3 2017 13:29:19
 FW Version = git-66d250ef0fd06bb8
ERROR: Flatten device tree not available!

 exception 300
SRR0 = 00000000000060e4  SRR1 = 800000008000000000000000
SPRG2 = 0000000000400000  SPRG3 = 0000000000004bd8
ERROR: Flatten device tree not available!

 exception 300
SRR0 = 00000000000060e4  SRR1 = 800000008000000000000000
SPRG2 = 0000000000400000  SPRG3 = 0000000000004bd8

Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target/ppc/translate_init.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Comments

Benjamin Herrenschmidt July 15, 2017, 2:38 a.m. UTC | #1

On Fri, 2017-07-14 at 09:34 +0530, Nikunj A Dadhania wrote:
> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.
> 
> When reset happens, all the CPUs are in halted state. First CPU is brought out
> of reset and secondary CPUs would be initialized by the guest kernel using a
> rtas call start-cpu.
> 
> However, in case of TCG, decrementer interrupts keep on coming and waking the
> secondary CPUs up.
> 
> These secondary CPUs would see the decrementer interrupt pending, which makes
> cpu::has_work() to bring them out of wait loop and start executing
> tcg_exec_cpu().
> 
> The problem with this is all the CPUs wake up and start booting SLOF image,
> causing the following exception(4 CPUs TCG VM):

This patch isn't right....

> --- a/target/ppc/translate_init.c
> +++ b/target/ppc/translate_init.c
> @@ -8536,7 +8536,8 @@ static bool cpu_has_work_POWER7(CPUState *cs)
>          }
>          if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) &&
>              (env->spr[SPR_LPCR] & LPCR_P7_PECE1)) {
> -            return true;
> +            /* Return true only when MSR_EE is enabled */
> +            return msr_ee;
>          }

No, the HW will not check MSR_EE, in fact, MSR_EE can purposefully be
left off and we might expect a wakeup still.

I think the right fix is for reset to clear the LPCR PECE bits.

 Cheers
Ben.
 
> >          if ((env->pending_interrupts & (1u << PPC_INTERRUPT_MCK)) &&
>              (env->spr[SPR_LPCR] & LPCR_P7_PECE2)) {
> @@ -8693,7 +8694,8 @@ static bool cpu_has_work_POWER8(CPUState *cs)
>          }
>          if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) &&
>              (env->spr[SPR_LPCR] & LPCR_P8_PECE3)) {
> -            return true;
> +            /* Return true only when MSR_EE is enabled */
> +            return msr_ee;
>          }
>          if ((env->pending_interrupts & (1u << PPC_INTERRUPT_MCK)) &&
>              (env->spr[SPR_LPCR] & LPCR_P8_PECE4)) {
> @@ -8876,7 +8878,8 @@ static bool cpu_has_work_POWER9(CPUState *cs)
>          /* Decrementer Exception */
>          if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) &&
>              (env->spr[SPR_LPCR] & LPCR_DEE)) {
> -            return true;
> +            /* Return true only when MSR_EE is enabled */
> +            return msr_ee;
>          }
>          /* Machine Check or Hypervisor Maintenance Exception */
>          if ((env->pending_interrupts & (1u << PPC_INTERRUPT_MCK |

Nikunj Dadhania July 15, 2017, 3:40 a.m. UTC | #2

On 15-Jul-2017 8:10 AM, "Benjamin Herrenschmidt" <benh@au1.ibm.com> wrote:

On Fri, 2017-07-14 at 09:34 +0530, Nikunj A Dadhania wrote:
> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.
>
> When reset happens, all the CPUs are in halted state. First CPU is
brought out
> of reset and secondary CPUs would be initialized by the guest kernel
using a
> rtas call start-cpu.
>
> However, in case of TCG, decrementer interrupts keep on coming and waking
the
> secondary CPUs up.
>
> These secondary CPUs would see the decrementer interrupt pending, which
makes
> cpu::has_work() to bring them out of wait loop and start executing
> tcg_exec_cpu().
>
> The problem with this is all the CPUs wake up and start booting SLOF
image,
> causing the following exception(4 CPUs TCG VM):

This patch isn't right....

> --- a/target/ppc/translate_init.c
> +++ b/target/ppc/translate_init.c
> @@ -8536,7 +8536,8 @@ static bool cpu_has_work_POWER7(CPUState *cs)
>          }
>          if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) &&
>              (env->spr[SPR_LPCR] & LPCR_P7_PECE1)) {
> -            return true;
> +            /* Return true only when MSR_EE is enabled */
> +            return msr_ee;
>          }

No, the HW will not check MSR_EE, in fact, MSR_EE can purposefully be
left off and we might expect a wakeup still.


Here we had a decr timer callback which is started during machine init and
remains on even during reset.

In case of HW, does the decr stop on reset?


I think the right fix is for reset to clear the LPCR PECE bits.


Ok, let me try that out.

Regards
Nikunj

diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
index 783bf98..8b98076 100644
--- a/target/ppc/translate_init.c
+++ b/target/ppc/translate_init.c
@@ -8536,7 +8536,8 @@  static bool cpu_has_work_POWER7(CPUState *cs)
         }
         if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) &&
             (env->spr[SPR_LPCR] & LPCR_P7_PECE1)) {
-            return true;
+            /* Return true only when MSR_EE is enabled */
+            return msr_ee;
         }
         if ((env->pending_interrupts & (1u << PPC_INTERRUPT_MCK)) &&
             (env->spr[SPR_LPCR] & LPCR_P7_PECE2)) {
@@ -8693,7 +8694,8 @@  static bool cpu_has_work_POWER8(CPUState *cs)
         }
         if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) &&
             (env->spr[SPR_LPCR] & LPCR_P8_PECE3)) {
-            return true;
+            /* Return true only when MSR_EE is enabled */
+            return msr_ee;
         }
         if ((env->pending_interrupts & (1u << PPC_INTERRUPT_MCK)) &&
             (env->spr[SPR_LPCR] & LPCR_P8_PECE4)) {
@@ -8876,7 +8878,8 @@  static bool cpu_has_work_POWER9(CPUState *cs)
         /* Decrementer Exception */
         if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) &&
             (env->spr[SPR_LPCR] & LPCR_DEE)) {
-            return true;
+            /* Return true only when MSR_EE is enabled */
+            return msr_ee;
         }
         /* Machine Check or Hypervisor Maintenance Exception */
         if ((env->pending_interrupts & (1u << PPC_INTERRUPT_MCK |

[v2] spapr: ignore decr interrupts when MSR_EE is disabled

Commit Message

Comments

Patch