From patchwork Thu Sep 26 13:53:34 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chao Gao <chao.gao@intel.com>
X-Patchwork-Id: 11162793
Return-Path: <SRS0=M2I+=XV=lists.xenproject.org=xen-devel-bounces@kernel.org>
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org
 [172.30.200.123])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1D18117D4
	for <patchwork-xen-devel@patchwork.kernel.org>;
 Thu, 26 Sep 2019 13:51:24 +0000 (UTC)
Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id EA06221A4A
	for <patchwork-xen-devel@patchwork.kernel.org>;
 Thu, 26 Sep 2019 13:51:23 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EA06221A4A
Authentication-Results: mail.kernel.org;
 dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org;
 spf=none smtp.mailfrom=xen-devel-bounces@lists.xenproject.org
Received: from localhost ([127.0.0.1] helo=lists.xenproject.org)
	by lists.xenproject.org with esmtp (Exim 4.89)
	(envelope-from <xen-devel-bounces@lists.xenproject.org>)
	id 1iDU9q-0002ky-Ke; Thu, 26 Sep 2019 13:50:10 +0000
Received: from us1-rack-iad1.inumbo.com ([172.99.69.81])
 by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from
 <SRS0=njQ1=XV=intel.com=chao.gao@srs-us1.protection.inumbo.net>)
 id 1iDU9q-0002kZ-8m
 for xen-devel@lists.xenproject.org; Thu, 26 Sep 2019 13:50:10 +0000
X-Inumbo-ID: 846ce508-e064-11e9-8628-bc764e2007e4
Received: from mga04.intel.com (unknown [192.55.52.120])
 by localhost (Halon) with ESMTPS
 id 846ce508-e064-11e9-8628-bc764e2007e4;
 Thu, 26 Sep 2019 13:49:53 +0000 (UTC)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga008.fm.intel.com ([10.253.24.58])
 by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 26 Sep 2019 06:49:53 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.64,552,1559545200"; d="scan'208";a="189126069"
Received: from gao-cwp.sh.intel.com ([10.239.159.26])
 by fmsmga008.fm.intel.com with ESMTP; 26 Sep 2019 06:49:50 -0700
From: Chao Gao <chao.gao@intel.com>
To: xen-devel@lists.xenproject.org
Date: Thu, 26 Sep 2019 21:53:34 +0800
Message-Id: <1569506015-26938-7-git-send-email-chao.gao@intel.com>
X-Mailer: git-send-email 1.9.1
In-Reply-To: <1569506015-26938-1-git-send-email-chao.gao@intel.com>
References: <1569506015-26938-1-git-send-email-chao.gao@intel.com>
Subject: [Xen-devel] [PATCH v11 6/7] microcode: rendezvous CPUs in NMI
 handler and load ucode
X-BeenThere: xen-devel@lists.xenproject.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Xen developer discussion <xen-devel.lists.xenproject.org>
List-Unsubscribe: <https://lists.xenproject.org/mailman/options/xen-devel>,
 <mailto:xen-devel-request@lists.xenproject.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xenproject.org>
List-Help: <mailto:xen-devel-request@lists.xenproject.org?subject=help>
List-Subscribe: <https://lists.xenproject.org/mailman/listinfo/xen-devel>,
 <mailto:xen-devel-request@lists.xenproject.org?subject=subscribe>
Cc: Sergey Dyasli <sergey.dyasli@citrix.com>,
 Stefano Stabellini <sstabellini@kernel.org>, Ashok Raj <ashok.raj@intel.com>,
 Wei Liu <wl@xen.org>, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
 George Dunlap <George.Dunlap@eu.citrix.com>,
 Ian Jackson <ian.jackson@eu.citrix.com>, Tim Deegan <tim@xen.org>,
 Julien Grall <julien.grall@arm.com>, Jan Beulich <jbeulich@suse.com>,
 Andrew Cooper <andrew.cooper3@citrix.com>, Chao Gao <chao.gao@intel.com>,
	=?utf-8?q?Roger_Pau_Monn=C3=A9?= <roger.pau@citrix.com>
MIME-Version: 1.0
Errors-To: xen-devel-bounces@lists.xenproject.org
Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org>

When one core is loading ucode, handling NMI on sibling threads or
on other cores in the system might be problematic. By rendezvousing
all CPUs in NMI handler, it prevents NMI acceptance during ucode
loading.

Basically, some work previously done in stop_machine context is
moved to NMI handler. Primary threads call in and load ucode in
NMI handler. Secondary threads wait for the completion of ucode
loading on all CPU cores. An option is introduced to disable this
behavior.

Control thread doesn't rendezvous in NMI handler by calling self_nmi()
(in case of unknown_nmi_error() being triggered). The side effect is
control thread might be handling an NMI and interacting with the old
ucode not in a controlled way while other threads are loading ucode.
Update ucode on the control thread first to mitigate this issue.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
Changes in v11:
 - Extend existing 'nmi' option rather than use a new one.
 - use per-cpu variable to store error code of xxx_nmi_work()
 - rename secondary_thread_work to secondary_nmi_work.
 - intialize nmi_patch to ZERO_BLOCK_PTR and make it static.
 - constify nmi_cpu
 - explain why control thread loads ucode first in patch description

Changes in v10:
 - rewrite based on Sergey's idea and patch
 - add Sergey's SOB.
 - add an option to disable ucode loading in NMI handler
 - don't send IPI NMI to the control thread to avoid unknown_nmi_error()
 in do_nmi().
 - add an assertion to make sure the cpu chosen to handle platform NMI
 won't send self NMI. Otherwise, there is a risk that we encounter
 unknown_nmi_error() and system crashes.

Changes in v9:
 - control threads send NMI to all other threads. Slave threads will
 stay in the NMI handling to prevent NMI acceptance during ucode
 loading. Note that self-nmi is invalid according to SDM.
 - s/rep_nop/cpu_relax
 - remove debug message in microcode_nmi_callback(). Printing debug
 message would take long times and control thread may timeout.
 - rebase and fix conflicts

Changes in v8:
 - new
---
 docs/misc/xen-command-line.pandoc |   6 +-
 xen/arch/x86/microcode.c          | 156 ++++++++++++++++++++++++++++++--------
 xen/arch/x86/traps.c              |   6 +-
 xen/include/asm-x86/nmi.h         |   3 +
 4 files changed, 138 insertions(+), 33 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 832797e..8beb285 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2036,7 +2036,7 @@ pages) must also be specified via the tbuf_size parameter.
 > `= unstable | skewed | stable:socket`
 
 ### ucode (x86)
-> `= [<integer> | scan]`
+> `= List of [ <integer> | scan, nmi=<bool> ]`
 
 Specify how and where to find CPU microcode update blob.
 
@@ -2057,6 +2057,10 @@ microcode in the cpio name space must be:
   - on Intel: kernel/x86/microcode/GenuineIntel.bin
   - on AMD  : kernel/x86/microcode/AuthenticAMD.bin
 
+'nmi' determines late loading is performed in NMI handler or just in
+stop_machine context. In NMI handler, even NMIs are blocked, which is
+considered safer. The default value is `true`.
+
 ### unrestricted_guest (Intel)
 > `= <boolean>`
 
diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index 6c23879..b9fa8bb 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -36,8 +36,10 @@
 #include <xen/earlycpio.h>
 #include <xen/watchdog.h>
 
+#include <asm/apic.h>
 #include <asm/delay.h>
 #include <asm/msr.h>
+#include <asm/nmi.h>
 #include <asm/processor.h>
 #include <asm/setup.h>
 #include <asm/microcode.h>
@@ -95,6 +97,9 @@ static struct ucode_mod_blob __initdata ucode_blob;
  */
 static bool_t __initdata ucode_scan;
 
+/* By default, ucode loading is done in NMI handler */
+static bool ucode_in_nmi = true;
+
 /* Protected by microcode_mutex */
 static struct microcode_patch *microcode_cache;
 
@@ -105,23 +110,42 @@ void __init microcode_set_module(unsigned int idx)
 }
 
 /*
- * The format is '[<integer>|scan]'. Both options are optional.
+ * The format is '[<integer>|scan, nmi=<bool>]'. Both options are optional.
  * If the EFI has forced which of the multiboot payloads is to be used,
- * no parsing will be attempted.
+ * only nmi=<bool> is parsed.
  */
 static int __init parse_ucode(const char *s)
 {
-    const char *q = NULL;
+    const char *ss;
+    int val, rc = 0;
 
-    if ( ucode_mod_forced ) /* Forced by EFI */
-       return 0;
+    do {
+        ss = strchr(s, ',');
+        if ( !ss )
+            ss = strchr(s, '\0');
 
-    if ( !strncmp(s, "scan", 4) )
-        ucode_scan = 1;
-    else
-        ucode_mod_idx = simple_strtol(s, &q, 0);
+        if ( (val = parse_boolean("nmi", s, ss)) >= 0 )
+            ucode_in_nmi = val;
+        else if ( !ucode_mod_forced ) /* Not forced by EFI */
+        {
+            const char *q = NULL;
+
+            if ( !strncmp(s, "scan", 4) )
+            {
+                ucode_scan = true;
+                q = s + 4;
+            }
+            else
+                ucode_mod_idx = simple_strtol(s, &q, 0);
+
+            if ( q != ss )
+                rc = -EINVAL;
+        }
+
+        s = ss + 1;
+    } while ( *ss );
 
-    return (q && *q) ? -EINVAL : 0;
+    return rc;
 }
 custom_param("ucode", parse_ucode);
 
@@ -222,6 +246,8 @@ const struct microcode_ops *microcode_ops;
 static DEFINE_SPINLOCK(microcode_mutex);
 
 DEFINE_PER_CPU(struct cpu_signature, cpu_sig);
+/* Store error code of the work done in NMI handler */
+DEFINE_PER_CPU(int, loading_err);
 
 /*
  * Count the CPUs that have entered, exited the rendezvous and succeeded in
@@ -232,6 +258,7 @@ DEFINE_PER_CPU(struct cpu_signature, cpu_sig);
  */
 static cpumask_t cpu_callin_map;
 static atomic_t cpu_out, cpu_updated;
+static const struct microcode_patch *nmi_patch = ZERO_BLOCK_PTR;
 
 /*
  * Return a patch that covers current CPU. If there are multiple patches,
@@ -356,42 +383,88 @@ static void set_state(unsigned int state)
     smp_wmb();
 }
 
-static int secondary_thread_fn(void)
+static int secondary_nmi_work(void)
 {
-    unsigned int primary = cpumask_first(this_cpu(cpu_sibling_mask));
+    cpumask_set_cpu(smp_processor_id(), &cpu_callin_map);
 
-    if ( !wait_for_state(LOADING_CALLIN) )
-        return -EBUSY;
+    return wait_for_state(LOADING_EXIT) ? 0 : -EBUSY;
+}
+
+static int primary_thread_work(const struct microcode_patch *patch)
+{
+    int ret;
 
     cpumask_set_cpu(smp_processor_id(), &cpu_callin_map);
 
-    if ( !wait_for_state(LOADING_EXIT) )
+    if ( !wait_for_state(LOADING_ENTER) )
         return -EBUSY;
 
-    /* Copy update revision from the primary thread. */
-    this_cpu(cpu_sig).rev = per_cpu(cpu_sig, primary).rev;
+    ret = microcode_ops->apply_microcode(patch);
+    if ( !ret )
+        atomic_inc(&cpu_updated);
+    atomic_inc(&cpu_out);
 
-    return 0;
+    return ret;
 }
 
-static int primary_thread_fn(const struct microcode_patch *patch)
+static int primary_nmi_work(const struct microcode_patch *patch)
+{
+    return primary_thread_work(patch);
+}
+
+static int microcode_nmi_callback(const struct cpu_user_regs *regs, int cpu)
 {
-    int ret = 0;
+    unsigned int primary = cpumask_first(this_cpu(cpu_sibling_mask));
+    int ret;
+
+    /* System-generated NMI, leave to main handler */
+    if ( ACCESS_ONCE(loading_state) != LOADING_CALLIN )
+        return 0;
+
+    /*
+     * Primary threads load ucode in NMI handler on if ucode_in_nmi is true.
+     * Secondary threads are expected to stay in NMI handler regardless of
+     * ucode_in_nmi.
+     */
+    if ( cpu == cpumask_first(&cpu_online_map) ||
+         (!ucode_in_nmi && cpu == primary) )
+        return 0;
+
+    if ( cpu == primary )
+        ret = primary_nmi_work(nmi_patch);
+    else
+        ret = secondary_nmi_work();
+    this_cpu(loading_err) = ret;
+
+    return 0;
+}
 
+static int secondary_thread_fn(void)
+{
     if ( !wait_for_state(LOADING_CALLIN) )
         return -EBUSY;
 
-    cpumask_set_cpu(smp_processor_id(), &cpu_callin_map);
+    self_nmi();
 
-    if ( !wait_for_state(LOADING_ENTER) )
+    /* Copy update revision from the primary thread. */
+    this_cpu(cpu_sig).rev =
+        per_cpu(cpu_sig, cpumask_first(this_cpu(cpu_sibling_mask))).rev;
+
+    return this_cpu(loading_err);
+}
+
+static int primary_thread_fn(const struct microcode_patch *patch)
+{
+    if ( !wait_for_state(LOADING_CALLIN) )
         return -EBUSY;
 
-    ret = microcode_ops->apply_microcode(patch);
-    if ( !ret )
-        atomic_inc(&cpu_updated);
-    atomic_inc(&cpu_out);
+    if ( ucode_in_nmi )
+    {
+        self_nmi();
+        return this_cpu(loading_err);
+    }
 
-    return ret;
+    return primary_thread_work(patch);
 }
 
 static int control_thread_fn(const struct microcode_patch *patch)
@@ -399,6 +472,7 @@ static int control_thread_fn(const struct microcode_patch *patch)
     unsigned int cpu = smp_processor_id(), done;
     unsigned long tick;
     int ret;
+    nmi_callback_t *saved_nmi_callback;
 
     /*
      * We intend to keep interrupt disabled for a long time, which may lead to
@@ -406,6 +480,10 @@ static int control_thread_fn(const struct microcode_patch *patch)
      */
     watchdog_disable();
 
+    nmi_patch = patch;
+    smp_wmb();
+    saved_nmi_callback = set_nmi_callback(microcode_nmi_callback);
+
     /* Allow threads to call in */
     set_state(LOADING_CALLIN);
 
@@ -420,14 +498,23 @@ static int control_thread_fn(const struct microcode_patch *patch)
         return ret;
     }
 
-    /* Let primary threads load the given ucode update */
-    set_state(LOADING_ENTER);
-
+    /* Control thread loads ucode first while others are in NMI handler. */
     ret = microcode_ops->apply_microcode(patch);
     if ( !ret )
         atomic_inc(&cpu_updated);
     atomic_inc(&cpu_out);
 
+    if ( ret == -EIO )
+    {
+        printk(XENLOG_ERR
+               "Late loading aborted: CPU%u failed to update ucode\n", cpu);
+        set_state(LOADING_EXIT);
+        return ret;
+    }
+
+    /* Let primary threads load the given ucode update */
+    set_state(LOADING_ENTER);
+
     tick = rdtsc_ordered();
     /* Wait for primary threads finishing update */
     while ( (done = atomic_read(&cpu_out)) != nr_cores )
@@ -456,6 +543,8 @@ static int control_thread_fn(const struct microcode_patch *patch)
     /* Mark loading is done to unblock other threads */
     set_state(LOADING_EXIT);
 
+    set_nmi_callback(saved_nmi_callback);
+    nmi_patch = ZERO_BLOCK_PTR;
     watchdog_enable();
 
     return ret;
@@ -515,6 +604,13 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
         return -EBUSY;
     }
 
+    /*
+     * CPUs except the first online CPU would send a fake (self) NMI to
+     * rendezvous in NMI handler. But a fake NMI to nmi_cpu may trigger
+     * unknown_nmi_error(). It ensures nmi_cpu won't receive a fake NMI.
+     */
+    ASSERT(cpumask_first(&cpu_online_map) == nmi_cpu);
+
     patch = parse_blob(buffer, len);
     xfree(buffer);
     if ( IS_ERR(patch) )
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 16c590d..2cd5e29 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -126,6 +126,8 @@ boolean_param("ler", opt_ler);
 /* LastExceptionFromIP on this hardware.  Zero if LER is not in use. */
 unsigned int __read_mostly ler_msr;
 
+const unsigned int nmi_cpu;
+
 #define stack_words_per_line 4
 #define ESP_BEFORE_EXCEPTION(regs) ((unsigned long *)regs->rsp)
 
@@ -1679,7 +1681,7 @@ void do_nmi(const struct cpu_user_regs *regs)
      * this port before we re-arm the NMI watchdog, we reduce the chance
      * of having an NMI watchdog expire while in the SMI handler.
      */
-    if ( cpu == 0 )
+    if ( cpu == nmi_cpu )
         reason = inb(0x61);
 
     if ( (nmi_watchdog == NMI_NONE) ||
@@ -1687,7 +1689,7 @@ void do_nmi(const struct cpu_user_regs *regs)
         handle_unknown = true;
 
     /* Only the BSP gets external NMIs from the system. */
-    if ( cpu == 0 )
+    if ( cpu == nmi_cpu )
     {
         if ( reason & 0x80 )
             pci_serr_error(regs);
diff --git a/xen/include/asm-x86/nmi.h b/xen/include/asm-x86/nmi.h
index 99f6284..f9dfca6 100644
--- a/xen/include/asm-x86/nmi.h
+++ b/xen/include/asm-x86/nmi.h
@@ -11,6 +11,9 @@ extern bool opt_watchdog;
 
 /* Watchdog force parameter from the command line */
 extern bool watchdog_force;
+
+/* CPU to handle platform NMI */
+extern const unsigned int nmi_cpu;
  
 typedef int nmi_callback_t(const struct cpu_user_regs *regs, int cpu);