From patchwork Thu Aug  1 10:22:49 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chao Gao <chao.gao@intel.com>
X-Patchwork-Id: 11070471
Return-Path: <xen-devel-bounces@lists.xenproject.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
 [172.30.200.125])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C812A112C
	for <patchwork-xen-devel@patchwork.kernel.org>;
 Thu,  1 Aug 2019 10:20:53 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B9950283B2
	for <patchwork-xen-devel@patchwork.kernel.org>;
 Thu,  1 Aug 2019 10:20:53 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id ADA12283C9; Thu,  1 Aug 2019 10:20:53 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI,
	RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1
Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120])
	(using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id CC7C426E3C
	for <patchwork-xen-devel@patchwork.kernel.org>;
 Thu,  1 Aug 2019 10:20:52 +0000 (UTC)
Received: from localhost ([127.0.0.1] helo=lists.xenproject.org)
	by lists.xenproject.org with esmtp (Exim 4.89)
	(envelope-from <xen-devel-bounces@lists.xenproject.org>)
	id 1ht8BA-00025L-9i; Thu, 01 Aug 2019 10:19:24 +0000
Received: from us1-rack-dfw2.inumbo.com ([104.130.134.6])
 by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from
 <SRS0=TV56=V5=intel.com=chao.gao@srs-us1.protection.inumbo.net>)
 id 1ht8B8-00023i-QV
 for xen-devel@lists.xenproject.org; Thu, 01 Aug 2019 10:19:22 +0000
X-Inumbo-ID: d1c44e29-b445-11e9-8980-bc764e045a96
Received: from mga17.intel.com (unknown [192.55.52.151])
 by us1-rack-dfw2.inumbo.com (Halon) with ESMTPS
 id d1c44e29-b445-11e9-8980-bc764e045a96;
 Thu, 01 Aug 2019 10:19:20 +0000 (UTC)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga003.jf.intel.com ([10.7.209.27])
 by fmsmga107.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 01 Aug 2019 03:19:20 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.64,333,1559545200"; d="scan'208";a="175207972"
Received: from gao-cwp.sh.intel.com ([10.239.159.26])
 by orsmga003.jf.intel.com with ESMTP; 01 Aug 2019 03:19:18 -0700
From: Chao Gao <chao.gao@intel.com>
To: xen-devel@lists.xenproject.org
Date: Thu,  1 Aug 2019 18:22:49 +0800
Message-Id: <1564654971-31328-15-git-send-email-chao.gao@intel.com>
X-Mailer: git-send-email 1.9.1
In-Reply-To: <1564654971-31328-1-git-send-email-chao.gao@intel.com>
References: <1564654971-31328-1-git-send-email-chao.gao@intel.com>
Subject: [Xen-devel] [PATCH v8 14/16] x86/microcode: Synchronize late
 microcode loading
X-BeenThere: xen-devel@lists.xenproject.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Xen developer discussion <xen-devel.lists.xenproject.org>
List-Unsubscribe: <https://lists.xenproject.org/mailman/options/xen-devel>,
 <mailto:xen-devel-request@lists.xenproject.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xenproject.org>
List-Help: <mailto:xen-devel-request@lists.xenproject.org?subject=help>
List-Subscribe: <https://lists.xenproject.org/mailman/listinfo/xen-devel>,
 <mailto:xen-devel-request@lists.xenproject.org?subject=subscribe>
Cc: Kevin Tian <kevin.tian@intel.com>, Borislav Petkov <bp@suse.de>,
 Ashok Raj <ashok.raj@intel.com>, Wei Liu <wl@xen.org>,
 Jun Nakajima <jun.nakajima@intel.com>,
 Andrew Cooper <andrew.cooper3@citrix.com>, Jan Beulich <jbeulich@suse.com>,
 Thomas Gleixner <tglx@linutronix.de>, Chao Gao <chao.gao@intel.com>,
	=?utf-8?q?Roger_Pau_Monn=C3=A9?= <roger.pau@citrix.com>
MIME-Version: 1.0
Errors-To: xen-devel-bounces@lists.xenproject.org
Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org>
X-Virus-Scanned: ClamAV using ClamSMTP

This patch ports microcode improvement patches from linux kernel.

Before you read any further: the early loading method is still the
preferred one and you should always do that. The following patch is
improving the late loading mechanism for long running jobs and cloud use
cases.

Gather all cores and serialize the microcode update on them by doing it
one-by-one to make the late update process as reliable as possible and
avoid potential issues caused by the microcode update.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Tested-by: Chao Gao <chao.gao@intel.com>
[linux commit: a5321aec6412b20b5ad15db2d6b916c05349dbff]
[linux commit: bb8c13d61a629276a162c1d2b1a20a815cbcfbb7]
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
Changes in v8:
 - to support blocking #NMI handling during loading ucode
   * introduce a flag, 'loading_state', to mark the start or end of
     ucode loading.
   * use a bitmap for cpu callin since if cpu may stay in #NMI handling,
     there are two places for a cpu to call in. bitmap won't be counted
     twice.
   * don't wait for all CPUs callout, just wait for CPUs that perform the
     update. We have to do this because some threads may be stuck in NMI
     handling (where cannot reach the rendezvous).
 - emit a warning if the system stays in stop_machine context for more
 than 1s
 - comment that rdtsc is fine while loading an update
 - use cmpxchg() to avoid panic being called on multiple CPUs
 - Propagate revision number to other threads
 - refine comments and prompt messages

Changes in v7:
 - Check whether 'timeout' is 0 rather than "<=0" since it is unsigned int.
 - reword the comment above microcode_update_cpu() to clearly state that
 one thread per core should do the update.
---
 xen/arch/x86/microcode.c | 229 ++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 207 insertions(+), 22 deletions(-)

diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index cbaf13d..67549be 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -30,18 +30,41 @@
 #include <xen/smp.h>
 #include <xen/softirq.h>
 #include <xen/spinlock.h>
+#include <xen/stop_machine.h>
 #include <xen/tasklet.h>
 #include <xen/guest_access.h>
 #include <xen/earlycpio.h>
+#include <xen/watchdog.h>
 
+#include <asm/delay.h>
 #include <asm/msr.h>
 #include <asm/processor.h>
 #include <asm/setup.h>
 #include <asm/microcode.h>
 
+/*
+ * Before performing a late microcode update on any thread, we
+ * rendezvous all cpus in stop_machine context. The timeout for
+ * waiting for cpu rendezvous is 30ms. It is the timeout used by
+ * live patching
+ */
+#define MICROCODE_CALLIN_TIMEOUT_US 30000
+
+/*
+ * Timeout for each thread to complete update is set to 1s. It is a
+ * conservative choice considering all possible interference.
+ */
+#define MICROCODE_UPDATE_TIMEOUT_US 1000000
+
 static module_t __initdata ucode_mod;
 static signed int __initdata ucode_mod_idx;
 static bool_t __initdata ucode_mod_forced;
+static unsigned int nr_cores;
+static enum {
+    LOADING_EXITED,
+    LOADING_ENTERED,
+    LOADING_ABORTED,
+} loading_state;
 
 /*
  * If we scan the initramfs.cpio for the early microcode code
@@ -190,6 +213,16 @@ static DEFINE_SPINLOCK(microcode_mutex);
 DEFINE_PER_CPU(struct cpu_signature, cpu_sig);
 
 /*
+ * Count the CPUs that have entered, exited the rendezvous and succeeded in
+ * microcode update during late microcode update respectively.
+ *
+ * Note that a bitmap is used for callin to allow cpu to set a bit multiple
+ * times. It is required to do busy-loop in #NMI handling.
+ */
+static cpumask_t cpu_callin_map;
+static atomic_t cpu_out, cpu_updated;
+
+/*
  * Return a patch that covers current CPU. If there are multiple patches,
  * return the one with the highest revision number. Return error If no
  * patch is found and an error occurs during the parsing process. Otherwise
@@ -234,6 +267,35 @@ bool microcode_update_cache(struct microcode_patch *patch)
 }
 
 /*
+ * Wait for a condition to be met with a timeout (us).
+ */
+static int wait_for_condition(int (*func)(void *data), void *data,
+                         unsigned int timeout)
+{
+    while ( !func(data) )
+    {
+        if ( !timeout-- )
+        {
+            printk("CPU%u: Timeout in %s\n", smp_processor_id(), __func__);
+            return -EBUSY;
+        }
+        udelay(1);
+    }
+
+    return 0;
+}
+
+static int wait_cpu_callin(void *nr)
+{
+    return cpumask_weight(&cpu_callin_map) >= (unsigned long)nr;
+}
+
+static int wait_cpu_callout(void *nr)
+{
+    return atomic_read(&cpu_out) >= (unsigned long)nr;
+}
+
+/*
  * Load a microcode update to current CPU.
  *
  * If no patch is provided, the cached patch will be loaded. Microcode update
@@ -283,37 +345,105 @@ static int microcode_update_cpu(const struct microcode_patch *patch)
     return err;
 }
 
-static long do_microcode_update(void *patch)
+static int do_microcode_update(void *patch)
 {
-    unsigned int cpu;
+    unsigned int cpu = smp_processor_id();
+    unsigned int cpu_nr = num_online_cpus();
+    int ret;
 
-    /* store the patch after a successful loading */
-    if ( !microcode_update_cpu(patch) && patch )
+    /* Mark loading an ucode is in progress */
+    cmpxchg(&loading_state, LOADING_EXITED, LOADING_ENTERED);
+    cpumask_set_cpu(cpu, &cpu_callin_map);
+    ret = wait_for_condition(wait_cpu_callin, (void *)(unsigned long)cpu_nr,
+                             MICROCODE_CALLIN_TIMEOUT_US);
+    if ( ret )
     {
-        spin_lock(&microcode_mutex);
-        microcode_update_cache(patch);
-        spin_unlock(&microcode_mutex);
-        patch = NULL;
+        cmpxchg(&loading_state, LOADING_ENTERED, LOADING_ABORTED);
+        return ret;
     }
 
-    if ( microcode_ops->end_update )
-        microcode_ops->end_update();
+    /*
+     * Load microcode update on only one logical processor per core, or in
+     * AMD's term, one core per compute unit. The one with the lowest thread
+     * id among all siblings is chosen to perform the loading.
+     */
+    if ( (cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu))) )
+    {
+        static unsigned int panicked = 0;
+        bool monitor;
+        unsigned int done;
+        unsigned long tick = 0;
 
-    cpu = cpumask_next(smp_processor_id(), &cpu_online_map);
-    if ( cpu < nr_cpu_ids )
-        return continue_hypercall_on_cpu(cpu, do_microcode_update, patch);
+        ret = microcode_ops->apply_microcode(patch);
+        if ( !ret )
+        {
+            unsigned int cpu2;
 
-    /* Free the patch if no CPU has loaded it successfully. */
-    if ( patch )
-        microcode_free_patch(patch);
+            atomic_inc(&cpu_updated);
+            /* Propagate revision number to all siblings */
+            for_each_cpu(cpu2, per_cpu(cpu_sibling_mask, cpu))
+                per_cpu(cpu_sig, cpu2).rev = this_cpu(cpu_sig).rev;
+        }
 
-    return 0;
+        /*
+         * The first CPU reaching here will monitor the progress and emit
+         * warning message if the duration is too long (e.g. >1 second).
+         */
+        monitor = !atomic_inc_return(&cpu_out);
+        if ( monitor )
+            tick = rdtsc_ordered();
+
+        /* Waiting for all cores or computing units finishing update */
+        done = atomic_read(&cpu_out);
+        while ( panicked && done != nr_cores )
+        {
+            /*
+             * During each timeout interval, at least a CPU is expected to
+             * finish its update. Otherwise, something goes wrong.
+             *
+             * Note that RDTSC (in wait_for_condition()) is safe for threads to
+             * execute while waiting for completion of loading an update.
+             */
+            if ( wait_for_condition(&wait_cpu_callout,
+                                    (void *)(unsigned long)(done + 1),
+                                    MICROCODE_UPDATE_TIMEOUT_US) &&
+                 !cmpxchg(&panicked, 0, 1) )
+                panic("Timeout when finishing updating microcode (finished %u/%u)",
+                      done, nr_cores);
+
+            /* Print warning message once if long time is spent here */
+            if ( monitor )
+            {
+                if ( rdtsc_ordered() - tick >= cpu_khz * 1000 )
+                {
+                    printk(XENLOG_WARNING "WARNING: UPDATING MICROCODE HAS CONSUMED MORE THAN 1 SECOND!\n");
+                    monitor = false;
+                }
+            }
+
+            done = atomic_read(&cpu_out);
+        }
+
+        /* Mark loading is done to unblock other threads */
+        loading_state = LOADING_EXITED;
+    }
+    else
+    {
+        while ( loading_state == LOADING_ENTERED )
+            rep_nop();
+    }
+
+    if ( microcode_ops->end_update )
+        microcode_ops->end_update();
+
+    return ret;
 }
 
 int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
 {
     int ret;
     void *buffer;
+    unsigned int cpu, updated;
     struct microcode_patch *patch;
 
     if ( len != (uint32_t)len )
@@ -332,11 +462,18 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
         goto free;
     }
 
+    /* cpu_online_map must not change during update */
+    if ( !get_cpu_maps() )
+    {
+        ret = -EBUSY;
+        goto free;
+    }
+
     if ( microcode_ops->start_update )
     {
         ret = microcode_ops->start_update();
         if ( ret != 0 )
-            goto free;
+            goto put;
     }
 
     patch = microcode_parse_blob(buffer, len);
@@ -344,19 +481,67 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
     {
         ret = PTR_ERR(patch);
         printk(XENLOG_INFO "Parsing microcode blob error %d\n", ret);
-        goto free;
+        goto put;
     }
 
     if ( !patch )
     {
         printk(XENLOG_INFO "No ucode found. Update aborted!\n");
         ret = -EINVAL;
-        goto free;
+        goto put;
+    }
+
+    cpumask_clear(&cpu_callin_map);
+    atomic_set(&cpu_out, 0);
+    atomic_set(&cpu_updated, 0);
+    loading_state = LOADING_EXITED;
+
+    /* Calculate the number of online CPU core */
+    nr_cores = 0;
+    for_each_online_cpu(cpu)
+        if ( cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu)) )
+            nr_cores++;
+
+    printk(XENLOG_INFO "%u cores are to update their microcode\n", nr_cores);
+
+    /*
+     * We intend to disable interrupt for long time, which may lead to
+     * watchdog timeout.
+     */
+    watchdog_disable();
+    /*
+     * Late loading dance. Why the heavy-handed stop_machine effort?
+     *
+     * - HT siblings must be idle and not execute other code while the other
+     *   sibling is loading microcode in order to avoid any negative
+     *   interactions cause by the loading.
+     *
+     * - In addition, microcode update on the cores must be serialized until
+     *   this requirement can be relaxed in the future. Right now, this is
+     *   conservative and good.
+     */
+    ret = stop_machine_run(do_microcode_update, patch, NR_CPUS);
+    watchdog_enable();
+
+    updated = atomic_read(&cpu_updated);
+    if ( updated > 0 )
+    {
+        spin_lock(&microcode_mutex);
+        microcode_update_cache(patch);
+        spin_unlock(&microcode_mutex);
     }
+    else
+        microcode_free_patch(patch);
 
-    ret = continue_hypercall_on_cpu(cpumask_first(&cpu_online_map),
-                                    do_microcode_update, patch);
+    if ( updated && updated != nr_cores )
+        printk(XENLOG_ERR
+               "ERROR: Updating microcode succeeded on %u cores and failed on\n"
+               "other %u cores. A system with differing microcode revisions is\n"
+               "considered unstable. Please reboot and do not load the microcode\n"
+               "that triggers this warning!\n", updated, nr_cores - updated);
 
+ put:
+    put_cpu_maps();
  free:
     xfree(buffer);
     return ret;