From patchwork Mon May 27 08:31:29 2019
From: Chao Gao <chao.gao@intel.com>
To: xen-devel@lists.xenproject.org
Date: Mon, 27 May 2019 16:31:29 +0800
Message-Id: <1558945891-3015-9-git-send-email-chao.gao@intel.com>
In-Reply-To: <1558945891-3015-1-git-send-email-chao.gao@intel.com>
References: <1558945891-3015-1-git-send-email-chao.gao@intel.com>
Subject: [Xen-devel] [PATCH v7 08/10] x86/microcode: Synchronize late microcode loading
Cc: Sergey Dyasli, Kevin Tian, Borislav Petkov, Ashok Raj, Wei Liu,
    Jun Nakajima, Andrew Cooper, Jan Beulich, Thomas Gleixner, Chao Gao,
    Roger Pau Monné

This patch ports microcode improvement patches from the Linux kernel.

Before you read any further: the early loading method is still the
preferred one and you should always use it. The patch below improves
the late loading mechanism for long-running jobs and cloud use cases.
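To make the rendezvous protocol concrete before diving into the diff, here
is a minimal userspace analogue using C11 threads and atomics. This is a
hypothetical harness, not hypervisor code: plain threads stand in for CPUs,
and thread 0 stands in for the one updating thread per core.

/* Minimal userspace analogue of the two-phase rendezvous: all threads
 * check in, one designated thread "updates", all threads check out. */
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <threads.h>

#define NR_CPUS 4

static atomic_uint cpu_in, cpu_out;

static int worker(void *arg)
{
    unsigned int id = (unsigned int)(uintptr_t)arg;

    /* Phase 1: rendezvous. Nobody proceeds until everyone has arrived. */
    atomic_fetch_add(&cpu_in, 1);
    while ( atomic_load(&cpu_in) < NR_CPUS )
        ;   /* the real code bounds this spin with a 30ms timeout */

    /* Phase 2: only one thread per "core" performs the update. */
    if ( id == 0 )
        printf("thread %u: applying microcode update\n", id);

    /* Phase 3: check out; stragglers here indicate a wedged CPU. */
    atomic_fetch_add(&cpu_out, 1);
    while ( atomic_load(&cpu_out) < NR_CPUS )
        ;

    return 0;
}

int main(void)
{
    thrd_t t[NR_CPUS];

    for ( uintptr_t i = 0; i < NR_CPUS; i++ )
        thrd_create(&t[i], worker, (void *)i);
    for ( unsigned int i = 0; i < NR_CPUS; i++ )
        thrd_join(t[i], NULL);

    return 0;
}

The actual patch additionally serializes the per-core updates and bounds
every wait with a timeout, as the diff further down shows.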
Gather all cores and serialize the microcode update on them by doing it
one by one, to make the late update process as reliable as possible and
to avoid potential issues caused by the microcode update.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Tested-by: Chao Gao <chao.gao@intel.com>
[linux commit: a5321aec6412b20b5ad15db2d6b916c05349dbff]
[linux commit: bb8c13d61a629276a162c1d2b1a20a815cbcfbb7]
Cc: Kevin Tian
Cc: Jun Nakajima
Cc: Ashok Raj
Cc: Borislav Petkov
Cc: Thomas Gleixner
Cc: Andrew Cooper
Cc: Jan Beulich
---
Changes in v7:
 - Check whether 'timeout' is 0 rather than "<= 0", since it is an
   unsigned int.
 - Reword the comment above microcode_update_cpu() to clearly state
   that one thread per core should do the update.

Changes in v6:
 - Use one timeout period for the rendezvous stage and another for the
   update stage.
 - Scale the time to wait by the number of remaining CPUs to respond.
   This helps to detect a problem earlier, so the system can be rebooted
   earlier. A sketch of this scheme follows the diffstat below.
---
 xen/arch/x86/microcode.c | 171 ++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 155 insertions(+), 16 deletions(-)
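To spell out the v6 scaling point: the updater does not wait a fixed
nr_cpus * timeout for everyone; each MICROCODE_UPDATE_TIMEOUT_US interval
merely has to see one more CPU check out, so a wedged CPU is detected
after a single interval. A sketch in the patch's own idiom follows;
wait_for_update() is a hypothetical name, it reuses the wait_for_cpus()
helper the diff below introduces, and hypervisor types such as atomic_t
are assumed, so this is illustrative rather than standalone code.

/*
 * Sketch of the scaled wait: each interval_us window must see at least
 * one more CPU check out, so the total wait adapts to the number of
 * CPUs still updating instead of being a fixed worst-case bound.
 */
static int wait_for_update(atomic_t *cnt, unsigned int nr_cpus,
                           unsigned int interval_us)
{
    unsigned int finished = atomic_read(cnt);

    while ( finished != nr_cpus )
    {
        /* No progress within one interval means a CPU is stuck. */
        if ( wait_for_cpus(cnt, finished + 1, interval_us) )
            return -EBUSY;
        finished = atomic_read(cnt);
    }

    return 0;
}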
diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index 23cf550..f4a417e 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -22,6 +22,7 @@
  */
 
 #include
+#include
 #include
 #include
 #include
@@ -30,15 +31,34 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
+#include
+#include
 #include
 #include
 #include
 #include
 
+/*
+ * Before performing a late microcode update on any thread, we
+ * rendezvous all CPUs in stop_machine context. The timeout for
+ * waiting for the CPU rendezvous is 30ms. It is the timeout used by
+ * live patching.
+ */
+#define MICROCODE_CALLIN_TIMEOUT_US 30000
+
+/*
+ * The timeout for each thread to complete its update is set to 1s. It is a
+ * conservative choice considering all possible interference (for
+ * instance, sometimes wbinvd takes a relatively long time). And a perfect
+ * timeout doesn't help much beyond an earlier shutdown.
+ */
+#define MICROCODE_UPDATE_TIMEOUT_US 1000000
+
 static module_t __initdata ucode_mod;
 static signed int __initdata ucode_mod_idx;
 static bool_t __initdata ucode_mod_forced;
@@ -190,6 +210,12 @@ static DEFINE_SPINLOCK(microcode_mutex);
 DEFINE_PER_CPU(struct cpu_signature, cpu_sig);
 
 /*
+ * Count the CPUs that have entered and exited the rendezvous, and those
+ * that have succeeded in updating microcode, respectively, during a late
+ * microcode update.
+ */
+static atomic_t cpu_in, cpu_out, cpu_updated;
+
+/*
  * Return the patch with the highest revision id among all matching
  * patches in the blob. Return NULL if no suitable patch.
  */
@@ -270,31 +296,90 @@ bool microcode_update_cache(struct microcode_patch *patch)
 
     return true;
 }
 
-static long do_microcode_update(void *patch)
+/* Wait for CPUs to rendezvous, with a timeout (in us) */
+static int wait_for_cpus(atomic_t *cnt, unsigned int expect,
+                         unsigned int timeout)
 {
-    int error, cpu;
-
-    error = microcode_update_cpu(patch);
-    if ( error )
+    while ( atomic_read(cnt) < expect )
     {
-        microcode_ops->free_patch(microcode_cache);
-        return error;
+        if ( !timeout )
+        {
+            printk("CPU%d: Timeout when waiting for CPUs calling in\n",
+                   smp_processor_id());
+            return -EBUSY;
+        }
+        udelay(1);
+        timeout--;
     }
 
+    return 0;
+}
+
-    cpu = cpumask_next(smp_processor_id(), &cpu_online_map);
-    if ( cpu < nr_cpu_ids )
-        return continue_hypercall_on_cpu(cpu, do_microcode_update, patch);
+static int do_microcode_update(void *patch)
+{
+    unsigned int cpu = smp_processor_id();
+    unsigned int cpu_nr = num_online_cpus();
+    unsigned int finished;
+    int ret;
+    static bool error;
 
-    microcode_update_cache(patch);
+    atomic_inc(&cpu_in);
+    ret = wait_for_cpus(&cpu_in, cpu_nr, MICROCODE_CALLIN_TIMEOUT_US);
+    if ( ret )
+        return ret;
 
-    return error;
+    ret = microcode_ops->collect_cpu_info(&this_cpu(cpu_sig));
+    /*
+     * Load the microcode update on only one logical processor per core.
+     * Here, among the logical processors of a core, the one with the
+     * lowest thread id is chosen to perform the loading.
+     */
+    if ( !ret && (cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu))) )
+    {
+        ret = microcode_ops->apply_microcode(patch);
+        if ( !ret )
+            atomic_inc(&cpu_updated);
+    }
+    /*
+     * Increase the wait timeout to a safe value here since we're serializing
+     * the microcode update and that could take a while on a large number of
+     * CPUs. And that is fine as the *actual* timeout will be determined by
+     * the last CPU to finish updating, and is thus cut short.
+     */
+    atomic_inc(&cpu_out);
+    finished = atomic_read(&cpu_out);
+    while ( !error && finished != cpu_nr )
+    {
+        /*
+         * During each timeout interval, at least one CPU is expected to
+         * finish its update. Otherwise, something has gone wrong.
+         */
+        if ( wait_for_cpus(&cpu_out, finished + 1,
+                           MICROCODE_UPDATE_TIMEOUT_US) && !error )
+        {
+            error = true;
+            panic("Timeout when finishing updating microcode (finished %d/%d)",
+                  finished, cpu_nr);
+        }
+
+        finished = atomic_read(&cpu_out);
+    }
+
+    /*
+     * Refresh the CPU signature (revision) on threads which didn't call
+     * apply_microcode().
+     */
+    if ( cpu != cpumask_first(per_cpu(cpu_sibling_mask, cpu)) )
+        ret = microcode_ops->collect_cpu_info(&this_cpu(cpu_sig));
+
+    return ret;
 }
 
 int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
 {
     int ret;
     void *buffer;
+    unsigned int cpu, nr_cores;
     struct microcode_patch *patch;
 
     if ( len != (uint32_t)len )
@@ -316,11 +401,18 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
         goto free;
     }
 
+    /* cpu_online_map must not change during the update */
+    if ( !get_cpu_maps() )
+    {
+        ret = -EBUSY;
+        goto free;
+    }
+
     if ( microcode_ops->start_update )
     {
         ret = microcode_ops->start_update();
         if ( ret != 0 )
-            goto free;
+            goto put;
     }
 
     patch = microcode_parse_blob(buffer, len);
@@ -337,12 +429,59 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
         if ( patch )
             microcode_ops->free_patch(patch);
         ret = -EINVAL;
-        goto free;
+        goto put;
     }
 
-    ret = continue_hypercall_on_cpu(cpumask_first(&cpu_online_map),
-                                    do_microcode_update, patch);
+    atomic_set(&cpu_in, 0);
+    atomic_set(&cpu_out, 0);
+    atomic_set(&cpu_updated, 0);
+
+    /* Calculate the number of online CPU cores */
+    nr_cores = 0;
+    for_each_online_cpu(cpu)
+        if ( cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu)) )
+            nr_cores++;
+
+    printk(XENLOG_INFO "%d cores are to update their microcode\n", nr_cores);
+
+    /*
+     * We intend to disable interrupts for a long time, which may lead to
+     * a watchdog timeout.
+     */
+    watchdog_disable();
+    /*
+     * Late loading dance. Why the heavy-handed stop_machine effort?
+     *
+     * - HT siblings must be idle and not execute other code while the other
+     *   sibling is loading microcode, in order to avoid any negative
+     *   interactions caused by the loading.
+     *
+     * - In addition, microcode updates on the cores must be serialized until
+     *   this requirement can be relaxed in the future. Right now, this is
+     *   conservative and good.
+     */
+    ret = stop_machine_run(do_microcode_update, patch, NR_CPUS);
+    watchdog_enable();
+
+    if ( atomic_read(&cpu_updated) == nr_cores )
+    {
+        spin_lock(&microcode_mutex);
+        microcode_update_cache(patch);
+        spin_unlock(&microcode_mutex);
+    }
+    else if ( atomic_read(&cpu_updated) == 0 )
+        microcode_ops->free_patch(patch);
+    else
+    {
+        printk("Updating microcode succeeded on part of CPUs and failed on\n"
+               "others due to an unknown reason. A system with different\n"
+               "microcode revisions is considered unstable. Please reboot and\n"
+               "do not load the microcode that triggers this warning\n");
+        microcode_ops->free_patch(patch);
+    }
+
+ put:
+    put_cpu_maps();
+
  free:
     xfree(buffer);
     return ret;
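For context (this hunk is not part of the patch): the late-load path above
is reached from dom0 through the XENPF_microcode_update platform op. The
dispatch in xen/arch/x86/platform_hypercall.c looks roughly like the sketch
below; the exact handle conversion is paraphrased from memory, so treat it
as illustrative rather than authoritative.

    /* Roughly how XENPF_microcode_update reaches microcode_update();
     * paraphrased from xen/arch/x86/platform_hypercall.c. */
    case XENPF_microcode_update:
        ret = microcode_update(
            guest_handle_to_param(op->u.microcode.data, const_void),
            op->u.microcode.length);
        break;

With this patch applied, that hypercall returns only after every online
core has attempted the update under stop_machine, instead of trickling
across CPUs via continue_hypercall_on_cpu().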