From patchwork Tue Jul 12 14:31:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Laurent Dufour X-Patchwork-Id: 12915044 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 54896C43334 for ; Tue, 12 Jul 2022 14:32:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232170AbiGLOch (ORCPT ); Tue, 12 Jul 2022 10:32:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50514 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231684AbiGLOcg (ORCPT ); Tue, 12 Jul 2022 10:32:36 -0400 Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A56F2240AC; Tue, 12 Jul 2022 07:32:35 -0700 (PDT) Received: from pps.filterd (m0098416.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26CDBugK007915; Tue, 12 Jul 2022 14:32:09 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=0DhHvsxCxRfs8ONe0EUj23JBu5ERuQgHbLPOK7uZoz4=; b=Kcql+c5vfVTWypeyQaPJEwNBwFhWg8xSeKuDQiQ0zOyv5VCLVhj2WPmnO9WjMFmsAi3E vpZceXF81G0cm9k3pQy3KNPYRR3cpygpycF7MEOXeWg1OpS7g0B7SDHNBX6UkJmun0nI Wv3jfak+WMfl9bCjASAoaBHZszV8/PpG8N6hC63xk/Z3RFPVrhw70+jHQqTIiQGJRAxy z6Te9FR5jLXp/uWOdlg0nkVgShC7eLRJ93lIuEDvBK89Cr7iEsJSlttG1SAMrQ6zwrzJ NbG7qvL2z1T3GcuBxcM+SEZf6wgyI9TUErFRKZlnH1wgeJcbtZgz03iZDeIJI2f1+xLf kA== Received: from pps.reinject (localhost [127.0.0.1]) by mx0b-001b2d01.pphosted.com (PPS) with ESMTPS id 3h98jvm1fn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 12 Jul 2022 14:32:08 +0000 Received: from m0098416.ppops.net (m0098416.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 26CEUH2B029687; Tue, 12 Jul 2022 14:32:08 GMT Received: from ppma06ams.nl.ibm.com (66.31.33a9.ip4.static.sl-reverse.com [169.51.49.102]) by mx0b-001b2d01.pphosted.com (PPS) with ESMTPS id 3h98jvm1e2-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 12 Jul 2022 14:32:08 +0000 Received: from pps.filterd (ppma06ams.nl.ibm.com [127.0.0.1]) by ppma06ams.nl.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 26CEKX4u024863; Tue, 12 Jul 2022 14:32:06 GMT Received: from b06cxnps4074.portsmouth.uk.ibm.com (d06relay11.portsmouth.uk.ibm.com [9.149.109.196]) by ppma06ams.nl.ibm.com with ESMTP id 3h70xhvaut-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 12 Jul 2022 14:32:06 +0000 Received: from b06wcsmtp001.portsmouth.uk.ibm.com (b06wcsmtp001.portsmouth.uk.ibm.com [9.149.105.160]) by b06cxnps4074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 26CEW3tK16253304 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 12 Jul 2022 14:32:03 GMT Received: from b06wcsmtp001.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id B5417A405F; Tue, 12 Jul 2022 14:32:03 +0000 (GMT) Received: from b06wcsmtp001.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 3C07FA405B; Tue, 12 Jul 2022 14:32:03 +0000 (GMT) Received: from pomme.tlslab.ibm.com (unknown [9.101.4.33]) by b06wcsmtp001.portsmouth.uk.ibm.com (Postfix) with ESMTP; Tue, 12 Jul 2022 14:32:03 +0000 (GMT) From: Laurent Dufour To: mpe@ellerman.id.au, npiggin@gmail.com, christophe.leroy@csgroup.eu, wim@linux-watchdog.org, linux@roeck-us.net, nathanl@linux.ibm.com Cc: haren@linux.vnet.ibm.com, hch@infradead.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-watchdog@vger.kernel.org Subject: [PATCH v4 1/4] powerpc/mobility: wait for memory transfer to complete Date: Tue, 12 Jul 2022 16:31:59 +0200 Message-Id: <20220712143202.23144-2-ldufour@linux.ibm.com> X-Mailer: git-send-email 2.37.0 In-Reply-To: <20220712143202.23144-1-ldufour@linux.ibm.com> References: <20220712143202.23144-1-ldufour@linux.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: tfBTM2f9QAUpjECIqqJp_2ByD9f3wySB X-Proofpoint-ORIG-GUID: a9YD_9VNreQoh8r922KXmEml-utvB6L6 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-12_08,2022-07-12_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxscore=0 malwarescore=0 suspectscore=0 adultscore=0 lowpriorityscore=0 mlxlogscore=999 impostorscore=0 clxscore=1015 spamscore=0 bulkscore=0 phishscore=0 priorityscore=1501 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2206140000 definitions=main-2207120055 Precedence: bulk List-ID: X-Mailing-List: linux-watchdog@vger.kernel.org In pseries_migration_partition(), loop until the memory transfer is complete. This way the calling drmgr process will not exit earlier, allowing callbacks to be run only once the migration is fully completed. If reading the VASI state is done after the hypervisor has completed the migration, the HCALL is returning H_PARAMETER. We can safely assume that the memory transfer is achieved if this happens. This will also allow to manage the NMI watchdog state in the next commits. Reviewed-by: Nathan Lynch Reviewed-by: Nicholas Piggin Signed-off-by: Laurent Dufour --- arch/powerpc/platforms/pseries/mobility.c | 48 ++++++++++++++++++++++- 1 file changed, 46 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c index 78f3f74c7056..6297467072e6 100644 --- a/arch/powerpc/platforms/pseries/mobility.c +++ b/arch/powerpc/platforms/pseries/mobility.c @@ -427,6 +427,43 @@ static int wait_for_vasi_session_suspending(u64 handle) return ret; } +static void wait_for_vasi_session_completed(u64 handle) +{ + unsigned long state = 0; + int ret; + + pr_info("waiting for memory transfer to complete...\n"); + + /* + * Wait for transition from H_VASI_RESUMED to H_VASI_COMPLETED. + */ + while (true) { + ret = poll_vasi_state(handle, &state); + + /* + * If the memory transfer is already complete and the migration + * has been cleaned up by the hypervisor, H_PARAMETER is return, + * which is translate in EINVAL by poll_vasi_state(). + */ + if (ret == -EINVAL || (!ret && state == H_VASI_COMPLETED)) { + pr_info("memory transfer completed.\n"); + break; + } + + if (ret) { + pr_err("H_VASI_STATE return error (%d)\n", ret); + break; + } + + if (state != H_VASI_RESUMED) { + pr_err("unexpected H_VASI_STATE result %lu\n", state); + break; + } + + msleep(500); + } +} + static void prod_single(unsigned int target_cpu) { long hvrc; @@ -673,9 +710,16 @@ static int pseries_migrate_partition(u64 handle) vas_migration_handler(VAS_SUSPEND); ret = pseries_suspend(handle); - if (ret == 0) + if (ret == 0) { post_mobility_fixup(); - else + /* + * Wait until the memory transfer is complete, so that the user + * space process returns from the syscall after the transfer is + * complete. This allows the user hooks to be executed at the + * right time. + */ + wait_for_vasi_session_completed(handle); + } else pseries_cancel_migration(handle, ret); vas_migration_handler(VAS_RESUME); From patchwork Tue Jul 12 14:32:00 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Laurent Dufour X-Patchwork-Id: 12915046 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5C855CCA47C for ; Tue, 12 Jul 2022 14:32:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229962AbiGLOci (ORCPT ); Tue, 12 Jul 2022 10:32:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50502 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229782AbiGLOcg (ORCPT ); Tue, 12 Jul 2022 10:32:36 -0400 Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 77EB120F51; Tue, 12 Jul 2022 07:32:35 -0700 (PDT) Received: from pps.filterd (m0098399.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26CDJ0Ag006981; Tue, 12 Jul 2022 14:32:12 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=OqEsGehoDv7WhZIWjrAncQCiO+QkUV2kIJiuCbfX46c=; b=tQwpbPwJRqFs4fEcXf2zCihGiXm5VxUBNc3lsPC0tMWmK+uPXlkUD6n8F/eXhajc4hFf aVLPTeEmzDnxroIHHJw5lFwOquN62+DwXPh/sWuDk8dAXrgVCW/uBm4Yhi8clYhWfPve d0gcmayhT4bQTtuR1rcC+TgY3c0EB6BQ6MVDvs2MaRHx9Jf+oQPgqOoNLD+zL5KsoYJw y4+8QCC1hFLkY1Is8gBgRyDoIRi/YYKjAoYz8Sb5ze4bnEWkJo5yhgmd5Hbf4zededCZ vnLE6BGZxHNn3SBGNWjJZf926xPG/+ka2HA/dZ3hEJNMp247CtoD0jMei7chQ9H7H6/g Ag== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3h99r0228r-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 12 Jul 2022 14:32:12 +0000 Received: from m0098399.ppops.net (m0098399.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 26CDfZwf002699; Tue, 12 Jul 2022 14:32:11 GMT Received: from ppma06fra.de.ibm.com (48.49.7a9f.ip4.static.sl-reverse.com [159.122.73.72]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3h99r0226d-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 12 Jul 2022 14:32:11 +0000 Received: from pps.filterd (ppma06fra.de.ibm.com [127.0.0.1]) by ppma06fra.de.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 26CEMDi3029760; Tue, 12 Jul 2022 14:32:06 GMT Received: from b06cxnps4075.portsmouth.uk.ibm.com (d06relay12.portsmouth.uk.ibm.com [9.149.109.197]) by ppma06fra.de.ibm.com with ESMTP id 3h8ncngdq5-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 12 Jul 2022 14:32:06 +0000 Received: from b06wcsmtp001.portsmouth.uk.ibm.com (b06wcsmtp001.portsmouth.uk.ibm.com [9.149.105.160]) by b06cxnps4075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 26CEW4Rk16384434 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 12 Jul 2022 14:32:04 GMT Received: from b06wcsmtp001.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 4CBFCA405F; Tue, 12 Jul 2022 14:32:04 +0000 (GMT) Received: from b06wcsmtp001.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id C7301A4054; Tue, 12 Jul 2022 14:32:03 +0000 (GMT) Received: from pomme.tlslab.ibm.com (unknown [9.101.4.33]) by b06wcsmtp001.portsmouth.uk.ibm.com (Postfix) with ESMTP; Tue, 12 Jul 2022 14:32:03 +0000 (GMT) From: Laurent Dufour To: mpe@ellerman.id.au, npiggin@gmail.com, christophe.leroy@csgroup.eu, wim@linux-watchdog.org, linux@roeck-us.net, nathanl@linux.ibm.com Cc: haren@linux.vnet.ibm.com, hch@infradead.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-watchdog@vger.kernel.org Subject: [PATCH v4 2/4] watchdog: export lockup_detector_reconfigure Date: Tue, 12 Jul 2022 16:32:00 +0200 Message-Id: <20220712143202.23144-3-ldufour@linux.ibm.com> X-Mailer: git-send-email 2.37.0 In-Reply-To: <20220712143202.23144-1-ldufour@linux.ibm.com> References: <20220712143202.23144-1-ldufour@linux.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: EuQ7dFqyQ2kclyPpg0qGIUzE6zxKXqRY X-Proofpoint-ORIG-GUID: L4hbfXEqjeFr9W7WKTq27rcfQ3vkY1eZ X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-12_08,2022-07-12_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 bulkscore=0 priorityscore=1501 phishscore=0 suspectscore=0 mlxlogscore=999 impostorscore=0 mlxscore=0 clxscore=1015 spamscore=0 malwarescore=0 lowpriorityscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2206140000 definitions=main-2207120056 Precedence: bulk List-ID: X-Mailing-List: linux-watchdog@vger.kernel.org In some circumstances it may be interesting to reconfigure the watchdog from inside the kernel. On PowerPC, this may helpful before and after a LPAR migration (LPM) is initiated, because it implies some latencies, watchdog, and especially NMI watchdog is expected to be triggered during this operation. Reconfiguring the watchdog with a factor, would prevent it to happen too frequently during LPM. Rename lockup_detector_reconfigure() as __lockup_detector_reconfigure() and create a new function lockup_detector_reconfigure() calling __lockup_detector_reconfigure() under the protection of watchdog_mutex. Cc: Christoph Hellwig Signed-off-by: Laurent Dufour --- include/linux/nmi.h | 2 ++ kernel/watchdog.c | 21 ++++++++++++++++----- 2 files changed, 18 insertions(+), 5 deletions(-) diff --git a/include/linux/nmi.h b/include/linux/nmi.h index 750c7f395ca9..f700ff2df074 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -122,6 +122,8 @@ int watchdog_nmi_probe(void); int watchdog_nmi_enable(unsigned int cpu); void watchdog_nmi_disable(unsigned int cpu); +void lockup_detector_reconfigure(void); + /** * touch_nmi_watchdog - restart NMI watchdog timeout. * diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 20a7a55e62b6..90e6c41d5e33 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -541,7 +541,7 @@ int lockup_detector_offline_cpu(unsigned int cpu) return 0; } -static void lockup_detector_reconfigure(void) +static void __lockup_detector_reconfigure(void) { cpus_read_lock(); watchdog_nmi_stop(); @@ -561,6 +561,13 @@ static void lockup_detector_reconfigure(void) __lockup_detector_cleanup(); } +void lockup_detector_reconfigure(void) +{ + mutex_lock(&watchdog_mutex); + __lockup_detector_reconfigure(); + mutex_unlock(&watchdog_mutex); +} + /* * Create the watchdog infrastructure and configure the detector(s). */ @@ -577,13 +584,13 @@ static __init void lockup_detector_setup(void) return; mutex_lock(&watchdog_mutex); - lockup_detector_reconfigure(); + __lockup_detector_reconfigure(); softlockup_initialized = true; mutex_unlock(&watchdog_mutex); } #else /* CONFIG_SOFTLOCKUP_DETECTOR */ -static void lockup_detector_reconfigure(void) +void __lockup_detector_reconfigure(void) { cpus_read_lock(); watchdog_nmi_stop(); @@ -591,9 +598,13 @@ static void lockup_detector_reconfigure(void) watchdog_nmi_start(); cpus_read_unlock(); } +static inline void lockup_detector_reconfigure(void) +{ + __lockup_detector_reconfigure(); +} static inline void lockup_detector_setup(void) { - lockup_detector_reconfigure(); + __lockup_detector_reconfigure(); } #endif /* !CONFIG_SOFTLOCKUP_DETECTOR */ @@ -633,7 +644,7 @@ static void proc_watchdog_update(void) { /* Remove impossible cpus to keep sysctl output clean. */ cpumask_and(&watchdog_cpumask, &watchdog_cpumask, cpu_possible_mask); - lockup_detector_reconfigure(); + __lockup_detector_reconfigure(); } /* From patchwork Tue Jul 12 14:32:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Laurent Dufour X-Patchwork-Id: 12915045 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CE6C5C433EF for ; Tue, 12 Jul 2022 14:32:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232632AbiGLOch (ORCPT ); Tue, 12 Jul 2022 10:32:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50500 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229677AbiGLOcg (ORCPT ); Tue, 12 Jul 2022 10:32:36 -0400 Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6AAC96372; Tue, 12 Jul 2022 07:32:35 -0700 (PDT) Received: from pps.filterd (m0098410.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26CDXfDf003189; Tue, 12 Jul 2022 14:32:10 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=u2K0EugPdrYrrSRtyLhyqh2X5IDubSfltxPB5W0G0Hc=; b=apy6+W4RsKyXuj203yQg87RzAktos2L7nosq6OuLsf1Pt1/x8709eTafd63FAOQIPNMs 5veE6uscMxMPIKGGjywzr92WYlVdadWFciPosFAidyEDe1/iZ4rXHKTEOhxyDmRxir7S /22KGecvVZSbkQ8kmKiUKHQ3BDBWKpdaM1GjT4mlTGs9aYvLp8JBBVeelYG0XEzZhc3Q j/RiZSt+CryIkkig28F9+XHrgzS1cQO/fy/vbAzS/+Ez/4+X2nNZ6Zhl2CYkIJFWyprt yy++832X0i18EiUiD+64BuVPvW2H6ZLIRpa2WGSMBD/u6CL+Q7KpHYxS5ZC/bF77/ZTL qw== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3h964305xu-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 12 Jul 2022 14:32:10 +0000 Received: from m0098410.ppops.net (m0098410.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 26CBcldb003553; Tue, 12 Jul 2022 14:32:09 GMT Received: from ppma03ams.nl.ibm.com (62.31.33a9.ip4.static.sl-reverse.com [169.51.49.98]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3h964305w5-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 12 Jul 2022 14:32:09 +0000 Received: from pps.filterd (ppma03ams.nl.ibm.com [127.0.0.1]) by ppma03ams.nl.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 26CEKHqq002725; Tue, 12 Jul 2022 14:32:07 GMT Received: from b06cxnps4075.portsmouth.uk.ibm.com (d06relay12.portsmouth.uk.ibm.com [9.149.109.197]) by ppma03ams.nl.ibm.com with ESMTP id 3h71a8v9y8-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 12 Jul 2022 14:32:07 +0000 Received: from b06wcsmtp001.portsmouth.uk.ibm.com (b06wcsmtp001.portsmouth.uk.ibm.com [9.149.105.160]) by b06cxnps4075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 26CEW43517629488 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 12 Jul 2022 14:32:05 GMT Received: from b06wcsmtp001.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id D75D2A4054; Tue, 12 Jul 2022 14:32:04 +0000 (GMT) Received: from b06wcsmtp001.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 5E3C2A4060; Tue, 12 Jul 2022 14:32:04 +0000 (GMT) Received: from pomme.tlslab.ibm.com (unknown [9.101.4.33]) by b06wcsmtp001.portsmouth.uk.ibm.com (Postfix) with ESMTP; Tue, 12 Jul 2022 14:32:04 +0000 (GMT) From: Laurent Dufour To: mpe@ellerman.id.au, npiggin@gmail.com, christophe.leroy@csgroup.eu, wim@linux-watchdog.org, linux@roeck-us.net, nathanl@linux.ibm.com Cc: haren@linux.vnet.ibm.com, hch@infradead.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-watchdog@vger.kernel.org Subject: [PATCH v4 3/4] powerpc/watchdog: introduce a NMI watchdog's factor Date: Tue, 12 Jul 2022 16:32:01 +0200 Message-Id: <20220712143202.23144-4-ldufour@linux.ibm.com> X-Mailer: git-send-email 2.37.0 In-Reply-To: <20220712143202.23144-1-ldufour@linux.ibm.com> References: <20220712143202.23144-1-ldufour@linux.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: WCfyCS5KUq1CApqNRH36bbGcubUuAg7R X-Proofpoint-GUID: cUEboccxOYrSJbS2z31MXTq7emzkj1Ql X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-12_08,2022-07-12_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 bulkscore=0 adultscore=0 mlxscore=0 suspectscore=0 clxscore=1015 lowpriorityscore=0 priorityscore=1501 phishscore=0 mlxlogscore=999 spamscore=0 impostorscore=0 malwarescore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2206140000 definitions=main-2207120056 Precedence: bulk List-ID: X-Mailing-List: linux-watchdog@vger.kernel.org Introduce a factor which would apply to the NMI watchdog timeout. This factor is a percentage added to the watchdog_tresh value. The value is set under the watchdog_mutex protection and lockup_detector_reconfigure() is called to recompute wd_panic_timeout_tb. Once the factor is set, it remains until it is set back to 0, which means no impact. Reviewed-by: Nicholas Piggin Signed-off-by: Laurent Dufour --- arch/powerpc/include/asm/nmi.h | 2 ++ arch/powerpc/kernel/watchdog.c | 21 ++++++++++++++++++++- 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/nmi.h b/arch/powerpc/include/asm/nmi.h index ea0e487f87b1..c3c7adef74de 100644 --- a/arch/powerpc/include/asm/nmi.h +++ b/arch/powerpc/include/asm/nmi.h @@ -5,8 +5,10 @@ #ifdef CONFIG_PPC_WATCHDOG extern void arch_touch_nmi_watchdog(void); long soft_nmi_interrupt(struct pt_regs *regs); +void watchdog_nmi_set_timeout_pct(u64 pct); #else static inline void arch_touch_nmi_watchdog(void) {} +static inline void watchdog_nmi_set_timeout_pct(u64 pct) {} #endif #ifdef CONFIG_NMI_IPI diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c index 7d28b9553654..5d903e63f932 100644 --- a/arch/powerpc/kernel/watchdog.c +++ b/arch/powerpc/kernel/watchdog.c @@ -91,6 +91,10 @@ static cpumask_t wd_smp_cpus_pending; static cpumask_t wd_smp_cpus_stuck; static u64 wd_smp_last_reset_tb; +#ifdef CONFIG_PPC_PSERIES +static u64 wd_timeout_pct; +#endif + /* * Try to take the exclusive watchdog action / NMI IPI / printing lock. * wd_smp_lock must be held. If this fails, we should return and wait @@ -527,7 +531,13 @@ static int stop_watchdog_on_cpu(unsigned int cpu) static void watchdog_calc_timeouts(void) { - wd_panic_timeout_tb = watchdog_thresh * ppc_tb_freq; + u64 threshold = watchdog_thresh; + +#ifdef CONFIG_PPC_PSERIES + threshold += (READ_ONCE(wd_timeout_pct) * threshold) / 100; +#endif + + wd_panic_timeout_tb = threshold * ppc_tb_freq; /* Have the SMP detector trigger a bit later */ wd_smp_panic_timeout_tb = wd_panic_timeout_tb * 3 / 2; @@ -570,3 +580,12 @@ int __init watchdog_nmi_probe(void) } return 0; } + +#ifdef CONFIG_PPC_PSERIES +void watchdog_nmi_set_timeout_pct(u64 pct) +{ + pr_info("Set the NMI watchdog timeout factor to %llu%%\n", pct); + WRITE_ONCE(wd_timeout_pct, pct); + lockup_detector_reconfigure(); +} +#endif From patchwork Tue Jul 12 14:32:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Laurent Dufour X-Patchwork-Id: 12915048 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B0122C43334 for ; Tue, 12 Jul 2022 14:37:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229976AbiGLOhZ (ORCPT ); Tue, 12 Jul 2022 10:37:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56474 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229617AbiGLOhY (ORCPT ); Tue, 12 Jul 2022 10:37:24 -0400 Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 755C42707; Tue, 12 Jul 2022 07:37:23 -0700 (PDT) Received: from pps.filterd (m0098404.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26CEau8b011499; Tue, 12 Jul 2022 14:37:02 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=5ubSAhK9MadoENIdxzLVflCgge7eS08M/oEp20xeqZc=; b=TXktl4bAaXcZXegbXRjt6CG/0+mQSXDx4mWvZ+GM4jxWes8ShLISrwfzSEKhWzexrkiy YrsGcISd0X8ZvOI9s/fuFaYCmRtXI1YNh7ctaMFLgMg6anXJbDorS01yipBmOzNtGzd7 5+pTHyvwVu4a3TigrzLUNe3dYzzLkfP+FSmjMEZeYoezMqwZWzh9AoFVGZKfjyMnY0LO 5Xh+K4nw2ZvJ+1WddfrrDKuiIbJ5vGN1pVwBjbb9y72hfohN6x2QW3xii1Kldv3X01LV cqQaROtG4L6GRU9+qcC9i6PC1LDLv6TfQ+IZTqi7DAtniLiqRG+kcVRvrjs3ReXL6ayO UQ== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3h969809rp-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 12 Jul 2022 14:37:01 +0000 Received: from m0098404.ppops.net (m0098404.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 26CEb1Cg012144; Tue, 12 Jul 2022 14:37:01 GMT Received: from ppma06ams.nl.ibm.com (66.31.33a9.ip4.static.sl-reverse.com [169.51.49.102]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3h969809h2-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 12 Jul 2022 14:37:01 +0000 Received: from pps.filterd (ppma06ams.nl.ibm.com [127.0.0.1]) by ppma06ams.nl.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 26CEKCk7024546; Tue, 12 Jul 2022 14:32:07 GMT Received: from b06cxnps3074.portsmouth.uk.ibm.com (d06relay09.portsmouth.uk.ibm.com [9.149.109.194]) by ppma06ams.nl.ibm.com with ESMTP id 3h70xhvauv-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 12 Jul 2022 14:32:07 +0000 Received: from b06wcsmtp001.portsmouth.uk.ibm.com (b06wcsmtp001.portsmouth.uk.ibm.com [9.149.105.160]) by b06cxnps3074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 26CEW57x21889526 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 12 Jul 2022 14:32:05 GMT Received: from b06wcsmtp001.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 6ED78A4054; Tue, 12 Jul 2022 14:32:05 +0000 (GMT) Received: from b06wcsmtp001.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id E9461A405C; Tue, 12 Jul 2022 14:32:04 +0000 (GMT) Received: from pomme.tlslab.ibm.com (unknown [9.101.4.33]) by b06wcsmtp001.portsmouth.uk.ibm.com (Postfix) with ESMTP; Tue, 12 Jul 2022 14:32:04 +0000 (GMT) From: Laurent Dufour To: mpe@ellerman.id.au, npiggin@gmail.com, christophe.leroy@csgroup.eu, wim@linux-watchdog.org, linux@roeck-us.net, nathanl@linux.ibm.com Cc: haren@linux.vnet.ibm.com, hch@infradead.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-watchdog@vger.kernel.org Subject: [PATCH v4 4/4] pseries/mobility: set NMI watchdog factor during LPM Date: Tue, 12 Jul 2022 16:32:02 +0200 Message-Id: <20220712143202.23144-5-ldufour@linux.ibm.com> X-Mailer: git-send-email 2.37.0 In-Reply-To: <20220712143202.23144-1-ldufour@linux.ibm.com> References: <20220712143202.23144-1-ldufour@linux.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: zC0O_VQigcbeif90ysm-vPGd-SdNmXcM X-Proofpoint-GUID: EMsToIFc0IgO1oF9GGMKslaLjHC2osb1 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-12_08,2022-07-12_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 phishscore=0 adultscore=0 mlxscore=0 spamscore=0 malwarescore=0 mlxlogscore=999 suspectscore=0 bulkscore=0 impostorscore=0 clxscore=1015 lowpriorityscore=0 priorityscore=1501 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2206140000 definitions=main-2207120056 Precedence: bulk List-ID: X-Mailing-List: linux-watchdog@vger.kernel.org During a LPM, while the memory transfer is in progress on the arrival side, some latencies is generated when accessing not yet transferred pages on the arrival side. Thus, the NMI watchdog may be triggered too frequently, which increases the risk to hit a NMI interrupt in a bad place in the kernel, leading to a kernel panic. Disabling the Hard Lockup Watchdog until the memory transfer could be a too strong work around, some users would want this timeout to be eventually triggered if the system is hanging even during LPM. Introduce a new sysctl variable nmi_watchdog_factor. It allows to apply a factor to the NMI watchdog timeout during a LPM. Just before the CPU are stopped for the switchover sequence, the NMI watchdog timer is set to watchdog_tresh + factor% A value of 0 has no effect. The default value is 200, meaning that the NMI watchdog is set to 30s during LPM (based on a 10s watchdog_tresh value). Once the memory transfer is achieved, the factor is reset to 0. Setting this value to a high number is like disabling the NMI watchdog during a LPM. Reviewed-by: Nicholas Piggin Signed-off-by: Laurent Dufour --- Documentation/admin-guide/sysctl/kernel.rst | 12 ++++++ arch/powerpc/platforms/pseries/mobility.c | 43 +++++++++++++++++++++ 2 files changed, 55 insertions(+) diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index ddccd1077462..0bb0b7f27e96 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -592,6 +592,18 @@ to the guest kernel command line (see Documentation/admin-guide/kernel-parameters.rst). +nmi_watchdog_factor (PPC only) +================================== + +Factor apply to to the NMI watchdog timeout (only when ``nmi_watchdog`` is +set to 1). This factor represents the percentage added to +``watchdog_thresh`` when calculating the NMI watchdog timeout during a +LPM. The soft lockup timeout is not impacted. + +A value of 0 means no change. The default value is 200 meaning the NMI +watchdog is set to 30s (based on ``watchdog_thresh`` equal to 10). + + numa_balancing ============== diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c index 6297467072e6..78535a0791f9 100644 --- a/arch/powerpc/platforms/pseries/mobility.c +++ b/arch/powerpc/platforms/pseries/mobility.c @@ -48,6 +48,39 @@ struct update_props_workarea { #define MIGRATION_SCOPE (1) #define PRRN_SCOPE -2 +#ifdef CONFIG_PPC_WATCHDOG +static unsigned int nmi_wd_factor = 200; + +#ifdef CONFIG_SYSCTL +static struct ctl_table nmi_wd_factor_ctl_table[] = { + { + .procname = "nmi_watchdog_factor", + .data = &nmi_wd_factor, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_douintvec_minmax, + }, + {} +}; +static struct ctl_table nmi_wd_factor_sysctl_root[] = { + { + .procname = "kernel", + .mode = 0555, + .child = nmi_wd_factor_ctl_table, + }, + {} +}; + +static int __init register_nmi_wd_factor_sysctl(void) +{ + register_sysctl_table(nmi_wd_factor_sysctl_root); + + return 0; +} +device_initcall(register_nmi_wd_factor_sysctl); +#endif /* CONFIG_SYSCTL */ +#endif /* CONFIG_PPC_WATCHDOG */ + static int mobility_rtas_call(int token, char *buf, s32 scope) { int rc; @@ -702,13 +735,20 @@ static int pseries_suspend(u64 handle) static int pseries_migrate_partition(u64 handle) { int ret; + unsigned int factor = 0; +#ifdef CONFIG_PPC_WATCHDOG + factor = nmi_wd_factor; +#endif ret = wait_for_vasi_session_suspending(handle); if (ret) return ret; vas_migration_handler(VAS_SUSPEND); + if (factor) + watchdog_nmi_set_timeout_pct(factor); + ret = pseries_suspend(handle); if (ret == 0) { post_mobility_fixup(); @@ -722,6 +762,9 @@ static int pseries_migrate_partition(u64 handle) } else pseries_cancel_migration(handle, ret); + if (factor) + watchdog_nmi_set_timeout_pct(0); + vas_migration_handler(VAS_RESUME); return ret;