From patchwork Fri Jan 8 14:52:09 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ido Schimmel X-Patchwork-Id: 12006767 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F3B68C433DB for ; Fri, 8 Jan 2021 14:53:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9F4862396F for ; Fri, 8 Jan 2021 14:53:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727128AbhAHOxm (ORCPT ); Fri, 8 Jan 2021 09:53:42 -0500 Received: from out5-smtp.messagingengine.com ([66.111.4.29]:48373 "EHLO out5-smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725793AbhAHOxm (ORCPT ); Fri, 8 Jan 2021 09:53:42 -0500 Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id 0E5E85C02C0; Fri, 8 Jan 2021 09:52:56 -0500 (EST) Received: from mailfrontend1 ([10.202.2.162]) by compute4.internal (MEProxy); Fri, 08 Jan 2021 09:52:56 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:subject:to :x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm1; bh=n7ZV//HxK19udrI8a5sOZAQ+DSCDuSzroWTDmGb2V80=; b=Q4AxLpP3 klFuPrs9E/XU0AhqUobScn7rakxw2OppVFKE+diQw+7JPUxS7MGxxYfKZm3bbOQn pg4Aj3CgL/wUYx22lS6PibGuM/Jpuk9U3tQ+FmHezLfaJFA7H5Xu41B1qPzmXKQV sVgjiPFzMlGrlQ/KPX7OWmo/Npp/U/4TufPZDr44h5zAprJbJsvEGQg9rZUdwVjT FNTOFjUxflKUfQHbBHGx9bwK1wQR5hYoOGeqGNohtgRSFxVtpihhsGt8bYpZDwB/ xmT14ExN8X8SuAM6teAJ6jBn5ijnWzNbxNOHnK9eGNBW3M7KoGcSh4jceoF6E3N+ t3QFhgp3t2Q3KA== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedujedrvdeghedgtdduucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpefhvffufffkofgjfhgggfestdekre dtredttdenucfhrhhomhepkfguohcuufgthhhimhhmvghluceoihguohhstghhsehiugho shgthhdrohhrgheqnecuggftrfgrthhtvghrnhepudetieevffffveelkeeljeffkefhke ehgfdtffethfelvdejgffghefgveejkefhnecukfhppeekgedrvddvledrudehfedrgeeg necuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepihguoh hstghhsehiughoshgthhdrohhrgh X-ME-Proxy: Received: from shredder.lan (igld-84-229-153-44.inter.net.il [84.229.153.44]) by mail.messagingengine.com (Postfix) with ESMTPA id B319124005C; Fri, 8 Jan 2021 09:52:53 -0500 (EST) From: Ido Schimmel To: netdev@vger.kernel.org Cc: davem@davemloft.net, kuba@kernel.org, vadimp@nvidia.com, jiri@nvidia.com, mlxsw@nvidia.com, Ido Schimmel Subject: [PATCH net 1/2] mlxsw: core: Add validation of transceiver temperature thresholds Date: Fri, 8 Jan 2021 16:52:09 +0200 Message-Id: <20210108145210.1229820-2-idosch@idosch.org> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20210108145210.1229820-1-idosch@idosch.org> References: <20210108145210.1229820-1-idosch@idosch.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org From: Vadim Pasternak Validate thresholds to avoid a single failure due to some transceiver unreliability. Ignore the last readouts in case warning temperature is above alarm temperature, since it can cause unexpected thermal shutdown. Stay with the previous values and refresh threshold within the next iteration. This is a rare scenario, but it was observed at a customer site. Fixes: 6a79507cfe94 ("mlxsw: core: Extend thermal module with per QSFP module thermal zones") Signed-off-by: Vadim Pasternak Reviewed-by: Jiri Pirko Signed-off-by: Ido Schimmel --- drivers/net/ethernet/mellanox/mlxsw/core_thermal.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c index 8fa286ccdd6b..250a85049697 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c +++ b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c @@ -176,6 +176,12 @@ mlxsw_thermal_module_trips_update(struct device *dev, struct mlxsw_core *core, if (err) return err; + if (crit_temp > emerg_temp) { + dev_warn(dev, "%s : Critical threshold %d is above emergency threshold %d\n", + tz->tzdev->type, crit_temp, emerg_temp); + return 0; + } + /* According to the system thermal requirements, the thermal zones are * defined with four trip points. The critical and emergency * temperature thresholds, provided by QSFP module are set as "active" @@ -190,11 +196,8 @@ mlxsw_thermal_module_trips_update(struct device *dev, struct mlxsw_core *core, tz->trips[MLXSW_THERMAL_TEMP_TRIP_NORM].temp = crit_temp; tz->trips[MLXSW_THERMAL_TEMP_TRIP_HIGH].temp = crit_temp; tz->trips[MLXSW_THERMAL_TEMP_TRIP_HOT].temp = emerg_temp; - if (emerg_temp > crit_temp) - tz->trips[MLXSW_THERMAL_TEMP_TRIP_CRIT].temp = emerg_temp + + tz->trips[MLXSW_THERMAL_TEMP_TRIP_CRIT].temp = emerg_temp + MLXSW_THERMAL_MODULE_TEMP_SHIFT; - else - tz->trips[MLXSW_THERMAL_TEMP_TRIP_CRIT].temp = emerg_temp; return 0; } From patchwork Fri Jan 8 14:52:10 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ido Schimmel X-Patchwork-Id: 12006769 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 23F66C433E0 for ; Fri, 8 Jan 2021 14:53:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C8DD423998 for ; Fri, 8 Jan 2021 14:53:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727252AbhAHOxo (ORCPT ); Fri, 8 Jan 2021 09:53:44 -0500 Received: from out5-smtp.messagingengine.com ([66.111.4.29]:34559 "EHLO out5-smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725793AbhAHOxo (ORCPT ); Fri, 8 Jan 2021 09:53:44 -0500 Received: from compute2.internal (compute2.nyi.internal [10.202.2.42]) by mailout.nyi.internal (Postfix) with ESMTP id D724A5C02E3; Fri, 8 Jan 2021 09:52:57 -0500 (EST) Received: from mailfrontend1 ([10.202.2.162]) by compute2.internal (MEProxy); Fri, 08 Jan 2021 09:52:57 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:subject:to :x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm1; bh=fqlqPFT9FgUnhyI5sLeqwH906yHCiqH5VWEUeqUKZi8=; b=TSaglgC8 3Jcn1ibPwJ2batMun6JSrInNrY4OXvnhG3gdcDJTotPPL+EuHmkqeNhvZEESNUog h6/PjXAGEaNiJhrGy/6ywBHdxI+ZH/EW54SnqlaHQarfAYxMMMoX0V78VShZL5FC gW+ty5w2wVBLM6N+2B+oZdd47yTqfaKkrvZ+s0O0R7Ww9Vj7Bc/VvEsFNP79ddBY cmi3M+57XP/8pv76dX8vdo2L89sH2yrid6ipJoKmA3e4Mz87DFHgiPtqTa9RTBc3 Jb/BwDrRSTTDmwzMKKFzs/DIFpaOfhjj3e4USKC23jJH+P5hw61ayT/SBfTbQ6Cb q+7zlA6rD2O6uQ== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedujedrvdeghedgtdduucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpefhvffufffkofgjfhgggfestdekre dtredttdenucfhrhhomhepkfguohcuufgthhhimhhmvghluceoihguohhstghhsehiugho shgthhdrohhrgheqnecuggftrfgrthhtvghrnhepudetieevffffveelkeeljeffkefhke ehgfdtffethfelvdejgffghefgveejkefhnecukfhppeekgedrvddvledrudehfedrgeeg necuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepihguoh hstghhsehiughoshgthhdrohhrgh X-ME-Proxy: Received: from shredder.lan (igld-84-229-153-44.inter.net.il [84.229.153.44]) by mail.messagingengine.com (Postfix) with ESMTPA id F0AF924005D; Fri, 8 Jan 2021 09:52:55 -0500 (EST) From: Ido Schimmel To: netdev@vger.kernel.org Cc: davem@davemloft.net, kuba@kernel.org, vadimp@nvidia.com, jiri@nvidia.com, mlxsw@nvidia.com, Ido Schimmel Subject: [PATCH net 2/2] mlxsw: core: Increase critical threshold for ASIC thermal zone Date: Fri, 8 Jan 2021 16:52:10 +0200 Message-Id: <20210108145210.1229820-3-idosch@idosch.org> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20210108145210.1229820-1-idosch@idosch.org> References: <20210108145210.1229820-1-idosch@idosch.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org From: Vadim Pasternak Increase critical threshold for ASIC thermal zone from 110C to 140C according to the system hardware requirements. All the supported ASICs (Spectrum-1, Spectrum-2, Spectrum-3) could be still operational with ASIC temperature below 140C. With the old critical threshold value system can perform unjustified shutdown. All the systems equipped with the above ASICs implement thermal protection mechanism at firmware level and firmware could decide to perform system thermal shutdown in case the temperature is below 140C. So with the new threshold system will not meltdown, while thermal operating range will be aligned with hardware abilities. Fixes: 41e760841d26 ("mlxsw: core: Replace thermal temperature trips with defines") Fixes: a50c1e35650b ("mlxsw: core: Implement thermal zone") Signed-off-by: Vadim Pasternak Reviewed-by: Jiri Pirko Signed-off-by: Ido Schimmel --- drivers/net/ethernet/mellanox/mlxsw/core_thermal.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c index 250a85049697..bf85ce9835d7 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c +++ b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c @@ -19,7 +19,7 @@ #define MLXSW_THERMAL_ASIC_TEMP_NORM 75000 /* 75C */ #define MLXSW_THERMAL_ASIC_TEMP_HIGH 85000 /* 85C */ #define MLXSW_THERMAL_ASIC_TEMP_HOT 105000 /* 105C */ -#define MLXSW_THERMAL_ASIC_TEMP_CRIT 110000 /* 110C */ +#define MLXSW_THERMAL_ASIC_TEMP_CRIT 140000 /* 140C */ #define MLXSW_THERMAL_HYSTERESIS_TEMP 5000 /* 5C */ #define MLXSW_THERMAL_MODULE_TEMP_SHIFT (MLXSW_THERMAL_HYSTERESIS_TEMP * 2) #define MLXSW_THERMAL_ZONE_MAX_NAME 16