From patchwork Thu Jun 16 08:45:00 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniel Bristot de Oliveira X-Patchwork-Id: 12883567 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A69EDC43334 for ; Thu, 16 Jun 2022 08:47:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1376276AbiFPIrs (ORCPT ); Thu, 16 Jun 2022 04:47:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35272 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1376322AbiFPIrB (ORCPT ); Thu, 16 Jun 2022 04:47:01 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 247B95DE76; Thu, 16 Jun 2022 01:46:41 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 7777361D98; Thu, 16 Jun 2022 08:46:40 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9AF2BC385A5; Thu, 16 Jun 2022 08:46:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1655369199; bh=rp8p1in6XSjjIlPwwGB/IGry0t0yANStwOH4/NCPCDo=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=pbySOll6TmqwMkFntyeSC8x46dH4x6gdNN6wbzr5oDqlfw0sULtu8nf/VLZxspWCr OzePsT8wvbtEETPG+FxaTFMs3oU2FRk4rgSVRjGsQ14avuq2iIwcpZ5J9y964DFn4u 4Xn7I047eE+5laGCB919zyLxMn6/z9zZsw4Pj6Tt+BqKDSY2MoSF2yjAiMidp2q8Wz onKY7WCg9iBdrxGUQKGfVMlVuJC+WJPUAyVb3Mvm2sJ2Zxmibs8S8ip7Re5U/nJTf8 JjRknpOwgUN2RzWvguDQnDaFzPDQ9VpaDk0lTPNRercXt0YkkNcWplnY52+1HJqrnd kHNc2jSQ8C9LA== From: Daniel Bristot de Oliveira To: Steven Rostedt Cc: Daniel Bristot de Oliveira , Wim Van Sebroeck , Guenter Roeck , Jonathan Corbet , Ingo Molnar , Thomas Gleixner , Peter Zijlstra , Will Deacon , Catalin Marinas , Marco Elver , Dmitry Vyukov , "Paul E. McKenney" , Shuah Khan , Gabriele Paoloni , Juri Lelli , Clark Williams , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-devel@vger.kernel.org Subject: [PATCH V4 18/20] rv/monitor: Add safe watchdog monitor Date: Thu, 16 Jun 2022 10:45:00 +0200 Message-Id: <6366fdc89d65b8d9b14ccd1e42fa0d793fbe9f73.1655368610.git.bristot@kernel.org> X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-devel@vger.kernel.org The watchdog is an essential building block for the usage of Linux in safety-critical systems because it allows the system to be monitored from an external element - the watchdog hardware, acting as a safety-monitor. A user-space application controls the watchdog device via the watchdog interface. This application, hereafter safety_app, enables the watchdog and periodically pets the watchdog upon correct completion of the safety related processing. If the safety_app, for any reason, stops pinging the watchdog, the watchdog hardware can set the system in a fail-safe state. For example, shutting the system down. Given the importance of the safety_app / watchdog hardware couple, the interaction between these software pieces also needs some sort of monitoring. In other words, "who monitors the monitor?" The safe watchdog (safe_wtd) RV monitor monitors the interaction between the safety_app and the watchdog device, enforcing the correct sequence of events that leads the system to a safe state. Furthermore, the safety_app can monitor the RV monitor by collecting the events generated by the RV monitor itself via tracing interface. In this way, closing the monitoring loop with the safety_app. To reach a safe state, the safe_wtd RV monitor requires the safety_app to: - Open the watchdog device - Start the watchdog - Set a timeout - ping at least once The RV monitor also avoids some undesired actions. For example, to have other threads to touch the watchdog. The monitor also has a set of options, enabled via kernel command line/module options. They are: - watchdog_id: the device id to monitor (default 0). - dont_stop: once enabled, do not allow the RV monitor to be stopped (default off); - safe_timeout: define a maximum safe value that an user-space application can set as the watchdog timeout (default unlimited). - check_timeout: After every ping, check if the time left in the watchdog is less than or equal to the last timeout set for the watchdog. It only works for watchdog devices that provide the get_timeleft() function (default off). For further information, please refer to: Documentation/trace/rv/watchdog-monitor.rst The monitor specification was developed together with Gabriele Paoloni, in the context of the Linux Foundation Elisa Project. Cc: Wim Van Sebroeck Cc: Guenter Roeck Cc: Jonathan Corbet Cc: Steven Rostedt Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: Will Deacon Cc: Catalin Marinas Cc: Marco Elver Cc: Dmitry Vyukov Cc: "Paul E. McKenney" Cc: Shuah Khan Cc: Gabriele Paoloni Cc: Juri Lelli Cc: Clark Williams Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira --- include/trace/events/rv.h | 11 + kernel/trace/rv/Kconfig | 10 + kernel/trace/rv/Makefile | 1 + kernel/trace/rv/monitors/safe_wtd/safe_wtd.c | 300 +++++++++++++++++++ kernel/trace/rv/monitors/safe_wtd/safe_wtd.h | 84 ++++++ 5 files changed, 406 insertions(+) create mode 100644 kernel/trace/rv/monitors/safe_wtd/safe_wtd.c create mode 100644 kernel/trace/rv/monitors/safe_wtd/safe_wtd.h diff --git a/include/trace/events/rv.h b/include/trace/events/rv.h index 00f11a8dac3b..895eb3435ed7 100644 --- a/include/trace/events/rv.h +++ b/include/trace/events/rv.h @@ -66,6 +66,17 @@ DEFINE_EVENT(error_da_monitor, error_wip, TP_PROTO(char *state, char *event), TP_ARGS(state, event)); #endif /* CONFIG_RV_MON_WIP */ + +#ifdef CONFIG_RV_MON_SAFE_WTD +DEFINE_EVENT(event_da_monitor, event_safe_wtd, + TP_PROTO(char *state, char *event, char *next_state, bool safe), + TP_ARGS(state, event, next_state, safe)); + +DEFINE_EVENT(error_da_monitor, error_safe_wtd, + TP_PROTO(char *state, char *event), + TP_ARGS(state, event)); +#endif /* CONFIG_RV_MON_SAFE_WTD */ + #endif /* CONFIG_DA_MON_EVENTS_IMPLICIT */ #ifdef CONFIG_DA_MON_EVENTS_ID diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig index 21f03fb3101a..b14ae63e792b 100644 --- a/kernel/trace/rv/Kconfig +++ b/kernel/trace/rv/Kconfig @@ -45,6 +45,16 @@ config RV_MON_WWNR illustrates the usage of per-task monitor. The model is broken on purpose: it serves to test reactors. +config RV_MON_SAFE_WTD + select DA_MON_EVENTS_IMPLICIT + bool "Safety watchdog" + help + Enable safe_wtd, this monitor observes the interaction + between a user-space safety monitor and a watchdog device. + + For futher information see: + Documentation/trace/rv/safety-monitor.rst + config RV_REACTORS bool "Runtime verification reactors" default y if RV diff --git a/kernel/trace/rv/Makefile b/kernel/trace/rv/Makefile index 963d14875b45..904db96c7eae 100644 --- a/kernel/trace/rv/Makefile +++ b/kernel/trace/rv/Makefile @@ -3,6 +3,7 @@ obj-$(CONFIG_RV) += rv.o obj-$(CONFIG_RV_MON_WIP) += monitors/wip/wip.o obj-$(CONFIG_RV_MON_WWNR) += monitors/wwnr/wwnr.o +obj-$(CONFIG_RV_MON_SAFE_WTD) += monitors/safe_wtd/safe_wtd.o obj-$(CONFIG_RV_REACTORS) += rv_reactors.o obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o obj-$(CONFIG_RV_REACT_PANIC) += reactor_panic.o diff --git a/kernel/trace/rv/monitors/safe_wtd/safe_wtd.c b/kernel/trace/rv/monitors/safe_wtd/safe_wtd.c new file mode 100644 index 000000000000..9856e0770d0d --- /dev/null +++ b/kernel/trace/rv/monitors/safe_wtd/safe_wtd.c @@ -0,0 +1,300 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#include +#include + +#define MODULE_NAME "safe_wtd" + +/* + * This is the self-generated part of the monitor. Generally, there is no need + * to touch this section. + */ +#include "safe_wtd.h" + +/* + * Declare the deterministic automata monitor. + * + * The rv monitor reference is needed for the monitor declaration. + */ +struct rv_monitor rv_safe_wtd; +DECLARE_DA_MON_GLOBAL(safe_wtd, char); + +/* + * custom: safe_timeout is the maximum value a watchdog monitor + * can set. This value is registered here to duplicate the information. + * In this way, a miss-behaving monitor can be detected. + */ +static int safe_timeout = ~0; +module_param(safe_timeout, int, 0444); + +/* + * custom: if check_timeout is set, the monitor will check if the time left + * in the watchdog is less than or equals to the last safe timeout set by + * user-space. This check is done after each ping. In this way, if any + * code by-passed the watchdog dev interface setting a higher (so unsafe) + * timeout, this monitor will catch the side effect and react. + */ +static int last_timeout_set = 0; +static int check_timeout = 0; +module_param(check_timeout, int, 0444); + +/* + * custom: if dont_stop is set the monitor will react if stopped. + */ +static int dont_stop = 0; +module_param(dont_stop, int, 0444); + +/* + * custom: there are some states that are kept after the watchdog is closed. + * For example, the nowayout state. + * + * Thus, the RV monitor needs to keep track of these states after a start/stop + * of the RV monitor itself, and should not reset after each restart - keeping the + * know state until the system shutdown. + * + * If for an unknown reason an RV monitor would like to reset the RV monitor at each + * RV monitor start, set it to one. + */ +static int reset_on_restart = 0; +module_param(reset_on_restart, int, 0444); + +/* + * open_pid takes note of the first thread that opened the watchdog. + * + * Any other thread that generates an event will cause an "other_threads" + * event in the monitor. + */ +static int open_pid = 0; + +/* + * watchdog_id: the watchdog to monitor + */ +static int watchdog_id = 0; +module_param(watchdog_id, int, 0444); + +static void handle_nowayout(void *data, struct watchdog_device *wdd) +{ + if (wdd->id != watchdog_id) + return; + + da_handle_init_run_event_safe_wtd(nowayout_safe_wtd); +} + +static void handle_close(void *data, struct watchdog_device *wdd) +{ + if (wdd->id != watchdog_id) + return; + + if (open_pid && current->pid != open_pid) { + da_handle_init_run_event_safe_wtd(other_threads_safe_wtd); + } else { + da_handle_event_safe_wtd(close_safe_wtd); + open_pid = 0; + } +} + +static void handle_open(void *data, struct watchdog_device *wdd) +{ + if (wdd->id != watchdog_id) + return; + + if (open_pid && current->pid != open_pid) { + da_handle_init_run_event_safe_wtd(other_threads_safe_wtd); + } else { + da_handle_init_run_event_safe_wtd(open_safe_wtd); + open_pid = current->pid; + } +} + +static void blocked_events(void *data, struct watchdog_device *wdd) +{ + if (wdd->id != watchdog_id) + return; + + if (open_pid && current->pid != open_pid) { + da_handle_init_run_event_safe_wtd(other_threads_safe_wtd); + return; + } + da_handle_event_safe_wtd(other_threads_safe_wtd); +} + +static void blocked_events_timeout(void *data, struct watchdog_device *wdd, u64 timeout) +{ + blocked_events(data, wdd); +} + +static void handle_ping(void *data, struct watchdog_device *wdd) +{ + char msg[128]; + unsigned int timeout; + + if (wdd->id != watchdog_id) + return; + + if (open_pid && current->pid != open_pid) { + da_handle_init_run_event_safe_wtd(other_threads_safe_wtd); + return; + } + + da_handle_event_safe_wtd(ping_safe_wtd); + + if (!check_timeout) + return; + + if (wdd->ops->get_timeleft) { + timeout = wdd->ops->get_timeleft(wdd); + if (timeout > last_timeout_set) { + snprintf(msg, 128, + "watchdog timeout is %u > than previously set (%d)\n", + timeout, last_timeout_set); + cond_react(msg); + } + } else { + snprintf(msg, 128, "error getting timeout: option not supported\n"); + cond_react(msg); + } +} + +static void handle_set_safe_timeout(void *data, struct watchdog_device *wdd, u64 timeout) +{ + char msg[128]; + + if (wdd->id != watchdog_id) + return; + + if (open_pid && current->pid != open_pid) { + da_handle_init_run_event_safe_wtd(other_threads_safe_wtd); + return; + } + + da_handle_event_safe_wtd(set_safe_timeout_safe_wtd); + + if (timeout > safe_timeout) { + snprintf(msg, 128, "set safety timeout is too high: %d", (int) timeout); + cond_react(msg); + } + + if (check_timeout) + last_timeout_set = timeout; +} + +static void handle_start(void *data, struct watchdog_device *wdd) +{ + if (wdd->id != watchdog_id) + return; + + if (open_pid && current->pid != open_pid) { + da_handle_init_run_event_safe_wtd(other_threads_safe_wtd); + return; + } + + da_handle_event_safe_wtd(start_safe_wtd); +} + +static void handle_stop(void *data, struct watchdog_device *wdd) +{ + if (wdd->id != watchdog_id) + return; + + if (open_pid && current->pid != open_pid) { + da_handle_init_run_event_safe_wtd(other_threads_safe_wtd); + return; + } + + da_handle_event_safe_wtd(stop_safe_wtd); +} + +static int mon_started = 0; + +static int start_safe_wtd_monitor(void) +{ + int retval; + + if (!mon_started || reset_on_restart) { + retval = da_monitor_init_safe_wtd(); + if (retval) + return retval; + + mon_started = 1; + } + + rv_attach_trace_probe("safe_wtd", watchdog_close, handle_close); + rv_attach_trace_probe("safe_wtd", watchdog_nowayout, handle_nowayout); + rv_attach_trace_probe("safe_wtd", watchdog_open, handle_open); + rv_attach_trace_probe("safe_wtd", watchdog_ping, handle_ping); + rv_attach_trace_probe("safe_wtd", watchdog_set_timeout, handle_set_safe_timeout); + rv_attach_trace_probe("safe_wtd", watchdog_start, handle_start); + rv_attach_trace_probe("safe_wtd", watchdog_stop, handle_stop); + rv_attach_trace_probe("safe_wtd", watchdog_set_keep_alive, blocked_events_timeout); + rv_attach_trace_probe("safe_wtd", watchdog_keep_alive, blocked_events); + rv_attach_trace_probe("safe_wtd", watchdog_set_pretimeout, blocked_events_timeout); + rv_attach_trace_probe("safe_wtd", watchdog_pretimeout, blocked_events); + + return 0; +} + +static void stop_safe_wtd_monitor(void) +{ + if (dont_stop) + cond_react("dont_stop safe_wtd is set."); + + rv_safe_wtd.enabled = 0; + + rv_detach_trace_probe("safe_wtd", watchdog_close, handle_close); + rv_detach_trace_probe("safe_wtd", watchdog_nowayout, handle_nowayout); + rv_detach_trace_probe("safe_wtd", watchdog_open, handle_open); + rv_detach_trace_probe("safe_wtd", watchdog_ping, handle_ping); + rv_detach_trace_probe("safe_wtd", watchdog_set_timeout, handle_set_safe_timeout); + rv_detach_trace_probe("safe_wtd", watchdog_start, handle_start); + rv_detach_trace_probe("safe_wtd", watchdog_stop, handle_stop); + rv_detach_trace_probe("safe_wtd", watchdog_set_keep_alive, blocked_events_timeout); + rv_detach_trace_probe("safe_wtd", watchdog_keep_alive, blocked_events); + rv_detach_trace_probe("safe_wtd", watchdog_set_pretimeout, blocked_events_timeout); + rv_detach_trace_probe("safe_wtd", watchdog_pretimeout, blocked_events); + + da_monitor_destroy_safe_wtd(); +} + +/* + * This is the monitor register section. + */ +struct rv_monitor rv_safe_wtd = { + .name = "safe_wtd", + .description = "A watchdog monitor guarding a safety monitor actions", + .start = start_safe_wtd_monitor, + .stop = stop_safe_wtd_monitor, + .reset = da_monitor_reset_all_safe_wtd, + .enabled = 0, +}; + +int register_safe_wtd(void) +{ + rv_register_monitor(&rv_safe_wtd); + return 0; +} + +void unregister_safe_wtd(void) +{ + if (rv_safe_wtd.enabled) + stop_safe_wtd_monitor(); + + rv_unregister_monitor(&rv_safe_wtd); +} + +module_init(register_safe_wtd); +module_exit(unregister_safe_wtd); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Daniel Bristot de Oliveira "); +MODULE_DESCRIPTION("Safe watchdog RV monitor"); diff --git a/kernel/trace/rv/monitors/safe_wtd/safe_wtd.h b/kernel/trace/rv/monitors/safe_wtd/safe_wtd.h new file mode 100644 index 000000000000..835c9d0979f6 --- /dev/null +++ b/kernel/trace/rv/monitors/safe_wtd/safe_wtd.h @@ -0,0 +1,84 @@ +enum states_safe_wtd { + init_safe_wtd = 0, + closed_running_safe_wtd, + closed_running_nwo_safe_wtd, + nwo_safe_wtd, + opened_safe_wtd, + opened_nwo_safe_wtd, + reopened_safe_wtd, + safe_safe_wtd, + safe_nwo_safe_wtd, + set_safe_wtd, + set_nwo_safe_wtd, + started_safe_wtd, + started_nwo_safe_wtd, + stoped_safe_wtd, + state_max_safe_wtd +}; + +enum events_safe_wtd { + close_safe_wtd = 0, + nowayout_safe_wtd, + open_safe_wtd, + other_threads_safe_wtd, + ping_safe_wtd, + set_safe_timeout_safe_wtd, + start_safe_wtd, + stop_safe_wtd, + event_max_safe_wtd +}; + +struct automaton_safe_wtd { + char *state_names[state_max_safe_wtd]; + char *event_names[event_max_safe_wtd]; + char function[state_max_safe_wtd][event_max_safe_wtd]; + char initial_state; + char final_states[state_max_safe_wtd]; +}; + +struct automaton_safe_wtd automaton_safe_wtd = { + .state_names = { + "init", + "closed_running", + "closed_running_nwo", + "nwo", + "opened", + "opened_nwo", + "reopened", + "safe", + "safe_nwo", + "set", + "set_nwo", + "started", + "started_nwo", + "stoped" + }, + .event_names = { + "close", + "nowayout", + "open", + "other_threads", + "ping", + "set_safe_timeout", + "start", + "stop" + }, + .function = { + { -1, nwo_safe_wtd, opened_safe_wtd, init_safe_wtd, -1, -1, -1, -1 }, + { -1, closed_running_nwo_safe_wtd, reopened_safe_wtd, closed_running_safe_wtd, -1, -1, -1, -1 }, + { -1, closed_running_nwo_safe_wtd, started_nwo_safe_wtd, closed_running_nwo_safe_wtd, -1, -1, -1, -1 }, + { -1, nwo_safe_wtd, opened_nwo_safe_wtd, nwo_safe_wtd, -1, -1, -1, -1 }, + { init_safe_wtd, -1, -1, -1, -1, -1, started_safe_wtd, -1 }, + { nwo_safe_wtd, -1, -1, -1, -1, -1, started_nwo_safe_wtd, -1 }, + { closed_running_safe_wtd, -1, -1, -1, -1, set_safe_wtd, -1, opened_safe_wtd }, + { closed_running_safe_wtd, -1, -1, -1, safe_safe_wtd, -1, -1, stoped_safe_wtd }, + { closed_running_nwo_safe_wtd, -1, -1, -1, safe_nwo_safe_wtd, -1, -1, -1 }, + { -1, -1, -1, -1, safe_safe_wtd, -1, -1, -1 }, + { -1, -1, -1, -1, safe_nwo_safe_wtd, -1, -1, -1 }, + { closed_running_safe_wtd, -1, -1, -1, -1, set_safe_wtd, -1, stoped_safe_wtd }, + { closed_running_nwo_safe_wtd, -1, -1, -1, -1, set_nwo_safe_wtd, -1, -1 }, + { init_safe_wtd, -1, -1, -1, -1, -1, -1, -1 }, + }, + .initial_state = init_safe_wtd, + .final_states = { 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, +};