From patchwork Thu Feb 4 21:31:15 2021
X-Patchwork-Submitter: Wei Wang
X-Patchwork-Id: 12068825
X-Patchwork-Delegate: kuba@kernel.org
Date: Thu, 4 Feb 2021 13:31:15 -0800
Message-Id: <20210204213117.1736289-2-weiwan@google.com>
In-Reply-To: <20210204213117.1736289-1-weiwan@google.com>
References: <20210204213117.1736289-1-weiwan@google.com>
Subject: [PATCH net-next v10 1/3] net: extract napi poll functionality to __napi_poll()
From: Wei Wang
To: David Miller, netdev@vger.kernel.org, Jakub Kicinski
Cc: Paolo Abeni, Hannes Frederic Sowa, Eric Dumazet, Felix Fietkau, Alexander Duyck
X-Mailing-List: netdev@vger.kernel.org

From: Felix Fietkau

This commit
introduces a new function, __napi_poll(), which carries the main logic of
the existing napi_poll() function and will be called by other functions
in later commits. The idea and implementation are by Felix Fietkau,
proposed as part of the patch moving napi work to workqueue context. By
itself, this commit is a pure code restructure.

Signed-off-by: Felix Fietkau
Signed-off-by: Wei Wang
Reviewed-by: Alexander Duyck
---
 net/core/dev.c | 36 ++++++++++++++++++++++++------------
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index aae116d059da..0fd40b9847c3 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6781,15 +6781,10 @@ void __netif_napi_del(struct napi_struct *napi)
 }
 EXPORT_SYMBOL(__netif_napi_del);
 
-static int napi_poll(struct napi_struct *n, struct list_head *repoll)
+static int __napi_poll(struct napi_struct *n, bool *repoll)
 {
-	void *have;
 	int work, weight;
 
-	list_del_init(&n->poll_list);
-
-	have = netpoll_poll_lock(n);
-
 	weight = n->weight;
 
 	/* This NAPI_STATE_SCHED test is for avoiding a race
@@ -6809,7 +6804,7 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 			    n->poll, work, weight);
 
 	if (likely(work < weight))
-		goto out_unlock;
+		return work;
 
 	/* Drivers must not modify the NAPI state if they
 	 * consume the entire weight.  In such cases this code
@@ -6818,7 +6813,7 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 	 */
 	if (unlikely(napi_disable_pending(n))) {
 		napi_complete(n);
-		goto out_unlock;
+		return work;
 	}
 
 	/* The NAPI context has more processing work, but busy-polling
@@ -6831,7 +6826,7 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 			 */
 			napi_schedule(n);
 		}
-		goto out_unlock;
+		return work;
 	}
 
 	if (n->gro_bitmask) {
@@ -6849,12 +6844,29 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 	if (unlikely(!list_empty(&n->poll_list))) {
 		pr_warn_once("%s: Budget exhausted after napi rescheduled\n",
 			     n->dev ? n->dev->name : "backlog");
-		goto out_unlock;
+		return work;
 	}
 
-	list_add_tail(&n->poll_list, repoll);
+	*repoll = true;
+
+	return work;
+}
+
+static int napi_poll(struct napi_struct *n, struct list_head *repoll)
+{
+	bool do_repoll = false;
+	void *have;
+	int work;
+
+	list_del_init(&n->poll_list);
+
+	have = netpoll_poll_lock(n);
+
+	work = __napi_poll(n, &do_repoll);
+
+	if (do_repoll)
+		list_add_tail(&n->poll_list, repoll);
 
-out_unlock:
 	netpoll_poll_unlock(have);
 
 	return work;

From patchwork Thu Feb 4 21:31:16 2021
X-Patchwork-Submitter: Wei Wang
X-Patchwork-Id: 12068827
X-Patchwork-Delegate: kuba@kernel.org
Date: Thu, 4 Feb 2021 13:31:16 -0800
Message-Id: <20210204213117.1736289-3-weiwan@google.com>
In-Reply-To: <20210204213117.1736289-1-weiwan@google.com>
References: <20210204213117.1736289-1-weiwan@google.com>
Subject: [PATCH net-next v10 2/3] net: implement threaded-able napi poll loop support
From: Wei Wang
To: David Miller, netdev@vger.kernel.org, Jakub Kicinski
Cc: Paolo Abeni, Hannes Frederic Sowa, Eric Dumazet, Felix Fietkau, Alexander Duyck

This patch allows running each napi poll loop inside its own kernel
thread. The kthread is created during netif_napi_add() if dev->threaded
is set, and threaded mode is enabled in napi_enable(). A following patch
provides a way to set dev->threaded and enable threaded mode without a
device up/down.

Once threaded mode is enabled and the kthread is started,
napi_schedule() wakes up that thread instead of scheduling the softirq.

The threaded poll loop behaves much like net_rx_action(), but it does
not have to manipulate local irqs and uses an explicit scheduling point
based on netdev_budget.
Co-developed-by: Paolo Abeni
Signed-off-by: Paolo Abeni
Co-developed-by: Hannes Frederic Sowa
Signed-off-by: Hannes Frederic Sowa
Co-developed-by: Jakub Kicinski
Signed-off-by: Jakub Kicinski
Signed-off-by: Wei Wang
Reviewed-by: Alexander Duyck
---
 include/linux/netdevice.h |  21 +++-
 net/core/dev.c            | 112 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 119 insertions(+), 14 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e9e7ada07ea1..99fb4ec9573e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -347,6 +347,7 @@ struct napi_struct {
 	struct list_head	dev_list;
 	struct hlist_node	napi_hash_node;
 	unsigned int		napi_id;
+	struct task_struct	*thread;
 };
 
 enum {
@@ -358,6 +359,7 @@ enum {
 	NAPI_STATE_NO_BUSY_POLL,	/* Do not add in napi_hash, no busy polling */
 	NAPI_STATE_IN_BUSY_POLL,	/* sk_busy_loop() owns this NAPI */
 	NAPI_STATE_PREFER_BUSY_POLL,	/* prefer busy-polling over softirq processing*/
+	NAPI_STATE_THREADED,		/* The poll is performed inside its own thread*/
 };
 
 enum {
@@ -369,6 +371,7 @@ enum {
 	NAPIF_STATE_NO_BUSY_POLL	= BIT(NAPI_STATE_NO_BUSY_POLL),
 	NAPIF_STATE_IN_BUSY_POLL	= BIT(NAPI_STATE_IN_BUSY_POLL),
 	NAPIF_STATE_PREFER_BUSY_POLL	= BIT(NAPI_STATE_PREFER_BUSY_POLL),
+	NAPIF_STATE_THREADED		= BIT(NAPI_STATE_THREADED),
 };
 
 enum gro_result {
@@ -503,20 +506,7 @@ static inline bool napi_complete(struct napi_struct *n)
  */
 void napi_disable(struct napi_struct *n);
 
-/**
- *	napi_enable - enable NAPI scheduling
- *	@n: NAPI context
- *
- * Resume NAPI from being scheduled on this context.
- * Must be paired with napi_disable.
- */
-static inline void napi_enable(struct napi_struct *n)
-{
-	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
-	smp_mb__before_atomic();
-	clear_bit(NAPI_STATE_SCHED, &n->state);
-	clear_bit(NAPI_STATE_NPSVC, &n->state);
-}
+void napi_enable(struct napi_struct *n);
 
 /**
  *	napi_synchronize - wait until NAPI is not running
@@ -1827,6 +1817,8 @@ enum netdev_priv_flags {
  *
  *	@wol_enabled:	Wake-on-LAN is enabled
  *
+ *	@threaded:	napi threaded mode is enabled
+ *
  *	@net_notifier_list:	List of per-net netdev notifier block
  *				that follow this device when it is moved
  *				to another network namespace.
@@ -2145,6 +2137,7 @@ struct net_device {
 	struct lock_class_key	*qdisc_running_key;
 	bool			proto_down;
 	unsigned		wol_enabled:1;
+	unsigned		threaded:1;
 
 	struct list_head	net_notifier_list;
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 0fd40b9847c3..a8c5eca17074 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -91,6 +91,7 @@
 #include <...>
 #include <...>
 #include <...>
+#include <linux/kthread.h>
 #include <...>
 #include <...>
 #include <...>
@@ -1493,6 +1494,27 @@ void netdev_notify_peers(struct net_device *dev)
 }
 EXPORT_SYMBOL(netdev_notify_peers);
 
+static int napi_threaded_poll(void *data);
+
+static int napi_kthread_create(struct napi_struct *n)
+{
+	int err = 0;
+
+	/* Create and wake up the kthread once to put it in
+	 * TASK_INTERRUPTIBLE mode to avoid the blocked task
+	 * warning and work with loadavg.
+	 */
+	n->thread = kthread_run(napi_threaded_poll, n, "napi/%s-%d",
+				n->dev->name, n->napi_id);
+	if (IS_ERR(n->thread)) {
+		err = PTR_ERR(n->thread);
+		pr_err("kthread_run failed with err %d\n", err);
+		n->thread = NULL;
+	}
+
+	return err;
+}
+
 static int __dev_open(struct net_device *dev, struct netlink_ext_ack *extack)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
@@ -4264,6 +4286,21 @@ int gro_normal_batch __read_mostly = 8;
 static inline void ____napi_schedule(struct softnet_data *sd,
 				     struct napi_struct *napi)
 {
+	struct task_struct *thread;
+
+	if (test_bit(NAPI_STATE_THREADED, &napi->state)) {
+		/* Paired with smp_mb__before_atomic() in
+		 * napi_enable(). Use READ_ONCE() to guarantee
+		 * a complete read on napi->thread. Only call
+		 * wake_up_process() when it's not NULL.
+		 */
+		thread = READ_ONCE(napi->thread);
+		if (thread) {
+			wake_up_process(thread);
+			return;
+		}
+	}
+
 	list_add_tail(&napi->poll_list, &sd->poll_list);
 	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
 }
@@ -6733,6 +6770,12 @@ void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 	set_bit(NAPI_STATE_NPSVC, &napi->state);
 	list_add_rcu(&napi->dev_list, &dev->napi_list);
 	napi_hash_add(napi);
+	/* Create kthread for this napi if dev->threaded is set.
+	 * Clear dev->threaded if kthread creation failed so that
+	 * threaded mode will not be enabled in napi_enable().
+	 */
+	if (dev->threaded && napi_kthread_create(napi))
+		dev->threaded = 0;
 }
 EXPORT_SYMBOL(netif_napi_add);
 
@@ -6750,9 +6793,28 @@ void napi_disable(struct napi_struct *n)
 
 	clear_bit(NAPI_STATE_PREFER_BUSY_POLL, &n->state);
 	clear_bit(NAPI_STATE_DISABLE, &n->state);
+	clear_bit(NAPI_STATE_THREADED, &n->state);
 }
 EXPORT_SYMBOL(napi_disable);
 
+/**
+ *	napi_enable - enable NAPI scheduling
+ *	@n: NAPI context
+ *
+ * Resume NAPI from being scheduled on this context.
+ * Must be paired with napi_disable.
+ */
+void napi_enable(struct napi_struct *n)
+{
+	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
+	smp_mb__before_atomic();
+	clear_bit(NAPI_STATE_SCHED, &n->state);
+	clear_bit(NAPI_STATE_NPSVC, &n->state);
+	if (n->dev->threaded && n->thread)
+		set_bit(NAPI_STATE_THREADED, &n->state);
+}
+EXPORT_SYMBOL(napi_enable);
+
 static void flush_gro_hash(struct napi_struct *napi)
 {
 	int i;
@@ -6778,6 +6840,11 @@ void __netif_napi_del(struct napi_struct *napi)
 
 	flush_gro_hash(napi);
 	napi->gro_bitmask = 0;
+
+	if (napi->thread) {
+		kthread_stop(napi->thread);
+		napi->thread = NULL;
+	}
 }
 EXPORT_SYMBOL(__netif_napi_del);
 
@@ -6872,6 +6939,51 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 	return work;
 }
 
+static int napi_thread_wait(struct napi_struct *napi)
+{
+	set_current_state(TASK_INTERRUPTIBLE);
+
+	while (!kthread_should_stop() && !napi_disable_pending(napi)) {
+		if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
+			WARN_ON(!list_empty(&napi->poll_list));
+			__set_current_state(TASK_RUNNING);
+			return 0;
+		}
+
+		schedule();
+		set_current_state(TASK_INTERRUPTIBLE);
+	}
+	__set_current_state(TASK_RUNNING);
+	return -1;
+}
+
+static int napi_threaded_poll(void *data)
+{
+	struct napi_struct *napi = data;
+	void *have;
+
+	while (!napi_thread_wait(napi)) {
+		for (;;) {
+			bool repoll = false;
+
+			local_bh_disable();
+
+			have = netpoll_poll_lock(napi);
+			__napi_poll(napi, &repoll);
+			netpoll_poll_unlock(have);
+
+			__kfree_skb_flush();
+			local_bh_enable();
+
+			if (!repoll)
+				break;
+
+			cond_resched();
+		}
+	}
+	return 0;
+}
+
 static __latent_entropy void net_rx_action(struct softirq_action *h)
 {
 	struct softnet_data *sd = this_cpu_ptr(&softnet_data);

From patchwork Thu Feb 4 21:31:17 2021
X-Patchwork-Submitter: Wei Wang
X-Patchwork-Id: 12068829
X-Patchwork-Delegate: kuba@kernel.org
Date: Thu, 4 Feb 2021 13:31:17 -0800
Message-Id: <20210204213117.1736289-4-weiwan@google.com>
In-Reply-To: <20210204213117.1736289-1-weiwan@google.com>
References: <20210204213117.1736289-1-weiwan@google.com>
Subject: [PATCH net-next v10 3/3] net: add sysfs attribute to control napi threaded mode
From: Wei Wang
To: David Miller, netdev@vger.kernel.org, Jakub Kicinski
Cc: Paolo Abeni, Hannes Frederic Sowa, Eric Dumazet, Felix Fietkau, Alexander Duyck

This patch adds a new sysfs attribute to the network device class. The
attribute provides a per-device control to enable/disable the threaded
mode for all the napi instances of the given network device, without
the need for a device up/down. The user sets it to 1 or 0 to enable or
disable threaded mode.
Note: when switching between threaded and the current softirq based mode
for a napi instance, the switch will not immediately take effect if the
napi is currently being polled. The mode switch happens the next time
napi_schedule() is called.

Co-developed-by: Paolo Abeni
Signed-off-by: Paolo Abeni
Co-developed-by: Hannes Frederic Sowa
Signed-off-by: Hannes Frederic Sowa
Co-developed-by: Felix Fietkau
Signed-off-by: Felix Fietkau
Signed-off-by: Wei Wang
---
 Documentation/ABI/testing/sysfs-class-net | 15 +++++
 include/linux/netdevice.h                 |  2 +
 net/core/dev.c                            | 67 ++++++++++++++++++++++-
 net/core/net-sysfs.c                      | 45 +++++++++++++++
 4 files changed, 127 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-class-net b/Documentation/ABI/testing/sysfs-class-net
index 1f2002df5ba2..1419103d11f9 100644
--- a/Documentation/ABI/testing/sysfs-class-net
+++ b/Documentation/ABI/testing/sysfs-class-net
@@ -337,3 +337,18 @@ Contact:	netdev@vger.kernel.org
 Description:
 		32-bit unsigned integer counting the number of times the link has
 		been down
+
+What:		/sys/class/net/<iface>/threaded
+Date:		Jan 2021
+KernelVersion:	5.12
+Contact:	netdev@vger.kernel.org
+Description:
+		Boolean value to control the threaded mode per device. User can
+		set this value to enable/disable threaded mode for all napi
+		belonging to this device, without the need to do device up/down.
+
+		Possible values:
+		== ==================================
+		0  threaded mode disabled for this dev
+		1  threaded mode enabled for this dev
+		== ==================================

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 99fb4ec9573e..1340327f7abf 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -497,6 +497,8 @@ static inline bool napi_complete(struct napi_struct *n)
 	return napi_complete_done(n, 0);
 }
 
+int dev_set_threaded(struct net_device *dev, bool threaded);
+
 /**
  *	napi_disable - prevent NAPI from scheduling
  *	@n: NAPI context
diff --git a/net/core/dev.c b/net/core/dev.c
index a8c5eca17074..9cc9b245419e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4290,8 +4290,9 @@ static inline void ____napi_schedule(struct softnet_data *sd,
 
 	if (test_bit(NAPI_STATE_THREADED, &napi->state)) {
 		/* Paired with smp_mb__before_atomic() in
-		 * napi_enable(). Use READ_ONCE() to guarantee
-		 * a complete read on napi->thread. Only call
+		 * napi_enable()/napi_set_threaded().
+		 * Use READ_ONCE() to guarantee a complete
+		 * read on napi->thread. Only call
 		 * wake_up_process() when it's not NULL.
 		 */
 		thread = READ_ONCE(napi->thread);
@@ -6743,6 +6744,68 @@ static void init_gro_hash(struct napi_struct *napi)
 	napi->gro_bitmask = 0;
 }
 
+/* Setting/unsetting threaded mode on a napi might not immediately
+ * take effect, if the current napi instance is actively being
+ * polled. In this case, the switch between threaded mode and
+ * softirq mode will happen in the next round of napi_schedule().
+ * This should not cause hiccups/stalls to the live traffic.
+ */
+static int napi_set_threaded(struct napi_struct *n, bool threaded)
+{
+	int err = 0;
+
+	if (threaded == !!test_bit(NAPI_STATE_THREADED, &n->state))
+		return 0;
+
+	if (!threaded) {
+		clear_bit(NAPI_STATE_THREADED, &n->state);
+		return 0;
+	}
+
+	if (!n->thread) {
+		err = napi_kthread_create(n);
+		if (err)
+			return err;
+	}
+
+	/* Make sure kthread is created before THREADED bit
+	 * is set.
+	 */
+	smp_mb__before_atomic();
+	set_bit(NAPI_STATE_THREADED, &n->state);
+
+	return 0;
+}
+
+static void dev_disable_threaded_all(struct net_device *dev)
+{
+	struct napi_struct *napi;
+
+	list_for_each_entry(napi, &dev->napi_list, dev_list)
+		napi_set_threaded(napi, false);
+	dev->threaded = 0;
+}
+
+int dev_set_threaded(struct net_device *dev, bool threaded)
+{
+	struct napi_struct *napi;
+	int ret = 0;
+
+	dev->threaded = threaded;
+	list_for_each_entry(napi, &dev->napi_list, dev_list) {
+		ret = napi_set_threaded(napi, threaded);
+		if (ret) {
+			/* Error occurred on one of the napi,
+			 * reset threaded mode on all napi.
+			 */
+			dev_disable_threaded_all(dev);
+			break;
+		}
+	}
+
+	return ret;
+}
+
 void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 		    int (*poll)(struct napi_struct *, int), int weight)
 {
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index daf502c13d6d..969743567257 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -538,6 +538,50 @@ static ssize_t phys_switch_id_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(phys_switch_id);
 
+static ssize_t threaded_show(struct device *dev,
+			     struct device_attribute *attr, char *buf)
+{
+	struct net_device *netdev = to_net_dev(dev);
+	int ret;
+
+	if (!rtnl_trylock())
+		return restart_syscall();
+
+	if (!dev_isalive(netdev)) {
+		ret = -EINVAL;
+		goto unlock;
+	}
+
+	ret = sprintf(buf, fmt_dec, netdev->threaded);
+
+unlock:
+	rtnl_unlock();
+	return ret;
+}
+
+static int modify_napi_threaded(struct net_device *dev, unsigned long val)
+{
+	int ret;
+
+	if (list_empty(&dev->napi_list))
+		return -EOPNOTSUPP;
+
+	if (val != 0 && val != 1)
+		return -EOPNOTSUPP;
+
+	ret = dev_set_threaded(dev, val);
+
+	return ret;
+}
+
+static ssize_t threaded_store(struct device *dev,
+			      struct device_attribute *attr,
+			      const char *buf, size_t len)
+{
+	return netdev_store(dev, attr, buf, len, modify_napi_threaded);
+}
+static DEVICE_ATTR_RW(threaded);
+
 static struct attribute *net_class_attrs[] __ro_after_init = {
 	&dev_attr_netdev_group.attr,
 	&dev_attr_type.attr,
@@ -570,6 +614,7 @@ static struct attribute *net_class_attrs[] __ro_after_init = {
 	&dev_attr_proto_down.attr,
 	&dev_attr_carrier_up_count.attr,
 	&dev_attr_carrier_down_count.attr,
+	&dev_attr_threaded.attr,
 	NULL,
 };
 ATTRIBUTE_GROUPS(net_class);