From patchwork Fri Apr 23 11:05:16 2021
Subject: [PATCH RFC bpf-next 1/4] rcu: Create an unrcu_pointer() to remove __rcu from a pointer
From: Toke Høiland-Jørgensen
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, Martin KaFai Lau, Hangbin Liu, Jesper Dangaard Brouer, Magnus Karlsson, "Paul E. McKenney"
Date: Fri, 23 Apr 2021 13:05:16 +0200
Message-ID: <161917591669.102337.2073449001446955820.stgit@toke.dk>
In-Reply-To: <161917591559.102337.3558507780042453425.stgit@toke.dk>
References: <161917591559.102337.3558507780042453425.stgit@toke.dk>

From: Paul E. McKenney

The xchg() and cmpxchg() functions are sometimes used to carry out RCU
updates. Unfortunately, this can result in sparse warnings for both the
old-value and new-value arguments, as well as for the return value. The
arguments can be dealt with using RCU_INITIALIZER():

	old_p = xchg(&p, RCU_INITIALIZER(new_p));

But a sparse warning still remains due to assigning the __rcu pointer
returned from xchg to the (most likely) non-__rcu pointer old_p.

This commit therefore provides an unrcu_pointer() macro that strips the
__rcu. This macro can be used as follows:

	old_p = unrcu_pointer(xchg(&p, RCU_INITIALIZER(new_p)));

Reported-by: Toke Høiland-Jørgensen
Signed-off-by: Paul E. McKenney
Signed-off-by: Toke Høiland-Jørgensen
---
 include/linux/rcupdate.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index bd04f722714f..49f368c5d4ec 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -362,6 +362,20 @@ static inline void rcu_preempt_sleep_check(void) { }
 #define rcu_check_sparse(p, space)
 #endif /* #else #ifdef __CHECKER__ */
 
+/**
+ * unrcu_pointer - mark a pointer as not being RCU protected
+ * @p: pointer needing to lose its __rcu property
+ *
+ * Converts @p from an __rcu pointer to a __kernel pointer.
+ * This allows an __rcu pointer to be used with xchg() and friends.
+ */
+#define unrcu_pointer(p)						\
+({									\
+	typeof(*p) *_________p1 = (typeof(*p) *__force)(p);		\
+	rcu_check_sparse(p, __rcu);					\
+	((typeof(*p) __force __kernel *)(_________p1));			\
+})
+
 #define __rcu_access_pointer(p, space) \
 ({ \
 	typeof(*p) *_________p1 = (typeof(*p) *__force)READ_ONCE(p); \
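To illustrate the intended use, a minimal sketch of a sparse-clean RCU
pointer swap built on this macro could look as follows (the struct,
variable, and function names here are hypothetical, not part of the
patch):

	struct foo { struct rcu_head rcu; int val; };
	static struct foo __rcu *gp;	/* RCU-protected global pointer */

	static struct foo *swap_foo(struct foo *new_p)
	{
		/* RCU_INITIALIZER() adds __rcu to the new value going in;
		 * unrcu_pointer() strips __rcu from the old value coming
		 * back out, so neither side triggers a sparse warning.
		 */
		return unrcu_pointer(xchg(&gp, RCU_INITIALIZER(new_p)));
	}

The caller remains responsible for freeing the returned old pointer only
after a grace period, e.g. via kfree_rcu().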
From patchwork Fri Apr 23 11:05:17 2021
Subject: [PATCH RFC bpf-next 2/4] dev: add rcu_read_lock_bh_held() as a valid check when getting a RCU dev ref
From: Toke Høiland-Jørgensen
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, Martin KaFai Lau, Hangbin Liu, Jesper Dangaard Brouer, Magnus Karlsson, "Paul E. McKenney"
Date: Fri, 23 Apr 2021 13:05:17 +0200
Message-ID: <161917591778.102337.1084869564077259501.stgit@toke.dk>
In-Reply-To: <161917591559.102337.3558507780042453425.stgit@toke.dk>
References: <161917591559.102337.3558507780042453425.stgit@toke.dk>

From: Toke Høiland-Jørgensen

Some of the XDP helpers (in particular, xdp_do_redirect()) will get a
struct net_device reference using dev_get_by_index_rcu(). These are
called from a NAPI poll context, which means the RCU reference liveness
is ensured by local_bh_disable().

Add rcu_read_lock_bh_held() as a condition to the RCU list traversal in
dev_get_by_index_rcu(), so lockdep understands that the dereferences are
safe under *both* rcu_read_lock() *and* local_bh_disable().

Signed-off-by: Toke Høiland-Jørgensen
---
 net/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index b4c67a5be606..a7b8e3289f7c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1002,7 +1002,7 @@ struct net_device *dev_get_by_index_rcu(struct net *net, int ifindex)
 	struct net_device *dev;
 	struct hlist_head *head = dev_index_hash(net, ifindex);
 
-	hlist_for_each_entry_rcu(dev, head, index_hlist)
+	hlist_for_each_entry_rcu(dev, head, index_hlist, rcu_read_lock_bh_held())
 		if (dev->ifindex == ifindex)
 			return dev;
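As an illustrative sketch (hypothetical callers, not part of the patch),
both of the following protection schemes now satisfy lockdep when calling
dev_get_by_index_rcu():

	static int iflink_classic(struct net *net, int ifindex)
	{
		struct net_device *dev;
		int iflink = -1;

		rcu_read_lock();		/* classic RCU read side */
		dev = dev_get_by_index_rcu(net, ifindex);
		if (dev)
			iflink = dev_get_iflink(dev);	/* use dev inside the section */
		rcu_read_unlock();
		return iflink;
	}

	static int iflink_bh(struct net *net, int ifindex)
	{
		struct net_device *dev;
		int iflink = -1;

		local_bh_disable();		/* as in a NAPI poll / XDP context */
		dev = dev_get_by_index_rcu(net, ifindex);
		if (dev)
			iflink = dev_get_iflink(dev);
		local_bh_enable();
		return iflink;
	}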
From patchwork Fri Apr 23 11:05:18 2021

Subject: [PATCH RFC bpf-next 3/4] xdp: add proper __rcu annotations to redirect map entries
From: Toke Høiland-Jørgensen
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, Martin KaFai Lau, Hangbin Liu, Jesper Dangaard Brouer, Magnus Karlsson, "Paul E. McKenney"
Date: Fri, 23 Apr 2021 13:05:18 +0200
Message-ID: <161917591888.102337.17261581380237633381.stgit@toke.dk>
In-Reply-To: <161917591559.102337.3558507780042453425.stgit@toke.dk>
References: <161917591559.102337.3558507780042453425.stgit@toke.dk>

From: Toke Høiland-Jørgensen

XDP_REDIRECT works by a three-step process: the bpf_redirect() and
bpf_redirect_map() helpers will look up the target of the redirect and
store it (along with some other metadata) in a per-CPU struct
bpf_redirect_info.
Next, when the program returns the XDP_REDIRECT return code, the driver
will call xdp_do_redirect(), which will use the information thus stored
to actually enqueue the frame into a bulk queue structure (that differs
slightly by map type, but shares the same principle). Finally, before
exiting its NAPI poll loop, the driver will call xdp_do_flush(), which
will flush all the different bulk queues, thus completing the redirect.

Pointers to the map entries will be kept around for this whole sequence
of steps, protected by RCU. However, there is no top-level
rcu_read_lock() in the core code; instead drivers add their own
rcu_read_lock() around the XDP portions of the code, and do so somewhat
inconsistently, as Martin discovered[0]. Things still work because
everything happens inside a single NAPI poll sequence, which means it's
between a pair of calls to local_bh_disable()/local_bh_enable(). So Paul
suggested[1] that we could document this intention by using
rcu_dereference_check() with rcu_read_lock_bh_held() as a second
parameter, thus allowing sparse and lockdep to verify that everything is
done correctly.

This patch does just that: we add an __rcu annotation to the map entry
pointers and remove the various comments explaining the NAPI poll
assurance strewn through devmap.c, in favour of a longer explanation in
filter.c. The goal is to have one coherent documentation of the entire
flow, and to rely on the RCU annotations as a "standard" way of
communicating the flow in the map code (which can additionally be
understood by sparse and lockdep).

The conversion itself is fairly straightforward: READ_ONCE() becomes
rcu_dereference_check(), WRITE_ONCE() becomes rcu_assign_pointer(), and
xchg() and cmpxchg() get wrapped in the proper constructs to cast the
pointer back and forth between the __rcu and __kernel address spaces
(for the benefit of sparse). The one complication is that xskmap has a
few constructions where double pointers are passed back and forth; these
simply all gain __rcu annotations, and only the final
reference/dereference of the innermost pointer gets changed.

With this, everything can be run through sparse without eliciting
complaints, and lockdep can verify correctness even without the use of
rcu_read_lock() in the drivers. Subsequent patches will remove the
now-redundant rcu_read_lock() calls from the drivers.

[0] https://lore.kernel.org/bpf/20210415173551.7ma4slcbqeyiba2r@kafai-mbp.dhcp.thefacebook.com/
[1] https://lore.kernel.org/bpf/20210419165837.GA975577@paulmck-ThinkPad-P17-Gen-1/
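The mechanical part of this conversion can be summarised in a short
sketch (the "slots" array below is a hypothetical stand-in for the
various map types' entry arrays):

	struct entry __rcu **slots;	/* hypothetical map entry array */
	struct entry *e, *old, *new;

	/* lookup: READ_ONCE() becomes rcu_dereference_check() */
	e = rcu_dereference_check(slots[i], rcu_read_lock_bh_held());

	/* update: WRITE_ONCE() becomes rcu_assign_pointer() */
	rcu_assign_pointer(slots[i], new);

	/* swap: xchg()/cmpxchg() gain RCU_INITIALIZER() on values going
	 * in and unrcu_pointer() on values coming back out
	 */
	old = unrcu_pointer(xchg(&slots[i], RCU_INITIALIZER(new)));
	old = unrcu_pointer(cmpxchg(&slots[i], RCU_INITIALIZER(e), NULL));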
Signed-off-by: Toke Høiland-Jørgensen
---
 include/net/xdp_sock.h |  2 +-
 kernel/bpf/cpumap.c    | 14 +++++++++----
 kernel/bpf/devmap.c    | 52 +++++++++++++++++++++---------------------------
 net/core/filter.c      | 28 ++++++++++++++++++++++++++
 net/xdp/xsk.c          |  4 ++--
 net/xdp/xsk.h          |  4 ++--
 net/xdp/xskmap.c       | 29 ++++++++++++++++-----------
 7 files changed, 83 insertions(+), 50 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 9c0722c6d7ac..fff069d2ed1b 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -37,7 +37,7 @@ struct xdp_umem {
 struct xsk_map {
 	struct bpf_map map;
 	spinlock_t lock; /* Synchronize map updates */
-	struct xdp_sock *xsk_map[];
+	struct xdp_sock __rcu *xsk_map[];
 };
 
 struct xdp_sock {
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 0cf2791d5099..5e51d939ddf5 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -74,7 +74,7 @@ struct bpf_cpu_map_entry {
 struct bpf_cpu_map {
 	struct bpf_map map;
 	/* Below members specific for map type */
-	struct bpf_cpu_map_entry **cpu_map;
+	struct bpf_cpu_map_entry __rcu **cpu_map;
 };
 
 static DEFINE_PER_CPU(struct list_head, cpu_map_flush_list);
@@ -469,7 +469,7 @@ static void __cpu_map_entry_replace(struct bpf_cpu_map *cmap,
 {
 	struct bpf_cpu_map_entry *old_rcpu;
 
-	old_rcpu = xchg(&cmap->cpu_map[key_cpu], rcpu);
+	old_rcpu = unrcu_pointer(xchg(&cmap->cpu_map[key_cpu], RCU_INITIALIZER(rcpu)));
 	if (old_rcpu) {
 		call_rcu(&old_rcpu->rcu, __cpu_map_entry_free);
 		INIT_WORK(&old_rcpu->kthread_stop_wq, cpu_map_kthread_stop);
@@ -551,7 +551,8 @@ static void cpu_map_free(struct bpf_map *map)
 	for (i = 0; i < cmap->map.max_entries; i++) {
 		struct bpf_cpu_map_entry *rcpu;
 
-		rcpu = READ_ONCE(cmap->cpu_map[i]);
+		rcpu = rcu_dereference_check(cmap->cpu_map[i],
+					     rcu_read_lock_bh_held());
 		if (!rcpu)
 			continue;
 
@@ -562,6 +563,10 @@ static void cpu_map_free(struct bpf_map *map)
 	kfree(cmap);
 }
 
+/* Elements are kept alive by RCU; either by rcu_read_lock() (from syscall) or
+ * by local_bh_disable() (from XDP calls inside NAPI). The
+ * rcu_read_lock_bh_held() below makes lockdep accept both.
+ */
 static void *__cpu_map_lookup_elem(struct bpf_map *map, u32 key)
 {
 	struct bpf_cpu_map *cmap = container_of(map, struct bpf_cpu_map, map);
@@ -570,7 +575,8 @@ static void *__cpu_map_lookup_elem(struct bpf_map *map, u32 key)
 	if (key >= map->max_entries)
 		return NULL;
 
-	rcpu = READ_ONCE(cmap->cpu_map[key]);
+	rcpu = rcu_dereference_check(cmap->cpu_map[key],
+				     rcu_read_lock_bh_held());
 
 	return rcpu;
 }
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index aa516472ce46..a1d2e86c8898 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -72,7 +72,7 @@ struct bpf_dtab_netdev {
 struct bpf_dtab {
 	struct bpf_map map;
-	struct bpf_dtab_netdev **netdev_map; /* DEVMAP type only */
+	struct bpf_dtab_netdev __rcu **netdev_map; /* DEVMAP type only */
 	struct list_head list;
 
 	/* these are only used for DEVMAP_HASH type maps */
@@ -224,7 +224,7 @@ static void dev_map_free(struct bpf_map *map)
 		for (i = 0; i < dtab->map.max_entries; i++) {
 			struct bpf_dtab_netdev *dev;
 
-			dev = dtab->netdev_map[i];
+			dev = rcu_dereference_raw(dtab->netdev_map[i]);
 			if (!dev)
 				continue;
 
@@ -257,6 +257,10 @@ static int dev_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 	return 0;
 }
 
+/* Elements are kept alive by RCU; either by rcu_read_lock() (from syscall) or
+ * by local_bh_disable() (from XDP calls inside NAPI). The
+ * rcu_read_lock_bh_held() below makes lockdep accept both.
+ */
 static void *__dev_map_hash_lookup_elem(struct bpf_map *map, u32 key)
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
@@ -264,7 +268,8 @@ static void *__dev_map_hash_lookup_elem(struct bpf_map *map, u32 key)
 	struct bpf_dtab_netdev *dev;
 
 	hlist_for_each_entry_rcu(dev, head, index_hlist,
-				 lockdep_is_held(&dtab->index_lock))
+				 (lockdep_is_held(&dtab->index_lock) ||
+				  rcu_read_lock_bh_held()))
 		if (dev->idx == key)
 			return dev;
 
@@ -362,15 +367,9 @@ static void bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
 	__list_del_clearprev(&bq->flush_node);
 }
 
-/* __dev_flush is called from xdp_do_flush() which _must_ be signaled
- * from the driver before returning from its napi->poll() routine. The poll()
- * routine is called either from busy_poll context or net_rx_action signaled
- * from NET_RX_SOFTIRQ. Either way the poll routine must complete before the
- * net device can be torn down. On devmap tear down we ensure the flush list
- * is empty before completing to ensure all flush operations have completed.
- * When drivers update the bpf program they may need to ensure any flush ops
- * are also complete. Using synchronize_rcu or call_rcu will suffice for this
- * because both wait for napi context to exit.
+/* __dev_flush is called from xdp_do_flush() which _must_ be signalled from the
+ * driver before returning from its napi->poll() routine. See the comment above
+ * xdp_do_flush() in filter.c.
  */
 void __dev_flush(void)
 {
@@ -381,9 +380,9 @@ void __dev_flush(void)
 		bq_xmit_all(bq, XDP_XMIT_FLUSH);
 }
 
-/* rcu_read_lock (from syscall and BPF contexts) ensures that if a delete and/or
- * update happens in parallel here a dev_put wont happen until after reading the
- * ifindex.
+/* Elements are kept alive by RCU; either by rcu_read_lock() (from syscall) or
+ * by local_bh_disable() (from XDP calls inside NAPI). The
+ * rcu_read_lock_bh_held() below makes lockdep accept both.
 
 */
static void *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
{
@@ -393,12 +392,14 @@ static void *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
 	if (key >= map->max_entries)
 		return NULL;
 
-	obj = READ_ONCE(dtab->netdev_map[key]);
+	obj = rcu_dereference_check(dtab->netdev_map[key],
+				    rcu_read_lock_bh_held());
 
 	return obj;
 }
 
-/* Runs under RCU-read-side, plus in softirq under NAPI protection.
- * Thus, safe percpu variable access.
+/* Runs in NAPI, i.e., softirq under local_bh_disable(). Thus, safe percpu
+ * variable access, and map elements stick around. See comment above
+ * xdp_do_flush() in filter.c.
  */
 static void bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
 		       struct net_device *dev_rx)
@@ -538,14 +539,7 @@ static int dev_map_delete_elem(struct bpf_map *map, void *key)
 	if (k >= map->max_entries)
 		return -EINVAL;
 
-	/* Use call_rcu() here to ensure any rcu critical sections have
-	 * completed as well as any flush operations because call_rcu
-	 * will wait for preempt-disable region to complete, NAPI in this
-	 * context. And additionally, the driver tear down ensures all
-	 * soft irqs are complete before removing the net device in the
-	 * case of dev_put equals zero.
-	 */
-	old_dev = xchg(&dtab->netdev_map[k], NULL);
+	old_dev = unrcu_pointer(xchg(&dtab->netdev_map[k], NULL));
 	if (old_dev)
 		call_rcu(&old_dev->rcu, __dev_map_entry_free);
 	return 0;
@@ -654,7 +648,7 @@ static int __dev_map_update_elem(struct net *net, struct bpf_map *map,
 	 * Remembering the driver side flush operation will happen before the
 	 * net device is removed.
 	 */
-	old_dev = xchg(&dtab->netdev_map[i], dev);
+	old_dev = unrcu_pointer(xchg(&dtab->netdev_map[i], RCU_INITIALIZER(dev)));
 	if (old_dev)
 		call_rcu(&old_dev->rcu, __dev_map_entry_free);
 
@@ -830,10 +824,10 @@ static int dev_map_notification(struct notifier_block *notifier,
 		for (i = 0; i < dtab->map.max_entries; i++) {
 			struct bpf_dtab_netdev *dev, *odev;
 
-			dev = READ_ONCE(dtab->netdev_map[i]);
+			dev = rcu_dereference(dtab->netdev_map[i]);
 			if (!dev || netdev != dev->dev)
 				continue;
-			odev = cmpxchg(&dtab->netdev_map[i], dev, NULL);
+			odev = unrcu_pointer(cmpxchg(&dtab->netdev_map[i], RCU_INITIALIZER(dev), NULL));
 			if (dev == odev)
 				call_rcu(&dev->rcu, __dev_map_entry_free);
diff --git a/net/core/filter.c b/net/core/filter.c
index cae56d08a670..e2c0dc2b551c 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3918,6 +3918,34 @@ static const struct bpf_func_proto bpf_xdp_adjust_meta_proto = {
 	.arg2_type	= ARG_ANYTHING,
 };
 
+/* XDP_REDIRECT works by a three-step process, implemented in the functions
+ * below:
+ *
+ * 1. The bpf_redirect() and bpf_redirect_map() helpers will look up the target
+ *    of the redirect and store it (along with some other metadata) in a
+ *    per-CPU struct bpf_redirect_info.
+ *
+ * 2. When the program returns the XDP_REDIRECT return code, the driver will
+ *    call xdp_do_redirect() which will use the information in struct
+ *    bpf_redirect_info to actually enqueue the frame into a map type-specific
+ *    bulk queue structure.
+ *
+ * 3. Before exiting its NAPI poll loop, the driver will call xdp_do_flush(),
+ *    which will flush all the different bulk queues, thus completing the
+ *    redirect.
+ *
+ * Pointers to the map entries will be kept around for this whole sequence of
+ * steps, protected by RCU. However, there is no top-level rcu_read_lock() in
+ * the core code; instead, the RCU protection relies on everything happening
+ * inside a single NAPI poll sequence, which means it's between a pair of calls
+ * to local_bh_disable()/local_bh_enable().
+ *
+ * The map entries are marked as __rcu and the map code makes sure to
+ * dereference those pointers with rcu_dereference_check() in a way that works
+ * for both sections that hold an rcu_read_lock() and sections that are called
+ * from NAPI without a separate rcu_read_lock(). The code below does not use
+ * RCU annotations, but relies on those in the map code.
+ */
 void xdp_do_flush(void)
 {
 	__dev_flush();
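Read together with the comment above, the three steps map onto a driver's
NAPI poll roughly as follows (a hypothetical driver sketch, for
illustration only; the example_* names and error handling are elided
assumptions, not real kernel API):

	static int example_napi_poll(struct napi_struct *napi, int budget)
	{
		struct example_ring *ring = example_ring_from_napi(napi);
		struct bpf_prog *prog = READ_ONCE(ring->xdp_prog);
		int done = 0;

		while (done < budget) {
			struct xdp_buff xdp;

			if (!example_next_frame(ring, &xdp))
				break;

			/* Step 1 happens inside the program:
			 * bpf_redirect_map() stores the target in the
			 * per-CPU bpf_redirect_info.
			 */
			if (bpf_prog_run_xdp(prog, &xdp) == XDP_REDIRECT)
				xdp_do_redirect(ring->netdev, &xdp, prog); /* step 2 */
			done++;
		}

		/* Step 3: still under local_bh_disable(), so the map
		 * entries enqueued in step 2 are guaranteed to be alive.
		 */
		xdp_do_flush();
		return done;
	}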
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index a71ed664da0a..f910445aad85 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -749,7 +749,7 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
 }
 
 static struct xsk_map *xsk_get_map_list_entry(struct xdp_sock *xs,
-					      struct xdp_sock ***map_entry)
+					      struct xdp_sock __rcu ***map_entry)
 {
 	struct xsk_map *map = NULL;
 	struct xsk_map_node *node;
@@ -785,7 +785,7 @@ static void xsk_delete_from_maps(struct xdp_sock *xs)
 	 * might be updates to the map between
 	 * xsk_get_map_list_entry() and xsk_map_try_sock_delete().
 	 */
-	struct xdp_sock **map_entry = NULL;
+	struct xdp_sock __rcu **map_entry = NULL;
 	struct xsk_map *map;
 
 	while ((map = xsk_get_map_list_entry(xs, &map_entry))) {
diff --git a/net/xdp/xsk.h b/net/xdp/xsk.h
index edcf249ad1f1..a4bc4749faac 100644
--- a/net/xdp/xsk.h
+++ b/net/xdp/xsk.h
@@ -31,7 +31,7 @@ struct xdp_mmap_offsets_v1 {
 struct xsk_map_node {
 	struct list_head node;
 	struct xsk_map *map;
-	struct xdp_sock **map_entry;
+	struct xdp_sock __rcu **map_entry;
 };
 
 static inline struct xdp_sock *xdp_sk(struct sock *sk)
@@ -40,7 +40,7 @@ static inline struct xdp_sock *xdp_sk(struct sock *sk)
 }
 
 void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs,
-			     struct xdp_sock **map_entry);
+			     struct xdp_sock __rcu **map_entry);
 void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id);
 int xsk_reg_pool_at_qid(struct net_device *dev, struct xsk_buff_pool *pool,
 			u16 queue_id);
diff --git a/net/xdp/xskmap.c b/net/xdp/xskmap.c
index 67b4ce504852..b8b90abbae67 100644
--- a/net/xdp/xskmap.c
+++ b/net/xdp/xskmap.c
@@ -12,7 +12,7 @@
 #include "xsk.h"
 
 static struct xsk_map_node *xsk_map_node_alloc(struct xsk_map *map,
-					       struct xdp_sock **map_entry)
+					       struct xdp_sock __rcu **map_entry)
 {
 	struct xsk_map_node *node;
 
@@ -42,7 +42,7 @@ static void xsk_map_sock_add(struct xdp_sock *xs, struct xsk_map_node *node)
 }
 
 static void xsk_map_sock_delete(struct xdp_sock *xs,
-				struct xdp_sock **map_entry)
+				struct xdp_sock __rcu **map_entry)
 {
 	struct xsk_map_node *n, *tmp;
 
@@ -124,6 +124,10 @@ static int xsk_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
 	return insn - insn_buf;
 }
 
+/* Elements are kept alive by RCU; either by rcu_read_lock() (from syscall) or
+ * by local_bh_disable() (from XDP calls inside NAPI). The
+ * rcu_read_lock_bh_held() below makes lockdep accept both.
+ */
static void *__xsk_map_lookup_elem(struct bpf_map *map, u32 key)
{
@@ -131,12 +135,11 @@ static void *__xsk_map_lookup_elem(struct bpf_map *map, u32 key)
 	if (key >= map->max_entries)
 		return NULL;
 
-	return READ_ONCE(m->xsk_map[key]);
+	return rcu_dereference_check(m->xsk_map[key], rcu_read_lock_bh_held());
 }
 
 static void *xsk_map_lookup_elem(struct bpf_map *map, void *key)
 {
-	WARN_ON_ONCE(!rcu_read_lock_held());
 	return __xsk_map_lookup_elem(map, *(u32 *)key);
 }
 
@@ -149,7 +152,8 @@ static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
 			       u64 map_flags)
 {
 	struct xsk_map *m = container_of(map, struct xsk_map, map);
-	struct xdp_sock *xs, *old_xs, **map_entry;
+	struct xdp_sock __rcu **map_entry;
+	struct xdp_sock *xs, *old_xs;
 	u32 i = *(u32 *)key, fd = *(u32 *)value;
 	struct xsk_map_node *node;
 	struct socket *sock;
@@ -179,7 +183,7 @@ static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
 	}
 
 	spin_lock_bh(&m->lock);
-	old_xs = READ_ONCE(*map_entry);
+	old_xs = rcu_dereference_check(*map_entry, rcu_read_lock_bh_held());
 	if (old_xs == xs) {
 		err = 0;
 		goto out;
@@ -191,7 +195,7 @@ static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
 		goto out;
 	}
 	xsk_map_sock_add(xs, node);
-	WRITE_ONCE(*map_entry, xs);
+	rcu_assign_pointer(*map_entry, xs);
 	if (old_xs)
 		xsk_map_sock_delete(old_xs, map_entry);
 	spin_unlock_bh(&m->lock);
@@ -208,7 +212,8 @@ static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
 static int xsk_map_delete_elem(struct bpf_map *map, void *key)
 {
 	struct xsk_map *m = container_of(map, struct xsk_map, map);
-	struct xdp_sock *old_xs, **map_entry;
+	struct xdp_sock __rcu **map_entry;
+	struct xdp_sock *old_xs;
 	int k = *(u32 *)key;
 
 	if (k >= map->max_entries)
@@ -216,7 +221,7 @@ static int xsk_map_delete_elem(struct bpf_map *map, void *key)
 
 	spin_lock_bh(&m->lock);
 	map_entry = &m->xsk_map[k];
-	old_xs = xchg(map_entry, NULL);
+	old_xs = unrcu_pointer(xchg(map_entry, NULL));
 	if (old_xs)
 		xsk_map_sock_delete(old_xs, map_entry);
 	spin_unlock_bh(&m->lock);
@@ -230,11 +235,11 @@ static int xsk_map_redirect(struct bpf_map *map, u32 ifindex, u64 flags)
 }
 
 void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs,
-			     struct xdp_sock **map_entry)
+			     struct xdp_sock __rcu **map_entry)
 {
 	spin_lock_bh(&map->lock);
-	if (READ_ONCE(*map_entry) == xs) {
-		WRITE_ONCE(*map_entry, NULL);
+	if (rcu_dereference(*map_entry) == xs) {
+		rcu_assign_pointer(*map_entry, NULL);
 		xsk_map_sock_delete(xs, map_entry);
 	}
 	spin_unlock_bh(&map->lock);
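The double-pointer complication mentioned in the commit message amounts
to the following pattern (a hypothetical fragment mirroring the xskmap
changes): the annotation lives on the innermost pointer, so a "pointer to
a slot" can be passed around freely, and only the final dereference needs
an RCU accessor.

	struct xdp_sock __rcu **slot = &m->xsk_map[idx];  /* pointer to an __rcu slot */
	struct xdp_sock *xs;

	/* Passing "slot" between functions needs no RCU accessor; only
	 * the dereference of the slot itself does:
	 */
	xs = rcu_dereference_check(*slot, rcu_read_lock_bh_held());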
From patchwork Fri Apr 23 11:05:20 2021
Subject: [PATCH RFC bpf-next 4/4] i40e: remove rcu_read_lock() around XDP program invocation
From: Toke Høiland-Jørgensen
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, Martin KaFai Lau, Hangbin Liu, Jesper Dangaard Brouer, Magnus Karlsson, "Paul E. McKenney"
McKenney" Date: Fri, 23 Apr 2021 13:05:20 +0200 Message-ID: <161917591996.102337.9559803697014955421.stgit@toke.dk> In-Reply-To: <161917591559.102337.3558507780042453425.stgit@toke.dk> References: <161917591559.102337.3558507780042453425.stgit@toke.dk> User-Agent: StGit/1.0 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net X-Patchwork-State: RFC From: Toke Høiland-Jørgensen The i40e driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP program invocations. However, the actual lifetime of the objects referred by the XDP program invocation is longer, all the way through to the call to xdp_do_flush(), making the scope of the rcu_read_lock() too small. This turns out to be harmless because it all happens in a single NAPI poll cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock() misleading. Rather than extend the scope of the rcu_read_lock(), just get rid of it entirely. With the addition of RCU annotations to the XDP_REDIRECT map types that take bh execution into account, lockdep even understands this to be safe, so there's really no reason to keep it around. Signed-off-by: Toke Høiland-Jørgensen --- drivers/net/ethernet/intel/i40e/i40e_txrx.c | 2 -- drivers/net/ethernet/intel/i40e/i40e_xsk.c | 6 +----- 2 files changed, 1 insertion(+), 7 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c index fc20afc23bfa..3f4c947a5185 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c @@ -2303,7 +2303,6 @@ static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring, struct bpf_prog *xdp_prog; u32 act; - rcu_read_lock(); xdp_prog = READ_ONCE(rx_ring->xdp_prog); if (!xdp_prog) @@ -2334,7 +2333,6 @@ static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring, break; } xdp_out: - rcu_read_unlock(); return ERR_PTR(-result); } diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c index d89c22347d9d..93b349f11d3b 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c @@ -153,7 +153,6 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp) struct bpf_prog *xdp_prog; u32 act; - rcu_read_lock(); /* NB! xdp_prog will always be !NULL, due to the fact that * this path is enabled by setting an XDP program. */ @@ -162,9 +161,7 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp) if (likely(act == XDP_REDIRECT)) { err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog); - result = !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED; - rcu_read_unlock(); - return result; + return !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED; } switch (act) { @@ -184,7 +181,6 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp) result = I40E_XDP_CONSUMED; break; } - rcu_read_unlock(); return result; }