From patchwork Tue Mar 21 18:45:37 2023
X-Patchwork-Submitter: Aditi Ghag
X-Patchwork-Id: 13183049
X-Patchwork-Delegate: bpf@iogearbox.net
From: Aditi Ghag
To: bpf@vger.kernel.org
Cc: kafai@fb.com, sdf@google.com, edumazet@google.com, aditi.ghag@isovalent.com, Martin KaFai Lau
Subject: [PATCH v3 bpf-next 1/5] bpf: Implement batching in UDP iterator
Date: Tue, 21 Mar 2023 18:45:37 +0000
Message-Id: <20230321184541.1857363-2-aditi.ghag@isovalent.com>
In-Reply-To: <20230321184541.1857363-1-aditi.ghag@isovalent.com>
References: <20230321184541.1857363-1-aditi.ghag@isovalent.com>

Batch UDP sockets from the BPF iterator so that overlapping locking semantics are possible for BPF/kernel helpers executed from BPF programs. This enables the BPF socket destroy kfunc (introduced in follow-up patches) to run from BPF iterator programs.
Previously, BPF iterators acquired the sock lock and the sockets hash table bucket lock while executing BPF programs. This prevented BPF helpers that acquire those same locks from being executed from BPF iterators. With the batching approach, we acquire the bucket lock, batch all the sockets in the bucket, and then release the lock. This allows BPF or kernel helpers to skip sock locking when invoked in the supported BPF contexts. The batching logic is similar to the one implemented in the TCP iterator: https://lore.kernel.org/bpf/20210701200613.1036157-1-kafai@fb.com/.

Suggested-by: Martin KaFai Lau
Signed-off-by: Aditi Ghag
---
 net/ipv4/udp.c | 222 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 213 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c605d171eb2d..545e56329355 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -3152,6 +3152,141 @@ struct bpf_iter__udp {
 	int bucket __aligned(8);
 };
 
+struct bpf_udp_iter_state {
+	struct udp_iter_state state;
+	unsigned int cur_sk;
+	unsigned int end_sk;
+	unsigned int max_sk;
+	struct sock **batch;
+	bool st_bucket_done;
+};
+
+static unsigned short seq_file_family(const struct seq_file *seq);
+static int bpf_iter_udp_realloc_batch(struct bpf_udp_iter_state *iter,
+				      unsigned int new_batch_sz);
+
+static inline bool seq_sk_match(struct seq_file *seq, const struct sock *sk)
+{
+	unsigned short family = seq_file_family(seq);
+
+	/* AF_UNSPEC is used as a match all */
+	return ((family == AF_UNSPEC || family == sk->sk_family) &&
+		net_eq(sock_net(sk), seq_file_net(seq)));
+}
+
+static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
+{
+	struct bpf_udp_iter_state *iter = seq->private;
+	struct udp_iter_state *state = &iter->state;
+	struct net *net = seq_file_net(seq);
+	struct udp_seq_afinfo *afinfo = state->bpf_seq_afinfo;
+	struct udp_table *udptable;
+	struct sock *first_sk = NULL;
+	struct sock *sk;
+	unsigned int bucket_sks = 0;
+	bool first;
+	bool resized = false;
+
+	/*
The current batch is done, so advance the bucket. */ + if (iter->st_bucket_done) + state->bucket++; + + udptable = udp_get_table_afinfo(afinfo, net); + +again: + /* New batch for the next bucket. + * Iterate over the hash table to find a bucket with sockets matching + * the iterator attributes, and return the first matching socket from + * the bucket. The remaining matched sockets from the bucket are batched + * before releasing the bucket lock. This allows BPF programs that are + * called in seq_show to acquire the bucket lock if needed. + */ + iter->cur_sk = 0; + iter->end_sk = 0; + iter->st_bucket_done = false; + first = true; + + for (; state->bucket <= udptable->mask; state->bucket++) { + struct udp_hslot *hslot = &udptable->hash[state->bucket]; + + if (hlist_empty(&hslot->head)) + continue; + + spin_lock_bh(&hslot->lock); + sk_for_each(sk, &hslot->head) { + if (seq_sk_match(seq, sk)) { + if (first) { + first_sk = sk; + first = false; + } + if (iter->end_sk < iter->max_sk) { + sock_hold(sk); + iter->batch[iter->end_sk++] = sk; + } + bucket_sks++; + } + } + spin_unlock_bh(&hslot->lock); + if (first_sk) + break; + } + + /* All done: no batch made. */ + if (!first_sk) + return NULL; + + if (iter->end_sk == bucket_sks) { + /* Batching is done for the current bucket; return the first + * socket to be iterated from the batch. + */ + iter->st_bucket_done = true; + return first_sk; + } + if (!resized && !bpf_iter_udp_realloc_batch(iter, bucket_sks * 3 / 2)) { + resized = true; + /* Go back to the previous bucket to resize its batch. */ + state->bucket--; + goto again; + } + return first_sk; +} + +static void *bpf_iter_udp_seq_next(struct seq_file *seq, void *v, loff_t *pos) +{ + struct bpf_udp_iter_state *iter = seq->private; + struct sock *sk; + + /* Whenever seq_next() is called, the iter->cur_sk is + * done with seq_show(), so unref the iter->cur_sk. 
+ */ + if (iter->cur_sk < iter->end_sk) + sock_put(iter->batch[iter->cur_sk++]); + + /* After updating iter->cur_sk, check if there are more sockets + * available in the current bucket batch. + */ + if (iter->cur_sk < iter->end_sk) { + sk = iter->batch[iter->cur_sk]; + } else { + // Prepare a new batch. + sk = bpf_iter_udp_batch(seq); + } + + ++*pos; + return sk; +} + +static void *bpf_iter_udp_seq_start(struct seq_file *seq, loff_t *pos) +{ + /* bpf iter does not support lseek, so it always + * continue from where it was stop()-ped. + */ + if (*pos) + return bpf_iter_udp_batch(seq); + + return SEQ_START_TOKEN; +} + static int udp_prog_seq_show(struct bpf_prog *prog, struct bpf_iter_meta *meta, struct udp_sock *udp_sk, uid_t uid, int bucket) { @@ -3172,18 +3307,38 @@ static int bpf_iter_udp_seq_show(struct seq_file *seq, void *v) struct bpf_prog *prog; struct sock *sk = v; uid_t uid; + bool slow; + int rc; if (v == SEQ_START_TOKEN) return 0; + slow = lock_sock_fast(sk); + + if (unlikely(sk_unhashed(sk))) { + rc = SEQ_SKIP; + goto unlock; + } + uid = from_kuid_munged(seq_user_ns(seq), sock_i_uid(sk)); meta.seq = seq; prog = bpf_iter_get_info(&meta, false); - return udp_prog_seq_show(prog, &meta, v, uid, state->bucket); + rc = udp_prog_seq_show(prog, &meta, v, uid, state->bucket); + +unlock: + unlock_sock_fast(sk, slow); + return rc; +} + +static void bpf_iter_udp_unref_batch(struct bpf_udp_iter_state *iter) +{ + while (iter->cur_sk < iter->end_sk) + sock_put(iter->batch[iter->cur_sk++]); } static void bpf_iter_udp_seq_stop(struct seq_file *seq, void *v) { + struct bpf_udp_iter_state *iter = seq->private; struct bpf_iter_meta meta; struct bpf_prog *prog; @@ -3194,15 +3349,31 @@ static void bpf_iter_udp_seq_stop(struct seq_file *seq, void *v) (void)udp_prog_seq_show(prog, &meta, v, 0, 0); } - udp_seq_stop(seq, v); + if (iter->cur_sk < iter->end_sk) { + bpf_iter_udp_unref_batch(iter); + iter->st_bucket_done = false; + } } static const struct seq_operations 
bpf_iter_udp_seq_ops = { - .start = udp_seq_start, - .next = udp_seq_next, + .start = bpf_iter_udp_seq_start, + .next = bpf_iter_udp_seq_next, .stop = bpf_iter_udp_seq_stop, .show = bpf_iter_udp_seq_show, }; + +static unsigned short seq_file_family(const struct seq_file *seq) +{ + const struct udp_seq_afinfo *afinfo; + + /* BPF iterator: bpf programs to filter sockets. */ + if (seq->op == &bpf_iter_udp_seq_ops) + return AF_UNSPEC; + + /* Proc fs iterator */ + afinfo = pde_data(file_inode(seq->file)); + return afinfo->family; +} #endif const struct seq_operations udp_seq_ops = { @@ -3413,9 +3584,30 @@ static struct pernet_operations __net_initdata udp_sysctl_ops = { DEFINE_BPF_ITER_FUNC(udp, struct bpf_iter_meta *meta, struct udp_sock *udp_sk, uid_t uid, int bucket) +static int bpf_iter_udp_realloc_batch(struct bpf_udp_iter_state *iter, + unsigned int new_batch_sz) +{ + struct sock **new_batch; + + new_batch = kvmalloc_array(new_batch_sz, sizeof(*new_batch), + GFP_USER | __GFP_NOWARN); + if (!new_batch) + return -ENOMEM; + + bpf_iter_udp_unref_batch(iter); + kvfree(iter->batch); + iter->batch = new_batch; + iter->max_sk = new_batch_sz; + + return 0; +} + +#define INIT_BATCH_SZ 16 + static int bpf_iter_init_udp(void *priv_data, struct bpf_iter_aux_info *aux) { - struct udp_iter_state *st = priv_data; + struct bpf_udp_iter_state *iter = priv_data; + struct udp_iter_state *st = &iter->state; struct udp_seq_afinfo *afinfo; int ret; @@ -3427,24 +3619,36 @@ static int bpf_iter_init_udp(void *priv_data, struct bpf_iter_aux_info *aux) afinfo->udp_table = NULL; st->bpf_seq_afinfo = afinfo; ret = bpf_iter_init_seq_net(priv_data, aux); - if (ret) + if (ret) { kfree(afinfo); + return ret; + } + ret = bpf_iter_udp_realloc_batch(iter, INIT_BATCH_SZ); + if (ret) { + bpf_iter_fini_seq_net(priv_data); + return ret; + } + iter->cur_sk = 0; + iter->end_sk = 0; + return ret; } static void bpf_iter_fini_udp(void *priv_data) { - struct udp_iter_state *st = priv_data; + struct 
bpf_udp_iter_state *iter = priv_data;
+	struct udp_iter_state *st = &iter->state;
 
-	kfree(st->bpf_seq_afinfo);
 	bpf_iter_fini_seq_net(priv_data);
+	kfree(st->bpf_seq_afinfo);
+	kvfree(iter->batch);
 }
 
 static const struct bpf_iter_seq_info udp_seq_info = {
 	.seq_ops		= &bpf_iter_udp_seq_ops,
 	.init_seq_private	= bpf_iter_init_udp,
 	.fini_seq_private	= bpf_iter_fini_udp,
-	.seq_priv_size		= sizeof(struct udp_iter_state),
+	.seq_priv_size		= sizeof(struct bpf_udp_iter_state),
 };
 
 static struct bpf_iter_reg udp_reg_info = {

From patchwork Tue Mar 21 18:45:38 2023
X-Patchwork-Submitter: Aditi Ghag
X-Patchwork-Id: 13183048
X-Patchwork-Delegate: bpf@iogearbox.net
From: Aditi Ghag
To: bpf@vger.kernel.org
Cc: kafai@fb.com, sdf@google.com, edumazet@google.com, aditi.ghag@isovalent.com
Subject: [PATCH v3 bpf-next 2/5] bpf: Add bpf_sock_destroy kfunc
Date: Tue, 21 Mar 2023 18:45:38 +0000
Message-Id:
<20230321184541.1857363-3-aditi.ghag@isovalent.com>
In-Reply-To: <20230321184541.1857363-1-aditi.ghag@isovalent.com>
References: <20230321184541.1857363-1-aditi.ghag@isovalent.com>

The socket destroy kfunc is used to forcefully terminate sockets from certain BPF contexts. We plan to use the capability in Cilium to force client sockets to reconnect when their remote load-balancing backends are deleted. The other use case is on-the-fly policy enforcement, where existing socket connections that new policies prohibit need to be forcefully terminated.

The kfunc allows terminating sockets that may or may not be actively sending traffic. It is currently exposed to certain BPF iterators, where users can filter and terminate selected sockets. Additionally, it can only be called from BPF contexts that ensure socket locking, in order to allow synchronous execution of destroy handlers that also acquire socket locks. The previous commit, which batches UDP sockets during iteration, facilitates such synchronous invocation from BPF context by letting the destroy handler skip taking socket locks. TCP iterators already supported batching.

The kfunc takes a `sock_common` type argument, even though it expects, and casts it to, a `sock` pointer. This enables the verifier to allow the sock_destroy kfunc to be called for TCP with a `sock_common` and for UDP with a `sock` struct. As a comparison, BPF helpers enable this behavior with the `ARG_PTR_TO_BTF_ID_SOCK_COMMON` argument type. However, no such option is available in the verifier logic that handles kfuncs, where BTF types are inferred. Furthermore, as `sock_common` only has a subset of the fields of `sock`, casting a pointer to the latter type might not always be safe. Hence, the BPF kfunc converts the argument to a full sock before casting.
Signed-off-by: Aditi Ghag --- include/net/udp.h | 1 + net/core/filter.c | 54 ++++++++++++++++++++++++++++++++++++++++++ net/ipv4/tcp.c | 16 +++++++++---- net/ipv4/udp.c | 60 +++++++++++++++++++++++++++++++++++++---------- 4 files changed, 114 insertions(+), 17 deletions(-) diff --git a/include/net/udp.h b/include/net/udp.h index de4b528522bb..d2999447d3f2 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -437,6 +437,7 @@ struct udp_seq_afinfo { struct udp_iter_state { struct seq_net_private p; int bucket; + int offset; struct udp_seq_afinfo *bpf_seq_afinfo; }; diff --git a/net/core/filter.c b/net/core/filter.c index 1d6f165923bf..ba3e0dac119c 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -11621,3 +11621,57 @@ bpf_sk_base_func_proto(enum bpf_func_id func_id) return func; } + +/* Disables missing prototype warnings */ +__diag_push(); +__diag_ignore_all("-Wmissing-prototypes", + "Global functions as their definitions will be in vmlinux BTF"); + +/* bpf_sock_destroy: Destroy the given socket with ECONNABORTED error code. + * + * The helper expects a non-NULL pointer to a socket. It invokes the + * protocol specific socket destroy handlers. + * + * The helper can only be called from BPF contexts that have acquired the socket + * locks. + * + * Parameters: + * @sock: Pointer to socket to be destroyed + * + * Return: + * On error, may return EPROTONOSUPPORT, EINVAL. + * EPROTONOSUPPORT if protocol specific destroy handler is not implemented. + * 0 otherwise + */ +__bpf_kfunc int bpf_sock_destroy(struct sock_common *sock) +{ + struct sock *sk = (struct sock *)sock; + + if (!sk) + return -EINVAL; + + /* The locking semantics that allow for synchronous execution of the + * destroy handlers are only supported for TCP and UDP. 
+ */ + if (!sk->sk_prot->diag_destroy || sk->sk_protocol == IPPROTO_RAW) + return -EOPNOTSUPP; + + return sk->sk_prot->diag_destroy(sk, ECONNABORTED); +} + +__diag_pop() + +BTF_SET8_START(sock_destroy_kfunc_set) +BTF_ID_FLAGS(func, bpf_sock_destroy) +BTF_SET8_END(sock_destroy_kfunc_set) + +static const struct btf_kfunc_id_set bpf_sock_destroy_kfunc_set = { + .owner = THIS_MODULE, + .set = &sock_destroy_kfunc_set, +}; + +static int init_subsystem(void) +{ + return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_sock_destroy_kfunc_set); +} +late_initcall(init_subsystem); diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 33f559f491c8..59a833c0c872 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -4678,8 +4678,10 @@ int tcp_abort(struct sock *sk, int err) return 0; } - /* Don't race with userspace socket closes such as tcp_close. */ - lock_sock(sk); + /* BPF context ensures sock locking. */ + if (!has_current_bpf_ctx()) + /* Don't race with userspace socket closes such as tcp_close. */ + lock_sock(sk); if (sk->sk_state == TCP_LISTEN) { tcp_set_state(sk, TCP_CLOSE); @@ -4688,7 +4690,8 @@ int tcp_abort(struct sock *sk, int err) /* Don't race with BH socket closes such as inet_csk_listen_stop. */ local_bh_disable(); - bh_lock_sock(sk); + if (!has_current_bpf_ctx()) + bh_lock_sock(sk); if (!sock_flag(sk, SOCK_DEAD)) { sk->sk_err = err; @@ -4700,10 +4703,13 @@ int tcp_abort(struct sock *sk, int err) tcp_done(sk); } - bh_unlock_sock(sk); + if (!has_current_bpf_ctx()) + bh_unlock_sock(sk); + local_bh_enable(); tcp_write_queue_purge(sk); - release_sock(sk); + if (!has_current_bpf_ctx()) + release_sock(sk); return 0; } EXPORT_SYMBOL_GPL(tcp_abort); diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 545e56329355..02d357713838 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -2925,7 +2925,9 @@ EXPORT_SYMBOL(udp_poll); int udp_abort(struct sock *sk, int err) { - lock_sock(sk); + /* BPF context ensures sock locking. 
*/ + if (!has_current_bpf_ctx()) + lock_sock(sk); /* udp{v6}_destroy_sock() sets it under the sk lock, avoid racing * with close() @@ -2938,7 +2940,8 @@ int udp_abort(struct sock *sk, int err) __udp_disconnect(sk, 0); out: - release_sock(sk); + if (!has_current_bpf_ctx()) + release_sock(sk); return 0; } @@ -3184,15 +3187,23 @@ static struct sock *bpf_iter_udp_batch(struct seq_file *seq) struct sock *first_sk = NULL; struct sock *sk; unsigned int bucket_sks = 0; - bool first; bool resized = false; + int offset = 0; + int new_offset; /* The current batch is done, so advance the bucket. */ - if (iter->st_bucket_done) + if (iter->st_bucket_done) { state->bucket++; + state->offset = 0; + } udptable = udp_get_table_afinfo(afinfo, net); + if (state->bucket > udptable->mask) { + state->bucket = 0; + state->offset = 0; + } + again: /* New batch for the next bucket. * Iterate over the hash table to find a bucket with sockets matching @@ -3204,43 +3215,60 @@ static struct sock *bpf_iter_udp_batch(struct seq_file *seq) iter->cur_sk = 0; iter->end_sk = 0; iter->st_bucket_done = false; - first = true; + first_sk = NULL; + bucket_sks = 0; + offset = state->offset; + new_offset = offset; for (; state->bucket <= udptable->mask; state->bucket++) { struct udp_hslot *hslot = &udptable->hash[state->bucket]; - if (hlist_empty(&hslot->head)) + if (hlist_empty(&hslot->head)) { + offset = 0; continue; + } spin_lock_bh(&hslot->lock); + /* Resume from the last saved position in a bucket before + * iterator was stopped. + */ + while (offset-- > 0) { + sk_for_each(sk, &hslot->head) + continue; + } sk_for_each(sk, &hslot->head) { if (seq_sk_match(seq, sk)) { - if (first) { + if (!first_sk) first_sk = sk; - first = false; - } if (iter->end_sk < iter->max_sk) { sock_hold(sk); iter->batch[iter->end_sk++] = sk; } bucket_sks++; } + new_offset++; } spin_unlock_bh(&hslot->lock); + if (first_sk) break; + + /* Reset the current bucket's offset before moving to the next bucket. 
*/ + offset = 0; + new_offset = 0; + } /* All done: no batch made. */ if (!first_sk) - return NULL; + goto ret; if (iter->end_sk == bucket_sks) { /* Batching is done for the current bucket; return the first * socket to be iterated from the batch. */ iter->st_bucket_done = true; - return first_sk; + goto ret; } if (!resized && !bpf_iter_udp_realloc_batch(iter, bucket_sks * 3 / 2)) { resized = true; @@ -3248,19 +3276,24 @@ static struct sock *bpf_iter_udp_batch(struct seq_file *seq) state->bucket--; goto again; } +ret: + state->offset = new_offset; return first_sk; } static void *bpf_iter_udp_seq_next(struct seq_file *seq, void *v, loff_t *pos) { struct bpf_udp_iter_state *iter = seq->private; + struct udp_iter_state *state = &iter->state; struct sock *sk; /* Whenever seq_next() is called, the iter->cur_sk is * done with seq_show(), so unref the iter->cur_sk. */ - if (iter->cur_sk < iter->end_sk) + if (iter->cur_sk < iter->end_sk) { sock_put(iter->batch[iter->cur_sk++]); + ++state->offset; + } /* After updating iter->cur_sk, check if there are more sockets * available in the current bucket batch. 
@@ -3630,6 +3663,9 @@ static int bpf_iter_init_udp(void *priv_data, struct bpf_iter_aux_info *aux)
 	}
 	iter->cur_sk = 0;
 	iter->end_sk = 0;
+	iter->st_bucket_done = false;
+	st->bucket = 0;
+	st->offset = 0;
 
 	return ret;
 }

From patchwork Tue Mar 21 18:45:39 2023
X-Patchwork-Submitter: Aditi Ghag
X-Patchwork-Id: 13183051
X-Patchwork-Delegate: bpf@iogearbox.net
From: Aditi Ghag
To: bpf@vger.kernel.org
Cc: kafai@fb.com, sdf@google.com, edumazet@google.com, aditi.ghag@isovalent.com
Subject: [PATCH v3 bpf-next 3/5] [RFC] net: Skip taking lock in BPF context
Date: Tue, 21 Mar 2023 18:45:39 +0000
Message-Id: <20230321184541.1857363-4-aditi.ghag@isovalent.com>
In-Reply-To: <20230321184541.1857363-1-aditi.ghag@isovalent.com>
References: <20230321184541.1857363-1-aditi.ghag@isovalent.com>

When sockets are
destroyed in the BPF iterator context, the sock lock is already acquired, so skip taking it again. This allows TCP listening sockets to be destroyed from BPF programs.

Signed-off-by: Aditi Ghag
---
 net/ipv4/inet_hashtables.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index e41fdc38ce19..5543a3e0d1b4 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -777,9 +777,11 @@ void inet_unhash(struct sock *sk)
 	/* Don't disable bottom halves while acquiring the lock to
 	 * avoid circular locking dependency on PREEMPT_RT.
 	 */
-	spin_lock(&ilb2->lock);
+	if (!has_current_bpf_ctx())
+		spin_lock(&ilb2->lock);
 	if (sk_unhashed(sk)) {
-		spin_unlock(&ilb2->lock);
+		if (!has_current_bpf_ctx())
+			spin_unlock(&ilb2->lock);
 		return;
 	}
 
@@ -788,7 +790,8 @@ void inet_unhash(struct sock *sk)
 		__sk_nulls_del_node_init_rcu(sk);
 		sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
-		spin_unlock(&ilb2->lock);
+		if (!has_current_bpf_ctx())
+			spin_unlock(&ilb2->lock);
 	} else {
 		spinlock_t *lock = inet_ehash_lockp(hashinfo, sk->sk_hash);

From patchwork Tue Mar 21 18:45:40 2023
X-Patchwork-Submitter: Aditi Ghag
X-Patchwork-Id: 13183047
X-Patchwork-Delegate: bpf@iogearbox.net
From: Aditi Ghag
To: bpf@vger.kernel.org
Cc: kafai@fb.com, sdf@google.com, edumazet@google.com, aditi.ghag@isovalent.com
Subject: [PATCH v3 bpf-next 4/5] [RFC] udp: Fix destroying UDP listening sockets
Date: Tue, 21 Mar 2023 18:45:40 +0000
Message-Id: <20230321184541.1857363-5-aditi.ghag@isovalent.com>
In-Reply-To: <20230321184541.1857363-1-aditi.ghag@isovalent.com>
References: <20230321184541.1857363-1-aditi.ghag@isovalent.com>

Previously, UDP listening sockets that bound to a port were not getting
properly destroyed via udp_abort: the sockets were left in the UDP hash
table with an unset source port. Fix the issue by unconditionally
unhashing and resetting the source port for sockets that are being
destroyed. This means that when sockets listen on a wildcard address and
on a specific address with a common port, users have to explicitly
select the socket(s) they intend to destroy.
Signed-off-by: Aditi Ghag
---
 net/ipv4/udp.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 02d357713838..a495ac88fcee 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1965,6 +1965,25 @@ int udp_pre_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 }
 EXPORT_SYMBOL(udp_pre_connect);
 
+int __udp_disconnect_with_abort(struct sock *sk)
+{
+	struct inet_sock *inet = inet_sk(sk);
+
+	sk->sk_state = TCP_CLOSE;
+	inet->inet_daddr = 0;
+	inet->inet_dport = 0;
+	sock_rps_reset_rxhash(sk);
+	sk->sk_bound_dev_if = 0;
+	inet_reset_saddr(sk);
+	inet->inet_sport = 0;
+	sk_dst_reset(sk);
+	/* (TBD) In case of sockets listening on wildcard and specific address
+	 * with a common port, socket will be removed from {hash, hash2} table.
+	 */
+	sk->sk_prot->unhash(sk);
+	return 0;
+}
+
 int __udp_disconnect(struct sock *sk, int flags)
 {
 	struct inet_sock *inet = inet_sk(sk);
@@ -2937,7 +2956,7 @@ int udp_abort(struct sock *sk, int err)
 
 	sk->sk_err = err;
 	sk_error_report(sk);
-	__udp_disconnect(sk, 0);
+	__udp_disconnect_with_abort(sk);
 
 out:
 	if (!has_current_bpf_ctx())

From patchwork Tue Mar 21 18:45:41 2023
From: Aditi Ghag
To: bpf@vger.kernel.org
Cc: kafai@fb.com, sdf@google.com, edumazet@google.com, aditi.ghag@isovalent.com
Subject: [PATCH 5/5] selftests/bpf: Add tests for bpf_sock_destroy
Date: Tue, 21 Mar 2023 18:45:41 +0000
Message-Id: <20230321184541.1857363-6-aditi.ghag@isovalent.com>
In-Reply-To: <20230321184541.1857363-1-aditi.ghag@isovalent.com>
References: <20230321184541.1857363-1-aditi.ghag@isovalent.com>

The test cases for TCP and UDP iterators mirror the intended usages of
the helper using BPF iterators. The destroy helpers set the
`ECONNABORTED` error code, which we can validate in the test code with
client sockets. But UDP sockets have an overriding error code from the
disconnect called during abort, so the error code validation is only
done for TCP sockets.

`struct sock` is redefined because vmlinux.h forward declares the
struct, and the loader fails to load the program as it finds the BTF FWD
type for the struct incompatible with the BTF STRUCT type. Here are
snippets of the verifier error and the corresponding BTF output:

```
verifier error: extern (func ksym) ...: func_proto ...
incompatible with kernel BTF

for selftest prog binary:
[104] FWD 'sock' fwd_kind=struct
[70] PTR '(anon)' type_id=104
[84] FUNC_PROTO '(anon)' ret_type_id=2 vlen=1
	'(anon)' type_id=70
[85] FUNC 'bpf_sock_destroy' type_id=84 linkage=extern
---
[96] DATASEC '.ksyms' size=0 vlen=1
	type_id=85 offset=0 size=0 (FUNC 'bpf_sock_destroy')

BTF for selftest vmlinux:
[74923] FUNC 'bpf_sock_destroy' type_id=48965 linkage=static
[48965] FUNC_PROTO '(anon)' ret_type_id=9 vlen=1
	'sk' type_id=1340
[1340] PTR '(anon)' type_id=2363
[2363] STRUCT 'sock' size=1280 vlen=93
```

Signed-off-by: Aditi Ghag
---
 .../selftests/bpf/prog_tests/sock_destroy.c   | 190 ++++++++++++++++++
 .../selftests/bpf/progs/sock_destroy_prog.c   | 151 ++++++++++++++
 2 files changed, 341 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/sock_destroy.c
 create mode 100644 tools/testing/selftests/bpf/progs/sock_destroy_prog.c

diff --git a/tools/testing/selftests/bpf/prog_tests/sock_destroy.c b/tools/testing/selftests/bpf/prog_tests/sock_destroy.c
new file mode 100644
index 000000000000..86c29f2c9d50
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/sock_destroy.c
@@ -0,0 +1,190 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+
+#include "sock_destroy_prog.skel.h"
+#include "network_helpers.h"
+
+static void start_iter_sockets(struct bpf_program *prog)
+{
+	struct bpf_link *link;
+	char buf[50] = {};
+	int iter_fd, len;
+
+	link = bpf_program__attach_iter(prog, NULL);
+	if (!ASSERT_OK_PTR(link, "attach_iter"))
+		return;
+
+	iter_fd = bpf_iter_create(bpf_link__fd(link));
+	if (!ASSERT_GE(iter_fd, 0, "create_iter"))
+		goto free_link;
+
+	while ((len = read(iter_fd, buf, sizeof(buf))) > 0)
+		;
+	ASSERT_GE(len, 0, "read");
+
+	close(iter_fd);
+
+free_link:
+	bpf_link__destroy(link);
+}
+
+static void test_tcp_client(struct sock_destroy_prog *skel)
+{
+	int serv = -1, clien = -1, n = 0;
+
+	serv = start_server(AF_INET6, SOCK_STREAM, NULL, 0, 0);
+	if (!ASSERT_GE(serv, 0, "start_server"))
+		goto cleanup_serv;
+
+	clien = connect_to_fd(serv, 0);
+	if (!ASSERT_GE(clien, 0, "connect_to_fd"))
+		goto cleanup_serv;
+
+	serv = accept(serv, NULL, NULL);
+	if (!ASSERT_GE(serv, 0, "serv accept"))
+		goto cleanup;
+
+	n = send(clien, "t", 1, 0);
+	if (!ASSERT_GE(n, 0, "client send"))
+		goto cleanup;
+
+	/* Run iterator program that destroys connected client sockets. */
+	start_iter_sockets(skel->progs.iter_tcp6_client);
+
+	n = send(clien, "t", 1, 0);
+	if (!ASSERT_LT(n, 0, "client_send on destroyed socket"))
+		goto cleanup;
+	ASSERT_EQ(errno, ECONNABORTED, "error code on destroyed socket");
+
+cleanup:
+	close(clien);
+cleanup_serv:
+	close(serv);
+}
+
+static void test_tcp_server(struct sock_destroy_prog *skel)
+{
+	int serv = -1, clien = -1, n = 0;
+
+	serv = start_server(AF_INET6, SOCK_STREAM, NULL, 0, 0);
+	if (!ASSERT_GE(serv, 0, "start_server"))
+		goto cleanup_serv;
+
+	clien = connect_to_fd(serv, 0);
+	if (!ASSERT_GE(clien, 0, "connect_to_fd"))
+		goto cleanup_serv;
+
+	serv = accept(serv, NULL, NULL);
+	if (!ASSERT_GE(serv, 0, "serv accept"))
+		goto cleanup;
+
+	n = send(clien, "t", 1, 0);
+	if (!ASSERT_GE(n, 0, "client send"))
+		goto cleanup;
+
+	/* Run iterator program that destroys server sockets. */
+	start_iter_sockets(skel->progs.iter_tcp6_server);
+
+	n = send(clien, "t", 1, 0);
+	if (!ASSERT_LT(n, 0, "client_send on destroyed socket"))
+		goto cleanup;
+	ASSERT_EQ(errno, ECONNRESET, "error code on destroyed socket");
+
+cleanup:
+	close(clien);
+cleanup_serv:
+	close(serv);
+}
+
+static void test_udp_client(struct sock_destroy_prog *skel)
+{
+	int serv = -1, clien = -1, n = 0;
+
+	serv = start_server(AF_INET6, SOCK_DGRAM, NULL, 6161, 0);
+	if (!ASSERT_GE(serv, 0, "start_server"))
+		goto cleanup_serv;
+
+	clien = connect_to_fd(serv, 0);
+	if (!ASSERT_GE(clien, 0, "connect_to_fd"))
+		goto cleanup_serv;
+
+	n = send(clien, "t", 1, 0);
+	if (!ASSERT_GE(n, 0, "client send"))
+		goto cleanup;
+
+	/* Run iterator program that destroys sockets. */
+	start_iter_sockets(skel->progs.iter_udp6_client);
+
+	n = send(clien, "t", 1, 0);
+	if (!ASSERT_LT(n, 0, "client_send on destroyed socket"))
+		goto cleanup;
+	/* UDP sockets have an overriding error code after they are
+	 * disconnected, so we don't check for the ECONNABORTED error code.
+	 */
+
+cleanup:
+	close(clien);
+cleanup_serv:
+	close(serv);
+}
+
+static void test_udp_server(struct sock_destroy_prog *skel)
+{
+	int *listen_fds = NULL, serv = 0;
+	unsigned int num_listens = 5;
+
+	/* Start reuseport servers. */
+	listen_fds = start_reuseport_server(AF_INET6, SOCK_DGRAM,
+					    "::1", 6062, 0, num_listens);
+	if (!ASSERT_OK_PTR(listen_fds, "start_reuseport_server"))
+		goto cleanup;
+
+	/* Run iterator program that destroys sockets. */
+	start_iter_sockets(skel->progs.iter_udp6_server);
+
+	/* Start a regular server that binds on the same port as the destroyed
+	 * sockets.
+	 */
+	serv = start_server(AF_INET6, SOCK_DGRAM, NULL, 6062, 0);
+	ASSERT_GE(serv, 0, "start_server failed");
+
+cleanup:
+	free_fds(listen_fds, num_listens);
+}
+
+void test_sock_destroy(void)
+{
+	int cgroup_fd = 0;
+	struct sock_destroy_prog *skel;
+
+	skel = sock_destroy_prog__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		return;
+
+	cgroup_fd = test__join_cgroup("/sock_destroy");
+	if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup"))
+		goto close_cgroup_fd;
+
+	skel->links.sock_connect = bpf_program__attach_cgroup(
+		skel->progs.sock_connect, cgroup_fd);
+	if (!ASSERT_OK_PTR(skel->links.sock_connect, "prog_attach"))
+		goto close_cgroup_fd;
+
+	if (test__start_subtest("tcp_client"))
+		test_tcp_client(skel);
+	if (test__start_subtest("tcp_server"))
+		test_tcp_server(skel);
+	if (test__start_subtest("udp_client"))
+		test_udp_client(skel);
+	if (test__start_subtest("udp_server"))
+		test_udp_server(skel);
+
+close_cgroup_fd:
+	close(cgroup_fd);
+	sock_destroy_prog__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/sock_destroy_prog.c b/tools/testing/selftests/bpf/progs/sock_destroy_prog.c
new file mode 100644
index 000000000000..3ccce63f0245
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/sock_destroy_prog.c
@@ -0,0 +1,151 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define sock sock___not_used
+#include "vmlinux.h"
+#undef sock
+
+#include <bpf/bpf_helpers.h>
+
+#define AF_INET6 10
+
+/* Redefine the struct: vmlinux.h forward declares it, and the loader fails
+ * to load the program as it finds the BTF FWD type for the struct
+ * incompatible with the BTF STRUCT type.
+ */
+struct sock {
+	struct sock_common	__sk_common;
+#define sk_family		__sk_common.skc_family
+#define sk_cookie		__sk_common.skc_cookie
+};
+
+int bpf_sock_destroy(struct sock_common *sk) __ksym;
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, __u32);
+	__type(value, __u64);
+} tcp_conn_sockets SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, __u32);
+	__type(value, __u64);
+} udp_conn_sockets SEC(".maps");
+
+SEC("cgroup/connect6")
+int sock_connect(struct bpf_sock_addr *ctx)
+{
+	int key = 0;
+	__u64 sock_cookie = 0;
+	__u32 keyc = 0;
+
+	if (ctx->family != AF_INET6 || ctx->user_family != AF_INET6)
+		return 1;
+
+	sock_cookie = bpf_get_socket_cookie(ctx);
+	if (ctx->protocol == IPPROTO_TCP)
+		bpf_map_update_elem(&tcp_conn_sockets, &key, &sock_cookie, 0);
+	else if (ctx->protocol == IPPROTO_UDP)
+		bpf_map_update_elem(&udp_conn_sockets, &keyc, &sock_cookie, 0);
+	else
+		return 1;
+
+	return 1;
+}
+
+SEC("iter/tcp")
+int iter_tcp6_client(struct bpf_iter__tcp *ctx)
+{
+	struct sock_common *sk_common = ctx->sk_common;
+	struct seq_file *seq = ctx->meta->seq;
+	__u64 sock_cookie = 0;
+	__u64 *val;
+	int key = 0;
+
+	if (!sk_common)
+		return 0;
+
+	if (sk_common->skc_family != AF_INET6)
+		return 0;
+
+	sock_cookie = bpf_get_socket_cookie(sk_common);
+	val = bpf_map_lookup_elem(&tcp_conn_sockets, &key);
+	if (!val)
+		return 0;
+	/* Destroy connected client sockets. */
+	if (sock_cookie == *val)
+		bpf_sock_destroy(sk_common);
+
+	return 0;
+}
+
+SEC("iter/tcp")
+int iter_tcp6_server(struct bpf_iter__tcp *ctx)
+{
+	struct sock_common *sk_common = ctx->sk_common;
+	struct seq_file *seq = ctx->meta->seq;
+	__u64 sock_cookie = 0;
+	__u64 *val;
+	int key = 0;
+
+	if (!sk_common)
+		return 0;
+
+	if (sk_common->skc_family != AF_INET6)
+		return 0;
+
+	sock_cookie = bpf_get_socket_cookie(sk_common);
+	val = bpf_map_lookup_elem(&tcp_conn_sockets, &key);
+	if (!val)
+		return 0;
+
+	/* Destroy server sockets. */
+	if (sock_cookie != *val)
+		bpf_sock_destroy(sk_common);
+
+	return 0;
+}
+
+SEC("iter/udp")
+int iter_udp6_client(struct bpf_iter__udp *ctx)
+{
+	struct seq_file *seq = ctx->meta->seq;
+	struct udp_sock *udp_sk = ctx->udp_sk;
+	struct sock *sk = (struct sock *) udp_sk;
+	__u64 sock_cookie = 0, *val;
+	int key = 0;
+
+	if (!sk)
+		return 0;
+
+	sock_cookie = bpf_get_socket_cookie(sk);
+	val = bpf_map_lookup_elem(&udp_conn_sockets, &key);
+	if (!val)
+		return 0;
+	/* Destroy connected client sockets. */
+	if (sock_cookie == *val)
+		bpf_sock_destroy((struct sock_common *)sk);
+
+	return 0;
+}
+
+SEC("iter/udp")
+int iter_udp6_server(struct bpf_iter__udp *ctx)
+{
+	struct seq_file *seq = ctx->meta->seq;
+	struct udp_sock *udp_sk = ctx->udp_sk;
+	struct sock *sk = (struct sock *) udp_sk;
+
+	if (!sk)
+		return 0;
+
+	bpf_sock_destroy((struct sock_common *)sk);
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";