From patchwork Thu Mar 23 20:06:30 2023
X-Patchwork-Submitter: Aditi Ghag
X-Patchwork-Id: 13186057
From: Aditi Ghag <aditi.ghag@isovalent.com>
To: bpf@vger.kernel.org
Cc: kafai@fb.com, sdf@google.com, edumazet@google.com, aditi.ghag@isovalent.com, Martin KaFai Lau
Subject: [PATCH v4 bpf-next 1/4] bpf: Implement batching in UDP iterator
Date: Thu, 23 Mar 2023 20:06:30 +0000
Message-Id: <20230323200633.3175753-2-aditi.ghag@isovalent.com>
In-Reply-To: <20230323200633.3175753-1-aditi.ghag@isovalent.com>

Batch UDP sockets from the BPF iterator, which allows for overlapping
locking semantics in BPF/kernel helpers executed from BPF programs.
This allows the BPF socket destroy kfunc (introduced by follow-up
patches) to execute from BPF iterator programs.

Previously, BPF iterators acquired the sock lock and the sockets hash
table bucket lock while executing BPF programs. This prevented BPF
helpers that also acquire these locks from being executed from BPF
iterators. With the batching approach, we acquire the bucket lock, batch
all of the bucket's sockets, and then release the bucket lock. This
enables BPF or kernel helpers to skip sock locking when invoked in the
supported BPF contexts.

The batching logic is similar to the logic implemented in the TCP
iterator: https://lore.kernel.org/bpf/20210701200613.1036157-1-kafai@fb.com/.

Suggested-by: Martin KaFai Lau
Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
---
 include/net/udp.h |   1 +
 net/ipv4/udp.c    | 255 ++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 247 insertions(+), 9 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index de4b528522bb..d2999447d3f2 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -437,6 +437,7 @@ struct udp_seq_afinfo {
 struct udp_iter_state {
 	struct seq_net_private	p;
 	int			bucket;
+	int			offset;
 	struct udp_seq_afinfo	*bpf_seq_afinfo;
 };

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c605d171eb2d..58c620243e47 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -3152,6 +3152,171 @@ struct bpf_iter__udp {
 	int bucket __aligned(8);
 };
 
+struct bpf_udp_iter_state {
+	struct udp_iter_state state;
+	unsigned int cur_sk;
+	unsigned int end_sk;
+	unsigned int max_sk;
+	struct sock **batch;
+	bool st_bucket_done;
+};
+
+static unsigned short seq_file_family(const struct seq_file *seq);
+static int bpf_iter_udp_realloc_batch(struct bpf_udp_iter_state *iter,
+				      unsigned int new_batch_sz);
+
+static inline bool seq_sk_match(struct seq_file *seq, const struct sock *sk)
+{
+	unsigned short family = seq_file_family(seq);
+
+	/* AF_UNSPEC is used as a match all */
+	return ((family == AF_UNSPEC || family == sk->sk_family) &&
+		net_eq(sock_net(sk),
+		       seq_file_net(seq)));
+}
+
+static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
+{
+	struct bpf_udp_iter_state *iter = seq->private;
+	struct udp_iter_state *state = &iter->state;
+	struct net *net = seq_file_net(seq);
+	struct udp_seq_afinfo *afinfo = state->bpf_seq_afinfo;
+	struct udp_table *udptable;
+	struct sock *first_sk = NULL;
+	struct sock *sk;
+	unsigned int bucket_sks = 0;
+	bool resized = false;
+	int offset = 0;
+	int new_offset;
+
+	/* The current batch is done, so advance the bucket. */
+	if (iter->st_bucket_done) {
+		state->bucket++;
+		state->offset = 0;
+	}
+
+	udptable = udp_get_table_afinfo(afinfo, net);
+
+	if (state->bucket > udptable->mask) {
+		state->bucket = 0;
+		state->offset = 0;
+		return NULL;
+	}
+
+again:
+	/* New batch for the next bucket.
+	 * Iterate over the hash table to find a bucket with sockets matching
+	 * the iterator attributes, and return the first matching socket from
+	 * the bucket. The remaining matched sockets from the bucket are batched
+	 * before releasing the bucket lock. This allows BPF programs that are
+	 * called in seq_show to acquire the bucket lock if needed.
+	 */
+	iter->cur_sk = 0;
+	iter->end_sk = 0;
+	iter->st_bucket_done = false;
+	first_sk = NULL;
+	bucket_sks = 0;
+	offset = state->offset;
+	new_offset = offset;
+
+	for (; state->bucket <= udptable->mask; state->bucket++) {
+		struct udp_hslot *hslot = &udptable->hash[state->bucket];
+
+		if (hlist_empty(&hslot->head)) {
+			offset = 0;
+			continue;
+		}
+
+		spin_lock_bh(&hslot->lock);
+		/* Resume from the last saved position in a bucket before
+		 * iterator was stopped.
+		 */
+		while (offset-- > 0) {
+			sk_for_each(sk, &hslot->head)
+				continue;
+		}
+		sk_for_each(sk, &hslot->head) {
+			if (seq_sk_match(seq, sk)) {
+				if (!first_sk)
+					first_sk = sk;
+				if (iter->end_sk < iter->max_sk) {
+					sock_hold(sk);
+					iter->batch[iter->end_sk++] = sk;
+				}
+				bucket_sks++;
+			}
+			new_offset++;
+		}
+		spin_unlock_bh(&hslot->lock);
+
+		if (first_sk)
+			break;
+
+		/* Reset the current bucket's offset before moving to the next bucket. */
+		offset = 0;
+		new_offset = 0;
+	}
+
+	/* All done: no batch made. */
+	if (!first_sk)
+		goto ret;
+
+	if (iter->end_sk == bucket_sks) {
+		/* Batching is done for the current bucket; return the first
+		 * socket to be iterated from the batch.
+		 */
+		iter->st_bucket_done = true;
+		goto ret;
+	}
+	if (!resized && !bpf_iter_udp_realloc_batch(iter, bucket_sks * 3 / 2)) {
+		resized = true;
+		/* Go back to the previous bucket to resize its batch. */
+		state->bucket--;
+		goto again;
+	}
+ret:
+	state->offset = new_offset;
+	return first_sk;
+}
+
+static void *bpf_iter_udp_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	struct bpf_udp_iter_state *iter = seq->private;
+	struct udp_iter_state *state = &iter->state;
+	struct sock *sk;
+
+	/* Whenever seq_next() is called, the iter->cur_sk is
+	 * done with seq_show(), so unref the iter->cur_sk.
+	 */
+	if (iter->cur_sk < iter->end_sk) {
+		sock_put(iter->batch[iter->cur_sk++]);
+		++state->offset;
+	}
+
+	/* After updating iter->cur_sk, check if there are more sockets
+	 * available in the current bucket batch.
+	 */
+	if (iter->cur_sk < iter->end_sk) {
+		sk = iter->batch[iter->cur_sk];
+	} else {
+		/* Prepare a new batch. */
+		sk = bpf_iter_udp_batch(seq);
+	}
+
+	++*pos;
+	return sk;
+}
+
+static void *bpf_iter_udp_seq_start(struct seq_file *seq, loff_t *pos)
+{
+	/* bpf iter does not support lseek, so it always
+	 * continue from where it was stop()-ped.
+	 */
+	if (*pos)
+		return bpf_iter_udp_batch(seq);
+
+	return SEQ_START_TOKEN;
+}
+
 static int udp_prog_seq_show(struct bpf_prog *prog, struct bpf_iter_meta *meta,
 			     struct udp_sock *udp_sk, uid_t uid, int bucket)
 {
@@ -3172,18 +3337,38 @@ static int bpf_iter_udp_seq_show(struct seq_file *seq, void *v)
 	struct bpf_prog *prog;
 	struct sock *sk = v;
 	uid_t uid;
+	bool slow;
+	int rc;
 
 	if (v == SEQ_START_TOKEN)
 		return 0;
 
+	slow = lock_sock_fast(sk);
+
+	if (unlikely(sk_unhashed(sk))) {
+		rc = SEQ_SKIP;
+		goto unlock;
+	}
+
 	uid = from_kuid_munged(seq_user_ns(seq), sock_i_uid(sk));
 	meta.seq = seq;
 	prog = bpf_iter_get_info(&meta, false);
-	return udp_prog_seq_show(prog, &meta, v, uid, state->bucket);
+	rc = udp_prog_seq_show(prog, &meta, v, uid, state->bucket);
+
+unlock:
+	unlock_sock_fast(sk, slow);
+	return rc;
+}
+
+static void bpf_iter_udp_unref_batch(struct bpf_udp_iter_state *iter)
+{
+	while (iter->cur_sk < iter->end_sk)
+		sock_put(iter->batch[iter->cur_sk++]);
 }
 
 static void bpf_iter_udp_seq_stop(struct seq_file *seq, void *v)
 {
+	struct bpf_udp_iter_state *iter = seq->private;
 	struct bpf_iter_meta meta;
 	struct bpf_prog *prog;
 
@@ -3194,15 +3379,31 @@ static void bpf_iter_udp_seq_stop(struct seq_file *seq, void *v)
 		(void)udp_prog_seq_show(prog, &meta, v, 0, 0);
 	}
 
-	udp_seq_stop(seq, v);
+	if (iter->cur_sk < iter->end_sk) {
+		bpf_iter_udp_unref_batch(iter);
+		iter->st_bucket_done = false;
+	}
 }
 
 static const struct seq_operations bpf_iter_udp_seq_ops = {
-	.start		= udp_seq_start,
-	.next		= udp_seq_next,
+	.start		= bpf_iter_udp_seq_start,
+	.next		= bpf_iter_udp_seq_next,
 	.stop		= bpf_iter_udp_seq_stop,
 	.show		= bpf_iter_udp_seq_show,
 };
+
+static unsigned short seq_file_family(const struct seq_file *seq)
+{
+	const struct udp_seq_afinfo *afinfo;
+
+	/* BPF iterator: bpf programs to filter sockets.
+	 */
+	if (seq->op == &bpf_iter_udp_seq_ops)
+		return AF_UNSPEC;
+
+	/* Proc fs iterator */
+	afinfo = pde_data(file_inode(seq->file));
+	return afinfo->family;
+}
 #endif
 
 const struct seq_operations udp_seq_ops = {
@@ -3413,9 +3614,30 @@ static struct pernet_operations __net_initdata udp_sysctl_ops = {
 DEFINE_BPF_ITER_FUNC(udp, struct bpf_iter_meta *meta,
 		     struct udp_sock *udp_sk, uid_t uid, int bucket)
 
+static int bpf_iter_udp_realloc_batch(struct bpf_udp_iter_state *iter,
+				      unsigned int new_batch_sz)
+{
+	struct sock **new_batch;
+
+	new_batch = kvmalloc_array(new_batch_sz, sizeof(*new_batch),
+				   GFP_USER | __GFP_NOWARN);
+	if (!new_batch)
+		return -ENOMEM;
+
+	bpf_iter_udp_unref_batch(iter);
+	kvfree(iter->batch);
+	iter->batch = new_batch;
+	iter->max_sk = new_batch_sz;
+
+	return 0;
+}
+
+#define INIT_BATCH_SZ 16
+
 static int bpf_iter_init_udp(void *priv_data, struct bpf_iter_aux_info *aux)
 {
-	struct udp_iter_state *st = priv_data;
+	struct bpf_udp_iter_state *iter = priv_data;
+	struct udp_iter_state *st = &iter->state;
 	struct udp_seq_afinfo *afinfo;
 	int ret;
 
@@ -3427,24 +3649,39 @@ static int bpf_iter_init_udp(void *priv_data, struct bpf_iter_aux_info *aux)
 	afinfo->udp_table = NULL;
 	st->bpf_seq_afinfo = afinfo;
 	ret = bpf_iter_init_seq_net(priv_data, aux);
-	if (ret)
+	if (ret) {
 		kfree(afinfo);
+		return ret;
+	}
+	ret = bpf_iter_udp_realloc_batch(iter, INIT_BATCH_SZ);
+	if (ret) {
+		bpf_iter_fini_seq_net(priv_data);
+		return ret;
+	}
+	iter->cur_sk = 0;
+	iter->end_sk = 0;
+	iter->st_bucket_done = false;
+	st->bucket = 0;
+	st->offset = 0;
+
 	return ret;
 }
 
 static void bpf_iter_fini_udp(void *priv_data)
 {
-	struct udp_iter_state *st = priv_data;
+	struct bpf_udp_iter_state *iter = priv_data;
+	struct udp_iter_state *st = &iter->state;
 
-	kfree(st->bpf_seq_afinfo);
 	bpf_iter_fini_seq_net(priv_data);
+	kfree(st->bpf_seq_afinfo);
+	kvfree(iter->batch);
 }
 
 static const struct bpf_iter_seq_info udp_seq_info = {
 	.seq_ops		= &bpf_iter_udp_seq_ops,
 	.init_seq_private	=
+				  bpf_iter_init_udp,
 	.fini_seq_private	= bpf_iter_fini_udp,
-	.seq_priv_size		= sizeof(struct udp_iter_state),
+	.seq_priv_size		= sizeof(struct bpf_udp_iter_state),
 };
 
 static struct bpf_iter_reg udp_reg_info = {

From patchwork Thu Mar 23 20:06:31 2023
X-Patchwork-Submitter: Aditi Ghag
X-Patchwork-Id: 13186058
From: Aditi Ghag <aditi.ghag@isovalent.com>
To: bpf@vger.kernel.org
Cc: kafai@fb.com, sdf@google.com, edumazet@google.com, aditi.ghag@isovalent.com
Subject: [PATCH v4 bpf-next 2/4] bpf: Add bpf_sock_destroy kfunc
Date: Thu, 23 Mar 2023 20:06:31 +0000
Message-Id: <20230323200633.3175753-3-aditi.ghag@isovalent.com>
In-Reply-To: <20230323200633.3175753-1-aditi.ghag@isovalent.com>

The socket destroy kfunc is
used to forcefully terminate sockets from certain BPF contexts. We plan
to use this capability in Cilium to force client sockets to reconnect
when their remote load-balancing backends are deleted. The other use
case is on-the-fly policy enforcement, where existing socket connections
that are now prevented by policies need to be forcefully terminated. The
helper allows terminating sockets that may or may not be actively
sending traffic.

The helper is currently exposed to certain BPF iterators where users can
filter and terminate selected sockets. Additionally, the helper can only
be called from BPF contexts that ensure socket locking, in order to
allow synchronous execution of destroy helpers that also acquire socket
locks. The previous commit, which batches UDP sockets during iteration,
facilitated a synchronous invocation of the destroy helper from BPF
context by skipping socket locks in the destroy handler. TCP iterators
already supported batching.

The helper takes a `sock_common` type argument, even though it expects,
and casts it to, a `sock` pointer. This enables the verifier to allow
the sock_destroy kfunc to be called for TCP with `sock_common` and for
UDP with `sock` structs. As a comparison, BPF helpers enable this
behavior with the `ARG_PTR_TO_BTF_ID_SOCK_COMMON` argument type.
However, there is no such option available in the verifier logic that
handles kfuncs, where BTF types are inferred. Furthermore, as
`sock_common` only has a subset of the fields of `sock`, casting a
pointer to the latter type might not always be safe. Hence, the BPF
kfunc converts the argument to a full sock before casting.
Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
---
 net/core/filter.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/tcp.c    | 10 ++++++---
 net/ipv4/udp.c    |  6 ++++--
 3 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 1d6f165923bf..ba3e0dac119c 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -11621,3 +11621,57 @@ bpf_sk_base_func_proto(enum bpf_func_id func_id)
 
 	return func;
 }
+
+/* Disables missing prototype warnings */
+__diag_push();
+__diag_ignore_all("-Wmissing-prototypes",
+		  "Global functions as their definitions will be in vmlinux BTF");
+
+/* bpf_sock_destroy: Destroy the given socket with ECONNABORTED error code.
+ *
+ * The helper expects a non-NULL pointer to a socket. It invokes the
+ * protocol specific socket destroy handlers.
+ *
+ * The helper can only be called from BPF contexts that have acquired the socket
+ * locks.
+ *
+ * Parameters:
+ * @sock: Pointer to socket to be destroyed
+ *
+ * Return:
+ * On error, may return EPROTONOSUPPORT, EINVAL.
+ * EPROTONOSUPPORT if protocol specific destroy handler is not implemented.
+ * 0 otherwise
+ */
+__bpf_kfunc int bpf_sock_destroy(struct sock_common *sock)
+{
+	struct sock *sk = (struct sock *)sock;
+
+	if (!sk)
+		return -EINVAL;
+
+	/* The locking semantics that allow for synchronous execution of the
+	 * destroy handlers are only supported for TCP and UDP.
+	 */
+	if (!sk->sk_prot->diag_destroy || sk->sk_protocol == IPPROTO_RAW)
+		return -EOPNOTSUPP;
+
+	return sk->sk_prot->diag_destroy(sk, ECONNABORTED);
+}
+
+__diag_pop()
+
+BTF_SET8_START(sock_destroy_kfunc_set)
+BTF_ID_FLAGS(func, bpf_sock_destroy)
+BTF_SET8_END(sock_destroy_kfunc_set)
+
+static const struct btf_kfunc_id_set bpf_sock_destroy_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set   = &sock_destroy_kfunc_set,
+};
+
+static int init_subsystem(void)
+{
+	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_sock_destroy_kfunc_set);
+}
+late_initcall(init_subsystem);

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 33f559f491c8..5df6231016e3 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4678,8 +4678,10 @@ int tcp_abort(struct sock *sk, int err)
 		return 0;
 	}
 
-	/* Don't race with userspace socket closes such as tcp_close. */
-	lock_sock(sk);
+	/* BPF context ensures sock locking. */
+	if (!has_current_bpf_ctx())
+		/* Don't race with userspace socket closes such as tcp_close.
+		 */
+		lock_sock(sk);
 
 	if (sk->sk_state == TCP_LISTEN) {
 		tcp_set_state(sk, TCP_CLOSE);
@@ -4701,9 +4703,11 @@ int tcp_abort(struct sock *sk, int err)
 	}
 
 	bh_unlock_sock(sk);
+	local_bh_enable();
 	tcp_write_queue_purge(sk);
-	release_sock(sk);
+	if (!has_current_bpf_ctx())
+		release_sock(sk);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(tcp_abort);

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 58c620243e47..408836102e20 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2925,7 +2925,8 @@ EXPORT_SYMBOL(udp_poll);
 
 int udp_abort(struct sock *sk, int err)
 {
-	lock_sock(sk);
+	if (!has_current_bpf_ctx())
+		lock_sock(sk);
 
 	/* udp{v6}_destroy_sock() sets it under the sk lock, avoid racing
 	 * with close()
@@ -2938,7 +2939,8 @@ int udp_abort(struct sock *sk, int err)
 	__udp_disconnect(sk, 0);
 
 out:
-	release_sock(sk);
+	if (!has_current_bpf_ctx())
+		release_sock(sk);
 	return 0;
 }

From patchwork Thu Mar 23 20:06:32 2023
X-Patchwork-Submitter: Aditi Ghag
X-Patchwork-Id: 13186059
From: Aditi Ghag <aditi.ghag@isovalent.com>
To: bpf@vger.kernel.org
Cc: kafai@fb.com, sdf@google.com, edumazet@google.com, aditi.ghag@isovalent.com
Subject: [PATCH v4 bpf-next 3/4] bpf,tcp: Avoid taking fast sock lock in iterator
Date: Thu, 23 Mar 2023 20:06:32 +0000
Message-Id: <20230323200633.3175753-4-aditi.ghag@isovalent.com>
In-Reply-To: <20230323200633.3175753-1-aditi.ghag@isovalent.com>

Previously, the BPF TCP iterator acquired the fast version of the sock
lock, which disables bottom halves (BH). This introduced a circular
dependency with code paths that later acquire the sockets hash table
bucket lock. Replace the fast version of the sock lock with the slow
one, which facilitates BPF programs executed from the iterator
destroying TCP listening sockets using the bpf_sock_destroy kfunc.
Here is a stack trace that motivated this change:

```
1) sock_lock with BH disabled + bucket lock
lock_acquire+0xcd/0x330
_raw_spin_lock_bh+0x38/0x50
inet_unhash+0x96/0xd0
tcp_set_state+0x6a/0x210
tcp_abort+0x12b/0x230
bpf_prog_f4110fb1100e26b5_iter_tcp6_server+0xa3/0xaa
bpf_iter_run_prog+0x1ff/0x340
bpf_iter_tcp_seq_show+0xca/0x190
bpf_seq_read+0x177/0x450
vfs_read+0xc6/0x300
ksys_read+0x69/0xf0
do_syscall_64+0x3c/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc

2) sock lock with BH enabled
[    1.499968] lock_acquire+0xcd/0x330
[    1.500316] _raw_spin_lock+0x33/0x40
[    1.500670] sk_clone_lock+0x146/0x520
[    1.501030] inet_csk_clone_lock+0x1b/0x110
[    1.501433] tcp_create_openreq_child+0x22/0x3f0
[    1.501873] tcp_v6_syn_recv_sock+0x96/0x940
[    1.502284] tcp_check_req+0x137/0x660
[    1.502646] tcp_v6_rcv+0xa63/0xe80
[    1.502994] ip6_protocol_deliver_rcu+0x78/0x590
[    1.503434] ip6_input_finish+0x72/0x140
[    1.503818] __netif_receive_skb_one_core+0x63/0xa0
[    1.504281] process_backlog+0x79/0x260
[    1.504668] __napi_poll.constprop.0+0x27/0x170
[    1.505104] net_rx_action+0x14a/0x2a0
[    1.505469] __do_softirq+0x165/0x510
[    1.505842] do_softirq+0xcd/0x100
[    1.506172] __local_bh_enable_ip+0xcc/0xf0
[    1.506588] ip6_finish_output2+0x2a8/0xb00
[    1.506988] ip6_finish_output+0x274/0x510
[    1.507377] ip6_xmit+0x319/0x9b0
[    1.507726] inet6_csk_xmit+0x12b/0x2b0
[    1.508096] __tcp_transmit_skb+0x549/0xc40
[    1.508498] tcp_rcv_state_process+0x362/0x1180
```

Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
Acked-by: Stanislav Fomichev
---
 net/ipv4/tcp_ipv4.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index ea370afa70ed..f2d370a9450f 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2962,7 +2962,6 @@ static int bpf_iter_tcp_seq_show(struct seq_file *seq, void *v)
 	struct bpf_iter_meta meta;
 	struct bpf_prog *prog;
 	struct sock *sk = v;
-	bool slow;
 	uid_t uid;
 	int ret;
 
@@ -2970,7 +2969,7 @@ static int bpf_iter_tcp_seq_show(struct seq_file *seq, void *v)
 		return 0;
 
 	if (sk_fullsock(sk))
-		slow = lock_sock_fast(sk);
+		lock_sock(sk);
 
 	if (unlikely(sk_unhashed(sk))) {
 		ret = SEQ_SKIP;
@@ -2994,7 +2993,7 @@ static int bpf_iter_tcp_seq_show(struct seq_file *seq, void *v)
 
 unlock:
 	if (sk_fullsock(sk))
-		unlock_sock_fast(sk, slow);
+		release_sock(sk);
 	return ret;
 }

From patchwork Thu Mar 23 20:06:33 2023
X-Patchwork-Submitter: Aditi Ghag
X-Patchwork-Id: 13186060
From: Aditi Ghag <aditi.ghag@isovalent.com>
To: bpf@vger.kernel.org
Cc: kafai@fb.com, sdf@google.com, edumazet@google.com, aditi.ghag@isovalent.com
Subject: [PATCH v4 bpf-next 4/4] selftests/bpf: Add tests for bpf_sock_destroy
Date: Thu, 23 Mar 2023 20:06:33 +0000
Message-Id: <20230323200633.3175753-5-aditi.ghag@isovalent.com>
In-Reply-To: <20230323200633.3175753-1-aditi.ghag@isovalent.com>
References:
<20230323200633.3175753-1-aditi.ghag@isovalent.com>

The test cases for destroying sockets mirror the intended usages of the
bpf_sock_destroy kfunc using iterators.

The destroy helpers set the `ECONNABORTED` error code, which we can
validate in the test code with client sockets. But UDP sockets have an
overriding error code from the disconnect called during abort, so the
error code validation is only done for TCP sockets.

Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
---
 .../selftests/bpf/prog_tests/sock_destroy.c   | 195 ++++++++++++++++++
 .../selftests/bpf/progs/sock_destroy_prog.c   | 151 ++++++++++++++
 2 files changed, 346 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/sock_destroy.c
 create mode 100644 tools/testing/selftests/bpf/progs/sock_destroy_prog.c

diff --git a/tools/testing/selftests/bpf/prog_tests/sock_destroy.c b/tools/testing/selftests/bpf/prog_tests/sock_destroy.c
new file mode 100644
index 000000000000..cbce966af568
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/sock_destroy.c
@@ -0,0 +1,195 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+
+#include "sock_destroy_prog.skel.h"
+#include "network_helpers.h"
+
+#define SERVER_PORT 6062
+
+static void start_iter_sockets(struct bpf_program *prog)
+{
+	struct bpf_link *link;
+	char buf[50] = {};
+	int iter_fd, len;
+
+	link = bpf_program__attach_iter(prog, NULL);
+	if (!ASSERT_OK_PTR(link, "attach_iter"))
+		return;
+
+	iter_fd = bpf_iter_create(bpf_link__fd(link));
+	if (!ASSERT_GE(iter_fd, 0, "create_iter"))
+		goto free_link;
+
+	while ((len = read(iter_fd, buf, sizeof(buf))) > 0)
+		;
+	ASSERT_GE(len, 0, "read");
+
+	close(iter_fd);
+
+free_link:
+	bpf_link__destroy(link);
+}
+
+static void test_tcp_client(struct sock_destroy_prog *skel)
+{
+	int serv = -1, clien = -1, n = 0;
+
+	serv = start_server(AF_INET6, SOCK_STREAM, NULL, 0, 0);
+	if (!ASSERT_GE(serv, 0,
"start_server"))
+		goto cleanup_serv;
+
+	clien = connect_to_fd(serv, 0);
+	if (!ASSERT_GE(clien, 0, "connect_to_fd"))
+		goto cleanup_serv;
+
+	serv = accept(serv, NULL, NULL);
+	if (!ASSERT_GE(serv, 0, "serv accept"))
+		goto cleanup;
+
+	n = send(clien, "t", 1, 0);
+	if (!ASSERT_GE(n, 0, "client send"))
+		goto cleanup;
+
+	/* Run iterator program that destroys connected client sockets. */
+	start_iter_sockets(skel->progs.iter_tcp6_client);
+
+	n = send(clien, "t", 1, 0);
+	if (!ASSERT_LT(n, 0, "client_send on destroyed socket"))
+		goto cleanup;
+	ASSERT_EQ(errno, ECONNABORTED, "error code on destroyed socket");
+
+cleanup:
+	close(clien);
+cleanup_serv:
+	close(serv);
+}
+
+static void test_tcp_server(struct sock_destroy_prog *skel)
+{
+	int serv = -1, clien = -1, n = 0;
+
+	serv = start_server(AF_INET6, SOCK_STREAM, NULL, SERVER_PORT, 0);
+	if (!ASSERT_GE(serv, 0, "start_server"))
+		goto cleanup_serv;
+
+	clien = connect_to_fd(serv, 0);
+	if (!ASSERT_GE(clien, 0, "connect_to_fd"))
+		goto cleanup_serv;
+
+	serv = accept(serv, NULL, NULL);
+	if (!ASSERT_GE(serv, 0, "serv accept"))
+		goto cleanup;
+
+	n = send(clien, "t", 1, 0);
+	if (!ASSERT_GE(n, 0, "client send"))
+		goto cleanup;
+
+	/* Run iterator program that destroys server sockets.
 */
+	start_iter_sockets(skel->progs.iter_tcp6_server);
+
+	n = send(clien, "t", 1, 0);
+	if (!ASSERT_LT(n, 0, "client_send on destroyed socket"))
+		goto cleanup;
+	ASSERT_EQ(errno, ECONNRESET, "error code on destroyed socket");
+
+cleanup:
+	close(clien);
+cleanup_serv:
+	close(serv);
+}
+
+static void test_udp_client(struct sock_destroy_prog *skel)
+{
+	int serv = -1, clien = -1, n = 0;
+
+	serv = start_server(AF_INET6, SOCK_DGRAM, NULL, 6161, 0);
+	if (!ASSERT_GE(serv, 0, "start_server"))
+		goto cleanup_serv;
+
+	clien = connect_to_fd(serv, 0);
+	if (!ASSERT_GE(clien, 0, "connect_to_fd"))
+		goto cleanup_serv;
+
+	n = send(clien, "t", 1, 0);
+	if (!ASSERT_GE(n, 0, "client send"))
+		goto cleanup;
+
+	/* Run iterator program that destroys sockets. */
+	start_iter_sockets(skel->progs.iter_udp6_client);
+
+	n = send(clien, "t", 1, 0);
+	if (!ASSERT_LT(n, 0, "client_send on destroyed socket"))
+		goto cleanup;
+	/* UDP sockets have an overriding error code after they are disconnected,
+	 * so we don't check for ECONNABORTED error code.
+	 */
+
+cleanup:
+	close(clien);
+cleanup_serv:
+	close(serv);
+}
+
+static void test_udp_server(struct sock_destroy_prog *skel)
+{
+	int *listen_fds = NULL, n, i;
+	unsigned int num_listens = 5;
+	char buf[1];
+
+	/* Start reuseport servers. */
+	listen_fds = start_reuseport_server(AF_INET6, SOCK_DGRAM,
+					    "::1", SERVER_PORT, 0,
+					    num_listens);
+	if (!ASSERT_OK_PTR(listen_fds, "start_reuseport_server"))
+		goto cleanup;
+
+	/* Run iterator program that destroys server sockets.
 */
+	start_iter_sockets(skel->progs.iter_udp6_server);
+
+	for (i = 0; i < num_listens; ++i) {
+		n = read(listen_fds[i], buf, sizeof(buf));
+		if (!ASSERT_EQ(n, -1, "read") ||
+		    !ASSERT_EQ(errno, ECONNABORTED, "error code on destroyed socket"))
+			break;
+	}
+	ASSERT_EQ(i, num_listens, "server socket");
+
+cleanup:
+	free_fds(listen_fds, num_listens);
+}
+
+void test_sock_destroy(void)
+{
+	int cgroup_fd = 0;
+	struct sock_destroy_prog *skel;
+
+	skel = sock_destroy_prog__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		return;
+
+	cgroup_fd = test__join_cgroup("/sock_destroy");
+	if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup"))
+		goto close_cgroup_fd;
+
+	skel->links.sock_connect = bpf_program__attach_cgroup(
+		skel->progs.sock_connect, cgroup_fd);
+	if (!ASSERT_OK_PTR(skel->links.sock_connect, "prog_attach"))
+		goto close_cgroup_fd;
+
+	if (test__start_subtest("tcp_client"))
+		test_tcp_client(skel);
+	if (test__start_subtest("tcp_server"))
+		test_tcp_server(skel);
+	if (test__start_subtest("udp_client"))
+		test_udp_client(skel);
+	if (test__start_subtest("udp_server"))
+		test_udp_server(skel);
+
+close_cgroup_fd:
+	close(cgroup_fd);
+	sock_destroy_prog__destroy(skel);
+}

diff --git a/tools/testing/selftests/bpf/progs/sock_destroy_prog.c b/tools/testing/selftests/bpf/progs/sock_destroy_prog.c
new file mode 100644
index 000000000000..8e09d82c50f3
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/sock_destroy_prog.c
@@ -0,0 +1,151 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+
+#include "bpf_tracing_net.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#define AF_INET6 10
+/* Keep it in sync with prog_test/sock_destroy.
 */
+#define SERVER_PORT 6062
+
+int bpf_sock_destroy(struct sock_common *sk) __ksym;
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, __u32);
+	__type(value, __u64);
+} tcp_conn_sockets SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, __u32);
+	__type(value, __u64);
+} udp_conn_sockets SEC(".maps");
+
+SEC("cgroup/connect6")
+int sock_connect(struct bpf_sock_addr *ctx)
+{
+	int key = 0;
+	__u64 sock_cookie = 0;
+	__u32 keyc = 0;
+
+	if (ctx->family != AF_INET6 || ctx->user_family != AF_INET6)
+		return 1;
+
+	sock_cookie = bpf_get_socket_cookie(ctx);
+	if (ctx->protocol == IPPROTO_TCP)
+		bpf_map_update_elem(&tcp_conn_sockets, &key, &sock_cookie, 0);
+	else if (ctx->protocol == IPPROTO_UDP)
+		bpf_map_update_elem(&udp_conn_sockets, &keyc, &sock_cookie, 0);
+	else
+		return 1;
+
+	return 1;
+}
+
+SEC("iter/tcp")
+int iter_tcp6_client(struct bpf_iter__tcp *ctx)
+{
+	struct sock_common *sk_common = ctx->sk_common;
+	struct seq_file *seq = ctx->meta->seq;
+	__u64 sock_cookie = 0;
+	__u64 *val;
+	int key = 0;
+
+	if (!sk_common)
+		return 0;
+
+	if (sk_common->skc_family != AF_INET6)
+		return 0;
+
+	sock_cookie = bpf_get_socket_cookie(sk_common);
+	val = bpf_map_lookup_elem(&tcp_conn_sockets, &key);
+	if (!val)
+		return 0;
+	/* Destroy connected client sockets.
 */
+	if (sock_cookie == *val)
+		bpf_sock_destroy(sk_common);
+
+	return 0;
+}
+
+SEC("iter/tcp")
+int iter_tcp6_server(struct bpf_iter__tcp *ctx)
+{
+	struct sock_common *sk_common = ctx->sk_common;
+	struct seq_file *seq = ctx->meta->seq;
+	struct tcp6_sock *tcp_sk;
+	const struct inet_connection_sock *icsk;
+	const struct inet_sock *inet;
+	__u16 srcp;
+
+	if (!sk_common)
+		return 0;
+
+	if (sk_common->skc_family != AF_INET6)
+		return 0;
+
+	tcp_sk = bpf_skc_to_tcp6_sock(sk_common);
+	if (!tcp_sk)
+		return 0;
+
+	icsk = &tcp_sk->tcp.inet_conn;
+	inet = &icsk->icsk_inet;
+	srcp = bpf_ntohs(inet->inet_sport);
+
+	/* Destroy server sockets. */
+	if (srcp == SERVER_PORT)
+		bpf_sock_destroy(sk_common);
+
+	return 0;
+}
+
+SEC("iter/udp")
+int iter_udp6_client(struct bpf_iter__udp *ctx)
+{
+	struct seq_file *seq = ctx->meta->seq;
+	struct udp_sock *udp_sk = ctx->udp_sk;
+	struct sock *sk = (struct sock *) udp_sk;
+	__u64 sock_cookie = 0, *val;
+	int key = 0;
+
+	if (!sk)
+		return 0;
+
+	sock_cookie = bpf_get_socket_cookie(sk);
+	val = bpf_map_lookup_elem(&udp_conn_sockets, &key);
+	if (!val)
+		return 0;
+	/* Destroy connected client sockets. */
+	if (sock_cookie == *val)
+		bpf_sock_destroy((struct sock_common *)sk);
+
+	return 0;
+}
+
+SEC("iter/udp")
+int iter_udp6_server(struct bpf_iter__udp *ctx)
+{
+	struct seq_file *seq = ctx->meta->seq;
+	struct udp_sock *udp_sk = ctx->udp_sk;
+	struct sock *sk = (struct sock *) udp_sk;
+	__u16 srcp;
+	struct inet_sock *inet;
+
+	if (!sk)
+		return 0;
+
+	inet = &udp_sk->inet;
+	srcp = bpf_ntohs(inet->inet_sport);
+	if (srcp == SERVER_PORT)
+		bpf_sock_destroy((struct sock_common *)sk);
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";