From patchwork Tue May 23 02:56:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 13251450 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 14F336ADC; Tue, 23 May 2023 02:56:25 +0000 (UTC) Received: from mail-pf1-x430.google.com (mail-pf1-x430.google.com [IPv6:2607:f8b0:4864:20::430]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 683C4CD; Mon, 22 May 2023 19:56:23 -0700 (PDT) Received: by mail-pf1-x430.google.com with SMTP id d2e1a72fcca58-64d41d8bc63so2752737b3a.0; Mon, 22 May 2023 19:56:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1684810583; x=1687402583; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=WYY6gCktLyS2evy87Xcih8x5aN4rar4oFOrscwwNY2s=; b=lZC/2SgJcMrs0jGe/tCzNuQn36XzL0xjCopwy2crblttCbMf0EYSFrztwwgZwG9dy0 v7cjKoTQTBwjyXOwDXAZwLiJZiERvH9og582MRSQpNJg0ge9+mr2vCqXgVbURcFTYwat JdJUwL5Ae56UhoN4q8ArfyYdz8cHJ0hLSaoWmmIenVFe1+WQXbmn2WRbdj09BaHjhkfH 874RQtWLmxlsUTUxGdKmhE2n+fVZOo6JHK7AqcG8g404AtA7BjpVnf2mlgDvH1wYXA7U In4zZiVCU0sckPePxkO7emaRM8cZ6kLyAsFU07rAF66m/oH2KLMuni2LNM9hCm6EWdTo dRDw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684810583; x=1687402583; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=WYY6gCktLyS2evy87Xcih8x5aN4rar4oFOrscwwNY2s=; b=QokV6MlEvwavAWotGmIz73l5vB7/w4W+O//S6v5TGLwEC+REv8tKKrkpZ+dAjUM962 6ac6VCKt6EhSHe4aUvCkbujWdc9J8fTMticQiuBYgn/dGhiZXNvPpm/9weTovPN7thv4 
bHBrvkBYWMas/O24t2a36Aa46MYnAg9zB/gTdeTAIL0L+zigEEKTrdd1Gq7nAy/KxHij eSOYQh5Mh5gYohSa3KhKEgPYhKyJLZlWJQWwkl1378mHB4JG/QdGhDB67l0loizrudbf YDk0mUxIVydxcRcRbREsFdY1w6Y/3j4Cvu59Bb27QDKWFR1OJfIVt2QY8fZfXUPrybqI iWtA== X-Gm-Message-State: AC+VfDxPzG5wo1vuz3CPJXE4Td6999GGVSlAuK3HqyOAQtENM17dYvjz qaGzIwfnA93Crs7P9eNXnTQBG5iWLFw= X-Google-Smtp-Source: ACHHUZ7ALlnpYpfhmdUufnqoRSewxcjYtitmRObweRuZZqY/7JnrqSnUFBSZpImg3W68sTmoKYZWhQ== X-Received: by 2002:a17:902:d511:b0:1ab:7c4:eb24 with SMTP id b17-20020a170902d51100b001ab07c4eb24mr16548053plg.22.1684810582844; Mon, 22 May 2023 19:56:22 -0700 (PDT) Received: from john.lan ([2605:59c8:148:ba10:82a6:5b19:9c99:3aad]) by smtp.gmail.com with ESMTPSA id h10-20020a170902748a00b001a67759f9f8sm5508285pll.106.2023.05.22.19.56.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 May 2023 19:56:22 -0700 (PDT) From: John Fastabend To: jakub@cloudflare.com, daniel@iogearbox.net Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org, edumazet@google.com, ast@kernel.org, andrii@kernel.org, will@isovalent.com Subject: [PATCH bpf v10 01/14] bpf: sockmap, pass skb ownership through read_skb Date: Mon, 22 May 2023 19:56:05 -0700 Message-Id: <20230523025618.113937-2-john.fastabend@gmail.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20230523025618.113937-1-john.fastabend@gmail.com> References: <20230523025618.113937-1-john.fastabend@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net The read_skb hook calls consume_skb() now, but this means that if the recv_actor program wants to 
use the skb it needs to increment the refcount so that consume_skb() doesn't free the sk_buff. This is problematic because in some error cases under memory pressure we may need to linearize the sk_buff from sk_psock_skb_ingress_enqueue(). Then we get this,

 skb_linearize()
   __pskb_pull_tail()
     pskb_expand_head()
       BUG_ON(skb_shared(skb))

Because we incremented the users refcount from sk_psock_verdict_recv(), the skb is shared (refcount > 1) and we trip the BUG_ON. To fix, let's simply pass ownership of the sk_buff through the read_skb call. Then we can drop the consume from the read_skb handlers and assume the verdict recv does any required kfree.

Bug found while testing in our CI, which runs in VMs that hit memory constraints rather regularly. William tested TCP read_skb handlers.

[ 106.536188] ------------[ cut here ]------------
[ 106.536197] kernel BUG at net/core/skbuff.c:1693!
[ 106.536479] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[ 106.536726] CPU: 3 PID: 1495 Comm: curl Not tainted 5.19.0-rc5 #1
[ 106.537023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.16.0-1 04/01/2014
[ 106.537467] RIP: 0010:pskb_expand_head+0x269/0x330
[ 106.538585] RSP: 0018:ffffc90000138b68 EFLAGS: 00010202
[ 106.538839] RAX: 000000000000003f RBX: ffff8881048940e8 RCX: 0000000000000a20
[ 106.539186] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff8881048940e8
[ 106.539529] RBP: ffffc90000138be8 R08: 00000000e161fd1a R09: 0000000000000000
[ 106.539877] R10: 0000000000000018 R11: 0000000000000000 R12: ffff8881048940e8
[ 106.540222] R13: 0000000000000003 R14: 0000000000000000 R15: ffff8881048940e8
[ 106.540568] FS: 00007f277dde9f00(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
[ 106.540954] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 106.541227] CR2: 00007f277eeede64 CR3: 000000000ad3e000 CR4: 00000000000006e0
[ 106.541569] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 106.541915] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [
106.542255] Call Trace: [ 106.542383] [ 106.542487] __pskb_pull_tail+0x4b/0x3e0 [ 106.542681] skb_ensure_writable+0x85/0xa0 [ 106.542882] sk_skb_pull_data+0x18/0x20 [ 106.543084] bpf_prog_b517a65a242018b0_bpf_skskb_http_verdict+0x3a9/0x4aa9 [ 106.543536] ? migrate_disable+0x66/0x80 [ 106.543871] sk_psock_verdict_recv+0xe2/0x310 [ 106.544258] ? sk_psock_write_space+0x1f0/0x1f0 [ 106.544561] tcp_read_skb+0x7b/0x120 [ 106.544740] tcp_data_queue+0x904/0xee0 [ 106.544931] tcp_rcv_established+0x212/0x7c0 [ 106.545142] tcp_v4_do_rcv+0x174/0x2a0 [ 106.545326] tcp_v4_rcv+0xe70/0xf60 [ 106.545500] ip_protocol_deliver_rcu+0x48/0x290 [ 106.545744] ip_local_deliver_finish+0xa7/0x150 Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Reported-by: William Findlay Tested-by: William Findlay Reviewed-by: Jakub Sitnicki Signed-off-by: John Fastabend --- net/core/skmsg.c | 2 -- net/ipv4/tcp.c | 1 - net/ipv4/udp.c | 7 ++----- net/unix/af_unix.c | 7 ++----- net/vmw_vsock/virtio_transport_common.c | 5 +---- 5 files changed, 5 insertions(+), 17 deletions(-) diff --git a/net/core/skmsg.c b/net/core/skmsg.c index f81883759d38..4a3dc8d27295 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -1183,8 +1183,6 @@ static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb) int ret = __SK_DROP; int len = skb->len; - skb_get(skb); - rcu_read_lock(); psock = sk_psock(sk); if (unlikely(!psock)) { diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 4d6392c16b7a..e914e3446377 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1773,7 +1773,6 @@ int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor) WARN_ON_ONCE(!skb_set_owner_sk_safe(skb, sk)); tcp_flags = TCP_SKB_CB(skb)->tcp_flags; used = recv_actor(sk, skb); - consume_skb(skb); if (used < 0) { if (!copied) copied = used; diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index aa32afd871ee..9482def1f310 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1818,7 +1818,7 @@ EXPORT_SYMBOL(__skb_recv_udp); int udp_read_skb(struct 
sock *sk, skb_read_actor_t recv_actor) { struct sk_buff *skb; - int err, copied; + int err; try_again: skb = skb_recv_udp(sk, MSG_DONTWAIT, &err); @@ -1837,10 +1837,7 @@ int udp_read_skb(struct sock *sk, skb_read_actor_t recv_actor) } WARN_ON_ONCE(!skb_set_owner_sk_safe(skb, sk)); - copied = recv_actor(sk, skb); - kfree_skb(skb); - - return copied; + return recv_actor(sk, skb); } EXPORT_SYMBOL(udp_read_skb); diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index cc695c9f09ec..e7728b57a8c7 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -2553,7 +2553,7 @@ static int unix_read_skb(struct sock *sk, skb_read_actor_t recv_actor) { struct unix_sock *u = unix_sk(sk); struct sk_buff *skb; - int err, copied; + int err; mutex_lock(&u->iolock); skb = skb_recv_datagram(sk, MSG_DONTWAIT, &err); @@ -2561,10 +2561,7 @@ static int unix_read_skb(struct sock *sk, skb_read_actor_t recv_actor) if (!skb) return err; - copied = recv_actor(sk, skb); - kfree_skb(skb); - - return copied; + return recv_actor(sk, skb); } /* diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index e4878551f140..b769fc258931 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -1441,7 +1441,6 @@ int virtio_transport_read_skb(struct vsock_sock *vsk, skb_read_actor_t recv_acto struct sock *sk = sk_vsock(vsk); struct sk_buff *skb; int off = 0; - int copied; int err; spin_lock_bh(&vvs->rx_lock); @@ -1454,9 +1453,7 @@ int virtio_transport_read_skb(struct vsock_sock *vsk, skb_read_actor_t recv_acto if (!skb) return err; - copied = recv_actor(sk, skb); - kfree_skb(skb); - return copied; + return recv_actor(sk, skb); } EXPORT_SYMBOL_GPL(virtio_transport_read_skb); From patchwork Tue May 23 02:56:06 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 13251451 X-Patchwork-Delegate: bpf@iogearbox.net Received: from 
lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E4FE56FCC; Tue, 23 May 2023 02:56:26 +0000 (UTC) Received: from mail-pf1-x436.google.com (mail-pf1-x436.google.com [IPv6:2607:f8b0:4864:20::436]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2F73ACA; Mon, 22 May 2023 19:56:25 -0700 (PDT) Received: by mail-pf1-x436.google.com with SMTP id d2e1a72fcca58-64a9335a8e7so2933358b3a.0; Mon, 22 May 2023 19:56:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1684810584; x=1687402584; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=2hgnG9cuwOu31Gm/saBWLMh9Jksf/Q9YgmS45zoBKS4=; b=MgY3It4M9X7IYbneZwcIDDSnQwdVZM8JwsVPmUutRZcWvRKBmf0OZEMj5EmdQW1O01 masjJl7TyO+9H+BboY4Wv4nojooUhg2SYItga0EEHaR4nrBN1SAtP2scgF7w/xnqO6IB NQHApFTLw/OnoZ9VN5R7lbq0ybypeTpjEq3eX9swgmbKu9J2EYPgxNCWvz6Qc/bEFDmd 7v/CYasDPJKmetEtfqeQfzkhekn5fUsLkRv2yQW7XLU3c4GGFqIg6OaA1bzJ0AgqEUC7 an3a48tnJpX3fcGJzr/4dYBY6Qzb2uNHw9tqKUECQ3ElD6yFu9c7C9mc5uR27J0dCcTj MhFw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684810584; x=1687402584; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=2hgnG9cuwOu31Gm/saBWLMh9Jksf/Q9YgmS45zoBKS4=; b=JT34MuEULy3HCc68LhzHnk4Y8cCauQeSfw2M7QdMR9ZR8hThwhQgSVSWSmmBFUxKSN m7rF8nPgOtcLkG+Zg2wYfbMOKLnqsbRgkCRnk+7SDnRB4dEJjOwphhoWo5GEzXKJPMFK U9427kW5R6MqSKeE7r5EF+jJr3QhGsZmBGX0ovXdAJi5Gdfu9YKJ3RV9Jof/VjsVX2LY q1V17+VPkHw74Oh9EJdxV/yyC0R+BkRLcGsWbN1OXaEG7qUEJvwV1Zdn5p6DmQ5AU3Ak +WywhYWHMvMI094LMbr7CiaYwOxUns09gQMLyBAF0qNrJChIueqM4VMKciBdySIFz7/e R5pQ== X-Gm-Message-State: 
AC+VfDyigzy4gS2ibxw230Z3K6PWLGvFwU638uIYPZLkb/y89lWQreBj peROftdvFqesI1QGrxE5uD4= X-Google-Smtp-Source: ACHHUZ7fvc7SJ4MJdGdA1Dh6r/lsb0iGgTxAtZfK24mwAMndq6Q3tuXWeaQwNH1IIdZ782lGangyuQ== X-Received: by 2002:a17:902:d50f:b0:1a9:7dc2:9427 with SMTP id b15-20020a170902d50f00b001a97dc29427mr18311074plg.21.1684810584479; Mon, 22 May 2023 19:56:24 -0700 (PDT) Received: from john.lan ([2605:59c8:148:ba10:82a6:5b19:9c99:3aad]) by smtp.gmail.com with ESMTPSA id h10-20020a170902748a00b001a67759f9f8sm5508285pll.106.2023.05.22.19.56.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 May 2023 19:56:23 -0700 (PDT) From: John Fastabend To: jakub@cloudflare.com, daniel@iogearbox.net Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org, edumazet@google.com, ast@kernel.org, andrii@kernel.org, will@isovalent.com Subject: [PATCH bpf v10 02/14] bpf: sockmap, convert schedule_work into delayed_work Date: Mon, 22 May 2023 19:56:06 -0700 Message-Id: <20230523025618.113937-3-john.fastabend@gmail.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20230523025618.113937-1-john.fastabend@gmail.com> References: <20230523025618.113937-1-john.fastabend@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net Sk_buffs are fed into sockmap verdict programs either from a strparser (when the user might want to decide how framing of skb is done by attaching another parser program) or directly through tcp_read_sock. The tcp_read_sock is the preferred method for performance when the BPF logic is a stream parser. 
The flow for Cilium's common use case with a stream parser is,

 tcp_read_sock()
   sk_psock_verdict_recv
     ret = bpf_prog_run_pin_on_cpu()
     sk_psock_verdict_apply(sock, skb, ret)
       // if system is under memory pressure or app is slow we may
       // need to queue skb. Do this queuing through ingress_skb and
       // then kick timer to wake up handler
       skb_queue_tail(ingress_skb, skb)
       schedule_work(work);

The work queue is wired up to sk_psock_backlog(). This will then walk the ingress_skb skb list that holds our sk_buffs that could not be handled, but should be OK to run at some later point. However, it's possible that the workqueue doing this work still hits an error when sending the skb. When this happens the skbuff is requeued on a temporary 'state' struct kept with the workqueue. This is necessary because it's possible to partially send an skbuff before hitting an error and we need to know how and where to restart when the workqueue runs next.

Now for the trouble: we don't rekick the workqueue. This can cause a stall where the skbuff we just cached on the state variable might never be sent. This happens when it's the last packet in a flow and no further packets come along that would cause the system to kick the workqueue from that side.

To fix we could do a simple schedule_work(), but while under memory pressure it makes sense to back off some instead of continuing to retry repeatedly. So instead, convert schedule_work to schedule_delayed_work and add backoff logic to reschedule from the backlog queue on errors. It's not obvious though what a good backoff is, so use '1' (a delay of one jiffy).

To test we observed some flakes while running the NGINX compliance test with sockmap; we attributed these failed tests to this bug and the subsequent issue.

From on-list discussion: this commit

 bec217197b41 ("skmsg: Schedule psock work if the cached skb exists on the psock")

was intended to address a similar race, but had a couple cases it missed.
Most obviously, it only accounted for receiving traffic on the local socket, so if redirecting into another socket we could still get an sk_buff stuck here. Next, it missed the case where copied=0 in the recv() handler and then we wouldn't kick the scheduler. Also, it's sub-optimal to require userspace to kick the internal mechanisms of sockmap to wake it up and copy data to user. It results in an extra syscall and requires the app to actually handle the EAGAIN correctly.

Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
Tested-by: William Findlay
Reviewed-by: Jakub Sitnicki
Signed-off-by: John Fastabend
--- include/linux/skmsg.h | 2 +- net/core/skmsg.c | 21 ++++++++++++++------- net/core/sock_map.c | 3 ++- 3 files changed, 17 insertions(+), 9 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 84f787416a54..904ff9a32ad6 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -105,7 +105,7 @@ struct sk_psock { struct proto *sk_proto; struct mutex work_mutex; struct sk_psock_work_state work_state; - struct work_struct work; + struct delayed_work work; struct rcu_work rwork; }; diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 4a3dc8d27295..0a9ee2acac0b 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -482,7 +482,7 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg, } out: if (psock->work_state.skb && copied > 0) - schedule_work(&psock->work); + schedule_delayed_work(&psock->work, 0); return copied; } EXPORT_SYMBOL_GPL(sk_msg_recvmsg); @@ -640,7 +640,8 @@ static void sk_psock_skb_state(struct sk_psock *psock, static void sk_psock_backlog(struct work_struct *work) { - struct sk_psock *psock = container_of(work, struct sk_psock, work); + struct delayed_work *dwork = to_delayed_work(work); + struct sk_psock *psock = container_of(dwork, struct sk_psock, work); struct sk_psock_work_state *state = &psock->work_state; struct sk_buff *skb = NULL; bool ingress; @@ -680,6 +681,12 @@ static void 
sk_psock_backlog(struct work_struct *work) if (ret == -EAGAIN) { sk_psock_skb_state(psock, state, skb, len, off); + + /* Delay slightly to prioritize any + * other work that might be here. + */ + if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) + schedule_delayed_work(&psock->work, 1); goto end; } /* Hard errors break pipe and stop xmit. */ @@ -734,7 +741,7 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node) INIT_LIST_HEAD(&psock->link); spin_lock_init(&psock->link_lock); - INIT_WORK(&psock->work, sk_psock_backlog); + INIT_DELAYED_WORK(&psock->work, sk_psock_backlog); mutex_init(&psock->work_mutex); INIT_LIST_HEAD(&psock->ingress_msg); spin_lock_init(&psock->ingress_lock); @@ -823,7 +830,7 @@ static void sk_psock_destroy(struct work_struct *work) sk_psock_done_strp(psock); - cancel_work_sync(&psock->work); + cancel_delayed_work_sync(&psock->work); mutex_destroy(&psock->work_mutex); psock_progs_drop(&psock->progs); @@ -938,7 +945,7 @@ static int sk_psock_skb_redirect(struct sk_psock *from, struct sk_buff *skb) } skb_queue_tail(&psock_other->ingress_skb, skb); - schedule_work(&psock_other->work); + schedule_delayed_work(&psock_other->work, 0); spin_unlock_bh(&psock_other->ingress_lock); return 0; } @@ -1018,7 +1025,7 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb, spin_lock_bh(&psock->ingress_lock); if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) { skb_queue_tail(&psock->ingress_skb, skb); - schedule_work(&psock->work); + schedule_delayed_work(&psock->work, 0); err = 0; } spin_unlock_bh(&psock->ingress_lock); @@ -1049,7 +1056,7 @@ static void sk_psock_write_space(struct sock *sk) psock = sk_psock(sk); if (likely(psock)) { if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) - schedule_work(&psock->work); + schedule_delayed_work(&psock->work, 0); write_space = psock->saved_write_space; } rcu_read_unlock(); diff --git a/net/core/sock_map.c b/net/core/sock_map.c index 7c189c2e2fbf..00afb66cd095 100644 --- 
a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -1644,9 +1644,10 @@ void sock_map_close(struct sock *sk, long timeout) rcu_read_unlock(); sk_psock_stop(psock); release_sock(sk); - cancel_work_sync(&psock->work); + cancel_delayed_work_sync(&psock->work); sk_psock_put(sk, psock); } + /* Make sure we do not recurse. This is a bug. * Leak the socket instead of crashing on a stack overflow. */ From patchwork Tue May 23 02:56:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 13251452 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B954479DF; Tue, 23 May 2023 02:56:27 +0000 (UTC) Received: from mail-pg1-x536.google.com (mail-pg1-x536.google.com [IPv6:2607:f8b0:4864:20::536]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 98718E0; Mon, 22 May 2023 19:56:26 -0700 (PDT) Received: by mail-pg1-x536.google.com with SMTP id 41be03b00d2f7-534696e4e0aso3921505a12.0; Mon, 22 May 2023 19:56:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1684810586; x=1687402586; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=8LDFpEJF2r3u4aoZRDO+MEf7dfGfkdbpGtCFF1Q2dVs=; b=sD0KAy/4foe/p2Mr5fyxWy/pMwTNAxI3QLVcHznhxj0koUy/tnW4TDCXloLYCtMtPG xqiw/asbzlOSS7RvfjFaSuBnX1aFMEIy932NvAZiEyOiG9kGK/ok05nTDCBCelAU/Wlz fBFnI7TTnbr5516/CU1YKiM6JlGTJuiCrJ0DXMGzeTaBf/uOREhcbQvzRQSNPaudV8mS 2uhu4ZMq1PKGCvTYa18/+xp0KPjn32V8CoA1ofkFTQd5EpOIVxAboiVtMZ+IntYyow7D HVjmxeZluV6TGenKJy4KA7lmJursVX0zdX3ie1fY20ZOGV8twM4/kXFeCrKoGnL/pdch lvMQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; 
t=1684810586; x=1687402586; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=8LDFpEJF2r3u4aoZRDO+MEf7dfGfkdbpGtCFF1Q2dVs=; b=FMz5ZXLR0QzQC15c9Me+1SiTfiYwe3qz4JRbICBmgBZq2adVrpEt8QQr/q6ysBusRd xz1snhgTlJxiRssMEdjIE2cxjD1K4ziYUx3REPAQi0DL6TGkgBGcIZbfESU/d0FA35RT um3OXYcxWey7739ST8/Bcp6XhNjF9SqIHzZ1MGQiujDsOh+l2I6XQv53D6jDGkZEIrjJ +TzbSUaUXm70uKFkXcvXiGKSCPSBCDX2Dih2I7S68bfpdmjlEUruFryv0nJF7L1UYNaJ yuq5RDNWU+UIx7JUFs17NwDQGAq9SFxR93F2zSdM5CuPzNwUZbaYuALrZvSdKMvyyIXn fbeQ== X-Gm-Message-State: AC+VfDwDB2x5KS8MWxzhoyBbp4Sz7tN6DgX9ou80KB2xTqUI3Wv5qdsn 7x92OIcs1KJJu10CDZOiBKc= X-Google-Smtp-Source: ACHHUZ4Rq3ZfaOYXiTHNotxGDfeiitiY29OVaLB/ppf1Xvokst1L7UQ/fk/PZZfUCtvyowz9je7Nwg== X-Received: by 2002:a17:902:d50e:b0:1ae:6720:8e01 with SMTP id b14-20020a170902d50e00b001ae67208e01mr14213921plg.20.1684810586061; Mon, 22 May 2023 19:56:26 -0700 (PDT) Received: from john.lan ([2605:59c8:148:ba10:82a6:5b19:9c99:3aad]) by smtp.gmail.com with ESMTPSA id h10-20020a170902748a00b001a67759f9f8sm5508285pll.106.2023.05.22.19.56.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 May 2023 19:56:25 -0700 (PDT) From: John Fastabend To: jakub@cloudflare.com, daniel@iogearbox.net Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org, edumazet@google.com, ast@kernel.org, andrii@kernel.org, will@isovalent.com Subject: [PATCH bpf v10 03/14] bpf: sockmap, reschedule is now done through backlog Date: Mon, 22 May 2023 19:56:07 -0700 Message-Id: <20230523025618.113937-4-john.fastabend@gmail.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20230523025618.113937-1-john.fastabend@gmail.com> References: <20230523025618.113937-1-john.fastabend@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, 
DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net

Now that the backlog manages the reschedule() logic correctly, we can drop the partial fix to reschedule from the recvmsg hook.

Rescheduling on the recvmsg hook was added to address a corner case where we still had data in the backlog state but had nothing to kick it and reschedule the backlog worker to run and finish copying data out of the state. This had a couple of limitations. First, it required user space to kick it, introducing an unnecessary EBUSY and retry. Second, it only handled the ingress case, and egress redirects would still be hung.

With the correct fix, pushing the reschedule logic down to where the ENOMEM error occurs, we can drop this fix.

Reviewed-by: Jakub Sitnicki
Fixes: bec217197b412 ("skmsg: Schedule psock work if the cached skb exists on the psock")
Signed-off-by: John Fastabend
--- net/core/skmsg.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 0a9ee2acac0b..76ff15f8bb06 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -481,8 +481,6 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg, msg_rx = sk_psock_peek_msg(psock); } out: - if (psock->work_state.skb && copied > 0) - schedule_delayed_work(&psock->work, 0); return copied; } EXPORT_SYMBOL_GPL(sk_msg_recvmsg);
From patchwork Tue May 23 02:56:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 13251453 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by 
smtp.subspace.kernel.org (Postfix) with ESMTPS id 23DACA950; Tue, 23 May 2023 02:56:30 +0000 (UTC) Received: from mail-pg1-x530.google.com (mail-pg1-x530.google.com [IPv6:2607:f8b0:4864:20::530]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AD676CA; Mon, 22 May 2023 19:56:28 -0700 (PDT) Received: by mail-pg1-x530.google.com with SMTP id 41be03b00d2f7-51b0f9d7d70so6179062a12.1; Mon, 22 May 2023 19:56:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1684810588; x=1687402588; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=AMWQXrGcXhHgmJZ5RnEojce3chjmU+rt7GqrTxtYxbI=; b=ELzI/2ygFwNYPjS4l2LSxZxiw8wWt7I+ayqSLgy/sC0KIs90afgnP1urC/iQ84jIIU o+KavJ3gnIHNP1oy3iReVPWbFFM5LottGr3/OJGhPG6gFCF+3uNFAsi8ZtCJSD5oxviF IiEXa60tEhKcqN3iu5ZMh/iA0q/G+2B1UISyRJ3FVOkIqJ3p4fqneYOJnl7GGqVk7eow 1wM5mBASjZa/aRuUj2QC99vwLRS8NXAo7q9pDhv242r29AR/q8YevuhvvP+v3kCdzKnU KLmsM5dSq9i6Ap4uK0nNtBPT9Hx4hYiECkzREtNKauVVwlBd2fBRCZgbdPqb3rAgd3x1 Mcyg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684810588; x=1687402588; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=AMWQXrGcXhHgmJZ5RnEojce3chjmU+rt7GqrTxtYxbI=; b=d86Y4YXhO8yCmKLVqpfizQxb42VhzvB4GucWty+TohB6oRYTD1Y+H84YRQTlsyLFit BmZ6ZA0tb4V/IRYLw8AVkO+PSdGMAklM9NoC0K6h6wwPCO9qfb4raeqBhjoR0XKu0JCx dVzTAeFO9qoGyMOiezelZiRfojc6fYe/t++qpt+iVhxDxM2m3QtuKzUKjkASF1wNnH0P /VD0Lc8a8nbE9xlJwNoQT+Lg1CCXFJkf7y+krLVtg0DuJGsThg8oCTlWMJwe1S/k+eXk xA9V1JJCqn9CYiZh42i7Jm4nzuiG2Ln8iAH8vovXLv9NiLmhRilOJmqpXoTOGRoDS7oA T7FA== X-Gm-Message-State: AC+VfDwka46x4sA7wf+5fT1tuMGnngKoH0rWtLZc306MT/ehWRcNccY1 pRlcAStdyHyRU/ZqzU+peOCQib54cUM= X-Google-Smtp-Source: ACHHUZ7N/AWXHIDdkLqhcdOYAw4DQHTprxXDN6s5lRGYhGy5hzAFu4FGJhwliP2y939f+Ix4qDU+SQ== X-Received: by 
2002:a17:902:ba8c:b0:1ad:f138:b2f6 with SMTP id k12-20020a170902ba8c00b001adf138b2f6mr10620855pls.16.1684810587878; Mon, 22 May 2023 19:56:27 -0700 (PDT) Received: from john.lan ([2605:59c8:148:ba10:82a6:5b19:9c99:3aad]) by smtp.gmail.com with ESMTPSA id h10-20020a170902748a00b001a67759f9f8sm5508285pll.106.2023.05.22.19.56.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 May 2023 19:56:27 -0700 (PDT) From: John Fastabend To: jakub@cloudflare.com, daniel@iogearbox.net Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org, edumazet@google.com, ast@kernel.org, andrii@kernel.org, will@isovalent.com Subject: [PATCH bpf v10 04/14] bpf: sockmap, improved check for empty queue Date: Mon, 22 May 2023 19:56:08 -0700 Message-Id: <20230523025618.113937-5-john.fastabend@gmail.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20230523025618.113937-1-john.fastabend@gmail.com> References: <20230523025618.113937-1-john.fastabend@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net

We noticed that, rarely, sk_buffs were stepping past the queue when the system was under memory pressure. The general theory is to skip enqueueing sk_buffs when it's not necessary, which is the normal case with a system that is properly provisioned for the task: no memory pressure and enough CPU assigned. But if we can't allocate memory due to an ENOMEM error when enqueueing the sk_buff into the sockmap receive queue, we push it onto a delayed workqueue to retry later. When a new sk_buff is received we then check if that queue is empty.
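
As a rough illustration of the bypass described above, here is a minimal user-space sketch. It is not the kernel code: struct rx_sock, struct backlog, try_deliver() and rx_packet() are hypothetical names made up for this example, and the enomem flag merely simulates an allocation failure on the direct path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these names are kernel APIs. */
#define BACKLOG_CAP 8
#define RXQ_CAP     16

struct pkt { int id; };

struct backlog {                 /* plays the role of the ingress skb queue */
	struct pkt *slots[BACKLOG_CAP];
	size_t head, len;
};

struct rx_sock {
	struct backlog backlog;
	int delivered[RXQ_CAP];  /* ids in the order they reached the receive queue */
	size_t ndelivered;
	bool enomem;             /* assumption: simulates a failed allocation */
};

static bool backlog_empty(const struct backlog *q)
{
	return q->len == 0;
}

static void backlog_push(struct backlog *q, struct pkt *p)
{
	assert(q->len < BACKLOG_CAP);
	q->slots[(q->head + q->len) % BACKLOG_CAP] = p;
	q->len++;
}

/* Direct path: put the packet straight on the receive queue. */
static int try_deliver(struct rx_sock *sk, struct pkt *p)
{
	if (sk->enomem)
		return -ENOMEM;
	assert(sk->ndelivered < RXQ_CAP);
	sk->delivered[sk->ndelivered++] = p->id;
	return 0;
}

/*
 * Bypass the backlog only when it is empty. If the backlog already holds
 * packets, or the direct path fails, queue behind them so a worker can
 * retry later. Returns true when the packet was delivered directly.
 */
static bool rx_packet(struct rx_sock *sk, struct pkt *p)
{
	int err = -EAGAIN;

	if (backlog_empty(&sk->backlog))
		err = try_deliver(sk, p);
	if (err < 0) {
		backlog_push(&sk->backlog, p);
		return false;
	}
	return true;
}
```

In this sketch ordering is preserved only as long as the emptiness check sees every packet that has not finished being delivered.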
However, there is a problem with simply checking the queue length. When an sk_buff is being processed from the ingress queue but is not yet on the sockmap msg receive queue, it's possible to also receive an sk_buff through the normal path. It will check the ingress queue, find its length is zero, and then skip ahead of the packet being processed.

Previously we used the sock lock from both contexts, which made the problem harder to hit, but not impossible.

To fix, instead of popping the skb from the queue entirely, we peek the skb from the queue and do the copy there. This ensures checks of the queue length are non-zero while the skb is being processed. Then finally, when the entire skb has been copied to the user space queue or another socket, we pop it off the queue. This way the queue length check allows bypassing the queue only after the list has been completely processed.

To reproduce the issue we run the NGINX compliance test with sockmap running and observe some flakes in our testing that we attributed to this issue.

Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
Tested-by: William Findlay
Suggested-by: Jakub Sitnicki
Signed-off-by: John Fastabend
Reviewed-by: Jakub Sitnicki
--- include/linux/skmsg.h | 1 - net/core/skmsg.c | 32 ++++++++------------------------ 2 files changed, 8 insertions(+), 25 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 904ff9a32ad6..054d7911bfc9 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -71,7 +71,6 @@ struct sk_psock_link { }; struct sk_psock_work_state { - struct sk_buff *skb; u32 len; u32 off; }; diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 76ff15f8bb06..bcd45a99a3db 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -622,16 +622,12 @@ static int sk_psock_handle_skb(struct sk_psock *psock, struct sk_buff *skb, static void sk_psock_skb_state(struct sk_psock *psock, struct sk_psock_work_state *state, - struct sk_buff *skb, int len, int off) { spin_lock_bh(&psock->ingress_lock); if (sk_psock_test_state(psock, 
SK_PSOCK_TX_ENABLED)) { - state->skb = skb; state->len = len; state->off = off; - } else { - sock_drop(psock->sk, skb); } spin_unlock_bh(&psock->ingress_lock); } @@ -642,23 +638,17 @@ static void sk_psock_backlog(struct work_struct *work) struct sk_psock *psock = container_of(dwork, struct sk_psock, work); struct sk_psock_work_state *state = &psock->work_state; struct sk_buff *skb = NULL; + u32 len = 0, off = 0; bool ingress; - u32 len, off; int ret; mutex_lock(&psock->work_mutex); - if (unlikely(state->skb)) { - spin_lock_bh(&psock->ingress_lock); - skb = state->skb; + if (unlikely(state->len)) { len = state->len; off = state->off; - state->skb = NULL; - spin_unlock_bh(&psock->ingress_lock); } - if (skb) - goto start; - while ((skb = skb_dequeue(&psock->ingress_skb))) { + while ((skb = skb_peek(&psock->ingress_skb))) { len = skb->len; off = 0; if (skb_bpf_strparser(skb)) { @@ -667,7 +657,6 @@ static void sk_psock_backlog(struct work_struct *work) off = stm->offset; len = stm->full_len; } -start: ingress = skb_bpf_ingress(skb); skb_bpf_redirect_clear(skb); do { @@ -677,8 +666,7 @@ static void sk_psock_backlog(struct work_struct *work) len, ingress); if (ret <= 0) { if (ret == -EAGAIN) { - sk_psock_skb_state(psock, state, skb, - len, off); + sk_psock_skb_state(psock, state, len, off); /* Delay slightly to prioritize any * other work that might be here. @@ -690,15 +678,16 @@ static void sk_psock_backlog(struct work_struct *work) /* Hard errors break pipe and stop xmit. */ sk_psock_report_error(psock, ret ? 
-ret : EPIPE); sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED); - sock_drop(psock->sk, skb); goto end; } off += ret; len -= ret; } while (len); - if (!ingress) + skb = skb_dequeue(&psock->ingress_skb); + if (!ingress) { kfree_skb(skb); + } } end: mutex_unlock(&psock->work_mutex); @@ -791,11 +780,6 @@ static void __sk_psock_zap_ingress(struct sk_psock *psock) skb_bpf_redirect_clear(skb); sock_drop(psock->sk, skb); } - kfree_skb(psock->work_state.skb); - /* We null the skb here to ensure that calls to sk_psock_backlog - * do not pick up the free'd skb. - */ - psock->work_state.skb = NULL; __sk_psock_purge_ingress_msg(psock); } @@ -814,7 +798,6 @@ void sk_psock_stop(struct sk_psock *psock) spin_lock_bh(&psock->ingress_lock); sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED); sk_psock_cork_free(psock); - __sk_psock_zap_ingress(psock); spin_unlock_bh(&psock->ingress_lock); } @@ -829,6 +812,7 @@ static void sk_psock_destroy(struct work_struct *work) sk_psock_done_strp(psock); cancel_delayed_work_sync(&psock->work); + __sk_psock_zap_ingress(psock); mutex_destroy(&psock->work_mutex); psock_progs_drop(&psock->progs); From patchwork Tue May 23 02:56:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 13251454 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2188CBA3A; Tue, 23 May 2023 02:56:32 +0000 (UTC) Received: from mail-pg1-x533.google.com (mail-pg1-x533.google.com [IPv6:2607:f8b0:4864:20::533]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 37C72CA; Mon, 22 May 2023 19:56:30 -0700 (PDT) Received: by mail-pg1-x533.google.com with SMTP id 41be03b00d2f7-51f1b6e8179so4800846a12.3; Mon, 22 May 2023 19:56:30 -0700 (PDT) DKIM-Signature: 
v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1684810589; x=1687402589; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=1X8MavyMxt9ea7RvB7abRtbZx5QchGwJbc+sm6sp6gI=; b=fbcv3eWbWfOQ8G2DHCfVDBgYgHhv+c8mVnUevD0YHwHDoQ11EbCpWoKK3zsjSGLg+o RjNlJ8qXfiiBo8h5CpmqlZ6qqnOHEzdmCYNsYevU0JlHNR8SDgyVJ8It+aWrZob/cHcv RKLMVo6aa/yTqdiyzl3ajuozwuEDn4Go7EHYcOHuiIezkT8p0g73gNSzd/keIiNwcRrY MVzTIKJWBg4MUyvyWkfVf6485ehv/9J57MTpNtWXC9SZQ7sGRw4Vs9kjfdgeR1vabz+d G7Tjhix1HoBVoEz6xmZVhFdVVLxMf/Af5cEKiR/sxqkkGOxpwFyUczb3F+oKhlbnocBd SRsA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684810589; x=1687402589; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=1X8MavyMxt9ea7RvB7abRtbZx5QchGwJbc+sm6sp6gI=; b=FcLClSlGPS8/Yon/yzgJY97etyEUE8SO/sS9/OgFWMoBe0/vVB0UprCs5NC6q1X28L 9RH4e2vrU5HX9mCGVwvL9Y5JCL0okkb1pfPSZr8b95Te3oTRTUrW9cR8QX8hEy0rVwHB nHbFd3LJEXWVTRUXLvNsziQhp+mznsZMspYxcePpAjuzzcIXAvBS+p+/GeMmnBbj0K/X w0r4eN0yosz0pZ/YqnnMtVF9h8SbhfJaD9LH7OVf6HnLamhR5pGrfgoZuFqT3qIfJLHa X7FPlLJr+GIFYqB8em8/jQLK0pyUMkP+6Hx4OS61zue6KCcqzPLqB+lUWLs6U5jbLpCL rUeg== X-Gm-Message-State: AC+VfDyEqY6+uEUbdVSXZV7WLqXOWwyqlElRmAAPtZJ8H8HuQ2f/VWlT HtJTNqLzjqHqrM4EC2ERqho= X-Google-Smtp-Source: ACHHUZ4t5bPlcj8XxkeuxCbbb5+12p5YVwFNrXDzn2il+ZyWPtQfnpAc5aYUNUeghGqFaVNH2ouG4w== X-Received: by 2002:a17:903:24c:b0:1ac:3ddf:2299 with SMTP id j12-20020a170903024c00b001ac3ddf2299mr14341990plh.44.1684810589656; Mon, 22 May 2023 19:56:29 -0700 (PDT) Received: from john.lan ([2605:59c8:148:ba10:82a6:5b19:9c99:3aad]) by smtp.gmail.com with ESMTPSA id h10-20020a170902748a00b001a67759f9f8sm5508285pll.106.2023.05.22.19.56.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 May 2023 19:56:29 -0700 (PDT) From: John Fastabend To: 
jakub@cloudflare.com, daniel@iogearbox.net Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org, edumazet@google.com, ast@kernel.org, andrii@kernel.org, will@isovalent.com Subject: [PATCH bpf v10 05/14] bpf: sockmap, handle fin correctly Date: Mon, 22 May 2023 19:56:09 -0700 Message-Id: <20230523025618.113937-6-john.fastabend@gmail.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20230523025618.113937-1-john.fastabend@gmail.com> References: <20230523025618.113937-1-john.fastabend@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net The sockmap code is returning EAGAIN after a FIN packet is received and no more data is on the receive queue. The correct behavior is to return 0 to the user so the user can then close the socket. The EAGAIN causes many apps to retry, which masks the problem. Eventually the socket is evicted from the sockmap because it's released from sockmap sock free handling. The issue creates a delay and can cause some errors on the application side. To fix this, check on the sk_msg_recvmsg side whether the length is zero and the FIN flag is set, and if so set the return value to zero. A selftest will be added to check this condition.
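Seen from the application, the behavior described above can be sketched roughly as follows. This is a user-space illustration only and is not part of the patch: drain_until_eof() is a hypothetical helper name, and the loop assumes a blocking stream socket. It shows why a return of 0 matters: that is the only signal that tells the reader the peer has sent its FIN, whereas EAGAIN just means "try again".

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not from the patch): read from fd until the peer's
 * FIN is observed. Returns the number of bytes read before end-of-stream,
 * or -1 on a hard error.
 */
long drain_until_eof(int fd)
{
	char buf[256];
	long total = 0;

	for (;;) {
		ssize_t n = recv(fd, buf, sizeof(buf), 0);

		if (n > 0) {
			total += n;
			continue;
		}
		if (n == 0)
			return total;	/* FIN seen and no data left */
		/* Transient conditions: retry. A non-blocking caller would
		 * normally poll() here instead of spinning.
		 */
		if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
			continue;
		return -1;		/* hard error */
	}
}
```

If the kernel kept answering EAGAIN after the FIN, a loop like this would never reach the `n == 0` branch, which matches the retry behavior the commit message describes.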
Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Tested-by: William Findlay Reviewed-by: Jakub Sitnicki Signed-off-by: John Fastabend --- net/ipv4/tcp_bpf.c | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index 2e9547467edb..73c13642d47f 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -174,6 +174,24 @@ static int tcp_msg_wait_data(struct sock *sk, struct sk_psock *psock, return ret; } +static bool is_next_msg_fin(struct sk_psock *psock) +{ + struct scatterlist *sge; + struct sk_msg *msg_rx; + int i; + + msg_rx = sk_psock_peek_msg(psock); + i = msg_rx->sg.start; + sge = sk_msg_elem(msg_rx, i); + if (!sge->length) { + struct sk_buff *skb = msg_rx->skb; + + if (skb && TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) + return true; + } + return false; +} + static int tcp_bpf_recvmsg_parser(struct sock *sk, struct msghdr *msg, size_t len, @@ -196,6 +214,19 @@ static int tcp_bpf_recvmsg_parser(struct sock *sk, lock_sock(sk); msg_bytes_ready: copied = sk_msg_recvmsg(sk, psock, msg, len, flags); + /* The typical case for EFAULT is the socket was gracefully + * shutdown with a FIN pkt. So check here the other case is + * some error on copy_page_to_iter which would be unexpected. + * On fin return correct return code to zero. 
+ */ + if (copied == -EFAULT) { + bool is_fin = is_next_msg_fin(psock); + + if (is_fin) { + copied = 0; + goto out; + } + } if (!copied) { long timeo; int data; From patchwork Tue May 23 02:56:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 13251456 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 49CD5C129; Tue, 23 May 2023 02:56:37 +0000 (UTC) Received: from mail-pl1-x62f.google.com (mail-pl1-x62f.google.com [IPv6:2607:f8b0:4864:20::62f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 157B2E0; Mon, 22 May 2023 19:56:32 -0700 (PDT) Received: by mail-pl1-x62f.google.com with SMTP id d9443c01a7336-1ae85b71141so36219375ad.0; Mon, 22 May 2023 19:56:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1684810591; x=1687402591; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=xp4LbGPkNU6EN2wBzfXws+yiU1Dph6UgcZtiAxf5AYY=; b=X1roESqs+aWMANqz1vyFLX4+kbHYcXYHRClEtYho7oy+mcxCEkqf7pC2+ZmX7Rz/BY 6nORmVlqsgy+lHoCny/EMFxFN9X1/9hwEqK9PB1xyA5XaIJiDFdR3EVlvmmGeUiKeNzA fmzmzvw7c9nUu2leiJPCJ6EKPdQ9oOh9qjgRgbwAcpvConhYYLBDqptNcY0e1068tw4C a8OMRD+0fWCScej2NBAsuOME7bOOYS+VVwPC+lYrNhV6USMDy77WZNf5IzDYtkofEUyE JYQ+AhSoCDxF6BzHWnweXTcOnzZGP8e5EVnPyzczZzs2w6NFqiOMrUePJAH86TDewZap Dssg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684810591; x=1687402591; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=xp4LbGPkNU6EN2wBzfXws+yiU1Dph6UgcZtiAxf5AYY=; 
b=jttjE+NtLxUWNKojQemkgs+47XWJ20r9cANFt/S94vWottM8mreXZvtcR6Pvc/4X7g Uw8sMsUzLvYNoeUPBVZVIfWNZ3rdcZouOqYkl8kCzyrA8T9mKoLUzI3lM7mwPbwQ2h2k IyVnc6vBM1thuJkm40sZ9OvYsSTxDHoIH8anq7XY/RgfCrveHNmvIbhNTid7FEED2rtz SougUB87DJil464lMIIAW/cZz6OSeB8cHxT/1/QfgK1i+UQ9n+lKlViohXxoz8+AT45/ GWSuyf9qwU4duCyeVuxooQBvThu8c57TjAmbSIJl+/OUwj7nM4wwVU7cSlGsmgLXM/yG hbvw== X-Gm-Message-State: AC+VfDy6lezS/fmy094PA67y1REKRHbko9EeT8YbwHVf9e5+iCRtRa3a xMd9nqG+GpU3s46rRF1iKsc= X-Google-Smtp-Source: ACHHUZ7F/yWscFhYJQUQm7JAA5B7V0yzQuX3UbtOpBWYHP8R7lvZlQoMzbjzN6zCN4FqFa1aXMZ35Q== X-Received: by 2002:a17:902:f2cc:b0:1ac:72ff:9853 with SMTP id h12-20020a170902f2cc00b001ac72ff9853mr10480838plc.30.1684810591274; Mon, 22 May 2023 19:56:31 -0700 (PDT) Received: from john.lan ([2605:59c8:148:ba10:82a6:5b19:9c99:3aad]) by smtp.gmail.com with ESMTPSA id h10-20020a170902748a00b001a67759f9f8sm5508285pll.106.2023.05.22.19.56.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 May 2023 19:56:30 -0700 (PDT) From: John Fastabend To: jakub@cloudflare.com, daniel@iogearbox.net Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org, edumazet@google.com, ast@kernel.org, andrii@kernel.org, will@isovalent.com Subject: [PATCH bpf v10 06/14] bpf: sockmap, TCP data stall on recv before accept Date: Mon, 22 May 2023 19:56:10 -0700 Message-Id: <20230523025618.113937-7-john.fastabend@gmail.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20230523025618.113937-1-john.fastabend@gmail.com> References: <20230523025618.113937-1-john.fastabend@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net 
X-Patchwork-Delegate: bpf@iogearbox.net

A common mechanism to put a TCP socket into the sockmap is to hook the BPF_SOCK_OPS_{ACTIVE,PASSIVE}_ESTABLISHED_CB events with a BPF program that can map the socket info to the correct BPF verdict parser. When the user adds the socket to the map the psock is created and the new ops are assigned to ensure the verdict program will 'see' the sk_buffs as they arrive. Part of this process hooks the sk_data_ready op with a BPF specific handler to wake up the BPF verdict program when data is ready to read. The logic is simple enough (posted here for easy reading):

  static void sk_psock_verdict_data_ready(struct sock *sk)
  {
          struct socket *sock = sk->sk_socket;

          if (unlikely(!sock || !sock->ops || !sock->ops->read_skb))
                  return;
          sock->ops->read_skb(sk, sk_psock_verdict_recv);
  }

The oversight here is that sk->sk_socket is not assigned until the application accept()s the new socket. However, it's entirely ok for the peer application to do a connect() followed immediately by sends. The socket on the receiver is sitting on the backlog queue of the listening socket until it's accepted and the data is queued up. If the peer never accepts the socket or is slow it will eventually hit data limits and rate limit the session. But, importantly for BPF sockmap hooks, when this data is received the TCP stack does the sk_data_ready() call but the read_skb() for this data is never called because sk_socket is missing. The data sits on the sk_receive_queue. Then once the socket is accepted, if we never receive more data from the peer there will be no further sk_data_ready calls and all the data is still on the sk_receive_queue. Then the user calls recvmsg after accept() and for TCP sockets in sockmap we use the tcp_bpf_recvmsg_parser() handler. The handler checks for data in the sk_msg ingress queue expecting that the BPF program has already run from the sk_data_ready hook and enqueued the data as needed. So we are stuck.
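The traffic pattern that triggers the stall can be reproduced from user space with a sketch like the one below. It is an illustration under assumptions, not code from the patch or its selftests: send_before_accept() is a hypothetical name, the sockets are plain loopback TCP with no sockmap attached, and error handling is reduced to a single failure return. Without sockmap the bytes sent before accept() are readable afterwards; the patch aims to keep that true when the socket is added to a sockmap.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: connect() and send() before the listener calls
 * accept(), then accept() and read. Returns the number of bytes read after
 * accept(), or -1 on any failure.
 */
long send_before_accept(const char *payload, size_t len,
			char *out, size_t outlen)
{
	struct sockaddr_in addr;
	socklen_t alen = sizeof(addr);
	int lfd = -1, cfd = -1, afd = -1;
	long ret = -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */

	lfd = socket(AF_INET, SOCK_STREAM, 0);
	if (lfd < 0)
		goto out;
	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;
	if (listen(lfd, 1) < 0)
		goto out;
	if (getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0)
		goto out;

	cfd = socket(AF_INET, SOCK_STREAM, 0);
	if (cfd < 0)
		goto out;
	/* The handshake completes in the kernel; accept() is not needed yet. */
	if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;
	/* The data is queued on the not-yet-accepted socket. */
	if (send(cfd, payload, len, 0) != (ssize_t)len)
		goto out;

	afd = accept(lfd, NULL, NULL);
	if (afd < 0)
		goto out;
	/* No more data will arrive; the queued bytes must still be readable. */
	ret = recv(afd, out, outlen, 0);
out:
	if (afd >= 0)
		close(afd);
	if (cfd >= 0)
		close(cfd);
	if (lfd >= 0)
		close(lfd);
	return ret;
}
```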
To fix this, do an unlikely check in the recvmsg handler for data on the sk_receive_queue and if it exists wake up data_ready. We have the sock locked in both read_skb and recvmsg so this should avoid having multiple runners. Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Reviewed-by: Jakub Sitnicki Signed-off-by: John Fastabend --- net/ipv4/tcp_bpf.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index 73c13642d47f..01dd76be1a58 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -212,6 +212,26 @@ static int tcp_bpf_recvmsg_parser(struct sock *sk, return tcp_recvmsg(sk, msg, len, flags, addr_len); lock_sock(sk); + + /* We may have received data on the sk_receive_queue pre-accept and + * then we can not use read_skb in this context because we haven't + * assigned a sk_socket yet so have no link to the ops. The work-around + * is to check the sk_receive_queue and in these cases read skbs off + * queue again. The read_skb hook is not running at this point because + * of lock_sock so we avoid having multiple runners in read_skb. + */ + if (unlikely(!skb_queue_empty(&sk->sk_receive_queue))) { + tcp_data_ready(sk); + /* This handles the ENOMEM errors if we both receive data + * pre accept and are already under memory pressure. At least + * let user know to retry.
+ */ + if (unlikely(!skb_queue_empty(&sk->sk_receive_queue))) { + copied = -EAGAIN; + goto out; + } + } + msg_bytes_ready: copied = sk_msg_recvmsg(sk, psock, msg, len, flags); /* The typical case for EFAULT is the socket was gracefully From patchwork Tue May 23 02:56:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 13251455 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9B668BE51; Tue, 23 May 2023 02:56:34 +0000 (UTC) Received: from mail-pg1-x536.google.com (mail-pg1-x536.google.com [IPv6:2607:f8b0:4864:20::536]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 55E5CCA; Mon, 22 May 2023 19:56:33 -0700 (PDT) Received: by mail-pg1-x536.google.com with SMTP id 41be03b00d2f7-52867360efcso4810329a12.2; Mon, 22 May 2023 19:56:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1684810593; x=1687402593; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=XBzXJoALt8kblYyN0yx9KfML90NNLIp7TFuBtKGUqOs=; b=Ls+m6uOOIDU+FT3L2QdEQtpoE5XiZ2P4uTg2bVncevjxAu6c5Rdzxi2EEJhWuWiwy+ 4FthRw4PoJKtM2nAAqibBUJEmx5/qghQuyD48WRYzcb/UxFVVTFv6ZKoXjrlkScWk3rH ZgFvv9GnwZgU5i/XTfq3LRxWKQI2OSY/5YKJURYwn6Zyqg+R/0Qy/Vs8M3ueziLUj1iM v2u4dAng8dbedsiK4Nk7p1IFgT6+yyie/Rao44AH7GhPJqLiQKWzPvHUPP/AopXqR5im l8Qu3LtPFQ3wTO1ucK13JILrVsGq0ZJnQkkK++8pUERtBtNCj8+B0TLd0DjP9ZzVLKgF JPYA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684810593; x=1687402593; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc 
:subject:date:message-id:reply-to; bh=XBzXJoALt8kblYyN0yx9KfML90NNLIp7TFuBtKGUqOs=; b=e/FugwQjJ5dhOB5LlBzzHzYefP+8f2hFu8H7hpZcZ09l5zF1EXDqmX8K/EuR82eSA1 5aMLLR7q28mny+uv1TxeVkcXR31YdPb0jr/pZRigs7XvgYqe8ad7Lj51Z3pQKsfLpZhF UYrDOTvlVLjrHpb0E4XrI4J7IMR3xwHN0UqorMACXqwvyi3HtRF6kYCsfxg6V6/iGMD1 tXH7qJzXZAhIlqC9xhSwKpAE0DenRFJcoTn2oQ8JNa/uxNXCqfcCsMVyW1FbWvIC0lP6 VPXF3cb9msm4nX7gjABkkvUQ7av0SOC/ZEKjHRsKi/x2t25jMGRV2QJ5O5YgEjMTNUFq Eb9A== X-Gm-Message-State: AC+VfDy8yQECouS5hHDcVtDbvQRmzpFu+2rBQC+tK1GysBQJYpPPAr6C olnPMTKNwRcsPAa8vc0mm08= X-Google-Smtp-Source: ACHHUZ7UqV7K7I/N0ZTuv0Jf7e64lHNCAyDM41sIuiYDhKqFsGDM2+VL4Wmu1gKJJYGDQhkpgMKY1Q== X-Received: by 2002:a17:902:e752:b0:1af:cd00:d4e4 with SMTP id p18-20020a170902e75200b001afcd00d4e4mr904144plf.47.1684810592809; Mon, 22 May 2023 19:56:32 -0700 (PDT) Received: from john.lan ([2605:59c8:148:ba10:82a6:5b19:9c99:3aad]) by smtp.gmail.com with ESMTPSA id h10-20020a170902748a00b001a67759f9f8sm5508285pll.106.2023.05.22.19.56.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 May 2023 19:56:32 -0700 (PDT) From: John Fastabend To: jakub@cloudflare.com, daniel@iogearbox.net Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org, edumazet@google.com, ast@kernel.org, andrii@kernel.org, will@isovalent.com Subject: [PATCH bpf v10 07/14] bpf: sockmap, wake up polling after data copy Date: Mon, 22 May 2023 19:56:11 -0700 Message-Id: <20230523025618.113937-8-john.fastabend@gmail.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20230523025618.113937-1-john.fastabend@gmail.com> References: <20230523025618.113937-1-john.fastabend@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net When the TCP stack has data ready to read, sk_data_ready() is called. Sockmap overwrites this with its own handler to call into the BPF verdict program. But the original TCP socket had sock_def_readable, which would additionally wake up any user space waiters with sk_wake_async(). Sockmap saved the callback when the socket was created, so call the saved data ready callback and then we can wake up any epoll() logic waiting on the read. Note we call it on 'copied >= 0' to account for returning 0 when a FIN is received, because we need to wake up the user for this as well so they can do the recvmsg() -> 0 and detect the shutdown. Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Reviewed-by: Jakub Sitnicki Signed-off-by: John Fastabend --- net/core/skmsg.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/net/core/skmsg.c b/net/core/skmsg.c index bcd45a99a3db..08be5f409fb8 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -1199,12 +1199,21 @@ static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb) static void sk_psock_verdict_data_ready(struct sock *sk) { struct socket *sock = sk->sk_socket; + int copied; trace_sk_data_ready(sk); if (unlikely(!sock || !sock->ops || !sock->ops->read_skb)) return; - sock->ops->read_skb(sk, sk_psock_verdict_recv); + copied = sock->ops->read_skb(sk, sk_psock_verdict_recv); + if (copied >= 0) { + struct sk_psock *psock; + + rcu_read_lock(); + psock = sk_psock(sk); + psock->saved_data_ready(sk); + rcu_read_unlock(); + } } void sk_psock_start_verdict(struct sock *sk, struct sk_psock *psock) From patchwork Tue May 23 02:56:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 13251458 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net
(lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E241017721; Tue, 23 May 2023 02:56:39 +0000 (UTC) Received: from mail-pg1-x534.google.com (mail-pg1-x534.google.com [IPv6:2607:f8b0:4864:20::534]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0F4E1CA; Mon, 22 May 2023 19:56:35 -0700 (PDT) Received: by mail-pg1-x534.google.com with SMTP id 41be03b00d2f7-5304d0d1eddso3712685a12.2; Mon, 22 May 2023 19:56:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1684810594; x=1687402594; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=PKVXOAHKFA789J7ho2XD1RIAcfngnJV04ADcLg/bfiw=; b=PLBOfDxhokrojxBBPJq3NEFU3LJCvjnED+g6f6xZt+A0ZxLxVS01ikoRchkpieKH/g 0UlxOz4lHwx4kqc9iYcCzAdvxHj/XKcF9L2zOWSFUxN/kGR+k3XLAVS6H9uySGW6KwvE cEWuB61OeBrWFSXfL0zIcBPFjq2y7XymX5o6zXqLOj8gomqYPgDaaCv43ncGAJG4KU09 5cwy20JkMVOlz4LeGojm94Qj8Tc//kZTBwZWpxJ87C1wIayMIwRENBQyXRLPJQA7NdPH BQamzsao93EmVoe7BAkuSSJq5GD4I5msKXgowV3+seebi/uTInGaGsSafzuYHhYHqEu7 oV2A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684810594; x=1687402594; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=PKVXOAHKFA789J7ho2XD1RIAcfngnJV04ADcLg/bfiw=; b=N9HTbiEjwizMvA2EeP6cWNK6S33jF4H24L/ch9hPYL3JIFcNVCOwKdqtEcAfx+Piu/ P1tv5kykksV6hs5Xq9sWrD1emvQX6fmj/nWKFAUwxaDBgl1QNnOiymqn5/jzyrLFgMOV CiPkMDBwem2drEhj2cYUeDW0Js86qxROPBdinhNYgklc6JJ/Y8El3pCFeER6couSyOr9 3+u3ZSCMYvlUgN9hlw0tEnR1BoHspJMIfxjrl5cKWBe4Ob0HCZ6Yx6xMN8vsNeXC0SeN JixBB4fLkzXFfMIYl3pHsHSoRz3Pk5wnQzQuZLXO5Zo83ErQtR/0ccDXVVotY/dgqooQ TKyg== X-Gm-Message-State: AC+VfDwLNghUb21+GWfK/0ZFboOnYi54DeU6bTiviteQgSJD21Zy/S1i 
9zYrc7Q+ARAqu9N8+Ixy7as= X-Google-Smtp-Source: ACHHUZ7KNoPV7skQ1YdBUPxAlD+7EXy3CMKYU8RVizQgIOSiKmzZcvoKNnaJYdeW0Rlkcl7wysD7xg== X-Received: by 2002:a17:902:a589:b0:1ad:dd21:2691 with SMTP id az9-20020a170902a58900b001addd212691mr14232613plb.10.1684810594420; Mon, 22 May 2023 19:56:34 -0700 (PDT) Received: from john.lan ([2605:59c8:148:ba10:82a6:5b19:9c99:3aad]) by smtp.gmail.com with ESMTPSA id h10-20020a170902748a00b001a67759f9f8sm5508285pll.106.2023.05.22.19.56.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 May 2023 19:56:33 -0700 (PDT) From: John Fastabend To: jakub@cloudflare.com, daniel@iogearbox.net Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org, edumazet@google.com, ast@kernel.org, andrii@kernel.org, will@isovalent.com Subject: [PATCH bpf v10 08/14] bpf: sockmap, incorrectly handling copied_seq Date: Mon, 22 May 2023 19:56:12 -0700 Message-Id: <20230523025618.113937-9-john.fastabend@gmail.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20230523025618.113937-1-john.fastabend@gmail.com> References: <20230523025618.113937-1-john.fastabend@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net The read_skb() logic is incrementing the tcp->copied_seq, which is used for, among other things, calculating how many outstanding bytes can be read by the application. This results in application errors: if the application does an ioctl(FIONREAD) we return zero because this is calculated from the copied_seq value.
To fix this we move tcp->copied_seq accounting into the recv handler so that we update it when the recvmsg() hook is called and data is in fact copied into user buffers. This gives an accurate FIONREAD value as expected and improves ACK handling. Before, we were calling tcp_rcv_space_adjust(), which would update 'number of bytes copied to user in last RTT', which is wrong for programs returning SK_PASS. The bytes are only copied to the user when recvmsg is handled. Doing the fix for recvmsg is straightforward, but fixing redirect and SK_DROP pkts is a bit trickier. Build a tcp_eat_skb() helper and then call this from the skmsg handlers. This fixes another issue where a broken socket with a BPF program doing a resubmit could hang the receiver. This happened because although read_skb() consumed the skb through sock_drop() it did not update the copied_seq. Now if a single recv socket is redirecting to many sockets (for example for lb) the receiver sk will be hung even though we might expect it to continue. The hang comes from not updating the copied_seq numbers and memory pressure resulting from that. We have a slight layering problem of calling tcp_eat_skb even if it's not a TCP socket. To fix this we could refactor and create per type receiver handlers. I decided this is more work than we want in the fix, and we already have some small tweaks depending on caller that use the helper skb_bpf_strparser(). So we extend that a bit and always set the strparser bit when it is in use, and then we can gate the copied_seq updates on this.
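The accounting problem can be illustrated with a deliberately simplified model. This is not kernel code: struct seq_model and the model_*() functions are hypothetical names, and the "readable bytes" formula is an assumption that reduces the FIONREAD calculation to rcv_nxt minus copied_seq, ignoring urgent data and FIN adjustments. It only shows why advancing copied_seq at read_skb() time makes the readable count collapse to zero before the user has read anything.

```c
#include <stdint.h>

/* Simplified model: only the two sequence counters relevant to the
 * readable-bytes calculation. Field names mirror struct tcp_sock.
 */
struct seq_model {
	uint32_t rcv_nxt;	/* next sequence number expected from the peer */
	uint32_t copied_seq;	/* head of data not yet consumed */
};

/* Bytes reported as readable; unsigned subtraction handles wraparound. */
uint32_t model_inq(const struct seq_model *m)
{
	return m->rcv_nxt - m->copied_seq;
}

/* Peer data arrives: only rcv_nxt moves. */
void model_rx(struct seq_model *m, uint32_t len)
{
	m->rcv_nxt += len;
}

/* Old behavior: read_skb() advanced copied_seq as soon as the skb was
 * handed to the verdict program, before the user had read anything.
 */
void model_read_skb_old(struct seq_model *m, uint32_t len)
{
	m->copied_seq += len;
}

/* New behavior: copied_seq advances only by what recvmsg() copied out,
 * or by what tcp_eat_skb() accounts for on redirect and drop.
 */
void model_consume(struct seq_model *m, uint32_t copied)
{
	m->copied_seq += copied;
}
```

In this model, receiving 100 bytes and then calling model_read_skb_old() leaves model_inq() at 0, while with the new accounting it stays at 100 until model_consume() is called.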
Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Signed-off-by: John Fastabend Reviewed-by: Jakub Sitnicki --- include/net/tcp.h | 10 ++++++++++ net/core/skmsg.c | 15 +++++++-------- net/ipv4/tcp.c | 10 +--------- net/ipv4/tcp_bpf.c | 28 +++++++++++++++++++++++++++- 4 files changed, 45 insertions(+), 18 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 04a31643cda3..18a038d16434 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1470,6 +1470,8 @@ static inline void tcp_adjust_rcv_ssthresh(struct sock *sk) } void tcp_cleanup_rbuf(struct sock *sk, int copied); +void __tcp_cleanup_rbuf(struct sock *sk, int copied); + /* We provision sk_rcvbuf around 200% of sk_rcvlowat. * If 87.5 % (7/8) of the space has been consumed, we want to override @@ -2326,6 +2328,14 @@ int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore); void tcp_bpf_clone(const struct sock *sk, struct sock *newsk); #endif /* CONFIG_BPF_SYSCALL */ +#ifdef CONFIG_INET +void tcp_eat_skb(struct sock *sk, struct sk_buff *skb); +#else +static inline void tcp_eat_skb(struct sock *sk, struct sk_buff *skb) +{ +} +#endif + int tcp_bpf_sendmsg_redir(struct sock *sk, bool ingress, struct sk_msg *msg, u32 bytes, int flags); #endif /* CONFIG_NET_SOCK_MSG */ diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 08be5f409fb8..a9060e1f0e43 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -979,10 +979,8 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb, err = -EIO; sk_other = psock->sk; if (sock_flag(sk_other, SOCK_DEAD) || - !sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) { - skb_bpf_redirect_clear(skb); + !sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) goto out_free; - } skb_bpf_set_ingress(skb); @@ -1011,18 +1009,19 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb, err = 0; } spin_unlock_bh(&psock->ingress_lock); - if (err < 0) { - skb_bpf_redirect_clear(skb); + if (err < 0) goto 
out_free; - } } break; case __SK_REDIRECT: + tcp_eat_skb(psock->sk, skb); err = sk_psock_skb_redirect(psock, skb); break; case __SK_DROP: default: out_free: + skb_bpf_redirect_clear(skb); + tcp_eat_skb(psock->sk, skb); sock_drop(psock->sk, skb); } @@ -1067,8 +1066,7 @@ static void sk_psock_strp_read(struct strparser *strp, struct sk_buff *skb) skb_dst_drop(skb); skb_bpf_redirect_clear(skb); ret = bpf_prog_run_pin_on_cpu(prog, skb); - if (ret == SK_PASS) - skb_bpf_set_strparser(skb); + skb_bpf_set_strparser(skb); ret = sk_psock_map_verd(ret, skb_bpf_redirect_fetch(skb)); skb->sk = NULL; } @@ -1176,6 +1174,7 @@ static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb) psock = sk_psock(sk); if (unlikely(!psock)) { len = 0; + tcp_eat_skb(sk, skb); sock_drop(sk, skb); goto out; } diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index e914e3446377..a60f6f4e7cd9 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1571,7 +1571,7 @@ static int tcp_peek_sndq(struct sock *sk, struct msghdr *msg, int len) * calculation of whether or not we must ACK for the sake of * a window update. */ -static void __tcp_cleanup_rbuf(struct sock *sk, int copied) +void __tcp_cleanup_rbuf(struct sock *sk, int copied) { struct tcp_sock *tp = tcp_sk(sk); bool time_to_ack = false; @@ -1786,14 +1786,6 @@ int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor) break; } } - WRITE_ONCE(tp->copied_seq, seq); - - tcp_rcv_space_adjust(sk); - - /* Clean up data we have read: This will do ACK frames. 
*/ - if (copied > 0) - __tcp_cleanup_rbuf(sk, copied); - return copied; } EXPORT_SYMBOL(tcp_read_skb); diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index 01dd76be1a58..5f93918c063c 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -11,6 +11,24 @@ #include #include +void tcp_eat_skb(struct sock *sk, struct sk_buff *skb) +{ + struct tcp_sock *tcp; + int copied; + + if (!skb || !skb->len || !sk_is_tcp(sk)) + return; + + if (skb_bpf_strparser(skb)) + return; + + tcp = tcp_sk(sk); + copied = tcp->copied_seq + skb->len; + WRITE_ONCE(tcp->copied_seq, copied); + tcp_rcv_space_adjust(sk); + __tcp_cleanup_rbuf(sk, skb->len); +} + static int bpf_tcp_ingress(struct sock *sk, struct sk_psock *psock, struct sk_msg *msg, u32 apply_bytes, int flags) { @@ -198,8 +216,10 @@ static int tcp_bpf_recvmsg_parser(struct sock *sk, int flags, int *addr_len) { + struct tcp_sock *tcp = tcp_sk(sk); + u32 seq = tcp->copied_seq; struct sk_psock *psock; - int copied; + int copied = 0; if (unlikely(flags & MSG_ERRQUEUE)) return inet_recv_error(sk, msg, len, addr_len); @@ -244,9 +264,11 @@ static int tcp_bpf_recvmsg_parser(struct sock *sk, if (is_fin) { copied = 0; + seq++; goto out; } } + seq += copied; if (!copied) { long timeo; int data; @@ -284,6 +306,10 @@ static int tcp_bpf_recvmsg_parser(struct sock *sk, copied = -EAGAIN; } out: + WRITE_ONCE(tcp->copied_seq, seq); + tcp_rcv_space_adjust(sk); + if (copied > 0) + __tcp_cleanup_rbuf(sk, copied); release_sock(sk); sk_psock_put(sk, psock); return copied; From patchwork Tue May 23 02:56:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 13251457 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 
D0514EEDF; Tue, 23 May 2023 02:56:38 +0000 (UTC) Received: from mail-pf1-x436.google.com (mail-pf1-x436.google.com [IPv6:2607:f8b0:4864:20::436]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 918CCE6; Mon, 22 May 2023 19:56:36 -0700 (PDT) Received: by mail-pf1-x436.google.com with SMTP id d2e1a72fcca58-64a9335a8e7so2933432b3a.0; Mon, 22 May 2023 19:56:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1684810596; x=1687402596; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=RV3cy6fzT+FP/o9Z76rCWXv8erLvFM5dnU00SJ/KL7o=; b=FCye5yb7EdGr4f0Mw19pOJVlSUCjki10A5Xt6gsdReYkkIKmd1h8F/wyDRfMQhSt5D zAhrS1I2seytotbloUWwNxM5AeApfTIwxp24QPJhiWGimXh8BTQwL1vASd0PzHEj3d4r LFKaanYgqJEtEjAKwlh1KX7JLAfygLBt5fsaEpAYdYhtCEXqDfCdsplS9vjy/ZAm42Hn 4kK5pq9D4cIYjKOppwiq1Ei9AEX35k9DiTjD/WWW+XvbCu/MMraZiWpHneciY3es6oFd OPztH2P49r7Ngbrbmxtg6THzuo87cOHleXjiEvrpS7fFwkkiwTn6YZdW4jxWkCxZPEpm ABbA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684810596; x=1687402596; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=RV3cy6fzT+FP/o9Z76rCWXv8erLvFM5dnU00SJ/KL7o=; b=VulTJwX6+WpwVqPJ68bx1KZ8icdorb+sNEeBI1fs/LTNdT+fPuIqbNzw2DhIl+Zydn 5u3ouiymz4cMOPkC2QtzJLJauU3458vwgZn5YvKpl/rP5AuSbJ3JQTBKGPL5j2kqYXpI 9pUEyGuJhbOliaase3Io+QZ1sxf6bcX++cU6j/bqsYx4hXdeZhcfkjtYWhLceoJxwltq cN0MdIAQWhRYlfB7dJFU8raKSNenppMwpUa6H9uq2tgzH3K3AAL5n4ee/1ruX/H0i4TP 5DV2cOTOmFIjroX8y0eVt3jOckCuUTJRZLNY541CaOcD06XekJ9Y7H3lVolRfjtnpOy+ dhEw== X-Gm-Message-State: AC+VfDzPUy0PLMie2mfl6CZGAJByE/uSM6yevCASc9O1mjos8TPS1uJl zXuWPC3yrbd7Z/LK4dW/r4k= X-Google-Smtp-Source: ACHHUZ73KecFiVlvxWoFvhkHq1S1ka+aP5L44WOTUoFTld5TNlnM5H4g5wI9JUK6sQc1OdqWfTxwvg== X-Received: by 2002:a17:902:d2ca:b0:1ae:8fa:cd4c with SMTP id 
n10-20020a170902d2ca00b001ae08facd4cmr20022292plc.7.1684810596148; Mon, 22 May 2023 19:56:36 -0700 (PDT) Received: from john.lan ([2605:59c8:148:ba10:82a6:5b19:9c99:3aad]) by smtp.gmail.com with ESMTPSA id h10-20020a170902748a00b001a67759f9f8sm5508285pll.106.2023.05.22.19.56.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 May 2023 19:56:35 -0700 (PDT) From: John Fastabend To: jakub@cloudflare.com, daniel@iogearbox.net Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org, edumazet@google.com, ast@kernel.org, andrii@kernel.org, will@isovalent.com Subject: [PATCH bpf v10 09/14] bpf: sockmap, pull socket helpers out of listen test for general use Date: Mon, 22 May 2023 19:56:13 -0700 Message-Id: <20230523025618.113937-10-john.fastabend@gmail.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20230523025618.113937-1-john.fastabend@gmail.com> References: <20230523025618.113937-1-john.fastabend@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net

No functional change here; we merely pull the helpers in sockmap_listen.c into a header file so we can use them in other programs. The tests we are about to add aren't really _listen tests, so it doesn't make sense to add them here.
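
For readers who have not used these helpers before, the following is a minimal standalone sketch of the select()-based timeout pattern that poll_read() and recv_timeout() implement. The two helper bodies mirror the ones moved by this patch; demo_recv_timeout() and its use of an AF_UNIX socketpair are assumptions made purely for illustration and are not part of the selftests.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Same shape as poll_read() in the header: 0 if fd is readable, -1 otherwise. */
static int poll_read(int fd, unsigned int timeout_sec)
{
	struct timeval timeout = { .tv_sec = timeout_sec };
	fd_set rfds;
	int r;

	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);

	r = select(fd + 1, &rfds, NULL, NULL, &timeout);
	if (r == 0)
		errno = ETIME;	/* timed out with nothing to read */

	return r == 1 ? 0 : -1;
}

/* Wait up to timeout_sec for data, then do an ordinary recv(). */
static ssize_t recv_timeout(int fd, void *buf, size_t len, int flags,
			    unsigned int timeout_sec)
{
	if (poll_read(fd, timeout_sec))
		return -1;

	return recv(fd, buf, len, flags);
}

/* Hypothetical demo, not part of the patch: returns 0 if both checks pass. */
static int demo_recv_timeout(void)
{
	int sv[2], ret = -1;
	char b = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv))
		return -1;

	/* Nothing queued yet: a zero-second timeout must fail with ETIME. */
	if (recv_timeout(sv[1], &b, 1, 0, 0) != -1 || errno != ETIME)
		goto out;

	if (send(sv[0], "a", 1, 0) != 1)
		goto out;

	/* One byte is queued now, so the bounded wait succeeds. */
	if (recv_timeout(sv[1], &b, 1, 0, 1) == 1 && b == 'a')
		ret = 0;
out:
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

The x*() wrapper macros in the header layer test-failure reporting on top of calls like these, which is why they depend on test_progs.h and are left out of this sketch.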
Reviewed-by: Jakub Sitnicki Signed-off-by: John Fastabend --- .../bpf/prog_tests/sockmap_helpers.h | 272 ++++++++++++++++++ .../selftests/bpf/prog_tests/sockmap_listen.c | 263 +---------------- 2 files changed, 273 insertions(+), 262 deletions(-) create mode 100644 tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h b/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h new file mode 100644 index 000000000000..5aa99e6adcb4 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h @@ -0,0 +1,272 @@ +#ifndef __SOCKMAP_HELPERS__ +#define __SOCKMAP_HELPERS__ + +#include + +#define IO_TIMEOUT_SEC 30 +#define MAX_STRERR_LEN 256 +#define MAX_TEST_NAME 80 + +/* workaround for older vm_sockets.h */ +#ifndef VMADDR_CID_LOCAL +#define VMADDR_CID_LOCAL 1 +#endif + +#define __always_unused __attribute__((__unused__)) + +#define _FAIL(errnum, fmt...) \ + ({ \ + error_at_line(0, (errnum), __func__, __LINE__, fmt); \ + CHECK_FAIL(true); \ + }) +#define FAIL(fmt...) _FAIL(0, fmt) +#define FAIL_ERRNO(fmt...) _FAIL(errno, fmt) +#define FAIL_LIBBPF(err, msg) \ + ({ \ + char __buf[MAX_STRERR_LEN]; \ + libbpf_strerror((err), __buf, sizeof(__buf)); \ + FAIL("%s: %s", (msg), __buf); \ + }) + +/* Wrappers that fail the test on error and report it. 
*/ + +#define xaccept_nonblock(fd, addr, len) \ + ({ \ + int __ret = \ + accept_timeout((fd), (addr), (len), IO_TIMEOUT_SEC); \ + if (__ret == -1) \ + FAIL_ERRNO("accept"); \ + __ret; \ + }) + +#define xbind(fd, addr, len) \ + ({ \ + int __ret = bind((fd), (addr), (len)); \ + if (__ret == -1) \ + FAIL_ERRNO("bind"); \ + __ret; \ + }) + +#define xclose(fd) \ + ({ \ + int __ret = close((fd)); \ + if (__ret == -1) \ + FAIL_ERRNO("close"); \ + __ret; \ + }) + +#define xconnect(fd, addr, len) \ + ({ \ + int __ret = connect((fd), (addr), (len)); \ + if (__ret == -1) \ + FAIL_ERRNO("connect"); \ + __ret; \ + }) + +#define xgetsockname(fd, addr, len) \ + ({ \ + int __ret = getsockname((fd), (addr), (len)); \ + if (__ret == -1) \ + FAIL_ERRNO("getsockname"); \ + __ret; \ + }) + +#define xgetsockopt(fd, level, name, val, len) \ + ({ \ + int __ret = getsockopt((fd), (level), (name), (val), (len)); \ + if (__ret == -1) \ + FAIL_ERRNO("getsockopt(" #name ")"); \ + __ret; \ + }) + +#define xlisten(fd, backlog) \ + ({ \ + int __ret = listen((fd), (backlog)); \ + if (__ret == -1) \ + FAIL_ERRNO("listen"); \ + __ret; \ + }) + +#define xsetsockopt(fd, level, name, val, len) \ + ({ \ + int __ret = setsockopt((fd), (level), (name), (val), (len)); \ + if (__ret == -1) \ + FAIL_ERRNO("setsockopt(" #name ")"); \ + __ret; \ + }) + +#define xsend(fd, buf, len, flags) \ + ({ \ + ssize_t __ret = send((fd), (buf), (len), (flags)); \ + if (__ret == -1) \ + FAIL_ERRNO("send"); \ + __ret; \ + }) + +#define xrecv_nonblock(fd, buf, len, flags) \ + ({ \ + ssize_t __ret = recv_timeout((fd), (buf), (len), (flags), \ + IO_TIMEOUT_SEC); \ + if (__ret == -1) \ + FAIL_ERRNO("recv"); \ + __ret; \ + }) + +#define xsocket(family, sotype, flags) \ + ({ \ + int __ret = socket(family, sotype, flags); \ + if (__ret == -1) \ + FAIL_ERRNO("socket"); \ + __ret; \ + }) + +#define xbpf_map_delete_elem(fd, key) \ + ({ \ + int __ret = bpf_map_delete_elem((fd), (key)); \ + if (__ret < 0) \ + FAIL_ERRNO("map_delete"); \ 
+ __ret; \ + }) + +#define xbpf_map_lookup_elem(fd, key, val) \ + ({ \ + int __ret = bpf_map_lookup_elem((fd), (key), (val)); \ + if (__ret < 0) \ + FAIL_ERRNO("map_lookup"); \ + __ret; \ + }) + +#define xbpf_map_update_elem(fd, key, val, flags) \ + ({ \ + int __ret = bpf_map_update_elem((fd), (key), (val), (flags)); \ + if (__ret < 0) \ + FAIL_ERRNO("map_update"); \ + __ret; \ + }) + +#define xbpf_prog_attach(prog, target, type, flags) \ + ({ \ + int __ret = \ + bpf_prog_attach((prog), (target), (type), (flags)); \ + if (__ret < 0) \ + FAIL_ERRNO("prog_attach(" #type ")"); \ + __ret; \ + }) + +#define xbpf_prog_detach2(prog, target, type) \ + ({ \ + int __ret = bpf_prog_detach2((prog), (target), (type)); \ + if (__ret < 0) \ + FAIL_ERRNO("prog_detach2(" #type ")"); \ + __ret; \ + }) + +#define xpthread_create(thread, attr, func, arg) \ + ({ \ + int __ret = pthread_create((thread), (attr), (func), (arg)); \ + errno = __ret; \ + if (__ret) \ + FAIL_ERRNO("pthread_create"); \ + __ret; \ + }) + +#define xpthread_join(thread, retval) \ + ({ \ + int __ret = pthread_join((thread), (retval)); \ + errno = __ret; \ + if (__ret) \ + FAIL_ERRNO("pthread_join"); \ + __ret; \ + }) + +static inline int poll_read(int fd, unsigned int timeout_sec) +{ + struct timeval timeout = { .tv_sec = timeout_sec }; + fd_set rfds; + int r; + + FD_ZERO(&rfds); + FD_SET(fd, &rfds); + + r = select(fd + 1, &rfds, NULL, NULL, &timeout); + if (r == 0) + errno = ETIME; + + return r == 1 ? 
0 : -1; +} + +static inline int accept_timeout(int fd, struct sockaddr *addr, socklen_t *len, + unsigned int timeout_sec) +{ + if (poll_read(fd, timeout_sec)) + return -1; + + return accept(fd, addr, len); +} + +static inline int recv_timeout(int fd, void *buf, size_t len, int flags, + unsigned int timeout_sec) +{ + if (poll_read(fd, timeout_sec)) + return -1; + + return recv(fd, buf, len, flags); +} + +static inline void init_addr_loopback4(struct sockaddr_storage *ss, + socklen_t *len) +{ + struct sockaddr_in *addr4 = memset(ss, 0, sizeof(*ss)); + + addr4->sin_family = AF_INET; + addr4->sin_port = 0; + addr4->sin_addr.s_addr = htonl(INADDR_LOOPBACK); + *len = sizeof(*addr4); +} + +static inline void init_addr_loopback6(struct sockaddr_storage *ss, + socklen_t *len) +{ + struct sockaddr_in6 *addr6 = memset(ss, 0, sizeof(*ss)); + + addr6->sin6_family = AF_INET6; + addr6->sin6_port = 0; + addr6->sin6_addr = in6addr_loopback; + *len = sizeof(*addr6); +} + +static inline void init_addr_loopback_vsock(struct sockaddr_storage *ss, + socklen_t *len) +{ + struct sockaddr_vm *addr = memset(ss, 0, sizeof(*ss)); + + addr->svm_family = AF_VSOCK; + addr->svm_port = VMADDR_PORT_ANY; + addr->svm_cid = VMADDR_CID_LOCAL; + *len = sizeof(*addr); +} + +static inline void init_addr_loopback(int family, struct sockaddr_storage *ss, + socklen_t *len) +{ + switch (family) { + case AF_INET: + init_addr_loopback4(ss, len); + return; + case AF_INET6: + init_addr_loopback6(ss, len); + return; + case AF_VSOCK: + init_addr_loopback_vsock(ss, len); + return; + default: + FAIL("unsupported address family %d", family); + } +} + +static inline struct sockaddr *sockaddr(struct sockaddr_storage *ss) +{ + return (struct sockaddr *)ss; +} + +#endif // __SOCKMAP_HELPERS__ diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c index 141c1e5944ee..7dc8dd713256 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c +++ 
b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c @@ -20,11 +20,6 @@ #include #include -/* workaround for older vm_sockets.h */ -#ifndef VMADDR_CID_LOCAL -#define VMADDR_CID_LOCAL 1 -#endif - #include #include @@ -32,263 +27,7 @@ #include "test_progs.h" #include "test_sockmap_listen.skel.h" -#define IO_TIMEOUT_SEC 30 -#define MAX_STRERR_LEN 256 -#define MAX_TEST_NAME 80 - -#define __always_unused __attribute__((__unused__)) - -#define _FAIL(errnum, fmt...) \ - ({ \ - error_at_line(0, (errnum), __func__, __LINE__, fmt); \ - CHECK_FAIL(true); \ - }) -#define FAIL(fmt...) _FAIL(0, fmt) -#define FAIL_ERRNO(fmt...) _FAIL(errno, fmt) -#define FAIL_LIBBPF(err, msg) \ - ({ \ - char __buf[MAX_STRERR_LEN]; \ - libbpf_strerror((err), __buf, sizeof(__buf)); \ - FAIL("%s: %s", (msg), __buf); \ - }) - -/* Wrappers that fail the test on error and report it. */ - -#define xaccept_nonblock(fd, addr, len) \ - ({ \ - int __ret = \ - accept_timeout((fd), (addr), (len), IO_TIMEOUT_SEC); \ - if (__ret == -1) \ - FAIL_ERRNO("accept"); \ - __ret; \ - }) - -#define xbind(fd, addr, len) \ - ({ \ - int __ret = bind((fd), (addr), (len)); \ - if (__ret == -1) \ - FAIL_ERRNO("bind"); \ - __ret; \ - }) - -#define xclose(fd) \ - ({ \ - int __ret = close((fd)); \ - if (__ret == -1) \ - FAIL_ERRNO("close"); \ - __ret; \ - }) - -#define xconnect(fd, addr, len) \ - ({ \ - int __ret = connect((fd), (addr), (len)); \ - if (__ret == -1) \ - FAIL_ERRNO("connect"); \ - __ret; \ - }) - -#define xgetsockname(fd, addr, len) \ - ({ \ - int __ret = getsockname((fd), (addr), (len)); \ - if (__ret == -1) \ - FAIL_ERRNO("getsockname"); \ - __ret; \ - }) - -#define xgetsockopt(fd, level, name, val, len) \ - ({ \ - int __ret = getsockopt((fd), (level), (name), (val), (len)); \ - if (__ret == -1) \ - FAIL_ERRNO("getsockopt(" #name ")"); \ - __ret; \ - }) - -#define xlisten(fd, backlog) \ - ({ \ - int __ret = listen((fd), (backlog)); \ - if (__ret == -1) \ - FAIL_ERRNO("listen"); \ - __ret; \ - }) - -#define 
xsetsockopt(fd, level, name, val, len) \ - ({ \ - int __ret = setsockopt((fd), (level), (name), (val), (len)); \ - if (__ret == -1) \ - FAIL_ERRNO("setsockopt(" #name ")"); \ - __ret; \ - }) - -#define xsend(fd, buf, len, flags) \ - ({ \ - ssize_t __ret = send((fd), (buf), (len), (flags)); \ - if (__ret == -1) \ - FAIL_ERRNO("send"); \ - __ret; \ - }) - -#define xrecv_nonblock(fd, buf, len, flags) \ - ({ \ - ssize_t __ret = recv_timeout((fd), (buf), (len), (flags), \ - IO_TIMEOUT_SEC); \ - if (__ret == -1) \ - FAIL_ERRNO("recv"); \ - __ret; \ - }) - -#define xsocket(family, sotype, flags) \ - ({ \ - int __ret = socket(family, sotype, flags); \ - if (__ret == -1) \ - FAIL_ERRNO("socket"); \ - __ret; \ - }) - -#define xbpf_map_delete_elem(fd, key) \ - ({ \ - int __ret = bpf_map_delete_elem((fd), (key)); \ - if (__ret < 0) \ - FAIL_ERRNO("map_delete"); \ - __ret; \ - }) - -#define xbpf_map_lookup_elem(fd, key, val) \ - ({ \ - int __ret = bpf_map_lookup_elem((fd), (key), (val)); \ - if (__ret < 0) \ - FAIL_ERRNO("map_lookup"); \ - __ret; \ - }) - -#define xbpf_map_update_elem(fd, key, val, flags) \ - ({ \ - int __ret = bpf_map_update_elem((fd), (key), (val), (flags)); \ - if (__ret < 0) \ - FAIL_ERRNO("map_update"); \ - __ret; \ - }) - -#define xbpf_prog_attach(prog, target, type, flags) \ - ({ \ - int __ret = \ - bpf_prog_attach((prog), (target), (type), (flags)); \ - if (__ret < 0) \ - FAIL_ERRNO("prog_attach(" #type ")"); \ - __ret; \ - }) - -#define xbpf_prog_detach2(prog, target, type) \ - ({ \ - int __ret = bpf_prog_detach2((prog), (target), (type)); \ - if (__ret < 0) \ - FAIL_ERRNO("prog_detach2(" #type ")"); \ - __ret; \ - }) - -#define xpthread_create(thread, attr, func, arg) \ - ({ \ - int __ret = pthread_create((thread), (attr), (func), (arg)); \ - errno = __ret; \ - if (__ret) \ - FAIL_ERRNO("pthread_create"); \ - __ret; \ - }) - -#define xpthread_join(thread, retval) \ - ({ \ - int __ret = pthread_join((thread), (retval)); \ - errno = __ret; \ - if 
(__ret) \ - FAIL_ERRNO("pthread_join"); \ - __ret; \ - }) - -static int poll_read(int fd, unsigned int timeout_sec) -{ - struct timeval timeout = { .tv_sec = timeout_sec }; - fd_set rfds; - int r; - - FD_ZERO(&rfds); - FD_SET(fd, &rfds); - - r = select(fd + 1, &rfds, NULL, NULL, &timeout); - if (r == 0) - errno = ETIME; - - return r == 1 ? 0 : -1; -} - -static int accept_timeout(int fd, struct sockaddr *addr, socklen_t *len, - unsigned int timeout_sec) -{ - if (poll_read(fd, timeout_sec)) - return -1; - - return accept(fd, addr, len); -} - -static int recv_timeout(int fd, void *buf, size_t len, int flags, - unsigned int timeout_sec) -{ - if (poll_read(fd, timeout_sec)) - return -1; - - return recv(fd, buf, len, flags); -} - -static void init_addr_loopback4(struct sockaddr_storage *ss, socklen_t *len) -{ - struct sockaddr_in *addr4 = memset(ss, 0, sizeof(*ss)); - - addr4->sin_family = AF_INET; - addr4->sin_port = 0; - addr4->sin_addr.s_addr = htonl(INADDR_LOOPBACK); - *len = sizeof(*addr4); -} - -static void init_addr_loopback6(struct sockaddr_storage *ss, socklen_t *len) -{ - struct sockaddr_in6 *addr6 = memset(ss, 0, sizeof(*ss)); - - addr6->sin6_family = AF_INET6; - addr6->sin6_port = 0; - addr6->sin6_addr = in6addr_loopback; - *len = sizeof(*addr6); -} - -static void init_addr_loopback_vsock(struct sockaddr_storage *ss, socklen_t *len) -{ - struct sockaddr_vm *addr = memset(ss, 0, sizeof(*ss)); - - addr->svm_family = AF_VSOCK; - addr->svm_port = VMADDR_PORT_ANY; - addr->svm_cid = VMADDR_CID_LOCAL; - *len = sizeof(*addr); -} - -static void init_addr_loopback(int family, struct sockaddr_storage *ss, - socklen_t *len) -{ - switch (family) { - case AF_INET: - init_addr_loopback4(ss, len); - return; - case AF_INET6: - init_addr_loopback6(ss, len); - return; - case AF_VSOCK: - init_addr_loopback_vsock(ss, len); - return; - default: - FAIL("unsupported address family %d", family); - } -} - -static inline struct sockaddr *sockaddr(struct sockaddr_storage *ss) -{ - 
return (struct sockaddr *)ss; -} +#include "sockmap_helpers.h" static int enable_reuseport(int s, int progfd) { From patchwork Tue May 23 02:56:14 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 13251460 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4306517AAF; Tue, 23 May 2023 02:56:43 +0000 (UTC) Received: from mail-pg1-x529.google.com (mail-pg1-x529.google.com [IPv6:2607:f8b0:4864:20::529]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 41DC7FA; Mon, 22 May 2023 19:56:38 -0700 (PDT) Received: by mail-pg1-x529.google.com with SMTP id 41be03b00d2f7-53033a0b473so4815259a12.0; Mon, 22 May 2023 19:56:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1684810597; x=1687402597; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=2OiIh6/gwAzSF0mWF9OG5fkwKj7b+3VvE1TExhAd9ek=; b=FhxuFIGtCdvzYJ7ScaVHBjXG/eiXR/m/b0Uoxpc9NiwHKvA/NuKqMoUZ7gYteg5cVK fKN5B+4fVbfg6kY2kFXRkr/5L6rTt64LNxOxEuzOYVF/bHHWr6gmh8VJOeXzMjpCYMiT JJT7uwkBuXyV5OFRevkkBt5Rbk9sLaTGQ77J1NmXlKmwAd4JizQQ5KWAKxMLJPTRSGCP hgyycrzygGEFxc8sMQ31MUOEkPpnHvNGJ4E8tW7RgPf+c3vCGRyvxSVdVN54WOuMupOy xE9wIWvXs+PbCma2hWdwmQ8s4IQMgmp/4FbVBf5xyE9XHV9Sx2sZFPcrwrmz9/kYnTj5 wiEQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684810597; x=1687402597; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=2OiIh6/gwAzSF0mWF9OG5fkwKj7b+3VvE1TExhAd9ek=; 
b=acPsK2xtk0i+EoLHuACttbuWU9gE1zSWKpofe2D8Hd9sreb5JfimbNIjmed6QoyiHm fLWC3Q8CFvPKALiFhvYlAZb7022efMnaJzaWcKsxy/vMWcwfcvUvvW7hebF0RCCV+8d8 IZDa5LuGvxPlKCOg2GTmE3NvPiLCzvoHkPcuFAAErU00f2/AulFI852vtTs7wyfGwbsu fE8sTRwjSfwvuz7wi/qv92gnv+BkgnZppwDavBRvBM+A8cg0S5GDFdLvSOSAFpwj52pM /6b0JOUVffcuhNT9ihmZ4WjqiEdJMzVj3btgQhBSCnzUVS6Az2FI5YfKH4EkR798mhw9 ZjQQ== X-Gm-Message-State: AC+VfDx/2ETLWrt2am9yOaT2IXnoWbPhM3q4xW9ghGKorS+fLR27LPyF 3TfTw+oQ6XoPK8Glo4f9DPA= X-Google-Smtp-Source: ACHHUZ7lZ1trSa+VTrojyyyZVUojO4m9TcPItP82QyHyLOYyYuT03XfsRowLOWWiUBK6/YDnwIkYow== X-Received: by 2002:a17:902:7785:b0:1ad:f912:c047 with SMTP id o5-20020a170902778500b001adf912c047mr10732703pll.42.1684810597576; Mon, 22 May 2023 19:56:37 -0700 (PDT) Received: from john.lan ([2605:59c8:148:ba10:82a6:5b19:9c99:3aad]) by smtp.gmail.com with ESMTPSA id h10-20020a170902748a00b001a67759f9f8sm5508285pll.106.2023.05.22.19.56.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 May 2023 19:56:37 -0700 (PDT) From: John Fastabend To: jakub@cloudflare.com, daniel@iogearbox.net Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org, edumazet@google.com, ast@kernel.org, andrii@kernel.org, will@isovalent.com Subject: [PATCH bpf v10 10/14] bpf: sockmap, build helper to create connected socket pair Date: Mon, 22 May 2023 19:56:14 -0700 Message-Id: <20230523025618.113937-11-john.fastabend@gmail.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20230523025618.113937-1-john.fastabend@gmail.com> References: <20230523025618.113937-1-john.fastabend@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on 
lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net

A common operation for testing is to spin up a pair of connected sockets. We can then use these to run specific tests that need to send data, check BPF programs, and so on. The sockmap_listen programs already have this logic; let's move it into the new sockmap_helpers header file for general use.

Signed-off-by: John Fastabend Reviewed-by: Jakub Sitnicki --- .../bpf/prog_tests/sockmap_helpers.h | 118 ++++++++++++++++++ .../selftests/bpf/prog_tests/sockmap_listen.c | 107 +--------------- 2 files changed, 123 insertions(+), 102 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h b/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h index 5aa99e6adcb4..d12665490a90 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_helpers.h @@ -269,4 +269,122 @@ static inline struct sockaddr *sockaddr(struct sockaddr_storage *ss) return (struct sockaddr *)ss; } +static inline int add_to_sockmap(int sock_mapfd, int fd1, int fd2) +{ + u64 value; + u32 key; + int err; + + key = 0; + value = fd1; + err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST); + if (err) + return err; + + key = 1; + value = fd2; + return xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST); +} + +static inline int create_pair(int s, int family, int sotype, int *c, int *p) +{ + struct sockaddr_storage addr; + socklen_t len; + int err = 0; + + len = sizeof(addr); + err = xgetsockname(s, sockaddr(&addr), &len); + if (err) + return err; + + *c = xsocket(family, sotype, 0); + if (*c < 0) + return errno; + err = xconnect(*c, sockaddr(&addr), len); + if (err) { + err = errno; + goto close_cli0; + } + + *p = xaccept_nonblock(s, NULL, NULL); + if (*p < 0) { + err = errno; + goto close_cli0; + } + return err; +close_cli0: + close(*c); + return err; +} + +static inline int create_socket_pairs(int s, int family, int sotype, + int 
*c0, int *c1, int *p0, int *p1) +{ + int err; + + err = create_pair(s, family, sotype, c0, p0); + if (err) + return err; + + err = create_pair(s, family, sotype, c1, p1); + if (err) { + close(*c0); + close(*p0); + } + return err; +} + +static inline int enable_reuseport(int s, int progfd) +{ + int err, one = 1; + + err = xsetsockopt(s, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)); + if (err) + return -1; + err = xsetsockopt(s, SOL_SOCKET, SO_ATTACH_REUSEPORT_EBPF, &progfd, + sizeof(progfd)); + if (err) + return -1; + + return 0; +} + +static inline int socket_loopback_reuseport(int family, int sotype, int progfd) +{ + struct sockaddr_storage addr; + socklen_t len; + int err, s; + + init_addr_loopback(family, &addr, &len); + + s = xsocket(family, sotype, 0); + if (s == -1) + return -1; + + if (progfd >= 0) + enable_reuseport(s, progfd); + + err = xbind(s, sockaddr(&addr), len); + if (err) + goto close; + + if (sotype & SOCK_DGRAM) + return s; + + err = xlisten(s, SOMAXCONN); + if (err) + goto close; + + return s; +close: + xclose(s); + return -1; +} + +static inline int socket_loopback(int family, int sotype) +{ + return socket_loopback_reuseport(family, sotype, -1); +} + + #endif // __SOCKMAP_HELPERS__ diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c index 7dc8dd713256..b4f6f3a50ae5 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c @@ -29,58 +29,6 @@ #include "sockmap_helpers.h" -static int enable_reuseport(int s, int progfd) -{ - int err, one = 1; - - err = xsetsockopt(s, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)); - if (err) - return -1; - err = xsetsockopt(s, SOL_SOCKET, SO_ATTACH_REUSEPORT_EBPF, &progfd, - sizeof(progfd)); - if (err) - return -1; - - return 0; -} - -static int socket_loopback_reuseport(int family, int sotype, int progfd) -{ - struct sockaddr_storage addr; - socklen_t len; - int err, s; 
- - init_addr_loopback(family, &addr, &len); - - s = xsocket(family, sotype, 0); - if (s == -1) - return -1; - - if (progfd >= 0) - enable_reuseport(s, progfd); - - err = xbind(s, sockaddr(&addr), len); - if (err) - goto close; - - if (sotype & SOCK_DGRAM) - return s; - - err = xlisten(s, SOMAXCONN); - if (err) - goto close; - - return s; -close: - xclose(s); - return -1; -} - -static int socket_loopback(int family, int sotype) -{ - return socket_loopback_reuseport(family, sotype, -1); -} - static void test_insert_invalid(struct test_sockmap_listen *skel __always_unused, int family, int sotype, int mapfd) { @@ -723,31 +671,12 @@ static const char *redir_mode_str(enum redir_mode mode) } } -static int add_to_sockmap(int sock_mapfd, int fd1, int fd2) -{ - u64 value; - u32 key; - int err; - - key = 0; - value = fd1; - err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST); - if (err) - return err; - - key = 1; - value = fd2; - return xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST); -} - static void redir_to_connected(int family, int sotype, int sock_mapfd, int verd_mapfd, enum redir_mode mode) { const char *log_prefix = redir_mode_str(mode); - struct sockaddr_storage addr; int s, c0, c1, p0, p1; unsigned int pass; - socklen_t len; int err, n; u32 key; char b; @@ -758,36 +687,13 @@ static void redir_to_connected(int family, int sotype, int sock_mapfd, if (s < 0) return; - len = sizeof(addr); - err = xgetsockname(s, sockaddr(&addr), &len); + err = create_socket_pairs(s, family, sotype, &c0, &c1, &p0, &p1); if (err) goto close_srv; - c0 = xsocket(family, sotype, 0); - if (c0 < 0) - goto close_srv; - err = xconnect(c0, sockaddr(&addr), len); - if (err) - goto close_cli0; - - p0 = xaccept_nonblock(s, NULL, NULL); - if (p0 < 0) - goto close_cli0; - - c1 = xsocket(family, sotype, 0); - if (c1 < 0) - goto close_peer0; - err = xconnect(c1, sockaddr(&addr), len); - if (err) - goto close_cli1; - - p1 = xaccept_nonblock(s, NULL, NULL); - if (p1 < 0) - goto 
close_cli1; - err = add_to_sockmap(sock_mapfd, p0, p1); if (err) - goto close_peer1; + goto close; n = write(mode == REDIR_INGRESS ? c1 : p1, "a", 1); if (n < 0) @@ -795,12 +701,12 @@ static void redir_to_connected(int family, int sotype, int sock_mapfd, if (n == 0) FAIL("%s: incomplete write", log_prefix); if (n < 1) - goto close_peer1; + goto close; key = SK_PASS; err = xbpf_map_lookup_elem(verd_mapfd, &key, &pass); if (err) - goto close_peer1; + goto close; if (pass != 1) FAIL("%s: want pass count 1, have %d", log_prefix, pass); n = recv_timeout(c0, &b, 1, 0, IO_TIMEOUT_SEC); @@ -809,13 +715,10 @@ static void redir_to_connected(int family, int sotype, int sock_mapfd, if (n == 0) FAIL("%s: incomplete recv", log_prefix); -close_peer1: +close: xclose(p1); -close_cli1: xclose(c1); -close_peer0: xclose(p0); -close_cli0: xclose(c0); close_srv: xclose(s); From patchwork Tue May 23 02:56:15 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 13251459 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 18D4C17721; Tue, 23 May 2023 02:56:42 +0000 (UTC) Received: from mail-pl1-x636.google.com (mail-pl1-x636.google.com [IPv6:2607:f8b0:4864:20::636]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C491E102; Mon, 22 May 2023 19:56:39 -0700 (PDT) Received: by mail-pl1-x636.google.com with SMTP id d9443c01a7336-1ae4be0b1f3so47797555ad.0; Mon, 22 May 2023 19:56:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1684810599; x=1687402599; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; 
bh=kLga20o0ypa3YZ6F5VBKmdyV1GP+jP4eW2M1n6l95P8=; b=iz1Z8BOw4SQjVkqUOa0mEtRVTlmvHjRabBhc8ncfkkm0safwNfyOCjb64/pvfndCDR SfJQpcxXkjjxpiIz+dpXZyh7jAwGuGzKJohqMjyjX8Bwad/N6aFk2Y7X3wAU5HTPfYAJ zV1LnEYiPWmoq0jxCqtt9Bvq6Qjn0AqessKPVpLVLBMcdzaIINorgE/IabjgV6Ta3X8B 9BVhE78sLc6KBEwC98s1ikk9M/haLK1pYLHhK/XdpCCjX/qLjoi7J9BUQGYvPlHkrehO GJ0b6EdNb6aTb2HRAEGshmmhpvFn6DdrJNrm03YTVbTQGuyl/S2JRoChIIECYQvFukEa rARw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684810599; x=1687402599; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=kLga20o0ypa3YZ6F5VBKmdyV1GP+jP4eW2M1n6l95P8=; b=jVCV4LDyPtxlTxO22dtYTIloW94M3UQR8c+4DopXbEchUx2pDXs06z3bEbboi8j9nM D+DMZJ5+7t+8h/hjJjBnCgokKHp3fm0fFgma9Vab9AsVE90GODla185AOJ3Y0FlcgKuJ 6G6ngPaH56IZ9vHaks+3FOKbmb+EH7twXhk2FXdv+F4AHd7I9gfzFWNvkyGgM5pa8CSH HjdW6ppLwdcKd80uRt4asTGPqjBhtjeXrZgFJyX/a0O01hVXCLVlgPZLd5Mq29S/tj5O YoVzZTLMF4c4X6aj0420oSZa3KBoOa6j8c0fb/Lfvq9RyPu85DEyHtE3BZVQhZj0zBX1 SK3g== X-Gm-Message-State: AC+VfDxPyzTeZC3THK3hZQwJs2ComM52sb63nTXMcywbeE+jIwvbn4EV btk5FUl/ALeVJVlorICU388= X-Google-Smtp-Source: ACHHUZ7M/mN57jhBOpSgtRaiUaFFihFX2kXuevvGaP/b5u4A1khYyGAU1TmtL2ECL9tIORXSb64zXQ== X-Received: by 2002:a17:903:1108:b0:1ad:be4d:5dfe with SMTP id n8-20020a170903110800b001adbe4d5dfemr15054022plh.27.1684810599148; Mon, 22 May 2023 19:56:39 -0700 (PDT) Received: from john.lan ([2605:59c8:148:ba10:82a6:5b19:9c99:3aad]) by smtp.gmail.com with ESMTPSA id h10-20020a170902748a00b001a67759f9f8sm5508285pll.106.2023.05.22.19.56.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 May 2023 19:56:38 -0700 (PDT) From: John Fastabend To: jakub@cloudflare.com, daniel@iogearbox.net Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org, edumazet@google.com, ast@kernel.org, andrii@kernel.org, will@isovalent.com Subject: [PATCH bpf v10 11/14] bpf: sockmap, test 
shutdown() correctly exits epoll and recv()=0 Date: Mon, 22 May 2023 19:56:15 -0700 Message-Id: <20230523025618.113937-12-john.fastabend@gmail.com> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20230523025618.113937-1-john.fastabend@gmail.com> References: <20230523025618.113937-1-john.fastabend@gmail.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net

When a session shuts down gracefully, epoll needs to wake up and any recv() readers should return 0, not the -EAGAIN they previously returned. Note that we use epoll instead of select so that the epoll wake-up on the shutdown event is tested as well.
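
As a reference point for the behavior described above, here is a minimal sketch of what a reader sees on a plain socket with no sockmap or BPF program attached: after the peer's shutdown(SHUT_WR), epoll reports the socket readable and recv() returns 0. The function name demo_shutdown_wakes_epoll(), the AF_UNIX socketpair, and the 1000 ms timeout are assumptions for illustration only; the selftest in this patch uses a TCP loopback pair with an SK_SKB verdict program attached.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical demo, not part of the patch: returns the recv() result seen
 * after the peer shuts down its write side, or -1 if any step fails.
 */
static int demo_shutdown_wakes_epoll(void)
{
	struct epoll_event ev = { .events = EPOLLIN }, out;
	int sv[2], epollfd, n, ret = -1;
	char b;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv))
		return -1;

	epollfd = epoll_create1(0);
	if (epollfd < 0)
		goto close_pair;

	ev.data.fd = sv[0];
	if (epoll_ctl(epollfd, EPOLL_CTL_ADD, sv[0], &ev))
		goto close_epoll;

	/* Peer closes its sending direction; no data was ever queued. */
	if (shutdown(sv[1], SHUT_WR))
		goto close_epoll;

	/* Bounded wait so a missed wake-up fails instead of hanging. */
	n = epoll_wait(epollfd, &out, 1, 1000);
	if (n != 1)
		goto close_epoll;

	/* End of stream: the expected result is 0, not -1 with EAGAIN. */
	ret = (int)recv(sv[0], &b, 1, MSG_DONTWAIT);
close_epoll:
	close(epollfd);
close_pair:
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

The selftest checks that a socket in a sockmap shows the same two results, a wake-up from epoll_wait() followed by recv() returning 0.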
Signed-off-by: John Fastabend Reviewed-by: Jakub Sitnicki --- .../selftests/bpf/prog_tests/sockmap_basic.c | 62 +++++++++++++++++++ .../bpf/progs/test_sockmap_pass_prog.c | 32 ++++++++++ 2 files changed, 94 insertions(+) create mode 100644 tools/testing/selftests/bpf/progs/test_sockmap_pass_prog.c diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c index 0ce25a967481..615a8164c8f0 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c @@ -2,6 +2,7 @@ // Copyright (c) 2020 Cloudflare #include #include +#include #include "test_progs.h" #include "test_skmsg_load_helpers.skel.h" @@ -9,8 +10,11 @@ #include "test_sockmap_invalid_update.skel.h" #include "test_sockmap_skb_verdict_attach.skel.h" #include "test_sockmap_progs_query.skel.h" +#include "test_sockmap_pass_prog.skel.h" #include "bpf_iter_sockmap.skel.h" +#include "sockmap_helpers.h" + #define TCP_REPAIR 19 /* TCP sock is under repair right now */ #define TCP_REPAIR_ON 1 @@ -350,6 +354,62 @@ static void test_sockmap_progs_query(enum bpf_attach_type attach_type) test_sockmap_progs_query__destroy(skel); } +#define MAX_EVENTS 10 +static void test_sockmap_skb_verdict_shutdown(void) +{ + struct epoll_event ev, events[MAX_EVENTS]; + int n, err, map, verdict, s, c1, p1; + struct test_sockmap_pass_prog *skel; + int epollfd; + int zero = 0; + char b; + + skel = test_sockmap_pass_prog__open_and_load(); + if (!ASSERT_OK_PTR(skel, "open_and_load")) + return; + + verdict = bpf_program__fd(skel->progs.prog_skb_verdict); + map = bpf_map__fd(skel->maps.sock_map_rx); + + err = bpf_prog_attach(verdict, map, BPF_SK_SKB_STREAM_VERDICT, 0); + if (!ASSERT_OK(err, "bpf_prog_attach")) + goto out; + + s = socket_loopback(AF_INET, SOCK_STREAM); + if (s < 0) + goto out; + err = create_pair(s, AF_INET, SOCK_STREAM, &c1, &p1); + if (err < 0) + goto out; + + err = bpf_map_update_elem(map, &zero, &c1, 
BPF_NOEXIST); + if (err < 0) + goto out_close; + + shutdown(p1, SHUT_WR); + + ev.events = EPOLLIN; + ev.data.fd = c1; + + epollfd = epoll_create1(0); + if (!ASSERT_GT(epollfd, -1, "epoll_create(0)")) + goto out_close; + err = epoll_ctl(epollfd, EPOLL_CTL_ADD, c1, &ev); + if (!ASSERT_OK(err, "epoll_ctl(EPOLL_CTL_ADD)")) + goto out_close; + err = epoll_wait(epollfd, events, MAX_EVENTS, -1); + if (!ASSERT_EQ(err, 1, "epoll_wait(fd)")) + goto out_close; + + n = recv(c1, &b, 1, MSG_DONTWAIT); + ASSERT_EQ(n, 0, "recv_timeout(fin)"); +out_close: + close(c1); + close(p1); +out: + test_sockmap_pass_prog__destroy(skel); +} + void test_sockmap_basic(void) { if (test__start_subtest("sockmap create_update_free")) @@ -384,4 +444,6 @@ void test_sockmap_basic(void) test_sockmap_progs_query(BPF_SK_SKB_STREAM_VERDICT); if (test__start_subtest("sockmap skb_verdict progs query")) test_sockmap_progs_query(BPF_SK_SKB_VERDICT); + if (test__start_subtest("sockmap skb_verdict shutdown")) + test_sockmap_skb_verdict_shutdown(); } diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_pass_prog.c b/tools/testing/selftests/bpf/progs/test_sockmap_pass_prog.c new file mode 100644 index 000000000000..1d86a717a290 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_sockmap_pass_prog.c @@ -0,0 +1,32 @@ +#include +#include +#include + +struct { + __uint(type, BPF_MAP_TYPE_SOCKMAP); + __uint(max_entries, 20); + __type(key, int); + __type(value, int); +} sock_map_rx SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_SOCKMAP); + __uint(max_entries, 20); + __type(key, int); + __type(value, int); +} sock_map_tx SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_SOCKMAP); + __uint(max_entries, 20); + __type(key, int); + __type(value, int); +} sock_map_msg SEC(".maps"); + +SEC("sk_skb") +int prog_skb_verdict(struct __sk_buff *skb) +{ + return SK_PASS; +} + +char _license[] SEC("license") = "GPL"; From patchwork Tue May 23 02:56:16 2023 Content-Type: text/plain; charset="utf-8" 
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 13251461
X-Patchwork-Delegate: bpf@iogearbox.net
From: John Fastabend
To: jakub@cloudflare.com, daniel@iogearbox.net
Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org, edumazet@google.com, ast@kernel.org, andrii@kernel.org, will@isovalent.com
Subject: [PATCH bpf v10 12/14] bpf: sockmap, test FIONREAD returns correct bytes in rx buffer
Date: Mon, 22 May 2023 19:56:16 -0700
Message-Id: <20230523025618.113937-13-john.fastabend@gmail.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20230523025618.113937-1-john.fastabend@gmail.com>
References: <20230523025618.113937-1-john.fastabend@gmail.com>
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org

A bug was reported where ioctl(FIONREAD) returned zero even though the
socket with a SK_SKB verdict program attached had bytes in the msg queue.
As a result, programs may hang or, more likely, try to recover but use
suboptimal buffer sizes.

Add a test to check that ioctl(FIONREAD) returns the correct number of
bytes.

Reviewed-by: Jakub Sitnicki
Signed-off-by: John Fastabend
---
 .../selftests/bpf/prog_tests/sockmap_basic.c  | 48 +++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
index 615a8164c8f0..fe56049f6568 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
@@ -410,6 +410,52 @@ static void test_sockmap_skb_verdict_shutdown(void)
 	test_sockmap_pass_prog__destroy(skel);
 }
 
+static void test_sockmap_skb_verdict_fionread(void)
+{
+	int err, map, verdict, s, c0, c1, p0, p1;
+	struct test_sockmap_pass_prog *skel;
+	int zero = 0, sent, recvd, avail;
+	char buf[256] = "0123456789";
+
+	skel = test_sockmap_pass_prog__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open_and_load"))
+		return;
+
+	verdict = bpf_program__fd(skel->progs.prog_skb_verdict);
+	map = bpf_map__fd(skel->maps.sock_map_rx);
+
+	err = bpf_prog_attach(verdict, map, BPF_SK_SKB_STREAM_VERDICT, 0);
+	if (!ASSERT_OK(err, "bpf_prog_attach"))
+		goto out;
+
+	s = socket_loopback(AF_INET, SOCK_STREAM);
+	if (!ASSERT_GT(s, -1, "socket_loopback(s)"))
+		goto out;
+	err = create_socket_pairs(s, AF_INET, SOCK_STREAM, &c0, &c1, &p0, &p1);
+	if (!ASSERT_OK(err, "create_socket_pairs(s)"))
+		goto out;
+
+	err = bpf_map_update_elem(map, &zero, &c1, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "bpf_map_update_elem(c1)"))
+		goto out_close;
+
+	sent = xsend(p1, &buf, sizeof(buf), 0);
+	ASSERT_EQ(sent, sizeof(buf), "xsend(p0)");
+	err = ioctl(c1, FIONREAD, &avail);
+	ASSERT_OK(err, "ioctl(FIONREAD) error");
+	ASSERT_EQ(avail, sizeof(buf), "ioctl(FIONREAD)");
+	recvd = recv_timeout(c1, &buf, sizeof(buf), SOCK_NONBLOCK, IO_TIMEOUT_SEC);
+	ASSERT_EQ(recvd, sizeof(buf), "recv_timeout(c0)");
+
+out_close:
+	close(c0);
+	close(p0);
+	close(c1);
+	close(p1);
+out:
+	test_sockmap_pass_prog__destroy(skel);
+}
+
 void test_sockmap_basic(void)
 {
 	if (test__start_subtest("sockmap create_update_free"))
@@ -446,4 +492,6 @@ void test_sockmap_basic(void)
 		test_sockmap_progs_query(BPF_SK_SKB_VERDICT);
 	if (test__start_subtest("sockmap skb_verdict shutdown"))
 		test_sockmap_skb_verdict_shutdown();
+	if (test__start_subtest("sockmap skb_verdict fionread"))
+		test_sockmap_skb_verdict_fionread();
 }

From patchwork Tue May 23 02:56:17 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 13251462
X-Patchwork-Delegate: bpf@iogearbox.net
From: John Fastabend
To: jakub@cloudflare.com, daniel@iogearbox.net
Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org, edumazet@google.com, ast@kernel.org, andrii@kernel.org, will@isovalent.com
Subject: [PATCH bpf v10 13/14] bpf: sockmap, test FIONREAD returns correct bytes in rx buffer with drops
Date: Mon, 22 May 2023 19:56:17 -0700
Message-Id: <20230523025618.113937-14-john.fastabend@gmail.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20230523025618.113937-1-john.fastabend@gmail.com>
References: <20230523025618.113937-1-john.fastabend@gmail.com>
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org

When a BPF program drops packets, the sockmap logic 'eats' the packet and
updates copied_seq. In the PASS case, where the sk_buff is accepted, we
update copied_seq from the recvmsg path, so we need a new test to handle
the drop case.

The original patch series broke this, resulting in:

test_sockmap_skb_verdict_fionread:PASS:ioctl(FIONREAD) error 0 nsec
test_sockmap_skb_verdict_fionread:FAIL:ioctl(FIONREAD) unexpected ioctl(FIONREAD): actual 1503041772 != expected 256
#176/17  sockmap_basic/sockmap skb_verdict fionread on drop:FAIL

After the updated patch with the fix:

#176/16  sockmap_basic/sockmap skb_verdict fionread:OK
#176/17  sockmap_basic/sockmap skb_verdict fionread on drop:OK

Reviewed-by: Jakub Sitnicki
Signed-off-by: John Fastabend
---
 .../selftests/bpf/prog_tests/sockmap_basic.c  | 47 ++++++++++++++-----
 .../bpf/progs/test_sockmap_drop_prog.c        | 32 +++++++++++++
 2 files changed, 66 insertions(+), 13 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_sockmap_drop_prog.c

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
index fe56049f6568..064cc5e8d9ad 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
@@ -11,6 +11,7 @@
 #include "test_sockmap_skb_verdict_attach.skel.h"
 #include "test_sockmap_progs_query.skel.h"
 #include "test_sockmap_pass_prog.skel.h"
+#include "test_sockmap_drop_prog.skel.h"
 #include "bpf_iter_sockmap.skel.h"
 
 #include "sockmap_helpers.h"
@@ -410,19 +411,31 @@ static void test_sockmap_skb_verdict_shutdown(void)
 	test_sockmap_pass_prog__destroy(skel);
 }
 
-static void test_sockmap_skb_verdict_fionread(void)
+static void test_sockmap_skb_verdict_fionread(bool pass_prog)
 {
+	int expected, zero = 0, sent, recvd, avail;
 	int err, map, verdict, s, c0, c1, p0, p1;
-	struct test_sockmap_pass_prog *skel;
-	int zero = 0, sent, recvd, avail;
+	struct test_sockmap_pass_prog *pass;
+	struct test_sockmap_drop_prog *drop;
 	char buf[256] = "0123456789";
 
-	skel = test_sockmap_pass_prog__open_and_load();
-	if (!ASSERT_OK_PTR(skel, "open_and_load"))
-		return;
+	if (pass_prog) {
+		pass = test_sockmap_pass_prog__open_and_load();
+		if (!ASSERT_OK_PTR(pass, "open_and_load"))
+			return;
+		verdict = bpf_program__fd(pass->progs.prog_skb_verdict);
+		map = bpf_map__fd(pass->maps.sock_map_rx);
+		expected = sizeof(buf);
+	} else {
+		drop = test_sockmap_drop_prog__open_and_load();
+		if (!ASSERT_OK_PTR(drop, "open_and_load"))
+			return;
+		verdict = bpf_program__fd(drop->progs.prog_skb_verdict);
+		map = bpf_map__fd(drop->maps.sock_map_rx);
+		/* On drop data is consumed immediately and copied_seq inc'd */
+		expected = 0;
+	}
 
-	verdict = bpf_program__fd(skel->progs.prog_skb_verdict);
-	map = bpf_map__fd(skel->maps.sock_map_rx);
 
 	err = bpf_prog_attach(verdict, map, BPF_SK_SKB_STREAM_VERDICT, 0);
 	if (!ASSERT_OK(err, "bpf_prog_attach"))
@@ -443,9 +456,12 @@ static void test_sockmap_skb_verdict_fionread(void)
 	ASSERT_EQ(sent, sizeof(buf), "xsend(p0)");
 	err = ioctl(c1, FIONREAD, &avail);
 	ASSERT_OK(err, "ioctl(FIONREAD) error");
-	ASSERT_EQ(avail, sizeof(buf), "ioctl(FIONREAD)");
-	recvd = recv_timeout(c1, &buf, sizeof(buf), SOCK_NONBLOCK, IO_TIMEOUT_SEC);
-	ASSERT_EQ(recvd, sizeof(buf), "recv_timeout(c0)");
+	ASSERT_EQ(avail, expected, "ioctl(FIONREAD)");
+	/* On DROP test there will be no data to read */
+	if (pass_prog) {
+		recvd = recv_timeout(c1, &buf, sizeof(buf), SOCK_NONBLOCK, IO_TIMEOUT_SEC);
+		ASSERT_EQ(recvd, sizeof(buf), "recv_timeout(c0)");
+	}
 
 out_close:
 	close(c0);
@@ -453,7 +469,10 @@ static void test_sockmap_skb_verdict_fionread(void)
 	close(c1);
 	close(p1);
 out:
-	test_sockmap_pass_prog__destroy(skel);
+	if (pass_prog)
+		test_sockmap_pass_prog__destroy(pass);
+	else
+		test_sockmap_drop_prog__destroy(drop);
 }
 
 void test_sockmap_basic(void)
@@ -493,5 +512,7 @@ void test_sockmap_basic(void)
 	if (test__start_subtest("sockmap skb_verdict shutdown"))
 		test_sockmap_skb_verdict_shutdown();
 	if (test__start_subtest("sockmap skb_verdict fionread"))
-		test_sockmap_skb_verdict_fionread();
+		test_sockmap_skb_verdict_fionread(true);
+	if (test__start_subtest("sockmap skb_verdict fionread on drop"))
+		test_sockmap_skb_verdict_fionread(false);
 }
diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_drop_prog.c b/tools/testing/selftests/bpf/progs/test_sockmap_drop_prog.c
new file mode 100644
index 000000000000..29314805ce42
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_sockmap_drop_prog.c
@@ -0,0 +1,32 @@
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+struct {
+	__uint(type, BPF_MAP_TYPE_SOCKMAP);
+	__uint(max_entries, 20);
+	__type(key, int);
+	__type(value, int);
+} sock_map_rx SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_SOCKMAP);
+	__uint(max_entries, 20);
+	__type(key, int);
+	__type(value, int);
+} sock_map_tx SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_SOCKMAP);
+	__uint(max_entries, 20);
+	__type(key, int);
+	__type(value, int);
+} sock_map_msg SEC(".maps");
+
+SEC("sk_skb")
+int prog_skb_verdict(struct __sk_buff *skb)
+{
+	return SK_DROP;
+}
+
+char _license[] SEC("license") = "GPL";

From patchwork Tue May 23 02:56:18 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Fastabend
X-Patchwork-Id: 13251463
X-Patchwork-Delegate: bpf@iogearbox.net
From: John Fastabend
To: jakub@cloudflare.com, daniel@iogearbox.net
Cc: john.fastabend@gmail.com, bpf@vger.kernel.org, netdev@vger.kernel.org, edumazet@google.com, ast@kernel.org, andrii@kernel.org, will@isovalent.com
Subject: [PATCH bpf v10 14/14] bpf: sockmap, test progs verifier error with latest clang
Date: Mon, 22 May 2023 19:56:18 -0700
Message-Id: <20230523025618.113937-15-john.fastabend@gmail.com>
X-Mailer: git-send-email 2.33.0
In-Reply-To: <20230523025618.113937-1-john.fastabend@gmail.com>
References: <20230523025618.113937-1-john.fastabend@gmail.com>
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org

With a relatively recent clang (7090c10273119) and the commit that fixes
warnings in selftests (c8ed668593972), which uses __sink(err) to resolve
unused variables, we get the following verifier error.

root@6e731a24b33a:/host/tools/testing/selftests/bpf# ./test_sockmap
libbpf: prog 'bpf_sockmap': BPF program load failed: Permission denied
libbpf: prog 'bpf_sockmap': -- BEGIN PROG LOAD LOG --
0: R1=ctx(off=0,imm=0) R10=fp0
; op = (int) skops->op;
0: (61) r2 = *(u32 *)(r1 +0)          ; R1=ctx(off=0,imm=0) R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
; switch (op) {
1: (16) if w2 == 0x4 goto pc+5        ; R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
2: (56) if w2 != 0x5 goto pc+15       ; R2_w=5
; lport = skops->local_port;
3: (61) r2 = *(u32 *)(r1 +68)         ; R1=ctx(off=0,imm=0) R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
; if (lport == 10000) {
4: (56) if w2 != 0x2710 goto pc+13
18: R1=ctx(off=0,imm=0) R2=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
; __sink(err);
18: (bc) w1 = w0
R0 !read_ok
processed 18 insns (limit 1000000) max_states_per_insn 0 total_states 2 peak_states 2 mark_read 1
-- END PROG LOAD LOG --
libbpf: prog 'bpf_sockmap': failed to load: -13
libbpf: failed to load object 'test_sockmap_kern.bpf.o'
load_bpf_file: (-1) No such file or directory
ERROR: (-1) load bpf failed
libbpf: prog 'bpf_sockmap': BPF program load failed: Permission denied
libbpf: prog 'bpf_sockmap': -- BEGIN PROG LOAD LOG --
0: R1=ctx(off=0,imm=0) R10=fp0
; op = (int) skops->op;
0: (61) r2 = *(u32 *)(r1 +0)          ; R1=ctx(off=0,imm=0) R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
; switch (op) {
1: (16) if w2 == 0x4 goto pc+5        ; R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
2: (56) if w2 != 0x5 goto pc+15       ; R2_w=5
; lport = skops->local_port;
3: (61) r2 = *(u32 *)(r1 +68)         ; R1=ctx(off=0,imm=0) R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
; if (lport == 10000) {
4: (56) if w2 != 0x2710 goto pc+13
18: R1=ctx(off=0,imm=0) R2=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
; __sink(err);
18: (bc) w1 = w0
R0 !read_ok
processed 18 insns (limit 1000000) max_states_per_insn 0 total_states 2 peak_states 2 mark_read 1
-- END PROG LOAD LOG --
libbpf: prog 'bpf_sockmap': failed to load: -13
libbpf: failed to load object 'test_sockhash_kern.bpf.o'
load_bpf_file: (-1) No such file or directory
ERROR: (-1) load bpf failed
libbpf: prog 'bpf_sockmap': BPF program load failed: Permission denied
libbpf: prog 'bpf_sockmap': -- BEGIN PROG LOAD LOG --
0: R1=ctx(off=0,imm=0) R10=fp0
; op = (int) skops->op;
0: (61) r2 = *(u32 *)(r1 +0)          ; R1=ctx(off=0,imm=0) R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
; switch (op) {
1: (16) if w2 == 0x4 goto pc+5        ; R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
2: (56) if w2 != 0x5 goto pc+15       ; R2_w=5
; lport = skops->local_port;
3: (61) r2 = *(u32 *)(r1 +68)         ; R1=ctx(off=0,imm=0) R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
; if (lport == 10000) {
4: (56) if w2 != 0x2710 goto pc+13
18: R1=ctx(off=0,imm=0) R2=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
; __sink(err);
18: (bc) w1 = w0
R0 !read_ok
processed 18 insns (limit 1000000) max_states_per_insn 0 total_states 2 peak_states 2 mark_read 1
-- END PROG LOAD LOG --

To fix this, simply remove the err value because it's not actually used
anywhere in the testing. We can investigate the root cause later. A future
patch should probably actually test the err value as well.
Although, if the map updates fail, they will eventually be caught by
userspace.

Fixes: c8ed668593972 ("selftests/bpf: fix lots of silly mistakes pointed out by compiler")
Signed-off-by: John Fastabend
Reviewed-by: Jakub Sitnicki
---
 .../testing/selftests/bpf/progs/test_sockmap_kern.h  | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_kern.h b/tools/testing/selftests/bpf/progs/test_sockmap_kern.h
index baf9ebc6d903..99d2ea9fb658 100644
--- a/tools/testing/selftests/bpf/progs/test_sockmap_kern.h
+++ b/tools/testing/selftests/bpf/progs/test_sockmap_kern.h
@@ -191,7 +191,7 @@ SEC("sockops")
 int bpf_sockmap(struct bpf_sock_ops *skops)
 {
 	__u32 lport, rport;
-	int op, err, ret;
+	int op, ret;
 
 	op = (int) skops->op;
 
@@ -203,10 +203,10 @@ int bpf_sockmap(struct bpf_sock_ops *skops)
 		if (lport == 10000) {
 			ret = 1;
 #ifdef SOCKMAP
-			err = bpf_sock_map_update(skops, &sock_map, &ret,
+			bpf_sock_map_update(skops, &sock_map, &ret,
 						  BPF_NOEXIST);
 #else
-			err = bpf_sock_hash_update(skops, &sock_map, &ret,
+			bpf_sock_hash_update(skops, &sock_map, &ret,
 						   BPF_NOEXIST);
 #endif
 		}
@@ -218,10 +218,10 @@ int bpf_sockmap(struct bpf_sock_ops *skops)
 		if (bpf_ntohl(rport) == 10001) {
 			ret = 10;
 #ifdef SOCKMAP
-			err = bpf_sock_map_update(skops, &sock_map, &ret,
+			bpf_sock_map_update(skops, &sock_map, &ret,
 						  BPF_NOEXIST);
 #else
-			err = bpf_sock_hash_update(skops, &sock_map, &ret,
+			bpf_sock_hash_update(skops, &sock_map, &ret,
 						   BPF_NOEXIST);
 #endif
 		}
@@ -230,8 +230,6 @@ int bpf_sockmap(struct bpf_sock_ops *skops)
 		break;
 	}
 
-	__sink(err);
-
 	return 0;
 }
 
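
Note on the follow-up mentioned in the commit message above: in the quoted
verifier log, the rejected path jumps from the lport check straight to
__sink(err) without calling a helper, so R0 has not been written when it
is read. One possible shape for a later patch that "actually tests the err
value" is sketched below. This is plain userspace C under stated
assumptions, not BPF code and not the series' fix: handle_sock_op(),
fake_sock_map_update() and the OP_* names are hypothetical stand-ins
introduced only for illustration.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the two sock_ops callback codes seen in the log. */
enum { OP_ACTIVE_ESTABLISHED = 4, OP_PASSIVE_ESTABLISHED = 5 };

/* Hypothetical stub replacing the map update helper; fails for key 10. */
static int fake_sock_map_update(int key)
{
	return key == 10 ? -1 : 0;
}

/*
 * Mirrors the control flow of bpf_sockmap(): returns 0 when no update was
 * needed or the update succeeded, and the stub's negative error otherwise.
 */
static int handle_sock_op(int op, unsigned int lport, unsigned int rport)
{
	int err = 0;	/* defined on every path, including "default" */
	int key;

	switch (op) {
	case OP_PASSIVE_ESTABLISHED:
		if (lport == 10000) {
			key = 1;
			err = fake_sock_map_update(key);
		}
		break;
	case OP_ACTIVE_ESTABLISHED:
		if (rport == 10001) {
			key = 10;
			err = fake_sock_map_update(key);
		}
		break;
	default:
		break;
	}

	/* Report the failure instead of silently discarding it. */
	if (err)
		fprintf(stderr, "map update failed: %d\n", err);
	return err;
}
```

With err initialized at its declaration, every path that reaches the final
check reads a defined value, and a failed update is reported rather than
left for userspace to notice later. How a real sock_ops program should
surface such an error is left open here.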