[bpf-next,v4,1/2] net: bpf: Check SKB ownership against full socket.

Check SKB ownership of an SKB against full sockets instead of request_sock.

The filters were called only if an SKB is owned by the sock that the SKB is
sent out through.  In another words, skb->sk should point to the sock that
it is sending through its egress.  However, the filters would miss SYN/ACK
SKBs that they are owned by a request_sock but sent through the listener
sock, that is the socket listening incoming connections.

However, the listener socket is also the full socket of the request socket.
We should use the full socket as the owner socket of an SKB instead.

What is the ownership check for?
====================================

BPF_CGROUP_RUN_PROG_INET_EGRESS() checked sk == skb->sk to ensure the
ownership of an SKB. Alexei referred to a mailing list conversation that
took place a few years ago. [1] In that conversation, Daniel Borkmann
stated that:

    Wouldn't that mean however, when you go through stacked devices that
    you'd run the same eBPF cgroup program for skb->sk multiple times?

According to what Daniel said, the ownership check mentioned earlier
presumably prevents multiple calls of egress filters caused by an skb.

A test that reproduce this scenario shows that the BPF cgroup egress
programs can be called multiple times for one SKB if this ownership
check is not there.  So, we can not just remove this check.

Test Stacked Devices
=======================

We use L2TP to build an environment of stacked devices. L2TP (Layer 2
Tunneling Protocol) is a tunneling protocol used to support virtual private
networks (VPNs). It relays encapsulated packets; for example in UDP, to its
peer by using a socket.

Using L2TP, packets are first sent through the IP stack and should then
arrive at an L2TP device. The device will expand its SKB header to
encapsulate the packet. The SKB will be sent back to the IP stack using the
socket that was made for the L2TP session. After that, the routing process
will occur once more, but this time for a new destination.

We changed tools/testing/selftests/net/l2tp.sh to set up a test environment
using L2TP.  The run_ping() function in l2tp.sh is where the main change
occurred.

    run_ping()
    {
        local desc="$1"

        sleep 10
        run_cmd host-1 ${ping6} -s 227 -c 4 -i 10 -I fc00:101::1
        fc00:101::2
        log_test $? 0 "IPv6 route through L2TP tunnel ${desc}"
        sleep 10

    }

The test will use L2TP devices to send PING messages. These messages will
have a message size of 227 bytes as a special label to distinguish them.
This is not an ideal solution, but works.

During the execution of the test script, bpftrace was attached to
ip6_finish_output() and l2tp_xmit_skb(). BPF

    bpftrace -e '
      kfunc:ip6_finish_output {
        time("%H:%M:%S: ");
        printf("ip6_finish_output skb=%p skb->len=%d cgroup=%p sk=%p
                skb->sk=%p\n", args->skb, args->skb->len,
               args->sk->sk_cgrp_data.cgroup, args->sk, args->skb->sk); }
      kfunc:l2tp_xmit_skb {
        time("%H:%M:%S: ");
        printf("l2tp_xmit_skb skb=%p sk=%p\n", args->skb,
	       args->session->tunnel->sock); }'

The following is part of the output messages printed by bpftrace.

    16:35:20: ip6_finish_output skb=0xffff888103d8e600 skb->len=275
              cgroup=0xffff88810741f800 sk=0xffff888105f3b900
              skb->sk=0xffff888105f3b900

    16:35:20: l2tp_xmit_skb skb=0xffff888103d8e600 sk=0xffff888103dd6300

    16:35:20: ip6_finish_output skb=0xffff888103d8e600 skb->len=337
              cgroup=0xffff88810741f800 sk=0xffff888103dd6300
              skb->sk=0xffff888105f3b900

    16:35:20: ip6_finish_output skb=0xffff888103d8e600 skb->len=337
              cgroup=(nil) sk=(nil) skb->sk=(nil)

    16:35:20: ip6_finish_output skb=0xffff888103d8e000 skb->len=275
              cgroup=0xffffffff837741d0 sk=0xffff888101fe0000
              skb->sk=0xffff888101fe0000

    16:35:20: l2tp_xmit_skb skb=0xffff888103d8e000 sk=0xffff888103483180

    16:35:20: ip6_finish_output skb=0xffff888103d8e000 skb->len=337
              cgroup=0xffff88810741f800 sk=0xffff888103483180
              skb->sk=0xffff888101fe0000

    16:35:20: ip6_finish_output skb=0xffff888103d8e000 skb->len=337
              cgroup=(nil) sk=(nil) skb->sk=(nil)

The first four entries describe a PING message that was sent using the ping
command, whereas the following four entries describe the response received.
Multiple sockets are used to send one skb, including the socket used by the
L2TP session. This can be observed.

Based on this information, it seems that the ownership check is designed to
avoid multiple calls of egress filters caused by a single skb.

[1] https://lore.kernel.org/all/58193E9D.7040201@iogearbox.net/,

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
---
 include/linux/bpf-cgroup.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Message ID	20230624014600.576756-2-kuifeng@meta.com (mailing list archive)
State	Accepted
Commit	223f5f79f2ce8facd9d77dd44e9f403343630bfc
Delegated to:	BPF
Headers	show Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E69F262E; Sat, 24 Jun 2023 01:46:10 +0000 (UTC) Received: from mail-yw1-x112b.google.com (mail-yw1-x112b.google.com [IPv6:2607:f8b0:4864:20::112b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 05D6626B0; Fri, 23 Jun 2023 18:46:09 -0700 (PDT) Received: by mail-yw1-x112b.google.com with SMTP id 00721157ae682-5700b37da3fso11960747b3.1; Fri, 23 Jun 2023 18:46:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1687571168; x=1690163168; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=76JpghSzqxpkmUMEkAEWsCp3Rh8C6sYcv5A/KZzk8E8=; b=iDurpGnRl0819ENTtN4y48rldrg6A9SdwD4pWSYwryoE+MRrnkbfSQ/Vx4Mo1+ZB0F pROIuW4Db8rBIwU2x7XHqectYTc7lor+/+rBHS5UjLnufoTtVzPfEeqsGMJgczpyzC6z QxxTFSYoVfR/bO0o+/1Ve+5ps9dmsmbB0+ju9V4KiTU5gr3xueo0YuZ7q5mCa/O+MA4j y4mpKI0VNVezWTKJkIMoOgHVEcaEDDQdHTH63iFWM3dJxRdvc9Xt04rNU9yFXqHVJWOG FdhcmBDDeHI4FIZnVa4hxeO8SGTTGFTQ6OzMjCYu3s3XYO25zRwKckVI/enSD6IDsBUM jv8g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687571168; x=1690163168; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=76JpghSzqxpkmUMEkAEWsCp3Rh8C6sYcv5A/KZzk8E8=; b=GniWX2sSEPzR593rbYasnOmRT0l60WfDVITZmlK/ufc6jBBRz0keM6JWiymYJb/mZ7 8WgtqnZz5AzK1u+4T/8RiLLR0tfNwyH5WrKUUHIxox3nWhpBu39H6wW3v6Wbt0SYbAvd HNsojP7KyTotdZKhV+8AWf8oqwhvsgRyHbQoD6Gb3ynP8Rts6Tm6pBBjhKKHnMDx8zIR zkbyI62OP2L5/bXhc7y+K2dpnlEB4IuVenoIXanyin/uZW9Lgb/YYnX9GgZcpMZhKy6g brOMfjApbyl+/TzJ03cCNsYSMbBWyvRMJutPbEWvauRs5vr3bDqPK8f+Ptg8J4GUlO/X +hCg== X-Gm-Message-State: AC+VfDx1cs7aDCgDhRbzTuCx5P6kununXcm5Y1ootsFpIXu6RGGT7zGM n514UXJ3aEhMkZlNQDSkmWqErTV7VrYWzA== X-Google-Smtp-Source: ACHHUZ4RM/VQxfKMSuU+bvWKDxEDiR437qgUg8Y5x+aDVH5+hHJAhvn7gD0EZxFhnlPwzp1/nz9w+Q== X-Received: by 2002:a81:8450:0:b0:561:9082:8039 with SMTP id u77-20020a818450000000b0056190828039mr16258862ywf.38.1687571167745; Fri, 23 Jun 2023 18:46:07 -0700 (PDT) Received: from kickker.attlocal.net ([2600:1700:6cf8:1240:d5fe:ab6b:8508:3503]) by smtp.gmail.com with ESMTPSA id b126-20020a816784000000b005731f3c8989sm107916ywc.62.2023.06.23.18.46.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 23 Jun 2023 18:46:07 -0700 (PDT) From: Kui-Feng Lee <thinker.li@gmail.com> X-Google-Original-From: Kui-Feng Lee <kuifeng@meta.com> To: bpf@vger.kernel.org, ast@kernel.org, martin.lau@linux.dev, song@kernel.org, kernel-team@meta.com, andrii@kernel.org, daniel@iogearbox.net, yhs@fb.com, kpsingh@kernel.org, shuah@kernel.org, john.fastabend@gmail.com, sdf@google.com, mykolal@fb.com, linux-kselftest@vger.kernel.org, jolsa@kernel.org, haoluo@google.com, netdev@vger.kernel.org Cc: Kui-Feng Lee <kuifeng@meta.com> Subject: [PATCH bpf-next v4 1/2] net: bpf: Check SKB ownership against full socket. Date: Fri, 23 Jun 2023 18:45:59 -0700 Message-Id: <20230624014600.576756-2-kuifeng@meta.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230624014600.576756-1-kuifeng@meta.com> References: <20230624014600.576756-1-kuifeng@meta.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: <bpf.vger.kernel.org> List-Subscribe: <mailto:bpf+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:bpf+unsubscribe@vger.kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net
Series	Fix missing synack in BPF cgroup_skb filters \| expand [bpf-next,v4,0/2] Fix missing synack in BPF cgroup_skb filters [bpf-next,v4,1/2] net: bpf: Check SKB ownership against full socket. [bpf-next,v4,2/2] selftests/bpf: Verify that the cgroup_skb filters receive expected packets.

Context	Check	Description
bpf/vmtest-bpf-next-VM_Test-30	success	Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-31	success	Logs for veristat
netdev/series_format	success	Posting correctly formatted
netdev/tree_selection	success	Clearly marked for bpf-next, async
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 1070 this patch: 1070
netdev/cc_maintainers	success	CCed 12 of 12 maintainers
netdev/build_clang	success	Errors and warnings before: 142 this patch: 142
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/deprecated_api	success	None detected
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	success	Errors and warnings before: 1077 this patch: 1077
netdev/checkpatch	warning	WARNING: From:/Signed-off-by: email address mismatch: 'From: Kui-Feng Lee <thinker.li@gmail.com>' != 'Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>'
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-9	success	Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-10	success	Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-19	success	Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-20	success	Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21	success	Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-22	success	Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-23	success	Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-24	success	Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-25	success	Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-27	success	Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28	success	Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-29	success	Logs for veristat
bpf/vmtest-bpf-next-VM_Test-11	success	Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-13	success	Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14	success	Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-15	success	Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-17	success	Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18	success	Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-26	success	Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-12	success	Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16	success	Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-PR	fail	PR summary
bpf/vmtest-bpf-next-VM_Test-1	success	Logs for ${{ matrix.test }} on ${{ matrix.arch }} with ${{ matrix.toolchain_full }}
bpf/vmtest-bpf-next-VM_Test-2	success	Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-3	success	Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-4	success	Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-5	fail	Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6	fail	Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-7	success	Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-8	success	Logs for veristat

[bpf-next,v4,1/2] net: bpf: Check SKB ownership against full socket.

Checks

Commit Message

Comments

Patch