From patchwork Wed Aug 16 02:54:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhai X-Patchwork-Id: 13354522 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 388E1C04A6A for ; Wed, 16 Aug 2023 02:55:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241393AbjHPCzQ (ORCPT ); Tue, 15 Aug 2023 22:55:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38464 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241383AbjHPCyr (ORCPT ); Tue, 15 Aug 2023 22:54:47 -0400 Received: from mail-qv1-xf2d.google.com (mail-qv1-xf2d.google.com [IPv6:2607:f8b0:4864:20::f2d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 74D652132 for ; Tue, 15 Aug 2023 19:54:45 -0700 (PDT) Received: by mail-qv1-xf2d.google.com with SMTP id 6a1803df08f44-6424c61c80eso19665866d6.1 for ; Tue, 15 Aug 2023 19:54:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cloudflare.com; s=google; t=1692154484; x=1692759284; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=B//PxqTOcLRZSFYVWOXGViYC9OULMJhe+RhXHERECls=; b=VvEnBrfJTbDnS7YO6wf4U3/Wb0P5UNqUPHrcpqNithi5S0BrD4zhGQP9vNu7gX4U69 O4PnAXhdHHEEMNemMlmWN8cR6k7G2oaNkIgMN6edW/LPayyq255oQb/D3Zoc/FOPZDhL chxBz1yU4VNEqdDK/hP+j4mCLwWPYllhrEPFU= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692154484; x=1692759284; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=B//PxqTOcLRZSFYVWOXGViYC9OULMJhe+RhXHERECls=; b=bayeCEh13Aq7RdnjwtrG4nbtSS8ZJd+DRyQb84vC5HdZIp4JXCBX1yjGLmnpbAPr1q B8hUf0M3vU1gQAh4F/szH/7dBmWlRr4PdxPi/rlE49Z4N49irW8FvHs90mqeQ3wCfF/Y VdVe3KD7NbpTBkTaCLVKqNWBLgj2YF/UZawFra0t5D9aY5OtzyBNJpkp5pXXKXJFa4/x ERS3ThtJScCup9HGtMULgGM9kUd/Fsot5uVFw3H9XAjCxI7iOixSuHVdwu0C2/GdlYX8 iGvTj0rFqn1fK7zAY0qolD6M2MpB9StDJuvgzqu9lSLCc1Sawyf3GnDxj0iRfTOjIBy1 DgKg== X-Gm-Message-State: AOJu0Ywyn0DXJCAyYLmmlqjuKjq/0GUI/VpZtUfLuCYk66uFnZ+bcMWw OZ5kI9moYHYsvBsMbPmgoLFnKw== X-Google-Smtp-Source: AGHT+IH2b3FS6jHzul6TdkbYn8e/Gv5OnxHjyZ+bzWZToXD1BUhFlrn/X8Ho5v7x8qdmANdvX12Jow== X-Received: by 2002:a05:6214:caf:b0:635:ce65:38a2 with SMTP id s15-20020a0562140caf00b00635ce6538a2mr6525581qvs.6.1692154484594; Tue, 15 Aug 2023 19:54:44 -0700 (PDT) Received: from debian.debian ([140.141.197.139]) by smtp.gmail.com with ESMTPSA id i13-20020a37c20d000000b007655a4c5423sm4143104qkm.130.2023.08.15.19.54.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 15 Aug 2023 19:54:44 -0700 (PDT) Date: Tue, 15 Aug 2023 19:54:41 -0700 From: Yan Zhai To: bpf@vger.kernel.org Cc: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , David Ahern , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Mykola Lysenko , Shuah Khan , Yan Zhai , Thomas Graf , Jordan Griege , Dan Carpenter , linux-kernel@vger.kernel.org, netdev@vger.kernel.org, linux-kselftest@vger.kernel.org, stable@vger.kernel.org Subject: [PATCH v5 bpf 1/4] lwt: fix return values of BPF ops Message-ID: <28cb906436e87eada712f55e63ae5c420bea0ecb.1692153515.git.yan@cloudflare.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org BPF encap ops can return different types of positive values, such like NET_RX_DROP, NET_XMIT_CN, NETDEV_TX_BUSY, and so on, from function skb_do_redirect and bpf_lwt_xmit_reroute. At the xmit hook, such return values would be treated implicitly as LWTUNNEL_XMIT_CONTINUE in ip(6)_finish_output2. When this happens, skbs that have been freed would continue to the neighbor subsystem, causing use-after-free bug and kernel crashes. To fix the incorrect behavior, skb_do_redirect return values can be simply discarded, the same as tc-egress behavior. On the other hand, bpf_lwt_xmit_reroute returns useful errors to local senders, e.g. PMTU information. Thus convert its return values to avoid the conflict with LWTUNNEL_XMIT_CONTINUE. Fixes: 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure") Suggested-by: Martin KaFai Lau Suggested-by: Stanislav Fomichev Reported-by: Jordan Griege Signed-off-by: Yan Zhai --- * v5: discards skb_do_redirect return instead; convert bpf_lwt_xmit_reroute return; * v4: minor commit message changes * v3: converts skb_do_redirect statuses from both ingress and egress * v2: code style amend --- net/core/lwt_bpf.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c index 8b6b5e72b217..4a0797f0a154 100644 --- a/net/core/lwt_bpf.c +++ b/net/core/lwt_bpf.c @@ -60,9 +60,8 @@ static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt, ret = BPF_OK; } else { skb_reset_mac_header(skb); - ret = skb_do_redirect(skb); - if (ret == 0) - ret = BPF_REDIRECT; + skb_do_redirect(skb); + ret = BPF_REDIRECT; } break; @@ -255,7 +254,7 @@ static int bpf_lwt_xmit_reroute(struct sk_buff *skb) err = dst_output(dev_net(skb_dst(skb)->dev), skb->sk, skb); if (unlikely(err)) - return err; + return net_xmit_errno(err); /* ip[6]_finish_output2 understand LWTUNNEL_XMIT_DONE */ return LWTUNNEL_XMIT_DONE; From patchwork Wed Aug 16 02:55:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhai X-Patchwork-Id: 13354523 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DF9E6C41513 for ; Wed, 16 Aug 2023 02:56:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241396AbjHPCzt (ORCPT ); Tue, 15 Aug 2023 22:55:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59078 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241424AbjHPCzd (ORCPT ); Tue, 15 Aug 2023 22:55:33 -0400 Received: from mail-qv1-xf34.google.com (mail-qv1-xf34.google.com [IPv6:2607:f8b0:4864:20::f34]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DAFD32690 for ; Tue, 15 Aug 2023 19:55:25 -0700 (PDT) Received: by mail-qv1-xf34.google.com with SMTP id 6a1803df08f44-6424c61c80eso19667156d6.1 for ; Tue, 15 Aug 2023 19:55:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cloudflare.com; s=google; t=1692154525; x=1692759325; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=PlkFhJjPZ+nVOvUd+Bq8Pm9xKRLhY4/ZIov2TViDPGA=; b=u0crk9vGr/nqq4DRoRfsmHXKIisixdAHqHTsS1KMoVwfubQuVk0JmLH7NclBlUCzwP GVP/G00X4Kfqdn6GNdCAunGqtZAtgZ0V0eUK0jkq68HAlYeRZeYac/6KOjVfyyxP/QjU lFKCEbNa2yP5IuAQCD/x011qQerffKA17z1cU= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692154525; x=1692759325; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=PlkFhJjPZ+nVOvUd+Bq8Pm9xKRLhY4/ZIov2TViDPGA=; b=YlXTU3G548R8CN1la1b5EhBkQsX9uaw5gsBnF7DZXAjojRQGcgspVj8knA8mq3rS8o 75Yz9Y0lXi6UZvyVW+1dox9eT0J2nR6c7ICuxPldpXafybFA9tVuuB/e4h7u2ZlsMCTH 2ndpiYIl3625+qRdhVgz7aNkezLs4rDHwvx+eCjphSH/rMRKuwopXDylmRa8gGuPUKW2 dqJrUTUjcq3zxZLyvpg0yCxK1N//gUN2pNAY66NCf6hUuvqkWh8l318YQdnnYuWAW13G pU3jolEiyJm9GtzgStC9FegtBnBo3x0YIH08VqNOHwgK0+nzXhJkqQxZGwcEl59ZNZk4 YLhw== X-Gm-Message-State: AOJu0YxEf/bqNzlo5+VJpvLoqY74jDeOfh77KYaB05PdDt/aIL8Gu3rM wkQKcuBKQgKMm+zRAYy16+hU0w== X-Google-Smtp-Source: AGHT+IER30di/frjLmSfjBFxrbvcoUh8ENJ/K3XUUSW6J703Xp97ZRzLzUrHCHHTK6WJ/FNFpTmFmg== X-Received: by 2002:a05:6214:500b:b0:63d:580:9c68 with SMTP id jo11-20020a056214500b00b0063d05809c68mr5481820qvb.32.1692154524793; Tue, 15 Aug 2023 19:55:24 -0700 (PDT) Received: from debian.debian ([140.141.197.139]) by smtp.gmail.com with ESMTPSA id i4-20020a0cf384000000b006300722883fsm4576015qvk.33.2023.08.15.19.55.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 15 Aug 2023 19:55:24 -0700 (PDT) Date: Tue, 15 Aug 2023 19:55:21 -0700 From: Yan Zhai To: bpf@vger.kernel.org Cc: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , David Ahern , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Mykola Lysenko , Shuah Khan , Yan Zhai , Thomas Graf , Jordan Griege , Dan Carpenter , linux-kernel@vger.kernel.org, netdev@vger.kernel.org, linux-kselftest@vger.kernel.org, stable@vger.kernel.org Subject: [PATCH v5 bpf 2/4] lwt: check LWTUNNEL_XMIT_CONTINUE strictly Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org LWTUNNEL_XMIT_CONTINUE is implicitly assumed in ip(6)_finish_output2, such that any positive return value from a xmit hook could cause unexpected continue behavior, despite that related skb may have been freed. This could be error-prone for future xmit hook ops, particularly if dst_output statuses are directly returned. To make the code safer, redefine LWTUNNEL_XMIT_CONTINUE value to distinguish from dst_output statuses and check the continue condition explicitly. Fixes: 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure") Suggested-by: Dan Carpenter Signed-off-by: Yan Zhai --- include/net/lwtunnel.h | 5 ++++- net/ipv4/ip_output.c | 2 +- net/ipv6/ip6_output.c | 2 +- 3 files changed, 6 insertions(+), 3 deletions(-) diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h index 6f15e6fa154e..53bd2d02a4f0 100644 --- a/include/net/lwtunnel.h +++ b/include/net/lwtunnel.h @@ -16,9 +16,12 @@ #define LWTUNNEL_STATE_INPUT_REDIRECT BIT(1) #define LWTUNNEL_STATE_XMIT_REDIRECT BIT(2) +/* LWTUNNEL_XMIT_CONTINUE should be distinguishable from dst_output return + * values (NET_XMIT_xxx and NETDEV_TX_xxx in linux/netdevice.h) for safety. + */ enum { LWTUNNEL_XMIT_DONE, - LWTUNNEL_XMIT_CONTINUE, + LWTUNNEL_XMIT_CONTINUE = 0x100, }; diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 6ba1a0fafbaa..a6e4c82615d7 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -216,7 +216,7 @@ static int ip_finish_output2(struct net *net, struct sock *sk, struct sk_buff *s if (lwtunnel_xmit_redirect(dst->lwtstate)) { int res = lwtunnel_xmit(skb); - if (res < 0 || res == LWTUNNEL_XMIT_DONE) + if (res != LWTUNNEL_XMIT_CONTINUE) return res; } diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 1e8c90e97608..016b0a513259 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -113,7 +113,7 @@ static int ip6_finish_output2(struct net *net, struct sock *sk, struct sk_buff * if (lwtunnel_xmit_redirect(dst->lwtstate)) { int res = lwtunnel_xmit(skb); - if (res < 0 || res == LWTUNNEL_XMIT_DONE) + if (res != LWTUNNEL_XMIT_CONTINUE) return res; } From patchwork Wed Aug 16 02:56:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhai X-Patchwork-Id: 13354527 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D28DCC41513 for ; Wed, 16 Aug 2023 02:57:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241359AbjHPC4z (ORCPT ); Tue, 15 Aug 2023 22:56:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59834 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241448AbjHPC4h (ORCPT ); Tue, 15 Aug 2023 22:56:37 -0400 Received: from mail-qk1-x732.google.com (mail-qk1-x732.google.com [IPv6:2607:f8b0:4864:20::732]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5F55F212D for ; Tue, 15 Aug 2023 19:56:35 -0700 (PDT) Received: by mail-qk1-x732.google.com with SMTP id af79cd13be357-76d70d17ab8so7978785a.1 for ; Tue, 15 Aug 2023 19:56:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cloudflare.com; s=google; t=1692154594; x=1692759394; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=wUS22k3AgcW7X1Fk7O5pl817ZLlS8d3V4wXOz1NMjBc=; b=tzza1nbLRB8tbdUG8ocRV16I6ZnKOByzToRA2xU6kZXHAGDsmmmGm2sOzBJKbA3t0e 4zjkr+C6ZZGHDCtZMS3lIwRkoMGrc9SzfpPwfTr6KzcmXhXgWN/7yBcf9faXI3FTBDGa mZuc1c4raeXY7x/06gX+QvGNTm/IrBuQfwR8w= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692154594; x=1692759394; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=wUS22k3AgcW7X1Fk7O5pl817ZLlS8d3V4wXOz1NMjBc=; b=PU9dXZyKu5eSI7NhbuhvdrItTmzW5Oqy/8mgDKVgVKFQ0fIGY84JiGf50tHRws8oeO tstUT7cLrr+Tb0XIjx6laaxKaxozS1+q9WPN3VlQKtIE+uQ8GcFQqoDF+3LxyNFicLjI wG8BMrLbOZjeIZB951RllPyU49rnGhPxhYwpHYX6jWkt0RHlVUvtfJBgYZfK5MyM6h6T 08kelTlyG2RfHjdXCN/w/yNHh93rCZargo1DWD9KR00zB8P0Xirmq/osYV8j3x7uI5Wc ChjnurvIekaDBgwCWQeOhngyHCynH2eMaObE+W2QFxMWb0F7tG4nuowRhcr4HmIChY0h nWiw== X-Gm-Message-State: AOJu0YwXkyRl2ahlTvvUt1OeYLlXb09Ls0VvostvvT84VP3bUIrrhAcZ 0m9+ltlFRtKjpc8/D6Douw2acg== X-Google-Smtp-Source: AGHT+IESBkcivUQmO8KQstVRbxG1ZYE+IqctD3ngwgi91s6USXC8ccBqkCKegatKNgP4AuwKaRIXUg== X-Received: by 2002:a05:620a:178c:b0:75b:23a0:d9dc with SMTP id ay12-20020a05620a178c00b0075b23a0d9dcmr770197qkb.50.1692154594413; Tue, 15 Aug 2023 19:56:34 -0700 (PDT) Received: from debian.debian ([140.141.197.139]) by smtp.gmail.com with ESMTPSA id c7-20020a05620a11a700b0076827ce13f6sm4174464qkk.10.2023.08.15.19.56.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 15 Aug 2023 19:56:33 -0700 (PDT) Date: Tue, 15 Aug 2023 19:56:31 -0700 From: Yan Zhai To: bpf@vger.kernel.org Cc: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , David Ahern , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Mykola Lysenko , Shuah Khan , Yan Zhai , Thomas Graf , Jordan Griege , Dan Carpenter , linux-kernel@vger.kernel.org, netdev@vger.kernel.org, linux-kselftest@vger.kernel.org, stable@vger.kernel.org Subject: [PATCH v5 bpf 3/4] selftests/bpf: add lwt_xmit tests for BPF_REDIRECT Message-ID: <8d0477aaee535503082de6a85793ffb0be73fccb.1692153515.git.yan@cloudflare.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org There is no lwt_xmit selftest for BPF_REDIRECT yet. Add test cases for both normal and abnormal situations. For abnormal test cases, devices are set down or have its carrier set down. Without proper fixes, BPF_REDIRECT to either ingress or egress of such device would panic the kernel, so please run the test in a VM for safety. Signed-off-by: Yan Zhai --- .../selftests/bpf/prog_tests/lwt_helpers.h | 139 ++++++++ .../selftests/bpf/prog_tests/lwt_redirect.c | 319 ++++++++++++++++++ .../selftests/bpf/progs/test_lwt_redirect.c | 58 ++++ 3 files changed, 516 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/lwt_helpers.h create mode 100644 tools/testing/selftests/bpf/prog_tests/lwt_redirect.c create mode 100644 tools/testing/selftests/bpf/progs/test_lwt_redirect.c diff --git a/tools/testing/selftests/bpf/prog_tests/lwt_helpers.h b/tools/testing/selftests/bpf/prog_tests/lwt_helpers.h new file mode 100644 index 000000000000..61333f2a03f9 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/lwt_helpers.h @@ -0,0 +1,139 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __LWT_HELPERS_H +#define __LWT_HELPERS_H + +#include +#include +#include +#include + +#include "test_progs.h" + +#define log_err(MSG, ...) \ + fprintf(stderr, "(%s:%d: errno: %s) " MSG "\n", \ + __FILE__, __LINE__, strerror(errno), ##__VA_ARGS__) + +#define RUN_TEST(name) \ + ({ \ + if (test__start_subtest(#name)) \ + if (ASSERT_OK(netns_create(), "netns_create")) { \ + struct nstoken *token = open_netns(NETNS); \ + if (ASSERT_OK_PTR(token, "setns")) { \ + test_ ## name(); \ + close_netns(token); \ + } \ + netns_delete(); \ + } \ + }) + +#define NETNS "ns_lwt" + +static inline int netns_create(void) +{ + return system("ip netns add " NETNS); +} + +static inline int netns_delete(void) +{ + return system("ip netns del " NETNS ">/dev/null 2>&1"); +} + +static int open_tuntap(const char *dev_name, bool need_mac) +{ + int err = 0; + struct ifreq ifr; + int fd = open("/dev/net/tun", O_RDWR); + + if (!ASSERT_GT(fd, 0, "open(/dev/net/tun)")) + return -1; + + ifr.ifr_flags = IFF_NO_PI | (need_mac ? IFF_TAP : IFF_TUN); + memcpy(ifr.ifr_name, dev_name, IFNAMSIZ); + + err = ioctl(fd, TUNSETIFF, &ifr); + if (!ASSERT_OK(err, "ioctl(TUNSETIFF)")) { + close(fd); + return -1; + } + + err = fcntl(fd, F_SETFL, O_NONBLOCK); + if (!ASSERT_OK(err, "fcntl(O_NONBLOCK)")) { + close(fd); + return -1; + } + + return fd; +} + +#define ICMP_PAYLOAD_SIZE 100 + +/* Match an ICMP packet with payload len ICMP_PAYLOAD_SIZE */ +static int __expect_icmp_ipv4(char *buf, ssize_t len) +{ + struct iphdr *ip = (struct iphdr *)buf; + struct icmphdr *icmp = (struct icmphdr *)(ip + 1); + ssize_t min_header_len = sizeof(*ip) + sizeof(*icmp); + + if (len < min_header_len) + return -1; + + if (ip->protocol != IPPROTO_ICMP) + return -1; + + if (icmp->type != ICMP_ECHO) + return -1; + + return len == ICMP_PAYLOAD_SIZE + min_header_len; +} + +typedef int (*filter_t) (char *, ssize_t); + +/* wait_for_packet - wait for a packet that matches the filter + * + * @fd: tun fd/packet socket to read packet + * @filter: filter function, returning 1 if matches + * @timeout: timeout to wait for the packet + * + * Returns 1 if a matching packet is read, 0 if timeout expired, -1 on error. + */ +static int wait_for_packet(int fd, filter_t filter, struct timeval *timeout) +{ + char buf[4096]; + int max_retry = 5; /* in case we read some spurious packets */ + fd_set fds; + + FD_ZERO(&fds); + while (max_retry--) { + /* Linux modifies timeout arg... So make a copy */ + struct timeval copied_timeout = *timeout; + ssize_t ret = -1; + + FD_SET(fd, &fds); + + ret = select(1 + fd, &fds, NULL, NULL, &copied_timeout); + if (ret <= 0) { + if (errno == EINTR) + continue; + else if (errno == EAGAIN || ret == 0) + return 0; + + log_err("select failed"); + return -1; + } + + ret = read(fd, buf, sizeof(buf)); + + if (ret <= 0) { + log_err("read(dev): %ld", ret); + return -1; + } + + if (filter && filter(buf, ret) > 0) + return 1; + } + + return 0; +} + +#endif /* __LWT_HELPERS_H */ diff --git a/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c b/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c new file mode 100644 index 000000000000..818642064bcb --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c @@ -0,0 +1,319 @@ +// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause + +/* + * Test suite of lwt_xmit BPF programs that redirect packets + * The file tests focus not only if these programs work as expected normally, + * but also if they can handle abnormal situations gracefully. + * + * WARNING + * ------- + * This test suite may crash the kernel, thus should be run in a VM. + * + * Setup: + * --------- + * All tests are performed in a single netns. Two lwt encap routes are setup for + * each subtest: + * + * ip route add 10.0.0.2 encap bpf xmit sec "" dev link_err + * ip route add 10.0.0.3 encap bpf xmit sec "" dev link_err + * + * Here is statically defined to test_lwt_redirect.bpf.o, and each section + * of this object holds a program entry to test. The BPF object is built from + * progs/test_lwt_redirect.c. We didn't use generated BPF skeleton since the + * attachment for lwt programs are not supported by libbpf yet. + * + * For testing, a ping command is running in the test netns: + * + * ping -m 10.0.0. -c 1 -w 1 -s 100 + * + * All the BPF redirect program use the packet mark as output device index. + * + * Scenarios: + * -------------------------------- + * 1. Redirect to a running tap/tun device + * 2. Redirect to a down tap/tun device + * 3. Redirect to a vlan device with lower layer down + * + * Case 1, ping packets should be received by packet socket on target device + * when redirected to ingress, and by tun/tap fd when redirected to egress. + * + * Case 2,3 are considered successful as long as they do not crash the kernel + * as a regression. + * + * Case 1,2 use tap device to test redirect to device that requires MAC + * header, and tun device to test the case with no MAC header added. + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "lwt_helpers.h" +#include "test_progs.h" +#include "network_helpers.h" + +#define BPF_OBJECT "test_lwt_redirect.bpf.o" +#define INGRESS_SEC(need_mac) ((need_mac) ? "redir_ingress" : "redir_ingress_nomac") +#define EGRESS_SEC(need_mac) ((need_mac) ? "redir_egress" : "redir_egress_nomac") +#define LOCAL_SRC "10.0.0.1" +#define IP_TO_INGRESS "10.0.0.2" +#define IP_TO_EGRESS "10.0.0.3" + +/* ping to redirect toward given dev, with mark being its index */ +static void ping_dev(const char *dev, const char *ip) +{ + int link_index = if_nametoindex(dev); + + if (!ASSERT_GE(link_index, 0, "if_nametoindex")) + return; + + /* We won't get a reply. Don't fail here */ + SYS_NOFAIL("ping -m %d %s -c1 -w1 -s %d >/dev/null 2>&1", + link_index, ip, ICMP_PAYLOAD_SIZE); +} + +static int new_packet_sock(const char *ifname) +{ + int err = 0; + int ignore_outgoing = 1; + int ifindex = -1; + int s = -1; + + s = socket(AF_PACKET, SOCK_RAW, 0); + if (!ASSERT_GE(s, 0, "socket(AF_PACKET)")) + return -1; + + ifindex = if_nametoindex(ifname); + if (!ASSERT_GE(ifindex, 0, "if_nametoindex")) { + close(s); + return -1; + } + + struct sockaddr_ll addr = { + .sll_family = AF_PACKET, + .sll_protocol = htons(ETH_P_IP), + .sll_ifindex = ifindex, + }; + + err = bind(s, (struct sockaddr *)&addr, sizeof(addr)); + if (!ASSERT_OK(err, "bind(AF_PACKET)")) { + close(s); + return -1; + } + + /* Use packet socket to capture only the ingress, so we can distinguish + * the case where a regression that actually redirects the packet to + * the egress. + */ + err = setsockopt(s, SOL_PACKET, PACKET_IGNORE_OUTGOING, + &ignore_outgoing, sizeof(ignore_outgoing)); + if (!ASSERT_OK(err, "setsockopt(PACKET_IGNORE_OUTGOING)")) { + close(s); + return -1; + } + + err = fcntl(s, F_SETFL, O_NONBLOCK); + if (!ASSERT_OK(err, "fcntl(O_NONBLOCK)")) { + close(s); + return -1; + } + + return s; +} + +static int expect_icmp(char *buf, ssize_t len) +{ + struct ethhdr *eth = (struct ethhdr *)buf; + + if (len < (ssize_t)sizeof(*eth)) + return -1; + + if (eth->h_proto == htons(ETH_P_IP)) + return __expect_icmp_ipv4((char *)(eth + 1), len - sizeof(*eth)); + + return -1; +} + +static int expect_icmp_nomac(char *buf, ssize_t len) +{ + return __expect_icmp_ipv4(buf, len); +} + +static void send_and_capture_test_packets(const char *test_name, int tap_fd, + const char *target_dev, bool need_mac) +{ + int psock = -1; + struct timeval timeo = { + .tv_sec = 0, + .tv_usec = 250000, + }; + int ret = -1; + + filter_t filter = need_mac ? expect_icmp : expect_icmp_nomac; + + ping_dev(target_dev, IP_TO_EGRESS); + + ret = wait_for_packet(tap_fd, filter, &timeo); + if (!ASSERT_EQ(ret, 1, "wait_for_epacket")) { + log_err("%s egress test fails", test_name); + goto out; + } + + psock = new_packet_sock(target_dev); + ping_dev(target_dev, IP_TO_INGRESS); + + ret = wait_for_packet(psock, filter, &timeo); + if (!ASSERT_EQ(ret, 1, "wait_for_ipacket")) { + log_err("%s ingress test fails", test_name); + goto out; + } + +out: + if (psock >= 0) + close(psock); +} + +static int setup_redirect_target(const char *target_dev, bool need_mac) +{ + int target_index = -1; + int tap_fd = -1; + + tap_fd = open_tuntap(target_dev, need_mac); + if (!ASSERT_GE(tap_fd, 0, "open_tuntap")) + goto fail; + + target_index = if_nametoindex(target_dev); + if (!ASSERT_GE(target_index, 0, "if_nametoindex")) + goto fail; + + SYS(fail, "ip link add link_err type dummy"); + SYS(fail, "ip link set lo up"); + SYS(fail, "ip addr add dev lo " LOCAL_SRC "/32"); + SYS(fail, "ip link set link_err up"); + SYS(fail, "ip link set %s up", target_dev); + + SYS(fail, "ip route add %s dev link_err encap bpf xmit obj %s sec %s", + IP_TO_INGRESS, BPF_OBJECT, INGRESS_SEC(need_mac)); + SYS(fail, "ip route add %s dev link_err encap bpf xmit obj %s sec %s", + IP_TO_EGRESS, BPF_OBJECT, EGRESS_SEC(need_mac)); + + return tap_fd; + +fail: + if (tap_fd >= 0) + close(tap_fd); + return -1; +} + +static void test_lwt_redirect_normal(void) +{ + const char *target_dev = "tap0"; + int tap_fd = -1; + bool need_mac = true; + + tap_fd = setup_redirect_target(target_dev, need_mac); + if (!ASSERT_GE(tap_fd, 0, "setup_redirect_target")) + return; + + send_and_capture_test_packets(__func__, tap_fd, target_dev, need_mac); + close(tap_fd); +} + +static void test_lwt_redirect_normal_nomac(void) +{ + const char *target_dev = "tun0"; + int tap_fd = -1; + bool need_mac = false; + + tap_fd = setup_redirect_target(target_dev, need_mac); + if (!ASSERT_GE(tap_fd, 0, "setup_redirect_target")) + return; + + send_and_capture_test_packets(__func__, tap_fd, target_dev, need_mac); + close(tap_fd); +} + +/* This test aims to prevent regression of future. As long as the kernel does + * not panic, it is considered as success. + */ +static void __test_lwt_redirect_dev_down(bool need_mac) +{ + const char *target_dev = "tap0"; + int tap_fd = -1; + + tap_fd = setup_redirect_target(target_dev, need_mac); + if (!ASSERT_GE(tap_fd, 0, "setup_redirect_target")) + return; + + SYS(out, "ip link set %s down", target_dev); + ping_dev(target_dev, IP_TO_INGRESS); + ping_dev(target_dev, IP_TO_EGRESS); + +out: + close(tap_fd); +} + +static void test_lwt_redirect_dev_down(void) +{ + __test_lwt_redirect_dev_down(true); +} + +static void test_lwt_redirect_dev_down_nomac(void) +{ + __test_lwt_redirect_dev_down(false); +} + +/* This test aims to prevent regression of future. As long as the kernel does + * not panic, it is considered as success. + */ +static void test_lwt_redirect_dev_carrier_down(void) +{ + const char *lower_dev = "tap0"; + const char *vlan_dev = "vlan100"; + int tap_fd = -1; + + tap_fd = setup_redirect_target(lower_dev, true); + if (!ASSERT_GE(tap_fd, 0, "setup_redirect_target")) + return; + + SYS(out, "ip link add vlan100 link %s type vlan id 100", lower_dev); + SYS(out, "ip link set %s up", vlan_dev); + SYS(out, "ip link set %s down", lower_dev); + ping_dev(vlan_dev, IP_TO_INGRESS); + ping_dev(vlan_dev, IP_TO_EGRESS); + +out: + close(tap_fd); +} + +static void *test_lwt_redirect_run(void *arg) +{ + netns_delete(); + RUN_TEST(lwt_redirect_normal); + RUN_TEST(lwt_redirect_normal_nomac); + RUN_TEST(lwt_redirect_dev_down); + RUN_TEST(lwt_redirect_dev_down_nomac); + RUN_TEST(lwt_redirect_dev_carrier_down); + return NULL; +} + +void test_lwt_redirect(void) +{ + pthread_t test_thread; + int err; + + /* Run the tests in their own thread to isolate the namespace changes + * so they do not affect the environment of other tests. + * (specifically needed because of unshare(CLONE_NEWNS) in open_netns()) + */ + err = pthread_create(&test_thread, NULL, &test_lwt_redirect_run, NULL); + if (ASSERT_OK(err, "pthread_create")) + ASSERT_OK(pthread_join(test_thread, NULL), "pthread_join"); +} diff --git a/tools/testing/selftests/bpf/progs/test_lwt_redirect.c b/tools/testing/selftests/bpf/progs/test_lwt_redirect.c new file mode 100644 index 000000000000..f3dad9c8d356 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_lwt_redirect.c @@ -0,0 +1,58 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include "bpf_tracing_net.h" + +/* We don't care about whether the packet can be received by network stack. + * Just care if the packet is sent to the correct device at correct direction + * and not panic the kernel. + */ +static int prepend_dummy_mac(struct __sk_buff *skb) +{ + char mac[] = {0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0xf, + 0xe, 0xd, 0xc, 0xb, 0xa, 0x08, 0x00}; + + if (bpf_skb_change_head(skb, ETH_HLEN, 0)) { + bpf_printk("%s: fail to change head", __func__); + return -1; + } + + if (bpf_skb_store_bytes(skb, 0, mac, sizeof(mac), 0)) { + bpf_printk("%s: fail to update mac", __func__); + return -1; + } + + return 0; +} + +SEC("redir_ingress") +int test_lwt_redirect_in(struct __sk_buff *skb) +{ + if (prepend_dummy_mac(skb)) + return BPF_DROP; + + return bpf_redirect(skb->mark, BPF_F_INGRESS); +} + +SEC("redir_egress") +int test_lwt_redirect_out(struct __sk_buff *skb) +{ + if (prepend_dummy_mac(skb)) + return BPF_DROP; + + return bpf_redirect(skb->mark, 0); +} + +SEC("redir_egress_nomac") +int test_lwt_redirect_out_nomac(struct __sk_buff *skb) +{ + return bpf_redirect(skb->mark, 0); +} + +SEC("redir_ingress_nomac") +int test_lwt_redirect_in_nomac(struct __sk_buff *skb) +{ + return bpf_redirect(skb->mark, BPF_F_INGRESS); +} + +char _license[] SEC("license") = "GPL"; From patchwork Wed Aug 16 02:57:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yan Zhai X-Patchwork-Id: 13354528 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DECE9C41513 for ; Wed, 16 Aug 2023 02:58:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241340AbjHPC6B (ORCPT ); Tue, 15 Aug 2023 22:58:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42544 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241413AbjHPC5a (ORCPT ); Tue, 15 Aug 2023 22:57:30 -0400 Received: from mail-qk1-x729.google.com (mail-qk1-x729.google.com [IPv6:2607:f8b0:4864:20::729]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 137F3212E for ; Tue, 15 Aug 2023 19:57:28 -0700 (PDT) Received: by mail-qk1-x729.google.com with SMTP id af79cd13be357-76d5352bdb0so216400585a.0 for ; Tue, 15 Aug 2023 19:57:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cloudflare.com; s=google; t=1692154647; x=1692759447; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=YSGdEnCqI6zb744rb3nN7Qwri65wFsNuiAOi4seAdmc=; b=u8cfweM9p3FM+s756YlMsJ0fuQr8ukg7dueUa3JJLNQdPPFyaZbCSs4ccefaj1/nrB 06xrJB2KpDhj7NKdcmHtxZb1+LriH2ZkfnZdZ1i4GzAjV9ujS1Kg35AhZ9AwH1ytBXpZ TshUcVzBwe92qWPX6hm0jYNjIfMyuACs5mtTA= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692154647; x=1692759447; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=YSGdEnCqI6zb744rb3nN7Qwri65wFsNuiAOi4seAdmc=; b=ApX4NR6k4GeRboZFuVnPf+gIOrNhd+Ttc3LC3omiWapwXYrmP+xViE1Js7/jvcj7Yv krBzJfhdDKW8wxYsd8ozbCgiWklqpS1qTIQKzHeE36QbtlY5d8aEBK5mf2xfhjc17AMD 2zQDaHmTPA/MjAilAcFH3lCsYmtzKP3zMG+biRNNs3NLX/k8M6EnSvhhlYiyKbWeRZUr vaXMI9oDrRmB5E5yxx5co6LJNco6gpxxXKRzvTUOpRSMlJXWB/PIgtbzuU6P6YbpTfAv WkmE61JRekHuvyykx6j0OdQ8nY6x3lN7m65gr3qM2t+jIvawKifsOq+jlKjcCy42zLm/ PxxQ== X-Gm-Message-State: AOJu0Yws5YEACHQbjO7F403sxVV1lV0g+5RSdvrfX7DGSF+Z3gb6+hKw +MNZLZYpoY7aqtTsmxxQLk9trA== X-Google-Smtp-Source: AGHT+IG+B5NS32lthGUirYjWUP3Lv0I6Z9Ii988Ra+jZxvPldsot/LT2oAEmBFcInF/JgS6mgB5+yQ== X-Received: by 2002:a05:620a:4414:b0:767:edc0:64f8 with SMTP id v20-20020a05620a441400b00767edc064f8mr950843qkp.2.1692154647011; Tue, 15 Aug 2023 19:57:27 -0700 (PDT) Received: from debian.debian ([140.141.197.139]) by smtp.gmail.com with ESMTPSA id q4-20020a05620a038400b0076639dfca8dsm4158014qkm.80.2023.08.15.19.57.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 15 Aug 2023 19:57:26 -0700 (PDT) Date: Tue, 15 Aug 2023 19:57:24 -0700 From: Yan Zhai To: bpf@vger.kernel.org Cc: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , David Ahern , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Mykola Lysenko , Shuah Khan , Yan Zhai , Thomas Graf , Jordan Griege , Dan Carpenter , linux-kernel@vger.kernel.org, netdev@vger.kernel.org, linux-kselftest@vger.kernel.org, stable@vger.kernel.org Subject: [PATCH v5 bpf 4/4] selftests/bpf: add lwt_xmit tests for BPF_REROUTE Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org There is no lwt selftest for BPF_REROUTE yet. Add test cases for both normal and abnormal situations. The abnormal situation is set up with an fq qdisc on the reroute target device. Without proper fixes, BPF_REROUTE to this device and overflow this qdisc queue limit (to trigger a drop) would panic the kernel, so please run the test in a VM for safety. Signed-off-by: Yan Zhai --- .../selftests/bpf/prog_tests/lwt_reroute.c | 256 ++++++++++++++++++ .../selftests/bpf/progs/test_lwt_reroute.c | 36 +++ 2 files changed, 292 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/lwt_reroute.c create mode 100644 tools/testing/selftests/bpf/progs/test_lwt_reroute.c diff --git a/tools/testing/selftests/bpf/prog_tests/lwt_reroute.c b/tools/testing/selftests/bpf/prog_tests/lwt_reroute.c new file mode 100644 index 000000000000..e5fa5451bd2c --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/lwt_reroute.c @@ -0,0 +1,256 @@ +// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause + +/* + * Test suite of lwt BPF programs that reroutes packets + * The file tests focus not only if these programs work as expected normally, + * but also if they can handle abnormal situations gracefully. This test + * suite currently only covers lwt_xmit hook. lwt_in tests have not been + * implemented. + * + * WARNING + * ------- + * This test suite can crash the kernel, thus should be run in a VM. + * + * Setup: + * --------- + * all tests are performed in a single netns. A lwt encap route is setup for + * each subtest: + * + * ip route add 10.0.0.0/24 encap bpf xmit sec "" dev link_err + * + * Here is statically defined to test_lwt_reroute.bpf.o, and it contains + * a single test program entry. This program sets packet mark by last byte of + * the IPv4 daddr. For example, a packet going to 1.2.3.4 will receive a skb + * mark 4. A packet will only be marked once, and IP x.x.x.0 will be skipped + * to avoid route loop. We didn't use generated BPF skeleton since the + * attachment for lwt programs are not supported by libbpf yet. + * + * The test program will bring up a tun device, and sets up the following + * routes: + * + * ip rule add pref 100 from all fwmark lookup 100 + * ip route add table 100 default dev tun0 + * + * For normal testing, a ping command is running in the test netns: + * + * ping 10.0.0. -c 1 -w 1 -s 100 + * + * For abnormal testing, fq is used as the qdisc of the tun device. Then a UDP + * socket will try to overflow the fq queue and trigger qdisc drop error. + * + * Scenarios: + * -------------------------------- + * 1. Reroute to a running tun device + * 2. Reroute to a device where qdisc drop + * + * For case 1, ping packets should be received by the tun device. + * + * For case 2, force UDP packets to overflow fq limit. As long as kernel + * is not crashed, it is considered successful. + */ +#include "lwt_helpers.h" +#include "network_helpers.h" +#include + +#define BPF_OBJECT "test_lwt_reroute.bpf.o" +#define LOCAL_SRC "10.0.0.1" +#define TEST_CIDR "10.0.0.0/24" +#define XMIT_HOOK "xmit" +#define XMIT_SECTION "lwt_xmit" +#define NSEC_PER_SEC 1000000000ULL + +/* send a ping to be rerouted to the target device */ +static void ping_once(const char *ip) +{ + /* We won't get a reply. Don't fail here */ + SYS_NOFAIL("ping %s -c1 -w1 -s %d >/dev/null 2>&1", + ip, ICMP_PAYLOAD_SIZE); +} + +/* Send snd_target UDP packets to overflow the fq queue and trigger qdisc drop + * error. This is done via TX tstamp to force buffering delayed packets. + */ +static int overflow_fq(int snd_target, const char *target_ip) +{ + struct sockaddr_in addr = { + .sin_family = AF_INET, + .sin_port = htons(1234), + }; + + char data_buf[8]; /* only #pkts matter, so use a random small buffer */ + char control_buf[CMSG_SPACE(sizeof(uint64_t))]; + struct iovec iov = { + .iov_base = data_buf, + .iov_len = sizeof(data_buf), + }; + int err = -1; + int s = -1; + struct sock_txtime txtime_on = { + .clockid = CLOCK_MONOTONIC, + .flags = 0, + }; + struct msghdr msg = { + .msg_name = &addr, + .msg_namelen = sizeof(addr), + .msg_control = control_buf, + .msg_controllen = sizeof(control_buf), + .msg_iovlen = 1, + .msg_iov = &iov, + }; + struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg); + + memset(data_buf, 0, sizeof(data_buf)); + + s = socket(AF_INET, SOCK_DGRAM, 0); + if (!ASSERT_GE(s, 0, "socket")) + goto out; + + err = setsockopt(s, SOL_SOCKET, SO_TXTIME, &txtime_on, sizeof(txtime_on)); + if (!ASSERT_OK(err, "setsockopt(SO_TXTIME)")) + goto out; + + err = inet_pton(AF_INET, target_ip, &addr.sin_addr); + if (!ASSERT_EQ(err, 1, "inet_pton")) + goto out; + + while (snd_target > 0) { + struct timespec now; + + memset(control_buf, 0, sizeof(control_buf)); + cmsg->cmsg_type = SCM_TXTIME; + cmsg->cmsg_level = SOL_SOCKET; + cmsg->cmsg_len = CMSG_LEN(sizeof(uint64_t)); + + err = clock_gettime(CLOCK_MONOTONIC, &now); + if (!ASSERT_OK(err, "clock_gettime(CLOCK_MONOTONIC)")) { + err = -1; + goto out; + } + + *(uint64_t *)CMSG_DATA(cmsg) = (now.tv_nsec + 1) * NSEC_PER_SEC + + now.tv_nsec; + + /* we will intentionally send more than fq limit, so ignore + * the error here. + */ + sendmsg(s, &msg, MSG_NOSIGNAL); + snd_target--; + } + + /* no kernel crash so far is considered success */ + err = 0; + +out: + if (s >= 0) + close(s); + + return err; +} + +static int setup(const char *tun_dev) +{ + int target_index = -1; + int tap_fd = -1; + + tap_fd = open_tuntap(tun_dev, false); + if (!ASSERT_GE(tap_fd, 0, "open_tun")) + return -1; + + target_index = if_nametoindex(tun_dev); + if (!ASSERT_GE(target_index, 0, "if_nametoindex")) + return -1; + + SYS(fail, "ip link add link_err type dummy"); + SYS(fail, "ip link set lo up"); + SYS(fail, "ip addr add dev lo " LOCAL_SRC "/32"); + SYS(fail, "ip link set link_err up"); + SYS(fail, "ip link set %s up", tun_dev); + + SYS(fail, "ip route add %s dev link_err encap bpf xmit obj %s sec lwt_xmit", + TEST_CIDR, BPF_OBJECT); + + SYS(fail, "ip rule add pref 100 from all fwmark %d lookup 100", + target_index); + SYS(fail, "ip route add t 100 default dev %s", tun_dev); + + return tap_fd; + +fail: + if (tap_fd >= 0) + close(tap_fd); + return -1; +} + +static void test_lwt_reroute_normal_xmit(void) +{ + const char *tun_dev = "tun0"; + int tun_fd = -1; + int ifindex = -1; + char ip[256]; + struct timeval timeo = { + .tv_sec = 0, + .tv_usec = 250000, + }; + + tun_fd = setup(tun_dev); + if (!ASSERT_GE(tun_fd, 0, "setup_reroute")) + return; + + ifindex = if_nametoindex(tun_dev); + snprintf(ip, 256, "10.0.0.%d", ifindex); + + /* ping packets should be received by the tun device */ + ping_once(ip); + + if (!ASSERT_EQ(wait_for_packet(tun_fd, __expect_icmp_ipv4, &timeo), 1, + "wait_for_packet")) + log_err("%s xmit", __func__); +} + +/* + * Test the failure case when the skb is dropped at the qdisc. This is a + * regression prevention at the xmit hook only. + */ +static void test_lwt_reroute_qdisc_dropped(void) +{ + const char *tun_dev = "tun0"; + int tun_fd = -1; + int ifindex = -1; + char ip[256]; + + tun_fd = setup(tun_dev); + if (!ASSERT_GE(tun_fd, 0, "setup_reroute")) + goto fail; + + SYS(fail, "tc qdisc replace dev %s root fq limit 5 flow_limit 5", tun_dev); + + ifindex = if_nametoindex(tun_dev); + snprintf(ip, 256, "10.0.0.%d", ifindex); + ASSERT_EQ(overflow_fq(10, ip), 0, "overflow_fq"); + +fail: + if (tun_fd >= 0) + close(tun_fd); +} + +static void *test_lwt_reroute_run(void *arg) +{ + netns_delete(); + RUN_TEST(lwt_reroute_normal_xmit); + RUN_TEST(lwt_reroute_qdisc_dropped); + return NULL; +} + +void test_lwt_reroute(void) +{ + pthread_t test_thread; + int err; + + /* Run the tests in their own thread to isolate the namespace changes + * so they do not affect the environment of other tests. + * (specifically needed because of unshare(CLONE_NEWNS) in open_netns()) + */ + err = pthread_create(&test_thread, NULL, &test_lwt_reroute_run, NULL); + if (ASSERT_OK(err, "pthread_create")) + ASSERT_OK(pthread_join(test_thread, NULL), "pthread_join"); +} diff --git a/tools/testing/selftests/bpf/progs/test_lwt_reroute.c b/tools/testing/selftests/bpf/progs/test_lwt_reroute.c new file mode 100644 index 000000000000..1dc64351929c --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_lwt_reroute.c @@ -0,0 +1,36 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include + +/* This function extracts the last byte of the daddr, and uses it + * as output dev index. + */ +SEC("lwt_xmit") +int test_lwt_reroute(struct __sk_buff *skb) +{ + struct iphdr *iph = NULL; + void *start = (void *)(long)skb->data; + void *end = (void *)(long)skb->data_end; + + /* set mark at most once */ + if (skb->mark != 0) + return BPF_OK; + + if (start + sizeof(*iph) > end) + return BPF_DROP; + + iph = (struct iphdr *)start; + skb->mark = bpf_ntohl(iph->daddr) & 0xff; + + /* do not reroute x.x.x.0 packets */ + if (skb->mark == 0) + return BPF_OK; + + return BPF_LWT_REROUTE; +} + +char _license[] SEC("license") = "GPL";