diff mbox series

[RFC,net-next] bpf, net: support redirecting to ifb with bpf

Message ID 20230328115105.13553-1-laoar.shao@gmail.com (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series [RFC,net-next] bpf, net: support redirecting to ifb with bpf | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 26 this patch: 26
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 18 this patch: 18
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 26 this patch: 26
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 14 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Yafang Shao March 28, 2023, 11:51 a.m. UTC
In our container environment, we are using EDT-bpf to limit the egress
bandwidth. EDT-bpf can be used to limit egress only, but can't be used
to limit ingress. Some of our users also want to limit the ingress
bandwidth. But after applying EDT-bpf, which is based on clsact qdisc,
it is impossible to limit the ingress bandwidth currently, due to some
reasons,
1). We can't add ingress qdisc
The ingress qdisc can't coexist with clsact qdisc as clsact has both
ingress and egress handler. So our traditional method to limit ingress
bandwidth can't work any more.
2). We can't redirect ingress packet to ifb with bpf
By trying to analyze if it is possible to redirect the ingress packet to
ifb with a bpf program, we find that the ifb device is not supported by
bpf redirect yet.

This patch tries to resolve it by supporting redirecting to ifb with bpf
program.

There're some other users who want to resolve this issue as well. By
searching it in the lore, it shows that Jesper[1] and Tonghao[2] used to
propose similar solution. This proposal is almost the same with Jesper's
proposal, so I add Jesper's Co-developed-by here.

[1]. https://lore.kernel.org/bpf/160650040800.2890576.9811290366501747109.stgit@firesoul/
[2]. https://lore.kernel.org/netdev/20220324135653.2189-1-xiangxia.m.yue@gmail.com/

Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 net/core/dev.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Yafang Shao April 3, 2023, noon UTC | #1
On Tue, Mar 28, 2023 at 7:51 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> In our container environment, we are using EDT-bpf to limit the egress
> bandwidth. EDT-bpf can be used to limit egress only, but can't be used
> to limit ingress. Some of our users also want to limit the ingress
> bandwidth. But after applying EDT-bpf, which is based on clsact qdisc,
> it is impossible to limit the ingress bandwidth currently, due to some
> reasons,
> 1). We can't add ingress qdisc
> The ingress qdisc can't coexist with clsact qdisc as clsact has both
> ingress and egress handler. So our traditional method to limit ingress
> bandwidth can't work any more.
> 2). We can't redirect ingress packet to ifb with bpf
> By trying to analyze if it is possible to redirect the ingress packet to
> ifb with a bpf program, we find that the ifb device is not supported by
> bpf redirect yet.
>
> This patch tries to resolve it by supporting redirecting to ifb with bpf
> program.
>
> There're some other users who want to resolve this issue as well. By
> searching it in the lore, it shows that Jesper[1] and Tonghao[2] used to
> propose similar solution. This proposal is almost the same with Jesper's
> proposal, so I add Jesper's Co-developed-by here.
>
> [1]. https://lore.kernel.org/bpf/160650040800.2890576.9811290366501747109.stgit@firesoul/
> [2]. https://lore.kernel.org/netdev/20220324135653.2189-1-xiangxia.m.yue@gmail.com/
>
> Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> ---
>  net/core/dev.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 18dc8d7..3e63f6b 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3956,6 +3956,7 @@ int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
>                 return NULL;
>         case TC_ACT_REDIRECT:
>                 /* No need to push/pop skb's mac_header here on egress! */
> +               skb_set_redirected(skb, skb->tc_at_ingress);
>                 skb_do_redirect(skb);
>                 *ret = NET_XMIT_SUCCESS;
>                 return NULL;
> @@ -5138,6 +5139,7 @@ static __latent_entropy void net_tx_action(struct softirq_action *h)
>                  * redirecting to another netdev
>                  */
>                 __skb_push(skb, skb->mac_len);
> +               skb_set_redirected(skb, skb->tc_at_ingress);
>                 if (skb_do_redirect(skb) == -EAGAIN) {
>                         __skb_pull(skb, skb->mac_len);
>                         *another = true;
> --
> 1.8.3.1
>

Daniel, Alexei,

Any comments on this solution?
I noticed that you have rejected the other two proposals which are
listed in the commit log, one reason is that ifb is not recommended.
But it seems we have to use ifb if we want to limit ingress bandwidth.
Or do you have any better suggestions on the ingress bandwidth limit?
Ingress bandwidth is useful in some scenarios, for example, for the
TCP-based service, there may be lots of clients connecting it, so it
is not wise to limit the peers' egress. After limiting the ingress of
the server side, it will lower the send rate of the client by lowering
the TCP cwnd if the ingress bandwidth limit is reached. If we don't
limit it, the clients will continue sending requests at a high rate.
diff mbox series

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 18dc8d7..3e63f6b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3956,6 +3956,7 @@  int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
 		return NULL;
 	case TC_ACT_REDIRECT:
 		/* No need to push/pop skb's mac_header here on egress! */
+		skb_set_redirected(skb, skb->tc_at_ingress);
 		skb_do_redirect(skb);
 		*ret = NET_XMIT_SUCCESS;
 		return NULL;
@@ -5138,6 +5139,7 @@  static __latent_entropy void net_tx_action(struct softirq_action *h)
 		 * redirecting to another netdev
 		 */
 		__skb_push(skb, skb->mac_len);
+		skb_set_redirected(skb, skb->tc_at_ingress);
 		if (skb_do_redirect(skb) == -EAGAIN) {
 			__skb_pull(skb, skb->mac_len);
 			*another = true;