Message ID | 20230328115105.13553-1-laoar.shao@gmail.com (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [RFC,net-next] bpf, net: support redirecting to ifb with bpf | expand |
On Tue, Mar 28, 2023 at 7:51 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > In our container environment, we are using EDT-bpf to limit the egress > bandwidth. EDT-bpf can be used to limit egress only, but can't be used > to limit ingress. Some of our users also want to limit the ingress > bandwidth. But after applying EDT-bpf, which is based on clsact qdisc, > it is impossible to limit the ingress bandwidth currently, due to some > reasons, > 1). We can't add ingress qdisc > The ingress qdisc can't coexist with clsact qdisc as clsact has both > ingress and egress handler. So our traditional method to limit ingress > bandwidth can't work any more. > 2). We can't redirect ingress packet to ifb with bpf > By trying to analyze if it is possible to redirect the ingress packet to > ifb with a bpf program, we find that the ifb device is not supported by > bpf redirect yet. > > This patch tries to resolve it by supporting redirecting to ifb with bpf > program. > > There're some other users who want to resolve this issue as well. By > searching it in the lore, it shows that Jesper[1] and Tonghao[2] used to > propose similar solution. This proposal is almost the same with Jesper's > proposal, so I add Jesper's Co-developed-by here. > > [1]. https://lore.kernel.org/bpf/160650040800.2890576.9811290366501747109.stgit@firesoul/ > [2]. https://lore.kernel.org/netdev/20220324135653.2189-1-xiangxia.m.yue@gmail.com/ > > Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com> > Cc: Jesper Dangaard Brouer <brouer@redhat.com> > Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com> > --- > net/core/dev.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 18dc8d7..3e63f6b 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -3956,6 +3956,7 @@ int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *skb) > return NULL; > case TC_ACT_REDIRECT: > /* No need to push/pop skb's mac_header here on egress! */ > + skb_set_redirected(skb, skb->tc_at_ingress); > skb_do_redirect(skb); > *ret = NET_XMIT_SUCCESS; > return NULL; > @@ -5138,6 +5139,7 @@ static __latent_entropy void net_tx_action(struct softirq_action *h) > * redirecting to another netdev > */ > __skb_push(skb, skb->mac_len); > + skb_set_redirected(skb, skb->tc_at_ingress); > if (skb_do_redirect(skb) == -EAGAIN) { > __skb_pull(skb, skb->mac_len); > *another = true; > -- > 1.8.3.1 > Daniel, Alexei, Any comments on this solution? I noticed that you have rejected the other two proposals which are listed in the commit log, one reason is that ifb is not recommended. But it seems we have to use ifb if we want to limit ingress bandwidth. Or do you have any better suggestions on the ingress bandwidth limit? Ingress bandwidth is useful in some scenarios, for example, for the TCP-based service, there may be lots of clients connecting it, so it is not wise to limit the peers' egress. After limiting the ingress of the server side, it will lower the send rate of the client by lowering the TCP cwnd if the ingress bandwidth limit is reached. If we don't limit it, the clients will continue sending requests at a high rate.
diff --git a/net/core/dev.c b/net/core/dev.c index 18dc8d7..3e63f6b 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3956,6 +3956,7 @@ int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *skb) return NULL; case TC_ACT_REDIRECT: /* No need to push/pop skb's mac_header here on egress! */ + skb_set_redirected(skb, skb->tc_at_ingress); skb_do_redirect(skb); *ret = NET_XMIT_SUCCESS; return NULL; @@ -5138,6 +5139,7 @@ static __latent_entropy void net_tx_action(struct softirq_action *h) * redirecting to another netdev */ __skb_push(skb, skb->mac_len); + skb_set_redirected(skb, skb->tc_at_ingress); if (skb_do_redirect(skb) == -EAGAIN) { __skb_pull(skb, skb->mac_len); *another = true;