Message ID | 20231215180827.3638838-1-victor@mojatatu.com (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [RFC,net-next] net: sched: act_mirred: Extend the cpu mirred nest guard with an explicit loop ttl | expand |
hello Victor, thanks for the patch! On Fri, Dec 15, 2023 at 03:08:27PM -0300, Victor Nogueira wrote: > As pointed out by Jamal in: > https://lore.kernel.org/netdev/CAM0EoMn4C-zwrTCGzKzuRYukxoqBa8tyHyFDwUSZYwkMOUJ4Lw@mail.gmail.com/ > > Mirred is allowing for infinite loops in certain use cases, such as the > following: > > ---- > sudo ip netns add p4node > sudo ip link add p4port0 address 10:00:00:01:AA:BB type veth peer \ > port0 address 10:00:00:02:AA:BB > > sudo ip link set dev port0 netns p4node > sudo ip a add 10.0.0.1/24 dev p4port0 > sudo ip neigh add 10.0.0.2 dev p4port0 lladdr 10:00:00:02:aa:bb > sudo ip netns exec p4node ip a add 10.0.0.2/24 dev port0 > sudo ip netns exec p4node ip l set dev port0 up > sudo ip l set dev p4port0 up > sudo ip netns exec p4node tc qdisc add dev port0 clsact > sudo ip netns exec p4node tc filter add dev port0 ingress protocol ip \ > prio 10 matchall action mirred ingress redirect dev port0 > > ping -I p4port0 10.0.0.2 -c 1 > ----- > > To solve this, we reintroduced a ttl variable attached to the skb (in > struct tc_skb_cb) which will prevent infinite loops for use cases such as > the one described above. > > The nest per cpu variable (tcf_mirred_nest_level) is now only used for > detecting whether we should call netif_rx or netif_receive_skb when > sending the packet to ingress. looks good to me. Do you think it's worth setting an initial value (0, AFAIU) for tc_skb_cb(skb)->ttl inside tc_run() ? other than this, Acked-by: Davide Caratti <dcaratti@redhat.com>
On Mon, Dec 18, 2023 at 3:49 AM Davide Caratti <dcaratti@redhat.com> wrote: > > hello Victor, thanks for the patch! > > On Fri, Dec 15, 2023 at 03:08:27PM -0300, Victor Nogueira wrote: > > As pointed out by Jamal in: > > https://lore.kernel.org/netdev/CAM0EoMn4C-zwrTCGzKzuRYukxoqBa8tyHyFDwUSZYwkMOUJ4Lw@mail.gmail.com/ > > > > Mirred is allowing for infinite loops in certain use cases, such as the > > following: > > > > ---- > > sudo ip netns add p4node > > sudo ip link add p4port0 address 10:00:00:01:AA:BB type veth peer \ > > port0 address 10:00:00:02:AA:BB > > > > sudo ip link set dev port0 netns p4node > > sudo ip a add 10.0.0.1/24 dev p4port0 > > sudo ip neigh add 10.0.0.2 dev p4port0 lladdr 10:00:00:02:aa:bb > > sudo ip netns exec p4node ip a add 10.0.0.2/24 dev port0 > > sudo ip netns exec p4node ip l set dev port0 up > > sudo ip l set dev p4port0 up > > sudo ip netns exec p4node tc qdisc add dev port0 clsact > > sudo ip netns exec p4node tc filter add dev port0 ingress protocol ip \ > > prio 10 matchall action mirred ingress redirect dev port0 > > > > ping -I p4port0 10.0.0.2 -c 1 > > ----- > > > > To solve this, we reintroduced a ttl variable attached to the skb (in > > struct tc_skb_cb) which will prevent infinite loops for use cases such as > > the one described above. > > > > The nest per cpu variable (tcf_mirred_nest_level) is now only used for > > detecting whether we should call netif_rx or netif_receive_skb when > > sending the packet to ingress. > > looks good to me. Do you think it's worth setting an initial value (0, AFAIU) > for tc_skb_cb(skb)->ttl inside tc_run() ? > Good point but I am afraid that will reset the loop counter (imagine ingress->ingress, egress->ingress etc). So it wont work. Unfortunately we've hit a snag with cb because it is shared across multiple layers. I am afraid we cant ignore it. If the packet came downward from some upper layer (or driver, buggy mostly) using the same ttl spot in the cb, then the ttl field will be either 0 or > 0. 1) 0 < ttl < 4 then we will interpret it as "packet has looped before" and we wont drop it, so we are good here and we will end up dropping it later in one or more loop. 2) If the retrieved ttl is >=4 we will immediately drop it. This is catastrophic because we cant stop ipv4 or esp from using this field to their pleasure and they certainly dont reset these fields. I dont see a way out unless we extend the skb->tc_at_ingress to be to 2 bits as it was originally. In the original code it was called we had SET_TC_AT() which set two bits to in the skb->verdict to say "this packet was last seen at ingress/egress/elsewhere". Elsewhere was a 0 and this got changed to a boolean which translates to "this packet was last seen at ingress/egress". So if it came from a freshly allocated skb eg from driver or if it came from the stack it will be 0 (since the field is reserved for tc). +Cc Florian. cheers, jamal > other than this, > > Acked-by: Davide Caratti <dcaratti@redhat.com> >
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index 9fa1d0794dfa..fb8234fd5324 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -282,6 +282,7 @@ struct tc_skb_cb { u8 post_ct:1; u8 post_ct_snat:1; u8 post_ct_dnat:1; + u8 ttl:3; u16 zone; /* Only valid if post_ct = true */ }; @@ -293,6 +294,16 @@ static inline struct tc_skb_cb *tc_skb_cb(const struct sk_buff *skb) return cb; } +static inline void tcf_ttl_set(struct sk_buff *skb, const u8 ttl) +{ + tc_skb_cb(skb)->ttl = ttl; +} + +static inline u8 tcf_ttl_get(struct sk_buff *skb) +{ + return tc_skb_cb(skb)->ttl; +} + static inline bool tc_qdisc_stats_dump(struct Qdisc *sch, unsigned long cl, struct qdisc_walker *arg) diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c index 0a711c184c29..42b267817f3c 100644 --- a/net/sched/act_mirred.c +++ b/net/sched/act_mirred.c @@ -29,7 +29,7 @@ static LIST_HEAD(mirred_list); static DEFINE_SPINLOCK(mirred_list_lock); -#define MIRRED_NEST_LIMIT 4 +#define MAX_REC_LOOP 4 static DEFINE_PER_CPU(unsigned int, mirred_nest_level); static bool tcf_mirred_is_act_redirect(int action) @@ -233,7 +233,6 @@ TC_INDIRECT_SCOPE int tcf_mirred_act(struct sk_buff *skb, struct sk_buff *skb2 = skb; bool m_mac_header_xmit; struct net_device *dev; - unsigned int nest_level; int retval, err = 0; bool use_reinsert; bool want_ingress; @@ -243,9 +242,12 @@ TC_INDIRECT_SCOPE int tcf_mirred_act(struct sk_buff *skb, int m_eaction; int mac_len; bool at_nh; + u8 ttl; - nest_level = __this_cpu_inc_return(mirred_nest_level); - if (unlikely(nest_level > MIRRED_NEST_LIMIT)) { + __this_cpu_inc(mirred_nest_level); + + ttl = tcf_ttl_get(skb); + if (unlikely(ttl + 1 > MAX_REC_LOOP)) { net_warn_ratelimited("Packet exceeded mirred recursion limit on dev %s\n", netdev_name(skb->dev)); __this_cpu_dec(mirred_nest_level); @@ -307,6 +309,7 @@ TC_INDIRECT_SCOPE int tcf_mirred_act(struct sk_buff *skb, skb2->skb_iif = skb->dev->ifindex; skb2->dev = dev; + tcf_ttl_set(skb2, ttl + 1); /* mirror is always swallowed */ if (is_redirect) {
As pointed out by Jamal in: https://lore.kernel.org/netdev/CAM0EoMn4C-zwrTCGzKzuRYukxoqBa8tyHyFDwUSZYwkMOUJ4Lw@mail.gmail.com/ Mirred is allowing for infinite loops in certain use cases, such as the following: ---- sudo ip netns add p4node sudo ip link add p4port0 address 10:00:00:01:AA:BB type veth peer \ port0 address 10:00:00:02:AA:BB sudo ip link set dev port0 netns p4node sudo ip a add 10.0.0.1/24 dev p4port0 sudo ip neigh add 10.0.0.2 dev p4port0 lladdr 10:00:00:02:aa:bb sudo ip netns exec p4node ip a add 10.0.0.2/24 dev port0 sudo ip netns exec p4node ip l set dev port0 up sudo ip l set dev p4port0 up sudo ip netns exec p4node tc qdisc add dev port0 clsact sudo ip netns exec p4node tc filter add dev port0 ingress protocol ip \ prio 10 matchall action mirred ingress redirect dev port0 ping -I p4port0 10.0.0.2 -c 1 ----- To solve this, we reintroduced a ttl variable attached to the skb (in struct tc_skb_cb) which will prevent infinite loops for use cases such as the one described above. The nest per cpu variable (tcf_mirred_nest_level) is now only used for detecting whether we should call netif_rx or netif_receive_skb when sending the packet to ingress. Note that we do increment the ttl in every redirect/mirror so if you have policies that redirect or mirror between devices of up to MAX_REC_LOOP (4) with this patch that will be considered to be a loop. Signed-off-by: Victor Nogueira <victor@mojatatu.com> --- include/net/pkt_sched.h | 11 +++++++++++ net/sched/act_mirred.c | 11 +++++++---- 2 files changed, 18 insertions(+), 4 deletions(-)