Message ID | 20240627130843.21042-20-antonio@openvpn.net (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | Introducing OpenVPN Data Channel Offload |
2024-06-27, 15:08:37 +0200, Antonio Quartulli wrote:
> +void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb)
> +{
> +	struct sockaddr_storage ss;
> +	const u8 *local_ip = NULL;
> +	struct sockaddr_in6 *sa6;
> +	struct sockaddr_in *sa;
> +	struct ovpn_bind *bind;
> +	sa_family_t family;
> +	size_t salen;
> +
> +	rcu_read_lock();
> +	bind = rcu_dereference(peer->bind);
> +	if (unlikely(!bind))
> +		goto unlock;

Why are you aborting here? ovpn_bind_skb_src_match considers
bind==NULL to be "no match" (reasonable), then we would create a new
bind for the current address.

> +
> +	if (likely(ovpn_bind_skb_src_match(bind, skb)))

This could be running in parallel on two CPUs, because ->encap_rcv
isn't protected against that. So the bind could be getting updated in
parallel. I would move spin_lock_bh above this check to make sure it
doesn't happen.

ovpn_peer_update_local_endpoint would also need something like that, I
think.

> +		goto unlock;
> +
> +	family = skb_protocol_to_family(skb);
> +
> +	if (bind->sa.in4.sin_family == family)
> +		local_ip = (u8 *)&bind->local;
> +
> +	switch (family) {
> +	case AF_INET:
> +		sa = (struct sockaddr_in *)&ss;
> +		sa->sin_family = AF_INET;
> +		sa->sin_addr.s_addr = ip_hdr(skb)->saddr;
> +		sa->sin_port = udp_hdr(skb)->source;
> +		salen = sizeof(*sa);
> +		break;
> +	case AF_INET6:
> +		sa6 = (struct sockaddr_in6 *)&ss;
> +		sa6->sin6_family = AF_INET6;
> +		sa6->sin6_addr = ipv6_hdr(skb)->saddr;
> +		sa6->sin6_port = udp_hdr(skb)->source;
> +		sa6->sin6_scope_id = ipv6_iface_scope_id(&ipv6_hdr(skb)->saddr,
> +							 skb->skb_iif);
> +		salen = sizeof(*sa6);
> +		break;
> +	default:
> +		goto unlock;
> +	}
> +
> +	netdev_dbg(peer->ovpn->dev, "%s: peer %d floated to %pIScp", __func__,
> +		   peer->id, &ss);
> +	ovpn_peer_reset_sockaddr(peer, (struct sockaddr_storage *)&ss,
> +				 local_ip);
> +
> +	spin_lock_bh(&peer->ovpn->peers->lock);
> +	/* remove old hashing */
> +	hlist_del_init_rcu(&peer->hash_entry_transp_addr);
> +	/* re-add with new transport address */
> +	hlist_add_head_rcu(&peer->hash_entry_transp_addr,
> +			   ovpn_get_hash_head(peer->ovpn->peers->by_transp_addr,
> +					      &ss, salen));

That could send a concurrent reader onto the wrong hash bucket, if
it's going through peer's old bucket, finds peer before the update,
then continues reading after peer is moved to the new bucket.

This kind of re-hash can be handled with nulls, and re-trying the
lookup if we ended up on the wrong chain. See for example
__inet_lookup_established in net/ipv4/inet_hashtables.c (Thanks to
Paolo for the pointer).

> +	spin_unlock_bh(&peer->ovpn->peers->lock);
> +
> +unlock:
> +	rcu_read_unlock();
> +}
On 17/07/2024 19:15, Sabrina Dubroca wrote:
> 2024-06-27, 15:08:37 +0200, Antonio Quartulli wrote:
>> +void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb)
>> +{
>> +	struct sockaddr_storage ss;
>> +	const u8 *local_ip = NULL;
>> +	struct sockaddr_in6 *sa6;
>> +	struct sockaddr_in *sa;
>> +	struct ovpn_bind *bind;
>> +	sa_family_t family;
>> +	size_t salen;
>> +
>> +	rcu_read_lock();
>> +	bind = rcu_dereference(peer->bind);
>> +	if (unlikely(!bind))
>> +		goto unlock;
>
> Why are you aborting here? ovpn_bind_skb_src_match considers
> bind==NULL to be "no match" (reasonable), then we would create a new
> bind for the current address.

(NOTE: float and the following explanation assume connection via UDP)

peer->bind is assigned right after peer creation in ovpn_nl_set_peer_doit().

ovpn_peer_float() is called while the peer is exchanging traffic.

If we got to this point and bind is NULL, then the peer was being released,
because there is no way we are going to NULLify bind during the peer life
cycle, except upon ovpn_peer_release().

Does it make sense?

>
>> +
>> +	if (likely(ovpn_bind_skb_src_match(bind, skb)))
>
> This could be running in parallel on two CPUs, because ->encap_rcv
> isn't protected against that. So the bind could be getting updated in
> parallel. I would move spin_lock_bh above this check to make sure it
> doesn't happen.

hm, I should actually use peer->lock for this, which is currently only used
in ovpn_bind_reset() to avoid multiple concurrent assignments...but you're
right we should include the call to skb_src_check() as well.

>
> ovpn_peer_update_local_endpoint would also need something like that, I
> think.

at least the test-and-set part should be protected, if we can truly invoke
ovpn_peer_update_local_endpoint() multiple times concurrently.

How do I test running encap_rcv in parallel?
This is actually an interesting case that I thought to not be possible (no
specific reason for this..).

>
>> +		goto unlock;
>> +
>> +	family = skb_protocol_to_family(skb);
>> +
>> +	if (bind->sa.in4.sin_family == family)
>> +		local_ip = (u8 *)&bind->local;
>> +
>> +	switch (family) {
>> +	case AF_INET:
>> +		sa = (struct sockaddr_in *)&ss;
>> +		sa->sin_family = AF_INET;
>> +		sa->sin_addr.s_addr = ip_hdr(skb)->saddr;
>> +		sa->sin_port = udp_hdr(skb)->source;
>> +		salen = sizeof(*sa);
>> +		break;
>> +	case AF_INET6:
>> +		sa6 = (struct sockaddr_in6 *)&ss;
>> +		sa6->sin6_family = AF_INET6;
>> +		sa6->sin6_addr = ipv6_hdr(skb)->saddr;
>> +		sa6->sin6_port = udp_hdr(skb)->source;
>> +		sa6->sin6_scope_id = ipv6_iface_scope_id(&ipv6_hdr(skb)->saddr,
>> +							 skb->skb_iif);
>> +		salen = sizeof(*sa6);
>> +		break;
>> +	default:
>> +		goto unlock;
>> +	}
>> +
>> +	netdev_dbg(peer->ovpn->dev, "%s: peer %d floated to %pIScp", __func__,
>> +		   peer->id, &ss);
>> +	ovpn_peer_reset_sockaddr(peer, (struct sockaddr_storage *)&ss,
>> +				 local_ip);
>> +
>> +	spin_lock_bh(&peer->ovpn->peers->lock);
>> +	/* remove old hashing */
>> +	hlist_del_init_rcu(&peer->hash_entry_transp_addr);
>> +	/* re-add with new transport address */
>> +	hlist_add_head_rcu(&peer->hash_entry_transp_addr,
>> +			   ovpn_get_hash_head(peer->ovpn->peers->by_transp_addr,
>> +					      &ss, salen));
>
> That could send a concurrent reader onto the wrong hash bucket, if
> it's going through peer's old bucket, finds peer before the update,
> then continues reading after peer is moved to the new bucket.

I haven't fully grasped this scenario.
I am imagining we are running ovpn_peer_get_by_transp_addr() in parallel:
reader gets the old bucket and finds peer, because ovpn_peer_transp_match()
will still return true (update wasn't performed yet), and will return it.

At this point, what do you mean with "continues reading after peer is moved
to the new bucket"?

>
> This kind of re-hash can be handled with nulls, and re-trying the
> lookup if we ended up on the wrong chain. See for example
> __inet_lookup_established in net/ipv4/inet_hashtables.c (Thanks to
> Paolo for the pointer).
>
>> +	spin_unlock_bh(&peer->ovpn->peers->lock);
>> +
>> +unlock:
>> +	rcu_read_unlock();
>> +}
>
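As an illustration of the locking point agreed on above (doing the source-address comparison and the update under one lock, so two CPUs cannot both decide to update), here is a minimal userspace sketch. It is not the ovpn code: `struct endpoint`, `struct peer`, `peer_init`, and `peer_float` are hypothetical stand-ins, and a C11 `atomic_flag` spinlock is assumed in place of the kernel's `spin_lock_bh()` on peer->lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the remote address in the bind. */
struct endpoint {
	uint32_t addr;
	uint16_t port;
};

/* Hypothetical peer: 'lock' models peer->lock, 'remote' models the bind. */
struct peer {
	atomic_flag lock;
	struct endpoint remote;
};

static void peer_init(struct peer *p, struct endpoint remote)
{
	atomic_flag_clear(&p->lock);
	p->remote = remote;
}

static void peer_lock(struct peer *p)
{
	/* spin until the flag was previously clear */
	while (atomic_flag_test_and_set_explicit(&p->lock,
						 memory_order_acquire))
		;
}

static void peer_unlock(struct peer *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Compare and update inside the same critical section.
 * Returns 1 if the endpoint was updated, 0 if it already matched.
 */
static int peer_float(struct peer *p, struct endpoint src)
{
	int changed = 0;

	peer_lock(p);
	if (p->remote.addr != src.addr || p->remote.port != src.port) {
		p->remote = src;
		changed = 1;
	}
	peer_unlock(p);

	return changed;
}
```

Because the check sits inside the critical section, a second caller arriving with the same new source address sees the already-updated endpoint and does nothing.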
2024-07-18, 11:37:38 +0200, Antonio Quartulli wrote:
> On 17/07/2024 19:15, Sabrina Dubroca wrote:
> > 2024-06-27, 15:08:37 +0200, Antonio Quartulli wrote:
> > > +void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb)
> > > +{
> > > +	struct sockaddr_storage ss;
> > > +	const u8 *local_ip = NULL;
> > > +	struct sockaddr_in6 *sa6;
> > > +	struct sockaddr_in *sa;
> > > +	struct ovpn_bind *bind;
> > > +	sa_family_t family;
> > > +	size_t salen;
> > > +
> > > +	rcu_read_lock();
> > > +	bind = rcu_dereference(peer->bind);
> > > +	if (unlikely(!bind))
> > > +		goto unlock;
> >
> > Why are you aborting here? ovpn_bind_skb_src_match considers
> > bind==NULL to be "no match" (reasonable), then we would create a new
> > bind for the current address.
>
> (NOTE: float and the following explanation assume connection via UDP)
>
> peer->bind is assigned right after peer creation in ovpn_nl_set_peer_doit().
>
> ovpn_peer_float() is called while the peer is exchanging traffic.
>
> If we got to this point and bind is NULL, then the peer was being released,
> because there is no way we are going to NULLify bind during the peer life
> cycle, except upon ovpn_peer_release().
>
> Does it make sense?

Alright, thanks, I missed that.

> > > +	if (likely(ovpn_bind_skb_src_match(bind, skb)))
> >
> > This could be running in parallel on two CPUs, because ->encap_rcv
> > isn't protected against that. So the bind could be getting updated in
> > parallel. I would move spin_lock_bh above this check to make sure it
> > doesn't happen.
>
> hm, I should actually use peer->lock for this, which is currently only used
> in ovpn_bind_reset() to avoid multiple concurrent assignments...but you're
> right we should include the call to skb_src_check() as well.

Ok, sounds good.

> > ovpn_peer_update_local_endpoint would also need something like that, I
> > think.
>
> at least the test-and-set part should be protected, if we can truly invoke
> ovpn_peer_update_local_endpoint() multiple times concurrently.

Yes.

> How do I test running encap_rcv in parallel?
> This is actually an interesting case that I thought to not be possible (no
> specific reason for this..).

It should happen when the packets come from different source IPs and
the NIC has multiple queues, then they can be spread over different
CPUs. But it's probably not going to be easy to land multiple packets
in ovpn_peer_float at the same time to trigger this issue.

> > > +	netdev_dbg(peer->ovpn->dev, "%s: peer %d floated to %pIScp", __func__,
> > > +		   peer->id, &ss);
> > > +	ovpn_peer_reset_sockaddr(peer, (struct sockaddr_storage *)&ss,
> > > +				 local_ip);
> > > +
> > > +	spin_lock_bh(&peer->ovpn->peers->lock);
> > > +	/* remove old hashing */
> > > +	hlist_del_init_rcu(&peer->hash_entry_transp_addr);
> > > +	/* re-add with new transport address */
> > > +	hlist_add_head_rcu(&peer->hash_entry_transp_addr,
> > > +			   ovpn_get_hash_head(peer->ovpn->peers->by_transp_addr,
> > > +					      &ss, salen));
> >
> > That could send a concurrent reader onto the wrong hash bucket, if
> > it's going through peer's old bucket, finds peer before the update,
> > then continues reading after peer is moved to the new bucket.
>
> I haven't fully grasped this scenario.
> I am imagining we are running ovpn_peer_get_by_transp_addr() in parallel:
> reader gets the old bucket and finds peer, because ovpn_peer_transp_match()
> will still return true (update wasn't performed yet), and will return it.

The other reader isn't necessarily looking for peer, but maybe another
item that landed in the same bucket (though your hashtables are so
large, it would be a bit unlucky).

> At this point, what do you mean with "continues reading after peer is moved
> to the new bucket"?

Continues iterating, in hlist_for_each_entry_rcu inside
ovpn_peer_get_by_transp_addr.

ovpn_peer_float                    ovpn_peer_get_by_transp_addr

                                   start lookup
                                   head = ovpn_get_hash_head(...)
                                   hlist_for_each_entry_rcu
                                   ...
                                   find peer on head

peer moved from head to head2

                                   continue hlist_for_each_entry_rcu with peer->next
                                   but peer->next is now on head2
                                   keep walking ->next on head2 instead of head
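To make the interleaving above concrete, here is a minimal userspace sketch of the "nulls" idea, not the ovpn or kernel implementation. It assumes a toy table in which every chain ends in a marker that encodes its own bucket index, so a reader that was carried onto another chain can notice at the end and retry. All names (`struct node`, `walk`, `lookup`, `NBUCKETS`) are hypothetical; the kernel's own helpers live in include/linux/list_nulls.h and include/linux/rculist_nulls.h, and real code also needs RCU and memory ordering, which are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 8u

/* Hypothetical toy node: 'next' holds either a pointer to the next node
 * or, when its low bit is set, a nulls marker carrying a bucket index.
 */
struct node {
	uintptr_t next;
	unsigned int key;
};

static uintptr_t buckets[NBUCKETS];

static uintptr_t make_nulls(unsigned int bucket)
{
	return ((uintptr_t)bucket << 1) | 1u;
}

static int is_nulls(uintptr_t p)
{
	return (int)(p & 1u);
}

/* Every empty chain starts out as just its own end marker. */
static void table_init(void)
{
	for (unsigned int i = 0; i < NBUCKETS; i++)
		buckets[i] = make_nulls(i);
}

static void node_add(struct node *n, unsigned int bucket)
{
	assert(bucket < NBUCKETS);
	n->next = buckets[bucket];
	buckets[bucket] = (uintptr_t)n;
}

/* Unlink n but leave n->next alone, so a reader standing on n can
 * still move forward (as with hlist_del_rcu-style removal).
 */
static void node_del(struct node *n, unsigned int bucket)
{
	uintptr_t *pp = &buckets[bucket];

	while (!is_nulls(*pp)) {
		struct node *cur = (struct node *)*pp;

		if (cur == n) {
			*pp = n->next;
			return;
		}
		pp = &cur->next;
	}
}

/* Walk from pos; on a miss, report which chain the walk ended on. */
static struct node *walk(uintptr_t pos, unsigned int key, unsigned int *end)
{
	while (!is_nulls(pos)) {
		struct node *n = (struct node *)pos;

		if (n->key == key)
			return n;
		pos = n->next;
	}
	*end = (unsigned int)(pos >> 1);
	return NULL;
}

/* Retry whenever a miss ended on a chain other than the starting one. */
static struct node *lookup(unsigned int key, unsigned int bucket)
{
	unsigned int end = bucket;
	struct node *n;

	do {
		n = walk(buckets[bucket], key, &end);
	} while (!n && end != bucket);

	return n;
}
```

If a reader paused on a node in bucket 1 resumes after that node has been re-added to bucket 2, `walk()` reports end marker 2 rather than 1, which is the signal to restart from the original bucket.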
On 18/07/2024 13:12, Sabrina Dubroca wrote:
>> How do I test running encap_rcv in parallel?
>> This is actually an interesting case that I thought to not be possible (no
>> specific reason for this..).
>
> It should happen when the packets come from different source IPs and
> the NIC has multiple queues, then they can be spread over different
> CPUs. But it's probably not going to be easy to land multiple packets
> in ovpn_peer_float at the same time to trigger this issue.

I see. Yeah, this is not easy.

>
>
>>>> +	netdev_dbg(peer->ovpn->dev, "%s: peer %d floated to %pIScp", __func__,
>>>> +		   peer->id, &ss);
>>>> +	ovpn_peer_reset_sockaddr(peer, (struct sockaddr_storage *)&ss,
>>>> +				 local_ip);
>>>> +
>>>> +	spin_lock_bh(&peer->ovpn->peers->lock);
>>>> +	/* remove old hashing */
>>>> +	hlist_del_init_rcu(&peer->hash_entry_transp_addr);
>>>> +	/* re-add with new transport address */
>>>> +	hlist_add_head_rcu(&peer->hash_entry_transp_addr,
>>>> +			   ovpn_get_hash_head(peer->ovpn->peers->by_transp_addr,
>>>> +					      &ss, salen));
>>>
>>> That could send a concurrent reader onto the wrong hash bucket, if
>>> it's going through peer's old bucket, finds peer before the update,
>>> then continues reading after peer is moved to the new bucket.
>>
>> I haven't fully grasped this scenario.
>> I am imagining we are running ovpn_peer_get_by_transp_addr() in parallel:
>> reader gets the old bucket and finds peer, because ovpn_peer_transp_match()
>> will still return true (update wasn't performed yet), and will return it.
>
> The other reader isn't necessarily looking for peer, but maybe another
> item that landed in the same bucket (though your hashtables are so
> large, it would be a bit unlucky).
>
>> At this point, what do you mean with "continues reading after peer is moved
>> to the new bucket"?
>
> Continues iterating, in hlist_for_each_entry_rcu inside
> ovpn_peer_get_by_transp_addr.
>
> ovpn_peer_float                    ovpn_peer_get_by_transp_addr
>
>                                    start lookup
>                                    head = ovpn_get_hash_head(...)
>                                    hlist_for_each_entry_rcu
>                                    ...
>                                    find peer on head
>
> peer moved from head to head2
>
>                                    continue hlist_for_each_entry_rcu with peer->next
>                                    but peer->next is now on head2
>                                    keep walking ->next on head2 instead of head
>

Ok got it.
Basically we might move the reader from a list to another without it noticing.

Will have a look at the pointer provided by Paolo and modify this code
accordingly.

Thanks!
diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index 9188afe0f47e..4c6a50f3f0d0 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -120,6 +120,10 @@ void ovpn_decrypt_post(struct sk_buff *skb, int ret)
 	ovpn_peer_keepalive_recv_reset(peer);
 
 	if (peer->sock->sock->sk->sk_protocol == IPPROTO_UDP) {
+		/* check if this peer changed its IP address and update
+		 * state
+		 */
+		ovpn_peer_float(peer, skb);
 		/* update source endpoint for this peer */
 		ovpn_peer_update_local_endpoint(peer, skb);
 	}
diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
index ec3064438753..c07d148c52b4 100644
--- a/drivers/net/ovpn/peer.c
+++ b/drivers/net/ovpn/peer.c
@@ -126,6 +126,117 @@ struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id)
 	return peer;
 }
 
+/**
+ * ovpn_peer_reset_sockaddr - recreate binding for peer
+ * @peer: peer to recreate the binding for
+ * @ss: sockaddr to use as remote endpoint for the binding
+ * @local_ip: local IP for the binding
+ *
+ * Return: 0 on success or a negative error code otherwise
+ */
+static int ovpn_peer_reset_sockaddr(struct ovpn_peer *peer,
+				    const struct sockaddr_storage *ss,
+				    const u8 *local_ip)
+{
+	struct ovpn_bind *bind;
+	size_t ip_len;
+
+	/* create new ovpn_bind object */
+	bind = ovpn_bind_from_sockaddr(ss);
+	if (IS_ERR(bind))
+		return PTR_ERR(bind);
+
+	if (local_ip) {
+		if (ss->ss_family == AF_INET) {
+			ip_len = sizeof(struct in_addr);
+		} else if (ss->ss_family == AF_INET6) {
+			ip_len = sizeof(struct in6_addr);
+		} else {
+			netdev_dbg(peer->ovpn->dev, "%s: invalid family for remote endpoint\n",
+				   __func__);
+			kfree(bind);
+			return -EINVAL;
+		}
+
+		memcpy(&bind->local, local_ip, ip_len);
+	}
+
+	/* set binding */
+	ovpn_bind_reset(peer, bind);
+
+	return 0;
+}
+
+#define ovpn_get_hash_head(_tbl, _key, _key_len)		\
+	(&(_tbl)[jhash(_key, _key_len, 0) % HASH_SIZE(_tbl)])	\
+
+/**
+ * ovpn_peer_float - update remote endpoint for peer
+ * @peer: peer to update the remote endpoint for
+ * @skb: incoming packet to retrieve the source address (remote) from
+ */
+void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb)
+{
+	struct sockaddr_storage ss;
+	const u8 *local_ip = NULL;
+	struct sockaddr_in6 *sa6;
+	struct sockaddr_in *sa;
+	struct ovpn_bind *bind;
+	sa_family_t family;
+	size_t salen;
+
+	rcu_read_lock();
+	bind = rcu_dereference(peer->bind);
+	if (unlikely(!bind))
+		goto unlock;
+
+	if (likely(ovpn_bind_skb_src_match(bind, skb)))
+		goto unlock;
+
+	family = skb_protocol_to_family(skb);
+
+	if (bind->sa.in4.sin_family == family)
+		local_ip = (u8 *)&bind->local;
+
+	switch (family) {
+	case AF_INET:
+		sa = (struct sockaddr_in *)&ss;
+		sa->sin_family = AF_INET;
+		sa->sin_addr.s_addr = ip_hdr(skb)->saddr;
+		sa->sin_port = udp_hdr(skb)->source;
+		salen = sizeof(*sa);
+		break;
+	case AF_INET6:
+		sa6 = (struct sockaddr_in6 *)&ss;
+		sa6->sin6_family = AF_INET6;
+		sa6->sin6_addr = ipv6_hdr(skb)->saddr;
+		sa6->sin6_port = udp_hdr(skb)->source;
+		sa6->sin6_scope_id = ipv6_iface_scope_id(&ipv6_hdr(skb)->saddr,
+							 skb->skb_iif);
+		salen = sizeof(*sa6);
+		break;
+	default:
+		goto unlock;
+	}
+
+	netdev_dbg(peer->ovpn->dev, "%s: peer %d floated to %pIScp", __func__,
+		   peer->id, &ss);
+	ovpn_peer_reset_sockaddr(peer, (struct sockaddr_storage *)&ss,
+				 local_ip);
+
+	spin_lock_bh(&peer->ovpn->peers->lock);
+	/* remove old hashing */
+	hlist_del_init_rcu(&peer->hash_entry_transp_addr);
+	/* re-add with new transport address */
+	hlist_add_head_rcu(&peer->hash_entry_transp_addr,
+			   ovpn_get_hash_head(peer->ovpn->peers->by_transp_addr,
+					      &ss, salen));
+	spin_unlock_bh(&peer->ovpn->peers->lock);
+
+unlock:
+	rcu_read_unlock();
+}
+
 /**
  * ovpn_peer_timer_delete_all - killall keepalive timers
  * @peer: peer for which timers should be killed
@@ -231,9 +342,6 @@ static struct in6_addr ovpn_nexthop_from_skb6(struct sk_buff *skb)
 	return rt->rt6i_gateway;
 }
 
-#define ovpn_get_hash_head(_tbl, _key, _key_len)		\
-	(&(_tbl)[jhash(_key, _key_len, 0) % HASH_SIZE(_tbl)])	\
-
 /**
  * ovpn_peer_get_by_vpn_addr4 - retrieve peer by its VPN IPv4 address
  * @ovpn: the openvpn instance to search
diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
index 1f12ba141d80..691cf20bd870 100644
--- a/drivers/net/ovpn/peer.h
+++ b/drivers/net/ovpn/peer.h
@@ -192,4 +192,6 @@ void ovpn_peer_keepalive_set(struct ovpn_peer *peer, u32 interval, u32 timeout);
 void ovpn_peer_update_local_endpoint(struct ovpn_peer *peer,
 				     struct sk_buff *skb);
 
+void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb);
+
 #endif /* _NET_OVPN_OVPNPEER_H_ */
A peer connected via UDP may change its IP address without reconnecting
(float).

Add support for detecting and updating the new peer IP/port in case of
floating.

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
 drivers/net/ovpn/io.c   |   4 ++
 drivers/net/ovpn/peer.c | 114 ++++++++++++++++++++++++++++++++++++++--
 drivers/net/ovpn/peer.h |   2 +
 3 files changed, 117 insertions(+), 3 deletions(-)