
[net-next,v5,19/25] ovpn: add support for peer floating

Message ID: 20240627130843.21042-20-antonio@openvpn.net (mailing list archive)
State: Changes Requested
Delegated to: Netdev Maintainers
Series: Introducing OpenVPN Data Channel Offload

Checks

Context Check Description
netdev/series_format fail Series longer than 15 patches
netdev/tree_selection success Clearly marked for net-next, async
netdev/ynl success Generated files up to date; no warnings/errors; GEN HAS DIFF 2 files changed, 2612 insertions(+);
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 842 this patch: 842
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 1 maintainers not CCed: openvpn-devel@lists.sourceforge.net
netdev/build_clang success Errors and warnings before: 849 this patch: 849
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 846 this patch: 846
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 142 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-06-28--06-00 (tests: 666)

Commit Message

Antonio Quartulli June 27, 2024, 1:08 p.m. UTC
A peer connected via UDP may change its IP address without reconnecting
(float).

Add support for detecting a float and updating the stored peer IP/port
accordingly.

Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
---
 drivers/net/ovpn/io.c   |   4 ++
 drivers/net/ovpn/peer.c | 114 ++++++++++++++++++++++++++++++++++++++--
 drivers/net/ovpn/peer.h |   2 +
 3 files changed, 117 insertions(+), 3 deletions(-)

Comments

Sabrina Dubroca July 17, 2024, 5:15 p.m. UTC | #1
2024-06-27, 15:08:37 +0200, Antonio Quartulli wrote:
> +void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb)
> +{
> +	struct sockaddr_storage ss;
> +	const u8 *local_ip = NULL;
> +	struct sockaddr_in6 *sa6;
> +	struct sockaddr_in *sa;
> +	struct ovpn_bind *bind;
> +	sa_family_t family;
> +	size_t salen;
> +
> +	rcu_read_lock();
> +	bind = rcu_dereference(peer->bind);
> +	if (unlikely(!bind))
> +		goto unlock;

Why are you aborting here? ovpn_bind_skb_src_match considers
bind==NULL to be "no match" (reasonable), then we would create a new
bind for the current address.

> +
> +	if (likely(ovpn_bind_skb_src_match(bind, skb)))

This could be running in parallel on two CPUs, because ->encap_rcv
isn't protected against that. So the bind could be getting updated in
parallel. I would move spin_lock_bh above this check to make sure it
doesn't happen.

ovpn_peer_update_local_endpoint would also need something like that, I
think.

> +		goto unlock;
> +
> +	family = skb_protocol_to_family(skb);
> +
> +	if (bind->sa.in4.sin_family == family)
> +		local_ip = (u8 *)&bind->local;
> +
> +	switch (family) {
> +	case AF_INET:
> +		sa = (struct sockaddr_in *)&ss;
> +		sa->sin_family = AF_INET;
> +		sa->sin_addr.s_addr = ip_hdr(skb)->saddr;
> +		sa->sin_port = udp_hdr(skb)->source;
> +		salen = sizeof(*sa);
> +		break;
> +	case AF_INET6:
> +		sa6 = (struct sockaddr_in6 *)&ss;
> +		sa6->sin6_family = AF_INET6;
> +		sa6->sin6_addr = ipv6_hdr(skb)->saddr;
> +		sa6->sin6_port = udp_hdr(skb)->source;
> +		sa6->sin6_scope_id = ipv6_iface_scope_id(&ipv6_hdr(skb)->saddr,
> +							 skb->skb_iif);
> +		salen = sizeof(*sa6);
> +		break;
> +	default:
> +		goto unlock;
> +	}
> +
> +	netdev_dbg(peer->ovpn->dev, "%s: peer %d floated to %pIScp", __func__,
> +		   peer->id, &ss);
> +	ovpn_peer_reset_sockaddr(peer, (struct sockaddr_storage *)&ss,
> +				 local_ip);
> +
> +	spin_lock_bh(&peer->ovpn->peers->lock);
> +	/* remove old hashing */
> +	hlist_del_init_rcu(&peer->hash_entry_transp_addr);
> +	/* re-add with new transport address */
> +	hlist_add_head_rcu(&peer->hash_entry_transp_addr,
> +			   ovpn_get_hash_head(peer->ovpn->peers->by_transp_addr,
> +					      &ss, salen));

That could send a concurrent reader onto the wrong hash bucket, if
it's going through peer's old bucket, finds peer before the update,
then continues reading after peer is moved to the new bucket.

This kind of re-hash can be handled with nulls, and re-trying the
lookup if we ended up on the wrong chain. See for example
__inet_lookup_established in net/ipv4/inet_hashtables.c (Thanks to
Paolo for the pointer).

> +	spin_unlock_bh(&peer->ovpn->peers->lock);
> +
> +unlock:
> +	rcu_read_unlock();
> +}
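
A minimal sketch of the reordering suggested above, assuming the
check-and-update is serialized by a per-peer spinlock (peer->lock, which
the follow-up below proposes to reuse); the helper names match the patch,
the rest is illustrative:

	spin_lock_bh(&peer->lock);
	bind = rcu_dereference_protected(peer->bind,
					 lockdep_is_held(&peer->lock));
	if (unlikely(!bind) || ovpn_bind_skb_src_match(bind, skb)) {
		/* no bind (peer being released) or address unchanged */
		spin_unlock_bh(&peer->lock);
		return;
	}
	/* ... build the new sockaddr_storage from the skb and re-hash
	 * under peers->lock, as in the patch below ...
	 */
	spin_unlock_bh(&peer->lock);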
Antonio Quartulli July 18, 2024, 9:37 a.m. UTC | #2
On 17/07/2024 19:15, Sabrina Dubroca wrote:
> 2024-06-27, 15:08:37 +0200, Antonio Quartulli wrote:
>> +void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb)
>> +{
>> +	struct sockaddr_storage ss;
>> +	const u8 *local_ip = NULL;
>> +	struct sockaddr_in6 *sa6;
>> +	struct sockaddr_in *sa;
>> +	struct ovpn_bind *bind;
>> +	sa_family_t family;
>> +	size_t salen;
>> +
>> +	rcu_read_lock();
>> +	bind = rcu_dereference(peer->bind);
>> +	if (unlikely(!bind))
>> +		goto unlock;
> 
> Why are you aborting here? ovpn_bind_skb_src_match considers
> bind==NULL to be "no match" (reasonable), then we would create a new
> bind for the current address.

(NOTE: float and the following explanation assume connection via UDP)

peer->bind is assigned right after peer creation in ovpn_nl_set_peer_doit().

ovpn_peer_float() is called while the peer is exchanging traffic.

If we got to this point and bind is NULL, then the peer was being 
released, because there is no way we are going to NULLify bind during 
the peer life cycle, except upon ovpn_peer_release().

Does it make sense?

> 
>> +
>> +	if (likely(ovpn_bind_skb_src_match(bind, skb)))
> 
> This could be running in parallel on two CPUs, because ->encap_rcv
> isn't protected against that. So the bind could be getting updated in
> parallel. I would move spin_lock_bh above this check to make sure it
> doesn't happen.

hm, I should actually use peer->lock for this, which is currently only 
used in ovpn_bind_reset() to avoid multiple concurrent assignments...but 
you're right we should include the call to skb_src_check() as well.

> 
> ovpn_peer_update_local_endpoint would also need something like that, I
> think.

at least the test-and-set part should be protected, if we can truly 
invoke ovpn_peer_update_local_endpoint() multiple times concurrently.


How do I test running encap_rcv in parallel?
This is actually an interesting case that I thought was not possible 
(no specific reason for this...).

> 
>> +		goto unlock;
>> +
>> +	family = skb_protocol_to_family(skb);
>> +
>> +	if (bind->sa.in4.sin_family == family)
>> +		local_ip = (u8 *)&bind->local;
>> +
>> +	switch (family) {
>> +	case AF_INET:
>> +		sa = (struct sockaddr_in *)&ss;
>> +		sa->sin_family = AF_INET;
>> +		sa->sin_addr.s_addr = ip_hdr(skb)->saddr;
>> +		sa->sin_port = udp_hdr(skb)->source;
>> +		salen = sizeof(*sa);
>> +		break;
>> +	case AF_INET6:
>> +		sa6 = (struct sockaddr_in6 *)&ss;
>> +		sa6->sin6_family = AF_INET6;
>> +		sa6->sin6_addr = ipv6_hdr(skb)->saddr;
>> +		sa6->sin6_port = udp_hdr(skb)->source;
>> +		sa6->sin6_scope_id = ipv6_iface_scope_id(&ipv6_hdr(skb)->saddr,
>> +							 skb->skb_iif);
>> +		salen = sizeof(*sa6);
>> +		break;
>> +	default:
>> +		goto unlock;
>> +	}
>> +
>> +	netdev_dbg(peer->ovpn->dev, "%s: peer %d floated to %pIScp", __func__,
>> +		   peer->id, &ss);
>> +	ovpn_peer_reset_sockaddr(peer, (struct sockaddr_storage *)&ss,
>> +				 local_ip);
>> +
>> +	spin_lock_bh(&peer->ovpn->peers->lock);
>> +	/* remove old hashing */
>> +	hlist_del_init_rcu(&peer->hash_entry_transp_addr);
>> +	/* re-add with new transport address */
>> +	hlist_add_head_rcu(&peer->hash_entry_transp_addr,
>> +			   ovpn_get_hash_head(peer->ovpn->peers->by_transp_addr,
>> +					      &ss, salen));
> 
> That could send a concurrent reader onto the wrong hash bucket, if
> it's going through peer's old bucket, finds peer before the update,
> then continues reading after peer is moved to the new bucket.

I haven't fully grasped this scenario.
I am imagining we are running ovpn_peer_get_by_transp_addr() in 
parallel: reader gets the old bucket and finds peer, because 
ovpn_peer_transp_match() will still return true (update wasn't performed 
yet), and will return it.

At this point, what do you mean by "continues reading after peer is 
moved to the new bucket"?

> 
> This kind of re-hash can be handled with nulls, and re-trying the
> lookup if we ended up on the wrong chain. See for example
> __inet_lookup_established in net/ipv4/inet_hashtables.c (Thanks to
> Paolo for the pointer).
> 
>> +	spin_unlock_bh(&peer->ovpn->peers->lock);
>> +
>> +unlock:
>> +	rcu_read_unlock();
>> +}
>
Sabrina Dubroca July 18, 2024, 11:12 a.m. UTC | #3
2024-07-18, 11:37:38 +0200, Antonio Quartulli wrote:
> On 17/07/2024 19:15, Sabrina Dubroca wrote:
> > 2024-06-27, 15:08:37 +0200, Antonio Quartulli wrote:
> > > +void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb)
> > > +{
> > > +	struct sockaddr_storage ss;
> > > +	const u8 *local_ip = NULL;
> > > +	struct sockaddr_in6 *sa6;
> > > +	struct sockaddr_in *sa;
> > > +	struct ovpn_bind *bind;
> > > +	sa_family_t family;
> > > +	size_t salen;
> > > +
> > > +	rcu_read_lock();
> > > +	bind = rcu_dereference(peer->bind);
> > > +	if (unlikely(!bind))
> > > +		goto unlock;
> > 
> > Why are you aborting here? ovpn_bind_skb_src_match considers
> > bind==NULL to be "no match" (reasonable), then we would create a new
> > bind for the current address.
> 
> (NOTE: float and the following explanation assume connection via UDP)
> 
> peer->bind is assigned right after peer creation in ovpn_nl_set_peer_doit().
> 
> ovpn_peer_float() is called while the peer is exchanging traffic.
> 
> If we got to this point and bind is NULL, then the peer was being released,
> because there is no way we are going to NULLify bind during the peer life
> cycle, except upon ovpn_peer_release().
> 
> Does it make sense?

Alright, thanks, I missed that.


> > > +	if (likely(ovpn_bind_skb_src_match(bind, skb)))
> > 
> > This could be running in parallel on two CPUs, because ->encap_rcv
> > isn't protected against that. So the bind could be getting updated in
> > parallel. I would move spin_lock_bh above this check to make sure it
> > doesn't happen.
> 
> hm, I should actually use peer->lock for this, which is currently only used
> in ovpn_bind_reset() to avoid multiple concurrent assignments...but you're
> right we should include the call to skb_src_check() as well.

Ok, sounds good.

> > ovpn_peer_update_local_endpoint would also need something like that, I
> > think.
> 
> at least the test-and-set part should be protected, if we can truly invoke
> ovpn_peer_update_local_endpoint() multiple times concurrently.

Yes.

> How do I test running encap_rcv in parallel?
> This is actually an interesting case that I thought was not possible (no
> specific reason for this...).

It should happen when the packets come from different source IPs and
the NIC has multiple queues, then they can be spread over different
CPUs. But it's probably not going to be easy to land multiple packets
in ovpn_peer_float at the same time to trigger this issue.


> > > +	netdev_dbg(peer->ovpn->dev, "%s: peer %d floated to %pIScp", __func__,
> > > +		   peer->id, &ss);
> > > +	ovpn_peer_reset_sockaddr(peer, (struct sockaddr_storage *)&ss,
> > > +				 local_ip);
> > > +
> > > +	spin_lock_bh(&peer->ovpn->peers->lock);
> > > +	/* remove old hashing */
> > > +	hlist_del_init_rcu(&peer->hash_entry_transp_addr);
> > > +	/* re-add with new transport address */
> > > +	hlist_add_head_rcu(&peer->hash_entry_transp_addr,
> > > +			   ovpn_get_hash_head(peer->ovpn->peers->by_transp_addr,
> > > +					      &ss, salen));
> > 
> > That could send a concurrent reader onto the wrong hash bucket, if
> > it's going through peer's old bucket, finds peer before the update,
> > then continues reading after peer is moved to the new bucket.
> 
> I haven't fully grasped this scenario.
> I am imagining we are running ovpn_peer_get_by_transp_addr() in parallel:
> reader gets the old bucket and finds peer, because ovpn_peer_transp_match()
> will still return true (update wasn't performed yet), and will return it.

The other reader isn't necessarily looking for peer, but maybe another
item that landed in the same bucket (though your hashtables are so
large, it would be a bit unlucky).

> At this point, what do you mean by "continues reading after peer is moved
> to the new bucket"?

Continues iterating, in hlist_for_each_entry_rcu inside
ovpn_peer_get_by_transp_addr.

ovpn_peer_float                          ovpn_peer_get_by_transp_addr

                                         start lookup
                                         head = ovpn_get_hash_head(...)
                                         hlist_for_each_entry_rcu
                                         ...
                                         find peer on head

peer moved from head to head2

                                         continue hlist_for_each_entry_rcu with peer->next
                                         but peer->next is now on head2
                                         keep walking ->next on head2 instead of head
Antonio Quartulli July 18, 2024, 1:21 p.m. UTC | #4
On 18/07/2024 13:12, Sabrina Dubroca wrote:
>> How do I test running encap_rcv in parallel?
>> This is actually an interesting case that I thought to not be possible (no
>> specific reason for this..).
> 
> It should happen when the packets come from different source IPs and
> the NIC has multiple queues, then they can be spread over different
> CPUs. But it's probably not going to be easy to land multiple packets
> in ovpn_peer_float at the same time to trigger this issue.

I see. Yeah, this is not easy.

> 
> 
>>>> +	netdev_dbg(peer->ovpn->dev, "%s: peer %d floated to %pIScp", __func__,
>>>> +		   peer->id, &ss);
>>>> +	ovpn_peer_reset_sockaddr(peer, (struct sockaddr_storage *)&ss,
>>>> +				 local_ip);
>>>> +
>>>> +	spin_lock_bh(&peer->ovpn->peers->lock);
>>>> +	/* remove old hashing */
>>>> +	hlist_del_init_rcu(&peer->hash_entry_transp_addr);
>>>> +	/* re-add with new transport address */
>>>> +	hlist_add_head_rcu(&peer->hash_entry_transp_addr,
>>>> +			   ovpn_get_hash_head(peer->ovpn->peers->by_transp_addr,
>>>> +					      &ss, salen));
>>>
>>> That could send a concurrent reader onto the wrong hash bucket, if
>>> it's going through peer's old bucket, finds peer before the update,
>>> then continues reading after peer is moved to the new bucket.
>>
>> I haven't fully grasped this scenario.
>> I am imagining we are running ovpn_peer_get_by_transp_addr() in parallel:
>> reader gets the old bucket and finds peer, because ovpn_peer_transp_match()
>> will still return true (update wasn't performed yet), and will return it.
> 
> The other reader isn't necessarily looking for peer, but maybe another
> item that landed in the same bucket (though your hashtables are so
> large, it would be a bit unlucky).
> 
>> At this point, what do you mean with "continues reading after peer is moved
>> to the new bucket"?
> 
> Continues iterating, in hlist_for_each_entry_rcu inside
> ovpn_peer_get_by_transp_addr.
> 
> ovpn_peer_float                          ovpn_peer_get_by_transp_addr
> 
>                                           start lookup
>                                           head = ovpn_get_hash_head(...)
>                                           hlist_for_each_entry_rcu
>                                           ...
>                                           find peer on head
> 
> peer moved from head to head2
> 
>                                           continue hlist_for_each_entry_rcu with peer->next
>                                           but peer->next is now on head2
>                                           keep walking ->next on head2 instead of head
> 

Ok got it.
Basically we might move the reader from one list to another without it 
noticing.

Will have a look at the pointer provided by Paolo and modify this code 
accordingly.

Thanks!
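
To make Paolo's pointer concrete, a minimal sketch of what a nulls-based
lookup with retry could look like, modeled on __inet_lookup_established()
in net/ipv4/inet_hashtables.c. It assumes by_transp_addr is converted to
hlist_nulls_head buckets initialized with INIT_HLIST_NULLS_HEAD(&tbl[slot],
slot), that hash_entry_transp_addr becomes a struct hlist_nulls_node, and
that ovpn_peer_skb_to_sockaddr() is a hypothetical helper; the whole thing
is illustrative rather than the final ovpn code:

	struct ovpn_peer *ovpn_peer_get_by_transp_addr(struct ovpn_struct *ovpn,
						       struct sk_buff *skb)
	{
		struct ovpn_peer *peer = NULL;
		struct hlist_nulls_node *nnode;
		struct hlist_nulls_head *head;
		struct sockaddr_storage ss;
		unsigned int slot;
		ssize_t salen;

		/* hypothetical helper: fill ss from the packet source
		 * address and return the sockaddr length (or negative error)
		 */
		salen = ovpn_peer_skb_to_sockaddr(skb, &ss);
		if (salen < 0)
			return NULL;

		slot = jhash(&ss, salen, 0) %
		       HASH_SIZE(ovpn->peers->by_transp_addr);
		head = &ovpn->peers->by_transp_addr[slot];

		rcu_read_lock();
	begin:
		hlist_nulls_for_each_entry_rcu(peer, nnode, head,
					       hash_entry_transp_addr) {
			if (ovpn_peer_transp_match(peer, &ss)) {
				if (unlikely(!ovpn_peer_hold(peer)))
					peer = NULL;
				goto out;
			}
		}
		peer = NULL;
		/* each chain ends in a nulls marker encoding its slot: if a
		 * concurrent re-hash moved us onto another chain, the value
		 * won't match and we restart the walk
		 */
		if (get_nulls_value(nnode) != slot)
			goto begin;
	out:
		rcu_read_unlock();
		return peer;
	}

On the writer side, ovpn_peer_float would then switch to
hlist_nulls_del_init_rcu()/hlist_nulls_add_head_rcu() under the same
peers->lock it already takes.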

Patch

diff --git a/drivers/net/ovpn/io.c b/drivers/net/ovpn/io.c
index 9188afe0f47e..4c6a50f3f0d0 100644
--- a/drivers/net/ovpn/io.c
+++ b/drivers/net/ovpn/io.c
@@ -120,6 +120,10 @@  void ovpn_decrypt_post(struct sk_buff *skb, int ret)
 	ovpn_peer_keepalive_recv_reset(peer);
 
 	if (peer->sock->sock->sk->sk_protocol == IPPROTO_UDP) {
+		/* check if this peer changed its IP address and update
+		 * state
+		 */
+		ovpn_peer_float(peer, skb);
 		/* update source endpoint for this peer */
 		ovpn_peer_update_local_endpoint(peer, skb);
 	}
diff --git a/drivers/net/ovpn/peer.c b/drivers/net/ovpn/peer.c
index ec3064438753..c07d148c52b4 100644
--- a/drivers/net/ovpn/peer.c
+++ b/drivers/net/ovpn/peer.c
@@ -126,6 +126,117 @@  struct ovpn_peer *ovpn_peer_new(struct ovpn_struct *ovpn, u32 id)
 	return peer;
 }
 
+/**
+ * ovpn_peer_reset_sockaddr - recreate binding for peer
+ * @peer: peer to recreate the binding for
+ * @ss: sockaddr to use as remote endpoint for the binding
+ * @local_ip: local IP for the binding
+ *
+ * Return: 0 on success or a negative error code otherwise
+ */
+static int ovpn_peer_reset_sockaddr(struct ovpn_peer *peer,
+				    const struct sockaddr_storage *ss,
+				    const u8 *local_ip)
+{
+	struct ovpn_bind *bind;
+	size_t ip_len;
+
+	/* create new ovpn_bind object */
+	bind = ovpn_bind_from_sockaddr(ss);
+	if (IS_ERR(bind))
+		return PTR_ERR(bind);
+
+	if (local_ip) {
+		if (ss->ss_family == AF_INET) {
+			ip_len = sizeof(struct in_addr);
+		} else if (ss->ss_family == AF_INET6) {
+			ip_len = sizeof(struct in6_addr);
+		} else {
+			netdev_dbg(peer->ovpn->dev, "%s: invalid family for remote endpoint\n",
+				   __func__);
+			kfree(bind);
+			return -EINVAL;
+		}
+
+		memcpy(&bind->local, local_ip, ip_len);
+	}
+
+	/* set binding */
+	ovpn_bind_reset(peer, bind);
+
+	return 0;
+}
+
+#define ovpn_get_hash_head(_tbl, _key, _key_len)		\
+	(&(_tbl)[jhash(_key, _key_len, 0) % HASH_SIZE(_tbl)])	\
+
+/**
+ * ovpn_peer_float - update remote endpoint for peer
+ * @peer: peer to update the remote endpoint for
+ * @skb: incoming packet to retrieve the source address (remote) from
+ */
+void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb)
+{
+	struct sockaddr_storage ss;
+	const u8 *local_ip = NULL;
+	struct sockaddr_in6 *sa6;
+	struct sockaddr_in *sa;
+	struct ovpn_bind *bind;
+	sa_family_t family;
+	size_t salen;
+
+	rcu_read_lock();
+	bind = rcu_dereference(peer->bind);
+	if (unlikely(!bind))
+		goto unlock;
+
+	if (likely(ovpn_bind_skb_src_match(bind, skb)))
+		goto unlock;
+
+	family = skb_protocol_to_family(skb);
+
+	if (bind->sa.in4.sin_family == family)
+		local_ip = (u8 *)&bind->local;
+
+	switch (family) {
+	case AF_INET:
+		sa = (struct sockaddr_in *)&ss;
+		sa->sin_family = AF_INET;
+		sa->sin_addr.s_addr = ip_hdr(skb)->saddr;
+		sa->sin_port = udp_hdr(skb)->source;
+		salen = sizeof(*sa);
+		break;
+	case AF_INET6:
+		sa6 = (struct sockaddr_in6 *)&ss;
+		sa6->sin6_family = AF_INET6;
+		sa6->sin6_addr = ipv6_hdr(skb)->saddr;
+		sa6->sin6_port = udp_hdr(skb)->source;
+		sa6->sin6_scope_id = ipv6_iface_scope_id(&ipv6_hdr(skb)->saddr,
+							 skb->skb_iif);
+		salen = sizeof(*sa6);
+		break;
+	default:
+		goto unlock;
+	}
+
+	netdev_dbg(peer->ovpn->dev, "%s: peer %d floated to %pIScp", __func__,
+		   peer->id, &ss);
+	ovpn_peer_reset_sockaddr(peer, (struct sockaddr_storage *)&ss,
+				 local_ip);
+
+	spin_lock_bh(&peer->ovpn->peers->lock);
+	/* remove old hashing */
+	hlist_del_init_rcu(&peer->hash_entry_transp_addr);
+	/* re-add with new transport address */
+	hlist_add_head_rcu(&peer->hash_entry_transp_addr,
+			   ovpn_get_hash_head(peer->ovpn->peers->by_transp_addr,
+					      &ss, salen));
+	spin_unlock_bh(&peer->ovpn->peers->lock);
+
+unlock:
+	rcu_read_unlock();
+}
+
 /**
  * ovpn_peer_timer_delete_all - killall keepalive timers
  * @peer: peer for which timers should be killed
@@ -231,9 +342,6 @@  static struct in6_addr ovpn_nexthop_from_skb6(struct sk_buff *skb)
 	return rt->rt6i_gateway;
 }
 
-#define ovpn_get_hash_head(_tbl, _key, _key_len)		\
-	(&(_tbl)[jhash(_key, _key_len, 0) % HASH_SIZE(_tbl)])	\
-
 /**
  * ovpn_peer_get_by_vpn_addr4 - retrieve peer by its VPN IPv4 address
  * @ovpn: the openvpn instance to search
diff --git a/drivers/net/ovpn/peer.h b/drivers/net/ovpn/peer.h
index 1f12ba141d80..691cf20bd870 100644
--- a/drivers/net/ovpn/peer.h
+++ b/drivers/net/ovpn/peer.h
@@ -192,4 +192,6 @@  void ovpn_peer_keepalive_set(struct ovpn_peer *peer, u32 interval, u32 timeout);
 void ovpn_peer_update_local_endpoint(struct ovpn_peer *peer,
 				     struct sk_buff *skb);
 
+void ovpn_peer_float(struct ovpn_peer *peer, struct sk_buff *skb);
+
 #endif /* _NET_OVPN_OVPNPEER_H_ */