Message ID | 20220202110137.470850-2-atenart@kernel.org (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net: fix issues when uncloning an skb dst+metadata | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Clearly marked for net |
netdev/fixes_present | success | Fixes tag present in non-next series |
netdev/subject_prefix | success | Link |
netdev/cover_letter | success | Series has a cover letter |
netdev/patch_count | success | Link |
netdev/header_inline | success | No static functions without inline keyword in header files |
netdev/build_32bit | success | Errors and warnings before: 75 this patch: 75 |
netdev/cc_maintainers | success | CCed 4 of 4 maintainers |
netdev/build_clang | success | Errors and warnings before: 25 this patch: 25 |
netdev/module_param | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Signed-off-by tag matches author and committer |
netdev/verify_fixes | success | Fixes tag looks correct |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 94 this patch: 94 |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 26 lines checked |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/source_inline | success | Was 0 now: 0 |
Quoting Antoine Tenart (2022-02-02 12:01:36) > When uncloning an skb dst and its associated metadata a new dst+metadata > is allocated and the tunnel information from the old metadata is copied > over there. > > The issue is the tunnel metadata has references to cached dst, which are > copied along the way. When a dst+metadata refcount drops to 0 the > metadata is freed including the cached dst entries. As they are also > referenced in the initial dst+metadata, this ends up in UaFs. > > In practice the above did not happen because of another issue, the > dst+metadata was never freed because its refcount never dropped to 0 > (this will be fixed in a subsequent patch). > > Fix this by initializing the dst cache after copying the tunnel > information from the old metadata to also unshare the dst cache. > > Fixes: d71785ffc7e7 ("net: add dst_cache to ovs vxlan lwtunnel") > Cc: Paolo Abeni <pabeni@redhat.com> > Reported-by: Vlad Buslov <vladbu@nvidia.com> > Tested-by: Vlad Buslov <vladbu@nvidia.com> > Signed-off-by: Antoine Tenart <atenart@kernel.org> > --- > include/net/dst_metadata.h | 13 ++++++++++++- > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h > index 14efa0ded75d..c8f8b7b56bba 100644 > --- a/include/net/dst_metadata.h > +++ b/include/net/dst_metadata.h > @@ -110,8 +110,8 @@ static inline struct metadata_dst *tun_rx_dst(int md_size) > static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb) > { > struct metadata_dst *md_dst = skb_metadata_dst(skb); > - int md_size; > struct metadata_dst *new_md; > + int md_size, ret; Hmmm ret should probably be defined inside a CONFIG_DST_CACHE section. > if (!md_dst || md_dst->type != METADATA_IP_TUNNEL) > return ERR_PTR(-EINVAL); > @@ -123,6 +123,17 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb) > > memcpy(&new_md->u.tun_info, &md_dst->u.tun_info, > sizeof(struct ip_tunnel_info) + md_size); > +#ifdef CONFIG_DST_CACHE > + ret = dst_cache_init(&new_md->u.tun_info.dst_cache, GFP_ATOMIC); > + if (ret) { > + /* We can't call metadata_dst_free directly as the still shared > + * dst cache would be released. > + */ > + kfree(new_md); > + return ERR_PTR(ret); > + } > +#endif > + > skb_dst_drop(skb); > dst_hold(&new_md->dst); > skb_dst_set(skb, &new_md->dst); > -- > 2.34.1 >
On 2/2/22 12:01 PM, Antoine Tenart wrote: > When uncloning an skb dst and its associated metadata a new dst+metadata > is allocated and the tunnel information from the old metadata is copied > over there. > > The issue is the tunnel metadata has references to cached dst, which are > copied along the way. When a dst+metadata refcount drops to 0 the > metadata is freed including the cached dst entries. As they are also > referenced in the initial dst+metadata, this ends up in UaFs. > > In practice the above did not happen because of another issue, the > dst+metadata was never freed because its refcount never dropped to 0 > (this will be fixed in a subsequent patch). > > Fix this by initializing the dst cache after copying the tunnel > information from the old metadata to also unshare the dst cache. > > Fixes: d71785ffc7e7 ("net: add dst_cache to ovs vxlan lwtunnel") > Cc: Paolo Abeni <pabeni@redhat.com> > Reported-by: Vlad Buslov <vladbu@nvidia.com> > Tested-by: Vlad Buslov <vladbu@nvidia.com> > Signed-off-by: Antoine Tenart <atenart@kernel.org> > --- > include/net/dst_metadata.h | 13 ++++++++++++- > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h > index 14efa0ded75d..c8f8b7b56bba 100644 > --- a/include/net/dst_metadata.h > +++ b/include/net/dst_metadata.h > @@ -110,8 +110,8 @@ static inline struct metadata_dst *tun_rx_dst(int md_size) > static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb) > { > struct metadata_dst *md_dst = skb_metadata_dst(skb); > - int md_size; > struct metadata_dst *new_md; > + int md_size, ret; > > if (!md_dst || md_dst->type != METADATA_IP_TUNNEL) > return ERR_PTR(-EINVAL); > @@ -123,6 +123,17 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb) > > memcpy(&new_md->u.tun_info, &md_dst->u.tun_info, > sizeof(struct ip_tunnel_info) + md_size); > +#ifdef CONFIG_DST_CACHE > + ret = dst_cache_init(&new_md->u.tun_info.dst_cache, GFP_ATOMIC); > + if (ret) { > + /* We can't call metadata_dst_free directly as the still shared > + * dst cache would be released. > + */ > + kfree(new_md); > + return ERR_PTR(ret); > + } > +#endif Could you elaborate (e.g. also in commit message) how this interacts or whether it is needed for TUNNEL_NOCACHE users? (Among others, latter is used by BPF, for example.) > skb_dst_drop(skb); > dst_hold(&new_md->dst); > skb_dst_set(skb, &new_md->dst); >
Quoting Daniel Borkmann (2022-02-02 13:13:30) > On 2/2/22 12:01 PM, Antoine Tenart wrote: > > When uncloning an skb dst and its associated metadata a new dst+metadata > > is allocated and the tunnel information from the old metadata is copied > > over there. > > > > The issue is the tunnel metadata has references to cached dst, which are > > copied along the way. When a dst+metadata refcount drops to 0 the > > metadata is freed including the cached dst entries. As they are also > > referenced in the initial dst+metadata, this ends up in UaFs. > > > > In practice the above did not happen because of another issue, the > > dst+metadata was never freed because its refcount never dropped to 0 > > (this will be fixed in a subsequent patch). > > > > Fix this by initializing the dst cache after copying the tunnel > > information from the old metadata to also unshare the dst cache. > > > > Fixes: d71785ffc7e7 ("net: add dst_cache to ovs vxlan lwtunnel") > > Cc: Paolo Abeni <pabeni@redhat.com> > > Reported-by: Vlad Buslov <vladbu@nvidia.com> > > Tested-by: Vlad Buslov <vladbu@nvidia.com> > > Signed-off-by: Antoine Tenart <atenart@kernel.org> > > --- > > include/net/dst_metadata.h | 13 ++++++++++++- > > 1 file changed, 12 insertions(+), 1 deletion(-) > > > > diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h > > index 14efa0ded75d..c8f8b7b56bba 100644 > > --- a/include/net/dst_metadata.h > > +++ b/include/net/dst_metadata.h > > @@ -110,8 +110,8 @@ static inline struct metadata_dst *tun_rx_dst(int md_size) > > static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb) > > { > > struct metadata_dst *md_dst = skb_metadata_dst(skb); > > - int md_size; > > struct metadata_dst *new_md; > > + int md_size, ret; > > > > if (!md_dst || md_dst->type != METADATA_IP_TUNNEL) > > return ERR_PTR(-EINVAL); > > @@ -123,6 +123,17 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb) > > > > memcpy(&new_md->u.tun_info, &md_dst->u.tun_info, > > sizeof(struct ip_tunnel_info) + md_size); > > +#ifdef CONFIG_DST_CACHE > > + ret = dst_cache_init(&new_md->u.tun_info.dst_cache, GFP_ATOMIC); > > + if (ret) { > > + /* We can't call metadata_dst_free directly as the still shared > > + * dst cache would be released. > > + */ > > + kfree(new_md); > > + return ERR_PTR(ret); > > + } > > +#endif > > Could you elaborate (e.g. also in commit message) how this interacts > or whether it is needed for TUNNEL_NOCACHE users? (Among others, > latter is used by BPF, for example.) My understanding is that TUNNEL_NOCACHE is used to decide whether or not to use a dst cache, that might or might not come from the tunnel info attached to an skb. The dst cache being allocated in a tunnel info is orthogonal to the use of TUNNEL_NOCACHE. While looking around I actually found a code path explicitly setting both, in nft_tunnel_obj_init (that might need to be investigated though but it is another topic). It doesn't look like initializing the dst cache would break TUNNEL_NOCACHE users as ip_tunnel_dst_cache_usable would return false anyway. Having said that, we probably want to unshare the dst cache only if there is one already, checking for 'md_dst->u.tun_info.dst_cache.cache != NULL' first. Does that make sense? Thanks! Antoine
On Wed, 2022-02-02 at 12:01 +0100, Antoine Tenart wrote: > When uncloning an skb dst and its associated metadata a new dst+metadata > is allocated and the tunnel information from the old metadata is copied > over there. > > The issue is the tunnel metadata has references to cached dst, which are > copied along the way. When a dst+metadata refcount drops to 0 the > metadata is freed including the cached dst entries. As they are also > referenced in the initial dst+metadata, this ends up in UaFs. > > In practice the above did not happen because of another issue, the > dst+metadata was never freed because its refcount never dropped to 0 > (this will be fixed in a subsequent patch). > > Fix this by initializing the dst cache after copying the tunnel > information from the old metadata to also unshare the dst cache. > > Fixes: d71785ffc7e7 ("net: add dst_cache to ovs vxlan lwtunnel") > Cc: Paolo Abeni <pabeni@redhat.com> > Reported-by: Vlad Buslov <vladbu@nvidia.com> > Tested-by: Vlad Buslov <vladbu@nvidia.com> > Signed-off-by: Antoine Tenart <atenart@kernel.org> > --- > include/net/dst_metadata.h | 13 ++++++++++++- > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h > index 14efa0ded75d..c8f8b7b56bba 100644 > --- a/include/net/dst_metadata.h > +++ b/include/net/dst_metadata.h > @@ -110,8 +110,8 @@ static inline struct metadata_dst *tun_rx_dst(int md_size) > static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb) > { > struct metadata_dst *md_dst = skb_metadata_dst(skb); > - int md_size; > struct metadata_dst *new_md; > + int md_size, ret; > > if (!md_dst || md_dst->type != METADATA_IP_TUNNEL) > return ERR_PTR(-EINVAL); > @@ -123,6 +123,17 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb) > > memcpy(&new_md->u.tun_info, &md_dst->u.tun_info, > sizeof(struct ip_tunnel_info) + md_size); > +#ifdef CONFIG_DST_CACHE > + ret = dst_cache_init(&new_md->u.tun_info.dst_cache, GFP_ATOMIC); > + if (ret) { > + /* We can't call metadata_dst_free directly as the still shared > + * dst cache would be released. > + */ > + kfree(new_md); I think here you can use metadata_dst_free(): if dst_cache_init fails, the dst_cache will be zeroed. > + return ERR_PTR(ret); > + } > +#endif > + > skb_dst_drop(skb); > dst_hold(&new_md->dst); > skb_dst_set(skb, &new_md->dst); Other than that LGTM, thanks! /P
Quoting Paolo Abeni (2022-02-02 15:24:51) > On Wed, 2022-02-02 at 12:01 +0100, Antoine Tenart wrote: > > memcpy(&new_md->u.tun_info, &md_dst->u.tun_info, > > sizeof(struct ip_tunnel_info) + md_size); > > +#ifdef CONFIG_DST_CACHE > > + ret = dst_cache_init(&new_md->u.tun_info.dst_cache, GFP_ATOMIC); > > + if (ret) { > > + /* We can't call metadata_dst_free directly as the still shared > > + * dst cache would be released. > > + */ > > + kfree(new_md); > > I think here you can use metadata_dst_free(): if dst_cache_init fails, > the dst_cache will be zeroed. You're right, I'll use it in v2. Thanks! Antoine
On 2/2/22 2:44 PM, Antoine Tenart wrote: > Quoting Daniel Borkmann (2022-02-02 13:13:30) >> On 2/2/22 12:01 PM, Antoine Tenart wrote: >>> When uncloning an skb dst and its associated metadata a new dst+metadata >>> is allocated and the tunnel information from the old metadata is copied >>> over there. >>> >>> The issue is the tunnel metadata has references to cached dst, which are >>> copied along the way. When a dst+metadata refcount drops to 0 the >>> metadata is freed including the cached dst entries. As they are also >>> referenced in the initial dst+metadata, this ends up in UaFs. >>> >>> In practice the above did not happen because of another issue, the >>> dst+metadata was never freed because its refcount never dropped to 0 >>> (this will be fixed in a subsequent patch). >>> >>> Fix this by initializing the dst cache after copying the tunnel >>> information from the old metadata to also unshare the dst cache. >>> >>> Fixes: d71785ffc7e7 ("net: add dst_cache to ovs vxlan lwtunnel") >>> Cc: Paolo Abeni <pabeni@redhat.com> >>> Reported-by: Vlad Buslov <vladbu@nvidia.com> >>> Tested-by: Vlad Buslov <vladbu@nvidia.com> >>> Signed-off-by: Antoine Tenart <atenart@kernel.org> >>> --- >>> include/net/dst_metadata.h | 13 ++++++++++++- >>> 1 file changed, 12 insertions(+), 1 deletion(-) >>> >>> diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h >>> index 14efa0ded75d..c8f8b7b56bba 100644 >>> --- a/include/net/dst_metadata.h >>> +++ b/include/net/dst_metadata.h >>> @@ -110,8 +110,8 @@ static inline struct metadata_dst *tun_rx_dst(int md_size) >>> static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb) >>> { >>> struct metadata_dst *md_dst = skb_metadata_dst(skb); >>> - int md_size; >>> struct metadata_dst *new_md; >>> + int md_size, ret; >>> >>> if (!md_dst || md_dst->type != METADATA_IP_TUNNEL) >>> return ERR_PTR(-EINVAL); >>> @@ -123,6 +123,17 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb) >>> >>> memcpy(&new_md->u.tun_info, &md_dst->u.tun_info, >>> sizeof(struct ip_tunnel_info) + md_size); >>> +#ifdef CONFIG_DST_CACHE >>> + ret = dst_cache_init(&new_md->u.tun_info.dst_cache, GFP_ATOMIC); >>> + if (ret) { >>> + /* We can't call metadata_dst_free directly as the still shared >>> + * dst cache would be released. >>> + */ >>> + kfree(new_md); >>> + return ERR_PTR(ret); >>> + } >>> +#endif >> >> Could you elaborate (e.g. also in commit message) how this interacts >> or whether it is needed for TUNNEL_NOCACHE users? (Among others, >> latter is used by BPF, for example.) > > My understanding is that TUNNEL_NOCACHE is used to decide whether or not > to use a dst cache, that might or might not come from the tunnel info > attached to an skb. The dst cache being allocated in a tunnel info is > orthogonal to the use of TUNNEL_NOCACHE. While looking around I actually > found a code path explicitly setting both, in nft_tunnel_obj_init (that > might need to be investigated though but it is another topic). Good point, this is coming from 3e511d5652ce ("netfilter: nft_tunnel: Add dst_cache support") and was added only after af308b94a2a4 ("netfilter: nf_tables: add tunnel support") which initially indicated TUNNEL_NOCACHE. This is indeed contradictory. wenxu (+Cc), ptal. > It doesn't look like initializing the dst cache would break > TUNNEL_NOCACHE users as ip_tunnel_dst_cache_usable would return false > anyway. Having said that, we probably want to unshare the dst cache only > if there is one already, checking for > 'md_dst->u.tun_info.dst_cache.cache != NULL' first. Meaning, if that is the case, we wouldn't require the dst_cache_init() and thus extra alloc, right? Would make sense afaics. db3c6139e6ea ("bpf, vxlan, geneve, gre: fix usage of dst_cache on xmit") had some details related to BPF use. Thanks again! Daniel
Quoting Daniel Borkmann (2022-02-04 13:33:20) > On 2/2/22 2:44 PM, Antoine Tenart wrote: > > Quoting Daniel Borkmann (2022-02-02 13:13:30) > >> On 2/2/22 12:01 PM, Antoine Tenart wrote: > >>> When uncloning an skb dst and its associated metadata a new dst+metadata > >>> is allocated and the tunnel information from the old metadata is copied > >>> over there. > >>> > >>> The issue is the tunnel metadata has references to cached dst, which are > >>> copied along the way. When a dst+metadata refcount drops to 0 the > >>> metadata is freed including the cached dst entries. As they are also > >>> referenced in the initial dst+metadata, this ends up in UaFs. > >>> > >>> In practice the above did not happen because of another issue, the > >>> dst+metadata was never freed because its refcount never dropped to 0 > >>> (this will be fixed in a subsequent patch). > >>> > >>> Fix this by initializing the dst cache after copying the tunnel > >>> information from the old metadata to also unshare the dst cache. > >>> > >>> Fixes: d71785ffc7e7 ("net: add dst_cache to ovs vxlan lwtunnel") > >>> Cc: Paolo Abeni <pabeni@redhat.com> > >>> Reported-by: Vlad Buslov <vladbu@nvidia.com> > >>> Tested-by: Vlad Buslov <vladbu@nvidia.com> > >>> Signed-off-by: Antoine Tenart <atenart@kernel.org> > >>> --- > >>> include/net/dst_metadata.h | 13 ++++++++++++- > >>> 1 file changed, 12 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h > >>> index 14efa0ded75d..c8f8b7b56bba 100644 > >>> --- a/include/net/dst_metadata.h > >>> +++ b/include/net/dst_metadata.h > >>> @@ -110,8 +110,8 @@ static inline struct metadata_dst *tun_rx_dst(int md_size) > >>> static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb) > >>> { > >>> struct metadata_dst *md_dst = skb_metadata_dst(skb); > >>> - int md_size; > >>> struct metadata_dst *new_md; > >>> + int md_size, ret; > >>> > >>> if (!md_dst || md_dst->type != METADATA_IP_TUNNEL) > >>> return ERR_PTR(-EINVAL); > >>> @@ -123,6 +123,17 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb) > >>> > >>> memcpy(&new_md->u.tun_info, &md_dst->u.tun_info, > >>> sizeof(struct ip_tunnel_info) + md_size); > >>> +#ifdef CONFIG_DST_CACHE > >>> + ret = dst_cache_init(&new_md->u.tun_info.dst_cache, GFP_ATOMIC); > >>> + if (ret) { > >>> + /* We can't call metadata_dst_free directly as the still shared > >>> + * dst cache would be released. > >>> + */ > >>> + kfree(new_md); > >>> + return ERR_PTR(ret); > >>> + } > >>> +#endif > >> > >> Could you elaborate (e.g. also in commit message) how this interacts > >> or whether it is needed for TUNNEL_NOCACHE users? (Among others, > >> latter is used by BPF, for example.) > > > > My understanding is that TUNNEL_NOCACHE is used to decide whether or not > > to use a dst cache, that might or might not come from the tunnel info > > attached to an skb. The dst cache being allocated in a tunnel info is > > orthogonal to the use of TUNNEL_NOCACHE. While looking around I actually > > found a code path explicitly setting both, in nft_tunnel_obj_init (that > > might need to be investigated though but it is another topic). > > Good point, this is coming from 3e511d5652ce ("netfilter: nft_tunnel: Add dst_cache > support") and was added only after af308b94a2a4 ("netfilter: nf_tables: add tunnel > support") which initially indicated TUNNEL_NOCACHE. This is indeed contradictory. > wenxu (+Cc), ptal. > > > It doesn't look like initializing the dst cache would break > > TUNNEL_NOCACHE users as ip_tunnel_dst_cache_usable would return false > > anyway. Having said that, we probably want to unshare the dst cache only > > if there is one already, checking for > > 'md_dst->u.tun_info.dst_cache.cache != NULL' first. > > Meaning, if that is the case, we wouldn't require the dst_cache_init() > and thus extra alloc, right? Would make sense afaics. Meaning: memcpy(&new_md->u.tun_info, &md_dst->u.tun_info, sizeof(struct ip_tunnel_info) + md_size); +#ifdef CONFIG_DST_CACHE + if (new_md->u.tun_info.dst_cache.cache) { + int ret = dst_cache_init(&new_md->u.tun_info.dst_cache, + GFP_ATOMIC); + if (ret) { + metadata_dst_free(new_md); + return ERR_PTR(ret); + } + } +#endif So that the cache is unshared only if there was one in the first place. If there was no cache to unshare, we can save the extra alloc. > db3c6139e6ea ("bpf, vxlan, geneve, gre: fix usage of dst_cache on > xmit") had some details related to BPF use. With the above commit if TUNNEL_NOCACHE is set the tunnel cache shouldn't be used, regardless of it being allocated. I guess with that and the above, we should be good. Thanks! Antoine
diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h index 14efa0ded75d..c8f8b7b56bba 100644 --- a/include/net/dst_metadata.h +++ b/include/net/dst_metadata.h @@ -110,8 +110,8 @@ static inline struct metadata_dst *tun_rx_dst(int md_size) static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb) { struct metadata_dst *md_dst = skb_metadata_dst(skb); - int md_size; struct metadata_dst *new_md; + int md_size, ret; if (!md_dst || md_dst->type != METADATA_IP_TUNNEL) return ERR_PTR(-EINVAL); @@ -123,6 +123,17 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb) memcpy(&new_md->u.tun_info, &md_dst->u.tun_info, sizeof(struct ip_tunnel_info) + md_size); +#ifdef CONFIG_DST_CACHE + ret = dst_cache_init(&new_md->u.tun_info.dst_cache, GFP_ATOMIC); + if (ret) { + /* We can't call metadata_dst_free directly as the still shared + * dst cache would be released. + */ + kfree(new_md); + return ERR_PTR(ret); + } +#endif + skb_dst_drop(skb); dst_hold(&new_md->dst); skb_dst_set(skb, &new_md->dst);