diff mbox

Moving frags and SKBTX_DEV_ZEROCOPY skbs

Message ID 1400077432.7973.85.camel@edumazet-glaptop2.roam.corp.google.com (mailing list archive)
State New, archived
Headers show

Commit Message

Eric Dumazet May 14, 2014, 2:23 p.m. UTC
On Wed, 2014-05-14 at 14:40 +0100, Zoltan Kiss wrote:
> Hi,
> 
> Recently I've investigated issues around SKBTX_DEV_ZEROCOPY skbs where 
> the frags list were modified. I came across this function skb_shift(), 
> which moves frags between skbs. And there are a lot more of such kind, 
> skb_split or skb_try_coalesce, for example.
> It could be a dangerous thing if a frag is referenced from an skb which 
> doesn't have the original destructor_arg, and to avoid that 
> skb_orphan_frags should be called. Although probably these functions are 
> not normally touched in usual usecases, I think it would be useful to 
> review core skb functions proactively and add an skb_orphan_frags 
> everywhere where the frags could be referenced from other places.
> Any opinion about this?


For skb_shift(), it is currently used from tcp stack only, where
this SKBTX_DEV_ZEROCOPY thing is not used, so I do not think there is a
bug for the moment.

I already gave a patch for skb_try_coalesce() : For this one we do not
wan skb_orphan_frags() overhead. Its simply better in this case to
abort.




--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller May 14, 2014, 5:42 p.m. UTC | #1
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 14 May 2014 07:23:52 -0700

> On Wed, 2014-05-14 at 14:40 +0100, Zoltan Kiss wrote:
>> Hi,
>> 
>> Recently I've investigated issues around SKBTX_DEV_ZEROCOPY skbs where 
>> the frags list were modified. I came across this function skb_shift(), 
>> which moves frags between skbs. And there are a lot more of such kind, 
>> skb_split or skb_try_coalesce, for example.
>> It could be a dangerous thing if a frag is referenced from an skb which 
>> doesn't have the original destructor_arg, and to avoid that 
>> skb_orphan_frags should be called. Although probably these functions are 
>> not normally touched in usual usecases, I think it would be useful to 
>> review core skb functions proactively and add an skb_orphan_frags 
>> everywhere where the frags could be referenced from other places.
>> Any opinion about this?
> 
> 
> For skb_shift(), it is currently used from tcp stack only, where
> this SKBTX_DEV_ZEROCOPY thing is not used, so I do not think there is a
> bug for the moment.
> 
> I already gave a patch for skb_try_coalesce() : For this one we do not
> wan skb_orphan_frags() overhead. Its simply better in this case to
> abort.

Eric can you please submit this formally?  It is second time I've seen
it posted as RFC :-)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet May 14, 2014, 5:52 p.m. UTC | #2
On Wed, 2014-05-14 at 13:42 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 14 May 2014 07:23:52 -0700
> 
> > On Wed, 2014-05-14 at 14:40 +0100, Zoltan Kiss wrote:
> >> Hi,
> >> 
> >> Recently I've investigated issues around SKBTX_DEV_ZEROCOPY skbs where 
> >> the frags list were modified. I came across this function skb_shift(), 
> >> which moves frags between skbs. And there are a lot more of such kind, 
> >> skb_split or skb_try_coalesce, for example.
> >> It could be a dangerous thing if a frag is referenced from an skb which 
> >> doesn't have the original destructor_arg, and to avoid that 
> >> skb_orphan_frags should be called. Although probably these functions are 
> >> not normally touched in usual usecases, I think it would be useful to 
> >> review core skb functions proactively and add an skb_orphan_frags 
> >> everywhere where the frags could be referenced from other places.
> >> Any opinion about this?
> > 
> > 
> > For skb_shift(), it is currently used from tcp stack only, where
> > this SKBTX_DEV_ZEROCOPY thing is not used, so I do not think there is a
> > bug for the moment.
> > 
> > I already gave a patch for skb_try_coalesce() : For this one we do not
> > wan skb_orphan_frags() overhead. Its simply better in this case to
> > abort.
> 
> Eric can you please submit this formally?  It is second time I've seen
> it posted as RFC :-)

Sure, I was kind of waiting to make sure it was needed.

It looks like it is ;)


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zoltan Kiss May 14, 2014, 7:41 p.m. UTC | #3
On 14/05/14 15:23, Eric Dumazet wrote:
> On Wed, 2014-05-14 at 14:40 +0100, Zoltan Kiss wrote:
>> Hi,
>>
>> Recently I've investigated issues around SKBTX_DEV_ZEROCOPY skbs where
>> the frags list were modified. I came across this function skb_shift(),
>> which moves frags between skbs. And there are a lot more of such kind,
>> skb_split or skb_try_coalesce, for example.
>> It could be a dangerous thing if a frag is referenced from an skb which
>> doesn't have the original destructor_arg, and to avoid that
>> skb_orphan_frags should be called. Although probably these functions are
>> not normally touched in usual usecases, I think it would be useful to
>> review core skb functions proactively and add an skb_orphan_frags
>> everywhere where the frags could be referenced from other places.
>> Any opinion about this?
>
>
> For skb_shift(), it is currently used from tcp stack only, where
> this SKBTX_DEV_ZEROCOPY thing is not used, so I do not think there is a
> bug for the moment.
It is called from tcp_input.c, which suggests it can be called on 
incoming TCP packets. If the backend domain communicates with the 
frontend through sockets, zerocopy packets can turn up here.
But here is the thing: deliver_skb calls orphan_frags for every packet 
delivered to the local stack, so we are safe IF these functions are 
called before the IP stack. So we are safe now, but things can go wrong, if:
- such a frag-mangling function is called before deliver_skb, now or in 
the future
- if someone wants to take advantage of zerocopy in the guest<->backend path

>
> I already gave a patch for skb_try_coalesce() : For this one we do not
> wan skb_orphan_frags() overhead. Its simply better in this case to
> abort.
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 1b62343f5837..85995a14aafc 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3838,7 +3839,10 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
>   		return true;
>   	}
>
> -	if (skb_has_frag_list(to) || skb_has_frag_list(from))
> +	if (skb_has_frag_list(to) ||
> +	    skb_has_frag_list(from) ||
> +	    (skb_shinfo(to)->tx_flags & SKBTX_DEV_ZEROCOPY) ||
> +	    (skb_shinfo(from)->tx_flags & SKBTX_DEV_ZEROCOPY))
>   		return false;
>
>   	if (skb_headlen(from) != 0) {
>
>
>

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet May 14, 2014, 7:52 p.m. UTC | #4
On Wed, 2014-05-14 at 20:41 +0100, Zoltan Kiss wrote:
> On 14/05/14 15:23, Eric Dumazet wrote:
> > On Wed, 2014-05-14 at 14:40 +0100, Zoltan Kiss wrote:
> >> Hi,
> >>
> >> Recently I've investigated issues around SKBTX_DEV_ZEROCOPY skbs where
> >> the frags list were modified. I came across this function skb_shift(),
> >> which moves frags between skbs. And there are a lot more of such kind,
> >> skb_split or skb_try_coalesce, for example.
> >> It could be a dangerous thing if a frag is referenced from an skb which
> >> doesn't have the original destructor_arg, and to avoid that
> >> skb_orphan_frags should be called. Although probably these functions are
> >> not normally touched in usual usecases, I think it would be useful to
> >> review core skb functions proactively and add an skb_orphan_frags
> >> everywhere where the frags could be referenced from other places.
> >> Any opinion about this?
> >
> >
> > For skb_shift(), it is currently used from tcp stack only, where
> > this SKBTX_DEV_ZEROCOPY thing is not used, so I do not think there is a
> > bug for the moment.
> It is called from tcp_input.c, which suggests it can be called on 
> incoming TCP packets.

Nope. 

We split outgoing packets, stored in the socket write queue.
These packets are locally generated by tcp_sendmsg() and tcp_sendpage(),
no way we use SKBTX_DEV_ZEROCOPY yet.

This split happens when we receive an ACK, that's why it is in
tcp_input.c



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zoltan Kiss May 15, 2014, 5:14 p.m. UTC | #5
On 14/05/14 20:41, Zoltan Kiss wrote:
> But here is the thing: deliver_skb calls orphan_frags for every packet
> delivered to the local stack, so we are safe IF these functions are
> called before the IP stack. So we are safe now, but things can go wrong,
> if:
> - such a frag-mangling function is called before deliver_skb, now or in
> the future
> - if someone wants to take advantage of zerocopy in the guest<->backend
> path

Running through the code I've found the following core functions can 
shuffle frags between skbs (and don't handle zerocopy skbs already):
skb_gro_receive
skb_shift
skb_split

None of them can meet at the moment with zerocopy skbs, but it's better 
to keep it in mind for the future, that would blow up these kind of skbs.

Zoli
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 1b62343f5837..85995a14aafc 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3838,7 +3839,10 @@  bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
 		return true;
 	}
 
-	if (skb_has_frag_list(to) || skb_has_frag_list(from))
+	if (skb_has_frag_list(to) ||
+	    skb_has_frag_list(from) ||
+	    (skb_shinfo(to)->tx_flags & SKBTX_DEV_ZEROCOPY) ||
+	    (skb_shinfo(from)->tx_flags & SKBTX_DEV_ZEROCOPY))
 		return false;
 
 	if (skb_headlen(from) != 0) {