Message ID | 20220907015618.2140679-1-william.xuanziyang@huawei.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] net: tun: limit first seg size to avoid oversized linearization | expand |
Thanks, this looks good to me! On Tue, Sep 6, 2022 at 6:56 PM Ziyang Xuan <william.xuanziyang@huawei.com> wrote: > > Recently, we found a syzkaller problem as following: > > ======================================================== > WARNING: CPU: 1 PID: 17965 at mm/page_alloc.c:5295 __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295 > ... > Call trace: > __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295 > __alloc_pages_node include/linux/gfp.h:550 [inline] > alloc_pages_node include/linux/gfp.h:564 [inline] > kmalloc_large_node+0x94/0x350 mm/slub.c:4038 > __kmalloc_node_track_caller+0x620/0x8e4 mm/slub.c:4545 > __kmalloc_reserve.constprop.0+0x1e4/0x2b0 net/core/skbuff.c:151 > pskb_expand_head+0x130/0x8b0 net/core/skbuff.c:1654 > __skb_grow include/linux/skbuff.h:2779 [inline] > tun_napi_alloc_frags+0x144/0x610 drivers/net/tun.c:1477 > tun_get_user+0x31c/0x2010 drivers/net/tun.c:1835 > tun_chr_write_iter+0x98/0x100 drivers/net/tun.c:2036 > > It is because the first seg size of the iov_iter from user space is > very big, it is 2147479538 which is bigger than the threshold value > for bail out early in __alloc_pages(). And skb->pfmemalloc is true, > __kmalloc_reserve() would use pfmemalloc reserves without __GFP_NOWARN > flag. Thus we got a warning. > > I noticed that non-first segs size are required less than PAGE_SIZE in > tun_napi_alloc_frags(). The first seg should not be a special case, and > oversized linearization is also unreasonable. Limit the first seg size to > PAGE_SIZE to avoid oversized linearization. > > Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver") > Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> > --- Acked-by: Petar Penkov <ppenkov@aviatrix.com>
On Tue, Sep 6, 2022 at 6:56 PM Ziyang Xuan <william.xuanziyang@huawei.com> wrote: > > Recently, we found a syzkaller problem as following: > > ======================================================== > WARNING: CPU: 1 PID: 17965 at mm/page_alloc.c:5295 __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295 > ... > Call trace: > __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295 > __alloc_pages_node include/linux/gfp.h:550 [inline] > alloc_pages_node include/linux/gfp.h:564 [inline] > kmalloc_large_node+0x94/0x350 mm/slub.c:4038 > __kmalloc_node_track_caller+0x620/0x8e4 mm/slub.c:4545 > __kmalloc_reserve.constprop.0+0x1e4/0x2b0 net/core/skbuff.c:151 > pskb_expand_head+0x130/0x8b0 net/core/skbuff.c:1654 > __skb_grow include/linux/skbuff.h:2779 [inline] > tun_napi_alloc_frags+0x144/0x610 drivers/net/tun.c:1477 > tun_get_user+0x31c/0x2010 drivers/net/tun.c:1835 > tun_chr_write_iter+0x98/0x100 drivers/net/tun.c:2036 > > It is because the first seg size of the iov_iter from user space is > very big, it is 2147479538 which is bigger than the threshold value > for bail out early in __alloc_pages(). And skb->pfmemalloc is true, > __kmalloc_reserve() would use pfmemalloc reserves without __GFP_NOWARN > flag. Thus we got a warning. > > I noticed that non-first segs size are required less than PAGE_SIZE in > tun_napi_alloc_frags(). The first seg should not be a special case, and > oversized linearization is also unreasonable. Limit the first seg size to > PAGE_SIZE to avoid oversized linearization. > > Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver") > Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> > --- > drivers/net/tun.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > index 259b2b84b2b3..7db515f94667 100644 > --- a/drivers/net/tun.c > +++ b/drivers/net/tun.c > @@ -1454,12 +1454,12 @@ static struct sk_buff *tun_napi_alloc_frags(struct tun_file *tfile, > size_t len, > const struct iov_iter *it) > { > + size_t linear = iov_iter_single_seg_count(it); > struct sk_buff *skb; > - size_t linear; > int err; > int i; > > - if (it->nr_segs > MAX_SKB_FRAGS + 1) > + if (it->nr_segs > MAX_SKB_FRAGS + 1 || linear > PAGE_SIZE) > return ERR_PTR(-EMSGSIZE); > This does not look good to me. Some drivers allocate 9KB+ for 9000 MTU, in a single allocation, because the hardware is not SG capable in RX. > local_bh_disable(); > @@ -1468,7 +1468,6 @@ static struct sk_buff *tun_napi_alloc_frags(struct tun_file *tfile, > if (!skb) > return ERR_PTR(-ENOMEM); > > - linear = iov_iter_single_seg_count(it); > err = __skb_grow(skb, linear); > if (err) > goto free; > -- > 2.25.1 >
> On Tue, Sep 6, 2022 at 6:56 PM Ziyang Xuan > <william.xuanziyang@huawei.com> wrote: >> >> Recently, we found a syzkaller problem as following: >> >> ======================================================== >> WARNING: CPU: 1 PID: 17965 at mm/page_alloc.c:5295 __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295 >> ... >> Call trace: >> __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295 >> __alloc_pages_node include/linux/gfp.h:550 [inline] >> alloc_pages_node include/linux/gfp.h:564 [inline] >> kmalloc_large_node+0x94/0x350 mm/slub.c:4038 >> __kmalloc_node_track_caller+0x620/0x8e4 mm/slub.c:4545 >> __kmalloc_reserve.constprop.0+0x1e4/0x2b0 net/core/skbuff.c:151 >> pskb_expand_head+0x130/0x8b0 net/core/skbuff.c:1654 >> __skb_grow include/linux/skbuff.h:2779 [inline] >> tun_napi_alloc_frags+0x144/0x610 drivers/net/tun.c:1477 >> tun_get_user+0x31c/0x2010 drivers/net/tun.c:1835 >> tun_chr_write_iter+0x98/0x100 drivers/net/tun.c:2036 >> >> It is because the first seg size of the iov_iter from user space is >> very big, it is 2147479538 which is bigger than the threshold value >> for bail out early in __alloc_pages(). And skb->pfmemalloc is true, >> __kmalloc_reserve() would use pfmemalloc reserves without __GFP_NOWARN >> flag. Thus we got a warning. >> >> I noticed that non-first segs size are required less than PAGE_SIZE in >> tun_napi_alloc_frags(). The first seg should not be a special case, and >> oversized linearization is also unreasonable. Limit the first seg size to >> PAGE_SIZE to avoid oversized linearization. >> >> Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver") >> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> >> --- >> drivers/net/tun.c | 5 ++--- >> 1 file changed, 2 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/net/tun.c b/drivers/net/tun.c >> index 259b2b84b2b3..7db515f94667 100644 >> --- a/drivers/net/tun.c >> +++ b/drivers/net/tun.c >> @@ -1454,12 +1454,12 @@ static struct sk_buff *tun_napi_alloc_frags(struct tun_file *tfile, >> size_t len, >> const struct iov_iter *it) >> { >> + size_t linear = iov_iter_single_seg_count(it); >> struct sk_buff *skb; >> - size_t linear; >> int err; >> int i; >> >> - if (it->nr_segs > MAX_SKB_FRAGS + 1) >> + if (it->nr_segs > MAX_SKB_FRAGS + 1 || linear > PAGE_SIZE) >> return ERR_PTR(-EMSGSIZE); >> > > This does not look good to me. > > Some drivers allocate 9KB+ for 9000 MTU, in a single allocation, > because the hardware is not SG capable in RX. So, do you mean that it does not matter and keep current status, or give a bigger size but PAGE_SIZE (usually 4KB size)? Would like to hear your advice. Thank you. > >> local_bh_disable(); >> @@ -1468,7 +1468,6 @@ static struct sk_buff *tun_napi_alloc_frags(struct tun_file *tfile, >> if (!skb) >> return ERR_PTR(-ENOMEM); >> >> - linear = iov_iter_single_seg_count(it); >> err = __skb_grow(skb, linear); >> if (err) >> goto free; >> -- >> 2.25.1 >> > . >
On Tue, 2022-09-13 at 20:07 +0800, Ziyang Xuan (William) wrote: > > On Tue, Sep 6, 2022 at 6:56 PM Ziyang Xuan > > <william.xuanziyang@huawei.com> wrote: > > > > > > Recently, we found a syzkaller problem as following: > > > > > > ======================================================== > > > WARNING: CPU: 1 PID: 17965 at mm/page_alloc.c:5295 > > > __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295 > > > ... > > > Call trace: > > > __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295 > > > __alloc_pages_node include/linux/gfp.h:550 [inline] > > > alloc_pages_node include/linux/gfp.h:564 [inline] > > > kmalloc_large_node+0x94/0x350 mm/slub.c:4038 > > > __kmalloc_node_track_caller+0x620/0x8e4 mm/slub.c:4545 > > > __kmalloc_reserve.constprop.0+0x1e4/0x2b0 net/core/skbuff.c:151 > > > pskb_expand_head+0x130/0x8b0 net/core/skbuff.c:1654 > > > __skb_grow include/linux/skbuff.h:2779 [inline] > > > tun_napi_alloc_frags+0x144/0x610 drivers/net/tun.c:1477 > > > tun_get_user+0x31c/0x2010 drivers/net/tun.c:1835 > > > tun_chr_write_iter+0x98/0x100 drivers/net/tun.c:2036 > > > > > > It is because the first seg size of the iov_iter from user space > > > is > > > very big, it is 2147479538 which is bigger than the threshold > > > value > > > for bail out early in __alloc_pages(). And skb->pfmemalloc is > > > true, > > > __kmalloc_reserve() would use pfmemalloc reserves without > > > __GFP_NOWARN > > > flag. Thus we got a warning. > > > > > > I noticed that non-first segs size are required less than > > > PAGE_SIZE in > > > tun_napi_alloc_frags(). The first seg should not be a special > > > case, and > > > oversized linearization is also unreasonable. Limit the first seg > > > size to > > > PAGE_SIZE to avoid oversized linearization. > > > > > > Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP > > > driver") > > > Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> > > > --- > > > drivers/net/tun.c | 5 ++--- > > > 1 file changed, 2 insertions(+), 3 deletions(-) > > > > > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > > > index 259b2b84b2b3..7db515f94667 100644 > > > --- a/drivers/net/tun.c > > > +++ b/drivers/net/tun.c > > > @@ -1454,12 +1454,12 @@ static struct sk_buff > > > *tun_napi_alloc_frags(struct tun_file *tfile, > > > size_t len, > > > const struct iov_iter > > > *it) > > > { > > > + size_t linear = iov_iter_single_seg_count(it); > > > struct sk_buff *skb; > > > - size_t linear; > > > int err; > > > int i; > > > > > > - if (it->nr_segs > MAX_SKB_FRAGS + 1) > > > + if (it->nr_segs > MAX_SKB_FRAGS + 1 || linear > > > > PAGE_SIZE) > > > return ERR_PTR(-EMSGSIZE); > > > > > > > This does not look good to me. > > > > Some drivers allocate 9KB+ for 9000 MTU, in a single allocation, > > because the hardware is not SG capable in RX. > > So, do you mean that it does not matter and keep current status, or > give a bigger size but PAGE_SIZE (usually 4KB size)? > > Would like to hear your advice. I'm guessing that what Eric is suggesting here is to use a bigger limit for 'linear'. Possibly ETH_MAX_MTU could fit. @Eric, fell free to correct me :) Thanks! Paolo
On Thu, Sep 15, 2022 at 3:31 AM Paolo Abeni <pabeni@redhat.com> wrote: > > On Tue, 2022-09-13 at 20:07 +0800, Ziyang Xuan (William) wrote: > > > On Tue, Sep 6, 2022 at 6:56 PM Ziyang Xuan > > > <william.xuanziyang@huawei.com> wrote: > > > > > > > > Recently, we found a syzkaller problem as following: > > > > > > > > ======================================================== > > > > WARNING: CPU: 1 PID: 17965 at mm/page_alloc.c:5295 > > > > __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295 > > > > ... > > > > Call trace: > > > > __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295 > > > > __alloc_pages_node include/linux/gfp.h:550 [inline] > > > > alloc_pages_node include/linux/gfp.h:564 [inline] > > > > kmalloc_large_node+0x94/0x350 mm/slub.c:4038 > > > > __kmalloc_node_track_caller+0x620/0x8e4 mm/slub.c:4545 > > > > __kmalloc_reserve.constprop.0+0x1e4/0x2b0 net/core/skbuff.c:151 > > > > pskb_expand_head+0x130/0x8b0 net/core/skbuff.c:1654 > > > > __skb_grow include/linux/skbuff.h:2779 [inline] > > > > tun_napi_alloc_frags+0x144/0x610 drivers/net/tun.c:1477 > > > > tun_get_user+0x31c/0x2010 drivers/net/tun.c:1835 > > > > tun_chr_write_iter+0x98/0x100 drivers/net/tun.c:2036 > > > > > > > > It is because the first seg size of the iov_iter from user space > > > > is > > > > very big, it is 2147479538 which is bigger than the threshold > > > > value > > > > for bail out early in __alloc_pages(). And skb->pfmemalloc is > > > > true, > > > > __kmalloc_reserve() would use pfmemalloc reserves without > > > > __GFP_NOWARN > > > > flag. Thus we got a warning. > > > > > > > > I noticed that non-first segs size are required less than > > > > PAGE_SIZE in > > > > tun_napi_alloc_frags(). The first seg should not be a special > > > > case, and > > > > oversized linearization is also unreasonable. Limit the first seg > > > > size to > > > > PAGE_SIZE to avoid oversized linearization. > > > > > > > > Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP > > > > driver") > > > > Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> > > > > --- > > > > drivers/net/tun.c | 5 ++--- > > > > 1 file changed, 2 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > > > > index 259b2b84b2b3..7db515f94667 100644 > > > > --- a/drivers/net/tun.c > > > > +++ b/drivers/net/tun.c > > > > @@ -1454,12 +1454,12 @@ static struct sk_buff > > > > *tun_napi_alloc_frags(struct tun_file *tfile, > > > > size_t len, > > > > const struct iov_iter > > > > *it) > > > > { > > > > + size_t linear = iov_iter_single_seg_count(it); > > > > struct sk_buff *skb; > > > > - size_t linear; > > > > int err; > > > > int i; > > > > > > > > - if (it->nr_segs > MAX_SKB_FRAGS + 1) > > > > + if (it->nr_segs > MAX_SKB_FRAGS + 1 || linear > > > > > PAGE_SIZE) > > > > return ERR_PTR(-EMSGSIZE); > > > > > > > > > > This does not look good to me. > > > > > > Some drivers allocate 9KB+ for 9000 MTU, in a single allocation, > > > because the hardware is not SG capable in RX. > > > > So, do you mean that it does not matter and keep current status, or > > give a bigger size but PAGE_SIZE (usually 4KB size)? > > > > Would like to hear your advice. > > I'm guessing that what Eric is suggesting here is to use a bigger limit > for 'linear'. Possibly ETH_MAX_MTU could fit. @Eric, fell free to > correct me :) > Something like that, yes. We need to be careful when approaching 64K limit, because of possible u16 fields overflows. We just got another patch in GRO layer, just because tun has not been fixed yet. > Thanks! > > Paolo >
diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 259b2b84b2b3..7db515f94667 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1454,12 +1454,12 @@ static struct sk_buff *tun_napi_alloc_frags(struct tun_file *tfile, size_t len, const struct iov_iter *it) { + size_t linear = iov_iter_single_seg_count(it); struct sk_buff *skb; - size_t linear; int err; int i; - if (it->nr_segs > MAX_SKB_FRAGS + 1) + if (it->nr_segs > MAX_SKB_FRAGS + 1 || linear > PAGE_SIZE) return ERR_PTR(-EMSGSIZE); local_bh_disable(); @@ -1468,7 +1468,6 @@ static struct sk_buff *tun_napi_alloc_frags(struct tun_file *tfile, if (!skb) return ERR_PTR(-ENOMEM); - linear = iov_iter_single_seg_count(it); err = __skb_grow(skb, linear); if (err) goto free;
Recently, we found a syzkaller problem as following: ======================================================== WARNING: CPU: 1 PID: 17965 at mm/page_alloc.c:5295 __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295 ... Call trace: __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295 __alloc_pages_node include/linux/gfp.h:550 [inline] alloc_pages_node include/linux/gfp.h:564 [inline] kmalloc_large_node+0x94/0x350 mm/slub.c:4038 __kmalloc_node_track_caller+0x620/0x8e4 mm/slub.c:4545 __kmalloc_reserve.constprop.0+0x1e4/0x2b0 net/core/skbuff.c:151 pskb_expand_head+0x130/0x8b0 net/core/skbuff.c:1654 __skb_grow include/linux/skbuff.h:2779 [inline] tun_napi_alloc_frags+0x144/0x610 drivers/net/tun.c:1477 tun_get_user+0x31c/0x2010 drivers/net/tun.c:1835 tun_chr_write_iter+0x98/0x100 drivers/net/tun.c:2036 It is because the first seg size of the iov_iter from user space is very big, it is 2147479538 which is bigger than the threshold value for bail out early in __alloc_pages(). And skb->pfmemalloc is true, __kmalloc_reserve() would use pfmemalloc reserves without __GFP_NOWARN flag. Thus we got a warning. I noticed that non-first segs size are required less than PAGE_SIZE in tun_napi_alloc_frags(). The first seg should not be a special case, and oversized linearization is also unreasonable. Limit the first seg size to PAGE_SIZE to avoid oversized linearization. Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> --- drivers/net/tun.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)