[v3] net/core/skbuff: fix passing wrong size to __alloc_skb

Message ID	20210227175114.28645-1-paskripkin@gmail.com (mailing list archive)
State	Changes Requested
Delegated to:	Netdev Maintainers
Headers	show Return-Path: <netdev-owner@kernel.org> From: Pavel Skripkin <paskripkin@gmail.com> To: alobakin@pm.me Cc: davem@davemloft.net, linmiaohe@huawei.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Pavel Skripkin <paskripkin@gmail.com>, syzbot+80dccaee7c6630fa9dcf@syzkaller.appspotmail.com Subject: [PATCH v3] net/core/skbuff: fix passing wrong size to __alloc_skb Date: Sat, 27 Feb 2021 20:51:14 +0300 Message-Id: <20210227175114.28645-1-paskripkin@gmail.com> In-Reply-To: <20210227110306.13360-1-alobakin@pm.me> References: <20210227110306.13360-1-alobakin@pm.me> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	[v3] net/core/skbuff: fix passing wrong size to __alloc_skb \| expand [v3] net/core/skbuff: fix passing wrong size to __alloc_skb

Context	Check	Description
netdev/tree_selection	success	Guessing tree name failed - patch did not apply

Pavel Skripkin Feb. 27, 2021, 5:51 p.m. UTC

syzbot found WARNING in __alloc_pages_nodemask()[1] when order >= MAX_ORDER.
It was caused by __netdev_alloc_skb(), which doesn't check len value after adding NET_SKB_PAD.
Order will be >= MAX_ORDER and passed to __alloc_pages_nodemask() if size > KMALLOC_MAX_SIZE.
Same happens in __napi_alloc_skb.

static void *kmalloc_large_node(size_t size, gfp_t flags, int node)
{
	struct page *page;
	void *ptr = NULL;
	unsigned int order = get_order(size);
...
	page = alloc_pages_node(node, flags, order);
...

[1] WARNING in __alloc_pages_nodemask+0x5f8/0x730 mm/page_alloc.c:5014
Call Trace:
 __alloc_pages include/linux/gfp.h:511 [inline]
 __alloc_pages_node include/linux/gfp.h:524 [inline]
 alloc_pages_node include/linux/gfp.h:538 [inline]
 kmalloc_large_node+0x60/0x110 mm/slub.c:3999
 __kmalloc_node_track_caller+0x319/0x3f0 mm/slub.c:4496
 __kmalloc_reserve net/core/skbuff.c:150 [inline]
 __alloc_skb+0x4e4/0x5a0 net/core/skbuff.c:210
 __netdev_alloc_skb+0x70/0x400 net/core/skbuff.c:446
 netdev_alloc_skb include/linux/skbuff.h:2832 [inline]
 qrtr_endpoint_post+0x84/0x11b0 net/qrtr/qrtr.c:442
 qrtr_tun_write_iter+0x11f/0x1a0 net/qrtr/tun.c:98
 call_write_iter include/linux/fs.h:1901 [inline]
 new_sync_write+0x426/0x650 fs/read_write.c:518
 vfs_write+0x791/0xa30 fs/read_write.c:605
 ksys_write+0x12d/0x250 fs/read_write.c:658
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported-by: syzbot+80dccaee7c6630fa9dcf@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>

---
Changes from v3:
* Removed Change-Id and extra tabs in net/core/skbuff.c

Changes from v2:
* Added length check to __napi_alloc_skb
* Added unlikely() in checks

Change from v1:
* Added length check to __netdev_alloc_skb
---
 net/core/skbuff.c | 6 ++++++
 1 file changed, 6 insertions(+)

Alexander Lobakin Feb. 28, 2021, 6:14 p.m. UTC | #1

From: Pavel Skripkin <paskripkin@gmail.com>
Date: Sat, 27 Feb 2021 20:51:14 +0300

Hi,

> syzbot found WARNING in __alloc_pages_nodemask()[1] when order >= MAX_ORDER.
> It was caused by __netdev_alloc_skb(), which doesn't check len value after adding NET_SKB_PAD.
> Order will be >= MAX_ORDER and passed to __alloc_pages_nodemask() if size > KMALLOC_MAX_SIZE.
> Same happens in __napi_alloc_skb.
>
> static void *kmalloc_large_node(size_t size, gfp_t flags, int node)
> {
> 	struct page *page;
> 	void *ptr = NULL;
> 	unsigned int order = get_order(size);
> ...
> 	page = alloc_pages_node(node, flags, order);
> ...
>
> [1] WARNING in __alloc_pages_nodemask+0x5f8/0x730 mm/page_alloc.c:5014
> Call Trace:
>  __alloc_pages include/linux/gfp.h:511 [inline]
>  __alloc_pages_node include/linux/gfp.h:524 [inline]
>  alloc_pages_node include/linux/gfp.h:538 [inline]
>  kmalloc_large_node+0x60/0x110 mm/slub.c:3999
>  __kmalloc_node_track_caller+0x319/0x3f0 mm/slub.c:4496
>  __kmalloc_reserve net/core/skbuff.c:150 [inline]
>  __alloc_skb+0x4e4/0x5a0 net/core/skbuff.c:210
>  __netdev_alloc_skb+0x70/0x400 net/core/skbuff.c:446
>  netdev_alloc_skb include/linux/skbuff.h:2832 [inline]
>  qrtr_endpoint_post+0x84/0x11b0 net/qrtr/qrtr.c:442
>  qrtr_tun_write_iter+0x11f/0x1a0 net/qrtr/tun.c:98
>  call_write_iter include/linux/fs.h:1901 [inline]
>  new_sync_write+0x426/0x650 fs/read_write.c:518
>  vfs_write+0x791/0xa30 fs/read_write.c:605
>  ksys_write+0x12d/0x250 fs/read_write.c:658
>  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Ah, by the way. Have you tried to seek for the root cause, why
a request for such insanely large (at least 4 Mib) skb happens
in QRTR? I don't believe it's intended to be like this.
Now I feel that silencing this error with early return isn't
really correct approach for this.

> Reported-by: syzbot+80dccaee7c6630fa9dcf@syzkaller.appspotmail.com
> Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
>
> ---
> Changes from v3:
> * Removed Change-Id and extra tabs in net/core/skbuff.c
>
> Changes from v2:
> * Added length check to __napi_alloc_skb
> * Added unlikely() in checks
>
> Change from v1:
> * Added length check to __netdev_alloc_skb
> ---
>  net/core/skbuff.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 785daff48030..ec7ba8728b61 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -443,6 +443,9 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len,
>  	if (len <= SKB_WITH_OVERHEAD(1024) ||
>  	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
>  	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
> +		if (unlikely(len > KMALLOC_MAX_SIZE))
> +			return NULL;
> +
>  		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
>  		if (!skb)
>  			goto skb_fail;
> @@ -517,6 +520,9 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
>  	if (len <= SKB_WITH_OVERHEAD(1024) ||
>  	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
>  	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
> +		if (unlikely(len > KMALLOC_MAX_SIZE))
> +			return NULL;
> +
>  		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
>  		if (!skb)
>  			goto skb_fail;
> --
> 2.25.1

Thanks,
Al

Jakub Kicinski Feb. 28, 2021, 6:55 p.m. UTC | #2

On Sun, 28 Feb 2021 18:14:46 +0000 Alexander Lobakin wrote:
> > [1] WARNING in __alloc_pages_nodemask+0x5f8/0x730 mm/page_alloc.c:5014
> > Call Trace:
> >  __alloc_pages include/linux/gfp.h:511 [inline]
> >  __alloc_pages_node include/linux/gfp.h:524 [inline]
> >  alloc_pages_node include/linux/gfp.h:538 [inline]
> >  kmalloc_large_node+0x60/0x110 mm/slub.c:3999
> >  __kmalloc_node_track_caller+0x319/0x3f0 mm/slub.c:4496
> >  __kmalloc_reserve net/core/skbuff.c:150 [inline]
> >  __alloc_skb+0x4e4/0x5a0 net/core/skbuff.c:210
> >  __netdev_alloc_skb+0x70/0x400 net/core/skbuff.c:446
> >  netdev_alloc_skb include/linux/skbuff.h:2832 [inline]
> >  qrtr_endpoint_post+0x84/0x11b0 net/qrtr/qrtr.c:442
> >  qrtr_tun_write_iter+0x11f/0x1a0 net/qrtr/tun.c:98
> >  call_write_iter include/linux/fs.h:1901 [inline]
> >  new_sync_write+0x426/0x650 fs/read_write.c:518
> >  vfs_write+0x791/0xa30 fs/read_write.c:605
> >  ksys_write+0x12d/0x250 fs/read_write.c:658
> >  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
> >  entry_SYSCALL_64_after_hwframe+0x44/0xa9  
> 
> Ah, by the way. Have you tried to seek for the root cause, why
> a request for such insanely large (at least 4 Mib) skb happens
> in QRTR? I don't believe it's intended to be like this.
> Now I feel that silencing this error with early return isn't
> really correct approach for this.

Right, IIUC Eric suggested we limit the length of the allocation 
to 64KB because that's the max reasonable skb length, and QRTR tun write
results in generating a single skb. That seems like a good approach.

Alexander Lobakin Feb. 28, 2021, 7:11 p.m. UTC | #3

From: Jakub Kicinski <kuba@kernel.org>
Date: Sun, 28 Feb 2021 10:55:52 -0800

> On Sun, 28 Feb 2021 18:14:46 +0000 Alexander Lobakin wrote:
> > > [1] WARNING in __alloc_pages_nodemask+0x5f8/0x730 mm/page_alloc.c:5014
> > > Call Trace:
> > >  __alloc_pages include/linux/gfp.h:511 [inline]
> > >  __alloc_pages_node include/linux/gfp.h:524 [inline]
> > >  alloc_pages_node include/linux/gfp.h:538 [inline]
> > >  kmalloc_large_node+0x60/0x110 mm/slub.c:3999
> > >  __kmalloc_node_track_caller+0x319/0x3f0 mm/slub.c:4496
> > >  __kmalloc_reserve net/core/skbuff.c:150 [inline]
> > >  __alloc_skb+0x4e4/0x5a0 net/core/skbuff.c:210
> > >  __netdev_alloc_skb+0x70/0x400 net/core/skbuff.c:446
> > >  netdev_alloc_skb include/linux/skbuff.h:2832 [inline]
> > >  qrtr_endpoint_post+0x84/0x11b0 net/qrtr/qrtr.c:442
> > >  qrtr_tun_write_iter+0x11f/0x1a0 net/qrtr/tun.c:98
> > >  call_write_iter include/linux/fs.h:1901 [inline]
> > >  new_sync_write+0x426/0x650 fs/read_write.c:518
> > >  vfs_write+0x791/0xa30 fs/read_write.c:605
> > >  ksys_write+0x12d/0x250 fs/read_write.c:658
> > >  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
> > >  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > Ah, by the way. Have you tried to seek for the root cause, why
> > a request for such insanely large (at least 4 Mib) skb happens
> > in QRTR? I don't believe it's intended to be like this.
> > Now I feel that silencing this error with early return isn't
> > really correct approach for this.
>
> Right, IIUC Eric suggested we limit the length of the allocation
> to 64KB because that's the max reasonable skb length, and QRTR tun write
> results in generating a single skb. That seems like a good approach.

+1 for explicit capping max skb length to 64K.

Al

Pavel Skripkin Feb. 28, 2021, 7:28 p.m. UTC | #4

Hi, thanks for reply!

> From: Pavel Skripkin <paskripkin@gmail.com>
> Date: Sat, 27 Feb 2021 20:51:14 +0300
> 
> Hi,
> 
> > syzbot found WARNING in __alloc_pages_nodemask()[1] when order >=
> > MAX_ORDER.
> > It was caused by __netdev_alloc_skb(), which doesn't check len
> > value after adding NET_SKB_PAD.
> > Order will be >= MAX_ORDER and passed to __alloc_pages_nodemask()
> > if size > KMALLOC_MAX_SIZE.
> > Same happens in __napi_alloc_skb.
> > 
> > static void *kmalloc_large_node(size_t size, gfp_t flags, int node)
> > {
> > 	struct page *page;
> > 	void *ptr = NULL;
> > 	unsigned int order = get_order(size);
> > ...
> > 	page = alloc_pages_node(node, flags, order);
> > ...
> > 
> > [1] WARNING in __alloc_pages_nodemask+0x5f8/0x730
> > mm/page_alloc.c:5014
> > Call Trace:
> >  __alloc_pages include/linux/gfp.h:511 [inline]
> >  __alloc_pages_node include/linux/gfp.h:524 [inline]
> >  alloc_pages_node include/linux/gfp.h:538 [inline]
> >  kmalloc_large_node+0x60/0x110 mm/slub.c:3999
> >  __kmalloc_node_track_caller+0x319/0x3f0 mm/slub.c:4496
> >  __kmalloc_reserve net/core/skbuff.c:150 [inline]
> >  __alloc_skb+0x4e4/0x5a0 net/core/skbuff.c:210
> >  __netdev_alloc_skb+0x70/0x400 net/core/skbuff.c:446
> >  netdev_alloc_skb include/linux/skbuff.h:2832 [inline]
> >  qrtr_endpoint_post+0x84/0x11b0 net/qrtr/qrtr.c:442
> >  qrtr_tun_write_iter+0x11f/0x1a0 net/qrtr/tun.c:98
> >  call_write_iter include/linux/fs.h:1901 [inline]
> >  new_sync_write+0x426/0x650 fs/read_write.c:518
> >  vfs_write+0x791/0xa30 fs/read_write.c:605
> >  ksys_write+0x12d/0x250 fs/read_write.c:658
> >  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
> >  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Ah, by the way. Have you tried to seek for the root cause, why
> a request for such insanely large (at least 4 Mib) skb happens
> in QRTR? I don't believe it's intended to be like this.
> Now I feel that silencing this error with early return isn't
> really correct approach for this.

Yeah, i figured it out. Syzbot provides reproducer for thig bug:

void loop(void)
{
  intptr_t res = 0;
  memcpy((void*)0x20000000, "/dev/qrtr-tun\000", 14);
  res = syscall(__NR_openat, 0xffffffffffffff9cul, 0x20000000ul, 1ul,
0);
  if (res != -1)
    r[0] = res;
  memcpy((void*)0x20000040, "\x02", 1);
  syscall(__NR_write, r[0], 0x20000040ul, 0x400000ul);
}

So, simply it writes to /dev/qrtr-tun 0x400000 bytes.
In qrtr_tun_write_iter there is a check, that the length is smaller
than KMALLOC_MAX_VSIZE. Then the length is passed to
__netdev_alloc_skb, where it becomes more than KMALLOC_MAX_VSIZE.

> 
> > Reported-by: syzbot+80dccaee7c6630fa9dcf@syzkaller.appspotmail.com
> > Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
> > 
> > ---
> > Changes from v3:
> > * Removed Change-Id and extra tabs in net/core/skbuff.c
> > 
> > Changes from v2:
> > * Added length check to __napi_alloc_skb
> > * Added unlikely() in checks
> > 
> > Change from v1:
> > * Added length check to __netdev_alloc_skb
> > ---
> >  net/core/skbuff.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 785daff48030..ec7ba8728b61 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -443,6 +443,9 @@ struct sk_buff *__netdev_alloc_skb(struct
> > net_device *dev, unsigned int len,
> >  	if (len <= SKB_WITH_OVERHEAD(1024) ||
> >  	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
> >  	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
> > +		if (unlikely(len > KMALLOC_MAX_SIZE))
> > +			return NULL;
> > +
> >  		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX,
> > NUMA_NO_NODE);
> >  		if (!skb)
> >  			goto skb_fail;
> > @@ -517,6 +520,9 @@ struct sk_buff *__napi_alloc_skb(struct
> > napi_struct *napi, unsigned int len,
> >  	if (len <= SKB_WITH_OVERHEAD(1024) ||
> >  	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
> >  	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
> > +		if (unlikely(len > KMALLOC_MAX_SIZE))
> > +			return NULL;
> > +
> >  		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX,
> > NUMA_NO_NODE);
> >  		if (!skb)
> >  			goto skb_fail;
> > --
> > 2.25.1
> 
> Thanks,
> Al
> 

With regards,
Pavel Skripkin

Alexander Lobakin Feb. 28, 2021, 8:10 p.m. UTC | #5

From: Pavel Skripkin <paskripkin@gmail.com>
Date: Sun, 28 Feb 2021 22:28:13 +0300

> Hi, thanks for reply!
>
> > From: Pavel Skripkin <paskripkin@gmail.com>
> > Date: Sat, 27 Feb 2021 20:51:14 +0300
> >
> > Hi,
> >
> > > syzbot found WARNING in __alloc_pages_nodemask()[1] when order >=
> > > MAX_ORDER.
> > > It was caused by __netdev_alloc_skb(), which doesn't check len
> > > value after adding NET_SKB_PAD.
> > > Order will be >= MAX_ORDER and passed to __alloc_pages_nodemask()
> > > if size > KMALLOC_MAX_SIZE.
> > > Same happens in __napi_alloc_skb.
> > >
> > > static void *kmalloc_large_node(size_t size, gfp_t flags, int node)
> > > {
> > > 	struct page *page;
> > > 	void *ptr = NULL;
> > > 	unsigned int order = get_order(size);
> > > ...
> > > 	page = alloc_pages_node(node, flags, order);
> > > ...
> > >
> > > [1] WARNING in __alloc_pages_nodemask+0x5f8/0x730
> > > mm/page_alloc.c:5014
> > > Call Trace:
> > >  __alloc_pages include/linux/gfp.h:511 [inline]
> > >  __alloc_pages_node include/linux/gfp.h:524 [inline]
> > >  alloc_pages_node include/linux/gfp.h:538 [inline]
> > >  kmalloc_large_node+0x60/0x110 mm/slub.c:3999
> > >  __kmalloc_node_track_caller+0x319/0x3f0 mm/slub.c:4496
> > >  __kmalloc_reserve net/core/skbuff.c:150 [inline]
> > >  __alloc_skb+0x4e4/0x5a0 net/core/skbuff.c:210
> > >  __netdev_alloc_skb+0x70/0x400 net/core/skbuff.c:446
> > >  netdev_alloc_skb include/linux/skbuff.h:2832 [inline]
> > >  qrtr_endpoint_post+0x84/0x11b0 net/qrtr/qrtr.c:442
> > >  qrtr_tun_write_iter+0x11f/0x1a0 net/qrtr/tun.c:98
> > >  call_write_iter include/linux/fs.h:1901 [inline]
> > >  new_sync_write+0x426/0x650 fs/read_write.c:518
> > >  vfs_write+0x791/0xa30 fs/read_write.c:605
> > >  ksys_write+0x12d/0x250 fs/read_write.c:658
> > >  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
> > >  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > Ah, by the way. Have you tried to seek for the root cause, why
> > a request for such insanely large (at least 4 Mib) skb happens
> > in QRTR? I don't believe it's intended to be like this.
> > Now I feel that silencing this error with early return isn't
> > really correct approach for this.
>
> Yeah, i figured it out. Syzbot provides reproducer for thig bug:
>
> void loop(void)
> {
>   intptr_t res = 0;
>   memcpy((void*)0x20000000, "/dev/qrtr-tun\000", 14);
>   res = syscall(__NR_openat, 0xffffffffffffff9cul, 0x20000000ul, 1ul,
> 0);
>   if (res != -1)
>     r[0] = res;
>   memcpy((void*)0x20000040, "\x02", 1);
>   syscall(__NR_write, r[0], 0x20000040ul, 0x400000ul);
> }
>
> So, simply it writes to /dev/qrtr-tun 0x400000 bytes.
> In qrtr_tun_write_iter there is a check, that the length is smaller
> than KMALLOC_MAX_VSIZE. Then the length is passed to
> __netdev_alloc_skb, where it becomes more than KMALLOC_MAX_VSIZE.

I've checked the logics in qrtr_tun_write_iter(). Seems like it's
only trying to prevent kzallocs of sizes larger than the maximum
and doesn't care about netdev_alloc_skb() at all, as it ignores
the fact that, besides NET_SKB_PAD and NET_IP_ALIGN, skbs always
have SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) on top of
the "usable" size.

On the other hand, skb memory overheads, kmalloc bounds etc. are
an internal thing and all related corner cases should be handled
inside the implementations, not the users. From this point, even
this check for (len < KMALLOC_MAX_SIZE) is a bit bogus.
I think in that particular case with the size coming from userspace
(i.e. untrusted source), the allocations (both kzalloc() and
__netdev_alloc_skb()) should be performed with __GFP_NOWARN, so
insane values won't provoke any splats.

So maybe use it as a fix for this particular warning (seems like
it's the sole place in the entire kernel that can potentially
request such skb allocations) and don't add any branches into
hot *alloc_skb() at all?
We might add a cap for max skb length later, as Jakub pointed.

> >
> > > Reported-by: syzbot+80dccaee7c6630fa9dcf@syzkaller.appspotmail.com
> > > Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
> > >
> > > ---
> > > Changes from v3:
> > > * Removed Change-Id and extra tabs in net/core/skbuff.c
> > >
> > > Changes from v2:
> > > * Added length check to __napi_alloc_skb
> > > * Added unlikely() in checks
> > >
> > > Change from v1:
> > > * Added length check to __netdev_alloc_skb
> > > ---
> > >  net/core/skbuff.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > index 785daff48030..ec7ba8728b61 100644
> > > --- a/net/core/skbuff.c
> > > +++ b/net/core/skbuff.c
> > > @@ -443,6 +443,9 @@ struct sk_buff *__netdev_alloc_skb(struct
> > > net_device *dev, unsigned int len,
> > >  	if (len <= SKB_WITH_OVERHEAD(1024) ||
> > >  	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
> > >  	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
> > > +		if (unlikely(len > KMALLOC_MAX_SIZE))
> > > +			return NULL;
> > > +
> > >  		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX,
> > > NUMA_NO_NODE);
> > >  		if (!skb)
> > >  			goto skb_fail;
> > > @@ -517,6 +520,9 @@ struct sk_buff *__napi_alloc_skb(struct
> > > napi_struct *napi, unsigned int len,
> > >  	if (len <= SKB_WITH_OVERHEAD(1024) ||
> > >  	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
> > >  	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
> > > +		if (unlikely(len > KMALLOC_MAX_SIZE))
> > > +			return NULL;
> > > +
> > >  		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX,
> > > NUMA_NO_NODE);
> > >  		if (!skb)
> > >  			goto skb_fail;
> > > --
> > > 2.25.1
> >
> > Thanks,
> > Al
> >
>
> With regards,
> Pavel Skripkin

Thanks,
Al

Pavel Skripkin Feb. 28, 2021, 8:27 p.m. UTC | #6

> From: Pavel Skripkin <paskripkin@gmail.com>
> Date: Sun, 28 Feb 2021 22:28:13 +0300
> 
> > Hi, thanks for reply!
> > 
> > > From: Pavel Skripkin <paskripkin@gmail.com>
> > > Date: Sat, 27 Feb 2021 20:51:14 +0300
> > > 
> > > Hi,
> > > 
> > > > syzbot found WARNING in __alloc_pages_nodemask()[1] when order
> > > > >=
> > > > MAX_ORDER.
> > > > It was caused by __netdev_alloc_skb(), which doesn't check len
> > > > value after adding NET_SKB_PAD.
> > > > Order will be >= MAX_ORDER and passed to
> > > > __alloc_pages_nodemask()
> > > > if size > KMALLOC_MAX_SIZE.
> > > > Same happens in __napi_alloc_skb.
> > > > 
> > > > static void *kmalloc_large_node(size_t size, gfp_t flags, int
> > > > node)
> > > > {
> > > > 	struct page *page;
> > > > 	void *ptr = NULL;
> > > > 	unsigned int order = get_order(size);
> > > > ...
> > > > 	page = alloc_pages_node(node, flags, order);
> > > > ...
> > > > 
> > > > [1] WARNING in __alloc_pages_nodemask+0x5f8/0x730
> > > > mm/page_alloc.c:5014
> > > > Call Trace:
> > > >  __alloc_pages include/linux/gfp.h:511 [inline]
> > > >  __alloc_pages_node include/linux/gfp.h:524 [inline]
> > > >  alloc_pages_node include/linux/gfp.h:538 [inline]
> > > >  kmalloc_large_node+0x60/0x110 mm/slub.c:3999
> > > >  __kmalloc_node_track_caller+0x319/0x3f0 mm/slub.c:4496
> > > >  __kmalloc_reserve net/core/skbuff.c:150 [inline]
> > > >  __alloc_skb+0x4e4/0x5a0 net/core/skbuff.c:210
> > > >  __netdev_alloc_skb+0x70/0x400 net/core/skbuff.c:446
> > > >  netdev_alloc_skb include/linux/skbuff.h:2832 [inline]
> > > >  qrtr_endpoint_post+0x84/0x11b0 net/qrtr/qrtr.c:442
> > > >  qrtr_tun_write_iter+0x11f/0x1a0 net/qrtr/tun.c:98
> > > >  call_write_iter include/linux/fs.h:1901 [inline]
> > > >  new_sync_write+0x426/0x650 fs/read_write.c:518
> > > >  vfs_write+0x791/0xa30 fs/read_write.c:605
> > > >  ksys_write+0x12d/0x250 fs/read_write.c:658
> > > >  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
> > > >  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > 
> > > Ah, by the way. Have you tried to seek for the root cause, why
> > > a request for such insanely large (at least 4 Mib) skb happens
> > > in QRTR? I don't believe it's intended to be like this.
> > > Now I feel that silencing this error with early return isn't
> > > really correct approach for this.
> > 
> > Yeah, i figured it out. Syzbot provides reproducer for thig bug:
> > 
> > void loop(void)
> > {
> >   intptr_t res = 0;
> >   memcpy((void*)0x20000000, "/dev/qrtr-tun\000", 14);
> >   res = syscall(__NR_openat, 0xffffffffffffff9cul, 0x20000000ul,
> > 1ul,
> > 0);
> >   if (res != -1)
> >     r[0] = res;
> >   memcpy((void*)0x20000040, "\x02", 1);
> >   syscall(__NR_write, r[0], 0x20000040ul, 0x400000ul);
> > }
> > 
> > So, simply it writes to /dev/qrtr-tun 0x400000 bytes.
> > In qrtr_tun_write_iter there is a check, that the length is smaller
> > than KMALLOC_MAX_VSIZE. Then the length is passed to
> > __netdev_alloc_skb, where it becomes more than KMALLOC_MAX_VSIZE.
> 
> I've checked the logics in qrtr_tun_write_iter(). Seems like it's
> only trying to prevent kzallocs of sizes larger than the maximum
> and doesn't care about netdev_alloc_skb() at all, as it ignores
> the fact that, besides NET_SKB_PAD and NET_IP_ALIGN, skbs always
> have SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) on top of
> the "usable" size.
> 
> On the other hand, skb memory overheads, kmalloc bounds etc. are
> an internal thing and all related corner cases should be handled
> inside the implementations, not the users. From this point, even
> this check for (len < KMALLOC_MAX_SIZE) is a bit bogus.
> I think in that particular case with the size coming from userspace
> (i.e. untrusted source), the allocations (both kzalloc() and
> __netdev_alloc_skb()) should be performed with __GFP_NOWARN, so
> insane values won't provoke any splats.
> 
> So maybe use it as a fix for this particular warning (seems like
> it's the sole place in the entire kernel that can potentially
> request such skb allocations) and don't add any branches into
> hot *alloc_skb() at all?

Well, it seems like it's better solution for this
specific warning. Thanks for quick feedback, I'll send You new patch
version soon.

> We might add a cap for max skb length later, as Jakub pointed.
> 
> > > > Reported-by: 
> > > > syzbot+80dccaee7c6630fa9dcf@syzkaller.appspotmail.com
> > > > Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
> > > > 
> > > > ---
> > > > Changes from v3:
> > > > * Removed Change-Id and extra tabs in net/core/skbuff.c
> > > > 
> > > > Changes from v2:
> > > > * Added length check to __napi_alloc_skb
> > > > * Added unlikely() in checks
> > > > 
> > > > Change from v1:
> > > > * Added length check to __netdev_alloc_skb
> > > > ---
> > > >  net/core/skbuff.c | 6 ++++++
> > > >  1 file changed, 6 insertions(+)
> > > > 
> > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > > index 785daff48030..ec7ba8728b61 100644
> > > > --- a/net/core/skbuff.c
> > > > +++ b/net/core/skbuff.c
> > > > @@ -443,6 +443,9 @@ struct sk_buff *__netdev_alloc_skb(struct
> > > > net_device *dev, unsigned int len,
> > > >  	if (len <= SKB_WITH_OVERHEAD(1024) ||
> > > >  	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
> > > >  	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
> > > > +		if (unlikely(len > KMALLOC_MAX_SIZE))
> > > > +			return NULL;
> > > > +
> > > >  		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX,
> > > > NUMA_NO_NODE);
> > > >  		if (!skb)
> > > >  			goto skb_fail;
> > > > @@ -517,6 +520,9 @@ struct sk_buff *__napi_alloc_skb(struct
> > > > napi_struct *napi, unsigned int len,
> > > >  	if (len <= SKB_WITH_OVERHEAD(1024) ||
> > > >  	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
> > > >  	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
> > > > +		if (unlikely(len > KMALLOC_MAX_SIZE))
> > > > +			return NULL;
> > > > +
> > > >  		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX,
> > > > NUMA_NO_NODE);
> > > >  		if (!skb)
> > > >  			goto skb_fail;
> > > > --
> > > > 2.25.1
> > > 
> > > Thanks,
> > > Al
> > > 
> > 
> > With regards,
> > Pavel Skripkin
> 
> Thanks,
> Al
> 
With regards,
Pavel Skripkin

[v3] net/core/skbuff: fix passing wrong size to __alloc_skb

Checks

Commit Message

Comments

Patch