xen/netback: handle empty rx queue in xenvif_rx_next_skb()

Message ID	20220713074823.5679-1-jgross@suse.com (mailing list archive)
State	Changes Requested
Delegated to:	Netdev Maintainers
Headers	show Return-Path: <netdev-owner@kernel.org> From: Juergen Gross <jgross@suse.com> To: xen-devel@lists.xenproject.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Juergen Gross <jgross@suse.com>, Wei Liu <wei.liu@kernel.org>, Paul Durrant <paul@xen.org>, "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com> Subject: [PATCH] xen/netback: handle empty rx queue in xenvif_rx_next_skb() Date: Wed, 13 Jul 2022 09:48:23 +0200 Message-Id: <20220713074823.5679-1-jgross@suse.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	xen/netback: handle empty rx queue in xenvif_rx_next_skb() \| expand xen/netback: handle empty rx queue in xenvif_rx_next_skb()

Context	Check	Description
netdev/tree_selection	success	Guessed tree name to be net-next
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/subject_prefix	warning	Target tree name not specified in the subject
netdev/cover_letter	success	Single patches do not need cover letters
netdev/patch_count	success	Link
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers	success	CCed 8 of 8 maintainers
netdev/build_clang	success	Errors and warnings before: 0 this patch: 0
netdev/module_param	success	Was 0 now: 0
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	success	Errors and warnings before: 0 this patch: 0
netdev/checkpatch	success	total: 0 errors, 0 warnings, 0 checks, 7 lines checked
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0

Jürgen Groß July 13, 2022, 7:48 a.m. UTC

xenvif_rx_next_skb() is expecting the rx queue not being empty, but
in case the loop in xenvif_rx_action() is doing multiple iterations,
the availability of another skb in the rx queue is not being checked.

This can lead to crashes:

[40072.537261] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
[40072.537407] IP: xenvif_rx_skb+0x23/0x590 [xen_netback]
[40072.537534] PGD 0 P4D 0
[40072.537644] Oops: 0000 [#1] SMP NOPTI
[40072.537749] CPU: 0 PID: 12505 Comm: v1-c40247-q2-gu Not tainted 4.12.14-122.121-default #1 SLE12-SP5
[40072.537867] Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17 11/23/2021
[40072.537999] task: ffff880433b38100 task.stack: ffffc90043d40000
[40072.538112] RIP: e030:xenvif_rx_skb+0x23/0x590 [xen_netback]
[40072.538217] RSP: e02b:ffffc90043d43de0 EFLAGS: 00010246
[40072.538319] RAX: 0000000000000000 RBX: ffffc90043cd7cd0 RCX: 00000000000000f7
[40072.538430] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffffc90043d43df8
[40072.538531] RBP: 000000000000003f R08: 000077ff80000000 R09: 0000000000000008
[40072.538644] R10: 0000000000007ff0 R11: 00000000000008f6 R12: ffffc90043ce2708
[40072.538745] R13: 0000000000000000 R14: ffffc90043d43ed0 R15: ffff88043ea748c0
[40072.538861] FS: 0000000000000000(0000) GS:ffff880484600000(0000) knlGS:0000000000000000
[40072.538988] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[40072.539088] CR2: 0000000000000080 CR3: 0000000407ac8000 CR4: 0000000000040660
[40072.539211] Call Trace:
[40072.539319] xenvif_rx_action+0x71/0x90 [xen_netback]
[40072.539429] xenvif_kthread_guest_rx+0x14a/0x29c [xen_netback]

Fix that by stopping the loop in case the rx queue becomes empty.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/net/xen-netback/rx.c | 1 +
 1 file changed, 1 insertion(+)

Jan Beulich July 13, 2022, 7:59 a.m. UTC | #1

On 13.07.2022 09:48, Juergen Gross wrote:
> xenvif_rx_next_skb() is expecting the rx queue not being empty, but
> in case the loop in xenvif_rx_action() is doing multiple iterations,
> the availability of another skb in the rx queue is not being checked.
> 
> This can lead to crashes:
> 
> [40072.537261] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
> [40072.537407] IP: xenvif_rx_skb+0x23/0x590 [xen_netback]
> [40072.537534] PGD 0 P4D 0
> [40072.537644] Oops: 0000 [#1] SMP NOPTI
> [40072.537749] CPU: 0 PID: 12505 Comm: v1-c40247-q2-gu Not tainted 4.12.14-122.121-default #1 SLE12-SP5
> [40072.537867] Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17 11/23/2021
> [40072.537999] task: ffff880433b38100 task.stack: ffffc90043d40000
> [40072.538112] RIP: e030:xenvif_rx_skb+0x23/0x590 [xen_netback]
> [40072.538217] RSP: e02b:ffffc90043d43de0 EFLAGS: 00010246
> [40072.538319] RAX: 0000000000000000 RBX: ffffc90043cd7cd0 RCX: 00000000000000f7
> [40072.538430] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffffc90043d43df8
> [40072.538531] RBP: 000000000000003f R08: 000077ff80000000 R09: 0000000000000008
> [40072.538644] R10: 0000000000007ff0 R11: 00000000000008f6 R12: ffffc90043ce2708
> [40072.538745] R13: 0000000000000000 R14: ffffc90043d43ed0 R15: ffff88043ea748c0
> [40072.538861] FS: 0000000000000000(0000) GS:ffff880484600000(0000) knlGS:0000000000000000
> [40072.538988] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [40072.539088] CR2: 0000000000000080 CR3: 0000000407ac8000 CR4: 0000000000040660
> [40072.539211] Call Trace:
> [40072.539319] xenvif_rx_action+0x71/0x90 [xen_netback]
> [40072.539429] xenvif_kthread_guest_rx+0x14a/0x29c [xen_netback]
> 
> Fix that by stopping the loop in case the rx queue becomes empty.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

Does this want a Fixes: tag and Cc: to stable@ (not the least since as per
above the issue was noticed with 4.12.x)?

> --- a/drivers/net/xen-netback/rx.c
> +++ b/drivers/net/xen-netback/rx.c
> @@ -495,6 +495,7 @@ void xenvif_rx_action(struct xenvif_queue *queue)
>  	queue->rx_copy.completed = &completed_skbs;
>  
>  	while (xenvif_rx_ring_slots_available(queue) &&
> +	       !skb_queue_empty(&queue->rx_queue) &&
>  	       work_done < RX_BATCH_SIZE) {
>  		xenvif_rx_skb(queue);
>  		work_done++;

I have to admit that I find the title a little misleading - you don't
deal with the issue _in_ xenvif_rx_next_skb(); you instead avoid
entering the function in such a case.

Jan

Paul Durrant July 13, 2022, 8:12 a.m. UTC | #2

On 13/07/2022 09:48, Juergen Gross wrote:
> xenvif_rx_next_skb() is expecting the rx queue not being empty, but
> in case the loop in xenvif_rx_action() is doing multiple iterations,
> the availability of another skb in the rx queue is not being checked.
> 
> This can lead to crashes:
> 
> [40072.537261] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
> [40072.537407] IP: xenvif_rx_skb+0x23/0x590 [xen_netback]
> [40072.537534] PGD 0 P4D 0
> [40072.537644] Oops: 0000 [#1] SMP NOPTI
> [40072.537749] CPU: 0 PID: 12505 Comm: v1-c40247-q2-gu Not tainted 4.12.14-122.121-default #1 SLE12-SP5
> [40072.537867] Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17 11/23/2021
> [40072.537999] task: ffff880433b38100 task.stack: ffffc90043d40000
> [40072.538112] RIP: e030:xenvif_rx_skb+0x23/0x590 [xen_netback]
> [40072.538217] RSP: e02b:ffffc90043d43de0 EFLAGS: 00010246
> [40072.538319] RAX: 0000000000000000 RBX: ffffc90043cd7cd0 RCX: 00000000000000f7
> [40072.538430] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffffc90043d43df8
> [40072.538531] RBP: 000000000000003f R08: 000077ff80000000 R09: 0000000000000008
> [40072.538644] R10: 0000000000007ff0 R11: 00000000000008f6 R12: ffffc90043ce2708
> [40072.538745] R13: 0000000000000000 R14: ffffc90043d43ed0 R15: ffff88043ea748c0
> [40072.538861] FS: 0000000000000000(0000) GS:ffff880484600000(0000) knlGS:0000000000000000
> [40072.538988] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [40072.539088] CR2: 0000000000000080 CR3: 0000000407ac8000 CR4: 0000000000040660
> [40072.539211] Call Trace:
> [40072.539319] xenvif_rx_action+0x71/0x90 [xen_netback]
> [40072.539429] xenvif_kthread_guest_rx+0x14a/0x29c [xen_netback]
> 
> Fix that by stopping the loop in case the rx queue becomes empty.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

Reviewed-by: Paul Durrant <paul@xen.org>

Jürgen Groß July 13, 2022, 9:31 a.m. UTC | #3

On 13.07.22 09:59, Jan Beulich wrote:
> On 13.07.2022 09:48, Juergen Gross wrote:
>> xenvif_rx_next_skb() is expecting the rx queue not being empty, but
>> in case the loop in xenvif_rx_action() is doing multiple iterations,
>> the availability of another skb in the rx queue is not being checked.
>>
>> This can lead to crashes:
>>
>> [40072.537261] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
>> [40072.537407] IP: xenvif_rx_skb+0x23/0x590 [xen_netback]
>> [40072.537534] PGD 0 P4D 0
>> [40072.537644] Oops: 0000 [#1] SMP NOPTI
>> [40072.537749] CPU: 0 PID: 12505 Comm: v1-c40247-q2-gu Not tainted 4.12.14-122.121-default #1 SLE12-SP5
>> [40072.537867] Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17 11/23/2021
>> [40072.537999] task: ffff880433b38100 task.stack: ffffc90043d40000
>> [40072.538112] RIP: e030:xenvif_rx_skb+0x23/0x590 [xen_netback]
>> [40072.538217] RSP: e02b:ffffc90043d43de0 EFLAGS: 00010246
>> [40072.538319] RAX: 0000000000000000 RBX: ffffc90043cd7cd0 RCX: 00000000000000f7
>> [40072.538430] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffffc90043d43df8
>> [40072.538531] RBP: 000000000000003f R08: 000077ff80000000 R09: 0000000000000008
>> [40072.538644] R10: 0000000000007ff0 R11: 00000000000008f6 R12: ffffc90043ce2708
>> [40072.538745] R13: 0000000000000000 R14: ffffc90043d43ed0 R15: ffff88043ea748c0
>> [40072.538861] FS: 0000000000000000(0000) GS:ffff880484600000(0000) knlGS:0000000000000000
>> [40072.538988] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [40072.539088] CR2: 0000000000000080 CR3: 0000000407ac8000 CR4: 0000000000040660
>> [40072.539211] Call Trace:
>> [40072.539319] xenvif_rx_action+0x71/0x90 [xen_netback]
>> [40072.539429] xenvif_kthread_guest_rx+0x14a/0x29c [xen_netback]
>>
>> Fix that by stopping the loop in case the rx queue becomes empty.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> Does this want a Fixes: tag and Cc: to stable@ (not the least since as per
> above the issue was noticed with 4.12.x)?

Hmm, I _think_ the issue was introduced with eb1723a29b9a. Do you agree?

> 
>> --- a/drivers/net/xen-netback/rx.c
>> +++ b/drivers/net/xen-netback/rx.c
>> @@ -495,6 +495,7 @@ void xenvif_rx_action(struct xenvif_queue *queue)
>>   	queue->rx_copy.completed = &completed_skbs;
>>   
>>   	while (xenvif_rx_ring_slots_available(queue) &&
>> +	       !skb_queue_empty(&queue->rx_queue) &&
>>   	       work_done < RX_BATCH_SIZE) {
>>   		xenvif_rx_skb(queue);
>>   		work_done++;
> 
> I have to admit that I find the title a little misleading - you don't
> deal with the issue _in_ xenvif_rx_next_skb(); you instead avoid
> entering the function in such a case.

I'm handling the issue to avoid "an empty rx queue in xenvif_rx_next_skb()".

I can rephrase it to "avoid entering xenvif_rx_next_skb() with an empty rx
queue".


Juergen

Jan Beulich July 13, 2022, 9:48 a.m. UTC | #4

On 13.07.2022 11:31, Juergen Gross wrote:
> On 13.07.22 09:59, Jan Beulich wrote:
>> On 13.07.2022 09:48, Juergen Gross wrote:
>>> xenvif_rx_next_skb() is expecting the rx queue not being empty, but
>>> in case the loop in xenvif_rx_action() is doing multiple iterations,
>>> the availability of another skb in the rx queue is not being checked.
>>>
>>> This can lead to crashes:
>>>
>>> [40072.537261] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
>>> [40072.537407] IP: xenvif_rx_skb+0x23/0x590 [xen_netback]
>>> [40072.537534] PGD 0 P4D 0
>>> [40072.537644] Oops: 0000 [#1] SMP NOPTI
>>> [40072.537749] CPU: 0 PID: 12505 Comm: v1-c40247-q2-gu Not tainted 4.12.14-122.121-default #1 SLE12-SP5
>>> [40072.537867] Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17 11/23/2021
>>> [40072.537999] task: ffff880433b38100 task.stack: ffffc90043d40000
>>> [40072.538112] RIP: e030:xenvif_rx_skb+0x23/0x590 [xen_netback]
>>> [40072.538217] RSP: e02b:ffffc90043d43de0 EFLAGS: 00010246
>>> [40072.538319] RAX: 0000000000000000 RBX: ffffc90043cd7cd0 RCX: 00000000000000f7
>>> [40072.538430] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffffc90043d43df8
>>> [40072.538531] RBP: 000000000000003f R08: 000077ff80000000 R09: 0000000000000008
>>> [40072.538644] R10: 0000000000007ff0 R11: 00000000000008f6 R12: ffffc90043ce2708
>>> [40072.538745] R13: 0000000000000000 R14: ffffc90043d43ed0 R15: ffff88043ea748c0
>>> [40072.538861] FS: 0000000000000000(0000) GS:ffff880484600000(0000) knlGS:0000000000000000
>>> [40072.538988] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [40072.539088] CR2: 0000000000000080 CR3: 0000000407ac8000 CR4: 0000000000040660
>>> [40072.539211] Call Trace:
>>> [40072.539319] xenvif_rx_action+0x71/0x90 [xen_netback]
>>> [40072.539429] xenvif_kthread_guest_rx+0x14a/0x29c [xen_netback]
>>>
>>> Fix that by stopping the loop in case the rx queue becomes empty.
>>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>
>> Does this want a Fixes: tag and Cc: to stable@ (not the least since as per
>> above the issue was noticed with 4.12.x)?
> 
> Hmm, I _think_ the issue was introduced with eb1723a29b9a. Do you agree?

If it was that, then something must have invoked xenvif_rx_action()
without there actually being work. I'd rather see 98f6d57ced73
("xen-netback: process guest rx packets in batches") as the origin.

>>> --- a/drivers/net/xen-netback/rx.c
>>> +++ b/drivers/net/xen-netback/rx.c
>>> @@ -495,6 +495,7 @@ void xenvif_rx_action(struct xenvif_queue *queue)
>>>   	queue->rx_copy.completed = &completed_skbs;
>>>   
>>>   	while (xenvif_rx_ring_slots_available(queue) &&
>>> +	       !skb_queue_empty(&queue->rx_queue) &&
>>>   	       work_done < RX_BATCH_SIZE) {
>>>   		xenvif_rx_skb(queue);
>>>   		work_done++;
>>
>> I have to admit that I find the title a little misleading - you don't
>> deal with the issue _in_ xenvif_rx_next_skb(); you instead avoid
>> entering the function in such a case.
> 
> I'm handling the issue to avoid "an empty rx queue in xenvif_rx_next_skb()".
> 
> I can rephrase it to "avoid entering xenvif_rx_next_skb() with an empty rx
> queue".

That or simply s/handle/avoid/ in the original title.

Jan

xen/netback: handle empty rx queue in xenvif_rx_next_skb()

Checks

Commit Message

Comments

Patch