Message ID | 20220713074823.5679-1-jgross@suse.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | xen/netback: handle empty rx queue in xenvif_rx_next_skb() | expand |
On 13.07.2022 09:48, Juergen Gross wrote: > xenvif_rx_next_skb() is expecting the rx queue not being empty, but > in case the loop in xenvif_rx_action() is doing multiple iterations, > the availability of another skb in the rx queue is not being checked. > > This can lead to crashes: > > [40072.537261] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080 > [40072.537407] IP: xenvif_rx_skb+0x23/0x590 [xen_netback] > [40072.537534] PGD 0 P4D 0 > [40072.537644] Oops: 0000 [#1] SMP NOPTI > [40072.537749] CPU: 0 PID: 12505 Comm: v1-c40247-q2-gu Not tainted 4.12.14-122.121-default #1 SLE12-SP5 > [40072.537867] Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17 11/23/2021 > [40072.537999] task: ffff880433b38100 task.stack: ffffc90043d40000 > [40072.538112] RIP: e030:xenvif_rx_skb+0x23/0x590 [xen_netback] > [40072.538217] RSP: e02b:ffffc90043d43de0 EFLAGS: 00010246 > [40072.538319] RAX: 0000000000000000 RBX: ffffc90043cd7cd0 RCX: 00000000000000f7 > [40072.538430] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffffc90043d43df8 > [40072.538531] RBP: 000000000000003f R08: 000077ff80000000 R09: 0000000000000008 > [40072.538644] R10: 0000000000007ff0 R11: 00000000000008f6 R12: ffffc90043ce2708 > [40072.538745] R13: 0000000000000000 R14: ffffc90043d43ed0 R15: ffff88043ea748c0 > [40072.538861] FS: 0000000000000000(0000) GS:ffff880484600000(0000) knlGS:0000000000000000 > [40072.538988] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 > [40072.539088] CR2: 0000000000000080 CR3: 0000000407ac8000 CR4: 0000000000040660 > [40072.539211] Call Trace: > [40072.539319] xenvif_rx_action+0x71/0x90 [xen_netback] > [40072.539429] xenvif_kthread_guest_rx+0x14a/0x29c [xen_netback] > > Fix that by stopping the loop in case the rx queue becomes empty. > > Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Does this want a Fixes: tag and Cc: to stable@ (not the least since as per above the issue was noticed with 4.12.x)? > --- a/drivers/net/xen-netback/rx.c > +++ b/drivers/net/xen-netback/rx.c > @@ -495,6 +495,7 @@ void xenvif_rx_action(struct xenvif_queue *queue) > queue->rx_copy.completed = &completed_skbs; > > while (xenvif_rx_ring_slots_available(queue) && > + !skb_queue_empty(&queue->rx_queue) && > work_done < RX_BATCH_SIZE) { > xenvif_rx_skb(queue); > work_done++; I have to admit that I find the title a little misleading - you don't deal with the issue _in_ xenvif_rx_next_skb(); you instead avoid entering the function in such a case. Jan
On 13/07/2022 09:48, Juergen Gross wrote: > xenvif_rx_next_skb() is expecting the rx queue not being empty, but > in case the loop in xenvif_rx_action() is doing multiple iterations, > the availability of another skb in the rx queue is not being checked. > > This can lead to crashes: > > [40072.537261] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080 > [40072.537407] IP: xenvif_rx_skb+0x23/0x590 [xen_netback] > [40072.537534] PGD 0 P4D 0 > [40072.537644] Oops: 0000 [#1] SMP NOPTI > [40072.537749] CPU: 0 PID: 12505 Comm: v1-c40247-q2-gu Not tainted 4.12.14-122.121-default #1 SLE12-SP5 > [40072.537867] Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17 11/23/2021 > [40072.537999] task: ffff880433b38100 task.stack: ffffc90043d40000 > [40072.538112] RIP: e030:xenvif_rx_skb+0x23/0x590 [xen_netback] > [40072.538217] RSP: e02b:ffffc90043d43de0 EFLAGS: 00010246 > [40072.538319] RAX: 0000000000000000 RBX: ffffc90043cd7cd0 RCX: 00000000000000f7 > [40072.538430] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffffc90043d43df8 > [40072.538531] RBP: 000000000000003f R08: 000077ff80000000 R09: 0000000000000008 > [40072.538644] R10: 0000000000007ff0 R11: 00000000000008f6 R12: ffffc90043ce2708 > [40072.538745] R13: 0000000000000000 R14: ffffc90043d43ed0 R15: ffff88043ea748c0 > [40072.538861] FS: 0000000000000000(0000) GS:ffff880484600000(0000) knlGS:0000000000000000 > [40072.538988] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 > [40072.539088] CR2: 0000000000000080 CR3: 0000000407ac8000 CR4: 0000000000040660 > [40072.539211] Call Trace: > [40072.539319] xenvif_rx_action+0x71/0x90 [xen_netback] > [40072.539429] xenvif_kthread_guest_rx+0x14a/0x29c [xen_netback] > > Fix that by stopping the loop in case the rx queue becomes empty. > > Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Paul Durrant <paul@xen.org>
On 13.07.22 09:59, Jan Beulich wrote: > On 13.07.2022 09:48, Juergen Gross wrote: >> xenvif_rx_next_skb() is expecting the rx queue not being empty, but >> in case the loop in xenvif_rx_action() is doing multiple iterations, >> the availability of another skb in the rx queue is not being checked. >> >> This can lead to crashes: >> >> [40072.537261] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080 >> [40072.537407] IP: xenvif_rx_skb+0x23/0x590 [xen_netback] >> [40072.537534] PGD 0 P4D 0 >> [40072.537644] Oops: 0000 [#1] SMP NOPTI >> [40072.537749] CPU: 0 PID: 12505 Comm: v1-c40247-q2-gu Not tainted 4.12.14-122.121-default #1 SLE12-SP5 >> [40072.537867] Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17 11/23/2021 >> [40072.537999] task: ffff880433b38100 task.stack: ffffc90043d40000 >> [40072.538112] RIP: e030:xenvif_rx_skb+0x23/0x590 [xen_netback] >> [40072.538217] RSP: e02b:ffffc90043d43de0 EFLAGS: 00010246 >> [40072.538319] RAX: 0000000000000000 RBX: ffffc90043cd7cd0 RCX: 00000000000000f7 >> [40072.538430] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffffc90043d43df8 >> [40072.538531] RBP: 000000000000003f R08: 000077ff80000000 R09: 0000000000000008 >> [40072.538644] R10: 0000000000007ff0 R11: 00000000000008f6 R12: ffffc90043ce2708 >> [40072.538745] R13: 0000000000000000 R14: ffffc90043d43ed0 R15: ffff88043ea748c0 >> [40072.538861] FS: 0000000000000000(0000) GS:ffff880484600000(0000) knlGS:0000000000000000 >> [40072.538988] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [40072.539088] CR2: 0000000000000080 CR3: 0000000407ac8000 CR4: 0000000000040660 >> [40072.539211] Call Trace: >> [40072.539319] xenvif_rx_action+0x71/0x90 [xen_netback] >> [40072.539429] xenvif_kthread_guest_rx+0x14a/0x29c [xen_netback] >> >> Fix that by stopping the loop in case the rx queue becomes empty. >> >> Signed-off-by: Juergen Gross <jgross@suse.com> > > Reviewed-by: Jan Beulich <jbeulich@suse.com> > > Does this want a Fixes: tag and Cc: to stable@ (not the least since as per > above the issue was noticed with 4.12.x)? Hmm, I _think_ the issue was introduced with eb1723a29b9a. Do you agree? > >> --- a/drivers/net/xen-netback/rx.c >> +++ b/drivers/net/xen-netback/rx.c >> @@ -495,6 +495,7 @@ void xenvif_rx_action(struct xenvif_queue *queue) >> queue->rx_copy.completed = &completed_skbs; >> >> while (xenvif_rx_ring_slots_available(queue) && >> + !skb_queue_empty(&queue->rx_queue) && >> work_done < RX_BATCH_SIZE) { >> xenvif_rx_skb(queue); >> work_done++; > > I have to admit that I find the title a little misleading - you don't > deal with the issue _in_ xenvif_rx_next_skb(); you instead avoid > entering the function in such a case. I'm handling the issue to avoid "an empty rx queue in xenvif_rx_next_skb()". I can rephrase it to "avoid entering xenvif_rx_next_skb() with an empty rx queue". Juergen
On 13.07.2022 11:31, Juergen Gross wrote: > On 13.07.22 09:59, Jan Beulich wrote: >> On 13.07.2022 09:48, Juergen Gross wrote: >>> xenvif_rx_next_skb() is expecting the rx queue not being empty, but >>> in case the loop in xenvif_rx_action() is doing multiple iterations, >>> the availability of another skb in the rx queue is not being checked. >>> >>> This can lead to crashes: >>> >>> [40072.537261] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080 >>> [40072.537407] IP: xenvif_rx_skb+0x23/0x590 [xen_netback] >>> [40072.537534] PGD 0 P4D 0 >>> [40072.537644] Oops: 0000 [#1] SMP NOPTI >>> [40072.537749] CPU: 0 PID: 12505 Comm: v1-c40247-q2-gu Not tainted 4.12.14-122.121-default #1 SLE12-SP5 >>> [40072.537867] Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17 11/23/2021 >>> [40072.537999] task: ffff880433b38100 task.stack: ffffc90043d40000 >>> [40072.538112] RIP: e030:xenvif_rx_skb+0x23/0x590 [xen_netback] >>> [40072.538217] RSP: e02b:ffffc90043d43de0 EFLAGS: 00010246 >>> [40072.538319] RAX: 0000000000000000 RBX: ffffc90043cd7cd0 RCX: 00000000000000f7 >>> [40072.538430] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffffc90043d43df8 >>> [40072.538531] RBP: 000000000000003f R08: 000077ff80000000 R09: 0000000000000008 >>> [40072.538644] R10: 0000000000007ff0 R11: 00000000000008f6 R12: ffffc90043ce2708 >>> [40072.538745] R13: 0000000000000000 R14: ffffc90043d43ed0 R15: ffff88043ea748c0 >>> [40072.538861] FS: 0000000000000000(0000) GS:ffff880484600000(0000) knlGS:0000000000000000 >>> [40072.538988] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> [40072.539088] CR2: 0000000000000080 CR3: 0000000407ac8000 CR4: 0000000000040660 >>> [40072.539211] Call Trace: >>> [40072.539319] xenvif_rx_action+0x71/0x90 [xen_netback] >>> [40072.539429] xenvif_kthread_guest_rx+0x14a/0x29c [xen_netback] >>> >>> Fix that by stopping the loop in case the rx queue becomes empty. >>> >>> Signed-off-by: Juergen Gross <jgross@suse.com> >> >> Reviewed-by: Jan Beulich <jbeulich@suse.com> >> >> Does this want a Fixes: tag and Cc: to stable@ (not the least since as per >> above the issue was noticed with 4.12.x)? > > Hmm, I _think_ the issue was introduced with eb1723a29b9a. Do you agree? If it was that, then something must have invoked xenvif_rx_action() without there actually being work. I'd rather see 98f6d57ced73 ("xen-netback: process guest rx packets in batches") as the origin. >>> --- a/drivers/net/xen-netback/rx.c >>> +++ b/drivers/net/xen-netback/rx.c >>> @@ -495,6 +495,7 @@ void xenvif_rx_action(struct xenvif_queue *queue) >>> queue->rx_copy.completed = &completed_skbs; >>> >>> while (xenvif_rx_ring_slots_available(queue) && >>> + !skb_queue_empty(&queue->rx_queue) && >>> work_done < RX_BATCH_SIZE) { >>> xenvif_rx_skb(queue); >>> work_done++; >> >> I have to admit that I find the title a little misleading - you don't >> deal with the issue _in_ xenvif_rx_next_skb(); you instead avoid >> entering the function in such a case. > > I'm handling the issue to avoid "an empty rx queue in xenvif_rx_next_skb()". > > I can rephrase it to "avoid entering xenvif_rx_next_skb() with an empty rx > queue". That or simply s/handle/avoid/ in the original title. Jan
diff --git a/drivers/net/xen-netback/rx.c b/drivers/net/xen-netback/rx.c index dbac4c03d21a..a0335407be42 100644 --- a/drivers/net/xen-netback/rx.c +++ b/drivers/net/xen-netback/rx.c @@ -495,6 +495,7 @@ void xenvif_rx_action(struct xenvif_queue *queue) queue->rx_copy.completed = &completed_skbs; while (xenvif_rx_ring_slots_available(queue) && + !skb_queue_empty(&queue->rx_queue) && work_done < RX_BATCH_SIZE) { xenvif_rx_skb(queue); work_done++;
xenvif_rx_next_skb() is expecting the rx queue not being empty, but in case the loop in xenvif_rx_action() is doing multiple iterations, the availability of another skb in the rx queue is not being checked. This can lead to crashes: [40072.537261] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080 [40072.537407] IP: xenvif_rx_skb+0x23/0x590 [xen_netback] [40072.537534] PGD 0 P4D 0 [40072.537644] Oops: 0000 [#1] SMP NOPTI [40072.537749] CPU: 0 PID: 12505 Comm: v1-c40247-q2-gu Not tainted 4.12.14-122.121-default #1 SLE12-SP5 [40072.537867] Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17 11/23/2021 [40072.537999] task: ffff880433b38100 task.stack: ffffc90043d40000 [40072.538112] RIP: e030:xenvif_rx_skb+0x23/0x590 [xen_netback] [40072.538217] RSP: e02b:ffffc90043d43de0 EFLAGS: 00010246 [40072.538319] RAX: 0000000000000000 RBX: ffffc90043cd7cd0 RCX: 00000000000000f7 [40072.538430] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffffc90043d43df8 [40072.538531] RBP: 000000000000003f R08: 000077ff80000000 R09: 0000000000000008 [40072.538644] R10: 0000000000007ff0 R11: 00000000000008f6 R12: ffffc90043ce2708 [40072.538745] R13: 0000000000000000 R14: ffffc90043d43ed0 R15: ffff88043ea748c0 [40072.538861] FS: 0000000000000000(0000) GS:ffff880484600000(0000) knlGS:0000000000000000 [40072.538988] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 [40072.539088] CR2: 0000000000000080 CR3: 0000000407ac8000 CR4: 0000000000040660 [40072.539211] Call Trace: [40072.539319] xenvif_rx_action+0x71/0x90 [xen_netback] [40072.539429] xenvif_kthread_guest_rx+0x14a/0x29c [xen_netback] Fix that by stopping the loop in case the rx queue becomes empty. Signed-off-by: Juergen Gross <jgross@suse.com> --- drivers/net/xen-netback/rx.c | 1 + 1 file changed, 1 insertion(+)