Message ID | 20240704074153.1508825-1-ap420073@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net-next] bnxt_en: fix kernel panic in queue api functions | expand |
On Thu, Jul 4, 2024 at 1:12 PM Taehee Yoo <ap420073@gmail.com> wrote: > > bnxt_queue_{mem_alloc,start,stop} access bp->rx_ring array and this is > initialized while an interface is being up. > The rings are initialized as a number of channels. > > The queue API functions access rx_ring without checking both null and > ring size. > So, if the queue API functions are called when interface status is down, > they access an uninitialized rx_ring array. > Also if the queue index parameter value is larger than a ring, it > would also access an uninitialized rx_ring. You might want to rephrase this as 'if the queue index parameter is greater than the no: of rings created' > > BUG: kernel NULL pointer dereference, address: 0000000000000000 > #PF: supervisor read access in kernel mode > #PF: error_code(0x0000) - not-present page > PGD 0 P4D 0 > Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI > CPU: 1 PID: 1697 Comm: ncdevmem Not tainted 6.10.0-rc5+ #34 > RIP: 0010:bnxt_queue_mem_alloc+0x38/0x410 [bnxt_en] > Code: 49 89 f5 41 54 4d 89 c4 4d 69 c0 c0 05 00 00 55 48 8d af 40 0a 00 00 53 48 89 fb 48 83 ec 05 > RSP: 0018:ffffa1ad0449ba48 EFLAGS: 00010246 > RAX: ffffffffc04c7710 RBX: ffff9b88aee48000 RCX: 0000000000000000 > RDX: 0000000000000000 RSI: ffff9b8884ba0000 RDI: ffff9b8884ba0008 > RBP: ffff9b88aee48a40 R08: 0000000000000000 R09: ffff9b8884ba6000 > R10: ffffa1ad0449ba88 R11: ffff9b8884ba6000 R12: 0000000000000000 > R13: ffff9b8884ba0000 R14: ffff9b8884ba0000 R15: ffff9b8884ba6000 > FS: 00007f7b2a094740(0000) GS:ffff9b8f9f680000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000000 CR3: 000000015f394000 CR4: 00000000007506f0 > PKRU: 55555554 > Call Trace: > <TASK> > ? __die+0x20/0x70 > ? page_fault_oops+0x15a/0x460 > ? __vmalloc_node_range_noprof+0x4f7/0x8e0 > ? exc_page_fault+0x6e/0x180 > ? asm_exc_page_fault+0x22/0x30 > ? __pfx_bnxt_queue_mem_alloc+0x10/0x10 [bnxt_en 2b2843e995211f081639d5c0e74fe1cce7fed534] > ? bnxt_queue_mem_alloc+0x38/0x410 [bnxt_en 2b2843e995211f081639d5c0e74fe1cce7fed534] > netdev_rx_queue_restart+0xa9/0x1c0 > net_devmem_bind_dmabuf_to_queue+0xcb/0x100 > netdev_nl_bind_rx_doit+0x2f6/0x350 > genl_family_rcv_msg_doit+0xd9/0x130 > genl_rcv_msg+0x184/0x2b0 > ? __pfx_netdev_nl_bind_rx_doit+0x10/0x10 > ? __pfx_genl_rcv_msg+0x10/0x10 > netlink_rcv_skb+0x54/0x100 > genl_rcv+0x24/0x40 > netlink_unicast+0x243/0x370 > netlink_sendmsg+0x1bb/0x3e0 > > Fixes: 2d694c27d32e ("bnxt_en: implement netdev_queue_mgmt_ops") > Signed-off-by: Taehee Yoo <ap420073@gmail.com> > --- > > The branch is not net because the commit 2d694c27d32e is not > yet merged into net branch. > > devmem TCP causes this problem, but it is not yet merged. > So, to test this patch, please patch the current devmem TCP. > The /tools/testing/selftests/net/ncdevmem will immediately reproduce > this problem. > > drivers/net/ethernet/broadcom/bnxt/bnxt.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c > index 6fc34ccb86e3..e270fb6b2866 100644 > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c > @@ -15022,6 +15022,9 @@ static int bnxt_queue_mem_alloc(struct net_device *dev, void *qmem, int idx) > struct bnxt_ring_struct *ring; > int rc; > > + if (!bp->rx_ring || idx >= bp->rx_nr_rings) > + return -EINVAL; > + > rxr = &bp->rx_ring[idx]; > clone = qmem; > memcpy(clone, rxr, sizeof(*rxr)); > @@ -15156,6 +15159,9 @@ static int bnxt_queue_start(struct net_device *dev, void *qmem, int idx) > struct bnxt_cp_ring_info *cpr; > int rc; > > + if (!bp->rx_ring || idx >= bp->rx_nr_rings) > + return -EINVAL; > + > rxr = &bp->rx_ring[idx]; > clone = qmem; > > @@ -15195,6 +15201,9 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx) > struct bnxt *bp = netdev_priv(dev); > struct bnxt_rx_ring_info *rxr; > > + if (!bp->rx_ring || idx >= bp->rx_nr_rings) > + return -EINVAL; > + > rxr = &bp->rx_ring[idx]; > napi_disable(&rxr->bnapi->napi); > bnxt_hwrm_rx_ring_free(bp, rxr, false); > -- > 2.34.1 > Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
On Thu, 4 Jul 2024 07:41:53 +0000 Taehee Yoo wrote: > bnxt_queue_{mem_alloc,start,stop} access bp->rx_ring array and this is > initialized while an interface is being up. > The rings are initialized as a number of channels. > > The queue API functions access rx_ring without checking both null and > ring size. > So, if the queue API functions are called when interface status is down, > they access an uninitialized rx_ring array. > Also if the queue index parameter value is larger than a ring, it > would also access an uninitialized rx_ring. Shouldn't the core be checking against dev->real_num_rx_queues instead ?
On Thu, Jul 4, 2024 at 10:33 PM Jakub Kicinski <kuba@kernel.org> wrote: > Hi Jakub, Thanks a lot for review! > On Thu, 4 Jul 2024 07:41:53 +0000 Taehee Yoo wrote: > > bnxt_queue_{mem_alloc,start,stop} access bp->rx_ring array and this is > > initialized while an interface is being up. > > The rings are initialized as a number of channels. > > > > The queue API functions access rx_ring without checking both null and > > ring size. > > So, if the queue API functions are called when interface status is down, > > they access an uninitialized rx_ring array. > > Also if the queue index parameter value is larger than a ring, it > > would also access an uninitialized rx_ring. > > Shouldn't the core be checking against dev->real_num_rx_queues instead ? Oh, I missed it. I agree the core should check dev->real_num_rx_queues. But the current devmem TCP code checks dev->num_rx_queues instead of dev->real_num_rx_queues. I tested the below change, and it works well. So, I will comment on it in Mina's patch. diff --git a/net/core/devmem.c b/net/core/devmem.c index 7afaf17801ef..da27778c2421 100644 --- a/net/core/devmem.c +++ b/net/core/devmem.c @@ -146,7 +146,7 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx, u32 xa_idx; int err; - if (rxq_idx >= dev->num_rx_queues) + if (rxq_idx >= dev->real_num_rx_queues) return -ERANGE; rxq = __netif_get_rx_queue(dev, rxq_idx); Thanks a lot! Taehee Yoo
On 2024-07-04 00:41, Taehee Yoo wrote: > bnxt_queue_{mem_alloc,start,stop} access bp->rx_ring array and this is > initialized while an interface is being up. > The rings are initialized as a number of channels. > > The queue API functions access rx_ring without checking both null and > ring size. > So, if the queue API functions are called when interface status is down, > they access an uninitialized rx_ring array. > Also if the queue index parameter value is larger than a ring, it > would also access an uninitialized rx_ring. Hi Taehee, I deliberately didn't do a check as it is too defensive. The caller of the queue API e.g. netdev_rx_queue_restart() should be the one doing the check instead. > > BUG: kernel NULL pointer dereference, address: 0000000000000000 > #PF: supervisor read access in kernel mode > #PF: error_code(0x0000) - not-present page > PGD 0 P4D 0 > Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI > CPU: 1 PID: 1697 Comm: ncdevmem Not tainted 6.10.0-rc5+ #34 > RIP: 0010:bnxt_queue_mem_alloc+0x38/0x410 [bnxt_en] > Code: 49 89 f5 41 54 4d 89 c4 4d 69 c0 c0 05 00 00 55 48 8d af 40 0a 00 00 53 48 89 fb 48 83 ec 05 > RSP: 0018:ffffa1ad0449ba48 EFLAGS: 00010246 > RAX: ffffffffc04c7710 RBX: ffff9b88aee48000 RCX: 0000000000000000 > RDX: 0000000000000000 RSI: ffff9b8884ba0000 RDI: ffff9b8884ba0008 > RBP: ffff9b88aee48a40 R08: 0000000000000000 R09: ffff9b8884ba6000 > R10: ffffa1ad0449ba88 R11: ffff9b8884ba6000 R12: 0000000000000000 > R13: ffff9b8884ba0000 R14: ffff9b8884ba0000 R15: ffff9b8884ba6000 > FS: 00007f7b2a094740(0000) GS:ffff9b8f9f680000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000000 CR3: 000000015f394000 CR4: 00000000007506f0 > PKRU: 55555554 > Call Trace: > <TASK> > ? __die+0x20/0x70 > ? page_fault_oops+0x15a/0x460 > ? __vmalloc_node_range_noprof+0x4f7/0x8e0 > ? exc_page_fault+0x6e/0x180 > ? asm_exc_page_fault+0x22/0x30 > ? __pfx_bnxt_queue_mem_alloc+0x10/0x10 [bnxt_en 2b2843e995211f081639d5c0e74fe1cce7fed534] > ? bnxt_queue_mem_alloc+0x38/0x410 [bnxt_en 2b2843e995211f081639d5c0e74fe1cce7fed534] > netdev_rx_queue_restart+0xa9/0x1c0 > net_devmem_bind_dmabuf_to_queue+0xcb/0x100 > netdev_nl_bind_rx_doit+0x2f6/0x350 > genl_family_rcv_msg_doit+0xd9/0x130 > genl_rcv_msg+0x184/0x2b0 > ? __pfx_netdev_nl_bind_rx_doit+0x10/0x10 > ? __pfx_genl_rcv_msg+0x10/0x10 > netlink_rcv_skb+0x54/0x100 > genl_rcv+0x24/0x40 > netlink_unicast+0x243/0x370 > netlink_sendmsg+0x1bb/0x3e0 > > Fixes: 2d694c27d32e ("bnxt_en: implement netdev_queue_mgmt_ops") > Signed-off-by: Taehee Yoo <ap420073@gmail.com> > --- > > The branch is not net because the commit 2d694c27d32e is not > yet merged into net branch. > > devmem TCP causes this problem, but it is not yet merged. > So, to test this patch, please patch the current devmem TCP. > The /tools/testing/selftests/net/ncdevmem will immediately reproduce > this problem. > > drivers/net/ethernet/broadcom/bnxt/bnxt.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c > index 6fc34ccb86e3..e270fb6b2866 100644 > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c > @@ -15022,6 +15022,9 @@ static int bnxt_queue_mem_alloc(struct net_device *dev, void *qmem, int idx) > struct bnxt_ring_struct *ring; > int rc; > > + if (!bp->rx_ring || idx >= bp->rx_nr_rings) > + return -EINVAL; > + > rxr = &bp->rx_ring[idx]; > clone = qmem; > memcpy(clone, rxr, sizeof(*rxr)); > @@ -15156,6 +15159,9 @@ static int bnxt_queue_start(struct net_device *dev, void *qmem, int idx) > struct bnxt_cp_ring_info *cpr; > int rc; > > + if (!bp->rx_ring || idx >= bp->rx_nr_rings) > + return -EINVAL; > + > rxr = &bp->rx_ring[idx]; > clone = qmem; > > @@ -15195,6 +15201,9 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx) > struct bnxt *bp = netdev_priv(dev); > struct bnxt_rx_ring_info *rxr; > > + if (!bp->rx_ring || idx >= bp->rx_nr_rings) > + return -EINVAL; > + > rxr = &bp->rx_ring[idx]; > napi_disable(&rxr->bnapi->napi); > bnxt_hwrm_rx_ring_free(bp, rxr, false);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 6fc34ccb86e3..e270fb6b2866 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -15022,6 +15022,9 @@ static int bnxt_queue_mem_alloc(struct net_device *dev, void *qmem, int idx) struct bnxt_ring_struct *ring; int rc; + if (!bp->rx_ring || idx >= bp->rx_nr_rings) + return -EINVAL; + rxr = &bp->rx_ring[idx]; clone = qmem; memcpy(clone, rxr, sizeof(*rxr)); @@ -15156,6 +15159,9 @@ static int bnxt_queue_start(struct net_device *dev, void *qmem, int idx) struct bnxt_cp_ring_info *cpr; int rc; + if (!bp->rx_ring || idx >= bp->rx_nr_rings) + return -EINVAL; + rxr = &bp->rx_ring[idx]; clone = qmem; @@ -15195,6 +15201,9 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx) struct bnxt *bp = netdev_priv(dev); struct bnxt_rx_ring_info *rxr; + if (!bp->rx_ring || idx >= bp->rx_nr_rings) + return -EINVAL; + rxr = &bp->rx_ring[idx]; napi_disable(&rxr->bnapi->napi); bnxt_hwrm_rx_ring_free(bp, rxr, false);
bnxt_queue_{mem_alloc,start,stop} access bp->rx_ring array and this is initialized while an interface is being up. The rings are initialized as a number of channels. The queue API functions access rx_ring without checking both null and ring size. So, if the queue API functions are called when interface status is down, they access an uninitialized rx_ring array. Also if the queue index parameter value is larger than a ring, it would also access an uninitialized rx_ring. BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 1697 Comm: ncdevmem Not tainted 6.10.0-rc5+ #34 RIP: 0010:bnxt_queue_mem_alloc+0x38/0x410 [bnxt_en] Code: 49 89 f5 41 54 4d 89 c4 4d 69 c0 c0 05 00 00 55 48 8d af 40 0a 00 00 53 48 89 fb 48 83 ec 05 RSP: 0018:ffffa1ad0449ba48 EFLAGS: 00010246 RAX: ffffffffc04c7710 RBX: ffff9b88aee48000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9b8884ba0000 RDI: ffff9b8884ba0008 RBP: ffff9b88aee48a40 R08: 0000000000000000 R09: ffff9b8884ba6000 R10: ffffa1ad0449ba88 R11: ffff9b8884ba6000 R12: 0000000000000000 R13: ffff9b8884ba0000 R14: ffff9b8884ba0000 R15: ffff9b8884ba6000 FS: 00007f7b2a094740(0000) GS:ffff9b8f9f680000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000015f394000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x20/0x70 ? page_fault_oops+0x15a/0x460 ? __vmalloc_node_range_noprof+0x4f7/0x8e0 ? exc_page_fault+0x6e/0x180 ? asm_exc_page_fault+0x22/0x30 ? __pfx_bnxt_queue_mem_alloc+0x10/0x10 [bnxt_en 2b2843e995211f081639d5c0e74fe1cce7fed534] ? bnxt_queue_mem_alloc+0x38/0x410 [bnxt_en 2b2843e995211f081639d5c0e74fe1cce7fed534] netdev_rx_queue_restart+0xa9/0x1c0 net_devmem_bind_dmabuf_to_queue+0xcb/0x100 netdev_nl_bind_rx_doit+0x2f6/0x350 genl_family_rcv_msg_doit+0xd9/0x130 genl_rcv_msg+0x184/0x2b0 ? __pfx_netdev_nl_bind_rx_doit+0x10/0x10 ? __pfx_genl_rcv_msg+0x10/0x10 netlink_rcv_skb+0x54/0x100 genl_rcv+0x24/0x40 netlink_unicast+0x243/0x370 netlink_sendmsg+0x1bb/0x3e0 Fixes: 2d694c27d32e ("bnxt_en: implement netdev_queue_mgmt_ops") Signed-off-by: Taehee Yoo <ap420073@gmail.com> --- The branch is not net because the commit 2d694c27d32e is not yet merged into net branch. devmem TCP causes this problem, but it is not yet merged. So, to test this patch, please patch the current devmem TCP. The /tools/testing/selftests/net/ncdevmem will immediately reproduce this problem. drivers/net/ethernet/broadcom/bnxt/bnxt.c | 9 +++++++++ 1 file changed, 9 insertions(+)