Message ID | 4a6ab7b8-0dcc-43b8-a647-9be2a767b06d@infradead.org (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [v4,net] ps3/gelic: Fix SKB allocation | expand |
On Sat, 2024-02-10 at 17:15 +0900, Geoff Levand wrote: > Commit 3ce4f9c3fbb3 ("net/ps3_gelic_net: Add gelic_descr structures") of > 6.8-rc1 did not allocate a network SKB for the gelic_descr, resulting in a > kernel panic when the SKB variable (struct gelic_descr.skb) was accessed. > > This fix changes the way the napi buffer and corresponding SKB are > allocated and managed. I think this is not what Jakub asked on v3. Isn't something alike the following enough to fix the NULL ptr deref? Thanks, Paolo --- diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.c b/drivers/net/ethernet/toshiba/ps3_gelic_net.c index d5b75af163d3..51ee6075653f 100644 --- a/drivers/net/ethernet/toshiba/ps3_gelic_net.c +++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.c @@ -395,7 +395,6 @@ static int gelic_descr_prepare_rx(struct gelic_card *card, descr->hw_regs.data_error = 0; descr->hw_regs.payload.dev_addr = 0; descr->hw_regs.payload.size = 0; - descr->skb = NULL; offset = ((unsigned long)descr->skb->data) & (GELIC_NET_RXBUF_ALIGN - 1);
Le 13/02/2024 à 13:07, Paolo Abeni a écrit : > On Sat, 2024-02-10 at 17:15 +0900, Geoff Levand wrote: >> Commit 3ce4f9c3fbb3 ("net/ps3_gelic_net: Add gelic_descr structures") of >> 6.8-rc1 did not allocate a network SKB for the gelic_descr, resulting in a >> kernel panic when the SKB variable (struct gelic_descr.skb) was accessed. >> >> This fix changes the way the napi buffer and corresponding SKB are >> allocated and managed. > > I think this is not what Jakub asked on v3. > > Isn't something alike the following enough to fix the NULL ptr deref? If you think it is enough, please explain in more details. From my point of view, when looking at commit 3ce4f9c3fbb3 ("net/ps3_gelic_net: Add gelic_descr structures") that introduced the problem, it is not obvious. Christophe > > Thanks, > > Paolo > --- > diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.c b/drivers/net/ethernet/toshiba/ps3_gelic_net.c > index d5b75af163d3..51ee6075653f 100644 > --- a/drivers/net/ethernet/toshiba/ps3_gelic_net.c > +++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.c > @@ -395,7 +395,6 @@ static int gelic_descr_prepare_rx(struct gelic_card *card, > descr->hw_regs.data_error = 0; > descr->hw_regs.payload.dev_addr = 0; > descr->hw_regs.payload.size = 0; > - descr->skb = NULL; > > offset = ((unsigned long)descr->skb->data) & > (GELIC_NET_RXBUF_ALIGN - 1); >
On Tue, 2024-02-13 at 12:56 +0000, Christophe Leroy wrote: > > Le 13/02/2024 à 13:07, Paolo Abeni a écrit : > > On Sat, 2024-02-10 at 17:15 +0900, Geoff Levand wrote: > > > Commit 3ce4f9c3fbb3 ("net/ps3_gelic_net: Add gelic_descr structures") of > > > 6.8-rc1 did not allocate a network SKB for the gelic_descr, resulting in a > > > kernel panic when the SKB variable (struct gelic_descr.skb) was accessed. > > > > > > This fix changes the way the napi buffer and corresponding SKB are > > > allocated and managed. > > > > I think this is not what Jakub asked on v3. > > > > Isn't something alike the following enough to fix the NULL ptr deref? > > If you think it is enough, please explain in more details. I'm unsure it will be enough, but at least the quoted line causes a NULL ptr dereference: descr->skb = netdev_alloc_skb(*card->netdev, rx_skb_size); if (!descr->skb) { descr->hw_regs.payload.dev_addr = 0; /* tell DMAC don't touch memory */ return -ENOMEM; } // here descr->skb is not NULL... descr->hw_regs.dmac_cmd_status = 0; descr->hw_regs.result_size = 0; descr->hw_regs.valid_size = 0; descr->hw_regs.data_error = 0; descr->hw_regs.payload.dev_addr = 0; descr->hw_regs.payload.size = 0; descr->skb = NULL; // ... and now it's NULL for no apparent good reason offset = ((unsigned long)descr->skb->data) & // ^^^^^^^ NULL ptr deref (GELIC_NET_RXBUF_ALIGN - 1); if (offset) skb_reserve(descr->skb, GELIC_NET_RXBUF_ALIGN - offset); /* io-mmu-map the skb */ cpu_addr = dma_map_single(ctodev(card), descr->skb->data, GELIC_NET_MAX_FRAME, DMA_FROM_DEVICE); The buggy line in introduced by the blamed commit. > From my point of view, when looking at commit 3ce4f9c3fbb3 > ("net/ps3_gelic_net: Add gelic_descr structures") that introduced the > problem, it is not obvious. That change is quite complex. It could includes other issues - quickly skimming over it I could not see them. Reporting the decoded stack trace generated by the NULL ptr deref could help. Cheers, Paolo
On 2/13/24 21:07, Paolo Abeni wrote: > On Sat, 2024-02-10 at 17:15 +0900, Geoff Levand wrote: >> Commit 3ce4f9c3fbb3 ("net/ps3_gelic_net: Add gelic_descr structures") of >> 6.8-rc1 did not allocate a network SKB for the gelic_descr, resulting in a >> kernel panic when the SKB variable (struct gelic_descr.skb) was accessed. >> >> This fix changes the way the napi buffer and corresponding SKB are >> allocated and managed. > > I think this is not what Jakub asked on v3. > > Isn't something alike the following enough to fix the NULL ptr deref? > > Thanks, > > Paolo > --- > diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.c b/drivers/net/ethernet/toshiba/ps3_gelic_net.c > index d5b75af163d3..51ee6075653f 100644 > --- a/drivers/net/ethernet/toshiba/ps3_gelic_net.c > +++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.c > @@ -395,7 +395,6 @@ static int gelic_descr_prepare_rx(struct gelic_card *card, > descr->hw_regs.data_error = 0; > descr->hw_regs.payload.dev_addr = 0; > descr->hw_regs.payload.size = 0; > - descr->skb = NULL; The reason we set the SKB pointer to NULL here is so we can detect if an SKB has been allocated or not. If the SKB pointer is not NULL, then we delete it. If we just let the SKB pointer be some random value then later we will try to delete some random address. -Geoff
On Fri, 2024-02-16 at 18:37 +0900, Geoff Levand wrote: > On 2/13/24 21:07, Paolo Abeni wrote: > > On Sat, 2024-02-10 at 17:15 +0900, Geoff Levand wrote: > > > Commit 3ce4f9c3fbb3 ("net/ps3_gelic_net: Add gelic_descr structures") of > > > 6.8-rc1 did not allocate a network SKB for the gelic_descr, resulting in a > > > kernel panic when the SKB variable (struct gelic_descr.skb) was accessed. > > > > > > This fix changes the way the napi buffer and corresponding SKB are > > > allocated and managed. > > > > I think this is not what Jakub asked on v3. > > > > Isn't something alike the following enough to fix the NULL ptr deref? > > > > Thanks, > > > > Paolo > > --- > > diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.c b/drivers/net/ethernet/toshiba/ps3_gelic_net.c > > index d5b75af163d3..51ee6075653f 100644 > > --- a/drivers/net/ethernet/toshiba/ps3_gelic_net.c > > +++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.c > > @@ -395,7 +395,6 @@ static int gelic_descr_prepare_rx(struct gelic_card *card, > > descr->hw_regs.data_error = 0; > > descr->hw_regs.payload.dev_addr = 0; > > descr->hw_regs.payload.size = 0; > > - descr->skb = NULL; > > The reason we set the SKB pointer to NULL here is so we can > detect if an SKB has been allocated or not. If the SKB pointer > is not NULL, then we delete it. > > If we just let the SKB pointer be some random value then later > we will try to delete some random address. Note that this specific 'skb = NULL' assignment happens just after a successful allocation and just before unconditional dereference of such ptr: descr->skb = netdev_alloc_skb(*card->netdev, rx_skb_size); if (!descr->skb) { descr->hw_regs.payload.dev_addr = 0; /* tell DMAC don't touch memory */ return -ENOMEM; } descr->hw_regs.dmac_cmd_status = 0; descr->hw_regs.result_size = 0; descr->hw_regs.valid_size = 0; descr->hw_regs.data_error = 0; descr->hw_regs.payload.dev_addr = 0; descr->hw_regs.payload.size = 0; // XXX here skb is not NULL and valid descr->skb = NULL; offset = ((unsigned long)descr->skb->data) & // XXX here ^^^^^^^^^^ // you unconditionally get null ptr deref (GELIC_NET_RXBUF_ALIGN - 1); unless ENOCOFFEE here which is totally possible. Cheers, Paolo
On 2/13/24 22:09, Paolo Abeni wrote: > On Tue, 2024-02-13 at 12:56 +0000, Christophe Leroy wrote: >> >> Le 13/02/2024 à 13:07, Paolo Abeni a écrit : >>> On Sat, 2024-02-10 at 17:15 +0900, Geoff Levand wrote: >>>> Commit 3ce4f9c3fbb3 ("net/ps3_gelic_net: Add gelic_descr structures") of >>>> 6.8-rc1 did not allocate a network SKB for the gelic_descr, resulting in a >>>> kernel panic when the SKB variable (struct gelic_descr.skb) was accessed. >>>> >>>> This fix changes the way the napi buffer and corresponding SKB are >>>> allocated and managed. >>> >>> I think this is not what Jakub asked on v3. >>> >>> Isn't something alike the following enough to fix the NULL ptr deref? >> >> If you think it is enough, please explain in more details. > > I'm unsure it will be enough, but at least the quoted line causes a > NULL ptr dereference: > > descr->skb = netdev_alloc_skb(*card->netdev, rx_skb_size); > if (!descr->skb) { > descr->hw_regs.payload.dev_addr = 0; /* tell DMAC don't touch memory */ > return -ENOMEM; > } > // here descr->skb is not NULL... > > descr->hw_regs.dmac_cmd_status = 0; > descr->hw_regs.result_size = 0; > descr->hw_regs.valid_size = 0; > descr->hw_regs.data_error = 0; > descr->hw_regs.payload.dev_addr = 0; > descr->hw_regs.payload.size = 0; > descr->skb = NULL; > // ... and now it's NULL for no apparent good reason As I mentioned in an earlier mail, the SKB pointer is set to NULL so we can detect if an SKB has been allocated. > > offset = ((unsigned long)descr->skb->data) & > // ^^^^^^^ NULL ptr deref > > (GELIC_NET_RXBUF_ALIGN - 1); > if (offset) > skb_reserve(descr->skb, GELIC_NET_RXBUF_ALIGN - offset); > /* io-mmu-map the skb */ > cpu_addr = dma_map_single(ctodev(card), descr->skb->data, > GELIC_NET_MAX_FRAME, DMA_FROM_DEVICE); > > > The buggy line in introduced by the blamed commit. > >> From my point of view, when looking at commit 3ce4f9c3fbb3 >> ("net/ps3_gelic_net: Add gelic_descr structures") that introduced the >> problem, it is not obvious. > > That change is quite complex. It could includes other issues - quickly > skimming over it I could not see them. The proposed change was tested by both Sambat and I. It fixes the problem, and no ill effects were seen. -Geoff
On 2/16/24 19:06, Paolo Abeni wrote: > On Fri, 2024-02-16 at 18:37 +0900, Geoff Levand wrote: >> On 2/13/24 21:07, Paolo Abeni wrote: >>> On Sat, 2024-02-10 at 17:15 +0900, Geoff Levand wrote: >>>> Commit 3ce4f9c3fbb3 ("net/ps3_gelic_net: Add gelic_descr structures") of >>>> 6.8-rc1 did not allocate a network SKB for the gelic_descr, resulting in a >>>> kernel panic when the SKB variable (struct gelic_descr.skb) was accessed. >>>> >>>> This fix changes the way the napi buffer and corresponding SKB are >>>> allocated and managed. >>> >>> I think this is not what Jakub asked on v3. >>> >>> Isn't something alike the following enough to fix the NULL ptr deref? >>> >>> Thanks, >>> >>> Paolo >>> --- >>> diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.c b/drivers/net/ethernet/toshiba/ps3_gelic_net.c >>> index d5b75af163d3..51ee6075653f 100644 >>> --- a/drivers/net/ethernet/toshiba/ps3_gelic_net.c >>> +++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.c >>> @@ -395,7 +395,6 @@ static int gelic_descr_prepare_rx(struct gelic_card *card, >>> descr->hw_regs.data_error = 0; >>> descr->hw_regs.payload.dev_addr = 0; >>> descr->hw_regs.payload.size = 0; >>> - descr->skb = NULL; >> >> The reason we set the SKB pointer to NULL here is so we can >> detect if an SKB has been allocated or not. If the SKB pointer >> is not NULL, then we delete it. >> >> If we just let the SKB pointer be some random value then later >> we will try to delete some random address. > > Note that this specific 'skb = NULL' assignment happens just after a > successful allocation and just before unconditional dereference of such > ptr: > > descr->skb = netdev_alloc_skb(*card->netdev, rx_skb_size); > if (!descr->skb) { > descr->hw_regs.payload.dev_addr = 0; /* tell DMAC don't touch memory */ > return -ENOMEM; > } > > descr->hw_regs.dmac_cmd_status = 0; > descr->hw_regs.result_size = 0; > descr->hw_regs.valid_size = 0; > descr->hw_regs.data_error = 0; > descr->hw_regs.payload.dev_addr = 0; > descr->hw_regs.payload.size = 0; > // XXX here skb is not NULL and valid > descr->skb = NULL; I see your point now. I'll send out a fix-up patch that just moves the initialization of the descr to before the allocation of the SKB. Thanks. -Geoff
diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.c b/drivers/net/ethernet/toshiba/ps3_gelic_net.c index d5b75af163d3..a09b534efa32 100644 --- a/drivers/net/ethernet/toshiba/ps3_gelic_net.c +++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.c @@ -375,20 +375,14 @@ static int gelic_card_init_chain(struct gelic_card *card, static int gelic_descr_prepare_rx(struct gelic_card *card, struct gelic_descr *descr) { - static const unsigned int rx_skb_size = - ALIGN(GELIC_NET_MAX_FRAME, GELIC_NET_RXBUF_ALIGN) + - GELIC_NET_RXBUF_ALIGN - 1; + static const unsigned int napi_buff_size = + round_up(GELIC_NET_MAX_FRAME, GELIC_NET_RXBUF_ALIGN); dma_addr_t cpu_addr; - int offset; + void *napi_buff; if (gelic_descr_get_status(descr) != GELIC_DESCR_DMA_NOT_IN_USE) dev_info(ctodev(card), "%s: ERROR status\n", __func__); - descr->skb = netdev_alloc_skb(*card->netdev, rx_skb_size); - if (!descr->skb) { - descr->hw_regs.payload.dev_addr = 0; /* tell DMAC don't touch memory */ - return -ENOMEM; - } descr->hw_regs.dmac_cmd_status = 0; descr->hw_regs.result_size = 0; descr->hw_regs.valid_size = 0; @@ -397,24 +391,32 @@ static int gelic_descr_prepare_rx(struct gelic_card *card, descr->hw_regs.payload.size = 0; descr->skb = NULL; - offset = ((unsigned long)descr->skb->data) & - (GELIC_NET_RXBUF_ALIGN - 1); - if (offset) - skb_reserve(descr->skb, GELIC_NET_RXBUF_ALIGN - offset); - /* io-mmu-map the skb */ - cpu_addr = dma_map_single(ctodev(card), descr->skb->data, - GELIC_NET_MAX_FRAME, DMA_FROM_DEVICE); - descr->hw_regs.payload.dev_addr = cpu_to_be32(cpu_addr); + napi_buff = napi_alloc_frag_align(napi_buff_size, + GELIC_NET_RXBUF_ALIGN); + + if (unlikely(!napi_buff)) + return -ENOMEM; + + descr->skb = napi_build_skb(napi_buff, napi_buff_size); + + if (unlikely(!descr->skb)) { + skb_free_frag(napi_buff); + return -ENOMEM; + } + + cpu_addr = dma_map_single(ctodev(card), napi_buff, napi_buff_size, + DMA_FROM_DEVICE); + if (dma_mapping_error(ctodev(card), cpu_addr)) { - dev_kfree_skb_any(descr->skb); + skb_free_frag(napi_buff); descr->skb = NULL; - dev_info(ctodev(card), - "%s:Could not iommu-map rx buffer\n", __func__); + dev_err_once(ctodev(card), "%s:Could not iommu-map rx buffer\n", + __func__); gelic_descr_set_status(descr, GELIC_DESCR_DMA_NOT_IN_USE); return -ENOMEM; } - descr->hw_regs.payload.size = cpu_to_be32(GELIC_NET_MAX_FRAME); + descr->hw_regs.payload.size = cpu_to_be32(napi_buff_size); descr->hw_regs.payload.dev_addr = cpu_to_be32(cpu_addr); gelic_descr_set_status(descr, GELIC_DESCR_DMA_CARDOWNED);
Commit 3ce4f9c3fbb3 ("net/ps3_gelic_net: Add gelic_descr structures") of 6.8-rc1 did not allocate a network SKB for the gelic_descr, resulting in a kernel panic when the SKB variable (struct gelic_descr.skb) was accessed. This fix changes the way the napi buffer and corresponding SKB are allocated and managed. Reported-by: sambat goson <sombat3960@gmail.com> Fixes: 3ce4f9c3fbb3 ("net/ps3_gelic_net: Add gelic_descr structures") Signed-off-by: Geoff Levand <geoff@infradead.org>