[net-next,v4,0/3] Fix late DMA unmap crash for page pool

Message ID: 20250327-page-pool-track-dma-v4-0-b380dc6706d0@redhat.com

Message

Toke Høiland-Jørgensen March 27, 2025, 10:44 a.m. UTC
This series fixes the late dma_unmap crash for page pool first reported
by Yonglong Liu in [0]. It is an alternative approach to the one
submitted by Yunsheng Lin, most recently in [1]. The first two commits
are small refactors of the page pool code, in preparation for the main
change in patch 3. See the commit message of patch 3 for the details.

-Toke

[0] https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/
[1] https://lore.kernel.org/r/20250307092356.638242-1-linyunsheng@huawei.com

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
Changes in v4:
- Rebase on net-next
- Collect tags
- Link to v3: https://lore.kernel.org/r/20250326-page-pool-track-dma-v3-0-8e464016e0ac@redhat.com

Changes in v3:
- Use a full-width bool for pp->dma_sync instead of a full unsigned
  long (in patch 2), and leave pp->dma_sync_cpu alone.

- Link to v2: https://lore.kernel.org/r/20250325-page-pool-track-dma-v2-0-113ebc1946f3@redhat.com

Changes in v2:
- Always leave two bits at the top of pp_magic as zero, instead of one

- Add an rcu_read_lock() around __page_pool_dma_sync_for_device()

- Add a comment in poison.h with a reference to the bitmask definition

- Add a longer description of the logic of the bitmask definitions to
  the comment in types.h, and a summary of the security implications of
  using the pp_magic field to the commit message of patch 3

- Collect Mina's Reviewed-by and Yonglong's Tested-by tags

- Link to v1: https://lore.kernel.org/r/20250314-page-pool-track-dma-v1-0-c212e57a74c2@redhat.com

---
Toke Høiland-Jørgensen (3):
      page_pool: Move pp_magic check into helper functions
      page_pool: Turn dma_sync into a full-width bool field
      page_pool: Track DMA-mapped pages and unmap them when destroying the pool

 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c |  4 +-
 include/linux/poison.h                           |  4 ++
 include/net/page_pool/types.h                    | 65 ++++++++++++++++++-
 mm/page_alloc.c                                  |  9 +--
 net/core/netmem_priv.h                           | 33 +++++++++-
 net/core/page_pool.c                             | 81 ++++++++++++++++++++----
 net/core/skbuff.c                                | 16 +----
 net/core/xdp.c                                   |  4 +-
 8 files changed, 176 insertions(+), 40 deletions(-)
---
base-commit: 1a9239bb4253f9076b5b4b2a1a4e8d7defd77a95
change-id: 20250310-page-pool-track-dma-0332343a460e
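
As a rough illustration of the v2 changelog item about leaving two bits at the top of pp_magic as zero, the following is a minimal userspace sketch of packing an index into the upper bits of a signature word. Every name and constant here (SIGNATURE, INDEX_BITS, magic_pack() and so on) is hypothetical and chosen for illustration only; the actual layout is the one documented by the series in types.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout for illustration; these are not the kernel's definitions. */
#define WORD_BITS   64
#define SIGNATURE   UINT64_C(0x40)                  /* assumed signature value */
#define INDEX_BITS  16                              /* assumed width of the index field */
#define INDEX_SHIFT (WORD_BITS - 2 - INDEX_BITS)    /* leaves the top two bits zero */
#define INDEX_MASK  (((UINT64_C(1) << INDEX_BITS) - 1) << INDEX_SHIFT)
/* Bits compared against the signature: everything except the index and the low two bits. */
#define MAGIC_MASK  (~(INDEX_MASK | UINT64_C(0x3)))

/* Combine the signature with an index stored just below the two reserved top bits. */
static inline uint64_t magic_pack(uint32_t index)
{
	return SIGNATURE | (((uint64_t)index << INDEX_SHIFT) & INDEX_MASK);
}

/* True if the word carries the signature, ignoring the index and the low two bits. */
static inline bool magic_is_pp(uint64_t magic)
{
	return (magic & MAGIC_MASK) == SIGNATURE;
}

/* Recover the index stored by magic_pack(). */
static inline uint32_t magic_index(uint64_t magic)
{
	return (uint32_t)((magic & INDEX_MASK) >> INDEX_SHIFT);
}
```

In this sketch a word with either of its top two bits set never passes magic_is_pp(), because those bits are part of the compared mask and the signature leaves them clear.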

Comments

Jakub Kicinski March 27, 2025, 7:48 p.m. UTC | #1
On Thu, 27 Mar 2025 11:44:10 +0100 Toke Høiland-Jørgensen wrote:
> This series fixes the late dma_unmap crash for page pool first reported
> by Yonglong Liu in [0]. It is an alternative approach to the one
> submitted by Yunsheng Lin, most recently in [1]. The first two commits
> are small refactors of the page pool code, in preparation for the main
> change in patch 3. See the commit message of patch 3 for the details.

We see a crash and a UAF on:

[   18.574787] RIP: 0010:page_pool_put_unrefed_netmem (net/core/page_pool.c:465 net/core/page_pool.c:808 net/core/page_pool.c:866) 
[   18.575880] napi_pp_put_page (net/core/skbuff.c:998) 
[   18.575912] skb_release_data (./include/linux/skbuff_ref.h:40 ./include/linux/skbuff_ref.h:56 net/core/skbuff.c:1079) 
[   18.575944] consume_skb (net/core/skbuff.c:1165 net/core/skbuff.c:1396 net/core/skbuff.c:1390) 

You should be able to repro with a ping test over netdevsim.

Toke Høiland-Jørgensen March 28, 2025, 11:20 a.m. UTC | #2
Jakub Kicinski <kuba@kernel.org> writes:

> On Thu, 27 Mar 2025 11:44:10 +0100 Toke Høiland-Jørgensen wrote:
>> This series fixes the late dma_unmap crash for page pool first reported
>> by Yonglong Liu in [0]. It is an alternative approach to the one
>> submitted by Yunsheng Lin, most recently in [1]. The first two commits
>> are small refactors of the page pool code, in preparation for the main
>> change in patch 3. See the commit message of patch 3 for the details.
>
> We see a crash and a UAF on:
>
> [   18.574787] RIP: 0010:page_pool_put_unrefed_netmem (net/core/page_pool.c:465 net/core/page_pool.c:808 net/core/page_pool.c:866) 
> [   18.575880] napi_pp_put_page (net/core/skbuff.c:998) 
> [   18.575912] skb_release_data (./include/linux/skbuff_ref.h:40 ./include/linux/skbuff_ref.h:56 net/core/skbuff.c:1079) 
> [   18.575944] consume_skb (net/core/skbuff.c:1165 net/core/skbuff.c:1396 net/core/skbuff.c:1390) 
>
> You should be able to repro with a ping test over netdevsim.

Alright, I'll take a look, thanks for the pointer.

-Toke