From patchwork Wed Nov 20 10:34:53 2024
X-Patchwork-Submitter: Yunsheng Lin
X-Patchwork-Id: 13881011
X-Patchwork-Delegate: kuba@kernel.org
X-Patchwork-State: RFC
From: Yunsheng Lin
CC: Yunsheng Lin, Alexander Lobakin, Xuan Zhuo, Jesper Dangaard Brouer,
 Ilias Apalodimas, Eric Dumazet, Simon Horman
Subject: [PATCH RFC v4 1/3] page_pool: fix timing for checking and disabling
 napi_local
Date: Wed, 20 Nov 2024 18:34:53 +0800
Message-ID: <20241120103456.396577-2-linyunsheng@huawei.com>
In-Reply-To: <20241120103456.396577-1-linyunsheng@huawei.com>
References: <20241120103456.396577-1-linyunsheng@huawei.com>
X-Mailing-List: netdev@vger.kernel.org

A page_pool page may be freed from skb_defer_free_flush() in softirq
context without being bound to any specific napi, which may cause a
use-after-free problem due to the time window below: CPU1 may still
access napi->list_owner after CPU0 has freed the napi memory.

             CPU 0                                   CPU1
      page_pool_destroy()                   skb_defer_free_flush()
               .                                       .
               .                        napi = READ_ONCE(pool->p.napi);
               .                                       .
 page_pool_disable_direct_recycling()                  .
     driver free napi memory                           .
               .                                       .
               .             napi && READ_ONCE(napi->list_owner) == cpuid
               .                                       .

Use the RCU mechanism to avoid the above problem.

Note, the above was found during code review of how to fix the problem
in [1].

1. https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/
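[Editor's note: a minimal sketch of the read/update pairing the fix relies on,
for readers following along. The helper names napi_is_local() and
pool_teardown() are hypothetical and everything except the RCU calls and the
two READ_ONCE() accesses has been stripped; the real code is in the diff below.]

/* Reader side (softirq): only dereference pool->p.napi inside an RCU
 * read-side critical section, so the napi cannot be freed underneath us.
 */
static bool napi_is_local(const struct page_pool *pool, u32 cpuid)
{
        const struct napi_struct *napi;
        bool local;

        rcu_read_lock();
        napi = READ_ONCE(pool->p.napi);
        local = napi && READ_ONCE(napi->list_owner) == cpuid;
        rcu_read_unlock();

        return local;
}

/* Updater side: clear the pointer first, then wait for every reader that
 * might still see the old value, and only then let the driver free the napi.
 */
static void pool_teardown(struct page_pool *pool)
{
        WRITE_ONCE(pool->p.napi, NULL); /* as page_pool_disable_direct_recycling() does */
        synchronize_rcu();              /* no reader can still hold the stale napi */
}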
Fixes: dd64b232deb8 ("page_pool: unlink from napi during destroy")
Signed-off-by: Yunsheng Lin
CC: Alexander Lobakin
Reviewed-by: Xuan Zhuo
---
 net/core/page_pool.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index f89cf93f6eb4..b3dae671eb26 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -795,6 +795,7 @@ __page_pool_put_page(struct page_pool *pool, netmem_ref netmem,
 static bool page_pool_napi_local(const struct page_pool *pool)
 {
         const struct napi_struct *napi;
+        bool napi_local;
         u32 cpuid;
 
         if (unlikely(!in_softirq()))
@@ -810,9 +811,15 @@ static bool page_pool_napi_local(const struct page_pool *pool)
         if (READ_ONCE(pool->cpuid) == cpuid)
                 return true;
 
+        /* Synchronized with page_pool_destroy() to avoid use-after-free
+         * for 'napi'.
+         */
+        rcu_read_lock();
         napi = READ_ONCE(pool->p.napi);
+        napi_local = napi && READ_ONCE(napi->list_owner) == cpuid;
+        rcu_read_unlock();
 
-        return napi && READ_ONCE(napi->list_owner) == cpuid;
+        return napi_local;
 }
 
 void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem,
@@ -1126,6 +1133,12 @@ void page_pool_destroy(struct page_pool *pool)
         if (!page_pool_release(pool))
                 return;
 
+        /* Paired with rcu lock in page_pool_napi_local() to ensure the
+         * clearing of pool->p.napi in page_pool_disable_direct_recycling()
+         * is seen before returning to the driver to free the napi instance.
+         */
+        synchronize_rcu();
+
         page_pool_detached(pool);
         pool->defer_start = jiffies;
         pool->defer_warn = jiffies + DEFER_WARN_INTERVAL;

From patchwork Wed Nov 20 10:34:54 2024
X-Patchwork-Submitter: Yunsheng Lin
X-Patchwork-Id: 13881009
X-Patchwork-Delegate: kuba@kernel.org
X-Patchwork-State: RFC
From: Yunsheng Lin
CC: Yunsheng Lin, Robin Murphy, Alexander Duyck, IOMMU,
 Jesper Dangaard Brouer, Ilias Apalodimas, Eric Dumazet, Simon Horman
Subject: [PATCH RFC v4 2/3] page_pool: fix IOMMU crash when driver has
 already unbound
Date: Wed, 20 Nov 2024 18:34:54 +0800
Message-ID: <20241120103456.396577-3-linyunsheng@huawei.com>
In-Reply-To: <20241120103456.396577-1-linyunsheng@huawei.com>
References: <20241120103456.396577-1-linyunsheng@huawei.com>
X-Mailing-List: netdev@vger.kernel.org

A networking driver with page_pool support may hand over a page that is
still DMA-mapped to the network stack, and reuse that page after the
network stack is done with it and passes it back to the page_pool, in
order to avoid the penalty of DMA mapping/unmapping. With all the
caching in the network stack, some pages may be held in the network
stack without being returned to the page_pool soon enough, and when a
VF disable causes the driver to unbind, the page_pool does not stop the
driver from finishing its unbinding work. Instead, the page_pool uses a
workqueue to periodically check whether some pages have come back from
the network stack, and if so, it does the DMA unmapping related cleanup
work.

As mentioned in [1], attempting DMA unmaps after the driver has already
unbound may leak resources or at worst corrupt memory. Fundamentally,
the page pool code cannot allow DMA mappings to outlive the driver they
belong to.

Currently there seem to be at least two cases where a page is not
released fast enough, causing the DMA unmapping to be done after the
driver has already unbound:

1. IPv4 packet defragmentation timeout: this seems to cause a delay of
   up to 30 secs.
2. skb_defer_free_flush(): this may cause an infinite delay if there is
   no trigger for net_rx_action().

In order not to call the DMA APIs to do DMA unmapping after the driver
has already unbound, and not to stall the unloading of the networking
driver, scan the inflight pages using the MM API and do the DMA
unmapping for those pages when page_pool_destroy() is called. The max
time of scanning the inflight pages is about 1.3 sec for a system with
over 300GB of memory, as mentioned in [3], which seems acceptable as
the scanning is only done when there are indeed some inflight pages and
is done in the slow path.

Note, the devmem patchset seems to make the bug harder to fix, and may
make backporting harder too. As there is no actual user of devmem and
the fix for devmem is unclear for now, this patch does not consider
fixing the devmem case yet.

1. https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/
2. https://github.com/netoptimizer/prototype-kernel
3. https://lore.kernel.org/all/17a24d69-7bf0-412c-a32a-b25d82bb4159@kernel.org/
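[Editor's note: a rough sketch of the scan described above. The function name
unmap_inflight_pages() is hypothetical; the real implementation is
page_pool_inflight_unmap() in the diff below, which additionally holds the
memory-hotplug and destroy locks and compares against the inflight count.]

/* Walk every populated zone and DMA-unmap any page that still belongs to
 * this pool, identified by the page_pool signature in pp_magic and the
 * page->pp back-pointer.
 */
static void unmap_inflight_pages(struct page_pool *pool)
{
        struct zone *zone;

        for_each_populated_zone(zone) {
                unsigned long pfn;

                for (pfn = zone->zone_start_pfn; pfn < zone_end_pfn(zone); pfn++) {
                        struct page *page = pfn_to_online_page(pfn);

                        if (!page || !page_count(page) ||
                            (page->pp_magic & ~0x3UL) != PP_SIGNATURE ||
                            page->pp != pool)
                                continue;

                        dma_unmap_page_attrs(pool->p.dev,
                                             page_pool_get_dma_addr(page),
                                             PAGE_SIZE << pool->p.order,
                                             pool->p.dma_dir,
                                             DMA_ATTR_SKIP_CPU_SYNC);
                        page_pool_set_dma_addr(page, 0);
                }
        }
}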
CC: Robin Murphy
CC: Alexander Duyck
CC: IOMMU
Fixes: f71fec47c2df ("page_pool: make sure struct device is stable")
Signed-off-by: Yunsheng Lin
Tested-by: Yonglong Liu
---
 include/net/page_pool/types.h |  6 ++-
 net/core/page_pool.c          | 95 ++++++++++++++++++++++++++++++-----
 2 files changed, 87 insertions(+), 14 deletions(-)

diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index c022c410abe3..7393fd45bc47 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -228,7 +228,11 @@ struct page_pool {
          */
         refcount_t user_cnt;
 
-        u64 destroy_cnt;
+        /* Lock to avoid doing dma unmapping concurrently when
+         * destroy_cnt > 0.
+         */
+        spinlock_t destroy_lock;
+        unsigned int destroy_cnt;
 
         /* Slow/Control-path information follows */
         struct page_pool_params_slow slow;

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index b3dae671eb26..33a314abbba4 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -272,9 +272,6 @@ static int page_pool_init(struct page_pool *pool,
         /* Driver calling page_pool_create() also call page_pool_destroy() */
         refcount_set(&pool->user_cnt, 1);
 
-        if (pool->dma_map)
-                get_device(pool->p.dev);
-
         if (pool->slow.flags & PP_FLAG_ALLOW_UNREADABLE_NETMEM) {
                 /* We rely on rtnl_lock()ing to make sure netdev_rx_queue
                  * configuration doesn't change while we're initializing
@@ -312,9 +309,6 @@ static void page_pool_uninit(struct page_pool *pool)
 {
         ptr_ring_cleanup(&pool->ring, NULL);
 
-        if (pool->dma_map)
-                put_device(pool->p.dev);
-
 #ifdef CONFIG_PAGE_POOL_STATS
         if (!pool->system)
                 free_percpu(pool->recycle_stats);
@@ -365,7 +359,7 @@ struct page_pool *page_pool_create(const struct page_pool_params *params)
 }
 EXPORT_SYMBOL(page_pool_create);
 
-static void page_pool_return_page(struct page_pool *pool, netmem_ref netmem);
+static void __page_pool_return_page(struct page_pool *pool, netmem_ref netmem);
 
 static noinline netmem_ref page_pool_refill_alloc_cache(struct page_pool *pool)
 {
@@ -403,7 +397,7 @@ static noinline netmem_ref page_pool_refill_alloc_cache(struct page_pool *pool)
                          * (2) break out to fallthrough to alloc_pages_node.
                          * This limit stress on page buddy alloactor.
                          */
-                        page_pool_return_page(pool, netmem);
+                        __page_pool_return_page(pool, netmem);
                         alloc_stat_inc(pool, waive);
                         netmem = 0;
                         break;
@@ -670,7 +664,7 @@ static __always_inline void __page_pool_release_page_dma(struct page_pool *pool,
  * a regular page (that will eventually be returned to the normal
  * page-allocator via put_page).
  */
-void page_pool_return_page(struct page_pool *pool, netmem_ref netmem)
+void __page_pool_return_page(struct page_pool *pool, netmem_ref netmem)
 {
         int count;
         bool put;
@@ -697,6 +691,27 @@ void page_pool_return_page(struct page_pool *pool, netmem_ref netmem)
          */
 }
 
+/* Called from the page_pool_put_*() path, needs to be synchronized with
+ * the page_pool_destroy() path.
+ */
+static void page_pool_return_page(struct page_pool *pool, netmem_ref netmem)
+{
+        unsigned int destroy_cnt;
+
+        rcu_read_lock();
+
+        destroy_cnt = READ_ONCE(pool->destroy_cnt);
+        if (unlikely(destroy_cnt)) {
+                spin_lock_bh(&pool->destroy_lock);
+                __page_pool_return_page(pool, netmem);
+                spin_unlock_bh(&pool->destroy_lock);
+        } else {
+                __page_pool_return_page(pool, netmem);
+        }
+
+        rcu_read_unlock();
+}
+
 static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem)
 {
         int ret;
@@ -924,7 +939,7 @@ static netmem_ref page_pool_drain_frag(struct page_pool *pool,
                 return netmem;
         }
 
-        page_pool_return_page(pool, netmem);
+        __page_pool_return_page(pool, netmem);
         return 0;
 }
 
@@ -938,7 +953,7 @@ static void page_pool_free_frag(struct page_pool *pool)
         if (!netmem || page_pool_unref_netmem(netmem, drain_count))
                 return;
 
-        page_pool_return_page(pool, netmem);
+        __page_pool_return_page(pool, netmem);
 }
 
 netmem_ref page_pool_alloc_frag_netmem(struct page_pool *pool,
@@ -1045,7 +1060,7 @@ static void page_pool_empty_alloc_cache_once(struct page_pool *pool)
 static void page_pool_scrub(struct page_pool *pool)
 {
         page_pool_empty_alloc_cache_once(pool);
-        pool->destroy_cnt++;
+        WRITE_ONCE(pool->destroy_cnt, pool->destroy_cnt + 1);
 
         /* No more consumers should exist, but producers could still
          * be in-flight.
@@ -1119,6 +1134,58 @@ void page_pool_disable_direct_recycling(struct page_pool *pool)
 }
 EXPORT_SYMBOL(page_pool_disable_direct_recycling);
 
+static void page_pool_inflight_unmap(struct page_pool *pool)
+{
+        unsigned int unmapped = 0;
+        struct zone *zone;
+        int inflight;
+
+        if (!pool->dma_map || pool->mp_priv)
+                return;
+
+        get_online_mems();
+        spin_lock_bh(&pool->destroy_lock);
+
+        inflight = page_pool_inflight(pool, false);
+        for_each_populated_zone(zone) {
+                unsigned long end_pfn = zone_end_pfn(zone);
+                unsigned long pfn;
+
+                for (pfn = zone->zone_start_pfn; pfn < end_pfn; pfn++) {
+                        struct page *page = pfn_to_online_page(pfn);
+
+                        if (!page || !page_count(page) ||
+                            (page->pp_magic & ~0x3UL) != PP_SIGNATURE ||
+                            page->pp != pool)
+                                continue;
+
+                        dma_unmap_page_attrs(pool->p.dev,
+                                             page_pool_get_dma_addr(page),
+                                             PAGE_SIZE << pool->p.order,
+                                             pool->p.dma_dir,
+                                             DMA_ATTR_SKIP_CPU_SYNC |
+                                             DMA_ATTR_WEAK_ORDERING);
+                        page_pool_set_dma_addr(page, 0);
+
+                        unmapped++;
+
+                        /* Skip scanning all pages when debug is disabled */
+                        if (!IS_ENABLED(CONFIG_DEBUG_NET) &&
+                            inflight == unmapped)
+                                goto out;
+                }
+        }
+
+out:
+        WARN_ONCE(page_pool_inflight(pool, false) != unmapped,
+                  "page_pool(%u): unmapped(%u) != inflight pages(%d)\n",
+                  pool->user.id, unmapped, inflight);
+
+        pool->dma_map = false;
+        spin_unlock_bh(&pool->destroy_lock);
+        put_online_mems();
+}
+
 void page_pool_destroy(struct page_pool *pool)
 {
         if (!pool)
@@ -1139,6 +1206,8 @@ void page_pool_destroy(struct page_pool *pool)
          */
         synchronize_rcu();
 
+        page_pool_inflight_unmap(pool);
+
         page_pool_detached(pool);
         pool->defer_start = jiffies;
         pool->defer_warn = jiffies + DEFER_WARN_INTERVAL;
@@ -1159,7 +1228,7 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid)
         /* Flush pool alloc cache, as refill will check NUMA node */
         while (pool->alloc.count) {
                 netmem = pool->alloc.cache[--pool->alloc.count];
-                page_pool_return_page(pool, netmem);
+                __page_pool_return_page(pool, netmem);
         }
 }
 EXPORT_SYMBOL(page_pool_update_nid);

From patchwork Wed Nov 20 10:34:55 2024
X-Patchwork-Submitter: Yunsheng Lin
X-Patchwork-Id: 13881010
X-Patchwork-Delegate: kuba@kernel.org
X-Patchwork-State: RFC
From: Yunsheng Lin
CC: Yunsheng Lin, Robin Murphy, Alexander Duyck, Andrew Morton, IOMMU, MM,
 Jesper Dangaard Brouer, Ilias Apalodimas, Eric Dumazet, Simon Horman
Subject: [PATCH RFC v4 3/3] page_pool: skip dma sync operation for inflight
 pages
Date: Wed, 20 Nov 2024 18:34:55 +0800
Message-ID: <20241120103456.396577-4-linyunsheng@huawei.com>
In-Reply-To: <20241120103456.396577-1-linyunsheng@huawei.com>
References: <20241120103456.396577-1-linyunsheng@huawei.com>
X-Mailing-List: netdev@vger.kernel.org

Skip the DMA sync operation for inflight pages before page_pool_destroy()
returns to the driver, as the DMA API expects to be called with a valid
device bound to a driver, as mentioned in [1].

After page_pool_destroy() is called, a page is not expected to be
recycled back to the pool->alloc cache, and the DMA sync operation is
not needed when the page is not recyclable or pool->ring is full. So
only skip the DMA sync operation for the inflight pages by clearing
pool->dma_sync and doing the sync under the protection of the RCU lock
when a page is recycled to pool->ring, to ensure that no DMA sync
operation is called after page_pool_destroy() has returned.

1. https://lore.kernel.org/all/caf31b5e-0e8f-4844-b7ba-ef59ed13b74e@arm.com/
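[Editor's note: the ordering this depends on can be sketched with the two
hypothetical helpers below; recycle_sync() and destroy_quiesce() are
illustrative names only. In the actual diff the pool->dma_sync check stays
inside page_pool_dma_sync_for_device(), so the recycle paths merely gain the
rcu_read_lock()/rcu_read_unlock() pair.]

/* Recycle path: after the page has been produced into pool->ring, do the
 * DMA sync inside an RCU read-side section, gated on pool->dma_sync.
 */
static void recycle_sync(struct page_pool *pool, netmem_ref netmem,
                         unsigned int dma_sync_size)
{
        rcu_read_lock();
        if (pool->dma_sync)
                page_pool_dma_sync_for_device(pool, netmem, dma_sync_size);
        rcu_read_unlock();
}

/* Destroy path: clear the flag, then wait for all readers that might still
 * be inside the critical section above; after that no DMA sync can start,
 * so it is safe to return to the driver and unmap the inflight pages.
 */
static void destroy_quiesce(struct page_pool *pool)
{
        pool->dma_sync = false;
        synchronize_rcu();
}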
CC: Robin Murphy
CC: Alexander Duyck
CC: Andrew Morton
CC: IOMMU
CC: MM
Fixes: f71fec47c2df ("page_pool: make sure struct device is stable")
Signed-off-by: Yunsheng Lin
---
 net/core/page_pool.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 33a314abbba4..0bde7c6c781a 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -712,7 +712,8 @@ static void page_pool_return_page(struct page_pool *pool, netmem_ref netmem)
         rcu_read_unlock();
 }
 
-static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem)
+static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem,
+                                      unsigned int dma_sync_size)
 {
         int ret;
         /* BH protection not needed if current is softirq */
@@ -723,10 +724,13 @@ static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem)
 
         if (!ret) {
                 recycle_stat_inc(pool, ring);
-                return true;
+
+                rcu_read_lock();
+                page_pool_dma_sync_for_device(pool, netmem, dma_sync_size);
+                rcu_read_unlock();
         }
 
-        return false;
+        return !ret;
 }
 
 /* Only allow direct recycling in special circumstances, into the
@@ -779,10 +783,11 @@ __page_pool_put_page(struct page_pool *pool, netmem_ref netmem,
 
         if (likely(__page_pool_page_can_be_recycled(netmem))) {
                 /* Read barrier done in page_ref_count / READ_ONCE */
-                page_pool_dma_sync_for_device(pool, netmem, dma_sync_size);
-
-                if (allow_direct && page_pool_recycle_in_cache(netmem, pool))
+                if (allow_direct && page_pool_recycle_in_cache(netmem, pool)) {
+                        page_pool_dma_sync_for_device(pool, netmem,
+                                                      dma_sync_size);
                         return 0;
+                }
 
                 /* Page found as candidate for recycling */
                 return netmem;
@@ -845,7 +850,7 @@ void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem,
 
         netmem = __page_pool_put_page(pool, netmem, dma_sync_size,
                                       allow_direct);
-        if (netmem && !page_pool_recycle_in_ring(pool, netmem)) {
+        if (netmem && !page_pool_recycle_in_ring(pool, netmem, dma_sync_size)) {
                 /* Cache full, fallback to free pages */
                 recycle_stat_inc(pool, ring_full);
                 page_pool_return_page(pool, netmem);
@@ -903,14 +908,18 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data,
 
         /* Bulk producer into ptr_ring page_pool cache */
         in_softirq = page_pool_producer_lock(pool);
+        rcu_read_lock();
         for (i = 0; i < bulk_len; i++) {
                 if (__ptr_ring_produce(&pool->ring, data[i])) {
                         /* ring full */
                         recycle_stat_inc(pool, ring_full);
                         break;
                 }
+                page_pool_dma_sync_for_device(pool, (__force netmem_ref)data[i],
+                                              -1);
         }
         recycle_stat_add(pool, ring, i);
+        rcu_read_unlock();
         page_pool_producer_unlock(pool, in_softirq);
 
         /* Hopefully all pages was return into ptr_ring */
@@ -1200,6 +1209,8 @@ void page_pool_destroy(struct page_pool *pool)
         if (!page_pool_release(pool))
                 return;
 
+        pool->dma_sync = false;
+
         /* Paired with rcu lock in page_pool_napi_local() to ensure the
          * clearing of pool->p.napi in page_pool_disable_direct_recycling()
          * is seen before returning to the driver to free the napi instance.