From patchwork Mon Jan 27 02:57:29 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yunsheng Lin
X-Patchwork-Id: 13950944
From: Yunsheng Lin
CC: Yunsheng Lin, Alexander Lobakin, Robin Murphy, Alexander Duyck,
    Andrew Morton, IOMMU, MM, Alexei Starovoitov, Daniel Borkmann,
    Jesper Dangaard Brouer, John Fastabend, Matthias Brugger,
    AngeloGioacchino Del Regno
Subject: [RFC v8 0/5] fix two bugs related to page_pool
Date: Mon, 27 Jan 2025 10:57:29 +0800
Message-ID: <20250127025734.3406167-1-linyunsheng@huawei.com>
X-Mailer: git-send-email 2.30.0
Sender: owner-linux-mm@kvack.org
Precedence: bulk

This patchset fixes a possible time window problem for page_pool and
the DMA API misuse problem mentioned in
[1], and tries to avoid the overhead of the fix with some optimizations.

From the performance data below, the overhead for
time_bench_page_pool01_fast_path() and time_bench_page_pool02_ptr_ring()
is hard to distinguish from run-to-run variation on the arm64 server and
is less than 1 ns on the x86 server. There is about 10~20 ns of overhead
for time_bench_page_pool03_slow(). See [2] for more detail.

arm64 server:
Before this patchset:
        fast_path      ptr_ring        slow
     1. 31.171 ns      60.980 ns     164.917 ns
     2. 28.824 ns      60.891 ns     170.241 ns
     3. 14.236 ns      60.583 ns     164.355 ns

With patchset:
     6. 26.163 ns      53.781 ns     189.450 ns
     7. 26.189 ns      53.798 ns     189.466 ns

X86 server:
| Test name  |Cycles |   1-5 |    | Nanosec |    1-5 |        |      % |
| (tasklet_*)|Before | After |diff|  Before |  After |   diff | change |
|------------+-------+-------+----+---------+--------+--------+--------|
| fast_path  |    19 |    19 |   0|   5.399 |  5.492 |  0.093 |    1.7 |
| ptr_ring   |    54 |    57 |   3|  15.090 | 15.849 |  0.759 |    5.0 |
| slow       |   238 |   284 |  46|  66.134 | 78.909 | 12.775 |   19.3 |

About 16 bytes of additional memory are also needed for each
page_pool-owned page to fix the DMA API misuse problem.

1. https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/
2. https://lore.kernel.org/all/f558df7a-d983-4fc5-8358-faf251994d23@kernel.org/

CC: Alexander Lobakin
CC: Robin Murphy
CC: Alexander Duyck
CC: Andrew Morton
CC: IOMMU
CC: MM

Change log:
V8:
  1. Drop the last 3 patches as they cause observable performance
     degradation on x86 systems.
  2. Remove the rcu read lock in page_pool_napi_local().
  3. Rename the item functions more consistently.

V7:
  1. Fix a use-after-free bug reported by KASAN as mentioned by Jakub.
  2. Fix the 'netmem' variable not being set up correctly, as mentioned
     by Simon.

V6:
  1. Repost based on latest net-next.
  2. Rename page_pool_to_pp() to page_pool_get_pp().

V5:
  1. Support an unlimited number of inflight pages.
  2. Add some optimizations to avoid the overhead of fixing the bug.

V4:
  1. Use scanning to do the unmapping.
  2. Split dma sync skipping into a separate patch.

V3:
  1. Target the net-next tree instead of the net tree.
  2. Narrow the rcu lock as discussed in v2.
  3. Check the unmapping cnt against the inflight cnt.

V2:
  1. Add an item_full stat.
  2. Use container_of() for page_pool_to_pp().

Yunsheng Lin (5):
  page_pool: introduce page_pool_get_pp() API
  page_pool: fix timing for checking and disabling napi_local
  page_pool: fix IOMMU crash when driver has already unbound
  page_pool: support unlimited number of inflight pages
  page_pool: skip dma sync operation for inflight pages

 drivers/net/ethernet/freescale/fec_main.c     |   8 +-
 .../ethernet/google/gve/gve_buffer_mgmt_dqo.c |   2 +-
 drivers/net/ethernet/intel/iavf/iavf_txrx.c   |   6 +-
 drivers/net/ethernet/intel/idpf/idpf_txrx.c   |  14 +-
 drivers/net/ethernet/intel/libeth/rx.c        |   2 +-
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |   3 +-
 drivers/net/netdevsim/netdev.c                |   6 +-
 drivers/net/wireless/mediatek/mt76/mt76.h     |   2 +-
 include/linux/mm_types.h                      |   2 +-
 include/linux/skbuff.h                        |   1 +
 include/net/libeth/rx.h                       |   3 +-
 include/net/netmem.h                          |  22 +-
 include/net/page_pool/helpers.h               |  15 +
 include/net/page_pool/types.h                 |  46 +-
 net/core/devmem.c                             |   4 +-
 net/core/netmem_priv.h                        |   5 +-
 net/core/page_pool.c                          | 425 ++++++++++++++++--
 net/core/page_pool_priv.h                     |  10 +-
 net/core/xdp.c                                |   3 +-
 19 files changed, 500 insertions(+), 79 deletions(-)
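
For readers who want to re-derive the "diff" and "% change" columns of the x86 table from its Before/After nanosecond values, a minimal sketch follows. It is illustrative only and not part of the patchset; struct bench_row, overhead_ns() and overhead_pct() are hypothetical names introduced here.

```c
#include <assert.h>

/* One row of the x86 table: nanoseconds per operation (hypothetical type). */
struct bench_row {
	const char *name;
	double before_ns;	/* without the patchset */
	double after_ns;	/* with the patchset */
};

/* Absolute overhead in nanoseconds (the "diff" column). */
double overhead_ns(const struct bench_row *row)
{
	return row->after_ns - row->before_ns;
}

/* Overhead relative to the "Before" value (the "% change" column). */
double overhead_pct(const struct bench_row *row)
{
	assert(row->before_ns > 0.0);
	return 100.0 * overhead_ns(row) / row->before_ns;
}
```

For the slow row (66.134 ns before, 78.909 ns after) this gives a diff of 12.775 ns, which is about 19.3% of the Before value and matches the table.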