From patchwork Mon Mar 25 17:14:01 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13602531
From: Kairui Song
To: linux-mm@kvack.org
Cc: Matthew Wilcox, Andrew Morton, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 0/4] mm/filemap: optimize folio adding and splitting
Date: Tue, 26 Mar 2024 01:14:01 +0800
Message-ID: <20240325171405.99971-1-ryncsn@gmail.com>
X-Mailer: git-send-email 2.43.0
Reply-To: Kairui Song
Sender: owner-linux-mm@kvack.org
Precedence: bulk

From: Kairui Song

Currently, at least 3 tree walks are
needed for filemap folio adding if the folio was previously evicted: one
to get the order of the current slot, one for the ranged conflict check,
and one more to retrieve the order again. If a split is needed, even
more walks are required.

This series merges these walks to speed up filemap_add_folio. I see a
7.4% - 12.5% performance gain in the fio stress tests below.

So instead of doing multiple tree walks, do one optimistic range check
with the lock held, and exit if raced with another insertion. If a shadow
exists, check it with a new xas_get_order helper before releasing the
lock to avoid redundant tree walks for getting its order. Drop the lock
and do the allocation only if a split is needed.

In the best case, it only needs to walk the tree once. If it needs to
allocate and split, 3 walks are issued (one for the first ranged
conflict check and order retrieval, one for the second check after
allocation, and one for the insert after the split).

Testing with 4K pages, in an 8G cgroup, with 16G brd as block device:

  echo 3 > /proc/sys/vm/drop_caches

  fio -name=cached --numjobs=16 --filename=/mnt/test.img \
    --buffered=1 --ioengine=mmap --rw=randread --time_based \
    --ramp_time=30s --runtime=5m --group_reporting

Before:
bw (  MiB/s): min= 1027, max= 3520, per=100.00%, avg=2445.02, stdev=18.90, samples=8691
iops        : min=263001, max=901288, avg=625924.36, stdev=4837.28, samples=8691

After (+7.4%):
bw (  MiB/s): min=  493, max= 3947, per=100.00%, avg=2625.56, stdev=25.74, samples=8651
iops        : min=126454, max=1010681, avg=672142.61, stdev=6590.48, samples=8651

Test result with THP (do a THP randread, then switch to 4K pages in the
hope that it issues a lot of splitting):

  echo 3 > /proc/sys/vm/drop_caches

  fio -name=cached --numjobs=16 --filename=/mnt/test.img \
    --buffered=1 --ioengine=mmap -thp=1 --readonly \
    --rw=randread --time_based --ramp_time=30s --runtime=10m \
    --group_reporting

  fio -name=cached --numjobs=16 --filename=/mnt/test.img \
    --buffered=1 --ioengine=mmap \
    --rw=randread --time_based --runtime=5s --group_reporting

Before:
bw (  KiB/s): min= 4141, max=14202, per=100.00%, avg=7935.51, stdev=96.85, samples=18976
iops        : min= 1029, max= 3548, avg=1979.52, stdev=24.23, samples=18976
READ: bw=4545B/s (4545B/s), 4545B/s-4545B/s (4545B/s-4545B/s), io=64.0KiB (65.5kB), run=14419-14419msec

After (+12.5%):
bw (  KiB/s): min= 4611, max=15370, per=100.00%, avg=8928.74, stdev=105.17, samples=19146
iops        : min= 1151, max= 3842, avg=2231.27, stdev=26.29, samples=19146
READ: bw=4635B/s (4635B/s), 4635B/s-4635B/s (4635B/s-4635B/s), io=64.0KiB (65.5kB), run=14137-14137msec

The performance is better for both 4K (+7.4%) and THP (+12.5%) cached
read.

V1: https://lore.kernel.org/lkml/20240319092733.4501-1-ryncsn@gmail.com/

Updates from V1:
- Collect Acks.
- Add tests for the new xas_get_order and for combined usage of
  xas_get_order with xas_for_each_conflict.
- Fix a memleak in patch 4/4 and modify the function in place instead
  of adding a new helper.
- Update the benchmark. I forgot to drop the cache and disable THP for
  the previous test, so that result was for mixed usage of split and
  add. The result is even better now.

Kairui Song (4):
  mm/filemap: return early if failed to allocate memory for split
  mm/filemap: clean up hugetlb exclusion code
  lib/xarray: introduce a new helper xas_get_order
  mm/filemap: optimize filemap folio adding

 include/linux/xarray.h |  6 +++
 lib/test_xarray.c      | 89 ++++++++++++++++++++++++++++++++++++++++++
 lib/xarray.c           | 49 ++++++++++++++---------
 mm/filemap.c           | 74 +++++++++++++++++++++++------------
 4 files changed, 175 insertions(+), 43 deletions(-)
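
The percentage gains in the benchmark section can be recomputed from the average bandwidth values in the fio output. The snippet below is only a small sanity-check sketch and not part of the series: the `RESULTS` table and the `gain_percent` helper are names made up for illustration, and the numbers are copied from the "avg=" fields of the "bw" lines quoted above.

```python
# Hypothetical helper for re-deriving the relative gains; the figures are
# the "avg=" bandwidth values from the fio output in this cover letter.
RESULTS = {
    "4K cached read (MiB/s)": (2445.02, 2625.56),
    "THP then 4K cached read (KiB/s)": (7935.51, 8928.74),
}


def gain_percent(before: float, after: float) -> float:
    """Return the relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for name, (before, after) in RESULTS.items():
        print(f"{name}: {gain_percent(before, after):+.1f}%")
```

Run as a script, it prints the gain for each test rounded to one decimal place, which comes out at roughly +7.4% for the 4K run and +12.5% for the THP run.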