From patchwork Tue Mar 19 09:27:33 2024
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13596503
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Matthew Wilcox, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 4/4] mm/filemap: optimize filemap folio adding
Date: Tue, 19 Mar 2024 17:27:33 +0800
Message-ID: <20240319092733.4501-5-ryncsn@gmail.com>
In-Reply-To: <20240319092733.4501-1-ryncsn@gmail.com>
References: <20240319092733.4501-1-ryncsn@gmail.com>
Reply-To: Kairui Song
MIME-Version: 1.0
From: Kairui Song

Instead of doing multiple tree walks, do one optimistic range check
with the lock held, and exit if raced with another insertion. If a
shadow exists, check it with a new xas_get_order helper before releasing
the lock to avoid redundant tree walks for getting its order.

Drop the lock and do the allocation only if a split is needed.

In the best case, it only needs to walk the tree once.
If it needs to alloc and split, 3 walks are issued (one for the first
ranged conflict check and order retrieval, one for the second check
after allocation, one for the insert after the split).

Testing with 4K pages, in an 8G cgroup, with 20G brd as block device:

  fio -name=cached --numjobs=16 --filename=/mnt/test.img \
    --buffered=1 --ioengine=mmap --rw=randread --time_based \
    --ramp_time=30s --runtime=5m --group_reporting

Before:
  bw (  MiB/s): min=  790, max= 3665, per=100.00%, avg=2499.17, stdev=20.64, samples=8698
  iops        : min=202295, max=938417, avg=639785.81, stdev=5284.08, samples=8698

After (+4%):
  bw (  MiB/s): min=  451, max= 3868, per=100.00%, avg=2599.83, stdev=23.39, samples=8653
  iops        : min=115596, max=990364, avg=665556.34, stdev=5988.20, samples=8653

Test result with THP (do a THP randread then switch to 4K page in hope
it issues a lot of splitting):

  fio -name=cached --numjobs=16 --filename=/mnt/test.img \
    --buffered=1 --ioengine mmap -thp=1 --readonly \
    --rw=randread --random_distribution=random \
    --time_based --runtime=5m --group_reporting

  fio -name=cached --numjobs=16 --filename=/mnt/test.img \
    --buffered=1 --ioengine mmap --readonly \
    --rw=randread --random_distribution=random \
    --time_based --runtime=5s --group_reporting

Before:
  bw (  KiB/s): min=28071, max=62359, per=100.00%, avg=53542.44, stdev=179.77, samples=9520
  iops        : min= 7012, max=15586, avg=13379.39, stdev=44.94, samples=9520
  bw (  MiB/s): min= 2457, max= 6193, per=100.00%, avg=3923.21, stdev=82.48, samples=144
  iops        : min=629220, max=1585642, avg=1004340.78, stdev=21116.07, samples=144

After (+-0.0%):
  bw (  KiB/s): min=30561, max=63064, per=100.00%, avg=53635.82, stdev=177.21, samples=9520
  iops        : min= 7636, max=15762, avg=13402.82, stdev=44.29, samples=9520
  bw (  MiB/s): min= 2449, max= 6145, per=100.00%, avg=3914.68, stdev=81.15, samples=144
  iops        : min=627106, max=1573156, avg=1002158.11, stdev=20774.77, samples=144

The performance is better (+4%) for 4K cached read and unchanged for THP.
Signed-off-by: Kairui Song
---
 mm/filemap.c | 127 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 76 insertions(+), 51 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 6bbec8783793..c1484bcdbddb 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -848,12 +848,77 @@ void replace_page_cache_folio(struct folio *old, struct folio *new)
 }
 EXPORT_SYMBOL_GPL(replace_page_cache_folio);
 
+static int __split_add_folio_locked(struct xa_state *xas, struct folio *folio,
+		pgoff_t index, gfp_t gfp, void **shadowp)
+{
+	void *entry, *shadow, *alloced_shadow = NULL;
+	int order, alloced_order = 0;
+
+	gfp &= GFP_RECLAIM_MASK;
+	for (;;) {
+		shadow = NULL;
+		order = 0;
+
+		xas_for_each_conflict(xas, entry) {
+			if (!xa_is_value(entry))
+				return -EEXIST;
+			shadow = entry;
+		}
+
+		if (shadow) {
+			if (shadow == xas_reload(xas)) {
+				order = xas_get_order(xas);
+				if (order && order > folio_order(folio)) {
+					/* entry may have been split before we acquired lock */
+					if (shadow != alloced_shadow || order != alloced_order)
+						goto unlock;
+					xas_split(xas, shadow, order);
+					xas_reset(xas);
+				}
+				order = 0;
+			}
+			if (shadowp)
+				*shadowp = shadow;
+		}
+
+		xas_store(xas, folio);
+		/* Success, return with mapping locked */
+		if (!xas_error(xas))
+			return 0;
+unlock:
+		/*
+		 * Unlock path, if errored, return unlocked.
+		 * If allocation needed, alloc and retry.
+		 */
+		xas_unlock_irq(xas);
+		if (order) {
+			if (unlikely(alloced_order))
+				xas_destroy(xas);
+			xas_split_alloc(xas, shadow, order, gfp);
+			if (!xas_error(xas)) {
+				alloced_shadow = shadow;
+				alloced_order = order;
+			}
+			goto next;
+		}
+		/* xas_nomem result checked by xas_error below */
+		xas_nomem(xas, gfp);
+next:
+		xas_lock_irq(xas);
+		if (xas_error(xas))
+			return xas_error(xas);
+
+		xas_reset(xas);
+	}
+}
+
 noinline int __filemap_add_folio(struct address_space *mapping,
 		struct folio *folio, pgoff_t index, gfp_t gfp, void **shadowp)
 {
 	XA_STATE(xas, &mapping->i_pages, index);
 	bool huge = folio_test_hugetlb(folio);
 	long nr;
+	int ret;
 
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
 	VM_BUG_ON_FOLIO(folio_test_swapbacked(folio), folio);
@@ -863,70 +928,30 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 	xas_set_order(&xas, index, folio_order(folio));
 	nr = folio_nr_pages(folio);
 
-	gfp &= GFP_RECLAIM_MASK;
 	folio_ref_add(folio, nr);
 	folio->mapping = mapping;
 	folio->index = xas.xa_index;
-	do {
-		unsigned int order = xa_get_order(xas.xa, xas.xa_index);
-		void *entry, *old = NULL;
-
-		if (order > folio_order(folio)) {
-			xas_split_alloc(&xas, xa_load(xas.xa, xas.xa_index),
-					order, gfp);
-			if (xas_error(&xas))
-				goto error;
-		}
-		xas_lock_irq(&xas);
-		xas_for_each_conflict(&xas, entry) {
-			old = entry;
-			if (!xa_is_value(entry)) {
-				xas_set_err(&xas, -EEXIST);
-				goto unlock;
-			}
-		}
-
-		if (old) {
-			if (shadowp)
-				*shadowp = old;
-			/* entry may have been split before we acquired lock */
-			order = xa_get_order(xas.xa, xas.xa_index);
-			if (order > folio_order(folio)) {
-				/* How to handle large swap entries? */
-				BUG_ON(shmem_mapping(mapping));
-				xas_split(&xas, old, order);
-				xas_reset(&xas);
-			}
-		}
-
-		xas_store(&xas, folio);
-		if (xas_error(&xas))
-			goto unlock;
-
+	xas_lock_irq(&xas);
+	ret = __split_add_folio_locked(&xas, folio, index, gfp, shadowp);
+	if (likely(!ret)) {
 		mapping->nrpages += nr;
-
-		/* hugetlb pages do not participate in page cache accounting */
 		if (!huge) {
 			__lruvec_stat_mod_folio(folio, NR_FILE_PAGES, nr);
 			if (folio_test_pmd_mappable(folio))
 				__lruvec_stat_mod_folio(folio, NR_FILE_THPS, nr);
 		}
-unlock:
 		xas_unlock_irq(&xas);
-	} while (xas_nomem(&xas, gfp));
-
-	if (xas_error(&xas))
-		goto error;
+		trace_mm_filemap_add_to_page_cache(folio);
+	} else {
+		xas_unlock_irq(&xas);
+		folio->mapping = NULL;
+		/* Leave page->index set: truncation relies upon it */
+		folio_put_refs(folio, nr);
+	}
 
-	trace_mm_filemap_add_to_page_cache(folio);
-	return 0;
-error:
-	folio->mapping = NULL;
-	/* Leave page->index set: truncation relies upon it */
-	folio_put_refs(folio, nr);
-	return xas_error(&xas);
+	return ret;
 }
 ALLOW_ERROR_INJECTION(__filemap_add_folio, ERRNO);