From patchwork Tue Apr 16 07:17:22 2024
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13631451
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Matthew Wilcox, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v4 4/4] mm/filemap: optimize filemap folio adding
Date: Tue, 16 Apr 2024 15:17:22 +0800
Message-ID: <20240416071722.45997-5-ryncsn@gmail.com>
X-Mailer: git-send-email 2.44.0
In-Reply-To: <20240416071722.45997-1-ryncsn@gmail.com>
References: <20240416071722.45997-1-ryncsn@gmail.com>
Reply-To: Kairui Song
MIME-Version: 1.0

From: Kairui Song

Instead of doing multiple tree walks, do one optimistic range check
with the lock held, and exit if raced with another insertion. If a
shadow exists, check it with the new xas_get_order() helper before
releasing the lock to avoid redundant tree walks for getting its
order.

Drop the lock and do the allocation only if a split is needed.

In the best case, it only needs to walk the tree once. If it needs
to alloc and split, 3 walks are issued (one for the first ranged
conflict check and order retrieval, one for the second check after
allocation, one for the insert after the split).

Testing with 4K pages, in an 8G cgroup, with 16G brd as block device:

  echo 3 > /proc/sys/vm/drop_caches

  fio -name=cached --numjobs=16 --filename=/mnt/test.img \
    --buffered=1 --ioengine=mmap --rw=randread --time_based \
    --ramp_time=30s --runtime=5m --group_reporting

Before:
  bw (  MiB/s): min= 1027, max= 3520, per=100.00%, avg=2445.02,
                stdev=18.90, samples=8691
  iops        : min=263001, max=901288, avg=625924.36, stdev=4837.28,
                samples=8691

After (+7.3%):
  bw (  MiB/s): min=  493, max= 3947, per=100.00%, avg=2625.56,
                stdev=25.74, samples=8651
  iops        : min=126454, max=1010681, avg=672142.61, stdev=6590.48,
                samples=8651

Test result with THP (do a THP randread then switch to 4K pages in
the hope that it issues a lot of splitting):

  echo 3 > /proc/sys/vm/drop_caches

  fio -name=cached --numjobs=16 --filename=/mnt/test.img \
    --buffered=1 --ioengine=mmap -thp=1 --readonly \
    --rw=randread --time_based --ramp_time=30s --runtime=10m \
    --group_reporting

  fio -name=cached --numjobs=16 --filename=/mnt/test.img \
    --buffered=1 --ioengine=mmap \
    --rw=randread --time_based --runtime=5s --group_reporting

Before:
  bw ( KiB/s): min= 4141, max=14202, per=100.00%, avg=7935.51,
               stdev=96.85, samples=18976
  iops       : min= 1029, max= 3548, avg=1979.52, stdev=24.23,
               samples=18976
  READ: bw=4545B/s (4545B/s), 4545B/s-4545B/s (4545B/s-4545B/s),
        io=64.0KiB (65.5kB), run=14419-14419msec

After (+12.5%):
  bw ( KiB/s): min= 4611, max=15370, per=100.00%, avg=8928.74,
               stdev=105.17, samples=19146
  iops       : min= 1151, max= 3842, avg=2231.27, stdev=26.29,
               samples=19146
  READ: bw=4635B/s (4635B/s), 4635B/s-4635B/s (4635B/s-4635B/s),
        io=64.0KiB (65.5kB), run=14137-14137msec

The performance is better for both 4K (+7.3%) and THP (+12.5%)
cached read.
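
In outline, the new loop in __filemap_add_folio() works like the
sketch below (a simplified rendering of the diff that follows; the
staleness re-check of a pre-allocated split, hugetlb accounting and
error paths are trimmed, and fail() is only a placeholder for the
error path):

	for (;;) {
		int order = -1, split_order = 0;

		/* Walk 1: conflict check over the whole range, locked. */
		xas_lock_irq(&xas);
		xas_for_each_conflict(&xas, entry) {
			old = entry;
			if (!xa_is_value(entry))     /* a real folio: raced */
				fail(-EEXIST);
			if (order == -1)             /* a shadow: read its  */
				order = xas_get_order(&xas); /* order in place */
		}
		if (old && order > folio_order(folio) && !alloced_order) {
			split_order = order;         /* must allocate first */
		} else {
			if (old && order > folio_order(folio))
				xas_split(&xas, old, order); /* preallocated */
			xas_store(&xas, folio);      /* the actual insert */
		}
		xas_unlock_irq(&xas);

		if (split_order) {                   /* allocate unlocked */
			xas_split_alloc(&xas, old, split_order, gfp);
			alloced_shadow = old;
			alloced_order = split_order;
			continue;                    /* walks 2 and 3 follow */
		}
		if (!xas_nomem(&xas, gfp))
			break;
	}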
Signed-off-by: Kairui Song
---
 lib/test_xarray.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/filemap.c      | 56 ++++++++++++++++++++++++++++++++------------
 2 files changed, 100 insertions(+), 15 deletions(-)

diff --git a/lib/test_xarray.c b/lib/test_xarray.c
index 0efde8f93490..8732a311f613 100644
--- a/lib/test_xarray.c
+++ b/lib/test_xarray.c
@@ -2017,6 +2017,64 @@ static noinline void check_xas_get_order(struct xarray *xa)
 	}
 }
 
+static noinline void check_xas_conflict_get_order(struct xarray *xa)
+{
+	XA_STATE(xas, xa, 0);
+
+	void *entry;
+	int only_once;
+	unsigned int max_order = IS_ENABLED(CONFIG_XARRAY_MULTI) ? 20 : 1;
+	unsigned int order;
+	unsigned long i, j, k;
+
+	for (order = 0; order < max_order; order++) {
+		for (i = 0; i < 10; i++) {
+			xas_set_order(&xas, i << order, order);
+			do {
+				xas_lock(&xas);
+				xas_store(&xas, xa_mk_value(i));
+				xas_unlock(&xas);
+			} while (xas_nomem(&xas, GFP_KERNEL));
+
+			/*
+			 * Ensure xas_get_order works with xas_for_each_conflict.
+			 */
+			j = i << order;
+			for (k = 0; k < order; k++) {
+				only_once = 0;
+				xas_set_order(&xas, j + (1 << k), k);
+				xas_lock(&xas);
+				xas_for_each_conflict(&xas, entry) {
+					XA_BUG_ON(xa, entry != xa_mk_value(i));
+					XA_BUG_ON(xa, xas_get_order(&xas) != order);
+					only_once++;
+				}
+				XA_BUG_ON(xa, only_once != 1);
+				xas_unlock(&xas);
+			}
+
+			if (order < max_order - 1) {
+				only_once = 0;
+				xas_set_order(&xas, (i & ~1UL) << order, order + 1);
+				xas_lock(&xas);
+				xas_for_each_conflict(&xas, entry) {
+					XA_BUG_ON(xa, entry != xa_mk_value(i));
+					XA_BUG_ON(xa, xas_get_order(&xas) != order);
+					only_once++;
+				}
+				XA_BUG_ON(xa, only_once != 1);
+				xas_unlock(&xas);
+			}
+
+			xas_set_order(&xas, i << order, order);
+			xas_lock(&xas);
+			xas_store(&xas, NULL);
+			xas_unlock(&xas);
+		}
+	}
+}
+
+
 static noinline void check_destroy(struct xarray *xa)
 {
 	unsigned long index;
@@ -2069,6 +2127,7 @@ static int xarray_checks(void)
 	check_multi_store_advanced(&array);
 	check_get_order(&array);
 	check_xas_get_order(&array);
+	check_xas_conflict_get_order(&array);
 	check_xa_alloc();
 	check_find(&array);
 	check_find_entry(&array);

diff --git a/mm/filemap.c b/mm/filemap.c
index 7f734b7ad273..00177ecdc4a4 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -852,7 +852,9 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 		struct folio *folio, pgoff_t index, gfp_t gfp, void **shadowp)
 {
 	XA_STATE(xas, &mapping->i_pages, index);
-	bool huge = folio_test_hugetlb(folio);
+	void *alloced_shadow = NULL;
+	int alloced_order = 0;
+	bool huge;
 	long nr;
 
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
@@ -861,6 +863,7 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 	VM_BUG_ON_FOLIO(index & (folio_nr_pages(folio) - 1), folio);
 
 	xas_set_order(&xas, index, folio_order(folio));
+	huge = folio_test_hugetlb(folio);
 	nr = folio_nr_pages(folio);
 
 	gfp &= GFP_RECLAIM_MASK;
@@ -868,16 +871,10 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 	folio->mapping = mapping;
 	folio->index = xas.xa_index;
 
-	do {
-		unsigned int order = xa_get_order(xas.xa, xas.xa_index);
+	for (;;) {
+		int order = -1, split_order = 0;
 		void *entry, *old = NULL;
 
-		if (order > folio_order(folio)) {
-			xas_split_alloc(&xas, xa_load(xas.xa, xas.xa_index),
-					order, gfp);
-			if (xas_error(&xas))
-				goto error;
-		}
 		xas_lock_irq(&xas);
 		xas_for_each_conflict(&xas, entry) {
 			old = entry;
@@ -885,19 +882,33 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 				xas_set_err(&xas, -EEXIST);
 				goto unlock;
 			}
+			/*
+			 * If a larger entry exists,
+			 * it will be the first and only entry iterated.
+			 */
+			if (order == -1)
+				order = xas_get_order(&xas);
+		}
+
+		/* entry may have changed before we re-acquire the lock */
+		if (alloced_order && (old != alloced_shadow || order != alloced_order)) {
+			xas_destroy(&xas);
+			alloced_order = 0;
 		}
 
 		if (old) {
-			if (shadowp)
-				*shadowp = old;
-			/* entry may have been split before we acquired lock */
-			order = xa_get_order(xas.xa, xas.xa_index);
-			if (order > folio_order(folio)) {
+			if (order > 0 && order > folio_order(folio)) {
 				/* How to handle large swap entries? */
 				BUG_ON(shmem_mapping(mapping));
+				if (!alloced_order) {
+					split_order = order;
+					goto unlock;
+				}
 				xas_split(&xas, old, order);
 				xas_reset(&xas);
 			}
+			if (shadowp)
+				*shadowp = old;
 		}
 
 		xas_store(&xas, folio);
@@ -913,9 +924,24 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 			__lruvec_stat_mod_folio(folio, NR_FILE_THPS, nr);
 		}
 
+unlock:
 		xas_unlock_irq(&xas);
-	} while (xas_nomem(&xas, gfp));
+
+		/* split needed, alloc here and retry. */
+		if (split_order) {
+			xas_split_alloc(&xas, old, split_order, gfp);
+			if (xas_error(&xas))
+				goto error;
+			alloced_shadow = old;
+			alloced_order = split_order;
+			xas_reset(&xas);
+			continue;
+		}
+
+		if (!xas_nomem(&xas, gfp))
+			break;
+	}
 
 	if (xas_error(&xas))
 		goto error;
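
A design note on the retry protocol above: it is the classic
optimistic pattern of allocating with the lock dropped and then
re-validating under the lock before consuming the allocation. The
shape of the pattern, as a hypothetical userspace analogue (pthread
and malloc stand in for xa_lock and xas_split_alloc; all names here
are illustrative, this is not kernel code):

	#include <pthread.h>
	#include <stdlib.h>

	static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	static void *shared_entry;	/* stands in for the xarray slot */

	static int insert_with_optional_split(void)
	{
		void *buf = NULL;	/* like the split preallocation */
		void *alloced_for = NULL; /* entry the buffer was made for */

		for (;;) {
			pthread_mutex_lock(&lock);
			void *old = shared_entry;

			/* Re-validate: the world may have changed unlocked. */
			if (buf && old != alloced_for) {
				free(buf);	/* like xas_destroy() */
				buf = NULL;
			}

			if (old && !buf) {
				/* Need memory: drop lock, allocate, retry. */
				pthread_mutex_unlock(&lock);
				buf = malloc(64);
				if (!buf)
					return -1;
				alloced_for = old;
				continue;
			}

			/* Safe: either no split needed, or buf matches old. */
			shared_entry = NULL; /* "split" + insert go here */
			pthread_mutex_unlock(&lock);
			free(buf);
			return 0;
		}
	}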