From patchwork Tue Jan 23 18:45:51 2024
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13527977
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton , Yu Zhao , Wei Xu , Chris Li , Matthew Wilcox ,
 linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v3 2/3] mm, lru_gen: batch update counters on aging
Date: Wed, 24 Jan 2024 02:45:51 +0800
Message-ID: <20240123184552.59758-3-ryncsn@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240123184552.59758-1-ryncsn@gmail.com>
References: <20240123184552.59758-1-ryncsn@gmail.com>
Reply-To: Kairui Song
From: Kairui Song

When lru_gen is aging, it updates the mm counters page by page, which
causes higher overhead if aging happens frequently or a lot of pages in
one generation are getting moved. Optimize this by doing the counter
updates in batches. Although most __mod_*_state helpers have their own
caches, the overhead is still observable.
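To make the idea concrete, below is a minimal, self-contained sketch of the
batching pattern in plain C (hypothetical names, userspace illustration only,
not the kernel code in this patch): while walking a list, the caller only
accumulates a local delta, and the shared counter is updated once per walk.

    /* Illustrative sketch only, not the kernel code in this patch. */
    #include <stdio.h>

    struct inc_batch {
            long delta;     /* pages counted locally, not yet applied */
    };

    /* While walking a list, only touch the on-stack batch ... */
    static void promote_page(struct inc_batch *batch, long nr_pages)
    {
            batch->delta += nr_pages;
    }

    /* ... and update the shared counter once per list walk. */
    static void inc_batch_done(long *shared_counter, struct inc_batch *batch)
    {
            if (batch->delta)
                    *shared_counter += batch->delta;
            batch->delta = 0;
    }

    int main(void)
    {
            long nr_active = 0;             /* stands in for a per-type/zone counter */
            struct inc_batch batch = { 0 };
            int i;

            for (i = 0; i < 512; i++)       /* 512 promoted pages ... */
                    promote_page(&batch, 1);
            inc_batch_done(&nr_active, &batch);     /* ... but only one shared update */

            printf("promoted pages: %ld\n", nr_active);
            return 0;
    }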
Test 1: Ramdisk fio test in a 4G memcg on a EPYC 7K62 with:
  fio -name=mglru --numjobs=16 --directory=/mnt --size=960m \
    --buffered=1 --ioengine=io_uring --iodepth=128 \
    --iodepth_batch_submit=32 --iodepth_batch_complete=32 \
    --rw=randread --random_distribution=zipf:0.5 --norandommap \
    --time_based --ramp_time=1m --runtime=6m --group_reporting

Before this patch:
bw (  MiB/s): min= 8360, max= 9771, per=100.00%, avg=9381.31, stdev=15.67, samples=11488
iops        : min=2140296, max=2501385, avg=2401613.91, stdev=4010.41, samples=11488
After this patch (+0.0%):
bw (  MiB/s): min= 8299, max= 9847, per=100.00%, avg=9388.23, stdev=16.25, samples=11488
iops        : min=2124544, max=2521056, avg=2403385.82, stdev=4159.07, samples=11488

Test 2: Ramdisk fio hybrid test for 30m in a 4G memcg on a EPYC 7K62 (3 times):
  fio --buffered=1 --numjobs=8 --size=960m --directory=/mnt \
    --time_based --ramp_time=1m --runtime=30m \
    --ioengine=io_uring --iodepth=128 --iodepth_batch_submit=32 \
    --iodepth_batch_complete=32 --norandommap \
    --name=mglru-ro --rw=randread --random_distribution=zipf:0.7 \
    --name=mglru-rw --rw=randrw --random_distribution=zipf:0.7

Before this patch:
READ: 6926.6 MiB/s, Stdev: 37.950260
WRITE: 1297.3 MiB/s, Stdev: 7.408704
After this patch (+0.7%, +0.4%):
READ: 6973.3 MiB/s, Stdev: 19.601587
WRITE: 1302.3 MiB/s, Stdev: 4.988877

Test 3: 30m of MySQL test in 6G memcg (12 times):
  echo 'set GLOBAL innodb_buffer_pool_size=16106127360;' | \
    mysql -u USER -h localhost --password=PASS
  sysbench /usr/share/sysbench/oltp_read_only.lua \
    --mysql-user=USER --mysql-password=PASS --mysql-db=DB \
    --tables=48 --table-size=2000000 --threads=16 --time=1800 run

Before this patch:
Avg: 135005.779091 qps. Stdev: 295.299027
After this patch (+0.2%):
Avg: 135310.868182 qps. Stdev: 379.200942

Test 4: Build linux kernel in 2G memcg with make -j48 with SSD swap
        (for memory stress, 18 times):

Before this patch:
Average: 1455.659254 s. Stdev: 15.274481
After this patch (-0.8%):
Average: 1467.813023 s. Stdev: 24.232886

Test 5: Memtier test in a 4G cgroup using brd as swap (20 times):
  memcached -u nobody -m 16384 -s /tmp/memcached.socket \
    -a 0766 -t 16 -B binary &
  memtier_benchmark -S /tmp/memcached.socket \
    -P memcache_binary -n allkeys \
    --key-minimum=1 --key-maximum=16000000 -d 1024 \
    --ratio=1:0 --key-pattern=P:P -c 1 -t 16 --pipeline 8 -x 3

Before this patch:
Avg: 47691.343500 Ops/sec. Stdev: 3925.772473
After this patch (+1.7%):
Avg: 48389.282500 Ops/sec. Stdev: 3534.470933

Signed-off-by: Kairui Song
---
 mm/vmscan.c | 68 +++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 55 insertions(+), 13 deletions(-)
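For orientation before the hunks, the usage pattern introduced by this patch
boils down to the following (a simplified, schematic excerpt of the
inc_min_seq()/scan_folios() changes below, not extra code in this patch):
callers keep a struct lru_gen_inc_batch on the stack, folio_inc_gen() only
accumulates the folio's page count into it, and lru_gen_inc_batch_done()
applies the accumulated delta to the gen/type/zone counters once scanning
of that LRU list finishes or is aborted.

    struct lru_gen_inc_batch batch = { };

    /* walk one gen/type/zone LRU list (schematic) */
    while (!list_empty(head)) {
            struct folio *folio = lru_to_folio(head);

            new_gen = folio_inc_gen(folio, old_gen, false, &batch);
            list_move_tail(&folio->lru, &lrugen->folios[new_gen][type][zone]);
    }

    /* apply the accumulated nr_pages and lru size deltas in one go */
    lru_gen_inc_batch_done(lruvec, old_gen, type, zone, &batch);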
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 03631cedb3ab..8c701b34d757 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3113,12 +3113,45 @@ static int folio_update_gen(struct folio *folio, int gen)
 	return ((old_flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1;
 }
 
-/* protect pages accessed multiple times through file descriptors */
-static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
+/*
+ * When the oldest gen is being reclaimed, protected/unreclaimable pages can be
+ * moved in batch. They usually all land on the same gen (old_gen + 1) by
+ * folio_inc_gen, so the batch struct is limited to one gen / type / zone
+ * level LRU.
+ * The batch is applied after scanning one LRU list finishes or is aborted.
+ */
+struct lru_gen_inc_batch {
+	int delta;
+};
+
+static void lru_gen_inc_batch_done(struct lruvec *lruvec, int gen, int type, int zone,
+				   struct lru_gen_inc_batch *batch)
 {
-	int type = folio_is_file_lru(folio);
+	int delta = batch->delta;
+	int new_gen = (gen + 1) % MAX_NR_GENS;
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
-	int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
+	enum lru_list lru = type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON;
+
+	if (!delta)
+		return;
+
+	WRITE_ONCE(lrugen->nr_pages[gen][type][zone],
+		   lrugen->nr_pages[gen][type][zone] - delta);
+	WRITE_ONCE(lrugen->nr_pages[new_gen][type][zone],
+		   lrugen->nr_pages[new_gen][type][zone] + delta);
+
+	if (!lru_gen_is_active(lruvec, gen) && lru_gen_is_active(lruvec, new_gen)) {
+		__update_lru_size(lruvec, lru, zone, -delta);
+		__update_lru_size(lruvec, lru + LRU_ACTIVE, zone, delta);
+	}
+}
+
+/* protect pages accessed multiple times through file descriptors */
+static int folio_inc_gen(struct folio *folio, int old_gen, bool reclaiming,
+			 struct lru_gen_inc_batch *batch)
+{
+	int new_gen;
+	int delta = folio_nr_pages(folio);
 	unsigned long new_flags, old_flags = READ_ONCE(folio->flags);
 
 	VM_WARN_ON_ONCE_FOLIO(!(old_flags & LRU_GEN_MASK), folio);
@@ -3138,7 +3171,8 @@ static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
 		new_flags |= BIT(PG_reclaim);
 	} while (!try_cmpxchg(&folio->flags, &old_flags, new_flags));
 
-	lru_gen_update_size(lruvec, folio, old_gen, new_gen);
+	/* new_gen is ensured to be old_gen + 1 here, do a batch update */
+	batch->delta += delta;
 
 	return new_gen;
 }
@@ -3672,6 +3706,7 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
 {
 	int zone;
 	int remaining = MAX_LRU_BATCH;
+	struct lru_gen_inc_batch batch = { };
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
 
@@ -3701,12 +3736,15 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
 				prefetchw(&prev->flags);
 			}
 
-			new_gen = folio_inc_gen(lruvec, folio, false);
+			new_gen = folio_inc_gen(folio, old_gen, false, &batch);
 			list_move_tail(&folio->lru, &lrugen->folios[new_gen][type][zone]);
 
-			if (!--remaining)
+			if (!--remaining) {
+				lru_gen_inc_batch_done(lruvec, old_gen, type, zone, &batch);
 				return false;
+			}
 		}
+		lru_gen_inc_batch_done(lruvec, old_gen, type, zone, &batch);
 	}
 done:
 	reset_ctrl_pos(lruvec, type, true);
@@ -4226,7 +4264,7 @@ void lru_gen_soft_reclaim(struct mem_cgroup *memcg, int nid)
  ******************************************************************************/
 
 static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_control *sc,
-		       int tier_idx)
+		       int tier_idx, struct lru_gen_inc_batch *batch)
 {
 	bool success;
 	int gen = folio_lru_gen(folio);
@@ -4236,6 +4274,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_control *sc,
 	int refs = folio_lru_refs(folio);
 	int tier = lru_tier_from_refs(refs);
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
+	int old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
 
 	VM_WARN_ON_ONCE_FOLIO(gen >= MAX_NR_GENS, folio);
 
@@ -4259,7 +4298,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_control *sc,
 	}
 
 	/* promoted */
-	if (gen != lru_gen_from_seq(lrugen->min_seq[type])) {
+	if (gen != old_gen) {
 		list_move(&folio->lru, &lrugen->folios[gen][type][zone]);
 		return true;
 	}
@@ -4268,7 +4307,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_control *sc,
 	if (tier > tier_idx || refs == BIT(LRU_REFS_WIDTH)) {
 		int hist = lru_hist_from_seq(lrugen->min_seq[type]);
 
-		gen = folio_inc_gen(lruvec, folio, false);
+		gen = folio_inc_gen(folio, old_gen, false, batch);
 		list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
 
 		WRITE_ONCE(lrugen->protected[hist][type][tier - 1],
@@ -4278,7 +4317,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_control *sc,
 
 	/* ineligible */
 	if (zone > sc->reclaim_idx || skip_cma(folio, sc)) {
-		gen = folio_inc_gen(lruvec, folio, false);
+		gen = folio_inc_gen(folio, old_gen, false, batch);
 		list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
 		return true;
 	}
@@ -4286,7 +4325,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_control *sc,
 	/* waiting for writeback */
 	if (folio_test_locked(folio) || folio_test_writeback(folio) ||
 	    (type == LRU_GEN_FILE && folio_test_dirty(folio))) {
-		gen = folio_inc_gen(lruvec, folio, true);
+		gen = folio_inc_gen(folio, old_gen, true, batch);
 		list_move(&folio->lru, &lrugen->folios[gen][type][zone]);
 		return true;
 	}
@@ -4353,6 +4392,7 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 		LIST_HEAD(moved);
 		int skipped_zone = 0;
 		struct folio *prev = NULL;
+		struct lru_gen_inc_batch batch = { };
 		int zone = (sc->reclaim_idx + i) % MAX_NR_ZONES;
 		struct list_head *head = &lrugen->folios[gen][type][zone];
 
@@ -4377,7 +4417,7 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 				prefetchw(&prev->flags);
 			}
 
-			if (sort_folio(lruvec, folio, sc, tier))
+			if (sort_folio(lruvec, folio, sc, tier, &batch))
 				sorted += delta;
 			else if (isolate_folio(lruvec, folio, sc)) {
 				list_add(&folio->lru, list);
@@ -4391,6 +4431,8 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 				break;
 		}
 
+		lru_gen_inc_batch_done(lruvec, gen, type, zone, &batch);
+
 		if (skipped_zone) {
 			list_splice(&moved, head);
 			__count_zid_vm_events(PGSCAN_SKIP, zone, skipped_zone);