From patchwork Tue Jan 23 18:45:50 2024
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13527976
From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Yu Zhao, Wei Xu, Chris Li, Matthew Wilcox,
    linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v3 1/3] mm, lru_gen: try to prefetch next page when scanning LRU
Date: Wed, 24 Jan 2024 02:45:50 +0800
Message-ID: <20240123184552.59758-2-ryncsn@gmail.com>
In-Reply-To: <20240123184552.59758-1-ryncsn@gmail.com>
References: <20240123184552.59758-1-ryncsn@gmail.com>
From: Kairui Song <ryncsn@gmail.com>

Prefetching has long been in place for the inactive/active LRU lists;
apply the same optimization to MGLRU.

Test 1: Ramdisk fio read-only test in a 4G memcg on an EPYC 7K62:

  fio -name=mglru --numjobs=16 --directory=/mnt --size=960m \
    --buffered=1 --ioengine=io_uring --iodepth=128 \
    --iodepth_batch_submit=32 --iodepth_batch_complete=32 \
    --rw=randread --random_distribution=zipf:0.5 --norandommap \
    --time_based --ramp_time=1m --runtime=6m --group_reporting

Before this patch:
  bw (MiB/s): min= 7758, max= 9239, per=100.00%, avg=8747.59, stdev=16.51, samples=11488
  iops      : min=1986251, max=2365323, avg=2239380.87, stdev=4225.93, samples=11488
After this patch (+7.2%):
  bw (MiB/s): min= 8360, max= 9771, per=100.00%, avg=9381.31, stdev=15.67, samples=11488
  iops      : min=2140296, max=2501385, avg=2401613.91, stdev=4010.41, samples=11488

Test 2: Ramdisk fio hybrid test for 30m in a 4G memcg on an EPYC 7K62
(3 times):

  fio --buffered=1 --numjobs=8 --size=960m --directory=/mnt \
    --time_based --ramp_time=1m --runtime=30m \
    --ioengine=io_uring --iodepth=128 --iodepth_batch_submit=32 \
    --iodepth_batch_complete=32 --norandommap \
    --name=mglru-ro --rw=randread --random_distribution=zipf:0.7 \
    --name=mglru-rw --rw=randrw --random_distribution=zipf:0.7

Before this patch:
  READ:  6622.0 MiB/s. Stdev: 22.090722
  WRITE: 1256.3 MiB/s. Stdev: 5.249339
After this patch (+4.6%, +3.3%):
  READ:  6926.6 MiB/s. Stdev: 37.950260
  WRITE: 1297.3 MiB/s. Stdev: 7.408704

Test 3: 30m of MySQL test in a 6G memcg (12 times):

  echo 'set GLOBAL innodb_buffer_pool_size=16106127360;' | \
    mysql -u USER -h localhost --password=PASS

  sysbench /usr/share/sysbench/oltp_read_only.lua \
    --mysql-user=USER --mysql-password=PASS --mysql-db=DB \
    --tables=48 --table-size=2000000 --threads=16 --time=1800 run

Before this patch:
  Avg: 134743.714545 qps. Stdev: 582.242189
After this patch (+0.2%):
  Avg: 135005.779091 qps. Stdev: 295.299027

Test 4: Build the Linux kernel with make -j48 in a 2G memcg with SSD
swap (for memory stress, 18 times):

Before this patch:
  Avg: 1456.768899 s. Stdev: 20.106973
After this patch (+0.0%):
  Avg: 1455.659254 s. Stdev: 15.274481

Test 5: Memtier test in a 4G cgroup using brd as swap (18 times):

  memcached -u nobody -m 16384 -s /tmp/memcached.socket \
    -a 0766 -t 16 -B binary &

  memtier_benchmark -S /tmp/memcached.socket \
    -P memcache_binary -n allkeys \
    --key-minimum=1 --key-maximum=16000000 -d 1024 \
    --ratio=1:0 --key-pattern=P:P -c 1 -t 16 --pipeline 8 -x 3

Before this patch:
  Avg: 50317.984000 Ops/sec. Stdev: 2568.965458
After this patch (-5.7%):
  Avg: 47691.343500 Ops/sec. Stdev: 3925.772473

Prefetch seems helpful in most cases, but the memtier test is either
hitting a case where prefetch causes a higher cache miss rate, or the
result is simply too noisy (note the high stdev).
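[Editor's note] For readers new to the pattern, here is a minimal
userspace C sketch of the idea, illustrative only: the type and
function names are invented, and __builtin_prefetch stands in for the
kernel's prefetchw(). The next node is resolved before the current one
is processed, and the cache line that will soon be written is
prefetched for write.

#include <stddef.h>

struct node {
	struct node *next;
	unsigned long flags;	/* the word the loop will write */
};

static void walk_and_touch(struct node *head)
{
	struct node *cur = head;

	while (cur) {
		struct node *next = cur->next;

		/*
		 * Hint the CPU to pull next->flags into the cache; the
		 * second argument (1) means "prefetch for write",
		 * mirroring prefetchw().
		 */
		if (next)
			__builtin_prefetch(&next->flags, 1);

		cur->flags |= 1UL;	/* stand-in for the real work */
		cur = next;
	}
}

A write prefetch is the right hint here because folio_inc_gen() updates
folio->flags with a cmpxchg loop, so the line is best fetched in
exclusive state.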
Signed-off-by: Kairui Song
---
 mm/vmscan.c | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4f9c854ce6cc..03631cedb3ab 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3681,15 +3681,26 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
 	/* prevent cold/hot inversion if force_scan is true */
 	for (zone = 0; zone < MAX_NR_ZONES; zone++) {
 		struct list_head *head = &lrugen->folios[old_gen][type][zone];
+		struct folio *prev = NULL;

-		while (!list_empty(head)) {
-			struct folio *folio = lru_to_folio(head);
+		if (!list_empty(head))
+			prev = lru_to_folio(head);
+
+		while (prev) {
+			struct folio *folio = prev;

 			VM_WARN_ON_ONCE_FOLIO(folio_test_unevictable(folio), folio);
 			VM_WARN_ON_ONCE_FOLIO(folio_test_active(folio), folio);
 			VM_WARN_ON_ONCE_FOLIO(folio_is_file_lru(folio) != type, folio);
 			VM_WARN_ON_ONCE_FOLIO(folio_zonenum(folio) != zone, folio);

+			if (unlikely(list_is_first(&folio->lru, head))) {
+				prev = NULL;
+			} else {
+				prev = lru_to_folio(&folio->lru);
+				prefetchw(&prev->flags);
+			}
+
 			new_gen = folio_inc_gen(lruvec, folio, false);
 			list_move_tail(&folio->lru, &lrugen->folios[new_gen][type][zone]);

@@ -4341,11 +4352,15 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 	for (i = MAX_NR_ZONES; i > 0; i--) {
 		LIST_HEAD(moved);
 		int skipped_zone = 0;
+		struct folio *prev = NULL;
 		int zone = (sc->reclaim_idx + i) % MAX_NR_ZONES;
 		struct list_head *head = &lrugen->folios[gen][type][zone];

-		while (!list_empty(head)) {
-			struct folio *folio = lru_to_folio(head);
+		if (!list_empty(head))
+			prev = lru_to_folio(head);
+
+		while (prev) {
+			struct folio *folio = prev;
 			int delta = folio_nr_pages(folio);

 			VM_WARN_ON_ONCE_FOLIO(folio_test_unevictable(folio), folio);
@@ -4355,6 +4370,13 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,

 			scanned += delta;

+			if (unlikely(list_is_first(&folio->lru, head))) {
+				prev = NULL;
+			} else {
+				prev = lru_to_folio(&folio->lru);
+				prefetchw(&prev->flags);
+			}
+
 			if (sort_folio(lruvec, folio, sc, tier))
 				sorted += delta;
 			else if (isolate_folio(lruvec, folio, sc)) {

From patchwork Tue Jan 23 18:45:51 2024
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13527977
From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Yu Zhao, Wei Xu, Chris Li, Matthew Wilcox,
    linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v3 2/3] mm, lru_gen: batch update counters on aging
Date: Wed, 24 Jan 2024 02:45:51 +0800
Message-ID: <20240123184552.59758-3-ryncsn@gmail.com>
In-Reply-To: <20240123184552.59758-1-ryncsn@gmail.com>
References: <20240123184552.59758-1-ryncsn@gmail.com>

From: Kairui Song <ryncsn@gmail.com>

When lru_gen is aging, it updates MM counters page by page, which adds
noticeable overhead when aging happens frequently or when many pages in
one generation are being moved. Optimize this by updating the counters
in batches. Although most of the __mod_*_state helpers have their own
caches, the overhead is still observable.
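[Editor's note] The batching idea, reduced to a minimal sketch with
invented names (this is the shape of the optimization, not the kernel
code): accumulate a per-batch delta while walking a list, and touch the
shared counters once per list instead of once per page.

/* Illustrative only; types and names are invented. */
struct counter_batch {
	long delta;		/* pages accumulated for one LRU list */
};

static long shared_counter;	/* stands in for lrugen->nr_pages[] */

static void batch_account(struct counter_batch *batch, long nr_pages)
{
	/* per page: only a cheap local add */
	batch->delta += nr_pages;
}

static void batch_done(struct counter_batch *batch)
{
	/* once per list: a single update of the shared state */
	if (!batch->delta)
		return;
	shared_counter += batch->delta;
	batch->delta = 0;
}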
Test 1: Ramdisk fio test in a 4G memcg on an EPYC 7K62 with:

  fio -name=mglru --numjobs=16 --directory=/mnt --size=960m \
    --buffered=1 --ioengine=io_uring --iodepth=128 \
    --iodepth_batch_submit=32 --iodepth_batch_complete=32 \
    --rw=randread --random_distribution=zipf:0.5 --norandommap \
    --time_based --ramp_time=1m --runtime=6m --group_reporting

Before this patch:
  bw (MiB/s): min= 8360, max= 9771, per=100.00%, avg=9381.31, stdev=15.67, samples=11488
  iops      : min=2140296, max=2501385, avg=2401613.91, stdev=4010.41, samples=11488
After this patch (+0.0%):
  bw (MiB/s): min= 8299, max= 9847, per=100.00%, avg=9388.23, stdev=16.25, samples=11488
  iops      : min=2124544, max=2521056, avg=2403385.82, stdev=4159.07, samples=11488

Test 2: Ramdisk fio hybrid test for 30m in a 4G memcg on an EPYC 7K62
(3 times):

  fio --buffered=1 --numjobs=8 --size=960m --directory=/mnt \
    --time_based --ramp_time=1m --runtime=30m \
    --ioengine=io_uring --iodepth=128 --iodepth_batch_submit=32 \
    --iodepth_batch_complete=32 --norandommap \
    --name=mglru-ro --rw=randread --random_distribution=zipf:0.7 \
    --name=mglru-rw --rw=randrw --random_distribution=zipf:0.7

Before this patch:
  READ:  6926.6 MiB/s. Stdev: 37.950260
  WRITE: 1297.3 MiB/s. Stdev: 7.408704
After this patch (+0.7%, +0.4%):
  READ:  6973.3 MiB/s. Stdev: 19.601587
  WRITE: 1302.3 MiB/s. Stdev: 4.988877

Test 3: 30m of MySQL test in a 6G memcg (12 times):

  echo 'set GLOBAL innodb_buffer_pool_size=16106127360;' | \
    mysql -u USER -h localhost --password=PASS

  sysbench /usr/share/sysbench/oltp_read_only.lua \
    --mysql-user=USER --mysql-password=PASS --mysql-db=DB \
    --tables=48 --table-size=2000000 --threads=16 --time=1800 run

Before this patch:
  Avg: 135005.779091 qps. Stdev: 295.299027
After this patch (+0.2%):
  Avg: 135310.868182 qps. Stdev: 379.200942

Test 4: Build the Linux kernel with make -j48 in a 2G memcg with SSD
swap (for memory stress, 18 times):

Before this patch:
  Avg: 1455.659254 s. Stdev: 15.274481
After this patch (-0.8%):
  Avg: 1467.813023 s. Stdev: 24.232886

Test 5: Memtier test in a 4G cgroup using brd as swap (20 times):

  memcached -u nobody -m 16384 -s /tmp/memcached.socket \
    -a 0766 -t 16 -B binary &

  memtier_benchmark -S /tmp/memcached.socket \
    -P memcache_binary -n allkeys \
    --key-minimum=1 --key-maximum=16000000 -d 1024 \
    --ratio=1:0 --key-pattern=P:P -c 1 -t 16 --pipeline 8 -x 3

Before this patch:
  Avg: 47691.343500 Ops/sec. Stdev: 3925.772473
After this patch (+1.7%):
  Avg: 48389.282500 Ops/sec. Stdev: 3534.470933

Signed-off-by: Kairui Song
---
 mm/vmscan.c | 68 +++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 55 insertions(+), 13 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 03631cedb3ab..8c701b34d757 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3113,12 +3113,45 @@ static int folio_update_gen(struct folio *folio, int gen)
 	return ((old_flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1;
 }

-/* protect pages accessed multiple times through file descriptors */
-static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
+/*
+ * When the oldest gen is being reclaimed, protected/unreclaimable pages
+ * can be moved in batch. They usually all land on the same gen
+ * (old_gen + 1) via folio_inc_gen, so the batch struct can be limited
+ * to one per type/zone level LRU.
+ * The batch is applied after scanning of one LRU list finishes or is
+ * aborted.
+ */
+struct lru_gen_inc_batch {
+	int delta;
+};
+
+static void lru_gen_inc_batch_done(struct lruvec *lruvec, int gen, int type, int zone,
+				   struct lru_gen_inc_batch *batch)
 {
-	int type = folio_is_file_lru(folio);
+	int delta = batch->delta;
+	int new_gen = (gen + 1) % MAX_NR_GENS;
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
-	int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
+	enum lru_list lru = type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON;
+
+	if (!delta)
+		return;
+
+	WRITE_ONCE(lrugen->nr_pages[gen][type][zone],
+		   lrugen->nr_pages[gen][type][zone] - delta);
+	WRITE_ONCE(lrugen->nr_pages[new_gen][type][zone],
+		   lrugen->nr_pages[new_gen][type][zone] + delta);
+
+	if (!lru_gen_is_active(lruvec, gen) && lru_gen_is_active(lruvec, new_gen)) {
+		__update_lru_size(lruvec, lru, zone, -delta);
+		__update_lru_size(lruvec, lru + LRU_ACTIVE, zone, delta);
+	}
+}
+
+/* protect pages accessed multiple times through file descriptors */
+static int folio_inc_gen(struct folio *folio, int old_gen, bool reclaiming,
+			 struct lru_gen_inc_batch *batch)
+{
+	int new_gen;
+	int delta = folio_nr_pages(folio);
 	unsigned long new_flags, old_flags = READ_ONCE(folio->flags);

 	VM_WARN_ON_ONCE_FOLIO(!(old_flags & LRU_GEN_MASK), folio);
@@ -3138,7 +3171,8 @@ static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio, bool reclai
 		new_flags |= BIT(PG_reclaim);
 	} while (!try_cmpxchg(&folio->flags, &old_flags, new_flags));

-	lru_gen_update_size(lruvec, folio, old_gen, new_gen);
+	/* new_gen is ensured to be old_gen + 1 here, do a batch update */
+	batch->delta += delta;

 	return new_gen;
 }
@@ -3672,6 +3706,7 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
 {
 	int zone;
 	int remaining = MAX_LRU_BATCH;
+	struct lru_gen_inc_batch batch = { };
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
@@ -3701,12 +3736,15 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
 				prefetchw(&prev->flags);
 			}

-			new_gen = folio_inc_gen(lruvec, folio, false);
+			new_gen = folio_inc_gen(folio, old_gen, false, &batch);
 			list_move_tail(&folio->lru, &lrugen->folios[new_gen][type][zone]);

-			if (!--remaining)
+			if (!--remaining) {
+				lru_gen_inc_batch_done(lruvec, old_gen, type, zone, &batch);
 				return false;
+			}
 		}
+		lru_gen_inc_batch_done(lruvec, old_gen, type, zone, &batch);
 	}
 done:
 	reset_ctrl_pos(lruvec, type, true);
@@ -4226,7 +4264,7 @@ void lru_gen_soft_reclaim(struct mem_cgroup *memcg, int nid)
 ******************************************************************************/

 static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_control *sc,
-		       int tier_idx)
+		       int tier_idx, struct lru_gen_inc_batch *batch)
 {
 	bool success;
 	int gen = folio_lru_gen(folio);
@@ -4236,6 +4274,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 	int refs = folio_lru_refs(folio);
 	int tier = lru_tier_from_refs(refs);
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
+	int old_gen = lru_gen_from_seq(lrugen->min_seq[type]);

 	VM_WARN_ON_ONCE_FOLIO(gen >= MAX_NR_GENS, folio);
@@ -4259,7 +4298,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 	}

 	/* promoted */
-	if (gen != lru_gen_from_seq(lrugen->min_seq[type])) {
+	if (gen != old_gen) {
 		list_move(&folio->lru, &lrugen->folios[gen][type][zone]);
 		return true;
 	}
@@ -4268,7 +4307,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 	if (tier > tier_idx || refs == BIT(LRU_REFS_WIDTH)) {
 		int hist = lru_hist_from_seq(lrugen->min_seq[type]);

-		gen = folio_inc_gen(lruvec, folio, false);
+		gen = folio_inc_gen(folio, old_gen, false, batch);
 		list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);

 		WRITE_ONCE(lrugen->protected[hist][type][tier - 1],
@@ -4278,7 +4317,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c

 	/* ineligible */
 	if (zone > sc->reclaim_idx || skip_cma(folio, sc)) {
-		gen = folio_inc_gen(lruvec, folio, false);
+		gen = folio_inc_gen(folio, old_gen, false, batch);
 		list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
 		return true;
 	}
@@ -4286,7 +4325,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 	/* waiting for writeback */
 	if (folio_test_locked(folio) || folio_test_writeback(folio) ||
 	    (type == LRU_GEN_FILE && folio_test_dirty(folio))) {
-		gen = folio_inc_gen(lruvec, folio, true);
+		gen = folio_inc_gen(folio, old_gen, true, batch);
 		list_move(&folio->lru, &lrugen->folios[gen][type][zone]);
 		return true;
 	}
@@ -4353,6 +4392,7 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 		LIST_HEAD(moved);
 		int skipped_zone = 0;
 		struct folio *prev = NULL;
+		struct lru_gen_inc_batch batch = { };
 		int zone = (sc->reclaim_idx + i) % MAX_NR_ZONES;
 		struct list_head *head = &lrugen->folios[gen][type][zone];
@@ -4377,7 +4417,7 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 				prefetchw(&prev->flags);
 			}

-			if (sort_folio(lruvec, folio, sc, tier))
+			if (sort_folio(lruvec, folio, sc, tier, &batch))
 				sorted += delta;
 			else if (isolate_folio(lruvec, folio, sc)) {
 				list_add(&folio->lru, list);
@@ -4391,6 +4431,8 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 			break;
 		}

+		lru_gen_inc_batch_done(lruvec, gen, type, zone, &batch);
+
 		if (skipped_zone) {
 			list_splice(&moved, head);
 			__count_zid_vm_events(PGSCAN_SKIP, zone, skipped_zone);
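[Editor's note] Note how lru_gen_inc_batch_done() is called on every
exit path of the scan loops above. A stripped-down sketch of that
lifecycle, reusing the hypothetical batch_account()/batch_done()
helpers from the earlier sketch (declarations repeated so the fragment
stands alone):

struct counter_batch { long delta; };
void batch_account(struct counter_batch *batch, long nr_pages);
void batch_done(struct counter_batch *batch);

struct item { struct item *next; };

/*
 * The batch must be flushed on every exit path: both when the scan
 * budget runs out (the early return) and when the list is exhausted,
 * mirroring inc_min_seq()/scan_folios() above.
 */
static bool scan_one_list(struct item *head, int budget)
{
	struct counter_batch batch = { };
	struct item *it;

	for (it = head; it; it = it->next) {
		batch_account(&batch, 1);
		if (!--budget) {
			batch_done(&batch);
			return false;	/* aborted */
		}
	}
	batch_done(&batch);
	return true;			/* finished */
}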
From patchwork Tue Jan 23 18:45:52 2024
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13527978
From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Yu Zhao, Wei Xu, Chris Li, Matthew Wilcox,
    linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v3 3/3] mm, lru_gen: move pages in bulk when aging
Date: Wed, 24 Jan 2024 02:45:52 +0800
Message-ID: <20240123184552.59758-4-ryncsn@gmail.com>
In-Reply-To: <20240123184552.59758-1-ryncsn@gmail.com>
References: <20240123184552.59758-1-ryncsn@gmail.com>

From: Kairui Song <ryncsn@gmail.com>

Another source of aging overhead is page moving. In most cases, pages
are moved to the same generation after folio_inc_gen is called,
especially the protected pages, so it is better to move them in bulk.

This also benefits LRU ordering. Currently, when MGLRU ages, it walks
the LRU backwards and moves protected pages to the tail of the newer
generation one by one, which actually reverses the order of pages in
the LRU. Moving them in batches helps preserve their order, though only
within a small scope due to the scan limit of MAX_LRU_BATCH pages.
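[Editor's note] The bulk move relies on the kernel's
list_bulk_move_tail(), which splices a whole run of consecutive entries
in one operation and therefore keeps the run's internal order. A
minimal sketch of such a splice on a bare doubly linked list
(simplified, with hypothetical names; not the kernel implementation
verbatim):

struct dlist_head {
	struct dlist_head *prev, *next;
};

/*
 * Move the consecutive run [first, last] to the tail of @head with one
 * splice; the entries keep their relative order.
 */
static void bulk_move_tail(struct dlist_head *head,
			   struct dlist_head *first,
			   struct dlist_head *last)
{
	/* unlink the whole run from its current position */
	first->prev->next = last->next;
	last->next->prev = first->prev;

	/* splice it in just before @head, i.e. at the list tail */
	head->prev->next = first;
	first->prev = head->prev;
	last->next = head;
	head->prev = last;
}

Because the aging walk is backwards, the patch records the first folio
it meets as batch->tail and the most recent one as batch->head, so the
run handed to list_bulk_move_tail() stays in LRU order.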
After this commit, a slight performance gain can be seen (with
CONFIG_DEBUG_LIST=n):

Test 1: Ramdisk fio test in a 4G memcg on an EPYC 7K62:

  fio -name=mglru --numjobs=16 --directory=/mnt --size=960m \
    --buffered=1 --ioengine=io_uring --iodepth=128 \
    --iodepth_batch_submit=32 --iodepth_batch_complete=32 \
    --rw=randread --random_distribution=zipf:0.5 --norandommap \
    --time_based --ramp_time=1m --runtime=6m --group_reporting

Before this patch:
  bw (MiB/s): min= 8299, max= 9847, per=100.00%, avg=9388.23, stdev=16.25, samples=11488
  iops      : min=2124544, max=2521056, avg=2403385.82, stdev=4159.07, samples=11488
After this patch (-0.2%):
  bw (MiB/s): min= 8359, max= 9796, per=100.00%, avg=9367.29, stdev=15.75, samples=11488
  iops      : min=2140113, max=2507928, avg=2398024.65, stdev=4033.07, samples=11488

Test 2: Ramdisk fio hybrid test for 30m in a 4G memcg on an EPYC 7K62
(3 times):

  fio --buffered=1 --numjobs=8 --size=960m --directory=/mnt \
    --time_based --ramp_time=1m --runtime=30m \
    --ioengine=io_uring --iodepth=128 --iodepth_batch_submit=32 \
    --iodepth_batch_complete=32 --norandommap \
    --name=mglru-ro --rw=randread --random_distribution=zipf:0.7 \
    --name=mglru-rw --rw=randrw --random_distribution=zipf:0.7

Before this patch:
  READ:  6973.3 MiB/s. Stdev: 19.601587
  WRITE: 1302.3 MiB/s. Stdev: 4.988877
After this patch (+0.1%, +0.3%):
  READ:  6981.0 MiB/s. Stdev: 15.556349
  WRITE: 1305.7 MiB/s. Stdev: 2.357023

Test 3: 30m of MySQL test in a 6G memcg (12 times):

  echo 'set GLOBAL innodb_buffer_pool_size=16106127360;' | \
    mysql -u USER -h localhost --password=PASS

  sysbench /usr/share/sysbench/oltp_read_only.lua \
    --mysql-user=USER --mysql-password=PASS --mysql-db=DB \
    --tables=48 --table-size=2000000 --threads=16 --time=1800 run

Before this patch:
  Avg: 135310.868182 qps. Stdev: 379.200942
After this patch (-0.3%):
  Avg: 135099.210000 qps. Stdev: 351.488863

Test 4: Build the Linux kernel with make -j48 in a 2G memcg with SSD
swap (for memory stress, 18 times):

Before this patch:
  Avg: 1467.813023 s. Stdev: 24.232886
After this patch (+0.0%):
  Avg: 1464.178154 s. Stdev: 17.992974

Test 5: Memtier test in a 4G cgroup using brd as swap (20 times):

  memcached -u nobody -m 16384 -s /tmp/memcached.socket \
    -a 0766 -t 16 -B binary &

  memtier_benchmark -S /tmp/memcached.socket \
    -P memcache_binary -n allkeys \
    --key-minimum=1 --key-maximum=16000000 -d 1024 \
    --ratio=1:0 --key-pattern=P:P -c 1 -t 16 --pipeline 8 -x 3

Before this patch:
  Avg: 48389.282500 Ops/sec. Stdev: 3534.470933
After this patch (+1.2%):
  Avg: 48959.374118 Ops/sec. Stdev: 3488.559744

Signed-off-by: Kairui Song
---
 mm/vmscan.c | 47 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 44 insertions(+), 3 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8c701b34d757..373a70801db9 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3122,8 +3122,45 @@ static int folio_update_gen(struct folio *folio, int gen)
  */
 struct lru_gen_inc_batch {
 	int delta;
+	struct folio *head, *tail;
 };

+static inline void lru_gen_inc_bulk_done(struct lru_gen_folio *lrugen,
+					 int bulk_gen, bool type, int zone,
+					 struct lru_gen_inc_batch *batch)
+{
+	if (!batch->head)
+		return;
+
+	list_bulk_move_tail(&lrugen->folios[bulk_gen][type][zone],
+			    &batch->head->lru,
+			    &batch->tail->lru);
+
+	batch->head = NULL;
+}
+
+/*
+ * When aging, protected pages will go to the tail of the same higher
+ * gen, so they can be moved in batches. Besides reducing overhead,
+ * this also avoids changing their LRU order, though only in a small
+ * scope.
+ */
+static inline void lru_gen_try_bulk_move(struct lru_gen_folio *lrugen, struct folio *folio,
+					 int bulk_gen, int new_gen, bool type, int zone,
+					 struct lru_gen_inc_batch *batch)
+{
+	/*
+	 * If the folio is not moving to the bulk_gen, it raced with
+	 * promotion, so it needs to go to the head of another LRU.
+	 */
+	if (bulk_gen != new_gen)
+		list_move(&folio->lru, &lrugen->folios[new_gen][type][zone]);
+
+	if (!batch->head)
+		batch->tail = folio;
+
+	batch->head = folio;
+}
+
 static void lru_gen_inc_batch_done(struct lruvec *lruvec, int gen, int type, int zone,
 				   struct lru_gen_inc_batch *batch)
 {
@@ -3132,6 +3169,8 @@ static void lru_gen_inc_batch_done(struct lruvec *lruvec, int gen, int type, int
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	enum lru_list lru = type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON;

+	lru_gen_inc_bulk_done(lrugen, new_gen, type, zone, batch);
+
 	if (!delta)
 		return;
@@ -3709,6 +3748,7 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
 	struct lru_gen_inc_batch batch = { };
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
+	int bulk_gen = (old_gen + 1) % MAX_NR_GENS;

 	if (type == LRU_GEN_ANON && !can_swap)
 		goto done;
@@ -3737,7 +3777,7 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
 		}

 		new_gen = folio_inc_gen(folio, old_gen, false, &batch);
-		list_move_tail(&folio->lru, &lrugen->folios[new_gen][type][zone]);
+		lru_gen_try_bulk_move(lrugen, folio, bulk_gen, new_gen, type, zone, &batch);

 		if (!--remaining) {
 			lru_gen_inc_batch_done(lruvec, old_gen, type, zone, &batch);
@@ -4275,6 +4315,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 	int tier = lru_tier_from_refs(refs);
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	int old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
+	int bulk_gen = (old_gen + 1) % MAX_NR_GENS;

 	VM_WARN_ON_ONCE_FOLIO(gen >= MAX_NR_GENS, folio);
@@ -4308,7 +4349,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 		int hist = lru_hist_from_seq(lrugen->min_seq[type]);

 		gen = folio_inc_gen(folio, old_gen, false, batch);
-		list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
+		lru_gen_try_bulk_move(lrugen, folio, bulk_gen, gen, type, zone, batch);

 		WRITE_ONCE(lrugen->protected[hist][type][tier - 1],
 			   lrugen->protected[hist][type][tier - 1] + delta);
@@ -4318,7 +4359,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 	/* ineligible */
 	if (zone > sc->reclaim_idx || skip_cma(folio, sc)) {
 		gen = folio_inc_gen(folio, old_gen, false, batch);
-		list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
+		lru_gen_try_bulk_move(lrugen, folio, bulk_gen, gen, type, zone, batch);
 		return true;
 	}