From patchwork Wed May 18 01:46:28 2022
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 12853099
Date: Tue, 17 May 2022 19:46:28 -0600
In-Reply-To: <20220518014632.922072-1-yuzhao@google.com>
Message-Id: <20220518014632.922072-10-yuzhao@google.com>
Mime-Version: 1.0
References: <20220518014632.922072-1-yuzhao@google.com>
Subject: [PATCH v11 09/14] mm: multi-gen LRU: optimize multiple memcgs
From: Yu Zhao
To: Andrew Morton, linux-mm@kvack.org
Cc: Andi Kleen, Aneesh Kumar, Catalin Marinas, Dave Hansen, Hillf Danton,
    Jens Axboe, Johannes Weiner, Jonathan Corbet, Linus Torvalds,
    Matthew Wilcox, Mel Gorman, Michael Larabel, Michal Hocko,
    Mike Rapoport, Peter Zijlstra, Tejun Heo, Vlastimil Babka, Will Deacon,
    linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, x86@kernel.org, page-reclaim@google.com,
    Yu Zhao, Brian Geffon, Jan Alexander Steffens, Oleksandr Natalenko,
    Steven Barrett, Suleiman Souhlal, Daniel Byrne, Donald Carr,
    Holger Hoffstätte, Konstantin Kharlamov, Shuang Zhai, Sofia Trinh,
    Vaibhav Jain
When multiple memcgs are available, it is possible to make better choices
based on generations and tiers and therefore improve the overall
performance under global memory pressure. This patch adds a rudimentary
optimization to select memcgs that can drop single-use unmapped clean
pages first. Doing so reduces the chance of going into the aging path or
swapping. These two operations can be costly.

A typical example that benefits from this optimization is a server
running mixed types of workloads, e.g., a heavy anon workload in one
memcg and a heavy buffered I/O workload in the other.

Though this optimization can be applied to both kswapd and direct
reclaim, it is only added to kswapd to keep the patchset manageable.
Later improvements will cover the direct reclaim path.

Server benchmark results:
  Mixed workloads:
    fio (buffered I/O): +[1, 3]%
                IOPS         BW
      patch1-8: 2154k        8415MiB/s
      patch1-9: 2205k        8613MiB/s

    memcached (anon): +[132, 136]%
                Ops/sec      KB/sec
      patch1-8: 819618.49    31838.48
      patch1-9: 1916516.06   74447.92

  Mixed workloads:
    fio (buffered I/O): +[59, 61]%
                IOPS         BW
      5.18-rc1: 1378k        5385MiB/s
      patch1-9: 2205k        8613MiB/s

    memcached (anon): +[229, 233]%
                Ops/sec      KB/sec
      5.18-rc1: 578946.00    22489.44
      patch1-9: 1916516.06   74447.92

  Configurations:
    (changes since patch 6)

    cat mixed.sh
    modprobe brd rd_nr=2 rd_size=56623104

    swapoff -a
    mkswap /dev/ram0
    swapon /dev/ram0

    mkfs.ext4 /dev/ram1
    mount -t ext4 /dev/ram1 /mnt

    memtier_benchmark -S /var/run/memcached/memcached.sock \
      -P memcache_binary -n allkeys --key-minimum=1 \
      --key-maximum=50000000 --key-pattern=P:P -c 1 -t 36 \
      --ratio 1:0 --pipeline 8 -d 2000

    fio -name=mglru --numjobs=36 --directory=/mnt --size=1408m \
      --buffered=1 --ioengine=io_uring --iodepth=128 \
      --iodepth_batch_submit=32 --iodepth_batch_complete=32 \
      --rw=randread --random_distribution=random --norandommap \
      --time_based --ramp_time=10m --runtime=90m --group_reporting &
    pid=$!

    sleep 200
    memtier_benchmark -S /var/run/memcached/memcached.sock \
      -P memcache_binary -n allkeys --key-minimum=1 \
      --key-maximum=50000000 --key-pattern=R:R -c 1 -t 36 \
      --ratio 0:1 --pipeline 8 --randomize --distinct-client-seed

    kill -INT $pid
    wait

Client benchmark results: no change (CONFIG_MEMCG=n)

Signed-off-by: Yu Zhao
Acked-by: Brian Geffon
Acked-by: Jan Alexander Steffens (heftig)
Acked-by: Oleksandr Natalenko
Acked-by: Steven Barrett
Acked-by: Suleiman Souhlal
Tested-by: Daniel Byrne
Tested-by: Donald Carr
Tested-by: Holger Hoffstätte
Tested-by: Konstantin Kharlamov
Tested-by: Shuang Zhai
Tested-by: Sofia Trinh
Tested-by: Vaibhav Jain
---
 mm/vmscan.c | 45 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 41 insertions(+), 4 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index f292e7e761b1..a7e768675707 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -128,6 +128,13 @@ struct scan_control {
 	/* Always discard instead of demoting to lower tier memory */
 	unsigned int no_demotion:1;
 
+#ifdef CONFIG_LRU_GEN
+	/* help make better choices when multiple memcgs are available */
+	unsigned int memcgs_need_aging:1;
+	unsigned int memcgs_need_swapping:1;
+	unsigned int memcgs_avoid_swapping:1;
+#endif
+
 	/* Allocation order */
 	s8 order;
 
@@ -4290,6 +4297,22 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
 
 	VM_WARN_ON_ONCE(!current_is_kswapd());
 
+	/*
+	 * To reduce the chance of going into the aging path or swapping, which
+	 * can be costly, optimistically skip them unless their corresponding
+	 * flags were cleared in the eviction path. This improves the overall
+	 * performance when multiple memcgs are available.
+	 */
+	if (!sc->memcgs_need_aging) {
+		sc->memcgs_need_aging = true;
+		sc->memcgs_avoid_swapping = !sc->memcgs_need_swapping;
+		sc->memcgs_need_swapping = true;
+		return;
+	}
+
+	sc->memcgs_need_swapping = true;
+	sc->memcgs_avoid_swapping = true;
+
 	current->reclaim_state->mm_walk = &pgdat->mm_walk;
 
 	memcg = mem_cgroup_iter(NULL, NULL, NULL);
@@ -4699,7 +4722,8 @@ static int isolate_folios(struct lruvec *lruvec, struct scan_control *sc, int sw
 	return scanned;
 }
 
-static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness)
+static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness,
+			bool *swapped)
 {
 	int type;
 	int scanned;
@@ -4765,6 +4789,9 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 
 	sc->nr_reclaimed += reclaimed;
 
+	if (type == LRU_GEN_ANON && swapped)
+		*swapped = true;
+
 	return scanned;
 }
 
@@ -4793,8 +4820,10 @@ static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc, bool
 	if (!nr_to_scan)
 		return 0;
 
-	if (!need_aging)
+	if (!need_aging) {
+		sc->memcgs_need_aging = false;
 		return nr_to_scan;
+	}
 
 	/* leave the work to lru_gen_age_node() */
 	if (current_is_kswapd())
@@ -4816,6 +4845,8 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
 {
 	struct blk_plug plug;
 	long scanned = 0;
+	bool swapped = false;
+	unsigned long reclaimed = sc->nr_reclaimed;
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 
 	lru_add_drain();
@@ -4841,13 +4872,19 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
 		if (!nr_to_scan)
 			break;
 
-		delta = evict_folios(lruvec, sc, swappiness);
+		delta = evict_folios(lruvec, sc, swappiness, &swapped);
 		if (!delta)
 			break;
 
+		if (sc->memcgs_avoid_swapping && swappiness < 200 && swapped)
+			break;
+
 		scanned += delta;
-		if (scanned >= nr_to_scan)
+		if (scanned >= nr_to_scan) {
+			if (!swapped && sc->nr_reclaimed - reclaimed >= MIN_LRU_BATCH)
+				sc->memcgs_need_swapping = false;
 			break;
+		}
 
 		cond_resched();
 	}
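
As a side note for reviewers: the handshake that the new bit fields implement
between lru_gen_age_node() and the eviction path can be modeled in isolation.
The sketch below is not kernel code; it is a hypothetical user-space reduction
of the three scan_control flags (names kept from the patch), assuming
lru_gen_age_node() is entered once per kswapd iteration after eviction has, or
has not, cleared memcgs_need_aging and memcgs_need_swapping:

```c
#include <stdbool.h>

/* Reduced stand-in for the three bit fields added to struct scan_control. */
struct flags {
	bool need_aging;
	bool need_swapping;
	bool avoid_swapping;
};

/*
 * Mirrors the flag handling at the top of lru_gen_age_node(): optimistically
 * skip the aging walk unless the eviction path left need_aging set, and only
 * allow swapping to be avoided when eviction also showed swapping was not
 * needed. Returns whether the aging walk should proceed.
 */
static bool age_node(struct flags *f)
{
	if (!f->need_aging) {
		f->need_aging = true;
		f->avoid_swapping = !f->need_swapping;
		f->need_swapping = true;
		return false;	/* skip aging this iteration */
	}

	f->need_swapping = true;
	f->avoid_swapping = true;
	return true;		/* proceed with the aging walk */
}
```

In this reduction, eviction clearing need_aging (the get_nr_to_scan() hunk)
makes the next age_node() call a no-op, and clearing need_swapping (the
MIN_LRU_BATCH check in lru_gen_shrink_lruvec()) is what permits
avoid_swapping to be set, matching the "drop single-use unmapped clean pages
first" behavior the commit message describes.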