From patchwork Thu Dec 1 22:39:16 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 13061871
Date: Thu, 1 Dec 2022 15:39:16 -0700
Message-Id: <20221201223923.873696-1-yuzhao@google.com>
Subject: [PATCH mm-unstable v1 0/8] mm: multi-gen LRU: memcg LRU
From: Yu Zhao
To: Andrew Morton
Cc: Johannes Weiner, Jonathan Corbet, Michael Larabel, Michal Hocko,
 Mike Rapoport, Roman Gushchin, Suren Baghdasaryan, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, linux-mm@google.com, Yu Zhao

An memcg LRU is a per-node LRU of memcgs. It is also an LRU of LRUs,
since each node and memcg combination has an LRU of folios (see
mem_cgroup_lruvec()).

Its goal is to improve the scalability of global reclaim, which is
critical to systemwide memory overcommit in data centers. Note that
memcg reclaim is currently out of scope.

Its memory overhead is one pointer per LRU vector, which is negligible
for each node. In terms of traversing memcgs during global reclaim, it
improves the best-case complexity from O(n) to O(1) and does not
affect the worst-case complexity O(n). Therefore, on average, it has
a sublinear complexity in contrast to the current linear complexity.

The basic structure of an memcg LRU can be understood by an analogy to
the active/inactive LRU (of folios):
1. It has the young and the old (generations);
2. Its linked lists have the head and the tail;
3. The increment of max_seq triggers promotion;
4. Other events, e.g., offlining an memcg, trigger similar
   operations.

In terms of global reclaim, it has two distinct features:
1. Sharding, which allows each thread to start at a random memcg (in
   the old generation) and improves parallelism;
2. Eventual fairness, which allows direct reclaim to bail out and
   reduces latency without affecting fairness over some time.
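As a rough illustration of the traversal described above, the following
is a toy user-space model in bash, not kernel code. Everything in it is
an assumption made for the sketch: the names memcg_lru_init and
memcg_lru_reclaim, the two flat arrays standing in for the old and the
young generations, the fixed number of pages each memcg yields, and the
rule that the young generation simply becomes the old one once the old
one runs empty. It only shows why a reclaimer that starts at a random
memcg in the old generation and bails out once its target is met walks
a number of memcgs that depends on the target rather than on n, while
repeated calls still reach every memcg.

```shell
#!/usr/bin/env bash
# Toy model only, NOT kernel code. All names below are hypothetical.

old=()      # memcgs in the old generation
young=()    # memcgs in the young generation
visited=0   # number of memcgs walked by the last reclaim call

# memcg_lru_init N: put N memcgs (N > 0) into the old generation.
memcg_lru_init() {
    local i
    old=(); young=()
    for ((i = 0; i < $1; i++)); do
        old+=("memcg$i")
    done
}

# memcg_lru_reclaim WANT PER_MEMCG: walk the old generation until WANT
# pages are reclaimed, assuming each memcg yields PER_MEMCG pages.
memcg_lru_reclaim() {
    local want=$1 per_memcg=$2 got=0 start=-1 idx
    visited=0
    while ((got < want)); do
        if ((${#old[@]} == 0)); then
            ((${#young[@]} > 0)) || return 1   # no memcgs at all
            # Simplification: the young generation becomes the old one.
            old=("${young[@]}"); young=()
        fi
        # Sharding: pick a random starting point once per call.
        ((start >= 0)) || start=$((RANDOM % ${#old[@]}))
        idx=$((start % ${#old[@]}))
        # A visited memcg moves to the tail of the young generation.
        young+=("${old[idx]}")
        unset "old[$idx]"; old=("${old[@]}")
        got=$((got + per_memcg)); visited=$((visited + 1))
    done   # Bail out: the walk stops as soon as the target is met.
}

memcg_lru_init 128
memcg_lru_reclaim 64 32
echo "visited $visited of 128 memcgs"
```

In this model, the number of memcgs walked per call is the target
divided by the per-memcg yield, rounded up, regardless of how many
memcgs exist. A request large enough to need every memcg still walks
all of them, and because visited memcgs leave the old generation, later
calls start from memcgs that have not been visited yet.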

The commit message in patch 6 details the workflow:
https://lore.kernel.org/r/20221201223923.873696-7-yuzhao@google.com/

The following is a simple test to quickly verify its effectiveness.
More benchmarks are coming soon.

  Test design:
  1. Create multiple memcgs.
  2. Each memcg contains a job (fio).
  3. All jobs access the same amount of memory randomly.
  4. The system does not experience global memory pressure.
  5. Periodically write to the root memory.reclaim.

  Desired outcome:
  1. All memcgs have similar pgsteal, i.e.,
     stddev(pgsteal)/mean(pgsteal) is close to 0%.
  2. The total pgsteal is close to the total requested through
     memory.reclaim, i.e., sum(pgsteal)/sum(requested) is close to
     100%.

  Actual outcome [1]:
               stddev(pgsteal)/mean(pgsteal)  sum(pgsteal)/sum(requested)
    MGLRU off  75%                            425%
    MGLRU on   20%                            95%

  ####################################################################
  MEMCGS=128

  # One memcg per job.
  for ((memcg = 0; memcg < $MEMCGS; memcg++)); do
      mkdir /sys/fs/cgroup/memcg$memcg
  done

  # Move the current subshell into its memcg, then run fio there.
  start() {
      echo $BASHPID > /sys/fs/cgroup/memcg$memcg/cgroup.procs

      fio --name=memcg$memcg --numjobs=1 --ioengine=mmap \
          --filename=/dev/zero --size=1920M --rw=randrw \
          --rate=64m,64m --random_distribution=random \
          --fadvise_hint=0 --time_based --runtime=10h \
          --group_reporting --minimal
  }

  for ((memcg = 0; memcg < $MEMCGS; memcg++)); do
      start &
  done

  # Let the jobs warm up before reclaiming.
  sleep 600

  # Request 256MB of reclaim from the root every 6 seconds, 600 times.
  for ((i = 0; i < 600; i++)); do
      echo 256m >/sys/fs/cgroup/memory.reclaim
      sleep 6
  done

  for ((memcg = 0; memcg < $MEMCGS; memcg++)); do
      grep "pgsteal " /sys/fs/cgroup/memcg$memcg/memory.stat
  done
  ####################################################################

[1]: This was obtained from running the above script (touches less
     than 256GB memory) on an EPYC 7B13 with 512GB DRAM for over an
     hour.
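The two ratios in the desired outcome can be computed from the pgsteal
lines printed at the end of the test script. The helper below is a
minimal sketch and is not part of the original test: the function name
pgsteal_stats is hypothetical, it uses the population standard
deviation (the letter does not say which variant was used), and the
commented-out usage assumes 4KB pages when converting the requested
bytes to pages.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the original test script.
# pgsteal_stats REQUESTED_PAGES
# Reads "pgsteal N" lines on stdin and prints both ratios in percent.
pgsteal_stats() {
    awk -v requested="$1" '
        $1 == "pgsteal" { n++; sum += $2; sumsq += $2 * $2 }
        END {
            # Refuse to divide by zero on empty or all-zero input.
            if (n == 0 || sum == 0 || requested == 0)
                exit 1
            mean = sum / n
            var = sumsq / n - mean * mean   # population variance
            if (var < 0)                    # guard against rounding error
                var = 0
            printf "stddev/mean=%.0f%% sum/requested=%.0f%%\n",
                   100 * sqrt(var) / mean, 100 * sum / requested
        }'
}

# Possible usage after the test, assuming 4KB pages and 600 requests of
# 256MB each (kept as a comment because it needs the cgroup files):
#
# for ((memcg = 0; memcg < MEMCGS; memcg++)); do
#     grep "pgsteal " /sys/fs/cgroup/memcg$memcg/memory.stat
# done | pgsteal_stats $((600 * 256 * 1024 * 1024 / 4096))
```

The trailing space in the grep pattern keeps counters such as
pgsteal_kswapd out of the input, and the awk filter on the first field
does the same for any other line that reaches the helper.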

Yu Zhao (8):
  mm: multi-gen LRU: rename lru_gen_struct to lru_gen_folio
  mm: multi-gen LRU: rename lrugen->lists[] to lrugen->folios[]
  mm: multi-gen LRU: remove eviction fairness safeguard
  mm: multi-gen LRU: remove aging fairness safeguard
  mm: multi-gen LRU: shuffle should_run_aging()
  mm: multi-gen LRU: per-node lru_gen_folio lists
  mm: multi-gen LRU: clarify scan_control flags
  mm: multi-gen LRU: simplify arch_has_hw_pte_young() check

 Documentation/mm/multigen_lru.rst |   8 +-
 include/linux/memcontrol.h        |  10 +
 include/linux/mm_inline.h         |  25 +-
 include/linux/mmzone.h            | 127 ++++-
 mm/memcontrol.c                   |  16 +
 mm/page_alloc.c                   |   1 +
 mm/vmscan.c                       | 765 ++++++++++++++++++++----------
 mm/workingset.c                   |   4 +-
 8 files changed, 687 insertions(+), 269 deletions(-)