From patchwork Wed Sep 20 19:02:38 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13393222
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Yu Zhao, Roman Gushchin, Johannes Weiner,
 Michal Hocko, Hugh Dickins, Nhat Pham, Yuanchu Xie, Kalesh Singh,
 Suren Baghdasaryan, "T . J . Mercier", linux-kernel@vger.kernel.org,
 Kairui Song
Subject: [RFC PATCH v3 0/6] Refault distance update with MGLRU support
Date: Thu, 21 Sep 2023 03:02:38 +0800
Message-ID: <20230920190244.16839-1-ryncsn@gmail.com>
X-Mailer: git-send-email 2.41.0
Reply-To: Kairui Song
Sender: owner-linux-mm@kvack.org
Precedence: bulk

From: Kairui Song

I noticed that MGLRU does not work well on certain workloads under
heavy memory stress. After some debugging, I found that this is related
to refault distance detection. It happens when the file page workingset
size exceeds total memory and the access distance of file pages (the
left-shift time of a page before it gets activated or promoted,
considering that the LRU starts from the right) is larger than total
memory. All file pages are then stuck on the oldest generation and are
read in and evicted over and over; few get activated and stay in memory.

This series tries to fix the problem by reworking the refault-distance
based activation to better fit MGLRU, and also tries to use a unified
algorithm for both MGLRU and the Inactive/Active LRU. Performance almost
doubled for the workloads that did not work well previously.

Patch 1/5 reworks the refault distance detection model for the
Inactive/Active LRU and updates the comments.
Patch 2/5 splits the code logic into a helper, in preparation for MGLRU.
Patch 3/5 and 4/5 are code simplifications and updates for MGLRU.
Patch 5/5 applies the modified refault distance algorithm to MGLRU.

The following benchmark shows a more than 4x improvement:

To simulate the workload, I set up a 3-replica MongoDB cluster using
Docker, each replica in a standalone cgroup, set to use 5GB of
WiredTiger cache and 10GB of oplog, on a 32G VM. The benchmark is done
using https://github.com/apavlo/py-tpcc.git, modified to run the
STOCK_LEVEL query only, to simulate slow queries and get a stable
result.

Before (with ZRAM enabled; the result does not change whether or not
any kind of swap is on):
$ tpcc.py --config=mongodb.config mongodb --duration=900 --warehouses=500 --clients=30
==================================================================
Execution Results after 919 seconds
------------------------------------------------------------------
                  Executed        Time (µs)       Rate
  STOCK_LEVEL     577             27584645283.7   0.02 txn/s
------------------------------------------------------------------
  TOTAL           577             27584645283.7   0.02 txn/s

Patched (with ZRAM enabled):
$ tpcc.py --config=mongodb.config mongodb --duration=900 --warehouses=500 --clients=30
==================================================================
Execution Results after 905 seconds
------------------------------------------------------------------
                  Executed        Time (µs)       Rate
  STOCK_LEVEL     2542            27121571486.2   0.09 txn/s
------------------------------------------------------------------
  TOTAL           2542            27121571486.2   0.09 txn/s

Throughput is about 4.4x higher than before (2542 vs. 577 transactions
executed in roughly the same run time).

Testing with lower stress and some other benchmarks also shows a slight
improvement or equivalent performance (e.g. fio tests show an observable
performance gain).

I'm sending this out as an RFC because I'm still testing it: it changes
a frequently used algorithm, and I'm not sure whether there is any
long-term performance regression. It should improve performance for file
pages in general even under low memory pressure, since it saves some
cgroup iterations and atomic operations.
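
As a quick sanity check of the benchmark figures quoted above, the
speedup can be recomputed from the raw columns. This is only a sketch:
it assumes that the Rate column is the executed count divided by the
accumulated client time (the Time column, in µs), which does reproduce
the rounded 0.02 and 0.09 txn/s values. The helper names are made up
for illustration and are not part of py-tpcc.

```python
# Figures copied from the two py-tpcc runs quoted in this cover letter.
BEFORE = {"executed": 577, "time_us": 27584645283.7}
PATCHED = {"executed": 2542, "time_us": 27121571486.2}


def rate_txn_per_s(run):
    """Transactions per second of accumulated client time.

    Assumption: Rate = Executed / (Time in seconds), with Time given in µs.
    """
    return run["executed"] / (run["time_us"] / 1e6)


def speedup(before, after):
    """Ratio of the patched rate to the baseline rate."""
    return rate_txn_per_s(after) / rate_txn_per_s(before)


if __name__ == "__main__":
    print(f"before : {rate_txn_per_s(BEFORE):.4f} txn/s")
    print(f"patched: {rate_txn_per_s(PATCHED):.4f} txn/s")
    print(f"speedup: {speedup(BEFORE, PATCHED):.2f}x")
```

Under that assumption the ratio of the rates comes out at roughly
4.5x, and the ratio of executed transactions (2542 / 577) at roughly
4.4x, so the "5x" figure is best read as a rounded-up number.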

Update from V2:
- Rebased to latest mm-stable and redid some tests.
- Split the algorithm change into a standalone patch, as suggested by
  Johannes Weiner.

Update from V1:
- Removed the fls operations previously used in patch 1 for protecting
  active pages by an exponential ratio; simply comparing with the number
  of inactive pages seems good enough.
- Updated some benchmark results; test results that are basically
  identical to before are not updated.

Kairui Song (6):
  workingset: simplify and use a more intuitive model
  workingset: move refault distance checking into to a helper
  workignset: simplify the initilization code
  workingset: simplify lru_gen_test_recent
  mm, lru_gen: convert avg_total and avg_refaulted to atomic
  workingset, lru_gen: apply refault-distance based re-activation

 include/linux/mmzone.h |   4 +-
 include/linux/swap.h   |   2 -
 mm/swap.c              |   1 -
 mm/vmscan.c            |  30 +--
 mm/workingset.c        | 416 +++++++++++++++++++++--------------------
 5 files changed, 236 insertions(+), 217 deletions(-)