From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, "Huang, Ying", David Hildenbrand, Hugh Dickins, Johannes Weiner, Matthew Wilcox, Michal Hocko, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 00/24] Swapin path refactor for optimization and bugfix
Date: Mon, 20 Nov 2023 03:47:16 +0800
Message-ID: <20231119194740.94101-1-ryncsn@gmail.com>

This series tries to unify and clean up the swapin path. Along the way
it fixes a few issues and adds some optimizations:

1. Memcg leak issue: a process may swap out some pages and then migrate
   to another cgroup. If its original cgroup is then destroyed, a
   swapoff accounts the swapped-in pages to the cgroup of the process
   doing the swapoff, not to the new cgroup. This makes it easy for the
   migrated process to use more memory than expected.

   It can be easily reproduced as follows:
   - Set up a swap device.
   - Create memory cgroups A, B and C.
   - Spawn process P1 in cgroup A and make it swap out some pages.
   - Move process P1 to memory cgroup B.
   - Destroy cgroup A.
   - Do a swapoff from cgroup C.
   - The swapped-in pages are accounted to cgroup C.

   This series fixes it so that the swapped-in pages are accounted to
   cgroup B.

2. When multiple swap devices are configured and any one of them is not
   an SSD, VMA readahead is disabled globally. This series makes the
   readahead policy check per swap entry.

3. This series also includes many refactors and optimizations:
   - The swap readahead policy check is unified for page fault, swapoff
     and shmem. As a result, swapin from a ramdisk (e.g. ZRAM) always
     bypasses the swap cache. Previously, shmem and swapoff behaved
     differently here.
   - Some micro-optimizations for the swapin path (e.g. skipping a
     duplicated xarray lookup), made while doing the refactor.

Some benchmarks:

1. fio test for shmem (within a 2G memcg limit, using lzo-rle ZRAM swap):

   fio -name=tmpfs --numjobs=16 --directory=/tmpfs --size=256m --ioengine=mmap \
     --iodepth=128 --rw=randrw --random_distribution= --time_based \
     --ramp_time=1m --runtime=1m --group_reporting

   RANDOM=zipf:1.2 ZRAM
   Before (R/W, bw): 7339MiB/s / 7341MiB/s
   After  (R/W, bw): 7305MiB/s / 7308MiB/s (-0.5%)

   RANDOM=zipf:0.5 ZRAM
   Before (R/W, bw): 770MiB/s / 770MiB/s
   After  (R/W, bw): 775MiB/s / 774MiB/s (+0.6%)

   RANDOM=random ZRAM
   Before (R/W, bw): 537MiB/s / 537MiB/s
   After  (R/W, bw): 552MiB/s / 552MiB/s (+2.7%)

   Readahead barely helps here. For random RW there is an observable
   performance gain.

2. A micro benchmark uses madvise to swap out 10G of zero-filled data to
   ZRAM and then reads it back in. It shows a performance gain for the
   swapin path:

   Before: 12480532 us
   After:  12013318 us (+3.8%)

3. The final vmlinux is also a little bit smaller (gcc 8.5.0):

   ./scripts/bloat-o-meter vmlinux.old vmlinux
   add/remove: 8/7 grow/shrink: 5/6 up/down: 5737/-5789 (-52)
   Function                         old     new   delta
   unuse_vma                          -    3204   +3204
   swapin_page_fault                  -    1804   +1804
   swapin_page_non_fault              -     437    +437
   swapin_no_readahead                -     165    +165
   swap_cache_get_folio             291     326     +35
   __pfx_unuse_vma                    -      16     +16
   __pfx_swapin_page_non_fault        -      16     +16
   __pfx_swapin_page_fault            -      16     +16
   __pfx_swapin_no_readahead          -      16     +16
   read_swap_cache_async            179     191     +12
   swap_cluster_readahead           912     921      +9
   __read_swap_cache_async          669     673      +4
   zswap_writeback_entry           1463    1466      +3
   __do_sys_swapon                 4923    4920      -3
   nr_rotate_swap                     4       -      -4
   __pfx_unuse_pte_range             16       -     -16
   __pfx_swapin_readahead            16       -     -16
   __pfx___swap_count                16       -     -16
   __x64_sys_swapoff               1347    1312     -35
   __ia32_sys_swapoff              1346    1311     -35
   __swap_count                      72       -     -72
   shmem_swapin_folio              1697    1535    -162
   do_swap_page                    2404    1942    -462
   try_to_unuse                    1867     880    -987
   swapin_readahead                1377       -   -1377
   unuse_pte_range                 2604       -   -2604
   Total: Before=30085393, After=30085341, chg -0.00%

Kairui Song (24):
  mm/swap: fix a potential undefined behavior issue
  mm/swapfile.c: add back some comment
  mm/swap: move no readahead swapin code to a stand alone helper
  mm/swap: avoid setting page lock bit and doing extra unlock check
  mm/swap: move readahead policy checking into swapin_readahead
  swap: rework swapin_no_readahead arguments
  mm/swap: move swap_count to header to be shared
  mm/swap: check readahead policy per entry
  mm/swap: inline __swap_count
  mm/swap: remove nr_rotate_swap and related code
  mm/swap: also handle swapcache lookup in swapin_readahead
  mm/swap: simplify arguments for swap_cache_get_folio
  swap: simplify swap_cache_get_folio
  mm/swap: do shadow lookup as well when doing swap cache lookup
  mm/swap: avoid an duplicated swap cache lookup for SYNCHRONOUS_IO device
  mm/swap: reduce scope of get_swap_device in swapin path
  mm/swap: fix false error when swapoff race with swapin
  mm/swap: introduce a helper non fault swapin
  shmem, swap: refactor error check on OOM or race
  swap: simplify and make swap_find_cache static
  swap: make swapin_readahead result checking argument mandatory
  swap: make swap_cluster_readahead static
  swap: fix multiple swap leak when after cgroup migrate
  mm/swap: change swapin_readahead to swapin_page_fault

 include/linux/swap.h |   7 --
 mm/memory.c          | 109 +++++++-------------
 mm/shmem.c           |  55 ++++-------
 mm/swap.h            |  34 ++++---
 mm/swap_state.c      | 222 ++++++++++++++++++++++++++++++++-----------
 mm/swapfile.c        |  70 ++++++--------
 mm/zswap.c           |   2 +-
 7 files changed, 269 insertions(+), 230 deletions(-)
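The difference between the old global readahead switch and the per-entry policy from items 2 and 3 can be shown with a small user-space model. It is a sketch, not the kernel code.
- `struct swap_dev`, `vma_ra_global()`, `swapin_policy_for()` and the `SWAPIN_*` values are hypothetical names.
- The rule that a synchronous-I/O device such as ZRAM bypasses the swap cache only when the entry's swap count is 1 is assumed to mirror the existing check in do_swap_page().
- The `vma_ra_enabled` sysfs switch is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a swap device's properties. */
struct swap_dev {
	bool solid_state;    /* non-rotational, e.g. an SSD or ZRAM */
	bool synchronous_io; /* ramdisk-like device, e.g. ZRAM */
};

enum swapin_policy {
	SWAPIN_NO_READAHEAD,      /* read directly, bypassing the swap cache */
	SWAPIN_VMA_READAHEAD,     /* read ahead by neighbouring virtual addresses */
	SWAPIN_CLUSTER_READAHEAD, /* read ahead by neighbouring swap offsets */
};

/*
 * Old, global rule (modeled): a single rotational device anywhere
 * disables VMA readahead for every swap entry.
 */
bool vma_ra_global(const struct swap_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!devs[i].solid_state)
			return false;
	return true;
}

/*
 * Per-entry rule (modeled): decide from the device the entry lives on
 * and from how many references the entry has (swap_count).
 */
enum swapin_policy swapin_policy_for(const struct swap_dev *dev, int swap_count)
{
	assert(dev != NULL && swap_count > 0);
	if (dev->synchronous_io && swap_count == 1)
		return SWAPIN_NO_READAHEAD;
	if (dev->solid_state)
		return SWAPIN_VMA_READAHEAD;
	return SWAPIN_CLUSTER_READAHEAD;
}
```

Take one SSD and one rotating disk configured together. In that case `vma_ra_global()` turns VMA readahead off for both, but `swapin_policy_for()` still picks VMA readahead for entries that live on the SSD.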
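The source of the madvise micro benchmark (benchmark 2) is not part of the series, so the following is only a guess at its shape. It makes these assumptions:
- MADV_PAGEOUT requires Linux 5.4 or later. It is defined by hand with its UAPI value 21 in case older libc headers lack it.
- `swapin_bench_us()` and `elapsed_us()` are hypothetical helpers.
- Whether the pages actually reach ZRAM depends on the swap setup. With no swap device, MADV_PAGEOUT cannot reclaim anonymous pages, and the timing only measures ordinary memory reads.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21 /* assumption: Linux >= 5.4 UAPI value */
#endif

static long elapsed_us(const struct timespec *a, const struct timespec *b)
{
	return (b->tv_sec - a->tv_sec) * 1000000L +
	       (b->tv_nsec - a->tv_nsec) / 1000L;
}

/*
 * Fault in `bytes` of zero-filled anonymous memory, ask the kernel to
 * page it out with MADV_PAGEOUT, then read one byte per page back and
 * time those reads. *paged_out is set to 1 if madvise() accepted the
 * request (it may still reclaim nothing if no swap is configured).
 * Returns the read-back time in microseconds, or -1 on error.
 */
long swapin_bench_us(size_t bytes, int *paged_out)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *buf;
	struct timespec t0, t1;
	unsigned long sum = 0;
	size_t off;

	assert(page > 0 && paged_out != NULL);
	buf = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Writing (even zeros) allocates a private page for every page. */
	memset(buf, 0, bytes);
	*paged_out = madvise(buf, bytes, MADV_PAGEOUT) == 0;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	/* A page that was swapped out faults back in on this read. */
	for (off = 0; off < bytes; off += (size_t)page)
		sum += ((volatile unsigned char *)buf)[off];
	clock_gettime(CLOCK_MONOTONIC, &t1);

	munmap(buf, bytes);
	/* The data was zero-filled, so any non-zero byte means corruption. */
	return sum == 0 ? elapsed_us(&t0, &t1) : -1;
}
```

A run closer to the cover letter would pass `(size_t)10 << 30` with a ZRAM swap device active. Compare the result before and after the series.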