[v2,0/9] swapin refactor for optimization and unified readahead

Message ID: 20240102175338.62012-1-ryncsn@gmail.com (mailing list archive)

Headers:
Return-Path: <owner-linux-mm@kvack.org>
From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Chris Li <chrisl@kernel.org>,
	"Huang, Ying" <ying.huang@intel.com>,
	Hugh Dickins <hughd@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@suse.com>,
	Yosry Ahmed <yosryahmed@google.com>,
	David Hildenbrand <david@redhat.com>,
	linux-kernel@vger.kernel.org,
	Kairui Song <kasong@tencent.com>
Subject: [PATCH v2 0/9] swapin refactor for optimization and unified readahead
Date: Wed, 3 Jan 2024 01:53:29 +0800
Message-ID: <20240102175338.62012-1-ryncsn@gmail.com>
Reply-To: Kairui Song <kasong@tencent.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Series: swapin refactor for optimization and unified readahead
  [v2,0/9] swapin refactor for optimization and unified readahead
  [v2,1/9] mm/swapfile.c: add back some comment
  [v2,2/9] mm/swap: move no readahead swapin code to a stand-alone helper
  [v2,3/9] mm/swap: avoid doing extra unlock error checks for direct swapin
  [v2,4/9] mm/swap: always account swapped in page into current memcg
  [v2,5/9] mm/swap: introduce swapin_entry for unified readahead policy
  [v2,6/9] mm/swap: handle swapcache lookup in swapin_entry
  [v2,7/9] mm/swap: avoid a duplicated swap cache lookup for SWP_SYNCHRONOUS_IO
  [v2,8/9] mm/swap: introduce a helper for swapin without vmfault
  [v2,9/9] mm/swap, shmem: use new swapin helper to skip readahead conditionally

Message

Kairui Song Jan. 2, 2024, 5:53 p.m. UTC

From: Kairui Song <kasong@tencent.com>

This series is rebased on latest mm-stable to avoid conflicts.

This series unifies and cleans up the swapin path, introduces minor
optimizations, and makes both shmem and swapoff use the SWP_SYNCHRONOUS_IO
flag to skip readahead and swapcache for better performance.

1. Benchmarks for dropping readahead and swapcache for shmem with ZRAM:

- Single file sequential read:
  perf stat --repeat 20 dd if=/tmpfs/test of=/dev/null bs=1M count=8192
  (/tmpfs/test is a zero filled file, using brd as swap, 4G memcg limit)
  Before: 22.248 +- 0.549
  After:  22.021 +- 0.684 (-1.1%)

- Random read stress test:
  fio -name=tmpfs --numjobs=16 --directory=/tmpfs \
  --size=256m --ioengine=mmap --rw=randread --random_distribution=random \
  --time_based --ramp_time=1m --runtime=5m --group_reporting
  (using brd as swap, 2G memcg limit)

  Before: 1818MiB/s
  After:  1888MiB/s (+3.85%)

- Zipf biased random read stress test:
  fio -name=tmpfs --numjobs=16 --directory=/tmpfs \
  --size=256m --ioengine=mmap --rw=randread --random_distribution=zipf:1.2 \
  --time_based --ramp_time=1m --runtime=5m --group_reporting
  (using brd as swap, 2G memcg limit)

  Before: 31.1GiB/s
  After:  32.3GiB/s (+3.86%)

Previously, shmem always used cluster readahead. It doesn't help much even
for a single sequential read, and in the random stress tests performance is
better without it. In practice, because of memory and swap fragmentation,
cluster readahead is less helpful for ZRAM.

2. A micro benchmark that uses madvise to swap out 10G of zero-filled data
   to ZRAM and then reads it back in shows a performance gain for the swapin
   path:

Before: 11143285 us
After:  10692644 us (+4.1%)
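
The benchmark program itself is not included in this cover letter. A minimal
sketch of that kind of measurement, assuming MADV_PAGEOUT (Linux 5.4+) is the
madvise hint used to push the pages out, could look like the following. The
helper name time_swapin_us() and its parameters are hypothetical, and the
numbers it produces are not comparable to the ones above unless the setup
(ZRAM swap device, size) is the same.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21 /* value from the Linux uapi headers, for older libc */
#endif

/*
 * Hypothetical helper, not part of the series.
 * Maps `bytes` of anonymous memory, fills it with zeros, asks the kernel
 * to reclaim it with MADV_PAGEOUT, then touches one byte per page and
 * times that read-back phase.
 * Returns the elapsed time in microseconds, or -1 on any failure.
 * If sum_out is non-NULL it receives the sum of the bytes read (0 for
 * zero-filled data), which also keeps the reads from being optimized away.
 */
static long long time_swapin_us(size_t bytes, unsigned long *sum_out)
{
	long page = sysconf(_SC_PAGESIZE);
	struct timespec t0, t1;
	unsigned long sum = 0;
	unsigned char *buf;
	size_t off;

	if (page <= 0 || bytes == 0)
		return -1;

	buf = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Write faults populate every page with zero-filled data. */
	memset(buf, 0, bytes);

	/* Only a hint: the kernel may leave pages resident, e.g. without swap. */
	if (madvise(buf, bytes, MADV_PAGEOUT) != 0) {
		munmap(buf, bytes);
		return -1;
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (off = 0; off < bytes; off += (size_t)page)
		sum += ((volatile unsigned char *)buf)[off];
	clock_gettime(CLOCK_MONOTONIC, &t1);

	munmap(buf, bytes);
	if (sum_out)
		*sum_out = sum;

	return (t1.tv_sec - t0.tv_sec) * 1000000LL +
	       (t1.tv_nsec - t0.tv_nsec) / 1000;
}
```

Calling time_swapin_us(10UL << 30, NULL) on a 64-bit system with a ZRAM swap
device enabled would approximate the scenario described above. Because
MADV_PAGEOUT is only a hint, checking swap usage (for example in
/proc/meminfo) before the read-back phase is a reasonable sanity check.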

3. Swap off a 10G ZRAM:

Before:
time swapoff /dev/zram0
real    0m12.337s
user    0m0.001s
sys     0m12.329s

After:
time swapoff /dev/zram0
real    0m9.728s
user    0m0.001s
sys     0m9.719s

This also cleans up the path so that a per swap device readahead policy
can be applied to all swapin paths.
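
As a rough illustration of what a per swap device policy means, here is a
simplified userspace model in C. It is not the code from the patches: every
name prefixed with model_ or MODEL_ is hypothetical, the flag value is
arbitrary, and the real kernel helpers may apply further conditions that are
left out here. It only captures the idea stated above: a device with
SWP_SYNCHRONOUS_IO skips readahead, and other devices use VMA-based readahead
when a fault context is available and cluster readahead otherwise.

```c
#include <stdbool.h>

/* Model-only flag bit; not the kernel's actual SWP_SYNCHRONOUS_IO value. */
#define MODEL_SWP_SYNCHRONOUS_IO (1u << 0)

/* The three swapin strategies discussed in this series (model names). */
enum model_swapin_policy {
	MODEL_SWAPIN_DIRECT,            /* no readahead, no swapcache */
	MODEL_SWAPIN_VMA_READAHEAD,     /* readahead based on the faulting VMA */
	MODEL_SWAPIN_CLUSTER_READAHEAD, /* readahead of neighbouring swap slots */
};

/* Stand-in for the per-device state; only the flags matter here. */
struct model_swap_device {
	unsigned int flags;
};

/*
 * Pick a policy from the device flags first, so every caller (page fault,
 * shmem, swapoff) gets the same answer for the same device.
 * have_vmf: whether the caller has page fault context.
 * vma_readahead_enabled: whether VMA-based readahead is turned on.
 */
static enum model_swapin_policy
model_choose_policy(const struct model_swap_device *dev,
		    bool have_vmf, bool vma_readahead_enabled)
{
	if (dev->flags & MODEL_SWP_SYNCHRONOUS_IO)
		return MODEL_SWAPIN_DIRECT;
	if (have_vmf && vma_readahead_enabled)
		return MODEL_SWAPIN_VMA_READAHEAD;
	return MODEL_SWAPIN_CLUSTER_READAHEAD;
}
```

In this model a shmem or swapoff caller (have_vmf == false) on a ZRAM-like
device gets MODEL_SWAPIN_DIRECT, while the same caller on a device without
the flag falls back to cluster readahead.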

V1: https://lkml.org/lkml/2023/11/19/296
Update from V1:
  - Rebased on mm-unstable.
  - Removed behaviour-changing patches; they will be submitted in a separate
    series later.
  - Code style, naming and comments updates.
  - Thanks to Chris Li for very detailed and helpful review of V1. Thanks
    to Matthew Wilcox and Huang Ying for helpful suggestions.

Kairui Song (9):
  mm/swapfile.c: add back some comment
  mm/swap: move no readahead swapin code to a stand-alone helper
  mm/swap: avoid doing extra unlock error checks for direct swapin
  mm/swap: always account swapped in page into current memcg
  mm/swap: introduce swapin_entry for unified readahead policy
  mm/swap: also handle swapcache lookup in swapin_entry
  mm/swap: avoid a duplicated swap cache lookup for SWP_SYNCHRONOUS_IO
  mm/swap: introduce a helper for swapin without vmfault
  swap, shmem: use new swapin helper and skip readahead conditionally

 mm/memory.c     |  74 +++++++-------------------
 mm/shmem.c      |  67 +++++++++++------------
 mm/swap.h       |  39 ++++++++++----
 mm/swap_state.c | 138 +++++++++++++++++++++++++++++++++++++++++-------
 mm/swapfile.c   |  32 +++++------
 5 files changed, 218 insertions(+), 132 deletions(-)