From patchwork Sun Aug  4 08:01:04 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yafang Shao <laoar.shao@gmail.com>
X-Patchwork-Id: 13752533
Return-Path: <owner-linux-mm@kvack.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 15A32C3DA64
	for <linux-mm@archiver.kernel.org>; Sun,  4 Aug 2024 08:01:37 +0000 (UTC)
Received: by kanga.kvack.org (Postfix)
	id 9C6756B0092; Sun,  4 Aug 2024 04:01:36 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 9751B6B0098; Sun,  4 Aug 2024 04:01:36 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 815FE6B009F; Sun,  4 Aug 2024 04:01:36 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com
 [216.40.44.16])
	by kanga.kvack.org (Postfix) with ESMTP id 646E16B0092
	for <linux-mm@kvack.org>; Sun,  4 Aug 2024 04:01:36 -0400 (EDT)
Received: from smtpin26.hostedemail.com (a10.router.float.18 [10.200.18.1])
	by unirelay08.hostedemail.com (Postfix) with ESMTP id CE53714186E
	for <linux-mm@kvack.org>; Sun,  4 Aug 2024 08:01:35 +0000 (UTC)
X-FDA: 82413818550.26.9CDEABB
Received: from mail-pf1-f176.google.com (mail-pf1-f176.google.com
 [209.85.210.176])
	by imf14.hostedemail.com (Postfix) with ESMTP id 0250310002D
	for <linux-mm@kvack.org>; Sun,  4 Aug 2024 08:01:33 +0000 (UTC)
Authentication-Results: imf14.hostedemail.com;
	dkim=pass header.d=gmail.com header.s=20230601 header.b=LDa4+TZ6;
	spf=pass (imf14.hostedemail.com: domain of laoar.shao@gmail.com designates
 209.85.210.176 as permitted sender) smtp.mailfrom=laoar.shao@gmail.com;
	dmarc=pass (policy=none) header.from=gmail.com
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed;
 d=hostedemail.com;
	s=arc-20220608; t=1722758434;
	h=from:from:sender:reply-to:subject:subject:date:date:
	 message-id:message-id:to:to:cc:cc:mime-version:mime-version:
	 content-type:content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:references:dkim-signature;
	bh=OImwUt1oCvBGnA7Hqhr9r46KcanrquasYGcAq5MGkkE=;
	b=ARnMuAVCd2zakfH/AAdTdypiVBFfAtbO73zw1YCpHEIcg3fVTBnRxnS+L2sm8AxQbsmYzy
	VTVDsZBr0iswldJYxFg0UYfobh70rdFBrN8IfPlm9g0ni114C3CaHwZz9Mr6Bc4WjbLbfj
	YlF6zDZy66XipBs5SDPlpZvKudmkgzo=
ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1722758434; a=rsa-sha256;
	cv=none;
	b=K0WB+LuX/Y2KMCPXnY7t6j/d6ge/F+HGYyiMEGCAgA+K4AUAjtFru7/Y7ul9uu2jQWVyak
	kC5p9Ugs1WcaJzZNm10DAU6S4p0C8Tm8QA2NID0U42nmd966iWUOuu9sPgzOHBhQoVicUi
	OD0TNNucZTLAkQB04fkFR+ZyF9wa09o=
ARC-Authentication-Results: i=1;
	imf14.hostedemail.com;
	dkim=pass header.d=gmail.com header.s=20230601 header.b=LDa4+TZ6;
	spf=pass (imf14.hostedemail.com: domain of laoar.shao@gmail.com designates
 209.85.210.176 as permitted sender) smtp.mailfrom=laoar.shao@gmail.com;
	dmarc=pass (policy=none) header.from=gmail.com
Received: by mail-pf1-f176.google.com with SMTP id
 d2e1a72fcca58-70d1c655141so7077669b3a.1
        for <linux-mm@kvack.org>; Sun, 04 Aug 2024 01:01:33 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20230601; t=1722758493; x=1723363293; darn=kvack.org;
        h=content-transfer-encoding:mime-version:message-id:date:subject:cc
         :to:from:from:to:cc:subject:date:message-id:reply-to;
        bh=OImwUt1oCvBGnA7Hqhr9r46KcanrquasYGcAq5MGkkE=;
        b=LDa4+TZ66JO42Nns8CkDAy5+0SOERMuE3vpAtxSjZ+ltqtvfYNfUT2fRL2iwe9MYai
         zk3ocQbJRUb8vFqWBfNnoqOBJuHPxUnCwT50LD5pn7oruOOK3QAHrtMfC4mbZ+/Qq8E6
         14HcgZMlsSyMAYhgag6A0NIXeaJ9SrlQLWfh/I4TBdq2cj26mvhYVEJnevSyBaQks5k3
         xbd/dOBFSR0zTDDTi2cGLbPFL0dJsLzxu0o5ASc+p8fS9BAq1LGhOnovnJkk7wl3q0Hv
         UF6cjQBXloKvhI+YPaRowgtmm22sPBTUpssYDa760//s1CkjxomKfLz2yEN93WVqakE2
         V5uw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1722758493; x=1723363293;
        h=content-transfer-encoding:mime-version:message-id:date:subject:cc
         :to:from:x-gm-message-state:from:to:cc:subject:date:message-id
         :reply-to;
        bh=OImwUt1oCvBGnA7Hqhr9r46KcanrquasYGcAq5MGkkE=;
        b=xUrm4gEBQLAZVSsFHcz3EO2IjOgH6jCRPT7f8LX8yKsRwhjikrdJxoGGpyX5VY47Cw
         S+DDojwkuluCOXqigMYEF0x3eR7aQUJaPDDMSzBd4+2kl5iyRVWXczyrUu4lpUBmHK90
         gRSs07B2+RzFK7dNLA4Bg3Zce7yCGEo0WYebc4FmXZ6RC8C9teceWIunaEAzDv0pheYE
         9GbzOY2ZIsu1cQHQOS8jReKKZIym1E91Rb74EfK/Iav5kJvQqwbYPzh3ZTEWjKTgyeBs
         Yz5pDJvlnoROiYyzUQxJ/vxPHF3cENJLEIqOemlVzWkVjXHLOM9Hm7xuekPX/apsnY+W
         ewzg==
X-Forwarded-Encrypted: i=1;
 AJvYcCVXnuSYHfqb366SCesQIoRaXPxB7VnStDCseZlnRsF/XwhA1adLaBPmirVMfdA8dgLetJEC3yaOOotFlpvmLK0bCSc=
X-Gm-Message-State: AOJu0YzBnvCLzAciXS2toafKeY56hYpmSx5sc/+ryKCWUvllt6Xl+LGW
	Dyn2EBM2ZZ8p3f+aXmGQ2LFUoFoloPSU9fiLjYLVxEP42E336lBB9oi1dDOPQ+w=
X-Google-Smtp-Source: 
 AGHT+IGk0OgejRjRIYTqwNA4+5SjZwmZa6jjqF/8K+skLhDKueWXosYpVddVyWU0E93ia5mjSucWnA==
X-Received: by 2002:a17:902:eccf:b0:1fb:9b47:b642 with SMTP id
 d9443c01a7336-1ff57329340mr72596855ad.31.1722758492482;
        Sun, 04 Aug 2024 01:01:32 -0700 (PDT)
Received: from localhost.localdomain ([39.144.105.172])
        by smtp.gmail.com with ESMTPSA id
 d9443c01a7336-1ff61f3fc8asm39601295ad.231.2024.08.04.01.01.29
        (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
        Sun, 04 Aug 2024 01:01:31 -0700 (PDT)
From: Yafang Shao <laoar.shao@gmail.com>
To: akpm@linux-foundation.org
Cc: ying.huang@intel.com,
	mgorman@techsingularity.net,
	linux-mm@kvack.org,
	Yafang Shao <laoar.shao@gmail.com>
Subject: [PATCH v3 0/3] mm: Introduce a new sysctl knob vm.pcp_batch_scale_max 
Date: Sun,  4 Aug 2024 16:01:04 +0800
Message-Id: <20240804080107.21094-1-laoar.shao@gmail.com>
X-Mailer: git-send-email 2.30.1 (Apple Git-130)
MIME-Version: 1.0
X-Rspamd-Queue-Id: 0250310002D
X-Stat-Signature: 4gdndcgg97oqundp5kxxddg93hfe9agy
X-Rspamd-Server: rspam09
X-Rspam-User: 
X-HE-Tag: 1722758493-462502
X-HE-Meta: 
 U2FsdGVkX19r1xO8Iw3QUtf47baJLU/9+ck0U7XKJrYFEyFj6C9EO/VZhPeUPn81RzmHquPa5hPA7zUjzSwlXozNt3Stcm562K9SkfNl6M8F/LyFr6Rz1pCtwXDvWyV0RQYHR85xXAMYi3GB2BAJP34666TxE7PN1hE4UPXbjn90Kk/SkWqTVll3e2KErO4fpno0k+ahynjAvflsEoRd4TzwJVg0CIfbJN/uQXU1kE0bVZu6TUxoVz+GNvyfKjXj5X/t00EmG/gthI+9R4xgoXAkwcMXO6fxgRdHynSX5a4ix7m06epXzKElTSAg9TZKgeJZ5NCg5PmUqUqcF9Rrt0aW247hD5Q4DWDMV+bmpb03u2k8Wmr8P7km0cNwhDj6LikLAIz7xs/L8B1yg3zu5GkRBaV1T40Xr6K5zlWrqbELMPFNJcrmpAnIHC97TrjOr6dc1Eb5h50MrqxDFfxH5HOSQaUOTc6Jn3JEKM4xxB0t9/K5VfYPaTEwjGC1o6sEUOb5x04YHOk108uYiIHVCX964yaE2mbnW0YPXUuMvd7Hbokz4qeVkPYZb3UhYMfqemK8QZkLIxnWfjH/t6MUn/5Uife4G9X4SZWJR9qMGXKn5588iQeyQJuLannCUIeCxtMCRvb6B7bkaXEdvtl/kM7+iGeKEJ2X9lM4UGlR2r+NIIKN772H5rLYlKb/L3H8jTL6CJ1W4mYy8YaikLpXVHXIskxyzQBXsIjGeTxWAoZom6RG37laOYakJU2LmXwBIbJi3VBMrPKI7bZYGGqYrOfxXbqgp0x2Tu0cg5wqWsX/p3wjVE+lBgLvD5gbRK/NXQAjAEj2jvFrk8fU3i5/BKEwZQa0xGzVce38G+E7dxGkuyhxuHEcElcxHp3BwBGYwAjbJXwqtURKhTYqVA0JssOUCbI3NBh8lfcESLh/OeQo6bwiTlolT/KWowpd7DKX4oWYxTJxoxA5bn/0Ym3
 egg7DXQ8
 u8oyBD7gCWMT7KSoVoTJisPvhK5rAQg83bH/PA0Y6/r3ZL/JPFwVdnXKXY/WcnsoecH4cK54l90MFU09pzKFVayijrt0xnpxWpJgxAuic4jxCMth0tGjzB9sHlKlT3bMZQtkYxWoEWsCce/O+3mrmargjCmVF0d6HARtj+4/GfMNu3B12j6S5c2H06rCPTpbconVsCwCWIwBRxb5wtk1EDsxbz5TtVWzMQFQmaXePXqudyQGLRltaimB20rbXsIPlnIadlQLow7xCqM6o+1iL/fjTQf3habNZJXjLII6wZNJHLjNi4d+xEBiXyHEd+iMFwOcJ44p++I9ilaq1RW57flLYl8YYM326guZcjMqlx6jo5lBWseWXIFGjeNEQmTf+h6rCaujFyPETXaHXdJ8fwTfw3g8fM9shE21Da3FUtsFZ9MfmQX0ZhwsgZ3J4ipfEwMgw0GlgH/3JL2o/kUyj/9aN1GJoyLWPe4vTBGu2YgnnyfcZy6w+0GF0RkN5Jf+benGW57WuQB0x037k4IawnDdGIsU+NFXD2SJzO9yece7Y+5hmr5l241AMl1jBSdg5zy7E0Iq4o6T1FSz2PQ3IQJ/UOigF3q/o0DAQmkfILPxYxpqjNEvgBITO66ZeMedxXkBwuyfFBq6ciQ230Um3aWyLQ2D4j3RPyZeAVD5tB6lV7wSlUJYkqfgnoEvc17I7dZ8MJaBYUVt8KzEfvi+WdBvuZbJTlp4WFt8J+CeCHqAjUZWPDk8CDCcK5Q==
X-Bogosity: Ham, tests=bogofilter, spamicity=0.001290, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>

Background
==========

In our containerized environment, we have a specific type of container
that runs 18 processes, each consuming approximately 6GB of RSS. These
processes are organized as separate processes rather than threads due
to the Python Global Interpreter Lock (GIL) being a bottleneck in a
multi-threaded setup. Upon the exit of these containers, other
containers hosted on the same machine experience significant latency
spikes.

Investigation
=============

During my investigation of this issue, I found that the latency spikes
were caused by zone->lock contention, which can be illustrated as follows:

  CPU A (Freer)                   CPU B (Allocator)
  lock zone->lock
  free pages                      lock zone->lock
  unlock zone->lock               
                                  alloc pages
                                  unlock zone->lock

If the Freer holds the zone->lock for an extended period, the Allocator
has to wait, and latency spikes occur.

I also wrote a Python script to reproduce the issue on my test servers;
see patch #3 for details. It is worth noting that the reproducer is based
on the upstream kernel.
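
For readers who want a feel for the scenario before opening patch #3, here
is a minimal sketch. It is NOT the reproducer from patch #3. It assumes
that page-fault latency in one process is a reasonable proxy for allocator
latency while other processes exit and free their memory. All names
(touch, hog, probe) and all sizes are hypothetical and far smaller than
the 18 x 6GB workload described above.

```python
import mmap
import multiprocessing as mp
import time

PAGE = mmap.PAGESIZE


def touch(nbytes):
    """Map nbytes of anonymous memory and write one byte per page."""
    m = mmap.mmap(-1, nbytes)
    for off in range(0, nbytes, PAGE):
        m[off] = 1
    return m


def hog(nbytes, hold_s):
    """Worker: make nbytes resident, wait, then exit so it is all freed."""
    m = touch(nbytes)
    time.sleep(hold_s)
    m.close()


def probe(chunk_bytes, duration_s):
    """Fault in chunk_bytes repeatedly; return seconds taken per round."""
    samples = []
    deadline = time.perf_counter() + duration_s
    while True:
        t0 = time.perf_counter()
        m = touch(chunk_bytes)
        samples.append(time.perf_counter() - t0)
        m.close()
        if time.perf_counter() >= deadline:
            return samples


if __name__ == "__main__":
    # Hypothetical sizes: 4 workers x 32 MiB, exiting after 0.3 s.
    workers = [mp.Process(target=hog, args=(32 << 20, 0.3)) for _ in range(4)]
    for w in workers:
        w.start()
    lat = probe(1 << 20, 1.0)
    for w in workers:
        w.join()
    print(f"rounds={len(lat)} max={max(lat) * 1e3:.3f} ms")
```

With sizes this small the spikes may not be visible at all; the point is
only the shape of the experiment: several large processes exit at about
the same time while another process keeps allocating.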

Experimenting
=============

The more pages are freed in one batch, the longer the zone->lock is held.
So I tried reducing the batch size. After I restricted the batch to the
smallest size, there were no more complaints about latency spikes.
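
The relationship between the scale factor and the lock hold time can be
sketched with some rough arithmetic. This is only a model, under these
assumptions: the number of pages freed under a single zone->lock hold is
capped at about pcp->batch << scale, the base batch is 63 pages, the
scale ranges from 0 to 6, and freeing one page costs a fixed 100 ns. The
per-page cost in particular is a made-up number, not a measurement.

```python
BASE_BATCH = 63      # assumed pcp->batch value, not taken from this series
PER_PAGE_NS = 100    # hypothetical per-page free cost, not measured


def max_pages_per_lock_hold(base_batch, scale_max):
    """Assumed cap on pages freed in one zone->lock hold."""
    return base_batch << scale_max


def est_hold_us(base_batch, scale_max, per_page_ns=PER_PAGE_NS):
    """Estimated lock hold time in microseconds under the model above."""
    return max_pages_per_lock_hold(base_batch, scale_max) * per_page_ns / 1000


if __name__ == "__main__":
    for scale in range(0, 7):
        pages = max_pages_per_lock_hold(BASE_BATCH, scale)
        print(f"scale={scale} pages={pages} hold~{est_hold_us(BASE_BATCH, scale):.1f} us")
```

Under this model, going from a scale of 5 to 0 shrinks the worst-case
batch from 2016 pages to 63, which is a 32x shorter hold per lock
acquisition, at the price of taking the lock more often.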

However, during my experiments, I found that CONFIG_PCP_BATCH_SCALE_MAX
is hard to use in practice, so this series tries to improve it.

The Proposal
============

This series encompasses two minor refinements to the PCP high watermark
auto-tuning mechanism, along with the introduction of a new sysctl knob
that serves as a more practical alternative to the previous configuration
method.
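
As a rough illustration of why a runtime knob is easier to use than a
Kconfig option, the sketch below reads and writes the proposed sysctl
through procfs. It assumes a kernel with this series applied; on any other
kernel the file does not exist and read_knob() returns None. The helper
names are hypothetical, and treating 0 as the smallest batch scale is an
assumption. Writing requires root.

```python
from pathlib import Path

# Present only if this series is applied; vm.<name> maps to /proc/sys/vm/<name>.
KNOB = Path("/proc/sys/vm/pcp_batch_scale_max")


def read_knob(path=KNOB):
    """Return the current value, or None if the knob is absent or unreadable."""
    try:
        return int(path.read_text().strip())
    except (FileNotFoundError, PermissionError, ValueError):
        return None


def write_knob(value, path=KNOB):
    """Write a new value; raises OSError if not permitted or not supported."""
    path.write_text(f"{value}\n")


if __name__ == "__main__":
    current = read_knob()
    if current is None:
        print("vm.pcp_batch_scale_max is not available on this kernel")
    else:
        print(f"vm.pcp_batch_scale_max = {current}")
```

No rebuild or reboot is needed to try a different value, which is what
makes the experiment described above practical on production machines.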

Future work
===========

To ultimately mitigate the zone->lock contention issue, several suggestions
have been proposed. One approach involves dividing large zones into multiple
smaller zones, as suggested by Matthew[0], while another entails splitting
the zone->lock using a mechanism similar to memory arenas and shifting away
from relying solely on zone_id to identify the range of free lists a
particular page belongs to, as suggested by Mel[1]. However, implementing
these solutions is likely to necessitate a more extended development
effort.

Link: https://lore.kernel.org/linux-mm/ZnTrZ9mcAIRodnjx@casper.infradead.org/ [0]
Link: https://lore.kernel.org/linux-mm/20240705130943.htsyhhhzbcptnkcu@techsingularity.net/ [1]

Changes:
- v2->v3:
  - commit log refinement
  - rebase onto mm-everything

- v1->v2: https://lwn.net/Articles/983837/
  - commit log refinement

- v1: mm/page_alloc: Introduce a new sysctl knob vm.pcp_batch_scale_max
  https://lwn.net/Articles/981069/

- mm: Enable setting -1 for vm.percpu_pagelist_high_fraction to set the
  minimum pagelist
  https://lore.kernel.org/linux-mm/20240701142046.6050-1-laoar.shao@gmail.com/

Yafang Shao (3):
  mm/page_alloc: A minor fix to the calculation of pcp->free_count
  mm/page_alloc: Avoid changing pcp->high decaying when adjusting
    CONFIG_PCP_BATCH_SCALE_MAX
  mm/page_alloc: Introduce a new sysctl knob vm.pcp_batch_scale_max

 Documentation/admin-guide/sysctl/vm.rst | 17 +++++++++++
 mm/Kconfig                              | 11 -------
 mm/page_alloc.c                         | 40 ++++++++++++++++++-------
 3 files changed, 47 insertions(+), 21 deletions(-)