From patchwork Sun Aug 4 08:01:04 2024
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 13752533
From: Yafang Shao
To: akpm@linux-foundation.org
Cc: ying.huang@intel.com, mgorman@techsingularity.net, linux-mm@kvack.org,
 Yafang Shao
Subject: [PATCH v3 0/3] mm: Introduce a new sysctl knob vm.pcp_batch_scale_max
Date: Sun, 4 Aug 2024 16:01:04 +0800
Message-Id: <20240804080107.21094-1-laoar.shao@gmail.com>

Background
==========

In our containerized
environment, we have a specific type of container that runs 18 processes,
each consuming approximately 6GB of RSS. They run as separate processes
rather than threads because the Python Global Interpreter Lock (GIL) is a
bottleneck in a multi-threaded setup. When these containers exit, other
containers hosted on the same machine experience significant latency
spikes.

Investigation
=============

During my investigation of this issue, I found that the latency spikes were
caused by zone->lock contention. It can be illustrated as follows:

   CPU A (Freer)                   CPU B (Allocator)
  lock zone->lock
  free pages                      lock zone->lock
  unlock zone->lock
                                  alloc pages
                                  unlock zone->lock

If the Freer holds the zone->lock for an extended period, the Allocator has
to wait, and thus latency spikes occur.

I also wrote a Python script to reproduce it on my test servers. See the
details in patch #3. It is worth noting that the reproducer is based on the
upstream kernel.

Experimenting
=============

The more pages are freed in one batch, the longer the zone->lock is held.
So my attempt involves reducing the batch size. After I restricted the
batch to the smallest size, there were no more complaints about latency
spikes.

However, during my experiment, I found that CONFIG_PCP_BATCH_SCALE_MAX is
hard to use in practice. So I try to improve it in this series.

The Proposal
============

This series encompasses two minor refinements to the PCP high watermark
auto-tuning mechanism, along with the introduction of a new sysctl knob
that serves as a more practical alternative to the previous configuration
method.

Future work
===========

To ultimately mitigate the zone->lock contention issue, several suggestions
have been proposed.
One approach involves dividing large zones into multiple smaller zones, as
suggested by Matthew[0], while another entails splitting the zone->lock
using a mechanism similar to memory arenas and shifting away from relying
solely on zone_id to identify the range of free lists a particular page
belongs to, as suggested by Mel[1]. However, implementing these solutions
is likely to necessitate a more extended development effort.

Link: https://lore.kernel.org/linux-mm/ZnTrZ9mcAIRodnjx@casper.infradead.org/ [0]
Link: https://lore.kernel.org/linux-mm/20240705130943.htsyhhhzbcptnkcu@techsingularity.net/ [1]

Changes:
- v2->v3:
  - commit log refinement
  - rebase it on mm-everything

- v1->v2: https://lwn.net/Articles/983837/
  Commit log refinement

- v1: mm/page_alloc: Introduce a new sysctl knob vm.pcp_batch_scale_max
  https://lwn.net/Articles/981069/

- mm: Enable setting -1 for vm.percpu_pagelist_high_fraction to set the
  minimum pagelist
  https://lore.kernel.org/linux-mm/20240701142046.6050-1-laoar.shao@gmail.com/

Yafang Shao (3):
  mm/page_alloc: A minor fix to the calculation of pcp->free_count
  mm/page_alloc: Avoid changing pcp->high decaying when adjusting
    CONFIG_PCP_BATCH_SCALE_MAX
  mm/page_alloc: Introduce a new sysctl knob vm.pcp_batch_scale_max

 Documentation/admin-guide/sysctl/vm.rst | 17 +++++++++++
 mm/Kconfig                              | 11 -------
 mm/page_alloc.c                         | 40 ++++++++++++++++++-------
 3 files changed, 47 insertions(+), 21 deletions(-)
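
Illustration (toy model, not kernel code)
=========================================

As a rough illustration of the trade-off described under "Experimenting",
the sketch below models how the batch scale bounds the number of pages
freed under a single zone->lock hold, and how many times the lock must be
taken to free one process's memory. It is not the reproducer from patch #3.
The model assumes that a batch can grow up to base_batch << scale_max; the
base batch of 63 pages, the 4KB page size and the 100ns per-page cost are
illustrative assumptions, not measurements. All function names are
hypothetical.

```python
# Toy model of zone->lock hold time versus PCP batch scale.
# Every constant here is an illustrative assumption, not a measured value.


def pages_per_lock_hold(base_batch: int, scale_max: int) -> int:
    """Upper bound on pages freed under one lock hold.

    Assumes the batch may be scaled up to base_batch << scale_max.
    """
    if base_batch <= 0 or scale_max < 0:
        raise ValueError("base_batch must be > 0 and scale_max >= 0")
    return base_batch << scale_max


def worst_case_hold_ns(base_batch: int, scale_max: int, per_page_ns: int) -> int:
    """Longest time an allocator might wait behind a single freer batch."""
    return pages_per_lock_hold(base_batch, scale_max) * per_page_ns


def lock_acquisitions(total_pages: int, base_batch: int, scale_max: int) -> int:
    """Lock round trips needed to free total_pages at the maximum batch size."""
    per_hold = pages_per_lock_hold(base_batch, scale_max)
    return -(-total_pages // per_hold)  # ceiling division


if __name__ == "__main__":
    BASE_BATCH = 63                       # assumed base batch, in pages
    PER_PAGE_NS = 100                     # assumed cost to free one page
    TOTAL_PAGES = 6 * 1024 * 1024 // 4    # ~6GB of RSS in assumed 4KB pages

    print("scale  pages/hold  worst hold (ns)  lock acquisitions")
    for scale in range(0, 6):
        print(
            f"{scale:>5}  "
            f"{pages_per_lock_hold(BASE_BATCH, scale):>10}  "
            f"{worst_case_hold_ns(BASE_BATCH, scale, PER_PAGE_NS):>15}  "
            f"{lock_acquisitions(TOTAL_PAGES, BASE_BATCH, scale):>17}"
        )
```

In this model, lowering the scale shortens the worst-case hold time at the
cost of more lock acquisitions, which is the latency versus throughput
trade-off that a runtime knob would let an administrator tune per workload.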