From patchwork Wed Mar 20 02:42:11 2024
X-Patchwork-Submitter: Kaiyang Zhao
X-Patchwork-Id: 13597219
From: kaiyang2@cs.cmu.edu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Kaiyang Zhao, hannes@cmpxchg.org, ziy@nvidia.com, dskarlat@cs.cmu.edu
Subject: [RFC PATCH 0/7] mm: providing ample physical memory contiguity by confining unmovable allocations
Date: Wed, 20 Mar 2024 02:42:11 +0000
Message-Id: <20240320024218.203491-1-kaiyang2@cs.cmu.edu>
X-Mailer: git-send-email 2.40.1
Reply-To: Kaiyang Zhao

From: Kaiyang Zhao

Memory capacity has increased dramatically over the last few decades.
Meanwhile, TLB capacity has stagnated, causing significant virtual
address translation overhead. In a collaboration between Carnegie
Mellon University and Meta, we investigated the issue at Meta’s
datacenters and found that about 20% of CPU cycles are spent doing page
walks [1]; similar results have been reported by Google [2].

To tackle the overhead, we need widespread use of huge pages. And huge
pages, when they can actually be created, work wonders: they provide up
to 18% higher performance for Meta’s production workloads in our
experiments [1].

However, we observed that huge pages through THP are unreliable because
sufficient physical contiguity may not exist, and compaction to recover
from memory fragmentation frequently fails. To ensure workloads get a
reasonable number of huge pages, Meta could not rely on THP and had to
use reserved huge pages. Proposals to add 1GB THP support [5] are even
more dependent on ample availability of physical contiguity.

A major reason for the lack of physical contiguity is the mixing of
unmovable and movable allocations, which causes compaction to fail.
Quoting from [3], “in a broad sample of Meta servers, we find that
unmovable allocations make up less than 7% of total memory on average,
yet occupy 34% of the 2M blocks in the system. We also found that this
effect isn't correlated with high uptimes, and that servers can get
heavily fragmented within the first hour of running a workload.”

Our proposed solution is to confine the unmovable allocations to a
separate region in physical memory. We experimented with using a CMA
region for the movable allocations, but in this version we use
ZONE_MOVABLE for movable allocations and all other zones for unmovable
allocations. Movable allocations can temporarily reside in the
unmovable zones, but will be proactively moved out by compaction.

To resize ZONE_MOVABLE, we still rely on memory hotplug interfaces. We
export the number of pages scanned on behalf of movable or unmovable
allocations during reclaim to approximate the memory pressure in the
two parts of physical memory, and a userspace tool can monitor the
metrics and make resizing decisions. Previously we augmented the PSI
interface to break down memory pressure into movable and unmovable
allocation types, but that approach enlarges the scheduler cacheline
footprint. From our preliminary observations, the per-allocation-type
scanned counters alone, with a little tuning, are sufficient to tell
whether there is not enough memory for unmovable allocations and to
make resizing decisions.

This series extends the idea of migratetype isolation at pageblock
granularity posted earlier [3] by Johannes Weiner to an
as-large-as-needed region, to better support huge pages of bigger sizes
and hardware TLB coalescing. We’re looking for feedback on the overall
direction, particularly in relation to the recent THP allocator
optimization proposal [4].

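To illustrate the monitoring loop described above, here is a minimal
sketch of how a userspace tool might turn two samples of the scanned
counters into a resizing decision. Everything in it is an assumption
made for illustration: the names ScanSample and resize_decision, the
threshold value, and the policy itself are hypothetical, not taken from
the patches, which only export the counters and leave the policy to
userspace.

```python
from dataclasses import dataclass


@dataclass
class ScanSample:
    """Cumulative reclaim scan counts at one point in time.

    Field names are hypothetical; the real counter names are defined
    by the patches and are not reproduced here.
    """
    movable: int    # pages scanned on behalf of movable allocations
    unmovable: int  # pages scanned on behalf of unmovable allocations


def resize_decision(prev: ScanSample, curr: ScanSample,
                    threshold: int = 1000) -> str:
    """Return 'shrink_movable', 'grow_movable', or 'hold'.

    The threshold is an arbitrary illustrative value that would need
    tuning per workload.
    """
    d_mov = curr.movable - prev.movable
    d_unmov = curr.unmovable - prev.unmovable

    # Heavy scanning for unmovable allocations suggests the unmovable
    # zones are too small, so ZONE_MOVABLE should give memory back.
    if d_unmov > threshold and d_unmov > d_mov:
        return "shrink_movable"

    # Pressure only on the movable side suggests ZONE_MOVABLE can grow.
    if d_mov > threshold and d_unmov == 0:
        return "grow_movable"

    return "hold"
```

A real tool would sample the counters periodically and apply the
decision through the memory hotplug interfaces; that part is omitted
here because it depends on the sysfs interface added by the series.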
The patches are based on 6.4 and are also available on github at
https://github.com/magickaiyang/kernel-contiguous/tree/per_alloc_type_reclaim_counters_oct052023

Kaiyang Zhao (7):
  sysfs interface for the boundary of movable zone
  Disallows high-order movable allocations in other zones if
    ZONE_MOVABLE is populated
  compaction accepts a destination zone
  vmstat counter for pages migrated across zones
  proactively move pages out of unmovable zones in kcompactd
  pass gfp mask of the allocation that waked kswapd to track number of
    pages scanned on behalf of each alloc type
  exports the number of pages scanned on behalf of movable/unmovable
    allocations

 drivers/base/memory.c         |   2 +-
 drivers/base/node.c           |  32 ++++++
 include/linux/compaction.h    |   4 +-
 include/linux/memory.h        |   1 +
 include/linux/mmzone.h        |   1 +
 include/linux/vm_event_item.h |   6 +
 mm/compaction.c               | 209 ++++++++++++++++++++++++++--------
 mm/internal.h                 |   1 +
 mm/page_alloc.c               |  10 ++
 mm/vmscan.c                   |  28 ++++-
 mm/vmstat.c                   |  14 ++-
 11 files changed, 249 insertions(+), 59 deletions(-)