From patchwork Wed Jan 8 08:37:20 2025
X-Patchwork-Submitter: Ge Yang
X-Patchwork-Id: 13930281
From: yangge1116@126.com
To: akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, 21cnbao@gmail.com,
    david@redhat.com, baolin.wang@linux.alibaba.com, hannes@cmpxchg.org,
    liuzixing@hygon.cn, yangge
Subject: [PATCH V2] mm: compaction: skip memory compaction when there are not enough migratable pages
Date: Wed, 8 Jan 2025 16:37:20 +0800
Message-Id: <1736325440-30857-1-git-send-email-yangge1116@126.com>
X-Mailer: git-send-email 2.7.4

From: yangge

There are 4 NUMA nodes on my machine, and each NUMA node has 32GB of
memory. I have configured 16GB of CMA memory on each NUMA node, and
starting a 32GB virtual machine with device passthrough is extremely
slow, taking almost an hour.

During the start-up of the virtual machine, it calls
pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory.
Long-term GUP cannot allocate memory from the CMA area, so at most 16GB
of non-CMA memory on a NUMA node can be used as virtual machine memory.

There is 16GB of free CMA memory on a NUMA node, which is sufficient to
pass the order-0 watermark check, causing __compaction_suitable() to
consistently return true. However, if there aren't enough migratable
pages available, performing memory compaction is meaningless. Besides
checking whether the order-0 watermark is met, __compaction_suitable()
also needs to determine whether there are sufficient migratable pages
available for memory compaction. Because __compaction_suitable() always
returns true for costly allocations, __alloc_pages_slowpath() cannot
exit at the appropriate place, resulting in an excessively long virtual
machine start-up time.
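To make the mismatch concrete, here is a rough, self-contained
user-space sketch of the decision described above. It is not kernel
code; the counter values and the 16GB/32GB split are illustrative
stand-ins for the real per-node statistics on my machine.

/*
 * Sketch: the free-page watermark check passes thanks to free CMA
 * pages, but nearly all movable memory is long-term pinned, so
 * compaction cannot actually help the (non-CMA) allocation.
 */
#include <stdbool.h>
#include <stdio.h>

#define GB_PAGES (256UL * 1024)	/* 4KiB pages per GiB */

/* Free CMA pages are counted here, mirroring the ALLOC_CMA behaviour. */
static bool watermark_ok(unsigned long free_pages, unsigned long watermark)
{
	return free_pages > watermark;
}

int main(void)
{
	unsigned long free_cma   = 16 * GB_PAGES; /* untouched CMA area       */
	unsigned long free_other = 0;             /* non-CMA memory exhausted */
	unsigned long lru_pages  = 16 * GB_PAGES; /* pages on the LRU lists   */
	unsigned long pinned     = 16 * GB_PAGES; /* FOLL_LONGTERM pins       */
	unsigned long watermark  = 8 * 1024;      /* arbitrary low watermark  */
	int order = 9;                            /* costly allocation        */

	/* Current behaviour: only the order-0 free-page watermark is checked. */
	bool suitable_old = watermark_ok(free_cma + free_other, watermark);

	/* Patched behaviour: also require enough migratable (unpinned) pages. */
	bool suitable_new = suitable_old &&
			    (lru_pages - pinned) >= (1UL << order);

	printf("old: %d, new: %d\n", suitable_old, suitable_new);
	return 0;
}

With these numbers the old check keeps reporting "suitable" while the
new check correctly reports that compaction cannot make progress.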
Call trace:
__alloc_pages_slowpath
    if (compact_result == COMPACT_SKIPPED ||
        compact_result == COMPACT_DEFERRED)
        goto nopage; // should exit __alloc_pages_slowpath() from here

When the 16GB of non-CMA memory on a single node is exhausted, we fall
back to allocating memory on other nodes. In order to fall back to
remote nodes quickly, we should skip memory compaction when migratable
pages are insufficient. After this fix, it only takes a few tens of
seconds to start a 32GB virtual machine with device passthrough.

Signed-off-by: yangge
---
V2:
- consider unevictable folios

 mm/compaction.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index 07bd227..1630abd 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2383,7 +2383,27 @@ static bool __compaction_suitable(struct zone *zone, int order,
 				  int highest_zoneidx,
 				  unsigned long wmark_target)
 {
+	struct pglist_data *pgdat = zone->zone_pgdat;
+	unsigned long sum, nr_pinned;
 	unsigned long watermark;
+
+	sum = node_page_state(pgdat, NR_INACTIVE_FILE) +
+	      node_page_state(pgdat, NR_INACTIVE_ANON) +
+	      node_page_state(pgdat, NR_ACTIVE_FILE) +
+	      node_page_state(pgdat, NR_ACTIVE_ANON) +
+	      node_page_state(pgdat, NR_UNEVICTABLE);
+
+	nr_pinned = node_page_state(pgdat, NR_FOLL_PIN_ACQUIRED) -
+		    node_page_state(pgdat, NR_FOLL_PIN_RELEASED);
+
+	/*
+	 * Gup-pinned pages are non-migratable. After subtracting these pages,
+	 * we need to check if the remaining pages are sufficient for memory
+	 * compaction.
+	 */
+	if ((sum - nr_pinned) < (1 << order))
+		return false;
+
 	/*
 	 * Watermarks for order-0 must be met for compaction to be able to
 	 * isolate free pages for migration targets. This means that the
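As a side note, the pinned-page estimate the new check relies on
(NR_FOLL_PIN_ACQUIRED minus NR_FOLL_PIN_RELEASED) can be approximated
from user space for a quick sanity check. The sketch below assumes the
kernel exports these counters in /proc/vmstat as nr_foll_pin_acquired
and nr_foll_pin_released; the names and their availability may vary by
kernel version and configuration.

/* Rough user-space estimate of currently pinned pages (system-wide). */
#include <stdio.h>
#include <string.h>

int main(void)
{
	unsigned long long acquired = 0, released = 0, val;
	char name[64];
	FILE *f = fopen("/proc/vmstat", "r");

	if (!f) {
		perror("fopen");
		return 1;
	}
	/* Each /proc/vmstat line is "<counter_name> <value>". */
	while (fscanf(f, "%63s %llu", name, &val) == 2) {
		if (!strcmp(name, "nr_foll_pin_acquired"))
			acquired = val;
		else if (!strcmp(name, "nr_foll_pin_released"))
			released = val;
	}
	fclose(f);
	printf("estimated pinned pages: %llu\n", acquired - released);
	return 0;
}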