From patchwork Thu Jun 27 08:53:08 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ge Yang
X-Patchwork-Id: 13713988
From: yangge1116@126.com
To: akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org,
 21cnbao@gmail.com, peterx@redhat.com, baolin.wang@linux.alibaba.com,
 liuzixing@hygon.cn, yangge <yangge1116@126.com>
Subject: [PATCH] mm/gup: Use try_grab_page() instead of try_grab_folio() in
 gup slow path
Date: Thu, 27 Jun 2024 16:53:08 +0800
Message-Id: <1719478388-31917-1-git-send-email-yangge1116@126.com>
X-Mailer: git-send-email 2.7.4

From: yangge <yangge1116@126.com>

If a large amount of CMA memory is configured in the system (for
example, CMA memory accounts for 50% of the system memory), starting a
SEV virtual machine will fail.  While starting the SEV virtual machine,
pin_user_pages_fast(..., FOLL_LONGTERM, ...) is called to pin memory.
Normally, if a page is present and in the CMA area,
pin_user_pages_fast() will first call __get_user_pages_locked() to pin
the page in the CMA area, and then call
check_and_migrate_movable_pages() to migrate the page out of the CMA
area to a non-CMA area.  But currently the call to
__get_user_pages_locked() fails, because it calls try_grab_folio() to
pin the page in the gup slow path.  Commit 57edfcfd3419 ("mm/gup:
accelerate thp gup even for "pages != NULL"") uses try_grab_folio() in
the gup slow path, which seems problematic because try_grab_folio()
checks whether the page can be pinned long-term.  This check may fail
and cause __get_user_pages_locked() to fail.  However, the check is not
required in the gup slow path, so we can use try_grab_page() instead of
try_grab_folio().
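
For context, the pinning pattern that hits this failure looks roughly
like the sketch below.  This is only an illustration modeled on
sev_pin_memory() from the call trace further down, not code from this
patch; pin_longterm(), uaddr and npages are made-up names.

/*
 * Illustrative long-term pin pattern (needs <linux/mm.h>,
 * <linux/slab.h>, <linux/err.h>).
 */
static struct page **pin_longterm(unsigned long uaddr, int npages)
{
	struct page **pages;
	int npinned;

	pages = kvmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
	if (!pages)
		return ERR_PTR(-ENOMEM);

	/*
	 * With FOLL_LONGTERM, gup must not leave long-term pins in CMA:
	 * it first pins the pages (__get_user_pages_locked()) and then
	 * migrates any CMA pages out (check_and_migrate_movable_pages()).
	 * The bug described above makes the first step fail, because
	 * try_grab_folio() already rejects pages that are not long-term
	 * pinnable, before migration gets a chance to move them.
	 */
	npinned = pin_user_pages_fast(uaddr, npages,
				      FOLL_WRITE | FOLL_LONGTERM, pages);
	if (npinned != npages) {
		if (npinned > 0)
			unpin_user_pages(pages, npinned);
		kvfree(pages);
		return ERR_PTR(-ENOMEM);
	}

	return pages;
}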
In addition, in the current code try_grab_page() can only add 1 to a
page's refcount.  Extend it so that the refcount can be increased
according to the number of references passed in.

The following log reveals the failure:

[  464.325306] WARNING: CPU: 13 PID: 6734 at mm/gup.c:1313 __get_user_pages+0x423/0x520
[  464.325464] CPU: 13 PID: 6734 Comm: qemu-kvm Kdump: loaded Not tainted 6.6.33+ #6
[  464.325477] RIP: 0010:__get_user_pages+0x423/0x520
[  464.325515] Call Trace:
[  464.325520]  <TASK>
[  464.325523]  ? __get_user_pages+0x423/0x520
[  464.325528]  ? __warn+0x81/0x130
[  464.325536]  ? __get_user_pages+0x423/0x520
[  464.325541]  ? report_bug+0x171/0x1a0
[  464.325549]  ? handle_bug+0x3c/0x70
[  464.325554]  ? exc_invalid_op+0x17/0x70
[  464.325558]  ? asm_exc_invalid_op+0x1a/0x20
[  464.325567]  ? __get_user_pages+0x423/0x520
[  464.325575]  __gup_longterm_locked+0x212/0x7a0
[  464.325583]  internal_get_user_pages_fast+0xfb/0x190
[  464.325590]  pin_user_pages_fast+0x47/0x60
[  464.325598]  sev_pin_memory+0xca/0x170 [kvm_amd]
[  464.325616]  sev_mem_enc_register_region+0x81/0x130 [kvm_amd]

Fixes: 57edfcfd3419 ("mm/gup: accelerate thp gup even for "pages != NULL"")
Cc: <stable@vger.kernel.org>
Signed-off-by: yangge <yangge1116@126.com>
---
 mm/gup.c         | 26 ++++++++++++--------------
 mm/huge_memory.c |  2 +-
 mm/internal.h    |  2 +-
 3 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 6ff9f95..bb58909 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -222,7 +222,7 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
  *	-ENOMEM		FOLL_GET or FOLL_PIN was set, but the page could not
  *			be grabbed.
  */
-int __must_check try_grab_page(struct page *page, unsigned int flags)
+int __must_check try_grab_page(struct page *page, int refs, unsigned int flags)
 {
 	struct folio *folio = page_folio(page);
 
@@ -233,7 +233,7 @@ int __must_check try_grab_page(struct page *page, unsigned int flags)
 		return -EREMOTEIO;
 
 	if (flags & FOLL_GET)
-		folio_ref_inc(folio);
+		folio_ref_add(folio, refs);
 	else if (flags & FOLL_PIN) {
 		/*
 		 * Don't take a pin on the zero page - it's not going anywhere
@@ -248,13 +248,13 @@ int __must_check try_grab_page(struct page *page, unsigned int flags)
 		 * so that the page really is pinned.
 		 */
 		if (folio_test_large(folio)) {
-			folio_ref_add(folio, 1);
-			atomic_add(1, &folio->_pincount);
+			folio_ref_add(folio, refs);
+			atomic_add(refs, &folio->_pincount);
 		} else {
-			folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
+			folio_ref_add(folio, refs * GUP_PIN_COUNTING_BIAS);
 		}
 
-		node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1);
+		node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
 	}
 
 	return 0;
@@ -729,7 +729,7 @@ static struct page *follow_huge_pud(struct vm_area_struct *vma,
 	    gup_must_unshare(vma, flags, page))
 		return ERR_PTR(-EMLINK);
 
-	ret = try_grab_page(page, flags);
+	ret = try_grab_page(page, 1, flags);
 	if (ret)
 		page = ERR_PTR(ret);
 	else
@@ -806,7 +806,7 @@ static struct page *follow_huge_pmd(struct vm_area_struct *vma,
 	VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
 			!PageAnonExclusive(page), page);
 
-	ret = try_grab_page(page, flags);
+	ret = try_grab_page(page, 1, flags);
 	if (ret)
 		return ERR_PTR(ret);
 
@@ -969,7 +969,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 		       !PageAnonExclusive(page), page);
 
 	/* try_grab_page() does nothing unless FOLL_GET or FOLL_PIN is set. */
-	ret = try_grab_page(page, flags);
+	ret = try_grab_page(page, 1, flags);
 	if (unlikely(ret)) {
 		page = ERR_PTR(ret);
 		goto out;
@@ -1233,7 +1233,7 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
 			goto unmap;
 		*page = pte_page(entry);
 	}
-	ret = try_grab_page(*page, gup_flags);
+	ret = try_grab_page(*page, 1, gup_flags);
 	if (unlikely(ret))
 		goto unmap;
 out:
@@ -1636,22 +1636,20 @@ static long __get_user_pages(struct mm_struct *mm,
 			 * pages.
 			 */
 			if (page_increm > 1) {
-				struct folio *folio;
-
 				/*
 				 * Since we already hold refcount on the
 				 * large folio, this should never fail.
 				 */
-				folio = try_grab_folio(page, page_increm - 1,
+				ret = try_grab_page(page, page_increm - 1,
 						       foll_flags);
-				if (WARN_ON_ONCE(!folio)) {
+				if (WARN_ON_ONCE(ret)) {
 					/*
 					 * Release the 1st page ref if the
 					 * folio is problematic, fail hard.
 					 */
 					gup_put_folio(page_folio(page), 1,
 						      foll_flags);
-					ret = -EFAULT;
 					goto out;
 				}
 			}
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 425374a..18604e4 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1332,7 +1332,7 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
 	if (!*pgmap)
 		return ERR_PTR(-EFAULT);
 	page = pfn_to_page(pfn);
-	ret = try_grab_page(page, flags);
+	ret = try_grab_page(page, 1, flags);
 	if (ret)
 		page = ERR_PTR(ret);
 
diff --git a/mm/internal.h b/mm/internal.h
index 2ea9a88..5305bbf 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1227,7 +1227,7 @@ int migrate_device_coherent_page(struct page *page);
 /*
  * mm/gup.c
  */
 struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags);
-int __must_check try_grab_page(struct page *page, unsigned int flags);
+int __must_check try_grab_page(struct page *page, int refs, unsigned int flags);
 
 /*
  * mm/huge_memory.c
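
As a side note on the accounting the extended helper performs: for
FOLL_PIN, small folios encode pins in the refcount itself, while large
folios track them separately in _pincount.  The standalone userspace
model below illustrates the arithmetic only; it assumes the kernel's
GUP_PIN_COUNTING_BIAS value of 1024, and folio_model/pin_refs are
made-up names, not kernel API.

#include <stdio.h>

#define GUP_PIN_COUNTING_BIAS 1024

struct folio_model {
	int refcount;
	int pincount;	/* models folio->_pincount, large folios only */
	int large;	/* models folio_test_large() */
};

/* FOLL_PIN-style accounting for 'refs' references, as in the patch. */
static void pin_refs(struct folio_model *f, int refs)
{
	if (f->large) {
		f->refcount += refs;	/* folio_ref_add(folio, refs) */
		f->pincount += refs;	/* atomic_add(refs, &folio->_pincount) */
	} else {
		/* Small folios encode each pin as a bias on the refcount. */
		f->refcount += refs * GUP_PIN_COUNTING_BIAS;
	}
}

int main(void)
{
	struct folio_model small = { .refcount = 1, .large = 0 };
	struct folio_model large = { .refcount = 1, .large = 1 };

	/* e.g. refs == page_increm - 1 == 4 in the __get_user_pages() hunk */
	pin_refs(&small, 4);
	pin_refs(&large, 4);

	printf("small: refcount=%d\n", small.refcount);	/* 1 + 4*1024 = 4097 */
	printf("large: refcount=%d pincount=%d\n",
	       large.refcount, large.pincount);		/* 5 and 4 */
	return 0;
}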