From patchwork Tue Jan 7 04:35:05 2025
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 13928244
Date: Mon, 6 Jan 2025 21:35:05 -0700
Message-ID: <20250107043505.351925-1-yuzhao@google.com>
Subject: [PATCH mm-unstable v1] mm/hugetlb_vmemmap: fix memory loads ordering
From: Yu Zhao
To: Andrew Morton
Cc: David Hildenbrand, Mateusz Guzik, "Matthew Wilcox (Oracle)", Muchun Song,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yu Zhao, Will Deacon
Using x86_64 as an example, for a 32KB struct page[] area describing a
2MB hugeTLB page, HVO reduces the area to 4KB by the following steps:

1. Split the (r/w vmemmap) PMD mapping the area into 512 (r/w) PTEs;
2. For the 8 PTEs mapping the area, remap PTE 1-7 to the page mapped
   by PTE 0, and at the same time change the permission from r/w to
   r/o;
3. Free the pages PTE 1-7 used to map, hence the reduction from 32KB
   to 4KB.

However, the following race can happen due to improperly ordered
memory loads:

  CPU 1 (HVO)                     CPU 2 (speculative PFN walker)

  page_ref_freeze()
  synchronize_rcu()
                                  rcu_read_lock()
                                  page_is_fake_head() is false
  vmemmap_remap_pte()
  XXX: struct page[] becomes r/o

  page_ref_unfreeze()
                                  page_ref_count() is not zero

                                  atomic_add_unless(&page->_refcount)
                                  XXX: try to modify r/o struct page[]

Specifically, page_is_fake_head() must be ordered after
page_ref_count() on CPU 2 so that it can only return true for this
case, to avoid the later attempt to modify r/o struct page[].

This patch adds the missing memory barrier and performs the tests on
page_is_fake_head() and page_ref_count() in the proper order.

Fixes: bd225530a4c7 ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers")
Reported-by: Will Deacon
Closes: https://lore.kernel.org/20241128142028.GA3506@willie-the-truck/
Signed-off-by: Yu Zhao
---
 include/linux/page-flags.h | 2 +-
 include/linux/page_ref.h   | 8 ++++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 691506bdf2c5..6b8ecf86f1b6 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -212,7 +212,7 @@ static __always_inline const struct page *page_fixed_fake_head(const struct page
 	 * cold cacheline in some cases.
 	 */
 	if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) &&
-	    test_bit(PG_head, &page->flags)) {
+	    test_bit_acquire(PG_head, &page->flags)) {
 		/*
 		 * We can safely access the field of the @page[1] with PG_head
 		 * because the @page is a compound page composed with at least

diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
index 8c236c651d1d..5becea98bd79 100644
--- a/include/linux/page_ref.h
+++ b/include/linux/page_ref.h
@@ -233,8 +233,12 @@ static inline bool page_ref_add_unless(struct page *page, int nr, int u)
 	bool ret = false;
 
 	rcu_read_lock();
-	/* avoid writing to the vmemmap area being remapped */
-	if (!page_is_fake_head(page) && page_ref_count(page) != u)
+	/*
+	 * To avoid writing to the vmemmap area remapped into r/o in parallel,
+	 * the page_ref_count() test must precede the page_is_fake_head() test
+	 * so that test_bit_acquire() in the latter is ordered after the former.
+	 */
+	if (page_ref_count(page) != u && !page_is_fake_head(page))
 		ret = atomic_add_unless(&page->_refcount, nr, u);
 	rcu_read_unlock();