From patchwork Fri Jun 11 16:15:45 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jann Horn
X-Patchwork-Id: 12316105
From: Jann Horn
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jann Horn,
    Matthew Wilcox, "Kirill A. Shutemov", John Hubbard, Jan Kara,
    stable@vger.kernel.org
Subject: [PATCH resend] mm/gup: fix try_grab_compound_head() race with split_huge_page()
Date: Fri, 11 Jun 2021 18:15:45 +0200
Message-Id: <20210611161545.998858-1-jannh@google.com>
X-Mailer: git-send-email 2.32.0.272.g935e593368-goog
MIME-Version: 1.0

try_grab_compound_head() is used to grab a reference to a page from
get_user_pages_fast(), which is only protected against concurrent freeing
of page tables (via local_irq_save()), but not against concurrent TLB
flushes, freeing of data pages, or splitting of compound pages.

Because no reference is held to the page when try_grab_compound_head() is
called, the page may have been freed and reallocated by the time its
refcount has been elevated; therefore, once we're holding a stable
reference to the page, the caller re-checks whether the PTE still points
to the same page (with the same access rights).

The problem is that try_grab_compound_head() has to grab a reference on
the head page; but between the time we look up what the head page is and
the time we actually grab a reference on the head page, the compound page
may have been split up (either explicitly through split_huge_page() or by
freeing the compound page to the buddy allocator and then allocating its
individual order-0 pages). If that happens, get_user_pages_fast() may end
up returning the right page but lifting the refcount on a now-unrelated
page, leading to use-after-free of pages.

To fix it: Re-check whether the pages still belong together after lifting
the refcount on the head page. Move anything else that checks
compound_head(page) below the refcount increment.

This can't actually happen on bare-metal x86 (because there, disabling
IRQs locks out remote TLB flushes), but it can happen on virtualized x86
(e.g. under KVM) and probably also on arm64. The race window is pretty
narrow, and constantly allocating and shattering hugepages isn't exactly
fast; for now I've only managed to reproduce this in an x86 KVM guest with
an artificially widened timing window (by adding a loop that repeatedly
calls `inl(0x3f8 + 5)` in `try_get_compound_head()` to force VM exits, so
that PV TLB flushes are used instead of IPIs).
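A minimal sketch of that timing hack, for illustration only: the helper
name and the loop count below are made up, and in practice this was just a
loop dropped into try_get_compound_head() between the compound_head()
lookup and the refcount increment.

#include <linux/io.h>

/* Hypothetical debug-only helper, not part of this patch. */
static void gup_stretch_race_window(void)
{
	unsigned int i;

	/*
	 * 0x3f8 + 5 is the COM1 Line Status Register; in a KVM guest each
	 * inl() traps to the hypervisor, stretching the window between the
	 * compound_head() lookup and the refcount increment. The iteration
	 * count is arbitrary.
	 */
	for (i = 0; i < 1000; i++)
		inl(0x3f8 + 5);
}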
Cc: Matthew Wilcox
Cc: Kirill A. Shutemov
Cc: John Hubbard
Cc: Jan Kara
Cc: stable@vger.kernel.org
Fixes: 7aef4172c795 ("mm: handle PTE-mapped tail pages in gerneric fast gup implementaiton")
Signed-off-by: Jann Horn
Acked-by: Kirill A. Shutemov
---
resending because linux-mm was down

 mm/gup.c | 54 +++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 39 insertions(+), 15 deletions(-)

base-commit: 614124bea77e452aa6df7a8714e8bc820b489922

diff --git a/mm/gup.c b/mm/gup.c
index 3ded6a5f26b2..1f9c0ac15073 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -43,8 +43,21 @@ static void hpage_pincount_sub(struct page *page, int refs)
 
 	atomic_sub(refs, compound_pincount_ptr(page));
 }
 
+/* Equivalent to calling put_page() @refs times. */
+static void put_page_refs(struct page *page, int refs)
+{
+	VM_BUG_ON_PAGE(page_ref_count(page) < refs, page);
+	/*
+	 * Calling put_page() for each ref is unnecessarily slow. Only the last
+	 * ref needs a put_page().
+	 */
+	if (refs > 1)
+		page_ref_sub(page, refs - 1);
+	put_page(page);
+}
+
 /*
  * Return the compound head page with ref appropriately incremented,
  * or NULL if that failed.
  */
@@ -55,8 +68,23 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
 	if (WARN_ON_ONCE(page_ref_count(head) < 0))
 		return NULL;
 	if (unlikely(!page_cache_add_speculative(head, refs)))
 		return NULL;
+
+	/*
+	 * At this point we have a stable reference to the head page; but it
+	 * could be that between the compound_head() lookup and the refcount
+	 * increment, the compound page was split, in which case we'd end up
+	 * holding a reference on a page that has nothing to do with the page
+	 * we were given anymore.
+	 * So now that the head page is stable, recheck that the pages still
+	 * belong together.
+	 */
+	if (unlikely(compound_head(page) != head)) {
+		put_page_refs(head, refs);
+		return NULL;
+	}
+
 	return head;
 }
 
 /*
@@ -94,25 +122,28 @@ __maybe_unused struct page *try_grab_compound_head(struct page *page,
 		if (unlikely((flags & FOLL_LONGTERM) &&
 			     !is_pinnable_page(page)))
 			return NULL;
 
+		/*
+		 * CAUTION: Don't use compound_head() on the page before this
+		 * point, the result won't be stable.
+		 */
+		page = try_get_compound_head(page, refs);
+		if (!page)
+			return NULL;
+
 		/*
 		 * When pinning a compound page of order > 1 (which is what
 		 * hpage_pincount_available() checks for), use an exact count to
 		 * track it, via hpage_pincount_add/_sub().
 		 *
 		 * However, be sure to *also* increment the normal page refcount
 		 * field at least once, so that the page really is pinned.
 		 */
-		if (!hpage_pincount_available(page))
-			refs *= GUP_PIN_COUNTING_BIAS;
-
-		page = try_get_compound_head(page, refs);
-		if (!page)
-			return NULL;
-
 		if (hpage_pincount_available(page))
 			hpage_pincount_add(page, refs);
+		else
+			page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
 
 		mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED,
 				    orig_refs);
@@ -134,16 +165,9 @@ static void put_compound_head(struct page *page, int refs, unsigned int flags)
 		else
 			refs *= GUP_PIN_COUNTING_BIAS;
 	}
 
-	VM_BUG_ON_PAGE(page_ref_count(page) < refs, page);
-	/*
-	 * Calling put_page() for each ref is unnecessarily slow. Only the last
-	 * ref needs a put_page().
-	 */
-	if (refs > 1)
-		page_ref_sub(page, refs - 1);
-	put_page(page);
+	put_page_refs(page, refs);
 }
 
 /**
  * try_grab_page() - elevate a page's refcount by a flag-dependent amount
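For readers who want the net effect without walking the hunks, the
resulting try_get_compound_head() has roughly the following shape; this is
assembled from the diff above (comments shortened), not a separate
implementation.

static inline struct page *try_get_compound_head(struct page *page, int refs)
{
	struct page *head = compound_head(page);

	if (WARN_ON_ONCE(page_ref_count(head) < 0))
		return NULL;
	if (unlikely(!page_cache_add_speculative(head, refs)))
		return NULL;

	/*
	 * The reference on the head page is now stable, but the compound
	 * page may have been split in the meantime; recheck that @page
	 * still belongs to @head and back out if it does not.
	 */
	if (unlikely(compound_head(page) != head)) {
		put_page_refs(head, refs);
		return NULL;
	}

	return head;
}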