From patchwork Wed Jan 11 13:33:51 2023
X-Patchwork-Submitter: Jann Horn
X-Patchwork-Id: 13096683
From: Jann Horn <jannh@google.com>
To: Andrew Morton, linux-mm@kvack.org
Cc: "Kirill A. Shutemov", Zach O'Keefe, linux-kernel@vger.kernel.org,
 David Hildenbrand, Yang Shi
Subject: [PATCH] mm/khugepaged: Fix ->anon_vma race
Date: Wed, 11 Jan 2023 14:33:51 +0100
Message-Id: <20230111133351.807024-1-jannh@google.com>

If an ->anon_vma is attached to the VMA, collapse_and_free_pmd() requires
it to be locked.
retract_page_tables() bails out if an ->anon_vma is attached, but does
this check before holding the mmap lock (as the comment above the check
explains).

If we racily merge an existing ->anon_vma (shared with a child process)
from a neighboring VMA, subsequent rmap traversals on pages belonging to
the child will be able to see the page tables that we are concurrently
removing while assuming that nothing else can access them.

Repeat the ->anon_vma check once we hold the mmap lock to ensure that there
really is no concurrent page table access.

Reported-by: Zach O'Keefe
Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn
Acked-by: Kirill A. Shutemov
---
zokeefe@ pointed out to me that the current code (after my last round of
patches) can hit a lockdep assert by racing, and after staring at it a bit
I've convinced myself that this is a real, preexisting bug.
(I haven't written a reproducer for it though. One way to hit it might be
something along the lines of:

 - set up a process A with a private-file-mapping VMA V1
 - let A fork() to create process B, thereby copying V1 in A to V1' in B
 - let B extend the end of V1'
 - let B put some anon pages into the extended part of V1'
 - let A map a new private-file-mapping VMA V2 directly behind V1, without
   an anon_vma
   [race begins here]
 - in A's thread 1: begin retract_page_tables() on V2, run through first
   ->anon_vma check
 - in A's thread 2: run __anon_vma_prepare() on V2 and ensure that it
   merges the anon_vma of V1 (which implies V1 and V2 must be mapping the
   same file at compatible offsets)
 - in B: trigger rmap traversal on anon page in V1')

 mm/khugepaged.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

base-commit: 7dd4b804e08041ff56c88bdd8da742d14b17ed25

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 5cb401aa2b9d..0bfed37f3a3b 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1644,7 +1644,7 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
 		 * has higher cost too. It would also probably require locking
 		 * the anon_vma.
 		 */
-		if (vma->anon_vma) {
+		if (READ_ONCE(vma->anon_vma)) {
 			result = SCAN_PAGE_ANON;
 			goto next;
 		}
@@ -1672,6 +1672,18 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
 		result = SCAN_PTE_MAPPED_HUGEPAGE;
 		if ((cc->is_khugepaged || is_target) &&
 		    mmap_write_trylock(mm)) {
+			/*
+			 * Re-check whether we have an ->anon_vma, because
+			 * collapse_and_free_pmd() requires that either no
+			 * ->anon_vma exists or the anon_vma is locked.
+			 * We already checked ->anon_vma above, but that check
+			 * is racy because ->anon_vma can be populated under the
+			 * mmap lock in read mode.
+			 */
+			if (vma->anon_vma) {
+				result = SCAN_PAGE_ANON;
+				goto unlock_next;
+			}
 			/*
 			 * When a vma is registered with uffd-wp, we can't
 			 * recycle the pmd pgtable because there can be pte