From patchwork Mon Nov 18 16:47:08 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jann Horn
X-Patchwork-Id: 13878832
From: Jann Horn
Date: Mon, 18 Nov 2024 17:47:08 +0100
Subject: [PATCH v2] docs/mm: add more warnings around page table access
Message-Id: <20241118-vma-docs-addition1-onv3-v2-1-c9d5395b72ee@google.com>
To: Andrew Morton, Jonathan Corbet, Lorenzo Stoakes, "Liam R . Howlett",
 Vlastimil Babka
Cc: Alice Ryhl, Boqun Feng, Matthew Wilcox, Mike Rapoport,
 Suren Baghdasaryan, Hillf Danton, Qi Zheng, SeongJae Park, Bagas Sanjaya,
 linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 Matteo Rizzo, Jann Horn
X-Mailer: b4 0.15-dev

Make it clearer that holding the mmap lock in read mode is not enough to
traverse page tables, and that just having a stable VMA is not enough to
read PTEs.

Suggested-by: Matteo Rizzo
Suggested-by: Lorenzo Stoakes
Signed-off-by: Jann Horn
Reviewed-by: Lorenzo Stoakes
Acked-by: Qi Zheng
---
Changes in v2:
- improved based on feedback from Lorenzo
- Link to v1: https://lore.kernel.org/r/20241114-vma-docs-addition1-onv3-v1-1-ff177a0a2994@google.com
---
 Documentation/mm/process_addrs.rst | 46 +++++++++++++++++++++++++++++---------
 1 file changed, 36 insertions(+), 10 deletions(-)

---
base-commit: 1e96a63d3022403e06cdda0213c7849b05973cd5
change-id: 20241114-vma-docs-addition1-onv3-32df4e6dffcf

diff --git a/Documentation/mm/process_addrs.rst b/Documentation/mm/process_addrs.rst
index 1bf7ad010fc063d003bb857bb3b695a3eafa0b55..1d416658d7f59ec595bd51018f42eec606f7e272 100644
--- a/Documentation/mm/process_addrs.rst
+++ b/Documentation/mm/process_addrs.rst
@@ -339,6 +339,11 @@ When **installing** page table entries, the mmap or VMA lock must be held to
 keep the VMA stable. We explore why this is in the page table locking details
 section below.
 
+.. warning:: Page tables are normally only traversed in regions covered by VMAs.
+             If you want to traverse page tables in areas that might not be
+             covered by VMAs, heavier locking is required.
+             See :c:func:`!walk_page_range_novma` for details.
+
 **Freeing** page tables is an entirely internal memory management operation and
 has special requirements (see the page freeing section below for more details).
 
@@ -450,6 +455,9 @@ the time of writing of this document.
 Locking Implementation Details
 ------------------------------
 
+.. warning:: Locking rules for PTE-level page tables are very different from
+             locking rules for page tables at other levels.
+
 Page table locking details
 --------------------------
 
@@ -470,8 +478,12 @@ additional locks dedicated to page tables:
 These locks represent the minimum required to interact with each page table
 level, but there are further requirements.
 
-Importantly, note that on a **traversal** of page tables, no such locks are
-taken. Whether care is taken on reading the page table entries depends on the
+Importantly, note that on a **traversal** of page tables, sometimes no such
+locks are taken. However, at the PTE level, at least concurrent page table
+deletion must be prevented (using RCU) and the page table must be mapped into
+high memory, see below.
+
+Whether care is taken on reading the page table entries depends on the
 architecture, see the section on atomicity below.
 
 Locking rules
@@ -489,12 +501,6 @@ We establish basic locking rules when interacting with page tables:
   the warning below).
 * As mentioned previously, zapping can be performed while simply keeping the VMA
   stable, that is holding any one of the mmap, VMA or rmap locks.
-* Special care is required for PTEs, as on 32-bit architectures these must be
-  mapped into high memory and additionally, careful consideration must be
-  applied to racing with THP, migration or other concurrent kernel operations
-  that might steal the entire PTE table from under us. All this is handled by
-  :c:func:`!pte_offset_map_lock` (see the section on page table installation
-  below for more details).
 
 .. warning:: Populating previously empty entries is dangerous as, when unmapping
              VMAs, :c:func:`!vms_clear_ptes` has a window of time between
@@ -509,8 +515,28 @@ We establish basic locking rules when interacting with page tables:
 There are additional rules applicable when moving page tables, which we discuss
 in the section on this topic below.
 
-.. note:: Interestingly, :c:func:`!pte_offset_map_lock` holds an RCU read lock
-          while the PTE page table lock is held.
+PTE-level page tables are different from page tables at other levels, and there
+are extra requirements for accessing them:
+
+* On 32-bit architectures, they may be in high memory (meaning they need to be
+  mapped into kernel memory to be accessible).
+* When empty, they can be unlinked and RCU-freed while holding an mmap lock or
+  rmap lock for reading in combination with the PTE and PMD page table locks.
+  In particular, this happens in :c:func:`!retract_page_tables` when handling
+  :c:macro:`!MADV_COLLAPSE`.
+  So accessing PTE-level page tables requires at least holding an RCU read lock;
+  but that only suffices for readers that can tolerate racing with concurrent
+  page table updates such that an empty PTE is observed (in a page table that
+  has actually already been detached and marked for RCU freeing) while another
+  new page table has been installed in the same location and filled with
+  entries. Writers normally need to take the PTE lock and revalidate that the
+  PMD entry still refers to the same PTE-level page table.
+
+To access PTE-level page tables, a helper like :c:func:`!pte_offset_map_lock` or
+:c:func:`!pte_offset_map` can be used depending on stability requirements.
+These map the page table into kernel memory if required, take the RCU lock, and
+depending on variant, may also look up or acquire the PTE lock.
+See the comment on :c:func:`!__pte_offset_map_lock`.
 
 Atomicity
 ^^^^^^^^^
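
Illustrative note (not part of the patch): the last hunk says that writers
normally take the PTE lock and then revalidate that the PMD entry still refers
to the same PTE-level page table. The snapshot, lock, revalidate, retry shape
of that rule can be sketched as a single-threaded userspace toy model. This is
only a sketch under loud assumptions: every name below (toy_pmd,
toy_pte_table, toy_map_lock, toy_replace_hook, and so on) is hypothetical and
invented for illustration, the "lock" is a plain flag rather than a spinlock,
and the racing thread is simulated by a hook. It is not the kernel's
implementation of pte_offset_map_lock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; none of these are kernel types. */
struct toy_pte_table {
	bool locked;           /* stands in for the PTE spinlock */
	unsigned long pte[4];  /* the entries themselves */
};

struct toy_pmd {
	struct toy_pte_table *table;  /* NULL: no PTE table installed */
};

/* Hook that simulates a racing thread in this single-threaded model. */
typedef void (*toy_race_hook)(struct toy_pmd *pmd);

/* Table that toy_replace_hook() installs, mimicking retract + reinstall. */
struct toy_pte_table *toy_replacement;

void toy_replace_hook(struct toy_pmd *pmd)
{
	pmd->table = toy_replacement;
}

/*
 * Snapshot the PMD entry, lock the table it named, then revalidate.
 * Returns the locked table, or NULL if no table is installed.
 */
struct toy_pte_table *toy_map_lock(struct toy_pmd *pmd, toy_race_hook race,
				   int *retries)
{
	for (;;) {
		struct toy_pte_table *snap = pmd->table;  /* lockless read */

		if (!snap)
			return NULL;

		if (race) {
			race(pmd);    /* the "other thread" runs here, once */
			race = NULL;
		}

		snap->locked = true;          /* take the lock we looked up */
		if (pmd->table == snap)       /* still the same table? */
			return snap;

		snap->locked = false;         /* stale: unlock and retry */
		if (retries)
			(*retries)++;
	}
}

void toy_unlock(struct toy_pte_table *table)
{
	table->locked = false;
}
```

Calling toy_map_lock(&pmd, NULL, &retries) models the uncontended case;
passing toy_replace_hook models a table being swapped out between the lockless
read and the lock acquisition, which forces one retry. In this toy the stale
table is never freed, so touching it after the swap is harmless. In the kernel,
per the text above, it is the RCU read lock that keeps a detached PTE table
from being freed while it may still be accessed.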