From patchwork Tue Apr 25 23:15:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lorenzo Stoakes X-Patchwork-Id: 13223892 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6F6F0C77B7C for ; Tue, 25 Apr 2023 23:15:45 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C3D096B0074; Tue, 25 Apr 2023 19:15:44 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id BED566B0075; Tue, 25 Apr 2023 19:15:44 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A8EA06B0078; Tue, 25 Apr 2023 19:15:44 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 95FA76B0074 for ; Tue, 25 Apr 2023 19:15:44 -0400 (EDT) Received: from smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 50FA2403FA for ; Tue, 25 Apr 2023 23:15:44 +0000 (UTC) X-FDA: 80721472608.14.345DA26 Received: from mail-wm1-f48.google.com (mail-wm1-f48.google.com [209.85.128.48]) by imf02.hostedemail.com (Postfix) with ESMTP id 7FBA080011 for ; Tue, 25 Apr 2023 23:15:42 +0000 (UTC) Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=gmail.com header.s=20221208 header.b="P/ZgbjiO"; spf=pass (imf02.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.128.48 as permitted sender) smtp.mailfrom=lstoakes@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1682464542; a=rsa-sha256; cv=none; b=U2ETd1CePVLN6hp3kpmQrNPHWAfrFxoLcg4L6tX3rryJk+tE1Ty6ZiG20ig/3NkEMlIOs/ YqJJhhNaABuOAYTsuezH8IWhnR3ugSb02MtRFUywqhelh7KaUf0zmYUCGAGTBdRIBzNiLm kTMx09+iCn/pzAcd9RuyhQsTPYBzukE= ARC-Authentication-Results: i=1; imf02.hostedemail.com; dkim=pass header.d=gmail.com header.s=20221208 header.b="P/ZgbjiO"; spf=pass (imf02.hostedemail.com: domain of lstoakes@gmail.com designates 209.85.128.48 as permitted sender) smtp.mailfrom=lstoakes@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1682464542; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:references:dkim-signature; bh=ZepthuyfXWO/R7g3gyOdstMJqp904AnhCOVcdOoCQb0=; b=F2X0WqwDu3JrO6Uuh1Lm5gHq2ImcsGVg0voHC0I2EgEtL986lA7s2iXjd9IYblIgwU6SXH nE5bgBb1a+AsX0oSu1x7q7OPclX50S1SUjLQSfpDEYZk3zVzMUIhrsXd4uUNPeSmTsX4+d FVXp2ttgwKnR0VLDbRu2BhLCb9xmxuA= Received: by mail-wm1-f48.google.com with SMTP id 5b1f17b1804b1-3f1763ee8f8so44066245e9.1 for ; Tue, 25 Apr 2023 16:15:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1682464540; x=1685056540; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=ZepthuyfXWO/R7g3gyOdstMJqp904AnhCOVcdOoCQb0=; b=P/ZgbjiORZxfSpThRX4OlzffNCX8Ybcr1lBmTad9NQf7+dexFazlUkuxqanlqMWSjh jKJffLtWMQxKhG1CwTJwCEPfdZkkDQt3XiNqXg8s49c5JHiMVO89913P9zfMplKGt94d I+U8HApgNC+zXFC38f4veYPb0l0xJFoLMT2yq2Tr1qI5CXQu/qTxxN+TMmC1tqmDei4w 6Hf1+XWAYtPVTMpHYmqLX+wAiGha6NIUv26p33Z9n0DfnkU7STnjTwu36Zzz/UWY+Wfp eJo9yW6IsTAlmoCJs6j2D1wQav0ZAnawZf2Cvx2N3nO4EjmtUWCQazmwwbqMu6GNaCtB TQQg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1682464540; x=1685056540; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=ZepthuyfXWO/R7g3gyOdstMJqp904AnhCOVcdOoCQb0=; b=RfYaJBOWYucFmo4nkPcyAZibul7YxFUMu5BenJBhx42YOYbQIEDrOo25UNTUA9SbBg 9YwmlFK7P7Atudikh2aeKNhZTmA/X/7Kaxr6VKwMoCK2UIftuN2ivgtKBIL+Wcxji65J 1oy9h6nqZE8Kl0ilWijkv7+7LKEc4DdTpqOx7Bf3r0505oTjtY3LfOQt2BlKzg0bQ0o9 U8gqNmiK6sU6aC1rz5rzC5opnKRsWdg2n/irtG/MFUcC2+FSJVc7RrDsQFO4YPY07NMk b5KMCxp0rz9XDgsifUv0yWELolBVN7fYj1iAC2/YNy3Qg/kYTFhL/q59KsVbNospBA0B 1ipA== X-Gm-Message-State: AAQBX9epzEzmuewiHoD929XcjRk43RHmx6zYSrqavaUC+cHvMVJm9ins d+mJH6S0Q3YbDSe/F+VLpi75IoomY96AzQ== X-Google-Smtp-Source: AKy350Ynyb1uELx6zdIeynzAV1Y2Sqtuf7HOOmHPp7XqvNiP1RB8Z0vy1Oihgn31ZuCxpVCumCQrvQ== X-Received: by 2002:a05:600c:2212:b0:3f1:6ead:34fe with SMTP id z18-20020a05600c221200b003f16ead34femr11542501wml.26.1682464540470; Tue, 25 Apr 2023 16:15:40 -0700 (PDT) Received: from lucifer.home ([2a00:23c5:dc8c:8701:1663:9a35:5a7b:1d76]) by smtp.googlemail.com with ESMTPSA id x24-20020a1c7c18000000b003f183127434sm16220321wmc.30.2023.04.25.16.15.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 25 Apr 2023 16:15:39 -0700 (PDT) From: Lorenzo Stoakes To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton Cc: Jason Gunthorpe , Jens Axboe , Matthew Wilcox , Dennis Dalessandro , Leon Romanovsky , Christian Benvenuti , Nelson Escobar , Bernard Metzler , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , Bjorn Topel , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , "David S . Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Christian Brauner , Richard Cochran , Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer , John Fastabend , linux-fsdevel@vger.kernel.org, linux-perf-users@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, Oleg Nesterov , Jason Gunthorpe , John Hubbard , Jan Kara , "Kirill A . Shutemov" , Pavel Begunkov , Lorenzo Stoakes Subject: [PATCH v4] mm/gup: disallow GUP writing to file-backed mappings by default Date: Wed, 26 Apr 2023 00:15:36 +0100 Message-Id: <3b92d56f55671a0389252379237703df6e86ea48.1682464032.git.lstoakes@gmail.com> X-Mailer: git-send-email 2.40.0 MIME-Version: 1.0 X-Rspam-User: X-Rspamd-Queue-Id: 7FBA080011 X-Rspamd-Server: rspam01 X-Stat-Signature: dx57q73434ymwa4k49fsuy9kju3nnpkp X-HE-Tag: 1682464542-765066 X-HE-Meta: U2FsdGVkX1+gvAuy5kB40KoFmbK40MQH17hcgvaLo4el6fJLDFSn0aHDFsKJ9agRNHOjjuxUceI5X3nPLaAtsW3mZqNU9Qm8vc1vncpBL3Myz09mYEhXACNMD1CgvQtVP9FY7GXdgx8emn46a/gstz+a9VeqHZaxiIL8ua+KKzLFZ+QXSnnVa4h1+e15O8+UHwOnXEpkS1uLJ5zTVE2WBxCgcBBOEZCuQnbMy8ZWE/7WRB/kFnpXxCW0E7R7BNjw36QGwExejQ4/hFH9WkLT6llgRxN0hHidug2tjOYu7msmf1NXpGDLBLLSyiGtlNTcZSdL0ECps9ZNRPSRAyi6ctySocOrH3GEznJIe8BsY7ANrfQgjFsprAL70uBMxiWD0w35oWOimosfCkvn0O75bM9uNrBg2p09uY1SFy7ucEDXEDB0SHwawuOqAlu+dpIl/ixi+btFW6HCJJpXOQZsxZtYDKvDXVYiiIyi0WqjJ/NNIBCC0eNa9AofVkg8kMBwbonpkJ9OXWqgM98rSBOOzjSJRsvsOLB+2VoJSF8uimJXC93GmaslHbOj1B3pjBfTfNtM/cbU/TL+sNdWgVOxwt40I3JHlaBU/ExV5FZZZ3XRQSqP4l1K8iDLYZQP/ymnCel9uWQl4rdNxabpC1bHH1tjHpDBYH5WyKkTS5XocOQxXbp2FnF1jP+15NFmTqoipTWfFsjpIGThCuCZqwwu21N5amLQCexBnIHBWc+F1x15a/LETzNW0pjymjFCZIq2TTkUTSCsTIL4tVfX8+agpzKcau9IlgFihbVmVWo8dhxc5JeFNORSlLpxmc6HPECF0lPbFomewCAmwx9sJDFdd8kWfkeQk6f55+m9TWmFKUY7AYLJPfq8UdM/xNn2JpklUqzL/pl/WFvfCbUg03sQJMOPh1nSj2RBaG1EZOwdYvtp0emapK1SakebYjrFYq/Fnwy9WYfi9zUqX+9RglI LDpFFsEk roDDMA110XjCaspdHxdXd005/c4un+impTQ9FCCNyUkh10mYw6Pm/XgZcxVs3CvnAAYRbmShm6fA8sHTl65NXgzXpR9jqwQfjaJNiGD+EH6grOzM9bE0aKZDdQsDHRFaa7eCHo3/8PA0OfG1REv5fVsuGfU+CvxtUUUH2es3rrePkoY/KwxdpikLk4X/N34ju5xcnR8MUhoFmkEAFfAFEmDE/0Hkk7AuxYoUtrzhiMe4EVds2Rqi1Tp8EialG6Mtx3y43wL+5249WAu3zjvAEJkiqr8G3ntBPI7KhDIiABQzaAZ9yxkDAjYLSyLQG0hFaFt57cFpr5wIFgHmOsbJxya6uEw6oobX6vsrOg+mXMUD1rsbkqhXZkkfIChHy23/jc96bcZVrl0STJLemSktFuzFSPshU/YgfEFnpygs6xiNYirl6E8sOZ9kGl8qjSSiUys/bpboIgHcmX0zL6MveFxPOybiJLF2aIwDsjI2A3pnbTgFMGGtZkQimRZrcwLC04B/kRtC4IvqYYL6hBtvWMG2b8m0vxfvQabjp X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: GUP does not correctly implement write-notify semantics, nor does it guarantee that the underlying pages are correctly dirtied, which could lead to a kernel oops or data corruption when writing to file-backed mappings. This is only relevant when the mappings are file-backed and the underlying file system requires folio dirty tracking. File systems which do not, such as shmem or hugetlb, are not at risk and therefore can be written to without issue. Unfortunately this limitation of GUP has been present for some time and requires future rework of the GUP API in order to provide correct write access to such mappings. In the meantime, we add a check for the most broken GUP case - FOLL_LONGTERM - which really under no circumstances can safely access dirty-tracked file mappings. As part of this change we separate out vma_needs_dirty_tracking() as a helper function to determine this, which is distinct from vma_wants_writenotify() which is specific to determining which PTE flags to set. Suggested-by: Jason Gunthorpe Signed-off-by: Lorenzo Stoakes --- v4: - Split out vma_needs_dirty_tracking() from vma_wants_writenotify() to reduce duplication and update to use this in the GUP check. Note that both separately check vm_ops_needs_writenotify() as the latter needs to test this before the vm_pgprot_modify() test, resulting in vma_wants_writenotify() checking this twice, however it is such a small check this should not be egregious. v3: - Rebased on latest mm-unstable as of 24th April 2023. - Explicitly check whether file system requires folio dirtying. Note that vma_wants_writenotify() could not be used directly as it is very much focused on determining if the PTE r/w should be set (e.g. assuming private mapping does not require it as already set, soft dirty considerations). - Tested code against shmem and hugetlb mappings - confirmed that these are not disallowed by the check. - Eliminate FOLL_ALLOW_BROKEN_FILE_MAPPING flag and instead perform check only for FOLL_LONGTERM pins. - As a result, limit check to internal GUP code. https://lore.kernel.org/all/23c19e27ef0745f6d3125976e047ee0da62569d4.1682406295.git.lstoakes@gmail.com/ v2: - Add accidentally excluded ptrace_access_vm() use of FOLL_ALLOW_BROKEN_FILE_MAPPING. - Tweak commit message. https://lore.kernel.org/all/c8ee7e02d3d4f50bb3e40855c53bda39eec85b7d.1682321768.git.lstoakes@gmail.com/ v1: https://lore.kernel.org/all/f86dc089b460c80805e321747b0898fd1efe93d7.1682168199.git.lstoakes@gmail.com/ include/linux/mm.h | 1 + mm/gup.c | 26 +++++++++++++++++++++++++- mm/mmap.c | 37 ++++++++++++++++++++++++++++--------- 3 files changed, 54 insertions(+), 10 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 37554b08bb28..f7da02fc89c6 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2433,6 +2433,7 @@ extern unsigned long move_page_tables(struct vm_area_struct *vma, #define MM_CP_UFFD_WP_ALL (MM_CP_UFFD_WP | \ MM_CP_UFFD_WP_RESOLVE) +bool vma_needs_dirty_tracking(struct vm_area_struct *vma); int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot); static inline bool vma_wants_manual_pte_write_upgrade(struct vm_area_struct *vma) { diff --git a/mm/gup.c b/mm/gup.c index 1f72a717232b..53652453037c 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -959,16 +959,37 @@ static int faultin_page(struct vm_area_struct *vma, return 0; } +/* + * Writing to file-backed mappings which require folio dirty tracking using GUP + * is a fundamentally broken operation as kernel write access to GUP mappings + * may not adhere to the semantics expected by a file system. + */ +static inline bool can_write_file_mapping(struct vm_area_struct *vma, + unsigned long gup_flags) +{ + /* If we aren't pinning then no problematic write can occur. */ + if (!(gup_flags & (FOLL_GET | FOLL_PIN))) + return true; + + /* We limit this check to the most egregious case - a long term pin. */ + if (!(gup_flags & FOLL_LONGTERM)) + return true; + + /* If the VMA requires dirty tracking then GUP will be problematic. */ + return vma_needs_dirty_tracking(vma); +} + static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags) { vm_flags_t vm_flags = vma->vm_flags; int write = (gup_flags & FOLL_WRITE); int foreign = (gup_flags & FOLL_REMOTE); + bool vma_anon = vma_is_anonymous(vma); if (vm_flags & (VM_IO | VM_PFNMAP)) return -EFAULT; - if (gup_flags & FOLL_ANON && !vma_is_anonymous(vma)) + if ((gup_flags & FOLL_ANON) && !vma_anon) return -EFAULT; if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma)) @@ -978,6 +999,9 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags) return -EFAULT; if (write) { + if (!vma_anon && !can_write_file_mapping(vma, gup_flags)) + return -EFAULT; + if (!(vm_flags & VM_WRITE)) { if (!(gup_flags & FOLL_FORCE)) return -EFAULT; diff --git a/mm/mmap.c b/mm/mmap.c index 536bbb8fa0ae..aac638dd22cf 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1475,6 +1475,32 @@ SYSCALL_DEFINE1(old_mmap, struct mmap_arg_struct __user *, arg) } #endif /* __ARCH_WANT_SYS_OLD_MMAP */ +/* Do VMA operations imply write notify is required? */ +static inline bool vm_ops_needs_writenotify( + const struct vm_operations_struct *vm_ops) +{ + return vm_ops && (vm_ops->page_mkwrite || vm_ops->pfn_mkwrite); +} + +/* + * Does this VMA require the underlying folios to have their dirty state + * tracked? + */ +bool vma_needs_dirty_tracking(struct vm_area_struct *vma) +{ + /* Does the filesystem need to be notified? */ + if (vm_ops_needs_writenotify(vma->vm_ops)) + return true; + + /* Specialty mapping? */ + if (vma->vm_flags & VM_PFNMAP) + return false; + + /* Can the mapping track the dirty pages? */ + return vma->vm_file && vma->vm_file->f_mapping && + mapping_can_writeback(vma->vm_file->f_mapping); +} + /* * Some shared mappings will want the pages marked read-only * to track write events. If so, we'll downgrade vm_page_prot @@ -1484,14 +1510,13 @@ SYSCALL_DEFINE1(old_mmap, struct mmap_arg_struct __user *, arg) int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot) { vm_flags_t vm_flags = vma->vm_flags; - const struct vm_operations_struct *vm_ops = vma->vm_ops; /* If it was private or non-writable, the write bit is already clear */ if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED))) return 0; /* The backer wishes to know when pages are first written to? */ - if (vm_ops && (vm_ops->page_mkwrite || vm_ops->pfn_mkwrite)) + if (vm_ops_needs_writenotify(vma->vm_ops)) return 1; /* The open routine did something to the protections that pgprot_modify @@ -1511,13 +1536,7 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot) if (userfaultfd_wp(vma)) return 1; - /* Specialty mapping? */ - if (vm_flags & VM_PFNMAP) - return 0; - - /* Can the mapping track the dirty pages? */ - return vma->vm_file && vma->vm_file->f_mapping && - mapping_can_writeback(vma->vm_file->f_mapping); + return vma_needs_dirty_tracking(vma); } /*