From patchwork Thu May 11 18:24:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Axel Rasmussen X-Patchwork-Id: 13238351 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9819DC7EE2E for ; Thu, 11 May 2023 18:24:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238866AbjEKSYj (ORCPT ); Thu, 11 May 2023 14:24:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47594 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238673AbjEKSYh (ORCPT ); Thu, 11 May 2023 14:24:37 -0400 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 629151FEE for ; Thu, 11 May 2023 11:24:34 -0700 (PDT) Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-b9e2b65f2eeso16114385276.2 for ; Thu, 11 May 2023 11:24:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1683829473; x=1686421473; h=cc:to:from:subject:message-id:mime-version:date:from:to:cc:subject :date:message-id:reply-to; bh=pi6QoZ5SCLGgP2sxC+MkaEgbY1VXhaAz9TEP3dwrDPs=; b=zpyNZKJdhGrBM4FCCCFdLJNUeHu7VB5x1zBza86yuRYT//8gc05k3M+A1XAjXeHYQU XsEPOjhIWX936abY0RAnUz756Co2yEtSnUBul6v6sCxjIPkPJgNCvPmNFhIN62HuVOE1 irkrkZ4ocjIKKQeR0IfXiH2yEHFBjv0I6mZrVd3b7aEQFXXsYHEuLzZoJ1h6qn/7NkBG Kjw+bEIhpmgecGP8AQgNQNhH64/X0SBEYMkbt0iVzYsNyZ87+y4B+sEecTBCV4ftXZHx CwAYGbE8i4Tg39vPlkbL8N9Ogy15Z6Fr4rHTurnmFTOuV0J1qAkQm721UxKaMxjwzdwc hf5Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1683829473; x=1686421473; h=cc:to:from:subject:message-id:mime-version:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=pi6QoZ5SCLGgP2sxC+MkaEgbY1VXhaAz9TEP3dwrDPs=; b=S1PJiTgXvUdBWVgtrfucQXShSX3Tw15dHN7LVlqoRgUfdgixw3phI1ybzXFG01eCsV /cTzyX1FH5acwdJ6hGcgaXyIsOJUMB2PMsakbd+Fm3D4l/Or2fsOSBv2nIbiJSjmfVIw DMkoaF33nVxd3c73zSAZenMYNrmMvoS+S3FbGPFVCoX/bb5OiZhbb4faNZ/DDahPscP1 J3+nHZxORwjWyTdUaxl/mNJQ2/0aBvJqKlOF5yuScsOfi5vITY8oyyXhlVih3gikPvr4 3npSW7cipVKw5wG/zqQn63azn82PGK2XhAH6WqyqshT9vOgTKrTijP8GD79vdB6WXNmQ /muA== X-Gm-Message-State: AC+VfDypZOiZwVGJuvSlWz4ot8OO13O2KlZ5WQlKAfB4ntEOYlIvkhit xMJQQkeBYb+LJJ4oCEbMA4b2VGzRhWXTTeQRLjiC X-Google-Smtp-Source: ACHHUZ7u4hIr1abfAjSfObToFMnjaZ5a7i67wz1z8K+AHbBfqmL+8z0DeI0fgjzjTRoAf5zNOgmTizL/c3RRg3dEL+oU X-Received: from axel.svl.corp.google.com ([2620:15c:2d4:203:1119:8675:ddb3:1e7a]) (user=axelrasmussen job=sendgmr) by 2002:a05:6902:114e:b0:b4a:3896:bc17 with SMTP id p14-20020a056902114e00b00b4a3896bc17mr9915594ybu.0.1683829473641; Thu, 11 May 2023 11:24:33 -0700 (PDT) Date: Thu, 11 May 2023 11:24:24 -0700 Mime-Version: 1.0 X-Mailer: git-send-email 2.40.1.606.ga4b1b128d6-goog Message-ID: <20230511182426.1898675-1-axelrasmussen@google.com> Subject: [PATCH 1/3] mm: userfaultfd: add new UFFDIO_SIGBUS ioctl From: Axel Rasmussen To: Alexander Viro , Andrew Morton , Christian Brauner , David Hildenbrand , Hongchen Zhang , Huang Ying , James Houghton , "Liam R. Howlett" , Miaohe Lin , "Mike Rapoport (IBM)" , Nadav Amit , Naoya Horiguchi , Peter Xu , Shuah Khan , ZhangPeng Cc: Axel Rasmussen , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org The basic idea here is to "simulate" memory poisoning for VMs. A VM running on some host might encounter a memory error, after which some page(s) are poisoned (i.e., future accesses SIGBUS). They expect that once poisoned, pages can never become "un-poisoned". So, when we live migrate the VM, we need to preserve the poisoned status of these pages. When live migrating, we try to get the guest running on its new host as quickly as possible. So, we start it running before all memory has been copied, and before we're certain which pages should be poisoned or not. So the basic way to use this new feature is: - On the new host, the guest's memory is registered with userfaultfd, in either MISSING or MINOR mode (doesn't really matter for this purpose). - On any first access, we get a userfaultfd event. At this point we can communicate with the old host to find out if the page was poisoned. - If so, we can respond with a UFFDIO_SIGBUS - this places a swap marker so any future accesses will SIGBUS. Because the pte is now "present", future accesses won't generate more userfaultfd events, they'll just SIGBUS directly. UFFDIO_SIGBUS does not handle unmapping previously-present PTEs. This isn't needed, because during live migration we want to intercept all accesses with userfaultfd (not just writes, so WP mode isn't useful for this). So whether minor or missing mode is being used (or both), the PTE won't be present in any case, so handling that case isn't needed. Signed-off-by: Axel Rasmussen --- fs/userfaultfd.c | 63 ++++++++++++++++++++++++++++++++ include/linux/swapops.h | 3 +- include/linux/userfaultfd_k.h | 4 ++ include/uapi/linux/userfaultfd.h | 25 +++++++++++-- mm/memory.c | 4 ++ mm/userfaultfd.c | 62 ++++++++++++++++++++++++++++++- 6 files changed, 156 insertions(+), 5 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 0fd96d6e39ce..edc2928dae2b 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -1966,6 +1966,66 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg) return ret; } +static inline int userfaultfd_sigbus(struct userfaultfd_ctx *ctx, unsigned long arg) +{ + __s64 ret; + struct uffdio_sigbus uffdio_sigbus; + struct uffdio_sigbus __user *user_uffdio_sigbus; + struct userfaultfd_wake_range range; + + user_uffdio_sigbus = (struct uffdio_sigbus __user *)arg; + + ret = -EAGAIN; + if (atomic_read(&ctx->mmap_changing)) + goto out; + + ret = -EFAULT; + if (copy_from_user(&uffdio_sigbus, user_uffdio_sigbus, + /* don't copy the output fields */ + sizeof(uffdio_sigbus) - (sizeof(__s64)))) + goto out; + + ret = validate_range(ctx->mm, uffdio_sigbus.range.start, + uffdio_sigbus.range.len); + if (ret) + goto out; + + ret = -EINVAL; + /* double check for wraparound just in case. */ + if (uffdio_sigbus.range.start + uffdio_sigbus.range.len <= + uffdio_sigbus.range.start) { + goto out; + } + if (uffdio_sigbus.mode & ~UFFDIO_SIGBUS_MODE_DONTWAKE) + goto out; + + if (mmget_not_zero(ctx->mm)) { + ret = mfill_atomic_sigbus(ctx->mm, uffdio_sigbus.range.start, + uffdio_sigbus.range.len, + &ctx->mmap_changing, 0); + mmput(ctx->mm); + } else { + return -ESRCH; + } + + if (unlikely(put_user(ret, &user_uffdio_sigbus->updated))) + return -EFAULT; + if (ret < 0) + goto out; + + /* len == 0 would wake all */ + BUG_ON(!ret); + range.len = ret; + if (!(uffdio_sigbus.mode & UFFDIO_SIGBUS_MODE_DONTWAKE)) { + range.start = uffdio_sigbus.range.start; + wake_userfault(ctx, &range); + } + ret = range.len == uffdio_sigbus.range.len ? 0 : -EAGAIN; + +out: + return ret; +} + static inline unsigned int uffd_ctx_features(__u64 user_features) { /* @@ -2067,6 +2127,9 @@ static long userfaultfd_ioctl(struct file *file, unsigned cmd, case UFFDIO_CONTINUE: ret = userfaultfd_continue(ctx, arg); break; + case UFFDIO_SIGBUS: + ret = userfaultfd_sigbus(ctx, arg); + break; } return ret; } diff --git a/include/linux/swapops.h b/include/linux/swapops.h index 3a451b7afcb3..fa778a0ae730 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -405,7 +405,8 @@ typedef unsigned long pte_marker; #define PTE_MARKER_UFFD_WP BIT(0) #define PTE_MARKER_SWAPIN_ERROR BIT(1) -#define PTE_MARKER_MASK (BIT(2) - 1) +#define PTE_MARKER_UFFD_SIGBUS BIT(2) +#define PTE_MARKER_MASK (BIT(3) - 1) static inline swp_entry_t make_pte_marker_entry(pte_marker marker) { diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index d78b01524349..6de1084939c5 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -46,6 +46,7 @@ enum mfill_atomic_mode { MFILL_ATOMIC_COPY, MFILL_ATOMIC_ZEROPAGE, MFILL_ATOMIC_CONTINUE, + MFILL_ATOMIC_SIGBUS, NR_MFILL_ATOMIC_MODES, }; @@ -83,6 +84,9 @@ extern ssize_t mfill_atomic_zeropage(struct mm_struct *dst_mm, extern ssize_t mfill_atomic_continue(struct mm_struct *dst_mm, unsigned long dst_start, unsigned long len, atomic_t *mmap_changing, uffd_flags_t flags); +extern ssize_t mfill_atomic_sigbus(struct mm_struct *dst_mm, unsigned long start, + unsigned long len, atomic_t *mmap_changing, + uffd_flags_t flags); extern int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start, unsigned long len, bool enable_wp, atomic_t *mmap_changing); diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h index 66dd4cd277bd..616e33d3db97 100644 --- a/include/uapi/linux/userfaultfd.h +++ b/include/uapi/linux/userfaultfd.h @@ -39,7 +39,8 @@ UFFD_FEATURE_MINOR_SHMEM | \ UFFD_FEATURE_EXACT_ADDRESS | \ UFFD_FEATURE_WP_HUGETLBFS_SHMEM | \ - UFFD_FEATURE_WP_UNPOPULATED) + UFFD_FEATURE_WP_UNPOPULATED | \ + UFFD_FEATURE_SIGBUS_IOCTL) #define UFFD_API_IOCTLS \ ((__u64)1 << _UFFDIO_REGISTER | \ (__u64)1 << _UFFDIO_UNREGISTER | \ @@ -49,12 +50,14 @@ (__u64)1 << _UFFDIO_COPY | \ (__u64)1 << _UFFDIO_ZEROPAGE | \ (__u64)1 << _UFFDIO_WRITEPROTECT | \ - (__u64)1 << _UFFDIO_CONTINUE) + (__u64)1 << _UFFDIO_CONTINUE | \ + (__u64)1 << _UFFDIO_SIGBUS) #define UFFD_API_RANGE_IOCTLS_BASIC \ ((__u64)1 << _UFFDIO_WAKE | \ (__u64)1 << _UFFDIO_COPY | \ + (__u64)1 << _UFFDIO_WRITEPROTECT | \ (__u64)1 << _UFFDIO_CONTINUE | \ - (__u64)1 << _UFFDIO_WRITEPROTECT) + (__u64)1 << _UFFDIO_SIGBUS) /* * Valid ioctl command number range with this API is from 0x00 to @@ -71,6 +74,7 @@ #define _UFFDIO_ZEROPAGE (0x04) #define _UFFDIO_WRITEPROTECT (0x06) #define _UFFDIO_CONTINUE (0x07) +#define _UFFDIO_SIGBUS (0x08) #define _UFFDIO_API (0x3F) /* userfaultfd ioctl ids */ @@ -91,6 +95,8 @@ struct uffdio_writeprotect) #define UFFDIO_CONTINUE _IOWR(UFFDIO, _UFFDIO_CONTINUE, \ struct uffdio_continue) +#define UFFDIO_SIGBUS _IOWR(UFFDIO, _UFFDIO_SIGBUS, \ + struct uffdio_sigbus) /* read() structure */ struct uffd_msg { @@ -225,6 +231,7 @@ struct uffdio_api { #define UFFD_FEATURE_EXACT_ADDRESS (1<<11) #define UFFD_FEATURE_WP_HUGETLBFS_SHMEM (1<<12) #define UFFD_FEATURE_WP_UNPOPULATED (1<<13) +#define UFFD_FEATURE_SIGBUS_IOCTL (1<<14) __u64 features; __u64 ioctls; @@ -321,6 +328,18 @@ struct uffdio_continue { __s64 mapped; }; +struct uffdio_sigbus { + struct uffdio_range range; +#define UFFDIO_SIGBUS_MODE_DONTWAKE ((__u64)1<<0) + __u64 mode; + + /* + * Fields below here are written by the ioctl and must be at the end: + * the copy_from_user will not read past here. + */ + __s64 updated; +}; + /* * Flags for the userfaultfd(2) system call itself. */ diff --git a/mm/memory.c b/mm/memory.c index f69fbc251198..e4b4207c2590 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3675,6 +3675,10 @@ static vm_fault_t handle_pte_marker(struct vm_fault *vmf) if (WARN_ON_ONCE(!marker)) return VM_FAULT_SIGBUS; + /* SIGBUS explicitly requested for this PTE. */ + if (marker & PTE_MARKER_UFFD_SIGBUS) + return VM_FAULT_SIGBUS; + /* Higher priority than uffd-wp when data corrupted */ if (marker & PTE_MARKER_SWAPIN_ERROR) return VM_FAULT_SIGBUS; diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index e97a0b4889fc..933587eebd5d 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -278,6 +278,51 @@ static int mfill_atomic_pte_continue(pmd_t *dst_pmd, goto out; } +/* Handles UFFDIO_SIGBUS for all non-hugetlb VMAs. */ +static int mfill_atomic_pte_sigbus(pmd_t *dst_pmd, + struct vm_area_struct *dst_vma, + unsigned long dst_addr, + uffd_flags_t flags) +{ + int ret; + struct mm_struct *dst_mm = dst_vma->vm_mm; + pte_t _dst_pte, *dst_pte; + spinlock_t *ptl; + + _dst_pte = make_pte_marker(PTE_MARKER_UFFD_SIGBUS); + dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl); + + if (vma_is_shmem(dst_vma)) { + struct inode *inode; + pgoff_t offset, max_off; + + /* serialize against truncate with the page table lock */ + inode = dst_vma->vm_file->f_inode; + offset = linear_page_index(dst_vma, dst_addr); + max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE); + ret = -EFAULT; + if (unlikely(offset >= max_off)) + goto out_unlock; + } + + ret = -EEXIST; + /* + * For now, we don't handle unmapping pages, so only support filling in + * none PTEs, or replacing PTE markers. + */ + if (!pte_none_mostly(*dst_pte)) + goto out_unlock; + + set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte); + + /* No need to invalidate - it was non-present before */ + update_mmu_cache(dst_vma, dst_addr, dst_pte); + ret = 0; +out_unlock: + pte_unmap_unlock(dst_pte, ptl); + return ret; +} + static pmd_t *mm_alloc_pmd(struct mm_struct *mm, unsigned long address) { pgd_t *pgd; @@ -328,8 +373,12 @@ static __always_inline ssize_t mfill_atomic_hugetlb( * supported by hugetlb. A PMD_SIZE huge pages may exist as used * by THP. Since we can not reliably insert a zero page, this * feature is not supported. + * + * PTE marker handling for hugetlb is a bit special, so for now + * UFFDIO_SIGBUS is not supported. */ - if (uffd_flags_mode_is(flags, MFILL_ATOMIC_ZEROPAGE)) { + if (uffd_flags_mode_is(flags, MFILL_ATOMIC_ZEROPAGE) || + uffd_flags_mode_is(flags, MFILL_ATOMIC_SIGBUS)) { mmap_read_unlock(dst_mm); return -EINVAL; } @@ -473,6 +522,9 @@ static __always_inline ssize_t mfill_atomic_pte(pmd_t *dst_pmd, if (uffd_flags_mode_is(flags, MFILL_ATOMIC_CONTINUE)) { return mfill_atomic_pte_continue(dst_pmd, dst_vma, dst_addr, flags); + } else if (uffd_flags_mode_is(flags, MFILL_ATOMIC_SIGBUS)) { + return mfill_atomic_pte_sigbus(dst_pmd, dst_vma, + dst_addr, flags); } /* @@ -694,6 +746,14 @@ ssize_t mfill_atomic_continue(struct mm_struct *dst_mm, unsigned long start, uffd_flags_set_mode(flags, MFILL_ATOMIC_CONTINUE)); } +ssize_t mfill_atomic_sigbus(struct mm_struct *dst_mm, unsigned long start, + unsigned long len, atomic_t *mmap_changing, + uffd_flags_t flags) +{ + return mfill_atomic(dst_mm, start, 0, len, mmap_changing, + uffd_flags_set_mode(flags, MFILL_ATOMIC_SIGBUS)); +} + long uffd_wp_range(struct vm_area_struct *dst_vma, unsigned long start, unsigned long len, bool enable_wp) { From patchwork Thu May 11 18:24:25 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Axel Rasmussen X-Patchwork-Id: 13238352 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B16A5C77B7F for ; Thu, 11 May 2023 18:24:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238919AbjEKSYk (ORCPT ); Thu, 11 May 2023 14:24:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47612 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238836AbjEKSYi (ORCPT ); Thu, 11 May 2023 14:24:38 -0400 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 41DD65584 for ; Thu, 11 May 2023 11:24:36 -0700 (PDT) Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-b922aa3725fso17031560276.0 for ; Thu, 11 May 2023 11:24:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1683829475; x=1686421475; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=LjUrEJJ1UhaYRGAtGgSCY4PNxG4jgAwHbuTYYPklZWY=; b=55rn+NwTS9qkN6o3EHQH9tg5nvgfFAoVfIi3Ps/Vx2UwN/K9+YFrkS/lHydkVA1cxR iowaGUww07beE3ntNfoSK93Fb1ceSg9jZe+0OTW9gkOExIsspD5MajhZSLmPMFNhTmMt gYuZl1WAugwIYddYiwJQaO7nq+GtN5jEQblGC1v623/h5p5sdc5Fgzt9DS2LFIrWogJ1 uW449MkphLCiIDXFZtwUGTOaENHChuPdfQqfcJ0QFEkS+4bIncJToGHvvADNkuPgQdkG R3TQXk6+FSc/Ziwfvc5dNOcdLIdSb14/ToPdMXENIjkpjZO8fNcFVpbUtLbpkbE1w/zq 6kkQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1683829475; x=1686421475; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=LjUrEJJ1UhaYRGAtGgSCY4PNxG4jgAwHbuTYYPklZWY=; b=GRdMKH+2VozYMjVxparUMQtjAe+/hs1x6kor6/J71xvuuVf/aPKheJma9A71aYJMk5 3DuKXndrqxqssTkGfN/5ACcDf7xQFld0vkTDRMfBqZ1jPJGgqzoGKmeRiQm0MGL9b+29 5aVcKoomMxRB890MufVwmGgRSfLVz7+6N+ypxjkNwKhmH0TJtKFJadVT7PiWccOi3N2k HQ4xqLBpr2j+GNWp6iXI7yM3Dt1AQ9SaUFfxrS/UrD3FCg2dQm5dNWgpCeTaw61u9Vz+ QzHPBqVnDMEXwAWeWEVc/YK83LQIoT4yFp20RJ7vUn+Oah8nwxCm09KiCINFwpg048B1 g+2g== X-Gm-Message-State: AC+VfDwwB22ruKN0q1sqUXDYJ+eX/cRZrUHaq6G5rKCI9IvR7lIGCfDZ XKjhczYpiHpqx6thPrAOoxVnDQbxwojQR7rXYft6 X-Google-Smtp-Source: ACHHUZ5tH1DV1GALQ1Ob1t3RTbPs1BmNd7Keg/peKTDa9epscPLL3nUgdVKOqLkDNSlWMLhG0C8f2l/D/e6LqknAyPsd X-Received: from axel.svl.corp.google.com ([2620:15c:2d4:203:1119:8675:ddb3:1e7a]) (user=axelrasmussen job=sendgmr) by 2002:a05:6902:154f:b0:b78:8bd8:6e77 with SMTP id r15-20020a056902154f00b00b788bd86e77mr14316791ybu.8.1683829475459; Thu, 11 May 2023 11:24:35 -0700 (PDT) Date: Thu, 11 May 2023 11:24:25 -0700 In-Reply-To: <20230511182426.1898675-1-axelrasmussen@google.com> Mime-Version: 1.0 References: <20230511182426.1898675-1-axelrasmussen@google.com> X-Mailer: git-send-email 2.40.1.606.ga4b1b128d6-goog Message-ID: <20230511182426.1898675-2-axelrasmussen@google.com> Subject: [PATCH 2/3] selftests/mm: refactor uffd_poll_thread to allow custom fault handlers From: Axel Rasmussen To: Alexander Viro , Andrew Morton , Christian Brauner , David Hildenbrand , Hongchen Zhang , Huang Ying , James Houghton , "Liam R. Howlett" , Miaohe Lin , "Mike Rapoport (IBM)" , Nadav Amit , Naoya Horiguchi , Peter Xu , Shuah Khan , ZhangPeng Cc: Axel Rasmussen , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Previously, we had "one fault handler to rule them all", which used several branches to deal with all of the scenarios required by all of the various tests. In upcoming patches, I plan to add a new test, which has its own slightly different fault handling logic. Instead of continuing to add cruft to the existing fault handler, let's allow tests to define custom ones, separate from other tests. Signed-off-by: Axel Rasmussen --- tools/testing/selftests/mm/uffd-common.c | 5 ++++- tools/testing/selftests/mm/uffd-common.h | 3 +++ tools/testing/selftests/mm/uffd-stress.c | 6 ++---- 3 files changed, 9 insertions(+), 5 deletions(-) diff --git a/tools/testing/selftests/mm/uffd-common.c b/tools/testing/selftests/mm/uffd-common.c index 61c6250adf93..c9756dbffe7d 100644 --- a/tools/testing/selftests/mm/uffd-common.c +++ b/tools/testing/selftests/mm/uffd-common.c @@ -499,6 +499,9 @@ void *uffd_poll_thread(void *arg) int ret; char tmp_chr; + if (!args->handle_fault) + args->handle_fault = uffd_handle_page_fault; + pollfd[0].fd = uffd; pollfd[0].events = POLLIN; pollfd[1].fd = pipefd[cpu*2]; @@ -527,7 +530,7 @@ void *uffd_poll_thread(void *arg) err("unexpected msg event %u\n", msg.event); break; case UFFD_EVENT_PAGEFAULT: - uffd_handle_page_fault(&msg, args); + args->handle_fault(&msg, args); break; case UFFD_EVENT_FORK: close(uffd); diff --git a/tools/testing/selftests/mm/uffd-common.h b/tools/testing/selftests/mm/uffd-common.h index 6068f2346b86..b28d88b9937e 100644 --- a/tools/testing/selftests/mm/uffd-common.h +++ b/tools/testing/selftests/mm/uffd-common.h @@ -77,6 +77,9 @@ struct uffd_args { unsigned long missing_faults; unsigned long wp_faults; unsigned long minor_faults; + + /* A custom fault handler; defaults to uffd_handle_page_fault. */ + void (*handle_fault)(struct uffd_msg *msg, struct uffd_args *args); }; struct uffd_test_ops { diff --git a/tools/testing/selftests/mm/uffd-stress.c b/tools/testing/selftests/mm/uffd-stress.c index f1ad9eef1c3a..47e1464935a8 100644 --- a/tools/testing/selftests/mm/uffd-stress.c +++ b/tools/testing/selftests/mm/uffd-stress.c @@ -199,10 +199,8 @@ static int stress(struct uffd_args *args) locking_thread, (void *)cpu)) return 1; if (bounces & BOUNCE_POLL) { - if (pthread_create(&uffd_threads[cpu], &attr, - uffd_poll_thread, - (void *)&args[cpu])) - return 1; + if (pthread_create(&uffd_threads[cpu], &attr, uffd_poll_thread, &args[cpu])) + err("uffd_poll_thread create"); } else { if (pthread_create(&uffd_threads[cpu], &attr, uffd_read_thread, From patchwork Thu May 11 18:24:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Axel Rasmussen X-Patchwork-Id: 13238353 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8FC33C7EE24 for ; Thu, 11 May 2023 18:25:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238838AbjEKSY6 (ORCPT ); Thu, 11 May 2023 14:24:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47634 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238903AbjEKSYj (ORCPT ); Thu, 11 May 2023 14:24:39 -0400 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CE304558D for ; Thu, 11 May 2023 11:24:37 -0700 (PDT) Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-b9a7e65b34aso16106013276.0 for ; Thu, 11 May 2023 11:24:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1683829477; x=1686421477; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=pXR2gVK9OA+wwjaEDZ+OSGVT3sLd4z3OrzMBakd4aKc=; b=uljCWBeUfmKO+5Of2CDGdnripKh0c8Rzx0Uw+3mUJQMB0vkRyYLVzF+n17pnlhk/Gx 5bJy3V1jOgdbwhD+IzTsmSPyo+npOvf5g9Bg01qvgjSMD6eTNJAt18pXpgDdOP/1R0pG f7W9dTJmQptgBrJNb6rZO42eQOqXUPeL3PmdMoDvsc0jIKFazxlaZQy6iFcTAKrcyb2g Vd4Xof1tpuhpdbe6fUAhZyOPuYJwlVtphMWQs8q/tSYp7h7umSFg6RazMw4joPD08IQq U+PE0b8W/rQAZMz7pnNUhXYfuSp6KyH2zv92pJQFwOvAqZixv97KJ+4OltRNsi03yCWr atQw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1683829477; x=1686421477; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=pXR2gVK9OA+wwjaEDZ+OSGVT3sLd4z3OrzMBakd4aKc=; b=RIrkmnruJNYZU5l3cjEFAmEe68XcvGpj4E6gHlCSFFBB0zaHyfnPTOwnob3S17uccR rz5f608ruQBvwWnXIhBnzl+e8HwWjgeRg0ID7MEcOKDMRiiqQFRoIg7E/q5M6t4z196K Pl34nmJ1RO8da2tHuCT+t64HKyVqJtOO8Mae7XJa4DDAoMN9dH/wlQCmTeJMajCF1VSB 4zbPQFC90gZcZSCxGyOyP18JSEEQ9yzADkreiAeG0vUiRmKJy7lfwxlcogAHH5rz4eDm IWCequ7UoSNc+ld/IW3Y6V/KMEJCcJshSScxxYCYFEHxbcWfxOc9zXoRehx/jdnw6UHa +TOQ== X-Gm-Message-State: AC+VfDxfc/sa2gbXXAxs+1hezb8SDnuL8XYhqWg+BqbSoiz1I2m10roG HebJsGiLA/nea0GOCvBU2cHii8hN3fLo0rrQ/T0o X-Google-Smtp-Source: ACHHUZ6/k8WRFVr95FNBEjmhvpwsN42OkGFLi7clqHCCS0gOIKkyVoZZe1vIA3bGUQMYYdRD+oFyrl83UA1YRSgN8vJE X-Received: from axel.svl.corp.google.com ([2620:15c:2d4:203:1119:8675:ddb3:1e7a]) (user=axelrasmussen job=sendgmr) by 2002:a05:6902:114e:b0:b4a:3896:bc17 with SMTP id p14-20020a056902114e00b00b4a3896bc17mr9915667ybu.0.1683829477159; Thu, 11 May 2023 11:24:37 -0700 (PDT) Date: Thu, 11 May 2023 11:24:26 -0700 In-Reply-To: <20230511182426.1898675-1-axelrasmussen@google.com> Mime-Version: 1.0 References: <20230511182426.1898675-1-axelrasmussen@google.com> X-Mailer: git-send-email 2.40.1.606.ga4b1b128d6-goog Message-ID: <20230511182426.1898675-3-axelrasmussen@google.com> Subject: [PATCH 3/3] selftests/mm: add uffd unit test for UFFDIO_SIGBUS From: Axel Rasmussen To: Alexander Viro , Andrew Morton , Christian Brauner , David Hildenbrand , Hongchen Zhang , Huang Ying , James Houghton , "Liam R. Howlett" , Miaohe Lin , "Mike Rapoport (IBM)" , Nadav Amit , Naoya Horiguchi , Peter Xu , Shuah Khan , ZhangPeng Cc: Axel Rasmussen , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org The test is pretty basic, and exercises UFFDIO_SIGBUS straightforwardly. We register a region with userfaultfd, in missing fault mode. For each fault, we either issue UFFDIO_ZEROPAGE (odd pages) or UFFDIO_SIGBUS (even pages). We read each page in the region, and assert that the odd pages are zeroed as expected, and the even pages yield a SIGBUS as expected. Signed-off-by: Axel Rasmussen --- tools/testing/selftests/mm/uffd-unit-tests.c | 114 ++++++++++++++++++- 1 file changed, 110 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/mm/uffd-unit-tests.c b/tools/testing/selftests/mm/uffd-unit-tests.c index 269c86768a02..3eb5a6f9b51f 100644 --- a/tools/testing/selftests/mm/uffd-unit-tests.c +++ b/tools/testing/selftests/mm/uffd-unit-tests.c @@ -881,13 +881,13 @@ static void retry_uffdio_zeropage(int ufd, } } -static bool do_uffdio_zeropage(int ufd, bool has_zeropage) +static bool do_uffdio_zeropage(int ufd, bool has_zeropage, bool test_retry, unsigned long offset) { struct uffdio_zeropage uffdio_zeropage = { 0 }; int ret; __s64 res; - uffdio_zeropage.range.start = (unsigned long) area_dst; + uffdio_zeropage.range.start = (unsigned long) area_dst + offset; uffdio_zeropage.range.len = page_size; uffdio_zeropage.mode = 0; ret = ioctl(ufd, UFFDIO_ZEROPAGE, &uffdio_zeropage); @@ -901,7 +901,7 @@ static bool do_uffdio_zeropage(int ufd, bool has_zeropage) } else if (has_zeropage) { if (res != page_size) err("UFFDIO_ZEROPAGE unexpected size"); - else + else if (test_retry) retry_uffdio_zeropage(ufd, &uffdio_zeropage); return true; } else @@ -938,7 +938,7 @@ static void uffd_zeropage_test(uffd_test_args_t *args) /* Ignore the retval; we already have it */ uffd_register_detect_zeropage(uffd, area_dst_alias, page_size); - if (do_uffdio_zeropage(uffd, has_zeropage)) + if (do_uffdio_zeropage(uffd, has_zeropage, true, 0)) for (i = 0; i < page_size; i++) if (area_dst[i] != 0) err("data non-zero at offset %d\n", i); @@ -952,6 +952,106 @@ static void uffd_zeropage_test(uffd_test_args_t *args) uffd_test_pass(); } +static void do_uffdio_sigbus(int uffd, unsigned long offset) +{ + struct uffdio_sigbus uffdio_sigbus = { 0 }; + int ret; + __s64 res; + + uffdio_sigbus.range.start = (unsigned long) area_dst + offset; + uffdio_sigbus.range.len = page_size; + uffdio_sigbus.mode = 0; + ret = ioctl(uffd, UFFDIO_SIGBUS, &uffdio_sigbus); + res = uffdio_sigbus.updated; + + if (ret) + err("UFFDIO_SIGBUS error: %"PRId64, (int64_t)res); + else if (res != page_size) + err("UFFDIO_SIGBUS unexpected size: %"PRId64, (int64_t)res); +} + +static void uffd_sigbus_ioctl_handle_fault( + struct uffd_msg *msg, struct uffd_args *args) +{ + unsigned long offset; + + if (msg->event != UFFD_EVENT_PAGEFAULT) + err("unexpected msg event %u", msg->event); + + if (msg->arg.pagefault.flags & + (UFFD_PAGEFAULT_FLAG_WP | UFFD_PAGEFAULT_FLAG_MINOR)) + err("unexpected fault type %llu", msg->arg.pagefault.flags); + + offset = (char *)(unsigned long)msg->arg.pagefault.address - area_dst; + offset &= ~(page_size-1); + + /* Odd pages -> zeropage; even pages -> sigbus. */ + if (offset & page_size) { + if (!do_uffdio_zeropage(uffd, true, false, offset)) + err("UFFDIO_ZEROPAGE failed"); + } else { + do_uffdio_sigbus(uffd, offset); + } +} + +static void uffd_sigbus_ioctl_test(uffd_test_args_t *targs) +{ + pthread_t uffd_mon; + char c; + struct uffd_args args = { 0 }; + struct sigaction act = { 0 }; + unsigned long nr_sigbus = 0; + unsigned long nr; + + fcntl(uffd, F_SETFL, uffd_flags | O_NONBLOCK); + + if (!uffd_register_detect_zeropage(uffd, area_dst, nr_pages * page_size)) + err("register failed: no zeropage support"); + + args.handle_fault = uffd_sigbus_ioctl_handle_fault; + if (pthread_create(&uffd_mon, NULL, uffd_poll_thread, &args)) + err("uffd_poll_thread create"); + + sigbuf = &jbuf; + act.sa_sigaction = sighndl; + act.sa_flags = SA_SIGINFO; + if (sigaction(SIGBUS, &act, 0)) + err("sigaction"); + + for (nr = 0; nr < nr_pages; ++nr) { + unsigned long offset = nr * page_size; + const char *bytes = (const char *) area_dst + offset; + const char *i; + + if (sigsetjmp(*sigbuf, 1)) { + /* + * Access below triggered a SIGBUS, which was caught by + * sighndl, which then jumped here. Count this SIGBUS, + * and move on to next page. + */ + ++nr_sigbus; + continue; + } + + for (i = bytes; i < bytes + page_size; ++i) { + if (*i) + err("nonzero byte in area_dst (%p) at %p: %u", + area_dst, i, *i); + } + } + + if (write(pipefd[1], &c, sizeof(c)) != sizeof(c)) + err("pipe write"); + if (pthread_join(uffd_mon, NULL)) + err("pthread_join()"); + + if (nr_sigbus != nr_pages / 2) + err("expected to receive %lu SIGBUS, actually received %lu", + nr_pages / 2, nr_sigbus); + + uffd_test_pass(); +} + /* * Test the returned uffdio_register.ioctls with different register modes. * Note that _UFFDIO_ZEROPAGE is tested separately in the zeropage test. @@ -1127,6 +1227,12 @@ uffd_test_case_t uffd_tests[] = { UFFD_FEATURE_PAGEFAULT_FLAG_WP | UFFD_FEATURE_WP_HUGETLBFS_SHMEM, }, + { + .name = "sigbus-ioctl", + .uffd_fn = uffd_sigbus_ioctl_test, + .mem_targets = MEM_ALL & ~(MEM_HUGETLB | MEM_HUGETLB_PRIVATE), + .uffd_feature_required = UFFD_FEATURE_SIGBUS_IOCTL, + }, }; static void usage(const char *prog)