From patchwork Thu Aug 25 07:09:23 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muhammad Usama Anjum X-Patchwork-Id: 12954257 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C6862C04AA5 for ; Thu, 25 Aug 2022 07:10:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237143AbiHYHJ5 (ORCPT ); Thu, 25 Aug 2022 03:09:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43688 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237100AbiHYHJ4 (ORCPT ); Thu, 25 Aug 2022 03:09:56 -0400 Received: from madras.collabora.co.uk (madras.collabora.co.uk [IPv6:2a00:1098:0:82:1000:25:2eeb:e5ab]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A9BFE11455; Thu, 25 Aug 2022 00:09:53 -0700 (PDT) Received: from lenovo.Home (unknown [39.53.61.43]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: usama.anjum) by madras.collabora.co.uk (Postfix) with ESMTPSA id 17DA06601E9A; Thu, 25 Aug 2022 08:09:49 +0100 (BST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1661411392; bh=BJ+PAJ4nk2Sl0y5d43VYaapFmr5r01iee9mww5IRQaA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=SuXLZgWmDePMqY0JETNTzWqQIZMdAhRAw8tbx6Ng4QluyYiEZNX8mrPiheQBdjBrN wMRuHOBS8PDuwhzxLdBolrcyEOfeQ3EVaL0/c0/cGcsbo441fZ+bSjlpKlJF4F8Q86 +Z6sW7mgo1PyVX5osBxkT59TDbhVMtAadsE8J89i8Mgby/U3aAIU0CEWv7OvT0jaD6 z6Po0oPmHZ73Gick5G4E+8Y7BPTy7SayZ1yEGkK4DUQTcDQC20xEoko1QPRsZgc+dR IFb67JEp1/fstH2LNyhitgSzqRYD6frSx31dEmeTmOLt1WzN6mk4MwPMHc/7WtZITW JFULgp5xhhyMA== From: Muhammad Usama Anjum To: Jonathan Corbet , Alexander Viro , Andrew Morton , Shuah Khan , linux-doc@vger.kernel.org (open list:DOCUMENTATION), linux-kernel@vger.kernel.org (open list), linux-fsdevel@vger.kernel.org (open list:PROC FILESYSTEM), linux-mm@kvack.org (open list:MEMORY MANAGEMENT), linux-kselftest@vger.kernel.org (open list:KERNEL SELFTEST FRAMEWORK) Cc: Muhammad Usama Anjum , kernel@collabora.com Subject: [PATCH v2 1/4] fs/proc/task_mmu: update functions to clear the soft-dirty bit Date: Thu, 25 Aug 2022 12:09:23 +0500 Message-Id: <20220825070926.2922471-2-usama.anjum@collabora.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220825070926.2922471-1-usama.anjum@collabora.com> References: <20220825070926.2922471-1-usama.anjum@collabora.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Update the clear_soft_dirty() and clear_soft_dirty_pmd() to optionally clear and return the status if page is dirty. Signed-off-by: Muhammad Usama Anjum --- Changes in v2: - Move back the functions back to their original file --- fs/proc/task_mmu.c | 82 ++++++++++++++++++++++++++++------------------ 1 file changed, 51 insertions(+), 31 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 8b4f3073f8f5..f66674033207 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -1095,8 +1095,8 @@ static inline bool pte_is_pinned(struct vm_area_struct *vma, unsigned long addr, return page_maybe_dma_pinned(page); } -static inline void clear_soft_dirty(struct vm_area_struct *vma, - unsigned long addr, pte_t *pte) +static inline bool check_soft_dirty(struct vm_area_struct *vma, + unsigned long addr, pte_t *pte, bool clear) { /* * The soft-dirty tracker uses #PF-s to catch writes @@ -1105,55 +1105,75 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma, * of how soft-dirty works. */ pte_t ptent = *pte; + int dirty = 0; if (pte_present(ptent)) { pte_t old_pte; - if (pte_is_pinned(vma, addr, ptent)) - return; - old_pte = ptep_modify_prot_start(vma, addr, pte); - ptent = pte_wrprotect(old_pte); - ptent = pte_clear_soft_dirty(ptent); - ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent); + dirty = pte_soft_dirty(ptent); + + if (dirty && clear && !pte_is_pinned(vma, addr, ptent)) { + old_pte = ptep_modify_prot_start(vma, addr, pte); + ptent = pte_wrprotect(old_pte); + ptent = pte_clear_soft_dirty(ptent); + ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent); + } } else if (is_swap_pte(ptent)) { - ptent = pte_swp_clear_soft_dirty(ptent); - set_pte_at(vma->vm_mm, addr, pte, ptent); + dirty = pte_swp_soft_dirty(ptent); + + if (dirty && clear) { + ptent = pte_swp_clear_soft_dirty(ptent); + set_pte_at(vma->vm_mm, addr, pte, ptent); + } } + + return !!dirty; } #else -static inline void clear_soft_dirty(struct vm_area_struct *vma, - unsigned long addr, pte_t *pte) +static inline bool check_soft_dirty(struct vm_area_struct *vma, + unsigned long addr, pte_t *pte, bool clear) { + return false; } #endif #if defined(CONFIG_MEM_SOFT_DIRTY) && defined(CONFIG_TRANSPARENT_HUGEPAGE) -static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma, - unsigned long addr, pmd_t *pmdp) +static inline bool check_soft_dirty_pmd(struct vm_area_struct *vma, + unsigned long addr, pmd_t *pmdp, bool clear) { pmd_t old, pmd = *pmdp; + int dirty = 0; if (pmd_present(pmd)) { - /* See comment in change_huge_pmd() */ - old = pmdp_invalidate(vma, addr, pmdp); - if (pmd_dirty(old)) - pmd = pmd_mkdirty(pmd); - if (pmd_young(old)) - pmd = pmd_mkyoung(pmd); - - pmd = pmd_wrprotect(pmd); - pmd = pmd_clear_soft_dirty(pmd); - - set_pmd_at(vma->vm_mm, addr, pmdp, pmd); + dirty = pmd_soft_dirty(pmd); + if (dirty && clear) { + /* See comment in change_huge_pmd() */ + old = pmdp_invalidate(vma, addr, pmdp); + if (pmd_dirty(old)) + pmd = pmd_mkdirty(pmd); + if (pmd_young(old)) + pmd = pmd_mkyoung(pmd); + + pmd = pmd_wrprotect(pmd); + pmd = pmd_clear_soft_dirty(pmd); + + set_pmd_at(vma->vm_mm, addr, pmdp, pmd); + } } else if (is_migration_entry(pmd_to_swp_entry(pmd))) { - pmd = pmd_swp_clear_soft_dirty(pmd); - set_pmd_at(vma->vm_mm, addr, pmdp, pmd); + dirty = pmd_swp_soft_dirty(pmd); + + if (dirty && clear) { + pmd = pmd_swp_clear_soft_dirty(pmd); + set_pmd_at(vma->vm_mm, addr, pmdp, pmd); + } } + return !!dirty; } #else -static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma, - unsigned long addr, pmd_t *pmdp) +static inline bool check_soft_dirty_pmd(struct vm_area_struct *vma, + unsigned long addr, pmd_t *pmdp, bool clear) { + return false; } #endif @@ -1169,7 +1189,7 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr, ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { if (cp->type == CLEAR_REFS_SOFT_DIRTY) { - clear_soft_dirty_pmd(vma, addr, pmd); + check_soft_dirty_pmd(vma, addr, pmd, true); goto out; } @@ -1195,7 +1215,7 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr, ptent = *pte; if (cp->type == CLEAR_REFS_SOFT_DIRTY) { - clear_soft_dirty(vma, addr, pte); + check_soft_dirty(vma, addr, pte, true); continue; } From patchwork Thu Aug 25 07:09:24 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muhammad Usama Anjum X-Patchwork-Id: 12954260 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 60E21C04AA5 for ; Thu, 25 Aug 2022 07:10:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237304AbiHYHKH (ORCPT ); Thu, 25 Aug 2022 03:10:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44010 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237202AbiHYHKA (ORCPT ); Thu, 25 Aug 2022 03:10:00 -0400 Received: from madras.collabora.co.uk (madras.collabora.co.uk [46.235.227.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DCC0227CF5; Thu, 25 Aug 2022 00:09:57 -0700 (PDT) Received: from lenovo.Home (unknown [39.53.61.43]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: usama.anjum) by madras.collabora.co.uk (Postfix) with ESMTPSA id 058AB6601E9B; Thu, 25 Aug 2022 08:09:52 +0100 (BST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1661411395; bh=Lcwa5zRVRrswuxaF0ScnA1HT8R4bMWlg2mkK1d8QFEs=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=k1yjE0mdB0T52BqsErrfzTzKo1VX7NEBQlxipxLba7QF3ET/xn6QXSDaHM6wkNfiI UBdzI5wCiY+F8HNYcNmFhRTiamrEsAO/VMac3MOFufG70zWejj42QFuwFTRsDNJzHT Ow2DwVrrx/V63AcE1VHS2AT4ZHxP+at84cLB/xvFV+8K6whslUacfklLbs7hqRJLzg IhgKQeN5ubm5nB8YwnrQa1VVEbipsi1JmtR5OavFCWuEMg1ydgG/VEbAvAV6XTWbzd HVLa7c57XyTP95MMstz4OlZxulQgMq4q4sv0zgymStN2FVyRVNEh8fBourPD8mDhiE vv4XC7wSro6ug== From: Muhammad Usama Anjum To: Jonathan Corbet , Alexander Viro , Andrew Morton , Shuah Khan , linux-doc@vger.kernel.org (open list:DOCUMENTATION), linux-kernel@vger.kernel.org (open list), linux-fsdevel@vger.kernel.org (open list:PROC FILESYSTEM), linux-mm@kvack.org (open list:MEMORY MANAGEMENT), linux-kselftest@vger.kernel.org (open list:KERNEL SELFTEST FRAMEWORK) Cc: Muhammad Usama Anjum , kernel@collabora.com Subject: [PATCH v2 2/4] fs/proc/task_mmu: Implement IOCTL to get and clear soft dirty PTE bit Date: Thu, 25 Aug 2022 12:09:24 +0500 Message-Id: <20220825070926.2922471-3-usama.anjum@collabora.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220825070926.2922471-1-usama.anjum@collabora.com> References: <20220825070926.2922471-1-usama.anjum@collabora.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org This ioctl can be used to watch the process's memory and perform atomic operations which aren't possible through procfs. Three operations have been implemented: - PAGEMAP_SD_GET gets the soft dirty pages in a address range. - PAGEMAP_SD_CLEAR clears the soft dirty bit from dirty pages in a address range. - PAGEMAP_SD_GET_AND_CLEAR gets and clears the soft dirty bit in a address range. struct pagemap_sd_args is used as the argument of the IOCTL. In this struct: - The range is specified through start and len. - The output buffer and size is specified as vec and vec_len. - The flags can be specified in the flags field. Currently only one PAGEMAP_SD_NO_REUSED_REGIONS is supported which can be specified to ignore the VMA dirty flags. This is based on a patch from Gabriel Krisman Bertazi. Signed-off-by: Muhammad Usama Anjum --- Changes in v2: - Convert the interface from syscall to ioctl - Remove pidfd support as it doesn't make sense in ioctl --- fs/proc/task_mmu.c | 255 ++++++++++++++++++++++++++++++++++ include/uapi/linux/fs.h | 13 ++ tools/include/uapi/linux/fs.h | 13 ++ 3 files changed, 281 insertions(+) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index f66674033207..bd09dcd52db2 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -19,6 +19,8 @@ #include #include #include +#include +#include #include #include @@ -1775,11 +1777,264 @@ static int pagemap_release(struct inode *inode, struct file *file) return 0; } +#ifdef CONFIG_MEM_SOFT_DIRTY +#define IS_CLEAR_SD_OP(op) (op == PAGEMAP_SD_CLEAR || op == PAGEMAP_SD_GET_AND_CLEAR) +#define IS_GET_SD_OP(op) (op == PAGEMAP_SD_GET || op == PAGEMAP_SD_GET_AND_CLEAR) + +struct pagemap_sd_private { + unsigned long start; + unsigned int op; + unsigned int flags; + unsigned int index; + unsigned int vec_len; + unsigned long *vec; +}; + +static int pagemap_sd_pmd_entry(pmd_t *pmd, unsigned long addr, + unsigned long end, struct mm_walk *walk) +{ + struct pagemap_sd_private *p = walk->private; + struct vm_area_struct *vma = walk->vma; + unsigned long start = addr; + spinlock_t *ptl; + pte_t *pte; + int dirty; + bool dirty_vma = (p->flags & PAGEMAP_SD_NO_REUSED_REGIONS) ? 0 : + (vma->vm_flags & VM_SOFTDIRTY); + + end = min(end, walk->vma->vm_end); + ptl = pmd_trans_huge_lock(pmd, vma); + if (ptl) { + if (dirty_vma || check_soft_dirty_pmd(vma, addr, pmd, false)) { + /* + * Break huge page into small pages if operation needs to be performed is + * on a portion of the huge page or the return buffer cannot store complete + * data. Then process this PMD as having normal pages. + */ + if ((IS_CLEAR_SD_OP(p->op) && (end - addr < HPAGE_SIZE)) || + (IS_GET_SD_OP(p->op) && (p->index + HPAGE_SIZE/PAGE_SIZE > p->vec_len))) { + spin_unlock(ptl); + split_huge_pmd(vma, pmd, addr); + goto process_smaller_pages; + } else { + dirty = check_soft_dirty_pmd(vma, addr, pmd, IS_CLEAR_SD_OP(p->op)); + if (IS_GET_SD_OP(p->op) && (dirty_vma || dirty)) { + for (; addr != end && p->index < p->vec_len; + addr += PAGE_SIZE) + p->vec[p->index++] = addr - p->start; + } + } + } + spin_unlock(ptl); + return 0; + } + +process_smaller_pages: + if (pmd_trans_unstable(pmd)) + return 0; + + pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + for (; addr != end; pte++, addr += PAGE_SIZE) { + dirty = check_soft_dirty(vma, addr, pte, IS_CLEAR_SD_OP(p->op)); + + if (IS_GET_SD_OP(p->op) && (dirty_vma || dirty)) { + p->vec[p->index++] = addr - p->start; + WARN_ON(p->index > p->vec_len); + } + } + pte_unmap_unlock(pte - 1, ptl); + cond_resched(); + + if (IS_CLEAR_SD_OP(p->op)) + flush_tlb_mm_range(vma->vm_mm, start, end, PAGE_SHIFT, false); + + return 0; +} + +static int pagemap_sd_pte_hole(unsigned long addr, unsigned long end, int depth, + struct mm_walk *walk) +{ + struct pagemap_sd_private *p = walk->private; + struct vm_area_struct *vma = walk->vma; + + if (p->flags & PAGEMAP_SD_NO_REUSED_REGIONS) + return 0; + + if (vma && (vma->vm_flags & VM_SOFTDIRTY) && IS_GET_SD_OP(p->op)) { + for (; addr != end && p->index < p->vec_len; addr += PAGE_SIZE) + p->vec[p->index++] = addr - p->start; + } + + return 0; +} + +static int pagemap_sd_pre_vma(unsigned long start, unsigned long end, struct mm_walk *walk) +{ + struct pagemap_sd_private *p = walk->private; + struct vm_area_struct *vma = walk->vma; + int ret; + unsigned long end_cut = end; + + if (p->flags & PAGEMAP_SD_NO_REUSED_REGIONS) + return 0; + + if (IS_CLEAR_SD_OP(p->op) && (vma->vm_flags & VM_SOFTDIRTY)) { + if (vma->vm_start < start) { + ret = split_vma(vma->vm_mm, vma, start, 1); + if (ret) + return ret; + } + + if (IS_GET_SD_OP(p->op)) + end_cut = min(start + p->vec_len * PAGE_SIZE, end); + + if (vma->vm_end > end_cut) { + ret = split_vma(vma->vm_mm, vma, end_cut, 0); + if (ret) + return ret; + } + } + + return 0; +} + +static void pagemap_sd_post_vma(struct mm_walk *walk) +{ + struct pagemap_sd_private *p = walk->private; + struct vm_area_struct *vma = walk->vma; + + if (p->flags & PAGEMAP_SD_NO_REUSED_REGIONS) + return; + + if (IS_CLEAR_SD_OP(p->op) && (vma->vm_flags & VM_SOFTDIRTY)) { + vma->vm_flags &= ~VM_SOFTDIRTY; + vma_set_page_prot(vma); + } +} + +static int pagemap_sd_pmd_test_walk(unsigned long start, unsigned long end, + struct mm_walk *walk) +{ + struct pagemap_sd_private *p = walk->private; + struct vm_area_struct *vma = walk->vma; + + if (IS_GET_SD_OP(p->op) && (p->index == p->vec_len)) + return -1; + + if (vma->vm_flags & VM_PFNMAP) + return 1; + + return 0; +} + +static const struct mm_walk_ops pagemap_sd_ops = { + .test_walk = pagemap_sd_pmd_test_walk, + .pre_vma = pagemap_sd_pre_vma, + .pmd_entry = pagemap_sd_pmd_entry, + .pte_hole = pagemap_sd_pte_hole, + .post_vma = pagemap_sd_post_vma, +}; + +static long do_pagemap_sd_cmd(struct mm_struct *mm, unsigned int cmd, struct pagemap_sd_args *arg) +{ + struct pagemap_sd_private sd_data; + struct mmu_notifier_range range; + unsigned long start, end; + int ret; + + start = (unsigned long)untagged_addr(arg->start); + if ((!IS_ALIGNED(start, PAGE_SIZE)) || !access_ok((void __user *)start, arg->len)) + return -EINVAL; + + if (IS_GET_SD_OP(cmd) && + ((arg->vec_len == 0) || (!arg->vec) || !access_ok(arg->vec, arg->vec_len))) + return -EINVAL; + + end = start + arg->len; + sd_data.start = start; + sd_data.op = cmd; + sd_data.flags = arg->flags; + sd_data.index = 0; + sd_data.vec_len = arg->vec_len; + + if (IS_GET_SD_OP(cmd)) { + sd_data.vec = vzalloc(arg->vec_len * sizeof(loff_t)); + if (!sd_data.vec) + return -ENOMEM; + } + + if (IS_CLEAR_SD_OP(cmd)) { + mmap_write_lock(mm); + + mmu_notifier_range_init(&range, MMU_NOTIFY_SOFT_DIRTY, 0, NULL, + mm, start, end); + mmu_notifier_invalidate_range_start(&range); + inc_tlb_flush_pending(mm); + } else { + mmap_read_lock(mm); + } + + ret = walk_page_range(mm, start, end, &pagemap_sd_ops, &sd_data); + + if (IS_CLEAR_SD_OP(cmd)) { + mmu_notifier_invalidate_range_end(&range); + dec_tlb_flush_pending(mm); + + mmap_write_unlock(mm); + } else { + mmap_read_unlock(mm); + } + + if (ret < 0) + goto free_sd_data; + + if (IS_GET_SD_OP(cmd)) { + ret = copy_to_user(arg->vec, sd_data.vec, sd_data.index * sizeof(loff_t)); + if (ret) { + ret = -EIO; + goto free_sd_data; + } + ret = sd_data.index; + } else { + ret = 0; + } + +free_sd_data: + if (IS_GET_SD_OP(cmd)) + vfree(sd_data.vec); + + return ret; +} + +static long pagemap_ioctl(struct file *file, unsigned int cmd, unsigned long arg) +{ + struct pagemap_sd_args __user *uarg = (struct pagemap_sd_args __user *)arg; + struct mm_struct *mm = file->private_data; + struct pagemap_sd_args arguments; + + switch (cmd) { + case PAGEMAP_SD_GET: + fallthrough; + case PAGEMAP_SD_CLEAR: + fallthrough; + case PAGEMAP_SD_GET_AND_CLEAR: + if (copy_from_user(&arguments, uarg, sizeof(struct pagemap_sd_args))) + return -EFAULT; + return do_pagemap_sd_cmd(mm, cmd, &arguments); + default: + return -EINVAL; + } +} +#endif /* CONFIG_MEM_SOFT_DIRTY */ + const struct file_operations proc_pagemap_operations = { .llseek = mem_lseek, /* borrow this */ .read = pagemap_read, .open = pagemap_open, .release = pagemap_release, +#ifdef CONFIG_MEM_SOFT_DIRTY + .unlocked_ioctl = pagemap_ioctl, +#endif /* CONFIG_MEM_SOFT_DIRTY */ }; #endif /* CONFIG_PROC_PAGE_MONITOR */ diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h index b7b56871029c..a7e48ba9457b 100644 --- a/include/uapi/linux/fs.h +++ b/include/uapi/linux/fs.h @@ -305,4 +305,17 @@ typedef int __bitwise __kernel_rwf_t; #define RWF_SUPPORTED (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\ RWF_APPEND) +struct pagemap_sd_args { + void __user *start; + int len; + loff_t __user *vec; + int vec_len; + int flags; +}; + +#define PAGEMAP_SD_GET _IOWR('f', 16, struct pagemap_sd_args) +#define PAGEMAP_SD_CLEAR _IOWR('f', 17, struct pagemap_sd_args) +#define PAGEMAP_SD_GET_AND_CLEAR _IOWR('f', 18, struct pagemap_sd_args) +#define PAGEMAP_SD_NO_REUSED_REGIONS 0x1 + #endif /* _UAPI_LINUX_FS_H */ diff --git a/tools/include/uapi/linux/fs.h b/tools/include/uapi/linux/fs.h index b7b56871029c..a7e48ba9457b 100644 --- a/tools/include/uapi/linux/fs.h +++ b/tools/include/uapi/linux/fs.h @@ -305,4 +305,17 @@ typedef int __bitwise __kernel_rwf_t; #define RWF_SUPPORTED (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\ RWF_APPEND) +struct pagemap_sd_args { + void __user *start; + int len; + loff_t __user *vec; + int vec_len; + int flags; +}; + +#define PAGEMAP_SD_GET _IOWR('f', 16, struct pagemap_sd_args) +#define PAGEMAP_SD_CLEAR _IOWR('f', 17, struct pagemap_sd_args) +#define PAGEMAP_SD_GET_AND_CLEAR _IOWR('f', 18, struct pagemap_sd_args) +#define PAGEMAP_SD_NO_REUSED_REGIONS 0x1 + #endif /* _UAPI_LINUX_FS_H */ From patchwork Thu Aug 25 07:09:25 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muhammad Usama Anjum X-Patchwork-Id: 12954258 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F2684C32774 for ; Thu, 25 Aug 2022 07:10:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237284AbiHYHKG (ORCPT ); Thu, 25 Aug 2022 03:10:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44174 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237162AbiHYHKC (ORCPT ); Thu, 25 Aug 2022 03:10:02 -0400 Received: from madras.collabora.co.uk (madras.collabora.co.uk [46.235.227.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B6FDB20F7B; Thu, 25 Aug 2022 00:09:59 -0700 (PDT) Received: from lenovo.Home (unknown [39.53.61.43]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: usama.anjum) by madras.collabora.co.uk (Postfix) with ESMTPSA id F11686601E99; Thu, 25 Aug 2022 08:09:55 +0100 (BST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1661411398; bh=MEAqS9FO3mi8rXqjN9FmAS/aAjWRNW4CmEGdKHWrrsA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=l415p8BIyCoewaUGlaa2xqnKxWPElUmZubdzUkEFbPR2jFYG29xCt1do/2/Daxvi7 mAYRS3Wy3nGjEzek+eg9kC4e0Hwl8lpn5dB6XDQifwlGLvuusmk9YkQ/d3/zw9ZQsD NXe3Ps/4ZyeoWE0OohYTM8sfv8wsC/c+cu+nQtCfLb5rqZhIdDScg/tXK588hvsiMx vRJDNJ2HUSqHJN+D5yCGRIXWdYCr9hYMrBnMlNykbFs5GeAAuUYlRgxN/0wr/xTMO0 NQoPek6q0oacj3tfBJvKLtgWP7n5OZfidHO9ToptxfdEziAPiCkXQ40afZUrWEwgHo Bo5BnVYBdoM8Q== From: Muhammad Usama Anjum To: Jonathan Corbet , Alexander Viro , Andrew Morton , Shuah Khan , linux-doc@vger.kernel.org (open list:DOCUMENTATION), linux-kernel@vger.kernel.org (open list), linux-fsdevel@vger.kernel.org (open list:PROC FILESYSTEM), linux-mm@kvack.org (open list:MEMORY MANAGEMENT), linux-kselftest@vger.kernel.org (open list:KERNEL SELFTEST FRAMEWORK) Cc: Muhammad Usama Anjum , kernel@collabora.com Subject: [PATCH v2 3/4] selftests: vm: add pagemap ioctl tests Date: Thu, 25 Aug 2022 12:09:25 +0500 Message-Id: <20220825070926.2922471-4-usama.anjum@collabora.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220825070926.2922471-1-usama.anjum@collabora.com> References: <20220825070926.2922471-1-usama.anjum@collabora.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Add pagemap ioctl tests. Add several different types of tests to judge the correction of the interface. Signed-off-by: Muhammad Usama Anjum --- TAP version 13 1..42 ok 1 sanity_tests no flag specified ok 2 sanity_tests wrong flag specified ok 3 sanity_tests mixture of correct and wrong flags ok 4 sanity_tests Clear area with larger vec size ok 5 Page testing: all new pages must be soft dirty ok 6 Page testing: all pages must not be soft dirty ok 7 Page testing: all pages dirty other than first and the last one ok 8 Page testing: only middle page dirty ok 9 Page testing: only two middle pages dirty ok 10 Page testing: only get 2 dirty pages and clear them as well ok 11 Page testing: Range clear only ok 12 Large Page testing: all new pages must be soft dirty ok 13 Large Page testing: all pages must not be soft dirty ok 14 Large Page testing: all pages dirty other than first and the last one ok 15 Large Page testing: only middle page dirty ok 16 Large Page testing: only two middle pages dirty ok 17 Large Page testing: only get 2 dirty pages and clear them as well ok 18 Large Page testing: Range clear only ok 19 Huge page testing: all new pages must be soft dirty ok 20 Huge page testing: all pages must not be soft dirty ok 21 Huge page testing: all pages dirty other than first and the last one ok 22 Huge page testing: only middle page dirty ok 23 Huge page testing: only two middle pages dirty ok 24 Huge page testing: only get 2 dirty pages and clear them as well ok 25 Huge page testing: Range clear only ok 26 Performance Page testing: page isn't dirty ok 27 Performance Page testing: all pages must not be soft dirty ok 28 Performance Page testing: all pages dirty other than first and the last one ok 29 Performance Page testing: only middle page dirty ok 30 Performance Page testing: only two middle pages dirty ok 31 Performance Page testing: only get 2 dirty pages and clear them as well ok 32 Performance Page testing: Range clear only ok 33 hpage_unit_tests all new huge page must be dirty ok 34 hpage_unit_tests all the huge page must not be dirty ok 35 hpage_unit_tests all the huge page must be dirty and clear ok 36 hpage_unit_tests only middle page dirty ok 37 hpage_unit_tests clear first half of huge page ok 38 hpage_unit_tests clear first half of huge page with limited buffer ok 39 hpage_unit_tests clear second half huge page ok 40 unmapped_region_tests Get dirty pages ok 41 unmapped_region_tests Get dirty pages ok 42 Test test_simple # Totals: pass:42 fail:0 xfail:0 xpass:0 skip:0 error:0 Changes in v2: - Update the tests to use the ioctl interface instead of syscall --- tools/testing/selftests/vm/.gitignore | 1 + tools/testing/selftests/vm/Makefile | 2 + tools/testing/selftests/vm/pagemap_ioctl.c | 629 +++++++++++++++++++++ 3 files changed, 632 insertions(+) create mode 100644 tools/testing/selftests/vm/pagemap_ioctl.c diff --git a/tools/testing/selftests/vm/.gitignore b/tools/testing/selftests/vm/.gitignore index 31e5eea2a9b9..334fde556499 100644 --- a/tools/testing/selftests/vm/.gitignore +++ b/tools/testing/selftests/vm/.gitignore @@ -16,6 +16,7 @@ mremap_dontunmap mremap_test on-fault-limit transhuge-stress +pagemap_ioctl protection_keys protection_keys_32 protection_keys_64 diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile index 266e965e724c..4296c3268f64 100644 --- a/tools/testing/selftests/vm/Makefile +++ b/tools/testing/selftests/vm/Makefile @@ -51,6 +51,7 @@ TEST_GEN_FILES += on-fault-limit TEST_GEN_FILES += thuge-gen TEST_GEN_FILES += transhuge-stress TEST_GEN_FILES += userfaultfd +TEST_GEN_PROGS += pagemap_ioctl TEST_GEN_PROGS += soft-dirty TEST_GEN_PROGS += split_huge_page_test TEST_GEN_FILES += ksm_tests @@ -98,6 +99,7 @@ TEST_FILES += va_128TBswitch.sh include ../lib.mk $(OUTPUT)/madv_populate: vm_util.c +$(OUTPUT)/pagemap_ioctl: vm_util.c $(OUTPUT)/soft-dirty: vm_util.c $(OUTPUT)/split_huge_page_test: vm_util.c diff --git a/tools/testing/selftests/vm/pagemap_ioctl.c b/tools/testing/selftests/vm/pagemap_ioctl.c new file mode 100644 index 000000000000..4cd0a8d03012 --- /dev/null +++ b/tools/testing/selftests/vm/pagemap_ioctl.c @@ -0,0 +1,629 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include +#include +#include "vm_util.h" +#include "../kselftest.h" +#include +#include +#include + +#define TEST_ITERATIONS 10000 +#define PAGEMAP "/proc/self/pagemap" +int pagemap_fd; + +static long pagemap_ioctl(void *start, int len, unsigned int cmd, loff_t *vec, + int vec_len, int flag) +{ + struct pagemap_sd_args arg; + int ret; + + arg.start = start; + arg.len = len; + arg.vec = vec; + arg.vec_len = vec_len; + arg.flags = flag; + + ret = ioctl(pagemap_fd, cmd, &arg); + + if (ret < 0) + return ret; + + return ret; +} + +int sanity_tests(int page_size) +{ + char *mem; + int mem_size, vec_size, ret; + loff_t *vec; + + /* 1. wrong operation */ + vec_size = 100; + mem_size = page_size; + + vec = malloc(sizeof(loff_t) * vec_size); + mem = mmap(NULL, mem_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0); + if (!mem || !vec) + ksft_exit_fail_msg("error nomem\n"); + + ksft_test_result(pagemap_ioctl(mem, mem_size, 0, vec, vec_size, 0) < 0, + "%s no flag specified\n", __func__); + ksft_test_result(pagemap_ioctl(mem, mem_size, 0x01000000, vec, vec_size, 0) < 0, + "%s wrong flag specified\n", __func__); + ksft_test_result(pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET | 0xFF, + vec, vec_size, 0) < 0, + "%s mixture of correct and wrong flags\n", __func__); + + /* 2. Clear area with larger vec size */ + ret = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET_AND_CLEAR, + vec, vec_size, 0); + ksft_test_result(ret >= 0, "%s Clear area with larger vec size\n", __func__); + + free(vec); + munmap(mem, mem_size); + return 0; +} + +void *gethugepage(int map_size) +{ + int ret; + char *map; + size_t hpage_len = read_pmd_pagesize(); + + map = memalign(hpage_len, map_size); + if (!map) + ksft_exit_fail_msg("memalign failed %d %s\n", errno, strerror(errno)); + + ret = madvise(map, map_size, MADV_HUGEPAGE); + if (ret) + ksft_exit_fail_msg("madvise failed %d %d %s\n", ret, errno, strerror(errno)); + + memset(map, 0, map_size); + + if (check_huge(map)) + return map; + + free(map); + return NULL; + +} + +int hpage_unit_tests(int page_size) +{ + char *map; + int i, ret; + size_t hpage_len = read_pmd_pagesize(); + size_t num_pages = 1; + int map_size = hpage_len * num_pages; + int vec_size = map_size/page_size; + loff_t *vec, *vec2; + + vec = malloc(sizeof(loff_t) * vec_size); + vec2 = malloc(sizeof(loff_t) * vec_size); + if (!vec || !vec2) + ksft_exit_fail_msg("malloc failed\n"); + + map = gethugepage(map_size); + if (map) { + // 1. all new huge page must be dirty + ret = pagemap_ioctl(map, map_size, PAGEMAP_SD_GET_AND_CLEAR, vec, vec_size, 0); + if (ret < 0) + ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno)); + + for (i = 0; i < vec_size; i++) + if (vec[i] != i * page_size) + break; + + ksft_test_result(i == vec_size, "%s all new huge page must be dirty\n", __func__); + + // 2. all the huge page must not be dirty + ret = pagemap_ioctl(map, map_size, PAGEMAP_SD_GET, vec, vec_size, 0); + if (ret < 0) + ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno)); + + ksft_test_result(ret == 0, "%s all the huge page must not be dirty\n", __func__); + + // 3. all the huge page must be dirty and clear dirty as well + memset(map, -1, map_size); + ret = pagemap_ioctl(map, map_size, PAGEMAP_SD_GET_AND_CLEAR, vec, vec_size, 0); + if (ret < 0) + ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno)); + + for (i = 0; i < vec_size; i++) + if (vec[i] != i * page_size) + break; + + ksft_test_result(ret == vec_size && i == vec_size, + "%s all the huge page must be dirty and clear\n", __func__); + + // 4. only middle page dirty + free(map); + map = gethugepage(map_size); + clear_softdirty(); + map[vec_size/2 * page_size]++; + + ret = pagemap_ioctl(map, map_size, PAGEMAP_SD_GET, vec, vec_size, 0); + if (ret < 0) + ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno)); + + for (i = 0; i < vec_size; i++) { + if (vec[i] == vec_size/2 * page_size) + break; + } + ksft_test_result(vec[i] == vec_size/2 * page_size, + "%s only middle page dirty\n", __func__); + + free(map); + } else { + ksft_test_result_skip("all new huge page must be dirty\n"); + ksft_test_result_skip("all the huge page must not be dirty\n"); + ksft_test_result_skip("all the huge page must be dirty and clear\n"); + ksft_test_result_skip("only middle page dirty\n"); + } + + // 5. clear first half of huge page + map = gethugepage(map_size); + if (map) { + ret = pagemap_ioctl(map, map_size/2, PAGEMAP_SD_CLEAR, NULL, 0, 0); + if (ret < 0) + ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno)); + + ret = pagemap_ioctl(map, map_size, PAGEMAP_SD_GET, vec, vec_size, 0); + if (ret < 0) + ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno)); + + for (i = 0; i < vec_size/2; i++) + if (vec[i] != (i + vec_size/2) * page_size) + break; + + ksft_test_result(i == vec_size/2 && ret == vec_size/2, + "%s clear first half of huge page\n", __func__); + free(map); + } else { + ksft_test_result_skip("clear first half of huge page\n"); + } + + // 6. clear first half of huge page with limited buffer + map = gethugepage(map_size); + if (map) { + ret = pagemap_ioctl(map, map_size, PAGEMAP_SD_GET_AND_CLEAR, vec, vec_size/2, 0); + if (ret < 0) + ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno)); + + ret = pagemap_ioctl(map, map_size, PAGEMAP_SD_GET, vec, vec_size, 0); + if (ret < 0) + ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno)); + + for (i = 0; i < vec_size/2; i++) + if (vec[i] != (i + vec_size/2) * page_size) + break; + + ksft_test_result(i == vec_size/2 && ret == vec_size/2, + "%s clear first half of huge page with limited buffer\n", + __func__); + free(map); + } else { + ksft_test_result_skip("clear first half of huge page with limited buffer\n"); + } + + // 7. clear second half of huge page + map = gethugepage(map_size); + if (map) { + memset(map, -1, map_size); + ret = pagemap_ioctl(map + map_size/2, map_size/2, PAGEMAP_SD_CLEAR, NULL, 0, 0); + if (ret < 0) + ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno)); + + ret = pagemap_ioctl(map, map_size, PAGEMAP_SD_GET, vec, vec_size, 0); + if (ret < 0) + ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno)); + + for (i = 0; i < vec_size/2; i++) + if (vec[i] != i * page_size) + break; + + ksft_test_result(i == vec_size/2, "%s clear second half huge page\n", __func__); + free(map); + } else { + ksft_test_result_skip("clear second half huge page\n"); + } + + free(vec); + free(vec2); + return 0; +} + +int base_tests(char *prefix, char *mem, int mem_size, int page_size, int skip) +{ + int vec_size, i, j, ret, dirty_pages, dirty_pages2; + loff_t *vec, *vec2; + + if (skip) { + ksft_test_result_skip("%s all new pages must be soft dirty\n", prefix); + ksft_test_result_skip("%s all pages must not be soft dirty\n", prefix); + ksft_test_result_skip("%s all pages dirty other than first and the last one\n", + prefix); + ksft_test_result_skip("%s only middle page dirty\n", prefix); + ksft_test_result_skip("%s only two middle pages dirty\n", prefix); + ksft_test_result_skip("%s only get 2 dirty pages and clear them as well\n", prefix); + ksft_test_result_skip("%s Range clear only\n", prefix); + return 0; + } + + vec_size = mem_size/page_size; + vec = malloc(sizeof(loff_t) * vec_size); + vec2 = malloc(sizeof(loff_t) * vec_size); + + /* 1. all new pages must be soft dirty and clear the range for next test */ + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET_AND_CLEAR, vec, vec_size - 2, 0); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + dirty_pages2 = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET_AND_CLEAR, vec2, vec_size, 0); + if (dirty_pages2 < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages2, errno, strerror(errno)); + + for (i = 0; i < dirty_pages; i++) + if (vec[i] != i * page_size) + break; + for (j = 0; j < dirty_pages2; j++) + if (vec2[j] != (j + vec_size - 2) * page_size) + break; + + ksft_test_result(dirty_pages == vec_size - 2 && i == dirty_pages && + dirty_pages2 == 2 && j == dirty_pages2, + "%s all new pages must be soft dirty\n", prefix); + + // 2. all pages must not be soft dirty + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, vec, vec_size, 0); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + ksft_test_result(dirty_pages == 0, "%s all pages must not be soft dirty\n", prefix); + + // 3. all pages dirty other than first and the last one + memset(mem + page_size, -1, (mem_size - (2 * page_size))); + + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, vec, vec_size, 0); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + ksft_test_result(dirty_pages >= vec_size - 2 && dirty_pages <= vec_size, + "%s all pages dirty other than first and the last one\n", prefix); + + // 4. only middle page dirty + clear_softdirty(); + mem[vec_size/2 * page_size]++; + + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, vec, vec_size, 0); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + for (i = 0; i < vec_size; i++) { + if (vec[i] == vec_size/2 * page_size) + break; + } + ksft_test_result(vec[i] == vec_size/2 * page_size, + "%s only middle page dirty\n", prefix); + + // 5. only two middle pages dirty and walk over only middle pages + clear_softdirty(); + mem[vec_size/2 * page_size]++; + mem[(vec_size/2 + 1) * page_size]++; + + dirty_pages = pagemap_ioctl(&mem[vec_size/2 * page_size], 2 * page_size, + PAGEMAP_SD_GET, vec, vec_size, 0); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + ksft_test_result(dirty_pages == 2 && vec[0] == 0 && vec[1] == page_size, + "%s only two middle pages dirty\n", prefix); + + /* 6. only get 2 dirty pages and clear them as well */ + memset(mem, -1, mem_size); + + /* get and clear second and third pages */ + ret = pagemap_ioctl(mem + page_size, 2 * page_size, PAGEMAP_SD_GET_AND_CLEAR, vec, 2, 0); + if (ret < 0) + ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno)); + + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, vec2, vec_size, 0); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + for (i = 0; i < vec_size - 2; i++) { + if (i == 0 && (vec[i] != 0 || vec2[i] != 0)) + break; + else if (i == 1 && (vec[i] != page_size || vec2[i] != (i + 2) * page_size)) + break; + else if (i > 1 && (vec2[i] != (i + 2) * page_size)) + break; + } + + ksft_test_result(dirty_pages == vec_size - 2 && i == vec_size - 2, + "%s only get 2 dirty pages and clear them as well\n", prefix); + /* 7. Range clear only */ + memset(mem, -1, mem_size); + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_CLEAR, NULL, 0, 0); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + dirty_pages2 = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, vec, vec_size, 0); + if (dirty_pages2 < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages2, errno, strerror(errno)); + + ksft_test_result(dirty_pages == 0 && dirty_pages2 == 0, "%s Range clear only\n", + prefix); + + free(vec); + free(vec2); + return 0; +} + +int performance_base_tests(char *prefix, char *mem, int mem_size, int page_size, int skip) +{ + int vec_size, i, ret, dirty_pages, dirty_pages2; + loff_t *vec, *vec2; + + if (skip) { + ksft_test_result_skip("%s all new pages must be soft dirty\n", prefix); + ksft_test_result_skip("%s all pages must not be soft dirty\n", prefix); + ksft_test_result_skip("%s all pages dirty other than first and the last one\n", + prefix); + ksft_test_result_skip("%s only middle page dirty\n", prefix); + ksft_test_result_skip("%s only two middle pages dirty\n", prefix); + ksft_test_result_skip("%s only get 2 dirty pages and clear them as well\n", prefix); + ksft_test_result_skip("%s Range clear only\n", prefix); + return 0; + } + + vec_size = mem_size/page_size; + vec = malloc(sizeof(loff_t) * vec_size); + vec2 = malloc(sizeof(loff_t) * vec_size); + + /* 1. all new pages must be soft dirty and clear the range for next test */ + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET_AND_CLEAR, + vec, vec_size - 2, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + dirty_pages2 = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET_AND_CLEAR, + vec2, vec_size, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages2 < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages2, errno, strerror(errno)); + + ksft_test_result(dirty_pages == 0 && dirty_pages2 == 0, + "%s page isn't dirty\n", prefix); + + // 2. all pages must not be soft dirty + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, + vec, vec_size, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + ksft_test_result(dirty_pages == 0, "%s all pages must not be soft dirty\n", prefix); + + // 3. all pages dirty other than first and the last one + memset(mem + page_size, -1, (mem_size - 2 * page_size)); + + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, + vec, vec_size, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + for (i = 0; i < dirty_pages; i++) { + if (vec[i] != (i + 1) * page_size) + break; + } + + ksft_test_result(dirty_pages == vec_size - 2 && i == vec_size - 2, + "%s all pages dirty other than first and the last one\n", prefix); + + // 4. only middle page dirty + clear_softdirty(); + mem[vec_size/2 * page_size]++; + + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, + vec, vec_size, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + for (i = 0; i < vec_size; i++) { + if (vec[i] == vec_size/2 * page_size) + break; + } + ksft_test_result(vec[i] == vec_size/2 * page_size, + "%s only middle page dirty\n", prefix); + + // 5. only two middle pages dirty and walk over only middle pages + clear_softdirty(); + mem[vec_size/2 * page_size]++; + mem[(vec_size/2 + 1) * page_size]++; + + dirty_pages = pagemap_ioctl(&mem[vec_size/2 * page_size], 2 * page_size, + PAGEMAP_SD_GET, vec, vec_size, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + ksft_test_result(dirty_pages == 2 && vec[0] == 0 && vec[1] == page_size, + "%s only two middle pages dirty\n", prefix); + + /* 6. only get 2 dirty pages and clear them as well */ + memset(mem, -1, mem_size); + + /* get and clear second and third pages */ + ret = pagemap_ioctl(mem + page_size, 2 * page_size, PAGEMAP_SD_GET_AND_CLEAR, + vec, 2, PAGEMAP_SD_NO_REUSED_REGIONS); + if (ret < 0) + ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno)); + + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, + vec2, vec_size, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + for (i = 0; i < vec_size - 2; i++) { + if (i == 0 && (vec[i] != 0 || vec2[i] != 0)) + break; + else if (i == 1 && (vec[i] != page_size || vec2[i] != (i + 2) * page_size)) + break; + else if (i > 1 && (vec2[i] != (i + 2) * page_size)) + break; + } + + ksft_test_result(dirty_pages == vec_size - 2 && i == vec_size - 2, + "%s only get 2 dirty pages and clear them as well\n", prefix); + + /* 7. Range clear only */ + memset(mem, -1, mem_size); + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_CLEAR, + NULL, 0, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + dirty_pages2 = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, + vec, vec_size, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages2 < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages2, errno, strerror(errno)); + + ksft_test_result(dirty_pages == 0 && dirty_pages2 == 0, "%s Range clear only\n", + prefix); + + free(vec); + free(vec2); + return 0; +} + +int unmapped_region_tests(int page_size) +{ + void *start = (void *)0x10000000; + int dirty_pages, len = 0x00040000; + int vec_size = len / page_size; + loff_t *vec = malloc(sizeof(loff_t) * vec_size); + + /* 1. Get dirty pages */ + dirty_pages = pagemap_ioctl(start, len, PAGEMAP_SD_GET, vec, vec_size, 0); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + ksft_test_result(dirty_pages >= 0, "%s Get dirty pages\n", __func__); + + /* 2. Clear dirty bit of whole address space */ + dirty_pages = pagemap_ioctl(0, 0x7FFFFFFF, PAGEMAP_SD_CLEAR, NULL, 0, 0); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + ksft_test_result(dirty_pages == 0, "%s Get dirty pages\n", __func__); + + free(vec); + return 0; +} + +static void test_simple(int page_size) +{ + int i; + char *map; + loff_t *vec = NULL; + + map = aligned_alloc(page_size, page_size); + if (!map) + ksft_exit_fail_msg("mmap failed\n"); + + clear_softdirty(); + + for (i = 0 ; i < TEST_ITERATIONS; i++) { + if (pagemap_ioctl(map, page_size, PAGEMAP_SD_GET, vec, 1, 0) == 1) { + ksft_print_msg("dirty bit was 1, but should be 0 (i=%d)\n", i); + break; + } + + clear_softdirty(); + // Write something to the page to get the dirty bit enabled on the page + map[0]++; + + if (pagemap_ioctl(map, page_size, PAGEMAP_SD_GET, vec, 1, 0) == 0) { + ksft_print_msg("dirty bit was 0, but should be 1 (i=%d)\n", i); + break; + } + + clear_softdirty(); + } + free(map); + + ksft_test_result(i == TEST_ITERATIONS, "Test %s\n", __func__); +} + +int main(int argc, char **argv) +{ + int page_size = getpagesize(); + size_t hpage_len = read_pmd_pagesize(); + char *mem, *map; + int mem_size; + + ksft_print_header(); + ksft_set_plan(42); + + pagemap_fd = open(PAGEMAP, O_RDWR); + if (pagemap_fd < 0) + return -EINVAL; + + /* 1. Sanity testing */ + sanity_tests(page_size); + + /* 2. Normal page testing */ + mem_size = 10 * page_size; + mem = mmap(NULL, mem_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0); + if (!mem) + ksft_exit_fail_msg("error nomem\n"); + + base_tests("Page testing:", mem, mem_size, page_size, 0); + + munmap(mem, mem_size); + + /* 3. Large page testing */ + mem_size = 512 * 10 * page_size; + mem = mmap(NULL, mem_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0); + if (!mem) + ksft_exit_fail_msg("error nomem\n"); + + base_tests("Large Page testing:", mem, mem_size, page_size, 0); + + munmap(mem, mem_size); + + /* 4. Huge page testing */ + map = gethugepage(hpage_len); + if (check_huge(map)) + base_tests("Huge page testing:", map, hpage_len, page_size, 0); + else + base_tests("Huge page testing:", NULL, 0, 0, 1); + + free(map); + + /* 5. Normal page testing */ + mem_size = 10 * page_size; + mem = mmap(NULL, mem_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0); + if (!mem) + ksft_exit_fail_msg("error nomem\n"); + + performance_base_tests("Performance Page testing:", mem, mem_size, page_size, 0); + + munmap(mem, mem_size); + + /* 6. Huge page tests */ + hpage_unit_tests(page_size); + + /* 7. Unmapped address test */ + unmapped_region_tests(page_size); + + /* 8. Iterative test */ + test_simple(page_size); + + close(pagemap_fd); + return ksft_exit_pass(); +} From patchwork Thu Aug 25 07:09:26 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Muhammad Usama Anjum X-Patchwork-Id: 12954259 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 33E68C64992 for ; Thu, 25 Aug 2022 07:10:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231259AbiHYHKJ (ORCPT ); Thu, 25 Aug 2022 03:10:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44420 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237197AbiHYHKF (ORCPT ); Thu, 25 Aug 2022 03:10:05 -0400 Received: from madras.collabora.co.uk (madras.collabora.co.uk [46.235.227.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8CF4C31379; Thu, 25 Aug 2022 00:10:02 -0700 (PDT) Received: from lenovo.Home (unknown [39.53.61.43]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: usama.anjum) by madras.collabora.co.uk (Postfix) with ESMTPSA id E66546601E9C; Thu, 25 Aug 2022 08:09:58 +0100 (BST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1661411401; bh=qFwfAW+YdGQMEoUt1Jsqu4meeyL+SBRRa4jSpOBxxag=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Bx/9rVPU+JTiTRGq6yKLfsVSVb5l7v9f/qPpy7CBWa0cJUDj+xYFFMRaY1jmAeFth nuqIe9jNrr3lohnEK9M67it9OomP2JbkAK3gTNI4/s8h/d1MGjm54kIIKSI682fQNy oSDyudVxA60s1YOELUYREVVfC033R1ILywyNy7kWoJG5YzPobioz9IYw6SJ0IXuOfR EKByu1vasxgHu5DwxLGW1nnNS2Aq4WIjnkwuBi0HN3BSqIdsMDZ5YXi51yyES4bzGM cO++TskZMNgLzfDHY9BIz/vrEtu/h+VRL6qGJt4tgZpPCrafRy6THV4q/2NMCekJDc D7i43/jy/FPJw== From: Muhammad Usama Anjum To: Jonathan Corbet , Alexander Viro , Andrew Morton , Shuah Khan , linux-doc@vger.kernel.org (open list:DOCUMENTATION), linux-kernel@vger.kernel.org (open list), linux-fsdevel@vger.kernel.org (open list:PROC FILESYSTEM), linux-mm@kvack.org (open list:MEMORY MANAGEMENT), linux-kselftest@vger.kernel.org (open list:KERNEL SELFTEST FRAMEWORK) Cc: Muhammad Usama Anjum , kernel@collabora.com Subject: [PATCH v2 4/4] mm: add documentation of the new ioctl on pagemap Date: Thu, 25 Aug 2022 12:09:26 +0500 Message-Id: <20220825070926.2922471-5-usama.anjum@collabora.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220825070926.2922471-1-usama.anjum@collabora.com> References: <20220825070926.2922471-1-usama.anjum@collabora.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Add the explanation of the added ioctl on pagemap file. It can be used to get, clear or both soft-dirty PTE bit of the specified range. or both at the same time. Signed-off-by: Muhammad Usama Anjum --- Changes in v2: - Update documentation to mention ioctl instead of the syscall --- Documentation/admin-guide/mm/soft-dirty.rst | 42 ++++++++++++++++++++- 1 file changed, 41 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/mm/soft-dirty.rst b/Documentation/admin-guide/mm/soft-dirty.rst index cb0cfd6672fa..d3d33e63a965 100644 --- a/Documentation/admin-guide/mm/soft-dirty.rst +++ b/Documentation/admin-guide/mm/soft-dirty.rst @@ -5,7 +5,12 @@ Soft-Dirty PTEs =============== The soft-dirty is a bit on a PTE which helps to track which pages a task -writes to. In order to do this tracking one should +writes to. + +Using Proc FS +------------- + +In order to do this tracking one should 1. Clear soft-dirty bits from the task's PTEs. @@ -20,6 +25,41 @@ writes to. In order to do this tracking one should 64-bit qword is the soft-dirty one. If set, the respective PTE was written to since step 1. +Using IOCTL +----------- + +The IOCTL on the ``/proc/PID/pagemap`` can be can be used to find the dirty pages +atomically. The following commands are supported:: + + MEMWATCH_SD_GET + Get the page offsets which are soft dirty. + + MEMWATCH_SD_CLEAR + Clear the pages which are soft dirty. + + MEMWATCH_SD_GET_AND_CLEAR + Get and clear the pages which are soft dirty. + +The struct :c:type:`pagemap_sd_args` is used as the argument. In this struct: + + 1. The range is specified through start and len. The len argument need not be + the multiple of the page size, but since the information is returned for the + whole pages, len is effectively rounded up to the next multiple of the page + size. + + 2. The output buffer and size is specified in vec and vec_len. The offsets of + the dirty pages from start are returned in vec. The ioctl returns when the + whole range has been searched or vec is completely filled. The whole range + isn't cleared if vec fills up completely. + + 3. The flags can be specified in flags field. Currently only one flag, + PAGEMAP_SD_NO_REUSED_REGIONS is supported which can be specified to ignore + the VMA dirty flags for better performance. This flag shows only those pages + dirty which have been written to by the user. All new allocations aren't returned + to be dirty. + +Explanation +----------- Internally, to do this tracking, the writable bit is cleared from PTEs when the soft-dirty bit is cleared. So, after this, when the task tries to