From patchwork Fri Aug 7 13:13:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ruan Shiyang X-Patchwork-Id: 11705633 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 08D70618 for ; Fri, 7 Aug 2020 13:13:47 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E5B9A22C9F for ; Fri, 7 Aug 2020 13:13:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E5B9A22C9F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=cn.fujitsu.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id BF22812C2F602; Fri, 7 Aug 2020 06:13:46 -0700 (PDT) Received-SPF: None (mailfrom) identity=mailfrom; client-ip=183.91.158.132; helo=heian.cn.fujitsu.com; envelope-from=ruansy.fnst@cn.fujitsu.com; receiver= Received: from heian.cn.fujitsu.com (mail.cn.fujitsu.com [183.91.158.132]) by ml01.01.org (Postfix) with ESMTP id 6203812C6067C for ; Fri, 7 Aug 2020 06:13:44 -0700 (PDT) X-IronPort-AV: E=Sophos;i="5.75,445,1589212800"; d="scan'208";a="97774910" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 07 Aug 2020 21:13:42 +0800 Received: from G08CNEXMBPEKD04.g08.fujitsu.local (unknown [10.167.33.201]) by cn.fujitsu.com (Postfix) with ESMTP id 364964CE34EF; Fri, 7 Aug 2020 21:13:37 +0800 (CST) Received: from G08CNEXCHPEKD06.g08.fujitsu.local (10.167.33.205) by G08CNEXMBPEKD04.g08.fujitsu.local (10.167.33.201) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 7 Aug 2020 21:13:39 +0800 Received: from localhost.localdomain (10.167.225.141) by G08CNEXCHPEKD06.g08.fujitsu.local (10.167.33.209) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Fri, 7 Aug 2020 21:13:37 +0800 From: Shiyang Ruan To: , , Subject: [RFC PATCH 1/8] fs: introduce get_shared_files() for dax&reflink Date: Fri, 7 Aug 2020 21:13:29 +0800 Message-ID: <20200807131336.318774-2-ruansy.fnst@cn.fujitsu.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200807131336.318774-1-ruansy.fnst@cn.fujitsu.com> References: <20200807131336.318774-1-ruansy.fnst@cn.fujitsu.com> MIME-Version: 1.0 X-yoursite-MailScanner-ID: 364964CE34EF.ADD3D X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@cn.fujitsu.com X-Spam-Status: No Message-ID-Hash: JSR2W7E4J3KPMR7IE4OBMJ3LJGX72J67 X-Message-ID-Hash: JSR2W7E4J3KPMR7IE4OBMJ3LJGX72J67 X-MailFrom: ruansy.fnst@cn.fujitsu.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, darrick.wong@oracle.com, david@fromorbit.com, hch@lst.de, rgoldwyn@suse.de, qi.fuli@fujitsu.com, y-goto@fujitsu.com X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: Under the mode of both dax and reflink on, one page may be shared by multiple files and offsets. In order to track them in memory-failure or other cases, we introduce this function by finding out who is sharing this block(the page) in a filesystem. It returns a list that contains all the owners, and the offset in each owner. For XFS, rmapbt is used to find out the owners of one block. So, it should be turned on when we want to use dax&reflink feature together. Signed-off-by: Shiyang Ruan --- fs/xfs/xfs_super.c | 67 +++++++++++++++++++++++++++++++++++++++++++++ include/linux/dax.h | 7 +++++ include/linux/fs.h | 2 ++ 3 files changed, 76 insertions(+) diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 379cbff438bc..b71392219c91 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -35,6 +35,9 @@ #include "xfs_refcount_item.h" #include "xfs_bmap_item.h" #include "xfs_reflink.h" +#include "xfs_alloc.h" +#include "xfs_rmap.h" +#include "xfs_rmap_btree.h" #include #include @@ -1097,6 +1100,69 @@ xfs_fs_free_cached_objects( return xfs_reclaim_inodes_nr(XFS_M(sb), sc->nr_to_scan); } +static int _get_shared_files_fn( + struct xfs_btree_cur *cur, + struct xfs_rmap_irec *rec, + void *priv) +{ + struct list_head *list = priv; + struct xfs_inode *ip; + struct shared_files *sfp; + + /* Get files that incore, filter out others that are not in use. */ + xfs_iget(cur->bc_mp, cur->bc_tp, rec->rm_owner, XFS_IGET_INCORE, 0, &ip); + if (ip && !ip->i_vnode.i_mapping) + return 0; + + sfp = kmalloc(sizeof(*sfp), GFP_KERNEL); + sfp->mapping = ip->i_vnode.i_mapping; + sfp->index = rec->rm_offset; + list_add_tail(&sfp->list, list); + + return 0; +} + +static int +xfs_fs_get_shared_files( + struct super_block *sb, + pgoff_t offset, + struct list_head *list) +{ + struct xfs_mount *mp = XFS_M(sb); + struct xfs_trans *tp = NULL; + struct xfs_btree_cur *cur = NULL; + struct xfs_rmap_irec rmap_low = { 0 }, rmap_high = { 0 }; + struct xfs_buf *agf_bp = NULL; + xfs_agblock_t bno = XFS_B_TO_FSB(mp, offset); + xfs_agnumber_t agno = XFS_FSB_TO_AGNO(mp, bno); + int error = 0; + + error = xfs_trans_alloc_empty(mp, &tp); + if (error) + return error; + + error = xfs_alloc_read_agf(mp, tp, agno, 0, &agf_bp); + if (error) + return error; + + cur = xfs_rmapbt_init_cursor(mp, tp, agf_bp, agno); + + memset(&cur->bc_rec, 0, sizeof(cur->bc_rec)); + /* Construct the range for one rmap search */ + memset(&rmap_low, 0, sizeof(rmap_low)); + memset(&rmap_high, 0xFF, sizeof(rmap_high)); + rmap_low.rm_startblock = rmap_high.rm_startblock = bno; + + error = xfs_rmap_query_range(cur, &rmap_low, &rmap_high, + _get_shared_files_fn, list); + if (error == -ECANCELED) + error = 0; + + xfs_btree_del_cursor(cur, error); + xfs_trans_brelse(tp, agf_bp); + return error; +} + static const struct super_operations xfs_super_operations = { .alloc_inode = xfs_fs_alloc_inode, .destroy_inode = xfs_fs_destroy_inode, @@ -1110,6 +1176,7 @@ static const struct super_operations xfs_super_operations = { .show_options = xfs_fs_show_options, .nr_cached_objects = xfs_fs_nr_cached_objects, .free_cached_objects = xfs_fs_free_cached_objects, + .get_shared_files = xfs_fs_get_shared_files, }; static int diff --git a/include/linux/dax.h b/include/linux/dax.h index 6904d4e0b2e0..0a85e321d6b4 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -40,6 +40,13 @@ struct dax_operations { extern struct attribute_group dax_attribute_group; +struct shared_files { + struct list_head list; + struct address_space *mapping; + pgoff_t index; + dax_entry_t cookie; +}; + #if IS_ENABLED(CONFIG_DAX) struct dax_device *dax_get_by_host(const char *host); struct dax_device *alloc_dax(void *private, const char *host, diff --git a/include/linux/fs.h b/include/linux/fs.h index f5abba86107d..81de3d2739b9 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1977,6 +1977,8 @@ struct super_operations { struct shrink_control *); long (*free_cached_objects)(struct super_block *, struct shrink_control *); + int (*get_shared_files)(struct super_block *sb, pgoff_t offset, + struct list_head *list); }; /* From patchwork Fri Aug 7 13:13:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ruan Shiyang X-Patchwork-Id: 11705641 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 109D9618 for ; Fri, 7 Aug 2020 13:13:51 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id EE96B22C9F for ; Fri, 7 Aug 2020 13:13:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EE96B22C9F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=cn.fujitsu.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 276E812C2F610; Fri, 7 Aug 2020 06:13:50 -0700 (PDT) Received-SPF: None (mailfrom) identity=mailfrom; client-ip=183.91.158.132; helo=heian.cn.fujitsu.com; envelope-from=ruansy.fnst@cn.fujitsu.com; receiver= Received: from heian.cn.fujitsu.com (mail.cn.fujitsu.com [183.91.158.132]) by ml01.01.org (Postfix) with ESMTP id 2A52712C2F606 for ; Fri, 7 Aug 2020 06:13:46 -0700 (PDT) X-IronPort-AV: E=Sophos;i="5.75,445,1589212800"; d="scan'208";a="97774912" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 07 Aug 2020 21:13:42 +0800 Received: from G08CNEXMBPEKD04.g08.fujitsu.local (unknown [10.167.33.201]) by cn.fujitsu.com (Postfix) with ESMTP id 3D9444CE34F4; Fri, 7 Aug 2020 21:13:38 +0800 (CST) Received: from G08CNEXJMPEKD02.g08.fujitsu.local (10.167.33.202) by G08CNEXMBPEKD04.g08.fujitsu.local (10.167.33.201) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 7 Aug 2020 21:13:40 +0800 Received: from G08CNEXCHPEKD06.g08.fujitsu.local (10.167.33.205) by G08CNEXJMPEKD02.g08.fujitsu.local (10.167.33.202) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 7 Aug 2020 21:13:40 +0800 Received: from localhost.localdomain (10.167.225.141) by G08CNEXCHPEKD06.g08.fujitsu.local (10.167.33.209) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Fri, 7 Aug 2020 21:13:38 +0800 From: Shiyang Ruan To: , , Subject: [RFC PATCH 2/8] fsdax, mm: track files sharing dax page for memory-failure Date: Fri, 7 Aug 2020 21:13:30 +0800 Message-ID: <20200807131336.318774-3-ruansy.fnst@cn.fujitsu.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200807131336.318774-1-ruansy.fnst@cn.fujitsu.com> References: <20200807131336.318774-1-ruansy.fnst@cn.fujitsu.com> MIME-Version: 1.0 X-yoursite-MailScanner-ID: 3D9444CE34F4.AE25B X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@cn.fujitsu.com X-Spam-Status: No Message-ID-Hash: RCK4HWXTJMGOY76V4HP6EQKKCP5PDQ53 X-Message-ID-Hash: RCK4HWXTJMGOY76V4HP6EQKKCP5PDQ53 X-MailFrom: ruansy.fnst@cn.fujitsu.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, darrick.wong@oracle.com, david@fromorbit.com, hch@lst.de, rgoldwyn@suse.de, qi.fuli@fujitsu.com, y-goto@fujitsu.com X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: When memory-failure occurs on a pmem device which contains a filesystem, we need to find out which files are in using, and then notify processes that are using these files to handle the error. The design of the track method is as follow: 1. dax_assocaite_entry() associates the owner's info to this page - For non-reflink case: page->mapping,->index stores the file's mapping, offset in file. A dax page is not shared by other files. dax_associate_entry() is called only once. So, use page->mapping,->index to store the owner's info. - For reflink case: page->mapping,->index stores the block device, offset in device. A dax page is shared more than once. So, dax_assocaite_entry() will be called more than once. We introduce page->zone_device_data as reflink counter, to indicate that this page is shared and how many owners now is using this page. The page->mapping,->index is used to store the block_device of the fs and page offset of this device. 2. dax_lock_page() calls query interface to lock each dax entry - For non-reflink case: owner's info is stored in page->mapping,->index. So, It is easy to lock its dax entry. - For reflink case: owner's info is obtained by calling get_shared_files(), which is implemented by FS. The FS context could be found in block_device that stored by page->mapping. Then lock the dax entries of the owners. In memory-failure(), since owners list has been obtained in dax_lock_page(), just iterate the list and handle the error. Signed-off-by: Shiyang Ruan --- fs/dax.c | 111 +++++++++++++++++++++++++++-------- include/linux/dax.h | 6 +- include/linux/mm.h | 8 +++ mm/memory-failure.c | 138 +++++++++++++++++++++++++++----------------- 4 files changed, 183 insertions(+), 80 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 11b16729b86f..47380f75ef38 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -329,7 +329,7 @@ static unsigned long dax_end_pfn(void *entry) * offsets. */ static void dax_associate_entry(void *entry, struct address_space *mapping, - struct vm_area_struct *vma, unsigned long address) + struct vm_fault *vmf, pgoff_t offset) { unsigned long size = dax_entry_size(entry), pfn, index; int i = 0; @@ -337,13 +337,27 @@ static void dax_associate_entry(void *entry, struct address_space *mapping, if (IS_ENABLED(CONFIG_FS_DAX_LIMITED)) return; - index = linear_page_index(vma, address & ~(size - 1)); + index = linear_page_index(vmf->vma, vmf->address & ~(size - 1)); for_each_mapped_pfn(entry, pfn) { struct page *page = pfn_to_page(pfn); - WARN_ON_ONCE(page->mapping); - page->mapping = mapping; - page->index = index + i++; + BUG_ON(!page->mapping && IS_FSDAX_SHARED(page)); + + /* Use zone_device_data as reflink counter here. + * If one page is associated for the first time, then use the + * ->mapping,->index as normal. For the second time it is + * assocated, we store the block_device that this page belongs + * to in ->mapping and the offset within this block_device in + * ->index, and increase the reflink counter. + */ + if (!page->mapping) { + page->mapping = mapping; + page->index = index + i++; + } else { + page->mapping = (struct address_space *)mapping->host->i_sb->s_bdev; + page->index = offset; + page->zone_device_data++; + } } } @@ -359,9 +373,12 @@ static void dax_disassociate_entry(void *entry, struct address_space *mapping, struct page *page = pfn_to_page(pfn); WARN_ON_ONCE(trunc && page_ref_count(page) > 1); - WARN_ON_ONCE(page->mapping && page->mapping != mapping); - page->mapping = NULL; - page->index = 0; + if (!IS_FSDAX_SHARED(page)) { + page->mapping = NULL; + page->index = 0; + } else { + page->zone_device_data--; + } } } @@ -386,7 +403,7 @@ static struct page *dax_busy_page(void *entry) * Return: A cookie to pass to dax_unlock_page() or 0 if the entry could * not be locked. */ -dax_entry_t dax_lock_page(struct page *page) +void _dax_lock_page(struct page *page, struct shared_files *sfp) { XA_STATE(xas, NULL, 0); void *entry; @@ -394,7 +411,7 @@ dax_entry_t dax_lock_page(struct page *page) /* Ensure page->mapping isn't freed while we look at it */ rcu_read_lock(); for (;;) { - struct address_space *mapping = READ_ONCE(page->mapping); + struct address_space *mapping = READ_ONCE(sfp->mapping); entry = NULL; if (!mapping || !dax_mapping(mapping)) @@ -413,11 +430,11 @@ dax_entry_t dax_lock_page(struct page *page) xas.xa = &mapping->i_pages; xas_lock_irq(&xas); - if (mapping != page->mapping) { + if (mapping != sfp->mapping) { xas_unlock_irq(&xas); continue; } - xas_set(&xas, page->index); + xas_set(&xas, sfp->index); entry = xas_load(&xas); if (dax_is_locked(entry)) { rcu_read_unlock(); @@ -430,18 +447,61 @@ dax_entry_t dax_lock_page(struct page *page) break; } rcu_read_unlock(); - return (dax_entry_t)entry; + sfp->cookie = (dax_entry_t)entry; +} + +int dax_lock_page(struct page *page, struct list_head *shared_files) +{ + struct shared_files *sfp; + + if (IS_FSDAX_SHARED(page)) { + struct block_device *bdev = (struct block_device *)page->mapping; + + bdev->bd_super->s_op->get_shared_files(bdev->bd_super, + page->index, shared_files); + list_for_each_entry(sfp, shared_files, list) { + _dax_lock_page(page, sfp); + if (!sfp->cookie) + return 1; + } + } else { + sfp = kmalloc(sizeof(*sfp), GFP_KERNEL); + sfp->mapping = page->mapping; + sfp->index = page->index; + list_add(&sfp->list, shared_files); + _dax_lock_page(page, sfp); + if (!sfp->cookie) + return 1; + } + return 0; } -void dax_unlock_page(struct page *page, dax_entry_t cookie) +void _dax_unlock_page(struct page *page, struct shared_files *sfp) { - struct address_space *mapping = page->mapping; - XA_STATE(xas, &mapping->i_pages, page->index); + struct address_space *mapping = sfp->mapping; + XA_STATE(xas, &mapping->i_pages, sfp->index); if (S_ISCHR(mapping->host->i_mode)) return; - dax_unlock_entry(&xas, (void *)cookie); + dax_unlock_entry(&xas, (void *)sfp->cookie); +} + +void dax_unlock_page(struct page *page, struct list_head *shared_files) +{ + struct shared_files *sfp, *next; + + if (IS_FSDAX_SHARED(page)) { + list_for_each_entry_safe(sfp, next, shared_files, list) { + if (!sfp->cookie) + _dax_unlock_page(page, sfp); + kfree(sfp); + } + } else { + if (!sfp->cookie) + _dax_unlock_page(page, sfp); + kfree(sfp); + } } /* @@ -715,7 +775,8 @@ static int copy_user_dax(struct block_device *bdev, struct dax_device *dax_dev, */ static void *dax_insert_entry(struct xa_state *xas, struct address_space *mapping, struct vm_fault *vmf, - void *entry, pfn_t pfn, unsigned long flags, bool dirty) + pgoff_t bdoff, void *entry, pfn_t pfn, unsigned long flags, + bool dirty) { void *new_entry = dax_make_entry(pfn, flags); @@ -738,7 +799,7 @@ static void *dax_insert_entry(struct xa_state *xas, void *old; dax_disassociate_entry(entry, mapping, false); - dax_associate_entry(new_entry, mapping, vmf->vma, vmf->address); + dax_associate_entry(new_entry, mapping, vmf, bdoff); /* * Only swap our new entry into the page cache if the current * entry is a zero page or an empty entry. If a normal PTE or @@ -1030,7 +1091,7 @@ static vm_fault_t dax_load_hole(struct xa_state *xas, pfn_t pfn = pfn_to_pfn_t(my_zero_pfn(vaddr)); vm_fault_t ret; - *entry = dax_insert_entry(xas, mapping, vmf, *entry, pfn, + *entry = dax_insert_entry(xas, mapping, vmf, 0, *entry, pfn, DAX_ZERO_PAGE, false); ret = vmf_insert_mixed(vmf->vma, vaddr, pfn); @@ -1337,8 +1398,8 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, if (error < 0) goto error_finish_iomap; - entry = dax_insert_entry(&xas, mapping, vmf, entry, pfn, - 0, write && !sync); + entry = dax_insert_entry(&xas, mapping, vmf, iomap.addr, entry, + pfn, 0, write && !sync); /* * If we are doing synchronous page fault and inode needs fsync, @@ -1418,7 +1479,7 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf, goto fallback; pfn = page_to_pfn_t(zero_page); - *entry = dax_insert_entry(xas, mapping, vmf, *entry, pfn, + *entry = dax_insert_entry(xas, mapping, vmf, 0, *entry, pfn, DAX_PMD | DAX_ZERO_PAGE, false); if (arch_needs_pgtable_deposit()) { @@ -1555,8 +1616,8 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, if (error < 0) goto finish_iomap; - entry = dax_insert_entry(&xas, mapping, vmf, entry, pfn, - DAX_PMD, write && !sync); + entry = dax_insert_entry(&xas, mapping, vmf, iomap.addr, entry, + pfn, DAX_PMD, write && !sync); /* * If we are doing synchronous page fault and inode needs fsync, diff --git a/include/linux/dax.h b/include/linux/dax.h index 0a85e321d6b4..d6d9f4b721bc 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -8,7 +8,7 @@ /* Flag for synchronous flush */ #define DAXDEV_F_SYNC (1UL << 0) - +#define IS_FSDAX_SHARED(p) ((unsigned long)p->zone_device_data != 0) typedef unsigned long dax_entry_t; struct iomap_ops; @@ -148,8 +148,8 @@ int dax_writeback_mapping_range(struct address_space *mapping, struct dax_device *dax_dev, struct writeback_control *wbc); struct page *dax_layout_busy_page(struct address_space *mapping); -dax_entry_t dax_lock_page(struct page *page); -void dax_unlock_page(struct page *page, dax_entry_t cookie); +int dax_lock_page(struct page *page, struct list_head *shared_files); +void dax_unlock_page(struct page *page, struct list_head *shared_files); #else static inline bool bdev_dax_supported(struct block_device *bdev, int blocksize) diff --git a/include/linux/mm.h b/include/linux/mm.h index dc7b87310c10..bffccc34f1a1 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1115,6 +1115,14 @@ static inline bool is_device_private_page(const struct page *page) page->pgmap->type == MEMORY_DEVICE_PRIVATE; } +static inline bool is_device_fsdax_page(const struct page *page) +{ + return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) && + IS_ENABLED(CONFIG_DEVICE_PRIVATE) && + is_zone_device_page(page) && + page->pgmap->type == MEMORY_DEVICE_FS_DAX; +} + static inline bool is_pci_p2pdma_page(const struct page *page) { return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) && diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 47b8ccb1fb9b..53cc65821c99 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -93,9 +93,13 @@ static int hwpoison_filter_dev(struct page *p) if (PageSlab(p)) return -EINVAL; - mapping = page_mapping(p); - if (mapping == NULL || mapping->host == NULL) - return -EINVAL; + if (is_device_fsdax_page(p) & IS_FSDAX_SHARED(p)) + dev = ((struct block_device *)p->mapping)->s_dev; + else { + mapping = page_mapping(p); + if (mapping == NULL || mapping->host == NULL) + return -EINVAL; + } dev = mapping->host->i_sb->s_dev; if (hwpoison_filter_dev_major != ~0U && @@ -263,9 +267,8 @@ void shake_page(struct page *p, int access) EXPORT_SYMBOL_GPL(shake_page); static unsigned long dev_pagemap_mapping_shift(struct page *page, - struct vm_area_struct *vma) + struct vm_area_struct *vma, unsigned long address) { - unsigned long address = vma_address(page, vma); pgd_t *pgd; p4d_t *p4d; pud_t *pud; @@ -306,8 +309,8 @@ static unsigned long dev_pagemap_mapping_shift(struct page *page, * Uses GFP_ATOMIC allocations to avoid potential recursions in the VM. */ static void add_to_kill(struct task_struct *tsk, struct page *p, - struct vm_area_struct *vma, - struct list_head *to_kill) + struct address_space *mapping, pgoff_t offset, + struct vm_area_struct *vma, struct list_head *to_kill) { struct to_kill *tk; @@ -317,12 +320,18 @@ static void add_to_kill(struct task_struct *tsk, struct page *p, return; } - tk->addr = page_address_in_vma(p, vma); - if (is_zone_device_page(p)) - tk->size_shift = dev_pagemap_mapping_shift(p, vma); - else - tk->size_shift = page_shift(compound_head(p)); - + if (is_device_fsdax_page(p)) { + tk->addr = vma->vm_start + + ((offset - vma->vm_pgoff) << PAGE_SHIFT); + tk->size_shift = dev_pagemap_mapping_shift(p, vma, tk->addr); + } else { + tk->addr = page_address_in_vma(p, vma); + if (is_zone_device_page(p)) { + tk->size_shift = dev_pagemap_mapping_shift(p, vma, + vma_address(p, vma)); + } else + tk->size_shift = page_shift(compound_head(p)); + } /* * Send SIGKILL if "tk->addr == -EFAULT". Also, as * "tk->size_shift" is always non-zero for !is_zone_device_page(), @@ -468,33 +477,26 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill, if (!page_mapped_in_vma(page, vma)) continue; if (vma->vm_mm == t->mm) - add_to_kill(t, page, vma, to_kill); + add_to_kill(t, page, NULL, 0, vma, to_kill); } } read_unlock(&tasklist_lock); page_unlock_anon_vma_read(av); } -/* - * Collect processes when the error hit a file mapped page. - */ -static void collect_procs_file(struct page *page, struct list_head *to_kill, - int force_early) +static void collect_procs_file_one(struct page *page, struct list_head *to_kill, + struct address_space *mapping, pgoff_t pgoff, bool force_early) { struct vm_area_struct *vma; struct task_struct *tsk; - struct address_space *mapping = page->mapping; i_mmap_lock_read(mapping); read_lock(&tasklist_lock); for_each_process(tsk) { - pgoff_t pgoff = page_to_pgoff(page); struct task_struct *t = task_early_kill(tsk, force_early); - if (!t) continue; - vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, - pgoff) { + vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) { /* * Send early kill signal to tasks where a vma covers * the page but the corrupted page is not necessarily @@ -502,19 +504,39 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill, * Assume applications who requested early kill want * to be informed of all such data corruptions. */ - if (vma->vm_mm == t->mm) - add_to_kill(t, page, vma, to_kill); + if (vma->vm_mm == t->mm) { + add_to_kill(t, page, mapping, pgoff, vma, + to_kill); + } } } read_unlock(&tasklist_lock); i_mmap_unlock_read(mapping); } +/* + * Collect processes when the error hit a file mapped page. + */ +static void collect_procs_file(struct page *page, struct list_head *to_kill, + struct list_head *shared_files, int force_early) +{ + struct shared_files *sfp; + + if (shared_files) + list_for_each_entry(sfp, shared_files, list) { + collect_procs_file_one(page, to_kill, sfp->mapping, + sfp->index, force_early); + } + else + collect_procs_file_one(page, to_kill, page->mapping, + page_to_pgoff(page), force_early); +} + /* * Collect the processes who have the corrupted page mapped to kill. */ static void collect_procs(struct page *page, struct list_head *tokill, - int force_early) + struct list_head *shared_files, int force_early) { if (!page->mapping) return; @@ -522,7 +544,33 @@ static void collect_procs(struct page *page, struct list_head *tokill, if (PageAnon(page)) collect_procs_anon(page, tokill, force_early); else - collect_procs_file(page, tokill, force_early); + collect_procs_file(page, tokill, shared_files, force_early); +} + +static void unmap_mappings_range(struct list_head *to_kill, + struct list_head *shared_files) +{ + unsigned long size = 0; + loff_t start = 0; + struct to_kill *tk; + struct shared_files *sfp; + + list_for_each_entry(tk, to_kill, nd) + if (tk->size_shift) + size = max(size, 1UL << tk->size_shift); + if (size) { + /* + * Unmap the largest mapping to avoid breaking up + * device-dax mappings which are constant size. The + * actual size of the mapping being torn down is + * communicated in siginfo, see kill_proc() + */ + list_for_each_entry(sfp, shared_files, list) { + start = (sfp->index << PAGE_SHIFT) & ~(size - 1); + unmap_mapping_range(sfp->mapping, start, + start + size, 0); + } + } } static const char *action_name[] = { @@ -1026,7 +1074,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn, * there's nothing that can be done. */ if (kill) - collect_procs(hpage, &tokill, flags & MF_ACTION_REQUIRED); + collect_procs(hpage, &tokill, NULL, flags & MF_ACTION_REQUIRED); if (!PageHuge(hpage)) { unmap_success = try_to_unmap(hpage, ttu); @@ -1181,12 +1229,10 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags, { struct page *page = pfn_to_page(pfn); const bool unmap_success = true; - unsigned long size = 0; - struct to_kill *tk; LIST_HEAD(tokill); + LIST_HEAD(shared_files); int rc = -EBUSY; - loff_t start; - dax_entry_t cookie; + int error = 0; /* * Prevent the inode from being freed while we are interrogating @@ -1195,9 +1241,9 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags, * also prevents changes to the mapping of this pfn until * poison signaling is complete. */ - cookie = dax_lock_page(page); - if (!cookie) - goto out; + error = dax_lock_page(page, &shared_files); + if (error) + goto unlock; if (hwpoison_filter(page)) { rc = 0; @@ -1225,26 +1271,14 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags, * SIGBUS (i.e. MF_MUST_KILL) */ flags |= MF_ACTION_REQUIRED | MF_MUST_KILL; - collect_procs(page, &tokill, flags & MF_ACTION_REQUIRED); + collect_procs(page, &tokill, &shared_files, flags & MF_ACTION_REQUIRED); + + unmap_mappings_range(&tokill, &shared_files); - list_for_each_entry(tk, &tokill, nd) - if (tk->size_shift) - size = max(size, 1UL << tk->size_shift); - if (size) { - /* - * Unmap the largest mapping to avoid breaking up - * device-dax mappings which are constant size. The - * actual size of the mapping being torn down is - * communicated in siginfo, see kill_proc() - */ - start = (page->index << PAGE_SHIFT) & ~(size - 1); - unmap_mapping_range(page->mapping, start, start + size, 0); - } kill_procs(&tokill, flags & MF_MUST_KILL, !unmap_success, pfn, flags); rc = 0; unlock: - dax_unlock_page(page, cookie); -out: + dax_unlock_page(page, &shared_files); /* drop pgmap ref acquired in caller */ put_dev_pagemap(pgmap); action_result(pfn, MF_MSG_DAX, rc ? MF_FAILED : MF_RECOVERED); From patchwork Fri Aug 7 13:13:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ruan Shiyang X-Patchwork-Id: 11705653 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D4A47138C for ; Fri, 7 Aug 2020 13:13:55 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BDEFD22CF6 for ; Fri, 7 Aug 2020 13:13:55 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BDEFD22CF6 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=cn.fujitsu.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 9594F12C2F616; Fri, 7 Aug 2020 06:13:55 -0700 (PDT) Received-SPF: None (mailfrom) identity=mailfrom; client-ip=183.91.158.132; helo=heian.cn.fujitsu.com; envelope-from=ruansy.fnst@cn.fujitsu.com; receiver= Received: from heian.cn.fujitsu.com (mail.cn.fujitsu.com [183.91.158.132]) by ml01.01.org (Postfix) with ESMTP id DEECC12C2F603 for ; Fri, 7 Aug 2020 06:13:49 -0700 (PDT) X-IronPort-AV: E=Sophos;i="5.75,445,1589212800"; d="scan'208";a="97774925" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 07 Aug 2020 21:13:48 +0800 Received: from G08CNEXMBPEKD05.g08.fujitsu.local (unknown [10.167.33.204]) by cn.fujitsu.com (Postfix) with ESMTP id 6F1EC4CE34F1; Fri, 7 Aug 2020 21:13:44 +0800 (CST) Received: from G08CNEXCHPEKD06.g08.fujitsu.local (10.167.33.205) by G08CNEXMBPEKD05.g08.fujitsu.local (10.167.33.204) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 7 Aug 2020 21:13:40 +0800 Received: from localhost.localdomain (10.167.225.141) by G08CNEXCHPEKD06.g08.fujitsu.local (10.167.33.209) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Fri, 7 Aug 2020 21:13:38 +0800 From: Shiyang Ruan To: , , Subject: [RFC PATCH 3/8] fsdax: introduce dax_copy_edges() for COW Date: Fri, 7 Aug 2020 21:13:31 +0800 Message-ID: <20200807131336.318774-4-ruansy.fnst@cn.fujitsu.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200807131336.318774-1-ruansy.fnst@cn.fujitsu.com> References: <20200807131336.318774-1-ruansy.fnst@cn.fujitsu.com> MIME-Version: 1.0 X-yoursite-MailScanner-ID: 6F1EC4CE34F1.ACA37 X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@cn.fujitsu.com X-Spam-Status: No Message-ID-Hash: 3L7NRJDN3XFFVFJXYWPBV3TR2HPRYIT7 X-Message-ID-Hash: 3L7NRJDN3XFFVFJXYWPBV3TR2HPRYIT7 X-MailFrom: ruansy.fnst@cn.fujitsu.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, darrick.wong@oracle.com, david@fromorbit.com, hch@lst.de, rgoldwyn@suse.de, qi.fuli@fujitsu.com, y-goto@fujitsu.com, Goldwyn Rodrigues X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: Add address output in dax_iomap_pfn() in order to perform a memcpy() in CoW case. Since this function both output address and pfn, rename it to dax_iomap_direct_access(). dax_copy_edges() is a helper functions performs a copy from one part of the device to another for data not page aligned. Signed-off-by: Goldwyn Rodrigues Signed-off-by: Shiyang Ruan --- fs/dax.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 58 insertions(+), 6 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 47380f75ef38..308678c58d4d 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1043,8 +1043,8 @@ static sector_t dax_iomap_sector(struct iomap *iomap, loff_t pos) return (iomap->addr + (pos & PAGE_MASK) - iomap->offset) >> 9; } -static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size, - pfn_t *pfnp) +static int dax_iomap_direct_access(struct iomap *iomap, loff_t pos, size_t size, + pfn_t *pfnp, void **addr) { const sector_t sector = dax_iomap_sector(iomap, pos); pgoff_t pgoff; @@ -1055,12 +1055,14 @@ static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size, if (rc) return rc; id = dax_read_lock(); - length = dax_direct_access(iomap->dax_dev, pgoff, PHYS_PFN(size), - NULL, pfnp); + length = dax_direct_access(iomap->dax_dev, pgoff, PHYS_PFN(size), addr, + pfnp); if (length < 0) { rc = length; goto out; } + if (!pfnp) + goto out_check_addr; rc = -EINVAL; if (PFN_PHYS(length) < size) goto out; @@ -1070,11 +1072,59 @@ static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size, if (length > 1 && !pfn_t_devmap(*pfnp)) goto out; rc = 0; + +out_check_addr: + if (!addr) + goto out; + if (!*addr) + rc = -EFAULT; out: dax_read_unlock(id); return rc; } +/* + * dax_copy_edges - Copies the part of the pages not included in + * the write, but required for CoW because + * offset/offset+length are not page aligned. + */ +static int dax_copy_edges(loff_t pos, loff_t length, struct iomap *srcmap, + void *daddr, bool pmd) +{ + size_t page_size = pmd ? PMD_SIZE : PAGE_SIZE; + loff_t offset = pos & (page_size - 1); + size_t size = ALIGN(offset + length, page_size); + loff_t end = pos + length; + loff_t pg_end = round_up(end, page_size); + void *saddr = 0; + int ret = 0; + + ret = dax_iomap_direct_access(srcmap, pos, size, NULL, &saddr); + if (ret) + return ret; + /* + * Copy the first part of the page + * Note: we pass offset as length + */ + if (offset) { + if (saddr) + ret = memcpy_mcsafe(daddr, saddr, offset); + else + memset(daddr, 0, offset); + } + + /* Copy the last part of the range */ + if (end < pg_end) { + if (saddr) + ret = memcpy_mcsafe(daddr + offset + length, + saddr + offset + length, pg_end - end); + else + memset(daddr + offset + length, 0, + pg_end - end); + } + return ret; +} + /* * The user has performed a load from a hole in the file. Allocating a new * page in the file would cause excessive storage usage for workloads with @@ -1394,7 +1444,8 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, count_memcg_event_mm(vma->vm_mm, PGMAJFAULT); major = VM_FAULT_MAJOR; } - error = dax_iomap_pfn(&iomap, pos, PAGE_SIZE, &pfn); + error = dax_iomap_direct_access(&iomap, pos, PAGE_SIZE, &pfn, + NULL); if (error < 0) goto error_finish_iomap; @@ -1612,7 +1663,8 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, switch (iomap.type) { case IOMAP_MAPPED: - error = dax_iomap_pfn(&iomap, pos, PMD_SIZE, &pfn); + error = dax_iomap_direct_access(&iomap, pos, PMD_SIZE, &pfn, + NULL); if (error < 0) goto finish_iomap; From patchwork Fri Aug 7 13:13:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ruan Shiyang X-Patchwork-Id: 11705663 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E0E821731 for ; Fri, 7 Aug 2020 13:13:59 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CAA5B22D02 for ; Fri, 7 Aug 2020 13:13:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CAA5B22D02 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=cn.fujitsu.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id ACDF312C2F609; Fri, 7 Aug 2020 06:13:59 -0700 (PDT) Received-SPF: None (mailfrom) identity=mailfrom; client-ip=183.91.158.132; helo=heian.cn.fujitsu.com; envelope-from=ruansy.fnst@cn.fujitsu.com; receiver= Received: from heian.cn.fujitsu.com (mail.cn.fujitsu.com [183.91.158.132]) by ml01.01.org (Postfix) with ESMTP id 97D0612C2F617 for ; Fri, 7 Aug 2020 06:13:55 -0700 (PDT) X-IronPort-AV: E=Sophos;i="5.75,445,1589212800"; d="scan'208";a="97774934" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 07 Aug 2020 21:13:54 +0800 Received: from G08CNEXMBPEKD06.g08.fujitsu.local (unknown [10.167.33.206]) by cn.fujitsu.com (Postfix) with ESMTP id BF32F4CE34EE; Fri, 7 Aug 2020 21:13:51 +0800 (CST) Received: from G08CNEXCHPEKD06.g08.fujitsu.local (10.167.33.205) by G08CNEXMBPEKD06.g08.fujitsu.local (10.167.33.206) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 7 Aug 2020 21:13:40 +0800 Received: from localhost.localdomain (10.167.225.141) by G08CNEXCHPEKD06.g08.fujitsu.local (10.167.33.209) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Fri, 7 Aug 2020 21:13:39 +0800 From: Shiyang Ruan To: , , Subject: [RFC PATCH 4/8] fsdax: copy data before write Date: Fri, 7 Aug 2020 21:13:32 +0800 Message-ID: <20200807131336.318774-5-ruansy.fnst@cn.fujitsu.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200807131336.318774-1-ruansy.fnst@cn.fujitsu.com> References: <20200807131336.318774-1-ruansy.fnst@cn.fujitsu.com> MIME-Version: 1.0 X-yoursite-MailScanner-ID: BF32F4CE34EE.AFF39 X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@cn.fujitsu.com X-Spam-Status: No Message-ID-Hash: RDVDJNKRXJBA6G6YCTOZRMNA5UJVTCK6 X-Message-ID-Hash: RDVDJNKRXJBA6G6YCTOZRMNA5UJVTCK6 X-MailFrom: ruansy.fnst@cn.fujitsu.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, darrick.wong@oracle.com, david@fromorbit.com, hch@lst.de, rgoldwyn@suse.de, qi.fuli@fujitsu.com, y-goto@fujitsu.com X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: Add dax_copy_edges() into each dax actor functions to perform CoW. Signed-off-by: Shiyang Ruan --- fs/dax.c | 39 +++++++++++++++++++++++++++++++++++---- 1 file changed, 35 insertions(+), 4 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 308678c58d4d..65553e3f7602 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1208,7 +1208,8 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data, return iov_iter_zero(min(length, end - pos), iter); } - if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED)) + if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED && + !(iomap->flags & IOMAP_F_SHARED))) return -EIO; /* @@ -1247,6 +1248,12 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data, break; } + if (iomap != srcmap) { + ret = dax_copy_edges(pos, length, srcmap, kaddr, false); + if (ret) + break; + } + map_len = PFN_PHYS(map_len); kaddr += offset; map_len -= offset; @@ -1358,6 +1365,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, vm_fault_t ret = 0; void *entry; pfn_t pfn; + void *kaddr; trace_dax_pte_fault(inode, vmf, ret); /* @@ -1439,19 +1447,27 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, switch (iomap.type) { case IOMAP_MAPPED: +cow: if (iomap.flags & IOMAP_F_NEW) { count_vm_event(PGMAJFAULT); count_memcg_event_mm(vma->vm_mm, PGMAJFAULT); major = VM_FAULT_MAJOR; } error = dax_iomap_direct_access(&iomap, pos, PAGE_SIZE, &pfn, - NULL); + &kaddr); if (error < 0) goto error_finish_iomap; entry = dax_insert_entry(&xas, mapping, vmf, iomap.addr, entry, pfn, 0, write && !sync); + if (srcmap.type != IOMAP_HOLE) { + error = dax_copy_edges(pos, PAGE_SIZE, &srcmap, kaddr, + false); + if (error) + goto error_finish_iomap; + } + /* * If we are doing synchronous page fault and inode needs fsync, * we can insert PTE into page tables only after that happens. @@ -1475,12 +1491,15 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, goto finish_iomap; case IOMAP_UNWRITTEN: + if (write && iomap.flags & IOMAP_F_SHARED) + goto cow; + fallthrough; case IOMAP_HOLE: if (!write) { ret = dax_load_hole(&xas, mapping, &entry, vmf); goto finish_iomap; } - /*FALLTHRU*/ + fallthrough; default: WARN_ON_ONCE(1); error = -EIO; @@ -1582,6 +1601,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, loff_t pos; int error; pfn_t pfn; + void *kaddr; /* * Check whether offset isn't beyond end of file now. Caller is @@ -1663,14 +1683,22 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, switch (iomap.type) { case IOMAP_MAPPED: +cow: error = dax_iomap_direct_access(&iomap, pos, PMD_SIZE, &pfn, - NULL); + &kaddr); if (error < 0) goto finish_iomap; entry = dax_insert_entry(&xas, mapping, vmf, iomap.addr, entry, pfn, DAX_PMD, write && !sync); + if (srcmap.type != IOMAP_HOLE) { + error = dax_copy_edges(pos, PMD_SIZE, &srcmap, kaddr, + true); + if (error) + goto unlock_entry; + } + /* * If we are doing synchronous page fault and inode needs fsync, * we can insert PMD into page tables only after that happens. @@ -1689,6 +1717,9 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, result = vmf_insert_pfn_pmd(vmf, pfn, write); break; case IOMAP_UNWRITTEN: + if (write && iomap.flags & IOMAP_F_SHARED) + goto cow; + fallthrough; case IOMAP_HOLE: if (WARN_ON_ONCE(write)) break; From patchwork Fri Aug 7 13:13:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ruan Shiyang X-Patchwork-Id: 11705635 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 29906618 for ; Fri, 7 Aug 2020 13:13:49 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1367F2224D for ; Fri, 7 Aug 2020 13:13:49 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1367F2224D Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=cn.fujitsu.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id E6FC512C2F608; Fri, 7 Aug 2020 06:13:48 -0700 (PDT) Received-SPF: None (mailfrom) identity=mailfrom; client-ip=183.91.158.132; helo=heian.cn.fujitsu.com; envelope-from=ruansy.fnst@cn.fujitsu.com; receiver= Received: from heian.cn.fujitsu.com (mail.cn.fujitsu.com [183.91.158.132]) by ml01.01.org (Postfix) with ESMTP id 7234F12C9E7C1 for ; Fri, 7 Aug 2020 06:13:45 -0700 (PDT) X-IronPort-AV: E=Sophos;i="5.75,445,1589212800"; d="scan'208";a="97774909" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 07 Aug 2020 21:13:42 +0800 Received: from G08CNEXMBPEKD04.g08.fujitsu.local (unknown [10.167.33.201]) by cn.fujitsu.com (Postfix) with ESMTP id 349484CE38A0; Fri, 7 Aug 2020 21:13:40 +0800 (CST) Received: from G08CNEXCHPEKD06.g08.fujitsu.local (10.167.33.205) by G08CNEXMBPEKD04.g08.fujitsu.local (10.167.33.201) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 7 Aug 2020 21:13:42 +0800 Received: from localhost.localdomain (10.167.225.141) by G08CNEXCHPEKD06.g08.fujitsu.local (10.167.33.209) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Fri, 7 Aug 2020 21:13:40 +0800 From: Shiyang Ruan To: , , Subject: [RFC PATCH 5/8] fsdax: replace mmap entry in case of CoW Date: Fri, 7 Aug 2020 21:13:33 +0800 Message-ID: <20200807131336.318774-6-ruansy.fnst@cn.fujitsu.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200807131336.318774-1-ruansy.fnst@cn.fujitsu.com> References: <20200807131336.318774-1-ruansy.fnst@cn.fujitsu.com> MIME-Version: 1.0 X-yoursite-MailScanner-ID: 349484CE38A0.AAB17 X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@cn.fujitsu.com X-Spam-Status: No Message-ID-Hash: EIIPZG34OYVCLNQOV7D6NLXQSZVMOHQS X-Message-ID-Hash: EIIPZG34OYVCLNQOV7D6NLXQSZVMOHQS X-MailFrom: ruansy.fnst@cn.fujitsu.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, darrick.wong@oracle.com, david@fromorbit.com, hch@lst.de, rgoldwyn@suse.de, qi.fuli@fujitsu.com, y-goto@fujitsu.com, Goldwyn Rodrigues X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: We replace the existing entry to the newly allocated one in case of CoW. Also, we mark the entry as PAGECACHE_TAG_TOWRITE so writeback marks this entry as writeprotected. This helps us snapshots so new write pagefaults after snapshots trigger a CoW. Signed-off-by: Goldwyn Rodrigues Signed-off-by: Shiyang Ruan --- fs/dax.c | 31 +++++++++++++++++++++++-------- 1 file changed, 23 insertions(+), 8 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 65553e3f7602..28b8e23b11ac 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -766,6 +766,9 @@ static int copy_user_dax(struct block_device *bdev, struct dax_device *dax_dev, return 0; } +#define DAX_IF_DIRTY (1ULL << 0) +#define DAX_IF_COW (1ULL << 1) + /* * By this point grab_mapping_entry() has ensured that we have a locked entry * of the appropriate size so we don't have to worry about downgrading PMDs to @@ -776,14 +779,16 @@ static int copy_user_dax(struct block_device *bdev, struct dax_device *dax_dev, static void *dax_insert_entry(struct xa_state *xas, struct address_space *mapping, struct vm_fault *vmf, pgoff_t bdoff, void *entry, pfn_t pfn, unsigned long flags, - bool dirty) + bool insert_flags) { void *new_entry = dax_make_entry(pfn, flags); + bool dirty = insert_flags & DAX_IF_DIRTY; + bool cow = insert_flags & DAX_IF_COW; if (dirty) __mark_inode_dirty(mapping->host, I_DIRTY_PAGES); - if (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE)) { + if (cow || (dax_is_zero_entry(entry) && !(flags & DAX_ZERO_PAGE))) { unsigned long index = xas->xa_index; /* we are replacing a zero page with block mapping */ if (dax_is_pmd_entry(entry)) @@ -795,7 +800,7 @@ static void *dax_insert_entry(struct xa_state *xas, xas_reset(xas); xas_lock_irq(xas); - if (dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) { + if (cow || dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) { void *old; dax_disassociate_entry(entry, mapping, false); @@ -819,6 +824,9 @@ static void *dax_insert_entry(struct xa_state *xas, if (dirty) xas_set_mark(xas, PAGECACHE_TAG_DIRTY); + if (cow) + xas_set_mark(xas, PAGECACHE_TAG_TOWRITE); + xas_unlock_irq(xas); return entry; } @@ -1366,6 +1374,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, void *entry; pfn_t pfn; void *kaddr; + unsigned long insert_flags = 0; trace_dax_pte_fault(inode, vmf, ret); /* @@ -1491,8 +1500,10 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, goto finish_iomap; case IOMAP_UNWRITTEN: - if (write && iomap.flags & IOMAP_F_SHARED) + if (write && (iomap.flags & IOMAP_F_SHARED)) { + insert_flags |= DAX_IF_COW; goto cow; + } fallthrough; case IOMAP_HOLE: if (!write) { @@ -1602,6 +1613,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, int error; pfn_t pfn; void *kaddr; + unsigned long insert_flags = 0; /* * Check whether offset isn't beyond end of file now. Caller is @@ -1717,14 +1729,17 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, result = vmf_insert_pfn_pmd(vmf, pfn, write); break; case IOMAP_UNWRITTEN: - if (write && iomap.flags & IOMAP_F_SHARED) + if (write && (iomap.flags & IOMAP_F_SHARED)) { + insert_flags |= DAX_IF_COW; goto cow; + } fallthrough; case IOMAP_HOLE: - if (WARN_ON_ONCE(write)) + if (!write) { + result = dax_pmd_load_hole(&xas, vmf, &iomap, &entry); break; - result = dax_pmd_load_hole(&xas, vmf, &iomap, &entry); - break; + } + fallthrough; default: WARN_ON_ONCE(1); break; From patchwork Fri Aug 7 13:13:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ruan Shiyang X-Patchwork-Id: 11705665 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EBF6C138C for ; Fri, 7 Aug 2020 13:13:59 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D5A4622D0B for ; Fri, 7 Aug 2020 13:13:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D5A4622D0B Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=cn.fujitsu.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id C0D0812C2F616; Fri, 7 Aug 2020 06:13:59 -0700 (PDT) Received-SPF: None (mailfrom) identity=mailfrom; client-ip=183.91.158.132; helo=heian.cn.fujitsu.com; envelope-from=ruansy.fnst@cn.fujitsu.com; receiver= Received: from heian.cn.fujitsu.com (mail.cn.fujitsu.com [183.91.158.132]) by ml01.01.org (Postfix) with ESMTP id 3C58712C2F600 for ; Fri, 7 Aug 2020 06:13:57 -0700 (PDT) X-IronPort-AV: E=Sophos;i="5.75,445,1589212800"; d="scan'208";a="97774935" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 07 Aug 2020 21:13:54 +0800 Received: from G08CNEXMBPEKD06.g08.fujitsu.local (unknown [10.167.33.206]) by cn.fujitsu.com (Postfix) with ESMTP id C0FA64CE34EF; Fri, 7 Aug 2020 21:13:51 +0800 (CST) Received: from G08CNEXJMPEKD02.g08.fujitsu.local (10.167.33.202) by G08CNEXMBPEKD06.g08.fujitsu.local (10.167.33.206) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 7 Aug 2020 21:13:42 +0800 Received: from G08CNEXCHPEKD06.g08.fujitsu.local (10.167.33.205) by G08CNEXJMPEKD02.g08.fujitsu.local (10.167.33.202) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 7 Aug 2020 21:13:43 +0800 Received: from localhost.localdomain (10.167.225.141) by G08CNEXCHPEKD06.g08.fujitsu.local (10.167.33.209) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Fri, 7 Aug 2020 21:13:41 +0800 From: Shiyang Ruan To: , , Subject: [RFC PATCH 6/8] fsdax: dedup file range to use a compare function Date: Fri, 7 Aug 2020 21:13:34 +0800 Message-ID: <20200807131336.318774-7-ruansy.fnst@cn.fujitsu.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200807131336.318774-1-ruansy.fnst@cn.fujitsu.com> References: <20200807131336.318774-1-ruansy.fnst@cn.fujitsu.com> MIME-Version: 1.0 X-yoursite-MailScanner-ID: C0FA64CE34EF.AE9E3 X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@cn.fujitsu.com X-Spam-Status: No Message-ID-Hash: P5ZOWOX2WG4NXA4EFYH3RTSVAOCDP5GJ X-Message-ID-Hash: P5ZOWOX2WG4NXA4EFYH3RTSVAOCDP5GJ X-MailFrom: ruansy.fnst@cn.fujitsu.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, darrick.wong@oracle.com, david@fromorbit.com, hch@lst.de, rgoldwyn@suse.de, qi.fuli@fujitsu.com, y-goto@fujitsu.com, Goldwyn Rodrigues X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: With dax we cannot deal with readpage() etc. So, we create a funciton callback to perform the file data comparison and pass it to generic_remap_file_range_prep() so it can use iomap-based functions. This may not be the best way to solve this. Suggestions welcome. Signed-off-by: Goldwyn Rodrigues Signed-off-by: Shiyang Ruan --- fs/btrfs/reflink.c | 3 +- fs/dax.c | 67 ++++++++++++++++++++++++++++++++++++++++++++ fs/ocfs2/file.c | 2 +- fs/read_write.c | 11 ++++---- fs/xfs/xfs_reflink.c | 2 +- include/linux/dax.h | 5 ++++ include/linux/fs.h | 9 +++++- 7 files changed, 90 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c index 040009d1cc31..d0412bc338da 100644 --- a/fs/btrfs/reflink.c +++ b/fs/btrfs/reflink.c @@ -765,7 +765,8 @@ static int btrfs_remap_file_range_prep(struct file *file_in, loff_t pos_in, return ret; return generic_remap_file_range_prep(file_in, pos_in, file_out, pos_out, - len, remap_flags); + len, remap_flags, + vfs_dedupe_file_range_compare); } loff_t btrfs_remap_file_range(struct file *src_file, loff_t off, diff --git a/fs/dax.c b/fs/dax.c index 28b8e23b11ac..80a9946fd25a 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -30,6 +30,8 @@ #define CREATE_TRACE_POINTS #include +#define MIN(a, b) (((a) < (b)) ? (a) : (b)) + static inline unsigned int pe_order(enum page_entry_size pe_size) { if (pe_size == PE_SIZE_PTE) @@ -1874,3 +1876,68 @@ vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf, return dax_insert_pfn_mkwrite(vmf, pfn, order); } EXPORT_SYMBOL_GPL(dax_finish_sync_fault); + +int dax_file_range_compare(struct inode *src, loff_t srcoff, struct inode *dest, + loff_t destoff, loff_t len, bool *is_same, + const struct iomap_ops *ops) +{ + void *saddr, *daddr; + struct iomap smap = { 0 }; + struct iomap dmap = { 0 }; + bool same = false; + loff_t cmp_len; + int id, ret = 0; + + id = dax_read_lock(); + while (len) { + ret = ops->iomap_begin(src, srcoff, len, 0, &smap, NULL); + if (ret < 0) + goto out_src; + cmp_len = MIN(len, smap.offset + smap.length - srcoff); + + ret = ops->iomap_begin(dest, destoff, cmp_len, 0, &dmap, NULL); + if (ret < 0) + goto out_dest; + cmp_len = MIN(cmp_len, dmap.offset + dmap.length - destoff); + + if (smap.type == IOMAP_HOLE && dmap.type == IOMAP_HOLE) + goto next; + + if (smap.type == IOMAP_HOLE || dmap.type == IOMAP_HOLE) { + same = false; + goto next; + } + + ret = dax_iomap_direct_access(&smap, srcoff, + ALIGN(srcoff + cmp_len, PAGE_SIZE), NULL, &saddr); + if (ret < 0) + goto out_dest; + + ret = dax_iomap_direct_access(&dmap, destoff, + ALIGN(destoff + cmp_len, PAGE_SIZE), NULL, &daddr); + if (ret < 0) + goto out_dest; + + same = !memcmp(saddr, daddr, cmp_len); + if (!same) + break; +next: + len -= cmp_len; + srcoff += cmp_len; + destoff += cmp_len; +out_dest: + if (ops->iomap_end) + ops->iomap_end(dest, destoff, cmp_len, 0, 0, &dmap); +out_src: + if (ops->iomap_end) + ops->iomap_end(src, srcoff, len, 0, 0, &smap); + + if (ret < 0) + goto out; + } + *is_same = same; +out: + dax_read_unlock(id); + return ret; +} +EXPORT_SYMBOL_GPL(dax_file_range_compare); diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c index 85979e2214b3..9d101f129d16 100644 --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -2591,7 +2591,7 @@ static loff_t ocfs2_remap_file_range(struct file *file_in, loff_t pos_in, goto out_unlock; ret = generic_remap_file_range_prep(file_in, pos_in, file_out, pos_out, - &len, remap_flags); + &len, remap_flags, vfs_dedupe_file_range_compare); if (ret < 0 || len == 0) goto out_unlock; diff --git a/fs/read_write.c b/fs/read_write.c index 4fb797822567..2974a624f232 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -1927,7 +1927,7 @@ static void vfs_unlock_two_pages(struct page *page1, struct page *page2) * Compare extents of two files to see if they are the same. * Caller must have locked both inodes to prevent write races. */ -static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff, +int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff, struct inode *dest, loff_t destoff, loff_t len, bool *is_same) { @@ -2008,6 +2008,7 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff, out_error: return error; } +EXPORT_SYMBOL_GPL(vfs_dedupe_file_range_compare); /* * Check that the two inodes are eligible for cloning, the ranges make @@ -2019,7 +2020,8 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff, */ int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, - loff_t *len, unsigned int remap_flags) + loff_t *len, unsigned int remap_flags, + compare_range_t compare) { struct inode *inode_in = file_inode(file_in); struct inode *inode_out = file_inode(file_out); @@ -2078,9 +2080,8 @@ int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, */ if (remap_flags & REMAP_FILE_DEDUP) { bool is_same = false; - - ret = vfs_dedupe_file_range_compare(inode_in, pos_in, - inode_out, pos_out, *len, &is_same); + ret = (*compare)(inode_in, pos_in, + inode_out, pos_out, *len, &is_same); if (ret) return ret; if (!is_same) diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 107bf2a2f344..792217cd1e64 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1338,7 +1338,7 @@ xfs_reflink_remap_prep( goto out_unlock; ret = generic_remap_file_range_prep(file_in, pos_in, file_out, pos_out, - len, remap_flags); + len, remap_flags, vfs_dedupe_file_range_compare); if (ret < 0 || *len == 0) goto out_unlock; diff --git a/include/linux/dax.h b/include/linux/dax.h index d6d9f4b721bc..607e21933928 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -223,6 +223,11 @@ int dax_invalidate_mapping_entry_sync(struct address_space *mapping, pgoff_t index); int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size, struct iomap *iomap); +int dax_file_range_compare(struct inode *src, loff_t srcoff, + struct inode *dest, loff_t destoff, + loff_t len, bool *is_same, + const struct iomap_ops *ops); + static inline bool dax_mapping(struct address_space *mapping) { return mapping->host && IS_DAX(mapping->host); diff --git a/include/linux/fs.h b/include/linux/fs.h index 81de3d2739b9..c2d985295db5 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1924,13 +1924,20 @@ extern ssize_t vfs_readv(struct file *, const struct iovec __user *, unsigned long, loff_t *, rwf_t); extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *, loff_t, size_t, unsigned int); +extern int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff, + struct inode *dest, loff_t destoff, + loff_t len, bool *is_same); +typedef int (*compare_range_t)(struct inode *src, loff_t srcpos, + struct inode *dest, loff_t destpos, + loff_t len, bool *is_same); extern ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, size_t len, unsigned int flags); extern int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, loff_t *count, - unsigned int remap_flags); + unsigned int remap_flags, + compare_range_t cmp); extern loff_t do_clone_file_range(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, loff_t len, unsigned int remap_flags); From patchwork Fri Aug 7 13:13:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ruan Shiyang X-Patchwork-Id: 11705643 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7302E138C for ; Fri, 7 Aug 2020 13:13:52 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5C57B221E2 for ; Fri, 7 Aug 2020 13:13:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5C57B221E2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=cn.fujitsu.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 4388012C2F614; Fri, 7 Aug 2020 06:13:50 -0700 (PDT) Received-SPF: None (mailfrom) identity=mailfrom; client-ip=183.91.158.132; helo=heian.cn.fujitsu.com; envelope-from=ruansy.fnst@cn.fujitsu.com; receiver= Received: from heian.cn.fujitsu.com (mail.cn.fujitsu.com [183.91.158.132]) by ml01.01.org (Postfix) with ESMTP id BF4EF12C2F603 for ; Fri, 7 Aug 2020 06:13:48 -0700 (PDT) X-IronPort-AV: E=Sophos;i="5.75,445,1589212800"; d="scan'208";a="97774922" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 07 Aug 2020 21:13:48 +0800 Received: from G08CNEXMBPEKD05.g08.fujitsu.local (unknown [10.167.33.204]) by cn.fujitsu.com (Postfix) with ESMTP id 721444CE34F2; Fri, 7 Aug 2020 21:13:44 +0800 (CST) Received: from G08CNEXCHPEKD06.g08.fujitsu.local (10.167.33.205) by G08CNEXMBPEKD05.g08.fujitsu.local (10.167.33.204) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 7 Aug 2020 21:13:43 +0800 Received: from localhost.localdomain (10.167.225.141) by G08CNEXCHPEKD06.g08.fujitsu.local (10.167.33.209) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Fri, 7 Aug 2020 21:13:41 +0800 From: Shiyang Ruan To: , , Subject: [RFC PATCH 7/8] fs/xfs: handle CoW for fsdax write() path Date: Fri, 7 Aug 2020 21:13:35 +0800 Message-ID: <20200807131336.318774-8-ruansy.fnst@cn.fujitsu.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200807131336.318774-1-ruansy.fnst@cn.fujitsu.com> References: <20200807131336.318774-1-ruansy.fnst@cn.fujitsu.com> MIME-Version: 1.0 X-yoursite-MailScanner-ID: 721444CE34F2.AB269 X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@cn.fujitsu.com X-Spam-Status: No Message-ID-Hash: FH27BGZEVZP5MW5O575VNHFXRYN2EQQH X-Message-ID-Hash: FH27BGZEVZP5MW5O575VNHFXRYN2EQQH X-MailFrom: ruansy.fnst@cn.fujitsu.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, darrick.wong@oracle.com, david@fromorbit.com, hch@lst.de, rgoldwyn@suse.de, qi.fuli@fujitsu.com, y-goto@fujitsu.com X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: In fsdax mode, WRITE and ZERO on a shared extent need CoW mechanism performed. After CoW, new extents needs to be remapped to the file. Signed-off-by: Shiyang Ruan --- fs/xfs/xfs_bmap_util.c | 6 +++++- fs/xfs/xfs_file.c | 10 +++++++--- fs/xfs/xfs_iomap.c | 3 ++- fs/xfs/xfs_iops.c | 11 ++++++++--- fs/xfs/xfs_reflink.c | 2 ++ 5 files changed, 24 insertions(+), 8 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index f37f5cc4b19f..5d09d6c454b6 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -969,10 +969,14 @@ xfs_free_file_space( if (offset + len > XFS_ISIZE(ip)) len = XFS_ISIZE(ip) - offset; error = iomap_zero_range(VFS_I(ip), offset, len, NULL, - &xfs_buffered_write_iomap_ops); + IS_DAX(VFS_I(ip)) ? + &xfs_direct_write_iomap_ops : &xfs_buffered_write_iomap_ops); if (error) return error; + if (xfs_is_reflink_inode(ip)) + xfs_reflink_end_cow(ip, offset, len); + /* * If we zeroed right up to EOF and EOF straddles a page boundary we * must make sure that the post-EOF area is also zeroed because the diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 00db81eac80d..45041913129b 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -588,9 +588,13 @@ xfs_file_dax_write( trace_xfs_file_dax_write(ip, count, pos); ret = dax_iomap_rw(iocb, from, &xfs_direct_write_iomap_ops); - if (ret > 0 && iocb->ki_pos > i_size_read(inode)) { - i_size_write(inode, iocb->ki_pos); - error = xfs_setfilesize(ip, pos, ret); + if (ret > 0) { + if (iocb->ki_pos > i_size_read(inode)) { + i_size_write(inode, iocb->ki_pos); + error = xfs_setfilesize(ip, pos, ret); + } + if (xfs_is_cow_inode(ip)) + xfs_reflink_end_cow(ip, pos, ret); } out: xfs_iunlock(ip, iolock); diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index b9a8c3798e08..a1fc75f11cf9 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -748,13 +748,14 @@ xfs_direct_write_iomap_begin( goto out_unlock; if (imap_needs_cow(ip, flags, &imap, nimaps)) { + bool need_convert = flags & IOMAP_DIRECT || IS_DAX(inode); error = -EAGAIN; if (flags & IOMAP_NOWAIT) goto out_unlock; /* may drop and re-acquire the ilock */ error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared, - &lockmode, flags & IOMAP_DIRECT); + &lockmode, need_convert); if (error) goto out_unlock; if (shared) diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index 80a13c8561d8..6dd6a973ea75 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -860,6 +860,7 @@ xfs_setattr_size( int error; uint lock_flags = 0; bool did_zeroing = false; + const struct iomap_ops *ops; ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL)); ASSERT(xfs_isilocked(ip, XFS_MMAPLOCK_EXCL)); @@ -906,13 +907,17 @@ xfs_setattr_size( * extension, or zeroing out the rest of the block on a downward * truncate. */ + if (IS_DAX(inode)) + ops = &xfs_direct_write_iomap_ops; + else + ops = &xfs_buffered_write_iomap_ops; + if (newsize > oldsize) { trace_xfs_zero_eof(ip, oldsize, newsize - oldsize); error = iomap_zero_range(inode, oldsize, newsize - oldsize, - &did_zeroing, &xfs_buffered_write_iomap_ops); + &did_zeroing, ops); } else { - error = iomap_truncate_page(inode, newsize, &did_zeroing, - &xfs_buffered_write_iomap_ops); + error = iomap_truncate_page(inode, newsize, &did_zeroing, ops); } if (error) diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 792217cd1e64..f87ab78dd421 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1269,6 +1269,8 @@ xfs_reflink_zero_posteof( trace_xfs_zero_eof(ip, isize, pos - isize); return iomap_zero_range(VFS_I(ip), isize, pos - isize, NULL, + IS_DAX(VFS_I(ip)) ? + &xfs_direct_write_iomap_ops : &xfs_buffered_write_iomap_ops); } From patchwork Fri Aug 7 13:13:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ruan Shiyang X-Patchwork-Id: 11705651 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A5EF6175A for ; Fri, 7 Aug 2020 13:13:54 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8F1EC22D00 for ; Fri, 7 Aug 2020 13:13:54 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8F1EC22D00 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=cn.fujitsu.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 78B3612C2F604; Fri, 7 Aug 2020 06:13:54 -0700 (PDT) Received-SPF: None (mailfrom) identity=mailfrom; client-ip=183.91.158.132; helo=heian.cn.fujitsu.com; envelope-from=ruansy.fnst@cn.fujitsu.com; receiver= Received: from heian.cn.fujitsu.com (mail.cn.fujitsu.com [183.91.158.132]) by ml01.01.org (Postfix) with ESMTP id CCF9512C2F604 for ; Fri, 7 Aug 2020 06:13:48 -0700 (PDT) X-IronPort-AV: E=Sophos;i="5.75,445,1589212800"; d="scan'208";a="97774923" Received: from unknown (HELO cn.fujitsu.com) ([10.167.33.5]) by heian.cn.fujitsu.com with ESMTP; 07 Aug 2020 21:13:48 +0800 Received: from G08CNEXMBPEKD05.g08.fujitsu.local (unknown [10.167.33.204]) by cn.fujitsu.com (Postfix) with ESMTP id 7467E4CE34F3; Fri, 7 Aug 2020 21:13:44 +0800 (CST) Received: from G08CNEXJMPEKD02.g08.fujitsu.local (10.167.33.202) by G08CNEXMBPEKD05.g08.fujitsu.local (10.167.33.204) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 7 Aug 2020 21:13:44 +0800 Received: from G08CNEXCHPEKD06.g08.fujitsu.local (10.167.33.205) by G08CNEXJMPEKD02.g08.fujitsu.local (10.167.33.202) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 7 Aug 2020 21:13:44 +0800 Received: from localhost.localdomain (10.167.225.141) by G08CNEXCHPEKD06.g08.fujitsu.local (10.167.33.209) with Microsoft SMTP Server id 15.0.1497.2 via Frontend Transport; Fri, 7 Aug 2020 21:13:42 +0800 From: Shiyang Ruan To: , , Subject: [RFC PATCH 8/8] fs/xfs: support dedupe for fsdax Date: Fri, 7 Aug 2020 21:13:36 +0800 Message-ID: <20200807131336.318774-9-ruansy.fnst@cn.fujitsu.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20200807131336.318774-1-ruansy.fnst@cn.fujitsu.com> References: <20200807131336.318774-1-ruansy.fnst@cn.fujitsu.com> MIME-Version: 1.0 X-yoursite-MailScanner-ID: 7467E4CE34F3.AD342 X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: ruansy.fnst@cn.fujitsu.com X-Spam-Status: No Message-ID-Hash: 6AT3TMVQITLG3YQCFLXYWWWEBOT5FEHE X-Message-ID-Hash: 6AT3TMVQITLG3YQCFLXYWWWEBOT5FEHE X-MailFrom: ruansy.fnst@cn.fujitsu.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, darrick.wong@oracle.com, david@fromorbit.com, hch@lst.de, rgoldwyn@suse.de, qi.fuli@fujitsu.com, y-goto@fujitsu.com X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: Use xfs_break_layouts() to break files' layouts when locking them. And call dax_file_range_compare() function to compare range for files both have DAX flag. Signed-off-by: Shiyang Ruan --- fs/xfs/xfs_reflink.c | 78 ++++++++++++++++++++++++++------------------ 1 file changed, 46 insertions(+), 32 deletions(-) diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index f87ab78dd421..b2901ad1a269 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -29,6 +29,7 @@ #include "xfs_iomap.h" #include "xfs_sb.h" #include "xfs_ag_resv.h" +#include /* * Copy on Write of Shared Blocks @@ -1185,47 +1186,41 @@ xfs_reflink_remap_blocks( * back out both locks. */ static int -xfs_iolock_two_inodes_and_break_layout( - struct inode *src, - struct inode *dest) +xfs_reflink_remap_lock_and_break_layouts( + struct file *file_in, + struct file *file_out) { int error; + struct inode *inode_in = file_inode(file_in); + struct xfs_inode *src = XFS_I(inode_in); + struct inode *inode_out = file_inode(file_out); + struct xfs_inode *dest = XFS_I(inode_out); + uint iolock = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL; - if (src > dest) + if (inode_in > inode_out) { + swap(inode_in, inode_out); swap(src, dest); - -retry: - /* Wait to break both inodes' layouts before we start locking. */ - error = break_layout(src, true); - if (error) - return error; - if (src != dest) { - error = break_layout(dest, true); - if (error) - return error; } - /* Lock one inode and make sure nobody got in and leased it. */ - inode_lock(src); - error = break_layout(src, false); + inode_lock(inode_in); + xfs_ilock(src, XFS_MMAPLOCK_EXCL); + error = xfs_break_layouts(inode_in, &iolock, BREAK_UNMAP); + xfs_iunlock(src, XFS_MMAPLOCK_EXCL); if (error) { - inode_unlock(src); - if (error == -EWOULDBLOCK) - goto retry; + inode_unlock(inode_in); return error; } - if (src == dest) + if (inode_in == inode_out) return 0; - /* Lock the other inode and make sure nobody got in and leased it. */ - inode_lock_nested(dest, I_MUTEX_NONDIR2); - error = break_layout(dest, false); + inode_lock_nested(inode_out, I_MUTEX_NONDIR2); + xfs_ilock(dest, XFS_MMAPLOCK_EXCL); + error = xfs_break_layouts(inode_out, &iolock, BREAK_UNMAP); + xfs_iunlock(dest, XFS_MMAPLOCK_EXCL); if (error) { - inode_unlock(src); - inode_unlock(dest); - if (error == -EWOULDBLOCK) - goto retry; + inode_unlock(inode_in); + inode_unlock(inode_out); return error; } @@ -1244,6 +1239,11 @@ xfs_reflink_remap_unlock( struct xfs_inode *dest = XFS_I(inode_out); bool same_inode = (inode_in == inode_out); + if (inode_in > inode_out) { + swap(inode_in, inode_out); + swap(src, dest); + } + xfs_iunlock(dest, XFS_MMAPLOCK_EXCL); if (!same_inode) xfs_iunlock(src, XFS_MMAPLOCK_EXCL); @@ -1274,6 +1274,14 @@ xfs_reflink_zero_posteof( &xfs_buffered_write_iomap_ops); } +int xfs_reflink_dedupe_file_range_compare(struct inode *src, loff_t srcoff, + struct inode *dest, loff_t destoff, + loff_t len, bool *is_same) +{ + return dax_file_range_compare(src, srcoff, dest, destoff, len, is_same, + &xfs_read_iomap_ops); +} + /* * Prepare two files for range cloning. Upon a successful return both inodes * will have the iolock and mmaplock held, the page cache of the out file will @@ -1318,9 +1326,10 @@ xfs_reflink_remap_prep( struct xfs_inode *dest = XFS_I(inode_out); bool same_inode = (inode_in == inode_out); ssize_t ret; + compare_range_t cmp; /* Lock both files against IO */ - ret = xfs_iolock_two_inodes_and_break_layout(inode_in, inode_out); + ret = xfs_reflink_remap_lock_and_break_layouts(file_in, file_out); if (ret) return ret; if (same_inode) @@ -1335,12 +1344,17 @@ xfs_reflink_remap_prep( if (XFS_IS_REALTIME_INODE(src) || XFS_IS_REALTIME_INODE(dest)) goto out_unlock; - /* Don't share DAX file data for now. */ - if (IS_DAX(inode_in) || IS_DAX(inode_out)) + /* Don't share DAX file data with non-DAX file. */ + if (IS_DAX(inode_in) != IS_DAX(inode_out)) goto out_unlock; + if (IS_DAX(inode_in)) + cmp = xfs_reflink_dedupe_file_range_compare; + else + cmp = vfs_dedupe_file_range_compare; + ret = generic_remap_file_range_prep(file_in, pos_in, file_out, pos_out, - len, remap_flags, vfs_dedupe_file_range_compare); + len, remap_flags, cmp); if (ret < 0 || *len == 0) goto out_unlock;