From patchwork Tue Jun 15 09:17:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12320805 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3D276C48BE8 for ; Tue, 15 Jun 2021 09:18:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2081361434 for ; Tue, 15 Jun 2021 09:18:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231425AbhFOJU3 (ORCPT ); Tue, 15 Jun 2021 05:20:29 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:54442 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231243AbhFOJUV (ORCPT ); Tue, 15 Jun 2021 05:20:21 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 0C714219C4; Tue, 15 Jun 2021 09:18:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=t7igIsjuZi1JvEYkGCAYMapFEOcoZqyGHFLdZJ9amoQ=; b=wuSltVUtcbffkh57rA4M0xkAPc6hBBIQD7rqUxZeTvpXLa8DfBYKay4vZ+gDcd3w7PwCKR G/MM4NkcZvJ/MYlPrt5Nm2dQKwMTSwXaAzopBzG5WnNzjliPrwMbKXHhwnlAPSukU0wmnr DKjtXbcxh5pzihUaz7dlRoBHaHziGMk= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=t7igIsjuZi1JvEYkGCAYMapFEOcoZqyGHFLdZJ9amoQ=; b=8BDoEOuUztARLzWeNKP7WRNHA+Jl7QZ0bqw8IEeBILnXjSHr9ivpspNG4F4cO+sksJYTxJ BJTqNMjxEf8y9EAw== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id 67129A3B97; Tue, 15 Jun 2021 09:18:14 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 2424B1E125C; Tue, 15 Jun 2021 11:18:14 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Jan Kara , Christoph Hellwig , "Darrick J . Wong" , Hugh Dickins Subject: [PATCH 01/14] mm: Fix comments mentioning i_mutex Date: Tue, 15 Jun 2021 11:17:51 +0200 Message-Id: <20210615091814.28626-1-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210615090844.6045-1-jack@suse.cz> References: <20210615090844.6045-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=10823; h=from:subject; bh=mivglnSwswYFPkeVyv9F6vdXnrQ0F6cyovWhp1oQWu8=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgyHBATgss+CqoCMoRtzmZ1tQiJoTcQ5Qb6Hbe2WI2 hbYp4XqJATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYMhwQAAKCRCcnaoHP2RA2RMQB/ 9ol+MZ7HafDRq9/Fi7puPeZS2XRWjM8ZlEsZD22vXPb5t2ZfOIq4Mj+KdA7cjbPLzpBX4ILB6gc3Yd s0VCIzEq0uzDTJqXGYt+vnb2gRVgGoQNeJfDCnzhj3dmYGEflu0EP8jUtSLkntUtf2u/S+AFmO5vrT /ZeJv6v230aAAhaV6DEI4O8q2WakTECOPXhd+YbV1Ev9aJdFf6fChcn99uURiGXM9CMZeCEmfOTQok 2po5GDacNVP3Vwi5IuA9hSu3BD5ll6+d1XSxzQHnA4mvI44yln+6hVfPdKPtiHsU822MHs27ihOKQ8 iQVSm3NZd9XDF9MKCazcMe2oCixY5T X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org inode->i_mutex has been replaced with inode->i_rwsem long ago. Fix comments still mentioning i_mutex. Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Acked-by: Hugh Dickins Signed-off-by: Jan Kara --- mm/filemap.c | 10 +++++----- mm/madvise.c | 2 +- mm/memory-failure.c | 2 +- mm/rmap.c | 6 +++--- mm/shmem.c | 20 ++++++++++---------- mm/truncate.c | 8 ++++---- 6 files changed, 24 insertions(+), 24 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index 66f7e9fdfbc4..ba1068a1837f 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -76,7 +76,7 @@ * ->swap_lock (exclusive_swap_page, others) * ->i_pages lock * - * ->i_mutex + * ->i_rwsem * ->i_mmap_rwsem (truncate->unmap_mapping_range) * * ->mmap_lock @@ -87,7 +87,7 @@ * ->mmap_lock * ->lock_page (access_process_vm) * - * ->i_mutex (generic_perform_write) + * ->i_rwsem (generic_perform_write) * ->mmap_lock (fault_in_pages_readable->do_page_fault) * * bdi->wb.list_lock @@ -3710,12 +3710,12 @@ EXPORT_SYMBOL(generic_perform_write); * modification times and calls proper subroutines depending on whether we * do direct IO or a standard buffered write. * - * It expects i_mutex to be grabbed unless we work on a block device or similar + * It expects i_rwsem to be grabbed unless we work on a block device or similar * object which does not need locking at all. * * This function does *not* take care of syncing data in case of O_SYNC write. * A caller has to handle it. This is mainly due to the fact that we want to - * avoid syncing under i_mutex. + * avoid syncing under i_rwsem. * * Return: * * number of bytes written, even for truncated writes @@ -3803,7 +3803,7 @@ EXPORT_SYMBOL(__generic_file_write_iter); * * This is a wrapper around __generic_file_write_iter() to be used by most * filesystems. It takes care of syncing the file in case of O_SYNC file - * and acquires i_mutex as needed. + * and acquires i_rwsem as needed. * Return: * * negative error code if no data has been written at all of * vfs_fsync_range() failed for a synchronous write diff --git a/mm/madvise.c b/mm/madvise.c index 63e489e5bfdb..a0137706b92a 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -853,7 +853,7 @@ static long madvise_remove(struct vm_area_struct *vma, + ((loff_t)vma->vm_pgoff << PAGE_SHIFT); /* - * Filesystem's fallocate may need to take i_mutex. We need to + * Filesystem's fallocate may need to take i_rwsem. We need to * explicitly grab a reference because the vma (and hence the * vma's reference to the file) can go away as soon as we drop * mmap_lock. diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 85ad98c00fd9..9dcc9bcea731 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -704,7 +704,7 @@ static int me_pagecache_clean(struct page *p, unsigned long pfn) /* * Truncation is a bit tricky. Enable it per file system for now. * - * Open: to take i_mutex or not for this? Right now we don't. + * Open: to take i_rwsem or not for this? Right now we don't. */ return truncate_error_page(p, pfn, mapping); } diff --git a/mm/rmap.c b/mm/rmap.c index 693a610e181d..a35cbbbded0d 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -20,9 +20,9 @@ /* * Lock ordering in mm: * - * inode->i_mutex (while writing or truncating, not reading or faulting) + * inode->i_rwsem (while writing or truncating, not reading or faulting) * mm->mmap_lock - * page->flags PG_locked (lock_page) * (see huegtlbfs below) + * page->flags PG_locked (lock_page) * (see hugetlbfs below) * hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share) * mapping->i_mmap_rwsem * hugetlb_fault_mutex (hugetlbfs specific page fault mutex) @@ -41,7 +41,7 @@ * in arch-dependent flush_dcache_mmap_lock, * within bdi.wb->list_lock in __sync_single_inode) * - * anon_vma->rwsem,mapping->i_mutex (memory_failure, collect_procs_anon) + * anon_vma->rwsem,mapping->i_mmap_rwsem (memory_failure, collect_procs_anon) * ->tasklist_lock * pte map lock * diff --git a/mm/shmem.c b/mm/shmem.c index 5d46611cba8d..0a26c64f6a2e 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -96,7 +96,7 @@ static struct vfsmount *shm_mnt; /* * shmem_fallocate communicates with shmem_fault or shmem_writepage via - * inode->i_private (with i_mutex making sure that it has only one user at + * inode->i_private (with i_rwsem making sure that it has only one user at * a time): we would prefer not to enlarge the shmem inode just for that. */ struct shmem_falloc { @@ -774,7 +774,7 @@ static int shmem_free_swap(struct address_space *mapping, * Determine (in bytes) how many of the shmem object's pages mapped by the * given offsets are swapped out. * - * This is safe to call without i_mutex or the i_pages lock thanks to RCU, + * This is safe to call without i_rwsem or the i_pages lock thanks to RCU, * as long as the inode doesn't go away and racy results are not a problem. */ unsigned long shmem_partial_swap_usage(struct address_space *mapping, @@ -806,7 +806,7 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping, * Determine (in bytes) how many of the shmem object's pages mapped by the * given vma is swapped out. * - * This is safe to call without i_mutex or the i_pages lock thanks to RCU, + * This is safe to call without i_rwsem or the i_pages lock thanks to RCU, * as long as the inode doesn't go away and racy results are not a problem. */ unsigned long shmem_swap_usage(struct vm_area_struct *vma) @@ -1069,7 +1069,7 @@ static int shmem_setattr(struct user_namespace *mnt_userns, loff_t oldsize = inode->i_size; loff_t newsize = attr->ia_size; - /* protected by i_mutex */ + /* protected by i_rwsem */ if ((newsize < oldsize && (info->seals & F_SEAL_SHRINK)) || (newsize > oldsize && (info->seals & F_SEAL_GROW))) return -EPERM; @@ -2049,7 +2049,7 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf) /* * Trinity finds that probing a hole which tmpfs is punching can * prevent the hole-punch from ever completing: which in turn - * locks writers out with its hold on i_mutex. So refrain from + * locks writers out with its hold on i_rwsem. So refrain from * faulting pages into the hole while it's being punched. Although * shmem_undo_range() does remove the additions, it may be unable to * keep up, as each new page needs its own unmap_mapping_range() call, @@ -2060,7 +2060,7 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf) * we just need to make racing faults a rare case. * * The implementation below would be much simpler if we just used a - * standard mutex or completion: but we cannot take i_mutex in fault, + * standard mutex or completion: but we cannot take i_rwsem in fault, * and bloating every shmem inode for this unlikely case would be sad. */ if (unlikely(inode->i_private)) { @@ -2514,7 +2514,7 @@ shmem_write_begin(struct file *file, struct address_space *mapping, struct shmem_inode_info *info = SHMEM_I(inode); pgoff_t index = pos >> PAGE_SHIFT; - /* i_mutex is held by caller */ + /* i_rwsem is held by caller */ if (unlikely(info->seals & (F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_FUTURE_WRITE))) { if (info->seals & (F_SEAL_WRITE | F_SEAL_FUTURE_WRITE)) @@ -2614,7 +2614,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) /* * We must evaluate after, since reads (unlike writes) - * are called without i_mutex protection against truncate + * are called without i_rwsem protection against truncate */ nr = PAGE_SIZE; i_size = i_size_read(inode); @@ -2684,7 +2684,7 @@ static loff_t shmem_file_llseek(struct file *file, loff_t offset, int whence) return -ENXIO; inode_lock(inode); - /* We're holding i_mutex so we can access i_size directly */ + /* We're holding i_rwsem so we can access i_size directly */ offset = mapping_seek_hole_data(mapping, offset, inode->i_size, whence); if (offset >= 0) offset = vfs_setpos(file, offset, MAX_LFS_FILESIZE); @@ -2713,7 +2713,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset, loff_t unmap_end = round_down(offset + len, PAGE_SIZE) - 1; DECLARE_WAIT_QUEUE_HEAD_ONSTACK(shmem_falloc_waitq); - /* protected by i_mutex */ + /* protected by i_rwsem */ if (info->seals & (F_SEAL_WRITE | F_SEAL_FUTURE_WRITE)) { error = -EPERM; goto out; diff --git a/mm/truncate.c b/mm/truncate.c index 95af244b112a..57a618c4a0d6 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -415,7 +415,7 @@ EXPORT_SYMBOL(truncate_inode_pages_range); * @mapping: mapping to truncate * @lstart: offset from which to truncate * - * Called under (and serialised by) inode->i_mutex. + * Called under (and serialised by) inode->i_rwsem. * * Note: When this function returns, there can be a page in the process of * deletion (inside __delete_from_page_cache()) in the specified range. Thus @@ -432,7 +432,7 @@ EXPORT_SYMBOL(truncate_inode_pages); * truncate_inode_pages_final - truncate *all* pages before inode dies * @mapping: mapping to truncate * - * Called under (and serialized by) inode->i_mutex. + * Called under (and serialized by) inode->i_rwsem. * * Filesystems have to use this in the .evict_inode path to inform the * VM that this is the final truncate and the inode is going away. @@ -753,7 +753,7 @@ EXPORT_SYMBOL(truncate_pagecache); * setattr function when ATTR_SIZE is passed in. * * Must be called with a lock serializing truncates and writes (generally - * i_mutex but e.g. xfs uses a different lock) and before all filesystem + * i_rwsem but e.g. xfs uses a different lock) and before all filesystem * specific block truncation has been performed. */ void truncate_setsize(struct inode *inode, loff_t newsize) @@ -782,7 +782,7 @@ EXPORT_SYMBOL(truncate_setsize); * * The function must be called after i_size is updated so that page fault * coming after we unlock the page will already see the new i_size. - * The function must be called while we still hold i_mutex - this not only + * The function must be called while we still hold i_rwsem - this not only * makes sure i_size is stable but also that userspace cannot observe new * i_size value before we are prepared to store mmap writes at new inode size. */ From patchwork Tue Jun 15 09:17:52 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12320811 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 21953C4CC67 for ; Tue, 15 Jun 2021 09:18:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 03FAB61434 for ; Tue, 15 Jun 2021 09:18:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231352AbhFOJUV (ORCPT ); Tue, 15 Jun 2021 05:20:21 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]:46210 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231187AbhFOJUU (ORCPT ); Tue, 15 Jun 2021 05:20:20 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 0685E1FD2A; Tue, 15 Jun 2021 09:18:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=y1jAxj7HpC0zj00ojIU7+2/5URp27cwg5ekGhnTUbK4=; b=kCTrnnv24k3P/naCGHeylkCPKtqvzZxc8HE4GZmrgKCn4fBuRnMlMLYLxDGCigyj6jIFYN zdaypn8Hx68KoUHM2O7mDOg+f9zqD4tAa6Ci2/E5hofojZ6q9OQaAlJlLPr+p2/F0L6T2C jJJo4WJYSCVMwgQ2lbb0V4OfZdjxT3k= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=y1jAxj7HpC0zj00ojIU7+2/5URp27cwg5ekGhnTUbK4=; b=hRwu7qMP9TmzHT2FplW6HkPp9wFYpk2/9ffOuKo+lEZhGKIH5Jp86yoVXflv5Z/3I4YxVO nntGHBOo1ltBnyAA== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id 5E75AA3B94; Tue, 15 Jun 2021 09:18:14 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 2836A1F2CBA; Tue, 15 Jun 2021 11:18:14 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Jan Kara , "Darrick J . Wong" Subject: [PATCH 02/14] documentation: Sync file_operations members with reality Date: Tue, 15 Jun 2021 11:17:52 +0200 Message-Id: <20210615091814.28626-2-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210615090844.6045-1-jack@suse.cz> References: <20210615090844.6045-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2278; h=from:subject; bh=+JHTm1BDQQGdP9H1PUT1wwFK/NSOXnEkL9zE0DdMSHk=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgyHBBtH3fQO/hIgFbbEMhZxePOrm925VwhMgEiw4b MZAHXzCJATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYMhwQQAKCRCcnaoHP2RA2ZrVB/ 4otmqFQT04mmx22L2rIYmUffhR+JPydLyuYvgT1FPWPnO1TI9dNS9xy0tDTRSbAPFXlF+vrVzDTIDc 2wX9BM5mXwYA/nBka5hJRtCBta9M+gqe8ng9MzNyrZJ7IKLfTNaUPK6K1TaDPWDZZANR197c4ou3Gq SiqqPfMp70BQilo9ZLbtF3EK85t10niL7v/2/Riv/ERxXvJPszraejpZIjHB3cWwOhRIb4WgltUCIQ RCzi0gZ7m9O2CBe7F/Jkqml+OQu3Y2mTWORroM2K+wd+4xijA4Y+5sTYOk0Wb3BDNa0XK3tXCQYP5Q gN5oby6zn3gxDbmppHygxOKACiihcA X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Sync listing of struct file_operations members with the real one in fs.h. Acked-by: Darrick J. Wong Signed-off-by: Jan Kara Reviewed-by: Christoph Hellwig --- Documentation/filesystems/locking.rst | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index 1e894480115b..4ed2b22bd0a8 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -506,6 +506,7 @@ prototypes:: ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *); ssize_t (*read_iter) (struct kiocb *, struct iov_iter *); ssize_t (*write_iter) (struct kiocb *, struct iov_iter *); + int (*iopoll) (struct kiocb *kiocb, bool spin); int (*iterate) (struct file *, struct dir_context *); int (*iterate_shared) (struct file *, struct dir_context *); __poll_t (*poll) (struct file *, struct poll_table_struct *); @@ -518,12 +519,6 @@ prototypes:: int (*fsync) (struct file *, loff_t start, loff_t end, int datasync); int (*fasync) (int, struct file *, int); int (*lock) (struct file *, int, struct file_lock *); - ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, - loff_t *); - ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, - loff_t *); - ssize_t (*sendfile) (struct file *, loff_t *, size_t, read_actor_t, - void __user *); ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int); unsigned long (*get_unmapped_area)(struct file *, unsigned long, @@ -536,6 +531,14 @@ prototypes:: size_t, unsigned int); int (*setlease)(struct file *, long, struct file_lock **, void **); long (*fallocate)(struct file *, int, loff_t, loff_t); + void (*show_fdinfo)(struct seq_file *m, struct file *f); + unsigned (*mmap_capabilities)(struct file *); + ssize_t (*copy_file_range)(struct file *, loff_t, struct file *, + loff_t, size_t, unsigned int); + loff_t (*remap_file_range)(struct file *file_in, loff_t pos_in, + struct file *file_out, loff_t pos_out, + loff_t len, unsigned int remap_flags); + int (*fadvise)(struct file *, loff_t, loff_t, int); locking rules: All may block. From patchwork Tue Jun 15 09:17:53 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12320807 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E52A4C4BA3C for ; Tue, 15 Jun 2021 09:18:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C646561245 for ; Tue, 15 Jun 2021 09:18:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231443AbhFOJUb (ORCPT ); Tue, 15 Jun 2021 05:20:31 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:54470 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231289AbhFOJUV (ORCPT ); Tue, 15 Jun 2021 05:20:21 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 14E66219C7; Tue, 15 Jun 2021 09:18:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3EzXz9IV/iXstHciVJQGRzb6nPoV63TaKdFIDnhPAYA=; b=OOR95ZwyTNZr38I6Z7xvI9OYOMdRpGkMJidBSP72kOyD2aDoqEXAf4bQw9nmDxPIfRrHdT lw/whoJCoDPj0050uFkr0awT09MvrZNYS4zuiz7PIJc2egaaXpsxbwpQnTw0C5A0KEsU6K XtwC2enhIqBXWikWRg9C86fm+elqgsQ= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3EzXz9IV/iXstHciVJQGRzb6nPoV63TaKdFIDnhPAYA=; b=j5e6W9Ux0qUYugL+roane55srmqgGxOgqmLn9176q44SW9mCzzPAWKQhDxVvGZ+l2fnJTg EKOZfMzUQFCUwzAg== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id 586BAA3B93; Tue, 15 Jun 2021 09:18:14 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 2D1321F2CBB; Tue, 15 Jun 2021 11:18:14 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Jan Kara Subject: [PATCH 03/14] mm: Protect operations adding pages to page cache with invalidate_lock Date: Tue, 15 Jun 2021 11:17:53 +0200 Message-Id: <20210615091814.28626-3-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210615090844.6045-1-jack@suse.cz> References: <20210615090844.6045-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=19597; h=from:subject; bh=1nV1NrSxvLCEMLx5l35wTS6bhNjH014djA6IdqPT8pg=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgyHBCjwL3nBP/gKer/1+61aLPElgU6R29mc70OPMp xpsxcCOJATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYMhwQgAKCRCcnaoHP2RA2fdRCA CQ7a+Dej3W2RJ9bzusnTo8g+ldTaFvF85HJqUgROyRaPJ2voRkkneW76X9pr6FsUYW/gW4e5uuTYA6 DtFUog+JZFPZc8IDDAxNXv3qXPWHL6lh8J6BmrWU95jIP9uZOV+b4eIQgf5fXZ7n1efF7Zhhvb0B1t yOPP4V7d+bN8fP7+i4S+Xs/svKMxmwU+mtdUwDjqiF/yKtWqvI3t7sYgmW/mqEfY7NttPTH8T/Jm9+ s0U0fOdddHoxq+17j+o1k8fSJcg/iX+obALacgXzF9S8okgSNqqF6x74T5TM4DMkgclvm3EMS2OdZh PbnSXq/5D9kQq5Lq0DRj5lpelopZvj X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Currently, serializing operations such as page fault, read, or readahead against hole punching is rather difficult. The basic race scheme is like: fallocate(FALLOC_FL_PUNCH_HOLE) read / fault / .. truncate_inode_pages_range() Now the problem is in this way read / page fault / readahead can instantiate pages in page cache with potentially stale data (if blocks get quickly reused). Avoiding this race is not simple - page locks do not work because we want to make sure there are *no* pages in given range. inode->i_rwsem does not work because page fault happens under mmap_sem which ranks below inode->i_rwsem. Also using it for reads makes the performance for mixed read-write workloads suffer. So create a new rw_semaphore in the address_space - invalidate_lock - that protects adding of pages to page cache for page faults / reads / readahead. Signed-off-by: Jan Kara Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong --- Documentation/filesystems/locking.rst | 62 +++++++++++++++++-------- fs/inode.c | 2 + include/linux/fs.h | 33 ++++++++++++++ mm/filemap.c | 65 ++++++++++++++++++++++----- mm/readahead.c | 2 + mm/rmap.c | 37 +++++++-------- mm/truncate.c | 3 +- 7 files changed, 154 insertions(+), 50 deletions(-) diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index 4ed2b22bd0a8..3b27319dd187 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -271,19 +271,19 @@ prototypes:: locking rules: All except set_page_dirty and freepage may block -====================== ======================== ========= -ops PageLocked(page) i_rwsem -====================== ======================== ========= +====================== ======================== ========= =============== +ops PageLocked(page) i_rwsem invalidate_lock +====================== ======================== ========= =============== writepage: yes, unlocks (see below) -readpage: yes, unlocks +readpage: yes, unlocks shared writepages: set_page_dirty no -readahead: yes, unlocks -readpages: no +readahead: yes, unlocks shared +readpages: no shared write_begin: locks the page exclusive write_end: yes, unlocks exclusive bmap: -invalidatepage: yes +invalidatepage: yes exclusive releasepage: yes freepage: yes direct_IO: @@ -378,7 +378,10 @@ keep it that way and don't breed new callers. ->invalidatepage() is called when the filesystem must attempt to drop some or all of the buffers from the page when it is being truncated. It returns zero on success. If ->invalidatepage is zero, the kernel uses -block_invalidatepage() instead. +block_invalidatepage() instead. The filesystem must exclusively acquire +invalidate_lock before invalidating page cache in truncate / hole punch path +(and thus calling into ->invalidatepage) to block races between page cache +invalidation and page cache filling functions (fault, read, ...). ->releasepage() is called when the kernel is about to try to drop the buffers from the page in preparation for freeing it. It returns zero to @@ -573,6 +576,25 @@ in sys_read() and friends. the lease within the individual filesystem to record the result of the operation +->fallocate implementation must be really careful to maintain page cache +consistency when punching holes or performing other operations that invalidate +page cache contents. Usually the filesystem needs to call +truncate_inode_pages_range() to invalidate relevant range of the page cache. +However the filesystem usually also needs to update its internal (and on disk) +view of file offset -> disk block mapping. Until this update is finished, the +filesystem needs to block page faults and reads from reloading now-stale page +cache contents from the disk. Since VFS acquires mapping->invalidate_lock in +shared mode when loading pages from disk (filemap_fault(), filemap_read(), +readahead paths), the fallocate implementation must take the invalidate_lock to +prevent reloading. + +->copy_file_range and ->remap_file_range implementations need to serialize +against modifications of file data while the operation is running. For +blocking changes through write(2) and similar operations inode->i_rwsem can be +used. To block changes to file contents via a memory mapping during the +operation, the filesystem must take mapping->invalidate_lock to coordinate +with ->page_mkwrite. + dquot_operations ================ @@ -630,11 +652,11 @@ pfn_mkwrite: yes access: yes ============= ========= =========================== -->fault() is called when a previously not present pte is about -to be faulted in. The filesystem must find and return the page associated -with the passed in "pgoff" in the vm_fault structure. If it is possible that -the page may be truncated and/or invalidated, then the filesystem must lock -the page, then ensure it is not already truncated (the page lock will block +->fault() is called when a previously not present pte is about to be faulted +in. The filesystem must find and return the page associated with the passed in +"pgoff" in the vm_fault structure. If it is possible that the page may be +truncated and/or invalidated, then the filesystem must lock invalidate_lock, +then ensure the page is not already truncated (invalidate_lock will block subsequent truncate), and then return with VM_FAULT_LOCKED, and the page locked. The VM will unlock the page. @@ -647,12 +669,14 @@ page table entry. Pointer to entry associated with the page is passed in "pte" field in vm_fault structure. Pointers to entries for other offsets should be calculated relative to "pte". -->page_mkwrite() is called when a previously read-only pte is -about to become writeable. The filesystem again must ensure that there are -no truncate/invalidate races, and then return with the page locked. If -the page has been truncated, the filesystem should not look up a new page -like the ->fault() handler, but simply return with VM_FAULT_NOPAGE, which -will cause the VM to retry the fault. +->page_mkwrite() is called when a previously read-only pte is about to become +writeable. The filesystem again must ensure that there are no +truncate/invalidate races or races with operations such as ->remap_file_range +or ->copy_file_range, and then return with the page locked. Usually +mapping->invalidate_lock is suitable for proper serialization. If the page has +been truncated, the filesystem should not look up a new page like the ->fault() +handler, but simply return with VM_FAULT_NOPAGE, which will cause the VM to +retry the fault. ->pfn_mkwrite() is the same as page_mkwrite but when the pte is VM_PFNMAP or VM_MIXEDMAP with a page-less entry. Expected return is diff --git a/fs/inode.c b/fs/inode.c index c93500d84264..84c528cd1955 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -190,6 +190,8 @@ int inode_init_always(struct super_block *sb, struct inode *inode) mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE); mapping->private_data = NULL; mapping->writeback_index = 0; + __init_rwsem(&mapping->invalidate_lock, "mapping.invalidate_lock", + &sb->s_type->invalidate_lock_key); inode->i_private = NULL; inode->i_mapping = mapping; INIT_HLIST_HEAD(&inode->i_dentry); /* buggered by rcu freeing */ diff --git a/include/linux/fs.h b/include/linux/fs.h index c3c88fdb9b2a..d8afbc9661d7 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -436,6 +436,10 @@ int pagecache_write_end(struct file *, struct address_space *mapping, * struct address_space - Contents of a cacheable, mappable object. * @host: Owner, either the inode or the block_device. * @i_pages: Cached pages. + * @invalidate_lock: Guards coherency between page cache contents and + * file offset->disk block mappings in the filesystem during invalidates. + * It is also used to block modification of page cache contents through + * memory mappings. * @gfp_mask: Memory allocation flags to use for allocating pages. * @i_mmap_writable: Number of VM_SHARED mappings. * @nr_thps: Number of THPs in the pagecache (non-shmem only). @@ -453,6 +457,7 @@ int pagecache_write_end(struct file *, struct address_space *mapping, struct address_space { struct inode *host; struct xarray i_pages; + struct rw_semaphore invalidate_lock; gfp_t gfp_mask; atomic_t i_mmap_writable; #ifdef CONFIG_READ_ONLY_THP_FOR_FS @@ -814,6 +819,33 @@ static inline void inode_lock_shared_nested(struct inode *inode, unsigned subcla down_read_nested(&inode->i_rwsem, subclass); } +static inline void filemap_invalidate_lock(struct address_space *mapping) +{ + down_write(&mapping->invalidate_lock); +} + +static inline void filemap_invalidate_unlock(struct address_space *mapping) +{ + up_write(&mapping->invalidate_lock); +} + +static inline void filemap_invalidate_lock_shared(struct address_space *mapping) +{ + down_read(&mapping->invalidate_lock); +} + +static inline int filemap_invalidate_trylock_shared( + struct address_space *mapping) +{ + return down_read_trylock(&mapping->invalidate_lock); +} + +static inline void filemap_invalidate_unlock_shared( + struct address_space *mapping) +{ + up_read(&mapping->invalidate_lock); +} + void lock_two_nondirectories(struct inode *, struct inode*); void unlock_two_nondirectories(struct inode *, struct inode*); @@ -2488,6 +2520,7 @@ struct file_system_type { struct lock_class_key i_lock_key; struct lock_class_key i_mutex_key; + struct lock_class_key invalidate_lock_key; struct lock_class_key i_mutex_dir_key; }; diff --git a/mm/filemap.c b/mm/filemap.c index ba1068a1837f..c8e7e451d81e 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -77,7 +77,8 @@ * ->i_pages lock * * ->i_rwsem - * ->i_mmap_rwsem (truncate->unmap_mapping_range) + * ->invalidate_lock (acquired by fs in truncate path) + * ->i_mmap_rwsem (truncate->unmap_mapping_range) * * ->mmap_lock * ->i_mmap_rwsem @@ -85,7 +86,8 @@ * ->i_pages lock (arch-dependent flush_dcache_mmap_lock) * * ->mmap_lock - * ->lock_page (access_process_vm) + * ->invalidate_lock (filemap_fault) + * ->lock_page (filemap_fault, access_process_vm) * * ->i_rwsem (generic_perform_write) * ->mmap_lock (fault_in_pages_readable->do_page_fault) @@ -2368,20 +2370,30 @@ static int filemap_update_page(struct kiocb *iocb, { int error; + if (iocb->ki_flags & IOCB_NOWAIT) { + if (!filemap_invalidate_trylock_shared(mapping)) + return -EAGAIN; + } else { + filemap_invalidate_lock_shared(mapping); + } + if (!trylock_page(page)) { + error = -EAGAIN; if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_NOIO)) - return -EAGAIN; + goto unlock_mapping; if (!(iocb->ki_flags & IOCB_WAITQ)) { + filemap_invalidate_unlock_shared(mapping); put_and_wait_on_page_locked(page, TASK_KILLABLE); return AOP_TRUNCATED_PAGE; } error = __lock_page_async(page, iocb->ki_waitq); if (error) - return error; + goto unlock_mapping; } + error = AOP_TRUNCATED_PAGE; if (!page->mapping) - goto truncated; + goto unlock; error = 0; if (filemap_range_uptodate(mapping, iocb->ki_pos, iter, page)) @@ -2392,15 +2404,13 @@ static int filemap_update_page(struct kiocb *iocb, goto unlock; error = filemap_read_page(iocb->ki_filp, mapping, page); - if (error == AOP_TRUNCATED_PAGE) - put_page(page); - return error; -truncated: - unlock_page(page); - put_page(page); - return AOP_TRUNCATED_PAGE; + goto unlock_mapping; unlock: unlock_page(page); +unlock_mapping: + filemap_invalidate_unlock_shared(mapping); + if (error == AOP_TRUNCATED_PAGE) + put_page(page); return error; } @@ -2415,6 +2425,19 @@ static int filemap_create_page(struct file *file, if (!page) return -ENOMEM; + /* + * Protect against truncate / hole punch. Grabbing invalidate_lock here + * assures we cannot instantiate and bring uptodate new pagecache pages + * after evicting page cache during truncate and before actually + * freeing blocks. Note that we could release invalidate_lock after + * inserting the page into page cache as the locked page would then be + * enough to synchronize with hole punching. But there are code paths + * such as filemap_update_page() filling in partially uptodate pages or + * ->readpages() that need to hold invalidate_lock while mapping blocks + * for IO so let's hold the lock here as well to keep locking rules + * simple. + */ + filemap_invalidate_lock_shared(mapping); error = add_to_page_cache_lru(page, mapping, index, mapping_gfp_constraint(mapping, GFP_KERNEL)); if (error == -EEXIST) @@ -2426,9 +2449,11 @@ static int filemap_create_page(struct file *file, if (error) goto error; + filemap_invalidate_unlock_shared(mapping); pagevec_add(pvec, page); return 0; error: + filemap_invalidate_unlock_shared(mapping); put_page(page); return error; } @@ -2988,6 +3013,13 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) count_memcg_event_mm(vmf->vma->vm_mm, PGMAJFAULT); ret = VM_FAULT_MAJOR; fpin = do_sync_mmap_readahead(vmf); + } + + /* + * See comment in filemap_create_page() why we need invalidate_lock + */ + filemap_invalidate_lock_shared(mapping); + if (!page) { retry_find: page = pagecache_get_page(mapping, offset, FGP_CREAT|FGP_FOR_MMAP, @@ -2995,6 +3027,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) if (!page) { if (fpin) goto out_retry; + filemap_invalidate_unlock_shared(mapping); return VM_FAULT_OOM; } } @@ -3035,9 +3068,11 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) if (unlikely(offset >= max_off)) { unlock_page(page); put_page(page); + filemap_invalidate_unlock_shared(mapping); return VM_FAULT_SIGBUS; } + filemap_invalidate_unlock_shared(mapping); vmf->page = page; return ret | VM_FAULT_LOCKED; @@ -3056,6 +3091,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) if (!error || error == AOP_TRUNCATED_PAGE) goto retry_find; + filemap_invalidate_unlock_shared(mapping); return VM_FAULT_SIGBUS; @@ -3067,6 +3103,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) */ if (page) put_page(page); + filemap_invalidate_unlock_shared(mapping); if (fpin) fput(fpin); return ret | VM_FAULT_RETRY; @@ -3437,6 +3474,8 @@ static struct page *do_read_cache_page(struct address_space *mapping, * * If the page does not get brought uptodate, return -EIO. * + * The function expects mapping->invalidate_lock to be already held. + * * Return: up to date page on success, ERR_PTR() on failure. */ struct page *read_cache_page(struct address_space *mapping, @@ -3460,6 +3499,8 @@ EXPORT_SYMBOL(read_cache_page); * * If the page does not get brought uptodate, return -EIO. * + * The function expects mapping->invalidate_lock to be already held. + * * Return: up to date page on success, ERR_PTR() on failure. */ struct page *read_cache_page_gfp(struct address_space *mapping, diff --git a/mm/readahead.c b/mm/readahead.c index d589f147f4c2..41b75d76d36e 100644 --- a/mm/readahead.c +++ b/mm/readahead.c @@ -192,6 +192,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl, */ unsigned int nofs = memalloc_nofs_save(); + filemap_invalidate_lock_shared(mapping); /* * Preallocate as many pages as we will need. */ @@ -236,6 +237,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl, * will then handle the error. */ read_pages(ractl, &page_pool, false); + filemap_invalidate_unlock_shared(mapping); memalloc_nofs_restore(nofs); } EXPORT_SYMBOL_GPL(page_cache_ra_unbounded); diff --git a/mm/rmap.c b/mm/rmap.c index a35cbbbded0d..76d33c3b8ae6 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -22,24 +22,25 @@ * * inode->i_rwsem (while writing or truncating, not reading or faulting) * mm->mmap_lock - * page->flags PG_locked (lock_page) * (see hugetlbfs below) - * hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share) - * mapping->i_mmap_rwsem - * hugetlb_fault_mutex (hugetlbfs specific page fault mutex) - * anon_vma->rwsem - * mm->page_table_lock or pte_lock - * swap_lock (in swap_duplicate, swap_info_get) - * mmlist_lock (in mmput, drain_mmlist and others) - * mapping->private_lock (in __set_page_dirty_buffers) - * lock_page_memcg move_lock (in __set_page_dirty_buffers) - * i_pages lock (widely used) - * lruvec->lru_lock (in lock_page_lruvec_irq) - * inode->i_lock (in set_page_dirty's __mark_inode_dirty) - * bdi.wb->list_lock (in set_page_dirty's __mark_inode_dirty) - * sb_lock (within inode_lock in fs/fs-writeback.c) - * i_pages lock (widely used, in set_page_dirty, - * in arch-dependent flush_dcache_mmap_lock, - * within bdi.wb->list_lock in __sync_single_inode) + * mapping->invalidate_lock (in filemap_fault) + * page->flags PG_locked (lock_page) * (see hugetlbfs below) + * hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share) + * mapping->i_mmap_rwsem + * hugetlb_fault_mutex (hugetlbfs specific page fault mutex) + * anon_vma->rwsem + * mm->page_table_lock or pte_lock + * swap_lock (in swap_duplicate, swap_info_get) + * mmlist_lock (in mmput, drain_mmlist and others) + * mapping->private_lock (in __set_page_dirty_buffers) + * lock_page_memcg move_lock (in __set_page_dirty_buffers) + * i_pages lock (widely used) + * lruvec->lru_lock (in lock_page_lruvec_irq) + * inode->i_lock (in set_page_dirty's __mark_inode_dirty) + * bdi.wb->list_lock (in set_page_dirty's __mark_inode_dirty) + * sb_lock (within inode_lock in fs/fs-writeback.c) + * i_pages lock (widely used, in set_page_dirty, + * in arch-dependent flush_dcache_mmap_lock, + * within bdi.wb->list_lock in __sync_single_inode) * * anon_vma->rwsem,mapping->i_mmap_rwsem (memory_failure, collect_procs_anon) * ->tasklist_lock diff --git a/mm/truncate.c b/mm/truncate.c index 57a618c4a0d6..d0cc6588aba2 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -415,7 +415,8 @@ EXPORT_SYMBOL(truncate_inode_pages_range); * @mapping: mapping to truncate * @lstart: offset from which to truncate * - * Called under (and serialised by) inode->i_rwsem. + * Called under (and serialised by) inode->i_rwsem and + * mapping->invalidate_lock. * * Note: When this function returns, there can be a page in the process of * deletion (inside __delete_from_page_cache()) in the specified range. Thus From patchwork Tue Jun 15 09:17:54 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12320799 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 75C78C49EAF for ; Tue, 15 Jun 2021 09:18:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 61D2161245 for ; Tue, 15 Jun 2021 09:18:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231380AbhFOJUZ (ORCPT ); Tue, 15 Jun 2021 05:20:25 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:54346 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230454AbhFOJUU (ORCPT ); Tue, 15 Jun 2021 05:20:20 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 0514321989; Tue, 15 Jun 2021 09:18:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0XkV+RgC+fT2zb/9P8XJncXzoes6c7BbAf1UfyHkV6A=; b=0e1HG1zyf9r0J2dNWokgc+OQ6H0UXR3iSbtdwrp9wm/bE28mhz4Fb/S8UqOpeYqRtSdOQu cjzSydpXdO1I2vHzYya5mJAaN//RyXzr/7zzUn4xfaMsXEldyCft0OEcFgoH8uBWNaf9EV K6ZVvlNwJx3pxSvhG+7r7Tc+dgN3eSM= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0XkV+RgC+fT2zb/9P8XJncXzoes6c7BbAf1UfyHkV6A=; b=uT6drJDEI8MJfUI/sGn0NhkaG98lb5Jbe1P+6dH/SBVNuTJJgwgIMJ3LZV0NLinp+sR++m YCE36wZaDGigUuDw== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id 64B4CA3B95; Tue, 15 Jun 2021 09:18:14 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 3153F1F2CBC; Tue, 15 Jun 2021 11:18:14 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Jan Kara , "Darrick J . Wong" Subject: [PATCH 04/14] mm: Add functions to lock invalidate_lock for two mappings Date: Tue, 15 Jun 2021 11:17:54 +0200 Message-Id: <20210615091814.28626-4-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210615090844.6045-1-jack@suse.cz> References: <20210615090844.6045-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2666; h=from:subject; bh=oUYTBj0usYKZHAhcL7fto1zqDGKiVZb74yqc2jI2vTI=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgyHBDjHixD9ynk7cHWEaRc2rAtNglfOdu6LFrLdwd AgRldvWJATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYMhwQwAKCRCcnaoHP2RA2dx8B/ 4gR5b8Yria4/uQruq+8EWovN0cccZVV1BF92wjDcg5rTaSr5GwLxoWGNa1zmJPwwQ2NRH2fz9jBbWX XTLnwG6OSvOA0TXxy9HTK9HZV6SHDFXUW3yhJ22X9hfGvy8bJz0s8f1eW8xywNms/Nly/xRu5UrXBT GyDDHranBIdveS/GDQ6BdURWmvKQyXthCrw1Wzl56+seW+PZeDN12hTUHRuojD2IaCIEoViNboFWcx jP6JhEYnZi/9K03+kpC2TWCBzY9VNPll6r5dkv4DCQIYTcVCyUPzhAmY4ETc89VR+5lWLjW0AVtXsz vAlFo8n+VuIi4W+AICrWJ+aOTHxfPV X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Some operations such as reflinking blocks among files will need to lock invalidate_lock for two mappings. Add helper functions to do that. Reviewed-by: Darrick J. Wong Signed-off-by: Jan Kara Reviewed-by: Christoph Hellwig --- include/linux/fs.h | 6 ++++++ mm/filemap.c | 38 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 44 insertions(+) diff --git a/include/linux/fs.h b/include/linux/fs.h index d8afbc9661d7..ddc11bafc183 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -849,6 +849,12 @@ static inline void filemap_invalidate_unlock_shared( void lock_two_nondirectories(struct inode *, struct inode*); void unlock_two_nondirectories(struct inode *, struct inode*); +void filemap_invalidate_lock_two(struct address_space *mapping1, + struct address_space *mapping2); +void filemap_invalidate_unlock_two(struct address_space *mapping1, + struct address_space *mapping2); + + /* * NOTE: in a 32bit arch with a preemptable kernel and * an UP compile the i_size_read/write must be atomic diff --git a/mm/filemap.c b/mm/filemap.c index c8e7e451d81e..b8e9bccecd9f 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1009,6 +1009,44 @@ struct page *__page_cache_alloc(gfp_t gfp) EXPORT_SYMBOL(__page_cache_alloc); #endif +/* + * filemap_invalidate_lock_two - lock invalidate_lock for two mappings + * + * Lock exclusively invalidate_lock of any passed mapping that is not NULL. + * + * @mapping1: the first mapping to lock + * @mapping2: the second mapping to lock + */ +void filemap_invalidate_lock_two(struct address_space *mapping1, + struct address_space *mapping2) +{ + if (mapping1 > mapping2) + swap(mapping1, mapping2); + if (mapping1) + down_write(&mapping1->invalidate_lock); + if (mapping2 && mapping1 != mapping2) + down_write_nested(&mapping2->invalidate_lock, 1); +} +EXPORT_SYMBOL(filemap_invalidate_lock_two); + +/* + * filemap_invalidate_unlock_two - unlock invalidate_lock for two mappings + * + * Unlock exclusive invalidate_lock of any passed mapping that is not NULL. + * + * @mapping1: the first mapping to unlock + * @mapping2: the second mapping to unlock + */ +void filemap_invalidate_unlock_two(struct address_space *mapping1, + struct address_space *mapping2) +{ + if (mapping1) + up_write(&mapping1->invalidate_lock); + if (mapping2 && mapping1 != mapping2) + up_write(&mapping2->invalidate_lock); +} +EXPORT_SYMBOL(filemap_invalidate_unlock_two); + /* * In order to wait for pages to become available there must be * waitqueues associated with pages. By using a hash table of From patchwork Tue Jun 15 09:17:55 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12320817 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C6AC4C4CC73 for ; Tue, 15 Jun 2021 09:18:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A246A61434 for ; Tue, 15 Jun 2021 09:18:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231567AbhFOJUn (ORCPT ); Tue, 15 Jun 2021 05:20:43 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:54680 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231351AbhFOJUX (ORCPT ); Tue, 15 Jun 2021 05:20:23 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 41B09219D6; Tue, 15 Jun 2021 09:18:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=FTye4HMfCHRW6CfzkB5PfXcLIugz729++Ha91ExOp18=; b=eaVM6sxKdiim21U0WDjcNiGugC7COieY3JyzXY/7/wZrAtJqsQx9tnMqb9tMzj0vd4s+vg g5BdGLJxw6yb5qQuOdjkYbv3VZxC/w3qvh5fiYJwAuV7wVC8Lg0bbtk1m+8o9c/F1bam4a VTZPjATYHZggrivMwNVnfjd6+rhpU+U= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=FTye4HMfCHRW6CfzkB5PfXcLIugz729++Ha91ExOp18=; b=bb63UA5A0V7FlNRPRqf8NVLqe6LNsuQAFBBG5gx+KBnFfnhTshphJwqgcWWeI3jnNaw94r V075VQDEBjN2g8AA== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id 1A77CA3B95; Tue, 15 Jun 2021 09:18:15 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 365781F2CBD; Tue, 15 Jun 2021 11:18:14 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Jan Kara Subject: [PATCH 05/14] ext4: Convert to use mapping->invalidate_lock Date: Tue, 15 Jun 2021 11:17:55 +0200 Message-Id: <20210615091814.28626-5-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210615090844.6045-1-jack@suse.cz> References: <20210615090844.6045-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=15626; h=from:subject; bh=iQ9uZKJC3kKmRPaUO68W8vXe7LY4Nf41m+Lur66qXp4=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgyHBERl4/E/EB/m2Dfoob/XGemNPV1M3nlvuKJ8je yw1FTJ+JATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYMhwRAAKCRCcnaoHP2RA2TOHB/ 4/NUM8wY4iJz45ftxXtSswJhIkU5CACM/BsD3o481iiOE8xxLnQU2P130La50j+v/6M0chts9ENCAN 2wCSPpttNZplYisDnGtmfEr8JLFqIDA+tEjTDxqnOYEPI7ch0Yi3hMaDwIqIQRLlyH0Hg3QPovqWeF klQzztNoU9VI55VdWWChFjG9ZiOCpWEObKATq7eN717cLU/nRN36zO/L6bu4h48vgOwwpgN4LHSjDy iE/b2HRYL6wn6quiDjDFjJqfG1tl5QFIuh6lHZe1hCwH10cPIQXGkk61oeIhWnjRA9gZetsP+mpAAF Jsn+STdf5ffK+RBkliYTID+pEfY06B X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Convert ext4 to use mapping->invalidate_lock instead of its private EXT4_I(inode)->i_mmap_sem. This is mostly search-and-replace. By this conversion we fix a long standing race between hole punching and read(2) / readahead(2) paths that can lead to stale page cache contents. CC: CC: Ted Tso Signed-off-by: Jan Kara Reviewed-by: Darrick J. Wong Acked-by: Theodore Ts'o --- fs/ext4/ext4.h | 10 ---------- fs/ext4/extents.c | 25 +++++++++++++----------- fs/ext4/file.c | 13 +++++++------ fs/ext4/inode.c | 47 +++++++++++++++++----------------------------- fs/ext4/ioctl.c | 4 ++-- fs/ext4/super.c | 13 +++++-------- fs/ext4/truncate.h | 8 +++++--- 7 files changed, 50 insertions(+), 70 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 37002663d521..ed64b4b217a1 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1077,15 +1077,6 @@ struct ext4_inode_info { * by other means, so we have i_data_sem. */ struct rw_semaphore i_data_sem; - /* - * i_mmap_sem is for serializing page faults with truncate / punch hole - * operations. We have to make sure that new page cannot be faulted in - * a section of the inode that is being punched. We cannot easily use - * i_data_sem for this since we need protection for the whole punch - * operation and i_data_sem ranks below transaction start so we have - * to occasionally drop it. - */ - struct rw_semaphore i_mmap_sem; struct inode vfs_inode; struct jbd2_inode *jinode; @@ -2962,7 +2953,6 @@ extern int ext4_chunk_trans_blocks(struct inode *, int nrblocks); extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode, loff_t lstart, loff_t lend); extern vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf); -extern vm_fault_t ext4_filemap_fault(struct vm_fault *vmf); extern qsize_t *ext4_get_reserved_space(struct inode *inode); extern int ext4_get_projid(struct inode *inode, kprojid_t *projid); extern void ext4_da_release_space(struct inode *inode, int to_free); diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index cbf37b2cf871..db5d38af9ba8 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -4470,6 +4470,7 @@ static long ext4_zero_range(struct file *file, loff_t offset, loff_t len, int mode) { struct inode *inode = file_inode(file); + struct address_space *mapping = file->f_mapping; handle_t *handle = NULL; unsigned int max_blocks; loff_t new_size = 0; @@ -4556,17 +4557,17 @@ static long ext4_zero_range(struct file *file, loff_t offset, * Prevent page faults from reinstantiating pages we have * released from page cache. */ - down_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_lock(mapping); ret = ext4_break_layouts(inode); if (ret) { - up_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); goto out_mutex; } ret = ext4_update_disksize_before_punch(inode, offset, len); if (ret) { - up_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); goto out_mutex; } /* Now release the pages and zero block aligned part of pages */ @@ -4575,7 +4576,7 @@ static long ext4_zero_range(struct file *file, loff_t offset, ret = ext4_alloc_file_blocks(file, lblk, max_blocks, new_size, flags); - up_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); if (ret) goto out_mutex; } @@ -5217,6 +5218,7 @@ ext4_ext_shift_extents(struct inode *inode, handle_t *handle, static int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len) { struct super_block *sb = inode->i_sb; + struct address_space *mapping = inode->i_mapping; ext4_lblk_t punch_start, punch_stop; handle_t *handle; unsigned int credits; @@ -5270,7 +5272,7 @@ static int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len) * Prevent page faults from reinstantiating pages we have released from * page cache. */ - down_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_lock(mapping); ret = ext4_break_layouts(inode); if (ret) @@ -5285,15 +5287,15 @@ static int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len) * Write tail of the last page before removed range since it will get * removed from the page cache below. */ - ret = filemap_write_and_wait_range(inode->i_mapping, ioffset, offset); + ret = filemap_write_and_wait_range(mapping, ioffset, offset); if (ret) goto out_mmap; /* * Write data that will be shifted to preserve them when discarding * page cache below. We are also protected from pages becoming dirty - * by i_mmap_sem. + * by i_rwsem and invalidate_lock. */ - ret = filemap_write_and_wait_range(inode->i_mapping, offset + len, + ret = filemap_write_and_wait_range(mapping, offset + len, LLONG_MAX); if (ret) goto out_mmap; @@ -5346,7 +5348,7 @@ static int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len) ext4_journal_stop(handle); ext4_fc_stop_ineligible(sb); out_mmap: - up_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); out_mutex: inode_unlock(inode); return ret; @@ -5363,6 +5365,7 @@ static int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len) static int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len) { struct super_block *sb = inode->i_sb; + struct address_space *mapping = inode->i_mapping; handle_t *handle; struct ext4_ext_path *path; struct ext4_extent *extent; @@ -5421,7 +5424,7 @@ static int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len) * Prevent page faults from reinstantiating pages we have released from * page cache. */ - down_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_lock(mapping); ret = ext4_break_layouts(inode); if (ret) @@ -5522,7 +5525,7 @@ static int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len) ext4_journal_stop(handle); ext4_fc_stop_ineligible(sb); out_mmap: - up_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); out_mutex: inode_unlock(inode); return ret; diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 816dedcbd541..d3b4ed91aa68 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -704,22 +704,23 @@ static vm_fault_t ext4_dax_huge_fault(struct vm_fault *vmf, */ bool write = (vmf->flags & FAULT_FLAG_WRITE) && (vmf->vma->vm_flags & VM_SHARED); + struct address_space *mapping = vmf->vma->vm_file->f_mapping; pfn_t pfn; if (write) { sb_start_pagefault(sb); file_update_time(vmf->vma->vm_file); - down_read(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_lock_shared(mapping); retry: handle = ext4_journal_start_sb(sb, EXT4_HT_WRITE_PAGE, EXT4_DATA_TRANS_BLOCKS(sb)); if (IS_ERR(handle)) { - up_read(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock_shared(mapping); sb_end_pagefault(sb); return VM_FAULT_SIGBUS; } } else { - down_read(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_lock_shared(mapping); } result = dax_iomap_fault(vmf, pe_size, &pfn, &error, &ext4_iomap_ops); if (write) { @@ -731,10 +732,10 @@ static vm_fault_t ext4_dax_huge_fault(struct vm_fault *vmf, /* Handling synchronous page fault? */ if (result & VM_FAULT_NEEDDSYNC) result = dax_finish_sync_fault(vmf, pe_size, pfn); - up_read(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock_shared(mapping); sb_end_pagefault(sb); } else { - up_read(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock_shared(mapping); } return result; @@ -756,7 +757,7 @@ static const struct vm_operations_struct ext4_dax_vm_ops = { #endif static const struct vm_operations_struct ext4_file_vm_ops = { - .fault = ext4_filemap_fault, + .fault = filemap_fault, .map_pages = filemap_map_pages, .page_mkwrite = ext4_page_mkwrite, }; diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index fe6045a46599..ee6e69d6f949 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3950,20 +3950,19 @@ int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset, return ret; } -static void ext4_wait_dax_page(struct ext4_inode_info *ei) +static void ext4_wait_dax_page(struct inode *inode) { - up_write(&ei->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); schedule(); - down_write(&ei->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); } int ext4_break_layouts(struct inode *inode) { - struct ext4_inode_info *ei = EXT4_I(inode); struct page *page; int error; - if (WARN_ON_ONCE(!rwsem_is_locked(&ei->i_mmap_sem))) + if (WARN_ON_ONCE(!rwsem_is_locked(&inode->i_mapping->invalidate_lock))) return -EINVAL; do { @@ -3974,7 +3973,7 @@ int ext4_break_layouts(struct inode *inode) error = ___wait_var_event(&page->_refcount, atomic_read(&page->_refcount) == 1, TASK_INTERRUPTIBLE, 0, 0, - ext4_wait_dax_page(ei)); + ext4_wait_dax_page(inode)); } while (error == 0); return error; @@ -4005,9 +4004,9 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length) ext4_clear_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA); if (ext4_has_inline_data(inode)) { - down_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_lock(mapping); ret = ext4_convert_inline_data(inode); - up_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); if (ret) return ret; } @@ -4058,7 +4057,7 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length) * Prevent page faults from reinstantiating pages we have released from * page cache. */ - down_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_lock(mapping); ret = ext4_break_layouts(inode); if (ret) @@ -4131,7 +4130,7 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length) out_stop: ext4_journal_stop(handle); out_dio: - up_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); out_mutex: inode_unlock(inode); return ret; @@ -5426,11 +5425,11 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry, inode_dio_wait(inode); } - down_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); rc = ext4_break_layouts(inode); if (rc) { - up_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); goto err_out; } @@ -5506,7 +5505,7 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry, error = rc; } out_mmap_sem: - up_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); } if (!error) { @@ -5983,10 +5982,10 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val) * data (and journalled aops don't know how to handle these cases). */ if (val) { - down_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); err = filemap_write_and_wait(inode->i_mapping); if (err < 0) { - up_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); return err; } } @@ -6019,7 +6018,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val) percpu_up_write(&sbi->s_writepages_rwsem); if (val) - up_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); /* Finally we can mark the inode as dirty. */ @@ -6063,7 +6062,7 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) sb_start_pagefault(inode->i_sb); file_update_time(vma->vm_file); - down_read(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_lock_shared(mapping); err = ext4_convert_inline_data(inode); if (err) @@ -6176,7 +6175,7 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) out_ret: ret = block_page_mkwrite_return(err); out: - up_read(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock_shared(mapping); sb_end_pagefault(inode->i_sb); return ret; out_error: @@ -6184,15 +6183,3 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) ext4_journal_stop(handle); goto out; } - -vm_fault_t ext4_filemap_fault(struct vm_fault *vmf) -{ - struct inode *inode = file_inode(vmf->vma->vm_file); - vm_fault_t ret; - - down_read(&EXT4_I(inode)->i_mmap_sem); - ret = filemap_fault(vmf); - up_read(&EXT4_I(inode)->i_mmap_sem); - - return ret; -} diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c index 31627f7dc5cd..c5ed562b4185 100644 --- a/fs/ext4/ioctl.c +++ b/fs/ext4/ioctl.c @@ -148,7 +148,7 @@ static long swap_inode_boot_loader(struct super_block *sb, goto journal_err_out; } - down_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); err = filemap_write_and_wait(inode->i_mapping); if (err) goto err_out; @@ -256,7 +256,7 @@ static long swap_inode_boot_loader(struct super_block *sb, ext4_double_up_write_data_sem(inode, inode_bl); err_out: - up_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); journal_err_out: unlock_two_nondirectories(inode, inode_bl); iput(inode_bl); diff --git a/fs/ext4/super.c b/fs/ext4/super.c index d29f6aa7d96e..c3c3cd8b0966 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -90,12 +90,9 @@ static struct inode *ext4_get_journal_inode(struct super_block *sb, /* * Lock ordering * - * Note the difference between i_mmap_sem (EXT4_I(inode)->i_mmap_sem) and - * i_mmap_rwsem (inode->i_mmap_rwsem)! - * * page fault path: - * mmap_lock -> sb_start_pagefault -> i_mmap_sem (r) -> transaction start -> - * page lock -> i_data_sem (rw) + * mmap_lock -> sb_start_pagefault -> invalidate_lock (r) -> transaction start + * -> page lock -> i_data_sem (rw) * * buffered write path: * sb_start_write -> i_mutex -> mmap_lock @@ -103,8 +100,9 @@ static struct inode *ext4_get_journal_inode(struct super_block *sb, * i_data_sem (rw) * * truncate: - * sb_start_write -> i_mutex -> i_mmap_sem (w) -> i_mmap_rwsem (w) -> page lock - * sb_start_write -> i_mutex -> i_mmap_sem (w) -> transaction start -> + * sb_start_write -> i_mutex -> invalidate_lock (w) -> i_mmap_rwsem (w) -> + * page lock + * sb_start_write -> i_mutex -> invalidate_lock (w) -> transaction start -> * i_data_sem (rw) * * direct IO: @@ -1350,7 +1348,6 @@ static void init_once(void *foo) INIT_LIST_HEAD(&ei->i_orphan); init_rwsem(&ei->xattr_sem); init_rwsem(&ei->i_data_sem); - init_rwsem(&ei->i_mmap_sem); inode_init_once(&ei->vfs_inode); ext4_fc_init_inode(&ei->vfs_inode); } diff --git a/fs/ext4/truncate.h b/fs/ext4/truncate.h index bcbe3668c1d4..ce84aa2786c7 100644 --- a/fs/ext4/truncate.h +++ b/fs/ext4/truncate.h @@ -11,14 +11,16 @@ */ static inline void ext4_truncate_failed_write(struct inode *inode) { + struct address_space *mapping = inode->i_mapping; + /* * We don't need to call ext4_break_layouts() because the blocks we * are truncating were never visible to userspace. */ - down_write(&EXT4_I(inode)->i_mmap_sem); - truncate_inode_pages(inode->i_mapping, inode->i_size); + filemap_invalidate_lock(mapping); + truncate_inode_pages(mapping, inode->i_size); ext4_truncate(inode); - up_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); } /* From patchwork Tue Jun 15 09:17:56 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12320809 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, UNWANTED_LANGUAGE_BODY,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DC954C4CC60 for ; Tue, 15 Jun 2021 09:18:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C2B5261433 for ; Tue, 15 Jun 2021 09:18:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231477AbhFOJUi (ORCPT ); Tue, 15 Jun 2021 05:20:38 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:54672 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231345AbhFOJUX (ORCPT ); Tue, 15 Jun 2021 05:20:23 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 37A85219CF; Tue, 15 Jun 2021 09:18:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Y0cR5u9TaTNf3kT8s6I4jpd1jCjKCNBR8ag5a0EBpIA=; b=OHh3bUEVv71ozbI3AsVJcPaPYQvtz+MmKz2t9cI+GLwxa4FKwNzD4LN8qBRn1IKV/5Ff77 i425BEwnsdwKQz7TE9yW8eSmL358aa2peVIkyWa/7tvhvY6Q4pC/OCt6SQ1an2ZtVSTqvI d8dVpAX8puPhpEc0WKPBU5k1eCbZB9k= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Y0cR5u9TaTNf3kT8s6I4jpd1jCjKCNBR8ag5a0EBpIA=; b=eE4Zb6uIWIkVZRSm2ER/qlKBXgr6U65MFkHvQfqOjVAEEtCzClcZljk7KlkS/Gs8J4yfrn lXuJlgHd633fGEAA== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id 19308A3B8A; Tue, 15 Jun 2021 09:18:15 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 397031F2CBF; Tue, 15 Jun 2021 11:18:14 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Jan Kara Subject: [PATCH 06/14] ext2: Convert to using invalidate_lock Date: Tue, 15 Jun 2021 11:17:56 +0200 Message-Id: <20210615091814.28626-6-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210615090844.6045-1-jack@suse.cz> References: <20210615090844.6045-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=4682; h=from:subject; bh=CDcZWsyH32vJE3DSXVHXUVYPxLcy1S8W7KZFP94NQWc=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgyHBFthhqcWN94OroqMs+dt8ZjvTEGenhtDvbVKo4 rIyqNfCJATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYMhwRQAKCRCcnaoHP2RA2e2rB/ 0YxI0noK+qUMXHsXn9I15hLi050WyNsbymEWE96v2SdzIiDhndWdo9d/0zgch3VwnbnZvGppRiy6BL L8cvvJmWnfdcJfo8usccb/Xnz4qYwyYwSK6HtKW8kDBKzpWQO2MYO6fOnj+C1gpc2vmQaY6IXWjnPC TME/XgagdHt0YwbhhYPTCFnQq2R6fxof7T56QfQm4Z0C3rf61b1R96engPQ3Kd8xb5Qa5Vh9vFU3Oh eqMaRcXRZaH2c320aNfwDH7c9AXiJGuof6mCll1rle8zriJYvh9kWivvZtKLCKtYLA/KPfeAcF3jU/ fLXzQCORShNTz/SEHiM5cbvSjLD+s0 X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Ext2 has its private dax_sem used for synchronizing page faults and truncation. Use mapping->invalidate_lock instead as it is meant for this purpose. CC: Signed-off-by: Jan Kara Reviewed-by: Christoph Hellwig --- fs/ext2/ext2.h | 11 ----------- fs/ext2/file.c | 7 +++---- fs/ext2/inode.c | 12 ++++++------ fs/ext2/super.c | 3 --- 4 files changed, 9 insertions(+), 24 deletions(-) diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h index b0a694820cb7..81907a041570 100644 --- a/fs/ext2/ext2.h +++ b/fs/ext2/ext2.h @@ -667,9 +667,6 @@ struct ext2_inode_info { struct rw_semaphore xattr_sem; #endif rwlock_t i_meta_lock; -#ifdef CONFIG_FS_DAX - struct rw_semaphore dax_sem; -#endif /* * truncate_mutex is for serialising ext2_truncate() against @@ -685,14 +682,6 @@ struct ext2_inode_info { #endif }; -#ifdef CONFIG_FS_DAX -#define dax_sem_down_write(ext2_inode) down_write(&(ext2_inode)->dax_sem) -#define dax_sem_up_write(ext2_inode) up_write(&(ext2_inode)->dax_sem) -#else -#define dax_sem_down_write(ext2_inode) -#define dax_sem_up_write(ext2_inode) -#endif - /* * Inode dynamic state flags */ diff --git a/fs/ext2/file.c b/fs/ext2/file.c index f98466acc672..eb97aa3d700e 100644 --- a/fs/ext2/file.c +++ b/fs/ext2/file.c @@ -81,7 +81,7 @@ static ssize_t ext2_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) * * mmap_lock (MM) * sb_start_pagefault (vfs, freeze) - * ext2_inode_info->dax_sem + * address_space->invalidate_lock * address_space->i_mmap_rwsem or page_lock (mutually exclusive in DAX) * ext2_inode_info->truncate_mutex * @@ -91,7 +91,6 @@ static ssize_t ext2_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) static vm_fault_t ext2_dax_fault(struct vm_fault *vmf) { struct inode *inode = file_inode(vmf->vma->vm_file); - struct ext2_inode_info *ei = EXT2_I(inode); vm_fault_t ret; bool write = (vmf->flags & FAULT_FLAG_WRITE) && (vmf->vma->vm_flags & VM_SHARED); @@ -100,11 +99,11 @@ static vm_fault_t ext2_dax_fault(struct vm_fault *vmf) sb_start_pagefault(inode->i_sb); file_update_time(vmf->vma->vm_file); } - down_read(&ei->dax_sem); + filemap_invalidate_lock_shared(inode->i_mapping); ret = dax_iomap_fault(vmf, PE_SIZE_PTE, NULL, NULL, &ext2_iomap_ops); - up_read(&ei->dax_sem); + filemap_invalidate_unlock_shared(inode->i_mapping); if (write) sb_end_pagefault(inode->i_sb); return ret; diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c index 68178b2234bd..2c76b9ffea26 100644 --- a/fs/ext2/inode.c +++ b/fs/ext2/inode.c @@ -1175,7 +1175,7 @@ static void ext2_free_branches(struct inode *inode, __le32 *p, __le32 *q, int de ext2_free_data(inode, p, q); } -/* dax_sem must be held when calling this function */ +/* mapping->invalidate_lock must be held when calling this function */ static void __ext2_truncate_blocks(struct inode *inode, loff_t offset) { __le32 *i_data = EXT2_I(inode)->i_data; @@ -1192,7 +1192,7 @@ static void __ext2_truncate_blocks(struct inode *inode, loff_t offset) iblock = (offset + blocksize-1) >> EXT2_BLOCK_SIZE_BITS(inode->i_sb); #ifdef CONFIG_FS_DAX - WARN_ON(!rwsem_is_locked(&ei->dax_sem)); + WARN_ON(!rwsem_is_locked(&inode->i_mapping->invalidate_lock)); #endif n = ext2_block_to_path(inode, iblock, offsets, NULL); @@ -1274,9 +1274,9 @@ static void ext2_truncate_blocks(struct inode *inode, loff_t offset) if (ext2_inode_is_fast_symlink(inode)) return; - dax_sem_down_write(EXT2_I(inode)); + filemap_invalidate_lock(inode->i_mapping); __ext2_truncate_blocks(inode, offset); - dax_sem_up_write(EXT2_I(inode)); + filemap_invalidate_unlock(inode->i_mapping); } static int ext2_setsize(struct inode *inode, loff_t newsize) @@ -1306,10 +1306,10 @@ static int ext2_setsize(struct inode *inode, loff_t newsize) if (error) return error; - dax_sem_down_write(EXT2_I(inode)); + filemap_invalidate_lock(inode->i_mapping); truncate_setsize(inode, newsize); __ext2_truncate_blocks(inode, newsize); - dax_sem_up_write(EXT2_I(inode)); + filemap_invalidate_unlock(inode->i_mapping); inode->i_mtime = inode->i_ctime = current_time(inode); if (inode_needs_sync(inode)) { diff --git a/fs/ext2/super.c b/fs/ext2/super.c index 21e09fbaa46f..987bcf32ed46 100644 --- a/fs/ext2/super.c +++ b/fs/ext2/super.c @@ -206,9 +206,6 @@ static void init_once(void *foo) init_rwsem(&ei->xattr_sem); #endif mutex_init(&ei->truncate_mutex); -#ifdef CONFIG_FS_DAX - init_rwsem(&ei->dax_sem); -#endif inode_init_once(&ei->vfs_inode); } From patchwork Tue Jun 15 09:17:57 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12320819 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 448AAC48BDF for ; Tue, 15 Jun 2021 09:18:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 27AAE61245 for ; Tue, 15 Jun 2021 09:18:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231488AbhFOJUo (ORCPT ); Tue, 15 Jun 2021 05:20:44 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:54674 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231346AbhFOJUX (ORCPT ); Tue, 15 Jun 2021 05:20:23 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 3C982219D0; Tue, 15 Jun 2021 09:18:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=eJPcNasQsdD4alraw2ySNp9CvXcKAD7eZsBqJbvJG/E=; b=mNYsE3Moem4CPgipCoJBVW3pKbWM/RieHW2msBIFTGhF0DkZcz1nTEkMIpUU6UjabKdm0W M6mVM1MuYXexKl9goxJrYVfD+Tj4mmSH179+L8f9whj6gJTW9LfV41g3sQXs+Pcg/O4WWi 7FgLS7vOukNIO2t4r3TRkcs19pHiAsQ= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=eJPcNasQsdD4alraw2ySNp9CvXcKAD7eZsBqJbvJG/E=; b=diw0cB6TtQrkCRKrb/BOoRn14gZQ/b0qj0myLemQdccLwtuT7qpBMKRtuoiLaD5N0Be4IC 3HLDqaJki5AW9ODA== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id 19077A3B84; Tue, 15 Jun 2021 09:18:15 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 3D26D1F2CC0; Tue, 15 Jun 2021 11:18:14 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Pavel Reichl , Dave Chinner , Eric Sandeen , Jan Kara Subject: [PATCH 07/14] xfs: Refactor xfs_isilocked() Date: Tue, 15 Jun 2021 11:17:57 +0200 Message-Id: <20210615091814.28626-7-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210615090844.6045-1-jack@suse.cz> References: <20210615090844.6045-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2613; i=jack@suse.cz; h=from:subject; bh=U+bbQrzY4NRNf6bjNug/8v3gG53d1tZPwGQTBPmqTzM=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgyHBGb6DDZ1XROYQY0WKzslcvTC5Zc+A/26r5bbgz 3WGOnWOJATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYMhwRgAKCRCcnaoHP2RA2aRdB/ 9ARdUkK0QlTWiC/gWPnxKJ8Av4wBT4lrAPdR1a5S0uvs+LB+14MAWG84ZF9HIOr0LFDOMd282bJoaJ IseWFk4fqZdzNu3cYmvZzy4dyFh4eGr+1caf0RtF0InMJoGmONPMk/b5o7/Of1f8V1mUn9qDynxzBU wlSqOKQ4+bvxwZ6oJR2Q8PYRCbiS+qwNDWa8t1PFEl1l4ds/PPt0JyhEt11F2sblY2y2RB7/uYWCLv MEpWPPohHUJyO2cUxiHMTfOnJc31j92fbjJ72x/LASoZUIStAazqwBX2/Rx9nUvSddiRuGfZnMdOZJ T06WayHXGk7uGywNCl05BBOH3cZn62 X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Pavel Reichl Refactor xfs_isilocked() to use newly introduced __xfs_rwsem_islocked(). __xfs_rwsem_islocked() is a helper function which encapsulates checking state of rw_semaphores hold by inode. Signed-off-by: Pavel Reichl Suggested-by: Dave Chinner Suggested-by: Eric Sandeen Suggested-by: Darrick J. Wong Signed-off-by: Jan Kara Signed-off-by: Christoph Hellwig Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong --- fs/xfs/xfs_inode.c | 34 ++++++++++++++++++++++++++-------- fs/xfs/xfs_inode.h | 2 +- 2 files changed, 27 insertions(+), 9 deletions(-) diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index e4c2da4566f1..ffd47217a8fa 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -342,9 +342,29 @@ xfs_ilock_demote( } #if defined(DEBUG) || defined(XFS_WARN) -int +static inline bool +__xfs_rwsem_islocked( + struct rw_semaphore *rwsem, + bool shared) +{ + if (!debug_locks) + return rwsem_is_locked(rwsem); + + if (!shared) + return lockdep_is_held_type(rwsem, 0); + + /* + * We are checking that the lock is held at least in shared + * mode but don't care that it might be held exclusively + * (i.e. shared | excl). Hence we check if the lock is held + * in any mode rather than an explicit shared mode. + */ + return lockdep_is_held_type(rwsem, -1); +} + +bool xfs_isilocked( - xfs_inode_t *ip, + struct xfs_inode *ip, uint lock_flags) { if (lock_flags & (XFS_ILOCK_EXCL|XFS_ILOCK_SHARED)) { @@ -359,15 +379,13 @@ xfs_isilocked( return rwsem_is_locked(&ip->i_mmaplock.mr_lock); } - if (lock_flags & (XFS_IOLOCK_EXCL|XFS_IOLOCK_SHARED)) { - if (!(lock_flags & XFS_IOLOCK_SHARED)) - return !debug_locks || - lockdep_is_held_type(&VFS_I(ip)->i_rwsem, 0); - return rwsem_is_locked(&VFS_I(ip)->i_rwsem); + if (lock_flags & (XFS_IOLOCK_EXCL | XFS_IOLOCK_SHARED)) { + return __xfs_rwsem_islocked(&VFS_I(ip)->i_rwsem, + (lock_flags & XFS_IOLOCK_SHARED)); } ASSERT(0); - return 0; + return false; } #endif diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index ca826cfba91c..4659e1568966 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -410,7 +410,7 @@ void xfs_ilock(xfs_inode_t *, uint); int xfs_ilock_nowait(xfs_inode_t *, uint); void xfs_iunlock(xfs_inode_t *, uint); void xfs_ilock_demote(xfs_inode_t *, uint); -int xfs_isilocked(xfs_inode_t *, uint); +bool xfs_isilocked(struct xfs_inode *, uint); uint xfs_ilock_data_map_shared(struct xfs_inode *); uint xfs_ilock_attr_map_shared(struct xfs_inode *); From patchwork Tue Jun 15 09:17:58 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12320803 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6A73DC49EBE for ; Tue, 15 Jun 2021 09:18:25 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4C5D26142E for ; Tue, 15 Jun 2021 09:18:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231409AbhFOJU1 (ORCPT ); Tue, 15 Jun 2021 05:20:27 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]:46258 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231282AbhFOJUU (ORCPT ); Tue, 15 Jun 2021 05:20:20 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 2D8C71FD56; Tue, 15 Jun 2021 09:18:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3n6F+0dzAE2kVwo9l0/Z2iXaLof65XpBCWpGOoaSj48=; b=0rr+BXP5AM14wOLP3tqyhrym+gBi46GdSbHD+PSAIYWlkW2aomF05xmKuEZLsgMXNWBOy7 3SB8q4v4kDWBcGxR202sJOg4y1Blyl4cHVWJtAItiR8lEz8kTyFtsv6IrWeN87P+ihjFEN PCuFT69vQaLnsDcxZRAl3dKKjMaUhkc= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3n6F+0dzAE2kVwo9l0/Z2iXaLof65XpBCWpGOoaSj48=; b=Po2EhGeDkmQuh5NbcRS7b5d3LmO9PaKxnLzApds4mOhjfjsSVoZdwkoXEcL26/sc2vrJlW AWTVhvqGIc1JyHBA== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id 19D89A3B94; Tue, 15 Jun 2021 09:18:15 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 411101F2CC1; Tue, 15 Jun 2021 11:18:14 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Jan Kara , Christoph Hellwig , "Darrick J . Wong" Subject: [PATCH 08/14] xfs: Convert to use invalidate_lock Date: Tue, 15 Jun 2021 11:17:58 +0200 Message-Id: <20210615091814.28626-8-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210615090844.6045-1-jack@suse.cz> References: <20210615090844.6045-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=7925; h=from:subject; bh=P0ASIWl83w3dRvaVxmpQbUU3k4N/QT1nYATS5hVYspY=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgyHBG4kjlP9r+znmpJLJQeuNjdsTv60CbzATeJygo yVUnsH+JATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYMhwRgAKCRCcnaoHP2RA2fkbB/ 9BgWtec4/ZbM1vbKvvrC9SBmDpS3xT0ixGsM6AGpUEFEnDQ2oHO1HIiWGqUzD/LAQQkxEZQPCfjPc7 fSKsUv5xh2iLNp8A1lPY4hgEXCa47zIvQ1LaydUFxGuzVGyg3D5oJj06xq2ftzctLFwGXv84gN/dpy Tvw4KGwNKpzC3ym0wFWNC2hHFY6fcu7G4sowvpD2/cWV/7HVIt6qElizGh8PPWAlQev7tFc9KdKcJn oocrwK6pUMYbK6qHi2dPyP/3A7W+FJjuyvXe+xMsbL9vgSAR0vKy9NGdYQjAB6sfLt/lm0ZFlXKx7E IvX/1vpLKspFXuAW2dfg0v7soc4IBC X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Use invalidate_lock instead of XFS internal i_mmap_lock. The intended purpose of invalidate_lock is exactly the same. Note that the locking in __xfs_filemap_fault() slightly changes as filemap_fault() already takes invalidate_lock. Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong CC: CC: "Darrick J. Wong" Signed-off-by: Jan Kara --- fs/xfs/xfs_file.c | 13 +++++++----- fs/xfs/xfs_inode.c | 50 ++++++++++++++++++++++++---------------------- fs/xfs/xfs_inode.h | 1 - fs/xfs/xfs_super.c | 2 -- 4 files changed, 34 insertions(+), 32 deletions(-) diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 396ef36dcd0a..7cb7703c2209 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -1282,7 +1282,7 @@ xfs_file_llseek( * * mmap_lock (MM) * sb_start_pagefault(vfs, freeze) - * i_mmaplock (XFS - truncate serialisation) + * invalidate_lock (vfs/XFS_MMAPLOCK - truncate serialisation) * page_lock (MM) * i_lock (XFS - extent map serialisation) */ @@ -1303,24 +1303,27 @@ __xfs_filemap_fault( file_update_time(vmf->vma->vm_file); } - xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED); if (IS_DAX(inode)) { pfn_t pfn; + xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED); ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, (write_fault && !vmf->cow_page) ? &xfs_direct_write_iomap_ops : &xfs_read_iomap_ops); if (ret & VM_FAULT_NEEDDSYNC) ret = dax_finish_sync_fault(vmf, pe_size, pfn); + xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED); } else { - if (write_fault) + if (write_fault) { + xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED); ret = iomap_page_mkwrite(vmf, &xfs_buffered_write_iomap_ops); - else + xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED); + } else { ret = filemap_fault(vmf); + } } - xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED); if (write_fault) sb_end_pagefault(inode->i_sb); diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index ffd47217a8fa..da0c3b62dae6 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -131,7 +131,7 @@ xfs_ilock_attr_map_shared( /* * In addition to i_rwsem in the VFS inode, the xfs inode contains 2 - * multi-reader locks: i_mmap_lock and the i_lock. This routine allows + * multi-reader locks: invalidate_lock and the i_lock. This routine allows * various combinations of the locks to be obtained. * * The 3 locks should always be ordered so that the IO lock is obtained first, @@ -139,23 +139,23 @@ xfs_ilock_attr_map_shared( * * Basic locking order: * - * i_rwsem -> i_mmap_lock -> page_lock -> i_ilock + * i_rwsem -> invalidate_lock -> page_lock -> i_ilock * * mmap_lock locking order: * * i_rwsem -> page lock -> mmap_lock - * mmap_lock -> i_mmap_lock -> page_lock + * mmap_lock -> invalidate_lock -> page_lock * * The difference in mmap_lock locking order mean that we cannot hold the - * i_mmap_lock over syscall based read(2)/write(2) based IO. These IO paths can - * fault in pages during copy in/out (for buffered IO) or require the mmap_lock - * in get_user_pages() to map the user pages into the kernel address space for - * direct IO. Similarly the i_rwsem cannot be taken inside a page fault because - * page faults already hold the mmap_lock. + * invalidate_lock over syscall based read(2)/write(2) based IO. These IO paths + * can fault in pages during copy in/out (for buffered IO) or require the + * mmap_lock in get_user_pages() to map the user pages into the kernel address + * space for direct IO. Similarly the i_rwsem cannot be taken inside a page + * fault because page faults already hold the mmap_lock. * * Hence to serialise fully against both syscall and mmap based IO, we need to - * take both the i_rwsem and the i_mmap_lock. These locks should *only* be both - * taken in places where we need to invalidate the page cache in a race + * take both the i_rwsem and the invalidate_lock. These locks should *only* be + * both taken in places where we need to invalidate the page cache in a race * free manner (e.g. truncate, hole punch and other extent manipulation * functions). */ @@ -187,10 +187,13 @@ xfs_ilock( XFS_IOLOCK_DEP(lock_flags)); } - if (lock_flags & XFS_MMAPLOCK_EXCL) - mrupdate_nested(&ip->i_mmaplock, XFS_MMAPLOCK_DEP(lock_flags)); - else if (lock_flags & XFS_MMAPLOCK_SHARED) - mraccess_nested(&ip->i_mmaplock, XFS_MMAPLOCK_DEP(lock_flags)); + if (lock_flags & XFS_MMAPLOCK_EXCL) { + down_write_nested(&VFS_I(ip)->i_mapping->invalidate_lock, + XFS_MMAPLOCK_DEP(lock_flags)); + } else if (lock_flags & XFS_MMAPLOCK_SHARED) { + down_read_nested(&VFS_I(ip)->i_mapping->invalidate_lock, + XFS_MMAPLOCK_DEP(lock_flags)); + } if (lock_flags & XFS_ILOCK_EXCL) mrupdate_nested(&ip->i_lock, XFS_ILOCK_DEP(lock_flags)); @@ -239,10 +242,10 @@ xfs_ilock_nowait( } if (lock_flags & XFS_MMAPLOCK_EXCL) { - if (!mrtryupdate(&ip->i_mmaplock)) + if (!down_write_trylock(&VFS_I(ip)->i_mapping->invalidate_lock)) goto out_undo_iolock; } else if (lock_flags & XFS_MMAPLOCK_SHARED) { - if (!mrtryaccess(&ip->i_mmaplock)) + if (!down_read_trylock(&VFS_I(ip)->i_mapping->invalidate_lock)) goto out_undo_iolock; } @@ -257,9 +260,9 @@ xfs_ilock_nowait( out_undo_mmaplock: if (lock_flags & XFS_MMAPLOCK_EXCL) - mrunlock_excl(&ip->i_mmaplock); + up_write(&VFS_I(ip)->i_mapping->invalidate_lock); else if (lock_flags & XFS_MMAPLOCK_SHARED) - mrunlock_shared(&ip->i_mmaplock); + up_read(&VFS_I(ip)->i_mapping->invalidate_lock); out_undo_iolock: if (lock_flags & XFS_IOLOCK_EXCL) up_write(&VFS_I(ip)->i_rwsem); @@ -306,9 +309,9 @@ xfs_iunlock( up_read(&VFS_I(ip)->i_rwsem); if (lock_flags & XFS_MMAPLOCK_EXCL) - mrunlock_excl(&ip->i_mmaplock); + up_write(&VFS_I(ip)->i_mapping->invalidate_lock); else if (lock_flags & XFS_MMAPLOCK_SHARED) - mrunlock_shared(&ip->i_mmaplock); + up_read(&VFS_I(ip)->i_mapping->invalidate_lock); if (lock_flags & XFS_ILOCK_EXCL) mrunlock_excl(&ip->i_lock); @@ -334,7 +337,7 @@ xfs_ilock_demote( if (lock_flags & XFS_ILOCK_EXCL) mrdemote(&ip->i_lock); if (lock_flags & XFS_MMAPLOCK_EXCL) - mrdemote(&ip->i_mmaplock); + downgrade_write(&VFS_I(ip)->i_mapping->invalidate_lock); if (lock_flags & XFS_IOLOCK_EXCL) downgrade_write(&VFS_I(ip)->i_rwsem); @@ -374,9 +377,8 @@ xfs_isilocked( } if (lock_flags & (XFS_MMAPLOCK_EXCL|XFS_MMAPLOCK_SHARED)) { - if (!(lock_flags & XFS_MMAPLOCK_SHARED)) - return !!ip->i_mmaplock.mr_writer; - return rwsem_is_locked(&ip->i_mmaplock.mr_lock); + return __xfs_rwsem_islocked(&VFS_I(ip)->i_rwsem, + (lock_flags & XFS_IOLOCK_SHARED)); } if (lock_flags & (XFS_IOLOCK_EXCL | XFS_IOLOCK_SHARED)) { diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 4659e1568966..313635544058 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -40,7 +40,6 @@ typedef struct xfs_inode { /* Transaction and locking information. */ struct xfs_inode_log_item *i_itemp; /* logging information */ mrlock_t i_lock; /* inode lock */ - mrlock_t i_mmaplock; /* inode mmap IO lock */ atomic_t i_pincount; /* inode pin count */ /* diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index a2dab05332ac..eeaf44910b5f 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -715,8 +715,6 @@ xfs_fs_inode_init_once( atomic_set(&ip->i_pincount, 0); spin_lock_init(&ip->i_flags_lock); - mrlock_init(&ip->i_mmaplock, MRLOCK_ALLOW_EQUAL_PRI|MRLOCK_BARRIER, - "xfsino", ip->i_ino); mrlock_init(&ip->i_lock, MRLOCK_ALLOW_EQUAL_PRI|MRLOCK_BARRIER, "xfsino", ip->i_ino); } From patchwork Tue Jun 15 09:17:59 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12320827 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 95ABAC49EB7 for ; Tue, 15 Jun 2021 09:18:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7810061245 for ; Tue, 15 Jun 2021 09:18:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231614AbhFOJUv (ORCPT ); Tue, 15 Jun 2021 05:20:51 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:54498 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230519AbhFOJUV (ORCPT ); Tue, 15 Jun 2021 05:20:21 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 2CB2D219CE; Tue, 15 Jun 2021 09:18:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jzpVj8JlrB8ydY+vFRnq+G9CoeRUVMlFtm7aKp4N1TU=; b=jIyNAG9KFenh2z2siQJXSnp6wk4j3u0lHvGIRuc95y2Iz7dJphW6jZer7NjmAMG3UAGSi8 p2Gypy1k/cSMKrfPUvmpuqqmB51SVvAxH96AQ8OO5vuL4HqNwe5dNHmrQO+7omOhgmDGi+ arYs+IWjE2JHjCyb+wNGIIcOkOlby0k= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jzpVj8JlrB8ydY+vFRnq+G9CoeRUVMlFtm7aKp4N1TU=; b=zdbe+wLegF9N6BdAYz21hwpOcpeFQLPCiznBaKEgMFDJgyBhBwUdkS2CCn44d3XZommyI9 oXDi/0dmGOrNp2Bg== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id 1A965A3B99; Tue, 15 Jun 2021 09:18:15 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 45F9A1F2CC2; Tue, 15 Jun 2021 11:18:14 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Jan Kara , "Darrick J. Wong" Subject: [PATCH 09/14] xfs: Convert double locking of MMAPLOCK to use VFS helpers Date: Tue, 15 Jun 2021 11:17:59 +0200 Message-Id: <20210615091814.28626-9-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210615090844.6045-1-jack@suse.cz> References: <20210615090844.6045-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=4917; h=from:subject; bh=Pt3UjgWOKw5VBXlHinJjhLE0BJn5KyvbzUsgusJoyQ8=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgyHBHQz7sYYGESLOPUM9je2BwnQMR+94bnROYEA4r PHSnKz2JATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYMhwRwAKCRCcnaoHP2RA2ZcrB/ 9gzkntEzMfcOHZzvFBHpOx8w6qa0DVwkTogkJwuD4ehyxoN26RDHjrHkpfGyuWmge5Xcv9Xe5qEzFU DGYDGiBq1WPXUespIfaU645ADMMvxG8mcSLWgoN8D2yaJfdCkNjHckHDbJGPAyg7+suX2Jjc/F6HQO qKDWxh+eH9kUWsv6zTuK/BGNGjb53iH4swqvoVGw29hTwqMA28nYxUM36aRdCe01AVXlq5ruTMiRfg xD0/4Jat8TpArX3qchGrDyDlRz2ff7SrsyOZ5Qf/Lieqn/YbcoQgskaWyXlMUv3LQANbfCq5+zeJ4Q 3ffjlRg4ZpEVIz1Q5fmJeRMUsHUJ9h X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Convert places in XFS that take MMAPLOCK for two inodes to use helper VFS provides for it (filemap_invalidate_down_write_two()). Note that this changes lock ordering for MMAPLOCK from inode number based ordering to pointer based ordering VFS generally uses. CC: "Darrick J. Wong" Reviewed-by: Darrick J. Wong Signed-off-by: Jan Kara Reviewed-by: Christoph Hellwig --- fs/xfs/xfs_bmap_util.c | 15 ++++++++------- fs/xfs/xfs_inode.c | 37 +++++++++++-------------------------- 2 files changed, 19 insertions(+), 33 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 0936f3a96fe6..ef18269ad1f7 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1626,7 +1626,6 @@ xfs_swap_extents( struct xfs_bstat *sbp = &sxp->sx_stat; int src_log_flags, target_log_flags; int error = 0; - int lock_flags; uint64_t f; int resblks = 0; unsigned int flags = 0; @@ -1638,8 +1637,8 @@ xfs_swap_extents( * do the rest of the checks. */ lock_two_nondirectories(VFS_I(ip), VFS_I(tip)); - lock_flags = XFS_MMAPLOCK_EXCL; - xfs_lock_two_inodes(ip, XFS_MMAPLOCK_EXCL, tip, XFS_MMAPLOCK_EXCL); + filemap_invalidate_lock_two(VFS_I(ip)->i_mapping, + VFS_I(tip)->i_mapping); /* Verify that both files have the same format */ if ((VFS_I(ip)->i_mode & S_IFMT) != (VFS_I(tip)->i_mode & S_IFMT)) { @@ -1711,7 +1710,6 @@ xfs_swap_extents( * or cancel will unlock the inodes from this point onwards. */ xfs_lock_two_inodes(ip, XFS_ILOCK_EXCL, tip, XFS_ILOCK_EXCL); - lock_flags |= XFS_ILOCK_EXCL; xfs_trans_ijoin(tp, ip, 0); xfs_trans_ijoin(tp, tip, 0); @@ -1830,13 +1828,16 @@ xfs_swap_extents( trace_xfs_swap_extent_after(ip, 0); trace_xfs_swap_extent_after(tip, 1); +out_unlock_ilock: + xfs_iunlock(ip, XFS_ILOCK_EXCL); + xfs_iunlock(tip, XFS_ILOCK_EXCL); out_unlock: - xfs_iunlock(ip, lock_flags); - xfs_iunlock(tip, lock_flags); + filemap_invalidate_unlock_two(VFS_I(ip)->i_mapping, + VFS_I(tip)->i_mapping); unlock_two_nondirectories(VFS_I(ip), VFS_I(tip)); return error; out_trans_cancel: xfs_trans_cancel(tp); - goto out_unlock; + goto out_unlock_ilock; } diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index da0c3b62dae6..c963b9dc484b 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -551,12 +551,10 @@ xfs_lock_inodes( } /* - * xfs_lock_two_inodes() can only be used to lock one type of lock at a time - - * the mmaplock or the ilock, but not more than one type at a time. If we lock - * more than one at a time, lockdep will report false positives saying we have - * violated locking orders. The iolock must be double-locked separately since - * we use i_rwsem for that. We now support taking one lock EXCL and the other - * SHARED. + * xfs_lock_two_inodes() can only be used to lock ilock. The iolock and + * mmaplock must be double-locked separately since we use i_rwsem and + * invalidate_lock for that. We now support taking one lock EXCL and the + * other SHARED. */ void xfs_lock_two_inodes( @@ -574,15 +572,8 @@ xfs_lock_two_inodes( ASSERT(hweight32(ip1_mode) == 1); ASSERT(!(ip0_mode & (XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL))); ASSERT(!(ip1_mode & (XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL))); - ASSERT(!(ip0_mode & (XFS_MMAPLOCK_SHARED|XFS_MMAPLOCK_EXCL)) || - !(ip0_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL))); - ASSERT(!(ip1_mode & (XFS_MMAPLOCK_SHARED|XFS_MMAPLOCK_EXCL)) || - !(ip1_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL))); - ASSERT(!(ip1_mode & (XFS_MMAPLOCK_SHARED|XFS_MMAPLOCK_EXCL)) || - !(ip0_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL))); - ASSERT(!(ip0_mode & (XFS_MMAPLOCK_SHARED|XFS_MMAPLOCK_EXCL)) || - !(ip1_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL))); - + ASSERT(!(ip0_mode & (XFS_MMAPLOCK_SHARED|XFS_MMAPLOCK_EXCL))); + ASSERT(!(ip1_mode & (XFS_MMAPLOCK_SHARED|XFS_MMAPLOCK_EXCL))); ASSERT(ip0->i_ino != ip1->i_ino); if (ip0->i_ino > ip1->i_ino) { @@ -3774,11 +3765,8 @@ xfs_ilock2_io_mmap( ret = xfs_iolock_two_inodes_and_break_layout(VFS_I(ip1), VFS_I(ip2)); if (ret) return ret; - if (ip1 == ip2) - xfs_ilock(ip1, XFS_MMAPLOCK_EXCL); - else - xfs_lock_two_inodes(ip1, XFS_MMAPLOCK_EXCL, - ip2, XFS_MMAPLOCK_EXCL); + filemap_invalidate_lock_two(VFS_I(ip1)->i_mapping, + VFS_I(ip2)->i_mapping); return 0; } @@ -3788,12 +3776,9 @@ xfs_iunlock2_io_mmap( struct xfs_inode *ip1, struct xfs_inode *ip2) { - bool same_inode = (ip1 == ip2); - - xfs_iunlock(ip2, XFS_MMAPLOCK_EXCL); - if (!same_inode) - xfs_iunlock(ip1, XFS_MMAPLOCK_EXCL); + filemap_invalidate_unlock_two(VFS_I(ip1)->i_mapping, + VFS_I(ip2)->i_mapping); inode_unlock(VFS_I(ip2)); - if (!same_inode) + if (ip1 != ip2) inode_unlock(VFS_I(ip1)); } From patchwork Tue Jun 15 09:18:00 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12320823 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, UNWANTED_LANGUAGE_BODY,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8F308C49EA7 for ; Tue, 15 Jun 2021 09:18:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 79C9261432 for ; Tue, 15 Jun 2021 09:18:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231588AbhFOJUr (ORCPT ); Tue, 15 Jun 2021 05:20:47 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:54714 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231354AbhFOJUX (ORCPT ); Tue, 15 Jun 2021 05:20:23 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 466B1219D7; Tue, 15 Jun 2021 09:18:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=emP2yQ1eljPd1NFX6C4KS1pG2iKhLeUQkNZ2AuwC2+s=; b=Ah4EmbVQ6M39Y6+LU7t5DiCtD8hmTR3/RGkh8tJS5uPiaiUD/ZFzLAf39R9vJhHv1ZFrNg rBVfWnox2lo0wiGuZOyP9FN3z7Ckf/lgG42jhWZz+3BA6qKjAxv25g68m0gAv7zqISGoqY llRqrQ+QCGd6kMX8Rkvq6oLOdVeal2M= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=emP2yQ1eljPd1NFX6C4KS1pG2iKhLeUQkNZ2AuwC2+s=; b=vSYdhpsCRnSXzaqR2+4HccdDgz/dZS4DZTpvxk9nYt/ZMaxbF9lFh/PZNIXU3vyPSMZ5Hn shWE2wmxGojI+MDQ== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id 2375CA3B93; Tue, 15 Jun 2021 09:18:15 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 4B42F1F2CC3; Tue, 15 Jun 2021 11:18:14 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Jan Kara Subject: [PATCH 10/14] zonefs: Convert to using invalidate_lock Date: Tue, 15 Jun 2021 11:18:00 +0200 Message-Id: <20210615091814.28626-10-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210615090844.6045-1-jack@suse.cz> References: <20210615090844.6045-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=3534; h=from:subject; bh=X1+FAHnRq/hvKiFmSNKc+MwXBtDprJwr7H1BxXLJ9wE=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgyHBIL5zkdMExsJLTZiQSWW/f9riYmtp7RAK4a4BF P6yR1FuJATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYMhwSAAKCRCcnaoHP2RA2UjtCA DoGEOo5arbFkmjtyizZRD5fM5r5JrObCwT2t+IQJ3QFCXxSozqyYb8zdpzsUOCM2dt827sgezUBiyD Vu951paF9lgXzwVfR4FN3ipP6AxAoiPXiFOi1wEJKe9v9uz3RxZczz6Bw35QqUkxbSR/8+2jfiGS63 XXrEEZLjtDjyjwMshEgx2uym2/CX0BTl9G844bnsnWrguD8USkL1mj8coFJDG5CQrBiQ906PvCvOtV nUhYdWPV8a4dCwkPsRjykm5JceG862taTZwcqe3kWyUz1oNZ3/OZmLvag09+5UnF5LBUYqVZk29tQ1 yXRMzbLBXY8cwUKqFfjzq+/PVbqkf3 X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Use invalidate_lock instead of zonefs' private i_mmap_sem. The intended purpose is exactly the same. CC: Damien Le Moal CC: Johannes Thumshirn CC: Acked-by: Damien Le Moal Signed-off-by: Jan Kara Reviewed-by: Christoph Hellwig --- fs/zonefs/super.c | 23 +++++------------------ fs/zonefs/zonefs.h | 7 +++---- 2 files changed, 8 insertions(+), 22 deletions(-) diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c index cd145d318b17..8931a566038e 100644 --- a/fs/zonefs/super.c +++ b/fs/zonefs/super.c @@ -462,7 +462,7 @@ static int zonefs_file_truncate(struct inode *inode, loff_t isize) inode_dio_wait(inode); /* Serialize against page faults */ - down_write(&zi->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); /* Serialize against zonefs_iomap_begin() */ mutex_lock(&zi->i_truncate_mutex); @@ -500,7 +500,7 @@ static int zonefs_file_truncate(struct inode *inode, loff_t isize) unlock: mutex_unlock(&zi->i_truncate_mutex); - up_write(&zi->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); return ret; } @@ -575,18 +575,6 @@ static int zonefs_file_fsync(struct file *file, loff_t start, loff_t end, return ret; } -static vm_fault_t zonefs_filemap_fault(struct vm_fault *vmf) -{ - struct zonefs_inode_info *zi = ZONEFS_I(file_inode(vmf->vma->vm_file)); - vm_fault_t ret; - - down_read(&zi->i_mmap_sem); - ret = filemap_fault(vmf); - up_read(&zi->i_mmap_sem); - - return ret; -} - static vm_fault_t zonefs_filemap_page_mkwrite(struct vm_fault *vmf) { struct inode *inode = file_inode(vmf->vma->vm_file); @@ -607,16 +595,16 @@ static vm_fault_t zonefs_filemap_page_mkwrite(struct vm_fault *vmf) file_update_time(vmf->vma->vm_file); /* Serialize against truncates */ - down_read(&zi->i_mmap_sem); + filemap_invalidate_lock_shared(inode->i_mapping); ret = iomap_page_mkwrite(vmf, &zonefs_iomap_ops); - up_read(&zi->i_mmap_sem); + filemap_invalidate_unlock_shared(inode->i_mapping); sb_end_pagefault(inode->i_sb); return ret; } static const struct vm_operations_struct zonefs_file_vm_ops = { - .fault = zonefs_filemap_fault, + .fault = filemap_fault, .map_pages = filemap_map_pages, .page_mkwrite = zonefs_filemap_page_mkwrite, }; @@ -1158,7 +1146,6 @@ static struct inode *zonefs_alloc_inode(struct super_block *sb) inode_init_once(&zi->i_vnode); mutex_init(&zi->i_truncate_mutex); - init_rwsem(&zi->i_mmap_sem); zi->i_wr_refcnt = 0; return &zi->i_vnode; diff --git a/fs/zonefs/zonefs.h b/fs/zonefs/zonefs.h index 51141907097c..7b147907c328 100644 --- a/fs/zonefs/zonefs.h +++ b/fs/zonefs/zonefs.h @@ -70,12 +70,11 @@ struct zonefs_inode_info { * and changes to the inode private data, and in particular changes to * a sequential file size on completion of direct IO writes. * Serialization of mmap read IOs with truncate and syscall IO - * operations is done with i_mmap_sem in addition to i_truncate_mutex. - * Only zonefs_seq_file_truncate() takes both lock (i_mmap_sem first, - * i_truncate_mutex second). + * operations is done with invalidate_lock in addition to + * i_truncate_mutex. Only zonefs_seq_file_truncate() takes both lock + * (invalidate_lock first, i_truncate_mutex second). */ struct mutex i_truncate_mutex; - struct rw_semaphore i_mmap_sem; /* guarded by i_truncate_mutex */ unsigned int i_wr_refcnt; From patchwork Tue Jun 15 09:18:01 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12320825 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BE9F1C49ED6 for ; Tue, 15 Jun 2021 09:18:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9D8876142E for ; Tue, 15 Jun 2021 09:18:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231470AbhFOJUt (ORCPT ); Tue, 15 Jun 2021 05:20:49 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:54678 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231350AbhFOJUX (ORCPT ); Tue, 15 Jun 2021 05:20:23 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 3F1B9219D5; Tue, 15 Jun 2021 09:18:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gxy4E4M6RlL/AvZpYrufblUBcih2aW4l6tpccnvD6oM=; b=TqZv4+btCO83UzmhuMydU56xE4Mr68bgBcP7HYBDRFh7AHNQDuC2PgNSQ8pusl0eYSUmsZ bqbo6JFJYR+wj4I6QViRwKD6UWCCj7FyJcr/3vDaKs+H2AW2IZlXpQVBWouReF8RIrJNxt a4VSpONkD6TQ2CsV6Chz+tpeNXF6pmI= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gxy4E4M6RlL/AvZpYrufblUBcih2aW4l6tpccnvD6oM=; b=GfbtEBj9QvdLLo+6b2NyELeXT8CPrjZVq3BIhTYGuWhSBbyBZJgWkaQm33q0rRZ2lUSHn/ X91cDKdwSrCyPpBw== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id 1D7CFA3B9B; Tue, 15 Jun 2021 09:18:15 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 4EC361F2CC4; Tue, 15 Jun 2021 11:18:14 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Jan Kara Subject: [PATCH 11/14] f2fs: Convert to using invalidate_lock Date: Tue, 15 Jun 2021 11:18:01 +0200 Message-Id: <20210615091814.28626-11-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210615090844.6045-1-jack@suse.cz> References: <20210615090844.6045-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=11698; h=from:subject; bh=XQE+RQtvOgf7IDyNXJkSiTagf22oyLHsdZOO7629/zo=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgyHBJCopAHYDFDFNQOcMX1zduyEZW6Te4cIIjZsdV Ls+/WV2JATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYMhwSQAKCRCcnaoHP2RA2UM1CA Dqu7WyBHvYudHVCseebPIId5TCFasDWCNiTap22rf4z7pEWYmklhUU7ezj9VadLk4UGvFLgpPZ+syU 8dbu1YMMWtdwkIAk0ybZjbE32A/iJ8t8bIUc4JAEcDWesF4ZKtnvXFfiO/f4bn6xnJkWuq+wMQQquQ t9Aj13cGE5OXQWBSREUD2g7a55Dfir8sNaPwAsFNp4P6XP30SUW1hV1b+Ceh2m6WKIDNgXPwYz8zaL GoTI6Huz8JSifUapj5gc8BJ9HRe3i+2iApyXaPIuzkwHEy988EuGjjcvbRXXLjhplt8vJtAio4d3Ns aR4nsaqeMMwH6s/fJ2qtCr47QXane6 X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Use invalidate_lock instead of f2fs' private i_mmap_sem. The intended purpose is exactly the same. By this conversion we fix a long standing race between hole punching and read(2) / readahead(2) paths that can lead to stale page cache contents. CC: Jaegeuk Kim CC: Chao Yu CC: linux-f2fs-devel@lists.sourceforge.net Acked-by: Chao Yu Signed-off-by: Jan Kara --- fs/f2fs/data.c | 4 ++-- fs/f2fs/f2fs.h | 1 - fs/f2fs/file.c | 62 ++++++++++++++++++++++++------------------------- fs/f2fs/super.c | 1 - 4 files changed, 32 insertions(+), 36 deletions(-) diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index 009a09fb9d88..4c2bad8edcf4 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -3165,12 +3165,12 @@ static void f2fs_write_failed(struct address_space *mapping, loff_t to) /* In the fs-verity case, f2fs_end_enable_verity() does the truncate */ if (to > i_size && !f2fs_verity_in_progress(inode)) { down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); - down_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_lock(mapping); truncate_pagecache(inode, i_size); f2fs_truncate_blocks(inode, i_size, true); - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); } } diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index c83d90125ebd..ef0eb8b43fff 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -748,7 +748,6 @@ struct f2fs_inode_info { /* avoid racing between foreground op and gc */ struct rw_semaphore i_gc_rwsem[2]; - struct rw_semaphore i_mmap_sem; struct rw_semaphore i_xattr_sem; /* avoid racing between reading and changing EAs */ int i_extra_isize; /* size of extra space located in i_addr */ diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index ceb575f99048..75e1fe908b7b 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -38,10 +38,7 @@ static vm_fault_t f2fs_filemap_fault(struct vm_fault *vmf) struct inode *inode = file_inode(vmf->vma->vm_file); vm_fault_t ret; - down_read(&F2FS_I(inode)->i_mmap_sem); ret = filemap_fault(vmf); - up_read(&F2FS_I(inode)->i_mmap_sem); - if (!ret) f2fs_update_iostat(F2FS_I_SB(inode), APP_MAPPED_READ_IO, F2FS_BLKSIZE); @@ -102,7 +99,7 @@ static vm_fault_t f2fs_vm_page_mkwrite(struct vm_fault *vmf) f2fs_bug_on(sbi, f2fs_has_inline_data(inode)); file_update_time(vmf->vma->vm_file); - down_read(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_lock_shared(inode->i_mapping); lock_page(page); if (unlikely(page->mapping != inode->i_mapping || page_offset(page) > i_size_read(inode) || @@ -161,7 +158,7 @@ static vm_fault_t f2fs_vm_page_mkwrite(struct vm_fault *vmf) trace_f2fs_vm_page_mkwrite(page, DATA); out_sem: - up_read(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock_shared(inode->i_mapping); sb_end_pagefault(inode->i_sb); err: @@ -942,7 +939,7 @@ int f2fs_setattr(struct user_namespace *mnt_userns, struct dentry *dentry, } down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); - down_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); truncate_setsize(inode, attr->ia_size); @@ -952,7 +949,7 @@ int f2fs_setattr(struct user_namespace *mnt_userns, struct dentry *dentry, * do not trim all blocks after i_size if target size is * larger than i_size. */ - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); if (err) return err; @@ -1097,7 +1094,7 @@ static int punch_hole(struct inode *inode, loff_t offset, loff_t len) blk_end = (loff_t)pg_end << PAGE_SHIFT; down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); - down_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_lock(mapping); truncate_inode_pages_range(mapping, blk_start, blk_end - 1); @@ -1106,7 +1103,7 @@ static int punch_hole(struct inode *inode, loff_t offset, loff_t len) ret = f2fs_truncate_hole(inode, pg_start, pg_end); f2fs_unlock_op(sbi); - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); } } @@ -1341,7 +1338,7 @@ static int f2fs_do_collapse(struct inode *inode, loff_t offset, loff_t len) /* avoid gc operation during block exchange */ down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); - down_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); f2fs_lock_op(sbi); f2fs_drop_extent_tree(inode); @@ -1349,7 +1346,7 @@ static int f2fs_do_collapse(struct inode *inode, loff_t offset, loff_t len) ret = __exchange_data_block(inode, inode, end, start, nrpages - end, true); f2fs_unlock_op(sbi); - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); return ret; } @@ -1380,13 +1377,13 @@ static int f2fs_collapse_range(struct inode *inode, loff_t offset, loff_t len) return ret; /* write out all moved pages, if possible */ - down_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); filemap_write_and_wait_range(inode->i_mapping, offset, LLONG_MAX); truncate_pagecache(inode, offset); new_size = i_size_read(inode) - len; ret = f2fs_truncate_blocks(inode, new_size, true); - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); if (!ret) f2fs_i_size_write(inode, new_size); return ret; @@ -1486,7 +1483,7 @@ static int f2fs_zero_range(struct inode *inode, loff_t offset, loff_t len, pgoff_t end; down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); - down_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_lock(mapping); truncate_pagecache_range(inode, (loff_t)index << PAGE_SHIFT, @@ -1498,7 +1495,7 @@ static int f2fs_zero_range(struct inode *inode, loff_t offset, loff_t len, ret = f2fs_get_dnode_of_data(&dn, index, ALLOC_NODE); if (ret) { f2fs_unlock_op(sbi); - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); goto out; } @@ -1510,7 +1507,7 @@ static int f2fs_zero_range(struct inode *inode, loff_t offset, loff_t len, f2fs_put_dnode(&dn); f2fs_unlock_op(sbi); - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); f2fs_balance_fs(sbi, dn.node_changed); @@ -1545,6 +1542,7 @@ static int f2fs_zero_range(struct inode *inode, loff_t offset, loff_t len, static int f2fs_insert_range(struct inode *inode, loff_t offset, loff_t len) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); + struct address_space *mapping = inode->i_mapping; pgoff_t nr, pg_start, pg_end, delta, idx; loff_t new_size; int ret = 0; @@ -1567,14 +1565,14 @@ static int f2fs_insert_range(struct inode *inode, loff_t offset, loff_t len) f2fs_balance_fs(sbi, true); - down_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_lock(mapping); ret = f2fs_truncate_blocks(inode, i_size_read(inode), true); - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); if (ret) return ret; /* write out all dirty pages from offset */ - ret = filemap_write_and_wait_range(inode->i_mapping, offset, LLONG_MAX); + ret = filemap_write_and_wait_range(mapping, offset, LLONG_MAX); if (ret) return ret; @@ -1585,7 +1583,7 @@ static int f2fs_insert_range(struct inode *inode, loff_t offset, loff_t len) /* avoid gc operation during block exchange */ down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); - down_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_lock(mapping); truncate_pagecache(inode, offset); while (!ret && idx > pg_start) { @@ -1601,14 +1599,14 @@ static int f2fs_insert_range(struct inode *inode, loff_t offset, loff_t len) idx + delta, nr, false); f2fs_unlock_op(sbi); } - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); /* write out all moved pages, if possible */ - down_write(&F2FS_I(inode)->i_mmap_sem); - filemap_write_and_wait_range(inode->i_mapping, offset, LLONG_MAX); + filemap_invalidate_lock(mapping); + filemap_write_and_wait_range(mapping, offset, LLONG_MAX); truncate_pagecache(inode, offset); - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); if (!ret) f2fs_i_size_write(inode, new_size); @@ -3443,7 +3441,7 @@ static int f2fs_release_compress_blocks(struct file *filp, unsigned long arg) goto out; down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); - down_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); last_idx = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE); @@ -3479,7 +3477,7 @@ static int f2fs_release_compress_blocks(struct file *filp, unsigned long arg) } up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); out: inode_unlock(inode); @@ -3596,7 +3594,7 @@ static int f2fs_reserve_compress_blocks(struct file *filp, unsigned long arg) } down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); - down_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); last_idx = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE); @@ -3632,7 +3630,7 @@ static int f2fs_reserve_compress_blocks(struct file *filp, unsigned long arg) } up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); if (ret >= 0) { F2FS_I(inode)->i_flags &= ~F2FS_IMMUTABLE_FL; @@ -3752,7 +3750,7 @@ static int f2fs_sec_trim_file(struct file *filp, unsigned long arg) goto err; down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); - down_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_lock(mapping); ret = filemap_write_and_wait_range(mapping, range.start, to_end ? LLONG_MAX : end_addr - 1); @@ -3839,7 +3837,7 @@ static int f2fs_sec_trim_file(struct file *filp, unsigned long arg) ret = f2fs_secure_erase(prev_bdev, inode, prev_index, prev_block, len, range.flags); out: - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); err: inode_unlock(inode); @@ -4314,9 +4312,9 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from) /* if we couldn't write data, we should deallocate blocks. */ if (preallocated && i_size_read(inode) < target_size) { down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); - down_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); f2fs_truncate(inode); - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); } diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 7d325bfaf65a..22e942aac7ad 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -1187,7 +1187,6 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb) mutex_init(&fi->inmem_lock); init_rwsem(&fi->i_gc_rwsem[READ]); init_rwsem(&fi->i_gc_rwsem[WRITE]); - init_rwsem(&fi->i_mmap_sem); init_rwsem(&fi->i_xattr_sem); /* Will be used by directory only */ From patchwork Tue Jun 15 09:18:02 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12320815 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 60FDDC4CC6E for ; Tue, 15 Jun 2021 09:18:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4208C61433 for ; Tue, 15 Jun 2021 09:18:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231536AbhFOJUm (ORCPT ); Tue, 15 Jun 2021 05:20:42 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:54712 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231355AbhFOJUX (ORCPT ); Tue, 15 Jun 2021 05:20:23 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 47609219D9; Tue, 15 Jun 2021 09:18:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=iscQRmTFfUtbnHbVan6D5Z1ED2H/9ug7FhtB+MiEd9M=; b=F36nxO6jsDB59OvP6BvsVojaNTSYgF6ry34z7TrGLPMDMCvXDaZ5rozcUt5IuRUZzBCwYg FKBmXIUBPxS9uyW7dRYXHvRFRdfe+TBdahNRQKJrp1enmXnwg9LaEVvFb959kr4NgyvzwQ WYmeh4VNot7FOpQLEMykPYeETqlIbDQ= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=iscQRmTFfUtbnHbVan6D5Z1ED2H/9ug7FhtB+MiEd9M=; b=jZgGz+7wnpBbulrUmil8jN4hUchx0lxG3F8qZe0EXtzKzdsee+KzcsJg1ubTmT/DH+Yqiy 7K6j69rxH2k6LnDg== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id 27F22A3B97; Tue, 15 Jun 2021 09:18:15 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 52F1A1F2CC5; Tue, 15 Jun 2021 11:18:14 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Jan Kara , Miklos Szeredi Subject: [PATCH 12/14] fuse: Convert to using invalidate_lock Date: Tue, 15 Jun 2021 11:18:02 +0200 Message-Id: <20210615091814.28626-12-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210615090844.6045-1-jack@suse.cz> References: <20210615090844.6045-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=11364; h=from:subject; bh=EntRuShPEtQtIt5wPMs3zYDgVJwnikp8NrWTTTMClnY=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgyHBKcImvaLLO3GLMUIkO1FQ9N7hyQrEethJtF+T2 OjnaH3yJATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYMhwSgAKCRCcnaoHP2RA2eqIB/ 0XsSJ9R41LEIlz8bJXFe/v4JafXPyU363Mf6Ut3D9CFPLm2J5aEqHO40DitTrnZT2W0Z2e5dFxFG7P KKR5CBn7XxYz5NZwW5e2WsU99js7TT//q+f6DAo+Rm0X+A3mag8GKmE2Hy0msG1xxIKXJmoRSemC+Q rQM9S+B8S80Uurablmo6GouAbyyA2yM4HUOR+HF6nE1dMKAriuEzAjakuxcN2TfpdNBAsOWFW/xayf bwODLhEyTqfMWKBAjjAehfBL2+FfeASdUJ6gjHBoXRIriwB7A3lpNW+dfVZ6yLEUHgzbNaK8/2Eo4Y +XqNnYMnmtZRUnrq5zCLXoKh8sw1TA X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Use invalidate_lock instead of fuse's private i_mmap_sem. The intended purpose is exactly the same. By this conversion we fix a long standing race between hole punching and read(2) / readahead(2) paths that can lead to stale page cache contents. CC: Miklos Szeredi Reviewed-by: Miklos Szeredi Signed-off-by: Jan Kara --- fs/fuse/dax.c | 50 +++++++++++++++++++++++------------------------- fs/fuse/dir.c | 11 ++++++----- fs/fuse/file.c | 10 +++++----- fs/fuse/fuse_i.h | 7 ------- fs/fuse/inode.c | 1 - 5 files changed, 35 insertions(+), 44 deletions(-) diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c index ff99ab2a3c43..08d2b7e2c3a0 100644 --- a/fs/fuse/dax.c +++ b/fs/fuse/dax.c @@ -443,12 +443,12 @@ static int fuse_setup_new_dax_mapping(struct inode *inode, loff_t pos, /* * Can't do inline reclaim in fault path. We call * dax_layout_busy_page() before we free a range. And - * fuse_wait_dax_page() drops fi->i_mmap_sem lock and requires it. - * In fault path we enter with fi->i_mmap_sem held and can't drop - * it. Also in fault path we hold fi->i_mmap_sem shared and not - * exclusive, so that creates further issues with fuse_wait_dax_page(). - * Hence return -EAGAIN and fuse_dax_fault() will wait for a memory - * range to become free and retry. + * fuse_wait_dax_page() drops mapping->invalidate_lock and requires it. + * In fault path we enter with mapping->invalidate_lock held and can't + * drop it. Also in fault path we hold mapping->invalidate_lock shared + * and not exclusive, so that creates further issues with + * fuse_wait_dax_page(). Hence return -EAGAIN and fuse_dax_fault() + * will wait for a memory range to become free and retry. */ if (flags & IOMAP_FAULT) { alloc_dmap = alloc_dax_mapping(fcd); @@ -512,7 +512,7 @@ static int fuse_upgrade_dax_mapping(struct inode *inode, loff_t pos, down_write(&fi->dax->sem); node = interval_tree_iter_first(&fi->dax->tree, idx, idx); - /* We are holding either inode lock or i_mmap_sem, and that should + /* We are holding either inode lock or invalidate_lock, and that should * ensure that dmap can't be truncated. We are holding a reference * on dmap and that should make sure it can't be reclaimed. So dmap * should still be there in tree despite the fact we dropped and @@ -659,14 +659,12 @@ static const struct iomap_ops fuse_iomap_ops = { static void fuse_wait_dax_page(struct inode *inode) { - struct fuse_inode *fi = get_fuse_inode(inode); - - up_write(&fi->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); schedule(); - down_write(&fi->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); } -/* Should be called with fi->i_mmap_sem lock held exclusively */ +/* Should be called with mapping->invalidate_lock held exclusively */ static int __fuse_dax_break_layouts(struct inode *inode, bool *retry, loff_t start, loff_t end) { @@ -812,18 +810,18 @@ static vm_fault_t __fuse_dax_fault(struct vm_fault *vmf, * we do not want any read/write/mmap to make progress and try * to populate page cache or access memory we are trying to free. */ - down_read(&get_fuse_inode(inode)->i_mmap_sem); + filemap_invalidate_lock_shared(inode->i_mapping); ret = dax_iomap_fault(vmf, pe_size, &pfn, &error, &fuse_iomap_ops); if ((ret & VM_FAULT_ERROR) && error == -EAGAIN) { error = 0; retry = true; - up_read(&get_fuse_inode(inode)->i_mmap_sem); + filemap_invalidate_unlock_shared(inode->i_mapping); goto retry; } if (ret & VM_FAULT_NEEDDSYNC) ret = dax_finish_sync_fault(vmf, pe_size, pfn); - up_read(&get_fuse_inode(inode)->i_mmap_sem); + filemap_invalidate_unlock_shared(inode->i_mapping); if (write) sb_end_pagefault(sb); @@ -959,7 +957,7 @@ inode_inline_reclaim_one_dmap(struct fuse_conn_dax *fcd, struct inode *inode, int ret; struct interval_tree_node *node; - down_write(&fi->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); /* Lookup a dmap and corresponding file offset to reclaim. */ down_read(&fi->dax->sem); @@ -1020,7 +1018,7 @@ inode_inline_reclaim_one_dmap(struct fuse_conn_dax *fcd, struct inode *inode, out_write_dmap_sem: up_write(&fi->dax->sem); out_mmap_sem: - up_write(&fi->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); return dmap; } @@ -1049,10 +1047,10 @@ alloc_dax_mapping_reclaim(struct fuse_conn_dax *fcd, struct inode *inode) * had a reference or some other temporary failure, * Try again. We want to give up inline reclaim only * if there is no range assigned to this node. Otherwise - * if a deadlock is possible if we sleep with fi->i_mmap_sem - * held and worker to free memory can't make progress due - * to unavailability of fi->i_mmap_sem lock. So sleep - * only if fi->dax->nr=0 + * if a deadlock is possible if we sleep with + * mapping->invalidate_lock held and worker to free memory + * can't make progress due to unavailability of + * mapping->invalidate_lock. So sleep only if fi->dax->nr=0 */ if (retry) continue; @@ -1060,8 +1058,8 @@ alloc_dax_mapping_reclaim(struct fuse_conn_dax *fcd, struct inode *inode) * There are no mappings which can be reclaimed. Wait for one. * We are not holding fi->dax->sem. So it is possible * that range gets added now. But as we are not holding - * fi->i_mmap_sem, worker should still be able to free up - * a range and wake us up. + * mapping->invalidate_lock, worker should still be able to + * free up a range and wake us up. */ if (!fi->dax->nr && !(fcd->nr_free_ranges > 0)) { if (wait_event_killable_exclusive(fcd->range_waitq, @@ -1107,7 +1105,7 @@ static int lookup_and_reclaim_dmap_locked(struct fuse_conn_dax *fcd, /* * Free a range of memory. * Locking: - * 1. Take fi->i_mmap_sem to block dax faults. + * 1. Take mapping->invalidate_lock to block dax faults. * 2. Take fi->dax->sem to protect interval tree and also to make sure * read/write can not reuse a dmap which we might be freeing. */ @@ -1121,7 +1119,7 @@ static int lookup_and_reclaim_dmap(struct fuse_conn_dax *fcd, loff_t dmap_start = start_idx << FUSE_DAX_SHIFT; loff_t dmap_end = (dmap_start + FUSE_DAX_SZ) - 1; - down_write(&fi->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); ret = fuse_dax_break_layouts(inode, dmap_start, dmap_end); if (ret) { pr_debug("virtio_fs: fuse_dax_break_layouts() failed. err=%d\n", @@ -1133,7 +1131,7 @@ static int lookup_and_reclaim_dmap(struct fuse_conn_dax *fcd, ret = lookup_and_reclaim_dmap_locked(fcd, inode, start_idx); up_write(&fi->dax->sem); out_mmap_sem: - up_write(&fi->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); return ret; } diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index 1b6c001a7dd1..4ea9c690dfed 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -1601,6 +1601,7 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr, struct fuse_mount *fm = get_fuse_mount(inode); struct fuse_conn *fc = fm->fc; struct fuse_inode *fi = get_fuse_inode(inode); + struct address_space *mapping = inode->i_mapping; FUSE_ARGS(args); struct fuse_setattr_in inarg; struct fuse_attr_out outarg; @@ -1625,11 +1626,11 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr, } if (FUSE_IS_DAX(inode) && is_truncate) { - down_write(&fi->i_mmap_sem); + filemap_invalidate_lock(mapping); fault_blocked = true; err = fuse_dax_break_layouts(inode, 0, 0); if (err) { - up_write(&fi->i_mmap_sem); + filemap_invalidate_unlock(mapping); return err; } } @@ -1739,13 +1740,13 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr, if ((is_truncate || !is_wb) && S_ISREG(inode->i_mode) && oldsize != outarg.attr.size) { truncate_pagecache(inode, outarg.attr.size); - invalidate_inode_pages2(inode->i_mapping); + invalidate_inode_pages2(mapping); } clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state); out: if (fault_blocked) - up_write(&fi->i_mmap_sem); + filemap_invalidate_unlock(mapping); return 0; @@ -1756,7 +1757,7 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr, clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state); if (fault_blocked) - up_write(&fi->i_mmap_sem); + filemap_invalidate_unlock(mapping); return err; } diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 09ef2a4d25ed..515104032ddc 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -243,7 +243,7 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir) } if (dax_truncate) { - down_write(&get_fuse_inode(inode)->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); err = fuse_dax_break_layouts(inode, 0, 0); if (err) goto out; @@ -255,7 +255,7 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir) out: if (dax_truncate) - up_write(&get_fuse_inode(inode)->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); if (is_wb_truncate | dax_truncate) { fuse_release_nowrite(inode); @@ -2920,7 +2920,7 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset, if (lock_inode) { inode_lock(inode); if (block_faults) { - down_write(&fi->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); err = fuse_dax_break_layouts(inode, 0, 0); if (err) goto out; @@ -2976,7 +2976,7 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset, clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state); if (block_faults) - up_write(&fi->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); if (lock_inode) inode_unlock(inode); @@ -3045,7 +3045,7 @@ static ssize_t __fuse_copy_file_range(struct file *file_in, loff_t pos_in, * modifications. Yet this does give less guarantees than if the * copying was performed with write(2). * - * To fix this a i_mmap_sem style lock could be used to prevent new + * To fix this a mapping->invalidate_lock could be used to prevent new * faults while the copy is ongoing. */ err = fuse_writeback_range(inode_out, pos_out, pos_out + len - 1); diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 7e463e220053..5130e88f811e 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -149,13 +149,6 @@ struct fuse_inode { /** Lock to protect write related fields */ spinlock_t lock; - /** - * Can't take inode lock in fault path (leads to circular dependency). - * Introduce another semaphore which can be taken in fault path and - * then other filesystem paths can take this to block faults. - */ - struct rw_semaphore i_mmap_sem; - #ifdef CONFIG_FUSE_DAX /* * Dax specific inode data diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 393e36b74dc4..f73bab71a0a0 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -85,7 +85,6 @@ static struct inode *fuse_alloc_inode(struct super_block *sb) fi->orig_ino = 0; fi->state = 0; mutex_init(&fi->mutex); - init_rwsem(&fi->i_mmap_sem); spin_lock_init(&fi->lock); fi->forget = fuse_alloc_forget(); if (!fi->forget) From patchwork Tue Jun 15 09:18:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12320821 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4881AC49EA6 for ; Tue, 15 Jun 2021 09:18:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2A25861245 for ; Tue, 15 Jun 2021 09:18:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231485AbhFOJUj (ORCPT ); Tue, 15 Jun 2021 05:20:39 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:54752 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231361AbhFOJUX (ORCPT ); Tue, 15 Jun 2021 05:20:23 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 4C34E219DA; Tue, 15 Jun 2021 09:18:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=c5hN9VihgYabPzS6M/DoMuNHHykbae3N2/E4zCgl0t4=; b=cHi43eUh77NgpQ2alhaJPAVnXodRpsXRDkVYR6bXyu3rUlE5FNTfCtH+ARQ5Y7T1QeNgPq qB+jKZqRvyX/JUEOxqTxsKoKDP7Y0TDmBqqdOXCa4Syt23NGCfSgKvAhgxt8VPpjZt1wj4 5LOsLpnrhczPLKzMNdtv15HOUkQf4W4= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=c5hN9VihgYabPzS6M/DoMuNHHykbae3N2/E4zCgl0t4=; b=NVtFvCI1YLGCVQRr2zH2Y3DXubADGSSNhU4M4pdGhYhhqrweXd1a/G3fuV1TiertvlFR6n BnL/Y+41CwtNyYCw== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id 31419A3BA1; Tue, 15 Jun 2021 09:18:15 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 5746C1F2CC6; Tue, 15 Jun 2021 11:18:14 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Jan Kara Subject: [PATCH 13/14] ceph: Fix race between hole punch and page fault Date: Tue, 15 Jun 2021 11:18:03 +0200 Message-Id: <20210615091814.28626-13-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210615090844.6045-1-jack@suse.cz> References: <20210615090844.6045-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2465; h=from:subject; bh=Sp8IgV3mV8zmiY5I9Wbg4z0IgLX0J9niLgWRID5omWo=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgyHBLt/G434Rs9ZHmHm4m7KwtsKNjseeQr2d06yTU b1jblrKJATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYMhwSwAKCRCcnaoHP2RA2SvIB/ 9qljIE76TJa7zLPbM2BfNhj7dfMN+u8KBs2gqKiv0lYSM0fn2f0wdZ3mPwXyDwG9B/crgHbMQwKl/4 fZke50BXKaAwjbbcRXpZGfACp8hpFl8xfYPfBB/EnzpwOVf68yg0DWmjaO4eVrlqJC+7q6PC11yQTN m1KMn7H99fRuQa7EaGokNSZSKMSQmFMqcDdi7HiAlgzsO2UoBgF8th11yGngaZO6pWOkVmVHz1Uo2h 9nKBcOrvj1OOfE57pew/y6NYLnaCsLWIIDNNEyc4IQqAUuPP217fS1jm7/bb9aPukFLDBdJDNCrPgk 4wgzkMOMy0FWgOE4YLYyChJUAjsEek X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Ceph has a following race between hole punching and page fault: CPU1 CPU2 ceph_fallocate() ... ceph_zero_pagecache_range() ceph_filemap_fault() faults in page in the range being punched ceph_zero_objects() And now we have a page in punched range with invalid data. Fix the problem by using mapping->invalidate_lock similarly to other filesystems. Note that using invalidate_lock also fixes a similar race wrt ->readpage(). CC: Jeff Layton CC: ceph-devel@vger.kernel.org Reviewed-by: Jeff Layton Signed-off-by: Jan Kara --- fs/ceph/addr.c | 9 ++++++--- fs/ceph/file.c | 2 ++ 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index c1570fada3d8..4664b93a32db 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -1401,9 +1401,11 @@ static vm_fault_t ceph_filemap_fault(struct vm_fault *vmf) ret = VM_FAULT_SIGBUS; } else { struct address_space *mapping = inode->i_mapping; - struct page *page = find_or_create_page(mapping, 0, - mapping_gfp_constraint(mapping, - ~__GFP_FS)); + struct page *page; + + filemap_invalidate_lock_shared(mapping); + page = find_or_create_page(mapping, 0, + mapping_gfp_constraint(mapping, ~__GFP_FS)); if (!page) { ret = VM_FAULT_OOM; goto out_inline; @@ -1424,6 +1426,7 @@ static vm_fault_t ceph_filemap_fault(struct vm_fault *vmf) vmf->page = page; ret = VM_FAULT_MAJOR | VM_FAULT_LOCKED; out_inline: + filemap_invalidate_unlock_shared(mapping); dout("filemap_fault %p %llu read inline data ret %x\n", inode, off, ret); } diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 77fc037d5beb..e826b1b1e4af 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -2083,6 +2083,7 @@ static long ceph_fallocate(struct file *file, int mode, if (ret < 0) goto unlock; + filemap_invalidate_lock(inode->i_mapping); ceph_zero_pagecache_range(inode, offset, length); ret = ceph_zero_objects(inode, offset, length); @@ -2095,6 +2096,7 @@ static long ceph_fallocate(struct file *file, int mode, if (dirty) __mark_inode_dirty(inode, dirty); } + filemap_invalidate_unlock(inode->i_mapping); ceph_put_cap_refs(ci, got); unlock: From patchwork Tue Jun 15 09:18:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12320813 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EE272C4CC6C for ; Tue, 15 Jun 2021 09:18:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D33AD61433 for ; Tue, 15 Jun 2021 09:18:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231513AbhFOJUk (ORCPT ); Tue, 15 Jun 2021 05:20:40 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:54754 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231362AbhFOJUX (ORCPT ); Tue, 15 Jun 2021 05:20:23 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 4DD37219DB; Tue, 15 Jun 2021 09:18:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2y4ATobPVnEcMEJhABd7mhzgYGQRV2///C8khknCrDI=; b=jkrSgf91dMSZKGLpwFnVzcX+RniA5lrBYoOpTKdscfVANjpzuygwReWX45ijFTqHWI/oP5 6SIc40rlo9wSEueoc+eqxwrP39o9wT6pJo8bu1eWvP00C5ncP2WlrT27Hw1Zb0tWoxQpYR LvbV9V92glyJeD13FUgXPPVcUCYib8A= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623748695; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2y4ATobPVnEcMEJhABd7mhzgYGQRV2///C8khknCrDI=; b=8JAptZxj9cSPv7QkSZ+GkFGANKcsiSwNgVsiI14DugpGsIS+68yIl+moQjnMjZBiYkk9h+ Z1VMWYbCuGHWBYAw== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id 36F8FA3BA4; Tue, 15 Jun 2021 09:18:15 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 5BFB01F2CC7; Tue, 15 Jun 2021 11:18:14 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Jan Kara Subject: [PATCH 14/14] cifs: Fix race between hole punch and page fault Date: Tue, 15 Jun 2021 11:18:04 +0200 Message-Id: <20210615091814.28626-14-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210615090844.6045-1-jack@suse.cz> References: <20210615090844.6045-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=1533; h=from:subject; bh=ZCABbWCWKurwj2pIQo6pZPjhdXQ595KGSFP8UTIZMfk=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgyHBM5bKiu8eYGM3Nz8nWQKIiVrKQrT6ftatfT6be 2XeiCXeJATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYMhwTAAKCRCcnaoHP2RA2di6B/ 45mhRVQnW7UEq8suFiCGDgISonzDUeQNVWEgcLjdXBZEIDbjCB7lYGsbJf/2IQcCDyAp1zJtKbAA97 LKvEdr6VjgcXgV5+oiI4V26Eg4SeicN83dDn0lmsYHvEt32v8FDDtMnb3ANMlKSUEqBQNhPBKvF0Hu 7oan61Xlt9SjuKtR2DNIo6BnoW+d3ctXdP54jBT2o81yaKbIQNLQF1CkZsz+XDP8ihnmg7st97ymKU TPRJSfingbG2ZM2Sp63ozFihnFywZI3GwOjoVYCxmX8taZ69x0x3a1KarULQ8O4uuVOhr+9KDsEgoq Py6BTeucaRFVJYzpQRsDWUnDh/IZou X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Cifs has a following race between hole punching and page fault: CPU1 CPU2 smb3_fallocate() smb3_punch_hole() truncate_pagecache_range() filemap_fault() - loads old data into the page cache SMB2_ioctl(..., FSCTL_SET_ZERO_DATA, ...) And now we have stale data in the page cache. Fix the problem by locking out faults (as well as reads) using mapping->invalidate_lock while hole punch is running. CC: Steve French CC: linux-cifs@vger.kernel.org Signed-off-by: Jan Kara --- fs/cifs/smb2ops.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c index 21ef51d338e0..07c9ec047020 100644 --- a/fs/cifs/smb2ops.c +++ b/fs/cifs/smb2ops.c @@ -3581,6 +3581,7 @@ static long smb3_punch_hole(struct file *file, struct cifs_tcon *tcon, return rc; } + filemap_invalidate_lock(inode->i_mapping); /* * We implement the punch hole through ioctl, so we need remove the page * caches first, otherwise the data may be inconsistent with the server. @@ -3598,6 +3599,7 @@ static long smb3_punch_hole(struct file *file, struct cifs_tcon *tcon, sizeof(struct file_zero_data_information), CIFSMaxBufSize, NULL, NULL); free_xid(xid); + filemap_invalidate_unlock(inode->i_mapping); return rc; }