From patchwork Mon Jun 7 14:52:11 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12303753 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BDBEEC49EBC for ; Mon, 7 Jun 2021 14:52:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9FD8861107 for ; Mon, 7 Jun 2021 14:52:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231312AbhFGOyh (ORCPT ); Mon, 7 Jun 2021 10:54:37 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]:51610 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230422AbhFGOyb (ORCPT ); Mon, 7 Jun 2021 10:54:31 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 8E9FC1FDCF; Mon, 7 Jun 2021 14:52:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623077557; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=6LMR/Ngf6lvtbuuF+m2yUFjG+uBCM2FsKQmwAkdHZe8=; b=aTrKFrEoFfIPG0RdDASQYvnH8TmAFvfCPq4uxZkEXLy7DNxnH08cwhx/AwFA/Tv0JF4wS/ IlqXZgRlwSQajws3dKw/kPeA//zQmkamTpW2pMc6YsLCZSnaRa9ajD1UFtMReIuQ5IXJzh k78CLYo2pBfghAmPIKb4CE+Xqukb1n8= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623077557; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=6LMR/Ngf6lvtbuuF+m2yUFjG+uBCM2FsKQmwAkdHZe8=; b=bmjXxtFApaFtEEahfcHQjn9EG70Ymg8Dw+dnjDevKVNJerqutgpHkzsoqnI77XBPKs4eTB 7peE/V1Z5rlBxFBg== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id B6D0DA3B97; Mon, 7 Jun 2021 14:52:36 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 8BA221F2CAD; Mon, 7 Jun 2021 16:52:36 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Jan Kara , Christoph Hellwig , Hugh Dickins Subject: [PATCH 01/14] mm: Fix comments mentioning i_mutex Date: Mon, 7 Jun 2021 16:52:11 +0200 Message-Id: <20210607145236.31852-1-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210607144631.8717-1-jack@suse.cz> References: <20210607144631.8717-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=10773; h=from:subject; bh=qWGnUSPdCswzeaXtJkuwfIKWu5LVdO9uiV6kdiCIgJw=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgvjKcw5d45xtTM4OMxdijV495g/eS0k1cXgftUY1X eDhA6QaJATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYL4ynAAKCRCcnaoHP2RA2Ut+B/ 90RfAXwC6NTXLh+IxhpM05IXfCw8Dy7xc9ffQrLj11tBaPdp0Fe3Qw81d2kPDP2+2iisxtWxW4TzPj 3V+hmFszke3OZ+62kHq/AuCoalBMf2d2pGb/EHTeZrPgOBbpOIjR7dSmmDKTXnaUHqIIvyrl0FQcl4 Riju5j9kBwxKLPDXNlG882t/ZeBkdho5YwkqTz3CLcGt17DZ5ePLdSi0QH185LELw+MX8IlaDmAgQY T7edkEqHNVKv1XyTNKAPsBg1so8y9242sh91K9x/QJ0Ebv0MeENb2IqsLaCheae1hAr0C1uOCH+9XJ RjKSj2EaOPbXEw0fEcYX1gOiFdE3wo X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org inode->i_mutex has been replaced with inode->i_rwsem long ago. Fix comments still mentioning i_mutex. Reviewed-by: Christoph Hellwig Acked-by: Hugh Dickins Signed-off-by: Jan Kara Reviewed-by: Darrick J. Wong --- mm/filemap.c | 10 +++++----- mm/madvise.c | 2 +- mm/memory-failure.c | 2 +- mm/rmap.c | 6 +++--- mm/shmem.c | 20 ++++++++++---------- mm/truncate.c | 8 ++++---- 6 files changed, 24 insertions(+), 24 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index 66f7e9fdfbc4..ba1068a1837f 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -76,7 +76,7 @@ * ->swap_lock (exclusive_swap_page, others) * ->i_pages lock * - * ->i_mutex + * ->i_rwsem * ->i_mmap_rwsem (truncate->unmap_mapping_range) * * ->mmap_lock @@ -87,7 +87,7 @@ * ->mmap_lock * ->lock_page (access_process_vm) * - * ->i_mutex (generic_perform_write) + * ->i_rwsem (generic_perform_write) * ->mmap_lock (fault_in_pages_readable->do_page_fault) * * bdi->wb.list_lock @@ -3710,12 +3710,12 @@ EXPORT_SYMBOL(generic_perform_write); * modification times and calls proper subroutines depending on whether we * do direct IO or a standard buffered write. * - * It expects i_mutex to be grabbed unless we work on a block device or similar + * It expects i_rwsem to be grabbed unless we work on a block device or similar * object which does not need locking at all. * * This function does *not* take care of syncing data in case of O_SYNC write. * A caller has to handle it. This is mainly due to the fact that we want to - * avoid syncing under i_mutex. + * avoid syncing under i_rwsem. * * Return: * * number of bytes written, even for truncated writes @@ -3803,7 +3803,7 @@ EXPORT_SYMBOL(__generic_file_write_iter); * * This is a wrapper around __generic_file_write_iter() to be used by most * filesystems. It takes care of syncing the file in case of O_SYNC file - * and acquires i_mutex as needed. + * and acquires i_rwsem as needed. * Return: * * negative error code if no data has been written at all of * vfs_fsync_range() failed for a synchronous write diff --git a/mm/madvise.c b/mm/madvise.c index 63e489e5bfdb..a0137706b92a 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -853,7 +853,7 @@ static long madvise_remove(struct vm_area_struct *vma, + ((loff_t)vma->vm_pgoff << PAGE_SHIFT); /* - * Filesystem's fallocate may need to take i_mutex. We need to + * Filesystem's fallocate may need to take i_rwsem. We need to * explicitly grab a reference because the vma (and hence the * vma's reference to the file) can go away as soon as we drop * mmap_lock. diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 85ad98c00fd9..9dcc9bcea731 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -704,7 +704,7 @@ static int me_pagecache_clean(struct page *p, unsigned long pfn) /* * Truncation is a bit tricky. Enable it per file system for now. * - * Open: to take i_mutex or not for this? Right now we don't. + * Open: to take i_rwsem or not for this? Right now we don't. */ return truncate_error_page(p, pfn, mapping); } diff --git a/mm/rmap.c b/mm/rmap.c index 693a610e181d..a35cbbbded0d 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -20,9 +20,9 @@ /* * Lock ordering in mm: * - * inode->i_mutex (while writing or truncating, not reading or faulting) + * inode->i_rwsem (while writing or truncating, not reading or faulting) * mm->mmap_lock - * page->flags PG_locked (lock_page) * (see huegtlbfs below) + * page->flags PG_locked (lock_page) * (see hugetlbfs below) * hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share) * mapping->i_mmap_rwsem * hugetlb_fault_mutex (hugetlbfs specific page fault mutex) @@ -41,7 +41,7 @@ * in arch-dependent flush_dcache_mmap_lock, * within bdi.wb->list_lock in __sync_single_inode) * - * anon_vma->rwsem,mapping->i_mutex (memory_failure, collect_procs_anon) + * anon_vma->rwsem,mapping->i_mmap_rwsem (memory_failure, collect_procs_anon) * ->tasklist_lock * pte map lock * diff --git a/mm/shmem.c b/mm/shmem.c index a08cedefbfaa..056204b1f76a 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -96,7 +96,7 @@ static struct vfsmount *shm_mnt; /* * shmem_fallocate communicates with shmem_fault or shmem_writepage via - * inode->i_private (with i_mutex making sure that it has only one user at + * inode->i_private (with i_rwsem making sure that it has only one user at * a time): we would prefer not to enlarge the shmem inode just for that. */ struct shmem_falloc { @@ -774,7 +774,7 @@ static int shmem_free_swap(struct address_space *mapping, * Determine (in bytes) how many of the shmem object's pages mapped by the * given offsets are swapped out. * - * This is safe to call without i_mutex or the i_pages lock thanks to RCU, + * This is safe to call without i_rwsem or the i_pages lock thanks to RCU, * as long as the inode doesn't go away and racy results are not a problem. */ unsigned long shmem_partial_swap_usage(struct address_space *mapping, @@ -806,7 +806,7 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping, * Determine (in bytes) how many of the shmem object's pages mapped by the * given vma is swapped out. * - * This is safe to call without i_mutex or the i_pages lock thanks to RCU, + * This is safe to call without i_rwsem or the i_pages lock thanks to RCU, * as long as the inode doesn't go away and racy results are not a problem. */ unsigned long shmem_swap_usage(struct vm_area_struct *vma) @@ -1069,7 +1069,7 @@ static int shmem_setattr(struct user_namespace *mnt_userns, loff_t oldsize = inode->i_size; loff_t newsize = attr->ia_size; - /* protected by i_mutex */ + /* protected by i_rwsem */ if ((newsize < oldsize && (info->seals & F_SEAL_SHRINK)) || (newsize > oldsize && (info->seals & F_SEAL_GROW))) return -EPERM; @@ -2049,7 +2049,7 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf) /* * Trinity finds that probing a hole which tmpfs is punching can * prevent the hole-punch from ever completing: which in turn - * locks writers out with its hold on i_mutex. So refrain from + * locks writers out with its hold on i_rwsem. So refrain from * faulting pages into the hole while it's being punched. Although * shmem_undo_range() does remove the additions, it may be unable to * keep up, as each new page needs its own unmap_mapping_range() call, @@ -2060,7 +2060,7 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf) * we just need to make racing faults a rare case. * * The implementation below would be much simpler if we just used a - * standard mutex or completion: but we cannot take i_mutex in fault, + * standard mutex or completion: but we cannot take i_rwsem in fault, * and bloating every shmem inode for this unlikely case would be sad. */ if (unlikely(inode->i_private)) { @@ -2518,7 +2518,7 @@ shmem_write_begin(struct file *file, struct address_space *mapping, struct shmem_inode_info *info = SHMEM_I(inode); pgoff_t index = pos >> PAGE_SHIFT; - /* i_mutex is held by caller */ + /* i_rwsem is held by caller */ if (unlikely(info->seals & (F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_FUTURE_WRITE))) { if (info->seals & (F_SEAL_WRITE | F_SEAL_FUTURE_WRITE)) @@ -2618,7 +2618,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to) /* * We must evaluate after, since reads (unlike writes) - * are called without i_mutex protection against truncate + * are called without i_rwsem protection against truncate */ nr = PAGE_SIZE; i_size = i_size_read(inode); @@ -2688,7 +2688,7 @@ static loff_t shmem_file_llseek(struct file *file, loff_t offset, int whence) return -ENXIO; inode_lock(inode); - /* We're holding i_mutex so we can access i_size directly */ + /* We're holding i_rwsem so we can access i_size directly */ offset = mapping_seek_hole_data(mapping, offset, inode->i_size, whence); if (offset >= 0) offset = vfs_setpos(file, offset, MAX_LFS_FILESIZE); @@ -2717,7 +2717,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset, loff_t unmap_end = round_down(offset + len, PAGE_SIZE) - 1; DECLARE_WAIT_QUEUE_HEAD_ONSTACK(shmem_falloc_waitq); - /* protected by i_mutex */ + /* protected by i_rwsem */ if (info->seals & (F_SEAL_WRITE | F_SEAL_FUTURE_WRITE)) { error = -EPERM; goto out; diff --git a/mm/truncate.c b/mm/truncate.c index 95af244b112a..57a618c4a0d6 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -415,7 +415,7 @@ EXPORT_SYMBOL(truncate_inode_pages_range); * @mapping: mapping to truncate * @lstart: offset from which to truncate * - * Called under (and serialised by) inode->i_mutex. + * Called under (and serialised by) inode->i_rwsem. * * Note: When this function returns, there can be a page in the process of * deletion (inside __delete_from_page_cache()) in the specified range. Thus @@ -432,7 +432,7 @@ EXPORT_SYMBOL(truncate_inode_pages); * truncate_inode_pages_final - truncate *all* pages before inode dies * @mapping: mapping to truncate * - * Called under (and serialized by) inode->i_mutex. + * Called under (and serialized by) inode->i_rwsem. * * Filesystems have to use this in the .evict_inode path to inform the * VM that this is the final truncate and the inode is going away. @@ -753,7 +753,7 @@ EXPORT_SYMBOL(truncate_pagecache); * setattr function when ATTR_SIZE is passed in. * * Must be called with a lock serializing truncates and writes (generally - * i_mutex but e.g. xfs uses a different lock) and before all filesystem + * i_rwsem but e.g. xfs uses a different lock) and before all filesystem * specific block truncation has been performed. */ void truncate_setsize(struct inode *inode, loff_t newsize) @@ -782,7 +782,7 @@ EXPORT_SYMBOL(truncate_setsize); * * The function must be called after i_size is updated so that page fault * coming after we unlock the page will already see the new i_size. - * The function must be called while we still hold i_mutex - this not only + * The function must be called while we still hold i_rwsem - this not only * makes sure i_size is stable but also that userspace cannot observe new * i_size value before we are prepared to store mmap writes at new inode size. */ From patchwork Mon Jun 7 14:52:12 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12303749 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9AC21C49EA4 for ; Mon, 7 Jun 2021 14:52:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8869260FE5 for ; Mon, 7 Jun 2021 14:52:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231263AbhFGOye (ORCPT ); Mon, 7 Jun 2021 10:54:34 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:34648 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230302AbhFGOya (ORCPT ); Mon, 7 Jun 2021 10:54:30 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 831A521A7F; Mon, 7 Jun 2021 14:52:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623077557; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=y1jAxj7HpC0zj00ojIU7+2/5URp27cwg5ekGhnTUbK4=; b=ywpLGJdty5iSkKm9w/AOo+yDNL4Z58wAqS4EOreSA/RTLfpPyDQIQG9zDcEQUTUIsTn7Uj 94XekAdYr8iJEtAEitFyi7XMby7ti4W1OsXID63Z8n/ZBzDTENhqBEKOs3aH5j+V9Ua27A hMpOy+zgKJ+RI9txJxqgPDv4dQU0n94= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623077557; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=y1jAxj7HpC0zj00ojIU7+2/5URp27cwg5ekGhnTUbK4=; b=MZWqBrCVyau4A4+7kibfDaD/WXjp5/wDxL8xwWeSypxo9GmQTmBtF3KWOEsjSfh6Nm1WBL NWn8ph386/4QalBQ== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id B4900A3B96; Mon, 7 Jun 2021 14:52:36 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 882571E0B03; Mon, 7 Jun 2021 16:52:36 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Jan Kara , "Darrick J . Wong" Subject: [PATCH 02/14] documentation: Sync file_operations members with reality Date: Mon, 7 Jun 2021 16:52:12 +0200 Message-Id: <20210607145236.31852-2-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210607144631.8717-1-jack@suse.cz> References: <20210607144631.8717-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2278; h=from:subject; bh=+JHTm1BDQQGdP9H1PUT1wwFK/NSOXnEkL9zE0DdMSHk=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgvjKdtH3fQO/hIgFbbEMhZxePOrm925VwhMgEiw4b MZAHXzCJATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYL4ynQAKCRCcnaoHP2RA2fHqCA DR4FmBISIXxJa3PNOQ3OqL5QeKcbbO1Zpjtii476euyuydAtDIIYOyEz257A1VHUgxoMCEXnAQI9ff sLzEt9kmkB1b/yKBs2CvDun65BumMP87ftjIJySy8SprEB/KpnHt2Ok2HMRiFDKugwkpzCPCmWqRdA jNdd5215Cmv3aHgIN0bFdYUv4DjnnZela2eh1CF2cAlSU1SQ/h0TYM4tqoIaGZPe1bPPjLA5SNr+k4 HEKKAlOFdZQO9jpOjAlR64g2DY5zuPS8ZAg7woFTAlj4fm1VlUWxTlohEV6g6xHmI5jPBt+0u0AsgA oDOxcxy/EcMWvuXXl9AhoPhV8EF94z X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Sync listing of struct file_operations members with the real one in fs.h. Acked-by: Darrick J. Wong Signed-off-by: Jan Kara --- Documentation/filesystems/locking.rst | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index 1e894480115b..4ed2b22bd0a8 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -506,6 +506,7 @@ prototypes:: ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *); ssize_t (*read_iter) (struct kiocb *, struct iov_iter *); ssize_t (*write_iter) (struct kiocb *, struct iov_iter *); + int (*iopoll) (struct kiocb *kiocb, bool spin); int (*iterate) (struct file *, struct dir_context *); int (*iterate_shared) (struct file *, struct dir_context *); __poll_t (*poll) (struct file *, struct poll_table_struct *); @@ -518,12 +519,6 @@ prototypes:: int (*fsync) (struct file *, loff_t start, loff_t end, int datasync); int (*fasync) (int, struct file *, int); int (*lock) (struct file *, int, struct file_lock *); - ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, - loff_t *); - ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, - loff_t *); - ssize_t (*sendfile) (struct file *, loff_t *, size_t, read_actor_t, - void __user *); ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int); unsigned long (*get_unmapped_area)(struct file *, unsigned long, @@ -536,6 +531,14 @@ prototypes:: size_t, unsigned int); int (*setlease)(struct file *, long, struct file_lock **, void **); long (*fallocate)(struct file *, int, loff_t, loff_t); + void (*show_fdinfo)(struct seq_file *m, struct file *f); + unsigned (*mmap_capabilities)(struct file *); + ssize_t (*copy_file_range)(struct file *, loff_t, struct file *, + loff_t, size_t, unsigned int); + loff_t (*remap_file_range)(struct file *file_in, loff_t pos_in, + struct file *file_out, loff_t pos_out, + loff_t len, unsigned int remap_flags); + int (*fadvise)(struct file *, loff_t, loff_t, int); locking rules: All may block. From patchwork Mon Jun 7 14:52:13 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12303751 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 425A3C49EB9 for ; Mon, 7 Jun 2021 14:52:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 13C2560FE5 for ; Mon, 7 Jun 2021 14:52:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231284AbhFGOyf (ORCPT ); Mon, 7 Jun 2021 10:54:35 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]:51562 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230363AbhFGOya (ORCPT ); Mon, 7 Jun 2021 10:54:30 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 8EC251FDD0; Mon, 7 Jun 2021 14:52:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623077557; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TCxS8KQ2T5ocmcNzii5ZWZjeiEyKsXG+pcIA7IGDUn4=; b=LCOtKh5vxSnJF3Ws2XMv1S4iCVNaSkoB5kHjPMsF830Gwhk0LA41rwnGYcxDYajHMksrV0 DO8p8oLSE1Yi7dZUJ6dmijeY8ReQ2YwgCwCPmQyUoxBvs2KwKJANACvj6C21qPMreCdG9p voHqoe28N6MgQMltxw85CX8rwyo6QYg= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623077557; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TCxS8KQ2T5ocmcNzii5ZWZjeiEyKsXG+pcIA7IGDUn4=; b=AYH38T/lulHt18d/uuLosBOncZyFXTFkrhd2Yz+ywWgScZSBq3UmXumgYuodvOCe9GtOtd Dbbt+AkDy4MIGRBA== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id BB24BA3B99; Mon, 7 Jun 2021 14:52:36 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 8DCA51F2CA8; Mon, 7 Jun 2021 16:52:36 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Jan Kara Subject: [PATCH 03/14] mm: Protect operations adding pages to page cache with invalidate_lock Date: Mon, 7 Jun 2021 16:52:13 +0200 Message-Id: <20210607145236.31852-3-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210607144631.8717-1-jack@suse.cz> References: <20210607144631.8717-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=19741; h=from:subject; bh=y1/kmgvm0GK3x29JLYz5RikRpskFzJrYZ9n3HYIjqj4=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgvjKeuW9jlBgWBYTfyO6PQ8idYZTTgPvDGWjcISke 14gCr8yJATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYL4yngAKCRCcnaoHP2RA2RKVB/ 0R/iX85EvB8t6zypHjWeqmScybiiiqg0Vd8mGUenBLW+pR5Y1PEmxyRfauo1mEURVDxQQ7xjnQr1Me McBWx7/nEahhmuB6e8Ih0uHfzgmLPohwwGLvfs9HZHr5wkiOglGPo11OtgxuHz0c+KzsVpJb0r0XZN P0EVS0HYnGDhlzymEq1mMEjFL5DyGJV4DHFJNLZWfMDVg66m2aoJZDkpMw+x8XDfW4Ryp9YyFJpGFv UUDWYFMlv+5HnfL4RoyG9zYh1g1yU2PJtrsX0MHYhmqCMDSd9FxpAePuRhXVXFAe3F6pDAjxiCws4H lP2DQAqGMG6cIkHYVFZv2zf1WTO2d6 X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Currently, serializing operations such as page fault, read, or readahead against hole punching is rather difficult. The basic race scheme is like: fallocate(FALLOC_FL_PUNCH_HOLE) read / fault / .. truncate_inode_pages_range() Now the problem is in this way read / page fault / readahead can instantiate pages in page cache with potentially stale data (if blocks get quickly reused). Avoiding this race is not simple - page locks do not work because we want to make sure there are *no* pages in given range. inode->i_rwsem does not work because page fault happens under mmap_sem which ranks below inode->i_rwsem. Also using it for reads makes the performance for mixed read-write workloads suffer. So create a new rw_semaphore in the address_space - invalidate_lock - that protects adding of pages to page cache for page faults / reads / readahead. Signed-off-by: Jan Kara --- Documentation/filesystems/locking.rst | 64 ++++++++++++++++++-------- fs/inode.c | 2 + include/linux/fs.h | 33 ++++++++++++++ mm/filemap.c | 65 ++++++++++++++++++++++----- mm/readahead.c | 2 + mm/rmap.c | 37 +++++++-------- mm/truncate.c | 3 +- 7 files changed, 156 insertions(+), 50 deletions(-) diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index 4ed2b22bd0a8..fcb4c0f05050 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -271,19 +271,19 @@ prototypes:: locking rules: All except set_page_dirty and freepage may block -====================== ======================== ========= -ops PageLocked(page) i_rwsem -====================== ======================== ========= +====================== ======================== ========= =============== +ops PageLocked(page) i_rwsem invalidate_lock +====================== ======================== ========= =============== writepage: yes, unlocks (see below) -readpage: yes, unlocks +readpage: yes, unlocks shared writepages: set_page_dirty no -readahead: yes, unlocks -readpages: no +readahead: yes, unlocks shared +readpages: no shared write_begin: locks the page exclusive write_end: yes, unlocks exclusive bmap: -invalidatepage: yes +invalidatepage: yes exclusive releasepage: yes freepage: yes direct_IO: @@ -378,7 +378,10 @@ keep it that way and don't breed new callers. ->invalidatepage() is called when the filesystem must attempt to drop some or all of the buffers from the page when it is being truncated. It returns zero on success. If ->invalidatepage is zero, the kernel uses -block_invalidatepage() instead. +block_invalidatepage() instead. The filesystem must exclusively acquire +invalidate_lock before invalidating page cache in truncate / hole punch path +(and thus calling into ->invalidatepage) to block races between page cache +invalidation and page cache filling functions (fault, read, ...). ->releasepage() is called when the kernel is about to try to drop the buffers from the page in preparation for freeing it. It returns zero to @@ -573,6 +576,27 @@ in sys_read() and friends. the lease within the individual filesystem to record the result of the operation +->fallocate implementation must be really careful to maintain page cache +consistency when punching holes or performing other operations that invalidate +page cache contents. Usually the filesystem needs to call +truncate_inode_pages_range() to invalidate relevant range of the page cache. +However the filesystem usually also needs to update its internal (and on disk) +view of file offset -> disk block mapping. Until this update is finished, the +filesystem needs to block page faults and reads from reloading now-stale page +cache contents from the disk. VFS provides mapping->invalidate_lock for this +and acquires it in shared mode in paths loading pages from disk +(filemap_fault(), filemap_read(), readahead paths). The filesystem is +responsible for taking this lock in its fallocate implementation and generally +whenever the page cache contents needs to be invalidated because a block is +moving from under a page. + +->copy_file_range and ->remap_file_range implementations need to serialize +against modifications of file data while the operation is running. For +blocking changes through write(2) and similar operations inode->i_rwsem can be +used. For blocking changes through memory mapping, the filesystem can use +mapping->invalidate_lock provided it also acquires it in its ->page_mkwrite +implementation. + dquot_operations ================ @@ -630,11 +654,11 @@ pfn_mkwrite: yes access: yes ============= ========= =========================== -->fault() is called when a previously not present pte is about -to be faulted in. The filesystem must find and return the page associated -with the passed in "pgoff" in the vm_fault structure. If it is possible that -the page may be truncated and/or invalidated, then the filesystem must lock -the page, then ensure it is not already truncated (the page lock will block +->fault() is called when a previously not present pte is about to be faulted +in. The filesystem must find and return the page associated with the passed in +"pgoff" in the vm_fault structure. If it is possible that the page may be +truncated and/or invalidated, then the filesystem must lock invalidate_lock, +then ensure the page is not already truncated (invalidate_lock will block subsequent truncate), and then return with VM_FAULT_LOCKED, and the page locked. The VM will unlock the page. @@ -647,12 +671,14 @@ page table entry. Pointer to entry associated with the page is passed in "pte" field in vm_fault structure. Pointers to entries for other offsets should be calculated relative to "pte". -->page_mkwrite() is called when a previously read-only pte is -about to become writeable. The filesystem again must ensure that there are -no truncate/invalidate races, and then return with the page locked. If -the page has been truncated, the filesystem should not look up a new page -like the ->fault() handler, but simply return with VM_FAULT_NOPAGE, which -will cause the VM to retry the fault. +->page_mkwrite() is called when a previously read-only pte is about to become +writeable. The filesystem again must ensure that there are no +truncate/invalidate races or races with operations such as ->remap_file_range +or ->copy_file_range, and then return with the page locked. Usually +mapping->invalidate_lock is suitable for proper serialization. If the page has +been truncated, the filesystem should not look up a new page like the ->fault() +handler, but simply return with VM_FAULT_NOPAGE, which will cause the VM to +retry the fault. ->pfn_mkwrite() is the same as page_mkwrite but when the pte is VM_PFNMAP or VM_MIXEDMAP with a page-less entry. Expected return is diff --git a/fs/inode.c b/fs/inode.c index c93500d84264..84c528cd1955 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -190,6 +190,8 @@ int inode_init_always(struct super_block *sb, struct inode *inode) mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE); mapping->private_data = NULL; mapping->writeback_index = 0; + __init_rwsem(&mapping->invalidate_lock, "mapping.invalidate_lock", + &sb->s_type->invalidate_lock_key); inode->i_private = NULL; inode->i_mapping = mapping; INIT_HLIST_HEAD(&inode->i_dentry); /* buggered by rcu freeing */ diff --git a/include/linux/fs.h b/include/linux/fs.h index c3c88fdb9b2a..d8afbc9661d7 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -436,6 +436,10 @@ int pagecache_write_end(struct file *, struct address_space *mapping, * struct address_space - Contents of a cacheable, mappable object. * @host: Owner, either the inode or the block_device. * @i_pages: Cached pages. + * @invalidate_lock: Guards coherency between page cache contents and + * file offset->disk block mappings in the filesystem during invalidates. + * It is also used to block modification of page cache contents through + * memory mappings. * @gfp_mask: Memory allocation flags to use for allocating pages. * @i_mmap_writable: Number of VM_SHARED mappings. * @nr_thps: Number of THPs in the pagecache (non-shmem only). @@ -453,6 +457,7 @@ int pagecache_write_end(struct file *, struct address_space *mapping, struct address_space { struct inode *host; struct xarray i_pages; + struct rw_semaphore invalidate_lock; gfp_t gfp_mask; atomic_t i_mmap_writable; #ifdef CONFIG_READ_ONLY_THP_FOR_FS @@ -814,6 +819,33 @@ static inline void inode_lock_shared_nested(struct inode *inode, unsigned subcla down_read_nested(&inode->i_rwsem, subclass); } +static inline void filemap_invalidate_lock(struct address_space *mapping) +{ + down_write(&mapping->invalidate_lock); +} + +static inline void filemap_invalidate_unlock(struct address_space *mapping) +{ + up_write(&mapping->invalidate_lock); +} + +static inline void filemap_invalidate_lock_shared(struct address_space *mapping) +{ + down_read(&mapping->invalidate_lock); +} + +static inline int filemap_invalidate_trylock_shared( + struct address_space *mapping) +{ + return down_read_trylock(&mapping->invalidate_lock); +} + +static inline void filemap_invalidate_unlock_shared( + struct address_space *mapping) +{ + up_read(&mapping->invalidate_lock); +} + void lock_two_nondirectories(struct inode *, struct inode*); void unlock_two_nondirectories(struct inode *, struct inode*); @@ -2488,6 +2520,7 @@ struct file_system_type { struct lock_class_key i_lock_key; struct lock_class_key i_mutex_key; + struct lock_class_key invalidate_lock_key; struct lock_class_key i_mutex_dir_key; }; diff --git a/mm/filemap.c b/mm/filemap.c index ba1068a1837f..c8e7e451d81e 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -77,7 +77,8 @@ * ->i_pages lock * * ->i_rwsem - * ->i_mmap_rwsem (truncate->unmap_mapping_range) + * ->invalidate_lock (acquired by fs in truncate path) + * ->i_mmap_rwsem (truncate->unmap_mapping_range) * * ->mmap_lock * ->i_mmap_rwsem @@ -85,7 +86,8 @@ * ->i_pages lock (arch-dependent flush_dcache_mmap_lock) * * ->mmap_lock - * ->lock_page (access_process_vm) + * ->invalidate_lock (filemap_fault) + * ->lock_page (filemap_fault, access_process_vm) * * ->i_rwsem (generic_perform_write) * ->mmap_lock (fault_in_pages_readable->do_page_fault) @@ -2368,20 +2370,30 @@ static int filemap_update_page(struct kiocb *iocb, { int error; + if (iocb->ki_flags & IOCB_NOWAIT) { + if (!filemap_invalidate_trylock_shared(mapping)) + return -EAGAIN; + } else { + filemap_invalidate_lock_shared(mapping); + } + if (!trylock_page(page)) { + error = -EAGAIN; if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_NOIO)) - return -EAGAIN; + goto unlock_mapping; if (!(iocb->ki_flags & IOCB_WAITQ)) { + filemap_invalidate_unlock_shared(mapping); put_and_wait_on_page_locked(page, TASK_KILLABLE); return AOP_TRUNCATED_PAGE; } error = __lock_page_async(page, iocb->ki_waitq); if (error) - return error; + goto unlock_mapping; } + error = AOP_TRUNCATED_PAGE; if (!page->mapping) - goto truncated; + goto unlock; error = 0; if (filemap_range_uptodate(mapping, iocb->ki_pos, iter, page)) @@ -2392,15 +2404,13 @@ static int filemap_update_page(struct kiocb *iocb, goto unlock; error = filemap_read_page(iocb->ki_filp, mapping, page); - if (error == AOP_TRUNCATED_PAGE) - put_page(page); - return error; -truncated: - unlock_page(page); - put_page(page); - return AOP_TRUNCATED_PAGE; + goto unlock_mapping; unlock: unlock_page(page); +unlock_mapping: + filemap_invalidate_unlock_shared(mapping); + if (error == AOP_TRUNCATED_PAGE) + put_page(page); return error; } @@ -2415,6 +2425,19 @@ static int filemap_create_page(struct file *file, if (!page) return -ENOMEM; + /* + * Protect against truncate / hole punch. Grabbing invalidate_lock here + * assures we cannot instantiate and bring uptodate new pagecache pages + * after evicting page cache during truncate and before actually + * freeing blocks. Note that we could release invalidate_lock after + * inserting the page into page cache as the locked page would then be + * enough to synchronize with hole punching. But there are code paths + * such as filemap_update_page() filling in partially uptodate pages or + * ->readpages() that need to hold invalidate_lock while mapping blocks + * for IO so let's hold the lock here as well to keep locking rules + * simple. + */ + filemap_invalidate_lock_shared(mapping); error = add_to_page_cache_lru(page, mapping, index, mapping_gfp_constraint(mapping, GFP_KERNEL)); if (error == -EEXIST) @@ -2426,9 +2449,11 @@ static int filemap_create_page(struct file *file, if (error) goto error; + filemap_invalidate_unlock_shared(mapping); pagevec_add(pvec, page); return 0; error: + filemap_invalidate_unlock_shared(mapping); put_page(page); return error; } @@ -2988,6 +3013,13 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) count_memcg_event_mm(vmf->vma->vm_mm, PGMAJFAULT); ret = VM_FAULT_MAJOR; fpin = do_sync_mmap_readahead(vmf); + } + + /* + * See comment in filemap_create_page() why we need invalidate_lock + */ + filemap_invalidate_lock_shared(mapping); + if (!page) { retry_find: page = pagecache_get_page(mapping, offset, FGP_CREAT|FGP_FOR_MMAP, @@ -2995,6 +3027,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) if (!page) { if (fpin) goto out_retry; + filemap_invalidate_unlock_shared(mapping); return VM_FAULT_OOM; } } @@ -3035,9 +3068,11 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) if (unlikely(offset >= max_off)) { unlock_page(page); put_page(page); + filemap_invalidate_unlock_shared(mapping); return VM_FAULT_SIGBUS; } + filemap_invalidate_unlock_shared(mapping); vmf->page = page; return ret | VM_FAULT_LOCKED; @@ -3056,6 +3091,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) if (!error || error == AOP_TRUNCATED_PAGE) goto retry_find; + filemap_invalidate_unlock_shared(mapping); return VM_FAULT_SIGBUS; @@ -3067,6 +3103,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) */ if (page) put_page(page); + filemap_invalidate_unlock_shared(mapping); if (fpin) fput(fpin); return ret | VM_FAULT_RETRY; @@ -3437,6 +3474,8 @@ static struct page *do_read_cache_page(struct address_space *mapping, * * If the page does not get brought uptodate, return -EIO. * + * The function expects mapping->invalidate_lock to be already held. + * * Return: up to date page on success, ERR_PTR() on failure. */ struct page *read_cache_page(struct address_space *mapping, @@ -3460,6 +3499,8 @@ EXPORT_SYMBOL(read_cache_page); * * If the page does not get brought uptodate, return -EIO. * + * The function expects mapping->invalidate_lock to be already held. + * * Return: up to date page on success, ERR_PTR() on failure. */ struct page *read_cache_page_gfp(struct address_space *mapping, diff --git a/mm/readahead.c b/mm/readahead.c index d589f147f4c2..41b75d76d36e 100644 --- a/mm/readahead.c +++ b/mm/readahead.c @@ -192,6 +192,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl, */ unsigned int nofs = memalloc_nofs_save(); + filemap_invalidate_lock_shared(mapping); /* * Preallocate as many pages as we will need. */ @@ -236,6 +237,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl, * will then handle the error. */ read_pages(ractl, &page_pool, false); + filemap_invalidate_unlock_shared(mapping); memalloc_nofs_restore(nofs); } EXPORT_SYMBOL_GPL(page_cache_ra_unbounded); diff --git a/mm/rmap.c b/mm/rmap.c index a35cbbbded0d..76d33c3b8ae6 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -22,24 +22,25 @@ * * inode->i_rwsem (while writing or truncating, not reading or faulting) * mm->mmap_lock - * page->flags PG_locked (lock_page) * (see hugetlbfs below) - * hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share) - * mapping->i_mmap_rwsem - * hugetlb_fault_mutex (hugetlbfs specific page fault mutex) - * anon_vma->rwsem - * mm->page_table_lock or pte_lock - * swap_lock (in swap_duplicate, swap_info_get) - * mmlist_lock (in mmput, drain_mmlist and others) - * mapping->private_lock (in __set_page_dirty_buffers) - * lock_page_memcg move_lock (in __set_page_dirty_buffers) - * i_pages lock (widely used) - * lruvec->lru_lock (in lock_page_lruvec_irq) - * inode->i_lock (in set_page_dirty's __mark_inode_dirty) - * bdi.wb->list_lock (in set_page_dirty's __mark_inode_dirty) - * sb_lock (within inode_lock in fs/fs-writeback.c) - * i_pages lock (widely used, in set_page_dirty, - * in arch-dependent flush_dcache_mmap_lock, - * within bdi.wb->list_lock in __sync_single_inode) + * mapping->invalidate_lock (in filemap_fault) + * page->flags PG_locked (lock_page) * (see hugetlbfs below) + * hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share) + * mapping->i_mmap_rwsem + * hugetlb_fault_mutex (hugetlbfs specific page fault mutex) + * anon_vma->rwsem + * mm->page_table_lock or pte_lock + * swap_lock (in swap_duplicate, swap_info_get) + * mmlist_lock (in mmput, drain_mmlist and others) + * mapping->private_lock (in __set_page_dirty_buffers) + * lock_page_memcg move_lock (in __set_page_dirty_buffers) + * i_pages lock (widely used) + * lruvec->lru_lock (in lock_page_lruvec_irq) + * inode->i_lock (in set_page_dirty's __mark_inode_dirty) + * bdi.wb->list_lock (in set_page_dirty's __mark_inode_dirty) + * sb_lock (within inode_lock in fs/fs-writeback.c) + * i_pages lock (widely used, in set_page_dirty, + * in arch-dependent flush_dcache_mmap_lock, + * within bdi.wb->list_lock in __sync_single_inode) * * anon_vma->rwsem,mapping->i_mmap_rwsem (memory_failure, collect_procs_anon) * ->tasklist_lock diff --git a/mm/truncate.c b/mm/truncate.c index 57a618c4a0d6..d0cc6588aba2 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -415,7 +415,8 @@ EXPORT_SYMBOL(truncate_inode_pages_range); * @mapping: mapping to truncate * @lstart: offset from which to truncate * - * Called under (and serialised by) inode->i_rwsem. + * Called under (and serialised by) inode->i_rwsem and + * mapping->invalidate_lock. * * Note: When this function returns, there can be a page in the process of * deletion (inside __delete_from_page_cache()) in the specified range. Thus From patchwork Mon Jun 7 14:52:14 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12303747 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 19839C48BE6 for ; Mon, 7 Jun 2021 14:52:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0975F60FE5 for ; Mon, 7 Jun 2021 14:52:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231217AbhFGOyd (ORCPT ); Mon, 7 Jun 2021 10:54:33 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:34624 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230264AbhFGOya (ORCPT ); Mon, 7 Jun 2021 10:54:30 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 819DE21A68; Mon, 7 Jun 2021 14:52:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623077557; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vhH8fZvd6Tom4nBiYNHBFCVmcJF6aY93XCBQD2IK7mE=; b=sTYyrBrhmcoYH4Pc6qGpefCJy5Sux5PJVReKTeUz6gUVd74lV8PShJvlmgXRUYi7K78zGY d4J3ry2lEKY4SEj8ooKRSfsxNwMcXt1Hj4nLWK6IcMLWMM+R6InO8T8IFySZfCpG7xy0Kk TEFjqGOekbxuEn9tycM3YhzJbNJncoU= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623077557; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vhH8fZvd6Tom4nBiYNHBFCVmcJF6aY93XCBQD2IK7mE=; b=DwFAqfCR7Wr+2ACyXhre2GSoLv5DOgFn+ENsakIV7+3dRNt9fpuZ6AfKNeT4x7NlOGqZpl fOCS0M97+XnF36Bg== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id B8DC1A3B98; Mon, 7 Jun 2021 14:52:36 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 91DF31F2CB3; Mon, 7 Jun 2021 16:52:36 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Jan Kara Subject: [PATCH 04/14] mm: Add functions to lock invalidate_lock for two mappings Date: Mon, 7 Jun 2021 16:52:14 +0200 Message-Id: <20210607145236.31852-4-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210607144631.8717-1-jack@suse.cz> References: <20210607144631.8717-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2616; h=from:subject; bh=feuIxLKoHMpqfTsHSNzx+Llrxfh/M4pQHEj3mln4fkU=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgvjKf4qNEt/Jyz8s3OnJ3hXcO5MHT6Xe5BiU0vi7v Yg+bSvWJATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYL4ynwAKCRCcnaoHP2RA2chHB/ wKYys4WnIBaTn+iTmcDwD76cwJNj9fNUmSIB5g6n0gtAP4/BnjWvcBbmQla0RGUYJvtO/9VQX1/DPE RdysgXZVUQqMFjHYjWjBwkkpPWcvdHvIW+9KXp0iJzLJkqUV8p8ZfByoP2JTRzDN82Rte87GtnN2Sp L0xIljcsn5jrH8Ww2o3Twm5GXBaCRdS8AVgRk1xEBgsw9Tm6VEqipm87uxNWlHgaU4THjuXQviA32K qTmxhDoIUFvd+7DyJIQyVFx8U2bwkAdQVm20zTZnZAZepHKSgapRODGbx6Yqk1g6ob+l9hmvzMz/zA Y3rmXXMf7ydvJe7qAvc8CkddqajivD X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Some operations such as reflinking blocks among files will need to lock invalidate_lock for two mappings. Add helper functions to do that. Signed-off-by: Jan Kara Reviewed-by: Darrick J. Wong --- include/linux/fs.h | 6 ++++++ mm/filemap.c | 38 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 44 insertions(+) diff --git a/include/linux/fs.h b/include/linux/fs.h index d8afbc9661d7..ddc11bafc183 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -849,6 +849,12 @@ static inline void filemap_invalidate_unlock_shared( void lock_two_nondirectories(struct inode *, struct inode*); void unlock_two_nondirectories(struct inode *, struct inode*); +void filemap_invalidate_lock_two(struct address_space *mapping1, + struct address_space *mapping2); +void filemap_invalidate_unlock_two(struct address_space *mapping1, + struct address_space *mapping2); + + /* * NOTE: in a 32bit arch with a preemptable kernel and * an UP compile the i_size_read/write must be atomic diff --git a/mm/filemap.c b/mm/filemap.c index c8e7e451d81e..b8e9bccecd9f 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1009,6 +1009,44 @@ struct page *__page_cache_alloc(gfp_t gfp) EXPORT_SYMBOL(__page_cache_alloc); #endif +/* + * filemap_invalidate_lock_two - lock invalidate_lock for two mappings + * + * Lock exclusively invalidate_lock of any passed mapping that is not NULL. + * + * @mapping1: the first mapping to lock + * @mapping2: the second mapping to lock + */ +void filemap_invalidate_lock_two(struct address_space *mapping1, + struct address_space *mapping2) +{ + if (mapping1 > mapping2) + swap(mapping1, mapping2); + if (mapping1) + down_write(&mapping1->invalidate_lock); + if (mapping2 && mapping1 != mapping2) + down_write_nested(&mapping2->invalidate_lock, 1); +} +EXPORT_SYMBOL(filemap_invalidate_lock_two); + +/* + * filemap_invalidate_unlock_two - unlock invalidate_lock for two mappings + * + * Unlock exclusive invalidate_lock of any passed mapping that is not NULL. + * + * @mapping1: the first mapping to unlock + * @mapping2: the second mapping to unlock + */ +void filemap_invalidate_unlock_two(struct address_space *mapping1, + struct address_space *mapping2) +{ + if (mapping1) + up_write(&mapping1->invalidate_lock); + if (mapping2 && mapping1 != mapping2) + up_write(&mapping2->invalidate_lock); +} +EXPORT_SYMBOL(filemap_invalidate_unlock_two); + /* * In order to wait for pages to become available there must be * waitqueues associated with pages. By using a hash table of From patchwork Mon Jun 7 14:52:15 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12303765 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 83B1AC47082 for ; Mon, 7 Jun 2021 14:52:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6A9AF61246 for ; Mon, 7 Jun 2021 14:52:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231437AbhFGOym (ORCPT ); Mon, 7 Jun 2021 10:54:42 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:34722 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230462AbhFGOyb (ORCPT ); Mon, 7 Jun 2021 10:54:31 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id DBC9621A99; Mon, 7 Jun 2021 14:52:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623077557; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vP1Q17M5eZAkCPVn/AjVmngx8UcfserGwnad76hX0QI=; b=wZuGlb/rL3eWDA2Su7SOF31cpgEQAsMqxjk3C3YFcooLY62oRQPbHoXWzOLw8dpI9PvKQ9 bjndEFqpv+HBSYCszOz9rBCPQ9OwOdu2BdVCvYS4jfKW/TrYsZOKYUtcz3CuescpkiGxLN k4nVaATaEsZm0KF/fcXDwzLgkYFpZCY= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623077557; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vP1Q17M5eZAkCPVn/AjVmngx8UcfserGwnad76hX0QI=; b=8BXeUOPNdsUFK8RB/cQkR2OXq+OtEZa4UbXRxlvLWhrJjJNoWKJRGXF6hWcJIAcVoMwLzG 0cwhOEGQPmj1QyAw== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id A9C03A3B88; Mon, 7 Jun 2021 14:52:37 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 9787F1F2CB4; Mon, 7 Jun 2021 16:52:36 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Jan Kara Subject: [PATCH 05/14] ext4: Convert to use mapping->invalidate_lock Date: Mon, 7 Jun 2021 16:52:15 +0200 Message-Id: <20210607145236.31852-5-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210607144631.8717-1-jack@suse.cz> References: <20210607144631.8717-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=15626; h=from:subject; bh=82J4dYf3V1dFW0NKaPIAPu5uQhO7oTiI9K34LI03E5M=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgvjKfNgDhkAUKMbaf5jWRt9dwXelzjD22tymsm2Bk wLB3QZaJATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYL4ynwAKCRCcnaoHP2RA2RSACA CLuQw9ZpyriDbwnxmNDWEI4ROeMvlQysT21pk1QHGiGE4M0k/Z9g1j946afwsF9aXZSHqSMJqMJPai oWLBgLmtD7Dg/v+3eU29VBKL+s0h4OJXIfUmjpIwgOfQ5pShYR5k/asNgIGyXOjpM9TsrqYrhNM5Ks wPGhJMhkiOMHc/bAzdDo+d3+TmNkHrEG52Kr8CYGV8qxlUkfPnzgn4aZijPC9u/BFvjWaURUQ2ovFZ WoWHuq+cpfXaCHb+SHI7ZygKpnmQKCnypXAH2vNqYIQEf4B23RC9A6kPWW4YGxlb4aC2f1YF5AzIiC yHu23Xu2+yY/IsAAXf/nKJOyC/CfN/ X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Convert ext4 to use mapping->invalidate_lock instead of its private EXT4_I(inode)->i_mmap_sem. This is mostly search-and-replace. By this conversion we fix a long standing race between hole punching and read(2) / readahead(2) paths that can lead to stale page cache contents. CC: CC: Ted Tso Signed-off-by: Jan Kara --- fs/ext4/ext4.h | 10 ---------- fs/ext4/extents.c | 25 +++++++++++++----------- fs/ext4/file.c | 13 +++++++------ fs/ext4/inode.c | 47 +++++++++++++++++----------------------------- fs/ext4/ioctl.c | 4 ++-- fs/ext4/super.c | 13 +++++-------- fs/ext4/truncate.h | 8 +++++--- 7 files changed, 50 insertions(+), 70 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 37002663d521..ed64b4b217a1 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1077,15 +1077,6 @@ struct ext4_inode_info { * by other means, so we have i_data_sem. */ struct rw_semaphore i_data_sem; - /* - * i_mmap_sem is for serializing page faults with truncate / punch hole - * operations. We have to make sure that new page cannot be faulted in - * a section of the inode that is being punched. We cannot easily use - * i_data_sem for this since we need protection for the whole punch - * operation and i_data_sem ranks below transaction start so we have - * to occasionally drop it. - */ - struct rw_semaphore i_mmap_sem; struct inode vfs_inode; struct jbd2_inode *jinode; @@ -2962,7 +2953,6 @@ extern int ext4_chunk_trans_blocks(struct inode *, int nrblocks); extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode, loff_t lstart, loff_t lend); extern vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf); -extern vm_fault_t ext4_filemap_fault(struct vm_fault *vmf); extern qsize_t *ext4_get_reserved_space(struct inode *inode); extern int ext4_get_projid(struct inode *inode, kprojid_t *projid); extern void ext4_da_release_space(struct inode *inode, int to_free); diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 77c84d6f1af6..194b8af2b31d 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -4467,6 +4467,7 @@ static long ext4_zero_range(struct file *file, loff_t offset, loff_t len, int mode) { struct inode *inode = file_inode(file); + struct address_space *mapping = file->f_mapping; handle_t *handle = NULL; unsigned int max_blocks; loff_t new_size = 0; @@ -4553,17 +4554,17 @@ static long ext4_zero_range(struct file *file, loff_t offset, * Prevent page faults from reinstantiating pages we have * released from page cache. */ - down_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_lock(mapping); ret = ext4_break_layouts(inode); if (ret) { - up_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); goto out_mutex; } ret = ext4_update_disksize_before_punch(inode, offset, len); if (ret) { - up_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); goto out_mutex; } /* Now release the pages and zero block aligned part of pages */ @@ -4572,7 +4573,7 @@ static long ext4_zero_range(struct file *file, loff_t offset, ret = ext4_alloc_file_blocks(file, lblk, max_blocks, new_size, flags); - up_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); if (ret) goto out_mutex; } @@ -5214,6 +5215,7 @@ ext4_ext_shift_extents(struct inode *inode, handle_t *handle, static int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len) { struct super_block *sb = inode->i_sb; + struct address_space *mapping = inode->i_mapping; ext4_lblk_t punch_start, punch_stop; handle_t *handle; unsigned int credits; @@ -5267,7 +5269,7 @@ static int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len) * Prevent page faults from reinstantiating pages we have released from * page cache. */ - down_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_lock(mapping); ret = ext4_break_layouts(inode); if (ret) @@ -5282,15 +5284,15 @@ static int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len) * Write tail of the last page before removed range since it will get * removed from the page cache below. */ - ret = filemap_write_and_wait_range(inode->i_mapping, ioffset, offset); + ret = filemap_write_and_wait_range(mapping, ioffset, offset); if (ret) goto out_mmap; /* * Write data that will be shifted to preserve them when discarding * page cache below. We are also protected from pages becoming dirty - * by i_mmap_sem. + * by i_rwsem and invalidate_lock. */ - ret = filemap_write_and_wait_range(inode->i_mapping, offset + len, + ret = filemap_write_and_wait_range(mapping, offset + len, LLONG_MAX); if (ret) goto out_mmap; @@ -5343,7 +5345,7 @@ static int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len) ext4_journal_stop(handle); ext4_fc_stop_ineligible(sb); out_mmap: - up_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); out_mutex: inode_unlock(inode); return ret; @@ -5360,6 +5362,7 @@ static int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len) static int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len) { struct super_block *sb = inode->i_sb; + struct address_space *mapping = inode->i_mapping; handle_t *handle; struct ext4_ext_path *path; struct ext4_extent *extent; @@ -5418,7 +5421,7 @@ static int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len) * Prevent page faults from reinstantiating pages we have released from * page cache. */ - down_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_lock(mapping); ret = ext4_break_layouts(inode); if (ret) @@ -5519,7 +5522,7 @@ static int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len) ext4_journal_stop(handle); ext4_fc_stop_ineligible(sb); out_mmap: - up_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); out_mutex: inode_unlock(inode); return ret; diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 816dedcbd541..d3b4ed91aa68 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -704,22 +704,23 @@ static vm_fault_t ext4_dax_huge_fault(struct vm_fault *vmf, */ bool write = (vmf->flags & FAULT_FLAG_WRITE) && (vmf->vma->vm_flags & VM_SHARED); + struct address_space *mapping = vmf->vma->vm_file->f_mapping; pfn_t pfn; if (write) { sb_start_pagefault(sb); file_update_time(vmf->vma->vm_file); - down_read(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_lock_shared(mapping); retry: handle = ext4_journal_start_sb(sb, EXT4_HT_WRITE_PAGE, EXT4_DATA_TRANS_BLOCKS(sb)); if (IS_ERR(handle)) { - up_read(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock_shared(mapping); sb_end_pagefault(sb); return VM_FAULT_SIGBUS; } } else { - down_read(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_lock_shared(mapping); } result = dax_iomap_fault(vmf, pe_size, &pfn, &error, &ext4_iomap_ops); if (write) { @@ -731,10 +732,10 @@ static vm_fault_t ext4_dax_huge_fault(struct vm_fault *vmf, /* Handling synchronous page fault? */ if (result & VM_FAULT_NEEDDSYNC) result = dax_finish_sync_fault(vmf, pe_size, pfn); - up_read(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock_shared(mapping); sb_end_pagefault(sb); } else { - up_read(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock_shared(mapping); } return result; @@ -756,7 +757,7 @@ static const struct vm_operations_struct ext4_dax_vm_ops = { #endif static const struct vm_operations_struct ext4_file_vm_ops = { - .fault = ext4_filemap_fault, + .fault = filemap_fault, .map_pages = filemap_map_pages, .page_mkwrite = ext4_page_mkwrite, }; diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index fe6045a46599..ee6e69d6f949 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3950,20 +3950,19 @@ int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset, return ret; } -static void ext4_wait_dax_page(struct ext4_inode_info *ei) +static void ext4_wait_dax_page(struct inode *inode) { - up_write(&ei->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); schedule(); - down_write(&ei->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); } int ext4_break_layouts(struct inode *inode) { - struct ext4_inode_info *ei = EXT4_I(inode); struct page *page; int error; - if (WARN_ON_ONCE(!rwsem_is_locked(&ei->i_mmap_sem))) + if (WARN_ON_ONCE(!rwsem_is_locked(&inode->i_mapping->invalidate_lock))) return -EINVAL; do { @@ -3974,7 +3973,7 @@ int ext4_break_layouts(struct inode *inode) error = ___wait_var_event(&page->_refcount, atomic_read(&page->_refcount) == 1, TASK_INTERRUPTIBLE, 0, 0, - ext4_wait_dax_page(ei)); + ext4_wait_dax_page(inode)); } while (error == 0); return error; @@ -4005,9 +4004,9 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length) ext4_clear_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA); if (ext4_has_inline_data(inode)) { - down_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_lock(mapping); ret = ext4_convert_inline_data(inode); - up_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); if (ret) return ret; } @@ -4058,7 +4057,7 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length) * Prevent page faults from reinstantiating pages we have released from * page cache. */ - down_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_lock(mapping); ret = ext4_break_layouts(inode); if (ret) @@ -4131,7 +4130,7 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length) out_stop: ext4_journal_stop(handle); out_dio: - up_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); out_mutex: inode_unlock(inode); return ret; @@ -5426,11 +5425,11 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry, inode_dio_wait(inode); } - down_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); rc = ext4_break_layouts(inode); if (rc) { - up_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); goto err_out; } @@ -5506,7 +5505,7 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry, error = rc; } out_mmap_sem: - up_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); } if (!error) { @@ -5983,10 +5982,10 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val) * data (and journalled aops don't know how to handle these cases). */ if (val) { - down_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); err = filemap_write_and_wait(inode->i_mapping); if (err < 0) { - up_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); return err; } } @@ -6019,7 +6018,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val) percpu_up_write(&sbi->s_writepages_rwsem); if (val) - up_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); /* Finally we can mark the inode as dirty. */ @@ -6063,7 +6062,7 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) sb_start_pagefault(inode->i_sb); file_update_time(vma->vm_file); - down_read(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_lock_shared(mapping); err = ext4_convert_inline_data(inode); if (err) @@ -6176,7 +6175,7 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) out_ret: ret = block_page_mkwrite_return(err); out: - up_read(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock_shared(mapping); sb_end_pagefault(inode->i_sb); return ret; out_error: @@ -6184,15 +6183,3 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) ext4_journal_stop(handle); goto out; } - -vm_fault_t ext4_filemap_fault(struct vm_fault *vmf) -{ - struct inode *inode = file_inode(vmf->vma->vm_file); - vm_fault_t ret; - - down_read(&EXT4_I(inode)->i_mmap_sem); - ret = filemap_fault(vmf); - up_read(&EXT4_I(inode)->i_mmap_sem); - - return ret; -} diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c index 31627f7dc5cd..c5ed562b4185 100644 --- a/fs/ext4/ioctl.c +++ b/fs/ext4/ioctl.c @@ -148,7 +148,7 @@ static long swap_inode_boot_loader(struct super_block *sb, goto journal_err_out; } - down_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); err = filemap_write_and_wait(inode->i_mapping); if (err) goto err_out; @@ -256,7 +256,7 @@ static long swap_inode_boot_loader(struct super_block *sb, ext4_double_up_write_data_sem(inode, inode_bl); err_out: - up_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); journal_err_out: unlock_two_nondirectories(inode, inode_bl); iput(inode_bl); diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 7dc94f3e18e6..966fd5ca48e0 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -90,12 +90,9 @@ static struct inode *ext4_get_journal_inode(struct super_block *sb, /* * Lock ordering * - * Note the difference between i_mmap_sem (EXT4_I(inode)->i_mmap_sem) and - * i_mmap_rwsem (inode->i_mmap_rwsem)! - * * page fault path: - * mmap_lock -> sb_start_pagefault -> i_mmap_sem (r) -> transaction start -> - * page lock -> i_data_sem (rw) + * mmap_lock -> sb_start_pagefault -> invalidate_lock (r) -> transaction start + * -> page lock -> i_data_sem (rw) * * buffered write path: * sb_start_write -> i_mutex -> mmap_lock @@ -103,8 +100,9 @@ static struct inode *ext4_get_journal_inode(struct super_block *sb, * i_data_sem (rw) * * truncate: - * sb_start_write -> i_mutex -> i_mmap_sem (w) -> i_mmap_rwsem (w) -> page lock - * sb_start_write -> i_mutex -> i_mmap_sem (w) -> transaction start -> + * sb_start_write -> i_mutex -> invalidate_lock (w) -> i_mmap_rwsem (w) -> + * page lock + * sb_start_write -> i_mutex -> invalidate_lock (w) -> transaction start -> * i_data_sem (rw) * * direct IO: @@ -1350,7 +1348,6 @@ static void init_once(void *foo) INIT_LIST_HEAD(&ei->i_orphan); init_rwsem(&ei->xattr_sem); init_rwsem(&ei->i_data_sem); - init_rwsem(&ei->i_mmap_sem); inode_init_once(&ei->vfs_inode); ext4_fc_init_inode(&ei->vfs_inode); } diff --git a/fs/ext4/truncate.h b/fs/ext4/truncate.h index bcbe3668c1d4..ce84aa2786c7 100644 --- a/fs/ext4/truncate.h +++ b/fs/ext4/truncate.h @@ -11,14 +11,16 @@ */ static inline void ext4_truncate_failed_write(struct inode *inode) { + struct address_space *mapping = inode->i_mapping; + /* * We don't need to call ext4_break_layouts() because the blocks we * are truncating were never visible to userspace. */ - down_write(&EXT4_I(inode)->i_mmap_sem); - truncate_inode_pages(inode->i_mapping, inode->i_size); + filemap_invalidate_lock(mapping); + truncate_inode_pages(mapping, inode->i_size); ext4_truncate(inode); - up_write(&EXT4_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); } /* From patchwork Mon Jun 7 14:52:16 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12303755 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, UNWANTED_LANGUAGE_BODY,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 83783C49ECA for ; Mon, 7 Jun 2021 14:52:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6755361153 for ; Mon, 7 Jun 2021 14:52:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230269AbhFGOyi (ORCPT ); Mon, 7 Jun 2021 10:54:38 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:34694 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230438AbhFGOyb (ORCPT ); Mon, 7 Jun 2021 10:54:31 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id DE3D321A9B; Mon, 7 Jun 2021 14:52:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623077557; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Y0cR5u9TaTNf3kT8s6I4jpd1jCjKCNBR8ag5a0EBpIA=; b=lRNmVcmRG737AlbDU8H4w/KlwsUwJATOMmnyK9j1C08aviXsnN+z59eaOMXG3SBOcoo8hP X49jgtJSXWZJTSrn32cp+qmrqms9Z6RpgegsMLcHWf+adcj8nnqU+LC6V8BOQzrTU8V5hu p1w/2gMeidgaPqBZqSEpWAXCVUkjENo= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623077557; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Y0cR5u9TaTNf3kT8s6I4jpd1jCjKCNBR8ag5a0EBpIA=; b=DzX0ZLzbeMzKgIqY3N5UjYGxTkUqF+YfzdtDAPtexZAmQW4jN1i8OjliYuGNqWSQdGeMbm JEK/HMzy1klveLCQ== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id AE2EAA3B8D; Mon, 7 Jun 2021 14:52:37 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id 9D0CF1F2CB5; Mon, 7 Jun 2021 16:52:36 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Jan Kara Subject: [PATCH 06/14] ext2: Convert to using invalidate_lock Date: Mon, 7 Jun 2021 16:52:16 +0200 Message-Id: <20210607145236.31852-6-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210607144631.8717-1-jack@suse.cz> References: <20210607144631.8717-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=4682; h=from:subject; bh=CDcZWsyH32vJE3DSXVHXUVYPxLcy1S8W7KZFP94NQWc=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgvjKgthhqcWN94OroqMs+dt8ZjvTEGenhtDvbVKo4 rIyqNfCJATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYL4yoAAKCRCcnaoHP2RA2TxZCA CRdo+Mbb6U82tItDgWExu+HN9TPnhfk4Ge6vz5lG4CbHc40AXb90NGmXzbXkb4Qovz6zhJODtOVBtm WmN9EsXJdn7TWSttbSjCX4oqlnUq4Qgu9t+MiXyo8a0JITM3/sXXPF0Gy783DF1SuYMVogVkibN0Kp sCH/g2aptvoJEfUeWcbr1FHGrit1Ojjl/kHhmYGE1dTO9g9Nz0/sqtY4g8jL3ckAelok/zdMZqG7qP WFI8qHzir4+bULWsuaN3XRs28PdT+8L/0SVgYUS8ZUdfO2rXZkWTJP3X0hiuMC4RYgQhQoTmB6cxRS 3HcqGSSnT/qf3t1hzAjFoCJBBGbQxC X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Ext2 has its private dax_sem used for synchronizing page faults and truncation. Use mapping->invalidate_lock instead as it is meant for this purpose. CC: Signed-off-by: Jan Kara --- fs/ext2/ext2.h | 11 ----------- fs/ext2/file.c | 7 +++---- fs/ext2/inode.c | 12 ++++++------ fs/ext2/super.c | 3 --- 4 files changed, 9 insertions(+), 24 deletions(-) diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h index b0a694820cb7..81907a041570 100644 --- a/fs/ext2/ext2.h +++ b/fs/ext2/ext2.h @@ -667,9 +667,6 @@ struct ext2_inode_info { struct rw_semaphore xattr_sem; #endif rwlock_t i_meta_lock; -#ifdef CONFIG_FS_DAX - struct rw_semaphore dax_sem; -#endif /* * truncate_mutex is for serialising ext2_truncate() against @@ -685,14 +682,6 @@ struct ext2_inode_info { #endif }; -#ifdef CONFIG_FS_DAX -#define dax_sem_down_write(ext2_inode) down_write(&(ext2_inode)->dax_sem) -#define dax_sem_up_write(ext2_inode) up_write(&(ext2_inode)->dax_sem) -#else -#define dax_sem_down_write(ext2_inode) -#define dax_sem_up_write(ext2_inode) -#endif - /* * Inode dynamic state flags */ diff --git a/fs/ext2/file.c b/fs/ext2/file.c index f98466acc672..eb97aa3d700e 100644 --- a/fs/ext2/file.c +++ b/fs/ext2/file.c @@ -81,7 +81,7 @@ static ssize_t ext2_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) * * mmap_lock (MM) * sb_start_pagefault (vfs, freeze) - * ext2_inode_info->dax_sem + * address_space->invalidate_lock * address_space->i_mmap_rwsem or page_lock (mutually exclusive in DAX) * ext2_inode_info->truncate_mutex * @@ -91,7 +91,6 @@ static ssize_t ext2_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) static vm_fault_t ext2_dax_fault(struct vm_fault *vmf) { struct inode *inode = file_inode(vmf->vma->vm_file); - struct ext2_inode_info *ei = EXT2_I(inode); vm_fault_t ret; bool write = (vmf->flags & FAULT_FLAG_WRITE) && (vmf->vma->vm_flags & VM_SHARED); @@ -100,11 +99,11 @@ static vm_fault_t ext2_dax_fault(struct vm_fault *vmf) sb_start_pagefault(inode->i_sb); file_update_time(vmf->vma->vm_file); } - down_read(&ei->dax_sem); + filemap_invalidate_lock_shared(inode->i_mapping); ret = dax_iomap_fault(vmf, PE_SIZE_PTE, NULL, NULL, &ext2_iomap_ops); - up_read(&ei->dax_sem); + filemap_invalidate_unlock_shared(inode->i_mapping); if (write) sb_end_pagefault(inode->i_sb); return ret; diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c index 68178b2234bd..2c76b9ffea26 100644 --- a/fs/ext2/inode.c +++ b/fs/ext2/inode.c @@ -1175,7 +1175,7 @@ static void ext2_free_branches(struct inode *inode, __le32 *p, __le32 *q, int de ext2_free_data(inode, p, q); } -/* dax_sem must be held when calling this function */ +/* mapping->invalidate_lock must be held when calling this function */ static void __ext2_truncate_blocks(struct inode *inode, loff_t offset) { __le32 *i_data = EXT2_I(inode)->i_data; @@ -1192,7 +1192,7 @@ static void __ext2_truncate_blocks(struct inode *inode, loff_t offset) iblock = (offset + blocksize-1) >> EXT2_BLOCK_SIZE_BITS(inode->i_sb); #ifdef CONFIG_FS_DAX - WARN_ON(!rwsem_is_locked(&ei->dax_sem)); + WARN_ON(!rwsem_is_locked(&inode->i_mapping->invalidate_lock)); #endif n = ext2_block_to_path(inode, iblock, offsets, NULL); @@ -1274,9 +1274,9 @@ static void ext2_truncate_blocks(struct inode *inode, loff_t offset) if (ext2_inode_is_fast_symlink(inode)) return; - dax_sem_down_write(EXT2_I(inode)); + filemap_invalidate_lock(inode->i_mapping); __ext2_truncate_blocks(inode, offset); - dax_sem_up_write(EXT2_I(inode)); + filemap_invalidate_unlock(inode->i_mapping); } static int ext2_setsize(struct inode *inode, loff_t newsize) @@ -1306,10 +1306,10 @@ static int ext2_setsize(struct inode *inode, loff_t newsize) if (error) return error; - dax_sem_down_write(EXT2_I(inode)); + filemap_invalidate_lock(inode->i_mapping); truncate_setsize(inode, newsize); __ext2_truncate_blocks(inode, newsize); - dax_sem_up_write(EXT2_I(inode)); + filemap_invalidate_unlock(inode->i_mapping); inode->i_mtime = inode->i_ctime = current_time(inode); if (inode_needs_sync(inode)) { diff --git a/fs/ext2/super.c b/fs/ext2/super.c index 21e09fbaa46f..987bcf32ed46 100644 --- a/fs/ext2/super.c +++ b/fs/ext2/super.c @@ -206,9 +206,6 @@ static void init_once(void *foo) init_rwsem(&ei->xattr_sem); #endif mutex_init(&ei->truncate_mutex); -#ifdef CONFIG_FS_DAX - init_rwsem(&ei->dax_sem); -#endif inode_init_once(&ei->vfs_inode); } From patchwork Mon Jun 7 14:52:17 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12303761 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 73409C4CC64 for ; Mon, 7 Jun 2021 14:52:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5FC7F61245 for ; Mon, 7 Jun 2021 14:52:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231419AbhFGOyn (ORCPT ); Mon, 7 Jun 2021 10:54:43 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]:51640 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230453AbhFGOyb (ORCPT ); Mon, 7 Jun 2021 10:54:31 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id D28CC1FDD2; Mon, 7 Jun 2021 14:52:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623077557; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=YK8GEJCDKjq8m4s9mPl+Cr6kVFf/7XBXLx1QgmSt22Q=; b=ZPlrCJN1Q+nMRsgZETbRrkJxtgOotDXgTM7o/Tu5h0Gr8bAQMMOWm9ywdBWCrBwAI7CZji XQ89VbYq6IO3nzIgL5B3Uyh2skJ78mL6D0ufMj4PSXgd24UOChI0BchwDit/Ibjypzxjl7 YB3X8fXTW/lTTi+z7pDiySNk0P53K8Q= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623077557; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=YK8GEJCDKjq8m4s9mPl+Cr6kVFf/7XBXLx1QgmSt22Q=; b=SxBYwXZ92n/WdiLGzzzCaCsIzOkITvUlxhfnTWU+AtfUWr6eqleklDVqDXiKoJuS5uohUU u91Krl7ddNuSAXCA== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id B2199A3B8E; Mon, 7 Jun 2021 14:52:37 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id A28761F2CB6; Mon, 7 Jun 2021 16:52:36 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Pavel Reichl , Dave Chinner , Eric Sandeen , Christoph Hellwig , Jan Kara Subject: [PATCH 07/14] xfs: Refactor xfs_isilocked() Date: Mon, 7 Jun 2021 16:52:17 +0200 Message-Id: <20210607145236.31852-7-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210607144631.8717-1-jack@suse.cz> References: <20210607144631.8717-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=4097; i=jack@suse.cz; h=from:subject; bh=EyWkjLzMxBvucK3cTjl3wByc8CRPjLl12F487VSLK9k=; b=owGbwMvMwME4Z+4qdvsUh5uMp9WSGBL2GS2sPKKyPeulwoHnToySDLXdc5y83cKnpbv1c6xXNbp3 WO1SJ6MxCwMjB4OsmCLL6siL2tfmGXVtDdWQgRnEygQyhYGLUwAmElzK/k99C/OZUBGvGTm+bpWfZo W7yKq6BCfnG6l6VHLy3Wp+EbSn0t6ds1F+hWD105TVc+6WbF0VtfLdzeNapxayi29dfnhaT9fJ/pnT BFMP+LPdFci78DnbR+V0RM/Tts8fpi2al+LWuPul1OtWVwPllpvT/h1/U3Xc3jciaJa0LytvV2PxJ/ fDP03yKioVHk/uteHTv5ya1WhiEJvu3axd9nmHtA7DNckdh+1NubaWqnOGrZvayXzmaa5K0+tnxnHu fsWd617V/Ij1OczrefPUAweFJtGSQrG7C3YVO65x//t5h99p7z2F/wNkfj2cvLfxp8CG7SuCXlpdqD uQdn3HYzE5Tx/jGUskWUViuRLCAA== X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org From: Pavel Reichl Refactor xfs_isilocked() to use newly introduced __xfs_rwsem_islocked(). __xfs_rwsem_islocked() is a helper function which encapsulates checking state of rw_semaphores hold by inode. Signed-off-by: Pavel Reichl Suggested-by: Dave Chinner Suggested-by: Eric Sandeen Suggested-by: Darrick J. Wong Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Signed-off-by: Jan Kara Reviewed-by: Darrick J. Wong --- fs/xfs/xfs_inode.c | 39 +++++++++++++++++++++++++++++++-------- fs/xfs/xfs_inode.h | 21 ++++++++++++++------- 2 files changed, 45 insertions(+), 15 deletions(-) diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 0369eb22c1bb..6247977870bd 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -342,9 +342,34 @@ xfs_ilock_demote( } #if defined(DEBUG) || defined(XFS_WARN) -int +static inline bool +__xfs_rwsem_islocked( + struct rw_semaphore *rwsem, + int lock_flags, + int shift) +{ + lock_flags >>= shift; + + if (!debug_locks) + return rwsem_is_locked(rwsem); + /* + * If the shared flag is not set, pass 0 to explicitly check for + * exclusive access to the lock. If the shared flag is set, we typically + * want to make sure the lock is at least held in shared mode + * (i.e., shared | excl) but we don't necessarily care that it might + * actually be held exclusive. Therefore, pass -1 to check whether the + * lock is held in any mode rather than one of the explicit shared mode + * values (1 or 2)." + */ + if (lock_flags & (1 << XFS_SHARED_LOCK_SHIFT)) { + return lockdep_is_held_type(rwsem, -1); + } + return lockdep_is_held_type(rwsem, 0); +} + +bool xfs_isilocked( - xfs_inode_t *ip, + struct xfs_inode *ip, uint lock_flags) { if (lock_flags & (XFS_ILOCK_EXCL|XFS_ILOCK_SHARED)) { @@ -359,15 +384,13 @@ xfs_isilocked( return rwsem_is_locked(&ip->i_mmaplock.mr_lock); } - if (lock_flags & (XFS_IOLOCK_EXCL|XFS_IOLOCK_SHARED)) { - if (!(lock_flags & XFS_IOLOCK_SHARED)) - return !debug_locks || - lockdep_is_held_type(&VFS_I(ip)->i_rwsem, 0); - return rwsem_is_locked(&VFS_I(ip)->i_rwsem); + if (lock_flags & (XFS_IOLOCK_EXCL | XFS_IOLOCK_SHARED)) { + return __xfs_rwsem_islocked(&VFS_I(ip)->i_rwsem, lock_flags, + XFS_IOLOCK_FLAG_SHIFT); } ASSERT(0); - return 0; + return false; } #endif diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index ca826cfba91c..1c0e15c480bc 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -262,12 +262,19 @@ static inline bool xfs_inode_has_bigtime(struct xfs_inode *ip) * Bit ranges: 1<<1 - 1<<16-1 -- iolock/ilock modes (bitfield) * 1<<16 - 1<<32-1 -- lockdep annotation (integers) */ -#define XFS_IOLOCK_EXCL (1<<0) -#define XFS_IOLOCK_SHARED (1<<1) -#define XFS_ILOCK_EXCL (1<<2) -#define XFS_ILOCK_SHARED (1<<3) -#define XFS_MMAPLOCK_EXCL (1<<4) -#define XFS_MMAPLOCK_SHARED (1<<5) + +#define XFS_IOLOCK_FLAG_SHIFT 0 +#define XFS_ILOCK_FLAG_SHIFT 2 +#define XFS_MMAPLOCK_FLAG_SHIFT 4 + +#define XFS_SHARED_LOCK_SHIFT 1 + +#define XFS_IOLOCK_EXCL (1 << XFS_IOLOCK_FLAG_SHIFT) +#define XFS_IOLOCK_SHARED (XFS_IOLOCK_EXCL << XFS_SHARED_LOCK_SHIFT) +#define XFS_ILOCK_EXCL (1 << XFS_ILOCK_FLAG_SHIFT) +#define XFS_ILOCK_SHARED (XFS_ILOCK_EXCL << XFS_SHARED_LOCK_SHIFT) +#define XFS_MMAPLOCK_EXCL (1 << XFS_MMAPLOCK_FLAG_SHIFT) +#define XFS_MMAPLOCK_SHARED (XFS_MMAPLOCK_EXCL << XFS_SHARED_LOCK_SHIFT) #define XFS_LOCK_MASK (XFS_IOLOCK_EXCL | XFS_IOLOCK_SHARED \ | XFS_ILOCK_EXCL | XFS_ILOCK_SHARED \ @@ -410,7 +417,7 @@ void xfs_ilock(xfs_inode_t *, uint); int xfs_ilock_nowait(xfs_inode_t *, uint); void xfs_iunlock(xfs_inode_t *, uint); void xfs_ilock_demote(xfs_inode_t *, uint); -int xfs_isilocked(xfs_inode_t *, uint); +bool xfs_isilocked(struct xfs_inode *, uint); uint xfs_ilock_data_map_shared(struct xfs_inode *); uint xfs_ilock_attr_map_shared(struct xfs_inode *); From patchwork Mon Jun 7 14:52:18 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12303763 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E94C9C49361 for ; Mon, 7 Jun 2021 14:52:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CF0D861108 for ; Mon, 7 Jun 2021 14:52:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231443AbhFGOyo (ORCPT ); Mon, 7 Jun 2021 10:54:44 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]:51720 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231132AbhFGOyc (ORCPT ); Mon, 7 Jun 2021 10:54:32 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id D1CF11FDD1; Mon, 7 Jun 2021 14:52:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623077557; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bgvhMIl1mhONFJymI2z921fWdq0nztyS6tFQ71v6L/c=; b=EogLMR4VH/4i7SOnd8JnFVMebkTurk1AiRWT9zX3SM76UqTrozjotz9wtldiZHnAOrL+J/ 5arW/6LoGlowEyqm0G74Q6JDXuZZo3QefVvJemg9NNrQqq175ej6ugZnm6YUcw6NWbPVXE ggDSiQxU0aXdW8cMSgiqEujHS4WLVPo= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623077557; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bgvhMIl1mhONFJymI2z921fWdq0nztyS6tFQ71v6L/c=; b=ppG1pyBH/COR3cR7iByuuCer71tUF00D0mctM0GDVYDjxq5hWjNYXM65CnirRjZf3jKWWg DaS1QAcczgCciEBQ== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id B3B80A3B91; Mon, 7 Jun 2021 14:52:37 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id A7F1E1F2CB7; Mon, 7 Jun 2021 16:52:36 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Jan Kara , Christoph Hellwig , "Darrick J. Wong" Subject: [PATCH 08/14] xfs: Convert to use invalidate_lock Date: Mon, 7 Jun 2021 16:52:18 +0200 Message-Id: <20210607145236.31852-8-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210607144631.8717-1-jack@suse.cz> References: <20210607144631.8717-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=7878; h=from:subject; bh=J3EQ8pDQ5KRZUdlRHNva+CWDZzhCxMdVvTCGFmCAjwU=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgvjKiV7EBqLMkIyrPRQk9vprxZNHegMYGEVXhYfxb 0fnI37OJATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYL4yogAKCRCcnaoHP2RA2XlUB/ 0QZ4lKWIb3zLYk1dU5C31Zyp7eDhQwPHyHUdXnw3tlLdht3VwmkPWMlwZjSkYIlQzSk86MZBo5U+9s MlU/Msh5iu3K2ipkMhCe2E0jKnfnssiy8RQDzlUZdSGhrHZyRWVVVvnw+4GEI7fx+xB5VjrBNDsql3 I2YbA11Vy9PMzWQuOtDfu3auh5kN2WT1GTvpbFJOyJZPe0x4FU/XOlRKPpowFkIMyppuhW1MU/UVlZ lmpMmU2xkCwJIDCKl7YTdLvNp6v39KORODCCgSE7FURn5ZllE3CWOZZ9BWefBkPsZIRTOWaeBkgO8V WHaxLauOSdTqRL1Hn1hlHXwIkITtfx X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Use invalidate_lock instead of XFS internal i_mmap_lock. The intended purpose of invalidate_lock is exactly the same. Note that the locking in __xfs_filemap_fault() slightly changes as filemap_fault() already takes invalidate_lock. Reviewed-by: Christoph Hellwig CC: CC: "Darrick J. Wong" Signed-off-by: Jan Kara Reviewed-by: Darrick J. Wong --- fs/xfs/xfs_file.c | 13 +++++++----- fs/xfs/xfs_inode.c | 50 ++++++++++++++++++++++++---------------------- fs/xfs/xfs_inode.h | 1 - fs/xfs/xfs_super.c | 2 -- 4 files changed, 34 insertions(+), 32 deletions(-) diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 396ef36dcd0a..7cb7703c2209 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -1282,7 +1282,7 @@ xfs_file_llseek( * * mmap_lock (MM) * sb_start_pagefault(vfs, freeze) - * i_mmaplock (XFS - truncate serialisation) + * invalidate_lock (vfs/XFS_MMAPLOCK - truncate serialisation) * page_lock (MM) * i_lock (XFS - extent map serialisation) */ @@ -1303,24 +1303,27 @@ __xfs_filemap_fault( file_update_time(vmf->vma->vm_file); } - xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED); if (IS_DAX(inode)) { pfn_t pfn; + xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED); ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, (write_fault && !vmf->cow_page) ? &xfs_direct_write_iomap_ops : &xfs_read_iomap_ops); if (ret & VM_FAULT_NEEDDSYNC) ret = dax_finish_sync_fault(vmf, pe_size, pfn); + xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED); } else { - if (write_fault) + if (write_fault) { + xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED); ret = iomap_page_mkwrite(vmf, &xfs_buffered_write_iomap_ops); - else + xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED); + } else { ret = filemap_fault(vmf); + } } - xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED); if (write_fault) sb_end_pagefault(inode->i_sb); diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 6247977870bd..e1854a660809 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -131,7 +131,7 @@ xfs_ilock_attr_map_shared( /* * In addition to i_rwsem in the VFS inode, the xfs inode contains 2 - * multi-reader locks: i_mmap_lock and the i_lock. This routine allows + * multi-reader locks: invalidate_lock and the i_lock. This routine allows * various combinations of the locks to be obtained. * * The 3 locks should always be ordered so that the IO lock is obtained first, @@ -139,23 +139,23 @@ xfs_ilock_attr_map_shared( * * Basic locking order: * - * i_rwsem -> i_mmap_lock -> page_lock -> i_ilock + * i_rwsem -> invalidate_lock -> page_lock -> i_ilock * * mmap_lock locking order: * * i_rwsem -> page lock -> mmap_lock - * mmap_lock -> i_mmap_lock -> page_lock + * mmap_lock -> invalidate_lock -> page_lock * * The difference in mmap_lock locking order mean that we cannot hold the - * i_mmap_lock over syscall based read(2)/write(2) based IO. These IO paths can - * fault in pages during copy in/out (for buffered IO) or require the mmap_lock - * in get_user_pages() to map the user pages into the kernel address space for - * direct IO. Similarly the i_rwsem cannot be taken inside a page fault because - * page faults already hold the mmap_lock. + * invalidate_lock over syscall based read(2)/write(2) based IO. These IO paths + * can fault in pages during copy in/out (for buffered IO) or require the + * mmap_lock in get_user_pages() to map the user pages into the kernel address + * space for direct IO. Similarly the i_rwsem cannot be taken inside a page + * fault because page faults already hold the mmap_lock. * * Hence to serialise fully against both syscall and mmap based IO, we need to - * take both the i_rwsem and the i_mmap_lock. These locks should *only* be both - * taken in places where we need to invalidate the page cache in a race + * take both the i_rwsem and the invalidate_lock. These locks should *only* be + * both taken in places where we need to invalidate the page cache in a race * free manner (e.g. truncate, hole punch and other extent manipulation * functions). */ @@ -187,10 +187,13 @@ xfs_ilock( XFS_IOLOCK_DEP(lock_flags)); } - if (lock_flags & XFS_MMAPLOCK_EXCL) - mrupdate_nested(&ip->i_mmaplock, XFS_MMAPLOCK_DEP(lock_flags)); - else if (lock_flags & XFS_MMAPLOCK_SHARED) - mraccess_nested(&ip->i_mmaplock, XFS_MMAPLOCK_DEP(lock_flags)); + if (lock_flags & XFS_MMAPLOCK_EXCL) { + down_write_nested(&VFS_I(ip)->i_mapping->invalidate_lock, + XFS_MMAPLOCK_DEP(lock_flags)); + } else if (lock_flags & XFS_MMAPLOCK_SHARED) { + down_read_nested(&VFS_I(ip)->i_mapping->invalidate_lock, + XFS_MMAPLOCK_DEP(lock_flags)); + } if (lock_flags & XFS_ILOCK_EXCL) mrupdate_nested(&ip->i_lock, XFS_ILOCK_DEP(lock_flags)); @@ -239,10 +242,10 @@ xfs_ilock_nowait( } if (lock_flags & XFS_MMAPLOCK_EXCL) { - if (!mrtryupdate(&ip->i_mmaplock)) + if (!down_write_trylock(&VFS_I(ip)->i_mapping->invalidate_lock)) goto out_undo_iolock; } else if (lock_flags & XFS_MMAPLOCK_SHARED) { - if (!mrtryaccess(&ip->i_mmaplock)) + if (!down_read_trylock(&VFS_I(ip)->i_mapping->invalidate_lock)) goto out_undo_iolock; } @@ -257,9 +260,9 @@ xfs_ilock_nowait( out_undo_mmaplock: if (lock_flags & XFS_MMAPLOCK_EXCL) - mrunlock_excl(&ip->i_mmaplock); + up_write(&VFS_I(ip)->i_mapping->invalidate_lock); else if (lock_flags & XFS_MMAPLOCK_SHARED) - mrunlock_shared(&ip->i_mmaplock); + up_read(&VFS_I(ip)->i_mapping->invalidate_lock); out_undo_iolock: if (lock_flags & XFS_IOLOCK_EXCL) up_write(&VFS_I(ip)->i_rwsem); @@ -306,9 +309,9 @@ xfs_iunlock( up_read(&VFS_I(ip)->i_rwsem); if (lock_flags & XFS_MMAPLOCK_EXCL) - mrunlock_excl(&ip->i_mmaplock); + up_write(&VFS_I(ip)->i_mapping->invalidate_lock); else if (lock_flags & XFS_MMAPLOCK_SHARED) - mrunlock_shared(&ip->i_mmaplock); + up_read(&VFS_I(ip)->i_mapping->invalidate_lock); if (lock_flags & XFS_ILOCK_EXCL) mrunlock_excl(&ip->i_lock); @@ -334,7 +337,7 @@ xfs_ilock_demote( if (lock_flags & XFS_ILOCK_EXCL) mrdemote(&ip->i_lock); if (lock_flags & XFS_MMAPLOCK_EXCL) - mrdemote(&ip->i_mmaplock); + downgrade_write(&VFS_I(ip)->i_mapping->invalidate_lock); if (lock_flags & XFS_IOLOCK_EXCL) downgrade_write(&VFS_I(ip)->i_rwsem); @@ -379,9 +382,8 @@ xfs_isilocked( } if (lock_flags & (XFS_MMAPLOCK_EXCL|XFS_MMAPLOCK_SHARED)) { - if (!(lock_flags & XFS_MMAPLOCK_SHARED)) - return !!ip->i_mmaplock.mr_writer; - return rwsem_is_locked(&ip->i_mmaplock.mr_lock); + return __xfs_rwsem_islocked(&VFS_I(ip)->i_rwsem, lock_flags, + XFS_MMAPLOCK_FLAG_SHIFT); } if (lock_flags & (XFS_IOLOCK_EXCL | XFS_IOLOCK_SHARED)) { diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 1c0e15c480bc..4551bdd555ce 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -40,7 +40,6 @@ typedef struct xfs_inode { /* Transaction and locking information. */ struct xfs_inode_log_item *i_itemp; /* logging information */ mrlock_t i_lock; /* inode lock */ - mrlock_t i_mmaplock; /* inode mmap IO lock */ atomic_t i_pincount; /* inode pin count */ /* diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index a2dab05332ac..eeaf44910b5f 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -715,8 +715,6 @@ xfs_fs_inode_init_once( atomic_set(&ip->i_pincount, 0); spin_lock_init(&ip->i_flags_lock); - mrlock_init(&ip->i_mmaplock, MRLOCK_ALLOW_EQUAL_PRI|MRLOCK_BARRIER, - "xfsino", ip->i_ino); mrlock_init(&ip->i_lock, MRLOCK_ALLOW_EQUAL_PRI|MRLOCK_BARRIER, "xfsino", ip->i_ino); } From patchwork Mon Jun 7 14:52:19 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12303773 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 49D94C52CA5 for ; Mon, 7 Jun 2021 14:53:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1838C60FE5 for ; Mon, 7 Jun 2021 14:53:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231183AbhFGOy4 (ORCPT ); Mon, 7 Jun 2021 10:54:56 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]:51722 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230525AbhFGOyc (ORCPT ); Mon, 7 Jun 2021 10:54:32 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id E170F1FDD5; Mon, 7 Jun 2021 14:52:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623077557; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=KfDin0J2zny5bW8XnWnPVfAKe9yqG7WG/xT8oxb1OFc=; b=ko9c25cMZ9m2NiKcrQaoy4jB2KP5+1KJ5AmveEHVcxVC897KyySKjriOsW9ddFUrBmrsnW tplhe3RBaLbIBGr/60F4LD9RtyGK0O7MJoXf3rJ2ReLB48rTwVcVr5qpYZnb39Umi7B1TU fevxNlW1A1nPHC0/mQw4leHU5yao5Us= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623077557; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=KfDin0J2zny5bW8XnWnPVfAKe9yqG7WG/xT8oxb1OFc=; b=zCJrGdMPlN9oBtyZvI/DiQEevYk2yUhECJIW5Kelc7FBxXz22DF6h0GjiHDGJjDHaYrs2Q SEThINadMsXZUtBw== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id B3662A3B90; Mon, 7 Jun 2021 14:52:37 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id ABD5C1F2CB8; Mon, 7 Jun 2021 16:52:36 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Jan Kara , "Darrick J. Wong" Subject: [PATCH 09/14] xfs: Convert double locking of MMAPLOCK to use VFS helpers Date: Mon, 7 Jun 2021 16:52:19 +0200 Message-Id: <20210607145236.31852-9-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210607144631.8717-1-jack@suse.cz> References: <20210607144631.8717-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=4917; h=from:subject; bh=Xf/EluwecvDckS87lWRfy77OC8mUhzlKOvkojalUOxs=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgvjKjGNBsDA4B/xWBG7QqXcuTTJ84ph+j2kRzoAje 0Eh8m9mJATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYL4yowAKCRCcnaoHP2RA2Uh7CA DZbcBgGYIBPgI4e6MROhQ7DoBtFZkTvHXvvFzadqOPGkvDAkWZ7mKBrNWv6aRNNMkTBjw6uF7CuYOR AQk0lmW+6Cx7gK1uu5H9A96j7hXnKvCDYplEu4gG5Ac7LQTmWPNtFlG4gATfERUak58My8Ja2eDkLj r74P/RTyjlGIG5SLeKnqQAKaxk1qYZ7ILXWpX4U4VZ1J6U9X7uZYKaU7HfsZd5yqB+QADrj++hrsEG Kgr4bDYlseBkWLvTOGTVyaU+izvpq7aJNMbC+BoWwusx2fNABlDXJo0MDQiivDY0+75/9P3+OatqKt 5Bux1K4NKGY6+gr+NlJqaR6T1oEnwb X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Convert places in XFS that take MMAPLOCK for two inodes to use helper VFS provides for it (filemap_invalidate_down_write_two()). Note that this changes lock ordering for MMAPLOCK from inode number based ordering to pointer based ordering VFS generally uses. CC: "Darrick J. Wong" Reviewed-by: Darrick J. Wong Signed-off-by: Jan Kara Reviewed-by: Darrick J. Wong --- fs/xfs/xfs_bmap_util.c | 15 ++++++++------- fs/xfs/xfs_inode.c | 37 +++++++++++-------------------------- 2 files changed, 19 insertions(+), 33 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index a5e9d7d34023..7421d6ec4def 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1582,7 +1582,6 @@ xfs_swap_extents( struct xfs_bstat *sbp = &sxp->sx_stat; int src_log_flags, target_log_flags; int error = 0; - int lock_flags; uint64_t f; int resblks = 0; unsigned int flags = 0; @@ -1594,8 +1593,8 @@ xfs_swap_extents( * do the rest of the checks. */ lock_two_nondirectories(VFS_I(ip), VFS_I(tip)); - lock_flags = XFS_MMAPLOCK_EXCL; - xfs_lock_two_inodes(ip, XFS_MMAPLOCK_EXCL, tip, XFS_MMAPLOCK_EXCL); + filemap_invalidate_lock_two(VFS_I(ip)->i_mapping, + VFS_I(tip)->i_mapping); /* Verify that both files have the same format */ if ((VFS_I(ip)->i_mode & S_IFMT) != (VFS_I(tip)->i_mode & S_IFMT)) { @@ -1667,7 +1666,6 @@ xfs_swap_extents( * or cancel will unlock the inodes from this point onwards. */ xfs_lock_two_inodes(ip, XFS_ILOCK_EXCL, tip, XFS_ILOCK_EXCL); - lock_flags |= XFS_ILOCK_EXCL; xfs_trans_ijoin(tp, ip, 0); xfs_trans_ijoin(tp, tip, 0); @@ -1786,13 +1784,16 @@ xfs_swap_extents( trace_xfs_swap_extent_after(ip, 0); trace_xfs_swap_extent_after(tip, 1); +out_unlock_ilock: + xfs_iunlock(ip, XFS_ILOCK_EXCL); + xfs_iunlock(tip, XFS_ILOCK_EXCL); out_unlock: - xfs_iunlock(ip, lock_flags); - xfs_iunlock(tip, lock_flags); + filemap_invalidate_unlock_two(VFS_I(ip)->i_mapping, + VFS_I(tip)->i_mapping); unlock_two_nondirectories(VFS_I(ip), VFS_I(tip)); return error; out_trans_cancel: xfs_trans_cancel(tp); - goto out_unlock; + goto out_unlock_ilock; } diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index e1854a660809..0468f56f3bbb 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -556,12 +556,10 @@ xfs_lock_inodes( } /* - * xfs_lock_two_inodes() can only be used to lock one type of lock at a time - - * the mmaplock or the ilock, but not more than one type at a time. If we lock - * more than one at a time, lockdep will report false positives saying we have - * violated locking orders. The iolock must be double-locked separately since - * we use i_rwsem for that. We now support taking one lock EXCL and the other - * SHARED. + * xfs_lock_two_inodes() can only be used to lock ilock. The iolock and + * mmaplock must be double-locked separately since we use i_rwsem and + * invalidate_lock for that. We now support taking one lock EXCL and the + * other SHARED. */ void xfs_lock_two_inodes( @@ -579,15 +577,8 @@ xfs_lock_two_inodes( ASSERT(hweight32(ip1_mode) == 1); ASSERT(!(ip0_mode & (XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL))); ASSERT(!(ip1_mode & (XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL))); - ASSERT(!(ip0_mode & (XFS_MMAPLOCK_SHARED|XFS_MMAPLOCK_EXCL)) || - !(ip0_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL))); - ASSERT(!(ip1_mode & (XFS_MMAPLOCK_SHARED|XFS_MMAPLOCK_EXCL)) || - !(ip1_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL))); - ASSERT(!(ip1_mode & (XFS_MMAPLOCK_SHARED|XFS_MMAPLOCK_EXCL)) || - !(ip0_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL))); - ASSERT(!(ip0_mode & (XFS_MMAPLOCK_SHARED|XFS_MMAPLOCK_EXCL)) || - !(ip1_mode & (XFS_ILOCK_SHARED|XFS_ILOCK_EXCL))); - + ASSERT(!(ip0_mode & (XFS_MMAPLOCK_SHARED|XFS_MMAPLOCK_EXCL))); + ASSERT(!(ip1_mode & (XFS_MMAPLOCK_SHARED|XFS_MMAPLOCK_EXCL))); ASSERT(ip0->i_ino != ip1->i_ino); if (ip0->i_ino > ip1->i_ino) { @@ -3750,11 +3741,8 @@ xfs_ilock2_io_mmap( ret = xfs_iolock_two_inodes_and_break_layout(VFS_I(ip1), VFS_I(ip2)); if (ret) return ret; - if (ip1 == ip2) - xfs_ilock(ip1, XFS_MMAPLOCK_EXCL); - else - xfs_lock_two_inodes(ip1, XFS_MMAPLOCK_EXCL, - ip2, XFS_MMAPLOCK_EXCL); + filemap_invalidate_lock_two(VFS_I(ip1)->i_mapping, + VFS_I(ip2)->i_mapping); return 0; } @@ -3764,12 +3752,9 @@ xfs_iunlock2_io_mmap( struct xfs_inode *ip1, struct xfs_inode *ip2) { - bool same_inode = (ip1 == ip2); - - xfs_iunlock(ip2, XFS_MMAPLOCK_EXCL); - if (!same_inode) - xfs_iunlock(ip1, XFS_MMAPLOCK_EXCL); + filemap_invalidate_unlock_two(VFS_I(ip1)->i_mapping, + VFS_I(ip2)->i_mapping); inode_unlock(VFS_I(ip2)); - if (!same_inode) + if (ip1 != ip2) inode_unlock(VFS_I(ip1)); } From patchwork Mon Jun 7 14:52:20 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12303757 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, UNWANTED_LANGUAGE_BODY,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C7D28C49ED6 for ; Mon, 7 Jun 2021 14:52:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B153A61108 for ; Mon, 7 Jun 2021 14:52:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231371AbhFGOyj (ORCPT ); Mon, 7 Jun 2021 10:54:39 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]:51664 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230481AbhFGOyb (ORCPT ); Mon, 7 Jun 2021 10:54:31 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id D2A781FDD4; Mon, 7 Jun 2021 14:52:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623077557; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=emP2yQ1eljPd1NFX6C4KS1pG2iKhLeUQkNZ2AuwC2+s=; b=M0d+f8btnCIe1QK/dux+/6AQfIyyA/+aDbfV8qb30yBxhWh6TMzBgrC3+avIrmPbcZzJ9r ir1hn38uTd6Hgjk+8td+UxOlJXLbd8frAlcBvtF6DQYY3Lk1hTqrSbOxG+/kOVO/QvmUJ0 +rfKfYtP0POGSeqCcArR+QX77x3kqzk= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623077557; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=emP2yQ1eljPd1NFX6C4KS1pG2iKhLeUQkNZ2AuwC2+s=; b=Mrs92P348/BMc30iG2PezhMoktG73qKm6adBOt7FIzmlw6nEWuy6rlFva1dQf829UAqptu niFzpiGEe/n4UQBw== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id B4324A3B96; Mon, 7 Jun 2021 14:52:37 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id B10E21F2CB9; Mon, 7 Jun 2021 16:52:36 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Jan Kara Subject: [PATCH 10/14] zonefs: Convert to using invalidate_lock Date: Mon, 7 Jun 2021 16:52:20 +0200 Message-Id: <20210607145236.31852-10-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210607144631.8717-1-jack@suse.cz> References: <20210607144631.8717-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=3534; h=from:subject; bh=X1+FAHnRq/hvKiFmSNKc+MwXBtDprJwr7H1BxXLJ9wE=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgvjKkL5zkdMExsJLTZiQSWW/f9riYmtp7RAK4a4BF P6yR1FuJATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYL4ypAAKCRCcnaoHP2RA2dAfCA DQMh0/p+yS2UAZ84nujOxvtQWyQ6awdWkN4hNN/NdNp5tF69bN9+gs7jwWbp/UgDdJ220wsXGvYJJi l2EAGoCfm5elr4VAcWec5uAdxI7bkybITdz43ZbQNdoG4lLMGhMmUlcJuWoIxcmRYB/Vbz4++4lVy5 YWxTPGE72IpnydtQTVWRFcrzV+Yh5gwq63DnNxLIlJzyJqjZzL4px5hWCRzj2BTRO5I8fDVED+93vB t8BHGAyhO1G3UR2EqKW06PigQJdrTfBpGHLGlIPkuQMrITkTMahJAzT839RpHEZVE/7E/lTMzdWSZP znTAQKojbwMw4ceZJxqEdetQR2BhGh X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Use invalidate_lock instead of zonefs' private i_mmap_sem. The intended purpose is exactly the same. CC: Damien Le Moal CC: Johannes Thumshirn CC: Acked-by: Damien Le Moal Signed-off-by: Jan Kara --- fs/zonefs/super.c | 23 +++++------------------ fs/zonefs/zonefs.h | 7 +++---- 2 files changed, 8 insertions(+), 22 deletions(-) diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c index cd145d318b17..8931a566038e 100644 --- a/fs/zonefs/super.c +++ b/fs/zonefs/super.c @@ -462,7 +462,7 @@ static int zonefs_file_truncate(struct inode *inode, loff_t isize) inode_dio_wait(inode); /* Serialize against page faults */ - down_write(&zi->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); /* Serialize against zonefs_iomap_begin() */ mutex_lock(&zi->i_truncate_mutex); @@ -500,7 +500,7 @@ static int zonefs_file_truncate(struct inode *inode, loff_t isize) unlock: mutex_unlock(&zi->i_truncate_mutex); - up_write(&zi->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); return ret; } @@ -575,18 +575,6 @@ static int zonefs_file_fsync(struct file *file, loff_t start, loff_t end, return ret; } -static vm_fault_t zonefs_filemap_fault(struct vm_fault *vmf) -{ - struct zonefs_inode_info *zi = ZONEFS_I(file_inode(vmf->vma->vm_file)); - vm_fault_t ret; - - down_read(&zi->i_mmap_sem); - ret = filemap_fault(vmf); - up_read(&zi->i_mmap_sem); - - return ret; -} - static vm_fault_t zonefs_filemap_page_mkwrite(struct vm_fault *vmf) { struct inode *inode = file_inode(vmf->vma->vm_file); @@ -607,16 +595,16 @@ static vm_fault_t zonefs_filemap_page_mkwrite(struct vm_fault *vmf) file_update_time(vmf->vma->vm_file); /* Serialize against truncates */ - down_read(&zi->i_mmap_sem); + filemap_invalidate_lock_shared(inode->i_mapping); ret = iomap_page_mkwrite(vmf, &zonefs_iomap_ops); - up_read(&zi->i_mmap_sem); + filemap_invalidate_unlock_shared(inode->i_mapping); sb_end_pagefault(inode->i_sb); return ret; } static const struct vm_operations_struct zonefs_file_vm_ops = { - .fault = zonefs_filemap_fault, + .fault = filemap_fault, .map_pages = filemap_map_pages, .page_mkwrite = zonefs_filemap_page_mkwrite, }; @@ -1158,7 +1146,6 @@ static struct inode *zonefs_alloc_inode(struct super_block *sb) inode_init_once(&zi->i_vnode); mutex_init(&zi->i_truncate_mutex); - init_rwsem(&zi->i_mmap_sem); zi->i_wr_refcnt = 0; return &zi->i_vnode; diff --git a/fs/zonefs/zonefs.h b/fs/zonefs/zonefs.h index 51141907097c..7b147907c328 100644 --- a/fs/zonefs/zonefs.h +++ b/fs/zonefs/zonefs.h @@ -70,12 +70,11 @@ struct zonefs_inode_info { * and changes to the inode private data, and in particular changes to * a sequential file size on completion of direct IO writes. * Serialization of mmap read IOs with truncate and syscall IO - * operations is done with i_mmap_sem in addition to i_truncate_mutex. - * Only zonefs_seq_file_truncate() takes both lock (i_mmap_sem first, - * i_truncate_mutex second). + * operations is done with invalidate_lock in addition to + * i_truncate_mutex. Only zonefs_seq_file_truncate() takes both lock + * (invalidate_lock first, i_truncate_mutex second). */ struct mutex i_truncate_mutex; - struct rw_semaphore i_mmap_sem; /* guarded by i_truncate_mutex */ unsigned int i_wr_refcnt; From patchwork Mon Jun 7 14:52:21 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12303771 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1A759C4CED1 for ; Mon, 7 Jun 2021 14:53:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B8F2960FE5 for ; Mon, 7 Jun 2021 14:53:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231480AbhFGOyy (ORCPT ); Mon, 7 Jun 2021 10:54:54 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]:51748 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231183AbhFGOyd (ORCPT ); Mon, 7 Jun 2021 10:54:33 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 0F9621FDD7; Mon, 7 Jun 2021 14:52:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623077558; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=IYSgS5m2T2u6Q6Ponyx+Zv42cR4FBe8YnAlTCyybDyU=; b=joREQLoRJIX4r39pUhk19XimH+tuu2F5vaaVaQDlBNuVXvKj2tHu4ev8j+cbRrErJQSik9 1yPwKwEBVUzluhO2f0FiTjYAKvWdjh3GZRkJLBBUiJuvDNb9glT5YEFmLIU6HLbfampi+I B2NiB4PP/evrejwYxGA4AQDux+zgeD0= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623077558; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=IYSgS5m2T2u6Q6Ponyx+Zv42cR4FBe8YnAlTCyybDyU=; b=gq+L6wX/9UqiGmzLKik0LrRbrsxw3yavon/SNjh1r91+DTobUhBj045SJq5IDqteJhm/eI HLEZ577PX0qTIpAQ== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id CEC8FA3B8B; Mon, 7 Jun 2021 14:52:37 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id B777C1F2CBA; Mon, 7 Jun 2021 16:52:36 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Jan Kara Subject: [PATCH 11/14] f2fs: Convert to using invalidate_lock Date: Mon, 7 Jun 2021 16:52:21 +0200 Message-Id: <20210607145236.31852-11-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210607144631.8717-1-jack@suse.cz> References: <20210607144631.8717-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=11698; h=from:subject; bh=ezAzL9Mh91MTCMm+oIB7Tr9ZHGQQoU+2zfoCcWTPNuI=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgvjKlRxZLy+Up9fbVrWZihVGnUQaDfKkh8oR7zo3U L1Kw4eiJATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYL4ypQAKCRCcnaoHP2RA2S7DCA CNAmWsJA8HAgaSyRFLyyx8/cjpq2qUTZz+QXtWInurmb+t7GSrQiL6XPLT9089OuR/rCYQsasRpIzx OnobpcKVc/Z92L6HaArcxsaiROmXJB6CkaVnxFYXq2/j+EUuOwVWu33OCytCKHBO691Nlz3taJ2qJ4 BiGfPnLJA8TSKyEIztKaI0dg+bs/axNsi4WuhXObCCC34jMGTyqB77AukcsscFJRH9WfRMZ8U0XcHA bab3CmgNvHLOQS1sx7SmqetsyQ9lhLHW/2C+qYjJbbxMxUTJQceRNNS4YLaqFiLAyYN+fSWSMq8uvN It+T/kK6rX1p2DeS4phX4kn3DkD7Lb X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Use invalidate_lock instead of f2fs' private i_mmap_sem. The intended purpose is exactly the same. By this conversion we fix a long standing race between hole punching and read(2) / readahead(2) paths that can lead to stale page cache contents. CC: Jaegeuk Kim CC: Chao Yu CC: linux-f2fs-devel@lists.sourceforge.net Acked-by: Chao Yu Signed-off-by: Jan Kara --- fs/f2fs/data.c | 4 ++-- fs/f2fs/f2fs.h | 1 - fs/f2fs/file.c | 62 ++++++++++++++++++++++++------------------------- fs/f2fs/super.c | 1 - 4 files changed, 32 insertions(+), 36 deletions(-) diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index 96f1a354f89f..66ce7e7b1c37 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -3165,12 +3165,12 @@ static void f2fs_write_failed(struct address_space *mapping, loff_t to) /* In the fs-verity case, f2fs_end_enable_verity() does the truncate */ if (to > i_size && !f2fs_verity_in_progress(inode)) { down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); - down_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_lock(mapping); truncate_pagecache(inode, i_size); f2fs_truncate_blocks(inode, i_size, true); - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); } } diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 044878866ca3..1f887c906aaf 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -748,7 +748,6 @@ struct f2fs_inode_info { /* avoid racing between foreground op and gc */ struct rw_semaphore i_gc_rwsem[2]; - struct rw_semaphore i_mmap_sem; struct rw_semaphore i_xattr_sem; /* avoid racing between reading and changing EAs */ int i_extra_isize; /* size of extra space located in i_addr */ diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 44a4650aea7b..cce1ac3dd51b 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -38,10 +38,7 @@ static vm_fault_t f2fs_filemap_fault(struct vm_fault *vmf) struct inode *inode = file_inode(vmf->vma->vm_file); vm_fault_t ret; - down_read(&F2FS_I(inode)->i_mmap_sem); ret = filemap_fault(vmf); - up_read(&F2FS_I(inode)->i_mmap_sem); - if (!ret) f2fs_update_iostat(F2FS_I_SB(inode), APP_MAPPED_READ_IO, F2FS_BLKSIZE); @@ -102,7 +99,7 @@ static vm_fault_t f2fs_vm_page_mkwrite(struct vm_fault *vmf) f2fs_bug_on(sbi, f2fs_has_inline_data(inode)); file_update_time(vmf->vma->vm_file); - down_read(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_lock_shared(inode->i_mapping); lock_page(page); if (unlikely(page->mapping != inode->i_mapping || page_offset(page) > i_size_read(inode) || @@ -161,7 +158,7 @@ static vm_fault_t f2fs_vm_page_mkwrite(struct vm_fault *vmf) trace_f2fs_vm_page_mkwrite(page, DATA); out_sem: - up_read(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock_shared(inode->i_mapping); sb_end_pagefault(inode->i_sb); err: @@ -942,7 +939,7 @@ int f2fs_setattr(struct user_namespace *mnt_userns, struct dentry *dentry, } down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); - down_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); truncate_setsize(inode, attr->ia_size); @@ -952,7 +949,7 @@ int f2fs_setattr(struct user_namespace *mnt_userns, struct dentry *dentry, * do not trim all blocks after i_size if target size is * larger than i_size. */ - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); if (err) return err; @@ -1097,7 +1094,7 @@ static int punch_hole(struct inode *inode, loff_t offset, loff_t len) blk_end = (loff_t)pg_end << PAGE_SHIFT; down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); - down_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_lock(mapping); truncate_inode_pages_range(mapping, blk_start, blk_end - 1); @@ -1106,7 +1103,7 @@ static int punch_hole(struct inode *inode, loff_t offset, loff_t len) ret = f2fs_truncate_hole(inode, pg_start, pg_end); f2fs_unlock_op(sbi); - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); } } @@ -1341,7 +1338,7 @@ static int f2fs_do_collapse(struct inode *inode, loff_t offset, loff_t len) /* avoid gc operation during block exchange */ down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); - down_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); f2fs_lock_op(sbi); f2fs_drop_extent_tree(inode); @@ -1349,7 +1346,7 @@ static int f2fs_do_collapse(struct inode *inode, loff_t offset, loff_t len) ret = __exchange_data_block(inode, inode, end, start, nrpages - end, true); f2fs_unlock_op(sbi); - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); return ret; } @@ -1380,13 +1377,13 @@ static int f2fs_collapse_range(struct inode *inode, loff_t offset, loff_t len) return ret; /* write out all moved pages, if possible */ - down_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); filemap_write_and_wait_range(inode->i_mapping, offset, LLONG_MAX); truncate_pagecache(inode, offset); new_size = i_size_read(inode) - len; ret = f2fs_truncate_blocks(inode, new_size, true); - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); if (!ret) f2fs_i_size_write(inode, new_size); return ret; @@ -1486,7 +1483,7 @@ static int f2fs_zero_range(struct inode *inode, loff_t offset, loff_t len, pgoff_t end; down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); - down_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_lock(mapping); truncate_pagecache_range(inode, (loff_t)index << PAGE_SHIFT, @@ -1498,7 +1495,7 @@ static int f2fs_zero_range(struct inode *inode, loff_t offset, loff_t len, ret = f2fs_get_dnode_of_data(&dn, index, ALLOC_NODE); if (ret) { f2fs_unlock_op(sbi); - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); goto out; } @@ -1510,7 +1507,7 @@ static int f2fs_zero_range(struct inode *inode, loff_t offset, loff_t len, f2fs_put_dnode(&dn); f2fs_unlock_op(sbi); - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); f2fs_balance_fs(sbi, dn.node_changed); @@ -1545,6 +1542,7 @@ static int f2fs_zero_range(struct inode *inode, loff_t offset, loff_t len, static int f2fs_insert_range(struct inode *inode, loff_t offset, loff_t len) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); + struct address_space *mapping = inode->i_mapping; pgoff_t nr, pg_start, pg_end, delta, idx; loff_t new_size; int ret = 0; @@ -1567,14 +1565,14 @@ static int f2fs_insert_range(struct inode *inode, loff_t offset, loff_t len) f2fs_balance_fs(sbi, true); - down_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_lock(mapping); ret = f2fs_truncate_blocks(inode, i_size_read(inode), true); - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); if (ret) return ret; /* write out all dirty pages from offset */ - ret = filemap_write_and_wait_range(inode->i_mapping, offset, LLONG_MAX); + ret = filemap_write_and_wait_range(mapping, offset, LLONG_MAX); if (ret) return ret; @@ -1585,7 +1583,7 @@ static int f2fs_insert_range(struct inode *inode, loff_t offset, loff_t len) /* avoid gc operation during block exchange */ down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); - down_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_lock(mapping); truncate_pagecache(inode, offset); while (!ret && idx > pg_start) { @@ -1601,14 +1599,14 @@ static int f2fs_insert_range(struct inode *inode, loff_t offset, loff_t len) idx + delta, nr, false); f2fs_unlock_op(sbi); } - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); /* write out all moved pages, if possible */ - down_write(&F2FS_I(inode)->i_mmap_sem); - filemap_write_and_wait_range(inode->i_mapping, offset, LLONG_MAX); + filemap_invalidate_lock(mapping); + filemap_write_and_wait_range(mapping, offset, LLONG_MAX); truncate_pagecache(inode, offset); - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); if (!ret) f2fs_i_size_write(inode, new_size); @@ -3442,7 +3440,7 @@ static int f2fs_release_compress_blocks(struct file *filp, unsigned long arg) goto out; down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); - down_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); last_idx = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE); @@ -3478,7 +3476,7 @@ static int f2fs_release_compress_blocks(struct file *filp, unsigned long arg) } up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); out: inode_unlock(inode); @@ -3595,7 +3593,7 @@ static int f2fs_reserve_compress_blocks(struct file *filp, unsigned long arg) } down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); - down_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); last_idx = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE); @@ -3631,7 +3629,7 @@ static int f2fs_reserve_compress_blocks(struct file *filp, unsigned long arg) } up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); if (ret >= 0) { F2FS_I(inode)->i_flags &= ~F2FS_IMMUTABLE_FL; @@ -3751,7 +3749,7 @@ static int f2fs_sec_trim_file(struct file *filp, unsigned long arg) goto err; down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); - down_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_lock(mapping); ret = filemap_write_and_wait_range(mapping, range.start, to_end ? LLONG_MAX : end_addr - 1); @@ -3838,7 +3836,7 @@ static int f2fs_sec_trim_file(struct file *filp, unsigned long arg) ret = f2fs_secure_erase(prev_bdev, inode, prev_index, prev_block, len, range.flags); out: - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(mapping); up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); err: inode_unlock(inode); @@ -4313,9 +4311,9 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from) /* if we couldn't write data, we should deallocate blocks. */ if (preallocated && i_size_read(inode) < target_size) { down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); - down_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); f2fs_truncate(inode); - up_write(&F2FS_I(inode)->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); } diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 7d325bfaf65a..22e942aac7ad 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -1187,7 +1187,6 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb) mutex_init(&fi->inmem_lock); init_rwsem(&fi->i_gc_rwsem[READ]); init_rwsem(&fi->i_gc_rwsem[WRITE]); - init_rwsem(&fi->i_mmap_sem); init_rwsem(&fi->i_xattr_sem); /* Will be used by directory only */ From patchwork Mon Jun 7 14:52:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12303769 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E0171C4CC7F for ; Mon, 7 Jun 2021 14:53:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9643560FE5 for ; Mon, 7 Jun 2021 14:53:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231140AbhFGOyu (ORCPT ); Mon, 7 Jun 2021 10:54:50 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]:51742 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231158AbhFGOyd (ORCPT ); Mon, 7 Jun 2021 10:54:33 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 090AB1FDA5; Mon, 7 Jun 2021 14:52:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623077558; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=aeX4rhkcDHAyqUpRgiJbnC2LH6a+zs6J+9zaOkeEH3Y=; b=XKHoLqpO5u67Iq1U103lvhzthovxYzIkoYspqJJDoDb+R9Auyk98ao/C2orSymtbdPI++5 tTLqwae79VBb2dWxJwHXWMglDDs2Gz2GB8cEu1pNeNJlI+Hb0Vkvfr94xjROSjeboqrKas keY23Y0pqdNFPyuBcqc2DxjsKC/LtB0= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623077558; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=aeX4rhkcDHAyqUpRgiJbnC2LH6a+zs6J+9zaOkeEH3Y=; b=ZXyOigLF8+gzgMHZgS63re8HQfTdxNWEzzvxp+FjrrnrVr3EMSP04e6pjUfIU+7yUPRucp zKOanfokJXnvYuCg== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id CED68A3B97; Mon, 7 Jun 2021 14:52:37 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id BBB171F2CBD; Mon, 7 Jun 2021 16:52:36 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Jan Kara Subject: [PATCH 12/14] fuse: Convert to using invalidate_lock Date: Mon, 7 Jun 2021 16:52:22 +0200 Message-Id: <20210607145236.31852-12-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210607144631.8717-1-jack@suse.cz> References: <20210607144631.8717-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=11313; h=from:subject; bh=Bf0pPMhZZMXzEqZz0ieuU3uwLQ7nvtve6PWBxVKZWek=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgvjKmqB0Ijs9WLLgTFedxnvXTc/CiIxq0V+5ipdJ4 NeNDQTyJATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYL4ypgAKCRCcnaoHP2RA2fnzCA CLTo/KDG1Cz6GCXe9Uwbl3ugR+Pa9NxdgsbhQ7HONLWDsmxplP0A3LB2bOs0hxzum/dN+KMnAhP7JH 6N/TZ1UWorqAcQhaz/PWbiIEO/7bv2CpV/1nUD7ZPh/dE3Tr64UyzCGJj/DX3cq0itQ5e4yJvbXDdQ gRyTs277uGbdTrpDDtMMyM8lqgEJpPxGyhHyq+U6NRHZiuEImdUd+2Ypj3MGpBrt9ZPEkb/WpssrjU Z2P+OLhfGuVFao3YU1ruH9bsRJ6WnEbTl7yoKyjfFg1k8nHZhY1TGtY3bxXBoMamEofwTLjcBmPeLY hRZJNeZZr0pBzUbC/OIz85j+J6LK3h X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Use invalidate_lock instead of fuse's private i_mmap_sem. The intended purpose is exactly the same. By this conversion we fix a long standing race between hole punching and read(2) / readahead(2) paths that can lead to stale page cache contents. CC: Miklos Szeredi Signed-off-by: Jan Kara Reviewed-by: Miklos Szeredi --- fs/fuse/dax.c | 50 +++++++++++++++++++++++------------------------- fs/fuse/dir.c | 11 ++++++----- fs/fuse/file.c | 10 +++++----- fs/fuse/fuse_i.h | 7 ------- fs/fuse/inode.c | 1 - 5 files changed, 35 insertions(+), 44 deletions(-) diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c index ff99ab2a3c43..08d2b7e2c3a0 100644 --- a/fs/fuse/dax.c +++ b/fs/fuse/dax.c @@ -443,12 +443,12 @@ static int fuse_setup_new_dax_mapping(struct inode *inode, loff_t pos, /* * Can't do inline reclaim in fault path. We call * dax_layout_busy_page() before we free a range. And - * fuse_wait_dax_page() drops fi->i_mmap_sem lock and requires it. - * In fault path we enter with fi->i_mmap_sem held and can't drop - * it. Also in fault path we hold fi->i_mmap_sem shared and not - * exclusive, so that creates further issues with fuse_wait_dax_page(). - * Hence return -EAGAIN and fuse_dax_fault() will wait for a memory - * range to become free and retry. + * fuse_wait_dax_page() drops mapping->invalidate_lock and requires it. + * In fault path we enter with mapping->invalidate_lock held and can't + * drop it. Also in fault path we hold mapping->invalidate_lock shared + * and not exclusive, so that creates further issues with + * fuse_wait_dax_page(). Hence return -EAGAIN and fuse_dax_fault() + * will wait for a memory range to become free and retry. */ if (flags & IOMAP_FAULT) { alloc_dmap = alloc_dax_mapping(fcd); @@ -512,7 +512,7 @@ static int fuse_upgrade_dax_mapping(struct inode *inode, loff_t pos, down_write(&fi->dax->sem); node = interval_tree_iter_first(&fi->dax->tree, idx, idx); - /* We are holding either inode lock or i_mmap_sem, and that should + /* We are holding either inode lock or invalidate_lock, and that should * ensure that dmap can't be truncated. We are holding a reference * on dmap and that should make sure it can't be reclaimed. So dmap * should still be there in tree despite the fact we dropped and @@ -659,14 +659,12 @@ static const struct iomap_ops fuse_iomap_ops = { static void fuse_wait_dax_page(struct inode *inode) { - struct fuse_inode *fi = get_fuse_inode(inode); - - up_write(&fi->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); schedule(); - down_write(&fi->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); } -/* Should be called with fi->i_mmap_sem lock held exclusively */ +/* Should be called with mapping->invalidate_lock held exclusively */ static int __fuse_dax_break_layouts(struct inode *inode, bool *retry, loff_t start, loff_t end) { @@ -812,18 +810,18 @@ static vm_fault_t __fuse_dax_fault(struct vm_fault *vmf, * we do not want any read/write/mmap to make progress and try * to populate page cache or access memory we are trying to free. */ - down_read(&get_fuse_inode(inode)->i_mmap_sem); + filemap_invalidate_lock_shared(inode->i_mapping); ret = dax_iomap_fault(vmf, pe_size, &pfn, &error, &fuse_iomap_ops); if ((ret & VM_FAULT_ERROR) && error == -EAGAIN) { error = 0; retry = true; - up_read(&get_fuse_inode(inode)->i_mmap_sem); + filemap_invalidate_unlock_shared(inode->i_mapping); goto retry; } if (ret & VM_FAULT_NEEDDSYNC) ret = dax_finish_sync_fault(vmf, pe_size, pfn); - up_read(&get_fuse_inode(inode)->i_mmap_sem); + filemap_invalidate_unlock_shared(inode->i_mapping); if (write) sb_end_pagefault(sb); @@ -959,7 +957,7 @@ inode_inline_reclaim_one_dmap(struct fuse_conn_dax *fcd, struct inode *inode, int ret; struct interval_tree_node *node; - down_write(&fi->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); /* Lookup a dmap and corresponding file offset to reclaim. */ down_read(&fi->dax->sem); @@ -1020,7 +1018,7 @@ inode_inline_reclaim_one_dmap(struct fuse_conn_dax *fcd, struct inode *inode, out_write_dmap_sem: up_write(&fi->dax->sem); out_mmap_sem: - up_write(&fi->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); return dmap; } @@ -1049,10 +1047,10 @@ alloc_dax_mapping_reclaim(struct fuse_conn_dax *fcd, struct inode *inode) * had a reference or some other temporary failure, * Try again. We want to give up inline reclaim only * if there is no range assigned to this node. Otherwise - * if a deadlock is possible if we sleep with fi->i_mmap_sem - * held and worker to free memory can't make progress due - * to unavailability of fi->i_mmap_sem lock. So sleep - * only if fi->dax->nr=0 + * if a deadlock is possible if we sleep with + * mapping->invalidate_lock held and worker to free memory + * can't make progress due to unavailability of + * mapping->invalidate_lock. So sleep only if fi->dax->nr=0 */ if (retry) continue; @@ -1060,8 +1058,8 @@ alloc_dax_mapping_reclaim(struct fuse_conn_dax *fcd, struct inode *inode) * There are no mappings which can be reclaimed. Wait for one. * We are not holding fi->dax->sem. So it is possible * that range gets added now. But as we are not holding - * fi->i_mmap_sem, worker should still be able to free up - * a range and wake us up. + * mapping->invalidate_lock, worker should still be able to + * free up a range and wake us up. */ if (!fi->dax->nr && !(fcd->nr_free_ranges > 0)) { if (wait_event_killable_exclusive(fcd->range_waitq, @@ -1107,7 +1105,7 @@ static int lookup_and_reclaim_dmap_locked(struct fuse_conn_dax *fcd, /* * Free a range of memory. * Locking: - * 1. Take fi->i_mmap_sem to block dax faults. + * 1. Take mapping->invalidate_lock to block dax faults. * 2. Take fi->dax->sem to protect interval tree and also to make sure * read/write can not reuse a dmap which we might be freeing. */ @@ -1121,7 +1119,7 @@ static int lookup_and_reclaim_dmap(struct fuse_conn_dax *fcd, loff_t dmap_start = start_idx << FUSE_DAX_SHIFT; loff_t dmap_end = (dmap_start + FUSE_DAX_SZ) - 1; - down_write(&fi->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); ret = fuse_dax_break_layouts(inode, dmap_start, dmap_end); if (ret) { pr_debug("virtio_fs: fuse_dax_break_layouts() failed. err=%d\n", @@ -1133,7 +1131,7 @@ static int lookup_and_reclaim_dmap(struct fuse_conn_dax *fcd, ret = lookup_and_reclaim_dmap_locked(fcd, inode, start_idx); up_write(&fi->dax->sem); out_mmap_sem: - up_write(&fi->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); return ret; } diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index 1b6c001a7dd1..4ea9c690dfed 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -1601,6 +1601,7 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr, struct fuse_mount *fm = get_fuse_mount(inode); struct fuse_conn *fc = fm->fc; struct fuse_inode *fi = get_fuse_inode(inode); + struct address_space *mapping = inode->i_mapping; FUSE_ARGS(args); struct fuse_setattr_in inarg; struct fuse_attr_out outarg; @@ -1625,11 +1626,11 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr, } if (FUSE_IS_DAX(inode) && is_truncate) { - down_write(&fi->i_mmap_sem); + filemap_invalidate_lock(mapping); fault_blocked = true; err = fuse_dax_break_layouts(inode, 0, 0); if (err) { - up_write(&fi->i_mmap_sem); + filemap_invalidate_unlock(mapping); return err; } } @@ -1739,13 +1740,13 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr, if ((is_truncate || !is_wb) && S_ISREG(inode->i_mode) && oldsize != outarg.attr.size) { truncate_pagecache(inode, outarg.attr.size); - invalidate_inode_pages2(inode->i_mapping); + invalidate_inode_pages2(mapping); } clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state); out: if (fault_blocked) - up_write(&fi->i_mmap_sem); + filemap_invalidate_unlock(mapping); return 0; @@ -1756,7 +1757,7 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr, clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state); if (fault_blocked) - up_write(&fi->i_mmap_sem); + filemap_invalidate_unlock(mapping); return err; } diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 09ef2a4d25ed..515104032ddc 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -243,7 +243,7 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir) } if (dax_truncate) { - down_write(&get_fuse_inode(inode)->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); err = fuse_dax_break_layouts(inode, 0, 0); if (err) goto out; @@ -255,7 +255,7 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir) out: if (dax_truncate) - up_write(&get_fuse_inode(inode)->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); if (is_wb_truncate | dax_truncate) { fuse_release_nowrite(inode); @@ -2920,7 +2920,7 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset, if (lock_inode) { inode_lock(inode); if (block_faults) { - down_write(&fi->i_mmap_sem); + filemap_invalidate_lock(inode->i_mapping); err = fuse_dax_break_layouts(inode, 0, 0); if (err) goto out; @@ -2976,7 +2976,7 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset, clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state); if (block_faults) - up_write(&fi->i_mmap_sem); + filemap_invalidate_unlock(inode->i_mapping); if (lock_inode) inode_unlock(inode); @@ -3045,7 +3045,7 @@ static ssize_t __fuse_copy_file_range(struct file *file_in, loff_t pos_in, * modifications. Yet this does give less guarantees than if the * copying was performed with write(2). * - * To fix this a i_mmap_sem style lock could be used to prevent new + * To fix this a mapping->invalidate_lock could be used to prevent new * faults while the copy is ongoing. */ err = fuse_writeback_range(inode_out, pos_out, pos_out + len - 1); diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 7e463e220053..5130e88f811e 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -149,13 +149,6 @@ struct fuse_inode { /** Lock to protect write related fields */ spinlock_t lock; - /** - * Can't take inode lock in fault path (leads to circular dependency). - * Introduce another semaphore which can be taken in fault path and - * then other filesystem paths can take this to block faults. - */ - struct rw_semaphore i_mmap_sem; - #ifdef CONFIG_FUSE_DAX /* * Dax specific inode data diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 393e36b74dc4..f73bab71a0a0 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -85,7 +85,6 @@ static struct inode *fuse_alloc_inode(struct super_block *sb) fi->orig_ino = 0; fi->state = 0; mutex_init(&fi->mutex); - init_rwsem(&fi->i_mmap_sem); spin_lock_init(&fi->lock); fi->forget = fuse_alloc_forget(); if (!fi->forget) From patchwork Mon Jun 7 14:52:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12303767 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 18E3AC4CC6E for ; Mon, 7 Jun 2021 14:52:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E232061107 for ; Mon, 7 Jun 2021 14:52:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231448AbhFGOyp (ORCPT ); Mon, 7 Jun 2021 10:54:45 -0400 Received: from smtp-out2.suse.de ([195.135.220.29]:51740 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231157AbhFGOyd (ORCPT ); Mon, 7 Jun 2021 10:54:33 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 0DA7F1FDD6; Mon, 7 Jun 2021 14:52:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623077558; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=c5hN9VihgYabPzS6M/DoMuNHHykbae3N2/E4zCgl0t4=; b=NFenYZndzTQCSEs0kiVAUsrK4q4SXXp9cvTfVysJCl9S8OYKpb3mep2Wudp9TVce+rKAos ADmLsAgkFnGXxOPA69oYb85BUicyV5sjjgoWrMhz6Wpq7+ACrl16M4hl2VwI4wW/Ld4Gho I9oJu7DPonRpgXb0g9RDbabMD167D1E= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623077558; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=c5hN9VihgYabPzS6M/DoMuNHHykbae3N2/E4zCgl0t4=; b=YpALarJquUvnyzCPi/IGQR1z3cZ2is7/Uh27Kg/PZfUSGOrH9/QH4sHHGmPwz8dPsIyLwH UR1UQkMjNEdHi2Bw== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id DA68DA3B92; Mon, 7 Jun 2021 14:52:37 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id C03301F2CBF; Mon, 7 Jun 2021 16:52:36 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Jan Kara Subject: [PATCH 13/14] ceph: Fix race between hole punch and page fault Date: Mon, 7 Jun 2021 16:52:23 +0200 Message-Id: <20210607145236.31852-13-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210607144631.8717-1-jack@suse.cz> References: <20210607144631.8717-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=2465; h=from:subject; bh=Sp8IgV3mV8zmiY5I9Wbg4z0IgLX0J9niLgWRID5omWo=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgvjKmt/G434Rs9ZHmHm4m7KwtsKNjseeQr2d06yTU b1jblrKJATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYL4ypgAKCRCcnaoHP2RA2XPBB/ 0WRyE3RzVXQo4gUS7mKGJPyff+S+h8thH32QhOWZDXf4WYznPoJ+G3wFt5xde4p7HM3U7v4hpB7tQ8 EygfikeupmDm5O4NkUcQX/1NsWO5mf1y1khgu8KW9yas0wO4nNDvfUKTZj+5HUQukNJRO9ns9dIP+i ZHpf05z3tet5psuGs9y6FwuZstNgxlaQ9J8UOjG7fthTYQQ84eb7tMDli7aa2ogiQCpEULtSaR2KZq oSaXwXuly1pLWQKgrrob8K3/TVpt+5qVCPl0s81BB5Lp+mP2VILVhXd3lUp0wCBKewMTkqSdkyuX4i GM5IPBYS2vNK7LNSJvGIyk6io3Qiy/ X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Ceph has a following race between hole punching and page fault: CPU1 CPU2 ceph_fallocate() ... ceph_zero_pagecache_range() ceph_filemap_fault() faults in page in the range being punched ceph_zero_objects() And now we have a page in punched range with invalid data. Fix the problem by using mapping->invalidate_lock similarly to other filesystems. Note that using invalidate_lock also fixes a similar race wrt ->readpage(). CC: Jeff Layton CC: ceph-devel@vger.kernel.org Reviewed-by: Jeff Layton Signed-off-by: Jan Kara --- fs/ceph/addr.c | 9 ++++++--- fs/ceph/file.c | 2 ++ 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index c1570fada3d8..4664b93a32db 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -1401,9 +1401,11 @@ static vm_fault_t ceph_filemap_fault(struct vm_fault *vmf) ret = VM_FAULT_SIGBUS; } else { struct address_space *mapping = inode->i_mapping; - struct page *page = find_or_create_page(mapping, 0, - mapping_gfp_constraint(mapping, - ~__GFP_FS)); + struct page *page; + + filemap_invalidate_lock_shared(mapping); + page = find_or_create_page(mapping, 0, + mapping_gfp_constraint(mapping, ~__GFP_FS)); if (!page) { ret = VM_FAULT_OOM; goto out_inline; @@ -1424,6 +1426,7 @@ static vm_fault_t ceph_filemap_fault(struct vm_fault *vmf) vmf->page = page; ret = VM_FAULT_MAJOR | VM_FAULT_LOCKED; out_inline: + filemap_invalidate_unlock_shared(mapping); dout("filemap_fault %p %llu read inline data ret %x\n", inode, off, ret); } diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 77fc037d5beb..e826b1b1e4af 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -2083,6 +2083,7 @@ static long ceph_fallocate(struct file *file, int mode, if (ret < 0) goto unlock; + filemap_invalidate_lock(inode->i_mapping); ceph_zero_pagecache_range(inode, offset, length); ret = ceph_zero_objects(inode, offset, length); @@ -2095,6 +2096,7 @@ static long ceph_fallocate(struct file *file, int mode, if (dirty) __mark_inode_dirty(inode, dirty); } + filemap_invalidate_unlock(inode->i_mapping); ceph_put_cap_refs(ci, got); unlock: From patchwork Mon Jun 7 14:52:24 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 12303759 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 07B75C4BA3C for ; Mon, 7 Jun 2021 14:52:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E226D60FE5 for ; Mon, 7 Jun 2021 14:52:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231415AbhFGOyk (ORCPT ); Mon, 7 Jun 2021 10:54:40 -0400 Received: from smtp-out1.suse.de ([195.135.220.28]:34746 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230478AbhFGOyb (ORCPT ); Mon, 7 Jun 2021 10:54:31 -0400 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 0464C21A9F; Mon, 7 Jun 2021 14:52:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1623077558; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vl6h8io6lSoWK7OIrnD7E7ya5J3bdZJ7HzP2D9qWuTI=; b=pKfvrYrcy1xAVtKNmw9VHwTemgYio9dqCCXj2qp2s2JWUKyLBGbU0ymRIUsFMTnem07EFP Rdc/QIuzF2BthdMO2sBS/L4z/rA/917Pf9sTft9oGj3xqIWDaTmw14BY3oBHj9U9ZUA7P4 3CsHFNGPY7rieEn/jaNNM+eG7j7qPGM= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1623077558; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=vl6h8io6lSoWK7OIrnD7E7ya5J3bdZJ7HzP2D9qWuTI=; b=3pUZCLZilKHDEtXnzJoG6nngWw0wNgbZHlnJxu0cudCyLBIYO1DwS4rc3i5Vx4QUQ3xDzz VUH9U7p0kK3UXkCw== Received: from quack2.suse.cz (unknown [10.100.200.198]) by relay2.suse.de (Postfix) with ESMTP id DA79EA3B99; Mon, 7 Jun 2021 14:52:37 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id C41211F2CC0; Mon, 7 Jun 2021 16:52:36 +0200 (CEST) From: Jan Kara To: Cc: Christoph Hellwig , Dave Chinner , ceph-devel@vger.kernel.org, Chao Yu , Damien Le Moal , "Darrick J. Wong" , Jaegeuk Kim , Jeff Layton , Johannes Thumshirn , linux-cifs@vger.kernel.org, , linux-f2fs-devel@lists.sourceforge.net, , , Miklos Szeredi , Steve French , Ted Tso , Matthew Wilcox , Jan Kara Subject: [PATCH 14/14] cifs: Fix race between hole punch and page fault Date: Mon, 7 Jun 2021 16:52:24 +0200 Message-Id: <20210607145236.31852-14-jack@suse.cz> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210607144631.8717-1-jack@suse.cz> References: <20210607144631.8717-1-jack@suse.cz> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=1533; h=from:subject; bh=yLMJejfXbe3cEmy2dQFvIvX02RxXlz2mOO0L4SMl8Co=; b=owEBbQGS/pANAwAIAZydqgc/ZEDZAcsmYgBgvjKnJxY3aOf0Uh04RpHAOh/EZgichtc+O2fd+Y+C czFMeL2JATMEAAEIAB0WIQSrWdEr1p4yirVVKBycnaoHP2RA2QUCYL4ypwAKCRCcnaoHP2RA2Ue3B/ 9exfyIQAzlGqcG2osnZCouldM63nI3uW0xQ2C9f5oG2IPFvR9kW48iWOZvY8IeVLn/hDfPcumyTE3S rVE5Wb8HloLQVQrPKhUTw8U7QuIJVnxEWeBrOQRkcJm2p/i8DGH/tJK6Iw/z6QFn3oEoOyOT96u/HQ XvMcq+lNmKNSfUE9n6xOQJOdvYt2bMCS7UF3V1ONE8X1HkyUHJxDYBL+zW7gTkfxuBg+Ym0QIiNQzi zBwKLs2btjQxoUfPsSpa6AqqbGrmx+OjASJLWob322k8teI7Y8mEDXc6smlXTSonPzIEcQRXKvJoIG PzUylz0kA5utwu4PimVcuV48+9RTJg X-Developer-Key: i=jack@suse.cz; a=openpgp; fpr=93C6099A142276A28BBE35D815BC833443038D8C Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Cifs has a following race between hole punching and page fault: CPU1 CPU2 smb3_fallocate() smb3_punch_hole() truncate_pagecache_range() filemap_fault() - loads old data into the page cache SMB2_ioctl(..., FSCTL_SET_ZERO_DATA, ...) And now we have stale data in the page cache. Fix the problem by locking out faults (as well as reads) using mapping->invalidate_lock while hole punch is running. CC: Steve French CC: linux-cifs@vger.kernel.org Signed-off-by: Jan Kara --- fs/cifs/smb2ops.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c index dd0eb665b680..e41ea254beaa 100644 --- a/fs/cifs/smb2ops.c +++ b/fs/cifs/smb2ops.c @@ -3579,6 +3579,7 @@ static long smb3_punch_hole(struct file *file, struct cifs_tcon *tcon, return rc; } + filemap_invalidate_lock(inode->i_mapping); /* * We implement the punch hole through ioctl, so we need remove the page * caches first, otherwise the data may be inconsistent with the server. @@ -3596,6 +3597,7 @@ static long smb3_punch_hole(struct file *file, struct cifs_tcon *tcon, sizeof(struct file_zero_data_information), CIFSMaxBufSize, NULL, NULL); free_xid(xid); + filemap_invalidate_unlock(inode->i_mapping); return rc; }