From patchwork Thu Oct 18 20:23:18 2018
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 10648113
From: Josef Bacik
To: kernel-team@fb.com, hannes@cmpxchg.org, linux-kernel@vger.kernel.org,
    tj@kernel.org, david@fromorbit.com, akpm@linux-foundation.org,
    linux-fsdevel@vger.kernel.org, linux-btrfs@vger.kernel.org, riel@fb.com,
    linux-mm@kvack.org
Subject: [PATCH 7/7] btrfs: drop mmap_sem in mkwrite for btrfs
Date: Thu, 18 Oct 2018 16:23:18 -0400
Message-Id: <20181018202318.9131-8-josef@toxicpanda.com>
In-Reply-To: <20181018202318.9131-1-josef@toxicpanda.com>
References: <20181018202318.9131-1-josef@toxicpanda.com>

->page_mkwrite is extremely expensive in btrfs.  We have to reserve
space, which can take 6 lifetimes, and we could possibly have to wait on
writeback on the page, another several lifetimes.  To avoid this simply
drop the mmap_sem if we didn't have the cached page and do all of our
work and return the appropriate retry error.  If we have the cached page
we know we did all the right things to set this page up and we can just
carry on.
Signed-off-by: Josef Bacik
---
 fs/btrfs/inode.c   | 41 +++++++++++++++++++++++++++++++++++++++--
 include/linux/mm.h | 14 ++++++++++++++
 mm/filemap.c       |  3 ++-
 3 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3ea5339603cf..6b723d29bc0c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8809,7 +8809,9 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 {
 	struct page *page = vmf->page;
-	struct inode *inode = file_inode(vmf->vma->vm_file);
+	struct file *file = vmf->vma->vm_file, *fpin;
+	struct mm_struct *mm = vmf->vma->vm_mm;
+	struct inode *inode = file_inode(file);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	struct btrfs_ordered_extent *ordered;
@@ -8828,6 +8830,29 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 
 	reserved_space = PAGE_SIZE;
 
+	/*
+	 * We have our cached page from a previous mkwrite, check it to make
+	 * sure it's still dirty and our file size matches when we ran mkwrite
+	 * the last time.  If everything is OK then return VM_FAULT_LOCKED,
+	 * otherwise do the mkwrite again.
+	 */
+	if (vmf->flags & FAULT_FLAG_USED_CACHED) {
+		lock_page(page);
+		if (vmf->cached_size == i_size_read(inode) &&
+		    PageDirty(page))
+			return VM_FAULT_LOCKED;
+		unlock_page(page);
+	}
+
+	/*
+	 * mkwrite is extremely expensive, and we are holding the mmap_sem
+	 * during this, which means we can starve out anybody trying to
+	 * down_write(mmap_sem) for a long while, especially if we throw cgroups
+	 * into the mix.  So just drop the mmap_sem and do all of our work,
+	 * we'll loop back through and verify everything is ok the next time and
+	 * hopefully avoid doing the work twice.
+	 */
+	fpin = maybe_unlock_mmap_for_io(vmf->vma, vmf->flags);
 	sb_start_pagefault(inode->i_sb);
 	page_start = page_offset(page);
 	page_end = page_start + PAGE_SIZE - 1;
@@ -8844,7 +8869,7 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 	ret2 = btrfs_delalloc_reserve_space(inode, &data_reserved, page_start,
 					    reserved_space);
 	if (!ret2) {
-		ret2 = file_update_time(vmf->vma->vm_file);
+		ret2 = file_update_time(file);
 		reserved = 1;
 	}
 	if (ret2) {
@@ -8943,6 +8968,14 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 	btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE, true);
 	sb_end_pagefault(inode->i_sb);
 	extent_changeset_free(data_reserved);
+	if (fpin) {
+		unlock_page(page);
+		fput(fpin);
+		get_page(page);
+		vmf->cached_size = size;
+		vmf->cached_page = page;
+		return VM_FAULT_RETRY;
+	}
 	return VM_FAULT_LOCKED;
 }
 
@@ -8955,6 +8988,10 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 out_noreserve:
 	sb_end_pagefault(inode->i_sb);
 	extent_changeset_free(data_reserved);
+	if (fpin) {
+		fput(fpin);
+		down_read(&mm->mmap_sem);
+	}
 	return ret;
 }
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a7305d193c71..02b420be6b06 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -370,6 +370,13 @@ struct vm_fault {
					 * next time we loop through the fault
					 * handler for faster lookup.
					 */
+	loff_t cached_size;		/* ->page_mkwrite handlers may drop
+					 * the mmap_sem to avoid starvation, in
+					 * which case they need to save the
+					 * i_size in order to verify the cached
+					 * page we're using the next loop
+					 * through hasn't changed under us.
+					 */
 	/* These three entries are valid only while holding ptl lock */
 	pte_t *pte;			/* Pointer to pte entry matching
					 * the 'address'.  NULL if the page
@@ -1437,6 +1444,8 @@ extern vm_fault_t handle_mm_fault_cacheable(struct vm_fault *vmf);
 extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
			    unsigned long address, unsigned int fault_flags,
			    bool *unlocked);
+extern struct file *maybe_unlock_mmap_for_io(struct vm_area_struct *vma,
+					     int flags);
 void unmap_mapping_pages(struct address_space *mapping,
		pgoff_t start, pgoff_t nr, bool even_cows);
 void unmap_mapping_range(struct address_space *mapping,
@@ -1463,6 +1472,11 @@ static inline int fixup_user_fault(struct task_struct *tsk,
 	BUG();
 	return -EFAULT;
 }
+static inline struct file *maybe_unlock_mmap_for_io(struct vm_area_struct *vma,
+						    int flags)
+{
+	return NULL;
+}
 static inline void unmap_mapping_pages(struct address_space *mapping,
		pgoff_t start, pgoff_t nr, bool even_cows) { }
 static inline void unmap_mapping_range(struct address_space *mapping,

diff --git a/mm/filemap.c b/mm/filemap.c
index e9cb44bd35aa..8027f082d74f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2366,7 +2366,7 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 EXPORT_SYMBOL(generic_file_read_iter);
 
 #ifdef CONFIG_MMU
-static struct file *maybe_unlock_mmap_for_io(struct vm_area_struct *vma, int flags)
+struct file *maybe_unlock_mmap_for_io(struct vm_area_struct *vma, int flags)
 {
	if ((flags & (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT)) ==
	    FAULT_FLAG_ALLOW_RETRY) {
		struct file *file;
@@ -2377,6 +2377,7 @@ static struct file *maybe_unlock_mmap_for_io(struct vm_area_struct *vma, int fla
	}
	return NULL;
 }
+EXPORT_SYMBOL_GPL(maybe_unlock_mmap_for_io);
 
 /**
  * page_cache_read - adds requested page to the page cache if not already there