From patchwork Wed Sep 26 21:08:56 2018
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 10616867
From: Josef Bacik <josef@toxicpanda.com>
To: kernel-team@fb.com, linux-kernel@vger.kernel.org, hannes@cmpxchg.org,
    tj@kernel.org, linux-fsdevel@vger.kernel.org, akpm@linux-foundation.org,
    riel@redhat.com, linux-mm@kvack.org, linux-btrfs@vger.kernel.org
Subject: [PATCH 9/9] btrfs: drop mmap_sem in mkwrite for btrfs
Date: Wed, 26 Sep 2018 17:08:56 -0400
Message-Id: <20180926210856.7895-10-josef@toxicpanda.com>
In-Reply-To: <20180926210856.7895-1-josef@toxicpanda.com>
References: <20180926210856.7895-1-josef@toxicpanda.com>

->page_mkwrite is extremely expensive in btrfs.  We have to reserve
space, which can take 6 lifetimes, and we could possibly have to wait on
writeback on the page, another several lifetimes.  To avoid this simply
drop the mmap_sem if we didn't have the cached page and do all of our
work and return the appropriate retry error.  If we have the cached page
we know we did all the right things to set this page up and we can just
carry on.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/inode.c   | 41 +++++++++++++++++++++++++++++++++++++++--
 include/linux/mm.h | 14 ++++++++++++++
 mm/filemap.c       |  3 ++-
 3 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3ea5339603cf..6b723d29bc0c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8809,7 +8809,9 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 {
 	struct page *page = vmf->page;
-	struct inode *inode = file_inode(vmf->vma->vm_file);
+	struct file *file = vmf->vma->vm_file, *fpin;
+	struct mm_struct *mm = vmf->vma->vm_mm;
+	struct inode *inode = file_inode(file);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	struct btrfs_ordered_extent *ordered;
@@ -8828,6 +8830,29 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 	reserved_space = PAGE_SIZE;
 
+	/*
+	 * We have our cached page from a previous mkwrite, check it to make
+	 * sure it's still dirty and our file size matches when we ran mkwrite
+	 * the last time.  If everything is OK then return VM_FAULT_LOCKED,
+	 * otherwise do the mkwrite again.
+	 */
+	if (vmf->flags & FAULT_FLAG_USED_CACHED) {
+		lock_page(page);
+		if (vmf->cached_size == i_size_read(inode) &&
+		    PageDirty(page))
+			return VM_FAULT_LOCKED;
+		unlock_page(page);
+	}
+
+	/*
+	 * mkwrite is extremely expensive, and we are holding the mmap_sem
+	 * during this, which means we can starve out anybody trying to
+	 * down_write(mmap_sem) for a long while, especially if we throw cgroups
+	 * into the mix.  So just drop the mmap_sem and do all of our work,
+	 * we'll loop back through and verify everything is ok the next time and
+	 * hopefully avoid doing the work twice.
+	 */
+	fpin = maybe_unlock_mmap_for_io(vmf->vma, vmf->flags);
 	sb_start_pagefault(inode->i_sb);
 	page_start = page_offset(page);
 	page_end = page_start + PAGE_SIZE - 1;
@@ -8844,7 +8869,7 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 	ret2 = btrfs_delalloc_reserve_space(inode, &data_reserved, page_start,
 					    reserved_space);
 	if (!ret2) {
-		ret2 = file_update_time(vmf->vma->vm_file);
+		ret2 = file_update_time(file);
 		reserved = 1;
 	}
 	if (ret2) {
@@ -8943,6 +8968,14 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 	btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE, true);
 	sb_end_pagefault(inode->i_sb);
 	extent_changeset_free(data_reserved);
+	if (fpin) {
+		unlock_page(page);
+		fput(fpin);
+		get_page(page);
+		vmf->cached_size = size;
+		vmf->cached_page = page;
+		return VM_FAULT_RETRY;
+	}
 	return VM_FAULT_LOCKED;
 }
 
@@ -8955,6 +8988,10 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 out_noreserve:
 	sb_end_pagefault(inode->i_sb);
 	extent_changeset_free(data_reserved);
+	if (fpin) {
+		fput(fpin);
+		down_read(&mm->mmap_sem);
+	}
 	return ret;
 }
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a7305d193c71..9409845d0411 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -370,6 +370,13 @@ struct vm_fault {
 					 * next time we loop through the fault
 					 * handler for faster lookup.
 					 */
+	loff_t cached_size;		/* ->page_mkwrite handlers may drop
+					 * the mmap_sem to avoid starvation, in
+					 * which case they need to save the
+					 * i_size in order to verify the cached
+					 * page we're using the next loop
+					 * through hasn't changed under us.
+					 */
 	/* These three entries are valid only while holding ptl lock */
 	pte_t *pte;			/* Pointer to pte entry matching
 					 * the 'address'. NULL if the page
@@ -1437,6 +1444,8 @@ extern vm_fault_t handle_mm_fault_cacheable(struct vm_fault *vmf);
 extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
 			    unsigned long address, unsigned int fault_flags,
 			    bool *unlocked);
+extern struct file *maybe_unlock_mmap_for_io(struct vm_area_struct *vma,
+					     int flags);
 void unmap_mapping_pages(struct address_space *mapping,
 		pgoff_t start, pgoff_t nr, bool even_cows);
 void unmap_mapping_range(struct address_space *mapping,
@@ -1463,6 +1472,11 @@ static inline int fixup_user_fault(struct task_struct *tsk,
 	BUG();
 	return -EFAULT;
 }
+static inline struct file *maybe_unlock_mmap_for_io(struct vm_area_struct *vma,
+						    int flags)
+{
+	return NULL;
+}
 static inline void unmap_mapping_pages(struct address_space *mapping,
 		pgoff_t start, pgoff_t nr, bool even_cows) { }
 static inline void unmap_mapping_range(struct address_space *mapping,
diff --git a/mm/filemap.c b/mm/filemap.c
index e9cb44bd35aa..8027f082d74f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2366,7 +2366,7 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 EXPORT_SYMBOL(generic_file_read_iter);
 
 #ifdef CONFIG_MMU
-static struct file *maybe_unlock_mmap_for_io(struct vm_area_struct *vma, int flags)
+struct file *maybe_unlock_mmap_for_io(struct vm_area_struct *vma, int flags)
 {
 	if ((flags & (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT)) ==
 	    FAULT_FLAG_ALLOW_RETRY) {
 		struct file *file;
@@ -2377,6 +2377,7 @@ static struct file *maybe_unlock_mmap_for_io(struct vm_area_struct *vma, int fla
 	}
 	return NULL;
 }
+EXPORT_SYMBOL_GPL(maybe_unlock_mmap_for_io);
 
 /**
  * page_cache_read - adds requested page to the page cache if not already there