From patchwork Wed Sep 26 21:08:49 2018
X-Patchwork-Submitter: Josef Bacik
X-Patchwork-Id: 10616841
From: Josef Bacik <josef@toxicpanda.com>
To: kernel-team@fb.com, linux-kernel@vger.kernel.org, hannes@cmpxchg.org,
    tj@kernel.org, linux-fsdevel@vger.kernel.org, akpm@linux-foundation.org,
    riel@redhat.com, linux-mm@kvack.org, linux-btrfs@vger.kernel.org
Subject: [PATCH 2/9] mm: drop mmap_sem for page cache read IO submission
Date: Wed, 26 Sep 2018 17:08:49 -0400
Message-Id: <20180926210856.7895-3-josef@toxicpanda.com>
X-Mailer: git-send-email 2.14.3
In-Reply-To: <20180926210856.7895-1-josef@toxicpanda.com>
References: <20180926210856.7895-1-josef@toxicpanda.com>

From: Johannes Weiner <hannes@cmpxchg.org>

Reads can take a long time, and if anybody needs to take a write lock on the
mmap_sem, it'll block any subsequent readers of the mmap_sem while the read is
outstanding, which could cause long delays. Instead, drop the mmap_sem if we
do any reads at all.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 mm/filemap.c | 119 ++++++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 90 insertions(+), 29 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 52517f28e6f4..1ed35cd99b2c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2366,6 +2366,18 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 EXPORT_SYMBOL(generic_file_read_iter);
 
 #ifdef CONFIG_MMU
+static struct file *maybe_unlock_mmap_for_io(struct vm_area_struct *vma, int flags)
+{
+	if ((flags & (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT)) == FAULT_FLAG_ALLOW_RETRY) {
+		struct file *file;
+
+		file = get_file(vma->vm_file);
+		up_read(&vma->vm_mm->mmap_sem);
+		return file;
+	}
+	return NULL;
+}
+
 /**
  * page_cache_read - adds requested page to the page cache if not already there
  * @file:	file to read
@@ -2405,23 +2417,28 @@ static int page_cache_read(struct file *file, pgoff_t offset, gfp_t gfp_mask)
  * Synchronous readahead happens when we don't even find
  * a page in the page cache at all.
  */
-static void do_sync_mmap_readahead(struct vm_area_struct *vma,
-				   struct file_ra_state *ra,
-				   struct file *file,
-				   pgoff_t offset)
+static int do_sync_mmap_readahead(struct vm_area_struct *vma,
+				  struct file_ra_state *ra,
+				  struct file *file,
+				  pgoff_t offset,
+				  int flags)
 {
 	struct address_space *mapping = file->f_mapping;
+	struct file *fpin;
 
 	/* If we don't want any read-ahead, don't bother */
 	if (vma->vm_flags & VM_RAND_READ)
-		return;
+		return 0;
 	if (!ra->ra_pages)
-		return;
+		return 0;
 
 	if (vma->vm_flags & VM_SEQ_READ) {
+		fpin = maybe_unlock_mmap_for_io(vma, flags);
 		page_cache_sync_readahead(mapping, ra, file, offset,
 					  ra->ra_pages);
-		return;
+		if (fpin)
+			fput(fpin);
+		return fpin ? -EAGAIN : 0;
 	}
 
 	/* Avoid banging the cache line if not needed */
@@ -2433,7 +2450,9 @@ static void do_sync_mmap_readahead(struct vm_area_struct *vma,
 	 * stop bothering with read-ahead. It will only hurt.
 	 */
 	if (ra->mmap_miss > MMAP_LOTSAMISS)
-		return;
+		return 0;
+
+	fpin = maybe_unlock_mmap_for_io(vma, flags);
 
 	/*
 	 * mmap read-around
@@ -2442,28 +2461,40 @@ static void do_sync_mmap_readahead(struct vm_area_struct *vma,
 	ra->size = ra->ra_pages;
 	ra->async_size = ra->ra_pages / 4;
 	ra_submit(ra, mapping, file);
+
+	if (fpin)
+		fput(fpin);
+
+	return fpin ? -EAGAIN : 0;
 }
 
 /*
  * Asynchronous readahead happens when we find the page and PG_readahead,
  * so we want to possibly extend the readahead further..
  */
-static void do_async_mmap_readahead(struct vm_area_struct *vma,
-				    struct file_ra_state *ra,
-				    struct file *file,
-				    struct page *page,
-				    pgoff_t offset)
+static int do_async_mmap_readahead(struct vm_area_struct *vma,
+				   struct file_ra_state *ra,
+				   struct file *file,
+				   struct page *page,
+				   pgoff_t offset,
+				   int flags)
 {
 	struct address_space *mapping = file->f_mapping;
+	struct file *fpin;
 
 	/* If we don't want any read-ahead, don't bother */
 	if (vma->vm_flags & VM_RAND_READ)
-		return;
+		return 0;
 	if (ra->mmap_miss > 0)
 		ra->mmap_miss--;
-	if (PageReadahead(page))
-		page_cache_async_readahead(mapping, ra, file,
-					   page, offset, ra->ra_pages);
+	if (!PageReadahead(page))
+		return 0;
+	fpin = maybe_unlock_mmap_for_io(vma, flags);
+	page_cache_async_readahead(mapping, ra, file,
+				   page, offset, ra->ra_pages);
+	if (fpin)
+		fput(fpin);
+	return fpin ? -EAGAIN : 0;
 }
 
 /**
@@ -2479,10 +2510,8 @@ static void do_async_mmap_readahead(struct vm_area_struct *vma,
  *
  * vma->vm_mm->mmap_sem must be held on entry.
  *
- * If our return value has VM_FAULT_RETRY set, it's because
- * lock_page_or_retry() returned 0.
- * The mmap_sem has usually been released in this case.
- * See __lock_page_or_retry() for the exception.
+ * If our return value has VM_FAULT_RETRY set, the mmap_sem has
+ * usually been released.
  *
  * If our return value does not have VM_FAULT_RETRY set, the mmap_sem
  * has not been released.
@@ -2492,11 +2521,13 @@ static void do_async_mmap_readahead(struct vm_area_struct *vma,
 vm_fault_t filemap_fault(struct vm_fault *vmf)
 {
 	int error;
+	struct mm_struct *mm = vmf->vma->vm_mm;
 	struct file *file = vmf->vma->vm_file;
 	struct address_space *mapping = file->f_mapping;
 	struct file_ra_state *ra = &file->f_ra;
 	struct inode *inode = mapping->host;
 	pgoff_t offset = vmf->pgoff;
+	int flags = vmf->flags;
 	pgoff_t max_off;
 	struct page *page;
 	vm_fault_t ret = 0;
@@ -2509,27 +2540,44 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 	 * Do we have something in the page cache already?
 	 */
 	page = find_get_page(mapping, offset);
-	if (likely(page) && !(vmf->flags & FAULT_FLAG_TRIED)) {
+	if (likely(page) && !(flags & FAULT_FLAG_TRIED)) {
 		/*
 		 * We found the page, so try async readahead before
 		 * waiting for the lock.
 		 */
-		do_async_mmap_readahead(vmf->vma, ra, file, page, offset);
+		error = do_async_mmap_readahead(vmf->vma, ra, file, page, offset, vmf->flags);
+		if (error == -EAGAIN)
+			goto out_retry_wait;
 	} else if (!page) {
 		/* No page in the page cache at all */
-		do_sync_mmap_readahead(vmf->vma, ra, file, offset);
-		count_vm_event(PGMAJFAULT);
-		count_memcg_event_mm(vmf->vma->vm_mm, PGMAJFAULT);
 		ret = VM_FAULT_MAJOR;
+		count_vm_event(PGMAJFAULT);
+		count_memcg_event_mm(mm, PGMAJFAULT);
+		error = do_sync_mmap_readahead(vmf->vma, ra, file, offset, vmf->flags);
+		if (error == -EAGAIN)
+			goto out_retry_wait;
retry_find:
 		page = find_get_page(mapping, offset);
 		if (!page)
 			goto no_cached_page;
 	}
 
-	if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags)) {
-		put_page(page);
-		return ret | VM_FAULT_RETRY;
+	if (!trylock_page(page)) {
+		if (flags & FAULT_FLAG_ALLOW_RETRY) {
+			if (flags & FAULT_FLAG_RETRY_NOWAIT)
+				goto out_retry;
+			up_read(&mm->mmap_sem);
+			goto out_retry_wait;
+		}
+		if (flags & FAULT_FLAG_KILLABLE) {
+			int ret = __lock_page_killable(page);
+
+			if (ret) {
+				up_read(&mm->mmap_sem);
+				goto out_retry;
+			}
+		} else
+			__lock_page(page);
 	}
 
 	/* Did it get truncated? */
@@ -2607,6 +2655,19 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 	/* Things didn't work out. Return zero to tell the mm layer so. */
 	shrink_readahead_size_eio(file, ra);
 	return VM_FAULT_SIGBUS;
+
+out_retry_wait:
+	if (page) {
+		if (flags & FAULT_FLAG_KILLABLE)
+			wait_on_page_locked_killable(page);
+		else
+			wait_on_page_locked(page);
+	}
+
+out_retry:
+	if (page)
+		put_page(page);
+	return ret | VM_FAULT_RETRY;
 }
 EXPORT_SYMBOL(filemap_fault);
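
The locking dance in the patch generalizes beyond the kernel: pin the object
the IO needs so it cannot vanish, drop the contended reader/writer lock before
the slow read, and report "retry" so the caller revalidates everything once it
retakes the lock. Below is a minimal userspace C sketch of that pattern, with
a pthread rwlock standing in for mmap_sem; every name in it (struct cache,
slow_fetch, cache_read) is invented for illustration and is not from the patch.

/* Minimal sketch of the "drop the lock across IO" pattern. All names
 * are hypothetical; only the shape mirrors maybe_unlock_mmap_for_io(). */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

struct cache {
	pthread_rwlock_t lock;	/* stands in for mmap_sem */
	int data;		/* stands in for page cache contents */
	bool valid;
};

/* Stand-in for submitting and waiting on read IO; no lock is held here,
 * so writers on c->lock can make progress while we sleep. */
static int slow_fetch(void)
{
	sleep(1);
	return 42;
}

/*
 * Returns true and fills *out on a hit. On a miss, drops the shared lock
 * before the slow read, installs the result under the write lock, and
 * returns false -- the caller must retry, like a fault handler seeing
 * VM_FAULT_RETRY, because the world may have changed while unlocked.
 */
static bool cache_read(struct cache *c, int *out)
{
	pthread_rwlock_rdlock(&c->lock);
	if (c->valid) {
		*out = c->data;
		pthread_rwlock_unlock(&c->lock);
		return true;
	}
	pthread_rwlock_unlock(&c->lock);	/* the up_read() moment */

	int data = slow_fetch();		/* IO with no lock held */

	pthread_rwlock_wrlock(&c->lock);
	c->data = data;
	c->valid = true;
	pthread_rwlock_unlock(&c->lock);
	return false;				/* caller retries */
}

int main(void)
{
	struct cache c = { .lock = PTHREAD_RWLOCK_INITIALIZER };
	int v;

	while (!cache_read(&c, &v))
		;	/* retry loop, like re-entering the fault handler */
	printf("read %d\n", v);
	return 0;
}

One difference from the patch is worth noting: maybe_unlock_mmap_for_io() has
to take a reference with get_file() before dropping mmap_sem, because the VMA
and its file can go away the instant the lock is released; the sketch
sidesteps that by keeping the cache on the caller's stack.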