From patchwork Fri Jul 3 09:53:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andreas Gruenbacher X-Patchwork-Id: 11641595 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BE65A60D for ; Fri, 3 Jul 2020 09:53:36 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 8B30720723 for ; Fri, 3 Jul 2020 09:53:36 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="K6jXg2Fi" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8B30720723 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 9207F8D0067; Fri, 3 Jul 2020 05:53:35 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 8D1D08D0066; Fri, 3 Jul 2020 05:53:35 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7980A8D0067; Fri, 3 Jul 2020 05:53:35 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0043.hostedemail.com [216.40.44.43]) by kanga.kvack.org (Postfix) with ESMTP id 638F48D0036 for ; Fri, 3 Jul 2020 05:53:35 -0400 (EDT) Received: from smtpin07.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 0C279181AC9CB for ; Fri, 3 Jul 2020 09:53:35 +0000 (UTC) X-FDA: 76996302390.07.board50_140be7a26e91 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin07.hostedemail.com (Postfix) with ESMTP id DD0051803F9AF for ; Fri, 3 Jul 2020 09:53:34 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,agruenba@redhat.com,,RULES_HIT:30012:30034:30054,0,RBL:205.139.110.61:@redhat.com:.lbl8.mailshell.net-66.10.201.10 62.18.0.100;04yg58g1a8n1p75a5hu1pfsn58ucoyc49hisfj1uozmtmhxgt45idz5tau4bwkx.r17zipkyejxcch31kak6i6hkft4kjyeq1x5eckhftfa1c6xsxf3txguccxhdp9x.o-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:24,LUA_SUMMARY:none X-HE-Tag: board50_140be7a26e91 X-Filterd-Recvd-Size: 4696 Received: from us-smtp-delivery-1.mimecast.com (us-smtp-1.mimecast.com [205.139.110.61]) by imf21.hostedemail.com (Postfix) with ESMTP for ; Fri, 3 Jul 2020 09:53:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1593770014; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=H3F7Np/uM/15jpMjri5ryOy3uvdVrn3PYgDhCmIq6hw=; b=K6jXg2Fiqk/vE7s/MF1yIZxu8wYelL/O9iJn2fXJ5BZlh1d9+GuLv1n1NG0v3z57DVBex+ NpG66dwfDWuRvnXyqmxJNowKweMjqiaPQZDI7jxszhCOx8yn1n5+HRuTT3+lWTRk8JVvrT oVrmsOhsNr+EK7lkqGU+O4C+5/H1j9g= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-144-g2LTuadCPpCwQwFgSo78mw-1; Fri, 03 Jul 2020 05:53:32 -0400 X-MC-Unique: g2LTuadCPpCwQwFgSo78mw-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 1109F107ACF4; Fri, 3 Jul 2020 09:53:31 +0000 (UTC) Received: from max.home.com (unknown [10.40.192.26]) by smtp.corp.redhat.com (Postfix) with ESMTP id 494ED6111F; Fri, 3 Jul 2020 09:53:29 +0000 (UTC) From: Andreas Gruenbacher To: Linus Torvalds Cc: Matthew Wilcox , Dave Chinner , linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andreas Gruenbacher Subject: [RFC v2 1/2] fs: Add IOCB_NOIO flag for generic_file_read_iter Date: Fri, 3 Jul 2020 11:53:24 +0200 Message-Id: <20200703095325.1491832-2-agruenba@redhat.com> In-Reply-To: <20200703095325.1491832-1-agruenba@redhat.com> References: <20200703095325.1491832-1-agruenba@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Rspamd-Queue-Id: DD0051803F9AF X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam04 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add an IOCB_NOIO flag that indicates to generic_file_read_iter that it shouldn't trigger any filesystem I/O for the actual request or for readahead. This allows to do tentative reads out of the page cache as some filesystems allow, and to take the appropriate locks and retry the reads only if the requested pages are not cached. Signed-off-by: Andreas Gruenbacher Reviewed-by: Matthew Wilcox (Oracle) --- include/linux/fs.h | 1 + mm/filemap.c | 17 +++++++++++++++-- 2 files changed, 16 insertions(+), 2 deletions(-) diff --git a/include/linux/fs.h b/include/linux/fs.h index 3f881a892ea7..1ab2ea19e883 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -315,6 +315,7 @@ enum rw_hint { #define IOCB_SYNC (1 << 5) #define IOCB_WRITE (1 << 6) #define IOCB_NOWAIT (1 << 7) +#define IOCB_NOIO (1 << 8) struct kiocb { struct file *ki_filp; diff --git a/mm/filemap.c b/mm/filemap.c index f0ae9a6308cb..22f7ff2d369e 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2028,7 +2028,7 @@ ssize_t generic_file_buffered_read(struct kiocb *iocb, page = find_get_page(mapping, index); if (!page) { - if (iocb->ki_flags & IOCB_NOWAIT) + if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_NOIO)) goto would_block; page_cache_sync_readahead(mapping, ra, filp, @@ -2038,6 +2038,10 @@ ssize_t generic_file_buffered_read(struct kiocb *iocb, goto no_cached_page; } if (PageReadahead(page)) { + if (iocb->ki_flags & IOCB_NOIO) { + put_page(page); + goto out; + } page_cache_async_readahead(mapping, ra, filp, page, index, last_index - index); @@ -2249,9 +2253,18 @@ EXPORT_SYMBOL_GPL(generic_file_buffered_read); * * This is the "read_iter()" routine for all filesystems * that can use the page cache directly. + * + * The IOCB_NOWAIT flag in iocb->ki_flags indicates that -EAGAIN shall + * be returned when no data can be read without waiting for I/O requests + * to complete; it doesn't prevent readahead. + * + * The IOCB_NOIO flag in iocb->ki_flags indicates that -EAGAIN shall be + * returned when no data can be read without issuing new I/O requests, + * and 0 shall be returned when readhead would have been triggered. + * * Return: * * number of bytes copied, even for partial reads - * * negative error code if nothing was read + * * negative error code (or 0 if IOCB_NOIO) if nothing was read */ ssize_t generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter) From patchwork Fri Jul 3 09:53:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andreas Gruenbacher X-Patchwork-Id: 11641601 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 10E0B13BD for ; Fri, 3 Jul 2020 09:53:41 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id C40F520723 for ; Fri, 3 Jul 2020 09:53:40 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="DJo8Z0cV" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C40F520723 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 10C308D0068; Fri, 3 Jul 2020 05:53:38 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 0C2248D0066; Fri, 3 Jul 2020 05:53:37 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DB85F8D0068; Fri, 3 Jul 2020 05:53:37 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0118.hostedemail.com [216.40.44.118]) by kanga.kvack.org (Postfix) with ESMTP id C345E8D0066 for ; Fri, 3 Jul 2020 05:53:37 -0400 (EDT) Received: from smtpin02.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 8CE32A2CC for ; Fri, 3 Jul 2020 09:53:37 +0000 (UTC) X-FDA: 76996302474.02.worm10_310af6e26e91 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin02.hostedemail.com (Postfix) with ESMTP id 55798600014A7906 for ; Fri, 3 Jul 2020 09:53:37 +0000 (UTC) X-Spam-Summary: 1,0,0,,d41d8cd98f00b204,agruenba@redhat.com,,RULES_HIT:30003:30012:30054:30070:30090,0,RBL:205.139.110.61:@redhat.com:.lbl8.mailshell.net-66.10.201.10 62.18.0.100;04yrr6mknf8cpy7azayp5pjtzyx6kopyz85rc47oczpdknbbs89itqysr1ey78x.fspcynhnfmpoio98cs85eayzwgrsd9k65664gt9oigyouy3hb8scekmpcf7iog3.q-lbl8.mailshell.net-223.238.255.100,CacheIP:none,Bayesian:0.5,0.5,0.5,Netcheck:none,DomainCache:0,MSF:not bulk,SPF:ft,MSBL:0,DNSBL:neutral,Custom_rules:0:0:0,LFtime:23,LUA_SUMMARY:none X-HE-Tag: worm10_310af6e26e91 X-Filterd-Recvd-Size: 8021 Received: from us-smtp-delivery-1.mimecast.com (us-smtp-1.mimecast.com [205.139.110.61]) by imf49.hostedemail.com (Postfix) with ESMTP for ; Fri, 3 Jul 2020 09:53:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1593770016; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mT7+S/JZ0X5UTjCHTtZu+xMVOeg7qKFbo24bRgqj2g8=; b=DJo8Z0cVyyjco+5EDgF1G+hMnTIgYXZOOdwtM5WucqefPRSLOqFeIAkfghbF6udVWUyE8h V6vyr5qll59DiYGrMntlXL43my+Nz38hs+B1zIuhwi0uxlmiYjeEgsDJdzIO7b2A1haicr cw53ZAmVAqm6TlIPuBkZHyzjKi9KbH4= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-68-E3iaSV3zOIiSfGk0c4lyhw-1; Fri, 03 Jul 2020 05:53:34 -0400 X-MC-Unique: E3iaSV3zOIiSfGk0c4lyhw-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 54630107ACCA; Fri, 3 Jul 2020 09:53:33 +0000 (UTC) Received: from max.home.com (unknown [10.40.192.26]) by smtp.corp.redhat.com (Postfix) with ESMTP id 71F656111F; Fri, 3 Jul 2020 09:53:31 +0000 (UTC) From: Andreas Gruenbacher To: Linus Torvalds Cc: Matthew Wilcox , Dave Chinner , linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andreas Gruenbacher Subject: [RFC v2 2/2] gfs2: Rework read and page fault locking Date: Fri, 3 Jul 2020 11:53:25 +0200 Message-Id: <20200703095325.1491832-3-agruenba@redhat.com> In-Reply-To: <20200703095325.1491832-1-agruenba@redhat.com> References: <20200703095325.1491832-1-agruenba@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Rspamd-Queue-Id: 55798600014A7906 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam02 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: So far, gfs2 has taken the inode glocks inside the ->readpage and ->readahead address space operations. Since commit d4388340ae0b ("fs: convert mpage_readpages to mpage_readahead"), gfs2_readahead is passed the pages to read ahead locked. With that, the current holder of the inode glock may be trying to lock one of those pages while gfs2_readahead is trying to take the inode glock, resulting in a deadlock. Fix that by moving the lock taking to the higher-level ->read_iter file and ->fault vm operations. This also gets rid of an ugly lock inversion workaround in gfs2_readpage. The cache consistency model of filesystems like gfs2 is such that if data is found in the page cache, the data is up to date and can be used without taking any filesystem locks. If a page is not cached, filesystem locks must be taken before populating the page cache. To avoid taking the inode glock when the data is already cached, gfs2_file_read_iter first tries to read the data with the IOCB_NOIO flag set. If that fails, the inode glock is taken and the operation is retried with the IOCB_NOIO flag cleared. Signed-off-by: Andreas Gruenbacher Reviewed-by: Matthew Wilcox (Oracle) --- fs/gfs2/aops.c | 45 +------------------------------------------ fs/gfs2/file.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 51 insertions(+), 46 deletions(-) diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c index 72c9560f4467..68cd700a2719 100644 --- a/fs/gfs2/aops.c +++ b/fs/gfs2/aops.c @@ -468,21 +468,10 @@ static int stuffed_readpage(struct gfs2_inode *ip, struct page *page) } -/** - * __gfs2_readpage - readpage - * @file: The file to read a page for - * @page: The page to read - * - * This is the core of gfs2's readpage. It's used by the internal file - * reading code as in that case we already hold the glock. Also it's - * called by gfs2_readpage() once the required lock has been granted. - */ - static int __gfs2_readpage(void *file, struct page *page) { struct gfs2_inode *ip = GFS2_I(page->mapping->host); struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host); - int error; if (i_blocksize(page->mapping->host) == PAGE_SIZE && @@ -505,36 +494,11 @@ static int __gfs2_readpage(void *file, struct page *page) * gfs2_readpage - read a page of a file * @file: The file to read * @page: The page of the file - * - * This deals with the locking required. We have to unlock and - * relock the page in order to get the locking in the right - * order. */ static int gfs2_readpage(struct file *file, struct page *page) { - struct address_space *mapping = page->mapping; - struct gfs2_inode *ip = GFS2_I(mapping->host); - struct gfs2_holder gh; - int error; - - unlock_page(page); - gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, &gh); - error = gfs2_glock_nq(&gh); - if (unlikely(error)) - goto out; - error = AOP_TRUNCATED_PAGE; - lock_page(page); - if (page->mapping == mapping && !PageUptodate(page)) - error = __gfs2_readpage(file, page); - else - unlock_page(page); - gfs2_glock_dq(&gh); -out: - gfs2_holder_uninit(&gh); - if (error && error != AOP_TRUNCATED_PAGE) - lock_page(page); - return error; + return __gfs2_readpage(file, page); } /** @@ -598,16 +562,9 @@ static void gfs2_readahead(struct readahead_control *rac) { struct inode *inode = rac->mapping->host; struct gfs2_inode *ip = GFS2_I(inode); - struct gfs2_holder gh; - gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, &gh); - if (gfs2_glock_nq(&gh)) - goto out_uninit; if (!gfs2_is_stuffed(ip)) mpage_readahead(rac, gfs2_block_map); - gfs2_glock_dq(&gh); -out_uninit: - gfs2_holder_uninit(&gh); } /** diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c index fe305e4bfd37..bebde537ac8c 100644 --- a/fs/gfs2/file.c +++ b/fs/gfs2/file.c @@ -558,8 +558,29 @@ static vm_fault_t gfs2_page_mkwrite(struct vm_fault *vmf) return block_page_mkwrite_return(ret); } +static vm_fault_t gfs2_fault(struct vm_fault *vmf) +{ + struct inode *inode = file_inode(vmf->vma->vm_file); + struct gfs2_inode *ip = GFS2_I(inode); + struct gfs2_holder gh; + vm_fault_t ret; + int err; + + gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, &gh); + err = gfs2_glock_nq(&gh); + if (err) { + ret = block_page_mkwrite_return(err); + goto out_uninit; + } + ret = filemap_fault(vmf); + gfs2_glock_dq(&gh); +out_uninit: + gfs2_holder_uninit(&gh); + return ret; +} + static const struct vm_operations_struct gfs2_vm_ops = { - .fault = filemap_fault, + .fault = gfs2_fault, .map_pages = filemap_map_pages, .page_mkwrite = gfs2_page_mkwrite, }; @@ -824,6 +845,9 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from) static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to) { + struct gfs2_inode *ip; + struct gfs2_holder gh; + size_t written = 0; ssize_t ret; if (iocb->ki_flags & IOCB_DIRECT) { @@ -832,7 +856,31 @@ static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to) return ret; iocb->ki_flags &= ~IOCB_DIRECT; } - return generic_file_read_iter(iocb, to); + iocb->ki_flags |= IOCB_NOIO; + ret = generic_file_read_iter(iocb, to); + iocb->ki_flags &= ~IOCB_NOIO; + if (ret >= 0) { + if (!iov_iter_count(to)) + return ret; + written = ret; + } else { + if (ret != -EAGAIN) + return ret; + if (iocb->ki_flags & IOCB_NOWAIT) + return ret; + } + ip = GFS2_I(iocb->ki_filp->f_mapping->host); + gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, &gh); + ret = gfs2_glock_nq(&gh); + if (ret) + goto out_uninit; + ret = generic_file_read_iter(iocb, to); + if (ret > 0) + written += ret; + gfs2_glock_dq(&gh); +out_uninit: + gfs2_holder_uninit(&gh); + return written ? written : ret; } /**