From patchwork Thu Jul 2 16:51:17 2020
X-Patchwork-Submitter: Andreas Gruenbacher
X-Patchwork-Id: 11639829
From: Andreas Gruenbacher
To: Matthew Wilcox
Cc: Dave Chinner, Linus Torvalds, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andreas Gruenbacher
Subject: [RFC 1/4] gfs2: Revert readahead conversion
Date: Thu, 2 Jul 2020 18:51:17 +0200
Message-Id: <20200702165120.1469875-2-agruenba@redhat.com>
In-Reply-To: <20200702165120.1469875-1-agruenba@redhat.com>
References: <20200702165120.1469875-1-agruenba@redhat.com>

Commit d4388340ae0b ("fs: convert mpage_readpages to mpage_readahead")
converted gfs2 and other filesystems from the ->readpages to the
->readahead address space operation.  Unlike ->readpages, ->readahead
is passed the pages to read ahead already locked.  Due to problems in
the current page locking strategy, this is now causing deadlocks in
gfs2.

Fix this by reinstating mpage_readpages from before commit d4388340ae0b
and by converting gfs2 back to ->readpages.

Signed-off-by: Andreas Gruenbacher
---
 fs/gfs2/aops.c        | 23 ++++++++-----
 fs/mpage.c            | 75 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/mpage.h |  2 ++
 3 files changed, 92 insertions(+), 8 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 72c9560f4467..786c1ce8f030 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -577,7 +577,7 @@ int gfs2_internal_read(struct gfs2_inode *ip, char *buf, loff_t *pos,
 }
 
 /**
- * gfs2_readahead - Read a bunch of pages at once
+ * gfs2_readpages - Read a bunch of pages at once
  * @file: The file to read from
  * @mapping: Address space info
  * @pages: List of pages to read
@@ -590,24 +590,31 @@ int gfs2_internal_read(struct gfs2_inode *ip, char *buf, loff_t *pos,
  *    obviously not something we'd want to do on too regular a basis.
  *    Any I/O we ignore at this time will be done via readpage later.
  * 2. We don't handle stuffed files here we let readpage do the honours.
- * 3. mpage_readahead() does most of the heavy lifting in the common case.
+ * 3. mpage_readpages() does most of the heavy lifting in the common case.
  * 4. gfs2_block_map() is relied upon to set BH_Boundary in the right places.
  */
 
-static void gfs2_readahead(struct readahead_control *rac)
+static int gfs2_readpages(struct file *file, struct address_space *mapping,
+			  struct list_head *pages, unsigned nr_pages)
 {
-	struct inode *inode = rac->mapping->host;
+	struct inode *inode = mapping->host;
 	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_sbd *sdp = GFS2_SB(inode);
 	struct gfs2_holder gh;
+	int ret;
 
 	gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, &gh);
-	if (gfs2_glock_nq(&gh))
+	ret = gfs2_glock_nq(&gh);
+	if (unlikely(ret))
 		goto out_uninit;
 	if (!gfs2_is_stuffed(ip))
-		mpage_readahead(rac, gfs2_block_map);
+		ret = mpage_readpages(mapping, pages, nr_pages, gfs2_block_map);
 	gfs2_glock_dq(&gh);
 out_uninit:
 	gfs2_holder_uninit(&gh);
+	if (unlikely(gfs2_withdrawn(sdp)))
+		ret = -EIO;
+	return ret;
 }
 
 /**
@@ -826,7 +833,7 @@ static const struct address_space_operations gfs2_aops = {
 	.writepage = gfs2_writepage,
 	.writepages = gfs2_writepages,
 	.readpage = gfs2_readpage,
-	.readahead = gfs2_readahead,
+	.readpages = gfs2_readpages,
 	.bmap = gfs2_bmap,
 	.invalidatepage = gfs2_invalidatepage,
 	.releasepage = gfs2_releasepage,
@@ -840,7 +847,7 @@ static const struct address_space_operations gfs2_jdata_aops = {
 	.writepage = gfs2_jdata_writepage,
 	.writepages = gfs2_jdata_writepages,
 	.readpage = gfs2_readpage,
-	.readahead = gfs2_readahead,
+	.readpages = gfs2_readpages,
 	.set_page_dirty = jdata_set_page_dirty,
 	.bmap = gfs2_bmap,
 	.invalidatepage = gfs2_invalidatepage,
diff --git a/fs/mpage.c b/fs/mpage.c
index 830e6cc2a9e7..5243a065a062 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -396,6 +396,81 @@ void mpage_readahead(struct readahead_control *rac, get_block_t get_block)
 }
 EXPORT_SYMBOL(mpage_readahead);
 
+/**
+ * mpage_readpages - populate an address space with some pages & start reads against them
+ * @mapping: the address_space
+ * @pages: The address of a list_head which contains the target pages.  These
+ *   pages have their ->index populated and are otherwise uninitialised.
+ *   The page at @pages->prev has the lowest file offset, and reads should be
+ *   issued in @pages->prev to @pages->next order.
+ * @nr_pages: The number of pages at *@pages
+ * @get_block: The filesystem's block mapper function.
+ *
+ * This function walks the pages and the blocks within each page, building and
+ * emitting large BIOs.
+ *
+ * If anything unusual happens, such as:
+ *
+ * - encountering a page which has buffers
+ * - encountering a page which has a non-hole after a hole
+ * - encountering a page with non-contiguous blocks
+ *
+ * then this code just gives up and calls the buffer_head-based read function.
+ * It does handle a page which has holes at the end - that is a common case:
+ * the end-of-file on blocksize < PAGE_SIZE setups.
+ *
+ * BH_Boundary explanation:
+ *
+ * There is a problem.  The mpage read code assembles several pages, gets all
+ * their disk mappings, and then submits them all.  That's fine, but obtaining
+ * the disk mappings may require I/O.  Reads of indirect blocks, for example.
+ *
+ * So an mpage read of the first 16 blocks of an ext2 file will cause I/O to be
+ * submitted in the following order:
+ *
+ *	12 0 1 2 3 4 5 6 7 8 9 10 11 13 14 15 16
+ *
+ * because the indirect block has to be read to get the mappings of blocks
+ * 13,14,15,16.  Obviously, this impacts performance.
+ *
+ * So what we do is to allow the filesystem's get_block() function to set
+ * BH_Boundary when it maps block 11.  BH_Boundary says: mapping of the block
+ * after this one will require I/O against a block which is probably close to
+ * this one.  So you should push what I/O you have currently accumulated.
+ *
+ * This all causes the disk requests to be issued in the correct order.
+ */
+int
+mpage_readpages(struct address_space *mapping, struct list_head *pages,
+		unsigned nr_pages, get_block_t get_block)
+{
+	struct mpage_readpage_args args = {
+		.get_block = get_block,
+		.is_readahead = true,
+	};
+	unsigned page_idx;
+
+	for (page_idx = 0; page_idx < nr_pages; page_idx++) {
+		struct page *page = lru_to_page(pages);
+
+		prefetchw(&page->flags);
+		list_del(&page->lru);
+		if (!add_to_page_cache_lru(page, mapping,
+					page->index,
+					readahead_gfp_mask(mapping))) {
+			args.page = page;
+			args.nr_pages = nr_pages - page_idx;
+			args.bio = do_mpage_readpage(&args);
+		}
+		put_page(page);
+	}
+	BUG_ON(!list_empty(pages));
+	if (args.bio)
+		mpage_bio_submit(REQ_OP_READ, REQ_RAHEAD, args.bio);
+	return 0;
+}
+EXPORT_SYMBOL(mpage_readpages);
+
 /*
  * This isn't called much at all
  */
diff --git a/include/linux/mpage.h b/include/linux/mpage.h
index f4f5e90a6844..181f1b0fbd83 100644
--- a/include/linux/mpage.h
+++ b/include/linux/mpage.h
@@ -16,6 +16,8 @@ struct writeback_control;
 struct readahead_control;
 
 void mpage_readahead(struct readahead_control *, get_block_t get_block);
+int mpage_readpages(struct address_space *, struct list_head *, unsigned,
+		    get_block_t);
 int mpage_readpage(struct page *page, get_block_t get_block);
 int mpage_writepages(struct address_space *mapping,
 		struct writeback_control *wbc, get_block_t get_block);
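
Note on the locking difference (an illustration, not part of the patch):
with ->readahead, the core VM adds the pages to the page cache and locks
them before calling into the filesystem - that is what "passed the pages
to read ahead already locked" above refers to - so gfs2 ends up blocking
on its glock while page locks are held.  Any path that holds the glock
and then needs one of those page locks (a page fault is one plausible
example) closes the cycle.  With ->readpages, the pages only enter the
page cache inside mpage_readpages(), after the glock has been acquired,
so the lock ordering stays consistent.  The following minimal userspace
sketch models that ABBA inversion; the two mutexes merely stand in for a
page lock and a glock, and the thread names are made up:

/* abba.c - build with: cc -pthread -o abba abba.c */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t glock = PTHREAD_MUTEX_INITIALIZER;

/* models the ->readahead path: the page lock is already held by the
 * VM when the filesystem lock is taken */
static void *readahead_path(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&page_lock);
	sleep(1);			/* widen the race window */
	pthread_mutex_lock(&glock);
	pthread_mutex_unlock(&glock);
	pthread_mutex_unlock(&page_lock);
	return NULL;
}

/* models a path that takes the locks in the opposite order, e.g. one
 * that holds the glock and then needs the page lock */
static void *fault_path(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&glock);
	sleep(1);
	pthread_mutex_lock(&page_lock);
	pthread_mutex_unlock(&page_lock);
	pthread_mutex_unlock(&glock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, readahead_path, NULL);
	pthread_create(&b, NULL, fault_path, NULL);
	pthread_join(a, NULL);		/* classic ABBA: never returns */
	pthread_join(b, NULL);
	puts("not reached in practice");
	return 0;
}

Run it and the program hangs: each thread holds the lock the other one
needs.  Reverting to ->readpages makes the readahead side take the
glock strictly before any page lock, which removes one half of the
inversion.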
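The BH_Boundary batching described in the mpage_readpages() comment can
likewise be modelled in a few lines of userspace C.  This is a toy, not
fs/mpage.c logic: toy_get_block() and submit_accumulated() are made-up
names, and the block numbers follow the ext2 example from the comment
(blocks 0-11 map directly; block 12 onwards needs the indirect block):

/* boundary.c - toy model of BH_Boundary-driven I/O batching */
#include <stdbool.h>
#include <stdio.h>

#define NBLOCKS 17	/* blocks 0-16 of the example file */

/* toy mapper: flags block 11 as a boundary because mapping block 12
 * will require reading the indirect block */
static bool toy_get_block(int block)
{
	return block == 11;	/* BH_Boundary */
}

static void submit_accumulated(int first, int last)
{
	printf("submit data blocks %d-%d\n", first, last);
}

int main(void)
{
	int first = 0;

	for (int block = 0; block < NBLOCKS; block++) {
		if (block == 12)	/* mapping block 12 needs metadata I/O */
			printf("read indirect block\n");
		if (toy_get_block(block)) {
			/* boundary hit: push the accumulated I/O so the
			 * data reaches the disk before the nearby
			 * metadata read */
			submit_accumulated(first, block);
			first = block + 1;
		}
	}
	submit_accumulated(first, NBLOCKS - 1);
	return 0;
}

Running it prints the corrected submission order from the comment:
data blocks 0-11 first, then the indirect block read, then 12-16.
Without the boundary hint, the metadata read would be issued ahead of
all the accumulated data I/O.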