From patchwork Wed May 23 14:46:46 2018
X-Patchwork-Submitter: Christoph Hellwig
X-Patchwork-Id: 10421683
From: Christoph Hellwig
To: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 2/2] xfs: add support for sub-pagesize writeback without buffer_heads
Date: Wed, 23 May 2018 16:46:46 +0200
Message-Id: <20180523144646.19159-3-hch@lst.de>
X-Mailer: git-send-email 2.17.0
In-Reply-To: <20180523144646.19159-1-hch@lst.de>
References: <20180523144646.19159-1-hch@lst.de>

Switch to using the iomap_page structure to check sub-page uptodate
status and to track sub-page I/O completion status, and remove large
quantities of boilerplate code that worked around buffer heads.

Signed-off-by: Christoph Hellwig
---
 fs/xfs/xfs_aops.c  | 536 +++++++--------------------------------------
 fs/xfs/xfs_buf.h   |   1 -
 fs/xfs/xfs_iomap.c |   3 -
 fs/xfs/xfs_super.c |   2 +-
 fs/xfs/xfs_trace.h |  18 +-
 5 files changed, 79 insertions(+), 481 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index efa2cbb27d67..d279929e53fb 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -32,9 +32,6 @@
 #include "xfs_bmap_util.h"
 #include "xfs_bmap_btree.h"
 #include "xfs_reflink.h"
-#include
-#include
-#include
 #include
 
 /*
@@ -46,25 +43,6 @@ struct xfs_writepage_ctx {
 	struct xfs_ioend	*ioend;
 };
 
-void
-xfs_count_page_state(
-	struct page		*page,
-	int			*delalloc,
-	int			*unwritten)
-{
-	struct buffer_head	*bh, *head;
-
-	*delalloc = *unwritten = 0;
-
-	bh = head = page_buffers(page);
-	do {
-		if (buffer_unwritten(bh))
-			(*unwritten) = 1;
-		else if (buffer_delay(bh))
-			(*delalloc) = 1;
-	} while ((bh = bh->b_this_page) != head);
-}
-
 struct block_device *
 xfs_find_bdev_for_inode(
 	struct inode		*inode)
@@ -97,67 +75,17 @@ xfs_finish_page_writeback(
 	struct bio_vec		*bvec,
 	int			error)
 {
+	struct iomap_page	*iop = to_iomap_page(bvec->bv_page);
+
 	if (error) {
 		SetPageError(bvec->bv_page);
 		mapping_set_error(inode->i_mapping, -EIO);
 	}
-	end_page_writeback(bvec->bv_page);
-}
 
-/*
- * We're now finished for good with this page. Update the page state via the
- * associated buffer_heads, paying attention to the start and end offsets that
- * we need to process on the page.
- *
- * Note that we open code the action in end_buffer_async_write here so that we
- * only have to iterate over the buffers attached to the page once. This is not
- * only more efficient, but also ensures that we only calls end_page_writeback
- * at the end of the iteration, and thus avoids the pitfall of having the page
- * and buffers potentially freed after every call to end_buffer_async_write.
- */
-static void
-xfs_finish_buffer_writeback(
-	struct inode		*inode,
-	struct bio_vec		*bvec,
-	int			error)
-{
-	struct buffer_head	*head = page_buffers(bvec->bv_page), *bh = head;
-	bool			busy = false;
-	unsigned int		off = 0;
-	unsigned long		flags;
-
-	ASSERT(bvec->bv_offset < PAGE_SIZE);
-	ASSERT((bvec->bv_offset & (i_blocksize(inode) - 1)) == 0);
-	ASSERT(bvec->bv_offset + bvec->bv_len <= PAGE_SIZE);
-	ASSERT((bvec->bv_len & (i_blocksize(inode) - 1)) == 0);
-
-	local_irq_save(flags);
-	bit_spin_lock(BH_Uptodate_Lock, &head->b_state);
-	do {
-		if (off >= bvec->bv_offset &&
-		    off < bvec->bv_offset + bvec->bv_len) {
-			ASSERT(buffer_async_write(bh));
-			ASSERT(bh->b_end_io == NULL);
-
-			if (error) {
-				mark_buffer_write_io_error(bh);
-				clear_buffer_uptodate(bh);
-				SetPageError(bvec->bv_page);
-			} else {
-				set_buffer_uptodate(bh);
-			}
-			clear_buffer_async_write(bh);
-			unlock_buffer(bh);
-		} else if (buffer_async_write(bh)) {
-			ASSERT(buffer_locked(bh));
-			busy = true;
-		}
-		off += bh->b_size;
-	} while ((bh = bh->b_this_page) != head);
-	bit_spin_unlock(BH_Uptodate_Lock, &head->b_state);
-	local_irq_restore(flags);
+	ASSERT(iop || i_blocksize(inode) == PAGE_SIZE);
+	ASSERT(!iop || atomic_read(&iop->write_count) > 0);
 
-	if (!busy)
+	if (!iop || atomic_dec_and_test(&iop->write_count))
 		end_page_writeback(bvec->bv_page);
 }
 
@@ -191,12 +119,8 @@ xfs_destroy_ioend(
 		next = bio->bi_private;
 
 		/* walk each page on bio, ending page IO on them */
-		bio_for_each_segment_all(bvec, bio, i) {
-			if (page_has_buffers(bvec->bv_page))
-				xfs_finish_buffer_writeback(inode, bvec, error);
-			else
-				xfs_finish_page_writeback(inode, bvec, error);
-		}
+		bio_for_each_segment_all(bvec, bio, i)
+			xfs_finish_page_writeback(inode, bvec, error);
 		bio_put(bio);
 	}
 
@@ -638,6 +562,7 @@ xfs_add_to_ioend(
 	struct inode		*inode,
 	xfs_off_t		offset,
 	struct page		*page,
+	struct iomap_page	*iop,
 	struct xfs_writepage_ctx *wpc,
 	struct writeback_control *wbc,
 	struct list_head	*iolist)
@@ -661,100 +586,27 @@ xfs_add_to_ioend(
 				bdev, sector);
 	}
 
-	/*
-	 * If the block doesn't fit into the bio we need to allocate a new
-	 * one. This shouldn't happen more than once for a given block.
-	 */
-	while (bio_add_page(wpc->ioend->io_bio, page, len, poff) != len)
-		xfs_chain_bio(wpc->ioend, wbc, bdev, sector);
+	if (!__bio_try_merge_page(wpc->ioend->io_bio, page, len, poff)) {
+		if (iop)
+			atomic_inc(&iop->write_count);
+		if (bio_full(wpc->ioend->io_bio))
+			xfs_chain_bio(wpc->ioend, wbc, bdev, sector);
+		__bio_add_page(wpc->ioend->io_bio, page, len, poff);
+	}
 
 	wpc->ioend->io_size += len;
 }
 
-STATIC void
-xfs_map_buffer(
-	struct inode		*inode,
-	struct buffer_head	*bh,
-	struct xfs_bmbt_irec	*imap,
-	xfs_off_t		offset)
-{
-	sector_t		bn;
-	struct xfs_mount	*m = XFS_I(inode)->i_mount;
-	xfs_off_t		iomap_offset = XFS_FSB_TO_B(m, imap->br_startoff);
-	xfs_daddr_t		iomap_bn = xfs_fsb_to_db(XFS_I(inode), imap->br_startblock);
-
-	ASSERT(imap->br_startblock != HOLESTARTBLOCK);
-	ASSERT(imap->br_startblock != DELAYSTARTBLOCK);
-
-	bn = (iomap_bn >> (inode->i_blkbits - BBSHIFT)) +
-	      ((offset - iomap_offset) >> inode->i_blkbits);
-
-	ASSERT(bn || XFS_IS_REALTIME_INODE(XFS_I(inode)));
-
-	bh->b_blocknr = bn;
-	set_buffer_mapped(bh);
-}
-
-STATIC void
-xfs_map_at_offset(
-	struct inode		*inode,
-	struct buffer_head	*bh,
-	struct xfs_bmbt_irec	*imap,
-	xfs_off_t		offset)
-{
-	ASSERT(imap->br_startblock != HOLESTARTBLOCK);
-	ASSERT(imap->br_startblock != DELAYSTARTBLOCK);
-
-	lock_buffer(bh);
-	xfs_map_buffer(inode, bh, imap, offset);
-	set_buffer_mapped(bh);
-	clear_buffer_delay(bh);
-	clear_buffer_unwritten(bh);
-
-	/*
-	 * If this is a realtime file, data may be on a different device.
-	 * to that pointed to from the buffer_head b_bdev currently. We can't
-	 * trust that the bufferhead has a already been mapped correctly, so
-	 * set the bdev now.
-	 */
-	bh->b_bdev = xfs_find_bdev_for_inode(inode);
-	bh->b_end_io = NULL;
-	set_buffer_async_write(bh);
-	set_buffer_uptodate(bh);
-	clear_buffer_dirty(bh);
-}
-
-STATIC void
-xfs_vm_invalidatepage(
-	struct page		*page,
-	unsigned int		offset,
-	unsigned int		length)
-{
-	trace_xfs_invalidatepage(page->mapping->host, page, offset,
-			length);
-
-	/*
-	 * If we are invalidating the entire page, clear the dirty state from it
-	 * so that we can check for attempts to release dirty cached pages in
-	 * xfs_vm_releasepage().
-	 */
-	if (offset == 0 && length >= PAGE_SIZE)
-		cancel_dirty_page(page);
-	block_invalidatepage(page, offset, length);
-}
-
 /*
- * If the page has delalloc buffers on it, we need to punch them out before we
- * invalidate the page. If we don't, we leave a stale delalloc mapping on the
- * inode that can trip a BUG() in xfs_get_blocks() later on if a direct IO read
- * is done on that same region - the delalloc extent is returned when none is
- * supposed to be there.
+ * If the page has delalloc blocks on it, we need to punch them out before we
+ * invalidate the page. If we don't, we leave a stale delalloc mapping on the
+ * inode that can trip up a later direct I/O read operation on the same region.
  *
- * We prevent this by truncating away the delalloc regions on the page before
- * invalidating it. Because they are delalloc, we can do this without needing a
- * transaction. Indeed - if we get ENOSPC errors, we have to be able to do this
- * truncation without a transaction as there is no space left for block
- * reservation (typically why we see a ENOSPC in writeback).
+ * We prevent this by truncating away the delalloc regions on the page. Because
+ * they are delalloc, we can do this without needing a transaction. Indeed - if
+ * we get ENOSPC errors, we have to be able to do this truncation without a
+ * transaction as there is no space left for block reservation (typically why we
+ * see a ENOSPC in writeback).
  */
 STATIC void
 xfs_aops_discard_page(
@@ -768,7 +620,7 @@ xfs_aops_discard_page(
 	int			error;
 
 	if (XFS_FORCED_SHUTDOWN(mp))
-		goto out_invalidate;
+		goto out;
 
 	xfs_alert(mp,
 		"page discard on page "PTR_FMT", inode 0x%llx, offset %llu.",
@@ -778,15 +630,15 @@ xfs_aops_discard_page(
 			PAGE_SIZE / i_blocksize(inode));
 	if (error && !XFS_FORCED_SHUTDOWN(mp))
 		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
-out_invalidate:
-	xfs_vm_invalidatepage(page, 0, PAGE_SIZE);
+out:
+	iomap_invalidatepage(page, 0, PAGE_SIZE);
 }
 
 /*
  * We implement an immediate ioend submission policy here to avoid needing to
  * chain multiple ioends and hence nest mempool allocations which can violate
  * forward progress guarantees we need to provide. The current ioend we are
- * adding buffers to is cached on the writepage context, and if the new buffer
+ * adding blocks to is cached on the writepage context, and if the new block
  * does not append to the cached ioend it will create a new ioend and cache that
  * instead.
  *
@@ -807,41 +659,28 @@ xfs_writepage_map(
 	uint64_t		end_offset)
 {
 	LIST_HEAD(submit_list);
+	struct iomap_page	*iop = to_iomap_page(page);
+	unsigned		len = i_blocksize(inode);
 	struct xfs_ioend	*ioend, *next;
-	struct buffer_head	*bh = NULL;
-	ssize_t			len = i_blocksize(inode);
-	int			error = 0;
-	int			count = 0;
-	loff_t			file_offset;	/* file offset of page */
-	unsigned		poffset;	/* offset into page */
+	int			error = 0, count = 0, i;
+	u64			file_offset;	/* file offset of page */
 
-	if (page_has_buffers(page))
-		bh = page_buffers(page);
+	ASSERT(iop || i_blocksize(inode) == PAGE_SIZE);
+	ASSERT(!iop || atomic_read(&iop->write_count) == 0);
 
 	/*
-	 * Walk the blocks on the page, and we we run off then end of the
-	 * current map or find the current map invalid, grab a new one.
-	 * We only use bufferheads here to check per-block state - they no
-	 * longer control the iteration through the page. This allows us to
-	 * replace the bufferhead with some other state tracking mechanism in
-	 * future.
+	 * Walk through the page to find areas to write back. If we run off the
+	 * end of the current map or find the current map invalid, grab a new
+	 * one.
 	 */
-	for (poffset = 0, file_offset = page_offset(page);
-	     poffset < PAGE_SIZE;
-	     poffset += len, file_offset += len) {
-		/* past the range we are writing, so nothing more to write. */
-		if (file_offset >= end_offset)
-			break;
-
+	for (i = 0, file_offset = page_offset(page);
+	     i < (PAGE_SIZE >> inode->i_blkbits) && file_offset < end_offset;
+	     i++, file_offset += len) {
 		/*
 		 * Block does not contain valid data, skip it.
 		 */
-		if (bh && !buffer_uptodate(bh)) {
-			if (PageUptodate(page))
-				ASSERT(buffer_mapped(bh));
-			bh = bh->b_this_page;
+		if (iop && !test_bit(i, iop->uptodate))
 			continue;
-		}
 
 		/*
 		 * If we don't have a valid map, now it's time to get a new one
@@ -854,52 +693,33 @@ xfs_writepage_map(
 			error = xfs_map_blocks(inode, file_offset, &wpc->imap,
 					     &wpc->io_type);
 			if (error)
-				goto out;
+				break;
 		}
 
-		if (wpc->io_type == XFS_IO_HOLE) {
-			/*
-			 * set_page_dirty dirties all buffers in a page, independent
-			 * of their state. The dirty state however is entirely
-			 * meaningless for holes (!mapped && uptodate), so check we did
-			 * have a buffer covering a hole here and continue.
-			 */
-			if (bh)
-				bh = bh->b_this_page;
-			continue;
-		}
-
-		if (bh) {
-			xfs_map_at_offset(inode, bh, &wpc->imap, file_offset);
-			bh = bh->b_this_page;
+		if (wpc->io_type != XFS_IO_HOLE) {
+			xfs_add_to_ioend(inode, file_offset, page, iop, wpc,
+					wbc, &submit_list);
+			count++;
 		}
-		xfs_add_to_ioend(inode, file_offset, page, wpc, wbc,
-				&submit_list);
-		count++;
 	}
 
 	ASSERT(wpc->ioend || list_empty(&submit_list));
-
-out:
 	ASSERT(PageLocked(page));
 	ASSERT(!PageWriteback(page));
 
 	/*
-	 * On error, we have to fail the ioend here because we have locked
-	 * buffers in the ioend. If we don't do this, we'll deadlock
-	 * invalidating the page as that tries to lock the buffers on the page.
-	 * Also, because we may have set pages under writeback, we have to make
-	 * sure we run IO completion to mark the error state of the IO
-	 * appropriately, so we can't cancel the ioend directly here. That means
-	 * we have to mark this page as under writeback if we included any
-	 * buffers from it in the ioend chain so that completion treats it
-	 * correctly.
+	 * On error, we have to fail the ioend here because we may have set
+	 * pages under writeback, we have to make sure we run IO completion to
+	 * mark the error state of the IO appropriately, so we can't cancel the
+	 * ioend directly here. That means we have to mark this page as under
+	 * writeback if we included any blocks from it in the ioend chain so
+	 * that completion treats it correctly.
 	 *
 	 * If we didn't include the page in the ioend, the on error we can
 	 * simply discard and unlock it as there are no other users of the page
-	 * or it's buffers right now. The caller will still need to trigger
-	 * submission of outstanding ioends on the writepage context so they are
-	 * treated correctly on error.
+	 * now. The caller will still need to trigger submission of outstanding
+	 * ioends on the writepage context so they are treated correctly on
+	 * error.
 	 */
 	if (unlikely(error)) {
 		if (!count) {
@@ -940,8 +760,8 @@ xfs_writepage_map(
 	}
 
 	/*
-	 * We can end up here with no error and nothing to write if we race with
-	 * a partial page truncate on a sub-page block sized filesystem.
+	 * We can end up here with no error and nothing to write only if we race
+	 * with a partial page truncate on a sub-page block sized filesystem.
 	 */
 	if (!count)
 		end_page_writeback(page);
@@ -956,7 +776,6 @@
  * For delalloc space on the page we need to allocate space and flush it.
  * For unwritten space on the page we need to start the conversion to
  * regular allocated space.
- * For any other dirty buffer heads on the page we should flush them.
  */
 STATIC int
 xfs_do_writepage(
@@ -1110,168 +929,6 @@ xfs_dax_writepages(
 			xfs_find_bdev_for_inode(mapping->host), wbc);
 }
 
-/*
- * Called to move a page into cleanable state - and from there
- * to be released. The page should already be clean. We always
- * have buffer heads in this call.
- *
- * Returns 1 if the page is ok to release, 0 otherwise.
- */
-STATIC int
-xfs_vm_releasepage(
-	struct page		*page,
-	gfp_t			gfp_mask)
-{
-	int			delalloc, unwritten;
-
-	trace_xfs_releasepage(page->mapping->host, page, 0, 0);
-
-	/*
-	 * mm accommodates an old ext3 case where clean pages might not have had
-	 * the dirty bit cleared. Thus, it can send actual dirty pages to
-	 * ->releasepage() via shrink_active_list(). Conversely,
-	 * block_invalidatepage() can send pages that are still marked dirty but
-	 * otherwise have invalidated buffers.
-	 *
-	 * We want to release the latter to avoid unnecessary buildup of the
-	 * LRU, so xfs_vm_invalidatepage() clears the page dirty flag on pages
-	 * that are entirely invalidated and need to be released. Hence the
-	 * only time we should get dirty pages here is through
-	 * shrink_active_list() and so we can simply skip those now.
-	 *
-	 * warn if we've left any lingering delalloc/unwritten buffers on clean
-	 * or invalidated pages we are about to release.
-	 */
-	if (PageDirty(page))
-		return 0;
-
-	xfs_count_page_state(page, &delalloc, &unwritten);
-
-	if (WARN_ON_ONCE(delalloc))
-		return 0;
-	if (WARN_ON_ONCE(unwritten))
-		return 0;
-
-	return try_to_free_buffers(page);
-}
-
-/*
- * If this is O_DIRECT or the mpage code calling tell them how large the mapping
- * is, so that we can avoid repeated get_blocks calls.
- *
- * If the mapping spans EOF, then we have to break the mapping up as the mapping
- * for blocks beyond EOF must be marked new so that sub block regions can be
- * correctly zeroed. We can't do this for mappings within EOF unless the mapping
- * was just allocated or is unwritten, otherwise the callers would overwrite
- * existing data with zeros. Hence we have to split the mapping into a range up
- * to and including EOF, and a second mapping for beyond EOF.
- */
-static void
-xfs_map_trim_size(
-	struct inode		*inode,
-	sector_t		iblock,
-	struct buffer_head	*bh_result,
-	struct xfs_bmbt_irec	*imap,
-	xfs_off_t		offset,
-	ssize_t			size)
-{
-	xfs_off_t		mapping_size;
-
-	mapping_size = imap->br_startoff + imap->br_blockcount - iblock;
-	mapping_size <<= inode->i_blkbits;
-
-	ASSERT(mapping_size > 0);
-	if (mapping_size > size)
-		mapping_size = size;
-	if (offset < i_size_read(inode) &&
-	    (xfs_ufsize_t)offset + mapping_size >= i_size_read(inode)) {
-		/* limit mapping to block that spans EOF */
-		mapping_size = roundup_64(i_size_read(inode) - offset,
-					  i_blocksize(inode));
-	}
-	if (mapping_size > LONG_MAX)
-		mapping_size = LONG_MAX;
-
-	bh_result->b_size = mapping_size;
-}
-
-static int
-xfs_get_blocks(
-	struct inode		*inode,
-	sector_t		iblock,
-	struct buffer_head	*bh_result,
-	int			create)
-{
-	struct xfs_inode	*ip = XFS_I(inode);
-	struct xfs_mount	*mp = ip->i_mount;
-	xfs_fileoff_t		offset_fsb, end_fsb;
-	int			error = 0;
-	int			lockmode = 0;
-	struct xfs_bmbt_irec	imap;
-	int			nimaps = 1;
-	xfs_off_t		offset;
-	ssize_t			size;
-
-	BUG_ON(create);
-
-	if (XFS_FORCED_SHUTDOWN(mp))
-		return -EIO;
-
-	offset = (xfs_off_t)iblock << inode->i_blkbits;
-	ASSERT(bh_result->b_size >= i_blocksize(inode));
-	size = bh_result->b_size;
-
-	if (offset >= i_size_read(inode))
-		return 0;
-
-	/*
-	 * Direct I/O is usually done on preallocated files, so try getting
-	 * a block mapping without an exclusive lock first.
-	 */
-	lockmode = xfs_ilock_data_map_shared(ip);
-
-	ASSERT(offset <= mp->m_super->s_maxbytes);
-	if (offset > mp->m_super->s_maxbytes - size)
-		size = mp->m_super->s_maxbytes - offset;
-	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + size);
-	offset_fsb = XFS_B_TO_FSBT(mp, offset);
-
-	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
-			&nimaps, 0);
-	if (error)
-		goto out_unlock;
-	if (!nimaps) {
-		trace_xfs_get_blocks_notfound(ip, offset, size);
-		goto out_unlock;
-	}
-
-	trace_xfs_get_blocks_found(ip, offset, size,
-		imap.br_state == XFS_EXT_UNWRITTEN ?
-			XFS_IO_UNWRITTEN : XFS_IO_OVERWRITE, &imap);
-	xfs_iunlock(ip, lockmode);
-
-	/* trim mapping down to size requested */
-	xfs_map_trim_size(inode, iblock, bh_result, &imap, offset, size);
-
-	/*
-	 * For unwritten extents do not report a disk address in the buffered
-	 * read case (treat as if we're reading into a hole).
-	 */
-	if (xfs_bmap_is_real_extent(&imap))
-		xfs_map_buffer(inode, bh_result, &imap, offset);
-
-	/*
-	 * If this is a realtime file, data may be on a different device.
-	 * to that pointed to from the buffer_head b_bdev currently.
-	 */
-	bh_result->b_bdev = xfs_find_bdev_for_inode(inode);
-	return 0;
-
-out_unlock:
-	xfs_iunlock(ip, lockmode);
-	return error;
-}
-
 STATIC sector_t
 xfs_vm_bmap(
 	struct address_space	*mapping,
@@ -1301,9 +958,7 @@ xfs_vm_readpage(
 	struct page		*page)
 {
 	trace_xfs_vm_readpage(page->mapping->host, 1);
-	if (i_blocksize(page->mapping->host) == PAGE_SIZE)
-		return iomap_readpage(page, &xfs_iomap_ops);
-	return mpage_readpage(page, xfs_get_blocks);
+	return iomap_readpage(page, &xfs_iomap_ops);
 }
 
 STATIC int
@@ -1314,65 +969,26 @@ xfs_vm_readpages(
 	unsigned		nr_pages)
 {
 	trace_xfs_vm_readpages(mapping->host, nr_pages);
-	if (i_blocksize(mapping->host) == PAGE_SIZE)
-		return iomap_readpages(mapping, pages, nr_pages, &xfs_iomap_ops);
-	return mpage_readpages(mapping, pages, nr_pages, xfs_get_blocks);
+	return iomap_readpages(mapping, pages, nr_pages, &xfs_iomap_ops);
 }
 
-/*
- * This is basically a copy of __set_page_dirty_buffers() with one
- * small tweak: buffers beyond EOF do not get marked dirty. If we mark them
- * dirty, we'll never be able to clean them because we don't write buffers
- * beyond EOF, and that means we can't invalidate pages that span EOF
- * that have been marked dirty. Further, the dirty state can leak into
- * the file interior if the file is extended, resulting in all sorts of
- * bad things happening as the state does not match the underlying data.
- *
- * XXX: this really indicates that bufferheads in XFS need to die. Warts like
- * this only exist because of bufferheads and how the generic code manages them.
- */
-STATIC int
-xfs_vm_set_page_dirty(
-	struct page		*page)
+static int
+xfs_vm_releasepage(
+	struct page		*page,
+	gfp_t			gfp_mask)
 {
-	struct address_space	*mapping = page->mapping;
-	struct inode		*inode = mapping->host;
-	loff_t			end_offset;
-	loff_t			offset;
-	int			newly_dirty;
-
-	if (unlikely(!mapping))
-		return !TestSetPageDirty(page);
-
-	end_offset = i_size_read(inode);
-	offset = page_offset(page);
-
-	spin_lock(&mapping->private_lock);
-	if (page_has_buffers(page)) {
-		struct buffer_head *head = page_buffers(page);
-		struct buffer_head *bh = head;
+	trace_xfs_releasepage(page->mapping->host, page, 0, 0);
+	return iomap_releasepage(page, gfp_mask);
+}
 
-		do {
-			if (offset < end_offset)
-				set_buffer_dirty(bh);
-			bh = bh->b_this_page;
-			offset += i_blocksize(inode);
-		} while (bh != head);
-	}
-	/*
-	 * Lock out page->mem_cgroup migration to keep PageDirty
-	 * synchronized with per-memcg dirty page counters.
-	 */
-	lock_page_memcg(page);
-	newly_dirty = !TestSetPageDirty(page);
-	spin_unlock(&mapping->private_lock);
-
-	if (newly_dirty)
-		__set_page_dirty(page, mapping, 1);
-	unlock_page_memcg(page);
-	if (newly_dirty)
-		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
-	return newly_dirty;
+static void
+xfs_vm_invalidatepage(
+	struct page		*page,
+	unsigned int		offset,
+	unsigned int		length)
+{
+	trace_xfs_invalidatepage(page->mapping->host, page, offset, length);
+	iomap_invalidatepage(page, offset, length);
 }
 
 static int
@@ -1390,13 +1006,13 @@ const struct address_space_operations xfs_address_space_operations = {
 	.readpages		= xfs_vm_readpages,
 	.writepage		= xfs_vm_writepage,
 	.writepages		= xfs_vm_writepages,
-	.set_page_dirty		= xfs_vm_set_page_dirty,
+	.set_page_dirty		= iomap_set_page_dirty,
 	.releasepage		= xfs_vm_releasepage,
 	.invalidatepage		= xfs_vm_invalidatepage,
 	.bmap			= xfs_vm_bmap,
 	.direct_IO		= noop_direct_IO,
-	.migratepage		= buffer_migrate_page,
-	.is_partially_uptodate  = block_is_partially_uptodate,
+	.migratepage		= iomap_migrate_page,
+	.is_partially_uptodate  = iomap_is_partially_uptodate,
 	.error_remove_page	= generic_error_remove_page,
 	.swap_activate		= xfs_iomap_swapfile_activate,
 };
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index f5f2b71c2fde..f3fa197bd272 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -24,7 +24,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 93c40da3378a..c646d84cd55e 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1031,9 +1031,6 @@ xfs_file_iomap_begin(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	if (i_blocksize(inode) < PAGE_SIZE)
-		iomap->flags |= IOMAP_F_BUFFER_HEAD;
-
 	if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
 			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
 		/* Reserve delalloc blocks for regular writeback. */
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 39e5ec3d407f..a9f23ec95216 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1866,7 +1866,7 @@ MODULE_ALIAS_FS("xfs");
 STATIC int __init
 xfs_init_zones(void)
 {
-	xfs_ioend_bioset = bioset_create(4 * MAX_BUF_PER_PAGE,
+	xfs_ioend_bioset = bioset_create(4 * (PAGE_SIZE / SECTOR_SIZE),
 			offsetof(struct xfs_ioend, io_inline_bio),
 			BIOSET_NEED_BVECS);
 	if (!xfs_ioend_bioset)
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index ed8f774944ba..e4dc7c7f3da9 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1165,33 +1165,23 @@ DECLARE_EVENT_CLASS(xfs_page_class,
 		__field(loff_t, size)
 		__field(unsigned long, offset)
 		__field(unsigned int, length)
-		__field(int, delalloc)
-		__field(int, unwritten)
 	),
 	TP_fast_assign(
-		int delalloc = -1, unwritten = -1;
-
-		if (page_has_buffers(page))
-			xfs_count_page_state(page, &delalloc, &unwritten);
 		__entry->dev = inode->i_sb->s_dev;
 		__entry->ino = XFS_I(inode)->i_ino;
 		__entry->pgoff = page_offset(page);
 		__entry->size = i_size_read(inode);
 		__entry->offset = off;
 		__entry->length = len;
-		__entry->delalloc = delalloc;
-		__entry->unwritten = unwritten;
 	),
 	TP_printk("dev %d:%d ino 0x%llx pgoff 0x%lx size 0x%llx offset %lx "
-		  "length %x delalloc %d unwritten %d",
+		  "length %x",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->ino,
 		  __entry->pgoff,
 		  __entry->size,
 		  __entry->offset,
-		  __entry->length,
-		  __entry->delalloc,
-		  __entry->unwritten)
+		  __entry->length)
 )
 
 #define DEFINE_PAGE_EVENT(name)	\
@@ -1275,9 +1265,6 @@ DEFINE_EVENT(xfs_imap_class, name,	\
 	TP_ARGS(ip, offset, count, type, irec))
 DEFINE_IOMAP_EVENT(xfs_map_blocks_found);
 DEFINE_IOMAP_EVENT(xfs_map_blocks_alloc);
-DEFINE_IOMAP_EVENT(xfs_get_blocks_found);
-DEFINE_IOMAP_EVENT(xfs_get_blocks_alloc);
-DEFINE_IOMAP_EVENT(xfs_get_blocks_map_direct);
DEFINE_IOMAP_EVENT(xfs_iomap_alloc);
 DEFINE_IOMAP_EVENT(xfs_iomap_found);
 
@@ -1316,7 +1303,6 @@ DEFINE_EVENT(xfs_simple_io_class, name,	\
 	TP_ARGS(ip, offset, count))
 DEFINE_SIMPLE_IO_EVENT(xfs_delalloc_enospc);
 DEFINE_SIMPLE_IO_EVENT(xfs_unwritten_convert);
-DEFINE_SIMPLE_IO_EVENT(xfs_get_blocks_notfound);
 DEFINE_SIMPLE_IO_EVENT(xfs_setfilesize);
 DEFINE_SIMPLE_IO_EVENT(xfs_zero_eof);
 DEFINE_SIMPLE_IO_EVENT(xfs_end_io_direct_write);
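
A note for readers reviewing this without patch 1/2 at hand: the write_count
accounting above relies on the per-page state introduced there. The sketch
below is purely illustrative and not part of this patch; the struct layout and
to_iomap_page() follow the iomap_page proposal from patch 1/2, while
finish_subpage_writeback() is a hypothetical stand-in for the logic in the new
xfs_finish_page_writeback():

	/* Illustrative sketch only, assuming the iomap_page layout from patch 1/2. */
	struct iomap_page {
		atomic_t	read_count;	/* outstanding read I/Os on the page */
		atomic_t	write_count;	/* outstanding write I/Os on the page */
		DECLARE_BITMAP(uptodate, PAGE_SIZE / 512);	/* per-block uptodate bits */
	};

	static void
	finish_subpage_writeback(
		struct page	*page)
	{
		struct iomap_page	*iop = to_iomap_page(page);

		/*
		 * Pages without an iomap_page (block size == PAGE_SIZE) have
		 * exactly one outstanding write, so complete them directly.
		 * Otherwise only the last completing sub-page write may end
		 * page writeback.
		 */
		if (!iop || atomic_dec_and_test(&iop->write_count))
			end_page_writeback(page);
	}

xfs_add_to_ioend() increments write_count each time it adds the page to a bio
as a new segment, so the counter cannot reach zero while another sub-page write
on the same page is still in flight.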