From patchwork Mon Oct 12 14:03:49 2020
X-Patchwork-Submitter: Brian Foster <bfoster@redhat.com>
X-Patchwork-Id: 11832777
From: Brian Foster <bfoster@redhat.com>
To: linux-fsdevel@vger.kernel.org
Cc: linux-xfs@vger.kernel.org
Subject: [PATCH 1/2] iomap: use page dirty state to seek data over unwritten extents
Date: Mon, 12 Oct 2020 10:03:49 -0400
Message-Id: <20201012140350.950064-2-bfoster@redhat.com>
In-Reply-To: <20201012140350.950064-1-bfoster@redhat.com>
References: <20201012140350.950064-1-bfoster@redhat.com>

iomap seek hole/data currently uses page Uptodate state to track data
over unwritten extents. This is odd and unpredictable: whether clean
pages happen to exist over an unwritten extent changes the result. For
example:

  $ xfs_io -fc "falloc 0 32k" -c "seek -d 0" \
           -c "pread 16k 4k" -c "seek -d 0" /mnt/file
  Whence  Result
  DATA    EOF
  ...
  Whence  Result
  DATA    16384

Instead, use page dirty state to locate data over unwritten extents.
Because iomap has no per-block dirty state, this causes seek data to
land on the first uptodate block of a dirty page. That is consistent
with writeback, however, which converts all uptodate blocks of a dirty
page for the same reason.
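With that change, only dirty pagecache reports as data over an
unwritten extent. A hypothetical post-patch session (editor's sketch:
offsets assume a 4k block size, and pread/pwrite output is elided):

  $ xfs_io -fc "falloc 0 32k" -c "pread 16k 4k" -c "seek -d 0" \
           -c "pwrite 16k 4k" -c "seek -d 0" /mnt/file
  ...
  Whence  Result
  DATA    EOF
  ...
  Whence  Result
  DATA    16384

The clean page instantiated by the read no longer registers as data,
while the dirty page left by the write does.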
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/iomap/seek.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/seek.c b/fs/iomap/seek.c
index 107ee80c3568..981a74c8d60f 100644
--- a/fs/iomap/seek.c
+++ b/fs/iomap/seek.c
@@ -40,7 +40,7 @@ page_seek_hole_data(struct inode *inode, struct page *page, loff_t *lastoff,
          * Just check the page unless we can and should check block ranges:
          */
         if (bsize == PAGE_SIZE || !ops->is_partially_uptodate)
-                return PageUptodate(page) == seek_data;
+                return PageDirty(page) == seek_data;

         lock_page(page);
         if (unlikely(page->mapping != inode->i_mapping))
@@ -49,7 +49,8 @@ page_seek_hole_data(struct inode *inode, struct page *page, loff_t *lastoff,
         for (off = 0; off < PAGE_SIZE; off += bsize) {
                 if (offset_in_page(*lastoff) >= off + bsize)
                         continue;
-                if (ops->is_partially_uptodate(page, off, bsize) == seek_data) {
+                if ((ops->is_partially_uptodate(page, off, bsize) &&
+                     PageDirty(page)) == seek_data) {
                         unlock_page(page);
                         return true;
                 }

From patchwork Mon Oct 12 14:03:50 2020
X-Patchwork-Submitter: Brian Foster <bfoster@redhat.com>
X-Patchwork-Id: 11832785
From: Brian Foster <bfoster@redhat.com>
To: linux-fsdevel@vger.kernel.org
Cc: linux-xfs@vger.kernel.org
Subject: [PATCH 2/2] iomap: zero cached pages over unwritten extents on zero range
Date: Mon, 12 Oct 2020 10:03:50 -0400
Message-Id: <20201012140350.950064-3-bfoster@redhat.com>
In-Reply-To: <20201012140350.950064-1-bfoster@redhat.com>
References: <20201012140350.950064-1-bfoster@redhat.com>

The iomap zero range mechanism currently skips unwritten mappings.
This is normally not a problem, as most callers synchronize in-core
state with the underlying block mapping by flushing pagecache before
calling into iomap. That is not always the case, however. For example,
XFS calls iomap_truncate_page() on truncate down before flushing the
new EOF page of the file. This means that if the new EOF block is
unwritten but covered by a dirty page (i.e., awaiting unwritten
conversion at writeback time), iomap fails to zero that page. The
subsequent truncate_setsize() call does perform page zeroing, but does
not dirty the page. Therefore, if the new EOF page is written back
after the call into iomap but before the pagecache truncate, the
post-EOF zeroing is lost on page reclaim. This exposes stale post-EOF
data on mapped reads.

To address this problem, update the iomap zero range mechanism to
explicitly zero ranges over unwritten extents where pagecache happens
to exist. This is similar to how iomap seek data works for unwritten
extents with cached data. In fact, the same mechanism is reused to
scan for pagecache over large unwritten mappings, retaining the same
level of efficiency when zeroing large unwritten (and non-dirty)
ranges.
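The exposure is inherently racy: writeback must occur after the call
into iomap but before truncate_setsize(). The sequence below is
therefore an illustrative sketch rather than a deterministic
reproducer (editor's example; offsets assume a 4k block size):

  # Leave the EOF block unwritten but covered by a dirty page, then
  # truncate down so the new EOF lands mid-block.
  $ xfs_io -fc "falloc 0 16k" -c "pwrite -S 0xcd 0 16k" \
           -c "truncate 9k" /mnt/file

  # If the EOF page is written back between the iomap zeroing call and
  # truncate_setsize(), a mapped read past EOF can return stale 0xcd
  # bytes once the page has been reclaimed and re-read.
  $ xfs_io -c "mmap -r 0 12k" -c "mread -v 10k 16" /mnt/file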
Fixes: 68a9f5e7007c ("xfs: implement iomap based buffered write path")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reported-by: kernel test robot
---
 fs/iomap/buffered-io.c | 39 +++++++++++++++++++++++++++++++++++++--
 fs/iomap/seek.c        |  2 +-
 include/linux/iomap.h  |  2 ++
 3 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index bcfc288dba3f..a07703d686da 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -944,6 +944,30 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
         return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
 }

+/*
+ * Seek data over an unwritten mapping and update the counters for the caller to
+ * perform zeroing, if necessary.
+ */
+static void
+iomap_zero_range_skip_uncached(struct inode *inode, loff_t *pos,
+                loff_t *count, loff_t *written)
+{
+        unsigned dirty_offset, bytes = 0;
+
+        dirty_offset = page_cache_seek_hole_data(inode, *pos, *count,
+                        SEEK_DATA);
+        if (dirty_offset == -ENOENT)
+                bytes = *count;
+        else if (dirty_offset > *pos)
+                bytes = dirty_offset - *pos;
+
+        if (bytes) {
+                *pos += bytes;
+                *count -= bytes;
+                *written += bytes;
+        }
+}
+
 static loff_t
 iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
                 void *data, struct iomap *iomap, struct iomap *srcmap)
@@ -952,13 +976,24 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
         loff_t written = 0;
         int status;

-        /* already zeroed? we're done. */
-        if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
+        /* holes are already zeroed, we're done */
+        if (srcmap->type == IOMAP_HOLE)
                 return count;

         do {
                 unsigned offset, bytes;

+                /*
+                 * Unwritten mappings are effectively zeroed on disk, but we
+                 * must zero any preexisting data pages over the range.
+                 */
+                if (srcmap->type == IOMAP_UNWRITTEN) {
+                        iomap_zero_range_skip_uncached(inode, &pos, &count,
+                                        &written);
+                        if (!count)
+                                break;
+                }
+
                 offset = offset_in_page(pos);
                 bytes = min_t(loff_t, PAGE_SIZE - offset, count);

diff --git a/fs/iomap/seek.c b/fs/iomap/seek.c
index 981a74c8d60f..6804c1d5808e 100644
--- a/fs/iomap/seek.c
+++ b/fs/iomap/seek.c
@@ -71,7 +71,7 @@ page_seek_hole_data(struct inode *inode, struct page *page, loff_t *lastoff,
  *
  * Returns the resulting offset on successs, and -ENOENT otherwise.
  */
-static loff_t
+loff_t
 page_cache_seek_hole_data(struct inode *inode, loff_t offset, loff_t length,
                 int whence)
 {
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 4d1d3c3469e9..898c012f4f33 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -184,6 +184,8 @@ loff_t iomap_seek_data(struct inode *inode, loff_t offset,
                 const struct iomap_ops *ops);
 sector_t iomap_bmap(struct address_space *mapping, sector_t bno,
                 const struct iomap_ops *ops);
+loff_t page_cache_seek_hole_data(struct inode *inode, loff_t offset,
+                loff_t length, int whence);

 /*
  * Structure for writeback I/O completions.
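With the fix applied, the unwritten-but-dirty EOF block from the
sketch above is zeroed through the pagecache on the truncate-down
path, and the zeroing is dirtied along with the page, so the same
mapped read is expected to return zeroes regardless of writeback
timing (hypothetical output; exact mread formatting may differ by
xfs_io version):

  $ xfs_io -c "mmap -r 0 12k" -c "mread -v 10k 16" /mnt/file
  00002800:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................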