From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 1/9] mm: export mapping_seek_hole_data()
Date: Tue, 15 Nov 2022 12:30:35 +1100
Message-Id: <20221115013043.360610-2-david@fromorbit.com>

From: Dave Chinner

XFS needs this for finding cached dirty data regions when cleaning up
short writes, so it needs to be exported as XFS can be built as a
module.

Signed-off-by: Dave Chinner
---
 mm/filemap.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/filemap.c b/mm/filemap.c
index 08341616ae7a..07d255c41c43 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2925,6 +2925,7 @@ loff_t mapping_seek_hole_data(struct address_space *mapping, loff_t start,
 		return end;
 	return start;
 }
+EXPORT_SYMBOL_GPL(mapping_seek_hole_data);
 
 #ifdef CONFIG_MMU
 #define MMAP_LOTSAMISS  (100)
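For readers unfamiliar with the helper being exported, here is a minimal userspace sketch of the return-value semantics of mapping_seek_hole_data(), based on the diff context above and its kernel-doc: look for the first data (SEEK_DATA) or hole (SEEK_HOLE) offset in [start, end), fail SEEK_DATA with -ENXIO when no data is found, and treat end as an implicit hole. The page cache is modelled as a hypothetical flag array with one entry per page; `model_seek_hole_data()`, `enum model_whence` and `model_example_pages` are illustrative names, not kernel API, and the real function walks page cache folios rather than an array.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for SEEK_DATA / SEEK_HOLE in this model. */
enum model_whence { MODEL_SEEK_DATA, MODEL_SEEK_HOLE };

/* Example cache of four pages: pages 1 and 2 hold data. */
const bool model_example_pages[4] = { false, true, true, false };

/*
 * Model: page_has_data[i] says whether page i (pgsize bytes) holds cached
 * data. Returns the first matching offset in [start, end), -ENXIO for
 * SEEK_DATA with no data, or end for SEEK_HOLE (the implicit hole at end).
 */
long long model_seek_hole_data(const bool *page_has_data, size_t npages,
			       long long pgsize, long long start,
			       long long end, enum model_whence whence)
{
	bool want_data = (whence == MODEL_SEEK_DATA);

	assert(pgsize > 0 && start >= 0);
	while (start < end) {
		size_t idx = (size_t)(start / pgsize);
		/* Pages beyond the array are treated as holes. */
		bool has_data = idx < npages && page_has_data[idx];

		if (has_data == want_data)
			return start;
		/* Skip to the first byte of the next page. */
		start = (long long)(idx + 1) * pgsize;
	}
	if (want_data)
		return -ENXIO;
	return end;
}
```

With 4096-byte pages, a SEEK_DATA search from offset 0 over `model_example_pages` lands on 4096, and a SEEK_HOLE search from 4096 lands on 12288.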

From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 2/9] xfs: write page faults in iomap are not buffered writes
Date: Tue, 15 Nov 2022 12:30:36 +1100
Message-Id: <20221115013043.360610-3-david@fromorbit.com>

From: Dave Chinner

When we reserve a delalloc region in xfs_buffered_write_iomap_begin,
we mark the iomap as IOMAP_F_NEW so that the write context
understands that it allocated the delalloc region.

If we then fail that buffered write, xfs_buffered_write_iomap_end()
checks for the IOMAP_F_NEW flag and if it is set, it punches out
the unused delalloc region that was allocated for the write.

The assumption this code makes is that all buffered write operations
that can allocate space are run under an exclusive lock (i_rwsem).
This is an invalid assumption: page faults in mmap()d regions call
through this same function pair to map the file range being faulted,
and this runs only holding the inode->i_mapping->invalidate_lock in
shared mode.

IOWs, we can have races between page faults and write() calls that
fail the nested page cache write operation, and these races result
in data loss. That is, the failing iomap_end call will punch out the
data that the other racing iomap iteration brought into the page
cache. This can be reproduced with generic/34[46] if we arbitrarily
fail page cache copy-in operations from write() syscalls.

Code analysis tells us that the iomap_page_mkwrite() function holds
the already instantiated and uptodate folio locked across the iomap
mapping iterations. Hence the folio cannot be removed from memory
whilst we are mapping the range it covers, and as such we do not
care if the mapping changes state underneath the iomap iteration
loop:

1. If the folio is not already dirty, there are no writeback races
   possible.
2. If we allocated the mapping (delalloc or unwritten), the folio
   cannot already be dirty. See #1.
3. If the folio is already dirty, it must be up to date. As we hold
   it locked, it cannot be reclaimed from memory. Hence we always
   have valid data in the page cache while iterating the mapping.
4. Valid data in the page cache can exist when the underlying
   mapping is DELALLOC, UNWRITTEN or WRITTEN. Having the mapping
   change from DELALLOC->UNWRITTEN or UNWRITTEN->WRITTEN does not
   change the data in the page; it only affects actions if we are
   initialising a new page. Hence #3 applies and we don't care about
   these extent map transitions racing with iomap_page_mkwrite().
5. iomap_page_mkwrite() checks for page invalidation races
   (truncate, hole punch, etc) after it locks the folio.
   We also hold the mapping->invalidate_lock here, and hence the
   mapping cannot change due to extent removal operations while we
   are iterating the folio.

As such, filesystems that don't use bufferheads will never fail
the iomap_folio_mkwrite_iter() operation on the current mapping,
regardless of whether the iomap should be considered stale.

Further, the range we are asked to iterate is limited to the range
inside EOF that the folio spans. Hence, for XFS, we will only map
the exact range we are asked for, and we will only do speculative
preallocation with delalloc if we are mapping a hole at the EOF
page. The iterator will consume the entire range of the folio that
is within EOF, and anything beyond the EOF block cannot be accessed.
We never need to truncate this post-EOF speculative prealloc away in
the context of the iomap_page_mkwrite() iterator because if it
remains unused we'll remove it when the last reference to the inode
goes away.

Hence we don't actually need an .iomap_end() cleanup/error handling
path at all for iomap_page_mkwrite() for XFS. This means we can
separate the page fault processing from the complexity of the
.iomap_end() processing in the buffered write path. It also means
that the buffered write path will be able to take the
mapping->invalidate_lock as necessary.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
---
 fs/xfs/xfs_file.c  | 2 +-
 fs/xfs/xfs_iomap.c | 9 +++++++++
 fs/xfs/xfs_iomap.h | 1 +
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e462d39c840e..595a5bcf46b9 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1325,7 +1325,7 @@ __xfs_filemap_fault(
 	if (write_fault) {
 		xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
 		ret = iomap_page_mkwrite(vmf,
-				&xfs_buffered_write_iomap_ops);
+				&xfs_page_mkwrite_iomap_ops);
 		xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
 	} else {
 		ret = filemap_fault(vmf);
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 07da03976ec1..5cea069a38b4 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1187,6 +1187,15 @@ const struct iomap_ops xfs_buffered_write_iomap_ops = {
 	.iomap_end		= xfs_buffered_write_iomap_end,
 };
 
+/*
+ * iomap_page_mkwrite() will never fail in a way that requires delalloc extents
+ * that it allocated to be revoked. Hence we do not need an .iomap_end method
+ * for this operation.
+ */
+const struct iomap_ops xfs_page_mkwrite_iomap_ops = {
+	.iomap_begin		= xfs_buffered_write_iomap_begin,
+};
+
 static int
 xfs_read_iomap_begin(
 	struct inode		*inode,
diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
index c782e8c0479c..0f62ab633040 100644
--- a/fs/xfs/xfs_iomap.h
+++ b/fs/xfs/xfs_iomap.h
@@ -47,6 +47,7 @@ xfs_aligned_fsb_count(
 }
 
 extern const struct iomap_ops xfs_buffered_write_iomap_ops;
+extern const struct iomap_ops xfs_page_mkwrite_iomap_ops;
 extern const struct iomap_ops xfs_direct_write_iomap_ops;
 extern const struct iomap_ops xfs_read_iomap_ops;
 extern const struct iomap_ops xfs_seek_iomap_ops;
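The new xfs_page_mkwrite_iomap_ops works because the iomap iterator only invokes ->iomap_end when the ops table provides one, so an ops structure with just .iomap_begin skips the delalloc cleanup path entirely. The sketch below models that dispatch in userspace; `struct model_iomap_ops`, `model_iter_once()` and the call counters are hypothetical simplifications (the real callbacks also take an inode, flags and a struct iomap, and the real iterator loops over the whole range).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down analogue of struct iomap_ops. */
struct model_iomap_ops {
	int (*iomap_begin)(long long pos, long long len);
	int (*iomap_end)(long long pos, long long len, long long written);
};

static int model_begin_calls;
static int model_end_calls;

static int model_begin(long long pos, long long len)
{
	(void)pos;
	(void)len;
	model_begin_calls++;	/* map (and maybe reserve delalloc for) the range */
	return 0;
}

static int model_end(long long pos, long long len, long long written)
{
	(void)pos;
	(void)len;
	(void)written;
	model_end_calls++;	/* would punch unused delalloc after a short write */
	return 0;
}

/* Buffered writes keep the cleanup hook. */
const struct model_iomap_ops model_buffered_write_ops = {
	.iomap_begin	= model_begin,
	.iomap_end	= model_end,
};

/* Page faults: no ->iomap_end, mirroring xfs_page_mkwrite_iomap_ops. */
const struct model_iomap_ops model_page_mkwrite_ops = {
	.iomap_begin	= model_begin,
};

/* One simplified iteration: begin is mandatory, end runs only if set. */
int model_iter_once(const struct model_iomap_ops *ops, long long pos,
		    long long len, long long written)
{
	int error;

	assert(ops->iomap_begin != NULL);
	error = ops->iomap_begin(pos, len);
	if (error)
		return error;
	if (ops->iomap_end)
		return ops->iomap_end(pos, len, written);
	return 0;
}
```

Driving `model_page_mkwrite_ops` through `model_iter_once()` bumps only the begin counter, while `model_buffered_write_ops` bumps both.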

From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 3/9] xfs: punching delalloc extents on write failure is racy
Date: Tue, 15 Nov 2022 12:30:37 +1100
Message-Id: <20221115013043.360610-4-david@fromorbit.com>

From: Dave Chinner

xfs_buffered_write_iomap_end() has a comment about the safety of
punching delalloc extents based on holding the IOLOCK_EXCL. This
comment is wrong, and punching delalloc extents is not race free.

When we punch out a delalloc extent after a write failure in
xfs_buffered_write_iomap_end(), we punch out the page cache with
truncate_pagecache_range() before we punch out the delalloc extents.
At this point, we only hold the IOLOCK_EXCL, so there is nothing
stopping mmap() write faults racing with this cleanup operation,
reinstantiating a folio over the range we are about to punch and
hence requiring the delalloc extent to be kept.

If this race condition is hit, we can end up with a dirty page in
the page cache that has no delalloc extent or space reservation
backing it. This leads to bad things happening at writeback time.

To avoid this race condition, we need the page cache truncation to
be atomic w.r.t. the extent manipulation. We can do this by holding
the mapping->invalidate_lock exclusively across this operation; this
will prevent new pages from being inserted into the page cache
whilst we are removing the pages and the backing extent and space
reservation.

Taking the mapping->invalidate_lock exclusively in the buffered
write IO path is safe: it naturally nests inside the IOLOCK (see
the truncate and fallocate paths). iomap_zero_range() can be called
from under the mapping->invalidate_lock (from the truncate path via
either xfs_zero_eof() or xfs_truncate_page()), but iomap_zero_iter()
will not instantiate new delalloc pages (because it skips holes) and
hence will never need to punch out delalloc extents on failure.

Fix the locking issue, and clean up the code logic a little to avoid
unnecessary work if we didn't allocate the delalloc extent or wrote
the entire region we allocated.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
---
 fs/xfs/xfs_iomap.c | 41 +++++++++++++++++++++++------------------
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 5cea069a38b4..a2e45ea1b0cb 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1147,6 +1147,10 @@ xfs_buffered_write_iomap_end(
 		written = 0;
 	}
 
+	/* If we didn't reserve the blocks, we're not allowed to punch them. */
+	if (!(iomap->flags & IOMAP_F_NEW))
+		return 0;
+
 	/*
 	 * start_fsb refers to the first unused block after a short write. If
 	 * nothing was written, round offset down to point at the first block in
@@ -1158,27 +1162,28 @@ xfs_buffered_write_iomap_end(
 		start_fsb = XFS_B_TO_FSB(mp, offset + written);
 	end_fsb = XFS_B_TO_FSB(mp, offset + length);
 
+	/* Nothing to do if we've written the entire delalloc extent */
+	if (start_fsb >= end_fsb)
+		return 0;
+
 	/*
-	 * Trim delalloc blocks if they were allocated by this write and we
-	 * didn't manage to write the whole range.
-	 *
-	 * We don't need to care about racing delalloc as we hold i_mutex
-	 * across the reserve/allocate/unreserve calls. If there are delalloc
-	 * blocks in the range, they are ours.
+	 * Lock the mapping to avoid races with page faults re-instantiating
+	 * folios and dirtying them via ->page_mkwrite between the page cache
+	 * truncation and the delalloc extent removal. Failing to do this can
+	 * leave dirty pages with no space reservation in the cache.
 	 */
-	if ((iomap->flags & IOMAP_F_NEW) && start_fsb < end_fsb) {
-		truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
-					 XFS_FSB_TO_B(mp, end_fsb) - 1);
-
-		error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
-					       end_fsb - start_fsb);
-		if (error && !xfs_is_shutdown(mp)) {
-			xfs_alert(mp, "%s: unable to clean up ino %lld",
-				__func__, ip->i_ino);
-			return error;
-		}
+	filemap_invalidate_lock(inode->i_mapping);
+	truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
+				 XFS_FSB_TO_B(mp, end_fsb) - 1);
+
+	error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
+				       end_fsb - start_fsb);
+	filemap_invalidate_unlock(inode->i_mapping);
+	if (error && !xfs_is_shutdown(mp)) {
+		xfs_alert(mp, "%s: unable to clean up ino %lld",
+			__func__, ip->i_ino);
+		return error;
 	}
-
 	return 0;
 }
 
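To make the interleaving in the commit message concrete, here is a deterministic userspace model of a single block: a dirty-folio flag and a delalloc-reservation flag, with the cleanup split into its truncate and punch steps. Running truncate, fault, punch without the lock leaves a dirty folio with no reservation; when the cleanup holds a modelled exclusive invalidate_lock, the fault is deferred until after the punch and then re-reserves. Everything here (`struct model_state`, `model_run_schedule()`, the step names, folding the lock acquisition into the truncate step, and the simplification that a fault reserves delalloc only when none exists) is a hypothetical illustration, not XFS code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical steps of the racing operations. */
enum model_step {
	MODEL_CLEANUP_TRUNCATE,	/* truncate_pagecache_range() */
	MODEL_FAULT_MKWRITE,	/* ->page_mkwrite dirtying a folio */
	MODEL_CLEANUP_PUNCH,	/* xfs_bmap_punch_delalloc_range() */
};

struct model_state {
	bool folio_dirty;	/* dirty folio in the page cache */
	bool delalloc;		/* delalloc extent + space reservation */
	bool excl_held;		/* cleanup holds invalidate_lock exclusively */
	int deferred_faults;	/* faults waiting for the shared lock */
};

static void model_fault(struct model_state *s)
{
	/* Simplification: only reserve if no delalloc extent covers the range. */
	if (!s->delalloc)
		s->delalloc = true;
	s->folio_dirty = true;
}

/*
 * Apply a schedule, starting from the delalloc extent left by a failed
 * write. Returns true if no dirty folio is left without a reservation.
 */
bool model_run_schedule(const enum model_step *steps, int n, bool use_lock)
{
	struct model_state s = { .delalloc = true };

	for (int i = 0; i < n; i++) {
		switch (steps[i]) {
		case MODEL_CLEANUP_TRUNCATE:
			if (use_lock)
				s.excl_held = true;	/* filemap_invalidate_lock() */
			s.folio_dirty = false;
			break;
		case MODEL_FAULT_MKWRITE:
			if (s.excl_held)
				s.deferred_faults++;	/* blocks on the shared lock */
			else
				model_fault(&s);
			break;
		case MODEL_CLEANUP_PUNCH:
			s.delalloc = false;
			if (s.excl_held) {
				s.excl_held = false;	/* filemap_invalidate_unlock() */
				while (s.deferred_faults > 0) {
					s.deferred_faults--;
					model_fault(&s);
				}
			}
			break;
		}
	}
	return !s.folio_dirty || s.delalloc;
}

/* The problematic interleaving: a fault lands between truncate and punch. */
const enum model_step model_race_schedule[] = {
	MODEL_CLEANUP_TRUNCATE, MODEL_FAULT_MKWRITE, MODEL_CLEANUP_PUNCH,
};
```

Without the lock, `model_run_schedule(model_race_schedule, 3, false)` reports the broken invariant; passing `true` defers the fault and keeps the folio backed.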

From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 4/9] xfs: use byte ranges for write cleanup ranges
Date: Tue, 15 Nov 2022 12:30:38 +1100
Message-Id: <20221115013043.360610-5-david@fromorbit.com>

From: Dave Chinner

xfs_buffered_write_iomap_end() currently converts the byte ranges passed
to it to filesystem blocks to pass them to the bmap code to punch out
delalloc blocks, but then has to convert filesystem blocks back to byte
ranges for page cache truncate.

We're about to make the page cache truncate go away and replace it with
a page cache walk, so having to convert everything to/from/to filesystem
blocks is messy and error-prone. It is much easier to pass around byte
ranges and convert to page indexes and/or filesystem blocks only where
those units are needed.
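The unit handling described above can be illustrated with a minimal userspace C sketch. It rests on stated assumptions: every helper name below (b_to_fsbt(), b_to_fsb(), align_down(), align_up(), cleanup_byte_range()) is hypothetical and written only for this illustration. None of them is a kernel API; they merely mimic the truncating XFS_B_TO_FSBT(), the rounding-up XFS_B_TO_FSB() and the power-of-two round_down()/round_up() used in the patch. Because the cleanup limits are kept as block-aligned byte offsets, converting them to blocks only at the end gives, in this sketch, the same block range as the old block-based arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-ins, NOT the kernel definitions.
 * 'blocksize' is assumed to be a power of two, as XFS block sizes are.
 */

/* Bytes to blocks, truncating (mimics XFS_B_TO_FSBT). */
static int64_t b_to_fsbt(int64_t bytes, int64_t blocksize)
{
	return bytes / blocksize;
}

/* Bytes to blocks, rounding up (mimics XFS_B_TO_FSB). */
static int64_t b_to_fsb(int64_t bytes, int64_t blocksize)
{
	return (bytes + blocksize - 1) / blocksize;
}

/* Power-of-two alignment in the spirit of round_down()/round_up(). */
static int64_t align_down(int64_t x, int64_t align)
{
	return x & ~(align - 1);
}

static int64_t align_up(int64_t x, int64_t align)
{
	return align_down(x + align - 1, align);
}

/*
 * Compute the block-aligned byte range [*start_byte, *end_byte) that the
 * cleanup path would punch, given the mapped range (offset, length) and
 * the number of bytes actually written. Returns 0 if nothing is left.
 */
static int cleanup_byte_range(int64_t offset, int64_t length, int64_t written,
			      int64_t blocksize, int64_t *start_byte,
			      int64_t *end_byte)
{
	if (written == 0)
		*start_byte = align_down(offset, blocksize);
	else
		*start_byte = align_up(offset + written, blocksize);
	*end_byte = align_up(offset + length, blocksize);
	return *start_byte < *end_byte;
}
```

For example, with 4096-byte blocks, offset 1000, length 10000 and 5000 bytes written, the sketch yields the byte range [8192, 12288), which converts to start block 2 and end block 3: the same values the old code obtained from XFS_B_TO_FSB() on offset + written and offset + length.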
In preparation for the page cache walk being added, add a helper that
converts byte ranges to filesystem blocks and calls
xfs_bmap_punch_delalloc_range(), and convert
xfs_buffered_write_iomap_end() to calculate limits in byte ranges.

Signed-off-by: Dave Chinner
Reviewed-by: Darrick J. Wong
---
 fs/xfs/xfs_iomap.c | 40 +++++++++++++++++++++++++---------------
 1 file changed, 25 insertions(+), 15 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index a2e45ea1b0cb..7bb55dbc19d3 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1120,6 +1120,20 @@ xfs_buffered_write_iomap_begin(
 	return error;
 }
 
+static int
+xfs_buffered_write_delalloc_punch(
+	struct inode		*inode,
+	loff_t			start_byte,
+	loff_t			end_byte)
+{
+	struct xfs_mount	*mp = XFS_M(inode->i_sb);
+	xfs_fileoff_t		start_fsb = XFS_B_TO_FSBT(mp, start_byte);
+	xfs_fileoff_t		end_fsb = XFS_B_TO_FSB(mp, end_byte);
+
+	return xfs_bmap_punch_delalloc_range(XFS_I(inode), start_fsb,
+				end_fsb - start_fsb);
+}
+
 static int
 xfs_buffered_write_iomap_end(
 	struct inode		*inode,
@@ -1129,10 +1143,9 @@ xfs_buffered_write_iomap_end(
 	unsigned		flags,
 	struct iomap		*iomap)
 {
-	struct xfs_inode	*ip = XFS_I(inode);
-	struct xfs_mount	*mp = ip->i_mount;
-	xfs_fileoff_t		start_fsb;
-	xfs_fileoff_t		end_fsb;
+	struct xfs_mount	*mp = XFS_M(inode->i_sb);
+	loff_t			start_byte;
+	loff_t			end_byte;
 	int			error = 0;
 
 	if (iomap->type != IOMAP_DELALLOC)
@@ -1157,13 +1170,13 @@ xfs_buffered_write_iomap_end(
 	 * the range.
 	 */
 	if (unlikely(!written))
-		start_fsb = XFS_B_TO_FSBT(mp, offset);
+		start_byte = round_down(offset, mp->m_sb.sb_blocksize);
 	else
-		start_fsb = XFS_B_TO_FSB(mp, offset + written);
-	end_fsb = XFS_B_TO_FSB(mp, offset + length);
+		start_byte = round_up(offset + written, mp->m_sb.sb_blocksize);
+	end_byte = round_up(offset + length, mp->m_sb.sb_blocksize);
 
 	/* Nothing to do if we've written the entire delalloc extent */
-	if (start_fsb >= end_fsb)
+	if (start_byte >= end_byte)
 		return 0;
 
 	/*
@@ -1173,15 +1186,12 @@ xfs_buffered_write_iomap_end(
 	 * leave dirty pages with no space reservation in the cache.
 	 */
 	filemap_invalidate_lock(inode->i_mapping);
-	truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
-					XFS_FSB_TO_B(mp, end_fsb) - 1);
-
-	error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
-				       end_fsb - start_fsb);
+	truncate_pagecache_range(inode, start_byte, end_byte - 1);
+	error = xfs_buffered_write_delalloc_punch(inode, start_byte, end_byte);
 	filemap_invalidate_unlock(inode->i_mapping);
 	if (error && !xfs_is_shutdown(mp)) {
-		xfs_alert(mp, "%s: unable to clean up ino %lld",
-			__func__, ip->i_ino);
+		xfs_alert(mp, "%s: unable to clean up ino 0x%llx",
+			__func__, XFS_I(inode)->i_ino);
 		return error;
 	}
 	return 0;

From patchwork Tue Nov 15 01:30:39 2022
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 13043085
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 5/9] xfs: buffered write failure should not truncate the page cache
Date: Tue, 15 Nov 2022 12:30:39 +1100
Message-Id: <20221115013043.360610-6-david@fromorbit.com>
In-Reply-To: <20221115013043.360610-1-david@fromorbit.com>
References: <20221115013043.360610-1-david@fromorbit.com>

From: Dave Chinner

xfs_buffered_write_iomap_end() currently invalidates the page cache over
the unused range of the delalloc extent it allocated. While the write
allocated the delalloc extent, it does not own it exclusively as the
write does not hold any locks that prevent either writeback or mmap page
faults from changing the state of either the page cache or the extent
state backing this range.

Whilst xfs_bmap_punch_delalloc_range() already handles races in extent
conversion - it will only punch out delalloc extents and it ignores any
other type of extent - the page cache truncate does not discriminate
between data written by this write and data written by some other task.
As a result, truncating the page cache can result in data corruption if
the write races with mmap modifications to the file over the same range.

generic/346 exercises this workload, and if we randomly fail writes
(as will happen when iomap gets stale iomap detection later in the
patchset), it will randomly corrupt the file data because it removes
data written by mmap() in the same page as the write() that failed.

Hence we do not want to punch out the page cache over the range of the
extent we failed to write to - what we actually need to do is detect the
ranges that have dirty data in cache over them and *not punch them out*.

To do this, we have to walk the page cache over the range of the
delalloc extent we want to remove. This is made complex by the fact we
have to handle partially up-to-date folios correctly and this can happen
even when the FSB size == PAGE_SIZE because we now support multi-page
folios in the page cache.

Because we are only interested in discovering the edges of data ranges
in the page cache (i.e. hole-data boundaries) we can make use of
mapping_seek_hole_data() to find those transitions in the page cache. As
we hold the invalidate_lock, we know that the boundaries are not going
to change while we walk the range. This interface is also byte-based and
is sub-page block aware, so we can find the data ranges in the cache
based on byte offsets rather than page, folio or fs block sized chunks.
This greatly simplifies the logic of finding dirty cached ranges in the
page cache.

Once we've identified a range that contains cached data, we can then
iterate the range folio by folio.
This allows us to determine if the data is dirty and hence perform the
correct delalloc extent punching operations. The seek interface we use
to iterate data ranges will give us sub-folio start/end granularity, so
we may end up looking up the same folio multiple times as the seek
interface iterates across each discontiguous data region in the folio.

Signed-off-by: Dave Chinner
---
 fs/xfs/xfs_iomap.c | 151 ++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 141 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 7bb55dbc19d3..2d48fcc7bd6f 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1134,6 +1134,146 @@ xfs_buffered_write_delalloc_punch(
 				end_fsb - start_fsb);
 }
 
+/*
+ * Scan the data range passed to us for dirty page cache folios. If we find a
+ * dirty folio, punch out the preceeding range and update the offset from which
+ * the next punch will start from.
+ *
+ * We can punch out clean pages because they either contain data that has been
+ * written back - in which case the delalloc punch over that range is a no-op -
+ * or they have been read faults in which case they contain zeroes and we can
+ * remove the delalloc backing range and any new writes to those pages will do
+ * the normal hole filling operation...
+ *
+ * This makes the logic simple: we only need to keep the delalloc extents only
+ * over the dirty ranges of the page cache.
+ */
+static int
+xfs_buffered_write_delalloc_scan(
+	struct inode		*inode,
+	loff_t			*punch_start_byte,
+	loff_t			start_byte,
+	loff_t			end_byte)
+{
+	loff_t			offset = start_byte;
+
+	while (offset < end_byte) {
+		struct folio	*folio;
+
+		/* grab locked page */
+		folio = filemap_lock_folio(inode->i_mapping, offset >> PAGE_SHIFT);
+		if (!folio) {
+			offset = ALIGN_DOWN(offset, PAGE_SIZE) + PAGE_SIZE;
+			continue;
+		}
+
+		/* if dirty, punch up to offset */
+		if (folio_test_dirty(folio)) {
+			if (offset > *punch_start_byte) {
+				int	error;
+
+				error = xfs_buffered_write_delalloc_punch(inode,
+						*punch_start_byte, offset);
+				if (error) {
+					folio_unlock(folio);
+					folio_put(folio);
+					return error;
+				}
+			}
+
+			/*
+			 * Make sure the next punch start is correctly bound to
+			 * the end of this data range, not the end of the folio.
+			 */
+			*punch_start_byte = min_t(loff_t, end_byte,
+					folio_next_index(folio) << PAGE_SHIFT);
+		}
+
+		/* move offset to start of next folio in range */
+		offset = folio_next_index(folio) << PAGE_SHIFT;
+		folio_unlock(folio);
+		folio_put(folio);
+	}
+	return 0;
+}
+
+/*
+ * Punch out all the delalloc blocks in the range given except for those that
+ * have dirty data still pending in the page cache - those are going to be
+ * written and so must still retain the delalloc backing for writeback.
+ *
+ * As we are scanning the page cache for data, we don't need to reimplement the
+ * wheel - mapping_seek_hole_data() does exactly what we need to identify the
+ * start and end of data ranges correctly even for sub-folio block sizes. This
+ * byte range based iteration is especially convenient because it means we don't
+ * have to care about variable size folios, nor where the start or end of the
+ * data range lies within a folio, if they lie within the same folio or even if
+ * there are multiple discontiguous data ranges within the folio.
+ */
+static int
+xfs_buffered_write_delalloc_release(
+	struct inode		*inode,
+	loff_t			start_byte,
+	loff_t			end_byte)
+{
+	loff_t			punch_start_byte = start_byte;
+	int			error = 0;
+
+	/*
+	 * Lock the mapping to avoid races with page faults re-instantiating
+	 * folios and dirtying them via ->page_mkwrite whilst we walk the
+	 * cache and perform delalloc extent removal. Failing to do this can
+	 * leave dirty pages with no space reservation in the cache.
+	 */
+	filemap_invalidate_lock(inode->i_mapping);
+	while (start_byte < end_byte) {
+		loff_t		data_end;
+
+		start_byte = mapping_seek_hole_data(inode->i_mapping,
+				start_byte, end_byte, SEEK_DATA);
+		/*
+		 * If there is no more data to scan, all that is left is to
+		 * punch out the remaining range.
+		 */
+		if (start_byte == -ENXIO || start_byte == end_byte)
+			break;
+		if (start_byte < 0) {
+			error = start_byte;
+			goto out_unlock;
+		}
+		ASSERT(start_byte >= punch_start_byte);
+		ASSERT(start_byte < end_byte);
+
+		/*
+		 * We find the end of this contiguous cached data range by
+		 * seeking from start_byte to the beginning of the next hole.
+		 */
+		data_end = mapping_seek_hole_data(inode->i_mapping, start_byte,
+				end_byte, SEEK_HOLE);
+		if (data_end < 0) {
+			error = data_end;
+			goto out_unlock;
+		}
+		ASSERT(data_end > start_byte);
+		ASSERT(data_end <= end_byte);
+
+		error = xfs_buffered_write_delalloc_scan(inode,
+				&punch_start_byte, start_byte, data_end);
+		if (error)
+			goto out_unlock;
+
+		/* The next data search starts at the end of this one. */
+		start_byte = data_end;
+	}
+
+	if (punch_start_byte < end_byte)
+		error = xfs_buffered_write_delalloc_punch(inode,
+				punch_start_byte, end_byte);
+out_unlock:
+	filemap_invalidate_unlock(inode->i_mapping);
+	return error;
+}
+
 static int
 xfs_buffered_write_iomap_end(
 	struct inode		*inode,
@@ -1179,16 +1319,7 @@ xfs_buffered_write_iomap_end(
 	if (start_byte >= end_byte)
 		return 0;
 
-	/*
-	 * Lock the mapping to avoid races with page faults re-instantiating
-	 * folios and dirtying them via ->page_mkwrite between the page cache
-	 * truncation and the delalloc extent removal. Failing to do this can
-	 * leave dirty pages with no space reservation in the cache.
-	 */
-	filemap_invalidate_lock(inode->i_mapping);
-	truncate_pagecache_range(inode, start_byte, end_byte - 1);
-	error = xfs_buffered_write_delalloc_punch(inode, start_byte, end_byte);
-	filemap_invalidate_unlock(inode->i_mapping);
+	error = xfs_buffered_write_delalloc_release(inode, start_byte, end_byte);
 	if (error && !xfs_is_shutdown(mp)) {
 		xfs_alert(mp, "%s: unable to clean up ino 0x%llx",
 			__func__, XFS_I(inode)->i_ino);

From patchwork Tue Nov 15 01:30:40 2022
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 13043079
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 6/9] xfs: xfs_bmap_punch_delalloc_range() should take a byte range
Date: Tue, 15 Nov 2022 12:30:40 +1100
Message-Id: <20221115013043.360610-7-david@fromorbit.com>
In-Reply-To: <20221115013043.360610-1-david@fromorbit.com>
References: <20221115013043.360610-1-david@fromorbit.com>

From: Dave Chinner

All the callers of xfs_bmap_punch_delalloc_range() jump through hoops
to convert a byte range to filesystem blocks before calling
xfs_bmap_punch_delalloc_range(). Instead, pass the byte range to
xfs_bmap_punch_delalloc_range() and have it do the conversion to
filesystem blocks internally.

Signed-off-by: Dave Chinner
Reviewed-by: Darrick J. Wong
---
 fs/xfs/xfs_aops.c      | 16 ++++++----------
 fs/xfs/xfs_bmap_util.c | 10 ++++++----
 fs/xfs/xfs_bmap_util.h |  2 +-
 fs/xfs/xfs_iomap.c     | 21 ++++-----------------
 4 files changed, 17 insertions(+), 32 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 5d1a995b15f8..6aadc5815068 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -114,9 +114,8 @@ xfs_end_ioend(
 	if (unlikely(error)) {
 		if (ioend->io_flags & IOMAP_F_SHARED) {
 			xfs_reflink_cancel_cow_range(ip, offset, size, true);
-			xfs_bmap_punch_delalloc_range(ip,
-						      XFS_B_TO_FSBT(mp, offset),
-						      XFS_B_TO_FSB(mp, size));
+			xfs_bmap_punch_delalloc_range(ip, offset,
+					offset + size);
 		}
 		goto done;
 	}
@@ -455,12 +454,8 @@ xfs_discard_folio(
 	struct folio		*folio,
 	loff_t			pos)
 {
-	struct inode		*inode = folio->mapping->host;
-	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_inode	*ip = XFS_I(folio->mapping->host);
 	struct xfs_mount	*mp = ip->i_mount;
-	size_t			offset = offset_in_folio(folio, pos);
-	xfs_fileoff_t		start_fsb = XFS_B_TO_FSBT(mp, pos);
-	xfs_fileoff_t		pageoff_fsb = XFS_B_TO_FSBT(mp, offset);
 	int			error;
 
 	if (xfs_is_shutdown(mp))
@@ -470,8 +465,9 @@ xfs_discard_folio(
 		"page discard on page "PTR_FMT", inode 0x%llx, pos %llu.",
 			folio, ip->i_ino, pos);
 
-	error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
-			i_blocks_per_folio(inode, folio) - pageoff_fsb);
+	error = xfs_bmap_punch_delalloc_range(ip, pos,
+			round_up(pos, folio_size(folio)));
+
 	if (error && !xfs_is_shutdown(mp))
 		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
 }
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 04d0c2bff67c..867645b74d88 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -590,11 +590,13 @@ xfs_getbmap(
 int
 xfs_bmap_punch_delalloc_range(
 	struct xfs_inode	*ip,
-	xfs_fileoff_t		start_fsb,
-	xfs_fileoff_t		length)
+	xfs_off_t		start_byte,
+	xfs_off_t		end_byte)
 {
+	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_ifork	*ifp = &ip->i_df;
-	xfs_fileoff_t		end_fsb = start_fsb + length;
+	xfs_fileoff_t		start_fsb = XFS_B_TO_FSBT(mp, start_byte);
+	xfs_fileoff_t		end_fsb = XFS_B_TO_FSB(mp, end_byte);
 	struct xfs_bmbt_irec	got, del;
 	struct xfs_iext_cursor	icur;
 	int			error = 0;
@@ -607,7 +609,7 @@ xfs_bmap_punch_delalloc_range(
 
 	while (got.br_startoff + got.br_blockcount > start_fsb) {
 		del = got;
-		xfs_trim_extent(&del, start_fsb, length);
+		xfs_trim_extent(&del, start_fsb, end_fsb - start_fsb);
 
 		/*
 		 * A delete can push the cursor forward. Step back to the
diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
index 24b37d211f1d..6888078f5c31 100644
--- a/fs/xfs/xfs_bmap_util.h
+++ b/fs/xfs/xfs_bmap_util.h
@@ -31,7 +31,7 @@ xfs_bmap_rtalloc(struct xfs_bmalloca *ap)
 #endif /* CONFIG_XFS_RT */
 
 int	xfs_bmap_punch_delalloc_range(struct xfs_inode *ip,
-		xfs_fileoff_t start_fsb, xfs_fileoff_t length);
+		xfs_off_t start_byte, xfs_off_t end_byte);
 
 struct kgetbmap {
 	__s64		bmv_offset;	/* file offset of segment in blocks */
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 2d48fcc7bd6f..04da22943e7c 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1120,20 +1120,6 @@ xfs_buffered_write_iomap_begin(
 	return error;
 }
 
-static int
-xfs_buffered_write_delalloc_punch(
-	struct inode		*inode,
-	loff_t			start_byte,
-	loff_t			end_byte)
-{
-	struct xfs_mount	*mp = XFS_M(inode->i_sb);
-	xfs_fileoff_t		start_fsb = XFS_B_TO_FSBT(mp, start_byte);
-	xfs_fileoff_t		end_fsb = XFS_B_TO_FSB(mp, end_byte);
-
-	return xfs_bmap_punch_delalloc_range(XFS_I(inode), start_fsb,
-				end_fsb - start_fsb);
-}
-
 /*
  * Scan the data range passed to us for dirty page cache folios. If we find a
  * dirty folio, punch out the preceeding range and update the offset from which
@@ -1172,8 +1158,9 @@ xfs_buffered_write_delalloc_scan(
 			if (offset > *punch_start_byte) {
 				int	error;
 
-				error = xfs_buffered_write_delalloc_punch(inode,
-						*punch_start_byte, offset);
+				error = xfs_bmap_punch_delalloc_range(
+						XFS_I(inode), *punch_start_byte,
+						offset);
 				if (error) {
 					folio_unlock(folio);
 					folio_put(folio);
@@ -1267,7 +1254,7 @@ xfs_buffered_write_delalloc_release(
 	}
 
 	if (punch_start_byte < end_byte)
-		error = xfs_buffered_write_delalloc_punch(inode,
+		error = xfs_bmap_punch_delalloc_range(XFS_I(inode),
 				punch_start_byte, end_byte);
 out_unlock:
 	filemap_invalidate_unlock(inode->i_mapping);

From patchwork Tue Nov 15 01:30:41 2022
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 13043083

From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 7/9] iomap: write iomap validity checks
Date: Tue, 15 Nov 2022 12:30:41 +1100
Message-Id: <20221115013043.360610-8-david@fromorbit.com>
In-Reply-To: <20221115013043.360610-1-david@fromorbit.com>
References: <20221115013043.360610-1-david@fromorbit.com>

From: Dave Chinner

A recent multithreaded write data corruption has been uncovered in
the iomap write code. The core of the problem is that partial folio
writes can be flushed to disk while a new racing write maps the same
range and fills the rest of the page:

writeback                       new write

allocate blocks
  blocks are unwritten
submit IO
.....
                                map blocks
                                iomap indicates UNWRITTEN range
                                loop {
                                  lock folio
                                  copyin data
.....
IO completes
  runs unwritten extent conv
    blocks are marked written
                                  get next folio
                                }

Now add memory pressure such that memory reclaim evicts the
partially written folio that has already been written to disk.

When the new write finally gets to the last partial page of the new
write, it does not find it in cache, so it instantiates a new page,
sees the iomap is unwritten, and zeros the part of the page that it
does not have data from. This overwrites the data on disk that was
originally written.

The full description of the corruption mechanism can be found here:

https://lore.kernel.org/linux-xfs/20220817093627.GZ3600936@dread.disaster.area/

To solve this problem, we need to check whether the iomap is still
valid after we lock each folio during the write.
We have to do it after we lock the page so that we don't end up with
state changes occurring while we wait for the folio to be locked.

Hence we need a mechanism to be able to check that the cached iomap is
still valid (similar to what we already do in buffered writeback), and
we need a way for iomap_write_begin() to back out and tell the high
level iomap iterator that we need to remap the remaining write range.

The iomap needs to grow some storage for the validity cookie that the
filesystem provides to travel with the iomap. XFS, in particular, also
needs to know some more information about what the iomap maps
(attribute extents rather than file data extents) so that the validity
cookie can cover all the types of iomaps we might need to validate.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
---
 fs/iomap/buffered-io.c | 29 +++++++++++++++++++++++++++-
 fs/iomap/iter.c        | 19 ++++++++++++++++++-
 include/linux/iomap.h  | 43 ++++++++++++++++++++++++++++++++++--------
 3 files changed, 81 insertions(+), 10 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 91ee0b308e13..8354b0fdaa94 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -584,7 +584,7 @@ static int iomap_write_begin_inline(const struct iomap_iter *iter,
 	return iomap_read_inline_data(iter, folio);
 }
 
-static int iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
+static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
 		size_t len, struct folio **foliop)
 {
 	const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
@@ -618,6 +618,27 @@ static int iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
 		status = (iter->flags & IOMAP_NOWAIT) ? -EAGAIN : -ENOMEM;
 		goto out_no_page;
 	}
+
+	/*
+	 * Now we have a locked folio, before we do anything with it we need to
+	 * check that the iomap we have cached is not stale. The inode extent
+	 * mapping can change due to concurrent IO in flight (e.g.
+	 * IOMAP_UNWRITTEN state can change and memory reclaim could have
+	 * reclaimed a previously partially written page at this index after IO
+	 * completion before this write reaches this file offset) and hence we
+	 * could do the wrong thing here (zero a page range incorrectly or fail
+	 * to zero) and corrupt data.
+	 */
+	if (page_ops && page_ops->iomap_valid) {
+		bool iomap_valid = page_ops->iomap_valid(iter->inode,
+							&iter->iomap);
+		if (!iomap_valid) {
+			iter->iomap.flags |= IOMAP_F_STALE;
+			status = 0;
+			goto out_unlock;
+		}
+	}
+
 	if (pos + len > folio_pos(folio) + folio_size(folio))
 		len = folio_pos(folio) + folio_size(folio) - pos;
 
@@ -773,6 +794,8 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 		status = iomap_write_begin(iter, pos, bytes, &folio);
 		if (unlikely(status))
 			break;
+		if (iter->iomap.flags & IOMAP_F_STALE)
+			break;
 
 		page = folio_file_page(folio, pos >> PAGE_SHIFT);
 		if (mapping_writably_mapped(mapping))
@@ -856,6 +879,8 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter)
 		status = iomap_write_begin(iter, pos, bytes, &folio);
 		if (unlikely(status))
 			return status;
+		if (iter->iomap.flags & IOMAP_F_STALE)
+			break;
 
 		status = iomap_write_end(iter, pos, bytes, bytes, folio);
 		if (WARN_ON_ONCE(status == 0))
@@ -911,6 +936,8 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
 		status = iomap_write_begin(iter, pos, bytes, &folio);
 		if (status)
 			return status;
+		if (iter->iomap.flags & IOMAP_F_STALE)
+			break;
 
 		offset = offset_in_folio(folio, pos);
 		if (bytes > folio_size(folio) - offset)
diff --git a/fs/iomap/iter.c b/fs/iomap/iter.c
index a1c7592d2ade..79a0614eaab7 100644
--- a/fs/iomap/iter.c
+++ b/fs/iomap/iter.c
@@ -7,12 +7,28 @@
 #include
 #include "trace.h"
 
+/*
+ * Advance to the next range we need to map.
+ *
+ * If the iomap is marked IOMAP_F_STALE, it means the existing map was not fully
+ * processed - it was aborted because the extent the iomap spanned may have been
+ * changed during the operation. In this case, the iteration behaviour is to
+ * remap the unprocessed range of the iter, and that means we may need to remap
+ * even when we've made no progress (i.e. iter->processed = 0). Hence the
+ * "finished iterating" case needs to distinguish between
+ * (processed = 0) meaning we are done and (processed = 0 && stale) meaning we
+ * need to remap the entire remaining range.
+ */
 static inline int iomap_iter_advance(struct iomap_iter *iter)
 {
+	bool stale = iter->iomap.flags & IOMAP_F_STALE;
+
 	/* handle the previous iteration (if any) */
 	if (iter->iomap.length) {
-		if (iter->processed <= 0)
+		if (iter->processed < 0)
 			return iter->processed;
+		if (!iter->processed && !stale)
+			return 0;
 		if (WARN_ON_ONCE(iter->processed > iomap_length(iter)))
 			return -EIO;
 		iter->pos += iter->processed;
@@ -33,6 +49,7 @@ static inline void iomap_iter_done(struct iomap_iter *iter)
 	WARN_ON_ONCE(iter->iomap.offset > iter->pos);
 	WARN_ON_ONCE(iter->iomap.length == 0);
 	WARN_ON_ONCE(iter->iomap.offset + iter->iomap.length <= iter->pos);
+	WARN_ON_ONCE(iter->iomap.flags & IOMAP_F_STALE);
 
 	trace_iomap_iter_dstmap(iter->inode, &iter->iomap);
 	if (iter->srcmap.type != IOMAP_HOLE)
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 238a03087e17..f166d80b68bf 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -49,26 +49,35 @@ struct vm_fault;
  *
  * IOMAP_F_BUFFER_HEAD indicates that the file system requires the use of
  * buffer heads for this mapping.
+ *
+ * IOMAP_F_XATTR indicates that the iomap is for an extended attribute extent
+ * rather than a file data extent.
  */
-#define IOMAP_F_NEW		0x01
-#define IOMAP_F_DIRTY		0x02
-#define IOMAP_F_SHARED		0x04
-#define IOMAP_F_MERGED		0x08
-#define IOMAP_F_BUFFER_HEAD	0x10
-#define IOMAP_F_ZONE_APPEND	0x20
+#define IOMAP_F_NEW		(1U << 0)
+#define IOMAP_F_DIRTY		(1U << 1)
+#define IOMAP_F_SHARED		(1U << 2)
+#define IOMAP_F_MERGED		(1U << 3)
+#define IOMAP_F_BUFFER_HEAD	(1U << 4)
+#define IOMAP_F_ZONE_APPEND	(1U << 5)
+#define IOMAP_F_XATTR		(1U << 6)
 
 /*
  * Flags set by the core iomap code during operations:
  *
  * IOMAP_F_SIZE_CHANGED indicates to the iomap_end method that the file size
  * has changed as the result of this write operation.
+ *
+ * IOMAP_F_STALE indicates that the iomap is not valid any longer and the file
+ * range it covers needs to be remapped by the high level before the operation
+ * can proceed.
  */
-#define IOMAP_F_SIZE_CHANGED	0x100
+#define IOMAP_F_SIZE_CHANGED	(1U << 8)
+#define IOMAP_F_STALE		(1U << 9)
 
 /*
  * Flags from 0x1000 up are for file system specific usage:
  */
-#define IOMAP_F_PRIVATE		0x1000
+#define IOMAP_F_PRIVATE		(1U << 12)
 
 
 /*
@@ -89,6 +98,7 @@ struct iomap {
 	void			*inline_data;
 	void			*private; /* filesystem private */
 	const struct iomap_page_ops *page_ops;
+	u64			validity_cookie; /* used with .iomap_valid() */
 };
 
 static inline sector_t iomap_sector(const struct iomap *iomap, loff_t pos)
@@ -128,6 +138,23 @@ struct iomap_page_ops {
 	int (*page_prepare)(struct inode *inode, loff_t pos, unsigned len);
 	void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
 			struct page *page);
+
+	/*
+	 * Check that the cached iomap still maps correctly to the filesystem's
+	 * internal extent map. FS internal extent maps can change while iomap
+	 * is iterating a cached iomap, so this hook allows iomap to detect that
+	 * the iomap needs to be refreshed during a long running write
+	 * operation.
+	 *
+	 * The filesystem can store internal state (e.g. a sequence number) in
+	 * iomap->validity_cookie when the iomap is first mapped to be able to
+	 * detect changes between mapping time and whenever .iomap_valid() is
+	 * called.
+	 *
+	 * This is called with the folio over the specified file position held
+	 * locked by the iomap code.
+	 */
+	bool (*iomap_valid)(struct inode *inode, const struct iomap *iomap);
 };
 
 /*
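The protocol this patch adds (sample a cookie when the range is mapped, revalidate it after each folio is locked, mark the map IOMAP_F_STALE on mismatch, and let iomap_iter_advance() remap even when nothing was processed) can be exercised in a small userspace model. This is a hypothetical sketch, not the kernel implementation: every `model_*` name is invented for illustration, a single sequence counter stands in for the filesystem's extent map, and "locking a folio" is simply the point in the loop where the check runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; mirrors the bit the patch assigns to IOMAP_F_STALE. */
#define MODEL_F_STALE	(1U << 9)

struct model_fs {
	uint64_t seq;			/* bumped on every extent map change */
};

struct model_map {
	long offset;
	long length;
	unsigned int flags;
	uint64_t validity_cookie;	/* fs->seq sampled at mapping time */
};

/* ->iomap_begin analogue: map [pos, end) and record the cookie. */
static void model_map_range(const struct model_fs *fs, struct model_map *m,
			    long pos, long end)
{
	m->offset = pos;
	m->length = end - pos;
	m->flags = 0;
	m->validity_cookie = fs->seq;
}

/* ->iomap_valid analogue: valid only if nothing changed since mapping. */
static bool model_map_valid(const struct model_fs *fs,
			    const struct model_map *m)
{
	return fs->seq == m->validity_cookie;
}

/*
 * Decision made at the top of iomap_iter_advance() after this patch:
 * negative = error, 0 = finished, 1 = advance by processed and remap.
 */
static int model_advance(long processed, bool stale)
{
	if (processed < 0)
		return (int)processed;
	if (!processed && !stale)
		return 0;
	return 1;
}

/*
 * Write [pos, pos + len) one page at a time. Just before page index
 * change_at is "locked", a concurrent extent change bumps fs->seq once.
 * Returns how many times the range had to be mapped.
 */
static int model_buffered_write(struct model_fs *fs, long pos, long len,
				long page_size, long change_at)
{
	long end = pos + len;
	bool changed = false;
	int nr_maps = 0;

	assert(pos % page_size == 0 && len % page_size == 0);
	while (pos < end) {
		struct model_map m;
		long processed = 0;

		model_map_range(fs, &m, pos, end);
		nr_maps++;
		for (long p = pos; p < end; p += page_size) {
			if (!changed && p / page_size == change_at) {
				fs->seq++;	/* e.g. unwritten extent conversion */
				changed = true;
			}
			/* Folio is "locked" here: revalidate before using it. */
			if (!model_map_valid(fs, &m)) {
				m.flags |= MODEL_F_STALE;
				break;
			}
			processed += page_size;
		}
		if (model_advance(processed, m.flags & MODEL_F_STALE) <= 0)
			break;
		pos += processed;
	}
	return nr_maps;
}
```

A change that lands just before page 2 is locked yields two mappings: pages 0 and 1 are written under the first map and the remainder is remapped. A change before page 0 also yields two mappings; that is the `processed == 0 && stale` case, which the old `processed <= 0` test would have treated as completion.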

From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 8/9] xfs: use iomap_valid method to detect stale cached iomaps
Date: Tue, 15 Nov 2022 12:30:42 +1100
Message-Id: <20221115013043.360610-9-david@fromorbit.com>
In-Reply-To: <20221115013043.360610-1-david@fromorbit.com>
References: <20221115013043.360610-1-david@fromorbit.com>

From: Dave Chinner

Now that iomap supports a mechanism to validate cached iomaps for
buffered write operations, hook it up to the XFS buffered write ops
so that we can avoid data corruptions that result from stale cached
iomaps. See:

https://lore.kernel.org/linux-xfs/20220817093627.GZ3600936@dread.disaster.area/

or the ->iomap_valid() introduction commit for exact details of the
corruption vector.

The validity cookie we store in the iomap is based on the type of
iomap we return. It is expected that the iomap->flags we set in
xfs_bmbt_to_iomap() are not perturbed by the iomap core and are
returned to us in the iomap passed via the .iomap_valid() callback.
This ensures that the validity cookie is always checking the correct
inode fork sequence numbers to detect potential changes that affect
the extent cached by the iomap.
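The cookie that xfs_iomap_inode_sequence() builds in this patch (attribute fork sequence for IOMAP_F_XATTR maps; otherwise the data fork sequence in the low 32 bits, plus the COW fork sequence in the high 32 bits for IOMAP_F_SHARED maps when a COW fork exists) can be checked in isolation. A minimal userspace sketch follows; `struct model_inode` and the `model_*` functions are hypothetical stand-ins, and modelling each fork's if_seq as a 32-bit unsigned counter is an assumption about the kernel type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the flag bits used by the patch. */
#define MODEL_F_SHARED	(1U << 2)	/* IOMAP_F_SHARED */
#define MODEL_F_XATTR	(1U << 6)	/* IOMAP_F_XATTR */

/* Hypothetical stand-in for the per-fork sequence counters. */
struct model_inode {
	uint32_t data_seq;	/* ip->i_df.if_seq */
	uint32_t attr_seq;	/* ip->i_af.if_seq */
	bool has_cow_fork;	/* ip->i_cowfp != NULL */
	uint32_t cow_seq;	/* ip->i_cowfp->if_seq */
};

/* Same composition as xfs_iomap_inode_sequence() in this patch. */
static uint64_t model_inode_sequence(const struct model_inode *ip,
				     unsigned int iomap_flags)
{
	uint64_t cookie = 0;

	if (iomap_flags & MODEL_F_XATTR)
		return ip->attr_seq;
	if ((iomap_flags & MODEL_F_SHARED) && ip->has_cow_fork)
		cookie = (uint64_t)ip->cow_seq << 32;
	return cookie | ip->data_seq;
}

/* ->iomap_valid() analogue: recompute from the map's flags and compare. */
static bool model_iomap_valid(const struct model_inode *ip,
			      unsigned int iomap_flags, uint64_t cookie)
{
	return model_inode_sequence(ip, iomap_flags) == cookie;
}
```

Because the check recomputes the cookie from the flags carried in the iomap, a COW fork change invalidates only maps flagged as shared, while a plain data fork map taken at the same time stays valid. This is why the flags set in xfs_bmbt_to_iomap() must come back unperturbed.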
Signed-off-by: Dave Chinner
---
 fs/xfs/libxfs/xfs_bmap.c |   6 ++-
 fs/xfs/xfs_aops.c        |   2 +-
 fs/xfs/xfs_iomap.c       | 105 +++++++++++++++++++++++++++++++--------
 fs/xfs/xfs_iomap.h       |   5 +-
 fs/xfs/xfs_pnfs.c        |   6 ++-
 5 files changed, 97 insertions(+), 27 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 49d0d4ea63fc..56b9b7db38bb 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4551,7 +4551,8 @@ xfs_bmapi_convert_delalloc(
 	 * the extent. Just return the real extent at this offset.
 	 */
 	if (!isnullstartblock(bma.got.br_startblock)) {
-		xfs_bmbt_to_iomap(ip, iomap, &bma.got, 0, flags);
+		xfs_bmbt_to_iomap(ip, iomap, &bma.got, 0, flags,
+				xfs_iomap_inode_sequence(ip, flags));
 		*seq = READ_ONCE(ifp->if_seq);
 		goto out_trans_cancel;
 	}
@@ -4599,7 +4600,8 @@ xfs_bmapi_convert_delalloc(
 	XFS_STATS_INC(mp, xs_xstrat_quick);
 
 	ASSERT(!isnullstartblock(bma.got.br_startblock));
-	xfs_bmbt_to_iomap(ip, iomap, &bma.got, 0, flags);
+	xfs_bmbt_to_iomap(ip, iomap, &bma.got, 0, flags,
+				xfs_iomap_inode_sequence(ip, flags));
 	*seq = READ_ONCE(ifp->if_seq);
 
 	if (whichfork == XFS_COW_FORK)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 6aadc5815068..a22d90af40c8 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -372,7 +372,7 @@ xfs_map_blocks(
 	    isnullstartblock(imap.br_startblock))
 		goto allocate_blocks;
 
-	xfs_bmbt_to_iomap(ip, &wpc->iomap, &imap, 0, 0);
+	xfs_bmbt_to_iomap(ip, &wpc->iomap, &imap, 0, 0, XFS_WPC(wpc)->data_seq);
 	trace_xfs_map_blocks_found(ip, offset, count, whichfork, &imap);
 	return 0;
 allocate_blocks:
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 04da22943e7c..fa578c913323 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -48,13 +48,54 @@ xfs_alert_fsblock_zero(
 	return -EFSCORRUPTED;
 }
 
+u64
+xfs_iomap_inode_sequence(
+	struct xfs_inode	*ip,
+	u16			iomap_flags)
+{
+	u64			cookie = 0;
+
+	if (iomap_flags & IOMAP_F_XATTR)
+		return READ_ONCE(ip->i_af.if_seq);
+	if ((iomap_flags & IOMAP_F_SHARED) && ip->i_cowfp)
+		cookie = (u64)READ_ONCE(ip->i_cowfp->if_seq) << 32;
+	return cookie | READ_ONCE(ip->i_df.if_seq);
+}
+
+/*
+ * Check that the iomap passed to us is still valid for the given offset and
+ * length.
+ */
+static bool
+xfs_iomap_valid(
+	struct inode		*inode,
+	const struct iomap	*iomap)
+{
+	struct xfs_inode	*ip = XFS_I(inode);
+	u64			cookie = 0;
+
+	if (iomap->flags & IOMAP_F_XATTR) {
+		cookie = READ_ONCE(ip->i_af.if_seq);
+	} else {
+		if ((iomap->flags & IOMAP_F_SHARED) && ip->i_cowfp)
+			cookie = (u64)READ_ONCE(ip->i_cowfp->if_seq) << 32;
+		cookie |= READ_ONCE(ip->i_df.if_seq);
+	}
+	return cookie == iomap->validity_cookie;
+}
+
+const struct iomap_page_ops xfs_iomap_page_ops = {
+	.iomap_valid		= xfs_iomap_valid,
+};
+
 int
 xfs_bmbt_to_iomap(
 	struct xfs_inode	*ip,
 	struct iomap		*iomap,
 	struct xfs_bmbt_irec	*imap,
 	unsigned int		mapping_flags,
-	u16			iomap_flags)
+	u16			iomap_flags,
+	u64			sequence_cookie)
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
@@ -91,6 +132,9 @@ xfs_bmbt_to_iomap(
 	if (xfs_ipincount(ip) &&
 	    (ip->i_itemp->ili_fsync_fields & ~XFS_ILOG_TIMESTAMP))
 		iomap->flags |= IOMAP_F_DIRTY;
+
+	iomap->validity_cookie = sequence_cookie;
+	iomap->page_ops = &xfs_iomap_page_ops;
 	return 0;
 }
 
@@ -195,7 +239,8 @@ xfs_iomap_write_direct(
 	xfs_fileoff_t		offset_fsb,
 	xfs_fileoff_t		count_fsb,
 	unsigned int		flags,
-	struct xfs_bmbt_irec	*imap)
+	struct xfs_bmbt_irec	*imap,
+	u64			*seq)
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_trans	*tp;
@@ -285,6 +330,8 @@ xfs_iomap_write_direct(
 		error = xfs_alert_fsblock_zero(ip, imap);
 
 out_unlock:
+	if (seq)
+		*seq = xfs_iomap_inode_sequence(ip, 0);
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	return error;
 
@@ -743,6 +790,7 @@ xfs_direct_write_iomap_begin(
 	bool			shared = false;
 	u16			iomap_flags = 0;
 	unsigned int		lockmode = XFS_ILOCK_SHARED;
+	u64			seq;
 
 	ASSERT(flags & (IOMAP_WRITE | IOMAP_ZERO));
 
@@ -811,9 +859,10 @@ xfs_direct_write_iomap_begin(
 		goto out_unlock;
 	}
 
+	seq = xfs_iomap_inode_sequence(ip, iomap_flags);
 	xfs_iunlock(ip, lockmode);
 	trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
-	return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, iomap_flags);
+	return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, iomap_flags, seq);
 
 allocate_blocks:
 	error = -EAGAIN;
@@ -839,24 +888,26 @@ xfs_direct_write_iomap_begin(
 	xfs_iunlock(ip, lockmode);
 
 	error = xfs_iomap_write_direct(ip, offset_fsb, end_fsb - offset_fsb,
-			flags, &imap);
+			flags, &imap, &seq);
 	if (error)
 		return error;
 
 	trace_xfs_iomap_alloc(ip, offset, length, XFS_DATA_FORK, &imap);
 	return xfs_bmbt_to_iomap(ip, iomap, &imap, flags,
-				 iomap_flags | IOMAP_F_NEW);
+				 iomap_flags | IOMAP_F_NEW, seq);
 
 out_found_cow:
-	xfs_iunlock(ip, lockmode);
 	length = XFS_FSB_TO_B(mp, cmap.br_startoff + cmap.br_blockcount);
 	trace_xfs_iomap_found(ip, offset, length - offset, XFS_COW_FORK, &cmap);
 	if (imap.br_startblock != HOLESTARTBLOCK) {
-		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, flags, 0);
+		seq = xfs_iomap_inode_sequence(ip, 0);
+		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, flags, 0, seq);
 		if (error)
-			return error;
+			goto out_unlock;
 	}
-	return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, IOMAP_F_SHARED);
+	seq = xfs_iomap_inode_sequence(ip, IOMAP_F_SHARED);
+	xfs_iunlock(ip, lockmode);
+	return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, IOMAP_F_SHARED, seq);
 
 out_unlock:
 	if (lockmode)
@@ -915,6 +966,7 @@ xfs_buffered_write_iomap_begin(
 	int			allocfork = XFS_DATA_FORK;
 	int			error = 0;
 	unsigned int		lockmode = XFS_ILOCK_EXCL;
+	u64			seq;
 
 	if (xfs_is_shutdown(mp))
 		return -EIO;
@@ -1094,26 +1146,31 @@ xfs_buffered_write_iomap_begin(
 	 * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
 	 * them out if the write happens to fail.
 	 */
+	seq = xfs_iomap_inode_sequence(ip, IOMAP_F_NEW);
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	trace_xfs_iomap_alloc(ip, offset, count, allocfork, &imap);
-	return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, IOMAP_F_NEW);
+	return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, IOMAP_F_NEW, seq);
 
 found_imap:
+	seq = xfs_iomap_inode_sequence(ip, 0);
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
-	return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, 0);
+	return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, 0, seq);
 
 found_cow:
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	seq = xfs_iomap_inode_sequence(ip, 0);
 	if (imap.br_startoff <= offset_fsb) {
-		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, flags, 0);
+		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, flags, 0, seq);
 		if (error)
-			return error;
+			goto out_unlock;
+		seq = xfs_iomap_inode_sequence(ip, IOMAP_F_SHARED);
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
 		return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags,
-					 IOMAP_F_SHARED);
+					 IOMAP_F_SHARED, seq);
 	}
 
 	xfs_trim_extent(&cmap, offset_fsb, imap.br_startoff - offset_fsb);
-	return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, 0);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, 0, seq);
 
 out_unlock:
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
@@ -1346,6 +1403,7 @@ xfs_read_iomap_begin(
 	int			nimaps = 1, error = 0;
 	bool			shared = false;
 	unsigned int		lockmode = XFS_ILOCK_SHARED;
+	u64			seq;
 
 	ASSERT(!(flags & (IOMAP_WRITE | IOMAP_ZERO)));
 
@@ -1359,13 +1417,14 @@ xfs_read_iomap_begin(
 			       &nimaps, 0);
 	if (!error && (flags & IOMAP_REPORT))
 		error = xfs_reflink_trim_around_shared(ip, &imap, &shared);
+	seq = xfs_iomap_inode_sequence(ip, shared ? IOMAP_F_SHARED : 0);
 	xfs_iunlock(ip, lockmode);
 
 	if (error)
 		return error;
 	trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
 	return xfs_bmbt_to_iomap(ip, iomap, &imap, flags,
-				 shared ? IOMAP_F_SHARED : 0);
+				 shared ? IOMAP_F_SHARED : 0, seq);
 }
 
 const struct iomap_ops xfs_read_iomap_ops = {
@@ -1390,6 +1449,7 @@ xfs_seek_iomap_begin(
 	struct xfs_bmbt_irec	imap, cmap;
 	int			error = 0;
 	unsigned		lockmode;
+	u64			seq;
 
 	if (xfs_is_shutdown(mp))
 		return -EIO;
@@ -1424,8 +1484,9 @@ xfs_seek_iomap_begin(
 		if (data_fsb < cow_fsb + cmap.br_blockcount)
 			end_fsb = min(end_fsb, data_fsb);
 		xfs_trim_extent(&cmap, offset_fsb, end_fsb);
+		seq = xfs_iomap_inode_sequence(ip, IOMAP_F_SHARED);
 		error = xfs_bmbt_to_iomap(ip, iomap, &cmap, flags,
-				IOMAP_F_SHARED);
+				IOMAP_F_SHARED, seq);
 		/*
 		 * This is a COW extent, so we must probe the page cache
 		 * because there could be dirty page cache being backed
@@ -1446,8 +1507,9 @@ xfs_seek_iomap_begin(
 	imap.br_startblock = HOLESTARTBLOCK;
 	imap.br_state = XFS_EXT_NORM;
 done:
+	seq = xfs_iomap_inode_sequence(ip, 0);
 	xfs_trim_extent(&imap, offset_fsb, end_fsb);
-	error = xfs_bmbt_to_iomap(ip, iomap, &imap, flags, 0);
+	error = xfs_bmbt_to_iomap(ip, iomap, &imap, flags, 0, seq);
 out_unlock:
 	xfs_iunlock(ip, lockmode);
 	return error;
@@ -1473,6 +1535,7 @@ xfs_xattr_iomap_begin(
 	struct xfs_bmbt_irec	imap;
 	int			nimaps = 1, error = 0;
 	unsigned		lockmode;
+	int			seq;
 
 	if (xfs_is_shutdown(mp))
 		return -EIO;
@@ -1489,12 +1552,14 @@ xfs_xattr_iomap_begin(
 	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb, &imap,
 			       &nimaps, XFS_BMAPI_ATTRFORK);
 out_unlock:
+
+	seq = xfs_iomap_inode_sequence(ip, IOMAP_F_XATTR);
 	xfs_iunlock(ip, lockmode);
 
 	if (error)
 		return error;
 	ASSERT(nimaps);
-	return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, 0);
+	return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, IOMAP_F_XATTR, seq);
 }
 
 const struct iomap_ops xfs_xattr_iomap_ops = {
diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
index 0f62ab633040..4da13440bae9 100644
--- a/fs/xfs/xfs_iomap.h
+++ b/fs/xfs/xfs_iomap.h
@@ -13,14 +13,15 @@ struct xfs_bmbt_irec;
 
 int xfs_iomap_write_direct(struct xfs_inode *ip, xfs_fileoff_t offset_fsb,
 		xfs_fileoff_t count_fsb, unsigned int flags,
-		struct xfs_bmbt_irec *imap);
+		struct xfs_bmbt_irec *imap, u64 *sequence);
 int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, xfs_off_t, bool);
 xfs_fileoff_t xfs_iomap_eof_align_last_fsb(struct xfs_inode *ip,
 		xfs_fileoff_t end_fsb);
+u64 xfs_iomap_inode_sequence(struct xfs_inode *ip, u16 iomap_flags);
 
 int xfs_bmbt_to_iomap(struct xfs_inode *ip, struct iomap *iomap,
 		struct xfs_bmbt_irec *imap, unsigned int mapping_flags,
-		u16 iomap_flags);
+		u16 iomap_flags, u64 sequence_cookie);
 
 int xfs_zero_range(struct xfs_inode *ip, loff_t pos, loff_t len,
 		bool *did_zero);
diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c
index 37a24f0f7cd4..38d23f0e703a 100644
--- a/fs/xfs/xfs_pnfs.c
+++ b/fs/xfs/xfs_pnfs.c
@@ -125,6 +125,7 @@ xfs_fs_map_blocks(
 	int			nimaps = 1;
 	uint			lock_flags;
 	int			error = 0;
+	u64			seq;
 
 	if (xfs_is_shutdown(mp))
 		return -EIO;
@@ -176,6 +177,7 @@ xfs_fs_map_blocks(
 	lock_flags = xfs_ilock_data_map_shared(ip);
 	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
 				&imap, &nimaps, bmapi_flags);
+	seq = xfs_iomap_inode_sequence(ip, 0);
 
 	ASSERT(!nimaps || imap.br_startblock != DELAYSTARTBLOCK);
 
@@ -189,7 +191,7 @@ xfs_fs_map_blocks(
 		xfs_iunlock(ip, lock_flags);
 
 		error = xfs_iomap_write_direct(ip, offset_fsb,
-				end_fsb - offset_fsb, 0, &imap);
+				end_fsb - offset_fsb, 0, &imap, &seq);
 		if (error)
 			goto out_unlock;
 
@@ -209,7 +211,7 @@ xfs_fs_map_blocks(
 	}
 	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
 
-	error = xfs_bmbt_to_iomap(ip, iomap, &imap, 0, 0);
+	error = xfs_bmbt_to_iomap(ip, iomap, &imap, 0, 0, seq);
 	*device_generation = mp->m_generation;
 	return error;
 out_unlock:
:message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=jgZL4Q6/hiN7SK0U+fxjQpeeWJPO0+rsmSJULzaGQ0k=; b=CN1qKX6giPplSDMTYXjNYjZHb+g4Onns/alC/xVexc7Ui/VKc+Aha8u6YE+BbR9mqU MVynTQS8nJPOfLI6nX1OHpDXDPrLCy6OpYqFbWbxFit5HpkBwqOk9yeih0iz40jKHODU K4N4S+pw5KpABYU7IMcat2x1ZmNgOf3oWgPDjwelWxVstNk828mQvmwK1eh0wNKyqtmw t7d6tOKLjs8WAskx2k9z41FR9S8Fl0VWkBntw5MDE9hd+Yfyw5N22BYOzZsVEk5IWyRo /6WzZQPfYatGu8sIpCbYHZE5ygu9L81SonPxtS7lQ1VbnmkfNEmp1gDn2Fr6/hk4NijZ Vn2w== X-Gm-Message-State: ANoB5pm4cyhPPdT9qaZ0d8HD3tIrHl662WqtZ0CEW5Jt9VJT7A2ktc70 zFjp6p03RWi1O5arxWd1Oh4ZtQ== X-Google-Smtp-Source: AA0mqf53q+Ed2oyN1jbck8d7rOoRUBqjIi3l7i6bP6FZ3exURtjovulj7jav8V30tqIY2/ycTy12og== X-Received: by 2002:aa7:814e:0:b0:572:6e9b:9fa2 with SMTP id d14-20020aa7814e000000b005726e9b9fa2mr1008065pfn.8.1668475850202; Mon, 14 Nov 2022 17:30:50 -0800 (PST) Received: from dread.disaster.area (pa49-181-106-210.pa.nsw.optusnet.com.au. [49.181.106.210]) by smtp.gmail.com with ESMTPSA id d17-20020a170902ced100b001868ed86a95sm8290720plg.174.2022.11.14.17.30.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 14 Nov 2022 17:30:48 -0800 (PST) Received: from discord.disaster.area ([192.168.253.110]) by dread.disaster.area with esmtp (Exim 4.92.3) (envelope-from ) id 1oukmj-00EKGJ-Hn; Tue, 15 Nov 2022 12:30:45 +1100 Received: from dave by discord.disaster.area with local (Exim 4.96) (envelope-from ) id 1oukmj-001Vpr-1d; Tue, 15 Nov 2022 12:30:45 +1100 From: Dave Chinner To: linux-xfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 9/9] xfs: drop write error injection is unfixable, remove it Date: Tue, 15 Nov 2022 12:30:43 +1100 Message-Id: <20221115013043.360610-10-david@fromorbit.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20221115013043.360610-1-david@fromorbit.com> References: <20221115013043.360610-1-david@fromorbit.com> MIME-Version: 1.0 ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1668475851; 

From: Dave Chinner

With the changes to scan the page cache for dirty data to avoid data corruptions from partial write cleanup racing with other page cache operations, the drop-writes error injection no longer works the same way it used to and causes xfs/196 to fail.
This is because xfs/196 writes to the file and populates the page cache before it turns on the error injection and starts failing -overwrites-. The result is that the original drop-writes code failed writes only -after- overwriting the data in the cache, followed by invalidating the cached data and then punching out the delalloc extent from under that data.

On the surface, this looks fine. The problem is that page cache invalidation *doesn't guarantee that it removes anything from the page cache* and it doesn't change the dirty state of the folio. When block size == page size and we do page-aligned IO (as xfs/196 does), everything happens to align perfectly and page cache invalidation removes the single-page folios that span the written data. Hence the follow-up delalloc punch pass does not find cached data over that range and it can punch the extent out.

IOWs, xfs/196 "works" for block size == page size with the new code. I say "works" because it actually only works for the case where IO is page aligned and no data was read from disk before the writes occur. The moment we actually read data first, the readahead code allocates multipage folios and the invalidate code suddenly goes back to zeroing subfolio ranges without changing the dirty state.

Hence, with multipage folios in play, block size == page size is functionally identical to block size < page size behaviour, and drop-writes is manifestly broken w.r.t. this case. Invalidation of a subfolio range doesn't remove the folio from the cache; the range just gets zeroed. Hence after we've sequentially walked over a folio that we've dirtied (via write data) and then invalidated, we end up with a dirty folio full of zeroed data. And because the new code skips punching ranges that have dirty folios covering them, we end up leaving the delalloc range intact after failing all the writes.
Hence failed writes now end up writing zeroes to disk in the cases where invalidation zeroes folios rather than removing them from the cache. This is a fundamental change of behaviour that is needed to avoid the data corruption vectors that exist in the old write fail path, and it renders the drop-writes injection non-functional and unworkable as it stands.

As it is, I think the error injection is also now unnecessary, as partial writes that need delalloc extent cleanup are going to be a lot more common with stale iomap detection in place. Hence this patch removes the drop-writes error injection completely. xfs/196 can remain for testing kernels that don't have this data corruption fix, but those that do will report:

xfs/196 3s ... [not run] XFS error injection drop_writes unknown on this kernel.

Signed-off-by: Dave Chinner
---
 fs/xfs/libxfs/xfs_errortag.h | 12 +++++-------
 fs/xfs/xfs_error.c           | 27 ++++++++++++++++++++-------
 fs/xfs/xfs_iomap.c           |  9 ---------
 3 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_errortag.h b/fs/xfs/libxfs/xfs_errortag.h
index 5362908164b0..580ccbd5aadc 100644
--- a/fs/xfs/libxfs/xfs_errortag.h
+++ b/fs/xfs/libxfs/xfs_errortag.h
@@ -40,13 +40,12 @@
 #define XFS_ERRTAG_REFCOUNT_FINISH_ONE 25
 #define XFS_ERRTAG_BMAP_FINISH_ONE 26
 #define XFS_ERRTAG_AG_RESV_CRITICAL 27
+
 /*
- * DEBUG mode instrumentation to test and/or trigger delayed allocation
- * block killing in the event of failed writes. When enabled, all
- * buffered writes are silenty dropped and handled as if they failed.
- * All delalloc blocks in the range of the write (including pre-existing
- * delalloc blocks!) are tossed as part of the write failure error
- * handling sequence.
+ * Drop-writes support removed because write error handling cannot trash
+ * pre-existing delalloc extents in any useful way anymore. We retain the
+ * definition so that we can reject it as an invalid value in
+ * xfs_errortag_valid().
  */
 #define XFS_ERRTAG_DROP_WRITES 28
 #define XFS_ERRTAG_LOG_BAD_CRC 29
@@ -95,7 +94,6 @@
 #define XFS_RANDOM_REFCOUNT_FINISH_ONE 1
 #define XFS_RANDOM_BMAP_FINISH_ONE 1
 #define XFS_RANDOM_AG_RESV_CRITICAL 4
-#define XFS_RANDOM_DROP_WRITES 1
 #define XFS_RANDOM_LOG_BAD_CRC 1
 #define XFS_RANDOM_LOG_ITEM_PIN 1
 #define XFS_RANDOM_BUF_LRU_REF 2
diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
index c6b2aabd6f18..dea3c0649d2f 100644
--- a/fs/xfs/xfs_error.c
+++ b/fs/xfs/xfs_error.c
@@ -46,7 +46,7 @@ static unsigned int xfs_errortag_random_default[] = {
 	XFS_RANDOM_REFCOUNT_FINISH_ONE,
 	XFS_RANDOM_BMAP_FINISH_ONE,
 	XFS_RANDOM_AG_RESV_CRITICAL,
-	XFS_RANDOM_DROP_WRITES,
+	0, /* XFS_RANDOM_DROP_WRITES has been removed */
 	XFS_RANDOM_LOG_BAD_CRC,
 	XFS_RANDOM_LOG_ITEM_PIN,
 	XFS_RANDOM_BUF_LRU_REF,
@@ -162,7 +162,6 @@ XFS_ERRORTAG_ATTR_RW(refcount_continue_update, XFS_ERRTAG_REFCOUNT_CONTINUE_UPDA
 XFS_ERRORTAG_ATTR_RW(refcount_finish_one, XFS_ERRTAG_REFCOUNT_FINISH_ONE);
 XFS_ERRORTAG_ATTR_RW(bmap_finish_one, XFS_ERRTAG_BMAP_FINISH_ONE);
 XFS_ERRORTAG_ATTR_RW(ag_resv_critical, XFS_ERRTAG_AG_RESV_CRITICAL);
-XFS_ERRORTAG_ATTR_RW(drop_writes, XFS_ERRTAG_DROP_WRITES);
 XFS_ERRORTAG_ATTR_RW(log_bad_crc, XFS_ERRTAG_LOG_BAD_CRC);
 XFS_ERRORTAG_ATTR_RW(log_item_pin, XFS_ERRTAG_LOG_ITEM_PIN);
 XFS_ERRORTAG_ATTR_RW(buf_lru_ref, XFS_ERRTAG_BUF_LRU_REF);
@@ -206,7 +205,6 @@ static struct attribute *xfs_errortag_attrs[] = {
 	XFS_ERRORTAG_ATTR_LIST(refcount_finish_one),
 	XFS_ERRORTAG_ATTR_LIST(bmap_finish_one),
 	XFS_ERRORTAG_ATTR_LIST(ag_resv_critical),
-	XFS_ERRORTAG_ATTR_LIST(drop_writes),
 	XFS_ERRORTAG_ATTR_LIST(log_bad_crc),
 	XFS_ERRORTAG_ATTR_LIST(log_item_pin),
 	XFS_ERRORTAG_ATTR_LIST(buf_lru_ref),
@@ -256,6 +254,19 @@ xfs_errortag_del(
 	kmem_free(mp->m_errortag);
 }
 
+static bool
+xfs_errortag_valid(
+	unsigned int error_tag)
+{
+	if (error_tag >= XFS_ERRTAG_MAX)
+		return false;
+
+	/* Error out removed injection types */
+	if (error_tag == XFS_ERRTAG_DROP_WRITES)
+		return false;
+	return true;
+}
+
 bool
 xfs_errortag_test(
 	struct xfs_mount *mp,
@@ -277,7 +288,9 @@ xfs_errortag_test(
 	if (!mp->m_errortag)
 		return false;
 
-	ASSERT(error_tag < XFS_ERRTAG_MAX);
+	if (!xfs_errortag_valid(error_tag))
+		return false;
+
 	randfactor = mp->m_errortag[error_tag];
 	if (!randfactor || prandom_u32_max(randfactor))
 		return false;
@@ -293,7 +306,7 @@ xfs_errortag_get(
 	struct xfs_mount *mp,
 	unsigned int error_tag)
 {
-	if (error_tag >= XFS_ERRTAG_MAX)
+	if (!xfs_errortag_valid(error_tag))
 		return -EINVAL;
 
 	return mp->m_errortag[error_tag];
@@ -305,7 +318,7 @@ xfs_errortag_set(
 	unsigned int error_tag,
 	unsigned int tag_value)
 {
-	if (error_tag >= XFS_ERRTAG_MAX)
+	if (!xfs_errortag_valid(error_tag))
 		return -EINVAL;
 
 	mp->m_errortag[error_tag] = tag_value;
@@ -319,7 +332,7 @@ xfs_errortag_add(
 {
 	BUILD_BUG_ON(ARRAY_SIZE(xfs_errortag_random_default) != XFS_ERRTAG_MAX);
 
-	if (error_tag >= XFS_ERRTAG_MAX)
+	if (!xfs_errortag_valid(error_tag))
 		return -EINVAL;
 
 	return xfs_errortag_set(mp, error_tag,
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index fa578c913323..bf1661410272 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1335,15 +1335,6 @@ xfs_buffered_write_iomap_end(
 	if (iomap->type != IOMAP_DELALLOC)
 		return 0;
 
-	/*
-	 * Behave as if the write failed if drop writes is enabled. Set the NEW
-	 * flag to force delalloc cleanup.
-	 */
-	if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_DROP_WRITES)) {
-		iomap->flags |= IOMAP_F_NEW;
-		written = 0;
-	}
-
 	/* If we didn't reserve the blocks, we're not allowed to punch them. */
 	if (!(iomap->flags & IOMAP_F_NEW))
 		return 0;