From patchwork Mon Feb 17 09:31:26 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13977385 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 55D532153C7 for ; Mon, 17 Feb 2025 09:32:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.137.202.133 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739784734; cv=none; b=u1x9p3VmW9+yJW/7afoiGsTBozLLogSYvtz+nHnoiBO6pfnAn1aclFDOjbCqrPmH42vmbzTfJ4tQptpHMclLfOqTmRNzeTWLTPXFxFjRxI0UUO/zj+W2EXParw1B5gkopFl6iRaQBYr7l5GpJmV3qXJ8DI7geaY5lUDAQtzKDzE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739784734; c=relaxed/simple; bh=OCNTfNqAj/p3hCxriSJXudESRuydYxSXGV8bOZURJJU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=acr7HtLH/DbnOuC6suIxSSCw5qHPJckYZ2BLuforhkPLvikpPWuqDNhappiJc+T69XH275HnVBhEWF4J7HLiKWil7rRU4mGnFUHWDvby9BtEUtphqgcZJV3EI390nBGO1KJjkdWOB+2PgYoFdP50Ep+4iizga+ftgCdB1pHby08= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de; spf=none smtp.mailfrom=bombadil.srs.infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=d2OkWECo; arc=none smtp.client-ip=198.137.202.133 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=bombadil.srs.infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="d2OkWECo" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=JEsasq2f4ZAU/rckU/k8GBDUwJg22AmpwVOkzv15BAk=; b=d2OkWECoB6JBsgrjy/IA8H/bO8 WbYzWqsEPS2g7wbVy1BrXo8lYY1r80+aEkRhKyIijyZmBjofKPW3+TuOZ4qzgzgFaeblWyosOLeqN YynBf/+mqvVGq4d46jl/7VVL9mVJoqUK4YDMHut0SIX5/3uB5OqD0pucxlD0bmCpal4HA5QeHXRco Do6vltXdtjQcSCpfdSquNicXH70duAuxc+CgUxkMnPbhYG2VsDH9jFkDwzcneFiIMPC8g8sO5J9P7 eO+Lo0ohZrBbUSVvHs9s3/hAS9YkPaD1gkcSkP0g2a0VWHeJ0UMTxWdOS5+YqtAIfHRop6nCUHucD 1BLm0J2Q==; Received: from 2a02-8389-2341-5b80-a8df-74d2-0b85-4db2.cable.dynamic.v6.surfer.at ([2a02:8389:2341:5b80:a8df:74d2:b85:4db2] helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.98 #2 (Red Hat Linux)) id 1tjxU4-00000003wGl-1iwe; Mon, 17 Feb 2025 09:32:12 +0000 From: Christoph Hellwig To: Carlos Maiolino Cc: "Darrick J. Wong" , linux-xfs@vger.kernel.org Subject: [PATCH 1/4] xfs: reduce context switches for synchronous buffered I/O Date: Mon, 17 Feb 2025 10:31:26 +0100 Message-ID: <20250217093207.3769550-2-hch@lst.de> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20250217093207.3769550-1-hch@lst.de> References: <20250217093207.3769550-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Currently all metadata I/O completions happen in the m_buf_workqueue workqueue. But for synchronous I/O (i.e. all buffer reads) there is no need for that, as there always is a called in process context that is waiting for the I/O. Factor out the guts of xfs_buf_ioend into a separate helper and call it from xfs_buf_iowait to avoid a double an extra context switch to the workqueue. Signed-off-by: Christoph Hellwig --- fs/xfs/xfs_buf.c | 42 ++++++++++++++++++++++++++---------------- 1 file changed, 26 insertions(+), 16 deletions(-) diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index 15bb790359f8..050f2c2f6a40 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -1345,6 +1345,7 @@ xfs_buf_ioend_handle_error( resubmit: xfs_buf_ioerror(bp, 0); bp->b_flags |= (XBF_DONE | XBF_WRITE_FAIL); + reinit_completion(&bp->b_iowait); xfs_buf_submit(bp); return true; out_stale: @@ -1355,8 +1356,8 @@ xfs_buf_ioend_handle_error( return false; } -static void -xfs_buf_ioend( +static bool +__xfs_buf_ioend( struct xfs_buf *bp) { trace_xfs_buf_iodone(bp, _RET_IP_); @@ -1376,7 +1377,7 @@ xfs_buf_ioend( } if (unlikely(bp->b_error) && xfs_buf_ioend_handle_error(bp)) - return; + return false; /* clear the retry state */ bp->b_last_error = 0; @@ -1397,7 +1398,15 @@ xfs_buf_ioend( bp->b_flags &= ~(XBF_READ | XBF_WRITE | XBF_READ_AHEAD | _XBF_LOGRECOVERY); + return true; +} +static void +xfs_buf_ioend( + struct xfs_buf *bp) +{ + if (!__xfs_buf_ioend(bp)) + return; if (bp->b_flags & XBF_ASYNC) xfs_buf_relse(bp); else @@ -1411,15 +1420,8 @@ xfs_buf_ioend_work( struct xfs_buf *bp = container_of(work, struct xfs_buf, b_ioend_work); - xfs_buf_ioend(bp); -} - -static void -xfs_buf_ioend_async( - struct xfs_buf *bp) -{ - INIT_WORK(&bp->b_ioend_work, xfs_buf_ioend_work); - queue_work(bp->b_mount->m_buf_workqueue, &bp->b_ioend_work); + if (__xfs_buf_ioend(bp)) + xfs_buf_relse(bp); } void @@ -1491,7 +1493,13 @@ xfs_buf_bio_end_io( XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_IOERROR)) xfs_buf_ioerror(bp, -EIO); - xfs_buf_ioend_async(bp); + if (bp->b_flags & XBF_ASYNC) { + INIT_WORK(&bp->b_ioend_work, xfs_buf_ioend_work); + queue_work(bp->b_mount->m_buf_workqueue, &bp->b_ioend_work); + } else { + complete(&bp->b_iowait); + } + bio_put(bio); } @@ -1568,9 +1576,11 @@ xfs_buf_iowait( { ASSERT(!(bp->b_flags & XBF_ASYNC)); - trace_xfs_buf_iowait(bp, _RET_IP_); - wait_for_completion(&bp->b_iowait); - trace_xfs_buf_iowait_done(bp, _RET_IP_); + do { + trace_xfs_buf_iowait(bp, _RET_IP_); + wait_for_completion(&bp->b_iowait); + trace_xfs_buf_iowait_done(bp, _RET_IP_); + } while (!__xfs_buf_ioend(bp)); return bp->b_error; } From patchwork Mon Feb 17 09:31:27 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13977386 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B3BE92153C7 for ; Mon, 17 Feb 2025 09:32:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.137.202.133 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739784738; cv=none; b=K4KyVnnhCFntdzewufl1Pumr6z1OjksYq/WwHEeT4sY1zEM6YPV7EEJOvkSosU/C6WEgoBPv8vnvdv4z/iUn3ElcTbiFrTbyZJ5G5dDgM+DGM4DtxtVjZ4swxGfrKaSYrZVIXPxJyikj0PXl9Yxa0nPeV5g3TxZKbelMssYig44= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739784738; c=relaxed/simple; bh=xjq1sBu5c19n/PHoOT2H7qyoC+/YK37PwFOui1ORhOg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=pH11htDVLWPBn4zUejzqRF0bq5gHBjuLfS2wtsf0ug79/HFcDO3rIjsnGG8oRYek3L4F5skldqUV14N028bnEC2+G89HwLSyzC5kILtfKRyhcDR5cO+uCkVuYJ1Deax5AdkRmSLpFeBCS0uIIF4C0/+pknKEn2IUWmjs36fciuY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de; spf=none smtp.mailfrom=bombadil.srs.infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=U3xCdkl+; arc=none smtp.client-ip=198.137.202.133 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=bombadil.srs.infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="U3xCdkl+" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=SYne0Ble0UiMKhn+nlmhAepDbhS+7aWC4sz0ozAWIgg=; b=U3xCdkl+lhfpNOD6yqI4PNXihJ wfe9ooaX4o79qZYJN/0WkDKwECyz/XCLeGgub/J+xZgrG8vqJola0SnQ61+nioVBKWnaYr1/Y3L2G tT21kQ1R4vbfJ8u0W9NoOVsFhTFE0fpgr7oe8vzY9pN5PVtSZeq3odswaeHaRf8RLfoyX88ynBlYG JoUsoxzxog/ZjjCNwv0mTK1oA/QmPJHasC5rIjpAX7YxF41l+Uc03L8UGGUvm+ZmLtZwgb2+d0n0n eu3rn5EcQEXY03ytzCZrf7B4GQ6PB5otpeUvRMMRY++xKFPkv5PP9wS51E1/RHs/SScY1jRKY3MJm 7cS2NkVQ==; Received: from 2a02-8389-2341-5b80-a8df-74d2-0b85-4db2.cable.dynamic.v6.surfer.at ([2a02:8389:2341:5b80:a8df:74d2:b85:4db2] helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.98 #2 (Red Hat Linux)) id 1tjxU6-00000003wHS-3Mxz; Mon, 17 Feb 2025 09:32:15 +0000 From: Christoph Hellwig To: Carlos Maiolino Cc: "Darrick J. Wong" , linux-xfs@vger.kernel.org Subject: [PATCH 2/4] xfs: decouple buffer readahead from the normal buffer read path Date: Mon, 17 Feb 2025 10:31:27 +0100 Message-ID: <20250217093207.3769550-3-hch@lst.de> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20250217093207.3769550-1-hch@lst.de> References: <20250217093207.3769550-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html xfs_buf_readahead_map is the only caller of xfs_buf_read_map and thus _xfs_buf_read that is not synchronous. Split it from xfs_buf_read_map so that the asynchronous path is self-contained and the now purely synchronous xfs_buf_read_map / _xfs_buf_read implementation can be simplified. Signed-off-by: Christoph Hellwig Reviewed-by: "Darrick J. Wong" --- fs/xfs/xfs_buf.c | 41 ++++++++++++++++++++-------------------- fs/xfs/xfs_buf.h | 2 +- fs/xfs/xfs_log_recover.c | 2 +- fs/xfs/xfs_trace.h | 1 + 4 files changed, 23 insertions(+), 23 deletions(-) diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index 050f2c2f6a40..52fb85c42e94 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -794,18 +794,13 @@ xfs_buf_get_map( int _xfs_buf_read( - struct xfs_buf *bp, - xfs_buf_flags_t flags) + struct xfs_buf *bp) { - ASSERT(!(flags & XBF_WRITE)); ASSERT(bp->b_maps[0].bm_bn != XFS_BUF_DADDR_NULL); bp->b_flags &= ~(XBF_WRITE | XBF_ASYNC | XBF_READ_AHEAD | XBF_DONE); - bp->b_flags |= flags & (XBF_READ | XBF_ASYNC | XBF_READ_AHEAD); - + bp->b_flags |= XBF_READ; xfs_buf_submit(bp); - if (flags & XBF_ASYNC) - return 0; return xfs_buf_iowait(bp); } @@ -857,6 +852,8 @@ xfs_buf_read_map( struct xfs_buf *bp; int error; + ASSERT(!(flags & (XBF_WRITE | XBF_ASYNC | XBF_READ_AHEAD))); + flags |= XBF_READ; *bpp = NULL; @@ -870,21 +867,11 @@ xfs_buf_read_map( /* Initiate the buffer read and wait. */ XFS_STATS_INC(target->bt_mount, xb_get_read); bp->b_ops = ops; - error = _xfs_buf_read(bp, flags); - - /* Readahead iodone already dropped the buffer, so exit. */ - if (flags & XBF_ASYNC) - return 0; + error = _xfs_buf_read(bp); } else { /* Buffer already read; all we need to do is check it. */ error = xfs_buf_reverify(bp, ops); - /* Readahead already finished; drop the buffer and exit. */ - if (flags & XBF_ASYNC) { - xfs_buf_relse(bp); - return 0; - } - /* We do not want read in the flags */ bp->b_flags &= ~XBF_READ; ASSERT(bp->b_ops != NULL || ops == NULL); @@ -936,6 +923,7 @@ xfs_buf_readahead_map( int nmaps, const struct xfs_buf_ops *ops) { + const xfs_buf_flags_t flags = XBF_READ | XBF_ASYNC | XBF_READ_AHEAD; struct xfs_buf *bp; /* @@ -945,9 +933,20 @@ xfs_buf_readahead_map( if (xfs_buftarg_is_mem(target)) return; - xfs_buf_read_map(target, map, nmaps, - XBF_TRYLOCK | XBF_ASYNC | XBF_READ_AHEAD, &bp, ops, - __this_address); + if (xfs_buf_get_map(target, map, nmaps, flags | XBF_TRYLOCK, &bp)) + return; + trace_xfs_buf_readahead(bp, 0, _RET_IP_); + + if (bp->b_flags & XBF_DONE) { + xfs_buf_reverify(bp, ops); + xfs_buf_relse(bp); + return; + } + XFS_STATS_INC(target->bt_mount, xb_get_read); + bp->b_ops = ops; + bp->b_flags &= ~(XBF_WRITE | XBF_DONE); + bp->b_flags |= flags; + xfs_buf_submit(bp); } /* diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h index 3b4ed42e11c0..2e747555ad3f 100644 --- a/fs/xfs/xfs_buf.h +++ b/fs/xfs/xfs_buf.h @@ -291,7 +291,7 @@ int xfs_buf_get_uncached(struct xfs_buftarg *target, size_t numblks, int xfs_buf_read_uncached(struct xfs_buftarg *target, xfs_daddr_t daddr, size_t numblks, xfs_buf_flags_t flags, struct xfs_buf **bpp, const struct xfs_buf_ops *ops); -int _xfs_buf_read(struct xfs_buf *bp, xfs_buf_flags_t flags); +int _xfs_buf_read(struct xfs_buf *bp); void xfs_buf_hold(struct xfs_buf *bp); /* Releasing Buffers */ diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index b3c27dbccce8..2f76531842f8 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -3380,7 +3380,7 @@ xlog_do_recover( */ xfs_buf_lock(bp); xfs_buf_hold(bp); - error = _xfs_buf_read(bp, XBF_READ); + error = _xfs_buf_read(bp); if (error) { if (!xlog_is_shutdown(log)) { xfs_buf_ioerror_alert(bp, __this_address); diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index b29462363b81..bfc2f1249022 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -593,6 +593,7 @@ DEFINE_EVENT(xfs_buf_flags_class, name, \ DEFINE_BUF_FLAGS_EVENT(xfs_buf_find); DEFINE_BUF_FLAGS_EVENT(xfs_buf_get); DEFINE_BUF_FLAGS_EVENT(xfs_buf_read); +DEFINE_BUF_FLAGS_EVENT(xfs_buf_readahead); TRACE_EVENT(xfs_buf_ioerror, TP_PROTO(struct xfs_buf *bp, int error, xfs_failaddr_t caller_ip), From patchwork Mon Feb 17 09:31:28 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13977387 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D6E1F2153EC for ; Mon, 17 Feb 2025 09:32:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.137.202.133 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739784739; cv=none; b=PNfsOx/Gf2EuLTV5sb2p6F6tSD8Rl1BdemlCVGDkUg46IeqXC21N+dwpX/Wxa37OMJnGKJQuSZutdr/ol5CClOVjBnd5WytL09AB2p5zCUBpPUsJ6j/xibPX0x6VLUzFM0HAKEnaRTxTGFG3aYDyKk4sfhOzPliBLAOhAJ9CY0w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739784739; c=relaxed/simple; bh=g0Vd9i3OYXHnZPiA+exC+6zVcgVREF+KzNWhPsaB2qc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=QFRnx2SQXH/GmEGeJ4yPd66uI68reiDdAkrvI+oNbkqqo+Eb7XalEcnNVMrq9x5zHPM/AXR1lyMtxAmGR/k9GS0+XhqfuP6L9rF9HLep9utBiWKTS7swFUmqcIbgPLrYVHfCJNeAzkyJrsJuwLGrps/TMyOe9cW3gn6/C4uPWvk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de; spf=none smtp.mailfrom=bombadil.srs.infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=2RWNkq71; arc=none smtp.client-ip=198.137.202.133 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=bombadil.srs.infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="2RWNkq71" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=5y+16WDWnez12lYK9jJVb1UuUvs5ET8nseKL3C5X8FM=; b=2RWNkq716TqfbEL2iRsUrbCCx4 KsTQG8BBOUouR4eFWsFeHIi9bYS9L4x8IIabLHR5KD6xvwHyc0Tc/uUE8D33s0Pm94VLMO0NmKR6Z 0marg/71bUTNUEyM9kcpCpaJ3ziR6WkrHf6vEXThQqcs1KNLowmAX++pD8CPJQ0QOgksmBYvkFFvW hf2MCuxu8vy0BpUr8lYsAR3RW2DzQnC3idaSGAsqziARIFqpf144JIxc5y7LFX5zp/a1moRxmlFhm dG/NsPClkBKa4tJ+LAHygPNNnap420f+JgdOaN00gh89aOvYT7QMVfBdscRiFtC5yJuPAGu+4tAgs I2iv8xvA==; Received: from 2a02-8389-2341-5b80-a8df-74d2-0b85-4db2.cable.dynamic.v6.surfer.at ([2a02:8389:2341:5b80:a8df:74d2:b85:4db2] helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.98 #2 (Red Hat Linux)) id 1tjxU9-00000003wI8-0YYC; Mon, 17 Feb 2025 09:32:17 +0000 From: Christoph Hellwig To: Carlos Maiolino Cc: "Darrick J. Wong" , linux-xfs@vger.kernel.org Subject: [PATCH 3/4] xfs: remove most in-flight buffer accounting Date: Mon, 17 Feb 2025 10:31:28 +0100 Message-ID: <20250217093207.3769550-4-hch@lst.de> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20250217093207.3769550-1-hch@lst.de> References: <20250217093207.3769550-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html The buffer cache keeps a bt_io_count per-CPU counter to track all in-flight I/O, which is used to ensure no I/O is in flight when unmounting the file system. For most I/O we already keep track of inflight I/O at higher levels: - for synchronous I/O (xfs_buf_read/xfs_bwrite/xfs_buf_delwri_submit), the caller has a reference and waits for I/O completions using xfs_buf_iowait - for xfs_buf_delwri_submit the only caller (AIL writeback) tracks the log items that the buffer attached to This only leaves only xfs_buf_readahead_map as a submitter of asynchronous I/O that is not tracked by anything else. Replace the bt_io_count per-cpu counter with a more specific bt_readahead_count counter only tracking readahead I/O. This allows to simply increment it when submitting readahead I/O and decrementing it when it completed, and thus simplify xfs_buf_rele and remove the needed for the XBF_NO_IOACCT flags and the XFS_BSTATE_IN_FLIGHT buffer state. Signed-off-by: Christoph Hellwig Reviewed-by: "Darrick J. Wong" --- fs/xfs/xfs_buf.c | 90 ++++++++------------------------------------ fs/xfs/xfs_buf.h | 5 +-- fs/xfs/xfs_buf_mem.c | 2 +- fs/xfs/xfs_mount.c | 7 +--- fs/xfs/xfs_rtalloc.c | 2 +- 5 files changed, 20 insertions(+), 86 deletions(-) diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index 52fb85c42e94..f8efdee3c8b4 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -29,11 +29,6 @@ struct kmem_cache *xfs_buf_cache; /* * Locking orders * - * xfs_buf_ioacct_inc: - * xfs_buf_ioacct_dec: - * b_sema (caller holds) - * b_lock - * * xfs_buf_stale: * b_sema (caller holds) * b_lock @@ -81,51 +76,6 @@ xfs_buf_vmap_len( return (bp->b_page_count * PAGE_SIZE); } -/* - * Bump the I/O in flight count on the buftarg if we haven't yet done so for - * this buffer. The count is incremented once per buffer (per hold cycle) - * because the corresponding decrement is deferred to buffer release. Buffers - * can undergo I/O multiple times in a hold-release cycle and per buffer I/O - * tracking adds unnecessary overhead. This is used for sychronization purposes - * with unmount (see xfs_buftarg_drain()), so all we really need is a count of - * in-flight buffers. - * - * Buffers that are never released (e.g., superblock, iclog buffers) must set - * the XBF_NO_IOACCT flag before I/O submission. Otherwise, the buftarg count - * never reaches zero and unmount hangs indefinitely. - */ -static inline void -xfs_buf_ioacct_inc( - struct xfs_buf *bp) -{ - if (bp->b_flags & XBF_NO_IOACCT) - return; - - ASSERT(bp->b_flags & XBF_ASYNC); - spin_lock(&bp->b_lock); - if (!(bp->b_state & XFS_BSTATE_IN_FLIGHT)) { - bp->b_state |= XFS_BSTATE_IN_FLIGHT; - percpu_counter_inc(&bp->b_target->bt_io_count); - } - spin_unlock(&bp->b_lock); -} - -/* - * Clear the in-flight state on a buffer about to be released to the LRU or - * freed and unaccount from the buftarg. - */ -static inline void -__xfs_buf_ioacct_dec( - struct xfs_buf *bp) -{ - lockdep_assert_held(&bp->b_lock); - - if (bp->b_state & XFS_BSTATE_IN_FLIGHT) { - bp->b_state &= ~XFS_BSTATE_IN_FLIGHT; - percpu_counter_dec(&bp->b_target->bt_io_count); - } -} - /* * When we mark a buffer stale, we remove the buffer from the LRU and clear the * b_lru_ref count so that the buffer is freed immediately when the buffer @@ -156,8 +106,6 @@ xfs_buf_stale( * status now to preserve accounting consistency. */ spin_lock(&bp->b_lock); - __xfs_buf_ioacct_dec(bp); - atomic_set(&bp->b_lru_ref, 0); if (!(bp->b_state & XFS_BSTATE_DISPOSE) && (list_lru_del_obj(&bp->b_target->bt_lru, &bp->b_lru))) @@ -946,6 +894,7 @@ xfs_buf_readahead_map( bp->b_ops = ops; bp->b_flags &= ~(XBF_WRITE | XBF_DONE); bp->b_flags |= flags; + percpu_counter_inc(&target->bt_readahead_count); xfs_buf_submit(bp); } @@ -1002,10 +951,12 @@ xfs_buf_get_uncached( struct xfs_buf *bp; DEFINE_SINGLE_BUF_MAP(map, XFS_BUF_DADDR_NULL, numblks); + /* there are currently no valid flags for xfs_buf_get_uncached */ + ASSERT(flags == 0); + *bpp = NULL; - /* flags might contain irrelevant bits, pass only what we care about */ - error = _xfs_buf_alloc(target, &map, 1, flags & XBF_NO_IOACCT, &bp); + error = _xfs_buf_alloc(target, &map, 1, flags, &bp); if (error) return error; @@ -1059,7 +1010,6 @@ xfs_buf_rele_uncached( spin_unlock(&bp->b_lock); return; } - __xfs_buf_ioacct_dec(bp); spin_unlock(&bp->b_lock); xfs_buf_free(bp); } @@ -1078,19 +1028,11 @@ xfs_buf_rele_cached( spin_lock(&bp->b_lock); ASSERT(bp->b_hold >= 1); if (bp->b_hold > 1) { - /* - * Drop the in-flight state if the buffer is already on the LRU - * and it holds the only reference. This is racy because we - * haven't acquired the pag lock, but the use of _XBF_IN_FLIGHT - * ensures the decrement occurs only once per-buf. - */ - if (--bp->b_hold == 1 && !list_empty(&bp->b_lru)) - __xfs_buf_ioacct_dec(bp); + bp->b_hold--; goto out_unlock; } /* we are asked to drop the last reference */ - __xfs_buf_ioacct_dec(bp); if (!(bp->b_flags & XBF_STALE) && atomic_read(&bp->b_lru_ref)) { /* * If the buffer is added to the LRU, keep the reference to the @@ -1369,6 +1311,8 @@ __xfs_buf_ioend( bp->b_ops->verify_read(bp); if (!bp->b_error) bp->b_flags |= XBF_DONE; + if (bp->b_flags & XBF_READ_AHEAD) + percpu_counter_dec(&bp->b_target->bt_readahead_count); } else { if (!bp->b_error) { bp->b_flags &= ~XBF_WRITE_FAIL; @@ -1657,9 +1601,6 @@ xfs_buf_submit( */ bp->b_error = 0; - if (bp->b_flags & XBF_ASYNC) - xfs_buf_ioacct_inc(bp); - if ((bp->b_flags & XBF_WRITE) && !xfs_buf_verify_write(bp)) { xfs_force_shutdown(bp->b_mount, SHUTDOWN_CORRUPT_INCORE); xfs_buf_ioend(bp); @@ -1785,9 +1726,8 @@ xfs_buftarg_wait( struct xfs_buftarg *btp) { /* - * First wait on the buftarg I/O count for all in-flight buffers to be - * released. This is critical as new buffers do not make the LRU until - * they are released. + * First wait for all in-flight readahead buffers to be released. This is + 8 critical as new buffers do not make the LRU until they are released. * * Next, flush the buffer workqueue to ensure all completion processing * has finished. Just waiting on buffer locks is not sufficient for @@ -1796,7 +1736,7 @@ xfs_buftarg_wait( * all reference counts have been dropped before we start walking the * LRU list. */ - while (percpu_counter_sum(&btp->bt_io_count)) + while (percpu_counter_sum(&btp->bt_readahead_count)) delay(100); flush_workqueue(btp->bt_mount->m_buf_workqueue); } @@ -1913,8 +1853,8 @@ xfs_destroy_buftarg( struct xfs_buftarg *btp) { shrinker_free(btp->bt_shrinker); - ASSERT(percpu_counter_sum(&btp->bt_io_count) == 0); - percpu_counter_destroy(&btp->bt_io_count); + ASSERT(percpu_counter_sum(&btp->bt_readahead_count) == 0); + percpu_counter_destroy(&btp->bt_readahead_count); list_lru_destroy(&btp->bt_lru); } @@ -1968,7 +1908,7 @@ xfs_init_buftarg( if (list_lru_init(&btp->bt_lru)) return -ENOMEM; - if (percpu_counter_init(&btp->bt_io_count, 0, GFP_KERNEL)) + if (percpu_counter_init(&btp->bt_readahead_count, 0, GFP_KERNEL)) goto out_destroy_lru; btp->bt_shrinker = @@ -1982,7 +1922,7 @@ xfs_init_buftarg( return 0; out_destroy_io_count: - percpu_counter_destroy(&btp->bt_io_count); + percpu_counter_destroy(&btp->bt_readahead_count); out_destroy_lru: list_lru_destroy(&btp->bt_lru); return -ENOMEM; diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h index 2e747555ad3f..80e06eecaf56 100644 --- a/fs/xfs/xfs_buf.h +++ b/fs/xfs/xfs_buf.h @@ -27,7 +27,6 @@ struct xfs_buf; #define XBF_READ (1u << 0) /* buffer intended for reading from device */ #define XBF_WRITE (1u << 1) /* buffer intended for writing to device */ #define XBF_READ_AHEAD (1u << 2) /* asynchronous read-ahead */ -#define XBF_NO_IOACCT (1u << 3) /* bypass I/O accounting (non-LRU bufs) */ #define XBF_ASYNC (1u << 4) /* initiator will not wait for completion */ #define XBF_DONE (1u << 5) /* all pages in the buffer uptodate */ #define XBF_STALE (1u << 6) /* buffer has been staled, do not find it */ @@ -58,7 +57,6 @@ typedef unsigned int xfs_buf_flags_t; { XBF_READ, "READ" }, \ { XBF_WRITE, "WRITE" }, \ { XBF_READ_AHEAD, "READ_AHEAD" }, \ - { XBF_NO_IOACCT, "NO_IOACCT" }, \ { XBF_ASYNC, "ASYNC" }, \ { XBF_DONE, "DONE" }, \ { XBF_STALE, "STALE" }, \ @@ -77,7 +75,6 @@ typedef unsigned int xfs_buf_flags_t; * Internal state flags. */ #define XFS_BSTATE_DISPOSE (1 << 0) /* buffer being discarded */ -#define XFS_BSTATE_IN_FLIGHT (1 << 1) /* I/O in flight */ struct xfs_buf_cache { struct rhashtable bc_hash; @@ -116,7 +113,7 @@ struct xfs_buftarg { struct shrinker *bt_shrinker; struct list_lru bt_lru; - struct percpu_counter bt_io_count; + struct percpu_counter bt_readahead_count; struct ratelimit_state bt_ioerror_rl; /* Atomic write unit values */ diff --git a/fs/xfs/xfs_buf_mem.c b/fs/xfs/xfs_buf_mem.c index 07bebbfb16ee..5b64a2b3b113 100644 --- a/fs/xfs/xfs_buf_mem.c +++ b/fs/xfs/xfs_buf_mem.c @@ -117,7 +117,7 @@ xmbuf_free( struct xfs_buftarg *btp) { ASSERT(xfs_buftarg_is_mem(btp)); - ASSERT(percpu_counter_sum(&btp->bt_io_count) == 0); + ASSERT(percpu_counter_sum(&btp->bt_readahead_count) == 0); trace_xmbuf_free(btp); diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index 477c5262cf91..b69356582b86 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -181,14 +181,11 @@ xfs_readsb( /* * Allocate a (locked) buffer to hold the superblock. This will be kept - * around at all times to optimize access to the superblock. Therefore, - * set XBF_NO_IOACCT to make sure it doesn't hold the buftarg count - * elevated. + * around at all times to optimize access to the superblock. */ reread: error = xfs_buf_read_uncached(mp->m_ddev_targp, XFS_SB_DADDR, - BTOBB(sector_size), XBF_NO_IOACCT, &bp, - buf_ops); + BTOBB(sector_size), 0, &bp, buf_ops); if (error) { if (loud) xfs_warn(mp, "SB validate failed with error %d.", error); diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index d8e6d073d64d..57bef567e011 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1407,7 +1407,7 @@ xfs_rtmount_readsb( /* m_blkbb_log is not set up yet */ error = xfs_buf_read_uncached(mp->m_rtdev_targp, XFS_RTSB_DADDR, - mp->m_sb.sb_blocksize >> BBSHIFT, XBF_NO_IOACCT, &bp, + mp->m_sb.sb_blocksize >> BBSHIFT, 0, &bp, &xfs_rtsb_buf_ops); if (error) { xfs_warn(mp, "rt sb validate failed with error %d.", error); From patchwork Mon Feb 17 09:31:29 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13977388 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 91E322153EC for ; Mon, 17 Feb 2025 09:32:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.137.202.133 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739784741; cv=none; b=gc4t1rvdAwv+SbXAR+rVaVcnYRBJspIv6JWDGFcu7oObbl0ETpu+0l+dAFp8DHJtposlrTgVtAZ+oOvneqhWEkffXEQWbdJuH23NtfpR3ZKqS6k+7QNHnfPQAz/b3vVpj4eeCxJA6rX9cXV5UCFMNyL/VIljcZOyjRg+8xS5nRw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1739784741; c=relaxed/simple; bh=k/meAa860+T3MCtZzqE4nN+3uq9fY8QeiFtrNUnFXgc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=eb62uTSGRuHr4tdyH1QA8VPLjsxc+k7kUfANUjrjr7KPQ+PpPppXjYZso7PaePiUhGJc+B8v57Nrs5yrbhDrade77puPNNzx44VUZrdkVEyeRwoKAeE83qUwJYLsKsx1c4Yalsb9WXGcbo3IEOAYRkIT8/djIeP9Kkz4E/fgQ6E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de; spf=none smtp.mailfrom=bombadil.srs.infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=B+7y7wLT; arc=none smtp.client-ip=198.137.202.133 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=bombadil.srs.infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="B+7y7wLT" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=8ShMEhhMFAbUkuYQ3oBEFRM5BWUVpQKn7ZYeUTDfGi4=; b=B+7y7wLT1ztxssppD+rMwshlZ/ 630mv7kTYV6oY0iSJXe9Nm+Lv31AdsFecCF1fjFlZeBH4W7iXTnXFOOZp4t4mtv5t/T9sgRsXBHAk IzF6KRHnBAFB7oEnBVL4jZ8UKkvEP/Y2kWqGnHZ2z6b9bTvnqyw8dBeS0/8HilCwA0zBOQrjCl8KY Z9mgXEtSH33IMjPIrsN43ALOoPGoDRuT3lkAetSpPevIJXKrvvBuAYvaYw1vjDnWMOT8y3s4YbSED O7daEVNPB3RF/yS1hWfRVhvCjamBeuloNcPJW7qtMk32f1+KprEcfBcVKqMATW7mIzlPN2+1iWfIY dp2f3S6w==; Received: from 2a02-8389-2341-5b80-a8df-74d2-0b85-4db2.cable.dynamic.v6.surfer.at ([2a02:8389:2341:5b80:a8df:74d2:b85:4db2] helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.98 #2 (Red Hat Linux)) id 1tjxUB-00000003wIt-3fp3; Mon, 17 Feb 2025 09:32:20 +0000 From: Christoph Hellwig To: Carlos Maiolino Cc: "Darrick J. Wong" , linux-xfs@vger.kernel.org Subject: [PATCH 4/4] xfs: remove the XBF_STALE check from xfs_buf_rele_cached Date: Mon, 17 Feb 2025 10:31:29 +0100 Message-ID: <20250217093207.3769550-5-hch@lst.de> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20250217093207.3769550-1-hch@lst.de> References: <20250217093207.3769550-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html xfs_buf_stale already set b_lru_ref to 0, and thus prevents the buffer from moving to the LRU. Remove the duplicate check. Signed-off-by: Christoph Hellwig Reviewed-by: "Darrick J. Wong" --- fs/xfs/xfs_buf.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index f8efdee3c8b4..cf88b25fe3c5 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -99,12 +99,6 @@ xfs_buf_stale( */ bp->b_flags &= ~_XBF_DELWRI_Q; - /* - * Once the buffer is marked stale and unlocked, a subsequent lookup - * could reset b_flags. There is no guarantee that the buffer is - * unaccounted (released to LRU) before that occurs. Drop in-flight - * status now to preserve accounting consistency. - */ spin_lock(&bp->b_lock); atomic_set(&bp->b_lru_ref, 0); if (!(bp->b_state & XFS_BSTATE_DISPOSE) && @@ -1033,7 +1027,7 @@ xfs_buf_rele_cached( } /* we are asked to drop the last reference */ - if (!(bp->b_flags & XBF_STALE) && atomic_read(&bp->b_lru_ref)) { + if (atomic_read(&bp->b_lru_ref)) { /* * If the buffer is added to the LRU, keep the reference to the * buffer for the LRU and clear the (now stale) dispose list