Message ID | 20250124225104.326613-1-willy@infradead.org |
---|---|
State | New |
Series | block: Skip the folio lock if the folio is already dirty |
On Fri, Jan 24, 2025 at 10:51:02PM +0000, Matthew Wilcox (Oracle) wrote:
> 	bio_for_each_folio_all(fi, bio) {
> +		if (folio_test_dirty(fi.folio))
> +			continue;

Can you add a comment why this is safe (the answer probably is "folio
dirtying through direct I/O is racy as hell anyway and we don't care")
and desirable?

Otherwise this looks great.
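For illustration, a minimal sketch of the kind of comment being asked for
here (the wording is hypothetical, not taken from any later revision of
the patch):

	bio_for_each_folio_all(fi, bio) {
		/*
		 * Folio dirtying through direct I/O is racy anyway, so an
		 * unlocked test is fine: if the folio is already dirty,
		 * writeback has not cleared the flag yet and will write
		 * back the data we just DMA'd into it.
		 */
		if (folio_test_dirty(fi.folio))
			continue;
		folio_lock(fi.folio);
		folio_mark_dirty(fi.folio);
		folio_unlock(fi.folio);
	}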
diff --git a/block/bio.c b/block/bio.c
index f0c416e5931d..58d30b1dc08e 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1404,6 +1404,8 @@ void bio_set_pages_dirty(struct bio *bio)
 	struct folio_iter fi;
 
 	bio_for_each_folio_all(fi, bio) {
+		if (folio_test_dirty(fi.folio))
+			continue;
 		folio_lock(fi.folio);
 		folio_mark_dirty(fi.folio);
 		folio_unlock(fi.folio);
Postgres sees significant contention on the hashed folio waitqueue lock
when performing direct I/O to 1GB hugetlb pages. This is because we mark
the destination pages as dirty, and the locks end up 512x more contended
with 1GB pages than with 2MB pages.

We can skip the locking if the folio is already marked as dirty. The
writeback path clears the dirty flag before commencing writeback; if we
see the dirty flag set, the data written to the folio will be written
back.

In one test, throughput increased from 18GB/s to 20GB/s and moved the
bottleneck elsewhere.

Reported-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 block/bio.c | 2 ++
 1 file changed, 2 insertions(+)
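To make the changelog's ordering argument concrete, here is a simplified
sketch of the two paths involved. The writeback side is a rough
approximation of the usual write-out sequence, not the exact mainline
call chain, and writeback_one() is a hypothetical stand-in for the
filesystem's writeback method:

	/* Direct I/O completion path, with this patch applied: */
	if (folio_test_dirty(fi.folio))
		/*
		 * Dirty flag still set: writeback has not cleared it yet,
		 * so when it does run it will write out the freshly
		 * DMA'd data.  No need to take the folio lock.
		 */
		continue;
	folio_lock(fi.folio);
	folio_mark_dirty(fi.folio);
	folio_unlock(fi.folio);

	/* Writeback path (simplified): */
	folio_lock(folio);
	if (folio_clear_dirty_for_io(folio))	/* clears the dirty flag first... */
		writeback_one(folio);		/* ...then writes the folio contents */
	folio_unlock(folio);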