From patchwork Wed Feb 26 15:51:29 2025
From: Christoph Hellwig
To: Carlos Maiolino
Cc: "Darrick J. Wong", Dave Chinner, linux-xfs@vger.kernel.org
Subject: [PATCH 01/12] xfs: unmapped buffer item size straddling mismatch
Date: Wed, 26 Feb 2025 07:51:29 -0800
Message-ID: <20250226155245.513494-2-hch@lst.de>
In-Reply-To: <20250226155245.513494-1-hch@lst.de>
References: <20250226155245.513494-1-hch@lst.de>

From: Dave Chinner

We never log large contiguous regions of unmapped buffers, so this bug
is never triggered by the current code.
However, the slowpath for formatting buffer straddling regions is broken. That is, the size and shape of the log vector calculated across a straddle does not match how the formatting code formats a straddle. This results in a log vector with an uninitialised iovec and this causes a crash when xlog_write_full() goes to copy the iovec into the journal. Whilst touching this code, don't bother checking mapped or single folio buffers for discontiguous regions because they don't have them. This significantly reduces the overhead of this check when logging large buffers as calling xfs_buf_offset() is not free and it occurs a *lot* in those cases. Fixes: 929f8b0deb83 ("xfs: optimise xfs_buf_item_size/format for contiguous regions") Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig --- fs/xfs/xfs_buf_item.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c index 47549cfa61cd..0ee6fa9efd18 100644 --- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -57,6 +57,10 @@ xfs_buf_log_format_size( (blfp->blf_map_size * sizeof(blfp->blf_data_map[0])); } +/* + * We only have to worry about discontiguous buffer range straddling on unmapped + * buffers. Everything else will have a contiguous data region we can copy from. + */ static inline bool xfs_buf_item_straddle( struct xfs_buf *bp, @@ -66,6 +70,9 @@ xfs_buf_item_straddle( { void *first, *last; + if (bp->b_page_count == 1 || !(bp->b_flags & XBF_UNMAPPED)) + return false; + first = xfs_buf_offset(bp, offset + (first_bit << XFS_BLF_SHIFT)); last = xfs_buf_offset(bp, offset + ((first_bit + nbits) << XFS_BLF_SHIFT)); @@ -133,11 +140,13 @@ xfs_buf_item_size_segment( return; slow_scan: - /* Count the first bit we jumped out of the above loop from */ - (*nvecs)++; - *nbytes += XFS_BLF_CHUNK; + ASSERT(bp->b_addr == NULL); last_bit = first_bit; + nbits = 1; while (last_bit != -1) { + + *nbytes += XFS_BLF_CHUNK; + /* * This takes the bit number to start looking from and * returns the next set bit from there. It returns -1 @@ -152,6 +161,8 @@ xfs_buf_item_size_segment( * else keep scanning the current set of bits. 
*/ if (next_bit == -1) { + if (first_bit != last_bit) + (*nvecs)++; break; } else if (next_bit != last_bit + 1 || xfs_buf_item_straddle(bp, offset, first_bit, nbits)) { @@ -163,7 +174,6 @@ xfs_buf_item_size_segment( last_bit++; nbits++; } - *nbytes += XFS_BLF_CHUNK; } }

From patchwork Wed Feb 26 15:51:30 2025
From: Christoph Hellwig
To: Carlos Maiolino
Cc: "Darrick J. Wong", Dave Chinner, linux-xfs@vger.kernel.org
Subject: [PATCH 02/12] xfs: add a fast path to xfs_buf_zero when b_addr is set
Date: Wed, 26 Feb 2025 07:51:30 -0800
Message-ID: <20250226155245.513494-3-hch@lst.de>
In-Reply-To: <20250226155245.513494-1-hch@lst.de>
References: <20250226155245.513494-1-hch@lst.de>
Wong" , Dave Chinner , linux-xfs@vger.kernel.org Subject: [PATCH 02/12] xfs: add a fast path to xfs_buf_zero when b_addr is set Date: Wed, 26 Feb 2025 07:51:30 -0800 Message-ID: <20250226155245.513494-3-hch@lst.de> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20250226155245.513494-1-hch@lst.de> References: <20250226155245.513494-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html No need to walk the page list if bp->b_addr is valid. That also means b_offset doesn't need to be taken into account in the unmapped loop as b_offset is only set for kmem backed buffers which are always mapped. Signed-off-by: Christoph Hellwig Reviewed-by: "Darrick J. Wong" --- fs/xfs/xfs_buf.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index 15bb790359f8..9e0c64511936 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -1688,13 +1688,18 @@ xfs_buf_zero( { size_t bend; + if (bp->b_addr) { + memset(bp->b_addr + boff, 0, bsize); + return; + } + bend = boff + bsize; while (boff < bend) { struct page *page; int page_index, page_offset, csize; - page_index = (boff + bp->b_offset) >> PAGE_SHIFT; - page_offset = (boff + bp->b_offset) & ~PAGE_MASK; + page_index = boff >> PAGE_SHIFT; + page_offset = boff & ~PAGE_MASK; page = bp->b_pages[page_index]; csize = min_t(size_t, PAGE_SIZE - page_offset, BBTOB(bp->b_length) - boff); From patchwork Wed Feb 26 15:51:31 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13992733 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B426A2904 for ; Wed, 26 Feb 2025 15:52:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.137.202.133 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740585168; cv=none; b=iVlM8eXVyu/GQZqXWsJIEuSI8D9hRILf5UpquQP+T3iotFsKwIgmWMiBIW5wlSAUziVKIvWe5wMBmObElbPWDt8Y/u7qh1JmpLIVdC+sJBSXJumCxLVbgsH2xKeDx6Ew+DH8HrgSjWhwZgJrXwYrNzBjRogwBgXA6+iTA8UgNns= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740585168; c=relaxed/simple; bh=g5ZUMDPL8d2nAHuRJu4E/1Rkyt2Oa79ZeWrJFDh4irk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=TnuJf2u3CJJH2u4nOtt/ilbRZTuXOWxJscOJKDoEQzZF2ZLjdornbeM8p8Z6xPlSc2713FhZ+K68nXPI/5FDqck294O4fCb9VsXmBmHlVwOZtpvZk7NtDVx6MJutqv1k3JvIA1Zbi2kvoc11pebwQaTH3+aKva39gFZpkzdIBd8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de; spf=none smtp.mailfrom=bombadil.srs.infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=w8NbQxG4; arc=none smtp.client-ip=198.137.202.133 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=bombadil.srs.infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="w8NbQxG4" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; 
From patchwork Wed Feb 26 15:51:31 2025
From: Christoph Hellwig
To: Carlos Maiolino
Cc: "Darrick J. Wong", Dave Chinner, linux-xfs@vger.kernel.org
Subject: [PATCH 03/12] xfs: remove xfs_buf.b_offset
Date: Wed, 26 Feb 2025 07:51:31 -0800
Message-ID: <20250226155245.513494-4-hch@lst.de>
In-Reply-To: <20250226155245.513494-1-hch@lst.de>
References: <20250226155245.513494-1-hch@lst.de>

b_offset is only set for slab backed buffers and always set to
offset_in_page(bp->b_addr), which can be done just as easily in the only
user of b_offset.

Signed-off-by: Christoph Hellwig
Reviewed-by: "Darrick J. Wong"
---
 fs/xfs/xfs_buf.c | 3 +--
 fs/xfs/xfs_buf.h | 2 --
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 9e0c64511936..ee678e13d9bd 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -336,7 +336,6 @@ xfs_buf_alloc_kmem(
 		bp->b_addr = NULL;
 		return -ENOMEM;
 	}
-	bp->b_offset = offset_in_page(bp->b_addr);
 	bp->b_pages = bp->b_page_array;
 	bp->b_pages[0] = kmem_to_page(bp->b_addr);
 	bp->b_page_count = 1;
@@ -1528,7 +1527,7 @@ xfs_buf_submit_bio(
 
 	if (bp->b_flags & _XBF_KMEM) {
 		__bio_add_page(bio, virt_to_page(bp->b_addr), size,
-				bp->b_offset);
+				offset_in_page(bp->b_addr));
 	} else {
 		for (p = 0; p < bp->b_page_count; p++)
 			__bio_add_page(bio, bp->b_pages[p], PAGE_SIZE, 0);
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 3b4ed42e11c0..dc41b617b067 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -197,8 +197,6 @@ struct xfs_buf {
 	int			b_map_count;
 	atomic_t		b_pin_count;	/* pin count */
 	unsigned int		b_page_count;	/* size of page array */
-	unsigned int		b_offset;	/* page offset of b_addr,
-						   only for _XBF_KMEM buffers */
 	int			b_error;	/* error code on I/O */
 
 	void			(*b_iodone)(struct xfs_buf *bp);
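The rationale for dropping b_offset above is easy to see in isolation: the
in-page offset of an address can always be recomputed from the address itself,
so caching it separately buys nothing. A minimal userspace sketch, assuming a
4 KiB page size (TOY_PAGE_SIZE and the helper name are invented; the kernel's
offset_in_page() is the real equivalent):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096ul

/* Recompute the offset of an address within its page, like offset_in_page(). */
static size_t toy_offset_in_page(const void *addr)
{
	return (uintptr_t)addr & (TOY_PAGE_SIZE - 1);
}

int main(void)
{
	void *p = malloc(512);

	printf("addr %p sits %zu bytes into its page\n",
	       p, toy_offset_in_page(p));
	free(p);
	return 0;
}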
From patchwork Wed Feb 26 15:51:32 2025
From: Christoph Hellwig
To: Carlos Maiolino
Cc: "Darrick J. Wong", Dave Chinner, linux-xfs@vger.kernel.org
Subject: [PATCH 04/12] xfs: remove xfs_buf_is_vmapped
Date: Wed, 26 Feb 2025 07:51:32 -0800
Message-ID: <20250226155245.513494-5-hch@lst.de>
In-Reply-To: <20250226155245.513494-1-hch@lst.de>
References: <20250226155245.513494-1-hch@lst.de>

No need to look at the page count if we can simply call is_vmalloc_addr
on bp->b_addr.  This prepares for eventually removing the b_page_count
field.

Signed-off-by: Christoph Hellwig
Reviewed-by: "Darrick J. Wong"
---
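The check this patch switches to is essentially an address-range test:
is_vmalloc_addr() asks whether the pointer lies inside the kernel's vmalloc
area, so no separate "was this vmapped?" bookkeeping is needed. A hedged
userspace analogue of that idea (the arena and helper names are invented, and
a real kernel obviously range-checks the vmalloc region rather than a user
mmap):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define ARENA_SIZE (1ul << 20)

static char *arena_base;	/* stands in for the vmalloc address range */

/* Derive "is this an arena address?" from the pointer itself. */
static int toy_is_arena_addr(const void *addr)
{
	uintptr_t a = (uintptr_t)addr;

	return arena_base &&
	       a >= (uintptr_t)arena_base &&
	       a <  (uintptr_t)arena_base + ARENA_SIZE;
}

int main(void)
{
	char *heap = malloc(64);

	arena_base = mmap(NULL, ARENA_SIZE, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (arena_base == MAP_FAILED)
		return 1;
	printf("heap addr in arena?  %d\n", toy_is_arena_addr(heap));
	printf("arena addr in arena? %d\n", toy_is_arena_addr(arena_base + 100));
	munmap(arena_base, ARENA_SIZE);
	free(heap);
	return 0;
}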
 fs/xfs/xfs_buf.c | 20 +++-----------------
 1 file changed, 3 insertions(+), 17 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index ee678e13d9bd..af1389ebdd69 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -60,20 +60,6 @@ static inline bool xfs_buf_is_uncached(struct xfs_buf *bp)
 	return bp->b_rhash_key == XFS_BUF_DADDR_NULL;
 }
 
-static inline int
-xfs_buf_is_vmapped(
-	struct xfs_buf	*bp)
-{
-	/*
-	 * Return true if the buffer is vmapped.
-	 *
-	 * b_addr is null if the buffer is not mapped, but the code is clever
-	 * enough to know it doesn't have to map a single page, so the check has
-	 * to be both for b_addr and bp->b_page_count > 1.
-	 */
-	return bp->b_addr && bp->b_page_count > 1;
-}
-
 static inline int
 xfs_buf_vmap_len(
 	struct xfs_buf	*bp)
@@ -270,7 +256,7 @@ xfs_buf_free_pages(
 
 	ASSERT(bp->b_flags & _XBF_PAGES);
 
-	if (xfs_buf_is_vmapped(bp))
+	if (is_vmalloc_addr(bp->b_addr))
 		vm_unmap_ram(bp->b_addr, bp->b_page_count);
 
 	for (i = 0; i < bp->b_page_count; i++) {
@@ -1361,7 +1347,7 @@ xfs_buf_ioend(
 	trace_xfs_buf_iodone(bp, _RET_IP_);
 
 	if (bp->b_flags & XBF_READ) {
-		if (!bp->b_error && xfs_buf_is_vmapped(bp))
+		if (!bp->b_error && bp->b_addr && is_vmalloc_addr(bp->b_addr))
 			invalidate_kernel_vmap_range(bp->b_addr,
 						     xfs_buf_vmap_len(bp));
 		if (!bp->b_error && bp->b_ops)
@@ -1533,7 +1519,7 @@ xfs_buf_submit_bio(
 			__bio_add_page(bio, bp->b_pages[p], PAGE_SIZE, 0);
 		bio->bi_iter.bi_size = size; /* limit to the actual size used */
 
-		if (xfs_buf_is_vmapped(bp))
+		if (bp->b_addr && is_vmalloc_addr(bp->b_addr))
 			flush_kernel_vmap_range(bp->b_addr,
 						xfs_buf_vmap_len(bp));
 	}

From patchwork Wed Feb 26 15:51:33 2025
h3cIb47U9hjYeJTEOmiUjjLwtGJzlYh7IzxT+0v8974LY4zVJM3YN3De1p6/It6URwFAhHgEFEERr 6kVWZxbn+lQP2+Ewx0goeEBlXzyRMK/qyNIWjDjmce8Ct1m3owP7x+loGSvquB0gHo6R48LMJqZhh g3w5rOCg==; Received: from [4.28.11.157] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.98 #2 (Red Hat Linux)) id 1tnJiI-00000004PM7-2Bqx; Wed, 26 Feb 2025 15:52:46 +0000 From: Christoph Hellwig To: Carlos Maiolino Cc: "Darrick J. Wong" , Dave Chinner , linux-xfs@vger.kernel.org Subject: [PATCH 05/12] xfs: refactor backing memory allocations for buffers Date: Wed, 26 Feb 2025 07:51:33 -0800 Message-ID: <20250226155245.513494-6-hch@lst.de> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20250226155245.513494-1-hch@lst.de> References: <20250226155245.513494-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Lift handling of shmem and slab backed buffers into xfs_buf_alloc_pages and rename the result to xfs_buf_alloc_backing_mem. This shares more code and ensures uncached buffers can also use slab, which slightly reduces the memory usage of growfs on 512 byte sector size file systems, but more importantly means the allocation invariants are the same for cached and uncached buffers. Document these new invariants with a big fat comment mostly stolen from a patch by Dave Chinner. Signed-off-by: Christoph Hellwig Reviewed-by: "Darrick J. Wong" --- fs/xfs/xfs_buf.c | 55 +++++++++++++++++++++++++++++++----------------- 1 file changed, 36 insertions(+), 19 deletions(-) diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index af1389ebdd69..e8783ee23623 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -329,19 +329,49 @@ xfs_buf_alloc_kmem( return 0; } +/* + * Allocate backing memory for a buffer. + * + * For tmpfs-backed buffers used by in-memory btrees this directly maps the + * tmpfs page cache folios. + * + * For real file system buffers there are two different kinds backing memory: + * + * The first type backs the buffer by a kmalloc allocation. This is done for + * less than PAGE_SIZE allocations to avoid wasting memory. + * + * The second type of buffer is the multi-page buffer. These are always made + * up of single pages so that they can be fed to vmap_ram() to return a + * contiguous memory region we can access the data through, or mark it as + * XBF_UNMAPPED and access the data directly through individual page_address() + * calls. + */ static int -xfs_buf_alloc_pages( +xfs_buf_alloc_backing_mem( struct xfs_buf *bp, xfs_buf_flags_t flags) { + size_t size = BBTOB(bp->b_length); gfp_t gfp_mask = GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOWARN; long filled = 0; + if (xfs_buftarg_is_mem(bp->b_target)) + return xmbuf_map_page(bp); + + /* + * For buffers that fit entirely within a single page, first attempt to + * allocate the memory from the heap to minimise memory usage. If we + * can't get heap memory for these small buffers, we fall back to using + * the page allocator. 
+ */ + if (size < PAGE_SIZE && xfs_buf_alloc_kmem(new_bp, flags) == 0) + return 0; + if (flags & XBF_READ_AHEAD) gfp_mask |= __GFP_NORETRY; /* Make sure that we have a page list */ - bp->b_page_count = DIV_ROUND_UP(BBTOB(bp->b_length), PAGE_SIZE); + bp->b_page_count = DIV_ROUND_UP(size, PAGE_SIZE); if (bp->b_page_count <= XB_PAGES) { bp->b_pages = bp->b_page_array; } else { @@ -622,18 +652,7 @@ xfs_buf_find_insert( if (error) goto out_drop_pag; - if (xfs_buftarg_is_mem(new_bp->b_target)) { - error = xmbuf_map_page(new_bp); - } else if (BBTOB(new_bp->b_length) >= PAGE_SIZE || - xfs_buf_alloc_kmem(new_bp, flags) < 0) { - /* - * For buffers that fit entirely within a single page, first - * attempt to allocate the memory from the heap to minimise - * memory usage. If we can't get heap memory for these small - * buffers, we fall back to using the page allocator. - */ - error = xfs_buf_alloc_pages(new_bp, flags); - } + error = xfs_buf_alloc_backing_mem(new_bp, flags); if (error) goto out_free_buf; @@ -995,14 +1014,12 @@ xfs_buf_get_uncached( if (error) return error; - if (xfs_buftarg_is_mem(bp->b_target)) - error = xmbuf_map_page(bp); - else - error = xfs_buf_alloc_pages(bp, flags); + error = xfs_buf_alloc_backing_mem(bp, flags); if (error) goto fail_free_buf; - error = _xfs_buf_map_pages(bp, 0); + if (!bp->b_addr) + error = _xfs_buf_map_pages(bp, 0); if (unlikely(error)) { xfs_warn(target->bt_mount, "%s: failed to map pages", __func__); From patchwork Wed Feb 26 15:51:34 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13992735 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 32F0A1A238B for ; Wed, 26 Feb 2025 15:52:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.137.202.133 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740585168; cv=none; b=OYfVi8JwEi+qffF3y1OP86JgYq3PA6XwpudyDinCts3u3bX6nxgHiKEq7yHg+mESkvZmadaa8jVs9r0jCHe5/r8GY6yzvBWDz0DAEDel1IaKO23BbDaO3r8TmRuQqm6t9KnvQU4K2wYbvIniS5jmAIOyfR84G7U+nSO4ujWqs+c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740585168; c=relaxed/simple; bh=4Mea2IyTZn698tpYHAOz4Am+ltGpBFN/5ydRGI2kpRQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=WMKjhA35L0Sk9efIfeUVfWspK2vSxhi45NEV4WUDh53qNUsMpSGWH9VWQnxBIVNlrTyd9qp5UGQAUpxt3jF2sfKkAMVrbaJCGMhjrqiKr5TUBpDHRVdv0Q1GBH4ZTE3A98JGchut+T9cvSj+gaWu5tLT3dOPqDl5SQjwvHLuqko= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de; spf=none smtp.mailfrom=bombadil.srs.infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=NpyTUfHm; arc=none smtp.client-ip=198.137.202.133 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=bombadil.srs.infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="NpyTUfHm" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: 
MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=sMxdrwsWit8CQgbW4NFF0seZY6kKMZsOaHMwfuJcpEk=; b=NpyTUfHmUkn1hh1OjrzKdXElrP pSkRw8ewTmFoDskHMuAfEvetdCMuG7uSqvZEp0fVNPu70okw05r5e3/KGDXA0XEvNOk8wGY9h65cr MboP7QcVColQi4dtHJKdGiHogUrIfBs1gN5HX7Yc+fUzTLH0YZkqnAOZh14LVnkuOX1bok+8x2Hi0 F2lqBA5JEJ4575b5OOZYnOy50pmEd6WWkAATl/PL/jLaqxkEpC/WmLVjezg12gJH/h6e+cFVR9Hcj twmeVtIVvLTUbBaF84/hP89ruH4D+jU1FJujubFO8FxWkvjl5NOLHcx0Q4Yd+KjT26EiB6ipt7xXV FALSjmMg==; Received: from [4.28.11.157] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.98 #2 (Red Hat Linux)) id 1tnJiI-00000004PMD-2s7y; Wed, 26 Feb 2025 15:52:46 +0000 From: Christoph Hellwig To: Carlos Maiolino Cc: "Darrick J. Wong" , Dave Chinner , linux-xfs@vger.kernel.org Subject: [PATCH 06/12] xfs: remove the kmalloc to page allocator fallback Date: Wed, 26 Feb 2025 07:51:34 -0800 Message-ID: <20250226155245.513494-7-hch@lst.de> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20250226155245.513494-1-hch@lst.de> References: <20250226155245.513494-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Since commit 59bb47985c1d ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)", kmalloc and friends guarantee that power of two sized allocations are naturally aligned. Limit our use of kmalloc for buffers to these power of two sizes and remove the fallback to the page allocator for this case, but keep a check in addition to trusting the slab allocator to get the alignment right. Also refactor the kmalloc path to reuse various calculations for the size and gfp flags. Signed-off-by: Christoph Hellwig --- fs/xfs/xfs_buf.c | 49 ++++++++++++++++++++++++------------------------ 1 file changed, 25 insertions(+), 24 deletions(-) diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index e8783ee23623..f327bf5b04c0 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -301,23 +301,24 @@ xfs_buf_free( static int xfs_buf_alloc_kmem( - struct xfs_buf *bp, - xfs_buf_flags_t flags) + struct xfs_buf *bp, + size_t size, + gfp_t gfp_mask) { - gfp_t gfp_mask = GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL; - size_t size = BBTOB(bp->b_length); - - /* Assure zeroed buffer for non-read cases. */ - if (!(flags & XBF_READ)) - gfp_mask |= __GFP_ZERO; + ASSERT(is_power_of_2(size)); + ASSERT(size < PAGE_SIZE); - bp->b_addr = kmalloc(size, gfp_mask); + bp->b_addr = kmalloc(size, gfp_mask | __GFP_NOFAIL); if (!bp->b_addr) return -ENOMEM; - if (((unsigned long)(bp->b_addr + size - 1) & PAGE_MASK) != - ((unsigned long)bp->b_addr & PAGE_MASK)) { - /* b_addr spans two pages - use alloc_page instead */ + /* + * Slab guarantees that we get back naturally aligned allocations for + * power of two sizes. Keep this check as the canary in the coal mine + * if anything changes in slab. + */ + if (!IS_ALIGNED((unsigned long)bp->b_addr, size)) { + ASSERT(IS_ALIGNED((unsigned long)bp->b_addr, size)); kfree(bp->b_addr); bp->b_addr = NULL; return -ENOMEM; @@ -358,18 +359,22 @@ xfs_buf_alloc_backing_mem( if (xfs_buftarg_is_mem(bp->b_target)) return xmbuf_map_page(bp); - /* - * For buffers that fit entirely within a single page, first attempt to - * allocate the memory from the heap to minimise memory usage. 
If we - * can't get heap memory for these small buffers, we fall back to using - * the page allocator. - */ - if (size < PAGE_SIZE && xfs_buf_alloc_kmem(new_bp, flags) == 0) - return 0; + /* Assure zeroed buffer for non-read cases. */ + if (!(flags & XBF_READ)) + gfp_mask |= __GFP_ZERO; if (flags & XBF_READ_AHEAD) gfp_mask |= __GFP_NORETRY; + /* + * For buffers smaller than PAGE_SIZE use a kmalloc allocation if that + * is properly aligned. The slab allocator now guarantees an aligned + * allocation for all power of two sizes, we matches must of the smaller + * than PAGE_SIZE buffers used by XFS. + */ + if (size < PAGE_SIZE && is_power_of_2(size)) + return xfs_buf_alloc_kmem(bp, size, gfp_mask); + /* Make sure that we have a page list */ bp->b_page_count = DIV_ROUND_UP(size, PAGE_SIZE); if (bp->b_page_count <= XB_PAGES) { @@ -382,10 +387,6 @@ xfs_buf_alloc_backing_mem( } bp->b_flags |= _XBF_PAGES; - /* Assure zeroed buffer for non-read cases. */ - if (!(flags & XBF_READ)) - gfp_mask |= __GFP_ZERO; - /* * Bulk filling of pages can take multiple calls. Not filling the entire * array is not an allocation failure, so don't back off if we get at From patchwork Wed Feb 26 15:51:35 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13992736 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5850118DB03 for ; Wed, 26 Feb 2025 15:52:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.137.202.133 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740585169; cv=none; b=YdA9cfkJPxD6xlqPdFLnNR1Awbz/V3ysSKvAxdD6gLjC//B+0rjxvPcnJff9J2P3rJx9Gn3lg5GQdV75gsjZCuwqa05bgWWmOmt2QGqfYkg6PlCEKvL1oalqkUpujixOmnQtdVS7/XpCOfOPx33tCQ3UcT8hrU8EJRriZC/1jqY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740585169; c=relaxed/simple; bh=XlIdEUv/YwLp1SLmKAG/B/zi8anv+iIRS4hrqlk1mIU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=mgcFfvdHTFTTTrOKrW8A4GfHQPG0dWtDJDmbLivNEd44YNtVjoWbMGVr/hPqLhh4xMMLd4nD2kDI/Gp6uUNzlyC3OR7ZXLpmgJKfU88cbMKpNzN3ObOyynB3uKs+IL0wGIAlb6QChr/xJAKpGj+2ObKz5svUvbglApFEMiqyCe8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de; spf=none smtp.mailfrom=bombadil.srs.infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=fkv5psBs; arc=none smtp.client-ip=198.137.202.133 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=bombadil.srs.infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="fkv5psBs" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=9+NPKx3NP3m3Fjsf9lqueX0O5QZHvpPlk2Trba14S4c=; b=fkv5psBs5NrY9b2ehFBjqlwRiK VxH0/qEldj0hYV4oxyP9hRbQD4Ghsk+LUVCgWoJXgFPykmNUp6I2WYAzw7lIqCaSEuasDlfp/Rv1y WJhPg5LqKmIrq5xRux4+mMnKS+ikNfHQwTDPGD5UdbT8uy/qm+0e4P7lIC6IBoK/koo/8gQd0So+G 
2rN5nhGlwK7PC3Uz+4lzfiCGaR3C/8fDU5i9IInG3GDAlNyr1jokJ4j1l05eh0U7+79BxmImGwPCN m2Yip6F63IXpiIaUbzYudw8mCXOAcmgAthytK+3tRfGsUttqv2yOoCfxYHfbkblQKiLuq253aN29u vJpldoHw==; Received: from [4.28.11.157] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.98 #2 (Red Hat Linux)) id 1tnJiI-00000004PMI-3UdB; Wed, 26 Feb 2025 15:52:46 +0000 From: Christoph Hellwig To: Carlos Maiolino Cc: "Darrick J. Wong" , Dave Chinner , linux-xfs@vger.kernel.org Subject: [PATCH 07/12] xfs: convert buffer cache to use high order folios Date: Wed, 26 Feb 2025 07:51:35 -0800 Message-ID: <20250226155245.513494-8-hch@lst.de> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20250226155245.513494-1-hch@lst.de> References: <20250226155245.513494-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Now that we have the buffer cache using the folio API, we can extend the use of folios to allocate high order folios for multi-page buffers rather than an array of single pages that are then vmapped into a contiguous range. This creates a new type of single folio buffers that can have arbitrary order in addition to the existing multi-folio buffers made up of many single page folios that get vmapped. The single folio is for now stashed into the existing b_pages array, but that will go away entirely later in the series and remove the temporary page vs folio typing issues that only work because the two structures currently can be used largely interchangeable. The code that allocates buffers will optimistically attempt a high order folio allocation as a fast path if the buffer size is a power of two and thus fits into a folio. If this high order allocation fails, then we fall back to the existing multi-folio allocation code. This now forms the slow allocation path, and hopefully will be largely unused in normal conditions except for buffers with size that are not a power of two like larger remote xattrs. This should improve performance of large buffer operations (e.g. large directory block sizes) as we should now mostly avoid the expense of vmapping large buffers (and the vmap lock contention that can occur) as well as avoid the runtime pressure that frequently accessing kernel vmapped pages put on the TLBs. Based on a patch from Dave Chinner , but mutilated beyond recognition. Signed-off-by: Christoph Hellwig --- fs/xfs/xfs_buf.c | 58 +++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 52 insertions(+), 6 deletions(-) diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index f327bf5b04c0..3c582eaa656d 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -261,9 +261,10 @@ xfs_buf_free_pages( for (i = 0; i < bp->b_page_count; i++) { if (bp->b_pages[i]) - __free_page(bp->b_pages[i]); + folio_put(page_folio(bp->b_pages[i])); } - mm_account_reclaimed_pages(bp->b_page_count); + mm_account_reclaimed_pages( + DIV_ROUND_UP(BBTOB(bp->b_length), PAGE_SIZE)); if (bp->b_pages != bp->b_page_array) kfree(bp->b_pages); @@ -336,12 +337,17 @@ xfs_buf_alloc_kmem( * For tmpfs-backed buffers used by in-memory btrees this directly maps the * tmpfs page cache folios. * - * For real file system buffers there are two different kinds backing memory: + * For real file system buffers there are three different kinds backing memory: * * The first type backs the buffer by a kmalloc allocation. 
This is done for * less than PAGE_SIZE allocations to avoid wasting memory. * - * The second type of buffer is the multi-page buffer. These are always made + * The second type is a single folio buffer - this may be a high order folio or + * just a single page sized folio, but either way they get treated the same way + * by the rest of the code - the buffer memory spans a single contiguous memory + * region that we don't have to map and unmap to access the data directly. + * + * The third type of buffer is the multi-page buffer. These are always made * up of single pages so that they can be fed to vmap_ram() to return a * contiguous memory region we can access the data through, or mark it as * XBF_UNMAPPED and access the data directly through individual page_address() @@ -354,6 +360,7 @@ xfs_buf_alloc_backing_mem( { size_t size = BBTOB(bp->b_length); gfp_t gfp_mask = GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOWARN; + struct folio *folio; long filled = 0; if (xfs_buftarg_is_mem(bp->b_target)) @@ -375,7 +382,46 @@ xfs_buf_alloc_backing_mem( if (size < PAGE_SIZE && is_power_of_2(size)) return xfs_buf_alloc_kmem(bp, size, gfp_mask); - /* Make sure that we have a page list */ + /* Assure zeroed buffer for non-read cases. */ + if (!(flags & XBF_READ)) + gfp_mask |= __GFP_ZERO; + + /* + * Don't bother with the retry loop for single PAGE allocations, there + * is litte changes this can be better than the VM version. + */ + if (size <= PAGE_SIZE) + gfp_mask |= __GFP_NOFAIL; + + /* + * Optimistically attempt a single high order folio allocation for + * larger than PAGE_SIZE buffers. + * + * Allocating a high order folio makes the assumption that buffers are a + * power-of-2 size, matching the power-of-2 folios sizes available. + * + * The exception here are user xattr data buffers, which can be arbitrarily + * sized up to 64kB plus structure metadata, skip straight to the vmalloc + * path for them instead of wasting memory. + * here. + */ + if (size > PAGE_SIZE && !is_power_of_2(size)) + goto fallback; + folio = folio_alloc(gfp_mask, get_order(size)); + if (!folio) { + if (size <= PAGE_SIZE) + return -ENOMEM; + goto fallback; + } + bp->b_addr = folio_address(folio); + bp->b_page_array[0] = &folio->page; + bp->b_pages = bp->b_page_array; + bp->b_page_count = 1; + bp->b_flags |= _XBF_PAGES; + return 0; + +fallback: + /* Fall back to allocating an array of single page folios. 
*/ bp->b_page_count = DIV_ROUND_UP(size, PAGE_SIZE); if (bp->b_page_count <= XB_PAGES) { bp->b_pages = bp->b_page_array; @@ -1529,7 +1575,7 @@ xfs_buf_submit_bio( bio->bi_private = bp; bio->bi_end_io = xfs_buf_bio_end_io; - if (bp->b_flags & _XBF_KMEM) { + if (bp->b_page_count == 1) { __bio_add_page(bio, virt_to_page(bp->b_addr), size, offset_in_page(bp->b_addr)); } else {

From patchwork Wed Feb 26 15:51:36 2025
From: Christoph Hellwig To: Carlos Maiolino Cc: "Darrick J.
Wong" , Dave Chinner , linux-xfs@vger.kernel.org Subject: [PATCH 08/12] xfs: kill XBF_UNMAPPED Date: Wed, 26 Feb 2025 07:51:36 -0800 Message-ID: <20250226155245.513494-9-hch@lst.de> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20250226155245.513494-1-hch@lst.de> References: <20250226155245.513494-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Unmapped buffer access is a pain, so kill it. The switch to large folios means we rarely pay a vmap penalty for large buffers, so this functionality is largely unnecessary now. Signed-off-by: Christoph Hellwig Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong --- fs/xfs/libxfs/xfs_ialloc.c | 2 +- fs/xfs/libxfs/xfs_inode_buf.c | 2 +- fs/xfs/scrub/inode_repair.c | 3 +- fs/xfs/xfs_buf.c | 58 +++-------------------------------- fs/xfs/xfs_buf.h | 16 +++++++--- fs/xfs/xfs_buf_item.c | 2 +- fs/xfs/xfs_buf_item_recover.c | 8 +---- fs/xfs/xfs_inode.c | 3 +- 8 files changed, 21 insertions(+), 73 deletions(-) diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c index f3a840a425f5..24b133930368 100644 --- a/fs/xfs/libxfs/xfs_ialloc.c +++ b/fs/xfs/libxfs/xfs_ialloc.c @@ -364,7 +364,7 @@ xfs_ialloc_inode_init( (j * M_IGEO(mp)->blocks_per_cluster)); error = xfs_trans_get_buf(tp, mp->m_ddev_targp, d, mp->m_bsize * M_IGEO(mp)->blocks_per_cluster, - XBF_UNMAPPED, &fbuf); + 0, &fbuf); if (error) return error; diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c index f24fa628fecf..2f575b88cd7c 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.c +++ b/fs/xfs/libxfs/xfs_inode_buf.c @@ -137,7 +137,7 @@ xfs_imap_to_bp( int error; error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp, imap->im_blkno, - imap->im_len, XBF_UNMAPPED, bpp, &xfs_inode_buf_ops); + imap->im_len, 0, bpp, &xfs_inode_buf_ops); if (xfs_metadata_is_sick(error)) xfs_agno_mark_sick(mp, xfs_daddr_to_agno(mp, imap->im_blkno), XFS_SICK_AG_INODES); diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c index 13ff1c933cb8..2d2ff07e63e5 100644 --- a/fs/xfs/scrub/inode_repair.c +++ b/fs/xfs/scrub/inode_repair.c @@ -1558,8 +1558,7 @@ xrep_dinode_core( /* Read the inode cluster buffer. */ error = xfs_trans_read_buf(sc->mp, sc->tp, sc->mp->m_ddev_targp, - ri->imap.im_blkno, ri->imap.im_len, XBF_UNMAPPED, &bp, - NULL); + ri->imap.im_blkno, ri->imap.im_len, 0, &bp, NULL); if (error) return error; diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index 3c582eaa656d..15087f24372f 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -203,7 +203,7 @@ _xfs_buf_alloc( * We don't want certain flags to appear in b_flags unless they are * specifically set by later operations on the buffer. */ - flags &= ~(XBF_UNMAPPED | XBF_TRYLOCK | XBF_ASYNC | XBF_READ_AHEAD); + flags &= ~(XBF_TRYLOCK | XBF_ASYNC | XBF_READ_AHEAD); /* * A new buffer is held and locked by the owner. This ensures that the @@ -349,9 +349,7 @@ xfs_buf_alloc_kmem( * * The third type of buffer is the multi-page buffer. These are always made * up of single pages so that they can be fed to vmap_ram() to return a - * contiguous memory region we can access the data through, or mark it as - * XBF_UNMAPPED and access the data directly through individual page_address() - * calls. + * contiguous memory region we can access the data through. 
*/ static int xfs_buf_alloc_backing_mem( @@ -474,8 +472,6 @@ _xfs_buf_map_pages( if (bp->b_page_count == 1) { /* A single page buffer is always mappable */ bp->b_addr = page_address(bp->b_pages[0]); - } else if (flags & XBF_UNMAPPED) { - bp->b_addr = NULL; } else { int retried = 0; unsigned nofs_flag; @@ -1411,7 +1407,7 @@ xfs_buf_ioend( trace_xfs_buf_iodone(bp, _RET_IP_); if (bp->b_flags & XBF_READ) { - if (!bp->b_error && bp->b_addr && is_vmalloc_addr(bp->b_addr)) + if (!bp->b_error && is_vmalloc_addr(bp->b_addr)) invalidate_kernel_vmap_range(bp->b_addr, xfs_buf_vmap_len(bp)); if (!bp->b_error && bp->b_ops) @@ -1583,7 +1579,7 @@ xfs_buf_submit_bio( __bio_add_page(bio, bp->b_pages[p], PAGE_SIZE, 0); bio->bi_iter.bi_size = size; /* limit to the actual size used */ - if (bp->b_addr && is_vmalloc_addr(bp->b_addr)) + if (is_vmalloc_addr(bp->b_addr)) flush_kernel_vmap_range(bp->b_addr, xfs_buf_vmap_len(bp)); } @@ -1715,52 +1711,6 @@ xfs_buf_submit( xfs_buf_submit_bio(bp); } -void * -xfs_buf_offset( - struct xfs_buf *bp, - size_t offset) -{ - struct page *page; - - if (bp->b_addr) - return bp->b_addr + offset; - - page = bp->b_pages[offset >> PAGE_SHIFT]; - return page_address(page) + (offset & (PAGE_SIZE-1)); -} - -void -xfs_buf_zero( - struct xfs_buf *bp, - size_t boff, - size_t bsize) -{ - size_t bend; - - if (bp->b_addr) { - memset(bp->b_addr + boff, 0, bsize); - return; - } - - bend = boff + bsize; - while (boff < bend) { - struct page *page; - int page_index, page_offset, csize; - - page_index = boff >> PAGE_SHIFT; - page_offset = boff & ~PAGE_MASK; - page = bp->b_pages[page_index]; - csize = min_t(size_t, PAGE_SIZE - page_offset, - BBTOB(bp->b_length) - boff); - - ASSERT((csize + page_offset) <= PAGE_SIZE); - - memset(page_address(page) + page_offset, 0, csize); - - boff += csize; - } -} - /* * Log a message about and stale a buffer that a caller has decided is corrupt. * diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h index dc41b617b067..57faed82e93c 100644 --- a/fs/xfs/xfs_buf.h +++ b/fs/xfs/xfs_buf.h @@ -49,7 +49,6 @@ struct xfs_buf; #define XBF_LIVESCAN (1u << 28) #define XBF_INCORE (1u << 29)/* lookup only, return if found in cache */ #define XBF_TRYLOCK (1u << 30)/* lock requested, but do not wait */ -#define XBF_UNMAPPED (1u << 31)/* do not map the buffer */ typedef unsigned int xfs_buf_flags_t; @@ -70,8 +69,7 @@ typedef unsigned int xfs_buf_flags_t; /* The following interface flags should never be set */ \ { XBF_LIVESCAN, "LIVESCAN" }, \ { XBF_INCORE, "INCORE" }, \ - { XBF_TRYLOCK, "TRYLOCK" }, \ - { XBF_UNMAPPED, "UNMAPPED" } + { XBF_TRYLOCK, "TRYLOCK" } /* * Internal state flags. 
@@ -316,12 +314,20 @@ extern void __xfs_buf_ioerror(struct xfs_buf *bp, int error, #define xfs_buf_ioerror(bp, err) __xfs_buf_ioerror((bp), (err), __this_address) extern void xfs_buf_ioerror_alert(struct xfs_buf *bp, xfs_failaddr_t fa); void xfs_buf_ioend_fail(struct xfs_buf *); -void xfs_buf_zero(struct xfs_buf *bp, size_t boff, size_t bsize); void __xfs_buf_mark_corrupt(struct xfs_buf *bp, xfs_failaddr_t fa); #define xfs_buf_mark_corrupt(bp) __xfs_buf_mark_corrupt((bp), __this_address) /* Buffer Utility Routines */ -extern void *xfs_buf_offset(struct xfs_buf *, size_t); +static inline void *xfs_buf_offset(struct xfs_buf *bp, size_t offset) +{ + return bp->b_addr + offset; +} + +static inline void xfs_buf_zero(struct xfs_buf *bp, size_t boff, size_t bsize) +{ + memset(bp->b_addr + boff, 0, bsize); +} + extern void xfs_buf_stale(struct xfs_buf *bp); /* Delayed Write Buffer Routines */ diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c index 0ee6fa9efd18..41f0bc9aa5f4 100644 --- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -70,7 +70,7 @@ xfs_buf_item_straddle( { void *first, *last; - if (bp->b_page_count == 1 || !(bp->b_flags & XBF_UNMAPPED)) + if (bp->b_page_count == 1) return false; first = xfs_buf_offset(bp, offset + (first_bit << XFS_BLF_SHIFT)); diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c index 05a2f6927c12..d4c5cef5bc43 100644 --- a/fs/xfs/xfs_buf_item_recover.c +++ b/fs/xfs/xfs_buf_item_recover.c @@ -1006,7 +1006,6 @@ xlog_recover_buf_commit_pass2( struct xfs_mount *mp = log->l_mp; struct xfs_buf *bp; int error; - uint buf_flags; xfs_lsn_t lsn; /* @@ -1025,13 +1024,8 @@ xlog_recover_buf_commit_pass2( } trace_xfs_log_recover_buf_recover(log, buf_f); - - buf_flags = 0; - if (buf_f->blf_flags & XFS_BLF_INODE_BUF) - buf_flags |= XBF_UNMAPPED; - error = xfs_buf_read(mp->m_ddev_targp, buf_f->blf_blkno, buf_f->blf_len, - buf_flags, &bp, NULL); + 0, &bp, NULL); if (error) return error; diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index b1f9f156ec88..36cfd9c457ce 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1721,8 +1721,7 @@ xfs_ifree_cluster( * to mark all the active inodes on the buffer stale. 
*/ error = xfs_trans_get_buf(tp, mp->m_ddev_targp, blkno, - mp->m_bsize * igeo->blocks_per_cluster, - XBF_UNMAPPED, &bp); + mp->m_bsize * igeo->blocks_per_cluster, 0, &bp); if (error) return error;

From patchwork Wed Feb 26 15:51:37 2025
From: Christoph Hellwig
To: Carlos Maiolino
Cc: "Darrick J. Wong", Dave Chinner, linux-xfs@vger.kernel.org
Subject: [PATCH 09/12] xfs: buffer items don't straddle pages anymore
Date: Wed, 26 Feb 2025 07:51:37 -0800
Message-ID: <20250226155245.513494-10-hch@lst.de>
In-Reply-To: <20250226155245.513494-1-hch@lst.de>
References: <20250226155245.513494-1-hch@lst.de>
See http://www.infradead.org/rpr.html From: Dave Chinner Unmapped buffers don't exist anymore, so the page straddling detection and slow path code can go away now. Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong --- fs/xfs/xfs_buf_item.c | 124 ------------------------------------------ 1 file changed, 124 deletions(-) diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c index 41f0bc9aa5f4..19eb0b7a3e58 100644 --- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -57,31 +57,6 @@ xfs_buf_log_format_size( (blfp->blf_map_size * sizeof(blfp->blf_data_map[0])); } -/* - * We only have to worry about discontiguous buffer range straddling on unmapped - * buffers. Everything else will have a contiguous data region we can copy from. - */ -static inline bool -xfs_buf_item_straddle( - struct xfs_buf *bp, - uint offset, - int first_bit, - int nbits) -{ - void *first, *last; - - if (bp->b_page_count == 1) - return false; - - first = xfs_buf_offset(bp, offset + (first_bit << XFS_BLF_SHIFT)); - last = xfs_buf_offset(bp, - offset + ((first_bit + nbits) << XFS_BLF_SHIFT)); - - if (last - first != nbits * XFS_BLF_CHUNK) - return true; - return false; -} - /* * Return the number of log iovecs and space needed to log the given buf log * item segment. @@ -98,11 +73,8 @@ xfs_buf_item_size_segment( int *nvecs, int *nbytes) { - struct xfs_buf *bp = bip->bli_buf; int first_bit; int nbits; - int next_bit; - int last_bit; first_bit = xfs_next_bit(blfp->blf_data_map, blfp->blf_map_size, 0); if (first_bit == -1) @@ -115,15 +87,6 @@ xfs_buf_item_size_segment( nbits = xfs_contig_bits(blfp->blf_data_map, blfp->blf_map_size, first_bit); ASSERT(nbits > 0); - - /* - * Straddling a page is rare because we don't log contiguous - * chunks of unmapped buffers anywhere. - */ - if (nbits > 1 && - xfs_buf_item_straddle(bp, offset, first_bit, nbits)) - goto slow_scan; - (*nvecs)++; *nbytes += nbits * XFS_BLF_CHUNK; @@ -138,43 +101,6 @@ xfs_buf_item_size_segment( } while (first_bit != -1); return; - -slow_scan: - ASSERT(bp->b_addr == NULL); - last_bit = first_bit; - nbits = 1; - while (last_bit != -1) { - - *nbytes += XFS_BLF_CHUNK; - - /* - * This takes the bit number to start looking from and - * returns the next set bit from there. It returns -1 - * if there are no more bits set or the start bit is - * beyond the end of the bitmap. - */ - next_bit = xfs_next_bit(blfp->blf_data_map, blfp->blf_map_size, - last_bit + 1); - /* - * If we run out of bits, leave the loop, - * else if we find a new set of bits bump the number of vecs, - * else keep scanning the current set of bits. - */ - if (next_bit == -1) { - if (first_bit != last_bit) - (*nvecs)++; - break; - } else if (next_bit != last_bit + 1 || - xfs_buf_item_straddle(bp, offset, first_bit, nbits)) { - last_bit = next_bit; - first_bit = next_bit; - (*nvecs)++; - nbits = 1; - } else { - last_bit++; - nbits++; - } - } } /* @@ -287,8 +213,6 @@ xfs_buf_item_format_segment( struct xfs_buf *bp = bip->bli_buf; uint base_size; int first_bit; - int last_bit; - int next_bit; uint nbits; /* copy the flags across from the base format item */ @@ -333,15 +257,6 @@ xfs_buf_item_format_segment( nbits = xfs_contig_bits(blfp->blf_data_map, blfp->blf_map_size, first_bit); ASSERT(nbits > 0); - - /* - * Straddling a page is rare because we don't log contiguous - * chunks of unmapped buffers anywhere. 
- */ - if (nbits > 1 && - xfs_buf_item_straddle(bp, offset, first_bit, nbits)) - goto slow_scan; - xfs_buf_item_copy_iovec(lv, vecp, bp, offset, first_bit, nbits); blfp->blf_size++; @@ -357,45 +272,6 @@ xfs_buf_item_format_segment( } while (first_bit != -1); return; - -slow_scan: - ASSERT(bp->b_addr == NULL); - last_bit = first_bit; - nbits = 1; - for (;;) { - /* - * This takes the bit number to start looking from and - * returns the next set bit from there. It returns -1 - * if there are no more bits set or the start bit is - * beyond the end of the bitmap. - */ - next_bit = xfs_next_bit(blfp->blf_data_map, blfp->blf_map_size, - (uint)last_bit + 1); - /* - * If we run out of bits fill in the last iovec and get out of - * the loop. Else if we start a new set of bits then fill in - * the iovec for the series we were looking at and start - * counting the bits in the new one. Else we're still in the - * same set of bits so just keep counting and scanning. - */ - if (next_bit == -1) { - xfs_buf_item_copy_iovec(lv, vecp, bp, offset, - first_bit, nbits); - blfp->blf_size++; - break; - } else if (next_bit != last_bit + 1 || - xfs_buf_item_straddle(bp, offset, first_bit, nbits)) { - xfs_buf_item_copy_iovec(lv, vecp, bp, offset, - first_bit, nbits); - blfp->blf_size++; - first_bit = next_bit; - last_bit = next_bit; - nbits = 1; - } else { - last_bit++; - nbits++; - } - } } /*
From patchwork Wed Feb 26 15:51:38 2025 X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13992742 From: Christoph Hellwig To: Carlos Maiolino Cc: "Darrick J. Wong" , Dave Chinner , linux-xfs@vger.kernel.org Subject: [PATCH 10/12] xfs: use vmalloc instead of vm_map_area for buffer backing memory Date: Wed, 26 Feb 2025 07:51:38 -0800 Message-ID: <20250226155245.513494-11-hch@lst.de> In-Reply-To: <20250226155245.513494-1-hch@lst.de> References: <20250226155245.513494-1-hch@lst.de>
The fallback buffer allocation path currently open codes a suboptimal version of vmalloc to allocate pages that are then mapped into vmalloc space. Switch to using vmalloc instead, which uses all the optimizations in the common vmalloc code, and removes the need to track the backing pages in the xfs_buf structure. Signed-off-by: Christoph Hellwig --- fs/xfs/xfs_buf.c | 209 ++++++++++--------------------------------- fs/xfs/xfs_buf.h | 7 -- fs/xfs/xfs_buf_mem.c | 11 +-- 3 files changed, 49 insertions(+), 178 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index 15087f24372f..fb127589c6b4 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -60,13 +60,6 @@ static inline bool xfs_buf_is_uncached(struct xfs_buf *bp) return bp->b_rhash_key == XFS_BUF_DADDR_NULL; } -static inline int -xfs_buf_vmap_len( - struct xfs_buf *bp) -{ - return (bp->b_page_count * PAGE_SIZE); -} - /* * Bump the I/O in flight count on the buftarg if we haven't yet done so for * this buffer.
The count is incremented once per buffer (per hold cycle) @@ -248,30 +241,6 @@ _xfs_buf_alloc( return 0; } -static void -xfs_buf_free_pages( - struct xfs_buf *bp) -{ - uint i; - - ASSERT(bp->b_flags & _XBF_PAGES); - - if (is_vmalloc_addr(bp->b_addr)) - vm_unmap_ram(bp->b_addr, bp->b_page_count); - - for (i = 0; i < bp->b_page_count; i++) { - if (bp->b_pages[i]) - folio_put(page_folio(bp->b_pages[i])); - } - mm_account_reclaimed_pages( - DIV_ROUND_UP(BBTOB(bp->b_length), PAGE_SIZE)); - - if (bp->b_pages != bp->b_page_array) - kfree(bp->b_pages); - bp->b_pages = NULL; - bp->b_flags &= ~_XBF_PAGES; -} - static void xfs_buf_free_callback( struct callback_head *cb) @@ -286,16 +255,23 @@ static void xfs_buf_free( struct xfs_buf *bp) { + unsigned int size = BBTOB(bp->b_length); + trace_xfs_buf_free(bp, _RET_IP_); ASSERT(list_empty(&bp->b_lru)); + if (!xfs_buftarg_is_mem(bp->b_target) && size >= PAGE_SIZE) + mm_account_reclaimed_pages(DIV_ROUND_UP(size, PAGE_SHIFT)); + if (xfs_buftarg_is_mem(bp->b_target)) xmbuf_unmap_page(bp); - else if (bp->b_flags & _XBF_PAGES) - xfs_buf_free_pages(bp); + else if (is_vmalloc_addr(bp->b_addr)) + vfree(bp->b_addr); else if (bp->b_flags & _XBF_KMEM) kfree(bp->b_addr); + else + folio_put(virt_to_folio(bp->b_addr)); call_rcu(&bp->b_rcu, xfs_buf_free_callback); } @@ -324,9 +300,6 @@ xfs_buf_alloc_kmem( bp->b_addr = NULL; return -ENOMEM; } - bp->b_pages = bp->b_page_array; - bp->b_pages[0] = kmem_to_page(bp->b_addr); - bp->b_page_count = 1; bp->b_flags |= _XBF_KMEM; return 0; } @@ -347,9 +320,9 @@ xfs_buf_alloc_kmem( * by the rest of the code - the buffer memory spans a single contiguous memory * region that we don't have to map and unmap to access the data directly. * - * The third type of buffer is the multi-page buffer. These are always made - * up of single pages so that they can be fed to vmap_ram() to return a - * contiguous memory region we can access the data through. + * The third type of buffer is the vmalloc()d buffer. This provides the buffer + * with the required contiguous memory region but backed by discontiguous + * physical pages. */ static int xfs_buf_alloc_backing_mem( @@ -359,7 +332,6 @@ xfs_buf_alloc_backing_mem( size_t size = BBTOB(bp->b_length); gfp_t gfp_mask = GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOWARN; struct folio *folio; - long filled = 0; if (xfs_buftarg_is_mem(bp->b_target)) return xmbuf_map_page(bp); @@ -412,98 +384,18 @@ xfs_buf_alloc_backing_mem( goto fallback; } bp->b_addr = folio_address(folio); - bp->b_page_array[0] = &folio->page; - bp->b_pages = bp->b_page_array; - bp->b_page_count = 1; - bp->b_flags |= _XBF_PAGES; return 0; fallback: - /* Fall back to allocating an array of single page folios. */ - bp->b_page_count = DIV_ROUND_UP(size, PAGE_SIZE); - if (bp->b_page_count <= XB_PAGES) { - bp->b_pages = bp->b_page_array; - } else { - bp->b_pages = kzalloc(sizeof(struct page *) * bp->b_page_count, - gfp_mask); - if (!bp->b_pages) - return -ENOMEM; - } - bp->b_flags |= _XBF_PAGES; - - /* - * Bulk filling of pages can take multiple calls. Not filling the entire - * array is not an allocation failure, so don't back off if we get at - * least one extra page. 
- */ for (;;) { - long last = filled; - - filled = alloc_pages_bulk(gfp_mask, bp->b_page_count, - bp->b_pages); - if (filled == bp->b_page_count) { - XFS_STATS_INC(bp->b_mount, xb_page_found); + bp->b_addr = __vmalloc(size, gfp_mask); + if (bp->b_addr) break; - } - - if (filled != last) - continue; - - if (flags & XBF_READ_AHEAD) { - xfs_buf_free_pages(bp); + if (flags & XBF_READ_AHEAD) return -ENOMEM; - } - XFS_STATS_INC(bp->b_mount, xb_page_retries); memalloc_retry_wait(gfp_mask); } - return 0; -} - -/* - * Map buffer into kernel address-space if necessary. - */ -STATIC int -_xfs_buf_map_pages( - struct xfs_buf *bp, - xfs_buf_flags_t flags) -{ - ASSERT(bp->b_flags & _XBF_PAGES); - if (bp->b_page_count == 1) { - /* A single page buffer is always mappable */ - bp->b_addr = page_address(bp->b_pages[0]); - } else { - int retried = 0; - unsigned nofs_flag; - - /* - * vm_map_ram() will allocate auxiliary structures (e.g. - * pagetables) with GFP_KERNEL, yet we often under a scoped nofs - * context here. Mixing GFP_KERNEL with GFP_NOFS allocations - * from the same call site that can be run from both above and - * below memory reclaim causes lockdep false positives. Hence we - * always need to force this allocation to nofs context because - * we can't pass __GFP_NOLOCKDEP down to auxillary structures to - * prevent false positive lockdep reports. - * - * XXX(dgc): I think dquot reclaim is the only place we can get - * to this function from memory reclaim context now. If we fix - * that like we've fixed inode reclaim to avoid writeback from - * reclaim, this nofs wrapping can go away. - */ - nofs_flag = memalloc_nofs_save(); - do { - bp->b_addr = vm_map_ram(bp->b_pages, bp->b_page_count, - -1); - if (bp->b_addr) - break; - vm_unmap_aliases(); - } while (retried++ <= 1); - memalloc_nofs_restore(nofs_flag); - - if (!bp->b_addr) - return -ENOMEM; - } return 0; } @@ -623,7 +515,7 @@ xfs_buf_find_lock( return -ENOENT; } ASSERT((bp->b_flags & _XBF_DELWRI_Q) == 0); - bp->b_flags &= _XBF_KMEM | _XBF_PAGES; + bp->b_flags &= _XBF_KMEM; bp->b_ops = NULL; } return 0; @@ -809,18 +701,6 @@ xfs_buf_get_map( xfs_perag_put(pag); } - /* We do not hold a perag reference anymore. */ - if (!bp->b_addr) { - error = _xfs_buf_map_pages(bp, flags); - if (unlikely(error)) { - xfs_warn_ratelimited(btp->bt_mount, - "%s: failed to map %u pages", __func__, - bp->b_page_count); - xfs_buf_relse(bp); - return error; - } - } - /* * Clear b_error if this is a lookup from a caller that doesn't expect * valid data to be found in the buffer. 
@@ -1061,14 +941,6 @@ xfs_buf_get_uncached( if (error) goto fail_free_buf; - if (!bp->b_addr) - error = _xfs_buf_map_pages(bp, 0); - if (unlikely(error)) { - xfs_warn(target->bt_mount, - "%s: failed to map pages", __func__); - goto fail_free_buf; - } - trace_xfs_buf_get_uncached(bp, _RET_IP_); *bpp = bp; return 0; @@ -1409,7 +1281,8 @@ xfs_buf_ioend( if (bp->b_flags & XBF_READ) { if (!bp->b_error && is_vmalloc_addr(bp->b_addr)) invalidate_kernel_vmap_range(bp->b_addr, - xfs_buf_vmap_len(bp)); + DIV_ROUND_UP(BBTOB(bp->b_length), + PAGE_SIZE)); if (!bp->b_error && bp->b_ops) bp->b_ops->verify_read(bp); if (!bp->b_error) @@ -1561,29 +1434,43 @@ static void xfs_buf_submit_bio( struct xfs_buf *bp) { - unsigned int size = BBTOB(bp->b_length); - unsigned int map = 0, p; + unsigned int map = 0; struct blk_plug plug; struct bio *bio; - bio = bio_alloc(bp->b_target->bt_bdev, bp->b_page_count, - xfs_buf_bio_op(bp), GFP_NOIO); - bio->bi_private = bp; - bio->bi_end_io = xfs_buf_bio_end_io; + if (is_vmalloc_addr(bp->b_addr)) { + unsigned int size = BBTOB(bp->b_length); + unsigned int alloc_size = DIV_ROUND_UP(size, PAGE_SIZE); + void *data = bp->b_addr; - if (bp->b_page_count == 1) { - __bio_add_page(bio, virt_to_page(bp->b_addr), size, - offset_in_page(bp->b_addr)); - } else { - for (p = 0; p < bp->b_page_count; p++) - __bio_add_page(bio, bp->b_pages[p], PAGE_SIZE, 0); - bio->bi_iter.bi_size = size; /* limit to the actual size used */ + bio = bio_alloc(bp->b_target->bt_bdev, size >> PAGE_SHIFT, + xfs_buf_bio_op(bp), GFP_NOIO); + + do { + unsigned int len = min(size, PAGE_SIZE); + + ASSERT(offset_in_page(data) == 0); + __bio_add_page(bio, vmalloc_to_page(data), len, 0); + data += len; + size -= len; + } while (size); - if (is_vmalloc_addr(bp->b_addr)) - flush_kernel_vmap_range(bp->b_addr, - xfs_buf_vmap_len(bp)); + flush_kernel_vmap_range(bp->b_addr, alloc_size); + } else { + /* + * Single folio or slab allocation. Must be contigous and thus + * only a single bvec is needed. + */ + bio = bio_alloc(bp->b_target->bt_bdev, 1, xfs_buf_bio_op(bp), + GFP_NOIO); + __bio_add_page(bio, virt_to_page(bp->b_addr), + BBTOB(bp->b_length), + offset_in_page(bp->b_addr)); } + bio->bi_private = bp; + bio->bi_end_io = xfs_buf_bio_end_io; + /* * If there is more than one map segment, split out a new bio for each * map except of the last one. 
The last map is handled by the diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h index 57faed82e93c..3089e5d5f042 100644 --- a/fs/xfs/xfs_buf.h +++ b/fs/xfs/xfs_buf.h @@ -37,7 +37,6 @@ struct xfs_buf; #define _XBF_LOGRECOVERY (1u << 18)/* log recovery buffer */ /* flags used only internally */ -#define _XBF_PAGES (1u << 20)/* backed by refcounted pages */ #define _XBF_KMEM (1u << 21)/* backed by heap memory */ #define _XBF_DELWRI_Q (1u << 22)/* buffer on a delwri queue */ @@ -63,7 +62,6 @@ typedef unsigned int xfs_buf_flags_t; { XBF_STALE, "STALE" }, \ { XBF_WRITE_FAIL, "WRITE_FAIL" }, \ { _XBF_LOGRECOVERY, "LOG_RECOVERY" }, \ - { _XBF_PAGES, "PAGES" }, \ { _XBF_KMEM, "KMEM" }, \ { _XBF_DELWRI_Q, "DELWRI_Q" }, \ /* The following interface flags should never be set */ \ @@ -125,8 +123,6 @@ struct xfs_buftarg { struct xfs_buf_cache bt_cache[]; }; -#define XB_PAGES 2 - struct xfs_buf_map { xfs_daddr_t bm_bn; /* block number for I/O */ int bm_len; /* size of I/O */ @@ -188,13 +184,10 @@ struct xfs_buf { struct xfs_buf_log_item *b_log_item; struct list_head b_li_list; /* Log items list head */ struct xfs_trans *b_transp; - struct page **b_pages; /* array of page pointers */ - struct page *b_page_array[XB_PAGES]; /* inline pages */ struct xfs_buf_map *b_maps; /* compound buffer map */ struct xfs_buf_map __b_map; /* inline compound buffer map */ int b_map_count; atomic_t b_pin_count; /* pin count */ - unsigned int b_page_count; /* size of page array */ int b_error; /* error code on I/O */ void (*b_iodone)(struct xfs_buf *bp); diff --git a/fs/xfs/xfs_buf_mem.c b/fs/xfs/xfs_buf_mem.c index 07bebbfb16ee..e2f6c5524771 100644 --- a/fs/xfs/xfs_buf_mem.c +++ b/fs/xfs/xfs_buf_mem.c @@ -169,9 +169,6 @@ xmbuf_map_page( unlock_page(page); bp->b_addr = page_address(page); - bp->b_pages = bp->b_page_array; - bp->b_pages[0] = page; - bp->b_page_count = 1; return 0; } @@ -180,16 +177,10 @@ void xmbuf_unmap_page( struct xfs_buf *bp) { - struct page *page = bp->b_pages[0]; - ASSERT(xfs_buftarg_is_mem(bp->b_target)); - put_page(page); - + put_page(virt_to_page(bp->b_addr)); bp->b_addr = NULL; - bp->b_pages[0] = NULL; - bp->b_pages = NULL; - bp->b_page_count = 0; } /* Is this a valid daddr within the buftarg? 
*/ From patchwork Wed Feb 26 15:51:39 2025 X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13992743 From: Christoph Hellwig To: Carlos Maiolino Cc: "Darrick J. Wong" , Dave Chinner , linux-xfs@vger.kernel.org Subject: [PATCH 11/12] xfs: cleanup mapping tmpfs folios into the buffer cache Date: Wed, 26 Feb 2025 07:51:39 -0800 Message-ID: <20250226155245.513494-12-hch@lst.de> In-Reply-To: <20250226155245.513494-1-hch@lst.de> References: <20250226155245.513494-1-hch@lst.de> X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org.
See http://www.infradead.org/rpr.html Directly assign b_addr based on the tmpfs folios without a detour through pages, reuse the folio_put path used for non-tmpfs buffers and replace all references to pages in comments with folios. Partially based on a patch from Dave Chinner . Signed-off-by: Christoph Hellwig --- fs/xfs/xfs_buf.c | 6 ++---- fs/xfs/xfs_buf_mem.c | 34 ++++++++++------------------------ fs/xfs/xfs_buf_mem.h | 6 ++---- 3 files changed, 14 insertions(+), 32 deletions(-) diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index fb127589c6b4..0393dd302cf6 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -264,9 +264,7 @@ xfs_buf_free( if (!xfs_buftarg_is_mem(bp->b_target) && size >= PAGE_SIZE) mm_account_reclaimed_pages(DIV_ROUND_UP(size, PAGE_SHIFT)); - if (xfs_buftarg_is_mem(bp->b_target)) - xmbuf_unmap_page(bp); - else if (is_vmalloc_addr(bp->b_addr)) + if (is_vmalloc_addr(bp->b_addr)) vfree(bp->b_addr); else if (bp->b_flags & _XBF_KMEM) kfree(bp->b_addr); @@ -334,7 +332,7 @@ xfs_buf_alloc_backing_mem( struct folio *folio; if (xfs_buftarg_is_mem(bp->b_target)) - return xmbuf_map_page(bp); + return xmbuf_map_backing_mem(bp); /* Assure zeroed buffer for non-read cases. */ if (!(flags & XBF_READ)) diff --git a/fs/xfs/xfs_buf_mem.c b/fs/xfs/xfs_buf_mem.c index e2f6c5524771..c4872a4d7a71 100644 --- a/fs/xfs/xfs_buf_mem.c +++ b/fs/xfs/xfs_buf_mem.c @@ -74,7 +74,7 @@ xmbuf_alloc( /* * We don't want to bother with kmapping data during repair, so don't - * allow highmem pages to back this mapping. + * allow highmem folios to back this mapping. */ mapping_set_gfp_mask(inode->i_mapping, GFP_KERNEL); @@ -127,14 +127,13 @@ xmbuf_free( kfree(btp); } -/* Directly map a shmem page into the buffer cache. */ +/* Directly map a shmem folio into the buffer cache. */ int -xmbuf_map_page( +xmbuf_map_backing_mem( struct xfs_buf *bp) { struct inode *inode = file_inode(bp->b_target->bt_file); struct folio *folio = NULL; - struct page *page; loff_t pos = BBTOB(xfs_buf_daddr(bp)); int error; @@ -159,30 +158,17 @@ xmbuf_map_page( return -EIO; } - page = folio_file_page(folio, pos >> PAGE_SHIFT); - /* - * Mark the page dirty so that it won't be reclaimed once we drop the - * (potentially last) reference in xmbuf_unmap_page. + * Mark the folio dirty so that it won't be reclaimed once we drop the + * (potentially last) reference in xfs_buf_free. */ - set_page_dirty(page); - unlock_page(page); + folio_set_dirty(folio); + folio_unlock(folio); - bp->b_addr = page_address(page); + bp->b_addr = folio_address(folio); return 0; } -/* Unmap a shmem page that was mapped into the buffer cache. */ -void -xmbuf_unmap_page( - struct xfs_buf *bp) -{ - ASSERT(xfs_buftarg_is_mem(bp->b_target)); - - put_page(virt_to_page(bp->b_addr)); - bp->b_addr = NULL; -} - /* Is this a valid daddr within the buftarg? */ bool xmbuf_verify_daddr( @@ -196,7 +182,7 @@ xmbuf_verify_daddr( return daddr < (inode->i_sb->s_maxbytes >> BBSHIFT); } -/* Discard the page backing this buffer. */ +/* Discard the folio backing this buffer. */ static void xmbuf_stale( struct xfs_buf *bp) @@ -211,7 +197,7 @@ xmbuf_stale( } /* - * Finalize a buffer -- discard the backing page if it's stale, or run the + * Finalize a buffer -- discard the backing folio if it's stale, or run the * write verifier to detect problems. 
*/ int diff --git a/fs/xfs/xfs_buf_mem.h b/fs/xfs/xfs_buf_mem.h index eed4a7b63232..67d525cc1513 100644 --- a/fs/xfs/xfs_buf_mem.h +++ b/fs/xfs/xfs_buf_mem.h @@ -19,16 +19,14 @@ int xmbuf_alloc(struct xfs_mount *mp, const char *descr, struct xfs_buftarg **btpp); void xmbuf_free(struct xfs_buftarg *btp); -int xmbuf_map_page(struct xfs_buf *bp); -void xmbuf_unmap_page(struct xfs_buf *bp); bool xmbuf_verify_daddr(struct xfs_buftarg *btp, xfs_daddr_t daddr); void xmbuf_trans_bdetach(struct xfs_trans *tp, struct xfs_buf *bp); int xmbuf_finalize(struct xfs_buf *bp); #else # define xfs_buftarg_is_mem(...) (false) -# define xmbuf_map_page(...) (-ENOMEM) -# define xmbuf_unmap_page(...) ((void)0) # define xmbuf_verify_daddr(...) (false) #endif /* CONFIG_XFS_MEMORY_BUFS */ +int xmbuf_map_backing_mem(struct xfs_buf *bp); + #endif /* __XFS_BUF_MEM_H__ */
From patchwork Wed Feb 26 15:51:40 2025 X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 13992741 Received: from [4.28.11.157] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.98 #2 (Red Hat Linux)) id
1tnJiJ-00000004PMm-2UqA; Wed, 26 Feb 2025 15:52:47 +0000 From: Christoph Hellwig To: Carlos Maiolino Cc: "Darrick J. Wong" , Dave Chinner , linux-xfs@vger.kernel.org Subject: [PATCH 12/12] xfs: trace what memory backs a buffer Date: Wed, 26 Feb 2025 07:51:40 -0800 Message-ID: <20250226155245.513494-13-hch@lst.de> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20250226155245.513494-1-hch@lst.de> References: <20250226155245.513494-1-hch@lst.de> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Add three trace points for the different backing memory allocators for buffers. Signed-off-by: Christoph Hellwig Reviewed-by: "Darrick J. Wong" --- fs/xfs/xfs_buf.c | 4 ++++ fs/xfs/xfs_trace.h | 4 ++++ 2 files changed, 8 insertions(+) diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index 0393dd302cf6..a396b628e9b0 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -299,6 +299,7 @@ xfs_buf_alloc_kmem( return -ENOMEM; } bp->b_flags |= _XBF_KMEM; + trace_xfs_buf_backing_kmem(bp, _RET_IP_); return 0; } @@ -379,9 +380,11 @@ xfs_buf_alloc_backing_mem( if (!folio) { if (size <= PAGE_SIZE) return -ENOMEM; + trace_xfs_buf_backing_fallback(bp, _RET_IP_); goto fallback; } bp->b_addr = folio_address(folio); + trace_xfs_buf_backing_folio(bp, _RET_IP_); return 0; fallback: @@ -395,6 +398,7 @@ xfs_buf_alloc_backing_mem( memalloc_retry_wait(gfp_mask); } + trace_xfs_buf_backing_vmalloc(bp, _RET_IP_); return 0; } diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index b29462363b81..c8f64daf6d75 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -545,6 +545,10 @@ DEFINE_BUF_EVENT(xfs_buf_iodone_async); DEFINE_BUF_EVENT(xfs_buf_error_relse); DEFINE_BUF_EVENT(xfs_buf_drain_buftarg); DEFINE_BUF_EVENT(xfs_trans_read_buf_shut); +DEFINE_BUF_EVENT(xfs_buf_backing_folio); +DEFINE_BUF_EVENT(xfs_buf_backing_kmem); +DEFINE_BUF_EVENT(xfs_buf_backing_vmalloc); +DEFINE_BUF_EVENT(xfs_buf_backing_fallback); /* not really buffer traces, but the buf provides useful information */ DEFINE_BUF_EVENT(xfs_btree_corrupt);
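
[Editor's note] Taken together, patches 10 and 11 leave an XFS buffer with exactly three possible kinds of backing memory: heap (kmalloc) memory, a single folio, or a vmalloc region, and the free path can pick the right release primitive from the address alone. The sketch below restates that dispatch as it appears in the xfs_buf_free() hunks above; it is illustrative only, not the literal upstream code. It assumes the struct xfs_buf and _XBF_KMEM definitions from fs/xfs/xfs_buf.h, uses a made-up function name, and omits the xmbuf (tmpfs) case and the reclaimed-page accounting.

    #include <linux/mm.h>		/* is_vmalloc_addr(), virt_to_folio(), folio_put() */
    #include <linux/vmalloc.h>	/* vfree() */
    #include <linux/slab.h>		/* kfree() */
    #include "xfs_buf.h"		/* struct xfs_buf, _XBF_KMEM (assumed) */

    /* Illustrative sketch of the end-of-series free-path dispatch. */
    static void xfs_buf_free_backing_mem_sketch(struct xfs_buf *bp)
    {
    	if (is_vmalloc_addr(bp->b_addr))
    		vfree(bp->b_addr);			/* multi-page vmalloc fallback */
    	else if (bp->b_flags & _XBF_KMEM)
    		kfree(bp->b_addr);			/* small heap-backed buffer */
    	else
    		folio_put(virt_to_folio(bp->b_addr));	/* single folio allocation */
    }

Because the backing type is recoverable from b_addr and one flag, the b_pages array, b_page_array and b_page_count fields can be dropped from struct xfs_buf, which is where most of the diffstat reduction in patch 10 comes from. The three cases also line up with the xfs_buf_backing_kmem, xfs_buf_backing_folio and xfs_buf_backing_vmalloc trace points added in the final patch.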