diff mbox series

[RFC] mkfs.xfs: do not set log stripe unit for probed sw <= 2

Message ID 08ddc67b-392f-efe0-ffd2-a7295a42bac6@redhat.com (mailing list archive)
State Deferred, archived
Headers show
Series [RFC] mkfs.xfs: do not set log stripe unit for probed sw <= 2 | expand

Commit Message

Eric Sandeen May 27, 2020, 5:26 p.m. UTC
If the stripe width of a device is only 2x or 1x the stripe unit, there
is no parity disk on this device, and setting a larger log stripe unit
will not avoid any RMW cycles.  However, a large log stripe unit does
have significant penalties for IO amplification because every log write
will be rounded up to the log stripe unit.

This was recently highlighted by a user running bonnie++ in sync mode,
where the default RAID10 geometry of 256k/512k yielded results which
were 4x slower than a smaller log stripe unit. While bonnie++ may not
be the benchmark of choice, it does highlight this issue.

Because a larger log stripe unit yields no RMW benefit on a device with
no parity disks, avoid setting in these cases.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

... thoughts?

Am I missing a reason why we /would/ still want lsunit in this case?

Comments

Eric Sandeen May 27, 2020, 9:20 p.m. UTC | #1
On 5/27/20 12:26 PM, Eric Sandeen wrote:
> If the stripe width of a device is only 2x or 1x the stripe unit, there
> is no parity disk on this device, and setting a larger log stripe unit
> will not avoid any RMW cycles.  However, a large log stripe unit does
> have significant penalties for IO amplification because every log write
> will be rounded up to the log stripe unit.
> 
> This was recently highlighted by a user running bonnie++ in sync mode,
> where the default RAID10 geometry of 256k/512k yielded results which
> were 4x slower than a smaller log stripe unit. While bonnie++ may not
> be the benchmark of choice, it does highlight this issue.
> 
> Because a larger log stripe unit yields no RMW benefit on a device with
> no parity disks, avoid setting in these cases.

NAK

I wasn't thinking about stacked parity raids like RAID50 (thanks Dave).

So we cannot infer anything about parity disks from the top-level geometry,
and so the proposed optimization is not valid.

-Eric
diff mbox series

Patch

diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 2e6cd280..4da69b29 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -2407,13 +2407,15 @@  _("log stripe unit (%d) must be a multiple of the block size (%d)\n"),
 	}
 
 	/*
-	 * check that log sunit is modulo fsblksize or default it to dsunit.
+	 * check that log sunit is modulo fsblksize or default it to dsunit
+	 * if this looks like a parity device (swidth > 2x sunit).
 	 */
 	if (lsunit) {
 		/* convert from 512 byte blocks to fs blocks */
 		cfg->lsunit = DTOBT(lsunit, cfg->blocklog);
 	} else if (cfg->sb_feat.log_version == 2 &&
-		   cfg->loginternal && cfg->dsunit) {
+		   cfg->loginternal && cfg->dsunit &&
+		   (cfg->dswidth / cfg->dsunit > 2)) {
 		/* lsunit and dsunit now in fs blocks */
 		cfg->lsunit = cfg->dsunit;
 	}