Message ID | 20180621025520.9115-1-jeffm@suse.com (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
On Wed, Jun 20, 2018 at 10:55:20PM -0400, jeffm@suse.com wrote: > From: Jeff Mahoney <jeffm@suse.com> > > Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the > AG alignment code into a separate function. It got rid of > redundant checks for dswidth != 0 but did too good a job since now > it doesn't check at all. Of course they got removed - we've already validated the CLI input and guaranteed that cfg->dswidth can only be zero iff cfg->dsunit is zero in calc_stripe_factors(). i.e. We do input validation of CLI paramters before anything else so that later users (like align_ag_geometry()) can assume the parameters they are using are valid. In this case, the assumption is that either both dsunit and dswidth are zero or that both are non-zero and dswidth an integer multple of dsunit. > When we hit the check to see if agsize > is a multiple of stripe width: (cfg->agsize % cfg->dswidth), we crash > on a divide by zero. What CLI config did you use to hit this? I'd like to reproduce it so I can see where calc_stripe_factors() is going wrong.... Cheers, Dave.
On 6/20/18 11:57 PM, Dave Chinner wrote: > On Wed, Jun 20, 2018 at 10:55:20PM -0400, jeffm@suse.com wrote: >> From: Jeff Mahoney <jeffm@suse.com> >> >> Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the >> AG alignment code into a separate function. It got rid of >> redundant checks for dswidth != 0 but did too good a job since now >> it doesn't check at all. > > Of course they got removed - we've already validated the CLI input > and guaranteed that cfg->dswidth can only be zero iff cfg->dsunit is > zero in calc_stripe_factors(). > > i.e. We do input validation of CLI paramters before anything else so > that later users (like align_ag_geometry()) can assume the > parameters they are using are valid. In this case, the assumption is > that either both dsunit and dswidth are zero or that both are > non-zero and dswidth an integer multple of dsunit. It's not coming from the CLI parameters. It's coming from the topology. The blkid topology stuff is returning 8k for minimal i/o and 0 for optimal. Without a CLI config, we have dunit=0 in calc_stripe_factors, which takes it from the device. We set cfg->dsunit=16 and cfg->dswidth=0, and then head down to align_ag_geometry. The topology on this system looks like: ft = {dsunit = 16, dswidth = 0, rtswidth = 0, lsectorsize = 512, psectorsize = 512} That matches with a few of the dm targets I see reported on this system. Since minimal io size isn't sector size, we don't clear it. Talking with Eric, we should probably just extent that check in blkid_get_topology to cover that case since it's not like we can just reject what blkid gives us. >> When we hit the check to see if agsize >> is a multiple of stripe width: (cfg->agsize % cfg->dswidth), we crash >> on a divide by zero. > > What CLI config did you use to hit this? I'd like to reproduce it so > I can see where calc_stripe_factors() is going wrong.... It was just "mkfs.xfs <path-to-mpath-device>" -Jeff
On 6/21/18 2:15 PM, Jeff Mahoney wrote: > On 6/20/18 11:57 PM, Dave Chinner wrote: >> On Wed, Jun 20, 2018 at 10:55:20PM -0400, jeffm@suse.com wrote: >>> From: Jeff Mahoney <jeffm@suse.com> >>> >>> Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the >>> AG alignment code into a separate function. It got rid of >>> redundant checks for dswidth != 0 but did too good a job since now >>> it doesn't check at all. >> >> Of course they got removed - we've already validated the CLI input >> and guaranteed that cfg->dswidth can only be zero iff cfg->dsunit is >> zero in calc_stripe_factors(). >> >> i.e. We do input validation of CLI paramters before anything else so >> that later users (like align_ag_geometry()) can assume the >> parameters they are using are valid. In this case, the assumption is >> that either both dsunit and dswidth are zero or that both are >> non-zero and dswidth an integer multple of dsunit. > > It's not coming from the CLI parameters. It's coming from the topology. > The blkid topology stuff is returning 8k for minimal i/o and 0 for > optimal. Without a CLI config, we have dunit=0 in calc_stripe_factors, > which takes it from the device. We set cfg->dsunit=16 and > cfg->dswidth=0, and then head down to align_ag_geometry. > > The topology on this system looks like: > > ft = {dsunit = 16, dswidth = 0, rtswidth = 0, lsectorsize = 512, > psectorsize = 512} > > That matches with a few of the dm targets I see reported on this system. > > Since minimal io size isn't sector size, we don't clear it. Talking > with Eric, we should probably just extent that check in > blkid_get_topology to cover that case since it's not like we can just > reject what blkid gives us. > >>> When we hit the check to see if agsize >>> is a multiple of stripe width: (cfg->agsize % cfg->dswidth), we crash >>> on a divide by zero. >> >> What CLI config did you use to hit this? I'd like to reproduce it so >> I can see where calc_stripe_factors() is going wrong.... > > It was just "mkfs.xfs <path-to-mpath-device>" yeah, so in blkid_get_topology we have: /* * If the reported values are the same as the physical sector size * do not bother to report anything. It will only cause warnings * if people specify larger stripe units or widths manually. */ if (*sunit == *psectorsize || *swidth == *psectorsize) { *sunit = 0; *swidth = 0; } as a sanity check. We should probably extend that (or add a similar test) which sets both to zero if either is zero, for the same reason as the CLI validation does it. Comments should explain why.... /* * Ensure that if either sunit or stripe width is zero, the other * is as well. Having only one set is not valid stripe geometry. */ if (*sunit == 0 || *swidth == 0) { *sunit = 0; *swidth = 0; } FWIW if I set *sunit = 8192; *swidth = 0; manually in blkid_get_topology I do get the same floating point exception; if that's what we get from libblkid for some weird device, we'd go boom. -Eric
On Thu, Jun 21, 2018 at 02:26:45PM -0500, Eric Sandeen wrote: > On 6/21/18 2:15 PM, Jeff Mahoney wrote: > > On 6/20/18 11:57 PM, Dave Chinner wrote: > >> On Wed, Jun 20, 2018 at 10:55:20PM -0400, jeffm@suse.com wrote: > >>> From: Jeff Mahoney <jeffm@suse.com> > >>> > >>> Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the > >>> AG alignment code into a separate function. It got rid of > >>> redundant checks for dswidth != 0 but did too good a job since now > >>> it doesn't check at all. > >> > >> Of course they got removed - we've already validated the CLI input > >> and guaranteed that cfg->dswidth can only be zero iff cfg->dsunit is > >> zero in calc_stripe_factors(). > >> > >> i.e. We do input validation of CLI paramters before anything else so > >> that later users (like align_ag_geometry()) can assume the > >> parameters they are using are valid. In this case, the assumption is > >> that either both dsunit and dswidth are zero or that both are > >> non-zero and dswidth an integer multple of dsunit. > > > > It's not coming from the CLI parameters. It's coming from the topology. > > The blkid topology stuff is returning 8k for minimal i/o and 0 for > > optimal. Without a CLI config, we have dunit=0 in calc_stripe_factors, > > which takes it from the device. We set cfg->dsunit=16 and > > cfg->dswidth=0, and then head down to align_ag_geometry. > > > > The topology on this system looks like: > > > > ft = {dsunit = 16, dswidth = 0, rtswidth = 0, lsectorsize = 512, > > > > psectorsize = 512} > > That matches with a few of the dm targets I see reported on this system. Exactly what kind of dm device does that - we've never had anyone report this before? Also, if it's a dm device, shouldn't it also be fixed to output a sane set of IO characteristics in /sys? > > Since minimal io size isn't sector size, we don't clear it. Talking > > with Eric, we should probably just extent that check in > > blkid_get_topology to cover that case since it's not like we can just > > reject what blkid gives us. > > > >>> When we hit the check to see if agsize > >>> is a multiple of stripe width: (cfg->agsize % cfg->dswidth), we crash > >>> on a divide by zero. > >> > >> What CLI config did you use to hit this? I'd like to reproduce it so > >> I can see where calc_stripe_factors() is going wrong.... > > > > It was just "mkfs.xfs <path-to-mpath-device>" > > yeah, so in blkid_get_topology we have: > > /* > * If the reported values are the same as the physical sector size > * do not bother to report anything. It will only cause warnings > * if people specify larger stripe units or widths manually. > */ > if (*sunit == *psectorsize || *swidth == *psectorsize) { > *sunit = 0; > *swidth = 0; > } > > as a sanity check. We should probably extend that (or add a similar test) > which sets both to zero if either is zero, for the same reason as the CLI > validation does it. Comments should explain why.... > > /* > * Ensure that if either sunit or stripe width is zero, the other > * is as well. Having only one set is not valid stripe geometry. > */ > if (*sunit == 0 || *swidth == 0) { > *sunit = 0; > *swidth = 0; > } Yup, we need to validate that input properly before using it. Silly me - I should have known better that to assume external code would give back sane results..... Cheers, Dave.
On 6/21/18 3:49 PM, Dave Chinner wrote: > On Thu, Jun 21, 2018 at 02:26:45PM -0500, Eric Sandeen wrote: >> On 6/21/18 2:15 PM, Jeff Mahoney wrote: >>> On 6/20/18 11:57 PM, Dave Chinner wrote: >>>> On Wed, Jun 20, 2018 at 10:55:20PM -0400, jeffm@suse.com wrote: >>>>> From: Jeff Mahoney <jeffm@suse.com> >>>>> >>>>> Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the >>>>> AG alignment code into a separate function. It got rid of >>>>> redundant checks for dswidth != 0 but did too good a job since now >>>>> it doesn't check at all. >>>> >>>> Of course they got removed - we've already validated the CLI input >>>> and guaranteed that cfg->dswidth can only be zero iff cfg->dsunit is >>>> zero in calc_stripe_factors(). >>>> >>>> i.e. We do input validation of CLI paramters before anything else so >>>> that later users (like align_ag_geometry()) can assume the >>>> parameters they are using are valid. In this case, the assumption is >>>> that either both dsunit and dswidth are zero or that both are >>>> non-zero and dswidth an integer multple of dsunit. >>> >>> It's not coming from the CLI parameters. It's coming from the topology. >>> The blkid topology stuff is returning 8k for minimal i/o and 0 for >>> optimal. Without a CLI config, we have dunit=0 in calc_stripe_factors, >>> which takes it from the device. We set cfg->dsunit=16 and >>> cfg->dswidth=0, and then head down to align_ag_geometry. >>> >>> The topology on this system looks like: >>> >>> ft = {dsunit = 16, dswidth = 0, rtswidth = 0, lsectorsize = 512, >>> >>> psectorsize = 512} >>> That matches with a few of the dm targets I see reported on this system. > > Exactly what kind of dm device does that - we've never had anyone > report this before? Also, if it's a dm device, shouldn't it > also be fixed to output a sane set of IO characteristics in /sys? It's multipath, so it just follows the stacking rules. The underlying SCSI devices report the same numbers. The optimal io number is documented as being optional, at least for the kernel, so we need to handle it being 0 anyway. I'm not sure if the device specifying a minimum i/o size larger than the sector size and also not specifying an optimal i/o size is valid SCSI. I'll ask for more information since now I'm also curious. -Jeff
On Thu, Jun 21, 2018 at 05:31:58PM -0400, Jeff Mahoney wrote: > On 6/21/18 3:49 PM, Dave Chinner wrote: > > On Thu, Jun 21, 2018 at 02:26:45PM -0500, Eric Sandeen wrote: > >> On 6/21/18 2:15 PM, Jeff Mahoney wrote: > >>> On 6/20/18 11:57 PM, Dave Chinner wrote: > >>>> On Wed, Jun 20, 2018 at 10:55:20PM -0400, jeffm@suse.com wrote: > >>>>> From: Jeff Mahoney <jeffm@suse.com> > >>>>> > >>>>> Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the > >>>>> AG alignment code into a separate function. It got rid of > >>>>> redundant checks for dswidth != 0 but did too good a job since now > >>>>> it doesn't check at all. > >>>> > >>>> Of course they got removed - we've already validated the CLI input > >>>> and guaranteed that cfg->dswidth can only be zero iff cfg->dsunit is > >>>> zero in calc_stripe_factors(). > >>>> > >>>> i.e. We do input validation of CLI paramters before anything else so > >>>> that later users (like align_ag_geometry()) can assume the > >>>> parameters they are using are valid. In this case, the assumption is > >>>> that either both dsunit and dswidth are zero or that both are > >>>> non-zero and dswidth an integer multple of dsunit. > >>> > >>> It's not coming from the CLI parameters. It's coming from the topology. > >>> The blkid topology stuff is returning 8k for minimal i/o and 0 for > >>> optimal. Without a CLI config, we have dunit=0 in calc_stripe_factors, > >>> which takes it from the device. We set cfg->dsunit=16 and > >>> cfg->dswidth=0, and then head down to align_ag_geometry. > >>> > >>> The topology on this system looks like: > >>> > >>> ft = {dsunit = 16, dswidth = 0, rtswidth = 0, lsectorsize = 512, > >>> > >>> psectorsize = 512} > >>> That matches with a few of the dm targets I see reported on this system. > > > > Exactly what kind of dm device does that - we've never had anyone > > report this before? Also, if it's a dm device, shouldn't it > > also be fixed to output a sane set of IO characteristics in /sys? > > It's multipath, so it just follows the stacking rules. The underlying > SCSI devices report the same numbers. The optimal io number is > documented as being optional, at least for the kernel, so we need to > handle it being 0 anyway. I'm not sure if the device specifying a > minimum i/o size larger than the sector size and also not specifying an > optimal i/o size is valid SCSI. I'll ask for more information since now > I'm also curious. Ah, so it came from the hardware? In that case, we probably shouldn't zero sunit when blkid reports this whacky case. i.e. I think we should set swidth = sunit so that we retain allocation alignment to the minimum IO size the device specified. Cheers, Dave.
On 6/21/18 5:36 PM, Dave Chinner wrote: > On Thu, Jun 21, 2018 at 05:31:58PM -0400, Jeff Mahoney wrote: >> On 6/21/18 3:49 PM, Dave Chinner wrote: >>> On Thu, Jun 21, 2018 at 02:26:45PM -0500, Eric Sandeen wrote: >>>> On 6/21/18 2:15 PM, Jeff Mahoney wrote: >>>>> On 6/20/18 11:57 PM, Dave Chinner wrote: >>>>>> On Wed, Jun 20, 2018 at 10:55:20PM -0400, jeffm@suse.com wrote: >>>>>>> From: Jeff Mahoney <jeffm@suse.com> >>>>>>> >>>>>>> Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the >>>>>>> AG alignment code into a separate function. It got rid of >>>>>>> redundant checks for dswidth != 0 but did too good a job since now >>>>>>> it doesn't check at all. >>>>>> >>>>>> Of course they got removed - we've already validated the CLI input >>>>>> and guaranteed that cfg->dswidth can only be zero iff cfg->dsunit is >>>>>> zero in calc_stripe_factors(). >>>>>> >>>>>> i.e. We do input validation of CLI paramters before anything else so >>>>>> that later users (like align_ag_geometry()) can assume the >>>>>> parameters they are using are valid. In this case, the assumption is >>>>>> that either both dsunit and dswidth are zero or that both are >>>>>> non-zero and dswidth an integer multple of dsunit. >>>>> >>>>> It's not coming from the CLI parameters. It's coming from the topology. >>>>> The blkid topology stuff is returning 8k for minimal i/o and 0 for >>>>> optimal. Without a CLI config, we have dunit=0 in calc_stripe_factors, >>>>> which takes it from the device. We set cfg->dsunit=16 and >>>>> cfg->dswidth=0, and then head down to align_ag_geometry. >>>>> >>>>> The topology on this system looks like: >>>>> >>>>> ft = {dsunit = 16, dswidth = 0, rtswidth = 0, lsectorsize = 512, >>>>> >>>>> psectorsize = 512} >>>>> That matches with a few of the dm targets I see reported on this system. >>> >>> Exactly what kind of dm device does that - we've never had anyone >>> report this before? Also, if it's a dm device, shouldn't it >>> also be fixed to output a sane set of IO characteristics in /sys? >> >> It's multipath, so it just follows the stacking rules. The underlying >> SCSI devices report the same numbers. The optimal io number is >> documented as being optional, at least for the kernel, so we need to >> handle it being 0 anyway. I'm not sure if the device specifying a >> minimum i/o size larger than the sector size and also not specifying an >> optimal i/o size is valid SCSI. I'll ask for more information since now >> I'm also curious. > > Ah, so it came from the hardware? In that case, we probably > shouldn't zero sunit when blkid reports this whacky case. i.e. I > think we should set swidth = sunit so that we retain allocation > alignment to the minimum IO size the device specified. Hrmph. yeah, I'd really like to see # blockdev --getiomin --getioopt --getss --getpbsz for all the devices in the stack, I guess... -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 6/21/18 6:16 PM, Eric Sandeen wrote: >> Ah, so it came from the hardware? In that case, we probably >> shouldn't zero sunit when blkid reports this whacky case. i.e. I >> think we should set swidth = sunit so that we retain allocation >> alignment to the minimum IO size the device specified. > Hrmph. > > yeah, I'd really like to see > > # blockdev --getiomin --getioopt --getss --getpbsz > > for all the devices in the stack, I guess... Ok, so Jeff shared the lsblk output with me; the underlying device and the mpath device are both: MIN-IO OPT-IO PHY-SEC LOG-SEC 8192 0 512 512 so logical/physical 512, with minimum io 8192 and optimal io 0. On further inspection Jeff says this may be because an optimal size is reported as greater than the maximum size, so we get 0 due to the inconsistency. Ok, well, I guess I see your point that we should probably set sunit=swidth=8192 here. I was thinking that conflicted with how we process it on the commandline but no, it doesn't; if you're going to manually specify you can't set one to 0 and not the other, but we have to make the best of what the hardware tells us. So we could do: /* * Some odd hardware sets minimum IO but not optimal; assume * minimal is the smallest preferred IO size, and set optimal * to the same value, i.e. a stripe of 1 disk. */ if (*sunit && *swidth == 0) *swidth = *sunit; But it's clearly buggy hardware. How are we to know which values are accurate and which are not? It still may be better to just set to no stripe geometry if we've gotten nonsense from libblockdev, if the device wants better performance it needs to have rational firmware... trying to 2nd guess bad values is just a crapshoot. I guess we could issue a warning ... -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 6/21/18 9:34 PM, Eric Sandeen wrote: > On 6/21/18 6:16 PM, Eric Sandeen wrote: >>> Ah, so it came from the hardware? In that case, we probably >>> shouldn't zero sunit when blkid reports this whacky case. i.e. I >>> think we should set swidth = sunit so that we retain allocation >>> alignment to the minimum IO size the device specified. >> Hrmph. >> >> yeah, I'd really like to see >> >> # blockdev --getiomin --getioopt --getss --getpbsz >> >> for all the devices in the stack, I guess... > > Ok, so Jeff shared the lsblk output with me; the underlying > device and the mpath device are both: > > MIN-IO OPT-IO PHY-SEC LOG-SEC > 8192 0 512 512 > > so logical/physical 512, with minimum io 8192 and optimal io 0. > On further inspection Jeff says this may be because an optimal > size is reported as greater than the maximum size, so we get 0 > due to the inconsistency. For this device, the user reported the following sg_vpd -a output: Optimal transfer length granularity: 16 blocks -- This is what gets exported as minimum_io_size Maximum transfer length: 32768 blocks Optimal transfer length: 65535 blocks The SCSI layer checks to see if the optimal blocks value is larger than what the device reports as its max and whether it's larger than 65535 or smaller than page size, and refuses to set the optimal io size so it defaults to 0. Given the values above, that has to be what's happening here. > Ok, well, I guess I see your point that we should probably set > sunit=swidth=8192 here. I was thinking that conflicted with > how we process it on the commandline but no, it doesn't; if you're > going to manually specify you can't set one to 0 and not the other, > but we have to make the best of what the hardware tells us. > > So we could do: > > /* > * Some odd hardware sets minimum IO but not optimal; assume > * minimal is the smallest preferred IO size, and set optimal > * to the same value, i.e. a stripe of 1 disk. > */ > if (*sunit && *swidth == 0) > *swidth = *sunit; > > But it's clearly buggy hardware. How are we to know which values > are accurate and which are not? It still may be better to just set > to no stripe geometry if we've gotten nonsense from libblockdev, > if the device wants better performance it needs to have rational > firmware... trying to 2nd guess bad values is just a crapshoot. > I guess we could issue a warning ... I think the sane defaults + warning is the way to go here too. -Jeff
On 6/21/18 9:34 PM, Eric Sandeen wrote: > On 6/21/18 6:16 PM, Eric Sandeen wrote: >>> Ah, so it came from the hardware? In that case, we probably >>> shouldn't zero sunit when blkid reports this whacky case. i.e. I >>> think we should set swidth = sunit so that we retain allocation >>> alignment to the minimum IO size the device specified. >> Hrmph. >> >> yeah, I'd really like to see >> >> # blockdev --getiomin --getioopt --getss --getpbsz >> >> for all the devices in the stack, I guess... > > Ok, so Jeff shared the lsblk output with me; the underlying > device and the mpath device are both: > > MIN-IO OPT-IO PHY-SEC LOG-SEC > 8192 0 512 512 > > so logical/physical 512, with minimum io 8192 and optimal io 0. > On further inspection Jeff says this may be because an optimal > size is reported as greater than the maximum size, so we get 0 > due to the inconsistency. > > Ok, well, I guess I see your point that we should probably set > sunit=swidth=8192 here. I was thinking that conflicted with > how we process it on the commandline but no, it doesn't; if you're > going to manually specify you can't set one to 0 and not the other, > but we have to make the best of what the hardware tells us. > > So we could do: > > /* > * Some odd hardware sets minimum IO but not optimal; assume > * minimal is the smallest preferred IO size, and set optimal > * to the same value, i.e. a stripe of 1 disk. > */ > if (*sunit && *swidth == 0) > *swidth = *sunit; > > But it's clearly buggy hardware. How are we to know which values > are accurate and which are not? It still may be better to just set > to no stripe geometry if we've gotten nonsense from libblockdev, > if the device wants better performance it needs to have rational > firmware... trying to 2nd guess bad values is just a crapshoot. > I guess we could issue a warning ... Getting back to this after some time off. I agree we should warn, but if we warn, we need to do it when we have more context. Otherwise we'll warn even when the user specifies stripe parameters, which just looks sloppy. I have an updated patch that puts it in calc_stripe_factors and only issues the warning if we will consume those values. -Jeff
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c index 78d0ce5d..28a7e70c 100644 --- a/mkfs/xfs_mkfs.c +++ b/mkfs/xfs_mkfs.c @@ -2670,7 +2670,7 @@ align_ag_geometry( uint64_t tmp_agsize; int dsunit = cfg->dsunit; - if (!dsunit) + if (!dsunit || !cfg->dswidth) goto validate; /*