mkfs: avoid divide-by-zero when hardware reports optimal i/o size as 0

Message ID 6f62dbc6-f516-e8b5-1f08-6be227a61219@suse.com (mailing list archive)
State Accepted, archived

Commit Message

Jeff Mahoney July 19, 2018, 9:23 p.m. UTC
Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the
AG alignment code into a separate function.  It got rid of
redundant checks for dswidth != 0 since calc_stripe_factors was
supposed to guarantee that if dsunit is non-zero dswidth will be
as well.  Unfortunately, there's hardware out there that reports its
optimal i/o size as larger than the maximum i/o size, which the kernel
treats as broken and zeros out the optimal i/o size.  We'll accept
the multi-sector dsunit but have a zero dswidth and hit a divide-by-zero
in align_ag_geometry.

To resolve this we can check the topology before consuming it, default
to using the stripe unit as the stripe width, and warn the user about it.

Fixes: 051b4e37f5e (mkfs: factor AG alignment)
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
---
 mkfs/xfs_mkfs.c | 6 ++++++
 1 file changed, 6 insertions(+)
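
The failure mode and the fallback described above can be sketched in isolation. This is a minimal illustration, not the mkfs code: `struct topo_sketch`, `pick_stripe` and `is_width_multiple` are hypothetical names, and the modulo in the consumer only stands in for whatever arithmetic align_ag_geometry performs on dswidth.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the device-reported topology, in 512-byte sectors. */
struct topo_sketch {
	int dsunit;	/* stripe unit */
	int dswidth;	/* stripe width; 0 if the kernel zeroed the optimal i/o size */
};

/*
 * Copy the device geometry, falling back to the stripe unit when the
 * width is missing.  Returns 1 if the fallback was applied, 0 otherwise.
 */
static int pick_stripe(const struct topo_sketch *ft, int *dsunit, int *dswidth)
{
	*dsunit = ft->dsunit;
	*dswidth = ft->dswidth;
	if (*dsunit && *dswidth == 0) {
		fprintf(stderr,
			"stripe unit is %d bytes but stripe width is 0; using %d bytes\n",
			*dsunit << 9, *dsunit << 9);
		*dswidth = *dsunit;
		return 1;
	}
	return 0;
}

/* Illustrative consumer: a modulo by dswidth traps if dswidth is 0. */
static int is_width_multiple(long long agsize, int dswidth)
{
	assert(dswidth != 0);	/* the invariant the refactoring relied on */
	return agsize % dswidth == 0;
}
```

With a topology of dsunit = 128 and dswidth = 0, `pick_stripe` hands back 128 for both values, so the later modulo is safe.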

Comments

Carlos Maiolino July 20, 2018, 3:55 p.m. UTC | #1
On Thu, Jul 19, 2018 at 05:23:22PM -0400, Jeff Mahoney wrote:
> Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the
> AG alignment code into a separate function.  It got rid of
> redundant checks for dswidth != 0 since calc_stripe_factors was
> supposed to guarantee that if dsunit is non-zero dswidth will be
> as well.  Unfortunately, there's hardware out there that reports its
> optimal i/o size as larger than the maximum i/o size, which the kernel
> treats as broken and zeros out the optimal i/o size.  We'll accept
> the multi-sector dsunit but have a zero dswidth and hit a divide-by-zero
> in align_ag_geometry.
> 
> To resolve this we can check the topology before consuming it, default
> to using the stripe unit as the stripe width, and warn the user about it.
> 

I wonder whether this check belongs in blkid_get_topology, since it is the
information reported by the storage that is wrong.
It could also require force_overwrite to continue. At this point something looks
quite wrong with the storage, and this is probably the last chance a sysadmin has
to notice it before making the FS and starting to use it, so requiring
force_overwrite might draw more attention.

> Fixes: 051b4e37f5e (mkfs: factor AG alignment)
> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> ---
>  mkfs/xfs_mkfs.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index a135e06e..35542e57 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -2295,6 +2295,12 @@ _("data stripe width (%d) must be a multiple of the data stripe unit (%d)\n"),
>  	if (!dsunit) {
>  		dsunit = ft->dsunit;
>  		dswidth = ft->dswidth;
> +		if (dsunit && dswidth == 0) {
> +			fprintf(stderr,
> +_("%s: Volume reports stripe unit of %d bytes but stripe width of 0.  Using stripe width of %d bytes, which may not be optimal.\n"),
> +				progname, dsunit << 9, dsunit << 9);
> +			dswidth = dsunit;
> +		}
>  		use_dev = true;
>  	} else {
>  		/* check and warn is alignment is sub-optimal */
> -- 
> 2.16.4
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong July 20, 2018, 4:19 p.m. UTC | #2
On Fri, Jul 20, 2018 at 05:55:29PM +0200, Carlos Maiolino wrote:
> On Thu, Jul 19, 2018 at 05:23:22PM -0400, Jeff Mahoney wrote:
> > Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the
> > AG alignment code into a separate function.  It got rid of
> > redundant checks for dswidth != 0 since calc_stripe_factors was
> > supposed to guarantee that if dsunit is non-zero dswidth will be
> > as well.  Unfortunately, there's hardware out there that reports its
> > optimal i/o size as larger than the maximum i/o size, which the kernel
> > treats as broken and zeros out the optimal i/o size.  We'll accept
> > the multi-sector dsunit but have a zero dswidth and hit a divide-by-zero
> > in align_ag_geometry.
> > 
> > To resolve this we can check the topology before consuming it, default
> > to using the stripe unit as the stripe width, and warn the user about it.
> > 
> 
> I wonder if this shouldn't go into blkid_get_topology since something is wrong
> with the information reported by the storage.

If the storage gives us crap geometry information we don't have to use
it.  Keep the message that we autodetected nonsense and are dropping it;
the sysadmin can always re-run mkfs with sensible dsunit/dswidth.

> And require a force_overwrite to continue, at this point, something looks quite
> wrong in the storage, and I think this is the last 'resource' a sysadmin will
> have to notice this before making the FS, and start using it, so, maybe requiring
> force_overwrite would bring more attention.

I prefer reserving -f for "This is about to destroy something and can't
be undone", not "This auto-optimization is screwed up, continue? (y/N)"

--D

> > Fixes: 051b4e37f5e (mkfs: factor AG alignment)
> > Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> > ---
> >  mkfs/xfs_mkfs.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> > index a135e06e..35542e57 100644
> > --- a/mkfs/xfs_mkfs.c
> > +++ b/mkfs/xfs_mkfs.c
> > @@ -2295,6 +2295,12 @@ _("data stripe width (%d) must be a multiple of the data stripe unit (%d)\n"),
> >  	if (!dsunit) {
> >  		dsunit = ft->dsunit;
> >  		dswidth = ft->dswidth;
> > +		if (dsunit && dswidth == 0) {
> > +			fprintf(stderr,
> > +_("%s: Volume reports stripe unit of %d bytes but stripe width of 0.  Using stripe width of %d bytes, which may not be optimal.\n"),
> > +				progname, dsunit << 9, dsunit << 9);
> > +			dswidth = dsunit;
> > +		}
> >  		use_dev = true;
> >  	} else {
> >  		/* check and warn is alignment is sub-optimal */
> > -- 
> > 2.16.4
> > 
> > 
> 
> -- 
> Carlos
Jeff Mahoney July 20, 2018, 6:08 p.m. UTC | #3
On 7/20/18 11:55 AM, Carlos Maiolino wrote:
> On Thu, Jul 19, 2018 at 05:23:22PM -0400, Jeff Mahoney wrote:
>> Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the
>> AG alignment code into a separate function.  It got rid of
>> redundant checks for dswidth != 0 since calc_stripe_factors was
>> supposed to guarantee that if dsunit is non-zero dswidth will be
>> as well.  Unfortunately, there's hardware out there that reports its
>> optimal i/o size as larger than the maximum i/o size, which the kernel
>> treats as broken and zeros out the optimal i/o size.  We'll accept
>> the multi-sector dsunit but have a zero dswidth and hit a divide-by-zero
>> in align_ag_geometry.
>>
>> To resolve this we can check the topology before consuming it, default
>> to using the stripe unit as the stripe width, and warn the user about it.
>>
> 
> I wonder if this shouldn't go into blkid_get_topology since something is wrong
> with the information reported by the storage.
> And require a force_overwrite to continue, at this point, something looks quite
> wrong in the storage, and I think this is the last 'resource' a sysadmin will
> have to notice this before making the FS, and start using it, so, maybe requiring
> force_overwrite would bring more attention.

We discussed that initially here:
https://patchwork.kernel.org/patch/10479083/

I worked that up and what ends up happening is that, since we don't have
any context for how the topology will be used, if at all, we print the
error every time.  If the user specified stripe parameters manually, the
topology won't be used.  They won't care if it's broken and certainly
don't need to force it.

Lastly, this wasn't encountered in the real world on some weird discount
hardware.  It's a pretty big product from a major storage vendor.  I've
advised them to fix their firmware but we still need to get users
rolling again.  Warning about a potential suboptimal result is enough,
IMO.  It's not an emergency situation that will result in a completely
broken file system.

-Jeff
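
The point about context can be sketched as follows: when the check sits where the topology is consumed, a user who passes stripe parameters by hand never sees the warning. All names here (`struct dev_topo_sketch`, `struct stripe_sketch`, `resolve_stripe`) are hypothetical and only mirror the shape of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical device topology, in 512-byte sectors. */
struct dev_topo_sketch {
	int dsunit;
	int dswidth;
};

/* Hypothetical result of resolving the stripe geometry. */
struct stripe_sketch {
	int dsunit;
	int dswidth;
	bool use_dev;	/* true when the device values were consumed */
	bool warned;	/* true when the zero-width fallback fired */
};

static struct stripe_sketch
resolve_stripe(int user_dsunit, int user_dswidth,
	       const struct dev_topo_sketch *ft)
{
	struct stripe_sketch s = { user_dsunit, user_dswidth, false, false };

	assert(ft != NULL);

	/* User-specified geometry: the topology is never consulted. */
	if (s.dsunit)
		return s;

	s.dsunit = ft->dsunit;
	s.dswidth = ft->dswidth;
	s.use_dev = true;
	if (s.dsunit && s.dswidth == 0) {
		fprintf(stderr,
			"warning: stripe width reported as 0, using %d bytes\n",
			s.dsunit << 9);
		s.dswidth = s.dsunit;
		s.warned = true;
	}
	return s;
}
```

Calling `resolve_stripe(64, 256, &bad)` with a broken topology returns the user's values untouched and prints nothing; only `resolve_stripe(0, 0, &bad)` triggers the warning.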

>> Fixes: 051b4e37f5e (mkfs: factor AG alignment)
>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>> ---
>>  mkfs/xfs_mkfs.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
>> index a135e06e..35542e57 100644
>> --- a/mkfs/xfs_mkfs.c
>> +++ b/mkfs/xfs_mkfs.c
>> @@ -2295,6 +2295,12 @@ _("data stripe width (%d) must be a multiple of the data stripe unit (%d)\n"),
>>  	if (!dsunit) {
>>  		dsunit = ft->dsunit;
>>  		dswidth = ft->dswidth;
>> +		if (dsunit && dswidth == 0) {
>> +			fprintf(stderr,
>> +_("%s: Volume reports stripe unit of %d bytes but stripe width of 0.  Using stripe width of %d bytes, which may not be optimal.\n"),
>> +				progname, dsunit << 9, dsunit << 9);
>> +			dswidth = dsunit;
>> +		}
>>  		use_dev = true;
>>  	} else {
>>  		/* check and warn is alignment is sub-optimal */
>> -- 
>> 2.16.4
>>
>>
>
Carlos Maiolino July 23, 2018, 12:21 p.m. UTC | #4
On Fri, Jul 20, 2018 at 09:19:23AM -0700, Darrick J. Wong wrote:
> On Fri, Jul 20, 2018 at 05:55:29PM +0200, Carlos Maiolino wrote:
> > On Thu, Jul 19, 2018 at 05:23:22PM -0400, Jeff Mahoney wrote:
> > > Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the
> > > AG alignment code into a separate function.  It got rid of
> > > redundant checks for dswidth != 0 since calc_stripe_factors was
> > > supposed to guarantee that if dsunit is non-zero dswidth will be
> > > as well.  Unfortunately, there's hardware out there that reports its
> > > optimal i/o size as larger than the maximum i/o size, which the kernel
> > > treats as broken and zeros out the optimal i/o size.  We'll accept
> > > the multi-sector dsunit but have a zero dswidth and hit a divide-by-zero
> > > in align_ag_geometry.
> > > 
> > > To resolve this we can check the topology before consuming it, default
> > > to using the stripe unit as the stripe width, and warn the user about it.
> > > 
> > 
> > I wonder if this shouldn't go into blkid_get_topology since something is wrong
> > with the information reported by the storage.
> 
> If the storage gives us crap geometry information we don't have to use
> it.  Keep the message that we autodetected nonsense and are dropping it;
> the sysadmin can always re-run mkfs with sensible dsunit/dswidth.
> 
> > And require a force_overwrite to continue, at this point, something looks quite
> > wrong in the storage, and I think this is the last 'resource' a sysadmin will
> > have to notice this before making the FS, and start using it, so, maybe requiring
> > force_overwrite would bring more attention.
> 
> I prefer reserving -f for "This is about to destroy something and can't
> be undone", not "This auto-optimization is screwed up, continue? (y/N)"

Yeah, I agree, forget everything I said here :P

> 
> --D
> 
> > > Fixes: 051b4e37f5e (mkfs: factor AG alignment)
> > > Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> > > ---
> > >  mkfs/xfs_mkfs.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> > > index a135e06e..35542e57 100644
> > > --- a/mkfs/xfs_mkfs.c
> > > +++ b/mkfs/xfs_mkfs.c
> > > @@ -2295,6 +2295,12 @@ _("data stripe width (%d) must be a multiple of the data stripe unit (%d)\n"),
> > >  	if (!dsunit) {
> > >  		dsunit = ft->dsunit;
> > >  		dswidth = ft->dswidth;
> > > +		if (dsunit && dswidth == 0) {
> > > +			fprintf(stderr,
> > > +_("%s: Volume reports stripe unit of %d bytes but stripe width of 0.  Using stripe width of %d bytes, which may not be optimal.\n"),
> > > +				progname, dsunit << 9, dsunit << 9);
> > > +			dswidth = dsunit;
> > > +		}
> > >  		use_dev = true;
> > >  	} else {
> > >  		/* check and warn is alignment is sub-optimal */
> > > -- 
> > > 2.16.4
> > > 
> > > 
> > 
> > -- 
> > Carlos
Eric Sandeen July 31, 2018, 2:10 a.m. UTC | #5
On 7/19/18 4:23 PM, Jeff Mahoney wrote:
> Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the
> AG alignment code into a separate function.  It got rid of
> redundant checks for dswidth != 0 since calc_stripe_factors was
> supposed to guarantee that if dsunit is non-zero dswidth will be
> as well.  Unfortunately, there's hardware out there that reports its
> optimal i/o size as larger than the maximum i/o size, which the kernel
> treats as broken and zeros out the optimal i/o size.  We'll accept
> the multi-sector dsunit but have a zero dswidth and hit a divide-by-zero
> in align_ag_geometry.
> 
> To resolve this we can check the topology before consuming it, default
> to using the stripe unit as the stripe width, and warn the user about it.
> 
> Fixes: 051b4e37f5e (mkfs: factor AG alignment)
> Signed-off-by: Jeff Mahoney <jeffm@suse.com>

Looks fine to me logically.  Sorry for nitpicking a patch again (it's
a character flaw) but I'd like to massage this slightly:

diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 1074886..231542f 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -2281,6 +2281,16 @@ _("data stripe width (%d) must be a multiple of the data stripe unit (%d)\n"),
 
 	/* if no stripe config set, use the device default */
 	if (!dsunit) {
+		/* Watch out for nonsense from device */
+		if (ft->dsunit && ft->dswidth == 0) {
+			fprintf(stderr,
+_("%s: Volume reports stripe unit of %d bytes but stripe width of 0.\n"),
+				progname, ft->dsunit << 9);
+			fprintf(stderr,
+_("Using stripe width of %d bytes, which may not be optimal.\n"),
+				ft->dsunit << 9);
+			ft->dswidth = ft->dsunit;
+		}
 		dsunit = ft->dsunit;
 		dswidth = ft->dswidth;
 		use_dev = true;

to make it a bit more clear that we're checking the /device/-reported
topology (by looking at ft before using it) and also to break up
the long warning message into < 80 char lines.  OK?

This all seems a little messy yet (an inherited mess) but that's slightly
clearer to me.

Thanks,
-Eric
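
A rough sketch of that ordering, assuming the stripe values are held in 512-byte units (which is what the `<< 9` in the messages converts to bytes): the device-reported structure is repaired in place before anything copies from it. `struct ft_sketch`, `BB_SHIFT_SKETCH` and `sanitize_topology` are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define BB_SHIFT_SKETCH 9	/* 512-byte units, hence the << 9 */

/* Hypothetical device-reported topology, in 512-byte units. */
struct ft_sketch {
	int dsunit;
	int dswidth;
};

/*
 * Repair the device-reported values in place so every later reader of
 * ft sees a consistent pair.  Returns the resulting width in bytes.
 */
static long sanitize_topology(struct ft_sketch *ft)
{
	assert(ft != NULL);
	if (ft->dsunit && ft->dswidth == 0) {
		fprintf(stderr,
			"Volume reports stripe unit of %d bytes but stripe width of 0.\n",
			ft->dsunit << BB_SHIFT_SKETCH);
		fprintf(stderr,
			"Using stripe width of %d bytes, which may not be optimal.\n",
			ft->dsunit << BB_SHIFT_SKETCH);
		ft->dswidth = ft->dsunit;
	}
	return (long)ft->dswidth << BB_SHIFT_SKETCH;
}
```

For a device reporting dsunit = 16 and dswidth = 0, the structure ends up with dswidth = 16 and the function returns 8192 bytes.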

> ---
>  mkfs/xfs_mkfs.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index a135e06e..35542e57 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -2295,6 +2295,12 @@ _("data stripe width (%d) must be a multiple of the data stripe unit (%d)\n"),
>  	if (!dsunit) {
>  		dsunit = ft->dsunit;
>  		dswidth = ft->dswidth;
> +		if (dsunit && dswidth == 0) {
> +			fprintf(stderr,
> +_("%s: Volume reports stripe unit of %d bytes but stripe width of 0.  Using stripe width of %d bytes, which may not be optimal.\n"),
> +				progname, dsunit << 9, dsunit << 9);
> +			dswidth = dsunit;
> +		}
>  		use_dev = true;
>  	} else {
>  		/* check and warn is alignment is sub-optimal */
> 
Eric Sandeen July 31, 2018, 2:14 a.m. UTC | #6
On 7/30/18 9:10 PM, Eric Sandeen wrote:
> On 7/19/18 4:23 PM, Jeff Mahoney wrote:
>> Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the
>> AG alignment code into a separate function.  It got rid of
>> redundant checks for dswidth != 0 since calc_stripe_factors was
>> supposed to guarantee that if dsunit is non-zero dswidth will be
>> as well.  Unfortunately, there's hardware out there that reports its
>> optimal i/o size as larger than the maximum i/o size, which the kernel
>> treats as broken and zeros out the optimal i/o size.  We'll accept
>> the multi-sector dsunit but have a zero dswidth and hit a divide-by-zero
>> in align_ag_geometry.
>>
>> To resolve this we can check the topology before consuming it, default
>> to using the stripe unit as the stripe width, and warn the user about it.
>>
>> Fixes: 051b4e37f5e (mkfs: factor AG alignment)
>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> 
> Looks fine to me logically.  Sorry for nitpicking a patch again (it's
> a character flaw) but I'd like to massage this slightly:
> 
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index 1074886..231542f 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -2281,6 +2281,16 @@ _("data stripe width (%d) must be a multiple of the data stripe unit (%d)\n"),
>  
>  	/* if no stripe config set, use the device default */
>  	if (!dsunit) {
> +		/* Watch out for nonsense from device */
> +		if (ft->dsunit && ft->dswidth == 0) {
> +			fprintf(stderr,
> +_("%s: Volume reports stripe unit of %d bytes but stripe width of 0.\n"),
> +				progname, ft->dsunit << 9);
> +			fprintf(stderr,
> +_("Using stripe width of %d bytes, which may not be optimal.\n"),
> +				ft->dsunit << 9);
> +			ft->dswidth = ft->dsunit;
> +		}
>  		dsunit = ft->dsunit;
>  		dswidth = ft->dswidth;
>  		use_dev = true;
> 
> to make it a bit more clear that we're checking the /device/-reported
> topology (by looking at ft before using it) and also to break up
> the long warning message into < 80 char lines.  OK?
> 
> This all seems a little messy yet (an inherited mess) but that's slightly
> clearer to me.

Hm, though now I'm half tempted to put all the dswidth-vs-dsunit checks in a helper,
and if it fails on the commandline values, usage(); on detected values, set to 0
with a warning, as it does here anyway:

        /*
         * now we have our stripe config, check it's a multiple of block
         * size.
         */
        if ((BBTOB(dsunit) % cfg->blocksize) ||
            (BBTOB(dswidth) % cfg->blocksize)) {
                if (!use_dev) {
...
                }
                dsunit = 0;
                dswidth = 0;
                cfg->sb_feat.nodalign = true;
        }

and let the user respecify if they wish.  *shrug* I may follow up with another patch
if it works out.

-Eric
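
One possible shape for such a helper, purely as a sketch of the idea and not the follow-up patch: `check_stripe`, `stripe_is_sane` and the `enum stripe_check` values are hypothetical, and the caller is assumed to invoke usage() itself when STRIPE_USAGE comes back.

```c
#include <stdbool.h>
#include <stdio.h>

enum stripe_check {
	STRIPE_OK,	/* values are consistent */
	STRIPE_DROPPED,	/* detected values were bad and have been zeroed */
	STRIPE_USAGE	/* command-line values were bad; caller should bail out */
};

/* Both zero, or both non-zero with the width a multiple of the unit. */
static bool stripe_is_sane(int dsunit, int dswidth)
{
	if (dsunit == 0)
		return dswidth == 0;
	return dswidth != 0 && dswidth % dsunit == 0;
}

static enum stripe_check
check_stripe(int *dsunit, int *dswidth, bool from_cmdline)
{
	if (stripe_is_sane(*dsunit, *dswidth))
		return STRIPE_OK;

	/* The user asked for this explicitly, so leave the values alone. */
	if (from_cmdline)
		return STRIPE_USAGE;

	fprintf(stderr,
		"warning: ignoring detected stripe unit %d / width %d\n",
		*dsunit, *dswidth);
	*dsunit = 0;
	*dswidth = 0;
	return STRIPE_DROPPED;
}
```

The design choice here is that autodetected nonsense is discarded with a warning, while an inconsistent command line is treated as a user error and left untouched for the caller to report.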
Patch

diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index a135e06e..35542e57 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -2295,6 +2295,12 @@  _("data stripe width (%d) must be a multiple of the data stripe unit (%d)\n"),
 	if (!dsunit) {
 		dsunit = ft->dsunit;
 		dswidth = ft->dswidth;
+		if (dsunit && dswidth == 0) {
+			fprintf(stderr,
+_("%s: Volume reports stripe unit of %d bytes but stripe width of 0.  Using stripe width of %d bytes, which may not be optimal.\n"),
+				progname, dsunit << 9, dsunit << 9);
+			dswidth = dsunit;
+		}
 		use_dev = true;
 	} else {
 		/* check and warn is alignment is sub-optimal */