diff mbox

[RESEND] xfs: add online uevent for mount operation

Message ID 1503582115-37220-1-git-send-email-houtao1@huawei.com (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Hou Tao Aug. 24, 2017, 1:41 p.m. UTC
It will be useful if there is a corresponding online uevent after
a XFS filesystem has been mounted. A typical usage of the uevent
is setting the error configuration for a specific XFS filesystem
or all XFS filesystems by using udevd.

The following is an example of udevd rule which will shutdown
any XFS filesystem after the filesystem gets any IO error:

    ACTION=="online", SUBSYSTEM=="xfs", DEVPATH=="/fs/xfs/*", \
    RUN+="/bin/sh -c 'echo 0 > /sys%p/error/metadata/default/max_retries; \
	    echo 0 > /sys%p/error/metadata/EIO/max_retries; \
	    echo 0 > /sys%p/error/metadata/ENOSPC/max_retries; \
	    echo 0 > /sys%p/error/metadata/ENODEV/max_retries'"

Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 fs/xfs/xfs_super.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

Comments

Darrick J. Wong Aug. 24, 2017, 5:10 p.m. UTC | #1
On Thu, Aug 24, 2017 at 09:41:55PM +0800, Hou Tao wrote:
> It will be useful if there is a corresponding online uevent after
> a XFS filesystem has been mounted. A typical usage of the uevent
> is setting the error configuration for a specific XFS filesystem
> or all XFS filesystems by using udevd.
> 
> The following is an example of udevd rule which will shutdown
> any XFS filesystem after the filesystem gets any IO error:
> 
>     ACTION=="online", SUBSYSTEM=="xfs", DEVPATH=="/fs/xfs/*", \
>     RUN+="/bin/sh -c 'echo 0 > /sys%p/error/metadata/default/max_retries; \
> 	    echo 0 > /sys%p/error/metadata/EIO/max_retries; \
> 	    echo 0 > /sys%p/error/metadata/ENOSPC/max_retries; \
> 	    echo 0 > /sys%p/error/metadata/ENODEV/max_retries'"
>
> Suggested-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  fs/xfs/xfs_super.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 38aaacd..695fe85f 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1530,6 +1530,19 @@ xfs_destroy_percpu_counters(
>  	percpu_counter_destroy(&mp->m_fdblocks);
>  }
>  
> +static void
> +xfs_fs_uevent(
> +	struct xfs_mount	*mp,
> +	enum kobject_action	action)
> +{
> +	int err;
> +
> +	err = kobject_uevent(&mp->m_kobj.kobject, action);

It might be useful to send kobject_uevent_env so that we can squeeze in
things like the fs uuid for easier identification?  I envision a
/etc/xfs/fs.conf file like this:

[903db5b1-82cd-4f26-8221-443a4ed0563a]
error.metadata.EIO.max_retries = 0
error.fail_at_unmount = 1
config.cowextsize = 1048576
config.inherit_noatime = 1

With a udev helper that uses the fsuuid to find the appropriate section
of the ini^Wconfig file and load the appropriate values into sysfs.  If
we ever decide to expose more config knobs through sysfs then they'll
be automatically supported.

> +	if (err)
> +		xfs_notice(mp, "Sending XFS uevent %d got error %d",
> +				action, err);
> +}
> +
>  STATIC int
>  xfs_fs_fill_super(
>  	struct super_block	*sb,
> @@ -1667,6 +1680,8 @@ xfs_fs_fill_super(
>  		goto out_unmount;
>  	}
>  
> +	xfs_fs_uevent(mp, KOBJ_ONLINE);

That said, I wonder if we ought to move this discussion to fsdevel to
pull in any /other/ filesystems that have sysfs knobs that they might
like to be able to persist?  ext4 has a bunch of tunables too...

--D

> +
>  	return 0;
>  
>   out_filestream_unmount:
> -- 
> 2.5.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis Chamberlain Aug. 24, 2017, 6:04 p.m. UTC | #2
+ linux-fsdevel

On Thu, Aug 24, 2017 at 10:10:28AM -0700, Darrick J. Wong wrote:
> On Thu, Aug 24, 2017 at 09:41:55PM +0800, Hou Tao wrote:
> > It will be useful if there is a corresponding online uevent after
> > a XFS filesystem has been mounted. A typical usage of the uevent
> > is setting the error configuration for a specific XFS filesystem
> > or all XFS filesystems by using udevd.
> > 
> > The following is an example of udevd rule which will shutdown
> > any XFS filesystem after the filesystem gets any IO error:
> > 
> >     ACTION=="online", SUBSYSTEM=="xfs", DEVPATH=="/fs/xfs/*", \
> >     RUN+="/bin/sh -c 'echo 0 > /sys%p/error/metadata/default/max_retries; \
> > 	    echo 0 > /sys%p/error/metadata/EIO/max_retries; \
> > 	    echo 0 > /sys%p/error/metadata/ENOSPC/max_retries; \
> > 	    echo 0 > /sys%p/error/metadata/ENODEV/max_retries'"
> >
> > Suggested-by: Dave Chinner <david@fromorbit.com>
> > Signed-off-by: Hou Tao <houtao1@huawei.com>
> > ---
> >  fs/xfs/xfs_super.c | 15 +++++++++++++++
> >  1 file changed, 15 insertions(+)
> > 
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index 38aaacd..695fe85f 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -1530,6 +1530,19 @@ xfs_destroy_percpu_counters(
> >  	percpu_counter_destroy(&mp->m_fdblocks);
> >  }
> >  
> > +static void
> > +xfs_fs_uevent(
> > +	struct xfs_mount	*mp,
> > +	enum kobject_action	action)
> > +{
> > +	int err;
> > +
> > +	err = kobject_uevent(&mp->m_kobj.kobject, action);
> 
> It might be useful to send kobject_uevent_env so that we can squeeze in
> things like the fs uuid for easier identification?  I envision a
> /etc/xfs/fs.conf file like this:
> 
> [903db5b1-82cd-4f26-8221-443a4ed0563a]
> error.metadata.EIO.max_retries = 0
> error.fail_at_unmount = 1
> config.cowextsize = 1048576
> config.inherit_noatime = 1
> 
> With a udev helper that uses the fsuuid to find the appropriate section
> of the ini^Wconfig file and load the appropriate values into sysfs.

Interesting, so yet-another possible configuration file for filesystems use
here could be of a udev helper scraping for matches. This in turn would mean
such udev helpers would need support to parse config files.

Reason I think this is important is it may raise the importance of evaluating
which file configuration parser we end up using for mkfs.xfs as well. While
they don't necessarily need to match, its perhaps ideal if they could. I
realize for instance e2fsprogs is using its own libprofile fork, and chances
of it moving off are low due to the gains libprofile provides for it: supports
different OSes easier, the size is great. However if in any way the more
filesystems will tend use adopt configuration files it may be a question to
raise once again for that filesystem if the parser of choice used is the best.
Or a general community one: do we want to share a common config file parser?

On that note FWIW, at this point I've reviewed 4 libraries for mkfs.xfs:

  o libconfig -- we want to stay away from this [0]
  o e2fsprogs libprofile -- its a fork of krb5 libparser, size-wise this is [1]
    a best option, however its also a fork on its own, or 
  o libini_config -- my pick and what I recommend [2] due it supporting
    uint64_t well, having a good new community traction, and a good set of
    tests
  o nfs: has its own incarnation of a library

*Iff* we want to share the same library for mkfs.xfs and also a udev helper,
then a new requirement here is perhaps also having a respective library on
other higher level language like python for instance, if not now perhaps
for the future. Unless of course we'd expect such udev heleprs to be C
based.

[0] https://gitlab.com/mcgrof/libconfig-int64-issues
[1] https://gitlab.com/mcgrof/libekprofile
[2] https://gitlab.com/mcgrof/libini_config-demo

> If
> we ever decide to expose more config knobs through sysfs then they'll
> be automatically supported.

While all this is nice, I'm sure we're all aware of the dangers of setting
things in stone through sysfs, its likely already decided for the above
tunables, but adding a uevent *is* yet another layer of user interface
which userspace can *expect* and we should be certain we want this so
we won't regress userspace later.

Just saying, we better damn be sure this is the way we want to go.

> > +	if (err)
> > +		xfs_notice(mp, "Sending XFS uevent %d got error %d",
> > +				action, err);
> > +}
> > +
> >  STATIC int
> >  xfs_fs_fill_super(
> >  	struct super_block	*sb,
> > @@ -1667,6 +1680,8 @@ xfs_fs_fill_super(
> >  		goto out_unmount;
> >  	}
> >  
> > +	xfs_fs_uevent(mp, KOBJ_ONLINE);
> 
> That said, I wonder if we ought to move this discussion to fsdevel to
> pull in any /other/ filesystems that have sysfs knobs that they might
> like to be able to persist?  ext4 has a bunch of tunables too...

Done :)

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis Chamberlain Aug. 24, 2017, 6:13 p.m. UTC | #3
On the other thread we can discuss the configuration file parser used,
this other thread we can use for the sysfs knobs and if it makes sense
to share anything in consideration for a possible future udev event.
More below.

On Thu, Aug 24, 2017 at 08:04:41PM +0200, Luis R. Rodriguez wrote:
> On Thu, Aug 24, 2017 at 10:10:28AM -0700, Darrick J. Wong wrote:
> > It might be useful to send kobject_uevent_env so that we can squeeze in
> > things like the fs uuid for easier identification?  I envision a
> > /etc/xfs/fs.conf file like this:
> > 
> > [903db5b1-82cd-4f26-8221-443a4ed0563a]
> > error.metadata.EIO.max_retries = 0
> > error.fail_at_unmount = 1
> > config.cowextsize = 1048576
> > config.inherit_noatime = 1
> > 
> > With a udev helper that uses the fsuuid to find the appropriate section
> > of the ini^Wconfig file and load the appropriate values into sysfs.
> 
> While all this is nice, I'm sure we're all aware of the dangers of setting
> things in stone through sysfs, its likely already decided for the above
> tunables,

<-- udev thing and now snip -->

> Just saying, we better damn be sure this is the way we want to go.
> 
> > > +	if (err)
> > > +		xfs_notice(mp, "Sending XFS uevent %d got error %d",
> > > +				action, err);
> > > +}
> > > +
> > >  STATIC int
> > >  xfs_fs_fill_super(
> > >  	struct super_block	*sb,
> > > @@ -1667,6 +1680,8 @@ xfs_fs_fill_super(
> > >  		goto out_unmount;
> > >  	}
> > >  
> > > +	xfs_fs_uevent(mp, KOBJ_ONLINE);
> > 
> > That said, I wonder if we ought to move this discussion to fsdevel to
> > pull in any /other/ filesystems that have sysfs knobs that they might
> > like to be able to persist?  ext4 has a bunch of tunables too...

So were you thinking perhaps if we are going to use udev that it may make
sense then to instead consider if we want to share tunables for uevents?
Then instead of each filesystem having its own parser, etc, we have set
of common knobs for any filesystem possible.

Is that what you had in mind to discuss?

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis Chamberlain Aug. 24, 2017, 6:59 p.m. UTC | #4
On Thu, Aug 24, 2017 at 08:04:41PM +0200, Luis R. Rodriguez wrote:
> On that note FWIW, at this point I've reviewed 4 libraries for mkfs.xfs:
> 
>   o libconfig -- we want to stay away from this [0]
>   o e2fsprogs libprofile -- its a fork of krb5 libparser, size-wise this is [1]
>     a best option, however its also a fork on its own, or 
>   o libini_config -- my pick and what I recommend [2] due it supporting
>     uint64_t well, having a good new community traction, and a good set of
>     tests
>   o nfs: has its own incarnation of a library
> 
> *Iff* we want to share the same library for mkfs.xfs and also a udev helper,
> then a new requirement here is perhaps also having a respective library on
> other higher level language like python for instance, if not now perhaps
> for the future. Unless of course we'd expect such udev heleprs to be C
> based.
> 
> [0] https://gitlab.com/mcgrof/libconfig-int64-issues
> [1] https://gitlab.com/mcgrof/libekprofile
> [2] https://gitlab.com/mcgrof/libini_config-demo

FWIW -- for a more thorough evaluation folks can read a bit more about
configuration file uses for filesystems on this work-in-progress paper:

https://gitlab.com/mcgrof/filesystem-configuration-paper/blob/master/paper.pdf

It does not take into account the newly mentioned possible use case as this
is new and just now considered.

Feedback/rants on other points welcomed.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
NeilBrown Aug. 24, 2017, 10:11 p.m. UTC | #5
On Thu, Aug 24 2017, Luis R. Rodriguez wrote:
>
> Just saying, we better damn be sure this is the way we want to go.
>

Just to provide contrast, NFS has a config file which can provide
certain defaults for mounted filesystems.  However it doesn't require a
udev event to achieve this.

Instead, it provides "/sbin/mount.nfs" which imposes the defaults at
mount time, rather than just after mount time.

There could, conceivably, be an "/sbin/mount.xfs" which read the config
file and set all the defaults concurrently with the mount operation.

I'm not necessarily saying this is a better way to go, but I agree with
the need to be sure, and this provides evidence that other approaches
are possible and worth considering.

NeilBrown
Luis Chamberlain Aug. 24, 2017, 10:39 p.m. UTC | #6
On Fri, Aug 25, 2017 at 08:11:05AM +1000, NeilBrown wrote:
> On Thu, Aug 24 2017, Luis R. Rodriguez wrote:
> >
> > Just saying, we better damn be sure this is the way we want to go.
> >
> 
> Just to provide contrast, NFS has a config file which can provide
> certain defaults for mounted filesystems.  However it doesn't require a
> udev event to achieve this.
> 
> Instead, it provides "/sbin/mount.nfs" which imposes the defaults at
> mount time, rather than just after mount time.
> 
> There could, conceivably, be an "/sbin/mount.xfs" which read the config
> file and set all the defaults concurrently with the mount operation.
> 
> I'm not necessarily saying this is a better way to go, but I agree with
> the need to be sure, and this provides evidence that other approaches
> are possible and worth considering.

I like the NFS approach given:

  o It would mean you can keep things contained within each filesystem
    userspace tool. This would not escalate the requirements for what
    parser to use, ie we would not need to pick one for a higher level
    language for udev rules. Tools which already have a parser
    like e2fsprogs can keep on chugging with libprofile, meanwhile XFS and
    others can choose whatever library their own heart pleases and the
    requirement still remains just simple file parsing, not needing
    higher level language support.

 o One advantage of having things like /sbin/mount.<filesystem> is
   shared filesystems between a type of system can be kept separately
   from host specific mount points. This could allow sharing *one* file
   for a type of target system (database, development, etc). There
   would not be any need to look for your own filesystem in /etc/fstab,
   or adding entries, then /etc/fstab would just become very host specific.

 o Custom entries can be supported without having to worry about
   breaking parsing of the general /etc/fstab, it also removes clutter
   from complex lines and entries out of /etc/fstab

Are there other needs for uevents which could not be addressed through
a filesystem configuration file or which prove more efficient or provide
more flexibility?

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o Aug. 24, 2017, 11:34 p.m. UTC | #7
On Fri, Aug 25, 2017 at 08:11:05AM +1000, NeilBrown wrote:
> 
> Just to provide contrast, NFS has a config file which can provide
> certain defaults for mounted filesystems.  However it doesn't require a
> udev event to achieve this.

I think we're mixing multiple things, and that may be confusing the
discussion.  A config file parser is a technology.  *How* you use the
config file parser can be quite different even if it is using the same
config file parser or config file format.

From what I've seen in this e-mail thread, there are at least three
different use cases:

1)  How mkfs uses to create the file system
2)  How the file system should be configured when/while it is mounted
3)  Message marshalling and unmarshalling when communicating with the
    kernel via uevents

> Instead, it provides "/sbin/mount.nfs" which imposes the defaults at
> mount time, rather than just after mount time.
> 
> There could, conceivably, be an "/sbin/mount.xfs" which read the config
> file and set all the defaults concurrently with the mount operation.

For nfs there are a large number of knobs that are of interest at
mount time, such that specifying them all as mount options on the
command line, or in /etc/fstab, gets spectacularly painful.  Which is
why /etc/nfsmount.conf makes sense.

For ext4 and nfs we have many more knobs that come into play when the
file system is created, and so for ext4 that's why I have
/etc/mke2fs.conf, where I might have some complex stanzas such as:

    hugefile = {
        features = extent,huge_file,bigalloc,flex_bg,uninit_bg,dir_nlink,extra_isize,^resize_inode,sparse_super2
        cluster_size = 32768
        hash_alg = half_md4
        reserved_ratio = 0.0
        num_backup_sb = 0
        packed_meta_blocks = 1
        make_hugefiles = 1
        inode_ratio = 4194304
        hugefiles_dir = /huge-dir
        hugefiles_name = huge-data
        hugefiles_digits = 0
        hugefiles_size = 0
        hugefiles_align = 256M
        hugefiles_align_disk = true
        num_hugefiles = 1
        zero_hugefiles = false
	flex_bg_size = 262144
    }

...and then the sysadmin only has to type "mke2fs -T hugefile
/dev/sdc3" (for example).

The question of using the same parser for uevents is yet another can
of worms.  And there, I think we need to be careful about what the
requirements are for these uevents.  Uevents is just the technology.
There are many use cases for that technology; are you using them to
signal mounts/unmounts?  And at the file system level or at the mount
namespace level?

Are you using them to single file system corruption problems?  And if
so, are you trying to trying to send a single bit of information ---
"the file system is corrupted"?  Or are you trying to send much more
information --- for example, sending each inode that file system found
that was corrupted, and each block number where the SCSI device found
a media error?  (We have a solution that does this inside Google, and
it uses a netlink socket to do the delievery.  I'm pretty sure that if
you used uevents, udev could melt down if there are a large number of
errors reported.)

And then the final question here is how much does standardization
help?  Are there use cases where progams really want to use the same
format when updating /etc/mke2fs.conf versus /etc/nfsmount.conf versus
/etc/mkfs.xfs.conf?  Or is this more of, "let's try to make life
slightly easier for the user by keeping things mostly the same as we
can"?  Or is this just a matter of file system developers sharing
their experiences of various different designs, just so that they are
aware of what libaries or code bases are available for them to use?

Cheers,

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luis Chamberlain Aug. 25, 2017, 12:02 a.m. UTC | #8
On Thu, Aug 24, 2017 at 07:34:53PM -0400, Theodore Ts'o wrote:
> On Fri, Aug 25, 2017 at 08:11:05AM +1000, NeilBrown wrote:
> > 
> > Just to provide contrast, NFS has a config file which can provide
> > certain defaults for mounted filesystems.  However it doesn't require a
> > udev event to achieve this.
> 
> I think we're mixing multiple things, and that may be confusing the
> discussion.  A config file parser is a technology.  *How* you use the
> config file parser can be quite different even if it is using the same
> config file parser or config file format.
> 
> From what I've seen in this e-mail thread, there are at least three
> different use cases:
> 
> 1)  How mkfs uses to create the file system
> 2)  How the file system should be configured when/while it is mounted
> 3)  Message marshalling and unmarshalling when communicating with the
>     kernel via uevents

Right, its perhaps easier to relate them to the tools used for
interfacing with filesystems:

a) mkfs.<filesystem>
b) mount.<filesystem>
c) new but in this particular use case it was to parse a configuration
   file to tweak a set of sysfs knobs which tune the filesystem

c) is somewhat related to b) which begs the question if some of these
parameters can instead be set at mount time, and if not why not. This
is also why the question came up about making some of these knobs
generic.

> The question of using the same parser for uevents is yet another can
> of worms.  And there, I think we need to be careful about what the
> requirements are for these uevents.  Uevents is just the technology.
> There are many use cases for that technology; are you using them to
> signal mounts/unmounts?  And at the file system level or at the mount
> namespace level?

In the given use case which brought this up it was to tweak specific
filesystems with XFS syfs knobs for error handling, as an example
shut down the filesyste on any IO error, for a careful pedantic
and paranoid admin/setup.

> Are you using them to single file system corruption problems?  And if
> so, are you trying to trying to send a single bit of information ---
> "the file system is corrupted"?  Or are you trying to send much more
> information --- for example, sending each inode that file system found
> that was corrupted, and each block number where the SCSI device found
> a media error?  (We have a solution that does this inside Google, and
> it uses a netlink socket to do the delievery.  I'm pretty sure that if
> you used uevents, udev could melt down if there are a large number of
> errors reported.)

Such uses seem rather advanced and perhaps should be considered in the
future.  I agree netlink is much more suitable in that case (or generic
netlink), that was designed to scale, at least at the networking level.

> And then the final question here is how much does standardization
> help?

One aspect here was the syfs knobs and whether or not we wanted
a shared set of errors for which we want to grant admins a way
to say easily "Hey for these types of errors please shutdown the
filesystem". Let's start off with that as it stays a bit more on
topic for now.

For the rest (mkfs.<filesystem>, mount.<filesystem>) I think that
those already have generic attributes figured out, and that custom
stuff has been thought out as well.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Aug. 25, 2017, 12:27 a.m. UTC | #9
On Thu, Aug 24, 2017 at 10:10:28AM -0700, Darrick J. Wong wrote:
> On Thu, Aug 24, 2017 at 09:41:55PM +0800, Hou Tao wrote:
> > It will be useful if there is a corresponding online uevent after
> > a XFS filesystem has been mounted. A typical usage of the uevent
> > is setting the error configuration for a specific XFS filesystem
> > or all XFS filesystems by using udevd.
> > 
> > The following is an example of udevd rule which will shutdown
> > any XFS filesystem after the filesystem gets any IO error:
> > 
> >     ACTION=="online", SUBSYSTEM=="xfs", DEVPATH=="/fs/xfs/*", \
> >     RUN+="/bin/sh -c 'echo 0 > /sys%p/error/metadata/default/max_retries; \
> > 	    echo 0 > /sys%p/error/metadata/EIO/max_retries; \
> > 	    echo 0 > /sys%p/error/metadata/ENOSPC/max_retries; \
> > 	    echo 0 > /sys%p/error/metadata/ENODEV/max_retries'"
> >
> > Suggested-by: Dave Chinner <david@fromorbit.com>
> > Signed-off-by: Hou Tao <houtao1@huawei.com>
> > ---
> >  fs/xfs/xfs_super.c | 15 +++++++++++++++
> >  1 file changed, 15 insertions(+)
> > 
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index 38aaacd..695fe85f 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -1530,6 +1530,19 @@ xfs_destroy_percpu_counters(
> >  	percpu_counter_destroy(&mp->m_fdblocks);
> >  }
> >  
> > +static void
> > +xfs_fs_uevent(
> > +	struct xfs_mount	*mp,
> > +	enum kobject_action	action)
> > +{
> > +	int err;
> > +
> > +	err = kobject_uevent(&mp->m_kobj.kobject, action);
> 
> It might be useful to send kobject_uevent_env so that we can squeeze in
> things like the fs uuid for easier identification?  I envision a
> /etc/xfs/fs.conf file like this:
> 
> [903db5b1-82cd-4f26-8221-443a4ed0563a]
> error.metadata.EIO.max_retries = 0
> error.fail_at_unmount = 1
> config.cowextsize = 1048576
> config.inherit_noatime = 1
> 
> With a udev helper that uses the fsuuid to find the appropriate section
> of the ini^Wconfig file and load the appropriate values into sysfs.  If
> we ever decide to expose more config knobs through sysfs then they'll
> be automatically supported.

Yup, that's sounds like a good idea.

> > +	if (err)
> > +		xfs_notice(mp, "Sending XFS uevent %d got error %d",
> > +				action, err);
> > +}
> > +
> >  STATIC int
> >  xfs_fs_fill_super(
> >  	struct super_block	*sb,
> > @@ -1667,6 +1680,8 @@ xfs_fs_fill_super(
> >  		goto out_unmount;
> >  	}
> >  
> > +	xfs_fs_uevent(mp, KOBJ_ONLINE);
> 
> That said, I wonder if we ought to move this discussion to fsdevel to
> pull in any /other/ filesystems that have sysfs knobs that they might
> like to be able to persist?  ext4 has a bunch of tunables too...

Nope. We'll never get anywhere if we do that, just like the last
umpteenth times that generic fs events have been discussed. These
are XFS specific kobjs so we can do what we like with them and we
have an XFS specific problem that needs solving....

Cheers,

Dave.
Dave Chinner Aug. 25, 2017, 1:01 a.m. UTC | #10
On Thu, Aug 24, 2017 at 08:04:41PM +0200, Luis R. Rodriguez wrote:
> While all this is nice, I'm sure we're all aware of the dangers of setting
> things in stone through sysfs, its likely already decided for the above
> tunables, but adding a uevent *is* yet another layer of user interface
> which userspace can *expect* and we should be certain we want this so
> we won't regress userspace later.
> 
> Just saying, we better damn be sure this is the way we want to go.

I'm not sure what you are trying to warn us about? :/

These are events on an XFS specific kobj, it's not a generic VFS
filesystem event we are generating here. It's no different from a
hardware device generating it's own uevents to tell userspace
something changed in the device.

A quick grep would have told you that GFS2 already has it's own
online/offline uevents (e.g. gfs2_online_uevent()), as does the
DLM code. Orangefs seems to use quite a few of uevent notifications,
too. So it's not like we're doing something controversial or unique
here, nor something that locks us out of a non-existent VFS event
notification mechanism.

Please, talk all you want about a Grand Unified VFS Event
Notification infrastructure and filesystem tool libraries, but
please don't hijack an unrelated discussion to do it.

Cheers,

Dave.
Luis Chamberlain Aug. 25, 2017, 1:20 a.m. UTC | #11
On Fri, Aug 25, 2017 at 11:01:55AM +1000, Dave Chinner wrote:
> On Thu, Aug 24, 2017 at 08:04:41PM +0200, Luis R. Rodriguez wrote:
> > While all this is nice, I'm sure we're all aware of the dangers of setting
> > things in stone through sysfs, its likely already decided for the above
> > tunables, but adding a uevent *is* yet another layer of user interface
> > which userspace can *expect* and we should be certain we want this so
> > we won't regress userspace later.
> > 
> > Just saying, we better damn be sure this is the way we want to go.
> 
> I'm not sure what you are trying to warn us about? :/
> 
> These are events on an XFS specific kobj, it's not a generic VFS
> filesystem event we are generating here. It's no different from a
> hardware device generating it's own uevents to tell userspace
> something changed in the device.

I meant that once its sent even if it is XFS specific, it will be
something that some userspace can expect and then we'd always have
to send it later.

> A quick grep would have told you that GFS2 already has it's own
> online/offline uevents (e.g. gfs2_online_uevent()), as does the
> DLM code. Orangefs seems to use quite a few of uevent notifications,
> too. So it's not like we're doing something controversial or unique
> here, nor something that locks us out of a non-existent VFS event
> notification mechanism.

I don't think its controversial *at all*. Just that a mount uevent,
with uuid, sounds like something we could at least agree is pretty generic.

Regardless of what we end up doing with it or how.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Aug. 25, 2017, 2:18 a.m. UTC | #12
On Fri, Aug 25, 2017 at 03:20:01AM +0200, Luis R. Rodriguez wrote:
> On Fri, Aug 25, 2017 at 11:01:55AM +1000, Dave Chinner wrote:
> > On Thu, Aug 24, 2017 at 08:04:41PM +0200, Luis R. Rodriguez wrote:
> > > While all this is nice, I'm sure we're all aware of the dangers of setting
> > > things in stone through sysfs, its likely already decided for the above
> > > tunables, but adding a uevent *is* yet another layer of user interface
> > > which userspace can *expect* and we should be certain we want this so
> > > we won't regress userspace later.
> > > 
> > > Just saying, we better damn be sure this is the way we want to go.
> > 
> > I'm not sure what you are trying to warn us about? :/
> > 
> > These are events on an XFS specific kobj, it's not a generic VFS
> > filesystem event we are generating here. It's no different from a
> > hardware device generating it's own uevents to tell userspace
> > something changed in the device.
> 
> I meant that once its sent even if it is XFS specific, it will be
> something that some userspace can expect and then we'd always have
> to send it later.

Well, yes. Just like we have to support ioctls, /proc and /sysfs
interfaces and the quota event interface essentially forever.
There's nothing new or difficult about that.

> > A quick grep would have told you that GFS2 already has it's own
> > online/offline uevents (e.g. gfs2_online_uevent()), as does the
> > DLM code. Orangefs seems to use quite a few of uevent notifications,
> > too. So it's not like we're doing something controversial or unique
> > here, nor something that locks us out of a non-existent VFS event
> > notification mechanism.
> 
> I don't think its controversial *at all*. Just that a mount uevent,
> with uuid, sounds like something we could at least agree is pretty generic.

Yeah, right. Call me cynical, but every single time this has been
brought up it devolves into a paint-fest where everyone wants
something to be added before anything can be done because generic
filesystem events is a deep, dark complex hole. e.g. think of bind
mounts: go and develop arguments for and against whether we should
send generic mount notifications for bind mounts. Generic doesn't
mean simple.

IOWs, what you describe as "pretty generic", I see as "a great big
can of worms".

Cheers,

Dave.
Darrick J. Wong Aug. 25, 2017, 4:54 a.m. UTC | #13
On Thu, Aug 24, 2017 at 07:34:53PM -0400, Theodore Ts'o wrote:
> On Fri, Aug 25, 2017 at 08:11:05AM +1000, NeilBrown wrote:
> > 
> > Just to provide contrast, NFS has a config file which can provide
> > certain defaults for mounted filesystems.  However it doesn't require a
> > udev event to achieve this.
> 
> I think we're mixing multiple things, and that may be confusing the
> discussion.  A config file parser is a technology.  *How* you use the
> config file parser can be quite different even if it is using the same
> config file parser or config file format.

<shrug> Well, they /are/ interlinked, at least -- if XFS were to adopt
a program to inject config file values into the kernel and a config file
for mkfs, I think it makes sense at least to think about sharing the
same parser for both, though we needn't do that in /this/ particular
thread.  OTOH it's something to keep in the back of one's head while we
hash out what, if anything, to put in a mount uevent.

> From what I've seen in this e-mail thread, there are at least three
> different use cases:
> 
> 1)  How mkfs uses to create the file system
> 2)  How the file system should be configured when/while it is mounted
> 3)  Message marshalling and unmarshalling when communicating with the
>     kernel via uevents

I wasn't suggesting we try to make uevent formats so fancy that they
require parsing; the current environment variable mechanism is just
fine.

> > Instead, it provides "/sbin/mount.nfs" which imposes the defaults at
> > mount time, rather than just after mount time.
> > 
> > There could, conceivably, be an "/sbin/mount.xfs" which read the config
> > file and set all the defaults concurrently with the mount operation.
> 
> For nfs there are a large number of knobs that are of interest at
> mount time, such that specifying them all as mount options on the
> command line, or in /etc/fstab, gets spectacularly painful.  Which is
> why /etc/nfsmount.conf makes sense.
> 
> For ext4 and nfs we have many more knobs that come into play when the
> file system is created, and so for ext4 that's why I have
> /etc/mke2fs.conf, where I might have some complex stanzas such as:
> 
>     hugefile = {
>         features = extent,huge_file,bigalloc,flex_bg,uninit_bg,dir_nlink,extra_isize,^resize_inode,sparse_super2
>         cluster_size = 32768
>         hash_alg = half_md4
>         reserved_ratio = 0.0
>         num_backup_sb = 0
>         packed_meta_blocks = 1
>         make_hugefiles = 1
>         inode_ratio = 4194304
>         hugefiles_dir = /huge-dir
>         hugefiles_name = huge-data
>         hugefiles_digits = 0
>         hugefiles_size = 0
>         hugefiles_align = 256M
>         hugefiles_align_disk = true
>         num_hugefiles = 1
>         zero_hugefiles = false
> 	flex_bg_size = 262144
>     }
> 
> ...and then the sysadmin only has to type "mke2fs -T hugefile
> /dev/sdc3" (for example).

(FWIW I'm interested in pursuing usage types in a mkfs.xfs config but
I'll hold on to that until we start that discussion.)

> The question of using the same parser for uevents is yet another can
> of worms.  And there, I think we need to be careful about what the
> requirements are for these uevents.  Uevents is just the technology.
> There are many use cases for that technology; are you using them to
> signal mounts/unmounts?  And at the file system level or at the mount
> namespace level?
> 
> Are you using them to single file system corruption problems?  And if
> so, are you trying to trying to send a single bit of information ---
> "the file system is corrupted"?  Or are you trying to send much more
> information --- for example, sending each inode that file system found
> that was corrupted, and each block number where the SCSI device found
> a media error?  (We have a solution that does this inside Google, and
> it uses a netlink socket to do the delievery.  I'm pretty sure that if
> you used uevents, udev could melt down if there are a large number of
> errors reported.)

For now I'd only send along the fact that the fs shut down.  If XFS ever
grows the ability to receive notifications that a range of blocks has
gone bad we could (ratelimited) push that up too.

> And then the final question here is how much does standardization
> help?  Are there use cases where progams really want to use the same
> format when updating /etc/mke2fs.conf versus /etc/nfsmount.conf versus
> /etc/mkfs.xfs.conf?  Or is this more of, "let's try to make life
> slightly easier for the user by keeping things mostly the same as we
> can"?  Or is this just a matter of file system developers sharing
> their experiences of various different designs, just so that they are
> aware of what libaries or code bases are available for them to use?

/I/ was looking for sharing experiences and trying to make life somewhat
easier for administrators by keeping the syntax roughly the same, even
if the particular keys and values aren't exactly the same from one fs to
the other.  Ultimately, we can (and probably will) do whatever we think
is a good fit for XFS, but seeing as there are some things that other
filesystems have done that we haven't, we could at least benefit from
whatever people learned from plumbing support into those filesystms.

I thought about the "mount.nfs mounts the filesystem and then plugs in
the config as desired" but then realized that it's a suid executable and
yeeech. :)

--D

> 
> Cheers,
> 
> 						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong Aug. 25, 2017, 5:29 a.m. UTC | #14
On Fri, Aug 25, 2017 at 12:18:30PM +1000, Dave Chinner wrote:
> On Fri, Aug 25, 2017 at 03:20:01AM +0200, Luis R. Rodriguez wrote:
> > On Fri, Aug 25, 2017 at 11:01:55AM +1000, Dave Chinner wrote:
> > > On Thu, Aug 24, 2017 at 08:04:41PM +0200, Luis R. Rodriguez wrote:
> > > > While all this is nice, I'm sure we're all aware of the dangers of setting
> > > > things in stone through sysfs, its likely already decided for the above
> > > > tunables, but adding a uevent *is* yet another layer of user interface
> > > > which userspace can *expect* and we should be certain we want this so
> > > > we won't regress userspace later.
> > > > 
> > > > Just saying, we better damn be sure this is the way we want to go.
> > > 
> > > I'm not sure what you are trying to warn us about? :/
> > > 
> > > These are events on an XFS specific kobj, it's not a generic VFS
> > > filesystem event we are generating here. It's no different from a
> > > hardware device generating it's own uevents to tell userspace
> > > something changed in the device.
> > 
> > I meant that once its sent even if it is XFS specific, it will be
> > something that some userspace can expect and then we'd always have
> > to send it later.
> 
> Well, yes. Just like we have to support ioctls, /proc and /sysfs
> interfaces and the quota event interface essentially forever.
> There's nothing new or difficult about that.
> 
> > > A quick grep would have told you that GFS2 already has it's own
> > > online/offline uevents (e.g. gfs2_online_uevent()), as does the
> > > DLM code. Orangefs seems to use quite a few of uevent notifications,
> > > too. So it's not like we're doing something controversial or unique
> > > here, nor something that locks us out of a non-existent VFS event
> > > notification mechanism.
> > 
> > I don't think its controversial *at all*. Just that a mount uevent,
> > with uuid, sounds like something we could at least agree is pretty generic.

It's not a slam dunk -- look at all the rattling that went on when
overlayfs tried to use s_uuid.

> Yeah, right. Call me cynical, but every single time this has been
> brought up it devolves into a paint-fest where everyone wants
> something to be added before anything can be done because generic
> filesystem events is a deep, dark complex hole. e.g. think of bind
> mounts: go and develop arguments for and against whether we should
> send generic mount notifications for bind mounts. Generic doesn't
> mean simple.
>
> IOWs, what you describe as "pretty generic", I see as "a great big
> can of worms".

By my count there are 70 filesystems (in fs/ anyway) so the likelihood
of successfully coordinating the development of a common vfs uevent
anything is so low I wasn't even going to consider it.  I've proposed
patches that change things across all the filesystems before; it sucks,
and meh.

I don't see a problem with continuing to let individual filesystems
manage their own kobject attributes and events, but frankly I've found
it very useful to absorb a more diverse set of points of view by asking
other filesystem developers what they've done and what pain points they
might have hit.

What I'd really like to see is that XFS retains control of its per-fs
kobj.  We'll add FS UUID as an environment variable to the uevent patch
(XFS has uuids, no controversy there!) and run it by fsdevel to see if
anyone from the wider fsdevel audience can make a convincing argument to
add/remove/change the variables that get passed to the uevent, or even
to consider a different approach.  (This is what seems to be happening,
if in a somewhat incohesive manner.)  Then we do our usual rounds of
review and (presumably) merge it along with the appropriate xfsprogs
helper.

Since uevents can send freeform text to the uevent handler in the form
of environment variables, this means that any other fs project that
might be considering adding uevents can build on the keys we've created.
They can also do their own thing.  I've written udev rule files, it's
not difficult to adapt the ruleset to format shifts, at least not as
difficult as structure changes to an ABI.

IOWs, I'm looking for de facto coordination if/where desired, not a
heavy top down coordination process.  If your project doesn't like it,
you're not bound by it.

Maybe I should've been more explicit with what I was looking for, but
TBH in the email just prior to this thread being cc'd to fsdevel I
really was only asking the xfs list "should we take this to fsdevel or
just leave it here"?

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 38aaacd..695fe85f 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1530,6 +1530,19 @@  xfs_destroy_percpu_counters(
 	percpu_counter_destroy(&mp->m_fdblocks);
 }
 
+static void
+xfs_fs_uevent(
+	struct xfs_mount	*mp,
+	enum kobject_action	action)
+{
+	int err;
+
+	err = kobject_uevent(&mp->m_kobj.kobject, action);
+	if (err)
+		xfs_notice(mp, "Sending XFS uevent %d got error %d",
+				action, err);
+}
+
 STATIC int
 xfs_fs_fill_super(
 	struct super_block	*sb,
@@ -1667,6 +1680,8 @@  xfs_fs_fill_super(
 		goto out_unmount;
 	}
 
+	xfs_fs_uevent(mp, KOBJ_ONLINE);
+
 	return 0;
 
  out_filestream_unmount: