Message ID | 1468922657-3895-1-git-send-email-cmaiolino@redhat.com (mailing list archive) |
---|---|
State | Superseded, archived |
On 7/19/16 3:04 AM, Carlos Maiolino wrote:
> This is the first try to document the implementation of error handlers into
> sysfs.
>
> Reviews and comments are appreciated, please also notice I'm not english-native,
> so, spelling corrections are also appreciated :)

Thanks for doing this!

There seems to be a specific sysfs documentation format, see for example
Documentation/ABI/testing/sysfs-fs-ext4

It might be better to follow that format, and refer to it after a brief
explanation of the functionality in the xfs.txt file?

-Eric

On 7/19/16 2:15 PM, Eric Sandeen wrote:
> On 7/19/16 3:04 AM, Carlos Maiolino wrote:
>> This is the first try to document the implementation of error handlers into
>> sysfs.
>>
>> Reviews and comments are appreciated, please also notice I'm not english-native,
>> so, spelling corrections are also appreciated :)
>
> Thanks for doing this!
>
> There seems to be a specific sysfs documentation format, see for example
> Documentation/ABI/testing/sysfs-fs-ext4
>
> It might be better to follow that format, and refer to it after a brief
> explanation of the functionality in the xfs.txt file?

Or not; Dave doesn't like this location, so perhaps best not to take
my suggestion. ;)

-Eric

On Tue, Jul 19, 2016 at 11:18:01PM -0700, Eric Sandeen wrote:
>
>
> On 7/19/16 2:15 PM, Eric Sandeen wrote:
> > On 7/19/16 3:04 AM, Carlos Maiolino wrote:
> >> This is the first try to document the implementation of error handlers into
> >> sysfs.
> >>
> >> Reviews and comments are appreciated, please also notice I'm not english-native,
> >> so, spelling corrections are also appreciated :)
> >
> > Thanks for doing this!
> >
> > There seems to be a specific sysfs documentation format, see for example
> > Documentation/ABI/testing/sysfs-fs-ext4
> >
> > It might be better to follow that format, and refer to it after a brief
> > explanation of the functionality in the xfs.txt file?
>
> Or not; Dave doesn't like this location, so perhaps best not to take
> my suggestion. ;)

Oh, I can see now why he doesn't like that, I've never seen such directory until
you mentioned it, why should it be so hidden, and why should we split filesystem
information into different locations.

IMHO, if someone want to take a look into filesystem documentation, the person
goes directly to Documentation/filesystems, I honestly think splitting
information into two different directories are wrong, and, even though you point
to there in some other place, it is still bad, sounds like a RPG book... Start
here...now go to page X...now go to page Y...now go to page Z.

I can re-format the documentation to the same format from sysfs-fs-ext4, but I
believe keeping it under Documentation/filesystems is still the best to do. To
be honest, I actually think we should create an XFS directory under it and put
everything xfs related there.

Cheers

>
> -Eric
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

On Wed, Jul 20, 2016 at 11:04 AM, Carlos Maiolino <cmaiolino@redhat.com> wrote:
>
> IMHO, if someone want to take a look into filesystem documentation, the person
> goes directly to Documentation/filesystems, I honestly think splitting
> information into two different directories are wrong, and, even though you point
> to there in some other place, it is still bad, sounds like a RPG book... Start
> here...now go to page X...now go to page Y...now go to page Z.
>

I'm sorry for this offtopic, but I would almost bet that I saw a game
made in man pages. Yet Google can't find anything, so maybe it is a
kind of deja vu, or a common experience... :(

Jan

On Wed, Jul 20, 2016 at 11:04:06AM +0200, Carlos Maiolino wrote:
> On Tue, Jul 19, 2016 at 11:18:01PM -0700, Eric Sandeen wrote:
> >
> >
> > On 7/19/16 2:15 PM, Eric Sandeen wrote:
> > > On 7/19/16 3:04 AM, Carlos Maiolino wrote:
> > >> This is the first try to document the implementation of error handlers into
> > >> sysfs.
> > >>
> > >> Reviews and comments are appreciated, please also notice I'm not english-native,
> > >> so, spelling corrections are also appreciated :)
> > >
> > > Thanks for doing this!
> > >
> > > There seems to be a specific sysfs documentation format, see for example
> > > Documentation/ABI/testing/sysfs-fs-ext4
> > >
> > > It might be better to follow that format, and refer to it after a brief
> > > explanation of the functionality in the xfs.txt file?
> >
> > Or not; Dave doesn't like this location, so perhaps best not to take
> > my suggestion. ;)
>
> Oh, I can see now why he doesn't like that, I've never seen such directory until
> you mentioned it, why should it be so hidden, and why should we split filesystem
> information into different locations.
>
> IMHO, if someone want to take a look into filesystem documentation, the person
> goes directly to Documentation/filesystems, I honestly think splitting
> information into two different directories are wrong, and, even though you point
> to there in some other place, it is still bad, sounds like a RPG book... Start
> here...now go to page X...now go to page Y...now go to page Z.
>
> I can re-format the documentation to the same format from sysfs-fs-ext4, but I
> believe keeping it under Documentation/filesystems is still the best to do. To
> be honest, I actually think we should create an XFS directory under it and put
> everything xfs related there.

I'd just add it to Doc/fs/xfs.txt right now, and we can work out
restructuring details later. Especially as we really need this
documentation added to the xfs-documentation repo (along with a "how
to use" guide). It's a similar situation to the libxfs code shared
between kernel and userspace, I think...

Cheers,

Dave.

On Tue, Jul 19, 2016 at 12:04:17PM +0200, Carlos Maiolino wrote:
> This is the first try to document the implementation of error handlers into
> sysfs.
>
> Reviews and comments are appreciated, please also notice I'm not english-native,
> so, spelling corrections are also appreciated :)
>
> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> ---
>  Documentation/filesystems/xfs.txt | 78 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 78 insertions(+)
>
> diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
> index 8146e9f..1df868a 100644
> --- a/Documentation/filesystems/xfs.txt
> +++ b/Documentation/filesystems/xfs.txt
> @@ -348,3 +348,81 @@ Removed Sysctls
>  ---- -------
>  fs.xfs.xfsbufd_centisec v4.0
>  fs.xfs.age_buffer_centisecs v4.0
> +
> +Error handling
> +==============
> +
> +XFS can act differently according with the type of error found
> +during its operation. The implementation introduces the following
> +concepts to the error handler:
> +
> + -failure speed:
> +        Defines how fast XFS should shutdown in case of a specific
> +        error is found during the filesystem operation. It can
> +        shutdown immediately, after a defined number of tries, or
> +        simply try forever, which was the old behavior and is now
> +        set as default behavior, except during unmount time, where
> +        in case of a error is found while unmounting, the filesystem
> +        will shutdown.
> +
> + -error classes:
> +        Specifies the subsystem/location where the error handlers
> +        configure the behavior for, such as metadata or memory allocation.
> +
> + -error handlers:
> +        Defines the behavior for a specific error.
> +
> +The filesystem behavior during an error can be set via sysfs files, where, the
> +errors are organized with the following structure:
> +
> +  /sys/fs/xfs/<dev>/error/<class>/<error>/
> +
> +Each directory contains:
> +
> + /sys/fs/xfs/<dev>/error/
> +
> +        fail_at_unmount (Min: 0 Default: 1 Max: 1)
> +                Defines the global error behavior during unmount time. If set to
> +                "1", XFS will shutdown in case of any error is found, otherwise,
> +                if set to "0", the filesystem will indefinitely retry to cleanly
> +                unmount the filesystem.

Hi Carlos,

Could you explain more about the relationship of fail_at_unmount and
max_retries(/retry_timeout_seconds). For example, if I set fail_at_unmount=0,
and set EIO/max_retries=1, what's expected?

I'd like to write test case about this error handling, according to
your document.

Thanks,
Zorro

> +
> +        <class> subdirectories
> +                Contains specific error handlers configuration
> +                (Ex: /sys/fs/xfs/<dev>/error/metadata).
> +
> + /sys/fs/xfs/<dev>/error/<class>/
> +
> +        The contents of this directory are <class> specific, since each <class>
> +        might need to handle different types of errors. All <error> directory
> +        though, contains the "default" directory, which is a global configuration
> +        for errors not available for independent configuration.
> +
> + /sys/fs/xfs/<dev>/error/<class>/<error>
> +
> +        Contains the failure speed configuration files for each specific error,
> +        including the "default" behavior, which contains the same configuration
> +        options as the specific errors.
> +
> +        The available configurations for each error type are:
> +
> +        max_retries (Min: -1 Default: -1 Max: INTMAX)
> +                Define how many tries the filesystem is allowed to retry its
> +                operations during the specific error, before shutdown the
> +                filesystem. Setting this file to "-1", will set XFS to retry
> +                forever in the specific error, setting it to "0", will make
> +                XFS to fail immediately after the specific error is found,
> +                while setting it to a "N" value, where N is greater than 0,
> +                will make XFS retry "N" times before shutdown.
> +
> +        retry_timeout_seconds (Min: 0 Default: 0 Max: INTMAX)
> +                Define the amount of time (in seconds) that the filesystem is
> +                allowed to retry its operations when the specific error is
> +                found. "0" means no wait time.
> +
> +
> +        "max_retries" takes precedence over "retry_timeout_seconds", where,
> +        "retry_timeout_seconds" will only be tested if the "max_retries" limit
> +        were not reached yet or is set to retry forever ("-1"). If "max_retries"
> +        limit is reached, the filesystem will shutdown, wether or not
> +        "retry_timeout_seconds" has been reached.
> --
> 2.7.4
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

On Fri, Jul 22, 2016 at 12:09:55PM +0800, Zorro Lang wrote:
> On Tue, Jul 19, 2016 at 12:04:17PM +0200, Carlos Maiolino wrote:
> > This is the first try to document the implementation of error handlers into
> > sysfs.
> >
> > Reviews and comments are appreciated, please also notice I'm not english-native,
> > so, spelling corrections are also appreciated :)
> >
> > Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> > ---
> >  Documentation/filesystems/xfs.txt | 78 +++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 78 insertions(+)
> >
> > diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
> > index 8146e9f..1df868a 100644
> > --- a/Documentation/filesystems/xfs.txt
> > +++ b/Documentation/filesystems/xfs.txt
> > @@ -348,3 +348,81 @@ Removed Sysctls
> >  ---- -------
> >  fs.xfs.xfsbufd_centisec v4.0
> >  fs.xfs.age_buffer_centisecs v4.0
> > +
> > +Error handling
> > +==============
> > +
> > +XFS can act differently according with the type of error found
> > +during its operation. The implementation introduces the following
> > +concepts to the error handler:
> > +
> > + -failure speed:
> > +        Defines how fast XFS should shutdown in case of a specific
> > +        error is found during the filesystem operation. It can
> > +        shutdown immediately, after a defined number of tries, or
> > +        simply try forever, which was the old behavior and is now
> > +        set as default behavior, except during unmount time, where
> > +        in case of a error is found while unmounting, the filesystem
> > +        will shutdown.
> > +
> > + -error classes:
> > +        Specifies the subsystem/location where the error handlers
> > +        configure the behavior for, such as metadata or memory allocation.
> > +
> > + -error handlers:
> > +        Defines the behavior for a specific error.
> > +
> > +The filesystem behavior during an error can be set via sysfs files, where, the
> > +errors are organized with the following structure:
> > +
> > +  /sys/fs/xfs/<dev>/error/<class>/<error>/
> > +
> > +Each directory contains:
> > +
> > + /sys/fs/xfs/<dev>/error/
> > +
> > +        fail_at_unmount (Min: 0 Default: 1 Max: 1)
> > +                Defines the global error behavior during unmount time. If set to
> > +                "1", XFS will shutdown in case of any error is found, otherwise,
> > +                if set to "0", the filesystem will indefinitely retry to cleanly
> > +                unmount the filesystem.
>
> Hi Carlos,
>
> Could you explain more about the relationship of fail_at_unmount and
> max_retries(/retry_timeout_seconds). For example, if I set fail_at_unmount=0,
> and set EIO/max_retries=1, what's expected?
>

They are different options, if max_retries is set to 1, it will fail after the
first try as expected, even if during unmount, and even if fail_at_unmount = 0.

The problem, and the reason for us to have added fail_at_unmount, is that, you
can't change any configuration after umount is issued, because the sysfs
directory for the device being unmounted will be detached from sysfs, so, if the
sysadmin wants to make XFS retry forever for any error during the filesystem
operation, he is still able to unmount the filesystem "properly" (since, if the
FS find errors, it might not be a clean mount) if he sets fail_at_unmount,
otherwise, he might have umount process stuck forever.

> I'd like to write test case about this error handling, according to
> your document.
>
> Thanks,
> Zorro
>
> > +
> > +        <class> subdirectories
> > +                Contains specific error handlers configuration
> > +                (Ex: /sys/fs/xfs/<dev>/error/metadata).
> > +
> > + /sys/fs/xfs/<dev>/error/<class>/
> > +
> > +        The contents of this directory are <class> specific, since each <class>
> > +        might need to handle different types of errors. All <error> directory
> > +        though, contains the "default" directory, which is a global configuration
> > +        for errors not available for independent configuration.
> > +
> > + /sys/fs/xfs/<dev>/error/<class>/<error>
> > +
> > +        Contains the failure speed configuration files for each specific error,
> > +        including the "default" behavior, which contains the same configuration
> > +        options as the specific errors.
> > +
> > +        The available configurations for each error type are:
> > +
> > +        max_retries (Min: -1 Default: -1 Max: INTMAX)
> > +                Define how many tries the filesystem is allowed to retry its
> > +                operations during the specific error, before shutdown the
> > +                filesystem. Setting this file to "-1", will set XFS to retry
> > +                forever in the specific error, setting it to "0", will make
> > +                XFS to fail immediately after the specific error is found,
> > +                while setting it to a "N" value, where N is greater than 0,
> > +                will make XFS retry "N" times before shutdown.
> > +
> > +        retry_timeout_seconds (Min: 0 Default: 0 Max: INTMAX)
> > +                Define the amount of time (in seconds) that the filesystem is
> > +                allowed to retry its operations when the specific error is
> > +                found. "0" means no wait time.
> > +
> > +
> > +        "max_retries" takes precedence over "retry_timeout_seconds", where,
> > +        "retry_timeout_seconds" will only be tested if the "max_retries" limit
> > +        were not reached yet or is set to retry forever ("-1"). If "max_retries"
> > +        limit is reached, the filesystem will shutdown, wether or not
> > +        "retry_timeout_seconds" has been reached.
> > --
> > 2.7.4
> >
> > _______________________________________________
> > xfs mailing list
> > xfs@oss.sgi.com
> > http://oss.sgi.com/mailman/listinfo/xfs

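For anyone turning the reply above into a test case, the described interaction between fail_at_unmount, max_retries and retry_timeout_seconds can be written down as a small model. This is only a sketch of the behavior as described in this thread, not the kernel code. The function name `should_shutdown` and its parameters are made up for illustration, `retries_done` is assumed to count the retries already performed for the failing operation, and treating retry_timeout_seconds == 0 as "no time limit" is an assumption, since the patch only says that "0" means no wait time.

```python
RETRY_FOREVER = -1  # documented meaning of max_retries == -1


def should_shutdown(retries_done, elapsed_seconds, max_retries,
                    retry_timeout_seconds, unmounting=False,
                    fail_at_unmount=True):
    """Return True if, under the reading above, XFS would shut down.

    Hypothetical helper for reasoning about test expectations only.
    """
    # max_retries takes precedence: once the retry budget is used up the
    # filesystem shuts down, even during unmount and even if
    # fail_at_unmount is 0.
    if max_retries != RETRY_FOREVER and retries_done >= max_retries:
        return True

    # ASSUMPTION: a timeout of 0 is treated as "no time limit" here.
    if retry_timeout_seconds > 0 and elapsed_seconds >= retry_timeout_seconds:
        return True

    # fail_at_unmount only matters once unmount has been issued: with it
    # set, any error shuts the filesystem down instead of being retried.
    if unmounting and fail_at_unmount:
        return True

    return False
```

Under this reading, the case asked about (fail_at_unmount=0, EIO/max_retries=1) retries once, `should_shutdown(0, 0, 1, 0, unmounting=True, fail_at_unmount=False)` being false, and then shuts down when the retry fails as well. With max_retries=-1 and fail_at_unmount=0 the model never returns true during unmount, which matches the "umount process stuck forever" scenario.
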
Hi folks,

is there any update about this? I didn't see any comments if I need to change
something on this patch to get the documentation applied, or perhaps I missed some e-mail?

Cheers

----- Original Message -----
From: "Carlos Maiolino" <cmaiolino@redhat.com>
To: "Zorro Lang" <zlang@redhat.com>
Cc: xfs@oss.sgi.com
Sent: Friday, July 22, 2016 10:58:04 AM
Subject: Re: [PATCH] xfs: Document error handling behavior

On Fri, Jul 22, 2016 at 12:09:55PM +0800, Zorro Lang wrote:
> On Tue, Jul 19, 2016 at 12:04:17PM +0200, Carlos Maiolino wrote:
> > This is the first try to document the implementation of error handlers into
> > sysfs.
> >
> > Reviews and comments are appreciated, please also notice I'm not english-native,
> > so, spelling corrections are also appreciated :)
> >
> > Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> > ---

On Mon, Aug 08, 2016 at 06:57:15AM -0400, Carlos Eduardo Maiolino wrote:
> Hi folks,
>
> is there any update about this? I didn't see any comments if I need to change
> something on this patch to get the documentation applied, or perhaps I missed some e-mail?

I've been waiting for a v2.

i.e. If you have to explain how fail at unmount works (or doesn't,
in this case) during review, then that clearly needs to be added to
the documentation.

Cheers,

Dave.

On Tue, Aug 09, 2016 at 08:40:11AM +1000, Dave Chinner wrote:
> On Mon, Aug 08, 2016 at 06:57:15AM -0400, Carlos Eduardo Maiolino wrote:
> > Hi folks,
> >
> > is there any update about this? I didn't see any comments if I need to change
> > something on this patch to get the documentation applied, or perhaps I missed some e-mail?
>
> I've been waiting for a v2.
>
> i.e. If you have to explain how fail at unmount works (or doesn't,
> in this case) during review, then that clearly needs to be added to
> the documentation.

Well, can't argue with that, I'll rework it and send a V2 :)

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
index 8146e9f..1df868a 100644
--- a/Documentation/filesystems/xfs.txt
+++ b/Documentation/filesystems/xfs.txt
@@ -348,3 +348,81 @@ Removed Sysctls
 ---- -------
 fs.xfs.xfsbufd_centisec v4.0
 fs.xfs.age_buffer_centisecs v4.0
+
+Error handling
+==============
+
+XFS can act differently according with the type of error found
+during its operation. The implementation introduces the following
+concepts to the error handler:
+
+ -failure speed:
+        Defines how fast XFS should shutdown in case of a specific
+        error is found during the filesystem operation. It can
+        shutdown immediately, after a defined number of tries, or
+        simply try forever, which was the old behavior and is now
+        set as default behavior, except during unmount time, where
+        in case of a error is found while unmounting, the filesystem
+        will shutdown.
+
+ -error classes:
+        Specifies the subsystem/location where the error handlers
+        configure the behavior for, such as metadata or memory allocation.
+
+ -error handlers:
+        Defines the behavior for a specific error.
+
+The filesystem behavior during an error can be set via sysfs files, where, the
+errors are organized with the following structure:
+
+  /sys/fs/xfs/<dev>/error/<class>/<error>/
+
+Each directory contains:
+
+ /sys/fs/xfs/<dev>/error/
+
+        fail_at_unmount (Min: 0 Default: 1 Max: 1)
+                Defines the global error behavior during unmount time. If set to
+                "1", XFS will shutdown in case of any error is found, otherwise,
+                if set to "0", the filesystem will indefinitely retry to cleanly
+                unmount the filesystem.
+
+        <class> subdirectories
+                Contains specific error handlers configuration
+                (Ex: /sys/fs/xfs/<dev>/error/metadata).
+
+ /sys/fs/xfs/<dev>/error/<class>/
+
+        The contents of this directory are <class> specific, since each <class>
+        might need to handle different types of errors. All <error> directory
+        though, contains the "default" directory, which is a global configuration
+        for errors not available for independent configuration.
+
+ /sys/fs/xfs/<dev>/error/<class>/<error>
+
+        Contains the failure speed configuration files for each specific error,
+        including the "default" behavior, which contains the same configuration
+        options as the specific errors.
+
+        The available configurations for each error type are:
+
+        max_retries (Min: -1 Default: -1 Max: INTMAX)
+                Define how many tries the filesystem is allowed to retry its
+                operations during the specific error, before shutdown the
+                filesystem. Setting this file to "-1", will set XFS to retry
+                forever in the specific error, setting it to "0", will make
+                XFS to fail immediately after the specific error is found,
+                while setting it to a "N" value, where N is greater than 0,
+                will make XFS retry "N" times before shutdown.
+
+        retry_timeout_seconds (Min: 0 Default: 0 Max: INTMAX)
+                Define the amount of time (in seconds) that the filesystem is
+                allowed to retry its operations when the specific error is
+                found. "0" means no wait time.
+
+
+        "max_retries" takes precedence over "retry_timeout_seconds", where,
+        "retry_timeout_seconds" will only be tested if the "max_retries" limit
+        were not reached yet or is set to retry forever ("-1"). If "max_retries"
+        limit is reached, the filesystem will shutdown, wether or not
+        "retry_timeout_seconds" has been reached.

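As a rough illustration of the sysfs layout the patch documents, the paths can be assembled and written with a few lines of Python. This is a sketch only: the helper names (`error_knob_path`, `fail_at_unmount_path`, `write_knob`, `read_knob`) are hypothetical, "sda1" is a placeholder device name, and the `root` parameter exists only so the helpers can be pointed at a scratch directory instead of the real /sys/fs/xfs.

```python
from pathlib import Path

# Location given in the patch; overridable so the helpers can be tried
# against a scratch directory (an assumption made for this sketch).
SYSFS_XFS_ROOT = Path("/sys/fs/xfs")


def error_knob_path(dev, error_class, error, knob, root=SYSFS_XFS_ROOT):
    """Build /sys/fs/xfs/<dev>/error/<class>/<error>/<knob>."""
    return Path(root) / dev / "error" / error_class / error / knob


def fail_at_unmount_path(dev, root=SYSFS_XFS_ROOT):
    """Build /sys/fs/xfs/<dev>/error/fail_at_unmount."""
    return Path(root) / dev / "error" / "fail_at_unmount"


def write_knob(path, value):
    """Write an integer setting, newline-terminated as `echo` would."""
    Path(path).write_text(f"{value}\n")


def read_knob(path):
    """Read an integer setting back."""
    return int(Path(path).read_text().strip())
```

On a system with a mounted XFS filesystem and root privileges, `write_knob(error_knob_path("sda1", "metadata", "EIO", "max_retries"), 5)` would correspond to allowing five retries for EIO in the metadata class, and `write_knob(fail_at_unmount_path("sda1"), 1)` to setting fail_at_unmount. The "metadata" class and the EIO error directory are the ones named in this thread; other names would need to be checked on the running kernel.
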
This is the first try to document the implementation of error handlers into
sysfs.

Reviews and comments are appreciated, please also notice I'm not english-native,
so, spelling corrections are also appreciated :)

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
---
 Documentation/filesystems/xfs.txt | 78 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)