Message ID | 20190617125529.6230-9-zyan@redhat.com (mailing list archive)
---|---
State | New, archived
Headers | show
Series | ceph: remount aborted mount
On Mon, Jun 17, 2019 at 5:56 AM Yan, Zheng <zyan@redhat.com> wrote:
>
> After mds evicts session, file locks get lost silently. It's not safe to
> let programs continue to do read/write.

I think one issue with returning EIO on a file handle that did hold a lock is that the application may be programmed to write to other files assuming that lock is never lost, yes? In that case, it may never write to the locked file at all.

Again, I'd like to see SIGLOST sent to the application here. Are there any objections to reviving whatever patchset was in flight to add that? Or to just writing up a new one?
On Mon, 2019-06-17 at 13:32 -0700, Patrick Donnelly wrote:
> I think one issue with returning EIO on a file handle that did hold a
> lock is that the application may be programmed to write to other files
> assuming that lock is never lost, yes? In that case, it may never
> write to the locked file at all.

Sure, applications do all sorts of crazy things. We can only cater to so much craziness. I'll assert that most applications don't have these sorts of weirdo usage patterns, and an error on read/write is more reasonable. Note that this behavior is already documented in fcntl(2), as is the fact that SIGLOST is not implemented.

> Again, I'd like to see SIGLOST sent to the application here. Are there
> any objections to reviving whatever patchset was in flight to add
> that? Or to just writing up a new one?

I think SIGLOST's utility is somewhat questionable. Applications would need to be custom-written to handle it. If you're going to do that, then it may be better to consider other async notification mechanisms: inotify or fanotify, perhaps? Those may be simpler to implement and get merged.
On Mon, Jun 17, 2019 at 1:46 PM Jeff Layton <jlayton@redhat.com> wrote:
> Sure, applications do all sorts of crazy things. We can only cater to so
> much craziness. I'll assert that most applications don't have these
> sorts of weirdo usage patterns, and an error on read/write is more
> reasonable.

It wouldn't surprise me if it's a niche use case, but I hear about a *lot* of applications that know they're running on a distributed fs and use file locks as primitive leader election to select among multiple processes (this happens a lot not only in HPC, but also in distributed database and/or compute projects). That tends to involve precisely what Patrick is describing.

Given that, as you've said, SIGLOST doesn't actually exist in Linux, I don't have a good alternative. But if there were some more proactive way we could tell the application (or let applications enable a more proactive way, like getting EIO on any file access once a lock is lost), it would probably be good.

> Note that this behavior is already documented in fcntl(2) as well, as is
> SIGLOST not being implemented.
>
> I think SIGLOST's utility is somewhat questionable. Applications will
> need to be custom-written to handle it. If you're going to do that, then
> it may be better to consider other async notification mechanisms.
> inotify or fanotify, perhaps? Those may be simpler to implement and get
> merged.
On Mon, Jun 17, 2019 at 1:45 PM Jeff Layton <jlayton@redhat.com> wrote:
> I think SIGLOST's utility is somewhat questionable. Applications will
> need to be custom-written to handle it. If you're going to do that, then
> it may be better to consider other async notification mechanisms.
> inotify or fanotify, perhaps? Those may be simpler to implement and get
> merged.

The utility of SIGLOST is not well understood from the viewpoint of a local file system; the problem uniquely applies to distributed file systems. There simply is no way for an application to recover from a lost lock through POSIX mechanisms. We really need a new signal to just kill the application (by default), because recovery cannot be performed automatically even through system call errors. I don't see how inotify/fanotify (not POSIX interfaces!) helps here, as it assumes the application will even use those system calls to monitor for lost locks when POSIX has no provision for that to happen.

It's worth noting as well that the current behavior of the mount freezing on blacklist is not an acceptable status quo. The application will just silently stall the next time it tries to access the mount, if it ever does.
On Thu, 2019-06-20 at 10:19 -0700, Patrick Donnelly wrote:
> The utility of SIGLOST is not well understood from the viewpoint of a
> local file system. The problem uniquely applies to distributed file
> systems. There simply is no way to recover from a lost lock for an
> application through POSIX mechanisms. We really need a new signal to
> just kill the application (by default) because recovery cannot be
> automatically performed even through system call errors. I don't see
> how inotify/fanotify (not POSIX interfaces!) helps here as it assumes
> the application will even use those system calls to monitor for lost
> locks when POSIX has no provision for that to happen.

(cc'ing Anna in case she has an opinion)

SIGLOST isn't defined in POSIX either, so I'm not sure that argument carries much weight. The _only_ way you'd be able to add SIGLOST is if it defaults to SIG_IGN, such that only applications that are watching for it will react to it. Given that, you're already looking at code modifications.

So, the real question is: what's the best method to watch for lost locks? I don't have a terribly strong opinion about any of these notification methods, tbh. I only suggested inotify/fanotify because they'd likely be much simpler to implement. Adding signals is a non-trivial affair, as we're running out of bits in that space. The lower 32 bits are all taken and the upper ones are reserved for realtime signals. My signal.h has a commented-out SIGLOST:

    #define SIGIO 29
    #define SIGPOLL SIGIO
    /*
    #define SIGLOST 29
    */

Sharing the value with SIGIO/SIGPOLL makes it distinctly less useful; I think it'd probably need its own value. Maybe there is some way to have the application ask that one of the realtime signals be set up for this?

> It's worth noting as well that the current behavior of the mount
> freezing on blacklist is not an acceptable status quo. The application
> will just silently stall the next time it tries to access the mount,
> if it ever does.

Fair enough.
On Thu, Jun 20, 2019 at 11:18 AM Jeff Layton <jlayton@redhat.com> wrote:
> SIGLOST isn't defined in POSIX either, so I'm not sure that argument
> carries much weight. The _only_ way you'd be able to add SIGLOST is if
> it defaults to SIG_IGN, such that only applications that are watching
> for it will react to it. Given that, you're already looking at code
> modifications.

Why does the default need to be SIG_IGN? Is that the existing convention for new signals in Linux?

> So, the real question is: what's the best method to watch for lost
> locks? [...] Sharing the value with SIGIO/SIGPOLL makes it distinctly
> less useful. I think it'd probably need its own value. Maybe there is
> some way to have the application ask that one of the realtime signals
> be set up for this?

Well, SIGPOLL is on its way out according to the standards, so SIGIO looks like what Linux uses instead. Looking at the man page for signal.h, I wonder if we could use SIGIO with si_code==POLL_LOST (a new code), si_band==POLL_MSG, and si_fd==<locked fd>. The inotify interface could then be used to process the event?

The one nit here is that we would be generating SIGIO for regular files (and directories?), which would be new. It makes sense with what we want to do, though. Also, SIGIO's default behavior is to terminate the process.
On Thu, 2019-06-20 at 11:51 -0700, Patrick Donnelly wrote:
> Why does the default need to be SIG_IGN? Is that the existing
> convention for new signals in Linux?

No, it's just that if you don't do that and locks are lost, you'll have a bunch of applications suddenly crash. That sounds scary.

> Well, SIGPOLL is on its way out according to the standards, so SIGIO
> looks like what Linux uses instead. [...] I wonder if we could use
> SIGIO with si_code==POLL_LOST (a new code), si_band==POLL_MSG, and
> si_fd==<locked fd>. [...]
>
> The one nit here is that we would be generating SIGIO for regular
> files (and directories?), which would be new. [...] Also, SIGIO's
> default behavior is to terminate the process.

That sounds like it could have unintended side-effects. A systemwide event that causes a ton of userland processes to suddenly catch a fatal signal seems rather nasty.

It's also not clear to me how you'll identify recipients for this signal. What tasks will get a SIGLOST when this occurs? Going from file descriptors or inodes to the tasks that are associated with them is not straightforward.
On Thu, Jun 20, 2019 at 12:00 PM Jeff Layton <jlayton@redhat.com> wrote:
> That sounds like it could have unintended side-effects. A systemwide
> event that causes a ton of userland processes to suddenly catch a fatal
> signal seems rather nasty.

To be clear: that's only if the mount is configured in the most conservative way. Killing only userland processes with file locks would be an intermediate option (and maybe a default).

> It's also not clear to me how you'll identify recipients for this
> signal. What tasks will get a SIGLOST when this occurs? Going from file
> descriptors or inodes to the tasks that are associated with them is not
> straightforward.

We could start with file locks (which do have owners?) and table the idea of killing all tasks that have any kind of file descriptor open.
On Thu, 2019-06-20 at 12:50 -0700, Patrick Donnelly wrote:
> To be clear: that's only if the mount is configured in the most
> conservative way. Killing only userland processes with file locks
> would be an intermediate option (and maybe a default).

A disable switch for this behavior would be a minimum requirement, I think.

> We could start with file locks (which do have owners?) and table the
> idea of killing all tasks that have any kind of file descriptor open.

Well...we do track current->tgid for l_pid, so you could probably go by that for traditional POSIX locks.

For flock() and OFD locks, though, the locks are owned by the open file description, and those can be shared between processes. So even if you kill the tgid that acquired the lock, there's no guarantee that other processes that might be using that lock will get the signal. Not that that's a real argument against doing this, but this approach could have significant gaps.

OTOH, reporting a lost lock via fanotify would be quite straightforward (and not even that difficult). Then any process that really cares could watch for these events.

Again, I really think I'm missing the big picture on the problem you're attempting to solve with this. For instance, I was operating under the assumption that you wanted this to be an opt-in thing, but it sounds like you're aiming for something more draconian. I'm not convinced that's a good idea -- especially not initially. Enabling this by default could be a very unwelcome surprise in some environments.
On Thu, Jun 20, 2019 at 1:06 PM Jeff Layton <jlayton@redhat.com> wrote:
> For flock() and OFD locks, though, the locks are owned by the open file
> description, and those can be shared between processes. So even if you
> kill the tgid that acquired the lock, there's no guarantee that other
> processes that might be using that lock will get the signal. Not that
> that's a real argument against doing this, but this approach could have
> significant gaps.

I wonder if it's actually common for a process to share locked file descriptors? I'm not even sure what that should mean.

> OTOH, reporting a lost lock via fanotify would be quite straightforward
> (and not even that difficult). Then any process that really cares could
> watch for these events.
>
> Again, I really think I'm missing the big picture on the problem you're
> attempting to solve with this.

I may be zooming out farther than you want, but here's "the big picture": a kernel cephfs mount should recover after blacklisting and continue to be usable, at least for new processes/applications. Recovery should be automatic, without admin intervention.

> For instance, I was operating under the assumption that you wanted this
> to be an opt-in thing, but it sounds like you're aiming for something
> that is more draconian. I'm not convinced that that's a good idea --
> especially not initially. Enabling this by default could be a very
> unwelcome surprise in some environments.

Losing file locks is a serious issue that is grounds for terminating applications. Reminder once again: the status quo is that the application freezes without _any_ way to recover. Almost any change is an improvement over that behavior, including termination, because then monitor processes/init may recover. I'm not looking to build "opt-in" mechanisms (because the alternative is what? hang forever?), but I am happy to make the behavior configurable via mount options.
On Thu, 2019-06-20 at 15:25 -0700, Patrick Donnelly wrote:
> On Thu, Jun 20, 2019 at 1:06 PM Jeff Layton <jlayton@redhat.com> wrote:
> > On Thu, 2019-06-20 at 12:50 -0700, Patrick Donnelly wrote:
> > > On Thu, Jun 20, 2019 at 12:00 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > On Thu, 2019-06-20 at 11:51 -0700, Patrick Donnelly wrote:
> > > > > On Thu, Jun 20, 2019 at 11:18 AM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > On Thu, 2019-06-20 at 10:19 -0700, Patrick Donnelly wrote:
> > > > > > > On Mon, Jun 17, 2019 at 1:45 PM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > > > > > Again, I'd like to see SIGLOST sent to the application here. Are there
> > > > > > > > > any objections to reviving whatever patchset was in flight to add
> > > > > > > > > that? Or just writeup a new one?
> > > > > > > >
> > > > > > > > I think SIGLOST's utility is somewhat questionable. Applications will
> > > > > > > > need to be custom-written to handle it. If you're going to do that, then
> > > > > > > > it may be better to consider other async notification mechanisms.
> > > > > > > > inotify or fanotify, perhaps? Those may be simpler to implement and get
> > > > > > > > merged.
> > > > > > >
> > > > > > > The utility of SIGLOST is not well understood from the viewpoint of a
> > > > > > > local file system. The problem uniquely applies to distributed file
> > > > > > > systems. There simply is no way to recover from a lost lock for an
> > > > > > > application through POSIX mechanisms. We really need a new signal to
> > > > > > > just kill the application (by default) because recovery cannot be
> > > > > > > automatically performed even through system call errors. I don't see
> > > > > > > how inotify/fanotify (not POSIX interfaces!) helps here as it assumes
> > > > > > > the application will even use those system calls to monitor for lost
> > > > > > > locks when POSIX has no provision for that to happen.
> > > > > >
> > > > > > (cc'ing Anna in case she has an opinion)
> > > > > >
> > > > > > SIGLOST isn't defined in POSIX either, so I'm not sure that argument
> > > > > > carries much weight. The _only_ way you'd be able to add SIGLOST is if
> > > > > > it defaults to SIG_IGN, such that only applications that are watching
> > > > > > for it will react to it. Given that, you're already looking at code
> > > > > > modifications.
> > > > >
> > > > > Why does the default need to be SIG_IGN? Is that existing convention
> > > > > for new signals in Linux?
> > > >
> > > > No, it's just that if you don't do that, and locks are lost, then you'll
> > > > have a bunch of applications suddenly crash. That sounds scary.
> > > >
> > > > > > So, the real question is: what's the best method to watch for lost
> > > > > > locks? I don't have a terribly strong opinion about any of these
> > > > > > notification methods, tbh. I only suggested inotify/fanotify because
> > > > > > they'd likely be much simpler to implement.
> > > > > >
> > > > > > Adding signals is a non-trivial affair as we're running out of bits in
> > > > > > that space. The lower 32 bits are all taken and the upper ones are
> > > > > > reserved for realtime signals. My signal.h has a commented out SIGLOST:
> > > > > >
> > > > > > #define SIGIO    29
> > > > > > #define SIGPOLL  SIGIO
> > > > > > /*
> > > > > > #define SIGLOST  29
> > > > > > */
> > > > > >
> > > > > > Sharing the value with SIGIO/SIGPOLL makes it distinctly less useful. I
> > > > > > think it'd probably need its own value. Maybe there is some way to have
> > > > > > the application ask that one of the realtime signals be set up for this?
> > > > >
> > > > > Well, SIGPOLL is on its way out according to the standards. So SIGIO
> > > > > looks like what Linux uses instead. Looking at the man page for
> > > > > signal.h, I wonder if we could use SIGIO with si_code==POLL_LOST (a
> > > > > new code); si_band==POLL_MSG; and si_fd==<locked fd>. Then the inotify
> > > > > interface could then be used to process the event?
> > > > >
> > > > > The one nit here is that we would be generating SIGIO for regular
> > > > > files (and directories?) which would be new. It makes sense with what
> > > > > we want to do though. Also, SIGIO default behavior is to terminate the
> > > > > process.
> > > >
> > > > That sounds like it could have unintended side-effects. A systemwide
> > > > event that causes a ton of userland processes to suddenly catch a fatal
> > > > signal seems rather nasty.
> > >
> > > To be clear: that's only if the mount is configured in the most
> > > conservative way. Killing only userland processes with file locks
> > > would be an intermediate option (and maybe a default).
> >
> > A disable switch for this behavior would be a minimum requirement, I
> > think.
> >
> > > > It's also not clear to me how you'll identify recipients for this
> > > > signal. What tasks will get a SIGLOST when this occurs? Going from file
> > > > descriptors or inodes to tasks that are associated with them is not
> > > > straightforward.
> > >
> > > We could start with file locks (which do have owners?) and table the
> > > idea of killing all tasks that have any kind of file descriptor open.
> >
> > Well...we do track current->tgid for l_pid, so you could probably go by
> > that for traditional POSIX locks.
> >
> > For flock() and OFD locks though, the tasks are owned by the file
> > description and those can be shared between processes. So, even if you
> > kill the tgid that acquired the lock, there's no guarantee other
> > processes that might be using that lock will get the signal. Not that
> > that's a real argument against doing this, but this approach could have
> > significant gaps.
>
> I wonder if it's actually common for a process to share locked file
> descriptors? I'm not even sure what that should mean.
>
> > OTOH, reporting a lost lock via fanotify would be quite straightforward
> > (and not even that difficult). Then, any process that really cares could
> > watch for these events.
> >
> > Again, I really think I'm missing the big picture on the problem you're
> > attempting to solve with this.
>
> I may be zooming farther than you want, but here's "The Big Picture":
> a kernel cephfs mount should recover after blacklist and continue to
> be usable at least for new processes/applications. Recovery should be
> automatic without admin intervention.
>
> > For instance, I was operating under the assumption that you wanted this
> > to be an opt-in thing, but it sounds like you're aiming for something
> > that is more draconian. I'm not convinced that that's a good idea --
> > especially not initially. Enabling this by default could be a very
> > unwelcome surprise in some environments.
>
> Losing file locks is a serious issue that is grounds for terminating
> applications. Reminder once-again: status quo is the application is
> freezing without _any way to recover_. Almost any change is an
> improvement over that behavior, including termination because then
> monitor processes/init may recover.
>
> I'm not looking to build "opt-in" mechanisms (because the alternative
> is what? hang forever?) but I am happy to make the behavior
> configurable via mount options.

FWIW, here are some links to old linux-nfs threads on SIGLOST/SIGIO
handling (from 2011 and 2013):

https://marc.info/?l=linux-nfs&m=137114937610166&w=4
https://marc.info/?l=linux-nfs&m=131101162420308&w=4

The original patchset used SIGIO (since that's what SIGLOST is aliased
to on most arches), with plans to use siginfo to distinguish a SIGLOST
signal vs SIGIO.
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index f07767d3864c..d8efacd874b7 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2543,8 +2543,13 @@ static void __take_cap_refs(struct ceph_inode_info *ci, int got,
  *
  * FIXME: how does a 0 return differ from -EAGAIN?
  */
+enum {
+	NON_BLOCKING	= 1,
+	CHECK_FILELOCK	= 2,
+};
+
 static int try_get_cap_refs(struct inode *inode, int need, int want,
-			    loff_t endoff, bool nonblock, int *got)
+			    loff_t endoff, int flags, int *got)
 {
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc;
@@ -2559,6 +2564,13 @@ static int try_get_cap_refs(struct inode *inode, int need, int want,
 again:
 	spin_lock(&ci->i_ceph_lock);
 
+	if ((flags & CHECK_FILELOCK) &&
+	    (ci->i_ceph_flags & CEPH_I_ERROR_FILELOCK)) {
+		dout("try_get_cap_refs %p error filelock\n", inode);
+		ret = -EIO;
+		goto out_unlock;
+	}
+
 	/* make sure file is actually open */
 	file_wanted = __ceph_caps_file_wanted(ci);
 	if ((file_wanted & need) != need) {
@@ -2620,7 +2632,7 @@ static int try_get_cap_refs(struct inode *inode, int need, int want,
 			 * we can not call down_read() when
 			 * task isn't in TASK_RUNNING state
 			 */
-			if (nonblock) {
+			if (flags & NON_BLOCKING) {
 				ret = -EAGAIN;
 				goto out_unlock;
 			}
@@ -2725,7 +2737,8 @@ int ceph_try_get_caps(struct inode *inode, int need, int want,
 	if (ret < 0)
 		return ret;
 
-	ret = try_get_cap_refs(inode, need, want, 0, nonblock, got);
+	ret = try_get_cap_refs(inode, need, want, 0,
+			       (nonblock ? NON_BLOCKING : 0), got);
 	return ret == -EAGAIN ? 0 : ret;
 }
 
@@ -2737,9 +2750,10 @@ int ceph_try_get_caps(struct inode *inode, int need, int want,
 int ceph_get_caps(struct file *filp, int need, int want,
 		  loff_t endoff, int *got, struct page **pinned_page)
 {
+	struct ceph_file_info *fi = filp->private_data;
 	struct inode *inode = file_inode(filp);
 	struct ceph_inode_info *ci = ceph_inode(inode);
-	int _got, ret;
+	int ret, _got, flags;
 
 	ret = ceph_pool_perm_check(inode, need);
 	if (ret < 0)
@@ -2749,17 +2763,19 @@ int ceph_get_caps(struct file *filp, int need, int want,
 		if (endoff > 0)
 			check_max_size(inode, endoff);
 
+		flags = atomic_read(&fi->num_locks) ? CHECK_FILELOCK : 0;
 		_got = 0;
 		ret = try_get_cap_refs(inode, need, want, endoff,
-				       false, &_got);
+				       flags, &_got);
 		if (ret == -EAGAIN)
 			continue;
 		if (!ret) {
 			DEFINE_WAIT_FUNC(wait, woken_wake_function);
 			add_wait_queue(&ci->i_cap_wq, &wait);
 
+			flags |= NON_BLOCKING;
 			while (!(ret = try_get_cap_refs(inode, need, want,
-							endoff, true, &_got))) {
+							endoff, flags, &_got))) {
 				if (signal_pending(current)) {
 					ret = -ERESTARTSYS;
 					break;
diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
index ac9b53b89365..cb216501c959 100644
--- a/fs/ceph/locks.c
+++ b/fs/ceph/locks.c
@@ -32,14 +32,18 @@ void __init ceph_flock_init(void)
 
 static void ceph_fl_copy_lock(struct file_lock *dst, struct file_lock *src)
 {
-	struct inode *inode = file_inode(src->fl_file);
+	struct ceph_file_info *fi = dst->fl_file->private_data;
+	struct inode *inode = file_inode(dst->fl_file);
 	atomic_inc(&ceph_inode(inode)->i_filelock_ref);
+	atomic_inc(&fi->num_locks);
 }
 
 static void ceph_fl_release_lock(struct file_lock *fl)
 {
+	struct ceph_file_info *fi = fl->fl_file->private_data;
 	struct inode *inode = file_inode(fl->fl_file);
 	struct ceph_inode_info *ci = ceph_inode(inode);
+	atomic_dec(&fi->num_locks);
 	if (atomic_dec_and_test(&ci->i_filelock_ref)) {
 		/* clear error when all locks are released */
 		spin_lock(&ci->i_ceph_lock);
@@ -73,7 +77,7 @@ static int ceph_lock_message(u8 lock_type, u16 operation, struct inode *inode,
 		 * window. Caller function will decrease the counter.
 		 */
 		fl->fl_ops = &ceph_fl_lock_ops;
-		atomic_inc(&ceph_inode(inode)->i_filelock_ref);
+		fl->fl_ops->fl_copy_lock(fl, NULL);
 	}
 
 	if (operation != CEPH_MDS_OP_SETFILELOCK || cmd == CEPH_LOCK_UNLOCK)
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index f45a06475f4f..999fd3244907 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -705,6 +705,7 @@ struct ceph_file_info {
 	struct list_head rw_contexts;
 
 	errseq_t meta_err;
+	atomic_t num_locks;
 };
 
 struct ceph_dir_file_info {
After mds evicts session, file locks get lost silently. It's not safe
to let programs continue to do read/write.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
---
 fs/ceph/caps.c  | 28 ++++++++++++++++++++++------
 fs/ceph/locks.c |  8 ++++++--
 fs/ceph/super.h |  1 +
 3 files changed, 29 insertions(+), 8 deletions(-)