Message ID | 20171201211327.GQ729@wotan.suse.de (mailing list archive) |
---|---|
State | New, archived |
Hello,

I think I owe you a reply here... Sorry that it took so long.

On Fri 01-12-17 22:13:27, Luis R. Rodriguez wrote:
> On Fri, Dec 01, 2017 at 12:47:24PM +0100, Jan Kara wrote:
> > On Thu 30-11-17 20:05:48, Luis R. Rodriguez wrote:
> > > > In fact, what might be a cleaner solution is to introduce a 'freeze_count'
> > > > for superblock freezing (we already do have this for block devices). Then
> > > > you don't need to differentiate these two cases - but you'd still need to
> > > > properly handle cleanup if freezing of all superblocks fails in the middle.
> > > > So I'm not 100% this works out nicely in the end. But it's certainly worth
> > > > a consideration.
> > >
> > > Ah, there are three important reasons for doing it the way I did it which are
> > > easy to miss, unless you read the commit log message very carefully.
> > >
> > > 0) The ioctl interface causes a failure to be sent back to userspace if
> > > you issue two consecutive freezes, or two thaws. Ie, once a filesystem is
> > > frozen, a secondary call will result in an error. Likewise for thaw.
> >
> > Yep. But also note that there's *another* interface to filesystem freezing
> > which behaves differently - freeze_bdev() (used internally by dm). That
> > interface uses the counter and freezing of already frozen device succeeds.
>
> Ah... so freeze_bdev() semantics matches the same semantics I was looking
> for.
>
> > IOW it is a mess.
>
> To say the least.
>
> > We cannot change the behavior of the ioctl but we could
> > still provide an in-kernel interface to freeze_super() with the same
> > semantics as freeze_bdev() which might be easier to use by suspend - maybe
> > we could call it 'exclusive' (for the current freeze_super() semantics) and
> > 'non-exclusive' (for the freeze_bdev() semantics) since this is very much
> > like O_EXCL open of block devices...
>
> Sure, now typically I see we make exclusive calls with the postfix _excl() so
> I take it you'd be OK in renaming freeze_super() freeze_super_excl() eventually
> then?

In principle yes but let's leave the naming disputes to a later time when
it is clear what API do we actually want to provide.

> I totally missed freeze_bdev() otherwise I think I would have picked up on the
> shared semantics stuff and I would have just made a helper out of what
> freeze_bdev() uses, and then have both in-kernel and freeze_bdev() use it.
>
> I'll note that its still not perfectly clear if really the semantics behind
> freeze_bdev() match what I described above fully. That still needs to be
> vetted for. For instance, does thaw_bdev() keep a superblock frozen if we
> an ioctl initiated freeze had occurred before? If so then great. Otherwise
> I think we'll need to distinguish the ioctl interface. Worst possible case
> is that bdev semantics and in-kernel semantics differ somehow, then that
> will really create a holy fucking mess.

I believe nobody really thought about mixing those two interfaces to fs
freezing and so the behavior is basically defined by the implementation.
That is:

freeze_bdev() on sb frozen by ioctl_fsfreeze() -> EBUSY
freeze_bdev() on sb frozen by freeze_bdev() -> success
ioctl_fsfreeze() on sb frozen by freeze_bdev() -> EBUSY
ioctl_fsfreeze() on sb frozen by ioctl_fsfreeze() -> EBUSY

thaw_bdev() on sb frozen by ioctl_fsfreeze() -> EINVAL
ioctl_fsthaw() on sb frozen by freeze_bdev() -> success

What I propose is the following API:

freeze_super_excl()
  - freezes superblock, returns EBUSY if the superblock is already frozen
    (either by another freeze_super_excl() or by freeze_super())
freeze_super()
  - this function will make sure superblock is frozen when the function
    returns with success. It can be nested with other freeze_super() or
    freeze_super_excl() calls (this second part is different from how
    freeze_bdev() behaves currently but AFAICT this behavior is actually
    what all current users of freeze_bdev() really want - just make sure
    fs cannot be written to)
thaw_super()
  - counterpart to freeze_super(), would fail with EINVAL if we were to
    drop the last "freeze reference" but sb was actually frozen with
    freeze_super_excl()
thaw_super_excl()
  - counterpart to freeze_super_excl(). Fails with EINVAL if sb was not
    frozen with freeze_super_excl() (this is different to current behavior
    but I don't believe anyone relies on this and doing otherwise is asking
    for data corruption).

I'd implement it by a freeze counter in the superblock (similar to what we
currently have in bdev) where every call to freeze_super() or
freeze_super_excl() would add one. Additionally we'd have a flag in the
superblock whether the first freeze (it could not be any other since those
would fail with EBUSY) came from freeze_super_excl().

Then we could make ioctl interface use the _excl() variant of the freezing
API, freeze_bdev() would use the non-exclusive variant (we could drop the
freeze counter in bdev completely), your freezing on suspend could then use
the non-exclusive variant as well.

Also when doing this, you'd need to move code like:

	if (sb->s_op->freeze_super)
		error = sb->s_op->freeze_super(sb);
	else
		error = freeze_super(sb);

into the freeze_super() / freeze_super_excl() handler behind the freeze
counting code which might be a bit tricky WRT locking. GFS2 is the only fs
having ->freeze_super() and that callback was implemented specifically so
that it can do its own (cluster wide) locking before generic code grabbing
s_umount semaphore. Then internally GFS2 ends up calling freeze_super()
from freeze_go_sync() when cluster lock is acquired.

> > 2) It is not that normal users + one special user (who owns the "flag" in
> > the superblock in form of a special freeze state) setup. We'd simply have
> > exclusive and non-exclusive users of superblock freezing and there can be
> > arbitrary numbers of them.
>
> Sorry I did not understand this point. Can you rephrase perhaps a bit?
>
> Anyway, I just tried implementing this and it seemed rather easy to
> use a pivot, however note that then freeze_processes() which calls
> fs_suspend_freeze() would somehow need to pass the failed sb... do
> we want to have let fs_suspend_freeze() pass a parameter to be set
> to the failed sb of it failed? Locking-wise this seems racy.

So with your iterate_supers_excl() doing this is somewhat difficult but you
could have something like:

int freeze_all_supers(void)
{
	struct super_block *sb, *p = NULL;
	int error = 0;

	spin_lock(&sb_lock);
	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
		if (hlist_unhashed(&sb->s_instances))
			continue;
		sb->s_count++;
		spin_unlock(&sb_lock);

		down_write(&sb->s_umount);
		if (sb->s_root && (sb->s_flags & SB_BORN)) {
			error = freeze_super(sb, arg);
			if (error) {
				up_write(&sb->s_umount);
				spin_lock(&sb_lock);
				if (p)
					__put_super(p);
				p = sb;
				list_for_each_entry_continue(sb, &super_blocks,
							     s_list) {
					if (hlist_unhashed(&sb->s_instances))
						continue;
					sb->s_count++;
					spin_unlock(&sb_lock);

					down_write(&sb->s_umount);
					if (sb->s_root &&
					    (sb->s_flags & SB_BORN))
						thaw_super(sb, arg);
					up_write(&sb->s_umount);

					spin_lock(&sb_lock);
					if (p)
						__put_super(p);
					p = sb;
				}
				break;
			}
		}
		up_write(&sb->s_umount);

		spin_lock(&sb_lock);
		if (p)
			__put_super(p);
		p = sb;
	}
	if (p)
		__put_super(p);
	spin_unlock(&sb_lock);
	return error;
}

And you could possibly factor that out into two helper functions for
iterating the superblocks, just they'd need more parameters and you'd need
to pass reference (sb->count) when passing in the 'pivot' as you call it.

								Honza

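To make the proposed semantics easier to follow, here is a minimal userspace sketch in C of the counter-plus-flag scheme described in the mail above. It is a toy model and not kernel code: `struct sb_model`, its fields `freeze_count` and `excl`, and all `model_*` names are hypothetical, invented for illustration. Locking, the actual freezing work and the handling of in-progress freezes are left out. Two details are assumptions that the mail does not spell out: `model_thaw_super()` returns -EINVAL when nothing is frozen (mirroring what thaw_bdev() does today), and `model_thaw_super_excl()` leaves the superblock frozen if non-exclusive references remain.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the freeze state kept in struct super_block. */
struct sb_model {
	int freeze_count;	/* outstanding freeze references */
	bool excl;		/* the first freeze came from the _excl() variant */
};

/* Exclusive freeze: fails if anybody already holds the sb frozen. */
int model_freeze_super_excl(struct sb_model *sb)
{
	if (sb->freeze_count > 0)
		return -EBUSY;
	sb->freeze_count = 1;
	sb->excl = true;
	return 0;
}

/* Non-exclusive freeze: nests with any earlier freeze. */
int model_freeze_super(struct sb_model *sb)
{
	sb->freeze_count++;
	return 0;
}

/* Counterpart to model_freeze_super(). */
int model_thaw_super(struct sb_model *sb)
{
	if (sb->freeze_count == 0)
		return -EINVAL;	/* assumption: nothing to thaw is an error */
	/* Never drop the last reference if it belongs to an exclusive freezer. */
	if (sb->freeze_count == 1 && sb->excl)
		return -EINVAL;
	sb->freeze_count--;
	return 0;
}

/* Counterpart to model_freeze_super_excl(). */
int model_thaw_super_excl(struct sb_model *sb)
{
	if (!sb->excl)
		return -EINVAL;
	/* The exclusive flag can only be set while a reference is held. */
	assert(sb->freeze_count > 0);
	sb->excl = false;
	sb->freeze_count--;	/* sb stays frozen while the count is non-zero */
	return 0;
}
```

In this model an ioctl-style caller would use the `_excl()` pair, while suspend and freeze_bdev()-style callers would use the plain pair and can nest freely on top of an exclusive freeze.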
On Thu, Dec 21, 2017 at 12:03:29PM +0100, Jan Kara wrote:
> Hello,
>
> I think I owe you a reply here... Sorry that it took so long.

Took me just as long :)

> On Fri 01-12-17 22:13:27, Luis R. Rodriguez wrote:
> >
> > I'll note that its still not perfectly clear if really the semantics behind
> > freeze_bdev() match what I described above fully. That still needs to be
> > vetted for. For instance, does thaw_bdev() keep a superblock frozen if we
> > an ioctl initiated freeze had occurred before? If so then great. Otherwise
> > I think we'll need to distinguish the ioctl interface. Worst possible case
> > is that bdev semantics and in-kernel semantics differ somehow, then that
> > will really create a holy fucking mess.
>
> I believe nobody really thought about mixing those two interfaces to fs
> freezing and so the behavior is basically defined by the implementation.
> That is:
>
> freeze_bdev() on sb frozen by ioctl_fsfreeze() -> EBUSY

Note below as well on your *future* freeze_super() implementation.

> freeze_bdev() on sb frozen by freeze_bdev() -> success
> ioctl_fsfreeze() on sb frozen by freeze_bdev() -> EBUSY
> ioctl_fsfreeze() on sb frozen by ioctl_fsfreeze() -> EBUSY
>
> thaw_bdev() on sb frozen by ioctl_fsfreeze() -> EINVAL

Phew, so this is what we want for the in-kernel freezing so we're good
and *can* combine these then.

> ioctl_fsthaw() on sb frozen by freeze_bdev() -> success
>
> What I propose is the following API:
>
> freeze_super_excl()
>   - freezes superblock, returns EBUSY if the superblock is already frozen
>     (either by another freeze_super_excl() or by freeze_super())
> freeze_super()
>   - this function will make sure superblock is frozen when the function
>     returns with success.

That's straight forward.

>     It can be nested with other freeze_super() or
>     freeze_super_excl() calls

This is where it can get hairy. More below.

>     (this second part is different from how
>     freeze_bdev() behaves currently but AFAICT this behavior is actually
>     what all current users of freeze_bdev() really want - just make sure
>     fs cannot be written to)

If we can agree to this, then sure. However there are two types of
possible nested calls to consider, one where the sb was already frozen
by an IOCTL, and the other where it was initiated by either another
freeze_super_excl() or another freeze_super() call which is currently
being processed. For the first type, its easy to say the device is
already frozen as such return success. If the freezing is ongoing,
we may want to wait or not wait, and this will depend on our current
use cases for freeze_bdev().

As you noted above, freeze_bdev() currently returns EBUSY if we had
the sb already frozen by ioctl_fsfreeze(). It may be a welcomed
enhancement to correct the semantics first to address the first case,
but keep the EBUSY for the other case. A secondary patch could then
add a completion mechanism and let callers decide to either wait or not.
*Iff* the caller did not opt-in to wait we keep the EBUSY return.

Seem reasonable?

I'll address the rest of the mail later.

  Luis

On Tue 17-04-18 17:59:36, Luis R. Rodriguez wrote:
> On Thu, Dec 21, 2017 at 12:03:29PM +0100, Jan Kara wrote:
> > On Fri 01-12-17 22:13:27, Luis R. Rodriguez wrote:
> > >
> > > I'll note that its still not perfectly clear if really the semantics behind
> > > freeze_bdev() match what I described above fully. That still needs to be
> > > vetted for. For instance, does thaw_bdev() keep a superblock frozen if we
> > > an ioctl initiated freeze had occurred before? If so then great. Otherwise
> > > I think we'll need to distinguish the ioctl interface. Worst possible case
> > > is that bdev semantics and in-kernel semantics differ somehow, then that
> > > will really create a holy fucking mess.
> >
> > I believe nobody really thought about mixing those two interfaces to fs
> > freezing and so the behavior is basically defined by the implementation.
> > That is:
> >
> > freeze_bdev() on sb frozen by ioctl_fsfreeze() -> EBUSY
>
> Note below as well on your *future* freeze_super() implementation.
>
> > freeze_bdev() on sb frozen by freeze_bdev() -> success
> > ioctl_fsfreeze() on sb frozen by freeze_bdev() -> EBUSY
> > ioctl_fsfreeze() on sb frozen by ioctl_fsfreeze() -> EBUSY
> >
> > thaw_bdev() on sb frozen by ioctl_fsfreeze() -> EINVAL
>
> Phew, so this is what we want for the in-kernel freezing so we're good
> and *can* combine these then.
>
> > ioctl_fsthaw() on sb frozen by freeze_bdev() -> success
> >
> > What I propose is the following API:
> >
> > freeze_super_excl()
> >   - freezes superblock, returns EBUSY if the superblock is already frozen
> >     (either by another freeze_super_excl() or by freeze_super())
> > freeze_super()
> >   - this function will make sure superblock is frozen when the function
> >     returns with success.
>
> That's straight forward.
>
> >     It can be nested with other freeze_super() or
> >     freeze_super_excl() calls
>
> This is where it can get hairy. More below.
>
> >     (this second part is different from how
> >     freeze_bdev() behaves currently but AFAICT this behavior is actually
> >     what all current users of freeze_bdev() really want - just make sure
> >     fs cannot be written to)
>
> If we can agree to this, then sure. However there are two types of
> possible nested calls to consider, one where the sb was already frozen
> by an IOCTL, and the other where it was initiated by either another
> freeze_super_excl() or another freeze_super() call which is currently
> being processed. For the first type, its easy to say the device is
> already frozen as such return success. If the freezing is ongoing,
> we may want to wait or not wait, and this will depend on our current
> use cases for freeze_bdev().

A side note since I'm not sure I wrote this down in my previous email: I
want ioctl_fsfreeze() directly use freeze_super_excl().

Now to your freeze in progress question: freeze_super_excl() can
immediately return EBUSY when there's freezing in progress. OTOH
freeze_super() always has to wait for the current freeze / thaw to finish
and then do what's necessary. I don't see a use case where you'd like to
have freeze_super() not wait.

> As you noted above, freeze_bdev() currently returns EBUSY if we had
> the sb already frozen by ioctl_fsfreeze(). It may be a welcomed
> enhancement to correct the semantics first to address the first case,
> but keep the EBUSY for the other case. A secondary patch could then
> add a completion mechanism and let callers decide to either wait or not.
> *Iff* the caller did not opt-in to wait we keep the EBUSY return.

You're now speaking about steps to transition to the new API, right? I'd
structure the transition as follows:

1) Move bdev->bd_fsfreeze_count to a superblock.

2) Make freeze_super() grab the counter as well, thaw_super() drops it and
   unfreezes the filesystem only if the counter dropped to zero.

3) Rename freeze_super() to freeze_super_excl().

4) Only now I'd go for messing with freeze_bdev() as it now combines sanely
   with freeze_super_excl(). Probably I'd just implement new freeze_super()
   with the desired semantics (including waiting for ongoing operation to
   finish).

5) And then switch all users (there are 4 in the kernel) from freeze_bdev()
   to freeze_super() with the justification in each case why the new
   semantics is actually desirable.

6) Drop old freeze_bdev() - note that only one freeze_bdev() user (in
   drivers/md/dm.c) is actually interested in passing bdev, all the others
   are better off just passing in superblock to new freeze_super(). Anyway
   for that user in dm we might still provide a convenience wrapper to grab
   the superblock and call new freeze_super() on it.

								Honza

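The difference between the two variants while a freeze or thaw is still in flight can be sketched in userspace C as follows. This is only an illustration under stated assumptions: POSIX threads stand in for kernel locking, `struct sb_model`, `in_transition` and every `model_*` function are hypothetical names, and the exclusive flag is omitted to keep the focus on waiting versus failing fast.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a superblock's freeze state. */
struct sb_model {
	pthread_mutex_t lock;
	pthread_cond_t idle;	/* signalled when a transition finishes */
	bool in_transition;	/* a freeze or thaw is currently running */
	int freeze_count;	/* outstanding freeze references */
};

/* Mark the start of a (possibly slow) freeze or thaw operation. */
void model_begin_transition(struct sb_model *sb)
{
	pthread_mutex_lock(&sb->lock);
	sb->in_transition = true;
	pthread_mutex_unlock(&sb->lock);
}

/* Mark the end of the operation and wake up any waiters. */
void model_end_transition(struct sb_model *sb)
{
	pthread_mutex_lock(&sb->lock);
	assert(sb->in_transition);
	sb->in_transition = false;
	pthread_cond_broadcast(&sb->idle);
	pthread_mutex_unlock(&sb->lock);
}

/* Exclusive variant: never waits, reports EBUSY right away. */
int model_freeze_super_excl(struct sb_model *sb)
{
	int ret = 0;

	pthread_mutex_lock(&sb->lock);
	if (sb->in_transition || sb->freeze_count > 0)
		ret = -EBUSY;
	else
		sb->freeze_count = 1;
	pthread_mutex_unlock(&sb->lock);
	return ret;
}

/* Non-exclusive variant: waits for the ongoing operation, then nests. */
int model_freeze_super(struct sb_model *sb)
{
	pthread_mutex_lock(&sb->lock);
	while (sb->in_transition)
		pthread_cond_wait(&sb->idle, &sb->lock);
	sb->freeze_count++;
	pthread_mutex_unlock(&sb->lock);
	return 0;
}
```

A caller in another thread that invokes `model_freeze_super()` during a transition would simply block until `model_end_transition()` runs, which is the behavior the mail asks for; an opt-in "do not wait" mode, as raised earlier in the thread, is deliberately not modeled.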
On Tue, Apr 17, 2018 at 05:59:36PM -0700, Luis R. Rodriguez wrote:
> On Thu, Dec 21, 2017 at 12:03:29PM +0100, Jan Kara wrote:
> > Hello,
> >
> > I think I owe you a reply here... Sorry that it took so long.
>
> Took me just as long :)
>
> > On Fri 01-12-17 22:13:27, Luis R. Rodriguez wrote:
> > >
> > > I'll note that its still not perfectly clear if really the semantics behind
> > > freeze_bdev() match what I described above fully. That still needs to be
> > > vetted for. For instance, does thaw_bdev() keep a superblock frozen if we
> > > an ioctl initiated freeze had occurred before? If so then great. Otherwise
> > > I think we'll need to distinguish the ioctl interface. Worst possible case
> > > is that bdev semantics and in-kernel semantics differ somehow, then that
> > > will really create a holy fucking mess.
> >
> > I believe nobody really thought about mixing those two interfaces to fs
> > freezing and so the behavior is basically defined by the implementation.
> > That is:
> >
> > freeze_bdev() on sb frozen by ioctl_fsfreeze() -> EBUSY
> > freeze_bdev() on sb frozen by freeze_bdev() -> success
> > ioctl_fsfreeze() on sb frozen by freeze_bdev() -> EBUSY
> > ioctl_fsfreeze() on sb frozen by ioctl_fsfreeze() -> EBUSY
> >
> > thaw_bdev() on sb frozen by ioctl_fsfreeze() -> EINVAL
>
> Phew, so this is what we want for the in-kernel freezing so we're good
> and *can* combine these then.

I double checked, and I don't see where you get EINVAL for this case.
We *do* keep the sb frozen though, which is good, and the worst fear
I had was that we did not. However we return 0 if there was already
a prior freeze_bdev() or ioctl_fsfreeze() other than the context that
started the prior freeze (--bdev->bd_fsfreeze_count > 0).

The -EINVAL is only returned currently if there were no freezers.

int thaw_bdev(struct block_device *bdev, struct super_block *sb)
{
	int error = -EINVAL;

	mutex_lock(&bdev->bd_fsfreeze_mutex);
	if (!bdev->bd_fsfreeze_count)
		goto out;

	error = 0;
	if (--bdev->bd_fsfreeze_count > 0)
		goto out;
...
out:
	mutex_unlock(&bdev->bd_fsfreeze_mutex);
	return error;
}

  Luis

On Fri 20-04-18 11:49:32, Luis R. Rodriguez wrote:
> On Tue, Apr 17, 2018 at 05:59:36PM -0700, Luis R. Rodriguez wrote:
> > On Thu, Dec 21, 2017 at 12:03:29PM +0100, Jan Kara wrote:
> > > Hello,
> > >
> > > I think I owe you a reply here... Sorry that it took so long.
> >
> > Took me just as long :)
> >
> > > On Fri 01-12-17 22:13:27, Luis R. Rodriguez wrote:
> > > >
> > > > I'll note that its still not perfectly clear if really the semantics behind
> > > > freeze_bdev() match what I described above fully. That still needs to be
> > > > vetted for. For instance, does thaw_bdev() keep a superblock frozen if we
> > > > an ioctl initiated freeze had occurred before? If so then great. Otherwise
> > > > I think we'll need to distinguish the ioctl interface. Worst possible case
> > > > is that bdev semantics and in-kernel semantics differ somehow, then that
> > > > will really create a holy fucking mess.
> > >
> > > I believe nobody really thought about mixing those two interfaces to fs
> > > freezing and so the behavior is basically defined by the implementation.
> > > That is:
> > >
> > > freeze_bdev() on sb frozen by ioctl_fsfreeze() -> EBUSY
> > > freeze_bdev() on sb frozen by freeze_bdev() -> success
> > > ioctl_fsfreeze() on sb frozen by freeze_bdev() -> EBUSY
> > > ioctl_fsfreeze() on sb frozen by ioctl_fsfreeze() -> EBUSY
> > >
> > > thaw_bdev() on sb frozen by ioctl_fsfreeze() -> EINVAL
> >
> > Phew, so this is what we want for the in-kernel freezing so we're good
> > and *can* combine these then.
>
> I double checked, and I don't see where you get EINVAL for this case.
> We *do* keep the sb frozen though, which is good, and the worst fear
> I had was that we did not. However we return 0 if there was already
> a prior freeze_bdev() or ioctl_fsfreeze() other than the context that
> started the prior freeze (--bdev->bd_fsfreeze_count > 0).
>
> The -EINVAL is only returned currently if there were no freezers.
>
> int thaw_bdev(struct block_device *bdev, struct super_block *sb)
> {
> 	int error = -EINVAL;
>
> 	mutex_lock(&bdev->bd_fsfreeze_mutex);
> 	if (!bdev->bd_fsfreeze_count)
> 		goto out;

But this is precisely where we'd bail if we freeze sb by ioctl_fsfreeze()
but try to thaw by thaw_bdev(). ioctl_fsfreeze() does not touch
bd_fsfreeze_count...

								Honza

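The point that the ioctl path never touches bd_fsfreeze_count explains all six combinations listed earlier in the thread, and a small userspace model in C makes that easy to check. This is a sketch under assumptions, not the kernel code: `struct fs_model` and the `model_*` functions are hypothetical, the thaw_bdev() part follows the snippet quoted above, and the freeze_bdev() part is inferred from the thread's description (a counter on the block device, with only the first freezer actually freezing the superblock).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: one superblock sitting on one block device. */
struct fs_model {
	bool sb_frozen;		/* superblock freeze state */
	int bd_fsfreeze_count;	/* counter kept on the block device */
};

/* ioctl path: looks only at the superblock state. */
int model_ioctl_fsfreeze(struct fs_model *m)
{
	if (m->sb_frozen)
		return -EBUSY;
	m->sb_frozen = true;
	return 0;
}

int model_ioctl_fsthaw(struct fs_model *m)
{
	if (!m->sb_frozen)
		return -EINVAL;
	m->sb_frozen = false;
	return 0;
}

/* bdev path: counts freezers, only the first one freezes the sb. */
int model_freeze_bdev(struct fs_model *m)
{
	int error;

	if (++m->bd_fsfreeze_count > 1)
		return 0;	/* nested freeze_bdev() succeeds */
	error = model_ioctl_fsfreeze(m);
	if (error)
		m->bd_fsfreeze_count--;	/* undo the reference on failure */
	return error;
}

int model_thaw_bdev(struct fs_model *m)
{
	int error;

	assert(m->bd_fsfreeze_count >= 0);
	if (!m->bd_fsfreeze_count)
		return -EINVAL;	/* no freeze_bdev() freezers at all */
	if (--m->bd_fsfreeze_count > 0)
		return 0;	/* others still hold the device frozen */
	error = model_ioctl_fsthaw(m);
	if (error)
		m->bd_fsfreeze_count++;
	return error;
}
```

Note how, in this model, an ioctl thaw of a superblock frozen through the bdev path succeeds but leaves the device counter stale, which is the kind of inconsistency a single counter in the superblock is meant to avoid.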
On Sun, Apr 22, 2018 at 01:53:23AM +0200, Jan Kara wrote:
> On Fri 20-04-18 11:49:32, Luis R. Rodriguez wrote:
> > On Tue, Apr 17, 2018 at 05:59:36PM -0700, Luis R. Rodriguez wrote:
> > > On Thu, Dec 21, 2017 at 12:03:29PM +0100, Jan Kara wrote:
> > > > Hello,
> > > >
> > > > I think I owe you a reply here... Sorry that it took so long.
> > >
> > > Took me just as long :)
> > >
> > > > On Fri 01-12-17 22:13:27, Luis R. Rodriguez wrote:
> > > > >
> > > > > I'll note that its still not perfectly clear if really the semantics behind
> > > > > freeze_bdev() match what I described above fully. That still needs to be
> > > > > vetted for. For instance, does thaw_bdev() keep a superblock frozen if we
> > > > > an ioctl initiated freeze had occurred before? If so then great. Otherwise
> > > > > I think we'll need to distinguish the ioctl interface. Worst possible case
> > > > > is that bdev semantics and in-kernel semantics differ somehow, then that
> > > > > will really create a holy fucking mess.
> > > >
> > > > I believe nobody really thought about mixing those two interfaces to fs
> > > > freezing and so the behavior is basically defined by the implementation.
> > > > That is:
> > > >
> > > > freeze_bdev() on sb frozen by ioctl_fsfreeze() -> EBUSY
> > > > freeze_bdev() on sb frozen by freeze_bdev() -> success
> > > > ioctl_fsfreeze() on sb frozen by freeze_bdev() -> EBUSY
> > > > ioctl_fsfreeze() on sb frozen by ioctl_fsfreeze() -> EBUSY
> > > >
> > > > thaw_bdev() on sb frozen by ioctl_fsfreeze() -> EINVAL
> > >
> > > Phew, so this is what we want for the in-kernel freezing so we're good
> > > and *can* combine these then.
> >
> > I double checked, and I don't see where you get EINVAL for this case.
> > We *do* keep the sb frozen though, which is good, and the worst fear
> > I had was that we did not. However we return 0 if there was already
> > a prior freeze_bdev() or ioctl_fsfreeze() other than the context that
> > started the prior freeze (--bdev->bd_fsfreeze_count > 0).
> >
> > The -EINVAL is only returned currently if there were no freezers.
> >
> > int thaw_bdev(struct block_device *bdev, struct super_block *sb)
> > {
> > 	int error = -EINVAL;
> >
> > 	mutex_lock(&bdev->bd_fsfreeze_mutex);
> > 	if (!bdev->bd_fsfreeze_count)
> > 		goto out;
>
> But this is precisely where we'd bail if we freeze sb by ioctl_fsfreeze()
> but try to thaw by thaw_bdev(). ioctl_fsfreeze() does not touch
> bd_fsfreeze_count...

Ah, yes, I see that now, thanks!

  Luis

diff --git a/fs/super.c b/fs/super.c
index 885711c1d35b..8cb6f38652d8 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -614,13 +614,21 @@ void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
  * locked superblock and given argument. Returns 0 unless an error
  * occurred on calling the function on any superblock.
  */
-int iterate_supers_excl(int (*f)(struct super_block *, void *), void *arg)
+int iterate_supers_excl(int (*f)(struct super_block *, void *), void *arg,
+			struct super_block *pivot)
 {
 	struct super_block *sb, *p = NULL;
 	int error = 0;
 
 	spin_lock(&sb_lock);
 	list_for_each_entry(sb, &super_blocks, s_list) {
+		/* If we have a pivot, start work on the next item */
+		if (pivot) {
+			if (sb != pivot)
+				continue;
+			pivot = NULL;
+			continue;
+		}
 		if (hlist_unhashed(&sb->s_instances))
 			continue;
 		sb->s_count++;
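The pivot logic added by the hunk above can be tried out in isolation with a small userspace sketch. Everything here is hypothetical scaffolding: a plain singly linked `struct node` list replaces the kernel's super_blocks list, `iterate_after_pivot()` and `sum_ids()` are made-up names, and reference counting and locking are left out. Stopping at the first error is also an assumption of this sketch, not something the quoted hunk shows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list element standing in for struct super_block. */
struct node {
	int id;
	struct node *next;
};

/*
 * Call f() on every node, or only on the nodes after 'pivot' when a
 * pivot is given. Returns the first non-zero value returned by f().
 */
int iterate_after_pivot(struct node *head,
			int (*f)(struct node *, void *), void *arg,
			struct node *pivot)
{
	struct node *n;
	int error = 0;

	assert(f != NULL);
	for (n = head; n; n = n->next) {
		/* If we have a pivot, start work on the next item */
		if (pivot) {
			if (n == pivot)
				pivot = NULL;
			continue;
		}
		error = f(n, arg);
		if (error)
			break;	/* assumption: stop at the first failure */
	}
	return error;
}

/* Example callback: adds the visited id to the int that arg points to. */
int sum_ids(struct node *n, void *arg)
{
	*(int *)arg += n->id;
	return 0;
}
```

With a four-element list and the second element as pivot, only the third and fourth elements are visited, which is what a "resume after the failed superblock" style of rollback needs.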