Message ID | 20210908135135.1474055-1-nborisov@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] btrfs: Remove received information from snapshot on ro->rw switch | expand |
On Wed, Sep 8, 2021 at 2:53 PM Nikolay Borisov <nborisov@suse.com> wrote: > > Currently when a read-only snapshot is received and subsequently its > ro property is set to false i.e. switched to rw-mode the > received_uuid/stime/rtime/stransid/rtransid of that subvol remains > intact. However, once the received volume is switched to RW mode we > cannot guaranteee that it contains the same data, so it makes sense > to remove those fields which indicate this volume was ever > send/received. Additionally, sending such volume can cause conflicts > due to the presence of received_uuid. > > Preserving the received_uuid (and related fields) after transitioning the > snapshot from RO to RW and then changing the snapshot has a potential for > causing send to fail in many unexpected ways if we later turn it back to > RO and use it for an incremental send operation. > > A recent example, in the thread to which the Link tag below points to, we > had a user with a filesystem that had multiple snapshots with the same > received_uuid but with different content due to a transition from RO to RW > and then back to RO. When an incremental send was attempted using two of > those snapshots, it resulted in send emitting a clone operation that was > intended to clone from the parent root to the send root - however because > both roots had the same received_uuid, the receiver ended up picking the > send root as the source root for the clone operation, which happened to > result in the clone operation to fail with -EINVAL, due to the source and > destination offsets being the same (and on the same root and file). In this > particular case there was no harm, but we could end up in a case where the > clone operation succeeds but clones wrong data due to picking up an > incorrect root - as in the case of multiple snapshots with the same > received_uuid, it has no way to know which one is the correct one if they > have different content. > > Link: https://lore.kernel.org/linux-btrfs/CAOaVUnV3L6RpcqJ5gaqzNXWXK0VMkEVXCdihawH1PgS6TiMchQ@mail.gmail.com/ > Signed-off-by: Nikolay Borisov <nborisov@suse.com> > Suggested-by: David Sterba <dsterba@suse.cz> Thanks, looks good. Reviewed-by: Filipe Manana <fdmanana@suse.com> > --- > fs/btrfs/ioctl.c | 41 +++++++++++++++++++++++++++++++++++------ > 1 file changed, 35 insertions(+), 6 deletions(-) > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > index 9eb0c1eb568e..67709d274489 100644 > --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -1927,9 +1927,11 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file, > struct inode *inode = file_inode(file); > struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); > struct btrfs_root *root = BTRFS_I(inode)->root; > + struct btrfs_root_item *root_item = &root->root_item; > struct btrfs_trans_handle *trans; > u64 root_flags; > u64 flags; > + bool clear_received_state = false; > int ret = 0; > > if (!inode_owner_or_capable(file_mnt_user_ns(file), inode)) > @@ -1960,9 +1962,9 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file, > if (!!(flags & BTRFS_SUBVOL_RDONLY) == btrfs_root_readonly(root)) > goto out_drop_sem; > > - root_flags = btrfs_root_flags(&root->root_item); > + root_flags = btrfs_root_flags(root_item); > if (flags & BTRFS_SUBVOL_RDONLY) { > - btrfs_set_root_flags(&root->root_item, > + btrfs_set_root_flags(root_item, > root_flags | BTRFS_ROOT_SUBVOL_RDONLY); > } else { > /* > @@ -1971,9 +1973,10 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file, > */ > spin_lock(&root->root_item_lock); > if (root->send_in_progress == 0) { > - btrfs_set_root_flags(&root->root_item, > + btrfs_set_root_flags(root_item, > root_flags & ~BTRFS_ROOT_SUBVOL_RDONLY); > spin_unlock(&root->root_item_lock); > + clear_received_state = true; > } else { > spin_unlock(&root->root_item_lock); > btrfs_warn(fs_info, > @@ -1984,14 +1987,40 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file, > } > } > > - trans = btrfs_start_transaction(root, 1); > + /* > + * 1 item for updating the uuid root in the root tree > + * 1 item for actually removing the uuid record in the uuid tree > + */ > + trans = btrfs_start_transaction(root, 2); > if (IS_ERR(trans)) { > ret = PTR_ERR(trans); > goto out_reset; > } > > - ret = btrfs_update_root(trans, fs_info->tree_root, > - &root->root_key, &root->root_item); > + if (clear_received_state && > + !btrfs_is_empty_uuid(root_item->received_uuid)) { > + > + ret = btrfs_uuid_tree_remove(trans, root_item->received_uuid, > + BTRFS_UUID_KEY_RECEIVED_SUBVOL, > + root->root_key.objectid); > + > + if (ret && ret != -ENOENT) { > + btrfs_abort_transaction(trans, ret); > + btrfs_end_transaction(trans); > + goto out_reset; > + } > + > + memset(root_item->received_uuid, 0, BTRFS_UUID_SIZE); > + btrfs_set_root_stransid(root_item, 0); > + btrfs_set_root_rtransid(root_item, 0); > + btrfs_set_stack_timespec_sec(&root_item->stime, 0); > + btrfs_set_stack_timespec_nsec(&root_item->stime, 0); > + btrfs_set_stack_timespec_sec(&root_item->rtime, 0); > + btrfs_set_stack_timespec_nsec(&root_item->rtime, 0); > + } > + > + ret = btrfs_update_root(trans, fs_info->tree_root, &root->root_key, > + root_item); > if (ret < 0) { > btrfs_end_transaction(trans); > goto out_reset; > -- > 2.25.1 >
On 08.09.2021 15:51 Nikolay Borisov wrote: > Currently when a read-only snapshot is received and subsequently its > ro property is set to false i.e. switched to rw-mode the > received_uuid/stime/rtime/stransid/rtransid of that subvol remains > intact. However, once the received volume is switched to RW mode we > cannot guaranteee that it contains the same data, so it makes sense > to remove those fields which indicate this volume was ever > send/received. Additionally, sending such volume can cause conflicts > due to the presence of received_uuid. > > Preserving the received_uuid (and related fields) after transitioning the > snapshot from RO to RW and then changing the snapshot has a potential for > causing send to fail in many unexpected ways if we later turn it back to > RO and use it for an incremental send operation. > > A recent example, in the thread to which the Link tag below points to, we > had a user with a filesystem that had multiple snapshots with the same > received_uuid but with different content due to a transition from RO to RW > and then back to RO. When an incremental send was attempted using two of > those snapshots, it resulted in send emitting a clone operation that was > intended to clone from the parent root to the send root - however because > both roots had the same received_uuid, the receiver ended up picking the > send root as the source root for the clone operation, which happened to > result in the clone operation to fail with -EINVAL, due to the source and > destination offsets being the same (and on the same root and file). In this > particular case there was no harm, but we could end up in a case where the > clone operation succeeds but clones wrong data due to picking up an > incorrect root - as in the case of multiple snapshots with the same > received_uuid, it has no way to know which one is the correct one if they > have different content. Not to overly discourage this change but I think this will cause some issues in user space. For example I had the problem of partial subvols after a sudden restart during receive. No problem, just receive to a directory that gets deleted on startup then move the subvol to the final location after completion. To move the subvol it needs to be temporarily set rw for some reason. I'm sure there is some better solution but patterns like this might be out there. > > Link: https://lore.kernel.org/linux-btrfs/CAOaVUnV3L6RpcqJ5gaqzNXWXK0VMkEVXCdihawH1PgS6TiMchQ@mail.gmail.com/ > Signed-off-by: Nikolay Borisov <nborisov@suse.com> > Suggested-by: David Sterba <dsterba@suse.cz> > --- > fs/btrfs/ioctl.c | 41 +++++++++++++++++++++++++++++++++++------ > 1 file changed, 35 insertions(+), 6 deletions(-) > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > index 9eb0c1eb568e..67709d274489 100644 > --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -1927,9 +1927,11 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file, > struct inode *inode = file_inode(file); > struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); > struct btrfs_root *root = BTRFS_I(inode)->root; > + struct btrfs_root_item *root_item = &root->root_item; > struct btrfs_trans_handle *trans; > u64 root_flags; > u64 flags; > + bool clear_received_state = false; > int ret = 0; > > if (!inode_owner_or_capable(file_mnt_user_ns(file), inode)) > @@ -1960,9 +1962,9 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file, > if (!!(flags & BTRFS_SUBVOL_RDONLY) == btrfs_root_readonly(root)) > goto out_drop_sem; > > - root_flags = btrfs_root_flags(&root->root_item); > + root_flags = btrfs_root_flags(root_item); > if (flags & BTRFS_SUBVOL_RDONLY) { > - btrfs_set_root_flags(&root->root_item, > + btrfs_set_root_flags(root_item, > root_flags | BTRFS_ROOT_SUBVOL_RDONLY); > } else { > /* > @@ -1971,9 +1973,10 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file, > */ > spin_lock(&root->root_item_lock); > if (root->send_in_progress == 0) { > - btrfs_set_root_flags(&root->root_item, > + btrfs_set_root_flags(root_item, > root_flags & ~BTRFS_ROOT_SUBVOL_RDONLY); > spin_unlock(&root->root_item_lock); > + clear_received_state = true; > } else { > spin_unlock(&root->root_item_lock); > btrfs_warn(fs_info, > @@ -1984,14 +1987,40 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file, > } > } > > - trans = btrfs_start_transaction(root, 1); > + /* > + * 1 item for updating the uuid root in the root tree > + * 1 item for actually removing the uuid record in the uuid tree > + */ > + trans = btrfs_start_transaction(root, 2); > if (IS_ERR(trans)) { > ret = PTR_ERR(trans); > goto out_reset; > } > > - ret = btrfs_update_root(trans, fs_info->tree_root, > - &root->root_key, &root->root_item); > + if (clear_received_state && > + !btrfs_is_empty_uuid(root_item->received_uuid)) { > + > + ret = btrfs_uuid_tree_remove(trans, root_item->received_uuid, > + BTRFS_UUID_KEY_RECEIVED_SUBVOL, > + root->root_key.objectid); > + > + if (ret && ret != -ENOENT) { > + btrfs_abort_transaction(trans, ret); > + btrfs_end_transaction(trans); > + goto out_reset; > + } > + > + memset(root_item->received_uuid, 0, BTRFS_UUID_SIZE); > + btrfs_set_root_stransid(root_item, 0); > + btrfs_set_root_rtransid(root_item, 0); > + btrfs_set_stack_timespec_sec(&root_item->stime, 0); > + btrfs_set_stack_timespec_nsec(&root_item->stime, 0); > + btrfs_set_stack_timespec_sec(&root_item->rtime, 0); > + btrfs_set_stack_timespec_nsec(&root_item->rtime, 0); > + } > + > + ret = btrfs_update_root(trans, fs_info->tree_root, &root->root_key, > + root_item); > if (ret < 0) { > btrfs_end_transaction(trans); > goto out_reset;
On Wed, Sep 08, 2021 at 04:34:47PM +0000, Martin Raiber wrote: > On 08.09.2021 15:51 Nikolay Borisov wrote: > > Currently when a read-only snapshot is received and subsequently its > > ro property is set to false i.e. switched to rw-mode the > > received_uuid/stime/rtime/stransid/rtransid of that subvol remains > > intact. However, once the received volume is switched to RW mode we > > cannot guaranteee that it contains the same data, so it makes sense > > to remove those fields which indicate this volume was ever > > send/received. Additionally, sending such volume can cause conflicts > > due to the presence of received_uuid. > > > > Preserving the received_uuid (and related fields) after transitioning the > > snapshot from RO to RW and then changing the snapshot has a potential for > > causing send to fail in many unexpected ways if we later turn it back to > > RO and use it for an incremental send operation. > > > > A recent example, in the thread to which the Link tag below points to, we > > had a user with a filesystem that had multiple snapshots with the same > > received_uuid but with different content due to a transition from RO to RW > > and then back to RO. When an incremental send was attempted using two of > > those snapshots, it resulted in send emitting a clone operation that was > > intended to clone from the parent root to the send root - however because > > both roots had the same received_uuid, the receiver ended up picking the > > send root as the source root for the clone operation, which happened to > > result in the clone operation to fail with -EINVAL, due to the source and > > destination offsets being the same (and on the same root and file). In this > > particular case there was no harm, but we could end up in a case where the > > clone operation succeeds but clones wrong data due to picking up an > > incorrect root - as in the case of multiple snapshots with the same > > received_uuid, it has no way to know which one is the correct one if they > > have different content. > Not to overly discourage this change but I think this will cause some > issues in user space. That this change can cause issues for users is the reason why it hasn't been merged. The change on the kernel side is simple, but I've been missing the progs part and the "what-if"s that happen in practice. This hasn't been communicated properly so we've got resends without changes. I had a chat with Nikolay about what's missing so hopefully we can move forward this time. > For example I had the problem of partial subvols after a sudden > restart during receive. No problem, just receive to a directory that > gets deleted on startup then move the subvol to the final location > after completion. To move the subvol it needs to be temporarily set rw > for some reason. I'm sure there is some better solution but patterns > like this might be out there. Thanks, that's a case we should take into account. And there are probably more, but I'm not using send/receive much so that's another reason why I've been hesitant to merge the patch due to lack of understanding of the use case.
On 08/09/2021 19:33, David Sterba wrote: > On Wed, Sep 08, 2021 at 04:34:47PM +0000, Martin Raiber wrote: <snip> >> For example I had the problem of partial subvols after a sudden >> restart during receive. No problem, just receive to a directory that >> gets deleted on startup then move the subvol to the final location >> after completion. To move the subvol it needs to be temporarily set rw >> for some reason. I'm sure there is some better solution but patterns >> like this might be out there. > > Thanks, that's a case we should take into account. And there are > probably more, but I'm not using send/receive much so that's another > reason why I've been hesitant to merge the patch due to lack of > understanding of the use case. > This seems to be an important change to make, but is user-affecting. A few ideas: 1) Can it be made optional? On by default but possible to turn off (mount option, sys file, ...) if you really need to retain the current behaviour. 2) Is there a way to engage with the developers and user community for popular tools which make heavy use of snapshotting and/or send/receive for discussion and testing? For example, btrbk, snapper, distros. 3) How about adding an IOCTL to allow the user to set/delete the received_uuid? Only intended for cases which really need to emulate the removed behaviour. This could be a less complex long term solution than keeping option 1 indefinitely. Graham
On 9.09.21 г. 0:24, Graham Cobb wrote: > > On 08/09/2021 19:33, David Sterba wrote: >> On Wed, Sep 08, 2021 at 04:34:47PM +0000, Martin Raiber wrote: > > <snip> > >>> For example I had the problem of partial subvols after a sudden >>> restart during receive. No problem, just receive to a directory that >>> gets deleted on startup then move the subvol to the final location >>> after completion. To move the subvol it needs to be temporarily set rw >>> for some reason. I'm sure there is some better solution but patterns >>> like this might be out there. >> >> Thanks, that's a case we should take into account. And there are >> probably more, but I'm not using send/receive much so that's another >> reason why I've been hesitant to merge the patch due to lack of >> understanding of the use case. >> > > This seems to be an important change to make, but is user-affecting. A > few ideas: > > 1) Can it be made optional? On by default but possible to turn off > (mount option, sys file, ...) if you really need to retain the current > behaviour. But the current behavior is buggy and non-intentional so it should be rectified. Basically we've made it easy for users to do something which they shouldn't really be doing, they then bear the consequences and come to the mailing list for support thinking something is broken. > > 2) Is there a way to engage with the developers and user community for > popular tools which make heavy use of snapshotting and/or send/receive > for discussion and testing? For example, btrbk, snapper, distros. > > 3) How about adding an IOCTL to allow the user to set/delete the > received_uuid? Only intended for cases which really need to emulate the > removed behaviour. This could be a less complex long term solution than > keeping option 1 indefinitely. > > Graham >
On Wed, Sep 8, 2021 at 7:35 PM David Sterba <dsterba@suse.cz> wrote: > > On Wed, Sep 08, 2021 at 04:34:47PM +0000, Martin Raiber wrote: > > On 08.09.2021 15:51 Nikolay Borisov wrote: > > > Currently when a read-only snapshot is received and subsequently its > > > ro property is set to false i.e. switched to rw-mode the > > > received_uuid/stime/rtime/stransid/rtransid of that subvol remains > > > intact. However, once the received volume is switched to RW mode we > > > cannot guaranteee that it contains the same data, so it makes sense > > > to remove those fields which indicate this volume was ever > > > send/received. Additionally, sending such volume can cause conflicts > > > due to the presence of received_uuid. > > > > > > Preserving the received_uuid (and related fields) after transitioning the > > > snapshot from RO to RW and then changing the snapshot has a potential for > > > causing send to fail in many unexpected ways if we later turn it back to > > > RO and use it for an incremental send operation. > > > > > > A recent example, in the thread to which the Link tag below points to, we > > > had a user with a filesystem that had multiple snapshots with the same > > > received_uuid but with different content due to a transition from RO to RW > > > and then back to RO. When an incremental send was attempted using two of > > > those snapshots, it resulted in send emitting a clone operation that was > > > intended to clone from the parent root to the send root - however because > > > both roots had the same received_uuid, the receiver ended up picking the > > > send root as the source root for the clone operation, which happened to > > > result in the clone operation to fail with -EINVAL, due to the source and > > > destination offsets being the same (and on the same root and file). In this > > > particular case there was no harm, but we could end up in a case where the > > > clone operation succeeds but clones wrong data due to picking up an > > > incorrect root - as in the case of multiple snapshots with the same > > > received_uuid, it has no way to know which one is the correct one if they > > > have different content. > > Not to overly discourage this change but I think this will cause some > > issues in user space. > > That this change can cause issues for users is the reason why it hasn't > been merged. The change on the kernel side is simple, but I've been > missing the progs part and the "what-if"s that happen in practice. > > This hasn't been communicated properly so we've got resends without > changes. I had a chat with Nikolay about what's missing so hopefully we > can move forward this time. Any reason why that wasn't shared in the past? > > > For example I had the problem of partial subvols after a sudden > > restart during receive. No problem, just receive to a directory that > > gets deleted on startup then move the subvol to the final location > > after completion. To move the subvol it needs to be temporarily set rw > > for some reason. I'm sure there is some better solution but patterns > > like this might be out there. > > Thanks, that's a case we should take into account. And there are > probably more, but I'm not using send/receive much so that's another > reason why I've been hesitant to merge the patch due to lack of > understanding of the use case. So this has been going on for years, and it's well known that changing a snapshot to RW, do changes on it, then back to RO and then use it for incremental sends, often causes problems. Most of the time, when there is a problem, the receive operation simply fails somewhere, and it's not obvious what went wrong. In last week's case I spent quite a considerable amount of time looking at why a clone operation was failing, convinced the problem was due to a bug in the algorithm of the sending side. But looking at that what scares the most is not those cases that result in an error - it's the cases where the incremental send succeeds but results in a silent corruption, like cloning from the wrong root, because there are multiple snapshots with the same received_uuid but different content. For the cloning case, we can work around and disable cloning completely for incremental sends - not ideal, but at least no silent corruptions due to cloning. For all the other cases not related to cloning, I have no idea how to prevent them. Yes, cases like Martin's are unfortunate and there's no easy way around. But having such failures, and especially silent corruption, is even worse IMO. Do you have suggestions on how to proceed? How would you solve the problem? Thanks.
On 09/09/2021 07:46, Nikolay Borisov wrote: > > > On 9.09.21 г. 0:24, Graham Cobb wrote: >> >> On 08/09/2021 19:33, David Sterba wrote: >>> On Wed, Sep 08, 2021 at 04:34:47PM +0000, Martin Raiber wrote: >> >> <snip> >> >>>> For example I had the problem of partial subvols after a sudden >>>> restart during receive. No problem, just receive to a directory that >>>> gets deleted on startup then move the subvol to the final location >>>> after completion. To move the subvol it needs to be temporarily set rw >>>> for some reason. I'm sure there is some better solution but patterns >>>> like this might be out there. >>> >>> Thanks, that's a case we should take into account. And there are >>> probably more, but I'm not using send/receive much so that's another >>> reason why I've been hesitant to merge the patch due to lack of >>> understanding of the use case. >>> >> >> This seems to be an important change to make, but is user-affecting. A >> few ideas: >> >> 1) Can it be made optional? On by default but possible to turn off >> (mount option, sys file, ...) if you really need to retain the current >> behaviour. > > But the current behavior is buggy and non-intentional so it should be > rectified. Basically we've made it easy for users to do something which > they shouldn't really be doing, they then bear the consequences and come > to the mailing list for support thinking something is broken. Yes, I agree completely. I was disappointed the change wasn't made last time this was discussed on the list. But it wasn't. And I think that was because of concern over the impact: the change will break some users' workflows or scripts and could break some important tools, apps or things like distro installation/upgrade scripts. That was the purpose of my suggestions: to break down barriers which might delay making this happen. >> 2) Is there a way to engage with the developers and user community for >> popular tools which make heavy use of snapshotting and/or send/receive >> for discussion and testing? For example, btrbk, snapper, distros. The point of this suggestion was to address David's concern of not understanding the use case. This could be useful when discussing the timing of the fix (and whether it can be backported to stable kernels). >> 3) How about adding an IOCTL to allow the user to set/delete the >> received_uuid? Only intended for cases which really need to emulate the >> removed behaviour. This could be a less complex long term solution than >> keeping option 1 indefinitely. And this suggestion was to make it "possible" to work round the change but, in practice, harder than just doing the right thing :-) I agree this is probably too far. Graham
On 09.09.2021 00:24, Graham Cobb wrote: > > On 08/09/2021 19:33, David Sterba wrote: >> On Wed, Sep 08, 2021 at 04:34:47PM +0000, Martin Raiber wrote: > > <snip> > >>> For example I had the problem of partial subvols after a sudden >>> restart during receive. No problem, just receive to a directory that >>> gets deleted on startup then move the subvol to the final location >>> after completion. To move the subvol it needs to be temporarily set rw >>> for some reason. I'm sure there is some better solution but patterns >>> like this might be out there. >> Better solution is to delete partial subvolume and start over. No need to mess with internals. >> Thanks, that's a case we should take into account. And there are >> probably more, but I'm not using send/receive much so that's another >> reason why I've been hesitant to merge the patch due to lack of >> understanding of the use case. >> > > This seems to be an important change to make, but is user-affecting. A > few ideas: > > 1) Can it be made optional? On by default but possible to turn off > (mount option, sys file, ...) if you really need to retain the current > behaviour. > And then next btrfs update will start modifying read-write subvolume when you do not expect it and we are back on square one. > 2) Is there a way to engage with the developers and user community for > popular tools which make heavy use of snapshotting and/or send/receive > for discussion and testing? For example, btrbk, snapper, distros. > The only reason you need possibility to flip read-only bit is to allow btrfs receive to write to subvolume and mark it read-only after that. What should have happened instead is subvolume state where only one process (btrfs receive) is allowed to write and subvolume is automatically removed if this process dies before having completed. Even better would be support for checkpoints and restarts. But current situation is really the worst possible. > 3) How about adding an IOCTL to allow the user to set/delete the > received_uuid? Only intended for cases which really need to emulate the > removed behaviour. This could be a less complex long term solution than > keeping option 1 indefinitely. > This IOCTL already exists and it is used by btrfs receive but it does not really solve anything. Why do you expect users to remember to use this IOCTL if users do not remember to never switch read-only subvolume that is part of replication to read-write? tw:/home/bor/python-btrfs/examples # btrfs sub sh /mnt/rcv2/src-snap1/ rcv2/src-snap1 Name: src-snap1 UUID: 05bb994c-605d-3042-b45a-1ba9d88822ea Parent UUID: - Received UUID: 213b224e-016e-0b43-a681-d534027cfe95 Creation time: 2021-09-01 20:30:35 +0300 Subvolume ID: 263 Generation: 118 Gen at creation: 114 Parent ID: 262 Top level ID: 262 Flags: readonly Snapshot(s): src-snap1-clone src-snap1-clone1 tw:/home/bor/python-btrfs/examples # btrfs property set /mnt/rcv2/src-snap1/ ro false tw:/home/bor/python-btrfs/examples # btrfs sub sh /mnt/rcv2/src-snap1/ rcv2/src-snap1 Name: src-snap1 UUID: 05bb994c-605d-3042-b45a-1ba9d88822ea Parent UUID: - Received UUID: 213b224e-016e-0b43-a681-d534027cfe95 Creation time: 2021-09-01 20:30:35 +0300 Subvolume ID: 263 Generation: 118 Gen at creation: 114 Parent ID: 262 Top level ID: 262 Flags: - Snapshot(s): src-snap1-clone src-snap1-clone1 tw:/home/bor/python-btrfs/examples # ./set_received_uuid.py 00000000-0000-0000-0000-000000000000 0 0.0 /mnt/rcv2/src-snap1/ Current subvolume information: subvol_id: 263 received_uuid: 213b224e-016e-0b43-a681-d534027cfe95 stime: 0.0 (1970-01-01T00:00:00) stransid: 111 rtime: 1630517435.297119052 (2021-09-01T17:30:35.297119) rtransid: 115 Setting received subvolume... Resulting subvolume information: subvol_id: 263 received_uuid: 00000000-0000-0000-0000-000000000000 stime: 0.0 (1970-01-01T00:00:00) stransid: 0 rtime: 1631189234.720887105 (2021-09-09T12:07:14.720887) rtransid: 167 tw:/home/bor/python-btrfs/examples # btrfs sub sh /mnt/rcv2/src-snap1/ rcv2/src-snap1 Name: src-snap1 UUID: 05bb994c-605d-3042-b45a-1ba9d88822ea Parent UUID: - Received UUID: - Creation time: 2021-09-01 20:30:35 +0300 Subvolume ID: 263 Generation: 118 Gen at creation: 114 Parent ID: 262 Top level ID: 262 Flags: - Snapshot(s): src-snap1-clone src-snap1-clone1 tw:/home/bor/python-btrfs/examples #
On 09.09.2021 11:37 Graham Cobb wrote: > On 09/09/2021 07:46, Nikolay Borisov wrote: > > > On 9.09.21 г. 0:24, Graham Cobb wrote: >>> On 08/09/2021 19:33, David Sterba wrote: >>>> On Wed, Sep 08, 2021 at 04:34:47PM +0000, Martin Raiber wrote: >>> <snip> >>> >>>>> For example I had the problem of partial subvols after a sudden >>>>> restart during receive. No problem, just receive to a directory that >>>>> gets deleted on startup then move the subvol to the final location >>>>> after completion. To move the subvol it needs to be temporarily set rw >>>>> for some reason. I'm sure there is some better solution but patterns >>>>> like this might be out there. >>>> Thanks, that's a case we should take into account. And there are >>>> probably more, but I'm not using send/receive much so that's another >>>> reason why I've been hesitant to merge the patch due to lack of >>>> understanding of the use case. >>>> >>> This seems to be an important change to make, but is user-affecting. A >>> few ideas: >>> >>> 1) Can it be made optional? On by default but possible to turn off >>> (mount option, sys file, ...) if you really need to retain the current >>> behaviour. >> But the current behavior is buggy and non-intentional so it should be >> rectified. Basically we've made it easy for users to do something which >> they shouldn't really be doing, they then bear the consequences and come >> to the mailing list for support thinking something is broken. > Yes, I agree completely. I was disappointed the change wasn't made last > time this was discussed on the list. But it wasn't. And I think that was > because of concern over the impact: the change will break some users' > workflows or scripts and could break some important tools, apps or > things like distro installation/upgrade scripts. That was the purpose of > my suggestions: to break down barriers which might delay making this happen. > >>> 2) Is there a way to engage with the developers and user community for >>> popular tools which make heavy use of snapshotting and/or send/receive >>> for discussion and testing? For example, btrbk, snapper, distros. > The point of this suggestion was to address David's concern of not > understanding the use case. This could be useful when discussing the > timing of the fix (and whether it can be backported to stable kernels). > >>> 3) How about adding an IOCTL to allow the user to set/delete the >>> received_uuid? Only intended for cases which really need to emulate the >>> removed behaviour. This could be a less complex long term solution than >>> keeping option 1 indefinitely. > And this suggestion was to make it "possible" to work round the change > but, in practice, harder than just doing the right thing :-) I agree > this is probably too far. 5) Have a new subvol flag and only reset received uuid on ro -> rw if this new subvol flag is set. At this point it is completely backwards compatible (if older kernels ignore unknown subvol flags?). Then change btrfs-progs to set this flag during receive (under whatever policy btrfs-progs has for backwards-compatibility). Maybe only for v2 streams but that might further delay the transition. btrfs_ioctl_received_subvol_args even has an already existing flags member.
On 2021/9/8 下午9:51, Nikolay Borisov wrote: > Currently when a read-only snapshot is received and subsequently its > ro property is set to false i.e. switched to rw-mode the > received_uuid/stime/rtime/stransid/rtransid of that subvol remains > intact. However, once the received volume is switched to RW mode we > cannot guaranteee that it contains the same data, so it makes sense > to remove those fields which indicate this volume was ever > send/received. Additionally, sending such volume can cause conflicts > due to the presence of received_uuid. > > Preserving the received_uuid (and related fields) after transitioning the > snapshot from RO to RW and then changing the snapshot has a potential for > causing send to fail in many unexpected ways if we later turn it back to > RO and use it for an incremental send operation. > > A recent example, in the thread to which the Link tag below points to, we > had a user with a filesystem that had multiple snapshots with the same > received_uuid but with different content due to a transition from RO to RW > and then back to RO. When an incremental send was attempted using two of > those snapshots, it resulted in send emitting a clone operation that was > intended to clone from the parent root to the send root - however because > both roots had the same received_uuid, the receiver ended up picking the > send root as the source root for the clone operation, which happened to > result in the clone operation to fail with -EINVAL, due to the source and > destination offsets being the same (and on the same root and file). In this > particular case there was no harm, but we could end up in a case where the > clone operation succeeds but clones wrong data due to picking up an > incorrect root - as in the case of multiple snapshots with the same > received_uuid, it has no way to know which one is the correct one if they > have different content. > > Link: https://lore.kernel.org/linux-btrfs/CAOaVUnV3L6RpcqJ5gaqzNXWXK0VMkEVXCdihawH1PgS6TiMchQ@mail.gmail.com/ > Signed-off-by: Nikolay Borisov <nborisov@suse.com> > Suggested-by: David Sterba <dsterba@suse.cz> Reviewed-by: Qu Wenruo <wqu@suse.com> Will add some warning for btrfs-progs to educate users. Thanks, Qu > --- > fs/btrfs/ioctl.c | 41 +++++++++++++++++++++++++++++++++++------ > 1 file changed, 35 insertions(+), 6 deletions(-) > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > index 9eb0c1eb568e..67709d274489 100644 > --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -1927,9 +1927,11 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file, > struct inode *inode = file_inode(file); > struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); > struct btrfs_root *root = BTRFS_I(inode)->root; > + struct btrfs_root_item *root_item = &root->root_item; > struct btrfs_trans_handle *trans; > u64 root_flags; > u64 flags; > + bool clear_received_state = false; > int ret = 0; > > if (!inode_owner_or_capable(file_mnt_user_ns(file), inode)) > @@ -1960,9 +1962,9 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file, > if (!!(flags & BTRFS_SUBVOL_RDONLY) == btrfs_root_readonly(root)) > goto out_drop_sem; > > - root_flags = btrfs_root_flags(&root->root_item); > + root_flags = btrfs_root_flags(root_item); > if (flags & BTRFS_SUBVOL_RDONLY) { > - btrfs_set_root_flags(&root->root_item, > + btrfs_set_root_flags(root_item, > root_flags | BTRFS_ROOT_SUBVOL_RDONLY); > } else { > /* > @@ -1971,9 +1973,10 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file, > */ > spin_lock(&root->root_item_lock); > if (root->send_in_progress == 0) { > - btrfs_set_root_flags(&root->root_item, > + btrfs_set_root_flags(root_item, > root_flags & ~BTRFS_ROOT_SUBVOL_RDONLY); > spin_unlock(&root->root_item_lock); > + clear_received_state = true; > } else { > spin_unlock(&root->root_item_lock); > btrfs_warn(fs_info, > @@ -1984,14 +1987,40 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file, > } > } > > - trans = btrfs_start_transaction(root, 1); > + /* > + * 1 item for updating the uuid root in the root tree > + * 1 item for actually removing the uuid record in the uuid tree > + */ > + trans = btrfs_start_transaction(root, 2); > if (IS_ERR(trans)) { > ret = PTR_ERR(trans); > goto out_reset; > } > > - ret = btrfs_update_root(trans, fs_info->tree_root, > - &root->root_key, &root->root_item); > + if (clear_received_state && > + !btrfs_is_empty_uuid(root_item->received_uuid)) { > + > + ret = btrfs_uuid_tree_remove(trans, root_item->received_uuid, > + BTRFS_UUID_KEY_RECEIVED_SUBVOL, > + root->root_key.objectid); > + > + if (ret && ret != -ENOENT) { > + btrfs_abort_transaction(trans, ret); > + btrfs_end_transaction(trans); > + goto out_reset; > + } > + > + memset(root_item->received_uuid, 0, BTRFS_UUID_SIZE); > + btrfs_set_root_stransid(root_item, 0); > + btrfs_set_root_rtransid(root_item, 0); > + btrfs_set_stack_timespec_sec(&root_item->stime, 0); > + btrfs_set_stack_timespec_nsec(&root_item->stime, 0); > + btrfs_set_stack_timespec_sec(&root_item->rtime, 0); > + btrfs_set_stack_timespec_nsec(&root_item->rtime, 0); > + } > + > + ret = btrfs_update_root(trans, fs_info->tree_root, &root->root_key, > + root_item); > if (ret < 0) { > btrfs_end_transaction(trans); > goto out_reset; >
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 9eb0c1eb568e..67709d274489 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1927,9 +1927,11 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file, struct inode *inode = file_inode(file); struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); struct btrfs_root *root = BTRFS_I(inode)->root; + struct btrfs_root_item *root_item = &root->root_item; struct btrfs_trans_handle *trans; u64 root_flags; u64 flags; + bool clear_received_state = false; int ret = 0; if (!inode_owner_or_capable(file_mnt_user_ns(file), inode)) @@ -1960,9 +1962,9 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file, if (!!(flags & BTRFS_SUBVOL_RDONLY) == btrfs_root_readonly(root)) goto out_drop_sem; - root_flags = btrfs_root_flags(&root->root_item); + root_flags = btrfs_root_flags(root_item); if (flags & BTRFS_SUBVOL_RDONLY) { - btrfs_set_root_flags(&root->root_item, + btrfs_set_root_flags(root_item, root_flags | BTRFS_ROOT_SUBVOL_RDONLY); } else { /* @@ -1971,9 +1973,10 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file, */ spin_lock(&root->root_item_lock); if (root->send_in_progress == 0) { - btrfs_set_root_flags(&root->root_item, + btrfs_set_root_flags(root_item, root_flags & ~BTRFS_ROOT_SUBVOL_RDONLY); spin_unlock(&root->root_item_lock); + clear_received_state = true; } else { spin_unlock(&root->root_item_lock); btrfs_warn(fs_info, @@ -1984,14 +1987,40 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file, } } - trans = btrfs_start_transaction(root, 1); + /* + * 1 item for updating the uuid root in the root tree + * 1 item for actually removing the uuid record in the uuid tree + */ + trans = btrfs_start_transaction(root, 2); if (IS_ERR(trans)) { ret = PTR_ERR(trans); goto out_reset; } - ret = btrfs_update_root(trans, fs_info->tree_root, - &root->root_key, &root->root_item); + if (clear_received_state && + !btrfs_is_empty_uuid(root_item->received_uuid)) { + + ret = btrfs_uuid_tree_remove(trans, root_item->received_uuid, + BTRFS_UUID_KEY_RECEIVED_SUBVOL, + root->root_key.objectid); + + if (ret && ret != -ENOENT) { + btrfs_abort_transaction(trans, ret); + btrfs_end_transaction(trans); + goto out_reset; + } + + memset(root_item->received_uuid, 0, BTRFS_UUID_SIZE); + btrfs_set_root_stransid(root_item, 0); + btrfs_set_root_rtransid(root_item, 0); + btrfs_set_stack_timespec_sec(&root_item->stime, 0); + btrfs_set_stack_timespec_nsec(&root_item->stime, 0); + btrfs_set_stack_timespec_sec(&root_item->rtime, 0); + btrfs_set_stack_timespec_nsec(&root_item->rtime, 0); + } + + ret = btrfs_update_root(trans, fs_info->tree_root, &root->root_key, + root_item); if (ret < 0) { btrfs_end_transaction(trans); goto out_reset;
Currently when a read-only snapshot is received and subsequently its ro property is set to false i.e. switched to rw-mode the received_uuid/stime/rtime/stransid/rtransid of that subvol remains intact. However, once the received volume is switched to RW mode we cannot guaranteee that it contains the same data, so it makes sense to remove those fields which indicate this volume was ever send/received. Additionally, sending such volume can cause conflicts due to the presence of received_uuid. Preserving the received_uuid (and related fields) after transitioning the snapshot from RO to RW and then changing the snapshot has a potential for causing send to fail in many unexpected ways if we later turn it back to RO and use it for an incremental send operation. A recent example, in the thread to which the Link tag below points to, we had a user with a filesystem that had multiple snapshots with the same received_uuid but with different content due to a transition from RO to RW and then back to RO. When an incremental send was attempted using two of those snapshots, it resulted in send emitting a clone operation that was intended to clone from the parent root to the send root - however because both roots had the same received_uuid, the receiver ended up picking the send root as the source root for the clone operation, which happened to result in the clone operation to fail with -EINVAL, due to the source and destination offsets being the same (and on the same root and file). In this particular case there was no harm, but we could end up in a case where the clone operation succeeds but clones wrong data due to picking up an incorrect root - as in the case of multiple snapshots with the same received_uuid, it has no way to know which one is the correct one if they have different content. Link: https://lore.kernel.org/linux-btrfs/CAOaVUnV3L6RpcqJ5gaqzNXWXK0VMkEVXCdihawH1PgS6TiMchQ@mail.gmail.com/ Signed-off-by: Nikolay Borisov <nborisov@suse.com> Suggested-by: David Sterba <dsterba@suse.cz> --- fs/btrfs/ioctl.c | 41 +++++++++++++++++++++++++++++++++++------ 1 file changed, 35 insertions(+), 6 deletions(-)