Message ID | 1433416690-19177-6-git-send-email-robbieko@synology.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote: > There's one case where we can't issue a utimes operation for a directory. > When 263 will delete, waiting 261 and set 261 rmdir_ino, but 262 earlier > processed and update uime between two parent directory. > So fix this by not update non exist utimes for this case. So you mean that we attempt to update utimes for an inode, corresponding to a directory, that exists in the parent snapshot but not in the send snapshot. So the subject should be something like "Btrfs: incremental send, don't send utimes for non-existing directory" instead of "Btrfs: incremental send, fix rmdir not send utimes" > > Example: > > Parent snapshot: > |---- a/ (ino 259) > |---- c (ino 264) > |---- b/ (ino 260) > |---- d (ino 265) > |---- del/ (ino 263) > |---- item1/ (ino 261) > |---- item2/ (ino 262) > > Send snapshot: > |---- a/ (ino 259) > |---- b/ (ino 260) > |---- c/ (ino 2) > |---- item2 (ino 259) > |---- d/ (ino 257) > |---- item1/ (ino 258) > > Signed-off-by: Robbie Ko <robbieko@synology.com> > --- > fs/btrfs/send.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c > index e8eb3ab..46f954c 100644 > --- a/fs/btrfs/send.c > +++ b/fs/btrfs/send.c > @@ -2468,7 +2468,7 @@ verbose_printk("btrfs: send_utimes %llu\n", ino); > key.type = BTRFS_INODE_ITEM_KEY; > key.offset = 0; > ret = btrfs_search_slot(NULL, sctx->send_root, &key, path, 0, 0); > - if (ret < 0) > + if (ret != 0) > goto out; So I don't think this is a good fix. The problem is in some code that calls this function (send_utimes) against the directory that doesn't exist - it just shouldn't do that, its logic should be fixed. Following this approach, while it works, it's just hiding logic errors in one or more code paths, and none of its callers checks for a return value of 1 - they only react to values < 0 and introduces the possibility of propagating a return value of 1 to user space. thanks > > eb = path->nodes[0]; > -- > 1.9.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Filipe, I've fixed "don't send utimes for non-existing directory" with another solution. In apply_dir_move(), the old parent dir. and new parent dir. will be updated after the current dir. has moved. And there's only one entry in old parent dir. (e.g. entry with smallest ino) will be tagged with rmdir_ino to prevent its parent dir. is deleted but updated. However, if we process rename for another entry not tagged with rmdir_ino first, its old parent dir. which is deleted will be updated according to apply_dir_move(). Therefore, I think we should check the existence of the dir. before we're going to update it's utime. The patch is pasted in the following link, could you give me some comment? https://friendpaste.com/h8tZqOS9iAUpp2DvgGI2k Thans! Robbie Ko 2015-06-05 0:14 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>: > On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote: >> There's one case where we can't issue a utimes operation for a directory. >> When 263 will delete, waiting 261 and set 261 rmdir_ino, but 262 earlier >> processed and update uime between two parent directory. >> So fix this by not update non exist utimes for this case. > > So you mean that we attempt to update utimes for an inode, > corresponding to a directory, that exists in the parent snapshot but > not in the send snapshot. > > So the subject should be something like "Btrfs: incremental send, > don't send utimes for non-existing directory" instead of "Btrfs: > incremental send, fix rmdir not send utimes" > >> >> Example: >> >> Parent snapshot: >> |---- a/ (ino 259) >> |---- c (ino 264) >> |---- b/ (ino 260) >> |---- d (ino 265) >> |---- del/ (ino 263) >> |---- item1/ (ino 261) >> |---- item2/ (ino 262) >> >> Send snapshot: >> |---- a/ (ino 259) >> |---- b/ (ino 260) >> |---- c/ (ino 2) >> |---- item2 (ino 259) >> |---- d/ (ino 257) >> |---- item1/ (ino 258) >> >> Signed-off-by: Robbie Ko <robbieko@synology.com> >> --- >> fs/btrfs/send.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c >> index e8eb3ab..46f954c 100644 >> --- a/fs/btrfs/send.c >> +++ b/fs/btrfs/send.c >> @@ -2468,7 +2468,7 @@ verbose_printk("btrfs: send_utimes %llu\n", ino); >> key.type = BTRFS_INODE_ITEM_KEY; >> key.offset = 0; >> ret = btrfs_search_slot(NULL, sctx->send_root, &key, path, 0, 0); >> - if (ret < 0) >> + if (ret != 0) >> goto out; > > So I don't think this is a good fix. The problem is in some code that > calls this function (send_utimes) against the directory that doesn't > exist - it just shouldn't do that, its logic should be fixed. > Following this approach, while it works, it's just hiding logic errors > in one or more code paths, and none of its callers checks for a return > value of 1 - they only react to values < 0 and introduces the > possibility of propagating a return value of 1 to user space. > > thanks > >> >> eb = path->nodes[0]; >> -- >> 1.9.1 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > -- > Filipe David Manana, > > "Reasonable men adapt themselves to the world. > Unreasonable men adapt the world to themselves. > That's why all progress depends on unreasonable men." -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jun 8, 2015 at 4:44 AM, Robbie Ko <robbieko@synology.com> wrote: > Hi Filipe, Hi Robbie, > > I've fixed "don't send utimes for non-existing directory" with another solution. > > In apply_dir_move(), the old parent dir. and new parent dir. will be > updated after the current dir. has moved. > > And there's only one entry in old parent dir. (e.g. entry with > smallest ino) will be tagged with rmdir_ino to prevent its parent dir. > is deleted but updated. Can't parse this phrase. What do you mean by tagging an entry with rmdir_ino? rmdir_ino corresponds to the number of a inode that wasn't deleted when it was processed because there was some inode with a lower number that is a child of the directory in the parent snapshot and had its rename/move operation delayed (it happens after the directory we want to delete is processed). > > However, if we process rename for another entry not tagged with > rmdir_ino first, its old parent dir. which is deleted will be updated > according to apply_dir_move(). > > Therefore, I think we should check the existence of the dir. before > we're going to update it's utime. > > The patch is pasted in the following link, could you give me some comment? > > https://friendpaste.com/h8tZqOS9iAUpp2DvgGI2k Looks better. However I still don't understand your explanation, and just tried the example in your commit message: "Parent snapshot: |---- a/ (ino 259) |---- c (ino 264) |---- b/ (ino 260) |---- d (ino 265) |---- del/ (ino 263) |---- item1/ (ino 261) |---- item2/ (ino 262) Send snapshot: |---- a/ (ino 259) |---- b/ (ino 260) |---- c/ (ino 2) |---- item2 (ino 259) |---- d/ (ino 257) |---- item1/ (ino 258)" So it's confusing after looking at it. First the send snapshot mentions inode number 2, which doesn't exist in the parent snapshot - I assume you meant inode number 264. Then, the send snapshot has two inodes with number 259. Is "item2" in the send snapshot supposed to be inode 262? Anyway, assuming those 2 fixes to the example are correct guesses, I tried the following and it didn't fail without your patches (i.e. no attempts to send utimes to a non-existing directory): # Parent snapshot: # # |---- a/ (ino 259) # | |---- c (ino 264) # | # |---- b/ (ino 260) # | |---- d (ino 265) # | # |---- del/ (ino 263) # |---- item1/ (ino 261) # |---- item2/ (ino 262) # Send snapshot: # # |---- a/ (ino 259) # |---- b/ (ino 260) # |---- c/ (ino 264) # | |---- item2/ (ino 262) # | # |---- d/ (ino 265) # |---- item1/ (ino 258) mkdir $SCRATCH_MNT/0 mkdir $SCRATCH_MNT/1 mkdir $SCRATCH_MNT/a # 259 mkdir $SCRATCH_MNT/b # 260 mkdir $SCRATCH_MNT/item1 # 261 mkdir $SCRATCH_MNT/item2 # 262 mkdir $SCRATCH_MNT/del # 263 mv $SCRATCH_MNT/item1 $SCRATCH_MNT/del/item1 mv $SCRATCH_MNT/item2 $SCRATCH_MNT/del/item2 mkdir $SCRATCH_MNT/a/c # 264 mkdir $SCRATCH_MNT/b/d # 265 _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1 mv $SCRATCH_MNT/a/c $SCRATCH_MNT/c mv $SCRATCH_MNT/b/d $SCRATCH_MNT/d mv $SCRATCH_MNT/del/item2 $SCRATCH_MNT/c mv $SCRATCH_MNT/del/item1 $SCRATCH_MNT/d rmdir $SCRATCH_MNT/del _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2 run_check $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1 run_check $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \ $SCRATCH_MNT/mysnap2 _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \ -f $tmp/2.snap _check_scratch_fs _scratch_unmount _scratch_mkfs >/dev/null 2>&1 _scratch_mount _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/1.snap run_check $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1 _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/2.snap run_check $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2 I would suggest making those hiearachy diagrams more readable - pipes right below the name of their parent, continuation pipes like and align all inode numbers in the same column, like the following: # Parent snapshot: # # |---- a/ (ino 259) # | |---- c (ino 264) # | # |---- b/ (ino 260) # | |---- d (ino 265) # | # |---- del/ (ino 263) # |---- item1/ (ino 261) # |---- item2/ (ino 262) # Send snapshot: # # |---- a/ (ino 259) # |---- b/ (ino 260) # |---- c/ (ino 264) # | |---- item2/ (ino 262) # | # |---- d/ (ino 265) # |---- item1/ (ino 258) (pasted here in case gmail screws up the indentation/formatting: https://friendpaste.com/12wzqdcfFrlDdd1AiKX0bU) thanks > > Thans! > > Robbie Ko > > 2015-06-05 0:14 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>: >> On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote: >>> There's one case where we can't issue a utimes operation for a directory. >>> When 263 will delete, waiting 261 and set 261 rmdir_ino, but 262 earlier >>> processed and update uime between two parent directory. >>> So fix this by not update non exist utimes for this case. >> >> So you mean that we attempt to update utimes for an inode, >> corresponding to a directory, that exists in the parent snapshot but >> not in the send snapshot. >> >> So the subject should be something like "Btrfs: incremental send, >> don't send utimes for non-existing directory" instead of "Btrfs: >> incremental send, fix rmdir not send utimes" >> >>> >>> Example: >>> >>> Parent snapshot: >>> |---- a/ (ino 259) >>> |---- c (ino 264) >>> |---- b/ (ino 260) >>> |---- d (ino 265) >>> |---- del/ (ino 263) >>> |---- item1/ (ino 261) >>> |---- item2/ (ino 262) >>> >>> Send snapshot: >>> |---- a/ (ino 259) >>> |---- b/ (ino 260) >>> |---- c/ (ino 2) >>> |---- item2 (ino 259) >>> |---- d/ (ino 257) >>> |---- item1/ (ino 258) >>> >>> Signed-off-by: Robbie Ko <robbieko@synology.com> >>> --- >>> fs/btrfs/send.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c >>> index e8eb3ab..46f954c 100644 >>> --- a/fs/btrfs/send.c >>> +++ b/fs/btrfs/send.c >>> @@ -2468,7 +2468,7 @@ verbose_printk("btrfs: send_utimes %llu\n", ino); >>> key.type = BTRFS_INODE_ITEM_KEY; >>> key.offset = 0; >>> ret = btrfs_search_slot(NULL, sctx->send_root, &key, path, 0, 0); >>> - if (ret < 0) >>> + if (ret != 0) >>> goto out; >> >> So I don't think this is a good fix. The problem is in some code that >> calls this function (send_utimes) against the directory that doesn't >> exist - it just shouldn't do that, its logic should be fixed. >> Following this approach, while it works, it's just hiding logic errors >> in one or more code paths, and none of its callers checks for a return >> value of 1 - they only react to values < 0 and introduces the >> possibility of propagating a return value of 1 to user space. >> >> thanks >> >>> >>> eb = path->nodes[0]; >>> -- >>> 1.9.1 >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> >> >> -- >> Filipe David Manana, >> >> "Reasonable men adapt themselves to the world. >> Unreasonable men adapt the world to themselves. >> That's why all progress depends on unreasonable men."
Hi Filipe, 2015-06-08 22:00 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>: > On Mon, Jun 8, 2015 at 4:44 AM, Robbie Ko <robbieko@synology.com> wrote: >> Hi Filipe, > > Hi Robbie, > >> >> I've fixed "don't send utimes for non-existing directory" with another solution. >> >> In apply_dir_move(), the old parent dir. and new parent dir. will be >> updated after the current dir. has moved. >> >> And there's only one entry in old parent dir. (e.g. entry with >> smallest ino) will be tagged with rmdir_ino to prevent its parent dir. >> is deleted but updated. > > Can't parse this phrase. What do you mean by tagging an entry with rmdir_ino? > rmdir_ino corresponds to the number of a inode that wasn't deleted > when it was processed because there was some inode with a lower number > that is a child of the directory in the parent snapshot and had its > rename/move operation delayed (it happens after the directory we want > to delete is processed). > Right , my "tagged with rmdir_ino" is same meaning as you explained here. >> >> However, if we process rename for another entry not tagged with >> rmdir_ino first, its old parent dir. which is deleted will be updated >> according to apply_dir_move(). >> >> Therefore, I think we should check the existence of the dir. before >> we're going to update it's utime. >> >> The patch is pasted in the following link, could you give me some comment? >> >> https://friendpaste.com/h8tZqOS9iAUpp2DvgGI2k > > Looks better. > However I still don't understand your explanation, and just tried the > example in your commit message: > > "Parent snapshot: > > |---- a/ (ino 259) > |---- c (ino 264) > |---- b/ (ino 260) > |---- d (ino 265) > |---- del/ (ino 263) > |---- item1/ (ino 261) > |---- item2/ (ino 262) > > Send snapshot: > |---- a/ (ino 259) > |---- b/ (ino 260) > |---- c/ (ino 2) > |---- item2 (ino 259) > |---- d/ (ino 257) > |---- item1/ (ino 258)" > > So it's confusing after looking at it. > First the send snapshot mentions inode number 2, which doesn't exist > in the parent snapshot - I assume you meant inode number 264. > Then, the send snapshot has two inodes with number 259. Is "item2" in > the send snapshot supposed to be inode 262? > Your guess is right. And I correct it as follow. # Parent snapshot: # # |---- a/ (ino 259) # | |---- c (ino 264) # | # |---- b/ (ino 260) # | |---- d (ino 265) # | # |---- del/ (ino 263) # |---- item1/ (ino 261) # |---- item2/ (ino 262) # Send snapshot: # # |---- a/ (ino 259) # |---- b/ (ino 260) # |---- c/ (ino 264) # | |---- item2/ (ino 262) # | # |---- d/ (ino 265) # |---- item1/ (ino 261) > Anyway, assuming those 2 fixes to the example are correct guesses, I > tried the following and it didn't fail without your patches (i.e. no > attempts to send utimes to a non-existing directory): > Here my mean is : btrfs tries to get utime from non-existing directory and apply it on the existing directory. And my patch is attempted to avoid this case. However, this case is not guaranteed to cause error anytime but it may fails somehow which is depending on the data on the disk. The following are the incremental procedures to send the snapshot. utimes utimes a utimes b rename del -> o263-259-0 utimes rename a/c -> c utimes utimes a rename o263-259-0/item2 -> c/item2 utimes c/item2 utimes o263-259-0 <<---------------------- this step may cause error utimes c utimes c rename b/d -> d utimes utimes b rename o263-259-0/item1 -> d/item1 rmdir o263-259-0 utimes d/item1 utimes d utimes d As the above pointed procedure, o263-259-0 is not appeared in the send root. When utime got from o263-259-0 is invalid (i.e. out of range of time format), it will fail. I saw the error occurs at utimensat() and got EINVAL. Here's explanation from linux man page. EINVAL Invalid value in one of the tv_nsec fields (value outside range 0 to 999,999,999, and not UTIME_NOW or UTIME_OMIT); or an invalid value in one of the tv_sec fields. So if o263-259-0 is not the send root, invalid format of utime may be got. Thanks. Robbie Ko > # Parent snapshot: > # > # |---- a/ (ino 259) > # | |---- c (ino 264) > # | > # |---- b/ (ino 260) > # | |---- d (ino 265) > # | > # |---- del/ (ino 263) > # |---- item1/ (ino 261) > # |---- item2/ (ino 262) > > # Send snapshot: > # > # |---- a/ (ino 259) > # |---- b/ (ino 260) > # |---- c/ (ino 264) > # | |---- item2/ (ino 262) > # | > # |---- d/ (ino 265) > # |---- item1/ (ino 258) > > mkdir $SCRATCH_MNT/0 > mkdir $SCRATCH_MNT/1 > > mkdir $SCRATCH_MNT/a # 259 > mkdir $SCRATCH_MNT/b # 260 > mkdir $SCRATCH_MNT/item1 # 261 > mkdir $SCRATCH_MNT/item2 # 262 > mkdir $SCRATCH_MNT/del # 263 > mv $SCRATCH_MNT/item1 $SCRATCH_MNT/del/item1 > mv $SCRATCH_MNT/item2 $SCRATCH_MNT/del/item2 > mkdir $SCRATCH_MNT/a/c # 264 > mkdir $SCRATCH_MNT/b/d # 265 > > _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1 > > mv $SCRATCH_MNT/a/c $SCRATCH_MNT/c > mv $SCRATCH_MNT/b/d $SCRATCH_MNT/d > mv $SCRATCH_MNT/del/item2 $SCRATCH_MNT/c > mv $SCRATCH_MNT/del/item1 $SCRATCH_MNT/d > rmdir $SCRATCH_MNT/del > > _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2 > > run_check $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1 > run_check $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \ > $SCRATCH_MNT/mysnap2 > > _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap > _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \ > -f $tmp/2.snap > > _check_scratch_fs > > _scratch_unmount > _scratch_mkfs >/dev/null 2>&1 > _scratch_mount > > _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/1.snap > run_check $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1 > > _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/2.snap > run_check $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2 > > > > I would suggest making those hiearachy diagrams more readable - pipes > right below the name of their parent, continuation pipes like and > align all inode numbers in the same column, like the following: > > # Parent snapshot: > # > # |---- a/ (ino 259) > # | |---- c (ino 264) > # | > # |---- b/ (ino 260) > # | |---- d (ino 265) > # | > # |---- del/ (ino 263) > # |---- item1/ (ino 261) > # |---- item2/ (ino 262) > > # Send snapshot: > # > # |---- a/ (ino 259) > # |---- b/ (ino 260) > # |---- c/ (ino 264) > # | |---- item2/ (ino 262) > # | > # |---- d/ (ino 265) > # |---- item1/ (ino 258) > > (pasted here in case gmail screws up the indentation/formatting: > https://friendpaste.com/12wzqdcfFrlDdd1AiKX0bU) > > thanks > >> >> Thans! >> >> Robbie Ko >> >> 2015-06-05 0:14 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>: >>> On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote: >>>> There's one case where we can't issue a utimes operation for a directory. >>>> When 263 will delete, waiting 261 and set 261 rmdir_ino, but 262 earlier >>>> processed and update uime between two parent directory. >>>> So fix this by not update non exist utimes for this case. >>> >>> So you mean that we attempt to update utimes for an inode, >>> corresponding to a directory, that exists in the parent snapshot but >>> not in the send snapshot. >>> >>> So the subject should be something like "Btrfs: incremental send, >>> don't send utimes for non-existing directory" instead of "Btrfs: >>> incremental send, fix rmdir not send utimes" >>> >>>> >>>> Example: >>>> >>>> Parent snapshot: >>>> |---- a/ (ino 259) >>>> |---- c (ino 264) >>>> |---- b/ (ino 260) >>>> |---- d (ino 265) >>>> |---- del/ (ino 263) >>>> |---- item1/ (ino 261) >>>> |---- item2/ (ino 262) >>>> >>>> Send snapshot: >>>> |---- a/ (ino 259) >>>> |---- b/ (ino 260) >>>> |---- c/ (ino 2) >>>> |---- item2 (ino 259) >>>> |---- d/ (ino 257) >>>> |---- item1/ (ino 258) >>>> >>>> Signed-off-by: Robbie Ko <robbieko@synology.com> >>>> --- >>>> fs/btrfs/send.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c >>>> index e8eb3ab..46f954c 100644 >>>> --- a/fs/btrfs/send.c >>>> +++ b/fs/btrfs/send.c >>>> @@ -2468,7 +2468,7 @@ verbose_printk("btrfs: send_utimes %llu\n", ino); >>>> key.type = BTRFS_INODE_ITEM_KEY; >>>> key.offset = 0; >>>> ret = btrfs_search_slot(NULL, sctx->send_root, &key, path, 0, 0); >>>> - if (ret < 0) >>>> + if (ret != 0) >>>> goto out; >>> >>> So I don't think this is a good fix. The problem is in some code that >>> calls this function (send_utimes) against the directory that doesn't >>> exist - it just shouldn't do that, its logic should be fixed. >>> Following this approach, while it works, it's just hiding logic errors >>> in one or more code paths, and none of its callers checks for a return >>> value of 1 - they only react to values < 0 and introduces the >>> possibility of propagating a return value of 1 to user space. >>> >>> thanks >>> >>>> >>>> eb = path->nodes[0]; >>>> -- >>>> 1.9.1 >>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >>> >>> >>> -- >>> Filipe David Manana, >>> >>> "Reasonable men adapt themselves to the world. >>> Unreasonable men adapt the world to themselves. >>> That's why all progress depends on unreasonable men." > > > > -- > Filipe David Manana, > > "Reasonable men adapt themselves to the world. > Unreasonable men adapt the world to themselves. > That's why all progress depends on unreasonable men." -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jun 9, 2015 at 11:04 AM, Robbie Ko <robbieko@synology.com> wrote: > Hi Filipe, > > 2015-06-08 22:00 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>: >> On Mon, Jun 8, 2015 at 4:44 AM, Robbie Ko <robbieko@synology.com> wrote: >>> Hi Filipe, >> >> Hi Robbie, >> >>> >>> I've fixed "don't send utimes for non-existing directory" with another solution. >>> >>> In apply_dir_move(), the old parent dir. and new parent dir. will be >>> updated after the current dir. has moved. >>> >>> And there's only one entry in old parent dir. (e.g. entry with >>> smallest ino) will be tagged with rmdir_ino to prevent its parent dir. >>> is deleted but updated. >> >> Can't parse this phrase. What do you mean by tagging an entry with rmdir_ino? >> rmdir_ino corresponds to the number of a inode that wasn't deleted >> when it was processed because there was some inode with a lower number >> that is a child of the directory in the parent snapshot and had its >> rename/move operation delayed (it happens after the directory we want >> to delete is processed). >> > > Right , my "tagged with rmdir_ino" is same meaning as you explained here. > >>> >>> However, if we process rename for another entry not tagged with >>> rmdir_ino first, its old parent dir. which is deleted will be updated >>> according to apply_dir_move(). >>> >>> Therefore, I think we should check the existence of the dir. before >>> we're going to update it's utime. >>> >>> The patch is pasted in the following link, could you give me some comment? >>> >>> https://friendpaste.com/h8tZqOS9iAUpp2DvgGI2k >> >> Looks better. >> However I still don't understand your explanation, and just tried the >> example in your commit message: >> >> "Parent snapshot: >> >> |---- a/ (ino 259) >> |---- c (ino 264) >> |---- b/ (ino 260) >> |---- d (ino 265) >> |---- del/ (ino 263) >> |---- item1/ (ino 261) >> |---- item2/ (ino 262) >> >> Send snapshot: >> |---- a/ (ino 259) >> |---- b/ (ino 260) >> |---- c/ (ino 2) >> |---- item2 (ino 259) >> |---- d/ (ino 257) >> |---- item1/ (ino 258)" >> >> So it's confusing after looking at it. >> First the send snapshot mentions inode number 2, which doesn't exist >> in the parent snapshot - I assume you meant inode number 264. >> Then, the send snapshot has two inodes with number 259. Is "item2" in >> the send snapshot supposed to be inode 262? >> > > Your guess is right. And I correct it as follow. > > # Parent snapshot: > # > # |---- a/ (ino 259) > # | |---- c (ino 264) > # | > # |---- b/ (ino 260) > # | |---- d (ino 265) > # | > # |---- del/ (ino 263) > # |---- item1/ (ino 261) > # |---- item2/ (ino 262) > > # Send snapshot: > # > # |---- a/ (ino 259) > # |---- b/ (ino 260) > # |---- c/ (ino 264) > # | |---- item2/ (ino 262) > # | > # |---- d/ (ino 265) > # |---- item1/ (ino 261) > >> Anyway, assuming those 2 fixes to the example are correct guesses, I >> tried the following and it didn't fail without your patches (i.e. no >> attempts to send utimes to a non-existing directory): >> > > Here my mean is : btrfs tries to get utime from non-existing directory and > apply it on the existing directory. And my patch is attempted to avoid > this case. > However, this case is not guaranteed to cause error anytime but it may > fails somehow > which is depending on the data on the disk. > The following are the incremental procedures to send the snapshot. > > utimes > utimes a > utimes b > rename del -> o263-259-0 > utimes > rename a/c -> c > utimes > utimes a > rename o263-259-0/item2 -> c/item2 > utimes c/item2 > utimes o263-259-0 <<---------------------- this step may cause error Why may it cause an error? At that moment the name/path o263-259-0 exists at the destination (i.e. the receiver, as it applies commands from the send stream serially). > utimes c > utimes c > rename b/d -> d > utimes > utimes b > rename o263-259-0/item1 -> d/item1 > rmdir o263-259-0 > utimes d/item1 > utimes d > utimes d > > As the above pointed procedure, o263-259-0 is not appeared in the send root. Well yes, but that doesn't matter. The oXXX-YYY-ZZZ names are never in any of the roots (send or parent), they're just temporary names to allow for rename/move operations when collisions are detected, and exist only in the receiver's filesystem. > When utime got from o263-259-0 is invalid (i.e. out of range of time > format), it will fail. > I saw the error occurs at utimensat() and got EINVAL. Here's > explanation from linux man page. > > EINVAL Invalid value in one of the tv_nsec fields (value outside > range 0 to 999,999,999, and not UTIME_NOW or UTIME_OMIT); or > an invalid value in one of the tv_sec fields. > > So if o263-259-0 is not the send root, invalid format of utime may be got. Doubly confused now. So the whole change log (and the code changes) mentions only attempts to send utimes for a directory/path that doesn't exist in the receiver's fs - inode 263 exists in both the send and parent roots, so we can always get its utimes values. Now you're saying that somewhere in the send code we're getting incorrect values for a utimes operation and then sending such operation to the receiver? Getting the values for utimes is done looking up the inode item, by its number, in the send root - for this the current name of the inode in the receiver fs doesn't matter - it matters only for building the path for the utimes operation. But what I'm understanding from your reply is that we're getting a wrong utimes value form the inode item for inode number 263, which would be a totally different issue from generating incorrect paths. So this still doesn't explain me why, without any of your patches applied, the reproducer doesn't fail - which is really an important aspect. This is the xfstest I made with your reproducer: https://friendpaste.com/2WyDxPe2FtVhOfECBk1VKF (and tests/btrfs/999.out is just contains the single line "QA output created by 999"). If you're able to give me a correct reproducer, I can make sense of it and help getting a better change log. Or this issue happens only after applying some of the other patches in the series? thanks > > Thanks. > Robbie Ko > >> # Parent snapshot: >> # >> # |---- a/ (ino 259) >> # | |---- c (ino 264) >> # | >> # |---- b/ (ino 260) >> # | |---- d (ino 265) >> # | >> # |---- del/ (ino 263) >> # |---- item1/ (ino 261) >> # |---- item2/ (ino 262) >> >> # Send snapshot: >> # >> # |---- a/ (ino 259) >> # |---- b/ (ino 260) >> # |---- c/ (ino 264) >> # | |---- item2/ (ino 262) >> # | >> # |---- d/ (ino 265) >> # |---- item1/ (ino 258) >> >> mkdir $SCRATCH_MNT/0 >> mkdir $SCRATCH_MNT/1 >> >> mkdir $SCRATCH_MNT/a # 259 >> mkdir $SCRATCH_MNT/b # 260 >> mkdir $SCRATCH_MNT/item1 # 261 >> mkdir $SCRATCH_MNT/item2 # 262 >> mkdir $SCRATCH_MNT/del # 263 >> mv $SCRATCH_MNT/item1 $SCRATCH_MNT/del/item1 >> mv $SCRATCH_MNT/item2 $SCRATCH_MNT/del/item2 >> mkdir $SCRATCH_MNT/a/c # 264 >> mkdir $SCRATCH_MNT/b/d # 265 >> >> _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1 >> >> mv $SCRATCH_MNT/a/c $SCRATCH_MNT/c >> mv $SCRATCH_MNT/b/d $SCRATCH_MNT/d >> mv $SCRATCH_MNT/del/item2 $SCRATCH_MNT/c >> mv $SCRATCH_MNT/del/item1 $SCRATCH_MNT/d >> rmdir $SCRATCH_MNT/del >> >> _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2 >> >> run_check $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1 >> run_check $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \ >> $SCRATCH_MNT/mysnap2 >> >> _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap >> _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \ >> -f $tmp/2.snap >> >> _check_scratch_fs >> >> _scratch_unmount >> _scratch_mkfs >/dev/null 2>&1 >> _scratch_mount >> >> _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/1.snap >> run_check $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1 >> >> _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/2.snap >> run_check $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2 >> >> >> >> I would suggest making those hiearachy diagrams more readable - pipes >> right below the name of their parent, continuation pipes like and >> align all inode numbers in the same column, like the following: >> >> # Parent snapshot: >> # >> # |---- a/ (ino 259) >> # | |---- c (ino 264) >> # | >> # |---- b/ (ino 260) >> # | |---- d (ino 265) >> # | >> # |---- del/ (ino 263) >> # |---- item1/ (ino 261) >> # |---- item2/ (ino 262) >> >> # Send snapshot: >> # >> # |---- a/ (ino 259) >> # |---- b/ (ino 260) >> # |---- c/ (ino 264) >> # | |---- item2/ (ino 262) >> # | >> # |---- d/ (ino 265) >> # |---- item1/ (ino 258) >> >> (pasted here in case gmail screws up the indentation/formatting: >> https://friendpaste.com/12wzqdcfFrlDdd1AiKX0bU) >> >> thanks >> >>> >>> Thans! >>> >>> Robbie Ko >>> >>> 2015-06-05 0:14 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>: >>>> On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote: >>>>> There's one case where we can't issue a utimes operation for a directory. >>>>> When 263 will delete, waiting 261 and set 261 rmdir_ino, but 262 earlier >>>>> processed and update uime between two parent directory. >>>>> So fix this by not update non exist utimes for this case. >>>> >>>> So you mean that we attempt to update utimes for an inode, >>>> corresponding to a directory, that exists in the parent snapshot but >>>> not in the send snapshot. >>>> >>>> So the subject should be something like "Btrfs: incremental send, >>>> don't send utimes for non-existing directory" instead of "Btrfs: >>>> incremental send, fix rmdir not send utimes" >>>> >>>>> >>>>> Example: >>>>> >>>>> Parent snapshot: >>>>> |---- a/ (ino 259) >>>>> |---- c (ino 264) >>>>> |---- b/ (ino 260) >>>>> |---- d (ino 265) >>>>> |---- del/ (ino 263) >>>>> |---- item1/ (ino 261) >>>>> |---- item2/ (ino 262) >>>>> >>>>> Send snapshot: >>>>> |---- a/ (ino 259) >>>>> |---- b/ (ino 260) >>>>> |---- c/ (ino 2) >>>>> |---- item2 (ino 259) >>>>> |---- d/ (ino 257) >>>>> |---- item1/ (ino 258) >>>>> >>>>> Signed-off-by: Robbie Ko <robbieko@synology.com> >>>>> --- >>>>> fs/btrfs/send.c | 2 +- >>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>> >>>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c >>>>> index e8eb3ab..46f954c 100644 >>>>> --- a/fs/btrfs/send.c >>>>> +++ b/fs/btrfs/send.c >>>>> @@ -2468,7 +2468,7 @@ verbose_printk("btrfs: send_utimes %llu\n", ino); >>>>> key.type = BTRFS_INODE_ITEM_KEY; >>>>> key.offset = 0; >>>>> ret = btrfs_search_slot(NULL, sctx->send_root, &key, path, 0, 0); >>>>> - if (ret < 0) >>>>> + if (ret != 0) >>>>> goto out; >>>> >>>> So I don't think this is a good fix. The problem is in some code that >>>> calls this function (send_utimes) against the directory that doesn't >>>> exist - it just shouldn't do that, its logic should be fixed. >>>> Following this approach, while it works, it's just hiding logic errors >>>> in one or more code paths, and none of its callers checks for a return >>>> value of 1 - they only react to values < 0 and introduces the >>>> possibility of propagating a return value of 1 to user space. >>>> >>>> thanks >>>> >>>>> >>>>> eb = path->nodes[0]; >>>>> -- >>>>> 1.9.1 >>>>> >>>>> -- >>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>>>> the body of a message to majordomo@vger.kernel.org >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> >>>> >>>> >>>> -- >>>> Filipe David Manana, >>>> >>>> "Reasonable men adapt themselves to the world. >>>> Unreasonable men adapt the world to themselves. >>>> That's why all progress depends on unreasonable men." >> >> >> >> -- >> Filipe David Manana, >> >> "Reasonable men adapt themselves to the world. >> Unreasonable men adapt the world to themselves. >> That's why all progress depends on unreasonable men."
Hi Filipi, 2015-06-09 18:36 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>: > On Tue, Jun 9, 2015 at 11:04 AM, Robbie Ko <robbieko@synology.com> wrote: >> Hi Filipe, >> >> 2015-06-08 22:00 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>: >>> On Mon, Jun 8, 2015 at 4:44 AM, Robbie Ko <robbieko@synology.com> wrote: >>>> Hi Filipe, >>> >>> Hi Robbie, >>> >>>> >>>> I've fixed "don't send utimes for non-existing directory" with another solution. >>>> >>>> In apply_dir_move(), the old parent dir. and new parent dir. will be >>>> updated after the current dir. has moved. >>>> >>>> And there's only one entry in old parent dir. (e.g. entry with >>>> smallest ino) will be tagged with rmdir_ino to prevent its parent dir. >>>> is deleted but updated. >>> >>> Can't parse this phrase. What do you mean by tagging an entry with rmdir_ino? >>> rmdir_ino corresponds to the number of a inode that wasn't deleted >>> when it was processed because there was some inode with a lower number >>> that is a child of the directory in the parent snapshot and had its >>> rename/move operation delayed (it happens after the directory we want >>> to delete is processed). >>> >> >> Right , my "tagged with rmdir_ino" is same meaning as you explained here. >> >>>> >>>> However, if we process rename for another entry not tagged with >>>> rmdir_ino first, its old parent dir. which is deleted will be updated >>>> according to apply_dir_move(). >>>> >>>> Therefore, I think we should check the existence of the dir. before >>>> we're going to update it's utime. >>>> >>>> The patch is pasted in the following link, could you give me some comment? >>>> >>>> https://friendpaste.com/h8tZqOS9iAUpp2DvgGI2k >>> >>> Looks better. >>> However I still don't understand your explanation, and just tried the >>> example in your commit message: >>> >>> "Parent snapshot: >>> >>> |---- a/ (ino 259) >>> |---- c (ino 264) >>> |---- b/ (ino 260) >>> |---- d (ino 265) >>> |---- del/ (ino 263) >>> |---- item1/ (ino 261) >>> |---- item2/ (ino 262) >>> >>> Send snapshot: >>> |---- a/ (ino 259) >>> |---- b/ (ino 260) >>> |---- c/ (ino 2) >>> |---- item2 (ino 259) >>> |---- d/ (ino 257) >>> |---- item1/ (ino 258)" >>> >>> So it's confusing after looking at it. >>> First the send snapshot mentions inode number 2, which doesn't exist >>> in the parent snapshot - I assume you meant inode number 264. >>> Then, the send snapshot has two inodes with number 259. Is "item2" in >>> the send snapshot supposed to be inode 262? >>> >> >> Your guess is right. And I correct it as follow. >> >> # Parent snapshot: >> # >> # |---- a/ (ino 259) >> # | |---- c (ino 264) >> # | >> # |---- b/ (ino 260) >> # | |---- d (ino 265) >> # | >> # |---- del/ (ino 263) >> # |---- item1/ (ino 261) >> # |---- item2/ (ino 262) >> >> # Send snapshot: >> # >> # |---- a/ (ino 259) >> # |---- b/ (ino 260) >> # |---- c/ (ino 264) >> # | |---- item2/ (ino 262) >> # | >> # |---- d/ (ino 265) >> # |---- item1/ (ino 261) >> >>> Anyway, assuming those 2 fixes to the example are correct guesses, I >>> tried the following and it didn't fail without your patches (i.e. no >>> attempts to send utimes to a non-existing directory): >>> >> >> Here my mean is : btrfs tries to get utime from non-existing directory and >> apply it on the existing directory. And my patch is attempted to avoid >> this case. >> However, this case is not guaranteed to cause error anytime but it may >> fails somehow >> which is depending on the data on the disk. >> The following are the incremental procedures to send the snapshot. >> >> utimes >> utimes a >> utimes b >> rename del -> o263-259-0 >> utimes >> rename a/c -> c >> utimes >> utimes a >> rename o263-259-0/item2 -> c/item2 >> utimes c/item2 >> utimes o263-259-0 <<---------------------- this step may cause error > > Why may it cause an error? > At that moment the name/path o263-259-0 exists at the destination > (i.e. the receiver, as it applies commands from the send stream > serially). Following is the error message when do receive. ERROR: utimes o263-259-0 failed. Invalid argument The argument of utimes for o263-259-0 is got from inode 263 in send root, But inode 263 is not exist in send root In send_utimes(), we didn't check if btrfs_search_slot returns 1, therefore may encounters this problem. I will try to make a stable reproducer. Thanks. Robbie Ko > >> utimes c >> utimes c >> rename b/d -> d >> utimes >> utimes b >> rename o263-259-0/item1 -> d/item1 >> rmdir o263-259-0 >> utimes d/item1 >> utimes d >> utimes d >> >> As the above pointed procedure, o263-259-0 is not appeared in the send root. > > Well yes, but that doesn't matter. > The oXXX-YYY-ZZZ names are never in any of the roots (send or parent), > they're just temporary names to allow for rename/move operations when > collisions are detected, and exist only in the receiver's filesystem. > >> When utime got from o263-259-0 is invalid (i.e. out of range of time >> format), it will fail. >> I saw the error occurs at utimensat() and got EINVAL. Here's >> explanation from linux man page. >> >> EINVAL Invalid value in one of the tv_nsec fields (value outside >> range 0 to 999,999,999, and not UTIME_NOW or UTIME_OMIT); or >> an invalid value in one of the tv_sec fields. >> >> So if o263-259-0 is not the send root, invalid format of utime may be got. > > Doubly confused now. So the whole change log (and the code changes) > mentions only attempts to send utimes for a directory/path that > doesn't exist in the receiver's fs - inode 263 exists in both the send > and parent roots, so we can always get its utimes values. Now you're > saying that somewhere in the send code we're getting incorrect values > for a utimes operation and then sending such operation to the > receiver? > > Getting the values for utimes is done looking up the inode item, by > its number, in the send root - for this the current name of the inode > in the receiver fs doesn't matter - it matters only for building the > path for the utimes operation. But what I'm understanding from your > reply is that we're getting a wrong utimes value form the inode item > for inode number 263, which would be a totally different issue from > generating incorrect paths. > > > So this still doesn't explain me why, without any of your patches > applied, the reproducer doesn't fail - which is really an important > aspect. > This is the xfstest I made with your reproducer: > https://friendpaste.com/2WyDxPe2FtVhOfECBk1VKF (and > tests/btrfs/999.out is just contains the single line "QA output > created by 999"). > > If you're able to give me a correct reproducer, I can make sense of it > and help getting a better change log. > Or this issue happens only after applying some of the other patches in > the series? > > thanks > > > > >> >> Thanks. >> Robbie Ko >> >>> # Parent snapshot: >>> # >>> # |---- a/ (ino 259) >>> # | |---- c (ino 264) >>> # | >>> # |---- b/ (ino 260) >>> # | |---- d (ino 265) >>> # | >>> # |---- del/ (ino 263) >>> # |---- item1/ (ino 261) >>> # |---- item2/ (ino 262) >>> >>> # Send snapshot: >>> # >>> # |---- a/ (ino 259) >>> # |---- b/ (ino 260) >>> # |---- c/ (ino 264) >>> # | |---- item2/ (ino 262) >>> # | >>> # |---- d/ (ino 265) >>> # |---- item1/ (ino 258) >>> >>> mkdir $SCRATCH_MNT/0 >>> mkdir $SCRATCH_MNT/1 >>> >>> mkdir $SCRATCH_MNT/a # 259 >>> mkdir $SCRATCH_MNT/b # 260 >>> mkdir $SCRATCH_MNT/item1 # 261 >>> mkdir $SCRATCH_MNT/item2 # 262 >>> mkdir $SCRATCH_MNT/del # 263 >>> mv $SCRATCH_MNT/item1 $SCRATCH_MNT/del/item1 >>> mv $SCRATCH_MNT/item2 $SCRATCH_MNT/del/item2 >>> mkdir $SCRATCH_MNT/a/c # 264 >>> mkdir $SCRATCH_MNT/b/d # 265 >>> >>> _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1 >>> >>> mv $SCRATCH_MNT/a/c $SCRATCH_MNT/c >>> mv $SCRATCH_MNT/b/d $SCRATCH_MNT/d >>> mv $SCRATCH_MNT/del/item2 $SCRATCH_MNT/c >>> mv $SCRATCH_MNT/del/item1 $SCRATCH_MNT/d >>> rmdir $SCRATCH_MNT/del >>> >>> _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2 >>> >>> run_check $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1 >>> run_check $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \ >>> $SCRATCH_MNT/mysnap2 >>> >>> _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap >>> _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \ >>> -f $tmp/2.snap >>> >>> _check_scratch_fs >>> >>> _scratch_unmount >>> _scratch_mkfs >/dev/null 2>&1 >>> _scratch_mount >>> >>> _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/1.snap >>> run_check $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1 >>> >>> _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/2.snap >>> run_check $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2 >>> >>> >>> >>> I would suggest making those hiearachy diagrams more readable - pipes >>> right below the name of their parent, continuation pipes like and >>> align all inode numbers in the same column, like the following: >>> >>> # Parent snapshot: >>> # >>> # |---- a/ (ino 259) >>> # | |---- c (ino 264) >>> # | >>> # |---- b/ (ino 260) >>> # | |---- d (ino 265) >>> # | >>> # |---- del/ (ino 263) >>> # |---- item1/ (ino 261) >>> # |---- item2/ (ino 262) >>> >>> # Send snapshot: >>> # >>> # |---- a/ (ino 259) >>> # |---- b/ (ino 260) >>> # |---- c/ (ino 264) >>> # | |---- item2/ (ino 262) >>> # | >>> # |---- d/ (ino 265) >>> # |---- item1/ (ino 258) >>> >>> (pasted here in case gmail screws up the indentation/formatting: >>> https://friendpaste.com/12wzqdcfFrlDdd1AiKX0bU) >>> >>> thanks >>> >>>> >>>> Thans! >>>> >>>> Robbie Ko >>>> >>>> 2015-06-05 0:14 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>: >>>>> On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote: >>>>>> There's one case where we can't issue a utimes operation for a directory. >>>>>> When 263 will delete, waiting 261 and set 261 rmdir_ino, but 262 earlier >>>>>> processed and update uime between two parent directory. >>>>>> So fix this by not update non exist utimes for this case. >>>>> >>>>> So you mean that we attempt to update utimes for an inode, >>>>> corresponding to a directory, that exists in the parent snapshot but >>>>> not in the send snapshot. >>>>> >>>>> So the subject should be something like "Btrfs: incremental send, >>>>> don't send utimes for non-existing directory" instead of "Btrfs: >>>>> incremental send, fix rmdir not send utimes" >>>>> >>>>>> >>>>>> Example: >>>>>> >>>>>> Parent snapshot: >>>>>> |---- a/ (ino 259) >>>>>> |---- c (ino 264) >>>>>> |---- b/ (ino 260) >>>>>> |---- d (ino 265) >>>>>> |---- del/ (ino 263) >>>>>> |---- item1/ (ino 261) >>>>>> |---- item2/ (ino 262) >>>>>> >>>>>> Send snapshot: >>>>>> |---- a/ (ino 259) >>>>>> |---- b/ (ino 260) >>>>>> |---- c/ (ino 2) >>>>>> |---- item2 (ino 259) >>>>>> |---- d/ (ino 257) >>>>>> |---- item1/ (ino 258) >>>>>> >>>>>> Signed-off-by: Robbie Ko <robbieko@synology.com> >>>>>> --- >>>>>> fs/btrfs/send.c | 2 +- >>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c >>>>>> index e8eb3ab..46f954c 100644 >>>>>> --- a/fs/btrfs/send.c >>>>>> +++ b/fs/btrfs/send.c >>>>>> @@ -2468,7 +2468,7 @@ verbose_printk("btrfs: send_utimes %llu\n", ino); >>>>>> key.type = BTRFS_INODE_ITEM_KEY; >>>>>> key.offset = 0; >>>>>> ret = btrfs_search_slot(NULL, sctx->send_root, &key, path, 0, 0); >>>>>> - if (ret < 0) >>>>>> + if (ret != 0) >>>>>> goto out; >>>>> >>>>> So I don't think this is a good fix. The problem is in some code that >>>>> calls this function (send_utimes) against the directory that doesn't >>>>> exist - it just shouldn't do that, its logic should be fixed. >>>>> Following this approach, while it works, it's just hiding logic errors >>>>> in one or more code paths, and none of its callers checks for a return >>>>> value of 1 - they only react to values < 0 and introduces the >>>>> possibility of propagating a return value of 1 to user space. >>>>> >>>>> thanks >>>>> >>>>>> >>>>>> eb = path->nodes[0]; >>>>>> -- >>>>>> 1.9.1 >>>>>> >>>>>> -- >>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>>>>> the body of a message to majordomo@vger.kernel.org >>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>> >>>>> >>>>> >>>>> -- >>>>> Filipe David Manana, >>>>> >>>>> "Reasonable men adapt themselves to the world. >>>>> Unreasonable men adapt the world to themselves. >>>>> That's why all progress depends on unreasonable men." >>> >>> >>> >>> -- >>> Filipe David Manana, >>> >>> "Reasonable men adapt themselves to the world. >>> Unreasonable men adapt the world to themselves. >>> That's why all progress depends on unreasonable men." > > > > -- > Filipe David Manana, > > "Reasonable men adapt themselves to the world. > Unreasonable men adapt the world to themselves. > That's why all progress depends on unreasonable men." -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Filipe, I've found that the following case is the main cause of such error and it's fs tree is shown via btrfs-debug-tress as below. file tree key (459 ROOT_ITEM 20487) node 132988928 level 1 items 3 free 490 generation 20487 owner 459 fs uuid b451ae42-3b03-4003-b0a4-45dce324557f chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc key (256 INODE_ITEM 0) block 132710400 (8100) gen 20486 key (264 INODE_ITEM 0) block 130695168 (7977) gen 20480 key (266 XATTR_ITEM 952319794) block 126042112 (7693) gen 20464 leaf 132710400 items 166 free space 3639 generation 20486 owner 455 fs uuid b451ae42-3b03-4003-b0a4-45dce324557f chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160 inode generation 20425 transid 20442 size 32 block group 0 mode 40755 links 1 uid 0 gid 0 rdev 0 flags 0x0 item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12 inode ref index 0 namelen 2 name: .. ... item 165 key (262 XATTR_ITEM 1100961104) itemoff 7789 itemsize 39 location key (0 UNKNOWN.0 0) type XATTR namelen 8 datalen 1 name: user.a78 data a binary 61 leaf 130695168 items 133 free space 7332 generation 20480 owner 455 fs uuid b451ae42-3b03-4003-b0a4-45dce324557f chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc item 0 key (264 INODE_ITEM 0) itemoff 16123 itemsize 160 inode generation 20428 transid 20434 size 10 block group 0 mode 40755 links 1 uid 0 gid 0 rdev 0 flags 0x0 item 1 key (264 INODE_REF 256) itemoff 16112 itemsize 11 inode ref index 11 namelen 1 name: c ... We can see that inode 262 is right at the end of leaf. Then send_utime() will use btrfs_search_slot() to find a appropriate place to put 262 where is at the back of 262. However, that place is uninitialized on disk. Suppose we read atime tv_sec:576469548413222912, tv_nsec:1919251317 and then send it out. Receiving side will got EINVAL since tv_nsec:1919251317 is greater than 999,999,999. Thanks. Robbie Ko 2015-06-10 18:06 GMT+08:00 Robbie Ko <robbieko@synology.com>: > Hi Filipi, > > 2015-06-09 18:36 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>: >> On Tue, Jun 9, 2015 at 11:04 AM, Robbie Ko <robbieko@synology.com> wrote: >>> Hi Filipe, >>> >>> 2015-06-08 22:00 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>: >>>> On Mon, Jun 8, 2015 at 4:44 AM, Robbie Ko <robbieko@synology.com> wrote: >>>>> Hi Filipe, >>>> >>>> Hi Robbie, >>>> >>>>> >>>>> I've fixed "don't send utimes for non-existing directory" with another solution. >>>>> >>>>> In apply_dir_move(), the old parent dir. and new parent dir. will be >>>>> updated after the current dir. has moved. >>>>> >>>>> And there's only one entry in old parent dir. (e.g. entry with >>>>> smallest ino) will be tagged with rmdir_ino to prevent its parent dir. >>>>> is deleted but updated. >>>> >>>> Can't parse this phrase. What do you mean by tagging an entry with rmdir_ino? >>>> rmdir_ino corresponds to the number of a inode that wasn't deleted >>>> when it was processed because there was some inode with a lower number >>>> that is a child of the directory in the parent snapshot and had its >>>> rename/move operation delayed (it happens after the directory we want >>>> to delete is processed). >>>> >>> >>> Right , my "tagged with rmdir_ino" is same meaning as you explained here. >>> >>>>> >>>>> However, if we process rename for another entry not tagged with >>>>> rmdir_ino first, its old parent dir. which is deleted will be updated >>>>> according to apply_dir_move(). >>>>> >>>>> Therefore, I think we should check the existence of the dir. before >>>>> we're going to update it's utime. >>>>> >>>>> The patch is pasted in the following link, could you give me some comment? >>>>> >>>>> https://friendpaste.com/h8tZqOS9iAUpp2DvgGI2k >>>> >>>> Looks better. >>>> However I still don't understand your explanation, and just tried the >>>> example in your commit message: >>>> >>>> "Parent snapshot: >>>> >>>> |---- a/ (ino 259) >>>> |---- c (ino 264) >>>> |---- b/ (ino 260) >>>> |---- d (ino 265) >>>> |---- del/ (ino 263) >>>> |---- item1/ (ino 261) >>>> |---- item2/ (ino 262) >>>> >>>> Send snapshot: >>>> |---- a/ (ino 259) >>>> |---- b/ (ino 260) >>>> |---- c/ (ino 2) >>>> |---- item2 (ino 259) >>>> |---- d/ (ino 257) >>>> |---- item1/ (ino 258)" >>>> >>>> So it's confusing after looking at it. >>>> First the send snapshot mentions inode number 2, which doesn't exist >>>> in the parent snapshot - I assume you meant inode number 264. >>>> Then, the send snapshot has two inodes with number 259. Is "item2" in >>>> the send snapshot supposed to be inode 262? >>>> >>> >>> Your guess is right. And I correct it as follow. >>> >>> # Parent snapshot: >>> # >>> # |---- a/ (ino 259) >>> # | |---- c (ino 264) >>> # | >>> # |---- b/ (ino 260) >>> # | |---- d (ino 265) >>> # | >>> # |---- del/ (ino 263) >>> # |---- item1/ (ino 261) >>> # |---- item2/ (ino 262) >>> >>> # Send snapshot: >>> # >>> # |---- a/ (ino 259) >>> # |---- b/ (ino 260) >>> # |---- c/ (ino 264) >>> # | |---- item2/ (ino 262) >>> # | >>> # |---- d/ (ino 265) >>> # |---- item1/ (ino 261) >>> >>>> Anyway, assuming those 2 fixes to the example are correct guesses, I >>>> tried the following and it didn't fail without your patches (i.e. no >>>> attempts to send utimes to a non-existing directory): >>>> >>> >>> Here my mean is : btrfs tries to get utime from non-existing directory and >>> apply it on the existing directory. And my patch is attempted to avoid >>> this case. >>> However, this case is not guaranteed to cause error anytime but it may >>> fails somehow >>> which is depending on the data on the disk. >>> The following are the incremental procedures to send the snapshot. >>> >>> utimes >>> utimes a >>> utimes b >>> rename del -> o263-259-0 >>> utimes >>> rename a/c -> c >>> utimes >>> utimes a >>> rename o263-259-0/item2 -> c/item2 >>> utimes c/item2 >>> utimes o263-259-0 <<---------------------- this step may cause error >> >> Why may it cause an error? >> At that moment the name/path o263-259-0 exists at the destination >> (i.e. the receiver, as it applies commands from the send stream >> serially). > > Following is the error message when do receive. > ERROR: utimes o263-259-0 failed. Invalid argument > > The argument of utimes for o263-259-0 is got from inode 263 in send > root, But inode 263 is not exist in send root > In send_utimes(), we didn't check if btrfs_search_slot returns 1, > therefore may encounters this problem. > > I will try to make a stable reproducer. > > Thanks. > Robbie Ko > >> >>> utimes c >>> utimes c >>> rename b/d -> d >>> utimes >>> utimes b >>> rename o263-259-0/item1 -> d/item1 >>> rmdir o263-259-0 >>> utimes d/item1 >>> utimes d >>> utimes d >>> >>> As the above pointed procedure, o263-259-0 is not appeared in the send root. >> >> Well yes, but that doesn't matter. >> The oXXX-YYY-ZZZ names are never in any of the roots (send or parent), >> they're just temporary names to allow for rename/move operations when >> collisions are detected, and exist only in the receiver's filesystem. >> >>> When utime got from o263-259-0 is invalid (i.e. out of range of time >>> format), it will fail. >>> I saw the error occurs at utimensat() and got EINVAL. Here's >>> explanation from linux man page. >>> >>> EINVAL Invalid value in one of the tv_nsec fields (value outside >>> range 0 to 999,999,999, and not UTIME_NOW or UTIME_OMIT); or >>> an invalid value in one of the tv_sec fields. >>> >>> So if o263-259-0 is not the send root, invalid format of utime may be got. >> >> Doubly confused now. So the whole change log (and the code changes) >> mentions only attempts to send utimes for a directory/path that >> doesn't exist in the receiver's fs - inode 263 exists in both the send >> and parent roots, so we can always get its utimes values. Now you're >> saying that somewhere in the send code we're getting incorrect values >> for a utimes operation and then sending such operation to the >> receiver? >> >> Getting the values for utimes is done looking up the inode item, by >> its number, in the send root - for this the current name of the inode >> in the receiver fs doesn't matter - it matters only for building the >> path for the utimes operation. But what I'm understanding from your >> reply is that we're getting a wrong utimes value form the inode item >> for inode number 263, which would be a totally different issue from >> generating incorrect paths. >> >> >> So this still doesn't explain me why, without any of your patches >> applied, the reproducer doesn't fail - which is really an important >> aspect. >> This is the xfstest I made with your reproducer: >> https://friendpaste.com/2WyDxPe2FtVhOfECBk1VKF (and >> tests/btrfs/999.out is just contains the single line "QA output >> created by 999"). >> >> If you're able to give me a correct reproducer, I can make sense of it >> and help getting a better change log. >> Or this issue happens only after applying some of the other patches in >> the series? >> >> thanks >> >> >> >> >>> >>> Thanks. >>> Robbie Ko >>> >>>> # Parent snapshot: >>>> # >>>> # |---- a/ (ino 259) >>>> # | |---- c (ino 264) >>>> # | >>>> # |---- b/ (ino 260) >>>> # | |---- d (ino 265) >>>> # | >>>> # |---- del/ (ino 263) >>>> # |---- item1/ (ino 261) >>>> # |---- item2/ (ino 262) >>>> >>>> # Send snapshot: >>>> # >>>> # |---- a/ (ino 259) >>>> # |---- b/ (ino 260) >>>> # |---- c/ (ino 264) >>>> # | |---- item2/ (ino 262) >>>> # | >>>> # |---- d/ (ino 265) >>>> # |---- item1/ (ino 258) >>>> >>>> mkdir $SCRATCH_MNT/0 >>>> mkdir $SCRATCH_MNT/1 >>>> >>>> mkdir $SCRATCH_MNT/a # 259 >>>> mkdir $SCRATCH_MNT/b # 260 >>>> mkdir $SCRATCH_MNT/item1 # 261 >>>> mkdir $SCRATCH_MNT/item2 # 262 >>>> mkdir $SCRATCH_MNT/del # 263 >>>> mv $SCRATCH_MNT/item1 $SCRATCH_MNT/del/item1 >>>> mv $SCRATCH_MNT/item2 $SCRATCH_MNT/del/item2 >>>> mkdir $SCRATCH_MNT/a/c # 264 >>>> mkdir $SCRATCH_MNT/b/d # 265 >>>> >>>> _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1 >>>> >>>> mv $SCRATCH_MNT/a/c $SCRATCH_MNT/c >>>> mv $SCRATCH_MNT/b/d $SCRATCH_MNT/d >>>> mv $SCRATCH_MNT/del/item2 $SCRATCH_MNT/c >>>> mv $SCRATCH_MNT/del/item1 $SCRATCH_MNT/d >>>> rmdir $SCRATCH_MNT/del >>>> >>>> _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2 >>>> >>>> run_check $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1 >>>> run_check $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \ >>>> $SCRATCH_MNT/mysnap2 >>>> >>>> _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap >>>> _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \ >>>> -f $tmp/2.snap >>>> >>>> _check_scratch_fs >>>> >>>> _scratch_unmount >>>> _scratch_mkfs >/dev/null 2>&1 >>>> _scratch_mount >>>> >>>> _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/1.snap >>>> run_check $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1 >>>> >>>> _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/2.snap >>>> run_check $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2 >>>> >>>> >>>> >>>> I would suggest making those hiearachy diagrams more readable - pipes >>>> right below the name of their parent, continuation pipes like and >>>> align all inode numbers in the same column, like the following: >>>> >>>> # Parent snapshot: >>>> # >>>> # |---- a/ (ino 259) >>>> # | |---- c (ino 264) >>>> # | >>>> # |---- b/ (ino 260) >>>> # | |---- d (ino 265) >>>> # | >>>> # |---- del/ (ino 263) >>>> # |---- item1/ (ino 261) >>>> # |---- item2/ (ino 262) >>>> >>>> # Send snapshot: >>>> # >>>> # |---- a/ (ino 259) >>>> # |---- b/ (ino 260) >>>> # |---- c/ (ino 264) >>>> # | |---- item2/ (ino 262) >>>> # | >>>> # |---- d/ (ino 265) >>>> # |---- item1/ (ino 258) >>>> >>>> (pasted here in case gmail screws up the indentation/formatting: >>>> https://friendpaste.com/12wzqdcfFrlDdd1AiKX0bU) >>>> >>>> thanks >>>> >>>>> >>>>> Thans! >>>>> >>>>> Robbie Ko >>>>> >>>>> 2015-06-05 0:14 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>: >>>>>> On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote: >>>>>>> There's one case where we can't issue a utimes operation for a directory. >>>>>>> When 263 will delete, waiting 261 and set 261 rmdir_ino, but 262 earlier >>>>>>> processed and update uime between two parent directory. >>>>>>> So fix this by not update non exist utimes for this case. >>>>>> >>>>>> So you mean that we attempt to update utimes for an inode, >>>>>> corresponding to a directory, that exists in the parent snapshot but >>>>>> not in the send snapshot. >>>>>> >>>>>> So the subject should be something like "Btrfs: incremental send, >>>>>> don't send utimes for non-existing directory" instead of "Btrfs: >>>>>> incremental send, fix rmdir not send utimes" >>>>>> >>>>>>> >>>>>>> Example: >>>>>>> >>>>>>> Parent snapshot: >>>>>>> |---- a/ (ino 259) >>>>>>> |---- c (ino 264) >>>>>>> |---- b/ (ino 260) >>>>>>> |---- d (ino 265) >>>>>>> |---- del/ (ino 263) >>>>>>> |---- item1/ (ino 261) >>>>>>> |---- item2/ (ino 262) >>>>>>> >>>>>>> Send snapshot: >>>>>>> |---- a/ (ino 259) >>>>>>> |---- b/ (ino 260) >>>>>>> |---- c/ (ino 2) >>>>>>> |---- item2 (ino 259) >>>>>>> |---- d/ (ino 257) >>>>>>> |---- item1/ (ino 258) >>>>>>> >>>>>>> Signed-off-by: Robbie Ko <robbieko@synology.com> >>>>>>> --- >>>>>>> fs/btrfs/send.c | 2 +- >>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>>> >>>>>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c >>>>>>> index e8eb3ab..46f954c 100644 >>>>>>> --- a/fs/btrfs/send.c >>>>>>> +++ b/fs/btrfs/send.c >>>>>>> @@ -2468,7 +2468,7 @@ verbose_printk("btrfs: send_utimes %llu\n", ino); >>>>>>> key.type = BTRFS_INODE_ITEM_KEY; >>>>>>> key.offset = 0; >>>>>>> ret = btrfs_search_slot(NULL, sctx->send_root, &key, path, 0, 0); >>>>>>> - if (ret < 0) >>>>>>> + if (ret != 0) >>>>>>> goto out; >>>>>> >>>>>> So I don't think this is a good fix. The problem is in some code that >>>>>> calls this function (send_utimes) against the directory that doesn't >>>>>> exist - it just shouldn't do that, its logic should be fixed. >>>>>> Following this approach, while it works, it's just hiding logic errors >>>>>> in one or more code paths, and none of its callers checks for a return >>>>>> value of 1 - they only react to values < 0 and introduces the >>>>>> possibility of propagating a return value of 1 to user space. >>>>>> >>>>>> thanks >>>>>> >>>>>>> >>>>>>> eb = path->nodes[0]; >>>>>>> -- >>>>>>> 1.9.1 >>>>>>> >>>>>>> -- >>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Filipe David Manana, >>>>>> >>>>>> "Reasonable men adapt themselves to the world. >>>>>> Unreasonable men adapt the world to themselves. >>>>>> That's why all progress depends on unreasonable men." >>>> >>>> >>>> >>>> -- >>>> Filipe David Manana, >>>> >>>> "Reasonable men adapt themselves to the world. >>>> Unreasonable men adapt the world to themselves. >>>> That's why all progress depends on unreasonable men." >> >> >> >> -- >> Filipe David Manana, >> >> "Reasonable men adapt themselves to the world. >> Unreasonable men adapt the world to themselves. >> That's why all progress depends on unreasonable men." -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jun 18, 2015 at 4:21 AM, Robbie Ko <robbieko@synology.com> wrote: > Hi Filipe, > > I've found that the following case is the main cause of such error > and it's fs tree is shown via btrfs-debug-tress as below. > > file tree key (459 ROOT_ITEM 20487) > node 132988928 level 1 items 3 free 490 generation 20487 owner 459 > fs uuid b451ae42-3b03-4003-b0a4-45dce324557f > chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc > key (256 INODE_ITEM 0) block 132710400 (8100) gen 20486 > key (264 INODE_ITEM 0) block 130695168 (7977) gen 20480 > key (266 XATTR_ITEM 952319794) block 126042112 (7693) gen 20464 > leaf 132710400 items 166 free space 3639 generation 20486 owner 455 > fs uuid b451ae42-3b03-4003-b0a4-45dce324557f > chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc > item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160 > inode generation 20425 transid 20442 size 32 block > group 0 mode 40755 links 1 uid 0 gid 0 rdev 0 flags 0x0 > item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12 > inode ref index 0 namelen 2 name: .. > ... > item 165 key (262 XATTR_ITEM 1100961104) itemoff 7789 itemsize 39 > location key (0 UNKNOWN.0 0) type XATTR > namelen 8 datalen 1 name: user.a78 > data a > binary 61 > leaf 130695168 items 133 free space 7332 generation 20480 owner 455 > fs uuid b451ae42-3b03-4003-b0a4-45dce324557f > chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc > item 0 key (264 INODE_ITEM 0) itemoff 16123 itemsize 160 > inode generation 20428 transid 20434 size 10 block > group 0 mode 40755 links 1 uid 0 gid 0 rdev 0 flags 0x0 > item 1 key (264 INODE_REF 256) itemoff 16112 itemsize 11 > inode ref index 11 namelen 1 name: c > ... > > We can see that inode 262 is right at the end of leaf. Then send_utime() will > use btrfs_search_slot() to find a appropriate place to put 262 where is at the > back of 262. However, that place is uninitialized on disk. > Suppose we read > atime tv_sec:576469548413222912, tv_nsec:1919251317 and then send it out. > Receiving side will got EINVAL since tv_nsec:1919251317 is greater > than 999,999,999. I see. So in apply_dir_move, instead of searching the btree of the send snapshot, we can search the rbtree of orphan dir infos for an entry with a key == cur->dir. Searching that rbtree makes it clear what the intention is and more efficient (fully in memory structure, and much smaller than the btree). Should work, but I haven't tested it. thanks > > Thanks. > Robbie Ko > > 2015-06-10 18:06 GMT+08:00 Robbie Ko <robbieko@synology.com>: >> Hi Filipi, >> >> 2015-06-09 18:36 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>: >>> On Tue, Jun 9, 2015 at 11:04 AM, Robbie Ko <robbieko@synology.com> wrote: >>>> Hi Filipe, >>>> >>>> 2015-06-08 22:00 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>: >>>>> On Mon, Jun 8, 2015 at 4:44 AM, Robbie Ko <robbieko@synology.com> wrote: >>>>>> Hi Filipe, >>>>> >>>>> Hi Robbie, >>>>> >>>>>> >>>>>> I've fixed "don't send utimes for non-existing directory" with another solution. >>>>>> >>>>>> In apply_dir_move(), the old parent dir. and new parent dir. will be >>>>>> updated after the current dir. has moved. >>>>>> >>>>>> And there's only one entry in old parent dir. (e.g. entry with >>>>>> smallest ino) will be tagged with rmdir_ino to prevent its parent dir. >>>>>> is deleted but updated. >>>>> >>>>> Can't parse this phrase. What do you mean by tagging an entry with rmdir_ino? >>>>> rmdir_ino corresponds to the number of a inode that wasn't deleted >>>>> when it was processed because there was some inode with a lower number >>>>> that is a child of the directory in the parent snapshot and had its >>>>> rename/move operation delayed (it happens after the directory we want >>>>> to delete is processed). >>>>> >>>> >>>> Right , my "tagged with rmdir_ino" is same meaning as you explained here. >>>> >>>>>> >>>>>> However, if we process rename for another entry not tagged with >>>>>> rmdir_ino first, its old parent dir. which is deleted will be updated >>>>>> according to apply_dir_move(). >>>>>> >>>>>> Therefore, I think we should check the existence of the dir. before >>>>>> we're going to update it's utime. >>>>>> >>>>>> The patch is pasted in the following link, could you give me some comment? >>>>>> >>>>>> https://friendpaste.com/h8tZqOS9iAUpp2DvgGI2k >>>>> >>>>> Looks better. >>>>> However I still don't understand your explanation, and just tried the >>>>> example in your commit message: >>>>> >>>>> "Parent snapshot: >>>>> >>>>> |---- a/ (ino 259) >>>>> |---- c (ino 264) >>>>> |---- b/ (ino 260) >>>>> |---- d (ino 265) >>>>> |---- del/ (ino 263) >>>>> |---- item1/ (ino 261) >>>>> |---- item2/ (ino 262) >>>>> >>>>> Send snapshot: >>>>> |---- a/ (ino 259) >>>>> |---- b/ (ino 260) >>>>> |---- c/ (ino 2) >>>>> |---- item2 (ino 259) >>>>> |---- d/ (ino 257) >>>>> |---- item1/ (ino 258)" >>>>> >>>>> So it's confusing after looking at it. >>>>> First the send snapshot mentions inode number 2, which doesn't exist >>>>> in the parent snapshot - I assume you meant inode number 264. >>>>> Then, the send snapshot has two inodes with number 259. Is "item2" in >>>>> the send snapshot supposed to be inode 262? >>>>> >>>> >>>> Your guess is right. And I correct it as follow. >>>> >>>> # Parent snapshot: >>>> # >>>> # |---- a/ (ino 259) >>>> # | |---- c (ino 264) >>>> # | >>>> # |---- b/ (ino 260) >>>> # | |---- d (ino 265) >>>> # | >>>> # |---- del/ (ino 263) >>>> # |---- item1/ (ino 261) >>>> # |---- item2/ (ino 262) >>>> >>>> # Send snapshot: >>>> # >>>> # |---- a/ (ino 259) >>>> # |---- b/ (ino 260) >>>> # |---- c/ (ino 264) >>>> # | |---- item2/ (ino 262) >>>> # | >>>> # |---- d/ (ino 265) >>>> # |---- item1/ (ino 261) >>>> >>>>> Anyway, assuming those 2 fixes to the example are correct guesses, I >>>>> tried the following and it didn't fail without your patches (i.e. no >>>>> attempts to send utimes to a non-existing directory): >>>>> >>>> >>>> Here my mean is : btrfs tries to get utime from non-existing directory and >>>> apply it on the existing directory. And my patch is attempted to avoid >>>> this case. >>>> However, this case is not guaranteed to cause error anytime but it may >>>> fails somehow >>>> which is depending on the data on the disk. >>>> The following are the incremental procedures to send the snapshot. >>>> >>>> utimes >>>> utimes a >>>> utimes b >>>> rename del -> o263-259-0 >>>> utimes >>>> rename a/c -> c >>>> utimes >>>> utimes a >>>> rename o263-259-0/item2 -> c/item2 >>>> utimes c/item2 >>>> utimes o263-259-0 <<---------------------- this step may cause error >>> >>> Why may it cause an error? >>> At that moment the name/path o263-259-0 exists at the destination >>> (i.e. the receiver, as it applies commands from the send stream >>> serially). >> >> Following is the error message when do receive. >> ERROR: utimes o263-259-0 failed. Invalid argument >> >> The argument of utimes for o263-259-0 is got from inode 263 in send >> root, But inode 263 is not exist in send root >> In send_utimes(), we didn't check if btrfs_search_slot returns 1, >> therefore may encounters this problem. >> >> I will try to make a stable reproducer. >> >> Thanks. >> Robbie Ko >> >>> >>>> utimes c >>>> utimes c >>>> rename b/d -> d >>>> utimes >>>> utimes b >>>> rename o263-259-0/item1 -> d/item1 >>>> rmdir o263-259-0 >>>> utimes d/item1 >>>> utimes d >>>> utimes d >>>> >>>> As the above pointed procedure, o263-259-0 is not appeared in the send root. >>> >>> Well yes, but that doesn't matter. >>> The oXXX-YYY-ZZZ names are never in any of the roots (send or parent), >>> they're just temporary names to allow for rename/move operations when >>> collisions are detected, and exist only in the receiver's filesystem. >>> >>>> When utime got from o263-259-0 is invalid (i.e. out of range of time >>>> format), it will fail. >>>> I saw the error occurs at utimensat() and got EINVAL. Here's >>>> explanation from linux man page. >>>> >>>> EINVAL Invalid value in one of the tv_nsec fields (value outside >>>> range 0 to 999,999,999, and not UTIME_NOW or UTIME_OMIT); or >>>> an invalid value in one of the tv_sec fields. >>>> >>>> So if o263-259-0 is not the send root, invalid format of utime may be got. >>> >>> Doubly confused now. So the whole change log (and the code changes) >>> mentions only attempts to send utimes for a directory/path that >>> doesn't exist in the receiver's fs - inode 263 exists in both the send >>> and parent roots, so we can always get its utimes values. Now you're >>> saying that somewhere in the send code we're getting incorrect values >>> for a utimes operation and then sending such operation to the >>> receiver? >>> >>> Getting the values for utimes is done looking up the inode item, by >>> its number, in the send root - for this the current name of the inode >>> in the receiver fs doesn't matter - it matters only for building the >>> path for the utimes operation. But what I'm understanding from your >>> reply is that we're getting a wrong utimes value form the inode item >>> for inode number 263, which would be a totally different issue from >>> generating incorrect paths. >>> >>> >>> So this still doesn't explain me why, without any of your patches >>> applied, the reproducer doesn't fail - which is really an important >>> aspect. >>> This is the xfstest I made with your reproducer: >>> https://friendpaste.com/2WyDxPe2FtVhOfECBk1VKF (and >>> tests/btrfs/999.out is just contains the single line "QA output >>> created by 999"). >>> >>> If you're able to give me a correct reproducer, I can make sense of it >>> and help getting a better change log. >>> Or this issue happens only after applying some of the other patches in >>> the series? >>> >>> thanks >>> >>> >>> >>> >>>> >>>> Thanks. >>>> Robbie Ko >>>> >>>>> # Parent snapshot: >>>>> # >>>>> # |---- a/ (ino 259) >>>>> # | |---- c (ino 264) >>>>> # | >>>>> # |---- b/ (ino 260) >>>>> # | |---- d (ino 265) >>>>> # | >>>>> # |---- del/ (ino 263) >>>>> # |---- item1/ (ino 261) >>>>> # |---- item2/ (ino 262) >>>>> >>>>> # Send snapshot: >>>>> # >>>>> # |---- a/ (ino 259) >>>>> # |---- b/ (ino 260) >>>>> # |---- c/ (ino 264) >>>>> # | |---- item2/ (ino 262) >>>>> # | >>>>> # |---- d/ (ino 265) >>>>> # |---- item1/ (ino 258) >>>>> >>>>> mkdir $SCRATCH_MNT/0 >>>>> mkdir $SCRATCH_MNT/1 >>>>> >>>>> mkdir $SCRATCH_MNT/a # 259 >>>>> mkdir $SCRATCH_MNT/b # 260 >>>>> mkdir $SCRATCH_MNT/item1 # 261 >>>>> mkdir $SCRATCH_MNT/item2 # 262 >>>>> mkdir $SCRATCH_MNT/del # 263 >>>>> mv $SCRATCH_MNT/item1 $SCRATCH_MNT/del/item1 >>>>> mv $SCRATCH_MNT/item2 $SCRATCH_MNT/del/item2 >>>>> mkdir $SCRATCH_MNT/a/c # 264 >>>>> mkdir $SCRATCH_MNT/b/d # 265 >>>>> >>>>> _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1 >>>>> >>>>> mv $SCRATCH_MNT/a/c $SCRATCH_MNT/c >>>>> mv $SCRATCH_MNT/b/d $SCRATCH_MNT/d >>>>> mv $SCRATCH_MNT/del/item2 $SCRATCH_MNT/c >>>>> mv $SCRATCH_MNT/del/item1 $SCRATCH_MNT/d >>>>> rmdir $SCRATCH_MNT/del >>>>> >>>>> _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2 >>>>> >>>>> run_check $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1 >>>>> run_check $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \ >>>>> $SCRATCH_MNT/mysnap2 >>>>> >>>>> _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap >>>>> _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \ >>>>> -f $tmp/2.snap >>>>> >>>>> _check_scratch_fs >>>>> >>>>> _scratch_unmount >>>>> _scratch_mkfs >/dev/null 2>&1 >>>>> _scratch_mount >>>>> >>>>> _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/1.snap >>>>> run_check $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1 >>>>> >>>>> _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/2.snap >>>>> run_check $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2 >>>>> >>>>> >>>>> >>>>> I would suggest making those hiearachy diagrams more readable - pipes >>>>> right below the name of their parent, continuation pipes like and >>>>> align all inode numbers in the same column, like the following: >>>>> >>>>> # Parent snapshot: >>>>> # >>>>> # |---- a/ (ino 259) >>>>> # | |---- c (ino 264) >>>>> # | >>>>> # |---- b/ (ino 260) >>>>> # | |---- d (ino 265) >>>>> # | >>>>> # |---- del/ (ino 263) >>>>> # |---- item1/ (ino 261) >>>>> # |---- item2/ (ino 262) >>>>> >>>>> # Send snapshot: >>>>> # >>>>> # |---- a/ (ino 259) >>>>> # |---- b/ (ino 260) >>>>> # |---- c/ (ino 264) >>>>> # | |---- item2/ (ino 262) >>>>> # | >>>>> # |---- d/ (ino 265) >>>>> # |---- item1/ (ino 258) >>>>> >>>>> (pasted here in case gmail screws up the indentation/formatting: >>>>> https://friendpaste.com/12wzqdcfFrlDdd1AiKX0bU) >>>>> >>>>> thanks >>>>> >>>>>> >>>>>> Thans! >>>>>> >>>>>> Robbie Ko >>>>>> >>>>>> 2015-06-05 0:14 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>: >>>>>>> On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote: >>>>>>>> There's one case where we can't issue a utimes operation for a directory. >>>>>>>> When 263 will delete, waiting 261 and set 261 rmdir_ino, but 262 earlier >>>>>>>> processed and update uime between two parent directory. >>>>>>>> So fix this by not update non exist utimes for this case. >>>>>>> >>>>>>> So you mean that we attempt to update utimes for an inode, >>>>>>> corresponding to a directory, that exists in the parent snapshot but >>>>>>> not in the send snapshot. >>>>>>> >>>>>>> So the subject should be something like "Btrfs: incremental send, >>>>>>> don't send utimes for non-existing directory" instead of "Btrfs: >>>>>>> incremental send, fix rmdir not send utimes" >>>>>>> >>>>>>>> >>>>>>>> Example: >>>>>>>> >>>>>>>> Parent snapshot: >>>>>>>> |---- a/ (ino 259) >>>>>>>> |---- c (ino 264) >>>>>>>> |---- b/ (ino 260) >>>>>>>> |---- d (ino 265) >>>>>>>> |---- del/ (ino 263) >>>>>>>> |---- item1/ (ino 261) >>>>>>>> |---- item2/ (ino 262) >>>>>>>> >>>>>>>> Send snapshot: >>>>>>>> |---- a/ (ino 259) >>>>>>>> |---- b/ (ino 260) >>>>>>>> |---- c/ (ino 2) >>>>>>>> |---- item2 (ino 259) >>>>>>>> |---- d/ (ino 257) >>>>>>>> |---- item1/ (ino 258) >>>>>>>> >>>>>>>> Signed-off-by: Robbie Ko <robbieko@synology.com> >>>>>>>> --- >>>>>>>> fs/btrfs/send.c | 2 +- >>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>>>> >>>>>>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c >>>>>>>> index e8eb3ab..46f954c 100644 >>>>>>>> --- a/fs/btrfs/send.c >>>>>>>> +++ b/fs/btrfs/send.c >>>>>>>> @@ -2468,7 +2468,7 @@ verbose_printk("btrfs: send_utimes %llu\n", ino); >>>>>>>> key.type = BTRFS_INODE_ITEM_KEY; >>>>>>>> key.offset = 0; >>>>>>>> ret = btrfs_search_slot(NULL, sctx->send_root, &key, path, 0, 0); >>>>>>>> - if (ret < 0) >>>>>>>> + if (ret != 0) >>>>>>>> goto out; >>>>>>> >>>>>>> So I don't think this is a good fix. The problem is in some code that >>>>>>> calls this function (send_utimes) against the directory that doesn't >>>>>>> exist - it just shouldn't do that, its logic should be fixed. >>>>>>> Following this approach, while it works, it's just hiding logic errors >>>>>>> in one or more code paths, and none of its callers checks for a return >>>>>>> value of 1 - they only react to values < 0 and introduces the >>>>>>> possibility of propagating a return value of 1 to user space. >>>>>>> >>>>>>> thanks >>>>>>> >>>>>>>> >>>>>>>> eb = path->nodes[0]; >>>>>>>> -- >>>>>>>> 1.9.1 >>>>>>>> >>>>>>>> -- >>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Filipe David Manana, >>>>>>> >>>>>>> "Reasonable men adapt themselves to the world. >>>>>>> Unreasonable men adapt the world to themselves. >>>>>>> That's why all progress depends on unreasonable men." >>>>> >>>>> >>>>> >>>>> -- >>>>> Filipe David Manana, >>>>> >>>>> "Reasonable men adapt themselves to the world. >>>>> Unreasonable men adapt the world to themselves. >>>>> That's why all progress depends on unreasonable men." >>> >>> >>> >>> -- >>> Filipe David Manana, >>> >>> "Reasonable men adapt themselves to the world. >>> Unreasonable men adapt the world to themselves. >>> That's why all progress depends on unreasonable men."
Hi Filipe, We can not only search rbtree of orphan dir infos for an entry with a key == cur->dir. Because in apply_dir_move, it needs to need to update the utimes of both new parent(s) and old parent(s) after rename/move. If the old parent have not been processed yet, it can't find cur->dir in rbtree of orphan dir infos. Example, Parent snapshot: |---- a/ (ino 259) |---- c (ino 261) |---- del/ (ino 262) |---- item1/ (ino 260) Send snapshot: |---- a/ (ino 259) |---- c/ (ino 261) |---- item1 (ino 260) receiving snapshot utimes utimes a rename a/c -> c utimes utimes a rename del/dir_item1 -> c/dir_item1 utimes c/dir_item1 utimes del <----------- the same problem. utimes c utimes c rmdir del utimes Thanks. Robbieko 2015-06-19 2:11 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>: > On Thu, Jun 18, 2015 at 4:21 AM, Robbie Ko <robbieko@synology.com> wrote: >> Hi Filipe, >> >> I've found that the following case is the main cause of such error >> and it's fs tree is shown via btrfs-debug-tress as below. >> >> file tree key (459 ROOT_ITEM 20487) >> node 132988928 level 1 items 3 free 490 generation 20487 owner 459 >> fs uuid b451ae42-3b03-4003-b0a4-45dce324557f >> chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc >> key (256 INODE_ITEM 0) block 132710400 (8100) gen 20486 >> key (264 INODE_ITEM 0) block 130695168 (7977) gen 20480 >> key (266 XATTR_ITEM 952319794) block 126042112 (7693) gen 20464 >> leaf 132710400 items 166 free space 3639 generation 20486 owner 455 >> fs uuid b451ae42-3b03-4003-b0a4-45dce324557f >> chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc >> item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160 >> inode generation 20425 transid 20442 size 32 block >> group 0 mode 40755 links 1 uid 0 gid 0 rdev 0 flags 0x0 >> item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12 >> inode ref index 0 namelen 2 name: .. >> ... >> item 165 key (262 XATTR_ITEM 1100961104) itemoff 7789 itemsize 39 >> location key (0 UNKNOWN.0 0) type XATTR >> namelen 8 datalen 1 name: user.a78 >> data a >> binary 61 >> leaf 130695168 items 133 free space 7332 generation 20480 owner 455 >> fs uuid b451ae42-3b03-4003-b0a4-45dce324557f >> chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc >> item 0 key (264 INODE_ITEM 0) itemoff 16123 itemsize 160 >> inode generation 20428 transid 20434 size 10 block >> group 0 mode 40755 links 1 uid 0 gid 0 rdev 0 flags 0x0 >> item 1 key (264 INODE_REF 256) itemoff 16112 itemsize 11 >> inode ref index 11 namelen 1 name: c >> ... >> >> We can see that inode 262 is right at the end of leaf. Then send_utime() will >> use btrfs_search_slot() to find a appropriate place to put 262 where is at the >> back of 262. However, that place is uninitialized on disk. >> Suppose we read >> atime tv_sec:576469548413222912, tv_nsec:1919251317 and then send it out. >> Receiving side will got EINVAL since tv_nsec:1919251317 is greater >> than 999,999,999. > > I see. > So in apply_dir_move, instead of searching the btree of the send > snapshot, we can search the rbtree of orphan dir infos for an entry > with a key == cur->dir. Searching that rbtree makes it clear what the > intention is and more efficient (fully in memory structure, and much > smaller than the btree). Should work, but I haven't tested it. > > thanks > >> >> Thanks. >> Robbie Ko >> >> 2015-06-10 18:06 GMT+08:00 Robbie Ko <robbieko@synology.com>: >>> Hi Filipi, >>> >>> 2015-06-09 18:36 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>: >>>> On Tue, Jun 9, 2015 at 11:04 AM, Robbie Ko <robbieko@synology.com> wrote: >>>>> Hi Filipe, >>>>> >>>>> 2015-06-08 22:00 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>: >>>>>> On Mon, Jun 8, 2015 at 4:44 AM, Robbie Ko <robbieko@synology.com> wrote: >>>>>>> Hi Filipe, >>>>>> >>>>>> Hi Robbie, >>>>>> >>>>>>> >>>>>>> I've fixed "don't send utimes for non-existing directory" with another solution. >>>>>>> >>>>>>> In apply_dir_move(), the old parent dir. and new parent dir. will be >>>>>>> updated after the current dir. has moved. >>>>>>> >>>>>>> And there's only one entry in old parent dir. (e.g. entry with >>>>>>> smallest ino) will be tagged with rmdir_ino to prevent its parent dir. >>>>>>> is deleted but updated. >>>>>> >>>>>> Can't parse this phrase. What do you mean by tagging an entry with rmdir_ino? >>>>>> rmdir_ino corresponds to the number of a inode that wasn't deleted >>>>>> when it was processed because there was some inode with a lower number >>>>>> that is a child of the directory in the parent snapshot and had its >>>>>> rename/move operation delayed (it happens after the directory we want >>>>>> to delete is processed). >>>>>> >>>>> >>>>> Right , my "tagged with rmdir_ino" is same meaning as you explained here. >>>>> >>>>>>> >>>>>>> However, if we process rename for another entry not tagged with >>>>>>> rmdir_ino first, its old parent dir. which is deleted will be updated >>>>>>> according to apply_dir_move(). >>>>>>> >>>>>>> Therefore, I think we should check the existence of the dir. before >>>>>>> we're going to update it's utime. >>>>>>> >>>>>>> The patch is pasted in the following link, could you give me some comment? >>>>>>> >>>>>>> https://friendpaste.com/h8tZqOS9iAUpp2DvgGI2k >>>>>> >>>>>> Looks better. >>>>>> However I still don't understand your explanation, and just tried the >>>>>> example in your commit message: >>>>>> >>>>>> "Parent snapshot: >>>>>> >>>>>> |---- a/ (ino 259) >>>>>> |---- c (ino 264) >>>>>> |---- b/ (ino 260) >>>>>> |---- d (ino 265) >>>>>> |---- del/ (ino 263) >>>>>> |---- item1/ (ino 261) >>>>>> |---- item2/ (ino 262) >>>>>> >>>>>> Send snapshot: >>>>>> |---- a/ (ino 259) >>>>>> |---- b/ (ino 260) >>>>>> |---- c/ (ino 2) >>>>>> |---- item2 (ino 259) >>>>>> |---- d/ (ino 257) >>>>>> |---- item1/ (ino 258)" >>>>>> >>>>>> So it's confusing after looking at it. >>>>>> First the send snapshot mentions inode number 2, which doesn't exist >>>>>> in the parent snapshot - I assume you meant inode number 264. >>>>>> Then, the send snapshot has two inodes with number 259. Is "item2" in >>>>>> the send snapshot supposed to be inode 262? >>>>>> >>>>> >>>>> Your guess is right. And I correct it as follow. >>>>> >>>>> # Parent snapshot: >>>>> # >>>>> # |---- a/ (ino 259) >>>>> # | |---- c (ino 264) >>>>> # | >>>>> # |---- b/ (ino 260) >>>>> # | |---- d (ino 265) >>>>> # | >>>>> # |---- del/ (ino 263) >>>>> # |---- item1/ (ino 261) >>>>> # |---- item2/ (ino 262) >>>>> >>>>> # Send snapshot: >>>>> # >>>>> # |---- a/ (ino 259) >>>>> # |---- b/ (ino 260) >>>>> # |---- c/ (ino 264) >>>>> # | |---- item2/ (ino 262) >>>>> # | >>>>> # |---- d/ (ino 265) >>>>> # |---- item1/ (ino 261) >>>>> >>>>>> Anyway, assuming those 2 fixes to the example are correct guesses, I >>>>>> tried the following and it didn't fail without your patches (i.e. no >>>>>> attempts to send utimes to a non-existing directory): >>>>>> >>>>> >>>>> Here my mean is : btrfs tries to get utime from non-existing directory and >>>>> apply it on the existing directory. And my patch is attempted to avoid >>>>> this case. >>>>> However, this case is not guaranteed to cause error anytime but it may >>>>> fails somehow >>>>> which is depending on the data on the disk. >>>>> The following are the incremental procedures to send the snapshot. >>>>> >>>>> utimes >>>>> utimes a >>>>> utimes b >>>>> rename del -> o263-259-0 >>>>> utimes >>>>> rename a/c -> c >>>>> utimes >>>>> utimes a >>>>> rename o263-259-0/item2 -> c/item2 >>>>> utimes c/item2 >>>>> utimes o263-259-0 <<---------------------- this step may cause error >>>> >>>> Why may it cause an error? >>>> At that moment the name/path o263-259-0 exists at the destination >>>> (i.e. the receiver, as it applies commands from the send stream >>>> serially). >>> >>> Following is the error message when do receive. >>> ERROR: utimes o263-259-0 failed. Invalid argument >>> >>> The argument of utimes for o263-259-0 is got from inode 263 in send >>> root, But inode 263 is not exist in send root >>> In send_utimes(), we didn't check if btrfs_search_slot returns 1, >>> therefore may encounters this problem. >>> >>> I will try to make a stable reproducer. >>> >>> Thanks. >>> Robbie Ko >>> >>>> >>>>> utimes c >>>>> utimes c >>>>> rename b/d -> d >>>>> utimes >>>>> utimes b >>>>> rename o263-259-0/item1 -> d/item1 >>>>> rmdir o263-259-0 >>>>> utimes d/item1 >>>>> utimes d >>>>> utimes d >>>>> >>>>> As the above pointed procedure, o263-259-0 is not appeared in the send root. >>>> >>>> Well yes, but that doesn't matter. >>>> The oXXX-YYY-ZZZ names are never in any of the roots (send or parent), >>>> they're just temporary names to allow for rename/move operations when >>>> collisions are detected, and exist only in the receiver's filesystem. >>>> >>>>> When utime got from o263-259-0 is invalid (i.e. out of range of time >>>>> format), it will fail. >>>>> I saw the error occurs at utimensat() and got EINVAL. Here's >>>>> explanation from linux man page. >>>>> >>>>> EINVAL Invalid value in one of the tv_nsec fields (value outside >>>>> range 0 to 999,999,999, and not UTIME_NOW or UTIME_OMIT); or >>>>> an invalid value in one of the tv_sec fields. >>>>> >>>>> So if o263-259-0 is not the send root, invalid format of utime may be got. >>>> >>>> Doubly confused now. So the whole change log (and the code changes) >>>> mentions only attempts to send utimes for a directory/path that >>>> doesn't exist in the receiver's fs - inode 263 exists in both the send >>>> and parent roots, so we can always get its utimes values. Now you're >>>> saying that somewhere in the send code we're getting incorrect values >>>> for a utimes operation and then sending such operation to the >>>> receiver? >>>> >>>> Getting the values for utimes is done looking up the inode item, by >>>> its number, in the send root - for this the current name of the inode >>>> in the receiver fs doesn't matter - it matters only for building the >>>> path for the utimes operation. But what I'm understanding from your >>>> reply is that we're getting a wrong utimes value form the inode item >>>> for inode number 263, which would be a totally different issue from >>>> generating incorrect paths. >>>> >>>> >>>> So this still doesn't explain me why, without any of your patches >>>> applied, the reproducer doesn't fail - which is really an important >>>> aspect. >>>> This is the xfstest I made with your reproducer: >>>> https://friendpaste.com/2WyDxPe2FtVhOfECBk1VKF (and >>>> tests/btrfs/999.out is just contains the single line "QA output >>>> created by 999"). >>>> >>>> If you're able to give me a correct reproducer, I can make sense of it >>>> and help getting a better change log. >>>> Or this issue happens only after applying some of the other patches in >>>> the series? >>>> >>>> thanks >>>> >>>> >>>> >>>> >>>>> >>>>> Thanks. >>>>> Robbie Ko >>>>> >>>>>> # Parent snapshot: >>>>>> # >>>>>> # |---- a/ (ino 259) >>>>>> # | |---- c (ino 264) >>>>>> # | >>>>>> # |---- b/ (ino 260) >>>>>> # | |---- d (ino 265) >>>>>> # | >>>>>> # |---- del/ (ino 263) >>>>>> # |---- item1/ (ino 261) >>>>>> # |---- item2/ (ino 262) >>>>>> >>>>>> # Send snapshot: >>>>>> # >>>>>> # |---- a/ (ino 259) >>>>>> # |---- b/ (ino 260) >>>>>> # |---- c/ (ino 264) >>>>>> # | |---- item2/ (ino 262) >>>>>> # | >>>>>> # |---- d/ (ino 265) >>>>>> # |---- item1/ (ino 258) >>>>>> >>>>>> mkdir $SCRATCH_MNT/0 >>>>>> mkdir $SCRATCH_MNT/1 >>>>>> >>>>>> mkdir $SCRATCH_MNT/a # 259 >>>>>> mkdir $SCRATCH_MNT/b # 260 >>>>>> mkdir $SCRATCH_MNT/item1 # 261 >>>>>> mkdir $SCRATCH_MNT/item2 # 262 >>>>>> mkdir $SCRATCH_MNT/del # 263 >>>>>> mv $SCRATCH_MNT/item1 $SCRATCH_MNT/del/item1 >>>>>> mv $SCRATCH_MNT/item2 $SCRATCH_MNT/del/item2 >>>>>> mkdir $SCRATCH_MNT/a/c # 264 >>>>>> mkdir $SCRATCH_MNT/b/d # 265 >>>>>> >>>>>> _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1 >>>>>> >>>>>> mv $SCRATCH_MNT/a/c $SCRATCH_MNT/c >>>>>> mv $SCRATCH_MNT/b/d $SCRATCH_MNT/d >>>>>> mv $SCRATCH_MNT/del/item2 $SCRATCH_MNT/c >>>>>> mv $SCRATCH_MNT/del/item1 $SCRATCH_MNT/d >>>>>> rmdir $SCRATCH_MNT/del >>>>>> >>>>>> _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2 >>>>>> >>>>>> run_check $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1 >>>>>> run_check $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \ >>>>>> $SCRATCH_MNT/mysnap2 >>>>>> >>>>>> _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap >>>>>> _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \ >>>>>> -f $tmp/2.snap >>>>>> >>>>>> _check_scratch_fs >>>>>> >>>>>> _scratch_unmount >>>>>> _scratch_mkfs >/dev/null 2>&1 >>>>>> _scratch_mount >>>>>> >>>>>> _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/1.snap >>>>>> run_check $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1 >>>>>> >>>>>> _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/2.snap >>>>>> run_check $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2 >>>>>> >>>>>> >>>>>> >>>>>> I would suggest making those hiearachy diagrams more readable - pipes >>>>>> right below the name of their parent, continuation pipes like and >>>>>> align all inode numbers in the same column, like the following: >>>>>> >>>>>> # Parent snapshot: >>>>>> # >>>>>> # |---- a/ (ino 259) >>>>>> # | |---- c (ino 264) >>>>>> # | >>>>>> # |---- b/ (ino 260) >>>>>> # | |---- d (ino 265) >>>>>> # | >>>>>> # |---- del/ (ino 263) >>>>>> # |---- item1/ (ino 261) >>>>>> # |---- item2/ (ino 262) >>>>>> >>>>>> # Send snapshot: >>>>>> # >>>>>> # |---- a/ (ino 259) >>>>>> # |---- b/ (ino 260) >>>>>> # |---- c/ (ino 264) >>>>>> # | |---- item2/ (ino 262) >>>>>> # | >>>>>> # |---- d/ (ino 265) >>>>>> # |---- item1/ (ino 258) >>>>>> >>>>>> (pasted here in case gmail screws up the indentation/formatting: >>>>>> https://friendpaste.com/12wzqdcfFrlDdd1AiKX0bU) >>>>>> >>>>>> thanks >>>>>> >>>>>>> >>>>>>> Thans! >>>>>>> >>>>>>> Robbie Ko >>>>>>> >>>>>>> 2015-06-05 0:14 GMT+08:00 Filipe David Manana <fdmanana@gmail.com>: >>>>>>>> On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@synology.com> wrote: >>>>>>>>> There's one case where we can't issue a utimes operation for a directory. >>>>>>>>> When 263 will delete, waiting 261 and set 261 rmdir_ino, but 262 earlier >>>>>>>>> processed and update uime between two parent directory. >>>>>>>>> So fix this by not update non exist utimes for this case. >>>>>>>> >>>>>>>> So you mean that we attempt to update utimes for an inode, >>>>>>>> corresponding to a directory, that exists in the parent snapshot but >>>>>>>> not in the send snapshot. >>>>>>>> >>>>>>>> So the subject should be something like "Btrfs: incremental send, >>>>>>>> don't send utimes for non-existing directory" instead of "Btrfs: >>>>>>>> incremental send, fix rmdir not send utimes" >>>>>>>> >>>>>>>>> >>>>>>>>> Example: >>>>>>>>> >>>>>>>>> Parent snapshot: >>>>>>>>> |---- a/ (ino 259) >>>>>>>>> |---- c (ino 264) >>>>>>>>> |---- b/ (ino 260) >>>>>>>>> |---- d (ino 265) >>>>>>>>> |---- del/ (ino 263) >>>>>>>>> |---- item1/ (ino 261) >>>>>>>>> |---- item2/ (ino 262) >>>>>>>>> >>>>>>>>> Send snapshot: >>>>>>>>> |---- a/ (ino 259) >>>>>>>>> |---- b/ (ino 260) >>>>>>>>> |---- c/ (ino 2) >>>>>>>>> |---- item2 (ino 259) >>>>>>>>> |---- d/ (ino 257) >>>>>>>>> |---- item1/ (ino 258) >>>>>>>>> >>>>>>>>> Signed-off-by: Robbie Ko <robbieko@synology.com> >>>>>>>>> --- >>>>>>>>> fs/btrfs/send.c | 2 +- >>>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>>>>> >>>>>>>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c >>>>>>>>> index e8eb3ab..46f954c 100644 >>>>>>>>> --- a/fs/btrfs/send.c >>>>>>>>> +++ b/fs/btrfs/send.c >>>>>>>>> @@ -2468,7 +2468,7 @@ verbose_printk("btrfs: send_utimes %llu\n", ino); >>>>>>>>> key.type = BTRFS_INODE_ITEM_KEY; >>>>>>>>> key.offset = 0; >>>>>>>>> ret = btrfs_search_slot(NULL, sctx->send_root, &key, path, 0, 0); >>>>>>>>> - if (ret < 0) >>>>>>>>> + if (ret != 0) >>>>>>>>> goto out; >>>>>>>> >>>>>>>> So I don't think this is a good fix. The problem is in some code that >>>>>>>> calls this function (send_utimes) against the directory that doesn't >>>>>>>> exist - it just shouldn't do that, its logic should be fixed. >>>>>>>> Following this approach, while it works, it's just hiding logic errors >>>>>>>> in one or more code paths, and none of its callers checks for a return >>>>>>>> value of 1 - they only react to values < 0 and introduces the >>>>>>>> possibility of propagating a return value of 1 to user space. >>>>>>>> >>>>>>>> thanks >>>>>>>> >>>>>>>>> >>>>>>>>> eb = path->nodes[0]; >>>>>>>>> -- >>>>>>>>> 1.9.1 >>>>>>>>> >>>>>>>>> -- >>>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>>>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Filipe David Manana, >>>>>>>> >>>>>>>> "Reasonable men adapt themselves to the world. >>>>>>>> Unreasonable men adapt the world to themselves. >>>>>>>> That's why all progress depends on unreasonable men." >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Filipe David Manana, >>>>>> >>>>>> "Reasonable men adapt themselves to the world. >>>>>> Unreasonable men adapt the world to themselves. >>>>>> That's why all progress depends on unreasonable men." >>>> >>>> >>>> >>>> -- >>>> Filipe David Manana, >>>> >>>> "Reasonable men adapt themselves to the world. >>>> Unreasonable men adapt the world to themselves. >>>> That's why all progress depends on unreasonable men." > > > > -- > Filipe David Manana, > > "Reasonable men adapt themselves to the world. > Unreasonable men adapt the world to themselves. > That's why all progress depends on unreasonable men." -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index e8eb3ab..46f954c 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -2468,7 +2468,7 @@ verbose_printk("btrfs: send_utimes %llu\n", ino); key.type = BTRFS_INODE_ITEM_KEY; key.offset = 0; ret = btrfs_search_slot(NULL, sctx->send_root, &key, path, 0, 0); - if (ret < 0) + if (ret != 0) goto out; eb = path->nodes[0];
There's one case where we can't issue a utimes operation for a directory. When 263 will delete, waiting 261 and set 261 rmdir_ino, but 262 earlier processed and update uime between two parent directory. So fix this by not update non exist utimes for this case. Example: Parent snapshot: |---- a/ (ino 259) |---- c (ino 264) |---- b/ (ino 260) |---- d (ino 265) |---- del/ (ino 263) |---- item1/ (ino 261) |---- item2/ (ino 262) Send snapshot: |---- a/ (ino 259) |---- b/ (ino 260) |---- c/ (ino 2) |---- item2 (ino 259) |---- d/ (ino 257) |---- item1/ (ino 258) Signed-off-by: Robbie Ko <robbieko@synology.com> --- fs/btrfs/send.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)