Message ID | 20220719132809.409247-3-shaoxuan.yuan02@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | mv: from in-cone to out-of-cone | expand |
On 7/19/2022 9:28 AM, Shaoxuan Yuan wrote: > Using check_dir_in_index without checking if the directory is on-disk > could get a false positive for partially sparsified directory. It can be helpful to point out that this is a relatively new method, introduced in b91a2b6594 (mv: add check_dir_in_index() and solve general dir check issue, 2022-06-30). > + * > + * Note: *always* check the directory is not on-disk before this function > + * (i.e. using lstat()); > + * otherwise it may return a false positive for a partially sparsified > + * directory. I'm not sure what you mean by a "false positive" in this case. The directory exists in the index, which is what the method is defined as checking. This does not say anything about the worktree. Perhaps that's the real problem? Someone might interpret this as meaning the directory does not exist in the worktree? That would mean that this doc update needs to be changed significantly to say "Note that this does not imply anything about the state of the worktree" or something. But I think I'd rather just see this patch be dropped, unless I am missing something important. Thanks, -Stolee
Shaoxuan Yuan wrote: > Using check_dir_in_index without checking if the directory is on-disk > could get a false positive for partially sparsified directory. > > Add a note in the documentation to warn potential user. > > Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> > --- > builtin/mv.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/builtin/mv.c b/builtin/mv.c > index 4729bb1a1a..c8b9069db8 100644 > --- a/builtin/mv.c > +++ b/builtin/mv.c > @@ -132,6 +132,11 @@ static int index_range_of_same_dir(const char *src, int length, > * Return 0 if such directory exist (i.e. with any of its contained files not > * marked with CE_SKIP_WORKTREE, the directory would be present in working tree). > * Return 1 otherwise. > + * > + * Note: *always* check the directory is not on-disk before this function > + * (i.e. using lstat()); > + * otherwise it may return a false positive for a partially sparsified > + * directory. To me, a comment like this indicates that either the function is not doing what it should be doing, or its name doesn't properly describe the function's behavior. Per the function description: Check if an out-of-cone directory should be in the index. Imagine this case that all the files under a directory are marked with 'CE_SKIP_WORKTREE' bit and thus the directory is sparsified. But neither the name of the function ('check_dir_in_index') nor the function's behavior (hence the comment you're adding here) match that description. Since this function is intended to verify that a directory 1) exists in the index, and 2) is *entirely* sparse, I have two suggestions: * Change the description to specify that the non-existence of the directory on disk is an *assumption*, not an opportunity for a "false positive." It's a subtle distinction, but it clarifies that the function should be used only when the caller already knows the directory is empty. For example: /* * Given the path of a directory that does not exist on-disk, check whether the * directory contains any entries in the index with the SKIP_WORKTREE flag * enabled. * * Return 1 if such index entries exist. * Return 0 otherwise. */ * 'check_dir_in_index' doesn't reflect the "is not on disk" prerequisite, so the function name can be updated to clarify that (e.g., 'empty_dir_has_sparse_contents') > */ > static int check_dir_in_index(const char *name) > {
Victoria Dye wrote: > Shaoxuan Yuan wrote: >> Using check_dir_in_index without checking if the directory is on-disk >> could get a false positive for partially sparsified directory. >> >> Add a note in the documentation to warn potential user. >> >> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> >> --- >> builtin/mv.c | 5 +++++ >> 1 file changed, 5 insertions(+) >> >> diff --git a/builtin/mv.c b/builtin/mv.c >> index 4729bb1a1a..c8b9069db8 100644 >> --- a/builtin/mv.c >> +++ b/builtin/mv.c >> @@ -132,6 +132,11 @@ static int index_range_of_same_dir(const char *src, int length, >> * Return 0 if such directory exist (i.e. with any of its contained files not >> * marked with CE_SKIP_WORKTREE, the directory would be present in working tree). >> * Return 1 otherwise. >> + * >> + * Note: *always* check the directory is not on-disk before this function >> + * (i.e. using lstat()); >> + * otherwise it may return a false positive for a partially sparsified >> + * directory. > > To me, a comment like this indicates that either the function is not doing > what it should be doing, or its name doesn't properly describe the > function's behavior. > > Per the function description: > > Check if an out-of-cone directory should be in the index. Imagine > this case that all the files under a directory are marked with > 'CE_SKIP_WORKTREE' bit and thus the directory is sparsified. > > But neither the name of the function ('check_dir_in_index') nor the > function's behavior (hence the comment you're adding here) match that > description. > > Since this function is intended to verify that a directory 1) exists in the > index, and 2) is *entirely* sparse, I have two suggestions: > > * Change the description to specify that the non-existence of the directory > on disk is an *assumption*, not an opportunity for a "false positive." > It's a subtle distinction, but it clarifies that the function should be > used only when the caller already knows the directory is empty. For > example: > > /* > * Given the path of a directory that does not exist on-disk, check whether the > * directory contains any entries in the index with the SKIP_WORKTREE flag > * enabled. > * > * Return 1 if such index entries exist. > * Return 0 otherwise. > */ Whoops, I mixed up the return values (I assumed this returned a boolean based on the 'check' in the function name). In that case... > > * 'check_dir_in_index' doesn't reflect the "is not on disk" prerequisite, so > the function name can be updated to clarify that (e.g., > 'empty_dir_has_sparse_contents') ...there are two options. Either you can use a more "boolean-y" name (like the one I suggest above) and flip the return values (where "1" means "the empty dir has sparse contents"), or change the name to something that explicitly *doesn't* imply boolean. I'm personally in favor of the former (I'm really struggling to come up with a descriptive, non-boolean name for this function), but I'm fine with either if you can come up with a good function name. > >> */ >> static int check_dir_in_index(const char *name) >> { >
On 7/20/2022 1:43 AM, Derrick Stolee wrote: >> + * >> + * Note: *always* check the directory is not on-disk before this function >> + * (i.e. using lstat()); >> + * otherwise it may return a false positive for a partially sparsified >> + * directory. > > I'm not sure what you mean by a "false positive" in this case. > The directory exists in the index, which is what the method is > defined as checking. This does not say anything about the > worktree. > > Perhaps that's the real problem? Someone might interpret this > as meaning the directory does not exist in the worktree? That > would mean that this doc update needs to be changed significantly > to say "Note that this does not imply anything about the state > of the worktree" or something. This method assumes that the directory being checking does not exist in the working tree, but the method itself does not check this. And if the user does not make sure the directory is absent from the worktree, this method may return a success for a partially sparsified directory, which is not intended. > But I think I'd rather just see this patch be dropped, unless I > am missing something important. I found Victoria's paraphrase [1] makes my point much clearer. -- Thanks, Shaoxuan
On 7/20/2022 2:01 AM, Victoria Dye wrote: > Shaoxuan Yuan wrote: >> Using check_dir_in_index without checking if the directory is on-disk >> could get a false positive for partially sparsified directory. >> >> Add a note in the documentation to warn potential user. >> >> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> >> --- >> builtin/mv.c | 5 +++++ >> 1 file changed, 5 insertions(+) >> >> diff --git a/builtin/mv.c b/builtin/mv.c >> index 4729bb1a1a..c8b9069db8 100644 >> --- a/builtin/mv.c >> +++ b/builtin/mv.c >> @@ -132,6 +132,11 @@ static int index_range_of_same_dir(const char *src, int length, >> * Return 0 if such directory exist (i.e. with any of its contained files not >> * marked with CE_SKIP_WORKTREE, the directory would be present in working tree). >> * Return 1 otherwise. >> + * >> + * Note: *always* check the directory is not on-disk before this function >> + * (i.e. using lstat()); >> + * otherwise it may return a false positive for a partially sparsified >> + * directory. > > To me, a comment like this indicates that either the function is not doing > what it should be doing, or its name doesn't properly describe the > function's behavior. > > Per the function description: > > Check if an out-of-cone directory should be in the index. Imagine > this case that all the files under a directory are marked with > 'CE_SKIP_WORKTREE' bit and thus the directory is sparsified. > > But neither the name of the function ('check_dir_in_index') nor the > function's behavior (hence the comment you're adding here) match that > description. > > Since this function is intended to verify that a directory 1) exists in the > index, and 2) is *entirely* sparse, I have two suggestions: > > * Change the description to specify that the non-existence of the directory > on disk is an *assumption*, not an opportunity for a "false positive." > It's a subtle distinction, but it clarifies that the function should be > used only when the caller already knows the directory is empty. For > example: > > /* > * Given the path of a directory that does not exist on-disk, check whether the > * directory contains any entries in the index with the SKIP_WORKTREE flag > * enabled. > * > * Return 1 if such index entries exist. > * Return 0 otherwise. > */ > > * 'check_dir_in_index' doesn't reflect the "is not on disk" prerequisite, so > the function name can be updated to clarify that (e.g., > 'empty_dir_has_sparse_contents') > I really breathed a sigh of relief after seeing these two points :-) They word things out much better than the original ones. -- Thanks, Shaoxuan
diff --git a/builtin/mv.c b/builtin/mv.c index 4729bb1a1a..c8b9069db8 100644 --- a/builtin/mv.c +++ b/builtin/mv.c @@ -132,6 +132,11 @@ static int index_range_of_same_dir(const char *src, int length, * Return 0 if such directory exist (i.e. with any of its contained files not * marked with CE_SKIP_WORKTREE, the directory would be present in working tree). * Return 1 otherwise. + * + * Note: *always* check the directory is not on-disk before this function + * (i.e. using lstat()); + * otherwise it may return a false positive for a partially sparsified + * directory. */ static int check_dir_in_index(const char *name) {
Using check_dir_in_index without checking if the directory is on-disk could get a false positive for partially sparsified directory. Add a note in the documentation to warn potential user. Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> --- builtin/mv.c | 5 +++++ 1 file changed, 5 insertions(+)