Message ID | 20220527100804.209890-5-shaoxuan.yuan02@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | mv: fix out-of-cone file/directory move logic | expand |
On 5/27/2022 6:08 AM, Shaoxuan Yuan wrote: > +/* > + * Check if an out-of-cone directory should be in the index. Imagine this case > + * that all the files under a directory are marked with 'CE_SKIP_WORKTREE' bit > + * and thus the directory is sparsified. > + * > + * Return 0 if such directory exist (i.e. with any of its contained files not > + * marked with CE_SKIP_WORKTREE, the directory would be present in working tree). > + * Return 1 otherwise. > + */ > +static int check_dir_in_index(const char *name, int namelen) > +{ > + int ret = 1; > + const char *with_slash = add_slash(name); > + int length = namelen + 1; > + > + int pos = cache_name_pos(with_slash, length); > + const struct cache_entry *ce; > + > + if (pos < 0) { > + pos = -pos - 1; > + if (pos >= the_index.cache_nr) > + return ret; > + ce = active_cache[pos]; > + if (strncmp(with_slash, ce->name, length)) > + return ret; > + if (ce_skip_worktree(ce)) > + return ret = 0; This appears to check if the _first_ entry under the directory is sparse, but not if _all_ entries are sparse. These are not the same thing, even in cone-mode sparse-checkout. The t1092 test directory has files like "folder1/0/0/a" but if "folder1/1" is in the sparse-checkout cone, then that first entry has the skip-worktree bit, but "folder1/1/a" and "folder1/a" do not. > + } > + return ret; At the moment, it doesn't seem like we need 'ret' since the only place you set it is in "return ret = 0;" (which could just be "return 0;" while the others are "return 1;"). But, perhaps you intended to create a loop over 'pos' while with_slash is a prefix of the cache entry? > + else if (!check_dir_in_index(src, length) && > + !path_in_sparse_checkout(src_w_slash, &the_index)) { style-nit: You'll want to align the different parts of your logical statement to agree with the end of the "else if (", else if (A && B) { > + modes[i] = SKIP_WORKTREE_DIR; If we are moving to a flags-based model, should we convert all "modes[i] =" to "modes[i] |=" as a first step (before adding the SKIP_WORTKREE_DIR flag)? > + goto dir_check; Hm. While I did recommend using 'goto' to jump to a common end place in the loop body, I'm not sure about jumping into another else-if statement. This might be a good time to extract the code from "else if (src_is_dir)" below into a helper method that can be used in both places. > + } > /* only error if existence is expected. */ > else if (modes[i] != SPARSE) > bad = _("bad source"); > @@ -218,7 +264,9 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > && lstat(dst, &st) == 0) > bad = _("cannot move directory over file"); > else if (src_is_dir) { > - int first = cache_name_pos(src, length), last; > + int first, last; > +dir_check: > + first = cache_name_pos(src, length); > > if (first >= 0) > prepare_move_submodule(src, first, > @@ -229,7 +277,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > else { /* last - first >= 1 */ > int j, dst_len, n; > > - modes[i] = WORKING_DIRECTORY; > + if (!modes[i]) > + modes[i] |= WORKING_DIRECTORY; This appears to only add the WORKING_DIRECTORY flag if modes[i] is already zero. This maybe implies that we wouldn't understand "WORKING_DIRECTORY | SKIP_WORKTREE_DIR" as a value. > n = argc + last - first; > REALLOC_ARRAY(source, n); > REALLOC_ARRAY(destination, n); > @@ -331,7 +380,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > printf(_("Renaming %s to %s\n"), src, dst); > if (show_only) > continue; > - if (mode != INDEX && mode != SPARSE && rename(src, dst) < 0) { > + if (!(mode & (INDEX | SPARSE | SKIP_WORKTREE_DIR)) && > + rename(src, dst) < 0) { style-nit: align your logical statements. > if (ignore_errors) > continue; > die_errno(_("renaming '%s' failed"), src); > @@ -345,7 +395,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > 1); > } > > - if (mode == WORKING_DIRECTORY) > + if (mode & (WORKING_DIRECTORY | SKIP_WORKTREE_DIR)) > continue; Ok, here you check if _either_ mode is enabled, which is good. Maybe you don't need the "if (!mode[i])" part above. Thanks, -Stolee
On Fri, May 27, 2022 at 11:27 PM Derrick Stolee <derrickstolee@github.com> wrote: > > On 5/27/2022 6:08 AM, Shaoxuan Yuan wrote: > > +/* > > + * Check if an out-of-cone directory should be in the index. Imagine this case > > + * that all the files under a directory are marked with 'CE_SKIP_WORKTREE' bit > > + * and thus the directory is sparsified. > > + * > > + * Return 0 if such directory exist (i.e. with any of its contained files not > > + * marked with CE_SKIP_WORKTREE, the directory would be present in working tree). > > + * Return 1 otherwise. > > + */ > > +static int check_dir_in_index(const char *name, int namelen) > > +{ > > + int ret = 1; > > + const char *with_slash = add_slash(name); > > + int length = namelen + 1; > > + > > + int pos = cache_name_pos(with_slash, length); > > + const struct cache_entry *ce; > > + > > + if (pos < 0) { > > + pos = -pos - 1; > > + if (pos >= the_index.cache_nr) > > + return ret; > > + ce = active_cache[pos]; > > + if (strncmp(with_slash, ce->name, length)) > > + return ret; > > + if (ce_skip_worktree(ce)) > > + return ret = 0; > > This appears to check if the _first_ entry under the directory > is sparse, but not if _all_ entries are sparse. These are not > the same thing, even in cone-mode sparse-checkout. The t1092 > test directory has files like "folder1/0/0/a" but if > "folder1/1" is in the sparse-checkout cone, then that first > entry has the skip-worktree bit, but "folder1/1/a" and "folder1/a" > do not. Yes, it is checking the first entry and this would not work without the lstat in the front. But I think the "lstat < 0" makes sure that this directory cannot be partially sparsified. It is either missing both in the worktree and index, or missing in the worktree but present in index (with all its content sparsified). And because of that, I think only the first entry needs to be checked. > > + } > > + return ret; > > At the moment, it doesn't seem like we need 'ret' since the > only place you set it is in "return ret = 0;" (which could > just be "return 0;" while the others are "return 1;"). But, > perhaps you intended to create a loop over 'pos' while > with_slash is a prefix of the cache entry? I agree that this variable is redundant. But I fail to understand the logical relation between before "But," and after "But,". Please elaborate on that? > > + else if (!check_dir_in_index(src, length) && > > + !path_in_sparse_checkout(src_w_slash, &the_index)) { > > style-nit: You'll want to align the different parts of your > logical statement to agree with the end of the "else if (", > > else if (A && > B) { > This one is interesting because it appears just alright in my VSCode editor. Later I found that it is because git-diff is using a tab size of 8 or something, but my VSCode uses tab size of 4. After I configured the git-diff tab rendering size, it looks alright. Same for another style nit down below. > > + modes[i] = SKIP_WORKTREE_DIR; > > If we are moving to a flags-based model, should we convert all > "modes[i] =" to "modes[i] |=" as a first step (before adding the > SKIP_WORTKREE_DIR flag)? > > > + goto dir_check; > > Hm. While I did recommend using 'goto' to jump to a common end > place in the loop body, I'm not sure about jumping into another > else-if statement. This might be a good time to extract the > code from "else if (src_is_dir)" below into a helper method that > can be used in both places. Right, this is suspicious. I wasn't familiar at all with C/C++, and being able to do this inter-if-else jump also startled me. I agree that it should be something more legitimate, like extracting a method for it. > > + } > > /* only error if existence is expected. */ > > else if (modes[i] != SPARSE) > > bad = _("bad source"); > > @@ -218,7 +264,9 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > > && lstat(dst, &st) == 0) > > bad = _("cannot move directory over file"); > > else if (src_is_dir) { > > - int first = cache_name_pos(src, length), last; > > + int first, last; > > +dir_check: > > + first = cache_name_pos(src, length); > > > > if (first >= 0) > > prepare_move_submodule(src, first, > > @@ -229,7 +277,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > > else { /* last - first >= 1 */ > > int j, dst_len, n; > > > > - modes[i] = WORKING_DIRECTORY; > > + if (!modes[i]) > > + modes[i] |= WORKING_DIRECTORY; > > This appears to only add the WORKING_DIRECTORY flag if modes[i] is > already zero. This maybe implies that we wouldn't understand > "WORKING_DIRECTORY | SKIP_WORKTREE_DIR" as a value. At this point, I cannot think of the reason for writing it this way. And yes, this does not make sense... > > n = argc + last - first; > > REALLOC_ARRAY(source, n); > > REALLOC_ARRAY(destination, n); > > @@ -331,7 +380,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > > printf(_("Renaming %s to %s\n"), src, dst); > > if (show_only) > > continue; > > - if (mode != INDEX && mode != SPARSE && rename(src, dst) < 0) { > > + if (!(mode & (INDEX | SPARSE | SKIP_WORKTREE_DIR)) && > > + rename(src, dst) < 0) { > > style-nit: align your logical statements. > > > if (ignore_errors) > > continue; > > die_errno(_("renaming '%s' failed"), src); > > @@ -345,7 +395,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > > 1); > > } > > > > - if (mode == WORKING_DIRECTORY) > > + if (mode & (WORKING_DIRECTORY | SKIP_WORKTREE_DIR)) > > continue; > > Ok, here you check if _either_ mode is enabled, which is good. Maybe > you don't need the "if (!mode[i])" part above. > > Thanks, > -Stolee
On 5/31/2022 5:56 AM, Shaoxuan Yuan wrote: > On Fri, May 27, 2022 at 11:27 PM Derrick Stolee > <derrickstolee@github.com> wrote: >> >> On 5/27/2022 6:08 AM, Shaoxuan Yuan wrote: ... >> This appears to check if the _first_ entry under the directory >> is sparse, but not if _all_ entries are sparse. These are not >> the same thing, even in cone-mode sparse-checkout. The t1092 >> test directory has files like "folder1/0/0/a" but if >> "folder1/1" is in the sparse-checkout cone, then that first >> entry has the skip-worktree bit, but "folder1/1/a" and "folder1/a" >> do not. > > Yes, it is checking the first entry and this would not work without the > lstat in the front. But I think the "lstat < 0" makes sure that this directory > cannot be partially sparsified. > > It is either missing both in the worktree and index, or missing in the worktree > but present in index (with all its content sparsified). And because of that, > I think only the first entry needs to be checked. Ah! Good thinking. I hadn't considered that extra detail, so we get to save some cycles here. >>> + } >>> + return ret; >> >> At the moment, it doesn't seem like we need 'ret' since the >> only place you set it is in "return ret = 0;" (which could >> just be "return 0;" while the others are "return 1;"). But, >> perhaps you intended to create a loop over 'pos' while >> with_slash is a prefix of the cache entry? > > I agree that this variable is redundant. But I fail to understand > the logical relation between before "But," and after "But,". Please > elaborate on that? I was just thinking that if you intended to write a loop as I had suggested, then 'ret' could be modified or used in more places. Feel free to ignore since we resolved that. >>> + else if (!check_dir_in_index(src, length) && >>> + !path_in_sparse_checkout(src_w_slash, &the_index)) { >> >> style-nit: You'll want to align the different parts of your >> logical statement to agree with the end of the "else if (", >> >> else if (A && >> B) { >> > > This one is interesting because it appears just alright in my VSCode editor. > Later I found that it is because git-diff is using a tab size of 8 or something, > but my VSCode uses tab size of 4. After I configured the git-diff tab rendering > size, it looks alright. Same for another style nit down below. That'll do it. You can double-check the alignment in your GGG PR, which should use the correct tab width. Thanks, -Stolee
diff --git a/builtin/mv.c b/builtin/mv.c index 62284e3f86..e64f251a69 100644 --- a/builtin/mv.c +++ b/builtin/mv.c @@ -19,6 +19,14 @@ static const char * const builtin_mv_usage[] = { NULL }; +enum update_mode { + BOTH = 0, + WORKING_DIRECTORY = (1 << 1), + INDEX = (1 << 2), + SPARSE = (1 << 3), + SKIP_WORKTREE_DIR = (1 << 4), +}; + #define DUP_BASENAME 1 #define KEEP_TRAILING_SLASH 2 @@ -115,6 +123,37 @@ static int index_range_of_same_dir(const char *src, int length, return last - first; } +/* + * Check if an out-of-cone directory should be in the index. Imagine this case + * that all the files under a directory are marked with 'CE_SKIP_WORKTREE' bit + * and thus the directory is sparsified. + * + * Return 0 if such directory exist (i.e. with any of its contained files not + * marked with CE_SKIP_WORKTREE, the directory would be present in working tree). + * Return 1 otherwise. + */ +static int check_dir_in_index(const char *name, int namelen) +{ + int ret = 1; + const char *with_slash = add_slash(name); + int length = namelen + 1; + + int pos = cache_name_pos(with_slash, length); + const struct cache_entry *ce; + + if (pos < 0) { + pos = -pos - 1; + if (pos >= the_index.cache_nr) + return ret; + ce = active_cache[pos]; + if (strncmp(with_slash, ce->name, length)) + return ret; + if (ce_skip_worktree(ce)) + return ret = 0; + } + return ret; +} + int cmd_mv(int argc, const char **argv, const char *prefix) { int i, flags, gitmodules_modified = 0; @@ -129,7 +168,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix) OPT_END(), }; const char **source, **destination, **dest_path, **submodule_gitfile; - enum update_mode { BOTH = 0, WORKING_DIRECTORY, INDEX, SPARSE } *modes; + enum update_mode *modes; struct stat st; struct string_list src_for_dst = STRING_LIST_INIT_NODUP; struct lock_file lock_file = LOCK_INIT; @@ -187,6 +226,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix) if (lstat(src, &st) < 0) { int pos = cache_name_pos(src, length); + const char *src_w_slash = add_slash(src); + if (pos >= 0) { const struct cache_entry *ce = active_cache[pos]; @@ -208,6 +249,11 @@ int cmd_mv(int argc, const char **argv, const char *prefix) else bad = _("bad source"); } + else if (!check_dir_in_index(src, length) && + !path_in_sparse_checkout(src_w_slash, &the_index)) { + modes[i] = SKIP_WORKTREE_DIR; + goto dir_check; + } /* only error if existence is expected. */ else if (modes[i] != SPARSE) bad = _("bad source"); @@ -218,7 +264,9 @@ int cmd_mv(int argc, const char **argv, const char *prefix) && lstat(dst, &st) == 0) bad = _("cannot move directory over file"); else if (src_is_dir) { - int first = cache_name_pos(src, length), last; + int first, last; +dir_check: + first = cache_name_pos(src, length); if (first >= 0) prepare_move_submodule(src, first, @@ -229,7 +277,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix) else { /* last - first >= 1 */ int j, dst_len, n; - modes[i] = WORKING_DIRECTORY; + if (!modes[i]) + modes[i] |= WORKING_DIRECTORY; n = argc + last - first; REALLOC_ARRAY(source, n); REALLOC_ARRAY(destination, n); @@ -331,7 +380,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix) printf(_("Renaming %s to %s\n"), src, dst); if (show_only) continue; - if (mode != INDEX && mode != SPARSE && rename(src, dst) < 0) { + if (!(mode & (INDEX | SPARSE | SKIP_WORKTREE_DIR)) && + rename(src, dst) < 0) { if (ignore_errors) continue; die_errno(_("renaming '%s' failed"), src); @@ -345,7 +395,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix) 1); } - if (mode == WORKING_DIRECTORY) + if (mode & (WORKING_DIRECTORY | SKIP_WORKTREE_DIR)) continue; pos = cache_name_pos(src, strlen(src)); diff --git a/t/t7002-mv-sparse-checkout.sh b/t/t7002-mv-sparse-checkout.sh index 2c9008573a..cf2f5dc46f 100755 --- a/t/t7002-mv-sparse-checkout.sh +++ b/t/t7002-mv-sparse-checkout.sh @@ -206,7 +206,7 @@ test_expect_success 'refuse to move file to non-skip-worktree sparse path' ' test_cmp expect stderr ' -test_expect_failure 'refuse to move out-of-cone directory without --sparse' ' +test_expect_success 'refuse to move out-of-cone directory without --sparse' ' git sparse-checkout disable && git reset --hard && mkdir folder1 && @@ -222,7 +222,7 @@ test_expect_failure 'refuse to move out-of-cone directory without --sparse' ' test_cmp expect stderr ' -test_expect_failure 'can move out-of-cone directory with --sparse' ' +test_expect_success 'can move out-of-cone directory with --sparse' ' git sparse-checkout disable && git reset --hard && mkdir folder1 &&
Originally, moving a <source> directory which is not on-disk due to its existence outside of sparse-checkout cone, "giv mv" command errors out with "bad source". Add a helper check_dir_in_index() function to see if a directory name exists in the index. Also add a SKIP_WORKTREE_DIR bit to mark such directories. Change the checking logic, so that such <source> directory makes "giv mv" command warns with "advise_on_updating_sparse_paths()" instead of "bad source"; also user now can supply a "--sparse" flag so this operation can be carried out successfully. Also, as suggested by Derrick [1], move the in-line definition of "enum update_mode" to the top of the file and make it use "flags" mode (each state is a different bit in the word). [1] https://lore.kernel.org/git/22aadea2-9330-aa9e-7b6a-834585189144@github.com/ Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> --- builtin/mv.c | 60 ++++++++++++++++++++++++++++++++--- t/t7002-mv-sparse-checkout.sh | 4 +-- 2 files changed, 57 insertions(+), 7 deletions(-)