Message ID | 20220331091755.385961-3-shaoxuan.yuan02@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mv: fix out-of-cone file/directory move logic | expand |
On Thu, Mar 31 2022, Shaoxuan Yuan wrote: > +static int check_dir_in_index(const char *dir) > +{ > + int ret = 0; > + int length = sizeof(dir) + 1; > + char *substr = malloc(length); > + > + for (int i = 0; i < the_index.cache_nr; i++) { See https://lore.kernel.org/git/xmqqy20r3rv7.fsf@gitster.g/ for how we're not quite using this syntax yet. This should also be "unsigned int" to go with the "cache_nr" member. > + memcpy(substr, the_index.cache[i]->name, length); > + memset(substr + length - 1, 0, 1); > + > + if (strcmp(dir, substr) == 0) { Style: don't compare against == 0, or == NULL, use !, see CodingGuidelines. > + else if (check_dir_in_index(src_w_slash) && > + !path_in_sparse_checkout(src_w_slash, &the_index)) { Funny indentation, the ! should be aligned with "(". > - modes[i] = WORKING_DIRECTORY; > + if (!modes[i]) > + modes[i] = WORKING_DIRECTORY; This works, but assuming things about enum values (even if 0) always seems a bit nasty, can this be a comparison to BOTH instead of !? May or may not be better... But then again we do xcalloc() to allocate it, so we assume that already, nevermind... :) (there were also indentation issues below)
Shaoxuan Yuan wrote: > Originally, moving a <source> directory which is not on-disk due > to its existence outside of sparse-checkout cone, "giv mv" command > errors out with "bad source". > > Add a helper check_dir_in_index() function to see if a directory > name exists in the index. Also add a SPARSE_DIRECTORY bit to mark > such directories. > Hmm, I think this patch would fit better in your eventual "sparse index integration" series than this "prerequisite fixes to sparse-checkout" series. Sparse directories *only* appear when you're using a sparse index so, theoretically, this shouldn't ever come up (and thus isn't testable) until you're using a sparse index. Since it's here, though, I'm happy to review what you have (even if you eventually move it to a later series)! > Change the checking logic, so that such <source> directory makes > "giv mv" command warns with "advise_on_updating_sparse_paths()" > instead of "bad source"; also user now can supply a "--sparse" flag so > this operation can be carried out successfully. > > Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> > --- > Since I'm so new to C language (not an acquaintance until this patch), > the "check_dir_in_index()" function I added might not be ideal (in terms of > safety and correctness?). I have digging into the APIs provided in the codebase > but I haven't found anything to do this very job: find out if a directory is > in the index (am I missing something?). > Probably because contents are stored in the index as blobs and > they all represent regular files. So I came up with this dull solution... > > builtin/mv.c | 41 ++++++++++++++++++++++++++++++++++++----- > 1 file changed, 36 insertions(+), 5 deletions(-) > > diff --git a/builtin/mv.c b/builtin/mv.c > index 32ad4d5682..9da9205e01 100644 > --- a/builtin/mv.c > +++ b/builtin/mv.c > @@ -115,6 +115,25 @@ static int index_range_of_same_dir(const char *src, int length, > return last - first; > } > > +static int check_dir_in_index(const char *dir) > +{ This function can be made a lot simpler - you can use `cache_name_pos()` to find the index entry - if it's found, the directory exists as a sparse directory. And, because 'add_slash()' enforces the trailing slash later on, you don't need to worry about adjusting the name before you look for the entry. > + int ret = 0; > + int length = sizeof(dir) + 1; > + char *substr = malloc(length); > + > + for (int i = 0; i < the_index.cache_nr; i++) { > + memcpy(substr, the_index.cache[i]->name, length); > + memset(substr + length - 1, 0, 1); > + > + if (strcmp(dir, substr) == 0) { > + ret = 1; > + return ret; > + } > + } > + free(substr); > + return ret; > +} > + > int cmd_mv(int argc, const char **argv, const char *prefix) > { > int i, flags, gitmodules_modified = 0; > @@ -129,7 +148,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > OPT_END(), > }; > const char **source, **destination, **dest_path, **submodule_gitfile; > - enum update_mode { BOTH = 0, WORKING_DIRECTORY, INDEX, SPARSE } *modes; > + enum update_mode { BOTH = 0, WORKING_DIRECTORY, INDEX, SPARSE, > + SPARSE_DIRECTORY } *modes; > struct stat st; > struct string_list src_for_dst = STRING_LIST_INIT_NODUP; > struct lock_file lock_file = LOCK_INIT; > @@ -197,6 +217,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > */ > > int pos = cache_name_pos(src, length); > + const char *src_w_slash = add_slash(src); > + > if (pos >= 0) { > const struct cache_entry *ce = active_cache[pos]; > > @@ -209,6 +231,11 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > else > bad = _("bad source"); > } > + else if (check_dir_in_index(src_w_slash) && > + !path_in_sparse_checkout(src_w_slash, &the_index)) { > + modes[i] = SPARSE_DIRECTORY; > + goto dir_check; > + } In if-statements like this, you'll want to line up the statements in parentheses on subsequent lines, like: else if (check_dir_in_index(src_w_slash) && !path_in_sparse_checkout(src_w_slash, &the_index)) { ...where the second line is indented 1 (8-space-sized) tab + 1 space. In general, if you're trying to align code (in this repository), align first with as many tabs as possible, then the "remainder" with spaces. Note that this isn't 100% consistent throughout the repository - older lines might not have been aligned properly - but you should go for this styling on any new lines that you add. > /* only error if existence is expected. */ > else if (modes[i] != SPARSE) > bad = _("bad source"); > @@ -219,7 +246,9 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > && lstat(dst, &st) == 0) > bad = _("cannot move directory over file"); > else if (src_is_dir) { > - int first = cache_name_pos(src, length), last; > + int first, last; > +dir_check: > + first = cache_name_pos(src, length); > > if (first >= 0) > prepare_move_submodule(src, first, > @@ -230,7 +259,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > else { /* last - first >= 1 */ > int j, dst_len, n; > > - modes[i] = WORKING_DIRECTORY; > + if (!modes[i]) > + modes[i] = WORKING_DIRECTORY; > n = argc + last - first; > REALLOC_ARRAY(source, n); > REALLOC_ARRAY(destination, n); > @@ -332,7 +362,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > printf(_("Renaming %s to %s\n"), src, dst); > if (show_only) > continue; > - if (mode != INDEX && mode != SPARSE && rename(src, dst) < 0) { > + if (mode != INDEX && mode != SPARSE && mode != SPARSE_DIRECTORY && > + rename(src, dst) < 0) { > if (ignore_errors) > continue; > die_errno(_("renaming '%s' failed"), src); > @@ -346,7 +377,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > 1); > } > > - if (mode == WORKING_DIRECTORY) > + if (mode == WORKING_DIRECTORY || mode == SPARSE_DIRECTORY) I'm a bit confused - doesn't this mean the sparse dir move will be skipped? In your commit description, you mention that this 'mv' succeeds with the '--sparse' option, but I don't see any place where the sparse directory would be moved. > continue; > > pos = cache_name_pos(src, strlen(src));
On Thu, Mar 31, 2022 at 6:30 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > On Thu, Mar 31 2022, Shaoxuan Yuan wrote: > > > +static int check_dir_in_index(const char *dir) > > +{ > > + int ret = 0; > > + int length = sizeof(dir) + 1; > > + char *substr = malloc(length); > > + > > + for (int i = 0; i < the_index.cache_nr; i++) { > > See https://lore.kernel.org/git/xmqqy20r3rv7.fsf@gitster.g/ for how > we're not quite using this syntax yet. > > This should also be "unsigned int" to go with the "cache_nr" member. > > > + memcpy(substr, the_index.cache[i]->name, length); > > + memset(substr + length - 1, 0, 1); > > + > > + if (strcmp(dir, substr) == 0) { > > Style: don't compare against == 0, or == NULL, use !, see CodingGuidelines. > > > + else if (check_dir_in_index(src_w_slash) && > > + !path_in_sparse_checkout(src_w_slash, &the_index)) { > > Funny indentation, the ! should be aligned with "(". > > > - modes[i] = WORKING_DIRECTORY; > > + if (!modes[i]) > > + modes[i] = WORKING_DIRECTORY; > > This works, but assuming things about enum values (even if 0) always > seems a bit nasty, can this be a comparison to BOTH instead of !? May or > may not be better... > > But then again we do xcalloc() to allocate it, so we assume that > already, nevermind... :) > > (there were also indentation issues below) Thanks for the styling reminders! I should go back and reread CodingGuidelines more often... Things just slipped off my mind since I'm still trying to remember the guideline...
On Fri, Apr 1, 2022 at 5:28 AM Victoria Dye <vdye@github.com> wrote: > > Shaoxuan Yuan wrote: > > Originally, moving a <source> directory which is not on-disk due > > to its existence outside of sparse-checkout cone, "giv mv" command > > errors out with "bad source". > > > > Add a helper check_dir_in_index() function to see if a directory > > name exists in the index. Also add a SPARSE_DIRECTORY bit to mark > > such directories. > > > > Hmm, I think this patch would fit better in your eventual "sparse index > integration" series than this "prerequisite fixes to sparse-checkout" > series. Sparse directories *only* appear when you're using a sparse index > so, theoretically, this shouldn't ever come up (and thus isn't testable) > until you're using a sparse index. After reading your feedback, I realized that I totally misused the phrase "sparse directory". Clearly, this patch series does not deal with sparse- index yet, as "sparse directory" is a cache entry that points to a tree, if sparse-index is enabled. Silly me ;) What I was *actually* trying to say is: I want to change the checking logic of moving a "directory that exists outside of sparse-checkout cone", and I apparently misused "sparse directory" to reference such a thing. > Since it's here, though, I'm happy to review what you have (even if you > eventually move it to a later series)! Thanks! > > diff --git a/builtin/mv.c b/builtin/mv.c > > index 32ad4d5682..9da9205e01 100644 > > --- a/builtin/mv.c > > +++ b/builtin/mv.c > > @@ -115,6 +115,25 @@ static int index_range_of_same_dir(const char *src, int length, > > return last - first; > > } > > > > +static int check_dir_in_index(const char *dir) > > +{ > > This function can be made a lot simpler - you can use `cache_name_pos()` to > find the index entry - if it's found, the directory exists as a sparse > directory. And, because 'add_slash()' enforces the trailing slash later on, > you don't need to worry about adjusting the name before you look for the > entry. Yes, if I correctly used the phrase "sparse directory", but I did not... I think I can use 'cache_name_pos()' to check a directory *iff* it is a legit sparse directory when using sparse-index? In my case, I just want to check a regular directory that is not in the worktree, since the cone pattern excludes it. And in a non-sparse index, cache entry points only to blobs, not trees, and that's the reason I wrote this weird function to look into the index. I understand that sounds not compatible with how git manages index, but all I want to know is "does this directory exist in the index?" (this question is also quasi-correct). I tried to find an existing API for this job, but I failed to find any. Though I have a hunch that there must be something to do it... > > + int ret = 0; > > + int length = sizeof(dir) + 1; > > + char *substr = malloc(length); > > + > > + for (int i = 0; i < the_index.cache_nr; i++) { > > + memcpy(substr, the_index.cache[i]->name, length); > > + memset(substr + length - 1, 0, 1); > > + > > + if (strcmp(dir, substr) == 0) { > > + ret = 1; > > + return ret; > > + } > > + } > > + free(substr); > > + return ret; > > +} > > + > > int cmd_mv(int argc, const char **argv, const char *prefix) > > { > > int i, flags, gitmodules_modified = 0; > > @@ -129,7 +148,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > > OPT_END(), > > }; > > const char **source, **destination, **dest_path, **submodule_gitfile; > > - enum update_mode { BOTH = 0, WORKING_DIRECTORY, INDEX, SPARSE } *modes; > > + enum update_mode { BOTH = 0, WORKING_DIRECTORY, INDEX, SPARSE, > > + SPARSE_DIRECTORY } *modes; > > struct stat st; > > struct string_list src_for_dst = STRING_LIST_INIT_NODUP; > > struct lock_file lock_file = LOCK_INIT; > > @@ -197,6 +217,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > > */ > > > > int pos = cache_name_pos(src, length); > > + const char *src_w_slash = add_slash(src); > > + > > if (pos >= 0) { > > const struct cache_entry *ce = active_cache[pos]; > > > > @@ -209,6 +231,11 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > > else > > bad = _("bad source"); > > } > > + else if (check_dir_in_index(src_w_slash) && > > + !path_in_sparse_checkout(src_w_slash, &the_index)) { > > + modes[i] = SPARSE_DIRECTORY; > > + goto dir_check; > > + } > > In if-statements like this, you'll want to line up the statements in > parentheses on subsequent lines, like: > > else if (check_dir_in_index(src_w_slash) && > !path_in_sparse_checkout(src_w_slash, &the_index)) { > > ...where the second line is indented 1 (8-space-sized) tab + 1 space. > > In general, if you're trying to align code (in this repository), align first > with as many tabs as possible, then the "remainder" with spaces. Note that > this isn't 100% consistent throughout the repository - older lines might not > have been aligned properly - but you should go for this styling on any new > lines that you add. Will do. > > > /* only error if existence is expected. */ > > else if (modes[i] != SPARSE) > > bad = _("bad source"); > > @@ -219,7 +246,9 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > > && lstat(dst, &st) == 0) > > bad = _("cannot move directory over file"); > > else if (src_is_dir) { > > - int first = cache_name_pos(src, length), last; > > + int first, last; > > +dir_check: > > + first = cache_name_pos(src, length); > > > > if (first >= 0) > > prepare_move_submodule(src, first, > > @@ -230,7 +259,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > > else { /* last - first >= 1 */ > > int j, dst_len, n; > > > > - modes[i] = WORKING_DIRECTORY; > > + if (!modes[i]) > > + modes[i] = WORKING_DIRECTORY; > > n = argc + last - first; > > REALLOC_ARRAY(source, n); > > REALLOC_ARRAY(destination, n); > > @@ -332,7 +362,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > > printf(_("Renaming %s to %s\n"), src, dst); > > if (show_only) > > continue; > > - if (mode != INDEX && mode != SPARSE && rename(src, dst) < 0) { > > + if (mode != INDEX && mode != SPARSE && mode != SPARSE_DIRECTORY && > > + rename(src, dst) < 0) { > > if (ignore_errors) > > continue; > > die_errno(_("renaming '%s' failed"), src); > > @@ -346,7 +377,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > > 1); > > } > > > > - if (mode == WORKING_DIRECTORY) > > + if (mode == WORKING_DIRECTORY || mode == SPARSE_DIRECTORY) > > I'm a bit confused - doesn't this mean the sparse dir move will be skipped? > In your commit description, you mention that this 'mv' succeeds with the > '--sparse' option, but I don't see any place where the sparse directory > would be moved. Well, you know the drill, I did not use "sparse directory" correctly, let alone 'SPARSE_DIRECTORY' enum bit in this hunk. I think it makes some sense if you apply my actual meaning of 'SPARSE_DIRECTORY' here (it should be something like OUT_OF_CONE_WORKING_DIRECTORY)? Because such directory is not on disk, it cannot be "rename()"d, and should also skip the "rename_cache_entry_at()" function. If all the files under the directory are moved/renamed, then (in my opinion) the directory is both moved to the destination, both in the worktree and in the index.
On 4/1/2022 8:49 AM, Shaoxuan Yuan wrote:> On Fri, Apr 1, 2022 at 5:28 AM Victoria Dye <vdye@github.com> wrote: >> >> Shaoxuan Yuan wrote: >>> Originally, moving a <source> directory which is not on-disk due >>> to its existence outside of sparse-checkout cone, "giv mv" command >>> errors out with "bad source". >>> >>> Add a helper check_dir_in_index() function to see if a directory >>> name exists in the index. Also add a SPARSE_DIRECTORY bit to mark >>> such directories. >>> >> >> Hmm, I think this patch would fit better in your eventual "sparse index >> integration" series than this "prerequisite fixes to sparse-checkout" >> series. Sparse directories *only* appear when you're using a sparse index >> so, theoretically, this shouldn't ever come up (and thus isn't testable) >> until you're using a sparse index. > > After reading your feedback, I realized that I totally misused > the phrase "sparse directory". Clearly, this patch series does not > deal with sparse- > index yet, as "sparse directory" is a cache entry that points to a > tree, if sparse-index > is enabled. Silly me ;) > > What I was *actually* trying to say is: I want to change the checking > logic of moving > a "directory that exists outside of sparse-checkout cone", and I > apparently misused > "sparse directory" to reference such a thing. In the case of a full index (or an expanded sparse index, which is currently always the case for `git mv`), you detect a sparse directory by looking for the directory in the index, _not_ finding it, and then seeing if the cache entry at the position where the directory _would_ exist is marked with the SKIP_WORKTREE bit. This works in cone mode and the old mode because I assume you've already checked for the existence of the directory, so if there _was_ any non-SKIP_WORKTREE cache entry within the directory, then the directory would exist in the worktree. (These are good details to include in the commit message.) >>> +static int check_dir_in_index(const char *dir) >>> +{ >> >> This function can be made a lot simpler - you can use `cache_name_pos()` to >> find the index entry - if it's found, the directory exists as a sparse >> directory. And, because 'add_slash()' enforces the trailing slash later on, >> you don't need to worry about adjusting the name before you look for the >> entry. > > Yes, if I correctly used the phrase "sparse directory", but I did not... > I think I can use 'cache_name_pos()' to > check a directory *iff* it is a legit sparse directory when using sparse-index? > > In my case, I just want to check a regular directory that is not in > the worktree, > since the cone pattern excludes it. And in a non-sparse index, cache > entry points only > to blobs, not trees, and that's the reason I wrote this weird function > to look into the > index. I understand that sounds not compatible with how git manages > index, but all > I want to know is "does this directory exist in the index?" (this > question is also quasi-correct). > > I tried to find an existing API for this job, but I failed to find > any. Though I have a hunch > that there must be something to do it... You can still use cache_name_pos() and if the resulting value is negative, then you can "flip it" (pos = -1 - pos) to get the array index where the directory _would_ be inserted. For example, here is a case in unpack-trees.c (that uses the synonym index_name_pos()): static int locate_in_src_index(const struct cache_entry *ce, struct unpack_trees_options *o) { struct index_state *index = o->src_index; int len = ce_namelen(ce); int pos = index_name_pos(index, ce->name, len); if (pos < 0) pos = -1 - pos; return pos; } This uses a binary search inside the method, so it will be much faster than the loop you wrote here. If you have this helper, then you can integrate with the sparse index later by checking for a sparse directory entry when pos is non-negative. But that can wait for the next series. >>> /* only error if existence is expected. */ >>> else if (modes[i] != SPARSE) >>> bad = _("bad source"); >>> @@ -219,7 +246,9 @@ int cmd_mv(int argc, const char **argv, const char *prefix) >>> && lstat(dst, &st) == 0) >>> bad = _("cannot move directory over file"); >>> else if (src_is_dir) { >>> - int first = cache_name_pos(src, length), last; >>> + int first, last; >>> +dir_check: >>> + first = cache_name_pos(src, length); >>> >>> if (first >= 0) >>> prepare_move_submodule(src, first, >>> @@ -230,7 +259,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix) >>> else { /* last - first >= 1 */ >>> int j, dst_len, n; >>> >>> - modes[i] = WORKING_DIRECTORY; >>> + if (!modes[i]) >>> + modes[i] = WORKING_DIRECTORY; This is curious that we could get here with an existing mode. I wonder if it would be worthwhile to make the enum using a "flags" mode (each state is a different bit in the word) so instead of modes[i] = WORKING_DIRECTORY; we would write modes[i] |= WORKING_DIRECTORY; >>> n = argc + last - first; >>> REALLOC_ARRAY(source, n); >>> REALLOC_ARRAY(destination, n); >>> @@ -332,7 +362,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix) >>> printf(_("Renaming %s to %s\n"), src, dst); >>> if (show_only) >>> continue; >>> - if (mode != INDEX && mode != SPARSE && rename(src, dst) < 0) { >>> + if (mode != INDEX && mode != SPARSE && mode != SPARSE_DIRECTORY && And here we would write something like if (!(mode & (INDEX | SPARSE | SPARSE_DIRECTORY)) && >>> + rename(src, dst) < 0) { >>> if (ignore_errors) >>> continue; >>> die_errno(_("renaming '%s' failed"), src); >>> @@ -346,7 +377,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix) >>> 1); >>> } >>> >>> - if (mode == WORKING_DIRECTORY) >>> + if (mode == WORKING_DIRECTORY || mode == SPARSE_DIRECTORY) and here: if (mode & (WORKING_DIRECTORY | SPARSE_DIRECTORY)) This requires changing your enum definition. It got lost in the previous quoting, it seems, but here it is again: >>> @@ -129,7 +148,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix) >>> OPT_END(), >>> }; >>> const char **source, **destination, **dest_path, **submodule_gitfile; >>> - enum update_mode { BOTH = 0, WORKING_DIRECTORY, INDEX, SPARSE } *modes; >>> + enum update_mode { BOTH = 0, WORKING_DIRECTORY, INDEX, SPARSE, >>> + SPARSE_DIRECTORY } *modes; >>> struct stat st; >>> struct string_list src_for_dst = STRING_LIST_INIT_NODUP; >>> struct lock_file lock_file = LOCK_INIT; I think it is time to split out "enum update_mode" to the top of the file instead of defining it inline here. Here is what it could look like: enum update_mode { BOTH = 0, WORKING_DIRECTORY = (1 << 1), INDEX = (1 << 2), SPARSE = (1 << 3), }; (This is how it would look before adding your new value.) I can imagine making a new commit that does the following: * Move update_mode to the top and set the values to be independent bits. * Change "mode[i] =" to "mode[i] |=" * Change "mode ==" checks to "mode &" checks. Think about it. >> I'm a bit confused - doesn't this mean the sparse dir move will be skipped? >> In your commit description, you mention that this 'mv' succeeds with the >> '--sparse' option, but I don't see any place where the sparse directory >> would be moved. > > Well, you know the drill, I did not use "sparse directory" correctly, let alone > 'SPARSE_DIRECTORY' enum bit in this hunk. I think it makes some sense > if you apply my actual meaning of 'SPARSE_DIRECTORY' here (it should be > something like OUT_OF_CONE_WORKING_DIRECTORY)? Because such > directory is not on disk, it cannot be "rename()"d, and should also skip the > "rename_cache_entry_at()" function. If all the files under the directory are > moved/renamed, then (in my opinion) the directory is both moved to the > destination, > both in the worktree and in the index. Perhaps a better name would be SKIP_WORKTREE_DIR. But yes, we need to make sure that all cache entries under the directory have their SKIP_WORKTREE bits re-checked and any that lose the bit need to be written to the worktree. I wonder if it is as simple as marking a boolean that says "I moved at least one sparse entry" and then calling update_sparsity() at the end of cmd_mv() if that boolean is true. Thanks, -Stolee
On Fri, Apr 1, 2022 at 10:49 PM Derrick Stolee <derrickstolee@github.com> wrote: > > On 4/1/2022 8:49 AM, Shaoxuan Yuan wrote:> On Fri, Apr 1, 2022 at 5:28 AM Victoria Dye <vdye@github.com> wrote: > >> > >> Shaoxuan Yuan wrote: > >>> Originally, moving a <source> directory which is not on-disk due > >>> to its existence outside of sparse-checkout cone, "giv mv" command > >>> errors out with "bad source". > >>> > >>> Add a helper check_dir_in_index() function to see if a directory > >>> name exists in the index. Also add a SPARSE_DIRECTORY bit to mark > >>> such directories. > >>> > >> > >> Hmm, I think this patch would fit better in your eventual "sparse index > >> integration" series than this "prerequisite fixes to sparse-checkout" > >> series. Sparse directories *only* appear when you're using a sparse index > >> so, theoretically, this shouldn't ever come up (and thus isn't testable) > >> until you're using a sparse index. > > > > After reading your feedback, I realized that I totally misused > > the phrase "sparse directory". Clearly, this patch series does not > > deal with sparse- > > index yet, as "sparse directory" is a cache entry that points to a > > tree, if sparse-index > > is enabled. Silly me ;) > > > > What I was *actually* trying to say is: I want to change the checking > > logic of moving > > a "directory that exists outside of sparse-checkout cone", and I > > apparently misused > > "sparse directory" to reference such a thing. > > In the case of a full index (or an expanded sparse index, which is > currently always the case for `git mv`), you detect a sparse directory > by looking for the directory in the index, _not_ finding it, and then > seeing if the cache entry at the position where the directory _would_ > exist is marked with the SKIP_WORKTREE bit. > > This works in cone mode and the old mode because I assume you've already > checked for the existence of the directory, so if there _was_ any > non-SKIP_WORKTREE cache entry within the directory, then the directory > would exist in the worktree. > > (These are good details to include in the commit message.) I read and think about this part a few times, but I'm still confused. As Victoria pointed out earlier, and I quote, "Sparse directories *only* appear when you're using a sparse index, so, theoretically, this shouldn't ever come up (and thus isn't testable) until you're using a sparse index." So I'm not so sure what do you mean by putting "full index" and "sparse directory" together. Thus, I go ahead and try to detect a directory that is outside of sparse-checkout cone, without sparse-index enabled. I found a problem that if you use cache_name_pos() to do this detection, I imagined the following example (I'm trying to imitate an output of "git ls-files -t"): H a H b S d/file1 H e/file1 So in this index, I use cache_name_pos() to find a directory "c/". I imagine the the value returned would be -3, which indicates this directory would be inserted at index position 2. However, the cache entry at position 2 is "d/file1", which is marked with SKIP_WORKTREE, and this fact cannot guarantee that "c/" is a sparse directory, since ''c/" is not in the index per se. Probably I'm missing something, or I'm just dumb. > Thanks, > -Stolee
On Mon, Apr 4, 2022 at 3:25 PM Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> wrote: > I read and think about this part a few times, but I'm still confused. > > As Victoria pointed out earlier, and I quote, "Sparse directories *only* appear > when you're using a sparse index, so, theoretically, this shouldn't ever > come up (and thus isn't testable) until you're using a sparse index." > So I'm not so sure what do you mean by putting "full index" and "sparse > directory" together. > > Thus, I go ahead and try to detect a directory that is outside of > sparse-checkout cone, without sparse-index enabled. > > I found a problem that if you use cache_name_pos() to do this > detection, I imagined the following example (I'm trying to imitate an > output of "git ls-files -t"): > > H a > H b > S d/file1 > H e/file1 > > So in this index, I use cache_name_pos() to find a directory "c/". I imagine the > the value returned would be -3, which indicates this directory would be inserted > at index position 2. However, the cache entry at position 2 is > "d/file1", which is > marked with SKIP_WORKTREE, and this fact cannot guarantee that "c/" is > a sparse directory, since ''c/" is not in the index per se. > > Probably I'm missing something, or I'm just dumb. Though I think doing a strncmp() after the cache_name_pos() can get the job done :)
On 4/4/2022 3:49 AM, Shaoxuan Yuan wrote: > On Mon, Apr 4, 2022 at 3:25 PM Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> wrote: >> I read and think about this part a few times, but I'm still confused. >> >> As Victoria pointed out earlier, and I quote, "Sparse directories *only* appear >> when you're using a sparse index, so, theoretically, this shouldn't ever >> come up (and thus isn't testable) until you're using a sparse index." >> So I'm not so sure what do you mean by putting "full index" and "sparse >> directory" together. >> >> Thus, I go ahead and try to detect a directory that is outside of >> sparse-checkout cone, without sparse-index enabled. >> >> I found a problem that if you use cache_name_pos() to do this >> detection, I imagined the following example (I'm trying to imitate an >> output of "git ls-files -t"): >> >> H a >> H b >> S d/file1 >> H e/file1 >> >> So in this index, I use cache_name_pos() to find a directory "c/". I imagine the >> the value returned would be -3, which indicates this directory would be inserted >> at index position 2. However, the cache entry at position 2 is >> "d/file1", which is >> marked with SKIP_WORKTREE, and this fact cannot guarantee that "c/" is >> a sparse directory, since ''c/" is not in the index per se. I was thinking more about the case where you find a tracked directory that has all contained files marked with SKIP_WORTKREE. If you search for "d/" you will also get -3. Then, you will see that at position 2 the cache entry d/file1 has SKIP_WORKTREE. The directory "d/" should exist on disk if there are any entries starting with "d/" that do not have the SKIP_WORKTREE bit. Interesting things happen if you are in the scenario where d/f is in the cone-mode sparse-checkout definition and you see this list of cache entries: S d/a H d/f/b S d/g Again, d/f/b should imply the existence of the directory d/ in the worktree. >> Probably I'm missing something, or I'm just dumb. > > Though I think doing a strncmp() after the cache_name_pos() > can get the job done :) I think this is the way to do it, including index_range_of_same_dir() in builtin/mv.c. Thanks, -Stolee
diff --git a/builtin/mv.c b/builtin/mv.c index 32ad4d5682..9da9205e01 100644 --- a/builtin/mv.c +++ b/builtin/mv.c @@ -115,6 +115,25 @@ static int index_range_of_same_dir(const char *src, int length, return last - first; } +static int check_dir_in_index(const char *dir) +{ + int ret = 0; + int length = sizeof(dir) + 1; + char *substr = malloc(length); + + for (int i = 0; i < the_index.cache_nr; i++) { + memcpy(substr, the_index.cache[i]->name, length); + memset(substr + length - 1, 0, 1); + + if (strcmp(dir, substr) == 0) { + ret = 1; + return ret; + } + } + free(substr); + return ret; +} + int cmd_mv(int argc, const char **argv, const char *prefix) { int i, flags, gitmodules_modified = 0; @@ -129,7 +148,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix) OPT_END(), }; const char **source, **destination, **dest_path, **submodule_gitfile; - enum update_mode { BOTH = 0, WORKING_DIRECTORY, INDEX, SPARSE } *modes; + enum update_mode { BOTH = 0, WORKING_DIRECTORY, INDEX, SPARSE, + SPARSE_DIRECTORY } *modes; struct stat st; struct string_list src_for_dst = STRING_LIST_INIT_NODUP; struct lock_file lock_file = LOCK_INIT; @@ -197,6 +217,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix) */ int pos = cache_name_pos(src, length); + const char *src_w_slash = add_slash(src); + if (pos >= 0) { const struct cache_entry *ce = active_cache[pos]; @@ -209,6 +231,11 @@ int cmd_mv(int argc, const char **argv, const char *prefix) else bad = _("bad source"); } + else if (check_dir_in_index(src_w_slash) && + !path_in_sparse_checkout(src_w_slash, &the_index)) { + modes[i] = SPARSE_DIRECTORY; + goto dir_check; + } /* only error if existence is expected. */ else if (modes[i] != SPARSE) bad = _("bad source"); @@ -219,7 +246,9 @@ int cmd_mv(int argc, const char **argv, const char *prefix) && lstat(dst, &st) == 0) bad = _("cannot move directory over file"); else if (src_is_dir) { - int first = cache_name_pos(src, length), last; + int first, last; +dir_check: + first = cache_name_pos(src, length); if (first >= 0) prepare_move_submodule(src, first, @@ -230,7 +259,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix) else { /* last - first >= 1 */ int j, dst_len, n; - modes[i] = WORKING_DIRECTORY; + if (!modes[i]) + modes[i] = WORKING_DIRECTORY; n = argc + last - first; REALLOC_ARRAY(source, n); REALLOC_ARRAY(destination, n); @@ -332,7 +362,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix) printf(_("Renaming %s to %s\n"), src, dst); if (show_only) continue; - if (mode != INDEX && mode != SPARSE && rename(src, dst) < 0) { + if (mode != INDEX && mode != SPARSE && mode != SPARSE_DIRECTORY && + rename(src, dst) < 0) { if (ignore_errors) continue; die_errno(_("renaming '%s' failed"), src); @@ -346,7 +377,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix) 1); } - if (mode == WORKING_DIRECTORY) + if (mode == WORKING_DIRECTORY || mode == SPARSE_DIRECTORY) continue; pos = cache_name_pos(src, strlen(src));
Originally, moving a <source> directory which is not on-disk due to its existence outside of sparse-checkout cone, "giv mv" command errors out with "bad source". Add a helper check_dir_in_index() function to see if a directory name exists in the index. Also add a SPARSE_DIRECTORY bit to mark such directories. Change the checking logic, so that such <source> directory makes "giv mv" command warns with "advise_on_updating_sparse_paths()" instead of "bad source"; also user now can supply a "--sparse" flag so this operation can be carried out successfully. Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> --- Since I'm so new to C language (not an acquaintance until this patch), the "check_dir_in_index()" function I added might not be ideal (in terms of safety and correctness?). I have digging into the APIs provided in the codebase but I haven't found anything to do this very job: find out if a directory is in the index (am I missing something?). Probably because contents are stored in the index as blobs and they all represent regular files. So I came up with this dull solution... builtin/mv.c | 41 ++++++++++++++++++++++++++++++++++++----- 1 file changed, 36 insertions(+), 5 deletions(-)