diff mbox series

[WIP,v1,2/4] mv: add check_dir_in_index() and solve general dir check issue

Message ID 20220331091755.385961-3-shaoxuan.yuan02@gmail.com (mailing list archive)
State New, archived
Headers show
Series mv: fix out-of-cone file/directory move logic | expand

Commit Message

Shaoxuan Yuan March 31, 2022, 9:17 a.m. UTC
Originally, moving a <source> directory which is not on-disk due
to its existence outside of sparse-checkout cone, "giv mv" command
errors out with "bad source".

Add a helper check_dir_in_index() function to see if a directory
name exists in the index. Also add a SPARSE_DIRECTORY bit to mark
such directories.

Change the checking logic, so that such <source> directory makes
"giv mv" command warns with "advise_on_updating_sparse_paths()"
instead of "bad source"; also user now can supply a "--sparse" flag so
this operation can be carried out successfully.

Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
---
Since I'm so new to C language (not an acquaintance until this patch), 
the "check_dir_in_index()" function I added might not be ideal (in terms of 
safety and correctness?). I have digging into the APIs provided in the codebase 
but I haven't found anything to do this very job: find out if a directory is 
in the index (am I missing something?). 
Probably because contents are stored in the index as blobs and 
they all represent regular files. So I came up with this dull solution...

 builtin/mv.c | 41 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 36 insertions(+), 5 deletions(-)

Comments

Ævar Arnfjörð Bjarmason March 31, 2022, 10:25 a.m. UTC | #1
On Thu, Mar 31 2022, Shaoxuan Yuan wrote:

> +static int check_dir_in_index(const char *dir)
> +{
> +	int ret = 0;
> +	int length = sizeof(dir) + 1;
> +	char *substr = malloc(length);
> +
> +	for (int i = 0; i < the_index.cache_nr; i++) {

See https://lore.kernel.org/git/xmqqy20r3rv7.fsf@gitster.g/ for how
we're not quite using this syntax yet.

This should also be "unsigned int" to go with the "cache_nr" member.

> +		memcpy(substr, the_index.cache[i]->name, length);
> +		memset(substr + length - 1, 0, 1);
> +
> +		if (strcmp(dir, substr) == 0) {

Style: don't compare against == 0, or == NULL, use !, see CodingGuidelines.

> +			else if (check_dir_in_index(src_w_slash) &&
> +			!path_in_sparse_checkout(src_w_slash, &the_index)) {

Funny indentation, the ! should be aligned with "(".

> -				modes[i] = WORKING_DIRECTORY;
> +				if (!modes[i])
> +					modes[i] = WORKING_DIRECTORY;

This works, but assuming things about enum values (even if 0) always
seems a bit nasty, can this be a comparison to BOTH instead of !? May or
may not be better...

But then again we do xcalloc() to allocate it, so we assume that
already, nevermind... :)

(there were also indentation issues below)
Victoria Dye March 31, 2022, 9:28 p.m. UTC | #2
Shaoxuan Yuan wrote:
> Originally, moving a <source> directory which is not on-disk due
> to its existence outside of sparse-checkout cone, "giv mv" command
> errors out with "bad source".
> 
> Add a helper check_dir_in_index() function to see if a directory
> name exists in the index. Also add a SPARSE_DIRECTORY bit to mark
> such directories.
> 

Hmm, I think this patch would fit better in your eventual "sparse index
integration" series than this "prerequisite fixes to sparse-checkout"
series. Sparse directories *only* appear when you're using a sparse index
so, theoretically, this shouldn't ever come up (and thus isn't testable)
until you're using a sparse index. 

Since it's here, though, I'm happy to review what you have (even if you
eventually move it to a later series)!

> Change the checking logic, so that such <source> directory makes
> "giv mv" command warns with "advise_on_updating_sparse_paths()"
> instead of "bad source"; also user now can supply a "--sparse" flag so
> this operation can be carried out successfully.
> 
> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
> ---
> Since I'm so new to C language (not an acquaintance until this patch), 
> the "check_dir_in_index()" function I added might not be ideal (in terms of 
> safety and correctness?). I have digging into the APIs provided in the codebase 
> but I haven't found anything to do this very job: find out if a directory is 
> in the index (am I missing something?). 
> Probably because contents are stored in the index as blobs and 
> they all represent regular files. So I came up with this dull solution...
> 
>  builtin/mv.c | 41 ++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 36 insertions(+), 5 deletions(-)
> 
> diff --git a/builtin/mv.c b/builtin/mv.c
> index 32ad4d5682..9da9205e01 100644
> --- a/builtin/mv.c
> +++ b/builtin/mv.c
> @@ -115,6 +115,25 @@ static int index_range_of_same_dir(const char *src, int length,
>  	return last - first;
>  }
>  
> +static int check_dir_in_index(const char *dir)
> +{

This function can be made a lot simpler - you can use `cache_name_pos()` to
find the index entry - if it's found, the directory exists as a sparse
directory. And, because 'add_slash()' enforces the trailing slash later on,
you don't need to worry about adjusting the name before you look for the
entry.

> +	int ret = 0;
> +	int length = sizeof(dir) + 1;
> +	char *substr = malloc(length);
> +
> +	for (int i = 0; i < the_index.cache_nr; i++) {
> +		memcpy(substr, the_index.cache[i]->name, length);
> +		memset(substr + length - 1, 0, 1);
> +
> +		if (strcmp(dir, substr) == 0) {
> +			ret = 1;
> +			return ret;
> +		}
> +	}
> +	free(substr);
> +	return ret;
> +}
> +
>  int cmd_mv(int argc, const char **argv, const char *prefix)
>  {
>  	int i, flags, gitmodules_modified = 0;
> @@ -129,7 +148,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>  		OPT_END(),
>  	};
>  	const char **source, **destination, **dest_path, **submodule_gitfile;
> -	enum update_mode { BOTH = 0, WORKING_DIRECTORY, INDEX, SPARSE } *modes;
> +	enum update_mode { BOTH = 0, WORKING_DIRECTORY, INDEX, SPARSE,
> +	SPARSE_DIRECTORY } *modes;
>  	struct stat st;
>  	struct string_list src_for_dst = STRING_LIST_INIT_NODUP;
>  	struct lock_file lock_file = LOCK_INIT;
> @@ -197,6 +217,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>  			 */
>  
>  			int pos = cache_name_pos(src, length);
> +			const char *src_w_slash = add_slash(src);
> +
>  			if (pos >= 0) {
>  				const struct cache_entry *ce = active_cache[pos];
>  
> @@ -209,6 +231,11 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>  				else
>  					bad = _("bad source");
>  			}
> +			else if (check_dir_in_index(src_w_slash) &&
> +			!path_in_sparse_checkout(src_w_slash, &the_index)) {
> +				modes[i] = SPARSE_DIRECTORY;
> +				goto dir_check;
> +			}

In if-statements like this, you'll want to line up the statements in
parentheses on subsequent lines, like:

	else if (check_dir_in_index(src_w_slash) &&
		 !path_in_sparse_checkout(src_w_slash, &the_index)) {

...where the second line is indented 1 (8-space-sized) tab + 1 space. 

In general, if you're trying to align code (in this repository), align first
with as many tabs as possible, then the "remainder" with spaces. Note that
this isn't 100% consistent throughout the repository - older lines might not
have been aligned properly - but you should go for this styling on any new
lines that you add.

>  			/* only error if existence is expected. */
>  			else if (modes[i] != SPARSE)
>  				bad = _("bad source");
> @@ -219,7 +246,9 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>  				&& lstat(dst, &st) == 0)
>  			bad = _("cannot move directory over file");
>  		else if (src_is_dir) {
> -			int first = cache_name_pos(src, length), last;
> +			int first, last;
> +dir_check:
> +			first = cache_name_pos(src, length);
>  
>  			if (first >= 0)
>  				prepare_move_submodule(src, first,
> @@ -230,7 +259,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>  			else { /* last - first >= 1 */
>  				int j, dst_len, n;
>  
> -				modes[i] = WORKING_DIRECTORY;
> +				if (!modes[i])
> +					modes[i] = WORKING_DIRECTORY;
>  				n = argc + last - first;
>  				REALLOC_ARRAY(source, n);
>  				REALLOC_ARRAY(destination, n);
> @@ -332,7 +362,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>  			printf(_("Renaming %s to %s\n"), src, dst);
>  		if (show_only)
>  			continue;
> -		if (mode != INDEX && mode != SPARSE && rename(src, dst) < 0) {
> +		if (mode != INDEX && mode != SPARSE && mode != SPARSE_DIRECTORY &&
> +		 rename(src, dst) < 0) {
>  			if (ignore_errors)
>  				continue;
>  			die_errno(_("renaming '%s' failed"), src);
> @@ -346,7 +377,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>  							      1);
>  		}
>  
> -		if (mode == WORKING_DIRECTORY)
> +		if (mode == WORKING_DIRECTORY || mode == SPARSE_DIRECTORY)

I'm a bit confused - doesn't this mean the sparse dir move will be skipped?
In your commit description, you mention that this 'mv' succeeds with the
'--sparse' option, but I don't see any place where the sparse directory
would be moved. 

>  			continue;
>  
>  		pos = cache_name_pos(src, strlen(src));
Shaoxuan Yuan April 1, 2022, 3:51 a.m. UTC | #3
On Thu, Mar 31, 2022 at 6:30 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Thu, Mar 31 2022, Shaoxuan Yuan wrote:
>
> > +static int check_dir_in_index(const char *dir)
> > +{
> > +     int ret = 0;
> > +     int length = sizeof(dir) + 1;
> > +     char *substr = malloc(length);
> > +
> > +     for (int i = 0; i < the_index.cache_nr; i++) {
>
> See https://lore.kernel.org/git/xmqqy20r3rv7.fsf@gitster.g/ for how
> we're not quite using this syntax yet.
>
> This should also be "unsigned int" to go with the "cache_nr" member.
>
> > +             memcpy(substr, the_index.cache[i]->name, length);
> > +             memset(substr + length - 1, 0, 1);
> > +
> > +             if (strcmp(dir, substr) == 0) {
>
> Style: don't compare against == 0, or == NULL, use !, see CodingGuidelines.
>
> > +                     else if (check_dir_in_index(src_w_slash) &&
> > +                     !path_in_sparse_checkout(src_w_slash, &the_index)) {
>
> Funny indentation, the ! should be aligned with "(".
>
> > -                             modes[i] = WORKING_DIRECTORY;
> > +                             if (!modes[i])
> > +                                     modes[i] = WORKING_DIRECTORY;
>
> This works, but assuming things about enum values (even if 0) always
> seems a bit nasty, can this be a comparison to BOTH instead of !? May or
> may not be better...
>
> But then again we do xcalloc() to allocate it, so we assume that
> already, nevermind... :)
>
> (there were also indentation issues below)

Thanks for the styling reminders! I should go back and reread CodingGuidelines
more often... Things just slipped off my mind since I'm still trying to remember
the guideline...
Shaoxuan Yuan April 1, 2022, 12:49 p.m. UTC | #4
On Fri, Apr 1, 2022 at 5:28 AM Victoria Dye <vdye@github.com> wrote:
>
> Shaoxuan Yuan wrote:
> > Originally, moving a <source> directory which is not on-disk due
> > to its existence outside of sparse-checkout cone, "giv mv" command
> > errors out with "bad source".
> >
> > Add a helper check_dir_in_index() function to see if a directory
> > name exists in the index. Also add a SPARSE_DIRECTORY bit to mark
> > such directories.
> >
>
> Hmm, I think this patch would fit better in your eventual "sparse index
> integration" series than this "prerequisite fixes to sparse-checkout"
> series. Sparse directories *only* appear when you're using a sparse index
> so, theoretically, this shouldn't ever come up (and thus isn't testable)
> until you're using a sparse index.

After reading your feedback, I realized that I totally misused
the phrase "sparse directory". Clearly, this patch series does not
deal with sparse-
index yet, as "sparse directory" is a cache entry that points to a
tree, if sparse-index
is enabled. Silly me ;)

What I was *actually* trying to say is: I want to change the checking
logic of moving
a "directory that exists outside of sparse-checkout cone", and I
apparently misused
"sparse directory" to reference such a thing.

> Since it's here, though, I'm happy to review what you have (even if you
> eventually move it to a later series)!

Thanks!

> > diff --git a/builtin/mv.c b/builtin/mv.c
> > index 32ad4d5682..9da9205e01 100644
> > --- a/builtin/mv.c
> > +++ b/builtin/mv.c
> > @@ -115,6 +115,25 @@ static int index_range_of_same_dir(const char *src, int length,
> >       return last - first;
> >  }
> >
> > +static int check_dir_in_index(const char *dir)
> > +{
>
> This function can be made a lot simpler - you can use `cache_name_pos()` to
> find the index entry - if it's found, the directory exists as a sparse
> directory. And, because 'add_slash()' enforces the trailing slash later on,
> you don't need to worry about adjusting the name before you look for the
> entry.

Yes, if I correctly used the phrase "sparse directory", but I did not...
I think I can use 'cache_name_pos()' to
check a directory *iff* it is a legit sparse directory when using sparse-index?

In my case, I just want to check a regular directory that is not in
the worktree,
since the cone pattern excludes it. And in a non-sparse index, cache
entry points only
to blobs, not trees, and that's the reason I wrote this weird function
to look into the
index. I understand that sounds not compatible with how git manages
index, but all
I want to know is "does this directory exist in the index?" (this
question is also quasi-correct).

I tried to find an existing API for this job, but I failed to find
any. Though I have a hunch
that there must be something to do it...

> > +     int ret = 0;
> > +     int length = sizeof(dir) + 1;
> > +     char *substr = malloc(length);
> > +
> > +     for (int i = 0; i < the_index.cache_nr; i++) {
> > +             memcpy(substr, the_index.cache[i]->name, length);
> > +             memset(substr + length - 1, 0, 1);
> > +
> > +             if (strcmp(dir, substr) == 0) {
> > +                     ret = 1;
> > +                     return ret;
> > +             }
> > +     }
> > +     free(substr);
> > +     return ret;
> > +}
> > +
> >  int cmd_mv(int argc, const char **argv, const char *prefix)
> >  {
> >       int i, flags, gitmodules_modified = 0;
> > @@ -129,7 +148,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
> >               OPT_END(),
> >       };
> >       const char **source, **destination, **dest_path, **submodule_gitfile;
> > -     enum update_mode { BOTH = 0, WORKING_DIRECTORY, INDEX, SPARSE } *modes;
> > +     enum update_mode { BOTH = 0, WORKING_DIRECTORY, INDEX, SPARSE,
> > +     SPARSE_DIRECTORY } *modes;
> >       struct stat st;
> >       struct string_list src_for_dst = STRING_LIST_INIT_NODUP;
> >       struct lock_file lock_file = LOCK_INIT;
> > @@ -197,6 +217,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
> >                        */
> >
> >                       int pos = cache_name_pos(src, length);
> > +                     const char *src_w_slash = add_slash(src);
> > +
> >                       if (pos >= 0) {
> >                               const struct cache_entry *ce = active_cache[pos];
> >
> > @@ -209,6 +231,11 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
> >                               else
> >                                       bad = _("bad source");
> >                       }
> > +                     else if (check_dir_in_index(src_w_slash) &&
> > +                     !path_in_sparse_checkout(src_w_slash, &the_index)) {
> > +                             modes[i] = SPARSE_DIRECTORY;
> > +                             goto dir_check;
> > +                     }
>
> In if-statements like this, you'll want to line up the statements in
> parentheses on subsequent lines, like:
>
>         else if (check_dir_in_index(src_w_slash) &&
>                  !path_in_sparse_checkout(src_w_slash, &the_index)) {
>
> ...where the second line is indented 1 (8-space-sized) tab + 1 space.
>
> In general, if you're trying to align code (in this repository), align first
> with as many tabs as possible, then the "remainder" with spaces. Note that
> this isn't 100% consistent throughout the repository - older lines might not
> have been aligned properly - but you should go for this styling on any new
> lines that you add.

Will do.

>
> >                       /* only error if existence is expected. */
> >                       else if (modes[i] != SPARSE)
> >                               bad = _("bad source");
> > @@ -219,7 +246,9 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
> >                               && lstat(dst, &st) == 0)
> >                       bad = _("cannot move directory over file");
> >               else if (src_is_dir) {
> > -                     int first = cache_name_pos(src, length), last;
> > +                     int first, last;
> > +dir_check:
> > +                     first = cache_name_pos(src, length);
> >
> >                       if (first >= 0)
> >                               prepare_move_submodule(src, first,
> > @@ -230,7 +259,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
> >                       else { /* last - first >= 1 */
> >                               int j, dst_len, n;
> >
> > -                             modes[i] = WORKING_DIRECTORY;
> > +                             if (!modes[i])
> > +                                     modes[i] = WORKING_DIRECTORY;
> >                               n = argc + last - first;
> >                               REALLOC_ARRAY(source, n);
> >                               REALLOC_ARRAY(destination, n);
> > @@ -332,7 +362,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
> >                       printf(_("Renaming %s to %s\n"), src, dst);
> >               if (show_only)
> >                       continue;
> > -             if (mode != INDEX && mode != SPARSE && rename(src, dst) < 0) {
> > +             if (mode != INDEX && mode != SPARSE && mode != SPARSE_DIRECTORY &&
> > +              rename(src, dst) < 0) {
> >                       if (ignore_errors)
> >                               continue;
> >                       die_errno(_("renaming '%s' failed"), src);
> > @@ -346,7 +377,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
> >                                                             1);
> >               }
> >
> > -             if (mode == WORKING_DIRECTORY)
> > +             if (mode == WORKING_DIRECTORY || mode == SPARSE_DIRECTORY)
>
> I'm a bit confused - doesn't this mean the sparse dir move will be skipped?
> In your commit description, you mention that this 'mv' succeeds with the
> '--sparse' option, but I don't see any place where the sparse directory
> would be moved.

Well, you know the drill, I did not use "sparse directory" correctly, let alone
'SPARSE_DIRECTORY' enum bit in this hunk. I think it makes some sense
if you apply my actual meaning of 'SPARSE_DIRECTORY' here (it should be
something like OUT_OF_CONE_WORKING_DIRECTORY)? Because such
directory is not on disk, it cannot be "rename()"d, and should also skip the
"rename_cache_entry_at()" function. If all the files under the directory are
moved/renamed, then (in my opinion) the directory is both moved to the
destination,
both in the worktree and in the index.
Derrick Stolee April 1, 2022, 2:49 p.m. UTC | #5
On 4/1/2022 8:49 AM, Shaoxuan Yuan wrote:> On Fri, Apr 1, 2022 at 5:28 AM Victoria Dye <vdye@github.com> wrote:
>>
>> Shaoxuan Yuan wrote:
>>> Originally, moving a <source> directory which is not on-disk due
>>> to its existence outside of sparse-checkout cone, "giv mv" command
>>> errors out with "bad source".
>>>
>>> Add a helper check_dir_in_index() function to see if a directory
>>> name exists in the index. Also add a SPARSE_DIRECTORY bit to mark
>>> such directories.
>>>
>>
>> Hmm, I think this patch would fit better in your eventual "sparse index
>> integration" series than this "prerequisite fixes to sparse-checkout"
>> series. Sparse directories *only* appear when you're using a sparse index
>> so, theoretically, this shouldn't ever come up (and thus isn't testable)
>> until you're using a sparse index.
> 
> After reading your feedback, I realized that I totally misused
> the phrase "sparse directory". Clearly, this patch series does not
> deal with sparse-
> index yet, as "sparse directory" is a cache entry that points to a
> tree, if sparse-index
> is enabled. Silly me ;)
> 
> What I was *actually* trying to say is: I want to change the checking
> logic of moving
> a "directory that exists outside of sparse-checkout cone", and I
> apparently misused
> "sparse directory" to reference such a thing.

In the case of a full index (or an expanded sparse index, which is
currently always the case for `git mv`), you detect a sparse directory
by looking for the directory in the index, _not_ finding it, and then
seeing if the cache entry at the position where the directory _would_
exist is marked with the SKIP_WORKTREE bit.

This works in cone mode and the old mode because I assume you've already
checked for the existence of the directory, so if there _was_ any
non-SKIP_WORKTREE cache entry within the directory, then the directory
would exist in the worktree.

(These are good details to include in the commit message.)

>>> +static int check_dir_in_index(const char *dir)
>>> +{
>>
>> This function can be made a lot simpler - you can use `cache_name_pos()` to
>> find the index entry - if it's found, the directory exists as a sparse
>> directory. And, because 'add_slash()' enforces the trailing slash later on,
>> you don't need to worry about adjusting the name before you look for the
>> entry.
> 
> Yes, if I correctly used the phrase "sparse directory", but I did not...
> I think I can use 'cache_name_pos()' to
> check a directory *iff* it is a legit sparse directory when using sparse-index?
> 
> In my case, I just want to check a regular directory that is not in
> the worktree,
> since the cone pattern excludes it. And in a non-sparse index, cache
> entry points only
> to blobs, not trees, and that's the reason I wrote this weird function
> to look into the
> index. I understand that sounds not compatible with how git manages
> index, but all
> I want to know is "does this directory exist in the index?" (this
> question is also quasi-correct).
> 
> I tried to find an existing API for this job, but I failed to find
> any. Though I have a hunch
> that there must be something to do it...

You can still use cache_name_pos() and if the resulting value is negative,
then you can "flip it" (pos = -1 - pos) to get the array index where the
directory _would_ be inserted.

For example, here is a case in unpack-trees.c (that uses the synonym
index_name_pos()):

static int locate_in_src_index(const struct cache_entry *ce,
			       struct unpack_trees_options *o)
{
	struct index_state *index = o->src_index;
	int len = ce_namelen(ce);
	int pos = index_name_pos(index, ce->name, len);
	if (pos < 0)
		pos = -1 - pos;
	return pos;
}

This uses a binary search inside the method, so it will be much faster
than the loop you wrote here.

If you have this helper, then you can integrate with the sparse index later
by checking for a sparse directory entry when pos is non-negative. But that
can wait for the next series.

>>>                       /* only error if existence is expected. */
>>>                       else if (modes[i] != SPARSE)
>>>                               bad = _("bad source");
>>> @@ -219,7 +246,9 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>>>                               && lstat(dst, &st) == 0)
>>>                       bad = _("cannot move directory over file");
>>>               else if (src_is_dir) {
>>> -                     int first = cache_name_pos(src, length), last;
>>> +                     int first, last;
>>> +dir_check:
>>> +                     first = cache_name_pos(src, length);
>>>
>>>                       if (first >= 0)
>>>                               prepare_move_submodule(src, first,
>>> @@ -230,7 +259,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>>>                       else { /* last - first >= 1 */
>>>                               int j, dst_len, n;
>>>
>>> -                             modes[i] = WORKING_DIRECTORY;
>>> +                             if (!modes[i])
>>> +                                     modes[i] = WORKING_DIRECTORY;

This is curious that we could get here with an existing mode. I wonder if
it would be worthwhile to make the enum using a "flags" mode (each state
is a different bit in the word) so instead of

	modes[i] = WORKING_DIRECTORY;

we would write

	modes[i] |= WORKING_DIRECTORY;

>>>                               n = argc + last - first;
>>>                               REALLOC_ARRAY(source, n);
>>>                               REALLOC_ARRAY(destination, n);
>>> @@ -332,7 +362,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>>>                       printf(_("Renaming %s to %s\n"), src, dst);
>>>               if (show_only)
>>>                       continue;
>>> -             if (mode != INDEX && mode != SPARSE && rename(src, dst) < 0) {
>>> +             if (mode != INDEX && mode != SPARSE && mode != SPARSE_DIRECTORY &&

And here we would write something like

	if (!(mode & (INDEX | SPARSE | SPARSE_DIRECTORY)) &&

>>> +              rename(src, dst) < 0) {
>>>                       if (ignore_errors)
>>>                               continue;
>>>                       die_errno(_("renaming '%s' failed"), src);
>>> @@ -346,7 +377,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>>>                                                             1);
>>>               }
>>>
>>> -             if (mode == WORKING_DIRECTORY)
>>> +             if (mode == WORKING_DIRECTORY || mode == SPARSE_DIRECTORY)

and here:

	if (mode & (WORKING_DIRECTORY | SPARSE_DIRECTORY))

This requires changing your enum definition. It got lost in the previous
quoting, it seems, but here it is again:

>>> @@ -129,7 +148,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>>>  		OPT_END(),
>>>  	};
>>>  	const char **source, **destination, **dest_path, **submodule_gitfile;
>>> -	enum update_mode { BOTH = 0, WORKING_DIRECTORY, INDEX, SPARSE } *modes;
>>> +	enum update_mode { BOTH = 0, WORKING_DIRECTORY, INDEX, SPARSE,
>>> +	SPARSE_DIRECTORY } *modes;
>>>  	struct stat st;
>>>  	struct string_list src_for_dst = STRING_LIST_INIT_NODUP;
>>>  	struct lock_file lock_file = LOCK_INIT;

I think it is time to split out "enum update_mode" to the top of the file
instead of defining it inline here. Here is what it could look like:

enum update_mode {
	BOTH = 0,
	WORKING_DIRECTORY = (1 << 1),
	INDEX = (1 << 2),
	SPARSE = (1 << 3),
};

(This is how it would look before adding your new value.)

I can imagine making a new commit that does the following:

* Move update_mode to the top and set the values to be independent bits.
* Change "mode[i] =" to "mode[i] |="
* Change "mode ==" checks to "mode &" checks.

Think about it.

>> I'm a bit confused - doesn't this mean the sparse dir move will be skipped?
>> In your commit description, you mention that this 'mv' succeeds with the
>> '--sparse' option, but I don't see any place where the sparse directory
>> would be moved.
> 
> Well, you know the drill, I did not use "sparse directory" correctly, let alone
> 'SPARSE_DIRECTORY' enum bit in this hunk. I think it makes some sense
> if you apply my actual meaning of 'SPARSE_DIRECTORY' here (it should be
> something like OUT_OF_CONE_WORKING_DIRECTORY)? Because such
> directory is not on disk, it cannot be "rename()"d, and should also skip the
> "rename_cache_entry_at()" function. If all the files under the directory are
> moved/renamed, then (in my opinion) the directory is both moved to the
> destination,
> both in the worktree and in the index.

Perhaps a better name would be SKIP_WORKTREE_DIR.

But yes, we need to make sure that all cache entries under the directory
have their SKIP_WORKTREE bits re-checked and any that lose the bit need to
be written to the worktree.

I wonder if it is as simple as marking a boolean that says "I moved at
least one sparse entry" and then calling update_sparsity() at the end of
cmd_mv() if that boolean is true.

Thanks,
-Stolee
Shaoxuan Yuan April 4, 2022, 7:25 a.m. UTC | #6
On Fri, Apr 1, 2022 at 10:49 PM Derrick Stolee <derrickstolee@github.com> wrote:
>
> On 4/1/2022 8:49 AM, Shaoxuan Yuan wrote:> On Fri, Apr 1, 2022 at 5:28 AM Victoria Dye <vdye@github.com> wrote:
> >>
> >> Shaoxuan Yuan wrote:
> >>> Originally, moving a <source> directory which is not on-disk due
> >>> to its existence outside of sparse-checkout cone, "giv mv" command
> >>> errors out with "bad source".
> >>>
> >>> Add a helper check_dir_in_index() function to see if a directory
> >>> name exists in the index. Also add a SPARSE_DIRECTORY bit to mark
> >>> such directories.
> >>>
> >>
> >> Hmm, I think this patch would fit better in your eventual "sparse index
> >> integration" series than this "prerequisite fixes to sparse-checkout"
> >> series. Sparse directories *only* appear when you're using a sparse index
> >> so, theoretically, this shouldn't ever come up (and thus isn't testable)
> >> until you're using a sparse index.
> >
> > After reading your feedback, I realized that I totally misused
> > the phrase "sparse directory". Clearly, this patch series does not
> > deal with sparse-
> > index yet, as "sparse directory" is a cache entry that points to a
> > tree, if sparse-index
> > is enabled. Silly me ;)
> >
> > What I was *actually* trying to say is: I want to change the checking
> > logic of moving
> > a "directory that exists outside of sparse-checkout cone", and I
> > apparently misused
> > "sparse directory" to reference such a thing.
>
> In the case of a full index (or an expanded sparse index, which is
> currently always the case for `git mv`), you detect a sparse directory
> by looking for the directory in the index, _not_ finding it, and then
> seeing if the cache entry at the position where the directory _would_
> exist is marked with the SKIP_WORKTREE bit.
>
> This works in cone mode and the old mode because I assume you've already
> checked for the existence of the directory, so if there _was_ any
> non-SKIP_WORKTREE cache entry within the directory, then the directory
> would exist in the worktree.
>
> (These are good details to include in the commit message.)

I read and think about this part a few times, but I'm still confused.

As Victoria pointed out earlier, and I quote, "Sparse directories *only* appear
when you're using a sparse index, so, theoretically, this shouldn't ever
come up (and thus isn't testable) until you're using a sparse index."
So I'm not so sure what do you mean by putting "full index" and "sparse
directory" together.

Thus, I go ahead and try to detect a directory that is outside of
sparse-checkout cone, without sparse-index enabled.

I found a problem that if you use cache_name_pos() to do this
detection, I imagined the following example (I'm trying to imitate an
output of "git ls-files -t"):

H a
H b
S d/file1
H e/file1

So in this index, I use cache_name_pos() to find a directory "c/". I imagine the
the value returned would be -3, which indicates this directory would be inserted
at index position 2. However, the cache entry at position 2 is
"d/file1", which is
marked with SKIP_WORKTREE, and this fact cannot guarantee that "c/" is
a sparse directory, since ''c/" is not in the index per se.

Probably I'm missing something, or I'm just dumb.

> Thanks,
> -Stolee
Shaoxuan Yuan April 4, 2022, 7:49 a.m. UTC | #7
On Mon, Apr 4, 2022 at 3:25 PM Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> wrote:
> I read and think about this part a few times, but I'm still confused.
>
> As Victoria pointed out earlier, and I quote, "Sparse directories *only* appear
> when you're using a sparse index, so, theoretically, this shouldn't ever
> come up (and thus isn't testable) until you're using a sparse index."
> So I'm not so sure what do you mean by putting "full index" and "sparse
> directory" together.
>
> Thus, I go ahead and try to detect a directory that is outside of
> sparse-checkout cone, without sparse-index enabled.
>
> I found a problem that if you use cache_name_pos() to do this
> detection, I imagined the following example (I'm trying to imitate an
> output of "git ls-files -t"):
>
> H a
> H b
> S d/file1
> H e/file1
>
> So in this index, I use cache_name_pos() to find a directory "c/". I imagine the
> the value returned would be -3, which indicates this directory would be inserted
> at index position 2. However, the cache entry at position 2 is
> "d/file1", which is
> marked with SKIP_WORKTREE, and this fact cannot guarantee that "c/" is
> a sparse directory, since ''c/" is not in the index per se.
>
> Probably I'm missing something, or I'm just dumb.

Though I think doing a strncmp() after the cache_name_pos()
can get the job done :)
Derrick Stolee April 4, 2022, 12:43 p.m. UTC | #8
On 4/4/2022 3:49 AM, Shaoxuan Yuan wrote:
> On Mon, Apr 4, 2022 at 3:25 PM Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> wrote:
>> I read and think about this part a few times, but I'm still confused.
>>
>> As Victoria pointed out earlier, and I quote, "Sparse directories *only* appear
>> when you're using a sparse index, so, theoretically, this shouldn't ever
>> come up (and thus isn't testable) until you're using a sparse index."
>> So I'm not so sure what do you mean by putting "full index" and "sparse
>> directory" together.
>>
>> Thus, I go ahead and try to detect a directory that is outside of
>> sparse-checkout cone, without sparse-index enabled.
>>
>> I found a problem that if you use cache_name_pos() to do this
>> detection, I imagined the following example (I'm trying to imitate an
>> output of "git ls-files -t"):
>>
>> H a
>> H b
>> S d/file1
>> H e/file1
>>
>> So in this index, I use cache_name_pos() to find a directory "c/". I imagine the
>> the value returned would be -3, which indicates this directory would be inserted
>> at index position 2. However, the cache entry at position 2 is
>> "d/file1", which is
>> marked with SKIP_WORKTREE, and this fact cannot guarantee that "c/" is
>> a sparse directory, since ''c/" is not in the index per se.

I was thinking more about the case where you find a tracked directory that
has all contained files marked with SKIP_WORTKREE.

If you search for "d/" you will also get -3. Then, you will see that at
position 2 the cache entry d/file1 has SKIP_WORKTREE. The directory "d/"
should exist on disk if there are any entries starting with "d/" that do
not have the SKIP_WORKTREE bit.

Interesting things happen if you are in the scenario where d/f is in the
cone-mode sparse-checkout definition and you see this list of cache entries:

S d/a
H d/f/b
S d/g

Again, d/f/b should imply the existence of the directory d/ in the worktree.
>> Probably I'm missing something, or I'm just dumb.
> 
> Though I think doing a strncmp() after the cache_name_pos()
> can get the job done :)
 
I think this is the way to do it, including index_range_of_same_dir() in
builtin/mv.c.

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/builtin/mv.c b/builtin/mv.c
index 32ad4d5682..9da9205e01 100644
--- a/builtin/mv.c
+++ b/builtin/mv.c
@@ -115,6 +115,25 @@  static int index_range_of_same_dir(const char *src, int length,
 	return last - first;
 }
 
+static int check_dir_in_index(const char *dir)
+{
+	int ret = 0;
+	int length = sizeof(dir) + 1;
+	char *substr = malloc(length);
+
+	for (int i = 0; i < the_index.cache_nr; i++) {
+		memcpy(substr, the_index.cache[i]->name, length);
+		memset(substr + length - 1, 0, 1);
+
+		if (strcmp(dir, substr) == 0) {
+			ret = 1;
+			return ret;
+		}
+	}
+	free(substr);
+	return ret;
+}
+
 int cmd_mv(int argc, const char **argv, const char *prefix)
 {
 	int i, flags, gitmodules_modified = 0;
@@ -129,7 +148,8 @@  int cmd_mv(int argc, const char **argv, const char *prefix)
 		OPT_END(),
 	};
 	const char **source, **destination, **dest_path, **submodule_gitfile;
-	enum update_mode { BOTH = 0, WORKING_DIRECTORY, INDEX, SPARSE } *modes;
+	enum update_mode { BOTH = 0, WORKING_DIRECTORY, INDEX, SPARSE,
+	SPARSE_DIRECTORY } *modes;
 	struct stat st;
 	struct string_list src_for_dst = STRING_LIST_INIT_NODUP;
 	struct lock_file lock_file = LOCK_INIT;
@@ -197,6 +217,8 @@  int cmd_mv(int argc, const char **argv, const char *prefix)
 			 */
 
 			int pos = cache_name_pos(src, length);
+			const char *src_w_slash = add_slash(src);
+
 			if (pos >= 0) {
 				const struct cache_entry *ce = active_cache[pos];
 
@@ -209,6 +231,11 @@  int cmd_mv(int argc, const char **argv, const char *prefix)
 				else
 					bad = _("bad source");
 			}
+			else if (check_dir_in_index(src_w_slash) &&
+			!path_in_sparse_checkout(src_w_slash, &the_index)) {
+				modes[i] = SPARSE_DIRECTORY;
+				goto dir_check;
+			}
 			/* only error if existence is expected. */
 			else if (modes[i] != SPARSE)
 				bad = _("bad source");
@@ -219,7 +246,9 @@  int cmd_mv(int argc, const char **argv, const char *prefix)
 				&& lstat(dst, &st) == 0)
 			bad = _("cannot move directory over file");
 		else if (src_is_dir) {
-			int first = cache_name_pos(src, length), last;
+			int first, last;
+dir_check:
+			first = cache_name_pos(src, length);
 
 			if (first >= 0)
 				prepare_move_submodule(src, first,
@@ -230,7 +259,8 @@  int cmd_mv(int argc, const char **argv, const char *prefix)
 			else { /* last - first >= 1 */
 				int j, dst_len, n;
 
-				modes[i] = WORKING_DIRECTORY;
+				if (!modes[i])
+					modes[i] = WORKING_DIRECTORY;
 				n = argc + last - first;
 				REALLOC_ARRAY(source, n);
 				REALLOC_ARRAY(destination, n);
@@ -332,7 +362,8 @@  int cmd_mv(int argc, const char **argv, const char *prefix)
 			printf(_("Renaming %s to %s\n"), src, dst);
 		if (show_only)
 			continue;
-		if (mode != INDEX && mode != SPARSE && rename(src, dst) < 0) {
+		if (mode != INDEX && mode != SPARSE && mode != SPARSE_DIRECTORY &&
+		 rename(src, dst) < 0) {
 			if (ignore_errors)
 				continue;
 			die_errno(_("renaming '%s' failed"), src);
@@ -346,7 +377,7 @@  int cmd_mv(int argc, const char **argv, const char *prefix)
 							      1);
 		}
 
-		if (mode == WORKING_DIRECTORY)
+		if (mode == WORKING_DIRECTORY || mode == SPARSE_DIRECTORY)
 			continue;
 
 		pos = cache_name_pos(src, strlen(src));