diff mbox series

[WIP,v2,4/5] mv: add check_dir_in_index() and solve general dir check issue

Message ID 20220527100804.209890-5-shaoxuan.yuan02@gmail.com (mailing list archive)
State Superseded
Headers show
Series mv: fix out-of-cone file/directory move logic | expand

Commit Message

Shaoxuan Yuan May 27, 2022, 10:08 a.m. UTC
Originally, moving a <source> directory which is not on-disk due
to its existence outside of sparse-checkout cone, "giv mv" command
errors out with "bad source".

Add a helper check_dir_in_index() function to see if a directory
name exists in the index. Also add a SKIP_WORKTREE_DIR bit to mark
such directories.

Change the checking logic, so that such <source> directory makes
"giv mv" command warns with "advise_on_updating_sparse_paths()"
instead of "bad source"; also user now can supply a "--sparse" flag so
this operation can be carried out successfully.

Also, as suggested by Derrick [1],
move the in-line definition of "enum update_mode" to the top
of the file and make it use "flags" mode (each state is a different
bit in the word).

[1] https://lore.kernel.org/git/22aadea2-9330-aa9e-7b6a-834585189144@github.com/

Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
---
 builtin/mv.c                  | 60 ++++++++++++++++++++++++++++++++---
 t/t7002-mv-sparse-checkout.sh |  4 +--
 2 files changed, 57 insertions(+), 7 deletions(-)

Comments

Derrick Stolee May 27, 2022, 3:27 p.m. UTC | #1
On 5/27/2022 6:08 AM, Shaoxuan Yuan wrote:
> +/*
> + * Check if an out-of-cone directory should be in the index. Imagine this case
> + * that all the files under a directory are marked with 'CE_SKIP_WORKTREE' bit
> + * and thus the directory is sparsified.
> + *
> + * Return 0 if such directory exist (i.e. with any of its contained files not
> + * marked with CE_SKIP_WORKTREE, the directory would be present in working tree).
> + * Return 1 otherwise.
> + */
> +static int check_dir_in_index(const char *name, int namelen)
> +{
> +	int ret = 1;
> +	const char *with_slash = add_slash(name);
> +	int length = namelen + 1;
> +
> +	int pos = cache_name_pos(with_slash, length);
> +	const struct cache_entry *ce;
> +
> +	if (pos < 0) {
> +		pos = -pos - 1;
> +		if (pos >= the_index.cache_nr)
> +			return ret;
> +		ce = active_cache[pos];
> +		if (strncmp(with_slash, ce->name, length))
> +			return ret;
> +		if (ce_skip_worktree(ce))
> +			return ret = 0;

This appears to check if the _first_ entry under the directory
is sparse, but not if _all_ entries are sparse. These are not
the same thing, even in cone-mode sparse-checkout. The t1092
test directory has files like "folder1/0/0/a" but if
"folder1/1" is in the sparse-checkout cone, then that first
entry has the skip-worktree bit, but "folder1/1/a" and "folder1/a"
do not.

> +	}
> +	return ret;

At the moment, it doesn't seem like we need 'ret' since the
only place you set it is in "return ret = 0;" (which could
just be "return 0;" while the others are "return 1;"). But,
perhaps you intended to create a loop over 'pos' while
with_slash is a prefix of the cache entry?

> +			else if (!check_dir_in_index(src, length) &&
> +					 !path_in_sparse_checkout(src_w_slash, &the_index)) {

style-nit: You'll want to align the different parts of your
logical statement to agree with the end of the "else if (",

	else if (A &&
		 B) {


> +				modes[i] = SKIP_WORKTREE_DIR;

If we are moving to a flags-based model, should we convert all
"modes[i] =" to "modes[i] |=" as a first step (before adding the
SKIP_WORTKREE_DIR flag)?

> +				goto dir_check;

Hm. While I did recommend using 'goto' to jump to a common end
place in the loop body, I'm not sure about jumping into another
else-if statement. This might be a good time to extract the
code from "else if (src_is_dir)" below into a helper method that
can be used in both places.

> +			}
>  			/* only error if existence is expected. */
>  			else if (modes[i] != SPARSE)
>  				bad = _("bad source");
> @@ -218,7 +264,9 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>  				&& lstat(dst, &st) == 0)
>  			bad = _("cannot move directory over file");
>  		else if (src_is_dir) {
> -			int first = cache_name_pos(src, length), last;
> +			int first, last;
> +dir_check:
> +			first = cache_name_pos(src, length);
>  
>  			if (first >= 0)
>  				prepare_move_submodule(src, first,
> @@ -229,7 +277,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>  			else { /* last - first >= 1 */
>  				int j, dst_len, n;
>  
> -				modes[i] = WORKING_DIRECTORY;
> +				if (!modes[i])
> +					modes[i] |= WORKING_DIRECTORY;

This appears to only add the WORKING_DIRECTORY flag if modes[i] is
already zero. This maybe implies that we wouldn't understand
"WORKING_DIRECTORY | SKIP_WORKTREE_DIR" as a value.

>  				n = argc + last - first;
>  				REALLOC_ARRAY(source, n);
>  				REALLOC_ARRAY(destination, n);
> @@ -331,7 +380,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>  			printf(_("Renaming %s to %s\n"), src, dst);
>  		if (show_only)
>  			continue;
> -		if (mode != INDEX && mode != SPARSE && rename(src, dst) < 0) {
> +		if (!(mode & (INDEX | SPARSE | SKIP_WORKTREE_DIR)) &&
> +		 	rename(src, dst) < 0) {

style-nit: align your logical statements.

>  			if (ignore_errors)
>  				continue;
>  			die_errno(_("renaming '%s' failed"), src);
> @@ -345,7 +395,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>  							      1);
>  		}
>  
> -		if (mode == WORKING_DIRECTORY)
> +		if (mode & (WORKING_DIRECTORY | SKIP_WORKTREE_DIR))
>  			continue;

Ok, here you check if _either_ mode is enabled, which is good. Maybe
you don't need the "if (!mode[i])" part above.

Thanks,
-Stolee
Shaoxuan Yuan May 31, 2022, 9:56 a.m. UTC | #2
On Fri, May 27, 2022 at 11:27 PM Derrick Stolee
<derrickstolee@github.com> wrote:
>
> On 5/27/2022 6:08 AM, Shaoxuan Yuan wrote:
> > +/*
> > + * Check if an out-of-cone directory should be in the index. Imagine this case
> > + * that all the files under a directory are marked with 'CE_SKIP_WORKTREE' bit
> > + * and thus the directory is sparsified.
> > + *
> > + * Return 0 if such directory exist (i.e. with any of its contained files not
> > + * marked with CE_SKIP_WORKTREE, the directory would be present in working tree).
> > + * Return 1 otherwise.
> > + */
> > +static int check_dir_in_index(const char *name, int namelen)
> > +{
> > +     int ret = 1;
> > +     const char *with_slash = add_slash(name);
> > +     int length = namelen + 1;
> > +
> > +     int pos = cache_name_pos(with_slash, length);
> > +     const struct cache_entry *ce;
> > +
> > +     if (pos < 0) {
> > +             pos = -pos - 1;
> > +             if (pos >= the_index.cache_nr)
> > +                     return ret;
> > +             ce = active_cache[pos];
> > +             if (strncmp(with_slash, ce->name, length))
> > +                     return ret;
> > +             if (ce_skip_worktree(ce))
> > +                     return ret = 0;
>
> This appears to check if the _first_ entry under the directory
> is sparse, but not if _all_ entries are sparse. These are not
> the same thing, even in cone-mode sparse-checkout. The t1092
> test directory has files like "folder1/0/0/a" but if
> "folder1/1" is in the sparse-checkout cone, then that first
> entry has the skip-worktree bit, but "folder1/1/a" and "folder1/a"
> do not.

Yes, it is checking the first entry and this would not work without the
lstat in the front. But I think the "lstat < 0" makes sure that this directory
cannot be partially sparsified.

It is either missing both in the worktree and index, or missing in the worktree
but present in index (with all its content sparsified). And because of that,
I think only the first entry needs to be checked.

> > +     }
> > +     return ret;
>
> At the moment, it doesn't seem like we need 'ret' since the
> only place you set it is in "return ret = 0;" (which could
> just be "return 0;" while the others are "return 1;"). But,
> perhaps you intended to create a loop over 'pos' while
> with_slash is a prefix of the cache entry?

I agree that this variable is redundant. But I fail to understand
the logical relation between before "But," and after "But,". Please
elaborate on that?

> > +                     else if (!check_dir_in_index(src, length) &&
> > +                                      !path_in_sparse_checkout(src_w_slash, &the_index)) {
>
> style-nit: You'll want to align the different parts of your
> logical statement to agree with the end of the "else if (",
>
>         else if (A &&
>                  B) {
>

This one is interesting because it appears just alright in my VSCode editor.
Later I found that it is because git-diff is using a tab size of 8 or something,
but my VSCode uses tab size of 4. After I configured the git-diff tab rendering
size, it looks alright. Same for another style nit down below.

> > +                             modes[i] = SKIP_WORKTREE_DIR;
>
> If we are moving to a flags-based model, should we convert all
> "modes[i] =" to "modes[i] |=" as a first step (before adding the
> SKIP_WORTKREE_DIR flag)?
>
> > +                             goto dir_check;
>
> Hm. While I did recommend using 'goto' to jump to a common end
> place in the loop body, I'm not sure about jumping into another
> else-if statement. This might be a good time to extract the
> code from "else if (src_is_dir)" below into a helper method that
> can be used in both places.

Right, this is suspicious. I wasn't familiar at all with C/C++, and being able
to do this inter-if-else jump also startled me.
I agree that it should be something more legitimate, like extracting a
method for it.

> > +                     }
> >                       /* only error if existence is expected. */
> >                       else if (modes[i] != SPARSE)
> >                               bad = _("bad source");
> > @@ -218,7 +264,9 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
> >                               && lstat(dst, &st) == 0)
> >                       bad = _("cannot move directory over file");
> >               else if (src_is_dir) {
> > -                     int first = cache_name_pos(src, length), last;
> > +                     int first, last;
> > +dir_check:
> > +                     first = cache_name_pos(src, length);
> >
> >                       if (first >= 0)
> >                               prepare_move_submodule(src, first,
> > @@ -229,7 +277,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
> >                       else { /* last - first >= 1 */
> >                               int j, dst_len, n;
> >
> > -                             modes[i] = WORKING_DIRECTORY;
> > +                             if (!modes[i])
> > +                                     modes[i] |= WORKING_DIRECTORY;
>
> This appears to only add the WORKING_DIRECTORY flag if modes[i] is
> already zero. This maybe implies that we wouldn't understand
> "WORKING_DIRECTORY | SKIP_WORKTREE_DIR" as a value.

At this point, I cannot think of the reason for writing it this way. And yes,
this does not make sense...

> >                               n = argc + last - first;
> >                               REALLOC_ARRAY(source, n);
> >                               REALLOC_ARRAY(destination, n);
> > @@ -331,7 +380,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
> >                       printf(_("Renaming %s to %s\n"), src, dst);
> >               if (show_only)
> >                       continue;
> > -             if (mode != INDEX && mode != SPARSE && rename(src, dst) < 0) {
> > +             if (!(mode & (INDEX | SPARSE | SKIP_WORKTREE_DIR)) &&
> > +                     rename(src, dst) < 0) {
>
> style-nit: align your logical statements.
>
> >                       if (ignore_errors)
> >                               continue;
> >                       die_errno(_("renaming '%s' failed"), src);
> > @@ -345,7 +395,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
> >                                                             1);
> >               }
> >
> > -             if (mode == WORKING_DIRECTORY)
> > +             if (mode & (WORKING_DIRECTORY | SKIP_WORKTREE_DIR))
> >                       continue;
>
> Ok, here you check if _either_ mode is enabled, which is good. Maybe
> you don't need the "if (!mode[i])" part above.
>
> Thanks,
> -Stolee
Derrick Stolee May 31, 2022, 3:49 p.m. UTC | #3
On 5/31/2022 5:56 AM, Shaoxuan Yuan wrote:
> On Fri, May 27, 2022 at 11:27 PM Derrick Stolee
> <derrickstolee@github.com> wrote:
>>
>> On 5/27/2022 6:08 AM, Shaoxuan Yuan wrote:
...
>> This appears to check if the _first_ entry under the directory
>> is sparse, but not if _all_ entries are sparse. These are not
>> the same thing, even in cone-mode sparse-checkout. The t1092
>> test directory has files like "folder1/0/0/a" but if
>> "folder1/1" is in the sparse-checkout cone, then that first
>> entry has the skip-worktree bit, but "folder1/1/a" and "folder1/a"
>> do not.
> 
> Yes, it is checking the first entry and this would not work without the
> lstat in the front. But I think the "lstat < 0" makes sure that this directory
> cannot be partially sparsified.
> 
> It is either missing both in the worktree and index, or missing in the worktree
> but present in index (with all its content sparsified). And because of that,
> I think only the first entry needs to be checked.

Ah! Good thinking. I hadn't considered that extra detail, so
we get to save some cycles here.

>>> +     }
>>> +     return ret;
>>
>> At the moment, it doesn't seem like we need 'ret' since the
>> only place you set it is in "return ret = 0;" (which could
>> just be "return 0;" while the others are "return 1;"). But,
>> perhaps you intended to create a loop over 'pos' while
>> with_slash is a prefix of the cache entry?
> 
> I agree that this variable is redundant. But I fail to understand
> the logical relation between before "But," and after "But,". Please
> elaborate on that?

I was just thinking that if you intended to write a loop as
I had suggested, then 'ret' could be modified or used in more
places. Feel free to ignore since we resolved that.

>>> +                     else if (!check_dir_in_index(src, length) &&
>>> +                                      !path_in_sparse_checkout(src_w_slash, &the_index)) {
>>
>> style-nit: You'll want to align the different parts of your
>> logical statement to agree with the end of the "else if (",
>>
>>         else if (A &&
>>                  B) {
>>
> 
> This one is interesting because it appears just alright in my VSCode editor.
> Later I found that it is because git-diff is using a tab size of 8 or something,
> but my VSCode uses tab size of 4. After I configured the git-diff tab rendering
> size, it looks alright. Same for another style nit down below.

That'll do it. You can double-check the alignment in your GGG
PR, which should use the correct tab width.

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/builtin/mv.c b/builtin/mv.c
index 62284e3f86..e64f251a69 100644
--- a/builtin/mv.c
+++ b/builtin/mv.c
@@ -19,6 +19,14 @@  static const char * const builtin_mv_usage[] = {
 	NULL
 };
 
+enum update_mode {
+	BOTH = 0,
+	WORKING_DIRECTORY = (1 << 1),
+	INDEX = (1 << 2),
+	SPARSE = (1 << 3),
+	SKIP_WORKTREE_DIR = (1 << 4),
+};
+
 #define DUP_BASENAME 1
 #define KEEP_TRAILING_SLASH 2
 
@@ -115,6 +123,37 @@  static int index_range_of_same_dir(const char *src, int length,
 	return last - first;
 }
 
+/*
+ * Check if an out-of-cone directory should be in the index. Imagine this case
+ * that all the files under a directory are marked with 'CE_SKIP_WORKTREE' bit
+ * and thus the directory is sparsified.
+ *
+ * Return 0 if such directory exist (i.e. with any of its contained files not
+ * marked with CE_SKIP_WORKTREE, the directory would be present in working tree).
+ * Return 1 otherwise.
+ */
+static int check_dir_in_index(const char *name, int namelen)
+{
+	int ret = 1;
+	const char *with_slash = add_slash(name);
+	int length = namelen + 1;
+
+	int pos = cache_name_pos(with_slash, length);
+	const struct cache_entry *ce;
+
+	if (pos < 0) {
+		pos = -pos - 1;
+		if (pos >= the_index.cache_nr)
+			return ret;
+		ce = active_cache[pos];
+		if (strncmp(with_slash, ce->name, length))
+			return ret;
+		if (ce_skip_worktree(ce))
+			return ret = 0;
+	}
+	return ret;
+}
+
 int cmd_mv(int argc, const char **argv, const char *prefix)
 {
 	int i, flags, gitmodules_modified = 0;
@@ -129,7 +168,7 @@  int cmd_mv(int argc, const char **argv, const char *prefix)
 		OPT_END(),
 	};
 	const char **source, **destination, **dest_path, **submodule_gitfile;
-	enum update_mode { BOTH = 0, WORKING_DIRECTORY, INDEX, SPARSE } *modes;
+	enum update_mode *modes;
 	struct stat st;
 	struct string_list src_for_dst = STRING_LIST_INIT_NODUP;
 	struct lock_file lock_file = LOCK_INIT;
@@ -187,6 +226,8 @@  int cmd_mv(int argc, const char **argv, const char *prefix)
 		if (lstat(src, &st) < 0) {
 
 			int pos = cache_name_pos(src, length);
+			const char *src_w_slash = add_slash(src);
+
 			if (pos >= 0) {
 				const struct cache_entry *ce = active_cache[pos];
 
@@ -208,6 +249,11 @@  int cmd_mv(int argc, const char **argv, const char *prefix)
 				else
 					bad = _("bad source");
 			}
+			else if (!check_dir_in_index(src, length) &&
+					 !path_in_sparse_checkout(src_w_slash, &the_index)) {
+				modes[i] = SKIP_WORKTREE_DIR;
+				goto dir_check;
+			}
 			/* only error if existence is expected. */
 			else if (modes[i] != SPARSE)
 				bad = _("bad source");
@@ -218,7 +264,9 @@  int cmd_mv(int argc, const char **argv, const char *prefix)
 				&& lstat(dst, &st) == 0)
 			bad = _("cannot move directory over file");
 		else if (src_is_dir) {
-			int first = cache_name_pos(src, length), last;
+			int first, last;
+dir_check:
+			first = cache_name_pos(src, length);
 
 			if (first >= 0)
 				prepare_move_submodule(src, first,
@@ -229,7 +277,8 @@  int cmd_mv(int argc, const char **argv, const char *prefix)
 			else { /* last - first >= 1 */
 				int j, dst_len, n;
 
-				modes[i] = WORKING_DIRECTORY;
+				if (!modes[i])
+					modes[i] |= WORKING_DIRECTORY;
 				n = argc + last - first;
 				REALLOC_ARRAY(source, n);
 				REALLOC_ARRAY(destination, n);
@@ -331,7 +380,8 @@  int cmd_mv(int argc, const char **argv, const char *prefix)
 			printf(_("Renaming %s to %s\n"), src, dst);
 		if (show_only)
 			continue;
-		if (mode != INDEX && mode != SPARSE && rename(src, dst) < 0) {
+		if (!(mode & (INDEX | SPARSE | SKIP_WORKTREE_DIR)) &&
+		 	rename(src, dst) < 0) {
 			if (ignore_errors)
 				continue;
 			die_errno(_("renaming '%s' failed"), src);
@@ -345,7 +395,7 @@  int cmd_mv(int argc, const char **argv, const char *prefix)
 							      1);
 		}
 
-		if (mode == WORKING_DIRECTORY)
+		if (mode & (WORKING_DIRECTORY | SKIP_WORKTREE_DIR))
 			continue;
 
 		pos = cache_name_pos(src, strlen(src));
diff --git a/t/t7002-mv-sparse-checkout.sh b/t/t7002-mv-sparse-checkout.sh
index 2c9008573a..cf2f5dc46f 100755
--- a/t/t7002-mv-sparse-checkout.sh
+++ b/t/t7002-mv-sparse-checkout.sh
@@ -206,7 +206,7 @@  test_expect_success 'refuse to move file to non-skip-worktree sparse path' '
 	test_cmp expect stderr
 '
 
-test_expect_failure 'refuse to move out-of-cone directory without --sparse' '
+test_expect_success 'refuse to move out-of-cone directory without --sparse' '
 	git sparse-checkout disable &&
 	git reset --hard &&
 	mkdir folder1 &&
@@ -222,7 +222,7 @@  test_expect_failure 'refuse to move out-of-cone directory without --sparse' '
 	test_cmp expect stderr
 '
 
-test_expect_failure 'can move out-of-cone directory with --sparse' '
+test_expect_success 'can move out-of-cone directory with --sparse' '
 	git sparse-checkout disable &&
 	git reset --hard &&
 	mkdir folder1 &&