Message ID | 80b5ba61861193daf7132aa64b65fc7dde90dacb.1634866698.git.matheus.bernardino@usp.br (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] add|rm|mv: fix bug that prevent the update of non-sparse | expand |
On Thu, Oct 21, 2021 at 11:28 PM Matheus Tavares <matheus.bernardino@usp.br> wrote: > > diff --git a/dir.c b/dir.c > index a4306ab874..225487a59c 100644 > --- a/dir.c > +++ b/dir.c > @@ -1516,11 +1517,31 @@ static int path_in_sparse_checkout_1(const char *path, > !istate->sparse_checkout_patterns->use_cone_patterns)) > return 1; > > - base = strrchr(path, '/'); > - return path_matches_pattern_list(path, strlen(path), base ? base + 1 : path, > - &dtype, > - istate->sparse_checkout_patterns, > - istate) > 0; > + if (istate->sparse_checkout_patterns->use_cone_patterns) { > + const char *base = strrchr(path, '/'); > + return path_matches_pattern_list(path, strlen(path), > + base ? base + 1 : path, &dtype, > + istate->sparse_checkout_patterns, istate) > 0; > + } > + > + for (p = path; ; p++) { > + enum pattern_match_result match; > + > + if (*p && *p != '/') > + continue; > + > + match = path_matches_pattern_list(path, p - path, > + last_slash ? last_slash + 1 : path, &dtype, > + istate->sparse_checkout_patterns, istate); > + > + if (match != UNDECIDED) > + ret = match; > + if (!*p) > + break; > + last_slash = p; > + } > + > + return ret; > } Of course, after hitting send I realized it would make a lot more sense to start the pattern matching from the full path and only go backwards through the parent dirs until we find the first non-UNDECIDED result. I.e. something like this: static int path_in_sparse_checkout_1(const char *path, struct index_state *istate, int require_cone_mode) { int dtype = DT_REG; enum pattern_match_result ret; const char *p, *base; /* * We default to accepting a path if there are no patterns or * they are of the wrong type. */ if (init_sparse_checkout_patterns(istate) || (require_cone_mode && !istate->sparse_checkout_patterns->use_cone_patterns)) return 1; if (istate->sparse_checkout_patterns->use_cone_patterns) { base = strrchr(path, '/'); return path_matches_pattern_list(path, strlen(path), base ? base + 1 : path, &dtype, istate->sparse_checkout_patterns, istate) > 0; } /* * If the match for the path is UNDECIDED, try to match the parent dir * recursively. */ for (p = path + strlen(path); p && p > path; p = base) { base = memrchr(path, '/', p - path); ret = path_matches_pattern_list(path, p - path, base ? base + 1 : path, &dtype, istate->sparse_checkout_patterns, istate); if (ret != UNDECIDED) break; } return ret == UNDECIDED ? NOT_MATCHED : ret; } But I will let others comment on the overall idea and/or other alternatives before sending a possible v2.
On 10/21/2021 10:28 PM, Matheus Tavares wrote: > On Mon, Oct 18, 2021 at 6:28 PM Sean Christopherson <seanjc@google.com> wrote: >> >> $ cat .git/info/sparse-checkout >> !arch/* >> !tools/arch/* >> !virt/kvm/arm/* >> /* >> arch/.gitignore >> arch/Kconfig >> arch/x86 >> tools/arch/x86 >> tools/include/uapi/linux/kvm.h >> !Documentation >> !drivers >> >> $ git read-tree -mu HEAD >> >> $ rm arch/x86/kvm/x86.c > [...] >> $ git add arch/x86 >> The following paths and/or pathspecs matched paths that exist >> outside of your sparse-checkout definition, so will not be >> updated in the index: >> arch/x86 > > I think the problem may be that we are performing pattern matching > slightly different in add, mv, and rm, in comparison to "git > sparse-checkout". On "git sparse-checkout init" (or reapply), we call > clear_ce_flags() which calls path_matches_pattern_list() for each > component of the working tree paths. If the full path gives a match > result of UNDECIDED, we recursively try to use the match result from > the parent dir (or NOT_MATCHED if we reach the top with UNDECIDED). Yes! I think this is absolutely the problem. Thanks for pointing this out! > In Sean's example, we get UNDECIDED for "arch/x86/kvm/x86.c", but > "arch/x86" gives MATCHED, so we end up using that for the full path. > > However, in add|mv|rm we only call path_matches_pattern_list() for the > full path and get UNDECIDED, which we consider the same as NOT_MATCHED, > and end up disallowing the path update operation with a warning message. > > The commands do work if we replace the sparsity pattern "arch/x86" with > "arch/x86/" (with a trailing slash), but note that it only works > because the pattern is relative to the root (see dir.c:1297). If we > change it to "x86/", it would no longer work. > > So far, the only way I could think of to fix this would be to perform > pattern matching for the leading components of the paths too. That > doesn't seem very nice, though, as it can probably be quite expensive... > But here is a patch for discussion: I agree that it is expensive, but that's already the case for the non-cone sparse-checkout patterns. Hopefully it is sufficient that these cases are restricted to modified files (in the case of `git add .`) or specific pathspecs (in the case of `git mv` and `git rm`). > -- >8 -- > Subject: [RFC PATCH] add|rm|mv: fix bug that prevent the update of non-sparse dirs > > These three commands recently learned to avoid updating paths that do > not match the sparse-checkout patterns even if they are missing the > SKIP_WORKTREE bit. This is done using path_in_sparse_checkout(), which > tries to match the path with the current set of sparsity rules using > path_matches_pattern_list(). This is similar to what clear_ce_flags() > does when we run "git sparse-checkout init" or "git sparse-checkout > reapply". But note that clear_ce_flags() has a recursive behavior, > calling path_matches_pattern_list() for each component in a path, > whereas path_in_sparse_checkout() only calls it for the full path. This > makes the function miss matches such as the one between path "a/b/c" and > the pattern "b/". So if the user has the sparsity rules "!/a" and "b/", > for example, add, rm, and mv will fail to update the path "a/b/c" and > end up displaying a warning about "a/b/c" being outside the sparse > checkout even though it isn't. Note that this problem only occurs with > non-cone mode. > > Fix this by making path_in_sparse_checkout() perform pattern matching > for every component in the given path when cone mode is disabled. (This > can be expensive, and we might want to do some form of caching for the > match results of the leading components. However, this is not > implemented in this patch.) Also add two tests for each command (add, > rm, and mv) to check that they behave correctly with the said pattern > matching. The first test would previously fail without this patch, while > the second already succeeded. It is added mostly to make sure that we > are not breaking the existing pattern matching for directories that are > really sparse, and also as a protection against any future > regressions. > > Note that two other existing tests had to be changed: one test in t3602 > checks that "git rm -r <dir>" won't remove sparse entries, but it > didn't allow the non-sparse entries inside <dir> to be removed. The > other one, in t7002, tested that "git mv" would correctly display a > warning message for sparse paths, but it accidentally expected the > message to include two non-sparse paths as well. > @@ -1504,8 +1504,9 @@ static int path_in_sparse_checkout_1(const char *path, > struct index_state *istate, > int require_cone_mode) > { > - const char *base; > int dtype = DT_REG; > + enum pattern_match_result ret = NOT_MATCHED; > + const char *p, *last_slash = NULL; > > /* > * We default to accepting a path if there are no patterns or > @@ -1516,11 +1517,31 @@ static int path_in_sparse_checkout_1(const char *path, > !istate->sparse_checkout_patterns->use_cone_patterns)) > return 1; > > - base = strrchr(path, '/'); > - return path_matches_pattern_list(path, strlen(path), base ? base + 1 : path, > - &dtype, > - istate->sparse_checkout_patterns, > - istate) > 0; > + if (istate->sparse_checkout_patterns->use_cone_patterns) { > + const char *base = strrchr(path, '/'); > + return path_matches_pattern_list(path, strlen(path), > + base ? base + 1 : path, &dtype, > + istate->sparse_checkout_patterns, istate) > 0; > + } > + > + for (p = path; ; p++) { > + enum pattern_match_result match; > + > + if (*p && *p != '/') > + continue; > + > + match = path_matches_pattern_list(path, p - path, > + last_slash ? last_slash + 1 : path, &dtype, > + istate->sparse_checkout_patterns, istate); > + > + if (match != UNDECIDED) > + ret = match; > + if (!*p) > + break; > + last_slash = p; > + } > + > + return ret; This implementation makes sense to me. > test_expect_success 'recursive rm does not remove sparse entries' ' > git reset --hard && > git sparse-checkout set sub/dir && > - test_must_fail git rm -r sub && > - git rm --sparse -r sub && > + git rm -r sub && Interesting that the new pattern-matching already presents a change of behavior in this test case. > git status --porcelain -uno >actual && > cat >expected <<-\EOF && > + D sub/dir/e > + EOF > + test_cmp expected actual && And here is why. Excellent. I suppose that setting the pattern to be "sub/dir/" would have shown this behavior before. > + > + git rm --sparse -r sub && > + git status --porcelain -uno >actual2 && > + cat >expected2 <<-\EOF && > D sub/d > D sub/dir/e > EOF > - test_cmp expected actual > + test_cmp expected2 actual2 > ' The rest of the test cases add new checks that are very valuable. I love this idea and I agree that it would be better to change the loop direction to match the full path first (as you mention in your response). Thanks, -Stolee
diff --git a/dir.c b/dir.c index a4306ab874..225487a59c 100644 --- a/dir.c +++ b/dir.c @@ -1504,8 +1504,9 @@ static int path_in_sparse_checkout_1(const char *path, struct index_state *istate, int require_cone_mode) { - const char *base; int dtype = DT_REG; + enum pattern_match_result ret = NOT_MATCHED; + const char *p, *last_slash = NULL; /* * We default to accepting a path if there are no patterns or @@ -1516,11 +1517,31 @@ static int path_in_sparse_checkout_1(const char *path, !istate->sparse_checkout_patterns->use_cone_patterns)) return 1; - base = strrchr(path, '/'); - return path_matches_pattern_list(path, strlen(path), base ? base + 1 : path, - &dtype, - istate->sparse_checkout_patterns, - istate) > 0; + if (istate->sparse_checkout_patterns->use_cone_patterns) { + const char *base = strrchr(path, '/'); + return path_matches_pattern_list(path, strlen(path), + base ? base + 1 : path, &dtype, + istate->sparse_checkout_patterns, istate) > 0; + } + + for (p = path; ; p++) { + enum pattern_match_result match; + + if (*p && *p != '/') + continue; + + match = path_matches_pattern_list(path, p - path, + last_slash ? last_slash + 1 : path, &dtype, + istate->sparse_checkout_patterns, istate); + + if (match != UNDECIDED) + ret = match; + if (!*p) + break; + last_slash = p; + } + + return ret; } int path_in_sparse_checkout(const char *path, diff --git a/t/t3602-rm-sparse-checkout.sh b/t/t3602-rm-sparse-checkout.sh index ecce497a9c..6e127b966e 100755 --- a/t/t3602-rm-sparse-checkout.sh +++ b/t/t3602-rm-sparse-checkout.sh @@ -40,14 +40,20 @@ done test_expect_success 'recursive rm does not remove sparse entries' ' git reset --hard && git sparse-checkout set sub/dir && - test_must_fail git rm -r sub && - git rm --sparse -r sub && + git rm -r sub && git status --porcelain -uno >actual && cat >expected <<-\EOF && + D sub/dir/e + EOF + test_cmp expected actual && + + git rm --sparse -r sub && + git status --porcelain -uno >actual2 && + cat >expected2 <<-\EOF && D sub/d D sub/dir/e EOF - test_cmp expected actual + test_cmp expected2 actual2 ' test_expect_success 'recursive rm --sparse removes sparse entries' ' @@ -105,4 +111,29 @@ test_expect_success 'refuse to rm a non-skip-worktree path outside sparse cone' test_path_is_missing b ' +test_expect_success 'can remove files from non-sparse dir' ' + git reset --hard && + git sparse-checkout disable && + mkdir -p w x/y && + test_commit w/f && + test_commit x/y/f && + + git sparse-checkout set w !/x y && + git rm w/f.t x/y/f.t 2>stderr && + test_must_be_empty stderr +' + +test_expect_success 'refuse to remove non-skip-worktree file from sparse dir' ' + git reset --hard && + git sparse-checkout disable && + mkdir -p x/y/z && + test_commit x/y/z/f && + git sparse-checkout set !/x y !x/y/z && + + git update-index --no-skip-worktree x/y/z/f.t && + test_must_fail git rm x/y/z/f.t 2>stderr && + echo x/y/z/f.t | cat sparse_error_header - sparse_hint >expect && + test_cmp expect stderr +' + test_done diff --git a/t/t3705-add-sparse-checkout.sh b/t/t3705-add-sparse-checkout.sh index 5b904988d4..63f888af12 100755 --- a/t/t3705-add-sparse-checkout.sh +++ b/t/t3705-add-sparse-checkout.sh @@ -214,4 +214,22 @@ test_expect_success 'add allows sparse entries with --sparse' ' test_must_be_empty stderr ' +test_expect_success 'can add files from non-sparse dir' ' + git sparse-checkout set w !/x y && + mkdir -p w x/y && + touch w/f x/y/f && + git add w/f x/y/f 2>stderr && + test_must_be_empty stderr +' + +test_expect_success 'refuse to add non-skip-worktree file from sparse dir' ' + git sparse-checkout set !/x y !x/y/z && + mkdir -p x/y/z && + touch x/y/z/f && + test_must_fail git add x/y/z/f 2>stderr && + echo x/y/z/f | cat sparse_error_header - sparse_hint >expect && + test_cmp expect stderr + +' + test_done diff --git a/t/t7002-mv-sparse-checkout.sh b/t/t7002-mv-sparse-checkout.sh index 545748949a..91a857bf05 100755 --- a/t/t7002-mv-sparse-checkout.sh +++ b/t/t7002-mv-sparse-checkout.sh @@ -143,8 +143,6 @@ test_expect_success 'recursive mv refuses to move (possible) sparse' ' cat >>expect <<-\EOF && sub/d sub2/d - sub/dir/e - sub2/dir/e sub/dir2/e sub2/dir2/e EOF @@ -186,4 +184,30 @@ test_expect_success 'recursive mv refuses to move sparse' ' git reset --hard HEAD~1 ' +test_expect_success 'can move files to non-sparse dir' ' + git reset --hard && + git sparse-checkout init --no-cone && + git sparse-checkout set a b c w !/x y && + mkdir -p w x/y && + + git mv a w/new-a 2>stderr && + git mv b x/y/new-b 2>stderr && + test_must_be_empty stderr +' + +test_expect_success 'refuse to move file to non-skip-worktree sparse path' ' + git reset --hard && + git sparse-checkout init --no-cone && + git sparse-checkout set a !/x y !x/y/z && + mkdir -p x/y/z && + + test_must_fail git mv a x/y/z/new-a 2>stderr && + cat sparse_error_header >expect && + cat >>expect <<-\EOF && + x/y/z/new-a + EOF + cat sparse_hint >>expect && + test_cmp expect stderr +' + test_done