Message ID | 20220315100145.214054-2-shaoxuan.yuan02@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mv: integrate with sparse-index | expand |
Shaoxuan Yuan wrote: > Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> > --- > builtin/mv.c | 3 +++ > t/t1092-sparse-checkout-compatibility.sh | 34 ++++++++++++++++++++++++ > 2 files changed, 37 insertions(+) > Hi Shaoxuan! I'll answer your question "are the tests on the right track?" [1] inline with the tests here. [1] https://lore.kernel.org/git/20220315100145.214054-1-shaoxuan.yuan02@gmail.com/ > diff --git a/builtin/mv.c b/builtin/mv.c > index 83a465ba83..111360ebf5 100644 > --- a/builtin/mv.c > +++ b/builtin/mv.c > @@ -138,6 +138,9 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > > git_config(git_default_config, NULL); > > + prepare_repo_settings(the_repository); > + the_repository->settings.command_requires_full_index = 0; > + > argc = parse_options(argc, argv, prefix, builtin_mv_options, > builtin_mv_usage, 0); > if (--argc < 1) > diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh > index 2a04b532f9..0a8164c5f6 100755 > --- a/t/t1092-sparse-checkout-compatibility.sh > +++ b/t/t1092-sparse-checkout-compatibility.sh > @@ -1521,4 +1521,38 @@ test_expect_success 'checkout behaves oddly with df-conflict-2' ' > test_cmp full-checkout-err sparse-index-err > ' > > +test_expect_success 'mv' ' > + init_repos && > + In 't1092', I've tried to write test cases around some of the characteristics relevant to sparse checkout/sparse index. For example: - files inside vs. outside of sparse cone (e.g., 'deep/a' vs 'folder1/a') - "normal" directories vs. sparse directories (e.g., 'deep/' vs. 'folder1/') - directories inside a sparse directory vs. "toplevel" sparse directories (e.g., 'folder1/0/' vs. 'folder1/') - options that follow different code paths, especially if those code paths interact with the index differently (e.g., 'git reset --hard' vs 'git reset --mixed') - (probably not relevant for 'git mv') files with vs. without staged changes in the index I've found that exercising these characteristics provides good baseline coverage for a sparse index integration, not leaving any major gaps. I'll also typically add cases specific to any workarounds I need to add to a command (like for 'git read-tree --prefix' [2]). Also, if some of the information about the test repos (e.g., what's inside vs. outside cone, or what's in the repos in the first place) isn't clear, I'm happy to give a deeper dive into how they're set up. With all of that in mind, let's go over the cases you have so far. [2] https://lore.kernel.org/git/90ebcb7b8ff4b4f1ba09abcbe636d639fa597e74.1646166271.git.gitgitgadget@gmail.com/ > + # test first form <source> <destination> > + test_all_match git mv deep/a deep/a_mod && > + test_all_match git mv deep/deeper1 deep/deeper1_mod && > + test_all_match git mv deep/deeper2/deeper1/deepest2/a \ > + deep/deeper2/deeper1/deepest2/a_mod && > + This is a good basis for "inside cone" to "inside cone" moves. That said, I don't think you need all three (since they're all testing effectively the same inside-to-inside cone move). I'd suggest instead adding cases like: test_all_match git mv <inside cone> <outside cone> && test_all_match git mv <outside cone> <inside cone> && test_all_match git mv <outside cone> <outside cone> && to see how the sparse index behaves when files are moved in and out of sparse directories. Similarly, you may want to try 'git mv <sparse directory> <somewhere else>' to see if that triggers any unintended behavior. Additionally, I don't *think* 'git mv' prints out the state of the index, so you'll probably want to follow these cases with: test_all_match git status --porcelain=v2 && which prints the status info in a machine-readable format. > + run_on_all git reset --hard && > + > + test_all_match git mv -f deep/a deep/before/a && > + test_all_match git mv -f deep/before/a deep/a && > + Good! The '-f' option will allow one file to overwrite another in the index, which is definitely interesting in a sparse index. Same as above, though, you should verify 'git status --porcelain=v2'. > + run_on_all git reset --hard && > + > + test_all_match git mv -k deep/a deep/before/a && > + test_all_match git mv -k deep/before/a deep/a && > + The '-k' option might be interesting in the context of the index, since it pushes past errors that would normally make it exit early. However, if it just skips things that fail rather than exiting with an error, it probably isn't testing anything more than the 'git mv' cases. > + run_on_all git reset --hard && > + > + test_all_match git mv -v deep/a deep/a_mod && > + test_all_match git mv -v deep/deeper1 deep/deeper1_mod && > + test_all_match git mv -v deep/deeper2/deeper1/deepest2/a \ > + deep/deeper2/deeper1/deepest2/a_mod && > + Looking at 'builtin/mv.c', the '-v' "verbose" option only controls whether some verbose printouts are emitted. This might be relevant if the printouts were printing index information that didn't match between 'full-checkout', 'sparse-checkout' and 'sparse-index', but if you haven't seen that, I'd leave these cases out. > + # test second form <source> ... <destination directory> > + run_on_all git reset --hard && > + run_on_all mkdir deep/folder && > + test_all_match git mv deep/a deep/folder && > + test_all_match git mv -v deep/deeper1 deep/folder && > + test_all_match git mv -f deep/deeper2/deeper1/deepest2/a deep/folder This is a good variation on the standard "inside cone" to "inside cone", and I'd like to see something similar done inside sparse directories. And, similar to above, I don't think '-v' needs to be tested. > +' > + > test_done Overall, this is a great start! You've got a good pattern set up (it's very clear to follow), I think it mainly needs some more variety to the test cases. Also, if you find that this test gets way too large after adding more cases, feel free to split it into multiple named tests if one gets too long (e.g. "mv", "mv -f"). My recommendations: - add tests covering outside-of-sparse-cone 'mv' arguments - add tests covering 'mv' attempting to move directories (in-cone and sparse) - add some "test_must_fail" tests to see what happens when you do something "wrong", e.g. to try to overwrite a file without '-f' (I've found some really interesting issues in the past where you expect something to fail and it doesn't) - add 'git status --porcelain=v2' checks to confirm that the 'mv' worked the same across the different checkouts - remove multiples of test cases that test the same general behavior (e.g., 'git mv <in-cone file> <in-cone file>' only needs to be done once) - double-check whether '-v' and '-k' have the ability to affect full-checkout/sparse-checkout/sparse-index differently - if not, you probably don't need to test them Thanks for working on this, and I hope this helps!
On 3/15/2022 12:07 PM, Victoria Dye wrote: > Shaoxuan Yuan wrote: >> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> >> --- >> builtin/mv.c | 3 +++ >> t/t1092-sparse-checkout-compatibility.sh | 34 ++++++++++++++++++++++++ >> 2 files changed, 37 insertions(+) >> > > Hi Shaoxuan! Hello! > I'll answer your question "are the tests on the right track?" [1] inline > with the tests here. > In 't1092', I've tried to write test cases around some of the > characteristics relevant to sparse checkout/sparse index. For example: > > - files inside vs. outside of sparse cone (e.g., 'deep/a' vs 'folder1/a') > - "normal" directories vs. sparse directories (e.g., 'deep/' vs. 'folder1/') > - directories inside a sparse directory vs. "toplevel" sparse directories > (e.g., 'folder1/0/' vs. 'folder1/') > - options that follow different code paths, especially if those code paths > interact with the index differently (e.g., 'git reset --hard' vs 'git > reset --mixed') > - (probably not relevant for 'git mv') files with vs. without staged changes > in the index > > I've found that exercising these characteristics provides good baseline > coverage for a sparse index integration, not leaving any major gaps. I'll > also typically add cases specific to any workarounds I need to add to a > command (like for 'git read-tree --prefix' [2]). This, and other advice that Victoria mentions, are really good points to keep in mind. > My recommendations: > > - add tests covering outside-of-sparse-cone 'mv' arguments > - add tests covering 'mv' attempting to move directories (in-cone and > sparse) > - add some "test_must_fail" tests to see what happens when you do something > "wrong", e.g. to try to overwrite a file without '-f' (I've found some > really interesting issues in the past where you expect something to fail > and it doesn't) > - add 'git status --porcelain=v2' checks to confirm that the 'mv' worked the > same across the different checkouts > - remove multiples of test cases that test the same general behavior (e.g., > 'git mv <in-cone file> <in-cone file>' only needs to be done once) > - double-check whether '-v' and '-k' have the ability to affect > full-checkout/sparse-checkout/sparse-index differently - if not, you > probably don't need to test them > > Thanks for working on this, and I hope this helps! You mention in your cover letter that the ensure_not_expanded tests are not added yet (same with performance tests). Now that you've gotten feedback on this version of the patch, I might recommend the organization you might want for a full series: 1. Add these 'mv' tests to t1092 _without_ the code change. These tests should work when the index is expanded, and making the code change to not expand the index shouldn't change the behavior. 2. Add the performance test so we have a baseline to measure how well 'mv' does in the normal case (and how it is slower when expanding the index). 3. Make the code change and add the ensure_not_expanded test, since the functionality from the tests added in (1) will not change and we can report the results from the perf tests added in (2). The only thing to test is the new, internal behavior that the index is not expanded when doing these actions. (Keep in mind that we expect the index to be expanded for out-of-cone moves, but it's the in-cone moves that we expect to not expand.) Thanks! -Stolee
Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> writes: > Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> > --- > builtin/mv.c | 3 +++ > t/t1092-sparse-checkout-compatibility.sh | 34 ++++++++++++++++++++++++ > 2 files changed, 37 insertions(+) > > diff --git a/builtin/mv.c b/builtin/mv.c > index 83a465ba83..111360ebf5 100644 > --- a/builtin/mv.c > +++ b/builtin/mv.c > @@ -138,6 +138,9 @@ int cmd_mv(int argc, const char **argv, const char *prefix) > > git_config(git_default_config, NULL); > > + prepare_repo_settings(the_repository); > + the_repository->settings.command_requires_full_index = 0; > + The command used to be marked as one of the commands that require full index to work correctly. Why did it suddenly become not to require it, especially without any other changes to make it so? This patch needs a lot more explaining to do in itse proposed log message. Thanks.
On 3/15/2022 1:23 PM, Junio C Hamano wrote: > Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> writes: > >> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> >> --- >> builtin/mv.c | 3 +++ >> t/t1092-sparse-checkout-compatibility.sh | 34 ++++++++++++++++++++++++ >> 2 files changed, 37 insertions(+) >> >> diff --git a/builtin/mv.c b/builtin/mv.c >> index 83a465ba83..111360ebf5 100644 >> --- a/builtin/mv.c >> +++ b/builtin/mv.c >> @@ -138,6 +138,9 @@ int cmd_mv(int argc, const char **argv, const char *prefix) >> >> git_config(git_default_config, NULL); >> >> + prepare_repo_settings(the_repository); >> + the_repository->settings.command_requires_full_index = 0; >> + > > The command used to be marked as one of the commands that require > full index to work correctly. Why did it suddenly become not to > require it, especially without any other changes to make it so? > > This patch needs a lot more explaining to do in itse proposed log > message. Right. Some builtins already work safely with the sparse index, but we just were not sure without creating the proper tests for it. In this case, I expect 'git mv' uses index_name_pos() to find the locations of a given index entry, which can cause the index to expand naturally. I can definitely imagine a bug where index_name_pos() fixes the location of the in-cone path within the sparse index, then the index_name_pos() for the out-of-cone path expands the index and causes the position of the in-cone path to no longer be correct. Testing with a variety of in-cone and out-of-cone paths will help here. While I was writing this reply, I realized that our default cone in `t1092` doesn't have a sparse directory before the typical in-cone path of "deep/a". I set out to make one. This diff _should_ apply to `t1092` without causing any failures: --- >8 --- diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 2a04b532f91..e9533832aab 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -16,7 +16,9 @@ test_expect_success 'setup' ' echo "after deep" >e && echo "after folder1" >g && echo "after x" >z && - mkdir folder1 folder2 deep x && + mkdir folder1 folder2 deep before x && + echo "before deep" >before/a && + echo "before deep again" >before/b && mkdir deep/deeper1 deep/deeper2 deep/before deep/later && mkdir deep/deeper1/deepest && mkdir deep/deeper1/deepest2 && @@ -1311,6 +1313,7 @@ test_expect_success 'ls-files' ' cat >expect <<-\EOF && a + before/ deep/ e folder1- @@ -1358,6 +1361,7 @@ test_expect_success 'ls-files' ' cat >expect <<-\EOF && a + before/ deep/ e folder1- --- >8 --- However, it causes failures in these tests: 1. `8 - add outside sparse cone` 2. `10 - status/add: outside sparse cone` 3. `21 - reset with pathspecs inside sparse definition` After talking with @vdye about this, it seems that they are all failing based on a common issue regarding an index-based diff. Somehow the diff is not finding a version of the paths so is reporting them as added. For example, at the point of failure in `8 - add outside sparse cone`, we have these results for some Git commands: $ git diff diff --git a/folder1/a b/folder1/a index 7898192..8e27be7 100644 --- a/folder1/a +++ b/folder1/a @@ -1 +1 @@ -a +text $ git diff --staged $ git diff --staged -- folder1/a diff --git a/folder1/a b/folder1/a new file mode 100644 index 0000000..7898192 --- /dev/null +++ b/folder1/a @@ -0,0 +1 @@ +a $ git diff --staged -- deep diff --git a/deep/later/a b/deep/later/a new file mode 100644 index 0000000..7898192 --- /dev/null +++ b/deep/later/a @@ -0,0 +1 @@ +a ``` The impact must be pretty low and specific to these prefixed diffs (the reset test also uses prefixed resets, so pathspecs are somehow involved) which are rare for users to actually use. Still, we should fix this and strengthen our tests. After trying for an hour to fix this myself, I have failed to find the root cause of this issue. I'm about to head out on vacation, so I won't have time to look into this again until Monday. I wanted to share this information so it doesn't cause Shaoxuan too much pain while working on this 'git mv' change. Thanks, -Stolee
On Wed, Mar 16, 2022 at 12:07 AM Victoria Dye <vdye@github.com> wrote: > > Shaoxuan Yuan wrote: > > Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> > > --- > > builtin/mv.c | 3 +++ > > t/t1092-sparse-checkout-compatibility.sh | 34 ++++++++++++++++++++++++ > > 2 files changed, 37 insertions(+) > > > > Hi Shaoxuan! Hi Victoria! > Thanks for working on this, and I hope this helps! Thanks for all the feedback, they are really helpful! I'm still researching and experimenting with various documents and examples regarding your feedback. And I will try to submit a more refined patch in less than a few days :)
On Wed, Mar 16, 2022 at 1:14 AM Derrick Stolee <derrickstolee@github.com> wrote: > > Hello! Hi Derrick, > > > I'll answer your question "are the tests on the right track?" [1] inline > > with the tests here. > > In 't1092', I've tried to write test cases around some of the > > characteristics relevant to sparse checkout/sparse index. For example: > > > > - files inside vs. outside of sparse cone (e.g., 'deep/a' vs 'folder1/a') > > - "normal" directories vs. sparse directories (e.g., 'deep/' vs. 'folder1/') > > - directories inside a sparse directory vs. "toplevel" sparse directories > > (e.g., 'folder1/0/' vs. 'folder1/') > > - options that follow different code paths, especially if those code paths > > interact with the index differently (e.g., 'git reset --hard' vs 'git > > reset --mixed') > > - (probably not relevant for 'git mv') files with vs. without staged changes > > in the index > > > > I've found that exercising these characteristics provides good baseline > > coverage for a sparse index integration, not leaving any major gaps. I'll > > also typically add cases specific to any workarounds I need to add to a > > command (like for 'git read-tree --prefix' [2]). > > This, and other advice that Victoria mentions, are really > good points to keep in mind. > > > My recommendations: > > > > - add tests covering outside-of-sparse-cone 'mv' arguments > > - add tests covering 'mv' attempting to move directories (in-cone and > > sparse) > > - add some "test_must_fail" tests to see what happens when you do something > > "wrong", e.g. to try to overwrite a file without '-f' (I've found some > > really interesting issues in the past where you expect something to fail > > and it doesn't) > > - add 'git status --porcelain=v2' checks to confirm that the 'mv' worked the > > same across the different checkouts > > - remove multiples of test cases that test the same general behavior (e.g., > > 'git mv <in-cone file> <in-cone file>' only needs to be done once) > > - double-check whether '-v' and '-k' have the ability to affect > > full-checkout/sparse-checkout/sparse-index differently - if not, you > > probably don't need to test them > > > > Thanks for working on this, and I hope this helps! > > You mention in your cover letter that the ensure_not_expanded tests > are not added yet (same with performance tests). Now that you've > gotten feedback on this version of the patch, I might recommend the > organization you might want for a full series: > > 1. Add these 'mv' tests to t1092 _without_ the code change. These > tests should work when the index is expanded, and making the > code change to not expand the index shouldn't change the > behavior. > > 2. Add the performance test so we have a baseline to measure how > well 'mv' does in the normal case (and how it is slower when > expanding the index). > > 3. Make the code change and add the ensure_not_expanded test, > since the functionality from the tests added in (1) will not > change and we can report the results from the perf tests > added in (2). The only thing to test is the new, internal > behavior that the index is not expanded when doing these > actions. (Keep in mind that we expect the index to be > expanded for out-of-cone moves, but it's the in-cone moves > that we expect to not expand.) Thanks for the recommendations, they are really helpful! I will try to address them in the next patch :)
Hi Victoria, Just found an interesting (probably) behavior. In the `sparse-checkout` directory created in `init_repos()` in t1092, if I say: $ mkdir folder3 $ touch folder3/a $ git mv folder3/a deep and git will prompt: "fatal: not under version control, source=folder3/a, destination=deep/a" And if I say: $ git mv folder3 deep git will prompt: "fatal: source directory is empty, source=folder3, destination=deep/folder3" What I am wondering is that file `folder3/a` is outside of sparse-checkout cone, should `git mv` instead prompts with `advise_on_updating_sparse_paths()` or this "not under version control" alarm is acceptable?
On 3/16/2022 6:45 AM, Shaoxuan Yuan wrote: > Hi Victoria, > > Just found an interesting (probably) behavior. > > In the `sparse-checkout` directory created in `init_repos()` in t1092, if I > say: > > $ mkdir folder3 > $ touch folder3/a The issue here is that this file is "untracked", not just outside of the sparse-checkout cone. > $ git mv folder3/a deep > > and git will prompt: > > "fatal: not under version control, source=folder3/a, destination=deep/a" > > And if I say: > > $ git mv folder3 deep > > git will prompt: > > "fatal: source directory is empty, source=folder3, destination=deep/folder3" > > What I am wondering is that file `folder3/a` is outside of sparse-checkout cone, > should `git mv` instead prompts with `advise_on_updating_sparse_paths()` or this > "not under version control" alarm is acceptable? Instead, what about git mv folder2/a deep/new since folder2/a is a tracked file, just not in the working tree since it is outside the sparse-checkout cone. (If it fails, then it should fail the same with and without the sparse index, which is what "test_sparse_match" is for.) Thanks, -Stolee
Hi Derrick, On Wed, Mar 16, 2022 at 9:34 PM Derrick Stolee <derrickstolee@github.com> wrote: > The issue here is that this file is "untracked", not just outside > of the sparse-checkout cone. Thanks for the succinct explanation, it makes much more sense now :) > Instead, what about > > git mv folder2/a deep/new > > since folder2/a is a tracked file, just not in the working tree > since it is outside the sparse-checkout cone. > > (If it fails, then it should fail the same with and without the > sparse index, which is what "test_sparse_match" is for.) I tested this and it fails as expected with: "fatal: bad source, source=folder2/a, destination=deep/new" > Thanks, > -Stolee Thanks for the reply above! Other than that, I also have found another issue (probably), with $ mkdir folder2 $ git mv folder2 deep After these I do: $ git status And the output indicates that the index is updated with the following changes: renamed: folder2/0/0/0 -> deep/folder2/0/0/0 renamed: folder2/0/1 -> deep/folder2/0/1 renamed: folder2/a -> deep/folder2/a Nothing fails, which is not what I expected. What I expect is `git mv` will fail because it is being told to update a sparse-directory (which as I read the blogs and sparse-index.txt is taken as a sparse-directory entry) outside of the sparse-checkout cone. Unless `git mv` is supplied with `--sparse`, the command will do nothing but fail, no? What confuses me more is that the `folder2`, which is present in the index but not in the working tree (due to sparse-checkout cone), seems to be "unlocked" and re-picked up by Git after `mkdir folder2` and move `folder2` into the cone area. And still, the files under `deep/folder2` are not present in the working tree (might be relevant to the previous context). I haven't run the gdb to see into the process, I just get somehow confused by these discrepancies (seemingly to me). I think I should gdb into it though, getting some info here from people can also be really helpful :)
Hi Derrick and Victoria. On Wed, Mar 16, 2022 at 1:14 AM Derrick Stolee <derrickstolee@github.com> wrote: > You mention in your cover letter that the ensure_not_expanded tests > are not added yet (same with performance tests). Now that you've > gotten feedback on this version of the patch, I might recommend the > organization you might want for a full series: > > 1. Add these 'mv' tests to t1092 _without_ the code change. These > tests should work when the index is expanded, and making the > code change to not expand the index shouldn't change the > behavior. > > 2. Add the performance test so we have a baseline to measure how > well 'mv' does in the normal case (and how it is slower when > expanding the index). I'm a bit caught up here. Do I just do a before-code-change test and after-code-change test, and benchmark the after against the before? Or do you mean I should also perf test out-of-cone arguments with 'mv' so that the index could be expanded? According to my understanding, the sparse-index could be required to expand when out-of-cone actions happen and the 'ensure_full_index()' is called. And do a 3-way comparison among before-code-change, after-code-change, and after-code-change- index-expanded, no?
Shaoxuan Yuan wrote: > Hi Derrick, > > On Wed, Mar 16, 2022 at 9:34 PM Derrick Stolee <derrickstolee@github.com> wrote: >> The issue here is that this file is "untracked", not just outside >> of the sparse-checkout cone. > > Thanks for the succinct explanation, it makes much more sense now :) > >> Instead, what about >> >> git mv folder2/a deep/new >> >> since folder2/a is a tracked file, just not in the working tree >> since it is outside the sparse-checkout cone. >> >> (If it fails, then it should fail the same with and without the >> sparse index, which is what "test_sparse_match" is for.) > > I tested this and it fails as expected with: > "fatal: bad source, source=folder2/a, destination=deep/new" > Great! This should then probably be turned into a "test_expect_fail" test in 't1092' - that'll make sure we get both the right behavior and right error message with sparse index after it's enabled. However, I also get the same result when I add the '--sparse' option. I would expect the behavior to be "move 'folder2/a' to 'deep/new' and check it out in the worktree" - this may be a good candidate for improving the existing integration with sparse *checkout* before enabling sparse *index* (e.g., like when 'git add' was updated to not add sparse files by default [1]). [1] https://lore.kernel.org/git/2c5c834bc9fb42aeaff7befbba477aec727184c0.1632497954.git.gitgitgadget@gmail.com/ >> Thanks, >> -Stolee > > Thanks for the reply above! > > Other than that, I also have found another issue (probably), with > > $ mkdir folder2 > $ git mv folder2 deep > > After these I do: > > $ git status > > And the output indicates that the index is updated with the following changes: > > renamed: folder2/0/0/0 -> deep/folder2/0/0/0 > renamed: folder2/0/1 -> deep/folder2/0/1 > renamed: folder2/a -> deep/folder2/a > > Nothing fails, which is not what I expected. What I expect is `git mv` will > fail because it is being told to update a sparse-directory (which as I read the > blogs and sparse-index.txt is taken as a sparse-directory entry) outside of the > sparse-checkout cone. Unless `git mv` is supplied with `--sparse`, the command > will do nothing but fail, no? > I think you're right that this is a bug. This appears to come from the fact that 'mv' decides whether a directory is sparse only *after* it sees that it doesn't exist on-disk. > What confuses me more is that the `folder2`, which is present in the index but > not in the working tree (due to sparse-checkout cone), seems to be "unlocked" > and re-picked up by Git after `mkdir folder2` and move `folder2` into > the cone area. > And still, the files under `deep/folder2` are not present in the > working tree (might > be relevant to the previous context). > This is a consequence of how sparse-checkout is implemented. The files in 'folder2/' aren't on-disk (and aren't correspondingly shown as "deleted" in 'git status') because the index entries of 'folder2/' files are marked with the "SKIP_WORKTREE" flag. This flag basically indicates to git "this file shouldn't be in the worktree (on-disk), but it is part of the repository (in the index)". In other words, it's what makes a file "sparse", and sparse-checkout (when you run 'set' or 'add') assigns that flag based on the user-specified patterns. However, the flag isn't constantly being re-evaluated - only certain commands change SKIP_WORKTREE, and only because they do so explicitly (e.g., 'git reset --mixed' [2]) - leading to situations like what you're seeing, where 'folder2/' files (which have SKIP_WORKTREE enabled) are moved into the sparse cone, but still have SKIP_WORKTREE enabled. So I think there are three potential things to fix here: 1. When empty folder2/ is on-disk, 'git mv' (without '--sparse') doesn't fail with "bad source", even though it should. 2. When you try to move a sparse file with 'git mv --sparse', it still fails. 3. SKIP_WORKTREE is not removed from out-of-cone files moved into the sparse cone. On a related note, there is precedent for needing to make fixes like this before integrating with sparse index. For example: in addition to the earlier examples in 'add' and 'reset', 'checkout-index' was changed to no longer checkout SKIP_WORKTREE files by default [3]. It's a somewhat expected part of this process because sparse-checkout is still "experimental", and one of our secondary goals with this sparse index work is to improve the behavior of sparse-checkout in the commands we integrate. All of this combined will, ideally, make the experience of using sparse-checkout much nicer for users (both from usability and performance perspectives). [2] https://lore.kernel.org/git/b221b00b7e06a3b135b9f68ce87cffaa7d782581.1638201164.git.gitgitgadget@gmail.com/ [3] https://lore.kernel.org/git/601888606d1cf7d7752844dbdbc7fac20d4be8c4.1641924306.git.gitgitgadget@gmail.com/ > I haven't run the gdb to see into the process, I just get somehow confused by > these discrepancies (seemingly to me). I think I should gdb into it though, > getting some info here from people can also be really helpful :) > Another tool that may help you here is 'git ls-files --sparse -t'. It lists the files in the index and their "tags" ('H' is "normal" tracked files, 'S' is SKIP_WORKTREE, etc. [4]), which can help identify when a file you'd expect to be SKIP_WORKTREE is not and vice versa. [4] https://git-scm.com/docs/git-ls-files#Documentation/git-ls-files.txt--t
Victoria Dye <vdye@github.com> writes: >> I tested this and it fails as expected with: >> "fatal: bad source, source=folder2/a, destination=deep/new" > > Great! This should then probably be turned into a "test_expect_fail" test in > 't1092' - that'll make sure we get both the right behavior and right error > message with sparse index after it's enabled. > > However, I also get the same result when I add the '--sparse' option. I > would expect the behavior to be "move 'folder2/a' to 'deep/new' and check it > out in the worktree" - this may be a good candidate for improving the > existing integration with sparse *checkout* before enabling sparse *index* > (e.g., like when 'git add' was updated to not add sparse files by default > [1]). > ... > I think you're right that this is a bug. This appears to come from the fact > that 'mv' decides whether a directory is sparse only *after* it sees that it > doesn't exist on-disk. > ... > So I think there are three potential things to fix here: > > 1. When empty folder2/ is on-disk, 'git mv' (without '--sparse') doesn't > fail with "bad source", even though it should. > 2. When you try to move a sparse file with 'git mv --sparse', it still > fails. > 3. SKIP_WORKTREE is not removed from out-of-cone files moved into the sparse > cone. > > On a related note, there is precedent for needing to make fixes like this > before integrating with sparse index. For example: in addition to the > earlier examples in 'add' and 'reset', 'checkout-index' was changed to no > longer checkout SKIP_WORKTREE files by default [3]. It's a somewhat expected > part of this process ... > ... > Another tool that may help you here is 'git ls-files --sparse -t'. It lists > the files in the index and their "tags" ('H' is "normal" tracked files, 'S' > is SKIP_WORKTREE, etc. [4]), which can help identify when a file you'd > expect to be SKIP_WORKTREE is not and vice versa. Wonderful. Quite honestly, because the code will most likely compile correctly if you just remove the unconditional "we first expand the in-core index fully" code, and because the "sparse index" makes the existing index walking code fail in unexpected and surprising ways, I consider it unsuitably harder for people who are not yet familiar with the system. Without a good test coverage (which is hard to give unless you are familiar with the code being tested X-<), one can easily get confused and lost. Thanks for guiding a new contributor with the usual process of loosening "require-full-index".
On 3/17/2022 9:00 PM, Junio C Hamano wrote: > Victoria Dye <vdye@github.com> writes: > >>> I tested this and it fails as expected with: >>> "fatal: bad source, source=folder2/a, destination=deep/new" >> >> Great! This should then probably be turned into a "test_expect_fail" test in >> 't1092' - that'll make sure we get both the right behavior and right error >> message with sparse index after it's enabled. >> >> However, I also get the same result when I add the '--sparse' option. I >> would expect the behavior to be "move 'folder2/a' to 'deep/new' and check it >> out in the worktree" - this may be a good candidate for improving the >> existing integration with sparse *checkout* before enabling sparse *index* >> (e.g., like when 'git add' was updated to not add sparse files by default >> [1]). >> ... >> I think you're right that this is a bug. This appears to come from the fact >> that 'mv' decides whether a directory is sparse only *after* it sees that it >> doesn't exist on-disk. >> ... >> So I think there are three potential things to fix here: >> >> 1. When empty folder2/ is on-disk, 'git mv' (without '--sparse') doesn't >> fail with "bad source", even though it should. >> 2. When you try to move a sparse file with 'git mv --sparse', it still >> fails. >> 3. SKIP_WORKTREE is not removed from out-of-cone files moved into the sparse >> cone. >> >> On a related note, there is precedent for needing to make fixes like this >> before integrating with sparse index. For example: in addition to the >> earlier examples in 'add' and 'reset', 'checkout-index' was changed to no >> longer checkout SKIP_WORKTREE files by default [3]. It's a somewhat expected >> part of this process ... >> ... >> Another tool that may help you here is 'git ls-files --sparse -t'. It lists >> the files in the index and their "tags" ('H' is "normal" tracked files, 'S' >> is SKIP_WORKTREE, etc. [4]), which can help identify when a file you'd >> expect to be SKIP_WORKTREE is not and vice versa. > > Wonderful. > > Quite honestly, because the code will most likely compile correctly > if you just remove the unconditional "we first expand the in-core > index fully" code, and because the "sparse index" makes the existing > index walking code fail in unexpected and surprising ways, I > consider it unsuitably harder for people who are not yet familiar > with the system. Without a good test coverage (which is hard to > give unless you are familiar with the code being tested X-<), one > can easily get confused and lost. Certainly, 'git mv' is looking to be harder than expected, but there is a lot of interesting exploration happening in the process. Thanks for persisting on this one, Shaoxuan! Thanks, -Stolee
Derrick Stolee <derrickstolee@github.com> writes: >>> Another tool that may help you here is 'git ls-files --sparse -t'. It lists >>> the files in the index and their "tags" ('H' is "normal" tracked files, 'S' >>> is SKIP_WORKTREE, etc. [4]), which can help identify when a file you'd >>> expect to be SKIP_WORKTREE is not and vice versa. >> >> Wonderful. >> >> Quite honestly, because the code will most likely compile correctly >> if you just remove the unconditional "we first expand the in-core >> index fully" code, and because the "sparse index" makes the existing >> index walking code fail in unexpected and surprising ways, I >> consider it unsuitably harder for people who are not yet familiar >> with the system. Without a good test coverage (which is hard to >> give unless you are familiar with the code being tested X-<), one >> can easily get confused and lost. > > Certainly, 'git mv' is looking to be harder than expected, but there > is a lot of interesting exploration happening in the process. Yeah, I know. I am suprised that it is harder than expected *to* *you*, though. After having seen a few other topics, I thought that you should know how deceptively easy to lose "require-full" and how hard to audit the code that may expect "a flat list of paths" in the in-core index ;-). > Thanks for persisting on this one, Shaoxuan! Yes, thanks. And thanks for mentoring Shaoxuan.
On 3/21/2022 3:14 PM, Junio C Hamano wrote: > Derrick Stolee <derrickstolee@github.com> writes: > >>>> Another tool that may help you here is 'git ls-files --sparse -t'. It lists >>>> the files in the index and their "tags" ('H' is "normal" tracked files, 'S' >>>> is SKIP_WORKTREE, etc. [4]), which can help identify when a file you'd >>>> expect to be SKIP_WORKTREE is not and vice versa. >>> >>> Wonderful. >>> >>> Quite honestly, because the code will most likely compile correctly >>> if you just remove the unconditional "we first expand the in-core >>> index fully" code, and because the "sparse index" makes the existing >>> index walking code fail in unexpected and surprising ways, I >>> consider it unsuitably harder for people who are not yet familiar >>> with the system. Without a good test coverage (which is hard to >>> give unless you are familiar with the code being tested X-<), one >>> can easily get confused and lost. >> >> Certainly, 'git mv' is looking to be harder than expected, but there >> is a lot of interesting exploration happening in the process. > > Yeah, I know. > > I am suprised that it is harder than expected *to* *you*, though. > After having seen a few other topics, I thought that you should know > how deceptively easy to lose "require-full" and how hard to audit > the code that may expect "a flat list of paths" in the in-core index > ;-). I'm particularly surprised in how much 'git mv' doesn't work very well in the sparse-checkout environment already, which makes things more difficult than "just" doing the normal sparse index things. It's good that we are discovering them and working to fix them. Thanks, -Stolee
Hi all, On Tue, Mar 22, 2022 at 3:45 AM Derrick Stolee <derrickstolee@github.com> wrote: > I'm particularly surprised in how much 'git mv' doesn't work very > well in the sparse-checkout environment already, which makes things > more difficult than "just" doing the normal sparse index things. > > It's good that we are discovering them and working to fix them. > > Thanks, > -Stolee Really appreciate the mentoring and tips here, I'm trying to make some progress now. The problems facing here certainly push me to explore more and know better about the codebase. Appreciate all the help :-)
On 3/22/2022 4:38 AM, Shaoxuan Yuan wrote: > Hi all, > > On Tue, Mar 22, 2022 at 3:45 AM Derrick Stolee <derrickstolee@github.com> wrote: >> I'm particularly surprised in how much 'git mv' doesn't work very >> well in the sparse-checkout environment already, which makes things >> more difficult than "just" doing the normal sparse index things. >> >> It's good that we are discovering them and working to fix them. >> >> Thanks, >> -Stolee > > Really appreciate the mentoring and tips here, I'm trying to make some progress > now. The problems facing here certainly push me to explore more and know > better about the codebase. Appreciate all the help :-) A thought occurred to me while thinking about these difficulties: perhaps it is better to start with 'git rm' since that does only half of what 'git mv' does. It should be a smaller lift as a first contribution. There is even a clear loop that is marked with "TODO: audit for interaction with sparse-index." As we've discovered in this thread, the direction for integrating a builtin with the sparse index should follow this outline: 0. Test the builtin in t1092 with interactions inside and outside of the sparse-checkout cone.* 1. Add command_requires_full_index = 0 line to the builtin. 2. Check for failures and diagnose them. 3. Check for index expansion and remove them as necessary. (Go back to 2.) 4. Run performance tests. (*) This step is the one we failed to focus enough on previously. Of course, if you've already gotten really far on 'mv' and don't want to switch context, then keep at it. Thanks, -Stolee
Derrick Stolee <derrickstolee@github.com> writes: > On 3/22/2022 4:38 AM, Shaoxuan Yuan wrote: >> Hi all, >> >> On Tue, Mar 22, 2022 at 3:45 AM Derrick Stolee <derrickstolee@github.com> wrote: >>> I'm particularly surprised in how much 'git mv' doesn't work very >>> well in the sparse-checkout environment already, which makes things >>> more difficult than "just" doing the normal sparse index things. >>> >>> It's good that we are discovering them and working to fix them. >>> >>> Thanks, >>> -Stolee >> >> Really appreciate the mentoring and tips here, I'm trying to make some progress >> now. The problems facing here certainly push me to explore more and know >> better about the codebase. Appreciate all the help :-) > > A thought occurred to me while thinking about these difficulties: > perhaps it is better to start with 'git rm' since that does only > half of what 'git mv' does. It should be a smaller lift as a first > contribution. There is even a clear loop that is marked with "TODO: > audit for interaction with sparse-index." > > As we've discovered in this thread, the direction for integrating > a builtin with the sparse index should follow this outline: > > 0. Test the builtin in t1092 with interactions inside and outside > of the sparse-checkout cone.* > > 1. Add command_requires_full_index = 0 line to the builtin. > > 2. Check for failures and diagnose them. > > 3. Check for index expansion and remove them as necessary. > (Go back to 2.) > > 4. Run performance tests. > > (*) This step is the one we failed to focus enough on previously. Jonathan, I thought you had a radically different approach that ought to be much safer than the above---do you want to bring it up for discussion? > Of course, if you've already gotten really far on 'mv' and don't > want to switch context, then keep at it. > > Thanks, > -Stolee
On Fri, Mar 18, 2022 at 5:57 AM Victoria Dye <vdye@github.com> wrote: Hi all, It's been a busy week, I'm sorry that did not have much time to respond. ================================= A brief summary of my latest investigations: I think the 'git mv' command still has some questionable aspects, especially with the 'git sparse-checkout' command. And I feel we have to sort things out between 'git mv' and sparse-checkout first, then proceed to the issues with sparse-index. =========== I'm trying to fix the first 2 of the 3 potential things mentioned earlier. > 1. When empty folder2/ is on-disk, 'git mv' (without '--sparse') doesn't > fail with "bad source", even though it should. In this case, 'git mv' does not fail with "bad source" is something expected, because this error is related to the existence of an on-disk file, not a directory. The closest thing that it should fail with, in my opinion, is by calling the advise function 'advise_on_updating_sparse_paths'. With that being said, I now raise the first question: should we change the sparse- checkout cone check to be placed at the very beginning of the checking process, or keep it at the end as a very final check (where it is right now). My preference is to place it at the very beginning, since the user should always be cautious about touching contents outside of sparse-checkout cone, no matter what. If a certain move touches out-of-cone stuff, and at the same time it will fail with the, for example, "destination exists" or "conflicted" error, I think these errors should come second, after being supplied with the "--sparse" flag. > 2. When you try to move a sparse file with 'git mv --sparse', it still > fails. This is also related to the first question, because in this case, "folder2/a" is not on-disk, then 'git mv' will fail fast at the first check and ignore the "--sparse" flag. Even if we modify the code to place the out-of-cone check at the very beginning, and supply a "--sparse" flag, we have to make 'git mv' look into the index to find a sparse file. And here I raise my second question: by moving a sparse file, which is normally not on-disk, we have to alter the original 'git mv' logic to make it grope into the index for the missing sparse file (for now it does not care about the index, except when it receives a directory as <source>); can we make this change? -- Thanks & Regards, Shaoxuan
On 3/26/2022 11:48 PM, Shaoxuan Yuan wrote: > On Fri, Mar 18, 2022 at 5:57 AM Victoria Dye <vdye@github.com> wrote: > > Hi all, > > It's been a busy week, I'm sorry that did not have much time to respond. > > ================================= > A brief summary of my latest investigations: > > I think the 'git mv' command still has some questionable aspects, especially > with the 'git sparse-checkout' command. And I feel we have to sort things out > between 'git mv' and sparse-checkout first, then proceed to the issues with > sparse-index. > =========== > > I'm trying to fix the first 2 of the 3 potential things mentioned earlier. > >> 1. When empty folder2/ is on-disk, 'git mv' (without '--sparse') doesn't >> fail with "bad source", even though it should. >> 2. When you try to move a sparse file with 'git mv --sparse', it still >> fails. I think that these conditions are all related to the perspective as described in the documentation of 'git mv': In the first form, it renames <source>, which **must exist** and be either a file, symlink or directory, to <destination>. In the second form, the last argument has to be **an existing** directory; the given sources will be moved into this directory. The index is updated after successful completion, but the change must still be committed. (**emphasis mine**) So, this documents the perspective that files must exist in the worktree (and destination directories must exist in the worktree), and after all of those checks, then the move is staged. I think part of our issues here is that in the case of a sparse-checkout, we can have index entries that don't exist in the worktree. The expected behavior we are considering is that 'git mv' should stage the movement of the file in the index, ignoring the worktree for paths outside of the sparse-checkout definition. One way to do this would be to flip the implementation's direction: perform the index operation of moving the cache entry, then update the worktree to reflect that change (if necessary). The case that I can think about being a bit strange is if the user has an unstaged deletion of the source file, then runs 'git mv'. Since the worktree is missing the file, then we cannot do the equivalent 'mv' operation in the worktree. One other thing to keep in mind is that 'git mv <source> <destination>' can act like 'mv <source> <destination> && git add <destination>' if <source> is an untracked path. So, 'git mv' can succeed even if the source is not in the index! So the change here is to ignore a non-existing path when the same path exists as a cache entry with the SKIP_WORKTREE bit. That bit does say "ignore the worktree" so 'git mv' isn't doing the right thing already. --- In conclusion, there might be multiple ways forward here: 1. Keep the expectation that <source> is in the worktree as given, and let a tracked <source> outside of the sparse-checkout cone result in a failure (as it currently does). Consider adding an advice message if <source> is a tracked, sparse path. 2. Change the expectation to be that <source> must either be a file in the worktree _or_ a tracked, sparse path (or both). The nice thing here is that we can do (1) and then (2) later. The key things to investigate for the sparse index, then are what happens when <destination> is outside of the sparse-checkout cone? I imagine it would be fine if the moved file still appears in the worktree and is cleaned up by a later 'git sparse-checkout reapply'. I'm eager to here alternate opinions and options on this topic. Thanks, -Stolee
diff --git a/builtin/mv.c b/builtin/mv.c index 83a465ba83..111360ebf5 100644 --- a/builtin/mv.c +++ b/builtin/mv.c @@ -138,6 +138,9 @@ int cmd_mv(int argc, const char **argv, const char *prefix) git_config(git_default_config, NULL); + prepare_repo_settings(the_repository); + the_repository->settings.command_requires_full_index = 0; + argc = parse_options(argc, argv, prefix, builtin_mv_options, builtin_mv_usage, 0); if (--argc < 1) diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 2a04b532f9..0a8164c5f6 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -1521,4 +1521,38 @@ test_expect_success 'checkout behaves oddly with df-conflict-2' ' test_cmp full-checkout-err sparse-index-err ' +test_expect_success 'mv' ' + init_repos && + + # test first form <source> <destination> + test_all_match git mv deep/a deep/a_mod && + test_all_match git mv deep/deeper1 deep/deeper1_mod && + test_all_match git mv deep/deeper2/deeper1/deepest2/a \ + deep/deeper2/deeper1/deepest2/a_mod && + + run_on_all git reset --hard && + + test_all_match git mv -f deep/a deep/before/a && + test_all_match git mv -f deep/before/a deep/a && + + run_on_all git reset --hard && + + test_all_match git mv -k deep/a deep/before/a && + test_all_match git mv -k deep/before/a deep/a && + + run_on_all git reset --hard && + + test_all_match git mv -v deep/a deep/a_mod && + test_all_match git mv -v deep/deeper1 deep/deeper1_mod && + test_all_match git mv -v deep/deeper2/deeper1/deepest2/a \ + deep/deeper2/deeper1/deepest2/a_mod && + + # test second form <source> ... <destination directory> + run_on_all git reset --hard && + run_on_all mkdir deep/folder && + test_all_match git mv deep/a deep/folder && + test_all_match git mv -v deep/deeper1 deep/folder && + test_all_match git mv -f deep/deeper2/deeper1/deepest2/a deep/folder +' + test_done
Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> --- builtin/mv.c | 3 +++ t/t1092-sparse-checkout-compatibility.sh | 34 ++++++++++++++++++++++++ 2 files changed, 37 insertions(+)