Message ID | pull.973.git.1624932786.gitgitgadget@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | Sparse index: integrate with commit and checkout | expand |
On Mon, Jun 28, 2021 at 7:13 PM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > > This series extends our integration of sparse-index to 'git commit' and 'git > checkout'. > > This is based on ds/status-with-sparse-index (v7) and v2.32.0. The hard work > was already done in that topic, so these changes are simple. > > Recall that we have delayed our integration with 'git add' until we can work > out the concerns about how to deal with pathspecs outside of the > sparse-checkout definition. Those concerns might have some overlap with how > 'git commit' takes a pathspec, but this seems like a rare enough case to > handle here and we can be more careful with the behavior change in the next > series which will integrate with git add. > > In addition to the tests that already exist in t1092, I have integrated > these changes in microsoft/git and tested them against the Scalar functional > tests, which go through quite a few complicated scenarios, verifying that > things work the same across the full index and sparse-index cases. > > Thanks, -Stolee > > Derrick Stolee (5): > p2000: add 'git checkout -' test and decrease depth > p2000: compress repo names > commit: integrate with sparse-index > sparse-index: recompute cache-tree > checkout: stop expanding sparse indexes > > builtin/checkout.c | 8 ++-- > builtin/commit.c | 3 ++ > cache-tree.c | 2 - > sparse-index.c | 2 + > t/perf/p2000-sparse-operations.sh | 47 ++++++++++++-------- > t/t1092-sparse-checkout-compatibility.sh | 55 ++++++++++++++++++++++-- > 6 files changed, 89 insertions(+), 28 deletions(-) > > > base-commit: 1d744848ee6b58ccaf3a30f20abe9797ed5d2ce7 > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-973%2Fderrickstolee%2Fsparse-index%2Fcommit-and-checkout-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-973/derrickstolee/sparse-index/commit-and-checkout-v1 > Pull-Request: https://github.com/gitgitgadget/git/pull/973 I've read over these patches and didn't find any problems in them in my reading; however, since it builds upon ds/status-with-sparse-index (v7)... I decided to retry some of my ideas and testing on Patch 10/16 of v7, over at https://lore.kernel.org/git/CABPp-BHwTAKwFiWQ0-2P=_g+7HLK5FfOAz-uujRjLou1fXT3zw@mail.gmail.com/ It turns out that the block you added there is now triggered by t1092 after this series, and the testcase won't pass without that block. It might be clearer to move that code fragment, or perhaps the whole patch, into this series...though the code fragment as is has introduced a bug. If you take t1092 test 12 ("diff with directory/file conflicts") and modify it so that before the git checkout df-conflict invocation from sparse-index, you first run: $ git sparse-checkout disable $ echo more stuff >>folder1/edited-content $ git add -u $ git diff HEAD # note the changes $ git sparse-checkout init --cone --sparse-index $ git diff HEAD # note the changes are still there $ git checkout df-conflict # no error?? What about the conflicting changes? $ git diff HEAD then the last command will show that the staged changes from before the commit have simply been discarded. In short, this makes the series behave like --force was passed with sparse directory entries, when --force wasn't passed. So we've still got some directory/file conflict issues.
On 7/9/2021 5:26 PM, Elijah Newren wrote: > On Mon, Jun 28, 2021 at 7:13 PM Derrick Stolee via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> ... > I've read over these patches and didn't find any problems in them in > my reading; however, since it builds upon ds/status-with-sparse-index > (v7)... > > I decided to retry some of my ideas and testing on Patch 10/16 of v7, > over at https://lore.kernel.org/git/CABPp-BHwTAKwFiWQ0-2P=_g+7HLK5FfOAz-uujRjLou1fXT3zw@mail.gmail.com/ > > It turns out that the block you added there is now triggered by t1092 > after this series, and the testcase won't pass without that block. It > might be clearer to move that code fragment, or perhaps the whole > patch, into this series...though the code fragment as is has > introduced a bug. If you take t1092 test 12 ("diff with > directory/file conflicts") and modify it so that before the > > git checkout df-conflict > > invocation from sparse-index, you first run: > > $ git sparse-checkout disable > $ echo more stuff >>folder1/edited-content > $ git add -u > $ git diff HEAD # note the changes > $ git sparse-checkout init --cone --sparse-index > $ git diff HEAD # note the changes are still there > $ git checkout df-conflict # no error?? What about the > conflicting changes? > $ git diff HEAD > > then the last command will show that the staged changes from before > the commit have simply been discarded. In short, this makes the > series behave like --force was passed with sparse directory entries, > when --force wasn't passed. > > So we've still got some directory/file conflict issues. You are absolutely right that this seems strange. In fact, there is a behavior change during 'git checkout' for sparse-checkouts in general, but also my sparse-index change creates an additional change in this case. Here is a test that demonstrates the issue: test_expect_success 'staged directory/file conflict' ' init_repos && test_sparse_match git sparse-checkout disable && write_script edit-contents <<-\EOF && echo text >>folder1/edited-content EOF run_on_all ../edit-contents && test_all_match git add folder1/edited-content && test_all_match git diff HEAD && git -C sparse-checkout sparse-checkout init --cone --no-sparse-index && git -C sparse-index sparse-checkout init --cone --sparse-index && test_all_match git diff HEAD && # Sparse-checkouts handle this conflict differently than # full checkouts, as they consider the file "folder1" to # be deleted in favor of the staged file # "folder1/edited-content". test_sparse_match git checkout df-conflict && test_sparse_match git diff HEAD ' The sparse-index case drops all staged changes during the 'git checkout df-conflict' command, so the test fails on that line. That final diff looks like this in the sparse-checkout repo (no sparse index): diff --git a/folder1 b/folder1 deleted file mode 100644 index d95f3ad..0000000 --- a/folder1 +++ /dev/null @@ -1 +0,0 @@ -content diff --git a/folder1/edited-content b/folder1/edited-content new file mode 100644 index 0000000..8e27be7 --- /dev/null +++ b/folder1/edited-content @@ -0,0 +1 @@ +text This is a strange case in that we have a staged tree that is outside of the sparse-checkout cone. When running the 'git checkout df-conflict' command, the twoway_merge() method receives the following values: current: "folder1/" (tree OID) oldtree: "" (NULL OID) newtree: "folder1" (blob OID) Is this value for 'oldtree' correct? It seems strange to me, so I'll look further into it. Clearly, the resolution that was presented in the previous patch was incorrect so I will try to understand this situation better. Further, I expect it to be simpler to modify the behavior here to match the full checkout case than to make the sparse-index case match the normal sparse-checkout case. The "natural" thing would be to keep the staged "folder1/" directory, but that would present as adding all contained content, not just the single staged entry. Thanks, -Stolee
On 7/12/2021 2:46 PM, Derrick Stolee wrote: > On 7/9/2021 5:26 PM, Elijah Newren wrote: >> On Mon, Jun 28, 2021 at 7:13 PM Derrick Stolee via GitGitGadget >> <gitgitgadget@gmail.com> wrote: >>> ... > This is a strange case in that we have a staged tree that is > outside of the sparse-checkout cone. When running the 'git > checkout df-conflict' command, the twoway_merge() method > receives the following values: > > current: "folder1/" (tree OID) > oldtree: "" (NULL OID) > newtree: "folder1" (blob OID) > > Is this value for 'oldtree' correct? It seems strange to me, > so I'll look further into it. This is correct. This 'oldtree' entry is actually the o->df_conflict placeholder and is set to NULL inside the method. > Further, I expect it to be simpler to modify the behavior > here to match the full checkout case than to make the > sparse-index case match the normal sparse-checkout case. > The "natural" thing would be to keep the staged "folder1/" > directory, but that would present as adding all contained > content, not just the single staged entry. Taking a closer look at the full checkout case, I discovered that the 'git checkout df-conflict' command succeeds in the full checkout case if I apply it directly to the 'master' branch. In that situation, it completely removes the staged change to folder1/edited-content! This seems like incorrect behavior, and has nothing to do with the sparse-checkout feature. It just happens that a sparse-checkout will have a _different_ kind of incorrect behavior! However, when adding the test on top of the ds/status-with-sparse-index branch, the full checkout case matches the sparse-checkout! I bisected this to the additions of files adjacent to folder1/ (folder1. folder1-, etc) in e669ffb (t1092: expand repository data shape, 2021-07-14). If I switch the test to conflict on folder2, then I get the strange behavior that I was noticing on 'master'. Some very subtle things are going on here, and they don't necessarily involve the sparse index. Adding the sparse index to the mix creates a third incorrect behavior to this already-broken case. If we agree that the correct thing to do here is to reject the merge and fail the command, then I can start working on making that change in isolation (because _none_ of the existing behaviors are correct). That leaves a question as to whether we should hold up this series for that reason, or if I should pursue a fix to this kind of conflict as a forward fix on top of it. What do you think, Elijah and Junio? Thanks, -Stolee
On Fri, Jul 16, 2021 at 6:59 AM Derrick Stolee <stolee@gmail.com> wrote: > > On 7/12/2021 2:46 PM, Derrick Stolee wrote: > > On 7/9/2021 5:26 PM, Elijah Newren wrote: > >> On Mon, Jun 28, 2021 at 7:13 PM Derrick Stolee via GitGitGadget > >> <gitgitgadget@gmail.com> wrote: > >>> > ... > > Further, I expect it to be simpler to modify the behavior > > here to match the full checkout case than to make the > > sparse-index case match the normal sparse-checkout case. > > The "natural" thing would be to keep the staged "folder1/" > > directory, but that would present as adding all contained > > content, not just the single staged entry. > Taking a closer look at the full checkout case, I discovered that the > 'git checkout df-conflict' command succeeds in the full checkout case if I > apply it directly to the 'master' branch. In that situation, it completely > removes the staged change to folder1/edited-content! This seems like > incorrect behavior, and has nothing to do with the sparse-checkout feature. I was not able to reproduce. Do you have other modifications to git, or is there some other special setup required to trigger the bug that I am missing in reading the paragraph above? Here's what I see: <Add an "exit 1 &&" right after "init_repos &&" in the 'diff with directory/file conflicts' test, run until first failure, then: $ cd trash directory.t1092-sparse-checkout-compatibility/full-checkout $ git reset --hard $ git checkout rename-in-to-out $ echo more stuff >>folder1/edited-content $ git add -u $ git checkout df-conflict error: Your local changes to the following files would be overwritten by checkout: folder1/edited-content Please commit your changes or stash them before you switch branches. Aborting This looks like the expected behavior to me, and is what I'd also expect from the sparse-checkout and sparse-index cases. > It just happens that a sparse-checkout will have a _different_ kind of > incorrect behavior! > > However, when adding the test on top of the ds/status-with-sparse-index > branch, the full checkout case matches the sparse-checkout! I bisected > this to the additions of files adjacent to folder1/ (folder1. folder1-, > etc) in e669ffb (t1092: expand repository data shape, 2021-07-14). If I > switch the test to conflict on folder2, then I get the strange behavior > that I was noticing on 'master'. > > Some very subtle things are going on here, and they don't necessarily > involve the sparse index. Adding the sparse index to the mix creates a > third incorrect behavior to this already-broken case. > > If we agree that the correct thing to do here is to reject the merge and > fail the command, then I can start working on making that change in > isolation (because _none_ of the existing behaviors are correct). Yes, rejecting the merge is the correct behavior. This is implied by the existing documentation for both the --merge and --force options to checkout. > That leaves a question as to whether we should hold up this series for > that reason, or if I should pursue a fix to this kind of conflict as a > forward fix on top of it. What do you think, Elijah and Junio? I only dug in and found the sparse-checkout/sparse-index bugs because the D/F changes you made to twoway_merge() looked clearly wrong to me and I was trying to find a case that would demonstrate it and make it easier for you to fix up. I still think the patch is wrong and that it adds a bug. If you can drop that patch, and still get correct behavior in your tests, then I think we can ignore other bugs in this area, but I'm not happy with that particular patch. If you need that patch, then it needs to be corrected, which probably means figuring out all these bugs.
On 7/17/2021 11:37 AM, Elijah Newren wrote: > On Fri, Jul 16, 2021 at 6:59 AM Derrick Stolee <stolee@gmail.com> wrote: >> >> On 7/12/2021 2:46 PM, Derrick Stolee wrote: >>> On 7/9/2021 5:26 PM, Elijah Newren wrote: >>>> On Mon, Jun 28, 2021 at 7:13 PM Derrick Stolee via GitGitGadget >>>> <gitgitgadget@gmail.com> wrote: >>>>> >> ... >>> Further, I expect it to be simpler to modify the behavior >>> here to match the full checkout case than to make the >>> sparse-index case match the normal sparse-checkout case. >>> The "natural" thing would be to keep the staged "folder1/" >>> directory, but that would present as adding all contained >>> content, not just the single staged entry. >> Taking a closer look at the full checkout case, I discovered that the >> 'git checkout df-conflict' command succeeds in the full checkout case if I >> apply it directly to the 'master' branch. In that situation, it completely >> removes the staged change to folder1/edited-content! This seems like >> incorrect behavior, and has nothing to do with the sparse-checkout feature. > > I was not able to reproduce. Do you have other modifications to git, > or is there some other special setup required to trigger the bug that > I am missing in reading the paragraph above? Here's what I see: > > <Add an "exit 1 &&" right after "init_repos &&" in the 'diff with > directory/file conflicts' test, run until first failure, then: > > $ cd trash directory.t1092-sparse-checkout-compatibility/full-checkout > $ git reset --hard > $ git checkout rename-in-to-out > $ echo more stuff >>folder1/edited-content > $ git add -u > $ git checkout df-conflict > error: Your local changes to the following files would be overwritten > by checkout: > folder1/edited-content > Please commit your changes or stash them before you switch branches. > Aborting > > This looks like the expected behavior to me, and is what I'd also > expect from the sparse-checkout and sparse-index cases. It is fragile to the data shape in my test, so I'll be sure to include one in the next series version that demonstrates the change. >> It just happens that a sparse-checkout will have a _different_ kind of >> incorrect behavior! >> >> However, when adding the test on top of the ds/status-with-sparse-index >> branch, the full checkout case matches the sparse-checkout! I bisected >> this to the additions of files adjacent to folder1/ (folder1. folder1-, >> etc) in e669ffb (t1092: expand repository data shape, 2021-07-14). If I >> switch the test to conflict on folder2, then I get the strange behavior >> that I was noticing on 'master'. >> >> Some very subtle things are going on here, and they don't necessarily >> involve the sparse index. Adding the sparse index to the mix creates a >> third incorrect behavior to this already-broken case. >> >> If we agree that the correct thing to do here is to reject the merge and >> fail the command, then I can start working on making that change in >> isolation (because _none_ of the existing behaviors are correct). > > Yes, rejecting the merge is the correct behavior. This is implied by > the existing documentation for both the --merge and --force options to > checkout. > >> That leaves a question as to whether we should hold up this series for >> that reason, or if I should pursue a fix to this kind of conflict as a >> forward fix on top of it. What do you think, Elijah and Junio? > > I only dug in and found the sparse-checkout/sparse-index bugs because > the D/F changes you made to twoway_merge() looked clearly wrong to me > and I was trying to find a case that would demonstrate it and make it > easier for you to fix up. I still think the patch is wrong and that > it adds a bug. If you can drop that patch, and still get correct > behavior in your tests, then I think we can ignore other bugs in this > area, but I'm not happy with that particular patch. If you need that > patch, then it needs to be corrected, which probably means figuring > out all these bugs. That's a good point. I reverted the patch and re-ran the test and found that actually the patch is necessary in order to match the _incorrect_ behavior. Without the patch, the sparse-index case (correctly) refuses to complete the checkout. I'll replace this patch with a test change that demonstrates these subtleties and marks them as NEEDSWORK. Thanks, -Stolee