mbox series

[0/5] Sparse index: integrate with commit and checkout

Message ID pull.973.git.1624932786.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Sparse index: integrate with commit and checkout | expand

Message

Philippe Blain via GitGitGadget June 29, 2021, 2:13 a.m. UTC
This series extends our integration of sparse-index to 'git commit' and 'git
checkout'.

This is based on ds/status-with-sparse-index (v7) and v2.32.0. The hard work
was already done in that topic, so these changes are simple.

Recall that we have delayed our integration with 'git add' until we can work
out the concerns about how to deal with pathspecs outside of the
sparse-checkout definition. Those concerns might have some overlap with how
'git commit' takes a pathspec, but this seems like a rare enough case to
handle here and we can be more careful with the behavior change in the next
series which will integrate with git add.

In addition to the tests that already exist in t1092, I have integrated
these changes in microsoft/git and tested them against the Scalar functional
tests, which go through quite a few complicated scenarios, verifying that
things work the same across the full index and sparse-index cases.

Thanks, -Stolee

Derrick Stolee (5):
  p2000: add 'git checkout -' test and decrease depth
  p2000: compress repo names
  commit: integrate with sparse-index
  sparse-index: recompute cache-tree
  checkout: stop expanding sparse indexes

 builtin/checkout.c                       |  8 ++--
 builtin/commit.c                         |  3 ++
 cache-tree.c                             |  2 -
 sparse-index.c                           |  2 +
 t/perf/p2000-sparse-operations.sh        | 47 ++++++++++++--------
 t/t1092-sparse-checkout-compatibility.sh | 55 ++++++++++++++++++++++--
 6 files changed, 89 insertions(+), 28 deletions(-)


base-commit: 1d744848ee6b58ccaf3a30f20abe9797ed5d2ce7
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-973%2Fderrickstolee%2Fsparse-index%2Fcommit-and-checkout-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-973/derrickstolee/sparse-index/commit-and-checkout-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/973

Comments

Elijah Newren July 9, 2021, 9:26 p.m. UTC | #1
On Mon, Jun 28, 2021 at 7:13 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This series extends our integration of sparse-index to 'git commit' and 'git
> checkout'.
>
> This is based on ds/status-with-sparse-index (v7) and v2.32.0. The hard work
> was already done in that topic, so these changes are simple.
>
> Recall that we have delayed our integration with 'git add' until we can work
> out the concerns about how to deal with pathspecs outside of the
> sparse-checkout definition. Those concerns might have some overlap with how
> 'git commit' takes a pathspec, but this seems like a rare enough case to
> handle here and we can be more careful with the behavior change in the next
> series which will integrate with git add.
>
> In addition to the tests that already exist in t1092, I have integrated
> these changes in microsoft/git and tested them against the Scalar functional
> tests, which go through quite a few complicated scenarios, verifying that
> things work the same across the full index and sparse-index cases.
>
> Thanks, -Stolee
>
> Derrick Stolee (5):
>   p2000: add 'git checkout -' test and decrease depth
>   p2000: compress repo names
>   commit: integrate with sparse-index
>   sparse-index: recompute cache-tree
>   checkout: stop expanding sparse indexes
>
>  builtin/checkout.c                       |  8 ++--
>  builtin/commit.c                         |  3 ++
>  cache-tree.c                             |  2 -
>  sparse-index.c                           |  2 +
>  t/perf/p2000-sparse-operations.sh        | 47 ++++++++++++--------
>  t/t1092-sparse-checkout-compatibility.sh | 55 ++++++++++++++++++++++--
>  6 files changed, 89 insertions(+), 28 deletions(-)
>
>
> base-commit: 1d744848ee6b58ccaf3a30f20abe9797ed5d2ce7
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-973%2Fderrickstolee%2Fsparse-index%2Fcommit-and-checkout-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-973/derrickstolee/sparse-index/commit-and-checkout-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/973

I've read over these patches and didn't find any problems in them in
my reading; however, since it builds upon ds/status-with-sparse-index
(v7)...

I decided to retry some of my ideas and testing on Patch 10/16 of v7,
over at https://lore.kernel.org/git/CABPp-BHwTAKwFiWQ0-2P=_g+7HLK5FfOAz-uujRjLou1fXT3zw@mail.gmail.com/

It turns out that the block you added there is now triggered by t1092
after this series, and the testcase won't pass without that block.  It
might be clearer to move that code fragment, or perhaps the whole
patch, into this series...though the code fragment as is has
introduced a bug.  If you take t1092 test 12 ("diff with
directory/file conflicts") and modify it so that before the

    git checkout df-conflict

invocation from sparse-index, you first run:

    $ git sparse-checkout disable
    $ echo more stuff >>folder1/edited-content
    $ git add -u
    $ git diff HEAD   # note the changes
    $ git sparse-checkout init --cone --sparse-index
    $ git diff HEAD   # note the changes are still there
    $ git checkout df-conflict  # no error??  What about the
conflicting changes?
    $ git diff HEAD

then the last command will show that the staged changes from before
the commit have simply been discarded.  In short, this makes the
series behave like --force was passed with sparse directory entries,
when --force wasn't passed.

So we've still got some directory/file conflict issues.
Derrick Stolee July 12, 2021, 6:46 p.m. UTC | #2
On 7/9/2021 5:26 PM, Elijah Newren wrote:
> On Mon, Jun 28, 2021 at 7:13 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
...
> I've read over these patches and didn't find any problems in them in
> my reading; however, since it builds upon ds/status-with-sparse-index
> (v7)...
> 
> I decided to retry some of my ideas and testing on Patch 10/16 of v7,
> over at https://lore.kernel.org/git/CABPp-BHwTAKwFiWQ0-2P=_g+7HLK5FfOAz-uujRjLou1fXT3zw@mail.gmail.com/
> 
> It turns out that the block you added there is now triggered by t1092
> after this series, and the testcase won't pass without that block.  It
> might be clearer to move that code fragment, or perhaps the whole
> patch, into this series...though the code fragment as is has
> introduced a bug.  If you take t1092 test 12 ("diff with
> directory/file conflicts") and modify it so that before the
> 
>     git checkout df-conflict
> 
> invocation from sparse-index, you first run:
> 
>     $ git sparse-checkout disable
>     $ echo more stuff >>folder1/edited-content
>     $ git add -u
>     $ git diff HEAD   # note the changes
>     $ git sparse-checkout init --cone --sparse-index
>     $ git diff HEAD   # note the changes are still there
>     $ git checkout df-conflict  # no error??  What about the
> conflicting changes?
>     $ git diff HEAD
> 
> then the last command will show that the staged changes from before
> the commit have simply been discarded.  In short, this makes the
> series behave like --force was passed with sparse directory entries,
> when --force wasn't passed.
> 
> So we've still got some directory/file conflict issues.

You are absolutely right that this seems strange. In fact, there
is a behavior change during 'git checkout' for sparse-checkouts
in general, but also my sparse-index change creates an additional
change in this case.

Here is a test that demonstrates the issue:

test_expect_success 'staged directory/file conflict' '
	init_repos &&

	test_sparse_match git sparse-checkout disable &&
	write_script edit-contents <<-\EOF &&
	echo text >>folder1/edited-content
	EOF
	run_on_all ../edit-contents &&

	test_all_match git add folder1/edited-content &&
	test_all_match git diff HEAD &&
	git -C sparse-checkout sparse-checkout init --cone --no-sparse-index &&
	git -C sparse-index sparse-checkout init --cone --sparse-index &&
	test_all_match git diff HEAD &&

	# Sparse-checkouts handle this conflict differently than
	# full checkouts, as they consider the file "folder1" to
	# be deleted in favor of the staged file
	# "folder1/edited-content".
	test_sparse_match git checkout df-conflict &&
	test_sparse_match git diff HEAD
'

The sparse-index case drops all staged changes during the
'git checkout df-conflict' command, so the test fails on
that line.

That final diff looks like this in the sparse-checkout
repo (no sparse index):

diff --git a/folder1 b/folder1
deleted file mode 100644
index d95f3ad..0000000
--- a/folder1
+++ /dev/null
@@ -1 +0,0 @@
-content
diff --git a/folder1/edited-content b/folder1/edited-content
new file mode 100644
index 0000000..8e27be7
--- /dev/null
+++ b/folder1/edited-content
@@ -0,0 +1 @@
+text

This is a strange case in that we have a staged tree that is
outside of the sparse-checkout cone. When running the 'git
checkout df-conflict' command, the twoway_merge() method
receives the following values:

 current: "folder1/" (tree OID)
 oldtree: "" (NULL OID)
 newtree: "folder1" (blob OID)

Is this value for 'oldtree' correct? It seems strange to me,
so I'll look further into it.

Clearly, the resolution that was presented in the previous
patch was incorrect so I will try to understand this
situation better.

Further, I expect it to be simpler to modify the behavior
here to match the full checkout case than to make the
sparse-index case match the normal sparse-checkout case.
The "natural" thing would be to keep the staged "folder1/"
directory, but that would present as adding all contained
content, not just the single staged entry.

Thanks,
-Stolee
Derrick Stolee July 16, 2021, 1:59 p.m. UTC | #3
On 7/12/2021 2:46 PM, Derrick Stolee wrote:
> On 7/9/2021 5:26 PM, Elijah Newren wrote:
>> On Mon, Jun 28, 2021 at 7:13 PM Derrick Stolee via GitGitGadget
>> <gitgitgadget@gmail.com> wrote:
>>>
...
> This is a strange case in that we have a staged tree that is
> outside of the sparse-checkout cone. When running the 'git
> checkout df-conflict' command, the twoway_merge() method
> receives the following values:
> 
>  current: "folder1/" (tree OID)
>  oldtree: "" (NULL OID)
>  newtree: "folder1" (blob OID)
> 
> Is this value for 'oldtree' correct? It seems strange to me,
> so I'll look further into it.

This is correct. This 'oldtree' entry is actually the o->df_conflict
placeholder and is set to NULL inside the method.

> Further, I expect it to be simpler to modify the behavior
> here to match the full checkout case than to make the
> sparse-index case match the normal sparse-checkout case.
> The "natural" thing would be to keep the staged "folder1/"
> directory, but that would present as adding all contained
> content, not just the single staged entry.
Taking a closer look at the full checkout case, I discovered that the
'git checkout df-conflict' command succeeds in the full checkout case if I
apply it directly to the 'master' branch. In that situation, it completely
removes the staged change to folder1/edited-content! This seems like
incorrect behavior, and has nothing to do with the sparse-checkout feature.

It just happens that a sparse-checkout will have a _different_ kind of
incorrect behavior!

However, when adding the test on top of the ds/status-with-sparse-index
branch, the full checkout case matches the sparse-checkout! I bisected
this to the additions of files adjacent to folder1/ (folder1. folder1-,
etc) in e669ffb (t1092: expand repository data shape, 2021-07-14). If I
switch the test to conflict on folder2, then I get the strange behavior
that I was noticing on 'master'.

Some very subtle things are going on here, and they don't necessarily
involve the sparse index. Adding the sparse index to the mix creates a
third incorrect behavior to this already-broken case.

If we agree that the correct thing to do here is to reject the merge and
fail the command, then I can start working on making that change in
isolation (because _none_ of the existing behaviors are correct).

That leaves a question as to whether we should hold up this series for
that reason, or if I should pursue a fix to this kind of conflict as a
forward fix on top of it. What do you think, Elijah and Junio?

Thanks,
-Stolee
Elijah Newren July 17, 2021, 3:37 p.m. UTC | #4
On Fri, Jul 16, 2021 at 6:59 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 7/12/2021 2:46 PM, Derrick Stolee wrote:
> > On 7/9/2021 5:26 PM, Elijah Newren wrote:
> >> On Mon, Jun 28, 2021 at 7:13 PM Derrick Stolee via GitGitGadget
> >> <gitgitgadget@gmail.com> wrote:
> >>>
> ...
> > Further, I expect it to be simpler to modify the behavior
> > here to match the full checkout case than to make the
> > sparse-index case match the normal sparse-checkout case.
> > The "natural" thing would be to keep the staged "folder1/"
> > directory, but that would present as adding all contained
> > content, not just the single staged entry.
> Taking a closer look at the full checkout case, I discovered that the
> 'git checkout df-conflict' command succeeds in the full checkout case if I
> apply it directly to the 'master' branch. In that situation, it completely
> removes the staged change to folder1/edited-content! This seems like
> incorrect behavior, and has nothing to do with the sparse-checkout feature.

I was not able to reproduce.  Do you have other modifications to git,
or is there some other special setup required to trigger the bug that
I am missing in reading the paragraph above?  Here's what I see:

<Add an "exit 1 &&" right after "init_repos &&" in the 'diff with
directory/file conflicts' test, run until first failure, then:

$ cd trash directory.t1092-sparse-checkout-compatibility/full-checkout
$ git reset --hard
$ git checkout rename-in-to-out
$ echo more stuff >>folder1/edited-content
$ git add -u
$ git checkout df-conflict
error: Your local changes to the following files would be overwritten
by checkout:
folder1/edited-content
Please commit your changes or stash them before you switch branches.
Aborting

This looks like the expected behavior to me, and is what I'd also
expect from the sparse-checkout and sparse-index cases.

> It just happens that a sparse-checkout will have a _different_ kind of
> incorrect behavior!
>
> However, when adding the test on top of the ds/status-with-sparse-index
> branch, the full checkout case matches the sparse-checkout! I bisected
> this to the additions of files adjacent to folder1/ (folder1. folder1-,
> etc) in e669ffb (t1092: expand repository data shape, 2021-07-14). If I
> switch the test to conflict on folder2, then I get the strange behavior
> that I was noticing on 'master'.
>
> Some very subtle things are going on here, and they don't necessarily
> involve the sparse index. Adding the sparse index to the mix creates a
> third incorrect behavior to this already-broken case.
>
> If we agree that the correct thing to do here is to reject the merge and
> fail the command, then I can start working on making that change in
> isolation (because _none_ of the existing behaviors are correct).

Yes, rejecting the merge is the correct behavior.  This is implied by
the existing documentation for both the --merge and --force options to
checkout.

> That leaves a question as to whether we should hold up this series for
> that reason, or if I should pursue a fix to this kind of conflict as a
> forward fix on top of it. What do you think, Elijah and Junio?

I only dug in and found the sparse-checkout/sparse-index bugs because
the D/F changes you made to twoway_merge() looked clearly wrong to me
and I was trying to find a case that would demonstrate it and make it
easier for you to fix up.  I still think the patch is wrong and that
it adds a bug.  If you can drop that patch, and still get correct
behavior in your tests, then I think we can ignore other bugs in this
area, but I'm not happy with that particular patch.  If you need that
patch, then it needs to be corrected, which probably means figuring
out all these bugs.
Derrick Stolee July 19, 2021, 2:05 p.m. UTC | #5
On 7/17/2021 11:37 AM, Elijah Newren wrote:
> On Fri, Jul 16, 2021 at 6:59 AM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 7/12/2021 2:46 PM, Derrick Stolee wrote:
>>> On 7/9/2021 5:26 PM, Elijah Newren wrote:
>>>> On Mon, Jun 28, 2021 at 7:13 PM Derrick Stolee via GitGitGadget
>>>> <gitgitgadget@gmail.com> wrote:
>>>>>
>> ...
>>> Further, I expect it to be simpler to modify the behavior
>>> here to match the full checkout case than to make the
>>> sparse-index case match the normal sparse-checkout case.
>>> The "natural" thing would be to keep the staged "folder1/"
>>> directory, but that would present as adding all contained
>>> content, not just the single staged entry.
>> Taking a closer look at the full checkout case, I discovered that the
>> 'git checkout df-conflict' command succeeds in the full checkout case if I
>> apply it directly to the 'master' branch. In that situation, it completely
>> removes the staged change to folder1/edited-content! This seems like
>> incorrect behavior, and has nothing to do with the sparse-checkout feature.
> 
> I was not able to reproduce.  Do you have other modifications to git,
> or is there some other special setup required to trigger the bug that
> I am missing in reading the paragraph above?  Here's what I see:
> 
> <Add an "exit 1 &&" right after "init_repos &&" in the 'diff with
> directory/file conflicts' test, run until first failure, then:
> 
> $ cd trash directory.t1092-sparse-checkout-compatibility/full-checkout
> $ git reset --hard
> $ git checkout rename-in-to-out
> $ echo more stuff >>folder1/edited-content
> $ git add -u
> $ git checkout df-conflict
> error: Your local changes to the following files would be overwritten
> by checkout:
> folder1/edited-content
> Please commit your changes or stash them before you switch branches.
> Aborting
> 
> This looks like the expected behavior to me, and is what I'd also
> expect from the sparse-checkout and sparse-index cases.

It is fragile to the data shape in my test, so I'll be sure to
include one in the next series version that demonstrates the change.

>> It just happens that a sparse-checkout will have a _different_ kind of
>> incorrect behavior!
>>
>> However, when adding the test on top of the ds/status-with-sparse-index
>> branch, the full checkout case matches the sparse-checkout! I bisected
>> this to the additions of files adjacent to folder1/ (folder1. folder1-,
>> etc) in e669ffb (t1092: expand repository data shape, 2021-07-14). If I
>> switch the test to conflict on folder2, then I get the strange behavior
>> that I was noticing on 'master'.
>>
>> Some very subtle things are going on here, and they don't necessarily
>> involve the sparse index. Adding the sparse index to the mix creates a
>> third incorrect behavior to this already-broken case.
>>
>> If we agree that the correct thing to do here is to reject the merge and
>> fail the command, then I can start working on making that change in
>> isolation (because _none_ of the existing behaviors are correct).
> 
> Yes, rejecting the merge is the correct behavior.  This is implied by
> the existing documentation for both the --merge and --force options to
> checkout.
> 
>> That leaves a question as to whether we should hold up this series for
>> that reason, or if I should pursue a fix to this kind of conflict as a
>> forward fix on top of it. What do you think, Elijah and Junio?
> 
> I only dug in and found the sparse-checkout/sparse-index bugs because
> the D/F changes you made to twoway_merge() looked clearly wrong to me
> and I was trying to find a case that would demonstrate it and make it
> easier for you to fix up.  I still think the patch is wrong and that
> it adds a bug.  If you can drop that patch, and still get correct
> behavior in your tests, then I think we can ignore other bugs in this
> area, but I'm not happy with that particular patch.  If you need that
> patch, then it needs to be corrected, which probably means figuring
> out all these bugs.

That's a good point. I reverted the patch and re-ran the test and
found that actually the patch is necessary in order to match the
_incorrect_ behavior. Without the patch, the sparse-index case
(correctly) refuses to complete the checkout.

I'll replace this patch with a test change that demonstrates these
subtleties and marks them as NEEDSWORK.

Thanks,
-Stolee