diff mbox series

[v7,10/16] unpack-trees: handle dir/file conflict of sparse entries

Message ID 9f31c691af6780f0ea48bdcb5ff6d56b628f1a81.1624932294.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Sparse-index: integrate with status | expand

Commit Message

Derrick Stolee June 29, 2021, 2:04 a.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t1092-sparse-checkout-compatibility.sh | 25 ++++++++++++++++++++++--
 unpack-trees.c                           |  5 ++++-
 2 files changed, 27 insertions(+), 3 deletions(-)

Comments

Elijah Newren July 7, 2021, 11:19 p.m. UTC | #1
On Mon, Jun 28, 2021 at 7:05 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/t1092-sparse-checkout-compatibility.sh | 25 ++++++++++++++++++++++--
>  unpack-trees.c                           |  5 ++++-
>  2 files changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 3f61e5686b5..4e6446e7545 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -95,6 +95,19 @@ test_expect_success 'setup' '
>                 git add . &&
>                 git commit -m "rename deep/deeper1/... to folder1/..." &&
>
> +               git checkout -b df-conflict base &&
> +               rm -rf folder1 &&
> +               echo content >folder1 &&
> +               git add . &&
> +               git commit -m df &&
> +
> +               git checkout -b fd-conflict base &&
> +               rm a &&
> +               mkdir a &&
> +               echo content >a/a &&
> +               git add . &&
> +               git commit -m fd &&
> +
>                 git checkout -b deepest base &&
>                 echo "updated deepest" >deep/deeper1/deepest/a &&
>                 git commit -a -m "update deepest" &&
> @@ -325,7 +338,11 @@ test_expect_success 'diff --staged' '
>  test_expect_success 'diff with renames and conflicts' '
>         init_repos &&
>
> -       for branch in rename-out-to-out rename-out-to-in rename-in-to-out
> +       for branch in rename-out-to-out \
> +                     rename-out-to-in \
> +                     rename-in-to-out \
> +                     df-conflict \
> +                     fd-conflict
>         do
>                 test_all_match git checkout rename-base &&
>                 test_all_match git checkout $branch -- .&&
> @@ -338,7 +355,11 @@ test_expect_success 'diff with renames and conflicts' '
>  test_expect_success 'diff with directory/file conflicts' '
>         init_repos &&
>
> -       for branch in rename-out-to-out rename-out-to-in rename-in-to-out
> +       for branch in rename-out-to-out \
> +                     rename-out-to-in \
> +                     rename-in-to-out \
> +                     df-conflict \
> +                     fd-conflict
>         do
>                 git -C full-checkout reset --hard &&
>                 test_sparse_match git reset --hard &&

Tests look good...

> diff --git a/unpack-trees.c b/unpack-trees.c
> index d141dffbd94..e63b2dcacbc 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -2617,7 +2617,10 @@ int twoway_merge(const struct cache_entry * const *src,
>                          same(current, oldtree) && !same(current, newtree)) {
>                         /* 20 or 21 */
>                         return merged_entry(newtree, current, o);
> -               } else
> +               } else if (current && !oldtree && newtree &&
> +                          S_ISSPARSEDIR(current->ce_mode) != S_ISSPARSEDIR(newtree->ce_mode))
> +                       return merged_entry(newtree, current, o);
> +               else
>                         return reject_merge(current, o);
>         }
>         else if (newtree) {

This seems wrong to me but I'm having a hard time nailing down a
testcase to prove it.  The logic looks to me like "if the old tree as
nothing in the index at the given path, and the newtree has something,
and the index had something staged, but the newtree and staged index
entry disagree on the type of the object, do some weird merged_entry()
logic on both types of trees that tends to just take the newer I
thought but who knows what functions like verify_uptodate(entry) do
when entry is a sparse directory...".

So, I'm not so sure about this.  Could you explain this a bit more?

However, I did find a testcase that aborts with a fatal error...though
I can't tell if it's even triggering the above logic; I think it isn't
because I have an "ignoreme" on both sides of the history.  Here's the
testcase:

# Make a little test repo
git init dumb
cd dumb

# Setup old commit
touch tracked
echo foo >ignoreme
git add .
git commit -m "Initial"
git branch orig

# Setup new commit
git rm ignoreme
mkdir ignoreme
touch ignoreme/file
git add ignoreme/file
git commit -m "whatever"

# Switch to old commit
git checkout orig

# Make index != new (and index != old)
git rm ignoreme
mkdir ignoreme
echo user-data >ignoreme/file
git add ignoreme/file

# Sparsify
GIT_TEST_SPARSE_INDEX=0 # GIT_TEST_SPARSE_INDEX is documented as a boolean;
                        # but the traditional boolean value is ignored and it
                        # really only cares about set/unset.  Confusing.
git sparse-checkout init --cone --sparse-index
git sparse-checkout set tracked

# Check status and dirs/paths in index
git status --porcelain
test-tool read-cache --table
test-tool read-cache --table --expand

# Run a command that aborts with a fatal error
git checkout -m master
Elijah Newren July 9, 2021, 12:58 a.m. UTC | #2
On Wed, Jul 7, 2021 at 4:19 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Mon, Jun 28, 2021 at 7:05 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: Derrick Stolee <dstolee@microsoft.com>
> >
> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> > ---
> >  t/t1092-sparse-checkout-compatibility.sh | 25 ++++++++++++++++++++++--
> >  unpack-trees.c                           |  5 ++++-
> >  2 files changed, 27 insertions(+), 3 deletions(-)
> >
> > diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> > index 3f61e5686b5..4e6446e7545 100755
> > --- a/t/t1092-sparse-checkout-compatibility.sh
> > +++ b/t/t1092-sparse-checkout-compatibility.sh
> > @@ -95,6 +95,19 @@ test_expect_success 'setup' '
> >                 git add . &&
> >                 git commit -m "rename deep/deeper1/... to folder1/..." &&
> >
> > +               git checkout -b df-conflict base &&
> > +               rm -rf folder1 &&
> > +               echo content >folder1 &&
> > +               git add . &&
> > +               git commit -m df &&
> > +
> > +               git checkout -b fd-conflict base &&
> > +               rm a &&
> > +               mkdir a &&
> > +               echo content >a/a &&
> > +               git add . &&
> > +               git commit -m fd &&
> > +
> >                 git checkout -b deepest base &&
> >                 echo "updated deepest" >deep/deeper1/deepest/a &&
> >                 git commit -a -m "update deepest" &&
> > @@ -325,7 +338,11 @@ test_expect_success 'diff --staged' '
> >  test_expect_success 'diff with renames and conflicts' '
> >         init_repos &&
> >
> > -       for branch in rename-out-to-out rename-out-to-in rename-in-to-out
> > +       for branch in rename-out-to-out \
> > +                     rename-out-to-in \
> > +                     rename-in-to-out \
> > +                     df-conflict \
> > +                     fd-conflict
> >         do
> >                 test_all_match git checkout rename-base &&
> >                 test_all_match git checkout $branch -- .&&
> > @@ -338,7 +355,11 @@ test_expect_success 'diff with renames and conflicts' '
> >  test_expect_success 'diff with directory/file conflicts' '
> >         init_repos &&
> >
> > -       for branch in rename-out-to-out rename-out-to-in rename-in-to-out
> > +       for branch in rename-out-to-out \
> > +                     rename-out-to-in \
> > +                     rename-in-to-out \
> > +                     df-conflict \
> > +                     fd-conflict
> >         do
> >                 git -C full-checkout reset --hard &&
> >                 test_sparse_match git reset --hard &&
>
> Tests look good...
>
> > diff --git a/unpack-trees.c b/unpack-trees.c
> > index d141dffbd94..e63b2dcacbc 100644
> > --- a/unpack-trees.c
> > +++ b/unpack-trees.c
> > @@ -2617,7 +2617,10 @@ int twoway_merge(const struct cache_entry * const *src,
> >                          same(current, oldtree) && !same(current, newtree)) {
> >                         /* 20 or 21 */
> >                         return merged_entry(newtree, current, o);
> > -               } else
> > +               } else if (current && !oldtree && newtree &&
> > +                          S_ISSPARSEDIR(current->ce_mode) != S_ISSPARSEDIR(newtree->ce_mode))
> > +                       return merged_entry(newtree, current, o);
> > +               else
> >                         return reject_merge(current, o);
> >         }
> >         else if (newtree) {

t1092 still passes if you replace the
    return merged_entry(newtree, current, o);
line with
    die("This line is never hit.");

Is it possible that you thought you needed this block but further
refactoring removed the need?  Or that it is only needed by the later
ds/commit-and-checkout-with-sparse-index topic (which I haven't yet
reviewed, because I was reviewing this topic first)?  It seems this
code change should either be dropped, or moved out to the relevant
series that uses it.

> This seems wrong to me but I'm having a hard time nailing down a
> testcase to prove it.  The logic looks to me like "if the old tree as
> nothing in the index at the given path, and the newtree has something,
> and the index had something staged, but the newtree and staged index
> entry disagree on the type of the object, do some weird merged_entry()
> logic on both types of trees that tends to just take the newer I
> thought but who knows what functions like verify_uptodate(entry) do
> when entry is a sparse directory...".
>
> So, I'm not so sure about this.  Could you explain this a bit more?
>
> However, I did find a testcase that aborts with a fatal error...though
> I can't tell if it's even triggering the above logic; I think it isn't
> because I have an "ignoreme" on both sides of the history.  Here's the
> testcase:
>
> # Make a little test repo
> git init dumb
> cd dumb
>
> # Setup old commit
> touch tracked
> echo foo >ignoreme
> git add .
> git commit -m "Initial"
> git branch orig
>
> # Setup new commit
> git rm ignoreme
> mkdir ignoreme
> touch ignoreme/file
> git add ignoreme/file
> git commit -m "whatever"
>
> # Switch to old commit
> git checkout orig
>
> # Make index != new (and index != old)
> git rm ignoreme
> mkdir ignoreme
> echo user-data >ignoreme/file
> git add ignoreme/file
>
> # Sparsify
> GIT_TEST_SPARSE_INDEX=0 # GIT_TEST_SPARSE_INDEX is documented as a boolean;
>                         # but the traditional boolean value is ignored and it
>                         # really only cares about set/unset.  Confusing.
> git sparse-checkout init --cone --sparse-index
> git sparse-checkout set tracked
>
> # Check status and dirs/paths in index
> git status --porcelain
> test-tool read-cache --table
> test-tool read-cache --table --expand
>
> # Run a command that aborts with a fatal error
> git checkout -m master

It turns out that this testcase I provided still triggers the same
fatal error if you omit the --sparse-index flag, so it's not a
sparse-index-specific bug.

So, perhaps it shouldn't hold up this series, but given that a lot of
your correctness verification in t1092 relies on comparisons between
sparse checkouts and sparse indexes, it might be worth trying to get
to the root of this.
Derrick Stolee July 12, 2021, 1:46 p.m. UTC | #3
On 7/8/2021 8:58 PM, Elijah Newren wrote:
> On Wed, Jul 7, 2021 at 4:19 PM Elijah Newren <newren@gmail.com> wrote:
>>
>> On Mon, Jun 28, 2021 at 7:05 PM Derrick Stolee via GitGitGadget
>> <gitgitgadget@gmail.com> wrote:
...
>>> +               } else if (current && !oldtree && newtree &&
>>> +                          S_ISSPARSEDIR(current->ce_mode) != S_ISSPARSEDIR(newtree->ce_mode))
>>> +                       return merged_entry(newtree, current, o);
>>> +               else
>>>                         return reject_merge(current, o);
>>>         }
>>>         else if (newtree) {
> 
> t1092 still passes if you replace the
>     return merged_entry(newtree, current, o);
> line with
>     die("This line is never hit.");
> 
> Is it possible that you thought you needed this block but further
> refactoring removed the need?  Or that it is only needed by the later
> ds/commit-and-checkout-with-sparse-index topic (which I haven't yet
> reviewed, because I was reviewing this topic first)?  It seems this
> code change should either be dropped, or moved out to the relevant
> series that uses it.

I have been working with the whole stack of patches (including the
ones that update 'git add') and trying to make the necessary changes
in this series, especially when updating the test data shape or
modifying methods that were necessary for 'git status'.

It is likely I was just overcautious thinking this was necessary so
early. It might also have been necessary due to some strange case
that is only exposed in a Scalar functional test. I'll try moving it
to the next series and double check those tests.

>> This seems wrong to me but I'm having a hard time nailing down a
>> testcase to prove it.  The logic looks to me like "if the old tree as
>> nothing in the index at the given path, and the newtree has something,
>> and the index had something staged, but the newtree and staged index
>> entry disagree on the type of the object, do some weird merged_entry()
>> logic on both types of trees that tends to just take the newer I
>> thought but who knows what functions like verify_uptodate(entry) do
>> when entry is a sparse directory...".
>>
>> So, I'm not so sure about this.  Could you explain this a bit more?

The most important point is that 'current' and 'newtree' are both
present but have different types (blob and tree) and the tree is
necessarily at the edge of the sparse-checkout cone. In the cases
where I was able to trigger this logic in the debugger, 'oldtree'
was NULL, so I added as a condition to be extra cautious around
unexpected initialization. It is possible that the '!oldtree'
condition is unnecessary. I will investigate.

As for verify_uptodate(), it ignores the sparse directory unless
o->skip_sparse_checkout is set. I wonder if it is possible to
have that set and hit this case.

>> However, I did find a testcase that aborts with a fatal error...though
>> I can't tell if it's even triggering the above logic; I think it isn't
>> because I have an "ignoreme" on both sides of the history.  Here's the
>> testcase:
>>
...
>> # Run a command that aborts with a fatal error
>> git checkout -m master

Thank you for finding an interesting test!

> It turns out that this testcase I provided still triggers the same
> fatal error if you omit the --sparse-index flag, so it's not a
> sparse-index-specific bug.
> 
> So, perhaps it shouldn't hold up this series, but given that a lot of
> your correctness verification in t1092 relies on comparisons between
> sparse checkouts and sparse indexes, it might be worth trying to get
> to the root of this.
 
I have some upcoming thoughts on changing how sparse-checkout works
for other complicated cases, so I will add this to that pile.

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 3f61e5686b5..4e6446e7545 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -95,6 +95,19 @@  test_expect_success 'setup' '
 		git add . &&
 		git commit -m "rename deep/deeper1/... to folder1/..." &&
 
+		git checkout -b df-conflict base &&
+		rm -rf folder1 &&
+		echo content >folder1 &&
+		git add . &&
+		git commit -m df &&
+
+		git checkout -b fd-conflict base &&
+		rm a &&
+		mkdir a &&
+		echo content >a/a &&
+		git add . &&
+		git commit -m fd &&
+
 		git checkout -b deepest base &&
 		echo "updated deepest" >deep/deeper1/deepest/a &&
 		git commit -a -m "update deepest" &&
@@ -325,7 +338,11 @@  test_expect_success 'diff --staged' '
 test_expect_success 'diff with renames and conflicts' '
 	init_repos &&
 
-	for branch in rename-out-to-out rename-out-to-in rename-in-to-out
+	for branch in rename-out-to-out \
+		      rename-out-to-in \
+		      rename-in-to-out \
+		      df-conflict \
+		      fd-conflict
 	do
 		test_all_match git checkout rename-base &&
 		test_all_match git checkout $branch -- .&&
@@ -338,7 +355,11 @@  test_expect_success 'diff with renames and conflicts' '
 test_expect_success 'diff with directory/file conflicts' '
 	init_repos &&
 
-	for branch in rename-out-to-out rename-out-to-in rename-in-to-out
+	for branch in rename-out-to-out \
+		      rename-out-to-in \
+		      rename-in-to-out \
+		      df-conflict \
+		      fd-conflict
 	do
 		git -C full-checkout reset --hard &&
 		test_sparse_match git reset --hard &&
diff --git a/unpack-trees.c b/unpack-trees.c
index d141dffbd94..e63b2dcacbc 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -2617,7 +2617,10 @@  int twoway_merge(const struct cache_entry * const *src,
 			 same(current, oldtree) && !same(current, newtree)) {
 			/* 20 or 21 */
 			return merged_entry(newtree, current, o);
-		} else
+		} else if (current && !oldtree && newtree &&
+			   S_ISSPARSEDIR(current->ce_mode) != S_ISSPARSEDIR(newtree->ce_mode))
+			return merged_entry(newtree, current, o);
+		else
 			return reject_merge(current, o);
 	}
 	else if (newtree) {