diff mbox series

[v7,12/16] diff-lib: handle index diffs with sparse dirs

Message ID f83aa08ff6b0fd18d6f9f3ce5ee993523a7f1759.1624932294.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Sparse-index: integrate with status | expand

Commit Message

Derrick Stolee June 29, 2021, 2:04 a.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

While comparing an index to a tree, we may see a sparse directory entry.
In this case, we should compare that portion of the tree to the tree
represented by that entry. This could include a new tree which needs to
be expanded to a full list of added files. It could also include an
existing tree, in which case all of the changes inside are important to
describe, including the modifications, additions, and deletions. Note
that the case where the tree has a path and the index does not remains
identical to before: the lack of a cache entry is the same with a sparse
index.

Use diff_tree_oid() appropriately to compute the diff.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 diff-lib.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

Comments

Elijah Newren July 8, 2021, 11:10 p.m. UTC | #1
On Mon, Jun 28, 2021 at 7:05 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> While comparing an index to a tree, we may see a sparse directory entry.
> In this case, we should compare that portion of the tree to the tree
> represented by that entry. This could include a new tree which needs to
> be expanded to a full list of added files. It could also include an
> existing tree, in which case all of the changes inside are important to
> describe, including the modifications, additions, and deletions. Note
> that the case where the tree has a path and the index does not remains
> identical to before: the lack of a cache entry is the same with a sparse
> index.
>
> Use diff_tree_oid() appropriately to compute the diff.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  diff-lib.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/diff-lib.c b/diff-lib.c
> index c2ac9250fe9..3f32f038371 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -325,6 +325,11 @@ static void show_new_file(struct rev_info *revs,
>         unsigned dirty_submodule = 0;
>         struct index_state *istate = revs->diffopt.repo->index;
>
> +       if (new_file && S_ISSPARSEDIR(new_file->ce_mode)) {
> +               diff_tree_oid(NULL, &new_file->oid, new_file->name, &revs->diffopt);
> +               return;
> +       }
> +
>         /*
>          * New file in the index: it might actually be different in
>          * the working tree.
> @@ -347,6 +352,17 @@ static int show_modified(struct rev_info *revs,
>         unsigned dirty_submodule = 0;
>         struct index_state *istate = revs->diffopt.repo->index;
>
> +       /*
> +        * If both are sparse directory entries, then expand the
> +        * modifications to the file level.
> +        */
> +       if (old_entry && new_entry &&
> +           S_ISSPARSEDIR(old_entry->ce_mode) &&
> +           S_ISSPARSEDIR(new_entry->ce_mode)) {
> +               diff_tree_oid(&old_entry->oid, &new_entry->oid, new_entry->name, &revs->diffopt);
> +               return 0;
> +       }
> +
>         if (get_stat_data(istate, new_entry, &oid, &mode, cached, match_missing,
>                           &dirty_submodule, &revs->diffopt) < 0) {
>                 if (report_missing)

Love the simpler patch.

I'm curious about the case where S_ISSPARSEDIR(old_entry->ce_mode) !=
S_ISSPARSEDIR(new_entry->ce_mode), though; how is that handled?
Elijah Newren July 8, 2021, 11:51 p.m. UTC | #2
On Thu, Jul 8, 2021 at 4:10 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Mon, Jun 28, 2021 at 7:05 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: Derrick Stolee <dstolee@microsoft.com>
> >
> > While comparing an index to a tree, we may see a sparse directory entry.
> > In this case, we should compare that portion of the tree to the tree
> > represented by that entry. This could include a new tree which needs to
> > be expanded to a full list of added files. It could also include an
> > existing tree, in which case all of the changes inside are important to
> > describe, including the modifications, additions, and deletions. Note
> > that the case where the tree has a path and the index does not remains
> > identical to before: the lack of a cache entry is the same with a sparse
> > index.
> >
> > Use diff_tree_oid() appropriately to compute the diff.
> >
> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> > ---
> >  diff-lib.c | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> >
> > diff --git a/diff-lib.c b/diff-lib.c
> > index c2ac9250fe9..3f32f038371 100644
> > --- a/diff-lib.c
> > +++ b/diff-lib.c
> > @@ -325,6 +325,11 @@ static void show_new_file(struct rev_info *revs,
> >         unsigned dirty_submodule = 0;
> >         struct index_state *istate = revs->diffopt.repo->index;
> >
> > +       if (new_file && S_ISSPARSEDIR(new_file->ce_mode)) {
> > +               diff_tree_oid(NULL, &new_file->oid, new_file->name, &revs->diffopt);
> > +               return;
> > +       }
> > +
> >         /*
> >          * New file in the index: it might actually be different in
> >          * the working tree.
> > @@ -347,6 +352,17 @@ static int show_modified(struct rev_info *revs,
> >         unsigned dirty_submodule = 0;
> >         struct index_state *istate = revs->diffopt.repo->index;
> >
> > +       /*
> > +        * If both are sparse directory entries, then expand the
> > +        * modifications to the file level.
> > +        */
> > +       if (old_entry && new_entry &&
> > +           S_ISSPARSEDIR(old_entry->ce_mode) &&
> > +           S_ISSPARSEDIR(new_entry->ce_mode)) {
> > +               diff_tree_oid(&old_entry->oid, &new_entry->oid, new_entry->name, &revs->diffopt);
> > +               return 0;
> > +       }
> > +
> >         if (get_stat_data(istate, new_entry, &oid, &mode, cached, match_missing,
> >                           &dirty_submodule, &revs->diffopt) < 0) {
> >                 if (report_missing)
>
> Love the simpler patch.
>
> I'm curious about the case where S_ISSPARSEDIR(old_entry->ce_mode) !=
> S_ISSPARSEDIR(new_entry->ce_mode), though; how is that handled?

Digging a little deeper, it appears that we could add this just before
your new if-block:

    assert(S_ISSPARSEDIR(old_entry->ce_mode) ==
           S_ISSPARSEDIR(new_entry->ce_mode));

And the code still functions, while that also removes some of the
surprise factor.  I'm guessing that the difference between "folder1"
and "folder1/" cause us to never try to directly compare a file to a
directory...but if that's accurate, a comment of some effect might
help make this code be a little clearer and make readers less likely
to wonder why you need to check that both old and new are sparse
directories.
Derrick Stolee July 12, 2021, 1:52 p.m. UTC | #3
On 7/8/2021 7:51 PM, Elijah Newren wrote:
> On Thu, Jul 8, 2021 at 4:10 PM Elijah Newren <newren@gmail.com> wrote:
>>
>> On Mon, Jun 28, 2021 at 7:05 PM Derrick Stolee via GitGitGadget
...
>>> +       /*
>>> +        * If both are sparse directory entries, then expand the
>>> +        * modifications to the file level.
>>> +        */
>>> +       if (old_entry && new_entry &&
>>> +           S_ISSPARSEDIR(old_entry->ce_mode) &&
>>> +           S_ISSPARSEDIR(new_entry->ce_mode)) {
>>> +               diff_tree_oid(&old_entry->oid, &new_entry->oid, new_entry->name, &revs->diffopt);
>>> +               return 0;
>>> +       }
>>> +
>>>         if (get_stat_data(istate, new_entry, &oid, &mode, cached, match_missing,
>>>                           &dirty_submodule, &revs->diffopt) < 0) {
>>>                 if (report_missing)
>>
>> Love the simpler patch.
>>
>> I'm curious about the case where S_ISSPARSEDIR(old_entry->ce_mode) !=
>> S_ISSPARSEDIR(new_entry->ce_mode), though; how is that handled?
> 
> Digging a little deeper, it appears that we could add this just before
> your new if-block:
> 
>     assert(S_ISSPARSEDIR(old_entry->ce_mode) ==
>            S_ISSPARSEDIR(new_entry->ce_mode));
> 
> And the code still functions, while that also removes some of the
> surprise factor.  I'm guessing that the difference between "folder1"
> and "folder1/" cause us to never try to directly compare a file to a
> directory...but if that's accurate, a comment of some effect might
> help make this code be a little clearer and make readers less likely
> to wonder why you need to check that both old and new are sparse
> directories.

I was surprised that this worked, because my patch conditioned on
old_entry and new_entry being non-NULL. But of course show_modified()
requires them to be non-NULL. That can be further simplified.

Adding the assert helps demonstrate this expectation, but also I will
expand upon the comment.

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/diff-lib.c b/diff-lib.c
index c2ac9250fe9..3f32f038371 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -325,6 +325,11 @@  static void show_new_file(struct rev_info *revs,
 	unsigned dirty_submodule = 0;
 	struct index_state *istate = revs->diffopt.repo->index;
 
+	if (new_file && S_ISSPARSEDIR(new_file->ce_mode)) {
+		diff_tree_oid(NULL, &new_file->oid, new_file->name, &revs->diffopt);
+		return;
+	}
+
 	/*
 	 * New file in the index: it might actually be different in
 	 * the working tree.
@@ -347,6 +352,17 @@  static int show_modified(struct rev_info *revs,
 	unsigned dirty_submodule = 0;
 	struct index_state *istate = revs->diffopt.repo->index;
 
+	/*
+	 * If both are sparse directory entries, then expand the
+	 * modifications to the file level.
+	 */
+	if (old_entry && new_entry &&
+	    S_ISSPARSEDIR(old_entry->ce_mode) &&
+	    S_ISSPARSEDIR(new_entry->ce_mode)) {
+		diff_tree_oid(&old_entry->oid, &new_entry->oid, new_entry->name, &revs->diffopt);
+		return 0;
+	}
+
 	if (get_stat_data(istate, new_entry, &oid, &mode, cached, match_missing,
 			  &dirty_submodule, &revs->diffopt) < 0) {
 		if (report_missing)