mbox series

[0/2] Sparse Index: fix a checkout bug with deep sparse-checkout patterns

Message ID pull.1092.git.1638586534.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Sparse Index: fix a checkout bug with deep sparse-checkout patterns | expand

Message

Philippe Blain via GitGitGadget Dec. 4, 2021, 2:55 a.m. UTC
This week, we rolled out the sparse index to a large internal monorepo. We
got two very similar bug reports that dealt with a strange error that
involved the same set of paths. One was during git pull (pull was a red
herring) and the other was git checkout. The git checkout case gave enough
of a reproduction to debug deep into unpack-trees.c and find the problem.

This bug dates back to 523506d (unpack-trees: unpack sparse directory
entries, 2021-07-14). The reason we didn't hit this before is because it
requires the following:

 1. The sparse-checkout definition needs to have recursive inclusion of deep
    folders (depth 3 or more).
 2. Adjacent to those deep folders, we need a deep sparse directory entry
    that receives changes.
 3. In this particular repo, deep directories are only added to the
    sparse-checkout in rare occasions and those adjacent folders are rarely
    updated. They happened to update this week and hit our sparse index
    dogfooders in surprising ways.

The first patch adds a test that fails without the fix. It requires
modifying our test data to make adjacent, deep sparse directory entries
possible. It's a rather simple test after we have that data change.

The second patch includes the actual fix. It's really just an error of not
understanding the difference between the name and traverse_path members of
the struct traverse_info structure. name only stores a single tree entry
while traverse_path actually includes the full name from root. The method we
are editing also has an additional struct name_entry that fills in the tree
entry on top of the traverse_path, which explains how this worked to depth
two, but not depth three.

Thanks, -Stolee

Derrick Stolee (2):
  t1092: add deeper changes during a checkout
  unpack-trees: use traverse_path instead of name

 t/t1092-sparse-checkout-compatibility.sh | 16 +++++++++++++++-
 unpack-trees.c                           | 10 +++++-----
 2 files changed, 20 insertions(+), 6 deletions(-)


base-commit: cd3e606211bb1cf8bc57f7d76bab98cc17a150bc
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1092%2Fderrickstolee%2Fsparse-index%2Fcheckout-bug-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1092/derrickstolee/sparse-index/checkout-bug-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1092

Comments

Elijah Newren Dec. 4, 2021, 5:45 a.m. UTC | #1
On Fri, Dec 3, 2021 at 6:55 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This week, we rolled out the sparse index to a large internal monorepo. We
> got two very similar bug reports that dealt with a strange error that
> involved the same set of paths. One was during git pull (pull was a red
> herring) and the other was git checkout. The git checkout case gave enough
> of a reproduction to debug deep into unpack-trees.c and find the problem.
>
> This bug dates back to 523506d (unpack-trees: unpack sparse directory
> entries, 2021-07-14). The reason we didn't hit this before is because it
> requires the following:
>
>  1. The sparse-checkout definition needs to have recursive inclusion of deep
>     folders (depth 3 or more).
>  2. Adjacent to those deep folders, we need a deep sparse directory entry
>     that receives changes.
>  3. In this particular repo, deep directories are only added to the
>     sparse-checkout in rare occasions and those adjacent folders are rarely
>     updated. They happened to update this week and hit our sparse index
>     dogfooders in surprising ways.
>
> The first patch adds a test that fails without the fix. It requires
> modifying our test data to make adjacent, deep sparse directory entries
> possible. It's a rather simple test after we have that data change.
>
> The second patch includes the actual fix. It's really just an error of not
> understanding the difference between the name and traverse_path members of
> the struct traverse_info structure. name only stores a single tree entry
> while traverse_path actually includes the full name from root. The method we
> are editing also has an additional struct name_entry that fills in the tree
> entry on top of the traverse_path, which explains how this worked to depth
> two, but not depth three.

Thanks for the detailed explanation.  I looked around for similar
potential problems elsewhere, but only noted that the comment at the
top of the function is also wrong and should be updated (as I
commented on Patch 2).  After you fix the comment similarly, feel free
to add my

Reviewed-by: Elijah Newren <newren@gmail.com>