[v3,0/5] Sparse index: fetch, pull, ls-files

Message ID pull.1080.v3.git.1639149192.gitgitgadget@gmail.com (mailing list archive)

Message

Derrick Stolee via GitGitGadget Dec. 10, 2021, 3:13 p.m. UTC
This is based on ld/sparse-index-blame (merged with 'master' due to an
unrelated build issue).

Here are two relatively-simple patches that further the sparse index
integrations.

Did you know that 'fetch' and 'pull' read the index? I didn't, or this would
have been an integration much earlier in the cycle. They read the index to
look for the .gitmodules file in case there are submodules that need to be
fetched. Since looking for a file by name is already protected, we only need
to disable 'command_requires_full_index' and we are done.

The 'ls-files' builtin is useful when debugging the index, and some scripts
use it, too. We are not changing the default behavior which expands a sparse
index in order to show all of the cached blobs. Instead, we add a '--sparse'
option that allows us to see the sparse directory entries upon request.
Combined with --debug, we can see a lot of index details, such as:

$ git ls-files --debug --sparse
LICENSE
  ctime: 1634910503:287405820
  mtime: 1634910503:287405820
  dev: 16777220 ino: 119325319
  uid: 501  gid: 20
  size: 1098    flags: 200000
README.md
  ctime: 1634910503:288090279
  mtime: 1634910503:288090279
  dev: 16777220 ino: 119325320
  uid: 501  gid: 20
  size: 934 flags: 200000
bin/index.js
  ctime: 1634910767:828434033
  mtime: 1634910767:828434033
  dev: 16777220 ino: 119325520
  uid: 501  gid: 20
  size: 7292    flags: 200000
examples/
  ctime: 0:0
  mtime: 0:0
  dev: 0    ino: 0
  uid: 0    gid: 0
  size: 0   flags: 40004000
package.json
  ctime: 1634910503:288676330
  mtime: 1634910503:288676330
  dev: 16777220 ino: 119325321
  uid: 501  gid: 20
  size: 680 flags: 200000


(In this example, the 'examples/' directory is sparse.)

Thanks!


Updates in v2
=============

 * Rebased onto latest ld/sparse-index-blame without issue.
 * Updated the test to use diff-of-diffs instead of a sequence of greps.
 * Added patches that remove the use of 'test-tool read-cache --table' and
   its implementation.


Updates in v3
=============

 * Fixed typo in commit message.
 * Added comments around doing strange things in an ls-files test.
 * Fixed adjacent typo in a test comment.

Derrick Stolee (5):
  fetch/pull: use the sparse index
  ls-files: add --sparse option
  t1092: replace 'read-cache --table' with 'ls-files --sparse'
  t1091/t3705: remove 'test-tool read-cache --table'
  test-read-cache: remove --table, --expand options

 Documentation/git-ls-files.txt           |   4 +
 builtin/fetch.c                          |   2 +
 builtin/ls-files.c                       |  12 +-
 builtin/pull.c                           |   2 +
 t/helper/test-read-cache.c               |  64 ++---------
 t/t1091-sparse-checkout-builtin.sh       |  25 ++++-
 t/t1092-sparse-checkout-compatibility.sh | 137 ++++++++++++++++++++---
 t/t3705-add-sparse-checkout.sh           |   8 +-
 8 files changed, 172 insertions(+), 82 deletions(-)


base-commit: 3fffe69d24e4ecc95246766f5396303a953695ff
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1080%2Fderrickstolee%2Fsparse-index%2Ffetch-pull-ls-files-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1080/derrickstolee/sparse-index/fetch-pull-ls-files-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1080

Range-diff vs v2:

 1:  f72001638d1 = 1:  f72001638d1 fetch/pull: use the sparse index
 2:  58b5eca4835 ! 2:  b81174ba54b ls-files: add --sparse option
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'sparse index is n
      +	git -C sparse-checkout ls-files --sparse >sparse &&
      +	test_cmp dense sparse &&
      +
     ++	# Set up a strange condition of having a file edit
     ++	# outside of the sparse-checkout cone. This is just
     ++	# to verify that sparse-checkout and sparse-index
     ++	# behave the same in this case.
      +	write_script edit-content <<-\EOF &&
      +	mkdir folder1 &&
      +	echo content >>folder1/a
      +	EOF
      +	run_on_sparse ../edit-content &&
      +
     -+	# ls-files does not notice modified files whose
     -+	# cache entries are marked SKIP_WORKTREE.
     ++	# ls-files does not currently notice modified files whose
     ++	# cache entries are marked SKIP_WORKTREE. This may change
     ++	# in the future, but here we test that sparse index does
     ++	# not accidentally create a change of behavior.
      +	test_sparse_match git ls-files --modified &&
      +	test_must_be_empty sparse-checkout-out &&
      +	test_must_be_empty sparse-index-out &&
 3:  5ffae2a03ae ! 3:  2a6a1c5a39c t1092: replace 'read-cache --table' with 'ls-files --sparse'
     @@ t/t1092-sparse-checkout-compatibility.sh: test_sparse_unstaged () {
       			|| return 1
       	done &&
       
     - 	# Disabling the sparse-index removes tree entries with full ones
     +-	# Disabling the sparse-index removes tree entries with full ones
     ++	# Disabling the sparse-index replaces tree entries with full ones
       	git -C sparse-index sparse-checkout init --no-sparse-index &&
      -
      -	test-tool -C sparse-index read-cache --table >cache &&
 4:  b98e5e6d2bc ! 4:  f0143686754 t1091/t3705: remove 'test-tool read-cache --table'
     @@ Commit message
          t3705-add-sparse-checkout.sh.
      
          The important changes are due to the different output format. In t3705,
     -    wWe need to use the '--stage' output to get a file mode and OID, but
     +    we need to use the '--stage' output to get a file mode and OID, but
          it also includes a stage value and drops the object type. This leads
          to some differences in how we handle looking for specific entries.
      
 5:  f31a24eeb9b = 5:  9227dc54165 test-read-cache: remove --table, --expand options

Comments

Ævar Arnfjörð Bjarmason Dec. 10, 2021, 4:16 p.m. UTC | #1
On Fri, Dec 10 2021, Derrick Stolee via GitGitGadget wrote:

> Updates in v3
> =============
>
>  * Fixed typo in commit message.
>  * Added comments around doing strange things in an ls-files test.
>  * Fixed adjacent typo in a test comment.

Yay, I'm happy to see 5/5. Not because I didn't like the helper, but
that sparse is getting mature enough that we're getting ls-files to emit
information about it. Thanks.

There's the small "diff -u" portability issue noted in my just-sent
<211210.86zgp8bi48.gmgdl@evledraar.gmail.com>.

Other than that 2/5 adds this documentation about ls-files --sparse:

	If the index is sparse, show the sparse directories without expanding
	to the contained files.

Shouldn't we at least add:

	Sparse directories will be shown with a trailing slash,
	e.g. "x/" for a sparse directory "x".q

In addition to that I think this may have a buggy/unexpected interaction
with the --eol option:

    040000 aaff74984cccd156a469afa7d9ab10e4777beb24 0       i/      w/      attr/                   x/

I.e. should we be saying anything about the EOL state of these? OTOH I
tried adding a submodule and it says the same, which seems similarly
odd, so maybe it's either correct, or this isn't updated for those
either.

Is the behavior of:

    $ git -C sparse-index ls-files --stage --sparse -- 'folder2/a'
    $ echo $?
    0

Expected? I.e. accepting /a when we'd just print "folder2/" and not
e.g. erroring (probably, just asking)?

How about:

    $ ls -l sparse-index/x
    ls: cannot access 'sparse-index/x': No such file or directory
    $ git -C sparse-index ls-files --stage 'x/*'
    100644 78981922613b2afb6025042ff6bd878ac1994e85 0       x/a
    $ git -C sparse-index ls-files --stage --no-empty-directory 'x/*' 
    100644 78981922613b2afb6025042ff6bd878ac1994e85 0       x/a
    $ git -C sparse-index ls-files --stage --no-empty-directory --sparse 'x/*' 
    040000 aaff74984cccd156a469afa7d9ab10e4777beb24 0       x/

The answer is probably "yes that's fine" because I've got no idea how
sparse really works, but just checking..

So it's very nice to have the new diff test in 2/5, but would be much
nicer/assuring to have that split into a trivial function followed by
seeing how the diff looked in combination with each of the other options
that "ls-files" accepts.

Elijah Newren Dec. 10, 2021, 6:45 p.m. UTC | #2
On Fri, Dec 10, 2021 at 8:31 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Fri, Dec 10 2021, Derrick Stolee via GitGitGadget wrote:
>
> > Updates in v3
> > =============
> >
> >  * Fixed typo in commit message.
> >  * Added comments around doing strange things in an ls-files test.
> >  * Fixed adjacent typo in a test comment.
>
> Yay, I'm happy to see 5/5. Not because I didn't like the helper, but
> that sparse is getting mature enough that we're getting ls-files to emit
> information about it. Thanks.
>
> There's the small "diff -u" portability issue noted in my just-sent
> <211210.86zgp8bi48.gmgdl@evledraar.gmail.com>.

Yeah, that one is an important point.

> Other than that 2/5 adds this documentation about ls-files --sparse:
>
>         If the index is sparse, show the sparse directories without expanding
>         to the contained files.
>
> Shouldn't we at least add:
>
>         Sparse directories will be shown with a trailing slash,
>         e.g. "x/" for a sparse directory "x".q

Makes sense.  Except I don't understand the trailing 'q' -- typo?

>
> In addition to that I think this may have a buggy/unexpected interaction
> with the --eol option:
>
>     040000 aaff74984cccd156a469afa7d9ab10e4777beb24 0       i/      w/      attr/                   x/
>
> I.e. should we be saying anything about the EOL state of these? OTOH I
> tried adding a submodule and it says the same, which seems similarly
> odd, so maybe it's either correct, or this isn't updated for those
> either.

If it matches what we do for submodules, for which eol values are also
non-sensical, then I think we're good enough for this series.  Perhaps
we just shouldn't print anything eol related for directories with
--eol, but that sounds like an orthogonal series rather than something
that should go in this one.

> Is the behavior of:
>
>     $ git -C sparse-index ls-files --stage --sparse -- 'folder2/a'
>     $ echo $?
>     0
>
> Expected? I.e. accepting /a when we'd just print "folder2/" and not
> e.g. erroring (probably, just asking)?

Fair question.  I think it's fine; by way of comparison:

$ git rm --cached removed-and-no-longer-tracked-file
$ git ls-files --stage -- non-existent-file \
    removed-and-no-longer-tracked-file untracked-file
$ echo $?
0

So it also succeeds and displays nothing when asked for file(s)
that are not in the index.

Yes, there is a slight semantic difference in that in your example we
have a "folder2/" entry which *could be* expanded, but I am quite
happy with the literal interpretation of the command that there is no
"folder2/a" in the index.  Said another way, I'm happy with ls-files
showing what is in the index right now, rather than what could be in
it, or listing things that HEAD contains that we don't for whatever
reason.
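
As a rough illustration of that comparison, the sketch below contrasts the default behavior with '--error-unmatch', the existing option for making ls-files fail on a pathspec that matches nothing. The file names are placeholders in a throwaway repository:

```shell
#!/bin/sh
# Sketch only: a scratch repository with one tracked file.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
echo tracked >tracked-file
git add tracked-file

# A pathspec that matches nothing in the index: no output, exit status 0.
plain_out=$(git ls-files --stage -- non-existent-file)
plain_status=$?

# With --error-unmatch the unmatched pathspec is treated as an error.
strict_status=0
git ls-files --error-unmatch -- non-existent-file >/dev/null 2>&1 ||
	strict_status=$?

echo "plain: status=$plain_status output='$plain_out'"
echo "--error-unmatch: status=$strict_status"
```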

> How about:
>
>     $ ls -l sparse-index/x
>     ls: cannot access 'sparse-index/x': No such file or directory
>     $ git -C sparse-index ls-files --stage 'x/*'
>     100644 78981922613b2afb6025042ff6bd878ac1994e85 0       x/a
>     $ git -C sparse-index ls-files --stage --no-empty-directory 'x/*'
>     100644 78981922613b2afb6025042ff6bd878ac1994e85 0       x/a
>     $ git -C sparse-index ls-files --stage --no-empty-directory --sparse 'x/*'
>     040000 aaff74984cccd156a469afa7d9ab10e4777beb24 0       x/
>
> The answer is probably "yes that's fine" because I've got no idea how
> sparse really works, but just checking..

You should read the docs for this option you are trying: "Do not list
empty directories. Has no effect without --directory."    (Also,
--directory only takes effect with --other, which you are also
missing.)

So yeah, that flag is irrelevant.  Perhaps ls-files should print a
warning when flags are passed but ignored due to other flags not being
passed, but that would belong in an orthogonal series rather than this
one.

> So it's very nice to have the new diff test in 2/5, but would be much
> nicer/assuring to have that split into a trivial function followed by
> seeing how the diff looked in combination with each of the other options
> that "ls-files" accepts.

There's no point testing in combination with flags that only affect
untracked files.  And I'm very dubious of adding testing for a case
where we would need to add an explicit disclaimer that "We have no
idea what the output should be but we are testing it anyway".  So the
options you suggest at least are things I'd rather not see us trying
to add to the testing here.

Elijah Newren Dec. 10, 2021, 6:53 p.m. UTC | #3
On Fri, Dec 10, 2021 at 7:13 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This is based on ld/sparse-index-blame (merged with 'master' due to an
> unrelated build issue).
>
> Here are two relatively-simple patches that further the sparse index
> integrations.
>
> Did you know that 'fetch' and 'pull' read the index? I didn't, or this would
> have been an integration much earlier in the cycle. They read the index to
> look for the .gitmodules file in case there are submodules that need to be
> fetched. Since looking for a file by name is already protected, we only need
> to disable 'command_requires_full_index' and we are done.
>
> The 'ls-files' builtin is useful when debugging the index, and some scripts
> use it, too. We are not changing the default behavior which expands a sparse
> index in order to show all of the cached blobs. Instead, we add a '--sparse'
> option that allows us to see the sparse directory entries upon request.
> Combined with --debug, we can see a lot of index details, such as:
>
> $ git ls-files --debug --sparse
> LICENSE
>   ctime: 1634910503:287405820
>   mtime: 1634910503:287405820
>   dev: 16777220 ino: 119325319
>   uid: 501  gid: 20
>   size: 1098    flags: 200000
> README.md
>   ctime: 1634910503:288090279
>   mtime: 1634910503:288090279
>   dev: 16777220 ino: 119325320
>   uid: 501  gid: 20
>   size: 934 flags: 200000
> bin/index.js
>   ctime: 1634910767:828434033
>   mtime: 1634910767:828434033
>   dev: 16777220 ino: 119325520
>   uid: 501  gid: 20
>   size: 7292    flags: 200000
> examples/
>   ctime: 0:0
>   mtime: 0:0
>   dev: 0    ino: 0
>   uid: 0    gid: 0
>   size: 0   flags: 40004000
> package.json
>   ctime: 1634910503:288676330
>   mtime: 1634910503:288676330
>   dev: 16777220 ino: 119325321
>   uid: 501  gid: 20
>   size: 680 flags: 200000
>
>
> (In this example, the 'examples/' directory is sparse.)
>
> Thanks!
>
>
> Updates in v2
> =============
>
>  * Rebased onto latest ld/sparse-index-blame without issue.
>  * Updated the test to use diff-of-diffs instead of a sequence of greps.
>  * Added patches that remove the use of 'test-tool read-cache --table' and
>    its implementation.
>
>
> Updates in v3
> =============
>
>  * Fixed typo in commit message.
>  * Added comments around doing strange things in an ls-files test.
>  * Fixed adjacent typo in a test comment.

Thanks, this round addresses all my previous feedback.  However, there
are two things Ævar has brought up that I think are important:
   * cannot rely on `diff -u` for portability reasons[1] (his
     suggestion of git diff --no-index sounds good, or you can use comm(1))
   * have the documentation mention the trailing slash that sparse
     directory entries are shown with[2]

[1] https://lore.kernel.org/git/211210.86zgp8bi48.gmgdl@evledraar.gmail.com/
[2] https://lore.kernel.org/git/211210.86v8zwbev9.gmgdl@evledraar.gmail.com/
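
For reference, here is a small sketch of the two alternatives to `diff -u` mentioned above. The 'dense' and 'sparse' files are filled with made-up listings rather than real test output, so this shows only the shape of the comparison, not the actual test:

```shell
#!/bin/sh
# Sketch only: the two listings are hand-written stand-ins for
# 'git ls-files' and 'git ls-files --sparse' output.
LC_ALL=C
export LC_ALL
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' README.md bin/index.js examples/a examples/b >dense
printf '%s\n' README.md bin/index.js examples/ >sparse

# git diff --no-index compares two plain files and exits with 1
# when they differ, so the status is captured instead of aborting.
diff_status=0
git diff --no-index dense sparse >actual.diff || diff_status=$?

# comm(1) needs sorted input (hence LC_ALL=C above).
# -23 keeps lines unique to the first file, -13 those unique to the second.
only_dense=$(comm -23 dense sparse)
only_sparse=$(comm -13 dense sparse)

echo "diff status: $diff_status"
echo "only in dense:"; echo "$only_dense"
echo "only in sparse:"; echo "$only_sparse"
```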

Ævar Arnfjörð Bjarmason Dec. 11, 2021, 2:24 a.m. UTC | #4
On Fri, Dec 10 2021, Elijah Newren wrote:

> On Fri, Dec 10, 2021 at 8:31 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
> [...]
>> Other than that 2/5 adds this documentation about ls-files --sparse:
>>
>>         If the index is sparse, show the sparse directories without expanding
>>         to the contained files.
>>
>> Shouldn't we at least add:
>>
>>         Sparse directories will be shown with a trailing slash,
>>         e.g. "x/" for a sparse directory "x".q
>
> Makes sense.  Except I don't understand the trailing 'q' -- typo?

Yes, sorry.

>>
>> In addition to that I think this may have a buggy/unexpected interaction
>> with the --eol option:
>>
>>     040000 aaff74984cccd156a469afa7d9ab10e4777beb24 0       i/      w/      attr/                   x/
>>
>> I.e. should we be saying anything about the EOL state of these? OTOH I
>> tried adding a submodule and it says the same, which seems similarly
>> odd, so maybe it's either correct, or this isn't updated for those
>> either.
>
> If it matches what we do for submodules, for which eol values are also
> non-sensical, then I think we're good enough for this series.  Perhaps
> we just shouldn't print anything eol related for directories with
> --eol, but that sounds like an orthogonal series rather than something
> that should go in this one.

*nod*, probably. 

>> Is the behavior of:
>>
>>     $ git -C sparse-index ls-files --stage --sparse -- 'folder2/a'
>>     $ echo $?
>>     0
>>
>> Expected? I.e. accepting /a when we'd just print "folder2/" and not
>> e.g. erroring (probably, just asking)?
>
> Fair question.  I think it's fine; by way of comparison:
>
> $ git rm --cached removed-and-no-longer-tracked-file
> $ git ls-files --stage -- non-existent-file \
>     removed-and-no-longer-tracked-file untracked-file
> $ echo $?
> 0
>
> So it also succeeds and displays nothing when asked for file(s)
> that are not in the index.
>
> Yes, there is a slight semantic difference in that in your example we
> have a "folder2/" entry which *could be* expanded, but I am quite
> happy with the literal interpretation of the command that there is no
> "folder2/a" in the index.  Said another way, I'm happy with ls-files
> showing what is in the index right now, rather than what could be in
> it, or listing things that HEAD contains that we don't for whatever
> reason.

Sounds good.

>> How about:
>>
>>     $ ls -l sparse-index/x
>>     ls: cannot access 'sparse-index/x': No such file or directory
>>     $ git -C sparse-index ls-files --stage 'x/*'
>>     100644 78981922613b2afb6025042ff6bd878ac1994e85 0       x/a
>>     $ git -C sparse-index ls-files --stage --no-empty-directory 'x/*'
>>     100644 78981922613b2afb6025042ff6bd878ac1994e85 0       x/a
>>     $ git -C sparse-index ls-files --stage --no-empty-directory --sparse 'x/*'
>>     040000 aaff74984cccd156a469afa7d9ab10e4777beb24 0       x/
>>
>> The answer is probably "yes that's fine" because I've got no idea how
>> sparse really works, but just checking..
>
> You should read the docs for this option you are trying: "Do not list
> empty directories. Has no effect without --directory."    (Also,
> --directory only takes effect with --other, which you are also
> missing.)
>
> So yeah, that flag is irrelevant.  Perhaps ls-files should print a
> warning when flags are passed but ignored due to other flags not being
> passed, but that would belong in an orthogonal series rather than this
> one.

...

>> So it's very nice to have the new diff test in 2/5, but would be much
>> nicer/assuring to have that split into a trivial function followed by
>> seeing how the diff looked in combination with each of the other options
>> that "ls-files" accepts.
>
> There's no point testing in combination with flags that only affect
> untracked files.  And I'm very dubious of adding testing for a case
> where we would need to add an explicit disclaimer that "We have no
> idea what the output should be but we are testing it anyway".  So the
> options you suggest at least are things I'd rather not see us trying
> to add to the testing here.

This series is adding a new flag to ls-files, it doesn't error out when
combined with other existing flags, and observably changes their output.

I think erroring out would be fine, or doing whatever it's doing now,
but either way the gap in test coverage should be closed, shouldn't it?

I'd think the easiest and probably most prudent fix would just be to say
that we don't think some of these make sense with --sparse and have them
error out if they're combined, no?

Elijah Newren Dec. 11, 2021, 4:45 a.m. UTC | #5
On Fri, Dec 10, 2021 at 6:28 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Fri, Dec 10 2021, Elijah Newren wrote:
>
> > On Fri, Dec 10, 2021 at 8:31 AM Ævar Arnfjörð Bjarmason
> > <avarab@gmail.com> wrote:
> > [...]
> >> So it's very nice to have the new diff test in 2/5, but would be much
> >> nicer/assuring to have that split into a trivial function followed by
> >> seeing how the diff looked in combination with each of the other options
> >> that "ls-files" accepts.
> >
> > There's no point testing in combination with flags that only affect
> > untracked files.  And I'm very dubious of adding testing for a case
> > where we would need to add an explicit disclaimer that "We have no
> > idea what the output should be but we are testing it anyway".  So the
> > options you suggest at least are things I'd rather not see us trying
> > to add to the testing here.
>
> This series is adding a new flag to ls-files, it doesn't error out when
> combined with other existing flags, and observably changes their output.

Ah, I think you had a misunderstanding here.  If what you say here
were true, then indeed we would need some testing and it'd suggest
some kind of bug.  But the combination here does not observably change
the output.  You were missing an important testcase for comparison.
Let me repeat your testing and sprinkle in some commentary:

> >>     $ ls -l sparse-index/x
> >>     ls: cannot access 'sparse-index/x': No such file or directory

Right, this is a sparse directory; good to double check.

> >>     $ git -C sparse-index ls-files --stage 'x/*'
> >>     100644 78981922613b2afb6025042ff6bd878ac1994e85 0       x/a
> >>     $ git -C sparse-index ls-files --stage --no-empty-directory 'x/*'
> >>     100644 78981922613b2afb6025042ff6bd878ac1994e85 0       x/a

Right, --no-empty-directory by itself is a useless option that won't
affect the output of ls-files; it only takes effect with --directory,
which in turn only takes effect with other options.  Since that
option is useless, the output is the same for both of these.

> >>     $ git -C sparse-index ls-files --stage --no-empty-directory --sparse 'x/*'
> >>     040000 aaff74984cccd156a469afa7d9ab10e4777beb24 0       x/

Here you added --sparse, but you neglected what it would show without
--no-empty-directory, which is a critical comparison point.  So let me
fill it in:

     $ git -C sparse-index ls-files --stage --sparse 'x/*'
     040000 aaff74984cccd156a469afa7d9ab10e4777beb24 0       x/

Now, this case I added in comparison to the one three above it shows
that, yes, --sparse does indeed change the output relative to --stage.
And it does so by design.  Now if you compare my added case to the
last one you showed, you can verify that adding --no-empty-directory
to that mix does not change the output further; --no-empty-directory
is a useless/ignored option unless you also include other flags that
were not involved here.
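
To make the dependency between these flags concrete, here is a minimal sketch in a scratch repository. The directory and file names are hypothetical. It checks that --no-empty-directory leaves a tracked-file listing untouched and only matters once --others --directory is in play:

```shell
#!/bin/sh
# Sketch only: scratch repository with one tracked file and two
# untracked directories, one of them empty.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
echo tracked >tracked-file
git add tracked-file
mkdir empty-dir
mkdir full-dir
echo x >full-dir/file

# Tracked listing: the flag is ignored, both outputs are the same.
tracked_plain=$(git ls-files --stage)
tracked_flag=$(git ls-files --stage --no-empty-directory)

# Untracked listing collapsed to directories: here the flag matters.
others_all=$(git ls-files --others --directory)
others_nonempty=$(git ls-files --others --directory --no-empty-directory)

echo "with empty dirs:";    echo "$others_all"
echo "without empty dirs:"; echo "$others_nonempty"
```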

> I think erroring out would be fine, or doing whatever it's doing now,
> but either way the gap in test coverage should be closed, shouldn't it?
>
> I'd think the easiest and probably most prudent fix would just be to say
> that we don't think some of these make sense with --sparse and have them
> error out if they're combined, no?

ls-files offers several options that allow you to either slice and
dice or tweak the output, and they operate on two kinds of files: tracked
and not tracked.  Several examples of such flags:
   * tracked: --cached, --stage, --unmerged, --modified, --sparse (and
     I think --error-unmatch)
   * not tracked: --others, --ignored, --exclude, --exclude-from,
     --exclude-standard, --directory, --no-empty-directory

Now, in particular, specifying any of --exclude, --exclude-from,
--exclude-standard, --directory, or --no-empty-directory is a complete
waste of breath and will do nothing unless you also specify --others
or --ignored.  None of these options interacts in any way with any of
the flags from the tracked category.
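
A quick way to see this for --exclude is sketched below, using placeholder file names in a scratch repository. The pattern is expected to change only the --others listing and leave the --cached listing alone:

```shell
#!/bin/sh
# Sketch only: two tracked files (one matching '*.log') and one
# untracked file that also matches the pattern.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
echo a >keep.txt
echo b >debug.log
git add keep.txt debug.log
echo c >scratch.log

# Tracked files: the exclude pattern does not filter anything.
cached_plain=$(git ls-files --cached)
cached_excl=$(git ls-files --cached --exclude='*.log')

# Untracked files: the same pattern hides scratch.log.
others_plain=$(git ls-files --others)
others_excl=$(git ls-files --others --exclude='*.log')

echo "cached with --exclude:"; echo "$cached_excl"
echo "others with --exclude: '$others_excl'"
```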

I don't think we want an n! permutation of all combinations tested.  I
don't even think an n^2 pair-wise combination makes sense when we know
that some flags have no effect on their own.  What would make sense is
perhaps adding a warning to ls-files when specified flags will have no
utility due to depending on other flags that have not been specified.
But that's in no way specific to --sparse and does not make sense to
me to make part of this topic.