mbox series

[v4,0/2] builtin/cat-file: mark 'git cat-file' sparse-index compatible

Message ID pull.1770.v4.git.git.1725401207.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series builtin/cat-file: mark 'git cat-file' sparse-index compatible | expand

Message

Philippe Blain via GitGitGadget Sept. 3, 2024, 10:06 p.m. UTC
I believe I've addressed all issues raised except the question about how to
exhaustively validate the changes. I was under the impression that the
combination of the existing 'git cat-file' test cases and my new cases were
enough, but I have thus far been unable to get code coverage working locally
to validate that myself, due to spurious failures in other tests. I've
kicked off another run with most unrelated tests removed to see if that
helps.

Please note that this is my first contribution to git. I've tried to follow
the instructions about how to correctly submit a patch (I'm using
GitGitGadget as getting Outlook to do plain text e-mail correctly seems
impossible), but please let me know if I've missed something.

My motivation for making this change is purely performance. We have a large
repository that we enable the sparse index for, and I am developing a
pre-commit hook that (among other things) uses git cat-file to get the
staged contents of certain files. Without this change, getting the contents
of a single small file from the index can take upwards of 10 seconds due to
the index expansion. After this change, it only takes ~0.3 seconds unless
the file is outside of the sparse index.

Kevin Lyles (2):
  t1092: allow run_on_* functions to use standard input
  builtin/cat-file: mark 'git cat-file' sparse-index compatible

 builtin/cat-file.c                       |  3 ++
 t/t1092-sparse-checkout-compatibility.sh | 50 +++++++++++++++++++++---
 2 files changed, 48 insertions(+), 5 deletions(-)


base-commit: 4590f2e9412378c61eac95966709c78766d326ba
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1770%2Fklylesatepic%2Fkl%2Fmark-cat-file-sparse-index-compatible-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1770/klylesatepic/kl/mark-cat-file-sparse-index-compatible-v4
Pull-Request: https://github.com/git/git/pull/1770

Range-diff vs v3:

 1:  b310593aec2 ! 1:  73fe71abcd5 Allow using stdin in run_on_* functions
     @@ Metadata
      Author: Kevin Lyles <klyles@epic.com>
      
       ## Commit message ##
     -    Allow using stdin in run_on_* functions
     +    t1092: allow run_on_* functions to use standard input
      
     -    The 'run_on_sparse' and 'run_on_all' functions previously did not work
     -    correctly for commands accepting standard input. This also indirectly
     -    affected 'test_all_match' and 'test_sparse_match'.
     +    The 'run_on_sparse' and 'run_on_all' functions do not work correctly for
     +    commands accepting standard input, because they run the same command
     +    multiple times and the first instance consumes it. This also indirectly
     +    affects 'test_all_match' and 'test_sparse_match'.
      
     -    Now, we accept standard input and will send it to each command that we
     -    run. This does not impact using the functions for commands that don't
     -    need standard input.
     +    To allow these functions to work with commands accepting standard input,
     +    first slurp standard input to a temporary file, and then run the command
     +    with its standard input redirected from the temporary file. This ensures
     +    that each command sees the same contents from its standard input.
     +
     +    Note that this does not impact commands that do not read from standard
     +    input; they continue to ignore it. Additionally, existing uses of the
     +    run_on_* functions do not need to do anything differently, as the
     +    standard input of the test environment is already connected to
     +    /dev/null.
     +
     +    We do not explicitly clean up the input files because they are cleaned
     +    up with the rest of the test repositories and their contents may be
     +    useful for figuring out which command failed when a test case fails.
      
          Signed-off-by: Kevin Lyles <klyles@epic.com>
      
     @@ t/t1092-sparse-checkout-compatibility.sh: init_repos_as_submodules () {
       }
       
       run_on_sparse () {
     -+	cat >run_on_sparse-input &&
     ++	cat >run-on-sparse-input &&
      +
       	(
       		cd sparse-checkout &&
       		GIT_PROGRESS_DELAY=100000 "$@" >../sparse-checkout-out 2>../sparse-checkout-err
      -	) &&
     -+	) <run_on_sparse-input &&
     ++	) <run-on-sparse-input &&
       	(
       		cd sparse-index &&
       		GIT_PROGRESS_DELAY=100000 "$@" >../sparse-index-out 2>../sparse-index-err
      -	)
     -+	) <run_on_sparse-input
     ++	) <run-on-sparse-input
       }
       
       run_on_all () {
     -+	cat >run_on_all-input &&
     ++	cat >run-on-all-input &&
      +
       	(
       		cd full-checkout &&
       		GIT_PROGRESS_DELAY=100000 "$@" >../full-checkout-out 2>../full-checkout-err
      -	) &&
      -	run_on_sparse "$@"
     -+	) <run_on_all-input &&
     -+	run_on_sparse "$@" <run_on_all-input
     ++	) <run-on-all-input &&
     ++	run_on_sparse "$@" <run-on-all-input
       }
       
       test_all_match () {
 2:  f4d1461b993 ! 2:  ac913257309 Mark 'git cat-file' sparse-index compatible
     @@ Metadata
      Author: Kevin Lyles <klyles+github@epic.com>
      
       ## Commit message ##
     -    Mark 'git cat-file' sparse-index compatible
     +    builtin/cat-file: mark 'git cat-file' sparse-index compatible
      
          This change affects how 'git cat-file' works with the index when
          specifying an object with the ":<path>" syntax (which will give file
          contents from the index).
      
     -    'git cat-file' will expand a sparse index to a full index when needed,
     -    but is currently marked as needing a full index (or rather, not marked
     -    as not needing a full index). This results in much slower 'git cat-file'
     -    operations when working within the sparse index, since we expand the
     -    index whether it's needed or not.
     +    'git cat-file' expands a sparse index to a full index any time contents
     +    are requested from the index by specifying an object with the ":<path>"
     +    syntax. This is true even when the requested file is part of the sparse
     +    index, and results in much slower 'git cat-file' operations when working
     +    within the sparse index.
      
          Mark 'git cat-file' as not needing a full index, so that you only pay
          the cost of expanding the sparse index to a full index when you request