mbox series

[0/3] cat-file: add %(objectmode) avoid verifying submodules' OIDs

Message ID pull.1689.git.1710183362.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series cat-file: add %(objectmode) avoid verifying submodules' OIDs | expand

Message

Philippe Blain via GitGitGadget March 11, 2024, 6:55 p.m. UTC
The cat-file --batch command is very valuable in server settings, but so far
it is missing a bit of functionality that would come in handy there.

For example, it is sometimes necessary to determine the object mode of a
batch of tree objects' children.

This came up in $dayjob recently, and applies cleanly to v2.44.0.

Johannes Schindelin (1):
  cat-file: avoid verifying submodules' OIDs

Victoria Dye (2):
  t1006: update 'run_tests' to test generic object specifiers
  cat-file: add %(objectmode) atom

 Documentation/git-cat-file.txt | 10 +++++
 builtin/cat-file.c             | 41 ++++++++++++++----
 t/t1006-cat-file.sh            | 79 +++++++++++++++++++++-------------
 3 files changed, 92 insertions(+), 38 deletions(-)


base-commit: 3c2a3fdc388747b9eaf4a4a4f2035c1c9ddb26d0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1689%2Fdscho%2Fcat-file-vs-submodules-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1689/dscho/cat-file-vs-submodules-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1689

Comments

Junio C Hamano March 11, 2024, 9:43 p.m. UTC | #1
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> The cat-file --batch command is very valuable in server settings, but so far
> it is missing a bit of functionality that would come in handy there.
>
> For example, it is sometimes necessary to determine the object mode of a
> batch of tree objects' children.

OK.

It is somewhat unsatisfying that --batch/--batch-check lacks so
much.  Even with %(objectmode) its nature of one-object-at-a-time
makes querying children of a large tree a chore, when you compare it
with something like "cat-file -p HEAD:" that allows you to grab the
needed information for all children with a single invocation.

This is orthogonal to what the patch wants to do, which is to enrich
the output side with more formatting, bit I wonder if we want to
consider enriching the input side?  e.g. instead of feeding just a
single object name from the standard input of "cat-file
--batch/--batch-check", perhaps a syntax can say "Here I have the
object name for a tree-ish object, but please pretend that I gave
you all the objects contained within it", or something?

Thanks, will queue.
Jeff King March 12, 2024, 8:59 a.m. UTC | #2
On Mon, Mar 11, 2024 at 02:43:00PM -0700, Junio C Hamano wrote:

> It is somewhat unsatisfying that --batch/--batch-check lacks so
> much.  Even with %(objectmode) its nature of one-object-at-a-time
> makes querying children of a large tree a chore, when you compare it
> with something like "cat-file -p HEAD:" that allows you to grab the
> needed information for all children with a single invocation.
> 
> This is orthogonal to what the patch wants to do, which is to enrich
> the output side with more formatting, bit I wonder if we want to
> consider enriching the input side?  e.g. instead of feeding just a
> single object name from the standard input of "cat-file
> --batch/--batch-check", perhaps a syntax can say "Here I have the
> object name for a tree-ish object, but please pretend that I gave
> you all the objects contained within it", or something?

That is an interesting direction. In practice I guess you might want to
expand trees (to show their contents) or perhaps commits (to traverse
history and/or look at their trees). And we already have tools to do
that.

So for example you can already do:

  git ls-tree --format='%(objectname) %(objectmode)' HEAD

Or if you wanted to mix-and-match with other cat-file placeholders, you
can do:

  git ls-tree --format='%(objectname) %(objectmode)' HEAD |
  git cat-file --batch-check='%(objectname) %(deltabase) %(rest)'

That is a little less efficient (we look up the object twice), but once
you are working with hex object ids it is not too bad (cat-file is
heavily optimized here). Of course in the long run I think we should
move to a future where the formatting code is shared, and you can just
ask ls-tree for deltabase if you want to.

I think leaving this to specialized tools like ls-tree gives them a lot
of flexibility that a special input mode to cat-file might find awkward.
For example, recurse vs non-recursive tree listing. Or filtering with
pathspecs. And of course when you get into commits and traversal, there
are many rev-list options. :)

The strategy so far has been making sure cat-file can efficiently take
in the output of these other tools to further describe objects. But
moving towards a unified output formatting model would be even better, I
think. In the meantime, I think cat-file learning %(objectmode) makes
sense for single names (rather than listing trees), and fortunately it
uses the same (obvious) name that ls-tree does, so we won't have a
problem unifying them later.

The patch itself looked reasonable to me, modulo the comments you
already made.

-Peff
Junio C Hamano March 12, 2024, 7:28 p.m. UTC | #3
Jeff King <peff@peff.net> writes:

> That is an interesting direction. In practice I guess you might want to
> expand trees (to show their contents) or perhaps commits (to traverse
> history and/or look at their trees). And we already have tools to do
> that.
>
> So for example you can already do:
>
>   git ls-tree --format='%(objectname) %(objectmode)' HEAD
>
> Or if you wanted to mix-and-match with other cat-file placeholders, you
> can do:
>
>   git ls-tree --format='%(objectname) %(objectmode)' HEAD |
>   git cat-file --batch-check='%(objectname) %(deltabase) %(rest)'
>
> That is a little less efficient (we look up the object twice), but once
> you are working with hex object ids it is not too bad (cat-file is
> heavily optimized here). Of course in the long run I think we should
> move to a future where the formatting code is shared, and you can just
> ask ls-tree for deltabase if you want to.

I was imagining more about a use case "cat-file --batch" was
originally designed for---having a long-running single process
and ask any and all questions you have about various objects in the
object database by interacting with it.  So "yes, ls-tree can
already give us that information", while it is true, shoots at a
different direction from what I had in mind.

> The strategy so far has been making sure cat-file can efficiently take
> in the output of these other tools to further describe objects. But
> moving towards a unified output formatting model would be even better, I
> think. In the meantime, I think cat-file learning %(objectmode) makes
> sense for single names (rather than listing trees), and fortunately it
> uses the same (obvious) name that ls-tree does, so we won't have a
> problem unifying them later.

Yes, enriching the output format side is an orthogonal issue from
the input side, and the %(objectmode) thing that gives a piece of
information that is additionally available on top of the various
pieces of information about the object itself does make sense.

> The patch itself looked reasonable to me, modulo the comments you
> already made.
>
> -Peff
Jeff King March 12, 2024, 10:03 p.m. UTC | #4
On Tue, Mar 12, 2024 at 12:28:48PM -0700, Junio C Hamano wrote:

> > Or if you wanted to mix-and-match with other cat-file placeholders, you
> > can do:
> >
> >   git ls-tree --format='%(objectname) %(objectmode)' HEAD |
> >   git cat-file --batch-check='%(objectname) %(deltabase) %(rest)'
> >
> > That is a little less efficient (we look up the object twice), but once
> > you are working with hex object ids it is not too bad (cat-file is
> > heavily optimized here). Of course in the long run I think we should
> > move to a future where the formatting code is shared, and you can just
> > ask ls-tree for deltabase if you want to.
> 
> I was imagining more about a use case "cat-file --batch" was
> originally designed for---having a long-running single process
> and ask any and all questions you have about various objects in the
> object database by interacting with it.  So "yes, ls-tree can
> already give us that information", while it is true, shoots at a
> different direction from what I had in mind.

Ah, yeah, that is one thing that cat-file does that no other part of the
system will. I do wonder in the long term if it is easier to teach
cat-file everything that all of the other commands can do, or to teach
all of the other commands some way of handling multiple requests in a
single process. ;)

(All obviously orthogonal to this patch series).

-Peff