diff mbox series

[v2] bisect: Honor log.date

Message ID 3ec4ec15-8889-913a-1184-72e55a1e0432@softwolves.pp.se (mailing list archive)
State New, archived
Headers show
Series [v2] bisect: Honor log.date | expand

Commit Message

Peter Krefting March 30, 2024, 11:10 p.m. UTC
When bisect finds the target commit to display, it calls git diff-tree
to do so. This is a plumbing command that is not affected by the user's
log.date setting. Switch to instead use "git show", which does honor
it.

Reported-by: Michael Osipov <michael.osipov@innomotics.com>
Signed-off-By: Peter Krefting <peter@softwolves.pp.se>
---
  bisect.c | 25 ++++++++++---------------
  1 file changed, 10 insertions(+), 15 deletions(-)

This version also uses "--stat" which produces an output more like the 
one from the diff-tree utility.

GitHub's test run reports a single failed test (7300), but this passes 
when I try it locally: 
https://github.com/nafmo/git-l10n-sv/commit/2f27ae64064edc5c2570f1c9ea121f3f1a7283d7

Comments

Junio C Hamano March 31, 2024, 2:14 a.m. UTC | #1
Peter Krefting <peter@softwolves.pp.se> writes:

> When bisect finds the target commit to display, it calls git diff-tree
> to do so. This is a plumbing command that is not affected by the user's
> log.date setting. Switch to instead use "git show", which does honor
> it.

I suspect that log.date is a small tip of an iceberg of the benefit
we'll get from this switch.  There is an untold assumption that
honoring the user's configuration is a good thing behind the move
against "plumbing" in the above description, but singling log.date
out would give a wrong message.  It makes it harder to answer a
question, "The commit meant to make the command honor `log.date` and
make no other behaviour changes, but there are many small behaviour
changes---are they intended?", when somebody reads this commit log
message after we all forgot about the true motivation behind the
change.

    Subject: [PATCH vN] bisect: report the final commit with "show"

    When "git bisect" finds the first bad commit and shows it to the
    user, it calls "git diff-tree", whose output is meant to be
    stable and deliberately ignores end-user customizations.

    As this output is meant to be consumed by humans, let's switch
    it to use "git show" so that we honor end-user customizations
    via the configuration mechanism (e.g., "log.mailmap") and
    benefit from UI improvements meant for human consumption (e.g.,
    the output is sent to the pager) in "git show" relative to "git
    diff-tree".

    We have to give "git show" some hardcoded options, like not
    showing the patch text at all, as the patch is too much for the
    purpose of "git bisect" reporting the final commit.

would be how I would explain and justify this change.  If we later
add more configuration to tweak "git show" output, it will affect
the output from "git bisect" automatically, which is another thing
you may want to explain and use as another reason to justify the
change (in the second paragraph).

Some differences in the proposed output and the current output I see
are:

 - the output now goes to the pager

 - it now honors log.mailmap (which may default to true, so you
   could disable it with log.mailmap=false).

 - it shows the ref decoration by default (when the output goes to
   terminal).

 - the commit object names for the merge parents are abbreviated.

 - it no longer shows the change summary (creation, deletion,
   rename, copy).

 - it no longer shows the diffstat when the final commit turns out
   to be a merge commit.

There may be other differences.

I personally welcome the first four changes above, which I suspect
you didn't intend to make (I suspect that you weren't even aware of
making these changes).

If there were no existing users of "git bisect" other than me, I
would even suggest dropping "--no-abbrev-commit" from the set of
hardcoded "git show" options, so that the commit object name itself,
just like the commit object names for the merge parents, gets
abbreviated.  The abbreviation is designed to give us unique prefix,
so for the purpose of cutting and pasting from the output to some
other Git command, it should not break my workflow.  If some tool is
reading the output and blindly assuming that the object names are
spelled in full, such a change will break it.

The final two changes, lack of diffstat for merges, may or may not
be considered a regression, depending on the user you ask.  I was
just surprised by them but personally was not too unhappy with the
behaviour change, but reactions from other couple of thousands of
Git users (we have at least that many users these days, no?) may be
different from mine, ranging from "Meh" to "you broke my workflow".

A good test case to try is to do a bisection that finds c2f3bf07
(GIT 1.0.0, 2005-12-21) with and without your patch and compare
the output from them.  I say it is "good test case", not because
I view any difference is a bug in this patch, but because many
differences are probably good things that helps us to promote the
behaviour changes.  They just need to be explained in the proposed
log message to tell our future developers that we knew about these
behaviour changes and we meant to make them.

Having said all that.

> +static void show_commit(struct commit *commit)
>  {
> -	const char *argv[] = {
> -		"diff-tree", "--pretty", "--stat", "--summary", "--cc", NULL
> -	};
> -	struct rev_info opt;
> +	struct child_process show = CHILD_PROCESS_INIT;

It is very good that we no longer use the separate argv[] array and
use the more convenient strvec_pushl() call, which will make it
easier for us to later tweak the arguments we pass to the command
invocation dynamically if needed.

> -	git_config(git_diff_ui_config, NULL);
> -	repo_init_revisions(r, &opt, prefix);
> -
> -	setup_revisions(ARRAY_SIZE(argv) - 1, argv, &opt, NULL);
> -	log_tree_commit(&opt, commit);
> -	release_revisions(&opt);

And not doing this in process lets us not have to bother with the
configuration and other things we did in the original.  We now spawn
an extra process to show the final commit, but this is done only at
the very end of a bisection session, so it shouldn't matter.

> +	strvec_pushl(&show.args, "show", "--pretty=medium", "--stat", "--no-abbrev-commit", "--no-patch",
> +		     oid_to_hex(&commit->object.oid), NULL);

I would write it either like this:

	strvec_pushl(&show.args, "show",
		     "--pretty=medium", "--stat",
		     "--no-abbrev-commit", "--no-patch",
		     oid_to_hex(&commit->object.oid), NULL);

in anticipation for changing the set of options over the evolution
of this code (but the first "show" line or the last "oid_to_hex()"
line would have much less chance of needing to change), or even
spread the middle part one-option-per-line.

As to the exact set of options to pass to "git show", the preference
would be different from person to person, but I probably would drop
"--pretty=medium", as it is the default and if/when "git show"
learns to tweak it via configuration variable, you would want the
output from here honor it just like you wanted it honor `log.date`.
I would not be too unhappy to see `--no-abbrev-commit` to go myself,
but some tool authors might hate you if you did so.  I dunno.

If you add --stat, don't you want to add --summary as well?  Try to
bisect down to a commit that adds or removes files to see the output
difference to decide.

Thanks.
Peter Krefting March 31, 2024, 5:10 p.m. UTC | #2
Junio C Hamano:

> I suspect that log.date is a small tip of an iceberg of the benefit
> we'll get from this switch.

Yeah. I was planning on elaborating a bit on that, but forgot 
completely by the time I came around to look at it. I will update the 
message with your suggestions for the next version.

> Some differences in the proposed output and the current output I see
> are:
>
> - the output now goes to the pager
>
> - it now honors log.mailmap (which may default to true, so you
>   could disable it with log.mailmap=false).
>
> - it shows the ref decoration by default (when the output goes to
>   terminal).
>
> - the commit object names for the merge parents are abbreviated.
>
> - it no longer shows the change summary (creation, deletion,
>   rename, copy).
>
> - it no longer shows the diffstat when the final commit turns out
>   to be a merge commit.
>
> There may be other differences.
>
> I personally welcome the first four changes above, which I suspect
> you didn't intend to make (I suspect that you weren't even aware of
> making these changes).

I hadn't really noticed that the previous implementation *didn't* 
display this. For the most part, the final output of 'bisect' looks 
like what I expect 'show' to display, to me it was mostly missing the 
other things.

> If there were no existing users of "git bisect" other than me, I
> would even suggest dropping "--no-abbrev-commit" from the set of
> hardcoded "git show" options, so that the commit object name itself,
> just like the commit object names for the merge parents, gets
> abbreviated.

The full commit hash is shown in the line above anyway, so that entire 
line is redundant. But since there is no standard format available 
that omits the commit hash I thought I'd leave it at the full hash to 
be the most like the previous behaviour as possible.

> The final two changes, lack of diffstat for merges, may or may not
> be considered a regression, depending on the user you ask.  I was
> just surprised by them but personally was not too unhappy with the
> behaviour change, but reactions from other couple of thousands of
> Git users (we have at least that many users these days, no?) may be
> different from mine, ranging from "Meh" to "you broke my workflow".

Those two were not intentional. I'll have to do a few test runs to 
compare the outputs and try to the change as non-intrusive as 
possible. Thanks.

> If you add --stat, don't you want to add --summary as well?  Try to
> bisect down to a commit that adds or removes files to see the output
> difference to decide.

There are a lot of parameters to show that I have haven't used in my 
14+ years of using Git, --summary is one of them. That's why I didn't 
add it.
Junio C Hamano March 31, 2024, 10:58 p.m. UTC | #3
Peter Krefting <peter@softwolves.pp.se> writes:

> There are a lot of parameters to show that I have haven't used in my
> 14+ years of using Git, --summary is one of them. That's why I didn't
> add it.

Yup, that is semi-understandable, but especially given that it is
one of the options used by the original "diff-tree"'s invocation,
and that we are trying to replace it with "show" from the same
family of commands, it is a bit of disappointment.

We know we used to drive "diff-tree" with a known set of options,
and we are replacing the command to use "show" with some other set
of options.  I expected it to be fairly straight-forward and natural
to feed randomly picked commits to the two commands and compare
their output while deciding what that "some other set of options"
should be.  It is exactly the reason why I mentioned v1.0.0^0 is a
good test case.

Again, the output from them do not have to be identical---we are
primarily after catching unintended loss of informatino in such a
comparison, while gaining more confidence that it is a better
approach to use "show" output to produce output for end-user
consumption.

We have changed the bisect output before, as recent as in 2019 with
b02be8b9 (bisect: make diff-tree output prettier, 2019-02-22), and
heard nobody complain, so once we get to a reasonable set of options
and land this patch, maybe we can try improving on it safely.

FYI, attached is a comparison between the diff-tree output and
output from show with my choice of options for "show" picked from
the top of my head.  I do not think I personally like the --stat
output applied to a merge (--stat and --summary do not work N-way
like --cc does for patch text), but I think these options are the
closest parallel to what we have been giving to "diff-tree".

Thanks.

---------------------- >8 ----------------------
$ git diff-tree --pretty --stat --summary --cc v1.0.0^0
commit c2f3bf071ee90b01f2d629921bb04c4f798f02fa
Merge: 1ed91937e5cd59fdbdfa5f15f6fac132d2b21ce0 41f93a2c903a45167b26c2dc93d45ffa9a9bbd49
Author: Junio C Hamano <junkio@cox.net>
Date:   Wed Dec 21 00:01:00 2005 -0800

    GIT 1.0.0
    
    Signed-off-by: Junio C Hamano <junkio@cox.net>

 .gitignore                                       |   1 -
 Documentation/diff-options.txt                   |   8 +
 ...
 tar-tree.c                                       |   4 +-
 unpack-objects.c                                 |  13 +-
 66 files changed, 778 insertions(+), 617 deletions(-)
 delete mode 100644 Documentation/git-octopus.txt
 ...
 mode change 100644 => 100755 t/t5500-fetch-pack.sh
 mode change 100644 => 100755 t/t6101-rev-parse-parents.sh

---------------------- >8 ----------------------
$ git show -s --stat --summary --first-parent v1.0.0^0
commit c2f3bf071ee90b01f2d629921bb04c4f798f02fa
Merge: 1ed91937e5 41f93a2c90
Author: Junio C Hamano <gitster@pobox.com>
Date:   Wed Dec 21 00:01:00 2005 -0800

    GIT 1.0.0
    
    Signed-off-by: Junio C Hamano <junkio@cox.net>

 .gitignore                                       |   1 -
 Documentation/diff-options.txt                   |   8 +
 ...
 tar-tree.c                                       |   4 +-
 unpack-objects.c                                 |  13 +-
 66 files changed, 778 insertions(+), 617 deletions(-)
 delete mode 100644 Documentation/git-octopus.txt
 ...
 mode change 100644 => 100755 t/t5500-fetch-pack.sh
 mode change 100644 => 100755 t/t6101-rev-parse-parents.sh
Jeff King April 1, 2024, 2:32 a.m. UTC | #4
On Sun, Mar 31, 2024 at 03:58:21PM -0700, Junio C Hamano wrote:

> Again, the output from them do not have to be identical---we are
> primarily after catching unintended loss of informatino in such a
> comparison, while gaining more confidence that it is a better
> approach to use "show" output to produce output for end-user
> consumption.
> 
> We have changed the bisect output before, as recent as in 2019 with
> b02be8b9 (bisect: make diff-tree output prettier, 2019-02-22), and
> heard nobody complain, so once we get to a reasonable set of options
> and land this patch, maybe we can try improving on it safely.

I guess that commit is what brought me into the cc. I have not been
following this topic too closely, but generally I'm in favor of using
"git show". I even suggested it back then, but I think Christian
preferred not using an external process if we could avoid it.

The thread from 2019 is here:

  http://lore.kernel.org/git/20190222061949.GA9875@sigill.intra.peff.net

which links to the earlier discussion about "git show":

  https://lore.kernel.org/git/CAP8UFD3QhTUj+j3vBGrm0sTQ2dSOLS-m2_PwFj6DZS4VZHKRTQ@mail.gmail.com/

IMHO this config thing is a good example of the strength of the separate
"show" process. If our goal is to trigger all the niceties of "git
show", it is tricky to catch them all. The revision machinery is pretty
reusable, but there's no easy way to figure out which config affects
git-show and so on. Of course if we had a way to invoke git-show
in-process that would work, but I suspect there are unexpected corner
cases that might trigger.

> FYI, attached is a comparison between the diff-tree output and
> output from show with my choice of options for "show" picked from
> the top of my head.  I do not think I personally like the --stat
> output applied to a merge (--stat and --summary do not work N-way
> like --cc does for patch text), but I think these options are the
> closest parallel to what we have been giving to "diff-tree".

I think it was me who added the --cc in 2019; before then we simply
showed nothing at all for merges. I am inclined to say that --cc is not
really that useful for a bisection at all. If a merge introduces a bug,
it _might_ come from a resolved hunk that would be shown by --cc
--patch, but it is just as likely to me that there is some semantic
conflict between the two sides.

For a workflow like the one we use in git.git, where we are merging
topic branches to a long-running branch, I think that showing the
-stat/--summary against the first parent is what you want. We know that
things worked in merge^1, so we show the changes brought in by merge^2.
That does not necessarily mean that the changes in merge^1 are not to
blame, but at least the changes in merge^2 give you a good place to
start.

For a workflow where you do lots of back-merges, it is less clear that
showing the changes against the first parent is better than against
others. But I still think the "well, at least showing the changes
against one parent gives you an idea of where to start looking" logic
applies. And showing the "-m" diff against every parent often has a lot
of useless noise.

I don't think I considered all this back when adding --cc in 2019. But I
believe that "--stat --cc" is just showing the diff against the first
parent. Which happens to be what I think this useful, biased of course
by the fact that projects I work on tend to use a topic-branch workflow. ;)

Arguably passing "--diff-merges=first-parent" would more directly
express the intent (I don't think that existed back in 2019).

Of course with "git show" we do not need to even say anything, since
"--cc" is the default and it does what we (I) want.

(I was puzzled that earlier in the thread you said "it no longer shows
the diffstat when the final commit turns out to be a merge commit". It
looks like it still does?).

I do think keeping --summary is important; it's the only place we show
mode changes, for example.

The other changes you outlined all seem like improvements to me.

Looking at Peter's patch, I think:

  - "--no-patch" is doing nothing (passing --stat is enough to suppress
    the default behavior of showing the patch).

  - "--pretty=medium" is redundant at best (it's the default), and
    possibly overriding a different decision "show" might make (I don't
    remember if we have a way for a user to configure the default show
    format for commits, but if we did, I think users would expect it to
    kick in here)

  - I'm not sure what the intent is in adding --no-abbrev-commit. It is
    already the default not to abbreviate it in the "commit <oid>" line,
    and if the user has set log.abbrevcommit, shouldn't we respect that?
    It seems to me the point of the patch is that "git show" represents
    the way users expect to see commits shown, and we should let it do
    its thing as much as possible.

-Peff
Peter Krefting April 1, 2024, 3:50 p.m. UTC | #5
Junio C Hamano:

> Yup, that is semi-understandable, but especially given that it is 
> one of the options used by the original "diff-tree"'s invocation, 
> and that we are trying to replace it with "show" from the same 
> family of commands, it is a bit of disappointment.

Indeed. I will make the necessary adjustments.

> FYI, attached is a comparison between the diff-tree output and 
> output from show with my choice of options for "show" picked from 
> the top of my head.

I am trying to run some comparisons, but I'm not entirely certain what 
the parameters are that were passed to "ls-tree", as it doesn't 
actually run it through a command line. I tried the v1.0.0^0 and are 
seeing discrepancies in the line count. I need to check if it is my 
configuration that causes it, or something else:

   $ git diff-tree --pretty --stat --summary --cc v1.0.0^0 | grep clone-pack.c
    clone-pack.c                                     | 153 ++----------------
   $ git show --stat --summary --no-abbrev-commit v1.0.0^0 | grep clone-pack.c
    clone-pack.c                                     | 151 ++----------------

(these are the options I've currently landed on)

> I do not think I personally like the --stat output applied to a 
> merge (--stat and --summary do not work N-way like --cc does for 
> patch text), but I think these options are the closest parallel to 
> what we have been giving to "diff-tree".

I don't really have a preference here. I usually only look at when 
something changed (which is why I initially targetted the date format; 
in Sweden the YYYY-MM-DD date format is the most prevalent) and the 
commit message (for bug tracker and code-review references and so on), 
less so the actual diff details (those I can look into later).

> $ git show -s --stat --summary --first-parent v1.0.0^0

Hmm, the git show manual page doesn't document supporting 
"--first-parent".

Jeff King:

> I guess that commit is what brought me into the cc. I have not been 
> following this topic too closely, but generally I'm in favor of 
> using "git show". I even suggested it back then, but I think 
> Christian preferred not using an external process if we could avoid 
> it.

I saw the code that tried to avoid calling one. I don't know the 
internals well enough here to figure out if we can do without, even 
when using git show?



That made me realize, if "git show" runs things through a pager, 
wouldn't it then lose the "%s is the first %s commit\n" message 
printed by bisect_next_all() before calling the function to show the 
contents?

Is that fixable?

> The thread from 2019 is here:
>
>  http://lore.kernel.org/git/20190222061949.GA9875@sigill.intra.peff.net
>
> which links to the earlier discussion about "git show":
>
>  https://lore.kernel.org/git/CAP8UFD3QhTUj+j3vBGrm0sTQ2dSOLS-m2_PwFj6DZS4VZHKRTQ@mail.gmail.com/

These two seems to also get it to honor settings (one to not colorize 
output, for instance). So this would be a step further.

> I do think keeping --summary is important; it's the only place we show
> mode changes, for example.

Yes, will fix that. I hadn't realized I lost that, since it wasn't 
something I have been using myself.

>  - "--no-patch" is doing nothing (passing --stat is enough to suppress
>    the default behavior of showing the patch).

Indeed. And it also negates "--summary", so I have dropped that.

>  - "--pretty=medium" is redundant at best (it's the default),

Dropped.

>  - I'm not sure what the intent is in adding --no-abbrev-commit. It is
>    already the default not to abbreviate it in the "commit <oid>" line,
>    and if the user has set log.abbrevcommit, shouldn't we respect that?

I think I added it because the diff-tree command did something 
similar. I can drop that as well ("bisect" displays the full commit 
hash anyway). I guess it mostly is for merges where we show the parent 
hashes?
Jeff King April 1, 2024, 4:32 p.m. UTC | #6
On Mon, Apr 01, 2024 at 04:50:32PM +0100, Peter Krefting wrote:

> I am trying to run some comparisons, but I'm not entirely certain what the
> parameters are that were passed to "ls-tree", as it doesn't actually run it
> through a command line. I tried the v1.0.0^0 and are seeing discrepancies in
> the line count. I need to check if it is my configuration that causes it, or
> something else:
> 
>   $ git diff-tree --pretty --stat --summary --cc v1.0.0^0 | grep clone-pack.c
>    clone-pack.c                                     | 153 ++----------------
>   $ git show --stat --summary --no-abbrev-commit v1.0.0^0 | grep clone-pack.c
>    clone-pack.c                                     | 151 ++----------------
> 
> (these are the options I've currently landed on)

Hmm, I get 153 for both. Presumably it's due to some config that only
git-show respects...

Aha. If I set diff.algorithm to "patience", I get 151. And I think
bisect would produce the same, because it loads diff_ui_config() before
running the internal diff-tree.

So I think this is fine.

> > $ git show -s --stat --summary --first-parent v1.0.0^0
> 
> Hmm, the git show manual page doesn't document supporting "--first-parent".

I think that's a documentation bug(-ish). We do not include all of the
traversal-related options that "git log" could use because "git show"
does not traverse by default. But it does also affect diffs, per the
comment added to git-log's documentation in e58142add4
(doc/rev-list-options: document --first-parent changes merges format,
2020-12-21).

But these days we have "--diff-merges=first-parent", which I think is a
more intuitive way to specify the same thing for git-show. And it is
documented. So I'd say we could probably continue to not mention
"--first-parent" itself for git-show.

> Jeff King:
> 
> > I guess that commit is what brought me into the cc. I have not been
> > following this topic too closely, but generally I'm in favor of using
> > "git show". I even suggested it back then, but I think Christian
> > preferred not using an external process if we could avoid it.
> 
> I saw the code that tried to avoid calling one. I don't know the internals
> well enough here to figure out if we can do without, even when using git
> show?

There's not really an easy way.

I think the only thing you could do is call cmd_show(), but I'm
skeptical of that approach in general. The builtin top-level commands
are not designed to be run from other spots. And while it will generally
work, there will be corner cases (e.g., loading config that touches
globals, affecting the calling command in unexpected ways). I suspect
you could largely get away with it here where showing the commit is the
last thing we do, but I don't think it's a good pattern to get into.

> That made me realize, if "git show" runs things through a pager, wouldn't it
> then lose the "%s is the first %s commit\n" message printed by
> bisect_next_all() before calling the function to show the contents?
> 
> Is that fixable?

Good catch. IMHO we should disable the pager entirely by sticking
"--no-pager" at the front of the child argv. But then, maybe somebody
would like the output to be paged? I wouldn't.

If we really wanted to keep the pager for git-show, I guess we'd need to
have it print the "%s is the first %s commit" message. The only way I
can think to do that is to pass it as a custom --format. But then we'd
need to additionally specify all of the usual "medium" format as a
custom format, too, which is quite ugly.

> I think I added it because the diff-tree command did something similar. I
> can drop that as well ("bisect" displays the full commit hash anyway). I
> guess it mostly is for merges where we show the parent hashes?

No, I don't think the "Merge:" lines are affected by it either way.
Those are always abbreviated, and looking at pretty.c's
add_merge_info(), I don't think there is any config that affects it.

-Peff
Junio C Hamano April 1, 2024, 5:03 p.m. UTC | #7
Jeff King <peff@peff.net> writes:

> So I think this is fine.
>
>> > $ git show -s --stat --summary --first-parent v1.0.0^0
>> 
>> Hmm, the git show manual page doesn't document supporting "--first-parent".
>
> I think that's a documentation bug(-ish). We do not include all of the
> traversal-related options that "git log" could use because "git show"
> does not traverse by default. But it does also affect diffs, per the
> comment added to git-log's documentation in e58142add4
> (doc/rev-list-options: document --first-parent changes merges format,
> 2020-12-21).

It's one of the "show is a command in the log family, so some of the
options that are appropriate to log applies there".  The ones that
are not useful are the ones about commit walking (e.g., "git show
--no-merges seen" would probably show nothing), but many are still
relevant.  After all "git show" is a "git log --no-walk --cc" in
disguise.  The "--first-parent" option affects both traversal (which
is useless in the context of "git show" that does not walk) and also
diff generation (which does make it show the diffstat/summary/patch
relative to the first parent), as you two saw.

>> I saw the code that tried to avoid calling one. I don't know the internals
>> well enough here to figure out if we can do without, even when using git
>> show?
>
> There's not really an easy way.

True, but this is "we show the single commit we found before
exiting"; executing "git show" as an external program is fine and
not worth "optimizing out" the cost of starting another process.

> I think the only thing you could do is call cmd_show(), but I'm
> skeptical of that approach in general. The builtin top-level commands
> are not designed to be run from other spots. And while it will generally
> work, there will be corner cases (e.g., loading config that touches
> globals, affecting the calling command in unexpected ways). I suspect
> you could largely get away with it here where showing the commit is the
> last thing we do, but I don't think it's a good pattern to get into.

Exactly.  Anybody who turns run_command("foo") into blindly calling
cmd_foo() should be shot, twice ;-).  The right way to turn
run_command("foo") into an internal call is not to call cmd_foo(),
but to refactor cmd_foo() into the part that sets up the global
state and the part that does the "foo" thing, and make the latter a
reusable function.

In the longer run, if we had infinite engineering resources, it
would be nice to have everything callable by everything else
internally, is it worth doing for this case?  I dunno.

>> That made me realize, if "git show" runs things through a pager, wouldn't it
>> then lose the "%s is the first %s commit\n" message printed by
>> bisect_next_all() before calling the function to show the contents?
>> 
>> Is that fixable?
>
> Good catch. IMHO we should disable the pager entirely by sticking
> "--no-pager" at the front of the child argv. But then, maybe somebody
> would like the output to be paged? I wouldn't.

Hardcoded --no-pager is a good workaround.  But if the output is
long and needs paging, wouldn't we see what was shown before we
spawned "less" on the screen when we quit it?  Running

    $ (echo message here ; git log --help)

and then saying 'q' to exit the pager leaves me "message" after that
command line.

> If we really wanted to keep the pager for git-show, I guess we'd need to
> have it print the "%s is the first %s commit" message. The only way I
> can think to do that is to pass it as a custom --format. But then we'd
> need to additionally specify all of the usual "medium" format as a
> custom format, too, which is quite ugly.

;-)  Ugly but fun.

I wonder how hard it is to add %(default-output) placeholder for the
pretty machinery.

Thanks.
Jeff King April 3, 2024, 1:27 a.m. UTC | #8
On Mon, Apr 01, 2024 at 10:03:13AM -0700, Junio C Hamano wrote:

> >> That made me realize, if "git show" runs things through a pager, wouldn't it
> >> then lose the "%s is the first %s commit\n" message printed by
> >> bisect_next_all() before calling the function to show the contents?
> >> 
> >> Is that fixable?
> >
> > Good catch. IMHO we should disable the pager entirely by sticking
> > "--no-pager" at the front of the child argv. But then, maybe somebody
> > would like the output to be paged? I wouldn't.
> 
> Hardcoded --no-pager is a good workaround.  But if the output is
> long and needs paging, wouldn't we see what was shown before we
> spawned "less" on the screen when we quit it?  Running
> 
>     $ (echo message here ; git log --help)
> 
> and then saying 'q' to exit the pager leaves me "message" after that
> command line.

That depends on your "less" options and your terminal, I think. Aren't
there some combinations where the terminal deinit sequence clears the
screen? It has been a while since I've run into that, though, so I might
be misremembering.

At any rate, my concerns are more:

  1. You wouldn't see it while the pager is active, so you are missing
     some context.

  2. If you don't use LESS=F, then it may be annoying to invoke the
     pager at all.

-Peff

> > If we really wanted to keep the pager for git-show, I guess we'd need to
> > have it print the "%s is the first %s commit" message. The only way I
> > can think to do that is to pass it as a custom --format. But then we'd
> > need to additionally specify all of the usual "medium" format as a
> > custom format, too, which is quite ugly.
> 
> ;-)  Ugly but fun.
> 
> I wonder how hard it is to add %(default-output) placeholder for the
> pretty machinery.

I have a dream that all of the pretty formats could be implemented in
terms of %-placeholders. But yeah, even without that, being able to do
"%(pretty:medium)" would be cool. "Pretty" cool, even. (Sorry, I could
not resist).

-Peff
Christian Couder April 16, 2024, 11:01 a.m. UTC | #9
On Mon, Apr 1, 2024 at 4:32 AM Jeff King <peff@peff.net> wrote:

> I guess that commit is what brought me into the cc. I have not been
> following this topic too closely, but generally I'm in favor of using
> "git show". I even suggested it back then, but I think Christian
> preferred not using an external process if we could avoid it.
>
> The thread from 2019 is here:
>
>   http://lore.kernel.org/git/20190222061949.GA9875@sigill.intra.peff.net
>
> which links to the earlier discussion about "git show":
>
>   https://lore.kernel.org/git/CAP8UFD3QhTUj+j3vBGrm0sTQ2dSOLS-m2_PwFj6DZS4VZHKRTQ@mail.gmail.com/
>
> IMHO this config thing is a good example of the strength of the separate
> "show" process. If our goal is to trigger all the niceties of "git
> show", it is tricky to catch them all. The revision machinery is pretty
> reusable, but there's no easy way to figure out which config affects
> git-show and so on. Of course if we had a way to invoke git-show
> in-process that would work, but I suspect there are unexpected corner
> cases that might trigger.

Sorry for not following the topic closely and for replying to this so
late, but I think that by now we should have some kind of guidelines
about when forking a new process is Ok and when it's not.

It seems to me that there was already some amount of back and forth on
this topic when bisect and other shell commands were ported to C a
long time ago. There weren't clearly written guidelines, but it seems
to me that at that time we thought that forking a new process was
generally bad, especially for performance reasons, but also because
they showed a bad example and didn't go in the right direction. It
seems to me that people who reviewed code that ported some commands to
C sometimes asked contributors to not fork processes, and efforts were
made by contributors, like GSoC or Outreachy contributors I mentored,
to go in this direction. At one point there was even a microproject
about replacing code that forked a process with function calls.

These days there are also talks and patches around about libification
and about passing around a "repository" variable and other such
variables, so that C code does not need to fork processes to be able
to work more broadly, for example in submodules. And again it seems to
me that such changes (adding code which starts a new process to
replace code which doesn't) doesn't go in the same direction as the
libification and similar goals.
Junio C Hamano April 16, 2024, 3:42 p.m. UTC | #10
Christian Couder <christian.couder@gmail.com> writes:

>> IMHO this config thing is a good example of the strength of the separate
>> "show" process. If our goal is to trigger all the niceties of "git
>> show", it is tricky to catch them all. The revision machinery is pretty
>> reusable, but there's no easy way to figure out which config affects
>> git-show and so on. Of course if we had a way to invoke git-show
>> in-process that would work, but I suspect there are unexpected corner
>> cases that might trigger.
>
> Sorry for not following the topic closely and for replying to this so
> late, but I think that by now we should have some kind of guidelines
> about when forking a new process is Ok and when it's not.

I thought we had passed that stage long ago.  A case like this one
we see in this patch, where it is run just once immediately before
we give control back to the end-user (as opposed to "gets run each
time in a tight loop"), I would see it a no-brainer to discount the
"fork+exec is so expensive" objection more than we would otherwise,
especially when the upside of running an external command is so much
bigger.

There actually should be a different level of "running it as a
separate command" that we do not have.  If we can split out and
encapsulate the global execution context sufficiently into a "bag of
state variables" structure, and rewrite cmd_foo() for each such
command we wish to be able to run from inside an executing Git into
two parts:

 - cmd_foo() that prepares the global execution context to a
   "pristine" state, calls into cmd__foo() with that "bag of state
   variables" structure as one of the parameters, and exits when
   everything is done.

 - cmd__foo() that does the rest, including reading the
   configuration files, parsing of the command line arguments to
   override them, doing the actual work.

then the codepath we are changing from using diff-tree to show can
do something like:

	struct git_global_state state = GIT_GLOBAL_STATE_INIT;
	struct strvec args = STRVEC_INIT;

        strvec_pushl(&args, ...);
        cmd__show(&state, args.nr , args.v);

and expect that cmd__show() will do the _right thing_, right?

And to reach that ultimate goal, I do not think using run_command()
API in the meantime poses hindrance.  The real work should be in the
implementation of cmd__show(), not the open-coded use of revisions
API at each such point where you are tempted to spawn an external
command via run_command() API, which will have to be consolidated
and replaced with a call to cmd__show() anyway.
Peter Krefting April 16, 2024, 7:53 p.m. UTC | #11
Junio C Hamano:

> then the codepath we are changing from using diff-tree to show can 
> do something like:
>
> 	struct git_global_state state = GIT_GLOBAL_STATE_INIT;
> 	struct strvec args = STRVEC_INIT;
>
>        strvec_pushl(&args, ...);
>        cmd__show(&state, args.nr , args.v);
>
> and expect that cmd__show() will do the _right thing_, right?

In this particular case, calling "git show" is really the last thing 
we want to do; so if we can move the cleanup that happens after it 
(that ends the bisect), it should be able to just take over the 
current process with a call to show, without needing to re-exec.

And calling back to the libification question, I would see this part 
of the bisect command to be something that would run *on top of* the 
library (with possibly an API to poke bad/good states into it), so I 
don't think that objection holds for this particular case.
Junio C Hamano April 20, 2024, 5:13 p.m. UTC | #12
Peter Krefting <peter@softwolves.pp.se> writes:

> In this particular case, calling "git show" is really the last thing
> we want to do; so if we can move the cleanup that happens after it
> (that ends the bisect), it should be able to just take over the
> current process with a call to show, without needing to re-exec.

The cmd_foo() functions are also expected to either return to their
callers or call exit() themselves, and it is true that as the very
last step before giving the control back to the end-user in the
current "bisect" process, we could make an internal call to
cmd_foo() and let it exit with its own exit status.

But that is only the latter half of a story.

The cmd_show() (or any cmd_foo() in general) function expects to
start from within a pristine environment.  Calling them _after_
somebody else (in this case everything called from cmd_bisect())
clobbered the global state may or may not work (and in general we
should assume it would not work) correctly.

The outline of the envisioned end state of libification I gave was
about an arrangement to ensure that we can give such an pristine
state when we make a call to such "top level" entry point of "foo"
command (in this case, "show"), from a different command (in this
case, "bisect").  It is very much orthogonal to what you are talking
about, I think.  We need both.

> And calling back to the libification question, I would see this part
> of the bisect command to be something that would run *on top of* the
> library (with possibly an API to poke bad/good states into it), so I
> don't think that objection holds for this particular case.

There was no objection.  I was just pointing out that the infrastructure
is not ready to do so.
diff mbox series

Patch

diff --git a/bisect.c b/bisect.c
index 8487f8cd1b..3d0100b165 100644
--- a/bisect.c
+++ b/bisect.c
@@ -959,23 +959,18 @@  static enum bisect_error check_good_are_ancestors_of_bad(struct repository *r,
  }

  /*
- * This does "git diff-tree --pretty COMMIT" without one fork+exec.
+ * Runs "git show" to display a commit
   */
-static void show_diff_tree(struct repository *r,
-			   const char *prefix,
-			   struct commit *commit)
+static void show_commit(struct commit *commit)
  {
-	const char *argv[] = {
-		"diff-tree", "--pretty", "--stat", "--summary", "--cc", NULL
-	};
-	struct rev_info opt;
+	struct child_process show = CHILD_PROCESS_INIT;

-	git_config(git_diff_ui_config, NULL);
-	repo_init_revisions(r, &opt, prefix);
-
-	setup_revisions(ARRAY_SIZE(argv) - 1, argv, &opt, NULL);
-	log_tree_commit(&opt, commit);
-	release_revisions(&opt);
+	strvec_pushl(&show.args, "show", "--pretty=medium", "--stat", "--no-abbrev-commit", "--no-patch",
+		     oid_to_hex(&commit->object.oid), NULL);
+	show.git_cmd = 1;
+	if (run_command(&show))
+		die(_("unable to start 'show' for object '%s'"),
+		    oid_to_hex(&commit->object.oid));
  }

  /*
@@ -1092,7 +1087,7 @@  enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
  		printf("%s is the first %s commit\n", oid_to_hex(bisect_rev),
  			term_bad);

-		show_diff_tree(r, prefix, revs.commits->item);
+		show_commit(revs.commits->item);
  		/*
  		 * This means the bisection process succeeded.
  		 * Using BISECT_INTERNAL_SUCCESS_1ST_BAD_FOUND (-10)