Message ID | 3ec4ec15-8889-913a-1184-72e55a1e0432@softwolves.pp.se (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] bisect: Honor log.date | expand |
Peter Krefting <peter@softwolves.pp.se> writes: > When bisect finds the target commit to display, it calls git diff-tree > to do so. This is a plumbing command that is not affected by the user's > log.date setting. Switch to instead use "git show", which does honor > it. I suspect that log.date is a small tip of an iceberg of the benefit we'll get from this switch. There is an untold assumption that honoring the user's configuration is a good thing behind the move against "plumbing" in the above description, but singling log.date out would give a wrong message. It makes it harder to answer a question, "The commit meant to make the command honor `log.date` and make no other behaviour changes, but there are many small behaviour changes---are they intended?", when somebody reads this commit log message after we all forgot about the true motivation behind the change. Subject: [PATCH vN] bisect: report the final commit with "show" When "git bisect" finds the first bad commit and shows it to the user, it calls "git diff-tree", whose output is meant to be stable and deliberately ignores end-user customizations. As this output is meant to be consumed by humans, let's switch it to use "git show" so that we honor end-user customizations via the configuration mechanism (e.g., "log.mailmap") and benefit from UI improvements meant for human consumption (e.g., the output is sent to the pager) in "git show" relative to "git diff-tree". We have to give "git show" some hardcoded options, like not showing the patch text at all, as the patch is too much for the purpose of "git bisect" reporting the final commit. would be how I would explain and justify this change. If we later add more configuration to tweak "git show" output, it will affect the output from "git bisect" automatically, which is another thing you may want to explain and use as another reason to justify the change (in the second paragraph). Some differences in the proposed output and the current output I see are: - the output now goes to the pager - it now honors log.mailmap (which may default to true, so you could disable it with log.mailmap=false). - it shows the ref decoration by default (when the output goes to terminal). - the commit object names for the merge parents are abbreviated. - it no longer shows the change summary (creation, deletion, rename, copy). - it no longer shows the diffstat when the final commit turns out to be a merge commit. There may be other differences. I personally welcome the first four changes above, which I suspect you didn't intend to make (I suspect that you weren't even aware of making these changes). If there were no existing users of "git bisect" other than me, I would even suggest dropping "--no-abbrev-commit" from the set of hardcoded "git show" options, so that the commit object name itself, just like the commit object names for the merge parents, gets abbreviated. The abbreviation is designed to give us unique prefix, so for the purpose of cutting and pasting from the output to some other Git command, it should not break my workflow. If some tool is reading the output and blindly assuming that the object names are spelled in full, such a change will break it. The final two changes, lack of diffstat for merges, may or may not be considered a regression, depending on the user you ask. I was just surprised by them but personally was not too unhappy with the behaviour change, but reactions from other couple of thousands of Git users (we have at least that many users these days, no?) may be different from mine, ranging from "Meh" to "you broke my workflow". A good test case to try is to do a bisection that finds c2f3bf07 (GIT 1.0.0, 2005-12-21) with and without your patch and compare the output from them. I say it is "good test case", not because I view any difference is a bug in this patch, but because many differences are probably good things that helps us to promote the behaviour changes. They just need to be explained in the proposed log message to tell our future developers that we knew about these behaviour changes and we meant to make them. Having said all that. > +static void show_commit(struct commit *commit) > { > - const char *argv[] = { > - "diff-tree", "--pretty", "--stat", "--summary", "--cc", NULL > - }; > - struct rev_info opt; > + struct child_process show = CHILD_PROCESS_INIT; It is very good that we no longer use the separate argv[] array and use the more convenient strvec_pushl() call, which will make it easier for us to later tweak the arguments we pass to the command invocation dynamically if needed. > - git_config(git_diff_ui_config, NULL); > - repo_init_revisions(r, &opt, prefix); > - > - setup_revisions(ARRAY_SIZE(argv) - 1, argv, &opt, NULL); > - log_tree_commit(&opt, commit); > - release_revisions(&opt); And not doing this in process lets us not have to bother with the configuration and other things we did in the original. We now spawn an extra process to show the final commit, but this is done only at the very end of a bisection session, so it shouldn't matter. > + strvec_pushl(&show.args, "show", "--pretty=medium", "--stat", "--no-abbrev-commit", "--no-patch", > + oid_to_hex(&commit->object.oid), NULL); I would write it either like this: strvec_pushl(&show.args, "show", "--pretty=medium", "--stat", "--no-abbrev-commit", "--no-patch", oid_to_hex(&commit->object.oid), NULL); in anticipation for changing the set of options over the evolution of this code (but the first "show" line or the last "oid_to_hex()" line would have much less chance of needing to change), or even spread the middle part one-option-per-line. As to the exact set of options to pass to "git show", the preference would be different from person to person, but I probably would drop "--pretty=medium", as it is the default and if/when "git show" learns to tweak it via configuration variable, you would want the output from here honor it just like you wanted it honor `log.date`. I would not be too unhappy to see `--no-abbrev-commit` to go myself, but some tool authors might hate you if you did so. I dunno. If you add --stat, don't you want to add --summary as well? Try to bisect down to a commit that adds or removes files to see the output difference to decide. Thanks.
Junio C Hamano: > I suspect that log.date is a small tip of an iceberg of the benefit > we'll get from this switch. Yeah. I was planning on elaborating a bit on that, but forgot completely by the time I came around to look at it. I will update the message with your suggestions for the next version. > Some differences in the proposed output and the current output I see > are: > > - the output now goes to the pager > > - it now honors log.mailmap (which may default to true, so you > could disable it with log.mailmap=false). > > - it shows the ref decoration by default (when the output goes to > terminal). > > - the commit object names for the merge parents are abbreviated. > > - it no longer shows the change summary (creation, deletion, > rename, copy). > > - it no longer shows the diffstat when the final commit turns out > to be a merge commit. > > There may be other differences. > > I personally welcome the first four changes above, which I suspect > you didn't intend to make (I suspect that you weren't even aware of > making these changes). I hadn't really noticed that the previous implementation *didn't* display this. For the most part, the final output of 'bisect' looks like what I expect 'show' to display, to me it was mostly missing the other things. > If there were no existing users of "git bisect" other than me, I > would even suggest dropping "--no-abbrev-commit" from the set of > hardcoded "git show" options, so that the commit object name itself, > just like the commit object names for the merge parents, gets > abbreviated. The full commit hash is shown in the line above anyway, so that entire line is redundant. But since there is no standard format available that omits the commit hash I thought I'd leave it at the full hash to be the most like the previous behaviour as possible. > The final two changes, lack of diffstat for merges, may or may not > be considered a regression, depending on the user you ask. I was > just surprised by them but personally was not too unhappy with the > behaviour change, but reactions from other couple of thousands of > Git users (we have at least that many users these days, no?) may be > different from mine, ranging from "Meh" to "you broke my workflow". Those two were not intentional. I'll have to do a few test runs to compare the outputs and try to the change as non-intrusive as possible. Thanks. > If you add --stat, don't you want to add --summary as well? Try to > bisect down to a commit that adds or removes files to see the output > difference to decide. There are a lot of parameters to show that I have haven't used in my 14+ years of using Git, --summary is one of them. That's why I didn't add it.
Peter Krefting <peter@softwolves.pp.se> writes: > There are a lot of parameters to show that I have haven't used in my > 14+ years of using Git, --summary is one of them. That's why I didn't > add it. Yup, that is semi-understandable, but especially given that it is one of the options used by the original "diff-tree"'s invocation, and that we are trying to replace it with "show" from the same family of commands, it is a bit of disappointment. We know we used to drive "diff-tree" with a known set of options, and we are replacing the command to use "show" with some other set of options. I expected it to be fairly straight-forward and natural to feed randomly picked commits to the two commands and compare their output while deciding what that "some other set of options" should be. It is exactly the reason why I mentioned v1.0.0^0 is a good test case. Again, the output from them do not have to be identical---we are primarily after catching unintended loss of informatino in such a comparison, while gaining more confidence that it is a better approach to use "show" output to produce output for end-user consumption. We have changed the bisect output before, as recent as in 2019 with b02be8b9 (bisect: make diff-tree output prettier, 2019-02-22), and heard nobody complain, so once we get to a reasonable set of options and land this patch, maybe we can try improving on it safely. FYI, attached is a comparison between the diff-tree output and output from show with my choice of options for "show" picked from the top of my head. I do not think I personally like the --stat output applied to a merge (--stat and --summary do not work N-way like --cc does for patch text), but I think these options are the closest parallel to what we have been giving to "diff-tree". Thanks. ---------------------- >8 ---------------------- $ git diff-tree --pretty --stat --summary --cc v1.0.0^0 commit c2f3bf071ee90b01f2d629921bb04c4f798f02fa Merge: 1ed91937e5cd59fdbdfa5f15f6fac132d2b21ce0 41f93a2c903a45167b26c2dc93d45ffa9a9bbd49 Author: Junio C Hamano <junkio@cox.net> Date: Wed Dec 21 00:01:00 2005 -0800 GIT 1.0.0 Signed-off-by: Junio C Hamano <junkio@cox.net> .gitignore | 1 - Documentation/diff-options.txt | 8 + ... tar-tree.c | 4 +- unpack-objects.c | 13 +- 66 files changed, 778 insertions(+), 617 deletions(-) delete mode 100644 Documentation/git-octopus.txt ... mode change 100644 => 100755 t/t5500-fetch-pack.sh mode change 100644 => 100755 t/t6101-rev-parse-parents.sh ---------------------- >8 ---------------------- $ git show -s --stat --summary --first-parent v1.0.0^0 commit c2f3bf071ee90b01f2d629921bb04c4f798f02fa Merge: 1ed91937e5 41f93a2c90 Author: Junio C Hamano <gitster@pobox.com> Date: Wed Dec 21 00:01:00 2005 -0800 GIT 1.0.0 Signed-off-by: Junio C Hamano <junkio@cox.net> .gitignore | 1 - Documentation/diff-options.txt | 8 + ... tar-tree.c | 4 +- unpack-objects.c | 13 +- 66 files changed, 778 insertions(+), 617 deletions(-) delete mode 100644 Documentation/git-octopus.txt ... mode change 100644 => 100755 t/t5500-fetch-pack.sh mode change 100644 => 100755 t/t6101-rev-parse-parents.sh
On Sun, Mar 31, 2024 at 03:58:21PM -0700, Junio C Hamano wrote: > Again, the output from them do not have to be identical---we are > primarily after catching unintended loss of informatino in such a > comparison, while gaining more confidence that it is a better > approach to use "show" output to produce output for end-user > consumption. > > We have changed the bisect output before, as recent as in 2019 with > b02be8b9 (bisect: make diff-tree output prettier, 2019-02-22), and > heard nobody complain, so once we get to a reasonable set of options > and land this patch, maybe we can try improving on it safely. I guess that commit is what brought me into the cc. I have not been following this topic too closely, but generally I'm in favor of using "git show". I even suggested it back then, but I think Christian preferred not using an external process if we could avoid it. The thread from 2019 is here: http://lore.kernel.org/git/20190222061949.GA9875@sigill.intra.peff.net which links to the earlier discussion about "git show": https://lore.kernel.org/git/CAP8UFD3QhTUj+j3vBGrm0sTQ2dSOLS-m2_PwFj6DZS4VZHKRTQ@mail.gmail.com/ IMHO this config thing is a good example of the strength of the separate "show" process. If our goal is to trigger all the niceties of "git show", it is tricky to catch them all. The revision machinery is pretty reusable, but there's no easy way to figure out which config affects git-show and so on. Of course if we had a way to invoke git-show in-process that would work, but I suspect there are unexpected corner cases that might trigger. > FYI, attached is a comparison between the diff-tree output and > output from show with my choice of options for "show" picked from > the top of my head. I do not think I personally like the --stat > output applied to a merge (--stat and --summary do not work N-way > like --cc does for patch text), but I think these options are the > closest parallel to what we have been giving to "diff-tree". I think it was me who added the --cc in 2019; before then we simply showed nothing at all for merges. I am inclined to say that --cc is not really that useful for a bisection at all. If a merge introduces a bug, it _might_ come from a resolved hunk that would be shown by --cc --patch, but it is just as likely to me that there is some semantic conflict between the two sides. For a workflow like the one we use in git.git, where we are merging topic branches to a long-running branch, I think that showing the -stat/--summary against the first parent is what you want. We know that things worked in merge^1, so we show the changes brought in by merge^2. That does not necessarily mean that the changes in merge^1 are not to blame, but at least the changes in merge^2 give you a good place to start. For a workflow where you do lots of back-merges, it is less clear that showing the changes against the first parent is better than against others. But I still think the "well, at least showing the changes against one parent gives you an idea of where to start looking" logic applies. And showing the "-m" diff against every parent often has a lot of useless noise. I don't think I considered all this back when adding --cc in 2019. But I believe that "--stat --cc" is just showing the diff against the first parent. Which happens to be what I think this useful, biased of course by the fact that projects I work on tend to use a topic-branch workflow. ;) Arguably passing "--diff-merges=first-parent" would more directly express the intent (I don't think that existed back in 2019). Of course with "git show" we do not need to even say anything, since "--cc" is the default and it does what we (I) want. (I was puzzled that earlier in the thread you said "it no longer shows the diffstat when the final commit turns out to be a merge commit". It looks like it still does?). I do think keeping --summary is important; it's the only place we show mode changes, for example. The other changes you outlined all seem like improvements to me. Looking at Peter's patch, I think: - "--no-patch" is doing nothing (passing --stat is enough to suppress the default behavior of showing the patch). - "--pretty=medium" is redundant at best (it's the default), and possibly overriding a different decision "show" might make (I don't remember if we have a way for a user to configure the default show format for commits, but if we did, I think users would expect it to kick in here) - I'm not sure what the intent is in adding --no-abbrev-commit. It is already the default not to abbreviate it in the "commit <oid>" line, and if the user has set log.abbrevcommit, shouldn't we respect that? It seems to me the point of the patch is that "git show" represents the way users expect to see commits shown, and we should let it do its thing as much as possible. -Peff
Junio C Hamano: > Yup, that is semi-understandable, but especially given that it is > one of the options used by the original "diff-tree"'s invocation, > and that we are trying to replace it with "show" from the same > family of commands, it is a bit of disappointment. Indeed. I will make the necessary adjustments. > FYI, attached is a comparison between the diff-tree output and > output from show with my choice of options for "show" picked from > the top of my head. I am trying to run some comparisons, but I'm not entirely certain what the parameters are that were passed to "ls-tree", as it doesn't actually run it through a command line. I tried the v1.0.0^0 and are seeing discrepancies in the line count. I need to check if it is my configuration that causes it, or something else: $ git diff-tree --pretty --stat --summary --cc v1.0.0^0 | grep clone-pack.c clone-pack.c | 153 ++---------------- $ git show --stat --summary --no-abbrev-commit v1.0.0^0 | grep clone-pack.c clone-pack.c | 151 ++---------------- (these are the options I've currently landed on) > I do not think I personally like the --stat output applied to a > merge (--stat and --summary do not work N-way like --cc does for > patch text), but I think these options are the closest parallel to > what we have been giving to "diff-tree". I don't really have a preference here. I usually only look at when something changed (which is why I initially targetted the date format; in Sweden the YYYY-MM-DD date format is the most prevalent) and the commit message (for bug tracker and code-review references and so on), less so the actual diff details (those I can look into later). > $ git show -s --stat --summary --first-parent v1.0.0^0 Hmm, the git show manual page doesn't document supporting "--first-parent". Jeff King: > I guess that commit is what brought me into the cc. I have not been > following this topic too closely, but generally I'm in favor of > using "git show". I even suggested it back then, but I think > Christian preferred not using an external process if we could avoid > it. I saw the code that tried to avoid calling one. I don't know the internals well enough here to figure out if we can do without, even when using git show? That made me realize, if "git show" runs things through a pager, wouldn't it then lose the "%s is the first %s commit\n" message printed by bisect_next_all() before calling the function to show the contents? Is that fixable? > The thread from 2019 is here: > > http://lore.kernel.org/git/20190222061949.GA9875@sigill.intra.peff.net > > which links to the earlier discussion about "git show": > > https://lore.kernel.org/git/CAP8UFD3QhTUj+j3vBGrm0sTQ2dSOLS-m2_PwFj6DZS4VZHKRTQ@mail.gmail.com/ These two seems to also get it to honor settings (one to not colorize output, for instance). So this would be a step further. > I do think keeping --summary is important; it's the only place we show > mode changes, for example. Yes, will fix that. I hadn't realized I lost that, since it wasn't something I have been using myself. > - "--no-patch" is doing nothing (passing --stat is enough to suppress > the default behavior of showing the patch). Indeed. And it also negates "--summary", so I have dropped that. > - "--pretty=medium" is redundant at best (it's the default), Dropped. > - I'm not sure what the intent is in adding --no-abbrev-commit. It is > already the default not to abbreviate it in the "commit <oid>" line, > and if the user has set log.abbrevcommit, shouldn't we respect that? I think I added it because the diff-tree command did something similar. I can drop that as well ("bisect" displays the full commit hash anyway). I guess it mostly is for merges where we show the parent hashes?
On Mon, Apr 01, 2024 at 04:50:32PM +0100, Peter Krefting wrote: > I am trying to run some comparisons, but I'm not entirely certain what the > parameters are that were passed to "ls-tree", as it doesn't actually run it > through a command line. I tried the v1.0.0^0 and are seeing discrepancies in > the line count. I need to check if it is my configuration that causes it, or > something else: > > $ git diff-tree --pretty --stat --summary --cc v1.0.0^0 | grep clone-pack.c > clone-pack.c | 153 ++---------------- > $ git show --stat --summary --no-abbrev-commit v1.0.0^0 | grep clone-pack.c > clone-pack.c | 151 ++---------------- > > (these are the options I've currently landed on) Hmm, I get 153 for both. Presumably it's due to some config that only git-show respects... Aha. If I set diff.algorithm to "patience", I get 151. And I think bisect would produce the same, because it loads diff_ui_config() before running the internal diff-tree. So I think this is fine. > > $ git show -s --stat --summary --first-parent v1.0.0^0 > > Hmm, the git show manual page doesn't document supporting "--first-parent". I think that's a documentation bug(-ish). We do not include all of the traversal-related options that "git log" could use because "git show" does not traverse by default. But it does also affect diffs, per the comment added to git-log's documentation in e58142add4 (doc/rev-list-options: document --first-parent changes merges format, 2020-12-21). But these days we have "--diff-merges=first-parent", which I think is a more intuitive way to specify the same thing for git-show. And it is documented. So I'd say we could probably continue to not mention "--first-parent" itself for git-show. > Jeff King: > > > I guess that commit is what brought me into the cc. I have not been > > following this topic too closely, but generally I'm in favor of using > > "git show". I even suggested it back then, but I think Christian > > preferred not using an external process if we could avoid it. > > I saw the code that tried to avoid calling one. I don't know the internals > well enough here to figure out if we can do without, even when using git > show? There's not really an easy way. I think the only thing you could do is call cmd_show(), but I'm skeptical of that approach in general. The builtin top-level commands are not designed to be run from other spots. And while it will generally work, there will be corner cases (e.g., loading config that touches globals, affecting the calling command in unexpected ways). I suspect you could largely get away with it here where showing the commit is the last thing we do, but I don't think it's a good pattern to get into. > That made me realize, if "git show" runs things through a pager, wouldn't it > then lose the "%s is the first %s commit\n" message printed by > bisect_next_all() before calling the function to show the contents? > > Is that fixable? Good catch. IMHO we should disable the pager entirely by sticking "--no-pager" at the front of the child argv. But then, maybe somebody would like the output to be paged? I wouldn't. If we really wanted to keep the pager for git-show, I guess we'd need to have it print the "%s is the first %s commit" message. The only way I can think to do that is to pass it as a custom --format. But then we'd need to additionally specify all of the usual "medium" format as a custom format, too, which is quite ugly. > I think I added it because the diff-tree command did something similar. I > can drop that as well ("bisect" displays the full commit hash anyway). I > guess it mostly is for merges where we show the parent hashes? No, I don't think the "Merge:" lines are affected by it either way. Those are always abbreviated, and looking at pretty.c's add_merge_info(), I don't think there is any config that affects it. -Peff
Jeff King <peff@peff.net> writes: > So I think this is fine. > >> > $ git show -s --stat --summary --first-parent v1.0.0^0 >> >> Hmm, the git show manual page doesn't document supporting "--first-parent". > > I think that's a documentation bug(-ish). We do not include all of the > traversal-related options that "git log" could use because "git show" > does not traverse by default. But it does also affect diffs, per the > comment added to git-log's documentation in e58142add4 > (doc/rev-list-options: document --first-parent changes merges format, > 2020-12-21). It's one of the "show is a command in the log family, so some of the options that are appropriate to log applies there". The ones that are not useful are the ones about commit walking (e.g., "git show --no-merges seen" would probably show nothing), but many are still relevant. After all "git show" is a "git log --no-walk --cc" in disguise. The "--first-parent" option affects both traversal (which is useless in the context of "git show" that does not walk) and also diff generation (which does make it show the diffstat/summary/patch relative to the first parent), as you two saw. >> I saw the code that tried to avoid calling one. I don't know the internals >> well enough here to figure out if we can do without, even when using git >> show? > > There's not really an easy way. True, but this is "we show the single commit we found before exiting"; executing "git show" as an external program is fine and not worth "optimizing out" the cost of starting another process. > I think the only thing you could do is call cmd_show(), but I'm > skeptical of that approach in general. The builtin top-level commands > are not designed to be run from other spots. And while it will generally > work, there will be corner cases (e.g., loading config that touches > globals, affecting the calling command in unexpected ways). I suspect > you could largely get away with it here where showing the commit is the > last thing we do, but I don't think it's a good pattern to get into. Exactly. Anybody who turns run_command("foo") into blindly calling cmd_foo() should be shot, twice ;-). The right way to turn run_command("foo") into an internal call is not to call cmd_foo(), but to refactor cmd_foo() into the part that sets up the global state and the part that does the "foo" thing, and make the latter a reusable function. In the longer run, if we had infinite engineering resources, it would be nice to have everything callable by everything else internally, is it worth doing for this case? I dunno. >> That made me realize, if "git show" runs things through a pager, wouldn't it >> then lose the "%s is the first %s commit\n" message printed by >> bisect_next_all() before calling the function to show the contents? >> >> Is that fixable? > > Good catch. IMHO we should disable the pager entirely by sticking > "--no-pager" at the front of the child argv. But then, maybe somebody > would like the output to be paged? I wouldn't. Hardcoded --no-pager is a good workaround. But if the output is long and needs paging, wouldn't we see what was shown before we spawned "less" on the screen when we quit it? Running $ (echo message here ; git log --help) and then saying 'q' to exit the pager leaves me "message" after that command line. > If we really wanted to keep the pager for git-show, I guess we'd need to > have it print the "%s is the first %s commit" message. The only way I > can think to do that is to pass it as a custom --format. But then we'd > need to additionally specify all of the usual "medium" format as a > custom format, too, which is quite ugly. ;-) Ugly but fun. I wonder how hard it is to add %(default-output) placeholder for the pretty machinery. Thanks.
On Mon, Apr 01, 2024 at 10:03:13AM -0700, Junio C Hamano wrote: > >> That made me realize, if "git show" runs things through a pager, wouldn't it > >> then lose the "%s is the first %s commit\n" message printed by > >> bisect_next_all() before calling the function to show the contents? > >> > >> Is that fixable? > > > > Good catch. IMHO we should disable the pager entirely by sticking > > "--no-pager" at the front of the child argv. But then, maybe somebody > > would like the output to be paged? I wouldn't. > > Hardcoded --no-pager is a good workaround. But if the output is > long and needs paging, wouldn't we see what was shown before we > spawned "less" on the screen when we quit it? Running > > $ (echo message here ; git log --help) > > and then saying 'q' to exit the pager leaves me "message" after that > command line. That depends on your "less" options and your terminal, I think. Aren't there some combinations where the terminal deinit sequence clears the screen? It has been a while since I've run into that, though, so I might be misremembering. At any rate, my concerns are more: 1. You wouldn't see it while the pager is active, so you are missing some context. 2. If you don't use LESS=F, then it may be annoying to invoke the pager at all. -Peff > > If we really wanted to keep the pager for git-show, I guess we'd need to > > have it print the "%s is the first %s commit" message. The only way I > > can think to do that is to pass it as a custom --format. But then we'd > > need to additionally specify all of the usual "medium" format as a > > custom format, too, which is quite ugly. > > ;-) Ugly but fun. > > I wonder how hard it is to add %(default-output) placeholder for the > pretty machinery. I have a dream that all of the pretty formats could be implemented in terms of %-placeholders. But yeah, even without that, being able to do "%(pretty:medium)" would be cool. "Pretty" cool, even. (Sorry, I could not resist). -Peff
On Mon, Apr 1, 2024 at 4:32 AM Jeff King <peff@peff.net> wrote: > I guess that commit is what brought me into the cc. I have not been > following this topic too closely, but generally I'm in favor of using > "git show". I even suggested it back then, but I think Christian > preferred not using an external process if we could avoid it. > > The thread from 2019 is here: > > http://lore.kernel.org/git/20190222061949.GA9875@sigill.intra.peff.net > > which links to the earlier discussion about "git show": > > https://lore.kernel.org/git/CAP8UFD3QhTUj+j3vBGrm0sTQ2dSOLS-m2_PwFj6DZS4VZHKRTQ@mail.gmail.com/ > > IMHO this config thing is a good example of the strength of the separate > "show" process. If our goal is to trigger all the niceties of "git > show", it is tricky to catch them all. The revision machinery is pretty > reusable, but there's no easy way to figure out which config affects > git-show and so on. Of course if we had a way to invoke git-show > in-process that would work, but I suspect there are unexpected corner > cases that might trigger. Sorry for not following the topic closely and for replying to this so late, but I think that by now we should have some kind of guidelines about when forking a new process is Ok and when it's not. It seems to me that there was already some amount of back and forth on this topic when bisect and other shell commands were ported to C a long time ago. There weren't clearly written guidelines, but it seems to me that at that time we thought that forking a new process was generally bad, especially for performance reasons, but also because they showed a bad example and didn't go in the right direction. It seems to me that people who reviewed code that ported some commands to C sometimes asked contributors to not fork processes, and efforts were made by contributors, like GSoC or Outreachy contributors I mentored, to go in this direction. At one point there was even a microproject about replacing code that forked a process with function calls. These days there are also talks and patches around about libification and about passing around a "repository" variable and other such variables, so that C code does not need to fork processes to be able to work more broadly, for example in submodules. And again it seems to me that such changes (adding code which starts a new process to replace code which doesn't) doesn't go in the same direction as the libification and similar goals.
Christian Couder <christian.couder@gmail.com> writes: >> IMHO this config thing is a good example of the strength of the separate >> "show" process. If our goal is to trigger all the niceties of "git >> show", it is tricky to catch them all. The revision machinery is pretty >> reusable, but there's no easy way to figure out which config affects >> git-show and so on. Of course if we had a way to invoke git-show >> in-process that would work, but I suspect there are unexpected corner >> cases that might trigger. > > Sorry for not following the topic closely and for replying to this so > late, but I think that by now we should have some kind of guidelines > about when forking a new process is Ok and when it's not. I thought we had passed that stage long ago. A case like this one we see in this patch, where it is run just once immediately before we give control back to the end-user (as opposed to "gets run each time in a tight loop"), I would see it a no-brainer to discount the "fork+exec is so expensive" objection more than we would otherwise, especially when the upside of running an external command is so much bigger. There actually should be a different level of "running it as a separate command" that we do not have. If we can split out and encapsulate the global execution context sufficiently into a "bag of state variables" structure, and rewrite cmd_foo() for each such command we wish to be able to run from inside an executing Git into two parts: - cmd_foo() that prepares the global execution context to a "pristine" state, calls into cmd__foo() with that "bag of state variables" structure as one of the parameters, and exits when everything is done. - cmd__foo() that does the rest, including reading the configuration files, parsing of the command line arguments to override them, doing the actual work. then the codepath we are changing from using diff-tree to show can do something like: struct git_global_state state = GIT_GLOBAL_STATE_INIT; struct strvec args = STRVEC_INIT; strvec_pushl(&args, ...); cmd__show(&state, args.nr , args.v); and expect that cmd__show() will do the _right thing_, right? And to reach that ultimate goal, I do not think using run_command() API in the meantime poses hindrance. The real work should be in the implementation of cmd__show(), not the open-coded use of revisions API at each such point where you are tempted to spawn an external command via run_command() API, which will have to be consolidated and replaced with a call to cmd__show() anyway.
Junio C Hamano: > then the codepath we are changing from using diff-tree to show can > do something like: > > struct git_global_state state = GIT_GLOBAL_STATE_INIT; > struct strvec args = STRVEC_INIT; > > strvec_pushl(&args, ...); > cmd__show(&state, args.nr , args.v); > > and expect that cmd__show() will do the _right thing_, right? In this particular case, calling "git show" is really the last thing we want to do; so if we can move the cleanup that happens after it (that ends the bisect), it should be able to just take over the current process with a call to show, without needing to re-exec. And calling back to the libification question, I would see this part of the bisect command to be something that would run *on top of* the library (with possibly an API to poke bad/good states into it), so I don't think that objection holds for this particular case.
Peter Krefting <peter@softwolves.pp.se> writes: > In this particular case, calling "git show" is really the last thing > we want to do; so if we can move the cleanup that happens after it > (that ends the bisect), it should be able to just take over the > current process with a call to show, without needing to re-exec. The cmd_foo() functions are also expected to either return to their callers or call exit() themselves, and it is true that as the very last step before giving the control back to the end-user in the current "bisect" process, we could make an internal call to cmd_foo() and let it exit with its own exit status. But that is only the latter half of a story. The cmd_show() (or any cmd_foo() in general) function expects to start from within a pristine environment. Calling them _after_ somebody else (in this case everything called from cmd_bisect()) clobbered the global state may or may not work (and in general we should assume it would not work) correctly. The outline of the envisioned end state of libification I gave was about an arrangement to ensure that we can give such an pristine state when we make a call to such "top level" entry point of "foo" command (in this case, "show"), from a different command (in this case, "bisect"). It is very much orthogonal to what you are talking about, I think. We need both. > And calling back to the libification question, I would see this part > of the bisect command to be something that would run *on top of* the > library (with possibly an API to poke bad/good states into it), so I > don't think that objection holds for this particular case. There was no objection. I was just pointing out that the infrastructure is not ready to do so.
diff --git a/bisect.c b/bisect.c index 8487f8cd1b..3d0100b165 100644 --- a/bisect.c +++ b/bisect.c @@ -959,23 +959,18 @@ static enum bisect_error check_good_are_ancestors_of_bad(struct repository *r, } /* - * This does "git diff-tree --pretty COMMIT" without one fork+exec. + * Runs "git show" to display a commit */ -static void show_diff_tree(struct repository *r, - const char *prefix, - struct commit *commit) +static void show_commit(struct commit *commit) { - const char *argv[] = { - "diff-tree", "--pretty", "--stat", "--summary", "--cc", NULL - }; - struct rev_info opt; + struct child_process show = CHILD_PROCESS_INIT; - git_config(git_diff_ui_config, NULL); - repo_init_revisions(r, &opt, prefix); - - setup_revisions(ARRAY_SIZE(argv) - 1, argv, &opt, NULL); - log_tree_commit(&opt, commit); - release_revisions(&opt); + strvec_pushl(&show.args, "show", "--pretty=medium", "--stat", "--no-abbrev-commit", "--no-patch", + oid_to_hex(&commit->object.oid), NULL); + show.git_cmd = 1; + if (run_command(&show)) + die(_("unable to start 'show' for object '%s'"), + oid_to_hex(&commit->object.oid)); } /* @@ -1092,7 +1087,7 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix) printf("%s is the first %s commit\n", oid_to_hex(bisect_rev), term_bad); - show_diff_tree(r, prefix, revs.commits->item); + show_commit(revs.commits->item); /* * This means the bisection process succeeded. * Using BISECT_INTERNAL_SUCCESS_1ST_BAD_FOUND (-10)
When bisect finds the target commit to display, it calls git diff-tree to do so. This is a plumbing command that is not affected by the user's log.date setting. Switch to instead use "git show", which does honor it. Reported-by: Michael Osipov <michael.osipov@innomotics.com> Signed-off-By: Peter Krefting <peter@softwolves.pp.se> --- bisect.c | 25 ++++++++++--------------- 1 file changed, 10 insertions(+), 15 deletions(-) This version also uses "--stat" which produces an output more like the one from the diff-tree utility. GitHub's test run reports a single failed test (7300), but this passes when I try it locally: https://github.com/nafmo/git-l10n-sv/commit/2f27ae64064edc5c2570f1c9ea121f3f1a7283d7