Message ID | 20210620213836.10771-1-rhi@pengutronix.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | bisect: allow to run from subdirectories | expand |
On Sun, Jun 20 2021, Roland Hieber wrote: > Currently, calling 'git bisect' from a directory other than the top > level of a repository only comes up with an error message: > > You need to run this command from the toplevel of the working tree. > > After a glance through the bisect code, there seems to be nothing that > relies on the current working directory, and a few hours of bisect usage > also didn't turn up any problems. Set the appropriate flag for > git-sh-setup to remove the error message. > > Signed-off-by: Roland Hieber <rhi@pengutronix.de> > --- > git-bisect.sh | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/git-bisect.sh b/git-bisect.sh > index 6a7afaea8da0..20ba0ee7c18a 100755 > --- a/git-bisect.sh > +++ b/git-bisect.sh > @@ -32,6 +32,7 @@ git bisect run <cmd>... > Please use "git help bisect" to get the full man page.' > > OPTIONS_SPEC= > +SUBDIRECTORY_OK=1 > . git-sh-setup > > _x40='[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]' How does this affect out-of-tree scripts that will be run with "git bisect run", is the cwd set to the root as they now might expect git to check, or whatever subdirectory you ran the "run" from?
On 2021-06-21 at 00:35:49, Ævar Arnfjörð Bjarmason wrote: > > On Sun, Jun 20 2021, Roland Hieber wrote: > > > Currently, calling 'git bisect' from a directory other than the top > > level of a repository only comes up with an error message: > > > > You need to run this command from the toplevel of the working tree. > > > > After a glance through the bisect code, there seems to be nothing that > > relies on the current working directory, and a few hours of bisect usage > > also didn't turn up any problems. Set the appropriate flag for > > git-sh-setup to remove the error message. > > > > Signed-off-by: Roland Hieber <rhi@pengutronix.de> > > --- > > git-bisect.sh | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/git-bisect.sh b/git-bisect.sh > > index 6a7afaea8da0..20ba0ee7c18a 100755 > > --- a/git-bisect.sh > > +++ b/git-bisect.sh > > @@ -32,6 +32,7 @@ git bisect run <cmd>... > > Please use "git help bisect" to get the full man page.' > > > > OPTIONS_SPEC= > > +SUBDIRECTORY_OK=1 > > . git-sh-setup > > > > _x40='[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]' > > How does this affect out-of-tree scripts that will be run with "git > bisect run", is the cwd set to the root as they now might expect git to > check, or whatever subdirectory you ran the "run" from? I'm also interested in this, specifically as a patch to the documentation and a corresponding test (and a commit message justification), since folks will rely on whatever behavior we implement and we won't want to break it. We'd probably also want to add a test at least that the user can invoke git bisect outside of the root of the repository, and maybe that it performs correct results for at least one or two known cases when invoked outside of the root. And I'm also wondering if maybe there are other cases that deserve a test along with this change. As for the idea itself, I think it's a good one assuming everything continues to work. It will certainly be more convenient for a lot of people.
On Sun, Jun 20, 2021 at 10:00 PM brian m. carlson <sandals@crustytoothpaste.net> wrote: > On 2021-06-21 at 00:35:49, Ævar Arnfjörð Bjarmason wrote: > > On Sun, Jun 20 2021, Roland Hieber wrote: > > > Currently, calling 'git bisect' from a directory other than the top > > > level of a repository only comes up with an error message: > > > > > > You need to run this command from the toplevel of the working tree. > > > > How does this affect out-of-tree scripts that will be run with "git > > bisect run", is the cwd set to the root as they now might expect git to > > check, or whatever subdirectory you ran the "run" from? > > As for the idea itself, I think it's a good one assuming everything > continues to work. It will certainly be more convenient for a lot of > people. There have been multiple patches sent to the project over the years with the same purpose. One problem, I believe, which has never been fully addressed is what happens when the subdirectory from which git-bisect is run gets deleted as part of the bisection. Here are a couple recent threads triggered by previous such patches (but there are probably several more): https://lore.kernel.org/git/pull.765.git.1603271344522.gitgitgadget@gmail.com/ https://lore.kernel.org/git/pull.736.git.git.1584868547682.gitgitgadget@gmail.com/
Roland Hieber <rhi@pengutronix.de> writes: > Currently, calling 'git bisect' from a directory other than the top > level of a repository only comes up with an error message: > > You need to run this command from the toplevel of the working tree. > > After a glance through the bisect code, there seems to be nothing that > relies on the current working directory, and a few hours of bisect usage > also didn't turn up any problems. Set the appropriate flag for > git-sh-setup to remove the error message. Try to find a history in which to run a bisect that (1) has a directory T in the recent part of the history, and (2) does not have that directory in the older part of the history. Better yet, if the older part of the history has T as a regular file, that would be ideal. Even better, if that old regular file T was added, then removed, and then recreated, before it got turned into a directory. Now, using the two commits you used to satisfy conditions (1) and (2) as "bad" and "good", and using "bad - has T as a directory" and "good - does not have T as a directory", as the bisection criterion, try to find where "good" turns "bad"---in other words, find the commit that either creates T as a directory or turns the regular file T into a directory. Perform that bisect from the subdirectory T. Would that work? I suspect it wouldn't; when trying to check out an old version that does not have T in the directory, either the checkout would fail because it cannot remove T which has an active process (i.e. your terminal session) in it, or your process sitting in an orphaned directory (i.e. your ".." may still be the original top-level directory that used to have T subdirectory, but "cd T" from the top-level will not reach where you are). All sorts of things can go wrong and that is why we forbid it. Just a single "cd" upfront would save the user a lot of headache. This does not depend on "do we have T as a directory?" being the bisection criteria. The important thing is that the current directory would appear and disappear as the bisection process makes you jump around in the history. Hope this helps understanding the issue.
On Sun, Jun 20, 2021 at 10:10:10PM -0400, Eric Sunshine wrote: > On Sun, Jun 20, 2021 at 10:00 PM brian m. carlson > <sandals@crustytoothpaste.net> wrote: > > On 2021-06-21 at 00:35:49, Ævar Arnfjörð Bjarmason wrote: > > > On Sun, Jun 20 2021, Roland Hieber wrote: > > > > Currently, calling 'git bisect' from a directory other than the top > > > > level of a repository only comes up with an error message: > > > > > > > > You need to run this command from the toplevel of the working tree. > > > > > > How does this affect out-of-tree scripts that will be run with "git > > > bisect run", is the cwd set to the root as they now might expect git to > > > check, or whatever subdirectory you ran the "run" from? > > > > As for the idea itself, I think it's a good one assuming everything > > continues to work. It will certainly be more convenient for a lot of > > people. > > There have been multiple patches sent to the project over the years > with the same purpose. One problem, I believe, which has never been > fully addressed is what happens when the subdirectory from which > git-bisect is run gets deleted as part of the bisection. > > Here are a couple recent threads triggered by previous such patches > (but there are probably several more): > > https://lore.kernel.org/git/pull.765.git.1603271344522.gitgitgadget@gmail.com/ > https://lore.kernel.org/git/pull.736.git.git.1584868547682.gitgitgadget@gmail.com/ Ah, thanks for explaining the problem. Would a patch that adds a short explanatory comment in git-bisect.sh on the matter help to prevent people sending such patches? - Roland
On Mon, Jun 21 2021, Roland Hieber wrote: > On Sun, Jun 20, 2021 at 10:10:10PM -0400, Eric Sunshine wrote: >> On Sun, Jun 20, 2021 at 10:00 PM brian m. carlson >> <sandals@crustytoothpaste.net> wrote: >> > On 2021-06-21 at 00:35:49, Ævar Arnfjörð Bjarmason wrote: >> > > On Sun, Jun 20 2021, Roland Hieber wrote: >> > > > Currently, calling 'git bisect' from a directory other than the top >> > > > level of a repository only comes up with an error message: >> > > > >> > > > You need to run this command from the toplevel of the working tree. >> > > >> > > How does this affect out-of-tree scripts that will be run with "git >> > > bisect run", is the cwd set to the root as they now might expect git to >> > > check, or whatever subdirectory you ran the "run" from? >> > >> > As for the idea itself, I think it's a good one assuming everything >> > continues to work. It will certainly be more convenient for a lot of >> > people. >> >> There have been multiple patches sent to the project over the years >> with the same purpose. One problem, I believe, which has never been >> fully addressed is what happens when the subdirectory from which >> git-bisect is run gets deleted as part of the bisection. >> >> Here are a couple recent threads triggered by previous such patches >> (but there are probably several more): >> >> https://lore.kernel.org/git/pull.765.git.1603271344522.gitgitgadget@gmail.com/ >> https://lore.kernel.org/git/pull.736.git.git.1584868547682.gitgitgadget@gmail.com/ > > Ah, thanks for explaining the problem. Would a patch that adds a short > explanatory comment in git-bisect.sh on the matter help to prevent > people sending such patches? Having skimmed the linked discussions I don't think the consensus is that this shouldn't exist, but that someone who wants it should do some research on the relevant edge cases, come up with test cases for them, discuss the trade-offs in a commit message etc. I for one would welcome such a feature, it's often annoyed me, it should just work like "rebase exec" in that a "run" script should cd to the root, but (as discussed in the linked threads) I don't see why we'd prevent it any more than several other commands that already have this edge case, but don't explicitly prevent this.
On Mon, Jun 21, 2021 at 8:50 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > On Mon, Jun 21 2021, Roland Hieber wrote: > > On Sun, Jun 20, 2021 at 10:10:10PM -0400, Eric Sunshine wrote: > >> There have been multiple patches sent to the project over the years > >> with the same purpose. One problem, I believe, which has never been > >> fully addressed is what happens when the subdirectory from which > >> git-bisect is run gets deleted as part of the bisection. > > > > Ah, thanks for explaining the problem. Would a patch that adds a short > > explanatory comment in git-bisect.sh on the matter help to prevent > > people sending such patches? > > Having skimmed the linked discussions I don't think the consensus is > that this shouldn't exist, but that someone who wants it should do some > research on the relevant edge cases, come up with test cases for them, > discuss the trade-offs in a commit message etc. Be that as it may, having a comment in the code explaining why it is currently turned off and what needs to be accomplished before turning it on might indeed be a good way to stem the flow of patches which merely flip-the-switch without doing that extra research. Whether or not Junio would accept such a patch documenting the current state of affairs is a different question.
On 2021-06-21 at 02:10:10, Eric Sunshine wrote: > On Sun, Jun 20, 2021 at 10:00 PM brian m. carlson > <sandals@crustytoothpaste.net> wrote: > > On 2021-06-21 at 00:35:49, Ævar Arnfjörð Bjarmason wrote: > > > On Sun, Jun 20 2021, Roland Hieber wrote: > > > > Currently, calling 'git bisect' from a directory other than the top > > > > level of a repository only comes up with an error message: > > > > > > > > You need to run this command from the toplevel of the working tree. > > > > > > How does this affect out-of-tree scripts that will be run with "git > > > bisect run", is the cwd set to the root as they now might expect git to > > > check, or whatever subdirectory you ran the "run" from? > > > > As for the idea itself, I think it's a good one assuming everything > > continues to work. It will certainly be more convenient for a lot of > > people. > > There have been multiple patches sent to the project over the years > with the same purpose. One problem, I believe, which has never been > fully addressed is what happens when the subdirectory from which > git-bisect is run gets deleted as part of the bisection. And that's the thing I was missing. This did seem a little too simple. I think there are certainly cases where we know the directory isn't changing; most of the situations in which I've bisected things in Git, for example. But we will, of course, need to specify the behavior when that's not the case, and as Junio said, it probably will just fail in unexpected ways, and that wouldn't be a helpful user experience. At the very least, we'd need to document the behavior, and ideally try to make it work reasonably gracefully (e.g., by not removing the directory in that case).
On Mon, Jun 21 2021, Ævar Arnfjörð Bjarmason wrote: > On Mon, Jun 21 2021, Roland Hieber wrote: > >> On Sun, Jun 20, 2021 at 10:10:10PM -0400, Eric Sunshine wrote: >>> On Sun, Jun 20, 2021 at 10:00 PM brian m. carlson >>> <sandals@crustytoothpaste.net> wrote: >>> > On 2021-06-21 at 00:35:49, Ævar Arnfjörð Bjarmason wrote: >>> > > On Sun, Jun 20 2021, Roland Hieber wrote: >>> > > > Currently, calling 'git bisect' from a directory other than the top >>> > > > level of a repository only comes up with an error message: >>> > > > >>> > > > You need to run this command from the toplevel of the working tree. >>> > > >>> > > How does this affect out-of-tree scripts that will be run with "git >>> > > bisect run", is the cwd set to the root as they now might expect git to >>> > > check, or whatever subdirectory you ran the "run" from? >>> > >>> > As for the idea itself, I think it's a good one assuming everything >>> > continues to work. It will certainly be more convenient for a lot of >>> > people. >>> >>> There have been multiple patches sent to the project over the years >>> with the same purpose. One problem, I believe, which has never been >>> fully addressed is what happens when the subdirectory from which >>> git-bisect is run gets deleted as part of the bisection. >>> >>> Here are a couple recent threads triggered by previous such patches >>> (but there are probably several more): >>> >>> https://lore.kernel.org/git/pull.765.git.1603271344522.gitgitgadget@gmail.com/ >>> https://lore.kernel.org/git/pull.736.git.git.1584868547682.gitgitgadget@gmail.com/ >> >> Ah, thanks for explaining the problem. Would a patch that adds a short >> explanatory comment in git-bisect.sh on the matter help to prevent >> people sending such patches? > > Having skimmed the linked discussions I don't think the consensus is > that this shouldn't exist, but that someone who wants it should do some > research on the relevant edge cases, come up with test cases for them, > discuss the trade-offs in a commit message etc. > > I for one would welcome such a feature, it's often annoyed me, it should > just work like "rebase exec" in that a "run" script should cd to the > root, but (as discussed in the linked threads) I don't see why we'd > prevent it any more than several other commands that already have this > edge case, but don't explicitly prevent this. There's also a related issue: It's not just "git bisect start" etc. that have this problem, but also "git bisect log". No matter what we do with run etc. I see no reason for why "log" should not locate the relevant log in the .git directory, same for "view", also "good", "bad" etc. E.g. one might start a bisect session in one terminal, and be cd'd into the 't/' directory in another, and then couldn't run "good/bad" there.
On Mon, Jun 21, 2021 at 12:43:32PM +0900, Junio C Hamano wrote: > Roland Hieber <rhi@pengutronix.de> writes: > > > Currently, calling 'git bisect' from a directory other than the top > > level of a repository only comes up with an error message: > > > > You need to run this command from the toplevel of the working tree. > > > > After a glance through the bisect code, there seems to be nothing that > > relies on the current working directory, and a few hours of bisect usage > > also didn't turn up any problems. Set the appropriate flag for > > git-sh-setup to remove the error message. > > Try to find a history in which to run a bisect that > > (1) has a directory T in the recent part of the history, and > > (2) does not have that directory in the older part of the history. > Better yet, if the older part of the history has T as a regular > file, that would be ideal. Even better, if that old regular > file T was added, then removed, and then recreated, before it > got turned into a directory. > > Now, using the two commits you used to satisfy conditions (1) and > (2) as "bad" and "good", and using "bad - has T as a directory" and > "good - does not have T as a directory", as the bisection criterion, > try to find where "good" turns "bad"---in other words, find the > commit that either creates T as a directory or turns the regular > file T into a directory. > > Perform that bisect from the subdirectory T. Would that work? I > suspect it wouldn't; when trying to check out an old version that > does not have T in the directory, either the checkout would fail > because it cannot remove T which has an active process (i.e. your > terminal session) in it, or your process sitting in an orphaned > directory (i.e. your ".." may still be the original top-level > directory that used to have T subdirectory, but "cd T" from the > top-level will not reach where you are). All sorts of things can > go wrong and that is why we forbid it. Just a single "cd" upfront > would save the user a lot of headache. > > This does not depend on "do we have T as a directory?" being the > bisection criteria. The important thing is that the current > directory would appear and disappear as the bisection process makes > you jump around in the history. I think that is a good explanation. But I remain somewhat unconvinced that it is that big a problem in practice. There are already other cases where checkout may fail (e.g., an untracked build artifact in one commit conflicting with a tracked file in another). E.g., this sequence in git.git: # here we build git-remote-testgit from its .sh counterpart git checkout 5afb2ce4cd^ make # now imagine our bisect jumps forward; this one drops it from # the .gitignore, so its now a precious untracked file git checkout 5afb2ce4cd make # now jump backwards. I'm not sure if this can actually happen in # a bisection of git.git, since we went forward then back. But # possibly you could hit it in parallel lines of history[1], and # certainly a similar history which didn't update ".gitignore" at the # same time could run into it. git checkout 709a957d9493^ That last command yields: error: The following untracked working tree files would be overwritten by checkout: git-remote-testgit Please move or remove them before you switch branches. Aborting Which is annoying, but it's pretty clear that you need to remove the file and then re-run your "bisect good" or "bisect bad" (and if we want to make it more clear, then git-bisect could notice that git-checkout failed and add extra advice). And I think any directory typechange shenanigans would end up with a similar message. It's nice to avoid that ahead of time, but it comes at the cost of bugging the user to preemptively "cd" to the toplevel _every time_, even if they are not going to bisect through history where the directory goes away. So while I don't disagree with avoiding the confusing case, it seems like the safety check is overly cautious. (Of course an alternative is that it could actually examine the history to make sure $PWD never goes away; I wonder if that would be annoyingly costly or not). The more interesting case, I think, is when T is simply removed. If there are untracked files (even ignored build artifacts) in T, then the directory sticks around anyway. But if not, Git is able to checkout the original commit and deletes the directory. And then you get: $ cd builtin $ git checkout 81b50f3ce40bfdd66e5d967bf82be001039a9a98^ $ git status fatal: Unable to read current working directory: No such file or directory That's potentially more confusing, as things subtly don't work. But as shown by the command above, it's not the exclusive domain of bisect. Perhaps a poorly written bisect-run script could get confused. But it still doesn't seem to me like it justifies the tradeoff of "we must never allow bisect from a subdirectory, on the off chance that we might run into this case". If we did want try helping the user in this case, we could also diagnose it pretty easily by confirming that $PWD will be in the destination commit before checking it out, and complaining otherwise (which turns it into the earlier case). In fact, regular "checkout" could perhaps do that, too, as a courtesy (with "-f" overriding). There may be other corner cases of interest (I didn't think too hard about weird symlink cases, for example). But overall I think people are more likely to be annoyed by the "must be at the toplevel" safety valve than they are by "checkout failed midway through my bisect", if only because the former happens a lot more frequently. -Peff [1] I'm sorry not to have produced a real git-bisect example that causes "checkout" to bail. My subjective recollection is that I have run into this problem with git-remote-testgit before during real-world use, but I'm not 100% sure it was while bisecting, and not simply sight-seeing around history.
Jeff King <peff@peff.net> writes: >> This does not depend on "do we have T as a directory?" being the >> bisection criteria. The important thing is that the current >> directory would appear and disappear as the bisection process makes >> you jump around in the history. > > I think that is a good explanation. But I remain somewhat unconvinced > that it is that big a problem in practice. It's just the difference in attitude, I would think. Things like "rebase" take a more liberal attitude and most of the time things work out OK because removal of a directory is a rare event and replacement of a directory with a non-directory is even rarer, but when things break there is no provision to help users to know how it broke by diagnosing why the revision cannot be checked out, or why the directory D the user's shell session is sitting in is now orphaned and different from the directory D the user thinks he is in because it was removed (while the user's process is in there) and then recreated under the same name, or any of the tricky things. The ideal endgame would be to allow operating from subdirectory *AND* have provisions for helping users when things go wrong because the starting subdirectory goes away. "bisect" works under the more conservative philosophy (start strict and forbid operation that we know we didn't spend any effort to avoid taking the user into dangerous waters---we can allow it later once we make it safer but not until then). It would involve a bit of chicken-and-egg, I would guess. If we think improving Git is more important than avoiding even occasional failures imposed on end-users, then the more liberal approach would be easier to work with---we can allow the command to start from subdirectory even if we do not do anything to help avoid problems, let the users hit a snag and have them report it, which would give us some concrete failure mode to work on.
On Mon, Jun 28, 2021 at 06:22:37PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > >> This does not depend on "do we have T as a directory?" being the > >> bisection criteria. The important thing is that the current > >> directory would appear and disappear as the bisection process makes > >> you jump around in the history. > > > > I think that is a good explanation. But I remain somewhat unconvinced > > that it is that big a problem in practice. > > It's just the difference in attitude, I would think. Things like > "rebase" take a more liberal attitude and most of the time things > work out OK because removal of a directory is a rare event and > replacement of a directory with a non-directory is even rarer, but > when things break there is no provision to help users to know how it > broke by diagnosing why the revision cannot be checked out, or why > the directory D the user's shell session is sitting in is now > orphaned and different from the directory D the user thinks he is in > because it was removed (while the user's process is in there) and > then recreated under the same name, or any of the tricky things. > > The ideal endgame would be to allow operating from subdirectory > *AND* have provisions for helping users when things go wrong because > the starting subdirectory goes away. "bisect" works under the more > conservative philosophy (start strict and forbid operation that we > know we didn't spend any effort to avoid taking the user into > dangerous waters---we can allow it later once we make it safer but > not until then). Yes, I agree with this second paragraph. Just trying to create a constructive path forward, I think I'd be comfortable with a patch series that: - confirmed that bisect's behavior when checkout fails produces a reasonable error message that the user can act on (either from checkout itself, or perhaps extra advice from bisect when the checkout fails) - detected the case when the prefix we started from goes away as part of the checkout, and turned that into an error (rather than orphaning the user's cwd and leading to confusing results). This _might_ even be something that regular "git checkout" would benefit from, too. And I think should not be too expensive to implement (at least not after an admittedly moderate amount of thinking on it). - only then turn on SUBDIRECTORY_OK. -Peff
diff --git a/git-bisect.sh b/git-bisect.sh index 6a7afaea8da0..20ba0ee7c18a 100755 --- a/git-bisect.sh +++ b/git-bisect.sh @@ -32,6 +32,7 @@ git bisect run <cmd>... Please use "git help bisect" to get the full man page.' OPTIONS_SPEC= +SUBDIRECTORY_OK=1 . git-sh-setup _x40='[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]'
Currently, calling 'git bisect' from a directory other than the top level of a repository only comes up with an error message: You need to run this command from the toplevel of the working tree. After a glance through the bisect code, there seems to be nothing that relies on the current working directory, and a few hours of bisect usage also didn't turn up any problems. Set the appropriate flag for git-sh-setup to remove the error message. Signed-off-by: Roland Hieber <rhi@pengutronix.de> --- git-bisect.sh | 1 + 1 file changed, 1 insertion(+)