Message ID | 20190204205037.32143-1-martin.agren@gmail.com (mailing list archive)
---|---
State | New, archived
Series | [v3] doc-diff: don't `cd_to_toplevel`
On Mon, Feb 04, 2019 at 09:50:37PM +0100, Martin Ågren wrote:

> `usage` tries to call $0, which might very well be "./doc-diff", so if we `cd_to_toplevel` before calling `usage`, we'll end with an error to the effect of "./doc-diff: not found" rather than a friendly `doc-diff -h` output. This regressed in ad51743007 ("doc-diff: add --clean mode to remove temporary working gunk", 2018-08-31) where we moved the call to `cd_to_toplevel` to much earlier.
>
> A general fix might be to teach git-sh-setup to save away the absolute path for $0 and then use that, instead. I'm not aware of any portable way of doing that, see, e.g., d2addc3b96 ("t7800: readlink may not be available", 2016-05-31).
>
> An early version of this patch moved `cd_to_toplevel` back to where it was before ad51743007 and taught the "--clean" code to cd on its own. But let's try instead to get rid of the cd-ing entirely. We don't really need it and we can work with absolute paths instead. There's just one use of $PWD that we need to adjust by simply dropping it.

Thanks, this version looks great to me!

-Peff
Hi Peff and Martin,

On Tue, 5 Feb 2019, Jeff King wrote:

> On Mon, Feb 04, 2019 at 09:50:37PM +0100, Martin Ågren wrote:
>
> > `usage` tries to call $0, which might very well be "./doc-diff", so if we `cd_to_toplevel` before calling `usage`, we'll end with an error to the effect of "./doc-diff: not found" rather than a friendly `doc-diff -h` output. This regressed in ad51743007 ("doc-diff: add --clean mode to remove temporary working gunk", 2018-08-31) where we moved the call to `cd_to_toplevel` to much earlier.
> >
> > A general fix might be to teach git-sh-setup to save away the absolute path for $0 and then use that, instead. I'm not aware of any portable way of doing that, see, e.g., d2addc3b96 ("t7800: readlink may not be available", 2016-05-31).
> >
> > An early version of this patch moved `cd_to_toplevel` back to where it was before ad51743007 and taught the "--clean" code to cd on its own. But let's try instead to get rid of the cd-ing entirely. We don't really need it and we can work with absolute paths instead. There's just one use of $PWD that we need to adjust by simply dropping it.
>
> Thanks, this version looks great to me!

Peff, you asked at the Contributors' Summit for a way to get notified when CI fails for your patch, and I was hesitant to add it (even if it would be straight-forward, really) because of the false positives.

This is one such example, as the test fails:

    https://dev.azure.com/gitgitgadget/git/_build/results?buildId=944

In particular, the tests t2024 and t5552 are broken for ma/doc-diff-usage-fix on Windows. The reason seems to be that those are already broken for the base commit that Junio picked: jk/diff-rendered-docs (actually, not the tip of it, but the commit fixed by Martin's patch).

Of course, I understand why Junio picks base commits that are far, far in the past (and have regressions that current `master` does not have), it makes sense from the point of view where the fixes should be as close to the commits they fix. The downside is that we cannot automate regression testing more than we do now, with e.g. myself acting as curator of test failures.

Ciao,
Dscho
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> In particular, the tests t2024 and t5552 are broken for ma/doc-diff-usage-fix on Windows. The reason seems to be that those are already broken for the base commit that Junio picked: jk/diff-rendered-docs (actually, not the tip of it, but the commit fixed by Martin's patch).
>
> Of course, I understand why Junio picks base commits that are far, far in the past (and have regressions that current `master` does not have), it makes sense from the point of view where the fixes should be as close to the commits they fix. The downside is that we cannot automate regression testing more than we do now, with e.g. myself acting as curator of test failures.

I think by "regressions that current master does not have", you actually meant "unrelated known breakages we have fixed since then". If you use the topic-branch workflow, it is unavoidable and expected, as each topic has its focus, and the topic to fix doc-diff is not about making the checkout test or the fetch negotiator test pass on all platforms.

I am wondering if the automated testing can be made more useful by limiting the scope of testing if it is run on individual topic. For four primary integration branches, we do want to ensure all tests keep passing (or at least no tests that passed in the previous round start failing), but it would match the topic-branch workflow better if the automated testing allowed an individual topic to rely on unrelated known breakages to be dealt with by other topics and newer integration branches.

For a topic like doc-diff that is primarily meant for developers and documenters, it does not matter much, but for an old but important bug, forking the topic to fix it at a point close to the origin is crucial---that is what would allow people to merge the fix to an older maintenance track, without cherry-picking. It is especially true when the bug being fixed is more severe than unrelated breakages that have been fixed since then.

Possible approaches I considered and rejected are:

 - Perhaps find updated tests in the topic and run only those? We can complain a topic that is meant as a "fix" to something if it does not have test while trying to find such tests ;-) But this does not work if a "fix" has unintended consequences and breaks existing tests that do not have apparent relation known to the author of the patch.

 - Perhaps find the fork point, run tests to find known breakages and exclude them? This would simply be not practical, as it doubles the number of tests run, for individual topic branches because there are an order of magnitude more of them than the primary integration branches.

Possibly, if we limit the fork point to tagged releases, the latter approach might turn out to be doable. For ma/doc-diff-usage-fix, the base commit ad51743007 was v2.20.0-rc0~232, so we could round up and pick v2.20.0 as the base instead (the purpose of picking an older point is to allow merging to older maintenance track, and it is unlikely that people would keep using v2.20.0-rc0 as the base of their forked distribution). Then if the automated testing keeps record of known breakages (whether they are later fixed or still broken) for these tagged releases, testing a new topic forked from one of them and deciding not to worry about breakages of tests that are known to be broken (or perhaps deciding to skip these tests in the first place) may become feasible.

I dunno.
Limiting fork points to tagged releases for topics that are meant for the maintenance tracks may have other benefits by restricting the shape of the resulting dag, so I am willing to try it as an experiment, whether it would help your workflow or not.
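For illustration, a rough sketch of the "known breakages at a tagged release" idea above. GIT_SKIP_TESTS is the test suite's existing mechanism for skipping whole scripts or individual test cases; the per-release bookkeeping file is purely hypothetical, not something that exists in git.git:

    # Hypothetical sketch: skip tests recorded as broken at the chosen fork point.
    base_tag=v2.20.0
    GIT_SKIP_TESTS="$(cat "ci/known-breakages/$base_tag.txt" 2>/dev/null)"  # assumed bookkeeping file
    export GIT_SKIP_TESTS
    cd t &&
    prove --timer --jobs 8 ./t[0-9]*.sh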
Hi Junio,

On Tue, 5 Feb 2019, Junio C Hamano wrote:

> I am wondering if the automated testing can be made more useful by limiting the scope of testing if it is run on individual topic. For four primary integration branches, we do want to ensure all tests keep passing (or at least no tests that passed in the previous round start failing), but it would match the topic-branch workflow better if the automated testing allowed an individual topic to rely on unrelated known breakages to be dealt with by other topics and newer integration branches.

It is a known technique to use code coverage to inform automated test runs which tests (or test cases) to run, but our code base is not yet conducive to that approach: we use too many scripts (try to determine code coverage for scripts, will ya?), and you cannot really cherry-pick which test cases to run: despite `test-lib.sh` offering the `--run=<range>` option, our test cases frequently rely on side effects by previous test cases in the same script.

So that obvious strategy simply is unavailable to us, at least at the moment.

> For a topic like doc-diff that is primarily meant for developers and documenters, it does not matter much, but for an old but important bug, forking the topic to fix it at a point close to the origin is crucial---that is what would allow people to merge the fix to an older maintenance track, without cherry-picking. It is especially true when the bug being fixed is more severe than unrelated breakages that have been fixed since then.

As I said, I understand your reasoning.

> Possible approaches I considered and rejected are:
>
>  - Perhaps find updated tests in the topic and run only those? We can complain a topic that is meant as a "fix" to something if it does not have test while trying to find such tests ;-) But this does not work if a "fix" has unintended consequences and breaks existing tests that do not have apparent relation known to the author of the patch.

Indeed, and the test suite tries to catch exactly those unintended consequences.

>  - Perhaps find the fork point, run tests to find known breakages and exclude them? This would simply be not practical, as it doubles the number of tests run, for individual topic branches because there are an order of magnitude more of them than the primary integration branches.

I saw another strategy in action: accept the base commit chosen by the contributor, and ask to back-port it to previous, still supported versions (unless an automated rebase managed to back-port already).

> Possibly, if we limit the fork point to tagged releases, the latter approach might turn out to be doable. For ma/doc-diff-usage-fix, the base commit ad51743007 was v2.20.0-rc0~232, so we could round up and pick v2.20.0 as the base instead (the purpose of picking an older point is to allow merging to older maintenance track, and it is unlikely that people would keep using v2.20.0-rc0 as the base of their forked distribution). Then if the automated testing keeps record of known breakages (whether they are later fixed or still broken) for these tagged releases, testing a new topic forked from one of them and deciding not to worry about breakages of tests that are known to be broken (or perhaps deciding to skip these tests in the first place) may become feasible.

I kind of do that already in the CI builds of GitGitGadget, but not for all known fixed bugs.
The most prominent example is the lack of Windows Vista support in the current Git for Windows SDK: a *ton* of your branches do not have that fix, and I still wanted to have test coverage for those branches, so I introduced this snippet:

    git grep _WIN32_WINNT.0x06 HEAD -- git-compat-util.h ||
    export MAKEFLAGS=CFLAGS=-D_WIN32_WINNT=0x0502

This will let those branches build.

Likewise, I added CI builds for all of your branches in gitster/git to make it much easier to *act* on regression test failures.

As you can see, I am more than willing to pour an insane amount of time into this, just for the sake of not forcing you to change your work style, as it seems to work for you, and I would be the last person to ask anyone to change anything that works for them.

In this instance, Peff remarked at the Contributors' Summit (BTW will you attend any of those in the future?) that he would love to have "an automated Ramsay", i.e. a bot that reacts to test failures and notifies the author of the root cause, preferably with a patch. And all I thought when I sent the mail that started this here sub-thread was that I could demonstrate to Peff that it is not that easy (even if writing the script that would determine the patch author upon a failure would be easy; even the bisect would now be easy, because we are working on linear topic branches where you do not have to test 10,000 merge bases before testing the 2 patches that could be the culprit).

Thank you, of course, for detailing your thoughts on that matter.

> I dunno. Limiting fork points to tagged releases for topics that are meant for the maintenance tracks may have other benefits by restricting the shape of the resulting dag, so I am willing to try it as an experiment, whether it would help your workflow or not.

If you would like to experiment with that, I'd like to learn the outcome. But I do not strictly think that it is necessary.

Ciao,
Dscho
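As a concrete illustration of the `--run=<range>` option mentioned earlier in this message (the script name is only an example; as noted, many scripts depend on earlier test cases having run, so selecting ranges does not always work):

    # Run only test cases 1-3 of one script, verbosely; whether this works
    # depends on the script not relying on side effects of the skipped cases.
    cd t &&
    ./t5552-*.sh --run=1-3 -v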
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> For a topic like doc-diff that is primarily meant for developers and documenters, it does not matter much, but for an old but important bug, forking the topic to fix it at a point close to the origin is crucial---that is what would allow people to merge the fix to an older maintenance track, without cherry-picking. It is especially true when the bug being fixed is more severe than unrelated breakages that have been fixed since then.
>
> As I said, I understand your reasoning.

When I am following up on the mailing list, not replying directly and solely to you, I may write things that I know you already know, to help others reading from the sidelines. Just saying "I agree", without sounding as if saying "you do not have to say that", would be better.

>> - Perhaps find the fork point, run tests to find known breakages and exclude them? This would simply be not practical, as it doubles the number of tests run, for individual topic branches because there are an order of magnitude more of them than the primary integration branches.
>
> I saw another strategy in action: accept the base commit chosen by the contributor, and ask to back-port it to previous, still supported versions (unless an automated rebase managed to back-port already).

It sounds more like "notice the base commit chosen by the contributor, reject the series and ask to rebase on a fork point of my choice". That's not all that different from "notice the base commit chosen by the contributor, rebase on a more sensible fork point myself, and double-check the result by merging it to the base commit chosen by the contributor and make sure there is no unmanageable conflict".
On Tue, Feb 05, 2019 at 11:34:54AM +0100, Johannes Schindelin wrote:

> Peff, you asked at the Contributors' Summit for a way to get notified when CI fails for your patch, and I was hesitant to add it (even if it would be straight-forward, really) because of the false positives.
>
> This is one such example, as the test fails:
>
>     https://dev.azure.com/gitgitgadget/git/_build/results?buildId=944
>
> In particular, the tests t2024 and t5552 are broken for ma/doc-diff-usage-fix on Windows. The reason seems to be that those are already broken for the base commit that Junio picked: jk/diff-rendered-docs (actually, not the tip of it, but the commit fixed by Martin's patch).
>
> Of course, I understand why Junio picks base commits that are far, far in the past (and have regressions that current `master` does not have), it makes sense from the point of view where the fixes should be as close to the commits they fix. The downside is that we cannot automate regression testing more than we do now, with e.g. myself acting as curator of test failures.

Thanks for this real-world example; it's definitely something I hadn't thought too hard about.

I think this is a specific case of more general problems with testing. We strive to have every commit pass all of the tests, so that people building on top have a known-good base. But there are many reasons that may fail in practice (not exhaustive, but just things I've personally seen):

  - some tests are flaky, and will fail intermittently

  - some tests _used_ to pass, but no longer do if the environment has changed (e.g., relying on behavior of system tools that change)

  - historical mistakes, where "all tests pass" was only true on one platform but not others (which I think is what's going on here)

And these are a problem not just for automated CI, but for running "make test" locally. I don't think we'll ever get 100% accuracy, so at some point we have to accept some annoying false positives. The question is how often they come up, and whether they drown out real problems. Testing under Linux, my experience with the first two is that they're uncommon enough not to be a huge burden.

The third class seems like it is likely to be a lot more common for Windows builds, since we haven't historically run tests on them. But it would also in theory be a thing that would get better going forward, as we fix all of those test failures (and new commits are less likely to be built on truly ancient history).

So IMHO this isn't really a show-stopper problem, so much as something that is a sign of the maturing test/CI setup (I say "maturing", not "mature", as it seems we've probably still got a ways to go). As far as notifications go, it probably makes sense for them to be something that requires the user to sign up for anyway, so at that point they're making their own choice about whether the signal to noise ratio is acceptable.

I also think there are ways to automate away some of these problems (e.g., flake detection by repeating test failures, re-running failures on parent commits to check whether a patch actually introduced the problem). But implementing those is non-trivial work, so I am certainly not asking you to do it.

-Peff
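For what it's worth, a minimal sketch of the "flake detection by repeating test failures" idea above (the script name is only a placeholder):

    # Re-run a failing script a few times; a pass on retry suggests a flake
    # rather than a real regression.
    failed=t5552   # hypothetical: the script that just failed
    cd t &&
    if ./$failed-*.sh || ./$failed-*.sh || ./$failed-*.sh
    then
        echo "$failed passed on a retry: likely a flake"
    else
        echo "$failed failed repeatedly: likely a real breakage"
    fi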
On Tue, Feb 05, 2019 at 10:45:35AM -0800, Junio C Hamano wrote:

>  - Perhaps find the fork point, run tests to find known breakages and exclude them? This would simply be not practical, as it doubles the number of tests run, for individual topic branches because there are an order of magnitude more of them than the primary integration branches.

I think this can be limited to the tests that failed, which makes things much faster. I.e., we run the tests at the tip of topic X and see that t1234 fails. We then go back to the fork point and we just need to run t1234 again. If it succeeds, then we blame X for the failure. If it fails, then we consider it a false positive.

You do pay the price to do a full build at the fork point, but in my experience that is much quicker than the tests.

-Peff
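A hedged sketch of that approach, assuming the topic is currently checked out and `master` is a reasonable stand-in for the fork point (t1234 is the placeholder script name from the mail above):

    failed=t1234                          # hypothetical: the script that failed at the tip
    base=$(git merge-base HEAD master)
    git checkout --detach "$base" &&
    make -j8 &&
    if (cd t && ./$failed-*.sh)
    then
        echo "passes at the fork point: blame the topic for the failure"
    else
        echo "already fails at the fork point: likely a false positive"
    fi
    git checkout -                        # go back to the topic afterwards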
Hi Peff,

On Wed, 6 Feb 2019, Jeff King wrote:

> On Tue, Feb 05, 2019 at 11:34:54AM +0100, Johannes Schindelin wrote:
>
> > Peff, you asked at the Contributors' Summit for a way to get notified when CI fails for your patch, and I was hesitant to add it (even if it would be straight-forward, really) because of the false positives.
> >
> > This is one such example, as the test fails:
> >
> >     https://dev.azure.com/gitgitgadget/git/_build/results?buildId=944
> >
> > In particular, the tests t2024 and t5552 are broken for ma/doc-diff-usage-fix on Windows. The reason seems to be that those are already broken for the base commit that Junio picked: jk/diff-rendered-docs (actually, not the tip of it, but the commit fixed by Martin's patch).
> >
> > Of course, I understand why Junio picks base commits that are far, far in the past (and have regressions that current `master` does not have), it makes sense from the point of view where the fixes should be as close to the commits they fix. The downside is that we cannot automate regression testing more than we do now, with e.g. myself acting as curator of test failures.
>
> Thanks for this real-world example; it's definitely something I hadn't thought too hard about.
>
> I think this is a specific case of more general problems with testing. We strive to have every commit pass all of the tests, so that people building on top have a known-good base. But there are many reasons that may fail in practice (not exhaustive, but just things I've personally seen):
>
>   - some tests are flaky, and will fail intermittently
>
>   - some tests _used_ to pass, but no longer do if the environment has changed (e.g., relying on behavior of system tools that change)
>
>   - historical mistakes, where "all tests pass" was only true on one platform but not others (which I think is what's going on here)
>
> And these are a problem not just for automated CI, but for running "make test" locally. I don't think we'll ever get 100% accuracy, so at some point we have to accept some annoying false positives. The question is how often they come up, and whether they drown out real problems. Testing under Linux, my experience with the first two is that they're uncommon enough not to be a huge burden.
>
> The third class seems like it is likely to be a lot more common for Windows builds, since we haven't historically run tests on them. But it would also in theory be a thing that would get better going forward, as we fix all of those test failures (and new commits are less likely to be built on truly ancient history).

Indeed, you are absolutely right: things *are* getting better.

To me, a big difference is the recent speed-up, making it possible for me to pay more attention to individual branches (which are now tested, too), and if I see any particular breakage in `pu` or `jch` that I saw already in the individual branches, I won't bother digging further. That is already a *huge* relief for me.

Granted, I had originally hoped to speed up the test suite, so that it would be faster locally. But I can use the cloud as my computer, too.

> So IMHO this isn't really a show-stopper problem, so much as something that is a sign of the maturing test/CI setup (I say "maturing", not "mature", as it seems we've probably still got a ways to go). As far as notifications go, it probably makes sense for them to be something that requires the user to sign up for anyway, so at that point they're making their own choice about whether the signal to noise ratio is acceptable.

Maybe. I do not even know whether there is an option for that in Azure Pipelines, maybe GitHub offers that?

In any case, I just wanted to corroborate with a real-world example what I mentioned at the Contributors' Summit: that I would like to not script that thing yet where contributors are automatically notified when their branches don't pass.

> I also think there are ways to automate away some of these problems (e.g., flake detection by repeating test failures, re-running failures on parent commits to check whether a patch actually introduced the problem). But implementing those is non-trivial work, so I am certainly not asking you to do it.

Indeed. It might be a lot more common than just Git, too, in which case I might catch the interest of some of my colleagues who could then implement a solid solution that works not only for us, but for everybody using Azure Pipelines.

Speaking of which... can we hook it up with https://github.com/git/git, now that the Azure Pipelines support is in `master`? I sent you and Junio an invitation to https://dev.azure.com/git/git, so that either you or Junio (who are the only owners of the GitHub repository) can set it up. If you want me to help, please do not hesitate to ping me on IRC.

Ciao,
Dscho
Hi Peff,

On Wed, 6 Feb 2019, Jeff King wrote:

> On Tue, Feb 05, 2019 at 10:45:35AM -0800, Junio C Hamano wrote:
>
> >  - Perhaps find the fork point, run tests to find known breakages and exclude them? This would simply be not practical, as it doubles the number of tests run, for individual topic branches because there are an order of magnitude more of them than the primary integration branches.
>
> I think this can be limited to the tests that failed, which makes things much faster. I.e., we run the tests at the tip of topic X and see that t1234 fails. We then go back to the fork point and we just need to run t1234 again. If it succeeds, then we blame X for the failure. If it fails, then we consider it a false positive.

If you mean merge bases by fork points, I wrote an Azure Pipeline to do that (so that I could use the cloud as kind of a fast computer), but that was still too slow.

Even when there are only as many as 12 merge bases to test (which is the current number of merge bases between `next` and `pu`), a build takes roughly 6 minutes on Windows, and many tests take 1 minute or more to run (offenders like t7003 and t7610 take over 400 seconds, i.e. roughly 6 minutes), so we are talking about roughly 1.5h *just* to test the merge bases.

And I sadly have to report that that's not the end of it. Back when I implemented the automatic bisect after failed builds (for details, see https://github.com/git-for-windows/build-extra/commit/c7e01e82c), I had to turn it off real quickly because the dumb bisect between `next` and `pu` regularly ran into the 4h timeout.

Ciao,
Dscho

> You do pay the price to do a full build at the fork point, but in my experience that is much quicker than the tests.
>
> -Peff
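For reference, the merge-base count mentioned above can be reproduced locally with a plain git command (assuming local refs for gitster's `next` and `pu`):

    # count the merge bases between the two integration branches
    git merge-base --all next pu | wc -l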
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Even when there are only as many as 12 merge bases to test (which is the current number of merge bases between `next` and `pu`),...
> ...
> And I sadly have to report that that's not the end of it. Back when I implemented the automatic bisect after failed builds (for details, see https://github.com/git-for-windows/build-extra/commit/c7e01e82c), I had to turn it off real quickly because the dumb bisect between `next` and `pu` regularly ran into the 4h timeout.

Would it make it easier for you if you substituted all the mentions of 'next' in your message with 'pu^{/^### match next}'?

That mid-point between 'master' and 'pu' is designed to contain exactly the same set of non-merge commits 'next' has, with the tree that is identical to that of 'next', and from there to the tip of 'pu' forms a single strand of merges of tips of topic branches that are not yet merged to 'next' (by definition, it itself is the merge base of it and 'pu').

Bisecting along the first-parent chain from there to the tip of 'pu' would let us identify which merge is faulty as the first-and-quick pass, and currently there are about 20 merges in that range on the first-parent chain.
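To make the suggestion concrete, the range Junio describes can be inspected with plain git commands; a minimal sketch, assuming local `pu` from gitster/git:

    # Find the "match next" mid-point and list the ~20 first-parent merges
    # between it and the tip of 'pu'.
    base=$(git rev-parse 'pu^{/^### match next}') &&
    git log --oneline --first-parent --merges "$base..pu"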
On Thu, Feb 07, 2019 at 03:26:21PM +0100, Johannes Schindelin wrote:

> > So IMHO this isn't really a show-stopper problem, so much as something that is a sign of the maturing test/CI setup (I say "maturing", not "mature", as it seems we've probably still got a ways to go). As far as notifications go, it probably makes sense for them to be something that requires the user to sign up for anyway, so at that point they're making their own choice about whether the signal to noise ratio is acceptable.
>
> Maybe. I do not even know whether there is an option for that in Azure Pipelines, maybe GitHub offers that?

No, I don't think so. Probably the route there would be to make a comment on the commit or PR that would then go to the user as a notification (from which they can then decide on email delivery, etc).

> In any case, I just wanted to corroborate with a real-world example what I mentioned at the Contributors' Summit: that I would like to not script that thing yet where contributors are automatically notified when their branches don't pass.

Fair enough. As an alternative, do you know offhand if there's an easy machine-readable way to get the CI results? If I could poll it with curl and generate my own notifications, that would be fine for me.

> > I also think there are ways to automate away some of these problems (e.g., flake detection by repeating test failures, re-running failures on parent commits to check whether a patch actually introduced the problem). But implementing those is non-trivial work, so I am certainly not asking you to do it.
>
> Indeed. It might be a lot more common than just Git, too, in which case I might catch the interest of some of my colleagues who could then implement a solid solution that works not only for us, but for everybody using Azure Pipelines.

Yes, agreed. :)

> Speaking of which... can we hook it up with https://github.com/git/git, now that the Azure Pipelines support is in `master`? I sent you and Junio an invitation to https://dev.azure.com/git/git, so that either you or Junio (who are the only owners of the GitHub repository) can set it up. If you want me to help, please do not hesitate to ping me on IRC.

I'm happy to. I walked through the Azure setup/login procedure, but I'm not sure what to do next.

-Peff
On Thu, Feb 07, 2019 at 04:41:57PM +0100, Johannes Schindelin wrote:

> > I think this can be limited to the tests that failed, which makes things much faster. I.e., we run the tests at the tip of topic X and see that t1234 fails. We then go back to the fork point and we just need to run t1234 again. If it succeeds, then we blame X for the failure. If it fails, then we consider it a false positive.
>
> If you mean merge bases by fork points, I wrote an Azure Pipeline to do that (so that I could use the cloud as kind of a fast computer), but that was still too slow.
>
> Even when there are only as many as 12 merge bases to test (which is the current number of merge bases between `next` and `pu`), a build takes roughly 6 minutes on Windows, and many tests take 1 minute or more to run (offenders like t7003 and t7610 take over 400 seconds, i.e. roughly 6 minutes), so we are talking about roughly 1.5h *just* to test the merge bases.

I was assuming you're testing individual topics from gitster/git here (which admittedly is more CPU in total than just the integration branches, but it at least parallelizes well).

So with that assumption, I was thinking that you'd just look for the merge-base of HEAD and master, which should give you a single point for most topics. For inter-twined topics there may be more merge bases, but I actually think for our purposes here, just testing the most recent one is probably OK. I.e., we're just trying to have a vague sense of whether the test failure is due to new commits or old.

I think Junio's suggestion to just pick some common release points would work OK in practice, too. It's possible that some other topic made it to master with a breakage, but in most cases, I think these sorts of failures are often more coarsely-grained (especially if Junio pays attention to the CI results before merging).

-Peff
Hi Junio,

On Thu, 7 Feb 2019, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > Even when there are only as many as 12 merge bases to test (which is the current number of merge bases between `next` and `pu`),...
> > ...
> > And I sadly have to report that that's not the end of it. Back when I implemented the automatic bisect after failed builds (for details, see https://github.com/git-for-windows/build-extra/commit/c7e01e82c), I had to turn it off real quickly because the dumb bisect between `next` and `pu` regularly ran into the 4h timeout.
>
> Would it make it easier for you if you substituted all the mentions of 'next' in your message with 'pu^{/^### match next}'?

I was working on this in 2017, and could not make it work as I wanted. Ever since, that bisect code has been dormant.

In the meantime, I was able to parallelize the test suite enough to make it feasible to test the topic branches. That usually takes care of things really quickly, and I just bite the bullet and bisect manually.

Ciao,
Dscho

> That mid-point between 'master' and 'pu' is designed to contain exactly the same set of non-merge commits 'next' has, with the tree that is identical to that of 'next', and from there to the tip of 'pu' forms a single strand of merges of tips of topic branches that are not yet merged to 'next' (by definition, it itself is the merge base of it and 'pu').
>
> Bisecting along the first-parent chain from there to the tip of 'pu' would let us identify which merge is faulty as the first-and-quick pass, and currently there are about 20 merges in that range on the first-parent chain.
Hi Peff,

On Thu, 7 Feb 2019, Jeff King wrote:

> On Thu, Feb 07, 2019 at 04:41:57PM +0100, Johannes Schindelin wrote:
>
> > > I think this can be limited to the tests that failed, which makes things much faster. I.e., we run the tests at the tip of topic X and see that t1234 fails. We then go back to the fork point and we just need to run t1234 again. If it succeeds, then we blame X for the failure. If it fails, then we consider it a false positive.
> >
> > If you mean merge bases by fork points, I wrote an Azure Pipeline to do that (so that I could use the cloud as kind of a fast computer), but that was still too slow.
> >
> > Even when there are only as many as 12 merge bases to test (which is the current number of merge bases between `next` and `pu`), a build takes roughly 6 minutes on Windows, and many tests take 1 minute or more to run (offenders like t7003 and t7610 take over 400 seconds, i.e. roughly 6 minutes), so we are talking about roughly 1.5h *just* to test the merge bases.
>
> I was assuming you're testing individual topics from gitster/git here (which admittedly is more CPU in total than just the integration branches, but it at least parallelizes well).

Indeed. And there, I can usually figure out really quickly myself (but manually) what is going wrong.

Hopefully we will have Azure Pipelines enabled on https://github.com/git/git soon, with PR builds that include Windows (unlike our current Travis builds), so that contributors have an easier time testing their code in an automated fashion.

I also have on my backlog a task to include `sparse` in the Azure Pipelines jobs. That should take care of even more things in a purely automated fashion, as long as the contributors look at those builds.

> So with that assumption, I was thinking that you'd just look for the merge-base of HEAD and master, which should give you a single point for most topics. For inter-twined topics there may be more merge bases, but I actually think for our purposes here, just testing the most recent one is probably OK. I.e., we're just trying to have a vague sense of whether the test failure is due to new commits or old.

Oh, I am typically looking only at the latest commits, up until I hit a merge commit. Usually that already tells me enough, and if not, a bisect is really quick on a linear history.

I guess my dumb branch^{/^Merge} to find the next merge commit works, but it is a bit unsatisfying that we do not have a more robust way to say "traverse the commit history until finding a merge commit, then use that".

> I think Junio's suggestion to just pick some common release points would work OK in practice, too. It's possible that some other topic made it to master with a breakage, but in most cases, I think these sorts of failures are often more coarsely-grained (especially if Junio pays attention to the CI results before merging).

If Junio wants to experiment with that, sure, I'm all for it.

Ciao,
Dscho
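A possibly more robust alternative to the `branch^{/^Merge}` trick above would be to select by parent count rather than by commit message wording; a hedged sketch (note that plain rev-list ordering is by commit date, which can matter on busy histories):

    # most recent merge commit reachable from the branch, regardless of
    # how its commit message happens to be worded
    branch=pu   # example branch name
    git rev-list --merges -n 1 "$branch"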
Hi Peff,

On Thu, 7 Feb 2019, Jeff King wrote:

> On Thu, Feb 07, 2019 at 03:26:21PM +0100, Johannes Schindelin wrote:
>
> > > So IMHO this isn't really a show-stopper problem, so much as something that is a sign of the maturing test/CI setup (I say "maturing", not "mature", as it seems we've probably still got a ways to go). As far as notifications go, it probably makes sense for them to be something that requires the user to sign up for anyway, so at that point they're making their own choice about whether the signal to noise ratio is acceptable.
> >
> > Maybe. I do not even know whether there is an option for that in Azure Pipelines, maybe GitHub offers that?
>
> No, I don't think so. Probably the route there would be to make a comment on the commit or PR that would then go to the user as a notification (from which they can then decide on email delivery, etc).

Ah, but that won't notify you when a Check failed. So that still would require some scripting.

> > In any case, I just wanted to corroborate with a real-world example what I mentioned at the Contributors' Summit: that I would like to not script that thing yet where contributors are automatically notified when their branches don't pass.
>
> Fair enough. As an alternative, do you know offhand if there's an easy machine-readable way to get the CI results? If I could poll it with curl and generate my own notifications, that would be fine for me.

There is a REST API:

    https://docs.microsoft.com/en-us/rest/api/azure/devops/build/builds/list?view=azure-devops-rest-5.0

So this would give you the latest 5 failed builds:

    curl "https://dev.azure.com/gitgitgadget/git/_apis/build/builds?definitions=6&resultFilter=failed&\$top=5"

I did not find a way to filter by user, or by branch name with wildcards, though.

> > Speaking of which... can we hook it up with https://github.com/git/git, now that the Azure Pipelines support is in `master`? I sent you and Junio an invitation to https://dev.azure.com/git/git, so that either you or Junio (who are the only owners of the GitHub repository) can set it up. If you want me to help, please do not hesitate to ping me on IRC.
>
> I'm happy to. I walked through the Azure setup/login procedure, but I'm not sure what to do next.

The next step would be to install Azure Pipelines from the Marketplace and activate it for git/git. There *should* be a wizard or something to walk you through...

Ciao,
Dscho
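Since the API apparently cannot filter by branch wildcard, one could filter client-side instead; a hedged sketch using jq, assuming the response lists builds under a `value` array with `sourceBranch` and `result` fields as the linked documentation describes (the "js/" prefix is just an example of a contributor's branch namespace):

    curl -s "https://dev.azure.com/gitgitgadget/git/_apis/build/builds?definitions=6&resultFilter=failed&\$top=50" |
    jq -r '.value[]
           | select(.sourceBranch | test("^refs/heads/js/"))
           | "\(.sourceBranch)\t\(.result)"'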
On Thu, Feb 07, 2019 at 03:45:02PM -0500, Jeff King wrote:

> Fair enough. As an alternative, do you know offhand if there's an easy machine-readable way to get the CI results? If I could poll it with curl and generate my own notifications, that would be fine for me.

Well, what do you mean by "CI results"? Getting whether a build succeeded, failed, is still running, etc.? Sure. Getting which particular test script (semantic patch, documentation...) caused the build failure? Nope. [1]

Travis CI has a REST API (note that you have to sign in with a GitHub account to view its docs, and then need an access token to use the API):

    https://developer.travis-ci.org/gettingstarted

They also offer a command line client for this API:

    https://github.com/travis-ci/travis.rb

Depending on what you want, that in itself might already be enough for you. It wasn't for me, as I have a very particular idea about how I prefer to view my CI results, but neither the website nor the CLI client offer such a compact _and_ detailed view as this:

    ccccccccc 2175 pu
    ccccccccc 2174 sg/ci-parallel-build
    ccccccccc 2173 js/fuzz-commit-graph-update
    ccccccccc 2172 js/mingw-host-cpu
    PsscsPscc 2171 dl/submodule-set-branch
    PPXsPPPPP 2170 kl/pretty-doc-markup-fix
    PPPPPPPPP 2169 en/combined-all-paths

    ('c' - created, 's' - started, 'P' - passed, 'X' - failed)

Nothing that can't be achieved with a good screenful of Python/Ruby/etc scripting... including colors matching the website's color scheme! :)

[1] Although since we include the trash directory of the failed test script in the logs, surrounded by clear marker lines containing the failed test script's name, it wouldn't be too hard to get it programmatically, either.
On Fri, Feb 08, 2019 at 01:58:52AM +0100, SZEDER Gábor wrote:

> On Thu, Feb 07, 2019 at 03:45:02PM -0500, Jeff King wrote:
> > Fair enough. As an alternative, do you know offhand if there's an easy machine-readable way to get the CI results? If I could poll it with curl and generate my own notifications, that would be fine for me.
>
> Well, what do you mean by "CI results"? Getting whether a build succeeded, failed, is still running, etc.? Sure. Getting which particular test script (semantic patch, documentation...) caused the build failure? Nope. [1]

Ideally yeah, I'd like to see the verbose (even "-x") log of the failed test. But even an indication of a failure is enough that I know I can start digging (and bonus if I can then dig into the log with a script and parse it myself).

> Travis CI has a REST API (note that you have to sign in with a GitHub account to view its docs, and then need an access token to use the API):

Thanks, I may poke around that. In this particular case, though, I was wondering about the Azure Pipelines builds that Dscho has put together.

> Depending on what you want, that in itself might already be enough for you. It wasn't for me, as I have a very particular idea about how I prefer to view my CI results, but neither the website nor the CLI client offer such a compact _and_ detailed view as this:
>
>     ccccccccc 2175 pu
>     ccccccccc 2174 sg/ci-parallel-build
>     ccccccccc 2173 js/fuzz-commit-graph-update
>     ccccccccc 2172 js/mingw-host-cpu
>     PsscsPscc 2171 dl/submodule-set-branch
>     PPXsPPPPP 2170 kl/pretty-doc-markup-fix
>     PPPPPPPPP 2169 en/combined-all-paths

Mostly I just want to see the status of my own topics (ideally as soon as they're available, but even polling to get them within a few hours would be OK). I run the tests locally, of course, but sometimes problems show up on other platforms.

-Peff
On Thu, Feb 07, 2019 at 10:57:48PM +0100, Johannes Schindelin wrote:

> > Fair enough. As an alternative, do you know offhand if there's an easy machine-readable way to get the CI results? If I could poll it with curl and generate my own notifications, that would be fine for me.
>
> There is a REST API:
>
>     https://docs.microsoft.com/en-us/rest/api/azure/devops/build/builds/list?view=azure-devops-rest-5.0
>
> So this would give you the latest 5 failed builds:
>
>     curl "https://dev.azure.com/gitgitgadget/git/_apis/build/builds?definitions=6&resultFilter=failed&\$top=5"
>
> I did not find a way to filter by user, or by branch name with wildcards, though.

Thanks. I'll play around with that. If I can get the data out at all, I'm sure I can massage it into some useful form with perl. That's what it's for, after all. :)

> > I'm happy to. I walked through the Azure setup/login procedure, but I'm not sure what to do next.
>
> The next step would be to install Azure Pipelines from the Marketplace and activate it for git/git. There *should* be a wizard or something to walk you through...

OK, I'll take a look (but probably not until tomorrow).

-Peff
On Thu, Feb 07, 2019 at 09:53:18PM -0500, Jeff King wrote:

> > > I'm happy to. I walked through the Azure setup/login procedure, but I'm not sure what to do next.
> >
> > The next step would be to install Azure Pipelines from the Marketplace and activate it for git/git. There *should* be a wizard or something to walk you through...
>
> OK, I'll take a look (but probably not until tomorrow).

I did that, and it magically set things up, but under a "peff" organization. It seems that I don't have sufficient permissions on the "git" organization (on Azure) to do it.

-Peff
diff --git a/Documentation/doc-diff b/Documentation/doc-diff
index dfd9418778..32c83dd26f 100755
--- a/Documentation/doc-diff
+++ b/Documentation/doc-diff
@@ -39,8 +39,7 @@ do
 	shift
 done
 
-cd_to_toplevel
-tmp=Documentation/tmp-doc-diff
+tmp="$(git rev-parse --show-toplevel)/Documentation/tmp-doc-diff" || exit 1
 
 if test -n "$clean"
 then
@@ -109,7 +108,7 @@ render_tree () {
 		make -j$parallel -C "$tmp/worktree" \
 			GIT_VERSION=omitted \
 			SOURCE_DATE_EPOCH=0 \
-			DESTDIR="$PWD/$tmp/installed/$1+" \
+			DESTDIR="$tmp/installed/$1+" \
 			install-man &&
 		mv "$tmp/installed/$1+" "$tmp/installed/$1"
 	fi &&
`usage` tries to call $0, which might very well be "./doc-diff", so if we `cd_to_toplevel` before calling `usage`, we'll end with an error to the effect of "./doc-diff: not found" rather than a friendly `doc-diff -h` output. This regressed in ad51743007 ("doc-diff: add --clean mode to remove temporary working gunk", 2018-08-31) where we moved the call to `cd_to_toplevel` to much earlier.

A general fix might be to teach git-sh-setup to save away the absolute path for $0 and then use that, instead. I'm not aware of any portable way of doing that, see, e.g., d2addc3b96 ("t7800: readlink may not be available", 2016-05-31).

An early version of this patch moved `cd_to_toplevel` back to where it was before ad51743007 and taught the "--clean" code to cd on its own. But let's try instead to get rid of the cd-ing entirely. We don't really need it and we can work with absolute paths instead. There's just one use of $PWD that we need to adjust by simply dropping it.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
Thanks Peff for the suggestions. Trying not to cd at all seems sane to me. That it allows `./Documentation/doc-diff` is a bonus, I guess, but you're right that it's probably nothing anyone will use.

I've verified that diffs produced by `./Documentation/doc-diff foo bar` and `./doc-diff foo bar` are identical, FWIW.

Martin

 Documentation/doc-diff | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
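For what it's worth, a quick way to exercise what the notes above describe ("foo" and "bar" are stand-ins for any two revisions, as in the notes):

    # from the top level of a git.git checkout:
    ./Documentation/doc-diff foo bar

    # from within Documentation/, which is where `-h` used to fail with
    # "./doc-diff: not found" after ad51743007:
    cd Documentation
    ./doc-diff foo bar
    ./doc-diff -h    # now prints usage instead of erroring out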