ci/install-depends: attempt to fix "brew cask" stuff

Message ID	xmqqk0sevqlh.fsf@gitster.c.googlers.com (mailing list archive)
State	Accepted
Commit	5a11de010b85261407d306768c119269d9297481
Headers	show Return-Path: <git-owner@kernel.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=from:to:subject :date:message-id:mime-version:content-type; q=dns; s=sasl; b=IbF QWVJOfl4ffISQKtfjOq1KckpBGAyYzAJ9KxeW1pvC1Gae9mi8IcKqvAIn+B+pSHh loDi1yzYPyUx26OQDQ6XRGY0LPYDtGeyWfvuquB9CVO2xwEKr6nXup3XrzdGOI36 tMwgTkduruL0ACkiA0GyQRHedN/A88GTfBJsqbGk= From: Junio C Hamano <gitster@pobox.com> To: git@vger.kernel.org Subject: [PATCH] ci/install-depends: attempt to fix "brew cask" stuff Date: Thu, 14 Jan 2021 19:14:34 -0800 Message-ID: <xmqqk0sevqlh.fsf@gitster.c.googlers.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1.90 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain Precedence: bulk
Series	ci/install-depends: attempt to fix "brew cask" stuff \| expand ci/install-depends: attempt to fix "brew cask" stuff

Junio C Hamano Jan. 15, 2021, 3:14 a.m. UTC

It seems that homebrew suddenly started giving us trouble, like this:

https://github.com/git/git/runs/1705953982?check_suite_focus=true#step:3:70

Here is my attempt to work it around by blindly following the
suggested course of action in the error message, without knowing
what I am doing X-<.  I am not a Mac person.

What is frustrating is that every time we hit a minor snag like this
to break one of the jobs, all other unrelated jobs are also taken
down.

Help by those who know what they are doing on macOS would greatly be
appreciated.  Thanks.

----- >8 ----- >8 ----- >8 ----- >8 ----- >8 ----- >8 -----
We run "git pull" against "$cask_repo"; clarify that we are
expecting not to have any of our own modifications and running "git
pull" to merely update, by passing "--ff-only" on the command line.

Also, the "brew cask install" command line triggers an error message
that says:

    Error: Calling brew cask install is disabled! Use brew install
    [--cask] instead.

In addition, "brew install caskroom/cask/perforce" step triggers an
error that says:

    Error: caskroom/cask was moved. Tap homebrew/cask instead.

Attempt to see if blindly following the suggestion in these error
messages gets us into a better shape.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 ci/install-dependencies.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Junio C Hamano Jan. 15, 2021, 5:48 a.m. UTC | #1

Junio C Hamano <gitster@pobox.com> writes:

> It seems that homebrew suddenly started giving us trouble, like this:
>
> https://github.com/git/git/runs/1705953982?check_suite_focus=true#step:3:70
>
> Here is my attempt to work it around by blindly following the
> suggested course of action in the error message, without knowing
> what I am doing X-<.  I am not a Mac person.
>
> What is frustrating is that every time we hit a minor snag like this
> to break one of the jobs, all other unrelated jobs are also taken
> down.
>
> Help by those who know what they are doing on macOS would greatly be
> appreciated.  Thanks.

After seeing 'seen' with this patch at its tip pass the tests [*1*],
I prepared a merge of this change into the tip of 'next' and
tentatively updated 'seen' with it.

The test is still running [*2*], but the problematic part in the
macOS build has already passed, so I am planning to fast-track this
change down to 'next', 'master' and eventually down to 'maint' to
keep the CI going to help other platforms catch more interesting
problems.

Help from those who are more familiar with macOS and homebrew is
still appreciated, though.


[References]

*1* https://github.com/git/git/actions/runs/486978562
*2* https://github.com/git/git/runs/1706704233?check_suite_focus=true#step:3:81


> ----- >8 ----- >8 ----- >8 ----- >8 ----- >8 ----- >8 -----
> We run "git pull" against "$cask_repo"; clarify that we are
> expecting not to have any of our own modifications and running "git
> pull" to merely update, by passing "--ff-only" on the command line.
>
> Also, the "brew cask install" command line triggers an error message
> that says:
>
>     Error: Calling brew cask install is disabled! Use brew install
>     [--cask] instead.
>
> In addition, "brew install caskroom/cask/perforce" step triggers an
> error that says:
>
>     Error: caskroom/cask was moved. Tap homebrew/cask instead.
>
> Attempt to see if blindly following the suggestion in these error
> messages gets us into a better shape.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  ci/install-dependencies.sh | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/ci/install-dependencies.sh b/ci/install-dependencies.sh
> index 0229a77f7d..0b1184e04a 100755
> --- a/ci/install-dependencies.sh
> +++ b/ci/install-dependencies.sh
> @@ -44,13 +44,13 @@ osx-clang|osx-gcc)
>  	test -z "$BREW_INSTALL_PACKAGES" ||
>  	brew install $BREW_INSTALL_PACKAGES
>  	brew link --force gettext
> -	brew cask install --no-quarantine perforce || {
> +	brew install --cask --no-quarantine perforce || {
>  		# Update the definitions and try again
>  		cask_repo="$(brew --repository)"/Library/Taps/homebrew/homebrew-cask &&
> -		git -C "$cask_repo" pull --no-stat &&
> -		brew cask install --no-quarantine perforce
> +		git -C "$cask_repo" pull --no-stat --ff-only &&
> +		brew install --cask --no-quarantine perforce
>  	} ||
> -	brew install caskroom/cask/perforce
> +	brew install homebrew/cask/perforce
>  	case "$jobname" in
>  	osx-gcc)
>  		brew install gcc@9

Eric Sunshine Jan. 15, 2021, 6:05 a.m. UTC | #2

On Fri, Jan 15, 2021 at 12:50 AM Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> > It seems that homebrew suddenly started giving us trouble, like this:
> > https://github.com/git/git/runs/1705953982?check_suite_focus=true#step:3:70
> >
> After seeing 'seen' with this patch at its tip pass the tests [*1*],
> I prepared a merge of this change into the tip of 'next' and
> tentatively updated 'seen' with it.
>
> The test is still running [*2*], but the problematic part in the
> macOS build has already passed, so I am planning to fast-track this
> change down to 'next', 'master' and eventually down to 'maint' to
> keep the CI going to help other platforms catch more interesting
> problems.
>
> Help from those who are more familiar with macOS and homebrew is
> still appreciated, though.

I looked over your Homebrew-related changes, and they seem fine.

The `brew install homebrew/cask/perforce` command could be shortened
to `brew install perforce`, at least at the present, but being
explicit with the long identifier is probably better anyhow.

Derrick Stolee Jan. 15, 2021, 2:27 p.m. UTC | #3

On 1/15/2021 12:48 AM, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
>> It seems that homebrew suddenly started giving us trouble, like this:
>>
>> https://github.com/git/git/runs/1705953982?check_suite_focus=true#step:3:70
>>
>> Here is my attempt to work it around by blindly following the
>> suggested course of action in the error message, without knowing
>> what I am doing X-<.  I am not a Mac person.
>>
>> What is frustrating is that every time we hit a minor snag like this
>> to break one of the jobs, all other unrelated jobs are also taken
>> down.
>>
>> Help by those who know what they are doing on macOS would greatly be
>> appreciated.  Thanks.
> 
> After seeing 'seen' with this patch at its tip pass the tests [*1*],
> I prepared a merge of this change into the tip of 'next' and
> tentatively updated 'seen' with it.
> 
> The test is still running [*2*], but the problematic part in the
> macOS build has already passed, so I am planning to fast-track this
> change down to 'next', 'master' and eventually down to 'maint' to
> keep the CI going to help other platforms catch more interesting
> problems.
> 
> Help from those who are more familiar with macOS and homebrew is
> still appreciated, though.
> 
> 
> [References]
> 
> *1* https://github.com/git/git/actions/runs/486978562
> *2* https://github.com/git/git/runs/1706704233?check_suite_focus=true#step:3:81

We recently hit this same issue with our macOS builds for Scalar
and GCM core. Your solution looks very similar to how we fixed
the problem.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>

Junio C Hamano Jan. 15, 2021, 7:52 p.m. UTC | #4

Derrick Stolee <stolee@gmail.com> writes:

> On 1/15/2021 12:48 AM, Junio C Hamano wrote:
>> Junio C Hamano <gitster@pobox.com> writes:
>> 
>>> It seems that homebrew suddenly started giving us trouble, like this:
>>>
>>> https://github.com/git/git/runs/1705953982?check_suite_focus=true#step:3:70
>>>
>>> Here is my attempt to work it around by blindly following the
>>> suggested course of action in the error message, without knowing
>>> what I am doing X-<.  I am not a Mac person.
>>>
>>> What is frustrating is that every time we hit a minor snag like this
>>> to break one of the jobs, all other unrelated jobs are also taken
>>> down.
>>>
>>> Help by those who know what they are doing on macOS would greatly be
>>> appreciated.  Thanks.
>> 
>> After seeing 'seen' with this patch at its tip pass the tests [*1*],
>> I prepared a merge of this change into the tip of 'next' and
>> tentatively updated 'seen' with it.
>> 
>> The test is still running [*2*], but the problematic part in the
>> macOS build has already passed, so I am planning to fast-track this
>> change down to 'next', 'master' and eventually down to 'maint' to
>> keep the CI going to help other platforms catch more interesting
>> problems.
>> 
>> Help from those who are more familiar with macOS and homebrew is
>> still appreciated, though.
>> 
>> 
>> [References]
>> 
>> *1* https://github.com/git/git/actions/runs/486978562
>> *2* https://github.com/git/git/runs/1706704233?check_suite_focus=true#step:3:81
>
> We recently hit this same issue with our macOS builds for Scalar
> and GCM core. Your solution looks very similar to how we fixed
> the problem.
>
> Reviewed-by: Derrick Stolee <dstolee@microsoft.com>

Thanks.

Jeff King Jan. 15, 2021, 7:59 p.m. UTC | #5

On Thu, Jan 14, 2021 at 07:14:34PM -0800, Junio C Hamano wrote:

> It seems that homebrew suddenly started giving us trouble, like this:
> 
> https://github.com/git/git/runs/1705953982?check_suite_focus=true#step:3:70
> 
> Here is my attempt to work it around by blindly following the
> suggested course of action in the error message, without knowing
> what I am doing X-<.  I am not a Mac person.
> 
> What is frustrating is that every time we hit a minor snag like this
> to break one of the jobs, all other unrelated jobs are also taken
> down.

I think that has to do with the grouping in the workflow file (the
Windows builds, for example, will cancel the _other_ Windows tests if
they fail, but not the Linux ones).

So we could split the macos test out. It would probably involve
duplicating a little bit of the content, but we do something similar for
the dockerized builds. It might be that there is an option we can set to
say "keep building the others even if one of these fails", which would
give us the best of both.

>  	brew install $BREW_INSTALL_PACKAGES
>  	brew link --force gettext
> -	brew cask install --no-quarantine perforce || {
> +	brew install --cask --no-quarantine perforce || {

On a related note, it feels like perforce is a frequent offender for
triggering spurious failures (both for homebrew setup, but I have
definitely seen racy/flaky failures from it as well). I am tempted to
say it is not worth the trouble, but then I do not care at all about
git-p4 myself in the first place, so I may be biased.

-Peff

Eric Sunshine Jan. 15, 2021, 8:07 p.m. UTC | #6

On Fri, Jan 15, 2021 at 3:00 PM Jeff King <peff@peff.net> wrote:
> > -     brew cask install --no-quarantine perforce || {
> > +     brew install --cask --no-quarantine perforce || {
>
> On a related note, it feels like perforce is a frequent offender for
> triggering spurious failures (both for homebrew setup, but I have
> definitely seen racy/flaky failures from it as well). I am tempted to
> say it is not worth the trouble, but then I do not care at all about
> git-p4 myself in the first place, so I may be biased.

To be fair to 'perforce', though, the fault of this particular problem
is Homebrew, which doesn't seem to be all that concerned about
backward compatibility, at least in my experience. The single
Homebrew-related automation script I wrote for personal use has been
broken by arbitrary Homebrew changes frustratingly often over the last
three years.

Jeff King Jan. 15, 2021, 8:46 p.m. UTC | #7

On Fri, Jan 15, 2021 at 03:07:38PM -0500, Eric Sunshine wrote:

> On Fri, Jan 15, 2021 at 3:00 PM Jeff King <peff@peff.net> wrote:
> > > -     brew cask install --no-quarantine perforce || {
> > > +     brew install --cask --no-quarantine perforce || {
> >
> > On a related note, it feels like perforce is a frequent offender for
> > triggering spurious failures (both for homebrew setup, but I have
> > definitely seen racy/flaky failures from it as well). I am tempted to
> > say it is not worth the trouble, but then I do not care at all about
> > git-p4 myself in the first place, so I may be biased.
> 
> To be fair to 'perforce', though, the fault of this particular problem
> is Homebrew, which doesn't seem to be all that concerned about
> backward compatibility, at least in my experience. The single
> Homebrew-related automation script I wrote for personal use has been
> broken by arbitrary Homebrew changes frustratingly often over the last
> three years.

Yeah, sorry, I should have been more precise in my language. None of
this is perforce's fault at all. It is homebrew in this case, and in the
racy cases it is probably our tests. But I do not feel like trying to
debug those races for a tool I don't care much about.

I tried to dig up some failing logs as an example, but it's actually a
bit hard to search for, and it looks like logs get expired after a few
months anyway.

-Peff

Philippe Blain Jan. 24, 2021, 2:41 a.m. UTC | #8

Hi everyone,

Le 2021-01-15 à 14:59, Jeff King a écrit :
> On Thu, Jan 14, 2021 at 07:14:34PM -0800, Junio C Hamano wrote:
> 
>> It seems that homebrew suddenly started giving us trouble, like this:
>>
>> https://github.com/git/git/runs/1705953982?check_suite_focus=true#step:3:70
>>
>> Here is my attempt to work it around by blindly following the
>> suggested course of action in the error message, without knowing
>> what I am doing X-<.  I am not a Mac person.
>>
>> What is frustrating is that every time we hit a minor snag like this
>> to break one of the jobs, all other unrelated jobs are also taken
>> down.
> 
> I think that has to do with the grouping in the workflow file (the
> Windows builds, for example, will cancel the _other_ Windows tests if
> they fail, but not the Linux ones).
> 
> So we could split the macos test out. It would probably involve
> duplicating a little bit of the content, but we do something similar for
> the dockerized builds. It might be that there is an option we can set to
> say "keep building the others even if one of these fails", which would
> give us the best of both.

Yes, a quick Google search pointed me to a blog post [1]
that mentions using 'fail-fast: false' in the test matrix so that
one failing job does not automatically cancel the rest of the jobs
in the matrix (the default is 'true') [2].

If we apply that to all four matrices in the workflow file,
(windows-test, vs-test, regular and dockerized), it would be
something like this:

~~~
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index aef6643648..6c31f978bc 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -125,6 +125,7 @@ jobs:
      strategy:
        matrix:
          nr: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
+      fail-fast: false
      steps:
      - uses: actions/checkout@v1
      - name: download build artifacts
@@ -229,6 +230,7 @@ jobs:
      strategy:
        matrix:
          nr: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
+      fail-fast: false
      steps:
      - uses: actions/checkout@v1
      - name: download git-sdk-64-minimal
@@ -289,6 +291,7 @@ jobs:
            - jobname: GETTEXT_POISON
              cc: gcc
              pool: ubuntu-latest
+        fail-fast: false
      env:
        CC: ${{matrix.vector.cc}}
        jobname: ${{matrix.vector.jobname}}
@@ -315,6 +318,7 @@ jobs:
            image: alpine
          - jobname: Linux32
            image: daald/ubuntu32:xenial
+      fail-fast: false
      env:
        jobname: ${{matrix.vector.jobname}}
      runs-on: ubuntu-latest
~~~

I've CC-ed Dscho regarding if we also want this for the Windows tests,
(I don't see why not) and if we feel it's a good idea I can
send a proper patch.


Cheers,

Philippe.


[1] https://www.edwardthomson.com/blog/github_actions_6_fail_fast_matrix_workflows.html
[2] https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#jobsjob_idstrategyfail-fast

Junio C Hamano Jan. 24, 2021, 6:18 a.m. UTC | #9

Philippe Blain <levraiphilippeblain@gmail.com> writes:

> Yes, a quick Google search pointed me to a blog post [1]
> that mentions using 'fail-fast: false' in the test matrix so that
> one failing job does not automatically cancel the rest of the jobs
> in the matrix (the default is 'true') [2].
>
> If we apply that to all four matrices in the workflow file,
> (windows-test, vs-test, regular and dockerized), it would be
> something like this:

I am assuming that "failure of build stops tests from running",
which is done on the windows/vs-build side, is still kept with
this change.  If that is the case, it sounds quite sensible.

> I've CC-ed Dscho regarding if we also want this for the Windows tests,
> (I don't see why not) and if we feel it's a good idea I can
> send a proper patch.

I often see windows/vs-test #4 fail first and take other 9 with it.
We may see some interesting patterns to emerge, when we run all of
them (and let all of them fail X-<).

Thanks for working on this.

Taylor Blau Jan. 25, 2021, 12:56 a.m. UTC | #10

On Sat, Jan 23, 2021 at 09:41:29PM -0500, Philippe Blain wrote:
> I've CC-ed Dscho regarding if we also want this for the Windows tests,
> (I don't see why not) and if we feel it's a good idea I can
> send a proper patch.

:-). The first thought I had when I read your email was: "shouldn't we
turn this on in those sharded Windows tests, too?"

So, yeah, I think that this is a good idea for the windows- and
vs-build tests, too.

Thanks,
Taylor

Jeff King Jan. 27, 2021, 10:21 p.m. UTC | #11

On Sat, Jan 23, 2021 at 09:41:29PM -0500, Philippe Blain wrote:

> > So we could split the macos test out. It would probably involve
> > duplicating a little bit of the content, but we do something similar for
> > the dockerized builds. It might be that there is an option we can set to
> > say "keep building the others even if one of these fails", which would
> > give us the best of both.
> 
> Yes, a quick Google search pointed me to a blog post [1]
> that mentions using 'fail-fast: false' in the test matrix so that
> one failing job does not automatically cancel the rest of the jobs
> in the matrix (the default is 'true') [2].
> 
> If we apply that to all four matrices in the workflow file,
> (windows-test, vs-test, regular and dockerized), it would be
> something like this:

Thanks, that's exactly what I think we'd want.

The downside, of course, is that a failure that will happen on every
platform will mean wasted CPU to trigger the same failure over and over.
But:

  - I rarely see that myself, because I wouldn't bother pushing up to CI
    until "make test" passed locally. So usually I'm finding portability
    issues via CI. Other people might be different, though.

  - we already have the Windows tests in a separate matrix anyway, so a
    failure on Linux would run the whole Windows suite (which is an
    order of magnitude more expensive)

  - even within the Windows matrix, I think running the rest of the
    tests after a failure is still valuable. If there's a second
    failure, you save a round-trip to CI (so it doesn't reduce CPU, but
    it may help latency to reach a passing state).

-Peff

Junio C Hamano Jan. 28, 2021, 5:23 a.m. UTC | #12

Jeff King <peff@peff.net> writes:

> The downside, of course, is that a failure that will happen on every
> platform will mean wasted CPU to trigger the same failure over and over.
> But:
>
>   - I rarely see that myself, because I wouldn't bother pushing up to CI
>     until "make test" passed locally. So usually I'm finding portability
>     issues via CI. Other people might be different, though.

This is the same for me.

>   - we already have the Windows tests in a separate matrix anyway, so a
>     failure on Linux would run the whole Windows suite (which is an
>     order of magnitude more expensive)

But they tend to finish earlier than OSX and Ubuntu jobs; their
sharding of the tests into 10 jobs may probably have something to do
with this. 

>   - even within the Windows matrix, I think running the rest of the
>     tests after a failure is still valuable. If there's a second
>     failure, you save a round-trip to CI (so it doesn't reduce CPU, but
>     it may help latency to reach a passing state).

SZEDER Gábor Feb. 1, 2021, 10:11 p.m. UTC | #13

On Fri, Jan 15, 2021 at 03:46:40PM -0500, Jeff King wrote:
> On Fri, Jan 15, 2021 at 03:07:38PM -0500, Eric Sunshine wrote:
> 
> > On Fri, Jan 15, 2021 at 3:00 PM Jeff King <peff@peff.net> wrote:
> > > > -     brew cask install --no-quarantine perforce || {
> > > > +     brew install --cask --no-quarantine perforce || {
> > >
> > > On a related note, it feels like perforce is a frequent offender for
> > > triggering spurious failures (both for homebrew setup, but I have
> > > definitely seen racy/flaky failures from it as well). I am tempted to
> > > say it is not worth the trouble, but then I do not care at all about
> > > git-p4 myself in the first place, so I may be biased.
> > 
> > To be fair to 'perforce', though, the fault of this particular problem
> > is Homebrew, which doesn't seem to be all that concerned about
> > backward compatibility, at least in my experience. The single
> > Homebrew-related automation script I wrote for personal use has been
> > broken by arbitrary Homebrew changes frustratingly often over the last
> > three years.
> 
> Yeah, sorry, I should have been more precise in my language. None of
> this is perforce's fault at all. It is homebrew in this case, and in the
> racy cases it is probably our tests. But I do not feel like trying to
> debug those races for a tool I don't care much about.
> 
> I tried to dig up some failing logs as an example, but it's actually a
> bit hard to search for,

I keep an eye out for flaky test failures, because they are much more
entertaining to hunt down than the "usual" bugs, but can't recall any
involving Perforce in the last year, since 6026aff5bb (git-p4: cleanup
better on error exit, 2020-01-29) fixed the last one I knew about.

> and it looks like logs get expired after a few
> months anyway.

Travis CI can still show our builds from mid-2016:

  https://travis-ci.org/github/git/git/builds/146233889

though I couldn't dig out our very first build.

ci/install-depends: attempt to fix "brew cask" stuff

Commit Message

Comments

Patch