diff mbox series

ci: avoid building from the same commit in parallel

Message ID xmqq1qfvor35.fsf_-_@gitster.g (mailing list archive)
State New, archived
Headers show
Series ci: avoid building from the same commit in parallel | expand

Commit Message

Junio C Hamano Aug. 22, 2023, 12:31 a.m. UTC
At times, we may need to push the same commit to multiple branches
in the same push.  Rewinding 'next' to rebuild on top of 'master'
soon after a release is such an occasion.  Making sure 'main' stays
in sync with 'master' to help those who expect that primary branch
of the project is named either of these is another.

We used to use the branch name as the "concurrency group" key, but
by switching to use the commit object name would make sure the
builds for the same commit would happen serially, and by the time
the second job becomes ready to run, the first job's outcome would
be available and mking it unnecessary to run the second job.

Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * There are tons of concurrency groups defined, but as a trial
   change, here is to cover the "regular" matrix that consumes the
   most resources (linux-asan-ubsan is the worst culprit, it seems).

 .github/workflows/main.yml | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Junio C Hamano Aug. 22, 2023, 4:36 a.m. UTC | #1
Junio C Hamano <gitster@pobox.com> writes:

>  * There are tons of concurrency groups defined, but as a trial
>    change, here is to cover the "regular" matrix that consumes the
>    most resources (linux-asan-ubsan is the worst culprit, it seems).

Unfortunately, this did not work.

https://github.com/git/git/actions/runs/5933451874
https://github.com/git/git/actions/runs/5933451805

are trial runs that had the same commit with this patch pushed to
'seen' and 'pu'.  While one of them started the "regular" matrix,
the other one indeed went into paused state and waited.  But that is
way too late.  What happened was that their "config" (which everything
else depends on) started in parallel before the serialization at the
"regular" matrix kicked in.

So, one did wait before doing the "regular" matrix, until the other
one finished everything, and then kept going and did its own
"regular" matrix for the same commit.  It is because the avoidance
of "redundant build" was done at the "config" phase, which both of
them had already done X-<.

If we wanted to do this, I suspect that we need to serialize the
entire thing, not at the individual level where we currently define
the "concurrency" thing.
Johannes Schindelin Aug. 22, 2023, 4:48 a.m. UTC | #2
Hi Junio,

On Mon, 21 Aug 2023, Junio C Hamano wrote:

> If we wanted to do this, I suspect that we need to serialize the
> entire thing, not at the individual level where we currently define
> the "concurrency" thing.

Right, we'd need that `concurrency: ${{ github.sha }}` attribute on the
`config` job.

BTW there is another caveat. According to the documentation, if a job is
queued while another job is already queued, that other job is canceled in
favor of the latest one.

I have not verified this by testing it out, but have no reason to doubt
it.

If that's the case, pushing, say, `master`, `main` and `next` to the same
SHA will see one of them canceled. Which might be okay, if the cancelation
message on the canceled one is indicative enough.

This all depends on reducing the number of flakes (in particular the p4
tests in the ASAN job), but that's a story for another day.

Ciao,
Johannes
Junio C Hamano Aug. 22, 2023, 3:31 p.m. UTC | #3
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Right, we'd need that `concurrency: ${{ github.sha }}` attribute on the
> `config` job.

That was my first thought, but I am not sure how it would work.

Doesn't skip-if-redundant grab the workflow runs that have succeeded
and then see if one for the same commit already exists?  If you used
concurrency on the 'config', what gets serialized between two jobs
for the same commit is only the 'config' phase, so 'master' may wait
starting (because 'config' is what everybody else 'needs' it) while
'config' phase of 'main' runs, and then when it gets to the turn of
'config' phase of 'master', it would not find the run for the same
commit being done for 'main' completed yet, would it?

> BTW there is another caveat. According to the documentation, if a job is
> queued while another job is already queued, that other job is canceled in
> favor of the latest one.

Yes, that was the impression I got; your second one will wait (so
you need a working skip-if-redundant to turn it into noop), but the
third and subsequent ones are discarded without starting, which
unfortunately is what we may want to see happen.

Hmph, from that point of view, would the best and simplest we can do
be to use the commit object name as the concurrency key for the
'config' phase, and use something similar to cancel-in-progress
(which kills the other one when the new one starts, but what we want
is what stops the new one to start when it notices there is already
one running)?
Johannes Schindelin Aug. 23, 2023, 8:42 a.m. UTC | #4
Hi Junio,

On Tue, 22 Aug 2023, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > Right, we'd need that `concurrency: ${{ github.sha }}` attribute on
> > the `config` job.
>
> That was my first thought, but I am not sure how it would work.
>
> Doesn't skip-if-redundant grab the workflow runs that have succeeded
> and then see if one for the same commit already exists?  If you used
> concurrency on the 'config', what gets serialized between two jobs
> for the same commit is only the 'config' phase, so 'master' may wait
> starting (because 'config' is what everybody else 'needs' it) while
> 'config' phase of 'main' runs, and then when it gets to the turn of
> 'config' phase of 'master', it would not find the run for the same
> commit being done for 'main' completed yet, would it?

Yes, that's true.

But there is a silver lining: the `concurrency` can not only be specified
on the job level, but also on the workflow run level.

I tested this, and present the corresponding patch at the end of this
mail.

> > BTW there is another caveat. According to the documentation, if a job
> > is queued while another job is already queued, that other job is
> > canceled in favor of the latest one.
>
> Yes, that was the impression I got; your second one will wait (so
> you need a working skip-if-redundant to turn it into noop), but the
> third and subsequent ones are discarded without starting, which
> unfortunately is what we may want to see happen.

It's actually the last one that is still pending, while the intermediate
ones will be canceled before they are started (see also the attached
screenshot). The message that is shown in the web UI reads like this:

 
Junio C Hamano Aug. 23, 2023, 4:08 p.m. UTC | #5
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Hi Junio,
>
> On Tue, 22 Aug 2023, Junio C Hamano wrote:
>
>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>>
>> > Right, we'd need that `concurrency: ${{ github.sha }}` attribute on
>> > the `config` job.
>>
>> That was my first thought, but I am not sure how it would work.
>>
>> Doesn't skip-if-redundant grab the workflow runs that have succeeded
>> and then see if one for the same commit already exists?  If you used
>> concurrency on the 'config', what gets serialized between two jobs
>> for the same commit is only the 'config' phase, so 'master' may wait
>> starting (because 'config' is what everybody else 'needs' it) while
>> 'config' phase of 'main' runs, and then when it gets to the turn of
>> 'config' phase of 'master', it would not find the run for the same
>> commit being done for 'main' completed yet, would it?
>
> Yes, that's true.
>
> But there is a silver lining: the `concurrency` can not only be specified
> on the job level, but also on the workflow run level.
> I tested this, and present the corresponding patch at the end of this
> mail.

Yeah, serializing the whole thing was the only way I thought that
would work, and I am glad you already tested that it works.

Thanks.
Junio C Hamano Aug. 23, 2023, 4:10 p.m. UTC | #6
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Here is the patch:
>
> -- snipsnap --
> From: Junio C Hamano <gitster@pobox.com>
> Date: Mon, 21 Aug 2023 17:31:26 -0700
> Subject: [PATCH] ci: avoid building from the same commit in parallel

I forgot to say that I do not think I deserve the credit in the end
result, as you've done all the hard part.  

Mind taking the authorship, while demoting me to "Helped-by" status?

Thanks.
Johannes Schindelin Aug. 25, 2023, 12:56 p.m. UTC | #7
Hi Junio,

On Wed, 23 Aug 2023, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > Here is the patch:
> >
> > -- snipsnap --
> > From: Junio C Hamano <gitster@pobox.com>
> > Date: Mon, 21 Aug 2023 17:31:26 -0700
> > Subject: [PATCH] ci: avoid building from the same commit in parallel
>
> I forgot to say that I do not think I deserve the credit in the end
> result, as you've done all the hard part.
>
> Mind taking the authorship, while demoting me to "Helped-by" status?

Sure. But I disagree that I did most of the work. You tried all the
avenues that I would have tried, saving me tons of time by spending yours.

Thank you,
Johannes
diff mbox series

Patch

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 30492eacdd..27b151aadf 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -240,8 +240,7 @@  jobs:
     needs: ci-config
     if: needs.ci-config.outputs.enabled == 'yes'
     concurrency:
-      group: ${{ matrix.vector.jobname }}-${{ matrix.vector.pool }}-${{ github.ref }}
-      cancel-in-progress: ${{ needs.ci-config.outputs.skip_concurrent == 'yes' }}
+      group: ${{ matrix.vector.jobname }}-${{ matrix.vector.pool }}-${{ github.sha }}
     strategy:
       fail-fast: false
       matrix: