diff mbox series

ci: update 'static-analysis' to Ubuntu 22.04

Message ID pull.1334.git.1661275691795.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit ef46584831268a83591412a33783caf866867482
Headers show
Series ci: update 'static-analysis' to Ubuntu 22.04 | expand

Commit Message

Derrick Stolee Aug. 23, 2022, 5:28 p.m. UTC
From: Derrick Stolee <derrickstolee@github.com>

GitHub Actions scheduled a brownout of Ubuntu 18.04, which canceled all
runs of the 'static-analysis' job in our CI runs. Update to 22.04 to
avoid this as the brownout later turns into a complete deprecation.

The use of 18.04 was set in d051ed77ee6 (.github/workflows/main.yml: run
static-analysis on bionic, 2021-02-08) due to the lack of Coccinelle
being available on 20.04 (which continues today).

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
    ci: update 'static-analysis' to Ubuntu 20.04
    
    I noticed this while preparing my bundle URIs series. See an example
    cancellation at [1]
    
    [1]
    https://github.com/gitgitgadget/git/runs/7954913465?check_suite_focus=true
    
    I initially asked about this [2]. Thanks to Matthias Aßhauer for
    pointing out that 22.04 has Coccinelle available [3].
    
    [2]
    https://lore.kernel.org/git/eb8779bc-fc41-f601-05f2-024e6bf3f316@github.com/
    [3]
    https://github.com/gitgitgadget/git/pull/1334#issuecomment-1223597655
    
    Thanks,
    
     * Stolee

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1334%2Fderrickstolee%2Fstatic-analysis-ubuntu-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1334/derrickstolee/static-analysis-ubuntu-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1334

 .github/workflows/main.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


base-commit: 795ea8776befc95ea2becd8020c7a284677b4161

Comments

Johannes Schindelin Aug. 24, 2022, 2:40 p.m. UTC | #1
Hi Stolee,

On Tue, 23 Aug 2022, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>
>
> GitHub Actions scheduled a brownout of Ubuntu 18.04, which canceled all
> runs of the 'static-analysis' job in our CI runs. Update to 22.04 to
> avoid this as the brownout later turns into a complete deprecation.
>
> The use of 18.04 was set in d051ed77ee6 (.github/workflows/main.yml: run
> static-analysis on bionic, 2021-02-08) due to the lack of Coccinelle
> being available on 20.04 (which continues today).

ACK!

>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>     ci: update 'static-analysis' to Ubuntu 20.04
>
>     I noticed this while preparing my bundle URIs series. See an example
>     cancellation at [1]
>
>     [1]
>     https://github.com/gitgitgadget/git/runs/7954913465?check_suite_focus=true
>
>     I initially asked about this [2]. Thanks to Matthias Aßhauer for
>     pointing out that 22.04 has Coccinelle available [3].
>
>     [2]
>     https://lore.kernel.org/git/eb8779bc-fc41-f601-05f2-024e6bf3f316@github.com/
>     [3]
>     https://github.com/gitgitgadget/git/pull/1334#issuecomment-1223597655

This can be verified also by looking at the successful `static-analysis`
run at
https://github.com/gitgitgadget/git/runs/7979368539?check_suite_focus=true
(which is part of the PR/CI runs at
https://github.com/gitgitgadget/git/pull/1334/checks).

Thanks,
Dscho

>
>     Thanks,
>
>      * Stolee
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1334%2Fderrickstolee%2Fstatic-analysis-ubuntu-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1334/derrickstolee/static-analysis-ubuntu-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1334
>
>  .github/workflows/main.yml | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
> index cd1f52692a5..831f4df56c5 100644
> --- a/.github/workflows/main.yml
> +++ b/.github/workflows/main.yml
> @@ -309,7 +309,7 @@ jobs:
>      if: needs.ci-config.outputs.enabled == 'yes'
>      env:
>        jobname: StaticAnalysis
> -    runs-on: ubuntu-18.04
> +    runs-on: ubuntu-22.04
>      steps:
>      - uses: actions/checkout@v2
>      - run: ci/install-dependencies.sh
>
> base-commit: 795ea8776befc95ea2becd8020c7a284677b4161
> --
> gitgitgadget
>
Junio C Hamano Aug. 24, 2022, 7:59 p.m. UTC | #2
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Hi Stolee,
>
> On Tue, 23 Aug 2022, Derrick Stolee via GitGitGadget wrote:
>
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> GitHub Actions scheduled a brownout of Ubuntu 18.04, which canceled all
>> runs of the 'static-analysis' job in our CI runs. Update to 22.04 to
>> avoid this as the brownout later turns into a complete deprecation.
>>
>> The use of 18.04 was set in d051ed77ee6 (.github/workflows/main.yml: run
>> static-analysis on bionic, 2021-02-08) due to the lack of Coccinelle
>> being available on 20.04 (which continues today).
>
> ACK!

Yup, this is probably a good move.  We tend to stay away from
bleeding edge and instead pick a representative one that is likely
to be with our real users, but 22.04 by no means is one.

I guess we would want to do this even down to 'maint'?

Thanks, both.
Junio C Hamano Aug. 24, 2022, 11:43 p.m. UTC | #3
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> GitHub Actions scheduled a brownout of Ubuntu 18.04, which canceled all
> runs of the 'static-analysis' job in our CI runs. Update to 22.04 to
> avoid this as the brownout later turns into a complete deprecation.
>
> The use of 18.04 was set in d051ed77ee6 (.github/workflows/main.yml: run
> static-analysis on bionic, 2021-02-08) due to the lack of Coccinelle
> being available on 20.04 (which continues today).
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>     ci: update 'static-analysis' to Ubuntu 20.04
>     
>     I noticed this while preparing my bundle URIs series. See an example
>     cancellation at [1]
>     
>     [1]
>     https://github.com/gitgitgadget/git/runs/7954913465?check_suite_focus=true
>     
>     I initially asked about this [2]. Thanks to Matthias Aßhauer for
>     pointing out that 22.04 has Coccinelle available [3].

Thanks, it is already paying its dividend, it seems.

We probably need to fix or revert/remove rules we have in
unused.cocci that makes bogus "suggestion".

  https://github.com/git/git/runs/8005321972?check_suite_focus=true
Derrick Stolee Aug. 25, 2022, 12:30 a.m. UTC | #4
On 8/24/22 7:43 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> GitHub Actions scheduled a brownout of Ubuntu 18.04, which canceled all
>> runs of the 'static-analysis' job in our CI runs. Update to 22.04 to
>> avoid this as the brownout later turns into a complete deprecation.
>>
>> The use of 18.04 was set in d051ed77ee6 (.github/workflows/main.yml: run
>> static-analysis on bionic, 2021-02-08) due to the lack of Coccinelle
>> being available on 20.04 (which continues today).
>>
>> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
>> ---
>>     ci: update 'static-analysis' to Ubuntu 20.04
>>     
>>     I noticed this while preparing my bundle URIs series. See an example
>>     cancellation at [1]
>>     
>>     [1]
>>     https://github.com/gitgitgadget/git/runs/7954913465?check_suite_focus=true
>>     
>>     I initially asked about this [2]. Thanks to Matthias Aßhauer for
>>     pointing out that 22.04 has Coccinelle available [3].
> 
> Thanks, it is already paying its dividend, it seems.
> 
> We probably need to fix or revert/remove rules we have in
> unused.cocci that makes bogus "suggestion".
> 
>   https://github.com/git/git/runs/8005321972?check_suite_focus=true

Yes, this is definitely a bogus suggestion. It's probable that it
is picked up by the newer version of Coccinelle.

I would recommend removing unused.cocci in a patch after this one,
and then you can apply both into 'maint'.

Perhaps a careful rewrite of unused.cocci could fix these kind of
suggestions using the latest version of Coccinelle, but now is not
the time to wait for that.

Thanks,
-Stolee
Junio C Hamano Aug. 25, 2022, 4:43 a.m. UTC | #5
Derrick Stolee <derrickstolee@github.com> writes:

>> We probably need to fix or revert/remove rules we have in
>> unused.cocci that makes bogus "suggestion".
>> 
>>   https://github.com/git/git/runs/8005321972?check_suite_focus=true
>
> Yes, this is definitely a bogus suggestion. It's probable that it
> is picked up by the newer version of Coccinelle.

Yes, I think we should tentatively disable the offending one until
we know how to properly "fix" it.
Ævar Arnfjörð Bjarmason Aug. 25, 2022, 10:47 a.m. UTC | #6
On Wed, Aug 24 2022, Junio C Hamano wrote:

> Derrick Stolee <derrickstolee@github.com> writes:
>
>>> We probably need to fix or revert/remove rules we have in
>>> unused.cocci that makes bogus "suggestion".
>>> 
>>>   https://github.com/git/git/runs/8005321972?check_suite_focus=true
>>
>> Yes, this is definitely a bogus suggestion. It's probable that it
>> is picked up by the newer version of Coccinelle.
>
> Yes, I think we should tentatively disable the offending one until
> we know how to properly "fix" it.

I'm seeing this relatively late, it would be nice to get a CC on
discussions of of one's code :)

This is indeed completely broken, but it's not the unused.cocci rule
that's broken.

What's happening here is that coccinelle can no longer properly parse
the file after the UNUSED() macros were applied to refs.c.

Try running with "--verbose-match --verbose-parsing", on
"seen". Deleting the UNUSED() from warn_if_dangling_symref() happens to
"fix" it, but it's only working as a result of some hack. Coccinelle is
running into some unbalanced paren issue, and it happens to balance out
with that.

I don't think there's any issue here in unused.cocci, it just happens to
be the rule that's unlucky enough to fire like that in the face of these
parse errors.

We should probably coerce coccinelle into stopping in general if it's
encountering "BAD:!!!!!" parse errors behind the scenes, as it clearly
results in broken logic, but offhand (and from briefly browsing the
manpage) I don't know a way to do that.

But the fix here isn't to delete unused.cocci, but to hold off on the
UNUSEwork D() patches until we figure out how to make coccinelle jive with
them.

One thing that *would* fix it is to go with the approach I suggested in
https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/,
i.e. to not use an "UNUSED(var)" form, but just "UNUSED".

I tried that just now with this hack, which wouldn't even compile with
the compiler, but coccinelle is seemingly smart enough to ignore unknown
tokens it doesn't know about if they're not introducing parens (i.e. I
didn't even have to define UNUSED2).

It's also not that it punted out entirely, with this changing
refs_verify_refname_available() so that "referent" actually isn't unused
for real would have unused.cocci suggest the same removal, so we're
managing to fully apply rules to the file:
	
	diff --git a/refs.c b/refs.c
	index 607694c2662..37e7d88920c 100644
	--- a/refs.c
	+++ b/refs.c
	@@ -442,7 +442,7 @@ struct warn_if_dangling_data {
	 };
	 
	 static int warn_if_dangling_symref(const char *refname,
	-				   const struct object_id *UNUSED(oid),
	+				   const struct object_id *oid UNUSED2,
	 				   int flags, void *cb_data)
	 {
	 	struct warn_if_dangling_data *d = cb_data;
	@@ -982,7 +982,7 @@ static void set_read_ref_cutoffs(struct read_ref_at_cb *cb,
	 }
	 
	 static int read_ref_at_ent(struct object_id *ooid, struct object_id *noid,
	-			   const char *UNUSED(email),
	+			   const char *email UNUSED2,
	 			   timestamp_t timestamp, int tz,
	 			   const char *message, void *cb_data)
	 {
	@@ -1024,9 +1024,9 @@ static int read_ref_at_ent(struct object_id *ooid, struct object_id *noid,
	 	return cb->found_it;
	 }
	 
	-static int read_ref_at_ent_newest(struct object_id *UNUSED(ooid),
	+static int read_ref_at_ent_newest(struct object_id *ooid UNUSED2,
	 				  struct object_id *noid,
	-				  const char *UNUSED(email),
	+				  const char *email UNUSED2,
	 				  timestamp_t timestamp, int tz,
	 				  const char *message, void *cb_data)
	 {
	@@ -1039,7 +1039,7 @@ static int read_ref_at_ent_newest(struct object_id *UNUSED(ooid),
	 }
	 
	 static int read_ref_at_ent_oldest(struct object_id *ooid, struct object_id *noid,
	-				  const char *UNUSED(email),
	+				  const char *email UNUSED2,
	 				  timestamp_t timestamp, int tz,
	 				  const char *message, void *cb_data)
	 {
	@@ -1904,7 +1904,7 @@ struct ref_store_hash_entry
	 	char name[FLEX_ARRAY];
	 };
	 
	-static int ref_store_hash_cmp(const void *UNUSED(cmp_data),
	+static int ref_store_hash_cmp(const void *cmp_data UNUSED2,
	 			      const struct hashmap_entry *eptr,
	 			      const struct hashmap_entry *entry_or_key,
	 			      const void *keydata)
Ævar Arnfjörð Bjarmason Aug. 25, 2022, 2:57 p.m. UTC | #7
On Tue, Aug 23 2022, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>
>
> GitHub Actions scheduled a brownout of Ubuntu 18.04, which canceled all
> runs of the 'static-analysis' job in our CI runs. Update to 22.04 to
> avoid this as the brownout later turns into a complete deprecation.
>
> The use of 18.04 was set in d051ed77ee6 (.github/workflows/main.yml: run
> static-analysis on bionic, 2021-02-08) due to the lack of Coccinelle
> being available on 20.04 (which continues today).
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>     ci: update 'static-analysis' to Ubuntu 20.04
>     
>     I noticed this while preparing my bundle URIs series. See an example
>     cancellation at [1]
>     
>     [1]
>     https://github.com/gitgitgadget/git/runs/7954913465?check_suite_focus=true
>     
>     I initially asked about this [2]. Thanks to Matthias Aßhauer for
>     pointing out that 22.04 has Coccinelle available [3].
>     
>     [2]
>     https://lore.kernel.org/git/eb8779bc-fc41-f601-05f2-024e6bf3f316@github.com/
>     [3]
>     https://github.com/gitgitgadget/git/pull/1334#issuecomment-1223597655
>     
>     Thanks,
>     
>      * Stolee
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1334%2Fderrickstolee%2Fstatic-analysis-ubuntu-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1334/derrickstolee/static-analysis-ubuntu-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1334
>
>  .github/workflows/main.yml | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
> index cd1f52692a5..831f4df56c5 100644
> --- a/.github/workflows/main.yml
> +++ b/.github/workflows/main.yml
> @@ -309,7 +309,7 @@ jobs:
>      if: needs.ci-config.outputs.enabled == 'yes'
>      env:
>        jobname: StaticAnalysis
> -    runs-on: ubuntu-18.04
> +    runs-on: ubuntu-22.04

I think it's worth mentioning in the commit message that the reason
we're not going for "ubuntu-latest" here is because "latest" is a
misnomer, it's not the "latest", but "latest-stable" or
whatever. I.e. it's pointing to 20.04 currently:

	https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources

I think per
https://lore.kernel.org/git/211104.86v918i78r.gmgdl@evledraar.gmail.com/
that we should pin these in general, but you/Junio disagreed at the
time. So maybe a commit message change/comment here noting "# make this
ubuntu-latest once it's ubuntu-22.04 or later" or something would make
the intent clearer.

Also: for "sparse" didn't we miss doing that, i.e. it's pinned on
ubuntu-20.04 explicitly, but surely that should be "ubuntu-latest" now?
Junio C Hamano Aug. 25, 2022, 4:08 p.m. UTC | #8
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> What's happening here is that coccinelle can no longer properly parse
> the file after the UNUSED() macros were applied to refs.c.

Sigh.

> 	diff --git a/refs.c b/refs.c
> 	index 607694c2662..37e7d88920c 100644
> 	--- a/refs.c
> 	+++ b/refs.c
> 	@@ -442,7 +442,7 @@ struct warn_if_dangling_data {
> 	 };
> 	 
> 	 static int warn_if_dangling_symref(const char *refname,
> 	-				   const struct object_id *UNUSED(oid),
> 	+				   const struct object_id *oid UNUSED2,
> 	 				   int flags, void *cb_data)
> 	 {
> 	 	struct warn_if_dangling_data *d = cb_data;

This is almost "the most simple and stupid and nobody would get
confused" version, which I may actually be able to live with.

Unfortunately it will not get the "somebody by mistake uses 'oid'
and we can break the build no matter what compiler is used by them
before sending the patch out to the list".

Doing s/oid UNUSED2/oid_UNUSED/ without any __attribute__((unused))
would give that benefit to us, but that won't squelch compilation
with -Wunused which makes it a non-starter.
Junio C Hamano Aug. 25, 2022, 4:17 p.m. UTC | #9
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> -    runs-on: ubuntu-18.04
>> +    runs-on: ubuntu-22.04
>
> I think it's worth mentioning in the commit message that the reason
> we're not going for "ubuntu-latest" here is because "latest" is a
> misnomer, it's not the "latest", but "latest-stable" or
> whatever.

I think we are naming an exact version because we do not want what
it means to change under us until the situation stabilizes.  

If we know that the version after 22.04 that will become "latest" is
already reasonable when 22.04 (or later) becomes the "latest", maybe
we decide to use "latest", but I suspect that right now is not the
time to do that (yet).  We just have proposed to go to 22.04 and
haven't even used it long enough.
Jeff King Aug. 26, 2022, 7:48 a.m. UTC | #10
On Thu, Aug 25, 2022 at 12:47:53PM +0200, Ævar Arnfjörð Bjarmason wrote:

> What's happening here is that coccinelle can no longer properly parse
> the file after the UNUSED() macros were applied to refs.c.

Ugh. Thanks for finding this.

I'm a little disappointed in coccinelle here. I mean, I get that it
probably can't just use the output of "gcc -E"; it wants to see and
transform the code as written (and maybe even translate macros). But
if it doesn't understand the macros enough to see inside them, there are
a lot of things it is going to get confused by.

A simple example like this shows that it doesn't really see inside them:

  $ cat foo.c
  #define foo(x) bar(x)
  static void fun(int x) { foo(x); }

  $ cat foo.cocci
  expression E;
  @@
  - bar(E);
  + changed(E);

I wonder what it makes of code-generation macros like
for_each_string_list_item(). Or all of commit-slab.

Likewise, the manner in which it fails is unexpected. I could see it
getting confused about the line in question, but why would that cause it
to fail to see the of a strbuf in another function entirely?

I realize that pseudo-parsing C is an incredibly hard problem. But I
also wonder how robust the tool actually is in practice. We've had a lot
of mystery "now why didn't it catch this case" incidents over the years.
Just running the tip of master with --verbose-parsing yields quite a few
"bad" and "parse error" entries in the log. I wonder how much that
contributes.

> We should probably coerce coccinelle into stopping in general if it's
> encountering "BAD:!!!!!" parse errors behind the scenes, as it clearly
> results in broken logic, but offhand (and from briefly browsing the
> manpage) I don't know a way to do that.

You could detect the BAD entries from the log and complain before
returning the exit code to make. But since there are several instances
already, I suspect it's a losing battle. We need some way to actually
_fix_ them. And in the meantime, it does seem to _mostly_ work, as it
clearly has some kind of resync logic.

> But the fix here isn't to delete unused.cocci, but to hold off on the
> UNUSEwork D() patches until we figure out how to make coccinelle jive with
> them.

Yeah, my general skepticism and disappointment above notwithstanding,
this seems like the best path forward from here. I tried a few other
tricks (like --macro-file and --iso-file), but if its parser chokes, I
don't think there's much we can do about it. Even if we wrote a patch to
coccinelle itself (and I have no interest in doing that myself), it
would take a while to become available.

-Peff
Junio C Hamano Aug. 26, 2022, 4:46 p.m. UTC | #11
Jeff King <peff@peff.net> writes:

>> But the fix here isn't to delete unused.cocci, but to hold off on the
>> UNUSEwork D() patches until we figure out how to make coccinelle jive with
>> them.
>
> Yeah, my general skepticism and disappointment above notwithstanding,
> this seems like the best path forward from here. I tried a few other
> tricks (like --macro-file and --iso-file), but if its parser chokes, I
> don't think there's much we can do about it. Even if we wrote a patch to
> coccinelle itself (and I have no interest in doing that myself), it
> would take a while to become available.

If it is just a single unused.cocci, I would actually think removing
it would be a much better path forward.  UNUSED() that renames to
help folks without checking compilers would help noticing bad code
much earlier than unused.cocci many contributors are not running
themselves anyway.
Jeff King Aug. 27, 2022, 12:58 p.m. UTC | #12
On Fri, Aug 26, 2022 at 09:46:54AM -0700, Junio C Hamano wrote:

> > Yeah, my general skepticism and disappointment above notwithstanding,
> > this seems like the best path forward from here. I tried a few other
> > tricks (like --macro-file and --iso-file), but if its parser chokes, I
> > don't think there's much we can do about it. Even if we wrote a patch to
> > coccinelle itself (and I have no interest in doing that myself), it
> > would take a while to become available.
> 
> If it is just a single unused.cocci, I would actually think removing
> it would be a much better path forward.  UNUSED() that renames to
> help folks without checking compilers would help noticing bad code
> much earlier than unused.cocci many contributors are not running
> themselves anyway.

I doubt that it is just unused.cocci. If I understand correctly, the new
syntax is choking coccinelle's parser, so it is missing segments of the
code from its analysis.

That shows up as a false positive for unused.cocci, because its patch is
something like "find me spots that have X followed by Y, with no Z in
between (and then delete X and Y)". And the breakage is that it doesn't
notice Z, so it thinks it has found such a spot.

But other rules are of the form "find me X (and turn it into Y)". If the
tool fails to parse code that contains X, then we'll get a false
negative: it won't generate output when it should. But we don't notice
any of those, because there are no true positives in the code base right
now. But it would presumably fail to find some of them if there were.
And more importantly, it makes the tool just as useless as false
positives.

If it were a matter of choosing one static analysis over another (unused
parameters versus coccinelle), we might have something tricky to decide.
But Ævar's non-parenthesized suggestion is a practical workaround. I
don't like it as much as what I posted originally, but it's only a
little less ergonomic, and lets us keep using coccinelle (which, despite
the headaches I've had tangling with it over the years, has produced
useful outcomes).

-Peff
Junio C Hamano Aug. 29, 2022, 5:56 a.m. UTC | #13
Jeff King <peff@peff.net> writes:

> If it were a matter of choosing one static analysis over another (unused
> parameters versus coccinelle), we might have something tricky to decide.
> But Ævar's non-parenthesized suggestion is a practical workaround.

OK.  Let's go that route, then.

Thanks.
Ævar Arnfjörð Bjarmason Aug. 29, 2022, 10:29 a.m. UTC | #14
On Fri, Aug 26 2022, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
>
>>> But the fix here isn't to delete unused.cocci, but to hold off on the
>>> UNUSEwork D() patches until we figure out how to make coccinelle jive with
>>> them.
>>
>> Yeah, my general skepticism and disappointment above notwithstanding,
>> this seems like the best path forward from here. I tried a few other
>> tricks (like --macro-file and --iso-file), but if its parser chokes, I
>> don't think there's much we can do about it. Even if we wrote a patch to
>> coccinelle itself (and I have no interest in doing that myself), it
>> would take a while to become available.
>
> If it is just a single unused.cocci, I would actually think removing
> it would be a much better path forward.  UNUSED() that renames to
> help folks without checking compilers would help noticing bad code
> much earlier than unused.cocci many contributors are not running
> themselves anyway.

I think Jeff King's reply covers what I would have said, except one
thing I'd like to add:

My reading of this is that you're misimpression that unused.cocci and
Jeff's UNUSED macro are two ways to the same end-goal, and that if we
keep the macro we could lose the coccinelle rule.

But they're doing completely orthogonal checks, the unused.cocci is
finding code that's *actually used* accordingn to the compiler, but which
we know results in code that's functionally unused.

E.g. doing nothing with a "struct strbuf" except to initialize it, and
call strbuf_release() on it.

Whereas the UNUSED macro is finding parameters that are truly unused,
and which we could have expected compilers to optimize out already. It's
just helping us maintain our own code hygene.

I suspect that we'll also find "functionally unused" code with the
UNUSED macro, but only indirectly. I.e. we might find that all users of
an interface don't need the Nth parameter, and so we don't need to
prepare it for them.
SZEDER Gábor Aug. 31, 2022, 8:44 a.m. UTC | #15
On Fri, Aug 26, 2022 at 09:46:54AM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
> >> But the fix here isn't to delete unused.cocci, but to hold off on the
> >> UNUSEwork D() patches until we figure out how to make coccinelle jive with
> >> them.
> >
> > Yeah, my general skepticism and disappointment above notwithstanding,
> > this seems like the best path forward from here. I tried a few other
> > tricks (like --macro-file and --iso-file), but if its parser chokes, I
> > don't think there's much we can do about it. Even if we wrote a patch to
> > coccinelle itself (and I have no interest in doing that myself), it
> > would take a while to become available.
> 
> If it is just a single unused.cocci, I would actually think removing
> it would be a much better path forward.  UNUSED() that renames to
> help folks without checking compilers would help noticing bad code
> much earlier than unused.cocci many contributors are not running
> themselves anyway.

Here is another reason for the removal of 'unused.cocci': it's very
costly to apply that semantic patch to the whole code base.

  make SPATCH_BATCH_SIZE=32 contrib/coccinelle/unused.cocci.patch

takes 440s on my machine, whereas the second slowest 'object_id.cocci'
takes only 56s [1].  Applying 'unused.cocci' to some of our source files
individually takes well over a minute:

  $ time spatch --all-includes --sp-file contrib/coccinelle/unused.cocci builtin/log.c
  warning: Can't find macro file: /usr/local/bin/lib/coccinelle/standard.h
  warning: Can't find default iso file: /usr/local/bin/lib/coccinelle/standard.iso
  HANDLING: builtin/log.c
  Note: processing took    83.1s: builtin/log.c
  
  real	1m23.083s
  user	1m22.983s
  sys	0m0.033s
  $ time spatch --all-includes --sp-file contrib/coccinelle/unused.cocci builtin/rebase.c 
  warning: Can't find macro file: /usr/local/bin/lib/coccinelle/standard.h
  warning: Can't find default iso file: /usr/local/bin/lib/coccinelle/standard.iso
  HANDLING: builtin/rebase.c
  Note: processing took    83.2s: builtin/rebase.c
  
  real	1m23.223s
  user	1m23.156s
  sys	0m0.017s

In my opinion the benefits of having 'unused.cocci' clearly do not
justify the costs.

[1] https://public-inbox.org/git/20220824113901.GD1735@szeder.dev/
Ævar Arnfjörð Bjarmason Aug. 31, 2022, 12:13 p.m. UTC | #16
On Wed, Aug 31 2022, SZEDER Gábor wrote:

> On Fri, Aug 26, 2022 at 09:46:54AM -0700, Junio C Hamano wrote:
>> Jeff King <peff@peff.net> writes:
>> 
>> >> But the fix here isn't to delete unused.cocci, but to hold off on the
>> >> UNUSEwork D() patches until we figure out how to make coccinelle jive with
>> >> them.
>> >
>> > Yeah, my general skepticism and disappointment above notwithstanding,
>> > this seems like the best path forward from here. I tried a few other
>> > tricks (like --macro-file and --iso-file), but if its parser chokes, I
>> > don't think there's much we can do about it. Even if we wrote a patch to
>> > coccinelle itself (and I have no interest in doing that myself), it
>> > would take a while to become available.
>> 
>> If it is just a single unused.cocci, I would actually think removing
>> it would be a much better path forward.  UNUSED() that renames to
>> help folks without checking compilers would help noticing bad code
>> much earlier than unused.cocci many contributors are not running
>> themselves anyway.
>
> Here is another reason for the removal of 'unused.cocci': it's very
> costly to apply that semantic patch to the whole code base.
>
>   make SPATCH_BATCH_SIZE=32 contrib/coccinelle/unused.cocci.patch
>
> takes 440s on my machine, whereas the second slowest 'object_id.cocci'
> takes only 56s [1].  Applying 'unused.cocci' to some of our source files
> individually takes well over a minute:
>
>   $ time spatch --all-includes --sp-file contrib/coccinelle/unused.cocci builtin/log.c
>   warning: Can't find macro file: /usr/local/bin/lib/coccinelle/standard.h
>   warning: Can't find default iso file: /usr/local/bin/lib/coccinelle/standard.iso
>   HANDLING: builtin/log.c
>   Note: processing took    83.1s: builtin/log.c
>   
>   real	1m23.083s
>   user	1m22.983s

If you remove the "done:" line in cmd_format_patch() buiiltin/log.c runs
in ~200ms instead of ~40s for me. Perhaps we should be discussing
removing or refactoring that one line of code instead? :)

Removing coccinelle rules because we're seeing slowness somewhere seems
particularly short-sighted to me.

Maybe we do run into intractable problems somewhere with it being slow,
and we'd also like to cater to more "interactive" use.

But we shouldn't do that by removing rules until we get below some
runtime limit, but rather by creating a "batch" category or something
(just like we have "pending") now.

Or, just actually look into why it's slow and fix those issues and/or
report them upstream.

There's nothing in unused.cocci that we either aren't running into
elsewhere, or wouldn't run into if we had 10x the coccinelle rules we
have now (which I think would be a good direction, we should rely on it
more heavily).

I've found that being able to have a ccache-like tool for "spatch"[1]
solved almost all of the practical performance concerns I had with
it. I.e. I can just run things in a batch, and usually any interactive
use will hit things already in cache.

To the extent it doesn't it's usually some pathological issue in spatch.

>   sys	0m0.033s
>   $ time spatch --all-includes --sp-file contrib/coccinelle/unused.cocci builtin/rebase.c 
>   warning: Can't find macro file: /usr/local/bin/lib/coccinelle/standard.h
>   warning: Can't find default iso file: /usr/local/bin/lib/coccinelle/standard.iso
>   HANDLING: builtin/rebase.c
>   Note: processing took    83.2s: builtin/rebase.c
>   
>   real	1m23.223s
>   user	1m23.156s
>   sys	0m0.017s

I didn't look at this one, but I assume it's some similar (and probably
easily fixed) pathological issue.

1. https://lore.kernel.org/git/patch-5.5-ce4734e5d79-20220825T141212Z-avarab@gmail.com/
Jeff King Aug. 31, 2022, 3:12 p.m. UTC | #17
On Mon, Aug 29, 2022 at 12:29:09PM +0200, Ævar Arnfjörð Bjarmason wrote:

> My reading of this is that you're misimpression that unused.cocci and
> Jeff's UNUSED macro are two ways to the same end-goal, and that if we
> keep the macro we could lose the coccinelle rule.

Ah, I didn't think of that, but yeah, that would explain Junio's
position a bit more.

> But they're doing completely orthogonal checks, the unused.cocci is
> finding code that's *actually used* accordingn to the compiler, but which
> we know results in code that's functionally unused.

Right. They're two separate types of "unused", and they should actually
complement each other (e.g., if we drop an unused strbuf parameter, then
a calling function may find that it is now doing nothing but
init/release on the strbuf). So we'd want both.

-Peff
Jeff King Aug. 31, 2022, 3:24 p.m. UTC | #18
On Wed, Aug 31, 2022 at 02:13:51PM +0200, Ævar Arnfjörð Bjarmason wrote:

> Removing coccinelle rules because we're seeing slowness somewhere seems
> particularly short-sighted to me.
> 
> Maybe we do run into intractable problems somewhere with it being slow,
> and we'd also like to cater to more "interactive" use.

Agreed. I'm not wild about how long it takes to run either, but if it's
producing useful results, it seems worth it to pay the CPU (and I think
unused.cocci did find some useful results already). There's a point at
which the CPU use becomes intractable, but I don't think we're there
yet.

> There's nothing in unused.cocci that we either aren't running into
> elsewhere, or wouldn't run into if we had 10x the coccinelle rules we
> have now (which I think would be a good direction, we should rely on it
> more heavily).

From past experience, I suspect the "<... ...>" operator is what's
expensive. I don't see an easy way of avoiding it here, though.

I'm more skeptical on more coccinelle in general, just because I've
spent so many hours fighting with it for both output and performance
reasons. But if somebody else is willing to do that work, I'm OK with
it. I have often wondered if our rules are sufficiently simple that
libclang plus some light scripting might get us similar results with
less hassle. But maybe that's a rabbit hole.

-Peff
SZEDER Gábor Aug. 31, 2022, 6:05 p.m. UTC | #19
On Wed, Aug 31, 2022 at 02:13:51PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Aug 31 2022, SZEDER Gábor wrote:
> 
> > On Fri, Aug 26, 2022 at 09:46:54AM -0700, Junio C Hamano wrote:
> >> Jeff King <peff@peff.net> writes:
> >> 
> >> >> But the fix here isn't to delete unused.cocci, but to hold off on the
> >> >> UNUSEwork D() patches until we figure out how to make coccinelle jive with
> >> >> them.
> >> >
> >> > Yeah, my general skepticism and disappointment above notwithstanding,
> >> > this seems like the best path forward from here. I tried a few other
> >> > tricks (like --macro-file and --iso-file), but if its parser chokes, I
> >> > don't think there's much we can do about it. Even if we wrote a patch to
> >> > coccinelle itself (and I have no interest in doing that myself), it
> >> > would take a while to become available.
> >> 
> >> If it is just a single unused.cocci, I would actually think removing
> >> it would be a much better path forward.  UNUSED() that renames to
> >> help folks without checking compilers would help noticing bad code
> >> much earlier than unused.cocci many contributors are not running
> >> themselves anyway.
> >
> > Here is another reason for the removal of 'unused.cocci': it's very
> > costly to apply that semantic patch to the whole code base.
> >
> >   make SPATCH_BATCH_SIZE=32 contrib/coccinelle/unused.cocci.patch
> >
> > takes 440s on my machine, whereas the second slowest 'object_id.cocci'
> > takes only 56s [1].  Applying 'unused.cocci' to some of our source files
> > individually takes well over a minute:
> >
> >   $ time spatch --all-includes --sp-file contrib/coccinelle/unused.cocci builtin/log.c
> >   warning: Can't find macro file: /usr/local/bin/lib/coccinelle/standard.h
> >   warning: Can't find default iso file: /usr/local/bin/lib/coccinelle/standard.iso
> >   HANDLING: builtin/log.c
> >   Note: processing took    83.1s: builtin/log.c
> >   
> >   real	1m23.083s
> >   user	1m22.983s
> 
> If you remove the "done:" line in cmd_format_patch() buiiltin/log.c runs
> in ~200ms instead of ~40s for me. Perhaps we should be discussing
> removing or refactoring that one line of code instead? :)
> 
> Removing coccinelle rules because we're seeing slowness somewhere seems
> particularly short-sighted to me.

It's not just slowness, it's drastic slowness.  I'm looking at two
"from scratch" 'make coccicheck' runs here, one with 'unused.cocci'
taking 9m51s, one without taking 4m56s.  So 'unused.cocci' effectively
doubled the runtime, and wastes other developers' time and resources.

I don't see anything wrong with removing a semantic patch that is as
slow as 'unused.cocci' in its current form on our current codebase.
We can always re-add it later, after those interested managed to
figure out a way to address its slowness, and updated the semantic
patch and/or the codebase accordingly.

> Maybe we do run into intractable problems somewhere with it being slow,

Looking at the runtimes I showed above, I think deeming it intractable
is fully justified.

> and we'd also like to cater to more "interactive" use.
> 
> But we shouldn't do that by removing rules until we get below some
> runtime limit, but rather by creating a "batch" category or something
> (just like we have "pending") now.
> 
> Or, just actually look into why it's slow and fix those issues and/or
> report them upstream.

IMO this should be the other way around: if applying a semantic patch
is this slow, then first look into why it's slow, fix it, and only
then submit it for merging.  A semantic patch this slow shouldn't have
been merged in the first place.

> There's nothing in unused.cocci that we either aren't running into
> elsewhere, or wouldn't run into if we had 10x the coccinelle rules we
> have now (which I think would be a good direction, we should rely on it
> more heavily).

Several developers have already stated that they might run 'make
coccicheck' more often if it weren't so slow.  I think we must keep
this in mind when adding new semantic patches, and should aim for a
good return of investment between the usefulness of the semantic patch
and its overhead.  'unused.cocci' doesn't seem to strike a good
balance here.

I doubt that I would ever run 'make coccicheck' if we had 10x as many
semantic patches.

> I've found that being able to have a ccache-like tool for "spatch"[1]
> solved almost all of the practical performance concerns I had with
> it. I.e. I can just run things in a batch, and usually any interactive
> use will hit things already in cache.

Well, perhaps that's why you didn't notice just how slow
'unused.cocci' can be... :)  Please don't forget about the runtime of
a default "from scratch" 'make coccicheck'.

> To the extent it doesn't it's usually some pathological issue in spatch.
> 
> >   sys	0m0.033s
> >   $ time spatch --all-includes --sp-file contrib/coccinelle/unused.cocci builtin/rebase.c 
> >   warning: Can't find macro file: /usr/local/bin/lib/coccinelle/standard.h
> >   warning: Can't find default iso file: /usr/local/bin/lib/coccinelle/standard.iso
> >   HANDLING: builtin/rebase.c
> >   Note: processing took    83.2s: builtin/rebase.c
> >   
> >   real	1m23.223s
> >   user	1m23.156s
> >   sys	0m0.017s
> 
> I didn't look at this one, but I assume it's some similar (and probably
> easily fixed) pathological issue.
> 
> 1. https://lore.kernel.org/git/patch-5.5-ce4734e5d79-20220825T141212Z-avarab@gmail.com/
Junio C Hamano Aug. 31, 2022, 7:19 p.m. UTC | #20
Jeff King <peff@peff.net> writes:

> On Wed, Aug 31, 2022 at 02:13:51PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> Removing coccinelle rules because we're seeing slowness somewhere seems
>> particularly short-sighted to me.
>> 
>> Maybe we do run into intractable problems somewhere with it being slow,
>> and we'd also like to cater to more "interactive" use.
>
> Agreed. I'm not wild about how long it takes to run either, but if it's
> producing useful results, it seems worth it to pay the CPU (and I think
> unused.cocci did find some useful results already). There's a point at
> which the CPU use becomes intractable, but I don't think we're there
> yet.

I am not, either, as it does not look like it is producing all that
useful results these days, but of course, we are looking at the end
result of previous clean-up triggered by these existing rules, and
therefore be biased---we only see cost of running the rules these
days without seeing anything new getting caught.
Ævar Arnfjörð Bjarmason Aug. 31, 2022, 7:29 p.m. UTC | #21
On Wed, Aug 31 2022, SZEDER Gábor wrote:

> On Wed, Aug 31, 2022 at 02:13:51PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Wed, Aug 31 2022, SZEDER Gábor wrote:
>>> [...]
>> If you remove the "done:" line in cmd_format_patch() buiiltin/log.c runs
>> in ~200ms instead of ~40s for me. Perhaps we should be discussing
>> removing or refactoring that one line of code instead? :)
>> 
>> Removing coccinelle rules because we're seeing slowness somewhere seems
>> particularly short-sighted to me.
>
> It's not just slowness, it's drastic slowness.  I'm looking at two
> "from scratch" 'make coccicheck' runs here, one with 'unused.cocci'
> taking 9m51s, one without taking 4m56s.  So 'unused.cocci' effectively
> doubled the runtime, and wastes other developers' time and resources.
>
> I don't see anything wrong with removing a semantic patch that is as
> slow as 'unused.cocci' in its current form on our current codebase.
> We can always re-add it later, after those interested managed to
> figure out a way to address its slowness, and updated the semantic
> patch and/or the codebase accordingly.
>
>> Maybe we do run into intractable problems somewhere with it being slow,
>
> Looking at the runtimes I showed above, I think deeming it intractable
> is fully justified.
>
>> and we'd also like to cater to more "interactive" use.
>> 
>> But we shouldn't do that by removing rules until we get below some
>> runtime limit, but rather by creating a "batch" category or something
>> (just like we have "pending") now.
>> 
>> Or, just actually look into why it's slow and fix those issues and/or
>> report them upstream.
>
> IMO this should be the other way around: if applying a semantic patch
> is this slow, then first look into why it's slow, fix it, and only
> then submit it for merging.  A semantic patch this slow shouldn't have
> been merged in the first place.

If we're playing hypotheticals then if it hadn't been merged in the
first place we'd have the UNUSED() macro all over the place on master
now, and several *other* rules would probably be more broken (but I
haven't looked into the cooci guts enough), we just wouldn't notice
since "unused" is currently the only "look ahead" rule that fired &
caught it :)

>> There's nothing in unused.cocci that we either aren't running into
>> elsewhere, or wouldn't run into if we had 10x the coccinelle rules we
>> have now (which I think would be a good direction, we should rely on it
>> more heavily).
>
> Several developers have already stated that they might run 'make
> coccicheck' more often if it weren't so slow.  I think we must keep
> this in mind when adding new semantic patches, and should aim for a
> good return of investment between the usefulness of the semantic patch
> and its overhead.  'unused.cocci' doesn't seem to strike a good
> balance here.
>
> I doubt that I would ever run 'make coccicheck' if we had 10x as many
> semantic patches.

I think we're just describing our workflows by proxy here.

For me I generally take wasting a computer's time over a person's time
any day.

We should always be optimizing for saving people's time over CPUs,
because the latter is relatively cheap.

Granted unused.cocci isn't doing much in the grand scheme of things, but
if you know it's there it's one more thing you can let your eyes wander
over in patch review. You don't need to worry if some "struct strbuf"
(or similar) is really being used, or is leftover boilerplate.

Yes, it would be useful if coccicheck were currently fast enough to run
as part of an edit-save-compile-check cycle, but like e.g. -fanalyzer
it's much too slow for that for most people. So it gets run as some
"batch' job.

I think most people who use it in git.git are doing so by pushing to
CI. There it doesn't really matter that it takes longer now, because as
long as it's not slower than the longest running job it won't hold up
the total CI runtime, which is in the ~1hr range anyway.

So that's how I use it (when based off master). I also run it when I
build my local git, but that's a "kick it off in the background and
wait" type of operation. It runs tests, then other tests with other
compilers, SANITIZE=address etc. etc.

So I'm sorry if I tripped up some workflow for you with this change, but
I do still think it's worth it.

>> I've found that being able to have a ccache-like tool for "spatch"[1]
>> solved almost all of the practical performance concerns I had with
>> it. I.e. I can just run things in a batch, and usually any interactive
>> use will hit things already in cache.
>
> Well, perhaps that's why you didn't notice just how slow
> 'unused.cocci' can be... :)  Please don't forget about the runtime of
> a default "from scratch" 'make coccicheck'.

I did notice FWIW, but since I'm mainly looking at CI and other "batch"
operations when looking at "coccicheck" I thought the trade-off was
worth it.
diff mbox series

Patch

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index cd1f52692a5..831f4df56c5 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -309,7 +309,7 @@  jobs:
     if: needs.ci-config.outputs.enabled == 'yes'
     env:
       jobname: StaticAnalysis
-    runs-on: ubuntu-18.04
+    runs-on: ubuntu-22.04
     steps:
     - uses: actions/checkout@v2
     - run: ci/install-dependencies.sh