Message ID | pull.1334.git.1661275691795.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | ef46584831268a83591412a33783caf866867482 |
Series | ci: update 'static-analysis' to Ubuntu 22.04 |
Hi Stolee, On Tue, 23 Aug 2022, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <derrickstolee@github.com> > > GitHub Actions scheduled a brownout of Ubuntu 18.04, which canceled all > runs of the 'static-analysis' job in our CI runs. Update to 22.04 to > avoid this as the brownout later turns into a complete deprecation. > > The use of 18.04 was set in d051ed77ee6 (.github/workflows/main.yml: run > static-analysis on bionic, 2021-02-08) due to the lack of Coccinelle > being available on 20.04 (which continues today). ACK! > > Signed-off-by: Derrick Stolee <derrickstolee@github.com> > --- > ci: update 'static-analysis' to Ubuntu 20.04 > > I noticed this while preparing my bundle URIs series. See an example > cancellation at [1] > > [1] > https://github.com/gitgitgadget/git/runs/7954913465?check_suite_focus=true > > I initially asked about this [2]. Thanks to Matthias Aßhauer for > pointing out that 22.04 has Coccinelle available [3]. > > [2] > https://lore.kernel.org/git/eb8779bc-fc41-f601-05f2-024e6bf3f316@github.com/ > [3] > https://github.com/gitgitgadget/git/pull/1334#issuecomment-1223597655 This can be verified also by looking at the successful `static-analysis` run at https://github.com/gitgitgadget/git/runs/7979368539?check_suite_focus=true (which is part of the PR/CI runs at https://github.com/gitgitgadget/git/pull/1334/checks). Thanks, Dscho > > Thanks, > > * Stolee > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1334%2Fderrickstolee%2Fstatic-analysis-ubuntu-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1334/derrickstolee/static-analysis-ubuntu-v1 > Pull-Request: https://github.com/gitgitgadget/git/pull/1334 > > .github/workflows/main.yml | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml > index cd1f52692a5..831f4df56c5 100644 > --- a/.github/workflows/main.yml > +++ b/.github/workflows/main.yml > @@ -309,7 +309,7 @@ jobs: > if: needs.ci-config.outputs.enabled == 'yes' > env: > jobname: StaticAnalysis > - runs-on: ubuntu-18.04 > + runs-on: ubuntu-22.04 > steps: > - uses: actions/checkout@v2 > - run: ci/install-dependencies.sh > > base-commit: 795ea8776befc95ea2becd8020c7a284677b4161 > -- > gitgitgadget >
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > Hi Stolee, > > On Tue, 23 Aug 2022, Derrick Stolee via GitGitGadget wrote: > >> From: Derrick Stolee <derrickstolee@github.com> >> >> GitHub Actions scheduled a brownout of Ubuntu 18.04, which canceled all >> runs of the 'static-analysis' job in our CI runs. Update to 22.04 to >> avoid this as the brownout later turns into a complete deprecation. >> >> The use of 18.04 was set in d051ed77ee6 (.github/workflows/main.yml: run >> static-analysis on bionic, 2021-02-08) due to the lack of Coccinelle >> being available on 20.04 (which continues today). > > ACK! Yup, this is probably a good move. We tend to stay away from bleeding edge and instead pick a representative one that is likely to be with our real users, but 22.04 by no means is one. I guess we would want to do this even down to 'maint'? Thanks, both.
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Derrick Stolee <derrickstolee@github.com> > > GitHub Actions scheduled a brownout of Ubuntu 18.04, which canceled all > runs of the 'static-analysis' job in our CI runs. Update to 22.04 to > avoid this as the brownout later turns into a complete deprecation. > > The use of 18.04 was set in d051ed77ee6 (.github/workflows/main.yml: run > static-analysis on bionic, 2021-02-08) due to the lack of Coccinelle > being available on 20.04 (which continues today). > > Signed-off-by: Derrick Stolee <derrickstolee@github.com> > --- > ci: update 'static-analysis' to Ubuntu 20.04 > > I noticed this while preparing my bundle URIs series. See an example > cancellation at [1] > > [1] > https://github.com/gitgitgadget/git/runs/7954913465?check_suite_focus=true > > I initially asked about this [2]. Thanks to Matthias Aßhauer for > pointing out that 22.04 has Coccinelle available [3]. Thanks, it is already paying its dividend, it seems. We probably need to fix or revert/remove rules we have in unused.cocci that makes bogus "suggestion". https://github.com/git/git/runs/8005321972?check_suite_focus=true
On 8/24/22 7:43 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> From: Derrick Stolee <derrickstolee@github.com> >> >> GitHub Actions scheduled a brownout of Ubuntu 18.04, which canceled all >> runs of the 'static-analysis' job in our CI runs. Update to 22.04 to >> avoid this as the brownout later turns into a complete deprecation. >> >> The use of 18.04 was set in d051ed77ee6 (.github/workflows/main.yml: run >> static-analysis on bionic, 2021-02-08) due to the lack of Coccinelle >> being available on 20.04 (which continues today). >> >> Signed-off-by: Derrick Stolee <derrickstolee@github.com> >> --- >> ci: update 'static-analysis' to Ubuntu 20.04 >> >> I noticed this while preparing my bundle URIs series. See an example >> cancellation at [1] >> >> [1] >> https://github.com/gitgitgadget/git/runs/7954913465?check_suite_focus=true >> >> I initially asked about this [2]. Thanks to Matthias Aßhauer for >> pointing out that 22.04 has Coccinelle available [3]. > > Thanks, it is already paying its dividend, it seems. > > We probably need to fix or revert/remove rules we have in > unused.cocci that makes bogus "suggestion". > > https://github.com/git/git/runs/8005321972?check_suite_focus=true Yes, this is definitely a bogus suggestion. It's probable that it is picked up by the newer version of Coccinelle. I would recommend removing unused.cocci in a patch after this one, and then you can apply both into 'maint'. Perhaps a careful rewrite of unused.cocci could fix these kind of suggestions using the latest version of Coccinelle, but now is not the time to wait for that. Thanks, -Stolee
Derrick Stolee <derrickstolee@github.com> writes: >> We probably need to fix or revert/remove rules we have in >> unused.cocci that makes bogus "suggestion". >> >> https://github.com/git/git/runs/8005321972?check_suite_focus=true > > Yes, this is definitely a bogus suggestion. It's probable that it > is picked up by the newer version of Coccinelle. Yes, I think we should tentatively disable the offending one until we know how to properly "fix" it.
On Wed, Aug 24 2022, Junio C Hamano wrote: > Derrick Stolee <derrickstolee@github.com> writes: > >>> We probably need to fix or revert/remove rules we have in >>> unused.cocci that makes bogus "suggestion". >>> >>> https://github.com/git/git/runs/8005321972?check_suite_focus=true >> >> Yes, this is definitely a bogus suggestion. It's probable that it >> is picked up by the newer version of Coccinelle. > > Yes, I think we should tentatively disable the offending one until > we know how to properly "fix" it. I'm seeing this relatively late, it would be nice to get a CC on discussions of one's code :) This is indeed completely broken, but it's not the unused.cocci rule that's broken. What's happening here is that coccinelle can no longer properly parse the file after the UNUSED() macros were applied to refs.c. Try running with "--verbose-match --verbose-parsing", on "seen". Deleting the UNUSED() from warn_if_dangling_symref() happens to "fix" it, but it's only working as a result of some hack. Coccinelle is running into some unbalanced paren issue, and it happens to balance out with that. I don't think there's any issue here in unused.cocci, it just happens to be the rule that's unlucky enough to fire like that in the face of these parse errors. We should probably coerce coccinelle into stopping in general if it's encountering "BAD:!!!!!" parse errors behind the scenes, as it clearly results in broken logic, but offhand (and from briefly browsing the manpage) I don't know a way to do that. But the fix here isn't to delete unused.cocci, but to hold off on the UNUSED() patches until we figure out how to make coccinelle jive with them. One thing that *would* fix it is to go with the approach I suggested in https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/, i.e. to not use an "UNUSED(var)" form, but just "UNUSED". I tried that just now with this hack, which wouldn't even compile with the compiler, but coccinelle is seemingly smart enough to ignore unknown tokens it doesn't know about if they're not introducing parens (i.e. I didn't even have to define UNUSED2).
It's also not that it punted out entirely, with this changing refs_verify_refname_available() so that "referent" actually isn't unused for real would have unused.cocci suggest the same removal, so we're managing to fully apply rules to the file: diff --git a/refs.c b/refs.c index 607694c2662..37e7d88920c 100644 --- a/refs.c +++ b/refs.c @@ -442,7 +442,7 @@ struct warn_if_dangling_data { }; static int warn_if_dangling_symref(const char *refname, - const struct object_id *UNUSED(oid), + const struct object_id *oid UNUSED2, int flags, void *cb_data) { struct warn_if_dangling_data *d = cb_data; @@ -982,7 +982,7 @@ static void set_read_ref_cutoffs(struct read_ref_at_cb *cb, } static int read_ref_at_ent(struct object_id *ooid, struct object_id *noid, - const char *UNUSED(email), + const char *email UNUSED2, timestamp_t timestamp, int tz, const char *message, void *cb_data) { @@ -1024,9 +1024,9 @@ static int read_ref_at_ent(struct object_id *ooid, struct object_id *noid, return cb->found_it; } -static int read_ref_at_ent_newest(struct object_id *UNUSED(ooid), +static int read_ref_at_ent_newest(struct object_id *ooid UNUSED2, struct object_id *noid, - const char *UNUSED(email), + const char *email UNUSED2, timestamp_t timestamp, int tz, const char *message, void *cb_data) { @@ -1039,7 +1039,7 @@ static int read_ref_at_ent_newest(struct object_id *UNUSED(ooid), } static int read_ref_at_ent_oldest(struct object_id *ooid, struct object_id *noid, - const char *UNUSED(email), + const char *email UNUSED2, timestamp_t timestamp, int tz, const char *message, void *cb_data) { @@ -1904,7 +1904,7 @@ struct ref_store_hash_entry char name[FLEX_ARRAY]; }; -static int ref_store_hash_cmp(const void *UNUSED(cmp_data), +static int ref_store_hash_cmp(const void *cmp_data UNUSED2, const struct hashmap_entry *eptr, const struct hashmap_entry *entry_or_key, const void *keydata)
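To make the two annotation styles being compared above concrete, here is a minimal, self-contained C sketch. The macro names WRAP_UNUSED and MARK_UNUSED are illustrative stand-ins, not the actual definitions from the UNUSED() series or the UNUSED2 hack in the diff:

    #include <stdio.h>

    /*
     * Wrapper style: the macro takes the parameter name as an argument,
     * so a tool that does not expand macros sees what looks like a
     * function call inside a parameter list -- the kind of input the
     * message above reports confusing coccinelle's parser.
     */
    #define WRAP_UNUSED(var) var __attribute__((unused))

    /*
     * Postfix style: a bare marker after the parameter name, as in the
     * "oid UNUSED2" hunks above; a parser that skips unknown tokens can
     * still recover the declaration.
     */
    #define MARK_UNUSED __attribute__((unused))

    static int wrapper_cb(const char *WRAP_UNUSED(name), int flags)
    {
        return flags;
    }

    static int postfix_cb(const char *name MARK_UNUSED, int flags)
    {
        return flags;
    }

    int main(void)
    {
        printf("%d %d\n", wrapper_cb("a", 1), postfix_cb("b", 2));
        return 0;
    }

Both callbacks compile cleanly with gcc or clang under -Wall -Wextra; the difference the thread cares about is only in how a non-expanding parser sees the parameter list.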
On Tue, Aug 23 2022, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <derrickstolee@github.com> > > GitHub Actions scheduled a brownout of Ubuntu 18.04, which canceled all > runs of the 'static-analysis' job in our CI runs. Update to 22.04 to > avoid this as the brownout later turns into a complete deprecation. > > The use of 18.04 was set in d051ed77ee6 (.github/workflows/main.yml: run > static-analysis on bionic, 2021-02-08) due to the lack of Coccinelle > being available on 20.04 (which continues today). > > Signed-off-by: Derrick Stolee <derrickstolee@github.com> > --- > ci: update 'static-analysis' to Ubuntu 20.04 > > I noticed this while preparing my bundle URIs series. See an example > cancellation at [1] > > [1] > https://github.com/gitgitgadget/git/runs/7954913465?check_suite_focus=true > > I initially asked about this [2]. Thanks to Matthias Aßhauer for > pointing out that 22.04 has Coccinelle available [3]. > > [2] > https://lore.kernel.org/git/eb8779bc-fc41-f601-05f2-024e6bf3f316@github.com/ > [3] > https://github.com/gitgitgadget/git/pull/1334#issuecomment-1223597655 > > Thanks, > > * Stolee > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1334%2Fderrickstolee%2Fstatic-analysis-ubuntu-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1334/derrickstolee/static-analysis-ubuntu-v1 > Pull-Request: https://github.com/gitgitgadget/git/pull/1334 > > .github/workflows/main.yml | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml > index cd1f52692a5..831f4df56c5 100644 > --- a/.github/workflows/main.yml > +++ b/.github/workflows/main.yml > @@ -309,7 +309,7 @@ jobs: > if: needs.ci-config.outputs.enabled == 'yes' > env: > jobname: StaticAnalysis > - runs-on: ubuntu-18.04 > + runs-on: ubuntu-22.04 I think it's worth mentioning in the commit message that the reason we're not going for "ubuntu-latest" here is because "latest" is a misnomer, it's not the "latest", but "latest-stable" or whatever. I.e. it's pointing to 20.04 currently: https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources I think per https://lore.kernel.org/git/211104.86v918i78r.gmgdl@evledraar.gmail.com/ that we should pin these in general, but you/Junio disagreed at the time. So maybe a commit message change/comment here noting "# make this ubuntu-latest once it's ubuntu-22.04 or later" or something would make the intent clearer. Also: for "sparse" didn't we miss doing that, i.e. it's pinned on ubuntu-20.04 explicitly, but surely that should be "ubuntu-latest" now?
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > What's happening here is that coccinelle can no longer properly parse > the file after the UNUSED() macros were applied to refs.c. Sigh. > diff --git a/refs.c b/refs.c > index 607694c2662..37e7d88920c 100644 > --- a/refs.c > +++ b/refs.c > @@ -442,7 +442,7 @@ struct warn_if_dangling_data { > }; > > static int warn_if_dangling_symref(const char *refname, > - const struct object_id *UNUSED(oid), > + const struct object_id *oid UNUSED2, > int flags, void *cb_data) > { > struct warn_if_dangling_data *d = cb_data; This is almost "the most simple and stupid and nobody would get confused" version, which I may actually be able to live with. Unfortunately it will not get the "somebody by mistake uses 'oid' and we can break the build no matter what compiler is used by them before sending the patch out to the list". Doing s/oid UNUSED2/oid_UNUSED/ without any __attribute__((unused)) would give that benefit to us, but that won't squelch compilation with -Wunused which makes it a non-starter.
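A hedged sketch of the combination being weighed here: renaming the parameter catches accidental use at compile time, while the attribute keeps -Wunused-parameter quiet. The macro name and callback below are made up for illustration and are not the definitions from the series under discussion:

    /*
     * Hypothetical macro with both properties: the parameter is renamed,
     * so accidentally referring to "oid" in the body fails to compile,
     * and the attribute silences -Wunused-parameter.
     */
    #define UNUSED_PARAM(var) do_not_use_##var __attribute__((unused))

    struct object_id;

    static int warn_cb(const char *refname,
                       const struct object_id *UNUSED_PARAM(oid),
                       int flags, void *cb_data)
    {
        /* writing "oid" here would not compile: only do_not_use_oid exists */
        return refname != 0 && flags == 0 && cb_data == 0;
    }

    int main(void)
    {
        return !warn_cb("refs/heads/main", 0, 0, 0);
    }

Dropping the attribute and keeping only the rename, as in the s/oid UNUSED2/oid_UNUSED/ idea above, would still catch misuse but would then trigger -Wunused itself, which is the non-starter Junio points out.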
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> - runs-on: ubuntu-18.04 >> + runs-on: ubuntu-22.04 > > I think it's worth mentioning in the commit message that the reason > we're not going for "ubuntu-latest" here is because "latest" is a > misnomer, it's not the "latest", but "latest-stable" or > whatever. I think we are naming an exact version because we do not want what it means to change under us until the situation stabilizes. If we know that the version after 22.04 that will become "latest" is already reasonable when 22.04 (or later) becomes the "latest", maybe we decide to use "latest", but I suspect that right now is not the time to do that (yet). We just have proposed to go to 22.04 and haven't even used it long enough.
On Thu, Aug 25, 2022 at 12:47:53PM +0200, Ævar Arnfjörð Bjarmason wrote: > What's happening here is that coccinelle can no longer properly parse > the file after the UNUSED() macros were applied to refs.c. Ugh. Thanks for finding this. I'm a little disappointed in coccinelle here. I mean, I get that it probably can't just use the output of "gcc -E"; it wants to see and transform the code as written (and maybe even translate macros). But if it doesn't understand the macros enough to see inside them, there are a lot of things it is going to get confused by. A simple example like this shows that it doesn't really see inside them: $ cat foo.c #define foo(x) bar(x) static void fun(int x) { foo(x); } $ cat foo.cocci @@ expression E; @@ - bar(E); + changed(E); I wonder what it makes of code-generation macros like for_each_string_list_item(). Or all of commit-slab. Likewise, the manner in which it fails is unexpected. I could see it getting confused about the line in question, but why would that cause it to fail to see the use of a strbuf in another function entirely? I realize that pseudo-parsing C is an incredibly hard problem. But I also wonder how robust the tool actually is in practice. We've had a lot of mystery "now why didn't it catch this case" incidents over the years. Just running the tip of master with --verbose-parsing yields quite a few "bad" and "parse error" entries in the log. I wonder how much that contributes. > We should probably coerce coccinelle into stopping in general if it's > encountering "BAD:!!!!!" parse errors behind the scenes, as it clearly > results in broken logic, but offhand (and from briefly browsing the > manpage) I don't know a way to do that. You could detect the BAD entries from the log and complain before returning the exit code to make. But since there are several instances already, I suspect it's a losing battle. We need some way to actually _fix_ them. And in the meantime, it does seem to _mostly_ work, as it clearly has some kind of resync logic. > But the fix here isn't to delete unused.cocci, but to hold off on the > UNUSED() patches until we figure out how to make coccinelle jive with > them. Yeah, my general skepticism and disappointment above notwithstanding, this seems like the best path forward from here. I tried a few other tricks (like --macro-file and --iso-file), but if its parser chokes, I don't think there's much we can do about it. Even if we wrote a patch to coccinelle itself (and I have no interest in doing that myself), it would take a while to become available. -Peff
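As a concrete illustration of the code-generation macros mentioned above, here is a simplified stand-in (not the real for_each_string_list_item() or commit-slab API): until the macro is expanded, the loop header is just an identifier followed by parentheses, so a parser that does not expand macros has to guess at the structure.

    #include <stdio.h>

    struct item { const char *string; };
    struct item_list { struct item *items; int nr; };

    /*
     * Simplified iteration macro: only after expansion does this become a
     * for-loop header that an ordinary C parser would recognize.
     */
    #define for_each_item(it, list) \
        for (it = (list)->items; it < (list)->items + (list)->nr; it++)

    int main(void)
    {
        struct item data[] = { { "one" }, { "two" } };
        struct item_list list = { data, 2 };
        struct item *it;

        for_each_item(it, &list)
            printf("%s\n", it->string);
        return 0;
    }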
Jeff King <peff@peff.net> writes: >> But the fix here isn't to delete unused.cocci, but to hold off on the >> UNUSED() patches until we figure out how to make coccinelle jive with >> them. > > Yeah, my general skepticism and disappointment above notwithstanding, > this seems like the best path forward from here. I tried a few other > tricks (like --macro-file and --iso-file), but if its parser chokes, I > don't think there's much we can do about it. Even if we wrote a patch to > coccinelle itself (and I have no interest in doing that myself), it > would take a while to become available. If it is just a single unused.cocci, I would actually think removing it would be a much better path forward. UNUSED() that renames to help folks without checking compilers would help noticing bad code much earlier than unused.cocci many contributors are not running themselves anyway.
On Fri, Aug 26, 2022 at 09:46:54AM -0700, Junio C Hamano wrote: > > Yeah, my general skepticism and disappointment above notwithstanding, > > this seems like the best path forward from here. I tried a few other > > tricks (like --macro-file and --iso-file), but if its parser chokes, I > > don't think there's much we can do about it. Even if we wrote a patch to > > coccinelle itself (and I have no interest in doing that myself), it > > would take a while to become available. > > If it is just a single unused.cocci, I would actually think removing > it would be a much better path forward. UNUSED() that renames to > help folks without checking compilers would help noticing bad code > much earlier than unused.cocci many contributors are not running > themselves anyway. I doubt that it is just unused.cocci. If I understand correctly, the new syntax is choking coccinelle's parser, so it is missing segments of the code from its analysis. That shows up as a false positive for unused.cocci, because its patch is something like "find me spots that have X followed by Y, with no Z in between (and then delete X and Y)". And the breakage is that it doesn't notice Z, so it thinks it has found such a spot. But other rules are of the form "find me X (and turn it into Y)". If the tool fails to parse code that contains X, then we'll get a false negative: it won't generate output when it should. But we don't notice any of those, because there are no true positives in the code base right now. But it would presumably fail to find some of them if there were. And more importantly, it makes the tool just as useless as false positives. If it were a matter of choosing one static analysis over another (unused parameters versus coccinelle), we might have something tricky to decide. But Ævar's non-parenthesized suggestion is a practical workaround. I don't like it as much as what I posted originally, but it's only a little less ergonomic, and lets us keep using coccinelle (which, despite the headaches I've had tangling with it over the years, has produced useful outcomes). -Peff
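For readers who have not looked at the rule, the shape unused.cocci targets is roughly the following sketch; the buffer type and helpers are stand-ins rather than the real strbuf API, and the comments paraphrase the X/Y/Z description above:

    #include <stdlib.h>

    /* stand-in resource type; in git the typical case is a struct strbuf */
    struct buf { char *data; };

    static void buf_init(struct buf *b)    { b->data = 0; }
    static void buf_release(struct buf *b) { free(b->data); b->data = 0; }

    static int process(int value)
    {
        struct buf scratch;

        buf_init(&scratch);     /* X: initialization                   */
        value *= 2;             /* ... no use of the buffer (no Z) ... */
        buf_release(&scratch);  /* Y: release                          */

        /*
         * The rule looks for init followed by release with no use in
         * between and suggests removing the declaration, X and Y.  A
         * parse failure that hides a real use ("Z") turns this into
         * the false positive discussed in this thread.
         */
        return value;
    }

    int main(void)
    {
        return process(21) == 42 ? 0 : 1;
    }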
Jeff King <peff@peff.net> writes: > If it were a matter of choosing one static analysis over another (unused > parameters versus coccinelle), we might have something tricky to decide. > But Ævar's non-parenthesized suggestion is a practical workaround. OK. Let's go that route, then. Thanks.
On Fri, Aug 26 2022, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > >>> But the fix here isn't to delete unused.cocci, but to hold off on the >>> UNUSED() patches until we figure out how to make coccinelle jive with >>> them. >> >> Yeah, my general skepticism and disappointment above notwithstanding, >> this seems like the best path forward from here. I tried a few other >> tricks (like --macro-file and --iso-file), but if its parser chokes, I >> don't think there's much we can do about it. Even if we wrote a patch to >> coccinelle itself (and I have no interest in doing that myself), it >> would take a while to become available. > > If it is just a single unused.cocci, I would actually think removing > it would be a much better path forward. UNUSED() that renames to > help folks without checking compilers would help noticing bad code > much earlier than unused.cocci many contributors are not running > themselves anyway. I think Jeff King's reply covers what I would have said, except one thing I'd like to add: My reading of this is that you're under the misimpression that unused.cocci and Jeff's UNUSED macro are two ways to the same end-goal, and that if we keep the macro we could lose the coccinelle rule. But they're doing completely orthogonal checks: unused.cocci is finding code that's *actually used* according to the compiler, but which we know results in code that's functionally unused. E.g. doing nothing with a "struct strbuf" except to initialize it, and call strbuf_release() on it. Whereas the UNUSED macro is finding parameters that are truly unused, and which we could have expected compilers to optimize out already. It's just helping us maintain our own code hygiene. I suspect that we'll also find "functionally unused" code with the UNUSED macro, but only indirectly. I.e. we might find that all users of an interface don't need the Nth parameter, and so we don't need to prepare it for them.
On Fri, Aug 26, 2022 at 09:46:54AM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > >> But the fix here isn't to delete unused.cocci, but to hold off on the > >> UNUSEwork D() patches until we figure out how to make coccinelle jive with > >> them. > > > > Yeah, my general skepticism and disappointment above notwithstanding, > > this seems like the best path forward from here. I tried a few other > > tricks (like --macro-file and --iso-file), but if its parser chokes, I > > don't think there's much we can do about it. Even if we wrote a patch to > > coccinelle itself (and I have no interest in doing that myself), it > > would take a while to become available. > > If it is just a single unused.cocci, I would actually think removing > it would be a much better path forward. UNUSED() that renames to > help folks without checking compilers would help noticing bad code > much earlier than unused.cocci many contributors are not running > themselves anyway. Here is another reason for the removal of 'unused.cocci': it's very costly to apply that semantic patch to the whole code base. make SPATCH_BATCH_SIZE=32 contrib/coccinelle/unused.cocci.patch takes 440s on my machine, whereas the second slowest 'object_id.cocci' takes only 56s [1]. Applying 'unused.cocci' to some of our source files individually takes well over a minute: $ time spatch --all-includes --sp-file contrib/coccinelle/unused.cocci builtin/log.c warning: Can't find macro file: /usr/local/bin/lib/coccinelle/standard.h warning: Can't find default iso file: /usr/local/bin/lib/coccinelle/standard.iso HANDLING: builtin/log.c Note: processing took 83.1s: builtin/log.c real 1m23.083s user 1m22.983s sys 0m0.033s $ time spatch --all-includes --sp-file contrib/coccinelle/unused.cocci builtin/rebase.c warning: Can't find macro file: /usr/local/bin/lib/coccinelle/standard.h warning: Can't find default iso file: /usr/local/bin/lib/coccinelle/standard.iso HANDLING: builtin/rebase.c Note: processing took 83.2s: builtin/rebase.c real 1m23.223s user 1m23.156s sys 0m0.017s In my opinion the benefits of having 'unused.cocci' clearly do not justify the costs. [1] https://public-inbox.org/git/20220824113901.GD1735@szeder.dev/
On Wed, Aug 31 2022, SZEDER Gábor wrote: > On Fri, Aug 26, 2022 at 09:46:54AM -0700, Junio C Hamano wrote: >> Jeff King <peff@peff.net> writes: >> >> >> But the fix here isn't to delete unused.cocci, but to hold off on the >> >> UNUSEwork D() patches until we figure out how to make coccinelle jive with >> >> them. >> > >> > Yeah, my general skepticism and disappointment above notwithstanding, >> > this seems like the best path forward from here. I tried a few other >> > tricks (like --macro-file and --iso-file), but if its parser chokes, I >> > don't think there's much we can do about it. Even if we wrote a patch to >> > coccinelle itself (and I have no interest in doing that myself), it >> > would take a while to become available. >> >> If it is just a single unused.cocci, I would actually think removing >> it would be a much better path forward. UNUSED() that renames to >> help folks without checking compilers would help noticing bad code >> much earlier than unused.cocci many contributors are not running >> themselves anyway. > > Here is another reason for the removal of 'unused.cocci': it's very > costly to apply that semantic patch to the whole code base. > > make SPATCH_BATCH_SIZE=32 contrib/coccinelle/unused.cocci.patch > > takes 440s on my machine, whereas the second slowest 'object_id.cocci' > takes only 56s [1]. Applying 'unused.cocci' to some of our source files > individually takes well over a minute: > > $ time spatch --all-includes --sp-file contrib/coccinelle/unused.cocci builtin/log.c > warning: Can't find macro file: /usr/local/bin/lib/coccinelle/standard.h > warning: Can't find default iso file: /usr/local/bin/lib/coccinelle/standard.iso > HANDLING: builtin/log.c > Note: processing took 83.1s: builtin/log.c > > real 1m23.083s > user 1m22.983s If you remove the "done:" line in cmd_format_patch() buiiltin/log.c runs in ~200ms instead of ~40s for me. Perhaps we should be discussing removing or refactoring that one line of code instead? :) Removing coccinelle rules because we're seeing slowness somewhere seems particularly short-sighted to me. Maybe we do run into intractable problems somewhere with it being slow, and we'd also like to cater to more "interactive" use. But we shouldn't do that by removing rules until we get below some runtime limit, but rather by creating a "batch" category or something (just like we have "pending") now. Or, just actually look into why it's slow and fix those issues and/or report them upstream. There's nothing in unused.cocci that we either aren't running into elsewhere, or wouldn't run into if we had 10x the coccinelle rules we have now (which I think would be a good direction, we should rely on it more heavily). I've found that being able to have a ccache-like tool for "spatch"[1] solved almost all of the practical performance concerns I had with it. I.e. I can just run things in a batch, and usually any interactive use will hit things already in cache. To the extent it doesn't it's usually some pathological issue in spatch. > sys 0m0.033s > $ time spatch --all-includes --sp-file contrib/coccinelle/unused.cocci builtin/rebase.c > warning: Can't find macro file: /usr/local/bin/lib/coccinelle/standard.h > warning: Can't find default iso file: /usr/local/bin/lib/coccinelle/standard.iso > HANDLING: builtin/rebase.c > Note: processing took 83.2s: builtin/rebase.c > > real 1m23.223s > user 1m23.156s > sys 0m0.017s I didn't look at this one, but I assume it's some similar (and probably easily fixed) pathological issue. 1. 
https://lore.kernel.org/git/patch-5.5-ce4734e5d79-20220825T141212Z-avarab@gmail.com/
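The "done:" line referred to above is the usual goto-cleanup shape; a minimal sketch of that pattern follows (not the actual cmd_format_patch() code). Every early exit funnels into a single label, giving a path-sensitive matcher such as spatch many more control-flow paths to consider, which is one plausible source of the pathological runtimes reported in this subthread.

    #include <stdio.h>
    #include <stdlib.h>

    /* minimal goto-cleanup shape: all early exits jump to "done:" */
    static int do_work(int n)
    {
        char *buf = malloc(32);
        int ret = 0;

        if (!buf) {
            ret = -1;
            goto done;
        }
        if (n > 16) {
            ret = -2;
            goto done;
        }
        snprintf(buf, 32, "n=%d", n);
        puts(buf);
    done:
        free(buf);
        return ret;
    }

    int main(void)
    {
        return do_work(3) < 0;
    }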
On Mon, Aug 29, 2022 at 12:29:09PM +0200, Ævar Arnfjörð Bjarmason wrote: > My reading of this is that you're misimpression that unused.cocci and > Jeff's UNUSED macro are two ways to the same end-goal, and that if we > keep the macro we could lose the coccinelle rule. Ah, I didn't think of that, but yeah, that would explain Junio's position a bit more. > But they're doing completely orthogonal checks, the unused.cocci is > finding code that's *actually used* accordingn to the compiler, but which > we know results in code that's functionally unused. Right. They're two separate types of "unused", and they should actually complement each other (e.g., if we drop an unused strbuf parameter, then a calling function may find that it is now doing nothing but init/release on the strbuf). So we'd want both. -Peff
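A small sketch of the cascade described above, with made-up names: once the callee's unused buffer parameter is dropped, the caller is left with exactly the init-and-release-only shape that the semantic patch can then flag.

    #include <stdio.h>
    #include <stdlib.h>

    struct buf { char *data; };   /* toy stand-in for a strbuf */

    static void buf_init(struct buf *b)    { b->data = 0; }
    static void buf_release(struct buf *b) { free(b->data); b->data = 0; }

    /* step 1: the buffer parameter turned out to be unused and was dropped */
    static void describe(int id) /* was: describe(int id, struct buf *out) */
    {
        printf("object %d\n", id);
    }

    static void caller(int id)
    {
        struct buf scratch;

        buf_init(&scratch);
        describe(id);          /* used to receive &scratch */
        buf_release(&scratch); /* step 2: only init/release remain here */
    }

    int main(void)
    {
        caller(42);
        return 0;
    }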
On Wed, Aug 31, 2022 at 02:13:51PM +0200, Ævar Arnfjörð Bjarmason wrote: > Removing coccinelle rules because we're seeing slowness somewhere seems > particularly short-sighted to me. > > Maybe we do run into intractable problems somewhere with it being slow, > and we'd also like to cater to more "interactive" use. Agreed. I'm not wild about how long it takes to run either, but if it's producing useful results, it seems worth it to pay the CPU (and I think unused.cocci did find some useful results already). There's a point at which the CPU use becomes intractable, but I don't think we're there yet. > There's nothing in unused.cocci that we either aren't running into > elsewhere, or wouldn't run into if we had 10x the coccinelle rules we > have now (which I think would be a good direction, we should rely on it > more heavily). From past experience, I suspect the "<... ...>" operator is what's expensive. I don't see an easy way of avoiding it here, though. I'm more skeptical on more coccinelle in general, just because I've spent so many hours fighting with it for both output and performance reasons. But if somebody else is willing to do that work, I'm OK with it. I have often wondered if our rules are sufficiently simple that libclang plus some light scripting might get us similar results with less hassle. But maybe that's a rabbit hole. -Peff
On Wed, Aug 31, 2022 at 02:13:51PM +0200, Ævar Arnfjörð Bjarmason wrote: > > On Wed, Aug 31 2022, SZEDER Gábor wrote: > > > On Fri, Aug 26, 2022 at 09:46:54AM -0700, Junio C Hamano wrote: > >> Jeff King <peff@peff.net> writes: > >> > >> >> But the fix here isn't to delete unused.cocci, but to hold off on the > >> >> UNUSEwork D() patches until we figure out how to make coccinelle jive with > >> >> them. > >> > > >> > Yeah, my general skepticism and disappointment above notwithstanding, > >> > this seems like the best path forward from here. I tried a few other > >> > tricks (like --macro-file and --iso-file), but if its parser chokes, I > >> > don't think there's much we can do about it. Even if we wrote a patch to > >> > coccinelle itself (and I have no interest in doing that myself), it > >> > would take a while to become available. > >> > >> If it is just a single unused.cocci, I would actually think removing > >> it would be a much better path forward. UNUSED() that renames to > >> help folks without checking compilers would help noticing bad code > >> much earlier than unused.cocci many contributors are not running > >> themselves anyway. > > > > Here is another reason for the removal of 'unused.cocci': it's very > > costly to apply that semantic patch to the whole code base. > > > > make SPATCH_BATCH_SIZE=32 contrib/coccinelle/unused.cocci.patch > > > > takes 440s on my machine, whereas the second slowest 'object_id.cocci' > > takes only 56s [1]. Applying 'unused.cocci' to some of our source files > > individually takes well over a minute: > > > > $ time spatch --all-includes --sp-file contrib/coccinelle/unused.cocci builtin/log.c > > warning: Can't find macro file: /usr/local/bin/lib/coccinelle/standard.h > > warning: Can't find default iso file: /usr/local/bin/lib/coccinelle/standard.iso > > HANDLING: builtin/log.c > > Note: processing took 83.1s: builtin/log.c > > > > real 1m23.083s > > user 1m22.983s > > If you remove the "done:" line in cmd_format_patch() buiiltin/log.c runs > in ~200ms instead of ~40s for me. Perhaps we should be discussing > removing or refactoring that one line of code instead? :) > > Removing coccinelle rules because we're seeing slowness somewhere seems > particularly short-sighted to me. It's not just slowness, it's drastic slowness. I'm looking at two "from scratch" 'make coccicheck' runs here, one with 'unused.cocci' taking 9m51s, one without taking 4m56s. So 'unused.cocci' effectively doubled the runtime, and wastes other developers' time and resources. I don't see anything wrong with removing a semantic patch that is as slow as 'unused.cocci' in its current form on our current codebase. We can always re-add it later, after those interested managed to figure out a way to address its slowness, and updated the semantic patch and/or the codebase accordingly. > Maybe we do run into intractable problems somewhere with it being slow, Looking at the runtimes I showed above, I think deeming it intractable is fully justified. > and we'd also like to cater to more "interactive" use. > > But we shouldn't do that by removing rules until we get below some > runtime limit, but rather by creating a "batch" category or something > (just like we have "pending") now. > > Or, just actually look into why it's slow and fix those issues and/or > report them upstream. IMO this should be the other way around: if applying a semantic patch is this slow, then first look into why it's slow, fix it, and only then submit it for merging. 
A semantic patch this slow shouldn't have been merged in the first place. > There's nothing in unused.cocci that we either aren't running into > elsewhere, or wouldn't run into if we had 10x the coccinelle rules we > have now (which I think would be a good direction, we should rely on it > more heavily). Several developers have already stated that they might run 'make coccicheck' more often if it weren't so slow. I think we must keep this in mind when adding new semantic patches, and should aim for a good return of investment between the usefulness of the semantic patch and its overhead. 'unused.cocci' doesn't seem to strike a good balance here. I doubt that I would ever run 'make coccicheck' if we had 10x as many semantic patches. > I've found that being able to have a ccache-like tool for "spatch"[1] > solved almost all of the practical performance concerns I had with > it. I.e. I can just run things in a batch, and usually any interactive > use will hit things already in cache. Well, perhaps that's why you didn't notice just how slow 'unused.cocci' can be... :) Please don't forget about the runtime of a default "from scratch" 'make coccicheck'. > To the extent it doesn't it's usually some pathological issue in spatch. > > > sys 0m0.033s > > $ time spatch --all-includes --sp-file contrib/coccinelle/unused.cocci builtin/rebase.c > > warning: Can't find macro file: /usr/local/bin/lib/coccinelle/standard.h > > warning: Can't find default iso file: /usr/local/bin/lib/coccinelle/standard.iso > > HANDLING: builtin/rebase.c > > Note: processing took 83.2s: builtin/rebase.c > > > > real 1m23.223s > > user 1m23.156s > > sys 0m0.017s > > I didn't look at this one, but I assume it's some similar (and probably > easily fixed) pathological issue. > > 1. https://lore.kernel.org/git/patch-5.5-ce4734e5d79-20220825T141212Z-avarab@gmail.com/
Jeff King <peff@peff.net> writes: > On Wed, Aug 31, 2022 at 02:13:51PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> Removing coccinelle rules because we're seeing slowness somewhere seems >> particularly short-sighted to me. >> >> Maybe we do run into intractable problems somewhere with it being slow, >> and we'd also like to cater to more "interactive" use. > > Agreed. I'm not wild about how long it takes to run either, but if it's > producing useful results, it seems worth it to pay the CPU (and I think > unused.cocci did find some useful results already). There's a point at > which the CPU use becomes intractable, but I don't think we're there > yet. I am not, either, as it does not look like it is producing all that useful results these days, but of course, we are looking at the end result of previous clean-up triggered by these existing rules, and therefore be biased---we only see cost of running the rules these days without seeing anything new getting caught.
On Wed, Aug 31 2022, SZEDER Gábor wrote: > On Wed, Aug 31, 2022 at 02:13:51PM +0200, Ævar Arnfjörð Bjarmason wrote: >> >> On Wed, Aug 31 2022, SZEDER Gábor wrote: >>> [...] >> If you remove the "done:" line in cmd_format_patch() buiiltin/log.c runs >> in ~200ms instead of ~40s for me. Perhaps we should be discussing >> removing or refactoring that one line of code instead? :) >> >> Removing coccinelle rules because we're seeing slowness somewhere seems >> particularly short-sighted to me. > > It's not just slowness, it's drastic slowness. I'm looking at two > "from scratch" 'make coccicheck' runs here, one with 'unused.cocci' > taking 9m51s, one without taking 4m56s. So 'unused.cocci' effectively > doubled the runtime, and wastes other developers' time and resources. > > I don't see anything wrong with removing a semantic patch that is as > slow as 'unused.cocci' in its current form on our current codebase. > We can always re-add it later, after those interested managed to > figure out a way to address its slowness, and updated the semantic > patch and/or the codebase accordingly. > >> Maybe we do run into intractable problems somewhere with it being slow, > > Looking at the runtimes I showed above, I think deeming it intractable > is fully justified. > >> and we'd also like to cater to more "interactive" use. >> >> But we shouldn't do that by removing rules until we get below some >> runtime limit, but rather by creating a "batch" category or something >> (just like we have "pending") now. >> >> Or, just actually look into why it's slow and fix those issues and/or >> report them upstream. > > IMO this should be the other way around: if applying a semantic patch > is this slow, then first look into why it's slow, fix it, and only > then submit it for merging. A semantic patch this slow shouldn't have > been merged in the first place. If we're playing hypotheticals then if it hadn't been merged in the first place we'd have the UNUSED() macro all over the place on master now, and several *other* rules would probably be more broken (but I haven't looked into the cooci guts enough), we just wouldn't notice since "unused" is currently the only "look ahead" rule that fired & caught it :) >> There's nothing in unused.cocci that we either aren't running into >> elsewhere, or wouldn't run into if we had 10x the coccinelle rules we >> have now (which I think would be a good direction, we should rely on it >> more heavily). > > Several developers have already stated that they might run 'make > coccicheck' more often if it weren't so slow. I think we must keep > this in mind when adding new semantic patches, and should aim for a > good return of investment between the usefulness of the semantic patch > and its overhead. 'unused.cocci' doesn't seem to strike a good > balance here. > > I doubt that I would ever run 'make coccicheck' if we had 10x as many > semantic patches. I think we're just describing our workflows by proxy here. For me I generally take wasting a computer's time over a person's time any day. We should always be optimizing for saving people's time over CPUs, because the latter is relatively cheap. Granted unused.cocci isn't doing much in the grand scheme of things, but if you know it's there it's one more thing you can let your eyes wander over in patch review. You don't need to worry if some "struct strbuf" (or similar) is really being used, or is leftover boilerplate. 
Yes, it would be useful if coccicheck were currently fast enough to run as part of an edit-save-compile-check cycle, but like e.g. -fanalyzer it's much too slow for that for most people. So it gets run as some "batch' job. I think most people who use it in git.git are doing so by pushing to CI. There it doesn't really matter that it takes longer now, because as long as it's not slower than the longest running job it won't hold up the total CI runtime, which is in the ~1hr range anyway. So that's how I use it (when based off master). I also run it when I build my local git, but that's a "kick it off in the background and wait" type of operation. It runs tests, then other tests with other compilers, SANITIZE=address etc. etc. So I'm sorry if I tripped up some workflow for you with this change, but I do still think it's worth it. >> I've found that being able to have a ccache-like tool for "spatch"[1] >> solved almost all of the practical performance concerns I had with >> it. I.e. I can just run things in a batch, and usually any interactive >> use will hit things already in cache. > > Well, perhaps that's why you didn't notice just how slow > 'unused.cocci' can be... :) Please don't forget about the runtime of > a default "from scratch" 'make coccicheck'. I did notice FWIW, but since I'm mainly looking at CI and other "batch" operations when looking at "coccicheck" I thought the trade-off was worth it.
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index cd1f52692a5..831f4df56c5 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -309,7 +309,7 @@ jobs:
     if: needs.ci-config.outputs.enabled == 'yes'
     env:
       jobname: StaticAnalysis
-    runs-on: ubuntu-18.04
+    runs-on: ubuntu-22.04
     steps:
     - uses: actions/checkout@v2
     - run: ci/install-dependencies.sh