Message ID | 20240509162415.GB1708042@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Accepted |
Commit | 11c7001e3d37a64a0eacfc314d9609d37f33b9f4 |
Headers | show |
Series | un-breaking osx-gcc ci job | expand |
On Thu, May 09, 2024 at 12:24:15PM -0400, Jeff King wrote: > On macOS, a bare "gcc" (without a version) will invoke a wrapper for > clang, not actual gcc. Even when gcc is installed via homebrew, that > only provides version-specific links in /usr/local/bin (like "gcc-13"), > and never a version-agnostic "gcc" wrapper. That's awful. > As far as I can tell, this has been the case for a long time, and this > osx-gcc job has largely been doing nothing. We can point it at "gcc-13", > which will pick up the homebrew-installed version. > > The fix here is specific to the github workflow file, as the gitlab one > does not have a matching job. Indeed we don't. We could add it, but that certainly does not need to be part of this patch series. > It's a little unfortunate that we cannot just ask for the latest version > of gcc which homebrew provides, but as far as I can tell there is no > easy alias (you'd have to find the highest number gcc-* in > /usr/local/bin yourself). > > Signed-off-by: Jeff King <peff@peff.net> > --- > We'll eventually have to bump "gcc-13" to "gcc-14" here, and so on. I > don't think we ever cared about gcc-13 in particular; it's just that > older versions of the runner image had some ancient version which we > wanted to avoid. As an alternative we could munge PATH such that Homebrew's GCC is found before Apple's. Patrick
On Fri, May 10, 2024 at 09:00:14AM +0200, Patrick Steinhardt wrote: > > We'll eventually have to bump "gcc-13" to "gcc-14" here, and so on. I > > don't think we ever cared about gcc-13 in particular; it's just that > > older versions of the runner image had some ancient version which we > > wanted to avoid. > > As an alternative we could munge PATH such that Homebrew's GCC is found > before Apple's. Ideally, yeah, but it's not just a PATH issue. AFAICT there literally is no "gcc" in any PATH, only the version specific ones (even if you "brew install gcc"). So you'd need something like: for i in /usr/local/bin/gcc-* do # rely on sorting of glob to do last-one-wins gcc=$i done -Peff
On Thu, May 9, 2024 at 9:24 AM Jeff King <peff@peff.net> wrote: > > On macOS, a bare "gcc" (without a version) will invoke a wrapper for > clang, not actual gcc. Even when gcc is installed via homebrew, that > only provides version-specific links in /usr/local/bin (like "gcc-13"), > and never a version-agnostic "gcc" wrapper. > > As far as I can tell, this has been the case for a long time, and this > osx-gcc job has largely been doing nothing. If it's been doing nothing (which I interpreted as "it's doing the same thing as osx-clang"), and we've not noticed any issues with that, do we need the job at all? Should we just rely on clang and not test with gcc on macOS, since it's not a compiler that's provided by the platform anymore? > We can point it at "gcc-13", > which will pick up the homebrew-installed version. > > The fix here is specific to the github workflow file, as the gitlab one > does not have a matching job. > > It's a little unfortunate that we cannot just ask for the latest version > of gcc which homebrew provides, but as far as I can tell there is no > easy alias (you'd have to find the highest number gcc-* in > /usr/local/bin yourself). > > Signed-off-by: Jeff King <peff@peff.net> > --- > We'll eventually have to bump "gcc-13" to "gcc-14" here, and so on. I > don't think we ever cared about gcc-13 in particular; it's just that > older versions of the runner image had some ancient version which we > wanted to avoid. > > .github/workflows/main.yml | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml > index 5838986895..5f92a50271 100644 > --- a/.github/workflows/main.yml > +++ b/.github/workflows/main.yml > @@ -284,7 +284,7 @@ jobs: > cc: clang > pool: macos-13 > - jobname: osx-gcc > - cc: gcc > + cc: gcc-13 > cc_package: gcc-13 > pool: macos-13 > - jobname: linux-gcc-default > -- > 2.45.0.414.g4009e73179 > >
Kyle Lippincott <spectral@google.com> writes: > On Thu, May 9, 2024 at 9:24 AM Jeff King <peff@peff.net> wrote: >> >> On macOS, a bare "gcc" (without a version) will invoke a wrapper for >> clang, not actual gcc. Even when gcc is installed via homebrew, that >> only provides version-specific links in /usr/local/bin (like "gcc-13"), >> and never a version-agnostic "gcc" wrapper. >> >> As far as I can tell, this has been the case for a long time, and this >> osx-gcc job has largely been doing nothing. > > If it's been doing nothing (which I interpreted as "it's doing the > same thing as osx-clang"), and we've not noticed any issues with that, > do we need the job at all? Should we just rely on clang and not test > with gcc on macOS, since it's not a compiler that's provided by the > platform anymore? A very tempting suggestion. I do not see any problems with the direction. Thanks.
On Fri, May 10, 2024 at 01:32:15PM -0700, Kyle Lippincott wrote: > On Thu, May 9, 2024 at 9:24 AM Jeff King <peff@peff.net> wrote: > > > > On macOS, a bare "gcc" (without a version) will invoke a wrapper for > > clang, not actual gcc. Even when gcc is installed via homebrew, that > > only provides version-specific links in /usr/local/bin (like "gcc-13"), > > and never a version-agnostic "gcc" wrapper. > > > > As far as I can tell, this has been the case for a long time, and this > > osx-gcc job has largely been doing nothing. > > If it's been doing nothing (which I interpreted as "it's doing the > same thing as osx-clang"), and we've not noticed any issues with that, > do we need the job at all? Should we just rely on clang and not test > with gcc on macOS, since it's not a compiler that's provided by the > platform anymore? Your interpretation is correct; it was just doing the same thing as osx-clang. As I said earlier, I'd be fine with just dropping it. It's possible that there is value in testing there with gcc (that we've been missing out on), but to know that we'd have to understand the original reasons it was added. Looks like it came from 889cacb689 (ci: configure GitHub Actions for CI/PR, 2020-04-11), but that was just porting from the Azure pipelines config. That system got its osx_gcc job from 27be78173d (Add a build definition for Azure DevOps, 2019-01-29), but that in turn was just porting the Travis config. The Travis job came from 522354d70f (Add Travis CI support, 2015-11-27), and there's no specific rationale given. But it used a build matrix of (os, compiler), so we got all four of {linux,osx}-{gcc,clang}. So it mostly seems like "more is better" and that was the easiest way to define it. I do think there's value in testing with both clang and gcc in general[1]. And there is _some_ code which is compiled only on macos and not elsewhere. So this would be our only chance for gcc to see it. But it seems like a pretty small return for an entire parallel job. Especially as I do not think it has uncovered anything interesting in the past (even when it was working). -Peff [1] Another quirk is that we run the whole test suite for both compilers, which is probably overkill. The main value in comparing gcc vs clang is that we don't use any constructs that the compiler complains about. It's _possible_ for there to be a construct that the compiler does not notice but which causes a runtime difference (say, undefined behavior which happens to work out on one compiler), but I think we're again hitting diminishing returns.
Jeff King <peff@peff.net> writes: > I do think there's value in testing with both clang and gcc in > general[1]. And there is _some_ code which is compiled only on macos > and not elsewhere. So this would be our only chance for gcc to see it. > But it seems like a pretty small return for an entire parallel job. > Especially as I do not think it has uncovered anything interesting in > the past (even when it was working). 100% agreed. > [1] Another quirk is that we run the whole test suite for both > compilers, which is probably overkill. The main value in comparing > gcc vs clang is that we don't use any constructs that the compiler > complains about. It's _possible_ for there to be a construct that > the compiler does not notice but which causes a runtime difference > (say, undefined behavior which happens to work out on one compiler), > but I think we're again hitting diminishing returns. Yeah, that is a very good point.
On Fri, May 10, 2024 at 03:47:39PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: [snip] > > [1] Another quirk is that we run the whole test suite for both > > compilers, which is probably overkill. The main value in comparing > > gcc vs clang is that we don't use any constructs that the compiler > > complains about. It's _possible_ for there to be a construct that > > the compiler does not notice but which causes a runtime difference > > (say, undefined behavior which happens to work out on one compiler), > > but I think we're again hitting diminishing returns. > > Yeah, that is a very good point. On Linux, we have the "pedantic" job that runs Fedora and only compiles the sources with DEVOPTS=pedantic without running any of the tests. We could do the same on macOS. Patrick
On Sat, May 11, 2024 at 07:21:28PM +0200, Patrick Steinhardt wrote: > On Fri, May 10, 2024 at 03:47:39PM -0700, Junio C Hamano wrote: > > Jeff King <peff@peff.net> writes: > [snip] > > > [1] Another quirk is that we run the whole test suite for both > > > compilers, which is probably overkill. The main value in comparing > > > gcc vs clang is that we don't use any constructs that the compiler > > > complains about. It's _possible_ for there to be a construct that > > > the compiler does not notice but which causes a runtime difference > > > (say, undefined behavior which happens to work out on one compiler), > > > but I think we're again hitting diminishing returns. > > > > Yeah, that is a very good point. > > On Linux, we have the "pedantic" job that runs Fedora and only compiles > the sources with DEVOPTS=pedantic without running any of the tests. We > could do the same on macOS. Yeah, I think the infrastructure is there (looks like just resetting $run_tests). We probably could stand to use it in more places. E.g., is there even value in running the tests for linux-gcc and linux-clang? It's _possible_ for there to be a run-time difference in the compiler outputs, but we may be hitting diminishing returns. The main value I think is just seeing what the compilers complain about. But I dunno. This thread argues there is value in running the tests with the separate compiler: https://lore.kernel.org/git/pull.266.git.gitgitgadget@gmail.com/ which I guess would argue for doing the same for osx-clang and osx-gcc (if the latter continues to exist). -Peff
On Thu, May 16, 2024 at 03:19:30AM -0400, Jeff King wrote: > On Sat, May 11, 2024 at 07:21:28PM +0200, Patrick Steinhardt wrote: > > > On Fri, May 10, 2024 at 03:47:39PM -0700, Junio C Hamano wrote: > > > Jeff King <peff@peff.net> writes: > > [snip] > > > > [1] Another quirk is that we run the whole test suite for both > > > > compilers, which is probably overkill. The main value in comparing > > > > gcc vs clang is that we don't use any constructs that the compiler > > > > complains about. It's _possible_ for there to be a construct that > > > > the compiler does not notice but which causes a runtime difference > > > > (say, undefined behavior which happens to work out on one compiler), > > > > but I think we're again hitting diminishing returns. > > > > > > Yeah, that is a very good point. > > > > On Linux, we have the "pedantic" job that runs Fedora and only compiles > > the sources with DEVOPTS=pedantic without running any of the tests. We > > could do the same on macOS. > > Yeah, I think the infrastructure is there (looks like just resetting > $run_tests). We probably could stand to use it in more places. E.g., is > there even value in running the tests for linux-gcc and linux-clang? > It's _possible_ for there to be a run-time difference in the compiler > outputs, but we may be hitting diminishing returns. The main value I > think is just seeing what the compilers complain about. Actually, scratch that. I forgot we dropped linux-clang last summer in d88d727143 (ci: drop linux-clang job, 2023-06-01), since it was mostly redundant with the sanitizer builds (which use clang). There is still "linux-gcc" and "linux-gcc-default", the former of which uses an older version of the compiler. And again, it's not clear to me that running the actual tests is uncovering useful stuff there. Certainly it is possible, but I wonder if that is the best bang for the buck (e.g., if we wanted to spend a "make test" worth of CPU, then something like osx-sha256 would IMHO be more likely to uncover something useful). -Peff
On Thu, May 16, 2024 at 03:19:30AM -0400, Jeff King wrote: > On Sat, May 11, 2024 at 07:21:28PM +0200, Patrick Steinhardt wrote: > > > On Fri, May 10, 2024 at 03:47:39PM -0700, Junio C Hamano wrote: > > > Jeff King <peff@peff.net> writes: > > [snip] > > > > [1] Another quirk is that we run the whole test suite for both > > > > compilers, which is probably overkill. The main value in comparing > > > > gcc vs clang is that we don't use any constructs that the compiler > > > > complains about. It's _possible_ for there to be a construct that > > > > the compiler does not notice but which causes a runtime difference > > > > (say, undefined behavior which happens to work out on one compiler), > > > > but I think we're again hitting diminishing returns. > > > > > > Yeah, that is a very good point. > > > > On Linux, we have the "pedantic" job that runs Fedora and only compiles > > the sources with DEVOPTS=pedantic without running any of the tests. We > > could do the same on macOS. > > Yeah, I think the infrastructure is there (looks like just resetting > $run_tests). We probably could stand to use it in more places. E.g., is > there even value in running the tests for linux-gcc and linux-clang? > It's _possible_ for there to be a run-time difference in the compiler > outputs, but we may be hitting diminishing returns. The main value I > think is just seeing what the compilers complain about. That's certainly the biggest part, yeah. But I have been hitting lots of compiler-dependent behaviour. This is mostly in the area of bugs though, where for example toolchain A may initialize variables on the stack to all zeroes while toolchain B does not. I guess this is mostly a question of defaults though, and I think it is partially influenced by the overall toolchain environment as configured by my distro. Especially hardening options are for example quite likely to lead to different behaviour. I'm not sure whether this is sufficient reason on its own to warrant testing with several toolchains. But we can easily combine this with additional tuning knobs. Two separate test jobs with GCC and Clang are comparatively boring. But if we make it GCC+SHA1 and Clang+SHA256 then it becomes more interesting. So I think dropping the compiler coverage completely is rather pointless because we already run multiple different jobs per platform anyway. But we should investigate whether we can cleverly combine those so that we do not need a separate jobs just to test a specific compiler. Patrick > But I dunno. This thread argues there is value in running the tests with > the separate compiler: > > https://lore.kernel.org/git/pull.266.git.gitgitgadget@gmail.com/ > > which I guess would argue for doing the same for osx-clang and osx-gcc > (if the latter continues to exist). > > -Peff
On Thu, May 16, 2024 at 11:54:44AM +0200, Patrick Steinhardt wrote: > That's certainly the biggest part, yeah. But I have been hitting lots of > compiler-dependent behaviour. This is mostly in the area of bugs though, > where for example toolchain A may initialize variables on the stack to > all zeroes while toolchain B does not. I've definitely run into differing runtime outcomes for undefined behavior stuff like that. But in my experience most of that is consistently found by ASan/UBSan (which we do run in CI these days). It's possible there are cases that those sanitizers don't catch but that cause differing behavior. But I can't think of one off the top of my head where that has happened. > I'm not sure whether this is sufficient reason on its own to warrant > testing with several toolchains. But we can easily combine this with > additional tuning knobs. Two separate test jobs with GCC and Clang are > comparatively boring. But if we make it GCC+SHA1 and Clang+SHA256 then > it becomes more interesting. Yeah. Combining orthogonal properties into a single job lets us cover both (for the common case of success on both) with less CPU. But: - it can sometimes be hard to figure out which of the properties was responsible for a failure. That was the very subject of the thread I referenced earlier, where "linux-gcc" was "use gcc" and also "set lots of knobs". - they might not actually be orthogonal. If you care about checking runtime behavior in the output of two compilers, then that _could_ manifest only in the sha256 code. Or as you get into more properties, they may overlap in other ways. I think reftable+sha256 is an interesting (eventual) combo to test on top of reftable+sha1. -Peff
On Fri, May 17, 2024 at 04:19:09AM -0400, Jeff King wrote: > On Thu, May 16, 2024 at 11:54:44AM +0200, Patrick Steinhardt wrote: > > > That's certainly the biggest part, yeah. But I have been hitting lots of > > compiler-dependent behaviour. This is mostly in the area of bugs though, > > where for example toolchain A may initialize variables on the stack to > > all zeroes while toolchain B does not. > > I've definitely run into differing runtime outcomes for undefined > behavior stuff like that. But in my experience most of that is > consistently found by ASan/UBSan (which we do run in CI these days). > > It's possible there are cases that those sanitizers don't catch but that > cause differing behavior. But I can't think of one off the top of my > head where that has happened. True, these should be sufficient indeed. > > I'm not sure whether this is sufficient reason on its own to warrant > > testing with several toolchains. But we can easily combine this with > > additional tuning knobs. Two separate test jobs with GCC and Clang are > > comparatively boring. But if we make it GCC+SHA1 and Clang+SHA256 then > > it becomes more interesting. > > Yeah. Combining orthogonal properties into a single job lets us cover > both (for the common case of success on both) with less CPU. But: > > - it can sometimes be hard to figure out which of the properties was > responsible for a failure. That was the very subject of the thread I > referenced earlier, where "linux-gcc" was "use gcc" and also "set > lots of knobs". That's true. But for me the problem typically is that you need to be aware that the job uses different properties in the first place -- this is quite hidden away. Figuring out that a job uses "main" as default branch just because it is called "linux-gcc" is quite hard unless you are aware of how exactly our CI systems work. And besides being hard to discover, it's also really fragile. I wish that we got rid of relying on job names and made this more discoverable. The obvious way to do so would be to instead declare the `GIT_TEST_` variables in the CI definitions, which would make them easy to spot and change. But it of course has the big downside that it is now quite easy for the different CI platforms to diverge. > - they might not actually be orthogonal. If you care about checking > runtime behavior in the output of two compilers, then that _could_ > manifest only in the sha256 code. Or as you get into more > properties, they may overlap in other ways. I think reftable+sha256 > is an interesting (eventual) combo to test on top of reftable+sha1. Yes, I really want to have reftable+sha256, as well. I didn't feel like adding it back then because we already have a ton of jobs, and adding another job felt like pushing the limits. Patrick
Jeff King <peff@peff.net> writes: > Yeah. Combining orthogonal properties into a single job lets us cover > both (for the common case of success on both) with less CPU. But: > > - it can sometimes be hard to figure out which of the properties was > responsible for a failure. That was the very subject of the thread I > referenced earlier, where "linux-gcc" was "use gcc" and also "set > lots of knobs". > > - they might not actually be orthogonal. If you care about checking > runtime behavior in the output of two compilers, then that _could_ > manifest only in the sha256 code. Or as you get into more > properties, they may overlap in other ways. I think reftable+sha256 > is an interesting (eventual) combo to test on top of reftable+sha1. We could consider permuting, then? If we (for the sake of simplicity) had two jobs available, one compiled with GCC and the other compiled with clang, we can enumerate other properties (e.g. <SHA-1 vs SHA-256>, <reftable vs reffiles>) into pairs, and in one run, GCC may be running SHA-1+reffiles while clang is running SHA-256+reftable, and in another run, GCC may be running SHA-256+reffiles, etc. --- eventually we cover all four combinations (admittedly for different commits).
On Fri, May 17, 2024 at 09:59:35AM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > Yeah. Combining orthogonal properties into a single job lets us cover > > both (for the common case of success on both) with less CPU. But: > > > > - it can sometimes be hard to figure out which of the properties was > > responsible for a failure. That was the very subject of the thread I > > referenced earlier, where "linux-gcc" was "use gcc" and also "set > > lots of knobs". > > > > - they might not actually be orthogonal. If you care about checking > > runtime behavior in the output of two compilers, then that _could_ > > manifest only in the sha256 code. Or as you get into more > > properties, they may overlap in other ways. I think reftable+sha256 > > is an interesting (eventual) combo to test on top of reftable+sha1. > > We could consider permuting, then? If we (for the sake of > simplicity) had two jobs available, one compiled with GCC and the > other compiled with clang, we can enumerate other properties > (e.g. <SHA-1 vs SHA-256>, <reftable vs reffiles>) into pairs, and in > one run, GCC may be running SHA-1+reffiles while clang is running > SHA-256+reftable, and in another run, GCC may be running > SHA-256+reffiles, etc. --- eventually we cover all four combinations > (admittedly for different commits). That's a neat idea to get eventual coverage. I have a feeling it would be a pain in practice, though, because now the CI results aren't quite deterministic. So if commit X introduces a bug in some combination, we might not find out until later, and seeing that X passed all tests doesn't absolve it of responsibility anymore. Likewise, I often have to re-run CI to get more data, or to see if a failure is a flake. If it changed what it ran that would be confusing (though I guess you could use the commit hash as the random "seed" for deciding which permutation to run). -Peff
Jeff King <peff@peff.net> writes: > ... be a pain in practice, though, because now the CI results aren't quite > deterministic. So if commit X introduces a bug in some combination, we > might not find out until later, and seeing that X passed all tests > doesn't absolve it of responsibility anymore. Right. A poor idea retracted.
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index 5838986895..5f92a50271 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -284,7 +284,7 @@ jobs: cc: clang pool: macos-13 - jobname: osx-gcc - cc: gcc + cc: gcc-13 cc_package: gcc-13 pool: macos-13 - jobname: linux-gcc-default
On macOS, a bare "gcc" (without a version) will invoke a wrapper for clang, not actual gcc. Even when gcc is installed via homebrew, that only provides version-specific links in /usr/local/bin (like "gcc-13"), and never a version-agnostic "gcc" wrapper. As far as I can tell, this has been the case for a long time, and this osx-gcc job has largely been doing nothing. We can point it at "gcc-13", which will pick up the homebrew-installed version. The fix here is specific to the github workflow file, as the gitlab one does not have a matching job. It's a little unfortunate that we cannot just ask for the latest version of gcc which homebrew provides, but as far as I can tell there is no easy alias (you'd have to find the highest number gcc-* in /usr/local/bin yourself). Signed-off-by: Jeff King <peff@peff.net> --- We'll eventually have to bump "gcc-13" to "gcc-14" here, and so on. I don't think we ever cared about gcc-13 in particular; it's just that older versions of the runner image had some ancient version which we wanted to avoid. .github/workflows/main.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)