diff mbox series

[v4,2/2] ci: compile "linux-gcc-default" job with -Og

Message ID 03270d3414117ae7229d87127cff81e349557039.1718001244.git.ps@pks.im (mailing list archive)
State New, archived
Headers show
Series ci: detect more warnings via `-Og` | expand

Commit Message

Patrick Steinhardt June 10, 2024, 6:38 a.m. UTC
We have recently noticed that our CI does not always notice variables
that may be used uninitialized. While it is expected that compiler
warnings aren't perfect, this one was a bit puzzling because it was
rather obvious that the variable can be uninitialized.

Many compiler warnings unfortunately depend on the optimization level
used by the compiler. While `-O0` for example will disable a lot of
warnings altogether because optimization passes go away, `-O2`, which is
our default optimization level used in CI, may optimize specific code
away or even double down on undefined behaviour. Interestingly, this
specific instance that triggered the investigation does get noted by GCC
when using `-Og`.

While we could adapt all jobs to compile with `-Og` now, that would
potentially mask other warnings that only get diagnosed with `-O2`.
Instead, adapt the "linux-gcc-default" job to compile with `-Og`. This
job is chosen because it uses the "ubuntu:latest" image and should thus
have a comparatively recent compiler toolchain, and because we have
other jobs that use "ubuntu:latest" so that we do not lose coverage for
warnings diagnosed only on `-O2` level.

To make it easier to set up the optimization level in our CI, add
support in our Makefile to specify the level via an environment
variable.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 ci/run-build-and-tests.sh | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Junio C Hamano June 10, 2024, 4:06 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

> While we could adapt all jobs to compile with `-Og` now, that would
> potentially mask other warnings that only get diagnosed with `-O2`.

I would say it is not just "potentially".  It is likely that higher
optimization levels will diagnose breakage more correctly, while
lower optimization levels may trigger warnings more often.  I
suspect a large part of those that lower optimization levels alone
find may be due to false positives.

So, ... I had a strong knee-jerk reaction against "While we could
adapt all" with "why would anybody sane would even want to do such a
thing in the first place?"  If it were phrased like "as we have
plenty of jobs compiling with -O2, we can afford to switch a few of
them to different optimization levels, and we may find bugs that the
exiting level did not notice, as the compilers' understanding of the
data and control flow is affected by these -O levels", I would have
understood it, and I think that is what happens in this patch.

I just tried compiling pack-mtimes.c (untouced between maint and
seen) with -Og using gcc-11, gcc-12, and gcc-13, using the exact
copy of the command line with V=1 output I see at the GitHub CI
[*1*] (after all, this was a good use case, even though it was
dropped from this round ;-)

The earlier versions of the compiler mark the use of mtimes_size
that is guarded in "if (data)" as maybe-uninitialized:

        pack-mtimes.c: In function ‘load_pack_mtimes_file’:
        pack-mtimes.c:89:25: error: ‘mtimes_size’ may be used uninitialized [-Werror=maybe-uninitialized]
           89 |                         munmap(data, mtimes_size);
              |                         ^~~~~~~~~~~~~~~~~~~~~~~~~
        pack-mtimes.c:32:16: note: ‘mtimes_size’ was declared here
           32 |         size_t mtimes_size, expected_size;
              |                ^~~~~~~~~~~
        cc1: all warnings being treated as errors
        make: *** [Makefile:2750: pack-mtimes.o] Error 1

But as I already reported elsewhere, this is a false positive.  The
place "data" can possibly become non-NULL happens only after the
control passes the place mtimes_size get assigned a meaningful
value.  If the control reached here and data were non-NULL, the
control flow guarantees that mtimes_size has been assigned
xsize_t(st.st_size) and cannot be uninitialized.

We can squelch the warning by initializing the variable to a
meaningless value, like 0, but I consider that such a change makes
the resulting code a lot worse than the original, and that is not
because I do not want an extra and meaningless assignment emitted
[*2*] in the resulting binary whose performance impact cannot be
measured by anybody.

It is bad because it defeats efforts by compilers that understand
the data flow a bit better to diagnose a true "used uninitialized"
bugs we may introduce in the future.  Adding such a workaround goes
against the longer-term health of the codebase.

For example, I could add another use of mtimes_size that is not
guarded by "if (data)" right next to it, i.e.

        @@ -87,6 +87,7 @@ static int load_pack_mtimes_file(char *mtimes_file,
                if (ret) {
                        if (data)
                                munmap(data, mtimes_size);
        +		printf("debug %d\n", (int)mtimes_size);
                } else {
                        *len_p = mtimes_size;
                        *data_p = data;

gcc-13 does notice that we are now truly using the variable
uninitialized and flags it for us (gcc-12 also does notice but the
important difference is that gcc-13 refrained from warning at the
safe use of the variable that is guarded by "if (data)").

Once we initialize the variable to meaninless 0 as a workaround,
however, the compiler cannot help us with the "used uninitialized"
warning, and there is no "the program uses the variable what was
initialized with meaningless value before it get set a meaningful
value" warning, unfortunately X-<.  And that, not an extra
assignment, is why such a workaround  goes against the longer-term
health of the codebase.

By the way, it is not just pack-mtimes.c [*3*]; there are quite a few
obvious false positives, e.g.

builtin/branch.c: In function ‘print_current_branch_name’:
builtin/branch.c:509:17: error: ‘shortname’ may be used uninitialized [-Werror=maybe-uninitialized]
  509 |                 puts(shortname);
      |                 ^~~~~~~~~~~~~~~

where after skip_prefix() computed shortname like so:

	else if (skip_prefix(refname, "refs/heads/", &shortname))
		puts(shortname);

and the compiler at that low optimization level does not even notice
that the static inline function skip_prefix() never returns true
without updating its third parameter to a valid pointer X-<.

And adding a workaround for each and every uses of variable that are
set by skip_prefix() like so?  I am not sure if that is a good move

If you saw some diagnosis that "-Og" made but not with higher
optimization levels (with better understanding of the data and
control flow on the compiler's part), I am somewhat curious.

With gcc-13 everything seems to pass locally for me, so I personally
do not mind this patch, but from what I've seen "gcc-12 -Wall -Og"
so far, especially given that GitHub CI (and GitLab CI as well?
otherwise you wouldn't be sending this change, I would imagine)
seems to use a version of compiler whose "-Og -Wall" notices and
flags these "bugs", I am not sure if we want to enable this option
for us.  At least, -Werror and more stupid compilers is not a mix I
would want to live with and I am sure I would not want to add
workaround for such a mix to make our code worse (to end users and
builders who want to use older comiplers with different optimization
levels, we can tell them not to enable -Werror in their build).

Also, with "-Os", we seem to find different set of "bugs" that "-Og"
did not see.  For example, "gcc-13 -Os -Wall" flags that ppid may be
uninitialized in compat/linux/procinfo.c:push_ancestry_name(), but
that is because it does not read stat_parent_pid() [*4*] and realize
that the only time ppid is left unset is when the function returns
-1:

        static void push_ancestry_name(struct strvec *names, pid_t pid)
        {
                int ppid;

                if (stat_parent_pid(pid, &name, &ppid) < 0)
                        goto cleanup;
                ...
                if (ppid)
                        push_ancestry_name(names, ppid);
        cleanup:
                strbuf_release(&name);
		...

It is the same pattern as how "gcc-12 -Og" mishandles skip_prefix().
Another one "-Os" found is in builtin/fast-import.c:file_change_m()
where the compiler does not notice that parse_mode() never returns
non-NULL without setting "mode".  It is exactly the same pattern.


[Footnotes]

 *1* https://github.com/git/git/actions/runs/9432273715/job/25981831882#step:4:132

 *2* We used to do the (IIRC GCC only) trick to initialize the
     variable to itself, i.e.

        @@ -29,7 +29,7 @@ static int load_pack_mtimes_file(char *mtimes_file,
                int fd, ret = 0;
                struct stat st;
                uint32_t *data = NULL;
        -	size_t mtimes_size, expected_size;
        +	size_t mtimes_size = mtimes_size, expected_size;
                struct mtimes_header header;

                fd = git_open(mtimes_file);

     to work around stupid compilers that flag a variable whose use
     is safe as potentially-used-uninitialized but these days we got
     rid of the trick as some other compiler did not like it (for
     legitimate reasons).  Using the trick is not any better than
     initializing the variable to a meaningless value, like 0, as
     both will prevent smarter compilers from noticing real bugs
     (after all, that is the point of the workaround--not to flag
     use of variable the compiler thinks uninitialized as a bug).

 *3* I was planning to send "these are horrible workarounds to work
     around the recent -Og build failures that has a huge negative
     consequences in the longer term" patch on top of your series.

     But I do not think it is a good idea to add such a huge number
     of workarounds for various optimization levels.  If we could
     add -Og with -Werror disabled, those who may be interested in
     sifting through false positives to find real errors may be able
     to do so without breaking build for others.

 *4* As "-Os" is for smaller code, I would imagine that it didn't
     try to auto inline the function.  Come to think of it, "-Og" is
     to forbid optimizations that interfere with debugging, and code
     inlining may be considered as one, which would certainly
     explain why it has problem with skip_prefix().
Junio C Hamano June 12, 2024, 10:11 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> We can squelch the warning by initializing the variable to a
> meaningless value, like 0, but I consider that such a change makes
> the resulting code a lot worse than the original, and that is not
> because I do not want an extra and meaningless assignment emitted
> [*2*] in the resulting binary whose performance impact cannot be
> measured by anybody.
>
> It is bad because it defeats efforts by compilers that understand
> the data flow a bit better to diagnose a true "used uninitialized"
> bugs we may introduce in the future.  Adding such a workaround goes
> against the longer-term health of the codebase.
>
> For example, I could add another use of mtimes_size that is not
> guarded by "if (data)" right next to it, i.e.
>
>         @@ -87,6 +87,7 @@ static int load_pack_mtimes_file(char *mtimes_file,
>                 if (ret) {
>                         if (data)
>                                 munmap(data, mtimes_size);
>         +		printf("debug %d\n", (int)mtimes_size);
>                 } else {
>                         *len_p = mtimes_size;
>                         *data_p = data;
>
> gcc-13 does notice that we are now truly using the variable
> uninitialized and flags it for us (gcc-12 also does notice but the
> important difference is that gcc-13 refrained from warning at the
> safe use of the variable that is guarded by "if (data)").

By the way, I do not know if any compiler gives us such a feature,
but if the trick to squelch a false positive were finer grained, I
would have been much more receptive to the idea of building with
different optimization level, allowing a bit more false positives.

The workaround everybody jumps at is to initialize the variable to a
meaningless value (like 0) and I have explained why it is suboptimal
already.  But if we can tell less intelligent compilers "we know our
use of this variable AT THIS POINT is safe", e.g. by annotating the
above snippet of the code, perhaps like this:

                if (ret) {
                        if (data)
				/* -Wno-uninitialized (mtimes_size) */
                                munmap(data, mtimes_size);
			printf("debug %d\n", (int)mtimes_size);

then it would be clear to the compiler that understand the
annotation that inside that "if (data)" block, we know that
the use of mtimes_size is not using an uninitialized variable.

For the use of the same variable on the next "debug" line, because
it is outside of that "if (data)" block, the annotation should have
no effect, and the compiler is free to do its own analysis and we
will accept if it finds mtimes_size can be used uninitialized there.
Any new use added for the same variable will not be masked by a
meaningless initialization if we can use such a "workaround" to
squelch false positives.
Jeff King June 13, 2024, 10:15 a.m. UTC | #3
On Wed, Jun 12, 2024 at 03:11:30PM -0700, Junio C Hamano wrote:

> By the way, I do not know if any compiler gives us such a feature,
> but if the trick to squelch a false positive were finer grained, I
> would have been much more receptive to the idea of building with
> different optimization level, allowing a bit more false positives.
> 
> The workaround everybody jumps at is to initialize the variable to a
> meaningless value (like 0) and I have explained why it is suboptimal
> already.  But if we can tell less intelligent compilers "we know our
> use of this variable AT THIS POINT is safe", e.g. by annotating the
> above snippet of the code, perhaps like this:
> 
>                 if (ret) {
>                         if (data)
> 				/* -Wno-uninitialized (mtimes_size) */
>                                 munmap(data, mtimes_size);
> 			printf("debug %d\n", (int)mtimes_size);
> 
> then it would be clear to the compiler that understand the
> annotation that inside that "if (data)" block, we know that
> the use of mtimes_size is not using an uninitialized variable.
> 
> For the use of the same variable on the next "debug" line, because
> it is outside of that "if (data)" block, the annotation should have
> no effect, and the compiler is free to do its own analysis and we
> will accept if it finds mtimes_size can be used uninitialized there.
> Any new use added for the same variable will not be masked by a
> meaningless initialization if we can use such a "workaround" to
> squelch false positives.

I agree that such an annotation is much more focused. It's still not
foolproof, though (e.g., we might chance earlier code so that the
data/mtimes_size correlation is no longer true).

I think you could do it with:

			if (data)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wuninitialized"
				munmap(data, mtimes_size);
#pragma GCC diagnostic pop

which is...ugly. There's a _Pragma() operator, too, which I think would
let you make a macro like:

			if (data)
				SUPPRESS("-Wuninitialized", munmap(data, mtimes_size));

which is maybe slightly less horrific? Still pretty magical though.

But if the alternative is to do none of that, and just continue to avoid
looking for warnings with -Os, I prefer that.

-Peff
Junio C Hamano June 13, 2024, 3:47 p.m. UTC | #4
Jeff King <peff@peff.net> writes:

> I think you could do it with:
>
> 			if (data)
> #pragma GCC diagnostic push
> #pragma GCC diagnostic ignored "-Wuninitialized"
> 				munmap(data, mtimes_size);
> #pragma GCC diagnostic pop
>
> which is...ugly. There's a _Pragma() operator, too, which I think would
> let you make a macro like:
>
> 			if (data)
> 				SUPPRESS("-Wuninitialized", munmap(data, mtimes_size));
>
> which is maybe slightly less horrific? Still pretty magical though.
>
> But if the alternative is to do none of that, and just continue to avoid
> looking for warnings with -Os, I prefer that.

Oh, we are in agreement.  Such effort to annotate code and tolerate
inconvenience is better spent elsewhere but expertise and competence
are not as fungible as I wish them to be ;-)

Thanks.
diff mbox series

Patch

diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index 98dda42045..40106c6c36 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -13,6 +13,15 @@  esac
 run_tests=t
 
 case "$jobname" in
+linux-gcc-default)
+	# Warnings generated by compilers are unfortunately specific to the
+	# optimization level. With `-O0`, many warnings won't be shown at all,
+	# whereas the optimizations performed by our default optimization level
+	# `-O2` will mask others. We thus use `-Og` here just so that we have
+	# at least one job with a different optimization level so that we can
+	# overall surface more warnings.
+	export CFLAGS_APPEND=-Og
+	;;
 linux-gcc)
 	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 	;;