Message ID | 03270d3414117ae7229d87127cff81e349557039.1718001244.git.ps@pks.im (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | ci: detect more warnings via `-Og` | expand |
Patrick Steinhardt <ps@pks.im> writes: > While we could adapt all jobs to compile with `-Og` now, that would > potentially mask other warnings that only get diagnosed with `-O2`. I would say it is not just "potentially". It is likely that higher optimization levels will diagnose breakage more correctly, while lower optimization levels may trigger warnings more often. I suspect a large part of those that lower optimization levels alone find may be due to false positives. So, ... I had a strong knee-jerk reaction against "While we could adapt all" with "why would anybody sane would even want to do such a thing in the first place?" If it were phrased like "as we have plenty of jobs compiling with -O2, we can afford to switch a few of them to different optimization levels, and we may find bugs that the exiting level did not notice, as the compilers' understanding of the data and control flow is affected by these -O levels", I would have understood it, and I think that is what happens in this patch. I just tried compiling pack-mtimes.c (untouced between maint and seen) with -Og using gcc-11, gcc-12, and gcc-13, using the exact copy of the command line with V=1 output I see at the GitHub CI [*1*] (after all, this was a good use case, even though it was dropped from this round ;-) The earlier versions of the compiler mark the use of mtimes_size that is guarded in "if (data)" as maybe-uninitialized: pack-mtimes.c: In function ‘load_pack_mtimes_file’: pack-mtimes.c:89:25: error: ‘mtimes_size’ may be used uninitialized [-Werror=maybe-uninitialized] 89 | munmap(data, mtimes_size); | ^~~~~~~~~~~~~~~~~~~~~~~~~ pack-mtimes.c:32:16: note: ‘mtimes_size’ was declared here 32 | size_t mtimes_size, expected_size; | ^~~~~~~~~~~ cc1: all warnings being treated as errors make: *** [Makefile:2750: pack-mtimes.o] Error 1 But as I already reported elsewhere, this is a false positive. The place "data" can possibly become non-NULL happens only after the control passes the place mtimes_size get assigned a meaningful value. If the control reached here and data were non-NULL, the control flow guarantees that mtimes_size has been assigned xsize_t(st.st_size) and cannot be uninitialized. We can squelch the warning by initializing the variable to a meaningless value, like 0, but I consider that such a change makes the resulting code a lot worse than the original, and that is not because I do not want an extra and meaningless assignment emitted [*2*] in the resulting binary whose performance impact cannot be measured by anybody. It is bad because it defeats efforts by compilers that understand the data flow a bit better to diagnose a true "used uninitialized" bugs we may introduce in the future. Adding such a workaround goes against the longer-term health of the codebase. For example, I could add another use of mtimes_size that is not guarded by "if (data)" right next to it, i.e. @@ -87,6 +87,7 @@ static int load_pack_mtimes_file(char *mtimes_file, if (ret) { if (data) munmap(data, mtimes_size); + printf("debug %d\n", (int)mtimes_size); } else { *len_p = mtimes_size; *data_p = data; gcc-13 does notice that we are now truly using the variable uninitialized and flags it for us (gcc-12 also does notice but the important difference is that gcc-13 refrained from warning at the safe use of the variable that is guarded by "if (data)"). Once we initialize the variable to meaninless 0 as a workaround, however, the compiler cannot help us with the "used uninitialized" warning, and there is no "the program uses the variable what was initialized with meaningless value before it get set a meaningful value" warning, unfortunately X-<. And that, not an extra assignment, is why such a workaround goes against the longer-term health of the codebase. By the way, it is not just pack-mtimes.c [*3*]; there are quite a few obvious false positives, e.g. builtin/branch.c: In function ‘print_current_branch_name’: builtin/branch.c:509:17: error: ‘shortname’ may be used uninitialized [-Werror=maybe-uninitialized] 509 | puts(shortname); | ^~~~~~~~~~~~~~~ where after skip_prefix() computed shortname like so: else if (skip_prefix(refname, "refs/heads/", &shortname)) puts(shortname); and the compiler at that low optimization level does not even notice that the static inline function skip_prefix() never returns true without updating its third parameter to a valid pointer X-<. And adding a workaround for each and every uses of variable that are set by skip_prefix() like so? I am not sure if that is a good move If you saw some diagnosis that "-Og" made but not with higher optimization levels (with better understanding of the data and control flow on the compiler's part), I am somewhat curious. With gcc-13 everything seems to pass locally for me, so I personally do not mind this patch, but from what I've seen "gcc-12 -Wall -Og" so far, especially given that GitHub CI (and GitLab CI as well? otherwise you wouldn't be sending this change, I would imagine) seems to use a version of compiler whose "-Og -Wall" notices and flags these "bugs", I am not sure if we want to enable this option for us. At least, -Werror and more stupid compilers is not a mix I would want to live with and I am sure I would not want to add workaround for such a mix to make our code worse (to end users and builders who want to use older comiplers with different optimization levels, we can tell them not to enable -Werror in their build). Also, with "-Os", we seem to find different set of "bugs" that "-Og" did not see. For example, "gcc-13 -Os -Wall" flags that ppid may be uninitialized in compat/linux/procinfo.c:push_ancestry_name(), but that is because it does not read stat_parent_pid() [*4*] and realize that the only time ppid is left unset is when the function returns -1: static void push_ancestry_name(struct strvec *names, pid_t pid) { int ppid; if (stat_parent_pid(pid, &name, &ppid) < 0) goto cleanup; ... if (ppid) push_ancestry_name(names, ppid); cleanup: strbuf_release(&name); ... It is the same pattern as how "gcc-12 -Og" mishandles skip_prefix(). Another one "-Os" found is in builtin/fast-import.c:file_change_m() where the compiler does not notice that parse_mode() never returns non-NULL without setting "mode". It is exactly the same pattern. [Footnotes] *1* https://github.com/git/git/actions/runs/9432273715/job/25981831882#step:4:132 *2* We used to do the (IIRC GCC only) trick to initialize the variable to itself, i.e. @@ -29,7 +29,7 @@ static int load_pack_mtimes_file(char *mtimes_file, int fd, ret = 0; struct stat st; uint32_t *data = NULL; - size_t mtimes_size, expected_size; + size_t mtimes_size = mtimes_size, expected_size; struct mtimes_header header; fd = git_open(mtimes_file); to work around stupid compilers that flag a variable whose use is safe as potentially-used-uninitialized but these days we got rid of the trick as some other compiler did not like it (for legitimate reasons). Using the trick is not any better than initializing the variable to a meaningless value, like 0, as both will prevent smarter compilers from noticing real bugs (after all, that is the point of the workaround--not to flag use of variable the compiler thinks uninitialized as a bug). *3* I was planning to send "these are horrible workarounds to work around the recent -Og build failures that has a huge negative consequences in the longer term" patch on top of your series. But I do not think it is a good idea to add such a huge number of workarounds for various optimization levels. If we could add -Og with -Werror disabled, those who may be interested in sifting through false positives to find real errors may be able to do so without breaking build for others. *4* As "-Os" is for smaller code, I would imagine that it didn't try to auto inline the function. Come to think of it, "-Og" is to forbid optimizations that interfere with debugging, and code inlining may be considered as one, which would certainly explain why it has problem with skip_prefix().
Junio C Hamano <gitster@pobox.com> writes: > We can squelch the warning by initializing the variable to a > meaningless value, like 0, but I consider that such a change makes > the resulting code a lot worse than the original, and that is not > because I do not want an extra and meaningless assignment emitted > [*2*] in the resulting binary whose performance impact cannot be > measured by anybody. > > It is bad because it defeats efforts by compilers that understand > the data flow a bit better to diagnose a true "used uninitialized" > bugs we may introduce in the future. Adding such a workaround goes > against the longer-term health of the codebase. > > For example, I could add another use of mtimes_size that is not > guarded by "if (data)" right next to it, i.e. > > @@ -87,6 +87,7 @@ static int load_pack_mtimes_file(char *mtimes_file, > if (ret) { > if (data) > munmap(data, mtimes_size); > + printf("debug %d\n", (int)mtimes_size); > } else { > *len_p = mtimes_size; > *data_p = data; > > gcc-13 does notice that we are now truly using the variable > uninitialized and flags it for us (gcc-12 also does notice but the > important difference is that gcc-13 refrained from warning at the > safe use of the variable that is guarded by "if (data)"). By the way, I do not know if any compiler gives us such a feature, but if the trick to squelch a false positive were finer grained, I would have been much more receptive to the idea of building with different optimization level, allowing a bit more false positives. The workaround everybody jumps at is to initialize the variable to a meaningless value (like 0) and I have explained why it is suboptimal already. But if we can tell less intelligent compilers "we know our use of this variable AT THIS POINT is safe", e.g. by annotating the above snippet of the code, perhaps like this: if (ret) { if (data) /* -Wno-uninitialized (mtimes_size) */ munmap(data, mtimes_size); printf("debug %d\n", (int)mtimes_size); then it would be clear to the compiler that understand the annotation that inside that "if (data)" block, we know that the use of mtimes_size is not using an uninitialized variable. For the use of the same variable on the next "debug" line, because it is outside of that "if (data)" block, the annotation should have no effect, and the compiler is free to do its own analysis and we will accept if it finds mtimes_size can be used uninitialized there. Any new use added for the same variable will not be masked by a meaningless initialization if we can use such a "workaround" to squelch false positives.
On Wed, Jun 12, 2024 at 03:11:30PM -0700, Junio C Hamano wrote: > By the way, I do not know if any compiler gives us such a feature, > but if the trick to squelch a false positive were finer grained, I > would have been much more receptive to the idea of building with > different optimization level, allowing a bit more false positives. > > The workaround everybody jumps at is to initialize the variable to a > meaningless value (like 0) and I have explained why it is suboptimal > already. But if we can tell less intelligent compilers "we know our > use of this variable AT THIS POINT is safe", e.g. by annotating the > above snippet of the code, perhaps like this: > > if (ret) { > if (data) > /* -Wno-uninitialized (mtimes_size) */ > munmap(data, mtimes_size); > printf("debug %d\n", (int)mtimes_size); > > then it would be clear to the compiler that understand the > annotation that inside that "if (data)" block, we know that > the use of mtimes_size is not using an uninitialized variable. > > For the use of the same variable on the next "debug" line, because > it is outside of that "if (data)" block, the annotation should have > no effect, and the compiler is free to do its own analysis and we > will accept if it finds mtimes_size can be used uninitialized there. > Any new use added for the same variable will not be masked by a > meaningless initialization if we can use such a "workaround" to > squelch false positives. I agree that such an annotation is much more focused. It's still not foolproof, though (e.g., we might chance earlier code so that the data/mtimes_size correlation is no longer true). I think you could do it with: if (data) #pragma GCC diagnostic push #pragma GCC diagnostic ignored "-Wuninitialized" munmap(data, mtimes_size); #pragma GCC diagnostic pop which is...ugly. There's a _Pragma() operator, too, which I think would let you make a macro like: if (data) SUPPRESS("-Wuninitialized", munmap(data, mtimes_size)); which is maybe slightly less horrific? Still pretty magical though. But if the alternative is to do none of that, and just continue to avoid looking for warnings with -Os, I prefer that. -Peff
Jeff King <peff@peff.net> writes: > I think you could do it with: > > if (data) > #pragma GCC diagnostic push > #pragma GCC diagnostic ignored "-Wuninitialized" > munmap(data, mtimes_size); > #pragma GCC diagnostic pop > > which is...ugly. There's a _Pragma() operator, too, which I think would > let you make a macro like: > > if (data) > SUPPRESS("-Wuninitialized", munmap(data, mtimes_size)); > > which is maybe slightly less horrific? Still pretty magical though. > > But if the alternative is to do none of that, and just continue to avoid > looking for warnings with -Os, I prefer that. Oh, we are in agreement. Such effort to annotate code and tolerate inconvenience is better spent elsewhere but expertise and competence are not as fungible as I wish them to be ;-) Thanks.
diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh index 98dda42045..40106c6c36 100755 --- a/ci/run-build-and-tests.sh +++ b/ci/run-build-and-tests.sh @@ -13,6 +13,15 @@ esac run_tests=t case "$jobname" in +linux-gcc-default) + # Warnings generated by compilers are unfortunately specific to the + # optimization level. With `-O0`, many warnings won't be shown at all, + # whereas the optimizations performed by our default optimization level + # `-O2` will mask others. We thus use `-Og` here just so that we have + # at least one job with a different optimization level so that we can + # overall surface more warnings. + export CFLAGS_APPEND=-Og + ;; linux-gcc) export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main ;;
We have recently noticed that our CI does not always notice variables that may be used uninitialized. While it is expected that compiler warnings aren't perfect, this one was a bit puzzling because it was rather obvious that the variable can be uninitialized. Many compiler warnings unfortunately depend on the optimization level used by the compiler. While `-O0` for example will disable a lot of warnings altogether because optimization passes go away, `-O2`, which is our default optimization level used in CI, may optimize specific code away or even double down on undefined behaviour. Interestingly, this specific instance that triggered the investigation does get noted by GCC when using `-Og`. While we could adapt all jobs to compile with `-Og` now, that would potentially mask other warnings that only get diagnosed with `-O2`. Instead, adapt the "linux-gcc-default" job to compile with `-Og`. This job is chosen because it uses the "ubuntu:latest" image and should thus have a comparatively recent compiler toolchain, and because we have other jobs that use "ubuntu:latest" so that we do not lose coverage for warnings diagnosed only on `-O2` level. To make it easier to set up the optimization level in our CI, add support in our Makefile to specify the level via an environment variable. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- ci/run-build-and-tests.sh | 9 +++++++++ 1 file changed, 9 insertions(+)