Message ID | 20250101191422.GC1391912@coredump.intra.peff.net (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | a less-invasive racy-leak fix, was Re: What's cooking in git.git (Dec 2024, #11; Mon, 30) | expand |
Jeff King <peff@peff.net> writes: > On Mon, Dec 30, 2024 at 09:33:20AM -0800, Junio C Hamano wrote: > >> * jk/lsan-race-with-barrier (2024-12-30) 5 commits > ... > This graduated faster than I expected. :) Heh, it is before -rc2 and the change is only about tests, so ... > ... > So that line is doing something useful. But it may not be worth the racy > pain it's causing. So some alternatives are: > > - we drop that line by default, and then when people are investigating > a specific leak, they can override LSAN_OPTIONS themselves to get > better output (though of course knowing that you can even do is > tricky) > > - we keep that line by default, but override LSAN_OPTIONS for CI to > avoid the race. That makes all local leak-checking traces > informative by default. But CI ones may be truncated. I'm not sure > if people use the CI ones directly, or investigate further > themselves. > > - we could annotate individual scripts or even tests to disable the > option (since it's really just threaded programs). This is more > hassle, but would limit the blast radius. > > I don't love any of those, but they may be less bad than all of the > barrier trickery. And it may be that this is even something we could get > fixed in LSan upstream, and it would just be a temporary workaround. I'm > still going to pursue that. > > And finally, one other option (that I'm not sure why I didn't consider > before): can we just ignore the false positives, similar to what we did > in 370ef7e40d (test-lib: ignore uninteresting LSan output, 2023-08-28). Good point. > I think we'd have to stop doing abort_on_error for the leak checker and > just rely on the logs, but that's OK (we always check the logs these > days). > ... > A little hacky, but it lets us have our cake and eat it, too. No changes > to the code, and no bad stack traces. > > What do you think? I like the small hack. "This is ultimately LSan's racy-ness and not ours, so let's avoid changing our code to work it around when we can do the workaround somewhere else" is an attitude that I would endorse fully. Thanks.
On Wed, Jan 01, 2025 at 04:25:02PM -0800, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > On Mon, Dec 30, 2024 at 09:33:20AM -0800, Junio C Hamano wrote: > > > >> * jk/lsan-race-with-barrier (2024-12-30) 5 commits > > ... > > This graduated faster than I expected. :) > > Heh, it is before -rc2 and the change is only about tests, so ... Yeah, I figured as much. I also considered it of relatively low importance during -rc, but I guess CI false positives do tend to annoy everybody and waste their time. :) It looks like you pushed out the version of 'master' with it merged. I had figured you'd revert jk/lsan-race-with-barrier out of next, so I wondered how we would proceed (revert the whole merge from master to rebuild, or do a moral revert of the final three). Looking at jk/lsan-race-ignore-false-positive, it looks like you did the moral revert via fc89d14c63 (Revert barrier-based LSan threading race workaround, 2025-01-01). That commit's tree matches what I'd expect (I guess you probably used "revert -n HEAD~3..HEAD" just like I did). It would be nice if the 3-commit revert mentioned the specific commits it was reverting. If you revert the whole merge, you get the merge commit's id and you can find the original commits by inspecting the history (but of course here we were reverting only part of it, so we couldn't do that here). If you revert the sequence without "-n" you get three individual reverts. Which is informative, but a little clunky in the history. I wonder if revert should have a "squash" mode that reverts all of the commits (perhaps in reverse order of application in case they depend on each other textually), and then gives you a commit message template similar to git-fmt-merge-msg, where we list all of the commits, one per line (though probably with their commit ids in this case). Probably not a big deal either way, and certainly not a blocker for the series. Thanks (as usual) for doing all the maintainer juggling. -Peff
On Wed, Jan 1, 2025 at 6:32 PM Jeff King <peff@peff.net> wrote: > I wonder if revert should have a "squash" mode that reverts all of the > commits (perhaps in reverse order of application in case they depend on > each other textually), and then gives you a commit message template > similar to git-fmt-merge-msg, where we list all of the commits, one per > line (though probably with their commit ids in this case). I have kind of wanted this in the past, but never enough to build it, I just did a few separate reverts. Chris
On Wed, Jan 01, 2025 at 04:25:02PM -0800, Junio C Hamano wrote: > > What do you think? > > I like the small hack. "This is ultimately LSan's racy-ness and not > ours, so let's avoid changing our code to work it around when we can > do the workaround somewhere else" is an attitude that I would endorse > fully. BTW, in case anybody wants to follow along with what happens upstream, I reported it here: https://github.com/google/sanitizers/issues/1836 -Peff
Jeff King <peff@peff.net> writes: > On Wed, Jan 01, 2025 at 04:25:02PM -0800, Junio C Hamano wrote: > >> Jeff King <peff@peff.net> writes: >> >> > On Mon, Dec 30, 2024 at 09:33:20AM -0800, Junio C Hamano wrote: >> > >> >> * jk/lsan-race-with-barrier (2024-12-30) 5 commits >> > ... >> > This graduated faster than I expected. :) >> >> Heh, it is before -rc2 and the change is only about tests, so ... > > Yeah, I figured as much. I also considered it of relatively low > importance during -rc, but I guess CI false positives do tend to annoy > everybody and waste their time. :) > > It looks like you pushed out the version of 'master' with it merged. I > had figured you'd revert jk/lsan-race-with-barrier out of next, so I > wondered how we would proceed (revert the whole merge from master to > rebuild, or do a moral revert of the final three). Revert the effect of the tip-part (except for the bottom two) and then queue the new ones, which would allow me to merge the whole thing in one go without losing the bottom two's effect (which would happen if we reverted the whole thing first, and then reused the bottom two commits to build the new iteration on top). > Looking at jk/lsan-race-ignore-false-positive, it looks like you did the > moral revert via fc89d14c63 (Revert barrier-based LSan threading race > workaround, 2025-01-01). That commit's tree matches what I'd expect (I > guess you probably used "revert -n HEAD~3..HEAD" just like I did). I actually did "read-tree -u -m" followed by "commit" ;-) > It would be nice if the 3-commit revert mentioned the specific commits > it was reverting. True. I should probably amend while I can. > I wonder if revert should have a "squash" mode that reverts all of the > commits (perhaps in reverse order of application in case they depend on > each other textually), and then gives you a commit message template > similar to git-fmt-merge-msg, where we list all of the commits, one per > line (though probably with their commit ids in this case). I am not sure if I follow. Should "revert HEAD~3..HEAD" give such concatenation of messages, something similar to what "rebase -i" gives us when seeing multiple "squash"es in a row?
On Thu, Jan 02, 2025 at 06:42:30AM -0800, Junio C Hamano wrote: > > I wonder if revert should have a "squash" mode that reverts all of the > > commits (perhaps in reverse order of application in case they depend on > > each other textually), and then gives you a commit message template > > similar to git-fmt-merge-msg, where we list all of the commits, one per > > line (though probably with their commit ids in this case). > > I am not sure if I follow. Should "revert HEAD~3..HEAD" give such > concatenation of messages, something similar to what "rebase -i" > gives us when seeing multiple "squash"es in a row? I don't think we need to concatenate all of the individual revert messages. I was thinking of producing something more like: <SUBJECT: DESCRIBE YOUR REVERT HERE> Revert the following commits: - 7a8d9efc26 (grep: work around LSan threading race with barrier, 2024-12-29) - 526c0a851b (index-pack: work around LSan threading race with barrier, 2024-12-29) - 7d0037b59a (thread-utils: introduce optional barrier type, 2024-12-29) You could perhaps even auto-populate the subject with: Revert jk/lsan-race-with-barrier~3..jk/lsan-race-with-barrier similar to how git-merge uses "Merge branch ...". But it's a little clunky to read, and unlike merge, it's a lot easier to use names that are not very meaningful (e.g., I checked out a new branch based on that one and then used HEAD~3..HEAD, which is worthless to mention). -Peff
Jeff King <peff@peff.net> writes: > <SUBJECT: DESCRIBE YOUR REVERT HERE> > > Revert the following commits: > > - 7a8d9efc26 (grep: work around LSan threading race with barrier, 2024-12-29) > - 526c0a851b (index-pack: work around LSan threading race with barrier, 2024-12-29) > - 7d0037b59a (thread-utils: introduce optional barrier type, 2024-12-29) Ah, I love it. > You could perhaps even auto-populate the subject with: > > Revert jk/lsan-race-with-barrier~3..jk/lsan-race-with-barrier > > similar to how git-merge uses "Merge branch ...". But it's a little > clunky to read, and unlike merge, it's a lot easier to use names that > are not very meaningful (e.g., I checked out a new branch based on that > one and then used HEAD~3..HEAD, which is worthless to mention). Yup, agreed. The commit title is much less interesting to automate than the "we revert these three" list. Thanks.
diff --git a/t/test-lib.sh b/t/test-lib.sh index 96f2dfb69d..2936e810b5 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -80,7 +80,6 @@ prepend_var ASAN_OPTIONS : detect_leaks=0 export ASAN_OPTIONS prepend_var LSAN_OPTIONS : $GIT_SAN_OPTIONS -prepend_var LSAN_OPTIONS : fast_unwind_on_malloc=0 export LSAN_OPTIONS prepend_var UBSAN_OPTIONS : $GIT_SAN_OPTIONS And indeed, that lets t5309 run successfully under --stress for me (with the log-dir fix from the base of my series). That line comes from in 71f26798f2 (test-lib: add "fast_unwind_on_malloc=0" to LSAN_OPTIONS, 2022-02-27), with the goal of making the backtraces a bit more readable. I'm not sure how important it is. I tried reverting a random commit from one of Patrick's recent leak-fix series: git revert 58e7568c619cce872b17987f0d14b3de3703129a make SANITIZE=leak cd t ./t1091-sparse-checkout-builtin.sh -i and the trace looks quite reasonable to me. That old commit 71f2678f2 gives an example from t0006. That script was marked leak-free by 974c919d36 (date API: add and use a date_mode_release(), 2022-02-16). If we revert its change to test-date like so: git diff-tree -Rp 974c919d36 -- t/helper/test-date.c | git apply -3 make SANITIZE=leak cd t ./t0006-date.sh -i then we do get the same crappy trace as mentioned in 71f26798f2 (if I understand correctly, this is because libc functions which allocate are not built with -fno-omit-frame-pointer, so we can't walk past them). So that line is doing something useful. But it may not be worth the racy pain it's causing. So some alternatives are: - we drop that line by default, and then when people are investigating a specific leak, they can override LSAN_OPTIONS themselves to get better output (though of course knowing that you can even do is tricky) - we keep that line by default, but override LSAN_OPTIONS for CI to avoid the race. That makes all local leak-checking traces informative by default. But CI ones may be truncated. I'm not sure if people use the CI ones directly, or investigate further themselves. - we could annotate individual scripts or even tests to disable the option (since it's really just threaded programs). This is more hassle, but would limit the blast radius. I don't love any of those, but they may be less bad than all of the barrier trickery. And it may be that this is even something we could get fixed in LSan upstream, and it would just be a temporary workaround. I'm still going to pursue that. And finally, one other option (that I'm not sure why I didn't consider before): can we just ignore the false positives, similar to what we did in 370ef7e40d (test-lib: ignore uninteresting LSan output, 2023-08-28). I think we'd have to stop doing abort_on_error for the leak checker and just rely on the logs, but that's OK (we always check the logs these days). And detecting the false positive is a little involved. But this seems to work: diff --git a/t/test-lib.sh b/t/test-lib.sh index 96f2dfb69d..ab7d2d321e 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -80,6 +80,7 @@ prepend_var ASAN_OPTIONS : detect_leaks=0 export ASAN_OPTIONS prepend_var LSAN_OPTIONS : $GIT_SAN_OPTIONS +prepend_var LSAN_OPTIONS : exitcode=0 prepend_var LSAN_OPTIONS : fast_unwind_on_malloc=0 export LSAN_OPTIONS @@ -346,7 +347,8 @@ nr_san_dir_leaks_ () { find "$TEST_RESULTS_SAN_DIR" \ -type f \ -name "$TEST_RESULTS_SAN_FILE_PFX.*" 2>/dev/null | - xargs grep -lv "Unable to get registers from thread" | + xargs grep ^DEDUP_TOKEN | + grep -v sanitizer::GetThreadStackTopAndBottom | wc -l }