
[0/10] some commit-graph leak fixes

Message ID 20231003202504.GA7697@coredump.intra.peff.net (mailing list archive)

Message

Jeff King Oct. 3, 2023, 8:25 p.m. UTC
I noticed while working on the jk/commit-graph-verify-fix topic that
free_commit_graph() leaks any slices of a commit-graph-chain except for
the first. I naively hoped that fixing that would make t5324 leak-free,
but it turns out there were a number of other leaks, so I fixed those,
too. A couple of them were in the merge code, which in turn means a
bunch of new test scripts are now leak-free.

Even though I saw the problem on that other topic, there's no dependency
here; this series can be applied directly to master (or possibly even
maint, though I didn't try).

  [01/10]: t6700: mark test as leak-free
  [02/10]: commit-reach: free temporary list in get_octopus_merge_bases()
  [03/10]: merge: free result of repo_get_merge_bases()
  [04/10]: commit-graph: move slab-clearing to close_commit_graph()
  [05/10]: commit-graph: free all elements of graph chain
  [06/10]: commit-graph: delay base_graph assignment in add_graph_to_chain()
  [07/10]: commit-graph: free graph struct that was not added to chain
  [08/10]: commit-graph: free write-context entries before overwriting
  [09/10]: commit-graph: free write-context base_graph_name during cleanup
  [10/10]: commit-graph: clear oidset after finishing write

 builtin/commit-graph.c             |  1 +
 builtin/merge.c                    |  5 +++-
 commit-graph.c                     | 40 ++++++++++++++----------------
 commit-reach.c                     |  1 +
 t/t4214-log-graph-octopus.sh       |  1 +
 t/t4215-log-skewed-merges.sh       |  1 +
 t/t5324-split-commit-graph.sh      |  2 ++
 t/t5328-commit-graph-64bit-time.sh |  2 ++
 t/t5521-pull-options.sh            |  1 +
 t/t6009-rev-list-parent.sh         |  1 +
 t/t6416-recursive-corner-cases.sh  |  1 +
 t/t6433-merge-toplevel.sh          |  1 +
 t/t6437-submodule-merge.sh         |  1 +
 t/t6700-tree-depth.sh              |  2 ++
 t/t7602-merge-octopus-many.sh      |  1 +
 t/t7603-merge-reduce-heads.sh      |  1 +
 t/t7607-merge-state.sh             |  1 +
 t/t7608-merge-messages.sh          |  1 +
 18 files changed, 42 insertions(+), 22 deletions(-)

-Peff

Comments

Eric W. Biederman Oct. 4, 2023, 1:33 a.m. UTC | #1
Jeff,

Please accept my apologies for slightly hijacking your posting, but I
see you have been fixing some leaks, and so presumably you are familiar
with building git with "SANITIZE=leak".

I have fixed some leaks in my SHA1+SHA256 patchset recently and while
tracking them down I found that simply enabling SANITIZE=leak caused
"make test" on git v2.42 without patches to give different failures
from test run to test run.

Well actually I wound up with the following command line:
GIT_TEST_PASSING_SANITIZE_LEAK=true GIT_TEST_SANITIZE_LEAK_LOG=true SANITIZE=leak DEVELOPER=1 make test

I had removed "-j32" to make things more reproducible.

I observed this unreliability with SANITIZE=leak when building git on
a fully updated version of Debian 12.

My big question is:

    Do other people see random test failures when SANITIZE=leak is enabled?

Is it just me?

Thanks,
Eric
Jeff King Oct. 4, 2023, 1:21 p.m. UTC | #2
On Tue, Oct 03, 2023 at 08:33:26PM -0500, Eric W. Biederman wrote:

> My big question is:
> 
>     Do other people see random test failures when SANITIZE=leak is enabled?
> 
> Is it just me?

Yes, I've seen this. You mentioned that you were testing with v2.42,
which lacks 370ef7e40d (test-lib: ignore uninteresting LSan output,
2023-08-28). Try using the current version of 'master', or just
cherry-picking that commit onto v2.42.

A few other tips to avoid confusing results (though they at least do not
vary from run to run):

  - use the LEAK_LOG option, since you otherwise miss some cases (it
    looks like you already are from what you posted above)

  - gcc and clang sometimes produce different results. Right now I get
    no leak from gcc on t9004, but clang reports one (I think clang is
    right here)

  - turn off compiler optimizations; we've had cases where code
    reordering/removal creates false positives. Oh, hmm, I forgot we do
    this by default since d3775de074 (Makefile: force -O0 when compiling
    with SANITIZE=leak, 2022-10-18), so your v2.42 should be covered.
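
Putting those tips together, a typical leak-hunting session (in a git checkout; the test script and clang choice are just examples, and the test-results path comes from how test-lib.sh sets LSAN_OPTIONS) might look something like:

```shell
# Build with clang and LeakSanitizer; the Makefile already forces -O0
# for SANITIZE=leak builds since d3775de074.
make CC=clang SANITIZE=leak DEVELOPER=1

# Run one suspect script with LSan logs preserved on disk, so leaks in
# sub-processes that the exit code alone would miss are still recorded.
cd t
GIT_TEST_SANITIZE_LEAK_LOG=true ./t5324-split-commit-graph.sh

# Inspect whatever the sanitizer logged for that script.
cat test-results/t5324-split-commit-graph.leak/trace.* 2>/dev/null
```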

-Peff
Eric W. Biederman Oct. 4, 2023, 2:19 p.m. UTC | #3
Jeff King <peff@peff.net> writes:

> On Tue, Oct 03, 2023 at 08:33:26PM -0500, Eric W. Biederman wrote:
>
>> My big question is:
>> 
>>     Do other people see random test failures when SANITIZE=leak is enabled?
>> 
>> Is it just me?
>
> Yes, I've seen this. You mentioned that you were testing with v2.42,
> which lacks 370ef7e40d (test-lib: ignore uninteresting LSan output,
> 2023-08-28). Try using the current version of 'master', or just
> cherry-picking that commit onto v2.42.
>
> A few other tips to avoid confusing results (though they at least do not
> vary from run to run):
>
>   - use the LEAK_LOG option, since you otherwise miss some cases (it
>     looks like you already are from what you posted above)
>
>   - gcc and clang sometimes produce different results. Right now I get
>     no leak from gcc on t9004, but clang reports one (I think clang is
>     right here)
>
>   - turn off compiler optimizations; we've had cases where code
>     reordering/removal creates false positives. Oh, hmm, I forgot we do
>     this by default since d3775de074 (Makefile: force -O0 when compiling
>     with SANITIZE=leak, 2022-10-18), so your v2.42 should be covered.

I just tried master, aka commit d0e8084c65cb ("The fourteenth batch").

What I see on a random failure looks like:

> make -C t/ all
> make[1]: Entering directory '/home/user/projects/git/git/t'
> rm -f -r 'test-results'
> GIT_TEST_EXT_CHAIN_LINT=0 && export GIT_TEST_EXT_CHAIN_LINT && make aggregate-results-and-cleanup
> make[2]: Entering directory '/home/user/projects/git/git/t'
> *** t0000-basic.sh ***
> Segmentation fault
> error: test_bool_env requires bool values both for $GIT_TEST_PASSING_SANITIZE_LEAK and for the default fallback

Which doesn't sound like anything you have described so I am guessing it
is something with my environment I need to track down.

Eric
Jeff King Oct. 4, 2023, 2:47 p.m. UTC | #4
On Wed, Oct 04, 2023 at 09:19:40AM -0500, Eric W. Biederman wrote:

> What I see on a random failure looks like:
> 
> > make -C t/ all
> > make[1]: Entering directory '/home/user/projects/git/git/t'
> > rm -f -r 'test-results'
> > GIT_TEST_EXT_CHAIN_LINT=0 && export GIT_TEST_EXT_CHAIN_LINT && make aggregate-results-and-cleanup
> > make[2]: Entering directory '/home/user/projects/git/git/t'
> > *** t0000-basic.sh ***
> > Segmentation fault
> > error: test_bool_env requires bool values both for $GIT_TEST_PASSING_SANITIZE_LEAK and for the default fallback
> 
> Which doesn't sound like anything you have described so I am guessing it
> is something with my environment I need to track down.

No, that seems different entirely. You'll have to figure out which
program is segfaulting and why (if you can see it in a script besides
t0000 you're probably better off, as that one is a maze of
tests-within-tests, since it is testing the test-harness itself).

Although the "error" you see may imply that it is failing early on
in test-lib.sh, when we are calling "test-tool env-helper". If that is
segfaulting there is probably something very wrong with your build.
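
To narrow that down, one possible approach (the env-helper arguments here mirror what test_bool_env in test-lib-functions.sh passes, but treat the exact invocation as illustrative):

```shell
# Run the script verbosely with shell tracing, so the last command
# before "Segmentation fault" is visible in the output.
cd t
./t0000-basic.sh -v -x 2>&1 | tail -20

# If test-tool itself is crashing during test-lib.sh startup, try
# reproducing it directly:
./helper/test-tool env-helper --type=bool --default=false \
	--exit-code GIT_TEST_PASSING_SANITIZE_LEAK
```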

-Peff
Eric W. Biederman Oct. 4, 2023, 3:38 p.m. UTC | #5
Jeff King <peff@peff.net> writes:

> On Wed, Oct 04, 2023 at 09:19:40AM -0500, Eric W. Biederman wrote:
>
>> What I see on a random failure looks like:
>> 
>> > make -C t/ all
>> > make[1]: Entering directory '/home/user/projects/git/git/t'
>> > rm -f -r 'test-results'
>> > GIT_TEST_EXT_CHAIN_LINT=0 && export GIT_TEST_EXT_CHAIN_LINT && make aggregate-results-and-cleanup
>> > make[2]: Entering directory '/home/user/projects/git/git/t'
>> > *** t0000-basic.sh ***
>> > Segmentation fault
>> > error: test_bool_env requires bool values both for $GIT_TEST_PASSING_SANITIZE_LEAK and for the default fallback
>> 
>> Which doesn't sound like anything you have described so I am guessing it
>> is something with my environment I need to track down.
>
> No, that seems different entirely. You'll have to figure out which
> program is segfaulting and why (if you can see it in a script besides
> t0000 you're probably better off, as that one is a maze of
> tests-within-tests, since it is testing the test-harness itself).
>
> Although the "error" you see may imply that it is failing early on
> in test-lib.sh, when we are calling "test-tool env-helper". If that is
> segfaulting there is probably something very wrong with your build.

Just to document what I am seeing it appears to be some odd interaction
with address space randomization.

If I run my make as: "setarch --addr-no-randomize make test"

I don't see coredumps any more.

Now to dig a little deeper and see if I can figure out what about address space
randomization is making things break.
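
For reference, the workaround in shell form (Linux-specific; the sysctl
check is just a quick way to confirm ASLR is on):

```shell
# 2 means full address space randomization on Linux.
cat /proc/sys/kernel/randomize_va_space

# Re-run the suite with randomization disabled for this process tree only.
setarch "$(uname -m)" --addr-no-randomize make test
```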


Eric
Taylor Blau Oct. 5, 2023, 5:52 p.m. UTC | #6
On Tue, Oct 03, 2023 at 04:25:04PM -0400, Jeff King wrote:
> I noticed while working on the jk/commit-graph-verify-fix topic that
> free_commit_graph() leaks any slices of a commit-graph-chain except for
> the first. I naively hoped that fixing that would make t5324 leak-free,
> but it turns out there were a number of other leaks, so I fixed those,
> too. A couple of them were in the merge code, which in turn means a
> bunch of new test scripts are now leak-free.
>
> Even though I saw the problem on that other topic, there's no dependency
> here; this series can be applied directly to master (or possibly even
> maint, though I didn't try).

Thanks for carefully finding and explaining these various leaks. The
series is a definite improvement, and after reviewing closely I couldn't
find anything worth changing. LGTM!

Thanks,
Taylor
Junio C Hamano Oct. 6, 2023, 12:39 a.m. UTC | #7
Taylor Blau <me@ttaylorr.com> writes:

> On Tue, Oct 03, 2023 at 04:25:04PM -0400, Jeff King wrote:
>> I noticed while working on the jk/commit-graph-verify-fix topic that
>> free_commit_graph() leaks any slices of a commit-graph-chain except for
>> the first. I naively hoped that fixing that would make t5324 leak-free,
>> but it turns out there were a number of other leaks, so I fixed those,
>> too. A couple of them were in the merge code, which in turn means a
>> bunch of new test scripts are now leak-free.
>>
>> Even though I saw the problem on that other topic, there's no dependency
>> here; this series can be applied directly to master (or possibly even
>> maint, though I didn't try).
>
> Thanks for carefully finding and explaining these various leaks. The
> series is a definite improvement, and after reviewing closely I couldn't
> find anything worth changing. LGTM!

Thanks, both.  Let's merge it down to 'next'.