mbox series

[00/20] Various memory leak fixes

Message ID cover.1716465556.git.ps@pks.im (mailing list archive)
Headers show
Series Various memory leak fixes | expand

Message

Patrick Steinhardt May 23, 2024, 12:25 p.m. UTC
Hi,

my mind had a couple of minutes where it was roaming, and of course it
immediately searched for and chased down the next rabbit hole. The
result is this patch series which fixes a bunch of leaks all over the
place. There isn't really any structure to the leaks that I did fix --
it's mostly things that I stumbled over. In the end, this series makes
another 56 test suites pass with leak checking enabled, 13 of which have
already been passing without any changes.

While most things are unstructured, there are two topics that stand out:

  - Patches 5 to 12 address a shortcoming of our config API. Both
    `git_config_string()` and `git_config_pathname()` have a `const char
    **` out parameter, but they do in fact transfer memory ownership to
    the caller. This resulted in a bunch of memory leaks all over the
    place.

    These patches thus refactor a bunch of code and then ultimately
    switch the out parameter to become a `char *`

  - Patches 16 to 20 have the goal of making git-mv(1) memory leak free.
    I had a very hard time understanding how it tracks memory. I think
    this wasn't only me, or otherwise there wouldn't be calls to
    `UNLEAK()` in there. In any case, I decided to rewrite the arrays to
    use a `struct strvec`, which makes tracking and releasing of memory
    a ton easier.

    It does come at the cost of more allocations because we may now
    duplicate strings that we didn't before. But I think the tradeoff is
    worth it because the number of strings we may now duplicate is
    bounded by the number of command line arguments anyway.

Thanks!

Patrick

Patrick Steinhardt (20):
  t: mark a bunch of tests as leak-free
  transport-helper: fix leaking helper name
  strbuf: fix leak when `appendwholeline()` fails with EOF
  checkout: clarify memory ownership in `unique_tracking_name()`
  http: refactor code to clarify memory ownership
  config: clarify memory ownership in `git_config_pathname()`
  diff: refactor code to clarify memory ownership of prefixes
  convert: refactor code to clarify ownership of
    check_roundtrip_encoding
  builtin/log: stop using globals for log config
  builtin/log: stop using globals for format config
  config: clarify memory ownership in `git_config_string()`
  config: plug various memory leaks
  builtin/credential: clear credential before exit
  commit-reach: fix memory leak in `ahead_behind()`
  submodule: fix leaking memory for submodule entries
  strvec: add functions to replace and remove strings
  builtin/mv: refactor `add_slash()` to always return allocated strings
  builtin/mv duplicate string list memory
  builtin/mv: refactor to use `struct strvec`
  builtin/mv: fix leaks for submodule gitfile paths

 Makefile                                      |   1 +
 alias.c                                       |   6 +-
 attr.c                                        |   2 +-
 attr.h                                        |   2 +-
 builtin/blame.c                               |   2 +-
 builtin/checkout.c                            |  14 +-
 builtin/commit.c                              |   4 +-
 builtin/config.c                              |   2 +-
 builtin/credential.c                          |   2 +
 builtin/log.c                                 | 708 ++++++++++--------
 builtin/merge.c                               |   4 +-
 builtin/mv.c                                  | 222 +++---
 builtin/rebase.c                              |   2 +-
 builtin/receive-pack.c                        |   6 +-
 builtin/repack.c                              |   8 +-
 builtin/worktree.c                            |  20 +-
 checkout.c                                    |   4 +-
 checkout.h                                    |   6 +-
 commit-reach.c                                |   4 +
 config.c                                      |  52 +-
 config.h                                      |  10 +-
 convert.c                                     |  30 +-
 convert.h                                     |   2 +-
 delta-islands.c                               |   2 +-
 diff.c                                        |  20 +-
 environment.c                                 |  16 +-
 environment.h                                 |  14 +-
 fetch-pack.c                                  |   4 +-
 fsck.c                                        |   4 +-
 fsmonitor-settings.c                          |   5 +-
 gpg-interface.c                               |   6 +-
 http.c                                        |  50 +-
 imap-send.c                                   |  12 +-
 mailmap.c                                     |   4 +-
 mailmap.h                                     |   4 +-
 merge-ll.c                                    |   6 +-
 pager.c                                       |   2 +-
 pretty.c                                      |  14 +-
 promisor-remote.h                             |   2 +-
 remote.c                                      |  20 +-
 remote.h                                      |   8 +-
 sequencer.c                                   |   2 +-
 setup.c                                       |   6 +-
 strbuf.c                                      |   4 +-
 strvec.c                                      |  20 +
 strvec.h                                      |  13 +
 submodule-config.c                            |   2 +
 t/t0300-credentials.sh                        |   2 +
 t/t0411-clone-from-partial.sh                 |   1 +
 t/t0610-reftable-basics.sh                    |   1 +
 t/t0611-reftable-httpd.sh                     |   1 +
 t/t1013-read-tree-submodule.sh                |   1 +
 t/t1306-xdg-files.sh                          |   1 +
 t/t1350-config-hooks-path.sh                  |   1 +
 t/t1400-update-ref.sh                         |   2 +
 t/t2013-checkout-submodule.sh                 |   1 +
 t/t2024-checkout-dwim.sh                      |   1 +
 t/t2060-switch.sh                             |   1 +
 t/t2405-worktree-submodule.sh                 |   1 +
 t/t3007-ls-files-recurse-submodules.sh        |   1 +
 t/t3203-branch-output.sh                      |   2 +
 t/t3415-rebase-autosquash.sh                  |   1 +
 t/t3426-rebase-submodule.sh                   |   1 +
 t/t3512-cherry-pick-submodule.sh              |   1 +
 t/t3513-revert-submodule.sh                   |   1 +
 t/t3600-rm.sh                                 |   1 +
 t/t3906-stash-submodule.sh                    |   1 +
 t/t4001-diff-rename.sh                        |   4 +-
 t/t4041-diff-submodule-option.sh              |   1 +
 t/t4043-diff-rename-binary.sh                 |   1 +
 t/t4059-diff-submodule-not-initialized.sh     |   1 +
 t/t4060-diff-submodule-option-diff-format.sh  |   1 +
 t/t4120-apply-popt.sh                         |   1 +
 t/t4137-apply-submodule.sh                    |   1 +
 t/t4153-am-resume-override-opts.sh            |   1 +
 t/t4210-log-i18n.sh                           |   2 +
 t/t5563-simple-http-auth.sh                   |   1 +
 t/t5564-http-proxy.sh                         |   1 +
 t/t5581-http-curl-verbose.sh                  |   1 +
 t/t6006-rev-list-format.sh                    |   1 +
 t/t6041-bisect-submodule.sh                   |   1 +
 t/t6400-merge-df.sh                           |   1 +
 t/t6412-merge-large-rename.sh                 |   1 +
 t/t6426-merge-skip-unneeded-updates.sh        |   1 +
 t/t6429-merge-sequence-rename-caching.sh      |   1 +
 t/t6438-submodule-directory-file-conflicts.sh |   1 +
 t/t7001-mv.sh                                 |   2 +
 t/t7005-editor.sh                             |   1 +
 t/t7006-pager.sh                              |   1 +
 t/t7102-reset.sh                              |   1 +
 t/t7112-reset-submodule.sh                    |   1 +
 t/t7417-submodule-path-url.sh                 |   1 +
 t/t7421-submodule-summary-add.sh              |   1 +
 t/t7423-submodule-symlinks.sh                 |   1 +
 t/t9129-git-svn-i18n-commitencoding.sh        |   1 -
 t/t9139-git-svn-non-utf8-commitencoding.sh    |   1 -
 t/t9200-git-cvsexportcommit.sh                |   1 +
 t/t9401-git-cvsserver-crlf.sh                 |   1 +
 t/t9600-cvsimport.sh                          |   1 +
 t/t9601-cvsimport-vendor-branch.sh            |   1 +
 t/t9602-cvsimport-branches-tags.sh            |   1 +
 t/t9603-cvsimport-patchsets.sh                |   2 +
 t/t9604-cvsimport-timestamps.sh               |   2 +
 t/unit-tests/t-strvec.c                       | 259 +++++++
 t/unit-tests/test-lib.c                       |  13 +
 t/unit-tests/test-lib.h                       |  13 +
 transport-helper.c                            |   6 +-
 transport.c                                   |   1 +
 upload-pack.c                                 |   2 +-
 userdiff.h                                    |  12 +-
 110 files changed, 1141 insertions(+), 584 deletions(-)
 create mode 100644 t/unit-tests/t-strvec.c

Comments

Junio C Hamano May 23, 2024, 4:45 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

> my mind had a couple of minutes where it was roaming, and of course it
> immediately searched for and chased down the next rabbit hole. The
> result is this patch series which fixes a bunch of leaks all over the
> place. There isn't really any structure to the leaks that I did fix --
> it's mostly things that I stumbled over. In the end, this series makes
> another 56 test suites pass with leak checking enabled, 13 of which have
> already been passing without any changes.

... meaning there were coverage gaps?

> While most things are unstructured, there are two topics that stand out:
>
>   - Patches 5 to 12 address a shortcoming of our config API. Both
>     `git_config_string()` and `git_config_pathname()` have a `const char
>     **` out parameter, but they do in fact transfer memory ownership to
>     the caller. This resulted in a bunch of memory leaks all over the
>     place.
>
>     These patches thus refactor a bunch of code and then ultimately
>     switch the out parameter to become a `char *`

I do remember getting hurt by this one relatively recently.
Addressing the issue is very much appreciated.

>   - Patches 16 to 20 have the goal of making git-mv(1) memory leak free.
>     I had a very hard time understanding how it tracks memory. I think
>     this wasn't only me, or otherwise there wouldn't be calls to
>     `UNLEAK()` in there. In any case, I decided to rewrite the arrays to
>     use a `struct strvec`, which makes tracking and releasing of memory
>     a ton easier.
>
>     It does come at the cost of more allocations because we may now
>     duplicate strings that we didn't before. But I think the tradeoff is
>     worth it because the number of strings we may now duplicate is
>     bounded by the number of command line arguments anyway.

Nice.  I have to admit that "git mv" is not one of the best-done
code in this project X-<, and improving it with rewriting was long
overdue.

Thanks.
Patrick Steinhardt May 24, 2024, 6:56 a.m. UTC | #2
On Thu, May 23, 2024 at 09:45:47AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > my mind had a couple of minutes where it was roaming, and of course it
> > immediately searched for and chased down the next rabbit hole. The
> > result is this patch series which fixes a bunch of leaks all over the
> > place. There isn't really any structure to the leaks that I did fix --
> > it's mostly things that I stumbled over. In the end, this series makes
> > another 56 test suites pass with leak checking enabled, 13 of which have
> > already been passing without any changes.
> 
> ... meaning there were coverage gaps?

Well, we never run CI jobs with GIT_TEST_PASSING_SANITIZE_LEAK=check. We
could add such a job, but the question is how important it is to notice.
The much more important part is to not regress, and that we do check in
our CI already.

The only oddball here is newly added tests. It's a bit of a shame that
the leak checking is opt-in rather than opt out, as this is the primary
way for how new tests land that do pass but aren't marked accordingly.
Also, it may raise more eyebrows during review if a new test suite is
marked as failing compared to not being marked as passing.

Ideally, we'd just get over with all the tests that currently fail with
the leak sanitizer. And honestly, that doesn't feel out of reach given
that you can fix >10% of all non-passing tests in a day or two. With
this patch series, we now have more test suites with the leak checking
enabled rather than disabled. If we continue on this track I could
certainly see this happening in a release or two.

But maybe that's just whishful thinking.

> >   - Patches 16 to 20 have the goal of making git-mv(1) memory leak free.
> >     I had a very hard time understanding how it tracks memory. I think
> >     this wasn't only me, or otherwise there wouldn't be calls to
> >     `UNLEAK()` in there. In any case, I decided to rewrite the arrays to
> >     use a `struct strvec`, which makes tracking and releasing of memory
> >     a ton easier.
> >
> >     It does come at the cost of more allocations because we may now
> >     duplicate strings that we didn't before. But I think the tradeoff is
> >     worth it because the number of strings we may now duplicate is
> >     bounded by the number of command line arguments anyway.
> 
> Nice.  I have to admit that "git mv" is not one of the best-done
> code in this project X-<, and improving it with rewriting was long
> overdue.

Well, the patches only rewrite a very small part of git-mv(1). So while
the code may be a bit easier to understand now, I still think that it
leaves a lot to be desired after the refactoring.

Patrick