Message ID | cover.1716465556.git.ps@pks.im (mailing list archive) |
---|---|
Headers | show |
Series | Various memory leak fixes | expand |
Patrick Steinhardt <ps@pks.im> writes: > my mind had a couple of minutes where it was roaming, and of course it > immediately searched for and chased down the next rabbit hole. The > result is this patch series which fixes a bunch of leaks all over the > place. There isn't really any structure to the leaks that I did fix -- > it's mostly things that I stumbled over. In the end, this series makes > another 56 test suites pass with leak checking enabled, 13 of which have > already been passing without any changes. ... meaning there were coverage gaps? > While most things are unstructured, there are two topics that stand out: > > - Patches 5 to 12 address a shortcoming of our config API. Both > `git_config_string()` and `git_config_pathname()` have a `const char > **` out parameter, but they do in fact transfer memory ownership to > the caller. This resulted in a bunch of memory leaks all over the > place. > > These patches thus refactor a bunch of code and then ultimately > switch the out parameter to become a `char *` I do remember getting hurt by this one relatively recently. Addressing the issue is very much appreciated. > - Patches 16 to 20 have the goal of making git-mv(1) memory leak free. > I had a very hard time understanding how it tracks memory. I think > this wasn't only me, or otherwise there wouldn't be calls to > `UNLEAK()` in there. In any case, I decided to rewrite the arrays to > use a `struct strvec`, which makes tracking and releasing of memory > a ton easier. > > It does come at the cost of more allocations because we may now > duplicate strings that we didn't before. But I think the tradeoff is > worth it because the number of strings we may now duplicate is > bounded by the number of command line arguments anyway. Nice. I have to admit that "git mv" is not one of the best-done code in this project X-<, and improving it with rewriting was long overdue. Thanks.
On Thu, May 23, 2024 at 09:45:47AM -0700, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > > > my mind had a couple of minutes where it was roaming, and of course it > > immediately searched for and chased down the next rabbit hole. The > > result is this patch series which fixes a bunch of leaks all over the > > place. There isn't really any structure to the leaks that I did fix -- > > it's mostly things that I stumbled over. In the end, this series makes > > another 56 test suites pass with leak checking enabled, 13 of which have > > already been passing without any changes. > > ... meaning there were coverage gaps? Well, we never run CI jobs with GIT_TEST_PASSING_SANITIZE_LEAK=check. We could add such a job, but the question is how important it is to notice. The much more important part is to not regress, and that we do check in our CI already. The only oddball here is newly added tests. It's a bit of a shame that the leak checking is opt-in rather than opt out, as this is the primary way for how new tests land that do pass but aren't marked accordingly. Also, it may raise more eyebrows during review if a new test suite is marked as failing compared to not being marked as passing. Ideally, we'd just get over with all the tests that currently fail with the leak sanitizer. And honestly, that doesn't feel out of reach given that you can fix >10% of all non-passing tests in a day or two. With this patch series, we now have more test suites with the leak checking enabled rather than disabled. If we continue on this track I could certainly see this happening in a release or two. But maybe that's just whishful thinking. > > - Patches 16 to 20 have the goal of making git-mv(1) memory leak free. > > I had a very hard time understanding how it tracks memory. I think > > this wasn't only me, or otherwise there wouldn't be calls to > > `UNLEAK()` in there. In any case, I decided to rewrite the arrays to > > use a `struct strvec`, which makes tracking and releasing of memory > > a ton easier. > > > > It does come at the cost of more allocations because we may now > > duplicate strings that we didn't before. But I think the tradeoff is > > worth it because the number of strings we may now duplicate is > > bounded by the number of command line arguments anyway. > > Nice. I have to admit that "git mv" is not one of the best-done > code in this project X-<, and improving it with rewriting was long > overdue. Well, the patches only rewrite a very small part of git-mv(1). So while the code may be a bit easier to understand now, I still think that it leaves a lot to be desired after the refactoring. Patrick