Message ID | 6906c25415eddf79cc3f71f905a77b140f2f66f0.1538492321.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | Clean up leaks in commit-graph.c |

On Tue, 2 Oct 2018 at 17:01, Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> diff --git a/commit-graph.c b/commit-graph.c
> index 2a24eb8b5a..7226bd6b58 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -698,6 +698,8 @@ void write_commit_graph_reachable(const char *obj_dir, int append,
>  	string_list_init(&list, 1);
>  	for_each_ref(add_ref_to_list, &list);
>  	write_commit_graph(obj_dir, NULL, &list, append, report_progress);
> +
> +	string_list_clear(&list, 0);
>  }

Nit: The blank line adds some asymmetry, IMVHO.

>  void write_commit_graph(const char *obj_dir,
> @@ -846,9 +848,11 @@ void write_commit_graph(const char *obj_dir,
>  	compute_generation_numbers(&commits, report_progress);
>
>  	graph_name = get_commit_graph_filename(obj_dir);
> -	if (safe_create_leading_directories(graph_name))
> +	if (safe_create_leading_directories(graph_name)) {
> +		UNLEAK(graph_name);
>  		die_errno(_("unable to create leading directories of %s"),
>  			  graph_name);
> +	}

Do you really need this hunk? In my testing with LeakSanitizer and
valgrind, I don't need this hunk to be leak-free. Generally speaking, it
seems impossible to UNLEAK when dying, since we don't know what we have
allocated higher up in the call-stack.

Without this hunk, this patch can have my

Reviewed-by: Martin Ågren <martin.agren@gmail.com>

as I've verified the leaks before and after. With this hunk, I am
puzzled and feel uneasy, both about having to UNLEAK before dying and
about having to UNLEAK outside of builtin/.

> +	free(graph_name);
> +	free(commits.list);
>  	free(oids.list);
>  	oids.alloc = 0;
>  	oids.nr = 0;

Both `commits` and `oids` are on the stack here, so cleaning up one more
than the other is a bit asymmetrical. Also, if we try to zero the counts
-- which seems unnecessary to me, but which is not new with this patch --
we should perhaps use `FREE_AND_NULL` too. But personally, I would just
use `free` and leave `nr` and `alloc` at whatever values they happen to
have.

Martin

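For readers unfamiliar with the helper Martin mentions, here is a minimal, self-contained sketch of the two cleanup styles he contrasts. The macro has the same shape as Git's `FREE_AND_NULL`; `struct demo_array`, `release_plain` and `release_and_reset` are hypothetical names made up for illustration and are not code from commit-graph.c.

```c
#include <stdlib.h>

/* Same shape as Git's FREE_AND_NULL: free the buffer, then clear the pointer. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Hypothetical stand-in for a growable array such as `oids` or `commits`. */
struct demo_array {
	int *list;
	size_t nr, alloc;
};

/*
 * Style 1: plain free(). The struct lives on the caller's stack and is
 * about to go out of scope, so stale `nr`/`alloc` values do no harm.
 */
void release_plain(struct demo_array *a)
{
	free(a->list);
}

/*
 * Style 2: leave the struct in a consistent, reusable empty state.
 * If the counts are reset, the pointer should be cleared as well.
 */
void release_and_reset(struct demo_array *a)
{
	FREE_AND_NULL(a->list);
	a->nr = 0;
	a->alloc = 0;
}
```

Either style fixes the leak; the second only pays off if the struct might be reused after cleanup, which is not the case for a local that is about to go out of scope.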

On Tue, Oct 2, 2018 at 8:40 AM Martin Ågren <martin.agren@gmail.com> wrote:
>
> On Tue, 2 Oct 2018 at 17:01, Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > diff --git a/commit-graph.c b/commit-graph.c
> > index 2a24eb8b5a..7226bd6b58 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -698,6 +698,8 @@ void write_commit_graph_reachable(const char *obj_dir, int append,
> >  	string_list_init(&list, 1);
> >  	for_each_ref(add_ref_to_list, &list);
> >  	write_commit_graph(obj_dir, NULL, &list, append, report_progress);
> > +
> > +	string_list_clear(&list, 0);
> >  }
>
> Nit: The blank line adds some asymmetry, IMVHO.

I think these blank lines are super common, as in:

    {
      declarations;

      multiple;
      lines(of);
      code;

      cleanup;
      and_frees;
    }

(c.f. display_table in column.c, which I admit to have
cherry-picked as an example).

While in nit territory, I would rather move the string list init
into the first block:

    {
      struct string_list list = STRING_LIST_INIT_DUP;

      for_each_ref(add_ref_to_list, &list);
      write_commit_graph(obj_dir, NULL, &list, append);

      string_list_clear(&list, 0);
    }

>
> >  void write_commit_graph(const char *obj_dir,
> > @@ -846,9 +848,11 @@ void write_commit_graph(const char *obj_dir,
> >  	compute_generation_numbers(&commits, report_progress);
> >
> >  	graph_name = get_commit_graph_filename(obj_dir);
> > -	if (safe_create_leading_directories(graph_name))
> > +	if (safe_create_leading_directories(graph_name)) {
> > +		UNLEAK(graph_name);
> >  		die_errno(_("unable to create leading directories of %s"),
> >  			  graph_name);
> > +	}
>
> Do you really need this hunk?

graph_name is produced via xstrfmt in get_commit_graph_filename,
so it needs to be free'd in any return/exit path.

> In my testing with LeakSanitizer and
> valgrind, I don't need this hunk to be leak-free.

> Generally speaking, it
> seems impossible to UNLEAK when dying, since we don't know what we have
> allocated higher up in the call-stack.

I do not understand; I thought UNLEAK was specifically for the purpose of
die() calls without imposing extra overhead; rereading 0e5bba53af
(add UNLEAK annotation for reducing leak false positives, 2017-09-08)
doesn't provide an example for prematurely die()ing, only for regular
program exit.

> Reviewed-by: Martin Ågren <martin.agren@gmail.com>
>
> as I've verified the leaks before and after. With this hunk, I am
> puzzled and feel uneasy, both about having to UNLEAK before dying and
> about having to UNLEAK outside of builtin/.

I am not uneasy about an UNLEAK before dying, but about dying outside
builtin/ in general (but having a die call accompanied by UNLEAK seems
to be the right thing). Can you explain the worries you have regarding the
allocations on the call stack, as xstrfmt is allocating on the heap and we
only UNLEAK the pointer to that?

Stefan


On Tue, 2 Oct 2018 at 19:59, Stefan Beller <sbeller@google.com> wrote:
> > > +
> > > +	string_list_clear(&list, 0);
> > >  }
> >
> > Nit: The blank line adds some asymmetry, IMVHO.
>
> I think these blank lines are super common, as in:
>
>     {
>       declarations;
>
>       multiple;
>       lines(of);
>       code;
>
>       cleanup;
>       and_frees;
>     }
>
> (c.f. display_table in column.c, which I admit to have
> cherry-picked as an example).
>
> While in nit territory, I would rather move the string list init
> into the first block:
>
>     {
>       struct string_list list = STRING_LIST_INIT_DUP;
>
>       for_each_ref(add_ref_to_list, &list);
>       write_commit_graph(obj_dir, NULL, &list, append);
>
>       string_list_clear(&list, 0);
>     }

Now this looks very symmetrical. :-)

> > >  void write_commit_graph(const char *obj_dir,
> > > @@ -846,9 +848,11 @@ void write_commit_graph(const char *obj_dir,
> > >  	compute_generation_numbers(&commits, report_progress);
> > >
> > >  	graph_name = get_commit_graph_filename(obj_dir);
> > > -	if (safe_create_leading_directories(graph_name))
> > > +	if (safe_create_leading_directories(graph_name)) {
> > > +		UNLEAK(graph_name);
> > >  		die_errno(_("unable to create leading directories of %s"),
> > >  			  graph_name);
> > > +	}
> >
> > Do you really need this hunk?
>
> graph_name is produced via xstrfmt in get_commit_graph_filename,
> so it needs to be free'd in any return/exit path.

Agreed. Although I am questioning that `die()` and its siblings count.

> > In my testing with LeakSanitizer and
> > valgrind, I don't need this hunk to be leak-free.
> >
> > Generally speaking, it
> > seems impossible to UNLEAK when dying, since we don't know what we have
> > allocated higher up in the call-stack.
>
> I do not understand; I thought UNLEAK was specifically for the purpose of
> die() calls without imposing extra overhead; rereading 0e5bba53af
> (add UNLEAK annotation for reducing leak false positives, 2017-09-08)
> doesn't provide an example for prematurely die()ing, only for regular
> program exit.
>
> > [...] With this hunk, I am
> > puzzled and feel uneasy, both about having to UNLEAK before dying and
> > about having to UNLEAK outside of builtin/.
>
> I am not uneasy about an UNLEAK before dying, but about dying outside
> builtin/ in general

Yeah, not dying would be even better (out of scope for this patch).

> (but having a die call accompanied by UNLEAK seems
> to be the right thing). Can you explain the worries you have regarding the
> allocations on the call stack, as xstrfmt is allocating on the heap and we
> only UNLEAK the pointer to that?

I think we agree that leaking things "allocat[ed] on the call stack"
isn't much of a worry. The reason I mentioned the call stack is that
we've got any number of calls behind us on it, and we might have made
all sorts of allocations on the heap, and at this point, we have no
idea about what we should be UNLEAK-ing.

My worry is that one of these would seem to be true:

* UNLEAK is unsuitable for the job. Whenever we have a `die()` as we do
  here, we can UNLEAK the variables we know of, but we can't do anything
  about the allocations we have made higher up the call-chain. Our test
  suite obviously provokes lots of calls to `die()` -- imagine that each
  of those leaves a few leaked allocations behind. We'd have a semi-huge
  number of leaks being reported. While we could mark with UNLEAK to
  reduce that number, we wouldn't be able to bring the number of leaks
  down to anywhere near manageable where we'd be able to find the last
  few true positives.

* We add code with no purpose. In this case, we're not talking a lot of
  lines, but across the code base, if they bring no gain, they are bound
  to provide a negative net value given enough time.

Martin


On Tue, Oct 2, 2018 at 12:09 PM Martin Ågren <martin.agren@gmail.com> wrote:
>
> On Tue, 2 Oct 2018 at 19:59, Stefan Beller <sbeller@google.com> wrote:
> > > > +
> > > > +	string_list_clear(&list, 0);
> > > >  }
> > >
> > > Nit: The blank line adds some asymmetry, IMVHO.
> >
> > I think these blank lines are super common, as in:
> >
> >     {
> >       declarations;
> >
> >       multiple;
> >       lines(of);
> >       code;
> >
> >       cleanup;
> >       and_frees;
> >     }
> >
> > (c.f. display_table in column.c, which I admit to have
> > cherry-picked as an example).
> >
> > While in nit territory, I would rather move the string list init
> > into the first block:
> >
> >     {
> >       struct string_list list = STRING_LIST_INIT_DUP;
> >
> >       for_each_ref(add_ref_to_list, &list);
> >       write_commit_graph(obj_dir, NULL, &list, append);
> >
> >       string_list_clear(&list, 0);
> >     }
>
> Now this looks very symmetrical. :-)
>
> > > >  void write_commit_graph(const char *obj_dir,
> > > > @@ -846,9 +848,11 @@ void write_commit_graph(const char *obj_dir,
> > > >  	compute_generation_numbers(&commits, report_progress);
> > > >
> > > >  	graph_name = get_commit_graph_filename(obj_dir);
> > > > -	if (safe_create_leading_directories(graph_name))
> > > > +	if (safe_create_leading_directories(graph_name)) {
> > > > +		UNLEAK(graph_name);
> > > >  		die_errno(_("unable to create leading directories of %s"),
> > > >  			  graph_name);
> > > > +	}
> > >
> > > Do you really need this hunk?
> >
> > graph_name is produced via xstrfmt in get_commit_graph_filename,
> > so it needs to be free'd in any return/exit path.
>
> Agreed. Although I am questioning that `die()` and its siblings count.
>
> > > In my testing with LeakSanitizer and
> > > valgrind, I don't need this hunk to be leak-free.
> > > >
> > > Generally speaking, it
> > > seems impossible to UNLEAK when dying, since we don't know what we have
> > > allocated higher up in the call-stack.
> >
> > I do not understand; I thought UNLEAK was specifically for the purpose of
> > die() calls without imposing extra overhead; rereading 0e5bba53af
> > (add UNLEAK annotation for reducing leak false positives, 2017-09-08)
> > doesn't provide an example for prematurely die()ing, only for regular
> > program exit.
> >
> > > [...] With this hunk, I am
> > > puzzled and feel uneasy, both about having to UNLEAK before dying and
> > > about having to UNLEAK outside of builtin/.
> >
> > I am not uneasy about an UNLEAK before dying, but about dying outside
> > builtin/ in general
>
> Yeah, not dying would be even better (out of scope for this patch).
>
> > (but having a die call accompanied by UNLEAK seems
> > to be the right thing). Can you explain the worries you have regarding the
> > allocations on the call stack, as xstrfmt is allocating on the heap and we
> > only UNLEAK the pointer to that?
>
> I think we agree that leaking things "allocat[ed] on the call stack"
> isn't much of a worry. The reason I mentioned the call stack is that
> we've got any number of calls behind us on it, and we might have made
> all sorts of allocations on the heap, and at this point, we have no
> idea about what we should be UNLEAK-ing.

Wouldn't that be the responsibility of each function to make sure things are
UNLEAK'd or free'd before the function is either over or stopped
intermittently (by a subroutine dying) ?

In an ideal world we'd only ever exit/die in the functions high up the
call chain (which are in builtin/) and all other code would gracefully
return error codes or messages instead or even cope with some failure
conditions?

> My worry is that one of these would seem to be true:
>
> * UNLEAK is unsuitable for the job. Whenever we have a `die()` as we do
>   here, we can UNLEAK the variables we know of, but we can't do anything
>   about the allocations we have made higher up the call-chain.

IMHO that is the issue of the functions higher up the call chain and ought
to not affect this patch. By doing the right thing here locally the code base
will approach a good state eventually.

>   Our test
>   suite obviously provokes lots of calls to `die()` -- imagine that each
>   of those leaves a few leaked allocations behind. We'd have a semi-huge
>   number of leaks being reported. While we could mark with UNLEAK to
>   reduce that number, we wouldn't be able to bring the number of leaks
>   down to anywhere near manageable where we'd be able to find the last
>   few true positives.

Makes sense.

> * We add code with no purpose. In this case, we're not talking a lot of
>   lines, but across the code base, if they bring no gain, they are bound
>   to provide a negative net value given enough time.

I see. I did not estimate its negative impact to be high enough, as the
UNLEAK near a die() call was obvious good thing (locally).

I don't know what the best way to proceed is in this case.

Thanks,
Stefan


On Tue, Oct 02, 2018 at 12:44:09PM -0700, Stefan Beller wrote:

> > My worry is that one of these would seem to be true:
> >
> > * UNLEAK is unsuitable for the job. Whenever we have a `die()` as we do
> >   here, we can UNLEAK the variables we know of, but we can't do anything
> >   about the allocations we have made higher up the call-chain.
>
> IMHO that is the issue of the functions higher up the call chain and ought
> to not affect this patch. By doing the right thing here locally the code base
> will approach a good state eventually.

But it's impossible. If I do this:

  foo = xstrdup(bar);
  subfunction(foo);

then I cannot protect myself from leaking "foo" when subfunction() calls
die(). It must be valid when I enter the function, and I have no
opportunity to run code when it leaves (because it never does).

> > * We add code with no purpose. In this case, we're not talking a lot of
> >   lines, but across the code base, if they bring no gain, they are bound
> >   to provide a negative net value given enough time.
>
> I see. I did not estimate its negative impact to be high enough, as the
> UNLEAK near a die() call was obvious good thing (locally).
>
> I don't know what the best way to proceed is in this case.

My preference is to avoid them in the name of simplicity. If you're
using "make SANITIZE=leak test" to check for leaks, it will skip these
cases. If you're using valgrind, I think these may be reported as
"reachable". But that number already isn't useful for finding real
leaks, because it includes cases like the "foo" above as well as
program-lifetime globals.

The only argument (IMHO) for such an UNLEAK() is that it annotates the
location for when somebody later changes the function to "return -1"
instead of dying. But if we are going to do such annotation, we may as
well actually call free(), which is what the "return" version will
ultimately have to do.

I'd actually be _more_ favorable to calling free() instead of UNLEAK()
there, but I'm still mildly negative, just because it may go stale (and
our leak-checking wouldn't usefully notice these cases). Anybody
converting that die() to a return needs to re-analyze the function for
what might need to be released (and that includes non-memory bits like
descriptors, too).

-Peff

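A minimal sketch of the conversion Peff describes, where a function reports failure with "return -1" instead of dying and therefore has to release what it owns on every exit path. This is an illustration only: `demo_graph_filename` and `demo_write_graph` are hypothetical stand-ins, not Git functions, and the `simulate_failure` parameter exists purely so the error path can be exercised.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated path, or NULL on failure. */
char *demo_graph_filename(const char *obj_dir)
{
	const char *suffix = "/info/commit-graph";
	size_t len = strlen(obj_dir) + strlen(suffix) + 1;
	char *name = malloc(len);

	if (name)
		snprintf(name, len, "%s%s", obj_dir, suffix);
	return name;
}

/*
 * Hypothetical writer that returns an error instead of dying.
 * Every path after the allocation funnels through `cleanup`, so
 * `graph_name` is released whether the function succeeds or fails.
 */
int demo_write_graph(const char *obj_dir, int simulate_failure)
{
	int ret = -1;
	char *graph_name = demo_graph_filename(obj_dir);

	if (!graph_name)
		return -1;

	if (simulate_failure) {
		fprintf(stderr, "unable to create leading directories of %s\n",
			graph_name);
		goto cleanup;
	}

	/* ... the real work would happen here ... */
	ret = 0;

cleanup:
	free(graph_name);
	return ret;
}
```

A caller would check the result, for example `if (demo_write_graph("objects", 0) < 0)`, and decide for itself whether to exit. A real conversion would also have to release non-memory resources such as file descriptors and lock files on the error path, as noted above.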

On Tue, Oct 02, 2018 at 10:59:28AM -0700, Stefan Beller wrote:

> > Generally speaking, it
> > seems impossible to UNLEAK when dying, since we don't know what we have
> > allocated higher up in the call-stack.
>
> I do not understand; I thought UNLEAK was specifically for the purpose of
> die() calls without imposing extra overhead; rereading 0e5bba53af
> (add UNLEAK annotation for reducing leak false positives, 2017-09-08)
> doesn't provide an example for prematurely die()ing, only for regular
> program exit.

I responded elsewhere, but as the author of UNLEAK, let me comment here:
it was intended only for program exit. That's why there are no such
examples. :)

If you're using it anywhere except the return from a cmd_* function, or
a static-local helper that's called from a cmd_*, you should probably
actually be freeing the memory.

-Peff

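To make the "program exit only" intent concrete, here is a simplified, self-contained sketch of the idea behind an UNLEAK-style annotation: copy the variable's bytes into a list that stays reachable from a global, so a leak checker still sees the pointer as referenced at exit. It is written under assumptions and is not Git's actual implementation; `DEMO_UNLEAK`, `demo_unleak_memory`, `demo_leak_roots` and `demo_cmd_main` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* One saved copy of an annotated variable's bytes. */
struct demo_leak_root {
	struct demo_leak_root *next;
	size_t len;
	unsigned char data[]; /* C99 flexible array member */
};

/* Global list head: everything hanging off it counts as "reachable". */
static struct demo_leak_root *demo_leak_roots;

/* Copy `len` bytes at `ptr` (typically a pointer variable) into the list. */
void demo_unleak_memory(const void *ptr, size_t len)
{
	struct demo_leak_root *root = malloc(sizeof(*root) + len);

	if (!root)
		return;
	memcpy(root->data, ptr, len);
	root->len = len;
	root->next = demo_leak_roots;
	demo_leak_roots = root;
}

/* Annotate a variable; takes its address, so it needs an lvalue. */
#define DEMO_UNLEAK(var) demo_unleak_memory(&(var), sizeof(var))

/* Hypothetical command entry point: the annotation sits right before return. */
int demo_cmd_main(void)
{
	char *buf = malloc(16);

	if (!buf)
		return 1;
	strcpy(buf, "hello");

	DEMO_UNLEAK(buf); /* about to exit; skip the free() */
	return 0;
}
```

The annotation only covers variables visible at the point where it is written. Allocations held by callers further up the stack are out of its reach, which is the limitation discussed in this thread for `die()` in library code.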

>
> My preference is to avoid them in the name of simplicity. If you're
> using "make SANITIZE=leak test" to check for leaks, it will skip these
> cases. If you're using valgrind, I think these may be reported as
> "reachable". But that number already isn't useful for finding real
> leaks, because it includes cases like the "foo" above as well as
> program-lifetime globals.
>
> The only argument (IMHO) for such an UNLEAK() is that it annotates the
> location for when somebody later changes the function to "return -1"
> instead of dying. But if we are going to do such annotation, we may as
> well actually call free(), which is what the "return" version will
> ultimately have to do.

Heh, that was part of my reasoning why we'd want to have *something*.

> I'd actually be _more_ favorable to calling free() instead of UNLEAK()
> there, but I'm still mildly negative, just because it may go stale (and
> our leak-checking wouldn't usefully notice these cases). Anybody
> converting that die() to a return needs to re-analyze the function for
> what might need to be released (and that includes non-memory bits like
> descriptors, too).

Sounds reasonable, so then the consensus (between Martin, you and me)
is to drop the UNLEAK.


On 10/2/2018 6:44 PM, Stefan Beller wrote:
>> My preference is to avoid them in the name of simplicity. If you're
>> using "make SANITIZE=leak test" to check for leaks, it will skip these
>> cases. If you're using valgrind, I think these may be reported as
>> "reachable". But that number already isn't useful for finding real
>> leaks, because it includes cases like the "foo" above as well as
>> program-lifetime globals.
>>
>> The only argument (IMHO) for such an UNLEAK() is that it annotates the
>> location for when somebody later changes the function to "return -1"
>> instead of dying. But if we are going to do such annotation, we may as
>> well actually call free(), which is what the "return" version will
>> ultimately have to do.
> Heh, that was part of my reasoning why we'd want to have *something*.
>
>> I'd actually be _more_ favorable to calling free() instead of UNLEAK()
>> there, but I'm still mildly negative, just because it may go stale (and
>> our leak-checking wouldn't usefully notice these cases). Anybody
>> converting that die() to a return needs to re-analyze the function for
>> what might need to be released (and that includes non-memory bits like
>> descriptors, too).
> Sounds reasonable, so then the consensus (between Martin, you and me)
> is to drop the UNLEAK.

Thanks for the discussion here. I'll drop the UNLEAK for now and think
about how to remove the die() calls from commit-graph.c in a later series.

Thanks,
-Stolee

diff --git a/commit-graph.c b/commit-graph.c
index 2a24eb8b5a..7226bd6b58 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -698,6 +698,8 @@ void write_commit_graph_reachable(const char *obj_dir, int append,
 	string_list_init(&list, 1);
 	for_each_ref(add_ref_to_list, &list);
 	write_commit_graph(obj_dir, NULL, &list, append, report_progress);
+
+	string_list_clear(&list, 0);
 }
 
 void write_commit_graph(const char *obj_dir,
@@ -846,9 +848,11 @@ void write_commit_graph(const char *obj_dir,
 	compute_generation_numbers(&commits, report_progress);
 
 	graph_name = get_commit_graph_filename(obj_dir);
-	if (safe_create_leading_directories(graph_name))
+	if (safe_create_leading_directories(graph_name)) {
+		UNLEAK(graph_name);
 		die_errno(_("unable to create leading directories of %s"),
 			  graph_name);
+	}
 
 	hold_lock_file_for_update(&lk, graph_name, LOCK_DIE_ON_ERROR);
 	f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
@@ -893,6 +897,8 @@ void write_commit_graph(const char *obj_dir,
 	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
 	commit_lock_file(&lk);
 
+	free(graph_name);
+	free(commits.list);
 	free(oids.list);
 	oids.alloc = 0;
 	oids.nr = 0;