Message ID | 20240531113530.GI428814@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | leak fixes for sparse-checkout code | expand |
On Fri, May 31, 2024 at 07:35:30AM -0400, Jeff King wrote: > In update_working_directory(), we take in a pattern_list, attach it to > the repository index by assigning it to index->sparse_checkout_patterns, > and then call unpack_trees. Afterwards, we remove it by setting > index->sparse_checkout_patterns back to NULL. > > But there are two possible leaks here: > > 1. If the index already had a populated sparse_checkout_patterns, > we've obliterated it. We can fix this by saving and restoring it, > rather than always setting it back to NULL. So this isn't only a leak, but also a potential bug because we did not restore the old pattern list? > @@ -241,7 +243,12 @@ static int update_working_directory(struct pattern_list *pl) > > clean_tracked_sparse_directories(r); > > - r->index->sparse_checkout_patterns = NULL; > + if (r->index->sparse_checkout_patterns != pl) { > + clear_pattern_list(r->index->sparse_checkout_patterns); > + FREE_AND_NULL(r->index->sparse_checkout_patterns); > + } else { > + r->index->sparse_checkout_patterns = old_pl; > + } > return result; > } What I find weird is that we sometimes restore the old value, and sometimes we don't. I get that it makes sense to conditionally free only the lazy-loaded list. But shouldn't we then unconditionally assign `old_pl`? Makes me wonder whether I miss some subtlety here. Patrick
On Tue, Jun 04, 2024 at 09:42:51AM +0200, Patrick Steinhardt wrote: > > But there are two possible leaks here: > > > > 1. If the index already had a populated sparse_checkout_patterns, > > we've obliterated it. We can fix this by saving and restoring it, > > rather than always setting it back to NULL. > > So this isn't only a leak, but also a potential bug because we did not > restore the old pattern list? That's a good question, but I think in practice, it's not possible to trigger the bug. Within a run that calls this function, we only care about index->sparse_checkout_patterns in the context of this function itself (that is, we do not otherwise try to read the sparse index). And in this function, we always want to use the pattern_list that the caller hands us. So nobody actually cares about the "restored" pattern list at all (except that in the lazy-load case, we leak it). Every call will overwrite it anyway. > > @@ -241,7 +243,12 @@ static int update_working_directory(struct pattern_list *pl) > > > > clean_tracked_sparse_directories(r); > > > > - r->index->sparse_checkout_patterns = NULL; > > + if (r->index->sparse_checkout_patterns != pl) { > > + clear_pattern_list(r->index->sparse_checkout_patterns); > > + FREE_AND_NULL(r->index->sparse_checkout_patterns); > > + } else { > > + r->index->sparse_checkout_patterns = old_pl; > > + } > > return result; > > } > > What I find weird is that we sometimes restore the old value, and > sometimes we don't. I get that it makes sense to conditionally free only > the lazy-loaded list. But shouldn't we then unconditionally assign > `old_pl`? Yes, I think it would be more correct to do so (otherwise we risk leaking old_pl). In practice, as I mentioned above, this is the only function that actually assigns to the index list. So after this hunk, I think we'd always come in to the function with a NULL value in the index (it starts NULL, we restore NULL after using a passed-in value, and we restore free-and-NULL after lazy-loading). So we _could_ actually drop the "save old_pl" hunk from the beginning of the function entirely. I only hit that case after trying something like: /* only throw away patterns if they were the ones we own */ if (r->index->sparse_checkout_patterns == pl) r->index->sparse_checkout_patterns = NULL; at the end of the function to fix the leak. And with that code, we _do_ see the lazy-loaded entries in subsequent calls. If we instead do: /* throw away lazy-loaded ones */ if (r->index->sparse_checkout_patterns != pl) { clear_pattern_list(r->index->sparse_checkout_patterns); FREE_AND_NULL(r->index->sparse_checkout_patterns); } Then I think you don't really need the else clause at all. But I was trying to be defensive and not make too many assumptions. I think in that spirit, just restoring old_pl always is the right thing, even though I don't think we'd ever trigger that. -Peff
diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c index 8f8f5c359f..356b7349f9 100644 --- a/builtin/sparse-checkout.c +++ b/builtin/sparse-checkout.c @@ -205,11 +205,13 @@ static int update_working_directory(struct pattern_list *pl) struct unpack_trees_options o; struct lock_file lock_file = LOCK_INIT; struct repository *r = the_repository; + struct pattern_list *old_pl; /* If no branch has been checked out, there are no updates to make. */ if (is_index_unborn(r->index)) return UPDATE_SPARSITY_SUCCESS; + old_pl = r->index->sparse_checkout_patterns; r->index->sparse_checkout_patterns = pl; memset(&o, 0, sizeof(o)); @@ -241,7 +243,12 @@ static int update_working_directory(struct pattern_list *pl) clean_tracked_sparse_directories(r); - r->index->sparse_checkout_patterns = NULL; + if (r->index->sparse_checkout_patterns != pl) { + clear_pattern_list(r->index->sparse_checkout_patterns); + FREE_AND_NULL(r->index->sparse_checkout_patterns); + } else { + r->index->sparse_checkout_patterns = old_pl; + } return result; }
In update_working_directory(), we take in a pattern_list, attach it to the repository index by assigning it to index->sparse_checkout_patterns, and then call unpack_trees. Afterwards, we remove it by setting index->sparse_checkout_patterns back to NULL. But there are two possible leaks here: 1. If the index already had a populated sparse_checkout_patterns, we've obliterated it. We can fix this by saving and restoring it, rather than always setting it back to NULL. 2. We may call the function with a NULL pattern_list, expecting it to use the on-disk sparse file. In that case, the index routines will lazy-load the sparse patterns automatically. But now at the end of the function when we restore the patterns, we'll leak those lazy-loaded ones! We can fix this by freeing the pattern list before overwriting its pointer whenever it does not match what was passed in (in practice this should only happen when the passed-in list is NULL, but this is erring on the defensive side). Together these remove 48 indirect leaks found in t1091. Signed-off-by: Jeff King <peff@peff.net> --- builtin/sparse-checkout.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)