diff mbox series

[09/13] sparse-checkout: refactor temporary sparse_checkout_patterns

Message ID 20240531113530.GI428814@coredump.intra.peff.net (mailing list archive)
State Superseded
Headers show
Series leak fixes for sparse-checkout code | expand

Commit Message

Jeff King May 31, 2024, 11:35 a.m. UTC
In update_working_directory(), we take in a pattern_list, attach it to
the repository index by assigning it to index->sparse_checkout_patterns,
and then call unpack_trees. Afterwards, we remove it by setting
index->sparse_checkout_patterns back to NULL.

But there are two possible leaks here:

  1. If the index already had a populated sparse_checkout_patterns,
     we've obliterated it. We can fix this by saving and restoring it,
     rather than always setting it back to NULL.

  2. We may call the function with a NULL pattern_list, expecting it to
     use the on-disk sparse file. In that case, the index routines will
     lazy-load the sparse patterns automatically. But now at the end of
     the function when we restore the patterns, we'll leak those
     lazy-loaded ones!

     We can fix this by freeing the pattern list before overwriting its
     pointer whenever it does not match what was passed in (in practice
     this should only happen when the passed-in list is NULL, but this
     is erring on the defensive side).

Together these remove 48 indirect leaks found in t1091.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/sparse-checkout.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Patrick Steinhardt June 4, 2024, 7:42 a.m. UTC | #1
On Fri, May 31, 2024 at 07:35:30AM -0400, Jeff King wrote:
> In update_working_directory(), we take in a pattern_list, attach it to
> the repository index by assigning it to index->sparse_checkout_patterns,
> and then call unpack_trees. Afterwards, we remove it by setting
> index->sparse_checkout_patterns back to NULL.
> 
> But there are two possible leaks here:
> 
>   1. If the index already had a populated sparse_checkout_patterns,
>      we've obliterated it. We can fix this by saving and restoring it,
>      rather than always setting it back to NULL.

So this isn't only a leak, but also a potential bug because we did not
restore the old pattern list?

> @@ -241,7 +243,12 @@ static int update_working_directory(struct pattern_list *pl)
>  
>  	clean_tracked_sparse_directories(r);
>  
> -	r->index->sparse_checkout_patterns = NULL;
> +	if (r->index->sparse_checkout_patterns != pl) {
> +		clear_pattern_list(r->index->sparse_checkout_patterns);
> +		FREE_AND_NULL(r->index->sparse_checkout_patterns);
> +	} else {
> +		r->index->sparse_checkout_patterns = old_pl;
> +	}
>  	return result;
>  }

What I find weird is that we sometimes restore the old value, and
sometimes we don't. I get that it makes sense to conditionally free only
the lazy-loaded list. But shouldn't we then unconditionally assign
`old_pl`?

Makes me wonder whether I miss some subtlety here.

Patrick
Jeff King June 4, 2024, 9:53 a.m. UTC | #2
On Tue, Jun 04, 2024 at 09:42:51AM +0200, Patrick Steinhardt wrote:

> > But there are two possible leaks here:
> > 
> >   1. If the index already had a populated sparse_checkout_patterns,
> >      we've obliterated it. We can fix this by saving and restoring it,
> >      rather than always setting it back to NULL.
> 
> So this isn't only a leak, but also a potential bug because we did not
> restore the old pattern list?

That's a good question, but I think in practice, it's not possible to
trigger the bug. Within a run that calls this function, we only care
about index->sparse_checkout_patterns in the context of this function
itself (that is, we do not otherwise try to read the sparse index). And
in this function, we always want to use the pattern_list that the caller
hands us.

So nobody actually cares about the "restored" pattern list at all
(except that in the lazy-load case, we leak it). Every call will
overwrite it anyway.

> > @@ -241,7 +243,12 @@ static int update_working_directory(struct pattern_list *pl)
> >  
> >  	clean_tracked_sparse_directories(r);
> >  
> > -	r->index->sparse_checkout_patterns = NULL;
> > +	if (r->index->sparse_checkout_patterns != pl) {
> > +		clear_pattern_list(r->index->sparse_checkout_patterns);
> > +		FREE_AND_NULL(r->index->sparse_checkout_patterns);
> > +	} else {
> > +		r->index->sparse_checkout_patterns = old_pl;
> > +	}
> >  	return result;
> >  }
> 
> What I find weird is that we sometimes restore the old value, and
> sometimes we don't. I get that it makes sense to conditionally free only
> the lazy-loaded list. But shouldn't we then unconditionally assign
> `old_pl`?

Yes, I think it would be more correct to do so (otherwise we risk
leaking old_pl). In practice, as I mentioned above, this is the only
function that actually assigns to the index list. So after this hunk, I
think we'd always come in to the function with a NULL value in the
index (it starts NULL, we restore NULL after using a passed-in value,
and we restore free-and-NULL after lazy-loading).

So we _could_ actually drop the "save old_pl" hunk from the beginning of
the function entirely. I only hit that case after trying something like:

  /* only throw away patterns if they were the ones we own */
  if (r->index->sparse_checkout_patterns == pl)
	r->index->sparse_checkout_patterns = NULL;

at the end of the function to fix the leak. And with that code, we _do_
see the lazy-loaded entries in subsequent calls. If we instead do:

  /* throw away lazy-loaded ones */
  if (r->index->sparse_checkout_patterns != pl) {
	clear_pattern_list(r->index->sparse_checkout_patterns);
	FREE_AND_NULL(r->index->sparse_checkout_patterns);
  }

Then I think you don't really need the else clause at all. But I was
trying to be defensive and not make too many assumptions. I think in
that spirit, just restoring old_pl always is the right thing, even
though I don't think we'd ever trigger that.

-Peff
diff mbox series

Patch

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 8f8f5c359f..356b7349f9 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -205,11 +205,13 @@  static int update_working_directory(struct pattern_list *pl)
 	struct unpack_trees_options o;
 	struct lock_file lock_file = LOCK_INIT;
 	struct repository *r = the_repository;
+	struct pattern_list *old_pl;
 
 	/* If no branch has been checked out, there are no updates to make. */
 	if (is_index_unborn(r->index))
 		return UPDATE_SPARSITY_SUCCESS;
 
+	old_pl = r->index->sparse_checkout_patterns;
 	r->index->sparse_checkout_patterns = pl;
 
 	memset(&o, 0, sizeof(o));
@@ -241,7 +243,12 @@  static int update_working_directory(struct pattern_list *pl)
 
 	clean_tracked_sparse_directories(r);
 
-	r->index->sparse_checkout_patterns = NULL;
+	if (r->index->sparse_checkout_patterns != pl) {
+		clear_pattern_list(r->index->sparse_checkout_patterns);
+		FREE_AND_NULL(r->index->sparse_checkout_patterns);
+	} else {
+		r->index->sparse_checkout_patterns = old_pl;
+	}
 	return result;
 }