diff mbox series

[1/2] commit-graph.c: remove temporary graph layers on exit

Message ID 25324fea5b7c7f748d7f4e1e40299c0af04006e8.1717712358.git.me@ttaylorr.com (mailing list archive)
State Accepted
Commit c195ecda7703098f28c160cc63306ca0e1488236
Headers show
Series commit-graph/server-info: use tempfile.h in more places | expand

Commit Message

Taylor Blau June 6, 2024, 10:19 p.m. UTC
Since the introduction of split commit graph layers in 92b1ea66b9a
(Merge branch 'ds/commit-graph-incremental', 2019-07-19), the function
write_commit_graph_file() has done the following when writing an
incremental commit-graph layer:

  - used a lock_file to control access to the commit-graph-chain file

  - used an auxiliary file (whose descriptor was stored in 'fd') to
    write the new commit-graph layer itself

Using a lock_file to control access to the commit-graph-chain is
sensible, since only one writer may modify it at a time. Likewise, when
the commit-graph machinery is writing out a single layer, the lock_file
structure is used to modify the commit-graph itself. This is also
sensible, since the non-incremental commit-graph may also have at most
one writer.

However, using an auxiliary temporary file without using the tempfile.h
API means that writes that fail after the temporary graph layer has been
created will leave around a file in

    $GIT_DIR/objects/info/commit-graphs/tmp_graph_XXXXXX

The commit-graph-chain file and non-incremental commit-graph do not
suffer from this problem as the lockfile.h API uses the tempfile.h API
transparently, so processes that died before moving those finals into
their final location cleaned up after themselves.

Ensure that the temporary file used to write incremental commit-graphs
is also managed with the tempfile.h API, to ensure that we do not ever
leave tmp_graph_XXXXXX files laying around.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c                | 19 ++++++++-----------
 t/t5324-split-commit-graph.sh | 26 +++++++++++++++++++++++++-
 2 files changed, 33 insertions(+), 12 deletions(-)

Comments

Junio C Hamano June 7, 2024, 4:33 p.m. UTC | #1
Taylor Blau <me@ttaylorr.com> writes:

> @@ -2133,8 +2132,6 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
>  		char *final_graph_name;
>  		int result;
>  
> -		close(fd);
> -
>  		if (!chainf) {
>  			error(_("unable to open commit-graph chain file"));
>  			return -1;
> @@ -2169,7 +2166,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
>  		free(ctx->commit_graph_filenames_after[ctx->num_commit_graphs_after - 1]);
>  		ctx->commit_graph_filenames_after[ctx->num_commit_graphs_after - 1] = final_graph_name;
>  
> -		result = rename(ctx->graph_name, final_graph_name);
> +		result = rename_tempfile(&graph_layer, final_graph_name);

Before this rename, after the close(fd) we saw in the previous hunk,
there is one early error return when we fail to rename the base
graph file.  Do we need to do anything there, or an unfinished
tempfile getting removed at the process termination is sufficient
for cleaning up the mess?

Other than that, this looked quite straight-forward.

Thanks.
Taylor Blau June 7, 2024, 9:41 p.m. UTC | #2
On Fri, Jun 07, 2024 at 09:33:56AM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > @@ -2133,8 +2132,6 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
> >  		char *final_graph_name;
> >  		int result;
> >
> > -		close(fd);
> > -
> >  		if (!chainf) {
> >  			error(_("unable to open commit-graph chain file"));
> >  			return -1;
> > @@ -2169,7 +2166,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
> >  		free(ctx->commit_graph_filenames_after[ctx->num_commit_graphs_after - 1]);
> >  		ctx->commit_graph_filenames_after[ctx->num_commit_graphs_after - 1] = final_graph_name;
> >
> > -		result = rename(ctx->graph_name, final_graph_name);
> > +		result = rename_tempfile(&graph_layer, final_graph_name);
>
> Before this rename, after the close(fd) we saw in the previous hunk,
> there is one early error return when we fail to rename the base
> graph file.  Do we need to do anything there, or an unfinished
> tempfile getting removed at the process termination is sufficient
> for cleaning up the mess?

We could explicitly clean it up, but we'll do so implicitly upon exit,
so I think it's fine to leave it as-is.

Thanks,
Taylor
Junio C Hamano June 7, 2024, 9:47 p.m. UTC | #3
Taylor Blau <me@ttaylorr.com> writes:

>> > -		result = rename(ctx->graph_name, final_graph_name);
>> > +		result = rename_tempfile(&graph_layer, final_graph_name);
>>
>> Before this rename, after the close(fd) we saw in the previous hunk,
>> there is one early error return when we fail to rename the base
>> graph file.  Do we need to do anything there, or an unfinished
>> tempfile getting removed at the process termination is sufficient
>> for cleaning up the mess?
>
> We could explicitly clean it up, but we'll do so implicitly upon exit,
> so I think it's fine to leave it as-is.

I am not worried about cleaning it up.  Upon exit, the underlying
file descriptor will be closed, but this new code never does
fclose() on the FILE* that has a buffer around the underlying file
descriptor.  How are we guaranteeing that we are not losing anything
buffered but not flushed yet?
Junio C Hamano June 7, 2024, 9:49 p.m. UTC | #4
Taylor Blau <me@ttaylorr.com> writes:

>> Before this rename, after the close(fd) we saw in the previous hunk,
>> there is one early error return when we fail to rename the base
>> graph file.  Do we need to do anything there, or an unfinished
>> tempfile getting removed at the process termination is sufficient
>> for cleaning up the mess?
>
> We could explicitly clean it up, but we'll do so implicitly upon exit,
> so I think it's fine to leave it as-is.

OK.  That sounds good.
Jeff King June 8, 2024, 9:42 a.m. UTC | #5
On Fri, Jun 07, 2024 at 02:47:58PM -0700, Junio C Hamano wrote:

> Taylor Blau <me@ttaylorr.com> writes:
> 
> >> > -		result = rename(ctx->graph_name, final_graph_name);
> >> > +		result = rename_tempfile(&graph_layer, final_graph_name);
> >>
> >> Before this rename, after the close(fd) we saw in the previous hunk,
> >> there is one early error return when we fail to rename the base
> >> graph file.  Do we need to do anything there, or an unfinished
> >> tempfile getting removed at the process termination is sufficient
> >> for cleaning up the mess?
> >
> > We could explicitly clean it up, but we'll do so implicitly upon exit,
> > so I think it's fine to leave it as-is.
> 
> I am not worried about cleaning it up.  Upon exit, the underlying
> file descriptor will be closed, but this new code never does
> fclose() on the FILE* that has a buffer around the underlying file
> descriptor.  How are we guaranteeing that we are not losing anything
> buffered but not flushed yet?

I'm not sure I understand. There is no "FILE *" at all in this patch (we
use the descriptor directly via the hashfd interface). But even if there
were, if we leave without renaming the temp file into place, then that
tempfile will be deleted. So it wouldn't matter if we flushed to it or
not, because all of those byte are destined to be thrown away.

-Peff
Jeff King June 8, 2024, 9:53 a.m. UTC | #6
On Thu, Jun 06, 2024 at 06:19:25PM -0400, Taylor Blau wrote:

> @@ -2035,24 +2035,23 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
>  					       LOCK_DIE_ON_ERROR, 0444);
>  		free(lock_name);
>  
> -		fd = git_mkstemp_mode(ctx->graph_name, 0444);
> -		if (fd < 0) {
> +		graph_layer = mks_tempfile_m(ctx->graph_name, 0444);
> +		if (!graph_layer) {
>  			error(_("unable to create temporary graph layer"));
>  			return -1;
>  		}
>  
> -		if (adjust_shared_perm(ctx->graph_name)) {
> +		if (adjust_shared_perm(get_tempfile_path(graph_layer))) {
>  			error(_("unable to adjust shared permissions for '%s'"),
> -			      ctx->graph_name);
> +			      get_tempfile_path(graph_layer));
>  			return -1;
>  		}

Most errors will cause us to die(), but this "return" and the one that
Junio noted later (when renaming the base graph file fails) mean we'll
return with the tempfile still active. We'll clean it up when the
program exits, but if there were a long-running program that
called write_commit_graph_file(), that tempfile might hang around longer
than necessary.

But that is the same strategy that the existing code uses for the lock
we use for the chain filename! And it is much worse there, because now
it is not just a tempfile hanging around, but we're blocking anybody
else from taking the lock.

So I think it would be OK to punt on this for now. Your patch is not
making the situation worse, and it's all a problem for a hypothetical
libification of this function.

> +test_expect_success 'temporary graph layer is discarded upon failure' '
> +	git init layer-discard &&
> +	(
> +		cd layer-discard &&
> +
> +		test_commit A &&
> +		test_commit B &&
> +
> +		# Intentionally remove commit "A" from the object store
> +		# so that the commit-graph machinery fails to parse the
> +		# parents of "B".
> +		#
> +		# This takes place after the commit-graph machinery has
> +		# initialized a new temporary file to store the contents
> +		# of the new graph layer, so will allow us to ensure
> +		# that the temporary file is discarded upon failure.
> +		rm $objdir/$(test_oid_to_path $(git rev-parse HEAD^)) &&
> +
> +		test_must_fail git commit-graph write --reachable --split &&
> +		test_dir_is_empty $graphdir
> +	)
> +'

I'm glad you were able to come up with a case that fails cleanly and
non-racily. The exit code of rev-parse will be lost. I doubt that it
matters in practice, but I wouldn't be surprised if our shell linting
eventually learns to complain about this spot.

Looks like there are a few similar ones in the test suite already, from
t5329. I'd be content to leave it for now and deal with it later if
somebody really wants to go on a crusade against lost exit codes.

-Peff
diff mbox series

Patch

diff --git a/commit-graph.c b/commit-graph.c
index e5dd3553dfe..92c71637142 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -2002,8 +2002,8 @@  static int write_graph_chunk_base(struct hashfile *f,
 static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 {
 	uint32_t i;
-	int fd;
 	struct hashfile *f;
+	struct tempfile *graph_layer; /* when ctx->split is non-zero */
 	struct lock_file lk = LOCK_INIT;
 	const unsigned hashsz = the_hash_algo->rawsz;
 	struct strbuf progress_title = STRBUF_INIT;
@@ -2035,24 +2035,23 @@  static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 					       LOCK_DIE_ON_ERROR, 0444);
 		free(lock_name);
 
-		fd = git_mkstemp_mode(ctx->graph_name, 0444);
-		if (fd < 0) {
+		graph_layer = mks_tempfile_m(ctx->graph_name, 0444);
+		if (!graph_layer) {
 			error(_("unable to create temporary graph layer"));
 			return -1;
 		}
 
-		if (adjust_shared_perm(ctx->graph_name)) {
+		if (adjust_shared_perm(get_tempfile_path(graph_layer))) {
 			error(_("unable to adjust shared permissions for '%s'"),
-			      ctx->graph_name);
+			      get_tempfile_path(graph_layer));
 			return -1;
 		}
 
-		f = hashfd(fd, ctx->graph_name);
+		f = hashfd(get_tempfile_fd(graph_layer), get_tempfile_path(graph_layer));
 	} else {
 		hold_lock_file_for_update_mode(&lk, ctx->graph_name,
 					       LOCK_DIE_ON_ERROR, 0444);
-		fd = get_lock_file_fd(&lk);
-		f = hashfd(fd, get_lock_file_path(&lk));
+		f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
 	}
 
 	cf = init_chunkfile(f);
@@ -2133,8 +2132,6 @@  static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 		char *final_graph_name;
 		int result;
 
-		close(fd);
-
 		if (!chainf) {
 			error(_("unable to open commit-graph chain file"));
 			return -1;
@@ -2169,7 +2166,7 @@  static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 		free(ctx->commit_graph_filenames_after[ctx->num_commit_graphs_after - 1]);
 		ctx->commit_graph_filenames_after[ctx->num_commit_graphs_after - 1] = final_graph_name;
 
-		result = rename(ctx->graph_name, final_graph_name);
+		result = rename_tempfile(&graph_layer, final_graph_name);
 
 		for (i = 0; i < ctx->num_commit_graphs_after; i++)
 			fprintf(get_lock_file_fp(&lk), "%s\n", ctx->commit_graph_hash_after[i]);
diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh
index 281266f7883..77e91547ea3 100755
--- a/t/t5324-split-commit-graph.sh
+++ b/t/t5324-split-commit-graph.sh
@@ -13,7 +13,8 @@  test_expect_success 'setup repo' '
 	git init &&
 	git config core.commitGraph true &&
 	git config gc.writeCommitGraph false &&
-	infodir=".git/objects/info" &&
+	objdir=".git/objects" &&
+	infodir="$objdir/info" &&
 	graphdir="$infodir/commit-graphs" &&
 	test_oid_cache <<-EOM
 	shallow sha1:2132
@@ -718,4 +719,27 @@  test_expect_success 'write generation data chunk when commit-graph chain is repl
 	)
 '
 
+test_expect_success 'temporary graph layer is discarded upon failure' '
+	git init layer-discard &&
+	(
+		cd layer-discard &&
+
+		test_commit A &&
+		test_commit B &&
+
+		# Intentionally remove commit "A" from the object store
+		# so that the commit-graph machinery fails to parse the
+		# parents of "B".
+		#
+		# This takes place after the commit-graph machinery has
+		# initialized a new temporary file to store the contents
+		# of the new graph layer, so will allow us to ensure
+		# that the temporary file is discarded upon failure.
+		rm $objdir/$(test_oid_to_path $(git rev-parse HEAD^)) &&
+
+		test_must_fail git commit-graph write --reachable --split &&
+		test_dir_is_empty $graphdir
+	)
+'
+
 test_done