diff mbox series

[4/4] commit-graph: close descriptors after mmap

Message ID e05db264cb50760cab222157b436e82adeaeadc8.1587677671.git.me@ttaylorr.com (mailing list archive)
State New, archived
Headers show
Series commit-graph: handle file descriptor exhaustion | expand

Commit Message

Taylor Blau April 23, 2020, 9:41 p.m. UTC
From: Jeff King <peff@peff.net>

We don't ever refer to the descriptor after mmap-ing it. And keeping it
open means we can run out of descriptors in degenerate cases (e.g.,
thousands of split chain files). Let's close it as soon as possible.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c      | 15 +++++----------
 commit-graph.h      |  3 +--
 fuzz-commit-graph.c |  5 ++---
 3 files changed, 8 insertions(+), 15 deletions(-)

Comments

Junio C Hamano April 23, 2020, 10:04 p.m. UTC | #1
Taylor Blau <me@ttaylorr.com> writes:

> From: Jeff King <peff@peff.net>
>
> We don't ever refer to the descriptor after mmap-ing it. And keeping it
> open means we can run out of descriptors in degenerate cases (e.g.,
> thousands of split chain files). Let's close it as soon as possible.

Yikes.  

Sorry, I should have looked at the use of mmap in this topioc more
carefully when we queued the series.  It is an easy mistake to make
by anybody new to the API to leave it open while the region is in
use.

With this fix, with or without the other topics still in flight, I
do not think no code touches graph_fd.  Should we remove the
graph_fd field from the structure as well?




 commit-graph.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/commit-graph.h b/commit-graph.h
index a0a2c4a1e5..1254eae948 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -40,8 +40,6 @@ struct tree *get_commit_tree_in_graph(struct repository *r,
 				      const struct commit *c);
 
 struct commit_graph {
-	int graph_fd;
-
 	const unsigned char *data;
 	size_t data_len;
Jeff King April 24, 2020, 3:56 a.m. UTC | #2
On Thu, Apr 23, 2020 at 03:04:19PM -0700, Junio C Hamano wrote:

> With this fix, with or without the other topics still in flight, I
> do not think no code touches graph_fd.  Should we remove the
> graph_fd field from the structure as well?

Oops, I thought I did.

> diff --git a/commit-graph.h b/commit-graph.h
> index a0a2c4a1e5..1254eae948 100644
> --- a/commit-graph.h
> +++ b/commit-graph.h
> @@ -40,8 +40,6 @@ struct tree *get_commit_tree_in_graph(struct repository *r,
>  				      const struct commit *c);
>  
>  struct commit_graph {
> -	int graph_fd;
> -

Yes, this should definitely be squashed in.

-Peff
Derrick Stolee April 24, 2020, 1:17 p.m. UTC | #3
On 4/23/2020 6:04 PM, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
> 
>> From: Jeff King <peff@peff.net>
>>
>> We don't ever refer to the descriptor after mmap-ing it. And keeping it
>> open means we can run out of descriptors in degenerate cases (e.g.,
>> thousands of split chain files). Let's close it as soon as possible.
> 
> Yikes.  
> 
> Sorry, I should have looked at the use of mmap in this topioc more
> carefully when we queued the series.  It is an easy mistake to make
> by anybody new to the API to leave it open while the region is in
> use.

You are right. I was new when first contributing the commit-graph. It
was also easier to miss because we only had one commit-graph open at
the time. Adding in the incremental file format led to multiple file
descriptors being open.

However, this idea of closing a descriptor after an mmap is new to
me. So I thought about other situations where I made the same mistake.
Please see the patch below.

> With this fix, with or without the other topics still in flight, I
> do not think no code touches graph_fd.  Should we remove the
> graph_fd field from the structure as well?

I agree that this should be done.

Thanks,
-Stolee

-->8--

From: Derrick Stolee <dstolee@microsoft.com>
Date: Fri, 24 Apr 2020 13:11:13 +0000
Subject: [PATCH] multi-pack-index: close file descriptor after mmap

We recently discovered that the commit-graph was not closing its
file descriptor after memory-mapping the file contents. After this
mmap() succeeds, there is no need to keep the file descriptor open.
In fact, there is signficant reason to close it so we do not run
out of descriptors.

This is entirely my fault for not knowing that we can have an open
mmap while also closing the descriptor. Some could blame this on
being a new contributor when the series was first submitted, but
even years later this is still new information to me.

That made me realize that I used the same pattern when opening a
multi-pack-index. Since this file is not (yet) incremental, there
will be at most one descriptor open for this reason. It is still
worth fixing, especially if we extend the format to be incremental
like the commit-graph.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 midx.c | 4 +---
 midx.h | 2 --
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/midx.c b/midx.c
index 1527e464a7..60d30e873b 100644
--- a/midx.c
+++ b/midx.c
@@ -72,9 +72,9 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 	FREE_AND_NULL(midx_name);
 
 	midx_map = xmmap(NULL, midx_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	close(fd);
 
 	FLEX_ALLOC_STR(m, object_dir, object_dir);
-	m->fd = fd;
 	m->data = midx_map;
 	m->data_len = midx_size;
 	m->local = local;
@@ -190,8 +190,6 @@ void close_midx(struct multi_pack_index *m)
 		return;
 
 	munmap((unsigned char *)m->data, m->data_len);
-	close(m->fd);
-	m->fd = -1;
 
 	for (i = 0; i < m->num_packs; i++) {
 		if (m->packs[i])
diff --git a/midx.h b/midx.h
index e6fa356b5c..b18cf53bc4 100644
--- a/midx.h
+++ b/midx.h
@@ -12,8 +12,6 @@ struct repository;
 struct multi_pack_index {
 	struct multi_pack_index *next;
 
-	int fd;
-
 	const unsigned char *data;
 	size_t data_len;
Taylor Blau April 24, 2020, 4:35 p.m. UTC | #4
On Fri, Apr 24, 2020 at 09:17:16AM -0400, Derrick Stolee wrote:
> On 4/23/2020 6:04 PM, Junio C Hamano wrote:
> > Taylor Blau <me@ttaylorr.com> writes:
> >
> >> From: Jeff King <peff@peff.net>
> >>
> >> We don't ever refer to the descriptor after mmap-ing it. And keeping it
> >> open means we can run out of descriptors in degenerate cases (e.g.,
> >> thousands of split chain files). Let's close it as soon as possible.
> >
> > Yikes.
> >
> > Sorry, I should have looked at the use of mmap in this topioc more
> > carefully when we queued the series.  It is an easy mistake to make
> > by anybody new to the API to leave it open while the region is in
> > use.
>
> You are right. I was new when first contributing the commit-graph. It
> was also easier to miss because we only had one commit-graph open at
> the time. Adding in the incremental file format led to multiple file
> descriptors being open.
>
> However, this idea of closing a descriptor after an mmap is new to
> me. So I thought about other situations where I made the same mistake.
> Please see the patch below.

It's new to me, too :). If I had known it beforehand, then I would have
written the fourth patch here myself. But, I didn't, so I am grateful to
Peff for teaching me something new here.

> > With this fix, with or without the other topics still in flight, I
> > do not think no code touches graph_fd.  Should we remove the
> > graph_fd field from the structure as well?
>
> I agree that this should be done.
>
> Thanks,
> -Stolee
>
> -->8--

For what it's worth, this didn't apply quite right with 'git am -3 -c',
since it didn't seem to recognize that this was your scissors line. If I
edit your mail myself by replacing this line with '-- >8 --', then 'git
am' applies it just fine.

> From: Derrick Stolee <dstolee@microsoft.com>
> Date: Fri, 24 Apr 2020 13:11:13 +0000
> Subject: [PATCH] multi-pack-index: close file descriptor after mmap
>
> We recently discovered that the commit-graph was not closing its
> file descriptor after memory-mapping the file contents. After this
> mmap() succeeds, there is no need to keep the file descriptor open.
> In fact, there is signficant reason to close it so we do not run
> out of descriptors.
>
> This is entirely my fault for not knowing that we can have an open
> mmap while also closing the descriptor. Some could blame this on
> being a new contributor when the series was first submitted, but
> even years later this is still new information to me.
>
> That made me realize that I used the same pattern when opening a
> multi-pack-index. Since this file is not (yet) incremental, there
> will be at most one descriptor open for this reason. It is still
> worth fixing, especially if we extend the format to be incremental
> like the commit-graph.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  midx.c | 4 +---
>  midx.h | 2 --
>  2 files changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/midx.c b/midx.c
> index 1527e464a7..60d30e873b 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -72,9 +72,9 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
>  	FREE_AND_NULL(midx_name);
>
>  	midx_map = xmmap(NULL, midx_size, PROT_READ, MAP_PRIVATE, fd, 0);
> +	close(fd);

Right, we want to close as soon as we have mmaped the file.

>
>  	FLEX_ALLOC_STR(m, object_dir, object_dir);
> -	m->fd = fd;
>  	m->data = midx_map;
>  	m->data_len = midx_size;
>  	m->local = local;
> @@ -190,8 +190,6 @@ void close_midx(struct multi_pack_index *m)
>  		return;
>
>  	munmap((unsigned char *)m->data, m->data_len);
> -	close(m->fd);
> -	m->fd = -1;

...and not down here. Thanks.

>  	for (i = 0; i < m->num_packs; i++) {
>  		if (m->packs[i])
> diff --git a/midx.h b/midx.h
> index e6fa356b5c..b18cf53bc4 100644
> --- a/midx.h
> +++ b/midx.h
> @@ -12,8 +12,6 @@ struct repository;
>  struct multi_pack_index {
>  	struct multi_pack_index *next;
>
> -	int fd;
> -

:). Even better!

>  	const unsigned char *data;
>  	size_t data_len;
>
> --
> 2.26.2

This looks great to me, and thanks for being proactive about the fix.

  Reviewed-by: Taylor Blau <me@ttaylorr.com>

Thanks,
Taylor
Junio C Hamano April 24, 2020, 8:02 p.m. UTC | #5
Derrick Stolee <stolee@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
> Date: Fri, 24 Apr 2020 13:11:13 +0000
> Subject: [PATCH] multi-pack-index: close file descriptor after mmap
>
> We recently discovered that the commit-graph was not closing its
> file descriptor after memory-mapping the file contents. After this
> mmap() succeeds, there is no need to keep the file descriptor open.
> In fact, there is signficant reason to close it so we do not run
> out of descriptors.

The above is sufficient a justification.  Let's leave the remaining
two paragraphs under three-dashes.

> This is entirely my fault for not knowing that we can have an open
> mmap while also closing the descriptor. Some could blame this on
> being a new contributor when the series was first submitted, but
> even years later this is still new information to me.
>
> That made me realize that I used the same pattern when opening a
> multi-pack-index. Since this file is not (yet) incremental, there
> will be at most one descriptor open for this reason. It is still
> worth fixing, especially if we extend the format to be incremental
> like the commit-graph.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  midx.c | 4 +---
>  midx.h | 2 --
>  2 files changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/midx.c b/midx.c
> index 1527e464a7..60d30e873b 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -72,9 +72,9 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
>  	FREE_AND_NULL(midx_name);
>  
>  	midx_map = xmmap(NULL, midx_size, PROT_READ, MAP_PRIVATE, fd, 0);
> +	close(fd);
>  
>  	FLEX_ALLOC_STR(m, object_dir, object_dir);
> -	m->fd = fd;
>  	m->data = midx_map;
>  	m->data_len = midx_size;
>  	m->local = local;
> @@ -190,8 +190,6 @@ void close_midx(struct multi_pack_index *m)
>  		return;
>  
>  	munmap((unsigned char *)m->data, m->data_len);
> -	close(m->fd);
> -	m->fd = -1;
>  
>  	for (i = 0; i < m->num_packs; i++) {
>  		if (m->packs[i])
> diff --git a/midx.h b/midx.h
> index e6fa356b5c..b18cf53bc4 100644
> --- a/midx.h
> +++ b/midx.h
> @@ -12,8 +12,6 @@ struct repository;
>  struct multi_pack_index {
>  	struct multi_pack_index *next;
>  
> -	int fd;
> -
>  	const unsigned char *data;
>  	size_t data_len;
Derrick Stolee April 27, 2020, 10:57 a.m. UTC | #6
On 4/24/2020 4:02 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
> 
>> From: Derrick Stolee <dstolee@microsoft.com>
>> Date: Fri, 24 Apr 2020 13:11:13 +0000
>> Subject: [PATCH] multi-pack-index: close file descriptor after mmap
>>
>> We recently discovered that the commit-graph was not closing its
>> file descriptor after memory-mapping the file contents. After this
>> mmap() succeeds, there is no need to keep the file descriptor open.
>> In fact, there is signficant reason to close it so we do not run
>> out of descriptors.
> 
> The above is sufficient a justification.  Let's leave the remaining
> two paragraphs under three-dashes.

Works for me! I also thought there were too many first-person pronouns,
but erred on the side of reporting "how did we figure out this was an
issue?" in the message.

-Stolee
diff mbox series

Patch

diff --git a/commit-graph.c b/commit-graph.c
index b2d2fdfe3d..e9b458539f 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -69,7 +69,6 @@  static uint8_t oid_version(void)
 static struct commit_graph *alloc_commit_graph(void)
 {
 	struct commit_graph *g = xcalloc(1, sizeof(*g));
-	g->graph_fd = -1;
 
 	return g;
 }
@@ -123,14 +122,13 @@  struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st,
 		return NULL;
 	}
 	graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
-	ret = parse_commit_graph(graph_map, fd, graph_size);
+	close(fd);
+	ret = parse_commit_graph(graph_map, graph_size);
 
 	if (ret)
 		ret->odb = odb;
-	else {
+	else
 		munmap(graph_map, graph_size);
-		close(fd);
-	}
 
 	return ret;
 }
@@ -165,8 +163,7 @@  static int verify_commit_graph_lite(struct commit_graph *g)
 	return 0;
 }
 
-struct commit_graph *parse_commit_graph(void *graph_map, int fd,
-					size_t graph_size)
+struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size)
 {
 	const unsigned char *data, *chunk_lookup;
 	uint32_t i;
@@ -209,7 +206,6 @@  struct commit_graph *parse_commit_graph(void *graph_map, int fd,
 
 	graph->hash_len = the_hash_algo->rawsz;
 	graph->num_chunks = *(unsigned char*)(data + 6);
-	graph->graph_fd = fd;
 	graph->data = graph_map;
 	graph->data_len = graph_size;
 
@@ -2125,10 +2121,9 @@  void free_commit_graph(struct commit_graph *g)
 {
 	if (!g)
 		return;
-	if (g->graph_fd >= 0) {
+	if (g->data) {
 		munmap((void *)g->data, g->data_len);
 		g->data = NULL;
-		close(g->graph_fd);
 	}
 	free(g->filename);
 	free(g);
diff --git a/commit-graph.h b/commit-graph.h
index 98ef121924..a0a2c4a1e5 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -66,8 +66,7 @@  struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st,
 						 struct object_directory *odb);
 struct commit_graph *read_commit_graph_one(struct repository *r,
 					   struct object_directory *odb);
-struct commit_graph *parse_commit_graph(void *graph_map, int fd,
-					size_t graph_size);
+struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size);
 
 /*
  * Return 1 if and only if the repository has a commit-graph
diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c
index 0157acbf2e..9fd1c04edd 100644
--- a/fuzz-commit-graph.c
+++ b/fuzz-commit-graph.c
@@ -1,8 +1,7 @@ 
 #include "commit-graph.h"
 #include "repository.h"
 
-struct commit_graph *parse_commit_graph(void *graph_map, int fd,
-					size_t graph_size);
+struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size);
 
 int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);
 
@@ -11,7 +10,7 @@  int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
 	struct commit_graph *g;
 
 	initialize_the_repository();
-	g = parse_commit_graph((void *)data, -1, size);
+	g = parse_commit_graph((void *)data, size);
 	repo_clear(the_repository);
 	free(g);