diff mbox series

[v4,03/10] commit-graph: consolidate fill_commit_graph_info

Message ID 18bb3318a12c859c21c8e95285d551c48d31b54b.1602079786.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Implement Corrected Commit Date | expand

Commit Message

Philippe Blain via GitGitGadget Oct. 7, 2020, 2:09 p.m. UTC
From: Abhishek Kumar <abhishekkumar8222@gmail.com>

Both fill_commit_graph_info() and fill_commit_in_graph() parse
information present in commit data chunk. Let's simplify the
implementation by calling fill_commit_graph_info() within
fill_commit_in_graph().

fill_commit_graph_info() used to not load committer data from commit data
chunk. However, with the corrected committer date, we have to load
committer date to calculate generation number value.

e51217e15 (t5000: test tar files that overflow ustar headers,
30-06-2016) introduced a test 'generate tar with future mtime' that
creates a commit with committer date of (2 ^ 36 + 1) seconds since
EPOCH. The CDAT chunk provides 34-bits for storing committer date, thus
committer time overflows into generation number (within CDAT chunk) and
has undefined behavior.

The test used to pass as fill_commit_graph_info() would not set struct
member `date` of struct commit and loads committer date from the object
database, generating a tar file with the expected mtime.

However, with corrected commit date, we will load the committer date
from CDAT chunk (truncated to lower 34-bits to populate the generation
number. Thus, Git sets date and generates tar file with the truncated
mtime.

The ustar format (the header format used by most modern tar programs)
only has room for 11 (or 12, depending om some implementations) octal
digits for the size and mtime of each files.

Thus, setting a timestamp of 2 ^ 33 + 1 would overflow the 11-octal
digit implementations while still fitting into commit data chunk.

Since we want to test 12-octal digit implementations of ustar as well,
let's modify the existing test to no longer use commit-graph file.

Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
---
 commit-graph.c      | 27 ++++++++++-----------------
 t/t5000-tar-tree.sh | 20 +++++++++++++++++++-
 2 files changed, 29 insertions(+), 18 deletions(-)

Comments

Jakub Narębski Oct. 25, 2020, 10:52 a.m. UTC | #1
Hi Abhishek,

In short: everything is all right, except for the now duplicated test
names in t5000 after this commit.

"Abhishek Kumar via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Abhishek Kumar <abhishekkumar8222@gmail.com>
>
> Both fill_commit_graph_info() and fill_commit_in_graph() parse
> information present in commit data chunk. Let's simplify the
> implementation by calling fill_commit_graph_info() within
> fill_commit_in_graph().
>
> fill_commit_graph_info() used to not load committer data from commit data
> chunk. However, with the corrected committer date, we have to load
> committer date to calculate generation number value.

Nice writeup, however the last sentence would in my opinion read better
in the future tense: we don't use generation number v2 yet.  For
example:

  However, with upcoming switch to using corrected committer date as
  generation number v2, we will have to load committer date to compute
  generation number value anyway.

Or something like that - notice the minor addition and changes.

The following is slightly unrelated change, but we agreed that it would
be better to not separate them; the need for change to the t5000 test is
caused by the change described above.

>
> e51217e15 (t5000: test tar files that overflow ustar headers,
> 30-06-2016) introduced a test 'generate tar with future mtime' that
> creates a commit with committer date of (2 ^ 36 + 1) seconds since
> EPOCH. The CDAT chunk provides 34-bits for storing committer date, thus
> committer time overflows into generation number (within CDAT chunk) and
> has undefined behavior.
>
> The test used to pass as fill_commit_graph_info() would not set struct
> member `date` of struct commit and loads committer date from the object
> database, generating a tar file with the expected mtime.

I think it should be s/loads/load/, as in "would load", but I am not a
native English speaker.

>
> However, with corrected commit date, we will load the committer date
> from CDAT chunk (truncated to lower 34-bits to populate the generation
> number. Thus, Git sets date and generates tar file with the truncated
> mtime.
>
> The ustar format (the header format used by most modern tar programs)
> only has room for 11 (or 12, depending om some implementations) octal
> digits for the size and mtime of each files.
>
> Thus, setting a timestamp of 2 ^ 33 + 1 would overflow the 11-octal
> digit implementations while still fitting into commit data chunk.
>
> Since we want to test 12-octal digit implementations of ustar as well,
> let's modify the existing test to no longer use commit-graph file.

The description above is for me does not make it entirely clear that we
add new test for handling possible 11-octal digit overflow nearly
identical to the existing one, and turn off use of commit-graph file for
test that checks handling 12-octal digit overflow.

> Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
> ---
>  commit-graph.c      | 27 ++++++++++-----------------
>  t/t5000-tar-tree.sh | 20 +++++++++++++++++++-
>  2 files changed, 29 insertions(+), 18 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 94503e584b..e8362e144e 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -749,15 +749,24 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g,
>  	const unsigned char *commit_data;
>  	struct commit_graph_data *graph_data;
>  	uint32_t lex_index;
> +	uint64_t date_high, date_low;
>  
>  	while (pos < g->num_commits_in_base)
>  		g = g->base_graph;
>  
> +	if (pos >= g->num_commits + g->num_commits_in_base)
> +		die(_("invalid commit position. commit-graph is likely corrupt"));
> +
>  	lex_index = pos - g->num_commits_in_base;
>  	commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index;
>  
>  	graph_data = commit_graph_data_at(item);
>  	graph_data->graph_pos = pos;
> +
> +	date_high = get_be32(commit_data + g->hash_len + 8) & 0x3;
> +	date_low = get_be32(commit_data + g->hash_len + 12);
> +	item->date = (timestamp_t)((date_high << 32) | date_low);
> +
>  	graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
>  }
>  
> @@ -772,38 +781,22 @@ static int fill_commit_in_graph(struct repository *r,
>  {
>  	uint32_t edge_value;
>  	uint32_t *parent_data_ptr;
> -	uint64_t date_low, date_high;
>  	struct commit_list **pptr;
> -	struct commit_graph_data *graph_data;
>  	const unsigned char *commit_data;
>  	uint32_t lex_index;
>  
>  	while (pos < g->num_commits_in_base)
>  		g = g->base_graph;
>  
> -	if (pos >= g->num_commits + g->num_commits_in_base)
> -		die(_("invalid commit position. commit-graph is likely corrupt"));
> +	fill_commit_graph_info(item, g, pos);
>  
> -	/*
> -	 * Store the "full" position, but then use the
> -	 * "local" position for the rest of the calculation.
> -	 */
> -	graph_data = commit_graph_data_at(item);
> -	graph_data->graph_pos = pos;
>  	lex_index = pos - g->num_commits_in_base;
> -
>  	commit_data = g->chunk_commit_data + (g->hash_len + 16) * lex_index;
>  
>  	item->object.parsed = 1;
>  
>  	set_commit_tree(item, NULL);
>  
> -	date_high = get_be32(commit_data + g->hash_len + 8) & 0x3;
> -	date_low = get_be32(commit_data + g->hash_len + 12);
> -	item->date = (timestamp_t)((date_high << 32) | date_low);
> -
> -	graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
> -
>  	pptr = &item->parents;
>  
>  	edge_value = get_be32(commit_data + g->hash_len);

All right, looks good for me.

Here second change begins.

> diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
> index 3ebb0d3b65..8f41cdc509 100755
> --- a/t/t5000-tar-tree.sh
> +++ b/t/t5000-tar-tree.sh
> @@ -431,11 +431,29 @@ test_expect_success TAR_HUGE,LONG_IS_64BIT 'system tar can read our huge size' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success TIME_IS_64BIT 'set up repository with far-future commit' '
> +	rm -f .git/index &&
> +	echo foo >file &&
> +	git add file &&
> +	GIT_COMMITTER_DATE="@17179869183 +0000" \
> +		git commit -m "tempori parendum"
> +'
> +
> +test_expect_success TIME_IS_64BIT 'generate tar with future mtime' '
> +	git archive HEAD >future.tar
> +'
> +
> +test_expect_success TAR_HUGE,TIME_IS_64BIT,TIME_T_IS_64BIT 'system tar can read our future mtime' '
> +	echo 2514 >expect &&
> +	tar_info future.tar | cut -d" " -f2 >actual &&
> +	test_cmp expect actual
> +'
> +

Everything is all right, except we now have duplicated test names.

Perhaps in the three following tests we should use 'far-far-future
commit' and 'far future mtime' in place of current 'far-future commit'
and 'future mtime' for tests checking handling 12-digital ditgits
overflow, or add description how far the future is, for example
'far-future commit (2^11 + 1)', etc.

>  test_expect_success TIME_IS_64BIT 'set up repository with far-future commit' '
>  	rm -f .git/index &&
>  	echo content >file &&
>  	git add file &&
> -	GIT_COMMITTER_DATE="@68719476737 +0000" \
> +	GIT_TEST_COMMIT_GRAPH=0 GIT_COMMITTER_DATE="@68719476737 +0000" \
>  		git commit -m "tempori parendum"
>  '

Best,
Abhishek Kumar Oct. 27, 2020, 6:33 a.m. UTC | #2
Hello Dr. Narębski,

On Sun, Oct 25, 2020 at 11:52:42AM +0100, Jakub Narębski wrote:
> Hi Abhishek,
> 
> In short: everything is all right, except for the now duplicated test
> names in t5000 after this commit.
> 
> "Abhishek Kumar via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
> > From: Abhishek Kumar <abhishekkumar8222@gmail.com>
> >
> > Both fill_commit_graph_info() and fill_commit_in_graph() parse
> > information present in commit data chunk. Let's simplify the
> > implementation by calling fill_commit_graph_info() within
> > fill_commit_in_graph().
> >
> > fill_commit_graph_info() used to not load committer data from commit data
> > chunk. However, with the corrected committer date, we have to load
> > committer date to calculate generation number value.
> 
> Nice writeup, however the last sentence would in my opinion read better
> in the future tense: we don't use generation number v2 yet.  For
> example:
> 
>   However, with upcoming switch to using corrected committer date as
>   generation number v2, we will have to load committer date to compute
>   generation number value anyway.
> 
> Or something like that - notice the minor addition and changes.
> 

Thanks for the change, it looks better!

> The following is slightly unrelated change, but we agreed that it would
> be better to not separate them; the need for change to the t5000 test is
> caused by the change described above.

> 
> >
> > e51217e15 (t5000: test tar files that overflow ustar headers,
> > 30-06-2016) introduced a test 'generate tar with future mtime' that
> > creates a commit with committer date of (2 ^ 36 + 1) seconds since
> > EPOCH. The CDAT chunk provides 34-bits for storing committer date, thus
> > committer time overflows into generation number (within CDAT chunk) and
> > has undefined behavior.
> >
> > The test used to pass as fill_commit_graph_info() would not set struct
> > member `date` of struct commit and loads committer date from the object
> > database, generating a tar file with the expected mtime.
> 
> I think it should be s/loads/load/, as in "would load", but I am not a
> native English speaker.
> 

That's correct - since I have used "would not set" in the first half of
sentence, the later half should follow suit too.

> >
> > However, with corrected commit date, we will load the committer date
> > from CDAT chunk (truncated to lower 34-bits to populate the generation
> > number. Thus, Git sets date and generates tar file with the truncated
> > mtime.
> >
> > The ustar format (the header format used by most modern tar programs)
> > only has room for 11 (or 12, depending om some implementations) octal
> > digits for the size and mtime of each files.
> >
> > Thus, setting a timestamp of 2 ^ 33 + 1 would overflow the 11-octal
> > digit implementations while still fitting into commit data chunk.
> >
> > Since we want to test 12-octal digit implementations of ustar as well,
> > let's modify the existing test to no longer use commit-graph file.
> 
> The description above is for me does not make it entirely clear that we
> add new test for handling possible 11-octal digit overflow nearly
> identical to the existing one, and turn off use of commit-graph file for
> test that checks handling 12-octal digit overflow.
> 

Revised the last paragraphs to:

  The ustar format (the header format used by most modern tar programs)
  only has room for 11 (or 12, depending on some implementations) octal
  digits for the size and mtime of each file.

  To test the 11-octal digit implementation, we create a future commit
  with committer date of 2^34 - 1, which overflows 11-octal digits
  without overflowing 34-bits of the Commit Data chunk.

  To test the 12-octal digit implementation, the smallest committer date
  possible is 2^36, which overflows the Commit Data chunk and thus
  commit-graph must be disabled for the test.

> > Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
> > ---
> >  commit-graph.c      | 27 ++++++++++-----------------
> >  t/t5000-tar-tree.sh | 20 +++++++++++++++++++-
> >  2 files changed, 29 insertions(+), 18 deletions(-)
> >
> > diff --git a/commit-graph.c b/commit-graph.c
> > index 94503e584b..e8362e144e 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -749,15 +749,24 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g,
> >  	const unsigned char *commit_data;
> >  	struct commit_graph_data *graph_data;
> >  	uint32_t lex_index;
> > +	uint64_t date_high, date_low;
> >  
> >  	while (pos < g->num_commits_in_base)
> >  		g = g->base_graph;
> >  
> > +	if (pos >= g->num_commits + g->num_commits_in_base)
> > +		die(_("invalid commit position. commit-graph is likely corrupt"));
> > +
> >  	lex_index = pos - g->num_commits_in_base;
> >  	commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index;
> >  
> >  	graph_data = commit_graph_data_at(item);
> >  	graph_data->graph_pos = pos;
> > +
> > +	date_high = get_be32(commit_data + g->hash_len + 8) & 0x3;
> > +	date_low = get_be32(commit_data + g->hash_len + 12);
> > +	item->date = (timestamp_t)((date_high << 32) | date_low);
> > +
> >  	graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
> >  }
> >  
> > @@ -772,38 +781,22 @@ static int fill_commit_in_graph(struct repository *r,
> >  {
> >  	uint32_t edge_value;
> >  	uint32_t *parent_data_ptr;
> > -	uint64_t date_low, date_high;
> >  	struct commit_list **pptr;
> > -	struct commit_graph_data *graph_data;
> >  	const unsigned char *commit_data;
> >  	uint32_t lex_index;
> >  
> >  	while (pos < g->num_commits_in_base)
> >  		g = g->base_graph;
> >  
> > -	if (pos >= g->num_commits + g->num_commits_in_base)
> > -		die(_("invalid commit position. commit-graph is likely corrupt"));
> > +	fill_commit_graph_info(item, g, pos);
> >  
> > -	/*
> > -	 * Store the "full" position, but then use the
> > -	 * "local" position for the rest of the calculation.
> > -	 */
> > -	graph_data = commit_graph_data_at(item);
> > -	graph_data->graph_pos = pos;
> >  	lex_index = pos - g->num_commits_in_base;
> > -
> >  	commit_data = g->chunk_commit_data + (g->hash_len + 16) * lex_index;
> >  
> >  	item->object.parsed = 1;
> >  
> >  	set_commit_tree(item, NULL);
> >  
> > -	date_high = get_be32(commit_data + g->hash_len + 8) & 0x3;
> > -	date_low = get_be32(commit_data + g->hash_len + 12);
> > -	item->date = (timestamp_t)((date_high << 32) | date_low);
> > -
> > -	graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
> > -
> >  	pptr = &item->parents;
> >  
> >  	edge_value = get_be32(commit_data + g->hash_len);
> 
> All right, looks good for me.
> 
> Here second change begins.
> 
> > diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
> > index 3ebb0d3b65..8f41cdc509 100755
> > --- a/t/t5000-tar-tree.sh
> > +++ b/t/t5000-tar-tree.sh
> > @@ -431,11 +431,29 @@ test_expect_success TAR_HUGE,LONG_IS_64BIT 'system tar can read our huge size' '
> >  	test_cmp expect actual
> >  '
> >  
> > +test_expect_success TIME_IS_64BIT 'set up repository with far-future commit' '
> > +	rm -f .git/index &&
> > +	echo foo >file &&
> > +	git add file &&
> > +	GIT_COMMITTER_DATE="@17179869183 +0000" \
> > +		git commit -m "tempori parendum"
> > +'
> > +
> > +test_expect_success TIME_IS_64BIT 'generate tar with future mtime' '
> > +	git archive HEAD >future.tar
> > +'
> > +
> > +test_expect_success TAR_HUGE,TIME_IS_64BIT,TIME_T_IS_64BIT 'system tar can read our future mtime' '
> > +	echo 2514 >expect &&
> > +	tar_info future.tar | cut -d" " -f2 >actual &&
> > +	test_cmp expect actual
> > +'
> > +
> 
> Everything is all right, except we now have duplicated test names.
> 
> Perhaps in the three following tests we should use 'far-far-future
> commit' and 'far future mtime' in place of current 'far-future commit'
> and 'future mtime' for tests checking handling 12-digital ditgits
> overflow, or add description how far the future is, for example
> 'far-future commit (2^11 + 1)', etc.
> 

Changed, thanks for pointing this out.

> >  test_expect_success TIME_IS_64BIT 'set up repository with far-future commit' '
> >  	rm -f .git/index &&
> >  	echo content >file &&
> >  	git add file &&
> > -	GIT_COMMITTER_DATE="@68719476737 +0000" \
> > +	GIT_TEST_COMMIT_GRAPH=0 GIT_COMMITTER_DATE="@68719476737 +0000" \
> >  		git commit -m "tempori parendum"
> >  '
> 
> Best,
> -- 
> Jakub Narębski

Thanks
- Abhishek
diff mbox series

Patch

diff --git a/commit-graph.c b/commit-graph.c
index 94503e584b..e8362e144e 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -749,15 +749,24 @@  static void fill_commit_graph_info(struct commit *item, struct commit_graph *g,
 	const unsigned char *commit_data;
 	struct commit_graph_data *graph_data;
 	uint32_t lex_index;
+	uint64_t date_high, date_low;
 
 	while (pos < g->num_commits_in_base)
 		g = g->base_graph;
 
+	if (pos >= g->num_commits + g->num_commits_in_base)
+		die(_("invalid commit position. commit-graph is likely corrupt"));
+
 	lex_index = pos - g->num_commits_in_base;
 	commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index;
 
 	graph_data = commit_graph_data_at(item);
 	graph_data->graph_pos = pos;
+
+	date_high = get_be32(commit_data + g->hash_len + 8) & 0x3;
+	date_low = get_be32(commit_data + g->hash_len + 12);
+	item->date = (timestamp_t)((date_high << 32) | date_low);
+
 	graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
 }
 
@@ -772,38 +781,22 @@  static int fill_commit_in_graph(struct repository *r,
 {
 	uint32_t edge_value;
 	uint32_t *parent_data_ptr;
-	uint64_t date_low, date_high;
 	struct commit_list **pptr;
-	struct commit_graph_data *graph_data;
 	const unsigned char *commit_data;
 	uint32_t lex_index;
 
 	while (pos < g->num_commits_in_base)
 		g = g->base_graph;
 
-	if (pos >= g->num_commits + g->num_commits_in_base)
-		die(_("invalid commit position. commit-graph is likely corrupt"));
+	fill_commit_graph_info(item, g, pos);
 
-	/*
-	 * Store the "full" position, but then use the
-	 * "local" position for the rest of the calculation.
-	 */
-	graph_data = commit_graph_data_at(item);
-	graph_data->graph_pos = pos;
 	lex_index = pos - g->num_commits_in_base;
-
 	commit_data = g->chunk_commit_data + (g->hash_len + 16) * lex_index;
 
 	item->object.parsed = 1;
 
 	set_commit_tree(item, NULL);
 
-	date_high = get_be32(commit_data + g->hash_len + 8) & 0x3;
-	date_low = get_be32(commit_data + g->hash_len + 12);
-	item->date = (timestamp_t)((date_high << 32) | date_low);
-
-	graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
-
 	pptr = &item->parents;
 
 	edge_value = get_be32(commit_data + g->hash_len);
diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 3ebb0d3b65..8f41cdc509 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -431,11 +431,29 @@  test_expect_success TAR_HUGE,LONG_IS_64BIT 'system tar can read our huge size' '
 	test_cmp expect actual
 '
 
+test_expect_success TIME_IS_64BIT 'set up repository with far-future commit' '
+	rm -f .git/index &&
+	echo foo >file &&
+	git add file &&
+	GIT_COMMITTER_DATE="@17179869183 +0000" \
+		git commit -m "tempori parendum"
+'
+
+test_expect_success TIME_IS_64BIT 'generate tar with future mtime' '
+	git archive HEAD >future.tar
+'
+
+test_expect_success TAR_HUGE,TIME_IS_64BIT,TIME_T_IS_64BIT 'system tar can read our future mtime' '
+	echo 2514 >expect &&
+	tar_info future.tar | cut -d" " -f2 >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success TIME_IS_64BIT 'set up repository with far-future commit' '
 	rm -f .git/index &&
 	echo content >file &&
 	git add file &&
-	GIT_COMMITTER_DATE="@68719476737 +0000" \
+	GIT_TEST_COMMIT_GRAPH=0 GIT_COMMITTER_DATE="@68719476737 +0000" \
 		git commit -m "tempori parendum"
 '