[v2,16/17] chunk-format: restore duplicate chunk checks

Message ID	669eeec707ab92a3e5983ad12baddc2c15012d43.1611759716.git.gitgitgadget@gmail.com (mailing list archive)
State	Superseded
Headers
Return-Path: <git-owner@kernel.org>
Message-Id: <669eeec707ab92a3e5983ad12baddc2c15012d43.1611759716.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.848.v2.git.1611759716.gitgitgadget@gmail.com>
References: <pull.848.git.1611676886.gitgitgadget@gmail.com>
        <pull.848.v2.git.1611759716.gitgitgadget@gmail.com>
Date: Wed, 27 Jan 2021 15:01:55 +0000
Subject: [PATCH v2 16/17] chunk-format: restore duplicate chunk checks
Fcc: Sent
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de,
        szeder.dev@gmail.com, Chris Torek <chris.torek@gmail.com>,
        Derrick Stolee <stolee@gmail.com>,
        Derrick Stolee <derrickstolee@github.com>,
        Derrick Stolee <dstolee@microsoft.com>
Precedence: bulk
From: Derrick Stolee <dstolee@microsoft.com>
Series	Refactor chunk-format into an API
[v2,00/17] Refactor chunk-format into an API
[v2,01/17] commit-graph: anonymize data in chunk_write_fn
[v2,02/17] chunk-format: create chunk format write API
[v2,03/17] commit-graph: use chunk-format write API
[v2,04/17] midx: rename pack_info to write_midx_context
[v2,05/17] midx: use context in write_midx_pack_names()
[v2,06/17] midx: add entries to write_midx_context
[v2,07/17] midx: add pack_perm to write_midx_context
[v2,08/17] midx: add num_large_offsets to write_midx_context
[v2,09/17] midx: return success/failure in chunk write methods
[v2,10/17] midx: drop chunk progress during write
[v2,11/17] midx: use chunk-format API in write_midx_internal()
[v2,12/17] chunk-format: create read chunk API
[v2,13/17] commit-graph: use chunk-format read API
[v2,14/17] midx: use chunk-format read API
[v2,15/17] midx: use 64-bit multiplication for chunk sizes
[v2,16/17] chunk-format: restore duplicate chunk checks
[v2,17/17] chunk-format: add technical docs

Commit Message

Derrick Stolee Jan. 27, 2021, 3:01 p.m. UTC

From: Derrick Stolee <dstolee@microsoft.com>

Before refactoring into the chunk-format API, the commit-graph parsing
logic included checks for duplicate chunks. It is unlikely that we would
desire a chunk-based file format that allows duplicate chunk IDs in the
table of contents, so add duplicate checks into
read_table_of_contents().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 chunk-format.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Junio C Hamano Feb. 5, 2021, 12:05 a.m. UTC | #1

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> Before refactoring into the chunk-format API, the commit-graph parsing
> logic included checks for duplicate chunks. It is unlikely that we would
> desire a chunk-based file format that allows duplicate chunk IDs in the
> table of contents, so add duplicate checks into
> read_table_of_contents().

Makes sense.  This answers a question I had while reading one of the
previous steps about the design, I think.

However...

> diff --git a/chunk-format.c b/chunk-format.c
> index 74501084cf8..1ee875df423 100644
> --- a/chunk-format.c
> +++ b/chunk-format.c
> @@ -14,6 +14,7 @@ struct chunk_info {
>  	chunk_write_fn write_fn;
>  
>  	const void *start;
> +	unsigned found:1;

This defines a .found member ...

> @@ -98,6 +99,7 @@ int read_table_of_contents(struct chunkfile *cf,
>  			   uint64_t toc_offset,
>  			   int toc_length)
>  {
> +	int i;
>  	uint32_t chunk_id;
>  	const unsigned char *table_of_contents = mfile + toc_offset;
>  
> @@ -124,6 +126,14 @@ int read_table_of_contents(struct chunkfile *cf,
>  			return -1;
>  		}
>  
> +		for (i = 0; i < cf->chunks_nr; i++) {
> +			if (cf->chunks[i].id == chunk_id) {
> +				error(_("duplicate chunk ID %"PRIx32" found"),
> +					chunk_id);
> +				return -1;
> +			}
> +		}
> +
>  		cf->chunks[cf->chunks_nr].id = chunk_id;
>  		cf->chunks[cf->chunks_nr].start = mfile + chunk_offset;
>  		cf->chunks[cf->chunks_nr].size = next_chunk_offset - chunk_offset;

... and no new code touches it.

The way a duplicate is found is by having an inner loop that checks the
IDs of chunks we've seen so far (quadratic, but presumably that
would not matter as long as we'd be dealing with just half a dozen
chunk types).

Is the .found bit used for something else and needs to be added in a
different step?

Derrick Stolee Feb. 5, 2021, 12:31 p.m. UTC | #2

On 2/4/2021 7:05 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>  	const void *start;
>> +	unsigned found:1;
> 
> This defines a .found member ...
> ... and no new code touches it.
> 
> The way a duplicate is found is by having an inner loop that checks the
> IDs of chunks we've seen so far (quadratic, but presumably that
> would not matter as long as we'd be dealing with just half a dozen
> chunk types).
> 
> Is the .found bit used for something else and needs to be added in a
> different step?

Nope. It is just noise that I should have caught and deleted.

Thanks,
-Stolee

diff --git a/chunk-format.c b/chunk-format.c
index 74501084cf8..1ee875df423 100644
--- a/chunk-format.c
+++ b/chunk-format.c
@@ -14,6 +14,7 @@  struct chunk_info {
 	chunk_write_fn write_fn;
 
 	const void *start;
+	unsigned found:1;
 };
 
 struct chunkfile {
@@ -98,6 +99,7 @@  int read_table_of_contents(struct chunkfile *cf,
 			   uint64_t toc_offset,
 			   int toc_length)
 {
+	int i;
 	uint32_t chunk_id;
 	const unsigned char *table_of_contents = mfile + toc_offset;
 
@@ -124,6 +126,14 @@  int read_table_of_contents(struct chunkfile *cf,
 			return -1;
 		}
 
+		for (i = 0; i < cf->chunks_nr; i++) {
+			if (cf->chunks[i].id == chunk_id) {
+				error(_("duplicate chunk ID %"PRIx32" found"),
+					chunk_id);
+				return -1;
+			}
+		}
+
 		cf->chunks[cf->chunks_nr].id = chunk_id;
 		cf->chunks[cf->chunks_nr].start = mfile + chunk_offset;
 		cf->chunks[cf->chunks_nr].size = next_chunk_offset - chunk_offset;
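
The duplicate check added by the hunk above can be sketched in isolation as follows. This is a minimal, standalone illustration and not the code in chunk-format.c: the helper name check_unique_chunk_ids() and the flat array of IDs are assumptions made for the example, and fprintf() stands in for git's error(). It uses the same approach discussed in the thread: each new ID is compared against every ID already seen, which is quadratic but cheap for the handful of chunk types a file is expected to have, and it needs no extra per-chunk flag such as the unused .found bit.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of git): return 0 if all chunk IDs in
 * ids[0..nr) are distinct, or -1 after reporting the first duplicate.
 *
 * For each entry, scan the entries that precede it. This mirrors the
 * inner loop in the patch, which scans cf->chunks[0..chunks_nr) before
 * appending a new chunk.
 */
static int check_unique_chunk_ids(const uint32_t *ids, size_t nr)
{
	size_t i, j;

	assert(ids != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		for (j = 0; j < i; j++) {
			if (ids[j] == ids[i]) {
				/* stand-in for git's error() reporting */
				fprintf(stderr,
					"error: duplicate chunk ID %" PRIx32 " found\n",
					ids[i]);
				return -1;
			}
		}
	}
	return 0;
}
```

A caller would pass the IDs in table-of-contents order and treat a -1 return as a malformed file, as read_table_of_contents() does when it returns -1.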
