[14/17] midx: use chunk-format read API

Message ID	cb145e0e32afed99b9bfa822c76f48bee18885ba.1611676886.git.gitgitgadget@gmail.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <git-owner@kernel.org> Message-Id: <cb145e0e32afed99b9bfa822c76f48bee18885ba.1611676886.git.gitgitgadget@gmail.com> In-Reply-To: <pull.848.git.1611676886.gitgitgadget@gmail.com> References: <pull.848.git.1611676886.gitgitgadget@gmail.com> Date: Tue, 26 Jan 2021 16:01:23 +0000 Subject: [PATCH 14/17] midx: use chunk-format read API Fcc: Sent Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de, szeder.dev@gmail.com, Derrick Stolee <derrickstolee@github.com>, Derrick Stolee <dstolee@microsoft.com> Precedence: bulk From: Derrick Stolee <dstolee@microsoft.com>
Series	Refactor chunk-format into an API \| expand [00/17] Refactor chunk-format into an API [01/17] commit-graph: anonymize data in chunk_write_fn [02/17] chunk-format: create chunk format write API [03/17] commit-graph: use chunk-format write API [04/17] midx: rename pack_info to write_midx_context [05/17] midx: use context in write_midx_pack_names() [06/17] midx: add entries to write_midx_context [07/17] midx: add pack_perm to write_midx_context [08/17] midx: add num_large_offsets to write_midx_context [09/17] midx: return success/failure in chunk write methods [10/17] midx: drop chunk progress during write [11/17] midx: use chunk-format API in write_midx_internal() [12/17] chunk-format: create read chunk API [13/17] commit-graph: use chunk-format read API [14/17] midx: use chunk-format read API [15/17] midx: use 64-bit multiplication for chunk sizes [16/17] chunk-format: restore duplicate chunk checks [17/17] chunk-format: add technical docs

Message ID

cb145e0e32afed99b9bfa822c76f48bee18885ba.1611676886.git.gitgitgadget@gmail.com (mailing list archive)

State

Superseded

Headers

Message-Id: 
 <cb145e0e32afed99b9bfa822c76f48bee18885ba.1611676886.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.848.git.1611676886.gitgitgadget@gmail.com>
References: <pull.848.git.1611676886.gitgitgadget@gmail.com>
Date: Tue, 26 Jan 2021 16:01:23 +0000
Subject: [PATCH 14/17] midx: use chunk-format read API
Fcc: Sent
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: me@ttaylorr.com, gitster@pobox.com, l.s.r@web.de,
        szeder.dev@gmail.com, Derrick Stolee <derrickstolee@github.com>,
        Derrick Stolee <dstolee@microsoft.com>
Precedence: bulk
From: Derrick Stolee <dstolee@microsoft.com>

Series

Refactor chunk-format into an API | expand

Commit Message

Derrick Stolee Jan. 26, 2021, 4:01 p.m. UTC

From: Derrick Stolee <dstolee@microsoft.com>

Instead of parsing the table of contents directly, use the chunk-format
API methods read_table_of_contents() and pair_chunk(). In particular, we
can use the return value of pair_chunk() to generate an error when a
required chunk is missing.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 midx.c                      | 103 ++++++++++++++++++++----------------
 t/t5319-multi-pack-index.sh |   6 +--
 2 files changed, 60 insertions(+), 49 deletions(-)

Comments

Taylor Blau Jan. 27, 2021, 3:06 a.m. UTC | #1

On Tue, Jan 26, 2021 at 04:01:23PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Instead of parsing the table of contents directly, use the chunk-format
> API methods read_table_of_contents() and pair_chunk(). In particular, we
> can use the return value of pair_chunk() to generate an error when a
> required chunk is missing.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  midx.c                      | 103 ++++++++++++++++++++----------------
>  t/t5319-multi-pack-index.sh |   6 +--
>  2 files changed, 60 insertions(+), 49 deletions(-)
>
> diff --git a/midx.c b/midx.c
> index 0bfd2d802b6..dd019c00795 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -54,6 +54,51 @@ static char *get_midx_filename(const char *object_dir)
>  	return xstrfmt("%s/pack/multi-pack-index", object_dir);
>  }
>
> +static int midx_read_pack_names(const unsigned char *chunk_start,
> +				size_t chunk_size, void *data)
> +{
> +	struct multi_pack_index *m = (struct multi_pack_index *)data;
> +	m->chunk_pack_names = chunk_start;
> +	return 0;
> +}

There are a lot of these callbacks that just assign some 'void **' to
point at chunk_start.

Maybe a good use of the "pair_chunk" name would be something like:

    int pair_chunk(struct chunkfile *cf, uint32_t id, const unsigned char **p);

which does the same as what you wrote here. So instead of what you
wrote, you could instead:

    pair_chunk(cf, MIDX_CHUNKID_PACKNAMES, &m->chunk_pack_names);

This would be in addition to the richer callback-style function which
allows the caller greater flexibility (e.g., for the Bloom filter
related readers in the commit-graph code).

Thanks,
Taylor

Derrick Stolee Jan. 27, 2021, 1:50 p.m. UTC | #2

On 1/26/2021 10:06 PM, Taylor Blau wrote:
> On Tue, Jan 26, 2021 at 04:01:23PM +0000, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> Instead of parsing the table of contents directly, use the chunk-format
>> API methods read_table_of_contents() and pair_chunk(). In particular, we
>> can use the return value of pair_chunk() to generate an error when a
>> required chunk is missing.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  midx.c                      | 103 ++++++++++++++++++++----------------
>>  t/t5319-multi-pack-index.sh |   6 +--
>>  2 files changed, 60 insertions(+), 49 deletions(-)
>>
>> diff --git a/midx.c b/midx.c
>> index 0bfd2d802b6..dd019c00795 100644
>> --- a/midx.c
>> +++ b/midx.c
>> @@ -54,6 +54,51 @@ static char *get_midx_filename(const char *object_dir)
>>  	return xstrfmt("%s/pack/multi-pack-index", object_dir);
>>  }
>>
>> +static int midx_read_pack_names(const unsigned char *chunk_start,
>> +				size_t chunk_size, void *data)
>> +{
>> +	struct multi_pack_index *m = (struct multi_pack_index *)data;
>> +	m->chunk_pack_names = chunk_start;
>> +	return 0;
>> +}
> 
> There are a lot of these callbacks that just assign some 'void **' to
> point at chunk_start.
> 
> Maybe a good use of the "pair_chunk" name would be something like:
> 
>     int pair_chunk(struct chunkfile *cf, uint32_t id, const unsigned char **p);
> 
> which does the same as what you wrote here. So instead of what you
> wrote, you could instead:
> 
>     pair_chunk(cf, MIDX_CHUNKID_PACKNAMES, &m->chunk_pack_names);
> 
> This would be in addition to the richer callback-style function which
> allows the caller greater flexibility (e.g., for the Bloom filter
> related readers in the commit-graph code).

You're right that _most_ callers just want to assign a pointer,
so this mechanism would be better. I'll make a different function,
read_chunk() perhaps, that relies on a callback for advanced users.

Thanks,
-Stolee

diff --git a/midx.c b/midx.c
index 0bfd2d802b6..dd019c00795 100644
--- a/midx.c
+++ b/midx.c
@@ -54,6 +54,51 @@  static char *get_midx_filename(const char *object_dir)
 	return xstrfmt("%s/pack/multi-pack-index", object_dir);
 }
 
+static int midx_read_pack_names(const unsigned char *chunk_start,
+				size_t chunk_size, void *data)
+{
+	struct multi_pack_index *m = (struct multi_pack_index *)data;
+	m->chunk_pack_names = chunk_start;
+	return 0;
+}
+
+static int midx_read_oid_fanout(const unsigned char *chunk_start,
+				size_t chunk_size, void *data)
+{
+	struct multi_pack_index *m = (struct multi_pack_index *)data;
+	m->chunk_oid_fanout = (uint32_t *)chunk_start;
+
+	if (chunk_size != 4 * 256) {
+		error(_("multi-pack-index OID fanout is of the wrong size"));
+		return 1;
+	}
+	return 0;
+}
+
+static int midx_read_oid_lookup(const unsigned char *chunk_start,
+				size_t chunk_size, void *data)
+{
+	struct multi_pack_index *m = (struct multi_pack_index *)data;
+	m->chunk_oid_lookup = chunk_start;
+	return 0;
+}
+
+static int midx_read_offsets(const unsigned char *chunk_start,
+			     size_t chunk_size, void *data)
+{
+	struct multi_pack_index *m = (struct multi_pack_index *)data;
+	m->chunk_object_offsets = chunk_start;
+	return 0;
+}
+
+static int midx_read_large_offsets(const unsigned char *chunk_start,
+				   size_t chunk_size, void *data)
+{
+	struct multi_pack_index *m = (struct multi_pack_index *)data;
+	m->chunk_large_offsets = chunk_start;
+	return 0;
+}
+
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local)
 {
 	struct multi_pack_index *m = NULL;
@@ -65,6 +110,7 @@  struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 	char *midx_name = get_midx_filename(object_dir);
 	uint32_t i;
 	const char *cur_pack_name;
+	struct chunkfile *cf = NULL;
 
 	fd = git_open(midx_name);
 
@@ -114,58 +160,23 @@  struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 
 	m->num_packs = get_be32(m->data + MIDX_BYTE_NUM_PACKS);
 
-	for (i = 0; i < m->num_chunks; i++) {
-		uint32_t chunk_id = get_be32(m->data + MIDX_HEADER_SIZE +
-					     MIDX_CHUNKLOOKUP_WIDTH * i);
-		uint64_t chunk_offset = get_be64(m->data + MIDX_HEADER_SIZE + 4 +
-						 MIDX_CHUNKLOOKUP_WIDTH * i);
-
-		if (chunk_offset >= m->data_len)
-			die(_("invalid chunk offset (too large)"));
-
-		switch (chunk_id) {
-			case MIDX_CHUNKID_PACKNAMES:
-				m->chunk_pack_names = m->data + chunk_offset;
-				break;
-
-			case MIDX_CHUNKID_OIDFANOUT:
-				m->chunk_oid_fanout = (uint32_t *)(m->data + chunk_offset);
-				break;
-
-			case MIDX_CHUNKID_OIDLOOKUP:
-				m->chunk_oid_lookup = m->data + chunk_offset;
-				break;
-
-			case MIDX_CHUNKID_OBJECTOFFSETS:
-				m->chunk_object_offsets = m->data + chunk_offset;
-				break;
-
-			case MIDX_CHUNKID_LARGEOFFSETS:
-				m->chunk_large_offsets = m->data + chunk_offset;
-				break;
-
-			case 0:
-				die(_("terminating multi-pack-index chunk id appears earlier than expected"));
-				break;
-
-			default:
-				/*
-				 * Do nothing on unrecognized chunks, allowing future
-				 * extensions to add optional chunks.
-				 */
-				break;
-		}
-	}
+	cf = init_chunkfile(NULL);
 
-	if (!m->chunk_pack_names)
+	if (read_table_of_contents(cf, m->data, midx_size,
+				   MIDX_HEADER_SIZE, m->num_chunks))
+		goto cleanup_fail;
+
+	if (pair_chunk(cf, MIDX_CHUNKID_PACKNAMES, midx_read_pack_names, m) == CHUNK_NOT_FOUND)
 		die(_("multi-pack-index missing required pack-name chunk"));
-	if (!m->chunk_oid_fanout)
+	if (pair_chunk(cf, MIDX_CHUNKID_OIDFANOUT, midx_read_oid_fanout, m) == CHUNK_NOT_FOUND)
 		die(_("multi-pack-index missing required OID fanout chunk"));
-	if (!m->chunk_oid_lookup)
+	if (pair_chunk(cf, MIDX_CHUNKID_OIDLOOKUP, midx_read_oid_lookup, m) == CHUNK_NOT_FOUND)
 		die(_("multi-pack-index missing required OID lookup chunk"));
-	if (!m->chunk_object_offsets)
+	if (pair_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS, midx_read_offsets, m) == CHUNK_NOT_FOUND)
 		die(_("multi-pack-index missing required object offsets chunk"));
 
+	pair_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS, midx_read_large_offsets, m);
+
 	m->num_objects = ntohl(m->chunk_oid_fanout[255]);
 
 	m->pack_names = xcalloc(m->num_packs, sizeof(*m->pack_names));
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 297de502a94..ad4e878b65b 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -314,12 +314,12 @@  test_expect_success 'verify bad OID version' '
 
 test_expect_success 'verify truncated chunk count' '
 	corrupt_midx_and_verify $MIDX_BYTE_CHUNK_COUNT "\01" $objdir \
-		"missing required"
+		"final chunk has non-zero id"
 '
 
 test_expect_success 'verify extended chunk count' '
 	corrupt_midx_and_verify $MIDX_BYTE_CHUNK_COUNT "\07" $objdir \
-		"terminating multi-pack-index chunk id appears earlier than expected"
+		"terminating chunk id appears earlier than expected"
 '
 
 test_expect_success 'verify missing required chunk' '
@@ -329,7 +329,7 @@  test_expect_success 'verify missing required chunk' '
 
 test_expect_success 'verify invalid chunk offset' '
 	corrupt_midx_and_verify $MIDX_BYTE_CHUNK_OFFSET "\01" $objdir \
-		"invalid chunk offset (too large)"
+		"improper chunk offset(s)"
 '
 
 test_expect_success 'verify packnames out of order' '

[14/17] midx: use chunk-format read API

Commit Message

Comments

Patch