[v2,01/13] hash: add an algo member to struct object_id

Message ID

20210426010301.1093562-2-sandals@crustytoothpaste.net (mailing list archive)

State

New, archived

Headers

Return-Path: <git-owner@kernel.org>
From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: <git@vger.kernel.org>
Cc: Derrick Stolee <dstolee@microsoft.com>, Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Subject: [PATCH v2 01/13] hash: add an algo member to struct object_id
Date: Mon, 26 Apr 2021 01:02:49 +0000
Message-Id: <20210426010301.1093562-2-sandals@crustytoothpaste.net>
In-Reply-To: <20210426010301.1093562-1-sandals@crustytoothpaste.net>
References: <20210426010301.1093562-1-sandals@crustytoothpaste.net>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

SHA-256 / SHA-1 interop, part 1
[v2,00/13] SHA-256 / SHA-1 interop, part 1
[v2,01/13] hash: add an algo member to struct object_id
[v2,02/13] Always use oidread to read into struct object_id
[v2,03/13] http-push: set algorithm when reading object ID
[v2,04/13] hash: add a function to finalize object IDs
[v2,05/13] Use the final_oid_fn to finalize hashing of object IDs
[v2,06/13] builtin/pack-redundant: avoid casting buffers to struct object_id
[v2,07/13] hash: set, copy, and use algo field in struct object_id
[v2,08/13] hash: provide per-algorithm null OIDs
[v2,09/13] builtin/show-index: set the algorithm for object IDs
[v2,10/13] commit-graph: don't store file hashes as struct object_id
[v2,11/13] builtin/pack-objects: avoid using struct object_id for pack hash
[v2,12/13] hex: default to the_hash_algo on zero algorithm value
[v2,13/13] hex: print objects using the hash algorithm member

Commit Message

brian m. carlson April 26, 2021, 1:02 a.m. UTC

Now that we're working with multiple hash algorithms in the same repo,
it's best if we label each object ID with its algorithm so we can
determine how to format a given object ID. Add a member called algo to
struct object_id.

Performance testing on object ID-heavy workloads doesn't reveal a clear
change in performance. In performance tests t0001 and t1450, there are
slight variations in both directions, but all measurements are within
the margin of error.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 hash.h | 1 +
 1 file changed, 1 insertion(+)
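
As a rough illustration of what the new member enables, the sketch below formats an object ID using the length of its own algorithm rather than a repository-wide default. The `algos` table, `struct hash_algo_info` and `format_oid_hex()` are hypothetical, simplified stand-ins for git's `hash_algos[]` and hex helpers; only the `struct object_id` layout (taken from this patch) and the `GIT_HASH_*` / `GIT_MAX_RAWSZ` values follow git's `hash.h`.

```c
#include <assert.h>
#include <stddef.h>

/* Values as in git's hash.h; everything else here is a simplified stand-in. */
#define GIT_MAX_RAWSZ 32
#define GIT_HASH_UNKNOWN 0
#define GIT_HASH_SHA1 1
#define GIT_HASH_SHA256 2

/* Hypothetical per-algorithm info, indexed by GIT_HASH_* value. */
struct hash_algo_info {
	const char *name;
	size_t rawsz;
};

static const struct hash_algo_info algos[] = {
	{ "unknown", 0 },
	{ "sha1", 20 },
	{ "sha256", 32 },
};

/* The layout introduced by this patch. */
struct object_id {
	unsigned char hash[GIT_MAX_RAWSZ];
	int algo;
};

/*
 * Hypothetical formatter: writes the lowercase hex form of oid into buf,
 * which must hold at least 2 * GIT_MAX_RAWSZ + 1 bytes, and returns the
 * number of hex digits written. The length comes from oid->algo.
 */
static size_t format_oid_hex(char *buf, const struct object_id *oid)
{
	static const char hexdigits[] = "0123456789abcdef";
	size_t i, rawsz;

	assert(oid->algo > GIT_HASH_UNKNOWN && oid->algo <= GIT_HASH_SHA256);
	rawsz = algos[oid->algo].rawsz;
	for (i = 0; i < rawsz; i++) {
		buf[2 * i] = hexdigits[oid->hash[i] >> 4];
		buf[2 * i + 1] = hexdigits[oid->hash[i] & 0xf];
	}
	buf[2 * rawsz] = '\0';
	return 2 * rawsz;
}
```

With `algo` set to `GIT_HASH_SHA1` the function emits 40 hex digits; with `GIT_HASH_SHA256` it emits 64, even though both IDs occupy the same 32-byte buffer.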

Comments

Matheus Tavares May 7, 2021, 1:58 p.m. UTC | #1

Hi, brian

On Sun, Apr 25, 2021 at 10:03 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> Now that we're working with multiple hash algorithms in the same repo,
> it's best if we label each object ID with its algorithm so we can
> determine how to format a given object ID. Add a member called algo to
> struct object_id.

In parallel-checkout.c:send_one_item(), I used hashcpy() instead of
oidcpy() to prepare the packet data that is sent to the checkout
workers through a pipe.

I avoided oidcpy() because it would copy the whole GIT_MAX_RAWSZ
bytes, and the last part could be uninitialized, leading to a Valgrind
warning about passing uninitialized bytes to a write() syscall. There
is no real harm in this case, but I wanted to avoid the warning as it
might confuse someone trying to debug this code, me included.
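
A simplified sketch of the two copy helpers being compared, assuming the behavior described here: `hashcpy()` copies only the active algorithm's raw size and knows nothing about `algo`, while `oidcpy()` copies the full `GIT_MAX_RAWSZ` buffer. The `demo_` functions are hypothetical stand-ins (the raw size is passed explicitly instead of coming from `the_hash_algo`), and having `oidcpy()` also copy `algo` is an assumption based on the title of patch 07 in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GIT_MAX_RAWSZ 32
#define GIT_HASH_SHA1 1

struct object_id {
	unsigned char hash[GIT_MAX_RAWSZ];
	int algo;
};

/*
 * hashcpy()-style copy: only rawsz bytes (20 for SHA-1) are copied.
 * The destination's algo member is never touched.
 */
static void demo_hashcpy(unsigned char *dst, const unsigned char *src,
			 size_t rawsz)
{
	assert(rawsz <= GIT_MAX_RAWSZ);
	memcpy(dst, src, rawsz);
}

/*
 * oidcpy()-style copy: the whole GIT_MAX_RAWSZ buffer is copied, including
 * any tail bytes the source never wrote. Copying algo is an assumption.
 */
static void demo_oidcpy(struct object_id *dst, const struct object_id *src)
{
	memcpy(dst->hash, src->hash, GIT_MAX_RAWSZ);
	dst->algo = src->algo;
}
```

For a SHA-1 ID, `demo_hashcpy()` fills 20 bytes and leaves `algo` as it was (zero in a zeroed packet), whereas `demo_oidcpy()` also carries bytes 20 through 31, which may never have been written; that tail is what Valgrind flags when the structure is passed to `write()`.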

The problem with this approach, of course, is that it will not copy
the new `algo` field, leaving it as zero for all items. So, what do
you think would be best in this situation? Some ideas that came to
mind were:

1. Make oidcpy() only copy `hash_algos[src->algo].rawsz` bytes. (But
then we would probably need to branch in case `src->algo` is zero,
right?)

2. Reintroduce the oid_pad_buffer() function from your v1, and use it
in parallel-checkout.c:send_one_item(), after oidcpy(). This would
then zero out the copied uninitialized bytes (with the cost of one
additional memcpy() per item, but this might be negligible here).

3. Use oidcpy() as-is, without additional padding, and let Valgrind
warn. This false-positive warning might not be so problematic after all,
and maybe I'm just overthinking things :)

What do you think?

Thanks,
Matheus

brian m. carlson May 7, 2021, 8:07 p.m. UTC | #2

On 2021-05-07 at 13:58:42, Matheus Tavares Bernardino wrote:
> Hi, brian

Hey,

> 1. Make oidcpy() only copy `hash_algos[src->algo].rawsz` bytes. (But
> then we would probably need to branch in case `src->algo` is zero,
> right?)

Yeah, this will likely incur a performance cost.  I'd recommend avoiding
this if possible.
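
Option 1 might look like the following sketch. The `algo_rawsz` table, `default_algo` (standing in for `the_hash_algo`) and `oidcpy_rawsz()` are hypothetical; treating a zero `algo` as "use the default" mirrors the convention that patch 12 of this series adopts for hex output. The conditional runs on every copy, which is the kind of per-call cost being weighed here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GIT_MAX_RAWSZ 32
#define GIT_HASH_UNKNOWN 0
#define GIT_HASH_SHA1 1
#define GIT_HASH_SHA256 2

/* Raw sizes indexed by GIT_HASH_* value (simplified stand-in). */
static const size_t algo_rawsz[] = { 0, 20, 32 };

/* Stand-in for the_hash_algo (assumption: a plain global here). */
static int default_algo = GIT_HASH_SHA1;

struct object_id {
	unsigned char hash[GIT_MAX_RAWSZ];
	int algo;
};

/*
 * Hypothetical "option 1" copy: only the bytes the source's algorithm
 * uses are copied. A zero algo means "not set", so fall back to the
 * default algorithm; this branch is taken on every call.
 */
static void oidcpy_rawsz(struct object_id *dst, const struct object_id *src)
{
	int algo = src->algo ? src->algo : default_algo;

	assert(algo > GIT_HASH_UNKNOWN && algo <= GIT_HASH_SHA256);
	memcpy(dst->hash, src->hash, algo_rawsz[algo]);
	dst->algo = src->algo;
}
```

Only the source's bytes are copied; the destination's tail is left as it was, so a destination that was never zeroed would still carry uninitialized bytes into `write()`.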

> 2. Reintroduce the oid_pad_buffer() function from your v1, and use it
> in parallel-checkout.c:send_one_item(), after oidcpy(). This would
> then zero out the copied uninitialized bytes (with the cost of one
> additional memcpy() per item, but this might be negligible here).

This is fine with me.  I didn't have a use for it anymore, but you've
clearly found one, and I think this is probably the best approach here.
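
Option 2 can be sketched as copying the whole buffer as `oidcpy()` does and then zeroing everything past the algorithm's raw size. The real signature of v1's `oid_pad_buffer()` is not shown in this thread, so `pad_oid_hash()` and `copy_for_wire()` below are hypothetical names built on simplified stand-in definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GIT_MAX_RAWSZ 32
#define GIT_HASH_SHA1 1
#define GIT_HASH_SHA256 2

/* Raw sizes indexed by GIT_HASH_* value (simplified stand-in). */
static const size_t algo_rawsz[] = { 0, 20, 32 };

struct object_id {
	unsigned char hash[GIT_MAX_RAWSZ];
	int algo;
};

/*
 * Hypothetical padding helper in the spirit of v1's oid_pad_buffer():
 * zero every byte past the algorithm's raw size.
 */
static void pad_oid_hash(unsigned char *buf, int algo)
{
	size_t rawsz;

	assert(algo >= GIT_HASH_SHA1 && algo <= GIT_HASH_SHA256);
	rawsz = algo_rawsz[algo];
	memset(buf + rawsz, 0, GIT_MAX_RAWSZ - rawsz);
}

/*
 * Copy as oidcpy() would (full buffer plus algo), then pad the copy so
 * every byte later written to the pipe is defined.
 */
static void copy_for_wire(struct object_id *dst, const struct object_id *src)
{
	memcpy(dst->hash, src->hash, GIT_MAX_RAWSZ);
	dst->algo = src->algo;
	pad_oid_hash(dst->hash, dst->algo);
}
```

The extra work is one `memset()` of at most 12 bytes per item for SHA-1 and none for SHA-256, and because every byte of the copy is written explicitly, nothing uninitialized reaches the pipe.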

> 3. Use oidcpy() as-is, without additional padding, and let Valgrind
> warn. This false-positive warning might not be so problematic after all,
> and maybe I'm just overthinking things :)

I'm okay with this, but I don't know if the other end is security
sensitive and might need unused data zeroed.  If so, we should
definitely avoid this option.

Patch

diff --git a/hash.h b/hash.h
index 3fb0c3d400..dafdcb3335 100644
--- a/hash.h
+++ b/hash.h
@@ -181,6 +181,7 @@  static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
 
 struct object_id {
 	unsigned char hash[GIT_MAX_RAWSZ];
+	int algo;
 };
 
 #define the_hash_algo the_repository->hash_algo
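
To put the size change in concrete terms, the sketch below compares the layout before and after this hunk. On common ABIs with a 4-byte `int`, the structure grows from 32 to 36 bytes with no padding, since 32 is already a multiple of `int`'s alignment; the `_Static_assert` checks encode that assumption and need a C11 compiler. `struct object_id_before` and `oid_growth()` are hypothetical names used only for this comparison.

```c
#include <assert.h>
#include <stddef.h>

#define GIT_MAX_RAWSZ 32

/* Layout before this patch. */
struct object_id_before {
	unsigned char hash[GIT_MAX_RAWSZ];
};

/* Layout after this patch. */
struct object_id {
	unsigned char hash[GIT_MAX_RAWSZ];
	int algo;
};

/*
 * Compile-time checks: the hash stays at offset 0 and algo follows it
 * directly (assumes int's alignment divides 32, true on common ABIs).
 */
_Static_assert(offsetof(struct object_id, hash) == 0, "hash is first");
_Static_assert(offsetof(struct object_id, algo) == GIT_MAX_RAWSZ,
	       "algo directly follows hash");

/* Extra bytes each object ID now carries. */
static size_t oid_growth(void)
{
	return sizeof(struct object_id) - sizeof(struct object_id_before);
}
```

Because `hash` remains the first member at offset 0, code that only reads `oid->hash` is unaffected by the layout change; the added bytes are the main cost the performance measurements in the commit message would reflect.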