Message ID | 20210426010301.1093562-2-sandals@crustytoothpaste.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | SHA-256 / SHA-1 interop, part 1 | expand |
Hi, brian On Sun, Apr 25, 2021 at 10:03 PM brian m. carlson <sandals@crustytoothpaste.net> wrote: > > Now that we're working with multiple hash algorithms in the same repo, > it's best if we label each object ID with its algorithm so we can > determine how to format a given object ID. Add a member called algo to > struct object_id. In parallel-checkout.c:send_one_item(), I used hashcpy() instead of oidcpy() to prepare the packet data that is sent to the checkout workers through a pipe. I avoided oidcpy() because it would copy the whole GIT_MAX_RAWSZ bytes, and the last part could be uninitialized, leading to a Valgrind warning about passing uninitialized bytes to a write() syscall. There is no real harm in this case, but I wanted to avoid the warning as it might confuse someone trying to debug this code, me included. The problem with this approach, of course, is that it will not copy the new `algo` field, leaving it as zero for all items. So, what do you think would be best in this situation? Some ideas that came through my mind were: 1. Make oidcpy() only copy `hash_algos[src->algo].rawsz` bytes. (But then we would probably need to branch in case `src->algo` is zero, right?) 2. Reintroduce the oid_pad_buffer() function from your v1, and use it in parallel-checkout.c:send_one_item(), after oidcpy(). This would then zero out the copied uninitialized bytes (with the cost of one additional memcpy() per item, but this might be neglectable here). 3. Use oidcpy() as-is, without additional padding, and let Valgrind warn. This false-positive warn might not be so problematic after all, and maybe I'm just overthinking things :) What do you think? Thanks, Matheus
On 2021-05-07 at 13:58:42, Matheus Tavares Bernardino wrote: > Hi, brian Hey, > 1. Make oidcpy() only copy `hash_algos[src->algo].rawsz` bytes. (But > then we would probably need to branch in case `src->algo` is zero, > right?) Yeah, this will likely incur a performance cost. I'd recommend avoiding this if possible. > 2. Reintroduce the oid_pad_buffer() function from your v1, and use it > in parallel-checkout.c:send_one_item(), after oidcpy(). This would > then zero out the copied uninitialized bytes (with the cost of one > additional memcpy() per item, but this might be neglectable here). This is fine with me. I didn't have a use for it anymore, but you've clearly found one, and I think this is probably the best approach here. > 3. Use oidcpy() as-is, without additional padding, and let Valgrind > warn. This false-positive warn might not be so problematic after all, > and maybe I'm just overthinking things :) I'm okay with this, but I don't know if the other end is security sensitive and might need unused data zeroed. If so, we should definitely avoid this option.
diff --git a/hash.h b/hash.h index 3fb0c3d400..dafdcb3335 100644 --- a/hash.h +++ b/hash.h @@ -181,6 +181,7 @@ static inline int hash_algo_by_ptr(const struct git_hash_algo *p) struct object_id { unsigned char hash[GIT_MAX_RAWSZ]; + int algo; }; #define the_hash_algo the_repository->hash_algo
Now that we're working with multiple hash algorithms in the same repo, it's best if we label each object ID with its algorithm so we can determine how to format a given object ID. Add a member called algo to struct object_id. Performance testing on object ID-heavy workloads doesn't reveal a clear change in performance. Out of performance tests t0001 and t1450, there are slight variations in performance both up and down, but all measurements are within the margin of error. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> --- hash.h | 1 + 1 file changed, 1 insertion(+)