[v3,2/3] t/: port helper/test-sha1.c to unit-tests/t-hash.c

Message ID: 20240523235945.26833-3-shyamthakkar001@gmail.com
Series: Port t0015-hash to the unit testing framework
  [v3,0/3] Port t0015-hash to the unit testing framework
  [v3,1/3] strbuf: introduce strbuf_addstrings() to repeatedly add a string
  [v3,2/3] t/: port helper/test-sha1.c to unit-tests/t-hash.c
  [v3,3/3] t/: port helper/test-sha256.c to unit-tests/t-hash.c

Ghanshyam Thakkar May 23, 2024, 11:59 p.m. UTC

t/helper/test-sha1 and t/t0015-hash.sh test the hash implementation of
SHA-1 in Git with basic SHA-1 hash values. Migrate them to the new unit
testing framework for better debugging and runtime performance.

The sha1 subcommand from test-tool is still not removed because it is
relied upon by t0013-sha1dc (which requires 'test-tool sha1' dying
when it is used on a file created to contain the known sha1 attack)
and pack_trailer():lib-pack.sh.

Helped-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Co-authored-by: Achu Luma <ach.lumap@gmail.com>
Signed-off-by: Achu Luma <ach.lumap@gmail.com>
Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
---
 Makefile              |  1 +
 t/t0015-hash.sh       | 22 ------------------
 t/unit-tests/t-hash.c | 54 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 55 insertions(+), 22 deletions(-)
 create mode 100644 t/unit-tests/t-hash.c

Patrick Steinhardt May 24, 2024, 1:30 p.m. UTC | #1

On Fri, May 24, 2024 at 05:29:44AM +0530, Ghanshyam Thakkar wrote:
> t/helper/test-sha1 and t/t0015-hash.sh test the hash implementation of
> SHA-1 in Git with basic SHA-1 hash values. Migrate them to the new unit
> testing framework for better debugging and runtime performance.
> 
> The sha1 subcommand from test-tool is still not removed because it is
> relied upon by t0013-sha1dc (which requires 'test-tool sha1' dying
> when it is used on a file created to contain the known sha1 attack)
> and pack_trailer():lib-pack.sh.

Can we refactor this test to stop doing that? E.g., would it work if we
used git-hash-object(1) to check that SHA1DC does its thing? Then we
could get rid of the helper altogether, as far as I understand.

> diff --git a/t/unit-tests/t-hash.c b/t/unit-tests/t-hash.c
> new file mode 100644
> index 0000000000..89dfea9cc1
> --- /dev/null
> +++ b/t/unit-tests/t-hash.c
> @@ -0,0 +1,54 @@
> +#include "test-lib.h"
> +#include "hash-ll.h"
> +#include "hex.h"
> +#include "strbuf.h"
> +
> +static void check_hash_data(const void *data, size_t data_length,
> +			    const char *expected, int algo)
> +{
> +	git_hash_ctx ctx;
> +	unsigned char hash[GIT_MAX_HEXSZ];
> +	const struct git_hash_algo *algop = &hash_algos[algo];
> +
> +	if (!check(!!data)) {

Is this double negation needed? Can't we just `if (!check(data))`?

> +		test_msg("Error: No data provided when expecting: %s", expected);

This error message is a bit atypical compared to the other callers of
this function. We could say something like "BUG: test has no data",
which would match something we have in "t/unit-tests/test-lib.c".

> +		return;
> +	}
> +
> +	algop->init_fn(&ctx);
> +	algop->update_fn(&ctx, data, data_length);
> +	algop->final_fn(hash, &ctx);
> +
> +	check_str(hash_to_hex_algop(hash, algop), expected);
> +}
> +
> +/* Works with a NUL terminated string. Doesn't work if it should contain a NUL character. */
> +#define TEST_SHA1_STR(data, expected) \
> +	TEST(check_hash_data(data, strlen(data), expected, GIT_HASH_SHA1), \
> +	     "SHA1 (%s) works", #data)
> +
> +/* Only works with a literal string, useful when it contains a NUL character. */
> +#define TEST_SHA1_LITERAL(literal, expected) \
> +	TEST(check_hash_data(literal, (sizeof(literal) - 1), expected, GIT_HASH_SHA1), \
> +	     "SHA1 (%s) works", #literal)
> 

This macro also works for `TEST_SHA1_STR()`, right? Is there a
particular reason why we don't unify them?

Patrick

Christian Couder May 24, 2024, 2:08 p.m. UTC | #2

On Fri, May 24, 2024 at 3:30 PM Patrick Steinhardt <ps@pks.im> wrote:
>
> On Fri, May 24, 2024 at 05:29:44AM +0530, Ghanshyam Thakkar wrote:
> > t/helper/test-sha1 and t/t0015-hash.sh test the hash implementation of
> > SHA-1 in Git with basic SHA-1 hash values. Migrate them to the new unit
> > testing framework for better debugging and runtime performance.
> >
> > The sha1 subcommand from test-tool is still not removed because it is
> > relied upon by t0013-sha1dc (which requires 'test-tool sha1' dying
> > when it is used on a file created to contain the known sha1 attack)
> > and pack_trailer():lib-pack.sh.
>
> Can we refactor this test to stop doing that? E.g., would it work if we
> used git-hash-object(1) to check that SHA1DC does its thing? Then we
> could get rid of the helper altogether, as far as I understand.

It could perhaps work if we used git-hash-object(1) instead of
`test-tool sha1` in t0013-sha1dc to check that SHA1DC does its thing,
but we could do that in a separate patch or patch series.

> > diff --git a/t/unit-tests/t-hash.c b/t/unit-tests/t-hash.c
> > new file mode 100644
> > index 0000000000..89dfea9cc1
> > --- /dev/null
> > +++ b/t/unit-tests/t-hash.c
> > @@ -0,0 +1,54 @@
> > +#include "test-lib.h"
> > +#include "hash-ll.h"
> > +#include "hex.h"
> > +#include "strbuf.h"
> > +
> > +static void check_hash_data(const void *data, size_t data_length,
> > +                         const char *expected, int algo)
> > +{
> > +     git_hash_ctx ctx;
> > +     unsigned char hash[GIT_MAX_HEXSZ];
> > +     const struct git_hash_algo *algop = &hash_algos[algo];
> > +
> > +     if (!check(!!data)) {
>
> Is this double negation needed? Can't we just `if (!check(data))`?

As far as I remember it is needed as check() is expecting an 'int'
while 'data' is a 'void *'.

> > +             test_msg("Error: No data provided when expecting: %s", expected);
>
> This error message is a bit atypical compared to the other callers of
> this function. We could say something like "BUG: test has no data",
> which would match something we have in "t/unit-tests/test-lib.c".

Actually I think something like "BUG: Null data pointer provided"
would be even better.

> > +             return;
> > +     }
> > +
> > +     algop->init_fn(&ctx);
> > +     algop->update_fn(&ctx, data, data_length);
> > +     algop->final_fn(hash, &ctx);
> > +
> > +     check_str(hash_to_hex_algop(hash, algop), expected);
> > +}
> > +
> > +/* Works with a NUL terminated string. Doesn't work if it should contain a NUL character. */
> > +#define TEST_SHA1_STR(data, expected) \
> > +     TEST(check_hash_data(data, strlen(data), expected, GIT_HASH_SHA1), \
> > +          "SHA1 (%s) works", #data)
> > +
> > +/* Only works with a literal string, useful when it contains a NUL character. */
> > +#define TEST_SHA1_LITERAL(literal, expected) \
> > +     TEST(check_hash_data(literal, (sizeof(literal) - 1), expected, GIT_HASH_SHA1), \
> > +          "SHA1 (%s) works", #literal)
> >
>
> This macro also works for `TEST_SHA1_STR()`, right?

No, it uses 'sizeof(literal)' which works only for string literals.

> Is there a
> particular reason why we don't unify them?

The comments above them try to explain that the first one doesn't work
when the data contains a NUL char as it uses strlen() while the second
one works only for string literals including those which contain NUL
characters.
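
The difference between the two length computations can be seen in a small
standalone sketch. The names LITERAL_LEN and len_by_strlen are hypothetical
and exist only for this illustration; they are not part of the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical macro: byte length of a string literal, excluding the
 * terminating NUL that the compiler appends. It relies on sizeof seeing
 * an array, so it gives the wrong answer for a plain `char *`.
 */
#define LITERAL_LEN(literal) (sizeof(literal) - 1)

/* Length as strlen() sees it: counting stops at the first NUL byte. */
static size_t len_by_strlen(const char *s)
{
	return strlen(s);
}
```

For the literal "ab\0cd", len_by_strlen() reports 2 while LITERAL_LEN()
reports 5, so only the sizeof-based form would hash the bytes after the
embedded NUL. For a literal without an embedded NUL, both agree.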

Thanks for your review.

Junio C Hamano May 24, 2024, 3:49 p.m. UTC | #3

Christian Couder <christian.couder@gmail.com> writes:

>> Can we refactor this test to stop doing that? E.g., would it work if we
>> used git-hash-object(1) to check that SHA1DC does its thing? Then we
>> could get rid of the helper altogether, as far as I understand.
>
> It could perhaps work if we used git-hash-object(1) instead of
> `test-tool sha1` in t0013-sha1dc to check that SHA1DC does its thing,
> but we could do that in a separate patch or patch series.

Yeah, I think such a plan to make preliminary refactoring as a
separate series, and then have another series to get rid of
"test-tool sha1" (and "test-tool sha256" as well?) on top of it
would work well.

>> > +     if (!check(!!data)) {
>>
>> Is this double negation needed? Can't we just `if (!check(data))`?
>
> As far as I remember it is needed as check() is expecting an 'int'
> while 'data' is a 'void *'.

It might be easier to read by being more explicit, "data != NULL",
if that is the case?  check() is like assert(), i.e., "we expect
data is not NULL", and if (!check("expected condition")) { guards an
error handling block for the case in which the expectation is not
met, right?
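
As a rough illustration of the two spellings, here is a sketch with a
hypothetical stand-in, expect_true(), that takes an int. It is an
assumption made for this example and is not the real check() from the
unit test framework.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for an assertion helper that takes an int.
 * It simply hands its argument back so the result can be inspected.
 */
static int expect_true(int ok)
{
	return ok;
}

/* Double negation turns any non-NULL pointer into the int 1. */
static int has_data_double_negation(const void *data)
{
	return expect_true(!!data);
}

/* The explicit comparison yields the same int, and states the intent. */
static int has_data_explicit(const void *data)
{
	return expect_true(data != NULL);
}
```

Both helpers return 0 for a NULL pointer and 1 otherwise, so in this
sketch the choice between them is purely about readability.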

Ghanshyam Thakkar June 15, 2024, 8:14 p.m. UTC | #4

On Fri, 24 May 2024, Junio C Hamano <gitster@pobox.com> wrote:
> Christian Couder <christian.couder@gmail.com> writes:
> 
> >> Can we refactor this test to stop doing that? E.g., would it work if we
> >> used git-hash-object(1) to check that SHA1DC does its thing? Then we
> >> could get rid of the helper altogether, as far as I understand.
> >
> > It could perhaps work if we used git-hash-object(1) instead of
> > `test-tool sha1` in t0013-sha1dc to check that SHA1DC does its thing,
> > but we could do that in a separate patch or patch series.
> 
> Yeah, I think such a plan to make preliminary refactoring as a
> separate series, and then have another series to get rid of
> "test-tool sha1" (and "test-tool sha256" as well?) on top of it
> would work well.

It seems that git-hash-object does not die (or give an error) when
given t0013/shattered-1.pdf, and produces a different hash than the
one explicitly mentioned in t0013-sha1dc.sh. I suppose it is silently
replacing the hash when it detects the collision. Is this expected
behaviour?

Thanks.

Jeff King June 16, 2024, 4:52 a.m. UTC | #5

On Sun, Jun 16, 2024 at 01:44:07AM +0530, Ghanshyam Thakkar wrote:

> On Fri, 24 May 2024, Junio C Hamano <gitster@pobox.com> wrote:
> > Christian Couder <christian.couder@gmail.com> writes:
> > 
> > >> Can we refactor this test to stop doing that? E.g., would it work if we
> > >> used git-hash-object(1) to check that SHA1DC does its thing? Then we
> > >> could get rid of the helper altogether, as far as I understand.
> > >
> > > It could perhaps work if we used git-hash-object(1) instead of
> > > `test-tool sha1` in t0013-sha1dc to check that SHA1DC does its thing,
> > > but we could do that in a separate patch or patch series.
> > 
> > Yeah, I think such a plan to make preliminary refactoring as a
> > separate series, and then have another series to get rid of
> > "test-tool sha1" (and "test-tool sha256" as well?) on top of it
> > would work well.
> 
> It seems that git-hash-object does not die (or give an error) when
> given t0013/shattered-1.pdf, and produces a different hash than the
> one explicitly mentioned in t0013-sha1dc.sh. I suppose it is silently
> replacing the hash when it detects the collision. Is this expected
> behaviour?

The shattered files do not create a collision (nor trigger the detection
in sha1dc) when hashed as Git objects. The reason is that Git objects
are not a straight hash of the contents, but have the object type and
size prepended.  One _could_ use the same techniques that created the
shattered files to create a colliding set of Git objects, but AFAIK
nobody has done so (and it probably costs tens of thousands of USD,
though perhaps getting cheaper every year).

So no, git-hash-object can't be used to test this. You have to directly
hash some contents with sha1, and I don't think there is any way to do
that with regular Git commands. Anything working with objects will use
the type+size format. We also use sha1 for the csum-file.[ch] mechanism,
where it is a straight hash of the contents (and we use this for
packfiles, etc). But there's not an easy way to feed an arbitrary file
to that system.

It's possible there might be a way to abuse hashfd_check() to feed an
arbitrary file. E.g., stick shattered-1.pdf into a .pack file or
something, then ask "index-pack --verify" to check it. But I don't think
even that works, because before we even get to the final checksum, we're
verifying the actual contents as we go.

So I think we need to keep some mechanism for computing the sha1 of
arbitrary contents.
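
To make the "type and size prepended" point concrete, here is a minimal
sketch of the bytes that get hashed for a blob: the header "blob <size>",
a NUL byte, then the raw contents. The function name blob_preimage() is
hypothetical and the code only builds the buffer; it does not call any
Git API or compute a hash.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write the bytes hashed for a blob into `out`: the header
 * "blob <decimal size>" followed by a NUL byte, then the raw content.
 * Returns the total number of bytes written, or 0 if `out` is too small.
 */
static size_t blob_preimage(const void *content, size_t content_len,
			    unsigned char *out, size_t out_size)
{
	char header[32];
	int n = snprintf(header, sizeof(header), "blob %zu", content_len);
	size_t header_len;

	if (n < 0 || (size_t)n >= sizeof(header))
		return 0;
	header_len = (size_t)n + 1; /* keep the terminating NUL */
	if (header_len + content_len > out_size)
		return 0;
	memcpy(out, header, header_len);
	memcpy(out + header_len, content, content_len);
	return header_len + content_len;
}
```

Because the header comes first, the input to SHA-1 no longer starts with
the bytes of the shattered PDF, which is why hashing the file as an
object does not exercise the collision detection the way a straight
hash of the contents does.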

-Peff

Junio C Hamano June 17, 2024, 5:44 p.m. UTC | #6

Jeff King <peff@peff.net> writes:

> So no, git-hash-object can't be used to test this. You have to directly
> hash some contents with sha1, and I don't think there is any way to do
> that with regular Git commands.
>
> So I think we need to keep some mechanism for computing the sha1 of
> arbitrary contents.

You're right.  We'd need a separate test helper if we wanted to keep
using the shattered sample files as-is (which we do).

Thanks.

Ghanshyam Thakkar June 21, 2024, 6:37 p.m. UTC | #7

On Sun Jun 16, 2024 at 10:22 AM IST, Jeff King wrote:
> On Sun, Jun 16, 2024 at 01:44:07AM +0530, Ghanshyam Thakkar wrote:
>
> > On Fri, 24 May 2024, Junio C Hamano <gitster@pobox.com> wrote:
> > > Christian Couder <christian.couder@gmail.com> writes:
> > > 
> > > >> Can we refactor this test to stop doing that? E.g., would it work if we
> > > >> used git-hash-object(1) to check that SHA1DC does its thing? Then we
> > > >> could get rid of the helper altogether, as far as I understand.
> > > >
> > > > It could perhaps work if we used git-hash-object(1) instead of
> > > > `test-tool sha1` in t0013-sha1dc to check that SHA1DC does its thing,
> > > > but we could do that in a separate patch or patch series.
> > > 
> > > Yeah, I think such a plan to make preliminary refactoring as a
> > > separate series, and then have another series to get rid of
> > > "test-tool sha1" (and "test-tool sha256" as well?) on top of it
> > > would work well.
> > 
> > It seems that git-hash-object does not die (or give an error) when
> > given t0013/shattered-1.pdf, and produces a different hash than the
> > one explicitly mentioned in t0013-sha1dc.sh. I suppose it is silently
> > replacing the hash when it detects the collision. Is this expected
> > behaviour?
>
> The shattered files do not create a collision (nor trigger the detection
> in sha1dc) when hashed as Git objects. The reason is that Git objects
> are not a straight hash of the contents, but have the object type and
> size prepended.  One _could_ use the same techniques that created the
> shattered files to create a colliding set of Git objects, but AFAIK
> nobody has done so (and it probably costs tens of thousands of USD,
> though perhaps getting cheaper every year).
>
> So no, git-hash-object can't be used to test this. You have to directly
> hash some contents with sha1, and I don't think there is any way to do
> that with regular Git commands. Anything working with objects will use
> the type+size format. We also use sha1 for the csum-file.[ch] mechanism,
> where it is a straight hash of the contents (and we use this for
> packfiles, etc). But there's not an easy way to feed an arbitrary file
> to that system.
>
> It's possible there might be a way to abuse hashfd_check() to feed an
> arbitrary file. E.g., stick shattered-1.pdf into a .pack file or
> something, then ask "index-pack --verify" to check it. But I don't think
> even that works, because before we even get to the final checksum, we're
> verifying the actual contents as we go.
>
> So I think we need to keep some mechanism for computing the sha1 of
> arbitrary contents.

Thank you for the detailed explanation. Then I suppose we should keep
these helpers (test-{sha1, sha256, hash}) as they are.
