Message ID | 20240523235945.26833-3-shyamthakkar001@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Port t0015-hash to the unit testing framework | expand |
On Fri, May 24, 2024 at 05:29:44AM +0530, Ghanshyam Thakkar wrote: > t/helper/test-sha1 and t/t0015-hash.sh test the hash implementation of > SHA-1 in Git with basic SHA-1 hash values. Migrate them to the new unit > testing framework for better debugging and runtime performance. > > The sha1 subcommand from test-tool is still not removed because it is > relied upon by t0013-sha1dc (which requires 'test-tool sha1' dying > when it is used on a file created to contain the known sha1 attack) > and pack_trailer():lib-pack.sh. Can we refactor this test to stop doing that? E.g., would it work if we used git-hash-object(1) to check that SHA1DC does its thing? Then we could get rid of the helper altogether, as far as I understand. > diff --git a/t/unit-tests/t-hash.c b/t/unit-tests/t-hash.c > new file mode 100644 > index 0000000000..89dfea9cc1 > --- /dev/null > +++ b/t/unit-tests/t-hash.c > @@ -0,0 +1,54 @@ > +#include "test-lib.h" > +#include "hash-ll.h" > +#include "hex.h" > +#include "strbuf.h" > + > +static void check_hash_data(const void *data, size_t data_length, > + const char *expected, int algo) > +{ > + git_hash_ctx ctx; > + unsigned char hash[GIT_MAX_HEXSZ]; > + const struct git_hash_algo *algop = &hash_algos[algo]; > + > + if (!check(!!data)) { Is this double negation needed? Can't we just `if (!check(data))`? > + test_msg("Error: No data provided when expecting: %s", expected); This error message is a bit atypical compared to the other callers of this function. We could say something like "BUG: test has no data", which would match something we have in "t/unit-tests/test-lib.c". > + return; > + } > + > + algop->init_fn(&ctx); > + algop->update_fn(&ctx, data, data_length); > + algop->final_fn(hash, &ctx); > + > + check_str(hash_to_hex_algop(hash, algop), expected); > +} > + > +/* Works with a NUL terminated string. Doesn't work if it should contain a NUL character. */ > +#define TEST_SHA1_STR(data, expected) \ > + TEST(check_hash_data(data, strlen(data), expected, GIT_HASH_SHA1), \ > + "SHA1 (%s) works", #data) > + > +/* Only works with a literal string, useful when it contains a NUL character. */ > +#define TEST_SHA1_LITERAL(literal, expected) \ > + TEST(check_hash_data(literal, (sizeof(literal) - 1), expected, GIT_HASH_SHA1), \ > + "SHA1 (%s) works", #literal) > This macro also works for `TEST_SHA1_STR()`, right? Is there a partiuclar reason why we don't unify them? Patrick
On Fri, May 24, 2024 at 3:30 PM Patrick Steinhardt <ps@pks.im> wrote: > > On Fri, May 24, 2024 at 05:29:44AM +0530, Ghanshyam Thakkar wrote: > > t/helper/test-sha1 and t/t0015-hash.sh test the hash implementation of > > SHA-1 in Git with basic SHA-1 hash values. Migrate them to the new unit > > testing framework for better debugging and runtime performance. > > > > The sha1 subcommand from test-tool is still not removed because it is > > relied upon by t0013-sha1dc (which requires 'test-tool sha1' dying > > when it is used on a file created to contain the known sha1 attack) > > and pack_trailer():lib-pack.sh. > > Can we refactor this test to stop doing that? E.g., would it work if we > used git-hash-object(1) to check that SHA1DC does its thing? Then we > could get rid of the helper altogether, as far as I understand. It could perhaps work if we used git-hash-object(1) instead of `test-tool sha1` in t0013-sha1dc to check that SHA1DC does its thing, but we could do that in a separate patch or patch series. > > diff --git a/t/unit-tests/t-hash.c b/t/unit-tests/t-hash.c > > new file mode 100644 > > index 0000000000..89dfea9cc1 > > --- /dev/null > > +++ b/t/unit-tests/t-hash.c > > @@ -0,0 +1,54 @@ > > +#include "test-lib.h" > > +#include "hash-ll.h" > > +#include "hex.h" > > +#include "strbuf.h" > > + > > +static void check_hash_data(const void *data, size_t data_length, > > + const char *expected, int algo) > > +{ > > + git_hash_ctx ctx; > > + unsigned char hash[GIT_MAX_HEXSZ]; > > + const struct git_hash_algo *algop = &hash_algos[algo]; > > + > > + if (!check(!!data)) { > > Is this double negation needed? Can't we just `if (!check(data))`? As far as I remember it is needed as check() is expecting an 'int' while 'data' is a 'void *'. > > + test_msg("Error: No data provided when expecting: %s", expected); > > This error message is a bit atypical compared to the other callers of > this function. We could say something like "BUG: test has no data", > which would match something we have in "t/unit-tests/test-lib.c". Actually I think something like "BUG: Null data pointer provided" would be even better. > > + return; > > + } > > + > > + algop->init_fn(&ctx); > > + algop->update_fn(&ctx, data, data_length); > > + algop->final_fn(hash, &ctx); > > + > > + check_str(hash_to_hex_algop(hash, algop), expected); > > +} > > + > > +/* Works with a NUL terminated string. Doesn't work if it should contain a NUL character. */ > > +#define TEST_SHA1_STR(data, expected) \ > > + TEST(check_hash_data(data, strlen(data), expected, GIT_HASH_SHA1), \ > > + "SHA1 (%s) works", #data) > > + > > +/* Only works with a literal string, useful when it contains a NUL character. */ > > +#define TEST_SHA1_LITERAL(literal, expected) \ > > + TEST(check_hash_data(literal, (sizeof(literal) - 1), expected, GIT_HASH_SHA1), \ > > + "SHA1 (%s) works", #literal) > > > > This macro also works for `TEST_SHA1_STR()`, right? No, it uses 'sizeof(literal)' which works only for string literals. > Is there a > partiuclar reason why we don't unify them? The comments above them try to explain that the first one doesn't work when the data contains a NUL char as it uses strlen() while the second one works only for string literals including those which contain NUL characters. Thanks for your review.
Christian Couder <christian.couder@gmail.com> writes: >> Can we refactor this test to stop doing that? E.g., would it work if we >> used git-hash-object(1) to check that SHA1DC does its thing? Then we >> could get rid of the helper altogether, as far as I understand. > > It could perhaps work if we used git-hash-object(1) instead of > `test-tool sha1` in t0013-sha1dc to check that SHA1DC does its thing, > but we could do that in a separate patch or patch series. Yeah, I think such a plan to make preliminary refactoring as a separate series, and then have another series to get rid of "test-tool sha1" (and "test-tool sha256" as well?) on top of it would work well. >> > + if (!check(!!data)) { >> >> Is this double negation needed? Can't we just `if (!check(data))`? > > As far as I remember it is needed as check() is expecting an 'int' > while 'data' is a 'void *'. It might be easier to read by being more explicit, "data != NULL", if that is the case? check() is like assert(), i.e., "we expect data is not NULL", and if (!check("expected condition")) { guards an error handling block for the case in which the expectation is not met, right?
On Fri, 24 May 2024, Junio C Hamano <gitster@pobox.com> wrote: > Christian Couder <christian.couder@gmail.com> writes: > > >> Can we refactor this test to stop doing that? E.g., would it work if we > >> used git-hash-object(1) to check that SHA1DC does its thing? Then we > >> could get rid of the helper altogether, as far as I understand. > > > > It could perhaps work if we used git-hash-object(1) instead of > > `test-tool sha1` in t0013-sha1dc to check that SHA1DC does its thing, > > but we could do that in a separate patch or patch series. > > Yeah, I think such a plan to make preliminary refactoring as a > separate series, and then have another series to get rid of > "test-tool sha1" (and "test-tool sha256" as well?) on top of it > would work well. It seems that git-hash-object does not die (or give an error) when providing t0013/shattered-1.pdf, and gives a different hash than the one explicitly mentioned t0013-sha1dc.sh. I suppose it is silently replacing the hash when it detects the collision. Is this an expected behaviour? Thanks.
On Sun, Jun 16, 2024 at 01:44:07AM +0530, Ghanshyam Thakkar wrote: > On Fri, 24 May 2024, Junio C Hamano <gitster@pobox.com> wrote: > > Christian Couder <christian.couder@gmail.com> writes: > > > > >> Can we refactor this test to stop doing that? E.g., would it work if we > > >> used git-hash-object(1) to check that SHA1DC does its thing? Then we > > >> could get rid of the helper altogether, as far as I understand. > > > > > > It could perhaps work if we used git-hash-object(1) instead of > > > `test-tool sha1` in t0013-sha1dc to check that SHA1DC does its thing, > > > but we could do that in a separate patch or patch series. > > > > Yeah, I think such a plan to make preliminary refactoring as a > > separate series, and then have another series to get rid of > > "test-tool sha1" (and "test-tool sha256" as well?) on top of it > > would work well. > > It seems that git-hash-object does not die (or give an error) when > providing t0013/shattered-1.pdf, and gives a different hash than the > one explicitly mentioned t0013-sha1dc.sh. I suppose it is silently > replacing the hash when it detects the collision. Is this an expected > behaviour? The shattered files do not create a collision (nor trigger the detection in sha1dc) when hashed as Git objects. The reason is that Git objects are not a straight hash of the contents, but have the object type and size prepended. One _could_ use the same techniques that created the shattered files to create a colliding set of Git objects, but AFAIK nobody has done so (and it probably costs tens of thousands of USD, though perhaps getting cheaper every year). So no, git-hash-object can't be used to test this. You have to directly hash some contents with sha1, and I don't think there is any way to do that with regular Git commands. Anything working with objects will use the type+size format. We also use sha1 for the csum-file.[ch] mechanism, where it is a straight hash of the contents (and we use this for packfiles, etc). But there's not an easy way to feed an arbitrary file to that system. It's possible there might be a way to abuse hashfd_check() to feed an arbitrary file. E.g., stick shattered-1.pdf into a .pack file or something, then ask "index-pack --verify" to check it. But I don't think even that works, because before we even get to the final checksum, we're verifying the actual contents as we go. So I think we need to keep some mechanism for computing the sha1 of arbitrary contents. -Peff
Jeff King <peff@peff.net> writes: > So no, git-hash-object can't be used to test this. You have to directly > hash some contents with sha1, and I don't think there is any way to do > that with regular Git commands. > > So I think we need to keep some mechanism for computing the sha1 of > arbitrary contents. You're right. We'd need a separate test helper if we wanted to keep using the shattered sample files as-is (which we do). Thanks.
On Sun Jun 16, 2024 at 10:22 AM IST, Jeff King wrote: > On Sun, Jun 16, 2024 at 01:44:07AM +0530, Ghanshyam Thakkar wrote: > > > On Fri, 24 May 2024, Junio C Hamano <gitster@pobox.com> wrote: > > > Christian Couder <christian.couder@gmail.com> writes: > > > > > > >> Can we refactor this test to stop doing that? E.g., would it work if we > > > >> used git-hash-object(1) to check that SHA1DC does its thing? Then we > > > >> could get rid of the helper altogether, as far as I understand. > > > > > > > > It could perhaps work if we used git-hash-object(1) instead of > > > > `test-tool sha1` in t0013-sha1dc to check that SHA1DC does its thing, > > > > but we could do that in a separate patch or patch series. > > > > > > Yeah, I think such a plan to make preliminary refactoring as a > > > separate series, and then have another series to get rid of > > > "test-tool sha1" (and "test-tool sha256" as well?) on top of it > > > would work well. > > > > It seems that git-hash-object does not die (or give an error) when > > providing t0013/shattered-1.pdf, and gives a different hash than the > > one explicitly mentioned t0013-sha1dc.sh. I suppose it is silently > > replacing the hash when it detects the collision. Is this an expected > > behaviour? > > The shattered files do not create a collision (nor trigger the detection > in sha1dc) when hashed as Git objects. The reason is that Git objects > are not a straight hash of the contents, but have the object type and > size prepended. One _could_ use the same techniques that created the > shattered files to create a colliding set of Git objects, but AFAIK > nobody has done so (and it probably costs tens of thousands of USD, > though perhaps getting cheaper every year). > > So no, git-hash-object can't be used to test this. You have to directly > hash some contents with sha1, and I don't think there is any way to do > that with regular Git commands. Anything working with objects will use > the type+size format. We also use sha1 for the csum-file.[ch] mechanism, > where it is a straight hash of the contents (and we use this for > packfiles, etc). But there's not an easy way to feed an arbitrary file > to that system. > > It's possible there might be a way to abuse hashfd_check() to feed an > arbitrary file. E.g., stick shattered-1.pdf into a .pack file or > something, then ask "index-pack --verify" to check it. But I don't think > even that works, because before we even get to the final checksum, we're > verifying the actual contents as we go. > > So I think we need to keep some mechanism for computing the sha1 of > arbitrary contents. Thank you for the detailed explanation. Then I suppose we should keep these helpers (test-{sha1, sha256, hash}) as it is.
diff --git a/Makefile b/Makefile index 8f4432ae57..2b19fdf6ae 100644 --- a/Makefile +++ b/Makefile @@ -1335,6 +1335,7 @@ THIRD_PARTY_SOURCES += sha1collisiondetection/% THIRD_PARTY_SOURCES += sha1dc/% UNIT_TEST_PROGRAMS += t-ctype +UNIT_TEST_PROGRAMS += t-hash UNIT_TEST_PROGRAMS += t-mem-pool UNIT_TEST_PROGRAMS += t-prio-queue UNIT_TEST_PROGRAMS += t-strbuf diff --git a/t/t0015-hash.sh b/t/t0015-hash.sh index 0a087a1983..2264e702d5 100755 --- a/t/t0015-hash.sh +++ b/t/t0015-hash.sh @@ -5,28 +5,6 @@ test_description='test basic hash implementation' TEST_PASSES_SANITIZE_LEAK=true . ./test-lib.sh -test_expect_success 'test basic SHA-1 hash values' ' - test-tool sha1 </dev/null >actual && - grep da39a3ee5e6b4b0d3255bfef95601890afd80709 actual && - printf "a" | test-tool sha1 >actual && - grep 86f7e437faa5a7fce15d1ddcb9eaeaea377667b8 actual && - printf "abc" | test-tool sha1 >actual && - grep a9993e364706816aba3e25717850c26c9cd0d89d actual && - printf "message digest" | test-tool sha1 >actual && - grep c12252ceda8be8994d5fa0290a47231c1d16aae3 actual && - printf "abcdefghijklmnopqrstuvwxyz" | test-tool sha1 >actual && - grep 32d10c7b8cf96570ca04ce37f2a19d84240d3a89 actual && - perl -e "$| = 1; print q{aaaaaaaaaa} for 1..100000;" | - test-tool sha1 >actual && - grep 34aa973cd4c4daa4f61eeb2bdbad27316534016f actual && - printf "blob 0\0" | test-tool sha1 >actual && - grep e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 actual && - printf "blob 3\0abc" | test-tool sha1 >actual && - grep f2ba8f84ab5c1bce84a7b441cb1959cfc7093b7f actual && - printf "tree 0\0" | test-tool sha1 >actual && - grep 4b825dc642cb6eb9a060e54bf8d69288fbee4904 actual -' - test_expect_success 'test basic SHA-256 hash values' ' test-tool sha256 </dev/null >actual && grep e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 actual && diff --git a/t/unit-tests/t-hash.c b/t/unit-tests/t-hash.c new file mode 100644 index 0000000000..89dfea9cc1 --- /dev/null +++ b/t/unit-tests/t-hash.c @@ -0,0 +1,54 @@ +#include "test-lib.h" +#include "hash-ll.h" +#include "hex.h" +#include "strbuf.h" + +static void check_hash_data(const void *data, size_t data_length, + const char *expected, int algo) +{ + git_hash_ctx ctx; + unsigned char hash[GIT_MAX_HEXSZ]; + const struct git_hash_algo *algop = &hash_algos[algo]; + + if (!check(!!data)) { + test_msg("Error: No data provided when expecting: %s", expected); + return; + } + + algop->init_fn(&ctx); + algop->update_fn(&ctx, data, data_length); + algop->final_fn(hash, &ctx); + + check_str(hash_to_hex_algop(hash, algop), expected); +} + +/* Works with a NUL terminated string. Doesn't work if it should contain a NUL character. */ +#define TEST_SHA1_STR(data, expected) \ + TEST(check_hash_data(data, strlen(data), expected, GIT_HASH_SHA1), \ + "SHA1 (%s) works", #data) + +/* Only works with a literal string, useful when it contains a NUL character. */ +#define TEST_SHA1_LITERAL(literal, expected) \ + TEST(check_hash_data(literal, (sizeof(literal) - 1), expected, GIT_HASH_SHA1), \ + "SHA1 (%s) works", #literal) + +int cmd_main(int argc, const char **argv) +{ + struct strbuf aaaaaaaaaa_100000 = STRBUF_INIT; + + strbuf_addstrings(&aaaaaaaaaa_100000, "aaaaaaaaaa", 100000); + + TEST_SHA1_STR("", "da39a3ee5e6b4b0d3255bfef95601890afd80709"); + TEST_SHA1_STR("a", "86f7e437faa5a7fce15d1ddcb9eaeaea377667b8"); + TEST_SHA1_STR("abc", "a9993e364706816aba3e25717850c26c9cd0d89d"); + TEST_SHA1_STR("message digest", "c12252ceda8be8994d5fa0290a47231c1d16aae3"); + TEST_SHA1_STR("abcdefghijklmnopqrstuvwxyz", "32d10c7b8cf96570ca04ce37f2a19d84240d3a89"); + TEST_SHA1_STR(aaaaaaaaaa_100000.buf, "34aa973cd4c4daa4f61eeb2bdbad27316534016f"); + TEST_SHA1_LITERAL("blob 0\0", "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"); + TEST_SHA1_LITERAL("blob 3\0abc", "f2ba8f84ab5c1bce84a7b441cb1959cfc7093b7f"); + TEST_SHA1_LITERAL("tree 0\0", "4b825dc642cb6eb9a060e54bf8d69288fbee4904"); + + strbuf_release(&aaaaaaaaaa_100000); + + return test_done(); +}