Message ID | 4bbfd345d16da4604dd20decda8ecb12372e4223.1597428440.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [1/3] t/README: document GIT_TEST_DEFAULT_HASH | expand |
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Derrick Stolee <dstolee@microsoft.com> > > The commit-graph format reserved a byte among the header of the file to > store a "hash version". During the SHA-256 work, this was not modified > because file formats are not necessarily intended to work across hash > versions. If a repository has SHA-256 as its hash algorithm, it > automatically up-shifts the lengths of object names in all necessary > formats. > > However, since we have this byte available for adjusting the version, we > can make the file formats more obviously incompatible instead of relying > on other context from the repository. Very good idea. > Update the oid_version() method in commit-graph.c to add a new value, 2, > for sha-256. This automatically writes the new value in a SHA-256 > repository _and_ verifies the value is correct. This is a breaking > change relative to the current 'master' branch since 092b677 (Merge > branch 'bc/sha-256-cvs-svn-updates', 2020-08-13) but it is not breaking > relative to any released version of Git. That is perfectly fine. I think any file and on-wire protocol that uses anything but SHA-1 without identifying what it uses is a bug. Will queue. Thanks. > +OID_VERSION=1 > +if [ "$GIT_DEFAULT_HASH" = "sha256" ] > +then > + OID_VERSION=2 > +fi > + > test_expect_success 'setup test - repo, commits, commit graph, log outputs' ' > git init && > mkdir A A/B A/B/C && > @@ -35,7 +41,7 @@ test_expect_success 'setup test - repo, commits, commit graph, log outputs' ' > graph_read_expect () { > NUM_CHUNKS=5 > cat >expect <<- EOF > - header: 43475048 1 1 $NUM_CHUNKS 0 > + header: 43475048 1 $OID_VERSION $NUM_CHUNKS 0 > num_commits: $1 > chunks: oid_fanout oid_lookup commit_metadata bloom_indexes bloom_data > EOF > diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh > index 044cf8a3de..5b65017676 100755 > --- a/t/t5318-commit-graph.sh > +++ b/t/t5318-commit-graph.sh > @@ -5,6 +5,12 @@ test_description='commit graph' > > GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0 > > +OID_VERSION=1 > +if [ "$GIT_DEFAULT_HASH" = "sha256" ] > +then > + OID_VERSION=2 > +fi > + > test_expect_success 'setup full repo' ' > mkdir full && > cd "$TRASH_DIRECTORY/full" && > @@ -77,7 +83,7 @@ graph_read_expect() { > NUM_CHUNKS=$((3 + $(echo "$2" | wc -w))) > fi > cat >expect <<- EOF > - header: 43475048 1 1 $NUM_CHUNKS 0 > + header: 43475048 1 $OID_VERSION $NUM_CHUNKS 0 > num_commits: $1 > chunks: oid_fanout oid_lookup commit_metadata$OPTIONAL > EOF > @@ -412,6 +418,35 @@ test_expect_success 'replace-objects invalidates commit-graph' ' > ) > ' > > +test_expect_success 'warn on improper hash version' ' > + git init --object-format=sha1 sha1 && > + ( > + cd sha1 && > + test_commit 1 && > + git commit-graph write --reachable && > + mv .git/objects/info/commit-graph ../cg-sha1 > + ) && > + git init --object-format=sha256 sha256 && > + ( > + cd sha256 && > + test_commit 1 && > + git commit-graph write --reachable && > + mv .git/objects/info/commit-graph ../cg-sha256 > + ) && > + ( > + cd sha1 && > + mv ../cg-sha256 .git/objects/info/commit-graph && > + git log -1 2>err && > + test_i18ngrep "commit-graph hash version 2 does not match version 1" err > + ) && > + ( > + cd sha256 && > + mv ../cg-sha1 .git/objects/info/commit-graph && > + git log -1 2>err && > + test_i18ngrep "commit-graph hash version 1 does not match version 2" err > + ) > +' > + > # the verify tests below expect the commit-graph to contain > # exactly the commits reachable from the commits/8 branch. > # If the file changes the set of commits in the list, then the > diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh > index ea28d522b8..6f1a324f4f 100755 > --- a/t/t5324-split-commit-graph.sh > +++ b/t/t5324-split-commit-graph.sh > @@ -6,6 +6,12 @@ test_description='split commit graph' > GIT_TEST_COMMIT_GRAPH=0 > GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0 > > +OID_VERSION=1 > +if [ "$GIT_DEFAULT_HASH" = "sha256" ] > +then > + OID_VERSION=2 > +fi > + > test_expect_success 'setup repo' ' > git init && > git config core.commitGraph true && > @@ -28,7 +34,7 @@ graph_read_expect() { > NUM_BASE=$2 > fi > cat >expect <<- EOF > - header: 43475048 1 1 3 $NUM_BASE > + header: 43475048 1 $OID_VERSION 3 $NUM_BASE > num_commits: $1 > chunks: oid_fanout oid_lookup commit_metadata > EOF
On Fri, Aug 14, 2020 at 12:05:09PM -0700, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: Derrick Stolee <dstolee@microsoft.com> > > > > The commit-graph format reserved a byte among the header of the file to > > store a "hash version". During the SHA-256 work, this was not modified > > because file formats are not necessarily intended to work across hash > > versions. If a repository has SHA-256 as its hash algorithm, it > > automatically up-shifts the lengths of object names in all necessary > > formats. > > > > However, since we have this byte available for adjusting the version, we > > can make the file formats more obviously incompatible instead of relying > > on other context from the repository. > > Very good idea. > > > Update the oid_version() method in commit-graph.c to add a new value, 2, > > for sha-256. This automatically writes the new value in a SHA-256 > > repository _and_ verifies the value is correct. This is a breaking > > change relative to the current 'master' branch since 092b677 (Merge > > branch 'bc/sha-256-cvs-svn-updates', 2020-08-13) but it is not breaking > > relative to any released version of Git. > > That is perfectly fine. I think any file and on-wire protocol that > uses anything but SHA-1 without identifying what it uses is a bug. > > Will queue. Thanks. Great. I have nothing to add other than my 'ack' that this is a good idea. > > +OID_VERSION=1 > > +if [ "$GIT_DEFAULT_HASH" = "sha256" ] > > +then > > + OID_VERSION=2 > > +fi > > + > > test_expect_success 'setup test - repo, commits, commit graph, log outputs' ' > > git init && > > mkdir A A/B A/B/C && > > @@ -35,7 +41,7 @@ test_expect_success 'setup test - repo, commits, commit graph, log outputs' ' > > graph_read_expect () { > > NUM_CHUNKS=5 > > cat >expect <<- EOF > > - header: 43475048 1 1 $NUM_CHUNKS 0 > > + header: 43475048 1 $OID_VERSION $NUM_CHUNKS 0 > > num_commits: $1 > > chunks: oid_fanout oid_lookup commit_metadata bloom_indexes bloom_data > > EOF > > diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh > > index 044cf8a3de..5b65017676 100755 > > --- a/t/t5318-commit-graph.sh > > +++ b/t/t5318-commit-graph.sh > > @@ -5,6 +5,12 @@ test_description='commit graph' > > > > GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0 > > > > +OID_VERSION=1 > > +if [ "$GIT_DEFAULT_HASH" = "sha256" ] > > +then > > + OID_VERSION=2 > > +fi > > + > > test_expect_success 'setup full repo' ' > > mkdir full && > > cd "$TRASH_DIRECTORY/full" && > > @@ -77,7 +83,7 @@ graph_read_expect() { > > NUM_CHUNKS=$((3 + $(echo "$2" | wc -w))) > > fi > > cat >expect <<- EOF > > - header: 43475048 1 1 $NUM_CHUNKS 0 > > + header: 43475048 1 $OID_VERSION $NUM_CHUNKS 0 > > num_commits: $1 > > chunks: oid_fanout oid_lookup commit_metadata$OPTIONAL > > EOF > > @@ -412,6 +418,35 @@ test_expect_success 'replace-objects invalidates commit-graph' ' > > ) > > ' > > > > +test_expect_success 'warn on improper hash version' ' > > + git init --object-format=sha1 sha1 && > > + ( > > + cd sha1 && > > + test_commit 1 && > > + git commit-graph write --reachable && > > + mv .git/objects/info/commit-graph ../cg-sha1 > > + ) && > > + git init --object-format=sha256 sha256 && > > + ( > > + cd sha256 && > > + test_commit 1 && > > + git commit-graph write --reachable && > > + mv .git/objects/info/commit-graph ../cg-sha256 > > + ) && > > + ( > > + cd sha1 && > > + mv ../cg-sha256 .git/objects/info/commit-graph && > > + git log -1 2>err && > > + test_i18ngrep "commit-graph hash version 2 does not match version 1" err > > + ) && > > + ( > > + cd sha256 && > > + mv ../cg-sha1 .git/objects/info/commit-graph && > > + git log -1 2>err && > > + test_i18ngrep "commit-graph hash version 1 does not match version 2" err > > + ) > > +' > > + > > # the verify tests below expect the commit-graph to contain > > # exactly the commits reachable from the commits/8 branch. > > # If the file changes the set of commits in the list, then the > > diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh > > index ea28d522b8..6f1a324f4f 100755 > > --- a/t/t5324-split-commit-graph.sh > > +++ b/t/t5324-split-commit-graph.sh > > @@ -6,6 +6,12 @@ test_description='split commit graph' > > GIT_TEST_COMMIT_GRAPH=0 > > GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0 > > > > +OID_VERSION=1 > > +if [ "$GIT_DEFAULT_HASH" = "sha256" ] > > +then > > + OID_VERSION=2 > > +fi > > + > > test_expect_success 'setup repo' ' > > git init && > > git config core.commitGraph true && > > @@ -28,7 +34,7 @@ graph_read_expect() { > > NUM_BASE=$2 > > fi > > cat >expect <<- EOF > > - header: 43475048 1 1 3 $NUM_BASE > > + header: 43475048 1 $OID_VERSION 3 $NUM_BASE > > num_commits: $1 > > chunks: oid_fanout oid_lookup commit_metadata > > EOF Thanks, Taylor
On 2020-08-14 at 18:07:19, Derrick Stolee via GitGitGadget wrote: > diff --git a/commit-graph.c b/commit-graph.c > index e51c91dd5b..d03328d64c 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -179,7 +179,11 @@ static char *get_chain_filename(struct object_directory *odb) > > static uint8_t oid_version(void) > { > - return 1; > + if (the_hash_algo->rawsz == GIT_SHA1_RAWSZ) > + return 1; > + if (the_hash_algo->rawsz == GIT_SHA256_RAWSZ) > + return 2; > + die(_("invalid hash version")); > } Can we maybe say something like this? switch (hash_algo_by_ptr(the_hash_algo)) { case GIT_HASH_SHA1: return 1; case GIT_HASH_SHA256: return 2; default: die(_("invalid hash version")); } That way if SHA-256 gets broken and we switch to another 256-bit hash (SHA-3-256?), we'll be forced to update this properly. > static struct commit_graph *alloc_commit_graph(void) > diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh > index c21cc160f3..906af2799d 100755 > --- a/t/t4216-log-bloom.sh > +++ b/t/t4216-log-bloom.sh > @@ -6,6 +6,12 @@ test_description='git log for a path with Bloom filters' > GIT_TEST_COMMIT_GRAPH=0 > GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0 > > +OID_VERSION=1 > +if [ "$GIT_DEFAULT_HASH" = "sha256" ] > +then > + OID_VERSION=2 > +fi Can we do something like this in the setup test instead? test_oid_cache <<-EOF oid_version sha1:1 oid_version sha256:2 EOF OID_VERSION=$(test_oid oid_version) # or, inline $(test_oid oid_version) That uses the existing test framework to ensure we produce the right results if someone adds another hash algorithm and that we produce a BUG if the value is missing. It will also make it easier to write tests if we end up creating SHA-1- or SHA-256-specific tests, since we can look up those values directly with test_oid. Since you're using this across several tests, you could even just add this to t/oid-info/hash-info like so: commit_graph_oid_version sha1:1 commit_graph_oid_version sha256:2 and then it's set in one place and can be used without any required test_oid_cache invocations at all. I think three tests is sufficient basis for us to add an entry there.
"brian m. carlson" <sandals@crustytoothpaste.net> writes: > Can we maybe say something like this? > > switch (hash_algo_by_ptr(the_hash_algo)) { > case GIT_HASH_SHA1: > return 1; > case GIT_HASH_SHA256: > return 2; > default: > die(_("invalid hash version")); > } > > That way if SHA-256 gets broken and we switch to another 256-bit hash > (SHA-3-256?), we'll be forced to update this properly. Excellent.
On 8/14/2020 4:11 PM, brian m. carlson wrote: > On 2020-08-14 at 18:07:19, Derrick Stolee via GitGitGadget wrote: >> diff --git a/commit-graph.c b/commit-graph.c >> index e51c91dd5b..d03328d64c 100644 >> --- a/commit-graph.c >> +++ b/commit-graph.c >> @@ -179,7 +179,11 @@ static char *get_chain_filename(struct object_directory *odb) >> >> static uint8_t oid_version(void) >> { >> - return 1; >> + if (the_hash_algo->rawsz == GIT_SHA1_RAWSZ) >> + return 1; >> + if (the_hash_algo->rawsz == GIT_SHA256_RAWSZ) >> + return 2; >> + die(_("invalid hash version")); >> } > > Can we maybe say something like this? > > switch (hash_algo_by_ptr(the_hash_algo)) { > case GIT_HASH_SHA1: > return 1; > case GIT_HASH_SHA256: > return 2; > default: > die(_("invalid hash version")); > } > > That way if SHA-256 gets broken and we switch to another 256-bit hash > (SHA-3-256?), we'll be forced to update this properly. > >> static struct commit_graph *alloc_commit_graph(void) >> diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh >> index c21cc160f3..906af2799d 100755 >> --- a/t/t4216-log-bloom.sh >> +++ b/t/t4216-log-bloom.sh >> @@ -6,6 +6,12 @@ test_description='git log for a path with Bloom filters' >> GIT_TEST_COMMIT_GRAPH=0 >> GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0 >> >> +OID_VERSION=1 >> +if [ "$GIT_DEFAULT_HASH" = "sha256" ] >> +then >> + OID_VERSION=2 >> +fi > > Can we do something like this in the setup test instead? > > test_oid_cache <<-EOF > oid_version sha1:1 > oid_version sha256:2 > EOF > > OID_VERSION=$(test_oid oid_version) > # or, inline > $(test_oid oid_version) > > That uses the existing test framework to ensure we produce the right > results if someone adds another hash algorithm and that we produce a BUG > if the value is missing. It will also make it easier to write tests if > we end up creating SHA-1- or SHA-256-specific tests, since we can look > up those values directly with test_oid. > > Since you're using this across several tests, you could even just add > this to t/oid-info/hash-info like so: > > commit_graph_oid_version sha1:1 > commit_graph_oid_version sha256:2 > > and then it's set in one place and can be used without any required > test_oid_cache invocations at all. I think three tests is sufficient > basis for us to add an entry there. I appreciate the suggested improvements here. I'm happy to do something more similar to other places in the code. These will be part of my v2 to be re-rolled early next week. Thanks, -Stolee
On Fri, 14 Aug 2020 at 22:36, Derrick Stolee <stolee@gmail.com> wrote: > > I appreciate the suggested improvements here. I'm happy to do > something more similar to other places in the code. > > These will be part of my v2 to be re-rolled early next week. I have nothing to add to brian's great suggestions. I'll be very happy to see this patch instead of my patch 5/5 'commit-graph-format.txt: fix "Hash Version" description'. Martin
diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt index 440541045d..6ddbceba15 100644 --- a/Documentation/technical/commit-graph-format.txt +++ b/Documentation/technical/commit-graph-format.txt @@ -42,8 +42,13 @@ HEADER: 1-byte version number: Currently, the only valid version is 1. - 1-byte Hash Version (1 = SHA-1) - We infer the hash length (H) from this value. + 1-byte Hash Version + We infer the hash length (H) from this value: + 1 => SHA-1 + 2 => SHA-256 + If the hash type does not match the repository's hash algorithm, the + commit-graph file should be ignored with a warning presented to the + user. 1-byte number (C) of "chunks" diff --git a/commit-graph.c b/commit-graph.c index e51c91dd5b..d03328d64c 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -179,7 +179,11 @@ static char *get_chain_filename(struct object_directory *odb) static uint8_t oid_version(void) { - return 1; + if (the_hash_algo->rawsz == GIT_SHA1_RAWSZ) + return 1; + if (the_hash_algo->rawsz == GIT_SHA256_RAWSZ) + return 2; + die(_("invalid hash version")); } static struct commit_graph *alloc_commit_graph(void) diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh index c21cc160f3..906af2799d 100755 --- a/t/t4216-log-bloom.sh +++ b/t/t4216-log-bloom.sh @@ -6,6 +6,12 @@ test_description='git log for a path with Bloom filters' GIT_TEST_COMMIT_GRAPH=0 GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0 +OID_VERSION=1 +if [ "$GIT_DEFAULT_HASH" = "sha256" ] +then + OID_VERSION=2 +fi + test_expect_success 'setup test - repo, commits, commit graph, log outputs' ' git init && mkdir A A/B A/B/C && @@ -35,7 +41,7 @@ test_expect_success 'setup test - repo, commits, commit graph, log outputs' ' graph_read_expect () { NUM_CHUNKS=5 cat >expect <<- EOF - header: 43475048 1 1 $NUM_CHUNKS 0 + header: 43475048 1 $OID_VERSION $NUM_CHUNKS 0 num_commits: $1 chunks: oid_fanout oid_lookup commit_metadata bloom_indexes bloom_data EOF diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 044cf8a3de..5b65017676 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -5,6 +5,12 @@ test_description='commit graph' GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0 +OID_VERSION=1 +if [ "$GIT_DEFAULT_HASH" = "sha256" ] +then + OID_VERSION=2 +fi + test_expect_success 'setup full repo' ' mkdir full && cd "$TRASH_DIRECTORY/full" && @@ -77,7 +83,7 @@ graph_read_expect() { NUM_CHUNKS=$((3 + $(echo "$2" | wc -w))) fi cat >expect <<- EOF - header: 43475048 1 1 $NUM_CHUNKS 0 + header: 43475048 1 $OID_VERSION $NUM_CHUNKS 0 num_commits: $1 chunks: oid_fanout oid_lookup commit_metadata$OPTIONAL EOF @@ -412,6 +418,35 @@ test_expect_success 'replace-objects invalidates commit-graph' ' ) ' +test_expect_success 'warn on improper hash version' ' + git init --object-format=sha1 sha1 && + ( + cd sha1 && + test_commit 1 && + git commit-graph write --reachable && + mv .git/objects/info/commit-graph ../cg-sha1 + ) && + git init --object-format=sha256 sha256 && + ( + cd sha256 && + test_commit 1 && + git commit-graph write --reachable && + mv .git/objects/info/commit-graph ../cg-sha256 + ) && + ( + cd sha1 && + mv ../cg-sha256 .git/objects/info/commit-graph && + git log -1 2>err && + test_i18ngrep "commit-graph hash version 2 does not match version 1" err + ) && + ( + cd sha256 && + mv ../cg-sha1 .git/objects/info/commit-graph && + git log -1 2>err && + test_i18ngrep "commit-graph hash version 1 does not match version 2" err + ) +' + # the verify tests below expect the commit-graph to contain # exactly the commits reachable from the commits/8 branch. # If the file changes the set of commits in the list, then the diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index ea28d522b8..6f1a324f4f 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -6,6 +6,12 @@ test_description='split commit graph' GIT_TEST_COMMIT_GRAPH=0 GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0 +OID_VERSION=1 +if [ "$GIT_DEFAULT_HASH" = "sha256" ] +then + OID_VERSION=2 +fi + test_expect_success 'setup repo' ' git init && git config core.commitGraph true && @@ -28,7 +34,7 @@ graph_read_expect() { NUM_BASE=$2 fi cat >expect <<- EOF - header: 43475048 1 1 3 $NUM_BASE + header: 43475048 1 $OID_VERSION 3 $NUM_BASE num_commits: $1 chunks: oid_fanout oid_lookup commit_metadata EOF