Message ID | bbcd97db8a2c94343950bb0cf8cbd5c9c01b4577.1597406877.git.martin.agren@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | more SHA-256 documentation | expand |
On 8/14/2020 8:21 AM, Martin Ågren wrote: > We say that value 1 means "SHA-1", but in fact, it means "whatever > the_hash_algo is", see commit c166599862 ("commit-graph: convert to > using the_hash_algo", 2018-11-14). > > Signed-off-by: Martin Ågren <martin.agren@gmail.com> > --- > If we want to be more fine-grained in the future, we'll need to say, > e.g., "2 means SHA-1, 3 means SHA-256" or, perhaps preferrably, bump the > version number. > > I wonder: Should we instead say "1 means SHA-1, 2 means SHA-256"? It > could be implemented as "easily" as "if (value_from_header != > value_from_the_hash_algo) die(...);" for now. Might that pay off in the > long run? > > This relates to Stolee's "in a vacuum" comment [1] ... so maybe we're > fine. I think that was the intention of the byte, but that is not what ended up happening. If we want that to be the case, then we should do that work as part of the 2.29 cycle before we release with the ability to create SHA-256 repos (which will lock the commit-graph format for these repos). (By "we" I mean that I would try to do this work in a way that minimizes conflicts with the current commit-graph work in flight [1] [2].) [1] https://lore.kernel.org/git/pull.676.v2.git.1596941624.gitgitgadget@gmail.com/ [2] https://lore.kernel.org/git/cover.1597178914.git.me@ttaylorr.com/ > [1] https://lore.kernel.org/git/da077fb0-14bb-b84f-c526-d759ebc9f5eb@gmail.com/ > > Documentation/technical/commit-graph-format.txt | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt > index 440541045d..3535426d32 100644 > --- a/Documentation/technical/commit-graph-format.txt > +++ b/Documentation/technical/commit-graph-format.txt > @@ -42,8 +42,8 @@ HEADER: > 1-byte version number: > Currently, the only valid version is 1. > > - 1-byte Hash Version (1 = SHA-1) > - We infer the hash length (H) from this value. > + 1-byte Hash Version (1 = SHA-1 in SHA-1 repo, SHA-256 in SHA-256 repo) > + We infer the hash length (H) from the hash algo derived from this value. If we are _not_ changing the format to have a meaningful value in this byte, then this documentation should be updated to state that this byte must always have value 1, as it does not provide any information. Thanks, -Stolee
On Fri, 14 Aug 2020 at 14:37, Derrick Stolee <stolee@gmail.com> wrote: > > On 8/14/2020 8:21 AM, Martin Ågren wrote: > > We say that value 1 means "SHA-1", but in fact, it means "whatever > > the_hash_algo is", see commit c166599862 ("commit-graph: convert to > > using the_hash_algo", 2018-11-14). > > > > Signed-off-by: Martin Ågren <martin.agren@gmail.com> > > --- > > If we want to be more fine-grained in the future, we'll need to say, > > e.g., "2 means SHA-1, 3 means SHA-256" or, perhaps preferrably, bump the > > version number. > > > > I wonder: Should we instead say "1 means SHA-1, 2 means SHA-256"? It > > could be implemented as "easily" as "if (value_from_header != > > value_from_the_hash_algo) die(...);" for now. Might that pay off in the > > long run? > > > > This relates to Stolee's "in a vacuum" comment [1] ... so maybe we're > > fine. > > I think that was the intention of the byte, but that is not what ended > up happening. When I wrote this patch, I did go with "fix <thing>" rather than "document SHA-256". For the other patches, it's sort of obvious how those formats are so old that it's no wonder they assumed SHA-1. But here, we did go to some trouble to try and future-proof things and already had NewHash in mind. So that calls for "fix <thing>". But I'm more and more starting to think that it's the implementation that should be fixed and that the docs should just be extended to add "2 means SHA-256". > If we want that to be the case, then we should do that > work as part of the 2.29 cycle before we release with the ability to > create SHA-256 repos (which will lock the commit-graph format for these > repos). Part of my reasoning behind [3] was that in exactly a situation like this, we'd be able to say With extensions.objectFormat=sha256, Git 2.29-2.30 will barf at the commit-graph files that Git 2.31+ generate, and the other way around. Users will be able to remove "old" files and regenerate them, and shouldn't use a mixed environment. and know that those users knew this might happen. But certainly, if we can avoid it altogether by changing behavior already in 2.29, that would be better. [3] https://lore.kernel.org/git/20200806202358.2265705-1-martin.agren@gmail.com/ > (By "we" I mean that I would try to do this work in a way that minimizes > conflicts with the current commit-graph work in flight [1] [2].) None of those seems to touch `oid_version()`, so if we can just tweak that function to return 1 or 2 instead of always 1, I guess that's one way. > [1] https://lore.kernel.org/git/pull.676.v2.git.1596941624.gitgitgadget@gmail.com/ > > [2] https://lore.kernel.org/git/cover.1597178914.git.me@ttaylorr.com/ > > > > > - 1-byte Hash Version (1 = SHA-1) > > - We infer the hash length (H) from this value. > > + 1-byte Hash Version (1 = SHA-1 in SHA-1 repo, SHA-256 in SHA-256 repo) > > + We infer the hash length (H) from the hash algo derived from this value. > > If we are _not_ changing the format to have a meaningful value in > this byte, then this documentation should be updated to state that > this byte must always have value 1, as it does not provide any > information. We could still go 1 means whatever extensions.objectFormat says, 2 means SHA-1, 3 means SHA-256, ... But maybe that would just be crazy. Thanks for all your comments Martin
diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt index 440541045d..3535426d32 100644 --- a/Documentation/technical/commit-graph-format.txt +++ b/Documentation/technical/commit-graph-format.txt @@ -42,8 +42,8 @@ HEADER: 1-byte version number: Currently, the only valid version is 1. - 1-byte Hash Version (1 = SHA-1) - We infer the hash length (H) from this value. + 1-byte Hash Version (1 = SHA-1 in SHA-1 repo, SHA-256 in SHA-256 repo) + We infer the hash length (H) from the hash algo derived from this value. 1-byte number (C) of "chunks"
We say that value 1 means "SHA-1", but in fact, it means "whatever the_hash_algo is", see commit c166599862 ("commit-graph: convert to using the_hash_algo", 2018-11-14). Signed-off-by: Martin Ågren <martin.agren@gmail.com> --- If we want to be more fine-grained in the future, we'll need to say, e.g., "2 means SHA-1, 3 means SHA-256" or, perhaps preferrably, bump the version number. I wonder: Should we instead say "1 means SHA-1, 2 means SHA-256"? It could be implemented as "easily" as "if (value_from_header != value_from_the_hash_algo) die(...);" for now. Might that pay off in the long run? This relates to Stolee's "in a vacuum" comment [1] ... so maybe we're fine. [1] https://lore.kernel.org/git/da077fb0-14bb-b84f-c526-d759ebc9f5eb@gmail.com/ Documentation/technical/commit-graph-format.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)