diff mbox series

[5/5] commit-graph-format.txt: fix "Hash Version" description

Message ID bbcd97db8a2c94343950bb0cf8cbd5c9c01b4577.1597406877.git.martin.agren@gmail.com (mailing list archive)
State Superseded
Headers show
Series more SHA-256 documentation | expand

Commit Message

Martin Ågren Aug. 14, 2020, 12:21 p.m. UTC
We say that value 1 means "SHA-1", but in fact, it means "whatever
the_hash_algo is", see commit c166599862 ("commit-graph: convert to
using the_hash_algo", 2018-11-14).

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 If we want to be more fine-grained in the future, we'll need to say,
 e.g., "2 means SHA-1, 3 means SHA-256" or, perhaps preferrably, bump the
 version number.

 I wonder: Should we instead say "1 means SHA-1, 2 means SHA-256"? It
 could be implemented as "easily" as "if (value_from_header !=
 value_from_the_hash_algo) die(...);" for now. Might that pay off in the
 long run?

 This relates to Stolee's "in a vacuum" comment [1] ... so maybe we're
 fine.

 [1] https://lore.kernel.org/git/da077fb0-14bb-b84f-c526-d759ebc9f5eb@gmail.com/

 Documentation/technical/commit-graph-format.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Derrick Stolee Aug. 14, 2020, 12:37 p.m. UTC | #1
On 8/14/2020 8:21 AM, Martin Ågren wrote:
> We say that value 1 means "SHA-1", but in fact, it means "whatever
> the_hash_algo is", see commit c166599862 ("commit-graph: convert to
> using the_hash_algo", 2018-11-14).
> 
> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
>  If we want to be more fine-grained in the future, we'll need to say,
>  e.g., "2 means SHA-1, 3 means SHA-256" or, perhaps preferrably, bump the
>  version number.
> 
>  I wonder: Should we instead say "1 means SHA-1, 2 means SHA-256"? It
>  could be implemented as "easily" as "if (value_from_header !=
>  value_from_the_hash_algo) die(...);" for now. Might that pay off in the
>  long run?
> 
>  This relates to Stolee's "in a vacuum" comment [1] ... so maybe we're
>  fine.

I think that was the intention of the byte, but that is not what ended
up happening. If we want that to be the case, then we should do that
work as part of the 2.29 cycle before we release with the ability to
create SHA-256 repos (which will lock the commit-graph format for these
repos).

(By "we" I mean that I would try to do this work in a way that minimizes
conflicts with the current commit-graph work in flight [1] [2].)

[1] https://lore.kernel.org/git/pull.676.v2.git.1596941624.gitgitgadget@gmail.com/

[2] https://lore.kernel.org/git/cover.1597178914.git.me@ttaylorr.com/

>  [1] https://lore.kernel.org/git/da077fb0-14bb-b84f-c526-d759ebc9f5eb@gmail.com/
> 
>  Documentation/technical/commit-graph-format.txt | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
> index 440541045d..3535426d32 100644
> --- a/Documentation/technical/commit-graph-format.txt
> +++ b/Documentation/technical/commit-graph-format.txt
> @@ -42,8 +42,8 @@ HEADER:
>    1-byte version number:
>        Currently, the only valid version is 1.
>  
> -  1-byte Hash Version (1 = SHA-1)
> -      We infer the hash length (H) from this value.
> +  1-byte Hash Version (1 = SHA-1 in SHA-1 repo, SHA-256 in SHA-256 repo)
> +      We infer the hash length (H) from the hash algo derived from this value.

If we are _not_ changing the format to have a meaningful value in
this byte, then this documentation should be updated to state that
this byte must always have value 1, as it does not provide any
information.

Thanks,
-Stolee
Martin Ågren Aug. 14, 2020, 2:10 p.m. UTC | #2
On Fri, 14 Aug 2020 at 14:37, Derrick Stolee <stolee@gmail.com> wrote:
>
> On 8/14/2020 8:21 AM, Martin Ågren wrote:
> > We say that value 1 means "SHA-1", but in fact, it means "whatever
> > the_hash_algo is", see commit c166599862 ("commit-graph: convert to
> > using the_hash_algo", 2018-11-14).
> >
> > Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> > ---
> >  If we want to be more fine-grained in the future, we'll need to say,
> >  e.g., "2 means SHA-1, 3 means SHA-256" or, perhaps preferrably, bump the
> >  version number.
> >
> >  I wonder: Should we instead say "1 means SHA-1, 2 means SHA-256"? It
> >  could be implemented as "easily" as "if (value_from_header !=
> >  value_from_the_hash_algo) die(...);" for now. Might that pay off in the
> >  long run?
> >
> >  This relates to Stolee's "in a vacuum" comment [1] ... so maybe we're
> >  fine.
>
> I think that was the intention of the byte, but that is not what ended
> up happening.

When I wrote this patch, I did go with "fix <thing>" rather than
"document SHA-256". For the other patches, it's sort of obvious how
those formats are so old that it's no wonder they assumed SHA-1. But
here, we did go to some trouble to try and future-proof things and
already had NewHash in mind. So that calls for "fix <thing>". But I'm
more and more starting to think that it's the implementation that should
be fixed and that the docs should just be extended to add "2 means
SHA-256".

> If we want that to be the case, then we should do that
> work as part of the 2.29 cycle before we release with the ability to
> create SHA-256 repos (which will lock the commit-graph format for these
> repos).

Part of my reasoning behind [3] was that in exactly a situation like this,
we'd be able to say

  With extensions.objectFormat=sha256, Git 2.29-2.30 will barf at the
  commit-graph files that Git 2.31+ generate, and the other way around.
  Users will be able to remove "old" files and regenerate them, and
  shouldn't use a mixed environment.

and know that those users knew this might happen.

But certainly, if we can avoid it altogether by changing behavior
already in 2.29, that would be better.

[3] https://lore.kernel.org/git/20200806202358.2265705-1-martin.agren@gmail.com/

> (By "we" I mean that I would try to do this work in a way that minimizes
> conflicts with the current commit-graph work in flight [1] [2].)

None of those seems to touch `oid_version()`, so if we can just tweak
that function to return 1 or 2 instead of always 1, I guess that's one
way.

> [1] https://lore.kernel.org/git/pull.676.v2.git.1596941624.gitgitgadget@gmail.com/
>
> [2] https://lore.kernel.org/git/cover.1597178914.git.me@ttaylorr.com/
>
> >
> > -  1-byte Hash Version (1 = SHA-1)
> > -      We infer the hash length (H) from this value.
> > +  1-byte Hash Version (1 = SHA-1 in SHA-1 repo, SHA-256 in SHA-256 repo)
> > +      We infer the hash length (H) from the hash algo derived from this value.
>
> If we are _not_ changing the format to have a meaningful value in
> this byte, then this documentation should be updated to state that
> this byte must always have value 1, as it does not provide any
> information.

We could still go

  1 means whatever extensions.objectFormat says,
  2 means SHA-1,
  3 means SHA-256,
  ...

But maybe that would just be crazy.

Thanks for all your comments
Martin
diff mbox series

Patch

diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
index 440541045d..3535426d32 100644
--- a/Documentation/technical/commit-graph-format.txt
+++ b/Documentation/technical/commit-graph-format.txt
@@ -42,8 +42,8 @@  HEADER:
   1-byte version number:
       Currently, the only valid version is 1.
 
-  1-byte Hash Version (1 = SHA-1)
-      We infer the hash length (H) from this value.
+  1-byte Hash Version (1 = SHA-1 in SHA-1 repo, SHA-256 in SHA-256 repo)
+      We infer the hash length (H) from the hash algo derived from this value.
 
   1-byte number (C) of "chunks"