[1/5] http-protocol.txt: document SHA-256 "want"/"have" format

Message ID 5590a68c5ba7081cd7e64c708b5c25db23f5e95b.1597406877.git.martin.agren@gmail.com
State Superseded
Series more SHA-256 documentation

Commit Message

Martin Ågren Aug. 14, 2020, 12:21 p.m. UTC
Document that in SHA-1 repositories, we use SHA-1 for "want"s and
"have"s, and in SHA-256 repositories, we use SHA-256.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 Documentation/technical/http-protocol.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Junio C Hamano Aug. 14, 2020, 5:28 p.m. UTC | #1
Martin Ågren <martin.agren@gmail.com> writes:

> Document that in SHA-1 repositories, we use SHA-1 for "want"s and
> "have"s, and in SHA-256 repositories, we use SHA-256.

Ehh, doesn't this directly contradict the transition plan of "on the
wire everything will use SHA-1 version for now?"



> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
>  Documentation/technical/http-protocol.txt | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
> index 51a79e63de..507f28f9b3 100644
> --- a/Documentation/technical/http-protocol.txt
> +++ b/Documentation/technical/http-protocol.txt
> @@ -401,8 +401,9 @@ at all in the request stream:
>  The stream is terminated by a pkt-line flush (`0000`).
>  
>  A single "want" or "have" command MUST have one hex formatted
> -SHA-1 as its value.  Multiple SHA-1s MUST be sent by sending
> -multiple commands.
> +object name as its value.  Multiple object names MUST be sent by sending
> +multiple commands. (An object name is a SHA-1 hash in a SHA-1 repo
> +and a SHA-256 hash in a SHA-256 repo.)
>  
>  The `have` list is created by popping the first 32 commits
>  from `c_pending`.  Less can be supplied if `c_pending` empties.
brian m. carlson Aug. 14, 2020, 8:23 p.m. UTC | #2
On 2020-08-14 at 17:28:27, Junio C Hamano wrote:
> Martin Ågren <martin.agren@gmail.com> writes:
> 
> > Document that in SHA-1 repositories, we use SHA-1 for "want"s and
> > "have"s, and in SHA-256 repositories, we use SHA-256.
> 
> Ehh, doesn't this directly contradict the transition plan of "on the
> wire everything will use SHA-1 version for now?"

SHA-256 repositories interoperate currently using SHA-256 object IDs.
It was originally intended that we wouldn't update the protocol, but
that leads to much of the testsuite failing since it's impossible to
move objects from one place to another.

If we wanted to be more pedantically correct and optimize for the
future, we could say that the values use the format negotiated by the
"object-format" protocol extension and SHA-1 otherwise.
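
To make the suggestion above concrete, here is a minimal sketch of "use the format negotiated by object-format, and SHA-1 otherwise". It assumes the peer's capabilities arrive as a plain list of strings; the function name and that input shape are hypothetical and are not taken from Git's source.

```python
def negotiated_object_format(capabilities):
    """Pick the object name format for "want"/"have" values.

    `capabilities` is assumed to be a list of strings such as
    "object-format=sha256". If no object-format capability is
    present, fall back to SHA-1, as suggested in the message above.
    """
    for cap in capabilities:
        name, sep, value = cap.partition("=")
        if name == "object-format" and sep:
            return value
    return "sha1"
```

With this reading, a peer that says nothing about object-format is treated exactly like a SHA-1 peer.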
Martin Ågren Aug. 14, 2020, 8:32 p.m. UTC | #3
On Fri, 14 Aug 2020 at 22:23, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On 2020-08-14 at 17:28:27, Junio C Hamano wrote:
> > Martin Ågren <martin.agren@gmail.com> writes:
> >
> > > Document that in SHA-1 repositories, we use SHA-1 for "want"s and
> > > "have"s, and in SHA-256 repositories, we use SHA-256.
> >
> > Ehh, doesn't this directly contradict the transition plan of "on the
> > wire everything will use SHA-1 version for now?"

Yes, the transition plan would probably need updating there. I'm just
trying to document what we have.

> SHA-256 repositories interoperate currently using SHA-256 object IDs.
> It was originally intended that we wouldn't update the protocol, but
> that leads to much of the testsuite failing since it's impossible to
> move objects from one place to another.
>
> If we wanted to be more pedantically correct and optimize for the
> future, we could say that the values use the format negotiated by the
> "object-format" protocol extension and SHA-1 otherwise.

Hmm, I didn't think of that. Would we ever regret having painted
such a "big" picture and wish to refine it somehow? The alternative is
to be admittedly fairly narrow, as I am here, and loosen things later.
I'll think about it, but I think I could go either way.

Martin
Junio C Hamano Aug. 14, 2020, 8:39 p.m. UTC | #4
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2020-08-14 at 17:28:27, Junio C Hamano wrote:
>> Martin Ågren <martin.agren@gmail.com> writes:
>> 
>> > Document that in SHA-1 repositories, we use SHA-1 for "want"s and
>> > "have"s, and in SHA-256 repositories, we use SHA-256.
>> 
>> Ehh, doesn't this directly contradict the transition plan of "on the
>> wire everything will use SHA-1 version for now?"
>
> SHA-256 repositories interoperate currently using SHA-256 object IDs.
> It was originally intended that we wouldn't update the protocol, but
> that leads to much of the testsuite failing since it's impossible to
> move objects from one place to another.
>
> If we wanted to be more pedantically correct and optimize for the
> future, we could say that the values use the format negotiated by the
> "object-format" protocol extension and SHA-1 otherwise.

Yup.  I think a reasonable evolution path is

    0) everything on the wire is SHA-1 and no local operation knows
       SHA-256 (i.e. a few releases ago)

    1) local operations are either SHA-1 or SHA-256 but not both.
       On the wire, only the protocol for SHA-1 repositories is
       defined, so SHA-256 repositories cannot talk with anybody
       using any official protocol, but a "borked" SHA-1 protocol
       that naturally extends the object name width exists and
       SHA-256 repositories can interoperate with each other.  This
       will be a backward compatibility nightmare, as Git in a
       SHA-256 repository that tries to talk to a SHA-1 repository
       will fail, but without grace (i.e. the current situation).

    2) on-the-wire protocol gains just one new capability to safely
       unleash SHA-256 repositories to talk to the wider world.  The
       "borked" SHA-1 protocol above will become official when the
       object-format=sha256 capability is negotiated by both ends.
       At this stage, SHA-256 repositories still cannot talk with
       SHA-1 repositories, but at least they can talk among
       themselves as long as they use a new-enough version of Git
       that knows about the new capability.

    3) on-the-fly SHA-1 vs SHA-256 migration gets implemented.
       A SHA-256 repository trying to talk to somebody else, after
       discovering that the other end lacks the object-format=sha256
       capability, converts its SHA-256 objects to SHA-1 on the fly,
       and vice versa.  Between SHA-256 repositories, the capability
       above in 2) will allow native conversation with SHA-256.

Reaching 3) may be a lot of work, but at least we should get to 2)
to be able to safely let SHA-256 repositories talk to the outside
world, and with much smaller effort.  (Yes, I consider it OK for
SHA-256 repositories to talk among themselves in a private setting in
the current state; it would be a good milestone and also a test
towards the eventual goal of reaching 3).)

Thanks.
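
The difference between stages 2) and 3) above can be sketched as a small decision function. This is only an illustration of the outline in this message, not Git's behavior: the function name, the "native"/"convert"/"fail" labels, and the `can_convert` flag (standing in for the on-the-fly conversion of stage 3) are all hypothetical.

```python
def plan_exchange(local_format, peer_capabilities, can_convert=False):
    """Return "native", "convert", or "fail" for one connection.

    A peer that does not advertise object-format=<name> is treated as
    a SHA-1 peer. With can_convert=False this models stage 2), where a
    format mismatch can only fail; with can_convert=True it models
    stage 3), where a mismatch is handled by converting on the fly.
    """
    peer_format = "sha1"
    for cap in peer_capabilities:
        name, sep, value = cap.partition("=")
        if name == "object-format" and sep:
            peer_format = value
    if peer_format == local_format:
        return "native"
    return "convert" if can_convert else "fail"
```

For example, `plan_exchange("sha256", [])` gives `"fail"` under stage 2), while the same call with `can_convert=True` gives `"convert"`.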
Junio C Hamano Aug. 14, 2020, 8:47 p.m. UTC | #5
Junio C Hamano <gitster@pobox.com> writes:

> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
>> On 2020-08-14 at 17:28:27, Junio C Hamano wrote:
>>> Martin Ågren <martin.agren@gmail.com> writes:
>>> 
>>> > Document that in SHA-1 repositories, we use SHA-1 for "want"s and
>>> > "have"s, and in SHA-256 repositories, we use SHA-256.
>>> 
>>> Ehh, doesn't this directly contradict the transition plan of "on the
>>> wire everything will use SHA-1 version for now?"
>>
>> SHA-256 repositories interoperate currently using SHA-256 object IDs.
>> It was originally intended that we wouldn't update the protocol, but
>> that leads to much of the testsuite failing since it's impossible to
>> move objects from one place to another.
>>
>> If we wanted to be more pedantically correct and optimize for the
>> future, we could say that the values use the format negotiated by the
>> "object-format" protocol extension and SHA-1 otherwise.

Yes, that's wonderful.  I was confused when I wrote about the
evolution path.  We still would want to eventually do the on-the-fly
migration over the wire to make SHA-1 and SHA-256 repositories
interoperate, but at least we can already allow SHA-256 repositories
to safely attempt to talk to SHA-1 repositories and fail gracefully.

Thanks.

Patch

diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
index 51a79e63de..507f28f9b3 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -401,8 +401,9 @@  at all in the request stream:
 The stream is terminated by a pkt-line flush (`0000`).
 
 A single "want" or "have" command MUST have one hex formatted
-SHA-1 as its value.  Multiple SHA-1s MUST be sent by sending
-multiple commands.
+object name as its value.  Multiple object names MUST be sent by sending
+multiple commands. (An object name is a SHA-1 hash in a SHA-1 repo
+and a SHA-256 hash in a SHA-256 repo.)
 
 The `have` list is created by popping the first 32 commits
 from `c_pending`.  Less can be supplied if `c_pending` empties.
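
As an illustration of the rule in this hunk, the sketch below frames a single "want" or "have" command as a pkt-line and checks that its value is one hex object name of the expected width (40 hex digits for SHA-1, 64 for SHA-256). The helper names and the `object_format` parameter are assumptions made for the example; they are not part of http-protocol.txt or of Git's code.

```python
import re

# Hex widths of object names: SHA-1 is 160 bits, SHA-256 is 256 bits.
HEX_LEN = {"sha1": 40, "sha256": 64}


def pkt_line(payload):
    """Frame an ASCII payload as a pkt-line.

    The 4-hex-digit length prefix counts itself plus the payload.
    """
    data = payload.encode("ascii")
    return b"%04x" % (len(data) + 4) + data


def command(verb, object_name, object_format="sha1"):
    """Build one "want" or "have" command carrying exactly one object name."""
    if verb not in ("want", "have"):
        raise ValueError("verb must be 'want' or 'have'")
    width = HEX_LEN[object_format]
    if not re.fullmatch(r"[0-9a-f]{%d}" % width, object_name):
        raise ValueError(
            "expected %d lowercase hex digits for %s" % (width, object_format)
        )
    return pkt_line("%s %s\n" % (verb, object_name))


def flush_pkt():
    """The flush packet (`0000`) that terminates the stream."""
    return b"0000"
```

Sending several object names then means calling `command()` once per name and ending the stream with `flush_pkt()`, rather than packing multiple names into one command.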