mbox series

[RFC,0/32] SHA256 and SHA1 interoperability

Message ID 87sf7ol0z3.fsf@email.froward.int.ebiederm.org (mailing list archive)
Headers show
Series SHA256 and SHA1 interoperability | expand

Message

Eric W. Biederman Sept. 8, 2023, 11:05 p.m. UTC
I would like to see the SHA256 transition happen so I started playing
with the k2204-transition-interop branch of brian m. carlson's tree.

Before I go farther I need to some other folks to look at this and see
if this is a general direction that the git project can stand.

This patchset is not complete it does not implement converting a
received pack of the compatibility hash into the hash function of the
repository, nor have I written any automated tests.  Both need to happen
before this is finalized.

That said I think I have working implementations of all of the
interesting cases.  In particular I have "git index-pack" computing the
compatibility hash of every object in a pack file, and I can tell you
the sha256 of every sha1 in the git://git.kernel.org/pub/scm/git/git.git

To get there I have tweaked the transition plan a little.

So far I have just aimed for code that works, so there is doubtless
room for improvement.  My hope is that I have implemented enough
that people can play with this, and that people can see all of the
weird little details that need to be taken care of to make this work.

What do everyone else think?  Does this direction look plausible?

Eric W. Biederman (24):
      doc hash-file-transition: A map file for mapping between sha1 and sha256
      doc hash-function-transition: Replace compatObjectFormat with compatMap
      object-file-convert:  Stubs for converting from one object format to another
      object-name: Initial support for ^{sha1} and ^{sha256}
      repository: add a compatibility hash algorithm
      loose: Compatibilty short name support
      object-file: Update the loose object map when writing loose objects
      bulk-checkin: Only accept blobs
      pack: Communicate the compat_oid through struct pack_idx_entry
      object-file: Add a compat_oid_in parameter to write_object_file_flags
      object: Factor out parse_mode out of fast-import and tree-walk into in object.h
      builtin/cat-file:  Let the oid determine the output algorithm
      tree-walk: init_tree_desc take an oid to get the hash algorithm
      object-file: Handle compat objects in check_object_signature
      builtin/ls-tree: Let the oid determine the output algorithm
      builtin/pack-objects:  Communicate the compatibility hash through struct pack_idx_entry
      pack-compat-map:  Add support for .compat files of a packfile
      object-file-convert: Implement convert_object_file_{begin,step,end}
      builtin/fast-import: compute compatibility hashs for imported objects
      builtin/index-pack:  Add a simple oid index
      builtin/index-pack:  Compute the compatibility hash
      builtin/index-pack: Make the stack in compute_compat_oid explicit
      unpack-objects: Update to compute and write the compatibility hashes
      object-file-convert: Implement repo_submodule_oid_to_algop

brian m. carlson (8):
      repository: Implement core.compatMap
      loose: add a mapping between SHA-1 and SHA-256 for loose objects
      bulk-checkin: hash object with compatibility algorithm
      commit: write commits for both hashes
      cache: add a function to read an OID of a specific algorithm
      object-file-convert: add a function to convert trees between algorithms
      object-file-convert: convert commit objects when writing
      object-file-convert: convert tag commits when writing


 Documentation/config/core.txt                      |   6 +
 .../technical/hash-function-transition.txt         |  56 ++-
 Makefile                                           |   4 +
 archive.c                                          |   3 +-
 builtin.h                                          |   1 +
 builtin/am.c                                       |   6 +-
 builtin/cat-file.c                                 |   8 +-
 builtin/checkout.c                                 |   8 +-
 builtin/clone.c                                    |   2 +-
 builtin/commit.c                                   |   2 +-
 builtin/fast-import.c                              | 110 +++--
 builtin/grep.c                                     |   8 +-
 builtin/index-pack.c                               | 441 ++++++++++++++++++++-
 builtin/ls-tree.c                                  |   5 +-
 builtin/merge.c                                    |   3 +-
 builtin/pack-objects.c                             |  13 +-
 builtin/read-tree.c                                |   2 +-
 builtin/show-compat-map.c                          | 139 +++++++
 builtin/stash.c                                    |   5 +-
 builtin/unpack-objects.c                           |  14 +-
 bulk-checkin.c                                     |  55 ++-
 bulk-checkin.h                                     |   6 +-
 cache-tree.c                                       |   4 +-
 commit.c                                           | 176 +++++---
 commit.h                                           |   1 +
 delta-islands.c                                    |   2 +-
 diff-lib.c                                         |   2 +-
 fsck.c                                             |   6 +-
 git.c                                              |   1 +
 hash-ll.h                                          |   3 +
 hash.h                                             |   9 +-
 http-push.c                                        |   2 +-
 list-objects.c                                     |   2 +-
 loose.c                                            | 256 ++++++++++++
 loose.h                                            |  20 +
 match-trees.c                                      |   4 +-
 merge-ort.c                                        |  11 +-
 merge-recursive.c                                  |   2 +-
 merge.c                                            |   3 +-
 object-file-convert.c                              | 366 +++++++++++++++++
 object-file-convert.h                              |  50 +++
 object-file.c                                      | 197 +++++++--
 object-name.c                                      |  77 +++-
 object-store-ll.h                                  |  13 +-
 object.c                                           |   2 +
 object.h                                           |  18 +
 pack-bitmap-write.c                                |   2 +-
 pack-compat-map.c                                  | 334 ++++++++++++++++
 pack-compat-map.h                                  |  27 ++
 pack-write.c                                       | 158 ++++++++
 pack.h                                             |   1 +
 packfile.c                                         |  15 +-
 reflog.c                                           |   2 +-
 repository.c                                       |  17 +
 repository.h                                       |   4 +
 revision.c                                         |   4 +-
 setup.c                                            |   5 +
 setup.h                                            |   1 +
 tree-walk.c                                        |  58 ++-
 tree-walk.h                                        |   7 +-
 tree.c                                             |   2 +-
 walker.c                                           |   2 +-
 62 files changed, 2525 insertions(+), 238 deletions(-)

Eric

Comments

Eric W. Biederman Sept. 9, 2023, 12:58 p.m. UTC | #1
I forgot to mention the patches are against 2.42.

Eric
brian m. carlson Sept. 10, 2023, 3:38 p.m. UTC | #2
On 2023-09-08 at 23:05:52, Eric W. Biederman wrote:
> 
> I would like to see the SHA256 transition happen so I started playing
> with the k2204-transition-interop branch of brian m. carlson's tree.
> 
> Before I go farther I need to some other folks to look at this and see
> if this is a general direction that the git project can stand.

I'm really excited to see this and I think it's a great way forward.
I've taken a brief look at each patch, and I don't see anything that
should be a dealbreaker.  I left a few comments, although I think your
mailserver is blocking mine at the moment, so you may not have received
them (hopefully you can read them on the list in the interim).

You may also feel free to simply adjust the commit message for the
patches of mine you've modified without needing to document that you've
changed them.  I expect that you will have changed them when you submit
them, if only to resolve conflicts.  After all, Junio does so all the
time.

> This patchset is not complete it does not implement converting a
> received pack of the compatibility hash into the hash function of the
> repository, nor have I written any automated tests.  Both need to happen
> before this is finalized.

Speaking of tests, one set of tests I had intended to write and think
should be written, but had not yet implemented, is tests for
round-tripping objects.  That is, the SHA-1 value we get for a revision
in a pure SHA-1 repository should obviously be the same as the SHA-1
value we get in a SHA-256 repository in interop mode, and we should be
able to use the `test_oid_cache` functionality to hard-code the desired
objects.  I think it would be also helpful to do this for fixed objects
that are doubly-signed (with both algorithms) as well, since that's a
tricky edge case that we'll want to avoid breaking.  Other edge cases
will include things like merge commits, including octopus merges.

But overall, I think this is a great improvement, and I'm very excited
to see someone picking up some of this work and moving it forward.
Thanks for doing so.
Eric W. Biederman Sept. 10, 2023, 6:20 p.m. UTC | #3
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2023-09-08 at 23:05:52, Eric W. Biederman wrote:
>> 
>> I would like to see the SHA256 transition happen so I started playing
>> with the k2204-transition-interop branch of brian m. carlson's tree.
>> 
>> Before I go farther I need to some other folks to look at this and see
>> if this is a general direction that the git project can stand.
>
> I'm really excited to see this and I think it's a great way forward.
> I've taken a brief look at each patch, and I don't see anything that
> should be a dealbreaker.  I left a few comments, although I think your
> mailserver is blocking mine at the moment, so you may not have received
> them (hopefully you can read them on the list in the interim).

I can.  I will see if I can figure out what is happening with direct
reception tomorrow.

> You may also feel free to simply adjust the commit message for the
> patches of mine you've modified without needing to document that you've
> changed them.  I expect that you will have changed them when you submit
> them, if only to resolve conflicts.  After all, Junio does so all the
> time.

Thanks.  I was doing my best at striking a balance between giving credit
where is credit is due, and pointing out the bugs are probably mine.

>> This patchset is not complete it does not implement converting a
>> received pack of the compatibility hash into the hash function of the
>> repository, nor have I written any automated tests.  Both need to happen
>> before this is finalized.
>
> Speaking of tests, one set of tests I had intended to write and think
> should be written, but had not yet implemented, is tests for
> round-tripping objects.  That is, the SHA-1 value we get for a revision
> in a pure SHA-1 repository should obviously be the same as the SHA-1
> value we get in a SHA-256 repository in interop mode, and we should be
> able to use the `test_oid_cache` functionality to hard-code the desired
> objects.  I think it would be also helpful to do this for fixed objects
> that are doubly-signed (with both algorithms) as well, since that's a
> tricky edge case that we'll want to avoid breaking.  Other edge cases
> will include things like merge commits, including octopus merges.

Yes.  I think we can use cat-file to do that.  Have two repositories one
in each format.  Verify that when cat-file prints out an object given
the native oid cat-file prints out what was put in.  Similarly verify
that when cat-file prints out an object given the compatibility oid
cat-file prints out the expected conversion.  That logic performed in
both repositories should work.

> But overall, I think this is a great improvement, and I'm very excited
> to see someone picking up some of this work and moving it forward.
> Thanks for doing so.

Thanks.

Then next goal is to get enough merged that I can test the round-trip
conversions.  More than anything else we need to know the conversion
functionality is solid.

Plus I expect that while 32 patches were important to show the scope of
the work, but a bit much to fully review and merge all at once.

Eric
Junio C Hamano Sept. 11, 2023, 6:37 a.m. UTC | #4
"Eric W. Biederman" <ebiederm@xmission.com> writes:

> I would like to see the SHA256 transition happen so I started playing
> with the k2204-transition-interop branch of brian m. carlson's tree.

I needed these tweaks to build the series standalone on 'master' (or
2.42).  There are semantic merge conflicts with some topics in flight
when this is merged to 'seen', so it may take me a bit more time to
push the integration result.

Thanks.

 builtin/fast-import.c | 2 +-
 bulk-checkin.c        | 2 +-
 commit.c              | 2 +-
 object-file-convert.c | 4 ++--
 pack-write.c          | 1 +
 5 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index 66c471bc73..93cc4a491c 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -784,7 +784,7 @@ struct pack_index_names {
 
 static struct pack_index_names create_index(void)
 {
-	struct pack_index_names tmp = {};
+	struct pack_index_names tmp = { 0 };
 	struct pack_idx_entry **idx, **c, **last;
 	struct object_entry *e;
 	struct object_entry_pool *o;
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 3206412a19..d63b3ffa01 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -264,7 +264,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 	struct hashfile_checkpoint checkpoint = {0};
 	struct pack_idx_entry *idx = NULL;
 	const struct git_hash_algo *compat = the_repository->compat_hash_algo;
-	struct object_id compat_oid = {};
+	struct object_id compat_oid = { 0 };
 
 	seekback = lseek(fd, 0, SEEK_CUR);
 	if (seekback == (off_t) -1)
diff --git a/commit.c b/commit.c
index 54f19ed032..2e2b805d5e 100644
--- a/commit.c
+++ b/commit.c
@@ -1654,7 +1654,7 @@ int commit_tree_extended(const char *msg, size_t msg_len,
 	struct strbuf buffer, compat_buffer;
 	struct strbuf sig = STRBUF_INIT, compat_sig = STRBUF_INIT;
 	struct object_id *parent_buf = NULL;
-	struct object_id compat_oid = {};
+	struct object_id compat_oid = { 0 };
 	size_t i, nparents;
 
 	/* Not having i18n.commitencoding is the same as having utf-8 */
diff --git a/object-file-convert.c b/object-file-convert.c
index 2306e17dd5..148e61d24f 100644
--- a/object-file-convert.c
+++ b/object-file-convert.c
@@ -26,7 +26,7 @@ int repo_submodule_oid_to_algop(struct repository *repo,
 
 	for (i = 0; i < repo->index->cache_nr; i++) {
 		const struct cache_entry *ce = repo->index->cache[i];
-		struct repository subrepo = {};
+		struct repository subrepo = { 0 };
 		int ret;
 
 		if (!S_ISGITLINK(ce->ce_mode))
@@ -205,7 +205,7 @@ static int convert_tag_object_step(struct object_file_convert_state *state)
 	const struct git_hash_algo *to = state->to;
 	struct strbuf *out = state->outbuf;
 	const char *buffer = state->buf;
-	size_t payload_size, size = state->buf_len;;
+	size_t payload_size, size = state->buf_len;
 	struct object_id oid;
 	const char *p;
 	int ret = 0;
diff --git a/pack-write.c b/pack-write.c
index f22eea964f..b2ec09737e 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -7,6 +7,7 @@
 #include "remote.h"
 #include "chunk-format.h"
 #include "pack-mtimes.h"
+#include "pack-compat-map.h"
 #include "oidmap.h"
 #include "pack-objects.h"
 #include "pack-revindex.h"
Eric W. Biederman Sept. 11, 2023, 4:13 p.m. UTC | #5
Junio C Hamano <gitster@pobox.com> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>
>> I would like to see the SHA256 transition happen so I started playing
>> with the k2204-transition-interop branch of brian m. carlson's tree.
>
> I needed these tweaks to build the series standalone on 'master' (or
> 2.42).  There are semantic merge conflicts with some topics in flight
> when this is merged to 'seen', so it may take me a bit more time to
> push the integration result.

Junio, brian for the very warm reception of this, it is very
encouraging.

I am not worried about what it will take time to get the changes I
posted into the integration.  I had only envisioned them as good enough
to get the technical ideas across, and had never envisioned them as
being accepted as is.

What I am envisioning as my future directions are:

- Post non controversial cleanups, so they can be merged.
  (I can only see about 4 of them the most significant is:
   bulk-checkin: Only accept blobs)

- Sort out the configuration options

- Post the smallest patchset I can that will allow testing the code in
  object-file-convert.c.  Unfortunately for that I need configuration
  options to enable the mapping.

  In starting to write the tests I have already found a bug in
  the conversion of tags (an extra newline is added), and I haven't
  even gotten to testing the tricky bits with signatures.

- Once the object file conversion is tested and is solid work on
  the more substantial pieces.

Does that sound like a reasonable plan?

Eric
brian m. carlson Sept. 11, 2023, 10:05 p.m. UTC | #6
On 2023-09-11 at 16:13:27, Eric W. Biederman wrote:
> Junio, brian for the very warm reception of this, it is very
> encouraging.
> 
> I am not worried about what it will take time to get the changes I
> posted into the integration.  I had only envisioned them as good enough
> to get the technical ideas across, and had never envisioned them as
> being accepted as is.
> 
> What I am envisioning as my future directions are:
> 
> - Post non controversial cleanups, so they can be merged.
>   (I can only see about 4 of them the most significant is:
>    bulk-checkin: Only accept blobs)
> 
> - Sort out the configuration options
> 
> - Post the smallest patchset I can that will allow testing the code in
>   object-file-convert.c.  Unfortunately for that I need configuration
>   options to enable the mapping.
> 
>   In starting to write the tests I have already found a bug in
>   the conversion of tags (an extra newline is added), and I haven't
>   even gotten to testing the tricky bits with signatures.

I wonder if unit tests are a possibility here now that we're starting to
use them.  They're not obligatory, of course, but it may be more
convenient for you if they turn out to be a suitable option.  If not, no
big deal.

> - Once the object file conversion is tested and is solid work on
>   the more substantial pieces.
> 
> Does that sound like a reasonable plan?

Yeah, that seems fine.
Junio C Hamano Sept. 12, 2023, 4:20 p.m. UTC | #7
"Eric W. Biederman" <ebiederm@xmission.com> writes:

> I am not worried about what it will take time to get the changes I
> posted into the integration.  I had only envisioned them as good enough
> to get the technical ideas across, and had never envisioned them as
> being accepted as is.

Ah, no worries.  By "integration" I did not mean "patches considered
perfect, they are accepted, and are now part of the Git codebase".

All that happens when the patches become part of the 'master'
branch, but before that, patches that prove testable and worthy of
getting tested will be merged to the 'next' branch and spend about a
week there.  What I meant to refer to is a step _before_ that, i.e.
before the patches probe to be testable.  New patches first appear
on the 'seen' branch that merges "everything else" to see the
interaction with all the topics "in flight" (i.e.  not yet in
'master').  The 'seen' branch is reassembled from the latest
iteration of the patches twice of thrice per day, and some patches
are merged to 'next' and down to 'master', these "merging to prepare
'master', 'next' and 'seen' branches for publishing" was what I
meant by "integration".  In short, being queued on 'seen' does not
mean all that much.  It gives project participants an easy access to
view how topics look in the larger picture, potentially interacting
with other topics in flight, but the patches in there can be
replaced wholesale or even dropped if they do not turn out to be
desirable.

I resolved textual conflicts and also compiler detectable semantic
conflicts (e.g. some in-flight topics may have added callsites to a
function your topic changes the function sigunature, or vice versa)
to the point that the result compiles while merging this topic to
'seen', but tests are broken the big time, it seems, even though the
topic by itself seems to pass the tests standalone.

> What I am envisioning as my future directions are:
> ...
> Does that sound like a reasonable plan?

Nice.
Eric W. Biederman Sept. 12, 2023, 9:19 p.m. UTC | #8
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2023-09-11 at 16:13:27, Eric W. Biederman wrote:
>> Junio, brian for the very warm reception of this, it is very
>> encouraging.
>> 
>> I am not worried about what it will take time to get the changes I
>> posted into the integration.  I had only envisioned them as good enough
>> to get the technical ideas across, and had never envisioned them as
>> being accepted as is.
>> 
>> What I am envisioning as my future directions are:
>> 
>> - Post non controversial cleanups, so they can be merged.
>>   (I can only see about 4 of them the most significant is:
>>    bulk-checkin: Only accept blobs)
>> 
>> - Sort out the configuration options
>> 
>> - Post the smallest patchset I can that will allow testing the code in
>>   object-file-convert.c.  Unfortunately for that I need configuration
>>   options to enable the mapping.
>> 
>>   In starting to write the tests I have already found a bug in
>>   the conversion of tags (an extra newline is added), and I haven't
>>   even gotten to testing the tricky bits with signatures.
>
> I wonder if unit tests are a possibility here now that we're starting to
> use them.  They're not obligatory, of course, but it may be more
> convenient for you if they turn out to be a suitable option.  If not, no
> big deal.

I believe you mean using test-tool and making a very narrow focused test
on just the functionality.

If the number of patches I have to go through before I can test anything
becomes a problem I might go there.  Unfortunately it would take some
refactoring to make object-file-convert independent of the object
mapping layer, and that is extra work and is likely to introduces bugs
as anything.

I have managed to get a set of tests working.  I am just going through
now and plugging the holes.

My big strategy for testing convert_object_file is to build two
repositories one sha1 and the other sha256 both with compatibility
support enabled.  I add a series of objects to those repositories and
compare them to ensure the objects are identical.

It is working well and is finding bugs not just in convert_object_file
but in code such as commit and tag that perform interesting work with
signed commits.

I discovered I had bungled the placement of hash_object_file for
the compatibility hash in commit.

I found that git tag did not yet support building tags with both
hash algorithms.

Right now I am looking at commits with mergetag lines.  It is not fun.
The mergetag instead of pointing to the tag objects they include the
body of the tag objects in the commit object.  So I have convert
the embedded tag objects from one hash function to another.
Which given the presence of preceding space on every line of
the embedded tag object makes it doubly interesting.  Perhaps
someone has already written code to extract the embedded tag.

Or in short everything is moving along steadily.

Eric
Eric W. Biederman Sept. 14, 2023, 7:57 p.m. UTC | #9
Junio C Hamano <gitster@pobox.com> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>
>> I am not worried about what it will take time to get the changes I
>> posted into the integration.  I had only envisioned them as good enough
>> to get the technical ideas across, and had never envisioned them as
>> being accepted as is.
>
> Ah, no worries.  By "integration" I did not mean "patches considered
> perfect, they are accepted, and are now part of the Git codebase".
>
> All that happens when the patches become part of the 'master'
> branch, but before that, patches that prove testable and worthy of
> getting tested will be merged to the 'next' branch and spend about a
> week there.  What I meant to refer to is a step _before_ that, i.e.
> before the patches probe to be testable.  New patches first appear
> on the 'seen' branch that merges "everything else" to see the
> interaction with all the topics "in flight" (i.e.  not yet in
> 'master').  The 'seen' branch is reassembled from the latest
> iteration of the patches twice of thrice per day, and some patches
> are merged to 'next' and down to 'master', these "merging to prepare
> 'master', 'next' and 'seen' branches for publishing" was what I
> meant by "integration".  In short, being queued on 'seen' does not
> mean all that much.  It gives project participants an easy access to
> view how topics look in the larger picture, potentially interacting
> with other topics in flight, but the patches in there can be
> replaced wholesale or even dropped if they do not turn out to be
> desirable.
>
> I resolved textual conflicts and also compiler detectable semantic
> conflicts (e.g. some in-flight topics may have added callsites to a
> function your topic changes the function sigunature, or vice versa)
> to the point that the result compiles while merging this topic to
> 'seen', but tests are broken the big time, it seems, even though the
> topic by itself seems to pass the tests standalone.

That the tests are broken is very unfortunate.

I took at look at What's cooking in git.git and I did not see my topic
mentioned.  So I presume I would have to perform the test merge myself
to have a sense of what the conflicts were.

Is there a time when in flight topics is low?  I had a hunch that basing
my work on a brand new release would achieve that but I saw a lot of
topics in your "What's cooking" email.

I am just trying to figure out a good plan to deal with conflicts,
because the bugs need to be hunted down.

Eric