Message ID | 87sf7ol0z3.fsf@email.froward.int.ebiederm.org (mailing list archive) |
---|---|
Headers | show |
Series | SHA256 and SHA1 interoperability | expand |
I forgot to mention the patches are against 2.42. Eric
On 2023-09-08 at 23:05:52, Eric W. Biederman wrote: > > I would like to see the SHA256 transition happen so I started playing > with the k2204-transition-interop branch of brian m. carlson's tree. > > Before I go farther I need to some other folks to look at this and see > if this is a general direction that the git project can stand. I'm really excited to see this and I think it's a great way forward. I've taken a brief look at each patch, and I don't see anything that should be a dealbreaker. I left a few comments, although I think your mailserver is blocking mine at the moment, so you may not have received them (hopefully you can read them on the list in the interim). You may also feel free to simply adjust the commit message for the patches of mine you've modified without needing to document that you've changed them. I expect that you will have changed them when you submit them, if only to resolve conflicts. After all, Junio does so all the time. > This patchset is not complete it does not implement converting a > received pack of the compatibility hash into the hash function of the > repository, nor have I written any automated tests. Both need to happen > before this is finalized. Speaking of tests, one set of tests I had intended to write and think should be written, but had not yet implemented, is tests for round-tripping objects. That is, the SHA-1 value we get for a revision in a pure SHA-1 repository should obviously be the same as the SHA-1 value we get in a SHA-256 repository in interop mode, and we should be able to use the `test_oid_cache` functionality to hard-code the desired objects. I think it would be also helpful to do this for fixed objects that are doubly-signed (with both algorithms) as well, since that's a tricky edge case that we'll want to avoid breaking. Other edge cases will include things like merge commits, including octopus merges. But overall, I think this is a great improvement, and I'm very excited to see someone picking up some of this work and moving it forward. Thanks for doing so.
"brian m. carlson" <sandals@crustytoothpaste.net> writes: > On 2023-09-08 at 23:05:52, Eric W. Biederman wrote: >> >> I would like to see the SHA256 transition happen so I started playing >> with the k2204-transition-interop branch of brian m. carlson's tree. >> >> Before I go farther I need to some other folks to look at this and see >> if this is a general direction that the git project can stand. > > I'm really excited to see this and I think it's a great way forward. > I've taken a brief look at each patch, and I don't see anything that > should be a dealbreaker. I left a few comments, although I think your > mailserver is blocking mine at the moment, so you may not have received > them (hopefully you can read them on the list in the interim). I can. I will see if I can figure out what is happening with direct reception tomorrow. > You may also feel free to simply adjust the commit message for the > patches of mine you've modified without needing to document that you've > changed them. I expect that you will have changed them when you submit > them, if only to resolve conflicts. After all, Junio does so all the > time. Thanks. I was doing my best at striking a balance between giving credit where is credit is due, and pointing out the bugs are probably mine. >> This patchset is not complete it does not implement converting a >> received pack of the compatibility hash into the hash function of the >> repository, nor have I written any automated tests. Both need to happen >> before this is finalized. > > Speaking of tests, one set of tests I had intended to write and think > should be written, but had not yet implemented, is tests for > round-tripping objects. That is, the SHA-1 value we get for a revision > in a pure SHA-1 repository should obviously be the same as the SHA-1 > value we get in a SHA-256 repository in interop mode, and we should be > able to use the `test_oid_cache` functionality to hard-code the desired > objects. I think it would be also helpful to do this for fixed objects > that are doubly-signed (with both algorithms) as well, since that's a > tricky edge case that we'll want to avoid breaking. Other edge cases > will include things like merge commits, including octopus merges. Yes. I think we can use cat-file to do that. Have two repositories one in each format. Verify that when cat-file prints out an object given the native oid cat-file prints out what was put in. Similarly verify that when cat-file prints out an object given the compatibility oid cat-file prints out the expected conversion. That logic performed in both repositories should work. > But overall, I think this is a great improvement, and I'm very excited > to see someone picking up some of this work and moving it forward. > Thanks for doing so. Thanks. Then next goal is to get enough merged that I can test the round-trip conversions. More than anything else we need to know the conversion functionality is solid. Plus I expect that while 32 patches were important to show the scope of the work, but a bit much to fully review and merge all at once. Eric
"Eric W. Biederman" <ebiederm@xmission.com> writes: > I would like to see the SHA256 transition happen so I started playing > with the k2204-transition-interop branch of brian m. carlson's tree. I needed these tweaks to build the series standalone on 'master' (or 2.42). There are semantic merge conflicts with some topics in flight when this is merged to 'seen', so it may take me a bit more time to push the integration result. Thanks. builtin/fast-import.c | 2 +- bulk-checkin.c | 2 +- commit.c | 2 +- object-file-convert.c | 4 ++-- pack-write.c | 1 + 5 files changed, 6 insertions(+), 5 deletions(-) diff --git a/builtin/fast-import.c b/builtin/fast-import.c index 66c471bc73..93cc4a491c 100644 --- a/builtin/fast-import.c +++ b/builtin/fast-import.c @@ -784,7 +784,7 @@ struct pack_index_names { static struct pack_index_names create_index(void) { - struct pack_index_names tmp = {}; + struct pack_index_names tmp = { 0 }; struct pack_idx_entry **idx, **c, **last; struct object_entry *e; struct object_entry_pool *o; diff --git a/bulk-checkin.c b/bulk-checkin.c index 3206412a19..d63b3ffa01 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -264,7 +264,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, struct hashfile_checkpoint checkpoint = {0}; struct pack_idx_entry *idx = NULL; const struct git_hash_algo *compat = the_repository->compat_hash_algo; - struct object_id compat_oid = {}; + struct object_id compat_oid = { 0 }; seekback = lseek(fd, 0, SEEK_CUR); if (seekback == (off_t) -1) diff --git a/commit.c b/commit.c index 54f19ed032..2e2b805d5e 100644 --- a/commit.c +++ b/commit.c @@ -1654,7 +1654,7 @@ int commit_tree_extended(const char *msg, size_t msg_len, struct strbuf buffer, compat_buffer; struct strbuf sig = STRBUF_INIT, compat_sig = STRBUF_INIT; struct object_id *parent_buf = NULL; - struct object_id compat_oid = {}; + struct object_id compat_oid = { 0 }; size_t i, nparents; /* Not having i18n.commitencoding is the same as having utf-8 */ diff --git a/object-file-convert.c b/object-file-convert.c index 2306e17dd5..148e61d24f 100644 --- a/object-file-convert.c +++ b/object-file-convert.c @@ -26,7 +26,7 @@ int repo_submodule_oid_to_algop(struct repository *repo, for (i = 0; i < repo->index->cache_nr; i++) { const struct cache_entry *ce = repo->index->cache[i]; - struct repository subrepo = {}; + struct repository subrepo = { 0 }; int ret; if (!S_ISGITLINK(ce->ce_mode)) @@ -205,7 +205,7 @@ static int convert_tag_object_step(struct object_file_convert_state *state) const struct git_hash_algo *to = state->to; struct strbuf *out = state->outbuf; const char *buffer = state->buf; - size_t payload_size, size = state->buf_len;; + size_t payload_size, size = state->buf_len; struct object_id oid; const char *p; int ret = 0; diff --git a/pack-write.c b/pack-write.c index f22eea964f..b2ec09737e 100644 --- a/pack-write.c +++ b/pack-write.c @@ -7,6 +7,7 @@ #include "remote.h" #include "chunk-format.h" #include "pack-mtimes.h" +#include "pack-compat-map.h" #include "oidmap.h" #include "pack-objects.h" #include "pack-revindex.h"
Junio C Hamano <gitster@pobox.com> writes: > "Eric W. Biederman" <ebiederm@xmission.com> writes: > >> I would like to see the SHA256 transition happen so I started playing >> with the k2204-transition-interop branch of brian m. carlson's tree. > > I needed these tweaks to build the series standalone on 'master' (or > 2.42). There are semantic merge conflicts with some topics in flight > when this is merged to 'seen', so it may take me a bit more time to > push the integration result. Junio, brian for the very warm reception of this, it is very encouraging. I am not worried about what it will take time to get the changes I posted into the integration. I had only envisioned them as good enough to get the technical ideas across, and had never envisioned them as being accepted as is. What I am envisioning as my future directions are: - Post non controversial cleanups, so they can be merged. (I can only see about 4 of them the most significant is: bulk-checkin: Only accept blobs) - Sort out the configuration options - Post the smallest patchset I can that will allow testing the code in object-file-convert.c. Unfortunately for that I need configuration options to enable the mapping. In starting to write the tests I have already found a bug in the conversion of tags (an extra newline is added), and I haven't even gotten to testing the tricky bits with signatures. - Once the object file conversion is tested and is solid work on the more substantial pieces. Does that sound like a reasonable plan? Eric
On 2023-09-11 at 16:13:27, Eric W. Biederman wrote: > Junio, brian for the very warm reception of this, it is very > encouraging. > > I am not worried about what it will take time to get the changes I > posted into the integration. I had only envisioned them as good enough > to get the technical ideas across, and had never envisioned them as > being accepted as is. > > What I am envisioning as my future directions are: > > - Post non controversial cleanups, so they can be merged. > (I can only see about 4 of them the most significant is: > bulk-checkin: Only accept blobs) > > - Sort out the configuration options > > - Post the smallest patchset I can that will allow testing the code in > object-file-convert.c. Unfortunately for that I need configuration > options to enable the mapping. > > In starting to write the tests I have already found a bug in > the conversion of tags (an extra newline is added), and I haven't > even gotten to testing the tricky bits with signatures. I wonder if unit tests are a possibility here now that we're starting to use them. They're not obligatory, of course, but it may be more convenient for you if they turn out to be a suitable option. If not, no big deal. > - Once the object file conversion is tested and is solid work on > the more substantial pieces. > > Does that sound like a reasonable plan? Yeah, that seems fine.
"Eric W. Biederman" <ebiederm@xmission.com> writes: > I am not worried about what it will take time to get the changes I > posted into the integration. I had only envisioned them as good enough > to get the technical ideas across, and had never envisioned them as > being accepted as is. Ah, no worries. By "integration" I did not mean "patches considered perfect, they are accepted, and are now part of the Git codebase". All that happens when the patches become part of the 'master' branch, but before that, patches that prove testable and worthy of getting tested will be merged to the 'next' branch and spend about a week there. What I meant to refer to is a step _before_ that, i.e. before the patches probe to be testable. New patches first appear on the 'seen' branch that merges "everything else" to see the interaction with all the topics "in flight" (i.e. not yet in 'master'). The 'seen' branch is reassembled from the latest iteration of the patches twice of thrice per day, and some patches are merged to 'next' and down to 'master', these "merging to prepare 'master', 'next' and 'seen' branches for publishing" was what I meant by "integration". In short, being queued on 'seen' does not mean all that much. It gives project participants an easy access to view how topics look in the larger picture, potentially interacting with other topics in flight, but the patches in there can be replaced wholesale or even dropped if they do not turn out to be desirable. I resolved textual conflicts and also compiler detectable semantic conflicts (e.g. some in-flight topics may have added callsites to a function your topic changes the function sigunature, or vice versa) to the point that the result compiles while merging this topic to 'seen', but tests are broken the big time, it seems, even though the topic by itself seems to pass the tests standalone. > What I am envisioning as my future directions are: > ... > Does that sound like a reasonable plan? Nice.
"brian m. carlson" <sandals@crustytoothpaste.net> writes: > On 2023-09-11 at 16:13:27, Eric W. Biederman wrote: >> Junio, brian for the very warm reception of this, it is very >> encouraging. >> >> I am not worried about what it will take time to get the changes I >> posted into the integration. I had only envisioned them as good enough >> to get the technical ideas across, and had never envisioned them as >> being accepted as is. >> >> What I am envisioning as my future directions are: >> >> - Post non controversial cleanups, so they can be merged. >> (I can only see about 4 of them the most significant is: >> bulk-checkin: Only accept blobs) >> >> - Sort out the configuration options >> >> - Post the smallest patchset I can that will allow testing the code in >> object-file-convert.c. Unfortunately for that I need configuration >> options to enable the mapping. >> >> In starting to write the tests I have already found a bug in >> the conversion of tags (an extra newline is added), and I haven't >> even gotten to testing the tricky bits with signatures. > > I wonder if unit tests are a possibility here now that we're starting to > use them. They're not obligatory, of course, but it may be more > convenient for you if they turn out to be a suitable option. If not, no > big deal. I believe you mean using test-tool and making a very narrow focused test on just the functionality. If the number of patches I have to go through before I can test anything becomes a problem I might go there. Unfortunately it would take some refactoring to make object-file-convert independent of the object mapping layer, and that is extra work and is likely to introduces bugs as anything. I have managed to get a set of tests working. I am just going through now and plugging the holes. My big strategy for testing convert_object_file is to build two repositories one sha1 and the other sha256 both with compatibility support enabled. I add a series of objects to those repositories and compare them to ensure the objects are identical. It is working well and is finding bugs not just in convert_object_file but in code such as commit and tag that perform interesting work with signed commits. I discovered I had bungled the placement of hash_object_file for the compatibility hash in commit. I found that git tag did not yet support building tags with both hash algorithms. Right now I am looking at commits with mergetag lines. It is not fun. The mergetag instead of pointing to the tag objects they include the body of the tag objects in the commit object. So I have convert the embedded tag objects from one hash function to another. Which given the presence of preceding space on every line of the embedded tag object makes it doubly interesting. Perhaps someone has already written code to extract the embedded tag. Or in short everything is moving along steadily. Eric
Junio C Hamano <gitster@pobox.com> writes: > "Eric W. Biederman" <ebiederm@xmission.com> writes: > >> I am not worried about what it will take time to get the changes I >> posted into the integration. I had only envisioned them as good enough >> to get the technical ideas across, and had never envisioned them as >> being accepted as is. > > Ah, no worries. By "integration" I did not mean "patches considered > perfect, they are accepted, and are now part of the Git codebase". > > All that happens when the patches become part of the 'master' > branch, but before that, patches that prove testable and worthy of > getting tested will be merged to the 'next' branch and spend about a > week there. What I meant to refer to is a step _before_ that, i.e. > before the patches probe to be testable. New patches first appear > on the 'seen' branch that merges "everything else" to see the > interaction with all the topics "in flight" (i.e. not yet in > 'master'). The 'seen' branch is reassembled from the latest > iteration of the patches twice of thrice per day, and some patches > are merged to 'next' and down to 'master', these "merging to prepare > 'master', 'next' and 'seen' branches for publishing" was what I > meant by "integration". In short, being queued on 'seen' does not > mean all that much. It gives project participants an easy access to > view how topics look in the larger picture, potentially interacting > with other topics in flight, but the patches in there can be > replaced wholesale or even dropped if they do not turn out to be > desirable. > > I resolved textual conflicts and also compiler detectable semantic > conflicts (e.g. some in-flight topics may have added callsites to a > function your topic changes the function sigunature, or vice versa) > to the point that the result compiles while merging this topic to > 'seen', but tests are broken the big time, it seems, even though the > topic by itself seems to pass the tests standalone. That the tests are broken is very unfortunate. I took at look at What's cooking in git.git and I did not see my topic mentioned. So I presume I would have to perform the test merge myself to have a sense of what the conflicts were. Is there a time when in flight topics is low? I had a hunch that basing my work on a brand new release would achieve that but I saw a lot of topics in your "What's cooking" email. I am just trying to figure out a good plan to deal with conflicts, because the bugs need to be hunted down. Eric