Message ID | 20230908231049.2035003-2-ebiederm@xmission.com (mailing list archive) |
---|---|
State | New, archived |
Series | SHA256 and SHA1 interoperability |
On 2023-09-08 at 23:10:19, Eric W. Biederman wrote:
> Ir makes a lot of sense for the hash algorithm that determines how all

Minor nit: "It".

> diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt
> index 4b937480848a..10572c5794f9 100644
> --- a/Documentation/technical/hash-function-transition.txt
> +++ b/Documentation/technical/hash-function-transition.txt
> @@ -148,14 +148,14 @@ Detailed Design
>  Repository format extension
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>  A SHA-256 repository uses repository format version `1` (see
> -Documentation/technical/repository-version.txt) with extensions
> -`objectFormat` and `compatObjectFormat`:
> +Documentation/technical/repository-version.txt) with the extension
> +`objectFormat`, and an optional core.compatMap configuration.
>
>  	[core]
>  		repositoryFormatVersion = 1
> +		compatMap = on
>  	[extensions]
>  		objectFormat = sha256
> -		compatObjectFormat = sha1

While I'm in favour of an approach that uses the compat map, the
situation we've implemented here doesn't specify the extra hash
algorithm. We want this approach to work just as well for moving from
SHA-1 to SHA-256 as it might for a future transition from SHA-256 to,
say, SHA-3-512, if that becomes necessary.

Making a future transition easier has been a goal of my SHA-256 work
(because who wants to write several hundred patches in such a case?), so
my hope is we can keep that here as well by explicitly naming the
algorithm we're using.

I also wonder if an approach that doesn't use an extension is going to
be helpful. Say, that I have a repository that is using Git 3.x, which
supports interop, but I also need to use Git 2.x, which does not. While
it's true that Git 2.x can read my SHA-256 repository, it won't write
the appropriate objects into the map, and thus it will be practically
very difficult to actually use Git 3.x to push data to a repository of a
different hash function. We might well prefer to have Git 2.x not work
with the repository at all rather than have incomplete data preventing
us from, well, interoperating.
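The disagreement here turns on how Git reacts to configuration it does not understand. The following is a toy model in Python, not Git's code. It assumes the rule from Documentation/technical/repository-version.txt that, in a version-1 repository, an unrecognized `extensions.*` key makes Git refuse to operate, while other unknown keys (such as the proposed `core.compatMap`) are simply ignored. The function `can_open` and the sets of "known" extensions are illustrative assumptions; keys are lowercased because Git config names are case-insensitive.

```python
from __future__ import annotations


def can_open(config: dict[str, str], known_extensions: set[str]) -> bool:
    """Toy model: would a Git build that knows `known_extensions` proceed?

    Assumes core.repositoryformatversion = 1; version-0 handling is not
    modelled. Keys are lowercased, as Git config names are case-insensitive.
    """
    for key in config:
        if key.startswith("extensions."):
            if key[len("extensions."):] not in known_extensions:
                return False  # unknown extension: refuse rather than guess
    return True  # any other unknown key (e.g. core.compatmap) is ignored


repo_with_option = {
    "core.repositoryformatversion": "1",
    "core.compatmap": "on",
    "extensions.objectformat": "sha256",
}
repo_with_extension = {
    "core.repositoryformatversion": "1",
    "extensions.objectformat": "sha256",
    "extensions.compatobjectformat": "sha1",
}
old_git = {"objectformat"}  # illustrative: a build without interop support
new_git = {"objectformat", "compatobjectformat"}  # illustrative: interop-aware

print(can_open(repo_with_option, old_git))     # True: opens, but never updates the map
print(can_open(repo_with_extension, old_git))  # False: refuses the repository
print(can_open(repo_with_extension, new_git))  # True
```

Under these assumptions, the option lets an older Git keep writing objects that never reach the map, which is the incomplete-data concern; the extension trades that for a hard refusal, and naming the algorithm in its value (`sha1`) leaves room for a later transition to another hash.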
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2023-09-08 at 23:10:19, Eric W. Biederman wrote:
>> Ir makes a lot of sense for the hash algorithm that determines how all
>
> Minor nit: "It".
>
>> diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt
>> index 4b937480848a..10572c5794f9 100644
>> --- a/Documentation/technical/hash-function-transition.txt
>> +++ b/Documentation/technical/hash-function-transition.txt
>> @@ -148,14 +148,14 @@ Detailed Design
>>  Repository format extension
>>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>  A SHA-256 repository uses repository format version `1` (see
>> -Documentation/technical/repository-version.txt) with extensions
>> -`objectFormat` and `compatObjectFormat`:
>> +Documentation/technical/repository-version.txt) with the extension
>> +`objectFormat`, and an optional core.compatMap configuration.
>>
>>  	[core]
>>  		repositoryFormatVersion = 1
>> +		compatMap = on
>>  	[extensions]
>>  		objectFormat = sha256
>> -		compatObjectFormat = sha1
>
> While I'm in favour of an approach that uses the compat map, the
> situation we've implemented here doesn't specify the extra hash
> algorithm. We want this approach to work just as well for moving from
> SHA-1 to SHA-256 as it might for a future transition from SHA-256 to,
> say, SHA-3-512, if that becomes necessary.
>
> Making a future transition easier has been a goal of my SHA-256 work
> (because who wants to write several hundred patches in such a case?), so
> my hope is we can keep that here as well by explicitly naming the
> algorithm we're using.
>
> I also wonder if an approach that doesn't use an extension is going to
> be helpful. Say, that I have a repository that is using Git 3.x, which
> supports interop, but I also need to use Git 2.x, which does not. While
> it's true that Git 2.x can read my SHA-256 repository, it won't write
> the appropriate objects into the map, and thus it will be practically
> very difficult to actually use Git 3.x to push data to a repository of a
> different hash function. We might well prefer to have Git 2.x not work
> with the repository at all rather than have incomplete data preventing
> us from, well, interoperating.

First, I hope we can get a command such as "git gc" to scan the
repository and fill in all of the missing compatibility hashes. Not so
much for day-to-day work, but for people enabling compatibility hashes
on an existing repository. Enabling compatibility hashes on a sha1
repository is going to be necessary to create a sha256 repository from
it.

A depth-first walk or a topological sort of the objects pretty much has
to happen as a separate pass. So it makes sense just to require that all
of the objects have their compatibility hash computed before attempting
to generate a pack in the compatibility format.

I say all of that and I feel silly. The core, optimized path is
whatever receive-pack does to deal with a pack in the repository's
compatibility format. Once that is built, we can create a sha256
repository from a sha1 repository just by cloning it and letting
receive-pack figure out the details.

Before we can generate a sha256 pack from a sha1 pack, we still need to
compute the sha256 hash of every object, but that can be heavily
optimized and local to the case of receiving a non-native pack.

So a repository that generates a compatibility hash for all of its
objects is not necessary to transition to another hash algorithm. All
we need is another repository in the other format.

That said, there is value in being able to add compatibility hashes to
an existing repository. The upstream repository can just convert to the
new hash function, and all of the downstream repositories can compute
their compatibility hashes and convert when they are ready.
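The need for a separate depth-first or topological pass follows from how object names are derived: an object's SHA-256 name depends on the SHA-256 names of the objects it references, so those must be computed first. The sketch below illustrates that ordering with a toy object model. The `Obj` type, the `serialize` encoding, and `fill_compat_map` are hypothetical and do not reproduce Git's real object format or its loose-object-idx.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class Obj:
    """Toy object: a kind, payload bytes, and references by native (SHA-1) id."""

    kind: str
    payload: bytes
    refs: list[str] = field(default_factory=list)


def serialize(obj: Obj, ref_ids: list[str]) -> bytes:
    """Toy encoding (NOT Git's object format): header, payload, referenced ids."""
    header = f"{obj.kind} {len(obj.payload)}\0".encode()
    return header + obj.payload + b"".join(r.encode() + b"\n" for r in ref_ids)


def native_id(obj: Obj) -> str:
    """Name under the repository's native hash (SHA-1 in this toy)."""
    return hashlib.sha1(serialize(obj, obj.refs)).hexdigest()


def fill_compat_map(store: dict[str, Obj], compat: dict[str, str]) -> dict[str, str]:
    """Fill in the SHA-256 name of every object missing from `compat`.

    Post-order depth-first walk: referenced objects are translated before
    the objects that point at them. Recursion is fine for a toy; deep real
    histories would need an explicit stack.
    """

    def visit(oid: str) -> str:
        if oid not in compat:
            obj = store[oid]
            translated = [visit(ref) for ref in obj.refs]  # children first
            compat[oid] = hashlib.sha256(serialize(obj, translated)).hexdigest()
        return compat[oid]

    for oid in store:
        visit(oid)
    return compat


blob = Obj("blob", b"hello\n")
tree = Obj("tree", b"100644 hello.txt", [native_id(blob)])
commit = Obj("commit", b"initial import", [native_id(tree)])
store = {native_id(o): o for o in (blob, tree, commit)}

compat = fill_compat_map(store, {})
print(len(compat))  # 3
```

Because existing entries are reused rather than recomputed, the same walk covers both a full scan (the "git gc" idea) and topping up entries that are missing from an otherwise populated map.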
Basically, once a git with transition support exists, any repository
can convert at any time without creating a problem for other
repositories.

In my head it seems cheaper and safer to compute the compatibility hash
of every object in an existing repository than to convert a repository.
Is it?

I think that if the first pull from a repository in another format can
trigger the initial computation of the compatibility hash (like the
first use of a reverse index triggers the creation of the reverse
index), then it will definitely be easier to just enable compatibility
hashes in an existing repository.

The additional hash computation step on every pull from upstream (even
when well optimized) should be an incentive for people to fully convert
their repositories after the upstream has converted.

That is when things get tricky, and it is something the transition plan
has not talked about. There are references to existing oids in email,
bug trackers, and commit comments. Digging through the history and
dealing with those references is something that developers are going
to need to do for the rest of the life of a project. That means we will
eventually need to support a mode where some packs have a ``.compat''
index but we no longer compute or generate the old hash for new
objects.

In summary: I agree that compatMap is likely insufficient. So far, I
think the missing mappings are too cheap and easy to generate to justify
requiring that all operations always generate them. I also agree that
making the configuration resilient to foreseeable future demands is a
good idea. So I will push this change farther out in the patch series.

Eric
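The post-transition mode described above, where old packs keep a ``.compat'' index but no new SHA-1 names are generated, can be pictured as a lookup that still accepts old names quoted in email and bug trackers. This is a hypothetical sketch: `resolve`, the use of name length to tell the two formats apart, and the in-memory dictionary standing in for a `.compat` index are all assumptions, and abbreviated names are not handled.

```python
from __future__ import annotations

SHA1_HEX_LEN = 40
SHA256_HEX_LEN = 64


def resolve(name: str, native: set[str], frozen_compat: dict[str, str]) -> str | None:
    """Map a full hex object name to a native SHA-256 name, or None.

    native: SHA-256 names of the objects in the repository.
    frozen_compat: SHA-1 -> SHA-256 entries for objects from before the
    conversion; nothing is added for objects created afterwards.
    """
    name = name.lower()
    if len(name) == SHA256_HEX_LEN:
        return name if name in native else None
    if len(name) == SHA1_HEX_LEN:
        # An old name quoted in email, a bug tracker, or a commit message.
        return frozen_compat.get(name)
    return None  # abbreviated or malformed names are out of scope


old_sha1 = "1" * 40    # placeholder names, not real object ids
old_sha256 = "a" * 64
new_sha256 = "b" * 64  # created after conversion: has no SHA-1 name
native = {old_sha256, new_sha256}
frozen = {old_sha1: old_sha256}

print(resolve(old_sha1, native, frozen))  # the 64-character "a..." name
print(resolve("2" * 40, native, frozen))  # None: unknown SHA-1 name
```

Keeping the old-to-new table read-only matches the idea that the old hash is no longer computed: historical names keep resolving, while new objects exist only under SHA-256.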
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

>> +Documentation/technical/repository-version.txt) with the extension
>> +`objectFormat`, and an optional core.compatMap configuration.
>>
>>  	[core]
>>  		repositoryFormatVersion = 1
>> +		compatMap = on
>>  	[extensions]
>>  		objectFormat = sha256
>> -		compatObjectFormat = sha1
>
> While I'm in favour of an approach that uses the compat map, the
> situation we've implemented here doesn't specify the extra hash
> algorithm. We want this approach to work just as well for moving from
> SHA-1 to SHA-256 as it might for a future transition from SHA-256 to,
> say, SHA-3-512, if that becomes necessary.
>
> Making a future transition easier has been a goal of my SHA-256 work
> (because who wants to write several hundred patches in such a case?), so
> my hope is we can keep that here as well by explicitly naming the
> algorithm we're using.
>
> I also wonder if an approach that doesn't use an extension is going to
> be helpful. Say, that I have a repository that is using Git 3.x, which
> supports interop, but I also need to use Git 2.x, which does not. While
> it's true that Git 2.x can read my SHA-256 repository, it won't write
> the appropriate objects into the map, and thus it will be practically
> very difficult to actually use Git 3.x to push data to a repository of a
> different hash function. We might well prefer to have Git 2.x not work
> with the repository at all rather than have incomplete data preventing
> us from, well, interoperating.

Very sensible line of thought and suggestion to move the topic forward.
Very much appreciated.
diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt
index 4b937480848a..10572c5794f9 100644
--- a/Documentation/technical/hash-function-transition.txt
+++ b/Documentation/technical/hash-function-transition.txt
@@ -148,14 +148,14 @@ Detailed Design
 Repository format extension
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 A SHA-256 repository uses repository format version `1` (see
-Documentation/technical/repository-version.txt) with extensions
-`objectFormat` and `compatObjectFormat`:
+Documentation/technical/repository-version.txt) with the extension
+`objectFormat`, and an optional core.compatMap configuration.
 
 	[core]
 		repositoryFormatVersion = 1
+		compatMap = on
 	[extensions]
 		objectFormat = sha256
-		compatObjectFormat = sha1
 
 The combination of setting `core.repositoryFormatVersion=1` and
 populating `extensions.*` ensures that all versions of Git later than
@@ -682,7 +682,7 @@ Some initial steps can be implemented independently of one another:
 - adding support for the PSRC field and safer object pruning
 
 The first user-visible change is the introduction of the objectFormat
-extension (without compatObjectFormat). This requires:
+extension. This requires:
 
 - teaching fsck about this mode of operation
 - using the hash function API (vtable) when computing object names
@@ -690,7 +690,7 @@ extension (without compatObjectFormat). This requires:
 - rejecting attempts to fetch from or push to an incompatible
   repository
 
-Next comes introduction of compatObjectFormat:
+Next comes introduction of compatMap:
 
 - implementing the loose-object-idx
 - translating object names between object formats
@@ -724,9 +724,9 @@ Over time projects would encourage their users to adopt the "early
 transition" and then "late transition" modes to take advantage of the
 new, more futureproof SHA-256 object names.
 
-When objectFormat and compatObjectFormat are both set, commands
-generating signatures would generate both SHA-1 and SHA-256 signatures
-by default to support both new and old users.
+When objectFormat and compatMap are both set, commands generating
+signatures would generate both SHA-1 and SHA-256 signatures by default
+to support both new and old users.
 
 In projects using SHA-256 heavily, users could be encouraged to adopt
 the "post-transition" mode to avoid accidentally making implicit use
It makes a lot of sense for the hash algorithm that determines how all
of the objects in the repository are named to be an extension, so that
versions of git that don't know about it won't even try.

For implementing the compatibility maps, that really is not the case.
A version of git that does not recognize the option won't care and
will continue to use the repository as is. The mapping functionality
simply won't be present.

Similarly, if not all of the objects are mapped, this could cause some
practical difficulties, but it will not cause anything to perform the
wrong actions on the repository. Some commands just won't work. In the
worst case, all that needs to happen is for the compatibility maps to
be rebuilt.

So let's use an option that does not force unnecessary breakage of
existing tools.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 .../technical/hash-function-transition.txt | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)