Message ID | cover.1611455251.git.jonathantanmy@google.com (mailing list archive) |
---|---|
Headers | show |
Series | Check .gitmodules when using packfile URIs | expand |
Jonathan Tan <jonathantanmy@google.com> writes: > As part of this, index-pack has to output (1) the hash that goes into > the name of the .pack/.idx file and (2) the hashes of all dangling > .gitmodules. I just had (2) come after (1). If anyone has a better idea, > I'm interested. I have this feeling that the "blobs that need to be validated across packs" will *not* be the last enhancement we'd need to make to the output from index-pack to allow richer communication between it and its invoker. While there is no reason to change how the first line of the output looks like, we'd probably want to make sure that the future versions of Git can easily tell "list of blobs that require further validation" from other additional information. I am not comfortable to recommend "ok, then let's add a delimiter line '---\n' if/when we need to have something after the list of blobs and append more stuff in future versions of Git", because we may find need to emit new kinds of info before the list of blobs that needs further validation, for example, in future versions of Git. Having said all that, the internal communication between the index-pack and its caller do not need as much care about compatibility across versions as output visible to end-users, so when a future version of Git needs to send different kinds of information in different order from what you created here, we can do so pretty much freely, I would guess. Thanks.
> Jonathan Tan <jonathantanmy@google.com> writes: > > > As part of this, index-pack has to output (1) the hash that goes into > > the name of the .pack/.idx file and (2) the hashes of all dangling > > .gitmodules. I just had (2) come after (1). If anyone has a better idea, > > I'm interested. > > I have this feeling that the "blobs that need to be validated across > packs" will *not* be the last enhancement we'd need to make to the > output from index-pack to allow richer communication between it and > its invoker. While there is no reason to change how the first line > of the output looks like, we'd probably want to make sure that the > future versions of Git can easily tell "list of blobs that require > further validation" from other additional information. > > I am not comfortable to recommend "ok, then let's add a delimiter > line '---\n' if/when we need to have something after the list of > blobs and append more stuff in future versions of Git", because we > may find need to emit new kinds of info before the list of blobs > that needs further validation, for example, in future versions of > Git. > > Having said all that, the internal communication between the > index-pack and its caller do not need as much care about > compatibility across versions as output visible to end-users, so > when a future version of Git needs to send different kinds of > information in different order from what you created here, we can do > so pretty much freely, I would guess. Yeah, that's what I thought too - since this is an internal interface, we can evolve them in lockstep. If we're really worried about the Git binaries (on a user's system) getting out of sync, we could just make sure that subsequent updates to this protocol are non-backwards-compatible (e.g. have index-pack emit "foo <hash>", where "foo" is a string that describes the new check, so that current fetch-pack will reject "foo" since it is not a hash).
On Thu, Jan 28 2021, Jonathan Tan wrote: >> Jonathan Tan <jonathantanmy@google.com> writes: >> >> > As part of this, index-pack has to output (1) the hash that goes into >> > the name of the .pack/.idx file and (2) the hashes of all dangling >> > .gitmodules. I just had (2) come after (1). If anyone has a better idea, >> > I'm interested. >> >> I have this feeling that the "blobs that need to be validated across >> packs" will *not* be the last enhancement we'd need to make to the >> output from index-pack to allow richer communication between it and >> its invoker. While there is no reason to change how the first line >> of the output looks like, we'd probably want to make sure that the >> future versions of Git can easily tell "list of blobs that require >> further validation" from other additional information. >> >> I am not comfortable to recommend "ok, then let's add a delimiter >> line '---\n' if/when we need to have something after the list of >> blobs and append more stuff in future versions of Git", because we >> may find need to emit new kinds of info before the list of blobs >> that needs further validation, for example, in future versions of >> Git. >> >> Having said all that, the internal communication between the >> index-pack and its caller do not need as much care about >> compatibility across versions as output visible to end-users, so >> when a future version of Git needs to send different kinds of >> information in different order from what you created here, we can do >> so pretty much freely, I would guess. > > Yeah, that's what I thought too - since this is an internal interface, > we can evolve them in lockstep. If we're really worried about the Git > binaries (on a user's system) getting out of sync, I'm thinking in reading "getting out of sync" that you may be missing an aspect of the issue here. We're not talking about some abnormal error in some packaging system, but how we'd expect all installations of git to behave if you update them with *.rpm, *.deb etc, e.g. when your binaries are in /usr/libexec/git-core. I suppose NixOS or something where there's hash-based paths may be exempt from this. On those systems if you've got a server serving concurrent traffic and update the "git" package you could expect failure if any git process invoked by another is incompatible during such an upgrade. If you browse some of the recent GIT_CONFIG_PARAMETERS discussion this was discussed there. I.e. even if GIT_CONFIG_PARAMETERS is internal-only we bent over backwards not to change it in such a way as to have process A invoking process B and the two not understanding each other because of such an upgrade. That's exactly because of this case, where receive-pack may be started on version A, someone runs "apt install git" in the background concurrently, and now a version A of that program is talking to a version B index-pack. > we could just make sure that subsequent updates to this protocol are > non-backwards-compatible (e.g. have index-pack emit "foo <hash>", > where "foo" is a string that describes the new check, so that current > fetch-pack will reject "foo" since it is not a hash). And then presumably index-pack would die and receive-pack would die on the push or whatever, so the push fails for the end user.
Jonathan Tan <jonathantanmy@google.com> writes: > This patch set resolves the .gitmodules-and-tree-in-separate-packfiles > issue I mentioned in [1] by having index-pack print out all dangling > .gitmodules (instead of returning with an error code) and then teaching > fetch-pack to read those and run its own fsck checks after all > index-pack invocations are complete. > > As part of this, index-pack has to output (1) the hash that goes into > the name of the .pack/.idx file and (2) the hashes of all dangling > .gitmodules. I just had (2) come after (1). If anyone has a better idea, > I'm interested. > > I also discovered a bug in that different index-pack arguments were used > when processing the inline packfile and when processing the ones > referenced by URIs. Patch 1-3 fixes that bug by passing the arguments to > use as a space-separated URL-encoded list. (URL-encoded so that we can > have spaces in the arguments.) Again, if anyone has a better idea, I'm > interested. It is only in patch 4 that we have the dangling .gitmodules > fix. This seems to have been stalled but I think it would be a better approach to use a custom callback for error reporting, suggested by Ævar, which would be where his fsck API clean-up topic would lead to. If it is not ultra-urgent, perhaps you can retract the ones that are queued right now, work with Ævar to finish the error-callback work and rebuild this topic on top of it? Thanks.
> This seems to have been stalled but I think it would be a better > approach to use a custom callback for error reporting, suggested by > Ævar, which would be where his fsck API clean-up topic would lead > to. > > If it is not ultra-urgent, perhaps you can retract the ones that are > queued right now, work with Ævar to finish the error-callback work > and rebuild this topic on top of it? Thanks. OK - that works. My original idea was to rewrite it using an error-callback but using starts_with() instead of the ID that Ævar's work will provide, but seeing that at least one other contributor (Peff) seems OK with the patches, rebasing mine on top of his works too. I'll also take a look at his patches.
On Fri, Feb 19 2021, Junio C Hamano wrote: > Jonathan Tan <jonathantanmy@google.com> writes: > >> This patch set resolves the .gitmodules-and-tree-in-separate-packfiles >> issue I mentioned in [1] by having index-pack print out all dangling >> .gitmodules (instead of returning with an error code) and then teaching >> fetch-pack to read those and run its own fsck checks after all >> index-pack invocations are complete. >> >> As part of this, index-pack has to output (1) the hash that goes into >> the name of the .pack/.idx file and (2) the hashes of all dangling >> .gitmodules. I just had (2) come after (1). If anyone has a better idea, >> I'm interested. >> >> I also discovered a bug in that different index-pack arguments were used >> when processing the inline packfile and when processing the ones >> referenced by URIs. Patch 1-3 fixes that bug by passing the arguments to >> use as a space-separated URL-encoded list. (URL-encoded so that we can >> have spaces in the arguments.) Again, if anyone has a better idea, I'm >> interested. It is only in patch 4 that we have the dangling .gitmodules >> fix. > > This seems to have been stalled but I think it would be a better > approach to use a custom callback for error reporting, suggested by > Ævar, which would be where his fsck API clean-up topic would lead > to. > > If it is not ultra-urgent, perhaps you can retract the ones that are > queued right now, work with Ævar to finish the error-callback work > and rebuild this topic on top of it? Thanks. If my vote counts for something I think it makes sense to have Jonathan's series go first and just ignore my fsck API improvement patches (well, the part of my v1[1] which conflicts with his work). I'm also happy to help him queue his on top of a v1 version of my series. But the end result of doing so (shown after the "--" in [1]) is just a small re-arrangement of code to get a cleaner fsck API use, it doesn't actually matter to anyone using git. Whereas his patches actually do, we have in-the-wild server/repo/clone setups that are getting on-clone errors, and the window for 2.31 is getting closer. We can always do the small API use refactoring later. My interest in barking up that tree was just that I've been poking at that part of the fsck API and have some follow-up work that hasn't made it onto the list yet that makes other use of the fsck API. So in the longer term I wanted us to think about not needing N special cases like "print_dangling_gitmodules" if we could help it, but in the shorter term having it is a non-issue. 1. https://lore.kernel.org/git/20210217194246.25342-1-avarab@gmail.com/
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > On Fri, Feb 19 2021, Junio C Hamano wrote: > >> This seems to have been stalled but I think it would be a better >> approach to use a custom callback for error reporting, suggested by >> Ævar, which would be where his fsck API clean-up topic would lead >> to. >> >> If it is not ultra-urgent, perhaps you can retract the ones that are >> queued right now, work with Ævar to finish the error-callback work >> and rebuild this topic on top of it? Thanks. > > If my vote counts for something I think it makes sense to have > Jonathan's series go first and just ignore my fsck API improvement > patches (well, the part of my v1[1] which conflicts with his work). > > I'm also happy to help him queue his on top of a v1 version of my > series. Either would work for us, I would think. Thanks.
Jonathan Tan <jonathantanmy@google.com> writes: >> This seems to have been stalled but I think it would be a better >> approach to use a custom callback for error reporting, suggested by >> Ævar, which would be where his fsck API clean-up topic would lead >> to. >> >> If it is not ultra-urgent, perhaps you can retract the ones that are >> queued right now, work with Ævar to finish the error-callback work >> and rebuild this topic on top of it? Thanks. > > OK - that works. My original idea was to rewrite it using an > error-callback but using starts_with() instead of the ID that Ævar's > work will provide, but seeing that at least one other contributor (Peff) > seems OK with the patches, rebasing mine on top of his works too. I'll > also take a look at his patches. Thanks, either way would work for me, but if the suggested route forces you review Ævar's code and work together, that would be a good bonus point ;-)