Message ID | 20231215203245.3622299-1-gitster@pobox.com (mailing list archive) |
---|---|
Series | make room for "special ref" |
Junio C Hamano <gitster@pobox.com> writes:

> ... For example, FETCH_HEAD currently stores not
> just a single object name, but can and is used to store multiple
> object names, each with annotations to record where they came from.
> There indeed may be a need to introduce a new term to refer to such
> "special refs".

The "may be" here vaguely hints another possibility. If we manage
to get rid of the "special refs", we do not even have to mention
"special refs", and more importantly, we do not need extra code to
deal with them.

For FETCH_HEAD, for example, I wonder if an update along this line
is possible:

 * Teach "git fetch" to store what it writes to FETCH_HEAD to a
   different file, under a distinctly different filename (e.g.,
   $GIT_DIR/fetched-tips). Demote FETCH_HEAD to a pseudoref, and
   store the first object name in that "fetched-tips" file to it.

 * Teach "git pull" to learn what it used to learn from FETCH_HEAD
   (i.e., list of fetched tips, each annotated with what ref at what
   repository it came from and if it is to be merged) from the new
   "fetched-tips" file.

The "special" ness of FETCH_HEAD is really an implementation detail
of how "git pull" works and how the findings of "git fetch" are
communicated to "git pull". The general refs API should not have to
worry about it, and the refs backends should not have to worry about
storing more than just an object name (or if it is a symbolic ref,
the target refname).

An end-user command like "git log ORIG_HEAD..FETCH_HEAD" would not
be affected by changes along the above line, because the current
FETCH_HEAD, when used as a revision, will work as if it stores the
single object name that is listed first in the file.

If somebody is reading FETCH_HEAD and acting on its contents (rather
than merely consuming it as a ref of the first object), perhaps
feeding it to "git fmt-merge-msg", they will be broken by such a
change (indeed, our own "git pull" will be broken by the change to
"git fetch", and the second bullet point above is about fixing the
exact fallout from it), but I am not sure if that is a use case worth
worrying about.

Hmm?

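The two bullet points above can be sketched in a few lines of Python. This is only an illustration of the proposal, not how Git implements anything: it assumes the current FETCH_HEAD layout of one record per line, with the object name, an optional "not-for-merge" marker and a free-form origin note separated by tabs. The names `FetchedTip`, `parse_fetch_head` and `split_for_proposal`, and the sample data, are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class FetchedTip(NamedTuple):
    """One FETCH_HEAD record: object name, merge flag, origin note."""
    object_name: str
    for_merge: bool
    origin: str


def parse_fetch_head(text: str) -> List[FetchedTip]:
    """Parse lines shaped like: <object name> TAB <flag> TAB <origin>."""
    tips = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        object_name, flag, origin = line.split("\t", 2)
        tips.append(FetchedTip(object_name, flag != "not-for-merge", origin))
    return tips


def split_for_proposal(text: str) -> Tuple[str, str]:
    """Return (pseudoref value, "fetched-tips" contents).

    Hypothetical reading of the proposal: the full records move,
    unchanged, to "fetched-tips", and FETCH_HEAD keeps only the
    object name that is listed first.
    """
    tips = parse_fetch_head(text)
    if not tips:
        raise ValueError("nothing was fetched")
    return tips[0].object_name, text


# Made-up sample data; the object names are placeholders.
SAMPLE = (
    "1111111111111111111111111111111111111111\t\t"
    "branch 'main' of example.com:repo\n"
    "2222222222222222222222222222222222222222\tnot-for-merge\t"
    "branch 'next' of example.com:repo\n"
)

pseudoref_value, fetched_tips = split_for_proposal(SAMPLE)
```

Under this reading, anything that only resolves FETCH_HEAD as a revision sees the same object as before, while a consumer that needs the annotations (such as "git pull") would read the second value instead.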
On 15/12/2023 21:21, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> ... For example, FETCH_HEAD currently stores not
>> just a single object name, but can and is used to store multiple
>> object names, each with annotations to record where they came from.
>> There indeed may be a need to introduce a new term to refer to such
>> "special refs".
>
> The "may be" here vaguely hints another possibility. If we manage
> to get rid of the "special refs", we do not even have to mention
> "special refs", and more importantly, we do not need extra code to
> deal with them.
>
> For FETCH_HEAD, for example, I wonder if an update along this line
> is possible:
>
>  * Teach "git fetch" to store what it writes to FETCH_HEAD to a
>    different file, under a distinctly different filename (e.g.,
>    $GIT_DIR/fetched-tips). Demote FETCH_HEAD to a pseudoref, and
>    store the first object name in that "fetched-tips" file to it.
>
>  * Teach "git pull" to learn what it used to learn from FETCH_HEAD
>    (i.e., list of fetched tips, each annotated with what ref at what
>    repository it came from and if it is to be merged) from the new
>    "fetched-tips" file.
>
> The "special" ness of FETCH_HEAD is really an implementation detail
> of how "git pull" works and how the findings of "git fetch" are
> communicated to "git pull". The general refs API should not have to
> worry about it, and the refs backends should not have to worry about
> storing more than just an object name (or if it is a symbolic ref,
> the target refname).
>
> An end-user command like "git log ORIG_HEAD..FETCH_HEAD" would not
> be affected by changes along the above line, because the current
> FETCH_HEAD, when used as a revision, will work as if it stores the
> single object name that is listed first in the file.
>
> If somebody is reading FETCH_HEAD and acting on its contents (rather
> than merely consuming it as a ref of the first object), perhaps
> feeding it to "git fmt-merge-msg", they will be broken by such a
> change (indeed, our own "git pull" will be broken by the change to
> "git fetch", and the second bullet point above is about fixing the
> exact fallout from it), but I am not sure if that is a use case worth
> worrying about.
>
> Hmm?
>

Yes, I was going to suggest exactly this, after Patrick pointed out
that there were only two 'special pseudo-refs' (I had a vague feeling
there were some more than that) FETCH_HEAD and MERGE_HEAD.

ATB,
Ramsay Jones

Ramsay Jones <ramsay@ramsayjones.plus.com> writes:

> Yes, I was going to suggest exactly this, after Patrick pointed out
> that there were only two 'special pseudo-refs' (I had a vague feeling
> there were some more than that) FETCH_HEAD and MERGE_HEAD.

Glad to see that I am not alone. We should be able to treat
MERGE_HEAD similarly. It is used to communicate the list of "other
parents" from "git merge" that stops in the middle (either for merge
conflict, or in response to the "--no-commit" command line option)
to "git commit" that concludes such an unfinished merge. Many
commands merely use the presence of MERGE_HEAD as a sign that a
merge is in progress (e.g. "git status"), which would not break if
we just started to record the first parent in a pseudoref MERGE_HEAD
and wrote the other octopus parents elsewhere, but some commands do
need all these parents from MERGE_HEAD (e.g. "git blame" that
synthesizes a fake starting commit out of the working tree state).

If we cannot get rid of all "special refs" anyway, however, I think
there is little that we can gain from doing such "make FETCH_HEAD
and MERGE_HEAD into a single-object pseudoref, and write other info
in separate files" exercise. We can treat the current FETCH_HEAD
and MERGE_HEAD as "file that is not and is more than a ref", which
is what the current code is doing anyway, which means we would
declare that they have to stay to be files under $GIT_DIR/ and will
be accessed via the filesystem access.

At that point, calling them "special ref" might even be more
misleading than it's worth and we may be better off to admit that they
are not even refs but a datafile some commands can use to obtain input
from, but the phrase we use to refer to them, be it "special ref" or
some random datafile, does not make a fundamental change on anything.

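The difference between the two kinds of MERGE_HEAD consumers can be made concrete with a short Python sketch. It assumes MERGE_HEAD holds one object name per line, one for each head being merged; the function names and the sample contents are hypothetical and only illustrate the distinction drawn above.

```python
from typing import List, Optional


def merge_in_progress(merge_head: Optional[str]) -> bool:
    """Presence alone is the signal; the contents are never inspected."""
    return merge_head is not None


def recorded_parents(merge_head: str) -> List[str]:
    """Every object name recorded in MERGE_HEAD, one per line."""
    return [line.strip() for line in merge_head.splitlines() if line.strip()]


def as_single_object_ref(merge_head: str) -> str:
    """What a single-object pseudoref could hold: the first name only."""
    return recorded_parents(merge_head)[0]


# Made-up contents for an octopus merge of two heads.
OCTOPUS = (
    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\n"
    "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb\n"
)
```

A presence-only consumer behaves the same whether it is handed the full file or just the first name, whereas a consumer that calls something like `recorded_parents` would lose the second head if MERGE_HEAD were reduced to a single object.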
On 15/12/2023 22:44, Ramsay Jones wrote:
> On 15/12/2023 21:21, Junio C Hamano wrote:
>> If somebody is reading FETCH_HEAD and acting on its contents (rather
>> than merely consuming it as a ref of the first object), perhaps
>> feeding it to "git fmt-merge-msg", they will be broken by such a
>> change (indeed, our own "git pull" will be broken by the change to
>> "git fetch", and the second bullet point above is about fixing the
>> exact fallout from it), but I am not sure if that is a use case worth
>> worrying about.
>
> Yes, I was going to suggest exactly this, after Patrick pointed out
> that there were only two 'special pseudo-refs' (I had a vague feeling
> there were some more than that) FETCH_HEAD and MERGE_HEAD.

According to the pseudoref entry of gitglossary, CHERRY_PICK_HEAD also
stores additional data (which would imply that REVERT_HEAD does too).
Looking at CHERRY_PICK_HEAD during a pick though, I only see a single
hash, even when picking multiple commits.

Regards,
Andy

On 15/12/2023 20:32, Junio C Hamano wrote:
> A pseudo ref is merely a normal ref with a funny naming convention,
> i.e., being outside the refs/ hierarchy and has names with all
> uppercase letters (or an underscore).

I know what you mean, but gitglossary defines pseudorefs as separate
from refs, albeit behaving like refs. Their name itself implies the
same.

Although the 'ref' entry then goes on to say that "there are a few
special-purpose refs that do not begin with 'refs/', the most notable
example being HEAD." That implies that at least some of the pseudorefs
are refs after all, while keeping in mind that "HEAD is not a
pseudoref, because it is sometimes a symbolic ref" according to the
'pseudoref' entry.

I think a clearer answer on whether pseudorefs are refs is needed, or
at least a better-defined fudge, such as "pseudorefs are refs except
when ...".

Defining everything under "refs/" as refs, and the stuff outside it
including HEAD itself as pseudorefs, would draw clearer lines. The
fact HEAD is usually symbolic doesn't seem all that relevant from the
perspective of a user trying to get a grasp of refs and pseudorefs.

Regards,
Andy

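The dividing line suggested in that last paragraph is simple enough to write down as a rule. The Python sketch below encodes the proposed classification only, not what gitglossary currently says; it also borrows the naming convention quoted at the top of the message (uppercase letters or underscores, outside "refs/") to decide what counts as a pseudoref name. `classify` and the "neither" bucket are hypothetical.

```python
import re

# Naming convention from the quoted text: uppercase letters or underscores.
PSEUDOREF_SHAPE = re.compile(r"[A-Z_]+")


def classify(name: str) -> str:
    """Classify a name under the proposed rule (not today's glossary)."""
    if name.startswith("refs/"):
        return "ref"
    if PSEUDOREF_SHAPE.fullmatch(name):
        # Under the proposal HEAD lands here too, symbolic or not.
        return "pseudoref"
    return "neither"
```

With this rule `classify("refs/heads/main")` gives "ref", while `classify("HEAD")` and `classify("FETCH_HEAD")` both give "pseudoref", which is the clearer line the message argues for.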
On Sat, Dec 16, 2023 at 10:20:09AM +0000, Andy Koppe wrote:
> On 15/12/2023 22:44, Ramsay Jones wrote:
> > On 15/12/2023 21:21, Junio C Hamano wrote:
>
> > > If somebody is reading FETCH_HEAD and acting on its contents (rather
> > > than merely consuming it as a ref of the first object), perhaps
> > > feeding it to "git fmt-merge-msg", they will be broken by such a
> > > change (indeed, our own "git pull" will be broken by the change to
> > > "git fetch", and the second bullet point above is about fixing the
> > > exact fallout from it), but I am not sure if that is a use case worth
> > > worrying about.
> >
> > Yes, I was going to suggest exactly this, after Patrick pointed out
> > that there were only two 'special pseudo-refs' (I had a vague feeling
> > there were some more than that) FETCH_HEAD and MERGE_HEAD.
>
> According to the pseudoref entry of gitglossary, CHERRY_PICK_HEAD also
> stores additional data (which would imply that REVERT_HEAD does too).
> Looking at CHERRY_PICK_HEAD during a pick though, I only see a single hash,
> even when picking multiple commits.

Both CHERRY_PICK_HEAD and REVERT_HEAD are only ever updated via the
refs API, so neither of them ever contains anything other than a
normal ref. I guess we should update the glossary accordingly.

Patrick

On Fri, Dec 15, 2023 at 04:44:47PM -0800, Junio C Hamano wrote:
> Ramsay Jones <ramsay@ramsayjones.plus.com> writes:
>
> > Yes, I was going to suggest exactly this, after Patrick pointed out
> > that there were only two 'special pseudo-refs' (I had a vague feeling
> > there were some more than that) FETCH_HEAD and MERGE_HEAD.

I don't think there are more special refs than those two. Andy pointed
out CHERRY_PICK_HEAD and REVERT_HEAD, but both of them actually get
accessed via the ref backend exclusively and thus cannot be special in
any way. Also, the test suite of Git passes with only those two refs
marked as special refs with the reftable backend, which is another
good indicator that I didn't miss anything here because we definitely
can't store special information in the reftable backend.

It's of course still possible that our test suite has a blind spot and
that I missed some special refs. If so, I would love to hear about
them.

> Glad to see that I am not alone. We should be able to treat
> MERGE_HEAD similarly. It is used to communicate the list of "other
> parents" from "git merge" that stops in the middle (either for merge
> conflict, or in response to the "--no-commit" command line option)
> to "git commit" that concludes such an unfinished merge. Many
> commands merely use the presence of MERGE_HEAD as a sign that a
> merge is in progress (e.g. "git status"), which would not break if
> we just started to record the first parent in a pseudoref MERGE_HEAD
> and wrote the other octopus parents elsewhere, but some commands do
> need all these parents from MERGE_HEAD (e.g. "git blame" that
> synthesizes a fake starting commit out of the working tree state).

I would certainly love to drop the "specialness" of both FETCH_HEAD
and MERGE_HEAD, but I am a bit pessimistic about whether we really
can. The format of those refs has been around for quite a long time
already, and I do expect that there is tooling out there that parses
those files.

I would claim that it's especially likely that FETCH_HEAD is getting
parsed by external tools. Historically, there has not been a way to
really figure out which refs have been updated in git-fetch(1). So any
scripts that perform a fetch and want to learn about what was updated
would very likely resort to parsing FETCH_HEAD. This has changed a bit
with the introduction of the machine-parsable interface of
git-fetch(1), but it has only been introduced rather recently with Git
v2.42.

> If we cannot get rid of all "special refs" anyway, however, I think
> there is little that we can gain from doing such "make FETCH_HEAD
> and MERGE_HEAD into a single-object pseudoref, and write other info
> in separate files" exercise. We can treat the current FETCH_HEAD
> and MERGE_HEAD as "file that is not and is more than a ref", which
> is what the current code is doing anyway, which means we would
> declare that they have to stay to be files under $GIT_DIR/ and will
> be accessed via the filesystem access.

I'd like for it to be otherwise, but I think this is the only sensible
thing to do. I think it was a mistake to introduce those special refs
like this and treat them almost like a real ref, but that's always
easy to say in hindsight.

> At that point, calling them "special ref" might even be more
> misleading than it's worth and we may be better off to admit that they
> are not even refs but a datafile some commands can use to obtain input
> from, but the phrase we use to refer to them, be it "special ref" or
> some random datafile, does not make a fundamental change on anything.

Well, the problem is that these do indeed behave like a ref for the
most part: you can ask for them via git-rev-parse(1) and we'll resolve
them just fine, even though we only ever return the first object ID.
So even though I'm not a huge fan of calling them "special ref", I
think we should at least highlight the ref-like nature in whatever we
want to call them.

Patrick

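For scripts that currently scrape FETCH_HEAD to learn what a fetch updated, the machine-parsable interface mentioned above is the alternative. The Python sketch below parses such output from a string. It assumes the record layout `<flag> <old-object-id> <new-object-id> <local-reference>`, one record per line, which is my understanding of what git-fetch(1) documents for its `--porcelain` option; please check the manual page of your Git version before relying on it. `RefUpdate`, `parse_porcelain` and the sample output are hypothetical.

```python
from typing import List, NamedTuple


class RefUpdate(NamedTuple):
    """One reported update: status flag, old and new object IDs, local ref."""
    flag: str
    old: str
    new: str
    ref: str


def parse_porcelain(output: str) -> List[RefUpdate]:
    """Parse lines assumed to look like: <flag> <old> <new> <local-ref>."""
    updates = []
    for line in output.splitlines():
        if len(line) < 3:
            continue  # too short to carry a record
        # The flag is a single character and may itself be a space,
        # so the line must not be stripped before slicing.
        old, new, ref = line[2:].split(" ", 2)
        updates.append(RefUpdate(line[0], old, new, ref))
    return updates


# Made-up output; the object IDs are placeholders.
SAMPLE_OUTPUT = (
    "* 0000000000000000000000000000000000000000 "
    "1111111111111111111111111111111111111111 refs/remotes/origin/next\n"
    "  2222222222222222222222222222222222222222 "
    "3333333333333333333333333333333333333333 refs/remotes/origin/main\n"
)

updates = parse_porcelain(SAMPLE_OUTPUT)
```

A script built this way never has to open FETCH_HEAD, so it would keep working regardless of how that file ends up being stored.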
On Fri, Dec 15, 2023 at 12:32:40PM -0800, Junio C Hamano wrote:
> Patrick's reftable work is progressing nicely and wants to establish
> "special ref" as a phrase with some defined meaning that is somewhat
> different from a mere "pseudo ref".
>
> A pseudo ref is merely a normal ref with a funny naming convention,
> i.e., being outside the refs/ hierarchy and has names with all
> uppercase letters (or an underscore). But there truly are refs that
> are more than that. For example, FETCH_HEAD currently stores not
> just a single object name, but can and is used to store multiple
> object names, each with annotations to record where they came from.
> There indeed may be a need to introduce a new term to refer to such
> "special refs".
>
> Existing documentation, however, uses "special ref" to refer to
> pseudo refs without any "special" property, like FETCH_HEAD does.
>
> This series merely corrects such existing uses of the word, to make
> room for Patrick's series to introduce (and formally define in the
> glossary) "special refs".

Thanks for helping out with this effort and kicking off the
discussion, I highly appreciate it!

Patrick

> Junio C Hamano (5):
>   git.txt: HEAD is not that special
>   git-bisect.txt: BISECT_HEAD is not that special
>   refs.h: HEAD is not that special
>   docs: AUTO_MERGE is not that special
>   docs: MERGE_AUTOSTASH is not that special
>
>  Documentation/git-bisect.txt    | 2 +-
>  Documentation/git-diff.txt      | 2 +-
>  Documentation/git-merge.txt     | 2 +-
>  Documentation/git.txt           | 7 ++++---
>  Documentation/merge-options.txt | 2 +-
>  Documentation/user-manual.txt   | 2 +-
>  refs.h                          | 2 +-
>  7 files changed, 10 insertions(+), 9 deletions(-)
>
> --
> 2.43.0-76-g1a87c842ec
>