Message ID | pull.1577.git.git.1695218431033.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | attr: attr.allowInvalidSource config to allow invalid revision | expand |
"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: John Cai <johncai86@gmail.com> > > 44451a2e5e (attr: teach "--attr-source=<tree>" global option to "git", > 2023-05-06) provided the ability to pass in a treeish as the attr > source. When a revision does not resolve to a valid tree is passed, Git > will die. GitLab keeps bare repositories and always reads attributes > from the default branch, so we pass in HEAD to --attr-source. Makes sense. > With empty repositories however, HEAD does not point to a valid treeish, > causing Git to die. This means we would need to check for a valid > treeish each time. Naturally. > To avoid this, let's add a configuration that allows > Git to simply ignore --attr-source if it does not resolve to a valid > tree. Not convincing at all as to the reason why we want to do anything "to avoid this". "git log" in a repository whose HEAD does not point to a valid treeish. "git blame" dies with "no such ref: HEAD". An empty repository (more precisely, an unborn history) needs special casing if you want to present it if you do not want to spew underlying error messages to the end users *anyway*. It is unclear why seeing what commit the HEAD pointer points at (or which branch it points at for that matter) is *an* *extra* and *otherwise* *unnecessary* overhead that need to be avoided.
On Wed, Sep 20, 2023 at 09:06:46AM -0700, Junio C Hamano wrote: > > With empty repositories however, HEAD does not point to a valid treeish, > > causing Git to die. This means we would need to check for a valid > > treeish each time. > > Naturally. > > > To avoid this, let's add a configuration that allows > > Git to simply ignore --attr-source if it does not resolve to a valid > > tree. > > Not convincing at all as to the reason why we want to do anything > "to avoid this". "git log" in a repository whose HEAD does not > point to a valid treeish. "git blame" dies with "no such ref: > HEAD". An empty repository (more precisely, an unborn history) > needs special casing if you want to present it if you do not want to > spew underlying error messages to the end users *anyway*. It is > unclear why seeing what commit the HEAD pointer points at (or which > branch it points at for that matter) is *an* *extra* and *otherwise* > *unnecessary* overhead that need to be avoided. In an empty repository, "git log" will die anyway. So I think the more interesting case is "I have a repository with stuff in it, but HEAD points to an unborn branch". So: git --attr-source=HEAD diff foo^ foo And there you really are saying "if there are attributes in HEAD, use them; otherwise, don't worry about it". This is exactly what we do with mailmap.blob: in a bare repository it is set to HEAD by default, but if HEAD does not resolve, we just ignore it (just like a HEAD that does not contain a .mailmap file). And those match the non-bare cases, where we'd read those files from the working tree instead. So I think the same notion applies here. You want to be able to point it at HEAD by default, but if there is no HEAD, that is the same as if HEAD simply did not contain any attributes. If we had attr.blob, that is exactly how I would expect it to work. My gut feeling is that --attr-source should do the same, and just quietly ignore a ref that does not resolve. But I think an argument can be made that because the caller explicitly gave us a ref, they expect it to work (and that would catch misspellings, etc). Like: git --attr-source=my-barnch diff foo^ foo So I'm OK with not changing that behavior. But what is weird about this patch is that we are using a config option to change how a command-line option is interpreted. If the idea is that some invocations care about the validity of the source and some do not, then the config option is much too blunt. It is set once long ago, but it can't distinguish between times you care about invalid sources and times you don't. It would make much more sense to me to have another command-line option, like: git --attr-source=HEAD --allow-invalid-attr-source Obviously that is horrible to type, but I think the point is that you'd only do this from a script anyway (because it's those automated cases where you want to say "use HEAD only if it exists"). If there were an attr.blob config option and it complained about an invalid HEAD, _then_ I think attr.allowInvalidSource might make sense (though again, I would just argue for switching the behavior by default). And I really think attr.blob is a better match for what GitLab is trying to do here, because it is set once and applies to all commands, rather than having to teach every invocation to pass it (though I guess maybe they use it as an environment variable). Of course I would think that, as the person who solved GitHub's exact same problem for mailmap by adding mailmap.blob. So you may ingest the appropriate grain of salt. :) -Peff
Jeff King <peff@peff.net> writes: > In an empty repository, "git log" will die anyway. So I think the more > interesting case is "I have a repository with stuff in it, but HEAD > points to an unborn branch". So: > > git --attr-source=HEAD diff foo^ foo This still looks like a made-up example. Who in the right mind would specify HEAD when both of the revs involved in the operation are from branch 'foo'? The history of HEAD may not have anything common with the operand of the operation 'foo' (or its parent), or worse, it may not even exist. But your "in this repository we never trust attributes from working tree, take it instead from this file or from this blob" example does make a lot more sense as a use case. > And there you really are saying "if there are attributes in HEAD, use > them; otherwise, don't worry about it". This is exactly what we do with > mailmap.blob: in a bare repository it is set to HEAD by default, but if > HEAD does not resolve, we just ignore it (just like a HEAD that does not > contain a .mailmap file). And those match the non-bare cases, where we'd > read those files from the working tree instead. "HEAD" -> "HEAD:.mailmap" if I recall correctly. And if HEAD does not resolve, we pretend as if HEAD is an empty tree-ish (hence HEAD:.mailmap is missing). It becomes very tempting to do the same for the attribute sources and treat unborn HEAD as if it specifies an empty tree-ish, without any configuration or an extra option. Such a change would be an end-user observable behaviour change, but nobody sane would be running "git --attr-source=HEAD diff HEAD^ HEAD" to check and detect an unborn HEAD for its error exit code, so I do not think it is a horribly wrong thing to do. But again, as you said, --attr-source=<tree-ish> does not sound like a good fit for bare-repository hosted environment and a tentative hack waiting for a proper attr.blob support, or something like that, to appear. > But what is weird about this patch is that we are using a config option > to change how a command-line option is interpreted. If the idea is that > some invocations care about the validity of the source and some do not, > then the config option is much too blunt. It is set once long ago, but > it can't distinguish between times you care about invalid sources and > times you don't. > > It would make much more sense to me to have another command-line option, > like: > > git --attr-source=HEAD --allow-invalid-attr-source Yeah, if we were to make it configurable without changing the default behaviour, I agree that would be more correct approach. A configuration does not sound like a good fit. > ... And I really think attr.blob is a better match for what GitLab > is trying to do here, because it is set once and applies to all > commands, rather than having to teach every invocation to pass it > (though I guess maybe they use it as an environment variable). True, too. Thanks.
On Thu, Sep 21, 2023 at 01:52:52AM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > In an empty repository, "git log" will die anyway. So I think the more > > interesting case is "I have a repository with stuff in it, but HEAD > > points to an unborn branch". So: > > > > git --attr-source=HEAD diff foo^ foo > > This still looks like a made-up example. Who in the right mind > would specify HEAD when both of the revs involved in the operation > are from branch 'foo'? The history of HEAD may not have anything > common with the operand of the operation 'foo' (or its parent), or > worse, it may not even exist. I think it's unlikely for a user to write that. But if you are running a server full of bare repositories that does diffs, you might end up with a script that sticks "--attr-source=HEAD" at the beginning of every command. It is true that HEAD may not be related. But that is what we use if you are in a non-bare repository and run "git diff foo^ foo". Arguably: git --attr-source=$to diff $from $to is a better default for this command. But something like "git log -p" is trickier, as you have many commits to show. You can try to use the tip of the traversal, but there may be multiple. Using the attributes from the destination of each commit is the most likely to avoid divergence between the attributes and the code, but it may also not be what people want. Using the modern attributes from the working tree often makes showing historical commits much nicer. So I dunno. I could see arguments in both directions, but I think in general people have been OK with pulling attributes from the working tree. And --attr-source=HEAD is the bare equivalent. > But your "in this repository we never trust attributes from working > tree, take it instead from this file or from this blob" example does > make a lot more sense as a use case. I don't know that it was my example. :) But yes, if you do "--attr-source=$to", you're overriding even the non-bare case. That may be what you want for some cases, but as above, I think it's hard to apply consistently (or even what you'd want for the general case). > > And there you really are saying "if there are attributes in HEAD, use > > them; otherwise, don't worry about it". This is exactly what we do with > > mailmap.blob: in a bare repository it is set to HEAD by default, but if > > HEAD does not resolve, we just ignore it (just like a HEAD that does not > > contain a .mailmap file). And those match the non-bare cases, where we'd > > read those files from the working tree instead. > > "HEAD" -> "HEAD:.mailmap" if I recall correctly. True, yeah. We can't do that here because attributes are spread across the tree. So all my mentions of attr.blob would really be attr.tree. > And if HEAD does not resolve, we pretend as if HEAD is an empty > tree-ish (hence HEAD:.mailmap is missing). It becomes very tempting > to do the same for the attribute sources and treat unborn HEAD as if > it specifies an empty tree-ish, without any configuration or an > extra option. > > Such a change would be an end-user observable behaviour change, but > nobody sane would be running "git --attr-source=HEAD diff HEAD^ HEAD" > to check and detect an unborn HEAD for its error exit code, so I do > not think it is a horribly wrong thing to do. Yeah, that is basically what I am proposing. It sounds from the discussion here that there are two interesting cases: 1. You want to use --attr-source=HEAD because you are trying to make a bare repo behave like a non-bare one. You probably want the "don't complain if it is missing" behavior. 2. You are trying to use the attributes from one side of the diff to override the worktree ones (because the two trees are unrelated). In which case it does not really matter if --attr-source complains, because the diff will likewise complain if the tree cannot be resolved. Just trying to play devil's advocate, though, I guess you could run a non-diff operation like say: git --attr-source=my-branch check-attr foo and then you probably _do_ want to know if that source was ignored or typo'd. > But again, as you said, --attr-source=<tree-ish> does not sound like > a good fit for bare-repository hosted environment and a tentative > hack waiting for a proper attr.blob support, or something like that, > to appear. I think folks mentioned mailmap.blob in the original discussion for --attr-source. I don't remember why the patch went with --attr-source there. Maybe John can speak to that. -Peff
Hi Peff, On 21 Sep 2023, at 0:15, Jeff King wrote: > On Wed, Sep 20, 2023 at 09:06:46AM -0700, Junio C Hamano wrote: > >>> With empty repositories however, HEAD does not point to a valid treeish, >>> causing Git to die. This means we would need to check for a valid >>> treeish each time. >> >> Naturally. >> >>> To avoid this, let's add a configuration that allows >>> Git to simply ignore --attr-source if it does not resolve to a valid >>> tree. >> >> Not convincing at all as to the reason why we want to do anything >> "to avoid this". "git log" in a repository whose HEAD does not >> point to a valid treeish. "git blame" dies with "no such ref: >> HEAD". An empty repository (more precisely, an unborn history) >> needs special casing if you want to present it if you do not want to >> spew underlying error messages to the end users *anyway*. It is >> unclear why seeing what commit the HEAD pointer points at (or which >> branch it points at for that matter) is *an* *extra* and *otherwise* >> *unnecessary* overhead that need to be avoided. > > In an empty repository, "git log" will die anyway. So I think the more > interesting case is "I have a repository with stuff in it, but HEAD > points to an unborn branch". So: > > git --attr-source=HEAD diff foo^ foo > > And there you really are saying "if there are attributes in HEAD, use > them; otherwise, don't worry about it". This is exactly what we do with > mailmap.blob: in a bare repository it is set to HEAD by default, but if > HEAD does not resolve, we just ignore it (just like a HEAD that does not > contain a .mailmap file). And those match the non-bare cases, where we'd > read those files from the working tree instead. > > So I think the same notion applies here. You want to be able to point it > at HEAD by default, but if there is no HEAD, that is the same as if HEAD > simply did not contain any attributes. If we had attr.blob, that is > exactly how I would expect it to work. You captured this quite well! > > My gut feeling is that --attr-source should do the same, and just > quietly ignore a ref that does not resolve. But I think an argument can > be made that because the caller explicitly gave us a ref, they expect it > to work (and that would catch misspellings, etc). Like: > > git --attr-source=my-barnch diff foo^ foo > > So I'm OK with not changing that behavior. > > But what is weird about this patch is that we are using a config option > to change how a command-line option is interpreted. If the idea is that > some invocations care about the validity of the source and some do not, > then the config option is much too blunt. It is set once long ago, but > it can't distinguish between times you care about invalid sources and > times you don't. > > It would make much more sense to me to have another command-line option, > like: > > git --attr-source=HEAD --allow-invalid-attr-source > > Obviously that is horrible to type, but I think the point is that you'd > only do this from a script anyway (because it's those automated cases > where you want to say "use HEAD only if it exists"). Yeah, I see the point about using a config to change default behavior of a command leading to confusion. > > If there were an attr.blob config option and it complained about an > invalid HEAD, _then_ I think attr.allowInvalidSource might make sense > (though again, I would just argue for switching the behavior by > default). And I really think attr.blob is a better match for what GitLab > is trying to do here, because it is set once and applies to all > commands, rather than having to teach every invocation to pass it > (though I guess maybe they use it as an environment variable). In retrospect perhaps a config would have been better here. I think this patch started with improving an existing command line flag [1] by making it global. So I think we were just thinking about command line flags and didn't consider configs. That being said, for GitLab at least there's not a lot of difference since we pass in configs through the commandline anyways rather than relying on the config state itself on disk for our bare server-side repositories. 1. https://lore.kernel.org/git/pull.1470.git.git.1679109928556.gitgitgadget@gmail.com/ > > Of course I would think that, as the person who solved GitHub's exact > same problem for mailmap by adding mailmap.blob. So you may ingest the > appropriate grain of salt. :) > > -Peff thanks John
Hi Peff, On 21 Sep 2023, at 17:40, Jeff King wrote: > On Thu, Sep 21, 2023 at 01:52:52AM -0700, Junio C Hamano wrote: > >> Jeff King <peff@peff.net> writes: >> >>> In an empty repository, "git log" will die anyway. So I think the more >>> interesting case is "I have a repository with stuff in it, but HEAD >>> points to an unborn branch". So: >>> >>> git --attr-source=HEAD diff foo^ foo >> >> This still looks like a made-up example. Who in the right mind >> would specify HEAD when both of the revs involved in the operation >> are from branch 'foo'? The history of HEAD may not have anything >> common with the operand of the operation 'foo' (or its parent), or >> worse, it may not even exist. > > I think it's unlikely for a user to write that. But if you are running a > server full of bare repositories that does diffs, you might end up with > a script that sticks "--attr-source=HEAD" at the beginning of every > command. > > It is true that HEAD may not be related. But that is what we use if you > are in a non-bare repository and run "git diff foo^ foo". > > Arguably: > > git --attr-source=$to diff $from $to > > is a better default for this command. But something like "git log -p" is > trickier, as you have many commits to show. You can try to use the tip > of the traversal, but there may be multiple. Using the attributes from > the destination of each commit is the most likely to avoid divergence > between the attributes and the code, but it may also not be what people > want. Using the modern attributes from the working tree often makes > showing historical commits much nicer. > > So I dunno. I could see arguments in both directions, but I think in > general people have been OK with pulling attributes from the working > tree. And --attr-source=HEAD is the bare equivalent. > >> But your "in this repository we never trust attributes from working >> tree, take it instead from this file or from this blob" example does >> make a lot more sense as a use case. > > I don't know that it was my example. :) But yes, if you do > "--attr-source=$to", you're overriding even the non-bare case. That may > be what you want for some cases, but as above, I think it's hard to > apply consistently (or even what you'd want for the general case). > >>> And there you really are saying "if there are attributes in HEAD, use >>> them; otherwise, don't worry about it". This is exactly what we do with >>> mailmap.blob: in a bare repository it is set to HEAD by default, but if >>> HEAD does not resolve, we just ignore it (just like a HEAD that does not >>> contain a .mailmap file). And those match the non-bare cases, where we'd >>> read those files from the working tree instead. >> >> "HEAD" -> "HEAD:.mailmap" if I recall correctly. > > True, yeah. We can't do that here because attributes are spread across > the tree. So all my mentions of attr.blob would really be attr.tree. > >> And if HEAD does not resolve, we pretend as if HEAD is an empty >> tree-ish (hence HEAD:.mailmap is missing). It becomes very tempting >> to do the same for the attribute sources and treat unborn HEAD as if >> it specifies an empty tree-ish, without any configuration or an >> extra option. >> >> Such a change would be an end-user observable behaviour change, but >> nobody sane would be running "git --attr-source=HEAD diff HEAD^ HEAD" >> to check and detect an unborn HEAD for its error exit code, so I do >> not think it is a horribly wrong thing to do. > > Yeah, that is basically what I am proposing. It sounds from the > discussion here that there are two interesting cases: > > 1. You want to use --attr-source=HEAD because you are trying to make a > bare repo behave like a non-bare one. You probably want the "don't > complain if it is missing" behavior. Yep, this is the primary use case for us. > > 2. You are trying to use the attributes from one side of the diff to > override the worktree ones (because the two trees are unrelated). > In which case it does not really matter if --attr-source complains, > because the diff will likewise complain if the tree cannot be > resolved. > > Just trying to play devil's advocate, though, I guess you could run a > non-diff operation like say: > > git --attr-source=my-branch check-attr foo > > and then you probably _do_ want to know if that source was ignored or > typo'd. > >> But again, as you said, --attr-source=<tree-ish> does not sound like >> a good fit for bare-repository hosted environment and a tentative >> hack waiting for a proper attr.blob support, or something like that, >> to appear. > > I think folks mentioned mailmap.blob in the original discussion for > --attr-source. I don't remember why the patch went with --attr-source > there. Maybe John can speak to that. Yeah I was looking for this, and I think I found it in [1], which was part of a separate patch having to do with gitattributes. If someone did mention it in the patch for --attr-source, then I can't remember why we decided to go with the comand line flag rather than the config. 1. https://lore.kernel.org/git/ZBMn5T6zfKK+PYUe@coredump.intra.peff.net/ > > -Peff
Hi Junio, On 21 Sep 2023, at 4:52, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > >> In an empty repository, "git log" will die anyway. So I think the more >> interesting case is "I have a repository with stuff in it, but HEAD >> points to an unborn branch". So: >> >> git --attr-source=HEAD diff foo^ foo > > This still looks like a made-up example. Who in the right mind > would specify HEAD when both of the revs involved in the operation > are from branch 'foo'? The history of HEAD may not have anything > common with the operand of the operation 'foo' (or its parent), or > worse, it may not even exist. > > But your "in this repository we never trust attributes from working > tree, take it instead from this file or from this blob" example does > make a lot more sense as a use case. > >> And there you really are saying "if there are attributes in HEAD, use >> them; otherwise, don't worry about it". This is exactly what we do with >> mailmap.blob: in a bare repository it is set to HEAD by default, but if >> HEAD does not resolve, we just ignore it (just like a HEAD that does not >> contain a .mailmap file). And those match the non-bare cases, where we'd >> read those files from the working tree instead. > > "HEAD" -> "HEAD:.mailmap" if I recall correctly. > > And if HEAD does not resolve, we pretend as if HEAD is an empty > tree-ish (hence HEAD:.mailmap is missing). It becomes very tempting > to do the same for the attribute sources and treat unborn HEAD as if > it specifies an empty tree-ish, without any configuration or an > extra option. > > Such a change would be an end-user observable behaviour change, but > nobody sane would be running "git --attr-source=HEAD diff HEAD^ HEAD" > to check and detect an unborn HEAD for its error exit code, so I do > not think it is a horribly wrong thing to do. > > But again, as you said, --attr-source=<tree-ish> does not sound like > a good fit for bare-repository hosted environment and a tentative > hack waiting for a proper attr.blob support, or something like that, > to appear. > >> But what is weird about this patch is that we are using a config option >> to change how a command-line option is interpreted. If the idea is that >> some invocations care about the validity of the source and some do not, >> then the config option is much too blunt. It is set once long ago, but >> it can't distinguish between times you care about invalid sources and >> times you don't. >> >> It would make much more sense to me to have another command-line option, >> like: >> >> git --attr-source=HEAD --allow-invalid-attr-source > > Yeah, if we were to make it configurable without changing the > default behaviour, I agree that would be more correct approach. A > configuration does not sound like a good fit. > >> ... And I really think attr.blob is a better match for what GitLab >> is trying to do here, because it is set once and applies to all >> commands, rather than having to teach every invocation to pass it >> (though I guess maybe they use it as an environment variable). Between adding an --allow-invalid-attr-source, and adding attr.blob and attr.allowInvalidSource I think I like adding the attr.blob config more. > > True, too. > > Thanks. thanks John
diff --git a/Documentation/config.txt b/Documentation/config.txt index 229b63a454c..b1891c2b5af 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -371,6 +371,8 @@ other popular tools, and describe them in your documentation. include::config/advice.txt[] +include::config/attr.txt[] + include::config/core.txt[] include::config/add.txt[] diff --git a/Documentation/config/attr.txt b/Documentation/config/attr.txt new file mode 100644 index 00000000000..2218f0c982a --- /dev/null +++ b/Documentation/config/attr.txt @@ -0,0 +1,6 @@ +attr.allowInvalidSource:: + If `--attr-source` cannot resolve to a valid tree object, ignore + `--attr-source` instead of erroring out, and fall back to looking for + attributes in the default locations. Useful when passing `HEAD` into + `attr-source` since it allows `HEAD` to point to an unborn branch in + cases like an empty repository. diff --git a/attr.c b/attr.c index 71c84fbcf86..854a3720e3f 100644 --- a/attr.c +++ b/attr.c @@ -1208,8 +1208,15 @@ static void compute_default_attr_source(struct object_id *attr_source) if (!default_attr_source_tree_object_name || !is_null_oid(attr_source)) return; - if (repo_get_oid_treeish(the_repository, default_attr_source_tree_object_name, attr_source)) - die(_("bad --attr-source or GIT_ATTR_SOURCE")); + + if (repo_get_oid_treeish(the_repository, default_attr_source_tree_object_name, attr_source)) { + int allow_invalid_attr_source = 0; + + git_config_get_bool("attr.allowinvalidsource", &allow_invalid_attr_source); + + if (!allow_invalid_attr_source) + die(_("bad --attr-source or GIT_ATTR_SOURCE")); + } } static struct object_id *default_attr_source(void) diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh index 26e082f05b4..3272237ee2b 100755 --- a/t/t0003-attributes.sh +++ b/t/t0003-attributes.sh @@ -342,6 +342,42 @@ test_expect_success 'bare repository: check that .gitattribute is ignored' ' ) ' +bad_attr_source_err="fatal: bad --attr-source or GIT_ATTR_SOURCE" + +test_expect_success 'attr.allowInvalidSource when HEAD is unborn' ' + test_when_finished rm -rf empty && + echo $bad_attr_source_err >expect_err && + echo "f/path: test: unspecified" >expect && + git init empty && + test_must_fail git -C empty --attr-source=HEAD check-attr test -- f/path 2>err && + test_cmp expect_err err && + git -C empty -c attr.allowInvalidSource=true --attr-source=HEAD check-attr test -- f/path >actual 2>err && + test_must_be_empty err && + test_cmp expect actual +' + +test_expect_success 'attr.allowInvalidSource when --attr-source points to non-existing ref' ' + test_when_finished rm -rf empty && + echo $bad_attr_source_err >expect_err && + echo "f/path: test: unspecified" >expect && + git init empty && + test_must_fail git -C empty --attr-source=refs/does/not/exist check-attr test -- f/path 2>err && + test_cmp expect_err err && + git -C empty -c attr.allowInvalidSource=true --attr-source=refs/does/not/exist check-attr test -- f/path >actual 2>err && + test_must_be_empty err && + test_cmp expect actual +' + +test_expect_success 'bad attr source defaults to reading .gitattributes file' ' + test_when_finished rm -rf empty && + git init empty && + echo "f/path test=val" >empty/.gitattributes && + echo "f/path: test: val" >expect && + git -C empty -c attr.allowInvalidSource=true --attr-source=HEAD check-attr test -- f/path >actual 2>err && + test_must_be_empty err && + test_cmp expect actual +' + test_expect_success 'bare repository: with --source' ' ( cd bare.git &&