diff mbox series

[2/2] refs: warn on non-pseudoref looking .git/<file> refs

Message ID 20201210125321.19456-2-avarab@gmail.com (mailing list archive)
State Superseded
Headers show
Series [1/2] refs: move is_pseudoref_syntax() earlier in the file | expand

Commit Message

Ævar Arnfjörð Bjarmason Dec. 10, 2020, 12:53 p.m. UTC
The refs parsing machinery will first try to parse arbitrary
.git/<name> for a given <name>, before moving onto refs/<name>,
refs/tags/<name> etc. See "ref_rev_parse_rules" in refs.c, but
e.g. "for-each-ref" and other things that list references ignore these
ancient-style refs.

Thus if you end up in a repository that contains e.g. .git/master the
likes of "checkout" can emit seemingly nonsensical error
messages. E.g. I happened to have a .git/master with a non-commit
SHA-1:

    $ git checkout master
    fatal: Cannot switch branch to a non-commit 'master'

Running "for-each-ref" yields only commits that could match "master",
until I realized I'd ended up with a .git/master file. Before this
we'd ignore it under a general rule that tries to ignore .git/HEAD,
.git/MERGE_HEAD and other non-pseudoref looking refs at the top-level.

Let's help the user in this case by doing a very loose check for
whether the ref name looks like a pseudoref such as "HEAD" (i.e. only
has upper case, dashes, underbars), and if not issue a warning:

    $ git rev-parse master
    warning: matched ref .git/master doesn't look like a pseudoref
    c87c83a2e9eb6d309913a0f59389f808024a58f9

I think it's conservative enough to just turn this on by default, but
place it under a configurable option similar to the existing
core.warnAmbiguousRefs. Running the entire test suite with "die"
instead of "warning" passes with this approach.

Our own test suite makes use of a few refs in .git/ that aren't
produced by git itself, e.g. "FOO", "TESTSYMREFTWO" etc, external
tools probably rely on this as well, so I don't think it's viable to
e.g. have a whitelist of "HEAD", "MERGE_HEAD" etc. As an aside that
list is quite large, I counted 12 names used in the C code before I
abandoned that approach.

This approach of checking the case of e.g. "master" is not an issue on
case-insensitive filesystems, since we're not checking against the
fs's version of the name, but what the user provided to git on the
command-line.

We are going to match "git rev-parse master" to e.g. .git/MASTER on
those systems, but I think that's also a case the user would like to
be warned about. I once helped a user on OSX with an issue where they
couldn't repeat a merge command on Linux, and it turned out they'd
referred to "HEAD" as "head", which we'd happily resolve to .git/HEAD
without warning on that system. Now we'll warn about that.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config/core.txt | 11 ++++++++++
 cache.h                       |  1 +
 config.c                      |  5 +++++
 environment.c                 |  1 +
 refs.c                        | 20 +++++++++++++++++
 t/t1430-bad-ref-name.sh       | 41 +++++++++++++++++++++++++++++++++++
 6 files changed, 79 insertions(+)

Comments

Eric Sunshine Dec. 10, 2020, 2:39 p.m. UTC | #1
On Thu, Dec 10, 2020 at 7:55 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> [...]
> Let's help the user in this case by doing a very loose check for
> whether the ref name looks like a pseudoref such as "HEAD" (i.e. only
> has upper case, dashes, underbars), and if not issue a warning:
> [...]
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
> @@ -355,6 +355,17 @@ core.warnAmbiguousRefs::
> +core.warnNonPseudoRefs::
> +       If true, Git will warn you if the `<ref>` you passed
> +       unexpectedly resolves to a top-level ref stored in
> +       `.git/<file>` but doesn't look like a pseudoref such as
> +       `HEAD`, `MERGE_HEAD` etc. True by default.
> ++
> +These references are ignored by linkgit:for-each-ref[1], but resolved
> +by linkgit:git-show[1], linkgit:git-rev-parse[1] etc. So it can be
> +confusing to have e.g. an errant `.git/master` being confused with
> +`.git/refs/heads/master`.

Dscho has been submitting patches lately to eradicate the word
"master" from the project source.

> diff --git a/refs.c b/refs.c
> @@ -669,6 +676,19 @@ int expand_ref(struct repository *repo, const char *str, int len,
>                 if (r) {
> +                       if (warn_non_pseudo_refs &&
> +                           !starts_with(fullref.buf, "refs/") &&
> +                           !starts_with(r, "refs/") &&
> +                           !strchr(r, '/') &&
> +                           !is_any_pseudoref_syntax(r) &&
> +                           !warned_on_non_pseudo_ref++) {
> +                               /*
> +                                * TRANSLATORS: The 1st argument is
> +                                * e.g. "master", and the 2nd can be
> +                                * e.g. "master~10".
> +                                */
> +                               warning(_("matched ref name .git/%s doesn't look like a pseudoref"), r);

The TRANSLATORS comment talks about two arguments, but I see only one.

Does the "matched ref name" part add any value? I would find the
warning just as helpful without it:

    .git/blork doesn't look like a pseudoref

> diff --git a/t/t1430-bad-ref-name.sh b/t/t1430-bad-ref-name.sh
> @@ -374,4 +374,45 @@ test_expect_success 'branch -m can rename refs/heads/-dash' '
> +test_expect_success 'warn on non-pseudoref syntax refs in .git/' '
> +       test_when_finished "
> +               rm -f .git/mybranch &&
> +               rm -rf .git/a-dir &&
> +               rm -rf .git/MY-BRANCH_NAME &&
> +               rm -rf .git/MY-branch_NAME
> +       " &&

Nit: These could all be removed with a single `rm -rf`:

    rm -rf .git/mybranch .git/a-dir ...

> +       # We do not ignore lower-case
> +       cp expect .git/mybranch &&
> +       git rev-parse mybranch >hash 2>err &&
> +       test_cmp expect hash &&
> +       GIT_TEST_GETTEXT_POISON=false grep "like a pseudoref" err &&

What is the purpose of assigning GIT_TEST_GETTEXT_POISON here?

> +       git -c core.warnNonPseudoRefs=false rev-parse mybranch >hash 2>err &&
> +       test_cmp expect hash &&
> +       test_must_be_empty err &&
> +       rm .git/mybranch
> +'
Ævar Arnfjörð Bjarmason Dec. 10, 2020, 8:28 p.m. UTC | #2
On Thu, Dec 10 2020, Eric Sunshine wrote:

> On Thu, Dec 10, 2020 at 7:55 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>> [...]
>> Let's help the user in this case by doing a very loose check for
>> whether the ref name looks like a pseudoref such as "HEAD" (i.e. only
>> has upper case, dashes, underbars), and if not issue a warning:
>> [...]
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
>> @@ -355,6 +355,17 @@ core.warnAmbiguousRefs::
>> +core.warnNonPseudoRefs::
>> +       If true, Git will warn you if the `<ref>` you passed
>> +       unexpectedly resolves to a top-level ref stored in
>> +       `.git/<file>` but doesn't look like a pseudoref such as
>> +       `HEAD`, `MERGE_HEAD` etc. True by default.
>> ++
>> +These references are ignored by linkgit:for-each-ref[1], but resolved
>> +by linkgit:git-show[1], linkgit:git-rev-parse[1] etc. So it can be
>> +confusing to have e.g. an errant `.git/master` being confused with
>> +`.git/refs/heads/master`.
>
> Dscho has been submitting patches lately to eradicate the word
> "master" from the project source.

Muscle memory dies hard, will fix it in the reroll.

>> diff --git a/refs.c b/refs.c
>> @@ -669,6 +676,19 @@ int expand_ref(struct repository *repo, const char *str, int len,
>>                 if (r) {
>> +                       if (warn_non_pseudo_refs &&
>> +                           !starts_with(fullref.buf, "refs/") &&
>> +                           !starts_with(r, "refs/") &&
>> +                           !strchr(r, '/') &&
>> +                           !is_any_pseudoref_syntax(r) &&
>> +                           !warned_on_non_pseudo_ref++) {
>> +                               /*
>> +                                * TRANSLATORS: The 1st argument is
>> +                                * e.g. "master", and the 2nd can be
>> +                                * e.g. "master~10".
>> +                                */
>> +                               warning(_("matched ref name .git/%s doesn't look like a pseudoref"), r);
>
> The TRANSLATORS comment talks about two arguments, but I see only one.
>
> Does the "matched ref name" part add any value? I would find the
> warning just as helpful without it:
>
>     .git/blork doesn't look like a pseudoref

Yeah that's much better, the TRANSLATORS comment is incorrect, rebased
out an earlier version where it took two, didn't notice...

>> diff --git a/t/t1430-bad-ref-name.sh b/t/t1430-bad-ref-name.sh
>> @@ -374,4 +374,45 @@ test_expect_success 'branch -m can rename refs/heads/-dash' '
>> +test_expect_success 'warn on non-pseudoref syntax refs in .git/' '
>> +       test_when_finished "
>> +               rm -f .git/mybranch &&
>> +               rm -rf .git/a-dir &&
>> +               rm -rf .git/MY-BRANCH_NAME &&
>> +               rm -rf .git/MY-branch_NAME
>> +       " &&
>
> Nit: These could all be removed with a single `rm -rf`:
>
>     rm -rf .git/mybranch .git/a-dir ...

Will fix.

>> +       # We do not ignore lower-case
>> +       cp expect .git/mybranch &&
>> +       git rev-parse mybranch >hash 2>err &&
>> +       test_cmp expect hash &&
>> +       GIT_TEST_GETTEXT_POISON=false grep "like a pseudoref" err &&
>
> What is the purpose of assigning GIT_TEST_GETTEXT_POISON here?

Since 6cdccfce1e0 (i18n: make GETTEXT_POISON a runtime option,
2018-11-08) we haven't needed to use the C_LOCALE_OUTPUT prerequisite
for any new code, since we can just turn the poisoning off.

I think we should just slowly refactor things away from that
prerequisite and test_i18ngrep, which were only needed because it used
to be a compile-time switch, but I haven't gotter around to that
refactoring.

In liue of that I think it makes more sense to always run the full test
if possible, no matter what the GIT_TEST_* mode is.

>> +       git -c core.warnNonPseudoRefs=false rev-parse mybranch >hash 2>err &&
>> +       test_cmp expect hash &&
>> +       test_must_be_empty err &&
>> +       rm .git/mybranch
>> +'
Eric Sunshine Dec. 10, 2020, 8:47 p.m. UTC | #3
On Thu, Dec 10, 2020 at 3:29 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> On Thu, Dec 10 2020, Eric Sunshine wrote:
> > On Thu, Dec 10, 2020 at 7:55 AM Ævar Arnfjörð Bjarmason
> > <avarab@gmail.com> wrote:
> >> +       GIT_TEST_GETTEXT_POISON=false grep "like a pseudoref" err &&
> >
> > What is the purpose of assigning GIT_TEST_GETTEXT_POISON here?
>
> Since 6cdccfce1e0 (i18n: make GETTEXT_POISON a runtime option,
> 2018-11-08) we haven't needed to use the C_LOCALE_OUTPUT prerequisite
> for any new code, since we can just turn the poisoning off.
>
> I think we should just slowly refactor things away from that
> prerequisite and test_i18ngrep, which were only needed because it used
> to be a compile-time switch, but I haven't gotter around to that
> refactoring.
>
> In liue of that I think it makes more sense to always run the full test
> if possible, no matter what the GIT_TEST_* mode is.

I must be missing something. I've looked at 6cdccfce1e0 but I still
don't see how or why `GIT_TEST_GETTEXT_POISON=false` could affect the
simple `grep` invocation being done by this test. I could understand
if GIT_TEST_GETTEXT_POISON was applied to the invocation of a Git
command, but that's not the case here.

(I also notice that 6cdccfce1e0 only checks whether
GIT_TEST_GETTEXT_POISON is empty or not -- and the changes in
6cdccfce1e0 set GIT_TEST_GETTEXT_POISON to an empty value rather than
to "false", so I find myself doubly confused by this application of
GIT_TEST_GETTEXT_POISON to `grep`.)
Ævar Arnfjörð Bjarmason Dec. 10, 2020, 10:17 p.m. UTC | #4
On Thu, Dec 10 2020, Eric Sunshine wrote:

> On Thu, Dec 10, 2020 at 3:29 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>> On Thu, Dec 10 2020, Eric Sunshine wrote:
>> > On Thu, Dec 10, 2020 at 7:55 AM Ævar Arnfjörð Bjarmason
>> > <avarab@gmail.com> wrote:
>> >> +       GIT_TEST_GETTEXT_POISON=false grep "like a pseudoref" err &&
>> >
>> > What is the purpose of assigning GIT_TEST_GETTEXT_POISON here?
>>
>> Since 6cdccfce1e0 (i18n: make GETTEXT_POISON a runtime option,
>> 2018-11-08) we haven't needed to use the C_LOCALE_OUTPUT prerequisite
>> for any new code, since we can just turn the poisoning off.
>>
>> I think we should just slowly refactor things away from that
>> prerequisite and test_i18ngrep, which were only needed because it used
>> to be a compile-time switch, but I haven't gotter around to that
>> refactoring.
>>
>> In liue of that I think it makes more sense to always run the full test
>> if possible, no matter what the GIT_TEST_* mode is.
>
> I must be missing something. I've looked at 6cdccfce1e0 but I still
> don't see how or why `GIT_TEST_GETTEXT_POISON=false` could affect the
> simple `grep` invocation being done by this test. I could understand
> if GIT_TEST_GETTEXT_POISON was applied to the invocation of a Git
> command, but that's not the case here.
>
> (I also notice that 6cdccfce1e0 only checks whether
> GIT_TEST_GETTEXT_POISON is empty or not -- and the changes in
> 6cdccfce1e0 set GIT_TEST_GETTEXT_POISON to an empty value rather than
> to "false", so I find myself doubly confused by this application of
> GIT_TEST_GETTEXT_POISON to `grep`.)

You're not missing something, but it seems I need to step away from the
computer for the day. I managed to write this & reply to your E-Mail
without noticing I'd put the GIT_TEST[...] env variable in front of the
wrong command. It should indeed be in front of the rev-parse above.
diff mbox series

Patch

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index 160aacad84b..535a7012156 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -355,6 +355,17 @@  core.warnAmbiguousRefs::
 	If true, Git will warn you if the ref name you passed it is ambiguous
 	and might match multiple refs in the repository. True by default.
 
+core.warnNonPseudoRefs::
+	If true, Git will warn you if the `<ref>` you passed
+	unexpectedly resolves to a top-level ref stored in
+	`.git/<file>` but doesn't look like a pseudoref such as
+	`HEAD`, `MERGE_HEAD` etc. True by default.
++
+These references are ignored by linkgit:for-each-ref[1], but resolved
+by linkgit:git-show[1], linkgit:git-rev-parse[1] etc. So it can be
+confusing to have e.g. an errant `.git/master` being confused with
+`.git/refs/heads/master`.
+
 core.compression::
 	An integer -1..9, indicating a default compression level.
 	-1 is the zlib default. 0 means no compression,
diff --git a/cache.h b/cache.h
index 8d279bc1103..1a0cc5e38a3 100644
--- a/cache.h
+++ b/cache.h
@@ -920,6 +920,7 @@  extern int ignore_case;
 extern int assume_unchanged;
 extern int prefer_symlink_refs;
 extern int warn_ambiguous_refs;
+extern int warn_non_pseudo_refs;
 extern int warn_on_object_refname_ambiguity;
 extern char *apply_default_whitespace;
 extern char *apply_default_ignorewhitespace;
diff --git a/config.c b/config.c
index 1137bd73aff..6a589a770f4 100644
--- a/config.c
+++ b/config.c
@@ -1212,6 +1212,11 @@  static int git_default_core_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.warnnonpseudorefs")) {
+		warn_non_pseudo_refs = git_config_bool(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.abbrev")) {
 		if (!value)
 			return config_error_nonbool(var);
diff --git a/environment.c b/environment.c
index bb518c61cd2..85a84eceaf3 100644
--- a/environment.c
+++ b/environment.c
@@ -29,6 +29,7 @@  int assume_unchanged;
 int prefer_symlink_refs;
 int is_bare_repository_cfg = -1; /* unspecified */
 int warn_ambiguous_refs = 1;
+int warn_non_pseudo_refs = 1;
 int warn_on_object_refname_ambiguity = 1;
 int ref_paranoia = -1;
 int repository_format_precious_objects;
diff --git a/refs.c b/refs.c
index 3ec5dcba0be..0beeb3dded8 100644
--- a/refs.c
+++ b/refs.c
@@ -649,12 +649,19 @@  static int is_main_pseudoref_syntax(const char *refname)
 		is_pseudoref_syntax(refname);
 }
 
+static int is_any_pseudoref_syntax(const char *refname)
+{
+	return is_main_pseudoref_syntax(refname) ||
+		is_pseudoref_syntax(refname);
+}
+
 int expand_ref(struct repository *repo, const char *str, int len,
 	       struct object_id *oid, char **ref)
 {
 	const char **p, *r;
 	int refs_found = 0;
 	struct strbuf fullref = STRBUF_INIT;
+	static int warned_on_non_pseudo_ref;
 
 	*ref = NULL;
 	for (p = ref_rev_parse_rules; *p; p++) {
@@ -669,6 +676,19 @@  int expand_ref(struct repository *repo, const char *str, int len,
 					    fullref.buf, RESOLVE_REF_READING,
 					    this_result, &flag);
 		if (r) {
+			if (warn_non_pseudo_refs &&
+			    !starts_with(fullref.buf, "refs/") &&
+			    !starts_with(r, "refs/") &&
+			    !strchr(r, '/') &&
+			    !is_any_pseudoref_syntax(r) &&
+			    !warned_on_non_pseudo_ref++) {
+				/*
+				 * TRANSLATORS: The 1st argument is
+				 * e.g. "master", and the 2nd can be
+				 * e.g. "master~10".
+				 */
+				warning(_("matched ref name .git/%s doesn't look like a pseudoref"), r);
+			}
 			if (!refs_found++)
 				*ref = xstrdup(r);
 			if (!warn_ambiguous_refs)
diff --git a/t/t1430-bad-ref-name.sh b/t/t1430-bad-ref-name.sh
index c7878a60edf..2c8c2c15c40 100755
--- a/t/t1430-bad-ref-name.sh
+++ b/t/t1430-bad-ref-name.sh
@@ -374,4 +374,45 @@  test_expect_success 'branch -m can rename refs/heads/-dash' '
 	git show-ref refs/heads/dash
 '
 
+test_expect_success 'warn on non-pseudoref syntax refs in .git/' '
+	test_when_finished "
+		rm -f .git/mybranch &&
+		rm -rf .git/a-dir &&
+		rm -rf .git/MY-BRANCH_NAME &&
+		rm -rf .git/MY-branch_NAME
+	" &&
+
+	# Setup
+	git rev-parse master >expect &&
+
+	# We ignore anything with slashes
+	mkdir .git/a-dir &&
+	cp expect .git/a-dir/mybranch &&
+	git rev-parse a-dir/mybranch >hash 2>err &&
+	test_must_be_empty err &&
+	test_cmp expect hash &&
+
+	# We ignore upper-case
+	cp expect .git/MY-BRANCH_NAME &&
+	git rev-parse a-dir/mybranch >hash 2>err &&
+	test_must_be_empty err &&
+	test_cmp expect hash &&
+
+	# We ignore mixed-case
+	cp expect .git/MY-branch_NAME &&
+	git rev-parse a-dir/mybranch >hash 2>err &&
+	test_must_be_empty err &&
+	test_cmp expect hash &&
+
+	# We do not ignore lower-case
+	cp expect .git/mybranch &&
+	git rev-parse mybranch >hash 2>err &&
+	test_cmp expect hash &&
+	GIT_TEST_GETTEXT_POISON=false grep "like a pseudoref" err &&
+	git -c core.warnNonPseudoRefs=false rev-parse mybranch >hash 2>err &&
+	test_cmp expect hash &&
+	test_must_be_empty err &&
+	rm .git/mybranch
+'
+
 test_done