diff mbox series

submodule: show inconsistent .gitmodules precedence

Message ID pull.1538.git.git.1687910254473.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series submodule: show inconsistent .gitmodules precedence | expand

Commit Message

Glen Choo June 27, 2023, 11:57 p.m. UTC
From: Glen Choo <chooglen@google.com>

Demonstrate, using tests, an inconsistency in how Git treats repeated
configuration keys in .gitmodules depending on whether we read it from
the working tree or from an object. Do not attempt to fix it yet,
because we don't know how much test coverage we have here, and what the
'right' fix is.

When a .gitmodules file contains multiple configurations for a submodule
like so:

  [submodule "sub"]
    path = path1
    path = path2

It's clearly misconfigured, but our docs don't state what we do in this
situation. If one checks this with "test-tool submodule-config", you'd
see that we ignore every value after the first (aka first-one-wins) and
issue a warning. *But* if you actually tried this with "git submodule",
you'd find it practically impossible to trigger this behavior - what you
actually see is last-one-wins!

The difference between the two is somewhat complicated because there are
two factors at play. The first is a probable bug in how
parse_config_parameter.overwrite affects the way submodule config is
cached. In submodule-config.c:parse_config(), when ".overwrite = 1", the
submodule machinery gladly overwrites the existing value (last-one-wins)
instead of issuing the warning (first-one-wins). This is probably a bug
because it seems that .overwrite is intended to overwrite cached values
from a previous .gitmodules (e.g. if we read .gitmodules from the index
and want to overwrite it with a newer version), not to overwrite values
that we read in the same file.

The second factor is that the two are reading differently cached values:
"git submodule" is reading cached values with ".overwrite = 1", but
test-tool is reading from cached values with ".overwrite = 0". The
submodule cache stores each submodule config based on the submodule name
and the .gitmodules oid it was read from (null_oid() if it's from the
working tree). Both code paths eventually call repo_read_gitmodules() to
eagerly cache .gitmodules from the working tree, and which happens to
use ".overwrite = 1". "git submodule" typically passes null_oid(), which
reads back this value. However, test-tool reads back values from the
actual .gitmodules oid. This causes a cache miss, but when we try to
populate the cache at that oid, we do it with ".overwrite = 0", causing
the difference in behavior.

To make this behavior easy to demonstrate, I've opted to teach test-tool
how to use null_oid() rather than use a real Git command, but this is
almost certainly affecting us in real Git. It's probably flying under
the radar due to some combination of submodule_from_[path|name]() being
relatively uncommon in the codebase, and such misconfigurations being
rare in practice.

We could fix this bug by targeting either of these factors:

- Make ".overwrite = 1" and ".overwrite = 0" do the same thing with
  repeated values in a .gitmodules.
- Remove the eager caching (repo_read_gitmodules()). It seems like we
  can lazily populate the cache on a miss, so we might not need this.

But I'm not sure how long this has been around, and whether users have
come to expect one over the other, so I've opted not to send a fix yet.

Signed-off-by: Glen Choo <chooglen@google.com>
---
    submodule: show inconsistent .gitmodules precedence
    
    While I was writing a .gitmodules parser for jj
    (https://github.com/martinvonz/jj, check it out, it's great!), a
    reviewer asked what would happen if a submodule had repeated fields
    (like .path). It turns out that the answer isn't just undefined (it's
    nowhere in the docs), it's inconsistent!
    
    Here's a bug report patch that demonstrates the issue using test-tool.
    I've stopped short of sending a fix because 1) I'm frankly not sure what
    behavior users have come to rely on 2) this problem is multi-faceted, so
    we could fix this in quite a few ways, but I'm not sure which way is
    'right'.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1538%2Fchooglen%2Fpush-lzmyuzkpxxxq-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1538/chooglen/push-lzmyuzkpxxxq-v1
Pull-Request: https://github.com/git/git/pull/1538

 submodule-config.c               |  7 +++++++
 t/helper/test-submodule-config.c | 22 +++++++++++++++++++++-
 t/t7411-submodule-config.sh      | 22 ++++++++++++++++++++++
 3 files changed, 50 insertions(+), 1 deletion(-)


base-commit: 6ff334181cfb6485d3ba50843038209a2a253907

Comments

Junio C Hamano June 28, 2023, 1:23 a.m. UTC | #1
"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:

>   [submodule "sub"]
>     path = path1
>     path = path2
>
> It's clearly misconfigured, but our docs don't state what we do in this
> situation. If one checks this with "test-tool submodule-config", you'd
> see that we ignore every value after the first (aka first-one-wins) and
> issue a warning. *But* if you actually tried this with "git submodule",
> you'd find it practically impossible to trigger this behavior - what you
> actually see is last-one-wins!

The last-one-wins sounds like a natural outcome for reusing the
config reading machinery, and the first-one-wins sounds like a total
confusion, but we probably should fail any operation before the user
fixes the .gitmodules by removing all but one path for each
submodule.  Otherwise we risk operating on wrong submodules (e.g. we
may think we are running deinit on "sub" at path #1, but the code
may deinit something different).

Thanks.
Glen Choo June 28, 2023, 1:36 a.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> The last-one-wins sounds like a natural outcome for reusing the
> config reading machinery, and the first-one-wins sounds like a total
> confusion, but we probably should fail any operation before the user
> fixes the .gitmodules by removing all but one path for each
> submodule.

An informal poll amongst Googlers suggests that my team mostly agrees:
last-one-wins makes more sense than first-one-wins, but erroring out is
the most sensible thing to do.

I'm not sure how reasonable it is to just fail. It makes sense if we
were only reading .gitmodules from the working tree (the user can fix
that), but we also read .gitmodules from commits, and I don't see (yet)
how a user could reasonably recover from that.
diff mbox series

Patch

diff --git a/submodule-config.c b/submodule-config.c
index 7eb7a0d88d2..1c4b5afa8e4 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -441,6 +441,13 @@  static int parse_config(const char *var, const char *value, void *data)
 					     me->gitmodules_oid,
 					     name.buf);
 
+	/*
+	 * FIXME me->overwrite=1 is only meant to overwrite existing submodule
+	 * configurations when we're reading from another .gitmodules (e.g. from
+	 * another commit), but it also unintentionally changes behavior when
+	 * there are multiple configurations in a single .gitmodules - instead
+	 * of respecting the first value, we now respect the last value.
+	 */
 	if (!strcmp(item.buf, "path")) {
 		if (!value)
 			ret = config_error_nonbool(var);
diff --git a/t/helper/test-submodule-config.c b/t/helper/test-submodule-config.c
index 9df2f03ac80..1bb1dc45878 100644
--- a/t/helper/test-submodule-config.c
+++ b/t/helper/test-submodule-config.c
@@ -29,11 +29,31 @@  int cmd__submodule_config(int argc, const char **argv)
 		my_argc--;
 	}
 
-	if (my_argc % 2 != 0)
+	if (my_argc > 1 && my_argc % 2 != 0)
 		die_usage(argc, argv, "Wrong number of arguments.");
 
 	setup_git_directory();
 
+	if (my_argc == 1) {
+		const struct submodule *submodule;
+		const char *path_or_name;
+
+		path_or_name = arg[0];
+		if (lookup_name) {
+			submodule = submodule_from_name(the_repository,
+							null_oid(), path_or_name);
+		} else
+			submodule = submodule_from_path(the_repository,
+							null_oid(), path_or_name);
+		if (!submodule)
+			die_usage(argc, argv, "Submodule not found.");
+
+		printf("Submodule name: '%s' for path '%s'\n", submodule->name,
+		       submodule->path);
+
+		return 0;
+	}
+
 	while (*arg) {
 		struct object_id commit_oid;
 		const struct submodule *submodule;
diff --git a/t/t7411-submodule-config.sh b/t/t7411-submodule-config.sh
index c0167944abd..b67eea7e085 100755
--- a/t/t7411-submodule-config.sh
+++ b/t/t7411-submodule-config.sh
@@ -258,4 +258,26 @@  test_expect_success 'reading nested submodules config when .gitmodules is not in
 	)
 '
 
+test_expect_success 'multiple config fields in .gitmodules' '
+	test_when_finished "rm -fr super-duplicate" &&
+	mkdir super-duplicate &&
+	(cd super-duplicate &&
+		git init &&
+		git submodule add ../submodule &&
+		git config --file .gitmodules --add submodule.submodule.path ignored &&
+		git config --file .gitmodules --add submodule.submodule.url ignored &&
+		git add .gitmodules &&
+		git commit -m "duplicate fields in .gitmodules" &&
+		test-tool submodule-config HEAD submodule >actual 2>warning &&
+		grep "Skipping second one" warning &&
+		echo "Submodule name: ${SQ}submodule${SQ} for path ${SQ}submodule${SQ}" >expect &&
+		test_cmp expect actual &&
+		# FIXME this should give the same result as "HEAD", but there is
+		#   a bug where if we use null_oid() instead of the real commit
+		#   id, the second .path gets respected instead of the first.
+		test_must_fail test-tool submodule-config submodule 2>null-oid-error &&
+		grep "Submodule not found" null-oid-error
+	)
+'
+
 test_done