mbox series

[v5,00/10] fetch --recurse-submodules: fetch unpopulated submodules

Message ID 20220308001433.94995-1-chooglen@google.com (mailing list archive)
Headers show
Series fetch --recurse-submodules: fetch unpopulated submodules | expand

Message

Glen Choo March 8, 2022, 12:14 a.m. UTC
Original cover letter: https://lore.kernel.org/git/20220210044152.78352-1-chooglen@google.com

Based off 'master'.

Thanks as always for the kind feedback :) I believe I incorporated all
of feedback from the previous round except for the following:

- <xmqqk0d9w91r.fsf@gitster.g> In a recursive fetch, we thought that we
  could use the submodule name instead of the superproject commit oid
  and path.

  It's true that we don't need <.super_oid, .path> in order to init the
  subrepo, but it turns out that recursive fetch reads some
  configuration values from .gitmodules (via submodule_from_path()), so
  we still need to store super_oid in order to read the correct
  .gitmodules file.

- <20220304235328.649768-1-jonathantanmy@google.com> I've described the
  differences between the no-submodule-in-index test and the
  other-submodule-in-index test (their comments now refer to one
  another, so the contrast is more obvious), but didn't reorder them
  because I thought that made the test setup less intuitive to read.

- <20220304234622.647776-1-jonathantanmy@google.com> I added
  expect.err.sub2 to verify_test_result() but didn't change
  write_expected_super() to account for sub2. It turned out to be tricky
  to predict the output when 'super' fetches >1 branch because each
  fetched branch can affect the formatting. e.g.

    	   OLD_HEAD..super  super           -> origin/super

  can become

    	   OLD_HEAD..super  super                   -> origin/super
    	   OLD_HEAD..super  some-other-branch       -> origin/some-other-branch

  (I could work around this by replacing the whitespace with sed, but it
  seemed like too much overhead for a one-off test).

= Patch organization

- Patches 1-3 are quality-of-life improvements to the test suite that
  make it easier to write the tests in patch 9.
- Patches 4-6 are preparation for "git fetch" to read .gitmodules from
  the superproject commit in patch 7.
- Patches 7-8 refactor out the logic of "finding which submodules to
  fetch" and "fetching the submodules", making it easier to tell "git
  fetch" to fetch unpopulated submodules.
- Patch 9 teaches "git fetch" to fetch changed, unpopulated submodules
  in addition to populated submodules.
- Patch 10 is an optional bugfix + cleanup of the "git fetch" code that
  removes the last caller of the deprecated "add_submodule_odb()".

= Changes 

== Since v4
- Rename test helpers (s/check_/write_expected_)
- Test style fixes
- Update test comments
- Remove the manual test_cmp in the test that checks sub2 (but we still
  construct expect.err.super manually).

== Since v3
- Numerous style fixes + improved comments.
- Fix sed portability issues.
- Fix failing test due to default branch name assumptions.
- Patch 3: change a test so that it no longer depends on state from the
  previous test.
- Patch 9: fix memory leak when recording super_oid and path + add
  explanatory comment.

== Since v2
- Numerous small fixes to the code and commit message (thanks to all who
  helped spot these :))
- In patch 2, use test_cmp + sed to assert on test output, effectively
  reverting the "use grep" approach of v1-2 (see patch 2's description).
- New patch 3: introduce a test helper that creates the expected
  superproject commit (instead of copy-pasting the code over and over).
  - I did not get rid of "git fetch" inside the test helper (as Jonathan
    suggested) though, because that requires a bigger change in the test
    setup, and I think the test helper makes the test straightforward
    enough.
- New patch 8: refactor some shared logic out into fetch_task_create().
  This reduces code duplication between the get_fetch_task_from_*
  functions.
- In patch 9, add additional tests for 'submodules with the same name'.
- In patch 9, handle a bug where a submodule that is unpopulated by "git
  rm" still has "core.worktree" set and cannot be fetched (see patch 9's
  description).
- Remove the "git fetch --update-shallow" patch (I'll try to send it
  separately).

== Since v1
- Numerous style fixes suggested by Jonathan (thanks!)
- In patch 3, don't prematurely read submodules from the superproject
  commit (see:
  <kl6l5yplyat6.fsf@chooglen-macbookpro.roam.corp.google.com>).
- In patch 7, stop using "git checkout" and "! grep" in tests.
- In patch 7, stop doing the "find changed submodules" rev walk
  unconditionally. Instead, continue to check for .gitmodules, but also
  check for submodules in $GIT_DIR/modules.
  - I'm not entirely happy with the helper function name, see "---" for
    details.
- Move "git fetch --update-shallow" bugfix to patch 8.
  - Because the "find changed submodules" rev walk is no longer
    unconditional, this fix is no longer needed for tests to pass.
- Rename fetch_populated_submodules() to fetch_submodules().


Glen Choo (10):
  t5526: introduce test helper to assert on fetches
  t5526: stop asserting on stderr literally
  t5526: create superproject commits with test helper
  submodule: make static functions read submodules from commits
  submodule: inline submodule_commits() into caller
  submodule: store new submodule commits oid_array in a struct
  submodule: extract get_fetch_task()
  submodule: move logic into fetch_task_create()
  fetch: fetch unpopulated, changed submodules
  submodule: fix latent check_has_commit() bug

 Documentation/fetch-options.txt |  26 +-
 Documentation/git-fetch.txt     |  10 +-
 builtin/fetch.c                 |  14 +-
 submodule.c                     | 442 +++++++++++++++++---------
 submodule.h                     |  21 +-
 t/t5526-fetch-submodules.sh     | 545 ++++++++++++++++++++++++--------
 6 files changed, 746 insertions(+), 312 deletions(-)

Range-diff against v4:
 1:  57cd31afc2 !  1:  f22f992e2b t5526: introduce test helper to assert on fetches
    @@ t/t5526-fetch-submodules.sh: add_upstream_commit() {
     +#
     +# If a repo should not be fetched in the test, its corresponding
     +# expect.err file should be rm-ed.
    -+verify_fetch_result() {
    ++verify_fetch_result () {
     +	ACTUAL_ERR=$1 &&
     +	rm -f expect.err.combined &&
     +	if test -f expect.err.super
 2:  b70c894cff !  2:  f6ee125e16 t5526: stop asserting on stderr literally
    @@ Commit message
     
         Stop asserting on $head1 by replacing it with a dummy value in the
         actual and expected output. Do this by introducing test
    -    helpers (check_*()) that make it easier to construct the expected
    -    output, and use sed to munge the actual output.
    +    helpers (write_expected_*()) that make it easier to construct the
    +    expected output, and use sed to munge the actual output.
     
         Signed-off-by: Glen Choo <chooglen@google.com>
     
    @@ t/t5526-fetch-submodules.sh: export GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB
      
      pwd=$(pwd)
      
    -+check_sub () {
    ++write_expected_sub () {
     +	NEW_HEAD=$1 &&
    -+	cat >$pwd/expect.err.sub <<-EOF
    ++	cat >"$pwd/expect.err.sub" <<-EOF
     +	Fetching submodule submodule
     +	From $pwd/submodule
     +	   OLD_HEAD..$NEW_HEAD  sub        -> origin/sub
     +	EOF
     +}
     +
    -+check_deep () {
    ++write_expected_deep () {
     +	NEW_HEAD=$1 &&
    -+	cat >$pwd/expect.err.deep <<-EOF
    ++	cat >"$pwd/expect.err.deep" <<-EOF
     +	Fetching submodule submodule/subdir/deepsubmodule
     +	From $pwd/deepsubmodule
     +	   OLD_HEAD..$NEW_HEAD  deep       -> origin/deep
     +	EOF
     +}
     +
    -+check_super () {
    ++write_expected_super () {
     +	NEW_HEAD=$1 &&
    -+	cat >$pwd/expect.err.super <<-EOF
    ++	cat >"$pwd/expect.err.super" <<-EOF
     +	From $pwd/.
     +	   OLD_HEAD..$NEW_HEAD  super      -> origin/super
     +	EOF
    @@ t/t5526-fetch-submodules.sh: pwd=$(pwd)
     -		echo "From $pwd/submodule" >> ../expect.err.sub &&
     -		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
     +		new_head=$(git rev-parse --short HEAD) &&
    -+		check_sub $new_head
    ++		write_expected_sub $new_head
      	) &&
      	(
      		cd deepsubmodule &&
    @@ t/t5526-fetch-submodules.sh: pwd=$(pwd)
     -		echo "From $pwd/deepsubmodule" >> ../expect.err.deep &&
     -		echo "   $head1..$head2  deep       -> origin/deep" >> ../expect.err.deep
     +		new_head=$(git rev-parse --short HEAD) &&
    -+		check_deep $new_head
    ++		write_expected_deep $new_head
      	)
      }
      
    -@@ t/t5526-fetch-submodules.sh: add_upstream_commit() {
    - #
    - # If a repo should not be fetched in the test, its corresponding
    - # expect.err file should be rm-ed.
    --verify_fetch_result() {
    -+verify_fetch_result () {
    - 	ACTUAL_ERR=$1 &&
    - 	rm -f expect.err.combined &&
    - 	if test -f expect.err.super
    -@@ t/t5526-fetch-submodules.sh: verify_fetch_result() {
    +@@ t/t5526-fetch-submodules.sh: verify_fetch_result () {
      	then
      		cat expect.err.deep >>expect.err.combined
      	fi &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "Recursion doesn't happen when
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
     +	new_head=$(git rev-parse --short HEAD) &&
    -+	check_super $new_head &&
    ++	write_expected_super $new_head &&
      	rm expect.err.deep &&
      	(
      		cd downstream &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "Recursion stops when no new su
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
     +	new_head=$(git rev-parse --short HEAD) &&
    -+	check_super $new_head &&
    ++	write_expected_super $new_head &&
      	rm expect.err.sub &&
      	rm expect.err.deep &&
      	(
    @@ t/t5526-fetch-submodules.sh: test_expect_success "Recursion picks up config in s
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
     +	new_head=$(git rev-parse --short HEAD) &&
    -+	check_super $new_head &&
    ++	write_expected_super $new_head &&
      	(
      		cd downstream &&
      		git fetch >../actual.out 2>../actual.err &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "Recursion picks up all submodu
     -		echo "From $pwd/submodule" >> ../expect.err.sub &&
     -		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
     +		new_head=$(git rev-parse --short HEAD) &&
    -+		check_sub $new_head
    ++		write_expected_sub $new_head
      	) &&
     -	head1=$(git rev-parse --short HEAD) &&
      	git add submodule &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "Recursion picks up all submodu
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
     +	new_head=$(git rev-parse --short HEAD) &&
    -+	check_super $new_head &&
    ++	write_expected_super $new_head &&
      	(
      		cd downstream &&
      		git fetch >../actual.out 2>../actual.err
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     -		echo "From $pwd/submodule" >> ../expect.err.sub &&
     -		echo "   $head1..$head2  sub        -> origin/sub" >> ../expect.err.sub
     +		new_head=$(git rev-parse --short HEAD) &&
    -+		check_sub $new_head
    ++		write_expected_sub $new_head
      	) &&
      	(
      		cd downstream &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
     +	new_head=$(git rev-parse --short HEAD) &&
    -+	check_super $new_head &&
    ++	write_expected_super $new_head &&
      	(
      		cd downstream &&
      		git config fetch.recurseSubmodules false &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
     +	new_head=$(git rev-parse --short HEAD) &&
    -+	check_super $new_head &&
    ++	write_expected_super $new_head &&
      	rm expect.err.sub &&
      	rm expect.err.deep &&
      	(
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'fetch.recurseSubmodules=on-de
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
     +	new_head=$(git rev-parse --short HEAD) &&
    -+	check_super $new_head &&
    ++	write_expected_super $new_head &&
      	rm expect.err.deep &&
      	(
      		cd downstream &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'submodule.<sub>.fetchRecurseS
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
     +	new_head=$(git rev-parse --short HEAD) &&
    -+	check_super $new_head &&
    ++	write_expected_super $new_head &&
      	rm expect.err.deep &&
      	(
      		cd downstream &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "don't fetch submodule when new
     -	echo "From $pwd/." > expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >> expect.err.super &&
     +	new_head=$(git rev-parse --short HEAD) &&
    -+	check_super $new_head &&
    ++	write_expected_super $new_head &&
      	rm expect.err.sub &&
      	# This file does not exist, but rm -f for readability
      	rm -f expect.err.deep &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'fetch.recurseSubmodules=on-de
     -	echo "From $pwd/." >expect.err.super &&
     -	echo "   $head1..$head2  super      -> origin/super" >>expect.err.super &&
     +	new_head=$(git rev-parse --short HEAD) &&
    -+	check_super $new_head &&
    ++	write_expected_super $new_head &&
      	rm expect.err.deep &&
      	(
      		cd downstream &&
 3:  7e2a01164e !  3:  17ccae2933 t5526: create superproject commits with test helper
    @@ Commit message
         Signed-off-by: Glen Choo <chooglen@google.com>
     
      ## t/t5526-fetch-submodules.sh ##
    -@@ t/t5526-fetch-submodules.sh: check_super () {
    +@@ t/t5526-fetch-submodules.sh: write_expected_super () {
      # a file that contains the expected err if that new commit were fetched.
      # These output files get concatenated in the right order by
      # verify_fetch_result().
    @@ t/t5526-fetch-submodules.sh: add_upstream_commit() {
     +	git commit -m "new submodule" &&
     +	super_head=$(git rev-parse --short HEAD) &&
     +	sub_head=$(git -C submodule rev-parse --short HEAD) &&
    -+	check_super $super_head &&
    -+	check_sub $sub_head
    ++	write_expected_super $super_head &&
    ++	write_expected_sub $sub_head
     +}
     +
      # Verifies that the expected repositories were fetched. This is done by
    @@ t/t5526-fetch-submodules.sh: test_expect_success "Recursion picks up config in s
     -		git add subdir/deepsubmodule &&
     -		git commit -m "new deepsubmodule" &&
     -		new_head=$(git rev-parse --short HEAD) &&
    --		check_sub $new_head
    +-		write_expected_sub $new_head
     -	) &&
     -	git add submodule &&
     -	git commit -m "new submodule" &&
     -	new_head=$(git rev-parse --short HEAD) &&
    --	check_super $new_head &&
    +-	write_expected_super $new_head &&
     +	add_submodule_commits &&
     +	add_superproject_commits &&
      	(
    @@ t/t5526-fetch-submodules.sh: test_expect_success "Recursion picks up all submodu
     -		git add subdir/deepsubmodule &&
     -		git commit -m "new deepsubmodule" &&
     -		new_head=$(git rev-parse --short HEAD) &&
    --		check_sub $new_head
    +-		write_expected_sub $new_head
     -	) &&
     +	add_submodule_commits &&
      	(
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     -	git add submodule &&
     -	git commit -m "new submodule" &&
     -	new_head=$(git rev-parse --short HEAD) &&
    --	check_super $new_head &&
    +-	write_expected_super $new_head &&
    ++	add_submodule_commits &&
     +	add_superproject_commits &&
      	(
      		cd downstream &&
 4:  88112ee225 =  4:  b220dd32c1 submodule: make static functions read submodules from commits
 5:  007cd97aba =  5:  da346aa12a submodule: inline submodule_commits() into caller
 6:  f34ea88fe9 =  6:  c1fd9c3abf submodule: store new submodule commits oid_array in a struct
 7:  f66ab663c5 =  7:  dbb931fe30 submodule: extract get_fetch_task()
 8:  4e3db1bc9d =  8:  7242236df9 submodule: move logic into fetch_task_create()
 9:  9e7b1c1bbe !  9:  0dada865d4 fetch: fetch unpopulated, changed submodules
    @@ submodule.h: int should_update_submodules(void);
      ## t/t5526-fetch-submodules.sh ##
     @@ t/t5526-fetch-submodules.sh: pwd=$(pwd)
      
    - check_sub () {
    + write_expected_sub () {
      	NEW_HEAD=$1 &&
     +	SUPER_HEAD=$2 &&
    - 	cat >$pwd/expect.err.sub <<-EOF
    + 	cat >"$pwd/expect.err.sub" <<-EOF
     -	Fetching submodule submodule
     +	Fetching submodule submodule${SUPER_HEAD:+ at commit $SUPER_HEAD}
      	From $pwd/submodule
      	   OLD_HEAD..$NEW_HEAD  sub        -> origin/sub
      	EOF
    -@@ t/t5526-fetch-submodules.sh: check_sub () {
    + }
      
    - check_deep () {
    ++write_expected_sub2 () {
    ++	NEW_HEAD=$1 &&
    ++	SUPER_HEAD=$2 &&
    ++	cat >"$pwd/expect.err.sub2" <<-EOF
    ++	Fetching submodule submodule2${SUPER_HEAD:+ at commit $SUPER_HEAD}
    ++	From $pwd/submodule2
    ++	   OLD_HEAD..$NEW_HEAD  sub2       -> origin/sub2
    ++	EOF
    ++}
    ++
    + write_expected_deep () {
      	NEW_HEAD=$1 &&
     +	SUB_HEAD=$2 &&
    - 	cat >$pwd/expect.err.deep <<-EOF
    + 	cat >"$pwd/expect.err.deep" <<-EOF
     -	Fetching submodule submodule/subdir/deepsubmodule
     +	Fetching submodule submodule/subdir/deepsubmodule${SUB_HEAD:+ at commit $SUB_HEAD}
      	From $pwd/deepsubmodule
      	   OLD_HEAD..$NEW_HEAD  deep       -> origin/deep
      	EOF
    -@@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no ne
    - '
    - 
    - test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necessary (and ignores config)" '
    -+	add_submodule_commits &&
    - 	add_superproject_commits &&
    - 	(
    - 		cd downstream &&
    +@@ t/t5526-fetch-submodules.sh: verify_fetch_result () {
    + 	then
    + 		cat expect.err.deep >>expect.err.combined
    + 	fi &&
    ++	if test -f expect.err.sub2
    ++	then
    ++		cat expect.err.sub2 >>expect.err.combined
    ++	fi &&
    + 	sed -e 's/[0-9a-f][0-9a-f]*\.\./OLD_HEAD\.\./' "$ACTUAL_ERR" >actual.err.cmp &&
    + 	test_cmp expect.err.combined actual.err.cmp
    + }
     @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess
      	verify_fetch_result actual.err
      '
      
    -+# Test that we can fetch submodules in other branches by running fetch
    -+# in a commit that has no submodules.
    ++# These tests verify that we can fetch submodules that aren't in the
    ++# index.
    ++#
    ++# First, test the simple case where the index is empty and we only fetch
    ++# submodules that are not in the index.
     +test_expect_success 'setup downstream branch without submodules' '
     +	(
     +		cd downstream &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     +	deep_head=$(git -C submodule/subdir/deepsubmodule rev-parse --short HEAD) &&
     +
     +	# assert that these are fetched from commits, not the index
    -+	check_sub $sub_head $super_head &&
    -+	check_deep $deep_head $sub_head &&
    ++	write_expected_sub $sub_head $super_head &&
    ++	write_expected_deep $deep_head $sub_head &&
     +
     +	test_must_be_empty actual.out &&
     +	verify_fetch_result actual.err
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     +	deep_head=$(git -C submodule/subdir/deepsubmodule rev-parse --short HEAD) &&
     +
     +	# assert that these are fetched from commits, not the index
    -+	check_sub $sub_head $super_head &&
    -+	check_deep $deep_head $sub_head &&
    ++	write_expected_sub $sub_head $super_head &&
    ++	write_expected_deep $deep_head $sub_head &&
     +
     +	test_must_be_empty actual.out &&
     +	verify_fetch_result actual.err
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     +	) &&
     +	test_must_be_empty actual.out &&
     +	super_head=$(git rev-parse --short HEAD) &&
    -+	check_super $super_head &&
    ++	write_expected_super $super_head &&
     +	# Neither should be fetched because the submodule is inactive
     +	rm expect.err.sub &&
     +	rm expect.err.deep &&
     +	verify_fetch_result actual.err
     +'
     +
    -+# In downstream, init "submodule2", but do not check it out while
    -+# fetching. This lets us assert that unpopulated submodules can be
    -+# fetched.
    ++# Now that we know we can fetch submodules that are not in the index,
    ++# test that we can fetch index and non-index submodules in the same
    ++# operation.
     +test_expect_success 'setup downstream branch with other submodule' '
     +	mkdir submodule2 &&
     +	(
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     +'
     +
     +test_expect_success "'--recurse-submodules' should fetch submodule commits in changed submodules and the index" '
    ++	test_when_finished "rm expect.err.sub2" &&
     +	# Create new commit in origin/super
     +	add_submodule_commits &&
     +	add_superproject_commits &&
    @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman
     +		git fetch --recurse-submodules >../actual.out 2>../actual.err
     +	) &&
     +	test_must_be_empty actual.out &&
    -+	deep_head=$(git -C submodule/subdir/deepsubmodule rev-parse --short HEAD) &&
    -+	sub_head=$(git -C submodule rev-parse --short HEAD) &&
     +	sub2_head=$(git -C submodule2 rev-parse --short HEAD) &&
    -+	super_head=$(git rev-parse --short HEAD) &&
    ++	super_head=$(git rev-parse --short super) &&
     +	super_sub2_only_head=$(git rev-parse --short super-sub2-only) &&
    ++	write_expected_sub2 $sub2_head $super_sub2_only_head &&
     +
    -+	# Use test_cmp manually because verify_fetch_result does not
    -+	# consider submodule2. All the repos should be fetched, but only
    -+	# submodule2 should be read from a commit
    -+	cat > expect.err.combined <<-EOF &&
    ++	# write_expected_super cannot handle >1 branch. Since this is a
    ++	# one-off, construct expect.err.super manually.
    ++	cat >"$pwd/expect.err.super" <<-EOF &&
     +	From $pwd/.
     +	   OLD_HEAD..$super_head  super           -> origin/super
     +	   OLD_HEAD..$super_sub2_only_head  super-sub2-only -> origin/super-sub2-only
    -+	Fetching submodule submodule
    -+	From $pwd/submodule
    -+	   OLD_HEAD..$sub_head  sub        -> origin/sub
    -+	Fetching submodule submodule/subdir/deepsubmodule
    -+	From $pwd/deepsubmodule
    -+	   OLD_HEAD..$deep_head  deep       -> origin/deep
    -+	Fetching submodule submodule2 at commit $super_sub2_only_head
    -+	From $pwd/submodule2
    -+	   OLD_HEAD..$sub2_head  sub2       -> origin/sub2
     +	EOF
    -+	sed -e "s/[0-9a-f][0-9a-f]*\.\./OLD_HEAD\.\./" actual.err >actual.err.cmp &&
    -+	test_cmp expect.err.combined actual.err.cmp
    ++	verify_fetch_result actual.err
     +'
     +
      test_expect_success "'--recurse-submodules=on-demand' stops when no new submodule commits are found in the superproject (and ignores config)" '
10:  362ce3c7f8 = 10:  71bb456041 submodule: fix latent check_has_commit() bug

base-commit: 715d08a9e51251ad8290b181b6ac3b9e1f9719d7

Comments

Junio C Hamano March 8, 2022, 12:50 a.m. UTC | #1
Glen Choo <chooglen@google.com> writes:

>   It's true that we don't need <.super_oid, .path> in order to init the
>   subrepo, but it turns out that recursive fetch reads some
>   configuration values from .gitmodules (via submodule_from_path()), so
>   we still need to store super_oid in order to read the correct
>   .gitmodules file.

OK, but then do we know which .gitmodules file is the "correct" one,
when there are more than one .super_oid?  Or do we assume that
.gitmodules does not change in the range of superproject commits we
have fetched before deciding what commits need to be fetched in the
submodules?

> == Since v4
> - Rename test helpers (s/check_/write_expected_)
> - Test style fixes
> - Update test comments
> - Remove the manual test_cmp in the test that checks sub2 (but we still
>   construct expect.err.super manually).

All of these changes looked sensible.

Will queue.  Thanks.
Glen Choo March 8, 2022, 6:24 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>>   It's true that we don't need <.super_oid, .path> in order to init the
>>   subrepo, but it turns out that recursive fetch reads some
>>   configuration values from .gitmodules (via submodule_from_path()), so
>>   we still need to store super_oid in order to read the correct
>>   .gitmodules file.
>
> OK, but then do we know which .gitmodules file is the "correct" one,
> when there are more than one .super_oid?  Or do we assume that
> .gitmodules does not change in the range of superproject commits we
> have fetched before deciding what commits need to be fetched in the
> submodules?

This uses a "first one wins approach", which obviously doesn't have
correctness guarantees. But in practice, I don't think this is likely to
cause problems:

- As far as I can tell, the only value we read from .gitmodules is
  'submodule.<name>.fetchRecurseSubmodules', and this value gets
  overridden by two other values: the CLI option, and the config
  variable with the same name in .git/config.

  During "git submodule init", we copy the config values from
  .gitmodules to .git/config. Since we can only fetch init-ed submodules
  anyway, it's quite unlikely that we will ever actually make use of the
  .gitmodules config.

- Even if we do use the .gitmodules config values, it's unlikely that
  the values in .gitmodules will change often, so it _probably_ won't
  matter which one we choose.

- This only matters when the submodule is not in the index. If the
  submodule _is_ in the index, we read .gitmodules from the filesystem
  i.e. these patches shouldn't change the behavior for submodules in the
  index.
Jonathan Tan March 8, 2022, 9:42 p.m. UTC | #3
Glen Choo <chooglen@google.com> writes:
> - <20220304235328.649768-1-jonathantanmy@google.com> I've described the
>   differences between the no-submodule-in-index test and the
>   other-submodule-in-index test (their comments now refer to one
>   another, so the contrast is more obvious), but didn't reorder them
>   because I thought that made the test setup less intuitive to read.

Thanks - the comments make sense.

> - <20220304234622.647776-1-jonathantanmy@google.com> I added
>   expect.err.sub2 to verify_test_result() but didn't change
>   write_expected_super() to account for sub2. It turned out to be tricky
>   to predict the output when 'super' fetches >1 branch because each
>   fetched branch can affect the formatting. e.g.
> 
>     	   OLD_HEAD..super  super           -> origin/super
> 
>   can become
> 
>     	   OLD_HEAD..super  super                   -> origin/super
>     	   OLD_HEAD..super  some-other-branch       -> origin/some-other-branch
> 
>   (I could work around this by replacing the whitespace with sed, but it
>   seemed like too much overhead for a one-off test).

Overwriting just the super part works for me, thanks.

The only thing remaining from me is my comment about fetching OIDs from
one submodule into another (of the same name but different URL) [1], but
I looked into it myself and we can probably postpone handling this to
another patch set.

In such a patch set, we would probably need to store the URLs that are
reported by upstream .gitmodules somewhere. (I forgot that we don't use
them in this patch.) And then, either implement an autosync function
(like "git submodule sync", perhaps gated by a "--sync-submodules"
argument so that users can include it when fetching new commits and
exclude it when fetching historical commits) and/or use those URLs in a
diagnostic message to be printed when the fetch fails.

As it is, the existing fetch-into-submodules-at-HEAD also suffers from
the same flaw, so I'm OK postponing this to another patch set.

So,
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>

[1] https://lore.kernel.org/git/20220304234622.647776-1-jonathantanmy@google.com/
Junio C Hamano March 9, 2022, 7:13 p.m. UTC | #4
Glen Choo <chooglen@google.com> writes:

> This uses a "first one wins approach", which obviously doesn't have
> correctness guarantees. But in practice, I don't think this is likely to
> cause problems:
>
> - As far as I can tell, the only value we read from .gitmodules is
>   'submodule.<name>.fetchRecurseSubmodules', and this value gets
>   overridden by two other values: the CLI option, and the config
>   variable with the same name in .git/config.
>
>   During "git submodule init", we copy the config values from
>   .gitmodules to .git/config. Since we can only fetch init-ed submodules
>   anyway, it's quite unlikely that we will ever actually make use of the
>   .gitmodules config.

These are reasonable.

> - Even if we do use the .gitmodules config values, it's unlikely that
>   the values in .gitmodules will change often, so it _probably_ won't
>   matter which one we choose.

What bad things would we see if the value changes during the span of
history of the superproject we fetched?  How often we would see
broken behaviour is immaterial and breakage being rare is a no excuse
to import a new code with designed-in flaw.  Unless the "rare" is
"never", that is.

I would think using ANY values from .gitmodules without having the
end-user agree with the settings and copying the settings to the
.git/config is a BUG.  So if it mattered from which superproject
commit we took .gitmodules from, that would mean we already have
such a bug and it is not a new problem.

That would be a reasonable argument for this topic. Together with
the previous point, i.e. we do not copy values we see in the in-tree
.gitmodules file to .git/config anyway, it would make a good enough
assurance, I would think.

> - This only matters when the submodule is not in the index. If the
>   submodule _is_ in the index, we read .gitmodules from the filesystem
>   i.e. these patches shouldn't change the behavior for submodules in the
>   index.

How often we would see broken behaviour does not matter.  If it is
broken when the submodule is not in the index, we need to know.

But as you said, it does not sound likely that in-tree .gitmodules
matters.

It leads to a possible #leftoverbit clean-up.  Because we only fetch
submodules that are initialized, the API functions we are using in
this series has no reason to require us to feed _a_ commit in the
superproject to them so that they can find .gitmodules in them.

Fixing the API can probably be left outside the scope of the topic,
to be done soon after the dust from the topic settles, I think, to
avoid distracting us from the topic.

Thanks.
Glen Choo March 9, 2022, 7:49 p.m. UTC | #5
Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> This uses a "first one wins approach", which obviously doesn't have
>> correctness guarantees. But in practice, I don't think this is likely to
>> cause problems:
>>
>> - As far as I can tell, the only value we read from .gitmodules is
>>   'submodule.<name>.fetchRecurseSubmodules', and this value gets
>>   overridden by two other values: the CLI option, and the config
>>   variable with the same name in .git/config.
>>
>>   During "git submodule init", we copy the config values from
>>   .gitmodules to .git/config. Since we can only fetch init-ed submodules
>>   anyway, it's quite unlikely that we will ever actually make use of the
>>   .gitmodules config.
>
> These are reasonable.
>
>> - Even if we do use the .gitmodules config values, it's unlikely that
>>   the values in .gitmodules will change often, so it _probably_ won't
>>   matter which one we choose.
>
> What bad things would we see if the value changes during the span of
> history of the superproject we fetched?  How often we would see
> broken behaviour is immaterial and breakage being rare is a no excuse
> to import a new code with designed-in flaw.  Unless the "rare" is
> "never", that is.

Makes sense, I'll keep this mind.

> I would think using ANY values from .gitmodules without having the
> end-user agree with the settings and copying the settings to the
> .git/config is a BUG.  So if it mattered from which superproject
> commit we took .gitmodules from, that would mean we already have
> such a bug and it is not a new problem.
>
> That would be a reasonable argument for this topic. Together with
> the previous point, i.e. we do not copy values we see in the in-tree
> .gitmodules file to .git/config anyway, it would make a good enough
> assurance, I would think.

To clarify, does this opinion of "don't use config values that aren't
copied into .git/config" extend to in-tree .gitmodules? Prior to this
series, we always read the in-tree .gitmodules to get the config - the
user does not need to copy the settings to .git/config, but we don't
pick a commit to read .gitmodules from.

If we still want to consider in-tree .gitmodules e.g. by merging
.git/config and .gitmodules, then we still have the new problem of
choosing the right .gitmodules.

If the answer is "no, we don't even consider in-tree .gitmodules"
(unless we really have to, like cloning a new submodule), that seems
pretty safe and predictable because we wouldn't have to look in two
different places to figure out what the user wants. And more crucially,
we'd never have to guess which .gitmodules to read - which will become
more of an issue as we add more support for init-ed but unpopulated
submodules.

> It leads to a possible #leftoverbit clean-up.  Because we only fetch
> submodules that are initialized, the API functions we are using in
> this series has no reason to require us to feed _a_ commit in the
> superproject to them so that they can find .gitmodules in them.

Hm, this is true; an initialized submodule should already have the
'expected' information in .git/config. And if we no longer have to fret
about whether we're reading the correct .gitmodules, we can revisit the
idea of "init a subrepo using only its name".

> Fixing the API can probably be left outside the scope of the topic,
> to be done soon after the dust from the topic settles, I think, to
> avoid distracting us from the topic.
>
> Thanks.
Junio C Hamano March 9, 2022, 8:22 p.m. UTC | #6
Glen Choo <chooglen@google.com> writes:

> To clarify, does this opinion of "don't use config values that aren't
> copied into .git/config" extend to in-tree .gitmodules? Prior to this
> series, we always read the in-tree .gitmodules to get the config - the
> user does not need to copy the settings to .git/config, but we don't
> pick a commit to read .gitmodules from.

I think we do, but I also think it was a huge mistake to allow
repository data to directly affect the behaviour of local checkout.

Fixing that is most likely outside the scope of this series, though.
Glen Choo March 9, 2022, 10:11 p.m. UTC | #7
Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> To clarify, does this opinion of "don't use config values that aren't
>> copied into .git/config" extend to in-tree .gitmodules? Prior to this
>> series, we always read the in-tree .gitmodules to get the config - the
>> user does not need to copy the settings to .git/config, but we don't
>> pick a commit to read .gitmodules from.
>
> I think we do, but I also think it was a huge mistake to allow
> repository data to directly affect the behaviour of local checkout.

I'm inclined to agree.

> Fixing that is most likely outside the scope of this series, though.

Agree. Thanks!
Glen Choo March 16, 2022, 9:58 p.m. UTC | #8
Glen Choo <chooglen@google.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Glen Choo <chooglen@google.com> writes:
>>
>>> To clarify, does this opinion of "don't use config values that aren't
>>> copied into .git/config" extend to in-tree .gitmodules? Prior to this
>>> series, we always read the in-tree .gitmodules to get the config - the
>>> user does not need to copy the settings to .git/config, but we don't
>>> pick a commit to read .gitmodules from.
>>
>> I think we do, but I also think it was a huge mistake to allow
>> repository data to directly affect the behaviour of local checkout.
>
> I'm inclined to agree.
>
>> Fixing that is most likely outside the scope of this series, though.
>
> Agree. Thanks!

I thought that this would have been the end of the discussion, but after
reading <xmqqa6dpllmc.fsf@gitster.g>, I guess I had the wrong impression
(oops).

If I am reading everything correctly, we both agree that it's not
good to read _any_ config values from .gitmodules (even if it's
in-tree), and that we should clean it up outside of this topic. So for
this topic to be merged into 'next', is it enough to say that I will fix
this behavior in a follow up topic?
Junio C Hamano March 16, 2022, 10:06 p.m. UTC | #9
Glen Choo <chooglen@google.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>>> Glen Choo <chooglen@google.com> writes:
>>>
>>>> To clarify, does this opinion of "don't use config values that aren't
>>>> copied into .git/config" extend to in-tree .gitmodules? Prior to this
>>>> series, we always read the in-tree .gitmodules to get the config - the
>>>> user does not need to copy the settings to .git/config, but we don't
>>>> pick a commit to read .gitmodules from.
>>>
>>> I think we do, but I also think it was a huge mistake to allow
>>> repository data to directly affect the behaviour of local checkout.
>>
>> I'm inclined to agree.
>>
>>> Fixing that is most likely outside the scope of this series, though.
>>
>> Agree. Thanks!
>
> I thought that this would have been the end of the discussion, but after
> reading <xmqqa6dpllmc.fsf@gitster.g>, I guess I had the wrong impression
> (oops).
>
> If I am reading everything correctly, we both agree that it's not
> good to read _any_ config values from .gitmodules (even if it's
> in-tree), and that we should clean it up outside of this topic. So for
> this topic to be merged into 'next', is it enough to say that I will fix
> this behavior in a follow up topic?

At least we should remember that is something to be fixed.  It may
not be you personally who addresses that issue, though ;-)
Glen Choo March 16, 2022, 10:37 p.m. UTC | #10
Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> Glen Choo <chooglen@google.com> writes:
>>
>>> Junio C Hamano <gitster@pobox.com> writes:
>>>
>>>> Glen Choo <chooglen@google.com> writes:
>>>>
>>>>> To clarify, does this opinion of "don't use config values that aren't
>>>>> copied into .git/config" extend to in-tree .gitmodules? Prior to this
>>>>> series, we always read the in-tree .gitmodules to get the config - the
>>>>> user does not need to copy the settings to .git/config, but we don't
>>>>> pick a commit to read .gitmodules from.
>>>>
>>>> I think we do, but I also think it was a huge mistake to allow
>>>> repository data to directly affect the behaviour of local checkout.
>>>
>>> I'm inclined to agree.
>>>
>>>> Fixing that is most likely outside the scope of this series, though.
>>>
>>> Agree. Thanks!
>>
>> I thought that this would have been the end of the discussion, but after
>> reading <xmqqa6dpllmc.fsf@gitster.g>, I guess I had the wrong impression
>> (oops).
>>
>> If I am reading everything correctly, we both agree that it's not
>> good to read _any_ config values from .gitmodules (even if it's
>> in-tree), and that we should clean it up outside of this topic. So for
>> this topic to be merged into 'next', is it enough to say that I will fix
>> this behavior in a follow up topic?
>
> At least we should remember that is something to be fixed.  It may
> not be you personally who addresses that issue, though ;-)

Perhaps squashing in a NEEDSWORK comment into [PATCH v5 09/10] will
suffice? I can also resend this series if preferred.

----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----

diff --git a/submodule.c b/submodule.c
index 6e6b2d04e4..93c78a4dc3 100644
--- a/submodule.c
+++ b/submodule.c
@@ -795,6 +795,21 @@ static const char *default_name_or_path(const char *path_or_name)
  * superproject commit that points to the submodule, but this is
  * arbitrary - we can choose any (super_oid, path) that matches the
  * submodule's name.
+ *
+ * NEEDSWORK: Storing an arbitrary commit is undesirable because we can't
+ * guarantee that we're reading the commit that the user would expect. A better
+ * scheme would be to just fetch a submodule by its name. This requires two
+ * steps:
+ * - Create a function that behaves like repo_submodule_init(), but accepts a
+ *   submodule name instead of treeish_name and path. This should be easy
+ *   because repo_submodule_init() internally uses the submodule's name.
+ *
+ * - Replace most instances of 'struct submodule' (which is the .gitmodules
+ *   config) with just the submodule name. This is OK because we expect
+ *   submodule settings to be stored in .git/config (via "git submodule init"),
+ *   not .gitmodules. This also lets us delete get_non_gitmodules_submodule(),
+ *   which constructs a bogus 'struct submodule' for the sake of giving a
+ *   placeholder name to a gitlink.
  */
 struct changed_submodule_data {
 	/*
Junio C Hamano March 16, 2022, 11:08 p.m. UTC | #11
Glen Choo <chooglen@google.com> writes:

> Perhaps squashing in a NEEDSWORK comment into [PATCH v5 09/10] will
> suffice? I can also resend this series if preferred.

It should work.  Let me try it in the last integration cycle of
today.

> ----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----
>
> diff --git a/submodule.c b/submodule.c
> index 6e6b2d04e4..93c78a4dc3 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -795,6 +795,21 @@ static const char *default_name_or_path(const char *path_or_name)
>   * superproject commit that points to the submodule, but this is
>   * arbitrary - we can choose any (super_oid, path) that matches the
>   * submodule's name.
> + *
> + * NEEDSWORK: Storing an arbitrary commit is undesirable because we can't
> + * guarantee that we're reading the commit that the user would expect. A better
> + * scheme would be to just fetch a submodule by its name. This requires two
> + * steps:
> + * - Create a function that behaves like repo_submodule_init(), but accepts a
> + *   submodule name instead of treeish_name and path. This should be easy
> + *   because repo_submodule_init() internally uses the submodule's name.
> + *
> + * - Replace most instances of 'struct submodule' (which is the .gitmodules
> + *   config) with just the submodule name. This is OK because we expect
> + *   submodule settings to be stored in .git/config (via "git submodule init"),
> + *   not .gitmodules. This also lets us delete get_non_gitmodules_submodule(),
> + *   which constructs a bogus 'struct submodule' for the sake of giving a
> + *   placeholder name to a gitlink.
>   */
>  struct changed_submodule_data {
>  	/*