diff mbox series

builtin/fetch: skip unnecessary tasks when using --negotiate-only

Message ID 20211207192925.67680-1-chooglen@google.com (mailing list archive)
State Superseded
Headers show
Series builtin/fetch: skip unnecessary tasks when using --negotiate-only | expand

Commit Message

Glen Choo Dec. 7, 2021, 7:29 p.m. UTC
`git fetch --negotiate-only` does not fetch objects and thus, it should
not perform certain auxiliary tasks like updating submodules, updating
the commit graph, or running gc. Although send_pack() invokes `git fetch
--negotiate-only` correctly, cmd_fetch() also reads config variables,
leading to undesirable behavior, like updating submodules if
`submodule.recurse=true`.

Make cmd_fetch() return early if --negotiate-only was specified so that
these auxiliary tasks are skipped.

Signed-off-by: Glen Choo <chooglen@google.com>
---
`git fetch --negotiate-only` is used during push negotiation to
determine the reachability of commits. As its name implies, only
negotiation is performed, not the actual fetching of objects. However,
cmd_fetch() performs certain tasks with the assumption that objects are
fetched:

* Submodules are updated if enabled by recurse.submodules=true, but
  negotiation fetch doesn't actually update the repo, so this doesn't
  make sense (introduced in [1]).
* Commit graphs will be written if enabled by
  fetch.writeCommitGraph=true. But according to
  Documentation/config/fetch.txt [2], this should only be done if a
  pack-file is downloaded
* gc is run, but according to [3], we only do this because we expect
  `git fetch` to introduce objects

Instead of disabling these tasks piecemeal, let's just make cmd_fetch()
return early if --negotiate-only was given. To accommodate possible
future options that don't fetch objects, I opted to introduce another
`if` statement instead of putting the early return in the existing
`if (negotiate_only)` block.

[1] 7dce19d374 (fetch/pull: Add the --recurse-submodules option, 2010-11-12)
[2] 50f26bd035 (fetch: add fetch.writeCommitGraph config setting, 2019-09-02)
[3] 131b8fcbfb (fetch: run gc --auto after fetching, 2013-01-26)

 builtin/fetch.c       | 22 +++++++++++++++++-----
 t/t5516-fetch-push.sh | 12 ++++++++++++
 2 files changed, 29 insertions(+), 5 deletions(-)

Comments

Jonathan Tan Dec. 9, 2021, 10:12 p.m. UTC | #1
Glen Choo <chooglen@google.com> writes:
> `git fetch --negotiate-only` does not fetch objects and thus, it should
> not perform certain auxiliary tasks like updating submodules, updating
> the commit graph, or running gc. Although send_pack() invokes `git fetch
> --negotiate-only` correctly, cmd_fetch() also reads config variables,
> leading to undesirable behavior, like updating submodules if
> `submodule.recurse=true`.
> 
> Make cmd_fetch() return early if --negotiate-only was specified so that
> these auxiliary tasks are skipped.
> 
> Signed-off-by: Glen Choo <chooglen@google.com>
> ---
> `git fetch --negotiate-only` is used during push negotiation to
> determine the reachability of commits. As its name implies, only
> negotiation is performed, not the actual fetching of objects. However,
> cmd_fetch() performs certain tasks with the assumption that objects are
> fetched:
> 
> * Submodules are updated if enabled by recurse.submodules=true, but
>   negotiation fetch doesn't actually update the repo, so this doesn't
>   make sense (introduced in [1]).
> * Commit graphs will be written if enabled by
>   fetch.writeCommitGraph=true. But according to
>   Documentation/config/fetch.txt [2], this should only be done if a
>   pack-file is downloaded
> * gc is run, but according to [3], we only do this because we expect
>   `git fetch` to introduce objects
> 
> Instead of disabling these tasks piecemeal, let's just make cmd_fetch()
> return early if --negotiate-only was given. To accommodate possible
> future options that don't fetch objects, I opted to introduce another
> `if` statement instead of putting the early return in the existing
> `if (negotiate_only)` block.

Some of this probably should be in the commit message too.

> +	if (negotiate_only) {
> +		/*
> +		 * --negotiate-only should never recurse into
> +		 * submodules, so there is no need to read .gitmodules.
> +		 */
> +		recurse_submodules = RECURSE_SUBMODULES_OFF;
> +		if (!negotiation_tip.nr)
> +			die(_("--negotiate-only needs one or more --negotiate-tip=*"));
> +	}

Maybe add a check here that --recurse-submodules was not explicitly
given.

> +	/* skip irrelevant tasks if objects were not fetched  */
> +	if (negotiate_only)
> +		return result;

There are other reasons too why objects were not fetched (e.g. if we
already have all of them). Maybe add a NEEDSWORK explaining this.

Besides these code comments and the fact that the commit message could
be improved, this patch looks good to me.
Glen Choo Dec. 9, 2021, 10:36 p.m. UTC | #2
Jonathan Tan <jonathantanmy@google.com> writes:

> Glen Choo <chooglen@google.com> writes:
>> `git fetch --negotiate-only` does not fetch objects and thus, it should
>> not perform certain auxiliary tasks like updating submodules, updating
>> the commit graph, or running gc. Although send_pack() invokes `git fetch
>> --negotiate-only` correctly, cmd_fetch() also reads config variables,
>> leading to undesirable behavior, like updating submodules if
>> `submodule.recurse=true`.
>> 
>> Make cmd_fetch() return early if --negotiate-only was specified so that
>> these auxiliary tasks are skipped.
>> 
>> Signed-off-by: Glen Choo <chooglen@google.com>
>> ---
>> `git fetch --negotiate-only` is used during push negotiation to
>> determine the reachability of commits. As its name implies, only
>> negotiation is performed, not the actual fetching of objects. However,
>> cmd_fetch() performs certain tasks with the assumption that objects are
>> fetched:
>> 
>> * Submodules are updated if enabled by recurse.submodules=true, but
>>   negotiation fetch doesn't actually update the repo, so this doesn't
>>   make sense (introduced in [1]).
>> * Commit graphs will be written if enabled by
>>   fetch.writeCommitGraph=true. But according to
>>   Documentation/config/fetch.txt [2], this should only be done if a
>>   pack-file is downloaded
>> * gc is run, but according to [3], we only do this because we expect
>>   `git fetch` to introduce objects
>> 
>> Instead of disabling these tasks piecemeal, let's just make cmd_fetch()
>> return early if --negotiate-only was given. To accommodate possible
>> future options that don't fetch objects, I opted to introduce another
>> `if` statement instead of putting the early return in the existing
>> `if (negotiate_only)` block.
>
> Some of this probably should be in the commit message too.

I suppose you mean the explanation of why the tasks are irrelevant to
negotiation fetch? i.e. 

   * Submodules are updated if enabled by recurse.submodules=true...
   * Commit graphs will be written if enabled by...
   * gc is run, but according to [3]...

>> +	if (negotiate_only) {
>> +		/*
>> +		 * --negotiate-only should never recurse into
>> +		 * submodules, so there is no need to read .gitmodules.
>> +		 */
>> +		recurse_submodules = RECURSE_SUBMODULES_OFF;
>> +		if (!negotiation_tip.nr)
>> +			die(_("--negotiate-only needs one or more --negotiate-tip=*"));
>> +	}
>
> Maybe add a check here that --recurse-submodules was not explicitly
> given.

Hm, that's not a bad idea, but it's not so easy because we don't have
RECURSE_SUBMODULES_EXPLICIT so it's not easy to tell whether or not
submodule recursion was enabled by CLI option or config.

This is the exact same use case I encountered with "branch
--recurse-submodules" [1]. I think this means that we should consider
standardizing the parsing of submodule.recurse + --recurse-submodules. I
haven't done it yet because it's a little tricky and hard to review.

So I'll punt on this check until we get RECURSE_SUBMODULES_EXPLICIT.

>
>> +	/* skip irrelevant tasks if objects were not fetched  */
>> +	if (negotiate_only)
>> +		return result;
>
> There are other reasons too why objects were not fetched (e.g. if we
> already have all of them). Maybe add a NEEDSWORK explaining this.

Good point, the comment doesn't distinguish between cases where we know
that objects won't be fetched beforehand vs cases where we found out
that objects weren't fetched after the fact. I'll add the NEEDSWORK.

[1] https://lore.kernel.org/git/20211209184928.71413-5-chooglen@google.com
Jonathan Tan Dec. 13, 2021, 10:58 p.m. UTC | #3
Glen Choo <chooglen@google.com> writes:
> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > Glen Choo <chooglen@google.com> writes:
> >> `git fetch --negotiate-only` does not fetch objects and thus, it should
> >> not perform certain auxiliary tasks like updating submodules, updating
> >> the commit graph, or running gc. Although send_pack() invokes `git fetch
> >> --negotiate-only` correctly, cmd_fetch() also reads config variables,
> >> leading to undesirable behavior, like updating submodules if
> >> `submodule.recurse=true`.
> >> 
> >> Make cmd_fetch() return early if --negotiate-only was specified so that
> >> these auxiliary tasks are skipped.
> >> 
> >> Signed-off-by: Glen Choo <chooglen@google.com>
> >> ---
> >> `git fetch --negotiate-only` is used during push negotiation to
> >> determine the reachability of commits. As its name implies, only
> >> negotiation is performed, not the actual fetching of objects. However,
> >> cmd_fetch() performs certain tasks with the assumption that objects are
> >> fetched:
> >> 
> >> * Submodules are updated if enabled by recurse.submodules=true, but
> >>   negotiation fetch doesn't actually update the repo, so this doesn't
> >>   make sense (introduced in [1]).
> >> * Commit graphs will be written if enabled by
> >>   fetch.writeCommitGraph=true. But according to
> >>   Documentation/config/fetch.txt [2], this should only be done if a
> >>   pack-file is downloaded
> >> * gc is run, but according to [3], we only do this because we expect
> >>   `git fetch` to introduce objects
> >> 
> >> Instead of disabling these tasks piecemeal, let's just make cmd_fetch()
> >> return early if --negotiate-only was given. To accommodate possible
> >> future options that don't fetch objects, I opted to introduce another
> >> `if` statement instead of putting the early return in the existing
> >> `if (negotiate_only)` block.
> >
> > Some of this probably should be in the commit message too.
> 
> I suppose you mean the explanation of why the tasks are irrelevant to
> negotiation fetch? i.e. 
> 
>    * Submodules are updated if enabled by recurse.submodules=true...
>    * Commit graphs will be written if enabled by...
>    * gc is run, but according to [3]...

Yes - why the behavior is undesirable, and the way you're doing it (by
adding another "if" statement).

After looking at this, more concretely, it might be better to use the
part below the "---" as your commit message. :-) Just note that we're
not just accommodating future options that don't fetch objects - "fetch"
already may not fetch objects (e.g. if the ref we want doesn't exist or
if we already have all the objects).

> > Maybe add a check here that --recurse-submodules was not explicitly
> > given.
> 
> Hm, that's not a bad idea, but it's not so easy because we don't have
> RECURSE_SUBMODULES_EXPLICIT so it's not easy to tell whether or not
> submodule recursion was enabled by CLI option or config.
> 
> This is the exact same use case I encountered with "branch
> --recurse-submodules" [1]. I think this means that we should consider
> standardizing the parsing of submodule.recurse + --recurse-submodules. I
> haven't done it yet because it's a little tricky and hard to review.
> 
> So I'll punt on this check until we get RECURSE_SUBMODULES_EXPLICIT.

Hmm...can we separate out the recurse_submodules variable into one
that's given by config and one that's given by CLI argument?
Glen Choo Dec. 16, 2021, 6:11 p.m. UTC | #4
Jonathan Tan <jonathantanmy@google.com> writes:

> Glen Choo <chooglen@google.com> writes:
>> Jonathan Tan <jonathantanmy@google.com> writes:
>> 
>> > Glen Choo <chooglen@google.com> writes:
>> >> `git fetch --negotiate-only` does not fetch objects and thus, it should
>> >> not perform certain auxiliary tasks like updating submodules, updating
>> >> the commit graph, or running gc. Although send_pack() invokes `git fetch
>> >> --negotiate-only` correctly, cmd_fetch() also reads config variables,
>> >> leading to undesirable behavior, like updating submodules if
>> >> `submodule.recurse=true`.
>> >> 
>> >> Make cmd_fetch() return early if --negotiate-only was specified so that
>> >> these auxiliary tasks are skipped.
>> >> 
>> >> Signed-off-by: Glen Choo <chooglen@google.com>
>> >> ---
>> >> `git fetch --negotiate-only` is used during push negotiation to
>> >> determine the reachability of commits. As its name implies, only
>> >> negotiation is performed, not the actual fetching of objects. However,
>> >> cmd_fetch() performs certain tasks with the assumption that objects are
>> >> fetched:
>> >> 
>> >> * Submodules are updated if enabled by recurse.submodules=true, but
>> >>   negotiation fetch doesn't actually update the repo, so this doesn't
>> >>   make sense (introduced in [1]).
>> >> * Commit graphs will be written if enabled by
>> >>   fetch.writeCommitGraph=true. But according to
>> >>   Documentation/config/fetch.txt [2], this should only be done if a
>> >>   pack-file is downloaded
>> >> * gc is run, but according to [3], we only do this because we expect
>> >>   `git fetch` to introduce objects
>> >> 
>> >> Instead of disabling these tasks piecemeal, let's just make cmd_fetch()
>> >> return early if --negotiate-only was given. To accommodate possible
>> >> future options that don't fetch objects, I opted to introduce another
>> >> `if` statement instead of putting the early return in the existing
>> >> `if (negotiate_only)` block.
>> >
>> > Some of this probably should be in the commit message too.
>> 
>> I suppose you mean the explanation of why the tasks are irrelevant to
>> negotiation fetch? i.e. 
>> 
>>    * Submodules are updated if enabled by recurse.submodules=true...
>>    * Commit graphs will be written if enabled by...
>>    * gc is run, but according to [3]...
>
> Yes - why the behavior is undesirable, and the way you're doing it (by
> adding another "if" statement).
>
> After looking at this, more concretely, it might be better to use the
> part below the "---" as your commit message. :-) Just note that we're
> not just accommodating future options that don't fetch objects - "fetch"
> already may not fetch objects (e.g. if the ref we want doesn't exist or
> if we already have all the objects).

Good suggestion. I'll clean it up for brevity.

>
>> > Maybe add a check here that --recurse-submodules was not explicitly
>> > given.
>> 
>> Hm, that's not a bad idea, but it's not so easy because we don't have
>> RECURSE_SUBMODULES_EXPLICIT so it's not easy to tell whether or not
>> submodule recursion was enabled by CLI option or config.
>> 
>> This is the exact same use case I encountered with "branch
>> --recurse-submodules" [1]. I think this means that we should consider
>> standardizing the parsing of submodule.recurse + --recurse-submodules. I
>> haven't done it yet because it's a little tricky and hard to review.
>> 
>> So I'll punt on this check until we get RECURSE_SUBMODULES_EXPLICIT.
>
> Hmm...can we separate out the recurse_submodules variable into one
> that's given by config and one that's given by CLI argument?

This would be similar to what I did for branch --recurse-submodules [1],
which, as I noted in my cover letter [2], is easier to understand, but a
little inconsistent with the rest of the --recurse-submodules parsing.

I'm not very enthusiastic about adding more inconsistency, and since
this check isn't critical to this bugfix (and I think there is very
little chance that anyone would run afoul of this check any time soon)
I'd like to hold off on it until RECURSE_SUBMODULES_EXPLICIT.

Unless you think I'm missing something of course :)

[1] https://lore.kernel.org/git/20211216003213.99135-6-chooglen@google.com
[2] https://lore.kernel.org/git/20211216003213.99135-1-chooglen@google.com
diff mbox series

Patch

diff --git a/builtin/fetch.c b/builtin/fetch.c
index f7abbc31ff..01865b5c09 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1996,6 +1996,17 @@  int cmd_fetch(int argc, const char **argv, const char *prefix)
 
 	argc = parse_options(argc, argv, prefix,
 			     builtin_fetch_options, builtin_fetch_usage, 0);
+
+	if (negotiate_only) {
+		/*
+		 * --negotiate-only should never recurse into
+		 * submodules, so there is no need to read .gitmodules.
+		 */
+		recurse_submodules = RECURSE_SUBMODULES_OFF;
+		if (!negotiation_tip.nr)
+			die(_("--negotiate-only needs one or more --negotiate-tip=*"));
+	}
+
 	if (recurse_submodules != RECURSE_SUBMODULES_OFF) {
 		int *sfjc = submodule_fetch_jobs_config == -1
 			    ? &submodule_fetch_jobs_config : NULL;
@@ -2005,9 +2016,6 @@  int cmd_fetch(int argc, const char **argv, const char *prefix)
 		fetch_config_from_gitmodules(sfjc, rs);
 	}
 
-	if (negotiate_only && !negotiation_tip.nr)
-		die(_("--negotiate-only needs one or more --negotiate-tip=*"));
-
 	if (deepen_relative) {
 		if (deepen_relative < 0)
 			die(_("Negative depth in --deepen is not supported"));
@@ -2112,6 +2120,12 @@  int cmd_fetch(int argc, const char **argv, const char *prefix)
 		result = fetch_multiple(&list, max_children);
 	}
 
+	string_list_clear(&list, 0);
+
+	/* skip irrelevant tasks if objects were not fetched  */
+	if (negotiate_only)
+		return result;
+
 	if (!result && (recurse_submodules != RECURSE_SUBMODULES_OFF)) {
 		struct strvec options = STRVEC_INIT;
 		int max_children = max_jobs;
@@ -2132,8 +2146,6 @@  int cmd_fetch(int argc, const char **argv, const char *prefix)
 		strvec_clear(&options);
 	}
 
-	string_list_clear(&list, 0);
-
 	prepare_repo_settings(the_repository);
 	if (fetch_write_commit_graph > 0 ||
 	    (fetch_write_commit_graph < 0 &&
diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh
index 8212ca56dc..732031085e 100755
--- a/t/t5516-fetch-push.sh
+++ b/t/t5516-fetch-push.sh
@@ -229,6 +229,18 @@  test_expect_success 'push with negotiation proceeds anyway even if negotiation f
 	test_i18ngrep "push negotiation failed" err
 '
 
+test_expect_success 'push with negotiation does not attempt to fetch submodules' '
+	mk_empty submodule_upstream &&
+	test_commit -C submodule_upstream submodule_commit &&
+	git submodule add ./submodule_upstream submodule &&
+	mk_empty testrepo &&
+	git push testrepo $the_first_commit:refs/remotes/origin/first_commit &&
+	test_commit -C testrepo unrelated_commit &&
+	git -C testrepo config receive.hideRefs refs/remotes/origin/first_commit &&
+	git -c submodule.recurse=true -c protocol.version=2 -c push.negotiate=1 push testrepo refs/heads/main:refs/remotes/origin/main 2>err &&
+	! grep "Fetching submodule" err
+'
+
 test_expect_success 'push without wildcard' '
 	mk_empty testrepo &&