mbox series

[v3,0/3] Maintenance: adapt custom refspecs

Message ID pull.924.v3.git.1618020225.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Maintenance: adapt custom refspecs | expand

Message

Philippe Blain via GitGitGadget April 10, 2021, 2:03 a.m. UTC
Tom Saeger rightly pointed out [1] that the prefetch task ignores custom
refspecs. This can lead to downloading more data than requested, and it
doesn't even help the future foreground fetches that use that custom
refspec.

[1]
https://lore.kernel.org/git/20210401184914.qmr7jhjbhp2mt3h6@dhcp-10-154-148-175.vpn.oracle.com/

This series fixes this problem by carefully replacing the start of each
refspec's destination with "refs/prefetch/". If the destination already
starts with "refs/", then that is replaced. Otherwise "refs/prefetch/" is
just prepended.

This happens inside of git fetch when a --prefetch option is given. This
allows us to maniuplate a struct refspec_item instead of a full refspec
string. It also simplifies our logic in testing the prefetch task.


Update in V3
============

 * The fix is almost completely rewritten as an update to 'git fetch'. See
   the new PATCH 2 for this update.

 * There was some discussion of rewriting test_subcommand, but that can be
   delayed until a proper solution is found to complications around softer
   matches.


Updates in V2
=============

Thanks for the close eye on this series. I appreciate the recommendations,
which I believe I have responded to them all:

 * Fixed typos.
 * Made refspec_item_format() re-entrant. Consumers must free the buffer.
 * Cleaned up style (quoting and tabbing).

Thanks, -Stolee

Derrick Stolee (3):
  maintenance: simplify prefetch logic
  fetch: add --prefetch option
  maintenance: use 'git fetch --prefetch'

 Documentation/fetch-options.txt   |  5 +++
 Documentation/git-maintenance.txt |  6 ++--
 builtin/fetch.c                   | 56 +++++++++++++++++++++++++++++++
 builtin/gc.c                      | 36 +++++---------------
 t/t5582-fetch-negative-refspec.sh | 30 +++++++++++++++++
 t/t7900-maintenance.sh            | 14 ++++----
 6 files changed, 109 insertions(+), 38 deletions(-)


base-commit: 89b43f80a514aee58b662ad606e6352e03eaeee4
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-924%2Fderrickstolee%2Fmaintenance%2Frefspec-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-924/derrickstolee/maintenance/refspec-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/924

Range-diff vs v2:

 1:  5aa0cb06c3f2 = 1:  4c0e983ba56f maintenance: simplify prefetch logic
 2:  d58a3e042ee8 < -:  ------------ test-lib: use exact match for test_subcommand
 3:  96388d949b98 < -:  ------------ refspec: output a refspec item
 4:  bf296282323a < -:  ------------ test-tool: test refspec input/output
 -:  ------------ > 2:  7f488eea6dbd fetch: add --prefetch option
 5:  9592224e3d42 ! 3:  ed055d772452 maintenance: allow custom refspecs during prefetch
     @@ Metadata
      Author: Derrick Stolee <dstolee@microsoft.com>
      
       ## Commit message ##
     -    maintenance: allow custom refspecs during prefetch
     +    maintenance: use 'git fetch --prefetch'
      
     -    The prefetch task previously used the default refspec source plus a
     -    custom refspec destination to avoid colliding with remote refs:
     +    The 'prefetch' maintenance task previously forced the following refspec
     +    for each remote:
      
                  +refs/heads/*:refs/prefetch/<remote>/*
      
     -    However, some users customize their refspec to reduce how much data they
     -    download from specific remotes. This can involve restrictive patterns
     -    for fetching or negative patterns to avoid downloading some refs.
     +    If a user has specified a more strict refspec for the remote, then this
     +    prefetch task downloads more objects than necessary.
      
     -    Modify fetch_remote() to iterate over the remote's refspec list and
     -    translate that into the appropriate prefetch scenario. Specifically,
     -    re-parse the raw form of the refspec into a new 'struct refspec' and
     -    modify the 'dst' member to replace a leading "refs/" substring with
     -    "refs/prefetch/", or prepend "refs/prefetch/" to 'dst' otherwise.
     -    Negative refspecs do not have a 'dst' so they can be transferred to the
     -    'git fetch' command unmodified.
     -
     -    This prefix change provides the benefit of keeping whatever collisions
     -    may exist in the custom refspecs, if that is a desirable outcome.
     -
     -    This changes the names of the refs that would be fetched by the default
     -    refspec. Instead of "refs/prefetch/<remote>/<branch>" they will now go
     -    to "refs/prefetch/remotes/<remote>/<branch>". While this is a change, it
     -    is not a seriously breaking one: these refs are intended to be hidden
     -    and not used.
     +    The previous change introduced the '--prefetch' option to 'git fetch'
     +    which manipulates the remote's refspec to place all resulting refs into
     +    refs/prefetch/, with further partitioning based on the destinations of
     +    those refspecs.
      
          Update the documentation to be more generic about the destination refs.
     -    Do not mention custom refpecs explicitly, as that does not need to be
     +    Do not mention custom refspecs explicitly, as that does not need to be
          highlighted in this documentation. The important part of placing refs in
     -    refs/prefetch remains.
     +    refs/prefetch/ remains.
      
          Reported-by: Tom Saeger <tom.saeger@oracle.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## Documentation/git-maintenance.txt ##
     -@@ Documentation/git-maintenance.txt: prefetch::
     +@@ Documentation/git-maintenance.txt: commit-graph::
     + prefetch::
     + 	The `prefetch` task updates the object directory with the latest
       	objects from all registered remotes. For each remote, a `git fetch`
     - 	command is run. The refmap is custom to avoid updating local or remote
     - 	branches (those in `refs/heads` or `refs/remotes`). Instead, the
     +-	command is run. The refmap is custom to avoid updating local or remote
     +-	branches (those in `refs/heads` or `refs/remotes`). Instead, the
      -	remote refs are stored in `refs/prefetch/<remote>/`. Also, tags are
      -	not updated.
     -+	refs are stored in `refs/prefetch/`. Also, tags are not updated.
     ++	command is run. The configured refspec is modified to place all
     ++	requested refs within `refs/prefetch/`. Also, tags are not updated.
       +
       This is done to avoid disrupting the remote-tracking branches. The end users
       expect these refs to stay unmoved unless they initiate a fetch.  With prefetch
      
       ## builtin/gc.c ##
     -@@
     - #include "remote.h"
     - #include "object-store.h"
     - #include "exec-cmd.h"
     -+#include "refspec.h"
     - 
     - #define FAILED_RUN "failed to run %s"
     - 
      @@ builtin/gc.c: static int fetch_remote(struct remote *remote, void *cbdata)
     - {
     - 	struct maintenance_run_opts *opts = cbdata;
       	struct child_process child = CHILD_PROCESS_INIT;
     -+	int i;
       
       	child.git_cmd = 1;
     - 	strvec_pushl(&child.args, "fetch", remote->name, "--prune", "--no-tags",
     -@@ builtin/gc.c: static int fetch_remote(struct remote *remote, void *cbdata)
     +-	strvec_pushl(&child.args, "fetch", remote->name, "--prune", "--no-tags",
     ++	strvec_pushl(&child.args, "fetch", remote->name,
     ++		     "--prefetch", "--prune", "--no-tags",
     + 		     "--no-write-fetch-head", "--recurse-submodules=no",
     +-		     "--refmap=", NULL);
     ++		     NULL);
     + 
       	if (opts->quiet)
       		strvec_push(&child.args, "--quiet");
       
      -	strvec_pushf(&child.args, "+refs/heads/*:refs/prefetch/%s/*", remote->name);
     -+	for (i = 0; i < remote->fetch.nr; i++) {
     -+		struct refspec_item replace;
     -+		struct refspec_item *rsi = &remote->fetch.items[i];
     -+		struct strbuf new_dst = STRBUF_INIT;
     -+		size_t ignore_len = 0;
     -+		char *replace_string;
     -+
     -+		if (rsi->negative) {
     -+			strvec_push(&child.args, remote->fetch.raw[i]);
     -+			continue;
     -+		}
     -+
     -+		refspec_item_init(&replace, remote->fetch.raw[i], 1);
     -+
     -+		/*
     -+		 * If a refspec dst starts with "refs/" at the start,
     -+		 * then we will replace "refs/" with "refs/prefetch/".
     -+		 * Otherwise, we will prepend the dst string with
     -+		 * "refs/prefetch/".
     -+		 */
     -+		if (!strncmp(replace.dst, "refs/", 5))
     -+			ignore_len = 5;
     -+
     -+		strbuf_addstr(&new_dst, "refs/prefetch/");
     -+		strbuf_addstr(&new_dst, replace.dst + ignore_len);
     -+		free(replace.dst);
     -+		replace.dst = strbuf_detach(&new_dst, NULL);
     -+
     -+		replace_string = refspec_item_format(&replace);
     -+		strvec_push(&child.args, replace_string);
     -+		free(replace_string);
     -+
     -+		refspec_item_clear(&replace);
     -+	}
     - 
     +-
       	return !!run_command(&child);
       }
     + 
      
       ## t/t7900-maintenance.sh ##
      @@ t/t7900-maintenance.sh: test_expect_success 'prefetch multiple remotes' '
     + 	test_commit -C clone1 one &&
       	test_commit -C clone2 two &&
       	GIT_TRACE2_EVENT="$(pwd)/run-prefetch.txt" git maintenance run --task=prefetch 2>/dev/null &&
     - 	fetchargs="--prune --no-tags --no-write-fetch-head --recurse-submodules=no --refmap= --quiet" &&
     --	test_subcommand git fetch remote1 $fetchargs "+refs/heads/*:refs/prefetch/remote1/*" <run-prefetch.txt &&
     --	test_subcommand git fetch remote2 $fetchargs "+refs/heads/*:refs/prefetch/remote2/*" <run-prefetch.txt &&
     -+	test_subcommand git fetch remote1 $fetchargs "+refs/heads/*:refs/prefetch/remotes/remote1/*" <run-prefetch.txt &&
     -+	test_subcommand git fetch remote2 $fetchargs "+refs/heads/*:refs/prefetch/remotes/remote2/*" <run-prefetch.txt &&
     +-	fetchargs="--prune --no-tags --no-write-fetch-head --recurse-submodules=no --refmap= --quiet" &&
     +-	test_subcommand git fetch remote1 $fetchargs +refs/heads/\\*:refs/prefetch/remote1/\\* <run-prefetch.txt &&
     +-	test_subcommand git fetch remote2 $fetchargs +refs/heads/\\*:refs/prefetch/remote2/\\* <run-prefetch.txt &&
     ++	fetchargs="--prefetch --prune --no-tags --no-write-fetch-head --recurse-submodules=no --quiet" &&
     ++	test_subcommand git fetch remote1 $fetchargs <run-prefetch.txt &&
     ++	test_subcommand git fetch remote2 $fetchargs <run-prefetch.txt &&
       	test_path_is_missing .git/refs/remotes &&
      -	git log prefetch/remote1/one &&
      -	git log prefetch/remote2/two &&
     @@ t/t7900-maintenance.sh: test_expect_success 'prefetch multiple remotes' '
       
       	test_cmp_config refs/prefetch/ log.excludedecoration &&
       	git log --oneline --decorate --all >log &&
     - 	! grep "prefetch" log
     - '
     - 
     -+test_expect_success 'prefetch custom refspecs' '
     -+	git -C clone1 branch -f special/fetched HEAD &&
     -+	git -C clone1 branch -f special/secret/not-fetched HEAD &&
     -+
     -+	# create multiple refspecs for remote1
     -+	git config --add remote.remote1.fetch "+refs/heads/special/fetched:refs/heads/fetched" &&
     -+	git config --add remote.remote1.fetch "^refs/heads/special/secret/not-fetched" &&
     -+
     -+	GIT_TRACE2_EVENT="$(pwd)/prefetch-refspec.txt" git maintenance run --task=prefetch 2>/dev/null &&
     -+
     -+	fetchargs="--prune --no-tags --no-write-fetch-head --recurse-submodules=no --refmap= --quiet" &&
     -+
     -+	# skips second refspec because it is not a pattern type
     -+	rs1="+refs/heads/*:refs/prefetch/remotes/remote1/*" &&
     -+	rs2="+refs/heads/special/fetched:refs/prefetch/heads/fetched" &&
     -+	rs3="^refs/heads/special/secret/not-fetched" &&
     -+
     -+	test_subcommand git fetch remote1 $fetchargs "$rs1" "$rs2" "$rs3" <prefetch-refspec.txt &&
     -+	test_subcommand git fetch remote2 $fetchargs "+refs/heads/*:refs/prefetch/remotes/remote2/*" <prefetch-refspec.txt &&
     -+
     -+	# first refspec is overridden by second
     -+	test_must_fail git rev-parse refs/prefetch/special/fetched &&
     -+	git rev-parse refs/prefetch/heads/fetched &&
     -+
     -+	# possible incorrect places for the non-fetched ref
     -+	test_must_fail git rev-parse refs/prefetch/remotes/remote1/secret/not-fetched &&
     -+	test_must_fail git rev-parse refs/prefetch/remotes/remote1/not-fetched &&
     -+	test_must_fail git rev-parse refs/heads/secret/not-fetched &&
     -+	test_must_fail git rev-parse refs/heads/not-fetched
     -+'
     -+
     - test_expect_success 'prefetch and existing log.excludeDecoration values' '
     - 	git config --unset-all log.excludeDecoration &&
     - 	git config log.excludeDecoration refs/remotes/remote1/ &&

Comments

Junio C Hamano April 11, 2021, 1:35 a.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  * The fix is almost completely rewritten as an update to 'git fetch'. See
>    the new PATCH 2 for this update.

I do agree that it gives us the most flexibility there with nice
encapsulation.  Nobody other than "git fetch" needs to know how it
computes which remote refs are fetched given the real pathspec, and
the only thing the caller with "--prefetch" is interested in is that
the prefetch operation would not contaminate the remote-tracking
refs.

Great idea.  I wish I were the one who thought of it first ;-)
Tom Saeger April 12, 2021, 4:48 p.m. UTC | #2
On Sat, Apr 10, 2021 at 06:35:40PM -0700, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
> >  * The fix is almost completely rewritten as an update to 'git fetch'. See
> >    the new PATCH 2 for this update.
> 
> I do agree that it gives us the most flexibility there with nice
> encapsulation.  Nobody other than "git fetch" needs to know how it
> computes which remote refs are fetched given the real pathspec, and
> the only thing the caller with "--prefetch" is interested in is that
> the prefetch operation would not contaminate the remote-tracking
> refs.
> 
> Great idea.  I wish I were the one who thought of it first ;-)

Yes - this simplifies things greatly!

I do have one case that fails prefetch though.
It's a case where all the remote's fetch configs are filtered out.

Example:

	[remote "pr-924"]
	    url = https://github.com/gitgitgadget/git
	    fetch = +refs/tags/pr-924/derrickstolee/maintenance/refspec-v3
	    skipfetchall = true
	    tagopt = --no-tags


In this case, running `git fetch pr-924` will fetch and update
FETCH_HEAD, but running with maintenance prefetch task results in:

fatal: Couldn't find remote ref HEAD
error: failed to prefetch remotes
error: task 'prefetch' failed

I tracked this down a bit, but don't have a suggestion how to fix it.

builtin/fetch.c `get_ref_map` makes two calls to `filter_prefetch_refspec`,
one for 'rs' and another for 'remote->fetch'.

`filter_prefetch_refspec` works and filters out the above fetch config.
This correctly yields condition
`rs->nr == 0` and `remote->fetch.nr == 0`

Later a call is made to `get_remote_ref(remote_refs, "HEAD")` which
fails, leading to `fatal: Couldn't find remote ref HEAD`

Should this be expected, or should this now be special-cased for 'prefetch'
somehow?

Regards,

--Tom
Tom Saeger April 12, 2021, 5:24 p.m. UTC | #3
On Mon, Apr 12, 2021 at 11:48:09AM -0500, Tom Saeger wrote:
> On Sat, Apr 10, 2021 at 06:35:40PM -0700, Junio C Hamano wrote:
> > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> > 
> > >  * The fix is almost completely rewritten as an update to 'git fetch'. See
> > >    the new PATCH 2 for this update.
> > 
> > I do agree that it gives us the most flexibility there with nice
> > encapsulation.  Nobody other than "git fetch" needs to know how it
> > computes which remote refs are fetched given the real pathspec, and
> > the only thing the caller with "--prefetch" is interested in is that
> > the prefetch operation would not contaminate the remote-tracking
> > refs.
> > 
> > Great idea.  I wish I were the one who thought of it first ;-)
> 
> Yes - this simplifies things greatly!
> 
> I do have one case that fails prefetch though.
> It's a case where all the remote's fetch configs are filtered out.
> 
> Example:
> 
> 	[remote "pr-924"]
> 	    url = https://github.com/gitgitgadget/git
> 	    fetch = +refs/tags/pr-924/derrickstolee/maintenance/refspec-v3
> 	    skipfetchall = true
> 	    tagopt = --no-tags
> 
> 
> In this case, running `git fetch pr-924` will fetch and update
> FETCH_HEAD, but running with maintenance prefetch task results in:
> 
> fatal: Couldn't find remote ref HEAD
> error: failed to prefetch remotes
> error: task 'prefetch' failed
> 
> I tracked this down a bit, but don't have a suggestion how to fix it.

This ugly hack fixes this failure.  I'll keep staring at it.


diff --git a/builtin/fetch.c b/builtin/fetch.c
index 30856b442b79..6489ce7d8d3b 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -508,6 +508,9 @@ static struct ref *get_ref_map(struct remote *remote,
        if (remote)
                filter_prefetch_refspec(&remote->fetch);

+       if (prefetch && !rs->nr && remote && !remote->fetch.nr)
+               return NULL;
+
        if (rs->nr) {
                struct refspec *fetch_refspec;

--



> 
> builtin/fetch.c `get_ref_map` makes two calls to `filter_prefetch_refspec`,
> one for 'rs' and another for 'remote->fetch'.
> 
> `filter_prefetch_refspec` works and filters out the above fetch config.
> This correctly yields condition
> `rs->nr == 0` and `remote->fetch.nr == 0`
> 
> Later a call is made to `get_remote_ref(remote_refs, "HEAD")` which
> fails, leading to `fatal: Couldn't find remote ref HEAD`
> 
> Should this be expected, or should this now be special-cased for 'prefetch'
> somehow?
> 
> Regards,
> 
> --Tom
Tom Saeger April 12, 2021, 5:41 p.m. UTC | #4
On Mon, Apr 12, 2021 at 12:24:27PM -0500, Tom Saeger wrote:
> On Mon, Apr 12, 2021 at 11:48:09AM -0500, Tom Saeger wrote:
> > On Sat, Apr 10, 2021 at 06:35:40PM -0700, Junio C Hamano wrote:
> > > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> > > 
> > > >  * The fix is almost completely rewritten as an update to 'git fetch'. See
> > > >    the new PATCH 2 for this update.
> > > 
> > > I do agree that it gives us the most flexibility there with nice
> > > encapsulation.  Nobody other than "git fetch" needs to know how it
> > > computes which remote refs are fetched given the real pathspec, and
> > > the only thing the caller with "--prefetch" is interested in is that
> > > the prefetch operation would not contaminate the remote-tracking
> > > refs.
> > > 
> > > Great idea.  I wish I were the one who thought of it first ;-)
> > 
> > Yes - this simplifies things greatly!
> > 
> > I do have one case that fails prefetch though.
> > It's a case where all the remote's fetch configs are filtered out.
> > 
> > Example:
> > 
> > 	[remote "pr-924"]
> > 	    url = https://github.com/gitgitgadget/git
> > 	    fetch = +refs/tags/pr-924/derrickstolee/maintenance/refspec-v3
> > 	    skipfetchall = true
> > 	    tagopt = --no-tags
> > 
> > 
> > In this case, running `git fetch pr-924` will fetch and update
> > FETCH_HEAD, but running with maintenance prefetch task results in:
> > 
> > fatal: Couldn't find remote ref HEAD
> > error: failed to prefetch remotes
> > error: task 'prefetch' failed
> > 
> > I tracked this down a bit, but don't have a suggestion how to fix it.
> 
> This ugly hack fixes this failure.  I'll keep staring at it.
> 
> 
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 30856b442b79..6489ce7d8d3b 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -508,6 +508,9 @@ static struct ref *get_ref_map(struct remote *remote,
>         if (remote)
>                 filter_prefetch_refspec(&remote->fetch);
> 
> +       if (prefetch && !rs->nr && remote && !remote->fetch.nr)
> +               return NULL;
> +
>         if (rs->nr) {
>                 struct refspec *fetch_refspec;
> 
> --
> 

Less ugly fix...

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 30856b442b79..5fbffbd17d7d 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -576,6 +576,8 @@ static struct ref *get_ref_map(struct remote *remote,
                        if (has_merge &&
                            !strcmp(branch->remote_name, remote->name))
                                add_merge_config(&ref_map, remote_refs, branch, &tail);
+               } else if (prefetch) {
+                       ;
                } else {
                        ref_map = get_remote_ref(remote_refs, "HEAD");
                        if (!ref_map)
--

Other ideas?


> 
> 
> > 
> > builtin/fetch.c `get_ref_map` makes two calls to `filter_prefetch_refspec`,
> > one for 'rs' and another for 'remote->fetch'.
> > 
> > `filter_prefetch_refspec` works and filters out the above fetch config.
> > This correctly yields condition
> > `rs->nr == 0` and `remote->fetch.nr == 0`
> > 
> > Later a call is made to `get_remote_ref(remote_refs, "HEAD")` which
> > fails, leading to `fatal: Couldn't find remote ref HEAD`
> > 
> > Should this be expected, or should this now be special-cased for 'prefetch'
> > somehow?
> > 
> > Regards,
> > 
> > --Tom
Derrick Stolee April 12, 2021, 8:25 p.m. UTC | #5
On 4/12/21 1:41 PM, Tom Saeger wrote:> 
> Less ugly fix...
> 
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 30856b442b79..5fbffbd17d7d 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -576,6 +576,8 @@ static struct ref *get_ref_map(struct remote *remote,
>                         if (has_merge &&
>                             !strcmp(branch->remote_name, remote->name))
>                                 add_merge_config(&ref_map, remote_refs, branch, &tail);
> +               } else if (prefetch) {
> +                       ;
>                 } else {

I'll give this a try, but with "else if (!prefetch)" for the
last block,instead.

Thanks for your diligent testing! It's helping a lot.

Thanks,
-Stolee