diff mbox series

fetch: in partial clone, check presence of targets

Message ID 20180920184843.20898-1-jonathantanmy@google.com (mailing list archive)
State New, archived
Headers show
Series fetch: in partial clone, check presence of targets | expand

Commit Message

Jonathan Tan Sept. 20, 2018, 6:48 p.m. UTC
When fetching an object that is known as a promisor object to the local
repository, the connectivity check in quickfetch() in builtin/fetch.c
succeeds, causing object transfer to be bypassed. However, this should
not happen if that object is merely promised and not actually present.

Because this happens, when a user invokes "git fetch origin <sha-1>" on
the command-line, the <sha-1> object may not actually be fetched even
though the command returns an exit code of 0. This is a similar issue
(but with a different cause) to the one fixed by a0c9016abd
("upload-pack: send refs' objects despite "filter"", 2018-07-09).

Therefore, update quickfetch() to also directly check for the presence
of all objects to be fetched.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
In this patch, I have striven to avoid piping from Git commands that may
fail, following the guidelines in [1].

[1] https://public-inbox.org/git/c625bfe2205d51b3158ef71e4bf472708642c146.1537223021.git.matvore@google.com/
---
 builtin/fetch.c          | 19 +++++++++++++++++++
 t/t5616-partial-clone.sh | 17 +++++++++++++++++
 2 files changed, 36 insertions(+)

Comments

Junio C Hamano Sept. 20, 2018, 8:50 p.m. UTC | #1
Jonathan Tan <jonathantanmy@google.com> writes:

> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 61bec5d21..e9640fe5a 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -938,6 +938,25 @@ static int quickfetch(struct ref *ref_map)
>  	 */
>  	if (deepen)
>  		return -1;
> +
> +	if (repository_format_partial_clone) {
> +		/*
> +		 * For the purposes of the connectivity check,
> +		 * check_connected() considers all objects promised by
> +		 * promisor objects as existing, which means that the
> +		 * connectivity check would pass even if a target object
> +		 * in rm is merely promised and not present. When
> +		 * fetching objects, we need all of them to be present
> +		 * (in addition to having correct connectivity), so
> +		 * check them directly.
> +		 */
> +		struct ref *r;
> +		for (r = rm; r; r = r->next) {
> +			if (!has_object_file(&r->old_oid))
> +				return -1;
> +		}

Because check_connected() lies in the presense of "promisor", we can
defeat it this way, which makes sense.  

I wonder if it makes sense to do this check unconditionally, even
when we are in a fully functioning clone.  The check is quite cheap
and can reject a ref_map that has an object we do not know about,
without check_connected() having to ask the rev-list.

> +	}
> +
>  	opt.quiet = 1;
>  	return check_connected(iterate_ref_map, &rm, &opt);
>  }
> diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
> index bbbe7537d..359d27d02 100755
> --- a/t/t5616-partial-clone.sh
> +++ b/t/t5616-partial-clone.sh
> @@ -170,6 +170,23 @@ test_expect_success 'partial clone fetches blobs pointed to by refs even if norm
>  	git -C dst fsck
>  '
>  
> +test_expect_success 'fetch what is specified on CLI even if already promised' '
> +	rm -rf src dst.git &&
> +	git init src &&
> +	test_commit -C src foo &&
> +	test_config -C src uploadpack.allowfilter 1 &&
> +	test_config -C src uploadpack.allowanysha1inwant 1 &&
> +
> +	git hash-object --stdin <src/foo.t >blob &&
> +
> +	git clone --bare --filter=blob:none "file://$(pwd)/src" dst.git &&
> +	git -C dst.git rev-list --objects --quiet --missing=print HEAD >missing_before &&
> +	grep "?$(cat blob)" missing_before &&
> +	git -C dst.git fetch origin $(cat blob) &&
> +	git -C dst.git rev-list --objects --quiet --missing=print HEAD >missing_after &&
> +	! grep "?$(cat blob)" missing_after
> +'
> +
>  . "$TEST_DIRECTORY"/lib-httpd.sh
>  start_httpd
Jonathan Tan Sept. 20, 2018, 10:10 p.m. UTC | #2
> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > +	if (repository_format_partial_clone) {
> > +		/*
> > +		 * For the purposes of the connectivity check,
> > +		 * check_connected() considers all objects promised by
> > +		 * promisor objects as existing, which means that the
> > +		 * connectivity check would pass even if a target object
> > +		 * in rm is merely promised and not present. When
> > +		 * fetching objects, we need all of them to be present
> > +		 * (in addition to having correct connectivity), so
> > +		 * check them directly.
> > +		 */
> > +		struct ref *r;
> > +		for (r = rm; r; r = r->next) {
> > +			if (!has_object_file(&r->old_oid))
> > +				return -1;
> > +		}
> 
> Because check_connected() lies in the presense of "promisor", we can
> defeat it this way, which makes sense.  
> 
> I wonder if it makes sense to do this check unconditionally, even
> when we are in a fully functioning clone.  The check is quite cheap
> and can reject a ref_map that has an object we do not know about,
> without check_connected() having to ask the rev-list.

It makes sense to me from a runtime point of view - the usual case, for
me, is when we're missing at least one object that we're fetching, and
doing the cheap check even allows us to skip the expensive check.

The hard part for me lies in how to communicate to future readers of the
code that they cannot remove this section to simplify the code. We would
need a more complicated comment, something like this:

 /*
  * Check if all wanted objects are present.
  *
  * If the local repository is a partial clone, check_connected() is not
  * sufficient - it would pass even if a wanted object is merely
  * promised and not present. This is because, for the purposes of the
  * connectivity check, check_connected() considers all objects promised
  * by promisor objects as existing.
  *
  * If the local repository is not a partial clone, check_connected() is
  * sufficient, but this loop allows us to avoid the expensive
  * check_connected() in the usual case that at least one wanted object
  * is missing.
  */
Junio C Hamano Sept. 20, 2018, 11:28 p.m. UTC | #3
Jonathan Tan <jonathantanmy@google.com> writes:

> The hard part for me lies in how to communicate to future readers of the
> code that they cannot remove this section to simplify the code. We would
> need a more complicated comment, something like this:

That suggests two things.

 - Perhaps quickfetch() is misnamed.  It is to ensure "these exist,
   and are 'connected'"; renaming the helper to convey that would be
   sufficient to prevent future readers from removing the "these
   exist" check from it.

 - Perhaps check_connected() is also misnamed, if the above "these
   exist, and are 'connected'" is not a sufficient warning against
   removal of the "these exist" test, perhaps "check_connected()" is
   not telling the readers that things that are 'connected' do not
   have to exist.  What does being 'connected' mean in the world
   with "promised" objects anyway?  The designer of the feature
   should probably have a concise and clear answer.

>  /*
>   * Check if all wanted objects are present.

Here 'wanted' means... the tip that was directly requested must
exist, and in addition, anything that is reachable from it must
either exist locally or available from the lazy-clone source?  But
that is not quite it. Your definition of 'present' is fuzzy and mean
two different things---for the wanted tips, they must exist.  For
the objects that are required for these wanted tips to be well
formed, they do not have to exist but it is OK for them to be merely
promised.

Perhaps the comment for the quickfetch() function itself should say

/*
 * Ensure that the requested tips exist locally, and anything that is
 * reachable from them either exist locally or promised to be available.
 */

Adding a similar comment to check_connected() function is left as an
exercise, but I suspect it would be the latter half of the above
sentence.

It may be worth renaming both functions for clarity, as I mentioned
already.
diff mbox series

Patch

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 61bec5d21..e9640fe5a 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -938,6 +938,25 @@  static int quickfetch(struct ref *ref_map)
 	 */
 	if (deepen)
 		return -1;
+
+	if (repository_format_partial_clone) {
+		/*
+		 * For the purposes of the connectivity check,
+		 * check_connected() considers all objects promised by
+		 * promisor objects as existing, which means that the
+		 * connectivity check would pass even if a target object
+		 * in rm is merely promised and not present. When
+		 * fetching objects, we need all of them to be present
+		 * (in addition to having correct connectivity), so
+		 * check them directly.
+		 */
+		struct ref *r;
+		for (r = rm; r; r = r->next) {
+			if (!has_object_file(&r->old_oid))
+				return -1;
+		}
+	}
+
 	opt.quiet = 1;
 	return check_connected(iterate_ref_map, &rm, &opt);
 }
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index bbbe7537d..359d27d02 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -170,6 +170,23 @@  test_expect_success 'partial clone fetches blobs pointed to by refs even if norm
 	git -C dst fsck
 '
 
+test_expect_success 'fetch what is specified on CLI even if already promised' '
+	rm -rf src dst.git &&
+	git init src &&
+	test_commit -C src foo &&
+	test_config -C src uploadpack.allowfilter 1 &&
+	test_config -C src uploadpack.allowanysha1inwant 1 &&
+
+	git hash-object --stdin <src/foo.t >blob &&
+
+	git clone --bare --filter=blob:none "file://$(pwd)/src" dst.git &&
+	git -C dst.git rev-list --objects --quiet --missing=print HEAD >missing_before &&
+	grep "?$(cat blob)" missing_before &&
+	git -C dst.git fetch origin $(cat blob) &&
+	git -C dst.git rev-list --objects --quiet --missing=print HEAD >missing_after &&
+	! grep "?$(cat blob)" missing_after
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd