mbox series

[v7,0/3] object checking related additions and fixes for bundles in fetches

Message ID pull.1730.v7.git.1718632535.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series object checking related additions and fixes for bundles in fetches | expand

Message

Philippe Blain via GitGitGadget June 17, 2024, 1:55 p.m. UTC
While attempting to fix a reference negotiation bug in bundle-uri, we
identified that the fetch process lacks some crucial object validation
checks when processing bundles. The primary issues are:

 1. In the bundle-uri scenario, object IDs were not validated before writing
    bundle references. This was the root cause of the original negotiation
    bug in bundle-uri and could lead to potential repository corruption.
 2. The existing "fetch.fsckObjects" and "transfer.fsckObjects"
    configurations were not applied when directly fetching bundles or
    fetching with bundle-uri enabled. In fact, there were no object
    validation supports for unbundle.

The first patch addresses the bundle-uri negotiation issue by removing the
REF_SKIP_OID_VERIFICATION flag when writing bundle references.

Patches 2 through 3 extend verify_bundle_flags for bundle.c:unbundle to add
support for object validation (fsck) in fetch scenarios, mainly following
the suggestions from Junio and Patrick on the mailing list.

Xing Xin (3):
  bundle-uri: verify oid before writing refs
  fetch-pack: expose fsckObjects configuration logic
  unbundle: extend object verification for fetches

 bundle-uri.c                |   6 +-
 bundle.c                    |   3 +
 bundle.h                    |   1 +
 fetch-pack.c                |  17 ++--
 fetch-pack.h                |   5 +
 t/t5558-clone-bundle-uri.sh | 181 +++++++++++++++++++++++++++++++++++-
 t/t5607-clone-bundle.sh     |  33 +++++++
 transport.c                 |   3 +-
 8 files changed, 235 insertions(+), 14 deletions(-)


base-commit: b9cfe4845cb2562584837bc0101c0ab76490a239
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v7
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v7
Pull-Request: https://github.com/gitgitgadget/git/pull/1730

Range-diff vs v6:

 1:  e958a3ab20c ! 1:  fc9f44fda00 bundle-uri: verify oid before writing refs
     @@ Commit message
          be found for negotiation because it exists in "incr.pack", which is not
          included in `packed_git`.
      
     -    This commit fixes the bug by removing `REF_SKIP_OID_VERIFICATION` flag
     -    when writing bundle refs. When `refs.c:refs_update_ref` is called to to
     -    write the corresponding bundle refs, it triggers
     -    `refs.c:ref_transaction_commit`.  This, in turn, invokes
     -    `refs.c:ref_transaction_prepare`, which calls `transaction_prepare` of
     -    the refs storage backend. For files backend, this function is
     -    `files-backend.c:files_transaction_prepare`, and for reftable backend,
     -    it is `reftable-backend.c:reftable_be_transaction_prepare`. Both
     -    functions eventually call `object.c:parse_object`, which can invoke
     +    Fix the bug by removing `REF_SKIP_OID_VERIFICATION` flag when writing
     +    bundle refs. When `refs.c:refs_update_ref` is called to write the
     +    corresponding bundle refs, it triggers `refs.c:ref_transaction_commit`.
     +    This, in turn, invokes `refs.c:ref_transaction_prepare`, which calls
     +    `transaction_prepare` of the refs storage backend. For files backend, it
     +    is `files-backend.c:files_transaction_prepare`, and for reftable
     +    backend, it is `reftable-backend.c:reftable_be_transaction_prepare`.
     +    Both functions eventually call `object.c:parse_object`, which can invoke
          `packfile.c:reprepare_packed_git` to refresh `packed_git`. This ensures
          that bundle refs point to valid objects and that all tips from bundle
          refs are correctly parsed during subsequent negotiations.
      
     -    A test has been added to demonstrate that bundles with incorrect
     -    headers, where refs point to non-existent objects, do not result in any
     -    bundle refs being created in the repository. Additionally, a set of
     -    negotiation-related tests for fetching with bundle-uri has been
     -    included.
     +    A set of negotiation-related tests for cloning with bundle-uri has been
     +    included to demonstrate that downloaded bundles are utilized to
     +    accelerate fetching.
     +
     +    Additionally, another test has been added to show that bundles with
     +    incorrect headers, where refs point to non-existent objects, do not
     +    result in any bundle refs being created in the repository.
      
          Reviewed-by: Karthik Nayak <karthik.188@gmail.com>
          Reviewed-by: Patrick Steinhardt <ps@pks.im>
     @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *fi
       	bundle_header_release(&header);
      
       ## t/t5558-clone-bundle-uri.sh ##
     +@@
     + test_description='test fetching bundles with --bundle-uri'
     + 
     + . ./test-lib.sh
     ++. "$TEST_DIRECTORY"/lib-bundle.sh
     + 
     + test_expect_success 'fail to clone from non-existent file' '
     + 	test_when_finished rm -rf test &&
      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'fail to clone from non-bundle file' '
       
       test_expect_success 'create bundle' '
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'fail to clone from non-bundle
      +		git bundle create B.bundle topic &&
      +
      +		# Create a bundle with reference pointing to non-existent object.
     -+		sed "s/$(git rev-parse A)/$(git rev-parse B)/" <A.bundle >bad-header.bundle
     ++		sed -e "/^$/q" -e "s/$(git rev-parse A) /$(git rev-parse B) /" \
     ++			<A.bundle >bad-header.bundle &&
     ++		convert_bundle_to_pack \
     ++			<A.bundle >>bad-header.bundle
      +	)
       '
       
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone with path bundle' '
       '
       
      +test_expect_success 'clone with bundle that has bad header' '
     ++	# Write bundle ref fails, but clone can still proceed.
      +	git clone --bundle-uri="clone-from/bad-header.bundle" \
      +		clone-from clone-bad-header 2>err &&
     -+	# Write bundle ref fails, but clone can still proceed.
      +	commit_b=$(git -C clone-from rev-parse B) &&
      +	test_grep "trying to write ref '\''refs/bundles/topic'\'' with nonexistent object $commit_b" err &&
      +	git -C clone-bad-header for-each-ref --format="%(refname)" >refs &&
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (file, any m
       	! grep "refs/bundles/" refs
       '
       
     -+#########################################################################
     -+# Clone negotiation related tests begin here
     -+
      +test_expect_success 'negotiation: bundle with part of wanted commits' '
     -+	test_when_finished rm -rf trace*.txt &&
     ++	test_when_finished "rm -f trace*.txt" &&
      +	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
      +	git clone --no-local --bundle-uri="clone-from/A.bundle" \
      +		clone-from nego-bundle-part &&
      +	git -C nego-bundle-part for-each-ref --format="%(refname)" >refs &&
      +	grep "refs/bundles/" refs >actual &&
     -+	cat >expect <<-\EOF &&
     -+	refs/bundles/topic
     -+	EOF
     ++	test_write_lines refs/bundles/topic >expect &&
      +	test_cmp expect actual &&
      +	# Ensure that refs/bundles/topic are sent as "have".
     -+	grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt
     ++	test_grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt
      +'
      +
      +test_expect_success 'negotiation: bundle with all wanted commits' '
     -+	test_when_finished rm -rf trace*.txt &&
     ++	test_when_finished "rm -f trace*.txt" &&
      +	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
      +	git clone --no-local --single-branch --branch=topic --no-tags \
      +		--bundle-uri="clone-from/B.bundle" \
      +		clone-from nego-bundle-all &&
      +	git -C nego-bundle-all for-each-ref --format="%(refname)" >refs &&
      +	grep "refs/bundles/" refs >actual &&
     -+	cat >expect <<-\EOF &&
     -+	refs/bundles/topic
     -+	EOF
     ++	test_write_lines refs/bundles/topic >expect &&
      +	test_cmp expect actual &&
      +	# We already have all needed commits so no "want" needed.
      +	! grep "clone> want " trace-packet.txt
      +'
      +
      +test_expect_success 'negotiation: bundle list (no heuristic)' '
     -+	test_when_finished rm -f trace*.txt &&
     ++	test_when_finished "rm -f trace*.txt" &&
      +	cat >bundle-list <<-EOF &&
      +	[bundle]
      +		version = 1
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (file, any m
      +	refs/bundles/left
      +	EOF
      +	test_cmp expect actual &&
     -+	grep "clone> have $(git -C nego-bundle-list-no-heuristic rev-parse refs/bundles/left)" trace-packet.txt
     ++	test_grep "clone> have $(git -C nego-bundle-list-no-heuristic rev-parse refs/bundles/left)" trace-packet.txt
      +'
      +
      +test_expect_success 'negotiation: bundle list (creationToken)' '
     -+	test_when_finished rm -f trace*.txt &&
     ++	test_when_finished "rm -f trace*.txt" &&
      +	cat >bundle-list <<-EOF &&
      +	[bundle]
      +		version = 1
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (file, any m
      +	refs/bundles/left
      +	EOF
      +	test_cmp expect actual &&
     -+	grep "clone> have $(git -C nego-bundle-list-heuristic rev-parse refs/bundles/left)" trace-packet.txt
     ++	test_grep "clone> have $(git -C nego-bundle-list-heuristic rev-parse refs/bundles/left)" trace-packet.txt
      +'
      +
      +test_expect_success 'negotiation: bundle list with all wanted commits' '
     -+	test_when_finished rm -f trace*.txt &&
     ++	test_when_finished "rm -f trace*.txt" &&
      +	cat >bundle-list <<-EOF &&
      +	[bundle]
      +		version = 1
 2:  d21c236b8de = 2:  3dc0d9dd22f fetch-pack: expose fsckObjects configuration logic
 3:  53395e8c08a ! 3:  2f15099bbb9 unbundle: support object verification for fetches
     @@ Metadata
      Author: Xing Xin <xingxin.xx@bytedance.com>
      
       ## Commit message ##
     -    unbundle: support object verification for fetches
     +    unbundle: extend object verification for fetches
      
     -    This commit extends object verification support for fetches in
     -    `bundle.c:unbundle` by adding the `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH`
     -    option to `verify_bundle_flags`. When this option is enabled,
     -    `bundle.c:unbundle` invokes `fetch-pack.c:fetch_pack_fsck_objects` to
     -    determine whether to append the "--fsck-objects" flag to
     -    "git-index-pack".
     +    The existing fetch.fsckObjects and transfer.fsckObjects configurations
     +    were not fully applied to bundle-involved fetches, including direct
     +    bundle fetches and bundle-uri enabled fetches. Furthermore, there was no
     +    object verification support for unbundle.
      
     -    `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH` is now passed to `unbundle` in the
     -    fetching process, including:
     +    This commit extends object verification support in `bundle.c:unbundle`
     +    by adding the `VERIFY_BUNDLE_FSCK` option to `verify_bundle_flags`. When
     +    this option is enabled, we append the `--fsck-objects` flag to
     +    `git-index-pack`.
     +
     +    The `VERIFY_BUNDLE_FSCK` option is now used by bundle-involved fetches,
     +    where we use `fetch-pack.c:fetch_pack_fsck_objects` to determine whether
     +    to enable this option for `bundle.c:unbundle`, specifically in:
      
          - `transport.c:fetch_refs_from_bundle` for direct bundle fetches.
          - `bundle-uri.c:unbundle_from_file` for bundle-uri enabled fetches.
      
          This addition ensures a consistent logic for object verification during
     -    fetch operations. Tests have been added to confirm functionality in the
     -    scenarios mentioned above.
     +    fetches. Tests have been added to confirm functionality in the scenarios
     +    mentioned above.
      
          Reviewed-by: Patrick Steinhardt <ps@pks.im>
          Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
      
       ## bundle-uri.c ##
     +@@
     + #include "hashmap.h"
     + #include "pkt-line.h"
     + #include "config.h"
     ++#include "fetch-pack.h"
     + #include "remote.h"
     + 
     + static struct {
      @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *file)
       	 * the prerequisite commits.
       	 */
       	if ((result = unbundle(r, &header, bundle_fd, NULL,
      -			       VERIFY_BUNDLE_QUIET)))
     -+			       VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)))
     ++			       VERIFY_BUNDLE_QUIET | (fetch_pack_fsck_objects() ? VERIFY_BUNDLE_FSCK : 0))))
       		return 1;
       
       	/*
      
       ## bundle.c ##
     -@@
     - #include "list-objects-filter-options.h"
     - #include "connected.h"
     - #include "write-or-die.h"
     -+#include "fetch-pack.h"
     - 
     - static const char v2_bundle_signature[] = "# v2 git bundle\n";
     - static const char v3_bundle_signature[] = "# v3 git bundle\n";
      @@ bundle.c: int unbundle(struct repository *r, struct bundle_header *header,
       	if (header->filter.choice)
       		strvec_push(&ip.args, "--promisor=from-bundle");
       
     -+	if (flags & VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)
     -+		if (fetch_pack_fsck_objects())
     -+			strvec_push(&ip.args, "--fsck-objects");
     ++	if (flags & VERIFY_BUNDLE_FSCK)
     ++		strvec_push(&ip.args, "--fsck-objects");
      +
       	if (extra_index_pack_args) {
       		strvec_pushv(&ip.args, extra_index_pack_args->v);
     @@ bundle.h: int create_bundle(struct repository *r, const char *path,
       enum verify_bundle_flags {
       	VERIFY_BUNDLE_VERBOSE = (1 << 0),
       	VERIFY_BUNDLE_QUIET = (1 << 1),
     -+	VERIFY_BUNDLE_FSCK_FOLLOW_FETCH = (1 << 2),
     ++	VERIFY_BUNDLE_FSCK = (1 << 2),
       };
       
       int verify_bundle(struct repository *r, struct bundle_header *header,
      
       ## t/t5558-clone-bundle-uri.sh ##
      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'create bundle' '
     - 		git bundle create B.bundle topic &&
     - 
     - 		# Create a bundle with reference pointing to non-existent object.
     --		sed "s/$(git rev-parse A)/$(git rev-parse B)/" <A.bundle >bad-header.bundle
     -+		sed "s/$(git rev-parse A)/$(git rev-parse B)/" <A.bundle >bad-header.bundle &&
     + 		sed -e "/^$/q" -e "s/$(git rev-parse A) /$(git rev-parse B) /" \
     + 			<A.bundle >bad-header.bundle &&
     + 		convert_bundle_to_pack \
     +-			<A.bundle >>bad-header.bundle
     ++			<A.bundle >>bad-header.bundle &&
      +
      +		cat >data <<-EOF &&
      +		tree $(git rev-parse HEAD^{tree})
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone with bundle that has bad
      +		clone-from clone-bad-object-no-fsck &&
      +	git -C clone-bad-object-no-fsck for-each-ref --format="%(refname)" >refs &&
      +	grep "refs/bundles/" refs >actual &&
     -+	cat >expect <<-\EOF &&
     -+	refs/bundles/bad
     -+	EOF
     ++	test_write_lines refs/bundles/bad >expect &&
      +	test_cmp expect actual &&
      +
      +	# Unbundle fails with fsckObjects set true, but clone can still proceed.
     @@ transport.c: static int fetch_refs_from_bundle(struct transport *transport,
       		get_refs_from_bundle_inner(transport);
       	ret = unbundle(the_repository, &data->header, data->fd,
      -		       &extra_index_pack_args, 0);
     -+		       &extra_index_pack_args, VERIFY_BUNDLE_FSCK_FOLLOW_FETCH);
     ++		       &extra_index_pack_args,
     ++		       fetch_pack_fsck_objects() ? VERIFY_BUNDLE_FSCK : 0);
       	transport->hash_algo = data->header.hash_algo;
       	return ret;
       }