mbox series

[v5,0/4] object checking related additions and fixes for bundles in fetches

Message ID pull.1730.v5.git.1718088126.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series object checking related additions and fixes for bundles in fetches | expand

Message

Philippe Blain via GitGitGadget June 11, 2024, 6:42 a.m. UTC
While attempting to fix a reference negotiation bug in bundle-uri, we
identified that the fetch process lacks some crucial object validation
checks when processing bundles. The primary issues are:

 1. In the bundle-uri scenario, object IDs were not validated before writing
    bundle references. This was the root cause of the original negotiation
    bug in bundle-uri and could lead to potential repository corruption.
 2. The existing "fetch.fsckObjects" and "transfer.fsckObjects"
    configurations were not applied when directly fetching bundles or
    fetching with bundle-uri enabled. In fact, there were no object
    validation supports for unbundle.

The first patch addresses the bundle-uri negotiation issue by removing the
REF_SKIP_OID_VERIFICATION flag when writing bundle references.

Patches 2 through 4 extend verify_bundle_flags for bundle.c:unbundle to add
support for object validation (fsck) in different scenarios, mainly
following the suggestions from Junio on the mailing list.

Xing Xin (4):
  bundle-uri: verify oid before writing refs
  fetch-pack: expose fsckObjects configuration logic
  unbundle: extend options to support object verification
  unbundle: use VERIFY_BUNDLE_FSCK_FOLLOW_FETCH for fetches

 bundle-uri.c                |   5 +-
 bundle.c                    |  10 ++
 bundle.h                    |   2 +
 fetch-pack.c                |  17 ++--
 fetch-pack.h                |   5 +
 t/t5558-clone-bundle-uri.sh | 186 +++++++++++++++++++++++++++++++++++-
 t/t5607-clone-bundle.sh     |  33 +++++++
 transport.c                 |   2 +-
 8 files changed, 246 insertions(+), 14 deletions(-)


base-commit: b9cfe4845cb2562584837bc0101c0ab76490a239
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/1730

Range-diff vs v4:

 1:  e958a3ab20c = 1:  e958a3ab20c bundle-uri: verify oid before writing refs
 3:  5ddc894c2c1 = 2:  d21c236b8de fetch-pack: expose fsckObjects configuration logic
 2:  beb70735811 ! 3:  0a18d7839be unbundle: extend verify_bundle_flags to support fsck-objects
     @@ Metadata
      Author: Xing Xin <xingxin.xx@bytedance.com>
      
       ## Commit message ##
     -    unbundle: extend verify_bundle_flags to support fsck-objects
     +    unbundle: extend options to support object verification
      
     -    This commit extends `verify_bundle_flags` by adding a new option
     -    `VERIFY_BUNDLE_FSCK_ALWAYS`, which enables checks for broken objects in
     -    `bundle.c:unbundle`. This option is now used as the default for fetches
     -    involving bundles, specifically by `transport.c:fetch_refs_from_bundle`
     -    for direct bundle fetches and by `bundle-uri.c:unbundle_from_file` for
     -    _bundle-uri_ enabled fetches.
     +    This commit extends object verification support in `bundle.c:unbundle`
     +    by adding two new options to `verify_bundle_flags`:
      
     -    Upcoming commits will introduce another option as a replacement that
     -    fits better with fetch operations. `VERIFY_BUNDLE_FSCK_ALWAYS` will be
     -    further used to add "--fsck-objects" support for "git bundle unbundle"
     -    and "git bundle verify".
     +    - `VERIFY_BUNDLE_FSCK_ALWAYS` explicitly enables checks for broken
     +      objects. It will be used to add "--fsck-objects" support for "git
     +      bundle unbundle" in a separate series.
     +    - `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH` is designed to be used during fetch
     +      operations, specifically for direct bundle fetches and _bundle-uri_
     +      enabled fetches. When enabled, `bundle.c:unbundle` invokes
     +      `fetch-pack.c:fetch_pack_fsck_objects` to determine whether to enable
     +      checks for broken objects. Passing this flag during fetching will be
     +      implemented in a subsequent commit.
     +
     +    Note that the option `VERIFY_BUNDLE_FSCK_ALWAYS` takes precedence over
     +    `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH`.
      
          Reviewed-by: Patrick Steinhardt <ps@pks.im>
          Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
      
     - ## bundle-uri.c ##
     -@@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *file)
     - 	 * the prerequisite commits.
     - 	 */
     - 	if ((result = unbundle(r, &header, bundle_fd, NULL,
     --			       VERIFY_BUNDLE_QUIET)))
     -+			       VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_ALWAYS)))
     - 		return 1;
     - 
     - 	/*
     -
       ## bundle.c ##
     +@@
     + #include "list-objects-filter-options.h"
     + #include "connected.h"
     + #include "write-or-die.h"
     ++#include "fetch-pack.h"
     + 
     + static const char v2_bundle_signature[] = "# v2 git bundle\n";
     + static const char v3_bundle_signature[] = "# v3 git bundle\n";
     +@@ bundle.c: int unbundle(struct repository *r, struct bundle_header *header,
     + 	     enum verify_bundle_flags flags)
     + {
     + 	struct child_process ip = CHILD_PROCESS_INIT;
     ++	int fsck_objects = 0;
     + 
     + 	if (verify_bundle(r, header, flags))
     + 		return -1;
      @@ bundle.c: int unbundle(struct repository *r, struct bundle_header *header,
       	if (header->filter.choice)
       		strvec_push(&ip.args, "--promisor=from-bundle");
       
      +	if (flags & VERIFY_BUNDLE_FSCK_ALWAYS)
     ++		fsck_objects = 1;
     ++	else if (flags & VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)
     ++		fsck_objects = fetch_pack_fsck_objects();
     ++
     ++	if (fsck_objects)
      +		strvec_push(&ip.args, "--fsck-objects");
      +
       	if (extra_index_pack_args) {
     @@ bundle.h: int create_bundle(struct repository *r, const char *path,
       	VERIFY_BUNDLE_VERBOSE = (1 << 0),
       	VERIFY_BUNDLE_QUIET = (1 << 1),
      +	VERIFY_BUNDLE_FSCK_ALWAYS = (1 << 2),
     ++	VERIFY_BUNDLE_FSCK_FOLLOW_FETCH = (1 << 3),
       };
       
       int verify_bundle(struct repository *r, struct bundle_header *header,
     -
     - ## transport.c ##
     -@@ transport.c: static int fetch_refs_from_bundle(struct transport *transport,
     - 	if (!data->get_refs_from_bundle_called)
     - 		get_refs_from_bundle_inner(transport);
     - 	ret = unbundle(the_repository, &data->header, data->fd,
     --		       &extra_index_pack_args, 0);
     -+		       &extra_index_pack_args, VERIFY_BUNDLE_FSCK_ALWAYS);
     - 	transport->hash_algo = data->header.hash_algo;
     - 	return ret;
     - }
 4:  68b9bca9f8b ! 4:  eb9f21f16b5 unbundle: introduce option VERIFY_BUNDLE_FSCK_FOLLOW_FETCH
     @@ Metadata
      Author: Xing Xin <xingxin.xx@bytedance.com>
      
       ## Commit message ##
     -    unbundle: introduce option VERIFY_BUNDLE_FSCK_FOLLOW_FETCH
     +    unbundle: use VERIFY_BUNDLE_FSCK_FOLLOW_FETCH for fetches
      
     -    This commit introduces a new option `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH` to
     -    `verify_bundle_flags`. In `bundle.c:unbundle`, this new option controls
     -    whether broken object checks should be enabled by invoking
     -    `fetch-pack.c:fetch_pack_fsck_objects`. Note that the option
     -    `VERIFY_BUNDLE_FSCK_ALWAYS` takes precedence over
     -    `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH`.
     -
     -    This flag is now used in the fetching process by:
     +    This commit passes `VERIFY_BUNDLE_FSCK_FOLLOW_FETCH` to `unbundle` in
     +    the fetching process, including:
      
          - `transport.c:fetch_refs_from_bundle` for direct bundle fetches.
          - `bundle-uri.c:unbundle_from_file` for bundle-uri enabled fetches.
     @@ Commit message
          fetch operations. Tests have been added to confirm functionality in the
          scenarios mentioned above.
      
     +    Reviewed-by: Patrick Steinhardt <ps@pks.im>
          Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
      
       ## bundle-uri.c ##
     @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *fi
       	 * the prerequisite commits.
       	 */
       	if ((result = unbundle(r, &header, bundle_fd, NULL,
     --			       VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_ALWAYS)))
     +-			       VERIFY_BUNDLE_QUIET)))
      +			       VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)))
       		return 1;
       
       	/*
      
     - ## bundle.c ##
     -@@
     - #include "list-objects-filter-options.h"
     - #include "connected.h"
     - #include "write-or-die.h"
     -+#include "fetch-pack.h"
     - 
     - static const char v2_bundle_signature[] = "# v2 git bundle\n";
     - static const char v3_bundle_signature[] = "# v3 git bundle\n";
     -@@ bundle.c: int unbundle(struct repository *r, struct bundle_header *header,
     - 	     enum verify_bundle_flags flags)
     - {
     - 	struct child_process ip = CHILD_PROCESS_INIT;
     -+	int fsck_objects = 0;
     - 
     - 	if (verify_bundle(r, header, flags))
     - 		return -1;
     -@@ bundle.c: int unbundle(struct repository *r, struct bundle_header *header,
     - 		strvec_push(&ip.args, "--promisor=from-bundle");
     - 
     - 	if (flags & VERIFY_BUNDLE_FSCK_ALWAYS)
     -+		fsck_objects = 1;
     -+	else if (flags & VERIFY_BUNDLE_FSCK_FOLLOW_FETCH)
     -+		fsck_objects = fetch_pack_fsck_objects();
     -+
     -+	if (fsck_objects)
     - 		strvec_push(&ip.args, "--fsck-objects");
     - 
     - 	if (extra_index_pack_args) {
     -
     - ## bundle.h ##
     -@@ bundle.h: enum verify_bundle_flags {
     - 	VERIFY_BUNDLE_VERBOSE = (1 << 0),
     - 	VERIFY_BUNDLE_QUIET = (1 << 1),
     - 	VERIFY_BUNDLE_FSCK_ALWAYS = (1 << 2),
     -+	VERIFY_BUNDLE_FSCK_FOLLOW_FETCH = (1 << 3),
     - };
     - 
     - int verify_bundle(struct repository *r, struct bundle_header *header,
     -
       ## t/t5558-clone-bundle-uri.sh ##
      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'create bundle' '
       		git bundle create B.bundle topic &&
     @@ transport.c: static int fetch_refs_from_bundle(struct transport *transport,
       	if (!data->get_refs_from_bundle_called)
       		get_refs_from_bundle_inner(transport);
       	ret = unbundle(the_repository, &data->header, data->fd,
     --		       &extra_index_pack_args, VERIFY_BUNDLE_FSCK_ALWAYS);
     +-		       &extra_index_pack_args, 0);
      +		       &extra_index_pack_args, VERIFY_BUNDLE_FSCK_FOLLOW_FETCH);
       	transport->hash_algo = data->header.hash_algo;
       	return ret;