
[v3,0/4] git for-each-ref: is-base atom and base branches

Message ID pull.1768.v3.git.1723631490.gitgitgadget@gmail.com (mailing list archive)

Message

Derrick Stolee via GitGitGadget Aug. 14, 2024, 10:31 a.m. UTC
This change introduces a new 'git for-each-ref' atom, 'is-base', in a very
similar way to the 'ahead-behind' atom. As detailed carefully in the first
change, this is motivated by the need to detect the concept of a "base
branch" in a repository with multiple long-lived branches.

This change is motivated by a third-party tool created to make this
detection with the same optimization mechanism, but using a much slower
technique due to the limitations of the Git CLI not presenting this
information. The existing algorithm involves using git rev-list
--first-parent -<N> in batches for the collection of considered references,
comparing those lists, and increasing <N> as needed until finding a
collision. This new use of 'git for-each-ref' will allow determining the
base branch within a single process while walking a minimal number of commits.
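
The batched approach described above can be sketched roughly as follows.
This is only an illustration under assumptions: find_base_by_batches is a
hypothetical helper name, the starting batch size of 16 and the doubling
step are arbitrary choices, and the third-party tool's actual logic is not
shown in this series. Stopping at the first collision inside the current
batch is a heuristic, so a better match deeper in some candidate's history
could be missed.

```shell
#!/bin/sh
# Hypothetical sketch of the batched first-parent comparison; not part of Git.
# Usage: find_base_by_batches <source> <candidate-ref>...
# Prints the first candidate whose first-parent history collides with the
# first-parent history of <source>, or returns 1 if none does.
find_base_by_batches () {
	src=$1
	shift
	n=16		# assumed initial batch size
	prev_total=-1
	tmp=$(mktemp) || return 1

	while :
	do
		# First-parent history of the source, newest first, at most $n commits.
		src_list=$(git rev-list --first-parent "-$n" "$src") || {
			rm -f "$tmp"
			return 1
		}

		# One "<commit> <ref>" line per commit in each candidate's batch.
		for ref in "$@"
		do
			git rev-list --first-parent "-$n" "$ref" |
			awk -v ref="$ref" '{ print $1, ref }'
		done >"$tmp"

		# Walk the source commits newest to oldest; report the first collision.
		for commit in $src_list
		do
			match=$(awk -v c="$commit" '$1 == c { print $2; exit }' "$tmp")
			if test -n "$match"
			then
				printf '%s\n' "$match"
				rm -f "$tmp"
				return 0
			fi
		done

		# If no list grew since the last round, every history is exhausted.
		total=$(( $(wc -l <"$tmp") + $(printf '%s\n' "$src_list" | wc -l) ))
		if test "$total" -eq "$prev_total"
		then
			rm -f "$tmp"
			return 1
		fi
		prev_total=$total
		n=$(( n * 2 ))
	done
}
```

For example, "find_base_by_batches HEAD refs/heads/main refs/heads/release"
prints the matching candidate ref. Each round spawns one rev-list process
per candidate, which is the cost that the single-process walk avoids.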

There are benefits to users both on client-side and server-side. In an
internal monorepo, this base branch detection algorithm is used to determine
a long-lived branch based on the HEAD commit, mapping to a group within the
organizational structure of the repository, which determines a set of
projects that the user will likely need to build; this leads to
automatically selecting an initial sparse-checkout definition based on the
build dependencies required. An upcoming feature in Azure Repos will use
this algorithm to automatically create a pull request against the correct
target branch, reducing user pain from needing to select a different branch
after a large commit diff is rendered against the default branch. This atom
unlocks that ability for Git hosting services that use Git in their backend.
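
As a rough illustration of how a tool might consume the new atom: the
format string and the "(HEAD) refs/heads/main" output shape below are taken
from the tests in this series, while the helper names base_refs_from_output
and list_base_refs_for_head are hypothetical. The second helper assumes a
Git build that includes this series.

```shell
#!/bin/sh
# Hypothetical helpers; only the format string comes from this series.

# Read "%(is-base:HEAD) %(refname)" lines on stdin and print the refnames
# that were marked "(HEAD)". Unmarked refs start with an empty atom value.
base_refs_from_output () {
	sed -n 's/^(HEAD) //p'
}

# Ask Git which of the given ref patterns is the base for HEAD.
list_base_refs_for_head () {
	git for-each-ref --format="%(is-base:HEAD) %(refname)" "$@" |
	base_refs_from_output
}
```

A caller could then run "list_base_refs_for_head refs/heads/" and use the
printed ref as the default pull request target.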

Thanks, -Stolee


Updates in v2
=============

 * I had forgotten to include a documentation change in v1. My attempt to
   create a succinct doc change in a follow-up hunk continued to be
   confusing. This version includes a more expanded version of the
   documentation blurb for the is-base token.


Updates in v3
=============

 * Corrected some grammar in a commit message.
 * Fixed (and tested for) a bug where the source branch is equal to a
   candidate ref.
 * Added a test in t6300-for-each-ref.sh to cover some non-commit refs and
   some broken objects.
 * Motivated by the test in t6300, added a new patch that adds a ..._gently()
   method to reduce error noise for non-commit refs.

Derrick Stolee (4):
  commit-reach: add get_branch_base_for_tip
  commit: add gentle reference lookup method
  for-each-ref: add 'is-base' token
  p1500: add is-base performance tests

 Documentation/git-for-each-ref.txt |  42 ++++++++++
 commit-reach.c                     | 126 +++++++++++++++++++++++++++++
 commit-reach.h                     |  17 ++++
 commit.c                           |   8 +-
 commit.h                           |   2 +
 ref-filter.c                       |  77 +++++++++++++++++-
 ref-filter.h                       |  15 ++++
 t/helper/test-reach.c              |   2 +
 t/perf/p1500-graph-walks.sh        |  31 +++++++
 t/t6300-for-each-ref.sh            |   9 +++
 t/t6600-test-reach.sh              | 121 +++++++++++++++++++++++++++
 11 files changed, 448 insertions(+), 2 deletions(-)


base-commit: bea9ecd24b0c3bf06cab4a851694fe09e7e51408
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1768%2Fderrickstolee%2Ftarget-ref-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1768/derrickstolee/target-ref-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1768

Range-diff vs v2:

 1:  580026f910d ! 1:  f93d642c8d9 commit-reach: add get_branch_base_for_tip
     @@ Commit message
          which branch was used as the starting point for a given commit. Add focused
          tests using the 'test-tool reach' command.
      
     -    Repositories that use pull requests (or merge requests) to advance one or
     +    In repositories that use pull requests (or merge requests) to advance one or
          more "protected" branches, the history of that reference can be recovered by
          following the first-parent history in most cases. Most are completed using
          no-fast-forward merges, though squash merges are quite common. Less common
     @@ commit-reach.c: done:
      + */
      +define_commit_slab(best_branch_base, int);
      +static struct best_branch_base best_branch_base;
     -+#define get_best(c) (*best_branch_base_at(&best_branch_base, c))
     -+#define set_best(c,v) (*best_branch_base_at(&best_branch_base, c) = v)
     ++#define get_best(c) (*best_branch_base_at(&best_branch_base, (c)))
     ++#define set_best(c,v) (*best_branch_base_at(&best_branch_base, (c)) = (v))
      +
      +int get_branch_base_for_tip(struct repository *r,
      +			    struct commit *tip,
     @@ commit-reach.c: done:
      +
      +	for (size_t i = 0; i < bases_nr; i++) {
      +		struct commit *c = bases[i];
     ++		int best = get_best(c);
      +
      +		/* Has this already been marked as best by another commit? */
     -+		if (get_best(c))
     ++		if (best) {
     ++			if (best == -1) {
     ++				/* We agree at this position. Stop now. */
     ++				best_index = i + 1;
     ++				goto cleanup;
     ++			}
      +			continue;
     ++		}
      +
      +		set_best(c, i + 1);
      +		prio_queue_put(&queue, c);
     @@ commit-reach.c: done:
      +		branch_point = parent;
      +	}
      +
     ++cleanup:
      +	clear_best_branch_base(&best_branch_base);
      +	clear_prio_queue(&queue);
      +	return best_index > 0 ? best_index - 1 : -1;
     @@ t/t6600-test-reach.sh: test_expect_success 'for-each-ref merged:none' '
      +	test_all_modes get_branch_base_for_tip
      +'
      +
     ++test_expect_success 'get_branch_base_for_tip: equal to tip' '
     ++	# (2,3) branched from the first tip (i,4) in X with i > 2
     ++	cat >input <<-\EOF &&
     ++		A:commit-8-4
     ++		X:commit-1-2
     ++		X:commit-1-4
     ++		X:commit-4-4
     ++		X:commit-8-4
     ++		X:commit-10-4
     ++	EOF
     ++	echo "get_branch_base_for_tip(A,X):3" >expect &&
     ++	test_all_modes get_branch_base_for_tip
     ++'
     ++
      +test_expect_success 'get_branch_base_for_tip: all reach tip' '
      +	# (2,3) branched from the first tip (i,4) in X with i > 2
      +	cat >input <<-\EOF &&
 -:  ----------- > 2:  5240c2a7b32 commit: add gentle reference lookup method
 2:  13341e7e512 ! 3:  df05cee6003 for-each-ref: add 'is-base' token
     @@ ref-filter.c: static int populate_value(struct ref_array_item *ref, struct strbu
      +				v->s = xstrfmt("(%s)", ref->is_base[is_base_atoms]);
      +				free(ref->is_base[is_base_atoms]);
      +			} else {
     -+				/* Not a commit. */
      +				v->s = xstrdup("");
      +			}
      +			is_base_atoms++;
     @@ ref-filter.c: void filter_ahead_behind(struct repository *r,
      +
      +	for (size_t i = 0; i < array->nr; i++) {
      +		const char *name = array->items[i]->refname;
     -+		struct commit *c = lookup_commit_reference_by_name(name);
     ++		struct commit *c = lookup_commit_reference_by_name_gently(name, 1);
      +
      +		CALLOC_ARRAY(array->items[i]->is_base, format->is_base_tips.nr);
      +
     @@ ref-filter.h: void filter_ahead_behind(struct repository *r,
       void ref_filter_clear(struct ref_filter *filter);
       
      
     + ## t/t6300-for-each-ref.sh ##
     +@@ t/t6300-for-each-ref.sh: test_expect_success 'git for-each-ref with nested tags' '
     + 	test_cmp expect actual
     + '
     + 
     ++test_expect_success 'is-base atom with non-commits' '
     ++	git for-each-ref --format="%(is-base:HEAD) %(refname)" >out 2>err &&
     ++	grep "(HEAD) refs/heads/main" out &&
     ++
     ++	test_line_count = 2 err &&
     ++	grep "error: object .* is a commit, not a blob" err &&
     ++	grep "error: bad tag pointer to" err
     ++'
     ++
     + GRADE_FORMAT="%(signature:grade)%0a%(signature:key)%0a%(signature:signer)%0a%(signature:fingerprint)%0a%(signature:primarykeyfingerprint)"
     + TRUSTLEVEL_FORMAT="%(signature:trustlevel)%0a%(signature:key)%0a%(signature:signer)%0a%(signature:fingerprint)%0a%(signature:primarykeyfingerprint)"
     + 
     +
       ## t/t6600-test-reach.sh ##
      @@ t/t6600-test-reach.sh: test_expect_success 'get_branch_base_for_tip: all reach tip' '
       	test_all_modes get_branch_base_for_tip
     @@ t/t6600-test-reach.sh: test_expect_success 'get_branch_base_for_tip: all reach t
      +		--format="%(refname):%(is-base:commit-4-1)" --stdin
      +'
      +
     ++test_expect_success 'for-each-ref is-base: equal to tip' '
     ++	cat >input <<-\EOF &&
     ++	refs/heads/commit-4-2
     ++	refs/heads/commit-5-1
     ++	EOF
     ++	cat >expect <<-\EOF &&
     ++	refs/heads/commit-4-2:(commit-4-2)
     ++	refs/heads/commit-5-1:
     ++	EOF
     ++	run_all_modes git for-each-ref \
     ++		--format="%(refname):%(is-base:commit-4-2)" --stdin
     ++'
     ++
      +test_expect_success 'for-each-ref is-base:multiple' '
      +	cat >input <<-\EOF &&
      +	refs/heads/commit-1-1
 3:  757c20090db = 4:  cce9921bbd8 p1500: add is-base performance tests

Comments

Junio C Hamano Aug. 19, 2024, 7:52 p.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> There are benefits to users both on client-side and server-side. In an
> internal monorepo, this base branch detection algorithm is used to determine
> a long-lived branch based on the HEAD commit, mapping to a group within the
> organizational structure of the repository, which determines a set of
> projects that the user will likely need to build; this leads to
> automatically selecting an initial sparse-checkout definition based on the
> build dependencies required. An upcoming feature in Azure Repos will use
> this algorithm to automatically create a pull request against the correct
> target branch, reducing user pain from needing to select a different branch
> after a large commit diff is rendered against the default branch. This atom
> unlocks that ability for Git hosting services that use Git in their backend.

Thanks for an update.  This iteration looks good to me.

Derrick Stolee Aug. 20, 2024, 1:33 a.m. UTC | #2
On 8/19/24 3:52 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> There are benefits to users both on client-side and server-side. In an
>> internal monorepo, this base branch detection algorithm is used to determine
>> a long-lived branch based on the HEAD commit, mapping to a group within the
>> organizational structure of the repository, which determines a set of
>> projects that the user will likely need to build; this leads to
>> automatically selecting an initial sparse-checkout definition based on the
>> build dependencies required. An upcoming feature in Azure Repos will use
>> this algorithm to automatically create a pull request against the correct
>> target branch, reducing user pain from needing to select a different branch
>> after a large commit diff is rendered against the default branch. This atom
>> unlocks that ability for Git hosting services that use Git in their backend.
> 
> Thanks for an update.  This iteration looks good to me.

Thank you for your careful review.

-Stolee