mbox series

[v4,00/11] The merge-base logic vs missing commit objects

Message ID pull.1657.v4.git.1709113457.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series The merge-base logic vs missing commit objects | expand

Message

Jean-Noël Avila via GitGitGadget Feb. 28, 2024, 9:44 a.m. UTC
This patch series is in the same spirit as what I proposed in
https://lore.kernel.org/git/pull.1651.git.1707212981.gitgitgadget@gmail.com/,
where I tackled missing tree objects. This here patch series tackles missing
commit objects instead: Git's merge-base logic handles these missing commit
objects as if there had not been any commit object at all, i.e. if two
commits' merge base commit is missing, they will be treated as if they had
no common commit history at all, which is a bug. Those commit objects should
not be missing, of course, i.e. this is only a problem in corrupt
repositories.

This patch series is a bit more tricky than the "missing tree objects" one,
though: The function signature of quite a few functions need to be changed
to allow callers to discern between "missing commit object" vs "no
merge-base found".

And of course it gets even more tricky because in shallow and partial clones
we expect commit objects to be missing, and that must not be treated like an
error but the existing logic is actually desirable in those scenarios.

I am deeply sorry both about the length of this patch series as well as
having to lean so heavily on reviews on the Git mailing list.

Changes since v3:

 * Reworded some hard-to-understand paragraphs in the commit message of
   "Prepare paint_down_to_common() for handling shallow commits" (4/11).
 * Changed an inconsistent paint_down_to_common() < 0 to simply test whether
   the return value is non-zero.
 * Changed all commit messages to have proper <area>: prefixes.

Changes since v2:

 * Moved a hunk from 3/11 to 2/11 that lets the repo_in_merge_bases_many()
   function return an error if non-existing commits have been passed to it,
   unless the ignore_missing_commits parameter is non-zero.
 * The end result is tree-same.

Changes since v1:

 * Addressed a lot of memory leaks.
 * Reordered patch 2 and 3 to render the commit message's comment about
   unchanged behavior true.
 * Fixed the incorrectly converted condition in the merge_submodule()
   function.
 * The last patch ("paint_down_to_common(): special-case shallow/partial
   clones") was dropped because it is not actually necessary, and the
   explanation for that was added to the commit message of "Prepare
   paint_down_to_common() for handling shallow commits".
 * An inconsistently-named variable i was renamed to be consistent with the
   other variables called ret.

Johannes Schindelin (11):
  commit-reach(paint_down_to_common): plug two memory leaks
  commit-reach(repo_in_merge_bases_many): optionally expect missing
    commits
  commit-reach(repo_in_merge_bases_many): report missing commits
  commit-reach(paint_down_to_common): prepare for handling shallow
    commits
  commit-reach(paint_down_to_common): start reporting errors
  commit-reach(merge_bases_many): pass on "missing commits" errors
  commit-reach(get_merge_bases_many_0): pass on "missing commits" errors
  commit-reach(repo_get_merge_bases): pass on "missing commits" errors
  commit-reach(get_octopus_merge_bases): pass on "missing commits"
    errors
  commit-reach(repo_get_merge_bases_many): pass on "missing commits"
    errors
  commit-reach(repo_get_merge_bases_many_dirty): pass on errors

 bisect.c                         |   7 +-
 builtin/branch.c                 |  12 +-
 builtin/fast-import.c            |   6 +-
 builtin/fetch.c                  |   2 +
 builtin/log.c                    |  30 +++--
 builtin/merge-base.c             |  23 +++-
 builtin/merge-tree.c             |   5 +-
 builtin/merge.c                  |  26 ++--
 builtin/pull.c                   |   9 +-
 builtin/rebase.c                 |   8 +-
 builtin/receive-pack.c           |   6 +-
 builtin/rev-parse.c              |   5 +-
 commit-reach.c                   | 209 ++++++++++++++++++++-----------
 commit-reach.h                   |  26 ++--
 commit.c                         |   7 +-
 diff-lib.c                       |   5 +-
 http-push.c                      |   5 +-
 log-tree.c                       |   5 +-
 merge-ort.c                      |  87 +++++++++++--
 merge-recursive.c                |  58 +++++++--
 notes-merge.c                    |   3 +-
 object-name.c                    |   7 +-
 remote.c                         |   2 +-
 revision.c                       |  12 +-
 sequencer.c                      |   8 +-
 shallow.c                        |  21 ++--
 submodule.c                      |   7 +-
 t/helper/test-reach.c            |  11 +-
 t/t4301-merge-tree-write-tree.sh |  12 ++
 29 files changed, 441 insertions(+), 183 deletions(-)


base-commit: 564d0252ca632e0264ed670534a51d18a689ef5d
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1657%2Fdscho%2Fmerge-base-and-missing-objects-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1657/dscho/merge-base-and-missing-objects-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1657

Range-diff vs v3:

  1:  6e4e409cd43 !  1:  647fa2ed5c5 paint_down_to_common: plug two memory leaks
     @@ Metadata
      Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>
      
       ## Commit message ##
     -    paint_down_to_common: plug two memory leaks
     +    commit-reach(paint_down_to_common): plug two memory leaks
      
          When a commit is missing, we return early (currently pretending that no
          merge basis could be found in that case). At that stage, it is possible
  2:  f68d77c6123 !  2:  48e69bf7229 Prepare `repo_in_merge_bases_many()` to optionally expect missing commits
     @@ Metadata
      Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>
      
       ## Commit message ##
     -    Prepare `repo_in_merge_bases_many()` to optionally expect missing commits
     +    commit-reach(repo_in_merge_bases_many): optionally expect missing commits
      
          Currently this function treats unrelated commit histories the same way
          as commit histories with missing commit objects.
  3:  0aaa224b5db !  3:  1938b317a49 Start reporting missing commits in `repo_in_merge_bases_many()`
     @@ Metadata
      Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>
      
       ## Commit message ##
     -    Start reporting missing commits in `repo_in_merge_bases_many()`
     +    commit-reach(repo_in_merge_bases_many): report missing commits
      
          Some functions in Git's source code follow the convention that returning
          a negative value indicates a fatal error, e.g. repository corruption.
  4:  84e7fbc07e0 !  4:  837aa5a89c6 Prepare `paint_down_to_common()` for handling shallow commits
     @@ Metadata
      Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>
      
       ## Commit message ##
     -    Prepare `paint_down_to_common()` for handling shallow commits
     +    commit-reach(paint_down_to_common): prepare for handling shallow commits
      
          When `git fetch --update-shallow` needs to test for commit ancestry, it
          can naturally run into a missing object (e.g. if it is a parent of a
     -    shallow commit). To accommodate, the merge base logic will need to be
     -    able to handle this situation gracefully.
     +    shallow commit). For the purpose of `--update-shallow`, this needs to be
     +    treated as if the child commit did not even have that parent, i.e. the
     +    commit history needs to be clamped.
      
     -    Currently, that logic pretends that a missing parent commit is
     -    equivalent to a missing parent commit, and for the purpose of
     -    `--update-shallow` that is exactly what we need it to do.
     +    For all other scenarios, clamping the commit history is actually a bug,
     +    as it would hide repository corruption (for an analysis regarding
     +    shallow and partial clones, see the analysis further down).
      
     -    Therefore, let's introduce a flag to indicate when that is precisely the
     -    logic we want.
     +    Add a flag to optionally ask the function to ignore missing commits, as
     +    `--update-shallow` needs it to, while detecting missing objects as a
     +    repository corruption error by default.
      
     -    We need a flag, and cannot rely on `is_repository_shallow()` to indicate
     -    that situation, because that function would return 0: There may not
     -    actually be a `shallow` file, as demonstrated e.g. by t5537.10 ("add new
     -    shallow root with receive.updateshallow on") and t5538.4 ("add new
     -    shallow root with receive.updateshallow on").
     +    This flag is needed, and cannot replaced by `is_repository_shallow()` to
     +    indicate that situation, because that function would return 0 in the
     +    `--update-shallow` scenario: There is not actually a `shallow` file in
     +    that scenario, as demonstrated e.g. by t5537.10 ("add new shallow root
     +    with receive.updateshallow on") and t5538.4 ("add new shallow root with
     +    receive.updateshallow on").
      
          Note: shallow commits' parents are set to `NULL` internally already,
          therefore there is no need to special-case shallow repositories here, as
  5:  85332b58c37 !  5:  b978b5d233a commit-reach: start reporting errors in `paint_down_to_common()`
     @@ Metadata
      Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>
      
       ## Commit message ##
     -    commit-reach: start reporting errors in `paint_down_to_common()`
     +    commit-reach(paint_down_to_common): start reporting errors
      
          If a commit cannot be parsed, it is currently ignored when looking for
          merge bases. That's undesirable as the operation can pretend success in
     @@ commit-reach.c: static struct commit_list *merge_bases_many(struct repository *r
       	}
       
      -	list = paint_down_to_common(r, one, n, twos, 0, 0);
     -+	if (paint_down_to_common(r, one, n, twos, 0, 0, &list) < 0) {
     ++	if (paint_down_to_common(r, one, n, twos, 0, 0, &list)) {
      +		free_commit_list(list);
      +		return NULL;
      +	}
  6:  2ae6a54dd59 !  6:  0f1ce130ce6 merge_bases_many(): pass on errors from `paint_down_to_common()`
     @@ Metadata
      Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>
      
       ## Commit message ##
     -    merge_bases_many(): pass on errors from `paint_down_to_common()`
     +    commit-reach(merge_bases_many): pass on "missing commits" errors
      
          The `paint_down_to_common()` function was just taught to indicate
          parsing errors, and now the `merge_bases_many()` function is aware of
     @@ commit-reach.c: static int paint_down_to_common(struct repository *r,
      +				     oid_to_hex(&twos[i]->object.oid));
       	}
       
     - 	if (paint_down_to_common(r, one, n, twos, 0, 0, &list) < 0) {
     + 	if (paint_down_to_common(r, one, n, twos, 0, 0, &list)) {
       		free_commit_list(list);
      -		return NULL;
      +		return -1;
  7:  4321795102d !  7:  133b69b6a62 get_merge_bases_many_0(): pass on errors from `merge_bases_many()`
     @@ Metadata
      Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>
      
       ## Commit message ##
     -    get_merge_bases_many_0(): pass on errors from `merge_bases_many()`
     +    commit-reach(get_merge_bases_many_0): pass on "missing commits" errors
      
          The `merge_bases_many()` function was just taught to indicate
          parsing errors, and now the `get_merge_bases_many_0()` function is aware
  8:  35545c4b777 !  8:  bd52f258cd9 repo_get_merge_bases(): pass on errors from `merge_bases_many()`
     @@ Metadata
      Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>
      
       ## Commit message ##
     -    repo_get_merge_bases(): pass on errors from `merge_bases_many()`
     +    commit-reach(repo_get_merge_bases): pass on "missing commits" errors
      
          The `merge_bases_many()` function was just taught to indicate parsing
          errors, and now the `repo_get_merge_bases()` function (which is also
  9:  a963058d2ba !  9:  b7ef90a57f0 get_octopus_merge_bases(): pass on errors from `merge_bases_many()`
     @@ Metadata
      Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>
      
       ## Commit message ##
     -    get_octopus_merge_bases(): pass on errors from `merge_bases_many()`
     +    commit-reach(get_octopus_merge_bases): pass on "missing commits" errors
      
          The `merge_bases_many()` function was just taught to indicate parsing
          errors, and now the `repo_get_merge_bases()` function (which is also
 10:  c3bed9c8500 ! 10:  32587b3caa7 repo_get_merge_bases_many(): pass on errors from `merge_bases_many()`
     @@ Metadata
      Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>
      
       ## Commit message ##
     -    repo_get_merge_bases_many(): pass on errors from `merge_bases_many()`
     +    commit-reach(repo_get_merge_bases_many): pass on "missing commits" errors
      
          The `merge_bases_many()` function was just taught to indicate parsing
          errors, and now the `repo_get_merge_bases_many()` function is aware of
 11:  bdbf47ae505 ! 11:  05de9f24444 repo_get_merge_bases_many_dirty(): pass on errors from `merge_bases_many()`
     @@ Metadata
      Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>
      
       ## Commit message ##
     -    repo_get_merge_bases_many_dirty(): pass on errors from `merge_bases_many()`
     +    commit-reach(repo_get_merge_bases_many_dirty): pass on errors
     +
     +    (Actually, this commit is only about passing on "missing commits"
     +    errors, but adding that to the commit's title would have made it too
     +    long.)
      
          The `merge_bases_many()` function was just taught to indicate parsing
          errors, and now the `repo_get_merge_bases_many_dirty()` function is