Message ID | 20230920104507.21664-1-karthik.188@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v4] revision: add `--ignore-missing-links` user option |
> diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
> index a4a0cb93b2..8ee713db3d 100644
> --- a/Documentation/rev-list-options.txt
> +++ b/Documentation/rev-list-options.txt
> @@ -227,6 +227,15 @@ explicitly.
>  	Upon seeing an invalid object name in the input, pretend as if
>  	the bad input was not given.
>
> +--ignore-missing-links::
> +	During traversal, if an object that is referenced does not
> +	exist, instead of dying of a repository corruption, pretend as
> +	if the reference itself does not exist. Running the command
> +	with the `--boundary` option makes these missing commits,
> +	together with the commits on the edge of revision ranges
> +	(i.e. true boundary objects), appear on the output, prefixed
> +	with '-'.

There needs an explanation of interaction with --missing=<action>
option here, no?  "--missing=allow-any" and "--missing=print" are
sensible choices, I presume.  The former allows the traversal to
proceed, as you described in one of your responses.  Also with
"--missing=print", the user can more directly find out which are the
missing objects, even without using the "--boundary" that requires
them to sift between missing objects and the objects that are truly
on boundary.

Here is my attempt:

    --ignore-missing-links::
	During traversal, if an object that is referenced does not
	exist, instead of dying of a repository corruption, allow
	`--missing=<missing-action>` to decide what to do.
    +
    `--missing=print` will make the command print a list of missing
    objects, prefixed with a "?" character.
    +
    `--missing=allow-any` will make the command proceed without doing
    anything special.  Used with `--boundary`, output these missing
    objects mixed with the commits on the edge of revision ranges,
    prefixed with a "-" character.

It might make sense to add

    +
    Use of this option with other 'missing-action' may probably not
    give useful behaviour.

at the end, but it may not be useful to the readers to say "we allow
even more extra flexibility but haven't thought through what good
they would do".

> +# With `--ignore-missing-links`, we stop the traversal when we encounter a
> +# missing link. The boundary commit is not listed as we haven't used the
> +# `--boundary` options.
> +test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
> +	hide_alternates &&
> +
> +	git -C alt rev-list --objects --no-object-names \
> +		--ignore-missing-links --missing=allow-any HEAD >actual.raw &&
> +	git -C alt cat-file  --batch-check="%(objectname)" \
> +		--batch-all-objects >expect.raw &&
> +
> +	sort actual.raw >actual &&
> +	sort expect.raw >expect &&
> +	test_cmp expect actual
> +'

This gives a good baseline.  "--missing=print" without "--boundary"
may have more obvious use cases, but is there a practical use case
for the output from an invocation with "--missing=allow-any" without
"--boundary"?  Just being curious if I am missing something obvious.

Perhaps add another test that uses "--missing=print" instead, and
check that the "? missing" output matches what we expect to be
missing?  The same comment applies to the other test that uses
"--missing=allow-any" without "--boundary" we see later.

> +# With `--ignore-missing-links` and `--boundary`, we can even print those boundary
> +# commits.
> +test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
> +	git -C alt rev-list --ignore-missing-links --boundary HEAD >got &&
> +	grep "^-$(git rev-parse HEAD)" got
> +'

This makes sure what we expect to appear in 'got' actually is in
'got', but we should also make sure 'got' does not have anything
unexpected.

> +test_expect_success "setup for rev-list --ignore-missing-links with missing objects" '
> +	show_alternates &&
> +	test_commit -C alt 11
> +'
> +
> +for obj in "HEAD^{tree}" "HEAD:11.t"
> +do
> +	# The `--ignore-missing-links` option should ensure that git-rev-list(1)
> +	# doesn't fail when used alongside `--objects` when a tree/blob is
> +	# missing.
> +	test_expect_success "rev-list --ignore-missing-links with missing $obj" '
> +		oid="$(git -C alt rev-parse $obj)" &&
> +		path="alt/.git/objects/$(test_oid_to_path $oid)" &&
> +
> +		mv "$path" "$path.hidden" &&
> +		test_when_finished "mv $path.hidden $path" &&

In the first iteration, we check without the tree object and we only
ensure that removed tree does not appear in the output---but we know
the blob that is referenced by that removed tree will not appear in
the output, either, don't we?  Don't we want to check that, too?

In the second iteration, we have resurrected the tree but removed
the blob that is referenced by the tree, so we would not see that
blob in the output, which makes sense.

> +		git -C alt rev-list --ignore-missing-links --missing=allow-any --objects HEAD \
> +			>actual &&
> +		! grep $oid actual
> +	'
> +done
> +
> +test_done

Thanks.
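The remark about 'got' (a grep confirms the expected line is present, but not that nothing unexpected is present) can be illustrated without a repository. The following is only a minimal sketch: the object IDs are made-up placeholders, the file names mirror the test above, and plain cmp stands in for the test suite's test_cmp helper.

```shell
#!/bin/sh
# Sketch: why a grep-only check is weaker than an exact comparison.
# All object IDs are hypothetical placeholders, not real hashes.
tmp=$(mktemp -d) || exit 1

boundary=1111111111111111111111111111111111111111
stray=2222222222222222222222222222222222222222

# The output we expect: exactly one boundary line, prefixed with '-'.
printf '%s\n' "-$boundary" >"$tmp/expect"

# Output from an imagined faulty traversal: the boundary plus an extra line.
printf '%s\n' "-$boundary" "$stray" >"$tmp/got"

# The grep-only check is satisfied as soon as the boundary line exists.
if grep -q "^-$boundary" "$tmp/got"
then
	grep_check=pass
else
	grep_check=fail
fi

# An exact comparison also notices the unexpected extra line.
if cmp -s "$tmp/expect" "$tmp/got"
then
	exact_check=pass
else
	exact_check=fail
fi

echo "grep: $grep_check, exact: $exact_check"
rm -rf "$tmp"
```

In an actual test the expected file would also have to list the regular commits that the traversal prints, and both sides would be sorted before comparing, as the other tests in the patch do.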
On Wed, Sep 20, 2023 at 5:32 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> There needs an explanation of interaction with --missing=<action>
> option here, no?  "--missing=allow-any" and "--missing=print" are
> sensible choices, I presume.  The former allows the traversal to
> proceed, as you described in one of your responses.  Also with
> "--missing=print", the user can more directly find out which are the
> missing objects, even without using the "--boundary" that requires
> them to sift between missing objects and the objects that are truly
> on boundary.
>
> Here is my attempt:
>
>     --ignore-missing-links::
>         During traversal, if an object that is referenced does not
>         exist, instead of dying of a repository corruption, allow
>         `--missing=<missing-action>` to decide what to do.
>     +
>     `--missing=print` will make the command print a list of missing
>     objects, prefixed with a "?" character.
>     +
>     `--missing=allow-any` will make the command proceed without doing
>     anything special.  Used with `--boundary`, output these missing
>     objects mixed with the commits on the edge of revision ranges,
>     prefixed with a "-" character.
>
> It might make sense to add
>
>     +
>     Use of this option with other 'missing-action' may probably not
>     give useful behaviour.
>
> at the end, but it may not be useful to the readers to say "we allow
> even more extra flexibility but haven't thought through what good
> they would do".
>

I was thinking about this, but mostly didn't do this, because the
interaction with `--missing` is only for non-commit
objects. Because for missing commits, `--ignore-missing-links`
skips the commit and the value of `--missing` doesn't make any
difference. It's only for non-commit objects that `--missing` comes
into play.

So perhaps change the current explanation to:

    --ignore-missing-links::
        During traversal, if a commit that is referenced does not
        exist, instead of dying of a repository corruption, pretend as
        if the commit itself does not exist. Running the command
        with the `--boundary` option makes these missing commits,
        together with the commits on the edge of revision ranges
        (i.e. true boundary objects), appear on the output, prefixed
        with '-'.

This way `--ignore-missing-links` is specific to commits, and combining
it with `--missing=...` for non-commit objects is left to the user.
What do you think?

> > +# With `--ignore-missing-links`, we stop the traversal when we encounter a
> > +# missing link. The boundary commit is not listed as we haven't used the
> > +# `--boundary` options.
> > +test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
> > +	hide_alternates &&
> > +
> > +	git -C alt rev-list --objects --no-object-names \
> > +		--ignore-missing-links --missing=allow-any HEAD >actual.raw &&
> > +	git -C alt cat-file  --batch-check="%(objectname)" \
> > +		--batch-all-objects >expect.raw &&
> > +
> > +	sort actual.raw >actual &&
> > +	sort expect.raw >expect &&
> > +	test_cmp expect actual
> > +'
>
> This gives a good baseline.  "--missing=print" without "--boundary"
> may have more obvious use cases, but is there a practical use case
> for the output from an invocation with "--missing=allow-any" without
> "--boundary"?  Just being curious if I am missing something obvious.
>

Not really, but it makes the tests easier to build up. Here, without
`--boundary`, we can use cat-file to check all objects (commits and
others) that are output by rev-list. The next test then builds on top
of this and also ensures that boundary commits are printed. That test
is very simplistic, however, as you've mentioned: there could be other
objects in the output and we don't really check for them.

> Perhaps add another test that uses "--missing=print" instead, and
> check that the "? missing" output matches what we expect to be
> missing?  The same comment applies to the other test that uses
> "--missing=allow-any" without "--boundary" we see later.
>

Sure, we can add that too!

> > +# With `--ignore-missing-links` and `--boundary`, we can even print those boundary
> > +# commits.
> > +test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
> > +	git -C alt rev-list --ignore-missing-links --boundary HEAD >got &&
> > +	grep "^-$(git rev-parse HEAD)" got
> > +'
>
> This makes sure what we expect to appear in 'got' actually is in
> 'got', but we should also make sure 'got' does not have anything
> unexpected.
>

Yeah, I can add that in too.

> > +test_expect_success "setup for rev-list --ignore-missing-links with missing objects" '
> > +	show_alternates &&
> > +	test_commit -C alt 11
> > +'
> > +
> > +for obj in "HEAD^{tree}" "HEAD:11.t"
> > +do
> > +	# The `--ignore-missing-links` option should ensure that git-rev-list(1)
> > +	# doesn't fail when used alongside `--objects` when a tree/blob is
> > +	# missing.
> > +	test_expect_success "rev-list --ignore-missing-links with missing $obj" '
> > +		oid="$(git -C alt rev-parse $obj)" &&
> > +		path="alt/.git/objects/$(test_oid_to_path $oid)" &&
> > +
> > +		mv "$path" "$path.hidden" &&
> > +		test_when_finished "mv $path.hidden $path" &&
>
> In the first iteration, we check without the tree object and we only
> ensure that removed tree does not appear in the output---but we know
> the blob that is referenced by that removed tree will not appear in
> the output, either, don't we?  Don't we want to check that, too?
>
> In the second iteration, we have resurrected the tree but removed
> the blob that is referenced by the tree, so we would not see that
> blob in the output, which makes sense.
>

I was implementing this change and just realized that for missing
trees, show_object() is never called (that is, --missing=print has no
effect). That means we only call show_object() when there is a missing
blob.

So this effectively means:
1. missing commits: --ignore-missing-links works, --missing=... has no effect
2. missing trees: --ignore-missing-links works, --missing=... has no effect
3. missing blobs: --ignore-missing-links works in conjunction with --missing=...

I now think it makes even more sense to hardcode the skipping of
`finish_object__ma`. This way we can state that `--ignore-missing-links`
and `--missing` are incompatible: `--ignore-missing-links` ignores any
missing object (regardless of type), while `--missing` specifically
handles missing blobs and provides options for doing so. This mirrors
how `--boundary` and `--missing=print` are currently specific to commits
and blobs respectively.

What do you think?
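Since the thread relies on two different output prefixes ('-' for boundary commits, which with this patch would also cover missing commits, and '?' for objects reported by `--missing=print`), a small filter makes the distinction concrete. This is only a sketch: `classify_rev_list_output` and `sample_output` are hypothetical helper names, and the sample lines are invented placeholders rather than output captured from a real repository.

```shell
#!/bin/sh
# Sketch: label rev-list-style output lines by their prefix.
#   '-' prefix: boundary commit (as printed with --boundary)
#   '?' prefix: missing object (as printed with --missing=print)
#   no prefix : regular object reached by the traversal
classify_rev_list_output () {
	while IFS= read -r line
	do
		case "$line" in
		-*)
			echo "boundary ${line#-}" ;;
		\?*)
			echo "missing ${line#\?}" ;;
		*)
			echo "object $line" ;;
		esac
	done
}

# Invented sample; the IDs are placeholders, not real object names.
sample_output () {
	printf '%s\n' \
		1111111111111111111111111111111111111111 \
		-2222222222222222222222222222222222222222 \
		'?3333333333333333333333333333333333333333'
}

sample_output | classify_rev_list_output
```

With the sample above, the filter prints one "object", one "boundary", and one "missing" line, in that order. It says nothing about which object types git actually reports under each option, which is exactly the open question in this part of the thread.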
Karthik Nayak <karthik.188@gmail.com> writes:

> I was thinking about this, but mostly didn't do this, because the
> interaction with `--missing` is only for non-commit
> objects. Because for missing commits, `--ignore-missing-links`
> skips the commit and the value of `--missing` doesn't make any
> difference.

Hmph, somehow that smells like an existing bug.  So does the "trees
are not shown by --missing=print, and show_object() is never called
for missing objects unless they are blobs" you mention.  When the
user asks "instead of dying, list them so that I can ask around and
fetch them to repair this repository", shouldn't we do just that?

I wonder if these bugs are something people may be taking advantage
of and cannot be fixed retroactively?  If we can fix these and nobody
complains, that would give us the ideal outcome, I would think.

Thanks.
On Thu, Sep 21, 2023 at 9:16 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Karthik Nayak <karthik.188@gmail.com> writes:
>
> > I was thinking about this, but mostly didn't do this, because the
> > interaction with `--missing` is only for non-commit
> > objects. Because for missing commits, `--ignore-missing-links`
> > skips the commit and the value of `--missing` doesn't make any
> > difference.
>
> Hmph, somehow that smells like an existing bug.  So does the "trees
> are not shown by --missing=print, and show_object() is never called
> for missing objects unless they are blobs" you mention.  When the
> user asks "instead of dying, list them so that I can ask around and
> fetch them to repair this repository", shouldn't we do just that?
>
> I wonder if these bugs are something people may be taking advantage
> of and cannot be fixed retroactively?  If we can fix these and nobody
> complains, that would give us the ideal outcome, I would think.
>

Let me prefix with saying that I was partly wrong. `--missing` does work for
trees, only that it's ineffective when used along with the
`ignore_missing_links` bit.

But for commits, `--missing` was never configured to work with them. I
did a quick look at the code, we can do something like this for
commits too, i.e. add support for the `--missing` option. We'll
have to add a new flag (maybe MISSING) so it can be set
within `repo_parse_commit_gently` so we can parse this as a
missing object in rev-list.c and act accordingly.

It would invalidate this patch series in some sense. But I'm okay with that.

Does that sound good to you?
Karthik Nayak <karthik.188@gmail.com> writes:

> Let me prefix with saying that I was partly wrong. `--missing` does work for
> trees, only that it's ineffective when used along with the
> `ignore_missing_links` bit.
>
> But for commits, `--missing` was never configured to work with them. I
> did a quick look at the code, we can do something like this for
> commits too, i.e. add support for the `--missing` option. We'll
> have to add a new flag (maybe MISSING) so it can be set
> within `repo_parse_commit_gently` so we can parse this as a
> missing object in rev-list.c and act accordingly.

Do you mean that process_parents() would now throw such a commit to
the resulting list successfully instead of omitting when "--missing"
is requested?  That sounds like a right thing to do but at the same
time is a fix with major impact.  I do not offhand know what the
ramifications are, for example, when bitmap traversal is in use (I
assume such a missing commit would not be catalogued in the bitmap?).

Taylor, what do you think?
On Mon, Sep 25, 2023 at 6:57 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Karthik Nayak <karthik.188@gmail.com> writes:
>
> > Let me prefix with saying that I was partly wrong. `--missing` does work for
> > trees, only that it's ineffective when used along with the
> > `ignore_missing_links` bit.
> >
> > But for commits, `--missing` was never configured to work with them. I
> > did a quick look at the code, we can do something like this for
> > commits too, i.e. add support for the `--missing` option. We'll
> > have to add a new flag (maybe MISSING) so it can be set
> > within `repo_parse_commit_gently` so we can parse this as a
> > missing object in rev-list.c and act accordingly.
>
> Do you mean that process_parents() would now throw such a commit to
> the resulting list successfully instead of omitting when "--missing"
> is requested?  That sounds like a right thing to do but at the same
> time is a fix with major impact.

Yes, but with an appropriate flag set on it, which will be a new flag.

> I do not offhand know what the
> ramifications are, for example, when bitmap traversal is in use (I
> assume such a missing commit would not be catalogued in the bitmap?).
>

If there is a missing commit or object, will there even be a bitmap?
I can think of two scenarios:

1. Object is missing before bitmap creation: In such a scenario, the
   bitmap doesn't get created, since an object is missing. Could be
   any type of object.
2. Object is missing after bitmap creation: In this case, the bitmap
   already exists and rev-list won't even know that the commit is
   missing and will simply output the objects as if they exist.

Overall, this makes sense, but I'm curious to hear what Taylor has to say.

I also might post a patch series in this direction to consolidate our
thoughts and get feedback from the list.
diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index a4a0cb93b2..8ee713db3d 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -227,6 +227,15 @@ explicitly.
 	Upon seeing an invalid object name in the input, pretend as if
 	the bad input was not given.
 
+--ignore-missing-links::
+	During traversal, if an object that is referenced does not
+	exist, instead of dying of a repository corruption, pretend as
+	if the reference itself does not exist. Running the command
+	with the `--boundary` option makes these missing commits,
+	together with the commits on the edge of revision ranges
+	(i.e. true boundary objects), appear on the output, prefixed
+	with '-'.
+
 ifndef::git-rev-list[]
 --bisect::
 	Pretend as if the bad bisection ref `refs/bisect/bad`
diff --git a/revision.c b/revision.c
index 2f4c53ea20..cbfcbf6e28 100644
--- a/revision.c
+++ b/revision.c
@@ -2595,6 +2595,8 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->limited = 1;
 	} else if (!strcmp(arg, "--ignore-missing")) {
 		revs->ignore_missing = 1;
+	} else if (!strcmp(arg, "--ignore-missing-links")) {
+		revs->ignore_missing_links = 1;
 	} else if (opt && opt->allow_exclude_promisor_objects &&
 		   !strcmp(arg, "--exclude-promisor-objects")) {
 		if (fetch_if_missing)
diff --git a/t/t6022-rev-list-alternates.sh b/t/t6022-rev-list-alternates.sh
new file mode 100755
index 0000000000..9ba739c830
--- /dev/null
+++ b/t/t6022-rev-list-alternates.sh
@@ -0,0 +1,93 @@
+#!/bin/sh
+
+test_description='handling of alternates in rev-list'
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+# We create 5 commits and move them to the alt directory and
+# create 5 more commits which will stay in the main odb.
+test_expect_success 'create repository and alternate directory' '
+	test_commit_bulk 5 &&
+	git clone --reference=. --shared . alt &&
+	test_commit_bulk --start=6 -C alt 5
+'
+
+# When the alternate odb is provided, all commits are listed along with the boundary
+# commit.
+test_expect_success 'rev-list passes with alternate object directory' '
+	git -C alt rev-list --all --objects --no-object-names >actual.raw &&
+	{
+		git rev-list --all --objects --no-object-names &&
+		git -C alt rev-list --all --objects --no-object-names --not \
+			--alternate-refs
+	} >expect.raw &&
+	sort actual.raw >actual &&
+	sort expect.raw >expect &&
+	test_cmp expect actual
+'
+
+alt=alt/.git/objects/info/alternates
+
+hide_alternates () {
+	test -f "$alt.bak" || mv "$alt" "$alt.bak"
+}
+
+show_alternates () {
+	test -f "$alt" || mv "$alt.bak" "$alt"
+}
+
+# When the alternate odb is not provided, rev-list fails since the 5th commit's
+# parent is not present in the main odb.
+test_expect_success 'rev-list fails without alternate object directory' '
+	hide_alternates &&
+	test_must_fail git -C alt rev-list HEAD
+'
+
+# With `--ignore-missing-links`, we stop the traversal when we encounter a
+# missing link. The boundary commit is not listed as we haven't used the
+# `--boundary` options.
+test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
+	hide_alternates &&
+
+	git -C alt rev-list --objects --no-object-names \
+		--ignore-missing-links --missing=allow-any HEAD >actual.raw &&
+	git -C alt cat-file  --batch-check="%(objectname)" \
+		--batch-all-objects >expect.raw &&
+
+	sort actual.raw >actual &&
+	sort expect.raw >expect &&
+	test_cmp expect actual
+'
+
+# With `--ignore-missing-links` and `--boundary`, we can even print those boundary
+# commits.
+test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
+	git -C alt rev-list --ignore-missing-links --boundary HEAD >got &&
+	grep "^-$(git rev-parse HEAD)" got
+'
+
+test_expect_success "setup for rev-list --ignore-missing-links with missing objects" '
+	show_alternates &&
+	test_commit -C alt 11
+'
+
+for obj in "HEAD^{tree}" "HEAD:11.t"
+do
+	# The `--ignore-missing-links` option should ensure that git-rev-list(1)
+	# doesn't fail when used alongside `--objects` when a tree/blob is
+	# missing.
+	test_expect_success "rev-list --ignore-missing-links with missing $obj" '
+		oid="$(git -C alt rev-parse $obj)" &&
+		path="alt/.git/objects/$(test_oid_to_path $oid)" &&
+
+		mv "$path" "$path.hidden" &&
+		test_when_finished "mv $path.hidden $path" &&
+
+		git -C alt rev-list --ignore-missing-links --missing=allow-any --objects HEAD \
+			>actual &&
+		! grep $oid actual
+	'
+done
+
+test_done
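The `hide_alternates` and `show_alternates` helpers in the test script are written so that calling either one twice in a row is harmless, which matters because several tests call `hide_alternates` without knowing the current state. The following standalone sketch reproduces the two helpers against a scratch file instead of a real `objects/info/alternates`. The scratch path and the file's contents are assumptions made purely for illustration.

```shell
#!/bin/sh
# Sketch: the hide/show toggle from the test script, run against a scratch
# file. The path written into the file is a made-up placeholder.
tmp=$(mktemp -d) || exit 1
alt="$tmp/alternates"
echo "/hypothetical/main/.git/objects" >"$alt"

hide_alternates () {
	# Already hidden if the backup exists; otherwise move the file away.
	test -f "$alt.bak" || mv "$alt" "$alt.bak"
}

show_alternates () {
	# Already visible if the file exists; otherwise restore the backup.
	test -f "$alt" || mv "$alt.bak" "$alt"
}

# Calling each helper twice must succeed and leave a consistent state.
hide_alternates && hide_alternates
if test -f "$alt"
then
	state_after_hide=visible
else
	state_after_hide=hidden
fi

show_alternates && show_alternates
if test -f "$alt"
then
	state_after_show=visible
else
	state_after_show=hidden
fi

echo "after hide: $state_after_hide, after show: $state_after_show"
rm -rf "$tmp"
```

The `test -f ... ||` guard is what makes each helper idempotent: the `mv` only runs when the file is not already in the desired place.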
The revision backend is used by multiple porcelain commands such as
git-rev-list(1) and git-log(1). The backend currently supports ignoring
missing links by setting the `ignore_missing_links` bit. This allows the
revision walk to skip any object links that are missing. Expose this
bit via an `--ignore-missing-links` user option.

A scenario where this option would be used is to find the boundary
objects between different object directories. Consider a repository with
a main object directory (GIT_OBJECT_DIRECTORY) and one or more alternate
object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES). In such a
repository, enabling this option along with the `--boundary` option
while disabling the alternate object directory allows us to find the
boundary objects between the main and alternate object directory.

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---

Changes from v3:
1. Remove hard-coded skipping of finish_object__ma(...). This means that
   `--ignore-missing-links` needs to be used with `--missing=...` for
   missing non-commit objects, but it also now leaves that flexibility
   to the user. Fix the tests around this.
2. Fix an incorrect test.
3. Capitalize first character in test's comment.

Range-diff against v3:

1:  a08f3637a0 ! 1:  639a8cc385 revision: add `--ignore-missing-links` user option
    @@ Documentation/rev-list-options.txt: explicitly.
      --bisect::
      	Pretend as if the bad bisection ref `refs/bisect/bad`
     
    - ## builtin/rev-list.c ##
    -@@ builtin/rev-list.c: static int finish_object(struct object *obj, const char *name UNUSED,
    - {
    - 	struct rev_list_info *info = cb_data;
    - 	if (oid_object_info_extended(the_repository, &obj->oid, NULL, 0) < 0) {
    --		finish_object__ma(obj);
    -+		if (!info->revs->ignore_missing_links)
    -+			finish_object__ma(obj);
    - 		return 1;
    - 	}
    - 	if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)
    -
      ## revision.c ##
     @@ revision.c: static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
      		revs->limited = 1;
    @@ t/t6022-rev-list-alternates.sh (new)
     +	test_commit_bulk --start=6 -C alt 5
     +'
     +
    -+# when the alternate odb is provided, all commits are listed along with the boundary
    ++# When the alternate odb is provided, all commits are listed along with the boundary
     +# commit.
     +test_expect_success 'rev-list passes with alternate object directory' '
     +	git -C alt rev-list --all --objects --no-object-names >actual.raw &&
    @@ t/t6022-rev-list-alternates.sh (new)
     +	hide_alternates &&
     +
     +	git -C alt rev-list --objects --no-object-names \
    -+		--ignore-missing-links HEAD >actual.raw &&
    ++		--ignore-missing-links --missing=allow-any HEAD >actual.raw &&
     +	git -C alt cat-file  --batch-check="%(objectname)" \
     +		--batch-all-objects >expect.raw &&
     +
     +	sort actual.raw >actual &&
     +	sort expect.raw >expect &&
    -+	test_must_fail git -C alt rev-list HEAD
    ++	test_cmp expect actual
     +'
     +
     +# With `--ignore-missing-links` and `--boundary`, we can even print those boundary
    @@ t/t6022-rev-list-alternates.sh (new)
     +		mv "$path" "$path.hidden" &&
     +		test_when_finished "mv $path.hidden $path" &&
     +
    -+		git -C alt rev-list --ignore-missing-links --objects HEAD \
    ++		git -C alt rev-list --ignore-missing-links --missing=allow-any --objects HEAD \
     +			>actual &&
     +		! grep $oid actual
     +	'

 Documentation/rev-list-options.txt |  9 +++
 revision.c                         |  2 +
 t/t6022-rev-list-alternates.sh     | 93 ++++++++++++++++++++++++++++++
 3 files changed, 104 insertions(+)
 create mode 100755 t/t6022-rev-list-alternates.sh