diff mbox series

[v2,4/4] rev-list: allow missing tips with --missing=[print|allow*]

Message ID 20240208135055.2705260-5-christian.couder@gmail.com (mailing list archive)
State Superseded
Headers show
Series rev-list: allow missing tips with --missing | expand

Commit Message

Christian Couder Feb. 8, 2024, 1:50 p.m. UTC
In 9830926c7d (rev-list: add commit object support in `--missing`
option, 2023-10-27) we fixed the `--missing` option in `git rev-list`
so that it works with with missing commits, not just blobs/trees.

Unfortunately, such a command would still fail with a "fatal: bad
object <oid>" if it is passed a missing commit, blob or tree as an
argument (before the rev walking even begins).

When such a command is used to find the dependencies of some objects,
for example the dependencies of quarantined objects (see the
"QUARANTINE ENVIRONMENT" section in the git-receive-pack(1)
documentation), it would be better if the command would instead
consider such missing objects, especially commits, in the same way as
other missing objects.

If, for example `--missing=print` is used, it would be nice for some
use cases if the missing tips passed as arguments were reported in
the same way as other missing objects instead of the command just
failing.

We could introduce a new option to make it work like this, but most
users are likely to prefer the command to have this behavior as the
default one. Introducing a new option would require another dumb loop
to look for that options early, which isn't nice.

Also we made `git rev-list` work with missing commits very recently
and the command is most often passed commits as arguments. So let's
consider this as a bug fix related to these previous change.

While at it let's add a NEEDSWORK comment to say that we should get
rid of the existing ugly dumb loops that parse the
`--exclude-promisor-objects` and `--missing=...` options early.

Helped-by: Linus Arver <linusa@google.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Documentation/rev-list-options.txt |  4 +++
 builtin/rev-list.c                 | 15 +++++++-
 revision.c                         | 14 ++++++--
 t/t6022-rev-list-missing.sh        | 56 ++++++++++++++++++++++++++++++
 4 files changed, 85 insertions(+), 4 deletions(-)

Comments

Junio C Hamano Feb. 8, 2024, 5:44 p.m. UTC | #1
Christian Couder <christian.couder@gmail.com> writes:

>  The form '--missing=print' is like 'allow-any', but will also print a
>  list of the missing objects.  Object IDs are prefixed with a ``?'' character.
> ++
> +If some tips passed to the traversal are missing, they will be
> +considered as missing too, and the traversal will ignore them. In case
> +we cannot get their Object ID though, an error will be raised.

Makes sense.

> diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> index b3f4783858..ec9556f135 100644
> --- a/builtin/rev-list.c
> +++ b/builtin/rev-list.c
> @@ -545,6 +545,15 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
>  	 *
>  	 * Let "--missing" to conditionally set fetch_if_missing.
>  	 */
> +	/*
> +	 * NEEDSWORK: These dump loops to look for some options early
> +	 * are ugly. We really need setup_revisions() to have a

I would drop the "dump" and "ugly" from here.  It OK to make value
judgement with such words in casual conversations on a proposed
patch, but when we make a request to future developers to fix our
mess, we should be more objective and stick to the techincal facts,
so that they have better understanding on why we think this area
needs more work.

Perhaps something like:

    These loops that attempt to find presence of options without
    understanding what the options they are skipping are broken
    (e.g., it would not know "--grep --exclude-promisor-objects" is
    not triggering "--exclude-promisor-objects" option).

Everything after "We really need" is good (modulo possible grammos),
I think.  Thanks for writing it.

> +	 * mechanism to allow and disallow some sets of options for
> +	 * different commands (like rev-list, replay, etc). Such
> +	 * mechanism should do an early parsing of option and be able
> +	 * to manage the `--exclude-promisor-objects` and `--missing=...`
> +	 * options below.
> +	 */
>  	for (i = 1; i < argc; i++) {
>  		const char *arg = argv[i];
>  		if (!strcmp(arg, "--exclude-promisor-objects")) {
> @@ -753,8 +762,12 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
>  
>  	if (arg_print_omitted)
>  		oidset_init(&omitted_objects, DEFAULT_OIDSET_SIZE);
> -	if (arg_missing_action == MA_PRINT)
> +	if (arg_missing_action == MA_PRINT) {
>  		oidset_init(&missing_objects, DEFAULT_OIDSET_SIZE);
> +		/* Already add missing tips */
> +		oidset_insert_from_set(&missing_objects, &revs.missing_commits);
> +		oidset_clear(&revs.missing_commits);
> +	}

It is unclear what "already" here refers to, at least to me.

Thanks.
Linus Arver Feb. 13, 2024, 10:33 p.m. UTC | #2
Christian Couder <christian.couder@gmail.com> writes:

> In 9830926c7d (rev-list: add commit object support in `--missing`
> option, 2023-10-27) we fixed the `--missing` option in `git rev-list`
> so that it works with with missing commits, not just blobs/trees.
>
> Unfortunately, such a command would still fail with a "fatal: bad
> object <oid>" if it is passed a missing commit, blob or tree as an
> argument (before the rev walking even begins).
>
> When such a command is used to find the dependencies of some objects,
> for example the dependencies of quarantined objects (see the
> "QUARANTINE ENVIRONMENT" section in the git-receive-pack(1)
> documentation), it would be better if the command would instead
> consider such missing objects, especially commits, in the same way as
> other missing objects.

Looks good.

> If, for example `--missing=print` is used, it would be nice for some
> use cases if the missing tips passed as arguments were reported in
> the same way as other missing objects instead of the command just
> failing.

Nit: this paragraph sounds a bit redundant but it is giving an explicit
example of `--missing=print` which was not discussed so far, so I think
it's fine as is.

> We could introduce a new option to make it work like this, but most
> users are likely to prefer the command to have this behavior as the
> default one. Introducing a new option would require another dumb loop
> to look for that options early, which isn't nice.

s/options/option

> Also we made `git rev-list` work with missing commits very recently
> and the command is most often passed commits as arguments. So let's
> consider this as a bug fix related to these previous change.

s/previous change/recent changes

> While at it let's add a NEEDSWORK comment to say that we should get
> rid of the existing ugly dumb loops that parse the
> `--exclude-promisor-objects` and `--missing=...` options early.
>
> Helped-by: Linus Arver <linusa@google.com>
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---
>  Documentation/rev-list-options.txt |  4 +++
>  builtin/rev-list.c                 | 15 +++++++-
>  revision.c                         | 14 ++++++--
>  t/t6022-rev-list-missing.sh        | 56 ++++++++++++++++++++++++++++++
>  4 files changed, 85 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
> index a583b52c61..bb753b6292 100644
> --- a/Documentation/rev-list-options.txt
> +++ b/Documentation/rev-list-options.txt
> @@ -1019,6 +1019,10 @@ Unexpected missing objects will raise an error.
>  +
>  The form '--missing=print' is like 'allow-any', but will also print a
>  list of the missing objects.  Object IDs are prefixed with a ``?'' character.
> ++
> +If some tips passed to the traversal are missing, they will be
> +considered as missing too, and the traversal will ignore them. In case
> +we cannot get their Object ID though, an error will be raised.

The only other mention of the term "tips" is for the "--alternate-refs"
flag on line 215 which uses "ref tips". Perhaps this is obvious to
veteran Git developers but I do wonder if we need to somehow define it
(however briefly) the first time we mention it (either in the document
as a whole, or just within this newly added paragraph).

Here's an alternate wording

    Ref tips given on the command line are a special case. They are
    first dereferenced to Object IDs (if this is not possible, an error
    will be raised). Then these IDs are checked to see if the objects
    they refer to exist. If so, the traversal happens starting with
    these tips. Otherwise, then such tips will not be used for
    traversal.

    Even though such missing tips won't be included for traversal, for
    purposes of the `--missing` flag they will be treated the same way
    as those objects that did get traversed (and were determined to be
    missing). For example, if `--missing=print` is given then the Object
    IDs of these tips will be printed just like all other missing
    objects encountered during traversal.

But also, I realize that these documentation touch-ups might be better
served by an overall pass over the whole document, so it's fine if we
decide not to take this suggestion at this time.

Aside: unfortunately we don't seem to define the relationship between
ref tips (e.g., "HEAD"), object IDs (40-char hex string), and the actual
objects (the real data that we traverse over). It's probably another
thing that could be fixed up in the docs in the future.

>  --exclude-promisor-objects::
>  	(For internal use only.)  Prefilter object traversal at
> diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> index b3f4783858..ec9556f135 100644
> --- a/builtin/rev-list.c
> +++ b/builtin/rev-list.c
> @@ -545,6 +545,15 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
>  	 *
>  	 * Let "--missing" to conditionally set fetch_if_missing.
>  	 */
> +	/*
> +	 * NEEDSWORK: These dump loops to look for some options early
> +	 * are ugly.

I agree with Junio's suggestion to use more objective language.

> We really need setup_revisions() to have a
> +	 * mechanism to allow and disallow some sets of options for
> +	 * different commands (like rev-list, replay, etc). Such

s/Such/Such a

> +	 * mechanism should do an early parsing of option and be able

s/option/options

> +	 * to manage the `--exclude-promisor-objects` and `--missing=...`
> +	 * options below.
> +	 */
>  	for (i = 1; i < argc; i++) {
>  		const char *arg = argv[i];
>  		if (!strcmp(arg, "--exclude-promisor-objects")) {
> 
> [...]
> 
> @@ -2178,13 +2183,18 @@ static int handle_revision_arg_1(const char *arg_, struct rev_info *revs, int fl
>  	if (revarg_opt & REVARG_COMMITTISH)
>  		get_sha1_flags |= GET_OID_COMMITTISH;
>  
> +	/*
> +	 * Even if revs->do_not_die_on_missing_objects is set, we
> +	 * should error out if we can't even get an oid, ...
> +	 */

Perhaps this wording is more precise?

    If we can't even get an oid, we are forced to error out (regardless
    of revs->do_not_die_on_missing_objects) because a valid traversal
    must start from *some* valid oid. OTOH we ignore the ref tip
    altogether with revs->ignore_missing.

> +	 * ... as
> +	 * `--missing=print` should be able to report missing oids.

I think this comment would be better placed ...

>  	if (get_oid_with_context(revs->repo, arg, get_sha1_flags, &oid, &oc))
>  		return revs->ignore_missing ? 0 : -1;
>  	if (!cant_be_filename)
>  		verify_non_filename(revs->prefix, arg);
>  	object = get_reference(revs, arg, &oid, flags ^ local_flags);

... around here.

>  	if (!object)
> -		return revs->ignore_missing ? 0 : -1;
> +		return (revs->ignore_missing || revs->do_not_die_on_missing_objects) ? 0 : -1;
>  	add_rev_cmdline(revs, object, arg_, REV_CMD_REV, flags ^ local_flags);
>  	add_pending_object_with_path(revs, object, arg, oc.mode, oc.path);
>  	free(oc.path);
> @@ -3830,8 +3840,6 @@ int prepare_revision_walk(struct rev_info *revs)
>  				       FOR_EACH_OBJECT_PROMISOR_ONLY);
>  	}
>  
> -	oidset_init(&revs->missing_commits, 0);
> -
>  	if (!revs->reflog_info)
>  		prepare_to_use_bloom_filter(revs);
>  	if (!revs->unsorted_input)
> diff --git a/t/t6022-rev-list-missing.sh b/t/t6022-rev-list-missing.sh
> index 5f1be7abb5..78387eebb3 100755
> --- a/t/t6022-rev-list-missing.sh
> +++ b/t/t6022-rev-list-missing.sh
> @@ -78,4 +78,60 @@ do
>  	done
>  done
>  
> +for missing_tip in "HEAD~1" "HEAD~1^{tree}" "HEAD:1.t"
> +do
> +	# We want to check that things work when both
> +	#   - all the tips passed are missing (case existing_tip = ""), and
> +	#   - there is one missing tip and one existing tip (case existing_tip = "HEAD")
> +	for existing_tip in "" "HEAD"
> +	do

Though I am biased, these new variable names do make this test that much
easier to read. Thanks.
Linus Arver Feb. 13, 2024, 10:38 p.m. UTC | #3
Junio C Hamano <gitster@pobox.com> writes:

> Christian Couder <christian.couder@gmail.com> writes:
>
>> +	 * mechanism to allow and disallow some sets of options for
>> +	 * different commands (like rev-list, replay, etc). Such
>> +	 * mechanism should do an early parsing of option and be able
>> +	 * to manage the `--exclude-promisor-objects` and `--missing=...`
>> +	 * options below.
>> +	 */
>>  	for (i = 1; i < argc; i++) {
>>  		const char *arg = argv[i];
>>  		if (!strcmp(arg, "--exclude-promisor-objects")) {
>> @@ -753,8 +762,12 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
>>  
>>  	if (arg_print_omitted)
>>  		oidset_init(&omitted_objects, DEFAULT_OIDSET_SIZE);
>> -	if (arg_missing_action == MA_PRINT)
>> +	if (arg_missing_action == MA_PRINT) {
>>  		oidset_init(&missing_objects, DEFAULT_OIDSET_SIZE);
>> +		/* Already add missing tips */
>> +		oidset_insert_from_set(&missing_objects, &revs.missing_commits);
>> +		oidset_clear(&revs.missing_commits);
>> +	}
>
> It is unclear what "already" here refers to, at least to me.

It's grammatically correct but perhaps a bit "over eager" (gives the
impression that we get these missing tips all the time and is a common
"happy" path). I do still think my earlier v1 comments

    Did you mean "Add already-missing commits"? Perhaps even more explicit
    would be "Add missing tips"?

are relevant here.
Christian Couder Feb. 14, 2024, 2:34 p.m. UTC | #4
On Tue, Feb 13, 2024 at 11:38 PM Linus Arver <linusa@google.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > Christian Couder <christian.couder@gmail.com> writes:
> >
> >> +     * mechanism to allow and disallow some sets of options for
> >> +     * different commands (like rev-list, replay, etc). Such
> >> +     * mechanism should do an early parsing of option and be able
> >> +     * to manage the `--exclude-promisor-objects` and `--missing=...`
> >> +     * options below.
> >> +     */
> >>      for (i = 1; i < argc; i++) {
> >>              const char *arg = argv[i];
> >>              if (!strcmp(arg, "--exclude-promisor-objects")) {
> >> @@ -753,8 +762,12 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
> >>
> >>      if (arg_print_omitted)
> >>              oidset_init(&omitted_objects, DEFAULT_OIDSET_SIZE);
> >> -    if (arg_missing_action == MA_PRINT)
> >> +    if (arg_missing_action == MA_PRINT) {
> >>              oidset_init(&missing_objects, DEFAULT_OIDSET_SIZE);
> >> +            /* Already add missing tips */
> >> +            oidset_insert_from_set(&missing_objects, &revs.missing_commits);
> >> +            oidset_clear(&revs.missing_commits);
> >> +    }
> >
> > It is unclear what "already" here refers to, at least to me.

I wanted to hint that we already have some missing objects that we can
add to the set. But it's not an important detail and I agree it can be
confusing.

> It's grammatically correct but perhaps a bit "over eager" (gives the
> impression that we get these missing tips all the time and is a common
> "happy" path). I do still think my earlier v1 comments
>
>     Did you mean "Add already-missing commits"? Perhaps even more explicit
>     would be "Add missing tips"?

"Add missing tips" is used in the V3 I just sent. Thanks.

> are relevant here.
Christian Couder Feb. 14, 2024, 2:38 p.m. UTC | #5
On Tue, Feb 13, 2024 at 11:33 PM Linus Arver <linusa@google.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:

> > We could introduce a new option to make it work like this, but most
> > users are likely to prefer the command to have this behavior as the
> > default one. Introducing a new option would require another dumb loop
> > to look for that options early, which isn't nice.
>
> s/options/option

Fixed in the V3 I just sent.

> > Also we made `git rev-list` work with missing commits very recently
> > and the command is most often passed commits as arguments. So let's
> > consider this as a bug fix related to these previous change.
>
> s/previous change/recent changes

Fixed in V3, thanks.

> > While at it let's add a NEEDSWORK comment to say that we should get
> > rid of the existing ugly dumb loops that parse the
> > `--exclude-promisor-objects` and `--missing=...` options early.

> > @@ -1019,6 +1019,10 @@ Unexpected missing objects will raise an error.
> >  +
> >  The form '--missing=print' is like 'allow-any', but will also print a
> >  list of the missing objects.  Object IDs are prefixed with a ``?'' character.
> > ++
> > +If some tips passed to the traversal are missing, they will be
> > +considered as missing too, and the traversal will ignore them. In case
> > +we cannot get their Object ID though, an error will be raised.
>
> The only other mention of the term "tips" is for the "--alternate-refs"
> flag on line 215 which uses "ref tips". Perhaps this is obvious to
> veteran Git developers but I do wonder if we need to somehow define it
> (however briefly) the first time we mention it (either in the document
> as a whole, or just within this newly added paragraph).

I did a quick grep in Documentation/git*.txt and found more than 130
instances of the 'tip' word. So I think it is quite common in the
whole Git documentation and our glossary would likely be the right
place to document it if we decide to do so. Anyway I think that topic
is independent from this small series.

> Here's an alternate wording
>
>     Ref tips given on the command line are a special case.

`git rev-list` has a `--stdin` mode which makes it accept tips from
stdin, so talking about the command line is not enough. Also maybe one
day some config option could be added that makes the command include
additional tips.

> They are
>     first dereferenced to Object IDs (if this is not possible, an error
>     will be raised). Then these IDs are checked to see if the objects
>     they refer to exist. If so, the traversal happens starting with
>     these tips. Otherwise, then such tips will not be used for
>     traversal.
>
>     Even though such missing tips won't be included for traversal, for
>     purposes of the `--missing` flag they will be treated the same way
>     as those objects that did get traversed (and were determined to be
>     missing). For example, if `--missing=print` is given then the Object
>     IDs of these tips will be printed just like all other missing
>     objects encountered during traversal.

Your wording describes what happens correctly, but I don't see much
added value for this specific patch compared to my wording which is
shorter.

Here I think, we only need to describe the result of the command in
the special case that the patch is fixing. We don't need to go into
details of how the command or even --missing works. It could be
interesting to go into details of how things work, but I think it's a
separate topic. Or perhaps it's even actually counter productive to go
into too much detail as it could prevent us from finding other ways to
make it work better. Anyway it seems to me to be a separate topic to
discuss.

> But also, I realize that these documentation touch-ups might be better
> served by an overall pass over the whole document, so it's fine if we
> decide not to take this suggestion at this time.

Right, I agree. Thanks for telling this.

> Aside: unfortunately we don't seem to define the relationship between
> ref tips (e.g., "HEAD"), object IDs (40-char hex string), and the actual
> objects (the real data that we traverse over). It's probably another
> thing that could be fixed up in the docs in the future.

Yeah, and for rev-list a tip could also be a tree or a blob. It
doesn't need to be a "ref tip". (Even though a ref can point to a tree
or a blog, it's very rare in practice.)

> >  --exclude-promisor-objects::
> >       (For internal use only.)  Prefilter object traversal at
> > diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> > index b3f4783858..ec9556f135 100644
> > --- a/builtin/rev-list.c
> > +++ b/builtin/rev-list.c
> > @@ -545,6 +545,15 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
> >        *
> >        * Let "--missing" to conditionally set fetch_if_missing.
> >        */
> > +     /*
> > +      * NEEDSWORK: These dump loops to look for some options early
> > +      * are ugly.
>
> I agree with Junio's suggestion to use more objective language.
>
> > We really need setup_revisions() to have a
> > +      * mechanism to allow and disallow some sets of options for
> > +      * different commands (like rev-list, replay, etc). Such
>
> s/Such/Such a

Fixed in V3

> > +      * mechanism should do an early parsing of option and be able
>
> s/option/options

Fixed in V3, thanks.

> > +      * to manage the `--exclude-promisor-objects` and `--missing=...`
> > +      * options below.
> > +      */
> >       for (i = 1; i < argc; i++) {
> >               const char *arg = argv[i];
> >               if (!strcmp(arg, "--exclude-promisor-objects")) {
> >
> > [...]
> >
> > @@ -2178,13 +2183,18 @@ static int handle_revision_arg_1(const char *arg_, struct rev_info *revs, int fl
> >       if (revarg_opt & REVARG_COMMITTISH)
> >               get_sha1_flags |= GET_OID_COMMITTISH;
> >
> > +     /*
> > +      * Even if revs->do_not_die_on_missing_objects is set, we
> > +      * should error out if we can't even get an oid, ...
> > +      */
>
> Perhaps this wording is more precise?
>
>     If we can't even get an oid, we are forced to error out (regardless
>     of revs->do_not_die_on_missing_objects) because a valid traversal
>     must start from *some* valid oid. OTOH we ignore the ref tip
>     altogether with revs->ignore_missing.

This uses "valid oid" and "valid traversal", but I am not sure it's
easy to understand what "valid" means in both of these expressions.

Also if all the tips passed to the command are missing, the traversal
doesn't need to actually start. The command, assuming
`--missing=print` is passed, just needs to output the oids of the tips
as missing oids.

I also think that "ref tip" might be misleading as trees and blos can
be passed as tips.

So I prefer to keep the wording I used.

> > +      * ... as
> > +      * `--missing=print` should be able to report missing oids.
>
> I think this comment would be better placed ...
>
> >       if (get_oid_with_context(revs->repo, arg, get_sha1_flags, &oid, &oc))
> >               return revs->ignore_missing ? 0 : -1;
> >       if (!cant_be_filename)
> >               verify_non_filename(revs->prefix, arg);
> >       object = get_reference(revs, arg, &oid, flags ^ local_flags);
>
> ... around here.

In a previous round, I was asked to put such a comment before `if
(get_oid_with_context(...))`. So I prefer to avoid some back and forth
here.

> > +++ b/t/t6022-rev-list-missing.sh
> > @@ -78,4 +78,60 @@ do
> >       done
> >  done
> >
> > +for missing_tip in "HEAD~1" "HEAD~1^{tree}" "HEAD:1.t"
> > +do
> > +     # We want to check that things work when both
> > +     #   - all the tips passed are missing (case existing_tip = ""), and
> > +     #   - there is one missing tip and one existing tip (case existing_tip = "HEAD")
> > +     for existing_tip in "" "HEAD"
> > +     do
>
> Though I am biased, these new variable names do make this test that much
> easier to read. Thanks.

Thanks for suggesting them and for your reviews.
Christian Couder Feb. 14, 2024, 2:39 p.m. UTC | #6
On Thu, Feb 8, 2024 at 6:44 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:

> > diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> > index b3f4783858..ec9556f135 100644
> > --- a/builtin/rev-list.c
> > +++ b/builtin/rev-list.c
> > @@ -545,6 +545,15 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
> >        *
> >        * Let "--missing" to conditionally set fetch_if_missing.
> >        */
> > +     /*
> > +      * NEEDSWORK: These dump loops to look for some options early
> > +      * are ugly. We really need setup_revisions() to have a
>
> I would drop the "dump" and "ugly" from here.  It OK to make value
> judgement with such words in casual conversations on a proposed
> patch, but when we make a request to future developers to fix our
> mess, we should be more objective and stick to the techincal facts,
> so that they have better understanding on why we think this area
> needs more work.
>
> Perhaps something like:
>
>     These loops that attempt to find presence of options without
>     understanding what the options they are skipping are broken
>     (e.g., it would not know "--grep --exclude-promisor-objects" is
>     not triggering "--exclude-promisor-objects" option).

I have used what you suggest in V3, except for s/what/that/.

> Everything after "We really need" is good (modulo possible grammos),
> I think.  Thanks for writing it.
>
> > +      * mechanism to allow and disallow some sets of options for
> > +      * different commands (like rev-list, replay, etc). Such
> > +      * mechanism should do an early parsing of option and be able
> > +      * to manage the `--exclude-promisor-objects` and `--missing=...`
> > +      * options below.
> > +      */
> >       for (i = 1; i < argc; i++) {
> >               const char *arg = argv[i];
> >               if (!strcmp(arg, "--exclude-promisor-objects")) {
> > @@ -753,8 +762,12 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
> >
> >       if (arg_print_omitted)
> >               oidset_init(&omitted_objects, DEFAULT_OIDSET_SIZE);
> > -     if (arg_missing_action == MA_PRINT)
> > +     if (arg_missing_action == MA_PRINT) {
> >               oidset_init(&missing_objects, DEFAULT_OIDSET_SIZE);
> > +             /* Already add missing tips */
> > +             oidset_insert_from_set(&missing_objects, &revs.missing_commits);
> > +             oidset_clear(&revs.missing_commits);
> > +     }
>
> It is unclear what "already" here refers to, at least to me.

I removed it and changed the comment to just "/* Add missing tips */"
in the V3 I just sent.

Thanks.
Junio C Hamano Feb. 14, 2024, 4:49 p.m. UTC | #7
Christian Couder <christian.couder@gmail.com> writes:

>> >> +            /* Already add missing tips */
>> >> +            oidset_insert_from_set(&missing_objects, &revs.missing_commits);
>> >> +            oidset_clear(&revs.missing_commits);
>> >> +    }
>> >
>> > It is unclear what "already" here refers to, at least to me.
>
> I wanted to hint that we already have some missing objects that we can
> add to the set. But it's not an important detail and I agree it can be
> confusing.

I was confused primarily because "already" was not sitting next to
"missing".  I would have understood "Add already-missing tips" just
fine ;-)
Linus Arver Feb. 16, 2024, 1:24 a.m. UTC | #8
Christian Couder <christian.couder@gmail.com> writes:

> On Tue, Feb 13, 2024 at 11:33 PM Linus Arver <linusa@google.com> wrote:
>>
>> Christian Couder <christian.couder@gmail.com> writes:
>> > While at it let's add a NEEDSWORK comment to say that we should get
>> > rid of the existing ugly dumb loops that parse the
>> > `--exclude-promisor-objects` and `--missing=...` options early.
>
>> > @@ -1019,6 +1019,10 @@ Unexpected missing objects will raise an error.
>> >  +
>> >  The form '--missing=print' is like 'allow-any', but will also print a
>> >  list of the missing objects.  Object IDs are prefixed with a ``?'' character.
>> > ++
>> > +If some tips passed to the traversal are missing, they will be
>> > +considered as missing too, and the traversal will ignore them. In case
>> > +we cannot get their Object ID though, an error will be raised.
>>
>> The only other mention of the term "tips" is for the "--alternate-refs"
>> flag on line 215 which uses "ref tips". Perhaps this is obvious to
>> veteran Git developers but I do wonder if we need to somehow define it
>> (however briefly) the first time we mention it (either in the document
>> as a whole, or just within this newly added paragraph).
>
> I did a quick grep in Documentation/git*.txt and found more than 130
> instances of the 'tip' word. So I think it is quite common in the
> whole Git documentation and our glossary would likely be the right
> place to document it if we decide to do so.

SGTM.

> Anyway I think that topic
> is independent from this small series.

Agreed.

>> Here's an alternate wording
>>
>>     Ref tips given on the command line are a special case.
>
> `git rev-list` has a `--stdin` mode which makes it accept tips from
> stdin

Ah, thanks for the context.

> , so talking about the command line is not enough. Also maybe one
> day some config option could be added that makes the command include
> additional tips.

>> They are
>>     first dereferenced to Object IDs (if this is not possible, an error
>>     will be raised). Then these IDs are checked to see if the objects
>>     they refer to exist. If so, the traversal happens starting with
>>     these tips. Otherwise, then such tips will not be used for
>>     traversal.
>>
>>     Even though such missing tips won't be included for traversal, for
>>     purposes of the `--missing` flag they will be treated the same way
>>     as those objects that did get traversed (and were determined to be
>>     missing). For example, if `--missing=print` is given then the Object
>>     IDs of these tips will be printed just like all other missing
>>     objects encountered during traversal.
>
> Your wording describes what happens correctly, but I don't see much
> added value for this specific patch compared to my wording which is
> shorter.
>
> Here I think, we only need to describe the result of the command in
> the special case that the patch is fixing. We don't need to go into
> details of how the command or even --missing works. It could be
> interesting to go into details of how things work, but I think it's a
> separate topic. Or perhaps it's even actually counter productive to go
> into too much detail as it could prevent us from finding other ways to
> make it work better. Anyway it seems to me to be a separate topic to
> discuss.

Fair enough.

>> But also, I realize that these documentation touch-ups might be better
>> served by an overall pass over the whole document, so it's fine if we
>> decide not to take this suggestion at this time.
>
> Right, I agree. Thanks for telling this.
>
>> Aside: unfortunately we don't seem to define the relationship between
>> ref tips (e.g., "HEAD"), object IDs (40-char hex string), and the actual
>> objects (the real data that we traverse over). It's probably another
>> thing that could be fixed up in the docs in the future.
>
> Yeah, and for rev-list a tip could also be a tree or a blob. It
> doesn't need to be a "ref tip". (Even though a ref can point to a tree
> or a blog, it's very rare in practice.)

Interesting, thanks for the info.

BTW I appreciate you (and others on the list too) taking the time to
explain such subtleties. Although I've been using Git since 2008 a lot
of the terms used around in the codebase can feel quite foreign to me.
So, thanks again.

>> > +      * to manage the `--exclude-promisor-objects` and `--missing=...`
>> > +      * options below.
>> > +      */
>> >       for (i = 1; i < argc; i++) {
>> >               const char *arg = argv[i];
>> >               if (!strcmp(arg, "--exclude-promisor-objects")) {
>> >
>> > [...]
>> >
>> > @@ -2178,13 +2183,18 @@ static int handle_revision_arg_1(const char *arg_, struct rev_info *revs, int fl
>> >       if (revarg_opt & REVARG_COMMITTISH)
>> >               get_sha1_flags |= GET_OID_COMMITTISH;
>> >
>> > +     /*
>> > +      * Even if revs->do_not_die_on_missing_objects is set, we
>> > +      * should error out if we can't even get an oid, ...
>> > +      */
>>
>> Perhaps this wording is more precise?
>>
>>     If we can't even get an oid, we are forced to error out (regardless
>>     of revs->do_not_die_on_missing_objects) because a valid traversal
>>     must start from *some* valid oid. OTOH we ignore the ref tip
>>     altogether with revs->ignore_missing.
>
> This uses "valid oid" and "valid traversal", but I am not sure it's
> easy to understand what "valid" means in both of these expressions.
>
> Also if all the tips passed to the command are missing, the traversal
> doesn't need to actually start. The command, assuming
> `--missing=print` is passed, just needs to output the oids of the tips
> as missing oids.
>
> I also think that "ref tip" might be misleading as trees and blos can
> be passed as tips.
>
> So I prefer to keep the wording I used.

Makes sense, SGTM.

>> > +      * ... as
>> > +      * `--missing=print` should be able to report missing oids.
>>
>> I think this comment would be better placed ...
>>
>> >       if (get_oid_with_context(revs->repo, arg, get_sha1_flags, &oid, &oc))
>> >               return revs->ignore_missing ? 0 : -1;
>> >       if (!cant_be_filename)
>> >               verify_non_filename(revs->prefix, arg);
>> >       object = get_reference(revs, arg, &oid, flags ^ local_flags);
>>
>> ... around here.
>
> In a previous round, I was asked to put such a comment before `if
> (get_oid_with_context(...))`.

Sorry, I missed that.

> So I prefer to avoid some back and forth
> here.

SGTM.

>> > +++ b/t/t6022-rev-list-missing.sh
>> > @@ -78,4 +78,60 @@ do
>> >       done
>> >  done
>> >
>> > +for missing_tip in "HEAD~1" "HEAD~1^{tree}" "HEAD:1.t"
>> > +do
>> > +     # We want to check that things work when both
>> > +     #   - all the tips passed are missing (case existing_tip = ""), and
>> > +     #   - there is one missing tip and one existing tip (case existing_tip = "HEAD")
>> > +     for existing_tip in "" "HEAD"
>> > +     do
>>
>> Though I am biased, these new variable names do make this test that much
>> easier to read. Thanks.
>
> Thanks for suggesting them and for your reviews.

You're welcome!
diff mbox series

Patch

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index a583b52c61..bb753b6292 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -1019,6 +1019,10 @@  Unexpected missing objects will raise an error.
 +
 The form '--missing=print' is like 'allow-any', but will also print a
 list of the missing objects.  Object IDs are prefixed with a ``?'' character.
++
+If some tips passed to the traversal are missing, they will be
+considered as missing too, and the traversal will ignore them. In case
+we cannot get their Object ID though, an error will be raised.
 
 --exclude-promisor-objects::
 	(For internal use only.)  Prefilter object traversal at
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index b3f4783858..ec9556f135 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -545,6 +545,15 @@  int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	 *
 	 * Let "--missing" to conditionally set fetch_if_missing.
 	 */
+	/*
+	 * NEEDSWORK: These dump loops to look for some options early
+	 * are ugly. We really need setup_revisions() to have a
+	 * mechanism to allow and disallow some sets of options for
+	 * different commands (like rev-list, replay, etc). Such
+	 * mechanism should do an early parsing of option and be able
+	 * to manage the `--exclude-promisor-objects` and `--missing=...`
+	 * options below.
+	 */
 	for (i = 1; i < argc; i++) {
 		const char *arg = argv[i];
 		if (!strcmp(arg, "--exclude-promisor-objects")) {
@@ -753,8 +762,12 @@  int cmd_rev_list(int argc, const char **argv, const char *prefix)
 
 	if (arg_print_omitted)
 		oidset_init(&omitted_objects, DEFAULT_OIDSET_SIZE);
-	if (arg_missing_action == MA_PRINT)
+	if (arg_missing_action == MA_PRINT) {
 		oidset_init(&missing_objects, DEFAULT_OIDSET_SIZE);
+		/* Already add missing tips */
+		oidset_insert_from_set(&missing_objects, &revs.missing_commits);
+		oidset_clear(&revs.missing_commits);
+	}
 
 	traverse_commit_list_filtered(
 		&revs, show_commit, show_object, &info,
diff --git a/revision.c b/revision.c
index 4c5cd7c3ce..80c349d347 100644
--- a/revision.c
+++ b/revision.c
@@ -388,6 +388,10 @@  static struct object *get_reference(struct rev_info *revs, const char *name,
 			return NULL;
 		if (revs->exclude_promisor_objects && is_promisor_object(oid))
 			return NULL;
+		if (revs->do_not_die_on_missing_objects) {
+			oidset_insert(&revs->missing_commits, oid);
+			return NULL;
+		}
 		die("bad object %s", name);
 	}
 	object->flags |= flags;
@@ -1947,6 +1951,7 @@  void repo_init_revisions(struct repository *r,
 	init_display_notes(&revs->notes_opt);
 	list_objects_filter_init(&revs->filter);
 	init_ref_exclusions(&revs->ref_excludes);
+	oidset_init(&revs->missing_commits, 0);
 }
 
 static void add_pending_commit_list(struct rev_info *revs,
@@ -2178,13 +2183,18 @@  static int handle_revision_arg_1(const char *arg_, struct rev_info *revs, int fl
 	if (revarg_opt & REVARG_COMMITTISH)
 		get_sha1_flags |= GET_OID_COMMITTISH;
 
+	/*
+	 * Even if revs->do_not_die_on_missing_objects is set, we
+	 * should error out if we can't even get an oid, as
+	 * `--missing=print` should be able to report missing oids.
+	 */
 	if (get_oid_with_context(revs->repo, arg, get_sha1_flags, &oid, &oc))
 		return revs->ignore_missing ? 0 : -1;
 	if (!cant_be_filename)
 		verify_non_filename(revs->prefix, arg);
 	object = get_reference(revs, arg, &oid, flags ^ local_flags);
 	if (!object)
-		return revs->ignore_missing ? 0 : -1;
+		return (revs->ignore_missing || revs->do_not_die_on_missing_objects) ? 0 : -1;
 	add_rev_cmdline(revs, object, arg_, REV_CMD_REV, flags ^ local_flags);
 	add_pending_object_with_path(revs, object, arg, oc.mode, oc.path);
 	free(oc.path);
@@ -3830,8 +3840,6 @@  int prepare_revision_walk(struct rev_info *revs)
 				       FOR_EACH_OBJECT_PROMISOR_ONLY);
 	}
 
-	oidset_init(&revs->missing_commits, 0);
-
 	if (!revs->reflog_info)
 		prepare_to_use_bloom_filter(revs);
 	if (!revs->unsorted_input)
diff --git a/t/t6022-rev-list-missing.sh b/t/t6022-rev-list-missing.sh
index 5f1be7abb5..78387eebb3 100755
--- a/t/t6022-rev-list-missing.sh
+++ b/t/t6022-rev-list-missing.sh
@@ -78,4 +78,60 @@  do
 	done
 done
 
+for missing_tip in "HEAD~1" "HEAD~1^{tree}" "HEAD:1.t"
+do
+	# We want to check that things work when both
+	#   - all the tips passed are missing (case existing_tip = ""), and
+	#   - there is one missing tip and one existing tip (case existing_tip = "HEAD")
+	for existing_tip in "" "HEAD"
+	do
+		for action in "allow-any" "print"
+		do
+			test_expect_success "--missing=$action with tip '$missing_tip' missing and tip '$existing_tip'" '
+				oid="$(git rev-parse $missing_tip)" &&
+				path=".git/objects/$(test_oid_to_path $oid)" &&
+
+				# Before the object is made missing, we use rev-list to
+				# get the expected oids.
+				if test "$existing_tip" = "HEAD"
+				then
+					git rev-list --objects --no-object-names \
+						HEAD ^$missing_tip >expect.raw
+				else
+					>expect.raw
+				fi &&
+
+				# Blobs are shared by all commits, so even though a commit/tree
+				# might be skipped, its blob must be accounted for.
+				if test "$existing_tip" = "HEAD" && test $missing_tip != "HEAD:1.t"
+				then
+					echo $(git rev-parse HEAD:1.t) >>expect.raw &&
+					echo $(git rev-parse HEAD:2.t) >>expect.raw
+				fi &&
+
+				mv "$path" "$path.hidden" &&
+				test_when_finished "mv $path.hidden $path" &&
+
+				git rev-list --missing=$action --objects --no-object-names \
+				     $oid $existing_tip >actual.raw &&
+
+				# When the action is to print, we should also add the missing
+				# oid to the expect list.
+				case $action in
+				allow-any)
+					;;
+				print)
+					grep ?$oid actual.raw &&
+					echo ?$oid >>expect.raw
+					;;
+				esac &&
+
+				sort actual.raw >actual &&
+				sort expect.raw >expect &&
+				test_cmp expect actual
+			'
+		done
+	done
+done
+
 test_done