Message ID | pull.949.git.1620228664666.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [GSOC] ref-filter: solve bugs caused by enumeration | expand |
"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: ZheNing Hu <adlternative@gmail.com> > > Johannes Schindelin seems to have introduced a bug in > cc72385f(for-each-ref: let upstream/push optionally > report the remote name), it use `atom->u.remote_ref.option` > which is a member of enumeration in the judgment statement. Sorry but I am not sure if our readers would understand what "a member of enumeration in the judgment statement" is (I certainly do not), and even more importantly, "bugs caused by enumeration" on the title does not hint much about what problem the patch is trying to solve. > When we use other members in the enumeration `used_atom.u`, > and it happened to fill in `remote_ref.push`, this judgment > may still be established and produces errors. So replace the > judgment statement with `starts_with(name, "push")` to fix > the error. And this paragraph does not enlighten all that much, unfortunately. Is it that a check refers to one member of a union without making sure that member is the one in effect in the union? I am most puzzled by the mention of "enumeration" when there does not appear to be any enum involved. > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-949%2Fadlternative%2Fref-filter-enum-bug-fix-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-949/adlternative/ref-filter-enum-bug-fix-v1 > Pull-Request: https://github.com/gitgitgadget/git/pull/949 > > ref-filter.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/ref-filter.c b/ref-filter.c > index a0adb4551d87..f467f2fbbb73 100644 > --- a/ref-filter.c > +++ b/ref-filter.c > @@ -1730,7 +1730,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err) > else > v->s = xstrdup(""); > continue; > - } else if (atom->u.remote_ref.push) { > + } else if (starts_with(name, "push")) { > const char *branch_name; > v->s = xstrdup(""); > if (!skip_prefix(ref->refname, "refs/heads/", > > base-commit: 311531c9de557d25ac087c1637818bd2aad6eb3a
Junio C Hamano <gitster@pobox.com> 于2021年5月6日周四 上午9:53写道: > > "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: ZheNing Hu <adlternative@gmail.com> > > > > Johannes Schindelin seems to have introduced a bug in > > cc72385f(for-each-ref: let upstream/push optionally > > report the remote name), it use `atom->u.remote_ref.option` > > which is a member of enumeration in the judgment statement. > > Sorry but I am not sure if our readers would understand what "a > member of enumeration in the judgment statement" is (I certainly do > not), and even more importantly, "bugs caused by enumeration" on the > title does not hint much about what problem the patch is trying to > solve. > > > When we use other members in the enumeration `used_atom.u`, > > and it happened to fill in `remote_ref.push`, this judgment > > may still be established and produces errors. So replace the > > judgment statement with `starts_with(name, "push")` to fix > > the error. > > And this paragraph does not enlighten all that much, unfortunately. > > Is it that a check refers to one member of a union without making > sure that member is the one in effect in the union? I am most > puzzled by the mention of "enumeration" when there does not appear > to be any enum involved. > Sorry, I didn't make it clear. I re-describe the problem first, and then modify the commit messages. Suppose we are dealing with "%(notes)", the name member of our `used_atom` item at this time is "notes", and its member union `u` uses a `struct notes_option`, we fill some values in `used_atom.u.notes_option`, When we traverse in `used_atom` array in `populate_value()` and previous judgement like "if (starts_with(name, "refname"))" will failed, because we are dealing with atom "notes", but in judgement "else if (atom->u.remote_ref.push)", The value we fill in `used_atom.u.notes_option` just makes `used_atom.u.remote_ref.push` non-zero. This leads us into the wrong case. Is this clearer? Thanks. -- ZheNing Hu
ZheNing Hu <adlternative@gmail.com> writes: >> Is it that a check refers to one member of a union without making >> sure that member is the one in effect in the union? I am most >> puzzled by the mention of "enumeration" when there does not appear >> to be any enum involved. > > Sorry, I didn't make it clear. I re-describe the problem first, and then > modify the commit messages. > > Suppose we are dealing with "%(notes)", the name member of our > `used_atom` item at this time is "notes", and its member union `u` uses > a `struct notes_option`, we fill some values in `used_atom.u.notes_option`, > > When we traverse in `used_atom` array in `populate_value()` and previous > judgement like "if (starts_with(name, "refname"))" will failed, because we > are dealing with atom "notes", but in judgement "else if > (atom->u.remote_ref.push)", > The value we fill in `used_atom.u.notes_option` just makes > `used_atom.u.remote_ref.push` non-zero. This leads us into the wrong case. > > Is this clearer? This time you avoided the word enumeration, and it made it clearer. The word commonly used is "condition" where you said "judgment", I think, and wit that it would probably be even more clear. used_atom.u is an union, and it has different members depending on what atom the auxiliary data the union part of the "struct used_atom" wants to record. At most only one of the members can be valid at any one time. Since the code checks u.remote_ref without even making sure if the atom is "push" or "push:" (which are only two cases that u.remote_ref.push becomes valid), but u.remote_ref shares the same storage for other members of the union, the check was reading from an invalid member, which was the bug. So the fix was to see what atom it is by checking its name member? Is starts_with() a reliable test? A fictitious atom "pushe" will be different from "push" or "push:<something>", but will still pass that test, so from the point of view of future-proofing the tests, shouldn't it do the same check as the one at the beginning of remote_ref_atom_parser()? Thanks.
> This time you avoided the word enumeration, and it made it clearer. > The word commonly used is "condition" where you said "judgment", I > think, and wit that it would probably be even more clear. > OK. > used_atom.u is an union, and it has different members depending on > what atom the auxiliary data the union part of the "struct > used_atom" wants to record. At most only one of the members can be > valid at any one time. Since the code checks u.remote_ref without > even making sure if the atom is "push" or "push:" (which are only > two cases that u.remote_ref.push becomes valid), but u.remote_ref > shares the same storage for other members of the union, the check > was reading from an invalid member, which was the bug. > Yes, that's what it means. I got it wrong before, of course used_atom.u is not a enum, it's union. > So the fix was to see what atom it is by checking its name member? > Is starts_with() a reliable test? A fictitious atom "pushe" will be > different from "push" or "push:<something>", but will still pass > that test, so from the point of view of future-proofing the tests, > shouldn't it do the same check as the one at the beginning of > remote_ref_atom_parser()? > I think it's not necesssary. Before we call `populate_value()`, `parse_ref_filter_atom()` only allow user use atom which match "valid_atom" entry fully, something like "pushe" will be rejected and process die with "fatal: unknown field name: pushe", so it didn't pass this "starts_with()" test. /* Is the atom a valid one? */ for (i = 0; i < ARRAY_SIZE(valid_atom); i++) { int len = strlen(valid_atom[i].name); if (len == atom_len && !memcmp(valid_atom[i].name, sp, len)) break; } if (ARRAY_SIZE(valid_atom) <= i) return strbuf_addf_ret(err, -1, _("unknown field name: %.*s"), (int)(ep-atom), atom); > Thanks. Thanks. -- ZheNing Hu
ZheNing Hu <adlternative@gmail.com> writes: >> So the fix was to see what atom it is by checking its name member? >> Is starts_with() a reliable test? A fictitious atom "pushe" will be >> different from "push" or "push:<something>", but will still pass >> that test, so from the point of view of future-proofing the tests, >> shouldn't it do the same check as the one at the beginning of >> remote_ref_atom_parser()? >> > > I think it's not necesssary. Before we call `populate_value()`, ... I do not care if it is not needed with the "current" set of atoms we accept. The example "pushe", which obviously will not be accepted by today's code, was all about future-proofing.
Junio C Hamano <gitster@pobox.com> 于2021年5月6日周四 下午7:20写道: > > ZheNing Hu <adlternative@gmail.com> writes: > > >> So the fix was to see what atom it is by checking its name member? > >> Is starts_with() a reliable test? A fictitious atom "pushe" will be > >> different from "push" or "push:<something>", but will still pass > >> that test, so from the point of view of future-proofing the tests, > >> shouldn't it do the same check as the one at the beginning of > >> remote_ref_atom_parser()? > >> > > > > I think it's not necesssary. Before we call `populate_value()`, ... > > I do not care if it is not needed with the "current" set of atoms we > accept. The example "pushe", which obviously will not be accepted > by today's code, was all about future-proofing. Well, what you said makes sense, now I think something like this will be more secure. In the future, other people may be grateful for such a choice. :) @@ -1730,7 +1759,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err) else v->s = xstrdup(""); continue; - } else if (starts_with(name, "push")) { + } else if (starts_with(name, "push") && atom->u.remote_ref.push) { -- ZheNing Hu
ZheNing Hu <adlternative@gmail.com> writes: > ... now I think something like this will be more secure. In the > future, other people may be grateful for such a choice. :) > > @@ -1730,7 +1759,7 @@ static int populate_value(struct ref_array_item > *ref, struct strbuf *err) > else > v->s = xstrdup(""); > continue; > - } else if (starts_with(name, "push")) { > + } else if (starts_with(name, "push") && > atom->u.remote_ref.push) { Hmph, I do no think that would be futureproof at all. If a new atom "pushe" gets added, it is not all that unlikely that it would add its own member to the same union. name here would be "pushe" and starts with "push", and upon parsing "pushe", its own member in the union, atom->u.X, would have been assigned to, but the code we see here still accesses atom->u.remote_ref.*, so you still have the same problem you started to solve, no? The check we use in remote_ref_atom_parser() to see if the atom is about pushing, i.e. if (!strcmp(atom->name, "push") || starts_with(atom->name, "push:")) is unlikely to be invalidated in future changes, as this is very much end-user facing and changing the condition will break existing scripts, so that is what I was expecting you to use instead.
> Hmph, I do no think that would be futureproof at all. If a new atom > "pushe" gets added, it is not all that unlikely that it would add > its own member to the same union. name here would be "pushe" and > starts with "push", and upon parsing "pushe", its own member in the > union, atom->u.X, would have been assigned to, but the code we see > here still accesses atom->u.remote_ref.*, so you still have the same > problem you started to solve, no? > Well, this "pushe" has `atom->u.X` which is different from "push" and "push:". This is indeed worth worrying about. > The check we use in remote_ref_atom_parser() to see if the atom is > about pushing, i.e. > > if (!strcmp(atom->name, "push") || starts_with(atom->name, "push:")) > > is unlikely to be invalidated in future changes, as this is very > much end-user facing and changing the condition will break existing > scripts, so that is what I was expecting you to use instead. > But I am afraid that the cost we paid for string matching here is too high, Since the string matching here is to illustrate that we use one of the members of the union, why can't we directly add a "union_type" member to used_atom? if we have this, we can just switch the "union_type" flag and also make the program correct. Perhaps this would be better than the large number of string matches have done here. > > Thanks. -- ZheNing Hu
ZheNing Hu <adlternative@gmail.com> writes:
> But I am afraid that the cost we paid for string matching here is too high,
If that is truly the concern (I do not know without measuring),
perhaps we should add a member next to the union to say which one of
the union members is valid, so that you can say
if (atom->atom_type == ATOM_TYPE_REMOTE_REF &&
atom->u.remote_ref.push)
(introduce an enum and define ATOM_TYPE_* after the member in the
union).
That would help futureproofing the code even further, as a new
synonym of "push" introduced laster [*] would not invalidate the check you are
adding there.
[Footnote]
* remote_ref_atom_parser() in the future may begin like so:
- if (!strcmp(atom->name, "push") || starts_with(atom->name, "push:"))
+ if (!strcmp(atom->name, "push") || starts_with(atom->name, "push:") ||
+ !strcmp(atom->name, "a-synonym-for-push"))
atom->u.remote_ref.push = 1;
> If that is truly the concern (I do not know without measuring), > perhaps we should add a member next to the union to say which one of > the union members is valid, so that you can say > > if (atom->atom_type == ATOM_TYPE_REMOTE_REF && > atom->u.remote_ref.push) > > (introduce an enum and define ATOM_TYPE_* after the member in the > union). > Yes, I think so. Since the content of this part needs to be modified for the parsing of all atoms, I will put it in a separate topic to complete. > That would help futureproofing the code even further, as a new > synonym of "push" introduced laster [*] would not invalidate the check you are > adding there. > Yes, this enhances its generalization ability. > > [Footnote] > > * remote_ref_atom_parser() in the future may begin like so: > > - if (!strcmp(atom->name, "push") || starts_with(atom->name, "push:")) > + if (!strcmp(atom->name, "push") || starts_with(atom->name, "push:") || > + !strcmp(atom->name, "a-synonym-for-push")) > atom->u.remote_ref.push = 1; > -- ZheNing Hu
diff --git a/ref-filter.c b/ref-filter.c index a0adb4551d87..f467f2fbbb73 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -1730,7 +1730,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err) else v->s = xstrdup(""); continue; - } else if (atom->u.remote_ref.push) { + } else if (starts_with(name, "push")) { const char *branch_name; v->s = xstrdup(""); if (!skip_prefix(ref->refname, "refs/heads/",