Message ID | cover.1642129840.git.dyroneteng@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | ls-remote: inconsistency from the order of args and opts | expand |
Teng Long <dyroneteng@gmail.com> writes: > +test_must_fail 'Exchange the order of "--heads" and <remote>' ' > + git --version && > + git init "test.git" && > + test_commit -C "test.git" one && > + git -C "test.git" ls-remote --heads ./. > result.1 && > + git -C "test.git" ls-remote ./. --heads > result.2 && I would say that this is working exactly as designed. As with the unix tradition, after the command name, first come options (e.g. "--heads", "-v", etc. that begin with a dash or two dashes), then arguments like "origin", "master", "." that are not dashed options/flags. Then among the arguments, we generally take revs first and then pathspecs. "git help cli" explicitly mentions this, because it is specific to "git" command suite, but it does not mention "dashed options/flags first and then args", primarily because, at least back when the documentation was written, this was taken as granted, iow, those who wrote the "gitcli" documentation thought it was a common knowledge among users that did not need to be spelled out. Apparently, it is not a common knowledge at least for you (and probably others). Perhaps we should add a paragraph to the cli help and explicitly mention "options first and then args", before we go on to say "among args, revs first and then pathspecs".
Junio C Hamano <gitster@pobox.com> writes: > Teng Long <dyroneteng@gmail.com> writes: > >> +test_must_fail 'Exchange the order of "--heads" and <remote>' ' >> + git --version && >> + git init "test.git" && >> + test_commit -C "test.git" one && >> + git -C "test.git" ls-remote --heads ./. > result.1 && >> + git -C "test.git" ls-remote ./. --heads > result.2 && > > I would say that this is working exactly as designed. As with the > unix tradition, after the command name, first come options > (e.g. "--heads", "-v", etc. that begin with a dash or two dashes), > then arguments like "origin", "master", "." that are not dashed > options/flags. I failed to say one important thing (I was again fooled by the "it is too obvious to say" that led "gitcli" not to mention this, too). "dashed-options first and then args" rule means that generally we scan the command line from left to right one by one, and as long as the one we are looking at begins with "-", we try to interpret it as a dashed option, possibly eat the next one as an argument given to the option (e.g. "--abbrev 7"), and keep repeating, until the one we are at right now does not begin with "-". And then everything after that, we do not interpret it as an option. That is how "--heads" on the second example above, since we have seen . and took it as an argument (not a dashed option) is considered as a pattern. > ... > Apparently, it is not a common knowledge at least for you (and > probably others). Perhaps we should add a paragraph to the cli help > and explicitly mention "options first and then args", before we go > on to say "among args, revs first and then pathspecs".
On Fri, Jan 14, 2022 at 1:47 PM Junio C Hamano <gitster@pobox.com> wrote: > Apparently, it is not a common knowledge at least for you (and > probably others). Perhaps we should add a paragraph to the cli help > and explicitly mention "options first and then args", before we go > on to say "among args, revs first and then pathspecs". It's much clearer now, thanks for the detailed answer. Another question, if I want to follow your advice and add a short paragraph in git CLI document, should this patch continue in the current RFC patchset or launch a new patchset? Thanks.
On Thu, Jan 13 2022, Junio C Hamano wrote: > Teng Long <dyroneteng@gmail.com> writes: > >> +test_must_fail 'Exchange the order of "--heads" and <remote>' ' >> + git --version && >> + git init "test.git" && >> + test_commit -C "test.git" one && >> + git -C "test.git" ls-remote --heads ./. > result.1 && >> + git -C "test.git" ls-remote ./. --heads > result.2 && > > I would say that this is working exactly as designed. As with the > unix tradition, after the command name, first come options > (e.g. "--heads", "-v", etc. that begin with a dash or two dashes), > then arguments like "origin", "master", "." that are not dashed > options/flags. > > Then among the arguments, we generally take revs first and then > pathspecs. "git help cli" explicitly mentions this, because it is > specific to "git" command suite, but it does not mention "dashed > options/flags first and then args", primarily because, at least back > when the documentation was written, this was taken as granted, iow, > those who wrote the "gitcli" documentation thought it was a common > knowledge among users that did not need to be spelled out. > > Apparently, it is not a common knowledge at least for you (and > probably others). Perhaps we should add a paragraph to the cli help > and explicitly mention "options first and then args", before we go > on to say "among args, revs first and then pathspecs". I don't think this summary is accurate. We have multiple commands that are in GNU-fashion loose about whether you provide options first before no-option args, or after. E.g. we accept both of: git push --dry-run <remote> <ref> And: git push <remote> <ref> --dry-run The "tradition" you're referring to accurately summarizes how POSIX specifies that things should work. But when GNU came around its option parser was generally happy to accept options and args in either order. E.g. these both work with GNU coreutils, but the latter will fail on FreeBSD and various other capital-U UNIX-es: touch foo; rm -v foo touch foo; rm foo -v The relevant documentation being https://www.gnu.org/prep/standards/standards.html#Standards-for-Command-Line-Interfaces; specifically: "Note that the GNU version of getopt will normally permit options anywhere among the arguments unless the special argument ‘--’ is used. This is not what POSIX specifies; it is a GNU extension." Now, as to git's behavior we mostly follow the GNU style. The default behavior of parse_options() is to accept the looser GNU style. In this case we're explicitly passing the PARSE_OPT_STOP_AT_NON_OPTION flag. But why is that? Most things that do that are doing so because we're parsing things in two passes, i.e. for commands that have sub-commands, e.g.: git commit-graph [opts] [subcommand] [subcommand-opts] In those cases we use PARSE_OPT_STOP_AT_NON_OPTION on our first pass, so that we'll stop at "subcommand", then we'll generally continue the parsing as of "[subcommand]" without that flag. The general exception to that rule, as you note, being things that take pathspecs, although for some of those we'll apply other custom disambiguation. In this case a look at ba5f28bf79e (ls-remote: use parse-options api, 2016-01-19) will show that it was faithfully converted from an ad-hoc custom option parser using the "BSD-like" option parsing discussed above. I really don't see why we wouldn't take this patch given that the command has one parse_options() invocation (so no subcommand complexity), and doesn't accept pathspecs etc. We should aim to make the command line UX consistent when possible, not slavishly bug-for-bug follow whatever the behavior of the pre-parse_options() code was. We can't do that in some cases, but this case seems like a pretty clear case where simply removing the PARSE_OPT_STOP_AT_NON_OPTION flag would be an improvement. It's not ambiguous, and makes invocations that currently (and inconsistently v.s. our usual behavior) error out work.
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > We have multiple commands that are in GNU-fashion loose about whether > you provide options first before no-option args, or after. E.g. we > accept both of: > > git push --dry-run <remote> <ref> > > And: > > git push <remote> <ref> --dry-run Yes, but I consider that a bug that we cannot fix due to backward compatibility issues. That is why my preference is to encourage users to stick to the POSIX way in gltcli, just like we recommend "stuck" form of options its parameter. > But when GNU came around its option parser was generally happy to accept > options and args in either order. E.g. these both work with GNU > coreutils, but the latter will fail on FreeBSD and various other > capital-U UNIX-es: > > touch foo; rm -v foo > touch foo; rm foo -v Yes, among the harm GNU has done on mankind, this is one of the biggest ones. We shouldn't waste our engineering time to support more of them in our tools. As long as users stick to the recommended "dashed options first and then args, among which revs come first and then pathspecs", they will be fine.
nOn Fri, Jan 14 2022, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> We have multiple commands that are in GNU-fashion loose about whether >> you provide options first before no-option args, or after. E.g. we >> accept both of: >> >> git push --dry-run <remote> <ref> >> >> And: >> >> git push <remote> <ref> --dry-run > > Yes, but I consider that a bug that we cannot fix due to backward > compatibility issues. > > That is why my preference is to encourage users to stick to the > POSIX way in gltcli, just like we recommend "stuck" form of options > its parameter. > >> But when GNU came around its option parser was generally happy to accept >> options and args in either order. E.g. these both work with GNU >> coreutils, but the latter will fail on FreeBSD and various other >> capital-U UNIX-es: >> >> touch foo; rm -v foo >> touch foo; rm foo -v This is only an approximate list, but: $ git grep -C3 'parse_options' -- 'builtin/*.c'|grep -c PARSE_OPT_STOP_AT_NON_OPTION 16 $ git grep -C3 'parse_options' -- 'builtin/*.c'|grep -c -F ', 0);' 101 The GNU-like behavior is far more common in our codebase, and I think it's less surprising if commands work the same way for consistency. I manually looked through the PARSE_OPT_STOP_AT_NON_OPTION cases, and I think this is the only one that's using it for no good reason. The others (e.g. "git config") would become ambiguous or error out as a result. > Yes, among the harm GNU has done on mankind, this is one of the > biggest ones. We shouldn't waste our engineering time to support > more of them in our tools. > > As long as users stick to the recommended "dashed options first and > then args, among which revs come first and then pathspecs", they > will be fine. I find it quite useful. E.g. if you typo a command or forget/want to remove an option: git push origin HEAD --dry-run You can just (under readline) do C-p M-DEL, instead of the equivalent navigating back a few words, or having to use more advanced readline features like ^--dry-run^^ or whatever. Anecdotally, I've been surprised by the amount of regular terminal users whose readline skills pretty much and at using the arrow keys to make command corrections. I think this GNU UX decision has probably saved several accumulated man-lifetimes by now :)
On 2022-01-14 at 19:57:17, Ævar Arnfjörð Bjarmason wrote: > > On Thu, Jan 13 2022, Junio C Hamano wrote: > > > Teng Long <dyroneteng@gmail.com> writes: > > > >> +test_must_fail 'Exchange the order of "--heads" and <remote>' ' > >> + git --version && > >> + git init "test.git" && > >> + test_commit -C "test.git" one && > >> + git -C "test.git" ls-remote --heads ./. > result.1 && > >> + git -C "test.git" ls-remote ./. --heads > result.2 && > > > > I would say that this is working exactly as designed. As with the > > unix tradition, after the command name, first come options > > (e.g. "--heads", "-v", etc. that begin with a dash or two dashes), > > then arguments like "origin", "master", "." that are not dashed > > options/flags. > > > > Then among the arguments, we generally take revs first and then > > pathspecs. "git help cli" explicitly mentions this, because it is > > specific to "git" command suite, but it does not mention "dashed > > options/flags first and then args", primarily because, at least back > > when the documentation was written, this was taken as granted, iow, > > those who wrote the "gitcli" documentation thought it was a common > > knowledge among users that did not need to be spelled out. > > > > Apparently, it is not a common knowledge at least for you (and > > probably others). Perhaps we should add a paragraph to the cli help > > and explicitly mention "options first and then args", before we go > > on to say "among args, revs first and then pathspecs". > > I don't think this summary is accurate. > > We have multiple commands that are in GNU-fashion loose about whether > you provide options first before no-option args, or after. E.g. we > accept both of: > > git push --dry-run <remote> <ref> > > And: > > git push <remote> <ref> --dry-run > > The "tradition" you're referring to accurately summarizes how POSIX > specifies that things should work. > > But when GNU came around its option parser was generally happy to accept > options and args in either order. E.g. these both work with GNU > coreutils, but the latter will fail on FreeBSD and various other > capital-U UNIX-es: > > touch foo; rm -v foo > touch foo; rm foo -v Yes, POSIX specifies this is how it should work because it avoids ambiguity. According to POSIX, -v is a file, and that's a valid name on Unix. If GNU rm fails to delete that file or provide a diagnostic about why it didn't, that's a bug. In some cases, we do allow the GNU behavior of providing options anywhere on the command line, but we don't when it causes ambiguity, like in this case. I think we should document the current behavior, but I also think it's a given when working on Unix because many tools don't work that way. For example, test and find don't permit arbitrary location of options and arguments and they are found on all Unix systems. You can't write "test foo -f". And to prove that this is ambiguous, I provide you the following example: $ git update-ref refs/heads/--symref HEAD $ git ls-remote . --symref 1ffcbaa1a5f10c9f706314d77f88de20a4a498c2 refs/heads/--symref That prints something very different if I write "git ls-remote --symref .". And it is actually the case that people write this kind of syntax in scripts relying on the current behavior and then those scripts get used in a variety of situations with arbitrary ref names, so this should continue to work this way. I believe a former employer may have these kinds of scripts, for example. I'm not opposed to us building new tools which support the GNU behavior, but I don't think we should change tools where we have the existing behavior because it does lead to breakage in some scripting situations.
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >>> But when GNU came around its option parser was generally happy to accept >>> options and args in either order. E.g. these both work with GNU >>> coreutils, but the latter will fail on FreeBSD and various other >>> capital-U UNIX-es: >>> >>> touch foo; rm -v foo >>> touch foo; rm foo -v > > This is only an approximate list, but: Don't waste your time on this, and instead spend on something more useful. What I gave wrt gitcli.txt in an earlier message is final.
On Fri, Jan 14 2022, brian m. carlson wrote: > [[PGP Signed Part:Undecided]] > On 2022-01-14 at 19:57:17, Ævar Arnfjörð Bjarmason wrote: >> >> On Thu, Jan 13 2022, Junio C Hamano wrote: >> >> > Teng Long <dyroneteng@gmail.com> writes: >> > >> >> +test_must_fail 'Exchange the order of "--heads" and <remote>' ' >> >> + git --version && >> >> + git init "test.git" && >> >> + test_commit -C "test.git" one && >> >> + git -C "test.git" ls-remote --heads ./. > result.1 && >> >> + git -C "test.git" ls-remote ./. --heads > result.2 && >> > >> > I would say that this is working exactly as designed. As with the >> > unix tradition, after the command name, first come options >> > (e.g. "--heads", "-v", etc. that begin with a dash or two dashes), >> > then arguments like "origin", "master", "." that are not dashed >> > options/flags. >> > >> > Then among the arguments, we generally take revs first and then >> > pathspecs. "git help cli" explicitly mentions this, because it is >> > specific to "git" command suite, but it does not mention "dashed >> > options/flags first and then args", primarily because, at least back >> > when the documentation was written, this was taken as granted, iow, >> > those who wrote the "gitcli" documentation thought it was a common >> > knowledge among users that did not need to be spelled out. >> > >> > Apparently, it is not a common knowledge at least for you (and >> > probably others). Perhaps we should add a paragraph to the cli help >> > and explicitly mention "options first and then args", before we go >> > on to say "among args, revs first and then pathspecs". >> >> I don't think this summary is accurate. >> >> We have multiple commands that are in GNU-fashion loose about whether >> you provide options first before no-option args, or after. E.g. we >> accept both of: >> >> git push --dry-run <remote> <ref> >> >> And: >> >> git push <remote> <ref> --dry-run >> >> The "tradition" you're referring to accurately summarizes how POSIX >> specifies that things should work. >> >> But when GNU came around its option parser was generally happy to accept >> options and args in either order. E.g. these both work with GNU >> coreutils, but the latter will fail on FreeBSD and various other >> capital-U UNIX-es: >> >> touch foo; rm -v foo >> touch foo; rm foo -v > > Yes, POSIX specifies this is how it should work because it avoids > ambiguity. According to POSIX, -v is a file, and that's a valid name on > Unix. If GNU rm fails to delete that file or provide a diagnostic about > why it didn't, that's a bug. It's not a bug that its default behavior isn't to slavishly follow POSIX, but if you'd like you can turn it on: $ touch -- file -v; rm -v file -v removed 'file' $ touch -- file -v; POSIXLY_CORRECT=1 rm -v file -v removed 'file' removed '-v' There's a good reason for this departure in behavior: If you have a single file called '-v' POSIX doesn't provide any way to remove it other than something like: rm ./-v Which can be painful when scripting things. I.e. you'll need to munge the name itself, v.s. consistently supporting "--": rm -- -v > In some cases, we do allow the GNU behavior of providing options > anywhere on the command line, but we don't when it causes ambiguity, > like in this case. I think we should document the current behavior, but > I also think it's a given when working on Unix because many tools don't > work that way. For example, test and find don't permit arbitrary > location of options and arguments and they are found on all Unix > systems. You can't write "test foo -f". Yes, *nix systems are all over the place with this. And then e.g. "dd" and the like accept arguments whose syntax pre-dates "-"-prefixed arguments. I'm only talking about how our command collection should behave, and how we should explain its behavior, or what behavior we might prefer. > And to prove that this is ambiguous, I provide you the following > example: > > $ git update-ref refs/heads/--symref HEAD > $ git ls-remote . --symref > 1ffcbaa1a5f10c9f706314d77f88de20a4a498c2 refs/heads/--symref > > That prints something very different if I write "git ls-remote --symref > .". And it is actually the case that people write this kind of syntax > in scripts relying on the current behavior and then those scripts get > used in a variety of situations with arbitrary ref names, so this should > continue to work this way. I believe a former employer may have these > kinds of scripts, for example. > > I'm not opposed to us building new tools which support the GNU behavior, > but I don't think we should change tools where we have the existing > behavior because it does lead to breakage in some scripting situations. Yes, my claim that it's "not ambiguous" in <220114.867db2rs0n.gmgdl@evledraar.gmail.com> isn't correct, as you point out. I suspect those cases are probably too obscure to worry about in practice, but yes, we might not want to convert any existing command due to those. But we should really be recommending use of -- or --end-of-options whenever possible, as it can be hard to know in Git's CLI whether options after args are accepted or not.
Teng Long <dyroneteng@gmail.com> writes: > On Fri, Jan 14, 2022 at 1:47 PM Junio C Hamano <gitster@pobox.com> wrote: > >> Apparently, it is not a common knowledge at least for you (and >> probably others). Perhaps we should add a paragraph to the cli help >> and explicitly mention "options first and then args", before we go >> on to say "among args, revs first and then pathspecs". > > It's much clearer now, thanks for the detailed answer. > > Another question, if I want to follow your advice and add a short > paragraph in git CLI document, should this patch continue in the > current RFC patchset or launch a new patchset? If I were you, I would retract the ls-remote topic and create a brand new topic that is about clarifying/enhancing the gitcli.txt file, which has nothing to do with ls-remote, because the command line option/argument convention is not specific to a single command.
On Fri, Jan 14 2022, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >>>> But when GNU came around its option parser was generally happy to accept >>>> options and args in either order. E.g. these both work with GNU >>>> coreutils, but the latter will fail on FreeBSD and various other >>>> capital-U UNIX-es: >>>> >>>> touch foo; rm -v foo >>>> touch foo; rm foo -v >> >> This is only an approximate list, but: > > Don't waste your time on this, and instead spend on something more > useful. What I gave wrt gitcli.txt in an earlier message is final. Whatever we do with git-ls-remote (which I don't really care all that much about) gitcli should really be documenting how our tooling behaves. Which is what I was mainly pointing out upthread, that your summary of options before other types of args omitted that many utilities support the reverse. Or perhaps you were only describing an asthetic choice (which I don't think is worth debating). I'm just talking about what the ground truth is. What do you think about something like this to clear this up?: diff --git a/Documentation/gitcli.txt b/Documentation/gitcli.txt index 92e4ba6a2fa..b1387c4fe68 100644 --- a/Documentation/gitcli.txt +++ b/Documentation/gitcli.txt @@ -19,6 +19,13 @@ Many commands take revisions (most often "commits", but sometimes "tree-ish", depending on the context and command) and paths as their arguments. Here are the rules: + * Options are (almost) universally accpted before other types of + arguments, e.g. `git cat-file -t HEAD` or `git push --dry-run + origin`, but in the case of those commands a GNU-style `git + cat-file HEAD -t` and `git push origin --dry-run` would work just + as well. The reverse is often not true, many commands do not accept + options after non-option arguments. + * Revisions come first and then paths. E.g. in `git diff v1.0 v2.0 arch/x86 include/asm-x86`, `v1.0` and `v2.0` are revisions and `arch/x86` and `include/asm-x86`
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > But we should really be recommending use of -- or --end-of-options > whenever possible, as it can be hard to know in Git's CLI whether > options after args are accepted or not. Way before doing so, we should recommend the "options and then args" order. Users do not have to know which subcommands accepts options after args.
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Which is what I was mainly pointing out upthread, that your summary of > options before other types of args omitted that many utilities support > the reverse. Or perhaps you were only describing an asthetic choice > (which I don't think is worth debating). I'm just talking about what the > ground truth is. The ground truth is that it is unlikely that we can fix some of our commands so that they stop taking options that are given after args, because of inertia. But we can teach users especially the new ones to always use the canonical order to sidestep the whole "some subcommands imitate misguided GNUism to make it ambiguous, some don't" problem. > What do you think about something like this to clear this up?: As we should aspire to fix the misguided "options can come still after we saw args" eventually (don't talk back on this point to waste any more of my time on a release day), I do not think it is a good idea to say "reverse is often not true" and stopping there. It will mislead people to think these "not true" commands should somehow be updated to the GNUism in the future. It's the other way around. > + * Options are (almost) universally accpted before other types of > + arguments, e.g. `git cat-file -t HEAD` or `git push --dry-run > + origin`, but in the case of those commands a GNU-style `git > + cat-file HEAD -t` and `git push origin --dry-run` would work just > + as well. The reverse is often not true, many commands do not accept > + options after non-option arguments. * A subcommand may take dashed options (which may take their own arguments, e.g. "--max-parents 2") and arguments. You SHOULD give dashed options first and then arguments. Some commands may accept dashed options after you have already gave non-option arguments (which may make the command ambiguous), but you should not rely on it (because eventually we may find a way to fix these ambiguity by enforcing the "options then args" rule).
On Fri, Jan 14 2022, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> But we should really be recommending use of -- or --end-of-options >> whenever possible, as it can be hard to know in Git's CLI whether >> options after args are accepted or not. > > Way before doing so, we should recommend the "options and then args" > order. Users do not have to know which subcommands accepts options > after args. If you want to script git to do e.g.: touch -- foo -r git add foo -r git rm foo -r git status foo -r git log foo -r You do need to know that those won't work, if that "foo -r" is from e.g. scripted arguments that those commands will accept or interpret that -r as an argument. Hence suggesting that the user just add "--" to resolve the ambiguity, as gitcli already discusses.
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Hence suggesting that the user just add "--" to resolve the ambiguity, > as gitcli already discusses. Sadly, requiring "--end-of-options" is a solution for a problem that we didn't have to create. If we didn't take options written after arg, git rm --end-of-options foo -r git rm foo ./-r to force "-r" to be interpreted as a filename wouldn't have been necessary. The presence of "foo" before "-r" would have been sufficient. I agree with you that, unfortunately, we'd need to teach a way (i.e. "--" or "--end-of-options") to defeat this misguided GNUism in some commands. Even if users stick to "options and then args", sadly, they need to know it. But we do already explain "--" and "--end-of-options" in gitcli.txt so we should be OK.
On Fri, Jan 14, 2022 at 12:24 PM Teng Long <dyroneteng@gmail.com> wrote: > So I sent this patch to try to figure out if this was a requirement of the > original design, or if it was something that could really be improved, or if > we can find a better way to deal this issue with compatibility. I really appreciate your comments and explanations. Now I've cleared that up, personally. It's not a bad thing that there are some conventions for use, the point is whether the rules are clear to others. So, after that, I will try to modify a little in "git-cli.txt" with a new patchset. Thanks.