Message ID | xmqqr17lphav.fsf_-_@gitster.g (mailing list archive) |
---|---|
State | New, archived |
Series | parse-options: make parse_options_check() test-only |
On Tue, Mar 01 2022, Junio C Hamano wrote:

> The array of options given to the parse-options API is sanity
> checked for reuse of a single-letter option for multiple entries and
> other programmer mistakes by calling parse_options_check() from
> parse_options_start(). This allows our developers to catch silly
> mistakes early, but all callers of parse-options API pays this cost.
> Once the set of options in an array is validated and passes this
> check, until a programmer modifies the array, there is no way for it
> to fail the check, which is wasteful.

That's not true due to the "git rev-parse --parseopt" interface. I'd be
happy to deprecate it, but I think the last time I brought it up you
were opposed, i.e. it's documented as plumbing in "git-rev-parse", and
it's easy to have it hit some of these BUG()'s.

I see the benefit of Johannes's suggestion of checking this once (but
with t0012-help.sh etc. we're nowhere near being able to do that).

Now this runs for the whole test suite, so our tests will have the
same behavior.

So it's just an optimization? Isn't it premature, if you run
parse_options_check() in a loop how many checks/sec can we do? I haven't
tested, but I'm betting it's a *lot*.

So aren't we shaving microseconds off the runtime here?
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Tue, Mar 01 2022, Junio C Hamano wrote:
>
>> The array of options given to the parse-options API is sanity
>> checked for reuse of a single-letter option for multiple entries and
>> other programmer mistakes by calling parse_options_check() from
>> parse_options_start(). This allows our developers to catch silly
>> mistakes early, but all callers of parse-options API pays this cost.
>> Once the set of options in an array is validated and passes this
>> check, until a programmer modifies the array, there is no way for it
>> to fail the check, which is wasteful.
>
> That's not true due to the "git rev-parse --parseopt" interface. I'd be

Meaning that a parse-options array can be fed by "rev-parse --parseopt"
and having the sanity check enabled does help the use case? Even there,
I would say that once the script writer finishes developing the script
that uses "rev-parse --parseopt", setting the parseopt input in stone,
there is no need to check the same thing over and over again. Am I
mistaken? Does "rev-parse --parseopt" that is fed the same input
sometimes trigger the sanity check and sometimes not?

> I see the benefit of Johannes's suggestion of checking this once (but
> with t0012-help.sh etc. we're nowhere near being able to do that).
>
> Now this runs for the whole test suite, so our tests will have the
> same behavior.

The code for sanity check is there ONLY to help those who develop
while they develop, and it is logical to enable it during our tests.
There is no reason to trigger the sanity check in the end-user
environment, no?

> So aren't we shaving microseconds off the runtime here?

No, the problem I have with the runtime check is more at the
conceptual level. Those who remove assert() by setting _NDEBUG
would not be doing so to save nanoseconds, either.
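To make the assert() comparison above concrete, here is a minimal sketch contrasting the two kinds of gating being discussed. It is not git code: `demo_sanity_check()`, `demo_start_with_assert()`, `demo_start_with_env()` and the `DEMO_CHECK` variable are made-up names for illustration. The macro that the C standard library actually consults to disable assert() is `NDEBUG`.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts how many times the (hypothetical) sanity check actually ran. */
static int demo_checks_run;

static int demo_sanity_check(void)
{
	demo_checks_run++;
	return 1; /* pretend the option table is valid */
}

/*
 * Compile-time gating: when built with -DNDEBUG, assert() expands to
 * nothing, so demo_sanity_check() is never even called.
 */
static void demo_start_with_assert(void)
{
	assert(demo_sanity_check());
}

/*
 * Run-time gating: a single binary that runs the check only when an
 * environment variable is set. DEMO_CHECK is a made-up name, and any
 * non-empty value enables the check in this sketch.
 */
static void demo_start_with_env(void)
{
	const char *v = getenv("DEMO_CHECK");

	if (v && *v)
		demo_sanity_check();
}
```

In both variants the check disappears from the normal end-user path; the difference is whether that decision is made when the binary is built or each time it runs.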
On Tue, Mar 01 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> On Tue, Mar 01 2022, Junio C Hamano wrote:
>>
>>> The array of options given to the parse-options API is sanity
>>> checked for reuse of a single-letter option for multiple entries and
>>> other programmer mistakes by calling parse_options_check() from
>>> parse_options_start(). This allows our developers to catch silly
>>> mistakes early, but all callers of parse-options API pays this cost.
>>> Once the set of options in an array is validated and passes this
>>> check, until a programmer modifies the array, there is no way for it
>>> to fail the check, which is wasteful.
>>
>> That's not true due to the "git rev-parse --parseopt" interface. I'd be
>
> Meaning that a parse-options array can be fed by "rev-parse --parseopt"
> and having the sanity check enabled does help the use case? Even there,
> I would say that once the script writer finishes developing the script
> that uses "rev-parse --parseopt", setting the parseopt input in stone,
> there is no need to check the same thing over and over again. Am I
> mistaken? Does "rev-parse --parseopt" that is fed the same input
> sometimes trigger the sanity check and sometimes not?

If we're declaring that "git rev-parse --parseopt" is something that was
only ever intended for in-tree usage sure, that should hold true.

I.e. "git rev-parse" is documented as plumbing, and we document
--parseopt as a generic option parsing mechanism you can use in
shellscripts.

So out-of-tree users wouldn't guard against
GIT_TEST_PARSE_OPTIONS_CHECK, and I wouldn't be surprised if we could
e.g. segfault on some subsequent code if some of the sanity checks
aren't happening anymore.

No, I'd be quite happy if we declared that it's for our use only, and
could remove it when the last in-tree *.sh user went away. There's a bit
of complexity in parse_options() required only for its use....

>> I see the benefit of Johannes's suggestion of checking this once (but
>> with t0012-help.sh etc. we're nowhere near being able to do that).
>>
>> Now this runs for the whole test suite, so our tests will have the
>> same behavior.
>
> The code for sanity check is there ONLY to help those who develop
> while they develop, and it is logical to enable it during our tests.
> There is no reason to trigger the sanity check in the end-user
> environment, no?

I don't see the benefit of skipping it. Your commit message mentions
"but all callers of parse-options API pays this cost".

As a quick & dumb perf test I tried:

    diff --git a/parse-options.c b/parse-options.c
    index 6e57744fd22..cabea35e8b1 100644
    --- a/parse-options.c
    +++ b/parse-options.c
    @@ -523,7 +523,10 @@ static void parse_options_start_1(struct parse_opt_ctx_t *ctx,
     	if ((flags & PARSE_OPT_ONE_SHOT) &&
     	    (flags & PARSE_OPT_KEEP_ARGV0))
     		BUG("Can't keep argv0 if you don't have it");
    -	parse_options_check(options);
    +	while (1) {
    +		printf(".");
    +		parse_options_check(options);
    +	}
     }
     
     void parse_options_start(struct parse_opt_ctx_t *ctx,

And:

    ./git [am|rebase] | pv >/dev/null

Get around 4MiB/s. I.e. we can do this check ~4 million times/sec on my
computer, with -O3, with -O0 -g it's ~3MiB/s.

So the performance cost is trivial & not worth worrying about.

>> So aren't we shaving microseconds off the runtime here?
>
> No, the problem I have with the runtime check is more at the
> conceptual level. Those who remove assert() by setting _NDEBUG
> would not be doing so to save nanoseconds, either.

I think the trade-off of not having to worry about the runtime v.s.
"development build" checks is one we've done well with BUG(), i.e. not
to have it be an assert().

E.g. in this case we have parse_options_concat(), so you can dynamically
construct the options to be checked. I happen to have looked in detail
at all of that code in the past, and I don't *think* it's doing
something "actually dynamic". I.e. it should be the same when the tests
run and when git runs in the wild.

But having to know and check that when using or changing the API is just
more state to keep in your head.
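The "checks per second" measurement above can also be approximated without piping dots through pv. The following is only a sketch under stated assumptions: `struct bench_option`, `bench_check()` and `bench_checks_per_second()` are hypothetical stand-ins, not git's `struct option` or `parse_options_check()`, and `clock()` reports CPU time rather than wall-clock time, so the absolute numbers are not comparable to the ones quoted in the message.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical, much-simplified stand-in for an option table entry. */
struct bench_option {
	int short_name;        /* 0 means "no short name" */
	const char *long_name; /* NULL terminates the array */
};

/* Count pairs of entries that reuse the same short name. */
static int bench_check(const struct bench_option *opts)
{
	int dups = 0;

	for (const struct bench_option *a = opts; a->long_name; a++)
		for (const struct bench_option *b = a + 1; b->long_name; b++)
			if (a->short_name && a->short_name == b->short_name)
				dups++;
	return dups;
}

/* Run the check `iterations` times and report checks per CPU second. */
static double bench_checks_per_second(const struct bench_option *opts,
				      long iterations)
{
	volatile int sink = 0; /* keeps the loop from being optimized away */
	clock_t start = clock();

	for (long i = 0; i < iterations; i++)
		sink += bench_check(opts);

	double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
	return secs > 0.0 ? (double)iterations / secs : 0.0;
}
```

A result of 0.0 means the run was too short for `clock()` to resolve; raising `iterations` gives a usable figure.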
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> Meaning that a parse-options array can be fed by "rev-parse --parseopt"
>> and having the sanity check enabled does help the use case? Even there,
>> I would say that once the script writer finishes developing the script
>> that uses "rev-parse --parseopt", setting the parseopt input in stone,
>> there is no need to check the same thing over and over again. Am I
>> mistaken? Does "rev-parse --parseopt" that is fed the same input
>> sometimes trigger the sanity check and sometimes not?
>
> If we're declaring that "git rev-parse --parseopt" is something that was
> only ever intended for in-tree usage sure, that should hold true.
> So out-of-tree users wouldn't guard against
> GIT_TEST_PARSE_OPTIONS_CHECK, and I wouldn't be surprised if we could
> e.g. segfault on some subsequent code if some of the sanity checks
> aren't happening anymore.
> ...
> No, I'd be quite happy if we declared that it's for our use only, and
> could remove it when the last in-tree *.sh user went away. there's a bit
> of complexity in parse_options() required only for its use....

I do not see any need for such a declaration. We are not changing
the behaviour of "git rev-parse --parseopt" plumbing command at all
for those who feed valid input to it.

"rev-parse --parseopt" users can keep using their scripts just the
same as before. Debugging their scripts to catch silly mistakes like
duplicated short options may become slightly harder, but they still
have a way to ask for the same debugging support.

Yes, I am saying that is perfectly fine, and both in-tree and
out-of-tree users have a way to reinstate the sanity checks.

I also do not mind if your proposal were one of these:

 * introduce --parseopt-with-sanity-check to "rev-parse" and arrange
   the parse_options_check() call to be made when the command was
   invoked with it; or

 * introduce --parseopt-without-sanity-check to "rev-parse", and
   arrange the parse_options_check() call to be still made when
   "--parseopt" is used. Those who finished developing their
   scripts can rewrite their --parseopt to "without" version for
   conceptual cleanliness.

> So the performance cost is trivial & not worth worrying about.

I already said I am not worried about it, didn't I? These numbers
do not matter in this discussion.
On Wed, Mar 02 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> [...]
>> So the performance cost is trivial & not worth worrying about.
>
> I already said I am not worried about it, didn't I? These numbers
> do not matter in this discussion.

Sorry, but I really don't see the point then. You'd like to keep "git
rev-parse --parseopt", but now if you feed bad input to it you'll get
worse error messages from it, and if it's not for a performance benefit
then why?

Why would we have worse error reporting without any upside?

Another common case would be locally hacking a command that uses
parse_options(), having it do the wrong thing for some cryptic reason
we'd catch in parse_options_check(). Then eventually remember to turn on
this GIT_TEST_* knob (i.e. if testing via the command-line/debugger
instead of the test suite).

I for one do that a lot when working on the parse_options()-using
commands in-tree. If this lands I'll probably remember to add this knob
to my .bashrc, but everyone else will find out the hard way...
diff --git a/parse-options.c b/parse-options.c
index 6e57744fd2..02cfe3f2cd 100644
--- a/parse-options.c
+++ b/parse-options.c
@@ -439,6 +439,14 @@ static void check_typos(const char *arg, const struct option *options)
 	}
 }
 
+/*
+ * Check the sanity of contents of opts[] array to find programmer
+ * mistakes (like duplicated short options).
+ *
+ * This function is supposed to be no-op when it returns without
+ * dying, making a call from parse_options_start_1() to it optional
+ * in end-user builds.
+ */
 static void parse_options_check(const struct option *opts)
 {
 	int err = 0;
@@ -523,7 +531,9 @@ static void parse_options_start_1(struct parse_opt_ctx_t *ctx,
 	if ((flags & PARSE_OPT_ONE_SHOT) &&
 	    (flags & PARSE_OPT_KEEP_ARGV0))
 		BUG("Can't keep argv0 if you don't have it");
-	parse_options_check(options);
+
+	if (git_env_bool("GIT_TEST_PARSE_OPTIONS_CHECK", 0))
+		parse_options_check(options);
 }
 
 void parse_options_start(struct parse_opt_ctx_t *ctx,
diff --git a/t/README b/t/README
index f48e0542cd..b7285531f2 100644
--- a/t/README
+++ b/t/README
@@ -472,6 +472,11 @@ a test and then fails then the whole test run will abort. This can help to make
 sure the expected tests are executed and not silently skipped when their
 dependency breaks or is simply not present in a new environment.
 
+GIT_TEST_PARSE_OPTIONS_CHECK=<boolean>, when true, makes all options
+array passed to the parse-options API to be sanity checked. This
+environment variable is set to true by test-lib.sh unless it is set.
+
+
 Naming Tests
 ------------
 
diff --git a/t/test-lib.sh b/t/test-lib.sh
index e4716b0b86..805f495fd4 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -474,6 +474,9 @@ export GIT_DEFAULT_HASH
 GIT_TEST_MERGE_ALGORITHM="${GIT_TEST_MERGE_ALGORITHM:-ort}"
 export GIT_TEST_MERGE_ALGORITHM
 
+: ${GIT_TEST_PARSE_OPTIONS_CHECK:=1}
+export GIT_TEST_PARSE_OPTIONS_CHECK
+
 # Tests using GIT_TRACE typically don't want <timestamp> <file>:<line> output
 GIT_TRACE_BARE=1
 export GIT_TRACE_BARE
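For readers who want to see the shape of the change outside the git tree, the env-gated check in the patch above can be sketched as a standalone fragment. Everything here is an assumption made for illustration: `struct demo_option`, `demo_options_check()`, `demo_env_bool()`, `demo_options_start()` and the `DEMO_PARSE_OPTIONS_CHECK` variable are hypothetical, and the real `parse_options_check()` and `git_env_bool()` do considerably more (for example, this sketch only looks for duplicated short names and only accepts "1" or "true").

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much-simplified stand-in for git's struct option. */
struct demo_option {
	int short_name;        /* 0 means "no short name" */
	const char *long_name; /* NULL terminates the array */
};

/* Return how many entries reuse a short name already seen earlier. */
static int demo_options_check(const struct demo_option *opts)
{
	unsigned char seen[256] = { 0 };
	int err = 0;

	for (; opts->long_name; opts++) {
		unsigned char c = (unsigned char)opts->short_name;

		if (!opts->short_name)
			continue;
		if (seen[c]++) {
			fprintf(stderr, "short name '%c' already used (--%s)\n",
				c, opts->long_name);
			err++;
		}
	}
	return err;
}

/* Simplified boolean lookup: unset yields `def`; "1"/"true" mean on. */
static int demo_env_bool(const char *name, int def)
{
	const char *v = getenv(name);

	if (!v)
		return def;
	return !strcmp(v, "1") || !strcmp(v, "true");
}

/* Mirrors the patch's structure: the check runs only when opted in. */
static int demo_options_start(const struct demo_option *opts)
{
	if (demo_env_bool("DEMO_PARSE_OPTIONS_CHECK", 0))
		return demo_options_check(opts);
	return 0;
}
```

With the variable unset, `demo_options_start()` returns 0 even for a table that reuses a short name, which is the behaviour change the thread is debating: the mistake is still there, but nothing reports it unless the knob is turned on.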