parse-options: make parse_options_check() test-only

Message ID	xmqqr17lphav.fsf_-_@gitster.g (mailing list archive)
State	New, archived
Headers
    Return-Path: <git-owner@kernel.org>
    From: Junio C Hamano <gitster@pobox.com>
    To: git <git@vger.kernel.org>
    Subject: [PATCH] parse-options: make parse_options_check() test-only
    References: <xmqqtuck3yv2.fsf@gitster.g> <20220228073908.20553-1-chakrabortyabhradeep79@gmail.com> <xmqqzgma287n.fsf@gitster.g>
    Date: Tue, 01 Mar 2022 12:08:08 -0800
    In-Reply-To: <xmqqzgma287n.fsf@gitster.g> (Junio C. Hamano's message of "Mon, 28 Feb 2022 09:48:28 -0800")
    Message-ID: <xmqqr17lphav.fsf_-_@gitster.g>
    User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
    MIME-Version: 1.0
    Content-Type: text/plain
    Precedence: bulk

Junio C Hamano March 1, 2022, 8:08 p.m. UTC

The array of options given to the parse-options API is sanity
checked for reuse of a single-letter option for multiple entries and
other programmer mistakes by calling parse_options_check() from
parse_options_start().  This allows our developers to catch silly
mistakes early, but every caller of the parse-options API pays this
cost.  Once the set of options in an array is validated and passes
this check, there is no way for it to fail the check until a
programmer modifies the array, so repeating it is wasteful.

Introduce the GIT_TEST_PARSE_OPTIONS_CHECK environment variable and
perform the sanity check only when it is set to true.  Set it in
t/test-lib.sh so that our tests will continue to catch buggy options
arrays.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

    >  (2) Rethink if parse_options_check() can be made optional at
    >      runtime, which would (a) allow our test to enable it, and allow
    >      us to test all code paths that use parse_options() centrally,
    >      and (b) allow us to bypass the check while the end-user runs
    >      "git", to avoid overhead of checking the same option[] array,
    >      which does not change between invocations of "git", over and
    >      over again all over the world.
    >
    >      We may add the check back to parse_options_check() after doing
    >      the above.  There are already tons of "check sanity of what is
    >      inside option[]" in there, and it would be beneficial if we can
    >      separate out from parse_options_start() the sanity checking
    >      code, regardless of this topic.

    This looked too easy and there may be some pitfalls, but I am
    hoping that we will know soon enough by floating a weather
    balloon like this.

 parse-options.c | 12 +++++++++++-
 t/README        |  5 +++++
 t/test-lib.sh   |  3 +++
 3 files changed, 19 insertions(+), 1 deletion(-)
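
The gating described in the commit message can be sketched as below.
This is a minimal illustration, not the patch itself: "struct opt",
check_options(), value_is_true() and maybe_check_options() are
hypothetical stand-ins, and only the environment variable name comes
from the message above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for an options entry (assumption, not git's struct). */
struct opt {
	int short_name;        /* e.g. 'v', or 0 when there is no short form */
	const char *long_name; /* a NULL long_name terminates the array */
};

/* Count short names that are used by more than one entry. */
static int check_options(const struct opt *opts)
{
	char seen[128] = { 0 };
	int errors = 0;

	for (; opts->long_name; opts++) {
		int c = opts->short_name;

		if (c <= 0 || c >= 128)
			continue;
		if (seen[c]++) {
			fprintf(stderr, "BUG: short name '%c' used twice\n", c);
			errors++;
		}
	}
	return errors;
}

/* Hypothetical helper: which strings count as "true" is an assumption. */
static int value_is_true(const char *v)
{
	return v && (!strcmp(v, "1") || !strcmp(v, "true") ||
		     !strcmp(v, "yes") || !strcmp(v, "on"));
}

/* Run the sanity check only when the test knob is set to true. */
static int maybe_check_options(const struct opt *opts)
{
	if (!value_is_true(getenv("GIT_TEST_PARSE_OPTIONS_CHECK")))
		return 0; /* check skipped */
	return check_options(opts);
}
```

With the variable unset, maybe_check_options() returns 0 without
looking at the array; a test harness that exports the variable gets
the full check on every call.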

Ævar Arnfjörð Bjarmason March 1, 2022, 9:57 p.m. UTC | #1

On Tue, Mar 01 2022, Junio C Hamano wrote:

> The array of options given to the parse-options API is sanity
> checked for reuse of a single-letter option for multiple entries and
> other programmer mistakes by calling parse_options_check() from
> parse_options_start().  This allows our developers to catch silly
> mistakes early, but every caller of the parse-options API pays this
> cost.  Once the set of options in an array is validated and passes
> this check, there is no way for it to fail the check until a
> programmer modifies the array, so repeating it is wasteful.

That's not true due to the "git rev-parse --parseopt" interface. I'd be
happy to deprecate it, but I think the last time I brought it up you
were opposed, i.e. it's documented as plumbing in "git-rev-parse", and
it's easy to have it hit some of these BUG()'s.

I see the benefit of Johannes's suggestion of checking this once (but
with t0012-help.sh etc. we're nowhere near being able to do that).

Now this runs for the whole test suite, so our tests will have the
same behavior.

So it's just an optimization? Isn't it premature? If you run
parse_options_check() in a loop, how many checks/sec can we do? I
haven't tested, but I'm betting it's a *lot*.

So aren't we shaving microseconds off the runtime here?

Junio C Hamano March 1, 2022, 10:18 p.m. UTC | #2

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Tue, Mar 01 2022, Junio C Hamano wrote:
>
>> The array of options given to the parse-options API is sanity
>> checked for reuse of a single-letter option for multiple entries and
>> other programmer mistakes by calling parse_options_check() from
>> parse_options_start().  This allows our developers to catch silly
>> mistakes early, but every caller of the parse-options API pays this
>> cost.  Once the set of options in an array is validated and passes
>> this check, there is no way for it to fail the check until a
>> programmer modifies the array, so repeating it is wasteful.
>
> That's not true due to the "git rev-parse --parseopt" interface. I'd be

Meaning that a parse-options array can be fed by "rev-parse --parseopt"
and having the sanity check enabled does help the use case?  Even there,
I would say that once the script writer finishes developing the script
that uses "rev-parse --parseopt", setting the parseopt input in stone,
there is no need to check the same thing over and over again.  Am I
mistaken?  Does "rev-parse --parseopt" that is fed the same input
sometimes trigger the sanity check and sometimes not?

> I see the benefit of Johannes's suggestion of checking this once (but
> with t0012-help.sh etc. we're nowhere near being able to do that).
>
> Now this runs for the whole test suite, so our tests will have the
> same behavior.

The code for sanity check is there ONLY to help those who develop
while they develop, and it is logical to enable it during our tests.
There is no reason to trigger the sanity check in the end-user
environment, no?

> So aren't we shaving microseconds off the runtime here?

No, the problem I have with the runtime check is more at the
conceptual level.  Those who remove assert() by defining NDEBUG
would not be doing so to save nanoseconds, either.
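
The assert() comparison can be illustrated with a small sketch.
ALWAYS_CHECK() is a hypothetical macro written for this example,
loosely in the spirit of an always-on check; it is not git's BUG().
The standard assert() expands to nothing when NDEBUG is defined
before <assert.h> is included, so its argument is not even evaluated.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical always-on check: active however the program is built. */
#define ALWAYS_CHECK(cond, msg) \
	do { \
		if (!(cond)) { \
			fprintf(stderr, "BUG: %s\n", msg); \
			abort(); \
		} \
	} while (0)

static int checks_run;

/* Record that a check expression was evaluated, then pass the value on. */
static int counted(int value)
{
	checks_run++;
	return value;
}

/*
 * Returns how many of the two checks were evaluated: both in a normal
 * build, only the ALWAYS_CHECK() one when compiled with -DNDEBUG.
 */
static int run_both_checks(void)
{
	checks_run = 0;
	assert(counted(1));
	ALWAYS_CHECK(counted(1), "option array is broken");
	return checks_run;
}
```

Building the same file with and without -DNDEBUG changes what
run_both_checks() returns, which is the build-time switch the
paragraph above alludes to.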

Ævar Arnfjörð Bjarmason March 2, 2022, 10:52 a.m. UTC | #3

On Tue, Mar 01 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> On Tue, Mar 01 2022, Junio C Hamano wrote:
>>
>>> The array of options given to the parse-options API is sanity
>>> checked for reuse of a single-letter option for multiple entries and
>>> other programmer mistakes by calling parse_options_check() from
>>> parse_options_start().  This allows our developers to catch silly
>>> mistakes early, but every caller of the parse-options API pays this
>>> cost.  Once the set of options in an array is validated and passes
>>> this check, there is no way for it to fail the check until a
>>> programmer modifies the array, so repeating it is wasteful.
>>
>> That's not true due to the "git rev-parse --parseopt" interface. I'd be
>
> Meaning that a parse-options array can be fed by "rev-parse --parseopt"
> and having the sanity check enabled does help the use case?  Even there,
> I would say that once the script writer finishes developing the script
> that uses "rev-parse --parseopt", setting the parseopt input in stone,
> there is no need to check the same thing over and over again.  Am I
> mistaken?  Does "rev-parse --parseopt" that is fed the same input
> sometimes trigger the sanity check and sometimes not?

If we're declaring that "git rev-parse --parseopt" is something that was
only ever intended for in-tree usage sure, that should hold true.

I.e. "git rev-parse" is documented as plumbing, and we document
--parseopt as a generic option parsing mechanism you can use in
shellscripts.

So out-of-tree users wouldn't guard against
GIT_TEST_PARSE_OPTIONS_CHECK, and I wouldn't be surprised if we could
e.g. segfault on some subsequent code if some of the sanity checks
aren't happening anymore.

No, I'd be quite happy if we declared that it's for our use only, and
could remove it when the last in-tree *.sh user went away.  There's a
bit of complexity in parse_options() required only for its use....

>> I see the benefit of Johannes's suggestion of checking this once (but
>> with t0012-help.sh etc. we're nowhere near being able to do that).
>>
>> Now this runs for the whole test suite, so our tests will have the
>> same behavior.
>
> The code for sanity check is there ONLY to help those who develop
> while they develop, and it is logical to enable it during our tests.
> There is no reason to trigger the sanity check in the end-user
> environment, no?

I don't see the benefit of skipping it. Your commit message mentions
"but all callers of parse-options API pays this cost". As a quick & dumb
perf test I tried:

	diff --git a/parse-options.c b/parse-options.c
	index 6e57744fd22..cabea35e8b1 100644
	--- a/parse-options.c
	+++ b/parse-options.c
	@@ -523,7 +523,10 @@ static void parse_options_start_1(struct parse_opt_ctx_t *ctx,
	        if ((flags & PARSE_OPT_ONE_SHOT) &&
	            (flags & PARSE_OPT_KEEP_ARGV0))
	                BUG("Can't keep argv0 if you don't have it");
	-       parse_options_check(options);
	+       while (1) {
	+               printf(".");
	+               parse_options_check(options);
	+       }
	 }

	 void parse_options_start(struct parse_opt_ctx_t *ctx,

And:

    ./git [am|rebase] | pv >/dev/null

I get around 4MiB/s.  Each check prints one byte, so we can do this
check ~4 million times/sec on my computer with -O3; with -O0 -g it's
~3MiB/s.

So the performance cost is trivial & not worth worrying about.
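
A rough way to get a comparable number without pv is to time the loop
directly.  The sketch below is an assumption-laden stand-in: "struct
opt", count_short_clashes() and checks_per_second() are hypothetical,
and the check is far simpler than the real parse_options_check(), so
the absolute figures will not match the ones quoted above.

```c
#include <stdio.h>
#include <time.h>

/* Simplified stand-in for an options entry (assumption, not git's struct). */
struct opt {
	int short_name;        /* 0 when there is no short form */
	const char *long_name; /* a NULL long_name terminates the array */
};

/* Count short names that are used by more than one entry. */
static int count_short_clashes(const struct opt *opts)
{
	char seen[128] = { 0 };
	int clashes = 0;

	for (; opts->long_name; opts++) {
		int c = opts->short_name;

		if (c > 0 && c < 128 && seen[c]++)
			clashes++;
	}
	return clashes;
}

/*
 * Run the check `iterations` times and return checks per second of CPU
 * time, or 0 when the run was too short for clock() to measure.
 */
static double checks_per_second(const struct opt *opts, long iterations)
{
	/* volatile accesses keep the compiler from hoisting the call */
	const struct opt *volatile p = opts;
	volatile int sink = 0;
	clock_t start, end;
	double secs;
	long i;

	start = clock();
	for (i = 0; i < iterations; i++)
		sink += count_short_clashes(p);
	end = clock();

	secs = (double)(end - start) / CLOCKS_PER_SEC;
	return secs > 0 ? iterations / secs : 0;
}
```

Calling checks_per_second() with a few million iterations on a
representative array gives an order-of-magnitude figure; clock()
resolution makes very short runs unreliable.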

>> So aren't we shaving microseconds off the runtime here?
>
> No, the problem I have with the runtime check is more at the
> conceptual level.  Those who remove assert() by defining NDEBUG
> would not be doing so to save nanoseconds, either.

I think the trade-off of not having to worry about the runtime
vs. "development build" checks is one we've done well with BUG(),
i.e. not to have it be an assert().

E.g. in this case we have parse_options_concat(), so you can dynamically
construct the options to be checked.

I happen to have looked in detail at all of that code in the past, and I
don't *think* it's doing something "actually dynamic". I.e. it should be
the same when the tests run and when git runs in the wild.

But having to know and check that when using or changing the API is just
more state to keep in your head.
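
The concern about dynamically constructed options can be sketched as
follows.  This is a hypothetical illustration of the general hazard,
not a claim about what git's callers actually do: concat_options()
only mimics the idea of parse_options_concat(), and "struct opt" and
count_short_clashes() are simplified stand-ins.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for an options entry (assumption, not git's struct). */
struct opt {
	int short_name;        /* 0 when there is no short form */
	const char *long_name; /* a NULL long_name terminates the array */
};

/* Number of entries before the terminator. */
static size_t count_options(const struct opt *opts)
{
	size_t n = 0;

	while (opts[n].long_name)
		n++;
	return n;
}

/* Return a malloc()ed array holding a's entries, then b's, then a terminator. */
static struct opt *concat_options(const struct opt *a, const struct opt *b)
{
	size_t na = count_options(a), nb = count_options(b);
	struct opt *out = malloc((na + nb + 1) * sizeof(*out));

	if (!out)
		return NULL;
	memcpy(out, a, na * sizeof(*out));
	memcpy(out + na, b, nb * sizeof(*out));
	out[na + nb].short_name = 0;
	out[na + nb].long_name = NULL;
	return out;
}

/* Count short names that are used by more than one entry. */
static int count_short_clashes(const struct opt *opts)
{
	char seen[128] = { 0 };
	int clashes = 0;

	for (; opts->long_name; opts++) {
		int c = opts->short_name;

		if (c > 0 && c < 128 && seen[c]++)
			clashes++;
	}
	return clashes;
}
```

Two arrays that each pass the check on their own (say, one using -n
for "dry-run" and another using -n for "no-verify") yield one clash
once concatenated, so a combination chosen at runtime could fail a
check that every static array passed.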

Junio C Hamano March 2, 2022, 6:59 p.m. UTC | #4

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> Meaning that a parse-options array can be fed by "rev-parse --parseopt"
>> and having the sanity check enabled does help the use case?  Even there,
>> I would say that once the script writer finishes developing the script
>> that uses "rev-parse --parseopt", setting the parseopt input in stone,
>> there is no need to check the same thing over and over again.  Am I
>> mistaken?  Does "rev-parse --parseopt" that is fed the same input
>> sometimes trigger the sanity check and sometimes not?
>
> If we're declaring that "git rev-parse --parseopt" is something that was
> only ever intended for in-tree usage sure, that should hold true.

> So out-of-tree users wouldn't guard against
> GIT_TEST_PARSE_OPTIONS_CHECK, and I wouldn't be surprised if we could
> e.g. segfault on some subsequent code if some of the sanity checks
> aren't happening anymore.
> ...
> No, I'd be quite happy if we declared that it's for our use only, and
> could remove it when the last in-tree *.sh user went away.  There's a
> bit of complexity in parse_options() required only for its use....

I do not see any need for such a declaration.  We are not changing
the behaviour of "git rev-parse --parseopt" plumbing command at all
for those who feed valid input to it.

"rev-parse --parseopt" users can keep using their scripts just the
same as before; debugging their scripts to catch silly mistakes like
duplicated short options may become slightly harder, but they still
have a way to ask for the same debugging support.

Yes, I am saying that is perfectly fine, and both in-tree and
out-of-tree users have a way to reinstate the sanity checks.  I also
do not mind if your proposal were one of these:

 * introduce --parseopt-with-sanity-check to "rev-parse" and arrange
   the parse_options_check() call to be made when the command was
   invoked with it; or

 * introduce --parseopt-without-sanity-check to "rev-parse", and
   arrange the parse_options_check() call to be still made when
   "--parseopt" is used.  Those who finished developing their
   scripts can rewrite their --parseopt to the "without" version for
   conceptual cleanliness.

> So the performance cost is trivial & not worth worrying about.

I already said I am not worried about it, didn't I?  These numbers
do not matter in this discussion.

Ævar Arnfjörð Bjarmason March 2, 2022, 7:17 p.m. UTC | #5

On Wed, Mar 02 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> [...]
>> So the performance cost is trivial & not worth worrying about.
>
> I already said I am not worried about it, didn't I?  These numbers
> do not matter in this discussion.

Sorry, but I really don't see the point then.

You'd like to keep "git rev-parse --parseopt", but now if you feed bad
input to it you'll get worse error messages from it.  If it's not for
a performance benefit, then why?  Why would we have worse error
reporting without any upside?

Another common case would be locally hacking a command that uses
parse_options(), having it do the wrong thing for some cryptic reason
that parse_options_check() would have caught, and only eventually
remembering to turn on this GIT_TEST_* knob (i.e. if testing via the
command-line/debugger instead of the test suite).  I for one do that
a lot when working on the parse_options()-using commands in-tree.  If
this lands I'll probably remember to add this knob to my .bashrc, but
everyone else will find out the hard way...
