[1/2] config: return positive from git_config_parse_key()

Message ID	99244816307b822bd8ffcbff8690ef449c797a23.1689891436.git.gitgitgadget@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@vger.kernel.org> Message-ID: <99244816307b822bd8ffcbff8690ef449c797a23.1689891436.git.gitgitgadget@gmail.com> In-Reply-To: <pull.1551.git.git.1689891436.gitgitgadget@gmail.com> References: <pull.1551.git.git.1689891436.gitgitgadget@gmail.com> Date: Thu, 20 Jul 2023 22:17:15 +0000 Subject: [PATCH 1/2] config: return positive from git_config_parse_key() Fcc: Sent Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit MIME-Version: 1.0 To: git@vger.kernel.org Cc: Jonathan Tan <jonathantanmy@google.com>, Calvin Wan <calvinwan@google.com>, Glen Choo <chooglen@google.com>, Glen Choo <chooglen@google.com> Precedence: bulk From: Glen Choo <chooglen@google.com>
Series	config-parse: create config parsing library \| expand [0/2] config-parse: create config parsing library [1/2] config: return positive from git_config_parse_key() [2/2] config-parse: split library out of config.[c\|h]

Message ID

99244816307b822bd8ffcbff8690ef449c797a23.1689891436.git.gitgitgadget@gmail.com (mailing list archive)

State

New, archived

Headers

Message-ID: 
 <99244816307b822bd8ffcbff8690ef449c797a23.1689891436.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.1551.git.git.1689891436.gitgitgadget@gmail.com>
References: <pull.1551.git.git.1689891436.gitgitgadget@gmail.com>
Date: Thu, 20 Jul 2023 22:17:15 +0000
Subject: [PATCH 1/2] config: return positive from git_config_parse_key()
Fcc: Sent
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Jonathan Tan <jonathantanmy@google.com>,
        Calvin Wan <calvinwan@google.com>,
        Glen Choo <chooglen@google.com>,
        Glen Choo <chooglen@google.com>
Precedence: bulk
From: Glen Choo <chooglen@google.com>

Series

config-parse: create config parsing library | expand

Commit Message

Glen Choo July 20, 2023, 10:17 p.m. UTC

From: Glen Choo <chooglen@google.com>

git_config_parse_key() returns #define-d error codes, but negated. This
negation is merely a convenience to other parts of config.c that don't
bother inspecting the return value before passing it along. But:

a) There's no good reason why those callers couldn't negate the value
   themselves.

b) In other callers, this value eventually gets fed to exit(3), and
   those callers need to sanitize the negative value (and they sometimes
   do so lossily, by overriding the return value with
   CONFIG_INVALID_KEY).

c) We want to move that into a separate library, and returning only
   negative values no longer makes as much sense.

Change git_config_parse_key() to return positive values instead, and
adjust callers accordingly. Callers that sanitize the negative sign for
exit(3) now pass the return value opaquely, fixing a bug where "git
config <key with no section or name>" results in a different exit code
depending on whether we are setting or getting config. Callers that
wanted to pass along a negative value now negate the return value
themselves.

Signed-off-by: Glen Choo <chooglen@google.com>
---
 builtin/config.c   |  3 +--
 config.c           | 16 ++++++++--------
 config.h           |  2 +-
 submodule-config.c |  4 ++--
 t/t1300-config.sh  | 16 ++++++++++++++++
 5 files changed, 28 insertions(+), 13 deletions(-)

Comments

Jonathan Tan July 20, 2023, 11:44 p.m. UTC | #1

"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Glen Choo <chooglen@google.com>
> 
> git_config_parse_key() returns #define-d error codes, but negated. This
> negation is merely a convenience to other parts of config.c that don't
> bother inspecting the return value before passing it along. But:
> 
> a) There's no good reason why those callers couldn't negate the value
>    themselves.
> 
> b) In other callers, this value eventually gets fed to exit(3), and
>    those callers need to sanitize the negative value (and they sometimes
>    do so lossily, by overriding the return value with
>    CONFIG_INVALID_KEY).
> 
> c) We want to move that into a separate library, and returning only
>    negative values no longer makes as much sense.

I'm not sure if we ever concluded that functions returning errors should
return positive integers, but in this case I think it makes sense. We
can document what's returned as being the same as what's documented in
the config manpage.

The negative return was as early as when the function was first
introduced in b09c53a3e3 (Sanity-check config variable names, 2011-01-
30), but there's no indication there as to why the author chose negative
values.

> Change git_config_parse_key() to return positive values instead, and
> adjust callers accordingly. Callers that sanitize the negative sign for
> exit(3) now pass the return value opaquely, fixing a bug where "git
> config <key with no section or name>" results in a different exit code
> depending on whether we are setting or getting config.

Can you be more precise as to which bug is being fixed? (I think
somewhere, a 1 is returned when it should be a 2.)

> Callers that
> wanted to pass along a negative value now negate the return value
> themselves.

OK.

> diff --git a/builtin/config.c b/builtin/config.c
> index 1c75cbc43df..8a2840f0a8c 100644
> --- a/builtin/config.c
> +++ b/builtin/config.c
> @@ -362,8 +362,7 @@ static int get_value(const char *key_, const char *regex_, unsigned flags)
>  			goto free_strings;
>  		}
>  	} else {
> -		if (git_config_parse_key(key_, &key, NULL)) {
> -			ret = CONFIG_INVALID_KEY;
> +		if ((ret = git_config_parse_key(key_, &key, NULL))) {
>  			goto free_strings;
>  		}
>  	}

Ah, here, the return value was sanitized in such a way that it lost
information. The change makes sense.

Besides the callers modified in this patch, there is another caller
config_parse_pair() in config.c, but that just checks whether the return
value is 0, so it remaining unmodified is fine.

> diff --git a/config.h b/config.h
> index 6332d749047..40966cb6828 100644
> --- a/config.h
> +++ b/config.h
> @@ -23,7 +23,7 @@
>  
>  struct object_id;
>  
> -/* git_config_parse_key() returns these negated: */
> +/* git_config_parse_key() returns these: */
>  #define CONFIG_INVALID_KEY 1
>  #define CONFIG_NO_SECTION_OR_NAME 2

Should these be turned into an enum? Also, it might be worth adding that
these match the return values as documented in the manpage.

> diff --git a/t/t1300-config.sh b/t/t1300-config.sh
> index 387d336c91f..3202b0f8843 100755
> --- a/t/t1300-config.sh
> +++ b/t/t1300-config.sh
> @@ -2590,4 +2590,20 @@ test_expect_success 'includeIf.hasconfig:remote.*.url forbids remote url in such
>  	grep "fatal: remote URLs cannot be configured in file directly or indirectly included by includeIf.hasconfig:remote.*.url" err
>  '
>  
> +# Exit codes
> +test_expect_success '--get with bad key' '

Rather than put an "exit codes" title, maybe embed that in the test
description.

> +	# Also exits with 1 if the value is not found

I don't understand this comment - what's the difference between a bad
key and a value not being found? And if there's a difference, could we
test both?

> +	test_expect_code 1 git config --get "bad.name\n" 2>err &&
> +	grep "error: invalid key" err &&
> +	test_expect_code 2 git config --get "bad." 2>err &&
> +	grep "error: key does not contain variable name" err
> +'
> +
> +test_expect_success 'set with bad key' '
> +	test_expect_code 1 git config "bad.name\n" var 2>err &&
> +	grep "error: invalid key" err &&
> +	test_expect_code 2 git config "bad." var 2>err &&
> +	grep "error: key does not contain variable name" err
> +'

Makes sense.

From a libification perspective, I'm not sure that using positive values
to indicate error is an advantage over negative values, but it makes
sense in this particular context to have the return values match the
manpage exactly, since that is part of the benefit of this function. So
I think this patch is worth getting in by itself.

Junio C Hamano July 21, 2023, 4:32 a.m. UTC | #2

"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Glen Choo <chooglen@google.com>
>
> git_config_parse_key() returns #define-d error codes, but negated. This
> negation is merely a convenience to other parts of config.c that don't
> bother inspecting the return value before passing it along. But:
>
> a) There's no good reason why those callers couldn't negate the value
>    themselves.

That is not a good reason to break from a widely adopted convention
in UNIXy library functions to signal success with 0 and failure with
negative values.  The callers if they want to have a positive values
can flip the polarity themselves, by the way.

>
> b) In other callers, this value eventually gets fed to exit(3), and
>    those callers need to sanitize the negative value (and they sometimes
>    do so lossily, by overriding the return value with
>    CONFIG_INVALID_KEY).

There is no reason to think that each and every minute difference
the direct callers of a library function may want to notice by
different error return values needs to be propagated to the calling
process via its exit value.  It is perfectly fine and expected for
the status values of the entire process is more coarse grained than
individual library calls, the latter may convey not just "I failed"
but "I failed why" to their callers, while the former may not want
to say "I made a call to some library functions and got this error
code", let alone "I called library function X and got error code Y".

In other words, if your program does

	err = library_call_about_some_filesystem_operation();
	if (err)
		exit(err); /* or exit(-err) */
	err = library_call_about_some_database_operation();
	if (err)
		exit(err); /* or exit(-err) */
	err = library_call_about_some_parsing();
	if (err)
		exit(err); /* or exit(-err) */

there is something wrong.  The error codes from these different
library functions share the same "integer" namespace without being
segregated, and expecting the calling process to be able to tell
what error we discovered by relaying the literal translation of low
level library error code would not work.  The exit() codes would
need to be wider (i.e. not limited only to the possible errors from
a single library function) and would be coarser (i.e. a filesystem
operation may say "open failed due to permission error" or "open
failed because there was no such file"; at the end-user level, it
may be more appropriate to say "configuration file could not be
read", regardless of the reason why the filesystem operation
failed).

	err = library_call_about_some_filesystem_operation();
	if (err) {
		error("cannot open the filesystem entity");
		exit(ERR_FILESYSTEM);
	err = library_call_about_some_database_operation();
	if (err)
		exit(ERR_DATABASE);
	err = library_call_about_some_parsing();
	if (err)
		exit(ERR_PARSING);

So, this is not a good reason, either.

> c) We want to move that into a separate library, and returning only
>    negative values no longer makes as much sense.

Quite the contrary, if it were a purely internal convention, we may
not care too much, as we have only ourselves to confuse by picking
an unusual convention, but if we are making more parts of our code
available in a library-ish interface, I would expect they follow
some convention, and that convention would be "0 for success,
negative for failure".

Thanks.

Glen Choo July 21, 2023, 4:12 p.m. UTC | #3

Junio C Hamano <gitster@pobox.com> writes:

>> From: Glen Choo <chooglen@google.com>
>>
>> git_config_parse_key() returns #define-d error codes, but negated. This
>> negation is merely a convenience to other parts of config.c that don't
>> bother inspecting the return value before passing it along. But:
>>
>> a) There's no good reason why those callers couldn't negate the value
>>    themselves.
>
> That is not a good reason to break from a widely adopted convention
> in UNIXy library functions to signal success with 0 and failure with
> negative values.  The callers if they want to have a positive values
> can flip the polarity themselves, by the way.

Oh, interesting. I was trying to follow the conventions of the
surrounding config.c code and many other parts of the codebase, which
returns positive values. Why do we choose to return postive values
throughout the codebase, by the way? Is it because they were really
intended for exit(3), and not to be used as a library.

>>
>> b) In other callers, this value eventually gets fed to exit(3), and
>>    those callers need to sanitize the negative value (and they sometimes
>>    do so lossily, by overriding the return value with
>>    CONFIG_INVALID_KEY).
>
> There is no reason to think that each and every minute difference
> the direct callers of a library function may want to notice by
> different error return values needs to be propagated to the calling
> process via its exit value.

Yes, I fully agree. I didn't intend to be a statement on how things
should be, but rather how they already are. The oddities in this case
are:

- No callers actually care about the sign of git_config_parse_key()
  since it can only return values of one sign. Only
  configset_find_element() benefits from the sign being negative (since
  it returns negative on config key errors), but instead of putting the
  burden on the function it depends on, it could just return the
  negative value itself. And this "burden" is real, in that other
  callers have to worry about the negative value.

- For better or worse, we've already wired git_config_parse_key()
  directly to the exit values, e.g. if one peeks into
  builtin/config.c:cmd_config(), we'll see "return
  git_config_set_multivar_in_file_gently()", which in turn may return
  the value from git_config_parse_key(). (And as a result, we also try
  hard to separate the error values returnable by git_config_parse_key()
  vs the others returnable by git_config_set_multivar_in_file_gently().)

  I would strongly prefer if builtin/config.c took more responsibility
  for noticing config.c errors and sanitizing them accordingly, but it
  seemed like too much churn for this series. If you think it is the
  right time for it, though (which I think it might be), I could try to
  make that change.

>> c) We want to move that into a separate library, and returning only
>>    negative values no longer makes as much sense.
>
> Quite the contrary, if it were a purely internal convention, we may
> not care too much, as we have only ourselves to confuse by picking
> an unusual convention, but if we are making more parts of our code
> available in a library-ish interface, I would expect they follow
> some convention, and that convention would be "0 for success,
> negative for failure".

Right. I do not care what the convention is, only that we pick one. I
chose the one that I saw in the surrounding code (positive), but I'm
happy to go with UNIXy (negative) if others think it is worth the churn.

Junio C Hamano July 21, 2023, 4:36 p.m. UTC | #4

Glen Choo <chooglen@google.com> writes:

> Oh, interesting. I was trying to follow the conventions of the
> surrounding config.c code and many other parts of the codebase, which
> returns positive values. Why do we choose to return postive values
> throughout the codebase, by the way? Is it because they were really
> intended for exit(3), and not to be used as a library.

If config.c does that, I'd say that was poorly designed oddball.
Looking at read-cache.c (which is older parts of the codebase
written back when the developer base was smaller) may give you a
better examples to follow.  After all, error() returns negative
exactly because we want to follow the usual "negative is an error"
convention.

diff --git a/builtin/config.c b/builtin/config.c
index 1c75cbc43df..8a2840f0a8c 100644
--- a/builtin/config.c
+++ b/builtin/config.c
@@ -362,8 +362,7 @@  static int get_value(const char *key_, const char *regex_, unsigned flags)
 			goto free_strings;
 		}
 	} else {
-		if (git_config_parse_key(key_, &key, NULL)) {
-			ret = CONFIG_INVALID_KEY;
+		if ((ret = git_config_parse_key(key_, &key, NULL))) {
 			goto free_strings;
 		}
 	}
diff --git a/config.c b/config.c
index 85c5f35132c..ca77ca17a47 100644
--- a/config.c
+++ b/config.c
@@ -534,8 +534,9 @@  static inline int iskeychar(int c)
  * Auxiliary function to sanity-check and split the key into the section
  * identifier and variable name.
  *
- * Returns 0 on success, -1 when there is an invalid character in the key and
- * -2 if there is no section name in the key.
+ * Returns 0 on success, CONFIG_INVALID_KEY when there is an invalid character
+ * in the key and CONFIG_NO_SECTION_OR_NAME if there is no section name in the
+ * key.
  *
  * store_key - pointer to char* which will hold a copy of the key with
  *             lowercase section and variable name
@@ -555,12 +556,12 @@  int git_config_parse_key(const char *key, char **store_key, size_t *baselen_)
 
 	if (last_dot == NULL || last_dot == key) {
 		error(_("key does not contain a section: %s"), key);
-		return -CONFIG_NO_SECTION_OR_NAME;
+		return CONFIG_NO_SECTION_OR_NAME;
 	}
 
 	if (!last_dot[1]) {
 		error(_("key does not contain variable name: %s"), key);
-		return -CONFIG_NO_SECTION_OR_NAME;
+		return CONFIG_NO_SECTION_OR_NAME;
 	}
 
 	baselen = last_dot - key;
@@ -596,7 +597,7 @@  int git_config_parse_key(const char *key, char **store_key, size_t *baselen_)
 
 out_free_ret_1:
 	FREE_AND_NULL(*store_key);
-	return -CONFIG_INVALID_KEY;
+	return CONFIG_INVALID_KEY;
 }
 
 static int config_parse_pair(const char *key, const char *value,
@@ -2346,7 +2347,7 @@  static int configset_find_element(struct config_set *set, const char *key,
 	 * `key` may come from the user, so normalize it before using it
 	 * for querying entries from the hashmap.
 	 */
-	ret = git_config_parse_key(key, &normalized_key, NULL);
+	ret = -git_config_parse_key(key, &normalized_key, NULL);
 	if (ret)
 		return ret;
 
@@ -3334,8 +3335,7 @@  int git_config_set_multivar_in_file_gently(const char *config_filename,
 	size_t contents_sz;
 	struct config_store_data store = CONFIG_STORE_INIT;
 
-	/* parse-key returns negative; flip the sign to feed exit(3) */
-	ret = 0 - git_config_parse_key(key, &store.key, &store.baselen);
+	ret = git_config_parse_key(key, &store.key, &store.baselen);
 	if (ret)
 		goto out_free;
 
diff --git a/config.h b/config.h
index 6332d749047..40966cb6828 100644
--- a/config.h
+++ b/config.h
@@ -23,7 +23,7 @@ 
 
 struct object_id;
 
-/* git_config_parse_key() returns these negated: */
+/* git_config_parse_key() returns these: */
 #define CONFIG_INVALID_KEY 1
 #define CONFIG_NO_SECTION_OR_NAME 2
 /* git_config_set_gently(), git_config_set_multivar_gently() return the above or these: */
diff --git a/submodule-config.c b/submodule-config.c
index b6908e295f4..2aafc7f9cbe 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -824,8 +824,8 @@  int print_config_from_gitmodules(struct repository *repo, const char *key)
 	char *store_key;
 
 	ret = git_config_parse_key(key, &store_key, NULL);
-	if (ret < 0)
-		return CONFIG_INVALID_KEY;
+	if (ret)
+		return ret;
 
 	config_from_gitmodules(config_print_callback, repo, store_key);
 
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 387d336c91f..3202b0f8843 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -2590,4 +2590,20 @@  test_expect_success 'includeIf.hasconfig:remote.*.url forbids remote url in such
 	grep "fatal: remote URLs cannot be configured in file directly or indirectly included by includeIf.hasconfig:remote.*.url" err
 '
 
+# Exit codes
+test_expect_success '--get with bad key' '
+	# Also exits with 1 if the value is not found
+	test_expect_code 1 git config --get "bad.name\n" 2>err &&
+	grep "error: invalid key" err &&
+	test_expect_code 2 git config --get "bad." 2>err &&
+	grep "error: key does not contain variable name" err
+'
+
+test_expect_success 'set with bad key' '
+	test_expect_code 1 git config "bad.name\n" var 2>err &&
+	grep "error: invalid key" err &&
+	test_expect_code 2 git config "bad." var 2>err &&
+	grep "error: key does not contain variable name" err
+'
+
 test_done

[1/2] config: return positive from git_config_parse_key()

Commit Message

Comments

Patch