diff mbox series

[RFC] config: teach `repo_config()` to allow `repo` to be NULL

Message ID 20250227175456.1129840-1-usmanakinyemi202@gmail.com (mailing list archive)
State New
Headers show
Series [RFC] config: teach `repo_config()` to allow `repo` to be NULL | expand

Commit Message

Usman Akinyemi Feb. 27, 2025, 5:54 p.m. UTC
The `repo` value can be NULL if a builtin command is run outside
any repository. The current implementation of `repo_config()` will
fail if `repo` is NULL.

If the `repo` is NULL the `repo_config()` can ignore the repository
configuration but it should read the other configuration sources like
the system-side configuration instead of failing.

Teach the `repo_config()` to allow `repo` to be NULL by calling the
`read_very_early_config()` which read config but only enumerate system
and global settings.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
---
 config.c | 4 ++++
 config.h | 3 +++
 2 files changed, 7 insertions(+)

Comments

Eric Sunshine Feb. 27, 2025, 6:35 p.m. UTC | #1
On Thu, Feb 27, 2025 at 12:55 PM Usman Akinyemi
<usmanakinyemi202@gmail.com> wrote:
> The `repo` value can be NULL if a builtin command is run outside
> any repository. The current implementation of `repo_config()` will
> fail if `repo` is NULL.
>
> If the `repo` is NULL the `repo_config()` can ignore the repository
> configuration but it should read the other configuration sources like
> the system-side configuration instead of failing.

s/side/wide/

> Teach the `repo_config()` to allow `repo` to be NULL by calling the
> `read_very_early_config()` which read config but only enumerate system
> and global settings.
>
> Suggested-by: Junio C Hamano <gitster@pobox.com>
> Mentored-by: Christian Couder <chriscool@tuxfamily.org>
> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Junio C Hamano Feb. 27, 2025, 6:46 p.m. UTC | #2
Usman Akinyemi <usmanakinyemi202@gmail.com> writes:

> The `repo` value can be NULL if a builtin command is run outside
> any repository. The current implementation of `repo_config()` will
> fail if `repo` is NULL.
>
> If the `repo` is NULL the `repo_config()` can ignore the repository
> configuration but it should read the other configuration sources like
> the system-side configuration instead of failing.
>
> Teach the `repo_config()` to allow `repo` to be NULL by calling the
> `read_very_early_config()` which read config but only enumerate system
> and global settings.
>
> Suggested-by: Junio C Hamano <gitster@pobox.com>
> Mentored-by: Christian Couder <chriscool@tuxfamily.org>
> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
> ---
>  config.c | 4 ++++
>  config.h | 3 +++
>  2 files changed, 7 insertions(+)

Thanks for tackling this one.  The other day I was looking at the
code paths involved and was trying to think of a way to "fix" it at
deeper level than this one, but didn't think of this much simpler
solution.

This is so drastic a change of behaviour that it deserves some
tests, I think.


> diff --git a/config.c b/config.c
> index 36f76fafe5..c5181fd23b 100644
> --- a/config.c
> +++ b/config.c
> @@ -2526,6 +2526,10 @@ void repo_config_clear(struct repository *repo)
>  
>  void repo_config(struct repository *repo, config_fn_t fn, void *data)
>  {
> +	if (!repo) {
> +		read_very_early_config(fn, data);
> +		return;
> +	}
>  	git_config_check_init(repo);
>  	configset_iter(repo->config, fn, data);
>  }
> diff --git a/config.h b/config.h
> index 5c730c4f89..1e5b22dfc4 100644
> --- a/config.h
> +++ b/config.h
> @@ -219,6 +219,9 @@ void read_very_early_config(config_fn_t cb, void *data);
>   * repo-specific one; by overwriting, the higher-priority repo-specific
>   * value is left at the end).
>   *
> + * In cases where the repository variable is NULL, repo_config() will
> + * call read_early_config().
> + *
>   * Unlike git_config_from_file(), this function respects includes.
>   */
>  void repo_config(struct repository *r, config_fn_t fn, void *);
Phillip Wood Feb. 28, 2025, 10:56 a.m. UTC | #3
Hi Usman

On 27/02/2025 17:54, Usman Akinyemi wrote:
> The `repo` value can be NULL if a builtin command is run outside
> any repository. The current implementation of `repo_config()` will
> fail if `repo` is NULL.
> 
> If the `repo` is NULL the `repo_config()` can ignore the repository
> configuration but it should read the other configuration sources like
> the system-side configuration instead of failing.
> 
> Teach the `repo_config()` to allow `repo` to be NULL by calling the
> `read_very_early_config()` which read config but only enumerate system
> and global settings.
"
Thanks for working on this, I like the idea but looking at 
read_very_early_config() it sets "opts.ignore_cmdline = 1" which means 
that this will ignore any config options passed with "git -c key=value". 
I think it would be better to call config_with_options() with the 
appropriate options directly.

For this to work all the commands that run outside a repository would 
have to read the config via repo_config(), and take care not to call any 
of the repo_config_get_*() functions. They mostly seem to do that but 
"git for-each-repo" calls repo_config_get_string_multi() - it should be 
easy enough to convert that to a callback when that command is updated 
to stop using "the_repository"

Best Wishes

Phillip

> Suggested-by: Junio C Hamano <gitster@pobox.com>
> Mentored-by: Christian Couder <chriscool@tuxfamily.org>
> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
> ---
>   config.c | 4 ++++
>   config.h | 3 +++
>   2 files changed, 7 insertions(+)
> 
> diff --git a/config.c b/config.c
> index 36f76fafe5..c5181fd23b 100644
> --- a/config.c
> +++ b/config.c
> @@ -2526,6 +2526,10 @@ void repo_config_clear(struct repository *repo)
>   
>   void repo_config(struct repository *repo, config_fn_t fn, void *data)
>   {
> +	if (!repo) {
> +		read_very_early_config(fn, data);
> +		return;
> +	}
>   	git_config_check_init(repo);
>   	configset_iter(repo->config, fn, data);
>   }
> diff --git a/config.h b/config.h
> index 5c730c4f89..1e5b22dfc4 100644
> --- a/config.h
> +++ b/config.h
> @@ -219,6 +219,9 @@ void read_very_early_config(config_fn_t cb, void *data);
>    * repo-specific one; by overwriting, the higher-priority repo-specific
>    * value is left at the end).
>    *
> + * In cases where the repository variable is NULL, repo_config() will
> + * call read_early_config().
> + *
>    * Unlike git_config_from_file(), this function respects includes.
>    */
>   void repo_config(struct repository *r, config_fn_t fn, void *);
Junio C Hamano Feb. 28, 2025, 6:14 p.m. UTC | #4
Phillip Wood <phillip.wood123@gmail.com> writes:

> Thanks for working on this, I like the idea but looking at
> read_very_early_config() it sets "opts.ignore_cmdline = 1" which means
> that this will ignore any config options passed with "git -c
> key=value". I think it would be better to call config_with_options()
> with the appropriate options directly.

hmph.

> For this to work all the commands that run outside a repository would
> have to read the config via repo_config(), and take care not to call
> any of the repo_config_get_*() functions. They mostly seem to do that
> but "git for-each-repo" calls repo_config_get_string_multi() - it
> should be easy enough to convert that to a callback when that command
> is updated to stop using "the_repository"

Well the only reason why we want to allow NULL in repo_config()
calls is to accomodate this pattern, if I am reading the discussion
correctly.

 * A command that wants to run in a repository (i.e. marked with
   RUN_SETUP, not RUN_SETUP_GENTLY) is invoked with "-h" and outside
   the repository.

 * cmd_foo() is called with "-h" option in argv[]; repo is set to
   NULL.

 * The typical start-up sequence for any builtin cmd_foo() is to
   read the configuration and then call parseopt, and the latter is
   how "git foo -h" is handled.

 * Historically, the "call the configuration" was done by making a
   call to git_config(), which used the_repository [*].  But recent
   trend is to use the repo passed down to cmd_foo() instead, and
   replacing git_config() with repo_config() blindly would break
   unless repo_config() is prepared to take NULL.

So, any command that requires to be in a repository would not go
beyond its call to parse_options() when called with repo==NULL.
Either it would have already died inside getup_git_directory() when
"-h" is not given, or parse_options() would have gave the usage
string and exited.

For commands like "for-each-repo" that itself wants to be able to
run outside a repository by marking itself as RUN_SETUP_GENTLY,
they do have to update the code after the parse_options()
themselves, of course, but I view it as a separate issue from this
"we make git_config() as the first thing in everything---we want
to replace it with repo_config()" patch.

Thanks.


[Footnote]

 * I personally think that it is an unhealthy fundamentalism to try
   eradicating the use of the_repository even from the top-level
   calls of cmd_foo() functions, which are like traditional main().

   They are never meant to be reused as a subroutine that can work
   on an arbitrary repository from any arbitrary codepaths.  The
   special repository instance, the_repository, is set up to be
   suitable to work in the current repository, if exists, and gives
   a reasonable fallback behaviour when we are not inside a
   repository, which behaves exactly how we want it to behave there.

   The more library-ish helper functions (I am talking about
   distinction between what is in say builtin/diff.c and diff-lib.c
   here and referring to the latter) are totally different matter,
   of course.  They are meant to be reused and they should be made
   to work on an arbitrary repository from arbitrary codepaths,
   hence reducing the reliance on the_repository is a good goal for
   them to aim at.
Usman Akinyemi Feb. 28, 2025, 11:56 p.m. UTC | #5
On Fri, Feb 28, 2025 at 11:44 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Phillip Wood <phillip.wood123@gmail.com> writes:
>
> > Thanks for working on this, I like the idea but looking at
> > read_very_early_config() it sets "opts.ignore_cmdline = 1" which means
> > that this will ignore any config options passed with "git -c
> > key=value". I think it would be better to call config_with_options()
> > with the appropriate options directly.
>
> hmph.
>
Hello,
> > For this to work all the commands that run outside a repository would
> > have to read the config via repo_config(), and take care not to call
> > any of the repo_config_get_*() functions. They mostly seem to do that
> > but "git for-each-repo" calls repo_config_get_string_multi() - it
> > should be easy enough to convert that to a callback when that command
> > is updated to stop using "the_repository"
>
> Well the only reason why we want to allow NULL in repo_config()
> calls is to accomodate this pattern, if I am reading the discussion
> correctly.
>
>  * A command that wants to run in a repository (i.e. marked with
>    RUN_SETUP, not RUN_SETUP_GENTLY) is invoked with "-h" and outside
>    the repository.
>
>  * cmd_foo() is called with "-h" option in argv[]; repo is set to
>    NULL.
>
>  * The typical start-up sequence for any builtin cmd_foo() is to
>    read the configuration and then call parseopt, and the latter is
>    how "git foo -h" is handled.
>
>  * Historically, the "call the configuration" was done by making a
>    call to git_config(), which used the_repository [*].  But recent
>    trend is to use the repo passed down to cmd_foo() instead, and
>    replacing git_config() with repo_config() blindly would break
>    unless repo_config() is prepared to take NULL.
Yeah, these all are true. Also, "git for-each-repo" and many other commands
seems not to make use of rep_config() or git_config(). I think this change and
functions are good for command marked RUN_SETUP since their behaviour
is pretty straight forward. I think, we might want to use another approach for
other commands that are not marked RUN_SETUP if we decide to remove
the_repository from them.
>
> So, any command that requires to be in a repository would not go
> beyond its call to parse_options() when called with repo==NULL.
> Either it would have already died inside getup_git_directory() when
> "-h" is not given, or parse_options() would have gave the usage
> string and exited.
>
> For commands like "for-each-repo" that itself wants to be able to
> run outside a repository by marking itself as RUN_SETUP_GENTLY,
> they do have to update the code after the parse_options()
> themselves, of course, but I view it as a separate issue from this
> "we make git_config() as the first thing in everything---we want
> to replace it with repo_config()" patch.
>
> Thanks.
>
>
> [Footnote]
>
>  * I personally think that it is an unhealthy fundamentalism to try
>    eradicating the use of the_repository even from the top-level
>    calls of cmd_foo() functions, which are like traditional main().
>
>    They are never meant to be reused as a subroutine that can work
>    on an arbitrary repository from any arbitrary codepaths.  The
>    special repository instance, the_repository, is set up to be
>    suitable to work in the current repository, if exists, and gives
>    a reasonable fallback behaviour when we are not inside a
>    repository, which behaves exactly how we want it to behave there.
>
>    The more library-ish helper functions (I am talking about
>    distinction between what is in say builtin/diff.c and diff-lib.c
>    here and referring to the latter) are totally different matter,
>    of course.  They are meant to be reused and they should be made
>    to work on an arbitrary repository from arbitrary codepaths,
>    hence reducing the reliance on the_repository is a good goal for
>    them to aim at.
Yeah, I understand. I believe these change we are trying to make to
builtin commands that are marked RUN_SETUP is a good goal due to how
they work, what  do you think ?

Also, about the testing, I was thinking of using the clar framework or the
test-tool, do you have any in mind ?

Thank you.
Junio C Hamano March 1, 2025, 7:45 p.m. UTC | #6
Usman Akinyemi <usmanakinyemi202@gmail.com> writes:

> Also, about the testing, I was thinking of using the clar framework or the
> test-tool, do you have any in mind ?

Neither of them seems a good fit for the task to me.  

Once you rewrite one of the built-in commands using this and run
"git $cmd -h" under "nongit" helper, wouldn't that be a good enough
test to future-proof the codepath?
diff mbox series

Patch

diff --git a/config.c b/config.c
index 36f76fafe5..c5181fd23b 100644
--- a/config.c
+++ b/config.c
@@ -2526,6 +2526,10 @@  void repo_config_clear(struct repository *repo)
 
 void repo_config(struct repository *repo, config_fn_t fn, void *data)
 {
+	if (!repo) {
+		read_very_early_config(fn, data);
+		return;
+	}
 	git_config_check_init(repo);
 	configset_iter(repo->config, fn, data);
 }
diff --git a/config.h b/config.h
index 5c730c4f89..1e5b22dfc4 100644
--- a/config.h
+++ b/config.h
@@ -219,6 +219,9 @@  void read_very_early_config(config_fn_t cb, void *data);
  * repo-specific one; by overwriting, the higher-priority repo-specific
  * value is left at the end).
  *
+ * In cases where the repository variable is NULL, repo_config() will
+ * call read_early_config().
+ *
  * Unlike git_config_from_file(), this function respects includes.
  */
 void repo_config(struct repository *r, config_fn_t fn, void *);