Message ID | 20250227175456.1129840-1-usmanakinyemi202@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [RFC] config: teach `repo_config()` to allow `repo` to be NULL | expand |
On Thu, Feb 27, 2025 at 12:55 PM Usman Akinyemi <usmanakinyemi202@gmail.com> wrote: > The `repo` value can be NULL if a builtin command is run outside > any repository. The current implementation of `repo_config()` will > fail if `repo` is NULL. > > If the `repo` is NULL the `repo_config()` can ignore the repository > configuration but it should read the other configuration sources like > the system-side configuration instead of failing. s/side/wide/ > Teach the `repo_config()` to allow `repo` to be NULL by calling the > `read_very_early_config()` which read config but only enumerate system > and global settings. > > Suggested-by: Junio C Hamano <gitster@pobox.com> > Mentored-by: Christian Couder <chriscool@tuxfamily.org> > Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Usman Akinyemi <usmanakinyemi202@gmail.com> writes: > The `repo` value can be NULL if a builtin command is run outside > any repository. The current implementation of `repo_config()` will > fail if `repo` is NULL. > > If the `repo` is NULL the `repo_config()` can ignore the repository > configuration but it should read the other configuration sources like > the system-side configuration instead of failing. > > Teach the `repo_config()` to allow `repo` to be NULL by calling the > `read_very_early_config()` which read config but only enumerate system > and global settings. > > Suggested-by: Junio C Hamano <gitster@pobox.com> > Mentored-by: Christian Couder <chriscool@tuxfamily.org> > Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> > --- > config.c | 4 ++++ > config.h | 3 +++ > 2 files changed, 7 insertions(+) Thanks for tackling this one. The other day I was looking at the code paths involved and was trying to think of a way to "fix" it at deeper level than this one, but didn't think of this much simpler solution. This is so drastic a change of behaviour that it deserves some tests, I think. > diff --git a/config.c b/config.c > index 36f76fafe5..c5181fd23b 100644 > --- a/config.c > +++ b/config.c > @@ -2526,6 +2526,10 @@ void repo_config_clear(struct repository *repo) > > void repo_config(struct repository *repo, config_fn_t fn, void *data) > { > + if (!repo) { > + read_very_early_config(fn, data); > + return; > + } > git_config_check_init(repo); > configset_iter(repo->config, fn, data); > } > diff --git a/config.h b/config.h > index 5c730c4f89..1e5b22dfc4 100644 > --- a/config.h > +++ b/config.h > @@ -219,6 +219,9 @@ void read_very_early_config(config_fn_t cb, void *data); > * repo-specific one; by overwriting, the higher-priority repo-specific > * value is left at the end). > * > + * In cases where the repository variable is NULL, repo_config() will > + * call read_early_config(). > + * > * Unlike git_config_from_file(), this function respects includes. > */ > void repo_config(struct repository *r, config_fn_t fn, void *);
Hi Usman On 27/02/2025 17:54, Usman Akinyemi wrote: > The `repo` value can be NULL if a builtin command is run outside > any repository. The current implementation of `repo_config()` will > fail if `repo` is NULL. > > If the `repo` is NULL the `repo_config()` can ignore the repository > configuration but it should read the other configuration sources like > the system-side configuration instead of failing. > > Teach the `repo_config()` to allow `repo` to be NULL by calling the > `read_very_early_config()` which read config but only enumerate system > and global settings. " Thanks for working on this, I like the idea but looking at read_very_early_config() it sets "opts.ignore_cmdline = 1" which means that this will ignore any config options passed with "git -c key=value". I think it would be better to call config_with_options() with the appropriate options directly. For this to work all the commands that run outside a repository would have to read the config via repo_config(), and take care not to call any of the repo_config_get_*() functions. They mostly seem to do that but "git for-each-repo" calls repo_config_get_string_multi() - it should be easy enough to convert that to a callback when that command is updated to stop using "the_repository" Best Wishes Phillip > Suggested-by: Junio C Hamano <gitster@pobox.com> > Mentored-by: Christian Couder <chriscool@tuxfamily.org> > Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> > --- > config.c | 4 ++++ > config.h | 3 +++ > 2 files changed, 7 insertions(+) > > diff --git a/config.c b/config.c > index 36f76fafe5..c5181fd23b 100644 > --- a/config.c > +++ b/config.c > @@ -2526,6 +2526,10 @@ void repo_config_clear(struct repository *repo) > > void repo_config(struct repository *repo, config_fn_t fn, void *data) > { > + if (!repo) { > + read_very_early_config(fn, data); > + return; > + } > git_config_check_init(repo); > configset_iter(repo->config, fn, data); > } > diff --git a/config.h b/config.h > index 5c730c4f89..1e5b22dfc4 100644 > --- a/config.h > +++ b/config.h > @@ -219,6 +219,9 @@ void read_very_early_config(config_fn_t cb, void *data); > * repo-specific one; by overwriting, the higher-priority repo-specific > * value is left at the end). > * > + * In cases where the repository variable is NULL, repo_config() will > + * call read_early_config(). > + * > * Unlike git_config_from_file(), this function respects includes. > */ > void repo_config(struct repository *r, config_fn_t fn, void *);
Phillip Wood <phillip.wood123@gmail.com> writes: > Thanks for working on this, I like the idea but looking at > read_very_early_config() it sets "opts.ignore_cmdline = 1" which means > that this will ignore any config options passed with "git -c > key=value". I think it would be better to call config_with_options() > with the appropriate options directly. hmph. > For this to work all the commands that run outside a repository would > have to read the config via repo_config(), and take care not to call > any of the repo_config_get_*() functions. They mostly seem to do that > but "git for-each-repo" calls repo_config_get_string_multi() - it > should be easy enough to convert that to a callback when that command > is updated to stop using "the_repository" Well the only reason why we want to allow NULL in repo_config() calls is to accomodate this pattern, if I am reading the discussion correctly. * A command that wants to run in a repository (i.e. marked with RUN_SETUP, not RUN_SETUP_GENTLY) is invoked with "-h" and outside the repository. * cmd_foo() is called with "-h" option in argv[]; repo is set to NULL. * The typical start-up sequence for any builtin cmd_foo() is to read the configuration and then call parseopt, and the latter is how "git foo -h" is handled. * Historically, the "call the configuration" was done by making a call to git_config(), which used the_repository [*]. But recent trend is to use the repo passed down to cmd_foo() instead, and replacing git_config() with repo_config() blindly would break unless repo_config() is prepared to take NULL. So, any command that requires to be in a repository would not go beyond its call to parse_options() when called with repo==NULL. Either it would have already died inside getup_git_directory() when "-h" is not given, or parse_options() would have gave the usage string and exited. For commands like "for-each-repo" that itself wants to be able to run outside a repository by marking itself as RUN_SETUP_GENTLY, they do have to update the code after the parse_options() themselves, of course, but I view it as a separate issue from this "we make git_config() as the first thing in everything---we want to replace it with repo_config()" patch. Thanks. [Footnote] * I personally think that it is an unhealthy fundamentalism to try eradicating the use of the_repository even from the top-level calls of cmd_foo() functions, which are like traditional main(). They are never meant to be reused as a subroutine that can work on an arbitrary repository from any arbitrary codepaths. The special repository instance, the_repository, is set up to be suitable to work in the current repository, if exists, and gives a reasonable fallback behaviour when we are not inside a repository, which behaves exactly how we want it to behave there. The more library-ish helper functions (I am talking about distinction between what is in say builtin/diff.c and diff-lib.c here and referring to the latter) are totally different matter, of course. They are meant to be reused and they should be made to work on an arbitrary repository from arbitrary codepaths, hence reducing the reliance on the_repository is a good goal for them to aim at.
On Fri, Feb 28, 2025 at 11:44 PM Junio C Hamano <gitster@pobox.com> wrote: > > Phillip Wood <phillip.wood123@gmail.com> writes: > > > Thanks for working on this, I like the idea but looking at > > read_very_early_config() it sets "opts.ignore_cmdline = 1" which means > > that this will ignore any config options passed with "git -c > > key=value". I think it would be better to call config_with_options() > > with the appropriate options directly. > > hmph. > Hello, > > For this to work all the commands that run outside a repository would > > have to read the config via repo_config(), and take care not to call > > any of the repo_config_get_*() functions. They mostly seem to do that > > but "git for-each-repo" calls repo_config_get_string_multi() - it > > should be easy enough to convert that to a callback when that command > > is updated to stop using "the_repository" > > Well the only reason why we want to allow NULL in repo_config() > calls is to accomodate this pattern, if I am reading the discussion > correctly. > > * A command that wants to run in a repository (i.e. marked with > RUN_SETUP, not RUN_SETUP_GENTLY) is invoked with "-h" and outside > the repository. > > * cmd_foo() is called with "-h" option in argv[]; repo is set to > NULL. > > * The typical start-up sequence for any builtin cmd_foo() is to > read the configuration and then call parseopt, and the latter is > how "git foo -h" is handled. > > * Historically, the "call the configuration" was done by making a > call to git_config(), which used the_repository [*]. But recent > trend is to use the repo passed down to cmd_foo() instead, and > replacing git_config() with repo_config() blindly would break > unless repo_config() is prepared to take NULL. Yeah, these all are true. Also, "git for-each-repo" and many other commands seems not to make use of rep_config() or git_config(). I think this change and functions are good for command marked RUN_SETUP since their behaviour is pretty straight forward. I think, we might want to use another approach for other commands that are not marked RUN_SETUP if we decide to remove the_repository from them. > > So, any command that requires to be in a repository would not go > beyond its call to parse_options() when called with repo==NULL. > Either it would have already died inside getup_git_directory() when > "-h" is not given, or parse_options() would have gave the usage > string and exited. > > For commands like "for-each-repo" that itself wants to be able to > run outside a repository by marking itself as RUN_SETUP_GENTLY, > they do have to update the code after the parse_options() > themselves, of course, but I view it as a separate issue from this > "we make git_config() as the first thing in everything---we want > to replace it with repo_config()" patch. > > Thanks. > > > [Footnote] > > * I personally think that it is an unhealthy fundamentalism to try > eradicating the use of the_repository even from the top-level > calls of cmd_foo() functions, which are like traditional main(). > > They are never meant to be reused as a subroutine that can work > on an arbitrary repository from any arbitrary codepaths. The > special repository instance, the_repository, is set up to be > suitable to work in the current repository, if exists, and gives > a reasonable fallback behaviour when we are not inside a > repository, which behaves exactly how we want it to behave there. > > The more library-ish helper functions (I am talking about > distinction between what is in say builtin/diff.c and diff-lib.c > here and referring to the latter) are totally different matter, > of course. They are meant to be reused and they should be made > to work on an arbitrary repository from arbitrary codepaths, > hence reducing the reliance on the_repository is a good goal for > them to aim at. Yeah, I understand. I believe these change we are trying to make to builtin commands that are marked RUN_SETUP is a good goal due to how they work, what do you think ? Also, about the testing, I was thinking of using the clar framework or the test-tool, do you have any in mind ? Thank you.
Usman Akinyemi <usmanakinyemi202@gmail.com> writes: > Also, about the testing, I was thinking of using the clar framework or the > test-tool, do you have any in mind ? Neither of them seems a good fit for the task to me. Once you rewrite one of the built-in commands using this and run "git $cmd -h" under "nongit" helper, wouldn't that be a good enough test to future-proof the codepath?
diff --git a/config.c b/config.c index 36f76fafe5..c5181fd23b 100644 --- a/config.c +++ b/config.c @@ -2526,6 +2526,10 @@ void repo_config_clear(struct repository *repo) void repo_config(struct repository *repo, config_fn_t fn, void *data) { + if (!repo) { + read_very_early_config(fn, data); + return; + } git_config_check_init(repo); configset_iter(repo->config, fn, data); } diff --git a/config.h b/config.h index 5c730c4f89..1e5b22dfc4 100644 --- a/config.h +++ b/config.h @@ -219,6 +219,9 @@ void read_very_early_config(config_fn_t cb, void *data); * repo-specific one; by overwriting, the higher-priority repo-specific * value is left at the end). * + * In cases where the repository variable is NULL, repo_config() will + * call read_early_config(). + * * Unlike git_config_from_file(), this function respects includes. */ void repo_config(struct repository *r, config_fn_t fn, void *);
The `repo` value can be NULL if a builtin command is run outside any repository. The current implementation of `repo_config()` will fail if `repo` is NULL. If the `repo` is NULL the `repo_config()` can ignore the repository configuration but it should read the other configuration sources like the system-side configuration instead of failing. Teach the `repo_config()` to allow `repo` to be NULL by calling the `read_very_early_config()` which read config but only enumerate system and global settings. Suggested-by: Junio C Hamano <gitster@pobox.com> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> --- config.c | 4 ++++ config.h | 3 +++ 2 files changed, 7 insertions(+)