Message ID | 7499a2809615d42eaf3649e1c33f38d099d27c1a.1653685761.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | config: introduce discovery.bare and protected config | expand |
"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Glen Choo <chooglen@google.com> > > Protected config is read using `read_very_early_config()`, which has > several downsides: > > - Every call to `read_very_early_config()` parses global and > system-level config files anew, but this can be optimized by just > parsing them once [1]. > - Protected variables should respect "-c" because we can reasonably > assume that it comes from the user. But, `read_very_early_config()` > can't use "-c" because it is called so early that it does not have > access to command line arguments. Now we are talking about protected "variable". Is that a synonym for "config", or are there some distinctions between them? > - Protected config is stored in `the_repository` so that we don't need > to statically allocate it. But this might be confusing since protected > config ignores repository config by definition. Yes, it indeed is. Is it because we were over-eager when we introduced the "struct repository *repo" parameter to many functions and the configuration system wants you to have some repository, even when you know you are not reading from any repository? I am wondering if it is a cleaner solution *not* to hang the protected config as a configset in the_repository, but keep the configset as a separate global variable, perhaps static to config.c and is meant to be only accessed via git_protected_config() and the like. > @@ -295,6 +295,11 @@ void repo_clear(struct repository *repo) > FREE_AND_NULL(repo->remote_state); > } > > + if (repo->protected_config) { > + git_configset_clear(repo->protected_config); > + FREE_AND_NULL(repo->protected_config); > + } > + This becomes necessary only because each repository instance has protected_config, even though we need only one instance, no matter how many repositories we are accessing in this single invocation of Git, no? How should "git config -l" interact with "protected config" and "protected variables", by the way? Should a user be able to tell which ones are coming from protected scope? Should we gain, next to --global, --system, etc., --protected option to list only the protected config/variable? This is another thing that I find iffy on terminology. Should a random variable, like user.name, be a "protected config", if it is found in $HOME/.gitconfig? If it comes from there, surely we can trust its value, but unlike things like safe.directory, there is no code that wants to enforce that we pay attention only to user.name that came from trusted scopes. Should such a variable be called "protected variable"? Thanks.
Junio C Hamano <gitster@pobox.com> writes: >> Protected config is read using `read_very_early_config()`, which has >> several downsides: >> >> - Every call to `read_very_early_config()` parses global and >> system-level config files anew, but this can be optimized by just >> parsing them once [1]. >> - Protected variables should respect "-c" because we can reasonably >> assume that it comes from the user. But, `read_very_early_config()` >> can't use "-c" because it is called so early that it does not have >> access to command line arguments. > > Now we are talking about protected "variable". Is that a synonym > for "config", or are there some distinctions between them? Sorry, that's an old term I was toying with (this somehow snuck through my proofreading). I just meant "variable that is only read from protected config", aka a "protected config only variable". A goal in this version was to introduce as little jargon as possible, so - "protected config" refers to the set of config sources, and - "protected config only" refers to config variables/settings that are only read from protected config. >> - Protected config is stored in `the_repository` so that we don't need >> to statically allocate it. But this might be confusing since protected >> config ignores repository config by definition. > > Yes, it indeed is. Is it because we were over-eager when we > introduced the "struct repository *repo" parameter to many functions > and the configuration system wants you to have some repository, even > when you know you are not reading from any repository? Ah no, I was just trying to avoid yet-another global variable (since IIRC we want to move towards a more lib-like Git), and the_repository was a convenient global variable to (ab)use. > I am wondering if it is a cleaner solution *not* to hang the > protected config as a configset in the_repository, but keep the > configset as a separate global variable, perhaps static to config.c > and is meant to be only accessed via git_protected_config() and the > like. I think your suggestion to use a global variable is better, as much as I want to avoid another global variable. Protected config would affect any repositories that we work with in-core, so using a global sounds ok. environment.c might be a better place since we already make a concerted effort to put global config variables there instead of config.c. As an aside, I wonder how we could get rid of all of the globals in environment.c in the long term. Maybe we would have yet-another all encompassing global, the_environment, and then figure out which variables belong to the repository and which belong to the environment. >> @@ -295,6 +295,11 @@ void repo_clear(struct repository *repo) >> FREE_AND_NULL(repo->remote_state); >> } >> >> + if (repo->protected_config) { >> + git_configset_clear(repo->protected_config); >> + FREE_AND_NULL(repo->protected_config); >> + } >> + > > This becomes necessary only because each repository instance has > protected_config, even though we need only one instance, no matter > how many repositories we are accessing in this single invocation of > Git, no? Yes. > How should "git config -l" interact with "protected config" and > "protected variables", by the way? Should a user be able to tell > which ones are coming from protected scope? Should we gain, next to > --global, --system, etc., --protected option to list only the > protected config/variable? I'll have to think about this some more. My initial thoughts are that we should do this if we formalize 'protected' as a scope-like concept, but I don't see the lack of "--protected" as a significant hindrance to users because they can use "--global" and "--system" (albeit in two invocations instead of one). > This is another thing that I find iffy on terminology. Should a > random variable, like user.name, be a "protected config", if it is > found in $HOME/.gitconfig? If it comes from there, surely we can > trust its value, but unlike things like safe.directory, there is no > code that wants to enforce that we pay attention only to user.name > that came from trusted scopes. Should such a variable be called > "protected variable"? Ah.. I think it would be best to pretend that the "Protected variable" typo never happened. That term was destined to be confusing and meaningless. Instead, we can use "protected config" to refer to the config and "protected config only" to refer to variables. Since "protected config" is defined as (global + system + CLI) config, then yes, we would say that it is "protected config". But since we do not enforce that "user.name" _must_ come from only protected config, it is not "protected config only".
Glen Choo <chooglen@google.com> writes: > A goal in this version was to introduce as little jargon as possible, so > - "protected config" refers to the set of config sources, and > - "protected config only" refers to config variables/settings that are > only read from protected config. OK. Let's have such a clear pair of definitions somewhere in the doc or at least in a proposed log message. > >>> - Protected config is stored in `the_repository` so that we don't need >>> to statically allocate it. But this might be confusing since protected >>> config ignores repository config by definition. >> >> Yes, it indeed is. Is it because we were over-eager when we >> introduced the "struct repository *repo" parameter to many functions >> and the configuration system wants you to have some repository, even >> when you know you are not reading from any repository? > > Ah no, I was just trying to avoid yet-another global variable (since > IIRC we want to move towards a more lib-like Git), and the_repository > was a convenient global variable to (ab)use. If this does not have to be known only inside config.c, until we introduce a more global bag of things, which may have the current the_repository as one of its components, I do not think it hurts to have a file-scope static there. Then, perhaps git_configset_get*() helper functions can recognize cs==NULL as a sign that the caller wants to grab from the "protected config", or something? If we do not want to expose the underying global variable to the public, that is. > As an aside, I wonder how we could get rid of all of the globals in > environment.c in the long term. Maybe we would have yet-another all > encompassing global, the_environment, and then figure out which > variables belong to the repository and which belong to the environment. I think we are on the same page, we'd probably need something called the_world ;-) > Instead, we can use "protected config" to refer to the config and > "protected config only" to refer to variables. Since "protected config" > is defined as (global + system + CLI) config, then yes, we would say > that it is "protected config". But since we do not enforce that > "user.name" _must_ come from only protected config, it is not "protected > config only". Very clear. Thanks.
On 5/27/2022 5:09 PM, Glen Choo via GitGitGadget wrote: > From: Glen Choo <chooglen@google.com> > > Protected config is read using `read_very_early_config()`, which has > several downsides: > > - Every call to `read_very_early_config()` parses global and > system-level config files anew, but this can be optimized by just > parsing them once [1]. > - Protected variables should respect "-c" because we can reasonably > assume that it comes from the user. But, `read_very_early_config()` > can't use "-c" because it is called so early that it does not have > access to command line arguments. > > Introduce `git_protected_config()`, which reads protected config and > caches the values in `the_repository.protected_config`. Then, refactor > `safe.directory` to use `git_protected_config()`. > > This implementation can still be improved, however: > > - `git_protected_config()` iterates through every variable in > `the_repository.protected_config`, which may still be too expensive to > be called in every "git" invocation. There exist constant time lookup > functions for non-protected config (repo_config_get_*()), but for > simplicity, this commit does not implement similar functions for > protected config. I originally thought that we should jump to that "right" solution, but the existing logic in ensure_valid_ownership() uses the iterator method, mostly because it uses a multi-valued string. There are helpers that allow iterating over a specific multi-valued key, but there is no reason to complicate the current patch with that amount of refactoring. That can be handled as a completely separate topic. > - Protected config is stored in `the_repository` so that we don't need > to statically allocate it. But this might be confusing since protected > config ignores repository config by definition. I agree with Junio's suggestion of keeping this as a static global in config.c, accessible only by the public methods from config.h. A future where we have "the_world" might be nice for inventory on all these globals. Definitely not something to hold up this series. > +/* Read protected config into the_repository->protected_config. */ > +static void read_protected_config(void) > +{ > + char *xdg_config = NULL, *user_config = NULL, *system_config = NULL; > + > + CALLOC_ARRAY(the_repository->protected_config, 1); > + git_configset_init(the_repository->protected_config); > + > + system_config = git_system_config(); > + git_global_config(&user_config, &xdg_config); > + > + git_configset_add_file(the_repository->protected_config, system_config); > + git_configset_add_file(the_repository->protected_config, xdg_config); > + git_configset_add_file(the_repository->protected_config, user_config); > + > + free(system_config); > + free(xdg_config); > + free(user_config); > +} This loads the config from three files, including the xdg_config, which I wasn't thinking about before. This implementation does not use the -c config yet, which you listed as a downside of read_very_early_config(). I see that you include that in your patch 4, but the commit message for this patch could list that as a step that will be handled by a later change. (You could also do that as patch 3 and add a test near the existing safe.directory tests instead of waiting for discovery.bare.) > + > +/* Ensure that the_repository->protected_config has been initialized. */ > +static void git_protected_config_check_init(void) > +{ > + if (the_repository->protected_config && > + the_repository->protected_config->hash_initialized) > + return; > + read_protected_config(); > +} > + > +void git_protected_config(config_fn_t fn, void *data) > +{ > + git_protected_config_check_init(); > + configset_iter(the_repository->protected_config, fn, data); > +} These two methods are clearly correct. ..._check_init() is an OK name. I've seen us use "prepare_...()" in other areas as a way of making sure that we have the proper state (see prepare_packed_git() and the like), so maybe a rename here to match would be worthwhile. Feel free to ignore. > + if (repo->protected_config) { > + git_configset_clear(repo->protected_config); > + FREE_AND_NULL(repo->protected_config); > + } This will have no equivalent when protected_config is left as a static global, but that is fine. It only goes out of scope with the end of the process, anyway. > @@ -1128,7 +1128,7 @@ static int ensure_valid_ownership(const char *path) > is_path_owned_by_current_user(path)) > return 1; > > - read_very_early_config(safe_directory_cb, &data); > + git_protected_config(safe_directory_cb, &data); Nice to have a very simple conversion here. Thanks, -Stolee
diff --git a/config.c b/config.c index fa471dbdb89..c30bb7c5d09 100644 --- a/config.c +++ b/config.c @@ -2614,6 +2614,41 @@ int repo_config_get_pathname(struct repository *repo, return ret; } +/* Read protected config into the_repository->protected_config. */ +static void read_protected_config(void) +{ + char *xdg_config = NULL, *user_config = NULL, *system_config = NULL; + + CALLOC_ARRAY(the_repository->protected_config, 1); + git_configset_init(the_repository->protected_config); + + system_config = git_system_config(); + git_global_config(&user_config, &xdg_config); + + git_configset_add_file(the_repository->protected_config, system_config); + git_configset_add_file(the_repository->protected_config, xdg_config); + git_configset_add_file(the_repository->protected_config, user_config); + + free(system_config); + free(xdg_config); + free(user_config); +} + +/* Ensure that the_repository->protected_config has been initialized. */ +static void git_protected_config_check_init(void) +{ + if (the_repository->protected_config && + the_repository->protected_config->hash_initialized) + return; + read_protected_config(); +} + +void git_protected_config(config_fn_t fn, void *data) +{ + git_protected_config_check_init(); + configset_iter(the_repository->protected_config, fn, data); +} + /* Functions used historically to read configuration from 'the_repository' */ void git_config(config_fn_t fn, void *data) { diff --git a/config.h b/config.h index 7654f61c634..411965f52b5 100644 --- a/config.h +++ b/config.h @@ -505,6 +505,14 @@ int repo_config_get_maybe_bool(struct repository *repo, int repo_config_get_pathname(struct repository *repo, const char *key, const char **dest); +/* + * Functions for reading protected config. By definition, protected + * config ignores repository config, so it is unnecessary to read + * protected config from any `struct repository` other than + * the_repository. + */ +void git_protected_config(config_fn_t fn, void *data); + /** * Querying For Specific Variables * ------------------------------- diff --git a/repository.c b/repository.c index 5d166b692c8..ec319a5e09a 100644 --- a/repository.c +++ b/repository.c @@ -295,6 +295,11 @@ void repo_clear(struct repository *repo) FREE_AND_NULL(repo->remote_state); } + if (repo->protected_config) { + git_configset_clear(repo->protected_config); + FREE_AND_NULL(repo->protected_config); + } + repo_clear_path_cache(&repo->cached_paths); } diff --git a/repository.h b/repository.h index 6cc661e5a43..24251aac553 100644 --- a/repository.h +++ b/repository.h @@ -126,6 +126,14 @@ struct repository { struct repo_settings settings; + /* + * Config that comes from trusted sources, namely + * - system config files (e.g. /etc/gitconfig) + * - global config files (e.g. $HOME/.gitconfig, + * $XDG_CONFIG_HOME/git) + */ + struct config_set *protected_config; + /* Subsystems */ /* * Repository's config which contains key-value pairs from the usual diff --git a/setup.c b/setup.c index f818dd858c6..847d47f9195 100644 --- a/setup.c +++ b/setup.c @@ -1128,7 +1128,7 @@ static int ensure_valid_ownership(const char *path) is_path_owned_by_current_user(path)) return 1; - read_very_early_config(safe_directory_cb, &data); + git_protected_config(safe_directory_cb, &data); return data.is_safe; }