Message ID | pull.1261.v4.git.git.1654635432.gitgitgadget@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | config: introduce discovery.bare and protected config | expand |
"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes: > Glen Choo (5): > Documentation/git-config.txt: add SCOPES section > Documentation: define protected configuration Forgot to mention when I was sending my comments on patch 2: we should standardize on "protected config" and not use "protected configuration" anywhere. > config: read protected config with `git_protected_config()` > safe.directory: use git_protected_config() > setup.c: create `discovery.bare` Thanks - I think this is a nice feature to have. Everything looks good except for some minor comments on text in patch 2, which I have sent. One alternative design would have been to have separate configsets for protected config and non-protected config (or even better, separate configsets for trace2 config, protected config minus trace2 config, and non-protected config) but that doesn't have to block the submission of this patch set.
Jonathan Tan <jonathantanmy@google.com> writes: > "Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes: >> Glen Choo (5): >> Documentation/git-config.txt: add SCOPES section >> Documentation: define protected configuration > > Forgot to mention when I was sending my comments on patch 2: we should > standardize on "protected config" and not use "protected configuration" > anywhere. Makes sense. > One alternative design would have been to have separate configsets for > protected config and non-protected config (or even better, separate > configsets for trace2 config, protected config minus trace2 config, and > non-protected config) but that doesn't have to block the submission of > this patch set. I suppose that the idea behind this is that we only parse and store each config file exactly once. It's a good goal, but the whole point of the configset is that we can query a single struct to figure out the value of a config variable. Having multiple configsets starts to shift more of the burden to the callers because they now have to query multiple configsets to find their desired config value, and we already start to see some of this unpleasantness in this series. An alternative that I'd been thinking about is to make a few changes to the git_config_* + configset API to allow us to use a single configset for all of our needs: 1. Keep track of what config we've read when reading into the_repository->config, i.e. instead of a boolean "all config has been [un]read", we can express "system and global config has been read, but not local or command config". Then, use this information to load config from sources as they become available. This will allow us to read incomplete config for trace2 and setup.c (discovery.bare and safe.directory), and only read what we need later on. This assumes that when Git reads config, that config is always valid later on. So this is broken if, e.g. we read global config file A during setup, but when we discover the repo, we discard A and read global config file B instead. I don't know if we do this or if we are planning to in the future. 2. Add an additional argument that specifies what scopes to respect when reading config (maybe as a set of flags). This gives us extra specificity when using the git_config*() functions, so we could get rid of git_protected_config() like so: /* Change enum config_scope into flags first... */ #define WIP_SCOPES_PROTECTED = CONFIG_SCOPE_SYSTEM & \ CONFIG_SCOPE_GLOBAL & CONFIG_SCOPE_COMMAND static enum discovery_bare_allowed get_discovery_bare(void) { enum discovery_bare_allowed result = DISCOVERY_BARE_ALWAYS; git_config(discovery_bare_cb, &result, WIP_SCOPES_PROTECTED); return result; } And as an added bonus, this gives us an easy way to implement the constant time git_config_*() functions for protected config. We could even do this without doing 1. first. I haven't looked into whether we could turn the enum into flags, but otherwise, I think this is pretty feasible.
Glen Choo <chooglen@google.com> writes: > Jonathan Tan <jonathantanmy@google.com> writes: > >> "Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes: >>> Glen Choo (5): >>> Documentation/git-config.txt: add SCOPES section >>> Documentation: define protected configuration >> >> Forgot to mention when I was sending my comments on patch 2: we should >> standardize on "protected config" and not use "protected configuration" >> anywhere. > > Makes sense. Using a single word consistently does make sense, but why favor a non-word over a proper word ;-)? > I suppose that the idea behind this is that we only parse and store each > config file exactly once. It's a good goal, but the whole point of the > configset is that we can query a single struct to figure out the value > of a config variable. Having multiple configsets starts to shift more of > the burden to the callers because they now have to query multiple > configsets to find their desired config value, and we already start to > see some of this unpleasantness in this series. Yes, I was worried about this, too. "parse and store exactly once" may merely be a performance thing, but it still matters, even though it is not worse than making duplicate callbacks to overwrite globals that have been already set earlier, which will affect correctness ;-) > An alternative that I'd been thinking about is to make a few changes to > the git_config_* + configset API to allow us to use a single configset > for all of our needs: > > 1. Keep track of what config we've read when reading into > the_repository->config, i.e. instead of a boolean "all config has > been [un]read", we can express "system and global config has been > read, but not local or command config". Then, use this information to > load config from sources as they become available. This will allow us > to read incomplete config for trace2 and setup.c (discovery.bare and > safe.directory), and only read what we need later on. That is not a bad direction to go, but are we sure that we always read in the right order (and there is one single right order) and stop at the right step? config.c::do_git_config_sequence() reads the system and then the global before the local, the worktree, and the command line. We would allow the values of "protected" configuration variables to be inspected by stopping after the first two and inspecting the result before the local and the rest overrides them, but will we need *only* that kind of partial configuration reading that stops exactly there? Even with the proposed "protected" scheme, I thought we plan to honor the command line ones, so we may need to read system+global+command without reading anything else to grab the values only from the protected sources (ah, I like the application of the adjective "protected" to the source, not variables, because that is what we are really talking about---alternatively we could call it "safe"). But if we later read local and worktree ones lazily, unless we _insert_ them before what we read from the command line, we'll break the last-one-wins property, so we need to be careful. I guess each configuration value in the configset knows where it came from, so it probably is possible to insert the ones you read lazily later in the right spot. > 2. Add an additional argument that specifies what scopes to respect when > reading config (maybe as a set of flags). This gives us extra > specificity when using the git_config*() functions, so we could get > rid of git_protected_config() like so: > > /* Change enum config_scope into flags first... */ > > #define WIP_SCOPES_PROTECTED = CONFIG_SCOPE_SYSTEM & \ > CONFIG_SCOPE_GLOBAL & CONFIG_SCOPE_COMMAND > > static enum discovery_bare_allowed get_discovery_bare(void) > { > enum discovery_bare_allowed result = DISCOVERY_BARE_ALWAYS; > git_config(discovery_bare_cb, &result, WIP_SCOPES_PROTECTED); > return result; > } Alternatively, we could make the callback aware of the scope for each var-value it is called and have it filter, but that would be a bigger surgery. I think a new iterator git_config_in_scope(), instead of updating git_config(), would make sense. By definition, all existing git_config() callers do not need the scope specifiers, and "protected" may be the first one but will not be the last one that needs to read from particular scopes.
Junio C Hamano <gitster@pobox.com> writes: > Glen Choo <chooglen@google.com> writes: > >> Jonathan Tan <jonathantanmy@google.com> writes: >> >>> "Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes: >>>> Glen Choo (5): >>>> Documentation/git-config.txt: add SCOPES section >>>> Documentation: define protected configuration >>> >>> Forgot to mention when I was sending my comments on patch 2: we should >>> standardize on "protected config" and not use "protected configuration" >>> anywhere. >> >> Makes sense. > > Using a single word consistently does make sense, but why favor a > non-word over a proper word ;-)? Hm, I guess there's an argument that "config" is a term of art that specifically refers to things from "git config". From that lens, it's much less confusing to see the CONFIGURATION section in Documentation/git-config.txt. But the argument is a little flimsy because I don't think that's something we've stuck to anywhere. I'll use "configuration" if it's not too unwieldy. >> I suppose that the idea behind this is that we only parse and store each >> config file exactly once. It's a good goal, but the whole point of the >> configset is that we can query a single struct to figure out the value >> of a config variable. Having multiple configsets starts to shift more of >> the burden to the callers because they now have to query multiple >> configsets to find their desired config value, and we already start to >> see some of this unpleasantness in this series. > > Yes, I was worried about this, too. "parse and store exactly once" > may merely be a performance thing, but it still matters, even though > it is not worse than making duplicate callbacks to overwrite globals > that have been already set earlier, which will affect correctness ;-) Exactly. >> An alternative that I'd been thinking about is to make a few changes to >> the git_config_* + configset API to allow us to use a single configset >> for all of our needs: >> >> 1. Keep track of what config we've read when reading into >> the_repository->config, i.e. instead of a boolean "all config has >> been [un]read", we can express "system and global config has been >> read, but not local or command config". Then, use this information to >> load config from sources as they become available. This will allow us >> to read incomplete config for trace2 and setup.c (discovery.bare and >> safe.directory), and only read what we need later on. > > That is not a bad direction to go, but are we sure that we always > read in the right order (and there is one single right order) and > stop at the right step? > > config.c::do_git_config_sequence() reads the system and then the > global before the local, the worktree, and the command line. We > would allow the values of "protected" configuration variables to be > inspected by stopping after the first two and inspecting the result > before the local and the rest overrides them, but will we need > *only* that kind of partial configuration reading that stops exactly > there? Even with the proposed "protected" scheme, I thought we plan > to honor the command line ones, so we may need to read > system+global+command without reading anything else to grab the > values only from the protected sources (ah, I like the application > of the adjective "protected" to the source, not variables, because > that is what we are really talking about---alternatively we could > call it "safe"). But if we later read local and worktree ones > lazily, unless we _insert_ them before what we read from the command > line, we'll break the last-one-wins property, so we need to be > careful. I guess each configuration value in the configset knows > where it came from, so it probably is possible to insert the ones > you read lazily later in the right spot. Yeah, last-one-wins makes this a lot trickier. I thought that it would be nice to have insert-with-priority because that also eliminates some of the correctness concerns in this series, i.e. that ensures protected config has the same priority as regular config, but that's a bigger undertaking and I'm not certain about the performance. >> 2. Add an additional argument that specifies what scopes to respect when >> reading config (maybe as a set of flags). This gives us extra >> specificity when using the git_config*() functions, so we could get >> rid of git_protected_config() like so: >> >> /* Change enum config_scope into flags first... */ >> >> #define WIP_SCOPES_PROTECTED = CONFIG_SCOPE_SYSTEM & \ >> CONFIG_SCOPE_GLOBAL & CONFIG_SCOPE_COMMAND >> >> static enum discovery_bare_allowed get_discovery_bare(void) >> { >> enum discovery_bare_allowed result = DISCOVERY_BARE_ALWAYS; >> git_config(discovery_bare_cb, &result, WIP_SCOPES_PROTECTED); >> return result; >> } > > Alternatively, we could make the callback aware of the scope for > each var-value it is called and have it filter, but that would be a > bigger surgery. > > I think a new iterator git_config_in_scope(), instead of updating > git_config(), would make sense. By definition, all existing > git_config() callers do not need the scope specifiers, and > "protected" may be the first one but will not be the last one that > needs to read from particular scopes. Makes sense. The signature of git_config() could stay the same, but we could refactor it to use git_config_in_scope().
"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes: > This round doesn't introduce any major code changes. The most notable > changes are: > [...] > > * I added a test for "git config --show-scope" and the 'worktree' scope, > since 'worktree' wasn't listed in Documentation/git-config.txt (6/5). Erratum: This patch was sent as gc/document-config-worktree-scope [1]. Patch 6/5 obviously doesn't exist. [1] https://lore.kernel.org/git/pull.1274.git.git.1654637044966.gitgitgadget@gmail.com