Message ID | pull.1463.git.git.1677631097.gitgitgadget@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | config.c: use struct for config reading state | expand |
"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes: > * We could add "struct config_reader" to "config_fn_t", i.e. > > -typedef int (*config_fn_t)(const char *var, const char *val, void > *data); +typedef int (*config_fn_t)(const struct config_reader *reader, > const char *var, const char *val, void *data); > > which isn't complex at all, except that there are ~100 config_fn_t > implementations [3] and a good number of them may never reference > "reader". If the churn is tolerable, I think this a good way forward. > > * We could create a new kind of "config_fn_t" that accepts "struct > config_reader", e.g. > > typedef int (*config_fn_t)(const char *var, const char *val, void *data); > +typedef int (*config_state_fn_t)(const struct config_reader *reader, > const char *var, const char *val, void *data); > > and only adjust the callers that would actually reference "reader". This > is less churn, but I couldn't find a great way to do this kind of > 'switching between config callback types' elegantly. To reduce churn, one thing that could be done alongside is to convert config-using code (which is...practically the rest of Git) to start using the configset interface (we seem to be using configsets internally anyway, as evidenced by repo_config()). That way, we would reduce the number of implementations of config_fn_t. > * We could smuggle "struct config_reader" to callback functions in a way > that interested callers could see it, but uninterested callers could > ignore. One trick that Jonathan Tan came up with (though not necessarily > endorsed) would be to allocate a struct for the config value + "struct > config_reader", then, interested callers could use "offset_of" to recover > the "struct config_reader". It's a little hacky, but it's low-churn. Indeed, although we should probably use this as a last resort. > = Questions > > * Is this worth merging without the extra work? There are some cleanups in > this series that could make it valuable, but there are also some hacks > (see 4/6) that aren't so great. I'm leaning towards merging it now, but can go either way, since the cost of churn is limited to one single file, but so are the benefits. If it was up to me to decide, I would merge it now, because this opens up a lot of work that other contributors could individually do (in particular, converting individual config code paths so that we don't need to reference the_reader as a global anymore). I don't see 4/6 as a hack. It is true that the nature of the config_fn_t callback could change so that passing the reader would end up being done in yet another different way, but firstly, I don't think that will happen for quite some time, and secondly, it might not happen at all (right now, I think what's most likely to happen is that the rest of the Git code moves to configsets and only a fraction of the Git code would need to do low-level parsing, which would not have a problem passing the reader through the data object since they would probably need to pass other context anyway). > * Is the extra work even worth it? Depends on which extra work, but I think that eliminating the the_reader global would really be useful (and, as far as I know, the whole reason for this effort). From the Git codebase's perspective, doing this would (as far as I know) eliminate the need for pushing and popping cf, and make multithreaded multi-repo operations less error-prone (e.g. we can spawn a thread operating on a submodule and that thread can itself read the configs of nested submodules without worrying about clobbering global state...well, there is thread-local storage, but as far as I know this is not portable). > * Do any of the ideas seem more promising than the others? Are there other > ideas I'm missing? Hopefully I answered this in my answers to the other questions.
Jonathan Tan <jonathantanmy@google.com> writes: > "Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes: >> * We could create a new kind of "config_fn_t" that accepts "struct >> config_reader", e.g. >> >> typedef int (*config_fn_t)(const char *var, const char *val, void *data); >> +typedef int (*config_state_fn_t)(const struct config_reader *reader, >> const char *var, const char *val, void *data); >> >> and only adjust the callers that would actually reference "reader". This >> is less churn, but I couldn't find a great way to do this kind of >> 'switching between config callback types' elegantly. > > To reduce churn, one thing that could be done alongside is to convert > config-using code (which is...practically the rest of Git) to start > using the configset interface (we seem to be using configsets internally > anyway, as evidenced by repo_config()). That way, we would reduce the > number of implementations of config_fn_t. By configset interface, I believe you mean the O(1) lookup functions like git_config_get_int() (which rely on the value being cached, but don't necessarily accept "struct config_set" as an arg)? I think that makes sense both from a performance and maintenance perspective. >> = Questions >> >> * Is this worth merging without the extra work? There are some cleanups in >> this series that could make it valuable, but there are also some hacks >> (see 4/6) that aren't so great. > I don't see 4/6 as a hack. It is true that the nature of the config_fn_t > callback could change so that passing the reader would end up being > done in yet another different way, but firstly, I don't think that will > happen for quite some time, and secondly, it might not happen at all > (right now, I think what's most likely to happen is that the rest of the > Git code moves to configsets and only a fraction of the Git code would > need to do low-level parsing, which would not have a problem passing the > reader through the data object since they would probably need to pass > other context anyway). Given how painful it is to change the config_fn_t signature, I think it is important to get as right as possible the first time. After I sent this out, I thought of yet another possible config_fn_t signature (since most callbacks only need diagnostic information, we could pass "struct key_value_info" instead of the more privileged "struct config_reader"), but given how many functions we'd have to change, it seemed extremely difficult to even begin experimenting with this different signature. So I think you're right that we'd want to start by removing as many config_fn_t implementations as possible. Perhaps we'd also want to consider creating new abstractions for other situations where the key is not known, e.g. "remote.<name>" and "branch.<name>".
Glen Choo <chooglen@google.com> writes: > By configset interface, I believe you mean the O(1) lookup functions > like git_config_get_int() (which rely on the value being cached, but > don't necessarily accept "struct config_set" as an arg)? I think that > makes sense both from a performance and maintenance perspective. Ah, yes. (More precisely, not the one where you call something like repo_config(), passing a callback function that.) > Given how painful it is to change the config_fn_t signature, I think it > is important to get as right as possible the first time. After I sent > this out, I thought of yet another possible config_fn_t signature > (since most callbacks only need diagnostic information, we could pass > "struct key_value_info" instead of the more privileged "struct > config_reader"), but given how many functions we'd have to change, it > seemed extremely difficult to even begin experimenting with this > different signature. Yeah, the first change is the hardest. I think passing it a single struct (so, instead of key, value, data, reader, and then any future fields we would need) to which we can add fields later would mean that we wouldn't need any changes beyond the first, though.
On Wed, Mar 01 2023, Glen Choo via GitGitGadget wrote: > This RFC is preparation for config.[ch] to be libified as as part of the > libification effort that Emily described in [1]. One of the first goals is > to read config from a file, but the trouble with how config.c is written > today is that all reading operations rely on global state, so before turning > that into a library, we'd want to make that state non-global. > > This series gets us about halfway there; it does enough plumbing for a > workable-but-kinda-ugly library interface, but with a little bit more work, > I think we can get rid of global state in-tree as well. That requires a fair > amount of work though, so I'd like to get thoughts on that before starting > work. > > = Description > > This series extracts the global config reading state into "struct > config_reader" and plumbs it through the config reading machinery. It's very > similar to how we've plumbed "struct repository" and other 'context objects' > in the past, except: > > * The global state (named "the_reader") for the git process lives in a > config.c static variable, and not on "the_repository". See 3/6 for the > rationale. I agree with the overall direction, but don't think that rationale in 3/6 is sufficient to go in this "the_reader" direction, as opposed to sticking with and extending "the_repository" approach. For orthagonal reasons (getting rid of some of the API duplication) I've been carrying a patch to get rid of the "configset" part of the *public* API, i.e. to have API users always use the "repo_config_*()" or "git_config_*()" variants, that patch is at: https://github.com/avar/git/commit/0233297a359bbda43a902dd0213aacdca82faa34 All of the rationale in your 3/6 is true now, but as that patch shows the reason for why we have "the_repository" is for the trivial reason that we want to access the repo's "config" member. It's a bit distasteful, but that change argues that just mocking up a "struct repository" with a "config" member pointing to a new configset is better than maintaining an entirely different API just for those cases where we need to parse a one-off file or whatever. I think that going in that direction neatly solves the issues you're noting here and in your 3/6, i.e. we'd always have this in the "repo" object, so we'd just stick the persistent "reader" variables in the "struct repository"'s "config" member.
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > On Wed, Mar 01 2023, Glen Choo via GitGitGadget wrote: > >> This series extracts the global config reading state into "struct >> config_reader" and plumbs it through the config reading machinery. It's very >> similar to how we've plumbed "struct repository" and other 'context objects' >> in the past, except: >> >> * The global state (named "the_reader") for the git process lives in a >> config.c static variable, and not on "the_repository". See 3/6 for the >> rationale. > > I agree with the overall direction, but don't think that rationale in > 3/6 is sufficient to go in this "the_reader" direction, as opposed to > sticking with and extending "the_repository" approach. > > For orthagonal reasons (getting rid of some of the API duplication) I've > been carrying a patch to get rid of the "configset" part of the *public* > API, i.e. to have API users always use the "repo_config_*()" or > "git_config_*()" variants, that patch is at: > https://github.com/avar/git/commit/0233297a359bbda43a902dd0213aacdca82faa34 Those patches are probably worth sending, even if only as RFC. I found it pretty hard to draft a substantial response without effectively doing a full review of the patch. > It's a bit distasteful, but that change argues that just mocking up a > "struct repository" with a "config" member pointing to a new configset > is better than maintaining an entirely different API just for those > cases where we need to parse a one-off file or whatever. > > I think that going in that direction neatly solves the issues you're > noting here and in your 3/6, i.e. we'd always have this in the "repo" > object, so we'd just stick the persistent "reader" variables in the > "struct repository"'s "config" member. If I understand your proposal correctly, we would move the config variables to the_repository. Then, any time a caller would like to work with an individual file, it would init a new "struct repository" with a clean set of config members (using repo_init_repo_blank_config() or something) and reuse the repo_config_* API? It is a workable solution, e.g. that approach would work around the failures in test-tool and scalar that I observed. In the spirit of libification, this feels like a kludge, though, since we'd be reverting to using "struct repository" for more things instead of using more well-scoped interfaces. IMO a better future for the config_set API would be to move it into configset.c or something, where only users who want the low level API would use it and everyone else would just pretend it doesn't exist. This would be a little like libgit2's organization, where 'general config', 'config parsing' and 'in-memory config value representations' are separate files, e.g. https://github.com/libgit2/libgit2/blob/main/src/libgit2/config.h https://github.com/libgit2/libgit2/blob/main/src/libgit2/config_parse.h https://github.com/libgit2/libgit2/blob/main/src/libgit2/config_entries.h I also hesitate to put the config variables on the_repository, because in the long term, I think "struct config_reader" can and should be purely internal to config.c. But if we start advertising its existence via the_repository, that might be an invitation to (ab)use that API and make that transition harder.
On Tue, Mar 07 2023, Glen Choo wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> On Wed, Mar 01 2023, Glen Choo via GitGitGadget wrote: >> >>> This series extracts the global config reading state into "struct >>> config_reader" and plumbs it through the config reading machinery. It's very >>> similar to how we've plumbed "struct repository" and other 'context objects' >>> in the past, except: >>> >>> * The global state (named "the_reader") for the git process lives in a >>> config.c static variable, and not on "the_repository". See 3/6 for the >>> rationale. >> >> I agree with the overall direction, but don't think that rationale in >> 3/6 is sufficient to go in this "the_reader" direction, as opposed to >> sticking with and extending "the_repository" approach. >> >> For orthagonal reasons (getting rid of some of the API duplication) I've >> been carrying a patch to get rid of the "configset" part of the *public* >> API, i.e. to have API users always use the "repo_config_*()" or >> "git_config_*()" variants, that patch is at: >> https://github.com/avar/git/commit/0233297a359bbda43a902dd0213aacdca82faa34 > > Those patches are probably worth sending, even if only as RFC. I found > it pretty hard to draft a substantial response without effectively doing > a full review of the patch. Yes, sorry. It's part of some changes on top of my outstanding config API changes (just re-rolled at https://lore.kernel.org/git/cover-v6-0.9-00000000000-20230307T180516Z-avarab@gmail.com/). Hopefully those will land soon after the upcoming release, I'll try to submit this (along with related changes) soon afterwards. >> It's a bit distasteful, but that change argues that just mocking up a >> "struct repository" with a "config" member pointing to a new >> configset is better than maintaining an entirely different API just >> for those cases where we need to parse a one-off file or whatever. >> >> I think that going in that direction neatly solves the issues you're >> noting here and in your 3/6, i.e. we'd always have this in the "repo" >> object, so we'd just stick the persistent "reader" variables in the >> "struct repository"'s "config" member. > > If I understand your proposal correctly, we would move the config > variables to the_repository. Then, any time a caller would like to work > with an individual file, it would init a new "struct repository" with a > clean set of config members (using repo_init_repo_blank_config() or > something) and reuse the repo_config_* API? It's certainly a hack, but so is introducing a new "the_reader" singleton whose lifetime we need to manage seperately from "the_repository" in the common case :) I think a better argument for this is probably that if you try to change repository.h so that we define "struct repository" thusly (The "hash_algo" field being still there due to a very common macro): struct repository { const struct git_hash_algo *hash_algo; struct config_set *config; }; And then try to: make config.o You'll get: $ make config.o CC config.o config.c: In function ‘include_by_branch’: config.c:311:46: error: ‘struct repository’ has no member named ‘gitdir’ 311 | const char *refname = !the_repository->gitdir ? | ^~ config.c: In function ‘repo_read_config’: config.c:2523:30: error: ‘struct repository’ has no member named ‘commondir’ 2523 | opts.commondir = repo->commondir; | ^~ config.c:2524:28: error: ‘struct repository’ has no member named ‘gitdir’ 2524 | opts.git_dir = repo->gitdir; | ^~ make: *** [Makefile:2719: config.o] Error 1 I.e. almost all of the config code doesn't care about the repository at all, it just needs the "struct config_set" that's in the repository struct. With the linked-to change I'm arguing that just mocking it up sucks less than carrying a duplicate set of API functions just for those rare cases where we need to one-off read a config file. But... > It is a workable solution, e.g. that approach would work around the > failures in test-tool and scalar that I observed. In the spirit of > libification, this feels like a kludge, though, since we'd be reverting > to using "struct repository" for more things instead of using more > well-scoped interfaces. IMO a better future for the config_set API would > be to move it into configset.c or something, where only users who want > the low level API would use it and everyone else would just pretend it > doesn't exist. This would be a little like libgit2's organization, where > 'general config', 'config parsing' and 'in-memory config value > representations' are separate files, e.g. > > https://github.com/libgit2/libgit2/blob/main/src/libgit2/config.h > https://github.com/libgit2/libgit2/blob/main/src/libgit2/config_parse.h > https://github.com/libgit2/libgit2/blob/main/src/libgit2/config_entries.h > > I also hesitate to put the config variables on the_repository, because > in the long term, I think "struct config_reader" can and should be > purely internal to config.c. But if we start advertising its existence > via the_repository, that might be an invitation to (ab)use that API and > make that transition harder. ...yes it's a kludge, but I'd think if you're interested in more generally fixing it that a better use of time would be to narrowly focus on those cases where we don't have a "repository" now, i.e. the configset API users my linked-to patch shows & amends. Everything else that uses git_config_get_*(), repo_config_get_*() will either use an already set up "the_repository", or lazily init it, see the git_config_check_init() API users. Or, we could have repo_config() and friends not take a "struct repository", but another config-specific object which has the subset of the three fields from that struct which it actually needs (config, gitdir & commondir). The mocking function I added in the linked-to commit was just a way of getting to that via the shortest means.
Glen Choo <chooglen@google.com> writes: > ... In the spirit of > libification, this feels like a kludge, though, since we'd be reverting > to using "struct repository" for more things instead of using more > well-scoped interfaces. If you include "populate from system-wide, per-user, and repository specific configuration files" as part of the API being libified, your configsets cannot avoid being tied to a repository. But I do not think the reader needs to be in the repository. > IMO a better future for the config_set API would > be to move it into configset.c or something, where only users who want > the low level API would use it and everyone else would just pretend it > doesn't exist. Isn't the use of the reader object purely transitory while you populate the keys and values in a configset from a single file? At the layer to read and populate a configset from a single "source" file, you still do not need repository. Only when you say "I have a 'repo' instance and I want to read the config variables from appropriate places", you call such a "read and populate configset from a single source" helper three or four times to populate repo->config. Once a configset is populated, it or its contents do not depend on the reader instance to function, so I do not see how it benefits to have the reader in the repository object.
Junio C Hamano <gitster@pobox.com> writes: > If you include "populate from system-wide, per-user, and repository > specific configuration files" as part of the API being libified, > your configsets cannot avoid being tied to a repository. But I do > not think the reader needs to be in the repository. > > [...] > > Isn't the use of the reader object purely transitory while you > populate the keys and values in a configset from a single file? At > the layer to read and populate a configset from a single "source" > file, you still do not need repository. > > [...] > Only when you say "I have a 'repo' instance and I want to read the > config variables from appropriate places", you call such a "read and > populate configset from a single source" helper three or four times > to populate repo->config. Once a configset is populated, it or its > contents do not depend on the reader instance to function, so I do > not see how it benefits to have the reader in the repository object. Yes, exactly. Having a config_set on the repository makes sense, but I don't see a good reason to have the reader on the repository. If Ævar sends his series soon, it would be fruitful to see how that series interacts with this one :)
On Tue, Mar 07 2023, Glen Choo wrote: > Junio C Hamano <gitster@pobox.com> writes: > >> If you include "populate from system-wide, per-user, and repository >> specific configuration files" as part of the API being libified, >> your configsets cannot avoid being tied to a repository. But I do >> not think the reader needs to be in the repository. >> >> [...] >> >> Isn't the use of the reader object purely transitory while you >> populate the keys and values in a configset from a single file? At >> the layer to read and populate a configset from a single "source" >> file, you still do not need repository. >> >> [...] >> Only when you say "I have a 'repo' instance and I want to read the >> config variables from appropriate places", you call such a "read and >> populate configset from a single source" helper three or four times >> to populate repo->config. Once a configset is populated, it or its >> contents do not depend on the reader instance to function, so I do >> not see how it benefits to have the reader in the repository object. > > Yes, exactly. Having a config_set on the repository makes sense, but I > don't see a good reason to have the reader on the repository. Isn't Junio suggesting something different here? I hadn't looked closely at this aspect of it, and just took it as a given that we needed to persist this data outside of the configset lifetime. If that's not the case then we don't need it in the file scope, nor a "struct repository" or whatever, and could just have it materialized by git_config_check_init(), no? I.e. when we create the configset we'd create it, and throw it away after the configset is created? I.e. to address this note in your initial RFC: I think we can get rid of global state in-tree as well. That requires a fair amount of work though, so I'd like to get thoughts on that before starting work. > If Ævar sends his series soon, it would be fruitful to see how that > series interacts with this one :) I tried merging this topic with that, and it didn't conflict textually or semantically. I just raised it because I think with your 3/6 you're needlessly tying yourself in knots, i.e. with this part: A more typical approach would be to put this struct on "the_repository", but that's a worse fit for this use case since config reading is not scoped to a repository. E.g. we can read config before the repository is known ("read_very_early_config()"), blatantly ignore the repo ("read_protected_config()"), or read only from a file ("git_config_from_file()"). This is especially evident in t5318 and t9210, where test-tool and scalar parse config but don't fully initialize "the_repository". Out of those examples read_very_early_config() and read_protected_config() already call config_with_options(), which optionally uses a "struct repository" via the "repo" member of "struct git_config_source". I think we may have gotten lost in the weeds over what amounts to an asthetic preference about how to do polymorphism in C. I'd be fine with mocking up the "struct repository", but you could equally prepare and pass some new "config_state" struct that would contain the required information (including the configset). As well as this in the CL: The catch (aka the reason I stopped halfway through) is that I couldn't find a way to expose "struct config_reader" state without some fairly big changes, complexity-wise or LoC-wise[...] I didn't look into exactly why config_fn_t would need your new "reader", but if you accept that we could stick such a thing into "the_repository" then there's no catch or need for the churn to change all those callbacks. Of course something that wants to use the API as a "real" library would need to use some alternate mechanism, as you reference in adding a new "config_state_fn_t". You note: but I couldn't find a great way to do this kind of 'switching between config callback types' elegantly. So, I don't know, but I was suggesting looking into doing that based on "the_repository" in play...
On Mon, Mar 06 2023, Jonathan Tan wrote: > Glen Choo <chooglen@google.com> writes: >> By configset interface, I believe you mean the O(1) lookup functions >> like git_config_get_int() (which rely on the value being cached, but >> don't necessarily accept "struct config_set" as an arg)? I think that >> makes sense both from a performance and maintenance perspective. > > Ah, yes. (More precisely, not the one where you call something like > repo_config(), passing a callback function that.) > >> Given how painful it is to change the config_fn_t signature, I think it >> is important to get as right as possible the first time. After I sent >> this out, I thought of yet another possible config_fn_t signature >> (since most callbacks only need diagnostic information, we could pass >> "struct key_value_info" instead of the more privileged "struct >> config_reader"), but given how many functions we'd have to change, it >> seemed extremely difficult to even begin experimenting with this >> different signature. > > Yeah, the first change is the hardest. I think passing it a single > struct (so, instead of key, value, data, reader, and then any future > fields we would need) to which we can add fields later would mean that > we wouldn't need any changes beyond the first, though. The more I've looked at this the more I'm convinced this is the wrong direction. For the configset API users we already have the line number, source etc. in the "util" member, i.e. when we have an error in any API user that uses the configset they can error about the specific line that config came from. I think this may have been conflated because e.g. for the configset to get the "scope" we need to go from do_git_config_sequence(), which will currently set "current_parsing_scope", all the way down to configset_add_value(), and there we'll make use of the "config_set_callback", which is a config_fn_t. But that's all internal "static" functions, except git_config_from_file() and git_config_from_file_with_options(), but those have only a handful of callers. But that's *different* than the user callbacks, which will be invoked through a loop in configset_iter(), i.e. *after* we've parsed the config, and are just getting the line number, scope etc. from the configset. There's other edge cases, e.g. current_config_line() will access the global, but it only has two callers (one if we exclude the test helper). But I think the answer there is to change the config_read_push_default() code, not to give every current "config_fn_t" implementation an extra parameter.
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > On Mon, Mar 06 2023, Jonathan Tan wrote: > >> Glen Choo <chooglen@google.com> writes: >>> By configset interface, I believe you mean the O(1) lookup functions >>> like git_config_get_int() (which rely on the value being cached, but >>> don't necessarily accept "struct config_set" as an arg)? I think that >>> makes sense both from a performance and maintenance perspective. >> >> Ah, yes. (More precisely, not the one where you call something like >> repo_config(), passing a callback function that.) >> >>> Given how painful it is to change the config_fn_t signature, I think it >>> is important to get as right as possible the first time. After I sent >>> this out, I thought of yet another possible config_fn_t signature >>> (since most callbacks only need diagnostic information, we could pass >>> "struct key_value_info" instead of the more privileged "struct >>> config_reader"), but given how many functions we'd have to change, it >>> seemed extremely difficult to even begin experimenting with this >>> different signature. >> >> Yeah, the first change is the hardest. I think passing it a single >> struct (so, instead of key, value, data, reader, and then any future >> fields we would need) to which we can add fields later would mean that >> we wouldn't need any changes beyond the first, though. > > For the configset API users we already have the line number, source > etc. in the "util" member, i.e. when we have an error in any API user > that uses the configset they can error about the specific line that > config came from. Yeah, and I think we should plumb the "util" (actually "struct key_value_info") back to the users who need that diagnostic info, which achieves the same purpose as plumbing the "config_reader" to the callback functions (since the callback functions only need to read diagnostic information), but it's better since it exposes fewer gory details and we can refactor those using coccinelle. The difficult part is replacing the callbacks with the configset API in the first place. > I think this may have been conflated because e.g. for the configset to > get the "scope" we need to go from do_git_config_sequence(), which will > currently set "current_parsing_scope", all the way down to > configset_add_value(), and there we'll make use of the > "config_set_callback", which is a config_fn_t. I think this is half-right (at least from a historical perspective). We started by just reading auxiliary info like line number and file name from the "current config source", but when we started caching values in config sets, we had to find another way to share this auxiliary info. The solution that 0d44a2dacc (config: return configset value for current_config_ functions, 2016-05-26) gave us is to also cache the auxiliary info, and then when we iterate the configset, we set the global "current_config_kvi" to the cached value. So they aren't conflated per se, since the reason for their existence is so that API users don't have to care whether or not they are iterating a file or iterating a configset. But as you've observed... > But that's all internal "static" functions, except > git_config_from_file() and git_config_from_file_with_options(), but > those have only a handful of callers. > > But that's *different* than the user callbacks, which will be invoked > through a loop in configset_iter(), i.e. *after* we've parsed the > config, and are just getting the line number, scope etc. from the > configset. in practice, very few users read (and should be reading) config directly from files. Most want the 'config for the whole repo' (which is handled by the repo_* and git_* functions) and others will explicitly call git_config_from_file*() so I don't think the config API needs to keep users ignorant of 'whether the current config is cached or not'. We could convert callbacks to the configset API, and if we then we plumb the key_value_info to the configset API users, maybe we can retire "current_config_kvi". > There's other edge cases, e.g. current_config_line() will access the > global, but it only has two callers (one if we exclude the test > helper). But I think the answer there is to change the > config_read_push_default() code, not to give every current "config_fn_t" > implementation an extra parameter. I agree that we should rewrite config_read_push_default(), but wouldn't we still need to expose auxiliary information (line number, scope) to users of the callback API? e.g. config.c:die_bad_number() uses the file name [*], and builtin/config.c definitely needs it for 'git config -l'. Also, an express purpose for this series is to prepare git_config_from_file() to be used by callers out-of-tree, which would definitely need that information. I'm not sure if you're proposing to move state from "the_reader" to something like "the_repository.config_state". I'd hesitate to take that approach, since we're just swapping one global for another. [*] config.c:die_bad_number() is actually a bit broken because it doesn't use the cached kvi info from the config set. I'll probably send a fixup patch to fix this.
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> Yes, exactly. Having a config_set on the repository makes sense, but I >> don't see a good reason to have the reader on the repository. > > [...] > > I hadn't looked closely at this aspect of it, and just took it as a > given that we needed to persist this data outside of the configset > lifetime. There really isn't a need to persist the "config_reader" state, we'd only need it while reading the file, and as you've hinted, once we've cached the info in the configset, we'd just use that instead. > If that's not the case then we don't need it in the file scope, nor a > "struct repository" or whatever, and could just have it materialized by > git_config_check_init(), no? I.e. when we create the configset we'd > create it, and throw it away after the configset is created? Yes, except that I'm proposing that we should do it even lower in the chain, like do_config_from(). > The catch (aka the reason I stopped halfway through) is that I > couldn't find a way to expose "struct config_reader" state > without some fairly big changes, complexity-wise or LoC-wise[...] > > I didn't look into exactly why config_fn_t would need your new "reader", Ah, this is what I was referencing in the CL here: I believe that only config callbacks are accessing this [config reader] state, e.g. because they use the low-level information (like builtin/config.c printing the filename and scope of the value) or for error reporting (like git_parse_int() reporting the filename and line number of the value it failed to parse) > but if you accept that we could stick such a thing into "the_repository" > then there's no catch or need for the churn to change all those > callbacks. > > Of course something that wants to use the API as a "real" library would > need to use some alternate mechanism, as you reference in adding a new > "config_state_fn_t". You note: > > but I couldn't find a great way to do this kind of 'switching > between config callback types' elegantly. > > So, I don't know, but I was suggesting looking into doing that based on > "the_repository" in play... And yes, since the primary purpose is to make git_config_from_file() usable by a library caller (and secondarily, prepare for a future where reading config is thread-safer because of less global state, as Jonathan discussed [1]), I'd prefer not to use the_repository. 1. https://lore.kernel.org/git/20230306195756.3399115-1-jonathantanmy@google.com