Message ID | 1b27e0b115f858a422e0a2891688227be8f3db01.1648055915.git.steadmon@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] repo-settings: set defaults even when not in a repo | expand |
On 3/23/2022 2:03 PM, Josh Steadmon wrote: > prepare_repo_settings() initializes a `struct repository` with various > default config options and settings read from a repository-local config > file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only > in git repos), prepare_repo_settings was changed to issue a BUG() if it > is called by a process whose CWD is not a Git repository. This approach > was suggested in [1]. > > This breaks fuzz-commit-graph, which attempts to parse arbitrary > fuzzing-engine-provided bytes as a commit graph file. > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but > since we run the fuzz tests without a valid repository, we are hitting > the BUG() from 44c7e62 for every test case. > > Fix this by refactoring prepare_repo_settings() such that it sets > default options unconditionally; if its process is in a Git repository, > it will also load settings from the local config. This eliminates the > need for a BUG() when not in a repository. I think you have the right idea and this can work. > - /* Booleans config or default, cascades to other settings */ > - repo_cfg_bool(r, "feature.manyfiles", &manyfiles, 0); > - repo_cfg_bool(r, "feature.experimental", &experimental, 0); > + if (r->gitdir) { > + /* Booleans config or default, cascades to other settings */ > + repo_cfg_bool(r, "feature.manyfiles", &manyfiles, 0); > + repo_cfg_bool(r, "feature.experimental", &experimental, 0); However, this giant if block is going to be a bit unwieldy, especially as we add settings in the future. Ignoring whitespace, this is your diff: diff --git a/repo-settings.c b/repo-settings.c index b4fbd16cdc..d154b37007 100644 --- a/repo-settings.c +++ b/repo-settings.c @@ -17,9 +17,6 @@ void prepare_repo_settings(struct repository *r) char *strval; int manyfiles; - if (!r->gitdir) - BUG("Cannot add settings for uninitialized repository"); - if (r->settings.initialized++) return; @@ -28,6 +25,7 @@ void prepare_repo_settings(struct repository *r) r->settings.core_untracked_cache = UNTRACKED_CACHE_KEEP; r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE; + if (r->gitdir) { /* Booleans config or default, cascades to other settings */ repo_cfg_bool(r, "feature.manyfiles", &manyfiles, 0); repo_cfg_bool(r, "feature.experimental", &experimental, 0); @@ -50,16 +48,6 @@ void prepare_repo_settings(struct repository *r) repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1); repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0); (Adding the if) - /* - * The GIT_TEST_MULTI_PACK_INDEX variable is special in that - * either it *or* the config sets - * r->settings.core_multi_pack_index if true. We don't take - * the environment variable if it exists (even if false) over - * any config, as in most other cases. - */ - if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0)) - r->settings.core_multi_pack_index = 1; - /* * Non-boolean config */ @@ -93,6 +81,17 @@ void prepare_repo_settings(struct repository *r) else die("unknown fetch negotiation algorithm '%s'", strval); } + } + + /* + * The GIT_TEST_MULTI_PACK_INDEX variable is special in that + * either it *or* the config sets + * r->settings.core_multi_pack_index if true. We don't take + * the environment variable if it exists (even if false) over + * any config, as in most other cases. + */ + if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0)) + r->settings.core_multi_pack_index = 1; (Moving this to group with other no-dir settings.) I think what you're really trying to do is if (r->gitdir) (Load settings that require a repo) (Load settings that work without a repo.) I think that you'd be better off extracting the two types of config into helper methods (prepare_gitdir_settings()/prepare_nodir_settings()?) across two patches, then in a third removing the BUG and inserting the if (r->gitdir) above. Does that seem reasonable? Thanks, -Stolee
On Wed, Mar 23, 2022 at 03:22:13PM -0400, Derrick Stolee wrote: > On 3/23/2022 2:03 PM, Josh Steadmon wrote: > > prepare_repo_settings() initializes a `struct repository` with various > > default config options and settings read from a repository-local config > > file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only > > in git repos), prepare_repo_settings was changed to issue a BUG() if it > > is called by a process whose CWD is not a Git repository. This approach > > was suggested in [1]. > > > > This breaks fuzz-commit-graph, which attempts to parse arbitrary > > fuzzing-engine-provided bytes as a commit graph file. > > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but > > since we run the fuzz tests without a valid repository, we are hitting > > the BUG() from 44c7e62 for every test case. > > > > Fix this by refactoring prepare_repo_settings() such that it sets > > default options unconditionally; if its process is in a Git repository, > > it will also load settings from the local config. This eliminates the > > need for a BUG() when not in a repository. > > I think you have the right idea and this can work. Hmmm. To me this feels like bending over backwards in `prepare_repo_settings()` to accommodate one particular caller. I'm not necessarily opposed to it, but it does feel strange to make `prepare_repo_settings()` a noop here, since I would expect that any callers who do want to call `prepare_repo_settings()` are likely convinced that they are inside of a repository, and it probably should be a BUG() if they aren't. I was initially thinking that Josh's alternative of introducing and calling a lower-level version of `prepare_commit_graph()` that doesn't require the use of any repository settings would make sense. But when I started looking at implementing it, I became confused at how this is supposed to work at all without using a repository. We depend on the values parsed from: - commitGraph.generationVersion - commitGraph.readChangedPaths to determine which part(s) of the commit graph we do and don't read. Looking around, I think I probably inadvertently broke this in ab14d0676c (commit-graph: pass a 'struct repository *' in more places, 2020-09-09). But prior to ab14d0676c, neither of those settings existed, so parsing the commit graph was a pure function of the commit graph's contents alone, and didn't rely on the existence of a repository. We could pretend as if `commitGraph.generationVersion` is always "2" and `commitGraph.readChangedPaths` is always "true", and I think that would still give us good-enough coverage. I assume that libFuzzer doesn't give us a convenient way to stub out a repository. Maybe we should have a variant of `parse_commit_graph` that instead takes a `struct repo_settings` filled out with the defaults? We could use that to teach libFuzzer about additional states that the commit graph parser can be in, too (though probably outside the scope of this patch). I tried to sketch all of this out, which seems to work. Let me know what you think: --- 8< --- diff --git a/commit-graph.c b/commit-graph.c index adffd020dd..3d5aa536c2 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -96,13 +96,6 @@ define_commit_slab(commit_graph_data_slab, struct commit_graph_data); static struct commit_graph_data_slab commit_graph_data_slab = COMMIT_SLAB_INIT(1, commit_graph_data_slab); -static int get_configured_generation_version(struct repository *r) -{ - int version = 2; - repo_config_get_int(r, "commitgraph.generationversion", &version); - return version; -} - uint32_t commit_graph_position(const struct commit *c) { struct commit_graph_data *data = @@ -335,6 +328,13 @@ static int graph_read_bloom_data(const unsigned char *chunk_start, struct commit_graph *parse_commit_graph(struct repository *r, void *graph_map, size_t graph_size) +{ + prepare_repo_settings(r); + return parse_commit_graph_settings(&r->settings, graph_map, graph_size); +} + +struct commit_graph *parse_commit_graph_settings(struct repo_settings *s, + void *graph_map, size_t graph_size) { const unsigned char *data; struct commit_graph *graph; @@ -371,8 +371,6 @@ struct commit_graph *parse_commit_graph(struct repository *r, return NULL; } - prepare_repo_settings(r); - graph = alloc_commit_graph(); graph->hash_len = the_hash_algo->rawsz; @@ -402,7 +400,7 @@ struct commit_graph *parse_commit_graph(struct repository *r, pair_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges); pair_chunk(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs); - if (get_configured_generation_version(r) >= 2) { + if (s->commit_graph_generation_version >= 2) { pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA, &graph->chunk_generation_data); pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW, @@ -412,7 +410,7 @@ struct commit_graph *parse_commit_graph(struct repository *r, graph->read_generation_data = 1; } - if (r->settings.commit_graph_read_changed_paths) { + if (s->commit_graph_read_changed_paths) { pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES, &graph->chunk_bloom_indexes); read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA, @@ -2299,7 +2297,7 @@ int write_commit_graph(struct object_directory *odb, ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0; ctx->opts = opts; ctx->total_bloom_filter_data_size = 0; - ctx->write_generation_data = (get_configured_generation_version(r) == 2); + ctx->write_generation_data = (r->settings.commit_graph_generation_version == 2); ctx->num_generation_data_overflows = 0; bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY", diff --git a/commit-graph.h b/commit-graph.h index 2e3ac35237..1fdca7a927 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -95,6 +95,8 @@ struct commit_graph *read_commit_graph_one(struct repository *r, struct object_directory *odb); struct commit_graph *parse_commit_graph(struct repository *r, void *graph_map, size_t graph_size); +struct commit_graph *parse_commit_graph_settings(struct repo_settings *s, + void *graph_map, size_t graph_size); /* * Return 1 if and only if the repository has a commit-graph diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c index e7cf6d5b0f..07ef4643d8 100644 --- a/fuzz-commit-graph.c +++ b/fuzz-commit-graph.c @@ -11,7 +11,7 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) struct commit_graph *g; initialize_the_repository(); - g = parse_commit_graph(the_repository, (void *)data, size); + g = parse_commit_graph_settings(the_repository->settings, (void *)data, size); repo_clear(the_repository); free_commit_graph(g); diff --git a/repo-settings.c b/repo-settings.c index b4fbd16cdc..a7e5d10b27 100644 --- a/repo-settings.c +++ b/repo-settings.c @@ -10,6 +10,13 @@ static void repo_cfg_bool(struct repository *r, const char *key, int *dest, *dest = def; } +static void repo_cfg_int(struct repository *r, const char *key, int *dest, + int def) +{ + if (repo_config_get_int(r, key, dest)) + *dest = def; +} + void prepare_repo_settings(struct repository *r) { int experimental; @@ -43,6 +50,7 @@ void prepare_repo_settings(struct repository *r) /* Boolean config or default, does not cascade (simple) */ repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1); + repo_cfg_int(r, "commitgraph.generationversion", &r->settings.commit_graph_generation_version, 2); repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1); repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1); repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0); diff --git a/repository.h b/repository.h index e29f361703..6c40ea3f1b 100644 --- a/repository.h +++ b/repository.h @@ -29,6 +29,7 @@ struct repo_settings { int initialized; int core_commit_graph; + int commit_graph_generation_version; int commit_graph_read_changed_paths; int gc_write_commit_graph; int fetch_write_commit_graph; --- >8 --- Thanks, Taylor
Josh Steadmon wrote: > prepare_repo_settings() initializes a `struct repository` with various > default config options and settings read from a repository-local config > file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only > in git repos), prepare_repo_settings was changed to issue a BUG() if it > is called by a process whose CWD is not a Git repository. This approach > was suggested in [1]. > > This breaks fuzz-commit-graph, which attempts to parse arbitrary > fuzzing-engine-provided bytes as a commit graph file. > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but > since we run the fuzz tests without a valid repository, we are hitting > the BUG() from 44c7e62 for every test case. > > Fix this by refactoring prepare_repo_settings() such that it sets > default options unconditionally; if its process is in a Git repository, > it will also load settings from the local config. This eliminates the > need for a BUG() when not in a repository. > I think the decision of whether to go with this approach or the alternative listed below depends on the validity of a 'repository' without a gitdir. As far as I can tell, there is an implicit conflict between the changes in: 1. b66d84756f (commit-graph: respect 'commitGraph.readChangedPaths', 2020-09-09) 2. 44c7e62e51 (repo-settings: prepare_repo_settings only in git repos, 2021-12-06) (as you pointed out in your message) The former says that commit-graph should use a repository setting (implying it needs a valid repository), and the latter says that you need a valid gitdir to get repository settings. So to me, how to proceed depends on whether a repository can be "valid" without a gitdir or not: * If it is valid (or not-completely-broken/semi-valid - i.e., not a BUG()), then your approach is probably fine - the 'BUG()' should be removed from 'prepare_repo_settings()'. But, to make sure we don't run into this question again, 'prepare_repo_settings()' should be audited for any settings/setup that implicitly require a gitdir and updated accordingly, along with a comment somewhere saying that the 'gitdir' isn't required. * If it is not valid, then this use in 'fuzz-commit-graph' tests is fragile (even if it's not a problem right now) - 'prepare_repo_settings()' could start needing the gitdir, or "the_hash_algo" could become invalid, or some other problem could arise from the invalid repo and the tests would break. I don't have a good answer to this question, but someone with more experience in this area might be able to help. > Concerns: > > Are any callers strictly dependent on having a BUG() here? I suspect > that the worst that would happen is that rather than this BUG(), the > caller would later hit its own BUG() or die(), so I do not think this is > a blocker. Additionally, every builtin that directly calls > prepare_repo_settings is either marked as RUN_SETUP, which means we > would die() prior to calling it anyway, or checks on its own before > calling it (builtin/diff.c). There are several callers in library code, > though, and I have not tracked down how all of those are used. > > Alternatives considered: > > Setting up a valid repository for fuzz testing would avoid triggering > this bug, but would unacceptably slow down the test cases. > > Refactoring parse_commit_graph() in such a way that the fuzz test has an > alternate entry point that avoids calling prepare_repo_settings() might > be possible, but would be a much larger change than this one. It would > also run the risk that the changes would be so extensive that the fuzzer > would be merely testing its own custom commit-graph implementation, > rather than the one that's actually used in the real world. > If the 'repository' really is *completely* invalid without a gitdir, I don't think the refactor of 'commit-graph.c' would necessarily be too bad - certainly not so extensive that it's not testing "real world" behavior: * Make 'parse_commit_graph()' only call 'prepare_repo_settings()' if the repo has a valid gitdir (guarded like 'git diff' is) * Guard all use use of 'r' in 'parse_commit_graph()' based on the existence of 'gitdir', with fallbacks on e.g. global config settings. From what I can see, the only usage is: * The check for 'r->settings.commit_graph_read_changed_paths' * The call to 'get_configured_generation_version()' * The use of 'the_hash_algo->raw_sz' > [1]: https://lore.kernel.org/git/xmqqcznh8913.fsf@gitster.g/ > > Signed-off-by: Josh Steadmon <steadmon@google.com> > --- > repo-settings.c | 111 ++++++++++++++++++++++++------------------------ > 1 file changed, 55 insertions(+), 56 deletions(-) > > diff --git a/repo-settings.c b/repo-settings.c > index b4fbd16cdc..d154b37007 100644 > --- a/repo-settings.c > +++ b/repo-settings.c > @@ -17,9 +17,6 @@ void prepare_repo_settings(struct repository *r) > char *strval; > int manyfiles; > > - if (!r->gitdir) > - BUG("Cannot add settings for uninitialized repository"); > - > if (r->settings.initialized++) > return; > > @@ -28,27 +25,63 @@ void prepare_repo_settings(struct repository *r) > r->settings.core_untracked_cache = UNTRACKED_CACHE_KEEP; > r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE; > > - /* Booleans config or default, cascades to other settings */ > - repo_cfg_bool(r, "feature.manyfiles", &manyfiles, 0); > - repo_cfg_bool(r, "feature.experimental", &experimental, 0); > + if (r->gitdir) { > + /* Booleans config or default, cascades to other settings */ > + repo_cfg_bool(r, "feature.manyfiles", &manyfiles, 0); > + repo_cfg_bool(r, "feature.experimental", &experimental, 0); > > - /* Defaults modified by feature.* */ > - if (experimental) { > - r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING; > - } > - if (manyfiles) { > - r->settings.index_version = 4; > - r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE; > - } > + /* Defaults modified by feature.* */ > + if (experimental) { > + r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING; > + } > + if (manyfiles) { > + r->settings.index_version = 4; > + r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE; > + } > + > + /* Boolean config or default, does not cascade (simple) */ > + repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1); > + repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1); > + repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1); > + repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0); > + repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1); > + repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1); > + repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0); > > - /* Boolean config or default, does not cascade (simple) */ > - repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1); > - repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1); > - repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1); > - repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0); > - repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1); > - repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1); > - repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0); > + /* > + * Non-boolean config > + */ > + if (!repo_config_get_int(r, "index.version", &value)) > + r->settings.index_version = value; > + > + if (!repo_config_get_string(r, "core.untrackedcache", &strval)) { > + int v = git_parse_maybe_bool(strval); > + > + /* > + * If it's set to "keep", or some other non-boolean > + * value then "v < 0". Then we do nothing and keep it > + * at the default of UNTRACKED_CACHE_KEEP. > + */ > + if (v >= 0) > + r->settings.core_untracked_cache = v ? > + UNTRACKED_CACHE_WRITE : UNTRACKED_CACHE_REMOVE; > + free(strval); > + } > + > + if (!repo_config_get_string(r, "fetch.negotiationalgorithm", &strval)) { > + int fetch_default = r->settings.fetch_negotiation_algorithm; > + if (!strcasecmp(strval, "skipping")) > + r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING; > + else if (!strcasecmp(strval, "noop")) > + r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_NOOP; > + else if (!strcasecmp(strval, "consecutive")) > + r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE; > + else if (!strcasecmp(strval, "default")) > + r->settings.fetch_negotiation_algorithm = fetch_default; > + else > + die("unknown fetch negotiation algorithm '%s'", strval); > + } > + } > > /* > * The GIT_TEST_MULTI_PACK_INDEX variable is special in that > @@ -60,40 +93,6 @@ void prepare_repo_settings(struct repository *r) > if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0)) > r->settings.core_multi_pack_index = 1; > > - /* > - * Non-boolean config > - */ > - if (!repo_config_get_int(r, "index.version", &value)) > - r->settings.index_version = value; > - > - if (!repo_config_get_string(r, "core.untrackedcache", &strval)) { > - int v = git_parse_maybe_bool(strval); > - > - /* > - * If it's set to "keep", or some other non-boolean > - * value then "v < 0". Then we do nothing and keep it > - * at the default of UNTRACKED_CACHE_KEEP. > - */ > - if (v >= 0) > - r->settings.core_untracked_cache = v ? > - UNTRACKED_CACHE_WRITE : UNTRACKED_CACHE_REMOVE; > - free(strval); > - } > - > - if (!repo_config_get_string(r, "fetch.negotiationalgorithm", &strval)) { > - int fetch_default = r->settings.fetch_negotiation_algorithm; > - if (!strcasecmp(strval, "skipping")) > - r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING; > - else if (!strcasecmp(strval, "noop")) > - r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_NOOP; > - else if (!strcasecmp(strval, "consecutive")) > - r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE; > - else if (!strcasecmp(strval, "default")) > - r->settings.fetch_negotiation_algorithm = fetch_default; > - else > - die("unknown fetch negotiation algorithm '%s'", strval); > - } > - > /* > * This setting guards all index reads to require a full index > * over a sparse index. After suitable guards are placed in the > > base-commit: 715d08a9e51251ad8290b181b6ac3b9e1f9719d7
Josh Steadmon <steadmon@google.com> writes: > prepare_repo_settings() initializes a `struct repository` with various > default config options and settings read from a repository-local config > file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only > in git repos), prepare_repo_settings was changed to issue a BUG() if it > is called by a process whose CWD is not a Git repository. This approach > was suggested in [1]. > > This breaks fuzz-commit-graph, which attempts to parse arbitrary > fuzzing-engine-provided bytes as a commit graph file. > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but > since we run the fuzz tests without a valid repository, we are hitting > the BUG() from 44c7e62 for every test case. I think the right approach for such a breakage is to ensure it is in a repository, not to force prepare_repo_settings() to lie. In the day-to-day real code paths the end user uses, we do want to catch mistakes in our code that calls prepare_repo_settings() when we are not in a repository.
Victoria Dye <vdye@github.com> writes: > I think the decision of whether to go with this approach or the alternative > listed below depends on the validity of a 'repository' without a gitdir. > > As far as I can tell, there is an implicit conflict between the changes in: > > 1. b66d84756f (commit-graph: respect 'commitGraph.readChangedPaths', > 2020-09-09) > 2. 44c7e62e51 (repo-settings: prepare_repo_settings only in git repos, > 2021-12-06) (as you pointed out in your message) > > The former says that commit-graph should use a repository setting (implying > it needs a valid repository), and the latter says that you need a valid > gitdir to get repository settings. > > So to me, how to proceed depends on whether a repository can be "valid" > without a gitdir or not: Sorry, I do not get it. What does "a repository without a git dir" look like? It does not make any sense to me. A repository without working tree, I can understand.
Junio C Hamano wrote: > Victoria Dye <vdye@github.com> writes: > >> I think the decision of whether to go with this approach or the alternative >> listed below depends on the validity of a 'repository' without a gitdir. >> >> As far as I can tell, there is an implicit conflict between the changes in: >> >> 1. b66d84756f (commit-graph: respect 'commitGraph.readChangedPaths', >> 2020-09-09) >> 2. 44c7e62e51 (repo-settings: prepare_repo_settings only in git repos, >> 2021-12-06) (as you pointed out in your message) >> >> The former says that commit-graph should use a repository setting (implying >> it needs a valid repository), and the latter says that you need a valid >> gitdir to get repository settings. >> >> So to me, how to proceed depends on whether a repository can be "valid" >> without a gitdir or not: > > Sorry, I do not get it. What does "a repository without a git dir" > look like? It does not make any sense to me. A repository without > working tree, I can understand. I think that answers my question - if it doesn't make sense to have a "repository without a git dir", then the code shouldn't allow 'the_repository' to be used without a valid 'the_repository.gitdir'. The 'BUG()' in 'prepare_repo_settings()' enforces that condition right now, so removing it like this RFC does would make the tests fragile and the code more prone to future bugs. That said, if it's conceptually sensible for 'fuzz-commit-graph' to work without a repository, then it could be updated to get its defaults from somewhere other than 'the_repository' when the repo doesn't exist (like what I mentioned in [1] ("If the 'repository' really is..."), or Taylor's suggestion in [2]). [1] https://lore.kernel.org/git/fcfdcbb9-761a-0d34-7d36-61e0ef279922@github.com/ [2] https://lore.kernel.org/git/Yjt6mLIfw0V3aVTO@nand.local/
On 2022.03.23 15:52, Taylor Blau wrote: > On Wed, Mar 23, 2022 at 03:22:13PM -0400, Derrick Stolee wrote: > > On 3/23/2022 2:03 PM, Josh Steadmon wrote: > > > prepare_repo_settings() initializes a `struct repository` with various > > > default config options and settings read from a repository-local config > > > file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only > > > in git repos), prepare_repo_settings was changed to issue a BUG() if it > > > is called by a process whose CWD is not a Git repository. This approach > > > was suggested in [1]. > > > > > > This breaks fuzz-commit-graph, which attempts to parse arbitrary > > > fuzzing-engine-provided bytes as a commit graph file. > > > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but > > > since we run the fuzz tests without a valid repository, we are hitting > > > the BUG() from 44c7e62 for every test case. > > > > > > Fix this by refactoring prepare_repo_settings() such that it sets > > > default options unconditionally; if its process is in a Git repository, > > > it will also load settings from the local config. This eliminates the > > > need for a BUG() when not in a repository. > > > > I think you have the right idea and this can work. > > Hmmm. To me this feels like bending over backwards in > `prepare_repo_settings()` to accommodate one particular caller. I'm not > necessarily opposed to it, but it does feel strange to make > `prepare_repo_settings()` a noop here, since I would expect that any > callers who do want to call `prepare_repo_settings()` are likely > convinced that they are inside of a repository, and it probably should > be a BUG() if they aren't. > > I was initially thinking that Josh's alternative of introducing and > calling a lower-level version of `prepare_commit_graph()` that doesn't > require the use of any repository settings would make sense. > > But when I started looking at implementing it, I became confused at how > this is supposed to work at all without using a repository. We depend on > the values parsed from: > > - commitGraph.generationVersion > - commitGraph.readChangedPaths > > to determine which part(s) of the commit graph we do and don't read. > > Looking around, I think I probably inadvertently broke this in > ab14d0676c (commit-graph: pass a 'struct repository *' in more places, > 2020-09-09). But prior to ab14d0676c, neither of those settings existed, > so parsing the commit graph was a pure function of the commit graph's > contents alone, and didn't rely on the existence of a repository. Yeah, I have not done a great job keeping the fuzzers up to date with commit-graph changes :(. > We could pretend as if `commitGraph.generationVersion` is always "2" and > `commitGraph.readChangedPaths` is always "true", and I think that would > still give us good-enough coverage. It might also be worthwhile for the fuzzer to test each interesting combination of settings, using the same arbitrary input. > I assume that libFuzzer doesn't give us a convenient way to stub out a > repository. Maybe we should have a variant of `parse_commit_graph` that > instead takes a `struct repo_settings` filled out with the defaults? > > We could use that to teach libFuzzer about additional states that the > commit graph parser can be in, too (though probably outside the scope of > this patch). > > I tried to sketch all of this out, which seems to work. Let me know what > you think: Yes, your patch looks like a much smaller change than I feared would be the case for a parse_commit_graph refactor. I'll test it out, and probably add some additional fuzzer improvements on top. Thanks for the patch!
On 2022.03.23 15:52, Taylor Blau wrote: > I tried to sketch all of this out, which seems to work. Let me know what > you think: BTW, is it OK if I include your Signed-off-by on this when I send my V2?
On Mon, Mar 28, 2022 at 12:15:53PM -0700, Josh Steadmon wrote: > > Looking around, I think I probably inadvertently broke this in > > ab14d0676c (commit-graph: pass a 'struct repository *' in more places, > > 2020-09-09). But prior to ab14d0676c, neither of those settings existed, > > so parsing the commit graph was a pure function of the commit graph's > > contents alone, and didn't rely on the existence of a repository. > > Yeah, I have not done a great job keeping the fuzzers up to date with > commit-graph changes :(. I think that puts you and I in the same boat, since the original breakage from ab14d0676c blames back to me. I'm sorry that I didn't notice that my change had broken the fuzzing code at the time, and I appreciate you working on fixing it! > > We could pretend as if `commitGraph.generationVersion` is always "2" and > > `commitGraph.readChangedPaths` is always "true", and I think that would > > still give us good-enough coverage. > > It might also be worthwhile for the fuzzer to test each interesting > combination of settings, using the same arbitrary input. Definitely. I don't think it hurts to just focus on getting the common case ("2", "true") working again. And if libFuzzer makes it easy-ish to test more of the possible input space, I'm all for it. Thanks, Taylor
On Mon, Mar 28, 2022 at 12:53:33PM -0700, Josh Steadmon wrote: > On 2022.03.23 15:52, Taylor Blau wrote: > > I tried to sketch all of this out, which seems to work. Let me know what > > you think: > > BTW, is it OK if I include your Signed-off-by on this when I send my V2? Yes, absolutely. Thanks for asking; it's fine to include my Signed-off-by on any patches / diffs that I send to the mailing list. Thanks, Taylor
On Wed, Mar 23 2022, Taylor Blau wrote: > /* Boolean config or default, does not cascade (simple) */ > repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1); > + repo_cfg_int(r, "commitgraph.generationversion", &r->settings.commit_graph_generation_version, 2); If this ends up being kept let's add it to its own commented "section", the comment 2-lines above it is now incorrect.
On Wed, Mar 23 2022, Taylor Blau wrote: > On Wed, Mar 23, 2022 at 03:22:13PM -0400, Derrick Stolee wrote: >> On 3/23/2022 2:03 PM, Josh Steadmon wrote: >> > prepare_repo_settings() initializes a `struct repository` with various >> > default config options and settings read from a repository-local config >> > file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only >> > in git repos), prepare_repo_settings was changed to issue a BUG() if it >> > is called by a process whose CWD is not a Git repository. This approach >> > was suggested in [1]. >> > >> > This breaks fuzz-commit-graph, which attempts to parse arbitrary >> > fuzzing-engine-provided bytes as a commit graph file. >> > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but >> > since we run the fuzz tests without a valid repository, we are hitting >> > the BUG() from 44c7e62 for every test case. >> > >> > Fix this by refactoring prepare_repo_settings() such that it sets >> > default options unconditionally; if its process is in a Git repository, >> > it will also load settings from the local config. This eliminates the >> > need for a BUG() when not in a repository. >> >> I think you have the right idea and this can work. > > Hmmm. To me this feels like bending over backwards in > `prepare_repo_settings()` to accommodate one particular caller. I'm not > necessarily opposed to it, but it does feel strange to make > `prepare_repo_settings()` a noop here, since I would expect that any > callers who do want to call `prepare_repo_settings()` are likely > convinced that they are inside of a repository, and it probably should > be a BUG() if they aren't. I think adding that BUG() was overzelous in the first place, per https://lore.kernel.org/git/211207.86r1apow9f.gmgdl@evledraar.gmail.com/; I don't see what purpose it solves to be this overly anal in this code, and 44c7e62e51e (repo-settings: prepare_repo_settings only in git repos, 2021-12-06) just discusses "what" and not "why". I think a perfectly fine solution to this is just to revert it: diff --git a/repo-settings.c b/repo-settings.c index b4fbd16cdcc..e162c1479bf 100644 --- a/repo-settings.c +++ b/repo-settings.c @@ -18,7 +18,7 @@ void prepare_repo_settings(struct repository *r) int manyfiles; if (!r->gitdir) - BUG("Cannot add settings for uninitialized repository"); + return; if (r->settings.initialized++) return; I have that in my local integration branch, because I ended up wanting to add prepare_repo_settings() to usage.c, which may or may not run inside a repo (and maybe we'll have that config, maybe not). But really, in common-main.c we do a initialize_the_repository(), so a "struct repository" is already a thing we have before we get to the "RUN_SETUP_GENTLY" or whatever in git.c, and a bunch of things all over the place assume that it's the equivalent of { 0 }-initialized. If we actually want to turn repository.[ch] into some strict API where "Tho Shalt Not Use the_repository unless" we're actually in a repo surely we should have it be NULL then, and to add that BUG() to the likes of initialize_the_repository(). Except I think there's no point in that, and it would just lead to pointless churn, so why do it for the settings in particular? Why can't they just be { 0 }-init'd too? If some caller cares about the distinction between r->settings being like it is because of us actually having a repo, or us using the defaults why can't they just check r->gitdir themselves? For the rest the default of "just provide the defaults then" is a much saner API. I think *maybe* what this actually wanted to do was to bridge the gap between "startup_info->have_repository" and a caller in builtin/ calling prepare_repo_settings(), i.e. that it was a logic error to have that RUN_SETUP_GENTLY caller do that. I can see how that *might* be useful as some sanity assertion, but then maybe we could add a more narrow BUG() just for that case, even having a builtin_prepare_repo_settings() wrapper in builtin.h or whatever.
On Tue, Mar 29, 2022 at 11:03:14AM +0200, Ævar Arnfjörð Bjarmason wrote: > > On Wed, Mar 23 2022, Taylor Blau wrote: > > > /* Boolean config or default, does not cascade (simple) */ > > repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1); > > + repo_cfg_int(r, "commitgraph.generationversion", &r->settings.commit_graph_generation_version, 2); > > If this ends up being kept let's add it to its own commented "section", > the comment 2-lines above it is now incorrect. Thanks for pointing it out; indeed that comment is definitely stale with respect to the newer code. This patch was just a sketch of an alternative approach for Josh to consider, so I can't guarantee that it isn't immune to nit-picks ;). I think that Josh is planning on picking this up, so hopefully he doesn't mind applying these and any other touch-ups locally before resubmitting. (Josh, if you do: thank you very much in advance!) Thanks, Taylor
On Tue, Mar 29, 2022 at 11:04:18AM +0200, Ævar Arnfjörð Bjarmason wrote: > > On Wed, Mar 23 2022, Taylor Blau wrote: > > > On Wed, Mar 23, 2022 at 03:22:13PM -0400, Derrick Stolee wrote: > >> On 3/23/2022 2:03 PM, Josh Steadmon wrote: > >> > prepare_repo_settings() initializes a `struct repository` with various > >> > default config options and settings read from a repository-local config > >> > file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only > >> > in git repos), prepare_repo_settings was changed to issue a BUG() if it > >> > is called by a process whose CWD is not a Git repository. This approach > >> > was suggested in [1]. > >> > > >> > This breaks fuzz-commit-graph, which attempts to parse arbitrary > >> > fuzzing-engine-provided bytes as a commit graph file. > >> > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but > >> > since we run the fuzz tests without a valid repository, we are hitting > >> > the BUG() from 44c7e62 for every test case. > >> > > >> > Fix this by refactoring prepare_repo_settings() such that it sets > >> > default options unconditionally; if its process is in a Git repository, > >> > it will also load settings from the local config. This eliminates the > >> > need for a BUG() when not in a repository. > >> > >> I think you have the right idea and this can work. > > > > Hmmm. To me this feels like bending over backwards in > > `prepare_repo_settings()` to accommodate one particular caller. I'm not > > necessarily opposed to it, but it does feel strange to make > > `prepare_repo_settings()` a noop here, since I would expect that any > > callers who do want to call `prepare_repo_settings()` are likely > > convinced that they are inside of a repository, and it probably should > > be a BUG() if they aren't. > > I think adding that BUG() was overzelous in the first place, per > https://lore.kernel.org/git/211207.86r1apow9f.gmgdl@evledraar.gmail.com/; I think Junio raised a good point in https://lore.kernel.org/git/xmqqcznh8913.fsf@gitster.g/ , though some of the detail was lost in 44c7e62e51 (repo-settings: prepare_repo_settings only in git repos, 2021-12-06). > I have that in my local integration branch, because I ended up wanting > to add prepare_repo_settings() to usage.c, which may or may not run > inside a repo (and maybe we'll have that config, maybe not). I see what you're saying, though I think we would be equally OK to have a default value of the repo_settings struct that we could rely on. I said some of this back in https://lore.kernel.org/git/Yjt6mLIfw0V3aVTO@nand.local/ , namely the parts around "I would expect that any callers who do want to call `prepare_repo_settings()` are likely convinced that they are inside of a repository, and it probably should be a BUG() if they aren't." Thinking in terms of your message, though, I think the distinction (from my perspective, at least) is between (a) using something called _repo_-settings in a non-repo context, and (b) calling a function which is supposed to fill in its values from a repository (which the caller implicitly expects to exist). Neither feel _good_ to me, but (b) feels worse, since it is making it OK to operate in a likely-unexpected context with respect to the caller's expectations. Anyway, I think that we are pretty far into the weeds, and it's likely time to turn around. I don't have that strong a feeling either way, and in all honesty either approach is probably just fine. Thanks, Taylor
On Tue, Mar 29 2022, Taylor Blau wrote: > On Tue, Mar 29, 2022 at 11:04:18AM +0200, Ævar Arnfjörð Bjarmason wrote: >> >> On Wed, Mar 23 2022, Taylor Blau wrote: >> >> > On Wed, Mar 23, 2022 at 03:22:13PM -0400, Derrick Stolee wrote: >> >> On 3/23/2022 2:03 PM, Josh Steadmon wrote: >> >> > prepare_repo_settings() initializes a `struct repository` with various >> >> > default config options and settings read from a repository-local config >> >> > file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only >> >> > in git repos), prepare_repo_settings was changed to issue a BUG() if it >> >> > is called by a process whose CWD is not a Git repository. This approach >> >> > was suggested in [1]. >> >> > >> >> > This breaks fuzz-commit-graph, which attempts to parse arbitrary >> >> > fuzzing-engine-provided bytes as a commit graph file. >> >> > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but >> >> > since we run the fuzz tests without a valid repository, we are hitting >> >> > the BUG() from 44c7e62 for every test case. >> >> > >> >> > Fix this by refactoring prepare_repo_settings() such that it sets >> >> > default options unconditionally; if its process is in a Git repository, >> >> > it will also load settings from the local config. This eliminates the >> >> > need for a BUG() when not in a repository. >> >> >> >> I think you have the right idea and this can work. >> > >> > Hmmm. To me this feels like bending over backwards in >> > `prepare_repo_settings()` to accommodate one particular caller. I'm not >> > necessarily opposed to it, but it does feel strange to make >> > `prepare_repo_settings()` a noop here, since I would expect that any >> > callers who do want to call `prepare_repo_settings()` are likely >> > convinced that they are inside of a repository, and it probably should >> > be a BUG() if they aren't. >> >> I think adding that BUG() was overzelous in the first place, per >> https://lore.kernel.org/git/211207.86r1apow9f.gmgdl@evledraar.gmail.com/; > > I think Junio raised a good point in > > https://lore.kernel.org/git/xmqqcznh8913.fsf@gitster.g/ > > , though some of the detail was lost in 44c7e62e51 (repo-settings: > prepare_repo_settings only in git repos, 2021-12-06). > >> I have that in my local integration branch, because I ended up wanting >> to add prepare_repo_settings() to usage.c, which may or may not run >> inside a repo (and maybe we'll have that config, maybe not). > > I see what you're saying, though I think we would be equally OK to have > a default value of the repo_settings struct that we could rely on. I > said some of this back in > > https://lore.kernel.org/git/Yjt6mLIfw0V3aVTO@nand.local/ > > , namely the parts around "I would expect that any callers who do want > to call `prepare_repo_settings()` are likely convinced that they are > inside of a repository, and it probably should be a BUG() if they > aren't." > > Thinking in terms of your message, though, I think the distinction (from > my perspective, at least) is between (a) using something called > _repo_-settings in a non-repo context, and (b) calling a function which > is supposed to fill in its values from a repository (which the caller > implicitly expects to exist). > > Neither feel _good_ to me, but (b) feels worse, since it is making it OK > to operate in a likely-unexpected context with respect to the caller's > expectations. I agree that it's a bit iffy. I'm basically advocating for treating "the_repository->settings" as though it's a new "the_config" or whatever. Maybe we'd be better off just making that move, or having the_repository->settings contain only settings relevant to cases where we only have a repository. But I think predicating useful uses of it on that refactoring is overdoing it a bit, especially as I think your "(b)" concern here is already something we deal with when it comes to initialize_the_repository() and checks for "the_repository->gitdir". Can't we just have callers that really care about the distinction check "->gitdir" instead? As they're already doing in some cases already? Or just: git mv {repo,global}-settings.c Since that's what it seems to want to be anyway. > Anyway, I think that we are pretty far into the weeds, and it's likely > time to turn around. I don't have that strong a feeling either way, and > in all honesty either approach is probably just fine. *nod*
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Or just: > > git mv {repo,global}-settings.c > > Since that's what it seems to want to be anyway. Hmph, care to elaborate a bit more on "seems to"? Here is my take - The code makes extensive use of repo_cfg_bool(), which is a thin wrapper around repo_config_get_bool(); despite its name, it is not about reading from the configuration file of that repository and nowhere else. It can be affected by global configuration. - Other uses of repo_config_get_*() it uses is the same way. So, it wants to grab a set of configuration that would +apply+ to this specific instance of "struct repository". But that is quite different from "give us settings that would apply in general, I do not have a specific repository in mind", which is what "global-settings.c" would imply at least to me. And in order for the "this specific instance" to make sense, the caller should have made sure that it is indeed a repository. Lifting that BUG() from the code path not only smells sloppy way to work around some corner case code that does not prepare the repository properly, but does not make much sense, at least to me. In exchange for scrapping the safety to help a caller that forgets to prepare repository before it is ready to call this function, what are we gaining? I went back to the thread-starter message and re-read its justification. It talks about: > Concerns: > > Are any callers strictly dependent on having a BUG() here? I suspect > that the worst that would happen is that rather than this BUG(), the > caller would later hit its own BUG() or die(), so I do not think this is > a blocker. Additionally, every builtin that directly calls > prepare_repo_settings is either marked as RUN_SETUP, which means we > would die() prior to calling it anyway, or checks on its own before > calling it (builtin/diff.c). There are several callers in library code, > though, and I have not tracked down how all of those are used. Asking for existing callers being dependent on having a BUG() is a pure nonsense. The existing callers are there in shipped versions of Git exactly because they do things correctly not to hit the BUG(), so BY DEFINITION, they do not care if the BUG() is there or not. So that is not "a blocker", but is a non-argument to ask if existing code paths care if the BUG() is gone. What BUG() is protecting us against is a careless developer who writes a new code or alters an existing code path that ends up making the control flow in such a way that a proper set-up of the repository structure is bypassed by mistake before calling this function. The function is call-once by r->settings.initialized guarding it, calling it and then doing a set-up will result in an unexplainable bug even if the caller tries to compensate by calling it twice, as r->settings that is set incorrectly will be sticky. Having said all that, I can be pursuaded to consider an approach to allow callers to explicitly ask for running outside repository, just like the more strict setup_git_directory() for majority of callers has looser setup_git_directory_gently() counterpart. The current callers should retain the "you must have discovered gitdir" there, but a special purpose code that is not even Git (like fuzzer) can say prepare_repo_settings_gently(r, &nongit_ok); instead. diff --git c/repo-settings.c w/repo-settings.c index b4fbd16cdc..c492bc7671 100644 --- c/repo-settings.c +++ w/repo-settings.c @@ -10,15 +10,24 @@ static void repo_cfg_bool(struct repository *r, const char *key, int *dest, *dest = def; } -void prepare_repo_settings(struct repository *r) +void prepare_repo_settings_gently(struct repository *r, int *nongit) { int experimental; int value; char *strval; int manyfiles; - if (!r->gitdir) - BUG("Cannot add settings for uninitialized repository"); + if (!r->gitdir) { + /* + * The caller can pass nongit (out paremeter) to ask if r is already + * initialized (and act on it after this function returns). + */ + if (!nongit) + BUG("Cannot add settings for uninitialized repository"); + *nongit = 1; + } else if (nongit) { + *nongit = 0; + } if (r->settings.initialized++) return; diff --git c/repository.h w/repository.h index e29f361703..98f6ec12cc 100644 --- c/repository.h +++ w/repository.h @@ -222,7 +222,8 @@ int repo_read_index_unmerged(struct repository *); */ void repo_update_index_if_able(struct repository *, struct lock_file *); -void prepare_repo_settings(struct repository *r); +#define prepare_repo_settings(r) prepare_repo_settings_gently((r), NULL) +void prepare_repo_settings_gently(struct repository *r, int *nongit); /* * Return 1 if upgrade repository format to target_version succeeded,
On 2022.03.29 22:26, Taylor Blau wrote: > On Tue, Mar 29, 2022 at 11:03:14AM +0200, Ævar Arnfjörð Bjarmason wrote: > > > > On Wed, Mar 23 2022, Taylor Blau wrote: > > > > > /* Boolean config or default, does not cascade (simple) */ > > > repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1); > > > + repo_cfg_int(r, "commitgraph.generationversion", &r->settings.commit_graph_generation_version, 2); > > > > If this ends up being kept let's add it to its own commented "section", > > the comment 2-lines above it is now incorrect. > > Thanks for pointing it out; indeed that comment is definitely stale with > respect to the newer code. This patch was just a sketch of an > alternative approach for Josh to consider, so I can't guarantee that it > isn't immune to nit-picks ;). > > I think that Josh is planning on picking this up, so hopefully he > doesn't mind applying these and any other touch-ups locally before > resubmitting. > > (Josh, if you do: thank you very much in advance!) > > Thanks, > Taylor Done in V2, to be sent out shortly.
diff --git a/repo-settings.c b/repo-settings.c index b4fbd16cdc..d154b37007 100644 --- a/repo-settings.c +++ b/repo-settings.c @@ -17,9 +17,6 @@ void prepare_repo_settings(struct repository *r) char *strval; int manyfiles; - if (!r->gitdir) - BUG("Cannot add settings for uninitialized repository"); - if (r->settings.initialized++) return; @@ -28,27 +25,63 @@ void prepare_repo_settings(struct repository *r) r->settings.core_untracked_cache = UNTRACKED_CACHE_KEEP; r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE; - /* Booleans config or default, cascades to other settings */ - repo_cfg_bool(r, "feature.manyfiles", &manyfiles, 0); - repo_cfg_bool(r, "feature.experimental", &experimental, 0); + if (r->gitdir) { + /* Booleans config or default, cascades to other settings */ + repo_cfg_bool(r, "feature.manyfiles", &manyfiles, 0); + repo_cfg_bool(r, "feature.experimental", &experimental, 0); - /* Defaults modified by feature.* */ - if (experimental) { - r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING; - } - if (manyfiles) { - r->settings.index_version = 4; - r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE; - } + /* Defaults modified by feature.* */ + if (experimental) { + r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING; + } + if (manyfiles) { + r->settings.index_version = 4; + r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE; + } + + /* Boolean config or default, does not cascade (simple) */ + repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1); + repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1); + repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1); + repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0); + repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1); + repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1); + repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0); - /* Boolean config or default, does not cascade (simple) */ - repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1); - repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1); - repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1); - repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0); - repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1); - repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1); - repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0); + /* + * Non-boolean config + */ + if (!repo_config_get_int(r, "index.version", &value)) + r->settings.index_version = value; + + if (!repo_config_get_string(r, "core.untrackedcache", &strval)) { + int v = git_parse_maybe_bool(strval); + + /* + * If it's set to "keep", or some other non-boolean + * value then "v < 0". Then we do nothing and keep it + * at the default of UNTRACKED_CACHE_KEEP. + */ + if (v >= 0) + r->settings.core_untracked_cache = v ? + UNTRACKED_CACHE_WRITE : UNTRACKED_CACHE_REMOVE; + free(strval); + } + + if (!repo_config_get_string(r, "fetch.negotiationalgorithm", &strval)) { + int fetch_default = r->settings.fetch_negotiation_algorithm; + if (!strcasecmp(strval, "skipping")) + r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING; + else if (!strcasecmp(strval, "noop")) + r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_NOOP; + else if (!strcasecmp(strval, "consecutive")) + r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE; + else if (!strcasecmp(strval, "default")) + r->settings.fetch_negotiation_algorithm = fetch_default; + else + die("unknown fetch negotiation algorithm '%s'", strval); + } + } /* * The GIT_TEST_MULTI_PACK_INDEX variable is special in that @@ -60,40 +93,6 @@ void prepare_repo_settings(struct repository *r) if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0)) r->settings.core_multi_pack_index = 1; - /* - * Non-boolean config - */ - if (!repo_config_get_int(r, "index.version", &value)) - r->settings.index_version = value; - - if (!repo_config_get_string(r, "core.untrackedcache", &strval)) { - int v = git_parse_maybe_bool(strval); - - /* - * If it's set to "keep", or some other non-boolean - * value then "v < 0". Then we do nothing and keep it - * at the default of UNTRACKED_CACHE_KEEP. - */ - if (v >= 0) - r->settings.core_untracked_cache = v ? - UNTRACKED_CACHE_WRITE : UNTRACKED_CACHE_REMOVE; - free(strval); - } - - if (!repo_config_get_string(r, "fetch.negotiationalgorithm", &strval)) { - int fetch_default = r->settings.fetch_negotiation_algorithm; - if (!strcasecmp(strval, "skipping")) - r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING; - else if (!strcasecmp(strval, "noop")) - r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_NOOP; - else if (!strcasecmp(strval, "consecutive")) - r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE; - else if (!strcasecmp(strval, "default")) - r->settings.fetch_negotiation_algorithm = fetch_default; - else - die("unknown fetch negotiation algorithm '%s'", strval); - } - /* * This setting guards all index reads to require a full index * over a sparse index. After suitable guards are placed in the
prepare_repo_settings() initializes a `struct repository` with various default config options and settings read from a repository-local config file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only in git repos), prepare_repo_settings was changed to issue a BUG() if it is called by a process whose CWD is not a Git repository. This approach was suggested in [1]. This breaks fuzz-commit-graph, which attempts to parse arbitrary fuzzing-engine-provided bytes as a commit graph file. commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but since we run the fuzz tests without a valid repository, we are hitting the BUG() from 44c7e62 for every test case. Fix this by refactoring prepare_repo_settings() such that it sets default options unconditionally; if its process is in a Git repository, it will also load settings from the local config. This eliminates the need for a BUG() when not in a repository. Concerns: Are any callers strictly dependent on having a BUG() here? I suspect that the worst that would happen is that rather than this BUG(), the caller would later hit its own BUG() or die(), so I do not think this is a blocker. Additionally, every builtin that directly calls prepare_repo_settings is either marked as RUN_SETUP, which means we would die() prior to calling it anyway, or checks on its own before calling it (builtin/diff.c). There are several callers in library code, though, and I have not tracked down how all of those are used. Alternatives considered: Setting up a valid repository for fuzz testing would avoid triggering this bug, but would unacceptably slow down the test cases. Refactoring parse_commit_graph() in such a way that the fuzz test has an alternate entry point that avoids calling prepare_repo_settings() might be possible, but would be a much larger change than this one. It would also run the risk that the changes would be so extensive that the fuzzer would be merely testing its own custom commit-graph implementation, rather than the one that's actually used in the real world. [1]: https://lore.kernel.org/git/xmqqcznh8913.fsf@gitster.g/ Signed-off-by: Josh Steadmon <steadmon@google.com> --- repo-settings.c | 111 ++++++++++++++++++++++++------------------------ 1 file changed, 55 insertions(+), 56 deletions(-) base-commit: 715d08a9e51251ad8290b181b6ac3b9e1f9719d7