Message ID | 96256f9eb30e5aca632942169d5c311b1f245b40.1713180749.git.ps@pks.im (mailing list archive) |
---|---|
State | Accepted |
Commit | 66bce9d00bc4e2f89b2aea21d1e162c9ee47f55c |
Headers | show |
Series | global: drop external `the_index` variable | expand |
Patrick Steinhardt <ps@pks.im> writes: > Refactor the code and create a common `initialize_repository()` function > that gets called from `repo_init()` and `initialize_the_repository()`. > This function sets up both the repository and the index as required. > Like this, we can easily special-case when `repo_init()` gets called > with `the_repository`. > Nit: `initialize_the_repository()` calling `initialize_repository()` doesn't really indicate what each of them does and why we have two functions which are similarly named but calling each other. Maybe we should rename them?
On Wed, Apr 17, 2024 at 05:38:29PM +0000, Karthik Nayak wrote: > Patrick Steinhardt <ps@pks.im> writes: > > > Refactor the code and create a common `initialize_repository()` function > > that gets called from `repo_init()` and `initialize_the_repository()`. > > This function sets up both the repository and the index as required. > > Like this, we can easily special-case when `repo_init()` gets called > > with `the_repository`. > > > > Nit: `initialize_the_repository()` calling `initialize_repository()` > doesn't really indicate what each of them does and why we have two > functions which are similarly named but calling each other. Maybe we > should rename them? I think once you know about `the_repository` it isn't all that bad. Whereas `initialize_repository()` initializes any repository, `initialize_the_repository()` only initializes `the_repository`. But you know, let's further simplify this and move all initialization logic into `initialize_repository()` to make this hopefully clearer. And on top of that we can then also drop `initialize_the_repository()` completely by statically initializing the `the_repository` pointer so that callers can simply call `initialize_repository(&the_repository)` instead. Patrick
diff --git a/repository.c b/repository.c index e15b416944..d64d15d952 100644 --- a/repository.c +++ b/repository.c @@ -25,17 +25,20 @@ static struct repository the_repo; struct repository *the_repository; struct index_state the_index; +static void initialize_repository(struct repository *repo, + struct index_state *index) +{ + repo->index = index; + repo->objects = raw_object_store_new(); + repo->remote_state = remote_state_new(); + repo->parsed_objects = parsed_object_pool_new(); + index_state_init(index, repo); +} + void initialize_the_repository(void) { the_repository = &the_repo; - - the_repo.index = &the_index; - the_repo.objects = raw_object_store_new(); - the_repo.remote_state = remote_state_new(); - the_repo.parsed_objects = parsed_object_pool_new(); - - index_state_init(&the_index, the_repository); - + initialize_repository(the_repository, &the_index); repo_set_hash_algo(&the_repo, GIT_HASH_SHA1); } @@ -188,9 +191,12 @@ int repo_init(struct repository *repo, struct repository_format format = REPOSITORY_FORMAT_INIT; memset(repo, 0, sizeof(*repo)); - repo->objects = raw_object_store_new(); - repo->parsed_objects = parsed_object_pool_new(); - repo->remote_state = remote_state_new(); + if (repo == the_repository) { + initialize_repository(the_repository, &the_index); + } else { + ALLOC_ARRAY(repo->index, 1); + initialize_repository(repo, repo->index); + } if (repo_init_gitdir(repo, gitdir)) goto error;
When Git starts, one of the first things it will do is to call `initialize_the_repository()`. This function sets up both the global `the_repository` and `the_index` variables as required. Part of that setup is also to set `the_repository.index = &the_index` so that the index can be accessed via the repository. When calling `repo_init()` on a repository though we set the complete struct to all-zeroes, which will also cause us to unset the `index` pointer. And as we don't re-initialize the index in that function, we will end up with a `NULL` pointer here. This has been fine until now becaues this function is only used to create a new repository. git-init(1) does not access the index at all after initializing the repository, whereas git-checkout(1) only uses `the_index` directly. We are about to remove `the_index` though, which will uncover this partially-initialized repository structure. Refactor the code and create a common `initialize_repository()` function that gets called from `repo_init()` and `initialize_the_repository()`. This function sets up both the repository and the index as required. Like this, we can easily special-case when `repo_init()` gets called with `the_repository`. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- repository.c | 28 +++++++++++++++++----------- 1 file changed, 17 insertions(+), 11 deletions(-)