Message ID | d64955a2e277da138146020f6a0cf96f4636a162.1727718031.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Remove the_repository global for am, annotate, apply, archive builtins | expand |
"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: John Cai <johncai86@gmail.com> > > Remove the_repository global variable in favor of the repository > argument that gets passed in through the builtin function. > > Signed-off-by: John Cai <johncai86@gmail.com> > --- > builtin/apply.c | 9 ++++----- > 1 file changed, 4 insertions(+), 5 deletions(-) > > diff --git a/builtin/apply.c b/builtin/apply.c > index 84f1863d3ac..d0bafbec7e4 100644 > --- a/builtin/apply.c > +++ b/builtin/apply.c > @@ -1,4 +1,3 @@ > -#define USE_THE_REPOSITORY_VARIABLE > #include "builtin.h" > #include "gettext.h" > #include "hash.h" > @@ -12,14 +11,14 @@ static const char * const apply_usage[] = { > int cmd_apply(int argc, > const char **argv, > const char *prefix, > - struct repository *repo UNUSED) > + struct repository *repo) > { > int force_apply = 0; > int options = 0; > int ret; > struct apply_state state; > > - if (init_apply_state(&state, the_repository, prefix)) > + if (init_apply_state(&state, repo, prefix)) > exit(128); Is this one, and ... > > /* > @@ -28,8 +27,8 @@ int cmd_apply(int argc, > * is worth the effort. > * cf. https://lore.kernel.org/git/xmqqcypfcmn4.fsf@gitster.g/ > */ > - if (!the_hash_algo) > - repo_set_hash_algo(the_repository, GIT_HASH_SHA1); > + if (!repo->hash_algo) > + repo_set_hash_algo(repo, GIT_HASH_SHA1); ... is this use of "repo" still valid? We now pass NULL, not the_repository, when a command with SETUP_GENTLY is asked to run outside a repository, no? Shouldn't it detecting the case, and passing the pointer to a fallback object (perhaps the_repository) instead of repo? I _think_ state->repo->index is accessed unconditionally only to figure out whitespace attributes, even outside a repository (thanks to the_repository standing in), with the expectation that the index is empty (because we do not read any) and we find no customization, when "git apply" is used as a better GNU patch to work outside any repository. Maybe I am not looking hard enough, but I fail to see how the code makes sure that repo being NULL outside a repository does not lead to a dereferencing of a NULL pointer. Thanks.
On Mon, Sep 30, 2024 at 01:06:55PM -0700, Junio C Hamano wrote: > > /* > > @@ -28,8 +27,8 @@ int cmd_apply(int argc, > > * is worth the effort. > > * cf. https://lore.kernel.org/git/xmqqcypfcmn4.fsf@gitster.g/ > > */ > > - if (!the_hash_algo) > > - repo_set_hash_algo(the_repository, GIT_HASH_SHA1); > > + if (!repo->hash_algo) > > + repo_set_hash_algo(repo, GIT_HASH_SHA1); > > ... is this use of "repo" still valid? We now pass NULL, not > the_repository, when a command with SETUP_GENTLY is asked to run > outside a repository, no? Shouldn't it detecting the case, and > passing the pointer to a fallback object (perhaps the_repository) > instead of repo? > This is a bad usage. Although the "t1517: apply a patch outside repository" should check the code, the uninitialized variable "repo_exists" will cause "repo_exists ? repo : NULL" to always be "repo" which hides the wrong usage of the "repo_exists". By fetching the tree, I initialize the "repo_exists = 0" for the [PATCH v2 1/4]. And there are many tests failed. Many builtins with "RUN_SETUP_GENTLY" property or which could be converted to "RUN_SETUP_GENTLY" property by ONLY "-h" parameter will fail (segmentation fault). It's obvious that we use NULL pointer for "repo". In my opinion, we should first think about how we handle the situation where we run builtins outside of the repository. The most easiest way is to pass the fallback object (aka "the_repository"). However, this seems a little strange. We are truly outside of the repository but we really rely on the "struct repository *" to do many operations. It's unrealistic to change so many interfaces which use the "struct repository *". So, we should just use the fallback idea at current. Thanks, Jialuo
On Tue, Oct 01, 2024 at 12:58:30PM +0800, shejialuo wrote: > On Mon, Sep 30, 2024 at 01:06:55PM -0700, Junio C Hamano wrote: > In my opinion, we should first think about how we handle the situation > where we run builtins outside of the repository. The most easiest way is > to pass the fallback object (aka "the_repository"). > > However, this seems a little strange. We are truly outside of the > repository but we really rely on the "struct repository *" to do many > operations. It's unrealistic to change so many interfaces which use the > "struct repository *". So, we should just use the fallback idea at > current. I disagree with this statement. If code isn't prepare to not handle a `NULL` repository we shouldn't fall back to `the_repository`, but we should instead prepare the code to handle this case. This of course requires us to do a ton of refactorings, but that is the idea of this whole exercise to get rid of `the_repository`. If a command cannot be converted to stop using `the_repository` right now we should skip it and revisit once all prerequisites have been adapted accordingly. Patrick
On Tue, Oct 01, 2024 at 02:32:30PM +0200, Patrick Steinhardt wrote: > On Tue, Oct 01, 2024 at 12:58:30PM +0800, shejialuo wrote: > > On Mon, Sep 30, 2024 at 01:06:55PM -0700, Junio C Hamano wrote: > > In my opinion, we should first think about how we handle the situation > > where we run builtins outside of the repository. The most easiest way is > > to pass the fallback object (aka "the_repository"). > > > > However, this seems a little strange. We are truly outside of the > > repository but we really rely on the "struct repository *" to do many > > operations. It's unrealistic to change so many interfaces which use the > > "struct repository *". So, we should just use the fallback idea at > > current. > > I disagree with this statement. If code isn't prepare to not handle a > `NULL` repository we shouldn't fall back to `the_repository`, but we > should instead prepare the code to handle this case. This of course > requires us to do a ton of refactorings, but that is the idea of this > whole exercise to get rid of `the_repository`. > Actually, I also insist that we should refactor here. But I worry about the burden this would bring to John due to we may do a lot of work here. So, I expressed my meaning in a compromising way. But we should face the problem directly :). Thanks, Jialuo
On Tue, Oct 01, 2024 at 09:40:43PM +0800, shejialuo wrote: > On Tue, Oct 01, 2024 at 02:32:30PM +0200, Patrick Steinhardt wrote: > > On Tue, Oct 01, 2024 at 12:58:30PM +0800, shejialuo wrote: > > > On Mon, Sep 30, 2024 at 01:06:55PM -0700, Junio C Hamano wrote: > > > In my opinion, we should first think about how we handle the situation > > > where we run builtins outside of the repository. The most easiest way is > > > to pass the fallback object (aka "the_repository"). > > > > > > However, this seems a little strange. We are truly outside of the > > > repository but we really rely on the "struct repository *" to do many > > > operations. It's unrealistic to change so many interfaces which use the > > > "struct repository *". So, we should just use the fallback idea at > > > current. > > > > I disagree with this statement. If code isn't prepare to not handle a > > `NULL` repository we shouldn't fall back to `the_repository`, but we > > should instead prepare the code to handle this case. This of course > > requires us to do a ton of refactorings, but that is the idea of this > > whole exercise to get rid of `the_repository`. > > > > Actually, I also insist that we should refactor here. But I worry about > the burden this would bring to John due to we may do a lot of work here. > So, I expressed my meaning in a compromising way. > > But we should face the problem directly :). True, all of this is a long-term effort that is probably going to take us many months, likely even years. So people working on it should take things slow and refactor chunks that are mostly ready to be converted to get rid of `the_repository`. That will sometimes mean that you have to scrap the conversion you're currently working on because you discover that it inherently relies on `the_repository` deep down in the stack, and refactoring it would be a huge undertaking. That definitely happened to me multiple times while introducing `USE_THE_REPOSITORY_VARIABLE`. And every time I did discover that, I went one level deeper to try and fix the underpinnings first. I mostly don't want us to blur the lines by silently falling back to `the_repository` in situations where we don't intend to. So I'd rather go a bit slower overall and design the code such that it doesn't fall back anymore as a way to prove that something is indeed not relying on `the_repository` anymore. Otherwise we're going to make everyones life harder. Patrick
Patrick Steinhardt <ps@pks.im> writes: > I disagree with this statement. If code isn't prepare to not handle a > `NULL` repository we shouldn't fall back to `the_repository`, but we > should instead prepare the code to handle this case. This of course > requires us to do a ton of refactorings, but that is the idea of this > whole exercise to get rid of `the_repository`. I agree. To me, the patch was screaming that the author was not prepared to go the whole nine yards, though. Adding back the explicit reference to "the_repository" as a fallback is the next best thing to do, pushing the "problem" closer to where it is. > If a command cannot be converted to stop using `the_repository` right > now we should skip it and revisit once all prerequisites have been > adapted accordingly. That is also a viable approach.
Hi Junio, On 1 Oct 2024, at 13:10, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > >> I disagree with this statement. If code isn't prepare to not handle a >> `NULL` repository we shouldn't fall back to `the_repository`, but we >> should instead prepare the code to handle this case. This of course >> requires us to do a ton of refactorings, but that is the idea of this >> whole exercise to get rid of `the_repository`. > > I agree. To me, the patch was screaming that the author was not > prepared to go the whole nine yards, though. Adding back the > explicit reference to "the_repository" as a fallback is the next > best thing to do, pushing the "problem" closer to where it is. > Indeed, I did not do my due diligence here instead of assuming all layers that look at the repo argument do the right thing. >> If a command cannot be converted to stop using `the_repository` right >> now we should skip it and revisit once all prerequisites have been >> adapted accordingly. Looks like it’d be preferable if I just drop this patch from the series as it will require a larger refactor. Thanks John > > That is also a viable approach.
diff --git a/builtin/apply.c b/builtin/apply.c index 84f1863d3ac..d0bafbec7e4 100644 --- a/builtin/apply.c +++ b/builtin/apply.c @@ -1,4 +1,3 @@ -#define USE_THE_REPOSITORY_VARIABLE #include "builtin.h" #include "gettext.h" #include "hash.h" @@ -12,14 +11,14 @@ static const char * const apply_usage[] = { int cmd_apply(int argc, const char **argv, const char *prefix, - struct repository *repo UNUSED) + struct repository *repo) { int force_apply = 0; int options = 0; int ret; struct apply_state state; - if (init_apply_state(&state, the_repository, prefix)) + if (init_apply_state(&state, repo, prefix)) exit(128); /* @@ -28,8 +27,8 @@ int cmd_apply(int argc, * is worth the effort. * cf. https://lore.kernel.org/git/xmqqcypfcmn4.fsf@gitster.g/ */ - if (!the_hash_algo) - repo_set_hash_algo(the_repository, GIT_HASH_SHA1); + if (!repo->hash_algo) + repo_set_hash_algo(repo, GIT_HASH_SHA1); argc = apply_parse_options(argc, argv, &state, &force_apply, &options,