[v2,3/4] apply: remove the_repository global variable

Message ID	d64955a2e277da138146020f6a0cf96f4636a162.1727718031.git.gitgitgadget@gmail.com (mailing list archive)
State	Superseded
Headers	show Received: from mail-lj1-f172.google.com (mail-lj1-f172.google.com [209.85.208.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B281D198E89 for <git@vger.kernel.org>; Mon, 30 Sep 2024 17:40:36 +0000 (UTC) Message-Id: <d64955a2e277da138146020f6a0cf96f4636a162.1727718031.git.gitgitgadget@gmail.com> In-Reply-To: <pull.1788.v2.git.git.1727718030.gitgitgadget@gmail.com> References: <pull.1788.git.git.1727185364.gitgitgadget@gmail.com> <pull.1788.v2.git.git.1727718030.gitgitgadget@gmail.com> Date: Mon, 30 Sep 2024 17:40:29 +0000 Subject: [PATCH v2 3/4] apply: remove the_repository global variable Fcc: Sent Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk MIME-Version: 1.0 To: git@vger.kernel.org Cc: shejialuo <shejialuo@gmail.com>, Patrick Steinhardt <ps@pks.im>, John Cai <johncai86@gmail.com>, John Cai <johncai86@gmail.com> From: John Cai <johncai86@gmail.com>
Series	Remove the_repository global for am, annotate, apply, archive builtins \| expand [v2,0/4] Remove the_repository global for am, annotate, apply, archive builtins [v2,1/4] git: pass in repo for RUN_SETUP_GENTLY [v2,2/4] annotate: remove usage of the_repository global [v2,3/4] apply: remove the_repository global variable [v2,4/4] archive: remove the_repository global variable

John Cai Sept. 30, 2024, 5:40 p.m. UTC

From: John Cai <johncai86@gmail.com>

Remove the_repository global variable in favor of the repository
argument that gets passed in through the builtin function.

Signed-off-by: John Cai <johncai86@gmail.com>
---
 builtin/apply.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

Junio C Hamano Sept. 30, 2024, 8:06 p.m. UTC | #1

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: John Cai <johncai86@gmail.com>
>
> Remove the_repository global variable in favor of the repository
> argument that gets passed in through the builtin function.
>
> Signed-off-by: John Cai <johncai86@gmail.com>
> ---
>  builtin/apply.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/builtin/apply.c b/builtin/apply.c
> index 84f1863d3ac..d0bafbec7e4 100644
> --- a/builtin/apply.c
> +++ b/builtin/apply.c
> @@ -1,4 +1,3 @@
> -#define USE_THE_REPOSITORY_VARIABLE
>  #include "builtin.h"
>  #include "gettext.h"
>  #include "hash.h"
> @@ -12,14 +11,14 @@ static const char * const apply_usage[] = {
>  int cmd_apply(int argc,
>  	      const char **argv,
>  	      const char *prefix,
> -	      struct repository *repo UNUSED)
> +	      struct repository *repo)
>  {
>  	int force_apply = 0;
>  	int options = 0;
>  	int ret;
>  	struct apply_state state;
>  
> -	if (init_apply_state(&state, the_repository, prefix))
> +	if (init_apply_state(&state, repo, prefix))
>  		exit(128);

Is this one, and ...

>  
>  	/*
> @@ -28,8 +27,8 @@ int cmd_apply(int argc,
>  	 * is worth the effort.
>  	 * cf. https://lore.kernel.org/git/xmqqcypfcmn4.fsf@gitster.g/
>  	 */
> -	if (!the_hash_algo)
> -		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> +	if (!repo->hash_algo)
> +		repo_set_hash_algo(repo, GIT_HASH_SHA1);

... is this use of "repo" still valid?  We now pass NULL, not
the_repository, when a command with SETUP_GENTLY is asked to run
outside a repository, no?  Shouldn't it detecting the case, and
passing the pointer to a fallback object (perhaps the_repository)
instead of repo?

I _think_ state->repo->index is accessed unconditionally only to
figure out whitespace attributes, even outside a repository (thanks
to the_repository standing in), with the expectation that the index
is empty (because we do not read any) and we find no customization,
when "git apply" is used as a better GNU patch to work outside any
repository.  Maybe I am not looking hard enough, but I fail to see
how the code makes sure that repo being NULL outside a repository
does not lead to a dereferencing of a NULL pointer.

Thanks.

shejialuo Oct. 1, 2024, 4:58 a.m. UTC | #2

On Mon, Sep 30, 2024 at 01:06:55PM -0700, Junio C Hamano wrote:
> >  	/*
> > @@ -28,8 +27,8 @@ int cmd_apply(int argc,
> >  	 * is worth the effort.
> >  	 * cf. https://lore.kernel.org/git/xmqqcypfcmn4.fsf@gitster.g/
> >  	 */
> > -	if (!the_hash_algo)
> > -		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> > +	if (!repo->hash_algo)
> > +		repo_set_hash_algo(repo, GIT_HASH_SHA1);
> 
> ... is this use of "repo" still valid?  We now pass NULL, not
> the_repository, when a command with SETUP_GENTLY is asked to run
> outside a repository, no?  Shouldn't it detecting the case, and
> passing the pointer to a fallback object (perhaps the_repository)
> instead of repo?
> 

This is a bad usage. Although the "t1517: apply a patch outside
repository" should check the code, the uninitialized variable
"repo_exists" will cause "repo_exists ? repo : NULL" to always be
"repo" which hides the wrong usage of the "repo_exists".

By fetching the tree, I initialize the "repo_exists = 0" for the [PATCH
v2 1/4]. And there are many tests failed. Many builtins with
"RUN_SETUP_GENTLY" property or which could be converted to
"RUN_SETUP_GENTLY" property by ONLY "-h" parameter will fail
(segmentation fault). It's obvious that we use NULL pointer for "repo".

In my opinion, we should first think about how we handle the situation
where we run builtins outside of the repository. The most easiest way is
to pass the fallback object (aka "the_repository").

However, this seems a little strange. We are truly outside of the
repository but we really rely on the "struct repository *" to do many
operations. It's unrealistic to change so many interfaces which use the
"struct repository *". So, we should just use the fallback idea at
current.

Thanks,
Jialuo

Patrick Steinhardt Oct. 1, 2024, 12:32 p.m. UTC | #3

On Tue, Oct 01, 2024 at 12:58:30PM +0800, shejialuo wrote:
> On Mon, Sep 30, 2024 at 01:06:55PM -0700, Junio C Hamano wrote:
> In my opinion, we should first think about how we handle the situation
> where we run builtins outside of the repository. The most easiest way is
> to pass the fallback object (aka "the_repository").
> 
> However, this seems a little strange. We are truly outside of the
> repository but we really rely on the "struct repository *" to do many
> operations. It's unrealistic to change so many interfaces which use the
> "struct repository *". So, we should just use the fallback idea at
> current.

I disagree with this statement. If code isn't prepare to not handle a
`NULL` repository we shouldn't fall back to `the_repository`, but we
should instead prepare the code to handle this case. This of course
requires us to do a ton of refactorings, but that is the idea of this
whole exercise to get rid of `the_repository`.

If a command cannot be converted to stop using `the_repository` right
now we should skip it and revisit once all prerequisites have been
adapted accordingly.

Patrick

shejialuo Oct. 1, 2024, 1:40 p.m. UTC | #4

On Tue, Oct 01, 2024 at 02:32:30PM +0200, Patrick Steinhardt wrote:
> On Tue, Oct 01, 2024 at 12:58:30PM +0800, shejialuo wrote:
> > On Mon, Sep 30, 2024 at 01:06:55PM -0700, Junio C Hamano wrote:
> > In my opinion, we should first think about how we handle the situation
> > where we run builtins outside of the repository. The most easiest way is
> > to pass the fallback object (aka "the_repository").
> > 
> > However, this seems a little strange. We are truly outside of the
> > repository but we really rely on the "struct repository *" to do many
> > operations. It's unrealistic to change so many interfaces which use the
> > "struct repository *". So, we should just use the fallback idea at
> > current.
> 
> I disagree with this statement. If code isn't prepare to not handle a
> `NULL` repository we shouldn't fall back to `the_repository`, but we
> should instead prepare the code to handle this case. This of course
> requires us to do a ton of refactorings, but that is the idea of this
> whole exercise to get rid of `the_repository`.
> 

Actually, I also insist that we should refactor here. But I worry about
the burden this would bring to John due to we may do a lot of work here.
So, I expressed my meaning in a compromising way.

But we should face the problem directly :).

Thanks,
Jialuo

Patrick Steinhardt Oct. 1, 2024, 2:09 p.m. UTC | #5

On Tue, Oct 01, 2024 at 09:40:43PM +0800, shejialuo wrote:
> On Tue, Oct 01, 2024 at 02:32:30PM +0200, Patrick Steinhardt wrote:
> > On Tue, Oct 01, 2024 at 12:58:30PM +0800, shejialuo wrote:
> > > On Mon, Sep 30, 2024 at 01:06:55PM -0700, Junio C Hamano wrote:
> > > In my opinion, we should first think about how we handle the situation
> > > where we run builtins outside of the repository. The most easiest way is
> > > to pass the fallback object (aka "the_repository").
> > > 
> > > However, this seems a little strange. We are truly outside of the
> > > repository but we really rely on the "struct repository *" to do many
> > > operations. It's unrealistic to change so many interfaces which use the
> > > "struct repository *". So, we should just use the fallback idea at
> > > current.
> > 
> > I disagree with this statement. If code isn't prepare to not handle a
> > `NULL` repository we shouldn't fall back to `the_repository`, but we
> > should instead prepare the code to handle this case. This of course
> > requires us to do a ton of refactorings, but that is the idea of this
> > whole exercise to get rid of `the_repository`.
> > 
> 
> Actually, I also insist that we should refactor here. But I worry about
> the burden this would bring to John due to we may do a lot of work here.
> So, I expressed my meaning in a compromising way.
> 
> But we should face the problem directly :).

True, all of this is a long-term effort that is probably going to take
us many months, likely even years. So people working on it should take
things slow and refactor chunks that are mostly ready to be converted to
get rid of `the_repository`.

That will sometimes mean that you have to scrap the conversion you're
currently working on because you discover that it inherently relies on
`the_repository` deep down in the stack, and refactoring it would be a
huge undertaking. That definitely happened to me multiple times while
introducing `USE_THE_REPOSITORY_VARIABLE`. And every time I did discover
that, I went one level deeper to try and fix the underpinnings first.

I mostly don't want us to blur the lines by silently falling back to
`the_repository` in situations where we don't intend to. So I'd rather
go a bit slower overall and design the code such that it doesn't fall
back anymore as a way to prove that something is indeed not relying on
`the_repository` anymore. Otherwise we're going to make everyones life
harder.

Patrick

Junio C Hamano Oct. 1, 2024, 5:10 p.m. UTC | #6

Patrick Steinhardt <ps@pks.im> writes:

> I disagree with this statement. If code isn't prepare to not handle a
> `NULL` repository we shouldn't fall back to `the_repository`, but we
> should instead prepare the code to handle this case. This of course
> requires us to do a ton of refactorings, but that is the idea of this
> whole exercise to get rid of `the_repository`.

I agree.  To me, the patch was screaming that the author was not
prepared to go the whole nine yards, though.  Adding back the
explicit reference to "the_repository" as a fallback is the next
best thing to do, pushing the "problem" closer to where it is.

> If a command cannot be converted to stop using `the_repository` right
> now we should skip it and revisit once all prerequisites have been
> adapted accordingly.

That is also a viable approach.

John Cai Oct. 3, 2024, 6:28 p.m. UTC | #7

Hi Junio,

On 1 Oct 2024, at 13:10, Junio C Hamano wrote:

> Patrick Steinhardt <ps@pks.im> writes:
>
>> I disagree with this statement. If code isn't prepare to not handle a
>> `NULL` repository we shouldn't fall back to `the_repository`, but we
>> should instead prepare the code to handle this case. This of course
>> requires us to do a ton of refactorings, but that is the idea of this
>> whole exercise to get rid of `the_repository`.
>
> I agree.  To me, the patch was screaming that the author was not
> prepared to go the whole nine yards, though.  Adding back the
> explicit reference to "the_repository" as a fallback is the next
> best thing to do, pushing the "problem" closer to where it is.
>

Indeed, I did not do my due diligence here instead of assuming all layers
that look at the repo argument do the right thing.

>> If a command cannot be converted to stop using `the_repository` right
>> now we should skip it and revisit once all prerequisites have been
>> adapted accordingly.

Looks like it’d be preferable if I just drop this patch from the series as
it will require a larger refactor.

Thanks
John

>
> That is also a viable approach.

[v2,3/4] apply: remove the_repository global variable

Commit Message

Comments

Patch