diff mbox series

merge-recursive: use fspathcmp() in path_hashmap_cmp()

Message ID 512abaef-d71c-9308-6a62-f37b4de69e60@web.de (mailing list archive)
State New, archived
Headers show
Series merge-recursive: use fspathcmp() in path_hashmap_cmp() | expand

Commit Message

René Scharfe Aug. 28, 2021, 9:30 p.m. UTC
Call fspathcmp() instead of open-coding it.  This shortens the code and
makes it less repetitive.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 merge-recursive.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

--
2.33.0

Comments

Taylor Blau Aug. 29, 2021, 8:21 p.m. UTC | #1
Hi René,

On Sat, Aug 28, 2021 at 11:30:49PM +0200, René Scharfe wrote:
> Call fspathcmp() instead of open-coding it.  This shortens the code and
> makes it less repetitive.
>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  merge-recursive.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/merge-recursive.c b/merge-recursive.c
> index 3355d50e8a..840599fd53 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -55,10 +55,7 @@ static int path_hashmap_cmp(const void *cmp_data,
>  	a = container_of(eptr, const struct path_hashmap_entry, e);
>  	b = container_of(entry_or_key, const struct path_hashmap_entry, e);
>
> -	if (ignore_case)
> -		return strcasecmp(a->path, key ? key : b->path);
> -	else
> -		return strcmp(a->path, key ? key : b->path);
> +	return fspathcmp(a->path, key ? key : b->path);
>  }

Looks obviously right to me. I found another spot in
t/helper/test-hashmap.c:test_entry_cmp() that could be cleaned up in the
same way. But this looks fine with or without the following diff:

diff --git a/t/helper/test-hashmap.c b/t/helper/test-hashmap.c
index 36ff07bd4b..ab34bdfecd 100644
--- a/t/helper/test-hashmap.c
+++ b/t/helper/test-hashmap.c
@@ -28,10 +28,7 @@ static int test_entry_cmp(const void *cmp_data,
        e1 = container_of(eptr, const struct test_entry, ent);
        e2 = container_of(entry_or_key, const struct test_entry, ent);

-       if (ignore_case)
-               return strcasecmp(e1->key, key ? key : e2->key);
-       else
-               return strcmp(e1->key, key ? key : e2->key);
+       return fspathcmp(e1->key, key ? key : e2->key);
 }

 static struct test_entry *alloc_test_entry(unsigned int hash,
Jeff King Aug. 29, 2021, 9 p.m. UTC | #2
On Sun, Aug 29, 2021 at 04:21:21PM -0400, Taylor Blau wrote:

> > +++ b/merge-recursive.c
> > @@ -55,10 +55,7 @@ static int path_hashmap_cmp(const void *cmp_data,
> [...]
> 
> Looks obviously right to me. I found another spot in
> t/helper/test-hashmap.c:test_entry_cmp() that could be cleaned up in the
> same way. But this looks fine with or without the following diff:
> 
> diff --git a/t/helper/test-hashmap.c b/t/helper/test-hashmap.c
> index 36ff07bd4b..ab34bdfecd 100644
> --- a/t/helper/test-hashmap.c
> +++ b/t/helper/test-hashmap.c
> @@ -28,10 +28,7 @@ static int test_entry_cmp(const void *cmp_data,
>         e1 = container_of(eptr, const struct test_entry, ent);
>         e2 = container_of(entry_or_key, const struct test_entry, ent);
> 
> -       if (ignore_case)
> -               return strcasecmp(e1->key, key ? key : e2->key);
> -       else
> -               return strcmp(e1->key, key ? key : e2->key);
> +       return fspathcmp(e1->key, key ? key : e2->key);
>  }
> 
>  static struct test_entry *alloc_test_entry(unsigned int hash,

Maybe also:

diff --git a/dir.c b/dir.c
index 03c4d21267..ee46290cbb 100644
--- a/dir.c
+++ b/dir.c
@@ -669,9 +669,7 @@ int pl_hashmap_cmp(const void *unused_cmp_data,
 			 ? ee1->patternlen
 			 : ee2->patternlen;
 
-	if (ignore_case)
-		return strncasecmp(ee1->pattern, ee2->pattern, min_len);
-	return strncmp(ee1->pattern, ee2->pattern, min_len);
+	return fspathncmp(ee1->pattern, ee2->pattern, min_len);
 }
 
 static char *dup_and_filter_pattern(const char *pattern)

This is fun. :)

-Peff
Junio C Hamano Aug. 30, 2021, 12:10 a.m. UTC | #3
Jeff King <peff@peff.net> writes:

>> -       if (ignore_case)
>> -               return strcasecmp(e1->key, key ? key : e2->key);
>> -       else
>> -               return strcmp(e1->key, key ? key : e2->key);
>> +       return fspathcmp(e1->key, key ? key : e2->key);
>>  }
>> 
>>  static struct test_entry *alloc_test_entry(unsigned int hash,
>
> Maybe also:
>
> diff --git a/dir.c b/dir.c
> index 03c4d21267..ee46290cbb 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -669,9 +669,7 @@ int pl_hashmap_cmp(const void *unused_cmp_data,
>  			 ? ee1->patternlen
>  			 : ee2->patternlen;
>  
> -	if (ignore_case)
> -		return strncasecmp(ee1->pattern, ee2->pattern, min_len);
> -	return strncmp(ee1->pattern, ee2->pattern, min_len);
> +	return fspathncmp(ee1->pattern, ee2->pattern, min_len);
>  }
>  
>  static char *dup_and_filter_pattern(const char *pattern)
>
> This is fun. :)

Yes.

So we found three of them in the existing code.  A Coccinelle rule
may be an overkill, I guess.
René Scharfe Aug. 30, 2021, 3:09 p.m. UTC | #4
Am 29.08.21 um 22:21 schrieb Taylor Blau:
> Hi René,
>
> On Sat, Aug 28, 2021 at 11:30:49PM +0200, René Scharfe wrote:
>> Call fspathcmp() instead of open-coding it.  This shortens the code and
>> makes it less repetitive.
>>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>>  merge-recursive.c | 5 +----
>>  1 file changed, 1 insertion(+), 4 deletions(-)
>>
>> diff --git a/merge-recursive.c b/merge-recursive.c
>> index 3355d50e8a..840599fd53 100644
>> --- a/merge-recursive.c
>> +++ b/merge-recursive.c
>> @@ -55,10 +55,7 @@ static int path_hashmap_cmp(const void *cmp_data,
>>  	a = container_of(eptr, const struct path_hashmap_entry, e);
>>  	b = container_of(entry_or_key, const struct path_hashmap_entry, e);
>>
>> -	if (ignore_case)
>> -		return strcasecmp(a->path, key ? key : b->path);
>> -	else
>> -		return strcmp(a->path, key ? key : b->path);
>> +	return fspathcmp(a->path, key ? key : b->path);
>>  }
>
> Looks obviously right to me. I found another spot in
> t/helper/test-hashmap.c:test_entry_cmp() that could be cleaned up in the
> same way. But this looks fine with or without the following diff:
>
> diff --git a/t/helper/test-hashmap.c b/t/helper/test-hashmap.c
> index 36ff07bd4b..ab34bdfecd 100644
> --- a/t/helper/test-hashmap.c
> +++ b/t/helper/test-hashmap.c
> @@ -28,10 +28,7 @@ static int test_entry_cmp(const void *cmp_data,
>         e1 = container_of(eptr, const struct test_entry, ent);
>         e2 = container_of(entry_or_key, const struct test_entry, ent);
>
> -       if (ignore_case)
> -               return strcasecmp(e1->key, key ? key : e2->key);
> -       else
> -               return strcmp(e1->key, key ? key : e2->key);
> +       return fspathcmp(e1->key, key ? key : e2->key);
>  }
>
>  static struct test_entry *alloc_test_entry(unsigned int hash,
>

That's a local variable named "ignore_case", not the one declared in
environment.c that fspathcmp() uses, so this would change the behavior.
The helper code does not include cache.h, so this is not even a case of
variable shadowing, just two different variables for similar purposes
in different places having the same name.

René
Junio C Hamano Aug. 30, 2021, 4:55 p.m. UTC | #5
Taylor Blau <me@ttaylorr.com> writes:

> Looks obviously right to me. I found another spot in
> t/helper/test-hashmap.c:test_entry_cmp() that could be cleaned up in the
> same way. But this looks fine with or without the following diff:

> diff --git a/t/helper/test-hashmap.c b/t/helper/test-hashmap.c
> index 36ff07bd4b..ab34bdfecd 100644
> --- a/t/helper/test-hashmap.c
> +++ b/t/helper/test-hashmap.c
> @@ -28,10 +28,7 @@ static int test_entry_cmp(const void *cmp_data,
>         e1 = container_of(eptr, const struct test_entry, ent);
>         e2 = container_of(entry_or_key, const struct test_entry, ent);
>
> -       if (ignore_case)
> -               return strcasecmp(e1->key, key ? key : e2->key);
> -       else
> -               return strcmp(e1->key, key ? key : e2->key);
> +       return fspathcmp(e1->key, key ? key : e2->key);

Sorry but I think this patch is wrong.  Before the precontext of the
patch, there is a local variable decl for ignore_case---the existing
code looks at ignore_case that is different from the global
ignore_case fspathcmp() looks at.

Admittedly, it was probably not an excellent idea to give a name so
bland and unremarkable, 'ignore_case', to a global that affects so
many code paths in the system.  But the variable is already very
established that renaming it would not contribute to improving the
code at all.

It however may not be a bad idea to catch these code paths where a
local variable masks 'ignore_case' (and possibly other globals) and
rename these local ones to avoid a mistake like this.

Thanks.
Jeff King Aug. 30, 2021, 6:19 p.m. UTC | #6
On Mon, Aug 30, 2021 at 05:09:34PM +0200, René Scharfe wrote:

> > diff --git a/t/helper/test-hashmap.c b/t/helper/test-hashmap.c
> > index 36ff07bd4b..ab34bdfecd 100644
> > --- a/t/helper/test-hashmap.c
> > +++ b/t/helper/test-hashmap.c
> > @@ -28,10 +28,7 @@ static int test_entry_cmp(const void *cmp_data,
> >         e1 = container_of(eptr, const struct test_entry, ent);
> >         e2 = container_of(entry_or_key, const struct test_entry, ent);
> >
> > -       if (ignore_case)
> > -               return strcasecmp(e1->key, key ? key : e2->key);
> > -       else
> > -               return strcmp(e1->key, key ? key : e2->key);
> > +       return fspathcmp(e1->key, key ? key : e2->key);
> >  }
> >
> >  static struct test_entry *alloc_test_entry(unsigned int hash,
> >
> 
> That's a local variable named "ignore_case", not the one declared in
> environment.c that fspathcmp() uses, so this would change the behavior.
> The helper code does not include cache.h, so this is not even a case of
> variable shadowing, just two different variables for similar purposes
> in different places having the same name.

Yikes, good catch. Perhaps it's overkill, but I wonder if a comment
like:

  /*
   * Do not use fspathcmp() here; our behavior depends on the local
   * ignore_case variable, not the usual Git-wide global.
   */

would help.

I double-checked the spot I suggested. I think it is actually using the
global (though I got it right through sheer luck).

-Peff
René Scharfe Aug. 30, 2021, 6:22 p.m. UTC | #7
Am 30.08.21 um 18:55 schrieb Junio C Hamano:
> Taylor Blau <me@ttaylorr.com> writes:
>
>> Looks obviously right to me. I found another spot in
>> t/helper/test-hashmap.c:test_entry_cmp() that could be cleaned up in the
>> same way. But this looks fine with or without the following diff:
>
>> diff --git a/t/helper/test-hashmap.c b/t/helper/test-hashmap.c
>> index 36ff07bd4b..ab34bdfecd 100644
>> --- a/t/helper/test-hashmap.c
>> +++ b/t/helper/test-hashmap.c
>> @@ -28,10 +28,7 @@ static int test_entry_cmp(const void *cmp_data,
>>         e1 = container_of(eptr, const struct test_entry, ent);
>>         e2 = container_of(entry_or_key, const struct test_entry, ent);
>>
>> -       if (ignore_case)
>> -               return strcasecmp(e1->key, key ? key : e2->key);
>> -       else
>> -               return strcmp(e1->key, key ? key : e2->key);
>> +       return fspathcmp(e1->key, key ? key : e2->key);
>
> Sorry but I think this patch is wrong.  Before the precontext of the
> patch, there is a local variable decl for ignore_case---the existing
> code looks at ignore_case that is different from the global
> ignore_case fspathcmp() looks at.
>
> Admittedly, it was probably not an excellent idea to give a name so
> bland and unremarkable, 'ignore_case', to a global that affects so
> many code paths in the system.  But the variable is already very
> established that renaming it would not contribute to improving the
> code at all.
>
> It however may not be a bad idea to catch these code paths where a
> local variable masks 'ignore_case' (and possibly other globals) and
> rename these local ones to avoid a mistake like this.

The name itself is OK, I think, but using it at global scope is
confusing.  -Wshadow can help find such cases, but not this one, as
test-hashmap.c doesn't include the global declaration.  Moving the
global into a struct to provide a poor man's namespace would fix this
for all namesakes, assisted by the compiler.  We'd then access it as
the_config.ignore_case or even the_config.core.ignore_case.

Moving all config-related variables would be quite noisy, I guess,
and probably conflict with lots of in-flight patches, but might be
worth it.

René
Jeff King Aug. 30, 2021, 8:49 p.m. UTC | #8
On Mon, Aug 30, 2021 at 08:22:25PM +0200, René Scharfe wrote:

> > It however may not be a bad idea to catch these code paths where a
> > local variable masks 'ignore_case' (and possibly other globals) and
> > rename these local ones to avoid a mistake like this.
> 
> The name itself is OK, I think, but using it at global scope is
> confusing.  -Wshadow can help find such cases, but not this one, as
> test-hashmap.c doesn't include the global declaration.  Moving the
> global into a struct to provide a poor man's namespace would fix this
> for all namesakes, assisted by the compiler.  We'd then access it as
> the_config.ignore_case or even the_config.core.ignore_case.
> 
> Moving all config-related variables would be quite noisy, I guess,
> and probably conflict with lots of in-flight patches, but might be
> worth it.

Really most of these ought to be in the repository struct anyway, I
would think. The value of ignore_case comes from core.ignorecase, which
is going to be repository-specific. We are probably doing the wrong
thing already by looking at the parent core.ignorecase value when
operating in an in-process submodule, but nobody noticed because it's
quite unlikely for a submodule to have a different setting than the
parent.

-Peff
René Scharfe Sept. 11, 2021, 4:08 p.m. UTC | #9
Am 30.08.21 um 22:49 schrieb Jeff King:
> On Mon, Aug 30, 2021 at 08:22:25PM +0200, René Scharfe wrote:
>
>>> It however may not be a bad idea to catch these code paths where a
>>> local variable masks 'ignore_case' (and possibly other globals) and
>>> rename these local ones to avoid a mistake like this.
>>
>> The name itself is OK, I think, but using it at global scope is
>> confusing.  -Wshadow can help find such cases, but not this one, as
>> test-hashmap.c doesn't include the global declaration.  Moving the
>> global into a struct to provide a poor man's namespace would fix this
>> for all namesakes, assisted by the compiler.  We'd then access it as
>> the_config.ignore_case or even the_config.core.ignore_case.
>>
>> Moving all config-related variables would be quite noisy, I guess,
>> and probably conflict with lots of in-flight patches, but might be
>> worth it.
>
> Really most of these ought to be in the repository struct anyway, I
> would think. The value of ignore_case comes from core.ignorecase, which
> is going to be repository-specific. We are probably doing the wrong
> thing already by looking at the parent core.ignorecase value when
> operating in an in-process submodule, but nobody noticed because it's
> quite unlikely for a submodule to have a different setting than the
> parent.

Good point.  So fspathcmp() and friends would need a repo parameter. :-|

René
Johannes Schindelin Sept. 13, 2021, 11:37 a.m. UTC | #10
Hi René,

On Sat, 11 Sep 2021, René Scharfe wrote:

> Am 30.08.21 um 22:49 schrieb Jeff King:
> > On Mon, Aug 30, 2021 at 08:22:25PM +0200, René Scharfe wrote:
> >
> >>> It however may not be a bad idea to catch these code paths where a
> >>> local variable masks 'ignore_case' (and possibly other globals) and
> >>> rename these local ones to avoid a mistake like this.
> >>
> >> The name itself is OK, I think, but using it at global scope is
> >> confusing.  -Wshadow can help find such cases, but not this one, as
> >> test-hashmap.c doesn't include the global declaration.  Moving the
> >> global into a struct to provide a poor man's namespace would fix this
> >> for all namesakes, assisted by the compiler.  We'd then access it as
> >> the_config.ignore_case or even the_config.core.ignore_case.
> >>
> >> Moving all config-related variables would be quite noisy, I guess,
> >> and probably conflict with lots of in-flight patches, but might be
> >> worth it.
> >
> > Really most of these ought to be in the repository struct anyway, I
> > would think. The value of ignore_case comes from core.ignorecase, which
> > is going to be repository-specific. We are probably doing the wrong
> > thing already by looking at the parent core.ignorecase value when
> > operating in an in-process submodule, but nobody noticed because it's
> > quite unlikely for a submodule to have a different setting than the
> > parent.
>
> Good point.  So fspathcmp() and friends would need a repo parameter. :-|

Yes, we will eventually have to pass `struct repository *r` into a _lot_
of call chains. It'll be a disruptive change, yet if the submodule folks
truly want to aim for in-process recursive treatment of submodules, there
is no alternative.

FWIW on Windows there are other potentially repository-specific settings
that are relevant in similar situations. For example, there is
`core.symlinks`.

Ciao,
Dscho
Jeff King Sept. 13, 2021, 5:09 p.m. UTC | #11
On Mon, Sep 13, 2021 at 01:37:48PM +0200, Johannes Schindelin wrote:

> > Good point.  So fspathcmp() and friends would need a repo parameter. :-|
> 
> Yes, we will eventually have to pass `struct repository *r` into a _lot_
> of call chains. It'll be a disruptive change, yet if the submodule folks
> truly want to aim for in-process recursive treatment of submodules, there
> is no alternative.
> 
> FWIW on Windows there are other potentially repository-specific settings
> that are relevant in similar situations. For example, there is
> `core.symlinks`.

Another approach is to stuff the appropriate globals into the repository
struct, and then "push" onto the global the_repository pointer, treating
it like a stack. And then low-level code is free to use that global
context, even if it wasn't passed in.

That helps the primary use case of "now I need to do something in a
sub-module, but I'd like to do it in-process". But it's not without
challenges:

  - code which acts at the boundary of a submodule and a superproject
    may be more awkward (since only one of them can be "the current
    repository" at a time).

  - it's a challenge with threading (an obvious problem would be a
    multi-threaded grep which wanted to descend into a submodule). Using
    a thread-local global for the_repository might solve that.

It's possible that this is a terrible direction to go, so I'm not
necessarily endorsing it, but just offering it as a possibility to think
about. The trickiest thing is that any devil would likely be in the
details, and we wouldn't know until proceeding for a while along that
path. Whereas passing around a context struct, while verbose and
annoying, is a well-understood construct.

-Peff
Junio C Hamano Sept. 13, 2021, 7:58 p.m. UTC | #12
Jeff King <peff@peff.net> writes:

> Another approach is to stuff the appropriate globals into the repository
> struct, and then "push" onto the global the_repository pointer, treating
> it like a stack. And then low-level code is free to use that global
> context, even if it wasn't passed in.
> ...
>   - it's a challenge with threading (an obvious problem would be a
>     multi-threaded grep which wanted to descend into a submodule). Using
>     a thread-local global for the_repository might solve that.
>
> It's possible that this is a terrible direction to go, so I'm not
> necessarily endorsing it, but just offering it as a possibility to think
> about. The trickiest thing is that any devil would likely be in the
> details, and we wouldn't know until proceeding for a while along that
> path. Whereas passing around a context struct, while verbose and
> annoying, is a well-understood construct.

I agree that it is cute, thread-unsafe, and tricky.  Let's leave it
at an interesting thought experiment for now.

Thanks.
Johannes Schindelin Sept. 14, 2021, 10:18 a.m. UTC | #13
Hi Peff,

On Mon, 13 Sep 2021, Jeff King wrote:

> On Mon, Sep 13, 2021 at 01:37:48PM +0200, Johannes Schindelin wrote:
>
> > > Good point.  So fspathcmp() and friends would need a repo parameter. :-|
> >
> > Yes, we will eventually have to pass `struct repository *r` into a _lot_
> > of call chains. It'll be a disruptive change, yet if the submodule folks
> > truly want to aim for in-process recursive treatment of submodules, there
> > is no alternative.
> >
> > FWIW on Windows there are other potentially repository-specific settings
> > that are relevant in similar situations. For example, there is
> > `core.symlinks`.
>
> Another approach is to stuff the appropriate globals into the repository
> struct, and then "push" onto the global the_repository pointer, treating
> it like a stack. And then low-level code is free to use that global
> context, even if it wasn't passed in.
>
> That helps the primary use case of "now I need to do something in a
> sub-module, but I'd like to do it in-process". But it's not without
> challenges:
>
>   - code which acts at the boundary of a submodule and a superproject
>     may be more awkward (since only one of them can be "the current
>     repository" at a time).
>
>   - it's a challenge with threading (an obvious problem would be a
>     multi-threaded grep which wanted to descend into a submodule). Using
>     a thread-local global for the_repository might solve that.
>
> It's possible that this is a terrible direction to go, so I'm not
> necessarily endorsing it, but just offering it as a possibility to think
> about. The trickiest thing is that any devil would likely be in the
> details, and we wouldn't know until proceeding for a while along that
> path. Whereas passing around a context struct, while verbose and
> annoying, is a well-understood construct.

I would not so far as to call it a terrible direction. It is definitely
worth a thought or two.

At the end of the day, I fear that it is too tricky in practice, though.

Seeing as there seems to be some appetite for refactoring Git's code on
this list, I am thinking that the `struct repository *r` direction might
be the one to go for. And I mean like "move the globals into that struct"
as opposed to introducing that stack you talked about. It would even be a
refactoring where I would understand the motivation, and agree with it,
too.

Ciao,
Dscho
Jeff King Sept. 14, 2021, 2:11 p.m. UTC | #14
On Tue, Sep 14, 2021 at 12:18:35PM +0200, Johannes Schindelin wrote:

> Seeing as there seems to be some appetite for refactoring Git's code on
> this list, I am thinking that the `struct repository *r` direction might
> be the one to go for. And I mean like "move the globals into that struct"
> as opposed to introducing that stack you talked about. It would even be a
> refactoring where I would understand the motivation, and agree with it,
> too.

Oh, definitely. Regardless of whether step 2 is "pass around the
repository struct" or "treat the global repository struct as a stack",
step 1 must be putting repository-related globals into the struct. I
don't think there can be any solution that doesn't start with that. :)

And I think it can even be done incrementally with very little impact.
Just s/ignore_case/the_repository->ignore_case/ in the use-sites is an
improvement over the status quo. Even though it doesn't _fix_ anything,
now we can easily see where the dependencies on repo-variables are. And
of course follow-on steps to make sure we are passing around and
accessing the right repository struct are then welcome.

-Peff
diff mbox series

Patch

diff --git a/merge-recursive.c b/merge-recursive.c
index 3355d50e8a..840599fd53 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -55,10 +55,7 @@  static int path_hashmap_cmp(const void *cmp_data,
 	a = container_of(eptr, const struct path_hashmap_entry, e);
 	b = container_of(entry_or_key, const struct path_hashmap_entry, e);

-	if (ignore_case)
-		return strcasecmp(a->path, key ? key : b->path);
-	else
-		return strcmp(a->path, key ? key : b->path);
+	return fspathcmp(a->path, key ? key : b->path);
 }

 /*