Message ID | 512abaef-d71c-9308-6a62-f37b4de69e60@web.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | merge-recursive: use fspathcmp() in path_hashmap_cmp() | expand |
Hi René, On Sat, Aug 28, 2021 at 11:30:49PM +0200, René Scharfe wrote: > Call fspathcmp() instead of open-coding it. This shortens the code and > makes it less repetitive. > > Signed-off-by: René Scharfe <l.s.r@web.de> > --- > merge-recursive.c | 5 +---- > 1 file changed, 1 insertion(+), 4 deletions(-) > > diff --git a/merge-recursive.c b/merge-recursive.c > index 3355d50e8a..840599fd53 100644 > --- a/merge-recursive.c > +++ b/merge-recursive.c > @@ -55,10 +55,7 @@ static int path_hashmap_cmp(const void *cmp_data, > a = container_of(eptr, const struct path_hashmap_entry, e); > b = container_of(entry_or_key, const struct path_hashmap_entry, e); > > - if (ignore_case) > - return strcasecmp(a->path, key ? key : b->path); > - else > - return strcmp(a->path, key ? key : b->path); > + return fspathcmp(a->path, key ? key : b->path); > } Looks obviously right to me. I found another spot in t/helper/test-hashmap.c:test_entry_cmp() that could be cleaned up in the same way. But this looks fine with or without the following diff: diff --git a/t/helper/test-hashmap.c b/t/helper/test-hashmap.c index 36ff07bd4b..ab34bdfecd 100644 --- a/t/helper/test-hashmap.c +++ b/t/helper/test-hashmap.c @@ -28,10 +28,7 @@ static int test_entry_cmp(const void *cmp_data, e1 = container_of(eptr, const struct test_entry, ent); e2 = container_of(entry_or_key, const struct test_entry, ent); - if (ignore_case) - return strcasecmp(e1->key, key ? key : e2->key); - else - return strcmp(e1->key, key ? key : e2->key); + return fspathcmp(e1->key, key ? key : e2->key); } static struct test_entry *alloc_test_entry(unsigned int hash,
On Sun, Aug 29, 2021 at 04:21:21PM -0400, Taylor Blau wrote: > > +++ b/merge-recursive.c > > @@ -55,10 +55,7 @@ static int path_hashmap_cmp(const void *cmp_data, > [...] > > Looks obviously right to me. I found another spot in > t/helper/test-hashmap.c:test_entry_cmp() that could be cleaned up in the > same way. But this looks fine with or without the following diff: > > diff --git a/t/helper/test-hashmap.c b/t/helper/test-hashmap.c > index 36ff07bd4b..ab34bdfecd 100644 > --- a/t/helper/test-hashmap.c > +++ b/t/helper/test-hashmap.c > @@ -28,10 +28,7 @@ static int test_entry_cmp(const void *cmp_data, > e1 = container_of(eptr, const struct test_entry, ent); > e2 = container_of(entry_or_key, const struct test_entry, ent); > > - if (ignore_case) > - return strcasecmp(e1->key, key ? key : e2->key); > - else > - return strcmp(e1->key, key ? key : e2->key); > + return fspathcmp(e1->key, key ? key : e2->key); > } > > static struct test_entry *alloc_test_entry(unsigned int hash, Maybe also: diff --git a/dir.c b/dir.c index 03c4d21267..ee46290cbb 100644 --- a/dir.c +++ b/dir.c @@ -669,9 +669,7 @@ int pl_hashmap_cmp(const void *unused_cmp_data, ? ee1->patternlen : ee2->patternlen; - if (ignore_case) - return strncasecmp(ee1->pattern, ee2->pattern, min_len); - return strncmp(ee1->pattern, ee2->pattern, min_len); + return fspathncmp(ee1->pattern, ee2->pattern, min_len); } static char *dup_and_filter_pattern(const char *pattern) This is fun. :) -Peff
Jeff King <peff@peff.net> writes: >> - if (ignore_case) >> - return strcasecmp(e1->key, key ? key : e2->key); >> - else >> - return strcmp(e1->key, key ? key : e2->key); >> + return fspathcmp(e1->key, key ? key : e2->key); >> } >> >> static struct test_entry *alloc_test_entry(unsigned int hash, > > Maybe also: > > diff --git a/dir.c b/dir.c > index 03c4d21267..ee46290cbb 100644 > --- a/dir.c > +++ b/dir.c > @@ -669,9 +669,7 @@ int pl_hashmap_cmp(const void *unused_cmp_data, > ? ee1->patternlen > : ee2->patternlen; > > - if (ignore_case) > - return strncasecmp(ee1->pattern, ee2->pattern, min_len); > - return strncmp(ee1->pattern, ee2->pattern, min_len); > + return fspathncmp(ee1->pattern, ee2->pattern, min_len); > } > > static char *dup_and_filter_pattern(const char *pattern) > > This is fun. :) Yes. So we found three of them in the existing code. A Coccinelle rule may be an overkill, I guess.
Am 29.08.21 um 22:21 schrieb Taylor Blau: > Hi René, > > On Sat, Aug 28, 2021 at 11:30:49PM +0200, René Scharfe wrote: >> Call fspathcmp() instead of open-coding it. This shortens the code and >> makes it less repetitive. >> >> Signed-off-by: René Scharfe <l.s.r@web.de> >> --- >> merge-recursive.c | 5 +---- >> 1 file changed, 1 insertion(+), 4 deletions(-) >> >> diff --git a/merge-recursive.c b/merge-recursive.c >> index 3355d50e8a..840599fd53 100644 >> --- a/merge-recursive.c >> +++ b/merge-recursive.c >> @@ -55,10 +55,7 @@ static int path_hashmap_cmp(const void *cmp_data, >> a = container_of(eptr, const struct path_hashmap_entry, e); >> b = container_of(entry_or_key, const struct path_hashmap_entry, e); >> >> - if (ignore_case) >> - return strcasecmp(a->path, key ? key : b->path); >> - else >> - return strcmp(a->path, key ? key : b->path); >> + return fspathcmp(a->path, key ? key : b->path); >> } > > Looks obviously right to me. I found another spot in > t/helper/test-hashmap.c:test_entry_cmp() that could be cleaned up in the > same way. But this looks fine with or without the following diff: > > diff --git a/t/helper/test-hashmap.c b/t/helper/test-hashmap.c > index 36ff07bd4b..ab34bdfecd 100644 > --- a/t/helper/test-hashmap.c > +++ b/t/helper/test-hashmap.c > @@ -28,10 +28,7 @@ static int test_entry_cmp(const void *cmp_data, > e1 = container_of(eptr, const struct test_entry, ent); > e2 = container_of(entry_or_key, const struct test_entry, ent); > > - if (ignore_case) > - return strcasecmp(e1->key, key ? key : e2->key); > - else > - return strcmp(e1->key, key ? key : e2->key); > + return fspathcmp(e1->key, key ? key : e2->key); > } > > static struct test_entry *alloc_test_entry(unsigned int hash, > That's a local variable named "ignore_case", not the one declared in environment.c that fspathcmp() uses, so this would change the behavior. The helper code does not include cache.h, so this is not even a case of variable shadowing, just two different variables for similar purposes in different places having the same name. René
Taylor Blau <me@ttaylorr.com> writes: > Looks obviously right to me. I found another spot in > t/helper/test-hashmap.c:test_entry_cmp() that could be cleaned up in the > same way. But this looks fine with or without the following diff: > diff --git a/t/helper/test-hashmap.c b/t/helper/test-hashmap.c > index 36ff07bd4b..ab34bdfecd 100644 > --- a/t/helper/test-hashmap.c > +++ b/t/helper/test-hashmap.c > @@ -28,10 +28,7 @@ static int test_entry_cmp(const void *cmp_data, > e1 = container_of(eptr, const struct test_entry, ent); > e2 = container_of(entry_or_key, const struct test_entry, ent); > > - if (ignore_case) > - return strcasecmp(e1->key, key ? key : e2->key); > - else > - return strcmp(e1->key, key ? key : e2->key); > + return fspathcmp(e1->key, key ? key : e2->key); Sorry but I think this patch is wrong. Before the precontext of the patch, there is a local variable decl for ignore_case---the existing code looks at ignore_case that is different from the global ignore_case fspathcmp() looks at. Admittedly, it was probably not an excellent idea to give a name so bland and unremarkable, 'ignore_case', to a global that affects so many code paths in the system. But the variable is already very established that renaming it would not contribute to improving the code at all. It however may not be a bad idea to catch these code paths where a local variable masks 'ignore_case' (and possibly other globals) and rename these local ones to avoid a mistake like this. Thanks.
On Mon, Aug 30, 2021 at 05:09:34PM +0200, René Scharfe wrote: > > diff --git a/t/helper/test-hashmap.c b/t/helper/test-hashmap.c > > index 36ff07bd4b..ab34bdfecd 100644 > > --- a/t/helper/test-hashmap.c > > +++ b/t/helper/test-hashmap.c > > @@ -28,10 +28,7 @@ static int test_entry_cmp(const void *cmp_data, > > e1 = container_of(eptr, const struct test_entry, ent); > > e2 = container_of(entry_or_key, const struct test_entry, ent); > > > > - if (ignore_case) > > - return strcasecmp(e1->key, key ? key : e2->key); > > - else > > - return strcmp(e1->key, key ? key : e2->key); > > + return fspathcmp(e1->key, key ? key : e2->key); > > } > > > > static struct test_entry *alloc_test_entry(unsigned int hash, > > > > That's a local variable named "ignore_case", not the one declared in > environment.c that fspathcmp() uses, so this would change the behavior. > The helper code does not include cache.h, so this is not even a case of > variable shadowing, just two different variables for similar purposes > in different places having the same name. Yikes, good catch. Perhaps it's overkill, but I wonder if a comment like: /* * Do not use fspathcmp() here; our behavior depends on the local * ignore_case variable, not the usual Git-wide global. */ would help. I double-checked the spot I suggested. I think it is actually using the global (though I got it right through sheer luck). -Peff
Am 30.08.21 um 18:55 schrieb Junio C Hamano: > Taylor Blau <me@ttaylorr.com> writes: > >> Looks obviously right to me. I found another spot in >> t/helper/test-hashmap.c:test_entry_cmp() that could be cleaned up in the >> same way. But this looks fine with or without the following diff: > >> diff --git a/t/helper/test-hashmap.c b/t/helper/test-hashmap.c >> index 36ff07bd4b..ab34bdfecd 100644 >> --- a/t/helper/test-hashmap.c >> +++ b/t/helper/test-hashmap.c >> @@ -28,10 +28,7 @@ static int test_entry_cmp(const void *cmp_data, >> e1 = container_of(eptr, const struct test_entry, ent); >> e2 = container_of(entry_or_key, const struct test_entry, ent); >> >> - if (ignore_case) >> - return strcasecmp(e1->key, key ? key : e2->key); >> - else >> - return strcmp(e1->key, key ? key : e2->key); >> + return fspathcmp(e1->key, key ? key : e2->key); > > Sorry but I think this patch is wrong. Before the precontext of the > patch, there is a local variable decl for ignore_case---the existing > code looks at ignore_case that is different from the global > ignore_case fspathcmp() looks at. > > Admittedly, it was probably not an excellent idea to give a name so > bland and unremarkable, 'ignore_case', to a global that affects so > many code paths in the system. But the variable is already very > established that renaming it would not contribute to improving the > code at all. > > It however may not be a bad idea to catch these code paths where a > local variable masks 'ignore_case' (and possibly other globals) and > rename these local ones to avoid a mistake like this. The name itself is OK, I think, but using it at global scope is confusing. -Wshadow can help find such cases, but not this one, as test-hashmap.c doesn't include the global declaration. Moving the global into a struct to provide a poor man's namespace would fix this for all namesakes, assisted by the compiler. We'd then access it as the_config.ignore_case or even the_config.core.ignore_case. Moving all config-related variables would be quite noisy, I guess, and probably conflict with lots of in-flight patches, but might be worth it. René
On Mon, Aug 30, 2021 at 08:22:25PM +0200, René Scharfe wrote: > > It however may not be a bad idea to catch these code paths where a > > local variable masks 'ignore_case' (and possibly other globals) and > > rename these local ones to avoid a mistake like this. > > The name itself is OK, I think, but using it at global scope is > confusing. -Wshadow can help find such cases, but not this one, as > test-hashmap.c doesn't include the global declaration. Moving the > global into a struct to provide a poor man's namespace would fix this > for all namesakes, assisted by the compiler. We'd then access it as > the_config.ignore_case or even the_config.core.ignore_case. > > Moving all config-related variables would be quite noisy, I guess, > and probably conflict with lots of in-flight patches, but might be > worth it. Really most of these ought to be in the repository struct anyway, I would think. The value of ignore_case comes from core.ignorecase, which is going to be repository-specific. We are probably doing the wrong thing already by looking at the parent core.ignorecase value when operating in an in-process submodule, but nobody noticed because it's quite unlikely for a submodule to have a different setting than the parent. -Peff
Am 30.08.21 um 22:49 schrieb Jeff King: > On Mon, Aug 30, 2021 at 08:22:25PM +0200, René Scharfe wrote: > >>> It however may not be a bad idea to catch these code paths where a >>> local variable masks 'ignore_case' (and possibly other globals) and >>> rename these local ones to avoid a mistake like this. >> >> The name itself is OK, I think, but using it at global scope is >> confusing. -Wshadow can help find such cases, but not this one, as >> test-hashmap.c doesn't include the global declaration. Moving the >> global into a struct to provide a poor man's namespace would fix this >> for all namesakes, assisted by the compiler. We'd then access it as >> the_config.ignore_case or even the_config.core.ignore_case. >> >> Moving all config-related variables would be quite noisy, I guess, >> and probably conflict with lots of in-flight patches, but might be >> worth it. > > Really most of these ought to be in the repository struct anyway, I > would think. The value of ignore_case comes from core.ignorecase, which > is going to be repository-specific. We are probably doing the wrong > thing already by looking at the parent core.ignorecase value when > operating in an in-process submodule, but nobody noticed because it's > quite unlikely for a submodule to have a different setting than the > parent. Good point. So fspathcmp() and friends would need a repo parameter. :-| René
Hi René, On Sat, 11 Sep 2021, René Scharfe wrote: > Am 30.08.21 um 22:49 schrieb Jeff King: > > On Mon, Aug 30, 2021 at 08:22:25PM +0200, René Scharfe wrote: > > > >>> It however may not be a bad idea to catch these code paths where a > >>> local variable masks 'ignore_case' (and possibly other globals) and > >>> rename these local ones to avoid a mistake like this. > >> > >> The name itself is OK, I think, but using it at global scope is > >> confusing. -Wshadow can help find such cases, but not this one, as > >> test-hashmap.c doesn't include the global declaration. Moving the > >> global into a struct to provide a poor man's namespace would fix this > >> for all namesakes, assisted by the compiler. We'd then access it as > >> the_config.ignore_case or even the_config.core.ignore_case. > >> > >> Moving all config-related variables would be quite noisy, I guess, > >> and probably conflict with lots of in-flight patches, but might be > >> worth it. > > > > Really most of these ought to be in the repository struct anyway, I > > would think. The value of ignore_case comes from core.ignorecase, which > > is going to be repository-specific. We are probably doing the wrong > > thing already by looking at the parent core.ignorecase value when > > operating in an in-process submodule, but nobody noticed because it's > > quite unlikely for a submodule to have a different setting than the > > parent. > > Good point. So fspathcmp() and friends would need a repo parameter. :-| Yes, we will eventually have to pass `struct repository *r` into a _lot_ of call chains. It'll be a disruptive change, yet if the submodule folks truly want to aim for in-process recursive treatment of submodules, there is no alternative. FWIW on Windows there are other potentially repository-specific settings that are relevant in similar situations. For example, there is `core.symlinks`. Ciao, Dscho
On Mon, Sep 13, 2021 at 01:37:48PM +0200, Johannes Schindelin wrote: > > Good point. So fspathcmp() and friends would need a repo parameter. :-| > > Yes, we will eventually have to pass `struct repository *r` into a _lot_ > of call chains. It'll be a disruptive change, yet if the submodule folks > truly want to aim for in-process recursive treatment of submodules, there > is no alternative. > > FWIW on Windows there are other potentially repository-specific settings > that are relevant in similar situations. For example, there is > `core.symlinks`. Another approach is to stuff the appropriate globals into the repository struct, and then "push" onto the global the_repository pointer, treating it like a stack. And then low-level code is free to use that global context, even if it wasn't passed in. That helps the primary use case of "now I need to do something in a sub-module, but I'd like to do it in-process". But it's not without challenges: - code which acts at the boundary of a submodule and a superproject may be more awkward (since only one of them can be "the current repository" at a time). - it's a challenge with threading (an obvious problem would be a multi-threaded grep which wanted to descend into a submodule). Using a thread-local global for the_repository might solve that. It's possible that this is a terrible direction to go, so I'm not necessarily endorsing it, but just offering it as a possibility to think about. The trickiest thing is that any devil would likely be in the details, and we wouldn't know until proceeding for a while along that path. Whereas passing around a context struct, while verbose and annoying, is a well-understood construct. -Peff
Jeff King <peff@peff.net> writes: > Another approach is to stuff the appropriate globals into the repository > struct, and then "push" onto the global the_repository pointer, treating > it like a stack. And then low-level code is free to use that global > context, even if it wasn't passed in. > ... > - it's a challenge with threading (an obvious problem would be a > multi-threaded grep which wanted to descend into a submodule). Using > a thread-local global for the_repository might solve that. > > It's possible that this is a terrible direction to go, so I'm not > necessarily endorsing it, but just offering it as a possibility to think > about. The trickiest thing is that any devil would likely be in the > details, and we wouldn't know until proceeding for a while along that > path. Whereas passing around a context struct, while verbose and > annoying, is a well-understood construct. I agree that it is cute, thread-unsafe, and tricky. Let's leave it at an interesting thought experiment for now. Thanks.
Hi Peff, On Mon, 13 Sep 2021, Jeff King wrote: > On Mon, Sep 13, 2021 at 01:37:48PM +0200, Johannes Schindelin wrote: > > > > Good point. So fspathcmp() and friends would need a repo parameter. :-| > > > > Yes, we will eventually have to pass `struct repository *r` into a _lot_ > > of call chains. It'll be a disruptive change, yet if the submodule folks > > truly want to aim for in-process recursive treatment of submodules, there > > is no alternative. > > > > FWIW on Windows there are other potentially repository-specific settings > > that are relevant in similar situations. For example, there is > > `core.symlinks`. > > Another approach is to stuff the appropriate globals into the repository > struct, and then "push" onto the global the_repository pointer, treating > it like a stack. And then low-level code is free to use that global > context, even if it wasn't passed in. > > That helps the primary use case of "now I need to do something in a > sub-module, but I'd like to do it in-process". But it's not without > challenges: > > - code which acts at the boundary of a submodule and a superproject > may be more awkward (since only one of them can be "the current > repository" at a time). > > - it's a challenge with threading (an obvious problem would be a > multi-threaded grep which wanted to descend into a submodule). Using > a thread-local global for the_repository might solve that. > > It's possible that this is a terrible direction to go, so I'm not > necessarily endorsing it, but just offering it as a possibility to think > about. The trickiest thing is that any devil would likely be in the > details, and we wouldn't know until proceeding for a while along that > path. Whereas passing around a context struct, while verbose and > annoying, is a well-understood construct. I would not so far as to call it a terrible direction. It is definitely worth a thought or two. At the end of the day, I fear that it is too tricky in practice, though. Seeing as there seems to be some appetite for refactoring Git's code on this list, I am thinking that the `struct repository *r` direction might be the one to go for. And I mean like "move the globals into that struct" as opposed to introducing that stack you talked about. It would even be a refactoring where I would understand the motivation, and agree with it, too. Ciao, Dscho
On Tue, Sep 14, 2021 at 12:18:35PM +0200, Johannes Schindelin wrote: > Seeing as there seems to be some appetite for refactoring Git's code on > this list, I am thinking that the `struct repository *r` direction might > be the one to go for. And I mean like "move the globals into that struct" > as opposed to introducing that stack you talked about. It would even be a > refactoring where I would understand the motivation, and agree with it, > too. Oh, definitely. Regardless of whether step 2 is "pass around the repository struct" or "treat the global repository struct as a stack", step 1 must be putting repository-related globals into the struct. I don't think there can be any solution that doesn't start with that. :) And I think it can even be done incrementally with very little impact. Just s/ignore_case/the_repository->ignore_case/ in the use-sites is an improvement over the status quo. Even though it doesn't _fix_ anything, now we can easily see where the dependencies on repo-variables are. And of course follow-on steps to make sure we are passing around and accessing the right repository struct are then welcome. -Peff
diff --git a/merge-recursive.c b/merge-recursive.c index 3355d50e8a..840599fd53 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -55,10 +55,7 @@ static int path_hashmap_cmp(const void *cmp_data, a = container_of(eptr, const struct path_hashmap_entry, e); b = container_of(entry_or_key, const struct path_hashmap_entry, e); - if (ignore_case) - return strcasecmp(a->path, key ? key : b->path); - else - return strcmp(a->path, key ? key : b->path); + return fspathcmp(a->path, key ? key : b->path); } /*
Call fspathcmp() instead of open-coding it. This shortens the code and makes it less repetitive. Signed-off-by: René Scharfe <l.s.r@web.de> --- merge-recursive.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) -- 2.33.0