diff mbox series

[02/12] name-hash: add index_dir_exists2()

Message ID 3464545fe3feceb08408618c77a70cc95745bfe9.1707857541.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series FSMonitor edge cases on case-insensitive file systems | expand

Commit Message

Jeff Hostetler Feb. 13, 2024, 8:52 p.m. UTC
From: Jeff Hostetler <jeffhostetler@github.com>

Create a new version of index_dir_exists() to return the canonical
spelling of the matched directory prefix.

The existing index_dir_exists() returns a boolean to indicate if
there is a case-insensitive match in the directory name-hash, but
it doesn't tell the caller the exact spelling of that match.

The new version also copies the matched spelling to a provided strbuf.
This lets the caller, for example, then call index_name_pos() with the
correct case to search the cache-entry array for the real insertion
position.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
---
 name-hash.c | 16 ++++++++++++++++
 name-hash.h |  2 ++
 2 files changed, 18 insertions(+)

Comments

Junio C Hamano Feb. 13, 2024, 9:43 p.m. UTC | #1
"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhostetler@github.com>
>
> Create a new version of index_dir_exists() to return the canonical
> spelling of the matched directory prefix.
>
> The existing index_dir_exists() returns a boolean to indicate if
> there is a case-insensitive match in the directory name-hash, but
> it doesn't tell the caller the exact spelling of that match.
>
> The new version also copies the matched spelling to a provided strbuf.
> This lets the caller, for example, then call index_name_pos() with the
> correct case to search the cache-entry array for the real insertion
> position.
>
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  name-hash.c | 16 ++++++++++++++++
>  name-hash.h |  2 ++
>  2 files changed, 18 insertions(+)
>
> diff --git a/name-hash.c b/name-hash.c
> index 251f036eef6..d735c81acb3 100644
> --- a/name-hash.c
> +++ b/name-hash.c
> @@ -694,6 +694,22 @@ int index_dir_exists(struct index_state *istate, const char *name, int namelen)
>  	dir = find_dir_entry(istate, name, namelen);
>  	return dir && dir->nr;
>  }
> +int index_dir_exists2(struct index_state *istate, const char *name, int namelen,
> +		      struct strbuf *canonical_path)
> +{
> +	struct dir_entry *dir;
> +
> +	strbuf_init(canonical_path, namelen+1);
> +
> +	lazy_init_name_hash(istate);
> +	expand_to_path(istate, name, namelen, 0);
> +	dir = find_dir_entry(istate, name, namelen);
> +
> +	if (dir && dir->nr)
> +		strbuf_add(canonical_path, dir->name, dir->namelen);
> +
> +	return dir && dir->nr;
> +}
>  
>  void adjust_dirname_case(struct index_state *istate, char *name)

Missing inter-function blank line, before the new function.

I wonder if we can avoid such repetition---the body of
index_dir_exists() is 100% shared with this new function.

Isn't it extremely unusual to receive "struct strbuf *" and call
strbuf_init() on it?  It means that the caller is expected to have a
strbuf and pass a pointer to it, but also it is expected to leave
the strbuf uninitialized.

I'd understand if it calls strbuf_reset(), but it may not even be
necessary, if we make it responsibility of the caller to pass a
valid strbuf to be appended into.

	int index_dir_find(struct index_state *istate,
			   const char *name, int namelen,
			   struct strbuf *canonical_path)
	{
                struct dir_entry *dir;

                lazy_init_name_hash(istate);
                expand_to_path(istate, name, namelen, 0);
                dir = find_dir_entry(istate, name, namelen);

                if (canonical_path && dir && dir->nr) {
			// strbuf_reset(canonical_path); ???
                	strbuf_add(canonical_path, dir->name, dir->namelen);
		}
                return dir && dir->nr;
	}

Then we can do

	#define index_dir_exists(i, n, l) index_dir_find((i), (n), (l), NULL)

in the header for existing callers.

>  {
> diff --git a/name-hash.h b/name-hash.h
> index b1b4b0fb337..2fcac5c4870 100644
> --- a/name-hash.h
> +++ b/name-hash.h
> @@ -5,6 +5,8 @@ struct cache_entry;
>  struct index_state;
>  
>  int index_dir_exists(struct index_state *istate, const char *name, int namelen);
> +int index_dir_exists2(struct index_state *istate, const char *name, int namelen,
> +		      struct strbuf *canonical_path);
>  void adjust_dirname_case(struct index_state *istate, char *name);
>  struct cache_entry *index_file_exists(struct index_state *istate, const char *name, int namelen, int igncase);
Patrick Steinhardt Feb. 15, 2024, 9:31 a.m. UTC | #2
On Tue, Feb 13, 2024 at 08:52:11PM +0000, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <jeffhostetler@github.com>
> 
> Create a new version of index_dir_exists() to return the canonical
> spelling of the matched directory prefix.
> 
> The existing index_dir_exists() returns a boolean to indicate if
> there is a case-insensitive match in the directory name-hash, but
> it doesn't tell the caller the exact spelling of that match.
> 
> The new version also copies the matched spelling to a provided strbuf.
> This lets the caller, for example, then call index_name_pos() with the
> correct case to search the cache-entry array for the real insertion
> position.
> 
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  name-hash.c | 16 ++++++++++++++++
>  name-hash.h |  2 ++
>  2 files changed, 18 insertions(+)
> 
> diff --git a/name-hash.c b/name-hash.c
> index 251f036eef6..d735c81acb3 100644
> --- a/name-hash.c
> +++ b/name-hash.c
> @@ -694,6 +694,22 @@ int index_dir_exists(struct index_state *istate, const char *name, int namelen)
>  	dir = find_dir_entry(istate, name, namelen);
>  	return dir && dir->nr;
>  }
> +int index_dir_exists2(struct index_state *istate, const char *name, int namelen,
> +		      struct strbuf *canonical_path)
> +{
> +	struct dir_entry *dir;
> +
> +	strbuf_init(canonical_path, namelen+1);

Missing spaces: `namelen + 1`.

> +	lazy_init_name_hash(istate);
> +	expand_to_path(istate, name, namelen, 0);
> +	dir = find_dir_entry(istate, name, namelen);
> +
> +	if (dir && dir->nr)
> +		strbuf_add(canonical_path, dir->name, dir->namelen);
> +
> +	return dir && dir->nr;
> +}

Can we maybe give this function a more descriptive name?
`index_dir_exists2()` doesn't give the reader any indicator what is
different about it compared to `index_dir_exists()`. How about
`index_dir_exists_with_canonical()`?

>  void adjust_dirname_case(struct index_state *istate, char *name)
>  {
> diff --git a/name-hash.h b/name-hash.h
> index b1b4b0fb337..2fcac5c4870 100644
> --- a/name-hash.h
> +++ b/name-hash.h
> @@ -5,6 +5,8 @@ struct cache_entry;
>  struct index_state;
>  
>  int index_dir_exists(struct index_state *istate, const char *name, int namelen);
> +int index_dir_exists2(struct index_state *istate, const char *name, int namelen,
> +		      struct strbuf *canonical_path);

It would also be great to add comments here that explain what those
functions do and what the difference between them is.

Patrick

>  void adjust_dirname_case(struct index_state *istate, char *name);
>  struct cache_entry *index_file_exists(struct index_state *istate, const char *name, int namelen, int igncase);
>  
> -- 
> gitgitgadget
> 
>
Jeff Hostetler Feb. 20, 2024, 5:38 p.m. UTC | #3
On 2/13/24 4:43 PM, Junio C Hamano wrote:
> "Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Jeff Hostetler <jeffhostetler@github.com>
>>
>> Create a new version of index_dir_exists() to return the canonical
>> spelling of the matched directory prefix.
>>
>> The existing index_dir_exists() returns a boolean to indicate if
>> there is a case-insensitive match in the directory name-hash, but
>> it doesn't tell the caller the exact spelling of that match.
>>
>> The new version also copies the matched spelling to a provided strbuf.
>> This lets the caller, for example, then call index_name_pos() with the
>> correct case to search the cache-entry array for the real insertion
>> position.
>>
>> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
>> ---
>>   name-hash.c | 16 ++++++++++++++++
>>   name-hash.h |  2 ++
>>   2 files changed, 18 insertions(+)
>>
>> diff --git a/name-hash.c b/name-hash.c
>> index 251f036eef6..d735c81acb3 100644
>> --- a/name-hash.c
>> +++ b/name-hash.c
>> @@ -694,6 +694,22 @@ int index_dir_exists(struct index_state *istate, const char *name, int namelen)
>>   	dir = find_dir_entry(istate, name, namelen);
>>   	return dir && dir->nr;
>>   }
>> +int index_dir_exists2(struct index_state *istate, const char *name, int namelen,
>> +		      struct strbuf *canonical_path)
>> +{
>> +	struct dir_entry *dir;
>> +
>> +	strbuf_init(canonical_path, namelen+1);
>> +
>> +	lazy_init_name_hash(istate);
>> +	expand_to_path(istate, name, namelen, 0);
>> +	dir = find_dir_entry(istate, name, namelen);
>> +
>> +	if (dir && dir->nr)
>> +		strbuf_add(canonical_path, dir->name, dir->namelen);
>> +
>> +	return dir && dir->nr;
>> +}
>>   
>>   void adjust_dirname_case(struct index_state *istate, char *name)
> 
> Missing inter-function blank line, before the new function.
> 
> I wonder if we can avoid such repetition---the body of
> index_dir_exists() is 100% shared with this new function.
> 
> Isn't it extremely unusual to receive "struct strbuf *" and call
> strbuf_init() on it?  It means that the caller is expected to have a
> strbuf and pass a pointer to it, but also it is expected to leave
> the strbuf uninitialized.
> 
> I'd understand if it calls strbuf_reset(), but it may not even be
> necessary, if we make it responsibility of the caller to pass a
> valid strbuf to be appended into.
> 
> 	int index_dir_find(struct index_state *istate,
> 			   const char *name, int namelen,
> 			   struct strbuf *canonical_path)
> 	{
>                  struct dir_entry *dir;
> 
>                  lazy_init_name_hash(istate);
>                  expand_to_path(istate, name, namelen, 0);
>                  dir = find_dir_entry(istate, name, namelen);
> 
>                  if (canonical_path && dir && dir->nr) {
> 			// strbuf_reset(canonical_path); ???
>                  	strbuf_add(canonical_path, dir->name, dir->namelen);
> 		}
>                  return dir && dir->nr;
> 	}
> 
> Then we can do
> 
> 	#define index_dir_exists(i, n, l) index_dir_find((i), (n), (l), NULL)
> 
> in the header for existing callers.
> 

I'm always a little hesitant to change the signature of an existing
function and chasing all of the callers in the middle of another
task.  It can sometimes be distracting to reviewers.

I like your macro approach here. I'll do that in the next version.

Thanks
Jeff
Junio C Hamano Feb. 20, 2024, 7:34 p.m. UTC | #4
Jeff Hostetler <git@jeffhostetler.com> writes:

> I'm always a little hesitant to change the signature of an existing
> function and chasing all of the callers in the middle of another
> task.  It can sometimes be distracting to reviewers.

Of course we all should be hesitant.  In addition to reviewers,
there are topics in flight and topics people are cooking but not
posted that will be affected.  So it is perfectly fine to introduce
an enhanced version as needed under different name (but let's not
give it a meaningless name like "foo2" where it is totally unclear
and unexplained what its difference from "foo" is from the name),
but if it is meant as an enhanced version, we should aim to share
the code and rewrite the original in terms of the enhanced one,
instead of simply duplicating to risk unnecessary divergence of the
two functions in the future.

Thanks.
diff mbox series

Patch

diff --git a/name-hash.c b/name-hash.c
index 251f036eef6..d735c81acb3 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -694,6 +694,22 @@  int index_dir_exists(struct index_state *istate, const char *name, int namelen)
 	dir = find_dir_entry(istate, name, namelen);
 	return dir && dir->nr;
 }
+int index_dir_exists2(struct index_state *istate, const char *name, int namelen,
+		      struct strbuf *canonical_path)
+{
+	struct dir_entry *dir;
+
+	strbuf_init(canonical_path, namelen+1);
+
+	lazy_init_name_hash(istate);
+	expand_to_path(istate, name, namelen, 0);
+	dir = find_dir_entry(istate, name, namelen);
+
+	if (dir && dir->nr)
+		strbuf_add(canonical_path, dir->name, dir->namelen);
+
+	return dir && dir->nr;
+}
 
 void adjust_dirname_case(struct index_state *istate, char *name)
 {
diff --git a/name-hash.h b/name-hash.h
index b1b4b0fb337..2fcac5c4870 100644
--- a/name-hash.h
+++ b/name-hash.h
@@ -5,6 +5,8 @@  struct cache_entry;
 struct index_state;
 
 int index_dir_exists(struct index_state *istate, const char *name, int namelen);
+int index_dir_exists2(struct index_state *istate, const char *name, int namelen,
+		      struct strbuf *canonical_path);
 void adjust_dirname_case(struct index_state *istate, char *name);
 struct cache_entry *index_file_exists(struct index_state *istate, const char *name, int namelen, int igncase);