
Re*: [PATCH v3] fetch: replace string-list used as a look-up table with a hashmap

Message ID xmqqk1m3c2dz.fsf_-_@gitster-ct.c.googlers.com (mailing list archive)
State New, archived

Commit Message

Junio C Hamano Oct. 27, 2018, 6:47 a.m. UTC
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Just one thing^W^Wa couple of things:
>
> It would probably make more sense to use `hashmap_get_from_hash()` and
> `strhash()` here (and `strhash()` should probably be used everywhere
> instead of `memhash(str, strlen(str))`).

hashmap_get_from_hash() certainly is much better suited for simpler
usage pattern like these callsites, and the ones in sequencer.c.  It
is a shame that a more complex variant takes the shorter-and-sweeter
name hashmap_get().

I wish we named the latter hashmap_get_fullblown_feature_rich() and
called the _from_hash() thing a simple hashmap_get() from day one,
but it is way too late.

I looked briefly at the users of the _get() variant, and some of their
uses are legitimately not-simple and cannot be reduced to use the
simpler _get_from_hash variant, it seems.  But others like those in
builtin/difftool.c should be straight-forward to convert to use the
simpler get_from_hash variant.  It could be a low-hanging fruit left
for later clean-up, perhaps.
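
For readers following along, the relationship between the two lookups can be sketched with a toy table. This is not Git's hashmap.h: `toy_map`, `toy_entry`, `toy_get()`, `toy_get_from_hash()` and `toy_ref` are hypothetical stand-ins, and the fixed bucket count is assumed only to keep the sketch short. The point is that the _from_hash style of lookup builds the throwaway key entry itself, so the callsite shrinks to a single call.

```c
#include <stddef.h>
#include <string.h>

#define TOY_BUCKETS 16

/* Hypothetical stand-in for struct hashmap_entry: chain link plus hash. */
struct toy_entry {
	struct toy_entry *next;
	unsigned int hash;
};

/* Returns 0 when the stored entry matches the key (or the keydata). */
typedef int (*toy_cmp_fn)(const struct toy_entry *a,
			  const struct toy_entry *b, const void *keydata);

struct toy_map {
	struct toy_entry *bucket[TOY_BUCKETS];
	toy_cmp_fn cmp;
};

static void toy_add(struct toy_map *map, struct toy_entry *e)
{
	struct toy_entry **head = &map->bucket[e->hash % TOY_BUCKETS];

	e->next = *head;
	*head = e;
}

/* The "full" lookup: the caller supplies an initialized key entry. */
static void *toy_get(const struct toy_map *map, const struct toy_entry *key,
		     const void *keydata)
{
	struct toy_entry *e = map->bucket[key->hash % TOY_BUCKETS];

	for (; e; e = e->next)
		if (e->hash == key->hash && !map->cmp(e, key, keydata))
			return e;
	return NULL;
}

/* The "simple" lookup: builds the key entry on the caller's behalf. */
static void *toy_get_from_hash(const struct toy_map *map, unsigned int hash,
			       const void *keydata)
{
	struct toy_entry key = { NULL, hash };

	return toy_get(map, &key, keydata);
}

/* An entry type and comparison function to exercise the table. */
struct toy_ref {
	struct toy_entry ent; /* must be the first member */
	const char *name;
};

static int toy_ref_cmp(const struct toy_entry *a, const struct toy_entry *b,
		       const void *keydata)
{
	const struct toy_ref *ra = (const struct toy_ref *)a;

	(void)b; /* every lookup in this sketch passes keydata */
	return strcmp(ra->name, (const char *)keydata);
}
```

With something of this shape, a callsite needs neither a local key variable nor an explicit init call, which is the simplification being discussed for the callsites in builtin/fetch.c.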

>> @@ -271,10 +319,10 @@ static void find_non_local_tags(const struct ref *refs,
>>  			    !has_object_file_with_flags(&ref->old_oid,
>>  							OBJECT_INFO_QUICK) &&
>>  			    !will_fetch(head, ref->old_oid.hash) &&
>> -			    !has_sha1_file_with_flags(item->util,
>> +			    !has_sha1_file_with_flags(item->oid.hash,
>
> I am not sure that we need to test for null OIDs here, given that...
> ...
> Of course, `has_sha1_file_with_flags()` is supposed to return `false` for
> null OIDs, I guess.

Yup.  An alternative is to make item->oid a pointer to oid, not an
oid object itself, so that we can express "no OID for this ref" in a
more explicit way, but is_null_oid() is already used as "no OID" in
many other codepaths, so...
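
A minimal sketch of the "all-zero object name means no OID" convention, assuming 20-byte SHA-1 sized names. `toy_oid`, `toy_is_null_oid()`, `toy_ref_item` and `toy_ignore()` are hypothetical names for illustration, not Git's own definitions; the idea is only that clearing the embedded value doubles as the "ignore this item" marker, so no pointer or extra flag is needed.

```c
#include <string.h>

#define TOY_RAWSZ 20 /* assumed: SHA-1 sized object names */

struct toy_oid {
	unsigned char hash[TOY_RAWSZ];
};

/* Non-zero when every byte of the object name is zero. */
static int toy_is_null_oid(const struct toy_oid *oid)
{
	static const struct toy_oid null_oid = { { 0 } };

	return !memcmp(oid->hash, null_oid.hash, TOY_RAWSZ);
}

struct toy_ref_item {
	struct toy_oid oid; /* all-zero means "no OID for this ref" */
	const char *refname;
};

/* Mark an item as ignored by clearing its embedded object name. */
static void toy_ignore(struct toy_ref_item *item)
{
	memset(&item->oid, 0, sizeof(item->oid));
}
```

The cost of this design is that the all-zero value can never be a real object name, which is the trade-off already accepted by the other codepaths that use is_null_oid() this way.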

>> +	for_each_string_list_item(remote_ref_item, &remote_refs_list) {
>> +		const char *refname = remote_ref_item->string;
>> +		struct hashmap_entry key;
>> +
>> +		hashmap_entry_init(&key, memhash(refname, strlen(refname)));
>> +		item = hashmap_get(&remote_refs, &key, refname);
>> +		if (!item)
>> +			continue; /* can this happen??? */
>
> This would indicate a BUG, no?

Possibly.  Alternatively, we can just use item without checking and
let the runtime segfault.

Here is an incremental on top that can be squashed in to turn v3
into v4.
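
The incremental swaps `memhash(refname, strlen(refname))` for `strhash(refname)`. That is only safe if both produce the same value for the same string, since entries added with one must be found with the other. The sketch below shows why that can hold, using a 32-bit FNV-1 style hash. `toy_memhash()` and `toy_strhash()` are hypothetical re-implementations written for this illustration; the constants are the standard FNV-1 parameters, assumed here rather than copied from Git's hashmap.c.

```c
#include <stddef.h>
#include <string.h>

/* Standard 32-bit FNV-1 parameters, assumed for illustration. */
#define TOY_FNV32_BASE  0x811c9dc5u
#define TOY_FNV32_PRIME 0x01000193u

/* Hash exactly len bytes starting at buf. */
static unsigned int toy_memhash(const void *buf, size_t len)
{
	const unsigned char *p = (const unsigned char *)buf;
	unsigned int hash = TOY_FNV32_BASE;

	while (len--)
		hash = (hash * TOY_FNV32_PRIME) ^ *p++;
	return hash;
}

/* Hash bytes up to, but not including, the terminating NUL. */
static unsigned int toy_strhash(const char *str)
{
	const unsigned char *p = (const unsigned char *)str;
	unsigned int hash = TOY_FNV32_BASE;

	while (*p)
		hash = (hash * TOY_FNV32_PRIME) ^ *p++;
	return hash;
}
```

Both loops visit the same bytes in the same order, so the string variant simply saves the caller a separate strlen() pass.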

Comments

Johannes Schindelin Oct. 31, 2018, 2:50 p.m. UTC | #1
Hi Junio,

On Sat, 27 Oct 2018, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > Just one thing^W^Wa couple of things:
> >
> > It would probably make more sense to use `hashmap_get_from_hash()` and
> > `strhash()` here (and `strhash()` should probably be used everywhere
> > instead of `memhash(str, strlen(str))`).
> 
> hashmap_get_from_hash() certainly is much better suited for simpler
> usage pattern like these callsites, and the ones in sequencer.c.  It
> is a shame that a more complex variant takes the shorter-and-sweeter
> name hashmap_get().

I agree, at least in part.

From what I understand, hashmap_get_from_hash() needs a little assistance
from the comparison function with which the hashmap is configured, see
e.g. this function in the sequencer:

	static int labels_cmp(const void *fndata, const struct labels_entry *a,
			      const struct labels_entry *b, const void *key)
	{
		return key ? strcmp(a->label, key) : strcmp(a->label, b->label);
	}

See how that first tests whether `key` is non-`NULL`, and then takes a
shortcut, not even looking at `b`? This is important, because `b` does not
refer to a complete `labels_entry` when we call `hashmap_get_from_hash()`.
It only refers to a `hashmap_entry`. Looking at `b->label` would access
some random memory, and most certainly do the wrong thing.
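
To make that concrete, here is a small self-contained sketch of the same convention outside of Git. `toy_entry`, `toy_label`, `toy_label_cmp()` and `toy_matches()` are hypothetical names, and `toy_matches()` only mimics what I understand a _from_hash() style lookup to hand to the comparison function: a bare entry carrying nothing but the hash, plus the key string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct hashmap_entry. */
struct toy_entry {
	unsigned int hash;
};

struct toy_label {
	struct toy_entry ent; /* must be the first member */
	char label[32];
};

/*
 * Same shape as labels_cmp() above: when keydata is given, compare
 * against it and never touch b, which may be a bare toy_entry.
 */
static int toy_label_cmp(const struct toy_label *a,
			 const struct toy_label *b, const void *keydata)
{
	return keydata ? strcmp(a->label, (const char *)keydata)
		       : strcmp(a->label, b->label);
}

/* Returns 1 when the stored entry matches the given hash and key string. */
static int toy_matches(const struct toy_label *stored, unsigned int hash,
		       const char *keydata)
{
	struct toy_entry key = { hash }; /* no label storage behind it */

	if (stored->ent.hash != key.hash)
		return 0;
	/* The cast is only safe because keydata is non-NULL here. */
	return !toy_label_cmp(stored, (const struct toy_label *)&key, keydata);
}
```

Passing a NULL keydata through `toy_matches()` would send the comparison down the `b->label` branch and read past the end of `key`, which is exactly the failure mode described above.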

> I wish we named the latter hashmap_get_fullblown_feature_rich() and
> called the _from_hash() thing a simple hashmap_get() from day one,
> but it is way too late.
> 
> I looked briefly at the users of the _get() variant, and some of their
> uses are legitimately not-simple and cannot be reduced to use the
> simpler _get_from_hash variant, it seems.  But others like those in
> builtin/difftool.c should be straight-forward to convert to use the
> simpler get_from_hash variant.  It could be a low-hanging fruit left
> for later clean-up, perhaps.

Right. #leftoverbits

> >> @@ -271,10 +319,10 @@ static void find_non_local_tags(const struct ref *refs,
> >>  			    !has_object_file_with_flags(&ref->old_oid,
> >>  							OBJECT_INFO_QUICK) &&
> >>  			    !will_fetch(head, ref->old_oid.hash) &&
> >> -			    !has_sha1_file_with_flags(item->util,
> >> +			    !has_sha1_file_with_flags(item->oid.hash,
> >
> > I am not sure that we need to test for null OIDs here, given that...
> > ...
> > Of course, `has_sha1_file_with_flags()` is supposed to return `false` for
> > null OIDs, I guess.
> 
> Yup.  An alternative is to make item->oid a pointer to oid, not an
> oid object itself, so that we can express "no OID for this ref" in a
> more explicit way, but is_null_oid() is already used as "no OID" in
> many other codepaths, so...

Right, and it would complicate the code. So I am fine with your version of
it.

> >> +	for_each_string_list_item(remote_ref_item, &remote_refs_list) {
> >> +		const char *refname = remote_ref_item->string;
> >> +		struct hashmap_entry key;
> >> +
> >> +		hashmap_entry_init(&key, memhash(refname, strlen(refname)));
> >> +		item = hashmap_get(&remote_refs, &key, refname);
> >> +		if (!item)
> >> +			continue; /* can this happen??? */
> >
> > This would indicate a BUG, no?
> 
> Possibly.  Alternatively, we can just use item without checking and
> let the runtime segfault.

Hahaha! Yep. We could also cause a crash. I do prefer the BUG() call.

> Here is an incremental on top that can be squashed in to turn v3
> into v4.

Nice.

Thanks!
Dscho

> 
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 0f8e333022..aee1d9bf21 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -259,7 +259,7 @@ static struct refname_hash_entry *refname_hash_add(struct hashmap *map,
>  	size_t len = strlen(refname);
>  
>  	FLEX_ALLOC_MEM(ent, refname, refname, len);
> -	hashmap_entry_init(ent, memhash(refname, len));
> +	hashmap_entry_init(ent, strhash(refname));
>  	oidcpy(&ent->oid, oid);
>  	hashmap_add(map, ent);
>  	return ent;
> @@ -282,11 +282,7 @@ static void refname_hash_init(struct hashmap *map)
>  
>  static int refname_hash_exists(struct hashmap *map, const char *refname)
>  {
> -	struct hashmap_entry key;
> -	size_t len = strlen(refname);
> -	hashmap_entry_init(&key, memhash(refname, len));
> -
> -	return !!hashmap_get(map, &key, refname);
> +	return !!hashmap_get_from_hash(map, strhash(refname), refname);
>  }
>  
>  static void find_non_local_tags(const struct ref *refs,
> @@ -365,12 +361,10 @@ static void find_non_local_tags(const struct ref *refs,
>  	 */
>  	for_each_string_list_item(remote_ref_item, &remote_refs_list) {
>  		const char *refname = remote_ref_item->string;
> -		struct hashmap_entry key;
>  
> -		hashmap_entry_init(&key, memhash(refname, strlen(refname)));
> -		item = hashmap_get(&remote_refs, &key, refname);
> +		item = hashmap_get_from_hash(&remote_refs, strhash(refname), refname);
>  		if (!item)
> -			continue; /* can this happen??? */
> +			BUG("unseen remote ref?");
>  
>  		/* Unless we have already decided to ignore this item... */
>  		if (!is_null_oid(&item->oid)) {
> @@ -497,12 +491,12 @@ static struct ref *get_ref_map(struct remote *remote,
>  
>  	for (rm = ref_map; rm; rm = rm->next) {
>  		if (rm->peer_ref) {
> -			struct hashmap_entry key;
>  			const char *refname = rm->peer_ref->name;
>  			struct refname_hash_entry *peer_item;
>  
> -			hashmap_entry_init(&key, memhash(refname, strlen(refname)));
> -			peer_item = hashmap_get(&existing_refs, &key, refname);
> +			peer_item = hashmap_get_from_hash(&existing_refs,
> +							  strhash(refname),
> +							  refname);
>  			if (peer_item) {
>  				struct object_id *old_oid = &peer_item->oid;
>  				oidcpy(&rm->peer_ref->old_oid, old_oid);
>

Patch

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 0f8e333022..aee1d9bf21 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -259,7 +259,7 @@  static struct refname_hash_entry *refname_hash_add(struct hashmap *map,
 	size_t len = strlen(refname);
 
 	FLEX_ALLOC_MEM(ent, refname, refname, len);
-	hashmap_entry_init(ent, memhash(refname, len));
+	hashmap_entry_init(ent, strhash(refname));
 	oidcpy(&ent->oid, oid);
 	hashmap_add(map, ent);
 	return ent;
@@ -282,11 +282,7 @@  static void refname_hash_init(struct hashmap *map)
 
 static int refname_hash_exists(struct hashmap *map, const char *refname)
 {
-	struct hashmap_entry key;
-	size_t len = strlen(refname);
-	hashmap_entry_init(&key, memhash(refname, len));
-
-	return !!hashmap_get(map, &key, refname);
+	return !!hashmap_get_from_hash(map, strhash(refname), refname);
 }
 
 static void find_non_local_tags(const struct ref *refs,
@@ -365,12 +361,10 @@  static void find_non_local_tags(const struct ref *refs,
 	 */
 	for_each_string_list_item(remote_ref_item, &remote_refs_list) {
 		const char *refname = remote_ref_item->string;
-		struct hashmap_entry key;
 
-		hashmap_entry_init(&key, memhash(refname, strlen(refname)));
-		item = hashmap_get(&remote_refs, &key, refname);
+		item = hashmap_get_from_hash(&remote_refs, strhash(refname), refname);
 		if (!item)
-			continue; /* can this happen??? */
+			BUG("unseen remote ref?");
 
 		/* Unless we have already decided to ignore this item... */
 		if (!is_null_oid(&item->oid)) {
@@ -497,12 +491,12 @@  static struct ref *get_ref_map(struct remote *remote,
 
 	for (rm = ref_map; rm; rm = rm->next) {
 		if (rm->peer_ref) {
-			struct hashmap_entry key;
 			const char *refname = rm->peer_ref->name;
 			struct refname_hash_entry *peer_item;
 
-			hashmap_entry_init(&key, memhash(refname, strlen(refname)));
-			peer_item = hashmap_get(&existing_refs, &key, refname);
+			peer_item = hashmap_get_from_hash(&existing_refs,
+							  strhash(refname),
+							  refname);
 			if (peer_item) {
 				struct object_id *old_oid = &peer_item->oid;
 				oidcpy(&rm->peer_ref->old_oid, old_oid);