Message ID | a86fd5fdcc47baf85046eb07257f4db9f9498084.1598035949.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Add struct strmap and associated utility functions | expand |
On Fri, Aug 21, 2020 at 06:52:26PM +0000, Elijah Newren via GitGitGadget wrote: > From: Elijah Newren <newren@gmail.com> > > Add strmap as a new struct and associated utility functions, > specifically for hashmaps that map strings to some value. The API is > taken directly from Peff's proposal at > https://lore.kernel.org/git/20180906191203.GA26184@sigill.intra.peff.net/ Uh oh. You are encouraging me in the belief that I can send half-baked ideas to the list and somebody will come along and implement them for me. ;) > Peff only included the header, not the implementation, so it isn't clear what > the structure was he was going to use for the hash entries. Instead of having > my str_entry struct have three subfields (the hashmap_entry, the string, and > the void* value), I made it only have two -- the hashmap_entry and a > string_list_item, for two reasons: I'd probably have done: struct strmap_entry { struct hashmap_entry ent; void *value; char key[FLEX_ALLOC]; }; That saves 8 bytes (plus malloc overhead)per item, plus avoids an extra pointer-chase for each item we consider when looking up. > 1) a strmap is often the data structure we want where string_list has > been used in the past. Using the same building block for > individual entries in both makes it easier to adopt and reuse > parts of the string_list API in strmap. I can see that there might be some value in being able to interchange the items for code that's expecting a string_list_item. But I have to wonder if the potential for confusion is worth it. I.e., should that code really be expecting a raw string pointer (possibly with a separate void pointer, or even better an actual typed pointer). I'll keep an eye out as I read the rest of the series for code which uses this. > 2) In some cases, after doing lots of other work, I want to be able > to iterate over the items in my strmap in sorted order. hashmap > obviously doesn't support that, but I wanted to be able to export > the strmap to a string_list easily and then use its functions. > (Note: I do not need the data structure to both be sorted and have > efficient lookup at all times. If I did, I might use a B-tree > instead, as suggested by brian in response to Peff in the thread > noted above. In my case, most strmaps will never need sorting, but > in one special case at the very end of a bunch of other work I want > to iterate over the items in sorted order without doing any more > lookups afterward.) Hmm. Likewise, I'll keep an eye open for how this works in practice. I do suspect that a B-tree might be a better solution here, but implementing it is non-trivial, and most callers don't care about this property. If the interim solution is to just dump it to a string_list and sort that, that's really not that bad, assuming it just happens once after we've added everything. I'm not sure there's that big a benefit to using string_list_item internally, since presumably that conversion needs to write a whole new array of string_list_items anyway. > Also, I removed the STRMAP_INIT macro, since it cannot be used to > correctly initialize a strmap; the underlying hashmap needs a call to > hashmap_init() to allocate the hash table first. Since access to the underlying hashmap happens through strmap functions, they could lazily initialize it. That's how oidmap works. > diff --git a/strmap.c b/strmap.c > new file mode 100644 > index 0000000000..1c9fdb3b1e > --- /dev/null > +++ b/strmap.c > @@ -0,0 +1,81 @@ > +#include "git-compat-util.h" > +#include "strmap.h" > + > +static int cmp_str_entry(const void *hashmap_cmp_fn_data, > + const struct hashmap_entry *entry1, > + const struct hashmap_entry *entry2, > + const void *keydata) > +{ > + const struct str_entry *e1, *e2; > + > + e1 = container_of(entry1, const struct str_entry, ent); > + e2 = container_of(entry2, const struct str_entry, ent); > + return strcmp(e1->item.string, e2->item.string); > +} If you do go the FLEX_ALLOC route, obviously lookups won't want to allocate a str_entry struct for the lookup key. You'd use keydata there (and prefer it over looking at entry2 at all). See remotes_hash_cmp() for an example. > +static struct str_entry *find_str_entry(struct strmap *map, > + const char *str) > +{ > + struct str_entry entry; > + hashmap_entry_init(&entry.ent, strhash(str)); > + entry.item.string = (char *)str; > + return hashmap_get_entry(&map->map, &entry, ent, NULL); > +} Casting away constness here is awkward. It could likewise benefit from using keydata, so you wouldn't need to create a temporary string_list_item (which is where the non-constness comes from). > +void strmap_clear(struct strmap *map, int free_util) > +{ > + struct hashmap_iter iter; > + struct str_entry *e; > + > + if (!map) > + return; In a lazy-init world, this becomes: if (!map || !map->map.table) Of course it would be better still if the hashmap code learned to do the lazy-init stuff itself. > + hashmap_for_each_entry(&map->map, &iter, e, ent /* member name */) { > + free(e->item.string); > + if (free_util) > + free(e->item.util); > + } > + hashmap_free_entries(&map->map, struct str_entry, ent); With a flex-alloc struct, you can avoid the extra string free. But I guess you still wouldn't avoid the loop if you want to support free_entries(). I wonder if it would make the API simpler if the struct knew whether it owned the void pointer values or not. Then you'd do: struct strmap foo = { .free_values = 1 }; ... strmap_put(&foo, "key", value); ... strmap_clear(&foo); and wouldn't have to remember to do the right thing at clear-time. It is a little less flexible (e.g., if you transfer ownership after a certain point in the code), but I wonder if any callers actually need that (and they could always set the free_values flag then). > +/* > + * Insert "str" into the map, pointing to "data". A copy of "str" is made, so > + * it does not need to persist after the this function is called. > + * > + * If an entry for "str" already exists, its data pointer is overwritten, and > + * the original data pointer returned. Otherwise, returns NULL. > + */ > +void *strmap_put(struct strmap *map, const char *str, void *data) Minor, but IMHO we should avoid copying the docstrings to the implementation, since it gives two places that people have to remember to update if the API changes. > +void *strmap_put(struct strmap *map, const char *str, void *data) > +{ > + struct str_entry *entry = find_str_entry(map, str); > + void *old = NULL; In a lazy-init world, this is: if (!map->map.table) { strmap_init(map); entry = NULL; } else { entry = find_str_entry(map, str); } (or just call find_str_entry() in both cases and let it realize there's nothing to find). > + if (entry) { > + old = entry->item.util; > + entry->item.util = data; > + } else { > + entry = xmalloc(sizeof(*entry)); > + hashmap_entry_init(&entry->ent, strhash(str)); > + entry->item.string = strdup(str); > + entry->item.util = data; > + hashmap_add(&map->map, &entry->ent); > + } And in a flex-alloc world, this second block is: FLEX_ALLOC_STR(entry, key, str); hashmap_entry_init(&entry->ent, strhash(str)); entry->value = data; hashmap_add(&map->map, &entry->ent); > +void *strmap_get(struct strmap *map, const char *str) > +{ > + struct str_entry *entry = find_str_entry(map, str); > + return entry ? entry->item.util : NULL; > +} In a lazy world, this is: if (!map->map.table) return NULL; > +int strmap_contains(struct strmap *map, const char *str) > +{ > + return find_str_entry(map, str) != NULL; > +} And likewise: if (!map->map.table) return NULL; It might actually be easier to stick that in find_str_entry(). The rest of it all looked good to me. -Peff
diff --git a/Makefile b/Makefile index 65f8cfb236..0da15a9ee5 100644 --- a/Makefile +++ b/Makefile @@ -988,6 +988,7 @@ LIB_OBJS += strbuf.o LIB_OBJS += strvec.o LIB_OBJS += streaming.o LIB_OBJS += string-list.o +LIB_OBJS += strmap.o LIB_OBJS += sub-process.o LIB_OBJS += submodule-config.o LIB_OBJS += submodule.o diff --git a/strmap.c b/strmap.c new file mode 100644 index 0000000000..1c9fdb3b1e --- /dev/null +++ b/strmap.c @@ -0,0 +1,81 @@ +#include "git-compat-util.h" +#include "strmap.h" + +static int cmp_str_entry(const void *hashmap_cmp_fn_data, + const struct hashmap_entry *entry1, + const struct hashmap_entry *entry2, + const void *keydata) +{ + const struct str_entry *e1, *e2; + + e1 = container_of(entry1, const struct str_entry, ent); + e2 = container_of(entry2, const struct str_entry, ent); + return strcmp(e1->item.string, e2->item.string); +} + +static struct str_entry *find_str_entry(struct strmap *map, + const char *str) +{ + struct str_entry entry; + hashmap_entry_init(&entry.ent, strhash(str)); + entry.item.string = (char *)str; + return hashmap_get_entry(&map->map, &entry, ent, NULL); +} + +void strmap_init(struct strmap *map) +{ + hashmap_init(&map->map, cmp_str_entry, NULL, 0); +} + +void strmap_clear(struct strmap *map, int free_util) +{ + struct hashmap_iter iter; + struct str_entry *e; + + if (!map) + return; + + hashmap_for_each_entry(&map->map, &iter, e, ent /* member name */) { + free(e->item.string); + if (free_util) + free(e->item.util); + } + hashmap_free_entries(&map->map, struct str_entry, ent); + strmap_init(map); +} + +/* + * Insert "str" into the map, pointing to "data". A copy of "str" is made, so + * it does not need to persist after the this function is called. + * + * If an entry for "str" already exists, its data pointer is overwritten, and + * the original data pointer returned. Otherwise, returns NULL. + */ +void *strmap_put(struct strmap *map, const char *str, void *data) +{ + struct str_entry *entry = find_str_entry(map, str); + void *old = NULL; + + if (entry) { + old = entry->item.util; + entry->item.util = data; + } else { + entry = xmalloc(sizeof(*entry)); + hashmap_entry_init(&entry->ent, strhash(str)); + entry->item.string = strdup(str); + entry->item.util = data; + hashmap_add(&map->map, &entry->ent); + } + return old; +} + +void *strmap_get(struct strmap *map, const char *str) +{ + struct str_entry *entry = find_str_entry(map, str); + return entry ? entry->item.util : NULL; +} + +int strmap_contains(struct strmap *map, const char *str) +{ + return find_str_entry(map, str) != NULL; +} diff --git a/strmap.h b/strmap.h new file mode 100644 index 0000000000..eb5807f6fa --- /dev/null +++ b/strmap.h @@ -0,0 +1,47 @@ +#ifndef STRMAP_H +#define STRMAP_H + +#include "hashmap.h" +#include "string-list.h" + +struct strmap { + struct hashmap map; +}; + +struct str_entry { + struct hashmap_entry ent; + struct string_list_item item; +}; + +/* + * Initialize an empty strmap + */ +void strmap_init(struct strmap *map); + +/* + * Remove all entries from the map, releasing any allocated resources. + */ +void strmap_clear(struct strmap *map, int free_values); + +/* + * Insert "str" into the map, pointing to "data". A copy of "str" is made, so + * it does not need to persist after the this function is called. + * + * If an entry for "str" already exists, its data pointer is overwritten, and + * the original data pointer returned. Otherwise, returns NULL. + */ +void *strmap_put(struct strmap *map, const char *str, void *data); + +/* + * Return the data pointer mapped by "str", or NULL if the entry does not + * exist. + */ +void *strmap_get(struct strmap *map, const char *str); + +/* + * Return non-zero iff "str" is present in the map. This differs from + * strmap_get() in that it can distinguish entries with a NULL data pointer. + */ +int strmap_contains(struct strmap *map, const char *str); + +#endif /* STRMAP_H */