Message ID | pull.835.v4.git.git.1604535765.gitgitgadget@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | Add struct strmap and associated utility functions | expand |
On Thu, Nov 05, 2020 at 12:22:32AM +0000, Elijah Newren via GitGitGadget wrote: > Changes since v3 (almost all of which were suggestions from Peff): > > * Fix pointer math due to platform differences in FLEX_ALLOC definition, > and a few other FLEXPTR_ALLOC_STR cleanups > * Define strmap_for_each_entry in terms of hashmap_for_each_entry instead > of lower level functions > * Use simpler _INIT macros > * Remove strset_check_and_add() from API as per Peff's suggestion > (merge-ort doesn't need it; we can add it later) > * Update comments and commit messages to update now obsolete statements due > to changes from earlier reviews Thanks, this version looks good to me. I think we might as well do this on top now: -- >8 -- Subject: [PATCH] shortlog: drop custom strset implementation We can use the strset recently added in strmap.h instead. Note that this doesn't have a "check_and_add" function. We can easily write the same thing using separate "contains" and "adds" calls. This is slightly less efficient, in that it hashes the string twice, but for our use here it shouldn't be a big deal either way. I did leave it as a separate helper function, though, since we use it in three separate spots (some of which are in the middle of a conditional). Signed-off-by: Jeff King <peff@peff.net> --- builtin/shortlog.c | 59 ++++++---------------------------------------- 1 file changed, 7 insertions(+), 52 deletions(-) diff --git a/builtin/shortlog.c b/builtin/shortlog.c index 83f0a739b4..2d036ceec2 100644 --- a/builtin/shortlog.c +++ b/builtin/shortlog.c @@ -10,6 +10,7 @@ #include "shortlog.h" #include "parse-options.h" #include "trailer.h" +#include "strmap.h" static char const * const shortlog_usage[] = { N_("git shortlog [<options>] [<revision-range>] [[--] <path>...]"), @@ -169,60 +170,14 @@ static void read_from_stdin(struct shortlog *log) strbuf_release(&oneline); } -struct strset_item { - struct hashmap_entry ent; - char value[FLEX_ARRAY]; -}; - -struct strset { - struct hashmap map; -}; - -#define STRSET_INIT { { NULL } } - -static int strset_item_hashcmp(const void *hash_data, - const struct hashmap_entry *entry, - const struct hashmap_entry *entry_or_key, - const void *keydata) +static int check_dup(struct strset *dups, const char *str) { - const struct strset_item *a, *b; - - a = container_of(entry, const struct strset_item, ent); - if (keydata) - return strcmp(a->value, keydata); - - b = container_of(entry_or_key, const struct strset_item, ent); - return strcmp(a->value, b->value); -} - -/* - * Adds "str" to the set if it was not already present; returns true if it was - * already there. - */ -static int strset_check_and_add(struct strset *ss, const char *str) -{ - unsigned int hash = strhash(str); - struct strset_item *item; - - if (!ss->map.table) - hashmap_init(&ss->map, strset_item_hashcmp, NULL, 0); - - if (hashmap_get_from_hash(&ss->map, hash, str)) + if (strset_contains(dups, str)) return 1; - - FLEX_ALLOC_STR(item, value, str); - hashmap_entry_init(&item->ent, hash); - hashmap_add(&ss->map, &item->ent); + strset_add(dups, str); return 0; } -static void strset_clear(struct strset *ss) -{ - if (!ss->map.table) - return; - hashmap_clear_and_free(&ss->map, struct strset_item, ent); -} - static void insert_records_from_trailers(struct shortlog *log, struct strset *dups, struct commit *commit, @@ -253,7 +208,7 @@ static void insert_records_from_trailers(struct shortlog *log, if (!parse_ident(log, &ident, value)) value = ident.buf; - if (strset_check_and_add(dups, value)) + if (check_dup(dups, value)) continue; insert_one_record(log, value, oneline); } @@ -291,7 +246,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit) log->email ? "%aN <%aE>" : "%aN", &ident, &ctx); if (!HAS_MULTI_BITS(log->groups) || - !strset_check_and_add(&dups, ident.buf)) + !check_dup(&dups, ident.buf)) insert_one_record(log, ident.buf, oneline_str); } if (log->groups & SHORTLOG_GROUP_COMMITTER) { @@ -300,7 +255,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit) log->email ? "%cN <%cE>" : "%cN", &ident, &ctx); if (!HAS_MULTI_BITS(log->groups) || - !strset_check_and_add(&dups, ident.buf)) + !check_dup(&dups, ident.buf)) insert_one_record(log, ident.buf, oneline_str); } if (log->groups & SHORTLOG_GROUP_TRAILER) {
Jeff King <peff@peff.net> writes: > Subject: [PATCH] shortlog: drop custom strset implementation > > We can use the strset recently added in strmap.h instead. Note that this > doesn't have a "check_and_add" function. We can easily write the same > thing using separate "contains" and "adds" calls. This is slightly less > efficient, in that it hashes the string twice, but for our use here it > shouldn't be a big deal either way. > > I did leave it as a separate helper function, though, since we use it in > three separate spots (some of which are in the middle of a conditional). It makes sense, but check_dup() sounds like its use pattern would be if (check_dup(it) == NO_DUP) add(it); where it is more like "add it but just once". By the way, is a strset a set or a bag? If it makes the second add an no-op, perhaps your check_dup() is what strset_add() should do itself? What builtin/shortlog.c::check_dup() does smells like it is a workaround for the lack of a naturally-expected feature.
On Thu, Nov 05, 2020 at 12:25:14PM -0800, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > Subject: [PATCH] shortlog: drop custom strset implementation > > > > We can use the strset recently added in strmap.h instead. Note that this > > doesn't have a "check_and_add" function. We can easily write the same > > thing using separate "contains" and "adds" calls. This is slightly less > > efficient, in that it hashes the string twice, but for our use here it > > shouldn't be a big deal either way. > > > > I did leave it as a separate helper function, though, since we use it in > > three separate spots (some of which are in the middle of a conditional). > > It makes sense, but check_dup() sounds like its use pattern would be > > if (check_dup(it) == NO_DUP) > add(it); > > where it is more like "add it but just once". Hmph. I picked that name (versus just "contains") hoping it be general enough to cover the dual operation. Better name suggestions are welcome. Though... > By the way, is a strset a set or a bag? If it makes the second add > an no-op, perhaps your check_dup() is what strset_add() should do > itself? What builtin/shortlog.c::check_dup() does smells like it is > a workaround for the lack of a naturally-expected feature. Yes, if strset_add() returned an integer telling us whether the item was already in the set, then we could use it directly. It's slightly non-trivial to do, though, as it's built around strmap_put(), which returns a pointer to the old value. But since we're a set and not a map, that value is always NULL; we can't tell the difference between "I was storing an old value which was NULL" and "I was not storing any value". If strset were built around strintmap it could store "1" for "present in the set". It somehow feels hacky, though, to induce extra value writes just for the sake of working around the API. Since strset is defined within strmap.c, perhaps it wouldn't be too bad for it to be more intimate with the details here. I.e., to use find_strmap_entry() directly, and if the value is not present, to create a new hashmap entry. That would require hacking up strmap_put() into a few helpers, but it's probably not too bad. -Peff
On Thu, Nov 5, 2020 at 12:25 PM Junio C Hamano <gitster@pobox.com> wrote: > > Jeff King <peff@peff.net> writes: > > > Subject: [PATCH] shortlog: drop custom strset implementation > > > > We can use the strset recently added in strmap.h instead. Note that this > > doesn't have a "check_and_add" function. We can easily write the same > > thing using separate "contains" and "adds" calls. This is slightly less > > efficient, in that it hashes the string twice, but for our use here it > > shouldn't be a big deal either way. > > > > I did leave it as a separate helper function, though, since we use it in > > three separate spots (some of which are in the middle of a conditional). > > It makes sense, but check_dup() sounds like its use pattern would be > > if (check_dup(it) == NO_DUP) > add(it); > > where it is more like "add it but just once". > > By the way, is a strset a set or a bag? If it makes the second add strset is a set; there is no way to get duplicate entries. > an no-op, perhaps your check_dup() is what strset_add() should do > itself? What builtin/shortlog.c::check_dup() does smells like it is > a workaround for the lack of a naturally-expected feature. Is the expectation that strset_add() would return a boolean for whether a new entry was added?
Elijah Newren <newren@gmail.com> writes: >> It makes sense, but check_dup() sounds like its use pattern would be >> >> if (check_dup(it) == NO_DUP) >> add(it); >> >> where it is more like "add it but just once". >> >> By the way, is a strset a set or a bag? If it makes the second add > > strset is a set; there is no way to get duplicate entries. > >> an no-op, perhaps your check_dup() is what strset_add() should do >> itself? What builtin/shortlog.c::check_dup() does smells like it is >> a workaround for the lack of a naturally-expected feature. > > Is the expectation that strset_add() would return a boolean for > whether a new entry was added? It seems to be a reasonable expectation that the caller can tell if the add was "already there and was a no-op", judging from what we saw in the shortlog code, which was the first audience the API was introduced to support. It seems to benefit from it if it were available, and has to work around the lack of it with check_dup() wrapper.