Message ID | 20240225113947.89357-3-l.s.r@web.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | name-rev: use memory pool for name strings | expand |
René Scharfe <l.s.r@web.de> writes: > 1c56fc2084 (name-rev: pre-size buffer in get_parent_name(), 2020-02-04) > got a big performance boost in an unusual repository by calculating the > name length in advance. This is a bit awkward, as it references the > name components twice. > > Use a memory pool to store the strings for the struct rev_name member > tip_name. Using mem_pool_strfmt() allows efficient allocation without > explicit size calculation. This simplifies the formatting part of the > code without giving up performance: > > Benchmark 1: ./git_2.44.0 -C ../chromium/src name-rev --all > Time (mean ± σ): 1.231 s ± 0.013 s [User: 1.082 s, System: 0.136 s] > Range (min … max): 1.214 s … 1.252 s 10 runs > > Benchmark 2: ./git -C ../chromium/src name-rev --all > Time (mean ± σ): 1.220 s ± 0.020 s [User: 1.083 s, System: 0.130 s] > Range (min … max): 1.197 s … 1.254 s 10 runs > > Don't bother discarding the memory pool just before exiting. The effort > for that would be very low, but actually measurable in the above > example, with no benefit to users. At least UNLEAK it to calm down leak > checkers. This addresses the leaks that 45a14f578e (Revert "name-rev: > release unused name strings", 2022-04-22) brought back. > > Signed-off-by: René Scharfe <l.s.r@web.de> > --- > This doesn't make any test script leak-free, though. Hmph, is the root cause of the leak because no sensible ownership rules are applied to .tip_name? Sometimes it is allocated for the paritcular rev_name, but some other times the pointer is copied from another rev_name.tip_name? As the way currently the code uses the .tip_name member seems to be "allocate and use without any freeing", I tend to agree that throwing them into mem-pool does make sense. > builtin/name-rev.c | 39 ++++++++++++++++++++------------------- > 1 file changed, 20 insertions(+), 19 deletions(-) > > diff --git a/builtin/name-rev.c b/builtin/name-rev.c > index 2dd1807c4e..ad9930c831 100644 > --- a/builtin/name-rev.c > +++ b/builtin/name-rev.c > @@ -15,6 +15,7 @@ > #include "commit-slab.h" > #include "commit-graph.h" > #include "wildmatch.h" > +#include "mem-pool.h" > > /* > * One day. See the 'name a rev shortly after epoch' test in t6120 when > @@ -155,30 +156,25 @@ static struct rev_name *create_or_update_name(struct commit *commit, > return name; > } > > -static char *get_parent_name(const struct rev_name *name, int parent_number) > +static char *get_parent_name(const struct rev_name *name, int parent_number, > + struct mem_pool *string_pool) > { > - struct strbuf sb = STRBUF_INIT; > size_t len; > > strip_suffix(name->tip_name, "^0", &len); > if (name->generation > 0) { > - strbuf_grow(&sb, len + > - 1 + decimal_width(name->generation) + > - 1 + decimal_width(parent_number)); > - strbuf_addf(&sb, "%.*s~%d^%d", (int)len, name->tip_name, > - name->generation, parent_number); > + return mem_pool_strfmt(string_pool, "%.*s~%d^%d", > + (int)len, name->tip_name, > + name->generation, parent_number); > } else { > - strbuf_grow(&sb, len + > - 1 + decimal_width(parent_number)); > - strbuf_addf(&sb, "%.*s^%d", (int)len, name->tip_name, > - parent_number); > + return mem_pool_strfmt(string_pool, "%.*s^%d", > + (int)len, name->tip_name, parent_number); > } > - return strbuf_detach(&sb, NULL); > } > > static void name_rev(struct commit *start_commit, > const char *tip_name, timestamp_t taggerdate, > - int from_tag, int deref) > + int from_tag, int deref, struct mem_pool *string_pool) > { > struct prio_queue queue; > struct commit *commit; > @@ -195,9 +191,10 @@ static void name_rev(struct commit *start_commit, > if (!start_name) > return; > if (deref) > - start_name->tip_name = xstrfmt("%s^0", tip_name); > + start_name->tip_name = mem_pool_strfmt(string_pool, "%s^0", > + tip_name); > else > - start_name->tip_name = xstrdup(tip_name); > + start_name->tip_name = mem_pool_strdup(string_pool, tip_name); > > memset(&queue, 0, sizeof(queue)); /* Use the prio_queue as LIFO */ > prio_queue_put(&queue, start_commit); > @@ -235,7 +232,8 @@ static void name_rev(struct commit *start_commit, > if (parent_number > 1) > parent_name->tip_name = > get_parent_name(name, > - parent_number); > + parent_number, > + string_pool); > else > parent_name->tip_name = name->tip_name; > ALLOC_GROW(parents_to_queue, > @@ -415,7 +413,7 @@ static int name_ref(const char *path, const struct object_id *oid, > return 0; > } > > -static void name_tips(void) > +static void name_tips(struct mem_pool *string_pool) > { > int i; > > @@ -428,7 +426,7 @@ static void name_tips(void) > struct tip_table_entry *e = &tip_table.table[i]; > if (e->commit) { > name_rev(e->commit, e->refname, e->taggerdate, > - e->from_tag, e->deref); > + e->from_tag, e->deref, string_pool); > } > } > } > @@ -561,6 +559,7 @@ static void name_rev_line(char *p, struct name_ref_data *data) > > int cmd_name_rev(int argc, const char **argv, const char *prefix) > { > + struct mem_pool string_pool; > struct object_array revs = OBJECT_ARRAY_INIT; > int all = 0, annotate_stdin = 0, transform_stdin = 0, allow_undefined = 1, always = 0, peel_tag = 0; > struct name_ref_data data = { 0, 0, STRING_LIST_INIT_NODUP, STRING_LIST_INIT_NODUP }; > @@ -587,6 +586,7 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix) > OPT_END(), > }; > > + mem_pool_init(&string_pool, 0); > init_commit_rev_name(&rev_names); > git_config(git_default_config, NULL); > argc = parse_options(argc, argv, prefix, opts, name_rev_usage, 0); > @@ -648,7 +648,7 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix) > adjust_cutoff_timestamp_for_slop(); > > for_each_ref(name_ref, &data); > - name_tips(); > + name_tips(&string_pool); > > if (annotate_stdin) { > struct strbuf sb = STRBUF_INIT; > @@ -676,6 +676,7 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix) > always, allow_undefined, data.name_only); > } > > + UNLEAK(string_pool); > UNLEAK(revs); > return 0; > } > -- > 2.44.0
Am 26.02.24 um 03:05 schrieb Junio C Hamano: > René Scharfe <l.s.r@web.de> writes: > >> 1c56fc2084 (name-rev: pre-size buffer in get_parent_name(), 2020-02-04) >> got a big performance boost in an unusual repository by calculating the >> name length in advance. This is a bit awkward, as it references the >> name components twice. >> >> Use a memory pool to store the strings for the struct rev_name member >> tip_name. Using mem_pool_strfmt() allows efficient allocation without >> explicit size calculation. This simplifies the formatting part of the >> code without giving up performance: >> >> Benchmark 1: ./git_2.44.0 -C ../chromium/src name-rev --all >> Time (mean ± σ): 1.231 s ± 0.013 s [User: 1.082 s, System: 0.136 s] >> Range (min … max): 1.214 s … 1.252 s 10 runs >> >> Benchmark 2: ./git -C ../chromium/src name-rev --all >> Time (mean ± σ): 1.220 s ± 0.020 s [User: 1.083 s, System: 0.130 s] >> Range (min … max): 1.197 s … 1.254 s 10 runs >> >> Don't bother discarding the memory pool just before exiting. The effort >> for that would be very low, but actually measurable in the above >> example, with no benefit to users. At least UNLEAK it to calm down leak >> checkers. This addresses the leaks that 45a14f578e (Revert "name-rev: >> release unused name strings", 2022-04-22) brought back. >> >> Signed-off-by: René Scharfe <l.s.r@web.de> >> --- >> This doesn't make any test script leak-free, though. > > Hmph, is the root cause of the leak because no sensible ownership > rules are applied to .tip_name? Sometimes it is allocated for the > paritcular rev_name, but some other times the pointer is copied from > another rev_name.tip_name? As the way currently the code uses the > .tip_name member seems to be "allocate and use without any freeing", > I tend to agree that throwing them into mem-pool does make sense. Yes, the tip_name string is shared between child and first parent (and its first parent and so on, until a better name is found. Without this sharing the peak memory footprint of "git name-rev --all" in Git's repo goes from 40011328 to 46286528 for me currently. I'm not too worried about the leak, though, as we can't release the memory until we're done anyway. The ability to build strings without having to calculate their sizes in advance (either by running vsnprintf twice, which is slow, or by doing format-specific calculations) is more interesting to me. The reason why none of the test scripts become leak-free is other commands (i.e. not git name-rev) that are still leaking. René
diff --git a/builtin/name-rev.c b/builtin/name-rev.c index 2dd1807c4e..ad9930c831 100644 --- a/builtin/name-rev.c +++ b/builtin/name-rev.c @@ -15,6 +15,7 @@ #include "commit-slab.h" #include "commit-graph.h" #include "wildmatch.h" +#include "mem-pool.h" /* * One day. See the 'name a rev shortly after epoch' test in t6120 when @@ -155,30 +156,25 @@ static struct rev_name *create_or_update_name(struct commit *commit, return name; } -static char *get_parent_name(const struct rev_name *name, int parent_number) +static char *get_parent_name(const struct rev_name *name, int parent_number, + struct mem_pool *string_pool) { - struct strbuf sb = STRBUF_INIT; size_t len; strip_suffix(name->tip_name, "^0", &len); if (name->generation > 0) { - strbuf_grow(&sb, len + - 1 + decimal_width(name->generation) + - 1 + decimal_width(parent_number)); - strbuf_addf(&sb, "%.*s~%d^%d", (int)len, name->tip_name, - name->generation, parent_number); + return mem_pool_strfmt(string_pool, "%.*s~%d^%d", + (int)len, name->tip_name, + name->generation, parent_number); } else { - strbuf_grow(&sb, len + - 1 + decimal_width(parent_number)); - strbuf_addf(&sb, "%.*s^%d", (int)len, name->tip_name, - parent_number); + return mem_pool_strfmt(string_pool, "%.*s^%d", + (int)len, name->tip_name, parent_number); } - return strbuf_detach(&sb, NULL); } static void name_rev(struct commit *start_commit, const char *tip_name, timestamp_t taggerdate, - int from_tag, int deref) + int from_tag, int deref, struct mem_pool *string_pool) { struct prio_queue queue; struct commit *commit; @@ -195,9 +191,10 @@ static void name_rev(struct commit *start_commit, if (!start_name) return; if (deref) - start_name->tip_name = xstrfmt("%s^0", tip_name); + start_name->tip_name = mem_pool_strfmt(string_pool, "%s^0", + tip_name); else - start_name->tip_name = xstrdup(tip_name); + start_name->tip_name = mem_pool_strdup(string_pool, tip_name); memset(&queue, 0, sizeof(queue)); /* Use the prio_queue as LIFO */ prio_queue_put(&queue, start_commit); @@ -235,7 +232,8 @@ static void name_rev(struct commit *start_commit, if (parent_number > 1) parent_name->tip_name = get_parent_name(name, - parent_number); + parent_number, + string_pool); else parent_name->tip_name = name->tip_name; ALLOC_GROW(parents_to_queue, @@ -415,7 +413,7 @@ static int name_ref(const char *path, const struct object_id *oid, return 0; } -static void name_tips(void) +static void name_tips(struct mem_pool *string_pool) { int i; @@ -428,7 +426,7 @@ static void name_tips(void) struct tip_table_entry *e = &tip_table.table[i]; if (e->commit) { name_rev(e->commit, e->refname, e->taggerdate, - e->from_tag, e->deref); + e->from_tag, e->deref, string_pool); } } } @@ -561,6 +559,7 @@ static void name_rev_line(char *p, struct name_ref_data *data) int cmd_name_rev(int argc, const char **argv, const char *prefix) { + struct mem_pool string_pool; struct object_array revs = OBJECT_ARRAY_INIT; int all = 0, annotate_stdin = 0, transform_stdin = 0, allow_undefined = 1, always = 0, peel_tag = 0; struct name_ref_data data = { 0, 0, STRING_LIST_INIT_NODUP, STRING_LIST_INIT_NODUP }; @@ -587,6 +586,7 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix) OPT_END(), }; + mem_pool_init(&string_pool, 0); init_commit_rev_name(&rev_names); git_config(git_default_config, NULL); argc = parse_options(argc, argv, prefix, opts, name_rev_usage, 0); @@ -648,7 +648,7 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix) adjust_cutoff_timestamp_for_slop(); for_each_ref(name_ref, &data); - name_tips(); + name_tips(&string_pool); if (annotate_stdin) { struct strbuf sb = STRBUF_INIT; @@ -676,6 +676,7 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix) always, allow_undefined, data.name_only); } + UNLEAK(string_pool); UNLEAK(revs); return 0; }
1c56fc2084 (name-rev: pre-size buffer in get_parent_name(), 2020-02-04) got a big performance boost in an unusual repository by calculating the name length in advance. This is a bit awkward, as it references the name components twice. Use a memory pool to store the strings for the struct rev_name member tip_name. Using mem_pool_strfmt() allows efficient allocation without explicit size calculation. This simplifies the formatting part of the code without giving up performance: Benchmark 1: ./git_2.44.0 -C ../chromium/src name-rev --all Time (mean ± σ): 1.231 s ± 0.013 s [User: 1.082 s, System: 0.136 s] Range (min … max): 1.214 s … 1.252 s 10 runs Benchmark 2: ./git -C ../chromium/src name-rev --all Time (mean ± σ): 1.220 s ± 0.020 s [User: 1.083 s, System: 0.130 s] Range (min … max): 1.197 s … 1.254 s 10 runs Don't bother discarding the memory pool just before exiting. The effort for that would be very low, but actually measurable in the above example, with no benefit to users. At least UNLEAK it to calm down leak checkers. This addresses the leaks that 45a14f578e (Revert "name-rev: release unused name strings", 2022-04-22) brought back. Signed-off-by: René Scharfe <l.s.r@web.de> --- This doesn't make any test script leak-free, though. builtin/name-rev.c | 39 ++++++++++++++++++++------------------- 1 file changed, 20 insertions(+), 19 deletions(-) -- 2.44.0