Message ID | pull.828.git.1609302714183.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 90b666da601a5ccf9bbc3f2282fb166bf3fdade7 |
Headers | show |
Series | revision: trace topo-walk statistics | expand |
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > diff --git a/revision.c b/revision.c > index 9dff845bed6..1bb590ece78 100644 > --- a/revision.c > +++ b/revision.c > @@ -3308,6 +3308,26 @@ struct topo_walk_info { > struct author_date_slab author_date; > }; > > +static int topo_walk_atexit_registered; > +static unsigned int count_explore_walked; > +static unsigned int count_indegree_walked; > +static unsigned int count_topo_walked; The revision walk machinery is designed to be callable more than once during the lifetime of a process. These make readers wonder if they should be defined in "struct rev_info" to allow stats collected per traversal.
On 1/6/2021 8:37 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> diff --git a/revision.c b/revision.c >> index 9dff845bed6..1bb590ece78 100644 >> --- a/revision.c >> +++ b/revision.c >> @@ -3308,6 +3308,26 @@ struct topo_walk_info { >> struct author_date_slab author_date; >> }; >> >> +static int topo_walk_atexit_registered; >> +static unsigned int count_explore_walked; >> +static unsigned int count_indegree_walked; >> +static unsigned int count_topo_walked; > > The revision walk machinery is designed to be callable more than > once during the lifetime of a process. These make readers wonder > if they should be defined in "struct rev_info" to allow stats > collected per traversal. That's possible, but the use of an at-exit method means we only report one set of statistics and the 'struct rev_info' might be defunct. It does limit how useful the statistics can be when there are multiple 'struct rev_info's in use, but we also cannot trust that the rev_infos are being cleaned up properly at the end of the process to trigger the stats logging. Of course, maybe that _is_ something we could guarantee, or rather _should_ guarantee by patching any leaks. Seems like a lot of work when these aggregate statistics will be effective enough. But maybe I'm judging the potential work too harshly? Thanks, -Stolee
Derrick Stolee <stolee@gmail.com> writes: > On 1/6/2021 8:37 PM, Junio C Hamano wrote: >> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: >> >>> diff --git a/revision.c b/revision.c >>> index 9dff845bed6..1bb590ece78 100644 >>> --- a/revision.c >>> +++ b/revision.c >>> @@ -3308,6 +3308,26 @@ struct topo_walk_info { >>> struct author_date_slab author_date; >>> }; >>> >>> +static int topo_walk_atexit_registered; >>> +static unsigned int count_explore_walked; >>> +static unsigned int count_indegree_walked; >>> +static unsigned int count_topo_walked; >> >> The revision walk machinery is designed to be callable more than >> once during the lifetime of a process. These make readers wonder >> if they should be defined in "struct rev_info" to allow stats >> collected per traversal. > > That's possible, but the use of an at-exit method means we only > report one set of statistics and the 'struct rev_info' might > be defunct. Ah, sorry for the noise. Even after making multiple traversal we want to report the aggregate. > It does limit how useful the statistics can be when there are > multiple 'struct rev_info's in use, but we also cannot trust > that the rev_infos are being cleaned up properly at the end > of the process to trigger the stats logging. > > Of course, maybe that _is_ something we could guarantee, or > rather _should_ guarantee by patching any leaks. Seems like > a lot of work when these aggregate statistics will be > effective enough. But maybe I'm judging the potential work > too harshly? But different subsystems would have different "per-invocation" structure (e.g. "diff" uses "struct diff_options") and some may not even have an appropriate structure to hang these stats on. We may want to design a more general mechanism that can be used to annotate the subsystems uniformly. While that could be a worthy longer term goal, it certainly does not have to be part of this single-patch topic, I would think.
On 1/6/2021 9:29 PM, Junio C Hamano wrote: > But different subsystems would have different "per-invocation" > structure (e.g. "diff" uses "struct diff_options") and some may not > even have an appropriate structure to hang these stats on. We may > want to design a more general mechanism that can be used to annotate > the subsystems uniformly. While that could be a worthy longer term > goal, it certainly does not have to be part of this single-patch > topic, I would think. I will give this further thought. Thanks, -Stolee
diff --git a/revision.c b/revision.c index 9dff845bed6..1bb590ece78 100644 --- a/revision.c +++ b/revision.c @@ -3308,6 +3308,26 @@ struct topo_walk_info { struct author_date_slab author_date; }; +static int topo_walk_atexit_registered; +static unsigned int count_explore_walked; +static unsigned int count_indegree_walked; +static unsigned int count_topo_walked; + +static void trace2_topo_walk_statistics_atexit(void) +{ + struct json_writer jw = JSON_WRITER_INIT; + + jw_object_begin(&jw, 0); + jw_object_intmax(&jw, "count_explore_walked", count_explore_walked); + jw_object_intmax(&jw, "count_indegree_walked", count_indegree_walked); + jw_object_intmax(&jw, "count_topo_walked", count_topo_walked); + jw_end(&jw); + + trace2_data_json("topo_walk", the_repository, "statistics", &jw); + + jw_release(&jw); +} + static inline void test_flag_and_insert(struct prio_queue *q, struct commit *c, int flag) { if (c->object.flags & flag) @@ -3329,6 +3349,8 @@ static void explore_walk_step(struct rev_info *revs) if (repo_parse_commit_gently(revs->repo, c, 1) < 0) return; + count_explore_walked++; + if (revs->sort_order == REV_SORT_BY_AUTHOR_DATE) record_author_date(&info->author_date, c); @@ -3367,6 +3389,8 @@ static void indegree_walk_step(struct rev_info *revs) if (repo_parse_commit_gently(revs->repo, c, 1) < 0) return; + count_indegree_walked++; + explore_to_depth(revs, commit_graph_generation(c)); for (p = c->parents; p; p = p->next) { @@ -3476,6 +3500,11 @@ static void init_topo_walk(struct rev_info *revs) */ if (revs->sort_order == REV_SORT_IN_GRAPH_ORDER) prio_queue_reverse(&info->topo_queue); + + if (trace2_is_enabled() && !topo_walk_atexit_registered) { + atexit(trace2_topo_walk_statistics_atexit); + topo_walk_atexit_registered = 1; + } } static struct commit *next_topo_commit(struct rev_info *revs) @@ -3502,6 +3531,8 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit) oid_to_hex(&commit->object.oid)); } + count_topo_walked++; + for (p = commit->parents; p; p = p->next) { struct commit *parent = p->item; int *pi;