Message ID | f3b87a33da2a9bd2cd5148fcfe1d55a6281d8b99.1656403084.git.dyroneteng@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | tr2: avoid to print "interesting" config repeatedly | expand |
On Tue, Jun 28 2022, Teng Long wrote: > We can use GIT_TRACE2_CONFIG_PARAMS and trace2.configparams to > dump the config which we are inteseted in to the tr2 log. If > an "interesting" config exists in multiple scopes, it will be > dumped multiple times. So, let's fix it to only print the > final value instead. > > Signed-off-by: Teng Long <dyroneteng@gmail.com> > Helped-by: Jeff Hostetler <git@jeffhostetler.com> > --- > trace2/tr2_cfg.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/trace2/tr2_cfg.c b/trace2/tr2_cfg.c > index ec9ac1a6ef..632bb6feec 100644 > --- a/trace2/tr2_cfg.c > +++ b/trace2/tr2_cfg.c > @@ -1,5 +1,6 @@ > #include "cache.h" > #include "config.h" > +#include "strmap.h" > #include "trace2/tr2_cfg.h" > #include "trace2/tr2_sysenv.h" > > @@ -10,6 +11,7 @@ static int tr2_cfg_loaded; > static struct strbuf **tr2_cfg_env_vars; > static int tr2_cfg_env_vars_count; > static int tr2_cfg_env_vars_loaded; > +static struct strset tr_cfg_set = STRSET_INIT; > > /* > * Parse a string containing a comma-delimited list of config keys > @@ -101,12 +103,17 @@ static int tr2_cfg_cb(const char *key, const char *value, void *d) > { > struct strbuf **s; > struct tr2_cfg_data *data = (struct tr2_cfg_data *)d; > + const char *prior_value; > > for (s = tr2_cfg_patterns; *s; s++) { > struct strbuf *buf = *s; > int wm = wildmatch(buf->buf, key, WM_CASEFOLD); > if (wm == WM_MATCH) { > - trace2_def_param_fl(data->file, data->line, key, value); > + if (strset_contains(&tr_cfg_set, key) > + || git_config_get_value(key, &prior_value)) > + continue; > + trace2_def_param_fl(data->file, data->line, key, prior_value); > + strset_add(&tr_cfg_set, key); > return 0; > } > } Is tr2_cfg_load_patterns() run at early startup, but this perhaps at runtime? I wonder if this is OK under threading, i.e. concurrent access to your strset. The assumption you're making here doesn't hold in general, some config is "last vaule wins", but for some other config all configured values are important. Do we care in this case? I really don't know, but perhaps we can declare "dedup" using the same facility we're using to wildcard-match keys, and either make that optional or the default, e.g.: GIT_TRACE2_CONFIG_PARAMS='*:dedup,core.*:' I.e. to make it a list of <glob>[:<options>]. Maybe not worth the effort...
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Do we care in this case? I really don't know, but perhaps we can declare > "dedup" using the same facility we're using to wildcard-match keys, and > either make that optional or the default, e.g.: > > GIT_TRACE2_CONFIG_PARAMS='*:dedup,core.*:' > > I.e. to make it a list of <glob>[:<options>]. > > Maybe not worth the effort... I happen to think that the trace output is primarily for machine consumption (i.e. if you want to make it readable by humans, it is OK to require them to pre-process). How does this "duplicate output" come about? If it is because end-user configuration files have multiple entries that are either list-valued or relying on last-one-wins semantics, I suspect that it is better not to prematurely filter these at the trace generation side (which by definition is an opration that loses information). So, to me, it smells that the whole "dedup at the source" thing is not just not worth the effort but is misguided.
On 6/28/22 2:12 PM, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> Do we care in this case? I really don't know, but perhaps we can declare >> "dedup" using the same facility we're using to wildcard-match keys, and >> either make that optional or the default, e.g.: >> >> GIT_TRACE2_CONFIG_PARAMS='*:dedup,core.*:' >> >> I.e. to make it a list of <glob>[:<options>]. >> >> Maybe not worth the effort... > > I happen to think that the trace output is primarily for machine > consumption (i.e. if you want to make it readable by humans, it is > OK to require them to pre-process). > > How does this "duplicate output" come about? If it is because > end-user configuration files have multiple entries that are either > list-valued or relying on last-one-wins semantics, I suspect that it > is better not to prematurely filter these at the trace generation > side (which by definition is an opration that loses information). > > So, to me, it smells that the whole "dedup at the source" thing is > not just not worth the effort but is misguided. > I agree. Let's not try to dedup these. IIRC, the tr2_cfg_cb() machinery runs during a "read early" or "read very early" scan of the config values and the program is still starting up (in some sense). For example, the command line may not have been fully processed yet (again, IIRC). So injecting a call here to force an explicit lookup could cause problems. And you'll be arbitrarily taking the first value, which is probably the system level setting rather than the last value, which is probably the local setting. So the output would be misleading. And I don't think it is worth the effort. Let's just log the context (system, global, local) as you described in the previous version and be happy that we get multiple-but-now-qualified values. This was intended as a machine-readable feature to allow telemetry post-processing to group or filter command events. For example, if we want to do some "feature-x" is ON or OFF and compute averages for the ONs and averages for the OFFs and compare them. Such post- processing has (I'm assuming) in the past always looked at the last value logged. And I assume that that would not change here with the additional qualification. Also, your new qualification would help us answer support questions using telemetry where a user thought they had a feature on (say globally) but was actually off at another level (say locally). Jeff
> Is tr2_cfg_load_patterns() run at early startup, but this perhaps at > runtime? I wonder if this is OK under threading, i.e. concurrent access > to your strset. Yes. I think it may have at runtime. Not for that reason, for now I think it's better to dump each config and scope name is a better way in terms of other replied suggestions. > The assumption you're making here doesn't hold in general, some config > is "last vaule wins", but for some other config all configured values > are important. Yes. Also based on the above replies. > Do we care in this case? I really don't know, but perhaps we can declare > "dedup" using the same facility we're using to wildcard-match keys, and > either make that optional or the default, e.g.: > > GIT_TRACE2_CONFIG_PARAMS='*:dedup,core.*:' > > I.e. to make it a list of <glob>[:<options>]. > > Maybe not worth the effort... I think if we could print config and scope names is enough maybe. When users specify such as GIT_TRACE2_CONFIG_PARAMS, no one will cares about the extra few lines config output in my opinion, but it may be useful for some troubleshooting, so in my opinion this may not be implemented in this patch for the time being. Thanks.
> I agree. Let's not try to dedup these. IIRC, the tr2_cfg_cb() > machinery runs during a "read early" or "read very early" scan of > the config values and the program is still starting up (in some > sense). For example, the command line may not have been fully > processed yet (again, IIRC). So injecting a call here to force an > explicit lookup could cause problems. The "read early" or "read very early" is a little abtract for me I think, I will take a further look. > And I don't think it is worth the effort. Let's just log the > context (system, global, local) as you described in the previous > version and be happy that we get multiple-but-now-qualified values. Yes. > Also, your new qualification would help us answer support questions > using telemetry where a user thought they had a feature on (say > globally) but was actually off at another level (say locally). That's certainly my original thought. Thanks.
diff --git a/trace2/tr2_cfg.c b/trace2/tr2_cfg.c index ec9ac1a6ef..632bb6feec 100644 --- a/trace2/tr2_cfg.c +++ b/trace2/tr2_cfg.c @@ -1,5 +1,6 @@ #include "cache.h" #include "config.h" +#include "strmap.h" #include "trace2/tr2_cfg.h" #include "trace2/tr2_sysenv.h" @@ -10,6 +11,7 @@ static int tr2_cfg_loaded; static struct strbuf **tr2_cfg_env_vars; static int tr2_cfg_env_vars_count; static int tr2_cfg_env_vars_loaded; +static struct strset tr_cfg_set = STRSET_INIT; /* * Parse a string containing a comma-delimited list of config keys @@ -101,12 +103,17 @@ static int tr2_cfg_cb(const char *key, const char *value, void *d) { struct strbuf **s; struct tr2_cfg_data *data = (struct tr2_cfg_data *)d; + const char *prior_value; for (s = tr2_cfg_patterns; *s; s++) { struct strbuf *buf = *s; int wm = wildmatch(buf->buf, key, WM_CASEFOLD); if (wm == WM_MATCH) { - trace2_def_param_fl(data->file, data->line, key, value); + if (strset_contains(&tr_cfg_set, key) + || git_config_get_value(key, &prior_value)) + continue; + trace2_def_param_fl(data->file, data->line, key, prior_value); + strset_add(&tr_cfg_set, key); return 0; } }
We can use GIT_TRACE2_CONFIG_PARAMS and trace2.configparams to dump the config which we are inteseted in to the tr2 log. If an "interesting" config exists in multiple scopes, it will be dumped multiple times. So, let's fix it to only print the final value instead. Signed-off-by: Teng Long <dyroneteng@gmail.com> Helped-by: Jeff Hostetler <git@jeffhostetler.com> --- trace2/tr2_cfg.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)