diff mbox series

[v5,5/5] tr2: avoid to print "interesting" config repeatedly

Message ID f3b87a33da2a9bd2cd5148fcfe1d55a6281d8b99.1656403084.git.dyroneteng@gmail.com (mailing list archive)
State New, archived
Headers show
Series tr2: avoid to print "interesting" config repeatedly | expand

Commit Message

Teng Long June 28, 2022, 8:17 a.m. UTC
We can use GIT_TRACE2_CONFIG_PARAMS and trace2.configparams to
dump the config which we are inteseted in to the tr2 log. If
an "interesting" config exists in multiple scopes, it will be
dumped multiple times. So, let's fix it to only print the
final value instead.

Signed-off-by: Teng Long <dyroneteng@gmail.com>
Helped-by: Jeff Hostetler <git@jeffhostetler.com>
---
 trace2/tr2_cfg.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Ævar Arnfjörð Bjarmason June 28, 2022, 9:13 a.m. UTC | #1
On Tue, Jun 28 2022, Teng Long wrote:

> We can use GIT_TRACE2_CONFIG_PARAMS and trace2.configparams to
> dump the config which we are inteseted in to the tr2 log. If
> an "interesting" config exists in multiple scopes, it will be
> dumped multiple times. So, let's fix it to only print the
> final value instead.
>
> Signed-off-by: Teng Long <dyroneteng@gmail.com>
> Helped-by: Jeff Hostetler <git@jeffhostetler.com>
> ---
>  trace2/tr2_cfg.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/trace2/tr2_cfg.c b/trace2/tr2_cfg.c
> index ec9ac1a6ef..632bb6feec 100644
> --- a/trace2/tr2_cfg.c
> +++ b/trace2/tr2_cfg.c
> @@ -1,5 +1,6 @@
>  #include "cache.h"
>  #include "config.h"
> +#include "strmap.h"
>  #include "trace2/tr2_cfg.h"
>  #include "trace2/tr2_sysenv.h"
>  
> @@ -10,6 +11,7 @@ static int tr2_cfg_loaded;
>  static struct strbuf **tr2_cfg_env_vars;
>  static int tr2_cfg_env_vars_count;
>  static int tr2_cfg_env_vars_loaded;
> +static struct strset tr_cfg_set = STRSET_INIT;
>  
>  /*
>   * Parse a string containing a comma-delimited list of config keys
> @@ -101,12 +103,17 @@ static int tr2_cfg_cb(const char *key, const char *value, void *d)
>  {
>  	struct strbuf **s;
>  	struct tr2_cfg_data *data = (struct tr2_cfg_data *)d;
> +	const char *prior_value;
>  
>  	for (s = tr2_cfg_patterns; *s; s++) {
>  		struct strbuf *buf = *s;
>  		int wm = wildmatch(buf->buf, key, WM_CASEFOLD);
>  		if (wm == WM_MATCH) {
> -			trace2_def_param_fl(data->file, data->line, key, value);
> +			if (strset_contains(&tr_cfg_set, key)
> +			    || git_config_get_value(key, &prior_value))
> +				continue;
> +			trace2_def_param_fl(data->file, data->line, key, prior_value);
> +			strset_add(&tr_cfg_set, key);
>  			return 0;
>  		}
>  	}


Is tr2_cfg_load_patterns() run at early startup, but this perhaps at
runtime? I wonder if this is OK under threading, i.e. concurrent access
to your strset.

The assumption you're making here doesn't hold in general, some config
is "last vaule wins", but for some other config all configured values
are important.

Do we care in this case? I really don't know, but perhaps we can declare
"dedup" using the same facility we're using to wildcard-match keys, and
either make that optional or the default, e.g.:

	GIT_TRACE2_CONFIG_PARAMS='*:dedup,core.*:'

I.e. to make it a list of <glob>[:<options>].

Maybe not worth the effort...
Junio C Hamano June 28, 2022, 6:12 p.m. UTC | #2
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> Do we care in this case? I really don't know, but perhaps we can declare
> "dedup" using the same facility we're using to wildcard-match keys, and
> either make that optional or the default, e.g.:
>
> 	GIT_TRACE2_CONFIG_PARAMS='*:dedup,core.*:'
>
> I.e. to make it a list of <glob>[:<options>].
>
> Maybe not worth the effort...

I happen to think that the trace output is primarily for machine
consumption (i.e. if you want to make it readable by humans, it is
OK to require them to pre-process).

How does this "duplicate output" come about?  If it is because
end-user configuration files have multiple entries that are either
list-valued or relying on last-one-wins semantics, I suspect that it
is better not to prematurely filter these at the trace generation
side (which by definition is an opration that loses information).

So, to me, it smells that the whole "dedup at the source" thing is
not just not worth the effort but is misguided.
Jeff Hostetler July 1, 2022, 2:31 p.m. UTC | #3
On 6/28/22 2:12 PM, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> 
>> Do we care in this case? I really don't know, but perhaps we can declare
>> "dedup" using the same facility we're using to wildcard-match keys, and
>> either make that optional or the default, e.g.:
>>
>> 	GIT_TRACE2_CONFIG_PARAMS='*:dedup,core.*:'
>>
>> I.e. to make it a list of <glob>[:<options>].
>>
>> Maybe not worth the effort...
> 
> I happen to think that the trace output is primarily for machine
> consumption (i.e. if you want to make it readable by humans, it is
> OK to require them to pre-process).
> 
> How does this "duplicate output" come about?  If it is because
> end-user configuration files have multiple entries that are either
> list-valued or relying on last-one-wins semantics, I suspect that it
> is better not to prematurely filter these at the trace generation
> side (which by definition is an opration that loses information).
> 
> So, to me, it smells that the whole "dedup at the source" thing is
> not just not worth the effort but is misguided.
> 

I agree.  Let's not try to dedup these.  IIRC, the tr2_cfg_cb()
machinery runs during a "read early" or "read very early" scan of
the config values and the program is still starting up (in some
sense).  For example, the command line may not have been fully
processed yet (again, IIRC).  So injecting a call here to force an 
explicit lookup could cause problems.

And you'll be arbitrarily taking the first value, which is probably
the system level setting rather than the last value, which is probably
the local setting.  So the output would be misleading.

And I don't think it is worth the effort.  Let's just log the
context (system, global, local) as you described in the previous
version and be happy that we get multiple-but-now-qualified values.

This was intended as a machine-readable feature to allow telemetry
post-processing to group or filter command events.  For example, if
we want to do some "feature-x" is ON or OFF and compute averages
for the ONs and averages for the OFFs and compare them.  Such post-
processing has (I'm assuming) in the past always looked at the last
value logged.  And I assume that that would not change here with the
additional qualification.

Also, your new qualification would help us answer support questions
using telemetry where a user thought they had a feature on (say
globally) but was actually off at another level (say locally).

Jeff
Teng Long July 11, 2022, 3:51 a.m. UTC | #4
> Is tr2_cfg_load_patterns() run at early startup, but this perhaps at
> runtime? I wonder if this is OK under threading, i.e. concurrent access
> to your strset.

Yes. I think it may have at runtime.
Not for that reason, for now I think it's better to dump each config and
scope name is a better way in terms of other replied suggestions.

> The assumption you're making here doesn't hold in general, some config
> is "last vaule wins", but for some other config all configured values
> are important.

Yes. Also based on the above replies.

> Do we care in this case? I really don't know, but perhaps we can declare
> "dedup" using the same facility we're using to wildcard-match keys, and
> either make that optional or the default, e.g.:
>
> 	GIT_TRACE2_CONFIG_PARAMS='*:dedup,core.*:'
>
> I.e. to make it a list of <glob>[:<options>].
>
> Maybe not worth the effort...

I think if we could print config and scope names is enough maybe. When users
specify such as GIT_TRACE2_CONFIG_PARAMS, no one will cares about the extra
few lines config output in my opinion, but it may be useful for some
troubleshooting, so in my opinion this may not be implemented in this patch
for the time being.

Thanks.
Teng Long July 11, 2022, 4:11 a.m. UTC | #5
> I agree.  Let's not try to dedup these.  IIRC, the tr2_cfg_cb()
> machinery runs during a "read early" or "read very early" scan of
> the config values and the program is still starting up (in some
> sense).  For example, the command line may not have been fully
> processed yet (again, IIRC).  So injecting a call here to force an
> explicit lookup could cause problems.


The "read early" or "read very early" is a little abtract for me I think, I will
take a further look.


> And I don't think it is worth the effort.  Let's just log the
> context (system, global, local) as you described in the previous
> version and be happy that we get multiple-but-now-qualified values.

Yes.

> Also, your new qualification would help us answer support questions
> using telemetry where a user thought they had a feature on (say
> globally) but was actually off at another level (say locally).

That's certainly my original thought.


Thanks.
diff mbox series

Patch

diff --git a/trace2/tr2_cfg.c b/trace2/tr2_cfg.c
index ec9ac1a6ef..632bb6feec 100644
--- a/trace2/tr2_cfg.c
+++ b/trace2/tr2_cfg.c
@@ -1,5 +1,6 @@ 
 #include "cache.h"
 #include "config.h"
+#include "strmap.h"
 #include "trace2/tr2_cfg.h"
 #include "trace2/tr2_sysenv.h"
 
@@ -10,6 +11,7 @@  static int tr2_cfg_loaded;
 static struct strbuf **tr2_cfg_env_vars;
 static int tr2_cfg_env_vars_count;
 static int tr2_cfg_env_vars_loaded;
+static struct strset tr_cfg_set = STRSET_INIT;
 
 /*
  * Parse a string containing a comma-delimited list of config keys
@@ -101,12 +103,17 @@  static int tr2_cfg_cb(const char *key, const char *value, void *d)
 {
 	struct strbuf **s;
 	struct tr2_cfg_data *data = (struct tr2_cfg_data *)d;
+	const char *prior_value;
 
 	for (s = tr2_cfg_patterns; *s; s++) {
 		struct strbuf *buf = *s;
 		int wm = wildmatch(buf->buf, key, WM_CASEFOLD);
 		if (wm == WM_MATCH) {
-			trace2_def_param_fl(data->file, data->line, key, value);
+			if (strset_contains(&tr_cfg_set, key)
+			    || git_config_get_value(key, &prior_value))
+				continue;
+			trace2_def_param_fl(data->file, data->line, key, prior_value);
+			strset_add(&tr_cfg_set, key);
 			return 0;
 		}
 	}