diff mbox series

[RFC,1/8] trace2: log fsync stats in trace2 rather than wrapper

Message ID 20230627195251.1973421-2-calvinwan@google.com (mailing list archive)
State New, archived
Headers show
Series Introduce Git Standard Library | expand

Commit Message

Calvin Wan June 27, 2023, 7:52 p.m. UTC
As a library boundary, wrapper.c should not directly log trace2
statistics, but instead provide those statistics upon
request. Therefore, move the trace2 logging code to trace2.[ch.]. This
also allows wrapper.c to not be dependent on trace2.h and repository.h.

Signed-off-by: Calvin Wan <calvinwan@google.com>
---
 trace2.c  | 13 +++++++++++++
 trace2.h  |  5 +++++
 wrapper.c | 17 ++++++-----------
 wrapper.h |  4 ++--
 4 files changed, 26 insertions(+), 13 deletions(-)

Comments

Victoria Dye June 28, 2023, 2:05 a.m. UTC | #1
Calvin Wan wrote:
> As a library boundary, wrapper.c should not directly log trace2
> statistics, but instead provide those statistics upon
> request. Therefore, move the trace2 logging code to trace2.[ch.]. This
> also allows wrapper.c to not be dependent on trace2.h and repository.h.
> 

...

> diff --git a/trace2.h b/trace2.h
> index 4ced30c0db..689e9a4027 100644
> --- a/trace2.h
> +++ b/trace2.h
> @@ -581,4 +581,9 @@ void trace2_collect_process_info(enum trace2_process_info_reason reason);
>  
>  const char *trace2_session_id(void);
>  
> +/*
> + * Writes out trace statistics for fsync
> + */
> +void trace_git_fsync_stats(void);
> +

This function does not belong in 'trace2.h', IMO. The purpose of that file
is to contain the generic API for Trace2 (e.g., 'trace2_printf()',
'trace2_region_(enter|exit)'), whereas this function is effectively a
wrapper around a specific invocation of that API. 

You note in the commit message that "wrapper.c should not directly log
trace2 statistics" with the reasoning of "[it's] a library boundary," but I
suspect the unstated underlying reason is "because it tracks 'count_fsync_*'
in static variables." This case would be better handled, then, by replacing
the usage in 'wrapper.c' with a new Trace2 counter (API introduced in [1]).
That keeps this usage consistent with the API already established for
Trace2, rather than starting an unsustainable trend of creating ad-hoc,
per-metric wrappers in 'trace2.[c|h]'.

An added note re: the commit message - it's extremely important that
functions _anywhere in Git_ are able to use the Trace2 API directly. A
developer could reasonably want to measure performance, keep track of an
interesting metric, log when a region is entered in the larger trace,
capture error information, etc. for any function, regardless of where in
falls in the internal library organization. To that end, I think either the
commit message should be rephrased to remove that statement (if the issue is
really "we're using a static variable and we want to avoid that"), or the
libification effort should be updated to accommodate use of Trace2 anywhere
in Git. 

[1] https://lore.kernel.org/git/pull.1373.v4.git.1666618868.gitgitgadget@gmail.com/
Calvin Wan July 5, 2023, 5:57 p.m. UTC | #2
> This function does not belong in 'trace2.h', IMO. The purpose of that file
> is to contain the generic API for Trace2 (e.g., 'trace2_printf()',
> 'trace2_region_(enter|exit)'), whereas this function is effectively a
> wrapper around a specific invocation of that API.
>
> You note in the commit message that "wrapper.c should not directly log
> trace2 statistics" with the reasoning of "[it's] a library boundary," but I
> suspect the unstated underlying reason is "because it tracks 'count_fsync_*'
> in static variables." This case would be better handled, then, by replacing
> the usage in 'wrapper.c' with a new Trace2 counter (API introduced in [1]).
> That keeps this usage consistent with the API already established for
> Trace2, rather than starting an unsustainable trend of creating ad-hoc,
> per-metric wrappers in 'trace2.[c|h]'.

The underlying reason is for removing the trace2 dependency from
wrapper.c so that when git-std-lib is compiled, there isn't a missing
object for  trace_git_fsync_stats(), resulting in a compilation error.
However I do agree that the method I chose to do so by creating an
ad-hoc wrapper is unsustainable and I will come up with a better
method for doing so.

>
> An added note re: the commit message - it's extremely important that
> functions _anywhere in Git_ are able to use the Trace2 API directly. A
> developer could reasonably want to measure performance, keep track of an
> interesting metric, log when a region is entered in the larger trace,
> capture error information, etc. for any function, regardless of where in
> falls in the internal library organization.

I don't quite agree that functions _anywhere in Git_ are able to use
the Trace2 API directly for the same reason that we don't have the
ability to log functions in external libraries -- logging common,
low-level functionality creates an unnecessary amount of log churn and
those logs generally contain practically useless information. However,
that does not mean that all of the functions in git-std-lib fall into
that category (usage has certain functions definitely worth logging).
This means that files like usage.c could instead be separated into its
own library and git-std-lib would only contain files that we deem
"should never be logged".

> To that end, I think either the
> commit message should be rephrased to remove that statement (if the issue is
> really "we're using a static variable and we want to avoid that"), or the
> libification effort should be updated to accommodate use of Trace2 anywhere
> in Git.

Besides potentially redrawing the boundaries of git-std-lib to
accommodate Trace2, we're also looking into the possibility of
stubbing out tracing in git-std-lib so that it and other libraries can
be built and tested, and then when Trace2 is turned into a library,
it's full functionality can be linked to.
Victoria Dye July 5, 2023, 6:22 p.m. UTC | #3
Calvin Wan wrote:
>> An added note re: the commit message - it's extremely important that
>> functions _anywhere in Git_ are able to use the Trace2 API directly. A
>> developer could reasonably want to measure performance, keep track of an
>> interesting metric, log when a region is entered in the larger trace,
>> capture error information, etc. for any function, regardless of where in
>> falls in the internal library organization.
> 
> I don't quite agree that functions _anywhere in Git_ are able to use
> the Trace2 API directly for the same reason that we don't have the
> ability to log functions in external libraries -- logging common,
> low-level functionality creates an unnecessary amount of log churn and
> those logs generally contain practically useless information. 

That may be true in your use cases, but it isn't in mine and may not be for
others'. In fact, I was just using these exact fsync metrics a couple weeks
ago to do some performance analysis; I could easily imagine doing something
similar for another "low level" function. It's unreasonable - and unfair to
future development - to make an absolute declaration about "what's useful
vs. useless" and use that decision to justify severely limiting our future
flexibility on the matter.

> However,
> that does not mean that all of the functions in git-std-lib fall into
> that category (usage has certain functions definitely worth logging).
> This means that files like usage.c could instead be separated into its
> own library and git-std-lib would only contain files that we deem
> "should never be logged".

How do you make that determination? What about if/when someone realizes,
somewhere down the line, that one of those "should never be logged" files
would actually benefit from some aggregate metric, e.g. a Trace2 timer? This
isn't a case of extracting an extraneous dependency (where a function really
doesn't _need_ something it has access to); tracing & logging is a core
functionality in Git, and should not be artificially constrained in the name
of organization. 

>> To that end, I think either the
>> commit message should be rephrased to remove that statement (if the issue is
>> really "we're using a static variable and we want to avoid that"), or the
>> libification effort should be updated to accommodate use of Trace2 anywhere
>> in Git.
> 
> Besides potentially redrawing the boundaries of git-std-lib to
> accommodate Trace2, we're also looking into the possibility of
> stubbing out tracing in git-std-lib so that it and other libraries can
> be built and tested, and then when Trace2 is turned into a library,
> it's full functionality can be linked to.

If that allows you to meet your libification goals without limiting Trace2's
accessibility throughout the codebase, that works for me.
Jeff Hostetler July 11, 2023, 8:07 p.m. UTC | #4
On 6/27/23 3:52 PM, Calvin Wan wrote:
> As a library boundary, wrapper.c should not directly log trace2
> statistics, but instead provide those statistics upon
> request. Therefore, move the trace2 logging code to trace2.[ch.]. This
> also allows wrapper.c to not be dependent on trace2.h and repository.h.
> 
> Signed-off-by: Calvin Wan <calvinwan@google.com>
> ---
>   trace2.c  | 13 +++++++++++++
>   trace2.h  |  5 +++++
>   wrapper.c | 17 ++++++-----------
>   wrapper.h |  4 ++--
>   4 files changed, 26 insertions(+), 13 deletions(-)
> 
> diff --git a/trace2.c b/trace2.c
> index 0efc4e7b95..f367a1ce31 100644
> --- a/trace2.c
> +++ b/trace2.c
> @@ -915,3 +915,16 @@ const char *trace2_session_id(void)
>   {
>   	return tr2_sid_get();
>   }
> +
> +static void log_trace_fsync_if(const char *key)
> +{
> +	intmax_t value = get_trace_git_fsync_stats(key);
> +	if (value)
> +		trace2_data_intmax("fsync", the_repository, key, value);
> +}
> +
> +void trace_git_fsync_stats(void)
> +{
> +	log_trace_fsync_if("fsync/writeout-only");
> +	log_trace_fsync_if("fsync/hardware-flush");
> +}
> diff --git a/trace2.h b/trace2.h
> index 4ced30c0db..689e9a4027 100644
> --- a/trace2.h
> +++ b/trace2.h
> @@ -581,4 +581,9 @@ void trace2_collect_process_info(enum trace2_process_info_reason reason);
>   
>   const char *trace2_session_id(void);
>   
> +/*
> + * Writes out trace statistics for fsync
> + */
> +void trace_git_fsync_stats(void);
> +
>   #endif /* TRACE2_H */

Sorry to be late to this party, but none of this belongs
in trace2.[ch].

As Victoria stated, you can/should use the new "timers and counters"
feature in Trace2 to collect and log these stats.

And then you don't need specific log_trace_* functions or wrappers
-- just use the trace2_timer_start()/stop() or trace2_counter_add()
functions as necessary around the various fsync operations.


I haven't really followed the lib-ification effort, so I'm just going
to GUESS that all of the Trace2_ and tr2_ prefixed functions and data
structures will need to be in the lowest-level .a so that it can be
called from the main .exe and any other .a's between them.

Jeff
diff mbox series

Patch

diff --git a/trace2.c b/trace2.c
index 0efc4e7b95..f367a1ce31 100644
--- a/trace2.c
+++ b/trace2.c
@@ -915,3 +915,16 @@  const char *trace2_session_id(void)
 {
 	return tr2_sid_get();
 }
+
+static void log_trace_fsync_if(const char *key)
+{
+	intmax_t value = get_trace_git_fsync_stats(key);
+	if (value)
+		trace2_data_intmax("fsync", the_repository, key, value);
+}
+
+void trace_git_fsync_stats(void)
+{
+	log_trace_fsync_if("fsync/writeout-only");
+	log_trace_fsync_if("fsync/hardware-flush");
+}
diff --git a/trace2.h b/trace2.h
index 4ced30c0db..689e9a4027 100644
--- a/trace2.h
+++ b/trace2.h
@@ -581,4 +581,9 @@  void trace2_collect_process_info(enum trace2_process_info_reason reason);
 
 const char *trace2_session_id(void);
 
+/*
+ * Writes out trace statistics for fsync
+ */
+void trace_git_fsync_stats(void);
+
 #endif /* TRACE2_H */
diff --git a/wrapper.c b/wrapper.c
index 22be9812a7..bd7f0a9752 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -6,9 +6,7 @@ 
 #include "config.h"
 #include "gettext.h"
 #include "object.h"
-#include "repository.h"
 #include "strbuf.h"
-#include "trace2.h"
 
 static intmax_t count_fsync_writeout_only;
 static intmax_t count_fsync_hardware_flush;
@@ -600,16 +598,13 @@  int git_fsync(int fd, enum fsync_action action)
 	}
 }
 
-static void log_trace_fsync_if(const char *key, intmax_t value)
+intmax_t get_trace_git_fsync_stats(const char *key)
 {
-	if (value)
-		trace2_data_intmax("fsync", the_repository, key, value);
-}
-
-void trace_git_fsync_stats(void)
-{
-	log_trace_fsync_if("fsync/writeout-only", count_fsync_writeout_only);
-	log_trace_fsync_if("fsync/hardware-flush", count_fsync_hardware_flush);
+	if (!strcmp(key, "fsync/writeout-only"))
+		return count_fsync_writeout_only;
+	if (!strcmp(key, "fsync/hardware-flush"))
+		return count_fsync_hardware_flush;
+	return 0;
 }
 
 static int warn_if_unremovable(const char *op, const char *file, int rc)
diff --git a/wrapper.h b/wrapper.h
index c85b1328d1..db1bc109ed 100644
--- a/wrapper.h
+++ b/wrapper.h
@@ -88,9 +88,9 @@  enum fsync_action {
 int git_fsync(int fd, enum fsync_action action);
 
 /*
- * Writes out trace statistics for fsync using the trace2 API.
+ * Returns trace statistics for fsync using the trace2 API.
  */
-void trace_git_fsync_stats(void);
+intmax_t get_trace_git_fsync_stats(const char *key);
 
 /*
  * Preserves errno, prints a message, but gives no warning for ENOENT.