diff mbox series

fetch-pack: write effective filter to trace2

Message ID 20220715172943.2681492-1-jonathantanmy@google.com (mailing list archive)
State Superseded
Headers show
Series fetch-pack: write effective filter to trace2 | expand

Commit Message

Jonathan Tan July 15, 2022, 5:29 p.m. UTC
Administrators of a managed Git environment (like the one at $DAYJOB)
might want to quantify the performance change of fetches with and
without partial clone from the client's point of view. Therefore, log
the effective filter being sent to the server whenever a fetch (or
clone) occurs. Note that this is not necessarily the same as what's
specified on the CLI, because during a fetch, the configured filter is
used whenever a filter is not specified on the CLI.

This is implemented for protocol v0, v1, and v2.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 fetch-pack.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Jeff Hostetler July 15, 2022, 5:38 p.m. UTC | #1
On 7/15/22 1:29 PM, Jonathan Tan wrote:
> Administrators of a managed Git environment (like the one at $DAYJOB)
> might want to quantify the performance change of fetches with and
> without partial clone from the client's point of view. Therefore, log
> the effective filter being sent to the server whenever a fetch (or
> clone) occurs. Note that this is not necessarily the same as what's
> specified on the CLI, because during a fetch, the configured filter is
> used whenever a filter is not specified on the CLI.
> 
> This is implemented for protocol v0, v1, and v2.
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>   fetch-pack.c | 10 ++++++++--
>   1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/fetch-pack.c b/fetch-pack.c
> index cb6647d657..dec8743bec 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -392,7 +392,10 @@ static int find_common(struct fetch_negotiator *negotiator,
>   	if (server_supports_filtering && args->filter_options.choice) {
>   		const char *spec =
>   			expand_list_objects_filter_spec(&args->filter_options);
> +		trace2_data_string("fetch", the_repository, "fetch/effective-filter", spec);
>   		packet_buf_write(&req_buf, "filter %s", spec);
> +	} else {
> +		trace2_data_string("fetch", the_repository, "fetch/effective-filter", "none");
>   	}
>   	packet_buf_flush(&req_buf);
>   	state_len = req_buf.len;
> @@ -1328,9 +1331,12 @@ static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out,
>   		const char *spec =
>   			expand_list_objects_filter_spec(&args->filter_options);
>   		print_verbose(args, _("Server supports filter"));
> +		trace2_data_string("fetch", the_repository, "fetch/effective-filter", spec);
>   		packet_buf_write(&req_buf, "filter %s", spec);
> -	} else if (args->filter_options.choice) {
> -		warning("filtering not recognized by server, ignoring");
> +	} else {
> +		if (args->filter_options.choice)
> +			warning("filtering not recognized by server, ignoring");
> +		trace2_data_string("fetch", the_repository, "fetch/effective-filter", "none");
>   	}
>   
>   	if (server_supports_feature("fetch", "packfile-uris", 0)) {
> 

This looks nice.  Thanks!
Jeff
Junio C Hamano July 15, 2022, 6:28 p.m. UTC | #2
Jeff Hostetler <git@jeffhostetler.com> writes:

> On 7/15/22 1:29 PM, Jonathan Tan wrote:
>> Administrators of a managed Git environment (like the one at $DAYJOB)
>> might want to quantify the performance change of fetches with and
>> without partial clone from the client's point of view. Therefore, log
>> the effective filter being sent to the server whenever a fetch (or
>> clone) occurs. Note that this is not necessarily the same as what's
>> specified on the CLI, because during a fetch, the configured filter is
>> used whenever a filter is not specified on the CLI.
>> This is implemented for protocol v0, v1, and v2.

Is that different to say "for all protocols"?  I am wondering if it
is worth saying (unlike in a hypothetical case where we do not
support v0 and v1 we may want to state why we only support v2).

>> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
>> ---
>>   fetch-pack.c | 10 ++++++++--
>>   1 file changed, 8 insertions(+), 2 deletions(-)
>> diff --git a/fetch-pack.c b/fetch-pack.c
>> index cb6647d657..dec8743bec 100644
>> --- a/fetch-pack.c
>> +++ b/fetch-pack.c
>> @@ -392,7 +392,10 @@ static int find_common(struct fetch_negotiator *negotiator,
>>   	if (server_supports_filtering && args->filter_options.choice) {
>>   		const char *spec =
>>   			expand_list_objects_filter_spec(&args->filter_options);
>> +		trace2_data_string("fetch", the_repository, "fetch/effective-filter", spec);
>>   		packet_buf_write(&req_buf, "filter %s", spec);
>> +	} else {
>> +		trace2_data_string("fetch", the_repository, "fetch/effective-filter", "none");

Do we show "none" anywhere else where an expanded list objects
filter spec is expected?

I am wondering two things: 

 - The lack of this line would be a cleaner implementation of a
   signal to say "this client did not ask any filtering".

 - It would be good if we keep what report here more-or-less the
   same as what we can pass "--filter=" on the command line of
   "git pack-objects".

If "--filter=none" meant "this --filter passes everything", then
saying "none" here makes perfect sense wrt the latter, but I doubt
it is the case.

>>   	}
>>   	packet_buf_flush(&req_buf);
>>   	state_len = req_buf.len;
>> @@ -1328,9 +1331,12 @@ static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out,
>>   		const char *spec =
>>   			expand_list_objects_filter_spec(&args->filter_options);
>>   		print_verbose(args, _("Server supports filter"));
>> +		trace2_data_string("fetch", the_repository, "fetch/effective-filter", spec);
>>   		packet_buf_write(&req_buf, "filter %s", spec);
>> -	} else if (args->filter_options.choice) {
>> -		warning("filtering not recognized by server, ignoring");
>> +	} else {
>> +		if (args->filter_options.choice)
>> +			warning("filtering not recognized by server, ignoring");
>> +		trace2_data_string("fetch", the_repository, "fetch/effective-filter", "none");

At the first glance, this seems to lose data, because you should be
able to use expand_list_objects_filter_spec() to report the filter
spec.  But this is reporting the filter that was actually in effect,
so it is OK.

But the same question about "none" remains.

>>   	}
>>     	if (server_supports_feature("fetch", "packfile-uris", 0)) {
>> 
>
> This looks nice.  Thanks!
> Jeff

Will queue with your Acked-by, then?

Thanks, both.
Jonathan Tan July 15, 2022, 7:09 p.m. UTC | #3
Junio C Hamano <gitster@pobox.com> writes:
> Jeff Hostetler <git@jeffhostetler.com> writes:
> 
> > On 7/15/22 1:29 PM, Jonathan Tan wrote:
> >> Administrators of a managed Git environment (like the one at $DAYJOB)
> >> might want to quantify the performance change of fetches with and
> >> without partial clone from the client's point of view. Therefore, log
> >> the effective filter being sent to the server whenever a fetch (or
> >> clone) occurs. Note that this is not necessarily the same as what's
> >> specified on the CLI, because during a fetch, the configured filter is
> >> used whenever a filter is not specified on the CLI.
> >> This is implemented for protocol v0, v1, and v2.
> 
> Is that different to say "for all protocols"?  I am wondering if it
> is worth saying (unlike in a hypothetical case where we do not
> support v0 and v1 we may want to state why we only support v2).

I wrote it that way to avoid confusion with things like HTTP (which is a
protocol, at least in its name). Maybe better would be "This is
implemented for all protocols (v0, v1, and v2)". I'll make that change
in the commit message (after the other questions are discussed).

> >> diff --git a/fetch-pack.c b/fetch-pack.c
> >> index cb6647d657..dec8743bec 100644
> >> --- a/fetch-pack.c
> >> +++ b/fetch-pack.c
> >> @@ -392,7 +392,10 @@ static int find_common(struct fetch_negotiator *negotiator,
> >>   	if (server_supports_filtering && args->filter_options.choice) {
> >>   		const char *spec =
> >>   			expand_list_objects_filter_spec(&args->filter_options);
> >> +		trace2_data_string("fetch", the_repository, "fetch/effective-filter", spec);
> >>   		packet_buf_write(&req_buf, "filter %s", spec);
> >> +	} else {
> >> +		trace2_data_string("fetch", the_repository, "fetch/effective-filter", "none");
> 
> Do we show "none" anywhere else where an expanded list objects
> filter spec is expected?

Hmm...no, we don't.

> I am wondering two things: 
> 
>  - The lack of this line would be a cleaner implementation of a
>    signal to say "this client did not ask any filtering".

I think the presence is important to distinguish "no filtering" versus
someone using an old Git version that does not emit such traces, but I'm
open to changing the "none" to "none-specified" or something like that.

>  - It would be good if we keep what report here more-or-less the
>    same as what we can pass "--filter=" on the command line of
>    "git pack-objects".

My intent is to report what is being sent to the server in the fetch
request.

> If "--filter=none" meant "this --filter passes everything", then
> saying "none" here makes perfect sense wrt the latter, but I doubt
> it is the case.

--filter=none does not work (a user would have to specify --no-filter),
although it conceptually makes sense. I just wanted to have something
show up to indicate that we are using a Git version that would emit a
trace.
Junio C Hamano July 15, 2022, 8:10 p.m. UTC | #4
Jonathan Tan <jonathantanmy@google.com> writes:

>> >> This is implemented for protocol v0, v1, and v2.
>> 
>> Is that different to say "for all protocols"?  I am wondering if it
>> is worth saying (unlike in a hypothetical case where we do not
>> support v0 and v1 we may want to state why we only support v2).
>
> I wrote it that way to avoid confusion with things like HTTP (which is a
> protocol, at least in its name). Maybe better would be "This is
> implemented for all protocols (v0, v1, and v2)".

I still wonder if it is better left unsaid.  When adding a new
thing, it would not be noteworthy if the new thing is available no
matter what protocol is being used, no?

>> >> +		trace2_data_string("fetch", the_repository, "fetch/effective-filter", spec);
>> >>   		packet_buf_write(&req_buf, "filter %s", spec);
>> >> +	} else {
>> >> +		trace2_data_string("fetch", the_repository, "fetch/effective-filter", "none");
>> 
>> Do we show "none" anywhere else where an expanded list objects
>> filter spec is expected?
>
> Hmm...no, we don't.
>
>> I am wondering two things: 
>> 
>>  - The lack of this line would be a cleaner implementation of a
>>    signal to say "this client did not ask any filtering".
>
> I think the presence is important to distinguish "no filtering" versus
> someone using an old Git version that does not emit such traces, but I'm
> open to changing the "none" to "none-specified" or something like that.

OK.

>>  - It would be good if we keep what report here more-or-less the
>>    same as what we can pass "--filter=" on the command line of
>>    "git pack-objects".
>
> My intent is to report what is being sent to the server in the fetch
> request.

Then I'd be OK with a design that reports "none", if we send "none"
to the server in this case.  If not, then I do not think we should.

Perhaps report an empty string or not reporting at all?  After all,
the reader knows the client version and capability in the log so it
is easy to tell between 'no filter was used' and 'too old to report
the filter', no?  I dunno.
Jonathan Tan July 15, 2022, 8:49 p.m. UTC | #5
Junio C Hamano <gitster@pobox.com> writes:
> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> >> >> This is implemented for protocol v0, v1, and v2.
> >> 
> >> Is that different to say "for all protocols"?  I am wondering if it
> >> is worth saying (unlike in a hypothetical case where we do not
> >> support v0 and v1 we may want to state why we only support v2).
> >
> > I wrote it that way to avoid confusion with things like HTTP (which is a
> > protocol, at least in its name). Maybe better would be "This is
> > implemented for all protocols (v0, v1, and v2)".
> 
> I still wonder if it is better left unsaid.  When adding a new
> thing, it would not be noteworthy if the new thing is available no
> matter what protocol is being used, no?

I was thinking that a reviewer would think "why is this done twice" and
wrote this comment to answer this issue in advance, but it makes sense
to just remove it. OK - I'll do that.

> > My intent is to report what is being sent to the server in the fetch
> > request.
> 
> Then I'd be OK with a design that reports "none", if we send "none"
> to the server in this case.  If not, then I do not think we should.
> 
> Perhaps report an empty string or not reporting at all?  After all,
> the reader knows the client version and capability in the log so it
> is easy to tell between 'no filter was used' and 'too old to report
> the filter', no?  I dunno.

We don't send "none" to the server. If an empty string is OK then I'll
use that, since it will make it slightly easier for analysis since we do
not have to check against the version (and also to know which version
has support for this feature). But I'm OK either way.
Jeff Hostetler July 18, 2022, 2:08 p.m. UTC | #6
On 7/15/22 2:28 PM, Junio C Hamano wrote:
> Jeff Hostetler <git@jeffhostetler.com> writes:
> 
>> On 7/15/22 1:29 PM, Jonathan Tan wrote:
...
>>> +		trace2_data_string("fetch", the_repository, "fetch/effective-filter", "none");
> 
> At the first glance, this seems to lose data, because you should be
> able to use expand_list_objects_filter_spec() to report the filter
> spec.  But this is reporting the filter that was actually in effect,
> so it is OK.
> 
> But the same question about "none" remains.

Yeah, the use of "none" gave me pause, but I didn't have a better idea
at the time.  I guess we have:
(a) requested, supported, used.
(b) "none used because the server doesn't support it" and
(c) "none used because the user didn't request it" cases,
right?

Perhaps it would be better to do:
     if (server_supports_filtering && args->filter_options.choice)
         trace2_data_string("fetch", r, "filter/effective", spec);
     else
         trace2_data_string("fetch", r, "filter/unsupported", spec);

Using two different keywords.

So that the log only contains "filter/effective" when it was actually
used.  And there is no "filter/effective" event when (for whatever
reason) it was not in effect.

Then the "filter/unsupported" event helps you with debugging.  Did they
hit a server that doesn't support filtering or did they have a typo in
their filter spec?

Then don't emit a message at all for the "not requested" case.  And you
can use the Git version number to know how to interpret it.  There are
other places where we don't bother sending messages where the value is
a zero or empty.

Jeff
Junio C Hamano July 18, 2022, 3:53 p.m. UTC | #7
Jeff Hostetler <git@jeffhostetler.com> writes:

> Yeah, the use of "none" gave me pause, but I didn't have a better idea
> at the time.  I guess we have:
> (a) requested, supported, used.
> (b) "none used because the server doesn't support it" and
> (c) "none used because the user didn't request it" cases,
> right?

At these sites where the new traces are added, we cannot detect the
remaining case (d) "requested by the user, asked for the server, but
the server dropped the ball", I think the above covers the possible
cases that are interesting to us entirely.

> Perhaps it would be better to do:
>     if (server_supports_filtering && args->filter_options.choice)
>         trace2_data_string("fetch", r, "filter/effective", spec);
>     else
>         trace2_data_string("fetch", r, "filter/unsupported", spec);
>
> Using two different keywords.
>
> So that the log only contains "filter/effective" when it was actually
> used.  And there is no "filter/effective" event when (for whatever
> reason) it was not in effect.
>
> Then the "filter/unsupported" event helps you with debugging.  Did they
> hit a server that doesn't support filtering or did they have a typo in
> their filter spec?
>
> Then don't emit a message at all for the "not requested" case.  And you
> can use the Git version number to know how to interpret it.  There are
> other places where we don't bother sending messages where the value is
> a zero or empty.

Sounds alright.  We could standardize the other way, which might
make the interpretation of individual trace entries independent of
the context easier, though.

Thanks.
Jonathan Tan July 18, 2022, 4:18 p.m. UTC | #8
Junio C Hamano <gitster@pobox.com> writes:
> > Using two different keywords.
> >
> > So that the log only contains "filter/effective" when it was actually
> > used.  And there is no "filter/effective" event when (for whatever
> > reason) it was not in effect.
> >
> > Then the "filter/unsupported" event helps you with debugging.  Did they
> > hit a server that doesn't support filtering or did they have a typo in
> > their filter spec?
> >
> > Then don't emit a message at all for the "not requested" case.  And you
> > can use the Git version number to know how to interpret it.  There are
> > other places where we don't bother sending messages where the value is
> > a zero or empty.
> 
> Sounds alright.  We could standardize the other way, which might
> make the interpretation of individual trace entries independent of
> the context easier, though.
> 
> Thanks.

Thanks for bringing up the use case of debugging a server that we
expected to support filtering but doesn't. As for whether we should not
send a message when the value is empty, I can see at least one reason
for not sending it - to not waste I/O and clutter the trace because of a
feature that the user is not using, but fetch is I/O-intensive enough
and having the empty message is useful enough (not only do we not need
to know which versions have this feature, but we also can be sure that
the message wasn't excluded because of some unexpected log filtering or
something like that) that I think we should have the empty message. I'll
put it in v2 but we can easily remove it if we decide that we don't want
it.
Junio C Hamano July 18, 2022, 5:56 p.m. UTC | #9
Jonathan Tan <jonathantanmy@google.com> writes:

> ... that the user is not using, but fetch is I/O-intensive enough
> and having the empty message is useful enough (not only do we not need
> to know which versions have this feature, but we also can be sure that
> the message wasn't excluded because of some unexpected log filtering or
> something like that) that I think we should have the empty message. I'll
> put it in v2 but we can easily remove it if we decide that we don't want
> it.

As long as it is clearly documented that "none" is given explicitly
when no filtering is requested (Jeff's point of making it possible
to tell the reason why we are not requesting is still valid), I
think it would be OK.  If we are giving enough log entries for
"fetch" operation, another "empty" message would not hurt (if the
trace for "fetch" is otherwise very quiet, then an unused "empty"
may become distracting, but I somehow do not think it is the case).

Thanks.
diff mbox series

Patch

diff --git a/fetch-pack.c b/fetch-pack.c
index cb6647d657..dec8743bec 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -392,7 +392,10 @@  static int find_common(struct fetch_negotiator *negotiator,
 	if (server_supports_filtering && args->filter_options.choice) {
 		const char *spec =
 			expand_list_objects_filter_spec(&args->filter_options);
+		trace2_data_string("fetch", the_repository, "fetch/effective-filter", spec);
 		packet_buf_write(&req_buf, "filter %s", spec);
+	} else {
+		trace2_data_string("fetch", the_repository, "fetch/effective-filter", "none");
 	}
 	packet_buf_flush(&req_buf);
 	state_len = req_buf.len;
@@ -1328,9 +1331,12 @@  static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out,
 		const char *spec =
 			expand_list_objects_filter_spec(&args->filter_options);
 		print_verbose(args, _("Server supports filter"));
+		trace2_data_string("fetch", the_repository, "fetch/effective-filter", spec);
 		packet_buf_write(&req_buf, "filter %s", spec);
-	} else if (args->filter_options.choice) {
-		warning("filtering not recognized by server, ignoring");
+	} else {
+		if (args->filter_options.choice)
+			warning("filtering not recognized by server, ignoring");
+		trace2_data_string("fetch", the_repository, "fetch/effective-filter", "none");
 	}
 
 	if (server_supports_feature("fetch", "packfile-uris", 0)) {