Message ID | 20220715172943.2681492-1-jonathantanmy@google.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | fetch-pack: write effective filter to trace2 | expand |
On 7/15/22 1:29 PM, Jonathan Tan wrote: > Administrators of a managed Git environment (like the one at $DAYJOB) > might want to quantify the performance change of fetches with and > without partial clone from the client's point of view. Therefore, log > the effective filter being sent to the server whenever a fetch (or > clone) occurs. Note that this is not necessarily the same as what's > specified on the CLI, because during a fetch, the configured filter is > used whenever a filter is not specified on the CLI. > > This is implemented for protocol v0, v1, and v2. > > Signed-off-by: Jonathan Tan <jonathantanmy@google.com> > --- > fetch-pack.c | 10 ++++++++-- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/fetch-pack.c b/fetch-pack.c > index cb6647d657..dec8743bec 100644 > --- a/fetch-pack.c > +++ b/fetch-pack.c > @@ -392,7 +392,10 @@ static int find_common(struct fetch_negotiator *negotiator, > if (server_supports_filtering && args->filter_options.choice) { > const char *spec = > expand_list_objects_filter_spec(&args->filter_options); > + trace2_data_string("fetch", the_repository, "fetch/effective-filter", spec); > packet_buf_write(&req_buf, "filter %s", spec); > + } else { > + trace2_data_string("fetch", the_repository, "fetch/effective-filter", "none"); > } > packet_buf_flush(&req_buf); > state_len = req_buf.len; > @@ -1328,9 +1331,12 @@ static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out, > const char *spec = > expand_list_objects_filter_spec(&args->filter_options); > print_verbose(args, _("Server supports filter")); > + trace2_data_string("fetch", the_repository, "fetch/effective-filter", spec); > packet_buf_write(&req_buf, "filter %s", spec); > - } else if (args->filter_options.choice) { > - warning("filtering not recognized by server, ignoring"); > + } else { > + if (args->filter_options.choice) > + warning("filtering not recognized by server, ignoring"); > + trace2_data_string("fetch", the_repository, "fetch/effective-filter", "none"); > } > > if (server_supports_feature("fetch", "packfile-uris", 0)) { > This looks nice. Thanks! Jeff
Jeff Hostetler <git@jeffhostetler.com> writes: > On 7/15/22 1:29 PM, Jonathan Tan wrote: >> Administrators of a managed Git environment (like the one at $DAYJOB) >> might want to quantify the performance change of fetches with and >> without partial clone from the client's point of view. Therefore, log >> the effective filter being sent to the server whenever a fetch (or >> clone) occurs. Note that this is not necessarily the same as what's >> specified on the CLI, because during a fetch, the configured filter is >> used whenever a filter is not specified on the CLI. >> This is implemented for protocol v0, v1, and v2. Is that different to say "for all protocols"? I am wondering if it is worth saying (unlike in a hypothetical case where we do not support v0 and v1 we may want to state why we only support v2). >> Signed-off-by: Jonathan Tan <jonathantanmy@google.com> >> --- >> fetch-pack.c | 10 ++++++++-- >> 1 file changed, 8 insertions(+), 2 deletions(-) >> diff --git a/fetch-pack.c b/fetch-pack.c >> index cb6647d657..dec8743bec 100644 >> --- a/fetch-pack.c >> +++ b/fetch-pack.c >> @@ -392,7 +392,10 @@ static int find_common(struct fetch_negotiator *negotiator, >> if (server_supports_filtering && args->filter_options.choice) { >> const char *spec = >> expand_list_objects_filter_spec(&args->filter_options); >> + trace2_data_string("fetch", the_repository, "fetch/effective-filter", spec); >> packet_buf_write(&req_buf, "filter %s", spec); >> + } else { >> + trace2_data_string("fetch", the_repository, "fetch/effective-filter", "none"); Do we show "none" anywhere else where an expanded list objects filter spec is expected? I am wondering two things: - The lack of this line would be a cleaner implementation of a signal to say "this client did not ask any filtering". - It would be good if we keep what report here more-or-less the same as what we can pass "--filter=" on the command line of "git pack-objects". If "--filter=none" meant "this --filter passes everything", then saying "none" here makes perfect sense wrt the latter, but I doubt it is the case. >> } >> packet_buf_flush(&req_buf); >> state_len = req_buf.len; >> @@ -1328,9 +1331,12 @@ static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out, >> const char *spec = >> expand_list_objects_filter_spec(&args->filter_options); >> print_verbose(args, _("Server supports filter")); >> + trace2_data_string("fetch", the_repository, "fetch/effective-filter", spec); >> packet_buf_write(&req_buf, "filter %s", spec); >> - } else if (args->filter_options.choice) { >> - warning("filtering not recognized by server, ignoring"); >> + } else { >> + if (args->filter_options.choice) >> + warning("filtering not recognized by server, ignoring"); >> + trace2_data_string("fetch", the_repository, "fetch/effective-filter", "none"); At the first glance, this seems to lose data, because you should be able to use expand_list_objects_filter_spec() to report the filter spec. But this is reporting the filter that was actually in effect, so it is OK. But the same question about "none" remains. >> } >> if (server_supports_feature("fetch", "packfile-uris", 0)) { >> > > This looks nice. Thanks! > Jeff Will queue with your Acked-by, then? Thanks, both.
Junio C Hamano <gitster@pobox.com> writes: > Jeff Hostetler <git@jeffhostetler.com> writes: > > > On 7/15/22 1:29 PM, Jonathan Tan wrote: > >> Administrators of a managed Git environment (like the one at $DAYJOB) > >> might want to quantify the performance change of fetches with and > >> without partial clone from the client's point of view. Therefore, log > >> the effective filter being sent to the server whenever a fetch (or > >> clone) occurs. Note that this is not necessarily the same as what's > >> specified on the CLI, because during a fetch, the configured filter is > >> used whenever a filter is not specified on the CLI. > >> This is implemented for protocol v0, v1, and v2. > > Is that different to say "for all protocols"? I am wondering if it > is worth saying (unlike in a hypothetical case where we do not > support v0 and v1 we may want to state why we only support v2). I wrote it that way to avoid confusion with things like HTTP (which is a protocol, at least in its name). Maybe better would be "This is implemented for all protocols (v0, v1, and v2)". I'll make that change in the commit message (after the other questions are discussed). > >> diff --git a/fetch-pack.c b/fetch-pack.c > >> index cb6647d657..dec8743bec 100644 > >> --- a/fetch-pack.c > >> +++ b/fetch-pack.c > >> @@ -392,7 +392,10 @@ static int find_common(struct fetch_negotiator *negotiator, > >> if (server_supports_filtering && args->filter_options.choice) { > >> const char *spec = > >> expand_list_objects_filter_spec(&args->filter_options); > >> + trace2_data_string("fetch", the_repository, "fetch/effective-filter", spec); > >> packet_buf_write(&req_buf, "filter %s", spec); > >> + } else { > >> + trace2_data_string("fetch", the_repository, "fetch/effective-filter", "none"); > > Do we show "none" anywhere else where an expanded list objects > filter spec is expected? Hmm...no, we don't. > I am wondering two things: > > - The lack of this line would be a cleaner implementation of a > signal to say "this client did not ask any filtering". I think the presence is important to distinguish "no filtering" versus someone using an old Git version that does not emit such traces, but I'm open to changing the "none" to "none-specified" or something like that. > - It would be good if we keep what report here more-or-less the > same as what we can pass "--filter=" on the command line of > "git pack-objects". My intent is to report what is being sent to the server in the fetch request. > If "--filter=none" meant "this --filter passes everything", then > saying "none" here makes perfect sense wrt the latter, but I doubt > it is the case. --filter=none does not work (a user would have to specify --no-filter), although it conceptually makes sense. I just wanted to have something show up to indicate that we are using a Git version that would emit a trace.
Jonathan Tan <jonathantanmy@google.com> writes: >> >> This is implemented for protocol v0, v1, and v2. >> >> Is that different to say "for all protocols"? I am wondering if it >> is worth saying (unlike in a hypothetical case where we do not >> support v0 and v1 we may want to state why we only support v2). > > I wrote it that way to avoid confusion with things like HTTP (which is a > protocol, at least in its name). Maybe better would be "This is > implemented for all protocols (v0, v1, and v2)". I still wonder if it is better left unsaid. When adding a new thing, it would not be noteworthy if the new thing is available no matter what protocol is being used, no? >> >> + trace2_data_string("fetch", the_repository, "fetch/effective-filter", spec); >> >> packet_buf_write(&req_buf, "filter %s", spec); >> >> + } else { >> >> + trace2_data_string("fetch", the_repository, "fetch/effective-filter", "none"); >> >> Do we show "none" anywhere else where an expanded list objects >> filter spec is expected? > > Hmm...no, we don't. > >> I am wondering two things: >> >> - The lack of this line would be a cleaner implementation of a >> signal to say "this client did not ask any filtering". > > I think the presence is important to distinguish "no filtering" versus > someone using an old Git version that does not emit such traces, but I'm > open to changing the "none" to "none-specified" or something like that. OK. >> - It would be good if we keep what report here more-or-less the >> same as what we can pass "--filter=" on the command line of >> "git pack-objects". > > My intent is to report what is being sent to the server in the fetch > request. Then I'd be OK with a design that reports "none", if we send "none" to the server in this case. If not, then I do not think we should. Perhaps report an empty string or not reporting at all? After all, the reader knows the client version and capability in the log so it is easy to tell between 'no filter was used' and 'too old to report the filter', no? I dunno.
Junio C Hamano <gitster@pobox.com> writes: > Jonathan Tan <jonathantanmy@google.com> writes: > > >> >> This is implemented for protocol v0, v1, and v2. > >> > >> Is that different to say "for all protocols"? I am wondering if it > >> is worth saying (unlike in a hypothetical case where we do not > >> support v0 and v1 we may want to state why we only support v2). > > > > I wrote it that way to avoid confusion with things like HTTP (which is a > > protocol, at least in its name). Maybe better would be "This is > > implemented for all protocols (v0, v1, and v2)". > > I still wonder if it is better left unsaid. When adding a new > thing, it would not be noteworthy if the new thing is available no > matter what protocol is being used, no? I was thinking that a reviewer would think "why is this done twice" and wrote this comment to answer this issue in advance, but it makes sense to just remove it. OK - I'll do that. > > My intent is to report what is being sent to the server in the fetch > > request. > > Then I'd be OK with a design that reports "none", if we send "none" > to the server in this case. If not, then I do not think we should. > > Perhaps report an empty string or not reporting at all? After all, > the reader knows the client version and capability in the log so it > is easy to tell between 'no filter was used' and 'too old to report > the filter', no? I dunno. We don't send "none" to the server. If an empty string is OK then I'll use that, since it will make it slightly easier for analysis since we do not have to check against the version (and also to know which version has support for this feature). But I'm OK either way.
On 7/15/22 2:28 PM, Junio C Hamano wrote: > Jeff Hostetler <git@jeffhostetler.com> writes: > >> On 7/15/22 1:29 PM, Jonathan Tan wrote: ... >>> + trace2_data_string("fetch", the_repository, "fetch/effective-filter", "none"); > > At the first glance, this seems to lose data, because you should be > able to use expand_list_objects_filter_spec() to report the filter > spec. But this is reporting the filter that was actually in effect, > so it is OK. > > But the same question about "none" remains. Yeah, the use of "none" gave me pause, but I didn't have a better idea at the time. I guess we have: (a) requested, supported, used. (b) "none used because the server doesn't support it" and (c) "none used because the user didn't request it" cases, right? Perhaps it would be better to do: if (server_supports_filtering && args->filter_options.choice) trace2_data_string("fetch", r, "filter/effective", spec); else trace2_data_string("fetch", r, "filter/unsupported", spec); Using two different keywords. So that the log only contains "filter/effective" when it was actually used. And there is no "filter/effective" event when (for whatever reason) it was not in effect. Then the "filter/unsupported" event helps you with debugging. Did they hit a server that doesn't support filtering or did they have a typo in their filter spec? Then don't emit a message at all for the "not requested" case. And you can use the Git version number to know how to interpret it. There are other places where we don't bother sending messages where the value is a zero or empty. Jeff
Jeff Hostetler <git@jeffhostetler.com> writes: > Yeah, the use of "none" gave me pause, but I didn't have a better idea > at the time. I guess we have: > (a) requested, supported, used. > (b) "none used because the server doesn't support it" and > (c) "none used because the user didn't request it" cases, > right? At these sites where the new traces are added, we cannot detect the remaining case (d) "requested by the user, asked for the server, but the server dropped the ball", I think the above covers the possible cases that are interesting to us entirely. > Perhaps it would be better to do: > if (server_supports_filtering && args->filter_options.choice) > trace2_data_string("fetch", r, "filter/effective", spec); > else > trace2_data_string("fetch", r, "filter/unsupported", spec); > > Using two different keywords. > > So that the log only contains "filter/effective" when it was actually > used. And there is no "filter/effective" event when (for whatever > reason) it was not in effect. > > Then the "filter/unsupported" event helps you with debugging. Did they > hit a server that doesn't support filtering or did they have a typo in > their filter spec? > > Then don't emit a message at all for the "not requested" case. And you > can use the Git version number to know how to interpret it. There are > other places where we don't bother sending messages where the value is > a zero or empty. Sounds alright. We could standardize the other way, which might make the interpretation of individual trace entries independent of the context easier, though. Thanks.
Junio C Hamano <gitster@pobox.com> writes: > > Using two different keywords. > > > > So that the log only contains "filter/effective" when it was actually > > used. And there is no "filter/effective" event when (for whatever > > reason) it was not in effect. > > > > Then the "filter/unsupported" event helps you with debugging. Did they > > hit a server that doesn't support filtering or did they have a typo in > > their filter spec? > > > > Then don't emit a message at all for the "not requested" case. And you > > can use the Git version number to know how to interpret it. There are > > other places where we don't bother sending messages where the value is > > a zero or empty. > > Sounds alright. We could standardize the other way, which might > make the interpretation of individual trace entries independent of > the context easier, though. > > Thanks. Thanks for bringing up the use case of debugging a server that we expected to support filtering but doesn't. As for whether we should not send a message when the value is empty, I can see at least one reason for not sending it - to not waste I/O and clutter the trace because of a feature that the user is not using, but fetch is I/O-intensive enough and having the empty message is useful enough (not only do we not need to know which versions have this feature, but we also can be sure that the message wasn't excluded because of some unexpected log filtering or something like that) that I think we should have the empty message. I'll put it in v2 but we can easily remove it if we decide that we don't want it.
Jonathan Tan <jonathantanmy@google.com> writes: > ... that the user is not using, but fetch is I/O-intensive enough > and having the empty message is useful enough (not only do we not need > to know which versions have this feature, but we also can be sure that > the message wasn't excluded because of some unexpected log filtering or > something like that) that I think we should have the empty message. I'll > put it in v2 but we can easily remove it if we decide that we don't want > it. As long as it is clearly documented that "none" is given explicitly when no filtering is requested (Jeff's point of making it possible to tell the reason why we are not requesting is still valid), I think it would be OK. If we are giving enough log entries for "fetch" operation, another "empty" message would not hurt (if the trace for "fetch" is otherwise very quiet, then an unused "empty" may become distracting, but I somehow do not think it is the case). Thanks.
diff --git a/fetch-pack.c b/fetch-pack.c index cb6647d657..dec8743bec 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -392,7 +392,10 @@ static int find_common(struct fetch_negotiator *negotiator, if (server_supports_filtering && args->filter_options.choice) { const char *spec = expand_list_objects_filter_spec(&args->filter_options); + trace2_data_string("fetch", the_repository, "fetch/effective-filter", spec); packet_buf_write(&req_buf, "filter %s", spec); + } else { + trace2_data_string("fetch", the_repository, "fetch/effective-filter", "none"); } packet_buf_flush(&req_buf); state_len = req_buf.len; @@ -1328,9 +1331,12 @@ static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out, const char *spec = expand_list_objects_filter_spec(&args->filter_options); print_verbose(args, _("Server supports filter")); + trace2_data_string("fetch", the_repository, "fetch/effective-filter", spec); packet_buf_write(&req_buf, "filter %s", spec); - } else if (args->filter_options.choice) { - warning("filtering not recognized by server, ignoring"); + } else { + if (args->filter_options.choice) + warning("filtering not recognized by server, ignoring"); + trace2_data_string("fetch", the_repository, "fetch/effective-filter", "none"); } if (server_supports_feature("fetch", "packfile-uris", 0)) {
Administrators of a managed Git environment (like the one at $DAYJOB) might want to quantify the performance change of fetches with and without partial clone from the client's point of view. Therefore, log the effective filter being sent to the server whenever a fetch (or clone) occurs. Note that this is not necessarily the same as what's specified on the CLI, because during a fetch, the configured filter is used whenever a filter is not specified on the CLI. This is implemented for protocol v0, v1, and v2. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- fetch-pack.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)