Message ID | d04028c3c7574e3ca0f9c1b3d711192ca756158d.1604355792.git.steadmon@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Advertise trace2 SID in protocol capabilities | expand |
Josh Steadmon <steadmon@google.com> writes: > +trace2-sid=<session-id> > +----------------------- > + > +If trace2 tracing is enabled on the server, it may advertise its session ID via > +this capability. The client may choose to log the server's session ID in its > +trace logs, and advertise its own session ID back to the server for it to log > +as well. This allows for easier debugging of remote sessions when both client > +and server logs are available. > diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt > index e597b74da3..a5b9ef04f6 100644 > --- a/Documentation/technical/protocol-v2.txt > +++ b/Documentation/technical/protocol-v2.txt > @@ -492,3 +492,12 @@ form `object-format=X`) to notify the client that the server is able to deal > with objects using hash algorithm X. If not specified, the server is assumed to > only handle SHA-1. If the client would like to use a hash algorithm other than > SHA-1, it should specify its object-format string. > + > +trace2-sid=<session-id> > +~~~~~~~~~~~~~~~~~~~~~~~ > + > +If trace2 tracing is enabled on the server, it may advertise its session ID via > +this capability. The client may choose to log the server's session ID in its > +trace logs, and advertise its own session ID back to the server for it to log > +as well. This allows for easier debugging of remote sessions when both client > +and server logs are available. Have we documented what a session-id should look like anywhere in the documentation? This document is to help third-party to write implementations of the protocol, but the above description leaves things "implementation defined" a bit too much, I am afraid. For example, as this must fit on a single pkt-line as an advertised capability, there would be some length limit. Are there other inherent limitations due to our protocol? Are there some artificial limitations that we may want to impose to make it easier to harden implementations against common mistakes? For example are bytes in <session-id> allowed to contain LF, CR, NUL, etc.?
On 11/3/20 4:33 PM, Junio C Hamano wrote: > Josh Steadmon <steadmon@google.com> writes: > >> +trace2-sid=<session-id> >> +----------------------- >> + >> +If trace2 tracing is enabled on the server, it may advertise its session ID via >> +this capability. The client may choose to log the server's session ID in its >> +trace logs, and advertise its own session ID back to the server for it to log >> +as well. This allows for easier debugging of remote sessions when both client >> +and server logs are available. >> diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt >> index e597b74da3..a5b9ef04f6 100644 >> --- a/Documentation/technical/protocol-v2.txt >> +++ b/Documentation/technical/protocol-v2.txt >> @@ -492,3 +492,12 @@ form `object-format=X`) to notify the client that the server is able to deal >> with objects using hash algorithm X. If not specified, the server is assumed to >> only handle SHA-1. If the client would like to use a hash algorithm other than >> SHA-1, it should specify its object-format string. >> + >> +trace2-sid=<session-id> >> +~~~~~~~~~~~~~~~~~~~~~~~ >> + >> +If trace2 tracing is enabled on the server, it may advertise its session ID via >> +this capability. The client may choose to log the server's session ID in its >> +trace logs, and advertise its own session ID back to the server for it to log >> +as well. This allows for easier debugging of remote sessions when both client >> +and server logs are available. > > Have we documented what a session-id should look like anywhere in > the documentation? This document is to help third-party to write > implementations of the protocol, but the above description leaves > things "implementation defined" a bit too much, I am afraid. > > For example, as this must fit on a single pkt-line as an advertised > capability, there would be some length limit. Are there other > inherent limitations due to our protocol? Are there some artificial > limitations that we may want to impose to make it easier to harden > implementations against common mistakes? For example are bytes in > <session-id> allowed to contain LF, CR, NUL, etc.? > The computed part of trace2 SIDs are relatively safe (both for length and funky characters). And funky characters are protected by JSON encoding rules when writing to the GIT_TRACE2_EVENT target. And I think the computed part would be safe in this context. I've not seen commands that spawn more than about 6 levels of child processes, and those would be fine here. However, the opportunity to introduce a prefix as I suggested earlier does offer the opportunity to introduce funky chars that would not have any protection, so you may want to c-quote the string when inserting it into the wire protocol. $ GIT_TRACE2_EVENT=1 GIT_TRACE2_PARENT_SID=hello git version {"event":"version","sid":"hello/20201030T154143.660608Z-H86606a97- P00001d30",...} ... (Allowing the user to insert a prefix like that has been useful for tracking control/experiment testing, so I wouldn't want to disable it.) Jeff
On 2020.11.03 13:33, Junio C Hamano wrote: > Josh Steadmon <steadmon@google.com> writes: > > > +trace2-sid=<session-id> > > +----------------------- > > + > > +If trace2 tracing is enabled on the server, it may advertise its session ID via > > +this capability. The client may choose to log the server's session ID in its > > +trace logs, and advertise its own session ID back to the server for it to log > > +as well. This allows for easier debugging of remote sessions when both client > > +and server logs are available. > > diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt > > index e597b74da3..a5b9ef04f6 100644 > > --- a/Documentation/technical/protocol-v2.txt > > +++ b/Documentation/technical/protocol-v2.txt > > @@ -492,3 +492,12 @@ form `object-format=X`) to notify the client that the server is able to deal > > with objects using hash algorithm X. If not specified, the server is assumed to > > only handle SHA-1. If the client would like to use a hash algorithm other than > > SHA-1, it should specify its object-format string. > > + > > +trace2-sid=<session-id> > > +~~~~~~~~~~~~~~~~~~~~~~~ > > + > > +If trace2 tracing is enabled on the server, it may advertise its session ID via > > +this capability. The client may choose to log the server's session ID in its > > +trace logs, and advertise its own session ID back to the server for it to log > > +as well. This allows for easier debugging of remote sessions when both client > > +and server logs are available. > > Have we documented what a session-id should look like anywhere in > the documentation? This document is to help third-party to write > implementations of the protocol, but the above description leaves > things "implementation defined" a bit too much, I am afraid. > > For example, as this must fit on a single pkt-line as an advertised > capability, there would be some length limit. Are there other > inherent limitations due to our protocol? Are there some artificial > limitations that we may want to impose to make it easier to harden > implementations against common mistakes? For example are bytes in > <session-id> allowed to contain LF, CR, NUL, etc.? Documented in V3. For argument's sake, I'm going to say that the tokens should be limited to printable, non-whitespace characters, and should fit on a single pkt-line.
On Thu, Nov 05 2020, Jeff Hostetler wrote: > However, the opportunity to introduce a prefix as I suggested earlier > does offer the opportunity to introduce funky chars that would not > have any protection, so you may want to c-quote the string when > inserting it into the wire protocol. > > $ GIT_TRACE2_EVENT=1 GIT_TRACE2_PARENT_SID=hello git version > {"event":"version","sid":"hello/20201030T154143.660608Z-H86606a97- > P00001d30",...} > ... > > (Allowing the user to insert a prefix like that has been useful for > tracking control/experiment testing, so I wouldn't want to disable > it.) AFAICT the way it's documented now is the "is the session-id[...]" paragraph in api-trace2.txt. I'd like to see us document the implementation a bit better and explicitly support the "hack" of setting GIT_TRACE2_PARENT_SID=hello. I.e. maybe I've missed something but we just say "session-id is prepended with the session-id of the parent" but don't mention that we separate them with slashes, so you can split on that to get the depth & individual ID's. My reading of the updated doc patch in v3 is that not allowing "non-printable or whitespace" allows you to e.g. have slashes in those custom session IDs, which would be quite inconvenient since it would break that property. And we should explicitly support the GIT_TRACE2_PARENT_SID=* setting from an external process, and make the SID definition loose enough to allow for SIDs that don't look like Git's in that chain. I.e. a useful property (and one I've seen in the wild) is to have some external programt that already has SIDs/UUID run IDs spawn git, setting GIT_TRACE2_PARENT_SID=<that program's SID> makes things convenient for the purposes of logging.n
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > AFAICT the way it's documented now is the "is the session-id[...]" > paragraph in api-trace2.txt. > > I'd like to see us document the implementation a bit better and > explicitly support the "hack" of setting GIT_TRACE2_PARENT_SID=hello. > > I.e. maybe I've missed something but we just say "session-id is > prepended with the session-id of the parent" but don't mention that we > separate them with slashes, so you can split on that to get the depth & > individual ID's. > > My reading of the updated doc patch in v3 is that not allowing > "non-printable or whitespace" allows you to e.g. have slashes in those > custom session IDs, which would be quite inconvenient since it would > break that property. A few things I want to see stakeholders agree on: - In "a/b/c", what's the "session ID" of the leaf child process? "a/b/c"? or "c"? If the former (which is what I am guessing), do we have name to refer to "b" or "c" alone (if not, we should have one)? - Do we encourage/force other implementations of Git protocol to adopt a similar "slash-separated non-whitespace ASCII printable" structure? I do not think such a structure is too limiting but others may feel differently. Or is a "session ID" supposed to be an opaque token without any structure, and upon seeing "a/b/c" the recipient should not read anything into its slash, or any relation with another session whose ID is "a/b/d"? > And we should explicitly support the GIT_TRACE2_PARENT_SID=* setting > from an external process, and make the SID definition loose enough to > allow for SIDs that don't look like Git's in that chain. I.e. a useful > property (and one I've seen in the wild) is to have some external > programt that already has SIDs/UUID run IDs spawn git, setting > GIT_TRACE2_PARENT_SID=<that program's SID> makes things convenient for > the purposes of logging.n
On 11/12/20 12:32 PM, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> AFAICT the way it's documented now is the "is the session-id[...]" >> paragraph in api-trace2.txt. >> >> I'd like to see us document the implementation a bit better and >> explicitly support the "hack" of setting GIT_TRACE2_PARENT_SID=hello. I've occasionally used that hack for control/experiment-type testing, but not that often. I was more pointing out that I had to use that environment inheritance mechanism so that child processes can be associated with their Git process ancestry. And so it is possible for someone to abuse that mechanism for other purposes (and introduce injections into what Josh is proposing). >> >> I.e. maybe I've missed something but we just say "session-id is >> prepended with the session-id of the parent" but don't mention that we >> separate them with slashes, so you can split on that to get the depth & >> individual ID's. >> >> My reading of the updated doc patch in v3 is that not allowing >> "non-printable or whitespace" allows you to e.g. have slashes in those >> custom session IDs, which would be quite inconvenient since it would >> break that property. > > A few things I want to see stakeholders agree on: > > - In "a/b/c", what's the "session ID" of the leaf child process? > "a/b/c"? or "c"? If the former (which is what I am guessing), > do we have name to refer to "b" or "c" alone (if not, we should > have one)? I consider a process' SID to be the complete "a/b/c" string. But I do know that when I look at my logging data, that I will also find data for a process with SID of "a" and data for another process with SID "a/b". So perhaps we should have names for: [1] "a/b/c" -- my process' complete SID name [2] "c" -- my process' SID component name [3] "a/b" -- my parent's complete SID name > > - Do we encourage/force other implementations of Git protocol to > adopt a similar "slash-separated non-whitespace ASCII printable" > structure? I do not think such a structure is too limiting but > others may feel differently. Or is a "session ID" supposed to be > an opaque token without any structure, and upon seeing "a/b/c" > the recipient should not read anything into its slash, or any > relation with another session whose ID is "a/b/d"? When analyzing Git perf data, there are times when you basically want to "SELECT * where SID startswith 'a/b/' ..." and summarize over the child processes of "a/b". So data from "a/b/c" and "a/b/d" would be aggregated. (I do have some of that data in the "child_exit" events reported for the "a/b" process, but sometimes you need to actually see the records for the child processes.) So I guess I'm saying that the hierarchy has been useful and we should try to keep it as is. > >> And we should explicitly support the GIT_TRACE2_PARENT_SID=* setting >> from an external process, and make the SID definition loose enough to >> allow for SIDs that don't look like Git's in that chain. I.e. a useful >> property (and one I've seen in the wild) is to have some external >> programt that already has SIDs/UUID run IDs spawn git, setting >> GIT_TRACE2_PARENT_SID=<that program's SID> makes things convenient for >> the purposes of logging.n Yes, it can be useful for external tools that drive Git to be able to set a SID prefix to help track that into Git process. Likewise, it would be nice to add code to some of the Git shell script commands (such as git-mergetool and git-prompt) to augment the SID being passed to child Git commands to help track why they are being invoked. Jeff
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt index ba869a7d36..73d4ee7f27 100644 --- a/Documentation/technical/protocol-capabilities.txt +++ b/Documentation/technical/protocol-capabilities.txt @@ -27,8 +27,8 @@ and 'push-cert' capabilities are sent and recognized by the receive-pack (push to server) process. The 'ofs-delta' and 'side-band-64k' capabilities are sent and recognized -by both upload-pack and receive-pack protocols. The 'agent' capability -may optionally be sent in both protocols. +by both upload-pack and receive-pack protocols. The 'agent' and 'trace2-sid' +capabilities may optionally be sent in both protocols. All other capabilities are only recognized by the upload-pack (fetch from server) process. @@ -365,3 +365,12 @@ If the upload-pack server advertises the 'filter' capability, fetch-pack may send "filter" commands to request a partial clone or partial fetch and request that the server omit various objects from the packfile. + +trace2-sid=<session-id> +----------------------- + +If trace2 tracing is enabled on the server, it may advertise its session ID via +this capability. The client may choose to log the server's session ID in its +trace logs, and advertise its own session ID back to the server for it to log +as well. This allows for easier debugging of remote sessions when both client +and server logs are available. diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt index e597b74da3..a5b9ef04f6 100644 --- a/Documentation/technical/protocol-v2.txt +++ b/Documentation/technical/protocol-v2.txt @@ -492,3 +492,12 @@ form `object-format=X`) to notify the client that the server is able to deal with objects using hash algorithm X. If not specified, the server is assumed to only handle SHA-1. If the client would like to use a hash algorithm other than SHA-1, it should specify its object-format string. + +trace2-sid=<session-id> +~~~~~~~~~~~~~~~~~~~~~~~ + +If trace2 tracing is enabled on the server, it may advertise its session ID via +this capability. The client may choose to log the server's session ID in its +trace logs, and advertise its own session ID back to the server for it to log +as well. This allows for easier debugging of remote sessions when both client +and server logs are available.
In future patches, we will add the ability for Git servers and clients to advertise their trace2 session IDs via protocol capabilities. This allows for easier debugging when both client and server logs are available. Signed-off-by: Josh Steadmon <steadmon@google.com> --- Documentation/technical/protocol-capabilities.txt | 13 +++++++++++-- Documentation/technical/protocol-v2.txt | 9 +++++++++ 2 files changed, 20 insertions(+), 2 deletions(-)