Message ID | 20240619125708.3719150-1-christian.couder@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | Advertise OS version | expand |
Hello Christian, On 2024-06-19 14:57, Christian Couder wrote: > For debugging and statistical purposes, it can be useful for Git > servers to know the OS the client are using. > > So let's add a new 'os-version' capability to the v2 protocol, in the > same way as the existing 'agent' capability that lets clients and > servers exchange the Git version they are running. > > This sends the same info as `git bugreport` is already sending, which > uses uname(2). It should be the same as what `uname -srvm` returns, > except that it is sanitized in the same way as the Git version sent by > the 'agent' capability is sanitized (by replacing character having an > ascii code less than 32 or more than 127 with '.'). This may probably be a useful debugging feature, but I strongly suggest that a configuration knob exists that makes disabling it possible. For security reasons, some users may not want to publicly advertise their OSes and kernel versions. Count me in as one of such users. :) > CI tests are currently failing on Windows as it looks like uname(1) > and uname(2) don't report the same thing: > > > -os-version=MINGW64_NT-10.0-20348.3.4.10-87d57229.x86_64.2024-02-14.20:17.UTC.x86_64 > +os-version=Windows.10.0.20348 > > (See: https://github.com/chriscool/git/actions/runs/9581822699) > > Thoughts? > > Christian Couder (3): > version: refactor strbuf_sanitize() > version: refactor get_uname_info() > connect: advertise OS version > > Documentation/gitprotocol-v2.txt | 18 +++++++++ > builtin/bugreport.c | 13 +------ > connect.c | 3 ++ > serve.c | 12 ++++++ > t/t5555-http-smart-common.sh | 3 ++ > t/t5701-git-serve.sh | 3 ++ > version.c | 67 ++++++++++++++++++++++++++++---- > version.h | 10 +++++ > 8 files changed, 111 insertions(+), 18 deletions(-)
On Wed, Jun 19, 2024 at 03:18:40PM +0200, Dragan Simic wrote: > Hello Christian, > > On 2024-06-19 14:57, Christian Couder wrote: > > For debugging and statistical purposes, it can be useful for Git > > servers to know the OS the client are using. > > > > So let's add a new 'os-version' capability to the v2 protocol, in the > > same way as the existing 'agent' capability that lets clients and > > servers exchange the Git version they are running. > > > > This sends the same info as `git bugreport` is already sending, which > > uses uname(2). It should be the same as what `uname -srvm` returns, > > except that it is sanitized in the same way as the Git version sent by > > the 'agent' capability is sanitized (by replacing character having an > > ascii code less than 32 or more than 127 with '.'). > > This may probably be a useful debugging feature, but I strongly > suggest that a configuration knob exists that makes disabling it > possible. For security reasons, some users may not want to > publicly advertise their OSes and kernel versions. Count me in > as one of such users. :) Agreed. We do send the Git version, which is already a slight privacy issue (though it can be overridden at both build-time and run-time). But OS details seems like crossing a line to me. I don't mind if this is present but disabled by default, but then I guess it is not really serving much of a purpose, as hardly anybody would enable it. Which makes collecting large-scale statistics by hosting providers pretty much useless (and I don't think it is all that useful for debugging individual cases). -Peff
Hello Jeff, On 2024-06-19 15:45, Jeff King wrote: > On Wed, Jun 19, 2024 at 03:18:40PM +0200, Dragan Simic wrote: >> On 2024-06-19 14:57, Christian Couder wrote: >> > For debugging and statistical purposes, it can be useful for Git >> > servers to know the OS the client are using. >> > >> > So let's add a new 'os-version' capability to the v2 protocol, in the >> > same way as the existing 'agent' capability that lets clients and >> > servers exchange the Git version they are running. >> > >> > This sends the same info as `git bugreport` is already sending, which >> > uses uname(2). It should be the same as what `uname -srvm` returns, >> > except that it is sanitized in the same way as the Git version sent by >> > the 'agent' capability is sanitized (by replacing character having an >> > ascii code less than 32 or more than 127 with '.'). >> >> This may probably be a useful debugging feature, but I strongly >> suggest that a configuration knob exists that makes disabling it >> possible. For security reasons, some users may not want to >> publicly advertise their OSes and kernel versions. Count me in >> as one of such users. :) > > Agreed. We do send the Git version, which is already a slight privacy > issue (though it can be overridden at both build-time and run-time). > But > OS details seems like crossing a line to me. > > I don't mind if this is present but disabled by default, but then I > guess it is not really serving much of a purpose, as hardly anybody > would enable it. Which makes collecting large-scale statistics by > hosting providers pretty much useless (and I don't think it is all that > useful for debugging individual cases). I agree that it should actually be disabled by default, for privacy and security reasons, but that would actually defeat its purpose, so I'm not really sure should it be merged.
On Wed, Jun 19, 2024 at 3:50 PM Dragan Simic <dsimic@manjaro.org> wrote: > > I don't mind if this is present but disabled by default, but then I > > guess it is not really serving much of a purpose, as hardly anybody > > would enable it. Which makes collecting large-scale statistics by > > hosting providers pretty much useless (and I don't think it is all that > > useful for debugging individual cases). > > I agree that it should actually be disabled by default, for privacy > and security reasons, but that would actually defeat its purpose, so > I'm not really sure should it be merged. One possibility is to send just the `sysname`, described as 'Operating system name (e.g., "Linux")', field of the struct utsname filled out by uname(2) by default. It should be the same as what `uname -s` prints, so "Linux" for a Linux machine, and might be acceptable regarding privacy concerns. And then there might be a knob to deactivate it completely or to make it more verbose (which might be useful for example in a corporate context).
On 2024-06-19 16:01, Christian Couder wrote: > On Wed, Jun 19, 2024 at 3:50 PM Dragan Simic <dsimic@manjaro.org> > wrote: > >> > I don't mind if this is present but disabled by default, but then I >> > guess it is not really serving much of a purpose, as hardly anybody >> > would enable it. Which makes collecting large-scale statistics by >> > hosting providers pretty much useless (and I don't think it is all that >> > useful for debugging individual cases). >> >> I agree that it should actually be disabled by default, for privacy >> and security reasons, but that would actually defeat its purpose, so >> I'm not really sure should it be merged. > > One possibility is to send just the `sysname`, described as 'Operating > system name (e.g., "Linux")', field of the struct utsname filled out > by uname(2) by default. > > It should be the same as what `uname -s` prints, so "Linux" for a > Linux machine, and might be acceptable regarding privacy concerns. > > And then there might be a knob to deactivate it completely or to make > it more verbose (which might be useful for example in a corporate > context). I'd be fine with advertising "Linux" (or "Windows") only by default, because it doesn't reveal much from the privacy and security standpoint, but allows rather usable statistics to be collected. A configuration knob that would allow it to be disabled entirely, or be enabled with more details to be sent would also be fine with me.
On Wednesday, June 19, 2024 10:20 AM, Dragan Simic wrote: >On 2024-06-19 16:01, Christian Couder wrote: >> On Wed, Jun 19, 2024 at 3:50 PM Dragan Simic <dsimic@manjaro.org> >> wrote: >> >>> > I don't mind if this is present but disabled by default, but then I >>> > guess it is not really serving much of a purpose, as hardly anybody >>> > would enable it. Which makes collecting large-scale statistics by >>> > hosting providers pretty much useless (and I don't think it is all >>> > that useful for debugging individual cases). >>> >>> I agree that it should actually be disabled by default, for privacy >>> and security reasons, but that would actually defeat its purpose, so >>> I'm not really sure should it be merged. >> >> One possibility is to send just the `sysname`, described as 'Operating >> system name (e.g., "Linux")', field of the struct utsname filled out >> by uname(2) by default. >> >> It should be the same as what `uname -s` prints, so "Linux" for a >> Linux machine, and might be acceptable regarding privacy concerns. >> >> And then there might be a knob to deactivate it completely or to make >> it more verbose (which might be useful for example in a corporate >> context). > >I'd be fine with advertising "Linux" (or "Windows") only by default, because it >doesn't reveal much from the privacy and security standpoint, but allows rather >usable statistics to be collected. > >A configuration knob that would allow it to be disabled entirely, or be enabled with >more details to be sent would also be fine with me. While in the code, can I suggest including the OpenSSL version used in the build? This came up in at a customer a few weeks ago and they could not answer the question of what git build they were using. Turned out it used the wrong OpenSSL header compared to what they had installed.
On Wed, Jun 19, 2024 at 04:01:57PM +0200, Christian Couder wrote: > On Wed, Jun 19, 2024 at 3:50 PM Dragan Simic <dsimic@manjaro.org> wrote: > > > > I don't mind if this is present but disabled by default, but then I > > > guess it is not really serving much of a purpose, as hardly anybody > > > would enable it. Which makes collecting large-scale statistics by > > > hosting providers pretty much useless (and I don't think it is all that > > > useful for debugging individual cases). > > > > I agree that it should actually be disabled by default, for privacy > > and security reasons, but that would actually defeat its purpose, so > > I'm not really sure should it be merged. > > One possibility is to send just the `sysname`, described as 'Operating > system name (e.g., "Linux")', field of the struct utsname filled out > by uname(2) by default. That would be better to me. I still don't love it, but I admit it's coming more from a knee-jerk response than from some rational argument against people knowing I run Linux. Since HTTP user-agent fields are common, we can look at those for prior art. curl sends its own version but nothing else. Most browsers do seem to include some OS information. My version of firefox gives its own version along with "Linux x86_64". So basically "uname -sm". > And then there might be a knob to deactivate it completely or to make > it more verbose (which might be useful for example in a corporate > context). Yes, I think we should definitely have an option to suppress or override it, just like we do for the user-agent string. -Peff
On Wed, Jun 19, 2024 at 10:41:54AM -0400, rsbecker@nexbridge.com wrote: > >A configuration knob that would allow it to be disabled entirely, or > >be enabled with more details to be sent would also be fine with me. > > While in the code, can I suggest including the OpenSSL version used in > the build? This came up in at a customer a few weeks ago and they > could not answer the question of what git build they were using. > Turned out it used the wrong OpenSSL header compared to what they had > installed. At the point you are dealing one-on-one with somebody, I don't think protocol-level messages like this are the best spot for debugging. But it might make sense to teach "git version --build-options" to report the openssl version. -Peff
On Wednesday, June 19, 2024 10:53 AM, Peff wrote: >On Wed, Jun 19, 2024 at 10:41:54AM -0400, rsbecker@nexbridge.com wrote: > >> >A configuration knob that would allow it to be disabled entirely, or >> >be enabled with more details to be sent would also be fine with me. >> >> While in the code, can I suggest including the OpenSSL version used in >> the build? This came up in at a customer a few weeks ago and they >> could not answer the question of what git build they were using. >> Turned out it used the wrong OpenSSL header compared to what they had >> installed. > >At the point you are dealing one-on-one with somebody, I don't think protocol-level >messages like this are the best spot for debugging. But it might make sense to teach >"git version --build-options" to report the openssl version. Much better idea. Will look into it. Thanks.
On Wednesday, June 19, 2024 10:51 AM, Peff wrote: >On Wed, Jun 19, 2024 at 04:01:57PM +0200, Christian Couder wrote: > >> On Wed, Jun 19, 2024 at 3:50 PM Dragan Simic <dsimic@manjaro.org> wrote: >> >> > > I don't mind if this is present but disabled by default, but then >> > > I guess it is not really serving much of a purpose, as hardly >> > > anybody would enable it. Which makes collecting large-scale >> > > statistics by hosting providers pretty much useless (and I don't >> > > think it is all that useful for debugging individual cases). >> > >> > I agree that it should actually be disabled by default, for privacy >> > and security reasons, but that would actually defeat its purpose, so >> > I'm not really sure should it be merged. >> >> One possibility is to send just the `sysname`, described as 'Operating >> system name (e.g., "Linux")', field of the struct utsname filled out >> by uname(2) by default. > >That would be better to me. I still don't love it, but I admit it's coming more from a >knee-jerk response than from some rational argument against people knowing I run >Linux. > >Since HTTP user-agent fields are common, we can look at those for prior art. curl >sends its own version but nothing else. Most browsers do seem to include some OS >information. My version of firefox gives its own version along with "Linux x86_64". >So basically "uname -sm". > >> And then there might be a knob to deactivate it completely or to make >> it more verbose (which might be useful for example in a corporate >> context). > >Yes, I think we should definitely have an option to suppress or override it, just like >we do for the user-agent string. Instead of an override, what about a knob that specifies the uname command to use to build the value. Personally, I would use `uname -s -r -v` on NonStop to get the kernel version used in the build. The difficulty on my platform is that this is not truly useful info. The effective build OS compatibility version is in a #define __L_Series_RVU and __H_Series_RVU, so the knob might be needed in git_compat_util.h or similar. This comes from the compiler arguments, which are not yet captured.
On 2024-06-19 16:41, rsbecker@nexbridge.com wrote: > On Wednesday, June 19, 2024 10:20 AM, Dragan Simic wrote: >> On 2024-06-19 16:01, Christian Couder wrote: >>> On Wed, Jun 19, 2024 at 3:50 PM Dragan Simic <dsimic@manjaro.org> >>> wrote: >>> >>>> > I don't mind if this is present but disabled by default, but then I >>>> > guess it is not really serving much of a purpose, as hardly anybody >>>> > would enable it. Which makes collecting large-scale statistics by >>>> > hosting providers pretty much useless (and I don't think it is all >>>> > that useful for debugging individual cases). >>>> >>>> I agree that it should actually be disabled by default, for privacy >>>> and security reasons, but that would actually defeat its purpose, so >>>> I'm not really sure should it be merged. >>> >>> One possibility is to send just the `sysname`, described as >>> 'Operating >>> system name (e.g., "Linux")', field of the struct utsname filled out >>> by uname(2) by default. >>> >>> It should be the same as what `uname -s` prints, so "Linux" for a >>> Linux machine, and might be acceptable regarding privacy concerns. >>> >>> And then there might be a knob to deactivate it completely or to make >>> it more verbose (which might be useful for example in a corporate >>> context). >> >> I'd be fine with advertising "Linux" (or "Windows") only by default, >> because it >> doesn't reveal much from the privacy and security standpoint, but >> allows rather >> usable statistics to be collected. >> >> A configuration knob that would allow it to be disabled entirely, or >> be enabled with >> more details to be sent would also be fine with me. > > While in the code, can I suggest including the OpenSSL version used in > the build? This came up in at a customer a few weeks ago and they > could not answer the question of what git build they were using. > Turned out it used the wrong OpenSSL header compared to what they had > installed. Makes sense to me, but only in the non-default "advertise more details" mode of the new configuration knob.
On 2024-06-19 at 14:50:42, Jeff King wrote: > On Wed, Jun 19, 2024 at 04:01:57PM +0200, Christian Couder wrote: > > > One possibility is to send just the `sysname`, described as 'Operating > > system name (e.g., "Linux")', field of the struct utsname filled out > > by uname(2) by default. > > That would be better to me. I still don't love it, but I admit it's > coming more from a knee-jerk response than from some rational argument > against people knowing I run Linux. > > Since HTTP user-agent fields are common, we can look at those for prior > art. curl sends its own version but nothing else. Most browsers do seem > to include some OS information. My version of firefox gives its own > version along with "Linux x86_64". So basically "uname -sm". If we choose to enable this, this is the right level of detail, yeah. We could also allow a distributor to set this value at compile time, much like Debian does for Postfix and OpenSSH. For Postfix, it's simply "(Debian)", which doesn't give much information. To me as a server administrator interested in statistics, it's useful to me to know OS name and version (as in, how many users are still using an ancient version of CentOS?), since that tells me about things like supported TLS versions which is helpful, but as a user I don't think that's an appropriate level of detail to share. And I also worry about fingerprinting and tracking, which is a giant problem with HTTP user-agents. This is especially true if you're using something like FreeBSD RISC-V, which is just not that common. > > And then there might be a knob to deactivate it completely or to make > > it more verbose (which might be useful for example in a corporate > > context). > > Yes, I think we should definitely have an option to suppress or override > it, just like we do for the user-agent string. I definitely think we should have both. I'm sure we'll have some server maintainer or repository administrator who tries to reject "bad" OSes (like someone who doesn't like their employees using WSL, for example). We've already had people propose to reject access based on the version number in the name of "security", despite the fact that most Linux distros just backport security patches and thus the version number is not usually interesting in that regard. Again, HTTP user-agents tell us that people will make access control decisions here even though they should not. We'll want to honour people's decisions to remain a mystery or to work around broken server implementations, or just to make it harder to track or fingerprint them. I also think the documentation should state that for the user-agent and os-version fields that they are merely informative, can be changed, and MUST NOT be used for access control. That doesn't mean people will honour it, but it does mean that we can and should feel free to break implementations that don't comply.
On Wed, Jun 19, 2024 at 10:46:13PM +0000, brian m. carlson wrote: > We'll want to honour people's decisions to remain a mystery or to work > around broken server implementations, or just to make it harder to track > or fingerprint them. Yep, I agree with everything you said, especially this part. > I also think the documentation should state that for the user-agent and > os-version fields that they are merely informative, can be changed, and > MUST NOT be used for access control. That doesn't mean people will > honour it, but it does mean that we can and should feel free to break > implementations that don't comply. I think Christian's proposed documentation did have something along these lines. I do kind of wonder if we even need a separate "os-version" field, and if it couldn't simply be plugged into the user-agent string (making it "git/1.2.3 Linux x86_64" or something). But maybe that introduces more hassles with respect to configuring/overriding the two parts separately. -Peff
Christian Couder <christian.couder@gmail.com> writes: > For debugging and statistical purposes, it can be useful for Git > servers to know the OS the client are using. > > So let's add a new 'os-version' capability to the v2 protocol, in the > same way as the existing 'agent' capability that lets clients and > servers exchange the Git version they are running. > > This sends the same info as `git bugreport` is already sending, which > uses uname(2). It should be the same as what `uname -srvm` returns, > except that it is sanitized in the same way as the Git version sent by > the 'agent' capability is sanitized (by replacing character having an > ascii code less than 32 or more than 127 with '.'). > > CI tests are currently failing on Windows as it looks like uname(1) > and uname(2) don't report the same thing: > > -os-version=MINGW64_NT-10.0-20348.3.4.10-87d57229.x86_64.2024-02-14.20:17.UTC.x86_64 > +os-version=Windows.10.0.20348 > > (See: https://github.com/chriscool/git/actions/runs/9581822699) I think we already heard from enough people to cover the spectrum of opinions. I'd have to say that needs to be carefully kept to the minimum what we send in the 'user-agent' like manner. The "git bugreport" is an opt-in "these should help you in helping me" feature, designed to allow further redacting by the user before sending it out, and should not be compared with "on by default for everybody" telemetry data. I personally like the idea to add to user-agent, instead of adding a new capability. What is the true motivation behind this? Is this thing meant to gather statistics from potentially non-paying general public from hosting providers, or is this primarily for $CORP IT folks to make sure that nobody is being too stale?