Message ID | 44e8dd50c6ea7cbcc5e4fc35c9b9057c0a52038c.1605269465.git.ps@pks.im (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] config: extract function to parse config pairs | expand |
On Fri, Nov 13 2020, Patrick Steinhardt wrote: > While not document, it is currently possible to specify config entries "While not documented..." > + strbuf_addf(&envvar, "GIT_CONFIG_KEY_%d", i); > + if ((key = getenv(envvar.buf)) == NULL) > + break; The convention in git.git is to avoid explicit NULL checks. So maybe something like this, which also avoids the assignment inside an "if" key = getenv(envvar.buf); if (!key) break; > +test_expect_success 'git config handles environment config pairs' ' > + GIT_CONFIG_KEY_1="pair.one" GIT_CONFIG_VALUE_1="foo" \ > + GIT_CONFIG_KEY_2="pair.two" GIT_CONFIG_VALUE_2="bar" \ > + GIT_CONFIG_KEY_4="pair.four" GIT_CONFIG_VALUE_4="not-parsed" \ > + git config --get-regexp "pair.*" >actual && > + cat >expect <<-EOF && > + pair.one foo > + pair.two bar > + EOF > + test_cmp expect actual > +' > + > +test_expect_success 'git config copes with missing config pair value' ' > + GIT_CONFIG_KEY_1="pair.one" git config --get-regexp "pair.*" >actual && > + echo pair.one >expect && > + test_cmp expect actual > +' > + > +test_expect_success 'git config fails with invalid config pair key' ' > + test_must_fail env GIT_CONFIG_KEY_1= git config --list && > + test_must_fail env GIT_CONFIG_KEY_1=missing-section git config --list > +' > + > test_expect_success 'git config --edit works' ' > git config -f tmp test.value no && > echo test.value=yes >expect && I think we should have a bit more complete tests of what happens if you clobber existing config keys, and testing that this is set last after system/global/local config.
On 13/11/2020 12:16, Patrick Steinhardt wrote: > This commit thus adds a new way of adding config entries via the > environment which doesn't require splitting of keys and values. The user > can specify an config entry's key via `GIT_CONFIG_KEY_$n` and a value > via `GIT_CONFIG_VALUE_$n`, where `n` is any number starting with 1. It > is possible to add multiple entries via consecutively numbered envvars > `GIT_CONFIG_KEY_1`, `GIT_CONFIG_KEY_2`, etc, where each of the keys may > have a matching value. > When no matching value exists, it's assumed to be > the empty value. Is this a good choice of default in the face of potential mistyping when entering commands, or cut&paste editing of scripts. It's easy to see cases of mismatched KEY_2 VALUE_1 entries. Wouldn't it be better to warn about un-matched key/value pairs? > +GIT_CONFIG_KEY_1,GIT_CONFIG_VALUE_1:: Shouldn't the man page entry indicate that it's '<n>' ? Philip
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > On Fri, Nov 13 2020, Patrick Steinhardt wrote: > >> While not document, it is currently possible to specify config entries > > "While not documented..." > >> + strbuf_addf(&envvar, "GIT_CONFIG_KEY_%d", i); >> + if ((key = getenv(envvar.buf)) == NULL) >> + break; > > The convention in git.git is to avoid explicit NULL checks. So maybe > something like this, which also avoids the assignment inside an "if" > > key = getenv(envvar.buf); > if (!key) > break; All good suggestions, but... "While not documented" yes, for sure, but we do not document it for a good reason---it is a pure implementation detail between Git process that runs another one as its internal implementation detail. I especially do not think we want to read from unbounded number of GIT_CONFIG_KEY_<N> variables like this patch does. How would a script cleanse its environment to protect itself from stray such environment variable pair? Count up from 1 to forever? Run "env" and grep for "GIT_CONFIG_KEY_[0-9]*=" (the answer is No. What if some environment variables have newline in its values?)
On Mon, Nov 16, 2020 at 11:39:35AM -0800, Junio C Hamano wrote: > >> While not document, it is currently possible to specify config entries > >> [in GIT_CONFIG_PARAMETERS] > [...] > > "While not documented" yes, for sure, but we do not document it for > a good reason---it is a pure implementation detail between Git > process that runs another one as its internal implementation detail. I have actually been quite tempted to document and promise that it will continue to work. Because it really is useful in some instances. The thing that has held me back is that the documentation would reveal how unforgiving the parser is. ;) It insists that key/value pairs are shell-quoted as a whole. I think if we made it accept a some reasonable inputs: - do not require quoting for values that do not need it - allow any amount of shell-style single-quoting (whole parameters, just values, etc). - do not bother allowing other quoting, like double-quotes with backslashes. However, document backslash and double-quote as meta-characters that must not appear outside of single-quotes, to allow for future expansion. then I'd feel comfortable making it a public-facing feature. And for most cases it would be pretty pleasant to use (and for the unpleasant ones, I'm not sure that a little quoting is any worse than the paired environment variables found here). > I especially do not think we want to read from unbounded number of > GIT_CONFIG_KEY_<N> variables like this patch does. How would a > script cleanse its environment to protect itself from stray such > environment variable pair? Count up from 1 to forever? Run "env" > and grep for "GIT_CONFIG_KEY_[0-9]*=" (the answer is No. What if > some environment variables have newline in its values?) Yeah, scripts can currently assume that: unset $(git rev-parse --local-env-vars) will drop any config from the environment. In some cases, having rev-parse enumerate the GIT_CONFIG_KEY_* variables that are set would be sufficient. But that implies that rev-parse is seeing the same environment we're planning to clear. As it is now, a script is free to use rev-parse to generate that list, hold on to it, and then use it later. -Peff
On Mon, Nov 16, 2020 at 11:39:35AM -0800, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > > > On Fri, Nov 13 2020, Patrick Steinhardt wrote: > > > >> While not document, it is currently possible to specify config entries > > > > "While not documented..." > > > >> + strbuf_addf(&envvar, "GIT_CONFIG_KEY_%d", i); > >> + if ((key = getenv(envvar.buf)) == NULL) > >> + break; > > > > The convention in git.git is to avoid explicit NULL checks. So maybe > > something like this, which also avoids the assignment inside an "if" > > > > key = getenv(envvar.buf); > > if (!key) > > break; > > All good suggestions, but... > > "While not documented" yes, for sure, but we do not document it for > a good reason---it is a pure implementation detail between Git > process that runs another one as its internal implementation detail. Agreed > I especially do not think we want to read from unbounded number of > GIT_CONFIG_KEY_<N> variables like this patch does. How would a > script cleanse its environment to protect itself from stray such > environment variable pair? Count up from 1 to forever? Run "env" > and grep for "GIT_CONFIG_KEY_[0-9]*=" (the answer is No. What if > some environment variables have newline in its values?) You only have to unset `GIT_CONFIG_KEY_1` as the parser will stop iterating on the first missing key. More generally, if you set `n` keys, it's sufficient to unset key `n+1`. Patrick
On Mon, Nov 16, 2020 at 09:34:54PM -0500, Jeff King wrote: > On Mon, Nov 16, 2020 at 11:39:35AM -0800, Junio C Hamano wrote: > > > >> While not document, it is currently possible to specify config entries > > >> [in GIT_CONFIG_PARAMETERS] > > [...] > > > > "While not documented" yes, for sure, but we do not document it for > > a good reason---it is a pure implementation detail between Git > > process that runs another one as its internal implementation detail. > > I have actually been quite tempted to document and promise that it will > continue to work. Because it really is useful in some instances. The > thing that has held me back is that the documentation would reveal how > unforgiving the parser is. ;) > > It insists that key/value pairs are shell-quoted as a whole. I think if > we made it accept a some reasonable inputs: > > - do not require quoting for values that do not need it > > - allow any amount of shell-style single-quoting (whole parameters, > just values, etc). > > - do not bother allowing other quoting, like double-quotes with > backslashes. However, document backslash and double-quote as > meta-characters that must not appear outside of single-quotes, to > allow for future expansion. > > then I'd feel comfortable making it a public-facing feature. And for > most cases it would be pretty pleasant to use (and for the unpleasant > ones, I'm not sure that a little quoting is any worse than the paired > environment variables found here). I tend to disagree there. As long as you control keys and values yourself it's not too hard, that's true. But as soon as you start processing untrusted keys or values, then it's getting a lot harder. E.g. suppose you create a fetch mirror for a user, where the source is protected by a password. We don't want to write the password into the gitconfig of the mirror repository. Passing it via `-C` will show up in ps(1). Using GIT_CONFIG_PARAMETERS requires you to quote the value, which contains arbitrary data, and if you fail to do that correctly you now have an avenue for arbitrary config injection. That scenario is roughly why I came up with the _KEY/_VALUE schema. It requires no quoting, is trivial to set up (at least for its target audience, which is mostly scripts or programs) and wouldn't show up in ps(1). > > I especially do not think we want to read from unbounded number of > > GIT_CONFIG_KEY_<N> variables like this patch does. How would a > > script cleanse its environment to protect itself from stray such > > environment variable pair? Count up from 1 to forever? Run "env" > > and grep for "GIT_CONFIG_KEY_[0-9]*=" (the answer is No. What if > > some environment variables have newline in its values?) > > Yeah, scripts can currently assume that: > > unset $(git rev-parse --local-env-vars) > > will drop any config from the environment. In some cases, having > rev-parse enumerate the GIT_CONFIG_KEY_* variables that are set would be > sufficient. But that implies that rev-parse is seeing the same > environment we're planning to clear. As it is now, a script is free to > use rev-parse to generate that list, hold on to it, and then use it > later. Good point. Adjusting it would be trivial, though: unset all consecutive GIT_CONFIG_KEY_$n keys and potentially also GIT_CONFIG_VALUE_$n until we hit a gap. The parser would stop on the first gap anyway. Patrick > -Peff
On Fri, Nov 13, 2020 at 04:37:33PM +0000, Philip Oakley wrote: > On 13/11/2020 12:16, Patrick Steinhardt wrote: > > This commit thus adds a new way of adding config entries via the > > environment which doesn't require splitting of keys and values. The user > > can specify an config entry's key via `GIT_CONFIG_KEY_$n` and a value > > via `GIT_CONFIG_VALUE_$n`, where `n` is any number starting with 1. It > > is possible to add multiple entries via consecutively numbered envvars > > `GIT_CONFIG_KEY_1`, `GIT_CONFIG_KEY_2`, etc, where each of the keys may > > have a matching value. > > > When no matching value exists, it's assumed to be > > the empty value. > Is this a good choice of default in the face of potential mistyping when > entering commands, or cut&paste editing of scripts. It's easy to see > cases of mismatched KEY_2 VALUE_1 entries. > > Wouldn't it be better to warn about un-matched key/value pairs? Good point. I'll change this on the next iteration. > > +GIT_CONFIG_KEY_1,GIT_CONFIG_VALUE_1:: > > Shouldn't the man page entry indicate that it's '<n>' ? I wasn't quite sure how to document it, but using `<n>` would indicate better that this can be multiple envvars. Patrick > Philip
On Tue, Nov 17, 2020 at 07:37:34AM +0100, Patrick Steinhardt wrote: > > then I'd feel comfortable making it a public-facing feature. And for > > most cases it would be pretty pleasant to use (and for the unpleasant > > ones, I'm not sure that a little quoting is any worse than the paired > > environment variables found here). > > I tend to disagree there. As long as you control keys and values > yourself it's not too hard, that's true. But as soon as you start > processing untrusted keys or values, then it's getting a lot harder. > > E.g. suppose you create a fetch mirror for a user, where the source is > protected by a password. We don't want to write the password into the > gitconfig of the mirror repository. Passing it via `-C` will show up in > ps(1). Using GIT_CONFIG_PARAMETERS requires you to quote the value, > which contains arbitrary data, and if you fail to do that correctly you > now have an avenue for arbitrary config injection. > > That scenario is roughly why I came up with the _KEY/_VALUE schema. It > requires no quoting, is trivial to set up (at least for its target > audience, which is mostly scripts or programs) and wouldn't show up in > ps(1). True. Shell-quoting is pretty easy, but it's still a thing that the caller has to do (and get right). Within Git we have nice routines for that, but if you're calling from another program, you probably don't. > > Yeah, scripts can currently assume that: > > > > unset $(git rev-parse --local-env-vars) > > > > will drop any config from the environment. In some cases, having > > rev-parse enumerate the GIT_CONFIG_KEY_* variables that are set would be > > sufficient. But that implies that rev-parse is seeing the same > > environment we're planning to clear. As it is now, a script is free to > > use rev-parse to generate that list, hold on to it, and then use it > > later. > > Good point. Adjusting it would be trivial, though: unset all consecutive > GIT_CONFIG_KEY_$n keys and potentially also GIT_CONFIG_VALUE_$n until we > hit a gap. The parser would stop on the first gap anyway. That doesn't help this case, which currently works: # remember the list so we don't have to invoke rev-parse in # each iteration of the loop vars=$(git rev-parse --local-env-vars) for repo in $repos; do # start with a clean slate unset $vars export GIT_DIR=$repo git do-some-thing # oops, these won't get cleared in the next go-round because # they weren't set when we called rev-parse export GIT_CONFIG_KEY_1=foo.bar export GIT_CONFIG_VALUE_1=whatever git do-another-thing done Now I have no idea if anybody is doing something along those lines, and it's a bit convoluted (I'd probably use a subshell). And obviously nobody is doing this _exact_ thing yet, because the key/value variables don't exist yet. So maybe it would all work out (the caller of this script could have set git vars, but in that case we'd pick them up in the rev-parse call). So I dunno. It's probably unlikely to bite somebody, but it is a departure from the current design. -Peff
Patrick Steinhardt <ps@pks.im> writes: >> I especially do not think we want to read from unbounded number of >> GIT_CONFIG_KEY_<N> variables like this patch does. How would a >> script cleanse its environment to protect itself from stray such >> environment variable pair? Count up from 1 to forever? Run "env" >> and grep for "GIT_CONFIG_KEY_[0-9]*=" (the answer is No. What if >> some environment variables have newline in its values?) > > You only have to unset `GIT_CONFIG_KEY_1` as the parser will stop > iterating on the first missing key. More generally, if you set `n` keys, > it's sufficient to unset key `n+1`. Yes, but those who want to set N keys would likely to be content with setting 1..N and happily forget unsetting N+1, and that is where "how would one cleanse the environment to give a clean slate?" comes from.
On Mon, Nov 16 2020, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> On Fri, Nov 13 2020, Patrick Steinhardt wrote: >> >>> While not document, it is currently possible to specify config entries >> >> "While not documented..." >> >>> + strbuf_addf(&envvar, "GIT_CONFIG_KEY_%d", i); >>> + if ((key = getenv(envvar.buf)) == NULL) >>> + break; >> >> The convention in git.git is to avoid explicit NULL checks. So maybe >> something like this, which also avoids the assignment inside an "if" >> >> key = getenv(envvar.buf); >> if (!key) >> break; > > All good suggestions, but... > > "While not documented" yes, for sure, but we do not document it for > a good reason---it is a pure implementation detail between Git > process that runs another one as its internal implementation detail. *nod* I didn't mean it should be treated as some API, just "it happens to work". I do agree with Jeff downthread that it would be nice to have it explicitly supported. > I especially do not think we want to read from unbounded number of > GIT_CONFIG_KEY_<N> variables like this patch does. How would a > script cleanse its environment to protect itself from stray such > environment variable pair? Count up from 1 to forever? Run "env" > and grep for "GIT_CONFIG_KEY_[0-9]*=" (the answer is No. What if > some environment variables have newline in its values?) Purely on an implementation note, if we went that route we could provide something based on compat/unsetenv.c (or environ iteration in general) that would loop over the env, but I agree it would be better to make GIT_CONFIG_PARAMETERS nice.
On Tue, Nov 17 2020, Jeff King wrote: > On Mon, Nov 16, 2020 at 11:39:35AM -0800, Junio C Hamano wrote: > >> >> While not document, it is currently possible to specify config entries >> >> [in GIT_CONFIG_PARAMETERS] >> [...] >> >> "While not documented" yes, for sure, but we do not document it for >> a good reason---it is a pure implementation detail between Git >> process that runs another one as its internal implementation detail. > > I have actually been quite tempted to document and promise that it will > continue to work. Because it really is useful in some instances. The > thing that has held me back is that the documentation would reveal how > unforgiving the parser is. ;) > > It insists that key/value pairs are shell-quoted as a whole. I think if > we made it accept a some reasonable inputs: > > - do not require quoting for values that do not need it > > - allow any amount of shell-style single-quoting (whole parameters, > just values, etc). > > - do not bother allowing other quoting, like double-quotes with > backslashes. However, document backslash and double-quote as > meta-characters that must not appear outside of single-quotes, to > allow for future expansion. > > then I'd feel comfortable making it a public-facing feature. And for > most cases it would be pretty pleasant to use (and for the unpleasant > ones, I'm not sure that a little quoting is any worse than the paired > environment variables found here). I wonder if something like the git config -z format wouldn't be easier, with the twist that we'd obviously not support \0. So we'd need an optional length prefix. : = unspecified. :user.name Jeff K :alias.ci commit :10:bin.ary <10 byte string, might have a \n> :other.key Other Value Maybe that's overly fragile, or maybe another format would be better. I was trying to come up with one where the common case wouldn't require knowing about shell quoting/unquoting, and where you could still do: GIT_CONFIG_PARAMETERS=":my.new\nvalue\n$GIT_CONFIG_PARAMETERS" Or equivalent, and still just keep $GIT_CONFIG_PARAMETERS as-is to pass it along. Your "do not require quoting" accomplishes that, and it's arguably a lot
On Tue, Nov 17, 2020 at 03:22:05PM +0100, Ævar Arnfjörð Bjarmason wrote: > > then I'd feel comfortable making it a public-facing feature. And for > > most cases it would be pretty pleasant to use (and for the unpleasant > > ones, I'm not sure that a little quoting is any worse than the paired > > environment variables found here). > > I wonder if something like the git config -z format wouldn't be easier, > with the twist that we'd obviously not support \0. So we'd need an > optional length prefix. : = unspecified. > > :user.name > Jeff K > :alias.ci > commit > :10:bin.ary > <10 byte string, might have a \n> > :other.key > Other Value > > Maybe that's overly fragile, or maybe another format would be better. Yeah, length-delimited strings are an alternative that some people think is less error-prone than quoting. And we do use pkt-lines. They're also a pain for humans to write (it's nicer if they're optional, but when you _do_ have to start using them, now you are stuck counting things up). > I was trying to come up with one where the common case wouldn't > require knowing about shell quoting/unquoting, and where you could > still do: > > GIT_CONFIG_PARAMETERS=":my.new\nvalue\n$GIT_CONFIG_PARAMETERS" > > Or equivalent, and still just keep $GIT_CONFIG_PARAMETERS as-is to pass > it along. > > Your "do not require quoting" accomplishes that, and it's arguably a lot Looks like your mail got cut off. But yeah, the goal of making the quoting optional was to make it easier for humans to use for simple cases. It doesn't help at all with other programs inserting values, which can just as easily err on the side of caution. BTW, there is another problem with GIT_CONFIG_PARAMETERS (and "git -c" in general). The dotted config-key format: section.subsection.key is unambiguous by itself, even though "subsection" can contain arbitrary bytes, including dots. Because neither "section" nor "key" can contain dots, we can parse from either end, and take the whole middle as a subsection (and this is how we do it in the code). But an assignment string like: section.subsection.key=value _is_ ambiguous. We have to parse left-to-right up to the first equals (since "value" can contain arbitrary characters, including an equals). But "subsection" can have one, too, so we want to parse right-to-left there. E.g., in: one.two=three.four=five this could be either of: - section is "one", key is "two", value is "three.four=five" - section is "one", subsection is "two=three", key is "four", value is "five" We currently always parse it as the former (which I think is least-bad of the two, since values are more likely than subsections to contain arbitrary text with an equals). -Peff
On 2020-11-17 at 02:34:54, Jeff King wrote: > On Mon, Nov 16, 2020 at 11:39:35AM -0800, Junio C Hamano wrote: > > > >> While not document, it is currently possible to specify config entries > > >> [in GIT_CONFIG_PARAMETERS] > > [...] > > > > "While not documented" yes, for sure, but we do not document it for > > a good reason---it is a pure implementation detail between Git > > process that runs another one as its internal implementation detail. > > I have actually been quite tempted to document and promise that it will > continue to work. Because it really is useful in some instances. The > thing that has held me back is that the documentation would reveal how > unforgiving the parser is. ;) > > It insists that key/value pairs are shell-quoted as a whole. I think if > we made it accept a some reasonable inputs: > > - do not require quoting for values that do not need it > > - allow any amount of shell-style single-quoting (whole parameters, > just values, etc). > > - do not bother allowing other quoting, like double-quotes with > backslashes. However, document backslash and double-quote as > meta-characters that must not appear outside of single-quotes, to > allow for future expansion. > > then I'd feel comfortable making it a public-facing feature. And for > most cases it would be pretty pleasant to use (and for the unpleasant > ones, I'm not sure that a little quoting is any worse than the paired > environment variables found here). What if we didn't document it but provided a command that produced a suitable value? Maybe something like this: GIT_CONFIG_PARAMETERS=$(git rev-parse --quote-parameters a.b.c ENV_VAR d.e.f OTHER_ENV_VAR) Or whatever we decide. I don't personally love shell quoting as an interchange mechanism; I'd prefer something like URI-encoding, which is a bit more standardized and easier to reason about from a security perspective. But if we decide to change it, it doesn't matter, since it's still undocumented and this would be the only acceptable way to pass config through the environment. Alternatively, we could just do this: git with-config --key a.b.c --value ENV_VAR --key d.e.f --value OTHER_ENV_VAR --command git foo That would also leave it undocumented, but make it easier to work with. > > I especially do not think we want to read from unbounded number of > > GIT_CONFIG_KEY_<N> variables like this patch does. How would a > > script cleanse its environment to protect itself from stray such > > environment variable pair? Count up from 1 to forever? Run "env" > > and grep for "GIT_CONFIG_KEY_[0-9]*=" (the answer is No. What if > > some environment variables have newline in its values?) > > Yeah, scripts can currently assume that: > > unset $(git rev-parse --local-env-vars) > > will drop any config from the environment. In some cases, having > rev-parse enumerate the GIT_CONFIG_KEY_* variables that are set would be > sufficient. But that implies that rev-parse is seeing the same > environment we're planning to clear. As it is now, a script is free to > use rev-parse to generate that list, hold on to it, and then use it > later. I'm also uncomfortable with an arbitrary number of keys and values. It becomes very tricky to cleanse the environment, and even if the code stops at the first gap, if you then add more entries, then you have to cleanse again or risk a security problem. I feel like this is only going to bite us in the future.
On Wed, Nov 18, 2020 at 12:50:14AM +0000, brian m. carlson wrote: > > then I'd feel comfortable making it a public-facing feature. And for > > most cases it would be pretty pleasant to use (and for the unpleasant > > ones, I'm not sure that a little quoting is any worse than the paired > > environment variables found here). > > What if we didn't document it but provided a command that produced a > suitable value? Maybe something like this: > > GIT_CONFIG_PARAMETERS=$(git rev-parse --quote-parameters a.b.c ENV_VAR d.e.f OTHER_ENV_VAR) > > Or whatever we decide. I think we mostly already have that tooling. $ GIT_CONFIG_PARAMETERS=$( git rev-parse --sq-quote \ foo.bar=value \ 'section.key=with spaces' \ "or.even=embedded 'quotes'" | sed 's/^ //'; # yuck ) $ export GIT_CONFIG_PARAMETERS $ git config --list --show-scope | grep ^command command foo.bar=value command section.key=with spaces command or.even=embedded 'quotes' The "yuck" there is because --sq-quote insists on putting a space at the front. Our parser should probably ignore that, but right now it's quite picky. Though I suppose: - do you mean ENV_VAR to literally look up an environment variable? That solves Patrick's original problem of not wanting to put sensitive values onto the command line. But it's much more annoying to use if you _don't_ already have the value in an environment variable. I'm not sure if that original problem matters here, as a program that does a lot of this would probably implement the quoting itself. - obviously it is doubling down on the shell-quoting as a public thing, and part of your point is that we would leave it opaque. I'm OK with that aspect (even if it ends up as an alias for --sq-quote for now). I'm not entirely sure people aren't using this in the wild already, though. Saying "it was undocumented" may give us a leg to stand on if we change the format, but it will still be annoying to people we break. - my example above still has the "a.b=c.d=e" ambiguity that I mentioned earlier. Fixing that requires changing the format in the environment variable. > I don't personally love shell quoting as an interchange mechanism; I'd > prefer something like URI-encoding, which is a bit more standardized and > easier to reason about from a security perspective. But if we decide to > change it, it doesn't matter, since it's still undocumented and this > would be the only acceptable way to pass config through the environment. Yes, I think concatenating uri_encode(key) + "=" + uri_encode(value) would be easier to reason about, and solves the ambiguity problem. If we are willing to break from the existing format, it's probably a reasonable direction. > Alternatively, we could just do this: > > git with-config --key a.b.c --value ENV_VAR --key d.e.f --value OTHER_ENV_VAR --command git foo > > That would also leave it undocumented, but make it easier to work with. I wondered at first how this is different from: git -c a.b.c=value foo but I guess it is meant to do the same environment-lookup? We could probably add: git --env-config a.b.c=ENV_VAR foo to have Git internally stick $ENV_VAR into $GIT_CONFIG_PARAMETERS. That avoids all of the rev-parse rigamarole, keeps the format of the environment variable opaque, and solves Patrick's problem of putting the value onto the command-line. It doesn't solve the ambiguity with "=" in the subsection, but IMHO that is not all that important a problem. -Peff
On 2020-11-18 at 01:59:07, Jeff King wrote: > I think we mostly already have that tooling. > > $ GIT_CONFIG_PARAMETERS=$( > git rev-parse --sq-quote \ > foo.bar=value \ > 'section.key=with spaces' \ > "or.even=embedded 'quotes'" | > sed 's/^ //'; # yuck > ) > $ export GIT_CONFIG_PARAMETERS > $ git config --list --show-scope | grep ^command > command foo.bar=value > command section.key=with spaces > command or.even=embedded 'quotes' > > The "yuck" there is because --sq-quote insists on putting a space at the > front. Our parser should probably ignore that, but right now it's quite > picky. I feel that this approach leaves a few things to be desired, yes. > Though I suppose: > > - do you mean ENV_VAR to literally look up an environment variable? > That solves Patrick's original problem of not wanting to put > sensitive values onto the command line. But it's much more annoying > to use if you _don't_ already have the value in an environment > variable. I'm not sure if that original problem matters here, as a > program that does a lot of this would probably implement the quoting > itself. Yeah, I did mean that, or various options to either read a literal value or from the environment. Your proposal, while it obviously works, doesn't solve Patrick's problem. > - obviously it is doubling down on the shell-quoting as a public > thing, and part of your point is that we would leave it opaque. > I'm OK with that aspect (even if it ends up as an alias for > --sq-quote for now). I'm not entirely sure people aren't using this > in the wild already, though. Saying "it was undocumented" may give > us a leg to stand on if we change the format, but it will still be > annoying to people we break. I've been telling people for some time that Git doesn't have a way to do this, so I'm comfortable sticking with that line until we have a documented way to do it. I knew we had an internal environment variable (because I was curious and looked), but I knew from just looking at it that it was internal. > Yes, I think concatenating uri_encode(key) + "=" + uri_encode(value) > would be easier to reason about, and solves the ambiguity problem. If we > are willing to break from the existing format, it's probably a > reasonable direction. We'll have to deal with embedded NULs, but other than that it seems robust and easy to reason about. I like the proposal below better, though. > I wondered at first how this is different from: > > git -c a.b.c=value foo > > but I guess it is meant to do the same environment-lookup? We could > probably add: > > git --env-config a.b.c=ENV_VAR foo > > to have Git internally stick $ENV_VAR into $GIT_CONFIG_PARAMETERS. That > avoids all of the rev-parse rigamarole, keeps the format of the > environment variable opaque, and solves Patrick's problem of putting the > value onto the command-line. Sure, that could be an option. It's the simplest, and we already know how to handle config this way. People will be able to figure out how to use it pretty easily. > It doesn't solve the ambiguity with "=" in the subsection, but IMHO that > is not all that important a problem. I'm fine with saying that we don't support environment variable names with equals signs and just going with that. It solves the ambiguity problem and I'm not sure that they could usefully work on Unix anyway. Moreover, people usually use the unrestricted data in the second chunk for URLs, which shouldn't have unquoted equals signs. You could technically name other second fields that way, but why would you when you could just not? So I'm not too worried about it either way.
"brian m. carlson" <sandals@crustytoothpaste.net> writes: > I don't personally love shell quoting as an interchange mechanism; I'd > prefer something like URI-encoding, which is a bit more standardized and > easier to reason about from a security perspective. But if we decide to > change it, it doesn't matter, since it's still undocumented and this > would be the only acceptable way to pass config through the environment. Surely. I am kind-of surprised that nobody has brought up json or yaml ;-)
On Wed, Nov 18, 2020 at 02:25:32AM +0000, brian m. carlson wrote: > On 2020-11-18 at 01:59:07, Jeff King wrote: > > Yes, I think concatenating uri_encode(key) + "=" + uri_encode(value) > > would be easier to reason about, and solves the ambiguity problem. If we > > are willing to break from the existing format, it's probably a > > reasonable direction. > > We'll have to deal with embedded NULs, but other than that it seems > robust and easy to reason about. I like the proposal below better, > though. This would be easy enough to handle and fixes all the problems I currently have. > > I wondered at first how this is different from: > > > > git -c a.b.c=value foo > > > > but I guess it is meant to do the same environment-lookup? We could > > probably add: > > > > git --env-config a.b.c=ENV_VAR foo > > > > to have Git internally stick $ENV_VAR into $GIT_CONFIG_PARAMETERS. That > > avoids all of the rev-parse rigamarole, keeps the format of the > > environment variable opaque, and solves Patrick's problem of putting the > > value onto the command-line. > > Sure, that could be an option. It's the simplest, and we already know > how to handle config this way. People will be able to figure out how to > use it pretty easily. At first, this idea sounds quite interesting. But only until one realizes that it's got the exact same problem which I'm trying to solve: there's still a point in time where one can observe config values via the command line, even though that moment now is a lot shorter compared to running the "real" git command with those keys. > > It doesn't solve the ambiguity with "=" in the subsection, but IMHO that > > is not all that important a problem. > > I'm fine with saying that we don't support environment variable names > with equals signs and just going with that. It solves the ambiguity > problem and I'm not sure that they could usefully work on Unix anyway. > > Moreover, people usually use the unrestricted data in the second chunk > for URLs, which shouldn't have unquoted equals signs. You could > technically name other second fields that way, but why would you when > you could just not? > > So I'm not too worried about it either way. There's not only URLs though, but also paths in 'includeIf.*.path'. Sure, it's unlikely that paths have an equals sign embedded. But I think we should try to not paint ourselves into a corner. This problem also would be nicely solved by URI-encoding both key and value and then passing it via GIT_CONFIG_PARAMETER. Patrick
On Wed, Nov 18 2020, Jeff King wrote: > On Tue, Nov 17, 2020 at 03:22:05PM +0100, Ævar Arnfjörð Bjarmason wrote: > >> > then I'd feel comfortable making it a public-facing feature. And for >> > most cases it would be pretty pleasant to use (and for the unpleasant >> > ones, I'm not sure that a little quoting is any worse than the paired >> > environment variables found here). >> >> I wonder if something like the git config -z format wouldn't be easier, >> with the twist that we'd obviously not support \0. So we'd need an >> optional length prefix. : = unspecified. >> >> :user.name >> Jeff K >> :alias.ci >> commit >> :10:bin.ary >> <10 byte string, might have a \n> >> :other.key >> Other Value >> >> Maybe that's overly fragile, or maybe another format would be better. > > Yeah, length-delimited strings are an alternative that some people think > is less error-prone than quoting. And we do use pkt-lines. They're also > a pain for humans to write (it's nicer if they're optional, but when you > _do_ have to start using them, now you are stuck counting things up). > >> I was trying to come up with one where the common case wouldn't >> require knowing about shell quoting/unquoting, and where you could >> still do: >> >> GIT_CONFIG_PARAMETERS=":my.new\nvalue\n$GIT_CONFIG_PARAMETERS" >> >> Or equivalent, and still just keep $GIT_CONFIG_PARAMETERS as-is to pass >> it along. >> >> Your "do not require quoting" accomplishes that, and it's arguably a lot >n > Looks like your mail got cut off. Nothing important, probably :) > But yeah, the goal of making the quoting optional was to make it > easier for humans to use for simple cases. It doesn't help at all with > other programs inserting values, which can just as easily err on the > side of caution. > > BTW, there is another problem with GIT_CONFIG_PARAMETERS (and "git -c" > in general). The dotted config-key format: > > section.subsection.key > > is unambiguous by itself, even though "subsection" can contain arbitrary > bytes, including dots. Because neither "section" nor "key" can contain > dots, we can parse from either end, and take the whole middle as a > subsection (and this is how we do it in the code). > > But an assignment string like: > > section.subsection.key=value > > _is_ ambiguous. We have to parse left-to-right up to the first equals > (since "value" can contain arbitrary characters, including an equals). > But "subsection" can have one, too, so we want to parse right-to-left > there. E.g., in: > > one.two=three.four=five > > this could be either of: > > - section is "one", key is "two", value is "three.four=five" > > - section is "one", subsection is "two=three", key is "four", value is > "five" > > We currently always parse it as the former (which I think is least-bad > of the two, since values are more likely than subsections to contain > arbitrary text with an equals). Yeah, it's a pain to parse if it's on one line. FWIW that's the main reason for why the format I suggested moved it to \n-delimited, because keys can't contain an \n, so you can unambiguously have them be \n-delimited (as git config -z does). You do need to worry about a \n in the value, but for the common case where you don't have a \n there we wouldn't need to provide the length. Or just provide tooling as you suggested in <20201118015907.GD650959@coredump.intra.peff.net>, which I like better than any one format suggestion (including the one I suggsted). I.e. we can document that: - The variable exists - You read/write/add to it using a return value from this tool Which allows for keeping the value itself opaque and open to a future change.
On Tue, Nov 17 2020, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > >>> I especially do not think we want to read from unbounded number of >>> GIT_CONFIG_KEY_<N> variables like this patch does. How would a >>> script cleanse its environment to protect itself from stray such >>> environment variable pair? Count up from 1 to forever? Run "env" >>> and grep for "GIT_CONFIG_KEY_[0-9]*=" (the answer is No. What if >>> some environment variables have newline in its values?) >> >> You only have to unset `GIT_CONFIG_KEY_1` as the parser will stop >> iterating on the first missing key. More generally, if you set `n` keys, >> it's sufficient to unset key `n+1`. > > Yes, but those who want to set N keys would likely to be content > with setting 1..N and happily forget unsetting N+1, and that is > where "how would one cleanse the environment to give a clean slate?" > comes from. Not as an argument from whataboutism, but just to note a bug/existing prior art: Nobody in this thread has mentioned GIT_PUSH_OPTION_* which works pretty much like Patrick's suggestion, and it looks like --local-env-vars misses those: $ GIT_PUSH_OPTION_0=foo GIT_PUSH_OPTION_COUNT=20 git rev-parse --local-env-vars | grep GIT_PUSH $ I haven't tested this, but I expect there's a bug where a push hook itself does a local push to another repo and that repo has a hook, that the push options are erroneously carried forward to the sub-process. That might also be a feature, depending on your point of view.
On Wed, Nov 18, 2020 at 02:49:39PM +0100, Ævar Arnfjörð Bjarmason wrote: > > On Tue, Nov 17 2020, Junio C Hamano wrote: > > > Patrick Steinhardt <ps@pks.im> writes: > > > >>> I especially do not think we want to read from unbounded number of > >>> GIT_CONFIG_KEY_<N> variables like this patch does. How would a > >>> script cleanse its environment to protect itself from stray such > >>> environment variable pair? Count up from 1 to forever? Run "env" > >>> and grep for "GIT_CONFIG_KEY_[0-9]*=" (the answer is No. What if > >>> some environment variables have newline in its values?) > >> > >> You only have to unset `GIT_CONFIG_KEY_1` as the parser will stop > >> iterating on the first missing key. More generally, if you set `n` keys, > >> it's sufficient to unset key `n+1`. > > > > Yes, but those who want to set N keys would likely to be content > > with setting 1..N and happily forget unsetting N+1, and that is > > where "how would one cleanse the environment to give a clean slate?" > > comes from. > > Not as an argument from whataboutism, but just to note a bug/existing > prior art: > > Nobody in this thread has mentioned GIT_PUSH_OPTION_* which works pretty > much like Patrick's suggestion, and it looks like --local-env-vars > misses those: > > $ GIT_PUSH_OPTION_0=foo GIT_PUSH_OPTION_COUNT=20 git rev-parse --local-env-vars | grep GIT_PUSH > $ > > I haven't tested this, but I expect there's a bug where a push hook > itself does a local push to another repo and that repo has a hook, that > the push options are erroneously carried forward to the sub-process. > > That might also be a feature, depending on your point of view. I didn't actually know about it, thanks for pointing it out! If we're going to use the same `_COUNT` approach, then I think the issues which were discussed would mostly go away. No gaps needed, no requirement to unset `$n+1`. Any properly behaving user would know to set it as otherwise the written code/script cannot work. Also, git-rev-parse(1) wouldn't have to dynamically adjust based on whether a previously existing gap was filled with new keys or not. I'd be happy to pursue that road, but I'll wait for some feedback first. Patrick
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Nobody in this thread has mentioned GIT_PUSH_OPTION_* which works pretty > much like Patrick's suggestion, and it looks like --local-env-vars > misses those: > > $ GIT_PUSH_OPTION_0=foo GIT_PUSH_OPTION_COUNT=20 git rev-parse --local-env-vars | grep GIT_PUSH > $ > > I haven't tested this, but I expect there's a bug where a push hook > itself does a local push to another repo and that repo has a hook, that > the push options are erroneously carried forward to the sub-process. True. Nobody mentioned the environment variable in the discussion, and nobody discovered and was motivated enough to report and/or fix it, may be a good indication that these variables are not much used in real life and certainly not in combination with hooks that further push things out.
On 2020-11-18 at 07:04:30, Patrick Steinhardt wrote: > On Wed, Nov 18, 2020 at 02:25:32AM +0000, brian m. carlson wrote: > > Sure, that could be an option. It's the simplest, and we already know > > how to handle config this way. People will be able to figure out how to > > use it pretty easily. > > At first, this idea sounds quite interesting. But only until one > realizes that it's got the exact same problem which I'm trying to solve: > there's still a point in time where one can observe config values via > the command line, even though that moment now is a lot shorter compared > to running the "real" git command with those keys. I don't think that's the case. This command: git --env-config a.b.c=ENV_VAR would be equivalent to this shell command: git -c "a.b.c=$ENV_VAR" In other words, ENV_VAR is the _name_ of a environment variable to read for the config value. Subprocesses would inherit it using the undocumented GIT_CONFIG_PARAMETERS. Or are you trying to hide the configuration key as well?
On Thu, Nov 19, 2020 at 02:11:16AM +0000, brian m. carlson wrote: > On 2020-11-18 at 07:04:30, Patrick Steinhardt wrote: > > On Wed, Nov 18, 2020 at 02:25:32AM +0000, brian m. carlson wrote: > > > Sure, that could be an option. It's the simplest, and we already know > > > how to handle config this way. People will be able to figure out how to > > > use it pretty easily. > > > > At first, this idea sounds quite interesting. But only until one > > realizes that it's got the exact same problem which I'm trying to solve: > > there's still a point in time where one can observe config values via > > the command line, even though that moment now is a lot shorter compared > > to running the "real" git command with those keys. > > I don't think that's the case. This command: > > git --env-config a.b.c=ENV_VAR > > would be equivalent to this shell command: > > git -c "a.b.c=$ENV_VAR" > > In other words, ENV_VAR is the _name_ of a environment variable to read > for the config value. Subprocesses would inherit it using the > undocumented GIT_CONFIG_PARAMETERS. > > Or are you trying to hide the configuration key as well? No. I just didn't realize that it's supposed to be the name of an envvar. Patrick
diff --git a/Documentation/git-config.txt b/Documentation/git-config.txt index 7573160f21..83fbac3705 100644 --- a/Documentation/git-config.txt +++ b/Documentation/git-config.txt @@ -335,6 +335,12 @@ GIT_CONFIG_NOSYSTEM:: Whether to skip reading settings from the system-wide $(prefix)/etc/gitconfig file. See linkgit:git[1] for details. +GIT_CONFIG_KEY_1,GIT_CONFIG_VALUE_1:: + Each pair of GIT_CONFIG_KEY_/GIT_CONFIG_VALUE_ is added to the process's + runtime configuration. It is possible to add multiple entries by adding + consecutively numbered pairs, starting at 1. If the value corresponding + to a key is not set, it is treated as if it was empty. + See also <<FILES>>. diff --git a/config.c b/config.c index 3281b1374e..ab40479df2 100644 --- a/config.c +++ b/config.c @@ -485,37 +485,56 @@ int git_config_parse_parameter(const char *text, int git_config_from_parameters(config_fn_t fn, void *data) { const char *env = getenv(CONFIG_DATA_ENVIRONMENT); + struct strbuf envvar = STRBUF_INIT; int ret = 0; - char *envw; + char *envw = NULL; const char **argv = NULL; int nr = 0, alloc = 0; int i; struct config_source source; - if (!env) - return 0; - memset(&source, 0, sizeof(source)); source.prev = cf; source.origin_type = CONFIG_ORIGIN_CMDLINE; cf = &source; - /* sq_dequote will write over it */ - envw = xstrdup(env); + if (env) { + /* sq_dequote will write over it */ + envw = xstrdup(env); - if (sq_dequote_to_argv(envw, &argv, &nr, &alloc) < 0) { - ret = error(_("bogus format in %s"), CONFIG_DATA_ENVIRONMENT); - goto out; + if (sq_dequote_to_argv(envw, &argv, &nr, &alloc) < 0) { + ret = error(_("bogus format in %s"), CONFIG_DATA_ENVIRONMENT); + goto out; + } + + for (i = 0; i < nr; i++) { + if (git_config_parse_parameter(argv[i], fn, data) < 0) { + ret = -1; + goto out; + } + } } - for (i = 0; i < nr; i++) { - if (git_config_parse_parameter(argv[i], fn, data) < 0) { + for (i = 1; i; i++) { + const char *key, *value; + + strbuf_addf(&envvar, "GIT_CONFIG_KEY_%d", i); + if ((key = getenv(envvar.buf)) == NULL) + break; + strbuf_reset(&envvar); + + strbuf_addf(&envvar, "GIT_CONFIG_VALUE_%d", i); + value = getenv(envvar.buf); + strbuf_reset(&envvar); + + if (config_parse_pair(key, value, fn, data) < 0) { ret = -1; goto out; } } out: + strbuf_release(&envvar); free(argv); free(envw); cf = source.prev; diff --git a/t/t1300-config.sh b/t/t1300-config.sh index 825d9a184f..2ae9533aa8 100755 --- a/t/t1300-config.sh +++ b/t/t1300-config.sh @@ -1316,6 +1316,29 @@ test_expect_success 'detect bogus GIT_CONFIG_PARAMETERS' ' git config --get-regexp "env.*" ' +test_expect_success 'git config handles environment config pairs' ' + GIT_CONFIG_KEY_1="pair.one" GIT_CONFIG_VALUE_1="foo" \ + GIT_CONFIG_KEY_2="pair.two" GIT_CONFIG_VALUE_2="bar" \ + GIT_CONFIG_KEY_4="pair.four" GIT_CONFIG_VALUE_4="not-parsed" \ + git config --get-regexp "pair.*" >actual && + cat >expect <<-EOF && + pair.one foo + pair.two bar + EOF + test_cmp expect actual +' + +test_expect_success 'git config copes with missing config pair value' ' + GIT_CONFIG_KEY_1="pair.one" git config --get-regexp "pair.*" >actual && + echo pair.one >expect && + test_cmp expect actual +' + +test_expect_success 'git config fails with invalid config pair key' ' + test_must_fail env GIT_CONFIG_KEY_1= git config --list && + test_must_fail env GIT_CONFIG_KEY_1=missing-section git config --list +' + test_expect_success 'git config --edit works' ' git config -f tmp test.value no && echo test.value=yes >expect &&
While not document, it is currently possible to specify config entries via the environment by passing `GIT_CONFIG_PARAMETERS`. This variable is expected to hold one or multiple "section.key=value" entries separated by space. Next to being undocumented, this way of passing config entries has a major downside: the config keys need to be parsed. As such, it is left to the user to escape any potentially harmful characters in the value, which is quite hard to do if values are controlled by a third party. This commit thus adds a new way of adding config entries via the environment which doesn't require splitting of keys and values. The user can specify an config entry's key via `GIT_CONFIG_KEY_$n` and a value via `GIT_CONFIG_VALUE_$n`, where `n` is any number starting with 1. It is possible to add multiple entries via consecutively numbered envvars `GIT_CONFIG_KEY_1`, `GIT_CONFIG_KEY_2`, etc, where each of the keys may have a matching value. When no matching value exists, it's assumed to be the empty value. While the same can be achieved with `git -c <name>=<value>`, one may wish to not do so for potentially sensitive information. E.g. if one wants to set `http.extraHeader` to contain an authentication token, doing so via `-c` would trivially leak those credentials via e.g. ps(1), which typically also shows command arguments. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- Documentation/git-config.txt | 6 ++++++ config.c | 41 ++++++++++++++++++++++++++---------- t/t1300-config.sh | 23 ++++++++++++++++++++ 3 files changed, 59 insertions(+), 11 deletions(-)