Message ID | xmqqk0px3dfu.fsf@gitster.g (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Pass or not to pass config environment down... | expand |
On Tue, Mar 23, 2021 at 11:58 AM Junio C Hamano <gitster@pobox.com> wrote: > > I was grepping around and found this piece of code today: > > static void prepare_submodule_repo_env_no_git_dir(struct strvec *out) > { > const char * const *var; > > for (var = local_repo_env; *var; var++) { > if (strcmp(*var, CONFIG_DATA_ENVIRONMENT)) > strvec_push(out, *var); > } > } > > which tries to "unsetenv" the environment variables that pertain to > the current repository listed in loocal_repo_env[], but makes > exception for GIT_CONFIG_PARAMETERS. Right. We need to unset some parameters, but not everything... > > It originally came from 14111fc4 (git: submodule honor -c > credential.* from command line, 2016-02-29) and later simplified by > 89044baa (submodule: stop sanitizing config options, 2016-05-04). > > Now after d8d77153 (config: allow specifying config entries via > envvar pairs, 2021-01-12), we have yet another way to pass a set of > custom one-shot configuration via the environment variable, using > GIT_CONFIG_COUNT (which is in local_repo_env[] and will be removed > from the environment by this helper function), GIT_CONFIG_KEY_$n and > GIT_CONFIG_VALUE_$n (which are unbound set and naturally not in > local_repo_env[]). Leaving the latter two exported will not hurt if > we do intend to hide the custom configuration from the subprocess by > unsetting GIT_CONFIG_COUNT, but should we be doing so? > I think it's a difficult question because if I recall, there was some question/concern about what values need to stay for the submodule vs which ones do not..? > There are many run_command() users that just pass local_repo_env[] > to the child.env when running a subprocess. Given that the code > that works in a submodule, which presumably is THE primary target > of the "we do not want to pass environment variables that pertain to > the current repository but not to the repository the child process > works in" consideration that the local_repo_env[] is about, does *not* > want the GIT_CONFIG_PARAMETERS cleansed, I have to wonder if the > environment variables (the original GIT_CONFIG_PARAMETERS as well as > Patrick's GIT_CONFIG_{COUNT,KEY_$n,VALUE_$n}) should be in that > local_repo_env[] array in the first place. If we remove them, the > above helper function can just go away and be replaced with the > usual child.env = local_repo_env assignment like everybody else. > I think that makes a reasonable amount of sense. So if I understand right, this change makes it so that CONFIG_DATA_ENVIRONMENT and CONFIG_COUNT_ENVIRONMENT will no longer be forwarded? I guess I am a bit confused about the current status vs what we want here. > Comments? > > environment.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git c/environment.c w/environment.c > index 2f27008424..6dcee6a9c5 100644 > --- c/environment.c > +++ w/environment.c > @@ -116,8 +116,6 @@ static char *super_prefix; > const char * const local_repo_env[] = { > ALTERNATE_DB_ENVIRONMENT, > CONFIG_ENVIRONMENT, > - CONFIG_DATA_ENVIRONMENT, > - CONFIG_COUNT_ENVIRONMENT, > DB_ENVIRONMENT, > GIT_DIR_ENVIRONMENT, > GIT_WORK_TREE_ENVIRONMENT,
Jacob Keller <jacob.keller@gmail.com> writes: > On Tue, Mar 23, 2021 at 11:58 AM Junio C Hamano <gitster@pobox.com> wrote: >> >> I was grepping around and found this piece of code today: >> >> static void prepare_submodule_repo_env_no_git_dir(struct strvec *out) >> { >> const char * const *var; >> >> for (var = local_repo_env; *var; var++) { >> if (strcmp(*var, CONFIG_DATA_ENVIRONMENT)) >> strvec_push(out, *var); >> } >> } >> >> which tries to "unsetenv" the environment variables that pertain to >> the current repository listed in loocal_repo_env[], but makes >> exception for GIT_CONFIG_PARAMETERS. > > Right. We need to unset some parameters, but not everything... > >> >> It originally came from 14111fc4 (git: submodule honor -c >> credential.* from command line, 2016-02-29) and later simplified by >> 89044baa (submodule: stop sanitizing config options, 2016-05-04). >> >> Now after d8d77153 (config: allow specifying config entries via >> envvar pairs, 2021-01-12), we have yet another way to pass a set of >> custom one-shot configuration via the environment variable, using >> GIT_CONFIG_COUNT (which is in local_repo_env[] and will be removed >> from the environment by this helper function), GIT_CONFIG_KEY_$n and >> GIT_CONFIG_VALUE_$n (which are unbound set and naturally not in >> local_repo_env[]). Leaving the latter two exported will not hurt if >> we do intend to hide the custom configuration from the subprocess by >> unsetting GIT_CONFIG_COUNT, but should we be doing so? >> > > I think it's a difficult question because if I recall, there was some > question/concern about what values need to stay for the submodule vs > which ones do not..? > >> There are many run_command() users that just pass local_repo_env[] >> to the child.env when running a subprocess. Given that the code >> that works in a submodule, which presumably is THE primary target >> of the "we do not want to pass environment variables that pertain to >> the current repository but not to the repository the child process >> works in" consideration that the local_repo_env[] is about, does *not* >> want the GIT_CONFIG_PARAMETERS cleansed, I have to wonder if the >> environment variables (the original GIT_CONFIG_PARAMETERS as well as >> Patrick's GIT_CONFIG_{COUNT,KEY_$n,VALUE_$n}) should be in that >> local_repo_env[] array in the first place. If we remove them, the >> above helper function can just go away and be replaced with the >> usual child.env = local_repo_env assignment like everybody else. >> > > I think that makes a reasonable amount of sense. So if I understand > right, this change makes it so that CONFIG_DATA_ENVIRONMENT and > CONFIG_COUNT_ENVIRONMENT will no longer be forwarded? I guess I am a > bit confused about the current status vs what we want here. Many callers do child.env = local_repo_env when spawning a subprocess. The elements of child.env is treated as if each of them were passed to setenv() (if there is '=') or unsetenv (otherwise), and because local_repo_env[] spells only the variable names, the effect is to unexport them. The helper function shown at the beginning of the message you are responding to, which you wrote more than 5 years ago, is to exclude GIT_CONFIG_PARAMETERS from that treatment. I.e. the code wants run_command() not to drop that particular environment variable when running a subprocess. Removing GIT_CONFIG_PARAMETERS from local_repo_env[] should have the same effect, without the helper to special case it in its loop. We have been passing GIT_CONFIG_PARAMETERS, and we will keep passing it even if we make such a change to remove it from local_repo_env[]. The configuration parameters passed via the newer GIT_CONFIG_COUNT mechanism, because local_repo_env[] has it but the above helper does not special case it, are dropped and not seen by the subprocess. Assuming that it is a bug and we would want to pass them to the subprocess the same way as GIT_CONFIG_PARAMETERS environment variable, we could tweak the helper function to make it special case GIT_CONFIG_COUNT the same way as we've done GIT_CONFIG_PARAMETERS for the past 5 years. But if we suspect that other codepaths (not the ones that use the above helper) may be doing it wrong and they too should pass the configuration parameters to the subprocess, a simpler way would be to remove them from local_repo_env[]. That is the summary of the current status and what would happen if we did the attached patch. >> Comments? >> >> environment.c | 2 -- >> 1 file changed, 2 deletions(-) >> >> diff --git c/environment.c w/environment.c >> index 2f27008424..6dcee6a9c5 100644 >> --- c/environment.c >> +++ w/environment.c >> @@ -116,8 +116,6 @@ static char *super_prefix; >> const char * const local_repo_env[] = { >> ALTERNATE_DB_ENVIRONMENT, >> CONFIG_ENVIRONMENT, >> - CONFIG_DATA_ENVIRONMENT, >> - CONFIG_COUNT_ENVIRONMENT, >> DB_ENVIRONMENT, >> GIT_DIR_ENVIRONMENT, >> GIT_WORK_TREE_ENVIRONMENT,
On Tue, Mar 23, 2021 at 2:48 PM Junio C Hamano <gitster@pobox.com> wrote: > Many callers do child.env = local_repo_env when spawning a > subprocess. The elements of child.env is treated as if each of them > were passed to setenv() (if there is '=') or unsetenv (otherwise), > and because local_repo_env[] spells only the variable names, the > effect is to unexport them. The helper function shown at the > beginning of the message you are responding to, which you wrote more > than 5 years ago, is to exclude GIT_CONFIG_PARAMETERS from that > treatment. I.e. the code wants run_command() not to drop that > particular environment variable when running a subprocess. > Ok, right. So the part that was confusing me is that by adding the value into the local_repo_env, we were telling it to clear the variables. Yep. > Removing GIT_CONFIG_PARAMETERS from local_repo_env[] should have the > same effect, without the helper to special case it in its loop. > We have been passing GIT_CONFIG_PARAMETERS, and we will keep passing > it even if we make such a change to remove it from local_repo_env[]. > Yea, makes sense. I think that's a much better approach than special casing in a separate function. > The configuration parameters passed via the newer GIT_CONFIG_COUNT > mechanism, because local_repo_env[] has it but the above helper does > not special case it, are dropped and not seen by the subprocess. > Assuming that it is a bug and we would want to pass them to the > subprocess the same way as GIT_CONFIG_PARAMETERS environment > variable, we could tweak the helper function to make it special case > GIT_CONFIG_COUNT the same way as we've done GIT_CONFIG_PARAMETERS > for the past 5 years. But if we suspect that other codepaths (not > the ones that use the above helper) may be doing it wrong and they > too should pass the configuration parameters to the subprocess, a > simpler way would be to remove them from local_repo_env[]. > > That is the summary of the current status and what would happen if > we did the attached patch. > Thank you for restating and clarifying. I think this is the right approach, and I agree that GIT_CONFIG_COUNT should be treated in the same way as GIT_CONFIG_PARAMETERS. So, I think this direction is good. I imagine a full patch would include also dropping the specialized helper function that is no longer needed, and possibly adding new tests for the behavior of GIT_CONFIG_COUNT? Thanks, Jake
Jacob Keller <jacob.keller@gmail.com> writes: > So, I think this direction is good. I imagine a full patch would > include also dropping the specialized helper function that is no > longer needed, and possibly adding new tests for the behavior of > GIT_CONFIG_COUNT? Yeah, coding that is the easiest part. Thinking through ramifications of making (or not making) such a change is much harder. I said "assuming" number of times, because I am not so sure if the subprocesses spawned from other codepaths do or do not want to see the one-shot custom configuration settings. If that assumption turns out to be wrong and the processes spawned using the helper in various helper functions in submodule.c are the oddball cases that want to see the custom configuration, then such a change would break existing users. I _think_ the one in connect.c, which runs either the ssh transport (for which the processes that run on the other side in the other repository won't be affected by our environment anyway) or the file transport that runs another process and talks with it over a pipe is probably OK if the configuration on the "client" side leaks through to the "server" side, e.g. $ git -c advice.ignoredHook=false clone file:///the/repo.git/ here would probably want the other end (i.e. the one that runs upload-pack in /the/repo.git/ directory) to see the one-shot configuration, too. I do not think it makes much difference to the use of local_repo_env in object-file.c::for_each_alternate_ref() either way; it could be used (via core.alternateRefsCommand) an arbitrary command in each alternate repository, but by default it runs for-each-ref in them, and I do not think of any configuration variables that would be useful on "the other side". And I suspect that trailers.c::apply_command() excludes these environment variables just out of habit without much deep thinking. It is not going in a different repository to run the command, and santitizing the environment that pertains to this repository should not have any meaningful effect [*]. So, I would not be surprised if it were a totally safe change, but I am not yet sure. Thanks. [Footnote] * Christian CC'ed for 85039fb6 (trailer: execute command from 'trailer.<name>.command', 2014-10-13) that introduced the code together with its use of local_repo_env[].
On Tue, Mar 23, 2021 at 03:35:07PM -0700, Junio C Hamano wrote: > > So, I think this direction is good. I imagine a full patch would > > include also dropping the specialized helper function that is no > > longer needed, and possibly adding new tests for the behavior of > > GIT_CONFIG_COUNT? > > Yeah, coding that is the easiest part. Thinking through > ramifications of making (or not making) such a change is much > harder. > > I said "assuming" number of times, because I am not so sure if the > subprocesses spawned from other codepaths do or do not want to see > the one-shot custom configuration settings. If that assumption > turns out to be wrong and the processes spawned using the helper in > various helper functions in submodule.c are the oddball cases that > want to see the custom configuration, then such a change would break > existing users. I think it really depends on the command being spawned. But keep in mind that the local_repo_env list is not limited just to callers inside of Git. We expose it to the user via rev-parse, so scripts can do: unset $(git rev-parse --local-env-vars) cd /some/other/repo I'm hesitant to change the output there, since we don't know exactly how it's used in the wild[1]. Changing what our internal callers do is less risky, though I'd generally avoid doing so unless there is a known benefit. And I'm sure what the benefit is; I think this came up mostly because you were looking at harmonizing the behavior of the two config systems (and I think that _is_ worth doing, but I'd probably choose the historical behavior for the new system). I also think it really depends on the specific config the user is expecting to get passed. Remember we used to have a whitelist for "this config is OK to pass to submodules", but it was such a mess that we did away with it in 89044baa8b (submodule: stop sanitizing config options, 2016-05-04). > I _think_ the one in connect.c, which runs either the ssh transport > (for which the processes that run on the other side in the other > repository won't be affected by our environment anyway) or the file > transport that runs another process and talks with it over a pipe is > probably OK if the configuration on the "client" side leaks through > to the "server" side, e.g. > > $ git -c advice.ignoredHook=false clone file:///the/repo.git/ here > > would probably want the other end (i.e. the one that runs upload-pack > in /the/repo.git/ directory) to see the one-shot configuration, too. That example is one of the reasons I prefer _not_ to pass config here. It only works over local-process invocations! Not over ssh://, nor git://, nor https://. Even though it will do what you want in this case, the overall behavior is more confusing. The more-consistent (or less inconsistent, perhaps) way is: git clone -u 'git -c advice.ignoredHook=false upload-pack' \ file:///the/repo.git which also works with ssh. It of course _doesn't_ work with other protocols, but I think the technique at least makes it more clear why that is the case (you do not get to specify arbitrary shell commands to https servers). > I do not think it makes much difference to the use of local_repo_env > in object-file.c::for_each_alternate_ref() either way; it could be > used (via core.alternateRefsCommand) an arbitrary command in each > alternate repository, but by default it runs for-each-ref in them, > and I do not think of any configuration variables that would be > useful on "the other side". > > And I suspect that trailers.c::apply_command() excludes these > environment variables just out of habit without much deep thinking. > It is not going in a different repository to run the command, and > santitizing the environment that pertains to this repository should > not have any meaningful effect [*]. > > So, I would not be surprised if it were a totally safe change, but I > am not yet sure. My suspicion is that for most cases, nobody cares that much either way (which is why we have not seen people ask "hey, why is my config not passed down" in any context _except_ submodules). -Peff [1] The one place I have used "rev-parse --local-env-vars" is in scripts to handle alternates storage. When we "sync" from a fork to the shared-object repo, there are some Git commands that must run in one, and some in the other. Likewise, there is "do maintenance" script which can be triggered from a fork, but then moves into the shared repo. Some of those sites explicitly allow config by doing: unset $(git rev-parse --local-env-vars | grep -v '^GIT_CONFIG') because we want "git -c pack.threads=17 do-the-maintenance" to respect that config after moving into the shared repo to do the repack. But others don't. _Probably_ nothing would go too wrong if we retained config, but I'd want to look at each. Which makes me concerned that other scripts in the wild would likewise be potentially surprised by this.
On Wed, Mar 24, 2021 at 02:39:51PM -0400, Jeff King wrote: > On Tue, Mar 23, 2021 at 03:35:07PM -0700, Junio C Hamano wrote: > > > > So, I think this direction is good. I imagine a full patch would > > > include also dropping the specialized helper function that is no > > > longer needed, and possibly adding new tests for the behavior of > > > GIT_CONFIG_COUNT? > > > > Yeah, coding that is the easiest part. Thinking through > > ramifications of making (or not making) such a change is much > > harder. > > > > I said "assuming" number of times, because I am not so sure if the > > subprocesses spawned from other codepaths do or do not want to see > > the one-shot custom configuration settings. If that assumption > > turns out to be wrong and the processes spawned using the helper in > > various helper functions in submodule.c are the oddball cases that > > want to see the custom configuration, then such a change would break > > existing users. > > I think it really depends on the command being spawned. But keep in mind > that the local_repo_env list is not limited just to callers inside of > Git. We expose it to the user via rev-parse, so scripts can do: > > unset $(git rev-parse --local-env-vars) > cd /some/other/repo > > I'm hesitant to change the output there, since we don't know exactly how > it's used in the wild[1]. Changing what our internal callers do is less > risky, though I'd generally avoid doing so unless there is a known > benefit. And I'm sure what the benefit is; I think this came up mostly > because you were looking at harmonizing the behavior of the two config > systems (and I think that _is_ worth doing, but I'd probably choose the > historical behavior for the new system). Agreed. I wasn't aware of this helper function at all, and aligning both config systems so they have the same behaviour there seems like the right thing to do to me. > I also think it really depends on the specific config the user is > expecting to get passed. Remember we used to have a whitelist for "this > config is OK to pass to submodules", but it was such a mess that we did > away with it in 89044baa8b (submodule: stop sanitizing config options, > 2016-05-04). That also came to my mind while this thread. I can see why it would be useful if e.g. `gc.auto=0` gets passed down to all subcommands spawned by git. But if the user for example injects remote configuration via config envvars, then it'd certainly be unexpected if submodules would try to fetch from the same in-memory remote as the parent on a recursive fetch. > > I _think_ the one in connect.c, which runs either the ssh transport > > (for which the processes that run on the other side in the other > > repository won't be affected by our environment anyway) or the file > > transport that runs another process and talks with it over a pipe is > > probably OK if the configuration on the "client" side leaks through > > to the "server" side, e.g. > > > > $ git -c advice.ignoredHook=false clone file:///the/repo.git/ here > > > > would probably want the other end (i.e. the one that runs upload-pack > > in /the/repo.git/ directory) to see the one-shot configuration, too. > > That example is one of the reasons I prefer _not_ to pass config here. > It only works over local-process invocations! Not over ssh://, nor > git://, nor https://. Even though it will do what you want in this case, > the overall behavior is more confusing. > > The more-consistent (or less inconsistent, perhaps) way is: > > git clone -u 'git -c advice.ignoredHook=false upload-pack' \ > file:///the/repo.git > > which also works with ssh. It of course _doesn't_ work with other > protocols, but I think the technique at least makes it more clear why > that is the case (you do not get to specify arbitrary shell commands to > https servers). > > > I do not think it makes much difference to the use of local_repo_env > > in object-file.c::for_each_alternate_ref() either way; it could be > > used (via core.alternateRefsCommand) an arbitrary command in each > > alternate repository, but by default it runs for-each-ref in them, > > and I do not think of any configuration variables that would be > > useful on "the other side". > > > > And I suspect that trailers.c::apply_command() excludes these > > environment variables just out of habit without much deep thinking. > > It is not going in a different repository to run the command, and > > santitizing the environment that pertains to this repository should > > not have any meaningful effect [*]. > > > > So, I would not be surprised if it were a totally safe change, but I > > am not yet sure. > > My suspicion is that for most cases, nobody cares that much either way > (which is why we have not seen people ask "hey, why is my config not > passed down" in any context _except_ submodules). Probably not, but it may be a good idea to document config boundaries such that nobody is caught by surprise there. Patrick
diff --git c/environment.c w/environment.c index 2f27008424..6dcee6a9c5 100644 --- c/environment.c +++ w/environment.c @@ -116,8 +116,6 @@ static char *super_prefix; const char * const local_repo_env[] = { ALTERNATE_DB_ENVIRONMENT, CONFIG_ENVIRONMENT, - CONFIG_DATA_ENVIRONMENT, - CONFIG_COUNT_ENVIRONMENT, DB_ENVIRONMENT, GIT_DIR_ENVIRONMENT, GIT_WORK_TREE_ENVIRONMENT,