diff mbox series

Pass or not to pass config environment down...

Message ID xmqqk0px3dfu.fsf@gitster.g (mailing list archive)
State New, archived
Headers show
Series Pass or not to pass config environment down... | expand

Commit Message

Junio C Hamano March 23, 2021, 6:57 p.m. UTC
I was grepping around and found this piece of code today:

        static void prepare_submodule_repo_env_no_git_dir(struct strvec *out)
        {
                const char * const *var;

                for (var = local_repo_env; *var; var++) {
                        if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
                                strvec_push(out, *var);
                }
        }

which tries to "unsetenv" the environment variables that pertain to
the current repository listed in loocal_repo_env[], but makes
exception for GIT_CONFIG_PARAMETERS.

It originally came from 14111fc4 (git: submodule honor -c
credential.* from command line, 2016-02-29) and later simplified by
89044baa (submodule: stop sanitizing config options, 2016-05-04).

Now after d8d77153 (config: allow specifying config entries via
envvar pairs, 2021-01-12), we have yet another way to pass a set of
custom one-shot configuration via the environment variable, using
GIT_CONFIG_COUNT (which is in local_repo_env[] and will be removed
from the environment by this helper function), GIT_CONFIG_KEY_$n and
GIT_CONFIG_VALUE_$n (which are unbound set and naturally not in
local_repo_env[]).  Leaving the latter two exported will not hurt if
we do intend to hide the custom configuration from the subprocess by
unsetting GIT_CONFIG_COUNT, but should we be doing so?

There are many run_command() users that just pass local_repo_env[]
to the child.env when running a subprocess.  Given that the code
that works in a submodule, which presumably is THE primary target
of the "we do not want to pass environment variables that pertain to
the current repository but not to the repository the child process
works in" consideration that the local_repo_env[] is about, does *not*
want the GIT_CONFIG_PARAMETERS cleansed, I have to wonder if the
environment variables (the original GIT_CONFIG_PARAMETERS as well as
Patrick's GIT_CONFIG_{COUNT,KEY_$n,VALUE_$n}) should be in that
local_repo_env[] array in the first place.  If we remove them, the
above helper function can just go away and be replaced with the
usual child.env = local_repo_env assignment like everybody else.

Comments?

 environment.c | 2 --
 1 file changed, 2 deletions(-)

Comments

Jacob Keller March 23, 2021, 8:52 p.m. UTC | #1
On Tue, Mar 23, 2021 at 11:58 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> I was grepping around and found this piece of code today:
>
>         static void prepare_submodule_repo_env_no_git_dir(struct strvec *out)
>         {
>                 const char * const *var;
>
>                 for (var = local_repo_env; *var; var++) {
>                         if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
>                                 strvec_push(out, *var);
>                 }
>         }
>
> which tries to "unsetenv" the environment variables that pertain to
> the current repository listed in loocal_repo_env[], but makes
> exception for GIT_CONFIG_PARAMETERS.

Right. We need to unset some parameters, but not everything...

>
> It originally came from 14111fc4 (git: submodule honor -c
> credential.* from command line, 2016-02-29) and later simplified by
> 89044baa (submodule: stop sanitizing config options, 2016-05-04).
>
> Now after d8d77153 (config: allow specifying config entries via
> envvar pairs, 2021-01-12), we have yet another way to pass a set of
> custom one-shot configuration via the environment variable, using
> GIT_CONFIG_COUNT (which is in local_repo_env[] and will be removed
> from the environment by this helper function), GIT_CONFIG_KEY_$n and
> GIT_CONFIG_VALUE_$n (which are unbound set and naturally not in
> local_repo_env[]).  Leaving the latter two exported will not hurt if
> we do intend to hide the custom configuration from the subprocess by
> unsetting GIT_CONFIG_COUNT, but should we be doing so?
>

I think it's a difficult question because if I recall, there was some
question/concern about what values need to stay for the submodule vs
which ones do not..?

> There are many run_command() users that just pass local_repo_env[]
> to the child.env when running a subprocess.  Given that the code
> that works in a submodule, which presumably is THE primary target
> of the "we do not want to pass environment variables that pertain to
> the current repository but not to the repository the child process
> works in" consideration that the local_repo_env[] is about, does *not*
> want the GIT_CONFIG_PARAMETERS cleansed, I have to wonder if the
> environment variables (the original GIT_CONFIG_PARAMETERS as well as
> Patrick's GIT_CONFIG_{COUNT,KEY_$n,VALUE_$n}) should be in that
> local_repo_env[] array in the first place.  If we remove them, the
> above helper function can just go away and be replaced with the
> usual child.env = local_repo_env assignment like everybody else.
>

I think that makes a reasonable amount of sense. So if I understand
right, this change makes it so that CONFIG_DATA_ENVIRONMENT and
CONFIG_COUNT_ENVIRONMENT will no longer be forwarded? I guess I am a
bit confused about the current status vs what we want here.

> Comments?
>
>  environment.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git c/environment.c w/environment.c
> index 2f27008424..6dcee6a9c5 100644
> --- c/environment.c
> +++ w/environment.c
> @@ -116,8 +116,6 @@ static char *super_prefix;
>  const char * const local_repo_env[] = {
>         ALTERNATE_DB_ENVIRONMENT,
>         CONFIG_ENVIRONMENT,
> -       CONFIG_DATA_ENVIRONMENT,
> -       CONFIG_COUNT_ENVIRONMENT,
>         DB_ENVIRONMENT,
>         GIT_DIR_ENVIRONMENT,
>         GIT_WORK_TREE_ENVIRONMENT,
Junio C Hamano March 23, 2021, 9:48 p.m. UTC | #2
Jacob Keller <jacob.keller@gmail.com> writes:

> On Tue, Mar 23, 2021 at 11:58 AM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> I was grepping around and found this piece of code today:
>>
>>         static void prepare_submodule_repo_env_no_git_dir(struct strvec *out)
>>         {
>>                 const char * const *var;
>>
>>                 for (var = local_repo_env; *var; var++) {
>>                         if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
>>                                 strvec_push(out, *var);
>>                 }
>>         }
>>
>> which tries to "unsetenv" the environment variables that pertain to
>> the current repository listed in loocal_repo_env[], but makes
>> exception for GIT_CONFIG_PARAMETERS.
>
> Right. We need to unset some parameters, but not everything...
>
>>
>> It originally came from 14111fc4 (git: submodule honor -c
>> credential.* from command line, 2016-02-29) and later simplified by
>> 89044baa (submodule: stop sanitizing config options, 2016-05-04).
>>
>> Now after d8d77153 (config: allow specifying config entries via
>> envvar pairs, 2021-01-12), we have yet another way to pass a set of
>> custom one-shot configuration via the environment variable, using
>> GIT_CONFIG_COUNT (which is in local_repo_env[] and will be removed
>> from the environment by this helper function), GIT_CONFIG_KEY_$n and
>> GIT_CONFIG_VALUE_$n (which are unbound set and naturally not in
>> local_repo_env[]).  Leaving the latter two exported will not hurt if
>> we do intend to hide the custom configuration from the subprocess by
>> unsetting GIT_CONFIG_COUNT, but should we be doing so?
>>
>
> I think it's a difficult question because if I recall, there was some
> question/concern about what values need to stay for the submodule vs
> which ones do not..?
>
>> There are many run_command() users that just pass local_repo_env[]
>> to the child.env when running a subprocess.  Given that the code
>> that works in a submodule, which presumably is THE primary target
>> of the "we do not want to pass environment variables that pertain to
>> the current repository but not to the repository the child process
>> works in" consideration that the local_repo_env[] is about, does *not*
>> want the GIT_CONFIG_PARAMETERS cleansed, I have to wonder if the
>> environment variables (the original GIT_CONFIG_PARAMETERS as well as
>> Patrick's GIT_CONFIG_{COUNT,KEY_$n,VALUE_$n}) should be in that
>> local_repo_env[] array in the first place.  If we remove them, the
>> above helper function can just go away and be replaced with the
>> usual child.env = local_repo_env assignment like everybody else.
>>
>
> I think that makes a reasonable amount of sense. So if I understand
> right, this change makes it so that CONFIG_DATA_ENVIRONMENT and
> CONFIG_COUNT_ENVIRONMENT will no longer be forwarded? I guess I am a
> bit confused about the current status vs what we want here.

Many callers do child.env = local_repo_env when spawning a
subprocess.  The elements of child.env is treated as if each of them
were passed to setenv() (if there is '=') or unsetenv (otherwise),
and because local_repo_env[] spells only the variable names, the
effect is to unexport them.  The helper function shown at the
beginning of the message you are responding to, which you wrote more
than 5 years ago, is to exclude GIT_CONFIG_PARAMETERS from that
treatment.  I.e. the code wants run_command() not to drop that
particular environment variable when running a subprocess.

Removing GIT_CONFIG_PARAMETERS from local_repo_env[] should have the
same effect, without the helper to special case it in its loop.
We have been passing GIT_CONFIG_PARAMETERS, and we will keep passing
it even if we make such a change to remove it from local_repo_env[].

The configuration parameters passed via the newer GIT_CONFIG_COUNT
mechanism, because local_repo_env[] has it but the above helper does
not special case it, are dropped and not seen by the subprocess.
Assuming that it is a bug and we would want to pass them to the
subprocess the same way as GIT_CONFIG_PARAMETERS environment
variable, we could tweak the helper function to make it special case
GIT_CONFIG_COUNT the same way as we've done GIT_CONFIG_PARAMETERS
for the past 5 years.  But if we suspect that other codepaths (not
the ones that use the above helper) may be doing it wrong and they
too should pass the configuration parameters to the subprocess, a
simpler way would be to remove them from local_repo_env[].

That is the summary of the current status and what would happen if
we did the attached patch.

>> Comments?
>>
>>  environment.c | 2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git c/environment.c w/environment.c
>> index 2f27008424..6dcee6a9c5 100644
>> --- c/environment.c
>> +++ w/environment.c
>> @@ -116,8 +116,6 @@ static char *super_prefix;
>>  const char * const local_repo_env[] = {
>>         ALTERNATE_DB_ENVIRONMENT,
>>         CONFIG_ENVIRONMENT,
>> -       CONFIG_DATA_ENVIRONMENT,
>> -       CONFIG_COUNT_ENVIRONMENT,
>>         DB_ENVIRONMENT,
>>         GIT_DIR_ENVIRONMENT,
>>         GIT_WORK_TREE_ENVIRONMENT,
Jacob Keller March 23, 2021, 9:57 p.m. UTC | #3
On Tue, Mar 23, 2021 at 2:48 PM Junio C Hamano <gitster@pobox.com> wrote:
> Many callers do child.env = local_repo_env when spawning a
> subprocess.  The elements of child.env is treated as if each of them
> were passed to setenv() (if there is '=') or unsetenv (otherwise),
> and because local_repo_env[] spells only the variable names, the
> effect is to unexport them.  The helper function shown at the
> beginning of the message you are responding to, which you wrote more
> than 5 years ago, is to exclude GIT_CONFIG_PARAMETERS from that
> treatment.  I.e. the code wants run_command() not to drop that
> particular environment variable when running a subprocess.
>

Ok, right. So the part that was confusing me is that by adding the
value into the local_repo_env, we were telling it to clear the
variables. Yep.

> Removing GIT_CONFIG_PARAMETERS from local_repo_env[] should have the
> same effect, without the helper to special case it in its loop.
> We have been passing GIT_CONFIG_PARAMETERS, and we will keep passing
> it even if we make such a change to remove it from local_repo_env[].
>

Yea, makes sense. I think that's a much better approach than special
casing in a separate function.

> The configuration parameters passed via the newer GIT_CONFIG_COUNT
> mechanism, because local_repo_env[] has it but the above helper does
> not special case it, are dropped and not seen by the subprocess.
> Assuming that it is a bug and we would want to pass them to the
> subprocess the same way as GIT_CONFIG_PARAMETERS environment
> variable, we could tweak the helper function to make it special case
> GIT_CONFIG_COUNT the same way as we've done GIT_CONFIG_PARAMETERS
> for the past 5 years.  But if we suspect that other codepaths (not
> the ones that use the above helper) may be doing it wrong and they
> too should pass the configuration parameters to the subprocess, a
> simpler way would be to remove them from local_repo_env[].
>
> That is the summary of the current status and what would happen if
> we did the attached patch.
>

Thank you for restating and clarifying. I think this is the right
approach, and I agree that GIT_CONFIG_COUNT should be treated in the
same way as GIT_CONFIG_PARAMETERS.

So, I think this direction is good. I imagine a full patch would
include also dropping the specialized helper function that is no
longer needed, and possibly adding new tests for the behavior of
GIT_CONFIG_COUNT?

Thanks,
Jake
Junio C Hamano March 23, 2021, 10:35 p.m. UTC | #4
Jacob Keller <jacob.keller@gmail.com> writes:

> So, I think this direction is good. I imagine a full patch would
> include also dropping the specialized helper function that is no
> longer needed, and possibly adding new tests for the behavior of
> GIT_CONFIG_COUNT?

Yeah, coding that is the easiest part.  Thinking through
ramifications of making (or not making) such a change is much
harder.

I said "assuming" number of times, because I am not so sure if the
subprocesses spawned from other codepaths do or do not want to see
the one-shot custom configuration settings.  If that assumption
turns out to be wrong and the processes spawned using the helper in
various helper functions in submodule.c are the oddball cases that
want to see the custom configuration, then such a change would break
existing users.

I _think_ the one in connect.c, which runs either the ssh transport
(for which the processes that run on the other side in the other
repository won't be affected by our environment anyway) or the file
transport that runs another process and talks with it over a pipe is
probably OK if the configuration on the "client" side leaks through
to the "server" side, e.g.

    $ git -c advice.ignoredHook=false clone file:///the/repo.git/ here

would probably want the other end (i.e. the one that runs upload-pack
in /the/repo.git/ directory) to see the one-shot configuration, too.

I do not think it makes much difference to the use of local_repo_env
in object-file.c::for_each_alternate_ref() either way; it could be
used (via core.alternateRefsCommand) an arbitrary command in each
alternate repository, but by default it runs for-each-ref in them,
and I do not think of any configuration variables that would be
useful on "the other side".

And I suspect that trailers.c::apply_command() excludes these
environment variables just out of habit without much deep thinking.
It is not going in a different repository to run the command, and
santitizing the environment that pertains to this repository should
not have any meaningful effect [*].

So, I would not be surprised if it were a totally safe change, but I
am not yet sure.

Thanks.



[Footnote]

* Christian CC'ed for 85039fb6 (trailer: execute command from
  'trailer.<name>.command', 2014-10-13) that introduced the code
  together with its use of local_repo_env[].
Jeff King March 24, 2021, 6:39 p.m. UTC | #5
On Tue, Mar 23, 2021 at 03:35:07PM -0700, Junio C Hamano wrote:

> > So, I think this direction is good. I imagine a full patch would
> > include also dropping the specialized helper function that is no
> > longer needed, and possibly adding new tests for the behavior of
> > GIT_CONFIG_COUNT?
> 
> Yeah, coding that is the easiest part.  Thinking through
> ramifications of making (or not making) such a change is much
> harder.
> 
> I said "assuming" number of times, because I am not so sure if the
> subprocesses spawned from other codepaths do or do not want to see
> the one-shot custom configuration settings.  If that assumption
> turns out to be wrong and the processes spawned using the helper in
> various helper functions in submodule.c are the oddball cases that
> want to see the custom configuration, then such a change would break
> existing users.

I think it really depends on the command being spawned. But keep in mind
that the local_repo_env list is not limited just to callers inside of
Git. We expose it to the user via rev-parse, so scripts can do:

  unset $(git rev-parse --local-env-vars)
  cd /some/other/repo

I'm hesitant to change the output there, since we don't know exactly how
it's used in the wild[1]. Changing what our internal callers do is less
risky, though I'd generally avoid doing so unless there is a known
benefit. And I'm sure what the benefit is; I think this came up mostly
because you were looking at harmonizing the behavior of the two config
systems (and I think that _is_ worth doing, but I'd probably choose the
historical behavior for the new system).

I also think it really depends on the specific config the user is
expecting to get passed. Remember we used to have a whitelist for "this
config is OK to pass to submodules", but it was such a mess that we did
away with it in 89044baa8b (submodule: stop sanitizing config options,
2016-05-04).

> I _think_ the one in connect.c, which runs either the ssh transport
> (for which the processes that run on the other side in the other
> repository won't be affected by our environment anyway) or the file
> transport that runs another process and talks with it over a pipe is
> probably OK if the configuration on the "client" side leaks through
> to the "server" side, e.g.
> 
>     $ git -c advice.ignoredHook=false clone file:///the/repo.git/ here
> 
> would probably want the other end (i.e. the one that runs upload-pack
> in /the/repo.git/ directory) to see the one-shot configuration, too.

That example is one of the reasons I prefer _not_ to pass config here.
It only works over local-process invocations! Not over ssh://, nor
git://, nor https://. Even though it will do what you want in this case,
the overall behavior is more confusing.

The more-consistent (or less inconsistent, perhaps) way is:

  git clone -u 'git -c advice.ignoredHook=false upload-pack' \
    file:///the/repo.git

which also works with ssh. It of course _doesn't_ work with other
protocols, but I think the technique at least makes it more clear why
that is the case (you do not get to specify arbitrary shell commands to
https servers).

> I do not think it makes much difference to the use of local_repo_env
> in object-file.c::for_each_alternate_ref() either way; it could be
> used (via core.alternateRefsCommand) an arbitrary command in each
> alternate repository, but by default it runs for-each-ref in them,
> and I do not think of any configuration variables that would be
> useful on "the other side".
> 
> And I suspect that trailers.c::apply_command() excludes these
> environment variables just out of habit without much deep thinking.
> It is not going in a different repository to run the command, and
> santitizing the environment that pertains to this repository should
> not have any meaningful effect [*].
> 
> So, I would not be surprised if it were a totally safe change, but I
> am not yet sure.

My suspicion is that for most cases, nobody cares that much either way
(which is why we have not seen people ask "hey, why is my config not
passed down" in any context _except_ submodules).

-Peff

[1] The one place I have used "rev-parse --local-env-vars" is in scripts
    to handle alternates storage. When we "sync" from a fork to the
    shared-object repo, there are some Git commands that must run in
    one, and some in the other. Likewise, there is "do maintenance"
    script which can be triggered from a fork, but then moves into the
    shared repo. Some of those sites explicitly allow config by doing:

      unset $(git rev-parse --local-env-vars | grep -v '^GIT_CONFIG')

    because we want "git -c pack.threads=17 do-the-maintenance" to
    respect that config after moving into the shared repo to do the
    repack. But others don't. _Probably_ nothing would go too wrong if
    we retained config, but I'd want to look at each. Which makes me
    concerned that other scripts in the wild would likewise be
    potentially surprised by this.
Patrick Steinhardt March 31, 2021, 11:24 a.m. UTC | #6
On Wed, Mar 24, 2021 at 02:39:51PM -0400, Jeff King wrote:
> On Tue, Mar 23, 2021 at 03:35:07PM -0700, Junio C Hamano wrote:
> 
> > > So, I think this direction is good. I imagine a full patch would
> > > include also dropping the specialized helper function that is no
> > > longer needed, and possibly adding new tests for the behavior of
> > > GIT_CONFIG_COUNT?
> > 
> > Yeah, coding that is the easiest part.  Thinking through
> > ramifications of making (or not making) such a change is much
> > harder.
> > 
> > I said "assuming" number of times, because I am not so sure if the
> > subprocesses spawned from other codepaths do or do not want to see
> > the one-shot custom configuration settings.  If that assumption
> > turns out to be wrong and the processes spawned using the helper in
> > various helper functions in submodule.c are the oddball cases that
> > want to see the custom configuration, then such a change would break
> > existing users.
> 
> I think it really depends on the command being spawned. But keep in mind
> that the local_repo_env list is not limited just to callers inside of
> Git. We expose it to the user via rev-parse, so scripts can do:
> 
>   unset $(git rev-parse --local-env-vars)
>   cd /some/other/repo
> 
> I'm hesitant to change the output there, since we don't know exactly how
> it's used in the wild[1]. Changing what our internal callers do is less
> risky, though I'd generally avoid doing so unless there is a known
> benefit. And I'm sure what the benefit is; I think this came up mostly
> because you were looking at harmonizing the behavior of the two config
> systems (and I think that _is_ worth doing, but I'd probably choose the
> historical behavior for the new system).

Agreed. I wasn't aware of this helper function at all, and aligning both
config systems so they have the same behaviour there seems like the
right thing to do to me.

> I also think it really depends on the specific config the user is
> expecting to get passed. Remember we used to have a whitelist for "this
> config is OK to pass to submodules", but it was such a mess that we did
> away with it in 89044baa8b (submodule: stop sanitizing config options,
> 2016-05-04).

That also came to my mind while this thread. I can see why it would be
useful if e.g. `gc.auto=0` gets passed down to all subcommands spawned
by git. But if the user for example injects remote configuration via
config envvars, then it'd certainly be unexpected if submodules would
try to fetch from the same in-memory remote as the parent on a recursive
fetch.

> > I _think_ the one in connect.c, which runs either the ssh transport
> > (for which the processes that run on the other side in the other
> > repository won't be affected by our environment anyway) or the file
> > transport that runs another process and talks with it over a pipe is
> > probably OK if the configuration on the "client" side leaks through
> > to the "server" side, e.g.
> > 
> >     $ git -c advice.ignoredHook=false clone file:///the/repo.git/ here
> > 
> > would probably want the other end (i.e. the one that runs upload-pack
> > in /the/repo.git/ directory) to see the one-shot configuration, too.
> 
> That example is one of the reasons I prefer _not_ to pass config here.
> It only works over local-process invocations! Not over ssh://, nor
> git://, nor https://. Even though it will do what you want in this case,
> the overall behavior is more confusing.
> 
> The more-consistent (or less inconsistent, perhaps) way is:
> 
>   git clone -u 'git -c advice.ignoredHook=false upload-pack' \
>     file:///the/repo.git
> 
> which also works with ssh. It of course _doesn't_ work with other
> protocols, but I think the technique at least makes it more clear why
> that is the case (you do not get to specify arbitrary shell commands to
> https servers).
> 
> > I do not think it makes much difference to the use of local_repo_env
> > in object-file.c::for_each_alternate_ref() either way; it could be
> > used (via core.alternateRefsCommand) an arbitrary command in each
> > alternate repository, but by default it runs for-each-ref in them,
> > and I do not think of any configuration variables that would be
> > useful on "the other side".
> > 
> > And I suspect that trailers.c::apply_command() excludes these
> > environment variables just out of habit without much deep thinking.
> > It is not going in a different repository to run the command, and
> > santitizing the environment that pertains to this repository should
> > not have any meaningful effect [*].
> > 
> > So, I would not be surprised if it were a totally safe change, but I
> > am not yet sure.
> 
> My suspicion is that for most cases, nobody cares that much either way
> (which is why we have not seen people ask "hey, why is my config not
> passed down" in any context _except_ submodules).

Probably not, but it may be a good idea to document config boundaries
such that nobody is caught by surprise there.

Patrick
diff mbox series

Patch

diff --git c/environment.c w/environment.c
index 2f27008424..6dcee6a9c5 100644
--- c/environment.c
+++ w/environment.c
@@ -116,8 +116,6 @@  static char *super_prefix;
 const char * const local_repo_env[] = {
 	ALTERNATE_DB_ENVIRONMENT,
 	CONFIG_ENVIRONMENT,
-	CONFIG_DATA_ENVIRONMENT,
-	CONFIG_COUNT_ENVIRONMENT,
 	DB_ENVIRONMENT,
 	GIT_DIR_ENVIRONMENT,
 	GIT_WORK_TREE_ENVIRONMENT,