diff mbox series

setup: support GIT_IGNORE_INSECURE_OWNER environment variable

Message ID 20240626123358.420292-2-flo@geekplace.eu (mailing list archive)
State New, archived
Headers show
Series setup: support GIT_IGNORE_INSECURE_OWNER environment variable | expand

Commit Message

Florian Schmaus June 26, 2024, 12:33 p.m. UTC
Sometimes more flexibility to disable/ignore the ownership check, besides
the safe.directory configuration option, is required.

For example, git-daemon running as nobody user, which typically has no
home directory. Therefore, we can not add the path to a user-global
configuration and adding the path to the system-wide configuration could
have negative security implications.

Therefore, make the check configurable via an environment variable.

If the environment variable GIT_IGNORE_INSECURE_OWNER is set to true,
then ignore potentially insecure ownership of git-related path
components.

Signed-off-by: Florian Schmaus <flo@geekplace.eu>
---
 setup.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Phillip Wood June 26, 2024, 1:11 p.m. UTC | #1
Hi Florian

On 26/06/2024 13:33, Florian Schmaus wrote:
> Sometimes more flexibility to disable/ignore the ownership check, besides
> the safe.directory configuration option, is required.
> 
> For example, git-daemon running as nobody user, which typically has no
> home directory. Therefore, we can not add the path to a user-global
> configuration and adding the path to the system-wide configuration could
> have negative security implications.
> 
> Therefore, make the check configurable via an environment variable.

An alternative would be to allow safe.directory to be specified on the 
command line with "git -c safe.directory='*' daemon ..." rather than 
adding a dedicated environment variable. Or you could set $HOME to a 
suitable directory when running "git daemon" and put the user-global 
config file there. That directory and config file only need to be 
readable by the user that "git daemon" is running under it can be owned 
by root or whoever else you want.

Best Wishes

Phillip


> If the environment variable GIT_IGNORE_INSECURE_OWNER is set to true,
> then ignore potentially insecure ownership of git-related path
> components.
> 
> Signed-off-by: Florian Schmaus <flo@geekplace.eu>
> ---
>   setup.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> diff --git a/setup.c b/setup.c
> index 3afa6fb09b28..da3f504fb536 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -1278,6 +1278,14 @@ static int ensure_valid_ownership(const char *gitfile,
>   	 */
>   	git_protected_config(safe_directory_cb, &data);
>   
> +	if (data.is_safe)
> +		return data.is_safe;
> +
> +	if (git_env_bool("GIT_IGNORE_INSECURE_OWNER", 0)) {
> +		warning("ignoring dubious ownership in repository at '%s' (GIT_IGNORE_INSECURE_OWNER set)", data.path);
> +		return 1;
> +	}
> +
>   	return data.is_safe;
>   }
>
Randall S. Becker June 26, 2024, 3:19 p.m. UTC | #2
On Wednesday, June 26, 2024 9:12 AM, Phillip Wood wrote:
>On 26/06/2024 13:33, Florian Schmaus wrote:
>> Sometimes more flexibility to disable/ignore the ownership check,
>> besides the safe.directory configuration option, is required.
>>
>> For example, git-daemon running as nobody user, which typically has no
>> home directory. Therefore, we can not add the path to a user-global
>> configuration and adding the path to the system-wide configuration
>> could have negative security implications.
>>
>> Therefore, make the check configurable via an environment variable.
>
>An alternative would be to allow safe.directory to be specified on the command line
>with "git -c safe.directory='*' daemon ..." rather than adding a dedicated
>environment variable. Or you could set $HOME to a suitable directory when
>running "git daemon" and put the user-global config file there. That directory and
>config file only need to be readable by the user that "git daemon" is running under it
>can be owned by root or whoever else you want.

I agree with this alternative. Our CI build environment already has so many environment variables (not just from git but all dependencies and the CI environment itself) that we are pushing the 56Kb platform limit (not kidding). Reducing dependence on environment variables is, in my opinion, a good and necessary thing.

>> If the environment variable GIT_IGNORE_INSECURE_OWNER is set to true,
>> then ignore potentially insecure ownership of git-related path
>> components.
>>
>> Signed-off-by: Florian Schmaus <flo@geekplace.eu>
>> ---
>>   setup.c | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/setup.c b/setup.c
>> index 3afa6fb09b28..da3f504fb536 100644
>> --- a/setup.c
>> +++ b/setup.c
>> @@ -1278,6 +1278,14 @@ static int ensure_valid_ownership(const char *gitfile,
>>   	 */
>>   	git_protected_config(safe_directory_cb, &data);
>>
>> +	if (data.is_safe)
>> +		return data.is_safe;
>> +
>> +	if (git_env_bool("GIT_IGNORE_INSECURE_OWNER", 0)) {
>> +		warning("ignoring dubious ownership in repository at '%s'
>(GIT_IGNORE_INSECURE_OWNER set)", data.path);
>> +		return 1;
>> +	}
>> +
>>   	return data.is_safe;
>>   }
>>

Sincerely
--Randall
Phillip Wood June 26, 2024, 3:26 p.m. UTC | #3
On 26/06/2024 14:11, Phillip Wood wrote:
> Hi Florian
> 
> On 26/06/2024 13:33, Florian Schmaus wrote:
>> Sometimes more flexibility to disable/ignore the ownership check, besides
>> the safe.directory configuration option, is required.
>>
>> For example, git-daemon running as nobody user, which typically has no
>> home directory. Therefore, we can not add the path to a user-global
>> configuration and adding the path to the system-wide configuration could
>> have negative security implications.
>>
>> Therefore, make the check configurable via an environment variable.
> 
> An alternative would be to allow safe.directory to be specified on the 
> command line with "git -c safe.directory='*' daemon ..." rather than 
> adding a dedicated environment variable.

To expand an this a little - a couple of times I've wanted to checkout a 
bare repository that is owned by a different user. It is a pain to have 
to add a new config setting just for a one-off checkout. Being able to 
adjust the config on the command line would be very useful in that case.

> Or you could set $HOME to a 
> suitable directory when running "git daemon" and put the user-global 
> config file there. That directory and config file only need to be 
> readable by the user that "git daemon" is running under it can be owned 
> by root or whoever else you want.

The advantage of this approach is that there are no changes needed to 
git, instead of setting GIT_IGNORE_INSECURE_OWNER one sets HOME to point 
to a suitable config file. I found this useful when I was debugging the 
issues with git-daemon earlier[1]

Best Wishes

Phillip

[1] 
https://lore.kernel.org/git/834862fd-b579-438a-b9b3-5246bf27ce8a@gmail.com


> Best Wishes
> 
> Phillip
> 
> 
>> If the environment variable GIT_IGNORE_INSECURE_OWNER is set to true,
>> then ignore potentially insecure ownership of git-related path
>> components.
>>
>> Signed-off-by: Florian Schmaus <flo@geekplace.eu>
>> ---
>>   setup.c | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/setup.c b/setup.c
>> index 3afa6fb09b28..da3f504fb536 100644
>> --- a/setup.c
>> +++ b/setup.c
>> @@ -1278,6 +1278,14 @@ static int ensure_valid_ownership(const char 
>> *gitfile,
>>        */
>>       git_protected_config(safe_directory_cb, &data);
>> +    if (data.is_safe)
>> +        return data.is_safe;
>> +
>> +    if (git_env_bool("GIT_IGNORE_INSECURE_OWNER", 0)) {
>> +        warning("ignoring dubious ownership in repository at '%s' 
>> (GIT_IGNORE_INSECURE_OWNER set)", data.path);
>> +        return 1;
>> +    }
>> +
>>       return data.is_safe;
>>   }
> 
>
Junio C Hamano June 26, 2024, 6:11 p.m. UTC | #4
Phillip Wood <phillip.wood123@gmail.com> writes:

> To expand an this a little - a couple of times I've wanted to checkout
> a bare repository that is owned by a different user. It is a pain to
> have to add a new config setting just for a one-off checkout. Being
> able to adjust the config on the command line would be very useful in
> that case.

True.  As long as it is deemed safe to honor the one-off "git -c
safe.directory=..." from the command line, for the purpose of this
"I who am running this 'git' process hereby declare that I trust
this and that repository", I think it would be the best solution
for the "git daemon" use case.

And it is much better than adding a one-off environment variable.
After all, if your "git daemon" user does not have a $HOME set in
its /etc/passwd entry, you cannot set such an environment variable
in $HOME/.profile so somewhere in your "git daemon" invocation would
have to be tweaked to have code snippet that sets and exports it
*anyway*.  You can tweak the "git" invocation to add the command
line tweak "-c safe.directory=..." at the place you would have set
and exported the variable, and using the well understood "git -c
var=val" mechanism would be more appropriate.

>> Or you could set $HOME to a suitable directory when running "git
> ...
> The advantage of this approach is that there are no changes needed to
> git, instead of setting GIT_IGNORE_INSECURE_OWNER one sets HOME to
> point to a suitable config file. I found this useful when I was
> debugging the issues with git-daemon earlier[1]

Yup, that sounds like a workable approach, if "git -c var=val"
approach turns out to be inappropriate for security purposes
for whatever reason.

Thanks.
Phillip Wood June 26, 2024, 6:38 p.m. UTC | #5
On 26/06/2024 16:19, rsbecker@nexbridge.com wrote:
> On Wednesday, June 26, 2024 9:12 AM, Phillip Wood wrote:
>> On 26/06/2024 13:33, Florian Schmaus wrote:
>> An alternative would be to allow safe.directory to be specified on the command line
>> with "git -c safe.directory='*' daemon ..." rather than adding a dedicated
>> environment variable. Or you could set $HOME to a suitable directory when
>> running "git daemon" and put the user-global config file there. That directory and
>> config file only need to be readable by the user that "git daemon" is running under it
>> can be owned by root or whoever else you want.
> 
> I agree with this alternative. Our CI build environment already has so many environment variables (not just from git but all dependencies and the CI environment itself) that we are pushing the 56Kb platform limit (not kidding). Reducing dependence on environment variables is, in my opinion, a good and necessary thing.

"git -c" still ends up setting an environment variable internally to 
ensure that the config setting is passed down to any child processes. 
The advantage is that we're re-using an existing mechanism rather than 
introducing a new environment variable.

Best Wishes

Phillip

>>> If the environment variable GIT_IGNORE_INSECURE_OWNER is set to true,
>>> then ignore potentially insecure ownership of git-related path
>>> components.
>>>
>>> Signed-off-by: Florian Schmaus <flo@geekplace.eu>
>>> ---
>>>    setup.c | 8 ++++++++
>>>    1 file changed, 8 insertions(+)
>>>
>>> diff --git a/setup.c b/setup.c
>>> index 3afa6fb09b28..da3f504fb536 100644
>>> --- a/setup.c
>>> +++ b/setup.c
>>> @@ -1278,6 +1278,14 @@ static int ensure_valid_ownership(const char *gitfile,
>>>    	 */
>>>    	git_protected_config(safe_directory_cb, &data);
>>>
>>> +	if (data.is_safe)
>>> +		return data.is_safe;
>>> +
>>> +	if (git_env_bool("GIT_IGNORE_INSECURE_OWNER", 0)) {
>>> +		warning("ignoring dubious ownership in repository at '%s'
>> (GIT_IGNORE_INSECURE_OWNER set)", data.path);
>>> +		return 1;
>>> +	}
>>> +
>>>    	return data.is_safe;
>>>    }
>>>
> 
> Sincerely
> --Randall
>
Florian Schmaus June 26, 2024, 7:06 p.m. UTC | #6
Thanks for all your replies. Much appreciated.

On 26/06/2024 20.11, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
>> To expand an this a little - a couple of times I've wanted to checkout
>> a bare repository that is owned by a different user. It is a pain to
>> have to add a new config setting just for a one-off checkout. Being
>> able to adjust the config on the command line would be very useful in
>> that case.
> 
> True.  As long as it is deemed safe to honor the one-off "git -c
> safe.directory=..." from the command line, for the purpose of this
> "I who am running this 'git' process hereby declare that I trust
> this and that repository", I think it would be the best solution
> for the "git daemon" use case.

How does one pass "-c safe.directory=…" to git-http-backend?

I currently have an Apache config snippet like

SetEnv GIT_PROJECT_ROOT /var/www/example.org/htdocs/git
SetEnv GIT_HTTP_EXPORT_ALL
ScriptAlias /git/ /usr/libexec/git-core/git-http-backend/

<Files "git-http-backend">
   Require all granted
   AcceptPathInfo On
</Files>

to serve git repositories.

Granted, the apache user has a home directory, so I am probably able to 
set save.directory via ~/.gitconfig.

However, the point here is that git is often invoked indirectly, with no 
control over the command line arguments that are passed to it. On the 
other hand, one has usually control over the environment variables.

I agree that "-c safe.directory=…" is preferable to 
GIT_IGNORE_INSECURE_OWNER. However, sometimes using "-c 
safe.directory=…" is cumbersome and maybe even impossible.

One alternative to GIT_IGNORE_INSECURE_OWNER would be a generic 
GIT_EXTRA_ARGS environment variable. So one could set

GIT_EXTRA_ARGS="-c safe.directory=…"

Not saying that I like the idea, just pointing out this option.

- Florian
Jeff King June 26, 2024, 8:37 p.m. UTC | #7
On Wed, Jun 26, 2024 at 09:06:10PM +0200, Florian Schmaus wrote:

> > True.  As long as it is deemed safe to honor the one-off "git -c
> > safe.directory=..." from the command line, for the purpose of this
> > "I who am running this 'git' process hereby declare that I trust
> > this and that repository", I think it would be the best solution
> > for the "git daemon" use case.
> 
> How does one pass "-c safe.directory=…" to git-http-backend?
> 
> I currently have an Apache config snippet like
> 
> SetEnv GIT_PROJECT_ROOT /var/www/example.org/htdocs/git
> SetEnv GIT_HTTP_EXPORT_ALL
> ScriptAlias /git/ /usr/libexec/git-core/git-http-backend/
> 
> <Files "git-http-backend">
>   Require all granted
>   AcceptPathInfo On
> </Files>
> 
> to serve git repositories.
> 
> Granted, the apache user has a home directory, so I am probably able to set
> save.directory via ~/.gitconfig.
> 
> However, the point here is that git is often invoked indirectly, with no
> control over the command line arguments that are passed to it. On the other
> hand, one has usually control over the environment variables.
> 
> I agree that "-c safe.directory=…" is preferable to
> GIT_IGNORE_INSECURE_OWNER. However, sometimes using "-c safe.directory=…" is
> cumbersome and maybe even impossible.
> 
> One alternative to GIT_IGNORE_INSECURE_OWNER would be a generic
> GIT_EXTRA_ARGS environment variable. So one could set
> 
> GIT_EXTRA_ARGS="-c safe.directory=…"
> 
> Not saying that I like the idea, just pointing out this option.

You can do:

  GIT_CONFIG_COUNT=1
  GIT_CONFIG_KEY_0=safe.directory
  GIT_CONFIG_VALUE_0="*"

It is a bit verbose, but it's a documented interface in git-config(1).

Under the hood "git -c" uses a different, older mechanism, but we've not
documented it nor promised that it will remain stable.

-Peff
Phillip Wood June 27, 2024, 9:50 a.m. UTC | #8
On 26/06/2024 19:11, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
> 
>> To expand an this a little - a couple of times I've wanted to checkout
>> a bare repository that is owned by a different user. It is a pain to
>> have to add a new config setting just for a one-off checkout. Being
>> able to adjust the config on the command line would be very useful in
>> that case.
> 
> True.  As long as it is deemed safe to honor the one-off "git -c
> safe.directory=..." from the command line, for the purpose of this
> "I who am running this 'git' process hereby declare that I trust
> this and that repository", I think it would be the best solution
> for the "git daemon" use case.

This actually works already, the behavior was changed in 6061601d9f 
(safe.directory: use git_protected_config(), 2022-07-14). The reason I 
thought it didn't work was that I remember it failing on Debian bullseye 
a few months ago but that used an older version of git. There is some 
more rationale for the change in 779ea9303a7 (Documentation: define 
protected configuration, 2022-07-14)

Best Wishes

Phillip

> And it is much better than adding a one-off environment variable.
> After all, if your "git daemon" user does not have a $HOME set in
> its /etc/passwd entry, you cannot set such an environment variable
> in $HOME/.profile so somewhere in your "git daemon" invocation would
> have to be tweaked to have code snippet that sets and exports it
> *anyway*.  You can tweak the "git" invocation to add the command
> line tweak "-c safe.directory=..." at the place you would have set
> and exported the variable, and using the well understood "git -c
> var=val" mechanism would be more appropriate.
> 
>>> Or you could set $HOME to a suitable directory when running "git
>> ...
>> The advantage of this approach is that there are no changes needed to
>> git, instead of setting GIT_IGNORE_INSECURE_OWNER one sets HOME to
>> point to a suitable config file. I found this useful when I was
>> debugging the issues with git-daemon earlier[1]
> 
> Yup, that sounds like a workable approach, if "git -c var=val"
> approach turns out to be inappropriate for security purposes
> for whatever reason.
> 
> Thanks.
Junio C Hamano June 27, 2024, 3:28 p.m. UTC | #9
Phillip Wood <phillip.wood123@gmail.com> writes:

> On 26/06/2024 19:11, Junio C Hamano wrote:
>> Phillip Wood <phillip.wood123@gmail.com> writes:
>> 
>>> To expand an this a little - a couple of times I've wanted to checkout
>>> a bare repository that is owned by a different user. It is a pain to
>>> have to add a new config setting just for a one-off checkout. Being
>>> able to adjust the config on the command line would be very useful in
>>> that case.
>> True.  As long as it is deemed safe to honor the one-off "git -c
>> safe.directory=..." from the command line, for the purpose of this
>> "I who am running this 'git' process hereby declare that I trust
>> this and that repository", I think it would be the best solution
>> for the "git daemon" use case.
>
> This actually works already, the behavior was changed in 6061601d9f
> (safe.directory: use git_protected_config(), 2022-07-14). The reason I
> thought it didn't work was that I remember it failing on Debian
> bullseye a few months ago but that used an older version of git. There
> is some more rationale for the change in 779ea9303a7 (Documentation:
> define protected configuration, 2022-07-14)

Thanks.

So, does this more or less conclude the episode about how best to
deal with the 2.45.1 regression that Florian's patch in this thread
started?  It seems that we already have enough mechanisms to help
users tweak their existing set-up, so we may not need code changes,
but I am wondering if we want to add a bit of documentation around
safe.directory to tell them when it makes sense to set it, what
value(s) they would want to set it to, etc.

 * For "git daemon" invocations, because we know the command is run
   after chdir to a directory with '.' specified as the repository,
   we recommend to have safe.directory=., either on the command line
   with "-c var=val" or in daemon user's ~/.gitconfig, in the
   "git-daemon" help page?  We could recommend safe.directory=*, but
   they would mean the same thing in the context of running "git
   daemon".

   We may want to discuss who protects from whom with the
   safe.directory mechanism and git-daemon-export-ok mechanism.  The
   former is "the daemon trusts that repositories won't harm the
   daemon user", while the latter is "the repository owner is OK for
   it to be published".

   Also optionally, we may update the code to take the absolute path
   of the repository before passing it to the safe.directory check.

 * For "http-backend" invocations, we should think about potential
   additions that would help users, similar to what I listed above
   for "git daemon".

Having said all that, I do not think I mind GIT_SAFE_DIRECTORIES
that is a ":" separated list of paths that is honored just like the
multi-valued configuration variable safe.directory.  Once an
attacker can influence your environment variables, it already is
game over, so trusting it does not make the attack surface any
worse.  As Peff explained, we can trigger the more general "git -c
var=val" mechanism by exporting a set of environment variables, so
such a specialized environment variable is not strictly needed, but
it would make writing the "SetEnv" directive in apache configuration
(and similar ones for other HTTP server implementations) slighly
simpler and a lot more straight-forward.
Phillip Wood June 28, 2024, 9:35 a.m. UTC | #10
On 27/06/2024 16:28, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
> 
>> On 26/06/2024 19:11, Junio C Hamano wrote:
>>> Phillip Wood <phillip.wood123@gmail.com> writes:
>>>
>>>> To expand an this a little - a couple of times I've wanted to checkout
>>>> a bare repository that is owned by a different user. It is a pain to
>>>> have to add a new config setting just for a one-off checkout. Being
>>>> able to adjust the config on the command line would be very useful in
>>>> that case.
>>> True.  As long as it is deemed safe to honor the one-off "git -c
>>> safe.directory=..." from the command line, for the purpose of this
>>> "I who am running this 'git' process hereby declare that I trust
>>> this and that repository", I think it would be the best solution
>>> for the "git daemon" use case.
>>
>> This actually works already, the behavior was changed in 6061601d9f
>> (safe.directory: use git_protected_config(), 2022-07-14). The reason I
>> thought it didn't work was that I remember it failing on Debian
>> bullseye a few months ago but that used an older version of git. There
>> is some more rationale for the change in 779ea9303a7 (Documentation:
>> define protected configuration, 2022-07-14)
> 
> Thanks.
> 
> So, does this more or less conclude the episode about how best to
> deal with the 2.45.1 regression that Florian's patch in this thread
> started? 

I think so yes

> It seems that we already have enough mechanisms to help
> users tweak their existing set-up, so we may not need code changes,
> but I am wondering if we want to add a bit of documentation around
> safe.directory to tell them when it makes sense to set it, what
> value(s) they would want to set it to, etc.
> 
>   * For "git daemon" invocations, because we know the command is run
>     after chdir to a directory with '.' specified as the repository,
>     we recommend to have safe.directory=., either on the command line
>     with "-c var=val" or in daemon user's ~/.gitconfig, in the
>     "git-daemon" help page?  We could recommend safe.directory=*, but
>     they would mean the same thing in the context of running "git
>     daemon".

I think we'd be better to fix the safe.directory check as you suggest 
below if we can but failing that updating the documentation would 
certainly help.

>     We may want to discuss who protects from whom with the
>     safe.directory mechanism and git-daemon-export-ok mechanism.  The
>     former is "the daemon trusts that repositories won't harm the
>     daemon user", while the latter is "the repository owner is OK for
>     it to be published".

Yes that would be helpful

>     Also optionally, we may update the code to take the absolute path
>     of the repository before passing it to the safe.directory check.

I think doing this would be more helpful than updating the documentation 
to recommend adding "safe.directory=.". If we do this we would also want 
to convert "//" -> "/" in the config keys as we've been forcing users to 
add paths like "/srv/git//my-repo" if the --base-path argument to 
git-daemon ended with a "/"

>   * For "http-backend" invocations, we should think about potential
>     additions that would help users, similar to what I listed above
>     for "git daemon".

That sounds sensible.

> Having said all that, I do not think I mind GIT_SAFE_DIRECTORIES
> that is a ":" separated list of paths that is honored just like the
> multi-valued configuration variable safe.directory.  Once an
> attacker can influence your environment variables, it already is
> game over, so trusting it does not make the attack surface any
> worse.

Indeed in that case the attacker can influence the path that we read the 
protected config from by setting $HOME (and do far worse by setting $PATH)

> As Peff explained, we can trigger the more general "git -c
> var=val" mechanism by exporting a set of environment variables, so
> such a specialized environment variable is not strictly needed, but
> it would make writing the "SetEnv" directive in apache configuration
> (and similar ones for other HTTP server implementations) slighly
> simpler and a lot more straight-forward.

Yes having to set all the GIT_CONFIG_* variables can be rather confusing

Best Wishes

Phillip
Junio C Hamano June 28, 2024, 4:48 p.m. UTC | #11
Phillip Wood <phillip.wood123@gmail.com> writes:

>>     We may want to discuss who protects from whom with the
>>     safe.directory mechanism and git-daemon-export-ok mechanism.  The
>>     former is "the daemon trusts that repositories won't harm the
>>     daemon user", while the latter is "the repository owner is OK for
>>     it to be published".
>
> Yes that would be helpful

OK, let's see if somebody volunteers for documentation updates in
this area.

> I think doing this would be more helpful than updating the
> documentation to recommend adding "safe.directory=.". If we do this we
> would also want to convert "//" -> "/" in the config keys as we've
> been forcing users to add paths like "/srv/git//my-repo" if the
> --base-path argument to git-daemon ended with a "/"

OK, so the idea is to normalize both safe.directory and data->path
(which might come from either worktree or gitdir) and then look for
a match.  path needs to be normalized because it can say '.' and
'/srv/git//my-repo', and values of safe.directory need to be
normalized because the users may have written '.' --- oops, relative
to what directory do we normalize safe.directory values?  That would
not work.  Let me retry.

 - Compare entries of safe.directory with data->path literally
   without normalization, as the user may have written in the
   configuration "safe.directory=.", expecting that data->path to be
   '.' (the git-daemon use case).

 - Normalize entries of safe.directory and data->path and then
   compare them, turning path="." (the git-daemon use case) into
   "/srv/git/my-repo" and a safe.directory entry "/srv/git//my-repo"
   user wrote into "/srv/git/my-repo", so that they match.  

Or we could treat "." on safe.directory as a synonym for "*"
(i.e. "anything goes"), and compare all other cases only after
normalization (which would save the cost of "literal" comparison for
safe.directory entries that are not ".")?

I may have missed some corner cases, but either of these would
probably work.

>>   * For "http-backend" invocations, we should think about potential
>>     additions that would help users, similar to what I listed above
>>     for "git daemon".
>
> That sounds sensible.

OK.

>> Having said all that, I do not think I mind GIT_SAFE_DIRECTORIES
>> that is a ":" separated list of paths that is honored just like the
>> multi-valued configuration variable safe.directory.  Once an
>> attacker can influence your environment variables, it already is
>> game over, so trusting it does not make the attack surface any
>> worse.
>
> Indeed in that case the attacker can influence the path that we read
> the protected config from by setting $HOME (and do far worse by
> setting $PATH)
> ...
> Yes having to set all the GIT_CONFIG_* variables can be rather confusing

OK.  

So an independent effort may be to introduce the said environment
variable, and have it split at ':' and feed into the same machinery
used to check paths against safe.directory configuration.  It may
need a minor refactoring to lift the current comparison logic that
assumes it is _only_ driven by the git_config() callback, and we
would probably want to define how these two sources of whitelist
entries interact (who overrides what, is "an empty element clears
all the previous entries" still true, etc.).

So, I think we have three actionable items here.

Thanks.
Phillip Wood July 1, 2024, 3:24 p.m. UTC | #12
Hi Junio

On 28/06/2024 17:48, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
> 
>> I think doing this would be more helpful than updating the
>> documentation to recommend adding "safe.directory=.". If we do this we
>> would also want to convert "//" -> "/" in the config keys as we've
>> been forcing users to add paths like "/srv/git//my-repo" if the
>> --base-path argument to git-daemon ended with a "/"
> 
> OK, so the idea is to normalize both safe.directory and data->path
> (which might come from either worktree or gitdir) and then look for
> a match.  path needs to be normalized because it can say '.' and
> '/srv/git//my-repo', and values of safe.directory need to be
> normalized because the users may have written '.' --- oops, relative
> to what directory do we normalize safe.directory values?  That would
> not work.  Let me retry.
> 
>   - Compare entries of safe.directory with data->path literally
>     without normalization, as the user may have written in the
>     configuration "safe.directory=.", expecting that data->path to be
>     '.' (the git-daemon use case).

I'm not sure this is a good idea because it is not clear which directory 
the user wanted to mark as safe when they added a relative directory to 
safe.directory. In the case of git-daemon one needs both the absolute 
path to the repository and "." to be present in safe.directory so we can 
ignore "." and match the absolute path.

>   - Normalize entries of safe.directory and data->path and then
>     compare them, turning path="." (the git-daemon use case) into
>     "/srv/git/my-repo" and a safe.directory entry "/srv/git//my-repo"
>     user wrote into "/srv/git/my-repo", so that they match.

We have several of normalization functions available:

  - normalize_path_copy() does a textual normalization which cleans up
    "//", "/./" and "/../".

  - absolute_pathdup() which prepends the current directory to relative
    paths attempting to use $PWD for the current directory where possible
    but does not expand symbolic links and does not clean up the path
    passed to it.

  - real_pathdup() which expands symbolic links

One way forward would be to clean up the entries in safe.directory with 
normalize_path_copy() and compare them to the result of normalizing 
$git_dir with absolute_pathdup() followed by normalize_path_copy(). That 
will ensure that we're always comparing the safe.directory entries 
against an absolute path and both sides of the comparison are textually 
normalized. I'm not sure whether we'd be better to use 
absolute_pathdup() or real_pathdup() or if we'd be safer comparing the 
output of both against safe.directory if they give different results. If 
this sounds reasonable I'll try and put a patch together later this week.

> Or we could treat "." on safe.directory as a synonym for "*"
> (i.e. "anything goes"), and compare all other cases only after
> normalization (which would save the cost of "literal" comparison for
> safe.directory entries that are not ".")?

Having more than one way to spell "*" sounds confusing. Assuming "." has 
only been added as a workaround for the current limitations in our 
safe.directory comparisons I think we'd be better to ignore it.

Best Wishes

Phillip

> I may have missed some corner cases, but either of these would
> probably work.
> 
>>>    * For "http-backend" invocations, we should think about potential
>>>      additions that would help users, similar to what I listed above
>>>      for "git daemon".
>>
>> That sounds sensible.
> 
> OK.
> 
>>> Having said all that, I do not think I mind GIT_SAFE_DIRECTORIES
>>> that is a ":" separated list of paths that is honored just like the
>>> multi-valued configuration variable safe.directory.  Once an
>>> attacker can influence your environment variables, it already is
>>> game over, so trusting it does not make the attack surface any
>>> worse.
>>
>> Indeed in that case the attacker can influence the path that we read
>> the protected config from by setting $HOME (and do far worse by
>> setting $PATH)
>> ...
>> Yes having to set all the GIT_CONFIG_* variables can be rather confusing
> 
> OK.
> 
> So an independent effort may be to introduce the said environment
> variable, and have it split at ':' and feed into the same machinery
> used to check paths against safe.directory configuration.  It may
> need a minor refactoring to lift the current comparison logic that
> assumes it is _only_ driven by the git_config() callback, and we
> would probably want to define how these two sources of whitelist
> entries interact (who overrides what, is "an empty element clears
> all the previous entries" still true, etc.).
> 
> So, I think we have three actionable items here.
> 
> Thanks.
> 
>
Johannes Schindelin July 1, 2024, 4:34 p.m. UTC | #13
Hi Phillip,

On Wed, 26 Jun 2024, Phillip Wood wrote:

> On 26/06/2024 14:11, Phillip Wood wrote:
> > Hi Florian
> >
> > On 26/06/2024 13:33, Florian Schmaus wrote:
> > > Sometimes more flexibility to disable/ignore the ownership check, besides
> > > the safe.directory configuration option, is required.
> > >
> > > For example, git-daemon running as nobody user, which typically has no
> > > home directory. Therefore, we can not add the path to a user-global
> > > configuration and adding the path to the system-wide configuration could
> > > have negative security implications.
> > >
> > > Therefore, make the check configurable via an environment variable.
> >
> > An alternative would be to allow safe.directory to be specified on the
> > command line with "git -c safe.directory='*' daemon ..." rather than adding
> > a dedicated environment variable.
>
> To expand an this a little - a couple of times I've wanted to checkout a bare
> repository that is owned by a different user. It is a pain to have to add a
> new config setting just for a one-off checkout. Being able to adjust the
> config on the command line would be very useful in that case.

It is somewhat surprising that this `-c safe.directory=*` method does
_not_ work for local clones. To verify, I ran this:

  git init --bare other-user.git &&
  sudo chown -R 9999.9999 other-user.git/ &&
  git -c safe.directory=\* clone other-user.git/

This will complain about the dubious ownership and suggest to add the
`safe.directory` setting to the user-wide config, ignoring the
command-line config altogether.

The reason is to be found in
https://github.com/git/git/blob/v2.45.2/connect.c#L1462-L1464:

		/* remove repo-local variables from the environment */
		for (var = local_repo_env; *var; var++)
			strvec_push(&conn->env, *var);

The `local_repo_env` array _specifically_ lists `GIT_CONFIG_PARAMETERS` in
https://github.com/git/git/blob/v2.45.2/environment.c#L129 to be removed
from the environment when spawning the `git upload-pack` process.

It was not originally listed, but added via
https://lore.kernel.org/git/20100824064114.GA20724@burratino/, where the
commit message does not really shed light into the question why this was
desirable, and there is no discussion in that mail thread about this
aspect of the patch, but at least the added test case reveals the
intention in some sort of way: The `-c` option allows to specify
`receive.denyDeletes`, and in the described scenario the idea was that it
would only apply to the client side of a local `receive-pack` but not the
"remote" one. As the example above illustrates, that patch might have
been overly focused on one specific, particular scenario.

Ciao,
Johannes
Junio C Hamano July 1, 2024, 5:32 p.m. UTC | #14
Phillip Wood <phillip.wood123@gmail.com> writes:

>>   - Compare entries of safe.directory with data->path literally
>>     without normalization, as the user may have written in the
>>     configuration "safe.directory=.", expecting that data->path to be
>>     '.' (the git-daemon use case).
>
> I'm not sure this is a good idea because it is not clear which
> directory the user wanted to mark as safe when they added a relative
> directory to safe.directory. In the case of git-daemon one needs both
> the absolute path to the repository and "." to be present in
> safe.directory so we can ignore "." and match the absolute path.

IOW, we do not bend over backwards to try to be backward compatible
on this point?  I can go with that, especially because it smells
like (I haven't thought deeply about it yet, though) that approach
can simplify the checks.

>>   - Normalize entries of safe.directory and data->path and then
>>     compare them, turning path="." (the git-daemon use case) into
>>     "/srv/git/my-repo" and a safe.directory entry "/srv/git//my-repo"
>>     user wrote into "/srv/git/my-repo", so that they match.
>
> We have several of normalization functions available:
>
>  - normalize_path_copy() does a textual normalization which cleans up
>    "//", "/./" and "/../".
>
>  - absolute_pathdup() which prepends the current directory to relative
>    paths attempting to use $PWD for the current directory where possible
>    but does not expand symbolic links and does not clean up the path
>    passed to it.
>
>  - real_pathdup() which expands symbolic links
>
> One way forward would be to clean up the entries in safe.directory
> with normalize_path_copy() and compare them to the result of
> normalizing $git_dir with absolute_pathdup() followed by
> normalize_path_copy(). That will ensure that we're always comparing
> the safe.directory entries against an absolute path and both sides of
> the comparison are textually normalized. I'm not sure whether we'd be
> better to use absolute_pathdup() or real_pathdup() or if we'd be safer
> comparing the output of both against safe.directory if they give
> different results. If this sounds reasonable I'll try and put a patch
> together later this week.

Hmph.  Are runtime-detected (not from $GIT_DIR environment and
friends) gitdir and/or worktree paths always relative?  If we let
getcwd() involved in the process of turning them absolute, these
paths may already have symbolic links "expanded", so we have no
choice other than expanding symbolic links before using paths in
safe.directory to compare with them.

For example, the paths from safe.directory come from the end-user,
and may say things like "/home/phillip/repos/*", because phillip and
everybody else on the system think /home/$USER should be where their
home directories ought to be, but that was based on the niceness of
sysadmins to arrange "/home/phillip" to be a symbolic link to
"/home1/phillip" because the "/home" partition has already run out
of space to add more users.  When gitdir and/or worktree paths are
computed using getcwd(), they would be paths somewhere under the
"/home1/phillip/repos/" directory.  Letting real_pathdup() to deal
with the symbolic links may be the only usable way in such a set-up
available for us.

But we will of course make tons more synonymous paths by using
real_pathdup() to normalize both the safe.directory entries and the
gitdir and/or worktree paths---are there security downsides in doing
so?
Jeff King July 1, 2024, 6:19 p.m. UTC | #15
On Mon, Jul 01, 2024 at 06:34:10PM +0200, Johannes Schindelin wrote:

> The `local_repo_env` array _specifically_ lists `GIT_CONFIG_PARAMETERS` in
> https://github.com/git/git/blob/v2.45.2/environment.c#L129 to be removed
> from the environment when spawning the `git upload-pack` process.
> 
> It was not originally listed, but added via
> https://lore.kernel.org/git/20100824064114.GA20724@burratino/, where the
> commit message does not really shed light into the question why this was
> desirable, and there is no discussion in that mail thread about this
> aspect of the patch, but at least the added test case reveals the
> intention in some sort of way: The `-c` option allows to specify
> `receive.denyDeletes`, and in the described scenario the idea was that it
> would only apply to the client side of a local `receive-pack` but not the
> "remote" one. As the example above illustrates, that patch might have
> been overly focused on one specific, particular scenario.

One reason we haven't loosened local_repo_env for the local transport is
that it would make it inconsistent with all of the other transports.
I.e., "git -c foo.bar=baz fetch" would still be ignored over ssh, https,
and so on, because those transports don't pass environment variables.

There's some more discussion from a similar case that came up a month
ago:

  https://lore.kernel.org/git/20240529102307.GF1098944@coredump.intra.peff.net/

-Peff
Junio C Hamano July 1, 2024, 8:40 p.m. UTC | #16
Jeff King <peff@peff.net> writes:

> On Mon, Jul 01, 2024 at 06:34:10PM +0200, Johannes Schindelin wrote:
>
>> The `local_repo_env` array _specifically_ lists `GIT_CONFIG_PARAMETERS` in
>> https://github.com/git/git/blob/v2.45.2/environment.c#L129 to be removed
>> from the environment when spawning the `git upload-pack` process.
>> 
>> It was not originally listed, but added via
>> https://lore.kernel.org/git/20100824064114.GA20724@burratino/, where the
>> commit message does not really shed light into the question why this was
>> desirable, and there is no discussion in that mail thread about this
>> aspect of the patch, but at least the added test case reveals the
>> intention in some sort of way: The `-c` option allows to specify
>> `receive.denyDeletes`, and in the described scenario the idea was that it
>> would only apply to the client side of a local `receive-pack` but not the
>> "remote" one. As the example above illustrates, that patch might have
>> been overly focused on one specific, particular scenario.
>
> One reason we haven't loosened local_repo_env for the local transport is
> that it would make it inconsistent with all of the other transports.
> I.e., "git -c foo.bar=baz fetch" would still be ignored over ssh, https,
> and so on, because those transports don't pass environment variables.
>
> There's some more discussion from a similar case that came up a month
> ago:
>
>   https://lore.kernel.org/git/20240529102307.GF1098944@coredump.intra.peff.net/

Thanks.  I wonder if there is a way to add this kind of pieces of
information to old commits and discussion threads around it after
the fact, and if it helps us (like Dscho who wondered why we decided
if it is a good idea, and more importantly if we still think it is a
good idea and why).

    ... and then goes back to see the original discussion thread,
    with the "bright idea" that I could just follow up on 14-year
    old discussion thread.  Only to find that despite what Dscho
    said, the commit message does say why it is desirable ("to
    imitate remote transport well") already.

So, I guess we do not really need to do such a post-annotation in
this particular case, but I think after seeing somebody posting a
message like the one I am responding to and finding it helpful, it
would be helpful if somebody can post a message pointing at it as a
response to the old thread that wants a post-annotation.
Jeff King July 1, 2024, 10:25 p.m. UTC | #17
[+cc Eric for some possible public-inbox wisdom]

On Mon, Jul 01, 2024 at 01:40:18PM -0700, Junio C Hamano wrote:

> > There's some more discussion from a similar case that came up a month
> > ago:
> >
> >   https://lore.kernel.org/git/20240529102307.GF1098944@coredump.intra.peff.net/
> 
> Thanks.  I wonder if there is a way to add this kind of pieces of
> information to old commits and discussion threads around it after
> the fact, and if it helps us (like Dscho who wondered why we decided
> if it is a good idea, and more importantly if we still think it is a
> good idea and why).
> 
>     ... and then goes back to see the original discussion thread,
>     with the "bright idea" that I could just follow up on 14-year
>     old discussion thread.  Only to find that despite what Dscho
>     said, the commit message does say why it is desirable ("to
>     imitate remote transport well") already.
> 
> So, I guess we do not really need to do such a post-annotation in
> this particular case, but I think after seeing somebody posting a
> message like the one I am responding to and finding it helpful, it
> would be helpful if somebody can post a message pointing at it as a
> response to the old thread that wants a post-annotation.

Usually I find myself digging backwards in history, following links to
old threads. But I guess what you are asking is how would somebody
looking at old thread XYZ know that it was mentioned much later.

And I think the solution is for the new thread to just link to the old
one by message-id (i.e., the usual lore links). And then searching for
that message-id in the archive could turn up the later threads. I don't
know how well public-inbox handles that in practice, though:

  1. Do things that look like message-ids get searched for in message
     bodies? I'd think so if you don't explicitly say "this is a message
     id".

  2. It's really a multi-element search. If I have a thread with 10
     messages, I'd really like to know of more recent threads that
     linked back to _any_ message in the thread. You'd probably have to
     feed them all manually. But in theory indexing could generate some
     kind of bidirectional "related" link.

I don't often do this with message-ids, but I frequently do find other
references by doing a full-text search for commit hashes, or phrases
from commit subjects. I usually do so with my local notmuch archive,
rather than using public-inbox, but I think you should be able to do
phrase searches there, too.

-Peff
Eric Wong July 2, 2024, 12:19 a.m. UTC | #18
Jeff King <peff@peff.net> wrote:
> On Mon, Jul 01, 2024 at 01:40:18PM -0700, Junio C Hamano wrote:
> > Thanks.  I wonder if there is a way to add this kind of pieces of
> > information to old commits and discussion threads around it after
> > the fact, and if it helps us (like Dscho who wondered why we decided
> > if it is a good idea, and more importantly if we still think it is a
> > good idea and why).
> > 
> >     ... and then goes back to see the original discussion thread,
> >     with the "bright idea" that I could just follow up on 14-year
> >     old discussion thread.  Only to find that despite what Dscho
> >     said, the commit message does say why it is desirable ("to
> >     imitate remote transport well") already.
> > 
> > So, I guess we do not really need to do such a post-annotation in
> > this particular case, but I think after seeing somebody posting a
> > message like the one I am responding to and finding it helpful, it
> > would be helpful if somebody can post a message pointing at it as a
> > response to the old thread that wants a post-annotation.

I think adding code comments referencing commit OIDs and URLs
with Message-IDs can help.  I've noticed many people (usually in
other projects) haven't learned to use git (log|blame) :<

> Usually I find myself digging backwards in history, following links to
> old threads. But I guess what you are asking is how would somebody
> looking at old thread XYZ know that it was mentioned much later.
> 
> And I think the solution is for the new thread to just link to the old
> one by message-id (i.e., the usual lore links). And then searching for
> that message-id in the archive could turn up the later threads. I don't
> know how well public-inbox handles that in practice, though:
> 
>   1. Do things that look like message-ids get searched for in message
>      bodies? I'd think so if you don't explicitly say "this is a message
>      id".

Yes, encapsulating via double-quotes to turn it into a phrase
search may help if the Message-ID contains dashes and such.

>   2. It's really a multi-element search. If I have a thread with 10
>      messages, I'd really like to know of more recent threads that
>      linked back to _any_ message in the thread. You'd probably have to
>      feed them all manually. But in theory indexing could generate some
>      kind of bidirectional "related" link.

You can also join a bunch of Message-IDs (or any other query)
via `OR' elements to ensure you don't miss things.

<https://xapian.org/docs/queryparser.html> has a lot more
details on combining things which apply to notmuch, too.

> I don't often do this with message-ids, but I frequently do find other
> references by doing a full-text search for commit hashes, or phrases
> from commit subjects. I usually do so with my local notmuch archive,
> rather than using public-inbox, but I think you should be able to do
> phrase searches there, too.

Yeah.  That's fairly easy to do for Junio's git <=> git@vger
mirror.  It's more challenging to make work (and presentable)
for hundreds of kernel list archives and linux.git mirrors,
especially on my 15 year old hardware, but progress is being
made since I'm resorting to using C++ for Xapian :x
diff mbox series

Patch

diff --git a/setup.c b/setup.c
index 3afa6fb09b28..da3f504fb536 100644
--- a/setup.c
+++ b/setup.c
@@ -1278,6 +1278,14 @@  static int ensure_valid_ownership(const char *gitfile,
 	 */
 	git_protected_config(safe_directory_cb, &data);
 
+	if (data.is_safe)
+		return data.is_safe;
+
+	if (git_env_bool("GIT_IGNORE_INSECURE_OWNER", 0)) {
+		warning("ignoring dubious ownership in repository at '%s' (GIT_IGNORE_INSECURE_OWNER set)", data.path);
+		return 1;
+	}
+
 	return data.is_safe;
 }