urlmatch: do not allow passwords in URLs by default

Message ID	pull.945.git.1619807844627.gitgitgadget@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@kernel.org> Message-Id: <pull.945.git.1619807844627.gitgitgadget@gmail.com> Date: Fri, 30 Apr 2021 18:37:24 +0000 Subject: [PATCH] urlmatch: do not allow passwords in URLs by default Fcc: Sent Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, peff@peff.net, me@ttaylorr.com, avarab@gmail.com, christian.couder@gmail.com, johannes.schindelin@gmx.de, jrnieder@gmail.com, Derrick Stolee <derrickstolee@github.com>, Derrick Stolee <dstolee@microsoft.com> Precedence: bulk From: Derrick Stolee <dstolee@microsoft.com>
Series	urlmatch: do not allow passwords in URLs by default \| expand urlmatch: do not allow passwords in URLs by default

Derrick Stolee April 30, 2021, 6:37 p.m. UTC

From: Derrick Stolee <dstolee@microsoft.com>

Git allows URLs of the following pattern:

  https://username:password@domain/route

These URLs are then parsed to pull out the username and password for use
when authenticating with the URL. Git is careful to anonymize the URL in
status messages with transport_anonymize_url(), but it stores the URL as
plaintext in the .git/config file. The password may leak in other ways.

This is not a recommended way to store credentials, especially
authentication tokens that do more than simply allow access to a
repository.

Users should be aware that when they provide a URL in this form that
they are not being incredibly secure. It is much better to use a
credential manager to securely store passwords. Even better, some
credential managers use more sophisticated authentication strategies
including multi-factor authentication. This does not stop users from
continuing to do this.

Some Git hosting providers are working to completely drop
username/password credential strategies, which will make URLs of this
form stop working. However, that requires certain changes to credential
managers that need to be released and sufficiently adopted before making
such a server-side change.

In the meantime, it might be helpful to alert users that they are doing
something insecure with these URLs.

Create a new config option, core.allowUsernamePasswordUrls, which is
disabled by default. If Git attempts to parse a password from a URL in
this form, it will die() if this config is not enabled. This affects a
few test scripts, but enabling the config in those places is relatively
simple.

This will cause a significant change in behavior for users who rely upon
this username:password pattern. The error message describes the config
that they must enable to continue working with these URLs. This has a
significant chance of breaking some automated workflows that use URLs in
this fashion, but even those workflows would be better off using a
different mechanism for credentials.

I cannot understate the care in which we should consider this change.
The impact of this change in a Git release could be significant. We
should advertise this very clearly in the release notes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
    Reject passwords in URLs
    
    I received multiple messages "alerting" me to the issue of users
    supplying server-side tokens into the username:password field of a URL.
    This is not a secure way to handle these tokens.
    
    On the one hand, this is user error: Users should not supply a token to
    a location where they do not know what will happen to it. In Git's
    defense, its behavior is completely open about storing the URL in the
    .git/config file as a plain-text string and users should know that when
    using this feature.
    
    However, users just. keep. doing it.
    
    There is some expectation that since this portion of the URL is a
    password, then Git is responsible for tracking that password securely.
    I'm not sure we should venture down that road, since we already have a
    pretty good solution by using the credential helper interface.
    
    Here is my best effort to find a compromise here: start failing when
    parsing a password from a URL like this, with a config option to
    re-enable the existing behavior.
    
    I completely understand if this is too much of a breaking change. I
    wonder if there is anything we can do to assist users into being more
    careful with their secrets.
    
    Thanks, -Stolee

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-945%2Fderrickstolee%2Freject-passwords-in-urls-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-945/derrickstolee/reject-passwords-in-urls-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/945

 Documentation/config/core.txt     |  7 +++++++
 t/t0110-urlmatch-normalization.sh |  4 ++++
 t/t5541-http-push-smart.sh        |  9 +++++++--
 t/t5550-http-fetch-dumb.sh        |  3 ++-
 t/t5601-clone.sh                  |  5 +++++
 urlmatch.c                        | 14 ++++++++++++++
 6 files changed, 39 insertions(+), 3 deletions(-)


base-commit: 7e391989789db82983665667013a46eabc6fc570

Jeff King April 30, 2021, 6:50 p.m. UTC | #1

On Fri, Apr 30, 2021 at 06:37:24PM +0000, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <dstolee@microsoft.com>
> 
> Git allows URLs of the following pattern:
> 
>   https://username:password@domain/route
> 
> These URLs are then parsed to pull out the username and password for use
> when authenticating with the URL. Git is careful to anonymize the URL in
> status messages with transport_anonymize_url(), but it stores the URL as
> plaintext in the .git/config file. The password may leak in other ways.

I'm not really opposed to disallowing this entirely (with an escape
hatch, as you have here), because it really is an awful practice for a
lot of reasons. But another option we discussed previously was to allow
the initial clone, but not store the password, which would result in the
user being prompted for subsequent fetches:

  https://lore.kernel.org/git/20190519050724.GA26179@sigill.intra.peff.net/

I think that third patch there is just too gross. But with the first
two, if you do have a credential helper configured, then:

  git clone https://user:pass@example.com/repo.git

would do what you want: clone with that user/pass, and then store the
result in the credential helper.

> @@ -191,6 +204,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
>  			}
>  			colon_ptr = strchr(norm.buf + scheme_len + 3, ':');
>  			if (colon_ptr) {
> +				die_if_username_password_not_allowed();
>  				passwd_off = (colon_ptr + 1) - norm.buf;
>  				passwd_len = norm.len - passwd_off;
>  				user_len = (passwd_off - 1) - (scheme_len + 3);

It's probably a bit nicer to just ignore the password, which will prompt
the user. But then, it is nicer still to use it just the one time but
not store it in the .git/config file. :)

-Peff

brian m. carlson May 1, 2021, 2 a.m. UTC | #2

On 2021-04-30 at 18:37:24, Derrick Stolee via GitGitGadget wrote:
> Create a new config option, core.allowUsernamePasswordUrls, which is
> disabled by default. If Git attempts to parse a password from a URL in
> this form, it will die() if this config is not enabled. This affects a
> few test scripts, but enabling the config in those places is relatively
> simple.

Let's call this http.allowUsernamePasswordURLs (or
http.allowCredentialsInURL) because this is ultimately about HTTP and
HTTPS (and maybe FTP if we still support that, which I certainly hope we
do not).  SSH doesn't have URLs and won't read a password from either
the URL or a credential helper, since OpenSSH won't read a password from
anything but a terminal (which is secure, but occasionally irritating).

> This will cause a significant change in behavior for users who rely upon
> this username:password pattern. The error message describes the config
> that they must enable to continue working with these URLs. This has a
> significant chance of breaking some automated workflows that use URLs in
> this fashion, but even those workflows would be better off using a
> different mechanism for credentials.

I will admit to using this pattern in a test I was writing just this
week.  I ultimately switched to an environment-based credential helper
(à la FAQ) in my test, but I think automated tests and other situations
where the credentials don't matter are really the only cases where this
is okay.  I do think we will break some systems as a result, especially
in situations where users cannot otherwise specify credentials (e.g., a
SaaS offering which clones your repository to provide some
functionality).

So I am a bit torn about this.  On one hand, we should really encourage
much more secure options whenever possible and I'm glad this does this,
but on the other hand, there are some useful cases where this is
unobjectionable or at least the least terrible option and the config
option may be a problem.  Don't let my doubts hold up this series if
everyone else is for it, though.  It's definitely an improvement in
security.

It is my intention (in my copious free time) to adjust credential
helpers to support arbitrary auth schemes (e.g., Bearer).  At that
point, I plan to deprecate support for Authorization extra headers.  In
that case, because the user is guaranteed to have the opportunity to
edit the config, they are guaranteed to have credential helper support
and therefore there are no use cases we're excluding.

Christian Couder May 1, 2021, 6:39 a.m. UTC | #3

On Fri, Apr 30, 2021 at 8:37 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Git allows URLs of the following pattern:
>
>   https://username:password@domain/route

[...]

> Some Git hosting providers are working to completely drop
> username/password credential strategies, which will make URLs of this
> form stop working. However, that requires certain changes to credential
> managers that need to be released and sufficiently adopted before making
> such a server-side change.
>
> In the meantime, it might be helpful to alert users that they are doing
> something insecure with these URLs.

Another helpful thing to do might be to add --user and maybe
--password options to some commands like 'clone', 'fetch', 'remote
add', etc.

I think historically we considered that authentication wasn't Git's
responsibility. If we now think it should be concerned about this,
then --user and --password options might be a good way to start taking
responsibility.

For example `git clone --user XXX --password YYY
https://git.example.com/git/git.git` could use an HTTP header to send
the credentials, and then after the clone maybe (if a terminal is
used) ask if the user would like to save the credentials using a
credential manager.

I think this could be both as easy, or even easier, to use than an URL
with credentials and more secure. We could also make things more
secure over time by suggesting better credential managers as they
improve.

Also I wonder if on Linux a credential manager could encrypt HTTP
credentials and store them locally using the user's private ssh key if
there is one.

Thanks,
Christian.

Ævar Arnfjörð Bjarmason May 1, 2021, 8:44 a.m. UTC | #4

On Fri, Apr 30 2021, Derrick Stolee via GitGitGadget wrote:

Just nits on the patch, will reply to the idea in another message:

> [...]
> +test_expect_success 'enable username:password urls' '
> +	git config --global core.allowUsernamePasswordUrls true
> +'

Hrm, --global? In any case isn't it also better here to tweak this for
specific tests?

>  test_expect_success 'push status output scrubs password' '
>  	cd "$ROOT_PATH/test_repo_clone" &&
> +	git config core.allowUsernamePasswordUrls true &&
>  	git push --porcelain \
>  		"$HTTPD_URL_USER_PASS/smart/test_repo.git" \
>  		+HEAD:scrub >status &&
> @@ -469,9 +470,11 @@ test_expect_success 'push status output scrubs password' '

Use test_config instead, unless this is really "setup for the rest of
the tests" in disguise, but IMO even more of a reason to use test_config
for each one.

>  test_expect_success 'clone/fetch scrubs password from reflogs' '
>  	cd "$ROOT_PATH" &&
> -	git clone "$HTTPD_URL_USER_PASS/smart/test_repo.git" \
> +	git -c core.allowUsernamePasswordUrls=true clone \
> +		"$HTTPD_URL_USER_PASS/smart/test_repo.git" \
>  		reflog-test &&
>  	cd reflog-test &&
> +	git config core.allowUsernamePasswordUrls true &&

Ditto. Although redundant in your patch, no, since we've set it to true
above?

> +test_expect_success 'clone fails when using username:password' '
> +	test_must_fail git clone https://username:password@bogus.url 2>err &&
> +	test_i18ngrep "attempted to parse a URL with a plain-text username and password" err
> +'
> +

Just use grep, not test_i18ngrep. GETTEXT_POISON is gone.

>  test_expect_success 'clone from hooks' '
>  
>  	test_create_repo r0 &&
> diff --git a/urlmatch.c b/urlmatch.c
> index 33a2ccd306b6..e81ec9e1fc0b 100644
> --- a/urlmatch.c
> +++ b/urlmatch.c
> @@ -1,5 +1,6 @@
>  #include "cache.h"
>  #include "urlmatch.h"
> +#include "config.h"
>  
>  #define URL_ALPHA "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
>  #define URL_DIGIT "0123456789"
> @@ -106,6 +107,18 @@ static int match_host(const struct url_info *url_info,
>  	return (!url_len && !pat_len);
>  }
>  
> +static void die_if_username_password_not_allowed(void)
> +{
> +	int opt_in = 0;
> +	if (!git_config_get_bool("core.allowusernamepasswordurls", &opt_in) &&
> +	    opt_in)
> +		return;

API use nit: You need to either initialize "opt_in = 0" or check the
git_config_get_bool() return value. Doing both isn't strictly
needed. I.e. either of these would work:

    int opt_in = 0;
    git_config_get_bool(..., &opt_in);
    if (opt_in) ...;

Or:

    int opt_in;
    if (!git_config_get_bool(..., &opt_in) && opt_in)

No?

Ævar Arnfjörð Bjarmason May 1, 2021, 8:52 a.m. UTC | #5

On Fri, Apr 30 2021, Derrick Stolee via GitGitGadget wrote:

Now for a comment on the direction...:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> Git allows URLs of the following pattern:
>
>   https://username:password@domain/route
>
> These URLs are then parsed to pull out the username and password for use
> when authenticating with the URL. Git is careful to anonymize the URL in
> status messages with transport_anonymize_url(), but it stores the URL as
> plaintext in the .git/config file. The password may leak in other ways.
>
> This is not a recommended way to store credentials, especially
> authentication tokens that do more than simply allow access to a
> repository.
>
> Users should be aware that when they provide a URL in this form that
> they are not being incredibly secure. It is much better to use a

"Incredibly secure"? Security so good you wouldn't believe it if you saw
it ? :)

> credential manager to securely store passwords. Even better, some
> credential managers use more sophisticated authentication strategies
> including multi-factor authentication. This does not stop users from
> continuing to do this.
>
> Some Git hosting providers are working to completely drop
> username/password credential strategies, which will make URLs of this
> form stop working. However, that requires certain changes to credential
> managers that need to be released and sufficiently adopted before making
> such a server-side change.
>
> In the meantime, it might be helpful to alert users that they are doing
> something insecure with these URLs.
>
> Create a new config option, core.allowUsernamePasswordUrls, which is
> disabled by default. If Git attempts to parse a password from a URL in
> this form, it will die() if this config is not enabled. This affects a
> few test scripts, but enabling the config in those places is relatively
> simple.
>
> This will cause a significant change in behavior for users who rely upon
> this username:password pattern. The error message describes the config
> that they must enable to continue working with these URLs. This has a
> significant chance of breaking some automated workflows that use URLs in
> this fashion, but even those workflows would be better off using a
> different mechanism for credentials.

...

> I cannot understate the care in which we should consider this change.
> The impact of this change in a Git release could be significant. We
> should advertise this very clearly in the release notes.

Seems like something more for below-the-"---" than a commit
message. If/when it lands the "consider this change" has already
happened.

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>     Reject passwords in URLs
>     
>     I received multiple messages "alerting" me to the issue of users
>     supplying server-side tokens into the username:password field of a URL.
>     This is not a secure way to handle these tokens.
>     
>     On the one hand, this is user error: Users should not supply a token to
>     a location where they do not know what will happen to it. In Git's
>     defense, its behavior is completely open about storing the URL in the
>     .git/config file as a plain-text string and users should know that when
>     using this feature.
>     
>     However, users just. keep. doing it.
>     
>     There is some expectation that since this portion of the URL is a
>     password, then Git is responsible for tracking that password securely.
>     I'm not sure we should venture down that road, since we already have a
>     pretty good solution by using the credential helper interface.
>     
>     Here is my best effort to find a compromise here: start failing when
>     parsing a password from a URL like this, with a config option to
>     re-enable the existing behavior.
>     
>     I completely understand if this is too much of a breaking change. I
>     wonder if there is anything we can do to assist users into being more
>     careful with their secrets.

I think a good staring compromise would not be to make long-standing
supported behavior an error, but to use the advise() system to emit some
notice about this.

While I get what problem you're trying to solve in practice, I disagree
with the strong assertion that a user is "doing something insecure" by
definition by using such URLs.

If you trust your FS permissions, and as a local user, ultimately if you
didn't a credential manager wouldn't be much use anyway, and/or
use/don't care (e.g. closed network) about transport security then
having a password in your config is just fine, and doesn't otherwise
compromise security.

Unlike SSH this is a thing that the HTTP protocol supports, so I don't
think it's the place of git to go out of its way by default to break
working URL schemas by making hard breaking assumptions about the user's
workflow.

I think a much better way to address this issue, if it's an issue that
needs to be addressed, is on the server-side. The service operator is in
a much better position to know if URL passwords are a bad idea in their
case (e.g. is the software frequently used in certain ways, is there no
transport security, are we using debug URL tracing etc).

You allude to some of that in your commit message. "Some Git hosting
providers". Some? GitHub? In any case, to me that's further reason we
should hold off on such a change in git.git. Let the big hosting
providers experiment with this, and if e.g. they all decide to stop
supporting this and it therefore becomes less common practice we'll be
better informed about if/what to change in git.git.

Junio C Hamano May 3, 2021, 3:38 a.m. UTC | #6

Christian Couder <christian.couder@gmail.com> writes:

> Another helpful thing to do might be to add --user and maybe
> --password options to some commands like 'clone', 'fetch', 'remote
> add', etc.

Why?

We cannot get rid of <scheme>://<user>:<pass>@<host>/<path> right
away, but I'd imagine that we'd prefer to see fewer places on the
command line for users to leave the password that would end up in
their .bashrc and other places.

And I like the idea raised elsewhere in the thread to forward the
<pass> to credential helper and leave ":<pass>" part out of the
stored URL.

Thanks.

Robert Coup May 3, 2021, 8:40 a.m. UTC | #7

On Sat, 1 May 2021 at 10:13, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
> Unlike SSH this is a thing that the HTTP protocol supports, so I don't
> think it's the place of git to go out of its way by default to break
> working URL schemas by making hard breaking assumptions about the user's
> workflow.
>
> I think a much better way to address this issue, if it's an issue that
> needs to be addressed, is on the server-side. The service operator is in
> a much better position to know if URL passwords are a bad idea in their
> case (e.g. is the software frequently used in certain ways, is there no
> transport security, are we using debug URL tracing etc).

A URL scheme of the form https://user:password@example.com/my.git gets
translated by the http *client* into a request of the form:

  <Connect via TLS to example.com:443>
  GET /my.git HTTP/1.1
  Host: example.com
  Authorization: Basic dXNlcjpwYXNzd29yZAo=
  ... some more headers ...

So the server can't determine whether either/both the username or
password in the Authorization header were retrieved from a credential
helper, were parsed from a URL string, or came via a terminal prompt.
Conceivably If the server only uses a non-default Authorization method
("Token", "Bearer", "Key", etc) or uses a separate header (eg:
"MyGitHosting-Auth") then it could conceivably block Basic
Authorization, and afaik Git only supports that approach via
http.extraHeader - which doesn't go via any credential helpers, and is
essentially obscuring authentication as something else (and is pretty
user-hostile).

Rob :)

Derrick Stolee May 3, 2021, 11:54 a.m. UTC | #8

(Forgive my top-reply, but this part of the message is intended to
summarize all replies in the thread.)

Thanks, everyone for the thoughtful comments. I appreciate the multiple
directions recommended in the thread, including making this config be a
tri-state with these states:

1. Keep existing behavior.
2. Warn if Git sees credentials in a URL.
3. Die if Git sees credentials in a URL.

This approach provides a good mechanism for transitioning from the
current state (1) to the die state (3): we can set (2) as default for
a while.

I believe something like this will be necessary to alert users who have
already created repositories with credentials in their .git/config files.

But, there is something better we can do that will be more helpful for
users still using this at "git clone" time, without causing serious
damage to automated scenarios:

On 4/30/2021 2:50 PM, Jeff King wrote:
> On Fri, Apr 30, 2021 at 06:37:24PM +0000, Derrick Stolee via GitGitGadget wrote:
> 
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> Git allows URLs of the following pattern:
>>
>>   https://username:password@domain/route
>>
>> These URLs are then parsed to pull out the username and password for use
>> when authenticating with the URL. Git is careful to anonymize the URL in
>> status messages with transport_anonymize_url(), but it stores the URL as
>> plaintext in the .git/config file. The password may leak in other ways.
> 
> I'm not really opposed to disallowing this entirely (with an escape
> hatch, as you have here), because it really is an awful practice for a
> lot of reasons. But another option we discussed previously was to allow
> the initial clone, but not store the password, which would result in the
> user being prompted for subsequent fetches:
> 
>   https://lore.kernel.org/git/20190519050724.GA26179@sigill.intra.peff.net/
> 
> I think that third patch there is just too gross. But with the first
> two, if you do have a credential helper configured, then:
> 
>   git clone https://user:pass@example.com/repo.git
> 
> would do what you want: clone with that user/pass, and then store the
> result in the credential helper.

This seems like the best approach, as it presents the highest likelihood
of working as expected in the automated scenarios. I will take a look to
see how I could adapt those patches and maybe make the third one better.

I think a combined approach would be good. We should still warn that this
usage pattern is unsafe, because users might use it in an environment
where their commands are being logged and stored another way.

Is it possible that some Git installations have no credential helper? We
can keep the "git clone" working in that scenario by storing the password
in memory until the process completes, but later "git fetch" commands
will fail.

Thanks,
-Stolee

Jeff King May 3, 2021, 2:53 p.m. UTC | #9

On Mon, May 03, 2021 at 07:54:50AM -0400, Derrick Stolee wrote:

> > I'm not really opposed to disallowing this entirely (with an escape
> > hatch, as you have here), because it really is an awful practice for a
> > lot of reasons. But another option we discussed previously was to allow
> > the initial clone, but not store the password, which would result in the
> > user being prompted for subsequent fetches:
> > 
> >   https://lore.kernel.org/git/20190519050724.GA26179@sigill.intra.peff.net/
> > 
> > I think that third patch there is just too gross. But with the first
> > two, if you do have a credential helper configured, then:
> > 
> >   git clone https://user:pass@example.com/repo.git
> > 
> > would do what you want: clone with that user/pass, and then store the
> > result in the credential helper.
> 
> This seems like the best approach, as it presents the highest likelihood
> of working as expected in the automated scenarios. I will take a look to
> see how I could adapt those patches and maybe make the third one better.

IIRC, there was nothing too wrong with the patches. Reviewers had a few
small comments/fixups, but mostly I was on the fence on whether it was a
good idea at all, since it was not really my itch, and it was all
motivated by third-hand complaints about the behavior. Since nobody
brought it up more since then, I hadn't come back to it.

So I think it probably just needs a bit of polish, and to decide on
patch 3. If it helps, I've been rebasing it forward as:

  https://github.com/peff/git jk/clone-url-password-wip

but it's not part of my daily build, so caveat structor.

I don't think the third patch is wrong in _how_ it works. It's mostly
whether it's a good idea at all: we are not storing the file in
.git/config, but we are still storing it in ~/.git-credentials. That's
moderately better, but still not very secure.

If you do have a credential helper defined, then with just patch 2
everything would Just Work as you'd hope (and even with patch 3, we'll
skip using credential-store if you have something better defined).

> Is it possible that some Git installations have no credential helper? We
> can keep the "git clone" working in that scenario by storing the password
> in memory until the process completes, but later "git fetch" commands
> will fail.

I expect lots of installations have no helper configured. I don't think
any Linux distro packages ship with one configured. I think Apple Git
ships with osxkeychain configured, but I don't know whether the homebrew
recipe does, too. And of course anybody building from source is on their
own.

Hopefully some of those people install and configure a credential helper
on their own, but I expect it is wishful thinking to imagine that it's a
majority. :)

Even with just the first two patches, I believe the "in memory" thing
you suggest would already work. We feed the full URL to the transport
code, which can make us of it for the duration of the clone process.
It's only what we write into .git/config that is changed (so yes, future
fetches would fail).

-Peff

urlmatch: do not allow passwords in URLs by default

Commit Message

Comments

Patch