Message ID | 22b10bf9da8ccf4ae4da634aadfdaff5ee7a3508.1652485058.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | setup.c: make bare repo discovery optional |
"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Glen Choo <chooglen@google.com> > > Add a config variable, `discovery.bare`, that tells Git whether or not > it should work with the bare repository it has discovered i.e. Git will > die() if it discovers a bare repository, but it is not allowed by > `discovery.bare`. This only affects repository discovery, thus it has no > effect if discovery was not done (e.g. `--git-dir` was passed). > > This is motivated by the fact that some workflows don't use bare > repositories at all, and users may prefer to opt out of bare repository > discovery altogether: > > - An easy assumption for a user to make is that Git commands run > anywhere inside a repository's working tree will use the same > repository. However, if the working tree contains a bare repository > below the root-level (".git" is preferred at the root-level), any > operations inside that bare repository use the bare repository > instead. > > In the worst case, attackers can use this confusion to trick users > into running arbitrary code (see [1] for a deeper discussion). But > even in benign situations (e.g. a user renames ".git/" to ".git.old/" > and commits it for archival purposes), disabling bare repository > discovery can be a simpler mode of operation (e.g. because the user > doesn't actually want to use ".git.old/") [2]. > > - Git won't "accidentally" recognize a directory that wasn't meant to be > a bare repository, but happens to resemble one. While such accidents > are probably very rare in practice, this lets users reduce the chance > to zero. > > This config is an enum of: > > - ["always"|(unset)]: always recognize bare repositories (like Git does > today) > - "never": never recognize bare repositories > > More values are expected to be added later, and the default is expected > to change (i.e. to something other than "always"). 
> > [1]: https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com > [2]: I don't personally know anyone who does this as part of their > normal workflow, but a cursory search on GitHub suggests that there is a > not insubstantial number of people who munge ".git" in order to store > its contents. > > https://github.com/search?l=&o=desc&p=1&q=ref+size%3A%3C1000+filename%3AHEAD&s=indexed&type=Code > (aka search for the text "ref", size:<1000, filename:HEAD) > > Signed-off-by: Glen Choo <chooglen@google.com> The intended commit message ends here... > WIP setup.c: make discovery.bare die on failure > > Signed-off-by: Glen Choo <chooglen@google.com> Ugh, dumb mistake (bad squash). Fortunately this was one of my more professional-sounding WIP commit messages.
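A minimal sketch of the discovery behavior the commit message describes. It assumes a simulated filesystem (a list of paths that "look like" Git directories) instead of real is_git_directory() checks, and every name in it (discover(), enum bare_policy, enum walk_result) is hypothetical rather than Git's API. It walks from the current directory toward "/", checks "<dir>/.git" before "<dir>" itself at each level, skips discovery entirely when an explicit Git directory is given (the `--git-dir`/GIT_DIR case), and stops with an error, rather than continuing upward, when it meets a bare repository under "never".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the two discovery.bare values. */
enum bare_policy { BARE_ALWAYS, BARE_NEVER };

/* Hypothetical results, loosely mirroring enum discovery_result. */
enum walk_result {
	WALK_EXPLICIT,        /* --git-dir / GIT_DIR given: no discovery */
	WALK_FOUND_DOTGIT,    /* found "<dir>/.git" */
	WALK_FOUND_BARE,      /* "<dir>" itself looks like a repository */
	WALK_DISALLOWED_BARE, /* bare repository found, but policy forbids it */
	WALK_NONE,            /* reached "/" without finding anything */
};

/* Simulated filesystem check: is 'path' in the NULL-terminated list? */
static int looks_like_gitdir(const char *path, const char **gitdirs)
{
	for (; *gitdirs; gitdirs++)
		if (!strcmp(path, *gitdirs))
			return 1;
	return 0;
}

/*
 * 'cwd' must be absolute with no trailing slash. On success the
 * repository path is copied into 'out'.
 */
static enum walk_result discover(const char *cwd, const char *explicit_gitdir,
				 enum bare_policy policy, const char **gitdirs,
				 char *out, size_t outsz)
{
	char dir[4096];
	char candidate[4200];
	char *slash;

	if (explicit_gitdir) {
		/* discovery.bare only affects discovery, so it is not consulted */
		snprintf(out, outsz, "%s", explicit_gitdir);
		return WALK_EXPLICIT;
	}

	snprintf(dir, sizeof(dir), "%s", cwd);
	for (;;) {
		/* Prefer "<dir>/.git" over treating "<dir>" as bare. */
		snprintf(candidate, sizeof(candidate), "%s/.git",
			 strcmp(dir, "/") ? dir : "");
		if (looks_like_gitdir(candidate, gitdirs)) {
			snprintf(out, outsz, "%s", candidate);
			return WALK_FOUND_DOTGIT;
		}
		if (looks_like_gitdir(dir, gitdirs)) {
			/* Stop here (die() in Git); do not keep walking up. */
			if (policy == BARE_NEVER)
				return WALK_DISALLOWED_BARE;
			snprintf(out, outsz, "%s", dir);
			return WALK_FOUND_BARE;
		}
		if (!strcmp(dir, "/"))
			return WALK_NONE;
		slash = strrchr(dir, '/');
		if (!slash)
			return WALK_NONE; /* relative path: not supported here */
		if (slash == dir)
			dir[1] = '\0';    /* parent is the root */
		else
			*slash = '\0';
	}
}
```

With a layout like the one in the patch's test (outer-repo/.git plus outer-repo/bare-repo), starting in outer-repo/bare-repo/refs yields the bare repository under "always" and an error under "never"; in the "never" case the walk never reaches outer-repo/.git, which is the die()-instead-of-skip behavior raised in review.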
On 5/13/2022 7:37 PM, Glen Choo via GitGitGadget wrote:
> From: Glen Choo <chooglen@google.com>
>
> Add a config variable, `discovery.bare`, that tells Git whether or not
> it should work with the bare repository it has discovered i.e. Git will
> die() if it discovers a bare repository, but it is not allowed by
> `discovery.bare`. This only affects repository discovery, thus it has no
> effect if discovery was not done (e.g. `--git-dir` was passed).

> This config is an enum of:
>
> - ["always"|(unset)]: always recognize bare repositories (like Git does
>   today)
> - "never": never recognize bare repositories
>
> More values are expected to be added later, and the default is expected
> to change (i.e. to something other than "always").

I think it is fine to include the "never" option for users to opt-in to
this super-protected state, but I want to make it very clear that we
should never move to it as a new default. This phrasing of 'something
other than "always"' is key, but it might be good to point out that
"never" is very unlikely to be that default.

> WIP setup.c: make discovery.bare die on failure
>
> Signed-off-by: Glen Choo <chooglen@google.com>

Accidental concatenation of squashed commit?

> diff --git a/Documentation/config/discovery.txt b/Documentation/config/discovery.txt
> new file mode 100644
> index 00000000000..761cabe6e70
> --- /dev/null
> +++ b/Documentation/config/discovery.txt
> @@ -0,0 +1,24 @@
> +discovery.bare::
> +	Specifies what kinds of directories Git can recognize as a bare
> +	repository when looking for the repository (aka repository
> +	discovery). This has no effect if repository discovery is not
> +	performed e.g. the path to the repository is set via `--git-dir`
> +	(see linkgit:git[1]).

Avoid "e.g." here.

	This has no effect if the repository is specified directly via
	the --git-dir command-line option or the GIT_DIR environment
	variable.

> +This config setting is only respected when specified in a system or global
> +config, not when it is specified in a repository config or via the command
> +line option `-c discovery.bare=<value>`.

We are sprinkling config options that have these same restrictions throughout
the config documentation. It might be time to define a term like "protected
config" at the top of git-config.txt and then refer to that from these other
locations.

> +The currently supported values are `always` (Git always recognizes bare
> +repositories) and `never` (Git never recognizes bare repositories).

This sentence structure is likely to change in the future, and as it
stands will become complicated. A bulleted list will have easier edits
in the future.

> +This defaults to `always`, but this default is likely to change.

For now, I would say "but this default may change in the future." instead.

> +If your workflow does not rely on bare repositories, it is recommended that
> +you set this value to `never`. This makes repository discovery easier to
> +reason about and prevents certain types of security and non-security
> +problems, such as:
> +

(You might need a "+" here.)

> +* `git clone`-ing a repository containing a malicious bare repository
> +  inside it.
> +* Git recognizing a directory that isn't meant to be a bare repository,
> +  but happens to look like one.

I think these last bits recommending the 'never' option are a bit
distracting. It doesn't make repository discovery "easier to reason
about" because we still discover the bare repo and die() instead of
skipping it and looking higher for a non-bare repository in the parent
directories. The case of an "accidentally-recognized bare repo" is so
unlikely it is probably not worth mentioning in these docs.

Instead, I think something like this might be better:

	If you do not use bare repositories in your workflow, then it may be
	beneficial to set `discovery.bare` to `never` in your global config.
	This will protect you from attacks that involve cloning a repository
	that contains a bare repository and running a Git command within
	that directory.

> +static int check_bare_repo_allowed(void)
> +{
> +	if (discovery_bare_config == DISCOVERY_BARE_UNKNOWN) {
> +		read_very_early_config(discovery_bare_cb, NULL);

This will add the third place where we use read_very_early_config(),
adding to the existing calls in tr2_sysenv_load() and
ensure_valid_ownership(). If I understand it correctly, that means
that every Git execution in a bare repository will now parse the
system and global config three times.

This doesn't count the check for uploadpack.packobjectshook in
upload-pack.c that uses current_config_scope() to restrict its
value to the system and global config.

We are probably at the point where we need to instead create a
configset that stores this "protected config" and allow us to
lookup config keys directly from that configset instead of
iterating through these config files repeatedly.

> +		/* We didn't find a value; use the default. */
> +		if (discovery_bare_config == DISCOVERY_BARE_UNKNOWN)
> +			discovery_bare_config = DISCOVERY_BARE_ALWAYS;

This could also be done in advance of the config parsing by setting
discovery_bare_config = DISCOVERY_BARE_ALWAYS before calling
read_very_early_config(). Avoids an if and a comment here, which
might be nice.

> +	}
> +	switch (discovery_bare_config) {
> +	case DISCOVERY_BARE_NEVER:
> +		return 0;
> +	case DISCOVERY_BARE_ALWAYS:
> +		return 1;
> +	default:
> +		BUG("invalid discovery_bare_config %d", discovery_bare_config);
> +	}

You return -1 in discovery_bare_cb when the key matches, but
the value is not understood. Should we check the return value
of read_very_early_config(), too?
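Both suggestions can be sketched together in standard C, under stated assumptions: read_pairs() and struct kv are hypothetical stand-ins for read_very_early_config() and the config files, and read_pairs() returns -1 instead of dying (the real function dies when the callback returns -1, as noted later in the thread). The default is assigned before parsing, so there is no UNKNOWN state and no post-parse "if"; the result travels through the callback's data pointer rather than a global. The callback also rejects a NULL value, which is how Git's config callbacks receive a key written without "= value"; the patch's callback would pass that NULL to strcmp().

```c
#include <stddef.h>
#include <string.h>

enum discovery_bare_config {
	DISCOVERY_BARE_NEVER = 0,
	DISCOVERY_BARE_ALWAYS,
};

/* Hypothetical (key, value) pair standing in for one config entry. */
struct kv {
	const char *key;
	const char *value; /* NULL for a key written without "= value" */
};

static int discovery_bare_cb(const char *key, const char *value, void *d)
{
	enum discovery_bare_config *out = d;

	if (strcmp(key, "discovery.bare"))
		return 0;
	if (!value)
		return -1;           /* avoid strcmp(NULL, ...) */
	if (!strcmp(value, "never"))
		*out = DISCOVERY_BARE_NEVER;
	else if (!strcmp(value, "always"))
		*out = DISCOVERY_BARE_ALWAYS;
	else
		return -1;           /* unknown value */
	return 0;
}

/*
 * Hypothetical stand-in for read_very_early_config(): feed each entry to
 * 'cb' in order (so the last value wins) and report the first error
 * instead of dying.
 */
static int read_pairs(const struct kv *pairs, size_t n,
		      int (*cb)(const char *, const char *, void *), void *d)
{
	for (size_t i = 0; i < n; i++)
		if (cb(pairs[i].key, pairs[i].value, d) < 0)
			return -1;
	return 0;
}

static int load_discovery_bare(const struct kv *pairs, size_t n,
			       enum discovery_bare_config *out)
{
	/* Set the default first: no UNKNOWN state, no post-parse "if". */
	*out = DISCOVERY_BARE_ALWAYS;
	return read_pairs(pairs, n, discovery_bare_cb, out);
}
```

Checking the return value of load_discovery_bare() is what the "should we check the return value" question amounts to in this sketch.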
> +static const char *discovery_bare_config_to_string(void)
> +{
> +	switch (discovery_bare_config) {
> +	case DISCOVERY_BARE_NEVER:
> +		return "never";
> +	case DISCOVERY_BARE_ALWAYS:
> +		return "always";
> +	default:
> +		BUG("invalid discovery_bare_config %d", discovery_bare_config);

In general, I'm not sure these BUG() statements are helpful, but they
aren't hurting anything. I wonder if it would be better to use
DISCOVERY_BARE_UNKNOWN instead of default, because then the compiler
should notice that the switch needs updating when a new enum mode is
added.

> @@ -1142,7 +1195,8 @@ enum discovery_result {
>  	GIT_DIR_HIT_CEILING = -1,
>  	GIT_DIR_HIT_MOUNT_POINT = -2,
>  	GIT_DIR_INVALID_GITFILE = -3,
> -	GIT_DIR_INVALID_OWNERSHIP = -4
> +	GIT_DIR_INVALID_OWNERSHIP = -4,
> +	GIT_DIR_DISALLOWED_BARE = -5

I think that you can add a comma at the end of this enum to avoid the
changed line the next time the enum needs to be expanded.

>  };
>
>  /*
> @@ -1239,6 +1293,8 @@ static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
>  	}
>
>  	if (is_git_directory(dir->buf)) {
> +		if (!check_bare_repo_allowed())
> +			return GIT_DIR_DISALLOWED_BARE;

Won't this fail if someone runs a Git command inside of a .git/
directory for a non-bare repository? I just want to be sure that
we hit this error instead:

	fatal: this operation must be run in a work tree

I see that this error is tested in t0008-ignores.sh, but that's
with the default "always" value. It would be good to explicitly
check that this is the right error when using the "never" config.

Thanks,
-Stolee
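A sketch of both enum suggestions in standard C. Listing DISCOVERY_BARE_UNKNOWN as an explicit case instead of a default: label lets GCC's -Wswitch (enabled by -Wall) warn when a newly added enumerator is not handled; the trailing comma keeps a future addition to a one-line diff. The fprintf() plus abort() pair is only a stand-in for Git's BUG(), and the function takes the value as a parameter instead of reading the patch's global.

```c
#include <stdio.h>
#include <stdlib.h>

enum discovery_bare_config {
	DISCOVERY_BARE_UNKNOWN = -1,
	DISCOVERY_BARE_NEVER = 0,
	DISCOVERY_BARE_ALWAYS,
	/* trailing comma above: the next value adds one line, changes none */
};

static const char *discovery_bare_config_to_string(enum discovery_bare_config c)
{
	/*
	 * No "default:" label, so -Wswitch can flag an enumerator that
	 * is added later but not handled here.
	 */
	switch (c) {
	case DISCOVERY_BARE_NEVER:
		return "never";
	case DISCOVERY_BARE_ALWAYS:
		return "always";
	case DISCOVERY_BARE_UNKNOWN:
		break;
	}
	/* UNKNOWN or an out-of-range value: stand-in for BUG(). */
	fprintf(stderr, "BUG: invalid discovery_bare_config %d\n", (int)c);
	abort();
}
```

-Wswitch only checks enumerators, so the code after the switch still handles a value outside the enum's range.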
On Mon, May 16, 2022 at 02:46:55PM -0400, Derrick Stolee wrote:
> On 5/13/2022 7:37 PM, Glen Choo via GitGitGadget wrote:
> > From: Glen Choo <chooglen@google.com>
> >
> > Add a config variable, `discovery.bare`, that tells Git whether or not
> > it should work with the bare repository it has discovered i.e. Git will
> > die() if it discovers a bare repository, but it is not allowed by
> > `discovery.bare`. This only affects repository discovery, thus it has no
> > effect if discovery was not done (e.g. `--git-dir` was passed).
>
> > This config is an enum of:
> >
> > - ["always"|(unset)]: always recognize bare repositories (like Git does
> >   today)
> > - "never": never recognize bare repositories
> >
> > More values are expected to be added later, and the default is expected
> > to change (i.e. to something other than "always").
>
> I think it is fine to include the "never" option for users to opt-in to
> this super-protected state, but I want to make it very clear that we
> should never move to it as a new default. This phrasing of 'something
> other than "always"' is key, but it might be good to point out that
> "never" is very unlikely to be that default.

I am confused, then. What does a user who has some legitimate
(non-embedded) bare repositories do if they are skeptical of other bare
repositories?

I suspect the best answer we would be able to provide with these patches
is "use `--git-dir`". What happens to a user who has a combination of
legitimate bare repositories, embedded bare repositories that they
trust, and other embedded bare repositories that they don't?

As far as I can tell, our recommendation with these tools would be to:

  - run `git config --global discovery.bare never`, and
  - include `--git-dir=$(pwd)` in any git invocations in bare
    repositories that they do trust

This gets at my concerns from [1] and [2] (mostly [2], in this case)
that we're trying to close the embedded bare repos problem with an
overly broad solution, at the expense of usability.

I can't shake the feeling that something like I described towards the
bottom of [2] would give you all of the security guarantees you're after
without compromising on usability for non-embedded bare repositories.

I'm happy to explore this direction more myself if you don't want to. I
would just much rather see us adopt an approach that doesn't break more
use-cases than it has to if such a thing can be avoided.

I cannot endorse these patches as-is.

Thanks,
Taylor

[1]: https://lore.kernel.org/git/Ylobp7sntKeWTLDX@nand.local/
[2]: https://lore.kernel.org/git/YnmKwLoQCorBnMe2@nand.local/
Thanks for being thorough; I find it really helpful. For brevity, I won't
reply to comments that I think are obviously good, so you can assume I'll
incorporate anything that isn't commented on.

Derrick Stolee <derrickstolee@github.com> writes:

> On 5/13/2022 7:37 PM, Glen Choo via GitGitGadget wrote:
>> From: Glen Choo <chooglen@google.com>
>>
>> +This config setting is only respected when specified in a system or global
>> +config, not when it is specified in a repository config or via the command
>> +line option `-c discovery.bare=<value>`.
>
> We are sprinkling config options that have these same restrictions throughout
> the config documentation. It might be time to define a term like "protected
> config" at the top of git-config.txt and then refer to that from these other
> locations.

Agree, and I think defining the term will be useful in future on-list
discussions.

>> +static int check_bare_repo_allowed(void)
>> +{
>> +	if (discovery_bare_config == DISCOVERY_BARE_UNKNOWN) {
>> +		read_very_early_config(discovery_bare_cb, NULL);
>
> This will add the third place where we use read_very_early_config(),
> adding to the existing calls in tr2_sysenv_load() and
> ensure_valid_ownership(). If I understand it correctly, that means
> that every Git execution in a bare repository will now parse the
> system and global config three times.
>
> This doesn't count the check for uploadpack.packobjectshook in
> upload-pack.c that uses current_config_scope() to restrict its
> value to the system and global config.
>
> We are probably at the point where we need to instead create a
> configset that stores this "protected config" and allow us to
> lookup config keys directly from that configset instead of
> iterating through these config files repeatedly.
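To make the "protected config" idea concrete, here is a minimal sketch of an exact-key lookup that ignores every entry outside system and global scope, so repository config and `-c` values never count. The scope enum, struct entry and protected_config_get() are hypothetical and only loosely modeled on Git's config scopes; a real implementation would populate a configset once and then query it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical scopes, loosely modeled on Git's config scopes. */
enum scope { SCOPE_SYSTEM, SCOPE_GLOBAL, SCOPE_LOCAL, SCOPE_COMMAND };

struct entry {
	enum scope scope;
	const char *key;
	const char *value;
};

static int is_protected_scope(enum scope s)
{
	return s == SCOPE_SYSTEM || s == SCOPE_GLOBAL;
}

/*
 * Return the last value of 'key' that came from protected config, or
 * NULL if none did. Entries are in load order, so later ones win.
 */
static const char *protected_config_get(const struct entry *e, size_t n,
					const char *key)
{
	const char *found = NULL;

	for (size_t i = 0; i < n; i++)
		if (is_protected_scope(e[i].scope) && !strcmp(e[i].key, key))
			found = e[i].value;
	return found;
}
```

In this sketch a repository-local `discovery.bare=always` cannot override a global `never`, which is the property the documentation wording is after.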
Looking at all of the read_very_early_config() calls,

- check_bare_repo_allowed() can use git_configset_get_string()
- ensure_valid_ownership() can use git_configset_get_value_multi()
- tr2_sysenv_load() reads every value with the "trace2." prefix. AFAICT
  configsets only support exact key lookups and I don't see an easy way
  to teach configsets to support prefix lookups.

(I didn't look too closely at uploadpack.packobjectshook because I don't
know enough about config scopes to comment.)

So using a configset, we'll still need to read the config files at least
twice. That's better than thrice, but it doesn't cover the
tr2_sysenv_load() use case, and we'll run into this yet again if we add a
function that reads all config values with a given prefix.

A hacky alternative that covers all of these use cases would be to read
all protected config in a single pass, e.g.

	static struct protected_config {
		struct safe_directory_data safe_directory_data;
		const char *discovery_bare;
		struct string_list tr2_sysenv;
	};

	static int protected_config_cb()
	{
		/* Parse EVERYTHING that belongs in protected_config. */
	}

but protected_config_cb() would have to parse too many unrelated things
for my liking.

So I'll use the configset for the cases where the key is known, and
perhaps we'll punt on tr2_sysenv_load().

>> +	}
>> +	switch (discovery_bare_config) {
>> +	case DISCOVERY_BARE_NEVER:
>> +		return 0;
>> +	case DISCOVERY_BARE_ALWAYS:
>> +		return 1;
>> +	default:
>> +		BUG("invalid discovery_bare_config %d", discovery_bare_config);
>> +	}
>
> You return -1 in discovery_bare_cb when the key matches, but
> the value is not understood. Should we check the return value
> of read_very_early_config(), too?

This comment doesn't apply because unlike most other config reading
functions, read_very_early_config() and read_early_config() die when the
callback returns -1. I'm not sure why this is the case though, and maybe
you think there is value in having a non-die()-ing variant, e.g.
read_very_early_config_gently()?

>>  };
>>
>>  /*
>> @@ -1239,6 +1293,8 @@ static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
>>  	}
>>
>>  	if (is_git_directory(dir->buf)) {
>> +		if (!check_bare_repo_allowed())
>> +			return GIT_DIR_DISALLOWED_BARE;
>
> Won't this fail if someone runs a Git command inside of a .git/
> directory for a non-bare repository? I just want to be sure that
> we hit this error instead:
>
>	fatal: this operation must be run in a work tree
>
> I see that this error is tested in t0008-ignores.sh, but that's
> with the default "always" value. It would be good to explicitly
> check that this is the right error when using the "never" config.

Yes, it will fail if run inside of a .git/ directory. "never" prevents
you from working from inside .git/ unless you set GIT_DIR.

IIRC, we don't show "fatal: this operation must be run in a work tree"
for every Git command, e.g. "git log" works just fine. It makes sense to
show this warning when the CWD supports 'some, but not all' Git
commands, but I don't think this is valuable if we forbid *all* Git
commands.

Instead of trying to make "never" accommodate this use case, perhaps
what we want is a "dotgit-only" option that allows a bare repository if
it is below a .git/ directory. Since we forbid .git in the index, this
seems somewhat safe, but I hadn't proposed this sooner because I don't
know if we need it yet, and I'm certain that there are less secure edge
cases that need to be thought through.
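One possible reading of the "dotgit-only" idea, sketched as a path check: accept a bare-looking directory only if it, or one of its ancestors, is a path component named exactly ".git". inside_dotgit() is hypothetical; it assumes '/'-separated paths and does no normalization (no symlink or ".." resolution), so as written it is an illustration, not a security check.

```c
#include <string.h>

/*
 * Hypothetical "dotgit-only" test: does 'path' contain a component that
 * is exactly ".git"? ".git.old" or "my.git" do not count.
 */
static int inside_dotgit(const char *path)
{
	const char *p = path;

	while ((p = strstr(p, ".git")) != NULL) {
		int starts = (p == path || p[-1] == '/');
		int ends = (p[4] == '\0' || p[4] == '/');

		if (starts && ends)
			return 1;
		p += 4; /* keep scanning after this partial match */
	}
	return 0;
}
```

Under this reading, running a command inside a non-bare repository's .git/ would keep working, while the ".git.old/" archival case from the commit message would still be rejected.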
Glen Choo <chooglen@google.com> writes:

>>> +static int check_bare_repo_allowed(void)
>>> +{
>>> +	if (discovery_bare_config == DISCOVERY_BARE_UNKNOWN) {
>>> +		read_very_early_config(discovery_bare_cb, NULL);
>>
>> This will add the third place where we use read_very_early_config(),
>> adding to the existing calls in tr2_sysenv_load() and
>> ensure_valid_ownership(). If I understand it correctly, that means
>> that every Git execution in a bare repository will now parse the
>> system and global config three times.
>>
>> This doesn't count the check for uploadpack.packobjectshook in
>> upload-pack.c that uses current_config_scope() to restrict its
>> value to the system and global config.
>>
>> We are probably at the point where we need to instead create a
>> configset that stores this "protected config" and allow us to
>> lookup config keys directly from that configset instead of
>> iterating through these config files repeatedly.
>
> Looking at all of the read_very_early_config() calls,
>
> - check_bare_repo_allowed() can use git_configset_get_string()
> - ensure_valid_ownership() can use git_configset_get_value_multi()
> - tr2_sysenv_load() reads every value with the "trace2." prefix. AFAICT
>   configsets only support exact key lookups and I don't see an easy way
>   to teach configsets to support prefix lookups.
>
> (I didn't look too closely at uploadpack.packobjectshook because I don't
> know enough about config scopes to comment.)
>
> So using a configset, we'll still need to read the config files at least
> twice. That's better than thrice, but it doesn't cover the
> tr2_sysenv_load() use case, and we'll run into this yet again if we add a
> function that reads all config values with a given prefix.
>
> A hacky alternative that covers all of these use cases would be to read
> all protected config in a single pass, e.g.
>
>	static struct protected_config {
>		struct safe_directory_data safe_directory_data;
>		const char *discovery_bare;
>		struct string_list tr2_sysenv;
>	};
>
>	static int protected_config_cb()
>	{
>		/* Parse EVERYTHING that belongs in protected_config. */
>	}
>
> but protected_config_cb() would have to parse too many unrelated things
> for my liking.
>
> So I'll use the configset for the cases where the key is known, and
> perhaps we'll punt on tr2_sysenv_load().

Since I'm trying to replace read_very_early_config() anyway, is this a
good time to teach git to respect "-c safe.directory"? My understanding
of [1] is that we only ignore "-c safe.directory" because
read_very_early_config() doesn't support it, but we would prefer to
support it if we could.

[1] https://lore.kernel.org/git/xmqqlevabcsu.fsf@gitster.g/
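A sketch of the single-pass alternative quoted above, assuming a hypothetical callback that is fed every (key, value) pair from protected config exactly once. It serves exact keys (discovery.bare, where the last value wins; safe.directory, which is multi-valued) and the "trace2." prefix in the same pass, which is the case exact-key configset lookups cannot cover. It also shows the cost Glen mentions: one callback has to know about unrelated subsystems. Fixed-size arrays and borrowed string pointers keep it short; real code would copy the strings (e.g. into a string_list), and special values of safe.directory are not modeled.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SAFE_DIRS 8
#define MAX_TRACE2 16

/* Hypothetical cache filled in one pass over protected config. */
struct protected_config {
	const char *discovery_bare;                  /* last value wins */
	const char *safe_directories[MAX_SAFE_DIRS]; /* multi-valued key */
	size_t nr_safe_directories;
	const char *trace2_keys[MAX_TRACE2];         /* any "trace2.*" key */
	const char *trace2_values[MAX_TRACE2];
	size_t nr_trace2;
};

static int has_prefix(const char *s, const char *prefix)
{
	return !strncmp(s, prefix, strlen(prefix));
}

/*
 * Called once per (key, value) from system and global config.
 * Strings are borrowed, not copied; entries beyond the array
 * limits are silently dropped in this sketch.
 */
static int protected_config_cb(const char *key, const char *value, void *d)
{
	struct protected_config *pc = d;

	if (!strcmp(key, "discovery.bare")) {
		pc->discovery_bare = value;
	} else if (!strcmp(key, "safe.directory")) {
		if (pc->nr_safe_directories < MAX_SAFE_DIRS)
			pc->safe_directories[pc->nr_safe_directories++] = value;
	} else if (has_prefix(key, "trace2.")) {
		/* Prefix match: the case an exact-key lookup cannot serve. */
		if (pc->nr_trace2 < MAX_TRACE2) {
			pc->trace2_keys[pc->nr_trace2] = key;
			pc->trace2_values[pc->nr_trace2++] = value;
		}
	}
	return 0;
}
```

After the pass, each caller reads its field from the struct instead of re-reading the config files.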
diff --git a/Documentation/config/discovery.txt b/Documentation/config/discovery.txt
new file mode 100644
index 00000000000..761cabe6e70
--- /dev/null
+++ b/Documentation/config/discovery.txt
@@ -0,0 +1,24 @@
+discovery.bare::
+	Specifies what kinds of directories Git can recognize as a bare
+	repository when looking for the repository (aka repository
+	discovery). This has no effect if repository discovery is not
+	performed e.g. the path to the repository is set via `--git-dir`
+	(see linkgit:git[1]).
++
+This config setting is only respected when specified in a system or global
+config, not when it is specified in a repository config or via the command
+line option `-c discovery.bare=<value>`.
++
+The currently supported values are `always` (Git always recognizes bare
+repositories) and `never` (Git never recognizes bare repositories).
+This defaults to `always`, but this default is likely to change.
++
+If your workflow does not rely on bare repositories, it is recommended that
+you set this value to `never`. This makes repository discovery easier to
+reason about and prevents certain types of security and non-security
+problems, such as:
+
+* `git clone`-ing a repository containing a malicious bare repository
+  inside it.
+* Git recognizing a directory that isn't meant to be a bare repository,
+  but happens to look like one.
diff --git a/setup.c b/setup.c
index a7b36f3ffbf..cee01d86f0c 100644
--- a/setup.c
+++ b/setup.c
@@ -10,6 +10,13 @@ static int inside_git_dir = -1;
 static int inside_work_tree = -1;
 static int work_tree_config_is_bogus;
 
+enum discovery_bare_config {
+	DISCOVERY_BARE_UNKNOWN = -1,
+	DISCOVERY_BARE_NEVER = 0,
+	DISCOVERY_BARE_ALWAYS,
+};
+static enum discovery_bare_config discovery_bare_config =
+	DISCOVERY_BARE_UNKNOWN;
 static struct startup_info the_startup_info;
 struct startup_info *startup_info = &the_startup_info;
 
@@ -1133,6 +1140,52 @@ static int ensure_valid_ownership(const char *path)
 	return data.is_safe;
 }
 
+static int discovery_bare_cb(const char *key, const char *value, void *d)
+{
+	if (strcmp(key, "discovery.bare"))
+		return 0;
+
+	if (!strcmp(value, "never")) {
+		discovery_bare_config = DISCOVERY_BARE_NEVER;
+		return 0;
+	}
+	if (!strcmp(value, "always")) {
+		discovery_bare_config = DISCOVERY_BARE_ALWAYS;
+		return 0;
+	}
+	return -1;
+}
+
+static int check_bare_repo_allowed(void)
+{
+	if (discovery_bare_config == DISCOVERY_BARE_UNKNOWN) {
+		read_very_early_config(discovery_bare_cb, NULL);
+		/* We didn't find a value; use the default. */
+		if (discovery_bare_config == DISCOVERY_BARE_UNKNOWN)
+			discovery_bare_config = DISCOVERY_BARE_ALWAYS;
+	}
+	switch (discovery_bare_config) {
+	case DISCOVERY_BARE_NEVER:
+		return 0;
+	case DISCOVERY_BARE_ALWAYS:
+		return 1;
+	default:
+		BUG("invalid discovery_bare_config %d", discovery_bare_config);
+	}
+}
+
+static const char *discovery_bare_config_to_string(void)
+{
+	switch (discovery_bare_config) {
+	case DISCOVERY_BARE_NEVER:
+		return "never";
+	case DISCOVERY_BARE_ALWAYS:
+		return "always";
+	default:
+		BUG("invalid discovery_bare_config %d", discovery_bare_config);
+	}
+}
+
 enum discovery_result {
 	GIT_DIR_NONE = 0,
 	GIT_DIR_EXPLICIT,
@@ -1142,7 +1195,8 @@ enum discovery_result {
 	GIT_DIR_HIT_CEILING = -1,
 	GIT_DIR_HIT_MOUNT_POINT = -2,
 	GIT_DIR_INVALID_GITFILE = -3,
-	GIT_DIR_INVALID_OWNERSHIP = -4
+	GIT_DIR_INVALID_OWNERSHIP = -4,
+	GIT_DIR_DISALLOWED_BARE = -5
 };
 
 /*
@@ -1239,6 +1293,8 @@ static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
 	}
 
 	if (is_git_directory(dir->buf)) {
+		if (!check_bare_repo_allowed())
+			return GIT_DIR_DISALLOWED_BARE;
 		if (!ensure_valid_ownership(dir->buf))
 			return GIT_DIR_INVALID_OWNERSHIP;
 		strbuf_addstr(gitdir, ".");
@@ -1385,6 +1441,14 @@ const char *setup_git_directory_gently(int *nongit_ok)
 		}
 		*nongit_ok = 1;
 		break;
+	case GIT_DIR_DISALLOWED_BARE:
+		if (!nongit_ok) {
+			die(_("cannot use bare repository '%s' (discovery.bare is '%s')"),
+			    dir.buf,
+			    discovery_bare_config_to_string());
+		}
+		*nongit_ok = 1;
+		break;
 	case GIT_DIR_NONE:
 		/*
 		 * As a safeguard against setup_git_directory_gently_1 returning
diff --git a/t/t0034-discovery-bare.sh b/t/t0034-discovery-bare.sh
new file mode 100755
index 00000000000..9c774872c4e
--- /dev/null
+++ b/t/t0034-discovery-bare.sh
@@ -0,0 +1,59 @@
+#!/bin/sh
+
+test_description='verify discovery.bare checks'
+
+. ./test-lib.sh
+
+pwd="$(pwd)"
+
+expect_allowed () {
+	git rev-parse --absolute-git-dir >actual &&
+	echo "$pwd/outer-repo/bare-repo" >expected &&
+	test_cmp expected actual
+}
+
+expect_rejected () {
+	test_must_fail git rev-parse --absolute-git-dir 2>err &&
+	grep "discovery.bare" err
+}
+
+test_expect_success 'setup bare repo in worktree' '
+	git init outer-repo &&
+	git init --bare outer-repo/bare-repo
+'
+
+test_expect_success 'discovery.bare unset' '
+	(
+		cd outer-repo/bare-repo &&
+		expect_allowed &&
+		cd refs/ &&
+		expect_allowed
+	)
+'
+
+test_expect_success 'discovery.bare=always' '
+	git config --global discovery.bare always &&
+	(
+		cd outer-repo/bare-repo &&
+		expect_allowed &&
+		cd refs/ &&
+		expect_allowed
+	)
+'
+
+test_expect_success 'discovery.bare=never' '
+	git config --global discovery.bare never &&
+	(
+		cd outer-repo/bare-repo &&
+		expect_rejected &&
+		cd refs/ &&
+		expect_rejected
+	) &&
+	(
+		GIT_DIR=outer-repo/bare-repo &&
+		export GIT_DIR &&
+		expect_allowed
+	)
+'
+
+test_done