diff mbox series

clone: error specifically with --local and symlinked objects

Message ID pull.1488.git.git.1680652122547.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series clone: error specifically with --local and symlinked objects | expand

Commit Message

Glen Choo April 4, 2023, 11:48 p.m. UTC
From: Glen Choo <chooglen@google.com>

6f054f9fb3 (builtin/clone.c: disallow --local clones with
symlinks, 2022-07-28) gives a good error message when "git clone
--local" fails when the repo to clone has symlinks in
"$GIT_DIR/objects". In bffc762f87 (dir-iterator: prevent top-level
symlinks without FOLLOW_SYMLINKS, 2023-01-24), we later extended this
restriction to the case where "$GIT_DIR/objects" is itself a symlink,
but we didn't update the error message then - bffc762f87's tests show
that we print a generic "failed to start iterator over" message.

This is exacerbated by the fact that Documentation/git-clone.txt
mentions neither restriction, so users are left wondering if this is
intentional behavior or not.

Fix this by adding a check to builtin/clone.c: when doing a local clone,
perform an extra check to see if "$GIT_DIR/objects" is a symlink, and if
so, assume that that was the reason for the failure and report the
relevant information. Ideally, dir_iterator_begin() would tell us that
the real failure reason is the presence of the symlink, but (as far as I
can tell) there isn't an appropriate errno value for that.

Also, update Documentation/git-clone.txt to reflect that this
restriction exists.

Signed-off-by: Glen Choo <chooglen@google.com>
---
    clone: error specifically with --local and symlinked objects
    
    We noticed this because repo [1] creates Git repos where
    "$GIT_DIR/objects" is a symlink, and users have gotten confused as to
    whether this was intended behavior or not.
    
    I'm no good with lstat() and errno, so if there's a better to do this,
    I'd appreciate the input :)
    
    [1] https://gerrit.googlesource.com/git-repo

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1488%2Fchooglen%2Fpush-nymnqqnttzxz-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1488/chooglen/push-nymnqqnttzxz-v1
Pull-Request: https://github.com/git/git/pull/1488

 Documentation/git-clone.txt | 5 +++++
 builtin/clone.c             | 7 ++++++-
 t/t5604-clone-reference.sh  | 2 +-
 3 files changed, 12 insertions(+), 2 deletions(-)


base-commit: 27d43aaaf50ef0ae014b88bba294f93658016a2e

Comments

Junio C Hamano April 5, 2023, 12:13 a.m. UTC | #1
"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  	iter = dir_iterator_begin(src->buf, DIR_ITERATOR_PEDANTIC);
>  
> -	if (!iter)
> +	if (!iter) {
> +		struct stat st;
> +		if (lstat(src->buf, &st) >= 0 && S_ISLNK(st.st_mode))
> +			die(_("'%s' is a symlink, refusing to clone with --local"),
> +			    src->buf);

If you want to do lstat(2) yourself, the canonical way to check its
success is to see the returned value is 0, not "not negative", but
let's first see how dir_iterator_begin() can fail.  I suspect it may
not necessary to do another lstat(2).  The function returns NULL:

 * if lstat() fails; errno is left from the failed lstat() in this case.

 * if lstat() succeeds, but the path is *not* a directory; errno is
   explicitly set to ENOTDIR.  Unfortunately, if lstat(2) failed
   with ENOTDIR (e.g. dir_iterator_begin() gets called with a path
   whose leading component is not a directory), the caller will also
   see ENOTDIR, but the distinction may not matter in practice.  I
   haven't thought things through.

Assuming that the distinction does not matter, then,

	if (!iter) {
		if (errno == ENOTDIR)
			die(_("'%s' is not a directory, refusing to clone with --local"),
			    src->buf);
		else
			die_errno(_("failed to stat '%s'"), src->buf);
	}

may be sufficient.  But because this is an error codepath, it is not
worth optimizing what happens there, and an extra lstat(2) is not
too bad, if the code gains extra clarity.
Glen Choo April 5, 2023, 4:48 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> If you want to do lstat(2) yourself, the canonical way to check its
> success is to see the returned value is 0, not "not negative", but
> let's first see how dir_iterator_begin() can fail.

Ah, thanks.

>                                Unfortunately, if lstat(2) failed
>    with ENOTDIR (e.g. dir_iterator_begin() gets called with a path
>    whose leading component is not a directory), the caller will also
>    see ENOTDIR, but the distinction may not matter in practice.  I
>    haven't thought things through.
>
> ...
>
> 	if (!iter) {
> 		if (errno == ENOTDIR)
> 			die(_("'%s' is not a directory, refusing to clone with --local"),
> 			    src->buf);
> 		else
> 			die_errno(_("failed to stat '%s'"), src->buf);
> 	}
>
> may be sufficient.  But because this is an error codepath, it is not
> worth optimizing what happens there, and an extra lstat(2) is not
> too bad, if the code gains extra clarity.

Yeah, the considerations here make sense to me.

Since this is an error code path, I think the extra lstat() is probably
worth it since it lets us be really specific about the error. Maybe:

	if (!iter) {
		struct stat st;

    if (errno == ENOTDIR && lstat(src->buf, &st) == 0 && S_ISLNK(st.st_mode))
			die(_("'%s' is a symlink, refusing to clone with --local"),
			    src->buf);

    die_errno(_("failed to start iterator over '%s'"), src->buf);
	}

Doing the extra lstat() only makes sense if we saw ENOTDIR orignally
anyway.

Alternatively, since we do care about the distinction between ENOTDIR
from lstat vs ENOTDIR because dir_iterator_begin() saw a symlink, maybe
it's better to refactor dir_iterator_begin() so that we stop
piggybacking on "errno = ENOTDIR" in these two cases. I didn't want to
do this because I wasn't sure who might be relying on this behavior, but
this check was pretty recently introduced anyway (bffc762f87
(dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS,
2023-01-24), so maybe nobody needs it.
Junio C Hamano April 5, 2023, 7:20 p.m. UTC | #3
Glen Choo <chooglen@google.com> writes:

> Since this is an error code path, I think the extra lstat() is probably
> worth it since it lets us be really specific about the error.

OK.  Sounds good.  Thanks.
Taylor Blau April 6, 2023, 9:27 p.m. UTC | #4
On Tue, Apr 04, 2023 at 05:13:41PM -0700, Junio C Hamano wrote:
>  * if lstat() succeeds, but the path is *not* a directory; errno is
>    explicitly set to ENOTDIR.  Unfortunately, if lstat(2) failed
>    with ENOTDIR (e.g. dir_iterator_begin() gets called with a path
>    whose leading component is not a directory), the caller will also
>    see ENOTDIR, but the distinction may not matter in practice.  I
>    haven't thought things through.

Yeah. The real tragedy here is that there's no way to signal that a file
*is* a symbolic link and represent that as an error code in errno.

So I think the best that we can do here is to continue to report
ENOTDIR, and have the caller figure out what to do with it.

Thanks,
Taylor
Taylor Blau April 6, 2023, 9:35 p.m. UTC | #5
On Wed, Apr 05, 2023 at 09:48:22AM -0700, Glen Choo wrote:
> Since this is an error code path, I think the extra lstat() is probably
> worth it since it lets us be really specific about the error. Maybe:

FWIW, I don't totally think it's necessary for us to report back to the
user that they're trying to clone a repository with `--local` whose
`$GIT_DIR/objects` either is or contains a symbolic link. Especially so
since you're already updating the documentation here.

But I think it's a compelling argument in the name of improving the user
experience, so I think that this is a reasonable direction. And I agree,
that while the extra lstat() is unfortunate, I don't think there is a
convenient way to do it otherwise.

We *could* teach the dir-iterator API to return a specialized error code either
through a pointer, like:

    struct dir_iterator *dir_iterator_begin(const char *path,
                                            unsigned int flags, int *error)

and set error to something like -DIR_ITERATOR_IS_SYMLINK when error is
non-NULL.

Or we could do something like this:

    int dir_iterator_begin(struct dir_iterator **it,
                           const char *path, unsigned int flags)

and have the `dir_iterator_begin()` function return its specialized
error and initialize the dir iterator through a pointer. Between these
two, I prefer the latter, but I think it's up to individual taste.

> 	if (!iter) {
> 		struct stat st;
>
>     if (errno == ENOTDIR && lstat(src->buf, &st) == 0 && S_ISLNK(st.st_mode))

Couple of nit-picks here. Instead of writing "lstat(src->buf, &st ==
0)", we should write "!lstat(src->buf, &st)" to match
Documentation/CodingGuidelines.

But note that that call to lstat() will clobber your errno value, so
you'd want to save it off beforehand. If you end up going this route,
I'd probably do something like:

    if (!iter) {
      int saved_errno = errno;
      if (errno == ENOTDIR) {
        struct stat st;

        if (!lstat(src->buf, &st) && S_ISLNK(st.st_mode))
          die(_("'%s' is a symlink, refusing to clone with `--local`"),
              src->buf);
      }
      errno = saved_errno;

      die_errno(_("failed to start iterator over '%s'"), src->buf);
    }

Thanks,
Taylor
Glen Choo April 6, 2023, 9:55 p.m. UTC | #6
Thanks for the feedback!

(and phew, I was a few minutes away from submitting v2 :P)

Taylor Blau <me@ttaylorr.com> writes:

> We *could* teach the dir-iterator API to return a specialized error code either
> through a pointer, like:
>
>     struct dir_iterator *dir_iterator_begin(const char *path,
>                                             unsigned int flags, int *error)
>
> and set error to something like -DIR_ITERATOR_IS_SYMLINK when error is
> non-NULL.
>
> Or we could do something like this:
>
>     int dir_iterator_begin(struct dir_iterator **it,
>                            const char *path, unsigned int flags)
>
> and have the `dir_iterator_begin()` function return its specialized
> error and initialize the dir iterator through a pointer. Between these
> two, I prefer the latter, but I think it's up to individual taste.

Yeah, I've thought about doing the latter. That's also what I'd prefer.
Since you've brought it up, I'll give that a try.

>> 	if (!iter) {
>> 		struct stat st;
>>
>>     if (errno == ENOTDIR && lstat(src->buf, &st) == 0 && S_ISLNK(st.st_mode))
>
> Couple of nit-picks here. Instead of writing "lstat(src->buf, &st ==
> 0)", we should write "!lstat(src->buf, &st)" to match
> Documentation/CodingGuidelines.

Ah, thanks.

> But note that that call to lstat() will clobber your errno value, so
> you'd want to save it off beforehand. If you end up going this route,
> I'd probably do something like:
>
>     if (!iter) {
>       int saved_errno = errno;
>       if (errno == ENOTDIR) {
>         struct stat st;
>
>         if (!lstat(src->buf, &st) && S_ISLNK(st.st_mode))
>           die(_("'%s' is a symlink, refusing to clone with `--local`"),
>               src->buf);
>       }
>       errno = saved_errno;
>
>       die_errno(_("failed to start iterator over '%s'"), src->buf);
>     }

Ah thanks, yes. I think it isn't different in practice, since we're
lstat()-ing the same path, but we should always use the "real" errno to
be safe.

This is another good reason to return an error code instead of
overloading errno.
diff mbox series

Patch

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index d6434d262d6..c37c4a37f74 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -58,6 +58,11 @@  never use the local optimizations).  Specifying `--no-local` will
 override the default when `/path/to/repo` is given, using the regular
 Git transport instead.
 +
+If the repository's `$GIT_DIR/objects` has symbolic links or is a
+symbolic link, the clone will fail. This is a security measure to
+prevent the unintentional copying of files by dereferencing the symbolic
+links.
++
 *NOTE*: this operation can race with concurrent modification to the
 source repository, similar to running `cp -r src dst` while modifying
 `src`.
diff --git a/builtin/clone.c b/builtin/clone.c
index 462c286274c..74ec5b8b02a 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -327,8 +327,13 @@  static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 
 	iter = dir_iterator_begin(src->buf, DIR_ITERATOR_PEDANTIC);
 
-	if (!iter)
+	if (!iter) {
+		struct stat st;
+		if (lstat(src->buf, &st) >= 0 && S_ISLNK(st.st_mode))
+			die(_("'%s' is a symlink, refusing to clone with --local"),
+			    src->buf);
 		die_errno(_("failed to start iterator over '%s'"), src->buf);
+	}
 
 	strbuf_addch(src, '/');
 	src_len = src->len;
diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 83e3c97861d..9845fc04d59 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -358,7 +358,7 @@  test_expect_success SYMLINKS 'clone repo with symlinked objects directory' '
 	test_must_fail git clone --local malicious clone 2>err &&
 
 	test_path_is_missing clone &&
-	grep "failed to start iterator over" err
+	grep "is a symlink, refusing to clone with --local" err
 '
 
 test_done