Message ID | pull.1488.git.git.1680652122547.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | clone: error specifically with --local and symlinked objects | expand |
"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes: > iter = dir_iterator_begin(src->buf, DIR_ITERATOR_PEDANTIC); > > - if (!iter) > + if (!iter) { > + struct stat st; > + if (lstat(src->buf, &st) >= 0 && S_ISLNK(st.st_mode)) > + die(_("'%s' is a symlink, refusing to clone with --local"), > + src->buf); If you want to do lstat(2) yourself, the canonical way to check its success is to see the returned value is 0, not "not negative", but let's first see how dir_iterator_begin() can fail. I suspect it may not necessary to do another lstat(2). The function returns NULL: * if lstat() fails; errno is left from the failed lstat() in this case. * if lstat() succeeds, but the path is *not* a directory; errno is explicitly set to ENOTDIR. Unfortunately, if lstat(2) failed with ENOTDIR (e.g. dir_iterator_begin() gets called with a path whose leading component is not a directory), the caller will also see ENOTDIR, but the distinction may not matter in practice. I haven't thought things through. Assuming that the distinction does not matter, then, if (!iter) { if (errno == ENOTDIR) die(_("'%s' is not a directory, refusing to clone with --local"), src->buf); else die_errno(_("failed to stat '%s'"), src->buf); } may be sufficient. But because this is an error codepath, it is not worth optimizing what happens there, and an extra lstat(2) is not too bad, if the code gains extra clarity.
Junio C Hamano <gitster@pobox.com> writes: > If you want to do lstat(2) yourself, the canonical way to check its > success is to see the returned value is 0, not "not negative", but > let's first see how dir_iterator_begin() can fail. Ah, thanks. > Unfortunately, if lstat(2) failed > with ENOTDIR (e.g. dir_iterator_begin() gets called with a path > whose leading component is not a directory), the caller will also > see ENOTDIR, but the distinction may not matter in practice. I > haven't thought things through. > > ... > > if (!iter) { > if (errno == ENOTDIR) > die(_("'%s' is not a directory, refusing to clone with --local"), > src->buf); > else > die_errno(_("failed to stat '%s'"), src->buf); > } > > may be sufficient. But because this is an error codepath, it is not > worth optimizing what happens there, and an extra lstat(2) is not > too bad, if the code gains extra clarity. Yeah, the considerations here make sense to me. Since this is an error code path, I think the extra lstat() is probably worth it since it lets us be really specific about the error. Maybe: if (!iter) { struct stat st; if (errno == ENOTDIR && lstat(src->buf, &st) == 0 && S_ISLNK(st.st_mode)) die(_("'%s' is a symlink, refusing to clone with --local"), src->buf); die_errno(_("failed to start iterator over '%s'"), src->buf); } Doing the extra lstat() only makes sense if we saw ENOTDIR orignally anyway. Alternatively, since we do care about the distinction between ENOTDIR from lstat vs ENOTDIR because dir_iterator_begin() saw a symlink, maybe it's better to refactor dir_iterator_begin() so that we stop piggybacking on "errno = ENOTDIR" in these two cases. I didn't want to do this because I wasn't sure who might be relying on this behavior, but this check was pretty recently introduced anyway (bffc762f87 (dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS, 2023-01-24), so maybe nobody needs it.
Glen Choo <chooglen@google.com> writes: > Since this is an error code path, I think the extra lstat() is probably > worth it since it lets us be really specific about the error. OK. Sounds good. Thanks.
On Tue, Apr 04, 2023 at 05:13:41PM -0700, Junio C Hamano wrote: > * if lstat() succeeds, but the path is *not* a directory; errno is > explicitly set to ENOTDIR. Unfortunately, if lstat(2) failed > with ENOTDIR (e.g. dir_iterator_begin() gets called with a path > whose leading component is not a directory), the caller will also > see ENOTDIR, but the distinction may not matter in practice. I > haven't thought things through. Yeah. The real tragedy here is that there's no way to signal that a file *is* a symbolic link and represent that as an error code in errno. So I think the best that we can do here is to continue to report ENOTDIR, and have the caller figure out what to do with it. Thanks, Taylor
On Wed, Apr 05, 2023 at 09:48:22AM -0700, Glen Choo wrote: > Since this is an error code path, I think the extra lstat() is probably > worth it since it lets us be really specific about the error. Maybe: FWIW, I don't totally think it's necessary for us to report back to the user that they're trying to clone a repository with `--local` whose `$GIT_DIR/objects` either is or contains a symbolic link. Especially so since you're already updating the documentation here. But I think it's a compelling argument in the name of improving the user experience, so I think that this is a reasonable direction. And I agree, that while the extra lstat() is unfortunate, I don't think there is a convenient way to do it otherwise. We *could* teach the dir-iterator API to return a specialized error code either through a pointer, like: struct dir_iterator *dir_iterator_begin(const char *path, unsigned int flags, int *error) and set error to something like -DIR_ITERATOR_IS_SYMLINK when error is non-NULL. Or we could do something like this: int dir_iterator_begin(struct dir_iterator **it, const char *path, unsigned int flags) and have the `dir_iterator_begin()` function return its specialized error and initialize the dir iterator through a pointer. Between these two, I prefer the latter, but I think it's up to individual taste. > if (!iter) { > struct stat st; > > if (errno == ENOTDIR && lstat(src->buf, &st) == 0 && S_ISLNK(st.st_mode)) Couple of nit-picks here. Instead of writing "lstat(src->buf, &st == 0)", we should write "!lstat(src->buf, &st)" to match Documentation/CodingGuidelines. But note that that call to lstat() will clobber your errno value, so you'd want to save it off beforehand. If you end up going this route, I'd probably do something like: if (!iter) { int saved_errno = errno; if (errno == ENOTDIR) { struct stat st; if (!lstat(src->buf, &st) && S_ISLNK(st.st_mode)) die(_("'%s' is a symlink, refusing to clone with `--local`"), src->buf); } errno = saved_errno; die_errno(_("failed to start iterator over '%s'"), src->buf); } Thanks, Taylor
Thanks for the feedback! (and phew, I was a few minutes away from submitting v2 :P) Taylor Blau <me@ttaylorr.com> writes: > We *could* teach the dir-iterator API to return a specialized error code either > through a pointer, like: > > struct dir_iterator *dir_iterator_begin(const char *path, > unsigned int flags, int *error) > > and set error to something like -DIR_ITERATOR_IS_SYMLINK when error is > non-NULL. > > Or we could do something like this: > > int dir_iterator_begin(struct dir_iterator **it, > const char *path, unsigned int flags) > > and have the `dir_iterator_begin()` function return its specialized > error and initialize the dir iterator through a pointer. Between these > two, I prefer the latter, but I think it's up to individual taste. Yeah, I've thought about doing the latter. That's also what I'd prefer. Since you've brought it up, I'll give that a try. >> if (!iter) { >> struct stat st; >> >> if (errno == ENOTDIR && lstat(src->buf, &st) == 0 && S_ISLNK(st.st_mode)) > > Couple of nit-picks here. Instead of writing "lstat(src->buf, &st == > 0)", we should write "!lstat(src->buf, &st)" to match > Documentation/CodingGuidelines. Ah, thanks. > But note that that call to lstat() will clobber your errno value, so > you'd want to save it off beforehand. If you end up going this route, > I'd probably do something like: > > if (!iter) { > int saved_errno = errno; > if (errno == ENOTDIR) { > struct stat st; > > if (!lstat(src->buf, &st) && S_ISLNK(st.st_mode)) > die(_("'%s' is a symlink, refusing to clone with `--local`"), > src->buf); > } > errno = saved_errno; > > die_errno(_("failed to start iterator over '%s'"), src->buf); > } Ah thanks, yes. I think it isn't different in practice, since we're lstat()-ing the same path, but we should always use the "real" errno to be safe. This is another good reason to return an error code instead of overloading errno.
diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt index d6434d262d6..c37c4a37f74 100644 --- a/Documentation/git-clone.txt +++ b/Documentation/git-clone.txt @@ -58,6 +58,11 @@ never use the local optimizations). Specifying `--no-local` will override the default when `/path/to/repo` is given, using the regular Git transport instead. + +If the repository's `$GIT_DIR/objects` has symbolic links or is a +symbolic link, the clone will fail. This is a security measure to +prevent the unintentional copying of files by dereferencing the symbolic +links. ++ *NOTE*: this operation can race with concurrent modification to the source repository, similar to running `cp -r src dst` while modifying `src`. diff --git a/builtin/clone.c b/builtin/clone.c index 462c286274c..74ec5b8b02a 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -327,8 +327,13 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest, iter = dir_iterator_begin(src->buf, DIR_ITERATOR_PEDANTIC); - if (!iter) + if (!iter) { + struct stat st; + if (lstat(src->buf, &st) >= 0 && S_ISLNK(st.st_mode)) + die(_("'%s' is a symlink, refusing to clone with --local"), + src->buf); die_errno(_("failed to start iterator over '%s'"), src->buf); + } strbuf_addch(src, '/'); src_len = src->len; diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh index 83e3c97861d..9845fc04d59 100755 --- a/t/t5604-clone-reference.sh +++ b/t/t5604-clone-reference.sh @@ -358,7 +358,7 @@ test_expect_success SYMLINKS 'clone repo with symlinked objects directory' ' test_must_fail git clone --local malicious clone 2>err && test_path_is_missing clone && - grep "failed to start iterator over" err + grep "is a symlink, refusing to clone with --local" err ' test_done