Message ID | 20190917215040.132503-1-jonathantanmy@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] merge-recursive: symlink's descendants not in way | expand |
Jonathan Tan <jonathantanmy@google.com> writes: > When the working tree has: > - foo (symlink) > - foo/bar (directory) Whoa, wait. I assume, since this is about merge, the assumption is that the working tree is clean with respect to the index, so 'foo' is a symbolic link that is in the index. Now, if foo is a symlink, how can foo/bar (whether it is a directory or something else) exist, which requires foo to be a directory in the first place? > and the user merges a commit that deletes the foo symlink and instead > contains: > - foo (directory) > - foo/bar (file) This side is possible. If foo is a directory, then there can be foo/bar. But I do not get the initial setup you start with. In any case, if the working tree has 'foo' as a symlink, Git should not look at or get affected by what 'foo' points at.
> Jonathan Tan <jonathantanmy@google.com> writes: > > > When the working tree has: > > - foo (symlink) > > - foo/bar (directory) > > Whoa, wait. I assume, since this is about merge, the assumption is > that the working tree is clean with respect to the index, so 'foo' > is a symbolic link that is in the index. Now, if foo is a symlink, > how can foo/bar (whether it is a directory or something else) exist, > which requires foo to be a directory in the first place? I was trying to be concise but I guess I omitted too much detail. This is what's happening: Working tree: - foo (symlink pointing to .) - bar (directory) - bar/file (file) And the new commit: - foo (directory) - foo/bar (file) - bar (directory) - bar/file (file) I'll update the commit message when I send out another version. > In any case, if the working tree has 'foo' as a symlink, Git should > not look at or get affected by what 'foo' points at. Git should not, but it does - there is a call in process_entry() that calls lstat() on "foo/bar", which indeed reports that "foo/bar" is a directory. This patch adds a check that none of its ancestors are symlinks.
Jonathan Tan <jonathantanmy@google.com> writes: >> In any case, if the working tree has 'foo' as a symlink, Git should >> not look at or get affected by what 'foo' points at. > > Git should not, but it does - there is a call in process_entry() that calls > lstat() on "foo/bar", which indeed reports that "foo/bar" is a directory. This > patch adds a check that none of its ancestors are symlinks. Yeah, I recall having to add has_symlink_leading_path() long time ago in different codepaths (including "apply"). It is not surprising to see a similar glitch remaining in merge-recursive (it's a tricky issue, and it's a tricky code).
> Yeah, I recall having to add has_symlink_leading_path() long time > ago in different codepaths (including "apply"). It is not surprising > to see a similar glitch remaining in merge-recursive (it's a tricky > issue, and it's a tricky code). Thanks for the pointer to has_symlink_leading_path() - I tried it and it seems to work too. I'll wait for more replies and send an updated version tomorrow.
On Tue, Sep 17, 2019 at 02:50:40PM -0700, Jonathan Tan wrote: > diff --git a/t/t3030-merge-recursive.sh b/t/t3030-merge-recursive.sh > index ff641b348a..dfd617a845 100755 > --- a/t/t3030-merge-recursive.sh > +++ b/t/t3030-merge-recursive.sh > @@ -452,6 +452,33 @@ test_expect_success 'merge-recursive d/f conflict result' ' > > ' > > +test_expect_success 'dir in working tree with symlink ancestor does not produce d/f conflict' ' This test needs the SYMLINKS prereq. > + git init sym && > + ( > + cd sym && > + ln -s . foo && > + mkdir bar && > + touch bar/file && Nit: >bar/file (though because of the symlink this test won't be run on the most fork()-challenged platform). > + git add foo bar/file && > + git commit -m "foo symlink" && > + > + git checkout -b branch1 && > + git commit --allow-empty -m "empty commit" && > + > + git checkout master && > + git rm foo && > + mkdir foo && > + touch foo/bar && > + git add foo/bar && > + git commit -m "replace foo symlink with real foo dir and foo/bar file" && > + > + git checkout branch1 && > + > + # Exercise to make sure that this works without errors > + git cherry-pick master Working without errors is great, but I think this test should also check that the results are as expected, e.g. 'foo' is now a directory, etc. > + ) > +' > + > test_expect_success 'reset and 3-way merge' ' > > git reset --hard "$c2" && > -- > 2.23.0.237.gc6a4ce50a0-goog >
Hi Jonathan, On Tue, Sep 17, 2019 at 2:50 PM Jonathan Tan <jonathantanmy@google.com> wrote: > > When the working tree has: > - foo (symlink) > - foo/bar (directory) > > and the user merges a commit that deletes the foo symlink and instead > contains: > - foo (directory) > - foo/bar (file) > > the merge should happen without requiring user intervention. However, > this does not happen. > > In merge_trees(), process_entry() will be invoked first for "foo/bar", > then "foo" (in reverse lexicographical order). process_entry() correctly > reaches "Case B: Added in one", but dir_in_way() states that "bar" is > already present as a directory, causing a directory/file conflict at the > wrong point. I don't think the notes about hitting the "Case B: Added in one" codepath help; that's only one codepath that calls dir_in_way(), and I'm pretty sure with a little work we could trigger the same bug with the other ones. > Instead, teach dir_in_way() that directories under symlinks are not "in > the way", so that symlinks are treated as regular files instead of > directories containing other directories and files. Thus, the "else" > branch will be followed instead: "foo/bar" will be added to the working > tree, make_room_for_path() being indirectly called to unlink the "foo" > symlink (just like if "foo" were a regular file instead). When > process_entry() is subsequently invoked for "foo", process_entry() will > reach "Case A: Deleted in one", and will handle it as "Add/delete" or > "Modify/delete" appropriately (including reinstatement of the previously > unlinked symlink with a new unique filename if necessary, again, just > like if "foo" were a regular file instead). I was trying to think of a way to summarize it a bit, and then Junio later in the thread came in and provided a different and compatible way to view the issue that summarizes it quite nicely: "In any case, if the working tree has 'foo' as a symlink, Git should not look at or get affected by what 'foo' points at." We can probably make the commit message pretty concise using that wording or something similar. Maybe adding something like "In particular, the presence of a symlink should be treated much the same as the presence of a file; since dir_in_way() is only checking to see if there is a directory in the way, we don't want symlinks in leading paths to sometimes cause dir_in_way() to return true." > > Helped-by: Elijah Newren <newren@gmail.com> > Signed-off-by: Jonathan Tan <jonathantanmy@google.com> > --- > Thanks to Elijah for his help. Some of the commit message is based on > his explanation [1]. > > I'm finding this relatively complicated, so I'm sending this as RFC. My > main concern is that whether all callers of dir_in_way() are OK with its > behavior change, and if yes, how to explain it. I suspect that this is > correct because dir_in_way() should behave consistently for all its > callers, but I might be wrong. Yes, we want all callers of dir_in_way() to get this change; if they don't, I'm pretty sure with some work we could devise special scenarios that exhibit the same bug. Thanks for working on this, Elijah
diff --git a/merge-recursive.c b/merge-recursive.c index 6b812d67e3..a2029a4e94 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -747,7 +747,7 @@ static int dir_in_way(struct index_state *istate, const char *path, { int pos; struct strbuf dirpath = STRBUF_INIT; - struct stat st; + int ret = 0; strbuf_addstr(&dirpath, path); strbuf_addch(&dirpath, '/'); @@ -758,13 +758,43 @@ static int dir_in_way(struct index_state *istate, const char *path, pos = -1 - pos; if (pos < istate->cache_nr && !strncmp(dirpath.buf, istate->cache[pos]->name, dirpath.len)) { - strbuf_release(&dirpath); - return 1; + ret = 1; + goto cleanup; } + if (check_working_copy) { + struct stat st; + + strbuf_trim_trailing_dir_sep(&dirpath); + if (lstat(dirpath.buf, &st)) + goto cleanup; /* doesn't exist, OK */ + if (!S_ISDIR(st.st_mode)) + goto cleanup; /* not directory, OK */ + if (empty_ok && is_empty_dir(dirpath.buf)) + goto cleanup; + + /* + * The given path is a directory, but this should not + * be treated as "in the way" if one of this + * directory's ancestors is a symlink. Check for this + * case. + */ + while (1) { + char *slash = strrchr(dirpath.buf, '/'); + + if (!slash) { + ret = 1; + goto cleanup; + } + *slash = '\0'; + if (!lstat(dirpath.buf, &st) && S_ISLNK(st.st_mode)) + goto cleanup; + } + } + +cleanup: strbuf_release(&dirpath); - return check_working_copy && !lstat(path, &st) && S_ISDIR(st.st_mode) && - !(empty_ok && is_empty_dir(path)); + return ret; } /* diff --git a/t/t3030-merge-recursive.sh b/t/t3030-merge-recursive.sh index ff641b348a..dfd617a845 100755 --- a/t/t3030-merge-recursive.sh +++ b/t/t3030-merge-recursive.sh @@ -452,6 +452,33 @@ test_expect_success 'merge-recursive d/f conflict result' ' ' +test_expect_success 'dir in working tree with symlink ancestor does not produce d/f conflict' ' + git init sym && + ( + cd sym && + ln -s . foo && + mkdir bar && + touch bar/file && + git add foo bar/file && + git commit -m "foo symlink" && + + git checkout -b branch1 && + git commit --allow-empty -m "empty commit" && + + git checkout master && + git rm foo && + mkdir foo && + touch foo/bar && + git add foo/bar && + git commit -m "replace foo symlink with real foo dir and foo/bar file" && + + git checkout branch1 && + + # Exercise to make sure that this works without errors + git cherry-pick master + ) +' + test_expect_success 'reset and 3-way merge' ' git reset --hard "$c2" &&
When the working tree has: - foo (symlink) - foo/bar (directory) and the user merges a commit that deletes the foo symlink and instead contains: - foo (directory) - foo/bar (file) the merge should happen without requiring user intervention. However, this does not happen. In merge_trees(), process_entry() will be invoked first for "foo/bar", then "foo" (in reverse lexicographical order). process_entry() correctly reaches "Case B: Added in one", but dir_in_way() states that "bar" is already present as a directory, causing a directory/file conflict at the wrong point. Instead, teach dir_in_way() that directories under symlinks are not "in the way", so that symlinks are treated as regular files instead of directories containing other directories and files. Thus, the "else" branch will be followed instead: "foo/bar" will be added to the working tree, make_room_for_path() being indirectly called to unlink the "foo" symlink (just like if "foo" were a regular file instead). When process_entry() is subsequently invoked for "foo", process_entry() will reach "Case A: Deleted in one", and will handle it as "Add/delete" or "Modify/delete" appropriately (including reinstatement of the previously unlinked symlink with a new unique filename if necessary, again, just like if "foo" were a regular file instead). Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- Thanks to Elijah for his help. Some of the commit message is based on his explanation [1]. I'm finding this relatively complicated, so I'm sending this as RFC. My main concern is that whether all callers of dir_in_way() are OK with its behavior change, and if yes, how to explain it. I suspect that this is correct because dir_in_way() should behave consistently for all its callers, but I might be wrong. The other thing is whether the commit message is clear enough. In particular, whether it needs more detail (I didn't mention the index, for example), or whether it should be more concise (for example, I thought of just stating that we treat symlinks as regular files and this would be backed up by the fact that I've updated the only part of merge-recursive.c that calls lstat() and then S_ISDIR). [1] https://public-inbox.org/git/CABPp-BHpXWF+1hKUTfn8s-y4MJZXz+jUVS_K10eKyD6PGwo=gg@mail.gmail.com/ --- merge-recursive.c | 40 +++++++++++++++++++++++++++++++++----- t/t3030-merge-recursive.sh | 27 +++++++++++++++++++++++++ 2 files changed, 62 insertions(+), 5 deletions(-)