Message ID | 20230911170901.49050-2-sokcevic@google.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 6a044a20480a8ef56f7ddb8142f660ca01a3391e |
Headers | show |
Series | [v3] diff-lib: Fix check_removed when fsmonitor is on | expand |
Josip Sokcevic <sokcevic@google.com> writes: > git-diff-index may return incorrect deleted entries when fsmonitor is used in a > repository with git submodules. This can be observed on Mac machines, but it > can affect all other supported platforms too. > > If fsmonitor is used, `stat *st` is not initialized if cache_entry has > CE_FSMONITOR_VALID set. But, there are three call sites that rely on stat > afterwards, which can result in incorrect results. > > This change partially reverts commit 4f3d6d0. > > Signed-off-by: Josip Sokcevic <sokcevic@google.com> > --- > diff-lib.c | 12 ++++++------ > t/t7527-builtin-fsmonitor.sh | 5 +++++ > 2 files changed, 11 insertions(+), 6 deletions(-) This certainly is more "complete" if simpler than the previous one ;-) In the longer term, we would probably want to enable optimization using what fsmonitor knows, but as we have seen in the review on the previous round, this code needs a bit more work than the original we are reverting here to get it right, and in the shorter term, hopefully this would do. Thanks. > diff --git a/diff-lib.c b/diff-lib.c > index d8aa777a73..5848e4f9ca 100644 > --- a/diff-lib.c > +++ b/diff-lib.c > @@ -36,14 +36,14 @@ > * exists for ce that is a submodule -- it is a submodule that is not > * checked out). Return negative for an error. > */ > -static int check_removed(const struct index_state *istate, const struct cache_entry *ce, struct stat *st) > +static int check_removed(const struct cache_entry *ce, struct stat *st) > { > - assert(is_fsmonitor_refreshed(istate)); > - if (!(ce->ce_flags & CE_FSMONITOR_VALID) && lstat(ce->name, st) < 0) { > + if (lstat(ce->name, st) < 0) { > if (!is_missing_file_error(errno)) > return -1; > return 1; > } > + > if (has_symlink_leading_path(ce->name, ce_namelen(ce))) > return 1; > if (S_ISDIR(st->st_mode)) { > @@ -149,7 +149,7 @@ void run_diff_files(struct rev_info *revs, unsigned int option) > memset(&(dpath->parent[0]), 0, > sizeof(struct combine_diff_parent)*5); > > - changed = check_removed(istate, ce, &st); > + changed = check_removed(ce, &st); > if (!changed) > wt_mode = ce_mode_from_stat(ce, st.st_mode); > else { > @@ -229,7 +229,7 @@ void run_diff_files(struct rev_info *revs, unsigned int option) > } else { > struct stat st; > > - changed = check_removed(istate, ce, &st); > + changed = check_removed(ce, &st); > if (changed) { > if (changed < 0) { > perror(ce->name); > @@ -303,7 +303,7 @@ static int get_stat_data(const struct index_state *istate, > if (!cached && !ce_uptodate(ce)) { > int changed; > struct stat st; > - changed = check_removed(istate, ce, &st); > + changed = check_removed(ce, &st); > if (changed < 0) > return -1; > else if (changed) { > diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh > index 0c241d6c14..78503158fd 100755 > --- a/t/t7527-builtin-fsmonitor.sh > +++ b/t/t7527-builtin-fsmonitor.sh > @@ -809,6 +809,11 @@ my_match_and_clean () { > status --porcelain=v2 >actual.without && > test_cmp actual.with actual.without && > > + git -C super --no-optional-locks diff-index --name-status HEAD >actual.with && > + git -C super --no-optional-locks -c core.fsmonitor=false \ > + diff-index --name-status HEAD >actual.without && > + test_cmp actual.with actual.without && > + > git -C super/dir_1/dir_2/sub reset --hard && > git -C super/dir_1/dir_2/sub clean -d -f > }
On Mon, Sep 11, 2023 at 3:53 PM Junio C Hamano <gitster@pobox.com> wrote: > This certainly is more "complete" if simpler than the previous one > ;-) > > In the longer term, we would probably want to enable optimization > using what fsmonitor knows, but as we have seen in the review on > the previous round, this code needs a bit more work than the > original we are reverting here to get it right, and in the shorter > term, hopefully this would do. Yes, I agree we should optimize this in a follow up. One thing I'm not sure about is if we should try to construct `struct stat` using `cache_index`, or we should check for `CE_FSMONITOR_VALID` in a way that `stat` would no longer be needed for those code paths.
Josip Sokcevic <sokcevic@google.com> writes: > Yes, I agree we should optimize this in a follow up. One thing I'm not > sure about is if we should try to construct `struct stat` using > `cache_index`, or we should check for `CE_FSMONITOR_VALID` in a way > that `stat` would no longer be needed for those code paths. Good point. It seems to be entirely doable, even though the stench from the abstraction layer violation may be horrible. ie_match_stat(), which is called by match_stat_with_submodule(), does pay attention to CE_FSMONITOR_VALID bit, so none of the members of struct stat matters when the bit is set. But the bit is not set, all members that fill_stat_data() and ce_match_stat_basic() care about do matter. Other code that follows callers of check_removed() do care about at least .st_mode member, and I suspect that in the current code .st_mode is the only member that gets looked at. So after all, I think your original "fix" was correct, but it took us some time to figure out why it was, which means we would want to explain it in the log message for developers who would want to touch the same area in the future. Thanks.
On Tue, Sep 12, 2023 at 10:07 AM Junio C Hamano <gitster@pobox.com> wrote: > It seems to be entirely doable, even though the stench from the > abstraction layer violation may be horrible. > > ie_match_stat(), which is called by match_stat_with_submodule(), > does pay attention to CE_FSMONITOR_VALID bit, so none of the members > of struct stat matters when the bit is set. But the bit is not set, > all members that fill_stat_data() and ce_match_stat_basic() care > about do matter. Other code that follows callers of check_removed() > do care about at least .st_mode member, and I suspect that in the > current code .st_mode is the only member that gets looked at. > > So after all, I think your original "fix" was correct, but it took > us some time to figure out why it was, which means we would want to > explain it in the log message for developers who would want to touch > the same area in the future. I finished testing this - after my original fix and after running all tests, I can confirm that `st_mode` of `struct stat` is indeed not consumed if CE_FSMONITOR_VALID is set. But, it's fragile and likely to cause problems in the future. Your approach of constructing `struct stat` based on `struct cache_entry` is the way to go. I see you created a new set of patches in a separate thread, so I'll start those tests and report back there. Thanks!
Josip Sokcevic <sokcevic@google.com> writes: > I see you created a new set of patches in a separate thread, so I'll > start those tests and report back there. Thanks.
diff --git a/diff-lib.c b/diff-lib.c index d8aa777a73..5848e4f9ca 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -36,14 +36,14 @@ * exists for ce that is a submodule -- it is a submodule that is not * checked out). Return negative for an error. */ -static int check_removed(const struct index_state *istate, const struct cache_entry *ce, struct stat *st) +static int check_removed(const struct cache_entry *ce, struct stat *st) { - assert(is_fsmonitor_refreshed(istate)); - if (!(ce->ce_flags & CE_FSMONITOR_VALID) && lstat(ce->name, st) < 0) { + if (lstat(ce->name, st) < 0) { if (!is_missing_file_error(errno)) return -1; return 1; } + if (has_symlink_leading_path(ce->name, ce_namelen(ce))) return 1; if (S_ISDIR(st->st_mode)) { @@ -149,7 +149,7 @@ void run_diff_files(struct rev_info *revs, unsigned int option) memset(&(dpath->parent[0]), 0, sizeof(struct combine_diff_parent)*5); - changed = check_removed(istate, ce, &st); + changed = check_removed(ce, &st); if (!changed) wt_mode = ce_mode_from_stat(ce, st.st_mode); else { @@ -229,7 +229,7 @@ void run_diff_files(struct rev_info *revs, unsigned int option) } else { struct stat st; - changed = check_removed(istate, ce, &st); + changed = check_removed(ce, &st); if (changed) { if (changed < 0) { perror(ce->name); @@ -303,7 +303,7 @@ static int get_stat_data(const struct index_state *istate, if (!cached && !ce_uptodate(ce)) { int changed; struct stat st; - changed = check_removed(istate, ce, &st); + changed = check_removed(ce, &st); if (changed < 0) return -1; else if (changed) { diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh index 0c241d6c14..78503158fd 100755 --- a/t/t7527-builtin-fsmonitor.sh +++ b/t/t7527-builtin-fsmonitor.sh @@ -809,6 +809,11 @@ my_match_and_clean () { status --porcelain=v2 >actual.without && test_cmp actual.with actual.without && + git -C super --no-optional-locks diff-index --name-status HEAD >actual.with && + git -C super --no-optional-locks -c core.fsmonitor=false \ + diff-index --name-status HEAD >actual.without && + test_cmp actual.with actual.without && + git -C super/dir_1/dir_2/sub reset --hard && git -C super/dir_1/dir_2/sub clean -d -f }
git-diff-index may return incorrect deleted entries when fsmonitor is used in a repository with git submodules. This can be observed on Mac machines, but it can affect all other supported platforms too. If fsmonitor is used, `stat *st` is not initialized if cache_entry has CE_FSMONITOR_VALID set. But, there are three call sites that rely on stat afterwards, which can result in incorrect results. This change partially reverts commit 4f3d6d0. Signed-off-by: Josip Sokcevic <sokcevic@google.com> --- diff-lib.c | 12 ++++++------ t/t7527-builtin-fsmonitor.sh | 5 +++++ 2 files changed, 11 insertions(+), 6 deletions(-)