Message ID | 20230907170119.1536694-1-sokcevic@google.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2] diff-lib: Fix check_removed when fsmonitor is on | expand |
Josip Sokcevic <sokcevic@google.com> writes: > git-diff-index may return incorrect deleted entries when fsmonitor is used in a > repository with git submodules. This can be observed on Mac machines, but it > can affect all other supported platforms too. > > If fsmonitor is used, `stat *st` may not be initialized. Since `lstat` calls > aren't not desired when fsmonitor is on, skip the entire gitlink check using > the same condition used to initialize `stat *st`. I think this paragraph is outdated - you'll need to update it to match the code in this version. > diff --git a/diff-lib.c b/diff-lib.c > index d8aa777a73..664613bb1b 100644 > --- a/diff-lib.c > +++ b/diff-lib.c > @@ -39,11 +39,22 @@ > static int check_removed(const struct index_state *istate, const struct cache_entry *ce, struct stat *st) > { > assert(is_fsmonitor_refreshed(istate)); > - if (!(ce->ce_flags & CE_FSMONITOR_VALID) && lstat(ce->name, st) < 0) { > - if (!is_missing_file_error(errno)) > - return -1; > - return 1; > + if (ce->ce_flags & CE_FSMONITOR_VALID) { > + /* > + * Both check_removed() and its callers expect lstat() to have > + * happened and, in particular, the st_mode field to be set. > + * Simulate this with the contents of ce. > + */ > + memset(st, 0, sizeof(*st)); > + st->st_mode = ce->ce_mode; > + } else { > + if (lstat(ce->name, st) < 0) { > + if (!is_missing_file_error(errno)) > + return -1; > + return 1; > + } > } > + > if (has_symlink_leading_path(ce->name, ce_namelen(ce))) > return 1; > if (S_ISDIR(st->st_mode)) { I'm on the fence about whether the extra newline is necessary - the "if" did get bigger, but the code is clear enough without the newline. I lean towards removing it, to avoid cluttering the diff.
Josip Sokcevic <sokcevic@google.com> writes: > diff --git a/diff-lib.c b/diff-lib.c > index d8aa777a73..664613bb1b 100644 > --- a/diff-lib.c > +++ b/diff-lib.c > @@ -39,11 +39,22 @@ > static int check_removed(const struct index_state *istate, const struct cache_entry *ce, struct stat *st) > { > assert(is_fsmonitor_refreshed(istate)); Not a problem this patch introduces, but doesn't this call path diff_cache() -> unpack_trees() -> oneway_diff() -> do_oneway_diff() -> show_new_file(), show_modified() -> get_stat_data() -> check_removed() violate the assertion? If so, perhaps we should rewrite it into a more explicit "if (...) BUG(...)" that is not compiled away. > - if (!(ce->ce_flags & CE_FSMONITOR_VALID) && lstat(ce->name, st) < 0) { > - if (!is_missing_file_error(errno)) > - return -1; > - return 1; > + if (ce->ce_flags & CE_FSMONITOR_VALID) { > + /* > + * Both check_removed() and its callers expect lstat() to have > + * happened and, in particular, the st_mode field to be set. > + * Simulate this with the contents of ce. > + */ > + memset(st, 0, sizeof(*st)); It is true that the original, when CE_FSMONITOR_VALID bit is set, bypasses lstat() altogether and leaves the contents of st completely uninitialized, but this is still way too insufficient, isn't it? There are three call sites of the check_removed() function. * The first one in run_diff_files() only cares about st.st_mode and other members of the structure are not looked at. This makes readers wonder if the "st" parameter to check_removed() should become "mode_t *st_mode" to clarify this point, but the primary thing I want to say is that this caller will not mind if we leave other members of st bogus (like 0-bit filled) as long as the mode is set correctly. * The second one in run_diff_files() passes the resulting &st to match_stat_with_submodule(), which in turn passes it to ie_match_stat(), which cares about "struct stat" members that are used for quick change detection, like owner, group, mtime. Giving it a bogus st will most likely cause it to report a change. * The third one is in get_stat_data(). This also uses the &st to call match_stat_with_submodule(), so it is still totally broken to give it a bogus st, the same way as the second caller above. > + st->st_mode = ce->ce_mode; Does this work correctly when the cache entry points at a gitlink, which uses 0160000 that is not a valid st_mode? I think you'd want to use a reverse function of create_ce_mode(). > + } else { > + if (lstat(ce->name, st) < 0) { > + if (!is_missing_file_error(errno)) > + return -1; > + return 1; > + } > } At this point, if FSMONITOR_VALID bit is not set, we will always perform lstat() and get all the members of st populated properly, which is a definite improvement. While I think this does not make it worse (it is an existing bug that the code is broken for a ce with the CE_FSMONITOR_VALID bit set), we may want to leave a note that we _know_ the code after this patch is still broken. "Simulate this with ..." -> "Just setting st_mode is still insufficient and will break majority of callers". It may make sense, until we clean it up, to disable the check for the FSMONITOR_VALID bit in this codepath and always perform lstat(). Optimization matters, but computing quickly in order to return an incorrect result is optimizing for a wrong thing. I dunno. Thanks.
On Thu, Sep 7, 2023 at 11:07 AM Junio C Hamano <gitster@pobox.com> wrote: > > Josip Sokcevic <sokcevic@google.com> writes: > > > diff --git a/diff-lib.c b/diff-lib.c > > index d8aa777a73..664613bb1b 100644 > > --- a/diff-lib.c > > +++ b/diff-lib.c > > @@ -39,11 +39,22 @@ > > static int check_removed(const struct index_state *istate, const struct cache_entry *ce, struct stat *st) > > { > > assert(is_fsmonitor_refreshed(istate)); > > Not a problem this patch introduces, but doesn't this call path > > diff_cache() > -> unpack_trees() > -> oneway_diff() > -> do_oneway_diff() > -> show_new_file(), show_modified() > -> get_stat_data() > -> check_removed() > > violate the assertion? If so, perhaps we should rewrite it into a > more explicit "if (...) BUG(...)" that is not compiled away. True, I will update it. > > > - if (!(ce->ce_flags & CE_FSMONITOR_VALID) && lstat(ce->name, st) < 0) { > > - if (!is_missing_file_error(errno)) > > - return -1; > > - return 1; > > + if (ce->ce_flags & CE_FSMONITOR_VALID) { > > + /* > > + * Both check_removed() and its callers expect lstat() to have > > + * happened and, in particular, the st_mode field to be set. > > + * Simulate this with the contents of ce. > > + */ > > + memset(st, 0, sizeof(*st)); > > It is true that the original, when CE_FSMONITOR_VALID bit is set, > bypasses lstat() altogether and leaves the contents of st completely > uninitialized, but this is still way too insufficient, isn't it? > > There are three call sites of the check_removed() function. > > * The first one in run_diff_files() only cares about st.st_mode and > other members of the structure are not looked at. This makes > readers wonder if the "st" parameter to check_removed() should > become "mode_t *st_mode" to clarify this point, but the primary > thing I want to say is that this caller will not mind if we leave > other members of st bogus (like 0-bit filled) as long as the mode > is set correctly. > > * The second one in run_diff_files() passes the resulting &st to > match_stat_with_submodule(), which in turn passes it to > ie_match_stat(), which cares about "struct stat" members that are > used for quick change detection, like owner, group, mtime. > Giving it a bogus st will most likely cause it to report a > change. > > * The third one is in get_stat_data(). This also uses the &st to > call match_stat_with_submodule(), so it is still totally broken > to give it a bogus st, the same way as the second caller above. > > > + st->st_mode = ce->ce_mode; > > Does this work correctly when the cache entry points at a gitlink, > which uses 0160000 that is not a valid st_mode? I think you'd want > to use a reverse function of create_ce_mode(). I realized this too but after I sent the patch. I don't think we have a good way to reverse it, so the best we can do is to guess it. But I don't think we should do that. Instead, should we just zero the struct and add a TODO? Alternative could be to use a double pointer for stat and do more checks in call sites, but I'm not familiar with the codebase to how that branching would need to look like. Any preference?
Josip Sokcevic <sokcevic@google.com> writes: >> > @@ -39,11 +39,22 @@ >> > static int check_removed(const struct index_state *istate, const struct cache_entry *ce, struct stat *st) >> > { >> > assert(is_fsmonitor_refreshed(istate)); >> >> Not a problem this patch introduces, but doesn't this call path >> >> diff_cache() >> -> unpack_trees() >> -> oneway_diff() >> -> do_oneway_diff() >> -> show_new_file(), show_modified() >> -> get_stat_data() >> -> check_removed() >> >> violate the assertion? If so, perhaps we should rewrite it into a >> more explicit "if (...) BUG(...)" that is not compiled away. > > True, I will update it. Not as a part of this series, though. I think this came from the same series that were merged at 858119f6 (Merge branch 'nk/diff-index-fsmonitor', 2021-03-24), so if we just simply do a moral "revert" of the series, then it should disappear. More importantly, you'd want to ask help from folks who are more familiar with fsmonitor, which would take some time, and fixing this assertion can be backburnered while you are fixing the original issue that got you started here. I see JeffH is already Cc:ed, which is good. >> > - if (!(ce->ce_flags & CE_FSMONITOR_VALID) && lstat(ce->name, st) < 0) { >> > - if (!is_missing_file_error(errno)) >> > - return -1; >> > - return 1; >> > + if (ce->ce_flags & CE_FSMONITOR_VALID) { >> > + /* >> > + * Both check_removed() and its callers expect lstat() to have >> > + * happened and, in particular, the st_mode field to be set. >> > + * Simulate this with the contents of ce. >> > + */ >> > + memset(st, 0, sizeof(*st)); >> >> It is true that the original, when CE_FSMONITOR_VALID bit is set, >> bypasses lstat() altogether and leaves the contents of st completely >> uninitialized, but this is still way too insufficient, isn't it? >> >> There are three call sites of the check_removed() function. >> >> * The first one in run_diff_files() only cares about st.st_mode and >> other members of the structure are not looked at. This makes >> readers wonder if the "st" parameter to check_removed() should >> become "mode_t *st_mode" to clarify this point, but the primary >> thing I want to say is that this caller will not mind if we leave >> other members of st bogus (like 0-bit filled) as long as the mode >> is set correctly. >> >> * The second one in run_diff_files() passes the resulting &st to >> match_stat_with_submodule(), which in turn passes it to >> ie_match_stat(), which cares about "struct stat" members that are >> used for quick change detection, like owner, group, mtime. >> Giving it a bogus st will most likely cause it to report a >> change. >> >> * The third one is in get_stat_data(). This also uses the &st to >> call match_stat_with_submodule(), so it is still totally broken >> to give it a bogus st, the same way as the second caller above. >> >> > + st->st_mode = ce->ce_mode; >> >> Does this work correctly when the cache entry points at a gitlink, >> which uses 0160000 that is not a valid st_mode? I think you'd want >> to use a reverse function of create_ce_mode(). > > I realized this too but after I sent the patch. I don't think we have > a good way to reverse it, so the best we can do is to guess it. But I > don't think we should do that. Instead, should we just zero the struct > and add a TODO? Alternative could be to use a double pointer for stat > and do more checks in call sites, but I'm not familiar with the > codebase to how that branching would need to look like. > > Any preference? My preference in the short term actually is to teach this codepath to ignore CE_FSMONITOR_VALID bit and always run lstat() instead. That will unbreak this "optimization" that allowed the code to do a wrong thing quickly. The solution suitable for a bit longer term becomes rather obvious when we step back a bit and think about the purpose of this block inside "if ce_flags says CE_FSMONITOR_VALID is true, do this". The fsmonitor has already checked and knows that the path has not been updated, i.e. what we have in the in-core index still matches what we would get from running lstat(). So now what we need to do is to pretend as if we called lstat(), recreating the data in "struct stat" from "struct cache_entry" well enough to fool ie_match_stat() into saying that the cached stat data in ce still matches what is in st. Can we do that? Of course we can. When we added or refreshed the cache entry, we should have had "struct stat" taken out of lstat() for the path, and copied some data from the members of "struct stat" into "struct cache_entry". After all, that is how we can later ask ie_match_stat() if the cached stat data in the "struct cache_entry" matches the current "struct stat" data to tell if the path hasn't been touched since we last refreshed. We want to have a helper function that we can ask: With the data in this 'struct cache_entry', please populate this 'struct stat'. Do so well enough that ie_match_stat() says "ok, the path is unchanged". You can write such a function by studying two code paths. * ie_match_stat() and its helpers to find out which members of "struct stat" matters (i.e. compared against corresponding members in "struct cache_entry"). * add_to_index() and its helpers, fill_stat_cache_info() in particular, to see which members of "struct stat" are used to populate ce->ce_stat_data and how (e.g. some members lose precision and that is find---the reverse operation will also be lossy and the resulting "struct stat" you will forge will *not* exactly match what you would obtain from real lstat(), but it is good enough to fool ie_match_stat() because you'll be comparing against an existing ce that was populated in a lossy way). Then, replace the above partial assignment to st with a call to such a helper function, which should live in statinfo.c next to fill_stat_data() and whose function signature should look like void fill_stat_from_stat_data(struct stat *st, struct stat_data *sd); You'd probably need to define two macros to update nsec part of c/mtime members that is conditional near where ST_CTIME_NSEC() macro is defined to write that reverse function of fill_stat_data(), perhaps along the lines of ... #ifdef NO_NSEC #define FILL_ST_CTIME_NSEC(st, nsec) do { ; /* noop */ } while (0) #else #ifdef USE_ST_TIMESPEC #define FILL_ST_CTIME_NSEC(st, nsec) (st)->st_ctimespec.tv_nsec = (nsec) #else #define FILL_ST_CTIME_NSEC(st, nsec) (st)->st_ctim.tv_nsec = (nsec) #endif #endif
diff --git a/diff-lib.c b/diff-lib.c index d8aa777a73..664613bb1b 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -39,11 +39,22 @@ static int check_removed(const struct index_state *istate, const struct cache_entry *ce, struct stat *st) { assert(is_fsmonitor_refreshed(istate)); - if (!(ce->ce_flags & CE_FSMONITOR_VALID) && lstat(ce->name, st) < 0) { - if (!is_missing_file_error(errno)) - return -1; - return 1; + if (ce->ce_flags & CE_FSMONITOR_VALID) { + /* + * Both check_removed() and its callers expect lstat() to have + * happened and, in particular, the st_mode field to be set. + * Simulate this with the contents of ce. + */ + memset(st, 0, sizeof(*st)); + st->st_mode = ce->ce_mode; + } else { + if (lstat(ce->name, st) < 0) { + if (!is_missing_file_error(errno)) + return -1; + return 1; + } } + if (has_symlink_leading_path(ce->name, ce_namelen(ce))) return 1; if (S_ISDIR(st->st_mode)) { diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh index 0c241d6c14..78503158fd 100755 --- a/t/t7527-builtin-fsmonitor.sh +++ b/t/t7527-builtin-fsmonitor.sh @@ -809,6 +809,11 @@ my_match_and_clean () { status --porcelain=v2 >actual.without && test_cmp actual.with actual.without && + git -C super --no-optional-locks diff-index --name-status HEAD >actual.with && + git -C super --no-optional-locks -c core.fsmonitor=false \ + diff-index --name-status HEAD >actual.without && + test_cmp actual.with actual.without && + git -C super/dir_1/dir_2/sub reset --hard && git -C super/dir_1/dir_2/sub clean -d -f }
git-diff-index may return incorrect deleted entries when fsmonitor is used in a repository with git submodules. This can be observed on Mac machines, but it can affect all other supported platforms too. If fsmonitor is used, `stat *st` may not be initialized. Since `lstat` calls aren't not desired when fsmonitor is on, skip the entire gitlink check using the same condition used to initialize `stat *st`. Signed-off-by: Josip Sokcevic <sokcevic@google.com> --- diff-lib.c | 19 +++++++++++++++---- t/t7527-builtin-fsmonitor.sh | 5 +++++ 2 files changed, 20 insertions(+), 4 deletions(-)