Message ID | cover.1617914011.git.matheus.bernardino@usp.br (mailing list archive) |
---|---|
Headers | show |
Series | add/rm: honor sparse checkout and warn on sparse paths | expand |
On Thu, Apr 8, 2021 at 1:41 PM Matheus Tavares <matheus.bernardino@usp.br> wrote: > > Make `rm` honor sparse checkouts, and make both `rm` and `add` warn > when asked to update sparse entries. > > There are two changes since v3: > > - `test_i18ncmp` and `test_i18ngrep` were replaced by `test_cmp` and > `grep` > > - The flag added in patch 5 now makes refresh_index() completely ignore > skip_worktree entries, instead of just suppressing their matches on > the seen[] array. The previous implementation was not necessarily > wrong but, as Junio pointed out, it was rather odd to keep matching > the entries if we no longer want to use the matches. > > As "side effects", the new version of the flag also makes > refresh_index() refrain from both: > > (1) checking and warning if skip_worktree entries matching the given > pathspec are unmerged. > > (2) marking skip_worktree entries matching the given pathspec with > CE_UPTODATE. > > The change (1) is actually interesting because `git add` doesn't > update skip_worktree entries, and thus, it doesn't make much sense to > warn if they are unmerged. Besides, we will already warn if the user > requests to update such entries, anyway. And finally, unmerged > entries should not have the skip_worktree bit set in the first place. > (`git merge` should clean this bit when writing the new index, and > neither `git sparse-checkout` nor `git update-index` allow to set the > bit on an unmerged entry.) > > Change (2) is perhaps not very beneficial, but it is also not harmful. > The only practical difference we get by not setting the CE_UPTODATE > flag in the skip_worktree entries is that, when writing a new index at > the end of `git add --refresh`, do_write_index() will start checking > if these entries are racy clean. Note that it already does that for > all the skip_worktree entries that do not match the user-given > pathspecs. And, in fact, this behavior distinction based on the > pathspec only happens with `--refresh`. Plain `git add` and other > options don't mark any skip_worktree entry with CE_UPTODATE > (regardless of the pathspecs) and thus, all these entries are checked > when writing the index. So `git add --refresh` will only do what the > other options already do. Sorry for the delay. These two changes sound good to me, and the range-diff looks reasonable. > (Additionally, as I mentioned in [1], there might actually be at least > one advantage in checking if the skip_worktree entries are racy clean. > But this is a very specific case, and it's probably a topic for a > another thread :) > > [1]: https://lore.kernel.org/git/CAHd-oW4kRLjV9Sq3CFt-V1Ot9pYFzJggU1zPp3Hcuw=qWfq7Mg@mail.gmail.com/ This I'm a bit surprised by. I thought the outcome there was that you didn't want to mark skip_worktree entries as CE_UPTODATE in order to force them to be stat'd in the future when someone clears the skip_worktree bit. (And that we'd expect the sparse-checkout add/disable command to be the likely one to handle that, though if users use `git update-index --no-skip-worktree PATH` then they'd also be responsible for any refreshing as well.) Am I misunderstanding?
On Wed, Apr 14, 2021 at 1:36 PM Elijah Newren <newren@gmail.com> wrote: > > On Thu, Apr 8, 2021 at 1:41 PM Matheus Tavares > <matheus.bernardino@usp.br> wrote: > > > > Make `rm` honor sparse checkouts, and make both `rm` and `add` warn > > when asked to update sparse entries. > > > > There are two changes since v3: > > > > - `test_i18ncmp` and `test_i18ngrep` were replaced by `test_cmp` and > > `grep` > > > > - The flag added in patch 5 now makes refresh_index() completely ignore > > skip_worktree entries, instead of just suppressing their matches on > > the seen[] array. The previous implementation was not necessarily > > wrong but, as Junio pointed out, it was rather odd to keep matching > > the entries if we no longer want to use the matches. > > > > As "side effects", the new version of the flag also makes > > refresh_index() refrain from both: > > > > (1) checking and warning if skip_worktree entries matching the given > > pathspec are unmerged. > > > > (2) marking skip_worktree entries matching the given pathspec with > > CE_UPTODATE. > > > > The change (1) is actually interesting because `git add` doesn't > > update skip_worktree entries, and thus, it doesn't make much sense to > > warn if they are unmerged. Besides, we will already warn if the user > > requests to update such entries, anyway. And finally, unmerged > > entries should not have the skip_worktree bit set in the first place. > > (`git merge` should clean this bit when writing the new index, and > > neither `git sparse-checkout` nor `git update-index` allow to set the > > bit on an unmerged entry.) > > > > Change (2) is perhaps not very beneficial, but it is also not harmful. > > The only practical difference we get by not setting the CE_UPTODATE > > flag in the skip_worktree entries is that, when writing a new index at > > the end of `git add --refresh`, do_write_index() will start checking > > if these entries are racy clean. Note that it already does that for > > all the skip_worktree entries that do not match the user-given > > pathspecs. And, in fact, this behavior distinction based on the > > pathspec only happens with `--refresh`. Plain `git add` and other > > options don't mark any skip_worktree entry with CE_UPTODATE > > (regardless of the pathspecs) and thus, all these entries are checked > > when writing the index. So `git add --refresh` will only do what the > > other options already do. > > Sorry for the delay. These two changes sound good to me, and the > range-diff looks reasonable. > > > (Additionally, as I mentioned in [1], there might actually be at least > > one advantage in checking if the skip_worktree entries are racy clean. > > But this is a very specific case, and it's probably a topic for a > > another thread :) > > > > [1]: https://lore.kernel.org/git/CAHd-oW4kRLjV9Sq3CFt-V1Ot9pYFzJggU1zPp3Hcuw=qWfq7Mg@mail.gmail.com/ > > This I'm a bit surprised by. I thought the outcome there was that you > didn't want to mark skip_worktree entries as CE_UPTODATE in order to > force them to be stat'd in the future when someone clears the > skip_worktree bit. Hmm, not exactly. This situation is a bit tricky and I probably got lost when trying to communicate my thoughts about it. In short, the outcome of not marking skip_worktree entries as CE_UPTODATE (which is an in-memory-only flag) is that, when writing the updated index at the end of `git status --refresh` , we now properly detect and mark skip_worktree entries whose associated files are present in the working tree and are modified in relation to the respective blobs. (This whole process is skipped for CE_UPTODATE entries.) This doesn't have any effect while the skip_worktree bit is set. But it makes it possible for a later `git status` to properly show the files as modified when the skip_worktree bit gets unset. If we don't do this, the later `git status` will wrongly think these entries are clean. This is because of the way git detects racily clean entries. Paraphrasing `Documentation/technical/racy-git.txt`, we take two actions to diagnose these entries: 1) When we want to know if an entry is up-to-date: if the entry's timestamp is equal to, or newer than, the index timestamp, we not only compare the cached stat info with the filesystem stat info but we also compare the actual contents. 2) When writing a new index: if the index contains racily clean entries, their `st_size` is truncated to zero. Item 2) is important because, otherwise, the subsequent operations wouldn't be able to detect the racily clean entries using 1) as the index timestamp would have been updated. And that's what happens with skip_worktree entries on `git status --refresh`. We mark them as CE_UPTODATE even if the file exists in the working tree, so we don't check if the cached entry is racily clean, and thus we don't truncate `st_size` to 0, hiding the racily clean entry. With all that said, I think this whole situation must be quite rare and not very important in practice...
Matheus Tavares <matheus.bernardino@usp.br> writes: > Make `rm` honor sparse checkouts, and make both `rm` and `add` warn > when asked to update sparse entries. OK, has the back-and-forth concluded and this is now ready to be merged down to 'next'?
On Fri, Apr 16, 2021 at 2:33 PM Junio C Hamano <gitster@pobox.com> wrote: > > Matheus Tavares <matheus.bernardino@usp.br> writes: > > > Make `rm` honor sparse checkouts, and make both `rm` and `add` warn > > when asked to update sparse entries. > > OK, has the back-and-forth concluded and this is now ready to be > merged down to 'next'? Yes, let's merge it.