Message ID | 142df1ab8e3e8ea5e213f2477e271e60e5b62f84.1615929436.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Sparse Index: API protections | expand |
On Tue, Mar 16, 2021 at 2:17 PM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Derrick Stolee <dstolee@microsoft.com> > > Before iterating over all cache entries, ensure that a sparse index is > expanded to a full index to avoid unexpected behavior. > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > builtin/add.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/builtin/add.c b/builtin/add.c > index ea762a41e3a2..ab2ef4a63530 100644 > --- a/builtin/add.c > +++ b/builtin/add.c > @@ -141,6 +141,7 @@ static int renormalize_tracked_files(const struct pathspec *pathspec, int flags) > { > int i, retval = 0; > > + ensure_full_index(&the_index); > for (i = 0; i < active_nr; i++) { > struct cache_entry *ce = active_cache[i]; I'm not so sure about this one; from git-add(1): --renormalize Apply the "clean" process freshly to all tracked files to forcibly add them again to the index. This is useful after changing core.autocrlf configuration or the text attribute in order to correct files added with wrong CRLF/LF line endings. This option implies -u. ... to "all tracked files". Should that be interpreted as all files within the sparsity paths, especially considering that we're trying to enable folks to work within the set of files of interest to them? If it doesn't imply that, wouldn't users want an extra option to be able to behave that way? And if we add an option, why are we adding a special option for people wanting to behave sparsely in a sparse-index-cone-mode-sparse-checkout rather than for the special case of folks wanting to behave densely in such a setup? The code below already has the following check below: if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode)) continue; /* do not touch non blobs */ suggesting that it'd correctly skip over the sparse directories, so I think this one just isn't needed. However, on a tangent, that S_ISLNK looks rather suspicious to me. Why would we renormalize symlinks? I mean, sure, symlinks aren't likely to have CRLF/LF characters in them, but if they did, wouldn't it be wrong to renormalize them?
On Wed, Mar 17, 2021 at 2:36 PM Elijah Newren <newren@gmail.com> wrote: > > On Tue, Mar 16, 2021 at 2:17 PM Derrick Stolee via GitGitGadget > <gitgitgadget@gmail.com> wrote: > > > > From: Derrick Stolee <dstolee@microsoft.com> > > > > Before iterating over all cache entries, ensure that a sparse index is > > expanded to a full index to avoid unexpected behavior. > > > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > > --- > > builtin/add.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/builtin/add.c b/builtin/add.c > > index ea762a41e3a2..ab2ef4a63530 100644 > > --- a/builtin/add.c > > +++ b/builtin/add.c > > @@ -141,6 +141,7 @@ static int renormalize_tracked_files(const struct pathspec *pathspec, int flags) > > { > > int i, retval = 0; > > > > + ensure_full_index(&the_index); > > for (i = 0; i < active_nr; i++) { > > struct cache_entry *ce = active_cache[i]; > > I'm not so sure about this one; from git-add(1): > > --renormalize > Apply the "clean" process freshly to all tracked files to forcibly > add them again to the index. This is useful after changing > core.autocrlf configuration or the text attribute in order to > correct files added with wrong CRLF/LF line endings. This option > implies -u. > > ... to "all tracked files". Should that be interpreted as all files > within the sparsity paths, especially considering that we're trying to > enable folks to work within the set of files of interest to them? I had the same question when working on the add/rm sparse-checkout series. As seen at [1], --renormalize and --chmod are, currently, the only options in git-add that do not honor the sparsity patterns and do update SKIP_WORKTREE paths too. But is this by design or possibly just an oversight? In my series I ended up making both options honor the sparsity rules, with a warning when requested to update any sparse path. (If we are going with this approach, perhaps should we also amend the docs to remove any ambiguity as for what "all tracked files" means in this case?) > The code below already has the following check below: > > if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode)) > continue; /* do not touch non blobs */ > > suggesting that it'd correctly skip over the sparse directories, so I > think this one just isn't needed. But if sparse index is not enabled, it does not skip over the sparse files, right? So isn't the ensure_full_index() call here (and in chmod_pathspec()) important to be consistent with the case when sparse index is not in use? Or maybe Stolee could rebase this on top of my series, where both these places already skip the sparse paths? (Assuming that's the behavior we are looking for. If not, I can also fix my series.) [1]: https://lore.kernel.org/git/d93c3f51465a3e409819db63bd81802e7ef20ea9.1615588108.git.matheus.bernardino@usp.br/
On 3/17/2021 4:35 PM, Matheus Tavares Bernardino wrote: > On Wed, Mar 17, 2021 at 2:36 PM Elijah Newren <newren@gmail.com> wrote: >> >> On Tue, Mar 16, 2021 at 2:17 PM Derrick Stolee via GitGitGadget >> <gitgitgadget@gmail.com> wrote:>> I'm not so sure about this one; from git-add(1): >> >> --renormalize >> Apply the "clean" process freshly to all tracked files to forcibly >> add them again to the index. This is useful after changing >> core.autocrlf configuration or the text attribute in order to >> correct files added with wrong CRLF/LF line endings. This option >> implies -u. >> >> ... to "all tracked files". Should that be interpreted as all files >> within the sparsity paths, especially considering that we're trying to >> enable folks to work within the set of files of interest to them? I wrote in reply to another similar comment that I'm being overly cautious in creating these protections. When I can come back later with careful tests, we can ensure that everything behaves properly. > I had the same question when working on the add/rm sparse-checkout > series. As seen at [1], --renormalize and --chmod are, currently, the > only options in git-add that do not honor the sparsity patterns and do > update SKIP_WORKTREE paths too. > > But is this by design or possibly just an oversight? In my series I > ended up making both options honor the sparsity rules, with a warning > when requested to update any sparse path. (If we are going with this > approach, perhaps should we also amend the docs to remove any > ambiguity as for what "all tracked files" means in this case?) I'd be happy to see a resolution here, and it can happen in parallel to what I'm doing here. >> The code below already has the following check below: >> >> if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode)) >> continue; /* do not touch non blobs */ >> >> suggesting that it'd correctly skip over the sparse directories, so I >> think this one just isn't needed. > > But if sparse index is not enabled, it does not skip over the sparse > files, right? So isn't the ensure_full_index() call here (and in > chmod_pathspec()) important to be consistent with the case when sparse > index is not in use? The tests I am adding in t1092-sparse-checkout-compatibility.sh are focused on ensuring the same behavior across sparse-checkouts with and without the sparse-index, and also a full checkout (when possible). Since the sparse-index requires cone-mode sparse-checkout (currently), and the sparse files to be within a directory (or no index change happens), the tests you created in t3705-add-sparse-checkout.sh don't seem to apply. I'll need to find some important scenarios to duplicate in t1092 without the full depth of t3705. > Or maybe Stolee could rebase this on top of my > series, where both these places already skip the sparse paths? > (Assuming that's the behavior we are looking for. If not, I can also> fix my series.) I think they can proceed in parallel and then continue more carefully in the future. The series _after_ this one makes 'git status' and 'git add' work with the sparse-index, so at that point we will be removing these protections at the right time, with tests. In addition, those tests will ensure that we don't expand the sparse-index unless absolutely necessary. If that work collides with your approach, then I'll rebase onto a merge of that topic and this one. > [1]: https://lore.kernel.org/git/d93c3f51465a3e409819db63bd81802e7ef20ea9.1615588108.git.matheus.bernardino@usp.br/ (I will go take a look over here. I've been ignoring the thread for too long.) Thanks, -Stolee
diff --git a/builtin/add.c b/builtin/add.c index ea762a41e3a2..ab2ef4a63530 100644 --- a/builtin/add.c +++ b/builtin/add.c @@ -141,6 +141,7 @@ static int renormalize_tracked_files(const struct pathspec *pathspec, int flags) { int i, retval = 0; + ensure_full_index(&the_index); for (i = 0; i < active_nr; i++) { struct cache_entry *ce = active_cache[i];