diff mbox series

[05/27] add: ensure full index

Message ID 142df1ab8e3e8ea5e213f2477e271e60e5b62f84.1615929436.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Sparse Index: API protections | expand

Commit Message

Derrick Stolee March 16, 2021, 9:16 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

Before iterating over all cache entries, ensure that a sparse index is
expanded to a full index to avoid unexpected behavior.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/add.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Elijah Newren March 17, 2021, 5:35 p.m. UTC | #1
On Tue, Mar 16, 2021 at 2:17 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Before iterating over all cache entries, ensure that a sparse index is
> expanded to a full index to avoid unexpected behavior.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/add.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/builtin/add.c b/builtin/add.c
> index ea762a41e3a2..ab2ef4a63530 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -141,6 +141,7 @@ static int renormalize_tracked_files(const struct pathspec *pathspec, int flags)
>  {
>         int i, retval = 0;
>
> +       ensure_full_index(&the_index);
>         for (i = 0; i < active_nr; i++) {
>                 struct cache_entry *ce = active_cache[i];

I'm not so sure about this one; from git-add(1):

       --renormalize
           Apply the "clean" process freshly to all tracked files to forcibly
           add them again to the index. This is useful after changing
           core.autocrlf configuration or the text attribute in order to
           correct files added with wrong CRLF/LF line endings. This option
           implies -u.

... to "all tracked files".  Should that be interpreted as all files
within the sparsity paths, especially considering that we're trying to
enable folks to work within the set of files of interest to them?  If
it doesn't imply that, wouldn't users want an extra option to be able
to behave that way?  And if we add an option, why are we adding a
special option for people wanting to behave sparsely in a
sparse-index-cone-mode-sparse-checkout rather than for the special
case of folks wanting to behave densely in such a setup?

The code below already has the following check below:

    if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode))
      continue; /* do not touch non blobs */

suggesting that it'd correctly skip over the sparse directories, so I
think this one just isn't needed.

However, on a tangent, that S_ISLNK looks rather suspicious to me.
Why would we renormalize symlinks?  I mean, sure, symlinks aren't
likely to have CRLF/LF characters in them, but if they did, wouldn't
it be wrong to renormalize them?
Matheus Tavares March 17, 2021, 8:35 p.m. UTC | #2
On Wed, Mar 17, 2021 at 2:36 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Tue, Mar 16, 2021 at 2:17 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: Derrick Stolee <dstolee@microsoft.com>
> >
> > Before iterating over all cache entries, ensure that a sparse index is
> > expanded to a full index to avoid unexpected behavior.
> >
> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> > ---
> >  builtin/add.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/builtin/add.c b/builtin/add.c
> > index ea762a41e3a2..ab2ef4a63530 100644
> > --- a/builtin/add.c
> > +++ b/builtin/add.c
> > @@ -141,6 +141,7 @@ static int renormalize_tracked_files(const struct pathspec *pathspec, int flags)
> >  {
> >         int i, retval = 0;
> >
> > +       ensure_full_index(&the_index);
> >         for (i = 0; i < active_nr; i++) {
> >                 struct cache_entry *ce = active_cache[i];
>
> I'm not so sure about this one; from git-add(1):
>
>        --renormalize
>            Apply the "clean" process freshly to all tracked files to forcibly
>            add them again to the index. This is useful after changing
>            core.autocrlf configuration or the text attribute in order to
>            correct files added with wrong CRLF/LF line endings. This option
>            implies -u.
>
> ... to "all tracked files".  Should that be interpreted as all files
> within the sparsity paths, especially considering that we're trying to
> enable folks to work within the set of files of interest to them?

I had the same question when working on the add/rm sparse-checkout
series. As seen at [1], --renormalize and --chmod are, currently, the
only options in git-add that do not honor the sparsity patterns and do
update SKIP_WORKTREE paths too.

But is this by design or possibly just an oversight? In my series I
ended up making both options honor the sparsity rules, with a warning
when requested to update any sparse path. (If we are going with this
approach, perhaps should we also amend the docs to remove any
ambiguity as for what "all tracked files" means in this case?)

> The code below already has the following check below:
>
>     if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode))
>       continue; /* do not touch non blobs */
>
> suggesting that it'd correctly skip over the sparse directories, so I
> think this one just isn't needed.

But if sparse index is not enabled, it does not skip over the sparse
files, right? So isn't the ensure_full_index() call here (and in
chmod_pathspec()) important to be consistent with the case when sparse
index is not in use? Or maybe Stolee could rebase this on top of my
series, where both these places already skip the sparse paths?
(Assuming that's the behavior we are looking for. If not, I can also
fix my series.)

[1]: https://lore.kernel.org/git/d93c3f51465a3e409819db63bd81802e7ef20ea9.1615588108.git.matheus.bernardino@usp.br/
Derrick Stolee March 17, 2021, 8:55 p.m. UTC | #3
On 3/17/2021 4:35 PM, Matheus Tavares Bernardino wrote:
> On Wed, Mar 17, 2021 at 2:36 PM Elijah Newren <newren@gmail.com> wrote:
>>
>> On Tue, Mar 16, 2021 at 2:17 PM Derrick Stolee via GitGitGadget
>> <gitgitgadget@gmail.com> wrote:>> I'm not so sure about this one; from git-add(1):
>>
>>        --renormalize
>>            Apply the "clean" process freshly to all tracked files to forcibly
>>            add them again to the index. This is useful after changing
>>            core.autocrlf configuration or the text attribute in order to
>>            correct files added with wrong CRLF/LF line endings. This option
>>            implies -u.
>>
>> ... to "all tracked files".  Should that be interpreted as all files
>> within the sparsity paths, especially considering that we're trying to
>> enable folks to work within the set of files of interest to them?

I wrote in reply to another similar comment that I'm being overly
cautious in creating these protections. When I can come back later
with careful tests, we can ensure that everything behaves properly.

> I had the same question when working on the add/rm sparse-checkout
> series. As seen at [1], --renormalize and --chmod are, currently, the
> only options in git-add that do not honor the sparsity patterns and do
> update SKIP_WORKTREE paths too.
> 
> But is this by design or possibly just an oversight? In my series I
> ended up making both options honor the sparsity rules, with a warning
> when requested to update any sparse path. (If we are going with this
> approach, perhaps should we also amend the docs to remove any
> ambiguity as for what "all tracked files" means in this case?)

I'd be happy to see a resolution here, and it can happen in parallel
to what I'm doing here.

>> The code below already has the following check below:
>>
>>     if (!S_ISREG(ce->ce_mode) && !S_ISLNK(ce->ce_mode))
>>       continue; /* do not touch non blobs */
>>
>> suggesting that it'd correctly skip over the sparse directories, so I
>> think this one just isn't needed.
> 
> But if sparse index is not enabled, it does not skip over the sparse
> files, right? So isn't the ensure_full_index() call here (and in
> chmod_pathspec()) important to be consistent with the case when sparse
> index is not in use?

The tests I am adding in t1092-sparse-checkout-compatibility.sh are
focused on ensuring the same behavior across sparse-checkouts with
and without the sparse-index, and also a full checkout (when possible).

Since the sparse-index requires cone-mode sparse-checkout (currently),
and the sparse files to be within a directory (or no index change
happens), the tests you created in t3705-add-sparse-checkout.sh don't
seem to apply. I'll need to find some important scenarios to duplicate
in t1092 without the full depth of t3705.

> Or maybe Stolee could rebase this on top of my
> series, where both these places already skip the sparse paths?
> (Assuming that's the behavior we are looking for. If not, I can also> fix my series.)

I think they can proceed in parallel and then continue more carefully
in the future. The series _after_ this one makes 'git status' and
'git add' work with the sparse-index, so at that point we will be
removing these protections at the right time, with tests. In addition,
those tests will ensure that we don't expand the sparse-index unless
absolutely necessary.

If that work collides with your approach, then I'll rebase onto a
merge of that topic and this one.

> [1]: https://lore.kernel.org/git/d93c3f51465a3e409819db63bd81802e7ef20ea9.1615588108.git.matheus.bernardino@usp.br/
 
(I will go take a look over here. I've been ignoring the thread for
too long.)

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/builtin/add.c b/builtin/add.c
index ea762a41e3a2..ab2ef4a63530 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -141,6 +141,7 @@  static int renormalize_tracked_files(const struct pathspec *pathspec, int flags)
 {
 	int i, retval = 0;
 
+	ensure_full_index(&the_index);
 	for (i = 0; i < active_nr; i++) {
 		struct cache_entry *ce = active_cache[i];