diff mbox series

[v4,04/10] sparse-index: use WRITE_TREE_MISSING_OK

Message ID b379b8fc61af8a8c39ff8b73aae03ad4999a456c.1629841904.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Sparse index: delete ignored files outside sparse cone | expand

Commit Message

Derrick Stolee Aug. 24, 2021, 9:51 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

When updating the cache tree in convert_to_sparse(), the
WRITE_TREE_MISSING_OK flag indicates that trees might be computed that
do not already exist within the object database. This happens in cases
such as 'git add' creating new trees that it wants to store in
anticipation of a following 'git commit'. If this flag is not specified,
then it might trigger a promisor fetch or a failure due to the object
not existing locally.

Use WRITE_TREE_MISSING_OK during convert_to_sparse() to avoid these
possible reasons for the cache_tree_update() to fail.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sparse-index.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Elijah Newren Aug. 27, 2021, 9:33 p.m. UTC | #1
On Tue, Aug 24, 2021 at 2:51 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> When updating the cache tree in convert_to_sparse(), the
> WRITE_TREE_MISSING_OK flag indicates that trees might be computed that
> do not already exist within the object database.

Okay.

> This happens in cases
> such as 'git add' creating new trees that it wants to store in
> anticipation of a following 'git commit'.

This doesn't make any sense to me.  Does 'git add' call
convert_to_sparse()?  I don't see why it would; wouldn't the calls to
convert_to_sparse() come via sparse-checkout init/set commands?  If
I'm correct on that, and 'git add' wants to create new trees, then by
the time convert_to_sparse() is called in some subsequent git
operation, then convert_to_sparse would already have the trees it
needs.


I thought the reason you would need this is someone modified and
staged a change to a file underneath a directory that will be
sparsified away; at the time of convert_to_sparse(), a tree object may
not have yet been written for the new tree with the newly modified
file (because those tend to be written at commit time), but you'd need
it at the time you sparsified.

> If this flag is not specified,
> then it might trigger a promisor fetch or a failure due to the object
> not existing locally.

Good point.

> Use WRITE_TREE_MISSING_OK during convert_to_sparse() to avoid these
> possible reasons for the cache_tree_update() to fail.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  sparse-index.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/sparse-index.c b/sparse-index.c
> index d9b07695953..880c5f72338 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -181,8 +181,11 @@ int convert_to_sparse(struct index_state *istate)
>         /*
>          * Silently return if there is a problem with the cache tree update,
>          * which might just be due to a conflict state in some entry.
> +        *
> +        * This might create new tree objects, so be sure to use
> +        * WRITE_TREE_MISSING_OK.
>          */
> -       if (cache_tree_update(istate, 0))
> +       if (cache_tree_update(istate, WRITE_TREE_MISSING_OK))
>                 return 0;
>
>         remove_fsmonitor(istate);
> --
> gitgitgadget
Derrick Stolee Aug. 30, 2021, 1:19 p.m. UTC | #2
On 8/27/2021 5:33 PM, Elijah Newren wrote:
> On Tue, Aug 24, 2021 at 2:51 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> When updating the cache tree in convert_to_sparse(), the
>> WRITE_TREE_MISSING_OK flag indicates that trees might be computed that
>> do not already exist within the object database.
> 
> Okay.
> 
>> This happens in cases
>> such as 'git add' creating new trees that it wants to store in
>> anticipation of a following 'git commit'.
> 
> This doesn't make any sense to me.  Does 'git add' call
> convert_to_sparse()?  I don't see why it would; wouldn't the calls to
> convert_to_sparse() come via sparse-checkout init/set commands?  If
> I'm correct on that, and 'git add' wants to create new trees, then by
> the time convert_to_sparse() is called in some subsequent git
> operation, then convert_to_sparse would already have the trees it
> needs.

If someone adds a change outside the sparse-checkout cone, then the
index is expanded in-memory, then is converted to sparse when the
index is written again.

> I thought the reason you would need this is someone modified and
> staged a change to a file underneath a directory that will be
> sparsified away; at the time of convert_to_sparse(), a tree object may
> not have yet been written for the new tree with the newly modified
> file (because those tend to be written at commit time), but you'd need
> it at the time you sparsified.

Yes. I think we are trying to say the same thing.

Thanks,
-Stolee
Elijah Newren Aug. 30, 2021, 8:08 p.m. UTC | #3
On Mon, Aug 30, 2021 at 6:19 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 8/27/2021 5:33 PM, Elijah Newren wrote:
> > On Tue, Aug 24, 2021 at 2:51 PM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >>
> >> From: Derrick Stolee <dstolee@microsoft.com>
> >>
> >> When updating the cache tree in convert_to_sparse(), the
> >> WRITE_TREE_MISSING_OK flag indicates that trees might be computed that
> >> do not already exist within the object database.
> >
> > Okay.
> >
> >> This happens in cases
> >> such as 'git add' creating new trees that it wants to store in
> >> anticipation of a following 'git commit'.
> >
> > This doesn't make any sense to me.  Does 'git add' call
> > convert_to_sparse()?  I don't see why it would; wouldn't the calls to
> > convert_to_sparse() come via sparse-checkout init/set commands?  If
> > I'm correct on that, and 'git add' wants to create new trees, then by
> > the time convert_to_sparse() is called in some subsequent git
> > operation, then convert_to_sparse would already have the trees it
> > needs.
>
> If someone adds a change outside the sparse-checkout cone, then the
> index is expanded in-memory, then is converted to sparse when the
> index is written again.

Ah, I missed that convert_to_sparse() is called from write_index(),
and that we are dealing with a single operation that does a
sparse->full (in memory)->sparse roundtrip.

Thanks for clearing that up.

> > I thought the reason you would need this is someone modified and
> > staged a change to a file underneath a directory that will be
> > sparsified away; at the time of convert_to_sparse(), a tree object may
> > not have yet been written for the new tree with the newly modified
> > file (because those tend to be written at commit time), but you'd need
> > it at the time you sparsified.
>
> Yes. I think we are trying to say the same thing.

While writing this, I was thinking in terms of the `git add` being
done when the index was full (both in memory and on disk), and then a
later `git sparse-checkout ...` command would invoke the
convert_to_sparse().  I didn't fully catch the nuances here.

Thanks for bringing me up to speed.
diff mbox series

Patch

diff --git a/sparse-index.c b/sparse-index.c
index d9b07695953..880c5f72338 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -181,8 +181,11 @@  int convert_to_sparse(struct index_state *istate)
 	/*
 	 * Silently return if there is a problem with the cache tree update,
 	 * which might just be due to a conflict state in some entry.
+	 *
+	 * This might create new tree objects, so be sure to use
+	 * WRITE_TREE_MISSING_OK.
 	 */
-	if (cache_tree_update(istate, 0))
+	if (cache_tree_update(istate, WRITE_TREE_MISSING_OK))
 		return 0;
 
 	remove_fsmonitor(istate);