Message ID | b379b8fc61af8a8c39ff8b73aae03ad4999a456c.1629841904.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Sparse index: delete ignored files outside sparse cone | expand |
On Tue, Aug 24, 2021 at 2:51 PM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Derrick Stolee <dstolee@microsoft.com> > > When updating the cache tree in convert_to_sparse(), the > WRITE_TREE_MISSING_OK flag indicates that trees might be computed that > do not already exist within the object database. Okay. > This happens in cases > such as 'git add' creating new trees that it wants to store in > anticipation of a following 'git commit'. This doesn't make any sense to me. Does 'git add' call convert_to_sparse()? I don't see why it would; wouldn't the calls to convert_to_sparse() come via sparse-checkout init/set commands? If I'm correct on that, and 'git add' wants to create new trees, then by the time convert_to_sparse() is called in some subsequent git operation, then convert_to_sparse would already have the trees it needs. I thought the reason you would need this is someone modified and staged a change to a file underneath a directory that will be sparsified away; at the time of convert_to_sparse(), a tree object may not have yet been written for the new tree with the newly modified file (because those tend to be written at commit time), but you'd need it at the time you sparsified. > If this flag is not specified, > then it might trigger a promisor fetch or a failure due to the object > not existing locally. Good point. > Use WRITE_TREE_MISSING_OK during convert_to_sparse() to avoid these > possible reasons for the cache_tree_update() to fail. > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > sparse-index.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/sparse-index.c b/sparse-index.c > index d9b07695953..880c5f72338 100644 > --- a/sparse-index.c > +++ b/sparse-index.c > @@ -181,8 +181,11 @@ int convert_to_sparse(struct index_state *istate) > /* > * Silently return if there is a problem with the cache tree update, > * which might just be due to a conflict state in some entry. > + * > + * This might create new tree objects, so be sure to use > + * WRITE_TREE_MISSING_OK. > */ > - if (cache_tree_update(istate, 0)) > + if (cache_tree_update(istate, WRITE_TREE_MISSING_OK)) > return 0; > > remove_fsmonitor(istate); > -- > gitgitgadget
On 8/27/2021 5:33 PM, Elijah Newren wrote: > On Tue, Aug 24, 2021 at 2:51 PM Derrick Stolee via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> >> From: Derrick Stolee <dstolee@microsoft.com> >> >> When updating the cache tree in convert_to_sparse(), the >> WRITE_TREE_MISSING_OK flag indicates that trees might be computed that >> do not already exist within the object database. > > Okay. > >> This happens in cases >> such as 'git add' creating new trees that it wants to store in >> anticipation of a following 'git commit'. > > This doesn't make any sense to me. Does 'git add' call > convert_to_sparse()? I don't see why it would; wouldn't the calls to > convert_to_sparse() come via sparse-checkout init/set commands? If > I'm correct on that, and 'git add' wants to create new trees, then by > the time convert_to_sparse() is called in some subsequent git > operation, then convert_to_sparse would already have the trees it > needs. If someone adds a change outside the sparse-checkout cone, then the index is expanded in-memory, then is converted to sparse when the index is written again. > I thought the reason you would need this is someone modified and > staged a change to a file underneath a directory that will be > sparsified away; at the time of convert_to_sparse(), a tree object may > not have yet been written for the new tree with the newly modified > file (because those tend to be written at commit time), but you'd need > it at the time you sparsified. Yes. I think we are trying to say the same thing. Thanks, -Stolee
On Mon, Aug 30, 2021 at 6:19 AM Derrick Stolee <stolee@gmail.com> wrote: > > On 8/27/2021 5:33 PM, Elijah Newren wrote: > > On Tue, Aug 24, 2021 at 2:51 PM Derrick Stolee via GitGitGadget > > <gitgitgadget@gmail.com> wrote: > >> > >> From: Derrick Stolee <dstolee@microsoft.com> > >> > >> When updating the cache tree in convert_to_sparse(), the > >> WRITE_TREE_MISSING_OK flag indicates that trees might be computed that > >> do not already exist within the object database. > > > > Okay. > > > >> This happens in cases > >> such as 'git add' creating new trees that it wants to store in > >> anticipation of a following 'git commit'. > > > > This doesn't make any sense to me. Does 'git add' call > > convert_to_sparse()? I don't see why it would; wouldn't the calls to > > convert_to_sparse() come via sparse-checkout init/set commands? If > > I'm correct on that, and 'git add' wants to create new trees, then by > > the time convert_to_sparse() is called in some subsequent git > > operation, then convert_to_sparse would already have the trees it > > needs. > > If someone adds a change outside the sparse-checkout cone, then the > index is expanded in-memory, then is converted to sparse when the > index is written again. Ah, I missed that convert_to_sparse() is called from write_index(), and that we are dealing with a single operation that does a sparse->full (in memory)->sparse roundtrip. Thanks for clearing that up. > > I thought the reason you would need this is someone modified and > > staged a change to a file underneath a directory that will be > > sparsified away; at the time of convert_to_sparse(), a tree object may > > not have yet been written for the new tree with the newly modified > > file (because those tend to be written at commit time), but you'd need > > it at the time you sparsified. > > Yes. I think we are trying to say the same thing. While writing this, I was thinking in terms of the `git add` being done when the index was full (both in memory and on disk), and then a later `git sparse-checkout ...` command would invoke the convert_to_sparse(). I didn't fully catch the nuances here. Thanks for bringing me up to speed.
diff --git a/sparse-index.c b/sparse-index.c index d9b07695953..880c5f72338 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -181,8 +181,11 @@ int convert_to_sparse(struct index_state *istate) /* * Silently return if there is a problem with the cache tree update, * which might just be due to a conflict state in some entry. + * + * This might create new tree objects, so be sure to use + * WRITE_TREE_MISSING_OK. */ - if (cache_tree_update(istate, 0)) + if (cache_tree_update(istate, WRITE_TREE_MISSING_OK)) return 0; remove_fsmonitor(istate);