Message ID | c9e100e68f80196a35a37b5d0aad74e8e1174766.1629206603.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Sparse index: delete ignored files outside sparse cone | expand |
Hi Stolee, On Tue, 17 Aug 2021, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <dstolee@microsoft.com> > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > attr.c | 14 ++++++++++++++ > 1 file changed, 14 insertions(+) > > diff --git a/attr.c b/attr.c > index d029e681f28..a1009f78029 100644 > --- a/attr.c > +++ b/attr.c > @@ -14,6 +14,7 @@ > #include "utf8.h" > #include "quote.h" > #include "thread-utils.h" > +#include "dir.h" > > const char git_attr__true[] = "(builtin)true"; > const char git_attr__false[] = "\0(builtin)false"; > @@ -744,6 +745,19 @@ static struct attr_stack *read_attr_from_index(struct index_state *istate, > if (!istate) > return NULL; > > + /* > + * In the case of cone-mode sparse-checkout, getting the > + * .gitattributes file from a directory is meaningless: all > + * contained paths will be sparse if the .gitattributes is also > + * sparse. In the case of a sparse index, it is critical that we > + * don't go looking for one as it will expand the index. > + */ > + init_sparse_checkout_patterns(istate); At first I thought that `init_sparse_checkout_patterns()` is called by `path_in_sparse_checkout()` below, and therefore would not be necessary. But it is! Without it, we have no way to test whether `use_cone_patterns` is set, because, well, it gets set by `init_sparse_checkout_patterns()`. Would it therefore make sense to refactor the code to have a `path_in_sparse_checkout_cone()` function? Or add a flag `only_in_cone_mode` as function parameter to `path_in_sparse_checkout()`? Ciao, Dscho > + if (istate->sparse_checkout_patterns && > + istate->sparse_checkout_patterns->use_cone_patterns && > + path_in_sparse_checkout(path, istate) == NOT_MATCHED) > + return NULL; > + > buf = read_blob_data_from_index(istate, path, NULL); > if (!buf) > return NULL; > -- > gitgitgadget > >
On Tue, Aug 17, 2021 at 6:23 AM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Derrick Stolee <dstolee@microsoft.com> > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > attr.c | 14 ++++++++++++++ > 1 file changed, 14 insertions(+) > > diff --git a/attr.c b/attr.c > index d029e681f28..a1009f78029 100644 > --- a/attr.c > +++ b/attr.c > @@ -14,6 +14,7 @@ > #include "utf8.h" > #include "quote.h" > #include "thread-utils.h" > +#include "dir.h" > > const char git_attr__true[] = "(builtin)true"; > const char git_attr__false[] = "\0(builtin)false"; > @@ -744,6 +745,19 @@ static struct attr_stack *read_attr_from_index(struct index_state *istate, > if (!istate) > return NULL; > > + /* > + * In the case of cone-mode sparse-checkout, getting the > + * .gitattributes file from a directory is meaningless: all > + * contained paths will be sparse if the .gitattributes is also > + * sparse. In the case of a sparse index, it is critical that we > + * don't go looking for one as it will expand the index. > + */ "all contained paths will be sparse if the .gitattributes is also sparse"? Do you mean something more like "the .gitattributes only applies for files under the given directory, and if the directory is sparse, then neither the .gitattributes file or any other file under that directory will be present" ? Also, out of curiosity, I was suggesting in the past we do something like this for .gitignore files, for the same reason. Do we have such logic in place, or is that another area of the code that hasn't been handled yet? > + init_sparse_checkout_patterns(istate); > + if (istate->sparse_checkout_patterns && > + istate->sparse_checkout_patterns->use_cone_patterns && > + path_in_sparse_checkout(path, istate) == NOT_MATCHED) > + return NULL; > + > buf = read_blob_data_from_index(istate, path, NULL); > if (!buf) > return NULL; > -- > gitgitgadget Though the code appears correct, I too am curious about the questions Dscho asked about the code in this patch.
On 8/19/2021 4:11 AM, Johannes Schindelin wrote: > On Tue, 17 Aug 2021, Derrick Stolee via GitGitGadget wrote: >> + /* >> + * In the case of cone-mode sparse-checkout, getting the >> + * .gitattributes file from a directory is meaningless: all >> + * contained paths will be sparse if the .gitattributes is also >> + * sparse. In the case of a sparse index, it is critical that we >> + * don't go looking for one as it will expand the index. >> + */ >> + init_sparse_checkout_patterns(istate); > At first I thought that `init_sparse_checkout_patterns()` is called by > `path_in_sparse_checkout()` below, and therefore would not be necessary. > > But it is! Without it, we have no way to test whether `use_cone_patterns` > is set, because, well, it gets set by `init_sparse_checkout_patterns()`. > > Would it therefore make sense to refactor the code to have a > `path_in_sparse_checkout_cone()` function? Or add a flag > `only_in_cone_mode` as function parameter to `path_in_sparse_checkout()`? One way to clean this up as-is would be to replace >> + if (istate->sparse_checkout_patterns && >> + istate->sparse_checkout_patterns->use_cone_patterns && >> + path_in_sparse_checkout(path, istate) == NOT_MATCHED) with if (!path_in_sparse_checkout(path, istate) && istate->sparse_checkout_patterns->use_cone_patterns) to get a similar effect. However, it creates wasted pattern-match checks when not in cone-mode, so you are probably right that a path_in_cone_mode_sparse_checkout() would be best to short-circuit in the non-cone-mode case. This special casing should be rare, so I don't think it is worth adding a flag to path_in_sparse_checkout() and making those call sites more complicated. Thanks, -Stolee
On 8/19/2021 4:53 PM, Elijah Newren wrote: > On Tue, Aug 17, 2021 at 6:23 AM Derrick Stolee via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> + /* >> + * In the case of cone-mode sparse-checkout, getting the >> + * .gitattributes file from a directory is meaningless: all >> + * contained paths will be sparse if the .gitattributes is also >> + * sparse. In the case of a sparse index, it is critical that we >> + * don't go looking for one as it will expand the index. >> + */ > > "all contained paths will be sparse if the .gitattributes is also sparse"? > > Do you mean something more like "the .gitattributes only applies for > files under the given directory, and if the directory is sparse, then > neither the .gitattributes file or any other file under that directory > will be present" ? Yes, you understand correctly and explain it better. Thanks. > Also, out of curiosity, I was suggesting in the past we do something > like this for .gitignore files, for the same reason. Do we have such > logic in place, or is that another area of the code that hasn't been > handled yet? I don't believe this has been handled. It definitely is less obvious what to do there, because the point of .gitignore is to skip files that exist in the working tree even if Git didn't put them there. Meanwhile, .gitattributes is about how Git writes tracked files, but Git doesn't write sparse tracked files. > Though the code appears correct, I too am curious about the questions > Dscho asked about the code in this patch. Thanks, -Stolee
On Fri, Aug 20, 2021 at 8:39 AM Derrick Stolee <stolee@gmail.com> wrote: > > On 8/19/2021 4:53 PM, Elijah Newren wrote: > > On Tue, Aug 17, 2021 at 6:23 AM Derrick Stolee via GitGitGadget > > <gitgitgadget@gmail.com> wrote: > >> + /* > >> + * In the case of cone-mode sparse-checkout, getting the > >> + * .gitattributes file from a directory is meaningless: all > >> + * contained paths will be sparse if the .gitattributes is also > >> + * sparse. In the case of a sparse index, it is critical that we > >> + * don't go looking for one as it will expand the index. > >> + */ > > > > "all contained paths will be sparse if the .gitattributes is also sparse"? > > > > Do you mean something more like "the .gitattributes only applies for > > files under the given directory, and if the directory is sparse, then > > neither the .gitattributes file or any other file under that directory > > will be present" ? > > Yes, you understand correctly and explain it better. Thanks. > > > Also, out of curiosity, I was suggesting in the past we do something > > like this for .gitignore files, for the same reason. Do we have such > > logic in place, or is that another area of the code that hasn't been > > handled yet? > > I don't believe this has been handled. It definitely is less obvious > what to do there, because the point of .gitignore is to skip files that > exist in the working tree even if Git didn't put them there. Meanwhile, > .gitattributes is about how Git writes tracked files, but Git doesn't > write sparse tracked files. Well, one advantage of deleting ignored files in sparse directories when we sparsify, is that we know if any files are left, they are all untracked and not ignored. So we don't need to load the .gitignore file for those sparse directories. Sure, there's a small edge case of users adding new untracked files that would have matched the .gitignore file, but it's also clear that they removed the .gitignore file when they sparsified, so I don't see a problem in reporting it as untracked while that directory was as-sparsified-away-as-possible.
diff --git a/attr.c b/attr.c index d029e681f28..a1009f78029 100644 --- a/attr.c +++ b/attr.c @@ -14,6 +14,7 @@ #include "utf8.h" #include "quote.h" #include "thread-utils.h" +#include "dir.h" const char git_attr__true[] = "(builtin)true"; const char git_attr__false[] = "\0(builtin)false"; @@ -744,6 +745,19 @@ static struct attr_stack *read_attr_from_index(struct index_state *istate, if (!istate) return NULL; + /* + * In the case of cone-mode sparse-checkout, getting the + * .gitattributes file from a directory is meaningless: all + * contained paths will be sparse if the .gitattributes is also + * sparse. In the case of a sparse index, it is critical that we + * don't go looking for one as it will expand the index. + */ + init_sparse_checkout_patterns(istate); + if (istate->sparse_checkout_patterns && + istate->sparse_checkout_patterns->use_cone_patterns && + path_in_sparse_checkout(path, istate) == NOT_MATCHED) + return NULL; + buf = read_blob_data_from_index(istate, path, NULL); if (!buf) return NULL;