Message ID | b2df2455871dc5c94ec804e9c0ce935c19da335f.1628618950.git.jonathantanmy@google.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | In grep, no adding submodule ODB as alternates | expand |
On Tue, Aug 10, 2021 at 11:28:44AM -0700, Jonathan Tan wrote: > > Record the repository whenever an OID grep source is created, and teach > the worker threads to explicitly provide the repository when accessing > objects. Since this extra information is not used by this series at all, does it make sense to only include this patch with the series covering the rest of your partial-clone-with-submodules work? - Emily
On Tue, Aug 10, 2021 at 3:29 PM Jonathan Tan <jonathantanmy@google.com> wrote: > > Record the repository whenever an OID grep source is created, and teach > the worker threads to explicitly provide the repository when accessing > objects. This patch fixes the second NEEDSWORK comment at grep_submodule(), right? Maybe this comment could be replaced with a mention that add_submodule_odb_by_path() is called for testing purposes, and it should no longer produce a real addition to the alternates list? > diff --git a/grep.h b/grep.h > index f4a3090f1c..c5234f9b38 100644 > --- a/grep.h > +++ b/grep.h > @@ -187,6 +187,7 @@ struct grep_source { > GREP_SOURCE_BUF, > } type; > void *identifier; > + struct repository *repo; /* if GREP_SOURCE_OID */ Hmm, the grep threads now have two `struct repository` references, one in `struct grep_source` and one in `struct grep_opt` (see builtin/grep.c:run()). The one from `struct grep_opt` will always be `the_repository` (in the worker threads). I wonder if, in the long run, we could instruct the worker threads to use the new reference from `struct grep_source` throughout grep.c, and perhaps even remove the one from `struct grep_opt`. This would solve the issue I mentioned in [1], about git grep currently ignoring the textconv attributes from submodules, and instead applying the ones from the superproject in the submodules' files. (Here there is an example of this bug: [2] .) It would also allow grep to use the textconv cache from each submodule, instead of saving/reading everything from the superproject's textconv cache. While this doesn't happen, though, could we perhaps add a comment somewhere to avoid any confusion regarding the two different repository pointers that the worker threads hold? [1]: https://lore.kernel.org/git/CAHd-oW5iEQarYVxEXoTG-ua2zdoybTrSjCBKtO0YT292fm0NQQ@mail.gmail.com/ [2]: https://gitlab.com/-/snippets/1896951 > char *buf; > unsigned long size; > @@ -198,7 +199,8 @@ struct grep_source { > void grep_source_init_file(struct grep_source *gs, const char *name, > const char *path); > void grep_source_init_oid(struct grep_source *gs, const char *name, > - const char *path, const struct object_id *oid); > + const char *path, const struct object_id *oid, > + struct repository *repo); > void grep_source_init_buf(struct grep_source *gs); > void grep_source_clear_data(struct grep_source *gs); > void grep_source_clear(struct grep_source *gs); > -- > 2.33.0.rc1.237.g0d66db33f3-goog >
> On Tue, Aug 10, 2021 at 11:28:44AM -0700, Jonathan Tan wrote: > > > > Record the repository whenever an OID grep source is created, and teach > > the worker threads to explicitly provide the repository when accessing > > objects. > > Since this extra information is not used by this series at all, does it > make sense to only include this patch with the series covering the rest > of your partial-clone-with-submodules work? It is used by this series - it prevents an instance in which the submodule ODBs would be lazily registered as alternates. I'll add a note in the commit message to this effect.
> On Tue, Aug 10, 2021 at 3:29 PM Jonathan Tan <jonathantanmy@google.com> wrote: > > > > Record the repository whenever an OID grep source is created, and teach > > the worker threads to explicitly provide the repository when accessing > > objects. > > This patch fixes the second NEEDSWORK comment at grep_submodule(), > right? Maybe this comment could be replaced with a mention that > add_submodule_odb_by_path() is called for testing purposes, and it > should no longer produce a real addition to the alternates list? Good catch. I'll update the comment. > > diff --git a/grep.h b/grep.h > > index f4a3090f1c..c5234f9b38 100644 > > --- a/grep.h > > +++ b/grep.h > > @@ -187,6 +187,7 @@ struct grep_source { > > GREP_SOURCE_BUF, > > } type; > > void *identifier; > > + struct repository *repo; /* if GREP_SOURCE_OID */ > > Hmm, the grep threads now have two `struct repository` references, one > in `struct grep_source` and one in `struct grep_opt` (see > builtin/grep.c:run()). The one from `struct grep_opt` will always be > `the_repository` (in the worker threads). I wonder if, in the long > run, we could instruct the worker threads to use the new reference > from `struct grep_source` throughout grep.c, and perhaps even remove > the one from `struct grep_opt`. > > This would solve the issue I mentioned in [1], about git grep > currently ignoring the textconv attributes from submodules, and > instead applying the ones from the superproject in the submodules' > files. (Here there is an example of this bug: [2] .) > > It would also allow grep to use the textconv cache from each > submodule, instead of saving/reading everything from the > superproject's textconv cache. > > While this doesn't happen, though, could we perhaps add a comment > somewhere to avoid any confusion regarding the two different > repository pointers that the worker threads hold? > > [1]: https://lore.kernel.org/git/CAHd-oW5iEQarYVxEXoTG-ua2zdoybTrSjCBKtO0YT292fm0NQQ@mail.gmail.com/ > [2]: https://gitlab.com/-/snippets/1896951 Ah...another good catch. Thanks also for a link to your email - I'll mention it when I add the comment you suggested.
diff --git a/builtin/grep.c b/builtin/grep.c index 5a40e18e47..ea6df6dca4 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -347,7 +347,7 @@ static int grep_oid(struct grep_opt *opt, const struct object_id *oid, struct grep_source gs; grep_source_name(opt, filename, tree_name_len, &pathbuf); - grep_source_init_oid(&gs, pathbuf.buf, path, oid); + grep_source_init_oid(&gs, pathbuf.buf, path, oid, opt->repo); strbuf_release(&pathbuf); if (num_threads > 1) { diff --git a/grep.c b/grep.c index ba3711dc56..14c677b4ae 100644 --- a/grep.c +++ b/grep.c @@ -1853,7 +1853,8 @@ void grep_source_init_file(struct grep_source *gs, const char *name, } void grep_source_init_oid(struct grep_source *gs, const char *name, - const char *path, const struct object_id *oid) + const char *path, const struct object_id *oid, + struct repository *repo) { gs->type = GREP_SOURCE_OID; gs->name = xstrdup_or_null(name); @@ -1862,6 +1863,7 @@ void grep_source_init_oid(struct grep_source *gs, const char *name, gs->size = 0; gs->driver = NULL; gs->identifier = oiddup(oid); + gs->repo = repo; } void grep_source_init_buf(struct grep_source *gs) @@ -1901,7 +1903,8 @@ static int grep_source_load_oid(struct grep_source *gs) { enum object_type type; - gs->buf = read_object_file(gs->identifier, &type, &gs->size); + gs->buf = repo_read_object_file(gs->repo, gs->identifier, &type, + &gs->size); if (!gs->buf) return error(_("'%s': unable to read %s"), gs->name, diff --git a/grep.h b/grep.h index f4a3090f1c..c5234f9b38 100644 --- a/grep.h +++ b/grep.h @@ -187,6 +187,7 @@ struct grep_source { GREP_SOURCE_BUF, } type; void *identifier; + struct repository *repo; /* if GREP_SOURCE_OID */ char *buf; unsigned long size; @@ -198,7 +199,8 @@ struct grep_source { void grep_source_init_file(struct grep_source *gs, const char *name, const char *path); void grep_source_init_oid(struct grep_source *gs, const char *name, - const char *path, const struct object_id *oid); + const char *path, const struct object_id *oid, + struct repository *repo); void grep_source_init_buf(struct grep_source *gs); void grep_source_clear_data(struct grep_source *gs); void grep_source_clear(struct grep_source *gs);
Record the repository whenever an OID grep source is created, and teach the worker threads to explicitly provide the repository when accessing objects. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- builtin/grep.c | 2 +- grep.c | 7 +++++-- grep.h | 4 +++- 3 files changed, 9 insertions(+), 4 deletions(-)