Message ID | b70a00b9b06e5142bb95891cfc2faa87d708ef0d.1622580781.git.jonathantanmy@google.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | First steps towards partial clone submodules | expand |
On Tue, Jun 01, 2021 at 02:34:19PM -0700, Jonathan Tan wrote: > This is one step towards supporting partial clone submodules. > > Even after this patch, we will still lack partial clone submodules > support, primarily because a lot of Git code that accesses submodule > objects does so by adding their object stores as alternates, meaning > that any lazy fetches that would occur in the submodule would be done > based on the config of the superproject, not of the submodule. This also > prevents testing of the functionality in this patch by user-facing > commands. So for now, test this mechanism using a test helper. OK. Everything you wrote seemed reasonable to me, but I did have a couple of questions on the test you added: > diff --git a/t/helper/test-partial-clone.c b/t/helper/test-partial-clone.c > new file mode 100644 > index 0000000000..e7bc7eb21f > --- /dev/null > +++ b/t/helper/test-partial-clone.c > @@ -0,0 +1,34 @@ > +#include "cache.h" > +#include "test-tool.h" > +#include "repository.h" > +#include "object-store.h" > + > +static void object_info(const char *gitdir, const char *oid_hex) > +{ > + struct repository r; > + struct object_id oid; > + unsigned long size; > + struct object_info oi = {.sizep = &size}; > + const char *p; > + > + if (repo_init(&r, gitdir, NULL)) > + die("could not init repo"); > + if (parse_oid_hex(oid_hex, &oid, &p)) > + die("could not parse oid"); > + if (oid_object_info_extended(&r, &oid, &oi, 0)) > + die("could not obtain object info"); > + printf("%d\n", (int) size); > +} Hmm. Is there a reason that the same couldn't be implemented by calling "git cat-file -s" from the partial clone? > diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh > index 584a039b85..e804d267e6 100755 > --- a/t/t0410-partial-clone.sh > +++ b/t/t0410-partial-clone.sh > @@ -604,6 +604,30 @@ test_expect_success 'do not fetch when checking existence of tree we construct o > git -C repo cherry-pick side1 > ' > > +test_expect_success 'lazy-fetch when accessing object not in the_repository' ' > + rm -rf full partial.git && > + test_create_repo full && > + printf 12345 >full/file.txt && > + git -C full add file.txt && > + git -C full commit -m "first commit" && This is a stylistic nit, but I think using test_commit is better here for a non-superficial reason. My guess is that you wanted to avoid specifying a message and file (which are required positional arguments to test_commit before you can specify the contents). But I think there are two good reasons to use test_commit here: - It saves three lines of test script here. - You don't have to make the expected size a magic number (i.e., because you knew ahead of time that the contents was "12345"). I probably would have expected this test to end with: git -C full cat-file -s $FILE_HASH >expect && git -C partial.git cat-file -s $FILE_HASH >actual && test_cmp expect actual which reads more clearly to me (although I think the much more important test is that $FILE_HASH doesn't show up in the output of the rev-list --missing=print that is run in the partial clone). > + > + test_config -C full uploadpack.allowfilter 1 && > + test_config -C full uploadpack.allowanysha1inwant 1 && > + git clone --filter=blob:none --bare "file://$(pwd)/full" partial.git && > + FILE_HASH=$(git hash-object --stdin <full/file.txt) && This works for me, although I wouldn't have been sad to see the sub-shell contain "git -C full rev-parse HEAD:file.txt" instead. > + # Sanity check that the file is missing > + git -C partial.git rev-list --objects --missing=print HEAD >out && > + grep "[?]$FILE_HASH" out && > + > + OUT=$(test-tool partial-clone object-info partial.git "$FILE_HASH") && Coming back to my point about the utility of the partial-clone helper, could this be replaced by saying just OUT="$(git -C partial.git cat-file -s "$FILE_HASH")" instead? Thanks, Taylor
On Tue, Jun 1, 2021 at 2:38 PM Jonathan Tan <jonathantanmy@google.com> wrote: > > This is one step towards supporting partial clone submodules. > > Even after this patch, we will still lack partial clone submodules > support, primarily because a lot of Git code that accesses submodule > objects does so by adding their object stores as alternates, meaning > that any lazy fetches that would occur in the submodule would be done > based on the config of the superproject, not of the submodule. This also > prevents testing of the functionality in this patch by user-facing > commands. So for now, test this mechanism using a test helper. > > Signed-off-by: Jonathan Tan <jonathantanmy@google.com> > --- > Makefile | 1 + > object-file.c | 7 ++----- > promisor-remote.c | 14 +++++++++----- > t/helper/test-partial-clone.c | 34 ++++++++++++++++++++++++++++++++++ > t/helper/test-tool.c | 1 + > t/helper/test-tool.h | 1 + > t/t0410-partial-clone.sh | 24 ++++++++++++++++++++++++ > 7 files changed, 72 insertions(+), 10 deletions(-) > create mode 100644 t/helper/test-partial-clone.c > > diff --git a/Makefile b/Makefile > index c3565fc0f8..f6653bcd5e 100644 > --- a/Makefile > +++ b/Makefile > @@ -725,6 +725,7 @@ TEST_BUILTINS_OBJS += test-oidmap.o > TEST_BUILTINS_OBJS += test-online-cpus.o > TEST_BUILTINS_OBJS += test-parse-options.o > TEST_BUILTINS_OBJS += test-parse-pathspec-file.o > +TEST_BUILTINS_OBJS += test-partial-clone.o > TEST_BUILTINS_OBJS += test-path-utils.o > TEST_BUILTINS_OBJS += test-pcre2-config.o > TEST_BUILTINS_OBJS += test-pkt-line.o > diff --git a/object-file.c b/object-file.c > index f233b440b2..ebf273e9e7 100644 > --- a/object-file.c > +++ b/object-file.c > @@ -1570,15 +1570,12 @@ static int do_oid_object_info_extended(struct repository *r, > } > > /* Check if it is a missing object */ > - if (fetch_if_missing && has_promisor_remote() && > - !already_retried && r == the_repository && > + if (fetch_if_missing && repo_has_promisor_remote(r) && > + !already_retried && So here you removed the special check against the_repository while looking for promisor_remotes. There are other such special checks in the code; I also see: diff.c: if (options->repo == the_repository && has_promisor_remote() && diffcore-break.c: if (r == the_repository && has_promisor_remote()) { diffcore-rename.c: if (r == the_repository && has_promisor_remote()) { and a series I'm planning to submit soon will add another to merge.ort.c. Do these need to all be fixed as part of the partial clone submodule support as well? Do I need to change anything about my series? I guess since I'm asking that, I should probably submit it first so you can actually see it and answer my question. (And the timing may be good since the area is fresh in your memory...) > !(flags & OBJECT_INFO_SKIP_FETCH_OBJECT)) { > /* > * TODO Investigate checking promisor_remote_get_direct() > * TODO return value and stopping on error here. > - * TODO Pass a repository struct through > - * promisor_remote_get_direct(), such that arbitrary > - * repositories work. Odd, it appears that when this comment was added (in commit b14ed5adaf ("Use promisor_remote_get_direct() and has_promisor_remote()", 2019-06-25)), a repository was passed to promisor_remote_get_direct(). Sure, it was just a transliteration of the comment that was there before when fetch_objects() was the function being called, but since the code was being changed and the comment being updated, it seems the TODO should have been removed back then. Oh, well, good to update it now at least. > */ > promisor_remote_get_direct(r, real, 1); > already_retried = 1; > diff --git a/promisor-remote.c b/promisor-remote.c > index 5819d2cf28..1601f05d79 100644 > --- a/promisor-remote.c > +++ b/promisor-remote.c > @@ -11,7 +11,8 @@ struct promisor_remote_config { > struct promisor_remote **promisors_tail; > }; > > -static int fetch_objects(const char *remote_name, > +static int fetch_objects(struct repository *repo, > + const char *remote_name, > const struct object_id *oids, > int oid_nr) > { > @@ -21,6 +22,11 @@ static int fetch_objects(const char *remote_name, > > child.git_cmd = 1; > child.in = -1; > + if (repo != the_repository) { > + prepare_other_repo_env(&child.env_array); > + strvec_pushf(&child.env_array, "%s=%s", GIT_DIR_ENVIRONMENT, > + repo->gitdir); > + } > strvec_pushl(&child.args, "-c", "fetch.negotiationAlgorithm=noop", > "fetch", remote_name, "--no-tags", > "--no-write-fetch-head", "--recurse-submodules=no", > @@ -150,7 +156,7 @@ static void promisor_remote_init(struct repository *r) > xcalloc(sizeof(*r->promisor_remote_config), 1); > config->promisors_tail = &config->promisors; > > - git_config(promisor_remote_config, config); > + repo_config(r, promisor_remote_config, config); > > if (config->repository_format_partial_clone) { > struct promisor_remote *o, *previous; > @@ -246,10 +252,8 @@ int promisor_remote_get_direct(struct repository *repo, > > promisor_remote_init(repo); > > - if (repo != the_repository) > - BUG("only the_repository is supported for now"); > for (r = repo->promisor_remote_config->promisors; r; r = r->next) { > - if (fetch_objects(r->name, remaining_oids, remaining_nr) < 0) { > + if (fetch_objects(repo, r->name, remaining_oids, remaining_nr) < 0) { > if (remaining_nr == 1) > continue; > remaining_nr = remove_fetched_oids(repo, &remaining_oids, > diff --git a/t/helper/test-partial-clone.c b/t/helper/test-partial-clone.c > new file mode 100644 > index 0000000000..e7bc7eb21f > --- /dev/null > +++ b/t/helper/test-partial-clone.c > @@ -0,0 +1,34 @@ > +#include "cache.h" > +#include "test-tool.h" > +#include "repository.h" > +#include "object-store.h" > + > +static void object_info(const char *gitdir, const char *oid_hex) > +{ > + struct repository r; > + struct object_id oid; > + unsigned long size; > + struct object_info oi = {.sizep = &size}; > + const char *p; > + > + if (repo_init(&r, gitdir, NULL)) > + die("could not init repo"); > + if (parse_oid_hex(oid_hex, &oid, &p)) > + die("could not parse oid"); > + if (oid_object_info_extended(&r, &oid, &oi, 0)) > + die("could not obtain object info"); > + printf("%d\n", (int) size); > +} > + > +int cmd__partial_clone(int argc, const char **argv) > +{ > + if (argc < 4) > + die("too few arguments"); > + > + if (!strcmp(argv[1], "object-info")) > + object_info(argv[2], argv[3]); > + else > + die("invalid argument '%s'", argv[1]); > + > + return 0; > +} > diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c > index c5bd0c6d4c..b21e8f1519 100644 > --- a/t/helper/test-tool.c > +++ b/t/helper/test-tool.c > @@ -46,6 +46,7 @@ static struct test_cmd cmds[] = { > { "online-cpus", cmd__online_cpus }, > { "parse-options", cmd__parse_options }, > { "parse-pathspec-file", cmd__parse_pathspec_file }, > + { "partial-clone", cmd__partial_clone }, > { "path-utils", cmd__path_utils }, > { "pcre2-config", cmd__pcre2_config }, > { "pkt-line", cmd__pkt_line }, > diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h > index e8069a3b22..f845ced4b3 100644 > --- a/t/helper/test-tool.h > +++ b/t/helper/test-tool.h > @@ -35,6 +35,7 @@ int cmd__oidmap(int argc, const char **argv); > int cmd__online_cpus(int argc, const char **argv); > int cmd__parse_options(int argc, const char **argv); > int cmd__parse_pathspec_file(int argc, const char** argv); > +int cmd__partial_clone(int argc, const char **argv); > int cmd__path_utils(int argc, const char **argv); > int cmd__pcre2_config(int argc, const char **argv); > int cmd__pkt_line(int argc, const char **argv); > diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh > index 584a039b85..e804d267e6 100755 > --- a/t/t0410-partial-clone.sh > +++ b/t/t0410-partial-clone.sh > @@ -604,6 +604,30 @@ test_expect_success 'do not fetch when checking existence of tree we construct o > git -C repo cherry-pick side1 > ' > > +test_expect_success 'lazy-fetch when accessing object not in the_repository' ' > + rm -rf full partial.git && > + test_create_repo full && > + printf 12345 >full/file.txt && > + git -C full add file.txt && > + git -C full commit -m "first commit" && > + > + test_config -C full uploadpack.allowfilter 1 && > + test_config -C full uploadpack.allowanysha1inwant 1 && > + git clone --filter=blob:none --bare "file://$(pwd)/full" partial.git && > + FILE_HASH=$(git hash-object --stdin <full/file.txt) && > + > + # Sanity check that the file is missing > + git -C partial.git rev-list --objects --missing=print HEAD >out && > + grep "[?]$FILE_HASH" out && > + > + OUT=$(test-tool partial-clone object-info partial.git "$FILE_HASH") && > + test "$OUT" -eq 5 && > + > + # Sanity check that the file is now present > + git -C partial.git rev-list --objects --missing=print HEAD >out && > + ! grep "[?]$FILE_HASH" out > +' > + > . "$TEST_DIRECTORY"/lib-httpd.sh > start_httpd > > -- > 2.32.0.rc0.204.g9fa02ecfa5-goog Looks good to me.
Hi, On Tue, Jun 1, 2021 at 2:38 PM Jonathan Tan <jonathantanmy@google.com> wrote: > > This is one step towards supporting partial clone submodules. > > Even after this patch, we will still lack partial clone submodules > support, primarily because a lot of Git code that accesses submodule > objects does so by adding their object stores as alternates, meaning > that any lazy fetches that would occur in the submodule would be done > based on the config of the superproject, not of the submodule. This also > prevents testing of the functionality in this patch by user-facing > commands. So for now, test this mechanism using a test helper. > ... > diff --git a/t/helper/test-partial-clone.c b/t/helper/test-partial-clone.c > new file mode 100644 > index 0000000000..e7bc7eb21f > --- /dev/null > +++ b/t/helper/test-partial-clone.c > @@ -0,0 +1,34 @@ > +#include "cache.h" > +#include "test-tool.h" > +#include "repository.h" > +#include "object-store.h" > + > +static void object_info(const char *gitdir, const char *oid_hex) > +{ > + struct repository r; > + struct object_id oid; > + unsigned long size; > + struct object_info oi = {.sizep = &size}; > + const char *p; > + > + if (repo_init(&r, gitdir, NULL)) > + die("could not init repo"); > + if (parse_oid_hex(oid_hex, &oid, &p)) > + die("could not parse oid"); > + if (oid_object_info_extended(&r, &oid, &oi, 0)) > + die("could not obtain object info"); > + printf("%d\n", (int) size); > +} > + > +int cmd__partial_clone(int argc, const char **argv) > +{ > + if (argc < 4) > + die("too few arguments"); > + > + if (!strcmp(argv[1], "object-info")) > + object_info(argv[2], argv[3]); > + else > + die("invalid argument '%s'", argv[1]); > + > + return 0; > +} > diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c > index c5bd0c6d4c..b21e8f1519 100644 > --- a/t/helper/test-tool.c > +++ b/t/helper/test-tool.c > @@ -46,6 +46,7 @@ static struct test_cmd cmds[] = { > { "online-cpus", cmd__online_cpus }, > { "parse-options", cmd__parse_options }, > { "parse-pathspec-file", cmd__parse_pathspec_file }, > + { "partial-clone", cmd__partial_clone }, > { "path-utils", cmd__path_utils }, > { "pcre2-config", cmd__pcre2_config }, > { "pkt-line", cmd__pkt_line }, > diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h > index e8069a3b22..f845ced4b3 100644 > --- a/t/helper/test-tool.h > +++ b/t/helper/test-tool.h > @@ -35,6 +35,7 @@ int cmd__oidmap(int argc, const char **argv); > int cmd__online_cpus(int argc, const char **argv); > int cmd__parse_options(int argc, const char **argv); > int cmd__parse_pathspec_file(int argc, const char** argv); > +int cmd__partial_clone(int argc, const char **argv); > int cmd__path_utils(int argc, const char **argv); > int cmd__pcre2_config(int argc, const char **argv); > int cmd__pkt_line(int argc, const char **argv); > diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh > index 584a039b85..e804d267e6 100755 > --- a/t/t0410-partial-clone.sh > +++ b/t/t0410-partial-clone.sh > @@ -604,6 +604,30 @@ test_expect_success 'do not fetch when checking existence of tree we construct o > git -C repo cherry-pick side1 > ' > > +test_expect_success 'lazy-fetch when accessing object not in the_repository' ' > + rm -rf full partial.git && > + test_create_repo full && > + printf 12345 >full/file.txt && > + git -C full add file.txt && > + git -C full commit -m "first commit" && > + > + test_config -C full uploadpack.allowfilter 1 && > + test_config -C full uploadpack.allowanysha1inwant 1 && > + git clone --filter=blob:none --bare "file://$(pwd)/full" partial.git && > + FILE_HASH=$(git hash-object --stdin <full/file.txt) && > + > + # Sanity check that the file is missing > + git -C partial.git rev-list --objects --missing=print HEAD >out && > + grep "[?]$FILE_HASH" out && > + > + OUT=$(test-tool partial-clone object-info partial.git "$FILE_HASH") && > + test "$OUT" -eq 5 && > + > + # Sanity check that the file is now present > + git -C partial.git rev-list --objects --missing=print HEAD >out && > + ! grep "[?]$FILE_HASH" out > +' > + Turns out that this test fails under GIT_TEST_DEFAULT_HASH=sha256; output: error: wrong index v2 file size in /home/newren/floss/git/t/trash directory.t0410-partial-clone/partial.git/objects/pack/pack-66a15be115d740341216938fb7abb31902e960bd6d464829d85164d1a4a25bec.idx error: wrong index v2 file size in /home/newren/floss/git/t/trash directory.t0410-partial-clone/partial.git/objects/pack/pack-66a15be115d740341216938fb7abb31902e960bd6d464829d85164d1a4a25bec.idx fatal: couldn't find remote ref 74242c6e4a0d89f454d89d3496a1f7cb3f1f39f0 error: wrong index v2 file size in /home/newren/floss/git/t/trash directory.t0410-partial-clone/partial.git/objects/pack/pack-66a15be115d740341216938fb7abb31902e960bd6d464829d85164d1a4a25bec.idx error: wrong index v2 file size in /home/newren/floss/git/t/trash directory.t0410-partial-clone/partial.git/objects/pack/pack-66a15be115d740341216938fb7abb31902e960bd6d464829d85164d1a4a25bec.idx fatal: could not obtain object info
> > +static void object_info(const char *gitdir, const char *oid_hex) > > +{ > > + struct repository r; > > + struct object_id oid; > > + unsigned long size; > > + struct object_info oi = {.sizep = &size}; > > + const char *p; > > + > > + if (repo_init(&r, gitdir, NULL)) > > + die("could not init repo"); > > + if (parse_oid_hex(oid_hex, &oid, &p)) > > + die("could not parse oid"); > > + if (oid_object_info_extended(&r, &oid, &oi, 0)) > > + die("could not obtain object info"); > > + printf("%d\n", (int) size); > > +} > > Hmm. Is there a reason that the same couldn't be implemented by calling "git > cat-file -s" from the partial clone? I don't think "git cat-file" (when run in the superproject) by itself can access an object from a submodule. "git -C name_of_submodule cat-file $HASH" would access that object, but I specifically want to test oid_object_info_extended() on a repo that is not the_repository (which would not work with -C, because the_repository would then be the submodule). > > +test_expect_success 'lazy-fetch when accessing object not in the_repository' ' > > + rm -rf full partial.git && > > + test_create_repo full && > > + printf 12345 >full/file.txt && > > + git -C full add file.txt && > > + git -C full commit -m "first commit" && > > This is a stylistic nit, but I think using test_commit is better here > for a non-superficial reason. My guess is that you wanted to avoid > specifying a message and file (which are required positional arguments > to test_commit before you can specify the contents). But I think there > are two good reasons to use test_commit here: > > - It saves three lines of test script here. > - You don't have to make the expected size a magic number (i.e., > because you knew ahead of time that the contents was "12345"). > > I probably would have expected this test to end with: > > git -C full cat-file -s $FILE_HASH >expect && > git -C partial.git cat-file -s $FILE_HASH >actual && > test_cmp expect actual > > which reads more clearly to me (although I think the much more important > test is that $FILE_HASH doesn't show up in the output of the rev-list > --missing=print that is run in the partial clone). That makes sense. > > + test_config -C full uploadpack.allowfilter 1 && > > + test_config -C full uploadpack.allowanysha1inwant 1 && > > + git clone --filter=blob:none --bare "file://$(pwd)/full" partial.git && > > + FILE_HASH=$(git hash-object --stdin <full/file.txt) && > > This works for me, although I wouldn't have been sad to see the > sub-shell contain "git -C full rev-parse HEAD:file.txt" instead. I'll do this too. > > + # Sanity check that the file is missing > > + git -C partial.git rev-list --objects --missing=print HEAD >out && > > + grep "[?]$FILE_HASH" out && > > + > > + OUT=$(test-tool partial-clone object-info partial.git "$FILE_HASH") && > > Coming back to my point about the utility of the partial-clone helper, > could this be replaced by saying just OUT="$(git -C partial.git cat-file > -s "$FILE_HASH")" instead? > > Thanks, > Taylor Same answer as above - I specifically want to test accessing (and thereby lazy-fetching) an object when the object is not in the_repository. I'll add some documentation to explain what it does and that it's equivalent to using "git -C repo cat-file -s", except that this specifically tests another code path.
> > diff --git a/object-file.c b/object-file.c > > index f233b440b2..ebf273e9e7 100644 > > --- a/object-file.c > > +++ b/object-file.c > > @@ -1570,15 +1570,12 @@ static int do_oid_object_info_extended(struct repository *r, > > } > > > > /* Check if it is a missing object */ > > - if (fetch_if_missing && has_promisor_remote() && > > - !already_retried && r == the_repository && > > + if (fetch_if_missing && repo_has_promisor_remote(r) && > > + !already_retried && > > So here you removed the special check against the_repository while > looking for promisor_remotes. There are other such special checks in > the code; I also see: > > diff.c: if (options->repo == the_repository && has_promisor_remote() && > diffcore-break.c: if (r == the_repository && has_promisor_remote()) { > diffcore-rename.c: if (r == the_repository && has_promisor_remote()) { > > and a series I'm planning to submit soon will add another to merge.ort.c. > > Do these need to all be fixed as part of the partial clone submodule > support as well? Do I need to change anything about my series? I > guess since I'm asking that, I should probably submit it first so you > can actually see it and answer my question. (And the timing may be > good since the area is fresh in your memory...) Thanks for raising this. Looking at the 3 you listed, they all have to do with prefetching. This is fine both now and later. Now, since partial clones in submodules still don't work, any fetching of any sort (pre- or not) will not work. Later, since this prefetching is just an optimization. (Of course, we should come back and add prefetching for submodule partial clones, but that is an optimization, not a correctness issue.) > > !(flags & OBJECT_INFO_SKIP_FETCH_OBJECT)) { > > /* > > * TODO Investigate checking promisor_remote_get_direct() > > * TODO return value and stopping on error here. > > - * TODO Pass a repository struct through > > - * promisor_remote_get_direct(), such that arbitrary > > - * repositories work. > > Odd, it appears that when this comment was added (in commit b14ed5adaf > ("Use promisor_remote_get_direct() and has_promisor_remote()", > 2019-06-25)), a repository was passed to promisor_remote_get_direct(). > Sure, it was just a transliteration of the comment that was there > before when fetch_objects() was the function being called, but since > the code was being changed and the comment being updated, it seems the > TODO should have been removed back then. > > Oh, well, good to update it now at least. Yes - perhaps the comment was emphasizing the "such that arbitrary repositories work" part. But anyway, yes, it is now removed. [snip] > Looks good to me. Thanks for taking a look.
> Turns out that this test fails under GIT_TEST_DEFAULT_HASH=sha256; output: > > error: wrong index v2 file size in /home/newren/floss/git/t/trash > directory.t0410-partial-clone/partial.git/objects/pack/pack-66a15be115d740341216938fb7abb31902e960bd6d464829d85164d1a4a25bec.idx > error: wrong index v2 file size in /home/newren/floss/git/t/trash > directory.t0410-partial-clone/partial.git/objects/pack/pack-66a15be115d740341216938fb7abb31902e960bd6d464829d85164d1a4a25bec.idx > fatal: couldn't find remote ref 74242c6e4a0d89f454d89d3496a1f7cb3f1f39f0 > error: wrong index v2 file size in /home/newren/floss/git/t/trash > directory.t0410-partial-clone/partial.git/objects/pack/pack-66a15be115d740341216938fb7abb31902e960bd6d464829d85164d1a4a25bec.idx > error: wrong index v2 file size in /home/newren/floss/git/t/trash > directory.t0410-partial-clone/partial.git/objects/pack/pack-66a15be115d740341216938fb7abb31902e960bd6d464829d85164d1a4a25bec.idx > fatal: could not obtain object info Thanks for noticing this. I'll take a look.
A quick update... On Fri, Jun 4, 2021 at 2:35 PM Elijah Newren <newren@gmail.com> wrote: > > On Tue, Jun 1, 2021 at 2:38 PM Jonathan Tan <jonathantanmy@google.com> wrote: > > ... > > diff --git a/object-file.c b/object-file.c > > index f233b440b2..ebf273e9e7 100644 > > --- a/object-file.c > > +++ b/object-file.c > > @@ -1570,15 +1570,12 @@ static int do_oid_object_info_extended(struct repository *r, > > } > > > > /* Check if it is a missing object */ > > - if (fetch_if_missing && has_promisor_remote() && > > - !already_retried && r == the_repository && > > + if (fetch_if_missing && repo_has_promisor_remote(r) && > > + !already_retried && > > So here you removed the special check against the_repository while > looking for promisor_remotes. There are other such special checks in > the code; I also see: > > diff.c: if (options->repo == the_repository && has_promisor_remote() && > diffcore-break.c: if (r == the_repository && has_promisor_remote()) { > diffcore-rename.c: if (r == the_repository && has_promisor_remote()) { > > and a series I'm planning to submit soon will add another to merge.ort.c. > > Do these need to all be fixed as part of the partial clone submodule > support as well? Do I need to change anything about my series? I > guess since I'm asking that, I should probably submit it first so you > can actually see it and answer my question. (And the timing may be > good since the area is fresh in your memory...) That other topic is now over here: https://lore.kernel.org/git/pull.969.git.1622856485.gitgitgadget@gmail.com/T/#t
On Tue, Jun 01, 2021 at 02:34:19PM -0700, Jonathan Tan wrote: > > This is one step towards supporting partial clone submodules. > > Even after this patch, we will still lack partial clone submodules > support, primarily because a lot of Git code that accesses submodule > objects does so by adding their object stores as alternates, meaning > that any lazy fetches that would occur in the submodule would be done > based on the config of the superproject, not of the submodule. This also > prevents testing of the functionality in this patch by user-facing > commands. So for now, test this mechanism using a test helper. > > Signed-off-by: Jonathan Tan <jonathantanmy@google.com> > --- > Makefile | 1 + > object-file.c | 7 ++----- > promisor-remote.c | 14 +++++++++----- > t/helper/test-partial-clone.c | 34 ++++++++++++++++++++++++++++++++++ > t/helper/test-tool.c | 1 + > t/helper/test-tool.h | 1 + > t/t0410-partial-clone.sh | 24 ++++++++++++++++++++++++ > 7 files changed, 72 insertions(+), 10 deletions(-) > create mode 100644 t/helper/test-partial-clone.c > > diff --git a/Makefile b/Makefile > index c3565fc0f8..f6653bcd5e 100644 > --- a/Makefile > +++ b/Makefile > @@ -725,6 +725,7 @@ TEST_BUILTINS_OBJS += test-oidmap.o > TEST_BUILTINS_OBJS += test-online-cpus.o > TEST_BUILTINS_OBJS += test-parse-options.o > TEST_BUILTINS_OBJS += test-parse-pathspec-file.o > +TEST_BUILTINS_OBJS += test-partial-clone.o > TEST_BUILTINS_OBJS += test-path-utils.o > TEST_BUILTINS_OBJS += test-pcre2-config.o > TEST_BUILTINS_OBJS += test-pkt-line.o > diff --git a/object-file.c b/object-file.c > index f233b440b2..ebf273e9e7 100644 > --- a/object-file.c > +++ b/object-file.c > @@ -1570,15 +1570,12 @@ static int do_oid_object_info_extended(struct repository *r, > } > > /* Check if it is a missing object */ > - if (fetch_if_missing && has_promisor_remote() && > - !already_retried && r == the_repository && > + if (fetch_if_missing && repo_has_promisor_remote(r) && > + !already_retried && So we remove the explicit "if we've got promisors and are operating on the repo we launched in" and instead ask "if the repo we're operating on has promisors" - definitely a step towards in-process submodule happiness :) > !(flags & OBJECT_INFO_SKIP_FETCH_OBJECT)) { > /* > * TODO Investigate checking promisor_remote_get_direct() > * TODO return value and stopping on error here. > - * TODO Pass a repository struct through > - * promisor_remote_get_direct(), such that arbitrary > - * repositories work. > */ > promisor_remote_get_direct(r, real, 1); And this seems like a stale comment, since I see we were already passing 'r' here. But arbitrary repositories still don't just work, right? Or, I guess your point was "partial clone + submodules don't just work, because of the alternates thing" - so maybe this part is OK? > @@ -150,7 +156,7 @@ static void promisor_remote_init(struct repository *r) > xcalloc(sizeof(*r->promisor_remote_config), 1); > config->promisors_tail = &config->promisors; > > - git_config(promisor_remote_config, config); > + repo_config(r, promisor_remote_config, config); Should this change have happened when we added 'r' to promisor_remote_init? If r==the_repository then there's no difference between these two calls, right? > diff --git a/t/helper/test-partial-clone.c b/t/helper/test-partial-clone.c > new file mode 100644 > index 0000000000..e7bc7eb21f > --- /dev/null > +++ b/t/helper/test-partial-clone.c > @@ -0,0 +1,34 @@ > +#include "cache.h" > +#include "test-tool.h" > +#include "repository.h" > +#include "object-store.h" > + > +static void object_info(const char *gitdir, const char *oid_hex) > +{ > + struct repository r; > + struct object_id oid; > + unsigned long size; > + struct object_info oi = {.sizep = &size}; > + const char *p; > + > + if (repo_init(&r, gitdir, NULL)) > + die("could not init repo"); > + if (parse_oid_hex(oid_hex, &oid, &p)) > + die("could not parse oid"); > + if (oid_object_info_extended(&r, &oid, &oi, 0)) > + die("could not obtain object info"); > + printf("%d\n", (int) size); > +} > + > +int cmd__partial_clone(int argc, const char **argv) > +{ > + if (argc < 4) > + die("too few arguments"); > + > + if (!strcmp(argv[1], "object-info")) > + object_info(argv[2], argv[3]); > + else > + die("invalid argument '%s'", argv[1]); > + > + return 0; > +} > diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c > index c5bd0c6d4c..b21e8f1519 100644 > --- a/t/helper/test-tool.c > +++ b/t/helper/test-tool.c > @@ -46,6 +46,7 @@ static struct test_cmd cmds[] = { > { "online-cpus", cmd__online_cpus }, > { "parse-options", cmd__parse_options }, > { "parse-pathspec-file", cmd__parse_pathspec_file }, > + { "partial-clone", cmd__partial_clone }, > { "path-utils", cmd__path_utils }, > { "pcre2-config", cmd__pcre2_config }, > { "pkt-line", cmd__pkt_line }, > diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h > index e8069a3b22..f845ced4b3 100644 > --- a/t/helper/test-tool.h > +++ b/t/helper/test-tool.h > @@ -35,6 +35,7 @@ int cmd__oidmap(int argc, const char **argv); > int cmd__online_cpus(int argc, const char **argv); > int cmd__parse_options(int argc, const char **argv); > int cmd__parse_pathspec_file(int argc, const char** argv); > +int cmd__partial_clone(int argc, const char **argv); > int cmd__path_utils(int argc, const char **argv); > int cmd__pcre2_config(int argc, const char **argv); > int cmd__pkt_line(int argc, const char **argv); > diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh > index 584a039b85..e804d267e6 100755 > --- a/t/t0410-partial-clone.sh > +++ b/t/t0410-partial-clone.sh > @@ -604,6 +604,30 @@ test_expect_success 'do not fetch when checking existence of tree we construct o > git -C repo cherry-pick side1 > ' > > +test_expect_success 'lazy-fetch when accessing object not in the_repository' ' > + rm -rf full partial.git && > + test_create_repo full && > + printf 12345 >full/file.txt && > + git -C full add file.txt && > + git -C full commit -m "first commit" && I think there is some test_commit or similar function here that's more commonly used, right? > + > + test_config -C full uploadpack.allowfilter 1 && > + test_config -C full uploadpack.allowanysha1inwant 1 && I wasn't sure what these configs are for, but it looks like .allowfilter is to allow 'full' to serve as a remote to a partial clone. But what do you need .allowAnySha1InWant for here? Are we expecting to ask for SHAs that 'full' doesn't have? > + git clone --filter=blob:none --bare "file://$(pwd)/full" partial.git && > + FILE_HASH=$(git hash-object --stdin <full/file.txt) && > + > + # Sanity check that the file is missing > + git -C partial.git rev-list --objects --missing=print HEAD >out && > + grep "[?]$FILE_HASH" out && > + > + OUT=$(test-tool partial-clone object-info partial.git "$FILE_HASH") && > + test "$OUT" -eq 5 && Hm. I guess I am confused about why this fetches the desired object into partial.git. Maybe the test-helper needs a comment (and maybe here too) on the line where fetch will be triggered? > + > + # Sanity check that the file is now present > + git -C partial.git rev-list --objects --missing=print HEAD >out && > + ! grep "[?]$FILE_HASH" out - Emily
> > !(flags & OBJECT_INFO_SKIP_FETCH_OBJECT)) { > > /* > > * TODO Investigate checking promisor_remote_get_direct() > > * TODO return value and stopping on error here. > > - * TODO Pass a repository struct through > > - * promisor_remote_get_direct(), such that arbitrary > > - * repositories work. > > */ > > promisor_remote_get_direct(r, real, 1); > > And this seems like a stale comment, since I see we were already passing > 'r' here. But arbitrary repositories still don't just work, right? Or, I > guess your point was "partial clone + submodules don't just work, > because of the alternates thing" - so maybe this part is OK? This part is OK (arbitrary repositories work here), yes. > > @@ -150,7 +156,7 @@ static void promisor_remote_init(struct repository *r) > > xcalloc(sizeof(*r->promisor_remote_config), 1); > > config->promisors_tail = &config->promisors; > > > > - git_config(promisor_remote_config, config); > > + repo_config(r, promisor_remote_config, config); > > Should this change have happened when we added 'r' to > promisor_remote_init? If r==the_repository then there's no difference > between these two calls, right? Good point - yes, it should have. I'll change it. > > +test_expect_success 'lazy-fetch when accessing object not in the_repository' ' > > + rm -rf full partial.git && > > + test_create_repo full && > > + printf 12345 >full/file.txt && > > + git -C full add file.txt && > > + git -C full commit -m "first commit" && > I think there is some test_commit or similar function here that's more > commonly used, right? Taylor Blau suggested a similar thing, and I have changed it in v2. > > > + > > + test_config -C full uploadpack.allowfilter 1 && > > + test_config -C full uploadpack.allowanysha1inwant 1 && > I wasn't sure what these configs are for, but it looks like .allowfilter > is to allow 'full' to serve as a remote to a partial clone. But what do > you need .allowAnySha1InWant for here? Are we expecting to ask for SHAs > that 'full' doesn't have? We are expecting to ask for SHAs that 'full' doesn't *advertise*, yes (namely, the hash of a certain blob). > > + git clone --filter=blob:none --bare "file://$(pwd)/full" partial.git && > > + FILE_HASH=$(git hash-object --stdin <full/file.txt) && > > + > > + # Sanity check that the file is missing > > + git -C partial.git rev-list --objects --missing=print HEAD >out && > > + grep "[?]$FILE_HASH" out && > > + > > + OUT=$(test-tool partial-clone object-info partial.git "$FILE_HASH") && > > + test "$OUT" -eq 5 && > > Hm. I guess I am confused about why this fetches the desired object into > partial.git. Maybe the test-helper needs a comment (and maybe here too) > on the line where fetch will be triggered? I've added a comment to the test-helper code in v2 - could you take a look and see if that clarifies things? But in any case, the answer is that this test-tool invocation attempts to read an object in the submodule while running as a process in the superproject. The read attempt is a read of a missing object, so that object is lazily fetched.
diff --git a/Makefile b/Makefile index c3565fc0f8..f6653bcd5e 100644 --- a/Makefile +++ b/Makefile @@ -725,6 +725,7 @@ TEST_BUILTINS_OBJS += test-oidmap.o TEST_BUILTINS_OBJS += test-online-cpus.o TEST_BUILTINS_OBJS += test-parse-options.o TEST_BUILTINS_OBJS += test-parse-pathspec-file.o +TEST_BUILTINS_OBJS += test-partial-clone.o TEST_BUILTINS_OBJS += test-path-utils.o TEST_BUILTINS_OBJS += test-pcre2-config.o TEST_BUILTINS_OBJS += test-pkt-line.o diff --git a/object-file.c b/object-file.c index f233b440b2..ebf273e9e7 100644 --- a/object-file.c +++ b/object-file.c @@ -1570,15 +1570,12 @@ static int do_oid_object_info_extended(struct repository *r, } /* Check if it is a missing object */ - if (fetch_if_missing && has_promisor_remote() && - !already_retried && r == the_repository && + if (fetch_if_missing && repo_has_promisor_remote(r) && + !already_retried && !(flags & OBJECT_INFO_SKIP_FETCH_OBJECT)) { /* * TODO Investigate checking promisor_remote_get_direct() * TODO return value and stopping on error here. - * TODO Pass a repository struct through - * promisor_remote_get_direct(), such that arbitrary - * repositories work. */ promisor_remote_get_direct(r, real, 1); already_retried = 1; diff --git a/promisor-remote.c b/promisor-remote.c index 5819d2cf28..1601f05d79 100644 --- a/promisor-remote.c +++ b/promisor-remote.c @@ -11,7 +11,8 @@ struct promisor_remote_config { struct promisor_remote **promisors_tail; }; -static int fetch_objects(const char *remote_name, +static int fetch_objects(struct repository *repo, + const char *remote_name, const struct object_id *oids, int oid_nr) { @@ -21,6 +22,11 @@ static int fetch_objects(const char *remote_name, child.git_cmd = 1; child.in = -1; + if (repo != the_repository) { + prepare_other_repo_env(&child.env_array); + strvec_pushf(&child.env_array, "%s=%s", GIT_DIR_ENVIRONMENT, + repo->gitdir); + } strvec_pushl(&child.args, "-c", "fetch.negotiationAlgorithm=noop", "fetch", remote_name, "--no-tags", "--no-write-fetch-head", "--recurse-submodules=no", @@ -150,7 +156,7 @@ static void promisor_remote_init(struct repository *r) xcalloc(sizeof(*r->promisor_remote_config), 1); config->promisors_tail = &config->promisors; - git_config(promisor_remote_config, config); + repo_config(r, promisor_remote_config, config); if (config->repository_format_partial_clone) { struct promisor_remote *o, *previous; @@ -246,10 +252,8 @@ int promisor_remote_get_direct(struct repository *repo, promisor_remote_init(repo); - if (repo != the_repository) - BUG("only the_repository is supported for now"); for (r = repo->promisor_remote_config->promisors; r; r = r->next) { - if (fetch_objects(r->name, remaining_oids, remaining_nr) < 0) { + if (fetch_objects(repo, r->name, remaining_oids, remaining_nr) < 0) { if (remaining_nr == 1) continue; remaining_nr = remove_fetched_oids(repo, &remaining_oids, diff --git a/t/helper/test-partial-clone.c b/t/helper/test-partial-clone.c new file mode 100644 index 0000000000..e7bc7eb21f --- /dev/null +++ b/t/helper/test-partial-clone.c @@ -0,0 +1,34 @@ +#include "cache.h" +#include "test-tool.h" +#include "repository.h" +#include "object-store.h" + +static void object_info(const char *gitdir, const char *oid_hex) +{ + struct repository r; + struct object_id oid; + unsigned long size; + struct object_info oi = {.sizep = &size}; + const char *p; + + if (repo_init(&r, gitdir, NULL)) + die("could not init repo"); + if (parse_oid_hex(oid_hex, &oid, &p)) + die("could not parse oid"); + if (oid_object_info_extended(&r, &oid, &oi, 0)) + die("could not obtain object info"); + printf("%d\n", (int) size); +} + +int cmd__partial_clone(int argc, const char **argv) +{ + if (argc < 4) + die("too few arguments"); + + if (!strcmp(argv[1], "object-info")) + object_info(argv[2], argv[3]); + else + die("invalid argument '%s'", argv[1]); + + return 0; +} diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c index c5bd0c6d4c..b21e8f1519 100644 --- a/t/helper/test-tool.c +++ b/t/helper/test-tool.c @@ -46,6 +46,7 @@ static struct test_cmd cmds[] = { { "online-cpus", cmd__online_cpus }, { "parse-options", cmd__parse_options }, { "parse-pathspec-file", cmd__parse_pathspec_file }, + { "partial-clone", cmd__partial_clone }, { "path-utils", cmd__path_utils }, { "pcre2-config", cmd__pcre2_config }, { "pkt-line", cmd__pkt_line }, diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h index e8069a3b22..f845ced4b3 100644 --- a/t/helper/test-tool.h +++ b/t/helper/test-tool.h @@ -35,6 +35,7 @@ int cmd__oidmap(int argc, const char **argv); int cmd__online_cpus(int argc, const char **argv); int cmd__parse_options(int argc, const char **argv); int cmd__parse_pathspec_file(int argc, const char** argv); +int cmd__partial_clone(int argc, const char **argv); int cmd__path_utils(int argc, const char **argv); int cmd__pcre2_config(int argc, const char **argv); int cmd__pkt_line(int argc, const char **argv); diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh index 584a039b85..e804d267e6 100755 --- a/t/t0410-partial-clone.sh +++ b/t/t0410-partial-clone.sh @@ -604,6 +604,30 @@ test_expect_success 'do not fetch when checking existence of tree we construct o git -C repo cherry-pick side1 ' +test_expect_success 'lazy-fetch when accessing object not in the_repository' ' + rm -rf full partial.git && + test_create_repo full && + printf 12345 >full/file.txt && + git -C full add file.txt && + git -C full commit -m "first commit" && + + test_config -C full uploadpack.allowfilter 1 && + test_config -C full uploadpack.allowanysha1inwant 1 && + git clone --filter=blob:none --bare "file://$(pwd)/full" partial.git && + FILE_HASH=$(git hash-object --stdin <full/file.txt) && + + # Sanity check that the file is missing + git -C partial.git rev-list --objects --missing=print HEAD >out && + grep "[?]$FILE_HASH" out && + + OUT=$(test-tool partial-clone object-info partial.git "$FILE_HASH") && + test "$OUT" -eq 5 && + + # Sanity check that the file is now present + git -C partial.git rev-list --objects --missing=print HEAD >out && + ! grep "[?]$FILE_HASH" out +' + . "$TEST_DIRECTORY"/lib-httpd.sh start_httpd
This is one step towards supporting partial clone submodules. Even after this patch, we will still lack partial clone submodules support, primarily because a lot of Git code that accesses submodule objects does so by adding their object stores as alternates, meaning that any lazy fetches that would occur in the submodule would be done based on the config of the superproject, not of the submodule. This also prevents testing of the functionality in this patch by user-facing commands. So for now, test this mechanism using a test helper. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- Makefile | 1 + object-file.c | 7 ++----- promisor-remote.c | 14 +++++++++----- t/helper/test-partial-clone.c | 34 ++++++++++++++++++++++++++++++++++ t/helper/test-tool.c | 1 + t/helper/test-tool.h | 1 + t/t0410-partial-clone.sh | 24 ++++++++++++++++++++++++ 7 files changed, 72 insertions(+), 10 deletions(-) create mode 100644 t/helper/test-partial-clone.c