mbox series

[v2,0/4] First steps towards partial clone submodules

Message ID cover.1623111879.git.jonathantanmy@google.com (mailing list archive)
Headers show
Series First steps towards partial clone submodules | expand

Message

Jonathan Tan June 8, 2021, 12:25 a.m. UTC
Thanks everyone for your reviews. I believe I've addressed all review
comments, including the one from Elijah about the test failing with
sha256 (which turns out to be because I didn't add a call to
setup_git_directory(), which the other test helpers do).

Jonathan Tan (4):
  promisor-remote: read partialClone config here
  promisor-remote: support per-repository config
  run-command: move envvar-resetting function
  promisor-remote: teach lazy-fetch in any repo

 Makefile                      |   1 +
 cache.h                       |   1 -
 object-file.c                 |   7 +-
 promisor-remote.c             | 125 ++++++++++++++++++++--------------
 promisor-remote.h             |  28 +++++---
 repository.c                  |   6 ++
 repository.h                  |   4 ++
 run-command.c                 |  10 +++
 run-command.h                 |   9 +++
 setup.c                       |  10 ++-
 submodule.c                   |  14 +---
 t/helper/test-partial-clone.c |  43 ++++++++++++
 t/helper/test-tool.c          |   1 +
 t/helper/test-tool.h          |   1 +
 t/t0410-partial-clone.sh      |  23 +++++++
 15 files changed, 201 insertions(+), 82 deletions(-)
 create mode 100644 t/helper/test-partial-clone.c

Range-diff against v1:
1:  4a7ad9ffeb ! 1:  07290cba86 promisor-remote: read partialClone config here
    @@ promisor-remote.c: static int promisor_remote_config(const char *var, const char
      	const char *subkey;
      
     +	if (!strcmp(var, "extensions.partialclone")) {
    -+		repository_format_partial_clone = xstrdup(value);
    ++		/*
    ++		 * NULL value is handled in handle_extension_v0 in setup.c.
    ++		 */
    ++		if (value)
    ++			repository_format_partial_clone = xstrdup(value);
     +		return 0;
     +	}
     +
2:  d8f5fa9b9f ! 2:  c462927ff2 promisor-remote: support per-repository config
    @@ promisor-remote.c: static void promisor_remote_move_to_tail(struct promisor_remo
      	const char *name;
      	size_t namelen;
      	const char *subkey;
    - 
    - 	if (!strcmp(var, "extensions.partialclone")) {
    --		repository_format_partial_clone = xstrdup(value);
    -+		config->repository_format_partial_clone = xstrdup(value);
    +@@ promisor-remote.c: static int promisor_remote_config(const char *var, const char *value, void *data
    + 		 * NULL value is handled in handle_extension_v0 in setup.c.
    + 		 */
    + 		if (value)
    +-			repository_format_partial_clone = xstrdup(value);
    ++			config->repository_format_partial_clone = xstrdup(value);
      		return 0;
      	}
      
    @@ promisor-remote.c: static int promisor_remote_config(const char *var, const char
      }
      
     -static void promisor_remote_clear(void)
    -+static void promisor_remote_clear(struct promisor_remote_config *config)
    ++void promisor_remote_clear(struct promisor_remote_config *config)
      {
     -	while (promisors) {
     -		struct promisor_remote *r = promisors;
     -		promisors = promisors->next;
    ++	FREE_AND_NULL(config->repository_format_partial_clone);
    ++
     +	while (config->promisors) {
     +		struct promisor_remote *r = config->promisors;
     +		config->promisors = config->promisors->next;
    @@ promisor-remote.h: struct promisor_remote {
     +	repo_promisor_remote_reinit(the_repository);
     +}
     +
    ++void promisor_remote_clear(struct promisor_remote_config *config);
    ++
     +struct promisor_remote *repo_promisor_remote_find(struct repository *r, const char *remote_name);
     +static inline struct promisor_remote *promisor_remote_find(const char *remote_name)
     +{
    @@ promisor-remote.h: struct promisor_remote {
      /*
       * Fetches all requested objects from all promisor remotes, trying them one at
     
    + ## repository.c ##
    +@@
    + #include "lockfile.h"
    + #include "submodule-config.h"
    + #include "sparse-index.h"
    ++#include "promisor-remote.h"
    + 
    + /* The main repository */
    + static struct repository the_repo;
    +@@ repository.c: void repo_clear(struct repository *repo)
    + 		if (repo->index != &the_index)
    + 			FREE_AND_NULL(repo->index);
    + 	}
    ++
    ++	if (repo->promisor_remote_config) {
    ++		promisor_remote_clear(repo->promisor_remote_config);
    ++		FREE_AND_NULL(repo->promisor_remote_config);
    ++	}
    + }
    + 
    + int repo_read_index(struct repository *repo)
    +
      ## repository.h ##
     @@ repository.h: struct lock_file;
      struct pathspec;
3:  c5307a9f02 ! 3:  9cbdf60981 run-command: move envvar-resetting function
    @@ run-command.h: int run_processes_parallel_tr2(int n, get_next_task_fn, start_fai
      			       const char *tr2_category, const char *tr2_label);
      
     +/**
    -+ * Convenience function that adds entries to env_array that resets all
    -+ * repo-specific environment variables except for CONFIG_DATA_ENVIRONMENT. See
    -+ * local_repo_env in cache.h for more information.
    ++ * Convenience function which adds all GIT_* environment variables to env_array
    ++ * with the exception of GIT_CONFIG_PARAMETERS. When used as the env_array of a
    ++ * subprocess, these entries cause the corresponding environment variables to
    ++ * be unset in the subprocess. See local_repo_env in cache.h for more
    ++ * information.
     + */
     +void prepare_other_repo_env(struct strvec *env_array);
     +
4:  b70a00b9b0 ! 4:  5b41569ace promisor-remote: teach lazy-fetch in any repo
    @@ t/helper/test-partial-clone.c (new)
     +#include "repository.h"
     +#include "object-store.h"
     +
    ++/*
    ++ * Prints the size of the object corresponding to the given hash in a specific
    ++ * gitdir. This is similar to "git -C gitdir cat-file -s", except that this
    ++ * exercises the code that accesses the object of an arbitrary repository that
    ++ * is not the_repository. ("git -C gitdir" makes it so that the_repository is
    ++ * the one in gitdir.)
    ++ */
     +static void object_info(const char *gitdir, const char *oid_hex)
     +{
     +	struct repository r;
    @@ t/helper/test-partial-clone.c (new)
     +
     +int cmd__partial_clone(int argc, const char **argv)
     +{
    ++	setup_git_directory();
    ++
     +	if (argc < 4)
     +		die("too few arguments");
     +
    @@ t/t0410-partial-clone.sh: test_expect_success 'do not fetch when checking existe
     +test_expect_success 'lazy-fetch when accessing object not in the_repository' '
     +	rm -rf full partial.git &&
     +	test_create_repo full &&
    -+	printf 12345 >full/file.txt &&
    -+	git -C full add file.txt &&
    -+	git -C full commit -m "first commit" &&
    ++	test_commit -C full create-a-file file.txt &&
     +
     +	test_config -C full uploadpack.allowfilter 1 &&
     +	test_config -C full uploadpack.allowanysha1inwant 1 &&
     +	git clone --filter=blob:none --bare "file://$(pwd)/full" partial.git &&
    -+	FILE_HASH=$(git hash-object --stdin <full/file.txt) &&
    ++	FILE_HASH=$(git -C full rev-parse HEAD:file.txt) &&
     +
     +	# Sanity check that the file is missing
     +	git -C partial.git rev-list --objects --missing=print HEAD >out &&
     +	grep "[?]$FILE_HASH" out &&
     +
    -+	OUT=$(test-tool partial-clone object-info partial.git "$FILE_HASH") &&
    -+	test "$OUT" -eq 5 &&
    ++	git -C full cat-file -s "$FILE_HASH" >expect &&
    ++	test-tool partial-clone object-info partial.git "$FILE_HASH" >actual &&
    ++	test_cmp expect actual &&
     +
     +	# Sanity check that the file is now present
     +	git -C partial.git rev-list --objects --missing=print HEAD >out &&

Comments

Elijah Newren June 8, 2021, 5:50 p.m. UTC | #1
On Mon, Jun 7, 2021 at 5:26 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> Thanks everyone for your reviews. I believe I've addressed all review
> comments, including the one from Elijah about the test failing with
> sha256 (which turns out to be because I didn't add a call to
> setup_git_directory(), which the other test helpers do).

Thanks for fixing those up.  I spotted some minor nits/questions, but
nothing big.

Looks like Junio did spot some bigger items...which raises a question
for me.  I have a series
(https://lore.kernel.org/git/pull.969.git.1622856485.gitgitgadget@gmail.com/)
that also touches partial clones.  Our series are semantically
independent, but we both add a repository parameter to
fetch_objects().  So we both make the same change, but you also make
additional nearby changes, resulting in two trivial conflicts.  So,
should I rebase my series on yours, should you rebase on mine, or
should we just let both proceed independently and double-check Junio
resolves the trivial conflicts in favor of your side?

Thoughts?
Junio C Hamano June 8, 2021, 11:42 p.m. UTC | #2
Elijah Newren <newren@gmail.com> writes:

> On Mon, Jun 7, 2021 at 5:26 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>>
>> Thanks everyone for your reviews. I believe I've addressed all review
>> comments, including the one from Elijah about the test failing with
>> sha256 (which turns out to be because I didn't add a call to
>> setup_git_directory(), which the other test helpers do).
>
> Thanks for fixing those up.  I spotted some minor nits/questions, but
> nothing big.
>
> Looks like Junio did spot some bigger items...which raises a question
> for me.  I have a series ...

Do you mean, by "bigger items", that we may want to turn it around
to have repo extension data to the in-core repository structure?

> (https://lore.kernel.org/git/pull.969.git.1622856485.gitgitgadget@gmail.com/)
> that also touches partial clones.  Our series are semantically
> independent, but we both add a repository parameter to
> fetch_objects().  So we both make the same change, but you also make
> additional nearby changes, resulting in two trivial conflicts.  ...

I can sort of see how the above plan would work if we are not going
to fix the "keep only the partialclone related extension thing,
instead of solving the larger structural problem that the current
arrangement ignores that repository extensions are per repository".
But wouldn't that leave us with two series with technical debt?
Also, if Jonathan's series fixes the "bigger item", would the above
"proceed more or less independently or rebase one on top of the
other" plan work well without making the same fix in yours?

I guess a better first step would be to stop, think and decide what
to do with the "bigger" thing---if only to dismiss it with a firm
declaration that we would never do such a fix and move extension
data piecemeal to relevant subsystems, so that we'd reduce conflicts
in the future, as I am reasonably sure that the "bigger item" will
be tempting to fix even after the two series lands, and doing so at
that time would be twice larger surgery.
Elijah Newren June 9, 2021, 12:07 a.m. UTC | #3
On Tue, Jun 8, 2021 at 4:42 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > On Mon, Jun 7, 2021 at 5:26 PM Jonathan Tan <jonathantanmy@google.com> wrote:
> >>
> >> Thanks everyone for your reviews. I believe I've addressed all review
> >> comments, including the one from Elijah about the test failing with
> >> sha256 (which turns out to be because I didn't add a call to
> >> setup_git_directory(), which the other test helpers do).
> >
> > Thanks for fixing those up.  I spotted some minor nits/questions, but
> > nothing big.
> >
> > Looks like Junio did spot some bigger items...which raises a question
> > for me.  I have a series ...
>
> Do you mean, by "bigger items", that we may want to turn it around
> to have repo extension data to the in-core repository structure?

Yes.

> > (https://lore.kernel.org/git/pull.969.git.1622856485.gitgitgadget@gmail.com/)
> > that also touches partial clones.  Our series are semantically
> > independent, but we both add a repository parameter to
> > fetch_objects().  So we both make the same change, but you also make
> > additional nearby changes, resulting in two trivial conflicts.  ...
>
> I can sort of see how the above plan would work if we are not going
> to fix the "keep only the partialclone related extension thing,
> instead of solving the larger structural problem that the current
> arrangement ignores that repository extensions are per repository".
> But wouldn't that leave us with two series with technical debt?
> Also, if Jonathan's series fixes the "bigger item", would the above
> "proceed more or less independently or rebase one on top of the
> other" plan work well without making the same fix in yours?

My series is completely independent of the partialclone extension stuff.

My series merely adds the recording of a single statistic (number of
fetched objects) to the partial clone stuff; everything else is higher
level diffcore-rename and merge-ort stuff.

> I guess a better first step would be to stop, think and decide what
> to do with the "bigger" thing---if only to dismiss it with a firm
> declaration that we would never do such a fix and move extension
> data piecemeal to relevant subsystems, so that we'd reduce conflicts
> in the future, as I am reasonably sure that the "bigger item" will
> be tempting to fix even after the two series lands, and doing so at
> that time would be twice larger surgery.

I don't understand how you think the partialclone extension stuff is
relevant to my series at all.  My changes to promisor-remote.c are
just a couple lines, and if he expands or rearranges his work, the
amount of conflicts can't really get any bigger because there's only a
few lines on my side for it to conflict with.
Junio C Hamano June 9, 2021, 12:18 a.m. UTC | #4
Elijah Newren <newren@gmail.com> writes:

> My series is completely independent of the partialclone extension stuff.

OK, that makes it simpler.
Jonathan Tan June 9, 2021, 4:58 a.m. UTC | #5
> Looks like Junio did spot some bigger items...which raises a question
> for me.  I have a series
> (https://lore.kernel.org/git/pull.969.git.1622856485.gitgitgadget@gmail.com/)
> that also touches partial clones.  Our series are semantically
> independent, but we both add a repository parameter to
> fetch_objects().  So we both make the same change, but you also make
> additional nearby changes, resulting in two trivial conflicts.  So,
> should I rebase my series on yours, should you rebase on mine, or
> should we just let both proceed independently and double-check Junio
> resolves the trivial conflicts in favor of your side?
> 
> Thoughts?

From [1], looks like this is already resolved, but in any case I think
we can just let both proceed independently since the conflicts are
relatively trivial. If it turns out to be not so trivial, I think Junio
can just let one of us know on-list and whoever it is can rebase on the
other's.

[1] https://lore.kernel.org/git/xmqqlf7jnb5u.fsf@gitster.g/