diff mbox series

[21/30] repository: Implement extensions.compatObjectFormat

Message ID 20230927195537.1682-21-ebiederm@gmail.com (mailing list archive)
State New, archived
Headers show
Series Initial support for multiple hash functions | expand

Commit Message

Eric W. Biederman Sept. 27, 2023, 7:55 p.m. UTC
From: "brian m. carlson" <sandals@crustytoothpaste.net>

Add a configuration option to enable updating and reading from
compatibility hash maps when git accesses the reposotiry.

Call the helper function repo_set_compat_hash_algo with the value
that compatObjectFormat is set to.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 Documentation/config/extensions.txt | 12 ++++++++++++
 repository.c                        |  2 +-
 setup.c                             | 23 +++++++++++++++++++++--
 setup.h                             |  1 +
 4 files changed, 35 insertions(+), 3 deletions(-)

Comments

Junio C Hamano Sept. 27, 2023, 9:39 p.m. UTC | #1
"Eric W. Biederman" <ebiederm@gmail.com> writes:

> diff --git a/setup.c b/setup.c
> index deb5a33fe9e1..87b40472dbc5 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -598,6 +598,25 @@ static enum extension_result handle_extension(const char *var,
>  		}

This line in the pre-context needed fuzzing, but otherwise the
series applied cleanly on top of v2.42.0.

> Subject: Re: [PATCH 21/30] repository: Implement extensions.compatObjectFormat

"Implement" -> "implement" (many other patches share the same
problem, none of which I fixed while queueing).
Junio C Hamano Sept. 28, 2023, 8:18 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> "Eric W. Biederman" <ebiederm@gmail.com> writes:
>
>> diff --git a/setup.c b/setup.c
>> index deb5a33fe9e1..87b40472dbc5 100644
>> --- a/setup.c
>> +++ b/setup.c
>> @@ -598,6 +598,25 @@ static enum extension_result handle_extension(const char *var,
>>  		}
>
> This line in the pre-context needed fuzzing, but otherwise the
> series applied cleanly on top of v2.42.0.
>
>> Subject: Re: [PATCH 21/30] repository: Implement extensions.compatObjectFormat
>
> "Implement" -> "implement" (many other patches share the same
> problem, none of which I fixed while queueing).


The topic when merged near the tip of 'seen' seems to break a few CI
jobs here and there.  The log from the broken run can be seen at

    https://github.com/git/git/actions/runs/6331978214

You may have to log-in there before you can view the details.

Thanks.
Eric W. Biederman Sept. 29, 2023, 12:50 a.m. UTC | #3
On September 28, 2023 3:18:46 PM CDT, Junio C Hamano <gitster@pobox.com> wrote:
>Junio C Hamano <gitster@pobox.com> writes:
>
>> "Eric W. Biederman" <ebiederm@gmail.com> writes:
>>
>>> diff --git a/setup.c b/setup.c
>>> index deb5a33fe9e1..87b40472dbc5 100644
>>> --- a/setup.c
>>> +++ b/setup.c
>>> @@ -598,6 +598,25 @@ static enum extension_result handle_extension(const char *var,
>>>  		}
>>
>> This line in the pre-context needed fuzzing, but otherwise the
>> series applied cleanly on top of v2.42.0.
>>
>>> Subject: Re: [PATCH 21/30] repository: Implement extensions.compatObjectFormat
>>
>> "Implement" -> "implement" (many other patches share the same
>> problem, none of which I fixed while queueing).
>
>
>The topic when merged near the tip of 'seen' seems to break a few CI
>jobs here and there.  The log from the broken run can be seen at
>
>    https://github.com/git/git/actions/runs/6331978214
>
>You may have to log-in there before you can view the details.

Thanks.

It might take me a couple of days before I can dig into this, but I will dig in and see if I can understand and fix the build failures.

With any luck it will be something simple like forgetting that {} != {0}.

Eric
Eric W. Biederman Sept. 29, 2023, 4:59 p.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> "Eric W. Biederman" <ebiederm@gmail.com> writes:
>>
>>> diff --git a/setup.c b/setup.c
>>> index deb5a33fe9e1..87b40472dbc5 100644
>>> --- a/setup.c
>>> +++ b/setup.c
>>> @@ -598,6 +598,25 @@ static enum extension_result handle_extension(const char *var,
>>>  		}
>>
>> This line in the pre-context needed fuzzing, but otherwise the
>> series applied cleanly on top of v2.42.0.
>>
>>> Subject: Re: [PATCH 21/30] repository: Implement extensions.compatObjectFormat
>>
>> "Implement" -> "implement" (many other patches share the same
>> problem, none of which I fixed while queueing).
>
>
> The topic when merged near the tip of 'seen' seems to break a few CI
> jobs here and there.  The log from the broken run can be seen at
>
>     https://github.com/git/git/actions/runs/6331978214
>
> You may have to log-in there before you can view the details.

Did you have any manual merge conflicts you had to resolve?
If so it is possible to see the merge result you had?

There is a static failure in commit.c of oidcpy because it thinks the
array is zero size.  That is weird, but once I get a test environment
setup I expect I can figure out what it is talking about.

There in linux-leaks it lists a bunch of test failures, and unless I see
what code is actually failing I am not certain I can figure it out.

Thanks,
Eric
Junio C Hamano Sept. 29, 2023, 6:48 p.m. UTC | #5
"Eric W. Biederman" <ebiederm@gmail.com> writes:

> Did you have any manual merge conflicts you had to resolve?
> If so it is possible to see the merge result you had?

The only merge-fix I had to apply to make everything compile was
this:

diff --git a/bloom.c b/bloom.c
index ff131893cd..59eb0a0481 100644
--- a/bloom.c
+++ b/bloom.c
@@ -278,7 +278,7 @@ static int has_entries_with_high_bit(struct repository *r, struct tree *t)
 		struct tree_desc desc;
 		struct name_entry entry;
 
-		init_tree_desc(&desc, t->buffer, t->size);
+		init_tree_desc(&desc, &t->object.oid, t->buffer, t->size);
 		while (tree_entry(&desc, &entry)) {
 			size_t i;
 			for (i = 0; i < entry.pathlen; i++) {

as one topic changed the function signature while the other topic
added a new callsite.

Everything else was pretty-much auto resolved, I think.

Output from "git show --cc seen" matches my recollection.  The above
does appear as an evil merge.
Eric W. Biederman Oct. 2, 2023, 12:48 a.m. UTC | #6
Junio C Hamano <gitster@pobox.com> writes:

> "Eric W. Biederman" <ebiederm@gmail.com> writes:
>
>> Did you have any manual merge conflicts you had to resolve?
>> If so it is possible to see the merge result you had?
>
> The only merge-fix I had to apply to make everything compile was
> this:
>
> diff --git a/bloom.c b/bloom.c
> index ff131893cd..59eb0a0481 100644
> --- a/bloom.c
> +++ b/bloom.c
> @@ -278,7 +278,7 @@ static int has_entries_with_high_bit(struct repository *r, struct tree *t)
>  		struct tree_desc desc;
>  		struct name_entry entry;
>  
> -		init_tree_desc(&desc, t->buffer, t->size);
> +		init_tree_desc(&desc, &t->object.oid, t->buffer, t->size);
>  		while (tree_entry(&desc, &entry)) {
>  			size_t i;
>  			for (i = 0; i < entry.pathlen; i++) {
>
> as one topic changed the function signature while the other topic
> added a new callsite.
>
> Everything else was pretty-much auto resolved, I think.
>
> Output from "git show --cc seen" matches my recollection.  The above
> does appear as an evil merge.

Thanks, and I found all of this on your seen branch.

After tracking all of these down it appears all of the errors
came from my branch, I will be resending the patches as soon
as I finish going through the review comments.


Looking at the build errors pretty much all of the all of the
automatic test failures came from commit_tree_extended.


There was a strbuf that did
strbuf_init(&buf, 8192);
strbuf_init(&buf, 8192);
twice.

Plus there was another buffer that was allocated and not freed,
in commit_tree_extended.


The leaks were a bit tricky to track down as building with SANITIZE=leak
causes tests to fail somewhat randomly for me with "gcc (Debian
12.2.0-14) 12.2.0".


There was one smatch static-analysis error that suggested using
CALLOC_ARRAY instead of xcalloc.



The "win" build and "linux-gcc-default (ubuntu-lastest)" build failed
because of an over eager gcc warning -Werror=array-bounds.
Claiming:

 In file included from /usr/include/string.h:535,
                  from git-compat-util.h:228,
                  from commit.c:1:
 In function ‘memcpy’,
     inlined from ‘oidcpy’ at hash-ll.h:272:2,
     inlined from ‘commit_tree_extended’ at commit.c:1705:3:
 ##[error]/usr/include/x86_64-linux-gnu/bits/string_fortified.h:29:10: ‘__builtin_memcpy’ offset [0, 31] is out of the bounds [0, 0] [-Werror=array-bounds]
    29 |   return __builtin___memcpy_chk (__dest, __src, __len,
       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    30 |                                  __glibc_objsize0 (__dest));
       |                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~

Previously in commit_tree_extended the structure was:

	while (parents) {
		struct commit *parent = pop_commit(&parents);
		strbuf_addf(&buffer, "parent %s\n",
			    oid_to_hex(&parent->object.oid));
	}

brian had changed it to:

	nparents = commit_list_count(parents);
	parent_buf = xcalloc(nparents, sizeof(*parent_buf));
	for (i = 0; i < nparents; i++) {
		struct commit *parent = pop_commit(&parents);
		oidcpy(&parent_buf[i], &parent->object.oid);
	}

Which is perfectly sound code.

I changed the structure of the loop to:

	nparents = commit_list_count(parents);
	parent_buf = xcalloc(nparents, sizeof(*parent_buf));
	i = 0;
	while (parents) {
		struct commit *parent = pop_commit(&parents);
		oidcpy(&parent_buf[i++], &parent->object.oid);
	}

And the "array-bounds" warning had no problems with the code.
So it looks like the error was actually that array-bounds thought
there was a potential NULL pointer dereference at which point
it would not have array bounds, and then it complained about
the array bounds, instead of the NULL pointer dereference.

I am going to fix the patch.  If array-bounds causes further
problems you may want to think about disabling it.

Eric
Eric W. Biederman Oct. 2, 2023, 1:31 a.m. UTC | #7
Junio C Hamano <gitster@pobox.com> writes:

> "Eric W. Biederman" <ebiederm@gmail.com> writes:
>
>> Subject: Re: [PATCH 21/30] repository: Implement extensions.compatObjectFormat
>
> "Implement" -> "implement" (many other patches share the same
> problem, none of which I fixed while queueing).

Why shouldn't sentences begin with a capital letter?

I agree it isn't a very extensive sentence just two words but it is a
complete thought, and thus is a sentence.

There is great value in uniformity in a project so I will make the
change, but it seems very weird to me.

Eric
diff mbox series

Patch

diff --git a/Documentation/config/extensions.txt b/Documentation/config/extensions.txt
index bccaec7a9636..9f72e6d9f4f1 100644
--- a/Documentation/config/extensions.txt
+++ b/Documentation/config/extensions.txt
@@ -7,6 +7,18 @@  Note that this setting should only be set by linkgit:git-init[1] or
 linkgit:git-clone[1].  Trying to change it after initialization will not
 work and will produce hard-to-diagnose issues.
 
+extensions.compatObjectFormat::
+
+	Specify a compatitbility hash algorithm to use.  The acceptable values
+	are `sha1` and `sha256`.  The value specified must be different from the
+	value of extensions.objectFormat.  This allows client level
+	interoperability between git repositories whose objectFormat matches
+	this compatObjectFormat.  In particular when fully implemented the
+	pushes and pulls from a repository in whose objectFormat matches
+	compatObjectFormat.  As well as being able to use oids encoded in
+	compatObjectFormat in addition to oids encoded with objectFormat to
+	locally specify objects.
+
 extensions.worktreeConfig::
 	If enabled, then worktrees will load config settings from the
 	`$GIT_DIR/config.worktree` file in addition to the
diff --git a/repository.c b/repository.c
index 6214f61cf4e7..9d91536b613b 100644
--- a/repository.c
+++ b/repository.c
@@ -194,7 +194,7 @@  int repo_init(struct repository *repo,
 		goto error;
 
 	repo_set_hash_algo(repo, format.hash_algo);
-	repo_set_compat_hash_algo(repo, GIT_HASH_UNKNOWN);
+	repo_set_compat_hash_algo(repo, format.compat_hash_algo);
 	repo->repository_format_worktree_config = format.worktree_config;
 
 	/* take ownership of format.partial_clone */
diff --git a/setup.c b/setup.c
index deb5a33fe9e1..87b40472dbc5 100644
--- a/setup.c
+++ b/setup.c
@@ -598,6 +598,25 @@  static enum extension_result handle_extension(const char *var,
 		}
 		data->hash_algo = format;
 		return EXTENSION_OK;
+	} else if (!strcmp(ext, "compatobjectformat")) {
+		struct string_list_item *item;
+		int format;
+
+		if (!value)
+			return config_error_nonbool(var);
+		format = hash_algo_by_name(value);
+		if (format == GIT_HASH_UNKNOWN)
+			return error(_("invalid value for '%s': '%s'"),
+				     "extensions.compatobjectformat", value);
+		/* For now only support compatObjectFormat being specified once. */
+		for_each_string_list_item(item, &data->v1_only_extensions) {
+			if (!strcmp(item->string, "compatobjectformat"))
+				return error(_("'%s' already specified as '%s'"),
+					"extensions.compatobjectformat",
+					hash_algos[data->compat_hash_algo].name);
+		}
+		data->compat_hash_algo = format;
+		return EXTENSION_OK;
 	}
 	return EXTENSION_UNKNOWN;
 }
@@ -1573,7 +1592,7 @@  const char *setup_git_directory_gently(int *nongit_ok)
 		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
 			repo_set_compat_hash_algo(the_repository,
-						  GIT_HASH_UNKNOWN);
+						  repo_fmt.compat_hash_algo);
 			the_repository->repository_format_worktree_config =
 				repo_fmt.worktree_config;
 			/* take ownership of repo_fmt.partial_clone */
@@ -1667,7 +1686,7 @@  void check_repository_format(struct repository_format *fmt)
 	check_repository_format_gently(get_git_dir(), fmt, NULL);
 	startup_info->have_repository = 1;
 	repo_set_hash_algo(the_repository, fmt->hash_algo);
-	repo_set_compat_hash_algo(the_repository, GIT_HASH_UNKNOWN);
+	repo_set_compat_hash_algo(the_repository, fmt->compat_hash_algo);
 	the_repository->repository_format_worktree_config =
 		fmt->worktree_config;
 	the_repository->repository_format_partial_clone =
diff --git a/setup.h b/setup.h
index 58fd2605dd26..5d678ceb8caa 100644
--- a/setup.h
+++ b/setup.h
@@ -86,6 +86,7 @@  struct repository_format {
 	int worktree_config;
 	int is_bare;
 	int hash_algo;
+	int compat_hash_algo;
 	int sparse_index;
 	char *work_tree;
 	struct string_list unknown_extensions;