diff mbox series

[v4,5/6] remote: die if branch is not found in repository

Message ID 20211028183101.41013-6-chooglen@google.com (mailing list archive)
State Superseded
Headers show
Series remote: replace static variables with struct remote_state | expand

Commit Message

Glen Choo Oct. 28, 2021, 6:31 p.m. UTC
In a subsequent commit, we would like external-facing functions to be
able to accept "struct repository" and "struct branch" as a pair. This
is useful for functions like pushremote_for_branch(), which need to take
values from the remote_state and branch, even if branch == NULL.
However, a caller may supply an unrelated repository and branch, which
is not supported behavior.

To prevent misuse, add a die_on_missing_branch() helper function that
dies if a given branch is not from a given repository. Speed up the
existence check by using a new branches_hash hashmap to remote_state,
and use the hashmap to remove the branch array iteration in
make_branch().

Like read_config(), die_on_missing_branch() is only called from
non-static functions; static functions are less prone to misuse because
they have strong conventions for keeping remote_state and branch in
sync.

Signed-off-by: Glen Choo <chooglen@google.com>
---
 remote.c | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++-----
 remote.h |  2 ++
 2 files changed, 69 insertions(+), 6 deletions(-)

Comments

Jonathan Tan Nov. 15, 2021, 6:50 p.m. UTC | #1
> In a subsequent commit, we would like external-facing functions to be
> able to accept "struct repository" and "struct branch" as a pair. This
> is useful for functions like pushremote_for_branch(), which need to take
> values from the remote_state and branch, even if branch == NULL.
> However, a caller may supply an unrelated repository and branch, which
> is not supported behavior.
> 
> To prevent misuse, add a die_on_missing_branch() helper function that
> dies if a given branch is not from a given repository. Speed up the
> existence check by using a new branches_hash hashmap to remote_state,
> and use the hashmap to remove the branch array iteration in
> make_branch().
> 
> Like read_config(), die_on_missing_branch() is only called from
> non-static functions; static functions are less prone to misuse because
> they have strong conventions for keeping remote_state and branch in
> sync.

This makes sense, but how often would this be called? Couldn't we just
iterate over the array (instead of making a hashmap)? If speed is
important, I think we could just sort the array and do a binary search.
Glen Choo Nov. 15, 2021, 8:06 p.m. UTC | #2
Jonathan Tan <jonathantanmy@google.com> writes:

>> To prevent misuse, add a die_on_missing_branch() helper function that
>> dies if a given branch is not from a given repository. Speed up the
>> existence check by using a new branches_hash hashmap to remote_state,
>> and use the hashmap to remove the branch array iteration in
>> make_branch().
> This makes sense, but how often would this be called?

This affects most of remote.h branches API (branch_get(),
branch_get_upstream(), branch_get_push()). From what I can tell, I don't
think these are called very frequently.

> Couldn't we just iterate over the array (instead of making a hashmap)?
> If speed is important, I think we could just sort the array and do a
> binary search.

The primary reason I used a hashmap is to be consistent with struct
remote (which also uses a hashmap). One possible argument in your favor
is that remotes are often looked up by name often (and justify the
hashmap), whereas branches are not looked up by name as often (and don't
justify a hashmap).

I say _justify_, but I don't see significant drawbacks to using a
hashmap here. I suspect that there is an advantage to binary search that
you haven't made explicit yet? Could you share your thought process to
help inform the decision?
Jonathan Tan Nov. 16, 2021, 5:45 p.m. UTC | #3
Glen Choo <chooglen@google.com> writes:
> > Couldn't we just iterate over the array (instead of making a hashmap)?
> > If speed is important, I think we could just sort the array and do a
> > binary search.
> 
> The primary reason I used a hashmap is to be consistent with struct
> remote (which also uses a hashmap). One possible argument in your favor
> is that remotes are often looked up by name often (and justify the
> hashmap), whereas branches are not looked up by name as often (and don't
> justify a hashmap).
> 
> I say _justify_, but I don't see significant drawbacks to using a
> hashmap here. I suspect that there is an advantage to binary search that
> you haven't made explicit yet? Could you share your thought process to
> help inform the decision?

The main drawback is that branches are now stored in 2 ways - as the
"branches" array in struct remote_state and as this new "branches_hash".
I think we should avoid storing the same data twice unless we really
need to, and I don't think there is a need here.

As for hashmap vs array (say, if we were thinking of removing the array
and putting in a hashmap instead), I would still prefer the array
(sorted if needed) just for the simplicity, but I wouldn't feel as
strongly about this since there is no duplication here.
diff mbox series

Patch

diff --git a/remote.c b/remote.c
index 88697e7e6d..2b5166dee0 100644
--- a/remote.c
+++ b/remote.c
@@ -166,17 +166,66 @@  static void add_merge(struct branch *branch, const char *name)
 	branch->merge_name[branch->merge_nr++] = name;
 }
 
+struct branches_hash_key {
+	const char *str;
+	int len;
+};
+
+static int branches_hash_cmp(const void *unused_cmp_data,
+			     const struct hashmap_entry *eptr,
+			     const struct hashmap_entry *entry_or_key,
+			     const void *keydata)
+{
+	const struct branch *a, *b;
+	const struct branches_hash_key *key = keydata;
+
+	a = container_of(eptr, const struct branch, ent);
+	b = container_of(entry_or_key, const struct branch, ent);
+
+	if (key)
+		return strncmp(a->name, key->str, key->len) ||
+		       a->name[key->len];
+	else
+		return strcmp(a->name, b->name);
+}
+
+static struct branch *find_branch(struct remote_state *remote_state,
+				  const char *name, size_t len)
+{
+	struct branches_hash_key lookup;
+	struct hashmap_entry lookup_entry, *e;
+
+	if (!len)
+		len = strlen(name);
+
+	lookup.str = name;
+	lookup.len = len;
+	hashmap_entry_init(&lookup_entry, memhash(name, len));
+
+	e = hashmap_get(&remote_state->branches_hash, &lookup_entry, &lookup);
+	if (e)
+		return container_of(e, struct branch, ent);
+
+	return NULL;
+}
+
+static void die_on_missing_branch(struct repository *repo,
+				  struct branch *branch)
+{
+	/* branch == NULL is always valid because it represents detached HEAD. */
+	if (branch &&
+	    branch != find_branch(repo->remote_state, branch->name, 0))
+		die("branch %s was not found in the repository", branch->name);
+}
+
 static struct branch *make_branch(struct remote_state *remote_state,
 				  const char *name, size_t len)
 {
 	struct branch *ret;
-	int i;
 
-	for (i = 0; i < remote_state->branches_nr; i++) {
-		if (!strncmp(name, remote_state->branches[i]->name, len) &&
-		    !remote_state->branches[i]->name[len])
-			return remote_state->branches[i];
-	}
+	ret = find_branch(remote_state, name, len);
+	if (ret)
+		return ret;
 
 	ALLOC_GROW(remote_state->branches, remote_state->branches_nr + 1,
 		   remote_state->branches_alloc);
@@ -185,6 +234,9 @@  static struct branch *make_branch(struct remote_state *remote_state,
 	ret->name = xstrndup(name, len);
 	ret->refname = xstrfmt("refs/heads/%s", ret->name);
 
+	hashmap_entry_init(&ret->ent, memhash(name, len));
+	if (hashmap_put_entry(&remote_state->branches_hash, ret, ent))
+		BUG("hashmap_put overwrote entry after hashmap_get returned NULL");
 	return ret;
 }
 
@@ -500,6 +552,8 @@  static const char *remotes_remote_for_branch(struct remote_state *remote_state,
 const char *remote_for_branch(struct branch *branch, int *explicit)
 {
 	read_config(the_repository);
+	die_on_missing_branch(the_repository, branch);
+
 	return remotes_remote_for_branch(the_repository->remote_state, branch,
 					 explicit);
 }
@@ -524,6 +578,8 @@  remotes_pushremote_for_branch(struct remote_state *remote_state,
 const char *pushremote_for_branch(struct branch *branch, int *explicit)
 {
 	read_config(the_repository);
+	die_on_missing_branch(the_repository, branch);
+
 	return remotes_pushremote_for_branch(the_repository->remote_state,
 					     branch, explicit);
 }
@@ -534,6 +590,8 @@  static struct remote *remotes_remote_get(struct remote_state *remote_state,
 const char *remote_ref_for_branch(struct branch *branch, int for_push)
 {
 	read_config(the_repository);
+	die_on_missing_branch(the_repository, branch);
+
 	if (branch) {
 		if (!for_push) {
 			if (branch->merge_nr) {
@@ -1879,6 +1937,8 @@  static const char *branch_get_push_1(struct remote_state *remote_state,
 const char *branch_get_push(struct branch *branch, struct strbuf *err)
 {
 	read_config(the_repository);
+	die_on_missing_branch(the_repository, branch);
+
 	if (!branch)
 		return error_buf(err, _("HEAD does not point to a branch"));
 
@@ -2652,6 +2712,7 @@  struct remote_state *remote_state_new(void)
 	memset(r, 0, sizeof(*r));
 
 	hashmap_init(&r->remotes_hash, remotes_hash_cmp, NULL, 0);
+	hashmap_init(&r->branches_hash, branches_hash_cmp, NULL, 0);
 	return r;
 }
 
diff --git a/remote.h b/remote.h
index 85a730b8ef..b205830ac3 100644
--- a/remote.h
+++ b/remote.h
@@ -46,6 +46,7 @@  struct remote_state {
 	struct branch **branches;
 	int branches_alloc;
 	int branches_nr;
+	struct hashmap branches_hash;
 
 	struct branch *current_branch;
 	const char *pushremote_name;
@@ -292,6 +293,7 @@  int remote_find_tracking(struct remote *remote, struct refspec_item *refspec);
  * branch_get(name) for "refs/heads/{name}", or with branch_get(NULL) for HEAD.
  */
 struct branch {
+	struct hashmap_entry ent;
 
 	/* The short name of the branch. */
 	const char *name;