Message ID | 20190327211110.46327-1-jonathantanmy@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fetch-pack: binary search when storing wanted-refs | expand |
On Wed, Mar 27, 2019 at 02:11:10PM -0700, Jonathan Tan wrote: > In do_fetch_pack_v2(), the "sought" array is sorted by name, and it is > not subsequently reordered (within the function). Therefore, > receive_wanted_refs() can assume that "sought" is sorted, and can thus > use a binary search when storing wanted-refs retrieved from the server. > > Replace the existing linear search with a binary search. This improves > performance significantly when mirror cloning a repository with more > than 1 million refs. This looks good. The flow in do_fetch_pack_v2() is a little funny. I wanted to double-check that we always sorted the sought list, because it only happens in the CHECK_LOCAL state of the state machine. But as far as I can tell we always begin the function in CHECK_LOCAL, it always transitions out to another state, and we never go back to it. So all of that part of the state-machine switch could really just be done once before entering the state-machine's while loop. Not really relevant to your patch, but maybe worth tweaking separately (or maybe not, if people find the all-in-a-state-machine style more readable; I found it more confusing to reason about). > +static int cmp_name_ref(const void *name, const void *ref) > +{ > + return strcmp(name, (*(struct ref **)ref)->name); > +} After some errors with qsort comparison functions a while back[1], I double-checked that this has the right number of asterisks. I believe it does. :) -Peff [1] https://public-inbox.org/git/d1b58614-989f-5998-6c53-c19eee409a2f@web.de/ and the child thread it spawned are interesting reading, though I don't think we ever followed up on it.
diff --git a/fetch-pack.c b/fetch-pack.c index e69993b2eb..e8266bd45a 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -1298,6 +1298,11 @@ static void receive_shallow_info(struct fetch_pack_args *args, } } +static int cmp_name_ref(const void *name, const void *ref) +{ + return strcmp(name, (*(struct ref **)ref)->name); +} + static void receive_wanted_refs(struct packet_reader *reader, struct ref **sought, int nr_sought) { @@ -1305,20 +1310,16 @@ static void receive_wanted_refs(struct packet_reader *reader, while (packet_reader_read(reader) == PACKET_READ_NORMAL) { struct object_id oid; const char *end; - int i; + struct ref **found; if (parse_oid_hex(reader->line, &oid, &end) || *end++ != ' ') die(_("expected wanted-ref, got '%s'"), reader->line); - for (i = 0; i < nr_sought; i++) { - if (!strcmp(end, sought[i]->name)) { - oidcpy(&sought[i]->old_oid, &oid); - break; - } - } - - if (i == nr_sought) + found = bsearch(end, sought, nr_sought, sizeof(*sought), + cmp_name_ref); + if (!found) die(_("unexpected wanted-ref: '%s'"), reader->line); + oidcpy(&(*found)->old_oid, &oid); } if (reader->status != PACKET_READ_DELIM)
In do_fetch_pack_v2(), the "sought" array is sorted by name, and it is not subsequently reordered (within the function). Therefore, receive_wanted_refs() can assume that "sought" is sorted, and can thus use a binary search when storing wanted-refs retrieved from the server. Replace the existing linear search with a binary search. This improves performance significantly when mirror cloning a repository with more than 1 million refs. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- fetch-pack.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-)