[0/1] ref-filter.c: faster multi-pattern ref filtering

Message ID cover.1561588479.git.me@ttaylorr.com (mailing list archive)

Taylor Blau June 26, 2019, 10:41 p.m. UTC
Hi,

Peff and I have been experimenting with using the references from a
repository's alternate to speed up the connectivity check following a
'git clone --reference'.

We have noticed that the connectivity check becomes much faster when
advertising both the heads and tags of an alternate, as opposed to just
the heads. But, in our initial experiments, we noticed that it took
*far* longer to compute the answer to:

  $ git for-each-ref refs/heads refs/tags

than simply

  $ git for-each-ref refs/heads

We found that this dates back to cfe004a5a9 (ref-filter: limit
traversal to prefix, 2017-05-22), which drops the user-provided patterns
entirely when more than one pattern is passed to the ref filter.

To remedy this, we implemented an algorithm which computes the longest
pattern prefixes over the disjoint subsets of patterns provided to the
ref filter. This produces the most-specific queries (i.e., the ones that
we expect to return the fewest extra results) without covering the same
ref more than once.
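
To make the idea concrete, here is a rough sketch in Python of one way
to compute such prefixes. It is an illustration only, not the C code in
the patch: the names disjoint_prefixes, _walk and GLOB_SPECIAL are
hypothetical, and the set of glob characters is an assumption. The
sketch sorts the patterns, then walks them character by character,
splitting a group wherever the patterns diverge and stopping a group as
soon as any member ends or reaches a glob character.

```python
# Illustrative sketch only; names and the glob character set are assumptions.
GLOB_SPECIAL = set("*?[\\")


def disjoint_prefixes(patterns):
    """Return literal prefixes that cover every pattern without overlap."""
    out = []
    # Sorting keeps patterns that share a prefix adjacent to each other.
    _walk(sorted(set(patterns)), "", out)
    return out


def _walk(group, prefix, out):
    n = len(prefix)
    # If any pattern in the group ends here, or continues with a glob
    # character, no longer literal prefix is safe for the whole group.
    for p in group:
        if len(p) == n or p[n] in GLOB_SPECIAL:
            out.append(prefix)
            return
    # Otherwise split the group by its next character and recurse into
    # each subgroup, which yields disjoint prefixes.
    i = 0
    while i < len(group):
        ch = group[i][n]
        j = i + 1
        while j < len(group) and group[j][n] == ch:
            j += 1
        _walk(group[i:j], prefix + ch, out)
        i = j


print(disjoint_prefixes(["refs/heads", "refs/tags"]))
# ['refs/heads', 'refs/tags']
print(disjoint_prefixes(["refs/heads/a", "refs/heads/a/b"]))
# ['refs/heads/a']
```

In the second call the two patterns overlap, so a single prefix covers
both and no ref would be visited twice. The prefixes only narrow the
traversal; the results would still need to be matched against the
original patterns afterwards.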

The algorithm is described in detail in the patch. The results are not
quite as impressive on repositories the size of the kernel, but the
change yields a drastic improvement on our fork network repositories.
Some synthetic results are included in the patch as well.

Thanks in advance for your review.

Taylor Blau (1):
  ref-filter.c: find disjoint pattern prefixes

 ref-filter.c            | 89 +++++++++++++++++++++++++++++------------
 t/t6300-for-each-ref.sh | 26 ++++++++++++
 2 files changed, 89 insertions(+), 26 deletions(-)

--
2.21.0.203.g358da99528