diff mbox series

[v2] midx: traverse the local MIDX first

Message ID 20200828202213.GA24009@nand.local (mailing list archive)
State Accepted
Commit 59552fb3e2161648952a8a1240ffef57eff9a262
Headers show
Series [v2] midx: traverse the local MIDX first | expand

Commit Message

Taylor Blau Aug. 28, 2020, 8:22 p.m. UTC
When a repository has an alternate object directory configured, callers
can traverse through each alternate's MIDX by walking the '->next'
pointer.

But, when 'prepare_multi_pack_index_one()' loads multiple MIDXs, it
places the new ones at the front of this pointer chain, not at the end.
This can be confusing for callers such as 'git repack -ad', causing test
failures like in t7700.6 with 'GIT_TEST_MULTI_PACK_INDEX=1'.

The occurs when dropping a pack known to the local MIDX with alternates
configured that have their own MIDX. Since the alternate's MIDX is
returned via 'get_multi_pack_index()', 'midx_contains_pack()' returns
true (which is correct, since it traverses through the '->next' pointer
to find the MIDX in the chain that does contain the requested object).
But, we call 'clear_midx_file()' on 'the_repository', which drops the
MIDX at the path of the first MIDX in the chain, which (in the case of
t7700.6 is the one in the alternate).

This patch addresses that by:

  - placing the local MIDX first in the chain when calling
    'prepare_multi_pack_index_one()', and

  - introducing a new 'get_local_multi_pack_index()', which explicitly
    returns the repository-local MIDX, if any.

Don't impose an additional order on the MIDX's '->next' pointer beyond
that the first item in the chain must be local if one exists so that we
avoid a quadratic insertion.

Likewise, use 'get_local_multi_pack_index()' in
'remove_redundant_pack()' to fix the formerly broken t7700.6 when run
with 'GIT_TEST_MULTI_PACK_INDEX=1'.

Finally, note that the MIDX ordering invariant is only preserved by the
insertion order in 'prepare_packed_git()', which traverses through the
ODB's '->next' pointer, meaning we visit the local object store first.
This fragility makes this an undesirable long-term solution if more
callers are added, but it is acceptable for now since this is the only
caller.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c |  2 +-
 midx.c           |  8 ++++++--
 packfile.c       | 11 +++++++++++
 packfile.h       |  1 +
 4 files changed, 19 insertions(+), 3 deletions(-)

Comments

Jeff King Aug. 28, 2020, 9:19 p.m. UTC | #1
On Fri, Aug 28, 2020 at 04:22:13PM -0400, Taylor Blau wrote:

> This patch addresses that by:
> 
>   - placing the local MIDX first in the chain when calling
>     'prepare_multi_pack_index_one()', and
> 
>   - introducing a new 'get_local_multi_pack_index()', which explicitly
>     returns the repository-local MIDX, if any.
> 
> Don't impose an additional order on the MIDX's '->next' pointer beyond
> that the first item in the chain must be local if one exists so that we
> avoid a quadratic insertion.

This version looks fine to me.

Thinking on it a bit more, we probably want this "local one is first"
behavior even without it fixing this bug. For normal packs, we always
prefer local copies over alternates, under the assumption that
alternates are likely to be more expensive to access (e.g., shared nfs).

Of course that somewhat goes out the window since we re-order lookups
these days based on where we've found previous objects (my mru stuff,
but even before that the single "last pack" variable). But it makes
sense to at least start with the local ones being given priority.

I don't think we do any mru tricks with the midx list, so it would stay
in local-first mode always, which is reasonable (you probably don't have
so many midx's that any reordering is worth it anyway).

It does mean we may consult an alternates midx before any local non-midx
packs, which _could_ be slower. But since this is all guesses and
heuristics anyway, I'd wait until somebody has a concrete case where
they can demonstrate a different ordering scheme works better.

-Peff
diff mbox series

Patch

diff --git a/builtin/repack.c b/builtin/repack.c
index 98fac03946..01e7767c79 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -133,7 +133,7 @@  static void get_non_kept_pack_filenames(struct string_list *fname_list,
 static void remove_redundant_pack(const char *dir_name, const char *base_name)
 {
 	struct strbuf buf = STRBUF_INIT;
-	struct multi_pack_index *m = get_multi_pack_index(the_repository);
+	struct multi_pack_index *m = get_local_multi_pack_index(the_repository);
 	strbuf_addf(&buf, "%s.pack", base_name);
 	if (m && midx_contains_pack(m, buf.buf))
 		clear_midx_file(the_repository);
diff --git a/midx.c b/midx.c
index e9b2e1253a..cc19b66152 100644
--- a/midx.c
+++ b/midx.c
@@ -416,8 +416,12 @@  int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, i
 	m = load_multi_pack_index(object_dir, local);
 
 	if (m) {
-		m->next = r->objects->multi_pack_index;
-		r->objects->multi_pack_index = m;
+		struct multi_pack_index *mp = r->objects->multi_pack_index;
+		if (mp) {
+			m->next = mp->next;
+			mp->next = m;
+		} else
+			r->objects->multi_pack_index = m;
 		return 1;
 	}
 
diff --git a/packfile.c b/packfile.c
index 6ab5233613..9ef27508f2 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1027,6 +1027,17 @@  struct multi_pack_index *get_multi_pack_index(struct repository *r)
 	return r->objects->multi_pack_index;
 }
 
+struct multi_pack_index *get_local_multi_pack_index(struct repository *r)
+{
+	struct multi_pack_index *m = get_multi_pack_index(r);
+
+	/* no need to iterate; we always put the local one first (if any) */
+	if (m && m->local)
+		return m;
+
+	return NULL;
+}
+
 struct packed_git *get_all_packs(struct repository *r)
 {
 	struct multi_pack_index *m;
diff --git a/packfile.h b/packfile.h
index 240aa73b95..a58fc738e0 100644
--- a/packfile.h
+++ b/packfile.h
@@ -57,6 +57,7 @@  void install_packed_git(struct repository *r, struct packed_git *pack);
 struct packed_git *get_packed_git(struct repository *r);
 struct list_head *get_packed_git_mru(struct repository *r);
 struct multi_pack_index *get_multi_pack_index(struct repository *r);
+struct multi_pack_index *get_local_multi_pack_index(struct repository *r);
 struct packed_git *get_all_packs(struct repository *r);
 
 /*