Message ID | a66d2f9f7c20eeb813656e66b3ad9d42f2eecf34.1611617820.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | pack-revindex: introduce on-disk '.rev' format | expand |
On Mon, Jan 25, 2021 at 06:37:46PM -0500, Taylor Blau wrote: > When an on-disk reverse index exists, there is no need to generate one > in memory. In fact, doing so can be slow, and require large amounts of > the heap. > > Let's make sure that we treat the on-disk reverse index with precedence > (i.e., that when it exists, we don't bother trying to generate an > equivalent one in memory) by teaching Git how to conditionally die() > when generating a reverse index in memory. > > Then, add a test to ensure that when (a) an on-disk reverse index > exists, and (b) when setting GIT_TEST_REV_INDEX_DIE_IN_MEMORY, that we > do not die, implying that we read from the on-disk one. I don't love this kind of hackery, as it will have to live forever if we want to keep testing this feature. But I also don't know how else to tell in the regular test suite that we are avoiding the slow path. Would it be enough to instead add a t/perf test which checks the speed of: echo HEAD | git cat-file --batch-check='%(objectsize:disk)' ? I hate relying on the perf suite, because it gets run way less often (and requires a lot of squinting to interpret the results). But it wouldn't require any extra code the binary, as it's observing the actual user-visible thing we care about: speed. (I guess we care as much or more about peak heap usage, but measuring that is a whole other can of worms). -Peff
On Thu, Jan 28, 2021 at 07:53:34PM -0500, Jeff King wrote: > On Mon, Jan 25, 2021 at 06:37:46PM -0500, Taylor Blau wrote: > > > When an on-disk reverse index exists, there is no need to generate one > > in memory. In fact, doing so can be slow, and require large amounts of > > the heap. > > > > Let's make sure that we treat the on-disk reverse index with precedence > > (i.e., that when it exists, we don't bother trying to generate an > > equivalent one in memory) by teaching Git how to conditionally die() > > when generating a reverse index in memory. > > > > Then, add a test to ensure that when (a) an on-disk reverse index > > exists, and (b) when setting GIT_TEST_REV_INDEX_DIE_IN_MEMORY, that we > > do not die, implying that we read from the on-disk one. > > I don't love this kind of hackery, as it will have to live forever if we > want to keep testing this feature. But I also don't know how else to > tell in the regular test suite that we are avoiding the slow path. > > Would it be enough to instead add a t/perf test which checks the speed > of: > > echo HEAD | git cat-file --batch-check='%(objectsize:disk)' > > ? I hate relying on the perf suite, because it gets run way less often > (and requires a lot of squinting to interpret the results). But it > wouldn't require any extra code the binary, as it's observing the actual > user-visible thing we care about: speed. > > (I guess we care as much or more about peak heap usage, but measuring > that is a whole other can of worms). Yeah, I think that the thing we care most about is peak RSS, but I agree that I don't really want to wade into figuring out how to measure that reliably during test time (I think there is a separate and less relevant argument about whether that is something that we should be testing or not). But, getting back to your original comment, I think that I actually prefer to have the GIT_TEST_XYZ_DIE stuff in the binary than I do relying on the perf suite to catch stuff like this. I understand your concern about the binary size. I guess you could #ifdef this out and only build it in during the test suite, but then you're testing a different binary, and so that calls into question the integrity of the test suite itself, etc. etc. So, I guess that's all to say that I while I do find this to be hack-y and gross, I don't think that it's all that bad when you compare it to the alternatives. As usual, I'm happy to change it if you feel strongly that it should be changed. > -Peff Thanks, Taylor
On Thu, Jan 28, 2021 at 08:25:30PM -0500, Taylor Blau wrote: > But, getting back to your original comment, I think that I actually > prefer to have the GIT_TEST_XYZ_DIE stuff in the binary than I do > relying on the perf suite to catch stuff like this. > > I understand your concern about the binary size. I guess you could > #ifdef this out and only build it in during the test suite, but then > you're testing a different binary, and so that calls into question the > integrity of the test suite itself, etc. etc. I care less about binary size, and more just that the production binary people use day-to-day has a bunch of weird test-only behaviors baked into it. I guess nobody is likely to trigger this by accident, though, and anybody who can maliciously impact the environment can do a lot worse. So it's not the end of the world to keep. It just offends my sense of cleanliness and simplicity. ;) > So, I guess that's all to say that I while I do find this to be hack-y > and gross, I don't think that it's all that bad when you compare it to > the alternatives. > > As usual, I'm happy to change it if you feel strongly that it should be > changed. I agree the alternatives are pretty lousy. I don't feel strongly enough to complain further. :) -Peff
diff --git a/pack-revindex.c b/pack-revindex.c index a174fa5388..83fe4de773 100644 --- a/pack-revindex.c +++ b/pack-revindex.c @@ -2,6 +2,7 @@ #include "pack-revindex.h" #include "object-store.h" #include "packfile.h" +#include "config.h" struct revindex_entry { off_t offset; @@ -166,6 +167,9 @@ static void create_pack_revindex(struct packed_git *p) static int create_pack_revindex_in_memory(struct packed_git *p) { + if (git_env_bool(GIT_TEST_REV_INDEX_DIE_IN_MEMORY, 0)) + die("dying as requested by '%s'", + GIT_TEST_REV_INDEX_DIE_IN_MEMORY); if (open_pack_index(p)) return -1; create_pack_revindex(p); diff --git a/pack-revindex.h b/pack-revindex.h index d1a0595e89..ba7c82c125 100644 --- a/pack-revindex.h +++ b/pack-revindex.h @@ -21,6 +21,7 @@ #define RIDX_VERSION 1 #define GIT_TEST_WRITE_REV_INDEX "GIT_TEST_WRITE_REV_INDEX" +#define GIT_TEST_REV_INDEX_DIE_IN_MEMORY "GIT_TEST_REV_INDEX_DIE_IN_MEMORY" struct packed_git; diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh index be452bb343..a344b18d7e 100755 --- a/t/t5325-reverse-index.sh +++ b/t/t5325-reverse-index.sh @@ -85,4 +85,13 @@ test_expect_success 'pack-objects respects pack.writeReverseIndex' ' test_path_is_file pack-1-*.rev ' +test_expect_success 'reverse index is not generated when available on disk' ' + test_index_pack true && + test_path_is_file $rev && + + git rev-parse HEAD >tip && + GIT_TEST_REV_INDEX_DIE_IN_MEMORY=1 git cat-file \ + --batch-check="%(objectsize:disk)" <tip +' + test_done
When an on-disk reverse index exists, there is no need to generate one in memory. In fact, doing so can be slow, and require large amounts of the heap. Let's make sure that we treat the on-disk reverse index with precedence (i.e., that when it exists, we don't bother trying to generate an equivalent one in memory) by teaching Git how to conditionally die() when generating a reverse index in memory. Then, add a test to ensure that when (a) an on-disk reverse index exists, and (b) when setting GIT_TEST_REV_INDEX_DIE_IN_MEMORY, that we do not die, implying that we read from the on-disk one. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- pack-revindex.c | 4 ++++ pack-revindex.h | 1 + t/t5325-reverse-index.sh | 9 +++++++++ 3 files changed, 14 insertions(+)