Message ID | 20190422211952.GA4728@sigill.intra.peff.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | p5302: create the repo in each index-pack test | expand |
Jeff King <peff@peff.net> writes: > Subject: [PATCH] p5302: create the repo in each index-pack test > > The p5302 script runs "index-pack --stdin" in each timing test. It does > two things to try to get good timings: > > 1. we do the repo creation in a separate (non-timed) setup test, so > that our timing is purely the index-pack run > > 2. we use a separate repo for each test; this is important because the > presence of existing objects in the repo influences the result > (because we'll end up doing collision checks against them) > > But this forgets one thing: we generally run each timed test multiple > times to reduce the impact of noise. Which means that repeats of each > test after the first will be subject to the collision slowdown from > point 2, and we'll generally just end up taking the first time anyway. The above is very cleanly written to convince anybody that what the current test does contradicts with wish #2 above, and that the two wishes #1 and #2 are probably mutually incompatible. But isn't the collision check a part of the real-life workload that Git users are made waiting for and care about the performance of? Or are we purely interested in the cost of resolving delta, computing the object name, and writing the result out to the disk in this test and the "overall experience" benchmark is left elsewhere? The reason why I got confused is because the test_description of the script leaves "the actual effects we're interested in measuring" unsaid, I think. The log message of b8a2486f ("index-pack: support multithreaded delta resolving", 2012-05-06) that created this test does not help that much, either. In any case, the above "this forgets one thing" makes it clear that we at this point in time declare what we are interested in very clearly, and I agree that the solution described in the paragraph below clearly matches the goal. Looks good. > Instead, let's create the repo in the test (effectively undoing point > 1). That does add a constant amount of extra work to each iteration, but > it's quite small compared to the actual effects we're interested in > measuring. > Signed-off-by: Jeff King <peff@peff.net> > --- > The very first 0-thread one will run faster because it has less to "rm > -rf", but I think we can ignore that. OK. > - GIT_DIR=t1 git index-pack --threads=1 --stdin < $PACK > + rm -rf repo.git && > + git init --bare repo.git && > + GIT_DIR=repo.git git index-pack --threads=1 --stdin < $PACK This is obviously inherited from the original, but do we get scolded by some versions of bash for this line, without quoting the source path of the redirection, i.e. ... --stdin <"$PACK"
On Tue, Apr 23, 2019 at 10:09:54AM +0900, Junio C Hamano wrote: > The above is very cleanly written to convince anybody that what the > current test does contradicts with wish #2 above, and that the two > wishes #1 and #2 are probably mutually incompatible. > > But isn't the collision check a part of the real-life workload that > Git users are made waiting for and care about the performance of? > Or are we purely interested in the cost of resolving delta, > computing the object name, and writing the result out to the disk in > this test and the "overall experience" benchmark is left elsewhere? I think we _are_ just interested in the resolving delta cost (after all, we're testing it with various thread levels). What's more, the old code would run the test $GIT_PERF_COUNT times, once without any objects, and then the other N-1 times with objects. And then take the smallest time, which would generally be the one-off! So we're really just measuring that case more consistently now. But even if you left all of that aside, I think the case without objects is actually the realistic one. It represents the equivalent a full clone, where we would not have any objects already. The case where we are fetching into a repository with objects already is also potentially of interest, but this test wouldn't show that very well. Because there the main added cost is looking up each object and saying "ah, we do not have it; no need to do a collision check", because we'd generally not expect the other side to be sending us duplicates. But because this test would be repeating itself on the same pack each time, we'd be seeing a collision on _every_ object. And the added time would be dominated by us saying "oops, a collision; let's take the slow path and reconstruct that object from disk so we can compare its bytes". > The reason why I got confused is because the test_description of the > script leaves "the actual effects we're interested in measuring" > unsaid, I think. The log message of b8a2486f ("index-pack: support > multithreaded delta resolving", 2012-05-06) that created this test > does not help that much, either. > > In any case, the above "this forgets one thing" makes it clear that > we at this point in time declare what we are interested in very > clearly, and I agree that the solution described in the paragraph > below clearly matches the goal. Looks good. So I think you convinced yourself even before my email that this was the right path, but let me know if you think it's worth trying to revise the commit message to include some of the above reasoning. > > - GIT_DIR=t1 git index-pack --threads=1 --stdin < $PACK > > + rm -rf repo.git && > > + git init --bare repo.git && > > + GIT_DIR=repo.git git index-pack --threads=1 --stdin < $PACK > > This is obviously inherited from the original, but do we get scolded > by some versions of bash for this line, without quoting the source path > of the redirection, i.e. > > ... --stdin <"$PACK" In general, yes, but I think we are OK in this instance because we generated $PACK ourselves in the setup step, and we know that it is just a relative .git/objects/pack/xyz.pack with no spaces. I almost touched it just to get rid of the style-violating space after the "<" though. ;) -Peff
Jeff King <peff@peff.net> writes: >> This is obviously inherited from the original, but do we get scolded >> by some versions of bash for this line, without quoting the source path >> of the redirection, i.e. >> >> ... --stdin <"$PACK" > > In general, yes, but I think we are OK in this instance because we > generated $PACK ourselves in the setup step, and we know that it is just > a relative .git/objects/pack/xyz.pack with no spaces. I know we are OK, but the issue with some versions of bash AFAIU is that bash is not OK regardless of the contents of $variable that is not quoted and used as the target or the source of a redirection, issuing an unnecessary warning.
On Tue, Apr 23, 2019 at 11:27:02AM +0900, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > >> This is obviously inherited from the original, but do we get scolded > >> by some versions of bash for this line, without quoting the source path > >> of the redirection, i.e. > >> > >> ... --stdin <"$PACK" > > > > In general, yes, but I think we are OK in this instance because we > > generated $PACK ourselves in the setup step, and we know that it is just > > a relative .git/objects/pack/xyz.pack with no spaces. > > I know we are OK, but the issue with some versions of bash AFAIU is > that bash is not OK regardless of the contents of $variable that is > not quoted and used as the target or the source of a redirection, > issuing an unnecessary warning. Is it? I thought the issue was specifically when there were spaces. I get: $ bash $ file=ok $ echo foo >$file $ file='not ok' $ echo foo >$file bash: $file: ambiguous redirect And that is AFAIK what the recent 7951a016a5 (t4038-diff-combined: quote paths with whitespace, 2019-03-17) was about (because our trash directory always has a space in it). Did I miss a report where it happens on some versions even without spaces? If so, we have quite a number of things to fix judging from the output of: git grep '>\$' -Peff
Jeff King <peff@peff.net> writes: > Is it? I thought the issue was specifically when there were spaces. I > get: > > $ bash > $ file=ok > $ echo foo >$file > $ file='not ok' > $ echo foo >$file > bash: $file: ambiguous redirect OK, so I misremembered. Then we are good. Thanks.
diff --git a/t/perf/p5302-pack-index.sh b/t/perf/p5302-pack-index.sh index 99bdb16c85..a9b3e112d9 100755 --- a/t/perf/p5302-pack-index.sh +++ b/t/perf/p5302-pack-index.sh @@ -13,35 +13,40 @@ test_expect_success 'repack' ' export PACK ' -test_expect_success 'create target repositories' ' - for repo in t1 t2 t3 t4 t5 t6 - do - git init --bare $repo - done -' - test_perf 'index-pack 0 threads' ' - GIT_DIR=t1 git index-pack --threads=1 --stdin < $PACK + rm -rf repo.git && + git init --bare repo.git && + GIT_DIR=repo.git git index-pack --threads=1 --stdin < $PACK ' test_perf 'index-pack 1 thread ' ' - GIT_DIR=t2 GIT_FORCE_THREADS=1 git index-pack --threads=1 --stdin < $PACK + rm -rf repo.git && + git init --bare repo.git && + GIT_DIR=repo.git GIT_FORCE_THREADS=1 git index-pack --threads=1 --stdin < $PACK ' test_perf 'index-pack 2 threads' ' - GIT_DIR=t3 git index-pack --threads=2 --stdin < $PACK + rm -rf repo.git && + git init --bare repo.git && + GIT_DIR=repo.git git index-pack --threads=2 --stdin < $PACK ' test_perf 'index-pack 4 threads' ' - GIT_DIR=t4 git index-pack --threads=4 --stdin < $PACK + rm -rf repo.git && + git init --bare repo.git && + GIT_DIR=repo.git git index-pack --threads=4 --stdin < $PACK ' test_perf 'index-pack 8 threads' ' - GIT_DIR=t5 git index-pack --threads=8 --stdin < $PACK + rm -rf repo.git && + git init --bare repo.git && + GIT_DIR=repo.git git index-pack --threads=8 --stdin < $PACK ' test_perf 'index-pack default number of threads' ' - GIT_DIR=t6 git index-pack --stdin < $PACK + rm -rf repo.git && + git init --bare repo.git && + GIT_DIR=repo.git git index-pack --stdin < $PACK ' test_done