Message ID | 20181112145442.GH7400@sigill.intra.peff.net (mailing list archive)
---|---
State | New, archived
Series | caching loose objects
On 11/12/2018 9:54 AM, Jeff King wrote:
> In cases where we expect to ask has_sha1_file() about a lot of objects
> that we are not likely to have (e.g., during fetch negotiation), we
> already use OBJECT_INFO_QUICK to sacrifice accuracy (due to racing with
> a simultaneous write or repack) for speed (we avoid re-scanning the pack
> directory).
>
> However, even checking for loose objects can be expensive, as we will
> stat() each one. On many systems this cost isn't too noticeable, but
> stat() can be particularly slow on some operating systems, or due to
> network filesystems.
>
> Since the QUICK flag already tells us that we're OK with a slightly
> stale answer, we can use that as a cue to look in our in-memory cache of
> each object directory. That basically trades an in-memory binary search
> for a stat() call.
>
> Note that it is possible for this to actually be _slower_. We'll do a
> full readdir() to fill the cache, so if you have a very large number of
> loose objects and a very small number of lookups, that readdir() may end
> up more expensive.
>
> This shouldn't be a big deal in practice. If you have a large number of
> reachable loose objects, you'll already run into performance problems
> (which you should remedy by repacking). You may have unreachable objects
> which wouldn't otherwise impact performance. Usually these would go away
> with the prune step of "git gc", but they may be held for up to 2 weeks
> in the default configuration.
>
> So it comes down to how many such objects you might reasonably expect to
> have, how much slower is readdir() on N entries versus M stat() calls
> (and here we really care about the syscall backing readdir(), like
> getdents() on Linux, but I'll just call this readdir() below).
>
> If N is much smaller than M (a typical packed repo), we know this is a
> big win (few readdirs() followed by many uses of the resulting cache).
> When N and M are similar in size, it's also a win. We care about the
> latency of making a syscall, and readdir() should be giving us many
> values in a single call. How many?
>
> On Linux, running "strace -e getdents ls" shows a 32k buffer getting 512
> entries per call (which is 64 bytes per entry; the name itself is 38
> bytes, plus there are some other fields). So we can imagine that this is
> always a win as long as the number of loose objects in the repository is
> a factor of 500 less than the number of lookups you make. It's hard to
> auto-tune this because we don't generally know up front how many lookups
> we're going to do. But it's unlikely for this to perform significantly
> worse.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> There's some obvious hand-waving in the paragraphs above. I would love
> it if somebody with an NFS system could do some before/after timings
> with various numbers of loose objects, to get a sense of where the
> breakeven point is.
>
> My gut is that we do not need the complexity of a cache-size limit, nor
> of a config option to disable this. But it would be nice to have a real
> number where "reasonable" ends and "pathological" begins. :)

I'm interested in such numbers, but do not have the appropriate setup to
test. I think the tradeoffs you mention above are reasonable. There's
also some chance that this isn't "extra" work but is just "earlier"
work, as the abbreviation code would load these loose object directories.

>  object-store.h |  1 +
>  sha1-file.c    | 20 ++++++++++++++++++++
>  2 files changed, 21 insertions(+)
>
> diff --git a/object-store.h b/object-store.h
> index bf1e0cb761..60758efad8 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -13,6 +13,7 @@ struct object_directory {
>  	/*
>  	 * Used to store the results of readdir(3) calls when we are OK
>  	 * sacrificing accuracy due to races for speed. That includes
> +	 * object existence with OBJECT_INFO_QUICK, as well as
>  	 * our search for unique abbreviated hashes. Don't use it for tasks
>  	 * requiring greater accuracy!
>  	 *
> diff --git a/sha1-file.c b/sha1-file.c
> index 4aae716a37..e53da0b701 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -921,6 +921,24 @@ static int open_sha1_file(struct repository *r,
>  	return -1;
>  }
>
> +static int quick_has_loose(struct repository *r,
> +			   const unsigned char *sha1)
> +{
> +	int subdir_nr = sha1[0];
> +	struct object_id oid;
> +	struct object_directory *odb;
> +
> +	hashcpy(oid.hash, sha1);
> +
> +	prepare_alt_odb(r);
> +	for (odb = r->objects->odb; odb; odb = odb->next) {
> +		odb_load_loose_cache(odb, subdir_nr);
> +		if (oid_array_lookup(&odb->loose_objects_cache, &oid) >= 0)
> +			return 1;
> +	}
> +	return 0;
> +}
> +
>  /*
>   * Map the loose object at "path" if it is not NULL, or the path found by
>   * searching for a loose object named "sha1".
> @@ -1171,6 +1189,8 @@ static int sha1_loose_object_info(struct repository *r,
>  	if (!oi->typep && !oi->type_name && !oi->sizep && !oi->contentp) {
>  		const char *path;
>  		struct stat st;
> +		if (!oi->disk_sizep && (flags & OBJECT_INFO_QUICK))
> +			return quick_has_loose(r, sha1) ? 0 : -1;
>  		if (stat_sha1_file(r, sha1, &st, &path) < 0)
>  			return -1;
>  		if (oi->disk_sizep)

LGTM.

Thanks,
-Stolee
On Mon, Nov 12 2018, Jeff King wrote:

> In cases where we expect to ask has_sha1_file() about a lot of objects
> that we are not likely to have (e.g., during fetch negotiation), we
> already use OBJECT_INFO_QUICK to sacrifice accuracy (due to racing with
> a simultaneous write or repack) for speed (we avoid re-scanning the pack
> directory).
>
> However, even checking for loose objects can be expensive, as we will
> stat() each one. On many systems this cost isn't too noticeable, but
> stat() can be particularly slow on some operating systems, or due to
> network filesystems.
>
> Since the QUICK flag already tells us that we're OK with a slightly
> stale answer, we can use that as a cue to look in our in-memory cache of
> each object directory. That basically trades an in-memory binary search
> for a stat() call.
>
> Note that it is possible for this to actually be _slower_. We'll do a
> full readdir() to fill the cache, so if you have a very large number of
> loose objects and a very small number of lookups, that readdir() may end
> up more expensive.
>
> This shouldn't be a big deal in practice. If you have a large number of
> reachable loose objects, you'll already run into performance problems
> (which you should remedy by repacking). You may have unreachable objects
> which wouldn't otherwise impact performance. Usually these would go away
> with the prune step of "git gc", but they may be held for up to 2 weeks
> in the default configuration.
>
> So it comes down to how many such objects you might reasonably expect to
> have, how much slower is readdir() on N entries versus M stat() calls
> (and here we really care about the syscall backing readdir(), like
> getdents() on Linux, but I'll just call this readdir() below).
>
> If N is much smaller than M (a typical packed repo), we know this is a
> big win (few readdirs() followed by many uses of the resulting cache).
> When N and M are similar in size, it's also a win. We care about the
> latency of making a syscall, and readdir() should be giving us many
> values in a single call. How many?
>
> On Linux, running "strace -e getdents ls" shows a 32k buffer getting 512
> entries per call (which is 64 bytes per entry; the name itself is 38
> bytes, plus there are some other fields). So we can imagine that this is
> always a win as long as the number of loose objects in the repository is
> a factor of 500 less than the number of lookups you make. It's hard to
> auto-tune this because we don't generally know up front how many lookups
> we're going to do. But it's unlikely for this to perform significantly
> worse.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> There's some obvious hand-waving in the paragraphs above. I would love
> it if somebody with an NFS system could do some before/after timings
> with various numbers of loose objects, to get a sense of where the
> breakeven point is.
>
> My gut is that we do not need the complexity of a cache-size limit, nor
> of a config option to disable this. But it would be nice to have a real
> number where "reasonable" ends and "pathological" begins. :)

I'm happy to test this on some of the NFS we have locally, and started
out with a plan to write some for-loop using the low-level API (so it
would look up all 256), fake populate .git/objects/?? with N number of
objects etc, but ran out of time.

Do you have something ready that you think would be representative and I
could just run? If not I'll try to pick this up again...
On Mon, Nov 12, 2018 at 05:01:02PM +0100, Ævar Arnfjörð Bjarmason wrote:

> > There's some obvious hand-waving in the paragraphs above. I would love
> > it if somebody with an NFS system could do some before/after timings
> > with various numbers of loose objects, to get a sense of where the
> > breakeven point is.
> >
> > My gut is that we do not need the complexity of a cache-size limit, nor
> > of a config option to disable this. But it would be nice to have a real
> > number where "reasonable" ends and "pathological" begins. :)
>
> I'm happy to test this on some of the NFS we have locally, and started
> out with a plan to write some for-loop using the low-level API (so it
> would look up all 256), fake populate .git/objects/?? with N number of
> objects etc, but ran out of time.
>
> Do you have something ready that you think would be representative and I
> could just run? If not I'll try to pick this up again...

No, but they don't even really need to be actual objects. So I suspect
something like:

  git init
  for i in $(seq 256); do
    i=$(printf %02x $i)
    mkdir -p .git/objects/$i
    for j in $(seq --format=%038g 1000); do
      echo foo >.git/objects/$i/$j
    done
  done
  git index-pack -v --stdin </path/to/git.git/objects/pack/XYZ.pack

might work (for various values of 1000). The shell loop would probably
be faster as perl, too. :)

Make sure you clear the object directory between runs, though (otherwise
the subsequent index-pack's really do find collisions and spend time
accessing the objects).

If you want real objects, you could probably just dump a bunch of
sequential blobs to fast-import, and then pipe the result to
unpack-objects.

-Peff
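Peff suggests the shell loop would be faster in perl; as a sketch under the same assumptions (empty files with 38-character zero-padded names standing in for loose objects, one fanout directory per first byte), a compiled version could look like the following. `populate_fake_loose` is a hypothetical helper for benchmarking only, not a git API.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Create objects/00 .. objects/ff style fanout directories under
 * "root" and fill each with "count" empty files whose names are
 * zero-padded to 38 characters, mimicking the shell loop above.
 * (Sketch for benchmarking; the files are not real git objects.)
 */
static int populate_fake_loose(const char *root, int count)
{
	char path[4096];
	int i, j;

	for (i = 0; i < 256; i++) {
		snprintf(path, sizeof(path), "%s/%02x", root, i);
		if (mkdir(path, 0777) && errno != EEXIST)
			return -1;
		for (j = 0; j < count; j++) {
			FILE *f;
			/* %038d pads to the 38 hex digits after the fanout byte */
			snprintf(path, sizeof(path), "%s/%02x/%038d", root, i, j);
			f = fopen(path, "w");
			if (!f)
				return -1;
			fclose(f);
		}
	}
	return 0;
}
```

Unlike `seq --format=%038g 1000`, which counts from 1, this counts from 0; for populating a benchmark directory the difference should not matter.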
On Mon, Nov 12 2018, Jeff King wrote:

> On Mon, Nov 12, 2018 at 05:01:02PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> > There's some obvious hand-waving in the paragraphs above. I would love
>> > it if somebody with an NFS system could do some before/after timings
>> > with various numbers of loose objects, to get a sense of where the
>> > breakeven point is.
>> >
>> > My gut is that we do not need the complexity of a cache-size limit, nor
>> > of a config option to disable this. But it would be nice to have a real
>> > number where "reasonable" ends and "pathological" begins. :)
>>
>> I'm happy to test this on some of the NFS we have locally, and started
>> out with a plan to write some for-loop using the low-level API (so it
>> would look up all 256), fake populate .git/objects/?? with N number of
>> objects etc, but ran out of time.
>>
>> Do you have something ready that you think would be representative and I
>> could just run? If not I'll try to pick this up again...
>
> No, but they don't even really need to be actual objects. So I suspect
> something like:
>
>   git init
>   for i in $(seq 256); do
>     i=$(printf %02x $i)
>     mkdir -p .git/objects/$i
>     for j in $(seq --format=%038g 1000); do
>       echo foo >.git/objects/$i/$j
>     done
>   done
>   git index-pack -v --stdin </path/to/git.git/objects/pack/XYZ.pack
>
> might work (for various values of 1000). The shell loop would probably
> be faster as perl, too. :)
>
> Make sure you clear the object directory between runs, though (otherwise
> the subsequent index-pack's really do find collisions and spend time
> accessing the objects).
>
> If you want real objects, you could probably just dump a bunch of
> sequential blobs to fast-import, and then pipe the result to
> unpack-objects.
>
> -Peff

I did a very ad-hoc test against a NetApp filer using the test script
quoted at the end of this E-Mail. The test compared origin/master, this
branch of yours, and my core.checkCollisions=false branch.

When run with DBD-mysql.git (just some random ~1k commit repo I had):

  $ GIT_PERF_REPEAT_COUNT=3 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run origin/master peff/jk/loose-cache avar/check-collisions-config p0008-index-pack.sh

I get:

Test                                            origin/master    peff/jk/loose-cache      avar/check-collisions-config
------------------------------------------------------------------------------------------------------------------------
0008.2: index-pack with 256*1 loose objects     4.31(0.55+0.18)  0.41(0.40+0.02) -90.5%   0.23(0.36+0.01) -94.7%
0008.3: index-pack with 256*10 loose objects    4.37(0.45+0.21)  0.45(0.40+0.02) -89.7%   0.25(0.38+0.01) -94.3%
0008.4: index-pack with 256*100 loose objects   4.47(0.53+0.23)  0.67(0.63+0.02) -85.0%   0.24(0.38+0.01) -94.6%
0008.5: index-pack with 256*250 loose objects   5.01(0.67+0.30)  1.04(0.98+0.06) -79.2%   0.24(0.37+0.01) -95.2%
0008.6: index-pack with 256*500 loose objects   5.11(0.57+0.21)  1.81(1.70+0.09) -64.6%   0.25(0.38+0.01) -95.1%
0008.7: index-pack with 256*750 loose objects   5.12(0.60+0.22)  2.54(2.38+0.14) -50.4%   0.24(0.38+0.01) -95.3%
0008.8: index-pack with 256*1000 loose objects  4.52(0.52+0.21)  3.36(3.17+0.17) -25.7%   0.23(0.36+0.01) -94.9%

I then hacked it to test against git.git, but skipped origin/master for
that one because it takes *ages*. So just mine v.s. yours:

  $ GIT_PERF_REPEAT_COUNT=3 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run peff/jk/loose-cache avar/check-collisions-config p0008-index-pack.sh
  [...]

Test                                            peff/jk/loose-cache  avar/check-collisions-config
---------------------------------------------------------------------------------------------------
0008.2: index-pack with 256*1 loose objects     12.57(28.72+0.61)    12.68(29.36+0.62) +0.9%
0008.3: index-pack with 256*10 loose objects    12.77(28.75+0.61)    12.50(28.88+0.56) -2.1%
0008.4: index-pack with 256*100 loose objects   13.20(29.49+0.66)    12.38(28.58+0.60) -6.2%
0008.5: index-pack with 256*250 loose objects   14.10(30.59+0.64)    12.54(28.22+0.57) -11.1%
0008.6: index-pack with 256*500 loose objects   14.48(31.06+0.74)    12.43(28.59+0.60) -14.2%
0008.7: index-pack with 256*750 loose objects   15.31(31.91+0.74)    12.67(29.23+0.64) -17.2%
0008.8: index-pack with 256*1000 loose objects  16.34(32.84+0.76)    13.11(30.19+0.68) -19.8%

So not much of a practical difference perhaps. But then again this isn't
a very realistic test case of anything. Rarely are you going to push a
history of something the size of git.git into a repo with this many
loose objects.

Using sha1collisiondetection.git is I think the most realistic scenario,
i.e. you'll often end up fetching/pushing something roughly the size of
its entire history on a big repo, and with it:

Test                                            peff/jk/loose-cache  avar/check-collisions-config
---------------------------------------------------------------------------------------------------
0008.2: index-pack with 256*1 loose objects     0.16(0.04+0.01)      0.05(0.03+0.00) -68.8%
0008.3: index-pack with 256*10 loose objects    0.19(0.04+0.02)      0.05(0.02+0.00) -73.7%
0008.4: index-pack with 256*100 loose objects   0.32(0.17+0.02)      0.04(0.02+0.00) -87.5%
0008.5: index-pack with 256*250 loose objects   0.57(0.41+0.03)      0.04(0.02+0.00) -93.0%
0008.6: index-pack with 256*500 loose objects   1.02(0.83+0.06)      0.04(0.03+0.00) -96.1%
0008.7: index-pack with 256*750 loose objects   1.47(1.24+0.10)      0.04(0.02+0.00) -97.3%
0008.8: index-pack with 256*1000 loose objects  1.94(1.70+0.10)      0.04(0.02+0.00) -97.9%

As noted in previous threads I have an in-house monorepo where (due to
expiry policies) loose objects hover around the 256*250 mark.

The script, which is hacky as hell and takes shortcuts not to re-create
the huge fake loose object collection every time (takes ages). Perhaps
you're interested in incorporating some version of this into a v2. To be
useful it should take some target path as an env variable.

$ cat t/perf/p0008-index-pack.sh
#!/bin/sh

test_description="Tests performance of index-pack with loose objects"

. ./perf-lib.sh

test_perf_fresh_repo

test_expect_success 'setup tests' '
	for count in 1 10 100 250 500 750 1000
	do
		if test -d /mnt/ontap_githackers/repo-$count.git
		then
			rm -rf /mnt/ontap_githackers/repo-$count.git/objects/pack
		else
			git init --bare /mnt/ontap_githackers/repo-$count.git &&
			(
				cd /mnt/ontap_githackers/repo-$count.git &&
				for i in $(seq 0 255)
				do
					i=$(printf %02x $i) &&
					mkdir objects/$i &&
					for j in $(seq --format=%038g $count)
					do
						>objects/$i/$j
					done
				done
			)
		fi
	done
'

for count in 1 10 100 250 500 750 1000
do
	echo 3 | sudo tee /proc/sys/vm/drop_caches
	test_perf "index-pack with 256*$count loose objects" "
		(
			cd /mnt/ontap_githackers/repo-$count.git &&
			rm -fv objects/pack/*;
			git -c core.checkCollisions=false index-pack -v --stdin </home/aearnfjord/g/DBD-mysql/.git/objects/pack/pack-*.pack
		)
	"
done

test_done
On Mon, Nov 12 2018, Ævar Arnfjörð Bjarmason wrote:

> On Mon, Nov 12 2018, Jeff King wrote:
>
>> On Mon, Nov 12, 2018 at 05:01:02PM +0100, Ævar Arnfjörð Bjarmason wrote:
>>
>>> > There's some obvious hand-waving in the paragraphs above. I would love
>>> > it if somebody with an NFS system could do some before/after timings
>>> > with various numbers of loose objects, to get a sense of where the
>>> > breakeven point is.
>>> >
>>> > My gut is that we do not need the complexity of a cache-size limit, nor
>>> > of a config option to disable this. But it would be nice to have a real
>>> > number where "reasonable" ends and "pathological" begins. :)
>>>
>>> I'm happy to test this on some of the NFS we have locally, and started
>>> out with a plan to write some for-loop using the low-level API (so it
>>> would look up all 256), fake populate .git/objects/?? with N number of
>>> objects etc, but ran out of time.
>>>
>>> Do you have something ready that you think would be representative and I
>>> could just run? If not I'll try to pick this up again...
>>
>> No, but they don't even really need to be actual objects. So I suspect
>> something like:
>>
>>   git init
>>   for i in $(seq 256); do
>>     i=$(printf %02x $i)
>>     mkdir -p .git/objects/$i
>>     for j in $(seq --format=%038g 1000); do
>>       echo foo >.git/objects/$i/$j
>>     done
>>   done
>>   git index-pack -v --stdin </path/to/git.git/objects/pack/XYZ.pack
>>
>> might work (for various values of 1000). The shell loop would probably
>> be faster as perl, too. :)
>>
>> Make sure you clear the object directory between runs, though (otherwise
>> the subsequent index-pack's really do find collisions and spend time
>> accessing the objects).
>>
>> If you want real objects, you could probably just dump a bunch of
>> sequential blobs to fast-import, and then pipe the result to
>> unpack-objects.
>>
>> -Peff
>
> I did a very ad-hoc test against a NetApp filer using the test script
> quoted at the end of this E-Mail. The test compared origin/master, this
> branch of yours, and my core.checkCollisions=false branch.
>
> When run with DBD-mysql.git (just some random ~1k commit repo I had):
>
>   $ GIT_PERF_REPEAT_COUNT=3 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run origin/master peff/jk/loose-cache avar/check-collisions-config p0008-index-pack.sh
>
> I get:
>
> Test                                            origin/master    peff/jk/loose-cache      avar/check-collisions-config
> ------------------------------------------------------------------------------------------------------------------------
> 0008.2: index-pack with 256*1 loose objects     4.31(0.55+0.18)  0.41(0.40+0.02) -90.5%   0.23(0.36+0.01) -94.7%
> 0008.3: index-pack with 256*10 loose objects    4.37(0.45+0.21)  0.45(0.40+0.02) -89.7%   0.25(0.38+0.01) -94.3%
> 0008.4: index-pack with 256*100 loose objects   4.47(0.53+0.23)  0.67(0.63+0.02) -85.0%   0.24(0.38+0.01) -94.6%
> 0008.5: index-pack with 256*250 loose objects   5.01(0.67+0.30)  1.04(0.98+0.06) -79.2%   0.24(0.37+0.01) -95.2%
> 0008.6: index-pack with 256*500 loose objects   5.11(0.57+0.21)  1.81(1.70+0.09) -64.6%   0.25(0.38+0.01) -95.1%
> 0008.7: index-pack with 256*750 loose objects   5.12(0.60+0.22)  2.54(2.38+0.14) -50.4%   0.24(0.38+0.01) -95.3%
> 0008.8: index-pack with 256*1000 loose objects  4.52(0.52+0.21)  3.36(3.17+0.17) -25.7%   0.23(0.36+0.01) -94.9%
>
> I then hacked it to test against git.git, but skipped origin/master for
> that one because it takes *ages*. So just mine v.s. yours:
>
>   $ GIT_PERF_REPEAT_COUNT=3 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run peff/jk/loose-cache avar/check-collisions-config p0008-index-pack.sh
>   [...]
>
> Test                                            peff/jk/loose-cache  avar/check-collisions-config
> ---------------------------------------------------------------------------------------------------
> 0008.2: index-pack with 256*1 loose objects     12.57(28.72+0.61)    12.68(29.36+0.62) +0.9%
> 0008.3: index-pack with 256*10 loose objects    12.77(28.75+0.61)    12.50(28.88+0.56) -2.1%
> 0008.4: index-pack with 256*100 loose objects   13.20(29.49+0.66)    12.38(28.58+0.60) -6.2%
> 0008.5: index-pack with 256*250 loose objects   14.10(30.59+0.64)    12.54(28.22+0.57) -11.1%
> 0008.6: index-pack with 256*500 loose objects   14.48(31.06+0.74)    12.43(28.59+0.60) -14.2%
> 0008.7: index-pack with 256*750 loose objects   15.31(31.91+0.74)    12.67(29.23+0.64) -17.2%
> 0008.8: index-pack with 256*1000 loose objects  16.34(32.84+0.76)    13.11(30.19+0.68) -19.8%
>
> So not much of a practical difference perhaps. But then again this isn't
> a very realistic test case of anything. Rarely are you going to push a
> history of something the size of git.git into a repo with this many
> loose objects.
>
> Using sha1collisiondetection.git is I think the most realistic scenario,
> i.e. you'll often end up fetching/pushing something roughly the size of
> its entire history on a big repo, and with it:
>
> Test                                            peff/jk/loose-cache  avar/check-collisions-config
> ---------------------------------------------------------------------------------------------------
> 0008.2: index-pack with 256*1 loose objects     0.16(0.04+0.01)      0.05(0.03+0.00) -68.8%
> 0008.3: index-pack with 256*10 loose objects    0.19(0.04+0.02)      0.05(0.02+0.00) -73.7%
> 0008.4: index-pack with 256*100 loose objects   0.32(0.17+0.02)      0.04(0.02+0.00) -87.5%
> 0008.5: index-pack with 256*250 loose objects   0.57(0.41+0.03)      0.04(0.02+0.00) -93.0%
> 0008.6: index-pack with 256*500 loose objects   1.02(0.83+0.06)      0.04(0.03+0.00) -96.1%
> 0008.7: index-pack with 256*750 loose objects   1.47(1.24+0.10)      0.04(0.02+0.00) -97.3%
> 0008.8: index-pack with 256*1000 loose objects  1.94(1.70+0.10)      0.04(0.02+0.00) -97.9%
>
> As noted in previous threads I have an in-house monorepo where (due to
> expiry policies) loose objects hover around the 256*250 mark.
>
> The script, which is hacky as hell and takes shortcuts not to re-create
> the huge fake loose object collection every time (takes ages). Perhaps
> you're interested in incorporating some version of this into a v2. To be
> useful it should take some target path as an env variable.

I forgot perhaps the most useful metric. Testing against origin/master
too on the sha1collisiondetection.git repo, which as noted above I think
is a good stand-in for making a medium sized push to a big repo. This
shows when the loose cache becomes counterproductive:

Test                                            origin/master    peff/jk/loose-cache       avar/check-collisions-config
-------------------------------------------------------------------------------------------------------------------------
0008.2: index-pack with 256*1 loose objects     0.42(0.04+0.03)  0.17(0.04+0.00) -59.5%    0.04(0.03+0.00) -90.5%
0008.3: index-pack with 256*10 loose objects    0.49(0.04+0.03)  0.19(0.04+0.01) -61.2%    0.04(0.02+0.00) -91.8%
0008.4: index-pack with 256*100 loose objects   0.49(0.04+0.04)  0.33(0.18+0.01) -32.7%    0.05(0.02+0.00) -89.8%
0008.5: index-pack with 256*250 loose objects   0.54(0.03+0.04)  0.59(0.43+0.02) +9.3%     0.04(0.02+0.01) -92.6%
0008.6: index-pack with 256*500 loose objects   0.49(0.04+0.03)  1.04(0.83+0.07) +112.2%   0.04(0.02+0.00) -91.8%
0008.7: index-pack with 256*750 loose objects   0.56(0.04+0.05)  1.50(1.28+0.08) +167.9%   0.04(0.02+0.00) -92.9%
0008.8: index-pack with 256*1000 loose objects  0.54(0.05+0.03)  1.95(1.68+0.13) +261.1%   0.04(0.02+0.00) -92.6%

I still think it's best to take this patch series since it's unlikely
we're making anything worse in practice, the >50k objects case is a
really high number, which I don't think is worth worrying about.

But I am somewhat paranoid about the potential performance
regression. I.e. this is me testing against a really expensive and
relatively well performing NetApp NFS device where the ping stats are:

    rtt min/avg/max/mdev = 0.155/0.396/1.387/0.349 ms

So I suspect this might get a lot worse for setups which don't enjoy the
same performance or network locality.

> $ cat t/perf/p0008-index-pack.sh
> #!/bin/sh
>
> test_description="Tests performance of index-pack with loose objects"
>
> . ./perf-lib.sh
>
> test_perf_fresh_repo
>
> test_expect_success 'setup tests' '
> 	for count in 1 10 100 250 500 750 1000
> 	do
> 		if test -d /mnt/ontap_githackers/repo-$count.git
> 		then
> 			rm -rf /mnt/ontap_githackers/repo-$count.git/objects/pack
> 		else
> 			git init --bare /mnt/ontap_githackers/repo-$count.git &&
> 			(
> 				cd /mnt/ontap_githackers/repo-$count.git &&
> 				for i in $(seq 0 255)
> 				do
> 					i=$(printf %02x $i) &&
> 					mkdir objects/$i &&
> 					for j in $(seq --format=%038g $count)
> 					do
> 						>objects/$i/$j
> 					done
> 				done
> 			)
> 		fi
> 	done
> '
>
> for count in 1 10 100 250 500 750 1000
> do
> 	echo 3 | sudo tee /proc/sys/vm/drop_caches
> 	test_perf "index-pack with 256*$count loose objects" "
> 		(
> 			cd /mnt/ontap_githackers/repo-$count.git &&
> 			rm -fv objects/pack/*;
> 			git -c core.checkCollisions=false index-pack -v --stdin </home/aearnfjord/g/DBD-mysql/.git/objects/pack/pack-*.pack
> 		)
> 	"
> done
>
> test_done
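The breakeven reasoning quoted in this thread (one getdents() call returns roughly 512 directory entries from a 32k buffer, while the uncached path pays one stat() per lookup) can be restated as a toy syscall-count model. This is only my arithmetic from the numbers above, not a measurement; on NFS the per-syscall round trip dominates, which is exactly why the counts matter:

```c
#include <assert.h>

/*
 * Toy syscall-count model (my restatement of the thread's estimate,
 * not measured): filling the cache over N loose objects costs about
 * ceil(N / 512) getdents() calls; the uncached path costs one stat()
 * per lookup. The cache wins roughly when lookups M > N / 512.
 */
#define ENTRIES_PER_GETDENTS 512L

static long cached_syscalls(long n_loose)
{
	/* round up: even one entry costs a getdents() call */
	return (n_loose + ENTRIES_PER_GETDENTS - 1) / ENTRIES_PER_GETDENTS;
}

static long uncached_syscalls(long m_lookups)
{
	return m_lookups; /* one stat() per probed object */
}
```

For the 256*1000 worst case above, the model predicts only ~500 getdents() calls, so the regressions Ævar measured come from the work of reading and sorting 256k names, not from syscall count alone.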
On Mon, Nov 12, 2018 at 11:21:51AM -0500, Jeff King wrote:

> No, but they don't even really need to be actual objects. So I suspect
> something like:
>
>   git init
>   for i in $(seq 256); do
>     i=$(printf %02x $i)
>     mkdir -p .git/objects/$i
>     for j in $(seq --format=%038g 1000); do
>       echo foo >.git/objects/$i/$j
>     done
>   done
>   git index-pack -v --stdin </path/to/git.git/objects/pack/XYZ.pack
>
> might work (for various values of 1000). The shell loop would probably
> be faster as perl, too. :)
>
> Make sure you clear the object directory between runs, though (otherwise
> the subsequent index-pack's really do find collisions and spend time
> accessing the objects).

Below are my results. They are not as comprehensive as Ævar's tests.
Similarly I kept the loose objects between tests and removed the packs
instead. And I also used the "echo 3 | sudo tee /proc/sys/vm/drop_caches"
trick :)

This is with git.git:

                     origin/master   jk/loose-object-cache
256*100 objects      520s            13.5s (-97%)
256*1000 objects     826s            59s   (-93%)

I've started a 256*10K setup but that's still creating the 2.5M loose
objects. I'll post the results when it's done. I would expect that
jk/loose-object-cache is still marginally faster than origin/master
based on a simple linear extrapolation.
On Mon, Nov 12 2018, Ævar Arnfjörð Bjarmason wrote: > On Mon, Nov 12 2018, Ævar Arnfjörð Bjarmason wrote: > >> On Mon, Nov 12 2018, Jeff King wrote: >> >>> On Mon, Nov 12, 2018 at 05:01:02PM +0100, Ævar Arnfjörð Bjarmason wrote: >>> >>>> > There's some obvious hand-waving in the paragraphs above. I would love >>>> > it if somebody with an NFS system could do some before/after timings >>>> > with various numbers of loose objects, to get a sense of where the >>>> > breakeven point is. >>>> > >>>> > My gut is that we do not need the complexity of a cache-size limit, nor >>>> > of a config option to disable this. But it would be nice to have a real >>>> > number where "reasonable" ends and "pathological" begins. :) >>>> >>>> I'm happy to test this on some of the NFS we have locally, and started >>>> out with a plan to write some for-loop using the low-level API (so it >>>> would look up all 256), fake populate .git/objects/?? with N number of >>>> objects etc, but ran out of time. >>>> >>>> Do you have something ready that you think would be representative and I >>>> could just run? If not I'll try to pick this up again... >>> >>> No, but they don't even really need to be actual objects. So I suspect >>> something like: >>> >>> git init >>> for i in $(seq 256); do >>> i=$(printf %02x $i) >>> mkdir -p .git/objects/$i >>> for j in $(seq --format=%038g 1000); do >>> echo foo >.git/objects/$i/$j >>> done >>> done >>> git index-pack -v --stdin </path/to/git.git/objects/pack/XYZ.pack >>> >>> might work (for various values of 1000). The shell loop would probably >>> be faster as perl, too. :) >>> >>> Make sure you clear the object directory between runs, though (otherwise >>> the subsequent index-pack's really do find collisions and spend time >>> accessing the objects). >>> >>> If you want real objects, you could probably just dump a bunch of >>> sequential blobs to fast-import, and then pipe the result to >>> unpack-objects. 
>>> >>> -Peff >> >> I did a very ad-hoc test against a NetApp filer using the test script >> quoted at the end of this E-Mail. The test compared origin/master, this >> branch of yours, and my core.checkCollisions=false branch. >> >> When run with DBD-mysql.git (just some random ~1k commit repo I had): >> >> $ GIT_PERF_REPEAT_COUNT=3 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run origin/master peff/jk/loose-cache avar/check-collisions-config p0008-index-pack.sh >> >> I get: >> >> Test origin/master peff/jk/loose-cache avar/check-collisions-config >> ------------------------------------------------------------------------------------------------------------------------ >> 0008.2: index-pack with 256*1 loose objects 4.31(0.55+0.18) 0.41(0.40+0.02) -90.5% 0.23(0.36+0.01) -94.7% >> 0008.3: index-pack with 256*10 loose objects 4.37(0.45+0.21) 0.45(0.40+0.02) -89.7% 0.25(0.38+0.01) -94.3% >> 0008.4: index-pack with 256*100 loose objects 4.47(0.53+0.23) 0.67(0.63+0.02) -85.0% 0.24(0.38+0.01) -94.6% >> 0008.5: index-pack with 256*250 loose objects 5.01(0.67+0.30) 1.04(0.98+0.06) -79.2% 0.24(0.37+0.01) -95.2% >> 0008.6: index-pack with 256*500 loose objects 5.11(0.57+0.21) 1.81(1.70+0.09) -64.6% 0.25(0.38+0.01) -95.1% >> 0008.7: index-pack with 256*750 loose objects 5.12(0.60+0.22) 2.54(2.38+0.14) -50.4% 0.24(0.38+0.01) -95.3% >> 0008.8: index-pack with 256*1000 loose objects 4.52(0.52+0.21) 3.36(3.17+0.17) -25.7% 0.23(0.36+0.01) -94.9% >> >> I then hacked it to test against git.git, but skipped origin/master for >> that one because it takes *ages*. So just mine v.s. yours: >> >> $ GIT_PERF_REPEAT_COUNT=3 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run peff/jk/loose-cache avar/check-collisions-config p0008-index-pack.sh >> [...] 
>> Test                                           peff/jk/loose-cache  avar/check-collisions-config
>> ---------------------------------------------------------------------------------------------------
>> 0008.2: index-pack with 256*1 loose objects    12.57(28.72+0.61)    12.68(29.36+0.62) +0.9%
>> 0008.3: index-pack with 256*10 loose objects   12.77(28.75+0.61)    12.50(28.88+0.56) -2.1%
>> 0008.4: index-pack with 256*100 loose objects  13.20(29.49+0.66)    12.38(28.58+0.60) -6.2%
>> 0008.5: index-pack with 256*250 loose objects  14.10(30.59+0.64)    12.54(28.22+0.57) -11.1%
>> 0008.6: index-pack with 256*500 loose objects  14.48(31.06+0.74)    12.43(28.59+0.60) -14.2%
>> 0008.7: index-pack with 256*750 loose objects  15.31(31.91+0.74)    12.67(29.23+0.64) -17.2%
>> 0008.8: index-pack with 256*1000 loose objects 16.34(32.84+0.76)    13.11(30.19+0.68) -19.8%
>>
>> So not much of a practical difference perhaps. But then again this isn't
>> a very realistic test case of anything. Rarely are you going to push a
>> history of something the size of git.git into a repo with this many
>> loose objects.
>>
>> Using sha1collisiondetection.git is I think the most realistic scenario,
>> i.e. you'll often end up fetching/pushing something roughly the size of
>> its entire history on a big repo, and with it:
>>
>> Test                                           peff/jk/loose-cache  avar/check-collisions-config
>> ---------------------------------------------------------------------------------------------------
>> 0008.2: index-pack with 256*1 loose objects    0.16(0.04+0.01)      0.05(0.03+0.00) -68.8%
>> 0008.3: index-pack with 256*10 loose objects   0.19(0.04+0.02)      0.05(0.02+0.00) -73.7%
>> 0008.4: index-pack with 256*100 loose objects  0.32(0.17+0.02)      0.04(0.02+0.00) -87.5%
>> 0008.5: index-pack with 256*250 loose objects  0.57(0.41+0.03)      0.04(0.02+0.00) -93.0%
>> 0008.6: index-pack with 256*500 loose objects  1.02(0.83+0.06)      0.04(0.03+0.00) -96.1%
>> 0008.7: index-pack with 256*750 loose objects  1.47(1.24+0.10)      0.04(0.02+0.00) -97.3%
>> 0008.8: index-pack with 256*1000 loose objects 1.94(1.70+0.10)      0.04(0.02+0.00) -97.9%
>>
>> As noted in previous threads I have an in-house monorepo where (due to
>> expiry policies) loose objects hover around the 256*250 mark.
>>
>> The script, which is hacky as hell and takes shortcuts not to re-create
>> the huge fake loose object collection every time (takes ages). Perhaps
>> you're interested in incorporating some version of this into a v2. To be
>> useful it should take some target path as an env variable.
>
> I forgot perhaps the most useful metric. Testing against origin/master
> too on the sha1collisiondetection.git repo, which as noted above I think
> is a good stand-in for making a medium sized push to a big repo. This
> shows when the loose cache becomes counterproductive:
>
> Test                                           origin/master    peff/jk/loose-cache      avar/check-collisions-config
> -------------------------------------------------------------------------------------------------------------------------
> 0008.2: index-pack with 256*1 loose objects    0.42(0.04+0.03)  0.17(0.04+0.00) -59.5%   0.04(0.03+0.00) -90.5%
> 0008.3: index-pack with 256*10 loose objects   0.49(0.04+0.03)  0.19(0.04+0.01) -61.2%   0.04(0.02+0.00) -91.8%
> 0008.4: index-pack with 256*100 loose objects  0.49(0.04+0.04)  0.33(0.18+0.01) -32.7%   0.05(0.02+0.00) -89.8%
> 0008.5: index-pack with 256*250 loose objects  0.54(0.03+0.04)  0.59(0.43+0.02) +9.3%    0.04(0.02+0.01) -92.6%
> 0008.6: index-pack with 256*500 loose objects  0.49(0.04+0.03)  1.04(0.83+0.07) +112.2%  0.04(0.02+0.00) -91.8%
> 0008.7: index-pack with 256*750 loose objects  0.56(0.04+0.05)  1.50(1.28+0.08) +167.9%  0.04(0.02+0.00) -92.9%
> 0008.8: index-pack with 256*1000 loose objects 0.54(0.05+0.03)  1.95(1.68+0.13) +261.1%  0.04(0.02+0.00) -92.6%
>
> I still think it's best to take this patch series since it's unlikely
> we're making anything worse in practice, the >50k objects case is a
> really high number, which I don't think is worth worrying about.
>
> But I am somewhat paranoid about the potential performance
> regression. I.e. this is me testing against a really expensive and
> relatively well performing NetApp NFS device where the ping stats are:
>
> rtt min/avg/max/mdev = 0.155/0.396/1.387/0.349 ms
>
> So I suspect this might get a lot worse for setups which don't enjoy the
> same performance or network locality.

I tried this with the same filer mounted from another DC with ~10x the
RTT:

rtt min/avg/max/mdev = 11.553/11.618/11.739/0.121 ms

But otherwise the same setup (same machine type/specs mounting it). It
had the opposite results of what I was expecting:

Test                                           origin/master    peff/jk/loose-cache     avar/check-collisions-config
------------------------------------------------------------------------------------------------------------------------
0008.2: index-pack with 256*1 loose objects    7.78(0.04+0.03)  2.75(0.03+0.01) -64.7%  0.40(0.02+0.00) -94.9%
0008.3: index-pack with 256*10 loose objects   7.75(0.04+0.04)  2.77(0.05+0.01) -64.3%  0.40(0.02+0.00) -94.8%
0008.4: index-pack with 256*100 loose objects  7.75(0.05+0.02)  2.91(0.18+0.01) -62.5%  0.40(0.02+0.00) -94.8%
0008.5: index-pack with 256*250 loose objects  7.73(0.04+0.04)  3.19(0.43+0.02) -58.7%  0.40(0.02+0.00) -94.8%
0008.6: index-pack with 256*500 loose objects  7.73(0.04+0.04)  3.64(0.83+0.05) -52.9%  0.40(0.02+0.00) -94.8%
0008.7: index-pack with 256*750 loose objects  7.73(0.04+0.02)  4.14(1.29+0.07) -46.4%  0.40(0.02+0.00) -94.8%
0008.8: index-pack with 256*1000 loose objects 7.73(0.04+0.03)  4.55(1.72+0.09) -41.1%  0.40(0.02+0.01) -94.8%

I.e. there the cliff of where the cache becomes counterproductive comes
much later, not earlier. The sha1collisiondetection.git repo has 418
objects. So is it cheaper to fill a huge cache than look up those 418? I
don't know, haven't dug. But so far what this suggests is that we're
helping slow FSs to the detriment of faster ones.
So here's the same test not against NFS, but the local ext4 fs (CO7;
Linux 3.10) for sha1collisiondetection.git:

Test                                           origin/master    peff/jk/loose-cache       avar/check-collisions-config
--------------------------------------------------------------------------------------------------------------------------
0008.2: index-pack with 256*1 loose objects    0.02(0.02+0.00)  0.02(0.02+0.01) +0.0%     0.02(0.02+0.00) +0.0%
0008.3: index-pack with 256*10 loose objects   0.02(0.02+0.00)  0.03(0.03+0.00) +50.0%    0.02(0.02+0.00) +0.0%
0008.4: index-pack with 256*100 loose objects  0.02(0.02+0.00)  0.17(0.16+0.01) +750.0%   0.02(0.02+0.00) +0.0%
0008.5: index-pack with 256*250 loose objects  0.02(0.02+0.00)  0.43(0.40+0.03) +2050.0%  0.02(0.02+0.00) +0.0%
0008.6: index-pack with 256*500 loose objects  0.02(0.02+0.00)  0.88(0.80+0.09) +4300.0%  0.02(0.02+0.00) +0.0%
0008.7: index-pack with 256*750 loose objects  0.02(0.02+0.00)  1.35(1.27+0.09) +6650.0%  0.02(0.02+0.00) +0.0%
0008.8: index-pack with 256*1000 loose objects 0.02(0.02+0.00)  1.83(1.70+0.14) +9050.0%  0.02(0.02+0.00) +0.0%

And for mu.git, a ~20k object repo:

Test                                           origin/master    peff/jk/loose-cache      avar/check-collisions-config
-------------------------------------------------------------------------------------------------------------------------
0008.2: index-pack with 256*1 loose objects    0.59(0.91+0.06)  0.58(0.93+0.03) -1.7%    0.57(0.89+0.04) -3.4%
0008.3: index-pack with 256*10 loose objects   0.59(0.91+0.07)  0.59(0.92+0.03) +0.0%    0.57(0.89+0.03) -3.4%
0008.4: index-pack with 256*100 loose objects  0.59(0.91+0.05)  0.81(1.13+0.04) +37.3%   0.58(0.91+0.04) -1.7%
0008.5: index-pack with 256*250 loose objects  0.59(0.91+0.05)  1.23(1.51+0.08) +108.5%  0.58(0.91+0.04) -1.7%
0008.6: index-pack with 256*500 loose objects  0.59(0.90+0.06)  1.96(2.20+0.12) +232.2%  0.58(0.91+0.04) -1.7%
0008.7: index-pack with 256*750 loose objects  0.59(0.92+0.05)  2.72(2.92+0.17) +361.0%  0.58(0.90+0.04) -1.7%
0008.8: index-pack with 256*1000 loose objects 0.59(0.90+0.06)  3.50(3.67+0.21) +493.2%  0.57(0.90+0.04) -3.4%

All of which is to say that I think it definitely makes sense to re-roll
this with a perf test, and a switch to toggle it + docs explaining the
caveats & pointing to the perf test. It's a clear win in some scenarios,
but a big loss in others.

>> $ cat t/perf/p0008-index-pack.sh
>> #!/bin/sh
>>
>> test_description="Tests performance of index-pack with loose objects"
>>
>> . ./perf-lib.sh
>>
>> test_perf_fresh_repo
>>
>> test_expect_success 'setup tests' '
>>     for count in 1 10 100 250 500 750 1000
>>     do
>>         if test -d /mnt/ontap_githackers/repo-$count.git
>>         then
>>             rm -rf /mnt/ontap_githackers/repo-$count.git/objects/pack
>>         else
>>             git init --bare /mnt/ontap_githackers/repo-$count.git &&
>>             (
>>                 cd /mnt/ontap_githackers/repo-$count.git &&
>>                 for i in $(seq 0 255)
>>                 do
>>                     i=$(printf %02x $i) &&
>>                     mkdir objects/$i &&
>>                     for j in $(seq --format=%038g $count)
>>                     do
>>                         >objects/$i/$j
>>                     done
>>                 done
>>             )
>>         fi
>>     done
>> '
>>
>> for count in 1 10 100 250 500 750 1000
>> do
>>     echo 3 | sudo tee /proc/sys/vm/drop_caches
>>     test_perf "index-pack with 256*$count loose objects" "
>>         (
>>             cd /mnt/ontap_githackers/repo-$count.git &&
>>             rm -fv objects/pack/*;
>>             git -c core.checkCollisions=false index-pack -v --stdin </home/aearnfjord/g/DBD-mysql/.git/objects/pack/pack-*.pack
>>         )
>>     "
>> done
>>
>> test_done
On 13.11.2018 at 11:02, Ævar Arnfjörð Bjarmason wrote:
> So here's the same test not against NFS, but the local ext4 fs (CO7;
> Linux 3.10) for sha1collisiondetection.git:
>
> Test                                           origin/master    peff/jk/loose-cache       avar/check-collisions-config
> --------------------------------------------------------------------------------------------------------------------------
> 0008.2: index-pack with 256*1 loose objects    0.02(0.02+0.00)  0.02(0.02+0.01) +0.0%     0.02(0.02+0.00) +0.0%
> 0008.3: index-pack with 256*10 loose objects   0.02(0.02+0.00)  0.03(0.03+0.00) +50.0%    0.02(0.02+0.00) +0.0%
> 0008.4: index-pack with 256*100 loose objects  0.02(0.02+0.00)  0.17(0.16+0.01) +750.0%   0.02(0.02+0.00) +0.0%
> 0008.5: index-pack with 256*250 loose objects  0.02(0.02+0.00)  0.43(0.40+0.03) +2050.0%  0.02(0.02+0.00) +0.0%
> 0008.6: index-pack with 256*500 loose objects  0.02(0.02+0.00)  0.88(0.80+0.09) +4300.0%  0.02(0.02+0.00) +0.0%
> 0008.7: index-pack with 256*750 loose objects  0.02(0.02+0.00)  1.35(1.27+0.09) +6650.0%  0.02(0.02+0.00) +0.0%
> 0008.8: index-pack with 256*1000 loose objects 0.02(0.02+0.00)  1.83(1.70+0.14) +9050.0%  0.02(0.02+0.00) +0.0%

Ouch.
> And for mu.git, a ~20k object repo:
>
> Test                                           origin/master    peff/jk/loose-cache      avar/check-collisions-config
> -------------------------------------------------------------------------------------------------------------------------
> 0008.2: index-pack with 256*1 loose objects    0.59(0.91+0.06)  0.58(0.93+0.03) -1.7%    0.57(0.89+0.04) -3.4%
> 0008.3: index-pack with 256*10 loose objects   0.59(0.91+0.07)  0.59(0.92+0.03) +0.0%    0.57(0.89+0.03) -3.4%
> 0008.4: index-pack with 256*100 loose objects  0.59(0.91+0.05)  0.81(1.13+0.04) +37.3%   0.58(0.91+0.04) -1.7%
> 0008.5: index-pack with 256*250 loose objects  0.59(0.91+0.05)  1.23(1.51+0.08) +108.5%  0.58(0.91+0.04) -1.7%
> 0008.6: index-pack with 256*500 loose objects  0.59(0.90+0.06)  1.96(2.20+0.12) +232.2%  0.58(0.91+0.04) -1.7%
> 0008.7: index-pack with 256*750 loose objects  0.59(0.92+0.05)  2.72(2.92+0.17) +361.0%  0.58(0.90+0.04) -1.7%
> 0008.8: index-pack with 256*1000 loose objects 0.59(0.90+0.06)  3.50(3.67+0.21) +493.2%  0.57(0.90+0.04) -3.4%
>
> All of which is to say that I think it definitely makes sense to re-roll
> this with a perf test, and a switch to toggle it + docs explaining the
> caveats & pointing to the perf test. It's a clear win in some scenarios,
> but a big loss in others.

Right, but can we perhaps find a way to toggle it automatically, like
the special fetch-pack cache tried to do?

So the code needs to decide between using lstat() on individual loose
object files or opendir()+readdir() to load the names in a whole
fan-out directory. Intuitively I'd try to solve it using red tape, by
measuring the duration of both kinds of calls, and then try to find a
heuristic based on those numbers. Is the overhead worth it?

But first, may I interest you in some further complication? We can also
use access(2) to check for the existence of files. It doesn't need to
fill in struct stat, so it may have a slight advantage if we don't need
any of that information.
The following patch is a replacement for patch 8 and improves
performance by ca. 3% with git.git on an SSD for me; I'm curious to see
how it does on NFS:

---
 sha1-file.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/sha1-file.c b/sha1-file.c
index b77dacedc7..5315c37cbc 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -888,8 +888,13 @@ static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
 	prepare_alt_odb(r);
 	for (odb = r->objects->odb; odb; odb = odb->next) {
 		*path = odb_loose_path(odb, &buf, sha1);
-		if (!lstat(*path, st))
-			return 0;
+		if (st) {
+			if (!lstat(*path, st))
+				return 0;
+		} else {
+			if (!access(*path, F_OK))
+				return 0;
+		}
 	}
 
 	return -1;
@@ -1171,7 +1176,8 @@ static int sha1_loose_object_info(struct repository *r,
 	if (!oi->typep && !oi->type_name && !oi->sizep && !oi->contentp) {
 		const char *path;
 		struct stat st;
-		if (stat_sha1_file(r, sha1, &st, &path) < 0)
+		struct stat *stp = oi->disk_sizep ? &st : NULL;
+		if (stat_sha1_file(r, sha1, stp, &path) < 0)
 			return -1;
 		if (oi->disk_sizep)
 			*oi->disk_sizep = st.st_size;
@@ -1382,7 +1388,6 @@ void *read_object_file_extended(const struct object_id *oid,
 	void *data;
 	const struct packed_git *p;
 	const char *path;
-	struct stat st;
 	const struct object_id *repl = lookup_replace ?
 			lookup_replace_object(the_repository, oid) : oid;
 
@@ -1399,7 +1404,7 @@ void *read_object_file_extended(const struct object_id *oid,
 		die(_("replacement %s not found for %s"),
 		    oid_to_hex(repl), oid_to_hex(oid));
 
-	if (!stat_sha1_file(the_repository, repl->hash, &st, &path))
+	if (!stat_sha1_file(the_repository, repl->hash, NULL, &path))
 		die(_("loose object %s (stored in %s) is corrupt"),
 		    oid_to_hex(repl), path);
On 12.11.2018 at 15:54, Jeff King wrote:
> diff --git a/sha1-file.c b/sha1-file.c
> index 4aae716a37..e53da0b701 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -921,6 +921,24 @@ static int open_sha1_file(struct repository *r,
>  	return -1;
>  }
>  
> +static int quick_has_loose(struct repository *r,
> +			   const unsigned char *sha1)
> +{
> +	int subdir_nr = sha1[0];
> +	struct object_id oid;
> +	struct object_directory *odb;
> +
> +	hashcpy(oid.hash, sha1);
> +
> +	prepare_alt_odb(r);
> +	for (odb = r->objects->odb; odb; odb = odb->next) {
> +		odb_load_loose_cache(odb, subdir_nr);

Is this thread-safe? What happens if e.g. one index-pack thread resizes
the array while another one sorts it?

Loading the cache explicitly up-front would avoid that, and improves
performance a bit in my (very limited) tests on an SSD. Demo patch for
next at the bottom. How does it do against your test cases?

> +		if (oid_array_lookup(&odb->loose_objects_cache, &oid) >= 0)
> +			return 1;
> +	}
> +	return 0;
> +}
> +
>  /*
>   * Map the loose object at "path" if it is not NULL, or the path found by
>   * searching for a loose object named "sha1".
> @@ -1171,6 +1189,8 @@ static int sha1_loose_object_info(struct repository *r,
>  	if (!oi->typep && !oi->type_name && !oi->sizep && !oi->contentp) {
>  		const char *path;
>  		struct stat st;
> +		if (!oi->disk_sizep && (flags & OBJECT_INFO_QUICK))
> +			return quick_has_loose(r, sha1) ? 0 : -1;
>  		if (stat_sha1_file(r, sha1, &st, &path) < 0)
>  			return -1;
>  		if (oi->disk_sizep)

 builtin/fetch.c      |  2 ++
 builtin/index-pack.c |  2 ++
 fetch-pack.c         |  2 ++
 object-store.h       |  1 +
 sha1-file.c          | 30 +++++++++++++++++++++++++++---
 5 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index e0140327aa..4b031f5da5 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -301,6 +301,8 @@ static void find_non_local_tags(const struct ref *refs,
 	refname_hash_init(&existing_refs);
 	refname_hash_init(&remote_refs);
 
+	repo_load_loose_cache(the_repository);
+
 	for_each_ref(add_one_refname, &existing_refs);
 
 	for (ref = refs; ref; ref = ref->next) {
 		if (!starts_with(ref->name, "refs/tags/"))
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index ac1f4ea9a7..7fc6321c77 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1772,6 +1772,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	if (show_stat)
 		obj_stat = xcalloc(st_add(nr_objects, 1), sizeof(struct object_stat));
 	ofs_deltas = xcalloc(nr_objects, sizeof(struct ofs_delta_entry));
+	if (startup_info->have_repository)
+		repo_load_loose_cache(the_repository);
 	parse_pack_objects(pack_hash);
 	if (report_end_of_input)
 		write_in_full(2, "\0", 1);
diff --git a/fetch-pack.c b/fetch-pack.c
index dd6700bda9..96c4624d9e 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -656,6 +656,8 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
 
 	save_commit_buffer = 0;
 
+	repo_load_loose_cache(the_repository);
+
 	for (ref = *refs; ref; ref = ref->next) {
 		struct object *o;
 
diff --git a/object-store.h b/object-store.h
index 8dceed0f31..f98dd3c857 100644
--- a/object-store.h
+++ b/object-store.h
@@ -53,6 +53,7 @@ void add_to_alternates_memory(const char *dir);
  * from 0 to 255 inclusive).
  */
 void odb_load_loose_cache(struct object_directory *odb, int subdir_nr);
+void repo_load_loose_cache(struct repository *r);
 
 struct packed_git {
 	struct packed_git *next;
diff --git a/sha1-file.c b/sha1-file.c
index 05f63dfd4e..ae12f0a198 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -921,10 +921,19 @@ static int open_sha1_file(struct repository *r,
 	return -1;
 }
 
+static int quick_has_loose_odb(struct object_directory *odb,
+			       const struct object_id *oid)
+{
+	int subdir_nr = oid->hash[0];
+
+	if (odb->loose_objects_subdir_seen[subdir_nr])
+		return oid_array_lookup(&odb->loose_objects_cache, oid) >= 0;
+	return check_and_freshen_odb(odb, oid, 0);
+}
+
 static int quick_has_loose(struct repository *r,
 			   const unsigned char *sha1)
 {
-	int subdir_nr = sha1[0];
 	struct object_id oid;
 	struct object_directory *odb;
 
@@ -932,8 +941,7 @@ static int quick_has_loose(struct repository *r,
 
 	prepare_alt_odb(r);
 	for (odb = r->objects->odb; odb; odb = odb->next) {
-		odb_load_loose_cache(odb, subdir_nr);
-		if (oid_array_lookup(&odb->loose_objects_cache, &oid) >= 0)
+		if (quick_has_loose_odb(odb, &oid))
 			return 1;
 	}
 	return 0;
@@ -2178,6 +2186,22 @@ void odb_load_loose_cache(struct object_directory *odb, int subdir_nr)
 	strbuf_release(&buf);
 }
 
+void repo_load_loose_cache(struct repository *r)
+{
+	struct object_directory *odb;
+
+	prepare_alt_odb(r);
+	for (odb = r->objects->odb; odb; odb = odb->next) {
+		int i;
+
+		for (i = 0; i < ARRAY_SIZE(odb->loose_objects_subdir_seen); i++)
+			odb_load_loose_cache(odb, i);
+
+		/* Sort as a side-effect, only read the cache from here on. */
+		oid_array_lookup(&odb->loose_objects_cache, &null_oid);
+	}
+}
+
 static int check_stream_sha1(git_zstream *stream,
 			     const char *hdr,
 			     unsigned long size,
On Tue, Nov 27, 2018 at 09:48:57PM +0100, René Scharfe wrote:

> > +static int quick_has_loose(struct repository *r,
> > +			   const unsigned char *sha1)
> > +{
> > +	int subdir_nr = sha1[0];
> > +	struct object_id oid;
> > +	struct object_directory *odb;
> > +
> > +	hashcpy(oid.hash, sha1);
> > +
> > +	prepare_alt_odb(r);
> > +	for (odb = r->objects->odb; odb; odb = odb->next) {
> > +		odb_load_loose_cache(odb, subdir_nr);
>
> Is this thread-safe? What happens if e.g. one index-pack thread resizes
> the array while another one sorts it?

No, but neither is any of the object_info / has_object_file path, which
may use static function-local buffers, or (before my series) alt scratch
bufs, or even call reprepare_packed_git().

In the long run, I think the solution is probably going to be pushing
some mutexes into the right places, and putting one around the cache
fill is an obvious place.

> Loading the cache explicitly up-front would avoid that, and improves
> performance a bit in my (very limited) tests on an SSD. Demo patch for
> next at the bottom. How does it do against your test cases?

It's going to do badly on corner cases where we don't need to load every
object subdirectory, and one or more of them are big. I.e., if I look up
"1234abcd", the current code only needs to fault in $GIT_DIR/objects/12.
Pre-loading means we'd hit them all. Even without a lot of objects, on
NFS that's 256 latencies instead of 1.

-Peff
On 13.11.2018 at 11:02, Ævar Arnfjörð Bjarmason wrote:
>
> On Mon, Nov 12 2018, Ævar Arnfjörð Bjarmason wrote:
>
>> On Mon, Nov 12 2018, Ævar Arnfjörð Bjarmason wrote:
>>
>>> I get:
>>>
>>> Test                                           origin/master    peff/jk/loose-cache     avar/check-collisions-config
>>> ------------------------------------------------------------------------------------------------------------------------
>>> 0008.2: index-pack with 256*1 loose objects    4.31(0.55+0.18)  0.41(0.40+0.02) -90.5%  0.23(0.36+0.01) -94.7%
>>> 0008.3: index-pack with 256*10 loose objects   4.37(0.45+0.21)  0.45(0.40+0.02) -89.7%  0.25(0.38+0.01) -94.3%
>>> 0008.4: index-pack with 256*100 loose objects  4.47(0.53+0.23)  0.67(0.63+0.02) -85.0%  0.24(0.38+0.01) -94.6%
>>> 0008.5: index-pack with 256*250 loose objects  5.01(0.67+0.30)  1.04(0.98+0.06) -79.2%  0.24(0.37+0.01) -95.2%
>>> 0008.6: index-pack with 256*500 loose objects  5.11(0.57+0.21)  1.81(1.70+0.09) -64.6%  0.25(0.38+0.01) -95.1%
>>> 0008.7: index-pack with 256*750 loose objects  5.12(0.60+0.22)  2.54(2.38+0.14) -50.4%  0.24(0.38+0.01) -95.3%
>>> 0008.8: index-pack with 256*1000 loose objects 4.52(0.52+0.21)  3.36(3.17+0.17) -25.7%  0.23(0.36+0.01) -94.9%
>>>
>>> I then hacked it to test against git.git, but skipped origin/master for
>>> that one because it takes *ages*. So just mine v.s. yours:
>>>
>>> $ GIT_PERF_REPEAT_COUNT=3 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run peff/jk/loose-cache avar/check-collisions-config p0008-index-pack.sh
>>> [...]
>>> Test                                           peff/jk/loose-cache  avar/check-collisions-config
>>> ---------------------------------------------------------------------------------------------------
>>> 0008.2: index-pack with 256*1 loose objects    12.57(28.72+0.61)    12.68(29.36+0.62) +0.9%
>>> 0008.3: index-pack with 256*10 loose objects   12.77(28.75+0.61)    12.50(28.88+0.56) -2.1%
>>> 0008.4: index-pack with 256*100 loose objects  13.20(29.49+0.66)    12.38(28.58+0.60) -6.2%
>>> 0008.5: index-pack with 256*250 loose objects  14.10(30.59+0.64)    12.54(28.22+0.57) -11.1%
>>> 0008.6: index-pack with 256*500 loose objects  14.48(31.06+0.74)    12.43(28.59+0.60) -14.2%
>>> 0008.7: index-pack with 256*750 loose objects  15.31(31.91+0.74)    12.67(29.23+0.64) -17.2%
>>> 0008.8: index-pack with 256*1000 loose objects 16.34(32.84+0.76)    13.11(30.19+0.68) -19.8%
>>>
>>> So not much of a practical difference perhaps. But then again this isn't
>>> a very realistic test case of anything. Rarely are you going to push a
>>> history of something the size of git.git into a repo with this many
>>> loose objects.
>>>
>>> Using sha1collisiondetection.git is I think the most realistic scenario,
>>> i.e. you'll often end up fetching/pushing something roughly the size of
>>> its entire history on a big repo, and with it:
>>>
>>> Test                                           peff/jk/loose-cache  avar/check-collisions-config
>>> ---------------------------------------------------------------------------------------------------
>>> 0008.2: index-pack with 256*1 loose objects    0.16(0.04+0.01)      0.05(0.03+0.00) -68.8%
>>> 0008.3: index-pack with 256*10 loose objects   0.19(0.04+0.02)      0.05(0.02+0.00) -73.7%
>>> 0008.4: index-pack with 256*100 loose objects  0.32(0.17+0.02)      0.04(0.02+0.00) -87.5%
>>> 0008.5: index-pack with 256*250 loose objects  0.57(0.41+0.03)      0.04(0.02+0.00) -93.0%
>>> 0008.6: index-pack with 256*500 loose objects  1.02(0.83+0.06)      0.04(0.03+0.00) -96.1%
>>> 0008.7: index-pack with 256*750 loose objects  1.47(1.24+0.10)      0.04(0.02+0.00) -97.3%
>>> 0008.8: index-pack with 256*1000 loose objects 1.94(1.70+0.10)      0.04(0.02+0.00) -97.9%
>>>
>>> As noted in previous threads I have an in-house monorepo where (due to
>>> expiry policies) loose objects hover around the 256*250 mark.
>>>
>>> The script, which is hacky as hell and takes shortcuts not to re-create
>>> the huge fake loose object collection every time (takes ages). Perhaps
>>> you're interested in incorporating some version of this into a v2. To be
>>> useful it should take some target path as an env variable.
>>
>> I forgot perhaps the most useful metric. Testing against origin/master
>> too on the sha1collisiondetection.git repo, which as noted above I think
>> is a good stand-in for making a medium sized push to a big repo. This
>> shows when the loose cache becomes counterproductive:
>>
>> Test                                           origin/master    peff/jk/loose-cache      avar/check-collisions-config
>> -------------------------------------------------------------------------------------------------------------------------
>> 0008.2: index-pack with 256*1 loose objects    0.42(0.04+0.03)  0.17(0.04+0.00) -59.5%   0.04(0.03+0.00) -90.5%
>> 0008.3: index-pack with 256*10 loose objects   0.49(0.04+0.03)  0.19(0.04+0.01) -61.2%   0.04(0.02+0.00) -91.8%
>> 0008.4: index-pack with 256*100 loose objects  0.49(0.04+0.04)  0.33(0.18+0.01) -32.7%   0.05(0.02+0.00) -89.8%
>> 0008.5: index-pack with 256*250 loose objects  0.54(0.03+0.04)  0.59(0.43+0.02) +9.3%    0.04(0.02+0.01) -92.6%
>> 0008.6: index-pack with 256*500 loose objects  0.49(0.04+0.03)  1.04(0.83+0.07) +112.2%  0.04(0.02+0.00) -91.8%
>> 0008.7: index-pack with 256*750 loose objects  0.56(0.04+0.05)  1.50(1.28+0.08) +167.9%  0.04(0.02+0.00) -92.9%
>> 0008.8: index-pack with 256*1000 loose objects 0.54(0.05+0.03)  1.95(1.68+0.13) +261.1%  0.04(0.02+0.00) -92.6%
>>
>> I still think it's best to take this patch series since it's unlikely
>> we're making anything worse in practice, the >50k objects case is a
>> really high number, which I don't think is worth worrying about.
>>
>> But I am somewhat paranoid about the potential performance
>> regression. I.e. this is me testing against a really expensive and
>> relatively well performing NetApp NFS device where the ping stats are:
>>
>> rtt min/avg/max/mdev = 0.155/0.396/1.387/0.349 ms
>>
>> So I suspect this might get a lot worse for setups which don't enjoy the
>> same performance or network locality.
>
> I tried this with the same filer mounted from another DC with ~10x the
> RTT:
>
> rtt min/avg/max/mdev = 11.553/11.618/11.739/0.121 ms
>
> But otherwise the same setup (same machine type/specs mounting it). It
> had the opposite results of what I was expecting:
>
> Test                                           origin/master    peff/jk/loose-cache     avar/check-collisions-config
> ------------------------------------------------------------------------------------------------------------------------
> 0008.2: index-pack with 256*1 loose objects    7.78(0.04+0.03)  2.75(0.03+0.01) -64.7%  0.40(0.02+0.00) -94.9%
> 0008.3: index-pack with 256*10 loose objects   7.75(0.04+0.04)  2.77(0.05+0.01) -64.3%  0.40(0.02+0.00) -94.8%
> 0008.4: index-pack with 256*100 loose objects  7.75(0.05+0.02)  2.91(0.18+0.01) -62.5%  0.40(0.02+0.00) -94.8%
> 0008.5: index-pack with 256*250 loose objects  7.73(0.04+0.04)  3.19(0.43+0.02) -58.7%  0.40(0.02+0.00) -94.8%
> 0008.6: index-pack with 256*500 loose objects  7.73(0.04+0.04)  3.64(0.83+0.05) -52.9%  0.40(0.02+0.00) -94.8%
> 0008.7: index-pack with 256*750 loose objects  7.73(0.04+0.02)  4.14(1.29+0.07) -46.4%  0.40(0.02+0.00) -94.8%
> 0008.8: index-pack with 256*1000 loose objects 7.73(0.04+0.03)  4.55(1.72+0.09) -41.1%  0.40(0.02+0.01) -94.8%
>
> I.e. there the cliff of where the cache becomes counterproductive comes
> much later, not earlier. The sha1collisiondetection.git repo has 418
> objects. So is it cheaper to fill a huge cache than look up those 418? I
> don't know, haven't dug. But so far what this suggests is that we're
> helping slow FSs to the detriment of faster ones.
>
> So here's the same test not against NFS, but the local ext4 fs (CO7;
> Linux 3.10) for sha1collisiondetection.git:
>
> Test                                           origin/master    peff/jk/loose-cache       avar/check-collisions-config
> --------------------------------------------------------------------------------------------------------------------------
> 0008.2: index-pack with 256*1 loose objects    0.02(0.02+0.00)  0.02(0.02+0.01) +0.0%     0.02(0.02+0.00) +0.0%
> 0008.3: index-pack with 256*10 loose objects   0.02(0.02+0.00)  0.03(0.03+0.00) +50.0%    0.02(0.02+0.00) +0.0%
> 0008.4: index-pack with 256*100 loose objects  0.02(0.02+0.00)  0.17(0.16+0.01) +750.0%   0.02(0.02+0.00) +0.0%
> 0008.5: index-pack with 256*250 loose objects  0.02(0.02+0.00)  0.43(0.40+0.03) +2050.0%  0.02(0.02+0.00) +0.0%
> 0008.6: index-pack with 256*500 loose objects  0.02(0.02+0.00)  0.88(0.80+0.09) +4300.0%  0.02(0.02+0.00) +0.0%
> 0008.7: index-pack with 256*750 loose objects  0.02(0.02+0.00)  1.35(1.27+0.09) +6650.0%  0.02(0.02+0.00) +0.0%
> 0008.8: index-pack with 256*1000 loose objects 0.02(0.02+0.00)  1.83(1.70+0.14) +9050.0%  0.02(0.02+0.00) +0.0%
>
> And for mu.git, a ~20k object repo:
>
> Test                                           origin/master    peff/jk/loose-cache      avar/check-collisions-config
> -------------------------------------------------------------------------------------------------------------------------
> 0008.2: index-pack with 256*1 loose objects    0.59(0.91+0.06)  0.58(0.93+0.03) -1.7%    0.57(0.89+0.04) -3.4%
> 0008.3: index-pack with 256*10 loose objects   0.59(0.91+0.07)  0.59(0.92+0.03) +0.0%    0.57(0.89+0.03) -3.4%
> 0008.4: index-pack with 256*100 loose objects  0.59(0.91+0.05)  0.81(1.13+0.04) +37.3%   0.58(0.91+0.04) -1.7%
> 0008.5: index-pack with 256*250 loose objects  0.59(0.91+0.05)  1.23(1.51+0.08) +108.5%  0.58(0.91+0.04) -1.7%
> 0008.6: index-pack with 256*500 loose objects  0.59(0.90+0.06)  1.96(2.20+0.12) +232.2%  0.58(0.91+0.04) -1.7%
> 0008.7: index-pack with 256*750 loose objects  0.59(0.92+0.05)  2.72(2.92+0.17) +361.0%  0.58(0.90+0.04) -1.7%
> 0008.8: index-pack with 256*1000 loose objects 0.59(0.90+0.06)  3.50(3.67+0.21) +493.2%  0.57(0.90+0.04) -3.4%

OK, here's another theory: The cache scales badly with increasing
numbers of loose objects because it sorts the array 256 times as it is
filled. Loading it fully and sorting once would help, as would using
one array per subdirectory.

We can simulate the oid_array operations with test-sha1-array. It has no
explicit sort command, but we can use for_each_unique for that; we just
have to add 127.5 extra calls (that don't sort) to get the same amount
of output in the two latter cases, to be able to compare just the sort
time:

for command in '
    foreach (0..255) {
        $subdir = sprintf("%02x", $_);
        foreach (1..$ARGV[0]) {
            printf("append %s%038d\n", $subdir, $_);
        }
        # intermediate sort
        print "for_each_unique\n";
    }
' '
    foreach (0..255) {
        $subdir = sprintf("%02x", $_);
        foreach (1..$ARGV[0]) {
            printf("append %s%038d\n", $subdir, $_);
        }
    }
    # sort once at the end
    print "for_each_unique\n";
    # ... and generate roughly the same amount of output
    foreach (0..127) {
        print "for_each_unique\n";
    }
' '
    foreach (0..255) {
        $subdir = sprintf("%02x", $_);
        foreach (1..$ARGV[0]) {
            printf("append %s%038d\n", $subdir, $_);
        }
        # sort each subdirectory separately
        print "for_each_unique\n";
        # ... and generate roughly the same amount of output
        foreach (0..127) {
            print "for_each_unique\n";
        }
        print "clear\n";
    }
'
do
    time perl -e "$command" 1000 | t/helper/test-tool sha1-array | wc -l
done

My results:

32896000

real    0m6.521s
user    0m5.269s
sys     0m2.234s
33024000

real    0m3.464s
user    0m2.178s
sys     0m2.251s
33024000

real    0m3.353s
user    0m2.179s
sys     0m1.939s

So this adds up to a significant amount of time spent on sorting.

Here's a patch, on top of next:

-- >8 --
Subject: [PATCH] object-store: use one oid_array per subdirectory for
 loose cache

The loose objects cache is filled one subdirectory at a time as needed.
It is stored in an oid_array, which has to be resorted after each add
operation.
So when querying a wide range of objects the array needs to be resorted
up to 256 times -- once for each subdirectory. This is not efficient.

Use one oid_array for each subdirectory. This ensures that entries have
to only be sorted once. It speeds up cache operations in a repository
with ca. 100 loose objects per subdirectory (it's used for collision
checks for the placeholders %h, %t and %p):

$ git count-objects
25805 objects, 302452 kilobytes

$ (cd t/perf && ./run HEAD^ HEAD ./p4205-log-pretty-formats.sh)
[...]
Test                        HEAD^             HEAD
--------------------------------------------------------------------
4205.1: log with %H         0.56(0.52+0.04)   0.57(0.54+0.02) +1.8%
4205.2: log with %h         0.92(0.86+0.06)   0.66(0.62+0.04) -28.3%
4205.3: log with %T         0.56(0.52+0.04)   0.57(0.55+0.01) +1.8%
4205.4: log with %t         0.92(0.88+0.04)   0.67(0.62+0.05) -27.2%
4205.5: log with %P         0.57(0.54+0.02)   0.57(0.54+0.03) +0.0%
4205.6: log with %p         0.92(0.86+0.05)   0.64(0.60+0.04) -30.4%
4205.7: log with %h-%h-%h   1.02(0.98+0.04)   0.72(0.69+0.03) -29.4%

Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
---
This patch goes on top of next.

p4205 is better suited to show the cost of sorting in the loose objects
cache than the slooow index-pack. This change does fix the index-pack
regression on ext4 for me as well, though. Not sure it warrants adding
a loose objects test to p5302.

 object-store.h | 2 +-
 object.c       | 5 ++++-
 packfile.c     | 4 +++-
 sha1-file.c    | 5 +++--
 sha1-name.c    | 8 +++++---
 5 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/object-store.h b/object-store.h
index 8dceed0f31..ee67a50980 100644
--- a/object-store.h
+++ b/object-store.h
@@ -20,7 +20,7 @@ struct object_directory {
 	 * Be sure to call odb_load_loose_cache() before using.
 	 */
 	char loose_objects_subdir_seen[256];
-	struct oid_array loose_objects_cache;
+	struct oid_array loose_objects_cache[256];
 
 	/*
 	 * Path to the alternative object store. If this is a relative path,
diff --git a/object.c b/object.c
index c29a97a7e9..965493ba76 100644
--- a/object.c
+++ b/object.c
@@ -484,8 +484,11 @@ struct raw_object_store *raw_object_store_new(void)
 
 static void free_object_directory(struct object_directory *odb)
 {
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(odb->loose_objects_cache); i++)
+		oid_array_clear(&odb->loose_objects_cache[i]);
 	free(odb->path);
-	oid_array_clear(&odb->loose_objects_cache);
 	free(odb);
 }
 
diff --git a/packfile.c b/packfile.c
index 56496bb425..3ef6a241b7 100644
--- a/packfile.c
+++ b/packfile.c
@@ -995,7 +995,9 @@ void reprepare_packed_git(struct repository *r)
 	struct object_directory *odb;
 
 	for (odb = r->objects->odb; odb; odb = odb->next) {
-		oid_array_clear(&odb->loose_objects_cache);
+		int i;
+		for (i = 0; i < ARRAY_SIZE(odb->loose_objects_cache); i++)
+			oid_array_clear(&odb->loose_objects_cache[i]);
 		memset(&odb->loose_objects_subdir_seen, 0,
 		       sizeof(odb->loose_objects_subdir_seen));
 	}
diff --git a/sha1-file.c b/sha1-file.c
index 05f63dfd4e..d2f5e65865 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -933,7 +933,8 @@ static int quick_has_loose(struct repository *r,
 	prepare_alt_odb(r);
 	for (odb = r->objects->odb; odb; odb = odb->next) {
 		odb_load_loose_cache(odb, subdir_nr);
-		if (oid_array_lookup(&odb->loose_objects_cache, &oid) >= 0)
+		if (oid_array_lookup(&odb->loose_objects_cache[subdir_nr],
+				     &oid) >= 0)
 			return 1;
 	}
 	return 0;
@@ -2173,7 +2174,7 @@ void odb_load_loose_cache(struct object_directory *odb, int subdir_nr)
 	for_each_file_in_obj_subdir(subdir_nr, &buf,
 				    append_loose_object,
 				    NULL, NULL,
-				    &odb->loose_objects_cache);
+				    &odb->loose_objects_cache[subdir_nr]);
 	odb->loose_objects_subdir_seen[subdir_nr] = 1;
 	strbuf_release(&buf);
 }
diff --git a/sha1-name.c b/sha1-name.c
index b24502811b..fdb22147b2 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -94,14 +94,16 @@ static void find_short_object_filename(struct disambiguate_state *ds)
 	     odb && !ds->ambiguous;
 	     odb = odb->next) {
 		int pos;
+		struct oid_array *loose_subdir_objects;
 
 		odb_load_loose_cache(odb, subdir_nr);
-		pos = oid_array_lookup(&odb->loose_objects_cache, &ds->bin_pfx);
+		loose_subdir_objects = &odb->loose_objects_cache[subdir_nr];
+		pos = oid_array_lookup(loose_subdir_objects, &ds->bin_pfx);
 		if (pos < 0)
 			pos = -1 - pos;
-		while (!ds->ambiguous && pos < odb->loose_objects_cache.nr) {
+		while (!ds->ambiguous && pos < loose_subdir_objects->nr) {
 			const struct object_id *oid;
-			oid = odb->loose_objects_cache.oid + pos;
+			oid = loose_subdir_objects->oid + pos;
 			if (!match_sha(ds->len, ds->bin_pfx.hash, oid->hash))
 				break;
 			update_candidates(ds, oid);
On Sun, Dec 02, 2018 at 11:52:50AM +0100, René Scharfe wrote:

> > And for mu.git, a ~20k object repo:
> >
> >   Test                                          origin/master       peff/jk/loose-cache        avar/check-collisions-config
> >   -------------------------------------------------------------------------------------------------------------------------
> >   0008.2: index-pack with 256*1 loose objects     0.59(0.91+0.06)   0.58(0.93+0.03) -1.7%      0.57(0.89+0.04) -3.4%
> >   0008.3: index-pack with 256*10 loose objects    0.59(0.91+0.07)   0.59(0.92+0.03) +0.0%      0.57(0.89+0.03) -3.4%
> >   0008.4: index-pack with 256*100 loose objects   0.59(0.91+0.05)   0.81(1.13+0.04) +37.3%     0.58(0.91+0.04) -1.7%
> >   0008.5: index-pack with 256*250 loose objects   0.59(0.91+0.05)   1.23(1.51+0.08) +108.5%    0.58(0.91+0.04) -1.7%
> >   0008.6: index-pack with 256*500 loose objects   0.59(0.90+0.06)   1.96(2.20+0.12) +232.2%    0.58(0.91+0.04) -1.7%
> >   0008.7: index-pack with 256*750 loose objects   0.59(0.92+0.05)   2.72(2.92+0.17) +361.0%    0.58(0.90+0.04) -1.7%
> >   0008.8: index-pack with 256*1000 loose objects  0.59(0.90+0.06)   3.50(3.67+0.21) +493.2%    0.57(0.90+0.04) -3.4%
>
> OK, here's another theory: The cache scales badly with increasing
> numbers of loose objects because it sorts the array 256 times as it is
> filled.  Loading it fully and sorting once would help, as would using
> one array per subdirectory.

Yeah, that makes sense. This was actually how I had planned to do it
originally, but then I ended up just reusing the existing single-array
approach from the abbrev code.

I hadn't actually thought about the repeated sortings (but it
definitely makes sense that they would hurt in these pathological
cases), but more just that we get a 256x reduction in N for our binary
search (in fact we already do this first-byte lookup-table trick for
pack index lookups).

Your patch looks good to me. We may want to do one thing on top:

> diff --git a/object-store.h b/object-store.h
> index 8dceed0f31..ee67a50980 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -20,7 +20,7 @@ struct object_directory {
>  	 * Be sure to call odb_load_loose_cache() before using.
>  	 */
>  	char loose_objects_subdir_seen[256];
> -	struct oid_array loose_objects_cache;
> +	struct oid_array loose_objects_cache[256];

The comment in the context there is warning callers to remember to load
the cache first. Now that we have individual caches, might it make sense
to change the interface a bit, and make these members private? I.e.,
something like:

  struct oid_array *odb_loose_cache(struct object_directory *odb,
                                    int subdir_nr)
  {
	if (!odb->loose_objects_subdir_seen[subdir_nr])
		odb_load_loose_cache(odb, subdir_nr); /* or just inline it here */

	return &odb->loose_objects_cache[subdir_nr];
  }

That's harder to get wrong, and this:

> diff --git a/sha1-file.c b/sha1-file.c
> index 05f63dfd4e..d2f5e65865 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -933,7 +933,8 @@ static int quick_has_loose(struct repository *r,
>  	prepare_alt_odb(r);
>  	for (odb = r->objects->odb; odb; odb = odb->next) {
>  		odb_load_loose_cache(odb, subdir_nr);
> -		if (oid_array_lookup(&odb->loose_objects_cache, &oid) >= 0)
> +		if (oid_array_lookup(&odb->loose_objects_cache[subdir_nr],
> +				     &oid) >= 0)
>  			return 1;
>  	}

becomes:

  struct oid_array *cache = odb_loose_cache(odb, subdir_nr);
  if (oid_array_lookup(cache, &oid) >= 0)
	return 1;

(An even simpler interface would be a single function that computes
subdir_nr and does the lookup itself, but that would not be enough for
find_short_object_filename()).

-Peff
Am 03.12.2018 um 23:04 schrieb Jeff King:
> On Sun, Dec 02, 2018 at 11:52:50AM +0100, René Scharfe wrote:
>
>>> And for mu.git, a ~20k object repo:
>>>
>>>   Test                                          origin/master       peff/jk/loose-cache        avar/check-collisions-config
>>>   -------------------------------------------------------------------------------------------------------------------------
>>>   0008.2: index-pack with 256*1 loose objects     0.59(0.91+0.06)   0.58(0.93+0.03) -1.7%      0.57(0.89+0.04) -3.4%
>>>   0008.3: index-pack with 256*10 loose objects    0.59(0.91+0.07)   0.59(0.92+0.03) +0.0%      0.57(0.89+0.03) -3.4%
>>>   0008.4: index-pack with 256*100 loose objects   0.59(0.91+0.05)   0.81(1.13+0.04) +37.3%     0.58(0.91+0.04) -1.7%
>>>   0008.5: index-pack with 256*250 loose objects   0.59(0.91+0.05)   1.23(1.51+0.08) +108.5%    0.58(0.91+0.04) -1.7%
>>>   0008.6: index-pack with 256*500 loose objects   0.59(0.90+0.06)   1.96(2.20+0.12) +232.2%    0.58(0.91+0.04) -1.7%
>>>   0008.7: index-pack with 256*750 loose objects   0.59(0.92+0.05)   2.72(2.92+0.17) +361.0%    0.58(0.90+0.04) -1.7%
>>>   0008.8: index-pack with 256*1000 loose objects  0.59(0.90+0.06)   3.50(3.67+0.21) +493.2%    0.57(0.90+0.04) -3.4%
>>
>> OK, here's another theory: The cache scales badly with increasing
>> numbers of loose objects because it sorts the array 256 times as it is
>> filled.  Loading it fully and sorting once would help, as would using
>> one array per subdirectory.
>
> Yeah, that makes sense. This was actually how I had planned to do it
> originally, but then I ended up just reusing the existing single-array
> approach from the abbrev code.
>
> I hadn't actually thought about the repeated sortings (but that
> definitely makes sense that they would hurt in these pathological
> cases), but more just that we get a 256x reduction in N for our binary
> search (in fact we already do this first-byte lookup-table trick for
> pack index lookups).

Skipping eight steps in a binary search is something, but it's faster
even without that.
Just realized that the demo code can use "lookup" instead of the much
more expensive "for_each_unique" to sort.  D'oh!  With that change:

for command in '
   foreach (0..255) {
      $subdir = sprintf("%02x", $_);
      foreach (1..$ARGV[0]) {
         printf("append %s%038d\n", $subdir, $_);
      }
      # intermediate sort
      print "lookup " . "0" x 40 . "\n";
   }
' '
   foreach (0..255) {
      $subdir = sprintf("%02x", $_);
      foreach (1..$ARGV[0]) {
         printf("append %s%038d\n", $subdir, $_);
      }
   }
   # sort once at the end
   print "lookup " . "0" x 40 . "\n";
' '
   foreach (0..255) {
      $subdir = sprintf("%02x", $_);
      foreach (1..$ARGV[0]) {
         printf("append %s%038d\n", $subdir, $_);
      }
      # sort each subdirectory separately
      print "lookup " . "0" x 40 . "\n";
      print "clear\n";
   }
'
do
   time perl -e "$command" 1000 | t/helper/test-tool sha1-array | wc -l
done

And the results make the scale of the improvement more obvious:

256

real    0m3.476s
user    0m3.466s
sys     0m0.099s
1

real    0m0.157s
user    0m0.148s
sys     0m0.046s
256

real    0m0.117s
user    0m0.116s
sys     0m0.051s

> Your patch looks good to me. We may want to do one thing on top:
>
>> diff --git a/object-store.h b/object-store.h
>> index 8dceed0f31..ee67a50980 100644
>> --- a/object-store.h
>> +++ b/object-store.h
>> @@ -20,7 +20,7 @@ struct object_directory {
>>  	 * Be sure to call odb_load_loose_cache() before using.
>>  	 */
>>  	char loose_objects_subdir_seen[256];
>> -	struct oid_array loose_objects_cache;
>> +	struct oid_array loose_objects_cache[256];
>
> The comment in the context there is warning callers to remember to load
> the cache first. Now that we have individual caches, might it make sense
> to change the interface a bit, and make these members private. I.e.,
> something like:
>
>   struct oid_array *odb_loose_cache(struct object_directory *odb,
>                                     int subdir_nr)
>   {
> 	if (!loose_objects_subdir_seen[subdir_nr])
> 		odb_load_loose_cache(odb, subdir_nr); /* or just inline it here */
>
> 	return &odb->loose_objects_cache[subdir_nr];
>   }

Sure.
And it should take an object_id pointer instead of a subdir_nr -- less
duplication, nicer interface (patch below).

It would be nice if it could return a const pointer to discourage
messing up the cache, but that's not compatible with oid_array_lookup().

And quick_has_loose() should be converted to object_id as well -- adding
a function that takes a SHA-1 is a regression. :D

René
---
 object-store.h |  8 ++++----
 sha1-file.c    | 19 ++++++++-----------
 sha1-name.c    |  4 +---
 3 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/object-store.h b/object-store.h
index ee67a50980..dd9efdd276 100644
--- a/object-store.h
+++ b/object-store.h
@@ -48,11 +48,11 @@ void add_to_alternates_file(const char *dir);
 void add_to_alternates_memory(const char *dir);
 
 /*
- * Populate an odb's loose object cache for one particular subdirectory (i.e.,
- * the one that corresponds to the first byte of objects you're interested in,
- * from 0 to 255 inclusive).
+ * Populate and return the loose object cache array corresponding to the
+ * given object ID.
  */
-void odb_load_loose_cache(struct object_directory *odb, int subdir_nr);
+struct oid_array *odb_loose_cache(struct object_directory *odb,
+				  const struct object_id *oid);
 
 struct packed_git {
 	struct packed_git *next;
diff --git a/sha1-file.c b/sha1-file.c
index d2f5e65865..38af6d5d0b 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -924,7 +924,6 @@ static int open_sha1_file(struct repository *r,
 static int quick_has_loose(struct repository *r,
 			   const unsigned char *sha1)
 {
-	int subdir_nr = sha1[0];
 	struct object_id oid;
 	struct object_directory *odb;
 
@@ -932,9 +931,7 @@ static int quick_has_loose(struct repository *r,
 	prepare_alt_odb(r);
 	for (odb = r->objects->odb; odb; odb = odb->next) {
-		odb_load_loose_cache(odb, subdir_nr);
-		if (oid_array_lookup(&odb->loose_objects_cache[subdir_nr],
-				     &oid) >= 0)
+		if (oid_array_lookup(odb_loose_cache(odb, &oid), &oid) >= 0)
 			return 1;
 	}
 	return 0;
@@ -2159,24 +2156,24 @@ static int append_loose_object(const struct object_id *oid, const char *path,
 	return 0;
 }
 
-void odb_load_loose_cache(struct object_directory *odb, int subdir_nr)
+struct oid_array *odb_loose_cache(struct object_directory *odb,
+				  const struct object_id *oid)
 {
+	int subdir_nr = oid->hash[0];
+	struct oid_array *subdir_array = &odb->loose_objects_cache[subdir_nr];
 	struct strbuf buf = STRBUF_INIT;
 
-	if (subdir_nr < 0 ||
-	    subdir_nr >= ARRAY_SIZE(odb->loose_objects_subdir_seen))
-		BUG("subdir_nr out of range");
-
 	if (odb->loose_objects_subdir_seen[subdir_nr])
-		return;
+		return subdir_array;
 
 	strbuf_addstr(&buf, odb->path);
 	for_each_file_in_obj_subdir(subdir_nr, &buf,
 				    append_loose_object,
 				    NULL, NULL,
-				    &odb->loose_objects_cache[subdir_nr]);
+				    subdir_array);
 	odb->loose_objects_subdir_seen[subdir_nr] = 1;
 	strbuf_release(&buf);
+	return subdir_array;
 }
diff --git a/sha1-name.c b/sha1-name.c
index fdb22147b2..4fc6368ce5 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -87,7 +87,6 @@ static int match_sha(unsigned, const unsigned char *, const unsigned char *);
 
 static void find_short_object_filename(struct disambiguate_state *ds)
 {
-	int subdir_nr = ds->bin_pfx.hash[0];
 	struct object_directory *odb;
 
 	for (odb = the_repository->objects->odb;
@@ -96,8 +95,7 @@ static void find_short_object_filename(struct disambiguate_state *ds)
 		int pos;
 		struct oid_array *loose_subdir_objects;
 
-		odb_load_loose_cache(odb, subdir_nr);
-		loose_subdir_objects = &odb->loose_objects_cache[subdir_nr];
+		loose_subdir_objects = odb_loose_cache(odb, &ds->bin_pfx);
 		pos = oid_array_lookup(loose_subdir_objects, &ds->bin_pfx);
 		if (pos < 0)
 			pos = -1 - pos;
On Tue, Dec 04, 2018 at 10:45:13PM +0100, René Scharfe wrote:

> > The comment in the context there is warning callers to remember to load
> > the cache first. Now that we have individual caches, might it make sense
> > to change the interface a bit, and make these members private. I.e.,
> > something like:
> >
> >   struct oid_array *odb_loose_cache(struct object_directory *odb,
> >                                     int subdir_nr)
> >   {
> > 	if (!loose_objects_subdir_seen[subdir_nr])
> > 		odb_load_loose_cache(odb, subdir_nr); /* or just inline it here */
> >
> > 	return &odb->loose_objects_cache[subdir_nr];
> >   }
>
> Sure. And it should take an object_id pointer instead of a subdir_nr --
> less duplication, nicer interface (patch below).

I had considered that initially, but it's a little less flexible if a
caller doesn't actually have an oid. Though both of the proposed callers
do, so it's probably OK to worry about that if and when it ever comes up
(the most plausible thing in my mind is doing some abbrev-like analysis
without looking to abbreviate a _particular_ oid).

> It would be nice if it could return a const pointer to discourage
> messing up the cache, but that's not compatible with oid_array_lookup().

Yeah.

> And quick_has_loose() should be converted to object_id as well -- adding
> a function that takes a SHA-1 is a regression. :D

I actually wrote it that way initially, but doing the hashcpy() in the
caller is a bit more awkward. My thought was to punt on that until the
rest of the surrounding code starts handling oids.

> ---
>  object-store.h |  8 ++++----
>  sha1-file.c    | 19 ++++++++-----------
>  sha1-name.c    |  4 +---
>  3 files changed, 13 insertions(+), 18 deletions(-)

This patch looks sane. How do you want to handle it? I'd assumed your
earlier one would be for applying on top, but this one doesn't have a
commit message. Did you want me to squash down the individual hunks?

-Peff
Am 05.12.2018 um 05:46 schrieb Jeff King:
> On Tue, Dec 04, 2018 at 10:45:13PM +0100, René Scharfe wrote:
>
>>> The comment in the context there is warning callers to remember to load
>>> the cache first. Now that we have individual caches, might it make sense
>>> to change the interface a bit, and make these members private. I.e.,
>>> something like:
>>>
>>>   struct oid_array *odb_loose_cache(struct object_directory *odb,
>>>                                     int subdir_nr)
>>>   {
>>> 	if (!loose_objects_subdir_seen[subdir_nr])
>>> 		odb_load_loose_cache(odb, subdir_nr); /* or just inline it here */
>>>
>>> 	return &odb->loose_objects_cache[subdir_nr];
>>>   }
>>
>> Sure. And it should take an object_id pointer instead of a subdir_nr --
>> less duplication, nicer interface (patch below).
>
> I had considered that initially, but it's a little less flexible if a
> caller doesn't actually have an oid. Though both of the proposed callers
> do, so it's probably OK to worry about that if and when it ever comes up
> (the most plausible thing in my mind is doing some abbrev-like analysis
> without looking to abbreviate a _particular_ oid).

Right, let's focus on concrete requirements of current callers.
YAGNI.. :)

>> And quick_has_loose() should be converted to object_id as well -- adding
>> a function that takes a SHA-1 is a regression. :D
>
> I actually wrote it that way initially, but doing the hashcpy() in the
> caller is a bit more awkward. My thought was to punt on that until the
> rest of the surrounding code starts handling oids.

The level of awkwardness is the same to me, but sha1_loose_object_info()
is much longer already, so it might feel worse to add it there.  This
function is easily converted to struct object_id, though, as its single
caller can pass one on -- this makes the copy unnecessary.

> This patch looks sane. How do you want to handle it? I'd assumed your
> earlier one would be for applying on top, but this one doesn't have a
> commit message. Did you want me to squash down the individual hunks?

I'm waiting for the first one (object-store: use one oid_array per
subdirectory for loose cache) to be accepted, as it aims to solve a
user-visible performance regression, i.e. that's where the meat is.
I'm particularly interested in performance numbers from Ævar for it.

I can send the last one properly later, and add patches for converting
quick_has_loose() to struct object_id.  Those are just cosmetic..

René
On Wed, Dec 05, 2018 at 07:02:17AM +0100, René Scharfe wrote:

> > I actually wrote it that way initially, but doing the hashcpy() in the
> > caller is a bit more awkward. My thought was to punt on that until the
> > rest of the surrounding code starts handling oids.
>
> The level of awkwardness is the same to me, but sha1_loose_object_info()
> is much longer already, so it might feel worse to add it there.

Right, what I meant was that in sha1_loose_object_info():

	if (special_case)
		handle_special_case();

is easier to follow than a block setting up the special case and then
calling the function.

> This function is easily converted to struct object_id, though, as its
> single caller can pass one on -- this makes the copy unnecessary.

If you mean modifying sha1_loose_object_info() to take an oid, then
sure, I agree that is a good step forward (and that is exactly the
"punt until" moment I meant).

> > This patch looks sane. How do you want to handle it? I'd assumed your
> > earlier one would be for applying on top, but this one doesn't have a
> > commit message. Did you want me to squash down the individual hunks?
>
> I'm waiting for the first one (object-store: use one oid_array per
> subdirectory for loose cache) to be accepted, as it aims to solve a
> user-visible performance regression, i.e. that's where the meat is.
> I'm particularly interested in performance numbers from Ævar for it.
>
> I can send the last one properly later, and add patches for converting
> quick_has_loose() to struct object_id.  Those are just cosmetic..

Great, thanks. I just wanted to make sure these improvements weren't
going to slip through the cracks.

-Peff
On Wed, Dec 05, 2018 at 01:51:36AM -0500, Jeff King wrote:

> > This function is easily converted to struct object_id, though, as its
> > single caller can pass one on -- this makes the copy unnecessary.
>
> If you mean modifying sha1_loose_object_info() to take an oid, then
> sure, I agree that is a good step forward (and that is exactly the
> "punt until" moment I meant).

So the simple thing is to do that, and then have it pass oid->hash to
the other functions it uses. If we start to convert those, there's a
little bit of a rabbit hole, but it's actually not too bad.

Most of the spill-over is into the dumb-http code. Note that it
actually uses sha1 itself! That probably needs to be the_hash_algo
(though I'm not even sure how we'd negotiate the algorithm across a
dumb fetch). At any rate, I don't think this patch makes anything
_worse_ in that respect.

diff --git a/http-push.c b/http-push.c
index cd48590912..0141b0ad53 100644
--- a/http-push.c
+++ b/http-push.c
@@ -255,7 +255,7 @@ static void start_fetch_loose(struct transfer_request *request)
 	struct active_request_slot *slot;
 	struct http_object_request *obj_req;
 
-	obj_req = new_http_object_request(repo->url, request->obj->oid.hash);
+	obj_req = new_http_object_request(repo->url, &request->obj->oid);
 	if (obj_req == NULL) {
 		request->state = ABORTED;
 		return;
diff --git a/http-walker.c b/http-walker.c
index 0a392c85b6..29b59e2fe0 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -58,7 +58,7 @@ static void start_object_request(struct walker *walker,
 	struct active_request_slot *slot;
 	struct http_object_request *req;
 
-	req = new_http_object_request(obj_req->repo->base, obj_req->oid.hash);
+	req = new_http_object_request(obj_req->repo->base, &obj_req->oid);
 	if (req == NULL) {
 		obj_req->state = ABORTED;
 		return;
@@ -543,11 +543,11 @@ static int fetch_object(struct walker *walker, unsigned char *sha1)
 	} else if (req->zret != Z_STREAM_END) {
 		walker->corrupt_object_found++;
 		ret = error("File %s (%s) corrupt", hex, req->url);
-	} else if (!hasheq(obj_req->oid.hash, req->real_sha1)) {
+	} else if (!oideq(&obj_req->oid, &req->real_oid)) {
 		ret = error("File %s has bad hash", hex);
 	} else if (req->rename < 0) {
 		struct strbuf buf = STRBUF_INIT;
-		loose_object_path(the_repository, &buf, req->sha1);
+		loose_object_path(the_repository, &buf, &req->oid);
 		ret = error("unable to write sha1 filename %s", buf.buf);
 		strbuf_release(&buf);
 	}
diff --git a/http.c b/http.c
index 7cfa7a16e0..e95b5b9be0 100644
--- a/http.c
+++ b/http.c
@@ -2298,9 +2298,9 @@ static size_t fwrite_sha1_file(char *ptr, size_t eltsize, size_t nmemb,
 }
 
 struct http_object_request *new_http_object_request(const char *base_url,
-	unsigned char *sha1)
+	const struct object_id *oid)
 {
-	char *hex = sha1_to_hex(sha1);
+	char *hex = oid_to_hex(oid);
 	struct strbuf filename = STRBUF_INIT;
 	struct strbuf prevfile = STRBUF_INIT;
 	int prevlocal;
@@ -2311,10 +2311,10 @@ struct http_object_request *new_http_object_request(const char *base_url,
 
 	freq = xcalloc(1, sizeof(*freq));
 	strbuf_init(&freq->tmpfile, 0);
-	hashcpy(freq->sha1, sha1);
+	oidcpy(&freq->oid, oid);
 	freq->localfile = -1;
 
-	loose_object_path(the_repository, &filename, sha1);
+	loose_object_path(the_repository, &filename, oid);
 	strbuf_addf(&freq->tmpfile, "%s.temp", filename.buf);
 
 	strbuf_addf(&prevfile, "%s.prev", filename.buf);
@@ -2456,16 +2456,16 @@ int finish_http_object_request(struct http_object_request *freq)
 	}
 
 	git_inflate_end(&freq->stream);
-	git_SHA1_Final(freq->real_sha1, &freq->c);
+	git_SHA1_Final(freq->real_oid.hash, &freq->c);
 	if (freq->zret != Z_STREAM_END) {
 		unlink_or_warn(freq->tmpfile.buf);
 		return -1;
 	}
-	if (!hasheq(freq->sha1, freq->real_sha1)) {
+	if (!oideq(&freq->oid, &freq->real_oid)) {
 		unlink_or_warn(freq->tmpfile.buf);
 		return -1;
 	}
-	loose_object_path(the_repository, &filename, freq->sha1);
+	loose_object_path(the_repository, &filename, &freq->oid);
 	freq->rename = finalize_object_file(freq->tmpfile.buf, filename.buf);
 	strbuf_release(&filename);
diff --git a/http.h b/http.h
index d305ca1dc7..66c52b2e1e 100644
--- a/http.h
+++ b/http.h
@@ -224,8 +224,8 @@ struct http_object_request {
 	CURLcode curl_result;
 	char errorstr[CURL_ERROR_SIZE];
 	long http_code;
-	unsigned char sha1[20];
-	unsigned char real_sha1[20];
+	struct object_id oid;
+	struct object_id real_oid;
 	git_SHA_CTX c;
 	git_zstream stream;
 	int zret;
@@ -234,7 +234,7 @@ struct http_object_request {
 };
 
 extern struct http_object_request *new_http_object_request(
-	const char *base_url, unsigned char *sha1);
+	const char *base_url, const struct object_id *oid);
 extern void process_http_object_request(struct http_object_request *freq);
 extern int finish_http_object_request(struct http_object_request *freq);
 extern void abort_http_object_request(struct http_object_request *freq);
diff --git a/object-store.h b/object-store.h
index fecbb7e094..265d0d8e1f 100644
--- a/object-store.h
+++ b/object-store.h
@@ -151,11 +151,13 @@ void raw_object_store_clear(struct raw_object_store *o);
 
 /*
  * Put in `buf` the name of the file in the local object database that
- * would be used to store a loose object with the specified sha1.
+ * would be used to store a loose object with the specified oid.
  */
-const char *loose_object_path(struct repository *r, struct strbuf *buf, const unsigned char *sha1);
+const char *loose_object_path(struct repository *r, struct strbuf *buf,
+			      const struct object_id *oid);
 
-void *map_sha1_file(struct repository *r, const unsigned char *sha1, unsigned long *size);
+void *map_loose_object(struct repository *r, const struct object_id *oid,
+		       unsigned long *size);
 
 extern void *read_object_file_extended(const struct object_id *oid,
 				       enum object_type *type,
diff --git a/sha1-file.c b/sha1-file.c
index 3ddf4c9426..0705709036 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -333,12 +333,12 @@ int raceproof_create_file(const char *path, create_file_fn fn, void *cb)
 	return ret;
 }
 
-static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1)
+static void fill_loose_path(struct strbuf *buf, const struct object_id *oid)
 {
 	int i;
 	for (i = 0; i < the_hash_algo->rawsz; i++) {
 		static char hex[] = "0123456789abcdef";
-		unsigned int val = sha1[i];
+		unsigned int val = oid->hash[i];
 		strbuf_addch(buf, hex[val >> 4]);
 		strbuf_addch(buf, hex[val & 0xf]);
 		if (!i)
@@ -348,19 +348,19 @@ static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1)
 
 static const char *odb_loose_path(struct object_directory *odb,
 				  struct strbuf *buf,
-				  const unsigned char *sha1)
+				  const struct object_id *oid)
 {
 	strbuf_reset(buf);
 	strbuf_addstr(buf, odb->path);
 	strbuf_addch(buf, '/');
-	fill_sha1_path(buf, sha1);
+	fill_loose_path(buf, oid);
 	return buf->buf;
 }
 
 const char *loose_object_path(struct repository *r, struct strbuf *buf,
-			      const unsigned char *sha1)
+			      const struct object_id *oid)
 {
-	return odb_loose_path(r->objects->odb, buf, sha1);
+	return odb_loose_path(r->objects->odb, buf, oid);
 }
 
 /*
@@ -721,7 +721,7 @@ static int check_and_freshen_odb(struct object_directory *odb,
 				 int freshen)
 {
 	static struct strbuf path = STRBUF_INIT;
 
-	odb_loose_path(odb, &path, oid->hash);
+	odb_loose_path(odb, &path, oid);
 	return check_and_freshen_file(path.buf, freshen);
 }
 
@@ -879,15 +879,15 @@ int git_open_cloexec(const char *name, int flags)
  * Note that it may point to static storage and is only valid until another
  * call to stat_sha1_file().
  */
-static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
-			  struct stat *st, const char **path)
+static int stat_loose_object(struct repository *r, const struct object_id *oid,
+			     struct stat *st, const char **path)
 {
 	struct object_directory *odb;
 	static struct strbuf buf = STRBUF_INIT;
 
 	prepare_alt_odb(r);
 	for (odb = r->objects->odb; odb; odb = odb->next) {
-		*path = odb_loose_path(odb, &buf, sha1);
+		*path = odb_loose_path(odb, &buf, oid);
 		if (!lstat(*path, st))
 			return 0;
 	}
@@ -900,7 +900,7 @@ static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
  * descriptor. See the caveats on the "path" parameter above.
  */
 static int open_sha1_file(struct repository *r,
-			  const unsigned char *sha1, const char **path)
+			  const struct object_id *oid, const char **path)
 {
 	int fd;
 	struct object_directory *odb;
@@ -909,7 +909,7 @@ static int open_sha1_file(struct repository *r,
 
 	prepare_alt_odb(r);
 	for (odb = r->objects->odb; odb; odb = odb->next) {
-		*path = odb_loose_path(odb, &buf, sha1);
+		*path = odb_loose_path(odb, &buf, oid);
 		fd = git_open(*path);
 		if (fd >= 0)
 			return fd;
@@ -922,19 +922,16 @@ static int open_sha1_file(struct repository *r,
 }
 
 static int quick_has_loose(struct repository *r,
-			   const unsigned char *sha1)
+			   const struct object_id *oid)
 {
-	int subdir_nr = sha1[0];
-	struct object_id oid;
+	int subdir_nr = oid->hash[0];
 	struct object_directory *odb;
 
-	hashcpy(oid.hash, sha1);
-
 	prepare_alt_odb(r);
 	for (odb = r->objects->odb; odb; odb = odb->next) {
 		odb_load_loose_cache(odb, subdir_nr);
 		if (oid_array_lookup(&odb->loose_objects_cache[subdir_nr],
-				     &oid) >= 0)
+				     oid) >= 0)
 			return 1;
 	}
 	return 0;
@@ -944,8 +941,8 @@ static int quick_has_loose(struct repository *r,
  * Map the loose object at "path" if it is not NULL, or the path found by
  * searching for a loose object named "sha1".
  */
-static void *map_sha1_file_1(struct repository *r, const char *path,
-			     const unsigned char *sha1, unsigned long *size)
+static void *map_loose_object_1(struct repository *r, const char *path,
+				const struct object_id *oid, unsigned long *size)
 {
 	void *map;
 	int fd;
@@ -953,7 +950,7 @@ static void *map_sha1_file_1(struct repository *r, const char *path,
 	if (path)
 		fd = git_open(path);
 	else
-		fd = open_sha1_file(r, sha1, &path);
+		fd = open_sha1_file(r, oid, &path);
 	map = NULL;
 	if (fd >= 0) {
 		struct stat st;
@@ -972,10 +969,11 @@ static void *map_sha1_file_1(struct repository *r, const char *path,
 	return map;
 }
 
-void *map_sha1_file(struct repository *r,
-		    const unsigned char *sha1, unsigned long *size)
+void *map_loose_object(struct repository *r,
+		       const struct object_id *oid,
+		       unsigned long *size)
 {
-	return map_sha1_file_1(r, NULL, sha1, size);
+	return map_loose_object_1(r, NULL, oid, size);
 }
 
 static int unpack_sha1_short_header(git_zstream *stream,
@@ -1045,7 +1043,9 @@ static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned char *map,
 	return -1;
 }
 
-static void *unpack_sha1_rest(git_zstream *stream, void *buffer, unsigned long size, const unsigned char *sha1)
+static void *unpack_loose_rest(git_zstream *stream,
+			       void *buffer, unsigned long size,
+			       const struct object_id *oid)
 {
 	int bytes = strlen(buffer) + 1;
 	unsigned char *buf = xmallocz(size);
@@ -1082,10 +1082,10 @@ static void *unpack_sha1_rest(git_zstream *stream, void *buffer, unsigned long s
 	}
 
 	if (status < 0)
-		error(_("corrupt loose object '%s'"), sha1_to_hex(sha1));
+		error(_("corrupt loose object '%s'"), oid_to_hex(oid));
 	else if (stream->avail_in)
 		error(_("garbage at end of loose object '%s'"),
-		      sha1_to_hex(sha1));
+		      oid_to_hex(oid));
 	free(buf);
 	return NULL;
 }
@@ -1164,9 +1164,9 @@ int parse_sha1_header(const char *hdr, unsigned long *sizep)
 	return parse_sha1_header_extended(hdr, &oi, 0);
 }
 
-static int sha1_loose_object_info(struct repository *r,
-				  const unsigned char *sha1,
-				  struct object_info *oi, int flags)
+static int loose_object_info(struct repository *r,
+			     const struct object_id *oid,
+			     struct object_info *oi, int flags)
 {
 	int status = 0;
 	unsigned long mapsize;
@@ -1191,15 +1191,15 @@ static int sha1_loose_object_info(struct repository *r,
 		const char *path;
 		struct stat st;
 		if (!oi->disk_sizep && (flags & OBJECT_INFO_QUICK))
-			return quick_has_loose(r, sha1) ? 0 : -1;
-		if (stat_sha1_file(r, sha1, &st, &path) < 0)
+			return quick_has_loose(r, oid) ? 0 : -1;
+		if (stat_loose_object(r, oid, &st, &path) < 0)
 			return -1;
 		if (oi->disk_sizep)
 			*oi->disk_sizep = st.st_size;
 		return 0;
 	}
 
-	map = map_sha1_file(r, sha1, &mapsize);
+	map = map_loose_object(r, oid, &mapsize);
 	if (!map)
 		return -1;
 
@@ -1211,22 +1211,22 @@ static int sha1_loose_object_info(struct repository *r,
 	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
 		if (unpack_sha1_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
 			status = error(_("unable to unpack %s header with --allow-unknown-type"),
-				       sha1_to_hex(sha1));
+				       oid_to_hex(oid));
 	} else if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
 		status = error(_("unable to unpack %s header"),
-			       sha1_to_hex(sha1));
+			       oid_to_hex(oid));
 	if (status < 0)
 		; /* Do nothing */
 	else if (hdrbuf.len) {
 		if ((status = parse_sha1_header_extended(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
-				       sha1_to_hex(sha1));
+				       oid_to_hex(oid));
 	} else if ((status = parse_sha1_header_extended(hdr, oi, flags)) < 0)
-		status = error(_("unable to parse %s header"), sha1_to_hex(sha1));
+		status = error(_("unable to parse %s header"), oid_to_hex(oid));
 
 	if (status >= 0 && oi->contentp) {
-		*oi->contentp = unpack_sha1_rest(&stream, hdr,
-						 *oi->sizep, sha1);
+		*oi->contentp = unpack_loose_rest(&stream, hdr,
+						  *oi->sizep, oid);
 		if (!*oi->contentp) {
 			git_inflate_end(&stream);
 			status = -1;
@@ -1292,7 +1292,7 @@ int oid_object_info_extended(struct repository *r, const struct object_id *oid,
 			return -1;
 
 	/* Most likely it's a loose object. */
-	if (!sha1_loose_object_info(r, real->hash, oi, flags))
+	if (!loose_object_info(r, real, oi, flags))
 		return 0;
 
 	/* Not a loose object; someone else may have just packed it. */
@@ -1420,7 +1420,7 @@ void *read_object_file_extended(const struct object_id *oid,
 			die(_("replacement %s not found for %s"),
 			    oid_to_hex(repl), oid_to_hex(oid));
 
-	if (!stat_sha1_file(the_repository, repl->hash, &st, &path))
+	if (!stat_loose_object(the_repository, repl, &st, &path))
 		die(_("loose object %s (stored in %s) is corrupt"),
 		    oid_to_hex(repl), path);
 
@@ -1620,7 +1620,7 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 	static struct strbuf tmp_file = STRBUF_INIT;
 	static struct strbuf filename = STRBUF_INIT;
 
-	loose_object_path(the_repository, &filename, oid->hash);
+	loose_object_path(the_repository, &filename, oid);
 
 	fd = create_tmpfile(&tmp_file, filename.buf);
 	if (fd < 0) {
@@ -2196,7 +2196,7 @@ static int check_stream_sha1(git_zstream *stream,
 
 	/*
 	 * This size comparison must be "<=" to read the final zlib packets;
-	 * see the comment in unpack_sha1_rest for details.
+	 * see the comment in unpack_loose_rest for details.
 	 */
 	while (total_read <= size &&
 	       (status == Z_OK ||
@@ -2245,7 +2245,7 @@ int read_loose_object(const char *path,
 
 	*contents = NULL;
 
-	map = map_sha1_file_1(the_repository, path, NULL, &mapsize);
+	map = map_loose_object_1(the_repository, path, NULL, &mapsize);
 	if (!map) {
 		error_errno(_("unable to mmap %s"), path);
 		goto out;
@@ -2267,7 +2267,7 @@ int read_loose_object(const char *path,
 		if (check_stream_sha1(&stream, hdr, *size, path, expected_oid->hash) < 0)
 			goto out;
 	} else {
-		*contents = unpack_sha1_rest(&stream, hdr, *size, expected_oid->hash);
+		*contents = unpack_loose_rest(&stream, hdr, *size, expected_oid);
 		if (!*contents) {
 			error(_("unable to unpack contents of %s"), path);
 			git_inflate_end(&stream);
diff --git a/streaming.c b/streaming.c
index ac7c7a22f9..9049146bc1 100644
--- a/streaming.c
+++ b/streaming.c
@@ -338,8 +338,8 @@ static struct stream_vtbl loose_vtbl = {
 
 static open_method_decl(loose)
 {
-	st->u.loose.mapped = map_sha1_file(the_repository,
-					   oid->hash, &st->u.loose.mapsize);
+	st->u.loose.mapped = map_loose_object(the_repository,
+					      oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
 		return -1;
 	if ((unpack_sha1_header(&st->z,
On 05.12.2018 at 09:15, Jeff King wrote:
> On Wed, Dec 05, 2018 at 01:51:36AM -0500, Jeff King wrote:
>
>>> This
>>> function is easily converted to struct object_id, though, as its single
>>> caller can pass one on -- this makes the copy unnecessary.
>>
>> If you mean modifying sha1_loose_object_info() to take an oid, then
>> sure, I agree that is a good step forward (and that is exactly the "punt
>> until" moment I meant).
>
> So the simple thing is to do that, and then have it pass oid->hash to
> the other functions it uses.

Yes.

> If we start to convert those, there's a
> little bit of a rabbit hole, but it's actually not too bad.

You don't need to crawl in just for quick_has_loose(), but eventually
everything has to be converted. It seems a bit much for one patch, but
perhaps that's just my ever-decreasing attention span speaking.
Converting one function prototype or struct member at a time seems about
the right amount of change per patch to me. That's not always possible
due to entanglement, of course.

> Most of the spill-over is into the dumb-http code. Note that it actually
> uses sha1 itself! That probably needs to be the_hash_algo (though I'm
> not even sure how we'd negotiate the algorithm across a dumb fetch). At
> any rate, I don't think this patch makes anything _worse_ in that
> respect.

Right.

> diff --git a/sha1-file.c b/sha1-file.c
> index 3ddf4c9426..0705709036 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -333,12 +333,12 @@ int raceproof_create_file(const char *path, create_file_fn fn, void *cb)
>  	return ret;
>  }
>
> -static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1)
> +static void fill_loose_path(struct strbuf *buf, const struct object_id *oid)

The new name fits.

> @@ -879,15 +879,15 @@ int git_open_cloexec(const char *name, int flags)
>   * Note that it may point to static storage and is only valid until another
>   * call to stat_sha1_file().
>   */
> -static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
> -			  struct stat *st, const char **path)
> +static int stat_loose_object(struct repository *r, const struct object_id *oid,
> +			     struct stat *st, const char **path)

Hmm, read_sha1_file() was renamed to read_object_file() earlier this
year, and I'd have expected this to become stat_object_file(). Names..

Anyway, the comment above and one a few lines below should be updated
as well.

>  {
>  	struct object_directory *odb;
>  	static struct strbuf buf = STRBUF_INIT;
>
>  	prepare_alt_odb(r);
>  	for (odb = r->objects->odb; odb; odb = odb->next) {
> -		*path = odb_loose_path(odb, &buf, sha1);
> +		*path = odb_loose_path(odb, &buf, oid);
>  		if (!lstat(*path, st))
>  			return 0;
>  	}
> @@ -900,7 +900,7 @@ static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
>   * descriptor. See the caveats on the "path" parameter above.
>   */
>  static int open_sha1_file(struct repository *r,
> -			  const unsigned char *sha1, const char **path)
> +			  const struct object_id *oid, const char **path)

That function should lose the "sha1" in its name as well.

> -static void *map_sha1_file_1(struct repository *r, const char *path,
> -			     const unsigned char *sha1, unsigned long *size)
> +static void *map_loose_object_1(struct repository *r, const char *path,
> +				const struct object_id *oid, unsigned long *size)

Similarly, map_object_file_1()?

> -void *map_sha1_file(struct repository *r,
> -		    const unsigned char *sha1, unsigned long *size)
> +void *map_loose_object(struct repository *r,
> +		       const struct object_id *oid,
> +		       unsigned long *size)

Similar.

> @@ -1045,7 +1043,9 @@ static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned char *map,
>  	return -1;
>  }
>
> -static void *unpack_sha1_rest(git_zstream *stream, void *buffer, unsigned long size, const unsigned char *sha1)
> +static void *unpack_loose_rest(git_zstream *stream,
> +			       void *buffer, unsigned long size,
> +			       const struct object_id *oid)

Hmm, both old and new name here look weird to me at this point.

> -static int sha1_loose_object_info(struct repository *r,
> -				  const unsigned char *sha1,
> -				  struct object_info *oi, int flags)
> +static int loose_object_info(struct repository *r,
> +			     const struct object_id *oid,
> +			     struct object_info *oi, int flags)

And nothing of value was lost. :)

René
On Wed, Dec 05, 2018 at 07:41:44PM +0100, René Scharfe wrote:

> > If we start to convert those, there's a
> > little bit of a rabbit hole, but it's actually not too bad.
>
> You don't need to crawl in just for quick_has_loose(), but eventually
> everything has to be converted. It seems a bit much for one patch, but
> perhaps that's just my ever-decreasing attention span speaking.

Yeah, my normal process here is to dig to the bottom of the rabbit hole,
and then break it into actual patches. I just shared the middle state
here. ;)

I suspect the http bits could be split off into their own thing. The
bits in sha1-file.c I'd plan to mostly do all together, as they are a
family of related functions.

Mostly I wasn't sure how to wrap this up with the other changes. It's
obviously pretty invasive, and I don't want it to get in the way of
actual functional changes we've been discussing.

> > @@ -879,15 +879,15 @@ int git_open_cloexec(const char *name, int flags)
> >   * Note that it may point to static storage and is only valid until another
> >   * call to stat_sha1_file().
> >   */
> > -static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
> > -			  struct stat *st, const char **path)
> > +static int stat_loose_object(struct repository *r, const struct object_id *oid,
> > +			     struct stat *st, const char **path)
>
> Hmm, read_sha1_file() was renamed to read_object_file() earlier this
> year, and I'd have expected this to become stat_object_file(). Names..

read_object_file() is about reading an object from _any_ source. These
are specifically about loose objects, and I think that distinction is
important (both here and for map_loose_object, etc).

I'd actually argue that read_object_file() should just be read_object(),
but that already exists. Sigh. (I think it's fixable, but obviously
orthogonal to this topic).

> Anyway, the comment above and one a few lines below should be updated
> as well.

Thanks, fixed.

> >  static int open_sha1_file(struct repository *r,
> > -			  const unsigned char *sha1, const char **path)
> > +			  const struct object_id *oid, const char **path)
>
> That function should lose the "sha1" in its name as well.

Yep, fixed.

> > -static void *unpack_sha1_rest(git_zstream *stream, void *buffer, unsigned long size, const unsigned char *sha1)
> > +static void *unpack_loose_rest(git_zstream *stream,
> > +			       void *buffer, unsigned long size,
> > +			       const struct object_id *oid)
>
> Hmm, both old and new name here look weird to me at this point.

It makes more sense in the pairing of unpack_sha1_header() and
unpack_sha1_rest(). Maybe "body" would be better than "rest". At any
rate, it probably makes sense to rename them together (but I didn't
touch the "header" one here).

Maybe the name changes should come as a separate patch. I was mostly
changing them here because I was changing the signatures anyway, and had
to touch all of the callers.

-Peff
diff --git a/object-store.h b/object-store.h
index bf1e0cb761..60758efad8 100644
--- a/object-store.h
+++ b/object-store.h
@@ -13,6 +13,7 @@ struct object_directory {
 	/*
 	 * Used to store the results of readdir(3) calls when we are OK
 	 * sacrificing accuracy due to races for speed. That includes
+	 * object existence with OBJECT_INFO_QUICK, as well as
 	 * our search for unique abbreviated hashes. Don't use it for tasks
 	 * requiring greater accuracy!
 	 *
diff --git a/sha1-file.c b/sha1-file.c
index 4aae716a37..e53da0b701 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -921,6 +921,24 @@ static int open_sha1_file(struct repository *r,
 	return -1;
 }
 
+static int quick_has_loose(struct repository *r,
+			   const unsigned char *sha1)
+{
+	int subdir_nr = sha1[0];
+	struct object_id oid;
+	struct object_directory *odb;
+
+	hashcpy(oid.hash, sha1);
+
+	prepare_alt_odb(r);
+	for (odb = r->objects->odb; odb; odb = odb->next) {
+		odb_load_loose_cache(odb, subdir_nr);
+		if (oid_array_lookup(&odb->loose_objects_cache, &oid) >= 0)
+			return 1;
+	}
+	return 0;
+}
+
 /*
  * Map the loose object at "path" if it is not NULL, or the path found by
  * searching for a loose object named "sha1".
@@ -1171,6 +1189,8 @@ static int sha1_loose_object_info(struct repository *r,
 	if (!oi->typep && !oi->type_name && !oi->sizep && !oi->contentp) {
 		const char *path;
 		struct stat st;
+		if (!oi->disk_sizep && (flags & OBJECT_INFO_QUICK))
+			return quick_has_loose(r, sha1) ? 0 : -1;
 		if (stat_sha1_file(r, sha1, &st, &path) < 0)
 			return -1;
 		if (oi->disk_sizep)
In cases where we expect to ask has_sha1_file() about a lot of objects
that we are not likely to have (e.g., during fetch negotiation), we
already use OBJECT_INFO_QUICK to sacrifice accuracy (due to racing with
a simultaneous write or repack) for speed (we avoid re-scanning the pack
directory).

However, even checking for loose objects can be expensive, as we will
stat() each one. On many systems this cost isn't too noticeable, but
stat() can be particularly slow on some operating systems, or due to
network filesystems.

Since the QUICK flag already tells us that we're OK with a slightly
stale answer, we can use that as a cue to look in our in-memory cache of
each object directory. That basically trades an in-memory binary search
for a stat() call.

Note that it is possible for this to actually be _slower_. We'll do a
full readdir() to fill the cache, so if you have a very large number of
loose objects and a very small number of lookups, that readdir() may end
up more expensive.

This shouldn't be a big deal in practice. If you have a large number of
reachable loose objects, you'll already run into performance problems
(which you should remedy by repacking). You may have unreachable objects
which wouldn't otherwise impact performance. Usually these would go away
with the prune step of "git gc", but they may be held for up to 2 weeks
in the default configuration.

So it comes down to how many such objects you might reasonably expect
to have, and how much slower readdir() on N entries is than M stat()
calls (and here we really care about the syscall backing readdir(),
like getdents() on Linux, but I'll just call this readdir() below).

If N is much smaller than M (a typical packed repo), we know this is a
big win (few readdirs() followed by many uses of the resulting cache).
When N and M are similar in size, it's also a win. We care about the
latency of making a syscall, and readdir() should be giving us many
values in a single call. How many?
On Linux, running "strace -e getdents ls" shows a 32k buffer getting
512 entries per call (which is 64 bytes per entry; the name itself is
38 bytes, plus there are some other fields). So we can imagine that
this is always a win as long as the number of loose objects in the
repository is a factor of 500 less than the number of lookups you
make. It's hard to auto-tune this because we don't generally know up
front how many lookups we're going to do. But it's unlikely for this
to perform significantly worse.

Signed-off-by: Jeff King <peff@peff.net>
---
There's some obvious hand-waving in the paragraphs above. I would love
it if somebody with an NFS system could do some before/after timings
with various numbers of loose objects, to get a sense of where the
breakeven point is.

My gut is that we do not need the complexity of a cache-size limit,
nor of a config option to disable this. But it would be nice to have
a real number where "reasonable" ends and "pathological" begins. :)

 object-store.h |  1 +
 sha1-file.c    | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)