Message ID | 8da7bad2-b5a8-5aef-284b-dfa4e78556a9@web.de |
---|---|
State | New, archived |
Series | hash: reduce size of algo member of object ID |
René Scharfe <l.s.r@web.de> wrote:

> cf0983213c (hash: add an algo member to struct object_id, 2021-04-26)
> introduced the algo member as an int. This increased the size of struct
> object_id by 4 bytes (12.5%) and imposed a 4-byte alignment. Currently
> we only need to store the values 0, 1 and 2 in it. Let's use an
> unsigned char instead to reduce the size overhead and lift the alignment
> requirement.

I like it. Btw, would a bitfield enum be portable enough for us?

	enum git_hash_algo algo:8; /* or algo:2 */

I've used those in other projects, but AFAIK they were gcc/clang-only.

> Not sure how to measure the performance impact of this change. The perf
> tests mentioned by cf0983213c don't show much of a difference with
> GIT_PERF_REPEAT_COUNT=10 for me:

Thanks for that info. I'm not familiar enough with most git internals to
know whether it ought to make a difference as-is. In my experience,
changes like this open the door to further struct-packing opportunities
(possibly combined with conditional use of __attribute__((packed))).
On Sun, Oct 03, 2021 at 07:51:40AM +0200, René Scharfe wrote:

> cf0983213c (hash: add an algo member to struct object_id, 2021-04-26)
> introduced the algo member as an int. This increased the size of struct
> object_id by 4 bytes (12.5%) and imposed a 4-byte alignment. Currently
> we only need to store the values 0, 1 and 2 in it. Let's use an
> unsigned char instead to reduce the size overhead and lift the alignment
> requirement.
>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
> Not sure how to measure the performance impact of this change. The perf
> tests mentioned by cf0983213c don't show much of a difference with
> GIT_PERF_REPEAT_COUNT=10 for me:

Not surprising. If going from nothing to an int didn't have an impact on
those tests, then going from an int to a char isn't likely to, either.

> 0001.1: rev-list --all                          0.11(0.08+0.02)  0.11(0.08+0.02) +0.0%
> 0001.2: rev-list --all --objects                3.04(2.98+0.05)  3.04(2.98+0.05) +0.0%
> 0001.3: rev-list --parents                      0.05(0.04+0.01)  0.05(0.03+0.01) +0.0%
> 0001.5: rev-list -- dummy                       0.21(0.20+0.01)  0.21(0.19+0.01) +0.0%
> 0001.6: rev-list --parents -- dummy             0.22(0.20+0.01)  0.22(0.20+0.01) +0.0%
> 0001.8: rev-list $commit --not --all            0.06(0.05+0.00)  0.06(0.05+0.00) +0.0%
> 0001.9: rev-list --objects $commit --not --all  0.06(0.05+0.00)  0.06(0.05+0.00) +0.0%

I do think these probably aren't very good tests. They're going to
mostly be dealing with actual object structs. Your patch changes the
size of "struct object_id", but the object structs themselves still need
4-byte alignment. And they have a bunch of padding in the first place.
If we instrument "git version --build-options" like this:

diff --git a/help.c b/help.c
index 3c3bdec213..718e32cadf 100644
--- a/help.c
+++ b/help.c
@@ -11,6 +11,8 @@
 #include "version.h"
 #include "refs.h"
 #include "parse-options.h"
+#include "blob.h"
+#include "tag.h"
 
 struct category_description {
 	uint32_t category;
@@ -663,6 +665,11 @@ void get_version_info(struct strbuf *buf, int show_build_options)
 		strbuf_addf(buf, "sizeof-long: %d\n", (int)sizeof(long));
 		strbuf_addf(buf, "sizeof-size_t: %d\n", (int)sizeof(size_t));
 		strbuf_addf(buf, "shell-path: %s\n", SHELL_PATH);
+		strbuf_addf(buf, "sizeof-object_id: %d\n", (int)sizeof(struct object_id));
+		strbuf_addf(buf, "sizeof-commit: %d\n", (int)sizeof(struct commit));
+		strbuf_addf(buf, "sizeof-blob: %d\n", (int)sizeof(struct blob));
+		strbuf_addf(buf, "sizeof-tree: %d\n", (int)sizeof(struct tree));
+		strbuf_addf(buf, "sizeof-tag: %d\n", (int)sizeof(struct tag));
 		/* NEEDSWORK: also save and output GIT-BUILD_OPTIONS? */
 	}
 }

then we can see that your patch doesn't change the size of any of
the structs at all! Even the original cf0983213c going from nothing to
an int changed only "struct blob" (it went from 32 to 36 bytes).

A more interesting test is something that stores a bunch of oids. A good
candidate is "cat-file --batch-all-objects". In the default mode, it
uses an oid_array to sort the set of objects. In unordered mode, it uses
an oidset to mark seen ones.

Here are hyperfine results. The three versions are:

  - none: cf0983213c^
  - int:  cf0983213c
  - char: cf0983213c with the equivalent of your patch on top

All compiled with "gcc -O2", and run in a fully packed clone of
linux.git.
It looks like adding the "algo" field did make a big difference for the
oid_array case, but changing it to a char doesn't seem to help at all:

  $ hyperfine -L v none,int,char './git.{v} cat-file --batch-all-objects --batch-check="%(objectname)"'
  Benchmark #1: ./git.none cat-file --batch-all-objects --batch-check="%(objectname)"
    Time (mean ± σ):      1.653 s ±  0.009 s    [User: 1.607 s, System: 0.046 s]
    Range (min … max):    1.640 s …  1.670 s    10 runs

  Benchmark #2: ./git.int cat-file --batch-all-objects --batch-check="%(objectname)"
    Time (mean ± σ):      1.067 s ±  0.012 s    [User: 1.017 s, System: 0.050 s]
    Range (min … max):    1.053 s …  1.089 s    10 runs

  Benchmark #3: ./git.char cat-file --batch-all-objects --batch-check="%(objectname)"
    Time (mean ± σ):      1.092 s ±  0.013 s    [User: 1.046 s, System: 0.046 s]
    Range (min … max):    1.080 s …  1.116 s    10 runs

  Summary
    './git.int cat-file --batch-all-objects --batch-check="%(objectname)"' ran
      1.02 ± 0.02 times faster than './git.char cat-file --batch-all-objects --batch-check="%(objectname)"'
      1.55 ± 0.02 times faster than './git.none cat-file --batch-all-objects --batch-check="%(objectname)"'

I'm actually surprised it had this much of an impact. But I guess this
benchmark really is mostly just memcpy-ing oids into a big array,
sorting it, and printing the result. If that array is 12% bigger, we'd
expect at least a 12% speedup. But adding in non-linear elements like
growing the array (though I guess that is amortized linear) and sorting
(which picks up an extra log(n) term) make the difference.

It's _kind of_ silly in a sense, since usually you'd ask for other parts
of the object, which will make the speed difference relatively smaller.
But just dumping a bunch of oids is actually not an unreasonable thing
to do. I suspect it got a lot slower with 32-byte GIT_MAX_RAWSZ, too
(even when you're using 20-byte sha1), but I don't think there's an easy
way to get out of that.

Using --unordered switches us to using a khash-backed oidset.
This doesn't seem to show any meaningful difference in the three cases
(the averages follow the pattern I'd expect, but we're within the
noise).

  $ hyperfine -L v none,int,char './git.{v} cat-file --batch-all-objects --unordered --batch-check="%(objectname)"'
  Benchmark #1: ./git.none cat-file --batch-all-objects --unordered --batch-check="%(objectname)"
    Time (mean ± σ):      2.125 s ±  0.024 s    [User: 2.071 s, System: 0.054 s]
    Range (min … max):    2.092 s …  2.182 s    10 runs

  Benchmark #2: ./git.int cat-file --batch-all-objects --unordered --batch-check="%(objectname)"
    Time (mean ± σ):      2.152 s ±  0.016 s    [User: 2.083 s, System: 0.068 s]
    Range (min … max):    2.123 s …  2.175 s    10 runs

  Benchmark #3: ./git.char cat-file --batch-all-objects --unordered --batch-check="%(objectname)"
    Time (mean ± σ):      2.137 s ±  0.013 s    [User: 2.079 s, System: 0.058 s]
    Range (min … max):    2.127 s …  2.167 s    10 runs

  Summary
    './git.none cat-file --batch-all-objects --unordered --batch-check="%(objectname)"' ran
      1.01 ± 0.01 times faster than './git.char cat-file --batch-all-objects --unordered --batch-check="%(objectname)"'
      1.01 ± 0.01 times faster than './git.int cat-file --batch-all-objects --unordered --batch-check="%(objectname)"'

What surprises me most here is that --unordered is so much slower than
its sorted cousin. Its purpose is to make things faster (though it
likely still does so for queries that actually look at the object
properties, since hitting them in pack order is beneficial there).

-Peff
On Mon, Oct 04, 2021 at 04:13:34AM -0400, Jeff King wrote:

> It looks like adding the "algo" field did make a big difference for the
> oid_array case, but changing it to a char doesn't seem to help at all:
>
>   $ hyperfine -L v none,int,char './git.{v} cat-file --batch-all-objects --batch-check="%(objectname)"'
>   Benchmark #1: ./git.none cat-file --batch-all-objects --batch-check="%(objectname)"
>     Time (mean ± σ):      1.653 s ±  0.009 s    [User: 1.607 s, System: 0.046 s]
>     Range (min … max):    1.640 s …  1.670 s    10 runs
>
>   Benchmark #2: ./git.int cat-file --batch-all-objects --batch-check="%(objectname)"
>     Time (mean ± σ):      1.067 s ±  0.012 s    [User: 1.017 s, System: 0.050 s]
>     Range (min … max):    1.053 s …  1.089 s    10 runs
>
>   Benchmark #3: ./git.char cat-file --batch-all-objects --batch-check="%(objectname)"
>     Time (mean ± σ):      1.092 s ±  0.013 s    [User: 1.046 s, System: 0.046 s]
>     Range (min … max):    1.080 s …  1.116 s    10 runs
>
>   Summary
>     './git.int cat-file --batch-all-objects --batch-check="%(objectname)"' ran
>       1.02 ± 0.02 times faster than './git.char cat-file --batch-all-objects --batch-check="%(objectname)"'
>       1.55 ± 0.02 times faster than './git.none cat-file --batch-all-objects --batch-check="%(objectname)"'
>
> I'm actually surprised it had this much of an impact. But I guess this
> benchmark really is mostly just memcpy-ing oids into a big array,
> sorting it, and printing the result. If that array is 12% bigger, we'd
> expect at least a 12% speedup. But adding in non-linear elements like
> growing the array (though I guess that is amortized linear) and sorting
> (which picks up an extra log(n) term) make the difference.
>
> It's _kind of_ silly in a sense, since usually you'd ask for other parts
> of the object, which will make the speed difference relatively smaller.
> But just dumping a bunch of oids is actually not an unreasonable thing
> to do.
> I suspect it got a lot slower with 32-byte GIT_MAX_RAWSZ, too
> (even when you're using 20-byte sha1), but I don't think there's an easy
> way to get out of that.

Oh wait, I'm reading it totally wrong. Adding in the extra 4 bytes
actually made it _faster_ than not having an algo field. Now I'm
super-confused. I could believe that it gave us some better alignment,
but the original struct was 32 bytes. 36 seems like a strictly worse
number.

-Peff
On Mon, Oct 04 2021, Jeff King wrote:

> On Sun, Oct 03, 2021 at 07:51:40AM +0200, René Scharfe wrote:
>
>> cf0983213c (hash: add an algo member to struct object_id, 2021-04-26)
>> introduced the algo member as an int. This increased the size of struct
>> object_id by 4 bytes (12.5%) and imposed a 4-byte alignment. Currently
>> we only need to store the values 0, 1 and 2 in it. Let's use an
>> unsigned char instead to reduce the size overhead and lift the alignment
>> requirement.
>>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>> Not sure how to measure the performance impact of this change. The perf
>> tests mentioned by cf0983213c don't show much of a difference with
>> GIT_PERF_REPEAT_COUNT=10 for me:
>
> Not surprising. If going from nothing to an int didn't have an impact on
> those tests, then going from an int to a char isn't likely to, either.
>
>> 0001.1: rev-list --all                          0.11(0.08+0.02)  0.11(0.08+0.02) +0.0%
>> 0001.2: rev-list --all --objects                3.04(2.98+0.05)  3.04(2.98+0.05) +0.0%
>> 0001.3: rev-list --parents                      0.05(0.04+0.01)  0.05(0.03+0.01) +0.0%
>> 0001.5: rev-list -- dummy                       0.21(0.20+0.01)  0.21(0.19+0.01) +0.0%
>> 0001.6: rev-list --parents -- dummy             0.22(0.20+0.01)  0.22(0.20+0.01) +0.0%
>> 0001.8: rev-list $commit --not --all            0.06(0.05+0.00)  0.06(0.05+0.00) +0.0%
>> 0001.9: rev-list --objects $commit --not --all  0.06(0.05+0.00)  0.06(0.05+0.00) +0.0%
>
> I do think these probably aren't very good tests. They're going to
> mostly be dealing with actual object structs. Your patch changes the
> size of "struct object_id", but the object structs themselves still need
> 4-byte alignment. And they have a bunch of padding in the first place.
>
> If we instrument "git version --build-options" like this:
>
>   diff --git a/help.c b/help.c
>   index 3c3bdec213..718e32cadf 100644
>   --- a/help.c
>   +++ b/help.c
>   @@ -11,6 +11,8 @@
>    #include "version.h"
>    #include "refs.h"
>    #include "parse-options.h"
>   +#include "blob.h"
>   +#include "tag.h"
>
>    struct category_description {
>    	uint32_t category;
>   @@ -663,6 +665,11 @@ void get_version_info(struct strbuf *buf, int show_build_options)
>    		strbuf_addf(buf, "sizeof-long: %d\n", (int)sizeof(long));
>    		strbuf_addf(buf, "sizeof-size_t: %d\n", (int)sizeof(size_t));
>    		strbuf_addf(buf, "shell-path: %s\n", SHELL_PATH);
>   +		strbuf_addf(buf, "sizeof-object_id: %d\n", (int)sizeof(struct object_id));
>   +		strbuf_addf(buf, "sizeof-commit: %d\n", (int)sizeof(struct commit));
>   +		strbuf_addf(buf, "sizeof-blob: %d\n", (int)sizeof(struct blob));
>   +		strbuf_addf(buf, "sizeof-tree: %d\n", (int)sizeof(struct tree));
>   +		strbuf_addf(buf, "sizeof-tag: %d\n", (int)sizeof(struct tag));
>    		/* NEEDSWORK: also save and output GIT-BUILD_OPTIONS? */
>    	}
>    }
>
> then we can see that your patch doesn't change the size of any of
> the structs at all! Even the original cf0983213c going from nothing to
> an int changed only "struct blob" (it went from 32 to 36 bytes).
>
> A more interesting test is something that stores a bunch of oids. A good
> candidate is "cat-file --batch-all-objects". In the default mode, it
> uses an oid_array to sort the set of objects. In unordered mode, it uses
> an oidset to mark seen ones.
>
> Here are hyperfine results. The three versions are:
>
>   - none: cf0983213c^
>   - int:  cf0983213c
>   - char: cf0983213c with the equivalent of your patch on top
>
> All compiled with "gcc -O2", and run in a fully packed clone of
> linux.git.
>
> It looks like adding the "algo" field did make a big difference for the
> oid_array case, but changing it to a char doesn't seem to help at all:
>
>   $ hyperfine -L v none,int,char './git.{v} cat-file --batch-all-objects --batch-check="%(objectname)"'
>   Benchmark #1: ./git.none cat-file --batch-all-objects --batch-check="%(objectname)"
>     Time (mean ± σ):      1.653 s ±  0.009 s    [User: 1.607 s, System: 0.046 s]
>     Range (min … max):    1.640 s …  1.670 s    10 runs
>
>   Benchmark #2: ./git.int cat-file --batch-all-objects --batch-check="%(objectname)"
>     Time (mean ± σ):      1.067 s ±  0.012 s    [User: 1.017 s, System: 0.050 s]
>     Range (min … max):    1.053 s …  1.089 s    10 runs
>
>   Benchmark #3: ./git.char cat-file --batch-all-objects --batch-check="%(objectname)"
>     Time (mean ± σ):      1.092 s ±  0.013 s    [User: 1.046 s, System: 0.046 s]
>     Range (min … max):    1.080 s …  1.116 s    10 runs
>
>   Summary
>     './git.int cat-file --batch-all-objects --batch-check="%(objectname)"' ran
>       1.02 ± 0.02 times faster than './git.char cat-file --batch-all-objects --batch-check="%(objectname)"'
>       1.55 ± 0.02 times faster than './git.none cat-file --batch-all-objects --batch-check="%(objectname)"'

*Rubs eyes*

Maybe I'm being really stupid here, but doesn't this say the opposite of
your summary? I.e. that adding the "int" field in cf0983213c made it
faster. It was ~1.6s before, now it's ~1.1s?
I haven't used "hyperfine" before, but this output is consistent with
that:

  $ hyperfine --warmup 2 -L v 1.6,1.0 'sleep {v}'
  Benchmark #1: sleep 1.6
    Time (mean ± σ):      1.604 s ±  0.000 s    [User: 2.2 ms, System: 1.1 ms]
    Range (min … max):    1.603 s …  1.604 s    10 runs

  Benchmark #2: sleep 1.0
    Time (mean ± σ):      1.004 s ±  0.000 s    [User: 2.5 ms, System: 0.9 ms]
    Range (min … max):    1.004 s …  1.004 s    10 runs

  Summary
    'sleep 1.0' ran
      1.60 ± 0.00 times faster than 'sleep 1.6'

[Goes and reads the downthread <YVq5XCyLDr0SPBzx@coredump.intra.peff.net>
where I see you made the same discovery, should have probably done that
first :)]

Anyway, just for experimenting I played with removing "int algo"
entirely on top of master. Diff on top at the end. That gives the same
sorts of results:

  $ hyperfine --warmup 2 -L v algo-02,algo-03,no-algo-02,no-algo-03 '~/g/git/git.{v} cat-file --batch-all-objects --batch-check="%(objectname)"'
  Benchmark #1: ~/g/git/git.algo-02 cat-file --batch-all-objects --batch-check="%(objectname)"
    Time (mean ± σ):      1.283 s ±  0.022 s    [User: 1.217 s, System: 0.066 s]
    Range (min … max):    1.265 s …  1.336 s    10 runs

  Benchmark #2: ~/g/git/git.algo-03 cat-file --batch-all-objects --batch-check="%(objectname)"
    Time (mean ± σ):      1.261 s ±  0.032 s    [User: 1.195 s, System: 0.066 s]
    Range (min … max):    1.240 s …  1.343 s    10 runs

  Benchmark #3: ~/g/git/git.no-algo-02 cat-file --batch-all-objects --batch-check="%(objectname)"
    Time (mean ± σ):      1.911 s ±  0.034 s    [User: 1.853 s, System: 0.058 s]
    Range (min … max):    1.878 s …  1.977 s    10 runs

  Benchmark #4: ~/g/git/git.no-algo-03 cat-file --batch-all-objects --batch-check="%(objectname)"
    Time (mean ± σ):      1.880 s ±  0.011 s    [User: 1.824 s, System: 0.056 s]
    Range (min … max):    1.864 s …  1.905 s    10 runs

  Summary
    '~/g/git/git.algo-03 cat-file --batch-all-objects --batch-check="%(objectname)"' ran
      1.02 ± 0.03 times faster than '~/g/git/git.algo-02 cat-file --batch-all-objects --batch-check="%(objectname)"'
      1.49 ± 0.04 times faster than '~/g/git/git.no-algo-03 cat-file --batch-all-objects --batch-check="%(objectname)"'
      1.52 ± 0.05 times faster than '~/g/git/git.no-algo-02 cat-file --batch-all-objects --batch-check="%(objectname)"'

Of course I did that thinking it might be worthwhile to provide say a
compile-time option for SHA-1-only deployments of git, but that's when I
went along with your misreading of the hyperfine output, seems it
actually makes it fasterer :)

Aside: I recently started testing git on HP/UX's aCC, which is quite
whiny about things like this:

  "blame.c", line 2677: warning #4232-D: conversion from "struct object *" to
  a more strictly aligned type "struct commit *" may cause misaligned access
      found = (struct commit *)obj;

Which I initially thought might be changed by something like René's
patch, but is (I think) due to the timestamp_t in commit.h; presumably
it's 32bit on HP/UX still? And int is 64 (I haven't actually been able
to fully compile git yet, so...).

Just an aside, but also wondering how, if anything, that & other
alignment might have to do with these results on platforms/compilers
that are less whiny about it. Or maybe it's all aligned on x86_64
(again, didn't check).

> I'm actually surprised it had this much of an impact. But I guess this
> benchmark really is mostly just memcpy-ing oids into a big array,
> sorting it, and printing the result. If that array is 12% bigger, we'd
> expect at least a 12% speedup. But adding in non-linear elements like
> growing the array (though I guess that is amortized linear) and sorting
> (which picks up an extra log(n) term) make the difference.
>
> It's _kind of_ silly in a sense, since usually you'd ask for other parts
> of the object, which will make the speed difference relatively smaller.
> But just dumping a bunch of oids is actually not an unreasonable thing
> to do. I suspect it got a lot slower with 32-byte GIT_MAX_RAWSZ, too
> (even when you're using 20-byte sha1), but I don't think there's an easy
> way to get out of that.
[I wrote the below before seeing that you misread the "hyperfine"
output, but most of it applies still]:

Not easy, but it might not be super-duper-hard either; I think it might
be a worthwhile thing to try, and these results seem to back that up.

I had a mostly-working patch for that that I never submitted and just
hacked up one afternoon. I was experimenting with it for something
different: to have Git support a SHA-160 (completely non-standard, the
lowest is SHA-224), i.e. a SHA-256 truncated to SHA-1 size. The
advantage being that for say a system that has various outside-of-git
(e.g. DB columns) hardcoded to SHA-1 size, you could convert the repo to
SHA-256, but use SHA-1-sized hashes publicly. Less secure than SHA-256
obviously, but only proportionally to the reduction in bits, and it
would still be less hairy than a SHA-1 derivative. I didn't think it was
worthwhile & threw it away after playing with it, but anyway...

Once you've got that, which requires disconnecting the hard assumption
in hash.h and friends that hash type == hash length (i.e. oid_to_hex()
and friends need to have variants that take 160, 224 etc.), you can also
do it the other way around. Say, have a format like the commit-graph
reference 80-bit objects, but have the object store keep the full 256
bits.

That obviously creates all sorts of hairy edge cases where none exist
today. I.e. if you set the storage hash of a SHA-1 repo to be 1/4 the
size it is now, you can't push/pull anything, or if you could we'd need
to piggy-back on the planned on-the-fly SHA-1<->SHA-256 rewriting. But
it might be interesting & useful for these performance reasons.

I.e. it's pretty easy (and I had some throwaway version of this working
with only the "occasional" segfault) to get a copy of "git" running that
say stores all OIDs at 1/4 their size. As long as you know you've got no
collisions you can use that as a quick side-index.
All your problems with interoperability are ones we'll need to deal with
for SHA-1<->SHA-256, although you'll have the extra potential problem of
collisions (which is X% likely depending on your object numbers & size
of truncation you choose).

Diff at end:

diff --git a/builtin/show-index.c b/builtin/show-index.c
index 0e0b9fb95bc..b63a3bf60b9 100644
--- a/builtin/show-index.c
+++ b/builtin/show-index.c
@@ -74,7 +74,6 @@ int cmd_show_index(int argc, const char **argv, const char *prefix)
 	for (i = 0; i < nr; i++) {
 		if (fread(entries[i].oid.hash, hashsz, 1, stdin) != 1)
 			die("unable to read sha1 %u/%u", i, nr);
-		entries[i].oid.algo = hash_algo_by_ptr(the_hash_algo);
 	}
 	for (i = 0; i < nr; i++)
 		if (fread(&entries[i].crc, 4, 1, stdin) != 1)
diff --git a/hash.h b/hash.h
index 9e25c40e9ac..bd1855b65ec 100644
--- a/hash.h
+++ b/hash.h
@@ -115,7 +115,6 @@ static inline void git_SHA256_Clone(git_SHA256_CTX *dst, const git_SHA256_CTX *s
 
 struct object_id {
 	unsigned char hash[GIT_MAX_RAWSZ];
-	int algo;	/* XXX requires 4-byte alignment */
 };
 
 /* A suitably aligned type for stack allocations of hash contexts. */
@@ -213,12 +212,7 @@ static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
 {
-	const struct git_hash_algo *algop;
-	if (!oid1->algo)
-		algop = the_hash_algo;
-	else
-		algop = &hash_algos[oid1->algo];
-	return hashcmp_algop(oid1->hash, oid2->hash, algop);
+	return hashcmp_algop(oid1->hash, oid2->hash, the_hash_algo);
 }
 
 static inline int hasheq_algop(const unsigned char *sha1, const unsigned char *sha2, const struct git_hash_algo *algop)
@@ -239,12 +233,7 @@ static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
 
 static inline int oideq(const struct object_id *oid1, const struct object_id *oid2)
 {
-	const struct git_hash_algo *algop;
-	if (!oid1->algo)
-		algop = the_hash_algo;
-	else
-		algop = &hash_algos[oid1->algo];
-	return hasheq_algop(oid1->hash, oid2->hash, algop);
+	return hasheq_algop(oid1->hash, oid2->hash, the_hash_algo);
 }
 
 static inline int is_null_oid(const struct object_id *oid)
@@ -260,23 +249,16 @@ static inline void hashcpy(unsigned char *sha_dst, const unsigned char *sha_src)
 
 static inline void oidcpy(struct object_id *dst, const struct object_id *src)
 {
 	memcpy(dst->hash, src->hash, GIT_MAX_RAWSZ);
-	dst->algo = src->algo;
 }
 
 /* Like oidcpy() but zero-pads the unused bytes in dst's hash array. */
 static inline void oidcpy_with_padding(struct object_id *dst, const struct object_id *src)
 {
-	size_t hashsz;
-
-	if (!src->algo)
-		hashsz = the_hash_algo->rawsz;
-	else
-		hashsz = hash_algos[src->algo].rawsz;
+	size_t hashsz = the_hash_algo->rawsz;
 
 	memcpy(dst->hash, src->hash, hashsz);
 	memset(dst->hash + hashsz, 0, GIT_MAX_RAWSZ - hashsz);
-	dst->algo = src->algo;
 }
 
 static inline struct object_id *oiddup(const struct object_id *src)
@@ -294,13 +276,11 @@ static inline void hashclr(unsigned char *hash)
 
 static inline void oidclr(struct object_id *oid)
 {
 	memset(oid->hash, 0, GIT_MAX_RAWSZ);
-	oid->algo = hash_algo_by_ptr(the_hash_algo);
 }
 
 static inline void oidread(struct object_id *oid, const unsigned char *hash)
 {
 	memcpy(oid->hash, hash, the_hash_algo->rawsz);
-	oid->algo = hash_algo_by_ptr(the_hash_algo);
 }
 
 static inline int is_empty_blob_sha1(const unsigned char *sha1)
@@ -325,7 +305,7 @@ static inline int is_empty_tree_oid(const struct object_id *oid)
 
 static inline void oid_set_algo(struct object_id *oid, const struct git_hash_algo *algop)
 {
-	oid->algo = hash_algo_by_ptr(algop);
+	return;
 }
 
 const char *empty_tree_oid_hex(void);
diff --git a/hex.c b/hex.c
index 4f64d346963..6538e415a37 100644
--- a/hex.c
+++ b/hex.c
@@ -143,7 +143,7 @@ char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash,
 
 char *oid_to_hex_r(char *buffer, const struct object_id *oid)
 {
-	return hash_to_hex_algop_r(buffer, oid->hash, &hash_algos[oid->algo]);
+	return hash_to_hex_algop_r(buffer, oid->hash, &hash_algos[GIT_HASH_SHA1]);
 }
 
 char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *algop)
@@ -161,5 +161,5 @@ char *hash_to_hex(const unsigned char *hash)
 
 char *oid_to_hex(const struct object_id *oid)
 {
-	return hash_to_hex_algop(oid->hash, &hash_algos[oid->algo]);
+	return hash_to_hex_algop(oid->hash, &hash_algos[GIT_HASH_SHA1]);
 }
diff --git a/http-push.c b/http-push.c
index 3309aaf004a..3ce453e14a4 100644
--- a/http-push.c
+++ b/http-push.c
@@ -1011,8 +1011,6 @@ static void remote_ls(const char *path, int flags,
 /* extract hex from sharded "xx/x{38}" filename */
 static int get_oid_hex_from_objpath(const char *path, struct object_id *oid)
 {
-	oid->algo = hash_algo_by_ptr(the_hash_algo);
-
 	if (strlen(path) != the_hash_algo->hexsz + 1)
 		return -1;
diff --git a/object-file.c b/object-file.c
index be4f94ecf3b..be1385c5e72 100644
--- a/object-file.c
+++ b/object-file.c
@@ -58,27 +58,21 @@
 static const struct object_id empty_tree_oid = {
 	.hash = EMPTY_TREE_SHA1_BIN_LITERAL,
-	.algo = GIT_HASH_SHA1,
 };
 static const struct object_id empty_blob_oid = {
 	.hash = EMPTY_BLOB_SHA1_BIN_LITERAL,
-	.algo = GIT_HASH_SHA1,
 };
 static const struct object_id null_oid_sha1 = {
 	.hash = {0},
-	.algo = GIT_HASH_SHA1,
 };
 static const struct object_id empty_tree_oid_sha256 = {
 	.hash = EMPTY_TREE_SHA256_BIN_LITERAL,
-	.algo = GIT_HASH_SHA256,
 };
 static const struct object_id empty_blob_oid_sha256 = {
 	.hash = EMPTY_BLOB_SHA256_BIN_LITERAL,
-	.algo = GIT_HASH_SHA256,
 };
 static const struct object_id null_oid_sha256 = {
 	.hash = {0},
-	.algo = GIT_HASH_SHA256,
 };
 
 static void git_hash_sha1_init(git_hash_ctx *ctx)
@@ -105,7 +99,6 @@ static void git_hash_sha1_final_oid(struct object_id *oid, git_hash_ctx *ctx)
 {
 	git_SHA1_Final(oid->hash, &ctx->sha1);
 	memset(oid->hash + GIT_SHA1_RAWSZ, 0, GIT_MAX_RAWSZ - GIT_SHA1_RAWSZ);
-	oid->algo = GIT_HASH_SHA1;
 }
 
@@ -137,7 +130,6 @@ static void git_hash_sha256_final_oid(struct object_id *oid, git_hash_ctx *ctx)
 	 * but keep it in case we extend the hash size again.
 	 */
 	memset(oid->hash + GIT_SHA256_RAWSZ, 0, GIT_MAX_RAWSZ - GIT_SHA256_RAWSZ);
-	oid->algo = GIT_HASH_SHA256;
 }
 
 static void git_hash_unknown_init(git_hash_ctx *ctx)
diff --git a/oidtree.c b/oidtree.c
index 0d39389bee2..61f4d9515b5 100644
--- a/oidtree.c
+++ b/oidtree.c
@@ -33,9 +33,6 @@ void oidtree_insert(struct oidtree *ot, const struct object_id *oid)
 	struct cb_node *on;
 	struct object_id k;
 
-	if (!oid->algo)
-		BUG("oidtree_insert requires oid->algo");
-
 	on = mem_pool_alloc(&ot->mem_pool, sizeof(*on) + sizeof(*oid));
 
 	/*
@@ -62,13 +59,6 @@ int oidtree_contains(struct oidtree *ot, const struct object_id *oid)
 
 	oidcpy_with_padding(&k, oid);
 
-	if (oid->algo == GIT_HASH_UNKNOWN)
-		klen -= sizeof(oid->algo);
-
-	/* cb_lookup relies on memcmp on the struct, so order matters: */
-	klen += BUILD_ASSERT_OR_ZERO(offsetof(struct object_id, hash) <
-			offsetof(struct object_id, algo));
-
 	return cb_lookup(&ot->tree, (const uint8_t *)&k, klen) ? 1 : 0;
 }
 
@@ -80,9 +70,6 @@ static enum cb_next iter(struct cb_node *n, void *arg)
 
 	/* Copy to provide 4-byte alignment needed by struct object_id. */
 	memcpy(&k, n->k, sizeof(k));
 
-	if (x->algo != GIT_HASH_UNKNOWN && x->algo != k.algo)
-		return CB_CONTINUE;
-
 	if (x->last_nibble_at) {
 		if ((k.hash[*x->last_nibble_at] ^ x->last_byte) & 0xf0)
 			return CB_CONTINUE;
@@ -100,7 +87,6 @@ void oidtree_each(struct oidtree *ot, const struct object_id *oid,
 	x.fn = fn;
 	x.arg = arg;
-	x.algo = oid->algo;
 
 	if (oidhexsz & 1) {
 		x.last_byte = oid->hash[klen];
 		x.last_nibble_at = &klen;
diff --git a/t/helper/test-oidtree.c b/t/helper/test-oidtree.c
index 180ee28dd93..6e22b422ddc 100644
--- a/t/helper/test-oidtree.c
+++ b/t/helper/test-oidtree.c
@@ -13,7 +13,7 @@ int cmd__oidtree(int argc, const char **argv)
 	struct oidtree ot;
 	struct strbuf line = STRBUF_INIT;
 	int nongit_ok;
-	int algo = GIT_HASH_UNKNOWN;
+	int algo = GIT_HASH_SHA1;
 
 	oidtree_init(&ot);
 	setup_git_directory_gently(&nongit_ok);
@@ -25,7 +25,6 @@ int cmd__oidtree(int argc, const char **argv)
 		if (skip_prefix(line.buf, "insert ", &arg)) {
 			if (get_oid_hex_any(arg, &oid) == GIT_HASH_UNKNOWN)
 				die("insert not a hexadecimal oid: %s", arg);
-			algo = oid.algo;
 			oidtree_insert(&ot, &oid);
 		} else if (skip_prefix(line.buf, "contains ", &arg)) {
 			if (get_oid_hex(arg, &oid))
@@ -37,7 +36,6 @@ int cmd__oidtree(int argc, const char **argv)
 			memcpy(buf, arg, strlen(arg));
 			buf[hash_algos[algo].hexsz] = '\0';
 			get_oid_hex_any(buf, &oid);
-			oid.algo = algo;
 			oidtree_each(&ot, &oid, strlen(arg), print_oid, NULL);
 		} else if (!strcmp(line.buf, "clear")) {
 			oidtree_clear(&ot);
On 2021-10-04 at 08:20:44, Jeff King wrote:

> Oh wait, I'm reading it totally wrong. Adding in the extra 4 bytes
> actually made it _faster_ than not having an algo field. Now I'm
> super-confused. I could believe that it gave us some better alignment,
> but the original struct was 32 bytes. 36 seems like a strictly worse
> number.

My guess is that the increased alignment means that memcpy can perform
much better. Just because x86 has "fast" unaligned access doesn't mean
it's free; there remains a penalty for that, although other
architectures which support unaligned access have much worse ones.
memcpy and memcmp will perform better when they can use 32-bit chunks to
read without having to process the potentially unaligned pieces at the
beginning and end.

For the record, I have no particular stylistic opinion about whether we
should adopt the proposed patch, but of course if it's faster as it is,
we should probably leave it.
Am 04.10.21 um 10:20 schrieb Jeff King:

> On Mon, Oct 04, 2021 at 04:13:34AM -0400, Jeff King wrote:
>
>> It looks like adding the "algo" field did make a big difference for the
>> oid_array case, but changing it to a char doesn't seem to help at all:
>>
>>   $ hyperfine -L v none,int,char './git.{v} cat-file --batch-all-objects --batch-check="%(objectname)"'
>>   Benchmark #1: ./git.none cat-file --batch-all-objects --batch-check="%(objectname)"
>>     Time (mean ± σ):      1.653 s ±  0.009 s    [User: 1.607 s, System: 0.046 s]
>>     Range (min … max):    1.640 s …  1.670 s    10 runs
>>
>>   Benchmark #2: ./git.int cat-file --batch-all-objects --batch-check="%(objectname)"
>>     Time (mean ± σ):      1.067 s ±  0.012 s    [User: 1.017 s, System: 0.050 s]
>>     Range (min … max):    1.053 s …  1.089 s    10 runs
>>
>>   Benchmark #3: ./git.char cat-file --batch-all-objects --batch-check="%(objectname)"
>>     Time (mean ± σ):      1.092 s ±  0.013 s    [User: 1.046 s, System: 0.046 s]
>>     Range (min … max):    1.080 s …  1.116 s    10 runs
>>
>>   Summary
>>     './git.int cat-file --batch-all-objects --batch-check="%(objectname)"' ran
>>       1.02 ± 0.02 times faster than './git.char cat-file --batch-all-objects --batch-check="%(objectname)"'
>>       1.55 ± 0.02 times faster than './git.none cat-file --batch-all-objects --batch-check="%(objectname)"'
>>
>> I'm actually surprised it had this much of an impact. But I guess this
>> benchmark really is mostly just memcpy-ing oids into a big array,
>> sorting it, and printing the result. If that array is 12% bigger, we'd
>> expect at least a 12% speedup. But adding in non-linear elements like
>> growing the array (though I guess that is amortized linear) and sorting
>> (which picks up an extra log(n) term) make the difference.
>>
>> It's _kind of_ silly in a sense, since usually you'd ask for other parts
>> of the object, which will make the speed difference relatively smaller.
>> But just dumping a bunch of oids is actually not an unreasonable thing
>> to do. I suspect it got a lot slower with 32-byte GIT_MAX_RAWSZ, too
>> (even when you're using 20-byte sha1), but I don't think there's an easy
>> way to get out of that.
>
> Oh wait, I'm reading it totally wrong. Adding in the extra 4 bytes
> actually made it _faster_ than not having an algo field. Now I'm
> super-confused. I could believe that it gave us some better alignment,
> but the original struct was 32 bytes. 36 seems like a strictly worse
> number.

Apple clang 13 on arm64-apple-darwin20.6.0:

cf0983213c^ (no algo):

  Benchmark #1: ./git -C ../linux cat-file --batch-all-objects --batch-check="%(objectname)"
    Time (mean ± σ):      1.105 s ±  0.017 s    [User: 1.058 s, System: 0.045 s]
    Range (min … max):    1.086 s …  1.126 s    10 runs

cf0983213c (int algo):

  Benchmark #1: ./git -C ../linux cat-file --batch-all-objects --batch-check="%(objectname)"
    Time (mean ± σ):      1.186 s ±  0.015 s    [User: 1.134 s, System: 0.051 s]
    Range (min … max):    1.162 s …  1.201 s    10 runs

cf0983213c (unsigned char algo):

  Benchmark #1: ./git -C ../linux cat-file --batch-all-objects --batch-check="%(objectname)"
    Time (mean ± σ):      1.409 s ±  0.013 s    [User: 1.367 s, System: 0.041 s]
    Range (min … max):    1.397 s …  1.429 s    10 runs

current master (int algo):

  Benchmark #1: ./git -C ../linux cat-file --batch-all-objects --batch-check="%(objectname)"
    Time (mean ± σ):      1.217 s ±  0.018 s    [User: 1.170 s, System: 0.046 s]
    Range (min … max):    1.202 s …  1.246 s    10 runs

current master with the patch (unsigned char algo):

  Benchmark #1: ./git -C ../linux cat-file --batch-all-objects --batch-check="%(objectname)"
    Time (mean ± σ):      1.465 s ±  0.012 s    [User: 1.417 s, System: 0.046 s]
    Range (min … max):    1.449 s …  1.480 s    10 runs

OK, I don't like my patch anymore. Weird, though.

René
diff --git a/hash.h b/hash.h
index 9e25c40e9a..24d8f7cd21 100644
--- a/hash.h
+++ b/hash.h
@@ -115,7 +115,7 @@ static inline void git_SHA256_Clone(git_SHA256_CTX *dst, const git_SHA256_CTX *s
 
 struct object_id {
 	unsigned char hash[GIT_MAX_RAWSZ];
-	int algo;	/* XXX requires 4-byte alignment */
+	unsigned char algo;
 };
 
 /* A suitably aligned type for stack allocations of hash contexts. */
cf0983213c (hash: add an algo member to struct object_id, 2021-04-26)
introduced the algo member as an int. This increased the size of struct
object_id by 4 bytes (12.5%) and imposed a 4-byte alignment. Currently
we only need to store the values 0, 1 and 2 in it. Let's use an
unsigned char instead to reduce the size overhead and lift the alignment
requirement.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
Not sure how to measure the performance impact of this change. The perf
tests mentioned by cf0983213c don't show much of a difference with
GIT_PERF_REPEAT_COUNT=10 for me:

Test                                            origin/master     HEAD
--------------------------------------------------------------------------------------------
0001.1: rev-list --all                          0.11(0.08+0.02)   0.11(0.08+0.02)   +0.0%
0001.2: rev-list --all --objects                3.04(2.98+0.05)   3.04(2.98+0.05)   +0.0%
0001.3: rev-list --parents                      0.05(0.04+0.01)   0.05(0.03+0.01)   +0.0%
0001.5: rev-list -- dummy                       0.21(0.20+0.01)   0.21(0.19+0.01)   +0.0%
0001.6: rev-list --parents -- dummy             0.22(0.20+0.01)   0.22(0.20+0.01)   +0.0%
0001.8: rev-list $commit --not --all            0.06(0.05+0.00)   0.06(0.05+0.00)   +0.0%
0001.9: rev-list --objects $commit --not --all  0.06(0.05+0.00)   0.06(0.05+0.00)   +0.0%
1450.1: fsck                                    20.20(19.71+0.47) 20.18(19.70+0.48) -0.1%

 hash.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--
2.33.0