Message ID | 20220307232552.2799122-1-sandals@crustytoothpaste.net (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | block-sha1: remove use of assembly | expand |
On Mon, Mar 07, 2022 at 11:25:52PM +0000, brian m. carlson wrote: > In the block SHA-1 code, we have special assembly code for i386 and > amd64 to perform rotations with assembly. This is supposed to help pick > the correct rotation operation depending on which rotation is smaller, > which can help some systems perform slightly better, since any circular > rotation can be specified as either a rotate left or a rotate right. > However, this isn't needed, so we should remove it. At -O1 or higher (at least on GCC) this optimization is indeed performed. Here's a Godbolt example that shows this: https://godbolt.org/z/9zMP93hr1 so I agree that this code isn't helping us at all. And in the meantime... > The downside of using this code, however, is that it uses a GCC > extension, which makes the compiler complain when using -pedantic unless > it's prefixed with __extension__. We could fix that, but since it's > not needed, let's just remove it. We haven't noticed this because > almost everyone uses the SHA1DC code instead, but it still shows up for > some people. ...it makes it impossible to compile git if you have `BLK_SHA1=YesPlease` and `DEVELOPER=1` in your environment. So I am happy to see this go. On another note: missing Signed-off-by? Thanks, Taylor
On 2022-03-07 at 23:32:42, Taylor Blau wrote: > On Mon, Mar 07, 2022 at 11:25:52PM +0000, brian m. carlson wrote: > > In the block SHA-1 code, we have special assembly code for i386 and > > amd64 to perform rotations with assembly. This is supposed to help pick > > the correct rotation operation depending on which rotation is smaller, > > which can help some systems perform slightly better, since any circular > > rotation can be specified as either a rotate left or a rotate right. > > However, this isn't needed, so we should remove it. > > At -O1 or higher (at least on GCC) this optimization is indeed > performed. Here's a Godbolt example that shows this: > > https://godbolt.org/z/9zMP93hr1 > > so I agree that this code isn't helping us at all. And in the > meantime... Thanks for providing a link. I also did a similar test there with slightly different code (and unfortunately closed the window before saving the link) but it demonstrated the same thing: that the compiler can optimize this case adequately. My (substantially) past experience with testing GCC in this case has shown the same thing. > > The downside of using this code, however, is that it uses a GCC > > extension, which makes the compiler complain when using -pedantic unless > > it's prefixed with __extension__. We could fix that, but since it's > > not needed, let's just remove it. We haven't noticed this because > > almost everyone uses the SHA1DC code instead, but it still shows up for > > some people. > > ...it makes it impossible to compile git if you have > `BLK_SHA1=YesPlease` and `DEVELOPER=1` in your environment. So I am > happy to see this go. > > On another note: missing Signed-off-by? I'll send an otherwise unchanged v2 with that in a second.
diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c index 1bb6e7c069..5974cd7dd3 100644 --- a/block-sha1/sha1.c +++ b/block-sha1/sha1.c @@ -11,27 +11,10 @@ #include "sha1.h" -#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__)) - -/* - * Force usage of rol or ror by selecting the one with the smaller constant. - * It _can_ generate slightly smaller code (a constant of 1 is special), but - * perhaps more importantly it's possibly faster on any uarch that does a - * rotate with a loop. - */ - -#define SHA_ASM(op, x, n) ({ unsigned int __res; __asm__(op " %1,%0":"=r" (__res):"i" (n), "0" (x)); __res; }) -#define SHA_ROL(x,n) SHA_ASM("rol", x, n) -#define SHA_ROR(x,n) SHA_ASM("ror", x, n) - -#else - #define SHA_ROT(X,l,r) (((X) << (l)) | ((X) >> (r))) #define SHA_ROL(X,n) SHA_ROT(X,n,32-(n)) #define SHA_ROR(X,n) SHA_ROT(X,32-(n),n) -#endif - /* * If you have 32 registers or more, the compiler can (and should) * try to change the array[] accesses into registers. However, on