Message ID | 20211110234033.3144165-1-jonathantanmy@google.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 89c37479dbd6e4350bb6d6fa5049c388629518d5 |
Headers | show |
Series | packfile: avoid overflowing shift during decode | expand |
Jonathan Tan <jonathantanmy@google.com> writes: > situation by dying if the left shift overflows, but this patch is still > worthwhile as it makes a bad header be reported as a bad header, not a > fatal left shift overflow. Hmph, I am not sure if I like to see it put that way, but I think the problem is not anything new. Even when we have a packfile that is in a good shape, the current machine (especially its wordsize) may not be capable of extracting the object we are looking at from the packstream, when the object is larger than the current machine's ulong can express. So it may be an indication that your machine cannot use the packed object, but may not be an indication that there is anything _wrong_ in the object header. Side note. I suspect that this should already be the case; you can pack a large object whose size do not fit in u32 on a box whose ulong is u64, and you wouldn't be able to use such a packfile on a box where ulong is u32. There is no "bad object header" in the pack stream per-se, but we cannot use it there. After all, the reason why we chose to use the varint encoding is because we do not have to change the file format when we extend beyond the architectures of prevailing word size of the day. Having said all that, I think the patch is correct. We later use the number "shift" as the shift count like so size += (c & 0x7f) << shift; In order for the uppermost bit from (c & 0x7f) to be still inside "size", it is not sufficient that "shift" does not exceed the bit-size of "size". We need to subtract the bitscanreverse of (0x7f), which is 7, which is exactly what your patch does. Looking good. Thanks. > packfile.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/packfile.c b/packfile.c > index 89402cfc69..972c327e29 100644 > --- a/packfile.c > +++ b/packfile.c > @@ -1068,7 +1068,7 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf, > size = c & 15; > shift = 4; > while (c & 0x80) { > - if (len <= used || bitsizeof(long) <= shift) { > + if (len <= used || (bitsizeof(long) - 7) <= shift) { > error("bad object header"); > size = used = 0; > break;
On 11/11/2021 02:58, Junio C Hamano wrote: > Jonathan Tan <jonathantanmy@google.com> writes: > >> diff --git a/packfile.c b/packfile.c >> index 89402cfc69..972c327e29 100644 >> --- a/packfile.c >> +++ b/packfile.c >> @@ -1068,7 +1068,7 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf, >> size = c & 15; >> shift = 4; >> while (c & 0x80) { >> - if (len <= used || bitsizeof(long) <= shift) { >> + if (len <= used || (bitsizeof(long) - 7) <= shift) { This seems to cause troubles now for 32-bit systems (in my case Git for Windows 32-Bit): `shift` will go through 4, 11, 18 and for 25 it finally errors out. This means that objects >= 32MB can't be processed anymore. The condition should probably be changed to: + if (len <= used || (bitsizeof(long) - 7) < shift) { This still ensures that the shift can never overflow and on 32-bit systems restores the maximum size of 4G with a final shift of 127<<25 (the old condition `bitsizeof(long) <= shift` was perfectly valid for 32-bit systems). > Even when we have a packfile that is in a good shape, the current > machine (especially its wordsize) may not be capable of extracting > the object we are looking at from the packstream, when the object is > larger than the current machine's ulong can express. So it may be > an indication that your machine cannot use the packed object, but > may not be an indication that there is anything _wrong_ in the > object header. I was actually quite confused by the error message "bad object header". It made me investigate all sorts of other things before thoroughly stepping through this loop. -Marc
Marc Strapetz <marc.strapetz@syntevo.com> writes: > On 11/11/2021 02:58, Junio C Hamano wrote: >> Jonathan Tan <jonathantanmy@google.com> writes: >> >>> diff --git a/packfile.c b/packfile.c >>> index 89402cfc69..972c327e29 100644 >>> --- a/packfile.c >>> +++ b/packfile.c >>> @@ -1068,7 +1068,7 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf, >>> size = c & 15; >>> shift = 4; >>> while (c & 0x80) { >>> - if (len <= used || bitsizeof(long) <= shift) { >>> + if (len <= used || (bitsizeof(long) - 7) <= shift) { > > This seems to cause troubles now for 32-bit systems (in my case Git > for Windows 32-Bit): `shift` will go through 4, 11, 18 and for 25 it > finally errors out. This means that objects >= 32MB can't be processed > anymore. The condition should probably be changed to: > > + if (len <= used || (bitsizeof(long) - 7) < shift) { > > This still ensures that the shift can never overflow and on 32-bit > systems restores the maximum size of 4G with a final shift of 127<<25 > (the old condition `bitsizeof(long) <= shift` was perfectly valid for > 32-bit systems). Jonathan?
Junio C Hamano <gitster@pobox.com> writes: > Marc Strapetz <marc.strapetz@syntevo.com> writes: > >> On 11/11/2021 02:58, Junio C Hamano wrote: >>> Jonathan Tan <jonathantanmy@google.com> writes: >>> >>>> diff --git a/packfile.c b/packfile.c >>>> index 89402cfc69..972c327e29 100644 >>>> --- a/packfile.c >>>> +++ b/packfile.c >>>> @@ -1068,7 +1068,7 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf, >>>> size = c & 15; >>>> shift = 4; >>>> while (c & 0x80) { >>>> - if (len <= used || bitsizeof(long) <= shift) { >>>> + if (len <= used || (bitsizeof(long) - 7) <= shift) { >> >> This seems to cause troubles now for 32-bit systems (in my case Git >> for Windows 32-Bit): `shift` will go through 4, 11, 18 and for 25 it >> finally errors out. This means that objects >= 32MB can't be processed >> anymore. The condition should probably be changed to: >> >> + if (len <= used || (bitsizeof(long) - 7) < shift) { >> >> This still ensures that the shift can never overflow and on 32-bit >> systems restores the maximum size of 4G with a final shift of 127<<25 >> (the old condition `bitsizeof(long) <= shift` was perfectly valid for >> 32-bit systems). > > Jonathan? ----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----- Date: Wed, 12 Jan 2022 12:11:42 -0800 Subject: [PATCH] packfile: fix off-by-one error in decoding logic shift count being exactly at 7-bit smaller than the long is OK; on 32-bit architecture, shift count starts at 4 and goes through 11, 18 and 25, at which point the guard triggers one iteration too early. Reported-by: Marc Strapetz <marc.strapetz@syntevo.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> --- packfile.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/packfile.c b/packfile.c index d3820c780b..667e21ce97 100644 --- a/packfile.c +++ b/packfile.c @@ -1067,7 +1067,7 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf, size = c & 15; shift = 4; while (c & 0x80) { - if (len <= used || (bitsizeof(long) - 7) <= shift) { + if (len <= used || (bitsizeof(long) - 7) < shift) { error("bad object header"); size = used = 0; break;
Junio C Hamano <gitster@pobox.com> writes: > Marc Strapetz <marc.strapetz@syntevo.com> writes: > > > On 11/11/2021 02:58, Junio C Hamano wrote: > >> Jonathan Tan <jonathantanmy@google.com> writes: > >> > >>> diff --git a/packfile.c b/packfile.c > >>> index 89402cfc69..972c327e29 100644 > >>> --- a/packfile.c > >>> +++ b/packfile.c > >>> @@ -1068,7 +1068,7 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf, > >>> size = c & 15; > >>> shift = 4; > >>> while (c & 0x80) { > >>> - if (len <= used || bitsizeof(long) <= shift) { > >>> + if (len <= used || (bitsizeof(long) - 7) <= shift) { > > > > This seems to cause troubles now for 32-bit systems (in my case Git > > for Windows 32-Bit): `shift` will go through 4, 11, 18 and for 25 it > > finally errors out. This means that objects >= 32MB can't be processed > > anymore. The condition should probably be changed to: > > > > + if (len <= used || (bitsizeof(long) - 7) < shift) { > > > > This still ensures that the shift can never overflow and on 32-bit > > systems restores the maximum size of 4G with a final shift of 127<<25 > > (the old condition `bitsizeof(long) <= shift` was perfectly valid for > > 32-bit systems). > > Jonathan? This analysis makes sense - not sure how I missed that. 0x7f (the number being shifted) is 7 bits, so it can safely be shifted 25 bits. The original condition of `bitsizeof(long) <= shift` works for 32-bit but not for 64-bit (4, 11, 18, 25, 32, 39, 46, 53, 60) since shifting 0x7f, a 7-bit value, by 60 bits would result in overflow, so we still need to subtract 7. I agree that the inequality should be `<`, not `<=`.
diff --git a/packfile.c b/packfile.c index 89402cfc69..972c327e29 100644 --- a/packfile.c +++ b/packfile.c @@ -1068,7 +1068,7 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf, size = c & 15; shift = 4; while (c & 0x80) { - if (len <= used || bitsizeof(long) <= shift) { + if (len <= used || (bitsizeof(long) - 7) <= shift) { error("bad object header"); size = used = 0; break;
unpack_object_header_buffer() attempts to protect against overflowing left shifts, but the limit of the shift amount should not be the size of the variable being shifted. It should be the size minus the size of its contents. Fix that accordingly. This was noticed at $DAYJOB by a fuzzer running internally. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- In next, d6a09e795d ("odb: guard against data loss checking out a huge file", 2021-11-03) (merged as fe5160a170 ("Merge branch 'mc/clean-smudge-with-llp64' into next", 2021-11-03)) ameliorates this situation by dying if the left shift overflows, but this patch is still worthwhile as it makes a bad header be reported as a bad header, not a fatal left shift overflow. --- packfile.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)