Message ID | YTzEvLHWcfuD20x4@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Accepted |
Commit | 8d133a4653abed4b06d3deb8bd71cf55cd87c990 |
Headers | show |
Series | strvec: use size_t to store nr and alloc | expand |
On Sat, Sep 11 2021, Jeff King wrote: > We converted argv_array (which later became strvec) to use size_t in > 819f0e76b1 (argv-array: use size_t for count and alloc, 2020-07-28) in > order to avoid the possibility of integer overflow. But later, commit > d70a9eb611 (strvec: rename struct fields, 2020-07-28) accidentally > converted these back to ints! > > Those two commits were part of the same patch series. I'm pretty sure > what happened is that they were originally written in the opposite order > and then cleaned up and re-ordered during an interactive rebase. And > when resolving the inevitable conflict, I mistakenly took the "rename" > patch completely, accidentally dropping the type change. > > We can correct it now; better late than never. > > Signed-off-by: Jeff King <peff@peff.net> > --- > This was posted previously in the midst of another thread, but I don't > think was picked up. There was some positive reaction, but one "do we > really need this?" to which I responded in detail: > > https://lore.kernel.org/git/YTIBnT8Ue1HZXs82@coredump.intra.peff.net/ > > I don't really think any of that needs to go into the commit message, > but if that's a hold-up, I can try to summarize it (though I think > referring to the commit which _already_ did this and was accidentally > reverted would be sufficient). Thanks, I have a WIP version of this outstanding starting with this patch that I was planning to submit sometime, but I'm happy to have you pursue it, especially with the ~100 outstanding patches I have in master..seen. It does feel somewhere between iffy and a landmine waiting to be stepped on to only convert the member itself, and not any of the corresponding "int" variables that track it to "size_t". If you do the change I suggested in https://lore.kernel.org/git/87v93i8svd.fsf@evledraar.gmail.com/ you'll find that there's at least one first-order reference to this that now uses "int" that if converted to "size_t" will result in a wrap-around error, we're lucky that one has a test failure. I can tell you what that bug is, but maybe it's better if you find it yourself :) I.e. I found *that* one, but I'm not sure I found them all. I just s/int nr/size_t *nr/ and eyeballed the wall off compiler errors & the code context (note: pointer, obviously broken, but makes the compiler yell). That particular bug will be caught by the compiler as it involves a >= 0 comparison against unsigned, but we may not not have that everywhere...
On 11/09/2021 17:13, Ævar Arnfjörð Bjarmason wrote: > On Sat, Sep 11 2021, Jeff King wrote: > >> We converted argv_array (which later became strvec) to use size_t in >> 819f0e76b1 (argv-array: use size_t for count and alloc, 2020-07-28) in >> order to avoid the possibility of integer overflow. But later, commit >> d70a9eb611 (strvec: rename struct fields, 2020-07-28) accidentally >> converted these back to ints! >> >> Those two commits were part of the same patch series. I'm pretty sure >> what happened is that they were originally written in the opposite order >> and then cleaned up and re-ordered during an interactive rebase. And >> when resolving the inevitable conflict, I mistakenly took the "rename" >> patch completely, accidentally dropping the type change. >> >> We can correct it now; better late than never. >> >> Signed-off-by: Jeff King <peff@peff.net> >> --- >> This was posted previously in the midst of another thread, but I don't >> think was picked up. There was some positive reaction, but one "do we >> really need this?" to which I responded in detail: >> >> https://lore.kernel.org/git/YTIBnT8Ue1HZXs82@coredump.intra.peff.net/ >> >> I don't really think any of that needs to go into the commit message, >> but if that's a hold-up, I can try to summarize it (though I think >> referring to the commit which _already_ did this and was accidentally >> reverted would be sufficient). > Thanks, I have a WIP version of this outstanding starting with this > patch that I was planning to submit sometime, but I'm happy to have you > pursue it, especially with the ~100 outstanding patches I have in > master..seen. > > It does feel somewhere between iffy and a landmine waiting to be stepped > on to only convert the member itself, and not any of the corresponding > "int" variables that track it to "size_t". > > If you do the change I suggested in > https://lore.kernel.org/git/87v93i8svd.fsf@evledraar.gmail.com/ you'll > find that there's at least one first-order reference to this that now > uses "int" that if converted to "size_t" will result in a wrap-around > error, we're lucky that one has a test failure. > > I can tell you what that bug is, but maybe it's better if you find it > yourself :) I.e. I found *that* one, but I'm not sure I found them > all. I just s/int nr/size_t *nr/ and eyeballed the wall off compiler > errors & the code context (note: pointer, obviously broken, but makes > the compiler yell). > > That particular bug will be caught by the compiler as it involves a >= 0 > comparison against unsigned, but we may not not have that everywhere... I'm particularly interested in the int -> size_t change problem as part of the wider 4GB limitations for the LLP64 systems [0] such as the RaspPi, git-lfs (on windows [1]), and Git-for-Windows[2]. It is a big problem. Philip [0] http://nickdesaulniers.github.io/blog/2016/05/30/data-models-and-word-size/ [1] https://github.com/git-lfs/git-lfs/issues/2434 Git on Windows client corrupts files > 4Gb [2] https://github.com/git-for-windows/git/pull/2179 [DRAFT] for testing : Fix 4Gb limit for large files on Git for Windows
On Sat, Sep 11, 2021 at 06:13:18PM +0200, Ævar Arnfjörð Bjarmason wrote: > > I don't really think any of that needs to go into the commit message, > > but if that's a hold-up, I can try to summarize it (though I think > > referring to the commit which _already_ did this and was accidentally > > reverted would be sufficient). > > Thanks, I have a WIP version of this outstanding starting with this > patch that I was planning to submit sometime, but I'm happy to have you > pursue it, especially with the ~100 outstanding patches I have in > master..seen. > > It does feel somewhere between iffy and a landmine waiting to be stepped > on to only convert the member itself, and not any of the corresponding > "int" variables that track it to "size_t". I don't think it's a landmine. If those other places are using "int" and have trouble with a too-big strvec after the patch, then they were generally already buggy before, too. I have poked around before for cases where half-converting some code from size_t to int can possibly make things worse, but it's pretty rare. Meanwhile, strvec using an int has obvious and easy-to-trigger overflow problems. Try this: yes '000aagent' | GIT_PROTOCOL=version=2 ./git-upload-pack . where we will happily allocate an arbitrarily-large strvec. If you have the patience and the RAM, this will overflow the alloc field. Luckily we catch it before under-allocating the array. Because our ALLOC_GROW() growth pattern is fixed, we'll always end up with a negative 32-bit int, which gets cast to a very large size_t, and an st_mult() invocation complains. Much scarier to me is that string_list seems to be using "unsigned int" for its counter, so it never goes negative! Try this: yes | git pack-objects --stdin-packs foo which reads into a string_list. It takes even more RAM to overflow because string_list is so horribly inefficient. But here once again we're saved by a quirk of ALLOC_GROW() dating back to 4234a76167 (Extend --pretty=oneline to cover the first paragraph,, 2007-06-11). alloc_nr() will overflow to a small number when it hits around 2^32 / 3. Before that patch, we'd then realloc a too-small buffer and have a heap overflow. After that patch (I think in order to catch growth by more than 1 element), we notice the case that alloc_nr() didn't give us a big enough size, and extend it exactly to what was asked for. So no heap overflow, though we do enter a quadratic growth loop. :-/ So I don't think we have any heap overflows here, which is good. But we are protected by a fair bit of luck, and none of this would get nearly as close to danger if we just used a sensible type for our allocation sizes. I do think we should separately fix the v2 protocol not to just allocate arbitrary-sized strvecs on behalf of the user. I took a stab at that this weekend and have a series I'm pretty happy with, but of course it conflicts with your ab/serve-cleanup (and in fact, I ended up doing some of those same cleanups, like not passing "keys" around). I'll see if I can re-build it on top. > If you do the change I suggested in > https://lore.kernel.org/git/87v93i8svd.fsf@evledraar.gmail.com/ you'll > find that there's at least one first-order reference to this that now > uses "int" that if converted to "size_t" will result in a wrap-around > error, we're lucky that one has a test failure. > > I can tell you what that bug is, but maybe it's better if you find it > yourself :) I.e. I found *that* one, but I'm not sure I found them > all. I just s/int nr/size_t *nr/ and eyeballed the wall off compiler > errors & the code context (note: pointer, obviously broken, but makes > the compiler yell). Yes, having converted s/int/size_t/ for other structures before, you have to be very careful of signedness. A safer conversion is ssize_t (which similarly avoids overflow problems on LP64 systems). I'm sure there are other overflow bugs lurking around use of strvecs, if you really push them hard. But I care a lot less about something like "oops, I accidentally overwrite list[0] instead of list[2^32-1]". Those are bugs that normal people will never see, because they aren't doing stupid or malicious things. I care a lot more about our allocating functions being vulnerable to heap overflows from malicious attackers. So I was really hoping to do the size_t fix separately. It's hard for it to screw anything up, and it addresses the most vulnerable and important use. -Peff
On Sat, Sep 11, 2021 at 11:48:38PM +0100, Philip Oakley wrote: > I'm particularly interested in the int -> size_t change problem as part > of the wider 4GB limitations for the LLP64 systems [0] such as the > RaspPi, git-lfs (on windows [1]), and Git-for-Windows[2]. It is a big > problem. Note that a lot of the Windows LLP64 problems are really a separate issue. They come from a misuse of "unsigned long" as "gee, this should be big enough for anything". Most of that is due to its use for object sizes, which of course infected a whole bunch of other code. Which isn't to say it's not important. But my main goal here was making sure we use size_t for growth allocations to avoid integer overflow leading to under-allocation (and thus heap overflow). -Peff
On 12/09/2021 23:00, Jeff King wrote: > On Sat, Sep 11, 2021 at 11:48:38PM +0100, Philip Oakley wrote: > >> I'm particularly interested in the int -> size_t change problem as part >> of the wider 4GB limitations for the LLP64 systems [0] such as the >> RaspPi, git-lfs (on windows [1]), and Git-for-Windows[2]. It is a big >> problem. > Note that a lot of the Windows LLP64 problems are really a separate > issue. They come from a misuse of "unsigned long" as "gee, this should > be big enough for anything". Most of that is due to its use for object > sizes, which of course infected a whole bunch of other code. I don't see it (root cause) as independent though. This is a nicely separated issue (effect), which helps. > Which isn't to say it's not important. But my main goal here was making > sure we use size_t for growth allocations to avoid integer overflow > leading to under-allocation (and thus heap overflow). It's also been helpful in highlighting some of the wider issues and approaches. Thank you Philip
diff --git a/strvec.h b/strvec.h index fdcad75b45..6b3cbd6758 100644 --- a/strvec.h +++ b/strvec.h @@ -29,8 +29,8 @@ extern const char *empty_strvec[]; */ struct strvec { const char **v; - int nr; - int alloc; + size_t nr; + size_t alloc; }; #define STRVEC_INIT { empty_strvec, 0, 0 }
We converted argv_array (which later became strvec) to use size_t in 819f0e76b1 (argv-array: use size_t for count and alloc, 2020-07-28) in order to avoid the possibility of integer overflow. But later, commit d70a9eb611 (strvec: rename struct fields, 2020-07-28) accidentally converted these back to ints! Those two commits were part of the same patch series. I'm pretty sure what happened is that they were originally written in the opposite order and then cleaned up and re-ordered during an interactive rebase. And when resolving the inevitable conflict, I mistakenly took the "rename" patch completely, accidentally dropping the type change. We can correct it now; better late than never. Signed-off-by: Jeff King <peff@peff.net> --- This was posted previously in the midst of another thread, but I don't think was picked up. There was some positive reaction, but one "do we really need this?" to which I responded in detail: https://lore.kernel.org/git/YTIBnT8Ue1HZXs82@coredump.intra.peff.net/ I don't really think any of that needs to go into the commit message, but if that's a hold-up, I can try to summarize it (though I think referring to the commit which _already_ did this and was accidentally reverted would be sufficient). strvec.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)