Message ID | 9dd8709d1b3b350008218133986befdb2ae74bae.1716541556.git.ps@pks.im (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Various memory leak fixes | expand |
On Fri, May 24, 2024 at 12:03:29PM +0200, Patrick Steinhardt wrote: > In `strbuf_appendwholeline()` we call `strbuf_getwholeline()` with a > temporary buffer. In case the call returns an error we indicate this by > returning EOF, but never release the temporary buffer. This can lead to > a memory leak when the line has been partially filled. Fix this. Hmm, doesn't this indicate a bug in getwholeline()? Most strbuf functions on error try to leave the allocation as-is. At the end of the getdelim() version (which is probably what you're running), when we see an error we do: if (!sb->buf) strbuf_init(sb, 0); else strbuf_reset(sb); return EOF; So if getdelim() returned error and left us with a buffer (but still returned -1 for error!), I think this code is assuming that the buffer it left us with was the same one that existed beforehand. But your commit message implies that it might allocate, hit an error, and then return that error along with an allocated buffer? Looks like that matches the getdelim() manpage, which says: If *lineptr was set to NULL before the call, then the buffer should be freed by the user program even on failure. So should we do something like: diff --git a/strbuf.c b/strbuf.c index e1076c9891..e37165812b 100644 --- a/strbuf.c +++ b/strbuf.c @@ -659,7 +659,7 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) if (!sb->buf) strbuf_init(sb, 0); else - strbuf_reset(sb); + strbuf_release(sb); return EOF; } #else That assumes sb->alloc is valid after a failed call, since strbuf_release() checks it. But that seems reasonable. If not, we'd need to free() and re-initialize it ourselves, and the code is more like: diff --git a/strbuf.c b/strbuf.c index e1076c9891..aed699c6bf 100644 --- a/strbuf.c +++ b/strbuf.c @@ -656,10 +656,8 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) * we can just re-init, but otherwise we should make sure that our * length is empty, and that the result is NUL-terminated. */ - if (!sb->buf) - strbuf_init(sb, 0); - else - strbuf_reset(sb); + FREE_AND_NULL(sb->buf); + strbuf_init(sb, 0); return EOF; } #else But I think either of those would solve your leak, _and_ would help with similar leaks of strbuf_getwholeline() and friends. -Peff
On Sat, May 25, 2024 at 12:46:35AM -0400, Jeff King wrote: > On Fri, May 24, 2024 at 12:03:29PM +0200, Patrick Steinhardt wrote: > > > In `strbuf_appendwholeline()` we call `strbuf_getwholeline()` with a > > temporary buffer. In case the call returns an error we indicate this by > > returning EOF, but never release the temporary buffer. This can lead to > > a memory leak when the line has been partially filled. Fix this. > > Hmm, doesn't this indicate a bug in getwholeline()? Most strbuf > functions on error try to leave the allocation as-is. > > At the end of the getdelim() version (which is probably what you're > running), when we see an error we do: > > if (!sb->buf) > strbuf_init(sb, 0); > else > strbuf_reset(sb); > return EOF; > > So if getdelim() returned error and left us with a buffer (but still > returned -1 for error!), I think this code is assuming that the buffer > it left us with was the same one that existed beforehand. > > But your commit message implies that it might allocate, hit an error, > and then return that error along with an allocated buffer? Looks like > that matches the getdelim() manpage, which says: > > If *lineptr was set to NULL before the call, then the buffer should be > freed by the user program even on failure. > > So should we do something like: > > diff --git a/strbuf.c b/strbuf.c > index e1076c9891..e37165812b 100644 > --- a/strbuf.c > +++ b/strbuf.c > @@ -659,7 +659,7 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) > if (!sb->buf) > strbuf_init(sb, 0); > else > - strbuf_reset(sb); > + strbuf_release(sb); > return EOF; > } > #else > > That assumes sb->alloc is valid after a failed call, since > strbuf_release() checks it. But that seems reasonable. If not, we'd need > to free() and re-initialize it ourselves, and the code is more like: > > diff --git a/strbuf.c b/strbuf.c > index e1076c9891..aed699c6bf 100644 > --- a/strbuf.c > +++ b/strbuf.c > @@ -656,10 +656,8 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) > * we can just re-init, but otherwise we should make sure that our > * length is empty, and that the result is NUL-terminated. > */ > - if (!sb->buf) > - strbuf_init(sb, 0); > - else > - strbuf_reset(sb); > + FREE_AND_NULL(sb->buf); > + strbuf_init(sb, 0); > return EOF; > } > #else > > But I think either of those would solve your leak, _and_ would help with > similar leaks of strbuf_getwholeline() and friends. I'm not quite convinced that `strbuf_getwholeline()` should deallocate the buffer for the caller, I think that makes for quite a confusing calling convention. The caller may want to reuse the buffer for other operations, and it feels hostile to release the buffer under their feet. The only edge case where I think it would make sense to free allocated data is when being passed a not-yet-allocated strbuf. But I wonder whether the added complexity would be worth it. I've been going through all callsites and couldn't spot any that doesn't free the buffer on EOF. So I'd propose to leave this as-is and revisit if we eventually see that this is causing more memory leaks. Patrick
On Mon, May 27, 2024 at 08:44:46AM +0200, Patrick Steinhardt wrote: > > diff --git a/strbuf.c b/strbuf.c > > diff --git a/strbuf.c b/strbuf.c > > index e1076c9891..aed699c6bf 100644 > > --- a/strbuf.c > > +++ b/strbuf.c > > @@ -656,10 +656,8 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) > > * we can just re-init, but otherwise we should make sure that our > > * length is empty, and that the result is NUL-terminated. > > */ > > - if (!sb->buf) > > - strbuf_init(sb, 0); > > - else > > - strbuf_reset(sb); > > + FREE_AND_NULL(sb->buf); > > + strbuf_init(sb, 0); > > return EOF; > > } > > #else > > > > But I think either of those would solve your leak, _and_ would help with > > similar leaks of strbuf_getwholeline() and friends. > > I'm not quite convinced that `strbuf_getwholeline()` should deallocate > the buffer for the caller, I think that makes for quite a confusing > calling convention. The caller may want to reuse the buffer for other > operations, and it feels hostile to release the buffer under their feet. > > The only edge case where I think it would make sense to free allocated > data is when being passed a not-yet-allocated strbuf. But I wonder > whether the added complexity would be worth it. I'm not sure what they'd reuse it for. We necessarily have to reset it before reading, so the contents are now garbage. The allocated buffer could be reused, but since everybody has to call strbuf_grow() before assuming they can write, it's not a correctness issue, but only an optimization. But that optimization is pretty unlikely to matter. Since we hit this code only on EOF or error, it's generally going to happen once in a program, and not in a tight loop. If we really cared, though, I think you could check sb->alloc before the call to getdelim(), and then we'd know whether the original held an allocation or not (and we could restore its state). That's what other syscall-ish strbuf functions like strbuf_readlink() and strbuf_getcwd() do. That said, I agree that leaks here are not going to be common. Most callers are going to call it in a loop and unconditionally release at the end, whether they get multiple lines or not. The "append" function is the odd man out by reading a single line into a new buffer[1]. Looking through the results of: git grep -P '(?<!while) \(!?strbuf_get(whole)?line' I saw only one questionable case. builtin/difftool.c does: if (strbuf_getline_nul(&lpath, fp)) break; without freeing lpath. But then...it does not free it in the case that we got a value, either! So I think it is leaking either way, and the solution, to strbuf_release(&lpath) outside of the loop, would fix both cases. > I've been going through all callsites and couldn't spot any that doesn't > free the buffer on EOF. So I'd propose to leave this as-is and revisit > if we eventually see that this is causing more memory leaks. OK. I don't feel too strongly about it, but mostly thought it seemed inconsistent with the philosophy of those other strbuf functions. -Peff [1] I was surprised at first that strbuf_appendwholeline() uses that extra copy, as the obvious implementation is just to skip the strbuf_reset() call in getwholeline(), and then implement "get" as a wrapper around "append" to add back in the reset call. But that only works for the slower getc() implementation. For the getdelim()-based one, there is no notion of append (the interface would have to take an offset into the buffer). We could probably optimize this at the cost of repeating ourselves, but given that there's only one probably-not-very-hot call to appendwholeline, it's likely not worth it.
On Wed, May 29, 2024 at 05:16:33AM -0400, Jeff King wrote: > On Mon, May 27, 2024 at 08:44:46AM +0200, Patrick Steinhardt wrote: > > > diff --git a/strbuf.c b/strbuf.c > > > diff --git a/strbuf.c b/strbuf.c > > > index e1076c9891..aed699c6bf 100644 > > > --- a/strbuf.c > > > +++ b/strbuf.c > > > @@ -656,10 +656,8 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) > > > * we can just re-init, but otherwise we should make sure that our > > > * length is empty, and that the result is NUL-terminated. > > > */ > > > - if (!sb->buf) > > > - strbuf_init(sb, 0); > > > - else > > > - strbuf_reset(sb); > > > + FREE_AND_NULL(sb->buf); > > > + strbuf_init(sb, 0); > > > return EOF; > > > } > > > #else > > > > > > But I think either of those would solve your leak, _and_ would help with > > > similar leaks of strbuf_getwholeline() and friends. > > > > I'm not quite convinced that `strbuf_getwholeline()` should deallocate > > the buffer for the caller, I think that makes for quite a confusing > > calling convention. The caller may want to reuse the buffer for other > > operations, and it feels hostile to release the buffer under their feet. > > > > The only edge case where I think it would make sense to free allocated > > data is when being passed a not-yet-allocated strbuf. But I wonder > > whether the added complexity would be worth it. > > I'm not sure what they'd reuse it for. We necessarily have to reset it > before reading, so the contents are now garbage. The allocated buffer > could be reused, but since everybody has to call strbuf_grow() before > assuming they can write, it's not a correctness issue, but only an > optimization. But that optimization is pretty unlikely to matter. Since > we hit this code only on EOF or error, it's generally going to happen > once in a program, and not in a tight loop. > > If we really cared, though, I think you could check sb->alloc before the > call to getdelim(), and then we'd know whether the original held an > allocation or not (and we could restore its state). That's what other > syscall-ish strbuf functions like strbuf_readlink() and strbuf_getcwd() > do. Ah, I didn't know that we did similar things in other strbuf functions. With that precedence I think it's less ugly to do this dance. > That said, I agree that leaks here are not going to be common. Most > callers are going to call it in a loop and unconditionally release at > the end, whether they get multiple lines or not. The "append" function > is the odd man out by reading a single line into a new buffer[1]. > > Looking through the results of: > > git grep -P '(?<!while) \(!?strbuf_get(whole)?line' > > I saw only one questionable case. builtin/difftool.c does: > > if (strbuf_getline_nul(&lpath, fp)) > break; > > without freeing lpath. But then...it does not free it in the case that > we got a value, either! So I think it is leaking either way, and the > solution, to strbuf_release(&lpath) outside of the loop, would fix both > cases. Indeed. We also didn't free `rpath` and `info`. I do have a follow up to this series already, so let me add those leak fixes to it. > > I've been going through all callsites and couldn't spot any that doesn't > > free the buffer on EOF. So I'd propose to leave this as-is and revisit > > if we eventually see that this is causing more memory leaks. > > OK. I don't feel too strongly about it, but mostly thought it seemed > inconsistent with the philosophy of those other strbuf functions. I get where you're coming from now with the additional info that other syscall-ish functions do a similar dance. I'll refrain from rerolling this series just to fix this in a different way, also because neither of us did spot any additional leaks caused by this. Patrick
On Wed, May 29, 2024 at 01:25:02PM +0200, Patrick Steinhardt wrote: > > If we really cared, though, I think you could check sb->alloc before the > > call to getdelim(), and then we'd know whether the original held an > > allocation or not (and we could restore its state). That's what other > > syscall-ish strbuf functions like strbuf_readlink() and strbuf_getcwd() > > do. > > Ah, I didn't know that we did similar things in other strbuf functions. > With that precedence I think it's less ugly to do this dance. Yeah, I probably should have explained that better in my earlier email. ;) > > I saw only one questionable case. builtin/difftool.c does: > > > > if (strbuf_getline_nul(&lpath, fp)) > > break; > > > > without freeing lpath. But then...it does not free it in the case that > > we got a value, either! So I think it is leaking either way, and the > > solution, to strbuf_release(&lpath) outside of the loop, would fix both > > cases. > > Indeed. We also didn't free `rpath` and `info`. I do have a follow up to > this series already, so let me add those leak fixes to it. Great! Thanks (as usual) for being receptive to me piling more things on your todo list. :) -Peff
diff --git a/strbuf.c b/strbuf.c index 0d929e4e19..e1076c9891 100644 --- a/strbuf.c +++ b/strbuf.c @@ -691,8 +691,10 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) int strbuf_appendwholeline(struct strbuf *sb, FILE *fp, int term) { struct strbuf line = STRBUF_INIT; - if (strbuf_getwholeline(&line, fp, term)) + if (strbuf_getwholeline(&line, fp, term)) { + strbuf_release(&line); return EOF; + } strbuf_addbuf(sb, &line); strbuf_release(&line); return 0; diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh index ec3443cc87..bbee2783ab 100755 --- a/t/t1400-update-ref.sh +++ b/t/t1400-update-ref.sh @@ -4,6 +4,8 @@ # test_description='Test git update-ref and basic ref logging' + +TEST_PASSES_SANITIZE_LEAK=true . ./test-lib.sh Z=$ZERO_OID
In `strbuf_appendwholeline()` we call `strbuf_getwholeline()` with a temporary buffer. In case the call returns an error we indicate this by returning EOF, but never release the temporary buffer. This can lead to a memory leak when the line has been partially filled. Fix this. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- strbuf.c | 4 +++- t/t1400-update-ref.sh | 2 ++ 2 files changed, 5 insertions(+), 1 deletion(-)