Message ID | 3b12a8cf393b6d8f0877fd7d87173c565d7d5a90.1709841147.git.code@khaugsbakk.name (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | format-patch: teach `--header-cmd` | expand |
On Thu, Mar 07, 2024 at 08:59:35PM +0100, Kristoffer Haugsbakk wrote: > The MIME header handling started using string buffers in > d50b69b868d (log_write_email_headers: use strbufs, 2018-05-18). The > subject buffer is given to `extra_headers` without that variable taking > ownership; the commit “punts on that ownership” (in general, not just > for this buffer). > > In an upcoming commit we will first assign `extra_headers` to the owned > pointer from another `strbuf`. In turn we need this variable to always > contain an owned pointer so that we can free it in the calling > function. Hmm, OK. This patch by itself introduces a memory leak. It would be nice if we could couple it with the matching free() so that we can see that the issue is fixed. It sounds like your patch 2 is going to introduce such a free, but I'm not sure it's complete. It frees the old extra_headers before reassigning it, but nobody cleans it up after handling the final commit. We should also drop the "static" from subject_buffer, if it is no longer needed. Likewise, any strings that start owning memory (here or in patch 2) should probably drop their "const". That makes the ownership more clear, and avoids ugly casts when freeing. -Peff
Hi Jeff and thanks for taking a look On Tue, Mar 12, 2024, at 10:29, Jeff King wrote: > On Thu, Mar 07, 2024 at 08:59:35PM +0100, Kristoffer Haugsbakk wrote: > >> The MIME header handling started using string buffers in >> d50b69b868d (log_write_email_headers: use strbufs, 2018-05-18). The >> subject buffer is given to `extra_headers` without that variable taking >> ownership; the commit “punts on that ownership” (in general, not just >> for this buffer). >> >> In an upcoming commit we will first assign `extra_headers` to the owned >> pointer from another `strbuf`. In turn we need this variable to always >> contain an owned pointer so that we can free it in the calling >> function. > > Hmm, OK. This patch by itself introduces a memory leak. It would be nice > if we could couple it with the matching free() so that we can see that > the issue is fixed. It sounds like your patch 2 is going to introduce > such a free, but I'm not sure it's complete. Is it okay if it is done in patch 2? > It frees the old extra_headers before reassigning it, but nobody > cleans it up after handling the final commit. I didn’t get any leak errors from the CI. `extra_headers` in `show_log` is populated by calling `log_write_email_headers`. Then later it is assigned to ctx.after_subject = extra_headers; Then `ctx.after_subject is freed later free((char *)ctx.after_subject); Am I missing something? > We should also drop the "static" from subject_buffer, if it is no longer > needed. Likewise, any strings that start owning memory (here or in patch > 2) should probably drop their "const". That makes the ownership more > clear, and avoids ugly casts when freeing. Okay, I’ll do that. Thanks
On Tue, Mar 12, 2024 at 06:43:55PM +0100, Kristoffer Haugsbakk wrote: > > Hmm, OK. This patch by itself introduces a memory leak. It would be nice > > if we could couple it with the matching free() so that we can see that > > the issue is fixed. It sounds like your patch 2 is going to introduce > > such a free, but I'm not sure it's complete. > > Is it okay if it is done in patch 2? I don't think it's the end of the world to do it in patch 2, as long as we end up in a good spot. But IMHO it's really hard for reviewers to understand what is going on, because it's intermingled with so many other changes. It would be much easier to read if we had a preparatory patch that switched the memory ownership of the field, and then built on top of that. But I recognize that sometimes that's hard to do, because the state is so tangled that the functional change is what untangles it. I'm not sure if that's the case here or not; you'd probably have a better idea as somebody who looked carefully at it recently. > > It frees the old extra_headers before reassigning it, but nobody > > cleans it up after handling the final commit. > > I didn’t get any leak errors from the CI. `extra_headers` in `show_log` > is populated by calling `log_write_email_headers`. Then later it is > assigned to > > ctx.after_subject = extra_headers; > > Then `ctx.after_subject is freed later > > free((char *)ctx.after_subject); > > Am I missing something? Ah, I see. I was confused by looking for a free of an extra_headers field. We have rev_info.extra_headers, and that is _not_ owned by rev_info. We used to assign that to a variable in log_write_email_headers(), but now we actually make a copy of it. And so the copy is freed in that function when we replace it with a version containing extra mime headers here: strbuf_addf(&subject_buffer, "%s" "MIME-Version: 1.0\n" "Content-Type: multipart/mixed;" " boundary=\"%s%s\"\n" "\n" "This is a multi-part message in MIME " "format.\n" "--%s%s\n" "Content-Type: text/plain; " "charset=UTF-8; format=fixed\n" "Content-Transfer-Encoding: 8bit\n\n", extra_headers ? extra_headers : "", mime_boundary_leader, opt->mime_boundary, mime_boundary_leader, opt->mime_boundary); free((char *)extra_headers); extra_headers = strbuf_detach(&subject_buffer, NULL); But the actual ownership is passed out via the extra_headers_p variable, and that is what is assigned to ctx.after_subject (which now takes ownership). I think in the snippet I quoted above that extra_headers could never be NULL now, right? We'll always return at least an empty string. But moreover, we are formatting it into a strbuf, only to potentially copy it it another strbuf. Couldn't we just do it all in one strbuf? Something like this: log-tree.c | 29 ++++++++--------------------- 1 file changed, 8 insertions(+), 21 deletions(-) diff --git a/log-tree.c b/log-tree.c index 9196b4f1d4..0a703a0303 100644 --- a/log-tree.c +++ b/log-tree.c @@ -469,29 +469,22 @@ void fmt_output_email_subject(struct strbuf *sb, struct rev_info *opt) } } -static char *extra_and_pe_headers(const char *extra_headers, const char *pe_headers) { - struct strbuf all_headers = STRBUF_INIT; - - if (extra_headers) - strbuf_addstr(&all_headers, extra_headers); - if (pe_headers) { - strbuf_addstr(&all_headers, pe_headers); - } - return strbuf_detach(&all_headers, NULL); -} - void log_write_email_headers(struct rev_info *opt, struct commit *commit, const char **extra_headers_p, int *need_8bit_cte_p, int maybe_multipart) { - const char *extra_headers = - extra_and_pe_headers(opt->extra_headers, opt->pe_headers); + struct strbuf headers = STRBUF_INIT; const char *name = oid_to_hex(opt->zero_commit ? null_oid() : &commit->object.oid); *need_8bit_cte_p = 0; /* unknown */ + if (opt->extra_headers) + strbuf_addstr(&headers, opt->extra_headers); + if (opt->pe_headers) + strbuf_addstr(&headers, opt->pe_headers); + fprintf(opt->diffopt.file, "From %s Mon Sep 17 00:00:00 2001\n", name); graph_show_oneline(opt->graph); if (opt->message_id) { @@ -508,16 +501,13 @@ void log_write_email_headers(struct rev_info *opt, struct commit *commit, graph_show_oneline(opt->graph); } if (opt->mime_boundary && maybe_multipart) { - static struct strbuf subject_buffer = STRBUF_INIT; static struct strbuf buffer = STRBUF_INIT; struct strbuf filename = STRBUF_INIT; *need_8bit_cte_p = -1; /* NEVER */ - strbuf_reset(&subject_buffer); strbuf_reset(&buffer); - strbuf_addf(&subject_buffer, - "%s" + strbuf_addf(&headers, "MIME-Version: 1.0\n" "Content-Type: multipart/mixed;" " boundary=\"%s%s\"\n" @@ -528,11 +518,8 @@ void log_write_email_headers(struct rev_info *opt, struct commit *commit, "Content-Type: text/plain; " "charset=UTF-8; format=fixed\n" "Content-Transfer-Encoding: 8bit\n\n", - extra_headers ? extra_headers : "", mime_boundary_leader, opt->mime_boundary, mime_boundary_leader, opt->mime_boundary); - free((char *)extra_headers); - extra_headers = strbuf_detach(&subject_buffer, NULL); if (opt->numbered_files) strbuf_addf(&filename, "%d", opt->nr); @@ -552,7 +539,7 @@ void log_write_email_headers(struct rev_info *opt, struct commit *commit, opt->diffopt.stat_sep = buffer.buf; strbuf_release(&filename); } - *extra_headers_p = extra_headers; + *extra_headers_p = headers.len ? strbuf_detach(&headers, NULL) : NULL; } static void show_sig_lines(struct rev_info *opt, int status, const char *bol) The resulting code is shorter and (IMHO) easier to understand. It avoids an extra allocation and copy when using mime. It also avoids the allocation of an empty string when opt->extra_headers and opt->pe_headers are both NULL. It does make an extra copy when extra_headers is non-NULL but pe_headers is NULL (and you're not using MIME), as we could just use opt->extra_headers as-is, then. But since the caller needs to take ownership, we can't avoid that copy. I think you could even do this cleanup before adding pe_headers, especially if it was coupled with cleaning up the memory ownership issues. -Peff
On Wed, Mar 13, 2024, at 07:54, Jeff King wrote: > On Tue, Mar 12, 2024 at 06:43:55PM +0100, Kristoffer Haugsbakk wrote: > >> > Hmm, OK. This patch by itself introduces a memory leak. It would be nice >> > if we could couple it with the matching free() so that we can see that >> > the issue is fixed. It sounds like your patch 2 is going to introduce >> > such a free, but I'm not sure it's complete. >> >> Is it okay if it is done in patch 2? > > I don't think it's the end of the world to do it in patch 2, as long as > we end up in a good spot. But IMHO it's really hard for reviewers to > understand what is going on, because it's intermingled with so many > other changes. It would be much easier to read if we had a preparatory > patch that switched the memory ownership of the field, and then built on > top of that. Sounds good. I’ll do that. > But I recognize that sometimes that's hard to do, because the state is > so tangled that the functional change is what untangles it. I'm not sure > if that's the case here or not; you'd probably have a better idea as > somebody who looked carefully at it recently. Seems doable in this case. By the way. I pretty much just elbowed in the changes I needed (like in `revision.h`) in order to add this per-patch/cover letter headers variable. Let me know if there are better ways to do it. >> > It frees the old extra_headers before reassigning it, but nobody >> > cleans it up after handling the final commit. >> >> I didn’t get any leak errors from the CI. `extra_headers` in `show_log` >> is populated by calling `log_write_email_headers`. Then later it is >> assigned to >> >> ctx.after_subject = extra_headers; >> >> Then `ctx.after_subject is freed later >> >> free((char *)ctx.after_subject); >> >> Am I missing something? > > Ah, I see. I was confused by looking for a free of an extra_headers > field. We have rev_info.extra_headers, and that is _not_ owned by > rev_info. We used to assign that to a variable in > log_write_email_headers(), but now we actually make a copy of it. And so > the copy is freed in that function when we replace it with a version > containing extra mime headers here: > > [snip] > > But the actual ownership is passed out via the extra_headers_p variable, > and that is what is assigned to ctx.after_subject (which now takes > ownership). > > I think in the snippet I quoted above that extra_headers could never be > NULL now, right? We'll always return at least an empty string. But > moreover, we are formatting it into a strbuf, only to potentially copy > it it another strbuf. Couldn't we just do it all in one strbuf? > > Something like this: > > [snip] > > > The resulting code is shorter and (IMHO) easier to understand. It > avoids an extra allocation and copy when using mime. It also avoids the > allocation of an empty string when opt->extra_headers and > opt->pe_headers are both NULL. It does make an extra copy when > extra_headers is non-NULL but pe_headers is NULL (and you're not using > MIME), as we could just use opt->extra_headers as-is, then. But since > the caller needs to take ownership, we can't avoid that copy. > > I think you could even do this cleanup before adding pe_headers, > especially if it was coupled with cleaning up the memory ownership > issues. > > -Peff I haven’t tried yet but this seems like a good plan. It was getting a getting a bit too back and forth with my changes. So I’ll try to use your patch and see if I can get a clean preparatory patch/commit before the main change. Cheers
diff --git a/log-tree.c b/log-tree.c index 337b9334cdb..2eabd19962b 100644 --- a/log-tree.c +++ b/log-tree.c @@ -519,7 +519,7 @@ void log_write_email_headers(struct rev_info *opt, struct commit *commit, extra_headers ? extra_headers : "", mime_boundary_leader, opt->mime_boundary, mime_boundary_leader, opt->mime_boundary); - extra_headers = subject_buffer.buf; + extra_headers = strbuf_detach(&subject_buffer, NULL); if (opt->numbered_files) strbuf_addf(&filename, "%d", opt->nr);
The MIME header handling started using string buffers in d50b69b868d (log_write_email_headers: use strbufs, 2018-05-18). The subject buffer is given to `extra_headers` without that variable taking ownership; the commit “punts on that ownership” (in general, not just for this buffer). In an upcoming commit we will first assign `extra_headers` to the owned pointer from another `strbuf`. In turn we need this variable to always contain an owned pointer so that we can free it in the calling function. Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> --- log-tree.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)