Message ID | 20231020101310.GB2673716@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Accepted |
Commit | 637e8944a13af5eae2dcaef99d4d84645f2b60ac |
Headers | show |
Series | some send-email --compose fixes | expand |
On Fri, Oct 20, 2023 at 06:13:10AM -0400, Jeff King wrote: >But one thing that gives me pause is that the neither before or after >this patch do we handle continuation lines like: > > Subject: this is the beginning > and this is more subject > >And it would probably be a lot easier to add when storing the headers in >a hash (it's not impossible to do it the other way, but you basically >have to delay processing each line with a small state machine). > that seems like a rather significant point, doesn't it? >So another option is to just fix the individual bugs separately. > ... so that seems preferable to me, given that the necessary fixes seem rather trivial. > I guess "readable" is up for debate here, but I find the inline handling > a lot easier to follow > any particular reason for that? > (and it's half as many lines; most of the diffstat is the new tests). >- if ($parsed_email{'From'}) { >- $sender = delete($parsed_email{'From'}); >- } this verbosity could be cut down somewhat using just $sender = delete($parsed_email{'From'}); and if the value can be pre-set and needs to be preserved, $sender = delete($parsed_email{'From'}) // $sender; but this seems kind of counter-productive legibility-wise. regards
Jeff King <peff@peff.net> writes:
> - the handling for to/cc/bcc is totally broken.
It is good to see another evidence that "--compose" is probably not
as often as used as we thought. With enough bugs discovered,
perhaps someday we can declare "it cannot be that the feature is
used in the wild, without anybody getting hit by these bugs---let's
deprecate and eventually remove it" ;-)
On Fri, Oct 20, 2023 at 12:45:43PM +0200, Oswald Buddenhagen wrote: > On Fri, Oct 20, 2023 at 06:13:10AM -0400, Jeff King wrote: > > But one thing that gives me pause is that the neither before or after > > this patch do we handle continuation lines like: > > > > Subject: this is the beginning > > and this is more subject > > > > And it would probably be a lot easier to add when storing the headers in > > a hash (it's not impossible to do it the other way, but you basically > > have to delay processing each line with a small state machine). > > > that seems like a rather significant point, doesn't it? Maybe. It depends on whether anybody is interested in adding continuation support. Nobody has in the previous 18 years, and nobody has asked for it. > > So another option is to just fix the individual bugs separately. > > > ... so that seems preferable to me, given that the necessary fixes seem > rather trivial. They're not too bad. Probably: 1. lc() the keys we put into the hash 2. match to/cc/bcc and dereference their arrays 3. maybe handle 'body' separately from headers to avoid confusion But there may be other similar bugs lurking. One I didn't mention: the hash-based version randomly reorders headers! > > I guess "readable" is up for debate here, but I find the inline handling > > a lot easier to follow > > > any particular reason for that? For the reasons I gave in the commit message: namely that the matching and logic is in one place and doesn't need to be duplicated (e.g., the special handling of to/cc/bcc, which caused a bug here). > > (and it's half as many lines; most of the diffstat is the new tests). > > > - if ($parsed_email{'From'}) { > > - $sender = delete($parsed_email{'From'}); > > - } > > this verbosity could be cut down somewhat using just > > $sender = delete($parsed_email{'From'}); > > and if the value can be pre-set and needs to be preserved, > > $sender = delete($parsed_email{'From'}) // $sender; > > but this seems kind of counter-productive legibility-wise. We do need to avoid overwriting the pre-set value. The "//" one would work, but we support perl versions old enough that they don't have it. -Peff
On Fri, Oct 20, 2023 at 02:45:30PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > - the handling for to/cc/bcc is totally broken. > > It is good to see another evidence that "--compose" is probably not > as often as used as we thought. With enough bugs discovered, > perhaps someday we can declare "it cannot be that the feature is > used in the wild, without anybody getting hit by these bugs---let's > deprecate and eventually remove it" ;-) I'm not sure if that is evidence or not. The to/cc/bcc feature was just never implemented. The commit from 2017 made it more broken than saying "not yet implemented", but that may only be an indication that nobody wants it or tried to use it. I dunno. As I noted, the same feature exists when reading the cover-letter from a set of format-patch files. And of course it is implemented using totally separate code (in pre_process_file). One possible cleanup would be to unify those two, but I'm sure there would be behavior changes. Some of them perhaps good (e.g., it looks like pre_process_file is more careful about rfc2047 handling) and some of them I'm not so sure of (e.g., support for --header-cmd in the --compose letter). I think an interested person could champion such changes, but I am not that interested in send-email (I don't use it, and some of its code is pretty ancient and gross). My goal was to fix the bug I saw with minimal regression (I waffled even on my patch 2). -Peff
On Mon, Oct 23, 2023 at 02:40:10PM -0400, Jeff King wrote: >On Fri, Oct 20, 2023 at 12:45:43PM +0200, Oswald Buddenhagen wrote: >> that seems like a rather significant point, doesn't it? > >Maybe. It depends on whether anybody is interested in adding >continuation support. Nobody has in the previous 18 years, and nobody >has asked for it. > dunno, it seems like a bug to me. so if i cared at all about this functionality, i'd fix it just because. so at least it doesn't seem nice to make it harder for a potential volunteer. >> > So another option is to just fix the individual bugs separately. >> > >> ... so that seems preferable to me, given that the necessary fixes >> seem >> rather trivial. > >They're not too bad. Probably: > > 1. lc() the keys we put into the hash > > 2. match to/cc/bcc and dereference their arrays > > 3. maybe handle 'body' separately from headers to avoid confusion > with the header keys lowercased, one could simply use BODY as the key and be done with it. >But there may be other similar bugs lurking. >One I didn't mention: the >hash-based version randomly reorders headers! > hmm, yeah, that would mean using Tie::IxHash if one wanted to do it elegantly, at the cost of depending on another non-core module. also, it means that another hash with non-lowercased versions of the keys would have to be kept. ok, that's stupid. it would be easier to just keep an additional array of the original keys for iteration, and check the hash before emitting them. >> > I guess "readable" is up for debate here, but I find the inline handling >> > a lot easier to follow >> > >> any particular reason for that? > >For the reasons I gave in the commit message: namely that the matching >and logic is in one place and doesn't need to be duplicated (e.g., the >special handling of to/cc/bcc, which caused a bug here). > from what i can see, there isn't really anything to "match", apart from agreeing on the data structure (which the code partially failed to do, but that's trivial enough). and layering/abstracting things is usually considered a good thing, unless the cost/benefit ratio is completely backwards. >The "//" one would >work, but we support perl versions old enough that they don't have it. > according to my grepping, that ship has sailed. also, why _would_ you support such ancient perl versions? that makes even less sense to me than supporting ancient c compilers. regards
On Mon, Oct 23, 2023 at 09:50:54PM +0200, Oswald Buddenhagen wrote: > > The "//" one would > > work, but we support perl versions old enough that they don't have it. > > > according to my grepping, that ship has sailed. > also, why _would_ you support such ancient perl versions? that makes even > less sense to me than supporting ancient c compilers. It may be reasonable to bump the default perl version for the script. But that would require somebody digging into what tends to ship these days (which can be sometimes be surprising; witness macos using old versions of bash due to license issues), and then updating the "use 5.008" in the script. The "//" operator was added in perl 5.10. I'm not sure what you found that makes you think the ship has sailed. The only hits for "//" I see look like the end of substitution regexes ("s/foo//" and similar). But if we are not consistent with the "use" claim, that is worth fixing. -Peff
On Wed, Oct 25, 2023 at 02:11:20AM -0400, Jeff King wrote: >The "//" operator was added in perl 5.10. I'm not sure what you found >that makes you think the ship has sailed. The only hits for "//" I see >look like the end of substitution regexes ("s/foo//" and similar). > grep with spaces around the operator, then you can see the instance in git-credential-netrc.perl easily. regards
Oswald Buddenhagen <oswald.buddenhagen@gmx.de> writes: > On Wed, Oct 25, 2023 at 02:11:20AM -0400, Jeff King wrote: >>The "//" operator was added in perl 5.10. I'm not sure what you found >>that makes you think the ship has sailed. The only hits for "//" I see >>look like the end of substitution regexes ("s/foo//" and similar). >> > grep with spaces around the operator, then you can see the instance in > git-credential-netrc.perl easily. Good find, but given the relative prevalence in use between netrc helper and send-email, my conclusion is rather opposite. It seems to indicate that avoiding "//" would still be prudent if the only tool we can find find broken on 5.008 is the netrc helper.
On Wed, Oct 25, 2023 at 11:23:01AM +0200, Oswald Buddenhagen wrote: > On Wed, Oct 25, 2023 at 02:11:20AM -0400, Jeff King wrote: > > The "//" operator was added in perl 5.10. I'm not sure what you found > > that makes you think the ship has sailed. The only hits for "//" I see > > look like the end of substitution regexes ("s/foo//" and similar). > > > grep with spaces around the operator, then you can see the instance in > git-credential-netrc.perl easily. Ah, yeah, there is one instance there. That script does not have a "use" marker, though, and we do not necessarily need or want to be as strict with contrib/ scripts, which are quite optional compared to core functionality like send-email. That said, I do suspect that requiring 5.10 or later would not be too burdensome these days. If we want to do so, then the first step would be updating the text in INSTALL, along with the "use" directives in most files. Probably d48b284183 (perl: bump the required Perl version to 5.8 from 5.6.[21], 2010-09-24) could serve as a template. -Peff
diff --git a/git-send-email.perl b/git-send-email.perl index 288ea1ae80..bbda2a931b 100755 --- a/git-send-email.perl +++ b/git-send-email.perl @@ -888,73 +888,59 @@ sub get_patch_subject { do_edit($compose_filename); } + open my $c2, ">", $compose_filename . ".final" + or die sprintf(__("Failed to open %s.final: %s"), $compose_filename, $!); + open $c, "<", $compose_filename or die sprintf(__("Failed to open %s: %s"), $compose_filename, $!); + my $need_8bit_cte = file_has_nonascii($compose_filename); + my $in_body = 0; + my $summary_empty = 1; if (!defined $compose_encoding) { $compose_encoding = "UTF-8"; } - - my %parsed_email; - while (my $line = <$c>) { - next if $line =~ m/^GIT:/; - parse_header_line($line, \%parsed_email); - if ($line =~ /^$/) { - $parsed_email{'body'} = filter_body($c); + while(<$c>) { + next if m/^GIT:/; + if ($in_body) { + $summary_empty = 0 unless (/^\n$/); + } elsif (/^\n$/) { + $in_body = 1; + if ($need_8bit_cte) { + print $c2 "MIME-Version: 1.0\n", + "Content-Type: text/plain; ", + "charset=$compose_encoding\n", + "Content-Transfer-Encoding: 8bit\n"; + } + } elsif (/^MIME-Version:/i) { + $need_8bit_cte = 0; + } elsif (/^Subject:\s*(.+)\s*$/i) { + $initial_subject = $1; + my $subject = $initial_subject; + $_ = "Subject: " . + quote_subject($subject, $compose_encoding) . + "\n"; + } elsif (/^In-Reply-To:\s*(.+)\s*$/i) { + $initial_in_reply_to = $1; + next; + } elsif (/^Reply-To:\s*(.+)\s*$/i) { + $reply_to = $1; + } elsif (/^From:\s*(.+)\s*$/i) { + $sender = $1; + next; + } elsif (/^(?:To|Cc|Bcc):/i) { + print __("To/Cc/Bcc fields are not interpreted yet, they have been ignored\n"); + next; } + print $c2 $_; } close $c; + close $c2; - open my $c2, ">", $compose_filename . ".final" - or die sprintf(__("Failed to open %s.final: %s"), $compose_filename, $!); - - - if ($parsed_email{'From'}) { - $sender = delete($parsed_email{'From'}); - } - if ($parsed_email{'In-Reply-To'}) { - $initial_in_reply_to = delete($parsed_email{'In-Reply-To'}); - } - if ($parsed_email{'Reply-To'}) { - $reply_to = delete($parsed_email{'Reply-To'}); - } - if ($parsed_email{'Subject'}) { - $initial_subject = delete($parsed_email{'Subject'}); - print $c2 "Subject: " . - quote_subject($initial_subject, $compose_encoding) . - "\n"; - } - - if ($parsed_email{'MIME-Version'}) { - print $c2 "MIME-Version: $parsed_email{'MIME-Version'}\n", - "Content-Type: $parsed_email{'Content-Type'};\n", - "Content-Transfer-Encoding: $parsed_email{'Content-Transfer-Encoding'}\n"; - delete($parsed_email{'MIME-Version'}); - delete($parsed_email{'Content-Type'}); - delete($parsed_email{'Content-Transfer-Encoding'}); - } elsif (file_has_nonascii($compose_filename)) { - my $content_type = (delete($parsed_email{'Content-Type'}) or - "text/plain; charset=$compose_encoding"); - print $c2 "MIME-Version: 1.0\n", - "Content-Type: $content_type\n", - "Content-Transfer-Encoding: 8bit\n"; - } - # Preserve unknown headers - foreach my $key (keys %parsed_email) { - next if $key eq 'body'; - print $c2 "$key: $parsed_email{$key}"; - } - - if ($parsed_email{'body'}) { - print $c2 "\n$parsed_email{'body'}\n"; - delete($parsed_email{'body'}); - } else { + if ($summary_empty) { print __("Summary email is empty, skipping it\n"); $compose = -1; } - - close $c2; - } elsif ($annotate) { do_edit(@files); } @@ -1009,32 +995,6 @@ sub ask { return; } -sub parse_header_line { - my $lines = shift; - my $parsed_line = shift; - my $addr_pat = join "|", qw(To Cc Bcc); - - foreach (split(/\n/, $lines)) { - if (/^($addr_pat):\s*(.+)$/i) { - $parsed_line->{$1} = [ parse_address_line($2) ]; - } elsif (/^([^:]*):\s*(.+)\s*$/i) { - $parsed_line->{$1} = $2; - } - } -} - -sub filter_body { - my $c = shift; - my $body = ""; - while (my $body_line = <$c>) { - if ($body_line !~ m/^GIT:/) { - $body .= $body_line; - } - } - return $body; -} - - my %broken_encoding; sub file_declares_8bit_cte { diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh index 263db3ad17..9644ff5793 100755 --- a/t/t9001-send-email.sh +++ b/t/t9001-send-email.sh @@ -2505,4 +2505,39 @@ test_expect_success $PREREQ 'test forbidSendmailVariables behavior override' ' HEAD^ ' +test_expect_success $PREREQ '--compose handles lowercase headers' ' + write_script fake-editor <<-\EOF && + sed "s/^From:.*/from: edited-from@example.com/i" "$1" >"$1.tmp" && + mv "$1.tmp" "$1" + EOF + clean_fake_sendmail && + git send-email \ + --compose \ + --from="Example <from@example.com>" \ + --to=nobody@example.com \ + --smtp-server="$(pwd)/fake.sendmail" \ + HEAD^ && + grep "From: edited-from@example.com" msgtxt1 +' + +test_expect_success $PREREQ '--compose handles to headers' ' + write_script fake-editor <<-\EOF && + sed "s/^$/To: edited-to@example.com\n/" <"$1" >"$1.tmp" && + echo this is the body >>"$1.tmp" && + mv "$1.tmp" "$1" + EOF + clean_fake_sendmail && + GIT_SEND_EMAIL_NOTTY=1 \ + git send-email \ + --compose \ + --from="Example <from@example.com>" \ + --to=nobody@example.com \ + --smtp-server="$(pwd)/fake.sendmail" \ + HEAD^ && + # Ideally the "to" header we specified would be used, + # but the program explicitly warns that these are + # ignored. For now, just make sure we did not abort. + grep "To:" msgtxt1 +' + test_done
This reverts commit b6049542b97e7b135e0e82bf996084d461224d32. Prior to that commit, we read the results of the user editing the "--compose" message in a loop, picking out parts we cared about, and streaming the result out to a ".final" file. That commit split the reading/interpreting into two phases; we'd now read into a hash, and then pick things out of the hash. The goal was making the code more readable. And in some ways it did, because the ugly regexes are confined to the reading phase. But it also introduced several bugs, because now the two phases need to match each other. In particular: - we pick out headers like "Subject: foo" with a case-insensitive regex, and then use the user-provided header name as the key in a case-sensitive hash. So if the user wrote "subject: foo", we'd no longer recognize it as a subject. - the namespace for the hash keys conflates header names with meta information like "body". If you put "body: foo" in your message, it would be misinterpreted as the actual message body (nobody is likely to do that in practice, but it seems like an unnecessary danger). - the handling for to/cc/bcc is totally broken. The behavior before that commit is to recognize and skip those headers, with a note to the user that they are not yet handled. Not great, but OK. But after the patch, the reading side now splits the addresses into a perl array-ref. But the interpreting side doesn't handle this at all, and blindly prints the stringified array-ref value. This leads to garbage like: (mbox) Adding to: ARRAY (0x555b4345c428) from line 'To: ARRAY(0x555b4345c428)' error: unable to extract a valid address from: ARRAY (0x555b4345c428) What to do with this address? ([q]uit|[d]rop|[e]dit): Probably not a huge deal, since nobody should even try to use those headers in the first place (since they were not implemented). But the new behavior is worse, and indicative of the sorts of problems that come from having the two layers. The revert had a few conflicts, due to later work in this area from 15dc3b9161 (send-email: rename variable for clarity, 2018-03-04) and d11c943c78 (send-email: support separate Reply-To address, 2018-03-04). I've ported the changes from those commits over as part of the conflict resolution. The new tests show the bugs. Note the use of GIT_SEND_EMAIL_NOTTY in the second one. Without it, the test is happy to reach outside the test harness to the developer's actual terminal (when run with the buggy state before this patch). Signed-off-by: Jeff King <peff@peff.net> --- I guess "readable" is up for debate here, but I find the inline handling a lot easier to follow (and it's half as many lines; most of the diffstat is the new tests). But one thing that gives me pause is that the neither before or after this patch do we handle continuation lines like: Subject: this is the beginning and this is more subject And it would probably be a lot easier to add when storing the headers in a hash (it's not impossible to do it the other way, but you basically have to delay processing each line with a small state machine). So another option is to just fix the individual bugs separately. git-send-email.perl | 120 ++++++++++++++---------------------------- t/t9001-send-email.sh | 35 ++++++++++++ 2 files changed, 75 insertions(+), 80 deletions(-)