From patchwork Sun May 30 16:26:25 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paul Barker
X-Patchwork-Id: 13010640
From: Paul Barker
To: tools@linux.kernel.org, Konstantin Ryabitsev
Cc: Paul Barker
Subject: [patatt][PATCH] Handle MIME encoded-word & other header manglings
Date: Sun, 30 May 2021 17:26:25 +0100
Message-Id: <20210530162625.31243-1-paul@pbarker.dev>
X-Mailer: git-send-email 2.31.1
X-Mailing-List: tools@linux.kernel.org

When testing patatt with patches sent to a sr.ht hosted mailing list, it
was found that long header lines (such as the X-Developer-Signature
line) were re-encoded using the MIME encoded-word syntax (RFC 2047) when
an mbox archive is generated, causing patatt to choke on the resulting
text which looks like this:

    X-Developer-Signature: v=1; a=openpgp-sha256; l=672; h=from:subject;
     bh=C40yOKgIfnNIUP+OW9WyPdBfljkZPpfUL1NepOODlx8=; =?utf-8?q?b=3DowGbwMvMwCF2?=
     =?utf-8?q?w7xIXuiX9CvG02pJDAmb67lTNi0+IeF97TL76vtKD7xjSjaluz0o/KfmZLX8rMi7_?=
     =?utf-8?q?l3M6O0pZGMQ4GGTFFFl2z951+fqDJVt7b0gHw8xhZQIZwsDFKQATydFhZJi+fFfvJ?=
     =?utf-8?q?8+0MF7GrfzWnP?=
     K7mAM/3n/r/UC+bprf6/g114QYGdbHcsaK7b1nanfA4IeZi1V0lL26cruXUWxgSEnNDP1FrAA=

Avoiding this issue by neatly wrapping the X-Developer-Signature header
before sending doesn't appear to be possible without making invasive changes
to git-send-email and/or the Net::SMTP Perl module. The header content
generated by patatt is wrapped at 78 characters, as can be seen here in
a locally signed patch file:

    X-Developer-Signature: v=1; a=openpgp-sha256; l=672; h=from:subject;
     bh=C40yOKgIfnNIUP+OW9WyPdBfljkZPpfUL1NepOODlx8=;
     b=owGbwMvMwCF2w7xIXuiX9CvG02pJDAmbN1xO2bT4hIT3tcvsq+8rPfCOKdmU7vag8J+ak9XysyLv
     Xs7p7ChlYRDjYJAVU2TZPXvX5esPlmztvSEdDDOHlQlkCAMXpwBMpG0Dw/9Kpzgpc8UsQwOPK/taW6
     dFnZyy5QlXPfNCC4WTc76ft9ZnZJjI37a17fP7sxvclKJ1tm36EhITcK62Pphje9KrmOxMJg4A

Running `git send-email --smtp-debug=1 0001.patch` shows that this is
joined into a single long line before the message is sent:

    Net::SMTP::_SSL=GLOB(0x5646fbdc3ac8)>>> X-Developer-Signature: v=1; a=openpgp-sha256; l=672; h=from:subject; bh=C40yOKgIfnNIUP+OW9WyPdBfljkZPpfUL1NepOODlx8=; b=owGbwMvMwCF2w7xIXuiX9CvG02pJDAmb571P2bT4hIT3tcvsq+8rPfCOKdmU7vag8J+ak9XysyLv Xs7p7ChlYRDjYJAVU2TZPXvX5esPlmztvSEdDDOHlQlkCAMXpwBM5JA3I8O5hP6Tqm7lJst0rldcux 1V7M4q8T5o1fPU6Zs+hxj+SjvN8D/DK3rn8b0m34/Xy388Yeu8jvFdJf/c6Y6LDU7Hulj01nAAAA==

So we need to accept that the X-Developer-Signature line may be quite
long and may therefore be re-encoded by a mail server or archiver.

The Python email.header module provides the decode_header() and
make_header() functions, which can be used to handle MIME encoded-word
syntax or other header manglings which may occur.

The decode_header() function requires a str argument, so we must decode
our bytes before calling it. Thankfully, RFC 2822 makes life easy here,
as it says that all header content must be composed of US-ASCII
characters (see section 2.2 of the RFC), so decoding is straightforward.

The header content is re-encoded into bytes after un-mangling, to avoid
having to modify every other location in patatt where the header content
is accessed.
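
For illustration only (this is not part of the patch), the decode,
un-mangle and re-encode round trip described above can be sketched
roughly as follows. The helper name unmangle() and the shortened sample
value are assumptions made for this example; only decode_header() and
make_header() come from the Python standard library.

```python
import email.header
import re


def unmangle(hval: bytes) -> bytes:
    """Undo RFC 2047 encoded-word mangling of a header value.

    Hypothetical helper for illustration; it mirrors the three steps
    described in the commit message, not patatt's actual code layout.
    """
    # Header content is expected to be US-ASCII, so this decode is safe
    # for well-formed input (it raises UnicodeDecodeError otherwise).
    text = hval.decode('ascii')
    # decode_header() splits the value into (chunk, charset) pairs and
    # make_header() joins them back into a single Header object.
    text = str(email.header.make_header(email.header.decode_header(text)))
    # Back to bytes so that downstream byte-oriented code is unaffected.
    return text.encode('utf-8')


# Shortened, made-up sample in the style of the archived header above.
mangled = (b'v=1; a=openpgp-sha256; '
           b'=?utf-8?q?b=3DowGbwMvMwCF2?= =?utf-8?q?w7xIXuiX9CvG02pJDAmb?=')

plain = unmangle(mangled)
# Strip all whitespace, as from_bytes() does before splitting on ';'.
compact = re.sub(rb'\s+', b'', plain)
print(compact)
```

A value that contains no encoded-words should pass through this helper
unchanged, which is why it can be applied unconditionally before
canonicalization.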

Signed-off-by: Paul Barker
---
 patatt/__init__.py | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/patatt/__init__.py b/patatt/__init__.py
index 460d282..4e4d5c7 100644
--- a/patatt/__init__.py
+++ b/patatt/__init__.py
@@ -91,7 +91,7 @@ class DevsigHeader:
 
     def from_bytes(self, hval: bytes) -> None:
         self.hval = DevsigHeader._dkim_canonicalize_header(hval)
-        hval = re.sub(rb'\s*', b'', hval)
+        hval = re.sub(rb'\s*', b'', self.hval)
         for chunk in hval.split(b';'):
             parts = chunk.split(b'=', 1)
             if len(parts) < 2:
@@ -392,6 +392,14 @@ class DevsigHeader:
 
     @staticmethod
     def _dkim_canonicalize_header(hval: bytes) -> bytes:
+        # The decode_header() function we're about to call requires a str
+        # argument. Since RFC2822 (sec 2.2), header fields must be ASCII
+        # characters so this is easy to achieve.
+        hval = hval.decode('ascii')
+        # Handle MIME encoded-word syntax or other types of header encoding
+        hval = str(email.header.make_header(email.header.decode_header(hval)))
+        # Convert the header back into bytes for further processing
+        hval = hval.encode('utf-8')
         # We only do relaxed for headers
         #  o  Unfold all header field continuation lines as described in
         #     [RFC5322]; in particular, lines with terminators embedded in