[b4,1/2] Avoid decoding errors when extracting message ID from stdin

Message ID 20210718043406.26727-2-kyle@kyleam.com (mailing list archive)
State New, archived
Series Avoid decoding errors when extracting message ID from stdin

Commit Message

Kyle Meyer July 18, 2021, 4:34 a.m. UTC
The mbox, am, and pr subcommands accept an mbox on stdin and extract
the message ID.  When stdin.read() is called, Python assumes the
encoding is locale.getpreferredencoding(False).  This may not match
the content encoding, leading to a decoding error.
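
The failure mode can be reproduced without a terminal. This is a minimal
sketch, not taken from the patch: the mbox fragment is made up, and an
io.TextIOWrapper with a fixed UTF-8 encoding stands in for sys.stdin as it
would behave under a UTF-8 locale.

```python
import io

# Hypothetical mbox fragment: ASCII headers, Latin-1 body (0xE9 is "e" with
# an acute accent), which is not valid UTF-8 in this position.
raw = b"Message-ID: <example@localhost>\n\nCaf\xe9\n"

# Stand-in for sys.stdin under a UTF-8 locale. The real stream takes its
# encoding from locale.getpreferredencoding(False) and decodes strictly.
text_stdin = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8")

try:
    text_stdin.read()
    decode_failed = False
except UnicodeDecodeError:
    # The body bytes do not match the assumed encoding, so reading the
    # whole stream as text fails before any header can be parsed.
    decode_failed = True

print(decode_failed)
```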

Instead feed the stdin bytes to message_from_bytes(), which leads to a
decode('ASCII', errors='surrogateescape') underneath.  That's
sufficient to get the message ID from the ASCII headers.
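
As a rough illustration of why the bytes path is enough, the sketch below
parses a made-up message whose body is not valid ASCII or UTF-8. The sample
bytes and variable names are assumptions for the example; only
email.message_from_bytes() and the 'surrogateescape' error handler come from
the description above.

```python
import email

# Hypothetical message: ASCII headers, one non-ASCII byte in the body.
raw = b"Message-ID: <example@localhost>\nSubject: test\n\nCaf\xe9\n"

# Parsing from bytes does not depend on the locale encoding.
message = email.message_from_bytes(raw)
msgid = message.get('Message-ID', None)
print(msgid)

# The same error handler applied directly: undecodable bytes are mapped to
# lone surrogates instead of raising, and the mapping is reversible.
decoded = raw.decode('ascii', errors='surrogateescape')
roundtrip = decoded.encode('ascii', errors='surrogateescape')
print(roundtrip == raw)
```

In b4 the bytes would come from sys.stdin.buffer.read() rather than a literal.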

Reported-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kyle Meyer <kyle@kyleam.com>
---

  Note: I've tested only `b4 am/mbox' with the reproducer message
  mentioned upthread; I haven't tested `b4 pr'.

 b4/__init__.py | 2 +-
 b4/pr.py       | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
Patch

diff --git a/b4/__init__.py b/b4/__init__.py
index 0e007be..5b32fb4 100644
--- a/b4/__init__.py
+++ b/b4/__init__.py
@@ -1948,7 +1948,7 @@  def get_requests_session():
 
 def get_msgid_from_stdin():
     if not sys.stdin.isatty():
-        message = email.message_from_string(sys.stdin.read())
+        message = email.message_from_bytes(sys.stdin.buffer.read())
         return message.get('Message-ID', None)
     return None
 
diff --git a/b4/pr.py b/b4/pr.py
index d8ff7f4..fbb2a71 100644
--- a/b4/pr.py
+++ b/b4/pr.py
@@ -433,7 +433,7 @@  def main(cmdargs):
 
     if not sys.stdin.isatty():
         logger.debug('Getting PR message from stdin')
-        msg = email.message_from_string(sys.stdin.read())
+        msg = email.message_from_bytes(sys.stdin.buffer.read())
         msgid = b4.LoreMessage.get_clean_msgid(msg)
         lmsg = parse_pr_data(msg)
     else: