diff mbox

Don't execute binary files if execve() returned ENOEXEC.

Message ID 20170207083307.14881-1-kilobyte@angband.pl (mailing list archive)
State Rejected
Delegated to: Herbert Xu
Headers show

Commit Message

Adam Borowski Feb. 7, 2017, 8:33 a.m. UTC
Both "dash -c foo" and "./foo" are supposed to be able to run hashbang-less
scripts, but attempts to execute common binary files tend to be nasty:
especially both ELF and PE tend to make dash create a bunch of files with
unprintable names, that in turn confuse some tools up to causing data loss.

Thus, let's read the first line and see if it looks like text.  This is a
variant of the approach used by bash and zsh; mksh instead checks for
signatures of a bunch of common file types.

POSIX says: "If the executable file is not a text file, the shell may bypass
this command execution.".

Signed-off-by: Adam Borowski <kilobyte@angband.pl>
---
This has been applied in Debian.  While technically it's only a "may" issue
in dash itself, and is triggered by user error (trying to exec files you
shouldn't exec), the fallout is nasty enough that the bug was classified as
serious.

The usual failure mode is to create files with names such as:
(per submitter, a PE):
90 d4 f6
(an ELF):
01 b0 40 40 08 01 40 38 02 40 04 03 01 05 40 40 ed ed
01 b0 40 40 f8 40 38 02 40 04 03 01 05 40 40 da da

Comments

Jilles Tjoelker Feb. 7, 2017, 10:52 p.m. UTC | #1
On Tue, Feb 07, 2017 at 09:33:07AM +0100, Adam Borowski wrote:
> Both "dash -c foo" and "./foo" are supposed to be able to run hashbang-less
> scripts, but attempts to execute common binary files tend to be nasty:
> especially both ELF and PE tend to make dash create a bunch of files with
> unprintable names, that in turn confuse some tools up to causing data loss.

> Thus, let's read the first line and see if it looks like text.  This is a
> variant of the approach used by bash and zsh; mksh instead checks for
> signatures of a bunch of common file types.

> POSIX says: "If the executable file is not a text file, the shell may bypass
> this command execution.".

> Signed-off-by: Adam Borowski <kilobyte@angband.pl>
> ---
> This has been applied in Debian.  While technically it's only a "may" issue
> in dash itself, and is triggered by user error (trying to exec files you
> shouldn't exec), the fallout is nasty enough that the bug was classified as
> serious.

> The usual failure mode is to create files with names such as:
> (per submitter, a PE):
> 90 d4 f6
> (an ELF):
> 01 b0 40 40 08 01 40 38 02 40 04 03 01 05 40 40 ed ed
> 01 b0 40 40 f8 40 38 02 40 04 03 01 05 40 40 da da

In FreeBSD sh, I have done this slightly differently (since 2011), based
on POSIX's definition of a text file in XBD 3:

] A file that contains characters organized into zero or more lines. The
] lines do not contain NUL characters and none can exceed {LINE_MAX} bytes
] in length, including the <newline> character.

The check is simply for a 0 byte in the first 256 bytes (how many bytes
are read by pread() for 256 bytes). A file containing the byte 8, for
example, can still be a text file per POSIX's definition.

This check might cause a terse script with binary to fail to execute,
but I have not received bug reports about that.

Stopping the check with a \n will cause a PNG header to be considered
text.

Also, FreeBSD sh uses O_NONBLOCK when opening the file and pread() to
read the data, in order to minimize the potential for modifying things
by reading.
Adam Borowski Feb. 8, 2017, 8:02 a.m. UTC | #2
On Tue, Feb 07, 2017 at 11:52:08PM +0100, Jilles Tjoelker wrote:
> On Tue, Feb 07, 2017 at 09:33:07AM +0100, Adam Borowski wrote:
> > Both "dash -c foo" and "./foo" are supposed to be able to run hashbang-less
> > scripts, but attempts to execute common binary files tend to be nasty:
> > especially both ELF and PE tend to make dash create a bunch of files with
> > unprintable names, that in turn confuse some tools up to causing data loss.
> 
> > Thus, let's read the first line and see if it looks like text.  This is a
> > variant of the approach used by bash and zsh; mksh instead checks for
> > signatures of a bunch of common file types.
> 
> > POSIX says: "If the executable file is not a text file, the shell may bypass
> > this command execution.".
> 
> In FreeBSD sh, I have done this slightly differently (since 2011), based
> on POSIX's definition of a text file in XBD 3:
> 
> ] A file that contains characters organized into zero or more lines. The
> ] lines do not contain NUL characters and none can exceed {LINE_MAX} bytes
> ] in length, including the <newline> character.

Using that definition would require reading the whole file, which can
obviously be slow and/or lead to DoS.

Also, it would reject some scripts currently accepted by popular shells,
including bash, mksh and present version of dash -- none of which ban
lines above LINE_MAX (2048) in length.

> The check is simply for a 0 byte in the first 256 bytes (how many bytes
> are read by pread() for 256 bytes). A file containing the byte 8, for
> example, can still be a text file per POSIX's definition.

The XBD 3 cannot possibly specify allowed byte values, as it doesn't even
know which is the newline, nor does it require even its own "portable
character set" being directly accessible (merely that it's included in one
of shiftable sets provided by a locale).

On the other hand, not only dash assumes 8-bit bytes and ASCII-compatible
charset (what a limitation!), but we also have knowledge about which byte
values are not a part of any printable string in any locale on a supported
platform.  And the set of supported character sets is going to only
decrease[1].

> This check might cause a terse script with binary to fail to execute,
> but I have not received bug reports about that.

In today's brave "curl|sudo sh" world, it's a concern I wouldn't dismiss.
The first line will be a legitimate command to the shell, the rest is often
arbitrary binary junk.

> Stopping the check with a \n will cause a PNG header to be considered
> text.

Good point.  It does fool bash and mksh (with mksh's different approach)
too.  On the other hand, PNG files are not something that's likely to have a
bogus +x permission, and unlike PE or ELF files, empirically I didn't notice
any nasty results.  It's still bad -- not sure what's the best solution.

> Also, FreeBSD sh uses O_NONBLOCK when opening the file and pread() to
> read the data, in order to minimize the potential for modifying things
> by reading.

The manpage for open(2) on Linux says that, while currently O_NONBLOCK has
no effect for regular files and block devices, the expected semantics might
eventually be implemented -- and I guess you're not going to prefault the
beginning of the file before reading, are you?  Thus, O_NONBLOCK is a bad
idea.

As for pread, dash execs a new process (via /bin/sh which might or might not
be dash) so there's no optimization in not reopening the file.  As for
avoiding fifos, the kernel has just checked whether it's a valid executable
anyway.  Am I missing something else?



Meow!

[1]. I've recently gathered stats about use of ancient encodings, and it's
so close to zero that I plan to raise discussion about dropping support for
non-UTF8 in Debian soon.  You might be reluctant to do so in dash for now,
but having a single encoding for all locales would make it easy to fix a
number of POSIX requirements dash fails at because you consider locale
support to be infeasible.  Like, making ${#foo} give length in characters is
a matter of counting bytes outside 0x80..0xBF if assuming well-formed
strings is okay, or not a lot more complex even if full validation is
required.
Jilles Tjoelker Feb. 8, 2017, 10:11 p.m. UTC | #3
On Wed, Feb 08, 2017 at 09:02:33AM +0100, Adam Borowski wrote:
> On Tue, Feb 07, 2017 at 11:52:08PM +0100, Jilles Tjoelker wrote:
> > On Tue, Feb 07, 2017 at 09:33:07AM +0100, Adam Borowski wrote:
> > > Both "dash -c foo" and "./foo" are supposed to be able to run hashbang-less
> > > scripts, but attempts to execute common binary files tend to be nasty:
> > > especially both ELF and PE tend to make dash create a bunch of files with
> > > unprintable names, that in turn confuse some tools up to causing data loss.

> > > Thus, let's read the first line and see if it looks like text.  This is a
> > > variant of the approach used by bash and zsh; mksh instead checks for
> > > signatures of a bunch of common file types.

> > > POSIX says: "If the executable file is not a text file, the shell may bypass
> > > this command execution.".

> > In FreeBSD sh, I have done this slightly differently (since 2011), based
> > on POSIX's definition of a text file in XBD 3:

> > ] A file that contains characters organized into zero or more lines. The
> > ] lines do not contain NUL characters and none can exceed {LINE_MAX} bytes
> > ] in length, including the <newline> character.

> Using that definition would require reading the whole file, which can
> obviously be slow and/or lead to DoS.

> Also, it would reject some scripts currently accepted by popular shells,
> including bash, mksh and present version of dash -- none of which ban
> lines above LINE_MAX (2048) in length.

The part you quoted has does not require that the shell bypass the
execution (of the shell) for any file that is not a text file, but that
the shell not bypass the execution for any file that is a text file.

Therefore, the shell may (but is not required to) bypass the execution
if the file contains 0 bytes, contains invalid byte sequences that do
not form a character, contains lines longer than {LINE_MAX} or ends with
a character that is not a newline. Otherwise, the shell must perform the
execution.

The line length is indeed somewhat strange: the INPUT FILES section of
XCU 4 Utilities -> sh requires the shell to read scripts without the
{LINE_MAX} limit, but the part you quoted does not use this modified
definition of a text file. In practice this is not relevant since shells
are not going to read more than {LINE_MAX} for this check anyway, as you
say.

> > The check is simply for a 0 byte in the first 256 bytes (how many bytes
> > are read by pread() for 256 bytes). A file containing the byte 8, for
> > example, can still be a text file per POSIX's definition.

> The XBD 3 cannot possibly specify allowed byte values, as it doesn't even
> know which is the newline, nor does it require even its own "portable
> character set" being directly accessible (merely that it's included in one
> of shiftable sets provided by a locale).

> On the other hand, not only dash assumes 8-bit bytes and ASCII-compatible
> charset (what a limitation!), but we also have knowledge about which byte
> values are not a part of any printable string in any locale on a supported
> platform.  And the set of supported character sets is going to only
> decrease[1].

Text files need not contain printable characters only.

> > This check might cause a terse script with binary to fail to execute,
> > but I have not received bug reports about that.

> In today's brave "curl|sudo sh" world, it's a concern I wouldn't dismiss.
> The first line will be a legitimate command to the shell, the rest is often
> arbitrary binary junk.

I think problems are unlikely because most scripts use the #! method
instead of relying on this shell feature.

> > Stopping the check with a \n will cause a PNG header to be considered
> > text.

> Good point.  It does fool bash and mksh (with mksh's different approach)
> too.  On the other hand, PNG files are not something that's likely to have a
> bogus +x permission, and unlike PE or ELF files, empirically I didn't notice
> any nasty results.  It's still bad -- not sure what's the best solution.

Anything can have a bogus +x permission on FAT, NTFS and other
filesystems that do not support executable permissions at all.

> > Also, FreeBSD sh uses O_NONBLOCK when opening the file and pread() to
> > read the data, in order to minimize the potential for modifying things
> > by reading.

> The manpage for open(2) on Linux says that, while currently O_NONBLOCK has
> no effect for regular files and block devices, the expected semantics might
> eventually be implemented -- and I guess you're not going to prefault the
> beginning of the file before reading, are you?  Thus, O_NONBLOCK is a bad
> idea.

Hmm, I doubt such a basic thing will ever be changed.

> As for pread, dash execs a new process (via /bin/sh which might or might not
> be dash) so there's no optimization in not reopening the file.  As for
> avoiding fifos, the kernel has just checked whether it's a valid executable
> anyway.  Am I missing something else?

This guards against a file change between execve and open in the parent
shell, but not against a file change between open in the parent and
newly executed shell.

The O_NOCTTY you added is similar. I did not add this because O_NOCTTY
has no effect in the FreeBSD kernel.

> Meow!

> [1]. I've recently gathered stats about use of ancient encodings, and it's
> so close to zero that I plan to raise discussion about dropping support for
> non-UTF8 in Debian soon.  You might be reluctant to do so in dash for now,
> but having a single encoding for all locales would make it easy to fix a
> number of POSIX requirements dash fails at because you consider locale
> support to be infeasible.  Like, making ${#foo} give length in characters is
> a matter of counting bytes outside 0x80..0xBF if assuming well-formed
> strings is okay, or not a lot more complex even if full validation is
> required.

I think UTF-8 and character=byte are the useful options for character
encodings these days. The latter is useful to process non-character data
without [EILSEQ] problems, or to process ASCII data with higher
performance (such as in sort(1)).

FreeBSD sh supports these two options and nothing more. The handling for
${#foo} in an UTF-8 locale is, in fact, what you are suggesting here.

Some advantages compared to generic approaches based on mbrtowc() or
similar are higher performance, simpler code and the possibility to
design handling for invalid sequences for each situation (most users do
not like the shell aborting at random places because of invalid UTF-8).
Herbert Xu March 6, 2018, 9:05 a.m. UTC | #4
On Tue, Feb 07, 2017 at 09:33:07AM +0100, Adam Borowski wrote:
> Both "dash -c foo" and "./foo" are supposed to be able to run hashbang-less
> scripts, but attempts to execute common binary files tend to be nasty:
> especially both ELF and PE tend to make dash create a bunch of files with
> unprintable names, that in turn confuse some tools up to causing data loss.
> 
> Thus, let's read the first line and see if it looks like text.  This is a
> variant of the approach used by bash and zsh; mksh instead checks for
> signatures of a bunch of common file types.
> 
> POSIX says: "If the executable file is not a text file, the shell may bypass
> this command execution.".
> 
> Signed-off-by: Adam Borowski <kilobyte@angband.pl>

I'm not going to apply this patch because

1) It's not required by POSIX.
2) As you said the detection is imperfect, both false positives
and false negatives can occur.

Cheers,
diff mbox

Patch

diff --git a/src/exec.c b/src/exec.c
index ec0eadd..19ae752 100644
--- a/src/exec.c
+++ b/src/exec.c
@@ -148,6 +148,36 @@  shellexec(char **argv, const char *path, int idx)
 }
 
 
+/*
+ * Check if an executable that just failed with ENOEXEC shouldn't be
+ * considered a script (wrong-arch ELF/PE, junk accidentally set +x, etc).
+ * We check only the first line to allow binaries encapsulated in a shell
+ * script without proper quoting.  The first line, if not a hashbang, is
+ * likely to contain comments; even ancient encodings, at least popular
+ * ones, don't use 0x7f nor values below 0x1f other than whitespace (\t,
+ * \n, \v, \f, \r), ISO/IEC 2022 can have SI, SO and \e.
+ */
+STATIC int file_is_binary(const char *cmd)
+{
+	char buf[128];
+	int fd = open(cmd, O_RDONLY|O_NOCTTY);
+	if (fd == -1)
+		return 1;
+	int len = read(fd, buf, sizeof(buf));
+	close(fd);
+	for (int i = 0; i < len; ++i) {
+		char c = buf[i];
+		if (c >= 0 && c <= 8 ||
+		    c >= 16 && c <= 31 && c != 27 ||
+		    c == 0x7f)
+			return 1;
+		if (c == '\n')
+			return 0;
+	}
+	return 0;
+}
+
+
 STATIC void
 tryexec(char *cmd, char **argv, char **envp)
 {
@@ -162,6 +192,8 @@  repeat:
 	execve(cmd, argv, envp);
 #endif
 	if (cmd != path_bshell && errno == ENOEXEC) {
+		if (file_is_binary(cmd))
+			return;
 		*argv-- = cmd;
 		*argv = cmd = path_bshell;
 		goto repeat;