diff mbox series

CodingGuidelines: use octal escapes, not hex

Message ID 20230613172927.19019-1-jonathantanmy@google.com (mailing list archive)
State Superseded
Headers show
Series CodingGuidelines: use octal escapes, not hex | expand

Commit Message

Jonathan Tan June 13, 2023, 5:29 p.m. UTC
Hexadecimal escapes in shell scripts are not portable across shells (in
particular, "dash" does not support them). Write in the CodingGuidelines
document that we should be using octal escapes instead.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/CodingGuidelines | 3 +++
 1 file changed, 3 insertions(+)

Comments

Eric Sunshine June 13, 2023, 6:16 p.m. UTC | #1
On Tue, Jun 13, 2023 at 1:44 PM Jonathan Tan <jonathantanmy@google.com> wrote:
> Hexadecimal escapes in shell scripts are not portable across shells (in
> particular, "dash" does not support them). Write in the CodingGuidelines
> document that we should be using octal escapes instead.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
> diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
> @@ -188,6 +188,9 @@ For shell scripts specifically (not exhaustive):
> + - Use octal escape sequences (e.g. "\302\242"), not hexadecimal (e.g.
> +   "\xc2\xa2"), as the latter is not portable.

The shell itself doesn't interpret these sequences, so this
description feels too generic. Perhaps it would make more sense to
cite specific tools for which octal sequences are needed for
portability reasons, such as `printf`, `sed`, `tr`, etc.
Jonathan Tan June 13, 2023, 6:43 p.m. UTC | #2
Eric Sunshine <sunshine@sunshineco.com> writes:
> On Tue, Jun 13, 2023 at 1:44 PM Jonathan Tan <jonathantanmy@google.com> wrote:
> > Hexadecimal escapes in shell scripts are not portable across shells (in
> > particular, "dash" does not support them). Write in the CodingGuidelines
> > document that we should be using octal escapes instead.
> >
> > Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> > ---
> > diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
> > @@ -188,6 +188,9 @@ For shell scripts specifically (not exhaustive):
> > + - Use octal escape sequences (e.g. "\302\242"), not hexadecimal (e.g.
> > +   "\xc2\xa2"), as the latter is not portable.
> 
> The shell itself doesn't interpret these sequences, so this
> description feels too generic. Perhaps it would make more sense to
> cite specific tools for which octal sequences are needed for
> portability reasons, such as `printf`, `sed`, `tr`, etc.

Ah...good point. I checked with "echo" in "dash" and assumed that it
was "dash" that was interpreting the escapes, but indeed it is the
"echo" (and "printf") builtins in "dash" that are actually interpreting
them. What do you think of the following in the commit message:

  Hexadecimal escapes in shell scripts are not portable across shell builtins (in
  particular, the "printf" of "dash" does not support them). Write in the CodingGuidelines
  document that we should be using octal escapes instead.

and in the CodingGuidelines doc:

+ - Use octal escape sequences (e.g. "\302\242"), not hexadecimal (e.g.
+   "\xc2\xa2"), as the latter is not portable across some shell builtins like printf.
Eric Sunshine June 13, 2023, 7:15 p.m. UTC | #3
On Tue, Jun 13, 2023 at 2:43 PM Jonathan Tan <jonathantanmy@google.com> wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
> > On Tue, Jun 13, 2023 at 1:44 PM Jonathan Tan <jonathantanmy@google.com> wrote:
> > > + - Use octal escape sequences (e.g. "\302\242"), not hexadecimal (e.g.
> > > +   "\xc2\xa2"), as the latter is not portable.
> >
> > The shell itself doesn't interpret these sequences, so this
> > description feels too generic. Perhaps it would make more sense to
> > cite specific tools for which octal sequences are needed for
> > portability reasons, such as `printf`, `sed`, `tr`, etc.
>
> Ah...good point. I checked with "echo" in "dash" and assumed that it
> was "dash" that was interpreting the escapes, but indeed it is the
> "echo" (and "printf") builtins in "dash" that are actually interpreting
> them. What do you think of the following in the commit message:
>
>   Hexadecimal escapes in shell scripts are not portable across shell builtins (in
>   particular, the "printf" of "dash" does not support them). Write in the CodingGuidelines
>   document that we should be using octal escapes instead.
>
> and in the CodingGuidelines doc:
>
> + - Use octal escape sequences (e.g. "\302\242"), not hexadecimal (e.g.
> +   "\xc2\xa2"), as the latter is not portable across some shell builtins like printf.

The portability concern is not specific to a certain shell or whether
a command is builtin, so it feels misleading to single out "dash" and
builtins. The same portability problems can crop up, as well, with
older (non-"dash") shells, and with commands which may or may not be
builtins (such as "echo" which, historically, was not always a
builtin), and non-builtins commands, such as "sed" and "tr".

So, for the commit message, perhaps simply:

    Extend the shell-scripting section of CodingGuidelines to suggest
    octal escape sequences (e.g. "\302\242") over hexadecimal
    (e.g. "\xc2\xa2") since the latter can be a source of portability
    problems.

As for the change to CodingGuidelines, this would probably be sufficient:

    Use octal escape sequences (e.g. "\302\242"), not hexadecimal
    (e.g. "\xc2\xa2"), since the latter is not portable across some
    commands, such as `printf`, `sed`, `tr`, etc.
Junio C Hamano June 13, 2023, 7:29 p.m. UTC | #4
Eric Sunshine <sunshine@sunshineco.com> writes:

> So, for the commit message, perhaps simply:
>
>     Extend the shell-scripting section of CodingGuidelines to suggest
>     octal escape sequences (e.g. "\302\242") over hexadecimal
>     (e.g. "\xc2\xa2") since the latter can be a source of portability
>     problems.
>
> As for the change to CodingGuidelines, this would probably be sufficient:
>
>     Use octal escape sequences (e.g. "\302\242"), not hexadecimal
>     (e.g. "\xc2\xa2"), since the latter is not portable across some
>     commands, such as `printf`, `sed`, `tr`, etc.

I'd prefer singling out `printf`, actually, and not talking about
"across some commands".

As I said in a separate message, we certainly do *not* want to rely
on `echo` interpreting bs-escaped octal sequences without '-e', even
though it may be expected on a POSIX systems, because it is not
portable across systems our users commonly encounter.

And `printf` has been what we chose to turn bs-escaped octal
sequence into binary.  I'd prefer not having to even worry about
`sed`, `tr`, etc. behaving differently and not allowing to expect
these other commands to be usable for turning bs-escaped octal
sequence into binary would be one way to achieve that goal.

Thanks.
diff mbox series

Patch

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 003393ed16..1c54abd7c5 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -188,6 +188,9 @@  For shell scripts specifically (not exhaustive):
    hopefully nobody starts using "local" before they are reimplemented
    in C ;-)
 
+ - Use octal escape sequences (e.g. "\302\242"), not hexadecimal (e.g.
+   "\xc2\xa2"), as the latter is not portable.
+
 
 For C programs: