mbox series

[0/15] allow multi-byte core.commentChar

Message ID 20240307091407.GA2072522@coredump.intra.peff.net (mailing list archive)
Headers show
Series allow multi-byte core.commentChar | expand

Message

Jeff King March 7, 2024, 9:14 a.m. UTC
On Wed, Mar 06, 2024 at 03:08:04AM -0500, Jeff King wrote:

> For a more readable series, I'd guess it would make sense to introduce
> comment_line_str as a separate variable (but continue to enforce the
> single-char rule), convert the easy cases en masse, the tricky cases one
> by one, and then finally drop comment_line_char entirely. At which point
> the config rules can be lifted to allow multi-byte strings.

I ended up cleaning this up. Like I said, this isn't something I'm
personally that interested in. But it just seemed like a wart that this
one spot could not handle multi-byte characters that all the cool kids
are using in their prompts etc these days.

Plus it was kind of an interesting puzzle for how to lay out the
refactoring to make each step self-consistent. At the very least, I
think the first couple of cleanups are worth it even if we do not see
the whole thing through. ;)

It obviously nullifies kh/doc-commentchar-is-a-byte, which is in 'next'.
Sadly "git merge" does not find a conflict with the documentation update
in patch 15, so we'll have to remember to pick up one topic or the
other.

I'm using U+00BB as my commentChar for now to see if any bugs show up,
but I expect I'll get sick of it after a few days.

  [01/15]: strbuf: simplify comment-handling in add_lines() helper
  [02/15]: strbuf: avoid static variables in strbuf_add_commented_lines()
  [03/15]: commit: refactor base-case of adjust_comment_line_char()
  [04/15]: strbuf: avoid shadowing global comment_line_char name

    These four are cleanups that could be taken independently.

  [05/15]: environment: store comment_line_char as a string

    This one preps us for incrementally moving code over to the new
    system.

  [06/15]: strbuf: accept a comment string for strbuf_stripspace()
  [07/15]: strbuf: accept a comment string for strbuf_commented_addf()
  [08/15]: strbuf: accept a comment string for strbuf_add_commented_lines()
  [09/15]: prefer comment_line_str to comment_line_char for printing
  [10/15]: find multi-byte comment chars in NUL-terminated strings
  [11/15]: find multi-byte comment chars in unterminated buffers
  [12/15]: sequencer: handle multi-byte comment characters when writing todo list
  [13/15]: wt-status: drop custom comment-char stringification

    These ones are the actual transition.

  [14/15]: environment: drop comment_line_char compatibility macro
  [15/15]: config: allow multi-byte core.commentChar

    And then we tie it off by dropping the now-unused bits and loosening
    the config logic.

 Documentation/config/core.txt |  4 ++-
 add-patch.c                   | 14 +++++-----
 builtin/branch.c              |  8 +++---
 builtin/commit.c              | 19 +++++++-------
 builtin/merge.c               | 12 ++++-----
 builtin/notes.c               | 10 ++++----
 builtin/rebase.c              |  2 +-
 builtin/stripspace.c          |  4 +--
 builtin/tag.c                 | 14 +++++-----
 commit.c                      |  3 ++-
 config.c                      |  6 ++---
 environment.c                 |  2 +-
 environment.h                 |  2 +-
 fmt-merge-msg.c               |  8 +++---
 rebase-interactive.c          | 10 ++++----
 sequencer.c                   | 48 ++++++++++++++++++-----------------
 strbuf.c                      | 47 ++++++++++++++++++----------------
 strbuf.h                      |  9 ++++---
 t/t0030-stripspace.sh         |  5 ++++
 t/t7507-commit-verbose.sh     | 10 ++++++++
 t/t7508-status.sh             |  4 ++-
 trailer.c                     |  6 ++---
 wt-status.c                   | 31 +++++++++-------------
 23 files changed, 149 insertions(+), 129 deletions(-)

Comments

Phillip Wood March 8, 2024, 11:07 a.m. UTC | #1
Hi Peff

On 07/03/2024 09:14, Jeff King wrote:
> On Wed, Mar 06, 2024 at 03:08:04AM -0500, Jeff King wrote:
> 
>> For a more readable series, I'd guess it would make sense to introduce
>> comment_line_str as a separate variable (but continue to enforce the
>> single-char rule), convert the easy cases en masse, the tricky cases one
>> by one, and then finally drop comment_line_char entirely. At which point
>> the config rules can be lifted to allow multi-byte strings.
> 
> I ended up cleaning this up. Like I said, this isn't something I'm
> personally that interested in. But it just seemed like a wart that this
> one spot could not handle multi-byte characters that all the cool kids
> are using in their prompts etc these days.

I agree it would be nice to support multibyte comment characters on 
principle even if I don't think I'd use that feature myself. I've looked 
through the changes to the sequencer and they all look sensible to me. 
As I mentioned when looking at patch 11 I do wonder if we want to reject 
ascii whitespace and control characters when parsing core.commentChar. 
At a minimum leading whitespace and LF anywhere in the comment string 
feel like they are asking for trouble.

Best Wishes

Phillip

> Plus it was kind of an interesting puzzle for how to lay out the
> refactoring to make each step self-consistent. At the very least, I
> think the first couple of cleanups are worth it even if we do not see
> the whole thing through. ;)
> 
> It obviously nullifies kh/doc-commentchar-is-a-byte, which is in 'next'.
> Sadly "git merge" does not find a conflict with the documentation update
> in patch 15, so we'll have to remember to pick up one topic or the
> other.
> 
> I'm using U+00BB as my commentChar for now to see if any bugs show up,
> but I expect I'll get sick of it after a few days.
> 
>    [01/15]: strbuf: simplify comment-handling in add_lines() helper
>    [02/15]: strbuf: avoid static variables in strbuf_add_commented_lines()
>    [03/15]: commit: refactor base-case of adjust_comment_line_char()
>    [04/15]: strbuf: avoid shadowing global comment_line_char name
> 
>      These four are cleanups that could be taken independently.
> 
>    [05/15]: environment: store comment_line_char as a string
> 
>      This one preps us for incrementally moving code over to the new
>      system.
> 
>    [06/15]: strbuf: accept a comment string for strbuf_stripspace()
>    [07/15]: strbuf: accept a comment string for strbuf_commented_addf()
>    [08/15]: strbuf: accept a comment string for strbuf_add_commented_lines()
>    [09/15]: prefer comment_line_str to comment_line_char for printing
>    [10/15]: find multi-byte comment chars in NUL-terminated strings
>    [11/15]: find multi-byte comment chars in unterminated buffers
>    [12/15]: sequencer: handle multi-byte comment characters when writing todo list
>    [13/15]: wt-status: drop custom comment-char stringification
> 
>      These ones are the actual transition.
> 
>    [14/15]: environment: drop comment_line_char compatibility macro
>    [15/15]: config: allow multi-byte core.commentChar
> 
>      And then we tie it off by dropping the now-unused bits and loosening
>      the config logic.
> 
>   Documentation/config/core.txt |  4 ++-
>   add-patch.c                   | 14 +++++-----
>   builtin/branch.c              |  8 +++---
>   builtin/commit.c              | 19 +++++++-------
>   builtin/merge.c               | 12 ++++-----
>   builtin/notes.c               | 10 ++++----
>   builtin/rebase.c              |  2 +-
>   builtin/stripspace.c          |  4 +--
>   builtin/tag.c                 | 14 +++++-----
>   commit.c                      |  3 ++-
>   config.c                      |  6 ++---
>   environment.c                 |  2 +-
>   environment.h                 |  2 +-
>   fmt-merge-msg.c               |  8 +++---
>   rebase-interactive.c          | 10 ++++----
>   sequencer.c                   | 48 ++++++++++++++++++-----------------
>   strbuf.c                      | 47 ++++++++++++++++++----------------
>   strbuf.h                      |  9 ++++---
>   t/t0030-stripspace.sh         |  5 ++++
>   t/t7507-commit-verbose.sh     | 10 ++++++++
>   t/t7508-status.sh             |  4 ++-
>   trailer.c                     |  6 ++---
>   wt-status.c                   | 31 +++++++++-------------
>   23 files changed, 149 insertions(+), 129 deletions(-)
>