Message ID | pull.1668.git.1708281443289.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | diff.c: use utf8_strwidth() instead of strlen() for display width | expand |
On Sun, Feb 18, 2024 at 06:37:23PM +0000, Chandra Pratap via GitGitGadget wrote: > From: Chandra Pratap <chandrapratap3519@gmail.com> > > Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> > --- > diff.c: use utf8_strwidth() instead of strlen() for display width > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1668%2FChand-ra%2Fdiff-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1668/Chand-ra/diff-v1 > Pull-Request: https://github.com/gitgitgadget/git/pull/1668 > > diff.c | 7 +------ > 1 file changed, 1 insertion(+), 6 deletions(-) > > diff --git a/diff.c b/diff.c > index ccfa1fca0d0..02d60af6749 100644 > --- a/diff.c > +++ b/diff.c > @@ -2712,13 +2712,8 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options) > * making the line longer than the maximum width. > */ > > - /* > - * NEEDSWORK: line_prefix is often used for "log --graph" output > - * and contains ANSI-colored string. utf8_strnwidth() should be > - * used to correctly count the display width instead of strlen(). > - */ > if (options->stat_width == -1) > - width = term_columns() - strlen(line_prefix); > + width = term_columns() - utf8_strwidth(line_prefix); It would be nice to add a test demonstrating that this indeed fixes an issue. This would also help to keep this from regressing in the future. Also, do you know why we didn't use `utf8_strwidth()` right from the start? It would have saved the writer some time to just use `utf8_strwidth()` instead of writing a whole paragraph explaining that we should do it eventually. Makes me wonder whether there is anything else going on here. Patrick > else > width = options->stat_width ? options->stat_width : 80; > number_width = decimal_width(max_change) > number_width ? > > base-commit: 2996f11c1d11ab68823f0939b6469dedc2b9ab90 > -- > gitgitgadget >
Patrick Steinhardt <ps@pks.im> writes: > Also, do you know why we didn't use `utf8_strwidth()` right from the > start? It would have saved the writer some time to just use > `utf8_strwidth()` instead of writing a whole paragraph explaining that > we should do it eventually. Makes me wonder whether there is anything > else going on here. I suspect that it is because it is not just that single strlen() call. The code assumed that byte count and display width were interchangeable, and use of strlen() there was merely an example. Starting there, the value returned by strlen() is treated as if it were interchangeable with a display width, and then later used to count how many bytes to trim from either end of the string so that the trimmed string would eventually fit a given display width, which means that the code to compute how much to trim (which is not that strlen() the patch in question is touching) would have to compute the reverse, i.e. "if we need to recover N display columns, how many bytes do we need to trim from that string?"
diff --git a/diff.c b/diff.c index ccfa1fca0d0..02d60af6749 100644 --- a/diff.c +++ b/diff.c @@ -2712,13 +2712,8 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options) * making the line longer than the maximum width. */ - /* - * NEEDSWORK: line_prefix is often used for "log --graph" output - * and contains ANSI-colored string. utf8_strnwidth() should be - * used to correctly count the display width instead of strlen(). - */ if (options->stat_width == -1) - width = term_columns() - strlen(line_prefix); + width = term_columns() - utf8_strwidth(line_prefix); else width = options->stat_width ? options->stat_width : 80; number_width = decimal_width(max_change) > number_width ?