diff mbox series

[v4,2/2] diff.c: More changes and tests around utf8_strwidth()

Message ID 20220903053934.15629-1-tboegi@web.de (mailing list archive)
State New, archived
Headers show
Series None | expand

Commit Message

Torsten Bögershausen Sept. 3, 2022, 5:39 a.m. UTC
From: Torsten Bögershausen <tboegi@web.de>

When unicode filenames (encoded in UTF-8) are used, the visible width
on the screen is not the same as strlen(filename).

For example, `git log --stat` may produce an output like this:

[snip the header]

 Arger.txt  | 1 +
 Ärger.txt | 1 +
 2 files changed, 2 insertions(+)

The last commit uses utf8_strwidth() instead of strlen() in diff.c
and it is time to test the change.

And now we detect another problem that is fixed here: code like this
strbuf_addf(&out, "%-*s", len, name);
(or using the underlying snprintf() function) does not align the
buffer to a minimum of len measured in screen-width, but uses the
memory count.

One could be tempted to wish that snprintf() was UTF-8 aware.
That doesn't seem to be the case anywhere (tested on Linux and Mac),
probably snprintf() uses the "bytes in memory"/strlen() approach to be
compatible with older versions and this will never change.

The basic idea is to change code in diff.c like this
strbuf_addf(&out, "%-*s", len, name);

into something like this:
int padding = len - utf8_strwidth(name);
if (padding < 0)
	padding = 0;
strbuf_addf(&out, " %s%*s", name, padding, "");

The real change is slighty bigger, as it, as well, integrates two calls
of strbuf_addf() into one.

Tests:
Two things need to be tested:
 - The calculation of the maximum width
 - The calculation of padding

The name "textfile" is changed into "tëxtfilë", both have a width of 8.
If strlen() was used, to get the maximum width, the shorter "binfile" would
have been mis-aligned:
 binfile    | [snip]
 tëxtfilë | [snip]

If only "binfile" would be renamed into "binfilë":
 binfilë | [snip]
 textfile | [snip]

In order to verify that the width is calculated correctly everywhere,
"binfile" is renamed into "binfilë", giving 1 bytes more in strlen()
"tëxtfile" is renamed into "tëxtfilë", 2 byte more in strlen().

The updated t4012-diff-binary.sh checks the correct aligment:
 binfilë  | [snip]
 tëxtfilë | [snip]

Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 diff.c                 | 23 ++++++++++++++---------
 t/t4012-diff-binary.sh | 14 +++++++-------
 2 files changed, 21 insertions(+), 16 deletions(-)

--
2.34.0

Comments

Johannes Schindelin Sept. 5, 2022, 10:13 a.m. UTC | #1
Hi Torsten,

thank you for working on a new iteration!

On Sat, 3 Sep 2022, tboegi@web.de wrote:

> [...]
> diff --git a/diff.c b/diff.c
> index b5df464de5..35b9da90fe 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -2734,7 +2734,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
>  		char *name = file->print_name;
>  		uintmax_t added = file->added;
>  		uintmax_t deleted = file->deleted;
> -		int name_len;
> +		int name_len, padding;

I had a look and `len` is also declard as an `int`.

>
>  		if (!file->is_interesting && (added + deleted == 0))
>  			continue;
> @@ -2753,10 +2753,14 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
>  			if (slash)
>  				name = slash;
>  		}
> +		padding = len - utf8_strwidth(name);
> +		if (padding < 0)
> +			padding = 0;

I would have had a slight preference for something like this:

		int name_len = utf8_strwidth(name);
		int padding = name_len < len ? len - name_len : 0;

i.e. avoid the potentially negative difference. (Ideally, I would have
liked the type to be changed to `size_t`, but that is impractical due to
the variables' use in `%.*s` formats.)

But it is not worth a new iteration on its own, and I am very happy with
the current iteration.

Thanks!
Dscho
diff mbox series

Patch

diff --git a/diff.c b/diff.c
index b5df464de5..35b9da90fe 100644
--- a/diff.c
+++ b/diff.c
@@ -2734,7 +2734,7 @@  static void show_stats(struct diffstat_t *data, struct diff_options *options)
 		char *name = file->print_name;
 		uintmax_t added = file->added;
 		uintmax_t deleted = file->deleted;
-		int name_len;
+		int name_len, padding;

 		if (!file->is_interesting && (added + deleted == 0))
 			continue;
@@ -2753,10 +2753,14 @@  static void show_stats(struct diffstat_t *data, struct diff_options *options)
 			if (slash)
 				name = slash;
 		}
+		padding = len - utf8_strwidth(name);
+		if (padding < 0)
+			padding = 0;

 		if (file->is_binary) {
-			strbuf_addf(&out, " %s%-*s |", prefix, len, name);
-			strbuf_addf(&out, " %*s", number_width, "Bin");
+			strbuf_addf(&out, " %s%s%*s | %*s",
+				    prefix, name, padding, "",
+				    number_width, "Bin");
 			if (!added && !deleted) {
 				strbuf_addch(&out, '\n');
 				emit_diff_symbol(options, DIFF_SYMBOL_STATS_LINE,
@@ -2776,8 +2780,9 @@  static void show_stats(struct diffstat_t *data, struct diff_options *options)
 			continue;
 		}
 		else if (file->is_unmerged) {
-			strbuf_addf(&out, " %s%-*s |", prefix, len, name);
-			strbuf_addstr(&out, " Unmerged\n");
+			strbuf_addf(&out, " %s%s%*s | %*s",
+				    prefix, name, padding, "",
+				    number_width, "Unmerged");
 			emit_diff_symbol(options, DIFF_SYMBOL_STATS_LINE,
 					 out.buf, out.len, 0);
 			strbuf_reset(&out);
@@ -2803,10 +2808,10 @@  static void show_stats(struct diffstat_t *data, struct diff_options *options)
 				add = total - del;
 			}
 		}
-		strbuf_addf(&out, " %s%-*s |", prefix, len, name);
-		strbuf_addf(&out, " %*"PRIuMAX"%s",
-			number_width, added + deleted,
-			added + deleted ? " " : "");
+		strbuf_addf(&out, " %s%s%*s | %*"PRIuMAX"%s",
+			    prefix, name, padding, "",
+			    number_width, added + deleted,
+			    added + deleted ? " " : "");
 		show_graph(&out, '+', add, add_c, reset);
 		show_graph(&out, '-', del, del_c, reset);
 		strbuf_addch(&out, '\n');
diff --git a/t/t4012-diff-binary.sh b/t/t4012-diff-binary.sh
index c509143c81..c64d9d2f40 100755
--- a/t/t4012-diff-binary.sh
+++ b/t/t4012-diff-binary.sh
@@ -113,20 +113,20 @@  test_expect_success 'diff --no-index with binary creation' '
 '

 cat >expect <<EOF
- binfile  |   Bin 0 -> 1026 bytes
- textfile | 10000 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ binfilë  |   Bin 0 -> 1026 bytes
+ tëxtfilë | 10000 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 EOF

 test_expect_success 'diff --stat with binary files and big change count' '
-	printf "\01\00%1024d" 1 >binfile &&
-	git add binfile &&
+	printf "\01\00%1024d" 1 >binfilë &&
+	git add binfilë &&
 	i=0 &&
 	while test $i -lt 10000; do
 		echo $i &&
 		i=$(($i + 1)) || return 1
-	done >textfile &&
-	git add textfile &&
-	git diff --cached --stat binfile textfile >output &&
+	done >tëxtfilë &&
+	git add tëxtfilë &&
+	git -c core.quotepath=false diff --cached --stat binfilë tëxtfilë >output &&
 	grep " | " output >actual &&
 	test_cmp expect actual
 '