diff mbox series

[1/2] Fixing file name encoding issues.

Message ID d33fb3d940cf97bdfe3dc544763a51df4874a5c0.1739918546.git.gitgitgadget@gmail.com (mailing list archive)
State New
Series gitk: Fixing file name encoding issues.

Commit Message

Kazuhiro Kato Feb. 18, 2025, 10:42 p.m. UTC
From: Kazuhiro Kato <kazuhiro.kato@hotmail.co.jp>

Signed-off-by: Kazuhiro Kato <kazuhiro.kato@hotmail.co.jp>
---
 gitk-git/gitk | 1 +
 1 file changed, 1 insertion(+)

Comments

Konstantin Khomoutov Feb. 19, 2025, 5:30 p.m. UTC | #1
On Tue, Feb 18, 2025 at 10:42:25PM +0000, Kazuhiro Kato via GitGitGadget wrote:

[...]
> diff --git a/gitk-git/gitk b/gitk-git/gitk
> index 47a7c1d29c4..88951ed2384 100755
> --- a/gitk-git/gitk
> +++ b/gitk-git/gitk
> @@ -12379,6 +12379,7 @@ catch {
>  if {$gitencoding == ""} {
>      set gitencoding "utf-8"
>  }
> +encoding system utf-8
>  set tclencoding [tcl_encoding $gitencoding]
>  if {$tclencoding == {}} {
>      puts stderr "Warning: encoding $gitencoding is not supported by Tcl/Tk"

I'm not sure one should sensibly do this - except to implement some
well-understood and well-documented kludge, simply because the "system
encoding" is supposed to be set by the Tcl runtime.

Let's demonstrate (on a Linux-based system with UTF-8 locale):

  tmp$ touch 'привет мир.txt'
  tmp$ tclsh
  % encoding system
  utf-8
  % glob привет*.txt
  {привет мир.txt}
  % encoding system cp1251
  % encoding system
  cp1251
  % glob привет*.txt
  no files matched glob pattern "привет*.txt"
  %

Here, CP1251 is a Windows "code page" for Cyrillic; it's what
[encoding system] typically reports on Windows systems set up for a
Cyrillic locale.

Note that I created a file whose name is two words in Cyrillic script encoded
in UTF-8, and Tcl has no trouble finding and displaying this file (via its
"glob" command). But as soon as I change Tcl's system encoding to an
8-bit Cyrillic encoding, globbing stops working.

Moreover, if I did [glob *.txt] so that my file would be matched anyway,
its name would not be readable, since Tcl would decode it as CP1251
while the name is actually encoded in UTF-8.
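
A minimal sketch of that mismatch, using only the stock [encoding convertto]
and [encoding convertfrom] commands. The variable names are made up for
illustration, and \u escapes are used so the script does not depend on the
encoding of the script file itself:

```tcl
# Illustration only: the same bytes decoded with two different encodings.
set name "\u043f\u0440\u0438\u0432\u0435\u0442"    ;# the Cyrillic word from the demo above

# Bytes as a UTF-8 filesystem would store them (two bytes per letter).
set bytes [encoding convertto utf-8 $name]

# What you get if those bytes are decoded as CP1251 instead of UTF-8.
set misread [encoding convertfrom cp1251 $bytes]

# Decoding with the encoding that was actually used restores the name.
set roundtrip [encoding convertfrom utf-8 $bytes]

puts "bytes:     [string length $bytes]"
puts "misread:   $misread"
puts "roundtrip: $roundtrip"
puts "misread equals original:   [expr {$misread eq $name}]"
puts "roundtrip equals original: [expr {$roundtrip eq $name}]"
```

The misread string has one character per byte rather than one per letter,
which is why neither matching nor display can work once the system encoding
disagrees with what is on disk.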

In other words, your patch (supposedly) works on UTF-8-based systems,
which are the norm among Linux-based OSes and macOS, but I'm afraid it won't
work on Windows.

Junio C Hamano Feb. 19, 2025, 6:02 p.m. UTC | #2
Konstantin Khomoutov <kostix@bswap.ru> writes:

> On Tue, Feb 18, 2025 at 10:42:25PM +0000, Kazuhiro Kato via GitGitGadget wrote:
>
> [...]
>> diff --git a/gitk-git/gitk b/gitk-git/gitk
>> index 47a7c1d29c4..88951ed2384 100755
>> --- a/gitk-git/gitk
>> +++ b/gitk-git/gitk
>> @@ -12379,6 +12379,7 @@ catch {
>>  if {$gitencoding == ""} {
>>      set gitencoding "utf-8"
>>  }
>> +encoding system utf-8
>>  set tclencoding [tcl_encoding $gitencoding]
>>  if {$tclencoding == {}} {
>>      puts stderr "Warning: encoding $gitencoding is not supported by Tcl/Tk"
>
> I'm not sure one should sensibly do this - except to implement some
> well-understood and well-documented kludge, simply because the "system
> encoding" is supposed to be set by the Tcl runtime.
> ...
> In other words, your patch (supposedly) works on UTF-8-based systems,
> which are the norm among Linux-based OSes and macOS, but I'm afraid it won't
> work on Windows.

In other words, things should work without "encoding system blah"
forcing a particular encoding that Tcl may not agree with?  Would
this mean that in a "curious" repository with paths encoded in
something Tcl does not expect (e.g., on a UTF-8 system where EUC-JP
is somehow used for paths containing Japanese characters), it needs
to be possible to specify the "curious" encoding, either with an
end-user on-demand action (e.g., a menu item) or with a repository
configuration (e.g., gitk.pathencoding = euc-jp)?
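
A rough sketch of what such an override could look like on the Tcl side,
purely as an illustration. Both procs are hypothetical and not part of gitk,
and gitk.pathencoding is only the example name floated above, not an existing
setting. The idea is to leave [encoding system] alone and decode path bytes
explicitly with the configured encoding, falling back to the system encoding
when nothing usable is configured:

```tcl
# Hypothetical helper: choose the encoding used to decode path names.
# $configured stands in for the value of a (hypothetical) gitk.pathencoding
# setting; an empty string means "not set".
proc path_encoding {configured} {
    set enc [string tolower $configured]
    # Accept the override only if this Tcl build knows the encoding.
    if {$enc ne "" && $enc in [encoding names]} {
        return $enc
    }
    # Otherwise trust whatever the Tcl runtime detected.
    return [encoding system]
}

# Hypothetical helper: turn raw path bytes into a Tcl string.
proc decode_path {bytes configured} {
    return [encoding convertfrom [path_encoding $configured] $bytes]
}

# Usage: simulate a path stored as EUC-JP bytes and decode it again.
set jp  "\u65e5\u672c\u8a9e"
set raw [encoding convertto euc-jp $jp]
puts [expr {[decode_path $raw "euc-jp"] eq $jp}]
```

This keeps the override local to path decoding, so commands such as [glob]
that rely on the system encoding are not affected.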

Thanks.

Patch

diff --git a/gitk-git/gitk b/gitk-git/gitk
index 47a7c1d29c4..88951ed2384 100755
--- a/gitk-git/gitk
+++ b/gitk-git/gitk
@@ -12379,6 +12379,7 @@  catch {
 if {$gitencoding == ""} {
     set gitencoding "utf-8"
 }
+encoding system utf-8
 set tclencoding [tcl_encoding $gitencoding]
 if {$tclencoding == {}} {
     puts stderr "Warning: encoding $gitencoding is not supported by Tcl/Tk"