Message ID | d33fb3d940cf97bdfe3dc544763a51df4874a5c0.1739918546.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New |
Series | gitk: Fixing file name encoding issues. |
On Tue, Feb 18, 2025 at 10:42:25PM +0000, Kazuhiro Kato via GitGitGadget wrote:

[...]

> diff --git a/gitk-git/gitk b/gitk-git/gitk
> index 47a7c1d29c4..88951ed2384 100755
> --- a/gitk-git/gitk
> +++ b/gitk-git/gitk
> @@ -12379,6 +12379,7 @@ catch {
>  if {$gitencoding == ""} {
>      set gitencoding "utf-8"
>  }
> +encoding system utf-8
>  set tclencoding [tcl_encoding $gitencoding]
>  if {$tclencoding == {}} {
>      puts stderr "Warning: encoding $gitencoding is not supported by Tcl/Tk"

I'm not sure one should sensibly do this, except to implement some well-understood and well-documented kludge, simply because the "system encoding" is supposed to be set by the Tcl runtime.

Let's demonstrate (on a Linux-based system with a UTF-8 locale):

  tmp$ touch 'привет мир.txt'
  tmp$ tclsh
  % encoding system
  utf-8
  % glob привет*.txt
  {привет мир.txt}
  % encoding system cp1251
  % encoding system
  cp1251
  % glob привет*.txt
  no files matched glob pattern "привет*.txt"
  %

Here, CP1251 is a Windows "code page" for Cyrillic; it's what [encoding system] reports on Windows systems set up for a Cyrillic locale. Note that I created a file whose name consists of two words in Cyrillic script encoded in UTF-8, and Tcl is fine finding and displaying this file (via its "glob" command). But as soon as I change Tcl's system encoding to an 8-bit Cyrillic encoding, globbing stops working. Moreover, if I did [glob *.txt] so that my file would be matched anyway, its name would not be readable, since Tcl would decode it from CP1251 into Unicode while the name is actually encoded in UTF-8.

In other words, your patch (supposedly) works on UTF-8-based systems, which are common to Linux-based OSes and macOS, but I'm afraid it won't work on Windows.
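A minimal Tcl sketch of the effect described above, done without touching the process-wide system encoding: the UTF-8 bytes of "привет" decode cleanly as UTF-8, but read as CP1251 every byte becomes its own character, which is roughly why the file name no longer matches the glob pattern. It assumes the Tcl installation ships the cp1251 encoding (most standard distributions do); the variable names are illustrative only.

```tcl
# Illustrative sketch: decode one byte string under two encodings.
# \u escapes keep this script independent of its own source encoding.
set name "\u043f\u0440\u0438\u0432\u0435\u0442"   ;# "привет"

# The bytes a UTF-8 system stores for that file name.
set bytes [encoding convertto utf-8 $name]

# Decoding with the matching encoding recovers the original name.
set ok [encoding convertfrom utf-8 $bytes]

# Decoding the same bytes as CP1251 maps each byte to its own character,
# so the result no longer matches a pattern such as "привет*".
set mangled [encoding convertfrom cp1251 $bytes]

puts "utf-8 round trip ok: [expr {$ok eq $name}]"
puts "lengths: original [string length $name], as cp1251 [string length $mangled]"
```

On a stock tclsh this should report a round-trip result of 1 and lengths of 6 and 12: six Cyrillic letters take two bytes each in UTF-8, and CP1251 turns each of those twelve bytes into a separate character.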
Konstantin Khomoutov <kostix@bswap.ru> writes:

> On Tue, Feb 18, 2025 at 10:42:25PM +0000, Kazuhiro Kato via GitGitGadget wrote:
>
> [...]
>> diff --git a/gitk-git/gitk b/gitk-git/gitk
>> index 47a7c1d29c4..88951ed2384 100755
>> --- a/gitk-git/gitk
>> +++ b/gitk-git/gitk
>> @@ -12379,6 +12379,7 @@ catch {
>>  if {$gitencoding == ""} {
>>      set gitencoding "utf-8"
>>  }
>> +encoding system utf-8
>>  set tclencoding [tcl_encoding $gitencoding]
>>  if {$tclencoding == {}} {
>>      puts stderr "Warning: encoding $gitencoding is not supported by Tcl/Tk"
>
> I'm not sure one should sensibly do this, except to implement some
> well-understood and well-documented kludge, simply because the "system
> encoding" is supposed to be set by the Tcl runtime.
> ...
> In other words, your patch (supposedly) works on UTF-8-based systems,
> which are common to Linux-based OSes and macOS, but I'm afraid it won't work on
> Windows.

In other words, things should work without "encoding system blah" forcing a particular encoding that Tcl may not agree with?

Would this mean that, in a "curious" repository whose paths are encoded in something Tcl does not expect (e.g., EUC-JP used for paths containing Japanese characters on an otherwise UTF-8 system), it needs to be possible to specify that "curious" encoding, either through an on-demand end-user action (e.g., a menu item) or through a repository configuration (e.g., gitk.pathencoding = euc-jp)?

Thanks.
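A minimal sketch of the per-repository idea raised above, assuming paths are decoded explicitly instead of overriding [encoding system]. The `gitk.pathencoding` key is only floated in this thread, not an existing gitk setting, and `configured_path_encoding` and `decode_path` are hypothetical helpers. A real patch would more likely reuse gitk's existing `tcl_encoding` proc (visible in the diff) to map Git encoding names to Tcl ones, rather than the naive lower-casing used here.

```tcl
# Hypothetical helper: read the proposed gitk.pathencoding key.
# Returns "" when git is unavailable or the key is unset.
proc configured_path_encoding {} {
    if {[catch {exec git config --get gitk.pathencoding} value]} {
        return ""
    }
    return [string trim $value]
}

# Hypothetical helper: decode raw path bytes (e.g. read from git through a
# channel configured with -translation binary) using the given encoding
# name, falling back to UTF-8 when the name is empty or unknown to Tcl.
proc decode_path {raw gitenc} {
    # Naive mapping: Git's "EUC-JP" becomes Tcl's "euc-jp".
    set enc [string tolower $gitenc]
    if {$enc eq "" || $enc ni [encoding names]} {
        set enc utf-8
    }
    return [encoding convertfrom $enc $raw]
}
```

Usage would look like `set path [decode_path $raw [configured_path_encoding]]`. The design choice is that only path decoding changes; Tcl's own system encoding, and therefore how it talks to the filesystem, is left as the runtime set it.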
diff --git a/gitk-git/gitk b/gitk-git/gitk
index 47a7c1d29c4..88951ed2384 100755
--- a/gitk-git/gitk
+++ b/gitk-git/gitk
@@ -12379,6 +12379,7 @@ catch {
 if {$gitencoding == ""} {
     set gitencoding "utf-8"
 }
+encoding system utf-8
 set tclencoding [tcl_encoding $gitencoding]
 if {$tclencoding == {}} {
     puts stderr "Warning: encoding $gitencoding is not supported by Tcl/Tk"