diff mbox series

[v4,1/1] macOS: ls-files path fails if path of workdir is NFD

Message ID 20240531193156.28046-1-tboegi@web.de (mailing list archive)
State New, archived
Headers show
Series [v4,1/1] macOS: ls-files path fails if path of workdir is NFD | expand

Commit Message

Torsten Bögershausen May 31, 2024, 7:31 p.m. UTC
From: Torsten Bögershausen <tboegi@web.de>

Under macOS, `git ls-files path` does not work (gives an error)
if the absolute 'path' contains characters in NFD (decomposed).
This happens when core.precomposeunicode is true, which is the
most common case. The bug report says:

$ cd somewhere          # some safe place, /tmp or ~/tmp etc.
$ mkdir $'u\xcc\x88'    # ü in NFD
$ cd ü                  # or cd $'u\xcc\x88' or cd $'\xc3\xbc'
$ git init
$ git ls-files $'/somewhere/u\xcc\x88'   # NFD
  fatal: /somewhere/ü: '/somewhere/ü' is outside repository at '/somewhere/ü'
$ git ls-files $'/somewhere/\xc3\xbc'    # NFC
(the same error as above)

In the 'fatal:' error message, there are three ü;
the 1st and 2nd are in NFC, the 3rd is in NFD.

Add test cases that follows the bug report, with the simplification
that the 'ü' is replaced by an 'ä', which is already used as NFD and
NFC in t3910.

The solution is to add a call to precompose_string_if_needed()
to this code in setup.c :
`work_tree = precompose_string_if_needed(get_git_work_tree());`

There is, however, a limitation with this very usage of Git:
The (repo) local .gitconfig file is not used, only the global
"core.precomposeunicode" is taken into account, if it is set (or not).
To set it to true is a good recommendation anyway, and here is the
analyzes from Jun T :

The problem is the_repository->config->hash_initialized
is set to 1 before the_repository->commondir is set to ".git".
Due to this, .git/config is never read, and precomposed_unicode
is never set to 1 (remains -1).

run_builtin() {
    setup_git_directory() {
        strbuf_getcwd() {   # setup.c:1542
            precompose_{strbuf,string}_if_needed() {
                # precomposed_unicode is still -1
                git_congig_get_bool("core.precomposeunicode") {
                    git_config_check_init() {
                        repo_read_config() {
                            git_config_init() {
                                # !!!
                                the_repository->config->hash_initialized=1
                                # !!!
                            }
                            # does not read .git/config since
                            # the_repository->commondir is still NULL
                        }
                    }
                }
                returns without converting to NFC
            }
            returns cwd in NFD
        }

        setup_discovered_git_dir() {
            set_git_work_tree(".") {
                repo_set_worktree() {
                    # this function indirectly calls strbuf_getcwd()
                    # --> precompose_{strbuf,string}_if_needed() -->
                    # {git,repo}_config_get_bool("core.precomposeunicode"),
                    # but does not try to read .git/config since
                    # the_repository->config->hash_initialized
                    # is already set to 1 above. And it will not read
                    # .git/config even if hash_initialized is 0
                    # since the_repository->commondir is still NULL.

                    the_repository->worktree = NFD
                }
            }
        }

        setup_git_env() {
            repo_setup_gitdir() {
                repo_set_commondir() {
                    # finally commondir is set here
                    the_repository->commondir = ".git"
                }
            }
        }

    } // END setup_git_directory

Reported-by: Jun T <takimoto-j@kba.biglobe.ne.jp>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 setup.c                      |  2 +-
 t/t3910-mac-os-precompose.sh | 39 +++++++++++++++++++++++++++++++++++-
 2 files changed, 39 insertions(+), 2 deletions(-)

--
2.41.0.394.ge43f4fd0bd

Comments

Junio C Hamano June 1, 2024, 3:55 p.m. UTC | #1
tboegi@web.de writes:

> The problem is the_repository->config->hash_initialized
> is set to 1 before the_repository->commondir is set to ".git".
> Due to this, .git/config is never read, and precomposed_unicode
> is never set to 1 (remains -1).

The "is never read" part is a bit confusing and misleading.  If it
were

    At the point of code flow where we would want to learn the value
    of precompose configuration, the local configuration has not
    been read.

then I would understand it, though.

Is the analysis telling us that we need to rethink the order of
things setup_git_directory() does?  Or is this inherently unsolvable
because we need to discover the .git/ directory and path to it
before we can read configuration to learn from it, but we need the
value of precompose setting to compute the "path to it"?

Presumably chdir() done in setup_discovered_git_dir() can be done
with either NFC or NFD if the filesystem is squashing the
differences between them, so perhaps doing the repo_set_worktree()
done in setup_discovered_git_dir() is wrong, and we could delay
populating the .worktree member until much later?  For reading the
local config, it does matter we know where the git-dir and
common-dir are, but the location of worktree is immaterial.

Anyway, thanks for a patch.  Will queue.

> run_builtin() {
>     setup_git_directory() {
>         strbuf_getcwd() {   # setup.c:1542
>             precompose_{strbuf,string}_if_needed() {
>                 # precomposed_unicode is still -1
>                 git_congig_get_bool("core.precomposeunicode") {
>                     git_config_check_init() {
>                         repo_read_config() {
>                             git_config_init() {
>                                 # !!!
>                                 the_repository->config->hash_initialized=1
>                                 # !!!
>                             }
>                             # does not read .git/config since
>                             # the_repository->commondir is still NULL
>                         }
>                     }
>                 }
>                 returns without converting to NFC
>             }
>             returns cwd in NFD
>         }
>
>         setup_discovered_git_dir() {
>             set_git_work_tree(".") {
>                 repo_set_worktree() {
>                     # this function indirectly calls strbuf_getcwd()
>                     # --> precompose_{strbuf,string}_if_needed() -->
>                     # {git,repo}_config_get_bool("core.precomposeunicode"),
>                     # but does not try to read .git/config since
>                     # the_repository->config->hash_initialized
>                     # is already set to 1 above. And it will not read
>                     # .git/config even if hash_initialized is 0
>                     # since the_repository->commondir is still NULL.
>
>                     the_repository->worktree = NFD
>                 }
>             }
>         }
>
>         setup_git_env() {
>             repo_setup_gitdir() {
>                 repo_set_commondir() {
>                     # finally commondir is set here
>                     the_repository->commondir = ".git"
>                 }
>             }
>         }
>
>     } // END setup_git_directory
>
> Reported-by: Jun T <takimoto-j@kba.biglobe.ne.jp>
> Signed-off-by: Torsten Bögershausen <tboegi@web.de>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  setup.c                      |  2 +-
>  t/t3910-mac-os-precompose.sh | 39 +++++++++++++++++++++++++++++++++++-
>  2 files changed, 39 insertions(+), 2 deletions(-)
>
> diff --git a/setup.c b/setup.c
> index 2e607632db..61f61496ec 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -48,7 +48,7 @@ static int abspath_part_inside_repo(char *path)
>  	size_t wtlen;
>  	char *path0;
>  	int off;
> -	const char *work_tree = get_git_work_tree();
> +	const char *work_tree = precompose_string_if_needed(get_git_work_tree());
>  	struct strbuf realpath = STRBUF_INIT;
>
>  	if (!work_tree)
> diff --git a/t/t3910-mac-os-precompose.sh b/t/t3910-mac-os-precompose.sh
> index 898267a6bd..6d5918c8fe 100755
> --- a/t/t3910-mac-os-precompose.sh
> +++ b/t/t3910-mac-os-precompose.sh
> @@ -37,6 +37,27 @@ Alongc=$Alongc$Alongc$Alongc$Alongc$Alongc           #50 Byte
>  Alongc=$Alongc$Alongc$Alongc$Alongc$Alongc           #250 Byte
>  Alongc=$Alongc$AEligatu$AEligatu                     #254 Byte
>
> +
> +ls_files_nfc_nfd () {
> +	test_when_finished "git config --global --unset core.precomposeunicode" &&
> +	prglbl=$1
> +	prlocl=$2
> +	aumlcreat=$3
> +	aumllist=$4
> +	git config --global core.precomposeunicode $prglbl &&
> +	(
> +		rm -rf .git &&
> +		mkdir -p "somewhere/$prglbl/$prlocl/$aumlcreat" &&
> +		mypwd=$PWD &&
> +		cd "somewhere/$prglbl/$prlocl/$aumlcreat" &&
> +		git init &&
> +		git config core.precomposeunicode $prlocl &&
> +		git --literal-pathspecs ls-files "$mypwd/somewhere/$prglbl/$prlocl/$aumllist" 2>err &&
> +		>expected &&
> +		test_cmp expected err
> +	)
> +}
> +
>  test_expect_success "detect if nfd needed" '
>  	precomposeunicode=$(git config core.precomposeunicode) &&
>  	test "$precomposeunicode" = true &&
> @@ -211,8 +232,8 @@ test_expect_success "unicode decomposed: git restore -p . " '
>  '
>
>  # Test if the global core.precomposeunicode stops autosensing
> -# Must be the last test case
>  test_expect_success "respect git config --global core.precomposeunicode" '
> +	test_when_finished "git config --global --unset core.precomposeunicode" &&
>  	git config --global core.precomposeunicode true &&
>  	rm -rf .git &&
>  	git init &&
> @@ -220,4 +241,20 @@ test_expect_success "respect git config --global core.precomposeunicode" '
>  	test "$precomposeunicode" = "true"
>  '
>
> +test_expect_success "ls-files false false nfd nfd" '
> +	ls_files_nfc_nfd false false $Adiarnfd $Adiarnfd
> +'
> +
> +test_expect_success "ls-files false true nfd nfd" '
> +	ls_files_nfc_nfd false true $Adiarnfd $Adiarnfd
> +'
> +
> +test_expect_success "ls-files true false nfd nfd" '
> +	ls_files_nfc_nfd true false $Adiarnfd $Adiarnfd
> +'
> +
> +test_expect_success "ls-files true true nfd nfd" '
> +	ls_files_nfc_nfd true true $Adiarnfd $Adiarnfd
> +'
> +
>  test_done
> --
> 2.41.0.394.ge43f4fd0bd
Torsten Bögershausen June 2, 2024, 7:40 p.m. UTC | #2
On Sat, Jun 01, 2024 at 08:55:03AM -0700, Junio C Hamano wrote:
> tboegi@web.de writes:
>
> > The problem is the_repository->config->hash_initialized
> > is set to 1 before the_repository->commondir is set to ".git".
> > Due to this, .git/config is never read, and precomposed_unicode
> > is never set to 1 (remains -1).
>
> The "is never read" part is a bit confusing and misleading.  If it
> were
>
>     At the point of code flow where we would want to learn the value
>     of precompose configuration, the local configuration has not
>     been read.
>
> then I would understand it, though.

Yes, the thing is that it has not been read, and will never been read,
in this very very usage of Git.

>
> Is the analysis telling us that we need to rethink the order of
> things setup_git_directory() does?  Or is this inherently unsolvable
> because we need to discover the .git/ directory and path to it
> before we can read configuration to learn from it, but we need the
> value of precompose setting to compute the "path to it"?
I think it is solvable, see below.
>
> Presumably chdir() done in setup_discovered_git_dir() can be done
> with either NFC or NFD if the filesystem is squashing the
> differences between them, so perhaps doing the repo_set_worktree()
> done in setup_discovered_git_dir() is wrong, and we could delay
> populating the .worktree member until much later?  For reading the
> local config, it does matter we know where the git-dir and
> common-dir are, but the location of worktree is immaterial.
>
> Anyway, thanks for a patch.  Will queue.

My understanding is that detecting the .git/ dir and then reading
the config file from here does not work here.
That may be better documented in this very commit message,
please feel free to ammend it.
The root cause may be fixed in a different commit later.

[snip]

The best info that I have at the moment is the call stack analysis
done by Jun. T, where the global core.precomposeunicode is not set,
and reading the local one "fails".
I hope this is a useful repetition:


  The following is some info I got (hope it is correct and useful),
  but I have no idea how to fix the problem.

  precompose_string_if_needed() works only if:
    precomposed_unicode is already set to 1, or
    git_config_get_bool("core.precomposeunicode") sets it to 1.

  But git_config_get_bool() reads the file .git/config only if:
    the_repository->commondir is already set to ".git".

  Back trace when the strbuf_getcwd() is called for the
  3rd time is (frame #4 is set_git_work_tree()):

    * frame #0: git`strbuf_getcwd(sb=0x00007ff7bfeff0a8) at strbuf.c:588:20
      frame #1: git`strbuf_realpath_1(resolved=0x00007ff7bfeff0a8, path=".", flags=2) at abspath.c:101:7
      frame #2: git`strbuf_realpath(resolved=0x00007ff7bfeff0a8, path=".", die_on_error=1) at abspath.c:219:9
      frame #3: git`real_pathdup(path=".", die_on_error=1) at abspath.c:240:6
      frame #4: git`repo_set_worktree(repo=0x000000010044eb98, path=".") at repository.c:145:19
      frame #5: git`set_git_work_tree(new_work_tree=".") at environment.c:278:2
      frame #6: git`setup_discovered_git_dir(gitdir=".git", cwd=0x0000000100435238, offset=16, repo_fmt=0x00007ff7bfeff1d8, nongit_ok=0x0000000000000000) at setup.c:1119:2
      frame #7: git`setup_git_directory_gently(nongit_ok=0x0000000000000000) at setup.c:1606:12
      frame #8: git`setup_git_directory at setup.c:1815:9
      frame #9: git`run_builtin(p=0x0000000100424d58, argc=2, argv=0x00007ff7bfeff6d8) at git.c:448:12
      frame #10: git`handle_builtin(argc=2, argv=0x00007ff7bfeff6d8) at git.c:729:3
      frame #11: git`run_argv(argcp=0x00007ff7bfeff54c, argv=0x00007ff7bfeff540) at git.c:793:4
      frame #12: git`cmd_main(argc=2, argv=0x00007ff7bfeff6d8) at git.c:928:19
      frame #13: git`main(argc=3, argv=0x00007ff7bfeff6d0) at common-main.c:62:11

  At this point, precomposed_unicode is still -1 and
  the_repository->commondir is still NULL.
  This means strbuf_getcwd() retuns NFD, and the_repository->worktree is set to NFD.

  Moreover, precompose_string_if_needed() calls
  git_config_get_bool("core.precomposeunicode"), and
  this function indirecly sets
  the_repository->config->hash_initialized = 1

  Later setup_git_directory_gently() (frame #7) calls
  setup_git_env() --> repo_set_gitdir() --> repo_set_commondir()
  and the_repository->commondir is now set to ".git".

  Then run_builtin() (frame #10) calls precompose_argv_prefix()
   --> precompose_string_if_needed(). Here we have
    precomposed_unicode = -1
    the_repository->config->hash_initialized = 1
  This means git_config_check_init() does not read
  .git/config (does not call repo_read_config()) even if
  the_repository->commondir is set to ".git",
  and precomposed_unicode is not set to 1.
  [snip]
Jun T June 4, 2024, 12:56 a.m. UTC | #3
> 2024/06/01 4:3, tboegi@web.de <mailto:tboegi@web.de> wrote:
> 
> The solution is to add a call to precompose_string_if_needed()
> to this code in setup.c :
> `work_tree = precompose_string_if_needed(get_git_work_tree());`

This simple patch works for both 'ls-files NFD' and 'ls-files NFC'.
> 
> There is, however, a limitation with this very usage of Git:
> The (repo) local .gitconfig file is not used,

core.precomposeunicode in .git/config is read, in function
precompose_argv_prefix(), and NFD in argv is converted to NFC.

But, as you know, the variable the_repository->worktree or
the return value of get_git_work_tree() is in NFD.

> 2024/06/03 4:40, Torsten Bögershausen <tboegi@web.de> wrote:
> 
> The root cause may be fixed in a different commit later.


Of course fixing the 'root cause' is better, but I don't
know whether it's easy or not.

Anyway, thank you for woking on this problem.

--
Jun
diff mbox series

Patch

diff --git a/setup.c b/setup.c
index 2e607632db..61f61496ec 100644
--- a/setup.c
+++ b/setup.c
@@ -48,7 +48,7 @@  static int abspath_part_inside_repo(char *path)
 	size_t wtlen;
 	char *path0;
 	int off;
-	const char *work_tree = get_git_work_tree();
+	const char *work_tree = precompose_string_if_needed(get_git_work_tree());
 	struct strbuf realpath = STRBUF_INIT;

 	if (!work_tree)
diff --git a/t/t3910-mac-os-precompose.sh b/t/t3910-mac-os-precompose.sh
index 898267a6bd..6d5918c8fe 100755
--- a/t/t3910-mac-os-precompose.sh
+++ b/t/t3910-mac-os-precompose.sh
@@ -37,6 +37,27 @@  Alongc=$Alongc$Alongc$Alongc$Alongc$Alongc           #50 Byte
 Alongc=$Alongc$Alongc$Alongc$Alongc$Alongc           #250 Byte
 Alongc=$Alongc$AEligatu$AEligatu                     #254 Byte

+
+ls_files_nfc_nfd () {
+	test_when_finished "git config --global --unset core.precomposeunicode" &&
+	prglbl=$1
+	prlocl=$2
+	aumlcreat=$3
+	aumllist=$4
+	git config --global core.precomposeunicode $prglbl &&
+	(
+		rm -rf .git &&
+		mkdir -p "somewhere/$prglbl/$prlocl/$aumlcreat" &&
+		mypwd=$PWD &&
+		cd "somewhere/$prglbl/$prlocl/$aumlcreat" &&
+		git init &&
+		git config core.precomposeunicode $prlocl &&
+		git --literal-pathspecs ls-files "$mypwd/somewhere/$prglbl/$prlocl/$aumllist" 2>err &&
+		>expected &&
+		test_cmp expected err
+	)
+}
+
 test_expect_success "detect if nfd needed" '
 	precomposeunicode=$(git config core.precomposeunicode) &&
 	test "$precomposeunicode" = true &&
@@ -211,8 +232,8 @@  test_expect_success "unicode decomposed: git restore -p . " '
 '

 # Test if the global core.precomposeunicode stops autosensing
-# Must be the last test case
 test_expect_success "respect git config --global core.precomposeunicode" '
+	test_when_finished "git config --global --unset core.precomposeunicode" &&
 	git config --global core.precomposeunicode true &&
 	rm -rf .git &&
 	git init &&
@@ -220,4 +241,20 @@  test_expect_success "respect git config --global core.precomposeunicode" '
 	test "$precomposeunicode" = "true"
 '

+test_expect_success "ls-files false false nfd nfd" '
+	ls_files_nfc_nfd false false $Adiarnfd $Adiarnfd
+'
+
+test_expect_success "ls-files false true nfd nfd" '
+	ls_files_nfc_nfd false true $Adiarnfd $Adiarnfd
+'
+
+test_expect_success "ls-files true false nfd nfd" '
+	ls_files_nfc_nfd true false $Adiarnfd $Adiarnfd
+'
+
+test_expect_success "ls-files true true nfd nfd" '
+	ls_files_nfc_nfd true true $Adiarnfd $Adiarnfd
+'
+
 test_done