Message ID | 20240521141452.26210-1-tboegi@web.de (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v3,1/1] macOS: ls-files path fails if path of workdir is NFD | expand |
tboegi@web.de writes: > Add a missing call to precompose_string_if_needed() to this code > in setup.c : > `work_tree = precompose_string_if_needed(get_git_work_tree());` This is new in this iteration, I presume? The old one did the precompose only in strbuf_getcwd(). We now precompose also the result of get_git_work_tree(). Two questions. * It is unclear to me why this makes a difference only when the precompuse configuration is set only in the local configuration. * As the leading part of the value placed in get_git_work_tree() comes from strbuf_getcwd() called by abspath.c:real_pathdup() that is called by repository.c:repo_set_worktree(), doesn't this potentially call precompse twice on the already precomposed early parth of the get_git_work_tree() result? I suspect that with the arrangement in your test, the argument given to set_git_work_tree() from setup.c:setup_discovered_git_dir() is always ".", and that dot is passed to repository.c:repo_set_worktree() which calls abspath.c:real_pathdup() to turn it into an absolute, where it has a call to strbuf_getcwd(). So with the provided test, I suspect there is no difference between the previous and this iteration in behaviour, as what is fed to precompose should be identical? What this iteration does differently is that inside real_pathdup(), if the string given to repo_set_worktree() is more than the trivial ".", it is appended to the result of strbuf_getcwd(), and the new code precomposes after such appending in real_pathdup() happens. It will convert the leading part twice [*] and more importantly the appended part is now converted, unlike the previous one? Side note: [*] hopefully precompose is idempotent? Relying on that property somewhat feels yucky, though. Puzzled... Will replace and queue, but I couldn't figure out what is going on with the help by the proposed log message, so... Thanks.
On Tue, May 21, 2024 at 10:50:25AM -0700, Junio C Hamano wrote: > tboegi@web.de writes: > > > Add a missing call to precompose_string_if_needed() to this code > > in setup.c : > > `work_tree = precompose_string_if_needed(get_git_work_tree());` > > This is new in this iteration, I presume? The old one did the > precompose only in strbuf_getcwd(). We now precompose also the > result of get_git_work_tree(). > > Two questions. > > * It is unclear to me why this makes a difference only when the > precompuse configuration is set only in the local configuration. > > * As the leading part of the value placed in get_git_work_tree() > comes from strbuf_getcwd() called by abspath.c:real_pathdup() > that is called by repository.c:repo_set_worktree(), doesn't this > potentially call precompse twice on the already precomposed early > parth of the get_git_work_tree() result? > > I suspect that with the arrangement in your test, the argument given > to set_git_work_tree() from setup.c:setup_discovered_git_dir() is > always ".", and that dot is passed to repository.c:repo_set_worktree() > which calls abspath.c:real_pathdup() to turn it into an absolute, > where it has a call to strbuf_getcwd(). > > So with the provided test, I suspect there is no difference between > the previous and this iteration in behaviour, as what is fed to > precompose should be identical? > > What this iteration does differently is that inside real_pathdup(), > if the string given to repo_set_worktree() is more than the trivial > ".", it is appended to the result of strbuf_getcwd(), and the new > code precomposes after such appending in real_pathdup() happens. It > will convert the leading part twice [*] and more importantly the > appended part is now converted, unlike the previous one? > > Side note: [*] hopefully precompose is idempotent? Relying > on that property somewhat feels yucky, though. > > Puzzled... > > Will replace and queue, but I couldn't figure out what is going on > with the help by the proposed log message, so... Acknowledge. The commit message deserves an update, for sure. My suggestion would be too keep it in seen, until I have managed to write a better commit message. At the same time, I would ask Jun-ichi Takimoto to do a re-test of the new version.
Torsten Bögershausen <tboegi@web.de> writes: > The commit message deserves an update, for sure. > My suggestion would be too keep it in seen, until I have managed > to write a better commit message. > At the same time, I would ask Jun-ichi Takimoto to do a re-test > of the new version. Yup, that sounds extremely sensible. Thanks for working on this.
Unfortunately v3 still doesn't work. 'git ls-files NFD' works but 'git ls-files NFC' does not. I think it better to test both "ls-config NFD" and "ls-config NFC". The reason of the failure seems to be the same as v2, but I describe it here in more detail (or too detailed). (one of?) The problem is the_repository->config->hash_initialized is set to 1 before the_repository->commondir is set to ".git". Due to this, .git/config is never read, and precomposed_unicode is never set to 1 (remains -1). run_builtin() { setup_git_directory() { strbuf_getcwd() { # setup.c:1542 precompose_{strbuf,string}_if_needed() { # precomposed_unicode is still -1 git_congig_get_bool("core.precomposeunicode") { git_config_check_init() { repo_read_config() { git_config_init() { # !!! the_repository->config->hash_initialized=1 # !!! } # does not read .git/config since # the_repository->commondir is still NULL } } } returns without converting to NFC } returns cwd in NFD } setup_discovered_git_dir() { set_git_work_tree(".") { repo_set_worktree() { # this function indirectly calls strbuf_getcwd() # --> precompose_{strbuf,string}_if_needed() --> # {git,repo}_config_get_bool("core.precomposeunicode"), # but does not try to read .git/config since # the_repository->config->hash_initialized # is already set to 1 above. And it will not read # .git/config even if hash_initialized is 0 # since the_repository->commondir is still NULL. the_repository->worktree = NFD } } } setup_git_env() { repo_setup_gitdir() { repo_set_commondir() { # finally commondir is set here the_repository->commondir = ".git" } } } } // END setup_git_directory precompose_argv_prefix() { # since the_repository->config->hash_initialized is still 1 # .git/config is not read and precomposed_unicode remains -1, # and argv (if in NFD) is not converted to NFC } cmd_ls_files() { parse_pathspec(.., argv /* may be in NFD, see above */) { init_path_spec_item() { prefix_path_gently() { abspath_part_inside_repo() { work_tree = precomose_string_if_needed( get_git_work_gree()) # get_git_work_tree() returns NFD, and # precompose_string_if_needed() does not # convert it to NFC since # the_repository->config->hash_initialized is 1 worktree = NFD returns 0 for "ls-files NFD" since both argv and work_tree are in NFD, but returns -1 for "ls-files NFC" since argv is in NFC. } returns NULL for "ls-files NFC" } die() at pathspec.c:499 for "ls-files NFC" } } } } END run_builtin I don't know how to fix the problem, but I think it better to avoid calling precompose_{strbuf,string}_if_needed() before commondir is set to ".git" and .git/config is successfully read. Or reset the_repository->config->hash_initialized at some point?
On Fri, May 24, 2024 at 12:33:08AM +0900, Jun. T wrote: > > Unfortunately v3 still doesn't work. > 'git ls-files NFD' works but 'git ls-files NFC' does not. > > I think it better to test both "ls-config NFD" and "ls-config NFC". > > The reason of the failure seems to be the same as v2, but > I describe it here in more detail (or too detailed). Thanks for testing - I was fully convinced that the new test case did cover all problems - but proved to be wrong. [snip the nice analyses] > I don't know how to fix the problem, but I think it better to avoid > calling precompose_{strbuf,string}_if_needed() before commondir > is set to ".git" and .git/config is successfully read. > > Or reset the_repository->config->hash_initialized at some point? > I think that I may be able to offer a v4 patch, which has better test cases. From my understanding, the reading of the local/repo .git/config does not work as we need it, since the .git/ directroy is determined after the config has been read. And so it is never read. Having said that, a patch relying on the global .gitconfig may still be an improvement on it's own.
diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c index 0bd5c24250..5a7c90c90d 100644 --- a/compat/precompose_utf8.c +++ b/compat/precompose_utf8.c @@ -94,6 +94,16 @@ const char *precompose_string_if_needed(const char *in) return in; } +void precompose_strbuf_if_needed(struct strbuf *sb) +{ + char *buf_prec = (char *)precompose_string_if_needed(sb->buf); + if (buf_prec != sb->buf) { + size_t buf_prec_len = strlen(buf_prec); + free(strbuf_detach(sb, NULL)); + strbuf_attach(sb, buf_prec, buf_prec_len, buf_prec_len + 1); + } +} + const char *precompose_argv_prefix(int argc, const char **argv, const char *prefix) { int i = 0; diff --git a/compat/precompose_utf8.h b/compat/precompose_utf8.h index fea06cf28a..7c3cfcadb0 100644 --- a/compat/precompose_utf8.h +++ b/compat/precompose_utf8.h @@ -30,6 +30,7 @@ typedef struct { const char *precompose_argv_prefix(int argc, const char **argv, const char *prefix); const char *precompose_string_if_needed(const char *in); +void precompose_strbuf_if_needed(struct strbuf *sb); void probe_utf8_pathname_composition(void); PREC_DIR *precompose_utf8_opendir(const char *dirname); diff --git a/git-compat-util.h b/git-compat-util.h index 3e7a59b5ff..8b63108f16 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -331,6 +331,7 @@ static inline const char *precompose_string_if_needed(const char *in) return in; } +#define precompose_strbuf_if_needed(a) #define probe_utf8_pathname_composition() #endif diff --git a/setup.c b/setup.c index 2e607632db..61f61496ec 100644 --- a/setup.c +++ b/setup.c @@ -48,7 +48,7 @@ static int abspath_part_inside_repo(char *path) size_t wtlen; char *path0; int off; - const char *work_tree = get_git_work_tree(); + const char *work_tree = precompose_string_if_needed(get_git_work_tree()); struct strbuf realpath = STRBUF_INIT; if (!work_tree) diff --git a/strbuf.c b/strbuf.c index 4c9ac6dc5e..b05581d8e7 100644 --- a/strbuf.c +++ b/strbuf.c @@ -569,6 +569,7 @@ int strbuf_getcwd(struct strbuf *sb) strbuf_grow(sb, guessed_len); if (getcwd(sb->buf, sb->alloc)) { strbuf_setlen(sb, strlen(sb->buf)); + precompose_strbuf_if_needed(sb); return 0; } diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh index 325eb1c3cd..5a9ee5be92 100755 --- a/t/t0050-filesystem.sh +++ b/t/t0050-filesystem.sh @@ -156,4 +156,30 @@ test_expect_success CASE_INSENSITIVE_FS 'checkout with no pathspec and a case in ) ' +test_expect_success 'git ls-files under NFD' ' + ( + mkdir -p "somewhere/$aumlcdiar" && + mypwd=$PWD && + cd "somewhere/$aumlcdiar" && + git init && + git --literal-pathspecs ls-files "$mypwd/somewhere/$aumlcdiar" 2>err && + >expected && + test_cmp expected err + ) +' + +# Re-do the same test. Note: global core.precomposeunicode is changed +test_expect_success 'git ls-files under NFD. global precompose false' ' + test_when_finished "git config --global --unset core.precomposeunicode" && + ( + mypwd=$PWD && + cd "somewhere/$aumlcdiar" && + git config --global core.precomposeunicode false && + git config core.precomposeunicode true && + git --literal-pathspecs ls-files "$mypwd/somewhere/$aumlcdiar" 2>err && + >expected && + test_cmp expected err + ) +' + test_done