Message ID | cover.1686251688.git.jonathantanmy@google.com (mailing list archive) |
---|---|
Headers | show |
Series | Changed path filter hash fix and version bump | expand |
On 08/06/2023 20:21, Jonathan Tan wrote: > Here's an updated version with changes only in the tests: > - resilience against unsigned-by-default (a "skip" will be shown) > - for systems that have different quoting behavior, the tests > will be skipped (see patch 2 for more information) > > Some of the test changes may seem a bit of a hack, so if you have > a better way of doing things, please let me know. > > I've also included a change to the file format specification. To > reviewers, if you generally agree that we need a version 2 but are still > unsure about how we should migrate, consider saying so and then perhaps > we can merge the first patch while the rest remain under review. This > will give other projects, like JGit, more certainty as to the direction > that the Git project wants to take. > > Jonathan Tan (4): > gitformat-commit-graph: describe version 2 of BDAT > t4216: test changed path filters with high bit paths > repo-settings: introduce commitgraph.changedPathsVersion > commit-graph: new filter ver. that fixes murmur3 > > Documentation/config/commitgraph.txt | 16 ++++- > Documentation/gitformat-commit-graph.txt | 9 ++- > bloom.c | 65 +++++++++++++++++- > bloom.h | 8 ++- > commit-graph.c | 29 ++++++-- > oss-fuzz/fuzz-commit-graph.c | 2 +- > repo-settings.c | 6 +- > repository.h | 2 +- > t/helper/test-bloom.c | 9 ++- > t/t0095-bloom.sh | 8 +++ > t/t4216-log-bloom.sh | 86 ++++++++++++++++++++++++ > 11 files changed, 219 insertions(+), 21 deletions(-) > > Range-diff against v2: > -: ---------- > 1: d4b63945f6 gitformat-commit-graph: describe version 2 of BDAT > 1: c587eb3470 ! 2: aa4535776e t4216: test changed path filters with high bit paths > @@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation backfills empty comm > +# chosen to be the same under all Unicode normalization forms > +CENT=$(printf "\xc2\xa2") I think the only change you need to make here (because /usr/bin/sh on Ubuntu is usually 'dash' not 'bash') is to use octal rather than hexadecimal. ie: CENT=$(printf "\302\242") HTH ATB, Ramsay Jones
Ramsay Jones <ramsay@ramsayjones.plus.com> writes: > > +CENT=$(printf "\xc2\xa2") > > I think the only change you need to make here (because /usr/bin/sh > on Ubuntu is usually 'dash' not 'bash') is to use octal rather than > hexadecimal. ie: CENT=$(printf "\302\242") > > HTH > > ATB, > Ramsay Jones Thanks! Yes, it works. I've checked that it passes CI [1] and will wait for other reviews before sending out a new version to the list. [1] https://github.com/jonathantanmy/git/actions/runs/5216513514
Ramsay Jones <ramsay@ramsayjones.plus.com> writes: > I think the only change you need to make here (because /usr/bin/sh > on Ubuntu is usually 'dash' not 'bash') is to use octal rather than > hexadecimal. ie: CENT=$(printf "\302\242") Perhaps an addition to Documentation/CodingGuidelines is in order? Documentation/CodingGuidelines | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git c/Documentation/CodingGuidelines w/Documentation/CodingGuidelines index 9d5c27807a..78bc60665d 100644 --- c/Documentation/CodingGuidelines +++ w/Documentation/CodingGuidelines @@ -188,6 +188,15 @@ For shell scripts specifically (not exhaustive): hopefully nobody starts using "local" before they are reimplemented in C ;-) + - In 'printf' format string, do not use hexadecimals, as they are not + portable. Write + + CENT=$(printf "\302\242") + + not + + CENT=$(printf "\xc2\xa2") + For C programs:
Junio C Hamano <gitster@pobox.com> writes: > Perhaps an addition to Documentation/CodingGuidelines is in order? > > Documentation/CodingGuidelines | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git c/Documentation/CodingGuidelines w/Documentation/CodingGuidelines > index 9d5c27807a..78bc60665d 100644 > --- c/Documentation/CodingGuidelines > +++ w/Documentation/CodingGuidelines > @@ -188,6 +188,15 @@ For shell scripts specifically (not exhaustive): > hopefully nobody starts using "local" before they are reimplemented > in C ;-) > > + - In 'printf' format string, do not use hexadecimals, as they are not > + portable. Write > + > + CENT=$(printf "\302\242") > + > + not > + > + CENT=$(printf "\xc2\xa2") I've checked with "dash" and this applies to any quoted string, not just when passed to printf. I'll prepare a patch describing this.
Jonathan Tan <jonathantanmy@google.com> writes: >> + - In 'printf' format string, do not use hexadecimals, as they are not >> + portable. Write >> + >> + CENT=$(printf "\302\242") >> + >> + not >> + >> + CENT=$(printf "\xc2\xa2") > > I've checked with "dash" and this applies to any quoted string, not just > when passed to printf. I'll prepare a patch describing this. What do you mean by "any quoted string"? I think built-in 'echo' of dash takes "\302\242" and emits the cent-sign (but /bin/echo may not), but I do not think it is "any quoted string". To wit: dash$ echo '\302\242' ยข The echo built into dash shows the cent-sign. The example does not let us tell if is the shell (i.e. the 'echo' command sees the cent-sign in its argv[1]) or if it is the command (i.e. the 'echo' sees 8-byte string "bs three zero two bs two four two" in its argv[1], and shows that as the cent-sign), though. dash$ /bin/echo '\302\242' \302\242 This shows that shell is not doing anything fancy. /bin/echo gets the 8-byte string in its argv[1] and emits that as-is. dash$ /bin/echo "\302\242" \302\242 Again this shows the same; I added this example to demonstrate that it is _not_ like the shell interprets the backslashed octal strings depending on how they are quoted. dash$ printf "%s\n" '\302\242' \302\242 dash$ printf "%s\n" "\302\242" \302\242 And these demonstrates that argv[2] given to printf in these cases are 8-byte string "bs three zero two bs two four two" and the shell is not doing anything fancy. So, I would suggest not saying "any quoted string". In addition, even though dash's built-in echo that recognizes "\0num" seems to be conformant to what POSIX specifies (cf. [*1*]), GNU requires "-e" in order to do so in both standalone 'echo' binary and one built into 'bash', so we cannot rely on this POSIX behaviour when writing a portable script. Hence, I would recommend us to focus on giving a piece of advice on use of printf in this part of the documentation. [References] *1* https://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html *2* https://www.gnu.org/software/coreutils/manual/html_node/echo-invocation.html