diff mbox series

[v2,3/8] git-prompt: don't use shell arrays

Message ID 7e994eae7bc3dfa021262410c801ddb124ce24f1.1723727653.git.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit f2e264e43f6065e05ed7c53395ed167fc33eea2a
Headers show
Series git-prompt: support more shells v2 | expand

Commit Message

avih Aug. 15, 2024, 1:14 p.m. UTC
From: "Avi Halachmi (:avih)" <avihpit@yahoo.com>

Arrays only existed in the svn-upstream code, used to:
- Keep a list of svn remotes.
- Convert commit msg to array of words, extract the 2nd-to-last word.

Except bash/zsh, nearly all shells failed load on syntax errors here.

Now:
- The svn remotes are a list of newline-terminated values.
- The 2nd-to-last word is extracted using standard shell substrings.
- All shells can digest the svn-upstream code.

While using shell field splitting to extract the word is simple, and
doesn't even need non-standard code, e.g. set -- $(git log -1 ...),
it would have the same issues as the old array code: it depends on IFS
which we don't control, and it's subject to glob-expansion, e.g. if
the message happens to include * or **/* (as this commit message just
did), then the array could get huge. This was not great.

Now it uses standard shell substrings, and we know the exact delimiter
to expect, because it's the match from our grep just one line earlier.

The new word extraction code also fixes svn-upstream in zsh, because
previously it used arr[len-2], but because in zsh, unlike bash, array
subscripts are 1-based, it incorrectly extracted the 3rd-to-last word.
symptom: missing upstream status in a git-svn repo: u=, u+N-M, etc.

The breakage in zsh is surprising, because it was last touched by
  commit d0583da838 (prompt: fix show upstream with svn and zsh),
claiming to fix exactly that. However, it only mentions syntax fixes.
It's unclear if behavior was fixed too. But it was broken, now fixed.

Note LF=$'\n' and then using $LF instead of $'\n' few times.
A future commit will add fallback for shells without $'...', so this
would be the only line to touch instead of replacing every $'\n' .

Shells which could run the previous array code:
- bash

Shells which have arrays but were broken anyway:
- zsh: 1-based subscript
- ksh93: no "local" (the new code can't fix this part...)
- mksh, openbsd sh, pdksh: failed load on syntax error: "for ((...))".

More shells which Failed to load due to syntax error:
- dash, free/net bsd sh, busybox-ash, Schily Bourne shell, yash.

Signed-off-by: Avi Halachmi (:avih) <avihpit@yahoo.com>
---
 contrib/completion/git-prompt.sh | 48 ++++++++++++++++++++------------
 1 file changed, 30 insertions(+), 18 deletions(-)

Comments

Patrick Steinhardt Aug. 16, 2024, 8:50 a.m. UTC | #1
On Thu, Aug 15, 2024 at 01:14:08PM +0000, Avi Halachmi (:avih) via GitGitGadget wrote:
> diff --git a/contrib/completion/git-prompt.sh b/contrib/completion/git-prompt.sh
> index 4cc2cf91bb6..75c3a813fda 100644
> --- a/contrib/completion/git-prompt.sh
> +++ b/contrib/completion/git-prompt.sh
> @@ -116,10 +116,10 @@ printf -v __git_printf_supports_v -- '%s' yes >/dev/null 2>&1
>  __git_ps1_show_upstream ()
>  {
>  	local key value
> -	local svn_remote svn_url_pattern="" count n
> +	local svn_remotes="" svn_url_pattern="" count n
>  	local upstream_type=git legacy="" verbose="" name=""
> +	local LF=$'\n'
>  
> -	svn_remote=()
>  	# get some config options from git-config
>  	local output="$(git config -z --get-regexp '^(svn-remote\..*\.url|bash\.showupstream)$' 2>/dev/null | tr '\0\n' '\n ')"
>  	while read -r key value; do
> @@ -132,7 +132,7 @@ __git_ps1_show_upstream ()
>  			fi
>  			;;
>  		svn-remote.*.url)
> -			svn_remote[$((${#svn_remote[@]} + 1))]="$value"
> +			svn_remotes=${svn_remotes}${value}${LF}  # URI\nURI\n...

I was wondering whether this is something we want to quote, mostly
because I still have the failures of dash in mind when assigning values
with spaces to a `local` variable without quoting. I do not know whether
the same issues also apply to non-local variables though, probably not.

Patrick
avih Aug. 16, 2024, 9:53 a.m. UTC | #2
On Friday, August 16, 2024 at 11:50:14 AM GMT+3, Patrick Steinhardt <ps@pks.im> wrote:
> On Thu, Aug 15, 2024 at 01:14:08PM +0000, Avi Halachmi (:avih) via GitGitGadget wrote:
>>
>> -            svn_remote[$((${#svn_remote[@]} + 1))]="$value"
>> +            svn_remotes=${svn_remotes}${value}${LF}  # URI\nURI\n...
>
>
> I was wondering whether this is something we want to quote, mostly
> because I still have the failures of dash in mind when assigning values
> with spaces to a `local` variable without quoting. I do not know whether
> the same issues also apply to non-local variables though, probably not.

IFS field splitting and glob expansion strictly never happen and never
happened at the assignment part of a "simple command", since the first
version of POSIX in 1994, so quotes are not needed to avoid that.

See my guidelines at the commit message of part 5/8 (git-prompt: add
some missing quotes), and this always works: a=$b$(foo bar)${x##baz}

However, this does not answer the question of whether we want to use
quotes in assignments.

My general take is to use quotes only when required, and if one is
not sure, then use quotes, or read the spec to be sure, but do keep
in mind that some older shells are not always fully compliant with
the latest POSIX spec. But unquoted assignment is ubiquitous.

I'd say to not quote in assignments, except to avoid tilde expansion
(which can only happen with unquoted literal tilde at the input, but
not after expansion or substitution).

But if there's a strong precedence or preference to use quotes in
assignment, then I can change it.

avih
Patrick Steinhardt Aug. 16, 2024, 10:52 a.m. UTC | #3
On Fri, Aug 16, 2024 at 09:53:36AM +0000, avih wrote:
>  On Friday, August 16, 2024 at 11:50:14 AM GMT+3, Patrick Steinhardt <ps@pks.im> wrote:
> > On Thu, Aug 15, 2024 at 01:14:08PM +0000, Avi Halachmi (:avih) via GitGitGadget wrote:
> >>
> >> -            svn_remote[$((${#svn_remote[@]} + 1))]="$value"
> >> +            svn_remotes=${svn_remotes}${value}${LF}  # URI\nURI\n...
> >
> >
> > I was wondering whether this is something we want to quote, mostly
> > because I still have the failures of dash in mind when assigning values
> > with spaces to a `local` variable without quoting. I do not know whether
> > the same issues also apply to non-local variables though, probably not.
> 
> IFS field splitting and glob expansion strictly never happen and never
> happened at the assignment part of a "simple command", since the first
> version of POSIX in 1994, so quotes are not needed to avoid that.

That's the theory, yes. But as said, we did hit bugs in similar areas in
dash where that wasn't properly honored, as Junio also pointed out on a
later patch. But that was in non-POSIX area anyway, as to the best of my
knowledge it only happens with `local` assignments.

Patrick
avih Aug. 16, 2024, 11:35 a.m. UTC | #4
On Friday, August 16, 2024 at 01:52:11 PM GMT+3, Patrick Steinhardt <ps@pks.im> wrote:
>On Fri, Aug 16, 2024 at 09:53:36AM +0000, avih wrote:
>>  On Friday, August 16, 2024 at 11:50:14 AM GMT+3, Patrick Steinhardt <ps@pks.im> wrote:
>> > On Thu, Aug 15, 2024 at 01:14:08PM +0000, Avi Halachmi (:avih) via GitGitGadget wrote:
>> >>
>> >> -            svn_remote[$((${#svn_remote[@]} + 1))]="$value"
>> >> +            svn_remotes=${svn_remotes}${value}${LF}  # URI\nURI\n...
>> >
>> >
>> > I was wondering whether this is something we want to quote, mostly
>> > because I still have the failures of dash in mind when assigning values
>> > with spaces to a `local` variable without quoting. I do not know whether
>> > the same issues also apply to non-local variables though, probably not.
>>
>> IFS field splitting and glob expansion strictly never happen and never
>> happened at the assignment part of a "simple command", since the first
>> version of POSIX in 1994, so quotes are not needed to avoid that.
>
> That's the theory, yes. But as said, we did hit bugs in similar areas in
> dash where that wasn't properly honored, as Junio also pointed out on a
> later patch. But that was in non-POSIX area anyway, as to the best of my
> knowledge it only happens with `local` assignments.

Yes. "local" is special, and not only because it's not POSIX.

The difference with "local" is that it takes assignment as arguments.

A "simple command" (posix term) is composed of optional assignment[s]
and optional command (and arguments).

The assigments part is never IFS-split or glob-expanded, while the
command and arguments part is (in words which include unquoted
expansion or substitution) and therefore needs quotes, e.g.:

foo=$x bar=$y echo a="$b" c="$d"

There are other commands (beyond "local") which take assignment[s]
as arguments, like "export", "readonly" and "command".

Before posix 2024, these commands also required quoting of the
arguments-assignments - just like "local" needed in dash.

But posix 2024 introduced the concept of a "declaration utility"
(which takes assignments as arguments, like export, readonly, etc),
and the concept of "assignment context" where IFS-split and glob
expansion don't happen - like the assignment part of a simple
command, but now also in the assignment arguments of declaration
utilities.

And indeed, new versions of shells now don't need quotes in export
etc, and shells now make "local" a declaration utility which
doesn't need quotes of the assignment args, including in dash.

However, the reason we do use quotes in local, export, etc, is
because many instances of shells which don't yet (or will ever)
support it still exist, so we quote for compatibility with those,
but still it's only needed in assignments which are arguments to
commands - not in the assignment part of a simple command.

I've also updated the wording a bit of my guidelines in part 5/8,
and I'll include it at the commit message of 5/8 v3.
Patrick Steinhardt Aug. 16, 2024, 12:38 p.m. UTC | #5
On Fri, Aug 16, 2024 at 11:35:33AM +0000, avih wrote:
>  On Friday, August 16, 2024 at 01:52:11 PM GMT+3, Patrick Steinhardt <ps@pks.im> wrote:
> >On Fri, Aug 16, 2024 at 09:53:36AM +0000, avih wrote:
> >>  On Friday, August 16, 2024 at 11:50:14 AM GMT+3, Patrick Steinhardt <ps@pks.im> wrote:
> >> > On Thu, Aug 15, 2024 at 01:14:08PM +0000, Avi Halachmi (:avih) via GitGitGadget wrote:
> >> >>
> >> >> -            svn_remote[$((${#svn_remote[@]} + 1))]="$value"
> >> >> +            svn_remotes=${svn_remotes}${value}${LF}  # URI\nURI\n...
> >> >
> >> >
> >> > I was wondering whether this is something we want to quote, mostly
> >> > because I still have the failures of dash in mind when assigning values
> >> > with spaces to a `local` variable without quoting. I do not know whether
> >> > the same issues also apply to non-local variables though, probably not.
> >>
> >> IFS field splitting and glob expansion strictly never happen and never
> >> happened at the assignment part of a "simple command", since the first
> >> version of POSIX in 1994, so quotes are not needed to avoid that.
> >
> > That's the theory, yes. But as said, we did hit bugs in similar areas in
> > dash where that wasn't properly honored, as Junio also pointed out on a
> > later patch. But that was in non-POSIX area anyway, as to the best of my
> > knowledge it only happens with `local` assignments.
> 
> Yes. "local" is special, and not only because it's not POSIX.
> 
> The difference with "local" is that it takes assignment as arguments.
> 
> A "simple command" (posix term) is composed of optional assignment[s]
> and optional command (and arguments).
> 
> The assigments part is never IFS-split or glob-expanded, while the
> command and arguments part is (in words which include unquoted
> expansion or substitution) and therefore needs quotes, e.g.:
> 
> foo=$x bar=$y echo a="$b" c="$d"
> 
> There are other commands (beyond "local") which take assignment[s]
> as arguments, like "export", "readonly" and "command".
> 
> Before posix 2024, these commands also required quoting of the
> arguments-assignments - just like "local" needed in dash.
> 
> But posix 2024 introduced the concept of a "declaration utility"
> (which takes assignments as arguments, like export, readonly, etc),
> and the concept of "assignment context" where IFS-split and glob
> expansion don't happen - like the assignment part of a simple
> command, but now also in the assignment arguments of declaration
> utilities.
> 
> And indeed, new versions of shells now don't need quotes in export
> etc, and shells now make "local" a declaration utility which
> doesn't need quotes of the assignment args, including in dash.
> 
> However, the reason we do use quotes in local, export, etc, is
> because many instances of shells which don't yet (or will ever)
> support it still exist, so we quote for compatibility with those,
> but still it's only needed in assignments which are arguments to
> commands - not in the assignment part of a simple command.
> 
> I've also updated the wording a bit of my guidelines in part 5/8,
> and I'll include it at the commit message of 5/8 v3.

Great, thanks for your thorough explanations!

Patrick
diff mbox series

Patch

diff --git a/contrib/completion/git-prompt.sh b/contrib/completion/git-prompt.sh
index 4cc2cf91bb6..75c3a813fda 100644
--- a/contrib/completion/git-prompt.sh
+++ b/contrib/completion/git-prompt.sh
@@ -116,10 +116,10 @@  printf -v __git_printf_supports_v -- '%s' yes >/dev/null 2>&1
 __git_ps1_show_upstream ()
 {
 	local key value
-	local svn_remote svn_url_pattern="" count n
+	local svn_remotes="" svn_url_pattern="" count n
 	local upstream_type=git legacy="" verbose="" name=""
+	local LF=$'\n'
 
-	svn_remote=()
 	# get some config options from git-config
 	local output="$(git config -z --get-regexp '^(svn-remote\..*\.url|bash\.showupstream)$' 2>/dev/null | tr '\0\n' '\n ')"
 	while read -r key value; do
@@ -132,7 +132,7 @@  __git_ps1_show_upstream ()
 			fi
 			;;
 		svn-remote.*.url)
-			svn_remote[$((${#svn_remote[@]} + 1))]="$value"
+			svn_remotes=${svn_remotes}${value}${LF}  # URI\nURI\n...
 			svn_url_pattern="$svn_url_pattern\\|$value"
 			upstream_type=svn+git # default upstream type is SVN if available, else git
 			;;
@@ -156,25 +156,37 @@  __git_ps1_show_upstream ()
 	case "$upstream_type" in
 	git)    upstream_type="@{upstream}" ;;
 	svn*)
-		# get the upstream from the "git-svn-id: ..." in a commit message
-		# (git-svn uses essentially the same procedure internally)
-		local -a svn_upstream
-		svn_upstream=($(git log --first-parent -1 \
-					--grep="^git-svn-id: \(${svn_url_pattern#??}\)" 2>/dev/null))
-		if [[ 0 -ne ${#svn_upstream[@]} ]]; then
-			svn_upstream=${svn_upstream[${#svn_upstream[@]} - 2]}
-			svn_upstream=${svn_upstream%@*}
-			local n_stop="${#svn_remote[@]}"
-			for ((n=1; n <= n_stop; n++)); do
-				svn_upstream=${svn_upstream#${svn_remote[$n]}}
-			done
+		# successful svn-upstream resolution:
+		# - get the list of configured svn-remotes ($svn_remotes set above)
+		# - get the last commit which seems from one of our svn-remotes
+		# - confirm that it is from one of the svn-remotes
+		# - use $GIT_SVN_ID if set, else "git-svn"
 
-			if [[ -z "$svn_upstream" ]]; then
+		# get upstream from "git-svn-id: UPSTRM@N HASH" in a commit message
+		# (git-svn uses essentially the same procedure internally)
+		local svn_upstream="$(
+			git log --first-parent -1 \
+				--grep="^git-svn-id: \(${svn_url_pattern#??}\)" 2>/dev/null
+		)"
+
+		if [ -n "$svn_upstream" ]; then
+			# extract the URI, assuming --grep matched the last line
+			svn_upstream=${svn_upstream##*$LF}  # last line
+			svn_upstream=${svn_upstream#*: }    # UPSTRM@N HASH
+			svn_upstream=${svn_upstream%@*}     # UPSTRM
+
+			case ${LF}${svn_remotes} in
+			*"${LF}${svn_upstream}${LF}"*)
+				# grep indeed matched the last line - it's our remote
 				# default branch name for checkouts with no layout:
 				upstream_type=${GIT_SVN_ID:-git-svn}
-			else
+				;;
+			*)
+				# the commit message includes one of our remotes, but
+				# it's not at the last line. is $svn_upstream junk?
 				upstream_type=${svn_upstream#/}
-			fi
+				;;
+			esac
 		elif [[ "svn+git" = "$upstream_type" ]]; then
 			upstream_type="@{upstream}"
 		fi