Message ID | 20200327080300.GA605934@coredump.intra.peff.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | upload-pack: handle unexpected v2 delim packets | expand |
On Fri, Mar 27, 2020 at 04:03:00AM -0400, Jeff King wrote: > The packetize() function takes its input on stdin, and requires 4 > separate sub-processes to format a simple string. We can do much better > by getting the length via the shell's "${#packet}" construct. The one > caveat is that the shell can't put a NUL into a variable, so we'll have > to continue to provide the stdin form for a few calls. > > There are a few other cleanups here in the touched code: > > - the stdin form of packetize() had an extra stray "%s" when printing > the packet > > - the converted calls in t5562 can be made simpler by redirecting > output as a block, rather than repeated appending > > Signed-off-by: Jeff King <peff@peff.net> > --- > t/t5562-http-backend-content-length.sh | 19 ++++++++++++------- > t/test-lib-functions.sh | 23 ++++++++++++++++------- > 2 files changed, 28 insertions(+), 14 deletions(-) > > diff --git a/t/t5562-http-backend-content-length.sh b/t/t5562-http-backend-content-length.sh > index 4a110b307e..3f4ac71f83 100755 > --- a/t/t5562-http-backend-content-length.sh > +++ b/t/t5562-http-backend-content-length.sh > @@ -53,15 +53,20 @@ test_expect_success 'setup' ' > test_commit c1 && > hash_head=$(git rev-parse HEAD) && > hash_prev=$(git rev-parse HEAD~1) && > - printf "want %s" "$hash_head" | packetize >fetch_body && > - printf 0000 >>fetch_body && > - printf "have %s" "$hash_prev" | packetize >>fetch_body && > - printf done | packetize >>fetch_body && > + { > + packetize "want $hash_head" && > + printf 0000 && > + packetize "have $hash_prev" && > + packetize "done" > + } >fetch_body && Nicely done, this is an obvious improvement in clarity over the appending '>>' that was here before. The new version reads mouch more cleanly. > test_copy_bytes 10 <fetch_body >fetch_body.trunc && > hash_next=$(git commit-tree -p HEAD -m next HEAD^{tree}) && > - printf "%s %s refs/heads/newbranch\\0report-status\\n" "$ZERO_OID" "$hash_next" | packetize >push_body && > - printf 0000 >>push_body && > - echo "$hash_next" | git pack-objects --stdout >>push_body && > + { > + printf "%s %s refs/heads/newbranch\\0report-status\\n" \ > + "$ZERO_OID" "$hash_next" | packetize && > + printf 0000 && > + echo "$hash_next" | git pack-objects --stdout > + } >push_body && > test_copy_bytes 10 <push_body >push_body.trunc && > : >empty_body > ' > diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh > index 352c213d52..216918a58c 100644 > --- a/t/test-lib-functions.sh > +++ b/t/test-lib-functions.sh > @@ -1362,14 +1362,23 @@ nongit () { > ) > } 7>&2 2>&4 > > -# convert stdin to pktline representation; note that empty input becomes an > -# empty packet, not a flush packet (for that you can just print 0000 yourself). > +# convert function arguments or stdin (if not arguments given) to pktline > +# representation. If multiple arguments are given, they are separated by > +# whitespace and put in a single packet. Note that data containing NULs must be > +# given on stdin, and that empty input becomes an empty packet, not a flush > +# packet (for that you can just print 0000 yourself). > packetize() { > - cat >packetize.tmp && > - len=$(wc -c <packetize.tmp) && > - printf '%04x%s' "$(($len + 4))" && > - cat packetize.tmp && > - rm -f packetize.tmp > + if test $# -gt 0 > + then > + packet="$*" Mentioned off-list in a discussion already, but: though I find this behavior of joining multiple arguments by a whitespace character a little confusing (i.e., what would callers thing this function does if they hadn't read the documentation?) I think that this is probably the most sane thing that you could do here. On the other hand, nowhere in this patch do we currently call packetize with multiple arguments, so perhaps it would be made simpler if we instead wrote "$1" anywhere there was "$packet". > + printf '%04x%s' "$((4 + ${#packet}))" "$packet" > + else > + cat >packetize.tmp && > + len=$(wc -c <packetize.tmp) && > + printf '%04x' "$(($len + 4))" && > + cat packetize.tmp && > + rm -f packetize.tmp > + fi > } Right: if the contents contains an unrepresentable byte, then it has to be passed over stdin. This part is obviously correct. > # Parse the input as a series of pktlines, writing the result to stdout. > -- > 2.26.0.581.g322f77c0ee Thanks, Taylor
Jeff King <peff@peff.net> writes: > The packetize() function takes its input on stdin, and requires 4 > separate sub-processes to format a simple string. We can do much better > by getting the length via the shell's "${#packet}" construct. The one > caveat is that the shell can't put a NUL into a variable, so we'll have > to continue to provide the stdin form for a few calls. Yuck. Binary protocol and shell variables do not mix well. Documentation/CodingGuidelines forbids ${#parameter} in the first place and we seem to use it only when we know we are using bash. Perhaps we should start considering to lift it. I dunno. > +# convert function arguments or stdin (if not arguments given) to pktline > +# representation. If multiple arguments are given, they are separated by > +# whitespace and put in a single packet. Note that data containing NULs must be > +# given on stdin, and that empty input becomes an empty packet, not a flush > +# packet (for that you can just print 0000 yourself). > packetize() { > + if test $# -gt 0 > + then > + packet="$*" > + printf '%04x%s' "$((4 + ${#packet}))" "$packet" This allows packetize "want $hash_head" to be written like so: packetize want "$hash_head" which maybe is a handy thing to do. > + else > + cat >packetize.tmp && > + len=$(wc -c <packetize.tmp) && > + printf '%04x' "$(($len + 4))" && > + cat packetize.tmp && > + rm -f packetize.tmp perl -e ' my $data = do { local $?; <STDIN> }; printf "%04x%s", length($data), $data; ' That's one process but much heavier than cat/wc/printf/cat, I guess. > + fi > } > > # Parse the input as a series of pktlines, writing the result to stdout.
On Fri, Mar 27, 2020 at 12:18:34PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > The packetize() function takes its input on stdin, and requires 4 > > separate sub-processes to format a simple string. We can do much better > > by getting the length via the shell's "${#packet}" construct. The one > > caveat is that the shell can't put a NUL into a variable, so we'll have > > to continue to provide the stdin form for a few calls. > > Yuck. Binary protocol and shell variables do not mix well. > > Documentation/CodingGuidelines forbids ${#parameter} in the first > place and we seem to use it only when we know we are using bash. > > Perhaps we should start considering to lift it. I dunno. Hmph. I had a vague recollection there but checked beforehand: - we do use it in t9010, which is /bin/sh (and have since 2010). I thought it might not be run on obscure platforms because it's svn-related, but I think it doesn't actually require svn. - it's in POSIX, at least as far back as 2004 (I couldn't find an easy copy of the 2001 version). That doesn't prove there aren't problematic systems, of course, but it at least passes the bar of "not even in POSIX". - it's not in check-non-portable-shell.pl. :) That doesn't mean CodingGuidelines is wrong, but we should probably reconcile them. Given that the first point means we've had a 10-year weather balloon, and that the original rule seems to have come from a big list of rules[1] rather than a specific known problem shell, I think we should declare it available. [1] https://lore.kernel.org/git/7vtznzf5jb.fsf@gitster.siamese.dyndns.org/ > > packetize() { > > + if test $# -gt 0 > > + then > > + packet="$*" > > + printf '%04x%s' "$((4 + ${#packet}))" "$packet" > > This allows > > packetize "want $hash_head" > > to be written like so: > > packetize want "$hash_head" > > which maybe is a handy thing to do. Yeah. I admit I don't care overly much between that and doing something else sensible with multiple arguments (like putting each one in its own packet). I just really didn't want to silently ignore anything after "$1". This at least behaves like "echo". > > + else > > + cat >packetize.tmp && > > + len=$(wc -c <packetize.tmp) && > > + printf '%04x' "$(($len + 4))" && > > + cat packetize.tmp && > > + rm -f packetize.tmp > > perl -e ' > my $data = do { local $?; <STDIN> }; > printf "%04x%s", length($data), $data; > ' > > That's one process but much heavier than cat/wc/printf/cat, I guess. Yeah, I was tempted to do that, but ${#packet} is even one process shorter. :) It might be worth simplifying the stdin path above, but it's much less important if most of those call-sites go away. -Peff
On Fri, Mar 27, 2020 at 09:16:07AM -0600, Taylor Blau wrote: > > packetize() { > > - cat >packetize.tmp && > > - len=$(wc -c <packetize.tmp) && > > - printf '%04x%s' "$(($len + 4))" && > > - cat packetize.tmp && > > - rm -f packetize.tmp > > + if test $# -gt 0 > > + then > > + packet="$*" > > Mentioned off-list in a discussion already, but: though I find this > behavior of joining multiple arguments by a whitespace character a > little confusing (i.e., what would callers thing this function does if > they hadn't read the documentation?) I think that this is probably the > most sane thing that you could do here. > > On the other hand, nowhere in this patch do we currently call packetize > with multiple arguments, so perhaps it would be made simpler if we > instead wrote "$1" anywhere there was "$packet". Of all the options, I like that the least because somebody doing: packetize foo bar would have the "bar" silently ignored. And it's more lines of code to check and complain about that than it is to just do something sensible. -Peff
Jeff King <peff@peff.net> writes: >> Documentation/CodingGuidelines forbids ${#parameter} in the first >> place and we seem to use it only when we know we are using bash. >> >> Perhaps we should start considering to lift it. I dunno. > > Hmph. I had a vague recollection there but checked beforehand: > > - we do use it in t9010, which is /bin/sh (and have since 2010). I > thought it might not be run on obscure platforms because it's > svn-related, but I think it doesn't actually require svn. I do not think I ran git-svn stuff myself for the past 10 years, though, after a few projects that matter to me migrated away from svn ;-) > - it's in POSIX, at least as far back as 2004 (I couldn't find an easy > copy of the 2001 version). That doesn't prove there aren't > problematic systems, of course, but it at least passes the bar of > "not even in POSIX". Yeah, IIRC the list was written in response to a request for _some_ guidance, so it largely came from in-house rules of my previous life, back when I had to deal with various flavours of UNIXen. > - it's not in check-non-portable-shell.pl. :) That doesn't mean > CodingGuidelines is wrong, but we should probably reconcile them. That checker came much much later than the guidelines so it is not surprising at all for it to be "buggy", in the sense that it does not check everything the guidelines ask. Yes, we may need bugfixes and there may be other bugs, too. > Yeah, I was tempted to do that, but ${#packet} is even one process > shorter. :) It might be worth simplifying the stdin path above, but it's > much less important if most of those call-sites go away. The largest issue (which is not that large, though) I felt with the approach when I saw the patch for the first time was that we need the big warning comment before the helper, i.e. > +# Note that data containing NULs must be given on stdin,... But after thinking about it a bit more, I think it is probably OK. I do not think you can assign a string with NUL in it to a variable and you can use "$(command substitution)" as an argument to the helper to pass such a string, either (not portably anyway). If such a string cannot be made into "$*", ${#packet} won't have to worry about counting bytes in a string with NUL in it to begin with, so the above note won't even be necessary (it would have to say "you cannot pass data containing NULs as arguments---you are welcome to try, but you won't be able to do so, not portably anyway"), anyway ;-).
Junio C Hamano <gitster@pobox.com> writes: I guess I didn't give an explicit conclusion in my message. >> - it's in POSIX, at least as far back as 2004 (I couldn't find an easy >> copy of the 2001 version). That doesn't prove there aren't >> problematic systems, of course, but it at least passes the bar of >> "not even in POSIX". > > Yeah, IIRC the list was written in response to a request for _some_ > guidance, so it largely came from in-house rules of my previous > life, back when I had to deal with various flavours of UNIXen. I strongly suspect that most of these historical curiosity systems died out or learned ${#posix}; I wouldn't at all be surprised if /bin/sh on Solaris back then was one of the motivating systems that led to the forbidding of the use of ${#parameter} in our in-house rules, but luckily, we have written it off as unsalvageable in this project ;-) It has been in POSIX long enough, and it is useful at times, so let's drop it from the list of guidelines (patch?). >> - it's not in check-non-portable-shell.pl. :) That doesn't mean >> CodingGuidelines is wrong, but we should probably reconcile them. > > That checker came much much later than the guidelines so it is not > surprising at all for it to be "buggy", in the sense that it does > not check everything the guidelines ask. Yes, we may need bugfixes > and there may be other bugs, too. This still gives us something to keep an eye on. -- >8 -- Subject: CodingGuidelines: allow ${#posix} == strlen($posix) The construct has been in POSIX for the past 10+ years, and we have used in t9xxx (subversion) series of the tests, so we know it is at least portable across systems that subversion Perl bindings have been ported to. Let's loosen the rule; luckily, the check-non-portable-shell script does not have any rule to find its use, so the only change needed is a removal of one paragraph from the documentation. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> --- Documentation/CodingGuidelines | 2 -- 1 file changed, 2 deletions(-) diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines index ed4e443a3c..390ceece52 100644 --- a/Documentation/CodingGuidelines +++ b/Documentation/CodingGuidelines @@ -91,8 +91,6 @@ For shell scripts specifically (not exhaustive): - No shell arrays. - - No strlen ${#parameter}. - - No pattern replacement ${parameter/pattern/string}. - We use Arithmetic Expansion $(( ... )).
On Sat, Mar 28, 2020 at 05:11:59PM -0700, Junio C Hamano wrote: > > - we do use it in t9010, which is /bin/sh (and have since 2010). I > > thought it might not be run on obscure platforms because it's > > svn-related, but I think it doesn't actually require svn. > > I do not think I ran git-svn stuff myself for the past 10 years, > though, after a few projects that matter to me migrated away from > svn ;-) Me either. I don't even run the git-svn tests, usually. t9010 is about the svn fast-importer, though. We've all been running its tests for years, even though AFAIK it's not really used for anything. > The largest issue (which is not that large, though) I felt with the > approach when I saw the patch for the first time was that we need > the big warning comment before the helper, i.e. > > > +# Note that data containing NULs must be given on stdin,... > > But after thinking about it a bit more, I think it is probably OK. > I do not think you can assign a string with NUL in it to a variable > and you can use "$(command substitution)" as an argument to the > helper to pass such a string, either (not portably anyway). If such > a string cannot be made into "$*", ${#packet} won't have to worry > about counting bytes in a string with NUL in it to begin with, so > the above note won't even be necessary (it would have to say "you > cannot pass data containing NULs as arguments---you are welcome to > try, but you won't be able to do so, not portably anyway"), anyway > ;-). Yes. You could probably screw yourself with: packetize "$(printf "0020git-upload-pack\0host=example.com")" but presumably you'd quickly notice when your test fails (both dash and bash just drop the NUL byte entirely, and bash even issues a warning). -Peff
On Sat, Mar 28, 2020 at 08:05:10PM -0700, Junio C Hamano wrote: > -- >8 -- > Subject: CodingGuidelines: allow ${#posix} == strlen($posix) I'm in favor of this patch, but... > The construct has been in POSIX for the past 10+ years, and we have > used in t9xxx (subversion) series of the tests, so we know it is at > least portable across systems that subversion Perl bindings have > been ported to. It is even stronger than that. t9010 is run for everybody (with the exception of the last test, which needs svnadmin). -Peff
Jeff King <peff@peff.net> writes: > On Sat, Mar 28, 2020 at 08:05:10PM -0700, Junio C Hamano wrote: > >> -- >8 -- >> Subject: CodingGuidelines: allow ${#posix} == strlen($posix) > > I'm in favor of this patch, but... > >> The construct has been in POSIX for the past 10+ years, and we have >> used in t9xxx (subversion) series of the tests, so we know it is at >> least portable across systems that subversion Perl bindings have >> been ported to. > > It is even stronger than that. t9010 is run for everybody (with the > exception of the last test, which needs svnadmin). Ah, you're right. Lemme make it stronger. The construct has been in POSIX for the past 10+ years, and we have used in t9xxx (subversion) series of the tests, so we know it is at portable across systems that people have run those tests, which is almost everything we'd care about. Thanks.
diff --git a/t/t5562-http-backend-content-length.sh b/t/t5562-http-backend-content-length.sh index 4a110b307e..3f4ac71f83 100755 --- a/t/t5562-http-backend-content-length.sh +++ b/t/t5562-http-backend-content-length.sh @@ -53,15 +53,20 @@ test_expect_success 'setup' ' test_commit c1 && hash_head=$(git rev-parse HEAD) && hash_prev=$(git rev-parse HEAD~1) && - printf "want %s" "$hash_head" | packetize >fetch_body && - printf 0000 >>fetch_body && - printf "have %s" "$hash_prev" | packetize >>fetch_body && - printf done | packetize >>fetch_body && + { + packetize "want $hash_head" && + printf 0000 && + packetize "have $hash_prev" && + packetize "done" + } >fetch_body && test_copy_bytes 10 <fetch_body >fetch_body.trunc && hash_next=$(git commit-tree -p HEAD -m next HEAD^{tree}) && - printf "%s %s refs/heads/newbranch\\0report-status\\n" "$ZERO_OID" "$hash_next" | packetize >push_body && - printf 0000 >>push_body && - echo "$hash_next" | git pack-objects --stdout >>push_body && + { + printf "%s %s refs/heads/newbranch\\0report-status\\n" \ + "$ZERO_OID" "$hash_next" | packetize && + printf 0000 && + echo "$hash_next" | git pack-objects --stdout + } >push_body && test_copy_bytes 10 <push_body >push_body.trunc && : >empty_body ' diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 352c213d52..216918a58c 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -1362,14 +1362,23 @@ nongit () { ) } 7>&2 2>&4 -# convert stdin to pktline representation; note that empty input becomes an -# empty packet, not a flush packet (for that you can just print 0000 yourself). +# convert function arguments or stdin (if not arguments given) to pktline +# representation. If multiple arguments are given, they are separated by +# whitespace and put in a single packet. Note that data containing NULs must be +# given on stdin, and that empty input becomes an empty packet, not a flush +# packet (for that you can just print 0000 yourself). packetize() { - cat >packetize.tmp && - len=$(wc -c <packetize.tmp) && - printf '%04x%s' "$(($len + 4))" && - cat packetize.tmp && - rm -f packetize.tmp + if test $# -gt 0 + then + packet="$*" + printf '%04x%s' "$((4 + ${#packet}))" "$packet" + else + cat >packetize.tmp && + len=$(wc -c <packetize.tmp) && + printf '%04x' "$(($len + 4))" && + cat packetize.tmp && + rm -f packetize.tmp + fi } # Parse the input as a series of pktlines, writing the result to stdout.
The packetize() function takes its input on stdin, and requires 4 separate sub-processes to format a simple string. We can do much better by getting the length via the shell's "${#packet}" construct. The one caveat is that the shell can't put a NUL into a variable, so we'll have to continue to provide the stdin form for a few calls. There are a few other cleanups here in the touched code: - the stdin form of packetize() had an extra stray "%s" when printing the packet - the converted calls in t5562 can be made simpler by redirecting output as a block, rather than repeated appending Signed-off-by: Jeff King <peff@peff.net> --- t/t5562-http-backend-content-length.sh | 19 ++++++++++++------- t/test-lib-functions.sh | 23 ++++++++++++++++------- 2 files changed, 28 insertions(+), 14 deletions(-)