Message ID | 203cb627-2423-8a35-d280-9f9ffc66e072@web.de (mailing list archive) |
---|---|
State | New, archived |
Series | t3920: don't ignore errors of more than one command with `|| true` |
On Fri, Dec 2, 2022 at 11:51 AM René Scharfe <l.s.r@web.de> wrote:
> Use tee(1) to replace two calls of cat(1) for writing files with
> different line endings. That's shorter and spawns less processes.
> [...]
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
> diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
> @@ -9,8 +9,7 @@ LIB_CRLF_BRANCHES=""
>  create_crlf_ref () {
> -	cat >.crlf-orig-$branch.txt &&
> -	cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
> +	tee .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&

This feels slightly magical and more difficult to reason about than
using simple redirection to eliminate the second `cat`. Wouldn't this
work just as well?

    cat >.crlf-orig-$branch.txt &&
    append_cr <.crlf-orig-$branch.txt >.crlf-message-$branch.txt &&

(Plus, this avoids introducing `tee` into the test suite, more or
less. The few existing instances are all from the same test author and
don't seem particularly legitimate -- they appear to be aids the
author used while developing the test to be able to watch its output
as it ran.)

Am 03.12.22 um 06:09 schrieb Eric Sunshine:
> On Fri, Dec 2, 2022 at 11:51 AM René Scharfe <l.s.r@web.de> wrote:
>> Use tee(1) to replace two calls of cat(1) for writing files with
>> different line endings. That's shorter and spawns less processes.
>> [...]
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>> diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
>> @@ -9,8 +9,7 @@ LIB_CRLF_BRANCHES=""
>>  create_crlf_ref () {
>> -	cat >.crlf-orig-$branch.txt &&
>> -	cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>> +	tee .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>
> This feels slightly magical and more difficult to reason about than
> using simple redirection to eliminate the second `cat`. Wouldn't this
> work just as well?
>
>     cat >.crlf-orig-$branch.txt &&
>     append_cr <.crlf-orig-$branch.txt >.crlf-message-$branch.txt &&

It would work, of course, but this is the exact use case for tee(1). No
repetition, no extra redirection symbols, just a nicely fitting piece
of pipework. Don't fear the tee! ;-)

(I'm delighted to learn from https://en.wikipedia.org/wiki/Tee_(command)
that PowerShell has a tee command as well.)

> (Plus, this avoids introducing `tee` into the test suite, more or
> less. The few existing instances are all from the same test author and
> don't seem particularly legitimate -- they appear to be aids the
> author used while developing the test to be able to watch its output
> as it ran.)

I agree that the tee calls in t1001 and t5523 are unnecessary.

René

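For readers comparing the two variants discussed above, here is a minimal sketch that runs both on the same input and checks that the resulting files are identical. The `append_cr` function below is a stand-in written for this example (assumed to add a carriage return to every line, like the test suite helper of the same name); the input text, the `demo` branch name, and the `orig-b.txt` and `message-b.txt` file names are hypothetical.

```shell
# Stand-in for the test suite's append_cr helper (assumption: it appends
# a carriage return to every line read from stdin). The input must not
# contain the letter Q, which is used as a temporary marker.
append_cr () {
	sed -e 's/$/Q/' | tr Q '\015'
}

tmp=$(mktemp -d) && cd "$tmp" || exit 1
branch=demo

# Variant A: tee writes the LF copy while feeding append_cr.
printf 'Subject: one\n\nBody: two\n' |
tee .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt

# Variant B: plain redirection; the LF file is complete before it is re-read.
printf 'Subject: one\n\nBody: two\n' >orig-b.txt &&
append_cr <orig-b.txt >message-b.txt

# Both variants should leave behind byte-identical files.
cmp .crlf-orig-$branch.txt orig-b.txt &&
cmp .crlf-message-$branch.txt message-b.txt &&
echo "both variants produce identical files"
```

When the downstream command reads to EOF, as here, the two variants differ only in readability and in the number of processes spawned.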
On Sat, Dec 03 2022, René Scharfe wrote:

> Am 03.12.22 um 06:09 schrieb Eric Sunshine:
>> On Fri, Dec 2, 2022 at 11:51 AM René Scharfe <l.s.r@web.de> wrote:
>>> Use tee(1) to replace two calls of cat(1) for writing files with
>>> different line endings. That's shorter and spawns less processes.
>>> [...]
>>> Signed-off-by: René Scharfe <l.s.r@web.de>
>>> ---
>>> diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
>>> @@ -9,8 +9,7 @@ LIB_CRLF_BRANCHES=""
>>>  create_crlf_ref () {
>>> -	cat >.crlf-orig-$branch.txt &&
>>> -	cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>>> +	tee .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>>
>> This feels slightly magical and more difficult to reason about than
>> using simple redirection to eliminate the second `cat`. Wouldn't this
>> work just as well?
>>
>>     cat >.crlf-orig-$branch.txt &&
>>     append_cr <.crlf-orig-$branch.txt >.crlf-message-$branch.txt &&
>
> It would work, of course, but this is the exact use case for tee(1). No
> repetition, no extra redirection symbols, just a nicely fitting piece
> of pipework. Don't fear the tee! ;-)
>
> (I'm delighted to learn from https://en.wikipedia.org/wiki/Tee_(command)
> that PowerShell has a tee command as well.)

I don't really care, but I must say I agree with Eric here. Not having
surprising patterns in the test suite has a value of its own.

In this case I wonder if you want to optimize this whether we couldn't
do much better with "test_commit_bulk", maybe by teaching it a small set
of new tricks.

I.e. if I do:

    git fast-export --all

At the end of the setup test it seems we just end up with refs with
names that correspond to their contents, and with double newlines in
them or whatever. This is a lot of "grep", "sed", "tr" etc. just to end
up with that.

So maybe we can create them as a patch, possibly with some slight "sed"
munging on the input stream, or just teach it to accept a "ref prefix"
and "commit message contents". That could just be an argument that you
"$(printf "...")", so we don't even need a sub-process....

Also this:

    perl -wE 'say for 1..1024*100' | tee /tmp/x | perl -nE 'print "in: $_"; exit 1 if $_ == 512'; tail -n 1 /tmp/x

Isn't deterministic. Now, in this case I doubt it matters, but it's nice
to have intermediate files in the test suite be deterministic, i.e. to
always have the full content be in the file at the top after the "top".

With a "tee" you need to worry about the "append_cr" function it's being
piped in stopping the stdin.

I don't think it matters in this case, but in general as a pattern: I do
fear the "tee" a bit :)

Am 03.12.22 um 13:53 schrieb Ævar Arnfjörð Bjarmason:
>
> On Sat, Dec 03 2022, René Scharfe wrote:
>
>> Am 03.12.22 um 06:09 schrieb Eric Sunshine:
>>> On Fri, Dec 2, 2022 at 11:51 AM René Scharfe <l.s.r@web.de> wrote:
>>>> Use tee(1) to replace two calls of cat(1) for writing files with
>>>> different line endings. That's shorter and spawns less processes.
>>>> [...]
>>>> Signed-off-by: René Scharfe <l.s.r@web.de>
>>>> ---
>>>> diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
>>>> @@ -9,8 +9,7 @@ LIB_CRLF_BRANCHES=""
>>>>  create_crlf_ref () {
>>>> -	cat >.crlf-orig-$branch.txt &&
>>>> -	cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>>>> +	tee .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>>>
>>> This feels slightly magical and more difficult to reason about than
>>> using simple redirection to eliminate the second `cat`. Wouldn't this
>>> work just as well?
>>>
>>>     cat >.crlf-orig-$branch.txt &&
>>>     append_cr <.crlf-orig-$branch.txt >.crlf-message-$branch.txt &&
>>
>> It would work, of course, but this is the exact use case for tee(1). No
>> repetition, no extra redirection symbols, just a nicely fitting piece
>> of pipework. Don't fear the tee! ;-)
>>
>> (I'm delighted to learn from https://en.wikipedia.org/wiki/Tee_(command)
>> that PowerShell has a tee command as well.)
>
> I don't really care, but I must say I agree with Eric here. Not having
> surprising patterns in the test suite has a value of its own.

That's a good general guideline, but I wouldn't have expected a pipe
with three holes to startle anyone. *shrug*

> In this case I wonder if you want to optimize this whether we couldn't
> do much better with "test_commit_bulk", maybe by teaching it a small set
> of new tricks.
>
> I.e. if I do:
>
>     git fast-export --all
>
> At the end of the setup test it seems we just end up with refs with
> names that correspond to their contents, and with double newlines in
> them or whatever. This is a lot of "grep", "sed", "tr" etc. just to end
> up with that.
>
> So maybe we can create them as a patch, possibly with some slight "sed"
> munging on the input stream, or just teach it to accept a "ref prefix"
> and "commit message contents". That could just be an argument that you
> "$(printf "...")", so we don't even need a sub-process....

The files are used later for verification, so their contents can't just
be passed on via parameters.

Had a similar idea and spent too much time on creating the four files in
a single awk invocation. The code was too verbose and yet hard to read
for my taste.

> Also this:
>
>     perl -wE 'say for 1..1024*100' | tee /tmp/x | perl -nE 'print "in: $_"; exit 1 if $_ == 512'; tail -n 1 /tmp/x
>
> Isn't deterministic. Now, in this case I doubt it matters, but it's nice
> to have intermediate files in the test suite be deterministic, i.e. to
> always have the full content be in the file at the top after the "top".

Whoa, such a one-liner is a good argument for banishing Perl.

So to rephrase it in a way that I can understand, you say that something
like this:

   $ cd /tmp; seq 100000 | tee x | head -1 >/dev/null; wc -l x

... will probably report less than 100000 lines because the downpipe
command ends the whole thing early.

> With a "tee" you need to worry about the "append_cr" function it's being
> piped in stopping the stdin.
>
> I don't think it matters in this case, but in general as a pattern: I do
> fear the "tee" a bit :)

Right, append_cr reads until EOF.

René

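The `seq` one-liner above can be turned into a small side-by-side sketch. It assumes `seq` is available and uses hypothetical file names `x` and `y` in a scratch directory. The line count left behind by the `tee` variant depends on timing, because `tee` may be terminated by SIGPIPE once `head` exits, so no fixed number is claimed for it. The redirection variant finishes writing the file before anything reads it.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# tee variant: head exits after one line, so tee may be stopped early
# and x can end up with anywhere up to 100000 lines.
seq 100000 | tee x | head -n 1 >/dev/null
tee_lines=$(wc -l <x)

# Redirection variant: y is written completely before head reads it.
seq 100000 >y && head -n 1 <y >/dev/null
redir_lines=$(wc -l <y)

echo "tee: $tee_lines lines (may vary between runs)"
echo "redirection: $redir_lines lines"
```

This only matters when the downstream command can stop reading before EOF; a reader such as `append_cr` that consumes all of its input does not trigger it.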
On Sat, Dec 03 2022, René Scharfe wrote:

> Am 03.12.22 um 13:53 schrieb Ævar Arnfjörð Bjarmason:
>>
>> On Sat, Dec 03 2022, René Scharfe wrote:
>>
>>> Am 03.12.22 um 06:09 schrieb Eric Sunshine:
>>>> On Fri, Dec 2, 2022 at 11:51 AM René Scharfe <l.s.r@web.de> wrote:
>>>>> Use tee(1) to replace two calls of cat(1) for writing files with
>>>>> different line endings. That's shorter and spawns less processes.
>>>>> [...]
>>>>> Signed-off-by: René Scharfe <l.s.r@web.de>
>>>>> ---
>>>>> diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
>>>>> @@ -9,8 +9,7 @@ LIB_CRLF_BRANCHES=""
>>>>>  create_crlf_ref () {
>>>>> -	cat >.crlf-orig-$branch.txt &&
>>>>> -	cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>>>>> +	tee .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>>>>
>>>> This feels slightly magical and more difficult to reason about than
>>>> using simple redirection to eliminate the second `cat`. Wouldn't this
>>>> work just as well?
>>>>
>>>>     cat >.crlf-orig-$branch.txt &&
>>>>     append_cr <.crlf-orig-$branch.txt >.crlf-message-$branch.txt &&
>>>
>>> It would work, of course, but this is the exact use case for tee(1). No
>>> repetition, no extra redirection symbols, just a nicely fitting piece
>>> of pipework. Don't fear the tee! ;-)
>>>
>>> (I'm delighted to learn from https://en.wikipedia.org/wiki/Tee_(command)
>>> that PowerShell has a tee command as well.)
>>
>> I don't really care, but I must say I agree with Eric here. Not having
>> surprising patterns in the test suite has a value of its own.
>
> That's a good general guideline, but I wouldn't have expected a pipe
> with three holes to startle anyone. *shrug*

It's more that you're used to seeing one thing, the "cat >in" at the
start of a function is a common pattern.

Then it takes some time to stop and grok a new pattern. If I was
hacking on a function like that I'd probably stop to try to understand
"why", even though I understood the "what".

I'd then find it was to try to optimize things on Windows a bit... :)

I'm not saying it's not worth it in this case, just pointing out that
boring "standard" patterns have a value of their own in us collectively
understanding them, which has a value of its own. Whether optimizing a
test case outweighs that is another matter (sometimes it would).

>> In this case I wonder if you want to optimize this whether we couldn't
>> do much better with "test_commit_bulk", maybe by teaching it a small set
>> of new tricks.
>>
>> I.e. if I do:
>>
>>     git fast-export --all
>>
>> At the end of the setup test it seems we just end up with refs with
>> names that correspond to their contents, and with double newlines in
>> them or whatever. This is a lot of "grep", "sed", "tr" etc. just to end
>> up with that.
>>
>> So maybe we can create them as a patch, possibly with some slight "sed"
>> munging on the input stream, or just teach it to accept a "ref prefix"
>> and "commit message contents". That could just be an argument that you
>> "$(printf "...")", so we don't even need a sub-process....
>
> The files are used later for verification, so their contents can't just
> be passed on via parameters.
>
> Had a similar idea and spent too much time on creating the four files in
> a single awk invocation. The code was too verbose and yet hard to read
> for my taste.

Hah, I didn't try. Just a suggestion in case it made sense :)

>> Also this:
>>
>>     perl -wE 'say for 1..1024*100' | tee /tmp/x | perl -nE 'print "in: $_"; exit 1 if $_ == 512'; tail -n 1 /tmp/x
>>
>> Isn't deterministic. Now, in this case I doubt it matters, but it's nice
>> to have intermediate files in the test suite be deterministic, i.e. to
>> always have the full content be in the file at the top after the "top".
>
> Whoa, such a one-liner is a good argument for banishing Perl.
>
> So to rephrase it in a way that I can understand, you say that something
> like this:
>
>    $ cd /tmp; seq 100000 | tee x | head -1 >/dev/null; wc -l x
>
> ... will probably report less than 100000 lines because the downpipe
> command ends the whole thing early.

Yes, the "perl" line was just a quick demo hack. But the point is that
the initial perl process on the LHS will be killed with a SIGPIPE as the
"perl" on the RHS stops and a SIGPIPE is propagated up the chain.

I don't think it matters in this case, but just pointing out that it
*is* an edge case this sort of pattern introduces.

I've sometimes resorted to recursively diffing the trash directories of
two test runs to see if they're the same. E.g. I've caught cases where
the stderr of programs unexpectedly changes, but we had no test coverage
for it.

I think it's good to avoid patterns in general that make test runs
nondeterministic. In this case it's only nondeterministic on failure, so
it's probably fine.

On Sun, Dec 4, 2022 at 4:41 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> On Sat, Dec 03 2022, René Scharfe wrote:
> > Am 03.12.22 um 13:53 schrieb Ævar Arnfjörð Bjarmason:
> >> On Sat, Dec 03 2022, René Scharfe wrote:
> >>> Am 03.12.22 um 06:09 schrieb Eric Sunshine:
> >>>> On Fri, Dec 2, 2022 at 11:51 AM René Scharfe <l.s.r@web.de> wrote:
> >>>>> -	cat >.crlf-orig-$branch.txt &&
> >>>>> -	cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
> >>>>> +	tee .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
> >>>>
> >>>> This feels slightly magical and more difficult to reason about than
> >>>> using simple redirection to eliminate the second `cat`. Wouldn't this
> >>>> work just as well?
> >>>>
> >>>>     cat >.crlf-orig-$branch.txt &&
> >>>>     append_cr <.crlf-orig-$branch.txt >.crlf-message-$branch.txt &&
> >>>
> >>> It would work, of course, but this is the exact use case for tee(1). No
> >>> repetition, no extra redirection symbols, just a nicely fitting piece
> >>> of pipework. Don't fear the tee! ;-)
> >>
> >> I don't really care, but I must say I agree with Eric here. Not having
> >> surprising patterns in the test suite has a value of its own.
> >
> > That's a good general guideline, but I wouldn't have expected a pipe
> > with three holes to startle anyone. *shrug*
>
> It's more that you're used to seeing one thing, the "cat >in" at the
> start of a function is a common pattern.
>
> Then it takes some time to stop and grok a new pattern. If I was
> hacking on a function like that I'd probably stop to try to understand
> "why", even though I understood the "what".
>
> I'm not saying it's not worth it in this case, just pointing out that
> boring "standard" patterns have a value of their own in us collectively
> understanding them, which has a value of its own. Whether optimizing a
> test case outweighs that is another matter (sometimes it would).

Perhaps my experience is atypical, but in decades of using Unix, my
use of `tee` can (probably) be counted on a single finger, so the
patch, as implemented, did have higher cognitive load for me than a
patch using simple redirection would have had.

Anyhow, I mentioned the redirection approach, not to ask for a change,
but only in case you had overlooked the (to me) simpler approach. I
didn't expect it to spark so much discussion (though I do agree with
everything Ævar has said about following established patterns).

That said, I'm still rather unclear on the purpose of this patch. In a
sense, it feels like mere churn for 1/100 of a second gain (assuming
I'm reading the `hyperfine` output correctly).

diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
index 4fc9fa9cad..1f64ce565f 100755
--- a/t/t3920-crlf-messages.sh
+++ b/t/t3920-crlf-messages.sh
@@ -9,8 +9,7 @@ LIB_CRLF_BRANCHES=""
 create_crlf_ref () {
 	branch="$1" &&
-	cat >.crlf-orig-$branch.txt &&
-	cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
+	tee .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
 	grep 'Subject' .crlf-orig-$branch.txt | tr '\n' ' ' | sed 's/[ ]*$//' | tr -d '\n' >.crlf-subject-$branch.txt &&
 	grep 'Body' .crlf-orig-$branch.txt | append_cr >.crlf-body-$branch.txt &&
 	LIB_CRLF_BRANCHES="${LIB_CRLF_BRANCHES} ${branch}" &&

Use tee(1) to replace two calls of cat(1) for writing files with
different line endings. That's shorter and spawns less processes.

It has a small, but measurable performance impact on my Windows machine.
Here are the numbers before:

   $ (cd t && hyperfine.exe -w3 "sh.exe t3920-crlf-messages.sh")
   Benchmark 1: sh.exe t3920-crlf-messages.sh
     Time (mean ± σ):      5.705 s ±  0.047 s    [User: 0.000 s, System: 0.001 s]
     Range (min … max):    5.632 s …  5.772 s    10 runs

... and with this patch:

   $ (cd t && hyperfine.exe -w3 "sh.exe t3920-crlf-messages.sh")
   Benchmark 1: sh.exe t3920-crlf-messages.sh
     Time (mean ± σ):      5.616 s ±  0.021 s    [User: 0.001 s, System: 0.002 s]
     Range (min … max):    5.577 s …  5.644 s    10 runs

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 t/t3920-crlf-messages.sh | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--
2.38.1.windows.1