[2/3] test-lib: simplify lsan results check

Message ID: 20250107070752.GB584668@coredump.intra.peff.net
State: New
Date: Tue, 7 Jan 2025 02:07:52 -0500
From: Jeff King <peff@peff.net>
To: Patrick Steinhardt <ps@pks.im>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: [PATCH 2/3] test-lib: simplify lsan results check
In-Reply-To: <20250107070409.GA584456@coredump.intra.peff.net>
Series: lsan test-lib readability
  [0/3] lsan test-lib readability
  [1/3] test-lib: invert return value of check_test_results_san_file_empty
  [2/3] test-lib: simplify lsan results check
  [3/3] test-lib: add a few comments to LSan log checking

Jeff King Jan. 7, 2025, 7:07 a.m. UTC

We want to know if there are any leaks logged by LSan in the results
directory, so we run "find" on the containing directory and pipe it to
xargs. We can accomplish the same thing by just globbing in the shell
and passing the result to grep, which has a few advantages:

  - it's one fewer process to run

  - we can glob on the TEST_RESULTS_SAN_FILE pattern, which is what we
    checked at the beginning of the function, and is the same glob use
    to show the logs in check_test_results_san_file_

  - this correctly handles the case where TEST_OUTPUT_DIRECTORY has a
    space in it. For example doing:

       mkdir "/tmp/foo bar"
       TEST_OUTPUT_DIRECTORY="/tmp/foo bar" make SANITIZE=leak test

    would yield a lot of:

      grep: /tmp/foo: No such file or directory
      grep: bar/test-results/t0006-date.leak/trace.test-tool.582311: No such file or directory

    when there are leaks. We could do the same thing with "xargs
    --null", but that isn't portable.

We are now subject to command-line length limits, but that is also true
of the globbing cat used to show the logs themselves. This hasn't been a
problem in practice.

We do need to use "grep -s" for the case that the glob does not expand
(i.e., there are not any log files at all). This option is in POSIX, and
has been used in t7407 for several years without anybody complaining.
This also naturally handles the case where the surrounding
directory has already been removed (in which case there are likewise no
files!), dropping the need to comment about it.

Signed-off-by: Jeff King <peff@peff.net>
---
I was surprised by the use of "grep -s" in t7407, since it is totally
pointless there. But I think we can take its presence as a positive sign
for portability.

 t/test-lib.sh | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)
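
The whitespace problem described in the commit message can be reproduced outside the test suite. The following is a minimal sketch, not part of the patch: the scratch directory, the file name trace.test-tool.1, and the log contents are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical scratch layout; none of these names come from test-lib.sh.
base=$(mktemp -d)
dir="$base/foo bar"
mkdir -p "$dir"
printf 'DEDUP_TOKEN: leak_here\n' >"$dir/trace.test-tool.1"

# Old approach: xargs splits find's output on whitespace, so grep is handed
# two bogus paths (".../foo" and "bar/trace.test-tool.1") and matches nothing.
old=$(find "$dir" -type f -name 'trace.*' | xargs grep ^DEDUP_TOKEN 2>/dev/null)

# New approach: the shell expands the glob itself and passes each matching
# path to grep as a single argument, spaces included.
new=$(grep -s ^DEDUP_TOKEN "$dir"/trace.*)

echo "find | xargs: ${old:-<no match>}"
echo "glob:         ${new:-<no match>}"

rm -rf "$base"
```

Quoting the variable part of the path while leaving the ".*" suffix unquoted is what lets the glob expand without the directory name being word-split.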

Patrick Steinhardt Jan. 7, 2025, 7:37 a.m. UTC | #1

On Tue, Jan 07, 2025 at 02:07:52AM -0500, Jeff King wrote:
> We want to know if there are any leaks logged by LSan in the results
> directory, so we run "find" on the containing directory and pipe it to
> xargs. We can accomplish the same thing by just globbing in the shell
> and passing the result to grep, which has a few advantages:
> 
>   - it's one fewer process to run
> 
>   - we can glob on the TEST_RESULTS_SAN_FILE pattern, which is what we
>     checked at the beginning of the function, and is the same glob use

s/use/used

I'm always a bit thrown off by your style of bulleted lists, where they
feel like sentences but start with a lower-case letter, and sometimes
they do and sometimes they don't end with punctuation. Maybe it's just
me not being a native speaker and it's a natural thing to do in English.
In any case, it's nothing that really matters in the end, but would be
happy to learn if this is indeed something you tend to do in English.

>     to show the logs in check_test_results_san_file_
> 
>   - this correctly handles the case where TEST_OUTPUT_DIRECTORY has a
>     space in it. For example doing:
> 
>        mkdir "/tmp/foo bar"
>        TEST_OUTPUT_DIRECTORY="/tmp/foo bar" make SANITIZE=leak test
> 
>     would yield a lot of:
> 
>       grep: /tmp/foo: No such file or directory
>       grep: bar/test-results/t0006-date.leak/trace.test-tool.582311: No such file or directory
> 
>     when there are leaks. We could do the same thing with "xargs
>     --null", but that isn't portable.
> 
> We are now subject to command-line length limits, but that is also true
> of the globbing cat used to show the logs themselves. This hasn't been a
> problem in practice.

Yup, this also came to my mind immediately. But I agree that it
shouldn't be an issue in general.

> We do need to use "grep -s" for the case that the glob does not expand
> (i.e., there are not any log files at all). This option is in POSIX, and
> has been used in t7407 for several years without anybody complaining.
> This also naturally handles the case where the surrounding
> directory has already been removed (in which case there are likewise no
> files!), dropping the need to comment about it.

Okay. So in case there are no matching files we don't expand the
globbing string, and "--no-messages" makes us ignore that case. A bit
funny, but I don't see any issue with it.
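
The unexpanded-glob case can be sketched as follows. This is only an illustration under assumptions: it relies on the default POSIX shell behavior of leaving a non-matching pattern as a literal string, and the directory and variable names are hypothetical.

```shell
#!/bin/sh
# Hypothetical empty results directory; no trace.* files are ever created.
base=$(mktemp -d)
results="$base/results"
mkdir "$results"

# The pattern matches nothing, so grep receives the literal ".../trace.*"
# and complains that no such file exists.
noisy=$(grep ^DEDUP_TOKEN "$results"/trace.* 2>&1)

# With -s, the same failure produces no message.
quiet=$(grep -s ^DEDUP_TOKEN "$results"/trace.* 2>&1)

# Same outcome once the directory itself has been removed.
rmdir "$results"
gone=$(grep -s ^DEDUP_TOKEN "$results"/trace.* 2>&1)

# The second grep sees empty input, selects no lines, and exits non-zero,
# so the pipeline as a whole reports "no entries".
grep -s ^DEDUP_TOKEN "$results"/trace.* |
grep -qv sanitizer::GetThreadStackTopAndBottom
pipe_status=$?

echo "without -s: $noisy"
echo "with -s: '$quiet' (dir removed: '$gone'), pipeline status $pipe_status"

rm -rf "$base"
```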

> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I was surprised by the use of "grep -s" in t7407, since it is totally
> pointless there. But I think we can take its presence as a positive sign
> for portability.

Good to know.

>  t/test-lib.sh | 7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index be3553e40e..898c2267b8 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -1172,12 +1172,7 @@ test_atexit_handler () {
>  check_test_results_san_file_has_entries_ () {
>  	test -z "$TEST_RESULTS_SAN_FILE" && return 1
>  
> -	# stderr piped to /dev/null because the directory may have
> -	# been "rmdir"'d already.
> -	find "$TEST_RESULTS_SAN_DIR" \
> -		-type f \
> -		-name "$TEST_RESULTS_SAN_FILE_PFX.*" 2>/dev/null |
> -	xargs grep ^DEDUP_TOKEN |
> +	grep -s ^DEDUP_TOKEN "$TEST_RESULTS_SAN_FILE".* |
>  	grep -qv sanitizer::GetThreadStackTopAndBottom

And this nicely simplifies things indeed.
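
For readers who want to poke at the resulting pipeline in isolation, here is a rough standalone sketch. The function name san_logs_have_entries and its "$1" argument are hypothetical stand-ins for the real helper and TEST_RESULTS_SAN_FILE, and the log contents are invented.

```shell
#!/bin/sh
# Hypothetical stand-in for the patched helper: takes the log prefix as "$1"
# instead of reading TEST_RESULTS_SAN_FILE.
san_logs_have_entries () {
	test -z "$1" && return 1

	grep -s ^DEDUP_TOKEN "$1".* |
	grep -qv sanitizer::GetThreadStackTopAndBottom
}

base=$(mktemp -d)

# A log whose only token is the one the second grep filters out.
printf 'DEDUP_TOKEN: sanitizer::GetThreadStackTopAndBottom\n' >"$base/trace.a.1"
san_logs_have_entries "$base/trace.a"
only_ignored=$?

# A log with an invented token that is not filtered out.
printf 'DEDUP_TOKEN: some_function--main\n' >"$base/trace.b.1"
san_logs_have_entries "$base/trace.b"
real_entry=$?

# No log files at all for this prefix.
san_logs_have_entries "$base/trace.none"
no_logs=$?

echo "only ignored token: $only_ignored, real entry: $real_entry, no logs: $no_logs"

rm -rf "$base"
```

A return value of 0 means at least one DEDUP_TOKEN line survived the filter; 1 means nothing was left to report.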

Patrick

Junio C Hamano Jan. 7, 2025, 4:23 p.m. UTC | #2

Jeff King <peff@peff.net> writes:

> We want to know if there are any leaks logged by LSan in the results
> directory, so we run "find" on the containing directory and pipe it to
> xargs. We can accomplish the same thing by just globbing in the shell
> and passing the result to grep, which has a few advantages:
>
>   - it's one fewer process to run
> ...
> We are now subject to command-line length limits, but that is also true
> of the globbing cat used to show the logs themselves. This hasn't been a
> problem in practice.

Nice to see it mentioned here.  And the resulting code does become
simpler to reason about.

> We do need to use "grep -s" for the case that the glob does not expand
> (i.e., there are not any log files at all). This option is in POSIX, and
> has been used in t7407 for several years without anybody complaining.

Also since c625bf0e (git-p4: git-p4 tests with p4 triggers,
2017-07-13) t9831 has also been using it.  It is not like a stray
error message about unmatched glob would really matter here, though.
We are not doing 2>&1 to let the downstream of the pipe see it, and
unless the test is run under "-v" option, it wouldn't even be seen.
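
The point about the pipe can be checked with a small sketch. The paths and variable names below are made up for illustration; the only claim is the ordinary shell rule that a pipe carries stdout, not stderr, unless 2>&1 is given.

```shell
#!/bin/sh
# Hypothetical scratch directory and a log path that does not exist.
base=$(mktemp -d)
missing="$base/no-such-log"

# Without 2>&1, grep's complaint goes to stderr and bypasses the pipe,
# so the downstream command receives nothing.
{ grep ^DEDUP_TOKEN "$missing" | cat >"$base/downstream"; } 2>"$base/stderr"
downstream=$(cat "$base/downstream")
errors=$(cat "$base/stderr")

# With 2>&1, the same message is fed to the downstream command.
grep ^DEDUP_TOKEN "$missing" 2>&1 | cat >"$base/merged"
merged=$(cat "$base/merged")

echo "downstream saw: '${downstream}'"
echo "stderr held:    '${errors}'"
echo "with 2>&1:      '${merged}'"

rm -rf "$base"
```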

> This also naturally handles the case where the surrounding
> directory has already been removed (in which case there are likewise no
> files!), dropping the need to comment about it.

Nice.

Thanks.

Jeff King Jan. 9, 2025, 7:57 a.m. UTC | #3

On Tue, Jan 07, 2025 at 08:37:33AM +0100, Patrick Steinhardt wrote:

> On Tue, Jan 07, 2025 at 02:07:52AM -0500, Jeff King wrote:
> > We want to know if there are any leaks logged by LSan in the results
> > directory, so we run "find" on the containing directory and pipe it to
> > xargs. We can accomplish the same thing by just globbing in the shell
> > and passing the result to grep, which has a few advantages:
> > 
> >   - it's one fewer process to run
> > 
> >   - we can glob on the TEST_RESULTS_SAN_FILE pattern, which is what we
> >     checked at the beginning of the function, and is the same glob use
> 
> s/use/used
> 
> I'm always a bit thrown off by your style of bulleted lists, where they
> feel like sentences but start with a lower-case letter, and sometimes
> they do and sometimes they don't end with punctuation. Maybe it's just
> me not being a native speaker and it's a natural thing to do in English.
> In any case, it's nothing that really matters in the end, but would be
> happy to learn if this is indeed something you tend to do in English.

Heh. Yeah, I've seen you mention them before and I've been tempted to
start a big discussion. But I never felt like it was worth it. But
tonight's your lucky night. ;)

In short: I think it's a style question. I perceive them as
continuations of the sentence that has the ":". Though admittedly I do
not always grammatically continue that sentence. So for example I could:

  - have one bullet item that completes the sentence.

  - and then another that likewise completes it.

;) I think many style guides would frown on that. Especially with the
periods at the end (you might argue that they should be semicolons).

In the example you quoted above they don't grammatically continue the
sentence, so arguably what I'm saying doesn't even apply. But I also
kind of think of the list items as sentence fragments. That sometimes
happen to make a full sentence. Or need punctuation because that
fragment gets so long it contains multiple sentences.

I dunno. You asked if it is something you tend to do in English. It is
something _I_ tend to do in English, but I think most style guides would
suggest against it (but then, most also suggest against bulleted lists
in the first place). (They probably also suggest against lots of
parentheses).  So I wouldn't necessarily copy me.

My general feeling is that unless a commit message is inaccurate or hard
to understand, we should mostly let it pass (even typos). Yes, they are
an artifact that is enshrined in the history. But at some point they are
also just a written communication between developers, and we all have
our own voices and styles. And make mistakes. Polishing them is
something we _can_ do collaboratively, but there are diminishing
returns.

In case it is not clear, I would not say the same for documentation,
error messages, etc. Those are artifacts that hit a wider audience, and
we have a tool for polishing them together: git.

And people should still proofread and correct their own messages before
sending. Believe it or not, I do always take a final pass when sending
out my commits and still manage to have errors. ;) A lot of times I end
up improving clarity and wording on the final pass, but end up
introducing a typo (I'm pretty sure that the use/used above was me
switching last-minute between "the same glob we use" and "the same glob
used").

Bringing it back to the example at hand, my assumption is that the
bullet list capitalization and punctuation is mostly a question of
style, and isn't making the result hard to understand. But if it is, I
can try to adjust. I actually wrote a bulleted list in a commit message
earlier today and capitalized it just for you. :)

-Peff

Jeff King Jan. 9, 2025, 7:59 a.m. UTC | #4

On Tue, Jan 07, 2025 at 08:23:34AM -0800, Junio C Hamano wrote:

> > We do need to use "grep -s" for the case that the glob does not expand
> > (i.e., there are not any log files at all). This option is in POSIX, and
> > has been used in t7407 for several years without anybody complaining.
> 
> Also since c625bf0e (git-p4: git-p4 tests with p4 triggers,
> 2017-07-13) t9831 has also been using it.  It is not like a stray
> error message about unmatched glob would really matter here, though.
> We are not doing 2>&1 to let the downstream of the pipe see it, and
> unless the test is run under "-v" option, it wouldn't even be seen.

Yeah, I saw those. But I don't think they count since hardly anybody
runs the p4 tests. They do run in CI, but on a rather limited set of
platforms. Though come to think of it, this one would only kick in for
LSan, which may also run on a pretty limited set of platforms. :)

-Peff

Patrick Steinhardt Jan. 9, 2025, 10 a.m. UTC | #5

On Thu, Jan 09, 2025 at 02:57:50AM -0500, Jeff King wrote:
> On Tue, Jan 07, 2025 at 08:37:33AM +0100, Patrick Steinhardt wrote:
> 
> > On Tue, Jan 07, 2025 at 02:07:52AM -0500, Jeff King wrote:
> > > We want to know if there are any leaks logged by LSan in the results
> > > directory, so we run "find" on the containing directory and pipe it to
> > > xargs. We can accomplish the same thing by just globbing in the shell
> > > and passing the result to grep, which has a few advantages:
> > > 
> > >   - it's one fewer process to run
> > > 
> > >   - we can glob on the TEST_RESULTS_SAN_FILE pattern, which is what we
> > >     checked at the beginning of the function, and is the same glob use
> > 
> > s/use/used
> > 
> > I'm always a bit thrown off by your style of bulleted lists, where they
> > feel like sentences but start with a lower-case letter, and sometimes
> > they do and sometimes they don't end with punctuation. Maybe it's just
> > me not being a native speaker and it's a natural thing to do in English.
> > In any case, it's nothing that really matters in the end, but would be
> > happy to learn if this is indeed something you tend to do in English.
> 
> Heh. Yeah, I've seen you mention them before and I've been tempted to
> start a big discussion. But I never felt like it was worth it. But
> tonight's your lucky night. ;)
> 
> In short: I think it's a style question. I perceive them as
> continuations of the sentence that has the ":". Though admittedly I do
> not always grammatically continue that sentence. So for example I could:
> 
>   - have one bullet item that completes the sentence.
> 
>   - and then another that likewise completes it.
> 
> ;) I think many style guides would frown on that. Especially with the
> periods at the end (you might argue that they should be semicolons).
> 
> In the example you quoted above they don't grammatically continue the
> sentence, so arguably what I'm saying doesn't even apply. But I also
> kind of think of the list items as sentence fragments. That sometimes
> happen to make a full sentence. Or need punctuation because that
> fragment gets so long it contains multiple sentences.
> 
> I dunno. You asked if it is something you tend to do in English. It is
> something _I_ tend to do in English, but I think most style guides would
> suggest against it (but then, most also suggest against bulleted lists
> in the first place). (They probably also suggest against lots of
> parentheses).  So I wouldn't necessarily copy me.
> 
> My general feeling is that unless a commit message is inaccurate or hard
> to understand, we should mostly let it pass (even typos). Yes, they are
> an artifact that is enshrined in the history. But at some point they are
> also just a written communication between developers, and we all have
> our own voices and styles. And make mistakes. Polishing them is
> something we _can_ do collaboratively, but there are diminishing
> returns.

Yup, agreed. It's a minor detail and I'm happy to gloss over it in the
future.

> In case it is not clear, I would not say the same for documentation,
> error messages, etc. Those are artifacts that hit a wider audience, and
> we have a tool for polishing them together: git.
> 
> And people should still proofread and correct their own messages before
> sending. Believe it or not, I do always take a final pass when sending
> out my commits and still manage to have errors. ;) A lot of times I end
> up improving clarity and wording on the final pass, but end up
> introducing a typo (I'm pretty sure that the use/used above was me
> switching last-minute between "the same glob we use" and "the same glob
> used").
> 
> Bringing it back to the example at hand, my assumption is that the
> bullet list capitalization and punctuation is mostly a question of
> style, and isn't making the result hard to understand. But if it is, I
> can try to adjust. I actually wrote a bulleted list in a commit message
> earlier today and capitalized it just for you. :)

Thanks for explaining!

Patrick
