
[v2,2/2] Documentation: dev-tools: Enhance static analysis section with discussion

Message ID 11f4750c6d4c175994dfd36d1ff385f68f61bd02.1648593132.git.marcelo.schmitt1@gmail.com
State Not Applicable, archived
Series Add a section for static analysis tools

Commit Message

Marcelo Schmitt March 29, 2022, 11:23 p.m. UTC
Enhance the static analysis tools section with a discussion on when to
use each of them.

This was mainly taken from Dan Carpenter and Julia Lawall's comments on
the previous documentation patch for static analysis tools.

Lore: https://lore.kernel.org/linux-doc/20220329090911.GX3293@kadam/T/#mb97770c8e938095aadc3ee08f4ac7fe32ae386e6

Signed-off-by: Marcelo Schmitt <marcelo.schmitt1@gmail.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
---
 Documentation/dev-tools/testing-overview.rst | 33 ++++++++++++++++++++
 1 file changed, 33 insertions(+)

Comments

David Gow March 30, 2022, 2:48 a.m. UTC | #1
On Wed, Mar 30, 2022 at 7:23 AM Marcelo Schmitt
<marcelo.schmitt1@gmail.com> wrote:
>
> Enhance the static analysis tools section with a discussion on when to
> use each of them.
>
> This was mainly taken from Dan Carpenter and Julia Lawall's comments on
> the previous documentation patch for static analysis tools.
>
> Lore: https://lore.kernel.org/linux-doc/20220329090911.GX3293@kadam/T/#mb97770c8e938095aadc3ee08f4ac7fe32ae386e6
>
> Signed-off-by: Marcelo Schmitt <marcelo.schmitt1@gmail.com>
> Cc: Dan Carpenter <dan.carpenter@oracle.com>
> Cc: Julia Lawall <julia.lawall@inria.fr>
> ---

Thanks: this sort of "when to use which tool" information is really
what the testing guide page needs.

I'm not familiar enough with these tools that I can really review the
details properly, but nothing stands out as obviously wrong to me.
I've made a few comments below regardless, but feel free to ignore
them if they're not quite right.

Acked-by: David Gow <davidgow@google.com>

Cheers,
-- David

>  Documentation/dev-tools/testing-overview.rst | 33 ++++++++++++++++++++
>  1 file changed, 33 insertions(+)
>
> diff --git a/Documentation/dev-tools/testing-overview.rst b/Documentation/dev-tools/testing-overview.rst
> index b5e02dd3fd94..91e479045d3a 100644
> --- a/Documentation/dev-tools/testing-overview.rst
> +++ b/Documentation/dev-tools/testing-overview.rst
> @@ -146,3 +146,36 @@ Documentation/dev-tools/coccinelle.rst documentation page for details.
>
>  Beware, though, that static analysis tools suffer from **false positives**.
>  Errors and warns need to be evaluated carefully before attempting to fix them.
> +
> +When to use Sparse and Smatch
> +-----------------------------
> +
> +Sparse is useful for type checking, detecting places that use ``__user``
> +pointers improperly, or finding endianness bugs. Sparse runs much faster than
> +Smatch.

Given that the __user pointer and endianness stuff is found as a
result of Sparse's type checking support, would rewording this as
"Sparse does type checking, such as [detecting places...]" or similar
be more clear?

> +
> +Smatch does flow analysis and, if allowed to build the function database, it
> +also does cross function analysis. Smatch tries to answer questions like where
> +is this buffer allocated? How big is it? Can this index be controlled by the
> +user? Is this variable larger than that variable?
> +
> +It's generally easier to write checks in Smatch than it is to write checks in
> +Sparse. Nevertheless, there are some overlaps between Sparse and Smatch checks
> +because there is no reason for re-implementing Sparse's check in Smatch.

This last sentence isn't totally clear to me. Should this "because" be "so"?

> +
> +Strong points of Smatch and Coccinelle
> +--------------------------------------
> +
> +Coccinelle is probably the easiest for writing checks. It works before the
> +pre-compiler so it's easier to check for bugs in macros using Coccinelle.
> +Coccinelle also writes patches fixes for you which no other tool does.
> +
> +With Coccinelle you can do a mass conversion from

(Maybe start this with "For example," just to make it clear that this
paragraph is mostly following on from how useful it is that Coccinelle
produces fixes, not just warnings.)

> +``kmalloc(x * size, GFP_KERNEL)`` to ``kmalloc_array(x, size, GFP_KERNEL)``, and
> +that's really useful. If you just created a Smatch warning and try to push the
> +work of converting on to the maintainers they would be annoyed. You'd have to
> +argue about each warning if can really overflow or not.
> +
> +Coccinelle does no analysis of variable values, which is the strong point of
> +Smatch. On the other hand, Coccinelle allows you to do simple things in a simple
> +way.
> --
> 2.35.1
>
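To make the Smatch questions quoted above ("Can this index be controlled by the user?") concrete, here is a minimal C sketch of an index used without a bounds check, next to a checked version. The table and function names are hypothetical, and it is an assumption that idx arrives unchecked from user space (for example through an ioctl argument). Whether Smatch actually reports the unchecked version may depend on it knowing where idx comes from, which is what the cross-function database helps with.

```c
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical lookup table. */
static const int table[8] = { 10, 11, 12, 13, 14, 15, 16, 17 };

/*
 * Unchecked: if idx is user controlled, nothing stops it from running past
 * the end of table. This is the kind of path flow analysis can flag.
 */
int lookup_unchecked(unsigned int idx)
{
	return table[idx];
}

/* Checked: reject out-of-range indexes before using them. */
int lookup_checked(unsigned int idx, int *out)
{
	if (idx >= ARRAY_SIZE(table))
		return -1; /* a kernel function would typically return -EINVAL */
	*out = table[idx];
	return 0;
}
```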
Julia Lawall March 30, 2022, 8:06 a.m. UTC | #2
> +Strong points of Smatch and Coccinelle
> +--------------------------------------
> +
> +Coccinelle is probably the easiest for writing checks. It works before the
> +pre-compiler so it's easier to check for bugs in macros using Coccinelle.

pre-processor

> +Coccinelle also writes patches fixes for you which no other tool does.

writes patches fixes -> creates patches

> +
> +With Coccinelle you can do a mass conversion from

you can -> you can, for example,

julia

> +``kmalloc(x * size, GFP_KERNEL)`` to ``kmalloc_array(x, size, GFP_KERNEL)``, and
> +that's really useful. If you just created a Smatch warning and try to push the
> +work of converting on to the maintainers they would be annoyed. You'd have to
> +argue about each warning if can really overflow or not.
> +
> +Coccinelle does no analysis of variable values, which is the strong point of
> +Smatch. On the other hand, Coccinelle allows you to do simple things in a simple
> +way.
> --
> 2.35.1
>
>
Julia Lawall March 30, 2022, 8:07 a.m. UTC | #3
> > +Strong points of Smatch and Coccinelle
> > +--------------------------------------
> > +
> > +Coccinelle is probably the easiest for writing checks. It works before the
> > +pre-compiler so it's easier to check for bugs in macros using Coccinelle.
> > +Coccinelle also writes patches fixes for you which no other tool does.
> > +
> > +With Coccinelle you can do a mass conversion from
>
> (Maybe start this with "For example," just to make it clear that this
> paragraph is mostly following on from how useful it is that Coccinelle
> produces fixes, not just warnings.)

I also suggested "for example", in a different place, but either is fine.

julia

>
> > +``kmalloc(x * size, GFP_KERNEL)`` to ``kmalloc_array(x, size, GFP_KERNEL)``, and
> > +that's really useful. If you just created a Smatch warning and try to push the
> > +work of converting on to the maintainers they would be annoyed. You'd have to
> > +argue about each warning if can really overflow or not.
> > +
> > +Coccinelle does no analysis of variable values, which is the strong point of
> > +Smatch. On the other hand, Coccinelle allows you to do simple things in a simple
> > +way.
> > --
> > 2.35.1
> >
>
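The kmalloc() to kmalloc_array() conversion discussed above matters because the open-coded x * size multiplication can wrap around and produce a buffer smaller than the caller expects. The following userspace sketch approximates the idea with malloc() and the GCC/Clang builtin __builtin_mul_overflow; this is an assumption for illustration only, since the real kmalloc_array() lives in the kernel's slab headers and takes GFP flags. Both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical analogue of kmalloc_array(): refuse the allocation when
 * n * size would overflow size_t instead of silently wrapping.
 */
void *alloc_array_sketch(size_t n, size_t size)
{
	size_t bytes;

	if (__builtin_mul_overflow(n, size, &bytes))
		return NULL;
	return malloc(bytes);
}

/*
 * The open-coded pattern a mass conversion would rewrite: if n * size
 * wraps, malloc() receives a much smaller size than intended.
 */
void *alloc_open_coded(size_t n, size_t size)
{
	return malloc(n * size);
}
```

Rewriting each open-coded call site by hand is tedious; a Coccinelle semantic patch can apply the same transformation mechanically, and Documentation/dev-tools/coccinelle.rst covers how such rules are written.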
Marcelo Schmitt March 30, 2022, 7:30 p.m. UTC | #4
On 03/30, David Gow wrote:
> On Wed, Mar 30, 2022 at 7:23 AM Marcelo Schmitt
> <marcelo.schmitt1@gmail.com> wrote:
> >
> > Enhance the static analysis tools section with a discussion on when to
> > use each of them.
> >
> > This was mainly taken from Dan Carpenter and Julia Lawall's comments on
> > the previous documentation patch for static analysis tools.
> >
> > Lore: https://lore.kernel.org/linux-doc/20220329090911.GX3293@kadam/T/#mb97770c8e938095aadc3ee08f4ac7fe32ae386e6
> >
> > Signed-off-by: Marcelo Schmitt <marcelo.schmitt1@gmail.com>
> > Cc: Dan Carpenter <dan.carpenter@oracle.com>
> > Cc: Julia Lawall <julia.lawall@inria.fr>
> > ---
> 
> Thanks: this sort of "when to use which tool" information is really
> what the testing guide page needs.
> 
> I'm not familiar enough with these tools that I can really review the
> details properly, but nothing stands out as obviously wrong to me.
> I've made a few comments below regardless, but feel free to ignore
> them if they're not quite right.
> 
> Acked-by: David Gow <davidgow@google.com>
> 
> Cheers,
> -- David
> 
> >  Documentation/dev-tools/testing-overview.rst | 33 ++++++++++++++++++++
> >  1 file changed, 33 insertions(+)
> >
> > diff --git a/Documentation/dev-tools/testing-overview.rst b/Documentation/dev-tools/testing-overview.rst
> > index b5e02dd3fd94..91e479045d3a 100644
> > --- a/Documentation/dev-tools/testing-overview.rst
> > +++ b/Documentation/dev-tools/testing-overview.rst
> > @@ -146,3 +146,36 @@ Documentation/dev-tools/coccinelle.rst documentation page for details.
> >
> >  Beware, though, that static analysis tools suffer from **false positives**.
> >  Errors and warns need to be evaluated carefully before attempting to fix them.
> > +
> > +When to use Sparse and Smatch
> > +-----------------------------
> > +
> > +Sparse is useful for type checking, detecting places that use ``__user``
> > +pointers improperly, or finding endianness bugs. Sparse runs much faster than
> > +Smatch.
> 
> Given that the __user pointer and endianness stuff is found as a
> result of Sparse's type checking support, would rewording this as
> "Sparse does type checking, such as [detecting places...]" or similar
> be more clear?

Maybe. I tried changing it a little while adding a bit of information from
https://lwn.net/Articles/689907/

"Sparse does type checking, such as verifying that annotated variables do not
cause endianness bugs, detecting places that use ``__user`` pointers improperly,
and analyzing the compatibility of symbol initializers."

Does it sound better?

> 
> > +
> > +Smatch does flow analysis and, if allowed to build the function database, it
> > +also does cross function analysis. Smatch tries to answer questions like where
> > +is this buffer allocated? How big is it? Can this index be controlled by the
> > +user? Is this variable larger than that variable?
> > +
> > +It's generally easier to write checks in Smatch than it is to write checks in
> > +Sparse. Nevertheless, there are some overlaps between Sparse and Smatch checks
> > +because there is no reason for re-implementing Sparse's check in Smatch.
> 
> This last sentence isn't totally clear to me. Should this "because" be "so"?

Smatch uses (is shipped with) a modified Sparse implementation which it uses as
a C parser. Apparently, Sparse does some checks while parsing the code for
Smatch, so that's why we have some overlap between the checks made when we
run Smatch and the ones made when we run Sparse alone.

I didn't dig into the code, but I guess further modifying Sparse to prevent it
from doing some types of checks wouldn't add much to Smatch. That last sentence
should've reflected this fact, but it seems to cause confusion without a proper
context. Reading the sentence back again, I think we could just drop the last
part:

"Nevertheless, there are some overlaps between Sparse and Smatch checks."

> 
> > +
> > +Strong points of Smatch and Coccinelle
> > +--------------------------------------
> > +
> > +Coccinelle is probably the easiest for writing checks. It works before the
> > +pre-compiler so it's easier to check for bugs in macros using Coccinelle.
> > +Coccinelle also writes patches fixes for you which no other tool does.
> > +
> > +With Coccinelle you can do a mass conversion from
> 
> (Maybe start this with "For example," just to make it clear that this
> paragraph is mostly following on from how useful it is that Coccinelle
> produces fixes, not just warnings.)

Will do

> 
> > +``kmalloc(x * size, GFP_KERNEL)`` to ``kmalloc_array(x, size, GFP_KERNEL)``, and
> > +that's really useful. If you just created a Smatch warning and try to push the
> > +work of converting on to the maintainers they would be annoyed. You'd have to
> > +argue about each warning if can really overflow or not.
> > +
> > +Coccinelle does no analysis of variable values, which is the strong point of
> > +Smatch. On the other hand, Coccinelle allows you to do simple things in a simple
> > +way.
> > --
> > 2.35.1
> >
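A minimal C sketch of the endianness class of bug that Sparse's type checking is meant to catch. It mimics the kernel's __bitwise and __force annotations with local macros (modeled on how the kernel defines them only when __CHECKER__ is set; treat the exact definitions as an assumption), so the file also builds with an ordinary C compiler. The type and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/*
 * Stand-ins for the kernel's Sparse annotations: Sparse defines
 * __CHECKER__ and sees attributes it can check; a normal compiler
 * sees empty macros.
 */
#ifdef __CHECKER__
#define __bitwise __attribute__((bitwise))
#define __force   __attribute__((force))
#else
#define __bitwise
#define __force
#endif

/* Hypothetical restricted type for a little-endian on-disk or on-wire field. */
typedef uint32_t __bitwise le32_sketch;

/*
 * Buggy: treats the raw little-endian value as if it were in host order.
 * GCC accepts this; Sparse is expected to complain about mixing a
 * restricted type with a plain integer.
 */
uint32_t field_len_buggy(le32_sketch raw)
{
	return raw;
}

/*
 * Fixed: decode the bytes explicitly, so the result is the same on
 * little- and big-endian hosts. The __force cast marks the conversion
 * as intentional for Sparse.
 */
uint32_t field_len_fixed(le32_sketch raw)
{
	uint32_t bits = (__force uint32_t)raw;
	unsigned char b[4];

	memcpy(b, &bits, sizeof(b));
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

On a little-endian host both functions return the same value, which is exactly why this bug tends to go unnoticed until the code runs on a big-endian machine.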
Dan Carpenter March 31, 2022, 8:14 a.m. UTC | #5
On Wed, Mar 30, 2022 at 10:48:13AM +0800, David Gow wrote:
> > +
> > +Smatch does flow analysis and, if allowed to build the function database, it
> > +also does cross function analysis. Smatch tries to answer questions like where
> > +is this buffer allocated? How big is it? Can this index be controlled by the
> > +user? Is this variable larger than that variable?
> > +
> > +It's generally easier to write checks in Smatch than it is to write checks in
> > +Sparse. Nevertheless, there are some overlaps between Sparse and Smatch checks
> > +because there is no reason for re-implementing Sparse's check in Smatch.
> 
> This last sentence isn't totally clear to me. Should this "because" be "so"?
> 

I stopped reading your email when you wrote "Cheers, David" but I should
have scrolled down.

There is not very much overlap between Sparse and Smatch.  Both have a
warning for if (!x & y).  That is a tiny thing.  The big overlap is when
it comes to the locking checks.  The Smatch check for locking is
honestly way better and more capable.

I always run both Sparse and Smatch on my patches.  I should run
Coccinelle as well, but I'm more familiar with Sparse and Smatch.

regards,
dan carpenter
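The if (!x & y) pattern Dan mentions is a precedence bug: ! binds tighter than &, so the expression means (!x) & y rather than !(x & y). A small C sketch with a hypothetical flag name:

```c
#define FLAG_READY 0x2u /* hypothetical flag bit */

/*
 * Buggy: parsed as (!flags) & FLAG_READY. Since !flags is only ever 0 or 1
 * and FLAG_READY does not include bit 0, this always returns 0.
 */
int not_ready_buggy(unsigned int flags)
{
	return !flags & FLAG_READY;
}

/* Intended: true when the FLAG_READY bit is clear. */
int not_ready_fixed(unsigned int flags)
{
	return !(flags & FLAG_READY);
}
```

With flags equal to 0 the intended check is true, but the buggy form still yields 0, so the branch it guards never runs.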
David Gow April 1, 2022, 12:18 a.m. UTC | #6
On Thu, Mar 31, 2022 at 3:30 AM Marcelo Schmitt
<marcelo.schmitt1@gmail.com> wrote:
>
> On 03/30, David Gow wrote:
> > On Wed, Mar 30, 2022 at 7:23 AM Marcelo Schmitt
> > <marcelo.schmitt1@gmail.com> wrote:
> > >
> > > Enhance the static analysis tools section with a discussion on when to
> > > use each of them.
> > >
> > > This was mainly taken from Dan Carpenter and Julia Lawall's comments on
> > > the previous documentation patch for static analysis tools.
> > >
> > > Lore: https://lore.kernel.org/linux-doc/20220329090911.GX3293@kadam/T/#mb97770c8e938095aadc3ee08f4ac7fe32ae386e6
> > >
> > > Signed-off-by: Marcelo Schmitt <marcelo.schmitt1@gmail.com>
> > > Cc: Dan Carpenter <dan.carpenter@oracle.com>
> > > Cc: Julia Lawall <julia.lawall@inria.fr>
> > > ---
> >
> > Thanks: this sort of "when to use which tool" information is really
> > what the testing guide page needs.
> >
> > I'm not familiar enough with these tools that I can really review the
> > details properly, but nothing stands out as obviously wrong to me.
> > I've made a few comments below regardless, but feel free to ignore
> > them if they're not quite right.
> >
> > Acked-by: David Gow <davidgow@google.com>
> >
> > Cheers,
> > -- David
> >
> > >  Documentation/dev-tools/testing-overview.rst | 33 ++++++++++++++++++++
> > >  1 file changed, 33 insertions(+)
> > >
> > > diff --git a/Documentation/dev-tools/testing-overview.rst b/Documentation/dev-tools/testing-overview.rst
> > > index b5e02dd3fd94..91e479045d3a 100644
> > > --- a/Documentation/dev-tools/testing-overview.rst
> > > +++ b/Documentation/dev-tools/testing-overview.rst
> > > @@ -146,3 +146,36 @@ Documentation/dev-tools/coccinelle.rst documentation page for details.
> > >
> > >  Beware, though, that static analysis tools suffer from **false positives**.
> > >  Errors and warns need to be evaluated carefully before attempting to fix them.
> > > +
> > > +When to use Sparse and Smatch
> > > +-----------------------------
> > > +
> > > +Sparse is useful for type checking, detecting places that use ``__user``
> > > +pointers improperly, or finding endianness bugs. Sparse runs much faster than
> > > +Smatch.
> >
> > Given that the __user pointer and endianness stuff is found as a
> > result of Sparse's type checking support, would rewording this as
> > "Sparse does type checking, such as [detecting places...]" or similar
> > be more clear?
>
> Maybe. I tried changing it a little while adding a bit of information from
> https://lwn.net/Articles/689907/
>
> "Sparse does type checking, such as verifying that annotated variables do not
> cause endianness bugs, detecting places that use ``__user`` pointers improperly,
> and analyzing the compatibility of symbol initializers."
>
> Does it sound better?
>

Yeah: that sounds much better to me. Thanks!

> >
> > > +
> > > +Smatch does flow analysis and, if allowed to build the function database, it
> > > +also does cross function analysis. Smatch tries to answer questions like where
> > > +is this buffer allocated? How big is it? Can this index be controlled by the
> > > +user? Is this variable larger than that variable?
> > > +
> > > +It's generally easier to write checks in Smatch than it is to write checks in
> > > +Sparse. Nevertheless, there are some overlaps between Sparse and Smatch checks
> > > +because there is no reason for re-implementing Sparse's check in Smatch.
> >
> > This last sentence isn't totally clear to me. Should this "because" be "so"?
>
> Smatch uses (is shipped with) a modified Sparse implementation which it uses as
> a C parser. Apparently, Sparse does some checks while parsing the code for
> Smatch, so that's why we have some overlap between the checks made when we
> run Smatch and the ones made when we run Sparse alone.
>
> I didn't dig into the code, but I guess further modifying Sparse to prevent it
> from doing some types of checks wouldn't add much to Smatch. That last sentence
> should've reflected this fact, but it seems to cause confusion without a proper
> context. Reading the sentence back again, I think we could just drop the last
> part:
>
> "Nevertheless, there are some overlaps between Sparse and Smatch checks."
>

Yeah, I do think that makes more sense. I don't think the fact that
some of the checks overlap causes any problems at all, to be honest,
so you _could_ get rid of the whole sentence without losing too much,
but I'm also happy with it as it is in v3.


> >
> > > +
> > > +Strong points of Smatch and Coccinelle
> > > +--------------------------------------
> > > +
> > > +Coccinelle is probably the easiest for writing checks. It works before the
> > > +pre-compiler so it's easier to check for bugs in macros using Coccinelle.
> > > +Coccinelle also writes patches fixes for you which no other tool does.
> > > +
> > > +With Coccinelle you can do a mass conversion from
> >
> > (Maybe start this with "For example," just to make it clear that this
> > paragraph is mostly following on from how useful it is that Coccinelle
> > produces fixes, not just warnings.)
>
> Will do
>
> >
> > > +``kmalloc(x * size, GFP_KERNEL)`` to ``kmalloc_array(x, size, GFP_KERNEL)``, and
> > > +that's really useful. If you just created a Smatch warning and try to push the
> > > +work of converting on to the maintainers they would be annoyed. You'd have to
> > > +argue about each warning if can really overflow or not.
> > > +
> > > +Coccinelle does no analysis of variable values, which is the strong point of
> > > +Smatch. On the other hand, Coccinelle allows you to do simple things in a simple
> > > +way.
> > > --
> > > 2.35.1
> > >
David Gow April 1, 2022, 12:19 a.m. UTC | #7
On Thu, Mar 31, 2022 at 4:14 PM Dan Carpenter <dan.carpenter@oracle.com> wrote:
>
> On Wed, Mar 30, 2022 at 10:48:13AM +0800, David Gow wrote:
> > > +
> > > +Smatch does flow analysis and, if allowed to build the function database, it
> > > +also does cross function analysis. Smatch tries to answer questions like where
> > > +is this buffer allocated? How big is it? Can this index be controlled by the
> > > +user? Is this variable larger than that variable?
> > > +
> > > +It's generally easier to write checks in Smatch than it is to write checks in
> > > +Sparse. Nevertheless, there are some overlaps between Sparse and Smatch checks
> > > +because there is no reason for re-implementing Sparse's check in Smatch.
> >
> > This last sentence isn't totally clear to me. Should this "because" be "so"?
> >
>
> I stopped reading your email when you wrote "Cheers, David" but I should
> have scrolled down.
>
> There is not very much overlap between Sparse and Smatch.  Both have a
> warning for if (!x & y).  That is a tiny thing.  The big overlap is when
> it comes to the locking checks.  The Smatch check for locking is
> honestly way better and more capable.
>
> I always run both Sparse and Smatch on my patches.  I should run
> Coccinelle as well, but I'm more familiar with Sparse and Smatch.

Makes sense. I agree that the overlap doesn't seem particularly
important: it's the differences which should be more evident.

v3[1] of the patch cuts this down to just "Nevertheless, there are
some overlaps between Sparse and Smatch checks.", which I think is an
improvement.

Thanks,
-- David

[1]: https://lore.kernel.org/all/62f461a20600b95e694016c4e5348ef2e260fa87.1648674305.git.marcelo.schmitt1@gmail.com/

Patch

diff --git a/Documentation/dev-tools/testing-overview.rst b/Documentation/dev-tools/testing-overview.rst
index b5e02dd3fd94..91e479045d3a 100644
--- a/Documentation/dev-tools/testing-overview.rst
+++ b/Documentation/dev-tools/testing-overview.rst
@@ -146,3 +146,36 @@  Documentation/dev-tools/coccinelle.rst documentation page for details.
 
 Beware, though, that static analysis tools suffer from **false positives**.
 Errors and warns need to be evaluated carefully before attempting to fix them.
+
+When to use Sparse and Smatch
+-----------------------------
+
+Sparse is useful for type checking, detecting places that use ``__user``
+pointers improperly, or finding endianness bugs. Sparse runs much faster than
+Smatch.
+
+Smatch does flow analysis and, if allowed to build the function database, it
+also does cross function analysis. Smatch tries to answer questions like where
+is this buffer allocated? How big is it? Can this index be controlled by the
+user? Is this variable larger than that variable?
+
+It's generally easier to write checks in Smatch than it is to write checks in
+Sparse. Nevertheless, there are some overlaps between Sparse and Smatch checks
+because there is no reason for re-implementing Sparse's check in Smatch.
+
+Strong points of Smatch and Coccinelle
+--------------------------------------
+
+Coccinelle is probably the easiest for writing checks. It works before the
+pre-compiler so it's easier to check for bugs in macros using Coccinelle.
+Coccinelle also writes patches fixes for you which no other tool does.
+
+With Coccinelle you can do a mass conversion from
+``kmalloc(x * size, GFP_KERNEL)`` to ``kmalloc_array(x, size, GFP_KERNEL)``, and
+that's really useful. If you just created a Smatch warning and try to push the
+work of converting on to the maintainers they would be annoyed. You'd have to
+argue about each warning if can really overflow or not.
+
+Coccinelle does no analysis of variable values, which is the strong point of
+Smatch. On the other hand, Coccinelle allows you to do simple things in a simple
+way.