diff mbox series

scripts: run-clang-tools: add file filtering option

Message ID 20240704-clang-tidy-filter-v1-1-8d4556a35b65@bootlin.com (mailing list archive)
State New
Headers show
Series scripts: run-clang-tools: add file filtering option | expand

Commit Message

Théo Lebrun July 4, 2024, 9:28 a.m. UTC
Add file filtering feature. We take zero or more filters at the end as
positional arguments. If none are given, the default behavior is kept
and we run the tool on all files in the datastore. Else, files must
match one or more filter to be analysed.

The below command runs clang-tidy on drivers/clk/clk.c and all C files
inside drivers/reset/.

    ./scripts/clang-tools/run-clang-tools.py clang-tidy \
        compile_commands.json \
        'drivers/clk/clk.c' 'drivers/reset/*'

The Python fnmatch builtin module is used. Matching is case-insensitive.
See its documentation for allowed syntax:
https://docs.python.org/3/library/fnmatch.html

Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
---
Currently, all files in the datastore are analysed. This is not
practical for grabbing errors in a subsystem, or relative to a patch
series. Add a file filtering feature with wildcard support.

Have a nice day,
Théo
---
 scripts/clang-tools/run-clang-tools.py | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)


---
base-commit: 22a40d14b572deb80c0648557f4bd502d7e83826
change-id: 20240704-clang-tidy-filter-f470377cb2b4

Best regards,

Comments

Nathan Chancellor Aug. 2, 2024, 10:35 p.m. UTC | #1
Hi Théo,

First of all, apologies that it has taken me so long to review this!

On Thu, Jul 04, 2024 at 11:28:21AM +0200, Théo Lebrun wrote:
> Add file filtering feature. We take zero or more filters at the end as
> positional arguments. If none are given, the default behavior is kept
> and we run the tool on all files in the datastore. Else, files must
> match one or more filter to be analysed.
> 
> The below command runs clang-tidy on drivers/clk/clk.c and all C files
> inside drivers/reset/.
> 
>     ./scripts/clang-tools/run-clang-tools.py clang-tidy \
>         compile_commands.json \
>         'drivers/clk/clk.c' 'drivers/reset/*'
> 
> The Python fnmatch builtin module is used. Matching is case-insensitive.
> See its documentation for allowed syntax:
> https://docs.python.org/3/library/fnmatch.html
> 
> Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
> ---
> Currently, all files in the datastore are analysed. This is not
> practical for grabbing errors in a subsystem, or relative to a patch
> series. Add a file filtering feature with wildcard support.

Sure, I think this is totally reasonable. In fact, I think some of this
could be added to the commit message as further existence for this
feature.

The change itself looks good to me for the most part, I have some
questions below just for my own understanding.

Reviewed-by: Nathan Chancellor <nathan@kernel.org>

One further question/comment now: Have you considered a way to
integrate this into Kbuild with the clang-tidy and clang-analyzer
commands? I don't think it is strictly necessary for the acceptance of
this patch but it might be nice to have some variable that users could
provide to do this with their regular make command + the clang-tidy
target? Not sure if Masahiro has further thoughts on that.

> Have a nice day,
> Théo
> ---
>  scripts/clang-tools/run-clang-tools.py | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/scripts/clang-tools/run-clang-tools.py b/scripts/clang-tools/run-clang-tools.py
> index f31ffd09e1ea..b0b3a9c8cdec 100755
> --- a/scripts/clang-tools/run-clang-tools.py
> +++ b/scripts/clang-tools/run-clang-tools.py
> @@ -10,6 +10,7 @@ compile_commands.json.
>  """
>  
>  import argparse
> +import fnmatch
>  import json
>  import multiprocessing
>  import subprocess
> @@ -32,6 +33,8 @@ def parse_arguments():
>                          help=type_help)
>      path_help = "Path to the compilation database to parse"
>      parser.add_argument("path", type=str, help=path_help)
> +    file_filter_help = "Optional Unix shell-style wildcard file filters"
> +    parser.add_argument("file_filter", type=str, nargs="*", help=file_filter_help)
>  
>      checks_help = "Checks to pass to the analysis"
>      parser.add_argument("-checks", type=str, default=None, help=checks_help)
> @@ -48,6 +51,22 @@ def init(l, a):
>      args = a
>  
>  
> +def filter_entries(datastore, filters):
> +    for entry in datastore:
> +        if filters == []:
> +            yield entry
> +            continue
> +
> +        assert entry['file'].startswith(entry['directory'])

What is the purpose of this assertion? Will it cause AssertionError
under normal circumstances?

> +        # filepath is relative to the directory, to avoid matching on the absolute path
> +        filepath = entry['file'][len(entry['directory']):].lstrip('/')
> +
> +        for pattern in filters:
> +            if fnmatch.fnmatch(filepath, pattern):
> +                yield entry
> +                break
> +
> +
>  def run_analysis(entry):
>      # Disable all checks, then re-enable the ones we want
>      global args
> @@ -87,6 +106,7 @@ def main():
>          # Read JSON data into the datastore variable
>          with open(args.path, "r") as f:
>              datastore = json.load(f)
> +            datastore = filter_entries(datastore, args.file_filter)
>              pool.map(run_analysis, datastore)
>      except BrokenPipeError:
>          # Python flushes standard streams on exit; redirect remaining output
> 
> ---
> base-commit: 22a40d14b572deb80c0648557f4bd502d7e83826
> change-id: 20240704-clang-tidy-filter-f470377cb2b4
> 
> Best regards,
> -- 
> Théo Lebrun <theo.lebrun@bootlin.com>
>
Masahiro Yamada Aug. 3, 2024, 8:42 a.m. UTC | #2
On Thu, Jul 4, 2024 at 6:28 PM Théo Lebrun <theo.lebrun@bootlin.com> wrote:
>
> Add file filtering feature. We take zero or more filters at the end as
> positional arguments. If none are given, the default behavior is kept
> and we run the tool on all files in the datastore. Else, files must
> match one or more filter to be analysed.
>
> The below command runs clang-tidy on drivers/clk/clk.c and all C files
> inside drivers/reset/.
>
>     ./scripts/clang-tools/run-clang-tools.py clang-tidy \
>         compile_commands.json \
>         'drivers/clk/clk.c' 'drivers/reset/*'
>
> The Python fnmatch builtin module is used. Matching is case-insensitive.
> See its documentation for allowed syntax:
> https://docs.python.org/3/library/fnmatch.html
>
> Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
> ---
> Currently, all files in the datastore are analysed. This is not
> practical for grabbing errors in a subsystem, or relative to a patch
> series. Add a file filtering feature with wildcard support.
>
> Have a nice day,
> Théo
> ---
>  scripts/clang-tools/run-clang-tools.py | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
>
> diff --git a/scripts/clang-tools/run-clang-tools.py b/scripts/clang-tools/run-clang-tools.py
> index f31ffd09e1ea..b0b3a9c8cdec 100755
> --- a/scripts/clang-tools/run-clang-tools.py
> +++ b/scripts/clang-tools/run-clang-tools.py
> @@ -10,6 +10,7 @@ compile_commands.json.
>  """
>
>  import argparse
> +import fnmatch
>  import json
>  import multiprocessing
>  import subprocess
> @@ -32,6 +33,8 @@ def parse_arguments():
>                          help=type_help)
>      path_help = "Path to the compilation database to parse"
>      parser.add_argument("path", type=str, help=path_help)
> +    file_filter_help = "Optional Unix shell-style wildcard file filters"
> +    parser.add_argument("file_filter", type=str, nargs="*", help=file_filter_help)
>
>      checks_help = "Checks to pass to the analysis"
>      parser.add_argument("-checks", type=str, default=None, help=checks_help)
> @@ -48,6 +51,22 @@ def init(l, a):
>      args = a
>
>
> +def filter_entries(datastore, filters):
> +    for entry in datastore:
> +        if filters == []:
> +            yield entry
> +            continue


Maybe, this can be checked on the caller side.
(Note, I did not test this at all)


if args.file_filter:
        datastore = filter_entries(datastore, args.file_filter)




> +
> +        assert entry['file'].startswith(entry['directory'])
> +        # filepath is relative to the directory, to avoid matching on the absolute path



Does this assertion work with the separate output directory
(O= option)?


Just try this command:

 $ make LLVM=1 O=/tmp/foo clang-tidy

Check the generated /tmp/foo/compile_commands.json


The 'file' entry starts with your source directory.
The 'directory' entry starts with the build directory, "/tmp/foo".









> +        filepath = entry['file'][len(entry['directory']):].lstrip('/')
> +
> +        for pattern in filters:
> +            if fnmatch.fnmatch(filepath, pattern):
> +                yield entry
> +                break
> +
> +
>  def run_analysis(entry):
>      # Disable all checks, then re-enable the ones we want
>      global args
> @@ -87,6 +106,7 @@ def main():
>          # Read JSON data into the datastore variable
>          with open(args.path, "r") as f:
>              datastore = json.load(f)
> +            datastore = filter_entries(datastore, args.file_filter)
>              pool.map(run_analysis, datastore)
>      except BrokenPipeError:
>          # Python flushes standard streams on exit; redirect remaining output
>
> ---
> base-commit: 22a40d14b572deb80c0648557f4bd502d7e83826
> change-id: 20240704-clang-tidy-filter-f470377cb2b4
>
> Best regards,
> --
> Théo Lebrun <theo.lebrun@bootlin.com>
>
>
Masahiro Yamada Aug. 3, 2024, 9:16 a.m. UTC | #3
On Sat, Aug 3, 2024 at 7:35 AM Nathan Chancellor <nathan@kernel.org> wrote:
>
> Hi Théo,
>
> First of all, apologies that it has taken me so long to review this!
>
> On Thu, Jul 04, 2024 at 11:28:21AM +0200, Théo Lebrun wrote:
> > Add file filtering feature. We take zero or more filters at the end as
> > positional arguments. If none are given, the default behavior is kept
> > and we run the tool on all files in the datastore. Else, files must
> > match one or more filter to be analysed.
> >
> > The below command runs clang-tidy on drivers/clk/clk.c and all C files
> > inside drivers/reset/.
> >
> >     ./scripts/clang-tools/run-clang-tools.py clang-tidy \
> >         compile_commands.json \
> >         'drivers/clk/clk.c' 'drivers/reset/*'
> >
> > The Python fnmatch builtin module is used. Matching is case-insensitive.
> > See its documentation for allowed syntax:
> > https://docs.python.org/3/library/fnmatch.html
> >
> > Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
> > ---
> > Currently, all files in the datastore are analysed. This is not
> > practical for grabbing errors in a subsystem, or relative to a patch
> > series. Add a file filtering feature with wildcard support.
>
> Sure, I think this is totally reasonable. In fact, I think some of this
> could be added to the commit message as further existence for this
> feature.
>
> The change itself looks good to me for the most part, I have some
> questions below just for my own understanding.
>
> Reviewed-by: Nathan Chancellor <nathan@kernel.org>
>
> One further question/comment now: Have you considered a way to
> integrate this into Kbuild with the clang-tidy and clang-analyzer
> commands? I don't think it is strictly necessary for the acceptance of
> this patch but it might be nice to have some variable that users could
> provide to do this with their regular make command + the clang-tidy
> target? Not sure if Masahiro has further thoughts on that.


We can do this in a separate patch if it is desired.

This script already supports multiple options.
I'd like to prefer a single generic option
like CLANG_TOOLS_FLAGS.





> > Have a nice day,
> > Théo
> > ---
> >  scripts/clang-tools/run-clang-tools.py | 20 ++++++++++++++++++++
> >  1 file changed, 20 insertions(+)
> >
> > diff --git a/scripts/clang-tools/run-clang-tools.py b/scripts/clang-tools/run-clang-tools.py
> > index f31ffd09e1ea..b0b3a9c8cdec 100755
> > --- a/scripts/clang-tools/run-clang-tools.py
> > +++ b/scripts/clang-tools/run-clang-tools.py
> > @@ -10,6 +10,7 @@ compile_commands.json.
> >  """
> >
> >  import argparse
> > +import fnmatch
> >  import json
> >  import multiprocessing
> >  import subprocess
> > @@ -32,6 +33,8 @@ def parse_arguments():
> >                          help=type_help)
> >      path_help = "Path to the compilation database to parse"
> >      parser.add_argument("path", type=str, help=path_help)
> > +    file_filter_help = "Optional Unix shell-style wildcard file filters"
> > +    parser.add_argument("file_filter", type=str, nargs="*", help=file_filter_help)
> >
> >      checks_help = "Checks to pass to the analysis"
> >      parser.add_argument("-checks", type=str, default=None, help=checks_help)
> > @@ -48,6 +51,22 @@ def init(l, a):
> >      args = a
> >
> >
> > +def filter_entries(datastore, filters):
> > +    for entry in datastore:
> > +        if filters == []:
> > +            yield entry
> > +            continue
> > +
> > +        assert entry['file'].startswith(entry['directory'])
>
> What is the purpose of this assertion? Will it cause AssertionError
> under normal circumstances?
>
> > +        # filepath is relative to the directory, to avoid matching on the absolute path
> > +        filepath = entry['file'][len(entry['directory']):].lstrip('/')
> > +
> > +        for pattern in filters:
> > +            if fnmatch.fnmatch(filepath, pattern):
> > +                yield entry
> > +                break
> > +
> > +
> >  def run_analysis(entry):
> >      # Disable all checks, then re-enable the ones we want
> >      global args
> > @@ -87,6 +106,7 @@ def main():
> >          # Read JSON data into the datastore variable
> >          with open(args.path, "r") as f:
> >              datastore = json.load(f)
> > +            datastore = filter_entries(datastore, args.file_filter)
> >              pool.map(run_analysis, datastore)
> >      except BrokenPipeError:
> >          # Python flushes standard streams on exit; redirect remaining output
> >
> > ---
> > base-commit: 22a40d14b572deb80c0648557f4bd502d7e83826
> > change-id: 20240704-clang-tidy-filter-f470377cb2b4
> >
> > Best regards,
> > --
> > Théo Lebrun <theo.lebrun@bootlin.com>
> >
>


--
Best Regards
Masahiro Yamada
Théo Lebrun Aug. 5, 2024, 8:12 a.m. UTC | #4
Hello Nathan,

On Sat Aug 3, 2024 at 12:35 AM CEST, Nathan Chancellor wrote:
> First of all, apologies that it has taken me so long to review this!

No worries, there is no rush!

> On Thu, Jul 04, 2024 at 11:28:21AM +0200, Théo Lebrun wrote:
> > Add file filtering feature. We take zero or more filters at the end as
> > positional arguments. If none are given, the default behavior is kept
> > and we run the tool on all files in the datastore. Else, files must
> > match one or more filter to be analysed.
> > 
> > The below command runs clang-tidy on drivers/clk/clk.c and all C files
> > inside drivers/reset/.
> > 
> >     ./scripts/clang-tools/run-clang-tools.py clang-tidy \
> >         compile_commands.json \
> >         'drivers/clk/clk.c' 'drivers/reset/*'
> > 
> > The Python fnmatch builtin module is used. Matching is case-insensitive.
> > See its documentation for allowed syntax:
> > https://docs.python.org/3/library/fnmatch.html
> > 
> > Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
> > ---
> > Currently, all files in the datastore are analysed. This is not
> > practical for grabbing errors in a subsystem, or relative to a patch
> > series. Add a file filtering feature with wildcard support.
>
> Sure, I think this is totally reasonable. In fact, I think some of this
> could be added to the commit message as further existence for this
> feature.

Indeed, it can be added to the commit message directly.

> The change itself looks good to me for the most part, I have some
> questions below just for my own understanding.
>
> Reviewed-by: Nathan Chancellor <nathan@kernel.org>
>
> One further question/comment now: Have you considered a way to
> integrate this into Kbuild with the clang-tidy and clang-analyzer
> commands? I don't think it is strictly necessary for the acceptance of
> this patch but it might be nice to have some variable that users could
> provide to do this with their regular make command + the clang-tidy
> target? Not sure if Masahiro has further thoughts on that.

I have not as I am using this script by calling it directly.
It will either way be a separate patch.

>
> > Have a nice day,
> > Théo
> > ---
> >  scripts/clang-tools/run-clang-tools.py | 20 ++++++++++++++++++++
> >  1 file changed, 20 insertions(+)
> > 
> > diff --git a/scripts/clang-tools/run-clang-tools.py b/scripts/clang-tools/run-clang-tools.py
> > index f31ffd09e1ea..b0b3a9c8cdec 100755
> > --- a/scripts/clang-tools/run-clang-tools.py
> > +++ b/scripts/clang-tools/run-clang-tools.py
> > @@ -10,6 +10,7 @@ compile_commands.json.
> >  """
> >  
> >  import argparse
> > +import fnmatch
> >  import json
> >  import multiprocessing
> >  import subprocess
> > @@ -32,6 +33,8 @@ def parse_arguments():
> >                          help=type_help)
> >      path_help = "Path to the compilation database to parse"
> >      parser.add_argument("path", type=str, help=path_help)
> > +    file_filter_help = "Optional Unix shell-style wildcard file filters"
> > +    parser.add_argument("file_filter", type=str, nargs="*", help=file_filter_help)
> >  
> >      checks_help = "Checks to pass to the analysis"
> >      parser.add_argument("-checks", type=str, default=None, help=checks_help)
> > @@ -48,6 +51,22 @@ def init(l, a):
> >      args = a
> >  
> >  
> > +def filter_entries(datastore, filters):
> > +    for entry in datastore:
> > +        if filters == []:
> > +            yield entry
> > +            continue
> > +
> > +        assert entry['file'].startswith(entry['directory'])
>
> What is the purpose of this assertion? Will it cause AssertionError
> under normal circumstances?

Just below we extract `filepath` from entry["file"] by removing at its
start the length of entry["directory"]. We expect entry["file"] to
start with entry["directory"], so we document that with an assertion.

If this assertion triggers, it means the line below would do something
weird and would silently break the program. Silently because `filepath`
is used for pattern matching and is never displayed.

>
> > +        # filepath is relative to the directory, to avoid matching on the absolute path
> > +        filepath = entry['file'][len(entry['directory']):].lstrip('/')

Regards,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
Théo Lebrun Aug. 5, 2024, 8:39 a.m. UTC | #5
Hello Masahiro,

On Sat Aug 3, 2024 at 10:42 AM CEST, Masahiro Yamada wrote:
> On Thu, Jul 4, 2024 at 6:28 PM Théo Lebrun <theo.lebrun@bootlin.com> wrote:
> > +def filter_entries(datastore, filters):
> > +    for entry in datastore:
> > +        if filters == []:
> > +            yield entry
> > +            continue
>
> Maybe, this can be checked on the caller side.
> (Note, I did not test this at all)
>
> if args.file_filter:
>         datastore = filter_entries(datastore, args.file_filter)

Agreed.

> > +
> > +        assert entry['file'].startswith(entry['directory'])
> > +        # filepath is relative to the directory, to avoid matching on the absolute path
>
> Does this assertion work with the separate output directory
> (O= option)?
>
> Just try this command:
>
>  $ make LLVM=1 O=/tmp/foo clang-tidy

Indeed this does not work. It requires some changes to Makefiles to see
that it does not work, as this assertion is only used when a filter is
passed (which current cmd_clang_tools cannot do).

What would you recommend to fix that? If we take the first entry:

  {
    "command": "clang ...",
    "directory": "/tmp/foo",
    "file": "/ABSOLUTE/PATH/linux-clang-tidy/arch/mips/crypto/crc32-mips.c"
  },

I don't see an easy way to know that "/ABSOLUTE/PATH/linux-clang-tidy/"
should be removed. We could probably compute it from the command value
but that isn't a great way forward.

There are some entries that look like this. Those are easy to deal with.

  {
    "command": "clang ...",
    "directory": "/tmp/foo",
    "file": "/tmp/foo/drivers/tty/vt/defkeymap.c"
  }

Regards,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
Masahiro Yamada Aug. 6, 2024, 9:49 a.m. UTC | #6
On Mon, Aug 5, 2024 at 5:39 PM Théo Lebrun <theo.lebrun@bootlin.com> wrote:
>
> Hello Masahiro,
>
> On Sat Aug 3, 2024 at 10:42 AM CEST, Masahiro Yamada wrote:
> > On Thu, Jul 4, 2024 at 6:28 PM Théo Lebrun <theo.lebrun@bootlin.com> wrote:
> > > +def filter_entries(datastore, filters):
> > > +    for entry in datastore:
> > > +        if filters == []:
> > > +            yield entry
> > > +            continue
> >
> > Maybe, this can be checked on the caller side.
> > (Note, I did not test this at all)
> >
> > if args.file_filter:
> >         datastore = filter_entries(datastore, args.file_filter)
>
> Agreed.
>
> > > +
> > > +        assert entry['file'].startswith(entry['directory'])
> > > +        # filepath is relative to the directory, to avoid matching on the absolute path
> >
> > Does this assertion work with the separate output directory
> > (O= option)?
> >
> > Just try this command:
> >
> >  $ make LLVM=1 O=/tmp/foo clang-tidy
>
> Indeed this does not work. It requires some changes to Makefiles to see
> that it does not work, as this assertion is only used when a filter is
> passed (which current cmd_clang_tools cannot do).
>
> What would you recommend to fix that? If we take the first entry:
>
>   {
>     "command": "clang ...",
>     "directory": "/tmp/foo",
>     "file": "/ABSOLUTE/PATH/linux-clang-tidy/arch/mips/crypto/crc32-mips.c"
>   },
>
> I don't see an easy way to know that "/ABSOLUTE/PATH/linux-clang-tidy/"
> should be removed. We could probably compute it from the command value
> but that isn't a great way forward.
>
> There are some entries that look like this. Those are easy to deal with.
>
>   {
>     "command": "clang ...",
>     "directory": "/tmp/foo",
>     "file": "/tmp/foo/drivers/tty/vt/defkeymap.c"
>   }


The top of the source tree is derived from __file__

  srctree = str(pathlib.Path(__file__).parents[2].resolve())








--
Best Regards
Masahiro Yamada
diff mbox series

Patch

diff --git a/scripts/clang-tools/run-clang-tools.py b/scripts/clang-tools/run-clang-tools.py
index f31ffd09e1ea..b0b3a9c8cdec 100755
--- a/scripts/clang-tools/run-clang-tools.py
+++ b/scripts/clang-tools/run-clang-tools.py
@@ -10,6 +10,7 @@  compile_commands.json.
 """
 
 import argparse
+import fnmatch
 import json
 import multiprocessing
 import subprocess
@@ -32,6 +33,8 @@  def parse_arguments():
                         help=type_help)
     path_help = "Path to the compilation database to parse"
     parser.add_argument("path", type=str, help=path_help)
+    file_filter_help = "Optional Unix shell-style wildcard file filters"
+    parser.add_argument("file_filter", type=str, nargs="*", help=file_filter_help)
 
     checks_help = "Checks to pass to the analysis"
     parser.add_argument("-checks", type=str, default=None, help=checks_help)
@@ -48,6 +51,22 @@  def init(l, a):
     args = a
 
 
+def filter_entries(datastore, filters):
+    for entry in datastore:
+        if filters == []:
+            yield entry
+            continue
+
+        assert entry['file'].startswith(entry['directory'])
+        # filepath is relative to the directory, to avoid matching on the absolute path
+        filepath = entry['file'][len(entry['directory']):].lstrip('/')
+
+        for pattern in filters:
+            if fnmatch.fnmatch(filepath, pattern):
+                yield entry
+                break
+
+
 def run_analysis(entry):
     # Disable all checks, then re-enable the ones we want
     global args
@@ -87,6 +106,7 @@  def main():
         # Read JSON data into the datastore variable
         with open(args.path, "r") as f:
             datastore = json.load(f)
+            datastore = filter_entries(datastore, args.file_filter)
             pool.map(run_analysis, datastore)
     except BrokenPipeError:
         # Python flushes standard streams on exit; redirect remaining output