Message ID | 20240704-clang-tidy-filter-v1-1-8d4556a35b65@bootlin.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | scripts: run-clang-tools: add file filtering option | expand |
Hi Théo, First of all, apologies that it has taken me so long to review this! On Thu, Jul 04, 2024 at 11:28:21AM +0200, Théo Lebrun wrote: > Add file filtering feature. We take zero or more filters at the end as > positional arguments. If none are given, the default behavior is kept > and we run the tool on all files in the datastore. Else, files must > match one or more filter to be analysed. > > The below command runs clang-tidy on drivers/clk/clk.c and all C files > inside drivers/reset/. > > ./scripts/clang-tools/run-clang-tools.py clang-tidy \ > compile_commands.json \ > 'drivers/clk/clk.c' 'drivers/reset/*' > > The Python fnmatch builtin module is used. Matching is case-insensitive. > See its documentation for allowed syntax: > https://docs.python.org/3/library/fnmatch.html > > Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com> > --- > Currently, all files in the datastore are analysed. This is not > practical for grabbing errors in a subsystem, or relative to a patch > series. Add a file filtering feature with wildcard support. Sure, I think this is totally reasonable. In fact, I think some of this could be added to the commit message as further existence for this feature. The change itself looks good to me for the most part, I have some questions below just for my own understanding. Reviewed-by: Nathan Chancellor <nathan@kernel.org> One further question/comment now: Have you considered a way to integrate this into Kbuild with the clang-tidy and clang-analyzer commands? I don't think it is strictly necessary for the acceptance of this patch but it might be nice to have some variable that users could provide to do this with their regular make command + the clang-tidy target? Not sure if Masahiro has further thoughts on that. > Have a nice day, > Théo > --- > scripts/clang-tools/run-clang-tools.py | 20 ++++++++++++++++++++ > 1 file changed, 20 insertions(+) > > diff --git a/scripts/clang-tools/run-clang-tools.py b/scripts/clang-tools/run-clang-tools.py > index f31ffd09e1ea..b0b3a9c8cdec 100755 > --- a/scripts/clang-tools/run-clang-tools.py > +++ b/scripts/clang-tools/run-clang-tools.py > @@ -10,6 +10,7 @@ compile_commands.json. > """ > > import argparse > +import fnmatch > import json > import multiprocessing > import subprocess > @@ -32,6 +33,8 @@ def parse_arguments(): > help=type_help) > path_help = "Path to the compilation database to parse" > parser.add_argument("path", type=str, help=path_help) > + file_filter_help = "Optional Unix shell-style wildcard file filters" > + parser.add_argument("file_filter", type=str, nargs="*", help=file_filter_help) > > checks_help = "Checks to pass to the analysis" > parser.add_argument("-checks", type=str, default=None, help=checks_help) > @@ -48,6 +51,22 @@ def init(l, a): > args = a > > > +def filter_entries(datastore, filters): > + for entry in datastore: > + if filters == []: > + yield entry > + continue > + > + assert entry['file'].startswith(entry['directory']) What is the purpose of this assertion? Will it cause AssertionError under normal circumstances? > + # filepath is relative to the directory, to avoid matching on the absolute path > + filepath = entry['file'][len(entry['directory']):].lstrip('/') > + > + for pattern in filters: > + if fnmatch.fnmatch(filepath, pattern): > + yield entry > + break > + > + > def run_analysis(entry): > # Disable all checks, then re-enable the ones we want > global args > @@ -87,6 +106,7 @@ def main(): > # Read JSON data into the datastore variable > with open(args.path, "r") as f: > datastore = json.load(f) > + datastore = filter_entries(datastore, args.file_filter) > pool.map(run_analysis, datastore) > except BrokenPipeError: > # Python flushes standard streams on exit; redirect remaining output > > --- > base-commit: 22a40d14b572deb80c0648557f4bd502d7e83826 > change-id: 20240704-clang-tidy-filter-f470377cb2b4 > > Best regards, > -- > Théo Lebrun <theo.lebrun@bootlin.com> >
On Thu, Jul 4, 2024 at 6:28 PM Théo Lebrun <theo.lebrun@bootlin.com> wrote: > > Add file filtering feature. We take zero or more filters at the end as > positional arguments. If none are given, the default behavior is kept > and we run the tool on all files in the datastore. Else, files must > match one or more filter to be analysed. > > The below command runs clang-tidy on drivers/clk/clk.c and all C files > inside drivers/reset/. > > ./scripts/clang-tools/run-clang-tools.py clang-tidy \ > compile_commands.json \ > 'drivers/clk/clk.c' 'drivers/reset/*' > > The Python fnmatch builtin module is used. Matching is case-insensitive. > See its documentation for allowed syntax: > https://docs.python.org/3/library/fnmatch.html > > Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com> > --- > Currently, all files in the datastore are analysed. This is not > practical for grabbing errors in a subsystem, or relative to a patch > series. Add a file filtering feature with wildcard support. > > Have a nice day, > Théo > --- > scripts/clang-tools/run-clang-tools.py | 20 ++++++++++++++++++++ > 1 file changed, 20 insertions(+) > > diff --git a/scripts/clang-tools/run-clang-tools.py b/scripts/clang-tools/run-clang-tools.py > index f31ffd09e1ea..b0b3a9c8cdec 100755 > --- a/scripts/clang-tools/run-clang-tools.py > +++ b/scripts/clang-tools/run-clang-tools.py > @@ -10,6 +10,7 @@ compile_commands.json. > """ > > import argparse > +import fnmatch > import json > import multiprocessing > import subprocess > @@ -32,6 +33,8 @@ def parse_arguments(): > help=type_help) > path_help = "Path to the compilation database to parse" > parser.add_argument("path", type=str, help=path_help) > + file_filter_help = "Optional Unix shell-style wildcard file filters" > + parser.add_argument("file_filter", type=str, nargs="*", help=file_filter_help) > > checks_help = "Checks to pass to the analysis" > parser.add_argument("-checks", type=str, default=None, help=checks_help) > @@ -48,6 +51,22 @@ def init(l, a): > args = a > > > +def filter_entries(datastore, filters): > + for entry in datastore: > + if filters == []: > + yield entry > + continue Maybe, this can be checked on the caller side. (Note, I did not test this at all) if args.file_filter: datastore = filter_entries(datastore, args.file_filter) > + > + assert entry['file'].startswith(entry['directory']) > + # filepath is relative to the directory, to avoid matching on the absolute path Does this assertion work with the separate output directory (O= option)? Just try this command: $ make LLVM=1 O=/tmp/foo clang-tidy Check the generated /tmp/foo/compile_commands.json The 'file' entry starts with your source directory. The 'directory' entry starts with the build directory, "/tmp/foo". > + filepath = entry['file'][len(entry['directory']):].lstrip('/') > + > + for pattern in filters: > + if fnmatch.fnmatch(filepath, pattern): > + yield entry > + break > + > + > def run_analysis(entry): > # Disable all checks, then re-enable the ones we want > global args > @@ -87,6 +106,7 @@ def main(): > # Read JSON data into the datastore variable > with open(args.path, "r") as f: > datastore = json.load(f) > + datastore = filter_entries(datastore, args.file_filter) > pool.map(run_analysis, datastore) > except BrokenPipeError: > # Python flushes standard streams on exit; redirect remaining output > > --- > base-commit: 22a40d14b572deb80c0648557f4bd502d7e83826 > change-id: 20240704-clang-tidy-filter-f470377cb2b4 > > Best regards, > -- > Théo Lebrun <theo.lebrun@bootlin.com> > >
On Sat, Aug 3, 2024 at 7:35 AM Nathan Chancellor <nathan@kernel.org> wrote: > > Hi Théo, > > First of all, apologies that it has taken me so long to review this! > > On Thu, Jul 04, 2024 at 11:28:21AM +0200, Théo Lebrun wrote: > > Add file filtering feature. We take zero or more filters at the end as > > positional arguments. If none are given, the default behavior is kept > > and we run the tool on all files in the datastore. Else, files must > > match one or more filter to be analysed. > > > > The below command runs clang-tidy on drivers/clk/clk.c and all C files > > inside drivers/reset/. > > > > ./scripts/clang-tools/run-clang-tools.py clang-tidy \ > > compile_commands.json \ > > 'drivers/clk/clk.c' 'drivers/reset/*' > > > > The Python fnmatch builtin module is used. Matching is case-insensitive. > > See its documentation for allowed syntax: > > https://docs.python.org/3/library/fnmatch.html > > > > Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com> > > --- > > Currently, all files in the datastore are analysed. This is not > > practical for grabbing errors in a subsystem, or relative to a patch > > series. Add a file filtering feature with wildcard support. > > Sure, I think this is totally reasonable. In fact, I think some of this > could be added to the commit message as further existence for this > feature. > > The change itself looks good to me for the most part, I have some > questions below just for my own understanding. > > Reviewed-by: Nathan Chancellor <nathan@kernel.org> > > One further question/comment now: Have you considered a way to > integrate this into Kbuild with the clang-tidy and clang-analyzer > commands? I don't think it is strictly necessary for the acceptance of > this patch but it might be nice to have some variable that users could > provide to do this with their regular make command + the clang-tidy > target? Not sure if Masahiro has further thoughts on that. We can do this in a separate patch if it is desired. This script already supports multiple options. I'd like to prefer a single generic option like CLANG_TOOLS_FLAGS. > > Have a nice day, > > Théo > > --- > > scripts/clang-tools/run-clang-tools.py | 20 ++++++++++++++++++++ > > 1 file changed, 20 insertions(+) > > > > diff --git a/scripts/clang-tools/run-clang-tools.py b/scripts/clang-tools/run-clang-tools.py > > index f31ffd09e1ea..b0b3a9c8cdec 100755 > > --- a/scripts/clang-tools/run-clang-tools.py > > +++ b/scripts/clang-tools/run-clang-tools.py > > @@ -10,6 +10,7 @@ compile_commands.json. > > """ > > > > import argparse > > +import fnmatch > > import json > > import multiprocessing > > import subprocess > > @@ -32,6 +33,8 @@ def parse_arguments(): > > help=type_help) > > path_help = "Path to the compilation database to parse" > > parser.add_argument("path", type=str, help=path_help) > > + file_filter_help = "Optional Unix shell-style wildcard file filters" > > + parser.add_argument("file_filter", type=str, nargs="*", help=file_filter_help) > > > > checks_help = "Checks to pass to the analysis" > > parser.add_argument("-checks", type=str, default=None, help=checks_help) > > @@ -48,6 +51,22 @@ def init(l, a): > > args = a > > > > > > +def filter_entries(datastore, filters): > > + for entry in datastore: > > + if filters == []: > > + yield entry > > + continue > > + > > + assert entry['file'].startswith(entry['directory']) > > What is the purpose of this assertion? Will it cause AssertionError > under normal circumstances? > > > + # filepath is relative to the directory, to avoid matching on the absolute path > > + filepath = entry['file'][len(entry['directory']):].lstrip('/') > > + > > + for pattern in filters: > > + if fnmatch.fnmatch(filepath, pattern): > > + yield entry > > + break > > + > > + > > def run_analysis(entry): > > # Disable all checks, then re-enable the ones we want > > global args > > @@ -87,6 +106,7 @@ def main(): > > # Read JSON data into the datastore variable > > with open(args.path, "r") as f: > > datastore = json.load(f) > > + datastore = filter_entries(datastore, args.file_filter) > > pool.map(run_analysis, datastore) > > except BrokenPipeError: > > # Python flushes standard streams on exit; redirect remaining output > > > > --- > > base-commit: 22a40d14b572deb80c0648557f4bd502d7e83826 > > change-id: 20240704-clang-tidy-filter-f470377cb2b4 > > > > Best regards, > > -- > > Théo Lebrun <theo.lebrun@bootlin.com> > > > -- Best Regards Masahiro Yamada
Hello Nathan, On Sat Aug 3, 2024 at 12:35 AM CEST, Nathan Chancellor wrote: > First of all, apologies that it has taken me so long to review this! No worries, there is no rush! > On Thu, Jul 04, 2024 at 11:28:21AM +0200, Théo Lebrun wrote: > > Add file filtering feature. We take zero or more filters at the end as > > positional arguments. If none are given, the default behavior is kept > > and we run the tool on all files in the datastore. Else, files must > > match one or more filter to be analysed. > > > > The below command runs clang-tidy on drivers/clk/clk.c and all C files > > inside drivers/reset/. > > > > ./scripts/clang-tools/run-clang-tools.py clang-tidy \ > > compile_commands.json \ > > 'drivers/clk/clk.c' 'drivers/reset/*' > > > > The Python fnmatch builtin module is used. Matching is case-insensitive. > > See its documentation for allowed syntax: > > https://docs.python.org/3/library/fnmatch.html > > > > Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com> > > --- > > Currently, all files in the datastore are analysed. This is not > > practical for grabbing errors in a subsystem, or relative to a patch > > series. Add a file filtering feature with wildcard support. > > Sure, I think this is totally reasonable. In fact, I think some of this > could be added to the commit message as further existence for this > feature. Indeed, it can be added to the commit message directly. > The change itself looks good to me for the most part, I have some > questions below just for my own understanding. > > Reviewed-by: Nathan Chancellor <nathan@kernel.org> > > One further question/comment now: Have you considered a way to > integrate this into Kbuild with the clang-tidy and clang-analyzer > commands? I don't think it is strictly necessary for the acceptance of > this patch but it might be nice to have some variable that users could > provide to do this with their regular make command + the clang-tidy > target? Not sure if Masahiro has further thoughts on that. I have not as I am using this script by calling it directly. It will either way be a separate patch. > > > Have a nice day, > > Théo > > --- > > scripts/clang-tools/run-clang-tools.py | 20 ++++++++++++++++++++ > > 1 file changed, 20 insertions(+) > > > > diff --git a/scripts/clang-tools/run-clang-tools.py b/scripts/clang-tools/run-clang-tools.py > > index f31ffd09e1ea..b0b3a9c8cdec 100755 > > --- a/scripts/clang-tools/run-clang-tools.py > > +++ b/scripts/clang-tools/run-clang-tools.py > > @@ -10,6 +10,7 @@ compile_commands.json. > > """ > > > > import argparse > > +import fnmatch > > import json > > import multiprocessing > > import subprocess > > @@ -32,6 +33,8 @@ def parse_arguments(): > > help=type_help) > > path_help = "Path to the compilation database to parse" > > parser.add_argument("path", type=str, help=path_help) > > + file_filter_help = "Optional Unix shell-style wildcard file filters" > > + parser.add_argument("file_filter", type=str, nargs="*", help=file_filter_help) > > > > checks_help = "Checks to pass to the analysis" > > parser.add_argument("-checks", type=str, default=None, help=checks_help) > > @@ -48,6 +51,22 @@ def init(l, a): > > args = a > > > > > > +def filter_entries(datastore, filters): > > + for entry in datastore: > > + if filters == []: > > + yield entry > > + continue > > + > > + assert entry['file'].startswith(entry['directory']) > > What is the purpose of this assertion? Will it cause AssertionError > under normal circumstances? Just below we extract `filepath` from entry["file"] by removing at its start the length of entry["directory"]. We expect entry["file"] to start with entry["directory"], so we document that with an assertion. If this assertion triggers, it means the line below would do something weird and would silently break the program. Silently because `filepath` is used for pattern matching and is never displayed. > > > + # filepath is relative to the directory, to avoid matching on the absolute path > > + filepath = entry['file'][len(entry['directory']):].lstrip('/') Regards, -- Théo Lebrun, Bootlin Embedded Linux and Kernel engineering https://bootlin.com
Hello Masahiro, On Sat Aug 3, 2024 at 10:42 AM CEST, Masahiro Yamada wrote: > On Thu, Jul 4, 2024 at 6:28 PM Théo Lebrun <theo.lebrun@bootlin.com> wrote: > > +def filter_entries(datastore, filters): > > + for entry in datastore: > > + if filters == []: > > + yield entry > > + continue > > Maybe, this can be checked on the caller side. > (Note, I did not test this at all) > > if args.file_filter: > datastore = filter_entries(datastore, args.file_filter) Agreed. > > + > > + assert entry['file'].startswith(entry['directory']) > > + # filepath is relative to the directory, to avoid matching on the absolute path > > Does this assertion work with the separate output directory > (O= option)? > > Just try this command: > > $ make LLVM=1 O=/tmp/foo clang-tidy Indeed this does not work. It requires some changes to Makefiles to see that it does not work, as this assertion is only used when a filter is passed (which current cmd_clang_tools cannot do). What would you recommend to fix that? If we take the first entry: { "command": "clang ...", "directory": "/tmp/foo", "file": "/ABSOLUTE/PATH/linux-clang-tidy/arch/mips/crypto/crc32-mips.c" }, I don't see an easy way to know that "/ABSOLUTE/PATH/linux-clang-tidy/" should be removed. We could probably compute it from the command value but that isn't a great way forward. There are some entries that look like this. Those are easy to deal with. { "command": "clang ...", "directory": "/tmp/foo", "file": "/tmp/foo/drivers/tty/vt/defkeymap.c" } Regards, -- Théo Lebrun, Bootlin Embedded Linux and Kernel engineering https://bootlin.com
On Mon, Aug 5, 2024 at 5:39 PM Théo Lebrun <theo.lebrun@bootlin.com> wrote: > > Hello Masahiro, > > On Sat Aug 3, 2024 at 10:42 AM CEST, Masahiro Yamada wrote: > > On Thu, Jul 4, 2024 at 6:28 PM Théo Lebrun <theo.lebrun@bootlin.com> wrote: > > > +def filter_entries(datastore, filters): > > > + for entry in datastore: > > > + if filters == []: > > > + yield entry > > > + continue > > > > Maybe, this can be checked on the caller side. > > (Note, I did not test this at all) > > > > if args.file_filter: > > datastore = filter_entries(datastore, args.file_filter) > > Agreed. > > > > + > > > + assert entry['file'].startswith(entry['directory']) > > > + # filepath is relative to the directory, to avoid matching on the absolute path > > > > Does this assertion work with the separate output directory > > (O= option)? > > > > Just try this command: > > > > $ make LLVM=1 O=/tmp/foo clang-tidy > > Indeed this does not work. It requires some changes to Makefiles to see > that it does not work, as this assertion is only used when a filter is > passed (which current cmd_clang_tools cannot do). > > What would you recommend to fix that? If we take the first entry: > > { > "command": "clang ...", > "directory": "/tmp/foo", > "file": "/ABSOLUTE/PATH/linux-clang-tidy/arch/mips/crypto/crc32-mips.c" > }, > > I don't see an easy way to know that "/ABSOLUTE/PATH/linux-clang-tidy/" > should be removed. We could probably compute it from the command value > but that isn't a great way forward. > > There are some entries that look like this. Those are easy to deal with. > > { > "command": "clang ...", > "directory": "/tmp/foo", > "file": "/tmp/foo/drivers/tty/vt/defkeymap.c" > } The top of the source tree is derived from __file__ srctree = str(pathlib.Path(__file__).parents[2].resolve()) -- Best Regards Masahiro Yamada
diff --git a/scripts/clang-tools/run-clang-tools.py b/scripts/clang-tools/run-clang-tools.py index f31ffd09e1ea..b0b3a9c8cdec 100755 --- a/scripts/clang-tools/run-clang-tools.py +++ b/scripts/clang-tools/run-clang-tools.py @@ -10,6 +10,7 @@ compile_commands.json. """ import argparse +import fnmatch import json import multiprocessing import subprocess @@ -32,6 +33,8 @@ def parse_arguments(): help=type_help) path_help = "Path to the compilation database to parse" parser.add_argument("path", type=str, help=path_help) + file_filter_help = "Optional Unix shell-style wildcard file filters" + parser.add_argument("file_filter", type=str, nargs="*", help=file_filter_help) checks_help = "Checks to pass to the analysis" parser.add_argument("-checks", type=str, default=None, help=checks_help) @@ -48,6 +51,22 @@ def init(l, a): args = a +def filter_entries(datastore, filters): + for entry in datastore: + if filters == []: + yield entry + continue + + assert entry['file'].startswith(entry['directory']) + # filepath is relative to the directory, to avoid matching on the absolute path + filepath = entry['file'][len(entry['directory']):].lstrip('/') + + for pattern in filters: + if fnmatch.fnmatch(filepath, pattern): + yield entry + break + + def run_analysis(entry): # Disable all checks, then re-enable the ones we want global args @@ -87,6 +106,7 @@ def main(): # Read JSON data into the datastore variable with open(args.path, "r") as f: datastore = json.load(f) + datastore = filter_entries(datastore, args.file_filter) pool.map(run_analysis, datastore) except BrokenPipeError: # Python flushes standard streams on exit; redirect remaining output
Add file filtering feature. We take zero or more filters at the end as positional arguments. If none are given, the default behavior is kept and we run the tool on all files in the datastore. Else, files must match one or more filter to be analysed. The below command runs clang-tidy on drivers/clk/clk.c and all C files inside drivers/reset/. ./scripts/clang-tools/run-clang-tools.py clang-tidy \ compile_commands.json \ 'drivers/clk/clk.c' 'drivers/reset/*' The Python fnmatch builtin module is used. Matching is case-insensitive. See its documentation for allowed syntax: https://docs.python.org/3/library/fnmatch.html Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com> --- Currently, all files in the datastore are analysed. This is not practical for grabbing errors in a subsystem, or relative to a patch series. Add a file filtering feature with wildcard support. Have a nice day, Théo --- scripts/clang-tools/run-clang-tools.py | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) --- base-commit: 22a40d14b572deb80c0648557f4bd502d7e83826 change-id: 20240704-clang-tidy-filter-f470377cb2b4 Best regards,