mbox series

[0/2] diff-report.py tool

Message ID 20230504142523.2989306-1-luca.fancellu@arm.com (mailing list archive)
Headers show
Series diff-report.py tool | expand

Message

Luca Fancellu May 4, 2023, 2:25 p.m. UTC
--------------------------------------------------------------------------------
This serie is dependent on this patch:
https://patchwork.kernel.org/project/xen-devel/patch/20230504131245.2985400-4-luca.fancellu@arm.com/
--------------------------------------------------------------------------------

Now that we have a tool (xen-analysis.py) that wraps cppcheck to generates
reports, we have a generic overview of how many static analysis issues and non
compliances to MISRA C we have for a certain revision of the codebase.

This is great and eventually the work to be done should be just having less and
less findings in the report until we reach zero.

This is just an ideal trend, because in practice we might have issues that
comes from existing code (macro for example) that are not going to be fixed
soon for any reason, but we would like to check and see how many issues are
introduced by the commits added (ideally zero, but if any is added and the fault
resides outside the changed code, maintainers might decide to include it
anyway).

So the idea is to check the difference between the reports of the codebase: one
called "baseline" which is basically the current codebase, the other one called
"new report" that is the codebase after the changes.
To check if any new finding is added, we need to have a look on every finding in
the "new report" that is not listed in the "baseline".

It seems very simple, but what can happen to existing findings in the code after
a commit is applied?
Basically existing findings can be shifted in position due to changes to
unrelated lines, or they can be also deleted or fixed for changes involving the
findings line (Michal was the first one to point that out).

So comparing the two report now seems quite difficult, because if we compare
them we will have all the new findings plus all the findings that changed
position due to the changes applied.

To overcome this, the diff-tool.py is created and it allows to "patch" the
"baseline" report, looking into the changes applied to the baseline codebase,
described by git diff.

This serie is organised in two patch, I've tried to split the amount of code and
to leave a meaning, so in the first patch you have everything you need to
import cppcheck reports and do a "raw" diff between reports, this gives you
an hint about new findings + old findings that has changed place.

The second patch is adding the "patching" system, having a class that parses
the git diff output and later "patching" the baseline before doing the
comparison. This last option is activated only when passing the git diff changes
to the tool, but everything is described (I hope) in the help.

Some consideration needs to be made, this tool can translate in coordinates
(file, line) the findings from the "baseline" to the "new report" using the git
diff output as, let me say, a translation matrix.
This doesn't mean it can understand the meaning of the findings and recognise
them in the new codebase, so for example, a finding related to a line that is
moved to another part of the file won't be recognised as "old finding" and will
be just removed from the "baseline patched report", however we will find it
in the new report unless it contains a fix for the reported issue.

This means that the tool is not really suited to be a gatekeeper for the merge
action, it is more suitable to help the maintainer to understand when a change
is introducing new issues without having to check and compare manually two
reports of (nowadays) hundreds of finding.
Eventually we could run it in the CI and make the CI reply to the patchwork
thread with its output!

The tool has also a debug argument that when applied, generates extra files
that can be checked against the original files, for example the reports are
imported in the tool, and afterwards the debug code will generate them back from
the imported data and they should be the same (if everything works).
Another debug check is to export the representation of the parsed git diff
output, so that the developer can check how and if the parser interpreted
correctly the data.

Future works for this tool might be to parse also Coverity reports and
eventually (don't know if it is possible) also eclair text report.

Luca Fancellu (2):
  xen/misra: add diff-report.py tool
  xen/misra: diff-report.py: add report patching feature

 xen/scripts/diff-report.py                    | 127 +++++++++++
 .../xen_analysis/diff_tool/__init__.py        |   0
 .../xen_analysis/diff_tool/cppcheck_report.py |  41 ++++
 xen/scripts/xen_analysis/diff_tool/debug.py   |  55 +++++
 xen/scripts/xen_analysis/diff_tool/report.py  | 198 +++++++++++++++++
 .../diff_tool/unified_format_parser.py        | 202 ++++++++++++++++++
 6 files changed, 623 insertions(+)
 create mode 100755 xen/scripts/diff-report.py
 create mode 100644 xen/scripts/xen_analysis/diff_tool/__init__.py
 create mode 100644 xen/scripts/xen_analysis/diff_tool/cppcheck_report.py
 create mode 100644 xen/scripts/xen_analysis/diff_tool/debug.py
 create mode 100644 xen/scripts/xen_analysis/diff_tool/report.py
 create mode 100644 xen/scripts/xen_analysis/diff_tool/unified_format_parser.py