diff mbox series

scripts: Report 'suspicious' comments

Message ID 20200827151333.11591-1-mab@mab-labs.com (mailing list archive)
State New, archived
Headers show
Series scripts: Report 'suspicious' comments | expand

Commit Message

Mohammed Billoo Aug. 27, 2020, 3:13 p.m. UTC
This perl script attempts to mitigate CWE-546
(https://cwe.mitre.org/data/definitions/546.html), which identifies code
with comments that suggest that code is incomplete. This script was
tested against the kernel, and the following is a snippet of the
output that was generated. The output was verified by confirming that
the specified file does indeed have that string at the specified line.

./arch/arm/include/asm/pgtable.h contains FIXME on line 316
./arch/arm/include/debug/imx.S contains FIXME on line 14
./arch/arm/kernel/entry-header.S contains BUG on line 71
./arch/arm/kernel/fiq.c contains FIXME on line 72

Signed-off-by: Mohammed Billoo <mab@mab-labs.com>
---
 Makefile                |  8 +++++++-
 scripts/checkcomment.pl | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+), 1 deletion(-)
 create mode 100644 scripts/checkcomment.pl

Comments

Lukas Bulwahn Sept. 9, 2020, 2:05 p.m. UTC | #1
On Thu, 27 Aug 2020, Mohammed Billoo wrote:

> This perl script attempts to mitigate CWE-546
> (https://cwe.mitre.org/data/definitions/546.html), which identifies code
> with comments that suggest that code is incomplete. This script was
> tested against the kernel, and the following is a snippet of the
> output that was generated. The output was verified by confirming that
> the specified file does indeed have that string at the specified line.
> 
> ./arch/arm/include/asm/pgtable.h contains FIXME on line 316
> ./arch/arm/include/debug/imx.S contains FIXME on line 14
> ./arch/arm/kernel/entry-header.S contains BUG on line 71
> ./arch/arm/kernel/fiq.c contains FIXME on line 72
>

Okay, easy things first:

The patch applies on v5.9-rc4 and the script runs. :)

Now, the challenging parts:

- I think your output should follow in its format the format of other 
"standard" tools that other tools use, e.g., look how gcc and clang 
compiler emit warnings, or how checkpatch.pl emits warnings.

We should try to imitate their output format for this tool.

Also, I would like to see the line which was warned about.

- Second, we then need to look into true and false positives here.
BUG appears in intended error messages, in DEBUG, in functions, such as 
BUG, BUG_ON. So we need to collect the context around BUG and see to 
improve the pattern to at least reduce the false positives.

- Third, we then need have suitable ways to aggregate the findings to know 
which places are known to have many such findings and hence deserve some 
kind of special care.

And then, we need a lot of help to make proper evaluations what to do.

Lukas 
 
> Signed-off-by: Mohammed Billoo <mab@mab-labs.com>
> ---
>  Makefile                |  8 +++++++-
>  scripts/checkcomment.pl | 35 +++++++++++++++++++++++++++++++++++
>  2 files changed, 42 insertions(+), 1 deletion(-)
>  create mode 100644 scripts/checkcomment.pl
> 
> diff --git a/Makefile b/Makefile
> index f21168154160..c84b8bc5c18e 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -264,7 +264,7 @@ no-dot-config-targets := $(clean-targets) \
>  			 cscope gtags TAGS tags help% %docs check% coccicheck \
>  			 $(version_h) headers headers_% archheaders archscripts \
>  			 %asm-generic kernelversion %src-pkg dt_binding_check \
> -			 outputmakefile
> +			 outputmakefile commentcheck
>  no-sync-config-targets := $(no-dot-config-targets) %install kernelrelease
>  single-targets := %.a %.i %.ko %.lds %.ll %.lst %.mod %.o %.s %.symtypes %/
>  
> @@ -1575,6 +1575,7 @@ help:
>  	@echo  '  export_report   - List the usages of all exported symbols'
>  	@echo  '  headerdep       - Detect inclusion cycles in headers'
>  	@echo  '  coccicheck      - Check with Coccinelle'
> +	@echo  '  commentcheck    - Check and report suspicious comments'
>  	@echo  ''
>  	@echo  'Tools:'
>  	@echo  '  nsdeps          - Generate missing symbol namespace dependencies'
> @@ -1842,6 +1843,11 @@ versioncheck:
>  		-name '*.[hcS]' -type f -print | sort \
>  		| xargs $(PERL) -w $(srctree)/scripts/checkversion.pl
>  
> +commentcheck:
> +	find $(srctree)/* $(RCS_FIND_IGNORE) \
> +		-name '*.[hcS]' -type f -print | sort \
> +		| xargs $(PERL) -w $(srctree)/scripts/checkcomment.pl
> +
>  coccicheck:
>  	$(Q)$(BASH) $(srctree)/scripts/$@
>  
> diff --git a/scripts/checkcomment.pl b/scripts/checkcomment.pl
> new file mode 100644
> index 000000000000..22fd77bc75d1
> --- /dev/null
> +++ b/scripts/checkcomment.pl
> @@ -0,0 +1,35 @@
> +#!/usr/bin/env perl
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# (c) 2020, Mohammed Billoo (mab@mab-labs.com)
> +#
> +# This script checks for any keywords outlined in CWE-546
> +# (https://cwe.mitre.org/data/definitions/546.html)
> +# and simply reports them to the user. It's up to the user
> +# to take any further actions.
> +
> +use strict;
> +
> +my @keywords = ('TODO', 'BUG', 'FIXME', 'HACK');
> +my @mismatch_keywords = ('BUG\(\)');
> +
> +foreach my $file (@ARGV) {
> +	my $i = 1;
> +	open(my $f, '<', $file)
> +		or die "Cannot open $file: $!\n";
> +
> +	while (my $line = <$f>) {
> +		foreach my $keyword (@keywords) {
> +			if ($line =~ /\b$keyword\b/) {
> +				foreach my $mismatch_keyword (@mismatch_keywords) {
> +					if ($line =~ /$mismatch_keyword/) {}
> +					else {
> +						print "$file contains $keyword on line $i\n";
> +					}
> +				}
> +			}
> +		}
> +
> +		$i++;
> +	}
> +}
> -- 
> 2.17.1
> 
> 
> 
> 
> 

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#17): https://lists.elisa.tech/g/linux-safety/message/17
Mute This Topic: https://lists.elisa.tech/mt/76453566/4688437
Group Owner: linux-safety+owner@lists.elisa.tech
Unsubscribe: https://lists.elisa.tech/g/linux-safety/unsub  [patchwork-linux-safety@patchwork.kernel.org]
-=-=-=-=-=-=-=-=-=-=-=-
Mohammed Billoo Sept. 9, 2020, 11:45 p.m. UTC | #2
> - Second, we then need to look into true and false positives here.
> BUG appears in intended error messages, in DEBUG, in functions, such as
> BUG, BUG_ON. So we need to collect the context around BUG and see to
> improve the pattern to at least reduce the false positives.
>
I thought this was addressed in the Perl script. Did you find
instances where this isn't the case (IIRC, BUG(), BUG_ON(), and
anything else that is outside of a comment were ignored)?

> - Third, we then need have suitable ways to aggregate the findings to know
> which places are known to have many such findings and hence deserve some
> kind of special care.
>
Do you think CodeChecker would be suitable for this?

> And then, we need a lot of help to make proper evaluations what to do.
Can you elaborate on this? I don't understand what you mean here.
Lukas Bulwahn Sept. 10, 2020, 6:42 a.m. UTC | #3
On Wed, 9 Sep 2020, Mohammed Billoo wrote:

> > - Second, we then need to look into true and false positives here.
> > BUG appears in intended error messages, in DEBUG, in functions, such as
> > BUG, BUG_ON. So we need to collect the context around BUG and see to
> > improve the pattern to at least reduce the false positives.
> >
> I thought this was addressed in the Perl script. Did you find
> instances where this isn't the case (IIRC, BUG(), BUG_ON(), and
> anything else that is outside of a comment were ignored)?
>

Yes, I did when scanning through the instances.

Let me consider these 10 examples here:

grep "BUG" commentcheck_on_v5.9-rc4 | grep "arch/x86" | head -n 10

./arch/x86/entry/entry_32.S contains BUG on line 155
./arch/x86/include/asm/cpufeatures.h contains BUG on line 381
./arch/x86/kernel/cpu/cpuid-deps.c contains BUG on line 90
./arch/x86/kernel/kprobes/core.c contains BUG on line 655
./arch/x86/kernel/nmi_selftest.c contains BUG on line 165
./arch/x86/kernel/paravirt.c contains BUG on line 128
./arch/x86/kernel/traps.c contains BUG on line 225
./arch/x86/kernel/traps.c contains BUG on line 314
./arch/x86/kvm/mmu/mmu.c contains BUG on line 1321
./arch/x86/kvm/mmu/mmu.c contains BUG on line 1326


Case:

./arch/x86/kernel/nmi_selftest.c contains BUG on line 165

printk("BUG: %3d unexpected failures (out of %3d) - debugging disabled! |\n",
        unexpected_testcase_failures, testcase_total);


Here BUG is within an intended printk message.


Case:

./arch/x86/kernel/traps.c contains BUG on line 314


        printk(KERN_EMERG "BUG: stack guard page was hit at %p (stack is %p..%p)\n",
                 (void *)fault_address, current->stack, 
                 (char *)current->stack + THREAD_SIZE - 1);

Here BUG is within an intended printk message.


Case:

./arch/x86/kvm/mmu/mmu.c contains BUG on line 1321

pr_err("%s: %p 0->BUG\n", __func__, spte);

Here BUG is within an intended pr_err message.


Case:

./arch/x86/kvm/mmu/mmu.c contains BUG on line 1326

pr_err("%s:  %p 1->BUG\n", __func__, spte);

Here BUG is within an intended pr_err message.


Summary:

10 examples looked at, 4 cases where it is part of an error message.

At least in six cases, it needs more thought why it might not be 
relevant :)

> > - Third, we then need have suitable ways to aggregate the findings to know
> > which places are known to have many such findings and hence deserve some
> > kind of special care.
> >
> Do you think CodeChecker would be suitable for this?
>

Well, I think we can record and track findings with CodeChecker if the 
output from the tool has a suitable format to parse back.

The aggregations I have in mind are a bit orthogonal, though.

For example:

How "bad" is my architecture?

grep "\./arch" commentcheck_on_v5.9-rc4  | sed 's!^.*arch/\([^/]*\)/.*$!\1!' | sort | uniq -c | sort -nr

    159 powerpc
    113 x86
    109 arm
     48 parisc
     42 mips
     28 alpha
     24 microblaze
     20 sh
     18 ia64
     16 m68k
     15 sparc
     15 openrisc
     13 arc
     10 xtensa
     10 um
     10 arm64
      7 s390
      7 riscv
      5 nds32
      3 hexagon
      3 h8300
      1 nios2
      1 csky
      1 c6x

We want to provide a tool that can guide a user through this data your 
script collects. Some way to show this data in a reasonable aggregation 
can help (not that I am suggesting that this aggregation above is 
reasonable, but it is a first example I could quickly hack together, but
maybe it is even reasonable?).

> > And then, we need a lot of help to make proper evaluations what to do.
> Can you elaborate on this? I don't understand what you mean here.
>

The statistics above shows there are 113 issues in x86; now, if you would 
ever want to use that architecture for anything that matters to you, I 
guess you need to understand all those 113 issues. That needs a lot of 
help from others that understand the code and the issue, right?

Collecting data is a good first step, but there is still a long way ahead.

Lukas

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#19): https://lists.elisa.tech/g/linux-safety/message/19
Mute This Topic: https://lists.elisa.tech/mt/76453566/4688437
Group Owner: linux-safety+owner@lists.elisa.tech
Unsubscribe: https://lists.elisa.tech/g/linux-safety/unsub  [patchwork-linux-safety@patchwork.kernel.org]
-=-=-=-=-=-=-=-=-=-=-=-
Mohammed Billoo Sept. 10, 2020, 11:55 a.m. UTC | #4
Thanks for the details. I will update the script and submit a revised
version of the patch to address your comments.

Thanks

On Thu, Sep 10, 2020 at 2:42 AM Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote:
>
>
>
> On Wed, 9 Sep 2020, Mohammed Billoo wrote:
>
> > > - Second, we then need to look into true and false positives here.
> > > BUG appears in intended error messages, in DEBUG, in functions, such as
> > > BUG, BUG_ON. So we need to collect the context around BUG and see to
> > > improve the pattern to at least reduce the false positives.
> > >
> > I thought this was addressed in the Perl script. Did you find
> > instances where this isn't the case (IIRC, BUG(), BUG_ON(), and
> > anything else that is outside of a comment were ignored)?
> >
>
> Yes, I did when scanning through the instances.
>
> Let me consider these 10 examples here:
>
> grep "BUG" commentcheck_on_v5.9-rc4 | grep "arch/x86" | head -n 10
>
> ./arch/x86/entry/entry_32.S contains BUG on line 155
> ./arch/x86/include/asm/cpufeatures.h contains BUG on line 381
> ./arch/x86/kernel/cpu/cpuid-deps.c contains BUG on line 90
> ./arch/x86/kernel/kprobes/core.c contains BUG on line 655
> ./arch/x86/kernel/nmi_selftest.c contains BUG on line 165
> ./arch/x86/kernel/paravirt.c contains BUG on line 128
> ./arch/x86/kernel/traps.c contains BUG on line 225
> ./arch/x86/kernel/traps.c contains BUG on line 314
> ./arch/x86/kvm/mmu/mmu.c contains BUG on line 1321
> ./arch/x86/kvm/mmu/mmu.c contains BUG on line 1326
>
>
> Case:
>
> ./arch/x86/kernel/nmi_selftest.c contains BUG on line 165
>
> printk("BUG: %3d unexpected failures (out of %3d) - debugging disabled! |\n",
>         unexpected_testcase_failures, testcase_total);
>
>
> Here BUG is within an intended printk message.
>
>
> Case:
>
> ./arch/x86/kernel/traps.c contains BUG on line 314
>
>
>         printk(KERN_EMERG "BUG: stack guard page was hit at %p (stack is %p..%p)\n",
>                  (void *)fault_address, current->stack,
>                  (char *)current->stack + THREAD_SIZE - 1);
>
> Here BUG is within an intended printk message.
>
>
> Case:
>
> ./arch/x86/kvm/mmu/mmu.c contains BUG on line 1321
>
> pr_err("%s: %p 0->BUG\n", __func__, spte);
>
> Here BUG is within an intended pr_err message.
>
>
> Case:
>
> ./arch/x86/kvm/mmu/mmu.c contains BUG on line 1326
>
> pr_err("%s:  %p 1->BUG\n", __func__, spte);
>
> Here BUG is within an intended pr_err message.
>
>
> Summary:
>
> 10 examples looked at, 4 cases where it is part of an error message.
>
> At least in six cases, it needs more thought why it might not be
> relevant :)
>
> > > - Third, we then need have suitable ways to aggregate the findings to know
> > > which places are known to have many such findings and hence deserve some
> > > kind of special care.
> > >
> > Do you think CodeChecker would be suitable for this?
> >
>
> Well, I think we can record and track findings with CodeChecker if the
> output from the tool has a suitable format to parse back.
>
> The aggregations I have in mind are a bit orthogonal, though.
>
> For example:
>
> How "bad" is my architecture?
>
> grep "\./arch" commentcheck_on_v5.9-rc4  | sed 's!^.*arch/\([^/]*\)/.*$!\1!' | sort | uniq -c | sort -nr
>
>     159 powerpc
>     113 x86
>     109 arm
>      48 parisc
>      42 mips
>      28 alpha
>      24 microblaze
>      20 sh
>      18 ia64
>      16 m68k
>      15 sparc
>      15 openrisc
>      13 arc
>      10 xtensa
>      10 um
>      10 arm64
>       7 s390
>       7 riscv
>       5 nds32
>       3 hexagon
>       3 h8300
>       1 nios2
>       1 csky
>       1 c6x
>
> We want to provide a tool that can guide a user through this data your
> script collects. Some way to show this data in a reasonable aggregation
> can help (not that I am suggesting that this aggregation above is
> reasonable, but it is a first example I could quickly hack together, but
> maybe it is even reasonable?).
>
> > > And then, we need a lot of help to make proper evaluations what to do.
> > Can you elaborate on this? I don't understand what you mean here.
> >
>
> The statistics above shows there are 113 issues in x86; now, if you would
> ever want to use that architecture for anything that matters to you, I
> guess you need to understand all those 113 issues. That needs a lot of
> help from others that understand the code and the issue, right?
>
> Collecting data is a good first step, but there is still a long way ahead.
>
> Lukas
diff mbox series

Patch

diff --git a/Makefile b/Makefile
index f21168154160..c84b8bc5c18e 100644
--- a/Makefile
+++ b/Makefile
@@ -264,7 +264,7 @@  no-dot-config-targets := $(clean-targets) \
 			 cscope gtags TAGS tags help% %docs check% coccicheck \
 			 $(version_h) headers headers_% archheaders archscripts \
 			 %asm-generic kernelversion %src-pkg dt_binding_check \
-			 outputmakefile
+			 outputmakefile commentcheck
 no-sync-config-targets := $(no-dot-config-targets) %install kernelrelease
 single-targets := %.a %.i %.ko %.lds %.ll %.lst %.mod %.o %.s %.symtypes %/
 
@@ -1575,6 +1575,7 @@  help:
 	@echo  '  export_report   - List the usages of all exported symbols'
 	@echo  '  headerdep       - Detect inclusion cycles in headers'
 	@echo  '  coccicheck      - Check with Coccinelle'
+	@echo  '  commentcheck    - Check and report suspicious comments'
 	@echo  ''
 	@echo  'Tools:'
 	@echo  '  nsdeps          - Generate missing symbol namespace dependencies'
@@ -1842,6 +1843,11 @@  versioncheck:
 		-name '*.[hcS]' -type f -print | sort \
 		| xargs $(PERL) -w $(srctree)/scripts/checkversion.pl
 
+commentcheck:
+	find $(srctree)/* $(RCS_FIND_IGNORE) \
+		-name '*.[hcS]' -type f -print | sort \
+		| xargs $(PERL) -w $(srctree)/scripts/checkcomment.pl
+
 coccicheck:
 	$(Q)$(BASH) $(srctree)/scripts/$@
 
diff --git a/scripts/checkcomment.pl b/scripts/checkcomment.pl
new file mode 100644
index 000000000000..22fd77bc75d1
--- /dev/null
+++ b/scripts/checkcomment.pl
@@ -0,0 +1,35 @@ 
+#!/usr/bin/env perl
+# SPDX-License-Identifier: GPL-2.0
+#
+# (c) 2020, Mohammed Billoo (mab@mab-labs.com)
+#
+# This script checks for any keywords outlined in CWE-546
+# (https://cwe.mitre.org/data/definitions/546.html)
+# and simply reports them to the user. It's up to the user
+# to take any further actions.
+
+use strict;
+
+my @keywords = ('TODO', 'BUG', 'FIXME', 'HACK');
+my @mismatch_keywords = ('BUG\(\)');
+
+foreach my $file (@ARGV) {
+	my $i = 1;
+	open(my $f, '<', $file)
+		or die "Cannot open $file: $!\n";
+
+	while (my $line = <$f>) {
+		foreach my $keyword (@keywords) {
+			if ($line =~ /\b$keyword\b/) {
+				foreach my $mismatch_keyword (@mismatch_keywords) {
+					if ($line =~ /$mismatch_keyword/) {}
+					else {
+						print "$file contains $keyword on line $i\n";
+					}
+				}
+			}
+		}
+
+		$i++;
+	}
+}