Message ID | 20200827151333.11591-1-mab@mab-labs.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | scripts: Report 'suspicious' comments | expand |
On Thu, 27 Aug 2020, Mohammed Billoo wrote: > This perl script attempts to mitigate CWE-546 > (https://cwe.mitre.org/data/definitions/546.html), which identifies code > with comments that suggest that code is incomplete. This script was > tested against the kernel, and the following is a snippet of the > output that was generated. The output was verified by confirming that > the specified file does indeed have that string at the specified line. > > ./arch/arm/include/asm/pgtable.h contains FIXME on line 316 > ./arch/arm/include/debug/imx.S contains FIXME on line 14 > ./arch/arm/kernel/entry-header.S contains BUG on line 71 > ./arch/arm/kernel/fiq.c contains FIXME on line 72 > Okay, easy things first: The patch applies on v5.9-rc4 and the script runs. :) Now, the challenging parts: - I think your output should follow in its format the format of other "standard" tools that other tools use, e.g., look how gcc and clang compiler emit warnings, or how checkpatch.pl emits warnings. We should try to imitate their output format for this tool. Also, I would like to see the line which was warned about. - Second, we then need to look into true and false positives here. BUG appears in intended error messages, in DEBUG, in functions, such as BUG, BUG_ON. So we need to collect the context around BUG and see to improve the pattern to at least reduce the false positives. - Third, we then need have suitable ways to aggregate the findings to know which places are known to have many such findings and hence deserve some kind of special care. And then, we need a lot of help to make proper evaluations what to do. Lukas > Signed-off-by: Mohammed Billoo <mab@mab-labs.com> > --- > Makefile | 8 +++++++- > scripts/checkcomment.pl | 35 +++++++++++++++++++++++++++++++++++ > 2 files changed, 42 insertions(+), 1 deletion(-) > create mode 100644 scripts/checkcomment.pl > > diff --git a/Makefile b/Makefile > index f21168154160..c84b8bc5c18e 100644 > --- a/Makefile > +++ b/Makefile > @@ -264,7 +264,7 @@ no-dot-config-targets := $(clean-targets) \ > cscope gtags TAGS tags help% %docs check% coccicheck \ > $(version_h) headers headers_% archheaders archscripts \ > %asm-generic kernelversion %src-pkg dt_binding_check \ > - outputmakefile > + outputmakefile commentcheck > no-sync-config-targets := $(no-dot-config-targets) %install kernelrelease > single-targets := %.a %.i %.ko %.lds %.ll %.lst %.mod %.o %.s %.symtypes %/ > > @@ -1575,6 +1575,7 @@ help: > @echo ' export_report - List the usages of all exported symbols' > @echo ' headerdep - Detect inclusion cycles in headers' > @echo ' coccicheck - Check with Coccinelle' > + @echo ' commentcheck - Check and report suspicious comments' > @echo '' > @echo 'Tools:' > @echo ' nsdeps - Generate missing symbol namespace dependencies' > @@ -1842,6 +1843,11 @@ versioncheck: > -name '*.[hcS]' -type f -print | sort \ > | xargs $(PERL) -w $(srctree)/scripts/checkversion.pl > > +commentcheck: > + find $(srctree)/* $(RCS_FIND_IGNORE) \ > + -name '*.[hcS]' -type f -print | sort \ > + | xargs $(PERL) -w $(srctree)/scripts/checkcomment.pl > + > coccicheck: > $(Q)$(BASH) $(srctree)/scripts/$@ > > diff --git a/scripts/checkcomment.pl b/scripts/checkcomment.pl > new file mode 100644 > index 000000000000..22fd77bc75d1 > --- /dev/null > +++ b/scripts/checkcomment.pl > @@ -0,0 +1,35 @@ > +#!/usr/bin/env perl > +# SPDX-License-Identifier: GPL-2.0 > +# > +# (c) 2020, Mohammed Billoo (mab@mab-labs.com) > +# > +# This script checks for any keywords outlined in CWE-546 > +# (https://cwe.mitre.org/data/definitions/546.html) > +# and simply reports them to the user. It's up to the user > +# to take any further actions. > + > +use strict; > + > +my @keywords = ('TODO', 'BUG', 'FIXME', 'HACK'); > +my @mismatch_keywords = ('BUG\(\)'); > + > +foreach my $file (@ARGV) { > + my $i = 1; > + open(my $f, '<', $file) > + or die "Cannot open $file: $!\n"; > + > + while (my $line = <$f>) { > + foreach my $keyword (@keywords) { > + if ($line =~ /\b$keyword\b/) { > + foreach my $mismatch_keyword (@mismatch_keywords) { > + if ($line =~ /$mismatch_keyword/) {} > + else { > + print "$file contains $keyword on line $i\n"; > + } > + } > + } > + } > + > + $i++; > + } > +} > -- > 2.17.1 > > > > > -=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#17): https://lists.elisa.tech/g/linux-safety/message/17 Mute This Topic: https://lists.elisa.tech/mt/76453566/4688437 Group Owner: linux-safety+owner@lists.elisa.tech Unsubscribe: https://lists.elisa.tech/g/linux-safety/unsub [patchwork-linux-safety@patchwork.kernel.org] -=-=-=-=-=-=-=-=-=-=-=-
> - Second, we then need to look into true and false positives here. > BUG appears in intended error messages, in DEBUG, in functions, such as > BUG, BUG_ON. So we need to collect the context around BUG and see to > improve the pattern to at least reduce the false positives. > I thought this was addressed in the Perl script. Did you find instances where this isn't the case (IIRC, BUG(), BUG_ON(), and anything else that is outside of a comment were ignored)? > - Third, we then need have suitable ways to aggregate the findings to know > which places are known to have many such findings and hence deserve some > kind of special care. > Do you think CodeChecker would be suitable for this? > And then, we need a lot of help to make proper evaluations what to do. Can you elaborate on this? I don't understand what you mean here.
On Wed, 9 Sep 2020, Mohammed Billoo wrote: > > - Second, we then need to look into true and false positives here. > > BUG appears in intended error messages, in DEBUG, in functions, such as > > BUG, BUG_ON. So we need to collect the context around BUG and see to > > improve the pattern to at least reduce the false positives. > > > I thought this was addressed in the Perl script. Did you find > instances where this isn't the case (IIRC, BUG(), BUG_ON(), and > anything else that is outside of a comment were ignored)? > Yes, I did when scanning through the instances. Let me consider these 10 examples here: grep "BUG" commentcheck_on_v5.9-rc4 | grep "arch/x86" | head -n 10 ./arch/x86/entry/entry_32.S contains BUG on line 155 ./arch/x86/include/asm/cpufeatures.h contains BUG on line 381 ./arch/x86/kernel/cpu/cpuid-deps.c contains BUG on line 90 ./arch/x86/kernel/kprobes/core.c contains BUG on line 655 ./arch/x86/kernel/nmi_selftest.c contains BUG on line 165 ./arch/x86/kernel/paravirt.c contains BUG on line 128 ./arch/x86/kernel/traps.c contains BUG on line 225 ./arch/x86/kernel/traps.c contains BUG on line 314 ./arch/x86/kvm/mmu/mmu.c contains BUG on line 1321 ./arch/x86/kvm/mmu/mmu.c contains BUG on line 1326 Case: ./arch/x86/kernel/nmi_selftest.c contains BUG on line 165 printk("BUG: %3d unexpected failures (out of %3d) - debugging disabled! |\n", unexpected_testcase_failures, testcase_total); Here BUG is within an intended printk message. Case: ./arch/x86/kernel/traps.c contains BUG on line 314 printk(KERN_EMERG "BUG: stack guard page was hit at %p (stack is %p..%p)\n", (void *)fault_address, current->stack, (char *)current->stack + THREAD_SIZE - 1); Here BUG is within an intended printk message. Case: ./arch/x86/kvm/mmu/mmu.c contains BUG on line 1321 pr_err("%s: %p 0->BUG\n", __func__, spte); Here BUG is within an intended pr_err message. Case: ./arch/x86/kvm/mmu/mmu.c contains BUG on line 1326 pr_err("%s: %p 1->BUG\n", __func__, spte); Here BUG is within an intended pr_err message. Summary: 10 examples looked at, 4 cases where it is part of an error message. At least in six cases, it needs more thought why it might not be relevant :) > > - Third, we then need have suitable ways to aggregate the findings to know > > which places are known to have many such findings and hence deserve some > > kind of special care. > > > Do you think CodeChecker would be suitable for this? > Well, I think we can record and track findings with CodeChecker if the output from the tool has a suitable format to parse back. The aggregations I have in mind are a bit orthogonal, though. For example: How "bad" is my architecture? grep "\./arch" commentcheck_on_v5.9-rc4 | sed 's!^.*arch/\([^/]*\)/.*$!\1!' | sort | uniq -c | sort -nr 159 powerpc 113 x86 109 arm 48 parisc 42 mips 28 alpha 24 microblaze 20 sh 18 ia64 16 m68k 15 sparc 15 openrisc 13 arc 10 xtensa 10 um 10 arm64 7 s390 7 riscv 5 nds32 3 hexagon 3 h8300 1 nios2 1 csky 1 c6x We want to provide a tool that can guide a user through this data your script collects. Some way to show this data in a reasonable aggregation can help (not that I am suggesting that this aggregation above is reasonable, but it is a first example I could quickly hack together, but maybe it is even reasonable?). > > And then, we need a lot of help to make proper evaluations what to do. > Can you elaborate on this? I don't understand what you mean here. > The statistics above shows there are 113 issues in x86; now, if you would ever want to use that architecture for anything that matters to you, I guess you need to understand all those 113 issues. That needs a lot of help from others that understand the code and the issue, right? Collecting data is a good first step, but there is still a long way ahead. Lukas -=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#19): https://lists.elisa.tech/g/linux-safety/message/19 Mute This Topic: https://lists.elisa.tech/mt/76453566/4688437 Group Owner: linux-safety+owner@lists.elisa.tech Unsubscribe: https://lists.elisa.tech/g/linux-safety/unsub [patchwork-linux-safety@patchwork.kernel.org] -=-=-=-=-=-=-=-=-=-=-=-
Thanks for the details. I will update the script and submit a revised version of the patch to address your comments. Thanks On Thu, Sep 10, 2020 at 2:42 AM Lukas Bulwahn <lukas.bulwahn@gmail.com> wrote: > > > > On Wed, 9 Sep 2020, Mohammed Billoo wrote: > > > > - Second, we then need to look into true and false positives here. > > > BUG appears in intended error messages, in DEBUG, in functions, such as > > > BUG, BUG_ON. So we need to collect the context around BUG and see to > > > improve the pattern to at least reduce the false positives. > > > > > I thought this was addressed in the Perl script. Did you find > > instances where this isn't the case (IIRC, BUG(), BUG_ON(), and > > anything else that is outside of a comment were ignored)? > > > > Yes, I did when scanning through the instances. > > Let me consider these 10 examples here: > > grep "BUG" commentcheck_on_v5.9-rc4 | grep "arch/x86" | head -n 10 > > ./arch/x86/entry/entry_32.S contains BUG on line 155 > ./arch/x86/include/asm/cpufeatures.h contains BUG on line 381 > ./arch/x86/kernel/cpu/cpuid-deps.c contains BUG on line 90 > ./arch/x86/kernel/kprobes/core.c contains BUG on line 655 > ./arch/x86/kernel/nmi_selftest.c contains BUG on line 165 > ./arch/x86/kernel/paravirt.c contains BUG on line 128 > ./arch/x86/kernel/traps.c contains BUG on line 225 > ./arch/x86/kernel/traps.c contains BUG on line 314 > ./arch/x86/kvm/mmu/mmu.c contains BUG on line 1321 > ./arch/x86/kvm/mmu/mmu.c contains BUG on line 1326 > > > Case: > > ./arch/x86/kernel/nmi_selftest.c contains BUG on line 165 > > printk("BUG: %3d unexpected failures (out of %3d) - debugging disabled! |\n", > unexpected_testcase_failures, testcase_total); > > > Here BUG is within an intended printk message. > > > Case: > > ./arch/x86/kernel/traps.c contains BUG on line 314 > > > printk(KERN_EMERG "BUG: stack guard page was hit at %p (stack is %p..%p)\n", > (void *)fault_address, current->stack, > (char *)current->stack + THREAD_SIZE - 1); > > Here BUG is within an intended printk message. > > > Case: > > ./arch/x86/kvm/mmu/mmu.c contains BUG on line 1321 > > pr_err("%s: %p 0->BUG\n", __func__, spte); > > Here BUG is within an intended pr_err message. > > > Case: > > ./arch/x86/kvm/mmu/mmu.c contains BUG on line 1326 > > pr_err("%s: %p 1->BUG\n", __func__, spte); > > Here BUG is within an intended pr_err message. > > > Summary: > > 10 examples looked at, 4 cases where it is part of an error message. > > At least in six cases, it needs more thought why it might not be > relevant :) > > > > - Third, we then need have suitable ways to aggregate the findings to know > > > which places are known to have many such findings and hence deserve some > > > kind of special care. > > > > > Do you think CodeChecker would be suitable for this? > > > > Well, I think we can record and track findings with CodeChecker if the > output from the tool has a suitable format to parse back. > > The aggregations I have in mind are a bit orthogonal, though. > > For example: > > How "bad" is my architecture? > > grep "\./arch" commentcheck_on_v5.9-rc4 | sed 's!^.*arch/\([^/]*\)/.*$!\1!' | sort | uniq -c | sort -nr > > 159 powerpc > 113 x86 > 109 arm > 48 parisc > 42 mips > 28 alpha > 24 microblaze > 20 sh > 18 ia64 > 16 m68k > 15 sparc > 15 openrisc > 13 arc > 10 xtensa > 10 um > 10 arm64 > 7 s390 > 7 riscv > 5 nds32 > 3 hexagon > 3 h8300 > 1 nios2 > 1 csky > 1 c6x > > We want to provide a tool that can guide a user through this data your > script collects. Some way to show this data in a reasonable aggregation > can help (not that I am suggesting that this aggregation above is > reasonable, but it is a first example I could quickly hack together, but > maybe it is even reasonable?). > > > > And then, we need a lot of help to make proper evaluations what to do. > > Can you elaborate on this? I don't understand what you mean here. > > > > The statistics above shows there are 113 issues in x86; now, if you would > ever want to use that architecture for anything that matters to you, I > guess you need to understand all those 113 issues. That needs a lot of > help from others that understand the code and the issue, right? > > Collecting data is a good first step, but there is still a long way ahead. > > Lukas
diff --git a/Makefile b/Makefile index f21168154160..c84b8bc5c18e 100644 --- a/Makefile +++ b/Makefile @@ -264,7 +264,7 @@ no-dot-config-targets := $(clean-targets) \ cscope gtags TAGS tags help% %docs check% coccicheck \ $(version_h) headers headers_% archheaders archscripts \ %asm-generic kernelversion %src-pkg dt_binding_check \ - outputmakefile + outputmakefile commentcheck no-sync-config-targets := $(no-dot-config-targets) %install kernelrelease single-targets := %.a %.i %.ko %.lds %.ll %.lst %.mod %.o %.s %.symtypes %/ @@ -1575,6 +1575,7 @@ help: @echo ' export_report - List the usages of all exported symbols' @echo ' headerdep - Detect inclusion cycles in headers' @echo ' coccicheck - Check with Coccinelle' + @echo ' commentcheck - Check and report suspicious comments' @echo '' @echo 'Tools:' @echo ' nsdeps - Generate missing symbol namespace dependencies' @@ -1842,6 +1843,11 @@ versioncheck: -name '*.[hcS]' -type f -print | sort \ | xargs $(PERL) -w $(srctree)/scripts/checkversion.pl +commentcheck: + find $(srctree)/* $(RCS_FIND_IGNORE) \ + -name '*.[hcS]' -type f -print | sort \ + | xargs $(PERL) -w $(srctree)/scripts/checkcomment.pl + coccicheck: $(Q)$(BASH) $(srctree)/scripts/$@ diff --git a/scripts/checkcomment.pl b/scripts/checkcomment.pl new file mode 100644 index 000000000000..22fd77bc75d1 --- /dev/null +++ b/scripts/checkcomment.pl @@ -0,0 +1,35 @@ +#!/usr/bin/env perl +# SPDX-License-Identifier: GPL-2.0 +# +# (c) 2020, Mohammed Billoo (mab@mab-labs.com) +# +# This script checks for any keywords outlined in CWE-546 +# (https://cwe.mitre.org/data/definitions/546.html) +# and simply reports them to the user. It's up to the user +# to take any further actions. + +use strict; + +my @keywords = ('TODO', 'BUG', 'FIXME', 'HACK'); +my @mismatch_keywords = ('BUG\(\)'); + +foreach my $file (@ARGV) { + my $i = 1; + open(my $f, '<', $file) + or die "Cannot open $file: $!\n"; + + while (my $line = <$f>) { + foreach my $keyword (@keywords) { + if ($line =~ /\b$keyword\b/) { + foreach my $mismatch_keyword (@mismatch_keywords) { + if ($line =~ /$mismatch_keyword/) {} + else { + print "$file contains $keyword on line $i\n"; + } + } + } + } + + $i++; + } +}
This perl script attempts to mitigate CWE-546 (https://cwe.mitre.org/data/definitions/546.html), which identifies code with comments that suggest that code is incomplete. This script was tested against the kernel, and the following is a snippet of the output that was generated. The output was verified by confirming that the specified file does indeed have that string at the specified line. ./arch/arm/include/asm/pgtable.h contains FIXME on line 316 ./arch/arm/include/debug/imx.S contains FIXME on line 14 ./arch/arm/kernel/entry-header.S contains BUG on line 71 ./arch/arm/kernel/fiq.c contains FIXME on line 72 Signed-off-by: Mohammed Billoo <mab@mab-labs.com> --- Makefile | 8 +++++++- scripts/checkcomment.pl | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 42 insertions(+), 1 deletion(-) create mode 100644 scripts/checkcomment.pl