[v2,2/4] llvm-cov: add Clang's MC/DC support

Message ID 20240905043245.1389509-3-wentaoz5@illinois.edu (mailing list archive)
State Handled Elsewhere
Series Enable measuring the kernel's Source-based Code Coverage and MC/DC with Clang

Commit Message

Wentao Zhang Sept. 5, 2024, 4:32 a.m. UTC
Add infrastructure to enable Clang's Modified Condition/Decision Coverage
(MC/DC) [1].

Clang has added MC/DC support as of its 18.1.0 release. MC/DC is a fine-
grained coverage metric required by many automotive and aviation industrial
standards for certifying mission-critical software [2].

In the following example from arch/x86/events/probe.c, llvm-cov gives the
MC/DC measurement for the compound logic decision at line 43.

   43|     12|			if (msr[bit].test && !msr[bit].test(bit, data))
  ------------------
  |---> MC/DC Decision Region (43:8) to (43:50)
  |
  |  Number of Conditions: 2
  |     Condition C1 --> (43:8)
  |     Condition C2 --> (43:25)
  |
  |  Executed MC/DC Test Vectors:
  |
  |     C1, C2    Result
  |  1 { T,  F  = F      }
  |  2 { T,  T  = T      }
  |
  |  C1-Pair: not covered
  |  C2-Pair: covered: (1,2)
  |  MC/DC Coverage for Decision: 50.00%
  |
  ------------------
   44|      5|				continue;

As the results suggest, during the span of measurement, only condition C2
(!msr[bit].test(bit, data)) is covered. That means C2 was evaluated to both
true and false, and in those test vectors C2 affected the decision outcome
independently. Therefore MC/DC for this decision is 1 out of 2 (50.00%).
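
The pairing rule behind "C1-Pair" and "C2-Pair" can be sketched in C. This
is an illustration only, not llvm-cov's implementation: the type and
function names are hypothetical, and it assumes a pair counts when the
condition under test flips, the result flips, and every other condition
is either equal or was skipped by short-circuit evaluation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Value of one condition in an executed test vector. */
enum cond { COND_F = 0, COND_T = 1, COND_NA = 2 /* short-circuited */ };

#define NCOND 2

struct test_vector {
	enum cond c[NCOND];
	bool result;
};

/* Two values are compatible if equal or if either was not evaluated. */
static bool compatible(enum cond a, enum cond b)
{
	return a == b || a == COND_NA || b == COND_NA;
}

/*
 * Return true if some pair of vectors shows condition idx affecting the
 * outcome on its own: idx is evaluated in both vectors and differs, the
 * results differ, and all other conditions are compatible.
 */
static bool has_independence_pair(const struct test_vector *tv, size_t n,
				  size_t idx)
{
	assert(idx < NCOND);
	for (size_t i = 0; i < n; i++) {
		for (size_t j = i + 1; j < n; j++) {
			enum cond a = tv[i].c[idx], b = tv[j].c[idx];
			bool ok = true;

			if (a == COND_NA || b == COND_NA || a == b)
				continue;
			if (tv[i].result == tv[j].result)
				continue;
			for (size_t k = 0; k < NCOND; k++)
				if (k != idx &&
				    !compatible(tv[i].c[k], tv[j].c[k]))
					ok = false;
			if (ok)
				return true;
		}
	}
	return false;
}

/* The two executed vectors from the report above. */
static const struct test_vector executed[] = {
	{ { COND_T, COND_F }, false },	/* vector 1 */
	{ { COND_T, COND_T }, true },	/* vector 2 */
};
```

With only these two vectors, C1 is true in both, so no pair can isolate
it. Under the stated assumption, a run where msr[bit].test is NULL
(vector { F, - = F }) would pair with vector 2 and cover C1.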

As of Clang 19, users can determine the max number of conditions in a
decision to measure via option LLVM_COV_KERNEL_MCDC_MAX_CONDITIONS, which
controls -fmcdc-max-conditions flag of Clang cc1 [3]. Since MC/DC
implementation utilizes bitmaps to track the execution of test vectors,
more memory is consumed if larger decisions are getting counted. The
maximum value supported by Clang is 32767. According to local experiments,
the working maximum for Linux kernel is 46, with the largest decisions in
kernel codebase (with 47 conditions, as of v6.11) excluded, otherwise the
kernel image size limit will be exceeded. The largest decisions in kernel
are contributed for example by macros checking CPUID.
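
To give a feel for how bitmap size can grow with decision shape, here is
a rough C sketch that counts the distinct test vectors a decision can
produce under short-circuit evaluation. It assumes, without having
verified it against Clang's source, that the bitmap needs about one bit
per such test vector as described in [3]. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct test vectors that end in true (t) or false (f). */
struct tv_count {
	uint64_t t, f;
};

/* A single condition: one way to be true, one way to be false. */
static struct tv_count tv_leaf(void)
{
	return (struct tv_count){ 1, 1 };
}

/* l && r: true needs both sides; false stops at l or continues into r. */
static struct tv_count tv_and(struct tv_count l, struct tv_count r)
{
	return (struct tv_count){ l.t * r.t, l.f + l.t * r.f };
}

/* l || r: true stops at l or continues into r; false needs both sides. */
static struct tv_count tv_or(struct tv_count l, struct tv_count r)
{
	return (struct tv_count){ l.t + l.f * r.t, l.f * r.f };
}

/* Total test vectors, i.e. the assumed number of bitmap bits. */
static uint64_t tv_total(struct tv_count c)
{
	assert(c.t <= UINT64_MAX - c.f);	/* guard against overflow */
	return c.t + c.f;
}
```

For example, a && b gives 3 vectors and a && b && c gives 4, so a pure &&
chain grows linearly. An || of k two-condition && terms gives 2^(k+1) - 1
vectors (7 for k = 2, 15 for k = 3), which is the kind of shape that can
make large decisions expensive.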

Code exceeding LLVM_COV_KERNEL_MCDC_MAX_CONDITIONS will produce compiler
warnings.

As of LLVM 19, certain expressions are still not covered, and will produce
build warnings when they are encountered:

"[...] if a boolean expression is embedded in the nest of another boolean
 expression but separated by a non-logical operator, this is also not
 supported. For example, in x = (a && b && c && func(d && f)), the d && f
 case starts a new boolean expression that is separated from the other
 conditions by the operator func(). When this is encountered, a warning
 will be generated and the boolean expression will not be
 instrumented." [4]
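
For reference, the quoted case looks like this as compilable C. func() is
a hypothetical stand-in, and the second function shows one way to avoid
the nested form by hoisting d && f into its own statement. Whether Clang
then instruments both decisions is an assumption based on the quoted
text and has not been verified here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for func() in the quoted example. */
static bool func(bool v)
{
	return v;
}

/* The quoted form: d && f is nested inside the call to func(). */
static bool nested_form(bool a, bool b, bool c, bool d, bool f)
{
	return a && b && c && func(d && f);
}

/* d && f hoisted out, so neither decision contains the other. */
static bool hoisted_form(bool a, bool b, bool c, bool d, bool f)
{
	bool inner = d && f;

	return a && b && c && func(inner);
}

/* Check that both forms agree on all 32 input combinations. */
static void check_forms_agree(void)
{
	for (unsigned int m = 0; m < 32; m++) {
		bool a = m & 1, b = m & 2, c = m & 4, d = m & 8, f = m & 16;

		assert(nested_form(a, b, c, d, f) ==
		       hoisted_form(a, b, c, d, f));
	}
}
```

Note that hoisting evaluates d && f even when a, b or c is false, so the
two forms are only interchangeable when the operands have no side
effects.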

Link: https://en.wikipedia.org/wiki/Modified_condition%2Fdecision_coverage [1]
Link: https://digital-library.theiet.org/content/journals/10.1049/sej.1994.0025 [2]
Link: https://discourse.llvm.org/t/rfc-coverage-new-algorithm-and-file-format-for-mc-dc/76798 [3]
Link: https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#mc-dc-instrumentation [4]
Signed-off-by: Wentao Zhang <wentaoz5@illinois.edu>
Reviewed-by: Chuck Wolber <chuck.wolber@boeing.com>
Tested-by: Chuck Wolber <chuck.wolber@boeing.com>
---
 Makefile                |  6 ++++++
 kernel/llvm-cov/Kconfig | 36 ++++++++++++++++++++++++++++++++++++
 scripts/Makefile.lib    | 12 ++++++++++++
 3 files changed, 54 insertions(+)

Comments

Nathan Chancellor Oct. 2, 2024, 1:10 a.m. UTC | #1
Hi Wentao,

On Wed, Sep 04, 2024 at 11:32:43PM -0500, Wentao Zhang wrote:
> Add infrastructure to enable Clang's Modified Condition/Decision Coverage
> (MC/DC) [1].
> 
> Clang has added MC/DC support as of its 18.1.0 release. MC/DC is a fine-
> grained coverage metric required by many automotive and aviation industrial
> standards for certifying mission-critical software [2].
> 
> In the following example from arch/x86/events/probe.c, llvm-cov gives the
> MC/DC measurement for the compound logic decision at line 43.
> 
>    43|     12|			if (msr[bit].test && !msr[bit].test(bit, data))
>   ------------------
>   |---> MC/DC Decision Region (43:8) to (43:50)
>   |
>   |  Number of Conditions: 2
>   |     Condition C1 --> (43:8)
>   |     Condition C2 --> (43:25)
>   |
>   |  Executed MC/DC Test Vectors:
>   |
>   |     C1, C2    Result
>   |  1 { T,  F  = F      }
>   |  2 { T,  T  = T      }
>   |
>   |  C1-Pair: not covered
>   |  C2-Pair: covered: (1,2)
>   |  MC/DC Coverage for Decision: 50.00%
>   |
>   ------------------
>    44|      5|				continue;
> 
> As the results suggest, during the span of measurement, only condition C2
> (!msr[bit].test(bit, data)) is covered. That means C2 was evaluated to both
> true and false, and in those test vectors C2 affected the decision outcome
> independently. Therefore MC/DC for this decision is 1 out of 2 (50.00%).

Thanks a lot for the detail in the commit message. Your first talk at
LPC in the Refereed Track was excellent as well. If the video for that
talk becomes available soon, it would be helpful to link that in the
commit message as well.

> As of Clang 19, users can determine the max number of conditions in a
> decision to measure via option LLVM_COV_KERNEL_MCDC_MAX_CONDITIONS, which
> controls -fmcdc-max-conditions flag of Clang cc1 [3]. Since MC/DC
> implementation utilizes bitmaps to track the execution of test vectors,
> more memory is consumed if larger decisions are getting counted. The

Some of this could potentially be in the Kconfig text below as it seems
relevant for users to make a decision on modifying its value.

> maximum value supported by Clang is 32767. According to local experiments,
> the working maximum for Linux kernel is 46, with the largest decisions in
> kernel codebase (with 47 conditions, as of v6.11) excluded, otherwise the
> kernel image size limit will be exceeded. The largest decisions in kernel
> are contributed for example by macros checking CPUID.
> 
> Code exceeding LLVM_COV_KERNEL_MCDC_MAX_CONDITIONS will produce compiler
> warnings.
> 
> As of LLVM 19, certain expressions are still not covered, and will produce
> build warnings when they are encountered:
> 
> "[...] if a boolean expression is embedded in the nest of another boolean
>  expression but separated by a non-logical operator, this is also not
>  supported. For example, in x = (a && b && c && func(d && f)), the d && f
>  case starts a new boolean expression that is separated from the other
>  conditions by the operator func(). When this is encountered, a warning
>  will be generated and the boolean expression will not be
>  instrumented." [4]

These two sets of warnings appear to be pretty noisy in my build
testing... Is there any way to shut them up? Perhaps it is good for
users to see these limitations but it basically makes the build output
useless. If there were switches, then they could be disabled in the
default case with a Kconfig option to turn them on if the user is
concerned with seeing which parts of their code are not instrumented. I
could see developers wanting to run this for writing tests and they
might not care about this as much as someone else might.

I did leave LLVM_COV_KERNEL_MCDC_MAX_CONDITIONS at its default value.
Perhaps there is a more reasonable default that would result in less
noisy build output but not run afoul of potential memory usage concerns?
I assume that mention means that memory usage may be a concern for the
type of deployments this technology would commonly be used with?

> Link: https://en.wikipedia.org/wiki/Modified_condition%2Fdecision_coverage [1]
> Link: https://digital-library.theiet.org/content/journals/10.1049/sej.1994.0025 [2]
> Link: https://discourse.llvm.org/t/rfc-coverage-new-algorithm-and-file-format-for-mc-dc/76798 [3]
> Link: https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#mc-dc-instrumentation [4]

Thank you for using this link format :)

> Signed-off-by: Wentao Zhang <wentaoz5@illinois.edu>
> Reviewed-by: Chuck Wolber <chuck.wolber@boeing.com>
> Tested-by: Chuck Wolber <chuck.wolber@boeing.com>

From an actual code perspective, this looks good to me.

Reviewed-by: Nathan Chancellor <nathan@kernel.org>

> diff --git a/Makefile b/Makefile
> index 51498134c..1185b38d6 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -740,6 +740,12 @@ all: vmlinux
>  CFLAGS_LLVM_COV := -fprofile-instr-generate -fcoverage-mapping
>  export CFLAGS_LLVM_COV
>  
> +CFLAGS_LLVM_COV_MCDC := -fcoverage-mcdc
> +ifdef CONFIG_LLVM_COV_KERNEL_MCDC_MAX_CONDITIONS
> +CFLAGS_LLVM_COV_MCDC += -Xclang -fmcdc-max-conditions=$(CONFIG_LLVM_COV_KERNEL_MCDC_MAX_CONDITIONS)

Why is -Xclang needed here? Is this not a full frontend flag?

> +endif
> +export CFLAGS_LLVM_COV_MCDC
> +
>  CFLAGS_GCOV	:= -fprofile-arcs -ftest-coverage
>  ifdef CONFIG_CC_IS_GCC
>  CFLAGS_GCOV	+= -fno-tree-loop-im

Wentao Zhang Oct. 3, 2024, 3:14 a.m. UTC | #2
Hi Nathan,

Thanks for your review! See some of my responses below inline. Other
comments, including those to [1/4] and [4/4], are acknowledged and will be
updated in v3.

On 2024-10-01 20:10, Nathan Chancellor wrote:
> ...
> > maximum value supported by Clang is 32767. According to local experiments,
> > the working maximum for Linux kernel is 46, with the largest decisions in
> > kernel codebase (with 47 conditions, as of v6.11) excluded, otherwise the
> > kernel image size limit will be exceeded. The largest decisions in kernel
> > are contributed for example by macros checking CPUID.
> > 
> > Code exceeding LLVM_COV_KERNEL_MCDC_MAX_CONDITIONS will produce compiler
> > warnings.
> > 
> > As of LLVM 19, certain expressions are still not covered, and will produce
> > build warnings when they are encountered:
> > 
> > "[...] if a boolean expression is embedded in the nest of another boolean
> >  expression but separated by a non-logical operator, this is also not
> >  supported. For example, in x = (a && b && c && func(d && f)), the d && f
> >  case starts a new boolean expression that is separated from the other
> >  conditions by the operator func(). When this is encountered, a warning
> >  will be generated and the boolean expression will not be
> >  instrumented." [4]
> 
> These two sets of warnings appear to be pretty noisy in my build
> testing... Is there any way to shut them up? Perhaps it is good for

Both warnings are currently implemented as custom diagnostics in
clang/lib/CodeGen/CodeGenPGO.cpp:dataTraverseStmtPost, so I'm afraid there
is no corresponding "-W[no-]xxx" flag at the moment. I agree such switches
would be desirable, but we would have to raise this with the LLVM community.

> users to see these limitations but it basically makes the build output
> useless. If there were switches, then they could be disabled in the
> default case with a Kconfig option to turn them on if the user is
> concerned with seeing which parts of their code are not instrumented. I
> could see developers wanting to run this for writing tests and they
> might not care about this as much as someone else might.
> 
> I did leave LLVM_COV_KERNEL_MCDC_MAX_CONDITIONS at its default value.
> Perhaps there is a more reasonable default that would result in less
> noisy build output but not run afoul of potential memory usage concerns?
> I assume that mention means that memory usage may be a concern for the
> type of deployments this technology would commonly be used with?

In my experience, raising this threshold won't really help with the
issue, because the other type of warning ("nested boolean") is even more
prevalent in the kernel codebase. I once built the kernel serially and
counted the number of instances in the gigantic log:

  unsupported number of conditions (>6): 837
  unsupported nested boolean:            8029

So again we should probably improve this on the tool side. I can talk to
developers there separately.
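
A tally like the one above could be produced with a small helper along
these lines. This is a minimal sketch: the names are hypothetical, and
the matched substrings are assumptions about the diagnostic wording that
should be adjusted to whatever the build log actually contains.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum mcdc_warning {
	WARN_OTHER,
	WARN_TOO_MANY_CONDITIONS,
	WARN_NESTED_BOOLEAN,
};

/* Classify one log line. The substrings below are assumed wording. */
static enum mcdc_warning classify_line(const char *line)
{
	assert(line != NULL);
	if (strstr(line, "unsupported MC/DC boolean expression") == NULL)
		return WARN_OTHER;
	if (strstr(line, "number of conditions") != NULL)
		return WARN_TOO_MANY_CONDITIONS;
	if (strstr(line, "nested boolean") != NULL)
		return WARN_NESTED_BOOLEAN;
	return WARN_OTHER;
}

struct mcdc_tally {
	unsigned long too_many;
	unsigned long nested;
};

/* Update the running counts with one log line. */
static void tally_line(struct mcdc_tally *t, const char *line)
{
	switch (classify_line(line)) {
	case WARN_TOO_MANY_CONDITIONS:
		t->too_many++;
		break;
	case WARN_NESTED_BOOLEAN:
		t->nested++;
		break;
	default:
		break;
	}
}
```

Feeding each line of a serial build log to tally_line(), for instance
from an fgets() loop over stdin, yields the two counts. A serial build
keeps each diagnostic on its own line, which a parallel build may not.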

> ...
> > diff --git a/Makefile b/Makefile
> > index 51498134c..1185b38d6 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -740,6 +740,12 @@ all: vmlinux
> >  CFLAGS_LLVM_COV := -fprofile-instr-generate -fcoverage-mapping
> >  export CFLAGS_LLVM_COV
> >  
> > +CFLAGS_LLVM_COV_MCDC := -fcoverage-mcdc
> > +ifdef CONFIG_LLVM_COV_KERNEL_MCDC_MAX_CONDITIONS
> > +CFLAGS_LLVM_COV_MCDC += -Xclang -fmcdc-max-conditions=$(CONFIG_LLVM_COV_KERNEL_MCDC_MAX_CONDITIONS)
> 
> Why is -Xclang needed here? Is this not a full frontend flag?

"-fmcdc-max-conditions" is a cc1 option only, while "-fcoverage-mcdc" is
both a cc1 option and a clang option. See llvm/llvm-project#82448 and their
changes to clang/include/clang/Driver/Options.td.

Thanks,
Wentao

> 
> > +endif
> > +export CFLAGS_LLVM_COV_MCDC
> > +
> >  CFLAGS_GCOV	:= -fprofile-arcs -ftest-coverage
> >  ifdef CONFIG_CC_IS_GCC
> >  CFLAGS_GCOV	+= -fno-tree-loop-im

Patch

diff --git a/Makefile b/Makefile
index 51498134c..1185b38d6 100644
--- a/Makefile
+++ b/Makefile
@@ -740,6 +740,12 @@  all: vmlinux
 CFLAGS_LLVM_COV := -fprofile-instr-generate -fcoverage-mapping
 export CFLAGS_LLVM_COV
 
+CFLAGS_LLVM_COV_MCDC := -fcoverage-mcdc
+ifdef CONFIG_LLVM_COV_KERNEL_MCDC_MAX_CONDITIONS
+CFLAGS_LLVM_COV_MCDC += -Xclang -fmcdc-max-conditions=$(CONFIG_LLVM_COV_KERNEL_MCDC_MAX_CONDITIONS)
+endif
+export CFLAGS_LLVM_COV_MCDC
+
 CFLAGS_GCOV	:= -fprofile-arcs -ftest-coverage
 ifdef CONFIG_CC_IS_GCC
 CFLAGS_GCOV	+= -fno-tree-loop-im
diff --git a/kernel/llvm-cov/Kconfig b/kernel/llvm-cov/Kconfig
index 9241fdfb0..66259e1f2 100644
--- a/kernel/llvm-cov/Kconfig
+++ b/kernel/llvm-cov/Kconfig
@@ -61,4 +61,40 @@  config LLVM_COV_PROFILE_ALL
 	  Note that a kernel compiled with profiling flags will be significantly
 	  larger and run slower.
 
+config LLVM_COV_KERNEL_MCDC
+	bool "Enable measuring modified condition/decision coverage (MC/DC)"
+	depends on LLVM_COV_KERNEL
+	depends on CLANG_VERSION >= 180000
+	help
+	  This option enables modified condition/decision coverage (MC/DC)
+	  code coverage instrumentation.
+
+	  If unsure, say N.
+
+	  This will add Clang's Source-based Code Coverage MC/DC
+	  instrumentation to your kernel. As of LLVM 19, certain expressions
+	  are still not covered, and will produce build warnings when they are
+	  encountered.
+
+	  "[...] if a boolean expression is embedded in the nest of another
+	   boolean expression but separated by a non-logical operator, this is
+	   also not supported. For example, in
+	   x = (a && b && c && func(d && f)), the d && f case starts a new
+	   boolean expression that is separated from the other conditions by the
+	   operator func(). When this is encountered, a warning will be
+	   generated and the boolean expression will not be instrumented."
+
+	   https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#mc-dc-instrumentation
+
+config LLVM_COV_KERNEL_MCDC_MAX_CONDITIONS
+	int "Maximum number of conditions in a decision to instrument"
+	range 6 32767
+	depends on LLVM_COV_KERNEL_MCDC
+	depends on CLANG_VERSION >= 190000
+	default "6"
+	help
+	  This value is passed to "-fmcdc-max-conditions" flag of Clang cc1.
+	  Expressions whose number of conditions is greater than this value will
+	  produce warnings and will not be instrumented.
+
 endmenu
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index b468856b8..afc94e92d 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -169,6 +169,18 @@  _c_flags += $(if $(patsubst n%,, \
 		$(CFLAGS_LLVM_COV))
 endif
 
+#
+# Flag that turns on modified condition/decision coverage (MC/DC) measurement
+# with Clang's Source-based Code Coverage. Enable the flag for a file or
+# directory depending on variables LLVM_COV_PROFILE_obj.o, LLVM_COV_PROFILE and
+# CONFIG_LLVM_COV_PROFILE_ALL.
+#
+ifeq ($(CONFIG_LLVM_COV_KERNEL_MCDC),y)
+_c_flags += $(if $(patsubst n%,, \
+		$(LLVM_COV_PROFILE_$(target-stem).o)$(LLVM_COV_PROFILE)$(if $(is-kernel-object),$(CONFIG_LLVM_COV_PROFILE_ALL))), \
+		$(CFLAGS_LLVM_COV_MCDC))
+endif
+
 #
 # Enable address sanitizer flags for kernel except some files or directories
 # we don't want to check (depends on variables KASAN_SANITIZE_obj.o, KASAN_SANITIZE)