diff mbox series

[XEN,for-4.19] domain: add ASSERT to help static analysis tools

Message ID 3f163bb58993410183229e72eb1f227057f9b1c7.1699034273.git.nicola.vetrini@bugseng.com (mailing list archive)
State Superseded
Headers show
Series [XEN,for-4.19] domain: add ASSERT to help static analysis tools | expand

Commit Message

Nicola Vetrini Nov. 3, 2023, 5:58 p.m. UTC
Static analysis tools may detect a possible null
pointer dereference at line 760 (the memcpy call)
of xen/common/domain.c. This ASSERT helps them in
detecting that such a condition is not possible
and also provides a basic sanity check.

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
---
The check may be later improved by proper error checking
instead of relying on the semantics explained here:
https://lore.kernel.org/xen-devel/61f04d4b-34d9-4fd1-a989-56b042b4f3d8@citrix.com/

This addresses the caution reported by ECLAIR for MISRA C:2012 D4.11
---
 xen/common/domain.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Nicola Vetrini Nov. 3, 2023, 6:08 p.m. UTC | #1
On 2023-11-03 18:58, Nicola Vetrini wrote:
> Static analysis tools may detect a possible null
> pointer dereference at line 760 (the memcpy call)
> of xen/common/domain.c. This ASSERT helps them in
> detecting that such a condition is not possible
> and also provides a basic sanity check.
> 
Suggested-by: Julien Grall <julien@xen.org>
> Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>

This should have been present as well, I forgot to add it.
Stefano Stabellini Nov. 7, 2023, 4:18 a.m. UTC | #2
On Fri, 3 Nov 2023, Nicola Vetrini wrote:
> On 2023-11-03 18:58, Nicola Vetrini wrote:
> > Static analysis tools may detect a possible null
> > pointer dereference at line 760 (the memcpy call)
> > of xen/common/domain.c. This ASSERT helps them in
> > detecting that such a condition is not possible
> > and also provides a basic sanity check.
> > 
> Suggested-by: Julien Grall <julien@xen.org>
> > Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>

Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Jan Beulich Nov. 8, 2023, 8:24 a.m. UTC | #3
On 03.11.2023 18:58, Nicola Vetrini wrote:
> Static analysis tools may detect a possible null
> pointer dereference at line 760 (the memcpy call)
> of xen/common/domain.c. This ASSERT helps them in
> detecting that such a condition is not possible
> and also provides a basic sanity check.

I disagree with this being a possible justification for adding such a
redundant assertion. More detail is needed on what is actually
(suspected to be) confusing the tool. Plus it also needs explaining
why (a) adding such an assertion helps and (b) how that's going to
cover release builds.

> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -700,6 +700,8 @@ struct domain *domain_create(domid_t domid,
>  
>      if ( !is_idle_domain(d) )
>      {
> +        ASSERT(config);
> +
>          watchdog_domain_init(d);
>          init_status |= INIT_watchdog;

The assertion being redundant clearly requires it to be accompanied
by a comment. Otherwise it is going to be a prime subject of
redundancy elimination.

Jan
Nicola Vetrini Nov. 8, 2023, 11:03 a.m. UTC | #4
On 2023-11-08 09:24, Jan Beulich wrote:
> On 03.11.2023 18:58, Nicola Vetrini wrote:
>> Static analysis tools may detect a possible null
>> pointer dereference at line 760 (the memcpy call)
>> of xen/common/domain.c. This ASSERT helps them in
>> detecting that such a condition is not possible
>> and also provides a basic sanity check.
> 
> I disagree with this being a possible justification for adding such a
> redundant assertion. More detail is needed on what is actually
> (suspected to be) confusing the tool. Plus it also needs explaining
> why (a) adding such an assertion helps and (b) how that's going to
> cover release builds.
> 

How about:
"Static analysis tools may detect a possible null pointer dereference
at line 760 (config->handle) due to config possibly being NULL.

However, given that all system domains, including IDLE, have a NULL
config and in the code path leading to the assertion only real domains
(which have a non-NULL config) can be present."

On point b): this finding is a false positive, therefore even if the 
ASSERT is
expanded to effectively a no-op, there is no inherent problem with Xen's 
code.
The context in which the patch was suggested [1] hinted at avoiding 
inserting in
the codebase false positive comments.

>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -700,6 +700,8 @@ struct domain *domain_create(domid_t domid,
>> 
>>      if ( !is_idle_domain(d) )
>>      {
>> +        ASSERT(config);
>> +
>>          watchdog_domain_init(d);
>>          init_status |= INIT_watchdog;
> 
> The assertion being redundant clearly requires it to be accompanied
> by a comment. Otherwise it is going to be a prime subject of
> redundancy elimination.
> 
> Jan

Good point; I'd rather keep it short and understandable
(e.g. "This ASSERT helps static analysis tool avoid a false positive
on a possible NULL dereference of config")

[1] 
https://lore.kernel.org/xen-devel/5ddb6398-f2a3-4bcb-8808-bad653b6c3cd@xen.org/
Jan Beulich Nov. 8, 2023, 11:19 a.m. UTC | #5
On 08.11.2023 12:03, Nicola Vetrini wrote:
> On 2023-11-08 09:24, Jan Beulich wrote:
>> On 03.11.2023 18:58, Nicola Vetrini wrote:
>>> Static analysis tools may detect a possible null
>>> pointer dereference at line 760 (the memcpy call)
>>> of xen/common/domain.c. This ASSERT helps them in
>>> detecting that such a condition is not possible
>>> and also provides a basic sanity check.
>>
>> I disagree with this being a possible justification for adding such a
>> redundant assertion. More detail is needed on what is actually
>> (suspected to be) confusing the tool. Plus it also needs explaining
>> why (a) adding such an assertion helps and (b) how that's going to
>> cover release builds.
>>
> 
> How about:
> "Static analysis tools may detect a possible null pointer dereference
> at line 760 (config->handle) due to config possibly being NULL.
> 
> However, given that all system domains, including IDLE, have a NULL
> config and in the code path leading to the assertion only real domains
> (which have a non-NULL config) can be present."
> 
> On point b): this finding is a false positive, therefore even if the 
> ASSERT is
> expanded to effectively a no-op, there is no inherent problem with Xen's 
> code.
> The context in which the patch was suggested [1] hinted at avoiding 
> inserting in
> the codebase false positive comments.

Which I largely agree with. What I don't agree with is adding an
assertion which is only papering over the issue, and only in debug
builds. So perhaps instead we need a different way of tracking
false positives (which need to be tied to specific checker versions
anyway).

Jan
Nicola Vetrini Nov. 8, 2023, 1:28 p.m. UTC | #6
On 2023-11-08 12:19, Jan Beulich wrote:
> On 08.11.2023 12:03, Nicola Vetrini wrote:
>> On 2023-11-08 09:24, Jan Beulich wrote:
>>> On 03.11.2023 18:58, Nicola Vetrini wrote:
>>>> Static analysis tools may detect a possible null
>>>> pointer dereference at line 760 (the memcpy call)
>>>> of xen/common/domain.c. This ASSERT helps them in
>>>> detecting that such a condition is not possible
>>>> and also provides a basic sanity check.
>>> 
>>> I disagree with this being a possible justification for adding such a
>>> redundant assertion. More detail is needed on what is actually
>>> (suspected to be) confusing the tool. Plus it also needs explaining
>>> why (a) adding such an assertion helps and (b) how that's going to
>>> cover release builds.
>>> 
>> 
>> How about:
>> "Static analysis tools may detect a possible null pointer dereference
>> at line 760 (config->handle) due to config possibly being NULL.
>> 
>> However, given that all system domains, including IDLE, have a NULL
>> config and in the code path leading to the assertion only real domains
>> (which have a non-NULL config) can be present."
>> 
>> On point b): this finding is a false positive, therefore even if the
>> ASSERT is
>> expanded to effectively a no-op, there is no inherent problem with 
>> Xen's
>> code.
>> The context in which the patch was suggested [1] hinted at avoiding
>> inserting in
>> the codebase false positive comments.
> 
> Which I largely agree with. What I don't agree with is adding an
> assertion which is only papering over the issue, and only in debug
> builds. So perhaps instead we need a different way of tracking
> false positives (which need to be tied to specific checker versions
> anyway).
> 

Hmm. Is it better in your opinion to write something like:

if (config == NULL)
    return ERR_PTR(<some error code>); // or die() or something 
appropriate

this would be a rudimentary handling of the error with some messages 
detailing that something
is wrong if a domain has a null config at that point.

To be clear: I'm fine with every way of deviating the construct, but 
agreeing on an
alternate mechanism to SAF-x-false-positive would land later than 
implementing some form
of error handling, I think.
Julien Grall Nov. 8, 2023, 1:33 p.m. UTC | #7
Hi Jan,

On 08/11/2023 11:19, Jan Beulich wrote:
> On 08.11.2023 12:03, Nicola Vetrini wrote:
>> On 2023-11-08 09:24, Jan Beulich wrote:
>>> On 03.11.2023 18:58, Nicola Vetrini wrote:
>>>> Static analysis tools may detect a possible null
>>>> pointer dereference at line 760 (the memcpy call)
>>>> of xen/common/domain.c. This ASSERT helps them in
>>>> detecting that such a condition is not possible
>>>> and also provides a basic sanity check.
>>>
>>> I disagree with this being a possible justification for adding such a
>>> redundant assertion. More detail is needed on what is actually
>>> (suspected to be) confusing the tool. Plus it also needs explaining
>>> why (a) adding such an assertion helps and (b) how that's going to
>>> cover release builds.
>>>
>>
>> How about:
>> "Static analysis tools may detect a possible null pointer dereference
>> at line 760 (config->handle) due to config possibly being NULL.
>>
>> However, given that all system domains, including IDLE, have a NULL
>> config and in the code path leading to the assertion only real domains
>> (which have a non-NULL config) can be present."
>>
>> On point b): this finding is a false positive, therefore even if the
>> ASSERT is
>> expanded to effectively a no-op, there is no inherent problem with Xen's
>> code.
>> The context in which the patch was suggested [1] hinted at avoiding
>> inserting in
>> the codebase false positive comments.
> 
> Which I largely agree with. What I don't agree with is adding an
> assertion which is only papering over the issue, and only in debug
> builds.

I expect that the number of issues will increase a lot as soon as we 
start to analyze production builds.

I don't think it will be a solution to either replace all the ASSERT() 
with runtime check in all configuration or even...

> So perhaps instead we need a different way of tracking
> false positives (which need to be tied to specific checker versions
> anyway).

... documenting false positive.

IMHO, the only viable option would be to have a configuration to keep 
ASSERT in production build for scanning tools.

Cheers,
Andrew Cooper Nov. 8, 2023, 1:37 p.m. UTC | #8
On 03/11/2023 5:58 pm, Nicola Vetrini wrote:
> Static analysis tools may detect a possible null
> pointer dereference at line 760 (the memcpy call)
> of xen/common/domain.c. This ASSERT helps them in
> detecting that such a condition is not possible
> and also provides a basic sanity check.
>
> Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
> ---
> The check may be later improved by proper error checking
> instead of relying on the semantics explained here:
> https://lore.kernel.org/xen-devel/61f04d4b-34d9-4fd1-a989-56b042b4f3d8@citrix.com/
>
> This addresses the caution reported by ECLAIR for MISRA C:2012 D4.11
> ---
>  xen/common/domain.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 8f9ab01c0cb7..9378c0417645 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -700,6 +700,8 @@ struct domain *domain_create(domid_t domid,
>  
>      if ( !is_idle_domain(d) )
>      {
> +        ASSERT(config);
> +
>          watchdog_domain_init(d);
>          init_status |= INIT_watchdog;
>  

I have an idea that might resolve this differently and in an easier way.

Would you be happy waiting for a couple of days for me to experiment? 
Absolutely no guarantees of it turning into a workable solution.

~Andrew
Nicola Vetrini Nov. 8, 2023, 1:45 p.m. UTC | #9
On 2023-11-08 14:37, Andrew Cooper wrote:
> On 03/11/2023 5:58 pm, Nicola Vetrini wrote:
>> Static analysis tools may detect a possible null
>> pointer dereference at line 760 (the memcpy call)
>> of xen/common/domain.c. This ASSERT helps them in
>> detecting that such a condition is not possible
>> and also provides a basic sanity check.
>> 
>> Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
>> ---
>> The check may be later improved by proper error checking
>> instead of relying on the semantics explained here:
>> https://lore.kernel.org/xen-devel/61f04d4b-34d9-4fd1-a989-56b042b4f3d8@citrix.com/
>> 
>> This addresses the caution reported by ECLAIR for MISRA C:2012 D4.11
>> ---
>>  xen/common/domain.c | 2 ++
>>  1 file changed, 2 insertions(+)
>> 
>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>> index 8f9ab01c0cb7..9378c0417645 100644
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -700,6 +700,8 @@ struct domain *domain_create(domid_t domid,
>> 
>>      if ( !is_idle_domain(d) )
>>      {
>> +        ASSERT(config);
>> +
>>          watchdog_domain_init(d);
>>          init_status |= INIT_watchdog;
>> 
> 
> I have an idea that might resolve this differently and in an easier 
> way.
> 
> Would you be happy waiting for a couple of days for me to experiment? 
> Absolutely no guarantees of it turning into a workable solution.
> 

Sure, no problem.
Jan Beulich Nov. 9, 2023, 7:42 a.m. UTC | #10
On 08.11.2023 14:33, Julien Grall wrote:
> Hi Jan,
> 
> On 08/11/2023 11:19, Jan Beulich wrote:
>> On 08.11.2023 12:03, Nicola Vetrini wrote:
>>> On 2023-11-08 09:24, Jan Beulich wrote:
>>>> On 03.11.2023 18:58, Nicola Vetrini wrote:
>>>>> Static analysis tools may detect a possible null
>>>>> pointer dereference at line 760 (the memcpy call)
>>>>> of xen/common/domain.c. This ASSERT helps them in
>>>>> detecting that such a condition is not possible
>>>>> and also provides a basic sanity check.
>>>>
>>>> I disagree with this being a possible justification for adding such a
>>>> redundant assertion. More detail is needed on what is actually
>>>> (suspected to be) confusing the tool. Plus it also needs explaining
>>>> why (a) adding such an assertion helps and (b) how that's going to
>>>> cover release builds.
>>>>
>>>
>>> How about:
>>> "Static analysis tools may detect a possible null pointer dereference
>>> at line 760 (config->handle) due to config possibly being NULL.
>>>
>>> However, given that all system domains, including IDLE, have a NULL
>>> config and in the code path leading to the assertion only real domains
>>> (which have a non-NULL config) can be present."
>>>
>>> On point b): this finding is a false positive, therefore even if the
>>> ASSERT is
>>> expanded to effectively a no-op, there is no inherent problem with Xen's
>>> code.
>>> The context in which the patch was suggested [1] hinted at avoiding
>>> inserting in
>>> the codebase false positive comments.
>>
>> Which I largely agree with. What I don't agree with is adding an
>> assertion which is only papering over the issue, and only in debug
>> builds.
> 
> I expect that the number of issues will increase a lot as soon as we 
> start to analyze production builds.
> 
> I don't think it will be a solution to either replace all the ASSERT() 
> with runtime check in all configuration or even...
> 
>> So perhaps instead we need a different way of tracking
>> false positives (which need to be tied to specific checker versions
>> anyway).
> 
> ... documenting false positive.
> 
> IMHO, the only viable option would be to have a configuration to keep 
> ASSERT in production build for scanning tools.

But wouldn't that then likely mean scanning to be done on builds not also
used in production? Would doing so even be permitted when certification
is a requirement? Or do you expect such production builds to be used with
the assertions left in place (increasing the risk of a crash; recall that
assertions themselves may also be wrong, and hence one triggering in rare
cases may not really be a reason to bring down the system)?

Jan
Jan Beulich Nov. 9, 2023, 7:44 a.m. UTC | #11
On 08.11.2023 14:28, Nicola Vetrini wrote:
> On 2023-11-08 12:19, Jan Beulich wrote:
>> On 08.11.2023 12:03, Nicola Vetrini wrote:
>>> On 2023-11-08 09:24, Jan Beulich wrote:
>>>> On 03.11.2023 18:58, Nicola Vetrini wrote:
>>>>> Static analysis tools may detect a possible null
>>>>> pointer dereference at line 760 (the memcpy call)
>>>>> of xen/common/domain.c. This ASSERT helps them in
>>>>> detecting that such a condition is not possible
>>>>> and also provides a basic sanity check.
>>>>
>>>> I disagree with this being a possible justification for adding such a
>>>> redundant assertion. More detail is needed on what is actually
>>>> (suspected to be) confusing the tool. Plus it also needs explaining
>>>> why (a) adding such an assertion helps and (b) how that's going to
>>>> cover release builds.
>>>>
>>>
>>> How about:
>>> "Static analysis tools may detect a possible null pointer dereference
>>> at line 760 (config->handle) due to config possibly being NULL.
>>>
>>> However, given that all system domains, including IDLE, have a NULL
>>> config and in the code path leading to the assertion only real domains
>>> (which have a non-NULL config) can be present."
>>>
>>> On point b): this finding is a false positive, therefore even if the
>>> ASSERT is
>>> expanded to effectively a no-op, there is no inherent problem with 
>>> Xen's
>>> code.
>>> The context in which the patch was suggested [1] hinted at avoiding
>>> inserting in
>>> the codebase false positive comments.
>>
>> Which I largely agree with. What I don't agree with is adding an
>> assertion which is only papering over the issue, and only in debug
>> builds. So perhaps instead we need a different way of tracking
>> false positives (which need to be tied to specific checker versions
>> anyway).
>>
> 
> Hmm. Is it better in your opinion to write something like:
> 
> if (config == NULL)
>     return ERR_PTR(<some error code>); // or die() or something 
> appropriate
> 
> this would be a rudimentary handling of the error with some messages 
> detailing that something
> is wrong if a domain has a null config at that point.

No. This is no better than a redundant assertion. It's even slightly
worse, as it'll likely leave a trace in generated code for release
builds.

Jan
Julien Grall Nov. 9, 2023, 9:10 a.m. UTC | #12
Hi Jan,

On 09/11/2023 07:42, Jan Beulich wrote:
> On 08.11.2023 14:33, Julien Grall wrote:
>> Hi Jan,
>>
>> On 08/11/2023 11:19, Jan Beulich wrote:
>>> On 08.11.2023 12:03, Nicola Vetrini wrote:
>>>> On 2023-11-08 09:24, Jan Beulich wrote:
>>>>> On 03.11.2023 18:58, Nicola Vetrini wrote:
>>>>>> Static analysis tools may detect a possible null
>>>>>> pointer dereference at line 760 (the memcpy call)
>>>>>> of xen/common/domain.c. This ASSERT helps them in
>>>>>> detecting that such a condition is not possible
>>>>>> and also provides a basic sanity check.
>>>>>
>>>>> I disagree with this being a possible justification for adding such a
>>>>> redundant assertion. More detail is needed on what is actually
>>>>> (suspected to be) confusing the tool. Plus it also needs explaining
>>>>> why (a) adding such an assertion helps and (b) how that's going to
>>>>> cover release builds.
>>>>>
>>>>
>>>> How about:
>>>> "Static analysis tools may detect a possible null pointer dereference
>>>> at line 760 (config->handle) due to config possibly being NULL.
>>>>
>>>> However, given that all system domains, including IDLE, have a NULL
>>>> config and in the code path leading to the assertion only real domains
>>>> (which have a non-NULL config) can be present."
>>>>
>>>> On point b): this finding is a false positive, therefore even if the
>>>> ASSERT is
>>>> expanded to effectively a no-op, there is no inherent problem with Xen's
>>>> code.
>>>> The context in which the patch was suggested [1] hinted at avoiding
>>>> inserting in
>>>> the codebase false positive comments.
>>>
>>> Which I largely agree with. What I don't agree with is adding an
>>> assertion which is only papering over the issue, and only in debug
>>> builds.
>>
>> I expect that the number of issues will increase a lot as soon as we
>> start to analyze production builds.
>>
>> I don't think it will be a solution to either replace all the ASSERT()
>> with runtime check in all configuration or even...
>>
>>> So perhaps instead we need a different way of tracking
>>> false positives (which need to be tied to specific checker versions
>>> anyway).
>>
>> ... documenting false positive.
>>
>> IMHO, the only viable option would be to have a configuration to keep
>> ASSERT in production build for scanning tools.
> 
> But wouldn't that then likely mean scanning to be done on builds not also
> used in production? Would doing so even be permitted when certification
> is a requirement? Or do you expect such production builds to be used with
> the assertions left in place (increasing the risk of a crash; recall that
> assertions themselves may also be wrong, and hence one triggering in rare
> cases may not really be a reason to bring down the system)?

I will leave Stefano/Nicola to answer from the certification 
perspective. But I don't really see how we could get away unless we 
replace most of the ASSERT() with proper runtime check (which may not be 
desirable for ASSERT()s like this one).

Cheers,
Stefano Stabellini Nov. 10, 2023, 12:29 a.m. UTC | #13
On Thu, 9 Nov 2023, Julien Grall wrote:
> On 09/11/2023 07:42, Jan Beulich wrote:
> > On 08.11.2023 14:33, Julien Grall wrote:
> > > Hi Jan,
> > > 
> > > On 08/11/2023 11:19, Jan Beulich wrote:
> > > > On 08.11.2023 12:03, Nicola Vetrini wrote:
> > > > > On 2023-11-08 09:24, Jan Beulich wrote:
> > > > > > On 03.11.2023 18:58, Nicola Vetrini wrote:
> > > > > > > Static analysis tools may detect a possible null
> > > > > > > pointer dereference at line 760 (the memcpy call)
> > > > > > > of xen/common/domain.c. This ASSERT helps them in
> > > > > > > detecting that such a condition is not possible
> > > > > > > and also provides a basic sanity check.
> > > > > > 
> > > > > > I disagree with this being a possible justification for adding such
> > > > > > a
> > > > > > redundant assertion. More detail is needed on what is actually
> > > > > > (suspected to be) confusing the tool. Plus it also needs explaining
> > > > > > why (a) adding such an assertion helps and (b) how that's going to
> > > > > > cover release builds.
> > > > > > 
> > > > > 
> > > > > How about:
> > > > > "Static analysis tools may detect a possible null pointer dereference
> > > > > at line 760 (config->handle) due to config possibly being NULL.
> > > > > 
> > > > > However, given that all system domains, including IDLE, have a NULL
> > > > > config and in the code path leading to the assertion only real domains
> > > > > (which have a non-NULL config) can be present."
> > > > > 
> > > > > On point b): this finding is a false positive, therefore even if the
> > > > > ASSERT is
> > > > > expanded to effectively a no-op, there is no inherent problem with
> > > > > Xen's
> > > > > code.
> > > > > The context in which the patch was suggested [1] hinted at avoiding
> > > > > inserting in
> > > > > the codebase false positive comments.
> > > > 
> > > > Which I largely agree with. What I don't agree with is adding an
> > > > assertion which is only papering over the issue, and only in debug
> > > > builds.
> > > 
> > > I expect that the number of issues will increase a lot as soon as we
> > > start to analyze production builds.
> > > 
> > > I don't think it will be a solution to either replace all the ASSERT()
> > > with runtime check in all configuration or even...
> > > 
> > > > So perhaps instead we need a different way of tracking
> > > > false positives (which need to be tied to specific checker versions
> > > > anyway).
> > > 
> > > ... documenting false positive.
> > > 
> > > IMHO, the only viable option would be to have a configuration to keep
> > > ASSERT in production build for scanning tools.
> > 
> > But wouldn't that then likely mean scanning to be done on builds not also
> > used in production? Would doing so even be permitted when certification
> > is a requirement? Or do you expect such production builds to be used with
> > the assertions left in place (increasing the risk of a crash; recall that
> > assertions themselves may also be wrong, and hence one triggering in rare
> > cases may not really be a reason to bring down the system)?
> 
> I will leave Stefano/Nicola to answer from the certification perspective. But
> I don't really see how we could get away unless we replace most of the
> ASSERT() with proper runtime check (which may not be desirable for ASSERT()s
> like this one).

For sure we don't want to replace ASSERTs with runtime checks.

Nicola, do we really need the ASSERT to be implemented as a check, or
would the presence of the ASSERT alone suffice as a tag, the same way we
would be using /* SAF-xx-safe */ or asmlinkage?

If we only need ASSERT as a deviation tag, then production builds vs.
debug build doesn't matter.

If ECLAIR actually needs ASSERT to be implemented as a check, could we
have a special #define to define ASSERT in a special way for static
analysis tools in production builds? For instance:

#ifdef STATIC_ANALYSIS
#define ASSERT(p) \
    do { if ( unlikely(!(p)) ) printk("ASSERT triggered %s:%d", __file__,__LINE__); } while (0)
#endif

Nicola, would that be enough?
Julien Grall Nov. 10, 2023, 9:06 a.m. UTC | #14
Hi Stefano,

On 10/11/2023 00:29, Stefano Stabellini wrote:
> On Thu, 9 Nov 2023, Julien Grall wrote:
>> On 09/11/2023 07:42, Jan Beulich wrote:
>>> On 08.11.2023 14:33, Julien Grall wrote:
>>>> Hi Jan,
>>>>
>>>> On 08/11/2023 11:19, Jan Beulich wrote:
>>>>> On 08.11.2023 12:03, Nicola Vetrini wrote:
>>>>>> On 2023-11-08 09:24, Jan Beulich wrote:
>>>>>>> On 03.11.2023 18:58, Nicola Vetrini wrote:
>>>>>>>> Static analysis tools may detect a possible null
>>>>>>>> pointer dereference at line 760 (the memcpy call)
>>>>>>>> of xen/common/domain.c. This ASSERT helps them in
>>>>>>>> detecting that such a condition is not possible
>>>>>>>> and also provides a basic sanity check.
>>>>>>>
>>>>>>> I disagree with this being a possible justification for adding such
>>>>>>> a
>>>>>>> redundant assertion. More detail is needed on what is actually
>>>>>>> (suspected to be) confusing the tool. Plus it also needs explaining
>>>>>>> why (a) adding such an assertion helps and (b) how that's going to
>>>>>>> cover release builds.
>>>>>>>
>>>>>>
>>>>>> How about:
>>>>>> "Static analysis tools may detect a possible null pointer dereference
>>>>>> at line 760 (config->handle) due to config possibly being NULL.
>>>>>>
>>>>>> However, given that all system domains, including IDLE, have a NULL
>>>>>> config and in the code path leading to the assertion only real domains
>>>>>> (which have a non-NULL config) can be present."
>>>>>>
>>>>>> On point b): this finding is a false positive, therefore even if the
>>>>>> ASSERT is
>>>>>> expanded to effectively a no-op, there is no inherent problem with
>>>>>> Xen's
>>>>>> code.
>>>>>> The context in which the patch was suggested [1] hinted at avoiding
>>>>>> inserting in
>>>>>> the codebase false positive comments.
>>>>>
>>>>> Which I largely agree with. What I don't agree with is adding an
>>>>> assertion which is only papering over the issue, and only in debug
>>>>> builds.
>>>>
>>>> I expect that the number of issues will increase a lot as soon as we
>>>> start to analyze production builds.
>>>>
>>>> I don't think it will be a solution to either replace all the ASSERT()
>>>> with runtime check in all configuration or even...
>>>>
>>>>> So perhaps instead we need a different way of tracking
>>>>> false positives (which need to be tied to specific checker versions
>>>>> anyway).
>>>>
>>>> ... documenting false positive.
>>>>
>>>> IMHO, the only viable option would be to have a configuration to keep
>>>> ASSERT in production build for scanning tools.
>>>
>>> But wouldn't that then likely mean scanning to be done on builds not also
>>> used in production? Would doing so even be permitted when certification
>>> is a requirement? Or do you expect such production builds to be used with
>>> the assertions left in place (increasing the risk of a crash; recall that
>>> assertions themselves may also be wrong, and hence one triggering in rare
>>> cases may not really be a reason to bring down the system)?
>>
>> I will leave Stefano/Nicola to answer from the certification perspective. But
>> I don't really see how we could get away unless we replace most of the
>> ASSERT() with proper runtime check (which may not be desirable for ASSERT()s
>> like this one).
> 
> For sure we don't want to replace ASSERTs with runtime checks.
> 
> Nicola, do we really need the ASSERT to be implemented as a check, or
> would the presence of the ASSERT alone suffice as a tag, the same way we
> would be using /* SAF-xx-safe */ or asmlinkage?
> 
> If we only need ASSERT as a deviation tag, then production builds vs.
> debug build doesn't matter.
> 
> If ECLAIR actually needs ASSERT to be implemented as a check, could we
> have a special #define to define ASSERT in a special way for static
> analysis tools in production builds? For instance:
> 
> #ifdef STATIC_ANALYSIS
> #define ASSERT(p) \
>      do { if ( unlikely(!(p)) ) printk("ASSERT triggered %s:%d", __file__,__LINE__); } while (0)
> #endif

Just to make 100% clear, you are saying that assessor will be happy if 
we analyze it with ASSERT enabled but in production we use it wout them 
enabled? The assumption here is that they should have *never* been 
triggered so they surely should not happen in production.

Cheers,
Nicola Vetrini Nov. 10, 2023, 4:30 p.m. UTC | #15
Hi everyone,

I trimmed the thread a bit, to make this more readable.

>>>>> IMHO, the only viable option would be to have a configuration to 
>>>>> keep
>>>>> ASSERT in production build for scanning tools.
>>>> 
>>>> But wouldn't that then likely mean scanning to be done on builds not 
>>>> also
>>>> used in production? Would doing so even be permitted when 
>>>> certification
>>>> is a requirement? Or do you expect such production builds to be used 
>>>> with
>>>> the assertions left in place (increasing the risk of a crash; recall 
>>>> that
>>>> assertions themselves may also be wrong, and hence one triggering in 
>>>> rare
>>>> cases may not really be a reason to bring down the system)?
>>> 
>>> I will leave Stefano/Nicola to answer from the certification 
>>> perspective. But
>>> I don't really see how we could get away unless we replace most of 
>>> the
>>> ASSERT() with proper runtime check (which may not be desirable for 
>>> ASSERT()s
>>> like this one).
>> 
>> For sure we don't want to replace ASSERTs with runtime checks.
>> 
>> Nicola, do we really need the ASSERT to be implemented as a check, or
>> would the presence of the ASSERT alone suffice as a tag, the same way 
>> we
>> would be using /* SAF-xx-safe */ or asmlinkage?
>> 
>> If we only need ASSERT as a deviation tag, then production builds vs.
>> debug build doesn't matter.
>> 
>> If ECLAIR actually needs ASSERT to be implemented as a check, could we
>> have a special #define to define ASSERT in a special way for static
>> analysis tools in production builds? For instance:
>> 
>> #ifdef STATIC_ANALYSIS
>> #define ASSERT(p) \
>>      do { if ( unlikely(!(p)) ) printk("ASSERT triggered %s:%d", 
>> __file__,__LINE__); } while (0)
>> #endif
> 
> Just to make 100% clear, you are saying that assessor will be happy if 
> we analyze it with ASSERT enabled but in production we use it wout them 
> enabled? The assumption here is that they should have *never* been 
> triggered so they surely should not happen in production.
> 
> Cheers,

First of all, Andrew is experimenting with an alternate solution, so we 
should wait making
any decision here until he can share the outcome of his findings.
However, from a certification perspective, the fact that the codebase is 
tested with
asserts enabled is a strong enough claim for a justification to be based 
on an assertion;
the code path just needs to be exercised by the tests.
Getting into the business of how to define asserts for static analysis 
is likely to
just cause more trouble.
Andrew Cooper Nov. 10, 2023, 4:56 p.m. UTC | #16
On 08/11/2023 1:45 pm, Nicola Vetrini wrote:
> On 2023-11-08 14:37, Andrew Cooper wrote:
>> On 03/11/2023 5:58 pm, Nicola Vetrini wrote:
>>> Static analysis tools may detect a possible null
>>> pointer dereference at line 760 (the memcpy call)
>>> of xen/common/domain.c. This ASSERT helps them in
>>> detecting that such a condition is not possible
>>> and also provides a basic sanity check.
>>>
>>> Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
>>> ---
>>> The check may be later improved by proper error checking
>>> instead of relying on the semantics explained here:
>>> https://lore.kernel.org/xen-devel/61f04d4b-34d9-4fd1-a989-56b042b4f3d8@citrix.com/
>>>
>>>
>>> This addresses the caution reported by ECLAIR for MISRA C:2012 D4.11
>>> ---
>>>  xen/common/domain.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>>> index 8f9ab01c0cb7..9378c0417645 100644
>>> --- a/xen/common/domain.c
>>> +++ b/xen/common/domain.c
>>> @@ -700,6 +700,8 @@ struct domain *domain_create(domid_t domid,
>>>
>>>      if ( !is_idle_domain(d) )
>>>      {
>>> +        ASSERT(config);
>>> +
>>>          watchdog_domain_init(d);
>>>          init_status |= INIT_watchdog;
>>>
>>
>> I have an idea that might resolve this differently and in an easier way.
>>
>> Would you be happy waiting for a couple of days for me to experiment? 
>> Absolutely no guarantees of it turning into a workable solution.
>>
>
> Sure, no problem.
>

I'm afraid my experiments have failed.  I've got a bit of cleanup done
(which does remove the idle-domain predicate in context above), but
nothing that I expect would help with this issue specifically.

The best I can suggest is to copy x86's arch_domain_create() in it's
handling of config, which would end up looking like:

if ( !config )
{
    ASSERT_UNREACHABLE();
    goto fail;
}

to make a runtime-failsafe path in the same pattern that we use
elsewhere, and is known to influence toolchains.

This is actually the pattern used to emulate __builtin_assume() in GCC. 
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p1774r4.pdf

~Andrew
Stefano Stabellini Nov. 11, 2023, 1:13 a.m. UTC | #17
On Fri, 10 Nov 2023, Nicola Vetrini wrote:
> Hi everyone,
> 
> I trimmed the thread a bit, to make this more readable.
> 
> > > > > > IMHO, the only viable option would be to have a configuration to
> > > > > > keep
> > > > > > ASSERT in production build for scanning tools.
> > > > > 
> > > > > But wouldn't that then likely mean scanning to be done on builds not
> > > > > also
> > > > > used in production? Would doing so even be permitted when
> > > > > certification
> > > > > is a requirement? Or do you expect such production builds to be used
> > > > > with
> > > > > the assertions left in place (increasing the risk of a crash; recall
> > > > > that
> > > > > assertions themselves may also be wrong, and hence one triggering in
> > > > > rare
> > > > > cases may not really be a reason to bring down the system)?
> > > > 
> > > > I will leave Stefano/Nicola to answer from the certification
> > > > perspective. But
> > > > I don't really see how we could get away unless we replace most of the
> > > > ASSERT() with proper runtime check (which may not be desirable for
> > > > ASSERT()s
> > > > like this one).
> > > 
> > > For sure we don't want to replace ASSERTs with runtime checks.
> > > 
> > > Nicola, do we really need the ASSERT to be implemented as a check, or
> > > would the presence of the ASSERT alone suffice as a tag, the same way we
> > > would be using /* SAF-xx-safe */ or asmlinkage?
> > > 
> > > If we only need ASSERT as a deviation tag, then production builds vs.
> > > debug build doesn't matter.
> > > 
> > > If ECLAIR actually needs ASSERT to be implemented as a check, could we
> > > have a special #define to define ASSERT in a special way for static
> > > analysis tools in production builds? For instance:
> > > 
> > > #ifdef STATIC_ANALYSIS
> > > #define ASSERT(p) \
> > >      do { if ( unlikely(!(p)) ) printk("ASSERT triggered %s:%d",
> > > __file__,__LINE__); } while (0)
> > > #endif
> > 
> > Just to make 100% clear, you are saying that assessor will be happy if we
> > analyze it with ASSERT enabled but in production we use it wout them
> > enabled? The assumption here is that they should have *never* been triggered
> > so they surely should not happen in production.
> > 
> > Cheers,
> 
> First of all, Andrew is experimenting with an alternate solution, so we should
> wait making
> any decision here until he can share the outcome of his findings.
> However, from a certification perspective, the fact that the codebase is
> tested with
> asserts enabled is a strong enough claim for a justification to be based on an
> assertion;
> the code path just needs to be exercised by the tests.
> Getting into the business of how to define asserts for static analysis is
> likely to
> just cause more trouble.

That's great. Also given Andrew's reply, then can we just go ahead with
adding the ASSERT as done in this patch (with the added in-code comment
as requested by Jan)?
Nicola Vetrini Nov. 14, 2023, 4:05 p.m. UTC | #18
On 2023-11-11 02:13, Stefano Stabellini wrote:
> On Fri, 10 Nov 2023, Nicola Vetrini wrote:
>> Hi everyone,
>> 
>> I trimmed the thread a bit, to make this more readable.
>> 
>> > > > > > IMHO, the only viable option would be to have a configuration to
>> > > > > > keep
>> > > > > > ASSERT in production build for scanning tools.
>> > > > >
>> > > > > But wouldn't that then likely mean scanning to be done on builds not
>> > > > > also
>> > > > > used in production? Would doing so even be permitted when
>> > > > > certification
>> > > > > is a requirement? Or do you expect such production builds to be used
>> > > > > with
>> > > > > the assertions left in place (increasing the risk of a crash; recall
>> > > > > that
>> > > > > assertions themselves may also be wrong, and hence one triggering in
>> > > > > rare
>> > > > > cases may not really be a reason to bring down the system)?
>> > > >
>> > > > I will leave Stefano/Nicola to answer from the certification
>> > > > perspective. But
>> > > > I don't really see how we could get away unless we replace most of the
>> > > > ASSERT() with proper runtime check (which may not be desirable for
>> > > > ASSERT()s
>> > > > like this one).
>> > >
>> > > For sure we don't want to replace ASSERTs with runtime checks.
>> > >
>> > > Nicola, do we really need the ASSERT to be implemented as a check, or
>> > > would the presence of the ASSERT alone suffice as a tag, the same way we
>> > > would be using /* SAF-xx-safe */ or asmlinkage?
>> > >
>> > > If we only need ASSERT as a deviation tag, then production builds vs.
>> > > debug build doesn't matter.
>> > >
>> > > If ECLAIR actually needs ASSERT to be implemented as a check, could we
>> > > have a special #define to define ASSERT in a special way for static
>> > > analysis tools in production builds? For instance:
>> > >
>> > > #ifdef STATIC_ANALYSIS
>> > > #define ASSERT(p) \
>> > >      do { if ( unlikely(!(p)) ) printk("ASSERT triggered %s:%d",
>> > > __file__,__LINE__); } while (0)
>> > > #endif
>> >
>> > Just to make 100% clear, you are saying that assessor will be happy if we
>> > analyze it with ASSERT enabled but in production we use it wout them
>> > enabled? The assumption here is that they should have *never* been triggered
>> > so they surely should not happen in production.
>> >
>> > Cheers,
>> 
>> First of all, Andrew is experimenting with an alternate solution, so 
>> we should
>> wait making
>> any decision here until he can share the outcome of his findings.
>> However, from a certification perspective, the fact that the codebase 
>> is
>> tested with
>> asserts enabled is a strong enough claim for a justification to be 
>> based on an
>> assertion;
>> the code path just needs to be exercised by the tests.
>> Getting into the business of how to define asserts for static analysis 
>> is
>> likely to
>> just cause more trouble.
> 
> That's great. Also given Andrew's reply, then can we just go ahead with
> adding the ASSERT as done in this patch (with the added in-code comment
> as requested by Jan)?

Yes, and sorry for the late reply.
diff mbox series

Patch

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 8f9ab01c0cb7..9378c0417645 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -700,6 +700,8 @@  struct domain *domain_create(domid_t domid,
 
     if ( !is_idle_domain(d) )
     {
+        ASSERT(config);
+
         watchdog_domain_init(d);
         init_status |= INIT_watchdog;