diff mbox series

ubsan: Introduce CONFIG_UBSAN_FATAL to panic on UBSAN failure

Message ID 20231127144156.361656-1-michal.orzel@amd.com (mailing list archive)
State New, archived
Headers show
Series ubsan: Introduce CONFIG_UBSAN_FATAL to panic on UBSAN failure | expand

Commit Message

Michal Orzel Nov. 27, 2023, 2:41 p.m. UTC
Introduce the CONFIG_UBSAN_FATAL option to cater to scenarios where prompt
attention to undefined behavior issues, notably during CI test runs, is
essential. When enabled, this option causes Xen to panic upon detecting
UBSAN failure (as the last step in ubsan_epilogue()).

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
---
 xen/Kconfig.debug        | 7 +++++++
 xen/common/ubsan/ubsan.c | 4 ++++
 2 files changed, 11 insertions(+)

Comments

Julien Grall Nov. 28, 2023, 4:14 p.m. UTC | #1
Hi Michal,

On 27/11/2023 15:41, Michal Orzel wrote:
> Introduce the CONFIG_UBSAN_FATAL option to cater to scenarios where prompt
> attention to undefined behavior issues, notably during CI test runs, is
> essential. When enabled, this option causes Xen to panic upon detecting
> UBSAN failure (as the last step in ubsan_epilogue()).

I have mixed opinions on this patch. This would be a good one if we had 
a Xen mostly free from UBSAN behavior. But this is not the case at least 
on arm32. So we would end up to stop at the first error. IOW, we would 
need to fix the first error before we can see the next one.

So I feel it would be best if the gitlab CI jobs actually check for 
USBAN in the logs and fail if there are any. With that, we can get the 
full list of UBSAN issues on each job.

Cheers,

> 
> Signed-off-by: Michal Orzel <michal.orzel@amd.com>
> ---
>   xen/Kconfig.debug        | 7 +++++++
>   xen/common/ubsan/ubsan.c | 4 ++++
>   2 files changed, 11 insertions(+)
> 
> diff --git a/xen/Kconfig.debug b/xen/Kconfig.debug
> index 94e818ee09b5..e19e9d48817c 100644
> --- a/xen/Kconfig.debug
> +++ b/xen/Kconfig.debug
> @@ -107,6 +107,13 @@ config UBSAN
>   
>   	  If unsure, say N here.
>   
> +config UBSAN_FATAL
> +	bool "Panic on UBSAN failure"
> +	depends on UBSAN
> +	help
> +	  Enabling this option will cause Xen to panic when an undefined behavior
> +	  is detected by UBSAN. If unsure, say N here.
> +
>   config DEBUG_TRACE
>   	bool "Debug trace support"
>   	---help---
> diff --git a/xen/common/ubsan/ubsan.c b/xen/common/ubsan/ubsan.c
> index a3a80fa99eec..dd5ee0013648 100644
> --- a/xen/common/ubsan/ubsan.c
> +++ b/xen/common/ubsan/ubsan.c
> @@ -174,6 +174,10 @@ static void ubsan_epilogue(unsigned long *flags)
>   		"========================================\n");
>   	spin_unlock_irqrestore(&report_lock, *flags);
>   	current->in_ubsan--;
> +
> +#ifdef CONFIG_UBSAN_FATAL
> +	panic("UBSAN failure detected\n");
> +#endif
>   }
>   
>   static void handle_overflow(struct overflow_data *data, unsigned long lhs,
Michal Orzel Nov. 28, 2023, 5 p.m. UTC | #2
Hi Julien,

On 28/11/2023 17:14, Julien Grall wrote:
> 
> 
> Hi Michal,
> 
> On 27/11/2023 15:41, Michal Orzel wrote:
>> Introduce the CONFIG_UBSAN_FATAL option to cater to scenarios where prompt
>> attention to undefined behavior issues, notably during CI test runs, is
>> essential. When enabled, this option causes Xen to panic upon detecting
>> UBSAN failure (as the last step in ubsan_epilogue()).
> 
> I have mixed opinions on this patch. This would be a good one if we had
> a Xen mostly free from UBSAN behavior. But this is not the case at least
> on arm32. So we would end up to stop at the first error. IOW, we would
> need to fix the first error before we can see the next one.
Well, this patch introduces a config option disabled by default.
If we end up enabling it for CI for reasons* stated by Andrew, then the natural way
of handling such issues is to do the investigation locally. Then, you are not forced
to select this option and you can see all the UBSAN issues if you want.

> 
> So I feel it would be best if the gitlab CI jobs actually check for
> USBAN in the logs and fail if there are any. With that, we can get the
> full list of UBSAN issues on each job.
Well, I prefer Andrew suggestion (both [1] and on IRC), hence this patch.

*my plan was to first fix the UBSAN issues and then enable this option for CI.

[1] https://lore.kernel.org/xen-devel/7421ddfd-8947-4fe1-93a6-dc25a4aa8bbc@citrix.com/T/#t

~Michal
Julien Grall Nov. 28, 2023, 5:09 p.m. UTC | #3
On 28/11/2023 18:00, Michal Orzel wrote:
> Hi Julien,
> 
> On 28/11/2023 17:14, Julien Grall wrote:
>>
>>
>> Hi Michal,
>>
>> On 27/11/2023 15:41, Michal Orzel wrote:
>>> Introduce the CONFIG_UBSAN_FATAL option to cater to scenarios where prompt
>>> attention to undefined behavior issues, notably during CI test runs, is
>>> essential. When enabled, this option causes Xen to panic upon detecting
>>> UBSAN failure (as the last step in ubsan_epilogue()).
>>
>> I have mixed opinions on this patch. This would be a good one if we had
>> a Xen mostly free from UBSAN behavior. But this is not the case at least
>> on arm32. So we would end up to stop at the first error. IOW, we would
>> need to fix the first error before we can see the next one.
> Well, this patch introduces a config option disabled by default.

I understood this is disabled by default... I am pointing out that I am 
not convinced about the usefulness until we are at the stage where Xen 
is normally not reporting any USBAN error.

> If we end up enabling it for CI for reasons* stated by Andrew, then the natural way
> of handling such issues is to do the investigation locally.

This will not always be possible. One example is when you are only able 
to reproduce some of the USBAN errors on a specific HW.

> Then, you are not forced
> to select this option and you can see all the UBSAN issues if you want.

See above, I got that point. I am mostly concerned about the implication 
in the CI right now.

> 
>>
>> So I feel it would be best if the gitlab CI jobs actually check for
>> USBAN in the logs and fail if there are any. With that, we can get the
>> full list of UBSAN issues on each job.
> Well, I prefer Andrew suggestion (both [1] and on IRC), hence this patch.
> 
> *my plan was to first fix the UBSAN issues and then enable this option for CI.

That would have been useful to describe your goal after "---". With that 
in mind, then I suggest to revisit this patch once all the UBSAN issues 
in a normal build are fixed.

Cheers,
Andrew Cooper Nov. 28, 2023, 5:10 p.m. UTC | #4
On 28/11/2023 5:00 pm, Michal Orzel wrote:
> Hi Julien,
>
> On 28/11/2023 17:14, Julien Grall wrote:
>>
>> Hi Michal,
>>
>> On 27/11/2023 15:41, Michal Orzel wrote:
>>> Introduce the CONFIG_UBSAN_FATAL option to cater to scenarios where prompt
>>> attention to undefined behavior issues, notably during CI test runs, is
>>> essential. When enabled, this option causes Xen to panic upon detecting
>>> UBSAN failure (as the last step in ubsan_epilogue()).
>> I have mixed opinions on this patch. This would be a good one if we had
>> a Xen mostly free from UBSAN behavior. But this is not the case at least
>> on arm32. So we would end up to stop at the first error. IOW, we would
>> need to fix the first error before we can see the next one.
> Well, this patch introduces a config option disabled by default.
> If we end up enabling it for CI for reasons* stated by Andrew, then the natural way
> of handling such issues is to do the investigation locally. Then, you are not forced
> to select this option and you can see all the UBSAN issues if you want.
>
>> So I feel it would be best if the gitlab CI jobs actually check for
>> USBAN in the logs and fail if there are any. With that, we can get the
>> full list of UBSAN issues on each job.
> Well, I prefer Andrew suggestion (both [1] and on IRC), hence this patch.
>
> *my plan was to first fix the UBSAN issues and then enable this option for CI.
>
> [1] https://lore.kernel.org/xen-devel/7421ddfd-8947-4fe1-93a6-dc25a4aa8bbc@citrix.com/T/#t

Searching logs is never reliable.

Even within gitlab, we've had error upon error getting the console logs
out reliably, and it's even worse on real hardware.

This is an optional thing, but for people who want it, it's very highly
desirable.

~Andrew
Michal Orzel Nov. 28, 2023, 5:15 p.m. UTC | #5
On 28/11/2023 18:09, Julien Grall wrote:
> 
> 
> On 28/11/2023 18:00, Michal Orzel wrote:
>> Hi Julien,
>>
>> On 28/11/2023 17:14, Julien Grall wrote:
>>>
>>>
>>> Hi Michal,
>>>
>>> On 27/11/2023 15:41, Michal Orzel wrote:
>>>> Introduce the CONFIG_UBSAN_FATAL option to cater to scenarios where prompt
>>>> attention to undefined behavior issues, notably during CI test runs, is
>>>> essential. When enabled, this option causes Xen to panic upon detecting
>>>> UBSAN failure (as the last step in ubsan_epilogue()).
>>>
>>> I have mixed opinions on this patch. This would be a good one if we had
>>> a Xen mostly free from UBSAN behavior. But this is not the case at least
>>> on arm32. So we would end up to stop at the first error. IOW, we would
>>> need to fix the first error before we can see the next one.
>> Well, this patch introduces a config option disabled by default.
> 
> I understood this is disabled by default... I am pointing out that I am
> not convinced about the usefulness until we are at the stage where Xen
> is normally not reporting any USBAN error.
> 
>> If we end up enabling it for CI for reasons* stated by Andrew, then the natural way
>> of handling such issues is to do the investigation locally.
> 
> This will not always be possible. One example is when you are only able
> to reproduce some of the USBAN errors on a specific HW.
> 
>> Then, you are not forced
>> to select this option and you can see all the UBSAN issues if you want.
> 
> See above, I got that point. I am mostly concerned about the implication
> in the CI right now.
> 
>>
>>>
>>> So I feel it would be best if the gitlab CI jobs actually check for
>>> USBAN in the logs and fail if there are any. With that, we can get the
>>> full list of UBSAN issues on each job.
>> Well, I prefer Andrew suggestion (both [1] and on IRC), hence this patch.
>>
>> *my plan was to first fix the UBSAN issues and then enable this option for CI.
> 
> That would have been useful to describe your goal after "---". With that
> in mind, then I suggest to revisit this patch once all the UBSAN issues
> in a normal build are fixed.
But this patch does not enable this option for CI automatically, right? Why are you so keen to push it back?
Is it because you see no point in this option other than for the upstream CI loop? I find it useful on a day-to-day
Xen runs, and I would for sure enable it by default in my config not to miss UBSAN failures.

~Michal
Andrew Cooper Nov. 28, 2023, 5:15 p.m. UTC | #6
On 27/11/2023 2:41 pm, Michal Orzel wrote:
> diff --git a/xen/common/ubsan/ubsan.c b/xen/common/ubsan/ubsan.c
> index a3a80fa99eec..dd5ee0013648 100644
> --- a/xen/common/ubsan/ubsan.c
> +++ b/xen/common/ubsan/ubsan.c
> @@ -174,6 +174,10 @@ static void ubsan_epilogue(unsigned long *flags)
>  		"========================================\n");
>  	spin_unlock_irqrestore(&report_lock, *flags);
>  	current->in_ubsan--;
> +
> +#ifdef CONFIG_UBSAN_FATAL
> +	panic("UBSAN failure detected\n");
> +#endif

if ( IS_ENABLED(CONFIG_UBSAN_FATAL) )
    panic("UBSAN failure detected\n");

please.  Happy to fix on commit.

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Michal Orzel Nov. 28, 2023, 5:18 p.m. UTC | #7
On 28/11/2023 18:15, Andrew Cooper wrote:
> 
> 
> On 27/11/2023 2:41 pm, Michal Orzel wrote:
>> diff --git a/xen/common/ubsan/ubsan.c b/xen/common/ubsan/ubsan.c
>> index a3a80fa99eec..dd5ee0013648 100644
>> --- a/xen/common/ubsan/ubsan.c
>> +++ b/xen/common/ubsan/ubsan.c
>> @@ -174,6 +174,10 @@ static void ubsan_epilogue(unsigned long *flags)
>>               "========================================\n");
>>       spin_unlock_irqrestore(&report_lock, *flags);
>>       current->in_ubsan--;
>> +
>> +#ifdef CONFIG_UBSAN_FATAL
>> +     panic("UBSAN failure detected\n");
>> +#endif
> 
> if ( IS_ENABLED(CONFIG_UBSAN_FATAL) )
>     panic("UBSAN failure detected\n");
> 
> please.  Happy to fix on commit.
Sounds good to me, thanks.

~Michal
Julien Grall Nov. 28, 2023, 5:52 p.m. UTC | #8
Hi Michal,

On 28/11/2023 18:15, Michal Orzel wrote:
> 
> 
> On 28/11/2023 18:09, Julien Grall wrote:
>>
>>
>> On 28/11/2023 18:00, Michal Orzel wrote:
>>> Hi Julien,
>>>
>>> On 28/11/2023 17:14, Julien Grall wrote:
>>>>
>>>>
>>>> Hi Michal,
>>>>
>>>> On 27/11/2023 15:41, Michal Orzel wrote:
>>>>> Introduce the CONFIG_UBSAN_FATAL option to cater to scenarios where prompt
>>>>> attention to undefined behavior issues, notably during CI test runs, is
>>>>> essential. When enabled, this option causes Xen to panic upon detecting
>>>>> UBSAN failure (as the last step in ubsan_epilogue()).
>>>>
>>>> I have mixed opinions on this patch. This would be a good one if we had
>>>> a Xen mostly free from UBSAN behavior. But this is not the case at least
>>>> on arm32. So we would end up to stop at the first error. IOW, we would
>>>> need to fix the first error before we can see the next one.
>>> Well, this patch introduces a config option disabled by default.
>>
>> I understood this is disabled by default... I am pointing out that I am
>> not convinced about the usefulness until we are at the stage where Xen
>> is normally not reporting any USBAN error.
>>
>>> If we end up enabling it for CI for reasons* stated by Andrew, then the natural way
>>> of handling such issues is to do the investigation locally.
>>
>> This will not always be possible. One example is when you are only able
>> to reproduce some of the USBAN errors on a specific HW.
>>
>>> Then, you are not forced
>>> to select this option and you can see all the UBSAN issues if you want.
>>
>> See above, I got that point. I am mostly concerned about the implication
>> in the CI right now.
>>
>>>
>>>>
>>>> So I feel it would be best if the gitlab CI jobs actually check for
>>>> USBAN in the logs and fail if there are any. With that, we can get the
>>>> full list of UBSAN issues on each job.
>>> Well, I prefer Andrew suggestion (both [1] and on IRC), hence this patch.
>>>
>>> *my plan was to first fix the UBSAN issues and then enable this option for CI.
>>
>> That would have been useful to describe your goal after "---". With that
>> in mind, then I suggest to revisit this patch once all the UBSAN issues
>> in a normal build are fixed.
> But this patch does not enable this option for CI automatically, right?

Correct.

> Why are you so keen to push it back?

I have been pushing back because your commit message refers to the CI 
specifically and today this would not really work (read as I would not 
be happy if this option is enabled in the CI right now at least on arm32).

If you want to fail a CI job for UBSAN today, then we need to find a 
different approach so we can get the full list of UBSAN error rather 
than fixing one, retry and then next one.

> Is it because you see no point in this option other than for the upstream CI loop?

Even in the upstream CI loop, I am a little unsure about this approach. 
At least, I feel I would want to see all the reports at once in the CI.

But this is not really a strong feeling.

> I find it useful on a day-to-day
> Xen runs, and I would for sure enable it by default in my config not to miss UBSAN failures.

Fair enough. I view USBAN issues the same a WARN_ON. They all need to be 
investigated. So now you have an inconsistent policy.

Are you are also intending to switch WARN_ON() to fatal? If not, then 
why would UBSAN warnings more important that the other warnings?

Cheers,
Stefano Stabellini Nov. 29, 2023, 2:29 a.m. UTC | #9
On Tue, 28 Nov 2023, Julien Grall wrote:
> On 28/11/2023 18:15, Michal Orzel wrote:
> > On 28/11/2023 18:09, Julien Grall wrote:
> > > On 28/11/2023 18:00, Michal Orzel wrote:
> > > > On 28/11/2023 17:14, Julien Grall wrote:
> > > > > On 27/11/2023 15:41, Michal Orzel wrote:
> > > > > > Introduce the CONFIG_UBSAN_FATAL option to cater to scenarios where
> > > > > > prompt
> > > > > > attention to undefined behavior issues, notably during CI test runs,
> > > > > > is
> > > > > > essential. When enabled, this option causes Xen to panic upon
> > > > > > detecting
> > > > > > UBSAN failure (as the last step in ubsan_epilogue()).
> > > > > 
> > > > > I have mixed opinions on this patch. This would be a good one if we
> > > > > had
> > > > > a Xen mostly free from UBSAN behavior. But this is not the case at
> > > > > least
> > > > > on arm32. So we would end up to stop at the first error. IOW, we would
> > > > > need to fix the first error before we can see the next one.
> > > > Well, this patch introduces a config option disabled by default.
> > > 
> > > I understood this is disabled by default... I am pointing out that I am
> > > not convinced about the usefulness until we are at the stage where Xen
> > > is normally not reporting any USBAN error.
> > > 
> > > > If we end up enabling it for CI for reasons* stated by Andrew, then the
> > > > natural way
> > > > of handling such issues is to do the investigation locally.
> > > 
> > > This will not always be possible. One example is when you are only able
> > > to reproduce some of the USBAN errors on a specific HW.
> > > 
> > > > Then, you are not forced
> > > > to select this option and you can see all the UBSAN issues if you want.
> > > 
> > > See above, I got that point. I am mostly concerned about the implication
> > > in the CI right now.
> > > 
> > > > 
> > > > > 
> > > > > So I feel it would be best if the gitlab CI jobs actually check for
> > > > > USBAN in the logs and fail if there are any. With that, we can get the
> > > > > full list of UBSAN issues on each job.
> > > > Well, I prefer Andrew suggestion (both [1] and on IRC), hence this
> > > > patch.
> > > > 
> > > > *my plan was to first fix the UBSAN issues and then enable this option
> > > > for CI.
> > > 
> > > That would have been useful to describe your goal after "---". With that
> > > in mind, then I suggest to revisit this patch once all the UBSAN issues
> > > in a normal build are fixed.
> > But this patch does not enable this option for CI automatically, right?
> 
> Correct.
> 
> > Why are you so keen to push it back?
> 
> I have been pushing back because your commit message refers to the CI
> specifically and today this would not really work (read as I would not be
> happy if this option is enabled in the CI right now at least on arm32).
> 
> If you want to fail a CI job for UBSAN today, then we need to find a different
> approach so we can get the full list of UBSAN error rather than fixing one,
> retry and then next one.
> 
> > Is it because you see no point in this option other than for the upstream CI
> > loop?
> 
> Even in the upstream CI loop, I am a little unsure about this approach. At
> least, I feel I would want to see all the reports at once in the CI.
> 
> But this is not really a strong feeling.
> 
> > I find it useful on a day-to-day
> > Xen runs, and I would for sure enable it by default in my config not to miss
> > UBSAN failures.
> 
> Fair enough. I view USBAN issues the same a WARN_ON. They all need to be
> investigated. So now you have an inconsistent policy.
> 
> Are you are also intending to switch WARN_ON() to fatal? If not, then why
> would UBSAN warnings more important that the other warnings?

I think it would be useful to be able to turn WARN_ONs into BUG_ONs
selectively by component. We could turn all WARN_ONs into BUG_ONs but
that's a bit extreme.

It is true that this patch is a bit... "ad-hoc". But given its
simplicity it doesn't hurt and I think it is nice-to-have for UBSAN. So
I would go with that.

But if we want something more sophisticated, here is an idea. The
concept is that one could specify warn=arch/arm/domain_build.c in the
Xen command line to change a WARN_ON in arch/arm/domain_build.c into a
BUG. I tested it with a WARN_ON I added on purpose and it works.


diff --git a/xen/common/symbols.c b/xen/common/symbols.c
index 133b580768..c78d2963c3 100644
--- a/xen/common/symbols.c
+++ b/xen/common/symbols.c
@@ -20,6 +20,10 @@
 #include <public/platform.h>
 #include <xen/guest_access.h>
 #include <xen/errno.h>
+#include <xen/param.h>
+
+char opt_warn[30] = "";
+string_param("warn", opt_warn);
 
 #ifdef SYMBOLS_ORIGIN
 extern const unsigned int symbols_offsets[];
diff --git a/xen/include/xen/lib.h b/xen/include/xen/lib.h
index 1793be5b6b..60e14cdb85 100644
--- a/xen/include/xen/lib.h
+++ b/xen/include/xen/lib.h
@@ -12,12 +12,19 @@
 #include <xen/xmalloc.h>
 #include <xen/string.h>
 
+extern char opt_warn[30];
+
 #define BUG_ON(p)  do { if (unlikely(p)) BUG();  } while (0)
 #define WARN_ON(p)  ({                  \
     bool ret_warn_on_ = (p);            \
                                         \
     if ( unlikely(ret_warn_on_) )       \
-        WARN();                         \
+    {                                   \
+        if ( !strcmp(__FILE__, opt_warn) ) \
+            BUG();                      \
+        else                            \
+            WARN();                     \
+    }                                   \
     unlikely(ret_warn_on_);             \
 })
Michal Orzel Nov. 29, 2023, 8:02 a.m. UTC | #10
Hi Julien,

On 28/11/2023 18:52, Julien Grall wrote:
> 
> 
> Hi Michal,
> 
> On 28/11/2023 18:15, Michal Orzel wrote:
>>
>>
>> On 28/11/2023 18:09, Julien Grall wrote:
>>>
>>>
>>> On 28/11/2023 18:00, Michal Orzel wrote:
>>>> Hi Julien,
>>>>
>>>> On 28/11/2023 17:14, Julien Grall wrote:
>>>>>
>>>>>
>>>>> Hi Michal,
>>>>>
>>>>> On 27/11/2023 15:41, Michal Orzel wrote:
>>>>>> Introduce the CONFIG_UBSAN_FATAL option to cater to scenarios where prompt
>>>>>> attention to undefined behavior issues, notably during CI test runs, is
>>>>>> essential. When enabled, this option causes Xen to panic upon detecting
>>>>>> UBSAN failure (as the last step in ubsan_epilogue()).
>>>>>
>>>>> I have mixed opinions on this patch. This would be a good one if we had
>>>>> a Xen mostly free from UBSAN behavior. But this is not the case at least
>>>>> on arm32. So we would end up to stop at the first error. IOW, we would
>>>>> need to fix the first error before we can see the next one.
>>>> Well, this patch introduces a config option disabled by default.
>>>
>>> I understood this is disabled by default... I am pointing out that I am
>>> not convinced about the usefulness until we are at the stage where Xen
>>> is normally not reporting any USBAN error.
>>>
>>>> If we end up enabling it for CI for reasons* stated by Andrew, then the natural way
>>>> of handling such issues is to do the investigation locally.
>>>
>>> This will not always be possible. One example is when you are only able
>>> to reproduce some of the USBAN errors on a specific HW.
>>>
>>>> Then, you are not forced
>>>> to select this option and you can see all the UBSAN issues if you want.
>>>
>>> See above, I got that point. I am mostly concerned about the implication
>>> in the CI right now.
>>>
>>>>
>>>>>
>>>>> So I feel it would be best if the gitlab CI jobs actually check for
>>>>> USBAN in the logs and fail if there are any. With that, we can get the
>>>>> full list of UBSAN issues on each job.
>>>> Well, I prefer Andrew suggestion (both [1] and on IRC), hence this patch.
>>>>
>>>> *my plan was to first fix the UBSAN issues and then enable this option for CI.
>>>
>>> That would have been useful to describe your goal after "---". With that
>>> in mind, then I suggest to revisit this patch once all the UBSAN issues
>>> in a normal build are fixed.
>> But this patch does not enable this option for CI automatically, right?
> 
> Correct.
> 
>> Why are you so keen to push it back?
> 
> I have been pushing back because your commit message refers to the CI
> specifically and today this would not really work (read as I would not
> be happy if this option is enabled in the CI right now at least on arm32).
I mentioned CI as a noteworthy example. In no case, I wrote that this implies the immediate
enabling of this option for all the arches in CI. Especially, given that I'm aware of arm32 issues
as you might know.

> 
> If you want to fail a CI job for UBSAN today, then we need to find a
> different approach so we can get the full list of UBSAN error rather
> than fixing one, retry and then next one.
> 
>> Is it because you see no point in this option other than for the upstream CI loop?
> 
> Even in the upstream CI loop, I am a little unsure about this approach.
> At least, I feel I would want to see all the reports at once in the CI.
> 
> But this is not really a strong feeling.
> 
>> I find it useful on a day-to-day
>> Xen runs, and I would for sure enable it by default in my config not to miss UBSAN failures.
> 
> Fair enough. I view USBAN issues the same a WARN_ON. They all need to be
> investigated. So now you have an inconsistent policy.
> 
> Are you are also intending to switch WARN_ON() to fatal? If not, then
> why would UBSAN warnings more important that the other warnings?
WARN_ON() is placed by the developer to detect e.g. incorrect configuration. The fact that someone
decided to use WARN_ON and not BUG_ON means that this person did some investigation the result of
which suggests no critical consequence. For UBSAN, you can't always be sure (read undefined).
It might be at the same level as WARN_ON but can also result in unpredictable behavior leading to security issues.

That said, I do believe that we should also have option to panic on WARN().
As for this patch, Andrew provided Rb and Stefano is happy with it. Do you want more people to vote?

~Michal
Julien Grall Nov. 29, 2023, 10:32 a.m. UTC | #11
On 29/11/2023 09:02, Michal Orzel wrote:
> Hi Julien,

Hi,

> On 28/11/2023 18:52, Julien Grall wrote:
>>
>>
>> Hi Michal,
>>
>> On 28/11/2023 18:15, Michal Orzel wrote:
>>>
>>>
>>> On 28/11/2023 18:09, Julien Grall wrote:
>>>>
>>>>
>>>> On 28/11/2023 18:00, Michal Orzel wrote:
>>>>> Hi Julien,
>>>>>
>>>>> On 28/11/2023 17:14, Julien Grall wrote:
>>>>>>
>>>>>>
>>>>>> Hi Michal,
>>>>>>
>>>>>> On 27/11/2023 15:41, Michal Orzel wrote:
>>>>>>> Introduce the CONFIG_UBSAN_FATAL option to cater to scenarios where prompt
>>>>>>> attention to undefined behavior issues, notably during CI test runs, is
>>>>>>> essential. When enabled, this option causes Xen to panic upon detecting
>>>>>>> UBSAN failure (as the last step in ubsan_epilogue()).
>>>>>>
>>>>>> I have mixed opinions on this patch. This would be a good one if we had
>>>>>> a Xen mostly free from UBSAN behavior. But this is not the case at least
>>>>>> on arm32. So we would end up to stop at the first error. IOW, we would
>>>>>> need to fix the first error before we can see the next one.
>>>>> Well, this patch introduces a config option disabled by default.
>>>>
>>>> I understood this is disabled by default... I am pointing out that I am
>>>> not convinced about the usefulness until we are at the stage where Xen
>>>> is normally not reporting any USBAN error.
>>>>
>>>>> If we end up enabling it for CI for reasons* stated by Andrew, then the natural way
>>>>> of handling such issues is to do the investigation locally.
>>>>
>>>> This will not always be possible. One example is when you are only able
>>>> to reproduce some of the USBAN errors on a specific HW.
>>>>
>>>>> Then, you are not forced
>>>>> to select this option and you can see all the UBSAN issues if you want.
>>>>
>>>> See above, I got that point. I am mostly concerned about the implication
>>>> in the CI right now.
>>>>
>>>>>
>>>>>>
>>>>>> So I feel it would be best if the gitlab CI jobs actually check for
>>>>>> USBAN in the logs and fail if there are any. With that, we can get the
>>>>>> full list of UBSAN issues on each job.
>>>>> Well, I prefer Andrew suggestion (both [1] and on IRC), hence this patch.
>>>>>
>>>>> *my plan was to first fix the UBSAN issues and then enable this option for CI.
>>>>
>>>> That would have been useful to describe your goal after "---". With that
>>>> in mind, then I suggest to revisit this patch once all the UBSAN issues
>>>> in a normal build are fixed.
>>> But this patch does not enable this option for CI automatically, right?
>>
>> Correct.
>>
>>> Why are you so keen to push it back?
>>
>> I have been pushing back because your commit message refers to the CI
>> specifically and today this would not really work (read as I would not
>> be happy if this option is enabled in the CI right now at least on arm32).
> I mentioned CI as a noteworthy example. In no case, I wrote that this implies the immediate
> enabling of this option for all the arches in CI. Especially, given that I'm aware of arm32 issues
> as you might know.

You are missing my point... If you read what I wrote a paragraph after, 
I am pointing out that even in the future, I would prefer if the CI 
reports all the errors rather than stopping at the first one.

When I assess a patch, I also assess based on the examples provided in 
the commit message. Sadly, I don't get CCed to the CI patches, so I much 
prefer to express my opinion earlier rather than missing out the 
opportunity to do it ...

> 
>>
>> If you want to fail a CI job for UBSAN today, then we need to find a
>> different approach so we can get the full list of UBSAN error rather
>> than fixing one, retry and then next one.
>>
>>> Is it because you see no point in this option other than for the upstream CI loop?
>>
>> Even in the upstream CI loop, I am a little unsure about this approach.
>> At least, I feel I would want to see all the reports at once in the CI.
>>
>> But this is not really a strong feeling.
>>
>>> I find it useful on a day-to-day
>>> Xen runs, and I would for sure enable it by default in my config not to miss UBSAN failures.
>>
>> Fair enough. I view USBAN issues the same a WARN_ON. They all need to be
>> investigated. So now you have an inconsistent policy.
>>
>> Are you are also intending to switch WARN_ON() to fatal? If not, then
>> why would UBSAN warnings more important that the other warnings?
> WARN_ON() is placed by the developer to detect e.g. incorrect configuration. The fact that someone
> decided to use WARN_ON and not BUG_ON means that this person did some investigation the result of
> which suggests no critical consequence.
The general guidance is to use BUG_ON() when it is not possible to pass 
the error up to the stack and that could cause privilege escalation / 
information leak. But for anything else (e.g. DoS), then it would be 
more common to use WARN_ON() as an indication that something is fishy.

To give a concrete example, in Linux, we recently had an XSA [1] which 
could have been detected earlier if we had pay attention to WARN splats 
(albeit they were only showing up in certain configuration).

So I would not treat WARN and UBSAN splats differently. In particular in 
the context of a CI system, we really want to know about any splats.

> For UBSAN, you can't always be sure (read undefined).
> It might be at the same level as WARN_ON but can also result in unpredictable behavior leading to security issues.

I wish this were true :). My example above is not Xen, but still...

> 
> That said, I do believe that we should also have option to panic on WARN().
> As for this patch, Andrew provided Rb and Stefano is happy with it. Do you want more people to vote?

No as long as this is not planned to be used in the Gitlab CI.

Cheers,

[1] https://xenbits.xen.org/xsa/advisory-441.html
diff mbox series

Patch

diff --git a/xen/Kconfig.debug b/xen/Kconfig.debug
index 94e818ee09b5..e19e9d48817c 100644
--- a/xen/Kconfig.debug
+++ b/xen/Kconfig.debug
@@ -107,6 +107,13 @@  config UBSAN
 
 	  If unsure, say N here.
 
+config UBSAN_FATAL
+	bool "Panic on UBSAN failure"
+	depends on UBSAN
+	help
+	  Enabling this option will cause Xen to panic when an undefined behavior
+	  is detected by UBSAN. If unsure, say N here.
+
 config DEBUG_TRACE
 	bool "Debug trace support"
 	---help---
diff --git a/xen/common/ubsan/ubsan.c b/xen/common/ubsan/ubsan.c
index a3a80fa99eec..dd5ee0013648 100644
--- a/xen/common/ubsan/ubsan.c
+++ b/xen/common/ubsan/ubsan.c
@@ -174,6 +174,10 @@  static void ubsan_epilogue(unsigned long *flags)
 		"========================================\n");
 	spin_unlock_irqrestore(&report_lock, *flags);
 	current->in_ubsan--;
+
+#ifdef CONFIG_UBSAN_FATAL
+	panic("UBSAN failure detected\n");
+#endif
 }
 
 static void handle_overflow(struct overflow_data *data, unsigned long lhs,