diff mbox series

[for-4.19] xen/xmalloc: XMEM_POOL_POISON improvements

Message ID 20231020202649.1802700-1-andrew.cooper3@citrix.com (mailing list archive)
State Superseded
Headers show
Series [for-4.19] xen/xmalloc: XMEM_POOL_POISON improvements | expand

Commit Message

Andrew Cooper Oct. 20, 2023, 8:26 p.m. UTC
When in use, the spew:

  (XEN) Assertion '!memchr_inv(b->ptr.buffer + MIN_BLOCK_SIZE, POISON_BYTE, (b->size & BLOCK_SIZE_MASK) - MIN_BLOCK_SIZE)' failed at common/xmalloc_tlsf.c:246

is unweidly and meaningless to non-Xen developers.  Therefore:

 * Switch to IS_ENABLED().  There's no need for full #ifdef-ary.
 * Pull memchr_inv() out into the if(), and provide a better error message.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Wei Liu <wl@xen.org>
CC: Julien Grall <julien@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>

Observations from the debugging of:
  https://github.com/Dasharo/dasharo-issues/issues/488

There's a further bug.  XMEM_POOL_POISON can be enabled in release builds,
where the ASSERT() gets dropped.

However, upping to a BUG() can't provide any helpful message out to the user.

I tried modifying BUG() to take an optional message, but xen/bug.h needs
untangling substantially before that will work, and I don't have time right now.
---
 xen/common/xmalloc_tlsf.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

Comments

Julien Grall Oct. 22, 2023, 4:52 p.m. UTC | #1
Hi Andrew,

On 20/10/2023 21:26, Andrew Cooper wrote:
> When in use, the spew:
> 
>    (XEN) Assertion '!memchr_inv(b->ptr.buffer + MIN_BLOCK_SIZE, POISON_BYTE, (b->size & BLOCK_SIZE_MASK) - MIN_BLOCK_SIZE)' failed at common/xmalloc_tlsf.c:246
> 
> is unweidly and meaningless to non-Xen developers.  Therefore:
> 
>   * Switch to IS_ENABLED().  There's no need for full #ifdef-ary.
>   * Pull memchr_inv() out into the if(), and provide a better error message.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: George Dunlap <George.Dunlap@eu.citrix.com>
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Wei Liu <wl@xen.org>
> CC: Julien Grall <julien@xen.org>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> 
> Observations from the debugging of:
>    https://github.com/Dasharo/dasharo-issues/issues/488
> 
> There's a further bug.  XMEM_POOL_POISON can be enabled in release builds,
> where the ASSERT() gets dropped.
> 
> However, upping to a BUG() can't provide any helpful message out to the user.
> 
> I tried modifying BUG() to take an optional message, but xen/bug.h needs
> untangling substantially before that will work, and I don't have time right now.

Do we actually care about the registers values? If not, we can either use:

printk(...);
BUG();

Or

panic(...);

This would allow us to use XMEM_POOL_POISON in prod build before BUG() 
can accept string.

> ---
>   xen/common/xmalloc_tlsf.c | 15 +++++++--------
>   1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/common/xmalloc_tlsf.c b/xen/common/xmalloc_tlsf.c
> index 349b31cb4cc1..13305cd87c2f 100644
> --- a/xen/common/xmalloc_tlsf.c
> +++ b/xen/common/xmalloc_tlsf.c
> @@ -249,11 +249,11 @@ static inline void EXTRACT_BLOCK(struct bhdr *b, struct xmem_pool *p, int fl,
>       }
>       b->ptr.free_ptr = (struct free_ptr) {NULL, NULL};
>   
> -#ifdef CONFIG_XMEM_POOL_POISON
> -    if ( (b->size & BLOCK_SIZE_MASK) > MIN_BLOCK_SIZE )
> -        ASSERT(!memchr_inv(b->ptr.buffer + MIN_BLOCK_SIZE, POISON_BYTE,
> -                           (b->size & BLOCK_SIZE_MASK) - MIN_BLOCK_SIZE));
> -#endif /* CONFIG_XMEM_POOL_POISON */
> +    if ( IS_ENABLED(CONFIG_XMEM_POOL_POISON) &&
> +         (b->size & BLOCK_SIZE_MASK) > MIN_BLOCK_SIZE &&
> +         memchr_inv(b->ptr.buffer + MIN_BLOCK_SIZE, POISON_BYTE,
> +                    (b->size & BLOCK_SIZE_MASK) - MIN_BLOCK_SIZE) )
> +        ASSERT(!"XMEM Pool corruption found");
>   }
>   
>   /**
> @@ -261,11 +261,10 @@ static inline void EXTRACT_BLOCK(struct bhdr *b, struct xmem_pool *p, int fl,
>    */
>   static inline void INSERT_BLOCK(struct bhdr *b, struct xmem_pool *p, int fl, int sl)
>   {
> -#ifdef CONFIG_XMEM_POOL_POISON
> -    if ( (b->size & BLOCK_SIZE_MASK) > MIN_BLOCK_SIZE )
> +    if ( IS_ENABLED(CONFIG_XMEM_POOL_POISON) &&
> +         (b->size & BLOCK_SIZE_MASK) > MIN_BLOCK_SIZE )
>           memset(b->ptr.buffer + MIN_BLOCK_SIZE, POISON_BYTE,
>                  (b->size & BLOCK_SIZE_MASK) - MIN_BLOCK_SIZE);
> -#endif /* CONFIG_XMEM_POOL_POISON */
>   
>       b->ptr.free_ptr = (struct free_ptr) {NULL, p->matrix[fl][sl]};
>       if ( p->matrix[fl][sl] )


Cheers,
Jan Beulich Oct. 23, 2023, 8:04 a.m. UTC | #2
On 20.10.2023 22:26, Andrew Cooper wrote:
> When in use, the spew:
> 
>   (XEN) Assertion '!memchr_inv(b->ptr.buffer + MIN_BLOCK_SIZE, POISON_BYTE, (b->size & BLOCK_SIZE_MASK) - MIN_BLOCK_SIZE)' failed at common/xmalloc_tlsf.c:246
> 
> is unweidly and meaningless to non-Xen developers.

Just to mention it (no objection to the changes done): While I agree with
unwieldy (and I'd add "of limited use"), an assertion message not necessarily
being meaningful to non-Xen developers is imo not really a criteria - that's
to be expected in various cases.

>  Therefore:
> 
>  * Switch to IS_ENABLED().  There's no need for full #ifdef-ary.
>  * Pull memchr_inv() out into the if(), and provide a better error message.

Thing is - this "better" error message now ends up being less specific, and
hence could mean also other kinds of corruption.

> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: George Dunlap <George.Dunlap@eu.citrix.com>
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Wei Liu <wl@xen.org>
> CC: Julien Grall <julien@xen.org>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> 
> Observations from the debugging of:
>   https://github.com/Dasharo/dasharo-issues/issues/488
> 
> There's a further bug.  XMEM_POOL_POISON can be enabled in release builds,
> where the ASSERT() gets dropped.

Just to mention it - this limits usefulness, but doesn't make the option
entirely useless. The poisoning itself also allows certain cases of use-
after-free to be caught. The assertion really is "only" about write-after-
free. (Maybe the improved error message could indicate so?)

> However, upping to a BUG() can't provide any helpful message out to the user.
> 
> I tried modifying BUG() to take an optional message, but xen/bug.h needs
> untangling substantially before that will work, and I don't have time right now.

I agree with Julien's suggestion of using panic() in the meantime, as a
possible alternative. Question though is whether it's better to halt the
system right away, as opposed to e.g. permitting orderly shutdown to cover
the case where the corruption ends up not being "deadly".

Jan
Roger Pau Monné Oct. 23, 2023, 8:28 a.m. UTC | #3
On Mon, Oct 23, 2023 at 10:04:21AM +0200, Jan Beulich wrote:
> On 20.10.2023 22:26, Andrew Cooper wrote:
> > However, upping to a BUG() can't provide any helpful message out to the user.
> > 
> > I tried modifying BUG() to take an optional message, but xen/bug.h needs
> > untangling substantially before that will work, and I don't have time right now.
> 
> I agree with Julien's suggestion of using panic() in the meantime, as a
> possible alternative.

We might care about the stack trace, so would be helpful to print it,
maybe WARN + panic?

> Question though is whether it's better to halt the
> system right away, as opposed to e.g. permitting orderly shutdown to cover
> the case where the corruption ends up not being "deadly".

Hm, won't this be risky, as we could then possibly corrupt data on disk
for example?

Thanks, Roger.
Jan Beulich Oct. 23, 2023, 8:41 a.m. UTC | #4
On 23.10.2023 10:28, Roger Pau Monné wrote:
> On Mon, Oct 23, 2023 at 10:04:21AM +0200, Jan Beulich wrote:
>> On 20.10.2023 22:26, Andrew Cooper wrote:
>>> However, upping to a BUG() can't provide any helpful message out to the user.
>>>
>>> I tried modifying BUG() to take an optional message, but xen/bug.h needs
>>> untangling substantially before that will work, and I don't have time right now.
>>
>> I agree with Julien's suggestion of using panic() in the meantime, as a
>> possible alternative.
> 
> We might care about the stack trace, so would be helpful to print it,
> maybe WARN + panic?

Perhaps, yes.

>> Question though is whether it's better to halt the
>> system right away, as opposed to e.g. permitting orderly shutdown to cover
>> the case where the corruption ends up not being "deadly".
> 
> Hm, won't this be risky, as we could then possibly corrupt data on disk
> for example?

It is risky - you never really know which of the two options is better.

Jan
Andrew Cooper Oct. 23, 2023, 9:51 a.m. UTC | #5
On 22/10/2023 5:52 pm, Julien Grall wrote:
> Hi Andrew,
>
> On 20/10/2023 21:26, Andrew Cooper wrote:
>> When in use, the spew:
>>
>>    (XEN) Assertion '!memchr_inv(b->ptr.buffer + MIN_BLOCK_SIZE,
>> POISON_BYTE, (b->size & BLOCK_SIZE_MASK) - MIN_BLOCK_SIZE)' failed at
>> common/xmalloc_tlsf.c:246
>>
>> is unweidly and meaningless to non-Xen developers.  Therefore:
>>
>>   * Switch to IS_ENABLED().  There's no need for full #ifdef-ary.
>>   * Pull memchr_inv() out into the if(), and provide a better error
>> message.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: George Dunlap <George.Dunlap@eu.citrix.com>
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Stefano Stabellini <sstabellini@kernel.org>
>> CC: Wei Liu <wl@xen.org>
>> CC: Julien Grall <julien@xen.org>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>>
>> Observations from the debugging of:
>>    https://github.com/Dasharo/dasharo-issues/issues/488
>>
>> There's a further bug.  XMEM_POOL_POISON can be enabled in release
>> builds,
>> where the ASSERT() gets dropped.
>>
>> However, upping to a BUG() can't provide any helpful message out to
>> the user.
>>
>> I tried modifying BUG() to take an optional message, but xen/bug.h needs
>> untangling substantially before that will work, and I don't have time
>> right now.
>
> Do we actually care about the registers values? If not, we can either
> use:
>
> printk(...);
> BUG();
>
> Or
>
> panic(...);
>
> This would allow us to use XMEM_POOL_POISON in prod build before BUG()
> can accept string.

We care about the backtrace, so panic() on it's own isn't good enough.

But printk();BUG(); should be good enough so I'll swap to that.

~Andrew
diff mbox series

Patch

diff --git a/xen/common/xmalloc_tlsf.c b/xen/common/xmalloc_tlsf.c
index 349b31cb4cc1..13305cd87c2f 100644
--- a/xen/common/xmalloc_tlsf.c
+++ b/xen/common/xmalloc_tlsf.c
@@ -249,11 +249,11 @@  static inline void EXTRACT_BLOCK(struct bhdr *b, struct xmem_pool *p, int fl,
     }
     b->ptr.free_ptr = (struct free_ptr) {NULL, NULL};
 
-#ifdef CONFIG_XMEM_POOL_POISON
-    if ( (b->size & BLOCK_SIZE_MASK) > MIN_BLOCK_SIZE )
-        ASSERT(!memchr_inv(b->ptr.buffer + MIN_BLOCK_SIZE, POISON_BYTE,
-                           (b->size & BLOCK_SIZE_MASK) - MIN_BLOCK_SIZE));
-#endif /* CONFIG_XMEM_POOL_POISON */
+    if ( IS_ENABLED(CONFIG_XMEM_POOL_POISON) &&
+         (b->size & BLOCK_SIZE_MASK) > MIN_BLOCK_SIZE &&
+         memchr_inv(b->ptr.buffer + MIN_BLOCK_SIZE, POISON_BYTE,
+                    (b->size & BLOCK_SIZE_MASK) - MIN_BLOCK_SIZE) )
+        ASSERT(!"XMEM Pool corruption found");
 }
 
 /**
@@ -261,11 +261,10 @@  static inline void EXTRACT_BLOCK(struct bhdr *b, struct xmem_pool *p, int fl,
  */
 static inline void INSERT_BLOCK(struct bhdr *b, struct xmem_pool *p, int fl, int sl)
 {
-#ifdef CONFIG_XMEM_POOL_POISON
-    if ( (b->size & BLOCK_SIZE_MASK) > MIN_BLOCK_SIZE )
+    if ( IS_ENABLED(CONFIG_XMEM_POOL_POISON) &&
+         (b->size & BLOCK_SIZE_MASK) > MIN_BLOCK_SIZE )
         memset(b->ptr.buffer + MIN_BLOCK_SIZE, POISON_BYTE,
                (b->size & BLOCK_SIZE_MASK) - MIN_BLOCK_SIZE);
-#endif /* CONFIG_XMEM_POOL_POISON */
 
     b->ptr.free_ptr = (struct free_ptr) {NULL, p->matrix[fl][sl]};
     if ( p->matrix[fl][sl] )