[v5,0/6] hw/cxl: Poison get, inject, clear

Message ID 20230423162013.4535-1-Jonathan.Cameron@huawei.com

Message

Jonathan Cameron April 23, 2023, 4:20 p.m. UTC
v5: More details in each patch.
 - Simpler algorithm to find entry when clearing.
 - Improvements to debuggability and docs for 24 bit endian functions.
 - Use of ROUND_DOWN() to simplify the various alignment questions.
 - Use CXL_CACHELINE_SIZE define to explain the mysterious 64 byte
   granularity
 - Use memory_region_size() instead of direct accesses.
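
A minimal sketch of the alignment arithmetic the changelog refers to.
CXL_CACHELINE_SIZE is the define named above; ROUND_DOWN_POW2 is a local
stand-in written for this example (it is not QEMU's ROUND_DOWN()), and
the two helper functions are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Name taken from the changelog; 64 is the granularity it mentions. */
#define CXL_CACHELINE_SIZE 64

/* Local stand-in for illustration; only valid when d is a power of two. */
#define ROUND_DOWN_POW2(n, d) ((uint64_t)(n) & ~((uint64_t)(d) - 1))

/* Hypothetical helper: base address of the cacheline containing dpa. */
static uint64_t poison_line_base(uint64_t dpa)
{
    return ROUND_DOWN_POW2(dpa, CXL_CACHELINE_SIZE);
}

/*
 * Hypothetical helper: number of cachelines touched by the byte range
 * [start, start + len). The caller must pass len > 0.
 */
static uint64_t poison_line_count(uint64_t start, uint64_t len)
{
    uint64_t first = ROUND_DOWN_POW2(start, CXL_CACHELINE_SIZE);
    uint64_t last = ROUND_DOWN_POW2(start + len - 1, CXL_CACHELINE_SIZE);

    return (last - first) / CXL_CACHELINE_SIZE + 1;
}
```

Rounding both ends down to a line boundary means an unaligned request
that straddles a boundary is counted as two lines rather than one.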

Many of the precursors listed for v4 have now been applied, but
a few minor fixes have come up in the meantime, so a few precursors
remain, including the volatile memory support carried over from v4.

Depends on 
[PATCH 0/2] hw/cxl: CDAT file handling fixes.
[PATCH v2 0/3] hw/cxl: Fix decoder commit and uncommit handling
[PATCH 0/3] docs/cxl: Gathering of fixes for 8.0 CXL docs.
[PATCH v5 0/3] hw/mem: CXL Type-3 Volatile Memory Support
 
Based on: Message-ID: 20230421132020.7408-1-Jonathan.Cameron@huawei.com
Based on: Message-ID: 20230421135906.3515-1-Jonathan.Cameron@huawei.com
Based on: Message-ID: 20230421134507.26842-1-Jonathan.Cameron@huawei.com
Based on: Message-ID: 20230421160827.2227-1-Jonathan.Cameron@huawei.com

The kernel support for Poison handling is currently in the cxl/pending
branch and hopefully should be in the CXL pull request next week.

https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=pending

This code has been very useful for testing and helped identify various
corner cases.

Updated cover letter.

The series supports:
1) Injection of variable length poison regions via QMP (to fake real
   memory corruption and ensure we deal with odd overflow corner cases,
   such as clearing the middle of a large region making the list overflow
   as we go from one long entry to two smaller entries).
2) Read of poison list via the CXL mailbox.
3) Injection via the poison injection mailbox command (limited to 64 byte
   entries - spec constraint)
4) Clearing of poison injected via either method.
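
The overflow corner case in item 1 can be sketched as follows. This is
an illustration under stated assumptions, not the code in this series:
PoisonEntry, PoisonList, MAX_ENTRIES, poison_add() and
poison_clear_line() are hypothetical names, and the list limit is kept
deliberately tiny so that the overflow is easy to trigger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE   64u  /* size of one clearable poison unit */
#define MAX_ENTRIES 4    /* hypothetical, deliberately small list limit */

typedef struct {
    uint64_t start;      /* first poisoned byte, 64-byte aligned */
    uint64_t length;     /* length in bytes, a multiple of 64 */
} PoisonEntry;

typedef struct {
    PoisonEntry ent[MAX_ENTRIES];
    size_t count;
    bool overflow;       /* set when an entry could not be recorded */
} PoisonList;

/* Append an entry, or flag overflow if the list is already full. */
static void poison_add(PoisonList *pl, uint64_t start, uint64_t length)
{
    if (pl->count == MAX_ENTRIES) {
        pl->overflow = true;
        return;
    }
    pl->ent[pl->count++] = (PoisonEntry){ start, length };
}

/* Clear the single cacheline that contains dpa. */
static void poison_clear_line(PoisonList *pl, uint64_t dpa)
{
    uint64_t line = dpa - (dpa % CACHELINE);

    for (size_t i = 0; i < pl->count; i++) {
        PoisonEntry e = pl->ent[i];
        uint64_t end = e.start + e.length;

        if (line < e.start || line >= end) {
            continue;
        }
        /* Drop the matching entry by moving the last one into its slot. */
        pl->ent[i] = pl->ent[--pl->count];
        /* Re-add whatever remains before and after the cleared line. */
        if (line > e.start) {
            poison_add(pl, e.start, line - e.start);
        }
        if (line + CACHELINE < end) {
            poison_add(pl, line + CACHELINE, end - (line + CACHELINE));
        }
        return;
    }
}
```

With a full list, clearing a line in the middle of a three-line entry
removes one entry and tries to add two, so the second add sets the
overflow flag even though the amount of poison has gone down.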

The implementation is meant to be a valid combination of impdef
(implementation-defined) choices permitted by the spec. There are a number
of places where it could be made more sophisticated, which we might
consider in future:
* Fusing adjacent poison entries if the types match.
* Separate injection list and main poison list, to test out limits on
  injected poison list being smaller than the main list.
* Poison list overflow event (needs event log support in general)
* Connecting up to the poison list error record generation (rather complex
  and not needed for current kernel handling testing).
* Triggering the synchronous and asynchronous errors that occur on reads
  and writes of the memory when the host receives poison.

As the kernel code is currently fairly simple, it is likely that the above
does not yet matter but who knows what will turn up in future!


Ira Weiny (2):
  hw/cxl: Introduce cxl_device_get_timestamp() utility function
  bswap: Add the ability to store to an unaligned 24 bit field

Jonathan Cameron (4):
  hw/cxl: rename mailbox return code type from ret_code to CXLRetCode
  hw/cxl: QMP based poison injection support
  hw/cxl: Add poison injection via the mailbox.
  hw/cxl: Add clear poison mailbox command support.

 docs/devel/loads-stores.rst |   1 +
 hw/cxl/cxl-device-utils.c   |  15 ++
 hw/cxl/cxl-mailbox-utils.c  | 289 ++++++++++++++++++++++++++++++------
 hw/mem/cxl_type3.c          |  93 ++++++++++++
 hw/mem/cxl_type3_stubs.c    |   6 +
 include/hw/cxl/cxl.h        |   1 +
 include/hw/cxl/cxl_device.h |  23 +++
 include/qemu/bswap.h        |  25 ++++
 qapi/cxl.json               |  18 +++
 9 files changed, 429 insertions(+), 42 deletions(-)

Comments

Michael S. Tsirkin May 19, 2023, 8:49 a.m. UTC | #1
On Sun, Apr 23, 2023 at 05:20:07PM +0100, Jonathan Cameron wrote:
> v5: More details in each patch.
>  - Simpler algorithm to find entry when clearing.
>  - Improvements to debuggability and docs for 24 bit endian functions.
>  - Use of ROUND_DOWN() to simplify the various alignment questions.
>  - Use CXL_CACHELINE_SIZE define to explain the mysterious 64 byte
>    granularity
>  - Use memory_region_size() instead of direct accesses.


picked first 3 but dropped the rest for now due to build errors.

Jonathan Cameron May 19, 2023, 11:07 a.m. UTC | #2
On Fri, 19 May 2023 04:49:46 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Sun, Apr 23, 2023 at 05:20:07PM +0100, Jonathan Cameron wrote:
> > v5: More details in each patch.
> >  - Simpler algorithm to find entry when clearing.
> >  - Improvements to debuggability and docs for 24 bit endian functions.
> >  - Use of ROUND_DOWN() to simplify the various alignment questions.
> >  - Use CXL_CACHELINE_SIZE define to explain the mysterious 64 byte
> >    granularity
> >  - Use memory_region_size() instead of direct accesses.  
> 
> 
> picked first 3 but dropped the rest for now due to build errors.
Drop the bswap one as well for now.
s390 is trying to call __builtin_bswap24 which clearly doesn't exist
- though you won't see that without the rest of this patch set.

Might be a case of crossing with a patch set reworking this stuff
to use the compiler more, but I'm not quite sure.

I'll see if I can figure out a fix or indeed exactly how this is
being triggered.

Hindsight says we should have kept the definition local to CXL and
done the 'generic' version afterwards.

For reference

/builds/jic23/qemu/include/qemu/bswap.h:42:32: error: implicit declaration of function ‘__builtin_bswap24’; did you mean ‘__builtin_bswap64’? [-Werror=implicit-function-declaration]
   42 | #define le_bswap(v, size) glue(__builtin_bswap, size)(v)
      |                                ^~~~~~~~~~~~~~~
/builds/jic23/qemu/include/qemu/compiler.h:34:21: note: in definition of macro ‘xglue’
   34 | #define xglue(x, y) x ## y
      |                     ^
/builds/jic23/qemu/include/qemu/bswap.h:42:27: note: in expansion of macro ‘glue’
   42 | #define le_bswap(v, size) glue(__builtin_bswap, size)(v)
      |                           ^~~~
/builds/jic23/qemu/include/qemu/bswap.h:322:20: note: in expansion of macro ‘le_bswap’
  322 |     st24_he_p(ptr, le_bswap(v, 24));
      |                    ^~~~~~~~
/builds/jic23/qemu/include/qemu/bswap.h:42:32: error: nested extern declaration of ‘__builtin_bswap24’ [-Werror=nested-externs]
   42 | #define le_bswap(v, size) glue(__builtin_bswap, size)(v)
      |                                ^~~~~~~~~~~~~~~
/builds/jic23/qemu/include/qemu/compiler.h:34:21: note: in definition of macro ‘xglue’
   34 | #define xglue(x, y) x ## y
      |                     ^
/builds/jic23/qemu/include/qemu/bswap.h:42:27: note: in expansion of macro ‘glue’
   42 | #define le_bswap(v, size) glue(__builtin_bswap, size)(v)
      |                           ^~~~
/builds/jic23/qemu/include/qemu/bswap.h:322:20: note: in expansion of macro ‘le_bswap’
  322 |     st24_he_p(ptr, le_bswap(v, 24));
      |                    ^~~~~~~~
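
The log shows le_bswap(v, 24) being pasted into __builtin_bswap24, a
builtin the compiler does not provide. One way to avoid needing any
24 bit byte swap is to store the bytes individually, which also works
for unaligned destinations. A minimal sketch, with hypothetical helper
names; this is not necessarily the fix that ends up being applied:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: store the low 24 bits of v at ptr in
 * little-endian byte order. Writing one byte at a time needs no
 * alignment and no host-endianness-dependent swap.
 */
static inline void store24_le(void *ptr, uint32_t v)
{
    uint8_t *p = ptr;

    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff;
}

/* Hypothetical counterpart: load a 24 bit little-endian value. */
static inline uint32_t load24_le(const void *ptr)
{
    const uint8_t *p = ptr;

    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}
```

Only three bytes are written, so a neighbouring field that follows the
24 bit value in a packed structure is left untouched.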

Jonathan