Message ID | cover.1674101475.git.alison.schofield@intel.com |
---|---|
Headers | show |
Series | cxl: CXL Inject & Clear Poison | expand |
On Wed, 18 Jan 2023 21:00:15 -0800 alison.schofield@intel.com wrote: > From: Alison Schofield <alison.schofield@intel.com> > > Built on cxl/next plus Patchset: CXL Poison List Retrieval & Tracing: > https://lore.kernel.org/linux-cxl/de11785ff05844299b40b100f8e0f56c7eef7f08.1674070170.git.alison.schofield@intel.com/ Only tangentially relevant, but I've only just registered as a result of getting a lot of 0 timestamps (which is what you return if the timestamp base is unknown) that I don't think we currently ever set the EP timestamp. Recommendation in the spec (8.2.9.4.2) is: "It is recommended that the host set hte timestamp after ever Conventional or CXL Reset" I'd go further and assume that if we are doing native error handling then it's up to the OS to initialize the timestamp. Also relevant to Ira's series as events are timestamped. Currently Ira's QEMU code doesn't take this subtlety into account (poison doesn't either - but I have patches). Jonathan > > Changes in v2: > - Add Jonathan Reviewed-by tags to Patches 1,2,4 > - Clean up input payload structs for both inject and clear (Dan) > - Commit message cleanups, including spec references (Dave) > - Use CXL_POISON_LEN_MULT in define of clear write data > - Use IS_ALIGNED() for 64byte align check (Dan) > - Add Kconfig CXL_POISON_INJECT (Dan) > - Trivial space cleanup (Jonathan) > - Doc/ABI cleanup (Dave, Dan) > - Mock: Only use injected errors for get poison list > - Mock: Use 'POISONLMT -ENXIO' text from CMD_CMD_RC_TABLE (Jonathan) > - Mock: Add Patch 6/6: A module param to mock device inject limit > > Link to v1: https://lore.kernel.org/linux-cxl/cover.1669781852.git.alison.schofield@intel.com/ > > Introducing Inject and Clear Poison support for CXL Devices. > > These are optional commands, meaning not all CXL devices must support > them. The sysfs attributes, inject_poison and clear_poison, are only > visible for devices reporting support of the capability and when the > kernel Kconfig option CONFIG_CXL_POISON_INJECT is on. (Default: off) > > Example: > # echo 0x40000000 > /sys/bus/cxl/devices/mem1/inject_poison > # echo 1 > /sys/bus/cxl/devices/mem1/trigger_poison_list > > cxl_poison: memdev=mem1 pcidev=cxl_mem.1 region= region_uuid=00000000-0000-0000-0000-000000000000 hpa=0xffffffffffffffff dpa=0x40000000 length=0x40 source=Injected flags= overflow_time=0 > > > Alison Schofield (6): > cxl/memdev: Add support for the Inject Poison mailbox command > cxl/memdev: Add support for the Clear Poison mailbox command > tools/testing/cxl: Mock the Inject Poison mailbox command > tools/testing/cxl: Mock the Clear Poison mailbox command > tools/testing/cxl: Use injected poison for get poison list > tools/testing/cxl: Add a param to test poison injection limits > > Documentation/ABI/testing/sysfs-bus-cxl | 40 ++++++ > drivers/cxl/Kconfig | 10 ++ > drivers/cxl/core/memdev.c | 122 ++++++++++++++++ > drivers/cxl/cxlmem.h | 11 ++ > tools/testing/cxl/test/mem.c | 178 +++++++++++++++++++++--- > 5 files changed, 341 insertions(+), 20 deletions(-) >
On Mon, Jan 23, 2023 at 05:13:01PM +0000, Jonathan Cameron wrote: > On Wed, 18 Jan 2023 21:00:15 -0800 > alison.schofield@intel.com wrote: > > > From: Alison Schofield <alison.schofield@intel.com> > > > > Built on cxl/next plus Patchset: CXL Poison List Retrieval & Tracing: > > https://lore.kernel.org/linux-cxl/de11785ff05844299b40b100f8e0f56c7eef7f08.1674070170.git.alison.schofield@intel.com/ > > Only tangentially relevant, but I've only just registered > as a result of getting a lot of 0 timestamps (which is what > you return if the timestamp base is unknown) that I don't > think we currently ever set the EP timestamp. > > Recommendation in the spec (8.2.9.4.2) is: > "It is recommended that the host set hte timestamp > after ever Conventional or CXL Reset" > > I'd go further and assume that if we are doing native error > handling then it's up to the OS to initialize the timestamp. > > Also relevant to Ira's series as events are timestamped. > Currently Ira's QEMU code doesn't take this subtlety into > account (poison doesn't either - but I have patches). > > Jonathan > Jonathan, I hadn't seen the Set Timestamp cmd, but I think we are OK with Get Poison List and it's overflow_t reporting, it does not use a relative timestamp, but absolute since Jan-1970. Table 8-106 says: Overflow Timestamp: The time that the device determined the poison list overflowed. This field is only valid if the overflow indicator is set. The number of unsigned nanoseconds that have elapsed since midnight, 01- Jan-1970, UTC. If the device does not have a valid timestamp, return 0. Alison > > > > > Changes in v2: > > - Add Jonathan Reviewed-by tags to Patches 1,2,4 > > - Clean up input payload structs for both inject and clear (Dan) > > - Commit message cleanups, including spec references (Dave) > > - Use CXL_POISON_LEN_MULT in define of clear write data > > - Use IS_ALIGNED() for 64byte align check (Dan) > > - Add Kconfig CXL_POISON_INJECT (Dan) > > - Trivial space cleanup (Jonathan) > > - Doc/ABI cleanup (Dave, Dan) > > - Mock: Only use injected errors for get poison list > > - Mock: Use 'POISONLMT -ENXIO' text from CMD_CMD_RC_TABLE (Jonathan) > > - Mock: Add Patch 6/6: A module param to mock device inject limit > > > > Link to v1: https://lore.kernel.org/linux-cxl/cover.1669781852.git.alison.schofield@intel.com/ > > > > Introducing Inject and Clear Poison support for CXL Devices. > > > > These are optional commands, meaning not all CXL devices must support > > them. The sysfs attributes, inject_poison and clear_poison, are only > > visible for devices reporting support of the capability and when the > > kernel Kconfig option CONFIG_CXL_POISON_INJECT is on. (Default: off) > > > > Example: > > # echo 0x40000000 > /sys/bus/cxl/devices/mem1/inject_poison > > # echo 1 > /sys/bus/cxl/devices/mem1/trigger_poison_list > > > > cxl_poison: memdev=mem1 pcidev=cxl_mem.1 region= region_uuid=00000000-0000-0000-0000-000000000000 hpa=0xffffffffffffffff dpa=0x40000000 length=0x40 source=Injected flags= overflow_time=0 > > > > > > Alison Schofield (6): > > cxl/memdev: Add support for the Inject Poison mailbox command > > cxl/memdev: Add support for the Clear Poison mailbox command > > tools/testing/cxl: Mock the Inject Poison mailbox command > > tools/testing/cxl: Mock the Clear Poison mailbox command > > tools/testing/cxl: Use injected poison for get poison list > > tools/testing/cxl: Add a param to test poison injection limits > > > > Documentation/ABI/testing/sysfs-bus-cxl | 40 ++++++ > > drivers/cxl/Kconfig | 10 ++ > > drivers/cxl/core/memdev.c | 122 ++++++++++++++++ > > drivers/cxl/cxlmem.h | 11 ++ > > tools/testing/cxl/test/mem.c | 178 +++++++++++++++++++++--- > > 5 files changed, 341 insertions(+), 20 deletions(-) > > >
On Mon, 23 Jan 2023 15:42:33 -0800 Alison Schofield <alison.schofield@intel.com> wrote: > On Mon, Jan 23, 2023 at 05:13:01PM +0000, Jonathan Cameron wrote: > > On Wed, 18 Jan 2023 21:00:15 -0800 > > alison.schofield@intel.com wrote: > > > > > From: Alison Schofield <alison.schofield@intel.com> > > > > > > Built on cxl/next plus Patchset: CXL Poison List Retrieval & Tracing: > > > https://lore.kernel.org/linux-cxl/de11785ff05844299b40b100f8e0f56c7eef7f08.1674070170.git.alison.schofield@intel.com/ > > > > Only tangentially relevant, but I've only just registered > > as a result of getting a lot of 0 timestamps (which is what > > you return if the timestamp base is unknown) that I don't > > think we currently ever set the EP timestamp. > > > > Recommendation in the spec (8.2.9.4.2) is: > > "It is recommended that the host set hte timestamp > > after ever Conventional or CXL Reset" > > > > I'd go further and assume that if we are doing native error > > handling then it's up to the OS to initialize the timestamp. > > > > Also relevant to Ira's series as events are timestamped. > > Currently Ira's QEMU code doesn't take this subtlety into > > account (poison doesn't either - but I have patches). > > > > Jonathan > > > > Jonathan, > > I hadn't seen the Set Timestamp cmd, but I think we are OK with > Get Poison List and it's overflow_t reporting, it does not use > a relative timestamp, but absolute since Jan-1970. I'd assume most devices have no ability to get that timestamp without a set timestamp command. You might get some with an RTC that is factory set, but I doubt it will be common. > > Table 8-106 says: > Overflow Timestamp: The time that the device determined the poison > list overflowed. This field is only valid if the overflow indicator is set. The > number of unsigned nanoseconds that have elapsed since midnight, 01- > Jan-1970, UTC. If the device does not have a valid timestamp, return 0. Yup, but Table 8-60 Set Timestamp Input Payload has "Timestamp: The number of unsigned nanoseconds that have elapsed since midnight, 01-Jan-1970, UTC." above that it is the text that says the host should set it on Conventional or CXL reset (which obviously includes boot). That's where the device idea of time is coming from so that it can be returned in places like the Overflow timestamp. My reading of that bit about "not have a valid timestamp, return 0" is that is targeting the case where a timestamp has not been set since reset. All in all, we are fine (in that it 'works') but with out set timestamp the generated time stamps are going to be garbage on many devices. I'll spin an RFC patch adding the call and we can work out what conditions it should be called under in that thread. Jonathan > > Alison > > > > > > > > > > Changes in v2: > > > - Add Jonathan Reviewed-by tags to Patches 1,2,4 > > > - Clean up input payload structs for both inject and clear (Dan) > > > - Commit message cleanups, including spec references (Dave) > > > - Use CXL_POISON_LEN_MULT in define of clear write data > > > - Use IS_ALIGNED() for 64byte align check (Dan) > > > - Add Kconfig CXL_POISON_INJECT (Dan) > > > - Trivial space cleanup (Jonathan) > > > - Doc/ABI cleanup (Dave, Dan) > > > - Mock: Only use injected errors for get poison list > > > - Mock: Use 'POISONLMT -ENXIO' text from CMD_CMD_RC_TABLE (Jonathan) > > > - Mock: Add Patch 6/6: A module param to mock device inject limit > > > > > > Link to v1: https://lore.kernel.org/linux-cxl/cover.1669781852.git.alison.schofield@intel.com/ > > > > > > Introducing Inject and Clear Poison support for CXL Devices. > > > > > > These are optional commands, meaning not all CXL devices must support > > > them. The sysfs attributes, inject_poison and clear_poison, are only > > > visible for devices reporting support of the capability and when the > > > kernel Kconfig option CONFIG_CXL_POISON_INJECT is on. (Default: off) > > > > > > Example: > > > # echo 0x40000000 > /sys/bus/cxl/devices/mem1/inject_poison > > > # echo 1 > /sys/bus/cxl/devices/mem1/trigger_poison_list > > > > > > cxl_poison: memdev=mem1 pcidev=cxl_mem.1 region= region_uuid=00000000-0000-0000-0000-000000000000 hpa=0xffffffffffffffff dpa=0x40000000 length=0x40 source=Injected flags= overflow_time=0 > > > > > > > > > Alison Schofield (6): > > > cxl/memdev: Add support for the Inject Poison mailbox command > > > cxl/memdev: Add support for the Clear Poison mailbox command > > > tools/testing/cxl: Mock the Inject Poison mailbox command > > > tools/testing/cxl: Mock the Clear Poison mailbox command > > > tools/testing/cxl: Use injected poison for get poison list > > > tools/testing/cxl: Add a param to test poison injection limits > > > > > > Documentation/ABI/testing/sysfs-bus-cxl | 40 ++++++ > > > drivers/cxl/Kconfig | 10 ++ > > > drivers/cxl/core/memdev.c | 122 ++++++++++++++++ > > > drivers/cxl/cxlmem.h | 11 ++ > > > tools/testing/cxl/test/mem.c | 178 +++++++++++++++++++++--- > > > 5 files changed, 341 insertions(+), 20 deletions(-) > > > > > >
From: Alison Schofield <alison.schofield@intel.com> Built on cxl/next plus Patchset: CXL Poison List Retrieval & Tracing: https://lore.kernel.org/linux-cxl/de11785ff05844299b40b100f8e0f56c7eef7f08.1674070170.git.alison.schofield@intel.com/ Changes in v2: - Add Jonathan Reviewed-by tags to Patches 1,2,4 - Clean up input payload structs for both inject and clear (Dan) - Commit message cleanups, including spec references (Dave) - Use CXL_POISON_LEN_MULT in define of clear write data - Use IS_ALIGNED() for 64byte align check (Dan) - Add Kconfig CXL_POISON_INJECT (Dan) - Trivial space cleanup (Jonathan) - Doc/ABI cleanup (Dave, Dan) - Mock: Only use injected errors for get poison list - Mock: Use 'POISONLMT -ENXIO' text from CMD_CMD_RC_TABLE (Jonathan) - Mock: Add Patch 6/6: A module param to mock device inject limit Link to v1: https://lore.kernel.org/linux-cxl/cover.1669781852.git.alison.schofield@intel.com/ Introducing Inject and Clear Poison support for CXL Devices. These are optional commands, meaning not all CXL devices must support them. The sysfs attributes, inject_poison and clear_poison, are only visible for devices reporting support of the capability and when the kernel Kconfig option CONFIG_CXL_POISON_INJECT is on. (Default: off) Example: # echo 0x40000000 > /sys/bus/cxl/devices/mem1/inject_poison # echo 1 > /sys/bus/cxl/devices/mem1/trigger_poison_list cxl_poison: memdev=mem1 pcidev=cxl_mem.1 region= region_uuid=00000000-0000-0000-0000-000000000000 hpa=0xffffffffffffffff dpa=0x40000000 length=0x40 source=Injected flags= overflow_time=0 Alison Schofield (6): cxl/memdev: Add support for the Inject Poison mailbox command cxl/memdev: Add support for the Clear Poison mailbox command tools/testing/cxl: Mock the Inject Poison mailbox command tools/testing/cxl: Mock the Clear Poison mailbox command tools/testing/cxl: Use injected poison for get poison list tools/testing/cxl: Add a param to test poison injection limits Documentation/ABI/testing/sysfs-bus-cxl | 40 ++++++ drivers/cxl/Kconfig | 10 ++ drivers/cxl/core/memdev.c | 122 ++++++++++++++++ drivers/cxl/cxlmem.h | 11 ++ tools/testing/cxl/test/mem.c | 178 +++++++++++++++++++++--- 5 files changed, 341 insertions(+), 20 deletions(-)