diff mbox series

[RFC,v2,2/6] cxl/core: introduce cxl_mem_report_poison()

Message ID 20240329063614.362763-3-ruansy.fnst@fujitsu.com
State New, archived
Headers show
Series cxl: add poison event handler | expand

Commit Message

Shiyang Ruan March 29, 2024, 6:36 a.m. UTC
If poison is detected(reported from cxl memdev), OS should be notified to
handle it. So, introduce this helper function for later use:
  1. translate DPA to HPA;
  2. enqueue records into memory_failure's work queue;

Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
---

Currently poison injection from debugfs always create a 64-bytes-length
record, which is fine.  But the injection from qemu's QMP API:
qmp_cxl_inject_poison() could create a poison record contains big length,
which may cause many many times of calling memory_failure_queue().
Though the MEMORY_FAILURE_FIFO_SIZE is 1 << 4, it seems not enougth.

---
 drivers/cxl/core/mbox.c | 18 ++++++++++++++++++
 drivers/cxl/cxlmem.h    |  3 +++
 2 files changed, 21 insertions(+)

Comments

Dan Williams March 30, 2024, 1:39 a.m. UTC | #1
Shiyang Ruan wrote:
> If poison is detected(reported from cxl memdev), OS should be notified to
> handle it. So, introduce this helper function for later use:
>   1. translate DPA to HPA;
>   2. enqueue records into memory_failure's work queue;
> 
> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>

This patch is too small, it needs the corresponding caller to make sense
of the proposed change.

> ---
> 
> Currently poison injection from debugfs always create a 64-bytes-length
> record, which is fine.  But the injection from qemu's QMP API:
> qmp_cxl_inject_poison() could create a poison record contains big length,
> which may cause many many times of calling memory_failure_queue().
> Though the MEMORY_FAILURE_FIFO_SIZE is 1 << 4, it seems not enougth.

What matters is what devices do in practice, the kernel should not be
worried about odd corner cases that only exist in QEMU injection
scenarios.
diff mbox series

Patch

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 9adda4795eb7..31b1b8711256 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1290,6 +1290,24 @@  int cxl_set_timestamp(struct cxl_memdev_state *mds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_set_timestamp, CXL);
 
+void cxl_mem_report_poison(struct cxl_memdev *cxlmd,
+			   struct cxl_region *cxlr,
+			   struct cxl_poison_record *poison)
+{
+	u64 dpa = le64_to_cpu(poison->address) & CXL_POISON_START_MASK;
+	u64 len = PAGE_ALIGN(le32_to_cpu(poison->length) * CXL_POISON_LEN_MULT);
+	u64 hpa = cxl_trace_hpa(cxlr, cxlmd, dpa);
+	unsigned long pfn = PHYS_PFN(hpa);
+	unsigned long pfn_end = pfn + len / PAGE_SIZE - 1;
+
+	if (!IS_ENABLED(CONFIG_MEMORY_FAILURE))
+		return;
+
+	for (; pfn <= pfn_end; pfn++)
+		memory_failure_queue(pfn, MF_ACTION_REQUIRED);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mem_report_poison, CXL);
+
 int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
 		       struct cxl_region *cxlr)
 {
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 20fb3b35e89e..82f80eb381fb 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -828,6 +828,9 @@  void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
 			    const uuid_t *uuid, union cxl_event *evt);
 int cxl_set_timestamp(struct cxl_memdev_state *mds);
 int cxl_poison_state_init(struct cxl_memdev_state *mds);
+void cxl_mem_report_poison(struct cxl_memdev *cxlmd,
+			   struct cxl_region *cxlr,
+			   struct cxl_poison_record *poison);
 int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
 		       struct cxl_region *cxlr);
 int cxl_trigger_poison_list(struct cxl_memdev *cxlmd);