Message ID | 20240329063614.362763-3-ruansy.fnst@fujitsu.com |
---|---|
State | New, archived |
Series | cxl: add poison event handler |
Shiyang Ruan wrote:
> If poison is detected (reported from a cxl memdev), the OS should be
> notified to handle it. So, introduce this helper function for later use:
>   1. translate DPA to HPA;
>   2. enqueue records into memory_failure's work queue;
>
> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>

This patch is too small; it needs the corresponding caller to make sense of
the proposed change.

> ---
>
> Currently poison injection from debugfs always creates a 64-byte record,
> which is fine. But injection through QEMU's QMP API,
> qmp_cxl_inject_poison(), could create a poison record with a large length,
> which may cause a great many calls to memory_failure_queue(). Since
> MEMORY_FAILURE_FIFO_SIZE is only 1 << 4, it does not seem to be enough.

What matters is what devices do in practice. The kernel should not be worried
about odd corner cases that only exist in QEMU injection scenarios.
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 9adda4795eb7..31b1b8711256 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1290,6 +1290,24 @@ int cxl_set_timestamp(struct cxl_memdev_state *mds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_set_timestamp, CXL);
 
+void cxl_mem_report_poison(struct cxl_memdev *cxlmd,
+			   struct cxl_region *cxlr,
+			   struct cxl_poison_record *poison)
+{
+	u64 dpa = le64_to_cpu(poison->address) & CXL_POISON_START_MASK;
+	u64 len = PAGE_ALIGN(le32_to_cpu(poison->length) * CXL_POISON_LEN_MULT);
+	u64 hpa = cxl_trace_hpa(cxlr, cxlmd, dpa);
+	unsigned long pfn = PHYS_PFN(hpa);
+	unsigned long pfn_end = pfn + len / PAGE_SIZE - 1;
+
+	if (!IS_ENABLED(CONFIG_MEMORY_FAILURE))
+		return;
+
+	for (; pfn <= pfn_end; pfn++)
+		memory_failure_queue(pfn, MF_ACTION_REQUIRED);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mem_report_poison, CXL);
+
 int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
 		       struct cxl_region *cxlr)
 {
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 20fb3b35e89e..82f80eb381fb 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -828,6 +828,9 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
 			    const uuid_t *uuid, union cxl_event *evt);
 int cxl_set_timestamp(struct cxl_memdev_state *mds);
 int cxl_poison_state_init(struct cxl_memdev_state *mds);
+void cxl_mem_report_poison(struct cxl_memdev *cxlmd,
+			   struct cxl_region *cxlr,
+			   struct cxl_poison_record *poison);
 int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
 		       struct cxl_region *cxlr);
 int cxl_trigger_poison_list(struct cxl_memdev *cxlmd);
If poison is detected (reported from a cxl memdev), the OS should be
notified to handle it. So, introduce this helper function for later use:
  1. translate DPA to HPA;
  2. enqueue records into memory_failure's work queue;

Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
---

Currently poison injection from debugfs always creates a 64-byte record,
which is fine. But injection through QEMU's QMP API,
qmp_cxl_inject_poison(), could create a poison record with a large length,
which may cause a great many calls to memory_failure_queue(). Since
MEMORY_FAILURE_FIFO_SIZE is only 1 << 4, it does not seem to be enough.

---
 drivers/cxl/core/mbox.c | 18 ++++++++++++++++++
 drivers/cxl/cxlmem.h    |  3 +++
 2 files changed, 21 insertions(+)
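
To make the FIFO concern concrete, here is a rough userspace sketch of the page-count arithmetic the proposed helper performs. It is not kernel code. All `sketch_*` / `SKETCH_*` names are hypothetical, the 4 KiB page size is an assumption, and the 64-byte length unit is assumed to match `CXL_POISON_LEN_MULT`. The FIFO size of `1 << 4` is taken from the commit message. The sketch only counts how many times the helper's loop would call `memory_failure_queue()` for a given record length. It does not model how quickly queued entries are drained.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration; check them against the target kernel. */
#define SKETCH_PAGE_SIZE       4096ULL   /* assumes 4 KiB pages */
#define SKETCH_POISON_LEN_MULT 64ULL     /* assumed unit of poison->length, in bytes */
#define SKETCH_FIFO_SIZE       (1U << 4) /* MEMORY_FAILURE_FIFO_SIZE per the commit message */

/* Round a byte count up to a whole number of pages, like PAGE_ALIGN(). */
uint64_t sketch_page_align(uint64_t bytes)
{
	return (bytes + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * Number of loop iterations (one memory_failure_queue() call each) for a
 * record whose length field holds record_length units of 64 bytes.
 * The loop runs from pfn to pfn + len / PAGE_SIZE - 1 inclusive, which is
 * len / PAGE_SIZE iterations.
 */
uint64_t sketch_queue_calls(uint32_t record_length)
{
	uint64_t len = sketch_page_align((uint64_t)record_length *
					 SKETCH_POISON_LEN_MULT);

	return len / SKETCH_PAGE_SIZE;
}

/* True if one record's calls would fit in an initially empty FIFO. */
bool sketch_fits_fifo(uint32_t record_length)
{
	return sketch_queue_calls(record_length) <= SKETCH_FIFO_SIZE;
}
```

Under these assumptions, a debugfs-style record (length field 1, i.e. 64 bytes) rounds up to a single page and needs one call. A record covering 64 KiB (length field 1024) needs 16 calls, which exactly fills a 16-entry FIFO, and anything larger would need more calls than the FIFO has slots unless entries are consumed in the meantime.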