Message ID | 20241107102126.2183152-4-william.roche@oracle.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v2,1/7] accel/kvm: Keep track of the HWPoisonPage page_size | expand |
On 07.11.24 11:21, “William Roche wrote: > From: William Roche <william.roche@oracle.com> > > When an entire large page is impacted by an error (hugetlbfs case), > report better the size and location of this large memory hole, so > give a warning message when this page is first hit: > Memory error: Loosing a large page (size: X) at QEMU addr Y and GUEST addr Z > Hm, I wonder if we really want to special-case hugetlb here. Why not make the warning independent of the underlying page size?
On 11/12/24 12:13, David Hildenbrand wrote: > On 07.11.24 11:21, “William Roche wrote: >> From: William Roche <william.roche@oracle.com> >> >> When an entire large page is impacted by an error (hugetlbfs case), >> report better the size and location of this large memory hole, so >> give a warning message when this page is first hit: >> Memory error: Loosing a large page (size: X) at QEMU addr Y and GUEST >> addr Z >> > > Hm, I wonder if we really want to special-case hugetlb here. > > Why not make the warning independent of the underlying page size? We already have a warning provided by Qemu (in kvm_arch_on_sigbus_vcpu()): Guest MCE Memory Error at QEMU addr Y and GUEST addr Z of type BUS_MCEERR_AR/_AO injected The one I suggest is an additional message provided before the above message. Here is an example: qemu-system-x86_64: warning: Memory error: Loosing a large page (size: 2097152) at QEMU addr 0x7fdd7d400000 and GUEST addr 0x11600000 qemu-system-x86_64: warning: Guest MCE Memory Error at QEMU addr 0x7fdd7d400000 and GUEST addr 0x11600000 of type BUS_MCEERR_AO injected According to me, this large page case additional message will help to better understand the probable sudden proliferation of memory errors that can be reported by Qemu on the impacted range. Not only will the machine administrator identify better that a single memory error had this large impact, it can also help us to better measure the impact of fixing the large page memory error support in the field (in the future). These are some reasons why I do think this large page specific message can be useful.
On 12.11.24 19:17, William Roche wrote: > On 11/12/24 12:13, David Hildenbrand wrote: >> On 07.11.24 11:21, “William Roche wrote: >>> From: William Roche <william.roche@oracle.com> >>> >>> When an entire large page is impacted by an error (hugetlbfs case), >>> report better the size and location of this large memory hole, so >>> give a warning message when this page is first hit: >>> Memory error: Loosing a large page (size: X) at QEMU addr Y and GUEST >>> addr Z >>> >> >> Hm, I wonder if we really want to special-case hugetlb here. >> >> Why not make the warning independent of the underlying page size? > > We already have a warning provided by Qemu (in kvm_arch_on_sigbus_vcpu()): > > Guest MCE Memory Error at QEMU addr Y and GUEST addr Z of type > BUS_MCEERR_AR/_AO injected > > The one I suggest is an additional message provided before the above > message. > > Here is an example: > qemu-system-x86_64: warning: Memory error: Loosing a large page (size: > 2097152) at QEMU addr 0x7fdd7d400000 and GUEST addr 0x11600000 > qemu-system-x86_64: warning: Guest MCE Memory Error at QEMU addr > 0x7fdd7d400000 and GUEST addr 0x11600000 of type BUS_MCEERR_AO injected > Hm, I think we should definitely be including the size in the existing one. That code was written without huge pages in mind. We should similarly warn in the arm implementation (where I don't see a similar message yet). > > According to me, this large page case additional message will help to > better understand the probable sudden proliferation of memory errors > that can be reported by Qemu on the impacted range. > Not only will the machine administrator identify better that a single > memory error had this large impact, it can also help us to better > measure the impact of fixing the large page memory error support in the > field (in the future). What about extending the existing one to something like warning: Guest MCE Memory Error at QEMU addr $ADDR and GUEST $PADDR of type BUS_MCEERR_AO and size $SIZE (large page) injected With the "large page" hint you can highlight that this is special. On a related note ...I think we have a problem. Assume we got a SIGBUS on a huge page (e.g., somewhere in a 1 GiB page). We will call kvm_mce_inject(cpu, paddr, code) / acpi_ghes_record_errors(ACPI_HEST_SRC_ID_SEA, paddr) But where is the size information? :// Won't the VM simply assume that there was a MCE on a single 4k page starting at paddr? I'm not sure if we can inject ranges, or if we would have to issue one MCE per page ... hm, what's your take on this?
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 6dd06f5edf..a572437115 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -1284,7 +1284,7 @@ static void kvm_unpoison_all(void *param) } } -void kvm_hwpoison_page_add(ram_addr_t ram_addr) +void kvm_hwpoison_page_add(ram_addr_t ram_addr, void *ha, hwaddr gpa) { HWPoisonPage *page; size_t sz = qemu_ram_pagesize_from_addr(ram_addr); @@ -1301,6 +1301,13 @@ void kvm_hwpoison_page_add(ram_addr_t ram_addr) page->ram_addr = ram_addr; page->page_size = sz; QLIST_INSERT_HEAD(&hwpoison_page_list, page, list); + + if (sz > TARGET_PAGE_SIZE) { + gpa = ROUND_DOWN(gpa, sz); + ha = (void *)ROUND_DOWN((uint64_t)ha, sz); + warn_report("Memory error: Loosing a large page (size: %zu) " + "at QEMU addr %p and GUEST addr 0x%" HWADDR_PRIx, sz, ha, gpa); + } } bool kvm_hwpoisoned_mem(void) diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h index a1e72763da..ee34f1d225 100644 --- a/include/sysemu/kvm_int.h +++ b/include/sysemu/kvm_int.h @@ -178,10 +178,12 @@ void kvm_set_max_memslot_size(hwaddr max_slot_size); * * Parameters: * @ram_addr: the address in the RAM for the poisoned page + * @hva: host virtual address aka QEMU addr + * @gpa: guest physical address aka GUEST addr * * Add a poisoned page to the list * * Return: None. */ -void kvm_hwpoison_page_add(ram_addr_t ram_addr); +void kvm_hwpoison_page_add(ram_addr_t ram_addr, void *hva, hwaddr gpa); #endif diff --git a/target/arm/kvm.c b/target/arm/kvm.c index f1f1b5b375..aae66dba41 100644 --- a/target/arm/kvm.c +++ b/target/arm/kvm.c @@ -2359,7 +2359,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr) ram_addr = qemu_ram_addr_from_host(addr); if (ram_addr != RAM_ADDR_INVALID && kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) { - kvm_hwpoison_page_add(ram_addr); + kvm_hwpoison_page_add(ram_addr, addr, paddr); /* * If this is a BUS_MCEERR_AR, we know we have been called * synchronously from the vCPU thread, so we can easily diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index fd9f198892..fd7cd7008e 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -753,7 +753,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr) ram_addr = qemu_ram_addr_from_host(addr); if (ram_addr != RAM_ADDR_INVALID && kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) { - kvm_hwpoison_page_add(ram_addr); + kvm_hwpoison_page_add(ram_addr, addr, paddr); kvm_mce_inject(cpu, paddr, code); /*