Message ID | 20090609164036.GA10828@amt.cnet (mailing list archive) |
---|---|
State | New, archived |
Marcelo Tosatti wrote:
> Ryan,
>
> On Fri, May 29, 2009 at 11:43:26AM -0500, Ryan Harper wrote:
>
>> Testing latest qemu-kvm.git and kvm-kmod.git, ept enabled and backing
>> guests with large pages trips a BUG in the mmu code. If I disable ept,
>> but still use large pages, migration succeeds. Reproduce with:
>>
>> hugetlbfs setup:
>> % mkdir -p /hugetlbfs && mount -t hugetlbfs hugetlbfs /hugetlbfs
>> % echo 10000 > /proc/sys/vm/nr_hugepages
>>
>> qemu commands:
>>
>> guest a:
>> sudo x86_64-softmmu/qemu-system-x86_64 -L pc-bios -m 2048 -mempath /hugetlbfs -net nic -net tap -vnc :12 -monitor stdio -hda /scratch/images/rharper/rhel4u8-32-ide.raw
>>
>> guest b:
>> sudo x86_64-softmmu/qemu-system-x86_64 -L pc-bios -m 2048 -mempath /hugetlbfs -net nic -net tap -vnc :13 -monitor stdio -hda /scratch/images/rharper/rhel4u8-32-ide.raw -incoming tcp:0:4444
>>
>> Once the guest a is up, issued migrate command:
>> (qemu) migrate -d tcp:localhost:444
>>
>> rmap_remove: ffff880a08e00098 c0336e65c0336e5b 0->BUG
>>
>                                ^^^^^^^^^^^^^^^^
>
> This value looks very strange (bits 5:3 contain invalid value, for one).
> Don't have access to HW at the very moment, so it would be great if you
> had time to do a change equivalent to this and reproduce:
>

That spte is totally bogus.

> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 809cce0..ceb70b0 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1759,7 +1764,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
>  			child = page_header(pte & PT64_BASE_ADDR_MASK);
>  			mmu_page_remove_parent_pte(child, shadow_pte);
>  		} else if (pfn != spte_to_pfn(*shadow_pte)) {
> -			pgprintk("hfn old %lx new %lx\n",
> +			printk(KERN_ERR "hfn old %lx new %lx\n",
>  				 spte_to_pfn(*shadow_pte), pfn);
>  			rmap_remove(vcpu->kvm, shadow_pte);
>  		} else
>
> Avi, any hints?
>

Not really. One thing, migration should transition the shadow pagetables from large pages to small ones, maybe that bit is broken.
Maybe we're looking at a largepage spte and interpreting it as a normal L2 spte, and interpreting a guest page as the L1 spt.
* Marcelo Tosatti <mtosatti@redhat.com> [2009-06-09 11:45]:
> Ryan,

Marcelo, thanks for taking a look. Applied patch and reproduced, included the new debug output.

>
> On Fri, May 29, 2009 at 11:43:26AM -0500, Ryan Harper wrote:
> > Testing latest qemu-kvm.git and kvm-kmod.git, ept enabled and backing
> > guests with large pages trips a BUG in the mmu code. If I disable ept,
> > but still use large pages, migration succeeds. Reproduce with:
> >
> > hugetlbfs setup:
> > % mkdir -p /hugetlbfs && mount -t hugetlbfs hugetlbfs /hugetlbfs
> > % echo 10000 > /proc/sys/vm/nr_hugepages
> >
> > qemu commands:
> >
> > guest a:
> > sudo x86_64-softmmu/qemu-system-x86_64 -L pc-bios -m 2048 -mempath /hugetlbfs -net nic -net tap -vnc :12 -monitor stdio -hda /scratch/images/rharper/rhel4u8-32-ide.raw
> >
> > guest b:
> > sudo x86_64-softmmu/qemu-system-x86_64 -L pc-bios -m 2048 -mempath /hugetlbfs -net nic -net tap -vnc :13 -monitor stdio -hda /scratch/images/rharper/rhel4u8-32-ide.raw -incoming tcp:0:4444
> >
> > Once the guest a is up, issued migrate command:
> > (qemu) migrate -d tcp:localhost:444
> >
> > rmap_remove: ffff880a08e00098 c0336e65c0336e5b 0->BUG
>                                 ^^^^^^^^^^^^^^^^
>
> This value looks very strange (bits 5:3 contain invalid value, for one).
> Don't have access to HW at the very moment, so it would be great if you
> had time to do a change equivalent to this and reproduce:
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 809cce0..ceb70b0 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1759,7 +1764,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
>  			child = page_header(pte & PT64_BASE_ADDR_MASK);
>  			mmu_page_remove_parent_pte(child, shadow_pte);
>  		} else if (pfn != spte_to_pfn(*shadow_pte)) {
> -			pgprintk("hfn old %lx new %lx\n",
> +			printk(KERN_ERR "hfn old %lx new %lx\n",
>  				 spte_to_pfn(*shadow_pte), pfn);
>  			rmap_remove(vcpu->kvm, shadow_pte);
>  		} else

hfn old 36e65c0336 new 472213
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
IP: [<ffffffffa012ca5e>] gfn_to_rmap+0x17/0x49 [kvm]
PGD 676517067 PUD 2de5cd067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
CPU 5
Modules linked in: kvm_intel(N) kvm(N) nls_iso8859_1 nls_cp437 vfat fat crc32c libcrc32c ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi iscsi_ibft tun nfs lockd nfs_acl sunrpc ipv6 bridge stp cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq microcode fuse loop dm_mod sr_mod cdrom cdc_ether thermal usb_storage usbnet processor sg rtc_cmos i2c_i801 shpchp rtc_core rtc_lib button mii i2c_core pcspkr pci_hotplug joydev bnx2 mptctl usbhid hid ff_memless uhci_hcd ehci_hcd usbcore sd_mod crc_t10dif edd fan thermal_sys hwmon ext3 mbcache jbd mptsas mptscsih mptbase scsi_transport_sas scsi_mod [last unloaded: kvm]
Supported: No
Pid: 31785, comm: qemu-system-x86 Tainted: G 2.6.27.19-5-default #1
RIP: 0010:[<ffffffffa012ca5e>] [<ffffffffa012ca5e>] gfn_to_rmap+0x17/0x49 [kvm]
RSP: 0018:ffff880677499ba8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 00000000000fffbd RSI: ffff8803808e08c0 RDI: 730000434950415f
RBP: 730000434950415f R08: 0000000000000023 R09: 0000000000000000
R10: 0000000000000046 R11: 0000000000000006 R12: ffff88067a971c60
R13: ffff8803808e0000 R14: 00000000c0336e5b R15: 0000000000000000
FS: 00007fee7dfbf950(0000) GS:ffff880c7cd938c0(0000) knlGS:0000000000000000
CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000674d09000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process qemu-system-x86 (pid: 31785, threadinfo ffff880677498000, task ffff8803425186c0)
Stack: ffff880677499bb8 00000036e65c0336 ffff880472200098 ffffffffa012cb3b
 0000000000472213 0000000000000000 ffff880c7a148080 ffff880472200098
 ffff880c7a148080 ffffffffa012e5b2 ffff880677499ce8 00000005425186c0
Call Trace:
 [<ffffffffa012cb3b>] rmap_remove+0xab/0x19e [kvm]
 [<ffffffffa012e5b2>] mmu_set_spte+0xb0/0x316 [kvm]
 [<ffffffffa012ef2b>] direct_map_entry+0x7b/0x104 [kvm]
 [<ffffffffa012c1b1>] walk_shadow+0x8d/0xb7 [kvm]
 [<ffffffffa012deb1>] tdp_page_fault+0xf8/0x137 [kvm]
 [<ffffffffa012f802>] kvm_mmu_page_fault+0x19/0x80 [kvm]
 [<ffffffffa01be9b5>] handle_ept_violation+0xe0/0x17a [kvm_intel]
 [<ffffffffa012a012>] kvm_arch_vcpu_ioctl_run+0x4dd/0x6e5 [kvm]
 [<ffffffffa0123457>] kvm_vcpu_ioctl+0xf1/0x46b [kvm]
 [<ffffffff802bd249>] vfs_ioctl+0x21/0x6c
 [<ffffffff802bd4b6>] do_vfs_ioctl+0x222/0x231
 [<ffffffff802bd516>] sys_ioctl+0x51/0x73
 [<ffffffff8020bfbb>] system_call_fastpath+0x16/0x1b
 [<00007fee7eed5b77>] 0x7fee7eed5b77


Code: 31 ed 0f 18 08 eb a1 41 58 5b 5d 41 5c 41 5d 41 5e 41 5f c3 55 48 89 f5 53 89 d3 48 83 ec 08 e8 85 7a ff ff 85 db 48 89 c1 75 11 <48> 2b 28 48 8d 14 ed 00 00 00 00 48 03 50 18 eb 19 48 8b 00 48
RIP [<ffffffffa012ca5e>] gfn_to_rmap+0x17/0x49 [kvm]
 RSP <ffff880677499ba8>
CR2: 0000000000000000
---[ end trace 6127eb9ebc2e7fb6 ]---
Avi Kivity wrote:
>
> Not really. One thing, migration should transition the shadow
> pagetables from large pages to small ones, maybe that bit is broken.
>
> Maybe we're looking at a largepage spte and interpreting it as a
> normal L2 spte, and interpreting a guest page as the L1 spt.

I tried to find where we drop the mmu (or at least large sptes for the slot) when we enable dirty logging, and failed. Maybe remove_write_access() is sufficient.
On Wed, Jun 10, 2009 at 11:08:14AM +0300, Avi Kivity wrote:
> Avi Kivity wrote:
>>
>> Not really. One thing, migration should transition the shadow
>> pagetables from large pages to small ones, maybe that bit is broken.
>>
>> Maybe we're looking at a largepage spte and interpreting it as a
>> normal L2 spte, and interpreting a guest page as the L1 spt.
>
> I tried to find where we drop the mmu (or at least large sptes for the
> slot) when we enable dirty logging, and failed. Maybe
> remove_write_access() is sufficient.

I believe you have to break down large pages into 4k pages for migration to work reliably.

Was tempted to copy&paste the hugetlbfs file ram alloc code into user/main.c to use with user/vm.c (which then can also be used to test TLB flushes on 2M->4k transition which are lacking).

Regarding the bogus spte, could not reproduce yesterday with kvm.git, but in the worst case the audit code will catch it.
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 809cce0..ceb70b0 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1759,7 +1764,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 			child = page_header(pte & PT64_BASE_ADDR_MASK);
 			mmu_page_remove_parent_pte(child, shadow_pte);
 		} else if (pfn != spte_to_pfn(*shadow_pte)) {
-			pgprintk("hfn old %lx new %lx\n",
+			printk(KERN_ERR "hfn old %lx new %lx\n",
 				 spte_to_pfn(*shadow_pte), pfn);
 			rmap_remove(vcpu->kvm, shadow_pte);
 		} else