Message ID | 50111369.6020209@googlemail.com (mailing list archive) |
---|---|
State | New, archived |
On 07/26/2012 12:52 PM, Chris Clayton wrote:
> On 07/19/12 19:23, Chris Clayton wrote:
>> On 07/19/12 13:17, Avi Kivity wrote:
>>> On 07/19/2012 03:14 PM, Chris Clayton wrote:
>>>
>>>>> Change of diagnostics, unfortunately. qemu-kvm-1.0.1 can, in fact,
>>>>> crash on 3.5.0-rc6 (and rc7). I didn't get it earlier because it
>>>>> takes many times more invocations before the crash occurs with
>>>>> 1.0.1 and I haven't used qemu-kvm much in the past few weeks.
>>>>>
>>>>> I'm now checking whether I can get crashes (with 1.0.1 and/or
>>>>> 1.1.0) on linux-3.4.4. I'll report back in a day or two.
>>>>
>>>> I've started up qemu-kvm on kernel 3.4.4 many times and not seen a
>>>> crash. That would indicate that the problem is in the kernel.
>>>> However, I pulled the latest and greatest from Linus yesterday
>>>> evening and I now can't get the crash there either, so whatever it
>>>> was seems to have been fixed. If I checkout and build
>>>> 3.5.0-rc[1..7], I can get the crash pretty quickly, so it's been
>>>> fixed in the last few days.
>>>
>>> There were no kvm changes post-rc7.
>>>
>> Yes, I'm aware of that, Avi. This thread started because I was getting a
>> crash in qemu-kvm, which I thought was only in v1.1.0. Later it turned
>> out that the problem was also present in v1.0.1, but much harder to hit.
>> However, it only ever happened with 3.5.0 kernels. 3.4.4, with either
>> version of qemu-kvm, was stable. So then it seemed that the problem was
>> in the kernel, (but not necessarily in the kvm code).
>>
>> Something that's changed since rc7 has either fixed the problem or made
>> it much harder to hit. With rc7 and earlier I can recreate the crash
>> quite easily with qemu-kvm-1.1.0 and with enough runs of 1.0.1. With
>> rc7+, I haven't been able to get a crash at all.
>>
> Well, I'm getting the crash again, but this time I've managed to get a
> backtrace:
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xb60ffb40 (LWP 9405)]
> 0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
> (gdb) bt
> #0 0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
> #1 0xb7e65333 in g_str_equal () from /usr/lib/libglib-2.0.so.0
> #2 0xb7e6458d in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
> #3 0x8014e2cf in type_table_lookup (name=0x802b0c50 "apic-common") at qom/object.c:94
> #4 type_get_by_name (name=name@entry=0x802b0c50 "apic-common") at qom/object.c:149
> #5 0x8014e933 in object_dynamic_cast (obj=obj@entry=0x80a5d818, typename=typename@entry=0x802b0c50 "apic-common") at qom/object.c:416
> #6 0x8014e8b9 in object_dynamic_cast_assert (obj=obj@entry=0x80a5d818, typename=typename@entry=0x802b0c50 "apic-common") at qom/object.c:478
> #7 0x80193462 in cpu_set_apic_tpr (d=0x80a5d818, val=8 '\b') at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
> #8 0x801d0560 in kvm_arch_post_run (env=env@entry=0x80a55a60, run=run@entry=0xb6239000) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
> #9 0x801cb05f in kvm_cpu_exec (env=env@entry=0x80a55a60) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
> #10 0x80199d1e in qemu_kvm_cpu_thread_fn (arg=0x80a55a60) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
> #11 0xb7a1fd9e in start_thread () from /lib/libpthread.so.0
> #12 0xb77bbbbe in clone () from /lib/libc.so.6
>
> This is with kernel 3.5.0 and qemu-kvm-1.1.1. glibc is 2.16.0 built

It looks like general memory corruption. Is this repeatable? What's
the guest uptime when it happens (i.e. is it immediate?)

Jan, why are we calling cpu_set_apic_tpr() with kvm_irqchip_in_kernel?

On 2012-07-26 12:01, Avi Kivity wrote:
> It looks like general memory corruption. Is this repeatable? What's
> the guest uptime when it happens (i.e. is it immediate?)
>
> Jan, why are we calling cpu_set_apic_tpr() with kvm_irqchip_in_kernel?

To sync the userspace state with what the kernel maintains. The call
ends up in kvm_apic_set_tpr, which does precisely this. We always did
that; only the QOM modeling is new.

Jan

On 07/26/2012 01:29 PM, Jan Kiszka wrote:
>> It looks like general memory corruption. Is this repeatable? What's
>> the guest uptime when it happens (i.e. is it immediate?)
>>
>> Jan, why are we calling cpu_set_apic_tpr() with kvm_irqchip_in_kernel?
>
> To sync the userspace state with what the kernel maintains. Will end up
> in kvm_apic_set_tpr which does precisely this. We always did, just the
> QOM modeling is new.

We should move it to the general register synchronization code; there is
no reason to do this on every exit (though the cost is likely minimal).

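A toy sketch of the trade-off described here, under stated assumptions: it is not QEMU code, and `ToyVcpu`, `kernel_tpr`, `user_tpr`, `run_eager`, `run_lazy` and `read_state` are all hypothetical names. It only contrasts copying the kernel-maintained TPR into the userspace copy on every exit with copying it once, when userspace actually asks for the state.

```python
class ToyVcpu:
    """Toy model, not QEMU code. 'kernel_tpr' stands for the value kept by
    the in-kernel irqchip, 'user_tpr' for the userspace APIC copy."""

    def __init__(self):
        self.kernel_tpr = 0
        self.user_tpr = 0
        self.sync_count = 0  # how many times the value was copied

    def _sync_tpr(self):
        self.user_tpr = self.kernel_tpr
        self.sync_count += 1

    def run_eager(self, tpr_values):
        """Copy the TPR after every exit (the per-exit behaviour discussed)."""
        for value in tpr_values:
            self.kernel_tpr = value  # state changed while the guest ran
            self._sync_tpr()         # one copy per exit

    def run_lazy(self, tpr_values):
        """Let the kernel-side value change without copying it."""
        for value in tpr_values:
            self.kernel_tpr = value

    def read_state(self):
        """Userspace needs the registers: sync once, on demand."""
        self._sync_tpr()
        return self.user_tpr


exits = [2, 8, 8, 0, 8]  # made-up TPR values seen across five exits

eager = ToyVcpu()
eager.run_eager(exits)

lazy = ToyVcpu()
lazy.run_lazy(exits)
lazy_value = lazy.read_state()

print(eager.user_tpr, eager.sync_count)
print(lazy_value, lazy.sync_count)
```

Both variants end with the same userspace value; the lazy one performs a single copy instead of one per exit. Whether that saving matters in practice is exactly the "cost is likely minimal" caveat above.
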
On 2012-07-26 12:45, Avi Kivity wrote:
> On 07/26/2012 01:29 PM, Jan Kiszka wrote:
>
>>> It looks like general memory corruption. Is this repeatable? What's
>>> the guest uptime when it happens (i.e. is it immediate?)
>>>
>>> Jan, why are we calling cpu_set_apic_tpr() with kvm_irqchip_in_kernel?
>>
>> To sync the userspace state with what the kernel maintains. Will end up
>> in kvm_apic_set_tpr which does precisely this. We always did, just the
>> QOM modeling is new.
>
> We should move it to the general register synchronization code, there is
> no reason to do this every exit (though the cost is likely minimal).

The cost is, well, was close to nothing. But I'm not sure about that QOM
type casting magic (and also its locking requirements, long-term).
However, if that is a problem, it's likely a much bigger one anyway.

Jan

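To make the "type casting magic" concrete, here is a minimal sketch of what a name-keyed dynamic cast involves, loosely following the call chain in the backtrace (object_dynamic_cast, type_get_by_name, type_table_lookup, hash table lookup, string compare). It is a toy model and not QEMU's implementation: `ToyType`, `TYPE_TABLE`, `LOOKUPS`, `type_register` and `dynamic_cast` are hypothetical, and the type hierarchy is assumed, since only the name "apic-common" appears in the thread.

```python
# Toy model of a name-keyed type registry. Not QEMU code: the classes,
# functions and the hierarchy registered below are illustrative assumptions.
TYPE_TABLE = {}          # type name -> ToyType
LOOKUPS = {"count": 0}   # number of table lookups performed so far


class ToyType:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # parent type name, or None for a root type


def type_register(name, parent=None):
    TYPE_TABLE[name] = ToyType(name, parent)


def type_get_by_name(name):
    """Look a type up by name; the dict hashes the string and compares keys."""
    LOOKUPS["count"] += 1
    return TYPE_TABLE.get(name)


def dynamic_cast(obj_type_name, target_name):
    """Return True if obj_type_name is target_name or descends from it."""
    target = type_get_by_name(target_name)
    current = type_get_by_name(obj_type_name)
    while target is not None and current is not None:
        if current is target:
            return True
        if current.parent is None:
            return False
        current = type_get_by_name(current.parent)
    return False


# Assumed hierarchy, for illustration only.
type_register("device")
type_register("apic-common", parent="device")
type_register("kvm-apic", parent="apic-common")

print(dynamic_cast("kvm-apic", "apic-common"))  # object type derives from target
print(LOOKUPS["count"])                         # lookups needed for that one cast
print(dynamic_cast("device", "apic-common"))    # parent type is not a descendant
```

In this model every checked cast costs several string-keyed lookups, which is why doing one on each exit is more work than the plain pointer use it replaced, even if the absolute cost stays small.
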
On 2012-07-26 12:49, Jan Kiszka wrote:
> On 2012-07-26 12:45, Avi Kivity wrote:
>> We should move it to the general register synchronization code, there is
>> no reason to do this every exit (though the cost is likely minimal).
>
> The cost is, well, was close to nothing. But I'm not sure about that QOM
> type casting magic (and also it's locking requirements, long-term).
> However, if that is a problem, it's likely a much bigger one anyway.

But, independent of this, we can likely move the whole kvm_arch_post_run
out of the exit path for kvm_irqchip_in_kernel() == true. The price is
that we create more deviation between the two paths, but that should be
controllable. I will play with a patch.

Jan

Hi Chris,

Could you please try this patch?
http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commit;h=ccebf448daf7964ee2aff7947c0bbe4c7962d059

On 07/26/2012 05:52 PM, Chris Clayton wrote:
> This is with kernel 3.5.0 and qemu-kvm-1.1.1. glibc is 2.16.0 built against 3.4.4 kernel headers. The glibc, the kernel headers and the kernel are vanilla and the only change to the qemu-kvm sources is:
>
> --- qemu-kvm-1.1.0/configure~ 2012-07-15 22:38:39.000000000 +0100
> +++ qemu-kvm-1.1.0/configure 2012-07-15 22:39:09.000000000 +0100
> @@ -2783,7 +2783,7 @@ int main(int argc, char **argv)
>  }
>  EOF
>  if ! compile_prog "" "" ; then
> - CFLAGS+="-march=i486"
> + CFLAGS+="-march=i686"
>  fi
>  fi
>
> Please let me know of anything I can do to help track this down.
>
> Thanks
>
> Chris
>
>> I'm not inclined to bisect to find out which patch provided the fix, but
>> this mail should at least close the mail thread down tidily.
>>
>> Chris

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On 07/26/12 11:01, Avi Kivity wrote:
> It looks like general memory corruption. Is this repeatable? What's
> the guest uptime when it happens (i.e. is it immediate?)

I've just done 10 runs of WinXP SP3 and 5 of them crashed. Three crashed
early as XP was starting up - well before the desktop would have
appeared. The other two crashed as XP was closing down, having been
running for a few minutes (but not doing much).

The error messages seen through dmesg are:

qemu-kvm[12778] general protection ip:b6c43d77 sp:b5e800fc error:0 in libc-2.16.so[b6b06000+1b4000]
qemu-kvm[12813] general protection ip:b6bf6d77 sp:b54ff0fc error:0 in libc-2.16.so[b6ab9000+1b4000]
qemu-kvm[12986] general protection ip:b6cd3d77 sp:b55ff0fc error:0 in libc-2.16.so[b6b96000+1b4000]
qemu-kvm[13045] general protection ip:b6c91d77 sp:b54ff0fc error:0 in libc-2.16.so[b6b54000+1b4000]
qemu-kvm[13225] general protection ip:b6c5bd77 sp:b54ff0fc error:0 in libc-2.16.so[b6b1e000+1b4000]

The other 5 were OK, although I only did a bit of web browsing for a few
minutes with IE.

> Jan, why are we calling cpu_set_apic_tpr() with kvm_irqchip_in_kernel?

On 07/26/2012 02:58 PM, Chris Clayton wrote:
>> It looks like general memory corruption. Is this repeatable? What's
>> the guest uptime when it happens (i.e. is it immediate?)
>
> I've just done 10 runs of WinXP SP3 and 5 of them crashed. Three crashed
> early as XP was starting up - well before the desktop would have
> appeared. The other two crashed as XP was closing down, having been
> running for a few minutes (but not doing much).
>
> The error messages seen through dmesg are:
>
> qemu-kvm[12778] general protection ip:b6c43d77 sp:b5e800fc error:0 in libc-2.16.so[b6b06000+1b4000]
> qemu-kvm[12813] general protection ip:b6bf6d77 sp:b54ff0fc error:0 in libc-2.16.so[b6ab9000+1b4000]
> qemu-kvm[12986] general protection ip:b6cd3d77 sp:b55ff0fc error:0 in libc-2.16.so[b6b96000+1b4000]
> qemu-kvm[13045] general protection ip:b6c91d77 sp:b54ff0fc error:0 in libc-2.16.so[b6b54000+1b4000]
> qemu-kvm[13225] general protection ip:b6c5bd77 sp:b54ff0fc error:0 in libc-2.16.so[b6b1e000+1b4000]
>
> The other 5 were OK, although I only did a bit of web browsing for few
> minutes with IE.

Failures always in the same place (I'm guessing the variations are due to
PIE -- please configure with --disable-pie for future tests).

Please generate a core and look around, esp. in frame 3
(type_table_lookup). Also try to dissect type_table (you may need to
install the glib debug symbols for this).

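The "always in the same place" observation can be checked by subtracting the libc mapping base from the faulting ip in each dmesg line. A minimal sketch, assuming the five lines quoted above are joined onto single lines; `fault_offsets` and `LINE_RE` are names made up for this example.

```python
import re

# The five dmesg lines from the report, each joined onto one line.
DMESG = """\
qemu-kvm[12778] general protection ip:b6c43d77 sp:b5e800fc error:0 in libc-2.16.so[b6b06000+1b4000]
qemu-kvm[12813] general protection ip:b6bf6d77 sp:b54ff0fc error:0 in libc-2.16.so[b6ab9000+1b4000]
qemu-kvm[12986] general protection ip:b6cd3d77 sp:b55ff0fc error:0 in libc-2.16.so[b6b96000+1b4000]
qemu-kvm[13045] general protection ip:b6c91d77 sp:b54ff0fc error:0 in libc-2.16.so[b6b54000+1b4000]
qemu-kvm[13225] general protection ip:b6c5bd77 sp:b54ff0fc error:0 in libc-2.16.so[b6b1e000+1b4000]
"""

# ip:<hex> ... in <library>[<mapping base>+<mapping size>]
LINE_RE = re.compile(
    r"ip:([0-9a-f]+) sp:[0-9a-f]+ error:\d+ in "
    r"([^\[\s]+)\[([0-9a-f]+)\+([0-9a-f]+)\]"
)


def fault_offsets(text):
    """Return (library, offset) per fault line, where offset = ip - base."""
    results = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a fault line in the expected format
        ip = int(match.group(1), 16)
        base = int(match.group(3), 16)
        size = int(match.group(4), 16)
        if base <= ip < base + size:  # ip really lies inside the mapping
            results.append((match.group(2), ip - base))
    return results


offsets = fault_offsets(DMESG)
for lib, off in offsets:
    print(f"{lib} +{off:#x}")
```

All five lines reduce to the same offset, 0x13dd77, inside libc-2.16.so, so only the load address of the library differs between runs, not the faulting instruction.
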
On 2012-07-26 13:58, Chris Clayton wrote:
> I've just done 10 runs of WinXP SP3 and 5 of them crashed. Three crashed

Hmm, I'm running various XP SP3 guests here against qemu.git (now largely
equivalent to qemu-kvm), and I saw no crashes at all.

> early as XP was starting up - well before the desktop would have
> appeared. The other two crashed as XP was closing down, having been
> running for a few minutes (but not doing much).
>
> The error messages seen through dmesg are:
>
> qemu-kvm[12778] general protection ip:b6c43d77 sp:b5e800fc error:0 in libc-2.16.so[b6b06000+1b4000]
> qemu-kvm[12813] general protection ip:b6bf6d77 sp:b54ff0fc error:0 in libc-2.16.so[b6ab9000+1b4000]
> qemu-kvm[12986] general protection ip:b6cd3d77 sp:b55ff0fc error:0 in libc-2.16.so[b6b96000+1b4000]
> qemu-kvm[13045] general protection ip:b6c91d77 sp:b54ff0fc error:0 in libc-2.16.so[b6b54000+1b4000]
> qemu-kvm[13225] general protection ip:b6c5bd77 sp:b54ff0fc error:0 in libc-2.16.so[b6b1e000+1b4000]

Oh, you are running 32-bit userland? Also 32-bit kernel? Most of us do
64-on-64.

Jan

On 07/26/12 12:10, Xiao Guangrong wrote:
> Hi Chris,
>
> Could you please try this patch?
> http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commit;h=ccebf448daf7964ee2aff7947c0bbe4c7962d059
>

Sorry, that patch does not fix the crashes.

On 07/26/12 13:07, Avi Kivity wrote: > On 07/26/2012 02:58 PM, Chris Clayton wrote: > >>> It looks like general memory corruption. Is this repeatable? What's >>> the guest uptime when it happens (i.e. is it immediate?) >> >> I've just done 10 runs of WinXP SP3 and 5 of them crashed. Three crashed >> early as XP was starting up - well before the desktop would have >> appeared. The other two crashed as XP was closing down, having been >> running for a few minutes (but not doing much). >> >> The error messages seen through dmesg are: >> >> qemu-kvm[12778] general protection ip:b6c43d77 sp:b5e800fc error:0 in >> libc-2.16.so[b6b06000+1b4000] >> qemu-kvm[12813] general protection ip:b6bf6d77 sp:b54ff0fc error:0 in >> libc-2.16.so[b6ab9000+1b4000] >> qemu-kvm[12986] general protection ip:b6cd3d77 sp:b55ff0fc error:0 in >> libc-2.16.so[b6b96000+1b4000] >> qemu-kvm[13045] general protection ip:b6c91d77 sp:b54ff0fc error:0 in >> libc-2.16.so[b6b54000+1b4000] >> qemu-kvm[13225] general protection ip:b6c5bd77 sp:b54ff0fc error:0 in >> libc-2.16.so[b6b1e000+1b4000] >> >> The other 5 were OK, although I only did a bit of web browsing for few >> minutes with IE. > > Failures always in the same place (I'm guess the variations are due to > PIE -- please configure with --disable-pie for future tests). > > Please generate a core and look around, esp. in frame 3 > (type_table_lookup). Also try to dissect type_table (you may need to > install the glib debug symbols for this). > > > Mmm, I'm sailing out of my comfort zone here, but I've built a debug version of glib and trapped another crash. 
The backtrace is:

(gdb) bt
#0  0xb7822d77 in __strcmp_sse4_2 () from /lib/libc.so.6
#1  0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at ghash.c:1704
#2  0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800, key=0x8319b82, hash_return=0xb60ff178) at ghash.c:422
#3  0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800, key=key@entry=0x8319b82) at ghash.c:1074
#4  0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at qom/object.c:94
#5  type_get_by_name (name=name@entry=0x8319b82 "apic-common") at qom/object.c:149
#6  0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a44818, typename=typename@entry=0x8319b82 "apic-common") at qom/object.c:416
#7  0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a44818, typename=typename@entry=0x8319b82 "apic-common") at qom/object.c:478
#8  0x08192c1b in cpu_set_apic_tpr (d=0x8a44818, val=8 '\b') at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
#9  0x081cb86c in kvm_arch_post_run (env=env@entry=0x8a3ca60, run=run@entry=0xb6258000) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
#10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a3ca60) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a3ca60) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0
#13 0xb77dabbe in clone () from /lib/libc.so.6

Inspecting the args passed into g_str_equal shows:

(gdb) print (gchar *) 0x8a0cd58
$12 = (gchar *) 0x8a0cd58 "apic-common"
(gdb) print (gchar *) 0x8319b82
$13 = (gchar *) 0x8319b82 "apic-common"

So it seems odd that glibc's implementation of strcmp should crash with two equal strings. As I say, however, I'm a bit out of my comfort zone here, so I may be missing something.

I wouldn't know how to go about dissecting type_table, which I assume is the hash_table arg passed into g_hash_table_lookup, so advice on how to do that and what I am looking for (NULL pointer?) would be helpful.
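Editorial aside: the remark that the failures are "always in the same place" can be checked from the five dmesg lines quoted in this message by subtracting the libc load base from the faulting ip. The sketch below does only that arithmetic; the struct and function names are made up for illustration and the values are copied from the dmesg output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One "general protection" line from dmesg: faulting ip and libc load base. */
struct gp_fault {
    uint32_t ip;
    uint32_t libc_base;
};

/* Values copied from the five dmesg lines quoted in the thread. */
const struct gp_fault faults[] = {
    { 0xb6c43d77u, 0xb6b06000u },
    { 0xb6bf6d77u, 0xb6ab9000u },
    { 0xb6cd3d77u, 0xb6b96000u },
    { 0xb6c91d77u, 0xb6b54000u },
    { 0xb6c5bd77u, 0xb6b1e000u },
};

/* Offset of the faulting instruction inside the libc mapping. */
uint32_t fault_offset(const struct gp_fault *f)
{
    assert(f->ip >= f->libc_base);
    return f->ip - f->libc_base;
}

/* Returns 1 if every fault lands at the same offset inside libc, else 0. */
int all_same_offset(void)
{
    size_t n = sizeof(faults) / sizeof(faults[0]);
    uint32_t first = fault_offset(&faults[0]);

    for (size_t i = 1; i < n; i++) {
        if (fault_offset(&faults[i]) != first)
            return 0;
    }
    return 1;
}
```

If all five offsets agree, the differing ip values are explained by the library being mapped at a different base on each run rather than by different crash sites.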
On 07/27/12 00:22, Chris Clayton wrote: > On 07/26/12 13:07, Avi Kivity wrote: >> On 07/26/2012 02:58 PM, Chris Clayton wrote: >> >>>> It looks like general memory corruption. Is this repeatable? What's >>>> the guest uptime when it happens (i.e. is it immediate?) >>> >>> I've just done 10 runs of WinXP SP3 and 5 of them crashed. Three crashed >>> early as XP was starting up - well before the desktop would have >>> appeared. The other two crashed as XP was closing down, having been >>> running for a few minutes (but not doing much). >>> >>> The error messages seen through dmesg are: >>> >>> qemu-kvm[12778] general protection ip:b6c43d77 sp:b5e800fc error:0 in >>> libc-2.16.so[b6b06000+1b4000] >>> qemu-kvm[12813] general protection ip:b6bf6d77 sp:b54ff0fc error:0 in >>> libc-2.16.so[b6ab9000+1b4000] >>> qemu-kvm[12986] general protection ip:b6cd3d77 sp:b55ff0fc error:0 in >>> libc-2.16.so[b6b96000+1b4000] >>> qemu-kvm[13045] general protection ip:b6c91d77 sp:b54ff0fc error:0 in >>> libc-2.16.so[b6b54000+1b4000] >>> qemu-kvm[13225] general protection ip:b6c5bd77 sp:b54ff0fc error:0 in >>> libc-2.16.so[b6b1e000+1b4000] >>> >>> The other 5 were OK, although I only did a bit of web browsing for few >>> minutes with IE. >> >> Failures always in the same place (I'm guess the variations are due to >> PIE -- please configure with --disable-pie for future tests). >> >> Please generate a core and look around, esp. in frame 3 >> (type_table_lookup). Also try to dissect type_table (you may need to >> install the glib debug symbols for this). >> >> >> <snip> Here's another backtrace and source listing of the failing function, following build and installation of libc (2.16) with debugging turned on. I'm afraid it's beyond my current knowledge to know what this might be telling us. Program received signal SIGSEGV, Segmentation fault. 
[Switching to Thread 0xb60ffb40 (LWP 6515)]
__strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
217             movdqu  (%edx), %xmm2
(gdb) generate-core-file
Saved corefile core.6509
(gdb) bt
#0  __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
#1  0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at ghash.c:1704
#2  0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800, key=0x8319b82, hash_return=0xb60ff178) at ghash.c:422
#3  0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800, key=key@entry=0x8319b82) at ghash.c:1074
#4  0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at qom/object.c:94
#5  type_get_by_name (name=name@entry=0x8319b82 "apic-common") at qom/object.c:149
#6  0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a44818, typename=typename@entry=0x8319b82 "apic-common") at qom/object.c:416
#7  0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a44818, typename=typename@entry=0x8319b82 "apic-common") at qom/object.c:478
#8  0x08192c1b in cpu_set_apic_tpr (d=0x8a44818, val=7 '\a') at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
#9  0x081cb86c in kvm_arch_post_run (env=env@entry=0x8a3ca60, run=run@entry=0xb6271000) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
#10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a3ca60) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a3ca60) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0
#13 0xb77e45ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132
(gdb) print *(0x8a0cd58)
$1 = 1667854433
(gdb) print (char*) 0x8a0cd58
$2 = 0x8a0cd58 "apic-common"
(gdb) list __strcmp_sse4_2
201             PUSH    (REM)
202     #endif
203     #if defined USE_AS_STRCASECMP_L || defined USE_AS_STRNCASECMP_L
204             PUSH    (%edi)
205     #endif
206             mov     STR1(%esp), %edx
207             mov     STR2(%esp), %eax
208     #if defined USE_AS_STRNCMP || defined USE_AS_STRNCASECMP_L
209             movl    CNT(%esp), REM
210             test    REM, REM
(gdb) list
211             je      L(eq)
212     #endif
213             mov     %dx, %cx
214             and     $0xfff, %cx
215             cmp     $0xff0, %cx
216             ja      L(first4bytes)
217             movdqu  (%edx), %xmm2
218             mov     %eax, %ecx
219             and     $0xfff, %ecx
220             cmp     $0xff0, %ecx
(gdb) list
221             ja      L(first4bytes)
222     #if defined USE_AS_STRCASECMP_L || defined USE_AS_STRNCASECMP_L
223     # define TOLOWER(reg1, reg2) \
224             movdqa  reg1, %xmm3; \
225             movdqa  UCHIGH_reg, %xmm4; \
226             movdqa  reg2, %xmm5; \
227             movdqa  UCHIGH_reg, %xmm6; \
228             pcmpgtb UCLOW_reg, %xmm3; \
229             pcmpgtb reg1, %xmm4; \
230             pcmpgtb UCLOW_reg, %xmm5; \
(gdb)

I'll stop sending backtraces etc. for now, in the hope that someone will advise me on how I might better direct my efforts.

Thanks for your help so far.
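Editorial aside: the value gdb prints for `print *(0x8a0cd58)` is just the first four bytes of the string read as a little-endian 32-bit integer, so the memory at that address is readable and holds the expected data. A minimal sketch of that decoding follows; the helper names are hypothetical and only the number and the string come from the gdb session.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Split a 32-bit word into its bytes, least significant first
 * (the order they occupy in memory on x86), and NUL-terminate. */
void word_to_bytes_le(uint32_t word, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((word >> (8 * i)) & 0xff);
    out[4] = '\0';
}

/* Returns 1 if `word`, laid out little-endian, equals the first four
 * bytes of `s`, else 0. */
int word_matches_prefix(uint32_t word, const char *s)
{
    char buf[5];

    assert(s != NULL && strlen(s) >= 4);
    word_to_bytes_le(word, buf);
    return memcmp(buf, s, 4) == 0;
}
```

Calling `word_matches_prefix(1667854433u, "apic-common")` compares the printed integer against the string gdb shows at the same address.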
On 07/27/12 19:08, Eric Northup wrote:
> Could you include the output of "info registers" at the point where it
> crashed?
>

Here you go:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb6a78b40 (LWP 13249)]
__strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
217             movdqu  (%edx), %xmm2
(gdb) bt
#0  __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
#1  0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at ghash.c:1704
#2  0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800, key=0x8319b82, hash_return=0xb6a78178) at ghash.c:422
#3  0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800, key=key@entry=0x8319b82) at ghash.c:1074
#4  0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at qom/object.c:94
#5  type_get_by_name (name=name@entry=0x8319b82 "apic-common") at qom/object.c:149
#6  0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a313e0, typename=typename@entry=0x8319b82 "apic-common") at qom/object.c:416
#7  0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a313e0, typename=typename@entry=0x8319b82 "apic-common") at qom/object.c:478
#8  0x08192c1b in cpu_set_apic_tpr (d=0x8a313e0, val=13 '\r') at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
#9  0x081cb86c in kvm_arch_post_run (env=env@entry=0x8a29370, run=run@entry=0xb6274000) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
#10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a29370) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a29370) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0
#13 0xb77e45ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132
(gdb) info registers
eax            0x8319b82        137468802
ecx            0xd58            3416
edx            0x8a0cd58        144756056
ebx            0xb7f7f2c4       -1208487228
esp            0xb6a780ec       0xb6a780ec
ebp            0xb6a78118       0xb6a78118
esi            0x8a313e0        144905184
edi            0xc513           50451
eip            0xb7824f77       0xb7824f77 <__strcmp_sse4_2+23>
eflags         0x10283          [ CF SF IF RF ]
cs             0x73             115
ss             0x7b             123
ds             0x0              0
es             0x0              0
fs             0x0              0
gs             0x33             51
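Editorial aside: lines 213 to 216 of the strcmp listing look like a guard against a 16-byte load running past the end of a 4 KiB page, and the register dump is consistent with that reading (ecx holds the low 12 bits of edx). The sketch below mirrors that check in C; the function and macro names are hypothetical, and the interpretation of the assembly is an assumption rather than something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET_MASK 0xfffu  /* low 12 bits: offset within a 4 KiB page */
#define LAST_SAFE_OFFSET 0xff0u  /* 0x1000 - 16: last offset where 16 bytes still fit */

/* Equivalent of "mov %dx, %cx; and $0xfff, %cx". */
uint32_t page_offset(uint32_t addr)
{
    return addr & PAGE_OFFSET_MASK;
}

/* Equivalent of "cmp $0xff0, %cx; ja L(first4bytes)": nonzero when a
 * 16-byte load starting at addr would extend beyond its page. */
int load16_crosses_page(uint32_t addr)
{
    return page_offset(addr) > LAST_SAFE_OFFSET;
}
```

For the faulting pointer, `page_offset(0x8a0cd58u)` gives the same value as ecx in the dump, and `load16_crosses_page` reports that the load stays inside one page, so the movdqu at line 217 was not straddling a page boundary.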
On 07/27/2012 10:04 PM, Chris Clayton wrote: > On 07/27/12 19:08, Eric Northup wrote: >> Could you include the output of "info registers" at the point where it >> crashed? >> > > Here you go: > > Program received signal SIGSEGV, Segmentation fault. > [Switching to Thread 0xb6a78b40 (LWP 13249)] > __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217 > 217 movdqu (%edx), %xmm2 > (gdb) bt > #0 __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217 > #1 0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at ghash.c:1704 > #2 0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800, > key=0x8319b82, hash_return=0xb6a78178) > at ghash.c:422 > #3 0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800, > key=key@entry=0x8319b82) at ghash.c:1074 > #4 0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at > qom/object.c:94 > #5 type_get_by_name (name=name@entry=0x8319b82 "apic-common") at > qom/object.c:149 > #6 0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a313e0, > typename=typename@entry=0x8319b82 "apic-common") > at qom/object.c:416 > #7 0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a313e0, > typename=typename@entry=0x8319b82 "apic-common") at qom/object.c:478 > #8 0x08192c1b in cpu_set_apic_tpr (d=0x8a313e0, val=13 '\r') > at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60 > #9 0x081cb86c in kvm_arch_post_run (env=env@entry=0x8a29370, > run=run@entry=0xb6274000) > at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695 > #10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a29370) at > /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269 > #11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a29370) at > /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752 > #12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0 > #13 0xb77e45ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132 > (gdb) info registers > eax 0x8319b82 137468802 > ecx 0xd58 3416 > edx 0x8a0cd58 144756056 > ebx 
0xb7f7f2c4 -1208487228
> esp            0xb6a780ec       0xb6a780ec
> ebp            0xb6a78118       0xb6a78118
> esi            0x8a313e0        144905184
> edi            0xc513           50451
> eip            0xb7824f77       0xb7824f77 <__strcmp_sse4_2+23>
> eflags         0x10283          [ CF SF IF RF ]
> cs             0x73             115
> ss             0x7b             123
> ds             0x0              0
> es             0x0              0
> fs             0x0              0
> gs             0x33             51
>

ds shouldn't be zero for a 32-bit process.

But that should have crashed *much* earlier, ds is accessed all the time.

Please add the following snippet to the beginning of kvm_arch_post_run():

    {
        unsigned short ds;
        asm("mov %%ds, %0" : "=rm"(ds));
        assert(ds != 0);
    }

If the assert triggers, then kvm corrupted the segment registers. If not, corruption happens somewhere above.
On 07/29/12 13:42, Avi Kivity wrote: > On 07/27/2012 10:04 PM, Chris Clayton wrote: >> On 07/27/12 19:08, Eric Northup wrote: >>> Could you include the output of "info registers" at the point where it >>> crashed? >>> >> >> Here you go: >> >> Program received signal SIGSEGV, Segmentation fault. >> [Switching to Thread 0xb6a78b40 (LWP 13249)] >> __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217 >> 217 movdqu (%edx), %xmm2 >> (gdb) bt >> #0 __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217 >> #1 0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at ghash.c:1704 >> #2 0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800, >> key=0x8319b82, hash_return=0xb6a78178) >> at ghash.c:422 >> #3 0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800, >> key=key@entry=0x8319b82) at ghash.c:1074 >> #4 0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at >> qom/object.c:94 >> #5 type_get_by_name (name=name@entry=0x8319b82 "apic-common") at >> qom/object.c:149 >> #6 0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a313e0, >> typename=typename@entry=0x8319b82 "apic-common") >> at qom/object.c:416 >> #7 0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a313e0, >> typename=typename@entry=0x8319b82 "apic-common") at qom/object.c:478 >> #8 0x08192c1b in cpu_set_apic_tpr (d=0x8a313e0, val=13 '\r') >> at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60 >> #9 0x081cb86c in kvm_arch_post_run (env=env@entry=0x8a29370, >> run=run@entry=0xb6274000) >> at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695 >> #10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a29370) at >> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269 >> #11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a29370) at >> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752 >> #12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0 >> #13 0xb77e45ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132 >> (gdb) info registers >> 
eax 0x8319b82 137468802 >> ecx 0xd58 3416 >> edx 0x8a0cd58 144756056 >> ebx 0xb7f7f2c4 -1208487228 >> esp 0xb6a780ec 0xb6a780ec >> ebp 0xb6a78118 0xb6a78118 >> esi 0x8a313e0 144905184 >> edi 0xc513 50451 >> eip 0xb7824f77 0xb7824f77 <__strcmp_sse4_2+23> >> eflags 0x10283 [ CF SF IF RF ] >> cs 0x73 115 >> ss 0x7b 123 >> ds 0x0 0 >> es 0x0 0 >> fs 0x0 0 >> gs 0x33 51 >> > > ds shouldn't be zero for a 32-bit process. > > But that should have crashed *much* earlier, ds is accessed all the time. > > Please add the following snippet to the beginning of kvm_arch_post_run(): > > { > unsigned short ds; > asm("mov %%ds, %0" : "=rm"(ds)); > assert(ds != 0); > } > > if the assert triggers, then kvm corrupted the segment registers. If > not, corruption happens somewhere above. > Thanks, Avi. The assert didn't trigger - I got: Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0xb60ffb40 (LWP 2134)] __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217 217 movdqu (%edx), %xmm2 (gdb) info registers eax 0x8319ba2 137468834 ecx 0xd58 3416 edx 0x8a0cd58 144756056 ebx 0xb7f7f2c4 -1208487228 esp 0xb60ff0ec 0xb60ff0ec ebp 0xb60ff118 0xb60ff118 esi 0x8a44818 144984088 edi 0xc513 50451 eip 0xb7820f77 0xb7820f77 <__strcmp_sse4_2+23> eflags 0x10283 [ CF SF IF RF ] cs 0x73 115 ss 0x7b 123 ds 0x0 0 es 0x0 0 fs 0x0 0 gs 0x33 51 (gdb) list 212 #endif 213 mov %dx, %cx 214 and $0xfff, %cx 215 cmp $0xff0, %cx 216 ja L(first4bytes) 217 movdqu (%edx), %xmm2 218 mov %eax, %ecx 219 and $0xfff, %ecx 220 cmp $0xff0, %ecx 221 ja L(first4bytes) (gdb) bt #0 __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217 #1 0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319ba2) at ghash.c:1704 #2 0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800, key=0x8319ba2, hash_return=0xb60ff178) at ghash.c:422 #3 0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800, key=key@entry=0x8319ba2) at ghash.c:1074 #4 0x0815c9cb in type_table_lookup (name=0x8319ba2 
"apic-common") at qom/object.c:94 #5 type_get_by_name (name=name@entry=0x8319ba2 "apic-common") at qom/object.c:149 #6 0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a44818, typename=typename@entry=0x8319ba2 "apic-common") at qom/object.c:416 #7 0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a44818, typename=typename@entry=0x8319ba2 "apic-common") at qom/object.c:478 #8 0x08192c1b in cpu_set_apic_tpr (d=0x8a44818, val=7 '\a') at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60 #9 0x081cb874 in kvm_arch_post_run (env=env@entry=0x8a3ca60, run=run@entry=0xb626d000) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1702 #10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a3ca60) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269 #11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a3ca60) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752 #12 0xb7a3ad9e in start_thread () from /lib/libpthread.so.0 #13 0xb77e05ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132 I think you are saying that the problem isn't in kvm, so where would you recommend I continue investigations. I'm not seeing a crash with any other applications. Thanks again. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/29/2012 05:03 PM, Chris Clayton wrote: > On 07/29/12 13:42, Avi Kivity wrote: >> On 07/27/2012 10:04 PM, Chris Clayton wrote: >>> On 07/27/12 19:08, Eric Northup wrote: >>>> Could you include the output of "info registers" at the point where it >>>> crashed? >>>> >>> >>> Here you go: >>> >>> Program received signal SIGSEGV, Segmentation fault. >>> [Switching to Thread 0xb6a78b40 (LWP 13249)] >>> __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217 >>> 217 movdqu (%edx), %xmm2 >>> (gdb) bt >>> #0 __strcmp_sse4_2 () at >>> ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217 >>> #1 0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at >>> ghash.c:1704 >>> #2 0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800, >>> key=0x8319b82, hash_return=0xb6a78178) >>> at ghash.c:422 >>> #3 0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800, >>> key=key@entry=0x8319b82) at ghash.c:1074 >>> #4 0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at >>> qom/object.c:94 >>> #5 type_get_by_name (name=name@entry=0x8319b82 "apic-common") at >>> qom/object.c:149 >>> #6 0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a313e0, >>> typename=typename@entry=0x8319b82 "apic-common") >>> at qom/object.c:416 >>> #7 0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a313e0, >>> typename=typename@entry=0x8319b82 "apic-common") at >>> qom/object.c:478 >>> #8 0x08192c1b in cpu_set_apic_tpr (d=0x8a313e0, val=13 '\r') >>> at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60 >>> #9 0x081cb86c in kvm_arch_post_run (env=env@entry=0x8a29370, >>> run=run@entry=0xb6274000) >>> at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695 >>> #10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a29370) at >>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269 >>> #11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a29370) at >>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752 >>> #12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0 >>> #13 
0xb77e45ee in clone () at >>> ../sysdeps/unix/sysv/linux/i386/clone.S:132 >>> (gdb) info registers >>> eax 0x8319b82 137468802 >>> ecx 0xd58 3416 >>> edx 0x8a0cd58 144756056 >>> ebx 0xb7f7f2c4 -1208487228 >>> esp 0xb6a780ec 0xb6a780ec >>> ebp 0xb6a78118 0xb6a78118 >>> esi 0x8a313e0 144905184 >>> edi 0xc513 50451 >>> eip 0xb7824f77 0xb7824f77 <__strcmp_sse4_2+23> >>> eflags 0x10283 [ CF SF IF RF ] >>> cs 0x73 115 >>> ss 0x7b 123 >>> ds 0x0 0 >>> es 0x0 0 >>> fs 0x0 0 >>> gs 0x33 51 >>> >> >> ds shouldn't be zero for a 32-bit process. >> >> But that should have crashed *much* earlier, ds is accessed all the time. >> >> Please add the following snippet to the beginning of kvm_arch_post_run(): >> >> { >> unsigned short ds; >> asm("mov %%ds, %0" : "=rm"(ds)); >> assert(ds != 0); >> } >> >> if the assert triggers, then kvm corrupted the segment registers. If >> not, corruption happens somewhere above. >> > Thanks, Avi. > > The assert didn't trigger - I got: > > Program received signal SIGSEGV, Segmentation fault. 
> [Switching to Thread 0xb60ffb40 (LWP 2134)] > __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217 > 217 movdqu (%edx), %xmm2 > (gdb) info registers > eax 0x8319ba2 137468834 > ecx 0xd58 3416 > edx 0x8a0cd58 144756056 > ebx 0xb7f7f2c4 -1208487228 > esp 0xb60ff0ec 0xb60ff0ec > ebp 0xb60ff118 0xb60ff118 > esi 0x8a44818 144984088 > edi 0xc513 50451 > eip 0xb7820f77 0xb7820f77 <__strcmp_sse4_2+23> > eflags 0x10283 [ CF SF IF RF ] > cs 0x73 115 > ss 0x7b 123 > ds 0x0 0 > es 0x0 0 > fs 0x0 0 > gs 0x33 51 > (gdb) list > 212 #endif > 213 mov %dx, %cx > 214 and $0xfff, %cx > 215 cmp $0xff0, %cx > 216 ja L(first4bytes) > 217 movdqu (%edx), %xmm2 > 218 mov %eax, %ecx > 219 and $0xfff, %ecx > 220 cmp $0xff0, %ecx > 221 ja L(first4bytes) > (gdb) bt > #0 __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217 > #1 0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319ba2) at ghash.c:1704 > #2 0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800, > key=0x8319ba2, hash_return=0xb60ff178) > at ghash.c:422 > #3 0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800, > key=key@entry=0x8319ba2) at ghash.c:1074 > #4 0x0815c9cb in type_table_lookup (name=0x8319ba2 "apic-common") at > qom/object.c:94 > #5 type_get_by_name (name=name@entry=0x8319ba2 "apic-common") at > qom/object.c:149 > #6 0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a44818, > typename=typename@entry=0x8319ba2 "apic-common") > at qom/object.c:416 > #7 0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a44818, > typename=typename@entry=0x8319ba2 "apic-common") at qom/object.c:478 > #8 0x08192c1b in cpu_set_apic_tpr (d=0x8a44818, val=7 '\a') > at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60 > #9 0x081cb874 in kvm_arch_post_run (env=env@entry=0x8a3ca60, > run=run@entry=0xb626d000) > at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1702 > #10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a3ca60) at > /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269 
> #11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a3ca60) at
> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
> #12 0xb7a3ad9e in start_thread () from /lib/libpthread.so.0
> #13 0xb77e05ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132
>
> I think you are saying that the problem isn't in kvm, so where would you
> recommend I continue investigations. I'm not seeing a crash with any
> other applications.

What might have happened is that the movdqu instruction faulted (as it's an fpu instruction), and on the way back from the fault, ds and es didn't get restored correctly.

You can test this by writing a trivial version of g_str_equal() somewhere in the qemu source code and rebuilding it.
On 07/29/2012 05:18 PM, Avi Kivity wrote: >> >> I think you are saying that the problem isn't in kvm, so where would you >> recommend I continue investigations. I'm not seeing a crash with any >> other applications. > > What might have happened is that the movdqu instruction faulted (as it's > an fpu instruction), and on the way back from the fault, ds and es > didn't get restored correctly. > > You can test this by writing a trivial version of g_str_equal() > somewhere in the qemu source code and rebuilding it. You're running a 32-bit kernel, yes? Please confirm.
On 07/29/12 15:48, Avi Kivity wrote: > On 07/29/2012 05:18 PM, Avi Kivity wrote: >>> >>> I think you are saying that the problem isn't in kvm, so where would you >>> recommend I continue investigations. I'm not seeing a crash with any >>> other applications. >> >> What might have happened is that the movdqu instruction faulted (as it's >> an fpu instruction), and on the way back from the fault, ds and es >> didn't get restored correctly. >> >> You can test this by writing a trivial version of g_str_equal() >> somewhere in the qemu source code and rebuilding it. > > You're running a 32-bit kernel, yes? Please confirm. > > Yes, I am running a 32-bit kernel and userland. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/29/2012 05:18 PM, Avi Kivity wrote: > On 07/29/2012 05:03 PM, Chris Clayton wrote: >> On 07/29/12 13:42, Avi Kivity wrote: >>> On 07/27/2012 10:04 PM, Chris Clayton wrote: >>>> On 07/27/12 19:08, Eric Northup wrote: >>>>> Could you include the output of "info registers" at the point where it >>>>> crashed? >>>>> >>>> >>>> Here you go: >>>> >>>> Program received signal SIGSEGV, Segmentation fault. >>>> [Switching to Thread 0xb6a78b40 (LWP 13249)] >>>> __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217 >>>> 217 movdqu (%edx), %xmm2 >>>> (gdb) bt >>>> #0 __strcmp_sse4_2 () at >>>> ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217 >>>> #1 0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at >>>> ghash.c:1704 >>>> #2 0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800, >>>> key=0x8319b82, hash_return=0xb6a78178) >>>> at ghash.c:422 >>>> #3 0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800, >>>> key=key@entry=0x8319b82) at ghash.c:1074 >>>> #4 0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at >>>> qom/object.c:94 >>>> #5 type_get_by_name (name=name@entry=0x8319b82 "apic-common") at >>>> qom/object.c:149 >>>> #6 0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a313e0, >>>> typename=typename@entry=0x8319b82 "apic-common") >>>> at qom/object.c:416 >>>> #7 0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a313e0, >>>> typename=typename@entry=0x8319b82 "apic-common") at >>>> qom/object.c:478 >>>> #8 0x08192c1b in cpu_set_apic_tpr (d=0x8a313e0, val=13 '\r') >>>> at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60 >>>> #9 0x081cb86c in kvm_arch_post_run (env=env@entry=0x8a29370, >>>> run=run@entry=0xb6274000) >>>> at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695 >>>> #10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a29370) at >>>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269 >>>> #11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a29370) at >>>> 
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752 >>>> #12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0 >>>> #13 0xb77e45ee in clone () at >>>> ../sysdeps/unix/sysv/linux/i386/clone.S:132 >>>> (gdb) info registers >>>> eax 0x8319b82 137468802 >>>> ecx 0xd58 3416 >>>> edx 0x8a0cd58 144756056 >>>> ebx 0xb7f7f2c4 -1208487228 >>>> esp 0xb6a780ec 0xb6a780ec >>>> ebp 0xb6a78118 0xb6a78118 >>>> esi 0x8a313e0 144905184 >>>> edi 0xc513 50451 >>>> eip 0xb7824f77 0xb7824f77 <__strcmp_sse4_2+23> >>>> eflags 0x10283 [ CF SF IF RF ] >>>> cs 0x73 115 >>>> ss 0x7b 123 >>>> ds 0x0 0 >>>> es 0x0 0 >>>> fs 0x0 0 >>>> gs 0x33 51 >>>> >>> >>> ds shouldn't be zero for a 32-bit process. >>> >>> But that should have crashed *much* earlier, ds is accessed all the time. >>> >>> Please add the following snippet to the beginning of kvm_arch_post_run(): >>> >>> { >>> unsigned short ds; >>> asm("mov %%ds, %0" : "=rm"(ds)); >>> assert(ds != 0); >>> } >>> >>> if the assert triggers, then kvm corrupted the segment registers. If >>> not, corruption happens somewhere above. >>> >> Thanks, Avi. >> >> The assert didn't trigger - I got: >> >> Program received signal SIGSEGV, Segmentation fault. 
>> [Switching to Thread 0xb60ffb40 (LWP 2134)] >> __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217 >> 217 movdqu (%edx), %xmm2 >> (gdb) info registers >> eax 0x8319ba2 137468834 >> ecx 0xd58 3416 >> edx 0x8a0cd58 144756056 >> ebx 0xb7f7f2c4 -1208487228 >> esp 0xb60ff0ec 0xb60ff0ec >> ebp 0xb60ff118 0xb60ff118 >> esi 0x8a44818 144984088 >> edi 0xc513 50451 >> eip 0xb7820f77 0xb7820f77 <__strcmp_sse4_2+23> >> eflags 0x10283 [ CF SF IF RF ] >> cs 0x73 115 >> ss 0x7b 123 >> ds 0x0 0 >> es 0x0 0 >> fs 0x0 0 >> gs 0x33 51 >> (gdb) list >> 212 #endif >> 213 mov %dx, %cx >> 214 and $0xfff, %cx >> 215 cmp $0xff0, %cx >> 216 ja L(first4bytes) >> 217 movdqu (%edx), %xmm2 >> 218 mov %eax, %ecx >> 219 and $0xfff, %ecx >> 220 cmp $0xff0, %ecx >> 221 ja L(first4bytes) >> (gdb) bt >> #0 __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217 >> #1 0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319ba2) at ghash.c:1704 >> #2 0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800, >> key=0x8319ba2, hash_return=0xb60ff178) >> at ghash.c:422 >> #3 0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800, >> key=key@entry=0x8319ba2) at ghash.c:1074 >> #4 0x0815c9cb in type_table_lookup (name=0x8319ba2 "apic-common") at >> qom/object.c:94 >> #5 type_get_by_name (name=name@entry=0x8319ba2 "apic-common") at >> qom/object.c:149 >> #6 0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a44818, >> typename=typename@entry=0x8319ba2 "apic-common") >> at qom/object.c:416 >> #7 0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a44818, >> typename=typename@entry=0x8319ba2 "apic-common") at qom/object.c:478 >> #8 0x08192c1b in cpu_set_apic_tpr (d=0x8a44818, val=7 '\a') >> at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60 >> #9 0x081cb874 in kvm_arch_post_run (env=env@entry=0x8a3ca60, >> run=run@entry=0xb626d000) >> at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1702 >> #10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a3ca60) at 
>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
>> #11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a3ca60) at
>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
>> #12 0xb7a3ad9e in start_thread () from /lib/libpthread.so.0
>> #13 0xb77e05ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132
>>
>> I think you are saying that the problem isn't in kvm, so where would you
>> recommend I continue investigations. I'm not seeing a crash with any
>> other applications.
>
> What might have happened is that the movdqu instruction faulted (as it's
> an fpu instruction), and on the way back from the fault, ds and es
> didn't get restored correctly.
>
> You can test this by writing a trivial version of g_str_equal()
> somewhere in the qemu source code and rebuilding it.
>
>

from entry_32.S:

.macro RESTORE_REGS pop=0
        RESTORE_INT_REGS
1:      popl_cfi %ds
        /*CFI_RESTORE ds;*/
2:      popl_cfi %es
        /*CFI_RESTORE es;*/
3:      popl_cfi %fs
        /*CFI_RESTORE fs;*/
        POP_GS \pop
.pushsection .fixup, "ax"
4:      movl $0, (%esp)
        jmp 1b
5:      movl $0, (%esp)
        jmp 2b
6:      movl $0, (%esp)
        jmp 3b
.popsection

This piece of code tries to restore %ds, and if it fails, zeros it, which is consistent with the core dump.

This could happen if kvm is failing to restore GDT correctly.
On 07/29/2012 06:47 PM, Avi Kivity wrote: >> What might have happened is that the movdqu instruction faulted (as it's >> an fpu instruction), and on the way back from the fault, ds and es >> didn't get restored correctly. >> >> You can test this by writing a trivial version of g_str_equal() >> somewhere in the qemu source code and rebuilding it. >> >> > > from entry_32.S: > > .macro RESTORE_REGS pop=0 > RESTORE_INT_REGS > 1: popl_cfi %ds > /*CFI_RESTORE ds;*/ > 2: popl_cfi %es > /*CFI_RESTORE es;*/ > 3: popl_cfi %fs > /*CFI_RESTORE fs;*/ > POP_GS \pop > .pushsection .fixup, "ax" > 4: movl $0, (%esp) > jmp 1b > 5: movl $0, (%esp) > jmp 2b > 6: movl $0, (%esp) > jmp 3b > .popsection > > this piece of code tries to restore %ds, and if it fails, zeros it, > which is consistent with the core dump. > > This could happen if kvm is failing to restore GDT correctly. > Possible culprit: b2da15ac26a0c00.
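Editorial aside: the GDT theory is easier to follow once the selector values from the register dumps are decoded. An x86 segment selector carries a table index in bits 15..3, a table indicator in bit 2 (0 = GDT, 1 = LDT) and a requested privilege level in bits 1..0; a selector with index 0 in the GDT is the null selector, and a data access through a null selector raises a general protection fault. The sketch below only does the bit decoding, with hypothetical names; it says nothing about what the kernel's GDT actually contained.

```c
#include <assert.h>
#include <stdint.h>

struct selector_fields {
    unsigned index;  /* descriptor table index, bits 15..3 */
    unsigned ti;     /* table indicator, bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;    /* requested privilege level, bits 1..0 */
};

/* Split a 16-bit segment selector into its three fields. */
struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;

    f.index = sel >> 3;
    f.ti = (sel >> 2) & 1u;
    f.rpl = sel & 3u;
    return f;
}

/* A null selector has index 0 and TI 0; the RPL bits are ignored. */
int is_null_selector(uint16_t sel)
{
    return (sel & 0xfffcu) == 0;
}
```

Decoding the dump values this way shows cs (0x73), ss (0x7b) and gs (0x33) all pointing at GDT entries with RPL 3, while ds, es and fs are null, which fits a restore path that zeroed a selector after the load from the GDT failed.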
On 07/29/12 17:34, Avi Kivity wrote:
> On 07/29/2012 06:47 PM, Avi Kivity wrote:
>>> What might have happened is that the movdqu instruction faulted (as it's
>>> an fpu instruction), and on the way back from the fault, ds and es
>>> didn't get restored correctly.
>>>
>>> You can test this by writing a trivial version of g_str_equal()
>>> somewhere in the qemu source code and rebuilding it.
>>>
>>>
>>
>> from entry_32.S:
>>
>> .macro RESTORE_REGS pop=0
>>         RESTORE_INT_REGS
>> 1:      popl_cfi %ds
>>         /*CFI_RESTORE ds;*/
>> 2:      popl_cfi %es
>>         /*CFI_RESTORE es;*/
>> 3:      popl_cfi %fs
>>         /*CFI_RESTORE fs;*/
>>         POP_GS \pop
>> .pushsection .fixup, "ax"
>> 4:      movl $0, (%esp)
>>         jmp 1b
>> 5:      movl $0, (%esp)
>>         jmp 2b
>> 6:      movl $0, (%esp)
>>         jmp 3b
>> .popsection
>>
>> this piece of code tries to restore %ds, and if it fails, zeros it,
>> which is consistent with the core dump.
>>
>> This could happen if kvm is failing to restore GDT correctly.
>>
>
> Possible culprit: b2da15ac26a0c00.
>
>

That commit isn't in qemu-kvm-1.1.1.

I'm testing a build with g_str_equal implemented in kvm.c and so far I haven't had a crash in 6 invocations. That hasn't been possible with vanilla qemu-kvm-1.1.{0,1}, but I'll do a few more, just to be sure.

Thanks for your help, Avi.
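Editorial aside: the locally implemented g_str_equal used for this test build is not shown in the thread. A minimal sketch of what such a stand-in might look like is given below, assuming the goal is simply to compare the strings byte by byte without going through glibc's SSE4.2 strcmp. The name local_str_equal is hypothetical, and plain C types are used in place of glib's gboolean and gconstpointer so the block stands on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for g_str_equal(): returns 1 when the two
 * NUL-terminated strings are identical, 0 otherwise. The volatile
 * qualifier is there to keep the compiler from widening the loop into
 * vector loads, so only single-byte reads are issued. */
int local_str_equal(const void *v1, const void *v2)
{
    const volatile unsigned char *a = v1;
    const volatile unsigned char *b = v2;

    assert(a != NULL && b != NULL);
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}
```

A run that stays stable with a byte-wise comparison like this would be consistent with the crash being tied to the SSE load in strcmp rather than to the contents of the hash table.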
On Sun, Jul 29, 2012 at 06:50:09PM +0100, Chris Clayton wrote:
> On 07/29/12 17:34, Avi Kivity wrote:
> >On 07/29/2012 06:47 PM, Avi Kivity wrote:
> >>>What might have happened is that the movdqu instruction faulted (as it's
> >>>an fpu instruction), and on the way back from the fault, ds and es
> >>>didn't get restored correctly.
> >>>
> >>>You can test this by writing a trivial version of g_str_equal()
> >>>somewhere in the qemu source code and rebuilding it.
> >>>
> >>>
> >>
> >>from entry_32.S:
> >>
> >>.macro RESTORE_REGS pop=0
> >>        RESTORE_INT_REGS
> >>1:      popl_cfi %ds
> >>        /*CFI_RESTORE ds;*/
> >>2:      popl_cfi %es
> >>        /*CFI_RESTORE es;*/
> >>3:      popl_cfi %fs
> >>        /*CFI_RESTORE fs;*/
> >>        POP_GS \pop
> >>.pushsection .fixup, "ax"
> >>4:      movl $0, (%esp)
> >>        jmp 1b
> >>5:      movl $0, (%esp)
> >>        jmp 2b
> >>6:      movl $0, (%esp)
> >>        jmp 3b
> >>.popsection
> >>
> >>this piece of code tries to restore %ds, and if it fails, zeros it,
> >>which is consistent with the core dump.
> >>
> >>This could happen if kvm is failing to restore GDT correctly.
> >>
> >
> >Possible culprit: b2da15ac26a0c00.
> >
> >
> That commit isn't in qemu-kvm-1.1.1.
>
It is in kernel.

> I'm testing a build with g_str_equal implemented in kvm.c and so far
> I haven't had a crash in 6 invocations. That hasn't been possible
> with vanilla qemu-kvm-1.1.{0,1}, but I'll do a few more, just to be
> sure.
>
> Thanks for your help, Avi.

--
			Gleb.
On 07/29/12 18:54, Gleb Natapov wrote:
> On Sun, Jul 29, 2012 at 06:50:09PM +0100, Chris Clayton wrote:
>> On 07/29/12 17:34, Avi Kivity wrote:
>>> On 07/29/2012 06:47 PM, Avi Kivity wrote:
>>>>> What might have happened is that the movdqu instruction faulted (as it's
>>>>> an fpu instruction), and on the way back from the fault, ds and es
>>>>> didn't get restored correctly.
>>>>>
>>>>> You can test this by writing a trivial version of g_str_equal()
>>>>> somewhere in the qemu source code and rebuilding it.
>>>>>
>>>>>
>>>>
>>>> from entry_32.S:
>>>>
>>>> .macro RESTORE_REGS pop=0
>>>>         RESTORE_INT_REGS
>>>> 1:      popl_cfi %ds
>>>>         /*CFI_RESTORE ds;*/
>>>> 2:      popl_cfi %es
>>>>         /*CFI_RESTORE es;*/
>>>> 3:      popl_cfi %fs
>>>>         /*CFI_RESTORE fs;*/
>>>>         POP_GS \pop
>>>> .pushsection .fixup, "ax"
>>>> 4:      movl $0, (%esp)
>>>>         jmp 1b
>>>> 5:      movl $0, (%esp)
>>>>         jmp 2b
>>>> 6:      movl $0, (%esp)
>>>>         jmp 3b
>>>> .popsection
>>>>
>>>> this piece of code tries to restore %ds, and if it fails, zeros it,
>>>> which is consistent with the core dump.
>>>>
>>>> This could happen if kvm is failing to restore GDT correctly.
>>>>
>>>
>>> Possible culprit: b2da15ac26a0c00.
>>>
>>>
>> That commit isn't in qemu-kvm-1.1.1.
>>
> It is in kernel.
>

Sorry, so it is.

With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15 clean
invocations of vanilla qemu-kvm-1.1.1. So that commit would seem to be the
problem.

>> I'm testing a build with g_str_equal implemented in kvm.c and so far
>> I haven't had a crash in 6 invocations. That hasn't been possible
>> with vanilla qemu-kvm-1.1.{0,1}, but I'll do a few more, just to be
>> sure.
>>

Similarly, with my "local" implementation of g_str_equal, I've had 15
clean invocations on vanilla kernel 3.5.0.

I'm more than happy to test patches to fix this regression, but it will
be tomorrow before I will be able to do so.

>> Thanks for your help, Avi.
>
> --
> 			Gleb.
>
On 07/29/12 20:10, Chris Clayton wrote:
>>>> Possible culprit: b2da15ac26a0c00.
>>>>
>>>>
>>> That commit isn't in qemu-kvm-1.1.1.
>>>
>> It is in kernel.
>>
>
> Sorry, so it is.
>
> With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
> clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
> to be the problem.

Just to be sure, I've run some more tests today. No crashes occurred in
20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
reverted.

Thanks.
On 07/30/2012 05:00 PM, Chris Clayton wrote:
> On 07/29/12 20:10, Chris Clayton wrote:
>>>>> Possible culprit: b2da15ac26a0c00.
>>>>>
>>>>>
>>>> That commit isn't in qemu-kvm-1.1.1.
>>>>
>>> It is in kernel.
>>>
>>
>> Sorry, so it is.
>>
>> With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
>> clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
>> to be the problem.
>
> Just to be sure, I've run some more tests today. No crashes occurred in
> 20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
> reverted.

Ok. I'm trying to reproduce it here on a nested-virt setup, since the
code looks correct.

What's your preemption settings?
On 07/30/12 15:03, Avi Kivity wrote:
> On 07/30/2012 05:00 PM, Chris Clayton wrote:
>> On 07/29/12 20:10, Chris Clayton wrote:
>>>>>> Possible culprit: b2da15ac26a0c00.
>>>>>>
>>>>>>
>>>>> That commit isn't in qemu-kvm-1.1.1.
>>>>>
>>>> It is in kernel.
>>>>
>>>
>>> Sorry, so it is.
>>>
>>> With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
>>> clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
>>> to be the problem.
>>
>> Just to be sure, I've run some more tests today. No crashes occurred in
>> 20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
>> reverted.
>
> Ok. I'm trying to reproduce it here on a nested-virt setup, since the
> code looks correct.
>
> What's your preemption settings?
>
>

[chris:~/kernel/linux-3.5.0]$ grep PREEMPT .config
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
On 07/30/2012 05:07 PM, Chris Clayton wrote:
>>
>>>> With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
>>>> clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
>>>> to be the problem.
>>>
>>> Just to be sure, I've run some more tests today. No crashes occurred in
>>> 20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
>>> reverted.
>>
>> Ok. I'm trying to reproduce it here on a nested-virt setup, since the
>> code looks correct.
>>
>> What's your preemption settings?
>>
>>
> [chris:~/kernel/linux-3.5.0]$ grep PREEMPT .config
> CONFIG_TREE_PREEMPT_RCU=y
> CONFIG_PREEMPT_RCU=y
> CONFIG_PREEMPT_NOTIFIERS=y
> # CONFIG_PREEMPT_NONE is not set
> # CONFIG_PREEMPT_VOLUNTARY is not set
> CONFIG_PREEMPT=y
> CONFIG_PREEMPT_COUNT=y

Here's what I think that is happening

vcpu_load
...
vmx_save_host_state
vmx_vcpu_run
(ds.cpl, es.cpl cleared by hardware)

interrupt
  push ds, es  # pushes bad ds, es
  schedule
    vmx_vcpu_put
      vmx_load_host_state
        reload ds, es
  pop ds, es  # of other thread's stack
  iret
# other thread runs
interrupt
  schedule  # back in vcpu thread
  interrupt return: pop ds, es  # <-- problem
  iret

...
vcpu_put

# bad ds, es, but !vmx->host_state.loaded

Marcelo, did I miss something here?

Unfortunately, my reproducer has ceased to reproduce. But the fix is
easy if the analysis above is right.
On Mon, Jul 30, 2012 at 07:39:31PM +0300, Avi Kivity wrote:
> On 07/30/2012 05:07 PM, Chris Clayton wrote:
> >>
> >>>> With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
> >>>> clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
> >>>> to be the problem.
> >>>
> >>> Just to be sure, I've run some more tests today. No crashes occurred in
> >>> 20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
> >>> reverted.
> >>
> >> Ok. I'm trying to reproduce it here on a nested-virt setup, since the
> >> code looks correct.
> >>
> >> What's your preemption settings?
> >>
> >>
> > [chris:~/kernel/linux-3.5.0]$ grep PREEMPT .config
> > CONFIG_TREE_PREEMPT_RCU=y
> > CONFIG_PREEMPT_RCU=y
> > CONFIG_PREEMPT_NOTIFIERS=y
> > # CONFIG_PREEMPT_NONE is not set
> > # CONFIG_PREEMPT_VOLUNTARY is not set
> > CONFIG_PREEMPT=y
> > CONFIG_PREEMPT_COUNT=y
>
> Here's what I think that is happening
>
> vcpu_load
> ...
> vmx_save_host_state
> vmx_vcpu_run
> (ds.cpl, es.cpl cleared by hardware)
>
> interrupt
>   push ds, es  # pushes bad ds, es
>   schedule
>     vmx_vcpu_put
>       vmx_load_host_state
>         reload ds, es
>   pop ds, es  # of other thread's stack
>   iret
> # other thread runs
> interrupt
>   schedule  # back in vcpu thread
>   interrupt return: pop ds, es  # <-- problem
>   iret
>
> ...
> vcpu_put
>
> # bad ds, es, but !vmx->host_state.loaded
>
> Marcelo, did I miss something here?

Don't think so.

>
> Unfortunately, my reproducer has ceased to reproduce. But the fix is
> easy if the analysis above is right.
>
> --
> error compiling committee.c: too many arguments to function
>
On 07/31/2012 02:36 AM, Marcelo Tosatti wrote:
> On Mon, Jul 30, 2012 at 07:39:31PM +0300, Avi Kivity wrote:
>> On 07/30/2012 05:07 PM, Chris Clayton wrote:
>> >>
>> >>>> With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
>> >>>> clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
>> >>>> to be the problem.
>> >>>
>> >>> Just to be sure, I've run some more tests today. No crashes occurred in
>> >>> 20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
>> >>> reverted.
>> >>
>> >> Ok. I'm trying to reproduce it here on a nested-virt setup, since the
>> >> code looks correct.
>> >>
>> >> What's your preemption settings?
>> >>
>> >>
>> > [chris:~/kernel/linux-3.5.0]$ grep PREEMPT .config
>> > CONFIG_TREE_PREEMPT_RCU=y
>> > CONFIG_PREEMPT_RCU=y
>> > CONFIG_PREEMPT_NOTIFIERS=y
>> > # CONFIG_PREEMPT_NONE is not set
>> > # CONFIG_PREEMPT_VOLUNTARY is not set
>> > CONFIG_PREEMPT=y
>> > CONFIG_PREEMPT_COUNT=y
>>
>> Here's what I think that is happening
>>
>> vcpu_load
>> ...
>> vmx_save_host_state
>> vmx_vcpu_run
>> (ds.cpl, es.cpl cleared by hardware)
>>
>> interrupt
>>   push ds, es  # pushes bad ds, es
>>   schedule
>>     vmx_vcpu_put
>>       vmx_load_host_state
>>         reload ds, es
>>   pop ds, es  # of other thread's stack
>>   iret
>> # other thread runs
>> interrupt
>>   schedule  # back in vcpu thread
>>   interrupt return: pop ds, es  # <-- problem
>>   iret
>>
>> ...
>> vcpu_put
>>
>> # bad ds, es, but !vmx->host_state.loaded
>>
>> Marcelo, did I miss something here?
>
> Don't think so.

So the same problem should happen with %fs and %gs, no?

x86_64 is safe, since entry_64.S never saves/restores segment registers.
On Tue, Jul 31, 2012 at 12:11:13PM +0300, Avi Kivity wrote:
> On 07/31/2012 02:36 AM, Marcelo Tosatti wrote:
> > On Mon, Jul 30, 2012 at 07:39:31PM +0300, Avi Kivity wrote:
> >> On 07/30/2012 05:07 PM, Chris Clayton wrote:
> >> >>
> >> >>>> With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
> >> >>>> clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
> >> >>>> to be the problem.
> >> >>>
> >> >>> Just to be sure, I've run some more tests today. No crashes occurred in
> >> >>> 20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
> >> >>> reverted.
> >> >>
> >> >> Ok. I'm trying to reproduce it here on a nested-virt setup, since the
> >> >> code looks correct.
> >> >>
> >> >> What's your preemption settings?
> >> >>
> >> >>
> >> > [chris:~/kernel/linux-3.5.0]$ grep PREEMPT .config
> >> > CONFIG_TREE_PREEMPT_RCU=y
> >> > CONFIG_PREEMPT_RCU=y
> >> > CONFIG_PREEMPT_NOTIFIERS=y
> >> > # CONFIG_PREEMPT_NONE is not set
> >> > # CONFIG_PREEMPT_VOLUNTARY is not set
> >> > CONFIG_PREEMPT=y
> >> > CONFIG_PREEMPT_COUNT=y
> >>
> >> Here's what I think that is happening
> >>
> >> vcpu_load
> >> ...
> >> vmx_save_host_state
> >> vmx_vcpu_run
> >> (ds.cpl, es.cpl cleared by hardware)
> >>
> >> interrupt
> >>   push ds, es  # pushes bad ds, es
> >>   schedule
> >>     vmx_vcpu_put
> >>       vmx_load_host_state
> >>         reload ds, es
> >>   pop ds, es  # of other thread's stack
> >>   iret
> >> # other thread runs
> >> interrupt
> >>   schedule  # back in vcpu thread
> >>   interrupt return: pop ds, es  # <-- problem
> >>   iret
> >>
> >> ...
> >> vcpu_put
> >>
> >> # bad ds, es, but !vmx->host_state.loaded
> >>
> >> Marcelo, did I miss something here?
> >
> > Don't think so.
>
> So the same problem should happen with %fs and %gs, no?

AFAICS:

depends on CONFIG_X86_32_LAZY_GS for GS, unconditional for FS.

> x86_64 is safe, since entry_64.S never saves/restores segment registers.

Is the comment

	/*
	 * The sysexit path does not restore ds/es, so we must set them to
	 * a reasonable value ourselves.
	 */

Correct?

syscall_exit -> syscall_exit_work -> resume_userspace ->
restore_all -> RESTORE_REGS
On 07/31/2012 07:29 PM, Marcelo Tosatti wrote:
>>
>> So the same problem should happen with %fs and %gs, no?
>
> AFAICS:
>
> depends on CONFIG_X86_32_LAZY_GS for GS, unconditional for FS.

This fs/gs were already in there, I wonder how it wasn't broken before.
Something's fishy here.

>
>> x86_64 is safe, since entry_64.S never saves/restores segment registers.
>
> Is the comment
>
> /*
>  * The sysexit path does not restore ds/es, so we must set them to
>  * a reasonable value ourselves.
>  */
>
> Correct?
>
> syscall_exit -> syscall_exit_work -> resume_userspace ->
> restore_all -> RESTORE_REGS
>

That's the non-sysexit path (could have arrived here by sysenter). Look
at sysenter_exit.
On 07/30/2012 07:39 PM, Avi Kivity wrote:
> On 07/30/2012 05:07 PM, Chris Clayton wrote:
>>>
>>>>> With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
>>>>> clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
>>>>> to be the problem.
>>>>
>>>> Just to be sure, I've run some more tests today. No crashes occurred in
>>>> 20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
>>>> reverted.
>>>
>>> Ok. I'm trying to reproduce it here on a nested-virt setup, since the
>>> code looks correct.
>>>
>>> What's your preemption settings?
>>>
>>>
>> [chris:~/kernel/linux-3.5.0]$ grep PREEMPT .config
>> CONFIG_TREE_PREEMPT_RCU=y
>> CONFIG_PREEMPT_RCU=y
>> CONFIG_PREEMPT_NOTIFIERS=y
>> # CONFIG_PREEMPT_NONE is not set
>> # CONFIG_PREEMPT_VOLUNTARY is not set
>> CONFIG_PREEMPT=y
>> CONFIG_PREEMPT_COUNT=y
>
> Here's what I think that is happening
>
> vcpu_load
> ...
> vmx_save_host_state
> vmx_vcpu_run
> (ds.cpl, es.cpl cleared by hardware)
>
> interrupt
>   push ds, es  # pushes bad ds, es
>   schedule
>     vmx_vcpu_put
>       vmx_load_host_state
>         reload ds, es
>   pop ds, es  # of other thread's stack
>   iret
> # other thread runs
> interrupt
>   schedule  # back in vcpu thread
>   interrupt return: pop ds, es  # <-- problem

In fact, those are fine.

>   iret

But IRET-to-outer-privilege-level clears segment registers with the
wrong RPL. Think how secure OSes would be if they used the hardware
fully.

Credit to Gleb for pinpointing this.

>
> ...
> vcpu_put
>
> # bad ds, es, but !vmx->host_state.loaded
>
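The IRET behaviour described in the message above can be modelled with a small C sketch. It is a simplification based on a reading of the IRET pseudocode in the Intel SDM, not kernel or KVM code: the struct, the function name iret_outer_check and the selector value 0x7b are illustrative assumptions, and the real check also looks at the segment type (data or non-conforming code). It further assumes, as the trace suggests, that VM exit leaves the cached privilege level of ds/es at 0.

```c
#include <stdint.h>

/*
 * Simplified, hypothetical model of a data segment register: the
 * visible selector plus the privilege level cached in the hidden part.
 */
struct seg_reg {
    uint16_t selector;
    uint8_t  dpl;
};

/*
 * Model of the check applied to DS/ES/FS/GS when IRET returns to an
 * outer (less privileged) level: if the new CPL is numerically greater
 * than the cached DPL, the register is loaded with the null selector.
 */
static void iret_outer_check(struct seg_reg *seg, uint8_t new_cpl)
{
    if (new_cpl > seg->dpl) {
        seg->selector = 0;
    }
}
```

With a cached DPL of 0 and a return to ring 3, the selector is nulled, which would match the zeroed ds/es reported in the core dump. With a cached DPL of 3 the selector survives.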
--- qemu-kvm-1.1.0/configure~	2012-07-15 22:38:39.000000000 +0100
+++ qemu-kvm-1.1.0/configure	2012-07-15 22:39:09.000000000 +0100
@@ -2783,7 +2783,7 @@ int main(int argc, char **argv)
 }
 EOF
   if ! compile_prog "" "" ; then
-    CFLAGS+="-march=i486"
+    CFLAGS+="-march=i686"
   fi
 fi