qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6

Message ID 50111369.6020209@googlemail.com (mailing list archive)
State New, archived

Commit Message

Chris Clayton July 26, 2012, 9:52 a.m. UTC
On 07/19/12 19:23, Chris Clayton wrote:
> On 07/19/12 13:17, Avi Kivity wrote:
>> On 07/19/2012 03:14 PM, Chris Clayton wrote:
>>
>>>> Change of diagnostics, unfortunately. qemu-kvm-1.0.1 can, in fact,
>>>> crash
>>>> on 3.5.0-rc6 (and rc7). I didn't get it earlier because it takes many
>>>> times more invocations before the crash occurs with 1.0.1 and I haven't
>>>> used qemu-kvm much in the past few weeks.
>>>>
>>>> I'm now checking whether I can get crashes (with 1.0.1 and/or 1.1.0) on
>>>> linux-3.4.4. I'll report back in a day or two.
>>>
>>> I've started up qemu-kvm on kernel 3.4.4 many times and not seen a crash.
>>> That would indicate that the problem is in the kernel. However, I pulled
>>> the latest and greatest from Linus yesterday evening and I now can't get
>>> the crash there either, so whatever it was seems to have been fixed. If
>>> I checkout and build 3.5.0-rc[1..7], I can get the crash pretty quickly,
>>> so it's been fixed in the last few days.
>>
>> There were no kvm changes post-rc7.
>>
> Yes, I'm aware of that, Avi. This thread started because I was getting a
> crash in qemu-kvm, which I thought was only in v1.1.0. Later it turned
> out that the problem was also present in v1.0.1, but much harder to hit.
> However, it only ever happened with 3.5.0 kernels. 3.4.4, with either
> version of qemu-kvm, was stable. So then it seemed that the problem was
> in the kernel, (but not necessarily in the kvm code).
>
> Something that's changed since rc7 has either fixed the problem or made
> it much harder to hit. With rc7 and earlier I can recreate the crash
> quite easily with qemu-kvm-1.1.0 and with enough runs of 1.0.1. With
> rc7+, I haven't been able to get a crash at all.
>
Well, I'm getting the crash again, but this time I've managed to get a 
backtrace:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb60ffb40 (LWP 9405)]
0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
(gdb) bt
#0  0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
#1  0xb7e65333 in g_str_equal () from /usr/lib/libglib-2.0.so.0
#2  0xb7e6458d in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
#3  0x8014e2cf in type_table_lookup (name=0x802b0c50 "apic-common") at 
qom/object.c:94
#4  type_get_by_name (name=name@entry=0x802b0c50 "apic-common") at 
qom/object.c:149
#5  0x8014e933 in object_dynamic_cast (obj=obj@entry=0x80a5d818, 
typename=typename@entry=0x802b0c50 "apic-common")
     at qom/object.c:416
#6  0x8014e8b9 in object_dynamic_cast_assert (obj=obj@entry=0x80a5d818,
     typename=typename@entry=0x802b0c50 "apic-common") at qom/object.c:478
#7  0x80193462 in cpu_set_apic_tpr (d=0x80a5d818, val=8 '\b')
     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
#8  0x801d0560 in kvm_arch_post_run (env=env@entry=0x80a55a60, 
run=run@entry=0xb6239000)
     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
#9  0x801cb05f in kvm_cpu_exec (env=env@entry=0x80a55a60) at 
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#10 0x80199d1e in qemu_kvm_cpu_thread_fn (arg=0x80a55a60) at 
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#11 0xb7a1fd9e in start_thread () from /lib/libpthread.so.0
#12 0xb77bbbbe in clone () from /lib/libc.so.6

This is with kernel 3.5.0 and qemu-kvm-1.1.1. glibc is 2.16.0 built 
against 3.4.4 kernel headers. The glibc, the kernel headers and the 
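kernel are vanilla and the only change to the qemu-kvm sources is:

--- qemu-kvm-1.1.0/configure~   2012-07-15 22:38:39.000000000 +0100
+++ qemu-kvm-1.1.0/configure    2012-07-15 22:39:09.000000000 +0100
@@ -2783,7 +2783,7 @@ int main(int argc, char **argv)
 }
 EOF
   if ! compile_prog "" "" ; then
-    CFLAGS+="-march=i486"
+    CFLAGS+="-march=i686"
   fi
 fi
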
Please let me know of anything I can do to help track this down.

Thanks

Chris

> I'm not inclined to bisect to find out which patch provided the fix, but
> this mail should at least close the mail thread down tidily.
>
> Chris

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Avi Kivity July 26, 2012, 10:01 a.m. UTC | #1
On 07/26/2012 12:52 PM, Chris Clayton wrote:
> On 07/19/12 19:23, Chris Clayton wrote:
>> On 07/19/12 13:17, Avi Kivity wrote:
>>> On 07/19/2012 03:14 PM, Chris Clayton wrote:
>>>
>>>>> Change of diagnostics, unfortunately. qemu-kvm-1.0.1 can, in fact,
>>>>> crash
>>>>> on 3.5.0-rc6 (and rc7). I didn't get it earlier because it takes many
>>>>> times more invocations before the crash occurs with 1.0.1 and I
>>>>> haven't
>>>>> used qemu-kvm much in the past few weeks.
>>>>>
>>>>> I'm now checking whether I can get crashes (with 1.0.1 and/or
>>>>> 1.1.0) on
>>>>> linux-3.4.4. I'll report back in a day or two.
>>>>
>>>> I've started up qemu-kvm on kernel 3.4.4 many times and not seen a
>>>> crash.
>>>> That would indicate that the problem is in the kernel. However, I
>>>> pulled
>>>> the latest and greatest from Linus yesterday evening and I now can't
>>>> get
>>>> the crash there either, so whatever it was seems to have been fixed. If
>>>> I checkout and build 3.5.0-rc[1..7], I can get the crash pretty
>>>> quickly,
>>>> so it's been fixed in the last few days.
>>>
>>> There were no kvm changes post-rc7.
>>>
>> Yes, I'm aware of that, Avi. This thread started because I was getting a
>> crash in qemu-kvm, which I thought was only in v1.1.0. Later it turned
>> out that the problem was also present in v1.0.1, but much harder to hit.
>> However, it only ever happened with 3.5.0 kernels. 3.4.4, with either
>> version of qemu-kvm, was stable. So then it seemed that the problem was
>> in the kernel, (but not necessarily in the kvm code).
>>
>> Something that's changed since rc7 has either fixed the problem or made
>> it much harder to hit. With rc7 and earlier I can recreate the crash
>> quite easily with qemu-kvm-1.1.0 and with enough runs of 1.0.1. With
>> rc7+, I haven't been able to get a crash at all.
>>
> Well, I'm getting the crash again, but this time I've managed to get a
> backtrace:
> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xb60ffb40 (LWP 9405)]
> 0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
> (gdb) bt
> #0  0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
> #1  0xb7e65333 in g_str_equal () from /usr/lib/libglib-2.0.so.0
> #2  0xb7e6458d in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
> #3  0x8014e2cf in type_table_lookup (name=0x802b0c50 "apic-common") at
> qom/object.c:94
> #4  type_get_by_name (name=name@entry=0x802b0c50 "apic-common") at
> qom/object.c:149
> #5  0x8014e933 in object_dynamic_cast (obj=obj@entry=0x80a5d818,
> typename=typename@entry=0x802b0c50 "apic-common")
>     at qom/object.c:416
> #6  0x8014e8b9 in object_dynamic_cast_assert (obj=obj@entry=0x80a5d818,
>     typename=typename@entry=0x802b0c50 "apic-common") at qom/object.c:478
> #7  0x80193462 in cpu_set_apic_tpr (d=0x80a5d818, val=8 '\b')
>     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
> #8  0x801d0560 in kvm_arch_post_run (env=env@entry=0x80a55a60,
> run=run@entry=0xb6239000)
>     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
> #9  0x801cb05f in kvm_cpu_exec (env=env@entry=0x80a55a60) at
> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
> #10 0x80199d1e in qemu_kvm_cpu_thread_fn (arg=0x80a55a60) at
> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
> #11 0xb7a1fd9e in start_thread () from /lib/libpthread.so.0
> #12 0xb77bbbbe in clone () from /lib/libc.so.6
> 
> This is with kernel 3.5.0 and qemu-kvm-1.1.1. glibc is 2.16.0 built

It looks like general memory corruption.  Is this repeatable?  What's
the guest uptime when it happens (i.e. is it immediate?)

Jan, why are we calling cpu_set_apic_tpr() with kvm_irqchip_in_kernel?
Jan Kiszka July 26, 2012, 10:29 a.m. UTC | #2
On 2012-07-26 12:01, Avi Kivity wrote:
> On 07/26/2012 12:52 PM, Chris Clayton wrote:
>> On 07/19/12 19:23, Chris Clayton wrote:
>>> On 07/19/12 13:17, Avi Kivity wrote:
>>>> On 07/19/2012 03:14 PM, Chris Clayton wrote:
>>>>
>>>>>> Change of diagnostics, unfortunately. qemu-kvm-1.0.1 can, in fact,
>>>>>> crash
>>>>>> on 3.5.0-rc6 (and rc7). I didn't get it earlier because it takes many
>>>>>> times more invocations before the crash occurs with 1.0.1 and I
>>>>>> haven't
>>>>>> used qemu-kvm much in the past few weeks.
>>>>>>
>>>>>> I'm now checking whether I can get crashes (with 1.0.1 and/or
>>>>>> 1.1.0) on
>>>>>> linux-3.4.4. I'll report back in a day or two.
>>>>>
>>>>> I've started up qemu-kvm on kernel 3.4.4 many times and not seen a
>>>>> crash.
>>>>> That would indicate that the problem is in the kernel. However, I
>>>>> pulled
>>>>> the latest and greatest from Linus yesterday evening and I now can't
>>>>> get
>>>>> the crash there either, so whatever it was seems to have been fixed. If
>>>>> I checkout and build 3.5.0-rc[1..7], I can get the crash pretty
>>>>> quickly,
>>>>> so it's been fixed in the last few days.
>>>>
>>>> There were no kvm changes post-rc7.
>>>>
>>> Yes, I'm aware of that, Avi. This thread started because I was getting a
>>> crash in qemu-kvm, which I thought was only in v1.1.0. Later it turned
>>> out that the problem was also present in v1.0.1, but much harder to hit.
>>> However, it only ever happened with 3.5.0 kernels. 3.4.4, with either
>>> version of qemu-kvm, was stable. So then it seemed that the problem was
>>> in the kernel, (but not necessarily in the kvm code).
>>>
>>> Something that's changed since rc7 has either fixed the problem or made
>>> it much harder to hit. With rc7 and earlier I can recreate the crash
>>> quite easily with qemu-kvm-1.1.0 and with enough runs of 1.0.1. With
>>> rc7+, I haven't been able to get a crash at all.
>>>
>> Well, I'm getting the crash again, but this time I've managed to get a
>> backtrace:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0xb60ffb40 (LWP 9405)]
>> 0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
>> (gdb) bt
>> #0  0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
>> #1  0xb7e65333 in g_str_equal () from /usr/lib/libglib-2.0.so.0
>> #2  0xb7e6458d in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
>> #3  0x8014e2cf in type_table_lookup (name=0x802b0c50 "apic-common") at
>> qom/object.c:94
>> #4  type_get_by_name (name=name@entry=0x802b0c50 "apic-common") at
>> qom/object.c:149
>> #5  0x8014e933 in object_dynamic_cast (obj=obj@entry=0x80a5d818,
>> typename=typename@entry=0x802b0c50 "apic-common")
>>     at qom/object.c:416
>> #6  0x8014e8b9 in object_dynamic_cast_assert (obj=obj@entry=0x80a5d818,
>>     typename=typename@entry=0x802b0c50 "apic-common") at qom/object.c:478
>> #7  0x80193462 in cpu_set_apic_tpr (d=0x80a5d818, val=8 '\b')
>>     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
>> #8  0x801d0560 in kvm_arch_post_run (env=env@entry=0x80a55a60,
>> run=run@entry=0xb6239000)
>>     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
>> #9  0x801cb05f in kvm_cpu_exec (env=env@entry=0x80a55a60) at
>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
>> #10 0x80199d1e in qemu_kvm_cpu_thread_fn (arg=0x80a55a60) at
>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
>> #11 0xb7a1fd9e in start_thread () from /lib/libpthread.so.0
>> #12 0xb77bbbbe in clone () from /lib/libc.so.6
>>
>> This is with kernel 3.5.0 and qemu-kvm-1.1.1. glibc is 2.16.0 built
> 
> It looks like general memory corruption.  Is this repeatable?  What's
> the guest uptime when it happens (i.e. is it immediate?)
> 
> Jan, why are we calling cpu_set_apic_tpr() with kvm_irqchip_in_kernel?

To sync the userspace state with what the kernel maintains. Will end up
in kvm_apic_set_tpr which does precisely this. We always did, just the
QOM modeling is new.

Jan
Avi Kivity July 26, 2012, 10:45 a.m. UTC | #3
On 07/26/2012 01:29 PM, Jan Kiszka wrote:

>> It looks like general memory corruption.  Is this repeatable?  What's
>> the guest uptime when it happens (i.e. is it immediate?)
>> 
>> Jan, why are we calling cpu_set_apic_tpr() with kvm_irqchip_in_kernel?
> 
> To sync the userspace state with what the kernel maintains. Will end up
> in kvm_apic_set_tpr which does precisely this. We always did, just the
> QOM modeling is new.

We should move it to the general register synchronization code, there is
no reason to do this every exit (though the cost is likely minimal).
Jan Kiszka July 26, 2012, 10:49 a.m. UTC | #4
On 2012-07-26 12:45, Avi Kivity wrote:
> On 07/26/2012 01:29 PM, Jan Kiszka wrote:
> 
>>> It looks like general memory corruption.  Is this repeatable?  What's
>>> the guest uptime when it happens (i.e. is it immediate?)
>>>
>>> Jan, why are we calling cpu_set_apic_tpr() with kvm_irqchip_in_kernel?
>>
>> To sync the userspace state with what the kernel maintains. Will end up
>> in kvm_apic_set_tpr which does precisely this. We always did, just the
>> QOM modeling is new.
> 
> We should move it to the general register synchronization code, there is
> no reason to do this every exit (though the cost is likely minimal).

The cost is, well, was close to nothing. But I'm not sure about that QOM
type casting magic (and also its locking requirements, long-term).
However, if that is a problem, it's likely a much bigger one anyway.

Jan
Jan Kiszka July 26, 2012, 11:04 a.m. UTC | #5
On 2012-07-26 12:49, Jan Kiszka wrote:
> On 2012-07-26 12:45, Avi Kivity wrote:
>> On 07/26/2012 01:29 PM, Jan Kiszka wrote:
>>
>>>> It looks like general memory corruption.  Is this repeatable?  What's
>>>> the guest uptime when it happens (i.e. is it immediate?)
>>>>
>>>> Jan, why are we calling cpu_set_apic_tpr() with kvm_irqchip_in_kernel?
>>>
>>> To sync the userspace state with what the kernel maintains. Will end up
>>> in kvm_apic_set_tpr which does precisely this. We always did, just the
>>> QOM modeling is new.
>>
>> We should move it to the general register synchronization code, there is
>> no reason to do this every exit (though the cost is likely minimal).
> 
> The cost is, well, was close to nothing. But I'm not sure about that QOM
> type casting magic (and also its locking requirements, long-term).
> However, if that is a problem, it's likely a much bigger one anyway.

But, independent of this, we can likely move the whole kvm_arch_post_run
out of the exit path for kvm_irqchip_in_kernel() == true. The price is
that we create more deviation between both, but that should be
controllable. I will play with a patch.

Jan
Xiao Guangrong July 26, 2012, 11:10 a.m. UTC | #6
Hi Chris,

Could you please try this patch?
http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commit;h=ccebf448daf7964ee2aff7947c0bbe4c7962d059

On 07/26/2012 05:52 PM, Chris Clayton wrote:
> On 07/19/12 19:23, Chris Clayton wrote:
>> On 07/19/12 13:17, Avi Kivity wrote:
>>> On 07/19/2012 03:14 PM, Chris Clayton wrote:
>>>
>>>>> Change of diagnostics, unfortunately. qemu-kvm-1.0.1 can, in fact,
>>>>> crash
>>>>> on 3.5.0-rc6 (and rc7). I didn't get it earlier because it takes many
>>>>> times more invocations before the crash occurs with 1.0.1 and I haven't
>>>>> used qemu-kvm much in the past few weeks.
>>>>>
>>>>> I'm now checking whether I can get crashes (with 1.0.1 and/or 1.1.0) on
>>>>> linux-3.4.4. I'll report back in a day or two.
>>>>
>>>> I've started up qemu-kvm on kernel 3.4.4 many times and not seen a crash.
>>>> That would indicate that the problem is in the kernel. However, I pulled
>>>> the latest and greatest from Linus yesterday evening and I now can't get
>>>> the crash there either, so whatever it was seems to have been fixed. If
>>>> I checkout and build 3.5.0-rc[1..7], I can get the crash pretty quickly,
>>>> so it's been fixed in the last few days.
>>>
>>> There were no kvm changes post-rc7.
>>>
>> Yes, I'm aware of that, Avi. This thread started because I was getting a
>> crash in qemu-kvm, which I thought was only in v1.1.0. Later it turned
>> out that the problem was also present in v1.0.1, but much harder to hit.
>> However, it only ever happened with 3.5.0 kernels. 3.4.4, with either
>> version of qemu-kvm, was stable. So then it seemed that the problem was
>> in the kernel, (but not necessarily in the kvm code).
>>
>> Something that's changed since rc7 has either fixed the problem or made
>> it much harder to hit. With rc7 and earlier I can recreate the crash
>> quite easily with qemu-kvm-1.1.0 and with enough runs of 1.0.1. With
>> rc7+, I haven't been able to get a crash at all.
>>
> Well, I'm getting the crash again, but this time I've managed to get a backtrace:
> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xb60ffb40 (LWP 9405)]
> 0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
> (gdb) bt
> #0  0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
> #1  0xb7e65333 in g_str_equal () from /usr/lib/libglib-2.0.so.0
> #2  0xb7e6458d in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
> #3  0x8014e2cf in type_table_lookup (name=0x802b0c50 "apic-common") at qom/object.c:94
> #4  type_get_by_name (name=name@entry=0x802b0c50 "apic-common") at qom/object.c:149
> #5  0x8014e933 in object_dynamic_cast (obj=obj@entry=0x80a5d818, typename=typename@entry=0x802b0c50 "apic-common")
>     at qom/object.c:416
> #6  0x8014e8b9 in object_dynamic_cast_assert (obj=obj@entry=0x80a5d818,
>     typename=typename@entry=0x802b0c50 "apic-common") at qom/object.c:478
> #7  0x80193462 in cpu_set_apic_tpr (d=0x80a5d818, val=8 '\b')
>     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
> #8  0x801d0560 in kvm_arch_post_run (env=env@entry=0x80a55a60, run=run@entry=0xb6239000)
>     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
> #9  0x801cb05f in kvm_cpu_exec (env=env@entry=0x80a55a60) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
> #10 0x80199d1e in qemu_kvm_cpu_thread_fn (arg=0x80a55a60) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
> #11 0xb7a1fd9e in start_thread () from /lib/libpthread.so.0
> #12 0xb77bbbbe in clone () from /lib/libc.so.6
> 
> This is with kernel 3.5.0 and qemu-kvm-1.1.1. glibc is 2.16.0 built against 3.4.4 kernel headers. The glibc, the kernel headers and the kernel are vanilla and the only change to the qemu-kvm sources is:
> 
> --- qemu-kvm-1.1.0/configure~   2012-07-15 22:38:39.000000000 +0100
> +++ qemu-kvm-1.1.0/configure    2012-07-15 22:39:09.000000000 +0100
> @@ -2783,7 +2783,7 @@ int main(int argc, char **argv)
>  }
>  EOF
>    if ! compile_prog "" "" ; then
> -    CFLAGS+="-march=i486"
> +    CFLAGS+="-march=i686"
>    fi
>  fi
> 
> Please let me know of anything I can do to help track this down.
> 
> Thanks
> 
> Chris
> 
>> I'm not inclined to bisect to find out which patch provided the fix, but
>> this mail should at least close the mail thread down tidily.
>>
>> Chris
> 
Chris Clayton July 26, 2012, 11:58 a.m. UTC | #7
On 07/26/12 11:01, Avi Kivity wrote:
> On 07/26/2012 12:52 PM, Chris Clayton wrote:
>> On 07/19/12 19:23, Chris Clayton wrote:
>>> On 07/19/12 13:17, Avi Kivity wrote:
>>>> On 07/19/2012 03:14 PM, Chris Clayton wrote:
>>>>
>>>>>> Change of diagnostics, unfortunately. qemu-kvm-1.0.1 can, in fact,
>>>>>> crash
>>>>>> on 3.5.0-rc6 (and rc7). I didn't get it earlier because it takes many
>>>>>> times more invocations before the crash occurs with 1.0.1 and I
>>>>>> haven't
>>>>>> used qemu-kvm much in the past few weeks.
>>>>>>
>>>>>> I'm now checking whether I can get crashes (with 1.0.1 and/or
>>>>>> 1.1.0) on
>>>>>> linux-3.4.4. I'll report back in a day or two.
>>>>>
>>>>> I've started up qemu-kvm on kernel 3.4.4 many times and not seen a
>>>>> crash.
>>>>> That would indicate that the problem is in the kernel. However, I
>>>>> pulled
>>>>> the latest and greatest from Linus yesterday evening and I now can't
>>>>> get
>>>>> the crash there either, so whatever it was seems to have been fixed. If
>>>>> I checkout and build 3.5.0-rc[1..7], I can get the crash pretty
>>>>> quickly,
>>>>> so it's been fixed in the last few days.
>>>>
>>>> There were no kvm changes post-rc7.
>>>>
>>> Yes, I'm aware of that, Avi. This thread started because I was getting a
>>> crash in qemu-kvm, which I thought was only in v1.1.0. Later it turned
>>> out that the problem was also present in v1.0.1, but much harder to hit.
>>> However, it only ever happened with 3.5.0 kernels. 3.4.4, with either
>>> version of qemu-kvm, was stable. So then it seemed that the problem was
>>> in the kernel, (but not necessarily in the kvm code).
>>>
>>> Something that's changed since rc7 has either fixed the problem or made
>>> it much harder to hit. With rc7 and earlier I can recreate the crash
>>> quite easily with qemu-kvm-1.1.0 and with enough runs of 1.0.1. With
>>> rc7+, I haven't been able to get a crash at all.
>>>
>> Well, I'm getting the crash again, but this time I've managed to get a
>> backtrace:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0xb60ffb40 (LWP 9405)]
>> 0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
>> (gdb) bt
>> #0  0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
>> #1  0xb7e65333 in g_str_equal () from /usr/lib/libglib-2.0.so.0
>> #2  0xb7e6458d in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
>> #3  0x8014e2cf in type_table_lookup (name=0x802b0c50 "apic-common") at
>> qom/object.c:94
>> #4  type_get_by_name (name=name@entry=0x802b0c50 "apic-common") at
>> qom/object.c:149
>> #5  0x8014e933 in object_dynamic_cast (obj=obj@entry=0x80a5d818,
>> typename=typename@entry=0x802b0c50 "apic-common")
>>      at qom/object.c:416
>> #6  0x8014e8b9 in object_dynamic_cast_assert (obj=obj@entry=0x80a5d818,
>>      typename=typename@entry=0x802b0c50 "apic-common") at qom/object.c:478
>> #7  0x80193462 in cpu_set_apic_tpr (d=0x80a5d818, val=8 '\b')
>>      at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
>> #8  0x801d0560 in kvm_arch_post_run (env=env@entry=0x80a55a60,
>> run=run@entry=0xb6239000)
>>      at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
>> #9  0x801cb05f in kvm_cpu_exec (env=env@entry=0x80a55a60) at
>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
>> #10 0x80199d1e in qemu_kvm_cpu_thread_fn (arg=0x80a55a60) at
>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
>> #11 0xb7a1fd9e in start_thread () from /lib/libpthread.so.0
>> #12 0xb77bbbbe in clone () from /lib/libc.so.6
>>
>> This is with kernel 3.5.0 and qemu-kvm-1.1.1. glibc is 2.16.0 built
>
> It looks like general memory corruption.  Is this repeatable?  What's
> the guest uptime when it happens (i.e. is it immediate?)

I've just done 10 runs of WinXP SP3 and 5 of them crashed. Three crashed 
early as XP was starting up - well before the desktop would have 
appeared. The other two crashed as XP was closing down, having been 
running for a few minutes (but not doing much).

The error messages seen through dmesg are:

qemu-kvm[12778] general protection ip:b6c43d77 sp:b5e800fc error:0 in 
libc-2.16.so[b6b06000+1b4000]
qemu-kvm[12813] general protection ip:b6bf6d77 sp:b54ff0fc error:0 in 
libc-2.16.so[b6ab9000+1b4000]
qemu-kvm[12986] general protection ip:b6cd3d77 sp:b55ff0fc error:0 in 
libc-2.16.so[b6b96000+1b4000]
qemu-kvm[13045] general protection ip:b6c91d77 sp:b54ff0fc error:0 in 
libc-2.16.so[b6b54000+1b4000]
qemu-kvm[13225] general protection ip:b6c5bd77 sp:b54ff0fc error:0 in 
libc-2.16.so[b6b1e000+1b4000]

The other 5 were OK, although I only did a bit of web browsing for a few
minutes with IE.

>
> Jan, why are we calling cpu_set_apic_tpr() with kvm_irqchip_in_kernel?
>
>

Avi Kivity July 26, 2012, 12:07 p.m. UTC | #8
On 07/26/2012 02:58 PM, Chris Clayton wrote:

>> It looks like general memory corruption.  Is this repeatable?  What's
>> the guest uptime when it happens (i.e. is it immediate?)
> 
> I've just done 10 runs of WinXP SP3 and 5 of them crashed. Three crashed
> early as XP was starting up - well before the desktop would have
> appeared. The other two crashed as XP was closing down, having been
> running for a few minutes (but not doing much).
> 
> The error messages seen through dmesg are:
> 
> qemu-kvm[12778] general protection ip:b6c43d77 sp:b5e800fc error:0 in
> libc-2.16.so[b6b06000+1b4000]
> qemu-kvm[12813] general protection ip:b6bf6d77 sp:b54ff0fc error:0 in
> libc-2.16.so[b6ab9000+1b4000]
> qemu-kvm[12986] general protection ip:b6cd3d77 sp:b55ff0fc error:0 in
> libc-2.16.so[b6b96000+1b4000]
> qemu-kvm[13045] general protection ip:b6c91d77 sp:b54ff0fc error:0 in
> libc-2.16.so[b6b54000+1b4000]
> qemu-kvm[13225] general protection ip:b6c5bd77 sp:b54ff0fc error:0 in
> libc-2.16.so[b6b1e000+1b4000]
> 
> The other 5 were OK, although I only did a bit of web browsing for a few
> minutes with IE.

Failures always in the same place (I'm guessing the variations are due to
PIE -- please configure with --disable-pie for future tests).

Please generate a core and look around, esp. in frame 3
(type_table_lookup).  Also try to dissect type_table (you may need to
install the glib debug symbols for this).
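
The "always in the same place" reading can be checked directly from the dmesg lines: subtract the libc mapping base (the first number inside the square brackets) from the faulting ip and compare the results. The snippet below is only a sketch of that arithmetic. The helper name fault_offsets and the regular expression are made up for illustration, and the dmesg text is the five lines from the report with the wrapped lines rejoined.

```python
import re

# The five dmesg lines from the report, with each wrapped line rejoined.
DMESG = """\
qemu-kvm[12778] general protection ip:b6c43d77 sp:b5e800fc error:0 in libc-2.16.so[b6b06000+1b4000]
qemu-kvm[12813] general protection ip:b6bf6d77 sp:b54ff0fc error:0 in libc-2.16.so[b6ab9000+1b4000]
qemu-kvm[12986] general protection ip:b6cd3d77 sp:b55ff0fc error:0 in libc-2.16.so[b6b96000+1b4000]
qemu-kvm[13045] general protection ip:b6c91d77 sp:b54ff0fc error:0 in libc-2.16.so[b6b54000+1b4000]
qemu-kvm[13225] general protection ip:b6c5bd77 sp:b54ff0fc error:0 in libc-2.16.so[b6b1e000+1b4000]
"""

# Hypothetical pattern for "[pid] ... ip:<hex> ... in <object>[<base>+<size>]".
FAULT_RE = re.compile(
    r"\[(?P<pid>\d+)\].*?ip:(?P<ip>[0-9a-f]+)"
    r".*?in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offsets(text):
    """Return (pid, object, offset) per fault line, where offset = ip - mapping base."""
    results = []
    for m in FAULT_RE.finditer(text):
        ip = int(m.group("ip"), 16)
        base = int(m.group("base"), 16)
        results.append((int(m.group("pid")), m.group("obj"), ip - base))
    return results


if __name__ == "__main__":
    for pid, obj, off in fault_offsets(DMESG):
        # Every line prints the same offset: libc-2.16.so+0x13dd77
        print(f"pid {pid}: {obj}+{off:#x}")
```

All five faults land at offset 0x13dd77 into libc-2.16.so, so only the load address differs between runs. The gdb frame #0 address 0xb7803d77 ends in the same d77, which is consistent with the same instruction in __strcmp_sse4_2, though that alone does not prove it.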
Jan Kiszka July 26, 2012, 12:09 p.m. UTC | #9
On 2012-07-26 13:58, Chris Clayton wrote:
> On 07/26/12 11:01, Avi Kivity wrote:
>> On 07/26/2012 12:52 PM, Chris Clayton wrote:
>>> On 07/19/12 19:23, Chris Clayton wrote:
>>>> On 07/19/12 13:17, Avi Kivity wrote:
>>>>> On 07/19/2012 03:14 PM, Chris Clayton wrote:
>>>>>
>>>>>>> Change of diagnostics, unfortunately. qemu-kvm-1.0.1 can, in fact,
>>>>>>> crash
>>>>>>> on 3.5.0-rc6 (and rc7). I didn't get it earlier because it takes many
>>>>>>> times more invocations before the crash occurs with 1.0.1 and I
>>>>>>> haven't
>>>>>>> used qemu-kvm much in the past few weeks.
>>>>>>>
>>>>>>> I'm now checking whether I can get crashes (with 1.0.1 and/or
>>>>>>> 1.1.0) on
>>>>>>> linux-3.4.4. I'll report back in a day or two.
>>>>>>
>>>>>> I've started up qemu-kvm on kernel 3.4.4 many times and not seen a
>>>>>> crash.
>>>>>> That would indicate that the problem is in the kernel. However, I
>>>>>> pulled
>>>>>> the latest and greatest from Linus yesterday evening and I now can't
>>>>>> get
>>>>>> the crash there either, so whatever it was seems to have been fixed. If
>>>>>> I checkout and build 3.5.0-rc[1..7], I can get the crash pretty
>>>>>> quickly,
>>>>>> so it's been fixed in the last few days.
>>>>>
>>>>> There were no kvm changes post-rc7.
>>>>>
>>>> Yes, I'm aware of that, Avi. This thread started because I was getting a
>>>> crash in qemu-kvm, which I thought was only in v1.1.0. Later it turned
>>>> out that the problem was also present in v1.0.1, but much harder to hit.
>>>> However, it only ever happened with 3.5.0 kernels. 3.4.4, with either
>>>> version of qemu-kvm, was stable. So then it seemed that the problem was
>>>> in the kernel, (but not necessarily in the kvm code).
>>>>
>>>> Something that's changed since rc7 has either fixed the problem or made
>>>> it much harder to hit. With rc7 and earlier I can recreate the crash
>>>> quite easily with qemu-kvm-1.1.0 and with enough runs of 1.0.1. With
>>>> rc7+, I haven't been able to get a crash at all.
>>>>
>>> Well, I'm getting the crash again, but this time I've managed to get a
>>> backtrace:
>>>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> [Switching to Thread 0xb60ffb40 (LWP 9405)]
>>> 0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
>>> (gdb) bt
>>> #0  0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
>>> #1  0xb7e65333 in g_str_equal () from /usr/lib/libglib-2.0.so.0
>>> #2  0xb7e6458d in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
>>> #3  0x8014e2cf in type_table_lookup (name=0x802b0c50 "apic-common") at
>>> qom/object.c:94
>>> #4  type_get_by_name (name=name@entry=0x802b0c50 "apic-common") at
>>> qom/object.c:149
>>> #5  0x8014e933 in object_dynamic_cast (obj=obj@entry=0x80a5d818,
>>> typename=typename@entry=0x802b0c50 "apic-common")
>>>      at qom/object.c:416
>>> #6  0x8014e8b9 in object_dynamic_cast_assert (obj=obj@entry=0x80a5d818,
>>>      typename=typename@entry=0x802b0c50 "apic-common") at qom/object.c:478
>>> #7  0x80193462 in cpu_set_apic_tpr (d=0x80a5d818, val=8 '\b')
>>>      at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
>>> #8  0x801d0560 in kvm_arch_post_run (env=env@entry=0x80a55a60,
>>> run=run@entry=0xb6239000)
>>>      at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
>>> #9  0x801cb05f in kvm_cpu_exec (env=env@entry=0x80a55a60) at
>>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
>>> #10 0x80199d1e in qemu_kvm_cpu_thread_fn (arg=0x80a55a60) at
>>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
>>> #11 0xb7a1fd9e in start_thread () from /lib/libpthread.so.0
>>> #12 0xb77bbbbe in clone () from /lib/libc.so.6
>>>
>>> This is with kernel 3.5.0 and qemu-kvm-1.1.1. glibc is 2.16.0 built
>>
>> It looks like general memory corruption.  Is this repeatable?  What's
>> the guest uptime when it happens (i.e. is it immediate?)
> 
> I've just done 10 runs of WinXP SP3 and 5 of them crashed. Three crashed 

Hmm, I'm running various XP SP3 here against qemu.git (now widely
equivalent to qemu-kvm), and I saw no crashes at all.

> early as XP was starting up - well before the desktop would have 
> appeared. The other two crashed as XP was closing down, having been 
> running for a few minutes (but not doing much).
> 
> The error messages seen through dmesg are:
> 
> qemu-kvm[12778] general protection ip:b6c43d77 sp:b5e800fc error:0 in 
> libc-2.16.so[b6b06000+1b4000]
> qemu-kvm[12813] general protection ip:b6bf6d77 sp:b54ff0fc error:0 in 
> libc-2.16.so[b6ab9000+1b4000]
> qemu-kvm[12986] general protection ip:b6cd3d77 sp:b55ff0fc error:0 in 
> libc-2.16.so[b6b96000+1b4000]
> qemu-kvm[13045] general protection ip:b6c91d77 sp:b54ff0fc error:0 in 
> libc-2.16.so[b6b54000+1b4000]
> qemu-kvm[13225] general protection ip:b6c5bd77 sp:b54ff0fc error:0 in 
> libc-2.16.so[b6b1e000+1b4000]

Oh, you are running 32-bit userland? Also 32-bit kernel? Most of us do
64-on-64.

Jan
Chris Clayton July 26, 2012, 1:49 p.m. UTC | #10
On 07/26/12 12:10, Xiao Guangrong wrote:
> Hi Chris,
>
> Could you please try this patch?
> http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commit;h=ccebf448daf7964ee2aff7947c0bbe4c7962d059
>

Sorry, that patch does not fix the crashes.

> On 07/26/2012 05:52 PM, Chris Clayton wrote:
>> On 07/19/12 19:23, Chris Clayton wrote:
>>> On 07/19/12 13:17, Avi Kivity wrote:
>>>> On 07/19/2012 03:14 PM, Chris Clayton wrote:
>>>>
>>>>>> Change of diagnostics, unfortunately. qemu-kvm-1.0.1 can, in fact,
>>>>>> crash
>>>>>> on 3.5.0-rc6 (and rc7). I didn't get it earlier because it takes many
>>>>>> times more invocations before the crash occurs with 1.0.1 and I haven't
>>>>>> used qemu-kvm much in the past few weeks.
>>>>>>
>>>>>> I'm now checking whether I can get crashes (with 1.0.1 and/or 1.1.0) on
>>>>>> linux-3.4.4. I'll report back in a day or two.
>>>>>
>>>>> I've started up qemu-kvm on kernel 3.4.4 many times and not seen a crash.
>>>>> That would indicate that the problem is in the kernel. However, I pulled
>>>>> the latest and greatest from Linus yesterday evening and I now can't get
>>>>> the crash there either, so whatever it was seems to have been fixed. If
>>>>> I checkout and build 3.5.0-rc[1..7], I can get the crash pretty quickly,
>>>>> so it's been fixed in the last few days.
>>>>
>>>> There were no kvm changes post-rc7.
>>>>
>>> Yes, I'm aware of that, Avi. This thread started because I was getting a
>>> crash in qemu-kvm, which I thought was only in v1.1.0. Later it turned
>>> out the the problem was also present in v1.0.1, but much harder to hit.
>>> However, it only ever happened with 3.5.0 kernels. 3.4.4, with either
>>> version of qemu-kvm, was stable. So then it seemed that the problem was
>>> in the kernel, (but not necessarily in the kvm code).
>>>
>>> Something that's changed since rc7 has either fixed the problem or made
>>> it much harder to hit. With rc7 and earlier I can recreate the crash
>>> quite easily with qemu-kvm-1.1.0 and with enough runs of 1.0.1. With
>>> rc7+, I haven't been able to get a crash at all.
>>>
>> Well, I'm getting the crash again, but this time I've managed to get a backtrace:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0xb60ffb40 (LWP 9405)]
>> 0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
>> (gdb) bt
>> #0  0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
>> #1  0xb7e65333 in g_str_equal () from /usr/lib/libglib-2.0.so.0
>> #2  0xb7e6458d in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
>> #3  0x8014e2cf in type_table_lookup (name=0x802b0c50 "apic-common") at qom/object.c:94
>> #4  type_get_by_name (name=name@entry=0x802b0c50 "apic-common") at qom/object.c:149
>> #5  0x8014e933 in object_dynamic_cast (obj=obj@entry=0x80a5d818, typename=typename@entry=0x802b0c50 "apic-common")
>>      at qom/object.c:416
>> #6  0x8014e8b9 in object_dynamic_cast_assert (obj=obj@entry=0x80a5d818,
>>      typename=typename@entry=0x802b0c50 "apic-common") at qom/object.c:478
>> #7  0x80193462 in cpu_set_apic_tpr (d=0x80a5d818, val=8 '\b')
>>      at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
>> #8  0x801d0560 in kvm_arch_post_run (env=env@entry=0x80a55a60, run=run@entry=0xb6239000)
>>      at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
>> #9  0x801cb05f in kvm_cpu_exec (env=env@entry=0x80a55a60) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
>> #10 0x80199d1e in qemu_kvm_cpu_thread_fn (arg=0x80a55a60) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
>> #11 0xb7a1fd9e in start_thread () from /lib/libpthread.so.0
>> #12 0xb77bbbbe in clone () from /lib/libc.so.6
>>
>> This is with kernel 3.5.0 and qemu-kvm-1.1.1. glibc is 2.16.0 built against 3.4.4 kernel headers. The glibc, the kernel headers and the kernel are vanilla and the only change to the qemu-kvm sources is:
>>
>> --- qemu-kvm-1.1.0/configure~   2012-07-15 22:38:39.000000000 +0100
>> +++ qemu-kvm-1.1.0/configure    2012-07-15 22:39:09.000000000 +0100
>> @@ -2783,7 +2783,7 @@ int main(int argc, char **argv)
>>   }
>>   EOF
>>     if ! compile_prog "" "" ; then
>> -    CFLAGS+="-march=i486"
>> +    CFLAGS+="-march=i686"
>>     fi
>>   fi
>>
>> Please let me know of anything I can do to help track this down.
>>
>> Thanks
>>
>> Chris
>>
>>> I'm not inclined to bisect to find out which patch provided the fix, but
>>> this mail should at least close the mail thread down tidily.
>>>
>>> Chris
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>>
>
>

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chris Clayton July 26, 2012, 11:22 p.m. UTC | #11
On 07/26/12 13:07, Avi Kivity wrote:
> On 07/26/2012 02:58 PM, Chris Clayton wrote:
>
>>> It looks like general memory corruption.  Is this repeatable?  What's
>>> the guest uptime when it happens (i.e. is it immediate?)
>>
>> I've just done 10 runs of WinXP SP3 and 5 of them crashed. Three crashed
>> early as XP was starting up - well before the desktop would have
>> appeared. The other two crashed as XP was closing down, having been
>> running for a few minutes (but not doing much).
>>
>> The error messages seen through dmesg are:
>>
>> qemu-kvm[12778] general protection ip:b6c43d77 sp:b5e800fc error:0 in
>> libc-2.16.so[b6b06000+1b4000]
>> qemu-kvm[12813] general protection ip:b6bf6d77 sp:b54ff0fc error:0 in
>> libc-2.16.so[b6ab9000+1b4000]
>> qemu-kvm[12986] general protection ip:b6cd3d77 sp:b55ff0fc error:0 in
>> libc-2.16.so[b6b96000+1b4000]
>> qemu-kvm[13045] general protection ip:b6c91d77 sp:b54ff0fc error:0 in
>> libc-2.16.so[b6b54000+1b4000]
>> qemu-kvm[13225] general protection ip:b6c5bd77 sp:b54ff0fc error:0 in
>> libc-2.16.so[b6b1e000+1b4000]
>>
>> The other 5 were OK, although I only did a bit of web browsing for  few
>> minutes with IE.
>
> Failures always in the same place (I'm guess the variations are due to
> PIE -- please configure with --disable-pie for future tests).
>
> Please generate a core and look around, esp. in frame 3
> (type_table_lookup).  Also try to dissect type_table (you may need to
> install the glib debug symbols for this).
>
>
>
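The "always in the same place" observation can be checked from the dmesg lines alone. Below is a minimal sketch, assuming the bracketed value in each line is the base address of the libc mapping; the helper names are made up for illustration. Subtracting the base from the ip gives the offset of the faulting instruction inside libc, which should not depend on where the library happened to be loaded.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of a faulting instruction pointer inside its mapping.
 * ip is the "ip:" value and map_base is the bracketed base address
 * taken from one dmesg "general protection" line. */
static uint32_t fault_offset(uint32_t ip, uint32_t map_base)
{
    assert(ip >= map_base);
    return ip - map_base;
}

/* The five ip/base pairs from the dmesg output quoted in this thread. */
static const uint32_t crash_ip[5] = {
    0xb6c43d77u, 0xb6bf6d77u, 0xb6cd3d77u, 0xb6c91d77u, 0xb6c5bd77u
};
static const uint32_t crash_base[5] = {
    0xb6b06000u, 0xb6ab9000u, 0xb6b96000u, 0xb6b54000u, 0xb6b1e000u
};

/* Returns 1 when every crash maps to the same offset inside libc. */
static int all_crashes_same_offset(void)
{
    uint32_t first = fault_offset(crash_ip[0], crash_base[0]);
    for (int i = 1; i < 5; i++) {
        if (fault_offset(crash_ip[i], crash_base[i]) != first)
            return 0;
    }
    return 1;
}
```

All five pairs give offset 0x13dd77, so the absolute addresses differ between runs only because libc was loaded at a different base each time.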
Mmm, I'm sailing out of my comfort zone here, but I've built a debug 
version of glib and trapped another crash. The backtrace is:

(gdb) bt
#0  0xb7822d77 in __strcmp_sse4_2 () from /lib/libc.so.6
#1  0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at ghash.c:1704
#2  0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800, key=0x8319b82, hash_return=0xb60ff178)
     at ghash.c:422
#3  0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800, key=key@entry=0x8319b82) at ghash.c:1074
#4  0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at qom/object.c:94
#5  type_get_by_name (name=name@entry=0x8319b82 "apic-common") at qom/object.c:149
#6  0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a44818, typename=typename@entry=0x8319b82 "apic-common")
     at qom/object.c:416
#7  0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a44818,
     typename=typename@entry=0x8319b82 "apic-common") at qom/object.c:478
#8  0x08192c1b in cpu_set_apic_tpr (d=0x8a44818, val=8 '\b')
     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
#9  0x081cb86c in kvm_arch_post_run (env=env@entry=0x8a3ca60, run=run@entry=0xb6258000)
     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
#10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a3ca60) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a3ca60) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0
#13 0xb77dabbe in clone () from /lib/libc.so.6

Inspecting the args passed into g_str_equal shows:

(gdb) print (gchar *) 0x8a0cd58
$12 = (gchar *) 0x8a0cd58 "apic-common"
(gdb) print (gchar *) 0x8319b82
$13 = (gchar *) 0x8319b82 "apic-common"

So it seems odd that glibc's implementation of strcmp should crash with 
two equal strings. As I say, however, I'm a bit out of my comfort zone 
here, so I may be missing something.

I wouldn't know how to go about dissecting type_table, which I assume is 
the hash_table arg passed into g_hash_table_lookup, so advice on how to 
do that and what I am looking for (NULL pointer?) would be helpful.
Chris Clayton July 27, 2012, 10:46 a.m. UTC | #12
On 07/27/12 00:22, Chris Clayton wrote:
> On 07/26/12 13:07, Avi Kivity wrote:
>> On 07/26/2012 02:58 PM, Chris Clayton wrote:
>>
>>>> It looks like general memory corruption.  Is this repeatable?  What's
>>>> the guest uptime when it happens (i.e. is it immediate?)
>>>
>>> I've just done 10 runs of WinXP SP3 and 5 of them crashed. Three crashed
>>> early as XP was starting up - well before the desktop would have
>>> appeared. The other two crashed as XP was closing down, having been
>>> running for a few minutes (but not doing much).
>>>
>>> The error messages seen through dmesg are:
>>>
>>> qemu-kvm[12778] general protection ip:b6c43d77 sp:b5e800fc error:0 in
>>> libc-2.16.so[b6b06000+1b4000]
>>> qemu-kvm[12813] general protection ip:b6bf6d77 sp:b54ff0fc error:0 in
>>> libc-2.16.so[b6ab9000+1b4000]
>>> qemu-kvm[12986] general protection ip:b6cd3d77 sp:b55ff0fc error:0 in
>>> libc-2.16.so[b6b96000+1b4000]
>>> qemu-kvm[13045] general protection ip:b6c91d77 sp:b54ff0fc error:0 in
>>> libc-2.16.so[b6b54000+1b4000]
>>> qemu-kvm[13225] general protection ip:b6c5bd77 sp:b54ff0fc error:0 in
>>> libc-2.16.so[b6b1e000+1b4000]
>>>
>>> The other 5 were OK, although I only did a bit of web browsing for  few
>>> minutes with IE.
>>
>> Failures always in the same place (I'm guess the variations are due to
>> PIE -- please configure with --disable-pie for future tests).
>>
>> Please generate a core and look around, esp. in frame 3
>> (type_table_lookup).  Also try to dissect type_table (you may need to
>> install the glib debug symbols for this).
>>
>>
>>
<snip>
Here's another backtrace and source listing of the failing function, 
following build and installation of libc (2.16) with debugging turned 
on. I'm afraid it's beyond my current knowledge to know what this might 
be telling us.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb60ffb40 (LWP 6515)]
__strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
217             movdqu  (%edx), %xmm2
(gdb) generate-core-file
Saved corefile core.6509
(gdb) bt
#0  __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
#1  0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at ghash.c:1704
#2  0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800, key=0x8319b82, hash_return=0xb60ff178)
     at ghash.c:422
#3  0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800, key=key@entry=0x8319b82) at ghash.c:1074
#4  0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at qom/object.c:94
#5  type_get_by_name (name=name@entry=0x8319b82 "apic-common") at qom/object.c:149
#6  0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a44818, typename=typename@entry=0x8319b82 "apic-common")
     at qom/object.c:416
#7  0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a44818,
     typename=typename@entry=0x8319b82 "apic-common") at qom/object.c:478
#8  0x08192c1b in cpu_set_apic_tpr (d=0x8a44818, val=7 '\a')
     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
#9  0x081cb86c in kvm_arch_post_run (env=env@entry=0x8a3ca60, run=run@entry=0xb6271000)
     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
#10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a3ca60) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a3ca60) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0
#13 0xb77e45ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132
(gdb) print *(0x8a0cd58)
$1 = 1667854433
(gdb) print (char*) 0x8a0cd58
$2 = 0x8a0cd58 "apic-common"
(gdb) list __strcmp_sse4_2
201             PUSH    (REM)
202     #endif
203     #if defined USE_AS_STRCASECMP_L || defined USE_AS_STRNCASECMP_L
204             PUSH    (%edi)
205     #endif
206             mov     STR1(%esp), %edx
207             mov     STR2(%esp), %eax
208     #if defined USE_AS_STRNCMP || defined USE_AS_STRNCASECMP_L
209             movl    CNT(%esp), REM
210             test    REM, REM
(gdb) list
211             je      L(eq)
212     #endif
213             mov     %dx, %cx
214             and     $0xfff, %cx
215             cmp     $0xff0, %cx
216             ja      L(first4bytes)
217             movdqu  (%edx), %xmm2
218             mov     %eax, %ecx
219             and     $0xfff, %ecx
220             cmp     $0xff0, %ecx
(gdb) list
221             ja      L(first4bytes)
222     #if defined USE_AS_STRCASECMP_L || defined USE_AS_STRNCASECMP_L
223     # define TOLOWER(reg1, reg2) \
224             movdqa  reg1, %xmm3;            \
225             movdqa  UCHIGH_reg, %xmm4;      \
226             movdqa  reg2, %xmm5;            \
227             movdqa  UCHIGH_reg, %xmm6;      \
228             pcmpgtb UCLOW_reg, %xmm3;       \
229             pcmpgtb reg1, %xmm4;            \
230             pcmpgtb UCLOW_reg, %xmm5;       \
(gdb)
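
For what it's worth, the integer that "print *(0x8a0cd58)" shows is consistent with the string output: on a little-endian machine the first four bytes of "apic-common" read as one 32-bit value give 1667854433 (0x63697061). A small sketch of that arithmetic, with a made-up helper name, written with shifts so it does not depend on the byte order of the machine running it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine four bytes into a 32-bit value the way a little-endian
 * CPU does when it loads an int from memory: byte 0 is the least
 * significant byte. */
static uint32_t le32_from_bytes(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Passing the bytes 'a', 'p', 'i', 'c' (0x61, 0x70, 0x69, 0x63) yields 0x63697061, i.e. 1667854433. So the memory behind v1 looks intact, and the bare integer is not a sign of corruption.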

I'll stop sending in backtraces etc. for now, in the hope that someone 
will advise me on how I might better direct my efforts.

Thanks for your help so far.
Chris Clayton July 27, 2012, 7:04 p.m. UTC | #13
On 07/27/12 19:08, Eric Northup wrote:
> Could you include the output of "info registers" at the point where it
> crashed?
>

Here you go:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb6a78b40 (LWP 13249)]
__strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
217             movdqu  (%edx), %xmm2
(gdb) bt
#0  __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
#1  0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at ghash.c:1704
#2  0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800, key=0x8319b82, hash_return=0xb6a78178)
     at ghash.c:422
#3  0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800, key=key@entry=0x8319b82) at ghash.c:1074
#4  0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at qom/object.c:94
#5  type_get_by_name (name=name@entry=0x8319b82 "apic-common") at qom/object.c:149
#6  0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a313e0, typename=typename@entry=0x8319b82 "apic-common")
     at qom/object.c:416
#7  0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a313e0,
     typename=typename@entry=0x8319b82 "apic-common") at qom/object.c:478
#8  0x08192c1b in cpu_set_apic_tpr (d=0x8a313e0, val=13 '\r')
     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
#9  0x081cb86c in kvm_arch_post_run (env=env@entry=0x8a29370, run=run@entry=0xb6274000)
     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
#10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a29370) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a29370) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0
#13 0xb77e45ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132
(gdb) info registers
eax            0x8319b82        137468802
ecx            0xd58    3416
edx            0x8a0cd58        144756056
ebx            0xb7f7f2c4       -1208487228
esp            0xb6a780ec       0xb6a780ec
ebp            0xb6a78118       0xb6a78118
esi            0x8a313e0        144905184
edi            0xc513   50451
eip            0xb7824f77       0xb7824f77 <__strcmp_sse4_2+23>
eflags         0x10283  [ CF SF IF RF ]
cs             0x73     115
ss             0x7b     123
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x33     51
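
One thing the register dump does seem to rule out is a read running off the end of a page. As far as I can read lines 213-216 of the listing, strcmp takes the low 12 bits of the first pointer (edx) and only does the 16-byte movdqu load when that page offset is no greater than 0xff0; ecx = 0xd58 above is exactly that offset for edx = 0x8a0cd58. A sketch of the same check in C, assuming 4 KiB pages (the function names are mine, not glibc's):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET_MASK 0xfffu  /* low 12 bits: offset within a 4 KiB page */
#define LAST_SAFE_OFFSET 0xff0u  /* 0xff0 + 16 bytes ends exactly at the page end */

/* Offset of an address within its 4 KiB page
 * (mirrors "mov %dx, %cx; and $0xfff, %cx"). */
static uint32_t page_offset(uint32_t addr)
{
    return addr & PAGE_OFFSET_MASK;
}

/* Nonzero when a 16-byte load at addr could cross into the next page,
 * i.e. when "cmp $0xff0, %cx; ja L(first4bytes)" would branch away
 * from the movdqu on line 217. */
static int takes_slow_path(uint32_t addr)
{
    return page_offset(addr) > LAST_SAFE_OFFSET;
}
```

For 0x8a0cd58 the offset is 0xd58, well below 0xff0, so the 16-byte load stays inside the page that holds the string and an unmapped neighbouring page would not explain the fault.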


Avi Kivity July 29, 2012, 12:42 p.m. UTC | #14
On 07/27/2012 10:04 PM, Chris Clayton wrote:
> On 07/27/12 19:08, Eric Northup wrote:
>> Could you include the output of "info registers" at the point where it
>> crashed?
>>
> 
> Here you go:
> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xb6a78b40 (LWP 13249)]
> __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
> 217             movdqu  (%edx), %xmm2
> (gdb) bt
> #0  __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
> #1  0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at ghash.c:1704
> #2  0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800,
> key=0x8319b82, hash_return=0xb6a78178)
>     at ghash.c:422
> #3  0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800,
> key=key@entry=0x8319b82) at ghash.c:1074
> #4  0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at
> qom/object.c:94
> #5  type_get_by_name (name=name@entry=0x8319b82 "apic-common") at
> qom/object.c:149
> #6  0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a313e0,
> typename=typename@entry=0x8319b82 "apic-common")
>     at qom/object.c:416
> #7  0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a313e0,
>     typename=typename@entry=0x8319b82 "apic-common") at qom/object.c:478
> #8  0x08192c1b in cpu_set_apic_tpr (d=0x8a313e0, val=13 '\r')
>     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
> #9  0x081cb86c in kvm_arch_post_run (env=env@entry=0x8a29370,
> run=run@entry=0xb6274000)
>     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
> #10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a29370) at
> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
> #11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a29370) at
> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
> #12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0
> #13 0xb77e45ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132
> (gdb) info registers
> eax            0x8319b82        137468802
> ecx            0xd58    3416
> edx            0x8a0cd58        144756056
> ebx            0xb7f7f2c4       -1208487228
> esp            0xb6a780ec       0xb6a780ec
> ebp            0xb6a78118       0xb6a78118
> esi            0x8a313e0        144905184
> edi            0xc513   50451
> eip            0xb7824f77       0xb7824f77 <__strcmp_sse4_2+23>
> eflags         0x10283  [ CF SF IF RF ]
> cs             0x73     115
> ss             0x7b     123
> ds             0x0      0
> es             0x0      0
> fs             0x0      0
> gs             0x33     51
> 

ds shouldn't be zero for a 32-bit process.

But that should have crashed *much* earlier, ds is accessed all the time.
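
To spell out why ds = 0 stands out: an x86 segment selector packs a descriptor table index (bits 15..3), a table indicator (bit 2) and a requested privilege level (bits 1..0). A selector with index 0 and TI 0 is a null selector, and a memory access through a null data segment raises a general protection fault in protected mode, which matches the dmesg lines. A minimal decoding sketch (struct and function names are only illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an x86 segment selector. */
struct selector {
    unsigned index; /* descriptor table index, bits 15..3 */
    unsigned ti;    /* table indicator, bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;   /* requested privilege level, bits 1..0 */
};

static struct selector decode_selector(uint16_t sel)
{
    struct selector s;
    s.index = (unsigned)sel >> 3;
    s.ti    = ((unsigned)sel >> 2) & 1u;
    s.rpl   = (unsigned)sel & 3u;
    return s;
}

/* A null selector has index 0 and TI 0; the RPL bits are ignored. */
static int is_null_selector(uint16_t sel)
{
    return ((unsigned)sel & ~0x3u) == 0;
}
```

Applied to the dump: cs = 0x73 is GDT entry 14 at RPL 3, ss = 0x7b is GDT entry 15 at RPL 3, gs = 0x33 is GDT entry 6 at RPL 3, while ds, es and fs are null. I would expect ds and es to hold the same user data selector as ss.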

Please add the following snippet to the beginning of kvm_arch_post_run():

{
    unsigned short ds;
    asm("mov %%ds, %0" : "=rm"(ds));
    assert(ds != 0);
}

If the assert triggers, then kvm corrupted the segment registers.  If
not, corruption happens somewhere above.
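
The same read can be tried outside qemu first. This is only a sketch, assuming GCC-style inline asm on x86; the struct and function names are made up, and it uses an "=r" constraint on a local where the snippet above uses "=rm". It reads cs, ds and es into a struct instead of asserting, since a 64-bit process may legitimately see ds = 0 and the ds != 0 test is only meaningful for a 32-bit build like the one in this thread.

```c
#include <assert.h>

/* Snapshot of a few segment registers of the calling thread. */
struct seg_regs {
    unsigned short cs, ds, es;
};

/* Fills *out and returns 1 on x86 with a GCC-compatible compiler;
 * zeroes *out and returns 0 anywhere else. */
static int read_seg_regs(struct seg_regs *out)
{
    assert(out != 0);
#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__)
    unsigned short v;
    __asm__ volatile("mov %%cs, %0" : "=r"(v));
    out->cs = v;
    __asm__ volatile("mov %%ds, %0" : "=r"(v));
    out->ds = v;
    __asm__ volatile("mov %%es, %0" : "=r"(v));
    out->es = v;
    return 1;
#else
    out->cs = out->ds = out->es = 0;
    return 0;
#endif
}
```

Calling this at a few points along the path (before and after the KVM_RUN ioctl, and just before the hash table lookup) would narrow down where ds changes, rather than only telling us that it did.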
Chris Clayton July 29, 2012, 2:03 p.m. UTC | #15
On 07/29/12 13:42, Avi Kivity wrote:
> On 07/27/2012 10:04 PM, Chris Clayton wrote:
>> On 07/27/12 19:08, Eric Northup wrote:
>>> Could you include the output of "info registers" at the point where it
>>> crashed?
>>>
>>
>> Here you go:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0xb6a78b40 (LWP 13249)]
>> __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
>> 217             movdqu  (%edx), %xmm2
>> (gdb) bt
>> #0  __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
>> #1  0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at ghash.c:1704
>> #2  0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800,
>> key=0x8319b82, hash_return=0xb6a78178)
>>      at ghash.c:422
>> #3  0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800,
>> key=key@entry=0x8319b82) at ghash.c:1074
>> #4  0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at
>> qom/object.c:94
>> #5  type_get_by_name (name=name@entry=0x8319b82 "apic-common") at
>> qom/object.c:149
>> #6  0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a313e0,
>> typename=typename@entry=0x8319b82 "apic-common")
>>      at qom/object.c:416
>> #7  0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a313e0,
>>      typename=typename@entry=0x8319b82 "apic-common") at qom/object.c:478
>> #8  0x08192c1b in cpu_set_apic_tpr (d=0x8a313e0, val=13 '\r')
>>      at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
>> #9  0x081cb86c in kvm_arch_post_run (env=env@entry=0x8a29370,
>> run=run@entry=0xb6274000)
>>      at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
>> #10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a29370) at
>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
>> #11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a29370) at
>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
>> #12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0
>> #13 0xb77e45ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132
>> (gdb) info registers
>> eax            0x8319b82        137468802
>> ecx            0xd58    3416
>> edx            0x8a0cd58        144756056
>> ebx            0xb7f7f2c4       -1208487228
>> esp            0xb6a780ec       0xb6a780ec
>> ebp            0xb6a78118       0xb6a78118
>> esi            0x8a313e0        144905184
>> edi            0xc513   50451
>> eip            0xb7824f77       0xb7824f77 <__strcmp_sse4_2+23>
>> eflags         0x10283  [ CF SF IF RF ]
>> cs             0x73     115
>> ss             0x7b     123
>> ds             0x0      0
>> es             0x0      0
>> fs             0x0      0
>> gs             0x33     51
>>
>
> ds shouldn't be zero for a 32-bit process.
>
> But that should have crashed *much* earlier, ds is accessed all the time.
>
> Please add the following snippet to the beginning of kvm_arch_post_run():
>
> {
>      unsigned short ds;
>      asm("mov %%ds, %0" : "=rm"(ds));
>      assert(ds != 0);
> }
>
> if the assert triggers, then kvm corrupted the segment registers.  If
> not, corruption happens somewhere above.
>
Thanks, Avi.

The assert didn't trigger - I got:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb60ffb40 (LWP 2134)]
__strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
217             movdqu  (%edx), %xmm2
(gdb) info registers
eax            0x8319ba2        137468834
ecx            0xd58    3416
edx            0x8a0cd58        144756056
ebx            0xb7f7f2c4       -1208487228
esp            0xb60ff0ec       0xb60ff0ec
ebp            0xb60ff118       0xb60ff118
esi            0x8a44818        144984088
edi            0xc513   50451
eip            0xb7820f77       0xb7820f77 <__strcmp_sse4_2+23>
eflags         0x10283  [ CF SF IF RF ]
cs             0x73     115
ss             0x7b     123
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x33     51
(gdb) list
212     #endif
213             mov     %dx, %cx
214             and     $0xfff, %cx
215             cmp     $0xff0, %cx
216             ja      L(first4bytes)
217             movdqu  (%edx), %xmm2
218             mov     %eax, %ecx
219             and     $0xfff, %ecx
220             cmp     $0xff0, %ecx
221             ja      L(first4bytes)
(gdb) bt
#0  __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
#1  0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319ba2) at ghash.c:1704
#2  0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800, key=0x8319ba2, hash_return=0xb60ff178)
     at ghash.c:422
#3  0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800, key=key@entry=0x8319ba2) at ghash.c:1074
#4  0x0815c9cb in type_table_lookup (name=0x8319ba2 "apic-common") at qom/object.c:94
#5  type_get_by_name (name=name@entry=0x8319ba2 "apic-common") at qom/object.c:149
#6  0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a44818, typename=typename@entry=0x8319ba2 "apic-common")
     at qom/object.c:416
#7  0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a44818,
     typename=typename@entry=0x8319ba2 "apic-common") at qom/object.c:478
#8  0x08192c1b in cpu_set_apic_tpr (d=0x8a44818, val=7 '\a')
     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
#9  0x081cb874 in kvm_arch_post_run (env=env@entry=0x8a3ca60, run=run@entry=0xb626d000)
     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1702
#10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a3ca60) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a3ca60) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#12 0xb7a3ad9e in start_thread () from /lib/libpthread.so.0
#13 0xb77e05ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132

I think you are saying that the problem isn't in kvm, so where would you 
recommend I continue investigating? I'm not seeing a crash with any 
other applications.

Thanks again.
Avi Kivity July 29, 2012, 2:18 p.m. UTC | #16
On 07/29/2012 05:03 PM, Chris Clayton wrote:
> On 07/29/12 13:42, Avi Kivity wrote:
>> On 07/27/2012 10:04 PM, Chris Clayton wrote:
>>> On 07/27/12 19:08, Eric Northup wrote:
>>>> Could you include the output of "info registers" at the point where it
>>>> crashed?
>>>>
>>>
>>> Here you go:
>>>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> [Switching to Thread 0xb6a78b40 (LWP 13249)]
>>> __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
>>> 217             movdqu  (%edx), %xmm2
>>> (gdb) bt
>>> #0  __strcmp_sse4_2 () at
>>> ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
>>> #1  0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at
>>> ghash.c:1704
>>> #2  0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800,
>>> key=0x8319b82, hash_return=0xb6a78178)
>>>      at ghash.c:422
>>> #3  0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800,
>>> key=key@entry=0x8319b82) at ghash.c:1074
>>> #4  0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at
>>> qom/object.c:94
>>> #5  type_get_by_name (name=name@entry=0x8319b82 "apic-common") at
>>> qom/object.c:149
>>> #6  0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a313e0,
>>> typename=typename@entry=0x8319b82 "apic-common")
>>>      at qom/object.c:416
>>> #7  0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a313e0,
>>>      typename=typename@entry=0x8319b82 "apic-common") at
>>> qom/object.c:478
>>> #8  0x08192c1b in cpu_set_apic_tpr (d=0x8a313e0, val=13 '\r')
>>>      at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
>>> #9  0x081cb86c in kvm_arch_post_run (env=env@entry=0x8a29370,
>>> run=run@entry=0xb6274000)
>>>      at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
>>> #10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a29370) at
>>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
>>> #11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a29370) at
>>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
>>> #12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0
>>> #13 0xb77e45ee in clone () at
>>> ../sysdeps/unix/sysv/linux/i386/clone.S:132
>>> (gdb) info registers
>>> eax            0x8319b82        137468802
>>> ecx            0xd58    3416
>>> edx            0x8a0cd58        144756056
>>> ebx            0xb7f7f2c4       -1208487228
>>> esp            0xb6a780ec       0xb6a780ec
>>> ebp            0xb6a78118       0xb6a78118
>>> esi            0x8a313e0        144905184
>>> edi            0xc513   50451
>>> eip            0xb7824f77       0xb7824f77 <__strcmp_sse4_2+23>
>>> eflags         0x10283  [ CF SF IF RF ]
>>> cs             0x73     115
>>> ss             0x7b     123
>>> ds             0x0      0
>>> es             0x0      0
>>> fs             0x0      0
>>> gs             0x33     51
>>>
>>
>> ds shouldn't be zero for a 32-bit process.
>>
>> But that should have crashed *much* earlier, ds is accessed all the time.
>>
>> Please add the following snippet to the beginning of kvm_arch_post_run():
>>
>> {
>>      unsigned short ds;
>>      asm("mov %%ds, %0" : "=rm"(ds));
>>      assert(ds != 0);
>> }
>>
>> if the assert triggers, then kvm corrupted the segment registers.  If
>> not, corruption happens somewhere above.
>>
> Thanks, Avi.
> 
> The assert didn't trigger - I got:
> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xb60ffb40 (LWP 2134)]
> __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
> 217             movdqu  (%edx), %xmm2
> (gdb) info registers
> eax            0x8319ba2        137468834
> ecx            0xd58    3416
> edx            0x8a0cd58        144756056
> ebx            0xb7f7f2c4       -1208487228
> esp            0xb60ff0ec       0xb60ff0ec
> ebp            0xb60ff118       0xb60ff118
> esi            0x8a44818        144984088
> edi            0xc513   50451
> eip            0xb7820f77       0xb7820f77 <__strcmp_sse4_2+23>
> eflags         0x10283  [ CF SF IF RF ]
> cs             0x73     115
> ss             0x7b     123
> ds             0x0      0
> es             0x0      0
> fs             0x0      0
> gs             0x33     51
> (gdb) list
> 212     #endif
> 213             mov     %dx, %cx
> 214             and     $0xfff, %cx
> 215             cmp     $0xff0, %cx
> 216             ja      L(first4bytes)
> 217             movdqu  (%edx), %xmm2
> 218             mov     %eax, %ecx
> 219             and     $0xfff, %ecx
> 220             cmp     $0xff0, %ecx
> 221             ja      L(first4bytes)
> (gdb) bt
> #0  __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
> #1  0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319ba2) at ghash.c:1704
> #2  0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800,
> key=0x8319ba2, hash_return=0xb60ff178)
>     at ghash.c:422
> #3  0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800,
> key=key@entry=0x8319ba2) at ghash.c:1074
> #4  0x0815c9cb in type_table_lookup (name=0x8319ba2 "apic-common") at
> qom/object.c:94
> #5  type_get_by_name (name=name@entry=0x8319ba2 "apic-common") at
> qom/object.c:149
> #6  0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a44818,
> typename=typename@entry=0x8319ba2 "apic-common")
>     at qom/object.c:416
> #7  0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a44818,
>     typename=typename@entry=0x8319ba2 "apic-common") at qom/object.c:478
> #8  0x08192c1b in cpu_set_apic_tpr (d=0x8a44818, val=7 '\a')
>     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
> #9  0x081cb874 in kvm_arch_post_run (env=env@entry=0x8a3ca60,
> run=run@entry=0xb626d000)
>     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1702
> #10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a3ca60) at
> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
> #11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a3ca60) at
> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
> #12 0xb7a3ad9e in start_thread () from /lib/libpthread.so.0
> #13 0xb77e05ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132
> 
> I think you are saying that the problem isn't in kvm, so where would you
> recommend I continue investigations. I'm not seeing a crash with any
> other applications.

What might have happened is that the movdqu instruction faulted (as it's
an fpu instruction), and on the way back from the fault, ds and es
didn't get restored correctly.

You can test this by writing a trivial version of g_str_equal()
somewhere in the qemu source code and rebuilding it.
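
Something along these lines would do; it is a sketch, not tested inside qemu. The function name is made up, and it assumes the type table is created with g_str_equal as its key-equality callback, so that this function can be passed in its place. The pointers are volatile-qualified to discourage the compiler from turning the loop back into a call to strcmp or vectorizing it, which would defeat the purpose of avoiding the SSE path.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-by-byte string equality with the same argument and return
 * conventions as GLib's g_str_equal(): two untyped pointers to
 * NUL-terminated strings, nonzero result when they are equal. */
static int trivial_str_equal(const void *v1, const void *v2)
{
    const volatile unsigned char *a = v1;
    const volatile unsigned char *b = v2;

    assert(a != NULL && b != NULL);

    /* Advance while both strings agree and the first has not ended. */
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    /* Equal only if both stopped on the same byte (the terminator). */
    return *a == *b;
}
```

If the crashes stop with this in place, that points at the fault taken by movdqu and the return path from it, rather than at the hash table contents.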
Avi Kivity July 29, 2012, 2:48 p.m. UTC | #17
On 07/29/2012 05:18 PM, Avi Kivity wrote:
>> 
>> I think you are saying that the problem isn't in kvm, so where would you
>> recommend I continue investigations. I'm not seeing a crash with any
>> other applications.
> 
> What might have happened is that the movdqu instruction faulted (as it's
> an fpu instruction), and on the way back from the fault, ds and es
> didn't get restored correctly.
> 
> You can test this by writing a trivial version of g_str_equal()
> somewhere in the qemu source code and rebuilding it.

You're running a 32-bit kernel, yes?  Please confirm.
Chris Clayton July 29, 2012, 3:21 p.m. UTC | #18
On 07/29/12 15:48, Avi Kivity wrote:
> On 07/29/2012 05:18 PM, Avi Kivity wrote:
>>>
>>> I think you are saying that the problem isn't in kvm, so where would you
>>> recommend I continue investigations. I'm not seeing a crash with any
>>> other applications.
>>
>> What might have happened is that the movdqu instruction faulted (as it's
>> an fpu instruction), and on the way back from the fault, ds and es
>> didn't get restored correctly.
>>
>> You can test this by writing a trivial version of g_str_equal()
>> somewhere in the qemu source code and rebuilding it.
>
> You're running a 32-bit kernel, yes?  Please confirm.
>
>
Yes, I am running a 32-bit kernel and userland.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity July 29, 2012, 3:47 p.m. UTC | #19
On 07/29/2012 05:18 PM, Avi Kivity wrote:
> On 07/29/2012 05:03 PM, Chris Clayton wrote:
>> On 07/29/12 13:42, Avi Kivity wrote:
>>> On 07/27/2012 10:04 PM, Chris Clayton wrote:
>>>> On 07/27/12 19:08, Eric Northup wrote:
>>>>> Could you include the output of "info registers" at the point where it
>>>>> crashed?
>>>>>
>>>>
>>>> Here you go:
>>>>
>>>> Program received signal SIGSEGV, Segmentation fault.
>>>> [Switching to Thread 0xb6a78b40 (LWP 13249)]
>>>> __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
>>>> 217             movdqu  (%edx), %xmm2
>>>> (gdb) bt
>>>> #0  __strcmp_sse4_2 () at
>>>> ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
>>>> #1  0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at
>>>> ghash.c:1704
>>>> #2  0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800,
>>>> key=0x8319b82, hash_return=0xb6a78178)
>>>>      at ghash.c:422
>>>> #3  0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800,
>>>> key=key@entry=0x8319b82) at ghash.c:1074
>>>> #4  0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at
>>>> qom/object.c:94
>>>> #5  type_get_by_name (name=name@entry=0x8319b82 "apic-common") at
>>>> qom/object.c:149
>>>> #6  0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a313e0,
>>>> typename=typename@entry=0x8319b82 "apic-common")
>>>>      at qom/object.c:416
>>>> #7  0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a313e0,
>>>>      typename=typename@entry=0x8319b82 "apic-common") at
>>>> qom/object.c:478
>>>> #8  0x08192c1b in cpu_set_apic_tpr (d=0x8a313e0, val=13 '\r')
>>>>      at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
>>>> #9  0x081cb86c in kvm_arch_post_run (env=env@entry=0x8a29370,
>>>> run=run@entry=0xb6274000)
>>>>      at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
>>>> #10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a29370) at
>>>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
>>>> #11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a29370) at
>>>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
>>>> #12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0
>>>> #13 0xb77e45ee in clone () at
>>>> ../sysdeps/unix/sysv/linux/i386/clone.S:132
>>>> (gdb) info registers
>>>> eax            0x8319b82        137468802
>>>> ecx            0xd58    3416
>>>> edx            0x8a0cd58        144756056
>>>> ebx            0xb7f7f2c4       -1208487228
>>>> esp            0xb6a780ec       0xb6a780ec
>>>> ebp            0xb6a78118       0xb6a78118
>>>> esi            0x8a313e0        144905184
>>>> edi            0xc513   50451
>>>> eip            0xb7824f77       0xb7824f77 <__strcmp_sse4_2+23>
>>>> eflags         0x10283  [ CF SF IF RF ]
>>>> cs             0x73     115
>>>> ss             0x7b     123
>>>> ds             0x0      0
>>>> es             0x0      0
>>>> fs             0x0      0
>>>> gs             0x33     51
>>>>
>>>
>>> ds shouldn't be zero for a 32-bit process.
>>>
>>> But that should have crashed *much* earlier, ds is accessed all the time.
>>>
>>> Please add the following snippet to the beginning of kvm_arch_post_run():
>>>
>>> {
>>>      unsigned short ds;
>>>      asm("mov %%ds, %0" : "=rm"(ds));
>>>      assert(ds != 0);
>>> }
>>>
>>> if the assert triggers, then kvm corrupted the segment registers.  If
>>> not, corruption happens somewhere above.
>>>
>> Thanks, Avi.
>> 
>> The assert didn't trigger - I got:
>> 
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0xb60ffb40 (LWP 2134)]
>> __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
>> 217             movdqu  (%edx), %xmm2
>> (gdb) info registers
>> eax            0x8319ba2        137468834
>> ecx            0xd58    3416
>> edx            0x8a0cd58        144756056
>> ebx            0xb7f7f2c4       -1208487228
>> esp            0xb60ff0ec       0xb60ff0ec
>> ebp            0xb60ff118       0xb60ff118
>> esi            0x8a44818        144984088
>> edi            0xc513   50451
>> eip            0xb7820f77       0xb7820f77 <__strcmp_sse4_2+23>
>> eflags         0x10283  [ CF SF IF RF ]
>> cs             0x73     115
>> ss             0x7b     123
>> ds             0x0      0
>> es             0x0      0
>> fs             0x0      0
>> gs             0x33     51
>> (gdb) list
>> 212     #endif
>> 213             mov     %dx, %cx
>> 214             and     $0xfff, %cx
>> 215             cmp     $0xff0, %cx
>> 216             ja      L(first4bytes)
>> 217             movdqu  (%edx), %xmm2
>> 218             mov     %eax, %ecx
>> 219             and     $0xfff, %ecx
>> 220             cmp     $0xff0, %ecx
>> 221             ja      L(first4bytes)
>> (gdb) bt
>> #0  __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
>> #1  0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319ba2) at ghash.c:1704
>> #2  0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800,
>> key=0x8319ba2, hash_return=0xb60ff178)
>>     at ghash.c:422
>> #3  0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800,
>> key=key@entry=0x8319ba2) at ghash.c:1074
>> #4  0x0815c9cb in type_table_lookup (name=0x8319ba2 "apic-common") at
>> qom/object.c:94
>> #5  type_get_by_name (name=name@entry=0x8319ba2 "apic-common") at
>> qom/object.c:149
>> #6  0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a44818,
>> typename=typename@entry=0x8319ba2 "apic-common")
>>     at qom/object.c:416
>> #7  0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a44818,
>>     typename=typename@entry=0x8319ba2 "apic-common") at qom/object.c:478
>> #8  0x08192c1b in cpu_set_apic_tpr (d=0x8a44818, val=7 '\a')
>>     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
>> #9  0x081cb874 in kvm_arch_post_run (env=env@entry=0x8a3ca60,
>> run=run@entry=0xb626d000)
>>     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1702
>> #10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a3ca60) at
>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
>> #11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a3ca60) at
>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
>> #12 0xb7a3ad9e in start_thread () from /lib/libpthread.so.0
>> #13 0xb77e05ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132
>> 
>> I think you are saying that the problem isn't in kvm, so where would you
>> recommend I continue investigations. I'm not seeing a crash with any
>> other applications.
> 
> What might have happened is that the movdqu instruction faulted (as it's
> an fpu instruction), and on the way back from the fault, ds and es
> didn't get restored correctly.
> 
> You can test this by writing a trivial version of g_str_equal()
> somewhere in the qemu source code and rebuilding it.
> 
> 

from entry_32.S:

.macro RESTORE_REGS pop=0
	RESTORE_INT_REGS
1:	popl_cfi %ds
	/*CFI_RESTORE ds;*/
2:	popl_cfi %es
	/*CFI_RESTORE es;*/
3:	popl_cfi %fs
	/*CFI_RESTORE fs;*/
	POP_GS \pop
.pushsection .fixup, "ax"
4:	movl $0, (%esp)
	jmp 1b
5:	movl $0, (%esp)
	jmp 2b
6:	movl $0, (%esp)
	jmp 3b
.popsection

this piece of code tries to restore %ds, and if it fails, zeros it,
which is consistent with the core dump.

This could happen if kvm is failing to restore GDT correctly.
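
A toy model of the above, in case it helps whoever reads the dump. It is not
kernel code: decode_selector and pop_segment_with_fixup are names invented
for this sketch, and load_faults stands in for whatever descriptor check the
CPU fails. It splits a selector into its fields and mimics what the .fixup
entries do when the pop faults:

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of an x86 segment selector. */
struct selector_fields {
    unsigned index; /* descriptor table index, bits 15..3 */
    unsigned ti;    /* table indicator, bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;   /* requested privilege level, bits 1..0 */
};

static struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;

    f.index = sel >> 3;
    f.ti = (sel >> 2) & 1;
    f.rpl = sel & 3;
    return f;
}

/*
 * Model of "1: popl %ds" plus its fixup "4: movl $0, (%esp); jmp 1b".
 * If loading the saved selector faults, the stack slot is overwritten
 * with 0 and the pop is retried; the null selector always loads.
 */
static uint16_t pop_segment_with_fixup(uint16_t saved, bool load_faults)
{
    if (load_faults)
        saved = 0;
    return saved;
}
```

Decoding the values from the register dump: ss=0x7b is GDT index 15 with
RPL 3, and cs=0x73 is GDT index 14 with RPL 3. A ds of 0 is the null
selector, which is what the fixup path would leave behind.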
Avi Kivity July 29, 2012, 4:34 p.m. UTC | #20
On 07/29/2012 06:47 PM, Avi Kivity wrote:
>> What might have happened is that the movdqu instruction faulted (as it's
>> an fpu instruction), and on the way back from the fault, ds and es
>> didn't get restored correctly.
>> 
>> You can test this by writing a trivial version of g_str_equal()
>> somewhere in the qemu source code and rebuilding it.
>> 
>> 
> 
> from entry_32.S:
> 
> .macro RESTORE_REGS pop=0
> 	RESTORE_INT_REGS
> 1:	popl_cfi %ds
> 	/*CFI_RESTORE ds;*/
> 2:	popl_cfi %es
> 	/*CFI_RESTORE es;*/
> 3:	popl_cfi %fs
> 	/*CFI_RESTORE fs;*/
> 	POP_GS \pop
> .pushsection .fixup, "ax"
> 4:	movl $0, (%esp)
> 	jmp 1b
> 5:	movl $0, (%esp)
> 	jmp 2b
> 6:	movl $0, (%esp)
> 	jmp 3b
> .popsection
> 
> this piece of code tries to restore %ds, and if it fails, zeros it,
> which is consistent with the core dump.
> 
> This could happen if kvm is failing to restore GDT correctly.
> 

Possible culprit: b2da15ac26a0c00.
Chris Clayton July 29, 2012, 5:50 p.m. UTC | #21
On 07/29/12 17:34, Avi Kivity wrote:
> On 07/29/2012 06:47 PM, Avi Kivity wrote:
>>> What might have happened is that the movdqu instruction faulted (as it's
>>> an fpu instruction), and on the way back from the fault, ds and es
>>> didn't get restored correctly.
>>>
>>> You can test this by writing a trivial version of g_str_equal()
>>> somewhere in the qemu source code and rebuilding it.
>>>
>>>
>>
>> from entry_32.S:
>>
>> .macro RESTORE_REGS pop=0
>> 	RESTORE_INT_REGS
>> 1:	popl_cfi %ds
>> 	/*CFI_RESTORE ds;*/
>> 2:	popl_cfi %es
>> 	/*CFI_RESTORE es;*/
>> 3:	popl_cfi %fs
>> 	/*CFI_RESTORE fs;*/
>> 	POP_GS \pop
>> .pushsection .fixup, "ax"
>> 4:	movl $0, (%esp)
>> 	jmp 1b
>> 5:	movl $0, (%esp)
>> 	jmp 2b
>> 6:	movl $0, (%esp)
>> 	jmp 3b
>> .popsection
>>
>> this piece of code tries to restore %ds, and if it fails, zeros it,
>> which is consistent with the core dump.
>>
>> This could happen if kvm is failing to restore GDT correctly.
>>
>
> Possible culprit: b2da15ac26a0c00.
>
>
That commit isn't in qemu-kvm-1.1.1.

I'm testing a build with g_str_equal implemented in kvm.c and so far I 
haven't had a crash in 6 invocations. That hasn't been possible with 
vanilla qemu-kvm-1.1.{0,1}, but I'll do a few more, just to be sure.

Thanks for your help, Avi.

Gleb Natapov July 29, 2012, 5:54 p.m. UTC | #22
On Sun, Jul 29, 2012 at 06:50:09PM +0100, Chris Clayton wrote:
> On 07/29/12 17:34, Avi Kivity wrote:
> >On 07/29/2012 06:47 PM, Avi Kivity wrote:
> >>>What might have happened is that the movdqu instruction faulted (as it's
> >>>an fpu instruction), and on the way back from the fault, ds and es
> >>>didn't get restored correctly.
> >>>
> >>>You can test this by writing a trivial version of g_str_equal()
> >>>somewhere in the qemu source code and rebuilding it.
> >>>
> >>>
> >>
> >>from entry_32.S:
> >>
> >>.macro RESTORE_REGS pop=0
> >>	RESTORE_INT_REGS
> >>1:	popl_cfi %ds
> >>	/*CFI_RESTORE ds;*/
> >>2:	popl_cfi %es
> >>	/*CFI_RESTORE es;*/
> >>3:	popl_cfi %fs
> >>	/*CFI_RESTORE fs;*/
> >>	POP_GS \pop
> >>.pushsection .fixup, "ax"
> >>4:	movl $0, (%esp)
> >>	jmp 1b
> >>5:	movl $0, (%esp)
> >>	jmp 2b
> >>6:	movl $0, (%esp)
> >>	jmp 3b
> >>.popsection
> >>
> >>this piece of code tries to restore %ds, and if it fails, zeros it,
> >>which is consistent with the core dump.
> >>
> >>This could happen if kvm is failing to restore GDT correctly.
> >>
> >
> >Possible culprit: b2da15ac26a0c00.
> >
> >
> That commit isn't in qermu-kvm-1.1.1.
> 
It is in kernel.

> I'm testing a build with g_str_equal implemented in kvm.c and so far
> I haven't had a crash in 6 invocations. That hasn't been possible
> with vanilla qemu-kvm-1.1.{0,1}, but I'll do a few more, just to be
> sure.
> 
> Thanks for your help, Avi.

--
			Gleb.
Chris Clayton July 29, 2012, 7:10 p.m. UTC | #23
On 07/29/12 18:54, Gleb Natapov wrote:
> On Sun, Jul 29, 2012 at 06:50:09PM +0100, Chris Clayton wrote:
>> On 07/29/12 17:34, Avi Kivity wrote:
>>> On 07/29/2012 06:47 PM, Avi Kivity wrote:
>>>>> What might have happened is that the movdqu instruction faulted (as it's
>>>>> an fpu instruction), and on the way back from the fault, ds and es
>>>>> didn't get restored correctly.
>>>>>
>>>>> You can test this by writing a trivial version of g_str_equal()
>>>>> somewhere in the qemu source code and rebuilding it.
>>>>>
>>>>>
>>>>
>>> >from entry_32.S:
>>>>
>>>> .macro RESTORE_REGS pop=0
>>>> 	RESTORE_INT_REGS
>>>> 1:	popl_cfi %ds
>>>> 	/*CFI_RESTORE ds;*/
>>>> 2:	popl_cfi %es
>>>> 	/*CFI_RESTORE es;*/
>>>> 3:	popl_cfi %fs
>>>> 	/*CFI_RESTORE fs;*/
>>>> 	POP_GS \pop
>>>> .pushsection .fixup, "ax"
>>>> 4:	movl $0, (%esp)
>>>> 	jmp 1b
>>>> 5:	movl $0, (%esp)
>>>> 	jmp 2b
>>>> 6:	movl $0, (%esp)
>>>> 	jmp 3b
>>>> .popsection
>>>>
>>>> this piece of code tries to restore %ds, and if it fails, zeros it,
>>>> which is consistent with the core dump.
>>>>
>>>> This could happen if kvm is failing to restore GDT correctly.
>>>>
>>>
>>> Possible culprit: b2da15ac26a0c00.
>>>
>>>
>> That commit isn't in qermu-kvm-1.1.1.
>>
> It is in kernel.
>

Sorry, so it is.

With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15 
clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem 
to be the problem.

>> I'm testing a build with g_str_equal implemented in kvm.c and so far
>> I haven't had a crash in 6 invocations. That hasn't been possible
>> with vanilla qemu-kvm-1.1.{0,1}, but I'll do a few more, just to be
>> sure.
>>

Similarly, with my "local" implementation of g_str_equal, I've had 15 
clean invocations on vanilla kernel 3.5.0.

I'm more than happy to test patches to fix this regression, but it will 
be tomorrow before I will be able to do so.

>> Thanks for your help, Avi.
>
> --
> 			Gleb.
>

Chris Clayton July 30, 2012, 2 p.m. UTC | #24
On 07/29/12 20:10, Chris Clayton wrote:
>>>> Possible culprit: b2da15ac26a0c00.
>>>>
>>>>
>>> That commit isn't in qermu-kvm-1.1.1.
>>>
>> It is in kernel.
>>
>
> Sorry, so it is.
>
> With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
> clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
> to be the problem.

Just to be sure, I've run some more tests today. No crashes occurred in 
20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00 
reverted.

Thanks.
Avi Kivity July 30, 2012, 2:03 p.m. UTC | #25
On 07/30/2012 05:00 PM, Chris Clayton wrote:
> On 07/29/12 20:10, Chris Clayton wrote:
>>>>> Possible culprit: b2da15ac26a0c00.
>>>>>
>>>>>
>>>> That commit isn't in qermu-kvm-1.1.1.
>>>>
>>> It is in kernel.
>>>
>>
>> Sorry, so it is.
>>
>> With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
>> clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
>> to be the problem.
> 
> Just to be sure, I've run some more tests today. No crashes occurred in
> 20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
> reverted.

Ok.  I'm trying to reproduce it here on a nested-virt setup, since the
code looks correct.

What are your preemption settings?
Chris Clayton July 30, 2012, 2:07 p.m. UTC | #26
On 07/30/12 15:03, Avi Kivity wrote:
> On 07/30/2012 05:00 PM, Chris Clayton wrote:
>> On 07/29/12 20:10, Chris Clayton wrote:
>>>>>> Possible culprit: b2da15ac26a0c00.
>>>>>>
>>>>>>
>>>>> That commit isn't in qermu-kvm-1.1.1.
>>>>>
>>>> It is in kernel.
>>>>
>>>
>>> Sorry, so it is.
>>>
>>> With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
>>> clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
>>> to be the problem.
>>
>> Just to be sure, I've run some more tests today. No crashes occurred in
>> 20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
>> reverted.
>
> Ok.  I'm trying to reproduce it here on a nested-virt setup, since the
> code looks correct.
>
> What's your preemption settings?
>
>
[chris:~/kernel/linux-3.5.0]$ grep PREEMPT .config
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y

Avi Kivity July 30, 2012, 4:39 p.m. UTC | #27
On 07/30/2012 05:07 PM, Chris Clayton wrote:
>>
>>>> With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
>>>> clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
>>>> to be the problem.
>>>
>>> Just to be sure, I've run some more tests today. No crashes occurred in
>>> 20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
>>> reverted.
>>
>> Ok.  I'm trying to reproduce it here on a nested-virt setup, since the
>> code looks correct.
>>
>> What's your preemption settings?
>>
>>
> [chris:~/kernel/linux-3.5.0]$ grep PREEMPT .config
> CONFIG_TREE_PREEMPT_RCU=y
> CONFIG_PREEMPT_RCU=y
> CONFIG_PREEMPT_NOTIFIERS=y
> # CONFIG_PREEMPT_NONE is not set
> # CONFIG_PREEMPT_VOLUNTARY is not set
> CONFIG_PREEMPT=y
> CONFIG_PREEMPT_COUNT=y

Here's what I think is happening:

  vcpu_load
  ...
  vmx_save_host_state
  vmx_vcpu_run
  (ds.cpl, es.cpl cleared by hardware)

  interrupt
    push ds, es  # pushes bad ds, es
    schedule
      vmx_vcpu_put
        vmx_load_host_state
          reload ds, es
    pop ds, es  # of other thread's stack
    iret
  # other thread runs
  interrupt
    schedule  # back in vcpu thread
    interrupt return: pop ds, es  # <-- problem
    iret

   ...
   vcpu_put

   # bad ds, es, but !vmx->host_state.loaded

Marcelo, did I miss something here?

Unfortunately, my reproducer has ceased to reproduce.  But the fix is
easy if the analysis above is right.
Marcelo Tosatti July 30, 2012, 11:36 p.m. UTC | #28
On Mon, Jul 30, 2012 at 07:39:31PM +0300, Avi Kivity wrote:
> On 07/30/2012 05:07 PM, Chris Clayton wrote:
> >>
> >>>> With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
> >>>> clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
> >>>> to be the problem.
> >>>
> >>> Just to be sure, I've run some more tests today. No crashes occurred in
> >>> 20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
> >>> reverted.
> >>
> >> Ok.  I'm trying to reproduce it here on a nested-virt setup, since the
> >> code looks correct.
> >>
> >> What's your preemption settings?
> >>
> >>
> > [chris:~/kernel/linux-3.5.0]$ grep PREEMPT .config
> > CONFIG_TREE_PREEMPT_RCU=y
> > CONFIG_PREEMPT_RCU=y
> > CONFIG_PREEMPT_NOTIFIERS=y
> > # CONFIG_PREEMPT_NONE is not set
> > # CONFIG_PREEMPT_VOLUNTARY is not set
> > CONFIG_PREEMPT=y
> > CONFIG_PREEMPT_COUNT=y
> 
> Here's what I think that is happening
> 
>   vcpu_load
>   ...
>   vmx_save_host_state
>   vmx_vcpu_run
>   (ds.cpl, es.cpl cleared by hardware)
> 
>   interrupt
>     push ds, es  # pushes bad ds, es
>     schedule
>       vmx_vcpu_put
>         vmx_load_host_state
>           reload ds, es
>     pop ds, es  # of other thread's stack
>     iret
>   # other thread runs
>   interrupt
>     schedule  # back in vcpu thread
>     interrupt return: pop ds, es  # <-- problem
>     iret
> 
>    ...
>    vcpu_put
> 
>    # bad ds, es, but !vmx->host_state.loaded
> 
> Marcelo, did I miss something here?

Don't think so.

> 
> Unfortunately, my reproducer has ceased to reproduce.  But the fix is
> easy if the analysis above is right.
> 
> -- 
> error compiling committee.c: too many arguments to function
> 
Avi Kivity July 31, 2012, 9:11 a.m. UTC | #29
On 07/31/2012 02:36 AM, Marcelo Tosatti wrote:
> On Mon, Jul 30, 2012 at 07:39:31PM +0300, Avi Kivity wrote:
>> On 07/30/2012 05:07 PM, Chris Clayton wrote:
>> >>
>> >>>> With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
>> >>>> clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
>> >>>> to be the problem.
>> >>>
>> >>> Just to be sure, I've run some more tests today. No crashes occurred in
>> >>> 20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
>> >>> reverted.
>> >>
>> >> Ok.  I'm trying to reproduce it here on a nested-virt setup, since the
>> >> code looks correct.
>> >>
>> >> What's your preemption settings?
>> >>
>> >>
>> > [chris:~/kernel/linux-3.5.0]$ grep PREEMPT .config
>> > CONFIG_TREE_PREEMPT_RCU=y
>> > CONFIG_PREEMPT_RCU=y
>> > CONFIG_PREEMPT_NOTIFIERS=y
>> > # CONFIG_PREEMPT_NONE is not set
>> > # CONFIG_PREEMPT_VOLUNTARY is not set
>> > CONFIG_PREEMPT=y
>> > CONFIG_PREEMPT_COUNT=y
>> 
>> Here's what I think that is happening
>> 
>>   vcpu_load
>>   ...
>>   vmx_save_host_state
>>   vmx_vcpu_run
>>   (ds.cpl, es.cpl cleared by hardware)
>> 
>>   interrupt
>>     push ds, es  # pushes bad ds, es
>>     schedule
>>       vmx_vcpu_put
>>         vmx_load_host_state
>>           reload ds, es
>>     pop ds, es  # of other thread's stack
>>     iret
>>   # other thread runs
>>   interrupt
>>     schedule  # back in vcpu thread
>>     interrupt return: pop ds, es  # <-- problem
>>     iret
>> 
>>    ...
>>    vcpu_put
>> 
>>    # bad ds, es, but !vmx->host_state.loaded
>> 
>> Marcelo, did I miss something here?
> 
> Don't think so.

So the same problem should happen with %fs and %gs, no?

x86_64 is safe, since entry_64.S never saves/restores segment registers.
Marcelo Tosatti July 31, 2012, 4:29 p.m. UTC | #30
On Tue, Jul 31, 2012 at 12:11:13PM +0300, Avi Kivity wrote:
> On 07/31/2012 02:36 AM, Marcelo Tosatti wrote:
> > On Mon, Jul 30, 2012 at 07:39:31PM +0300, Avi Kivity wrote:
> >> On 07/30/2012 05:07 PM, Chris Clayton wrote:
> >> >>
> >> >>>> With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
> >> >>>> clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
> >> >>>> to be the problem.
> >> >>>
> >> >>> Just to be sure, I've run some more tests today. No crashes occurred in
> >> >>> 20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
> >> >>> reverted.
> >> >>
> >> >> Ok.  I'm trying to reproduce it here on a nested-virt setup, since the
> >> >> code looks correct.
> >> >>
> >> >> What's your preemption settings?
> >> >>
> >> >>
> >> > [chris:~/kernel/linux-3.5.0]$ grep PREEMPT .config
> >> > CONFIG_TREE_PREEMPT_RCU=y
> >> > CONFIG_PREEMPT_RCU=y
> >> > CONFIG_PREEMPT_NOTIFIERS=y
> >> > # CONFIG_PREEMPT_NONE is not set
> >> > # CONFIG_PREEMPT_VOLUNTARY is not set
> >> > CONFIG_PREEMPT=y
> >> > CONFIG_PREEMPT_COUNT=y
> >> 
> >> Here's what I think that is happening
> >> 
> >>   vcpu_load
> >>   ...
> >>   vmx_save_host_state
> >>   vmx_vcpu_run
> >>   (ds.cpl, es.cpl cleared by hardware)
> >> 
> >>   interrupt
> >>     push ds, es  # pushes bad ds, es
> >>     schedule
> >>       vmx_vcpu_put
> >>         vmx_load_host_state
> >>           reload ds, es
> >>     pop ds, es  # of other thread's stack
> >>     iret
> >>   # other thread runs
> >>   interrupt
> >>     schedule  # back in vcpu thread
> >>     interrupt return: pop ds, es  # <-- problem
> >>     iret
> >> 
> >>    ...
> >>    vcpu_put
> >> 
> >>    # bad ds, es, but !vmx->host_state.loaded
> >> 
> >> Marcelo, did I miss something here?
> > 
> > Don't think so.
> 
> So the same problem should happen with %fs and %gs, no?

AFAICS, it depends on CONFIG_X86_32_LAZY_GS for GS and is unconditional for FS.

> x86_64 is safe, since it entry_64.S never saves/restores segment registers.

Is the comment 

        /*
         * The sysexit path does not restore ds/es, so we must set them
         * to a reasonable value ourselves.
         */

Correct?

syscall_exit -> syscall_exit_work -> resume_userspace ->
restore_all -> RESTORE_REGS
Avi Kivity July 31, 2012, 4:46 p.m. UTC | #31
On 07/31/2012 07:29 PM, Marcelo Tosatti wrote:
>> 
>> So the same problem should happen with %fs and %gs, no?
> 
> AFAICS: 
> 
> depends on CONFIG_X86_32_LAZY_GS for GS, unconditional for FS.

The fs/gs handling was already in there; I wonder how it wasn't broken
before.  Something's fishy here.

> 
>> x86_64 is safe, since it entry_64.S never saves/restores segment registers.
> 
> Is the comment 
> 
>         /*
>          * The sysexit path does not restore ds/es, so we must set them
>          * to
>          * a reasonable value ourselves.
>          */
> 
> Correct?
> 
> syscall_exit -> syscall_exit_work -> resume_userspace ->
> restore_all -> RESTORE_REGS
> 

That's the non-sysexit path (could have arrived here by sysenter).  Look
at sysenter_exit.
Avi Kivity Aug. 1, 2012, 1:11 p.m. UTC | #32
On 07/30/2012 07:39 PM, Avi Kivity wrote:
> On 07/30/2012 05:07 PM, Chris Clayton wrote:
>>>
>>>>> With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
>>>>> clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
>>>>> to be the problem.
>>>>
>>>> Just to be sure, I've run some more tests today. No crashes occurred in
>>>> 20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
>>>> reverted.
>>>
>>> Ok.  I'm trying to reproduce it here on a nested-virt setup, since the
>>> code looks correct.
>>>
>>> What's your preemption settings?
>>>
>>>
>> [chris:~/kernel/linux-3.5.0]$ grep PREEMPT .config
>> CONFIG_TREE_PREEMPT_RCU=y
>> CONFIG_PREEMPT_RCU=y
>> CONFIG_PREEMPT_NOTIFIERS=y
>> # CONFIG_PREEMPT_NONE is not set
>> # CONFIG_PREEMPT_VOLUNTARY is not set
>> CONFIG_PREEMPT=y
>> CONFIG_PREEMPT_COUNT=y
> 
> Here's what I think that is happening
> 
>   vcpu_load
>   ...
>   vmx_save_host_state
>   vmx_vcpu_run
>   (ds.cpl, es.cpl cleared by hardware)
> 
>   interrupt
>     push ds, es  # pushes bad ds, es
>     schedule
>       vmx_vcpu_put
>         vmx_load_host_state
>           reload ds, es
>     pop ds, es  # of other thread's stack
>     iret
>   # other thread runs
>   interrupt
>     schedule  # back in vcpu thread
>     interrupt return: pop ds, es  # <-- problem

In fact, those are fine.

>     iret

But IRET-to-outer-privilege-level clears segment registers with the
wrong RPL.  Think how secure OSes would be if they used the hardware
fully.  Credit to Gleb for pinpointing this.
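
A small model of that IRET behaviour, as a sketch only. struct seg_reg and
both helpers are invented for illustration, and the check follows my reading
of the IRET pseudocode in the Intel SDM, which tests the DPL held in the
hidden descriptor cache against the new CPL:

```c
#include <stdint.h>

/* One data segment register: visible selector plus the cached DPL. */
struct seg_reg {
    uint16_t selector;
    unsigned cached_dpl;
};

/* An ordinary mov/pop into the register refreshes the cached DPL. */
static void load_seg(struct seg_reg *s, uint16_t selector, unsigned dpl)
{
    s->selector = selector;
    s->cached_dpl = dpl;
}

/*
 * IRET to an outer (less privileged) level: a data segment register
 * that is null, or whose cached DPL is below the new CPL, is loaded
 * with the null selector.
 */
static void iret_to_outer_level(struct seg_reg *s, unsigned new_cpl)
{
    if (s->selector == 0 || s->cached_dpl < new_cpl)
        s->selector = 0;
}
```

So a ds left with a ring-0 descriptor cached comes back as 0 once we return
to ring 3, while a properly reloaded user selector such as 0x7b with DPL 3
survives the same IRET.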

> 
>    ...
>    vcpu_put
> 
>    # bad ds, es, but !vmx->host_state.loaded
>

Patch

--- qemu-kvm-1.1.0/configure~   2012-07-15 22:38:39.000000000 +0100
+++ qemu-kvm-1.1.0/configure    2012-07-15 22:39:09.000000000 +0100
@@ -2783,7 +2783,7 @@  int main(int argc, char **argv)
  }
  EOF
    if ! compile_prog "" "" ; then
-    CFLAGS+="-march=i486"
+    CFLAGS+="-march=i686"
    fi
  fi