
commit 3c2e7f7de3 (KVM use NPT page attributes) causes boot failures

Message ID 20150901100417.GA424@x4 (mailing list archive)
State New, archived

Commit Message

Markus Trippelsdorf Sept. 1, 2015, 10:04 a.m. UTC
On 2015.09.01 at 10:56 +0200, Ingo Molnar wrote:
> 
> * Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
> > As I wrote in my other reply, the boot failure is nondeterministic (boot
> > succeeds roughly every sixth time). So the bisection and the patch are
> > just bogus (but the boot failure is real).
> > 
> > Sorry.
> 
> No problem. Please let us know if any of these commits does turn out to be the 
> culprit. (Which is always a possibility.)

I'm pretty sure commit 3c2e7f7de3 is the culprit.

commit 3c2e7f7de3240216042b61073803b61b9b3cfb22
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Tue Jul 7 14:32:17 2015 +0200

    KVM: SVM: use NPT page attributes

I've booted ten times in a row successfully with the following patch:


Paolo, your commit causes nondeterministic boot failure on my machine.
It sometimes crashes early with the following backtrace:

map_vsyscall
kvm_arch_hardware_setup
map_vsyscall
kvm_init
map_vsyscall
do_one_initcall
kernel_init_freeable
rest_init
kernel_init
ret_from_fork
rest_init

RIP: svm_hardware_setup

Comments

Xiao Guangrong Sept. 1, 2015, 1 p.m. UTC | #1
On 09/01/2015 06:04 PM, Markus Trippelsdorf wrote:
> On 2015.09.01 at 10:56 +0200, Ingo Molnar wrote:
>>
>> * Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
>>> As I wrote in my other reply, the boot failure is nondeterministic (boot
>>> succeeds roughly every sixth time). So the bisection and the patch are
>>> just bogus (but the boot failure is real).
>>>
>>> Sorry.
>>
>> No problem. Please let us know if any of these commits does turn out to be the
>> culprit. (Which is always a possibility.)
>
> I'm pretty sure commit 3c2e7f7de3 is the culprit.
>
> commit 3c2e7f7de3240216042b61073803b61b9b3cfb22
> Author: Paolo Bonzini <pbonzini@redhat.com>
> Date:   Tue Jul 7 14:32:17 2015 +0200
>
>      KVM: SVM: use NPT page attributes
>
> I've booted ten times in a row successfully with the following patch:
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 74d825716f4f..3190173a575f 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -989,7 +989,7 @@ static __init int svm_hardware_setup(void)
>   	} else
>   		kvm_disable_tdp();
>
> -	build_mtrr2protval();
> +//	build_mtrr2protval();
>   	return 0;
>
>   err:
>
> Paolo, your commit causes nondeterministic boot failure on my machine.
> It sometimes crashes early with the following backtrace:
>

Did it trigger the BUG()/BUG_ON() in mtrr2protval()/fallback_mtrr_type()?
If yes, could you please print the actual value out?

BTW, you could change BUG() to WARN() to get the printed information more easily.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Markus Trippelsdorf Sept. 1, 2015, 1:56 p.m. UTC | #2
On 2015.09.01 at 21:00 +0800, Xiao Guangrong wrote:
> 
> Did it trigger the BUG()/BUG_ON() in mtrr2protval()/fallback_mtrr_type()?
> If yes, could you please print the actual value out?

It is the BUG() in fallback_mtrr_type(). I changed it to a printk and
it prints 1 for the value of mtrr.

 MTRR_TYPE_WRCOMB     1
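
For reference, a minimal userspace sketch of the memory-type encodings, showing why the printed value 1 reads as write-combining. The enum values match the kernel's MTRR_TYPE_* constants; the helper mtrr_type_name() is hypothetical and exists only for this illustration, it is not part of svm.c.

```c
#include <assert.h>
#include <stddef.h>

/* Memory type encodings, matching the kernel's MTRR_TYPE_* constants. */
enum mtrr_type {
	MTRR_TYPE_UNCACHABLE = 0,
	MTRR_TYPE_WRCOMB     = 1,
	MTRR_TYPE_WRTHROUGH  = 4,
	MTRR_TYPE_WRPROT     = 5,
	MTRR_TYPE_WRBACK     = 6,
};

/*
 * Hypothetical helper: short name for a 3-bit MTRR memory type.
 * Encodings 2, 3 and 7 are not valid MTRR types and map to "reserved".
 */
static const char *mtrr_type_name(unsigned int type)
{
	static const char *const names[8] = {
		[MTRR_TYPE_UNCACHABLE] = "UC",
		[MTRR_TYPE_WRCOMB]     = "WC",
		[MTRR_TYPE_WRTHROUGH]  = "WT",
		[MTRR_TYPE_WRPROT]     = "WP",
		[MTRR_TYPE_WRBACK]     = "WB",
	};

	assert(type < 8);
	return names[type] != NULL ? names[type] : "reserved";
}
```

With this table, mtrr_type_name(1) yields "WC", i.e. the type that fallback_mtrr_type() was asked about here.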
Xiao Guangrong Sept. 1, 2015, 10:31 p.m. UTC | #3
On 09/01/2015 09:56 PM, Markus Trippelsdorf wrote:
> On 2015.09.01 at 21:00 +0800, Xiao Guangrong wrote:
>>
>> Did it trigger the BUG()/BUG_ON() in mtrr2protval()/fallback_mtrr_type()?
>> If yes, could you please print the actual value out?
>
> It is the BUG() in fallback_mtrr_type(). I changed it to a printk and
> it prints 1 for the value of mtrr.
>
>   MTRR_TYPE_WRCOMB     1
>

Then I suspect PAT is not enabled on your box. Could you please check whether
CONFIG_X86_PAT is selected in your .config file, whether pat is shown in
/proc/cpuinfo, whether the "nopat" kernel parameter is used, and what
dmesg | grep PAT prints?

I will post a fix if my suspicion is right.
Markus Trippelsdorf Sept. 2, 2015, 3:50 a.m. UTC | #4
On 2015.09.02 at 06:31 +0800, Xiao Guangrong wrote:
> 
> 
> On 09/01/2015 09:56 PM, Markus Trippelsdorf wrote:
> > On 2015.09.01 at 21:00 +0800, Xiao Guangrong wrote:
> >>
> >> Did it trigger the BUG()/BUG_ON() in mtrr2protval()/fallback_mtrr_type()?
> >> If yes, could you please print the actual value out?
> >
> > It is the BUG() in fallback_mtrr_type(). I changed it to a printk and
> > it prints 1 for the value of mtrr.
> >
> >   MTRR_TYPE_WRCOMB     1
> >
> 
> Then I suspect PAT is not enabled on your box. Could you please check whether
> CONFIG_X86_PAT is selected in your .config file, whether pat is shown in
> /proc/cpuinfo, whether the "nopat" kernel parameter is used, and what
> dmesg | grep PAT prints?

No. PAT is of course enabled, and booting sometimes succeeds even
with the BUG() in fallback_mtrr_type(). I suspect a setup (timing) issue.

markus@x4 linux % cat .config | grep  X86_PAT
CONFIG_X86_PAT=y
markus@x4 linux % dmesg | grep PAT
[    0.000000] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT  
markus@x4 linux % cat /proc/cpuinfo| grep pat
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate npt lbrv svm_lock nrip_save vmmcall
...
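
To make the PAT line from dmesg easier to reason about, here is a small userspace sketch that decodes a PAT MSR value into its eight slots and looks up which slot, if any, provides a given memory type. The 64-bit value PAT_FROM_DMESG is reconstructed by hand from the "WB WC UC- UC WB WC UC- WT" line above and is an assumption, not a value read from the machine; PAT_RESET_DEFAULT is the architectural power-on default, included only for comparison. The helper names are hypothetical and this is not the kernel's build_mtrr2protval().

```c
#include <assert.h>
#include <stdint.h>

#define PAT_ENTRIES 8

/* Memory type held in PAT slot `slot`: the low 3 bits of byte `slot`. */
static unsigned int pat_slot_type(uint64_t pat, unsigned int slot)
{
	assert(slot < PAT_ENTRIES);
	return (unsigned int)(pat >> (8 * slot)) & 0x7;
}

/* Index of the first slot that holds `type`, or -1 if no slot does. */
static int pat_find_slot(uint64_t pat, unsigned int type)
{
	for (unsigned int slot = 0; slot < PAT_ENTRIES; slot++) {
		if (pat_slot_type(pat, slot) == type)
			return (int)slot;
	}
	return -1;
}

/* Assumed value for slots 0-7 = WB WC UC- UC WB WC UC- WT (from dmesg). */
static const uint64_t PAT_FROM_DMESG = 0x0407010600070106ULL;

/* Architectural reset default: slots 0-7 = WB WT UC- UC WB WT UC- UC. */
static const uint64_t PAT_RESET_DEFAULT = 0x0007040600070406ULL;
```

Under these assumptions, pat_find_slot(PAT_FROM_DMESG, 1) finds write-combining in slot 1, whereas pat_find_slot(PAT_RESET_DEFAULT, 1) returns -1 because the reset default has no WC slot. That only shows that a lookup for type 1 can succeed or fail depending on which PAT value is read; whether the value seen during svm_hardware_setup() differs from the one dmesg reports is not established in this thread.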

Patch

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 74d825716f4f..3190173a575f 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -989,7 +989,7 @@  static __init int svm_hardware_setup(void)
 	} else
 		kvm_disable_tdp();
 
-	build_mtrr2protval();
+//	build_mtrr2protval();
 	return 0;
 
 err: