diff mbox

[v2,3/3] paravirt: rename paravirt_enabled to paravirt_legacy

Message ID 20160217204954.GO2023@pd.tnic (mailing list archive)
State New, archived
Headers show

Commit Message

Borislav Petkov Feb. 17, 2016, 8:49 p.m. UTC
On Wed, Feb 17, 2016 at 12:07:13PM -0800, Luis R. Rodriguez wrote:
> OK so here's a wiki to keep track of progress of the difference uses:
> 
> http://kernelnewbies.org/KernelProjects/remove-paravirt-enabled
> 
> It seems we have a resolution one way or another for all except for
> the use on arch/x86/mm/dump_pagetables.c, is that right?

Why not?

I think we should simply check the range as ffff800000000000 -
ffff87ffffffffff is practically an ABI and nothing should be mapped
there anyway. No need for paravirt_enabled() there either.

---

Comments

Luis Chamberlain Feb. 17, 2016, 9:12 p.m. UTC | #1
On Wed, Feb 17, 2016 at 12:49 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Wed, Feb 17, 2016 at 12:07:13PM -0800, Luis R. Rodriguez wrote:
>> OK so here's a wiki to keep track of progress of the difference uses:
>>
>> http://kernelnewbies.org/KernelProjects/remove-paravirt-enabled
>>
>> It seems we have a resolution one way or another for all except for
>> the use on arch/x86/mm/dump_pagetables.c, is that right?
>
> Why not?
>
> I think we should simply check the range as ffff800000000000 -
> ffff87ffffffffff is practically an ABI and nothing should be mapped
> there anyway. No need for paravirt_enabled() there either.

Provided someone on the xen side acks, then great! We'd have full
coverage to remove all uses soon and kill paravirt_enabled() for good.
It may take some time to run tests of this to get a full sense of
correctness but perhaps in the future it may be easier if 0-day gets
some basic Xen tests (or embraces the Xen test suite) as was discussed
as possible a while ago. lguest may need some basic tests too, but I'm
not even sure what type of tests we'd run against lguest.

 Luis
Boris Ostrovsky Feb. 17, 2016, 9:21 p.m. UTC | #2
On 02/17/2016 03:49 PM, Borislav Petkov wrote:
> On Wed, Feb 17, 2016 at 12:07:13PM -0800, Luis R. Rodriguez wrote:
>> OK so here's a wiki to keep track of progress of the difference uses:
>>
>> http://kernelnewbies.org/KernelProjects/remove-paravirt-enabled
>>
>> It seems we have a resolution one way or another for all except for
>> the use on arch/x86/mm/dump_pagetables.c, is that right?
>
> Why not?
>
> I think we should simply check the range as ffff800000000000 -
> ffff87ffffffffff is practically an ABI and nothing should be mapped
                                                     ^^^^^^^^^

That's exactly the point: if something is mapped it's an error for a 
non-PV kernel.

By removing paravirt_enabled() we may miss those errors. Worse, I think 
we may even crash while doing pagetable walk (although it's probably 
better to crash here than to use an unexpected translation in real code 
somewhere)

-boris


> there anyway. No need for paravirt_enabled() there either.
>
Borislav Petkov Feb. 17, 2016, 10:03 p.m. UTC | #3
On Wed, Feb 17, 2016 at 04:21:56PM -0500, Boris Ostrovsky wrote:
> That's exactly the point: if something is mapped it's an error for a
> non-PV kernel.

How would something be mapped there? __PAGE_OFFSET is
0xffff880000000000.

Or are you thinking about some insanely b0rked kernel code mapping stuff
in there?

> By removing paravirt_enabled() we may miss those errors. Worse, I think we
> may even crash while doing pagetable walk (although it's probably better to
> crash here than to use an unexpected translation in real code somewhere)

Well, if this is the only site which keeps paravirt_enabled() from being
buried, we need to think about a better way to detect a hypervisor.
Maybe we should look at x86_hyper, use CPUID(0x4...) or something else.

What's your preference?
Andy Lutomirski Feb. 17, 2016, 10:18 p.m. UTC | #4
On Wed, Feb 17, 2016 at 2:03 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Wed, Feb 17, 2016 at 04:21:56PM -0500, Boris Ostrovsky wrote:
>> That's exactly the point: if something is mapped it's an error for a
>> non-PV kernel.
>
> How would something be mapped there? __PAGE_OFFSET is
> 0xffff880000000000.
>
> Or are you thinking about some insanely b0rked kernel code mapping stuff
> in there?
>
>> By removing paravirt_enabled() we may miss those errors. Worse, I think we
>> may even crash while doing pagetable walk (although it's probably better to
>> crash here than to use an unexpected translation in real code somewhere)
>
> Well, if this is the only site which keeps paravirt_enabled() from being
> buried, we need to think about a better way to detect a hypervisor.
> Maybe we should look at x86_hyper, use CPUID(0x4...) or something else.
>
> What's your preference?

I'm confused.  Isn't it the other way around?  That is, checking for
the hypervisor range on all systems should be safer, right?  Or am I
missing something?

--Andy
Boris Ostrovsky Feb. 17, 2016, 10:19 p.m. UTC | #5
On 02/17/2016 05:03 PM, Borislav Petkov wrote:
> On Wed, Feb 17, 2016 at 04:21:56PM -0500, Boris Ostrovsky wrote:
>> That's exactly the point: if something is mapped it's an error for a
>> non-PV kernel.
> How would something be mapped there? __PAGE_OFFSET is
> 0xffff880000000000.
>
> Or are you thinking about some insanely b0rked kernel code mapping stuff
> in there?

The latter. This is to detect things that clearly shouldn't be happening.

>
>> By removing paravirt_enabled() we may miss those errors. Worse, I think we
>> may even crash while doing pagetable walk (although it's probably better to
>> crash here than to use an unexpected translation in real code somewhere)
> Well, if this is the only site which keeps paravirt_enabled() from being
> buried, we need to think about a better way to detect a hypervisor.
> Maybe we should look at x86_hyper, use CPUID(0x4...)

Can't use CPUID 0x40000000 because it will return hypervisor (Xen or 
otherwise) for non-PV guests as well. In Xen's case, you can't determine 
guest type from hypervisor leaves.


> or something else.

We could say xen_pv_domain(). But this means using Xen-specific code in 
x86-generic file to detect things specified by ABI. I don't know if I'd 
like that.

-boris

>
> What's your preference?
>
Luis Chamberlain Feb. 17, 2016, 10:35 p.m. UTC | #6
On Wed, Feb 17, 2016 at 11:03:04PM +0100, Borislav Petkov wrote:
> On Wed, Feb 17, 2016 at 04:21:56PM -0500, Boris Ostrovsky wrote:
> > That's exactly the point: if something is mapped it's an error for a
> > non-PV kernel.
> 
> How would something be mapped there? __PAGE_OFFSET is
> 0xffff880000000000.
> 
> Or are you thinking about some insanely b0rked kernel code mapping stuff
> in there?
> 
> > By removing paravirt_enabled() we may miss those errors. Worse, I think we
> > may even crash while doing pagetable walk (although it's probably better to
> > crash here than to use an unexpected translation in real code somewhere)
> 
> Well, if this is the only site which keeps paravirt_enabled() from being
> buried, we need to think about a better way to detect a hypervisor.
> Maybe we should look at x86_hyper, use CPUID(0x4...) or something else.
> 
> What's your preference?

I can think of two possibilities but lets also address _why_ we can't
replace it: the semantic gap.

1) Can't we just set_memory_np() to cause a page fault on that range
   to catch invalid access?

2) Add hypervisor type

I've been lobbying for a new boot protocol hypervisor type and hypervisor data
pointer extensions, much in the same way subarch and subarch_data was added for
x86 boot protocol 2.07.  This would be an extension to help fill in the gaps,
it would also make it accessible to super early code and could help those of
us who care about super clean inits to work towards these goals. Right now we
can rely on the subarch for most Xen concerns but with Xen HVM and future Xen
HVMLite things get more complex and as noted by hpa the subarch was not
designed as a 'hypervisor type', such a thing should be considered separateley.

So we'd knock about 2-3 birds with 1 stone here:

  a) there is a semantic gap between early access to hypervisor type of
     code between asm boot and when setup_arch() is called, only after
     init_hypervisor_platform() is called (in setup_arch()) can things
     such as  cpu_has_hypervisor() or derivatives such as kvm_para_available()
     be correctly used, as only then will that information be correct across
     the board. Part of this issue is what gave rise to paravirt_enabled()
     hackeries in the first place. We have subarch but that is not to be used
     to set things such as hypervisor type, I'll soon post a patch to clarify
     the exact use case for the subarch.
     Since we have no uniform way to detect hypervisor types we are starting
     to see custom hacks. For instance of a hack propagated to drivers now, see
     sound/pci/intel8x0.c use of kvm_para_available().
  b) help put an end to paravirt_enabled() for cases we can't replace
  c) provide an early access mechanism to hypervisor type. This should
     help towards unifying inits by enabling further stubs on early and
     post routine calls. This is future long term possible work:

     With a hypervisor type and hypervisor custom data pointer, we'd strive
     to work to make xen use the same startup_32() and startup_64() entries,
     with stubs possible using the hypervisor type at the start /end as
     follows:

	startup_32()                         startup_64()
	       |                                  |
	       |                                  |
	       V                                  V
	pre_hypervisor_stub_32()        pre_hypervisor_stub_64()
	       |                                  |
	       |                                  |
	       V                                  V
	 [existing startup_32()]       [existing startup_64()]
	       |                                  |
	       |                                  |
	       V                                  V
	post_hypervisor_stub_32()       post_hypervisor_stub_64()

    The pre_hypervisor_stub_32() would have much of the code of
    the newly proposed hvmlite_start_xen() but for 32-bit, pre_hypervisor_stub_64()
    would have the 64-bits.

I realize Andrew has not been a fan over the idea of Xen setting on the
zero page *any* parameter but he's also noted an alternative is to just
lobby for Grub to boot Xen kernels and then we can rely on Grub to
set it for Linux. The only issue with this is it seems this doesn't
address stubdomains which don't use grub?

  Luis
Boris Ostrovsky Feb. 17, 2016, 10:39 p.m. UTC | #7
On 02/17/2016 05:18 PM, Andy Lutomirski wrote:
> On Wed, Feb 17, 2016 at 2:03 PM, Borislav Petkov <bp@alien8.de> wrote:
>> On Wed, Feb 17, 2016 at 04:21:56PM -0500, Boris Ostrovsky wrote:
>>> That's exactly the point: if something is mapped it's an error for a
>>> non-PV kernel.
>> How would something be mapped there? __PAGE_OFFSET is
>> 0xffff880000000000.
>>
>> Or are you thinking about some insanely b0rked kernel code mapping stuff
>> in there?
>>
>>> By removing paravirt_enabled() we may miss those errors. Worse, I think we
>>> may even crash while doing pagetable walk (although it's probably better to
>>> crash here than to use an unexpected translation in real code somewhere)
>> Well, if this is the only site which keeps paravirt_enabled() from being
>> buried, we need to think about a better way to detect a hypervisor.
>> Maybe we should look at x86_hyper, use CPUID(0x4...) or something else.
>>
>> What's your preference?
> I'm confused.  Isn't it the other way around?  That is, checking for
> the hypervisor range on all systems should be safer, right?  Or am I
> missing something?

Hmm. I think you are right --- I was following wrong branch of the 'if' 
statement. We are always going straight to note_page().

Then yes, we should be able to remove paravirt_enabled(). Sorry for the 
noise.

-boris
Borislav Petkov Feb. 17, 2016, 11:39 p.m. UTC | #8
On Wed, Feb 17, 2016 at 05:39:23PM -0500, Boris Ostrovsky wrote:
> Hmm. I think you are right --- I was following wrong branch of the 'if'
> statement. We are always going straight to note_page().

Yap. The is_hypervisor_range() - without the "!" - does note_page().

> Then yes, we should be able to remove paravirt_enabled(). Sorry for
> the noise.

I'll do a formal patch tomorrow.

Thanks.
diff mbox

Patch

diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 4a6f1d9b5106..de1ee3a40250 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -358,20 +358,20 @@  static void walk_pud_level(struct seq_file *m, struct pg_state *st, pgd_t addr,
 #define pgd_none(a)  pud_none(__pud(pgd_val(a)))
 #endif
 
-#ifdef CONFIG_X86_64
 static inline bool is_hypervisor_range(int idx)
 {
+#ifdef CONFIG_X86_64
+
 	/*
 	 * ffff800000000000 - ffff87ffffffffff is reserved for
 	 * the hypervisor.
 	 */
-	return paravirt_enabled() &&
-		(idx >= pgd_index(__PAGE_OFFSET) - 16) &&
+	return	(idx >= pgd_index(__PAGE_OFFSET) - 16) &&
 		(idx < pgd_index(__PAGE_OFFSET));
-}
 #else
-static inline bool is_hypervisor_range(int idx) { return false; }
+	return false;
 #endif
+}
 
 static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
 				       bool checkwx)