Message ID | 20160217204954.GO2023@pd.tnic (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Feb 17, 2016 at 12:49 PM, Borislav Petkov <bp@alien8.de> wrote: > On Wed, Feb 17, 2016 at 12:07:13PM -0800, Luis R. Rodriguez wrote: >> OK so here's a wiki to keep track of progress of the difference uses: >> >> http://kernelnewbies.org/KernelProjects/remove-paravirt-enabled >> >> It seems we have a resolution one way or another for all except for >> the use on arch/x86/mm/dump_pagetables.c, is that right? > > Why not? > > I think we should simply check the range as ffff800000000000 - > ffff87ffffffffff is practically an ABI and nothing should be mapped > there anyway. No need for paravirt_enabled() there either. Provided someone on the xen side acks, then great! We'd have full coverage to remove all uses soon and kill paravirt_enabled() for good. It may take some time to run tests of this to get a full sense of correctness but perhaps in the future it may be easier if 0-day gets some basic Xen tests (or embraces the Xen test suite) as was discussed as possible a while ago. lguest may need some basic tests too, but I'm not even sure what type of tests we'd run against lguest. Luis
On 02/17/2016 03:49 PM, Borislav Petkov wrote: > On Wed, Feb 17, 2016 at 12:07:13PM -0800, Luis R. Rodriguez wrote: >> OK so here's a wiki to keep track of progress of the difference uses: >> >> http://kernelnewbies.org/KernelProjects/remove-paravirt-enabled >> >> It seems we have a resolution one way or another for all except for >> the use on arch/x86/mm/dump_pagetables.c, is that right? > > Why not? > > I think we should simply check the range as ffff800000000000 - > ffff87ffffffffff is practically an ABI and nothing should be mapped ^^^^^^^^^ That's exactly the point: if something is mapped it's an error for a non-PV kernel. By removing paravirt_enabled() we may miss those errors. Worse, I think we may even crash while doing pagetable walk (although it's probably better to crash here than to use an unexpected translation in real code somewhere) -boris > there anyway. No need for paravirt_enabled() there either. >
On Wed, Feb 17, 2016 at 04:21:56PM -0500, Boris Ostrovsky wrote: > That's exactly the point: if something is mapped it's an error for a > non-PV kernel. How would something be mapped there? __PAGE_OFFSET is 0xffff880000000000. Or are you thinking about some insanely b0rked kernel code mapping stuff in there? > By removing paravirt_enabled() we may miss those errors. Worse, I think we > may even crash while doing pagetable walk (although it's probably better to > crash here than to use an unexpected translation in real code somewhere) Well, if this is the only site which keeps paravirt_enabled() from being buried, we need to think about a better way to detect a hypervisor. Maybe we should look at x86_hyper, use CPUID(0x4...) or something else. What's your preference?
On Wed, Feb 17, 2016 at 2:03 PM, Borislav Petkov <bp@alien8.de> wrote: > On Wed, Feb 17, 2016 at 04:21:56PM -0500, Boris Ostrovsky wrote: >> That's exactly the point: if something is mapped it's an error for a >> non-PV kernel. > > How would something be mapped there? __PAGE_OFFSET is > 0xffff880000000000. > > Or are you thinking about some insanely b0rked kernel code mapping stuff > in there? > >> By removing paravirt_enabled() we may miss those errors. Worse, I think we >> may even crash while doing pagetable walk (although it's probably better to >> crash here than to use an unexpected translation in real code somewhere) > > Well, if this is the only site which keeps paravirt_enabled() from being > buried, we need to think about a better way to detect a hypervisor. > Maybe we should look at x86_hyper, use CPUID(0x4...) or something else. > > What's your preference? I'm confused. Isn't it the other way around? That is, checking for the hypervisor range on all systems should be safer, right? Or am I missing something? --Andy
On 02/17/2016 05:03 PM, Borislav Petkov wrote: > On Wed, Feb 17, 2016 at 04:21:56PM -0500, Boris Ostrovsky wrote: >> That's exactly the point: if something is mapped it's an error for a >> non-PV kernel. > How would something be mapped there? __PAGE_OFFSET is > 0xffff880000000000. > > Or are you thinking about some insanely b0rked kernel code mapping stuff > in there? The latter. This is to detect things that clearly shouldn't be happening. > >> By removing paravirt_enabled() we may miss those errors. Worse, I think we >> may even crash while doing pagetable walk (although it's probably better to >> crash here than to use an unexpected translation in real code somewhere) > Well, if this is the only site which keeps paravirt_enabled() from being > buried, we need to think about a better way to detect a hypervisor. > Maybe we should look at x86_hyper, use CPUID(0x4...) Can't use CPUID 0x40000000 because it will return hypervisor (Xen or otherwise) for non-PV guests as well. In Xen's case, you can't determine guest type from hypervisor leaves. > or something else. We could say xen_pv_domain(). But this means using Xen-specific code in x86-generic file to detect things specified by ABI. I don't know if I'd like that. -boris > > What's your preference? >
On Wed, Feb 17, 2016 at 11:03:04PM +0100, Borislav Petkov wrote: > On Wed, Feb 17, 2016 at 04:21:56PM -0500, Boris Ostrovsky wrote: > > That's exactly the point: if something is mapped it's an error for a > > non-PV kernel. > > How would something be mapped there? __PAGE_OFFSET is > 0xffff880000000000. > > Or are you thinking about some insanely b0rked kernel code mapping stuff > in there? > > > By removing paravirt_enabled() we may miss those errors. Worse, I think we > > may even crash while doing pagetable walk (although it's probably better to > > crash here than to use an unexpected translation in real code somewhere) > > Well, if this is the only site which keeps paravirt_enabled() from being > buried, we need to think about a better way to detect a hypervisor. > Maybe we should look at x86_hyper, use CPUID(0x4...) or something else. > > What's your preference? I can think of two possibilities but lets also address _why_ we can't replace it: the semantic gap. 1) Can't we just set_memory_np() to cause a page fault on that range to catch invalid access? 2) Add hypervisor type I've been lobbying for a new boot protocol hypervisor type and hypervisor data pointer extensions, much in the same way subarch and subarch_data was added for x86 boot protocol 2.07. This would be an extension to help fill in the gaps, it would also make it accessible to super early code and could help those of us who care about super clean inits to work towards these goals. Right now we can rely on the subarch for most Xen concerns but with Xen HVM and future Xen HVMLite things get more complex and as noted by hpa the subarch was not designed as a 'hypervisor type', such a thing should be considered separateley. So we'd knock about 2-3 birds with 1 stone here: a) there is a semantic gap between early access to hypervisor type of code between asm boot and when setup_arch() is called, only after init_hypervisor_platform() is called (in setup_arch()) can things such as cpu_has_hypervisor() or derivatives such as kvm_para_available() be correctly used, as only then will that information be correct across the board. Part of this issue is what gave rise to paravirt_enabled() hackeries in the first place. We have subarch but that is not to be used to set things such as hypervisor type, I'll soon post a patch to clarify the exact use case for the subarch. Since we have no uniform way to detect hypervisor types we are starting to see custom hacks. For instance of a hack propagated to drivers now, see sound/pci/intel8x0.c use of kvm_para_available(). b) help put an end to paravirt_enabled() for cases we can't replace c) provide an early access mechanism to hypervisor type. This should help towards unifying inits by enabling further stubs on early and post routine calls. This is future long term possible work: With a hypervisor type and hypervisor custom data pointer, we'd strive to work to make xen use the same startup_32() and startup_64() entries, with stubs possible using the hypervisor type at the start /end as follows: startup_32() startup_64() | | | | V V pre_hypervisor_stub_32() pre_hypervisor_stub_64() | | | | V V [existing startup_32()] [existing startup_64()] | | | | V V post_hypervisor_stub_32() post_hypervisor_stub_64() The pre_hypervisor_stub_32() would have much of the code of the newly proposed hvmlite_start_xen() but for 32-bit, pre_hypervisor_stub_64() would have the 64-bits. I realize Andrew has not been a fan over the idea of Xen setting on the zero page *any* parameter but he's also noted an alternative is to just lobby for Grub to boot Xen kernels and then we can rely on Grub to set it for Linux. The only issue with this is it seems this doesn't address stubdomains which don't use grub? Luis
On 02/17/2016 05:18 PM, Andy Lutomirski wrote: > On Wed, Feb 17, 2016 at 2:03 PM, Borislav Petkov <bp@alien8.de> wrote: >> On Wed, Feb 17, 2016 at 04:21:56PM -0500, Boris Ostrovsky wrote: >>> That's exactly the point: if something is mapped it's an error for a >>> non-PV kernel. >> How would something be mapped there? __PAGE_OFFSET is >> 0xffff880000000000. >> >> Or are you thinking about some insanely b0rked kernel code mapping stuff >> in there? >> >>> By removing paravirt_enabled() we may miss those errors. Worse, I think we >>> may even crash while doing pagetable walk (although it's probably better to >>> crash here than to use an unexpected translation in real code somewhere) >> Well, if this is the only site which keeps paravirt_enabled() from being >> buried, we need to think about a better way to detect a hypervisor. >> Maybe we should look at x86_hyper, use CPUID(0x4...) or something else. >> >> What's your preference? > I'm confused. Isn't it the other way around? That is, checking for > the hypervisor range on all systems should be safer, right? Or am I > missing something? Hmm. I think you are right --- I was following wrong branch of the 'if' statement. We are always going straight to note_page(). Then yes, we should be able to remove paravirt_enabled(). Sorry for the noise. -boris
On Wed, Feb 17, 2016 at 05:39:23PM -0500, Boris Ostrovsky wrote: > Hmm. I think you are right --- I was following wrong branch of the 'if' > statement. We are always going straight to note_page(). Yap. The is_hypervisor_range() - without the "!" - does note_page(). > Then yes, we should be able to remove paravirt_enabled(). Sorry for > the noise. I'll do a formal patch tomorrow. Thanks.
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c index 4a6f1d9b5106..de1ee3a40250 100644 --- a/arch/x86/mm/dump_pagetables.c +++ b/arch/x86/mm/dump_pagetables.c @@ -358,20 +358,20 @@ static void walk_pud_level(struct seq_file *m, struct pg_state *st, pgd_t addr, #define pgd_none(a) pud_none(__pud(pgd_val(a))) #endif -#ifdef CONFIG_X86_64 static inline bool is_hypervisor_range(int idx) { +#ifdef CONFIG_X86_64 + /* * ffff800000000000 - ffff87ffffffffff is reserved for * the hypervisor. */ - return paravirt_enabled() && - (idx >= pgd_index(__PAGE_OFFSET) - 16) && + return (idx >= pgd_index(__PAGE_OFFSET) - 16) && (idx < pgd_index(__PAGE_OFFSET)); -} #else -static inline bool is_hypervisor_range(int idx) { return false; } + return false; #endif +} static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd, bool checkwx)