Message ID | 20240319205849.115884-4-jason.andryuk@amd.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | x86/pvh: Support relocating dom0 kernel | expand |
On 19.03.2024 21:58, Jason Andryuk wrote: > --- a/xen/arch/x86/hvm/dom0_build.c > +++ b/xen/arch/x86/hvm/dom0_build.c > @@ -537,6 +537,97 @@ static paddr_t __init find_memory( > return INVALID_PADDR; > } > > +static bool __init check_load_address( > + const struct domain *d, const struct elf_binary *elf) > +{ > + paddr_t kernel_start = (paddr_t)elf->dest_base; As before I think it is illegitimate to cast a pointer to paddr_t. The variable type wants to remain such, but the cast wants to be to unsigned long or uintptr_t (or else at least a comment wants adding). > + paddr_t kernel_end = kernel_start + elf->dest_size; > + unsigned int i; > + > + /* Relies on a sorted memory map. */ > + for ( i = 0; i < d->arch.nr_e820; i++ ) > + { > + paddr_t start = d->arch.e820[i].addr; > + paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size; Nit: Shorter if you use "start" here (also again in the other function). > + if ( start >= kernel_end ) > + return false; > + > + if ( d->arch.e820[i].type == E820_RAM && Nit: Excess blank before &&. > + start <= kernel_start && > + end >= kernel_end ) > + return true; As to the comment ahead of the function: This further relies on adjacent entries of the same type having got merged. > + } > + > + return false; > +} > + > +/* Find an e820 RAM region that fits the kernel at a suitable alignment. */ > +static paddr_t __init find_kernel_memory( > + const struct domain *d, struct elf_binary *elf, > + const struct elf_dom_parms *parms) > +{ > + paddr_t kernel_size = elf->dest_size; > + unsigned int i; > + > + for ( i = 0; i < d->arch.nr_e820; i++ ) > + { > + paddr_t start = d->arch.e820[i].addr; > + paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size; > + paddr_t kstart, kend; > + > + if ( d->arch.e820[i].type != E820_RAM || > + d->arch.e820[i].size < kernel_size ) > + continue; > + > + kstart = ROUNDUP(start, parms->phys_align); > + kstart = max_t(paddr_t, kstart, parms->phys_min); I'd generally try to avoid max_t(), but I cannot think of any good way of expressing this without using it. > + kend = kstart + kernel_size; > + > + if ( kend > parms->phys_max ) So despite its default phys_max is exclusive aiui. Otherwise with kend being exclusive too, this would look to be off by one. > + return 0; > + > + if ( kend <= end ) > + return kstart; > + } > + > + return 0; > +} > + > +/* Check the kernel load address, and adjust if necessary and possible. */ > +static bool __init check_and_adjust_load_address( > + const struct domain *d, struct elf_binary *elf, struct elf_dom_parms *parms) > +{ > + paddr_t reloc_base; > + > + if ( check_load_address(d, elf) ) > + return true; > + > + if ( !parms->phys_reloc ) > + { > + printk("%pd kernel: Address conflict and not relocatable\n", d); > + return false; This better wouldn't result in -ENOMEM in the caller. Then again reasons are logged, so the specific error code perhaos doesn't matter that much. > + } > + > + reloc_base = find_kernel_memory(d, elf, parms); > + if ( reloc_base == 0 ) With ! used above please be consistent and do so here, too. > + { > + printk("%pd kernel: Failed find a load address\n", d); > + return false; > + } > + > + if ( opt_dom0_verbose ) > + printk("%pd kernel: Moving [%p, %p] -> [%"PRIpaddr", %"PRIpaddr"]\n", d, > + elf->dest_base, elf->dest_base + elf->dest_size - 1, > + reloc_base, reloc_base + elf->dest_size - 1); > + > + parms->phys_entry = reloc_base + > + (parms->phys_entry - (paddr_t)elf->dest_base); > + elf->dest_base = (char *)reloc_base; Just as a remark, no request to change anything: We're not dealing with strings here. Hence char * isn't quite right, just that "dest_base" is of that type (for whatever reason). > @@ -161,6 +164,10 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf, > { > elf_msg(elf, "ELF: note: %s\n", note_desc[type].name); > } > + else if ( note_desc[type].type == ELFNOTE_CUSTOM ) > + { > + elf_msg(elf, "ELF: note: %s", note_desc[type].name); > + } This could do with a brief comment, to make clear the \n isn't accidentally missing here. > @@ -225,6 +232,28 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf, > case XEN_ELFNOTE_PHYS32_ENTRY: > parms->phys_entry = val; > break; > + > + case XEN_ELFNOTE_PHYS32_RELOC: > + parms->phys_reloc = true; > + > + if ( descsz >= 4 ) > + { > + parms->phys_max = elf_note_numeric_array(elf, note, 4, 0); > + elf_msg(elf, " = max: %#"PRIx32, parms->phys_max); > + } > + if ( descsz >= 8 ) > + { > + parms->phys_min = elf_note_numeric_array(elf, note, 4, 1); > + elf_msg(elf, " min: %#"PRIx32, parms->phys_min); > + } > + if ( descsz >= 12 ) > + { > + parms->phys_align = elf_note_numeric_array(elf, note, 4, 2); > + elf_msg(elf, " align: %#"PRIx32, parms->phys_align); > + } > + > + elf_msg(elf, "\n"); Imo the newline wants adding outside of the switch(), for everything set to ELFNOTE_CUSTOM. In fact with that I think ELFNOTE_CUSTOM and ELFNOTE_NAME could be folded (with what's NAME right now simply printing nothing in the switch here). Which suggests that I may better not commit the (slightly adjusted) earlier patch, yet. > @@ -536,6 +565,10 @@ elf_errorstatus elf_xen_parse(struct elf_binary *elf, > parms->p2m_base = UNSET_ADDR; > parms->elf_paddr_offset = UNSET_ADDR; > parms->phys_entry = UNSET_ADDR32; > + parms->phys_min = 0; > + parms->phys_max = 0xffffffff; > + parms->phys_align = 0x200000; I think this wants to be MB(2) (requiring a pre-patch to centralize MB() in the tool stack to tools/include/xen-tools/common-macros.h). And I further think this needs to be an arch-specific constant, even if right now the note is expected to be present only for x86. Which then also needs saying ... > --- a/xen/include/public/elfnote.h > +++ b/xen/include/public/elfnote.h > @@ -194,10 +194,26 @@ > */ > #define XEN_ELFNOTE_PHYS32_ENTRY 18 > > +/* > + * Physical loading constraints for PVH kernels > + * > + * Used to place constraints on the guest physical loading addresses and > + * alignment for a PVH kernel. > + * > + * The presence of this note indicates the kernel supports relocating itself. > + * > + * The note may include up to three 32bit values in the following order: > + * - a maximum address for the entire image to be loaded below (default > + * 0xffffffff) > + * - a minimum address for the start of the image (default 0) > + * - a required start alignment (default 0x200000) > + */ > +#define XEN_ELFNOTE_PHYS32_RELOC 19 ... in the comment here. The reasoning behind not (intermediately) falling back to the alignment in the ELF headers also wants adding to the patch description (or a code comment, albeit there's no really good place to put it, imo). Additionally I think the pieces of the comment want re-ordering. The primary purpose is indicating relocatability; when not relocating, the constraints aren't applied in any way. Jan
On Wed, 20 Mar 2024, Jan Beulich wrote: > On 19.03.2024 21:58, Jason Andryuk wrote: > > --- a/xen/arch/x86/hvm/dom0_build.c > > +++ b/xen/arch/x86/hvm/dom0_build.c > > @@ -537,6 +537,97 @@ static paddr_t __init find_memory( > > return INVALID_PADDR; > > } > > > > +static bool __init check_load_address( > > + const struct domain *d, const struct elf_binary *elf) > > +{ > > + paddr_t kernel_start = (paddr_t)elf->dest_base; > > As before I think it is illegitimate to cast a pointer to paddr_t. The > variable type wants to remain such, but the cast wants to be to > unsigned long or uintptr_t (or else at least a comment wants adding). uintptr_t is a good suggestion. uintptr_t is the recommended way by the C standard and MISRA by extension to cast integers to points and vice versa. In Xen we have used unsigned long for the same purpose although it is not the best way any longer (something else to document).
On 2024-03-20 10:39, Jan Beulich wrote: > On 19.03.2024 21:58, Jason Andryuk wrote: >> --- a/xen/arch/x86/hvm/dom0_build.c >> +++ b/xen/arch/x86/hvm/dom0_build.c >> @@ -537,6 +537,97 @@ static paddr_t __init find_memory( >> return INVALID_PADDR; >> } >> >> +static bool __init check_load_address( >> + const struct domain *d, const struct elf_binary *elf) >> +{ >> + paddr_t kernel_start = (paddr_t)elf->dest_base; > > As before I think it is illegitimate to cast a pointer to paddr_t. The > variable type wants to remain such, but the cast wants to be to > unsigned long or uintptr_t (or else at least a comment wants adding). Ok, thanks. This explains it clearly. >> +/* Find an e820 RAM region that fits the kernel at a suitable alignment. */ >> +static paddr_t __init find_kernel_memory( >> + const struct domain *d, struct elf_binary *elf, >> + const struct elf_dom_parms *parms) >> +{ >> + paddr_t kernel_size = elf->dest_size; >> + unsigned int i; >> + >> + for ( i = 0; i < d->arch.nr_e820; i++ ) >> + { >> + paddr_t start = d->arch.e820[i].addr; >> + paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size; >> + paddr_t kstart, kend; >> + >> + if ( d->arch.e820[i].type != E820_RAM || >> + d->arch.e820[i].size < kernel_size ) >> + continue; >> + >> + kstart = ROUNDUP(start, parms->phys_align); >> + kstart = max_t(paddr_t, kstart, parms->phys_min); > > I'd generally try to avoid max_t(), but I cannot think of any good way > of expressing this without using it. > >> + kend = kstart + kernel_size; >> + >> + if ( kend > parms->phys_max ) > > So despite its default phys_max is exclusive aiui. Otherwise with > kend being exclusive too, this would look to be off by one. Yes, I'll fix the off-by-one. Hmmm, phys_max being 32bit, we want it to be inclusive to represent the maximum range. >> + return 0; >> + >> + if ( kend <= end ) >> + return kstart; >> + } >> + >> + return 0; >> +} >> + >> +/* Check the kernel load address, and adjust if necessary and possible. */ >> +static bool __init check_and_adjust_load_address( >> + const struct domain *d, struct elf_binary *elf, struct elf_dom_parms *parms) >> +{ >> + paddr_t reloc_base; >> + >> + if ( check_load_address(d, elf) ) >> + return true; >> + >> + if ( !parms->phys_reloc ) >> + { >> + printk("%pd kernel: Address conflict and not relocatable\n", d); >> + return false; > > This better wouldn't result in -ENOMEM in the caller. Then again reasons > are logged, so the specific error code perhaos doesn't matter that much. Failure here is turned into -ENOMEM by pvh_load_kernel(). -ENOMEM is already returned for later failure with find_memory(), so I thought it was acceptable. Without this code, elf_load_binary() would failed with -1 and that would be returned. I'll change it to whatever you prefer. >> + } >> + >> + reloc_base = find_kernel_memory(d, elf, parms); >> + if ( reloc_base == 0 ) > > With ! used above please be consistent and do so here, too. phys_reloc is a bool, and reloc_base is a paddr_t. >> + { >> + printk("%pd kernel: Failed find a load address\n", d); >> + return false; >> + } >> + >> + if ( opt_dom0_verbose ) >> + printk("%pd kernel: Moving [%p, %p] -> [%"PRIpaddr", %"PRIpaddr"]\n", d, >> + elf->dest_base, elf->dest_base + elf->dest_size - 1, >> + reloc_base, reloc_base + elf->dest_size - 1); >> + >> + parms->phys_entry = reloc_base + >> + (parms->phys_entry - (paddr_t)elf->dest_base); >> + elf->dest_base = (char *)reloc_base; > > Just as a remark, no request to change anything: We're not dealing with > strings here. Hence char * isn't quite right, just that "dest_base" is > of that type (for whatever reason). I think the reason is just to be byte addressable. >> @@ -225,6 +232,28 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf, >> case XEN_ELFNOTE_PHYS32_ENTRY: >> parms->phys_entry = val; >> break; >> + >> + case XEN_ELFNOTE_PHYS32_RELOC: >> + parms->phys_reloc = true; >> + >> + if ( descsz >= 4 ) >> + { >> + parms->phys_max = elf_note_numeric_array(elf, note, 4, 0); >> + elf_msg(elf, " = max: %#"PRIx32, parms->phys_max); >> + } >> + if ( descsz >= 8 ) >> + { >> + parms->phys_min = elf_note_numeric_array(elf, note, 4, 1); >> + elf_msg(elf, " min: %#"PRIx32, parms->phys_min); >> + } >> + if ( descsz >= 12 ) >> + { >> + parms->phys_align = elf_note_numeric_array(elf, note, 4, 2); >> + elf_msg(elf, " align: %#"PRIx32, parms->phys_align); >> + } >> + >> + elf_msg(elf, "\n"); > > Imo the newline wants adding outside of the switch(), for everything > set to ELFNOTE_CUSTOM. In fact with that I think ELFNOTE_CUSTOM and > ELFNOTE_NAME could be folded (with what's NAME right now simply > printing nothing in the switch here). Which suggests that I may > better not commit the (slightly adjusted) earlier patch, yet. Yes, please hold off and I'll re-work both. >> @@ -536,6 +565,10 @@ elf_errorstatus elf_xen_parse(struct elf_binary *elf, >> parms->p2m_base = UNSET_ADDR; >> parms->elf_paddr_offset = UNSET_ADDR; >> parms->phys_entry = UNSET_ADDR32; >> + parms->phys_min = 0; >> + parms->phys_max = 0xffffffff; >> + parms->phys_align = 0x200000; > > I think this wants to be MB(2) (requiring a pre-patch to centralize MB() > in the tool stack to tools/include/xen-tools/common-macros.h). And I > further think this needs to be an arch-specific constant, even if right > now the note is expected to be present only for x86. Which then also > needs saying ... Ok, I'll look into this. >> --- a/xen/include/public/elfnote.h >> +++ b/xen/include/public/elfnote.h >> @@ -194,10 +194,26 @@ >> */ >> #define XEN_ELFNOTE_PHYS32_ENTRY 18 >> >> +/* >> + * Physical loading constraints for PVH kernels >> + * >> + * Used to place constraints on the guest physical loading addresses and >> + * alignment for a PVH kernel. >> + * >> + * The presence of this note indicates the kernel supports relocating itself. >> + * >> + * The note may include up to three 32bit values in the following order: >> + * - a maximum address for the entire image to be loaded below (default >> + * 0xffffffff) >> + * - a minimum address for the start of the image (default 0) >> + * - a required start alignment (default 0x200000) >> + */ >> +#define XEN_ELFNOTE_PHYS32_RELOC 19 > > ... in the comment here. The reasoning behind not (intermediately) falling > back to the alignment in the ELF headers also wants adding to the patch > description (or a code comment, albeit there's no really good place to put > it, imo). > > Additionally I think the pieces of the comment want re-ordering. The primary > purpose is indicating relocatability; when not relocating, the constraints > aren't applied in any way. Yes, this is a good point. Thanks for the review. -Jason
On 21.03.2024 14:45, Jason Andryuk wrote: > On 2024-03-20 10:39, Jan Beulich wrote: >> On 19.03.2024 21:58, Jason Andryuk wrote: >>> +/* Find an e820 RAM region that fits the kernel at a suitable alignment. */ >>> +static paddr_t __init find_kernel_memory( >>> + const struct domain *d, struct elf_binary *elf, >>> + const struct elf_dom_parms *parms) >>> +{ >>> + paddr_t kernel_size = elf->dest_size; >>> + unsigned int i; >>> + >>> + for ( i = 0; i < d->arch.nr_e820; i++ ) >>> + { >>> + paddr_t start = d->arch.e820[i].addr; >>> + paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size; >>> + paddr_t kstart, kend; >>> + >>> + if ( d->arch.e820[i].type != E820_RAM || >>> + d->arch.e820[i].size < kernel_size ) >>> + continue; >>> + >>> + kstart = ROUNDUP(start, parms->phys_align); >>> + kstart = max_t(paddr_t, kstart, parms->phys_min); >> >> I'd generally try to avoid max_t(), but I cannot think of any good way >> of expressing this without using it. >> >>> + kend = kstart + kernel_size; >>> + >>> + if ( kend > parms->phys_max ) >> >> So despite its default phys_max is exclusive aiui. Otherwise with >> kend being exclusive too, this would look to be off by one. > > Yes, I'll fix the off-by-one. Hmmm, phys_max being 32bit, we want it to > be inclusive to represent the maximum range. I specifically didn't ask for that, as I think it would end up a little awkward. But I also don't mind you adjusting things to that effect. >>> + return 0; >>> + >>> + if ( kend <= end ) >>> + return kstart; >>> + } >>> + >>> + return 0; >>> +} >>> + >>> +/* Check the kernel load address, and adjust if necessary and possible. */ >>> +static bool __init check_and_adjust_load_address( >>> + const struct domain *d, struct elf_binary *elf, struct elf_dom_parms *parms) >>> +{ >>> + paddr_t reloc_base; >>> + >>> + if ( check_load_address(d, elf) ) >>> + return true; >>> + >>> + if ( !parms->phys_reloc ) >>> + { >>> + printk("%pd kernel: Address conflict and not relocatable\n", d); >>> + return false; >> >> This better wouldn't result in -ENOMEM in the caller. Then again reasons >> are logged, so the specific error code perhaos doesn't matter that much. > > Failure here is turned into -ENOMEM by pvh_load_kernel(). -ENOMEM is > already returned for later failure with find_memory(), so I thought it > was acceptable. Without this code, elf_load_binary() would failed with > -1 and that would be returned. I'll change it to whatever you prefer. ENOSPC would likely make it easily distinguishable from anything else. >>> + } >>> + >>> + reloc_base = find_kernel_memory(d, elf, parms); >>> + if ( reloc_base == 0 ) >> >> With ! used above please be consistent and do so here, too. > > phys_reloc is a bool, and reloc_base is a paddr_t. Hmm, I see. But still - we generally prefer ! over "== 0" (or even "== NULL"). >>> + { >>> + printk("%pd kernel: Failed find a load address\n", d); >>> + return false; >>> + } >>> + >>> + if ( opt_dom0_verbose ) >>> + printk("%pd kernel: Moving [%p, %p] -> [%"PRIpaddr", %"PRIpaddr"]\n", d, >>> + elf->dest_base, elf->dest_base + elf->dest_size - 1, >>> + reloc_base, reloc_base + elf->dest_size - 1); >>> + >>> + parms->phys_entry = reloc_base + >>> + (parms->phys_entry - (paddr_t)elf->dest_base); >>> + elf->dest_base = (char *)reloc_base; >> >> Just as a remark, no request to change anything: We're not dealing with >> strings here. Hence char * isn't quite right, just that "dest_base" is >> of that type (for whatever reason). > > I think the reason is just to be byte addressable. Sure, but that would call for void *, unsigned char *, or uint8_t *. Jan
On 2024-03-21 09:45, Jason Andryuk wrote: > On 2024-03-20 10:39, Jan Beulich wrote: >> On 19.03.2024 21:58, Jason Andryuk wrote: >>> @@ -536,6 +565,10 @@ elf_errorstatus elf_xen_parse(struct elf_binary >>> *elf, >>> parms->p2m_base = UNSET_ADDR; >>> parms->elf_paddr_offset = UNSET_ADDR; >>> parms->phys_entry = UNSET_ADDR32; >>> + parms->phys_min = 0; >>> + parms->phys_max = 0xffffffff; >>> + parms->phys_align = 0x200000; >> >> I think this wants to be MB(2) (requiring a pre-patch to centralize MB() >> in the tool stack to tools/include/xen-tools/common-macros.h). And I >> further think this needs to be an arch-specific constant, even if right >> now the note is expected to be present only for x86. Which then also >> needs saying ... Are you thinking something like the following in libelf-dominfo.c: #define X86_PHYS_ALIGN_DEFAULT MB(2) #define X86_PHYS_MAX_DEFAULT (GB(4) - 1) and setting as: parms->phys_max = X86_PHYS_MAX_DEFAULT; parms->phys_align = X86_PHYS_ALIGN_DEFAULT; libelf is arch neutral, so there isn't a natural place to introduce arch-specific defines. Or were you looking for each arch to set it? We only care about x86 right now, so we can do something like: #if x86 #define ARCH_PHYS_MAX_DEFAULT (GB(4) - 1) #define ARCH_PHYS_ALIGN_DEFAULT MB(2) #else #define ARCH_PHYS_MAX_DEFAULT 0 #define ARCH_PHYS_ALIGN_DEFAULT 0 #endif Thanks, Jason
On 22.03.2024 15:24, Jason Andryuk wrote: > On 2024-03-21 09:45, Jason Andryuk wrote: >> On 2024-03-20 10:39, Jan Beulich wrote: >>> On 19.03.2024 21:58, Jason Andryuk wrote: > >>>> @@ -536,6 +565,10 @@ elf_errorstatus elf_xen_parse(struct elf_binary >>>> *elf, >>>> parms->p2m_base = UNSET_ADDR; >>>> parms->elf_paddr_offset = UNSET_ADDR; >>>> parms->phys_entry = UNSET_ADDR32; >>>> + parms->phys_min = 0; >>>> + parms->phys_max = 0xffffffff; >>>> + parms->phys_align = 0x200000; >>> >>> I think this wants to be MB(2) (requiring a pre-patch to centralize MB() >>> in the tool stack to tools/include/xen-tools/common-macros.h). And I >>> further think this needs to be an arch-specific constant, even if right >>> now the note is expected to be present only for x86. Which then also >>> needs saying ... > > Are you thinking something like the following in libelf-dominfo.c: > > #define X86_PHYS_ALIGN_DEFAULT MB(2) > #define X86_PHYS_MAX_DEFAULT (GB(4) - 1) > > and setting as: > parms->phys_max = X86_PHYS_MAX_DEFAULT; > parms->phys_align = X86_PHYS_ALIGN_DEFAULT; > > libelf is arch neutral, so there isn't a natural place to introduce > arch-specific defines. Or were you looking for each arch to set it? We > only care about x86 right now, so we can do something like: > > #if x86 > #define ARCH_PHYS_MAX_DEFAULT (GB(4) - 1) > #define ARCH_PHYS_ALIGN_DEFAULT MB(2) > #else > #define ARCH_PHYS_MAX_DEFAULT 0 > #define ARCH_PHYS_ALIGN_DEFAULT 0 > #endif More like the latter. The former only if the phys_* fields themselves were to also become x86-only. As you say, libelf in its present shape doesn't easily lend itself to such arch-specifics. Jan
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c index 0ceda4140b..d2686318c3 100644 --- a/xen/arch/x86/hvm/dom0_build.c +++ b/xen/arch/x86/hvm/dom0_build.c @@ -537,6 +537,97 @@ static paddr_t __init find_memory( return INVALID_PADDR; } +static bool __init check_load_address( + const struct domain *d, const struct elf_binary *elf) +{ + paddr_t kernel_start = (paddr_t)elf->dest_base; + paddr_t kernel_end = kernel_start + elf->dest_size; + unsigned int i; + + /* Relies on a sorted memory map. */ + for ( i = 0; i < d->arch.nr_e820; i++ ) + { + paddr_t start = d->arch.e820[i].addr; + paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size; + + if ( start >= kernel_end ) + return false; + + if ( d->arch.e820[i].type == E820_RAM && + start <= kernel_start && + end >= kernel_end ) + return true; + } + + return false; +} + +/* Find an e820 RAM region that fits the kernel at a suitable alignment. */ +static paddr_t __init find_kernel_memory( + const struct domain *d, struct elf_binary *elf, + const struct elf_dom_parms *parms) +{ + paddr_t kernel_size = elf->dest_size; + unsigned int i; + + for ( i = 0; i < d->arch.nr_e820; i++ ) + { + paddr_t start = d->arch.e820[i].addr; + paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size; + paddr_t kstart, kend; + + if ( d->arch.e820[i].type != E820_RAM || + d->arch.e820[i].size < kernel_size ) + continue; + + kstart = ROUNDUP(start, parms->phys_align); + kstart = max_t(paddr_t, kstart, parms->phys_min); + kend = kstart + kernel_size; + + if ( kend > parms->phys_max ) + return 0; + + if ( kend <= end ) + return kstart; + } + + return 0; +} + +/* Check the kernel load address, and adjust if necessary and possible. */ +static bool __init check_and_adjust_load_address( + const struct domain *d, struct elf_binary *elf, struct elf_dom_parms *parms) +{ + paddr_t reloc_base; + + if ( check_load_address(d, elf) ) + return true; + + if ( !parms->phys_reloc ) + { + printk("%pd kernel: Address conflict and not relocatable\n", d); + return false; + } + + reloc_base = find_kernel_memory(d, elf, parms); + if ( reloc_base == 0 ) + { + printk("%pd kernel: Failed find a load address\n", d); + return false; + } + + if ( opt_dom0_verbose ) + printk("%pd kernel: Moving [%p, %p] -> [%"PRIpaddr", %"PRIpaddr"]\n", d, + elf->dest_base, elf->dest_base + elf->dest_size - 1, + reloc_base, reloc_base + elf->dest_size - 1); + + parms->phys_entry = reloc_base + + (parms->phys_entry - (paddr_t)elf->dest_base); + elf->dest_base = (char *)reloc_base; + + return true; +} + static int __init pvh_load_kernel(struct domain *d, const module_t *image, unsigned long image_headroom, module_t *initrd, void *image_base, @@ -585,6 +676,9 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image, elf.dest_base = (void *)(parms.virt_kstart - parms.virt_base); elf.dest_size = parms.virt_kend - parms.virt_kstart; + if ( !check_and_adjust_load_address(d, &elf, &parms) ) + return -ENOMEM; + elf_set_vcpu(&elf, v); rc = elf_load_binary(&elf); if ( rc < 0 ) diff --git a/xen/common/libelf/libelf-dominfo.c b/xen/common/libelf/libelf-dominfo.c index 405510ead9..7861e79658 100644 --- a/xen/common/libelf/libelf-dominfo.c +++ b/xen/common/libelf/libelf-dominfo.c @@ -105,6 +105,7 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf, ELFNOTE_INT, ELFNOTE_STRING, ELFNOTE_NAME, + ELFNOTE_CUSTOM, } type; } note_desc[] = { [XEN_ELFNOTE_ENTRY] = { "ENTRY", ELFNOTE_INT }, @@ -125,6 +126,7 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf, [XEN_ELFNOTE_SUSPEND_CANCEL] = { "SUSPEND_CANCEL", ELFNOTE_INT }, [XEN_ELFNOTE_MOD_START_PFN] = { "MOD_START_PFN", ELFNOTE_INT }, [XEN_ELFNOTE_PHYS32_ENTRY] = { "PHYS32_ENTRY", ELFNOTE_INT }, + [XEN_ELFNOTE_PHYS32_RELOC] = { "PHYS32_RELOC", ELFNOTE_CUSTOM }, }; /* *INDENT-ON* */ @@ -132,6 +134,7 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf, uint64_t val = 0; unsigned int i; unsigned type = elf_uval(elf, note, type); + unsigned descsz = elf_uval(elf, note, descsz); if ( (type >= sizeof(note_desc) / sizeof(note_desc[0])) || (note_desc[type].name == NULL) ) @@ -161,6 +164,10 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf, { elf_msg(elf, "ELF: note: %s\n", note_desc[type].name); } + else if ( note_desc[type].type == ELFNOTE_CUSTOM ) + { + elf_msg(elf, "ELF: note: %s", note_desc[type].name); + } parms->elf_notes[type].name = note_desc[type].name; switch ( type ) @@ -225,6 +232,28 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf, case XEN_ELFNOTE_PHYS32_ENTRY: parms->phys_entry = val; break; + + case XEN_ELFNOTE_PHYS32_RELOC: + parms->phys_reloc = true; + + if ( descsz >= 4 ) + { + parms->phys_max = elf_note_numeric_array(elf, note, 4, 0); + elf_msg(elf, " = max: %#"PRIx32, parms->phys_max); + } + if ( descsz >= 8 ) + { + parms->phys_min = elf_note_numeric_array(elf, note, 4, 1); + elf_msg(elf, " min: %#"PRIx32, parms->phys_min); + } + if ( descsz >= 12 ) + { + parms->phys_align = elf_note_numeric_array(elf, note, 4, 2); + elf_msg(elf, " align: %#"PRIx32, parms->phys_align); + } + + elf_msg(elf, "\n"); + break; } return 0; } @@ -536,6 +565,10 @@ elf_errorstatus elf_xen_parse(struct elf_binary *elf, parms->p2m_base = UNSET_ADDR; parms->elf_paddr_offset = UNSET_ADDR; parms->phys_entry = UNSET_ADDR32; + parms->phys_min = 0; + parms->phys_max = 0xffffffff; + parms->phys_align = 0x200000; + parms->phys_reloc = false; /* Find and parse elf notes. */ count = elf_phdr_count(elf); diff --git a/xen/include/public/elfnote.h b/xen/include/public/elfnote.h index 8bf54d035b..54362b907f 100644 --- a/xen/include/public/elfnote.h +++ b/xen/include/public/elfnote.h @@ -194,10 +194,26 @@ */ #define XEN_ELFNOTE_PHYS32_ENTRY 18 +/* + * Physical loading constraints for PVH kernels + * + * Used to place constraints on the guest physical loading addresses and + * alignment for a PVH kernel. + * + * The presence of this note indicates the kernel supports relocating itself. + * + * The note may include up to three 32bit values in the following order: + * - a maximum address for the entire image to be loaded below (default + * 0xffffffff) + * - a minimum address for the start of the image (default 0) + * - a required start alignment (default 0x200000) + */ +#define XEN_ELFNOTE_PHYS32_RELOC 19 + /* * The number of the highest elfnote defined. */ -#define XEN_ELFNOTE_MAX XEN_ELFNOTE_PHYS32_ENTRY +#define XEN_ELFNOTE_MAX XEN_ELFNOTE_PHYS32_RELOC /* * System information exported through crash notes. diff --git a/xen/include/xen/libelf.h b/xen/include/xen/libelf.h index 1c77e3df31..777c5008ca 100644 --- a/xen/include/xen/libelf.h +++ b/xen/include/xen/libelf.h @@ -434,10 +434,14 @@ struct elf_dom_parms { uint32_t f_supported[XENFEAT_NR_SUBMAPS]; uint32_t f_required[XENFEAT_NR_SUBMAPS]; uint32_t phys_entry; + uint32_t phys_min; + uint32_t phys_max; + uint32_t phys_align; /* calculated */ uint64_t virt_kstart; uint64_t virt_kend; + bool phys_reloc; }; static inline void elf_xen_feature_set(int nr, uint32_t * addr)
Xen tries to load a PVH dom0 kernel at the fixed guest physical address from the elf headers. For Linux, this defaults to 0x1000000 (16MB), but it can be configured. Unfortunately there exist firmwares that have reserved regions at this address, so Xen fails to load the dom0 kernel since it's not RAM. The PVH entry code is not relocatable - it loads from absolute addresses, which fail when the kernel is loaded at a different address. With a suitably modified kernel, a reloctable entry point is possible. Add XEN_ELFNOTE_PHYS32_RELOC which specifies optional minimum, maximum and alignment needed for the kernel. The presence of the NOTE indicates the kernel supports a relocatable entry path. Change the loading to check for an acceptable load address. If the kernel is relocatable, support finding an alternate load address. Link: https://gitlab.com/xen-project/xen/-/issues/180 Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> --- ELF Note printing looks like: (XEN) ELF: note: PHYS32_RELOC = max: 0xffffffff min: 0x1000000 align: 0x200000 v2: Use elfnote for min, max & align - use 64bit values. Print original and relocated memory addresses Use check_and_adjust_load_address() name Return relocated base instead of offset Use PAGE_ALIGN Don't load above max_phys (expected to be 4GB in kernel elf note) Use single line comments Exit check_load_address loop earlier Add __init to find_kernel_memory() v3: Remove kernel_start/end page rounding Change loop comment to rely on a sorted memory map. Reorder E820_RAM check first Use %p for dest_base Use PRIpaddr Use 32bit phys_min/max/align Consolidate to if ( x || y ) continue Use max_t Add parms->phys_reloc Use common "%pd kernel: " prefix for messages Re-order phys_entry assignment Print range ends inclusively Remove extra "Unable to load kernel" message s/PVH_RELOCATION/PHYS32_RELOC/ Make PHYS32_RELOC contents optional Use 2MB default alignment Update ELF Note comment Update XEN_ELFNOTE_MAX --- xen/arch/x86/hvm/dom0_build.c | 94 ++++++++++++++++++++++++++++++ xen/common/libelf/libelf-dominfo.c | 33 +++++++++++ xen/include/public/elfnote.h | 18 +++++- xen/include/xen/libelf.h | 4 ++ 4 files changed, 148 insertions(+), 1 deletion(-)