Message ID | 20200302155509.44753-1-roger.pau@citrix.com
---|---
State | Superseded
Series | x86/dom0: improve PVH initrd and metadata placement
On 02.03.2020 16:55, Roger Pau Monne wrote:
> Don't assume there's going to be enough space at the tail of the
> loaded kernel and instead try to find a suitable memory area where the
> initrd and metadata can be loaded.
>
> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
>  xen/arch/x86/hvm/dom0_build.c | 51 ++++++++++++++++++++++++++++++++++-
>  1 file changed, 50 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
> index eded87eaf5..a03bf2e663 100644
> --- a/xen/arch/x86/hvm/dom0_build.c
> +++ b/xen/arch/x86/hvm/dom0_build.c
> @@ -490,6 +490,44 @@ static int __init pvh_populate_p2m(struct domain *d)
>  #undef MB1_PAGES
>  }
>
> +static paddr_t find_memory(const struct domain *d, const struct elf_binary *elf,
> +                           size_t size)
> +{
> +    paddr_t kernel_start = (paddr_t)elf->dest_base;
> +    paddr_t kernel_end = (paddr_t)(elf->dest_base + elf->dest_size);
> +    unsigned int i;
> +
> +    for ( i = 0; i < d->arch.nr_e820; i++ )
> +    {
> +        paddr_t start, end;
> +
> +        if ( d->arch.e820[i].addr < MB(1) &&
> +             d->arch.e820[i].addr + d->arch.e820[i].size < MB(1) )

I guess you mean <= here, and I also guess the left side of the &&
could be dropped altogether (as redundant - E820 entries aren't
supposed to wrap, or if they did we'd have bigger problems). Also
perhaps ...

> +            continue;
> +
> +        start = MAX(ROUNDUP(d->arch.e820[i].addr, PAGE_SIZE), MB(1));
> +        end = d->arch.e820[i].addr + d->arch.e820[i].size;

... calculate "end" earlier and use it in the if() above?

As to the aligning to a 1Mb boundary - why are you doing this? I guess
whatever the reason warrants a comment, the more that further down you
only align to page boundaries.

> +        /* Deal with the kernel being loaded in the region. */
> +        if ( kernel_start <= start && kernel_end >= start )

While it doesn't matter much, I think it would look better to use > on
the right side of the && here ...

> +            /* Truncate the start of the region */
> +            start = ROUNDUP(kernel_end, PAGE_SIZE);
> +        else if ( kernel_start <= end && kernel_end >= end )

and < on the left side of the && here. Furthermore - can this really
be "else if()" here, i.e. doesn't it need to be plain "if()"?

> +            /* Truncate the end of the region */
> +            end = kernel_start;
> +        /* Pick the biggest of the split regions */
> +        else if ( kernel_start - start > end - kernel_end )

I don't think the logic above guarantees e.g. kernel_start > start
(i.e. the subtraction to not wrap). More generally I don't think it
follows that there are two split regions at this point. At the very
least I think this whole block wants to be wrapped in a check that
there's any overlap between kernel and the given region in the first
place.

> +            end = kernel_start;
> +        else
> +            start = ROUNDUP(kernel_end, PAGE_SIZE);
> +
> +        if ( end - start >= size )
> +            return start;

Are all blocks E820_RAM at this point in time? Otherwise there's a
type check missing. (Even if all are of this type now, adding a type
check may still be a good idea to be more future proof.)

> @@ -546,7 +584,18 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image,
>          return rc;
>      }
>
> -    last_addr = ROUNDUP(parms.virt_kend - parms.virt_base, PAGE_SIZE);
> +    last_addr = find_memory(d, &elf, sizeof(start_info) +
> +                            initrd ? ROUNDUP(initrd->mod_end, PAGE_SIZE) +
> +                                     sizeof(mod)
> +                                   : 0 +
> +                            cmdline ? ROUNDUP(strlen(cmdline) + 1,
> +                                              elf_64bit(&elf) ? 8 : 4)
> +                                    : 0);

I guess you mean

    last_addr = find_memory(d, &elf, sizeof(start_info) +
                            (initrd ? ROUNDUP(initrd->mod_end, PAGE_SIZE) +
                                      sizeof(mod)
                                    : 0) +
                            (cmdline ? ROUNDUP(strlen(cmdline) + 1,
                                               elf_64bit(&elf) ? 8 : 4)
                                     : 0));

?

Also, do both regions need to be adjacent? If not, wouldn't it be
better to find slots for them one by one?

Jan
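The parenthesisation issue Jan points out comes from ?: binding more
loosely than +: in the unparenthesised version, everything summed up to
that point becomes the condition of the first ternary rather than one of
its operands. A minimal, self-contained C illustration of the difference
(the variable names and values below are invented for the example and are
not taken from the patch):

    #include <stdio.h>

    int main(void)
    {
        unsigned int base = 100, have_initrd = 0, initrd_size = 50;

        /* Parsed as (base + have_initrd) ? initrd_size : 0 -- the sum is the condition. */
        unsigned int unparenthesised = base + have_initrd ? initrd_size : 0;

        /* Intended meaning: base plus an optional contribution. */
        unsigned int parenthesised = base + (have_initrd ? initrd_size : 0);

        printf("%u vs %u\n", unparenthesised, parenthesised); /* prints "50 vs 100" */
        return 0;
    }

In the patch the same thing happens: sizeof(start_info) + initrd becomes
the condition of the first ternary, so the computed size collapses to a
single branch instead of the intended sum.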
On Tue, Mar 03, 2020 at 10:14:50AM +0100, Jan Beulich wrote:
> On 02.03.2020 16:55, Roger Pau Monne wrote:
> > Don't assume there's going to be enough space at the tail of the
> > loaded kernel and instead try to find a suitable memory area where the
> > initrd and metadata can be loaded.
> >
> > Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> >  xen/arch/x86/hvm/dom0_build.c | 51 ++++++++++++++++++++++++++++++++++-
> >  1 file changed, 50 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
> > index eded87eaf5..a03bf2e663 100644
> > --- a/xen/arch/x86/hvm/dom0_build.c
> > +++ b/xen/arch/x86/hvm/dom0_build.c
> > @@ -490,6 +490,44 @@ static int __init pvh_populate_p2m(struct domain *d)
> >  #undef MB1_PAGES
> >  }
> >
> > +static paddr_t find_memory(const struct domain *d, const struct elf_binary *elf,
> > +                           size_t size)
> > +{
> > +    paddr_t kernel_start = (paddr_t)elf->dest_base;
> > +    paddr_t kernel_end = (paddr_t)(elf->dest_base + elf->dest_size);
> > +    unsigned int i;
> > +
> > +    for ( i = 0; i < d->arch.nr_e820; i++ )
> > +    {
> > +        paddr_t start, end;
> > +
> > +        if ( d->arch.e820[i].addr < MB(1) &&
> > +             d->arch.e820[i].addr + d->arch.e820[i].size < MB(1) )
>
> I guess you mean <= here,

Hm, right, or else I would have to - 1.

> and I also guess the left side of the && could be dropped altogether
> (as redundant - E820 entries aren't supposed to wrap, or if they did
> we'd have bigger problems). Also perhaps ...
>
> > +            continue;
> > +
> > +        start = MAX(ROUNDUP(d->arch.e820[i].addr, PAGE_SIZE), MB(1));
> > +        end = d->arch.e820[i].addr + d->arch.e820[i].size;
>
> ... calculate "end" earlier and use it in the if() above?

Right.

>
> As to the aligning to a 1Mb boundary - why are you doing this?

I'm not sure placing the initrd and metadata below 1MB is sensible,
even if a region big enough was found. In pvh_populate_p2m we copy the
data on the memory regions < 1MB, I'm not sure BDA/EBDA is marked as
reserved in the memory map always.

> I guess whatever the reason warrants a comment, the more that further
> down you only align to page boundaries.
>
> > +        /* Deal with the kernel being loaded in the region. */
> > +        if ( kernel_start <= start && kernel_end >= start )
>
> While it doesn't matter much, I think it would look better to use > on
> the right side of the && here ...
>
> > +            /* Truncate the start of the region */
> > +            start = ROUNDUP(kernel_end, PAGE_SIZE);
> > +        else if ( kernel_start <= end && kernel_end >= end )
>
> and < on the left side of the && here. Furthermore - can this really
> be "else if()" here, i.e. doesn't it need to be plain "if()"?

I don't think so, as the region where the kernel has been loaded must
always be a single RAM region. Ie: [kernel_start, kernel_end) is
always going to be a subset of a RAM region.

So either the kernel region doesn't overlap, or overlaps with the head,
tail or splits the region into two.

> > +            /* Truncate the end of the region */
> > +            end = kernel_start;
> > +        /* Pick the biggest of the split regions */
> > +        else if ( kernel_start - start > end - kernel_end )
>
> I don't think the logic above guarantees e.g. kernel_start > start
> (i.e. the subtraction to not wrap). More generally I don't think it
> follows that there are two split regions at this point. At the very
> least I think this whole block wants to be wrapped in a check that
> there's any overlap between kernel and the given region in the first
> place.

Yes, that's indeed missing.

> > +            end = kernel_start;
> > +        else
> > +            start = ROUNDUP(kernel_end, PAGE_SIZE);
> > +
> > +        if ( end - start >= size )
> > +            return start;
>
> Are all blocks E820_RAM at this point in time? Otherwise there's a
> type check missing. (Even if all are of this type now, adding a type
> check may still be a good idea to be more future proof.)

Will add the check.

> > @@ -546,7 +584,18 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image,
> >          return rc;
> >      }
> >
> > -    last_addr = ROUNDUP(parms.virt_kend - parms.virt_base, PAGE_SIZE);
> > +    last_addr = find_memory(d, &elf, sizeof(start_info) +
> > +                            initrd ? ROUNDUP(initrd->mod_end, PAGE_SIZE) +
> > +                                     sizeof(mod)
> > +                                   : 0 +
> > +                            cmdline ? ROUNDUP(strlen(cmdline) + 1,
> > +                                              elf_64bit(&elf) ? 8 : 4)
> > +                                    : 0);
>
> I guess you mean
>
>     last_addr = find_memory(d, &elf, sizeof(start_info) +
>                             (initrd ? ROUNDUP(initrd->mod_end, PAGE_SIZE) +
>                                       sizeof(mod)
>                                     : 0) +
>                             (cmdline ? ROUNDUP(strlen(cmdline) + 1,
>                                                elf_64bit(&elf) ? 8 : 4)
>                                      : 0));
>
> ?

Uh, yes, sorry.

>
> Also, do both regions need to be adjacent? If not, wouldn't it be
> better to find slots for them one by one?

That's going to be much more complicated, as you would have to account
for previous regions while searching for empty spaces. If we want to
go that route we would have to use a rangeset or some such in order to
keep track of used areas.

Thanks, Roger.
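For readers following along, the two points agreed above (an explicit
overlap test and an E820_RAM type check) could be folded into the loop
roughly as below. This is only a sketch against the v1 code quoted in
this thread, not the actual follow-up patch:

    for ( i = 0; i < d->arch.nr_e820; i++ )
    {
        paddr_t start, end = d->arch.e820[i].addr + d->arch.e820[i].size;

        /* Only consider RAM regions, and skip anything entirely below 1MB. */
        if ( d->arch.e820[i].type != E820_RAM || end <= MB(1) )
            continue;

        start = MAX(ROUNDUP(d->arch.e820[i].addr, PAGE_SIZE), MB(1));

        /* Only trim the region if the kernel actually overlaps it. */
        if ( kernel_start < end && kernel_end > start )
        {
            if ( kernel_start <= start )
                /* Kernel covers the head (possibly all) of the region. */
                start = ROUNDUP(kernel_end, PAGE_SIZE);
            else if ( kernel_end >= end )
                /* Kernel covers the tail of the region. */
                end = kernel_start;
            else if ( kernel_start - start > end - kernel_end )
                /* Region split in two: the lower part is bigger. */
                end = kernel_start;
            else
                /* Region split in two: the upper part is bigger. */
                start = ROUNDUP(kernel_end, PAGE_SIZE);
        }

        if ( end > start && end - start >= size )
            return start;
    }

The end > start guard covers the case where the kernel swallows the
whole region, where the trimmed range would otherwise wrap.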
On 03.03.2020 11:29, Roger Pau Monné wrote:
> On Tue, Mar 03, 2020 at 10:14:50AM +0100, Jan Beulich wrote:
>> On 02.03.2020 16:55, Roger Pau Monne wrote:
>>> +        start = MAX(ROUNDUP(d->arch.e820[i].addr, PAGE_SIZE), MB(1));
>>> +        end = d->arch.e820[i].addr + d->arch.e820[i].size;
>>
>> ... calculate "end" earlier and use it in the if() above?
>
> Right.
>
>>
>> As to the aligning to a 1Mb boundary - why are you doing this?
>
> I'm not sure placing the initrd and metadata below 1MB is sensible,
> even if a region big enough was found. In pvh_populate_p2m we copy the
> data on the memory regions < 1MB, I'm not sure BDA/EBDA is marked as
> reserved in the memory map always.

I realize now that I misread the code - you don't align to a 1Mb
boundary, but you cap the range at the lower end. That's fine of
course.

>> I guess whatever the reason warrants a comment, the more that
>> further down you only align to page boundaries.
>>
>>> +        /* Deal with the kernel being loaded in the region. */
>>> +        if ( kernel_start <= start && kernel_end >= start )
>>
>> While it doesn't matter much, I think it would look better to
>> use > on the right side of the && here ...
>>
>>> +            /* Truncate the start of the region */
>>> +            start = ROUNDUP(kernel_end, PAGE_SIZE);
>>> +        else if ( kernel_start <= end && kernel_end >= end )
>>
>> and < on the left side of the && here. Furthermore - can this
>> really be "else if()" here, i.e. doesn't it need to be plain
>> "if()"?
>
> I don't think so, as the region where the kernel has been loaded must
> always be a single RAM region. Ie: [kernel_start, kernel_end) is
> always going to be a subset of a RAM region.

Is this true even with the page size alignment getting done?
IOW are all E820 ranges we produce exact multiples of 4k in
size and aligned to 4k boundaries?

>> Also, do both regions need to be adjacent? If not, wouldn't it be
>> better to find slots for them one by one?
>
> That's going to be much more complicated, as you would have to account
> for previous regions while searching for empty spaces. If we want to
> go that route we would have to use a rangeset or some such in order to
> keep track of used areas.

I accept this being more complicated, and hence not really
wanting doing now and here. But perhaps you could leave a
comment to the effect that the choice of using a single
region is for simplicity reasons?

Jan
On Tue, Mar 03, 2020 at 12:00:00PM +0100, Jan Beulich wrote:
> On 03.03.2020 11:29, Roger Pau Monné wrote:
> > On Tue, Mar 03, 2020 at 10:14:50AM +0100, Jan Beulich wrote:
> >> On 02.03.2020 16:55, Roger Pau Monne wrote:
> >>> +            /* Truncate the start of the region */
> >>> +            start = ROUNDUP(kernel_end, PAGE_SIZE);
> >>> +        else if ( kernel_start <= end && kernel_end >= end )
> >>
> >> and < on the left side of the && here. Furthermore - can this
> >> really be "else if()" here, i.e. doesn't it need to be plain
> >> "if()"?
> >
> > I don't think so, as the region where the kernel has been loaded must
> > always be a single RAM region. Ie: [kernel_start, kernel_end) is
> > always going to be a subset of a RAM region.
>
> Is this true even with the page size alignment getting done?
> IOW are all E820 ranges we produce exact multiples of 4k in
> size and aligned to 4k boundaries?

Yes, pvh_setup_e820 guarantees that, as the EPT/NPT cannot handle
anything smaller than a page. The RAM regions in the native e820 are
adjusted to that effect.

> >> Also, do both regions need to be adjacent? If not, wouldn't it be
> >> better to find slots for them one by one?
> >
> > That's going to be much more complicated, as you would have to account
> > for previous regions while searching for empty spaces. If we want to
> > go that route we would have to use a rangeset or some such in order to
> > keep track of used areas.
>
> I accept this being more complicated, and hence not really
> wanting doing now and here. But perhaps you could leave a
> comment to the effect that the choice of using a single
> region is for simplicity reasons?

Sure.

Thanks, Roger.
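As a rough illustration of the guarantee Roger describes (this is not
the actual pvh_setup_e820() code, and the names clip_ram_range and
PVH_PAGE_SIZE are made up for the example): RAM entries handed to the
PVH dom0 builder are clipped to page granularity, rounding the start up
and the end down, so that the p2m never has to map a partial page:

    #include <stdint.h>

    #define PVH_PAGE_SIZE 4096ULL   /* stand-in for Xen's PAGE_SIZE */

    /* Clip a RAM range to whole pages; drop it if no full page remains. */
    static void clip_ram_range(uint64_t *addr, uint64_t *size)
    {
        uint64_t start = (*addr + PVH_PAGE_SIZE - 1) & ~(PVH_PAGE_SIZE - 1);
        uint64_t end = (*addr + *size) & ~(PVH_PAGE_SIZE - 1);

        *addr = start;
        *size = end > start ? end - start : 0;
    }

Under that guarantee every RAM entry is 4k-aligned and a 4k multiple,
which is what makes the "kernel is a subset of a single RAM region"
assumption above hold.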
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index eded87eaf5..a03bf2e663 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -490,6 +490,44 @@ static int __init pvh_populate_p2m(struct domain *d)
 #undef MB1_PAGES
 }
 
+static paddr_t find_memory(const struct domain *d, const struct elf_binary *elf,
+                           size_t size)
+{
+    paddr_t kernel_start = (paddr_t)elf->dest_base;
+    paddr_t kernel_end = (paddr_t)(elf->dest_base + elf->dest_size);
+    unsigned int i;
+
+    for ( i = 0; i < d->arch.nr_e820; i++ )
+    {
+        paddr_t start, end;
+
+        if ( d->arch.e820[i].addr < MB(1) &&
+             d->arch.e820[i].addr + d->arch.e820[i].size < MB(1) )
+            continue;
+
+        start = MAX(ROUNDUP(d->arch.e820[i].addr, PAGE_SIZE), MB(1));
+        end = d->arch.e820[i].addr + d->arch.e820[i].size;
+
+        /* Deal with the kernel being loaded in the region. */
+        if ( kernel_start <= start && kernel_end >= start )
+            /* Truncate the start of the region */
+            start = ROUNDUP(kernel_end, PAGE_SIZE);
+        else if ( kernel_start <= end && kernel_end >= end )
+            /* Truncate the end of the region */
+            end = kernel_start;
+        /* Pick the biggest of the split regions */
+        else if ( kernel_start - start > end - kernel_end )
+            end = kernel_start;
+        else
+            start = ROUNDUP(kernel_end, PAGE_SIZE);
+
+        if ( end - start >= size )
+            return start;
+    }
+
+    return INVALID_PADDR;
+}
+
 static int __init pvh_load_kernel(struct domain *d, const module_t *image,
                                   unsigned long image_headroom,
                                   module_t *initrd, void *image_base,
@@ -546,7 +584,18 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image,
         return rc;
     }
 
-    last_addr = ROUNDUP(parms.virt_kend - parms.virt_base, PAGE_SIZE);
+    last_addr = find_memory(d, &elf, sizeof(start_info) +
+                            initrd ? ROUNDUP(initrd->mod_end, PAGE_SIZE) +
+                                     sizeof(mod)
+                                   : 0 +
+                            cmdline ? ROUNDUP(strlen(cmdline) + 1,
+                                              elf_64bit(&elf) ? 8 : 4)
+                                    : 0);
+    if ( last_addr == INVALID_PADDR )
+    {
+        printk("Unable to find a memory region to load initrd and metadata\n");
+        return -ENOMEM;
+    }
 
     if ( initrd != NULL )
     {
Don't assume there's going to be enough space at the tail of the loaded
kernel and instead try to find a suitable memory area where the initrd
and metadata can be loaded.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/hvm/dom0_build.c | 51 ++++++++++++++++++++++++++++++++++-
 1 file changed, 50 insertions(+), 1 deletion(-)