Message ID | 49E99054.6050208@kernel.org (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
* Yinghai Lu <yinghai@kernel.org> wrote: > Linus Torvalds wrote: > > > > On Thu, 16 Apr 2009, Yinghai Lu wrote: > >> please check. > >> > >> [PATCH] x86/pci: make pci_mem_start to be aligned only -v4 > > > > I like the approach. That said, I think that rather than do the "modify > > the e820 array" thing, why not just do it in the in the resource tree, and > > do it at "e820_reserve_resources_late()" time? > > > > IOW, something like this. > > > > TOTALLY UNTESTED! The point is to take all RAM resources we haev, and > > _after_ we've added all the resources we've seen in the E820 tree, we then > > _also_ try to add fake reserved entries for any "round up to X" at the end > > of the RAM resources. > > > > NOTE! I really didn't want to use "reserve_region_with_split()". I didn't > > want to recurse into any conflicting resources, I really wanted to just do > > the other failure cases. > > > > THIS PATCH IS NOT MEANT TO BE USED. Just a rough "almost like this" kind > > of thing. That includes the rough draft of how much to round things up to > > based on where the end of RAM region is etc. I'm really throwing this out > > more as a "wouldn't this be a readable way to handle any missing reserved > > entries" kind of thing.. > > > > Linus > > > > --- > > arch/x86/kernel/e820.c | 34 ++++++++++++++++++++++++++++++++++ > > 1 files changed, 34 insertions(+), 0 deletions(-) > > > > diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c > > index ef2c356..e8b8d33 100644 > > --- a/arch/x86/kernel/e820.c > > +++ b/arch/x86/kernel/e820.c > > @@ -1370,6 +1370,23 @@ void __init e820_reserve_resources(void) > > } > > } > > > > +/* How much should we pad RAM ending depending on where it is? */ > > +static unsigned long ram_alignment(resource_size_t pos) > > +{ > > + unsigned long mb = pos >> 20; > > + > > + /* To 64kB in the first megabyte */ > > + if (!mb) > > + return 64*1024; > > + > > + /* To 1MB in the first 16MB */ > > + if (mb < 16) > > + return 1024*1024; > > + > > + /* To 32MB for anything above that */ > > + return 32*1024*1024; > > +} > > + > > void __init e820_reserve_resources_late(void) > > { > > int i; > > @@ -1381,6 +1398,23 @@ void __init e820_reserve_resources_late(void) > > insert_resource_expand_to_fit(&iomem_resource, res); > > res++; > > } > > + > > + /* > > + * Try to bump up RAM regions to reasonable boundaries to > > + * avoid stolen RAM > > + */ > > + for (i = 0; i < e820.nr_map; i++) { > > + struct e820entry *entry = &e820_saved.map[i]; > > + resource_size_t start, end; > > + > > + if (entry->type != E820_RAM) > > + continue; > > + start = entry->addr + entry->size; > > + end = round_up(start, ram_alignment(start)); > > + if (start == end) > > + continue; > > + reserve_region_with_split(&iomem_resource, start, end, "RAM buffer"); > > + } > > } > > > > char *__init default_machine_specific_memory_setup(void) > > except need to change > > + reserve_region_with_split(&iomem_resource, start, end, "RAM buffer"); > ==> > + reserve_region_with_split(&iomem_resource, start, end - 1, "RAM buffer"); > > it will make sure dynmical allocating code will not use those range. > > and could make e820_setup_gap much simple. > > --- > arch/x86/kernel/e820.c | 10 ++++------ > 1 file changed, 4 insertions(+), 6 deletions(-) > > Index: linux-2.6/arch/x86/kernel/e820.c > =================================================================== > --- linux-2.6.orig/arch/x86/kernel/e820.c > +++ linux-2.6/arch/x86/kernel/e820.c > @@ -635,14 +635,12 @@ __init void e820_setup_gap(void) > #endif > > /* > - * See how much we want to round up: start off with > - * rounding to the next 1MB area. > + * e820_reserve_resources_late will protect stolen RAM > + * so just round it to 1M > */ > round = 0x100000; > - while ((gapsize >> 4) > round) > - round += round; > - /* Fun with two's complement */ > - pci_mem_start = (gapstart + round) & -round; > + > + pci_mem_start = roundup(gapstart, round); > > printk(KERN_INFO > "Allocating PCI resources starting at %lx (gap: %lx:%lx)\n", > > Ingo, can you put those two patches in tip? I think the point would be to explore the possibility to have no 'gap' logic at all - we should extend the e820 table with Linus's scheme to add 'RAM buffer' entries. That way, if you search for a sufficient size hole, it will be correctly aligned straight away - with no rounding/gap logic at all. (Maybe add a debug warning to catch when this is not the case.) Am i missing something? Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Ingo Molnar wrote: > * Yinghai Lu <yinghai@kernel.org> wrote: > >> Linus Torvalds wrote: >>> On Thu, 16 Apr 2009, Yinghai Lu wrote: >>>> please check. >>>> >>>> [PATCH] x86/pci: make pci_mem_start to be aligned only -v4 >>> I like the approach. That said, I think that rather than do the "modify >>> the e820 array" thing, why not just do it in the in the resource tree, and >>> do it at "e820_reserve_resources_late()" time? >>> >>> IOW, something like this. >>> >>> TOTALLY UNTESTED! The point is to take all RAM resources we haev, and >>> _after_ we've added all the resources we've seen in the E820 tree, we then >>> _also_ try to add fake reserved entries for any "round up to X" at the end >>> of the RAM resources. >>> >>> NOTE! I really didn't want to use "reserve_region_with_split()". I didn't >>> want to recurse into any conflicting resources, I really wanted to just do >>> the other failure cases. >>> >>> THIS PATCH IS NOT MEANT TO BE USED. Just a rough "almost like this" kind >>> of thing. That includes the rough draft of how much to round things up to >>> based on where the end of RAM region is etc. I'm really throwing this out >>> more as a "wouldn't this be a readable way to handle any missing reserved >>> entries" kind of thing.. >>> >>> Linus >>> >>> --- >>> arch/x86/kernel/e820.c | 34 ++++++++++++++++++++++++++++++++++ >>> 1 files changed, 34 insertions(+), 0 deletions(-) >>> >>> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c >>> index ef2c356..e8b8d33 100644 >>> --- a/arch/x86/kernel/e820.c >>> +++ b/arch/x86/kernel/e820.c >>> @@ -1370,6 +1370,23 @@ void __init e820_reserve_resources(void) >>> } >>> } >>> >>> +/* How much should we pad RAM ending depending on where it is? */ >>> +static unsigned long ram_alignment(resource_size_t pos) >>> +{ >>> + unsigned long mb = pos >> 20; >>> + >>> + /* To 64kB in the first megabyte */ >>> + if (!mb) >>> + return 64*1024; >>> + >>> + /* To 1MB in the first 16MB */ >>> + if (mb < 16) >>> + return 1024*1024; >>> + >>> + /* To 32MB for anything above that */ >>> + return 32*1024*1024; >>> +} >>> + >>> void __init e820_reserve_resources_late(void) >>> { >>> int i; >>> @@ -1381,6 +1398,23 @@ void __init e820_reserve_resources_late(void) >>> insert_resource_expand_to_fit(&iomem_resource, res); >>> res++; >>> } >>> + >>> + /* >>> + * Try to bump up RAM regions to reasonable boundaries to >>> + * avoid stolen RAM >>> + */ >>> + for (i = 0; i < e820.nr_map; i++) { >>> + struct e820entry *entry = &e820_saved.map[i]; >>> + resource_size_t start, end; >>> + >>> + if (entry->type != E820_RAM) >>> + continue; >>> + start = entry->addr + entry->size; >>> + end = round_up(start, ram_alignment(start)); >>> + if (start == end) >>> + continue; >>> + reserve_region_with_split(&iomem_resource, start, end, "RAM buffer"); >>> + } >>> } >>> >>> char *__init default_machine_specific_memory_setup(void) >> except need to change >>> + reserve_region_with_split(&iomem_resource, start, end, "RAM buffer"); >> ==> > + reserve_region_with_split(&iomem_resource, start, end - 1, "RAM buffer"); >> >> it will make sure dynmical allocating code will not use those range. >> >> and could make e820_setup_gap much simple. >> >> --- >> arch/x86/kernel/e820.c | 10 ++++------ >> 1 file changed, 4 insertions(+), 6 deletions(-) >> >> Index: linux-2.6/arch/x86/kernel/e820.c >> =================================================================== >> --- linux-2.6.orig/arch/x86/kernel/e820.c >> +++ linux-2.6/arch/x86/kernel/e820.c >> @@ -635,14 +635,12 @@ __init void e820_setup_gap(void) >> #endif >> >> /* >> - * See how much we want to round up: start off with >> - * rounding to the next 1MB area. >> + * e820_reserve_resources_late will protect stolen RAM >> + * so just round it to 1M >> */ >> round = 0x100000; >> - while ((gapsize >> 4) > round) >> - round += round; >> - /* Fun with two's complement */ >> - pci_mem_start = (gapstart + round) & -round; >> + >> + pci_mem_start = roundup(gapstart, round); >> >> printk(KERN_INFO >> "Allocating PCI resources starting at %lx (gap: %lx:%lx)\n", >> >> Ingo, can you put those two patches in tip? > > I think the point would be to explore the possibility to have no > 'gap' logic at all - we should extend the e820 table with Linus's > scheme to add 'RAM buffer' entries. > so you prefer the old one aka the -v4, and add new entry type for RAM Buffer? YH -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, 18 Apr 2009, Ingo Molnar wrote: > > Am i missing something? We also try to avoid random motherboard resources etc that aren't reserved or documented by the BIOS. It's better to go into big holes. It's also better to try to keep as close to the old (tested) behavior. Now, admittedly those undocumented resources are _much_ more common in IO space, but still. They're _very_ common. Om my modern Nehalem thing with an Intel BIOS (supposedly "good" and not from some random manufacturer), I have, for example: [ 26.533771] pci 0000:00:1f.0: ICH7 LPC Generic IO decode 2 PIO at 0810 (mask 007f) byt that one isn't covered by any PnP range or anythign else. [ Now, it's possible that it's bogus: "0x810" has a bit set in the same bits that cover the mask, and I don't know if the mask is a "ignore these bits" (and the range would thus match all of 0x0800-0x087f) or if the mast is a "port & ~mask == base" in which case nothing would ever match. But I _think_ the BIOS literally set up something to answer int he 0x08?? range, and didn't document it anywhere. The same can be true of MMIO too, and so we should try to avoid using random memory areas if we can ] Linus -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
* Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Sat, 18 Apr 2009, Ingo Molnar wrote: > > > > Am i missing something? > > We also try to avoid random motherboard resources etc that aren't > reserved or documented by the BIOS. It's better to go into big > holes. It's also better to try to keep as close to the old > (tested) behavior. Yeah - i'm not suggesting any change in behavior, nor am i suggesting any risky behavior. The current code seems to work quite well. I'm just suggesting (maybe foolishly) that instead of having any gap-rounding logic at all, add artificial entries to the e820 map to 'extend' and round up any odd ending entries. I.e. explicitly manage all the 'hole' space to be nicely rounded and to be far away from any T-Seg or other sekrit motherboard resource danger area. We'd do this after PCI static allocations (so we dont ever stomp on real, known resources) but before PCI dynamic allocations. The e820 printout would look literally like this: BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 000000000009fc00 (usable) 0.639 MB RAM BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved) 0.001 MB [ hole ] 0.250 MB BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved) 0.125 MB BIOS-e820: 0000000000100000 - 000000003ed94000 (usable) 1004.5 MB RAM BIOS-e820: 000000003ed94000 - 000000003ee4e000 (ACPI NVS) 0.7 MB BIOS-e820: 000000003ee4e000 - 000000003fea2000 (usable) 16.3 MB RAM BIOS-e820: 000000003fea2000 - 000000003fee9000 (ACPI NVS) 0.3 MB BIOS-e820: 000000003fee9000 - 000000003feed000 (usable) 0.15 MB RAM BIOS-e820: 000000003feed000 - 000000003feff000 (ACPI data 0.07 MB BIOS-e820: 000000003feff000 - 000000003ff00000 (usable) 0.004 MB RAM BIOS-e820: 000000003ff00000 - 0000000040000000 (guard) 1.0 MB [ hole ] 3072.0 MB The '(guard)' entry at the end i added above. This way we intentionally create a 'free physical address space' hole space that is the same as the rounding logic. No rounding needed anywhere - as all the remaining address space is well-rounded already. Plus we'd also _see_ all our rounding logic by looking at the '(guard)' entries. Or maybe there's some aspect of gap-rounding that cannot be expressed in such a static way? Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Apr 18, 2009 at 09:14:25PM +0200, Ingo Molnar wrote: > This way we intentionally create a 'free physical address space' > hole space that is the same as the rounding logic. No rounding > needed anywhere - as all the remaining address space is well-rounded > already. Plus we'd also _see_ all our rounding logic by looking at > the '(guard)' entries. > > Or maybe there's some aspect of gap-rounding that cannot be > expressed in such a static way? My gut feeling is that you guys do overcomplicate a simple issue which can be fixed with a one-liner like this: pci_mem_start = pci_mem_start < 0xc0000000 ? : 0xc0000000; This 0xc0000000 (3G) seems to be a pretty fundamental thing for certain 32-bit OS. ;-) Ivan. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Index: linux-2.6/arch/x86/kernel/e820.c =================================================================== --- linux-2.6.orig/arch/x86/kernel/e820.c +++ linux-2.6/arch/x86/kernel/e820.c @@ -635,14 +635,12 @@ __init void e820_setup_gap(void) #endif /* - * See how much we want to round up: start off with - * rounding to the next 1MB area. + * e820_reserve_resources_late will protect stolen RAM + * so just round it to 1M */ round = 0x100000; - while ((gapsize >> 4) > round) - round += round; - /* Fun with two's complement */ - pci_mem_start = (gapstart + round) & -round; + + pci_mem_start = roundup(gapstart, round); printk(KERN_INFO "Allocating PCI resources starting at %lx (gap: %lx:%lx)\n",