Message ID | 20181010084119.17539-1-fanc.fnst@cn.fujitsu.com (mailing list archive) |
---|---|
Headers | show |
Series | x86/boot/KASLR: Parse ACPI table and limit kaslr in immovable memory | expand |
On Wed, Oct 10, 2018 at 04:41:16PM +0800, Chao Fan wrote: > ***Background: > People reported that kaslr may randomly chooses some positions > which are located in movable memory regions. This will break memory > hotplug feature and make the movable memory chosen by KASLR can't be > removed. > > ***Solutions: > There should be a method to limit kaslr to choosing immovable memory > regions, so there are 2 solutions: > 1) Add a kernel parameter to specify the memory regions. > 2) Get the information of memory hot-remove, then kaslr will know the > right regions. > In method 2, information about memory hot-remove is in ACPI > tables, which will be parsed after start_kernel(), kaslr can't get > the information. > In method 1, users should know the regions address and specify in > kernel parameter. > > In the earliest time, I tried to dig ACPI tabls to solve this problem. > But I didn't splite the code in 'compressed/' and ACPI code, so the patch > is hard to follow so refused by community. > Somebody suggest to add a kernel parameter to specify the > immovable memory so that limit kaslr in these regions. Then I make > a new patchset. After several versions, Ingo gave a suggestion: > https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1634024.html > Follow Ingo's suggestion, imitate the ACPI code to parse the acpi > tables, so that the kaslr can get necessary memory information in > ACPI tables. > I think ACPI code is an independent part, so copy the codes > and functions to 'compressed/' directory, so that kaslr won't > influence the initialization of ACPI. ... and we just picked up https://lkml.kernel.org/r/20181001140843.26137-1-msys.mizuma@gmail.com and without having looked at the rest of your stuff, if people accept your solution, we don't need the silly parameter anymore, right? Which means, we should not rush the whole thing yet until the whole KASLR vs movable memory gets solved properly. Ingo, Thomas?
Hi Boris, On 10/10/18 at 10:59am, Borislav Petkov wrote: > ... and we just picked up > > https://lkml.kernel.org/r/20181001140843.26137-1-msys.mizuma@gmail.com > > and without having looked at the rest of your stuff, if people accept > your solution, we don't need the silly parameter anymore, right? > > Which means, we should not rush the whole thing yet until the whole > KASLR vs movable memory gets solved properly. Masa's patches solves the problem in memory region KASLR which later hot added memory may be big than the default padding 10 TB. Chao's patches is trying to fix a conflict between 'movable_node' and kernel text KASLR. If 'movable_node' specified, we rely on SRAT to get which memory region is movable or immovable, and movable region can be hot removed. But if kernel is randomized into movable memory, it can't be hot removed any more, this is a regression after KASLR introduced. So this is a different issue than Masa's. Thanks Baoquan
On Wed, Oct 10, 2018 at 05:06:20PM +0800, Baoquan He wrote: >Hi Boris, > >On 10/10/18 at 10:59am, Borislav Petkov wrote: >> ... and we just picked up >> >> https://lkml.kernel.org/r/20181001140843.26137-1-msys.mizuma@gmail.com >> >> and without having looked at the rest of your stuff, if people accept >> your solution, we don't need the silly parameter anymore, right? >> >> Which means, we should not rush the whole thing yet until the whole >> KASLR vs movable memory gets solved properly. > >Masa's patches solves the problem in memory region KASLR which later hot >added memory may be big than the default padding 10 TB. > >Chao's patches is trying to fix a conflict between 'movable_node' and >kernel text KASLR. If 'movable_node' specified, we rely on SRAT to get >which memory region is movable or immovable, and movable region can be >hot removed. But if kernel is randomized into movable memory, it can't >be hot removed any more, this is a regression after KASLR introduced. >So this is a different issue than Masa's. Yes, they are two issues. But if we can get more memory information by the function in the new file acpi.c, semms it's helfpul to Masa's issue. Thanks, Chao Fan > >Thanks >Baoquan > >
On Wed, 10 Oct 2018, Baoquan He wrote: > Hi Boris, > > On 10/10/18 at 10:59am, Borislav Petkov wrote: > > ... and we just picked up > > > > https://lkml.kernel.org/r/20181001140843.26137-1-msys.mizuma@gmail.com > > > > and without having looked at the rest of your stuff, if people accept > > your solution, we don't need the silly parameter anymore, right? > > > > Which means, we should not rush the whole thing yet until the whole > > KASLR vs movable memory gets solved properly. > > Masa's patches solves the problem in memory region KASLR which later hot > added memory may be big than the default padding 10 TB. > > Chao's patches is trying to fix a conflict between 'movable_node' and > kernel text KASLR. If 'movable_node' specified, we rely on SRAT to get > which memory region is movable or immovable, and movable region can be > hot removed. But if kernel is randomized into movable memory, it can't > be hot removed any more, this is a regression after KASLR introduced. > So this is a different issue than Masa's. Yes, it's different, but if the SRAT information is available early, then the command line parameter can go away because then the required information for Masa's problem is available as well. Thanks, tglx
On Wed, Oct 10, 2018 at 11:14:26AM +0200, Thomas Gleixner wrote: > Yes, it's different, but if the SRAT information is available early, then > the command line parameter can go away because then the required > information for Masa's problem is available as well. Exactly. And I'd prefer we delayed the command line parameter until we figure out we really need it and not expose it to upstream and then remove it shortly after. So I'd suggest we move Masa's patches to a separate branch and not send it up this round.
On 10/10/18 at 05:12pm, Chao Fan wrote: > On Wed, Oct 10, 2018 at 05:06:20PM +0800, Baoquan He wrote: > >Hi Boris, > > > >On 10/10/18 at 10:59am, Borislav Petkov wrote: > >> ... and we just picked up > >> > >> https://lkml.kernel.org/r/20181001140843.26137-1-msys.mizuma@gmail.com > >> > >> and without having looked at the rest of your stuff, if people accept > >> your solution, we don't need the silly parameter anymore, right? > >> > >> Which means, we should not rush the whole thing yet until the whole > >> KASLR vs movable memory gets solved properly. > > > >Masa's patches solves the problem in memory region KASLR which later hot > >added memory may be big than the default padding 10 TB. > > > >Chao's patches is trying to fix a conflict between 'movable_node' and > >kernel text KASLR. If 'movable_node' specified, we rely on SRAT to get > >which memory region is movable or immovable, and movable region can be > >hot removed. But if kernel is randomized into movable memory, it can't > >be hot removed any more, this is a regression after KASLR introduced. > >So this is a different issue than Masa's. > > Yes, they are two issues. > But if we can get more memory information by the function in > the new file acpi.c, semms it's helfpul to Masa's issue. Hmm, reading SRAT three times during x86 kernel boot? Maybe we try this after the function has run a time and proved very stable? Thanks Baoquan
On 10/10/18 at 11:19am, Borislav Petkov wrote: > On Wed, Oct 10, 2018 at 11:14:26AM +0200, Thomas Gleixner wrote: > > Yes, it's different, but if the SRAT information is available early, then > > the command line parameter can go away because then the required > > information for Masa's problem is available as well. > > Exactly. And I'd prefer we delayed the command line parameter until we > figure out we really need it and not expose it to upstream and then > remove it shortly after. > > So I'd suggest we move Masa's patches to a separate branch and not send > it up this round. Yes, sounds more reasonable if we can reuse functions in Chao's patch 1/3 to solve the padding issue. Thanks Baoquan
On Wed, Oct 10, 2018 at 04:41:16PM +0800, Chao Fan wrote: > In the earliest time, I tried to dig ACPI tabls to solve this problem. > But I didn't splite the code in 'compressed/' and ACPI code, so the patch > is hard to follow so refused by community. > Somebody suggest to add a kernel parameter to specify the > immovable memory so that limit kaslr in these regions. Then I make > a new patchset. After several versions, Ingo gave a suggestion: > https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1634024.html > Follow Ingo's suggestion, imitate the ACPI code to parse the acpi > tables, so that the kaslr can get necessary memory information in > ACPI tables. > I think ACPI code is an independent part, so copy the codes > and functions to 'compressed/' directory, so that kaslr won't > influence the initialization of ACPI. You say "copy". I'm still about to look at the code but can those functions be carved out in a separate compilation unit which ACPI *and* KASLR can both link with so that there's no duplication?
On Wed, Oct 10, 2018 at 05:30:57PM +0800, Baoquan He wrote: > On 10/10/18 at 11:19am, Borislav Petkov wrote: > > On Wed, Oct 10, 2018 at 11:14:26AM +0200, Thomas Gleixner wrote: > > > Yes, it's different, but if the SRAT information is available early, then > > > the command line parameter can go away because then the required > > > information for Masa's problem is available as well. > > > > Exactly. And I'd prefer we delayed the command line parameter until we > > figure out we really need it and not expose it to upstream and then > > remove it shortly after. > > > > So I'd suggest we move Masa's patches to a separate branch and not send > > it up this round. > > Yes, sounds more reasonable if we can reuse functions in Chao's patch 1/3 > to solve the padding issue. Thanks for your comments! Yes, immovable_mem[num_immovable_mem] in Chao's patch may be useful for calculating the padding size. If so, we don't need the new kernel parameter. It's nice! Do you happen to have ideas how we access immovable_mem[num_immovable_mem] from arch/x86/mm/kaslr.c ? It is located to arch/x86/boot/compressed/*, so I suppose it is not easy to access it... I would appreciate if you could give some advice. Thanks! Masa
On 10/10/18 at 03:44pm, Masayoshi Mizuma wrote: > On Wed, Oct 10, 2018 at 05:30:57PM +0800, Baoquan He wrote: > > On 10/10/18 at 11:19am, Borislav Petkov wrote: > > > On Wed, Oct 10, 2018 at 11:14:26AM +0200, Thomas Gleixner wrote: > > > > Yes, it's different, but if the SRAT information is available early, then > > > > the command line parameter can go away because then the required > > > > information for Masa's problem is available as well. > > > > > > Exactly. And I'd prefer we delayed the command line parameter until we > > > figure out we really need it and not expose it to upstream and then > > > remove it shortly after. > > > > > > So I'd suggest we move Masa's patches to a separate branch and not send > > > it up this round. > > > > Yes, sounds more reasonable if we can reuse functions in Chao's patch 1/3 > > to solve the padding issue. > > Thanks for your comments! Yes, immovable_mem[num_immovable_mem] in Chao's > patch may be useful for calculating the padding size. If so, we don't > need the new kernel parameter. It's nice! > > Do you happen to have ideas how we access immovable_mem[num_immovable_mem] > from arch/x86/mm/kaslr.c ? It is located to arch/x86/boot/compressed/*, so > I suppose it is not easy to access it... > I would appreciate if you could give some advice. Hmm, they are living in different life cycle and space. So we can only reuse the code from Chao's patch, but not the variable. Means we need go through the SRAT accessing again in arch/x86/mm/kaslr.c and fill immovable_mem[num_immovable_mem] for mm/kaslr.c usage, if we decide to do like that. Thanks Baoquan
On Wed, Oct 10, 2018 at 07:16:16PM +0200, Borislav Petkov wrote: >On Wed, Oct 10, 2018 at 04:41:16PM +0800, Chao Fan wrote: >> In the earliest time, I tried to dig ACPI tabls to solve this problem. >> But I didn't splite the code in 'compressed/' and ACPI code, so the patch >> is hard to follow so refused by community. >> Somebody suggest to add a kernel parameter to specify the >> immovable memory so that limit kaslr in these regions. Then I make >> a new patchset. After several versions, Ingo gave a suggestion: >> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1634024.html >> Follow Ingo's suggestion, imitate the ACPI code to parse the acpi >> tables, so that the kaslr can get necessary memory information in >> ACPI tables. >> I think ACPI code is an independent part, so copy the codes >> and functions to 'compressed/' directory, so that kaslr won't >> influence the initialization of ACPI. > >You say "copy". I'm still about to look at the code but can those >functions be carved out in a separate compilation unit which ACPI *and* >KASLR can both link with so that there's no duplication? Sorry for my poor English, I used 'copy' but they are not same. Maybe 'imitate' is better. Just like I said in my log, The ACPI part need to handle the map between physical address and virtual address. But in KASLR part, I remove these operations. So my code is simplified version. Thanks, Chao Fan > >-- >Regards/Gruss, > Boris. > >Good mailing practices for 400: avoid top-posting and trim the reply. > >
On Thu, Oct 11, 2018 at 08:29:55AM +0800, Baoquan He wrote: >On 10/10/18 at 03:44pm, Masayoshi Mizuma wrote: >> On Wed, Oct 10, 2018 at 05:30:57PM +0800, Baoquan He wrote: >> > On 10/10/18 at 11:19am, Borislav Petkov wrote: >> > > On Wed, Oct 10, 2018 at 11:14:26AM +0200, Thomas Gleixner wrote: >> > > > Yes, it's different, but if the SRAT information is available early, then >> > > > the command line parameter can go away because then the required >> > > > information for Masa's problem is available as well. >> > > >> > > Exactly. And I'd prefer we delayed the command line parameter until we >> > > figure out we really need it and not expose it to upstream and then >> > > remove it shortly after. >> > > >> > > So I'd suggest we move Masa's patches to a separate branch and not send >> > > it up this round. >> > >> > Yes, sounds more reasonable if we can reuse functions in Chao's patch 1/3 >> > to solve the padding issue. >> >> Thanks for your comments! Yes, immovable_mem[num_immovable_mem] in Chao's >> patch may be useful for calculating the padding size. If so, we don't >> need the new kernel parameter. It's nice! >> >> Do you happen to have ideas how we access immovable_mem[num_immovable_mem] >> from arch/x86/mm/kaslr.c ? It is located to arch/x86/boot/compressed/*, so >> I suppose it is not easy to access it... >> I would appreciate if you could give some advice. > >Hmm, they are living in different life cycle and space. So we can only >reuse the code from Chao's patch, but not the variable. Means we need >go through the SRAT accessing again in arch/x86/mm/kaslr.c and fill >immovable_mem[num_immovable_mem] for mm/kaslr.c usage, if we decide to >do like that. Reading three times is redundant, but reading two times is needed. Becasue the ACPI code run very stable, but we need more information before that. As for Masa's issue, I am wondering whether we can tranfer the information or only the address of SRAT table from compressed period to the period after start_kernel(). Thanks, Chao Fan > >Thanks >Baoquan > >
Baoquan and Chao, thank you for your comments! Boris, the patches has been merged to linux-next. Should I create a new patch for linux-next? I'm trying to make the padding size set automatically. Thanks, Masa On Thu, Oct 11, 2018 at 01:51:54PM +0800, Chao Fan wrote: > On Thu, Oct 11, 2018 at 08:29:55AM +0800, Baoquan He wrote: > >On 10/10/18 at 03:44pm, Masayoshi Mizuma wrote: > >> On Wed, Oct 10, 2018 at 05:30:57PM +0800, Baoquan He wrote: > >> > On 10/10/18 at 11:19am, Borislav Petkov wrote: > >> > > On Wed, Oct 10, 2018 at 11:14:26AM +0200, Thomas Gleixner wrote: > >> > > > Yes, it's different, but if the SRAT information is available early, then > >> > > > the command line parameter can go away because then the required > >> > > > information for Masa's problem is available as well. > >> > > > >> > > Exactly. And I'd prefer we delayed the command line parameter until we > >> > > figure out we really need it and not expose it to upstream and then > >> > > remove it shortly after. > >> > > > >> > > So I'd suggest we move Masa's patches to a separate branch and not send > >> > > it up this round. > >> > > >> > Yes, sounds more reasonable if we can reuse functions in Chao's patch 1/3 > >> > to solve the padding issue. > >> > >> Thanks for your comments! Yes, immovable_mem[num_immovable_mem] in Chao's > >> patch may be useful for calculating the padding size. If so, we don't > >> need the new kernel parameter. It's nice! > >> > >> Do you happen to have ideas how we access immovable_mem[num_immovable_mem] > >> from arch/x86/mm/kaslr.c ? It is located to arch/x86/boot/compressed/*, so > >> I suppose it is not easy to access it... > >> I would appreciate if you could give some advice. > > > >Hmm, they are living in different life cycle and space. So we can only > >reuse the code from Chao's patch, but not the variable. Means we need > >go through the SRAT accessing again in arch/x86/mm/kaslr.c and fill > >immovable_mem[num_immovable_mem] for mm/kaslr.c usage, if we decide to > >do like that. > > Reading three times is redundant, but reading two times is needed. > Becasue the ACPI code run very stable, but we need more information > before that. > As for Masa's issue, I am wondering whether we can tranfer the > information or only the address of SRAT table from compressed period > to the period after start_kernel(). > > Thanks, > Chao Fan > > > > >Thanks > >Baoquan > > > > > >
On Sat, Oct 13, 2018 at 04:19:59PM -0400, Masayoshi Mizuma wrote: > Baoquan and Chao, thank you for your comments! > > Boris, the patches has been merged to linux-next. First of all, please do not top-post. Even if they're in linux-next, that doesn't mean they should go upstream while there are still issues to be hammered out. In this particular case, if Chao Fan's patchset shapes up in a decent form to be upstreamed, we don't need the cmdline parameter anymore, do we? Because we'll have the SRAT parsing and proper KASLR limits ready and working automatically, I'd say.
On Sat, Oct 13, 2018 at 10:34:29PM +0200, Borislav Petkov wrote: > On Sat, Oct 13, 2018 at 04:19:59PM -0400, Masayoshi Mizuma wrote: > > Baoquan and Chao, thank you for your comments! > > > > Boris, the patches has been merged to linux-next. > > First of all, please do not top-post. > > Even if they're in linux-next, that doesn't mean they should go upstream > while there are still issues to be hammered out. Got it, thanks. > > In this particular case, if Chao Fan's patchset shapes up in a decent > form to be upstreamed, we don't need the cmdline parameter anymore, do > we? > > Because we'll have the SRAT parsing and proper KASLR limits ready and > working automatically, I'd say. Yes, but I think we need to arrange the Chao's SRAT parsing to be used from kernel_randomize_memory() because Chao's approach needs the SRAT parsing before extract kernel and the padding size calculation needs the parsing in start_kernel(), so they are living in different life cycle and space. Thanks, Masa
On Sat, Oct 13, 2018 at 05:45:51PM -0400, Masayoshi Mizuma wrote: > Yes, but I think we need to arrange the Chao's SRAT parsing to be used > from kernel_randomize_memory() because Chao's approach needs the SRAT > parsing before extract kernel and the padding size calculation needs > the parsing in start_kernel(), so they are living in different life > cycle and space. Or you would like to pass some info from the compressed kernel to kernel proper? For example, something like the passing in add_e820ext() and the consumtion in parse_setup_data(): switch (data_type) { case SETUP_E820_EXT: e820__memory_setup_extended(pa_data, data_len); So info you need for your padding gets collected in arch/x86/boot/compressed/acpitb.c and you consume it in kernel_randomize_memory()? Would that work? :-)
On Sun, Oct 14, 2018 at 12:05:41AM +0200, Borislav Petkov wrote: > On Sat, Oct 13, 2018 at 05:45:51PM -0400, Masayoshi Mizuma wrote: > > Yes, but I think we need to arrange the Chao's SRAT parsing to be used > > from kernel_randomize_memory() because Chao's approach needs the SRAT > > parsing before extract kernel and the padding size calculation needs > > the parsing in start_kernel(), so they are living in different life > > cycle and space. > > Or you would like to pass some info from the compressed kernel to kernel > proper? > > For example, something like the passing in add_e820ext() and the consumtion > in parse_setup_data(): > > switch (data_type) { > case SETUP_E820_EXT: > e820__memory_setup_extended(pa_data, data_len); > > So info you need for your padding gets collected in > arch/x86/boot/compressed/acpitb.c and you consume it in > kernel_randomize_memory()? > > Would that work? It's nice idea! Thank you so much! - Masa
Hi Boris, Baoquan and all, I have created a RFC patch for adjust KASLR padding size. This patch is based on Can's v8 patch [1/3], and the Can's patch will be changed in the future, so this patch is just RFC. Welcome to any comments and suggestions. Thanks! --- arch/x86/boot/compressed/acpitb.c | 8 ++++++- arch/x86/include/uapi/asm/bootparam.h | 3 ++- arch/x86/mm/kaslr.c | 31 ++++++++++++++++++++++++++- 3 files changed, 39 insertions(+), 3 deletions(-) diff --git a/arch/x86/boot/compressed/acpitb.c b/arch/x86/boot/compressed/acpitb.c index 6b869e3f9780..1504b46f2a04 100644 --- a/arch/x86/boot/compressed/acpitb.c +++ b/arch/x86/boot/compressed/acpitb.c @@ -367,6 +367,7 @@ void get_immovable_mem(void) struct acpi_subtable_header *table; struct acpi_srat_mem_affinity *ma; unsigned long table_end; + unsigned long long possible_addr, max_possible_addr = 0; int i = 0; if (!strstr(args, "movable_node") || strstr(args, "acpi=off")) @@ -384,7 +385,11 @@ void get_immovable_mem(void) while (((unsigned long)table) + table->length < table_end) { if (table->type == 1) { ma = (struct acpi_srat_mem_affinity *)table; - if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) { + if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) { + possible_addr = ma->base_address + ma->length; + if (possible_addr > max_possible_addr) + max_possible_addr = possible_addr; + } else { immovable_mem[i].start = ma->base_address; immovable_mem[i].size = ma->length; i++; @@ -397,6 +402,7 @@ void get_immovable_mem(void) ((unsigned long)table + table->length); } num_immovable_mem = i; + boot_params->possible_mem_addr = max_possible_addr; } #else void get_immovable_mem(void) diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h index a06cbf019744..c987c725e93a 100644 --- a/arch/x86/include/uapi/asm/bootparam.h +++ b/arch/x86/include/uapi/asm/bootparam.h @@ -191,7 +191,8 @@ struct boot_params { __u8 _pad7[0x290-0x1f1-sizeof(struct setup_header)]; __u32 edd_mbr_sig_buffer[EDD_MBR_SIG_MAX]; /* 0x290 */ struct boot_e820_entry e820_table[E820_MAX_ENTRIES_ZEROPAGE]; /* 0x2d0 */ - __u8 _pad8[48]; /* 0xcd0 */ + __u8 _pad8[40]; /* 0xcd0 */ + __u64 possible_mem_addr; /* 0xcf8 */ struct edd_info eddbuf[EDDMAXNR]; /* 0xd00 */ __u8 _pad9[276]; /* 0xeec */ } __attribute__((packed)); diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c index 61db77b0eda9..8c5aca792b51 100644 --- a/arch/x86/mm/kaslr.c +++ b/arch/x86/mm/kaslr.c @@ -69,6 +69,35 @@ static inline bool kaslr_memory_enabled(void) return kaslr_enabled() && !IS_ENABLED(CONFIG_KASAN); } +#ifdef CONFIG_MEMORY_HOTPLUG +static unsigned int __init kaslr_padding(void) +{ + unsigned long long max_possible_phys, max_actual_phys, threshold; + unsigned int rand_mem_physical_padding = + CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING; + + if (!boot_params.possible_mem_addr) + goto out; + + max_actual_phys = roundup(PFN_PHYS(max_pfn), 1ULL << TB_SHIFT); + max_possible_phys = roundup(boot_params.possible_mem_addr, + 1ULL << TB_SHIFT); + threshold = max_actual_phys + + ((unsigned long long)rand_mem_physical_padding << TB_SHIFT); + + if (max_possible_phys > threshold) + rand_mem_physical_padding = + (max_possible_phys - max_actual_phys) >> TB_SHIFT; +out: + return rand_mem_physical_padding; +} +#else +static unsigned int __init kaslr_padding(void) +{ + return CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING; +} +#endif + /* Initialize base and padding for each memory region randomized with KASLR */ void __init kernel_randomize_memory(void) { @@ -102,7 +131,7 @@ void __init kernel_randomize_memory(void) */ BUG_ON(kaslr_regions[0].base != &page_offset_base); memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) + - CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING; + kaslr_padding(); /* Adapt phyiscal memory region size based on available memory */ if (memory_tb < kaslr_regions[0].size_tb)
On Tue, Oct 16, 2018 at 11:13:54AM -0400, Masayoshi Mizuma wrote: > Hi Boris, Baoquan and all, > > I have created a RFC patch for adjust KASLR padding size. > This patch is based on Can's v8 patch [1/3], and the Can's patch > will be changed in the future, so this patch is just RFC. > > Welcome to any comments and suggestions. Thanks! ... > diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h > index a06cbf019744..c987c725e93a 100644 > --- a/arch/x86/include/uapi/asm/bootparam.h > +++ b/arch/x86/include/uapi/asm/bootparam.h > @@ -191,7 +191,8 @@ struct boot_params { > __u8 _pad7[0x290-0x1f1-sizeof(struct setup_header)]; > __u32 edd_mbr_sig_buffer[EDD_MBR_SIG_MAX]; /* 0x290 */ > struct boot_e820_entry e820_table[E820_MAX_ENTRIES_ZEROPAGE]; /* 0x2d0 */ > - __u8 _pad8[48]; /* 0xcd0 */ > + __u8 _pad8[40]; /* 0xcd0 */ > + __u64 possible_mem_addr; /* 0xcf8 */ Where in the example I gave you with add_e820ext() do you see members being added to struct boot_params? Take a look at it again pls.
On Tue, Oct 16, 2018 at 09:11:14PM +0200, Borislav Petkov wrote: > On Tue, Oct 16, 2018 at 11:13:54AM -0400, Masayoshi Mizuma wrote: > > Hi Boris, Baoquan and all, > > > > I have created a RFC patch for adjust KASLR padding size. > > This patch is based on Can's v8 patch [1/3], and the Can's patch > > will be changed in the future, so this patch is just RFC. > > > > Welcome to any comments and suggestions. Thanks! > > ... > > > diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h > > index a06cbf019744..c987c725e93a 100644 > > --- a/arch/x86/include/uapi/asm/bootparam.h > > +++ b/arch/x86/include/uapi/asm/bootparam.h > > @@ -191,7 +191,8 @@ struct boot_params { > > __u8 _pad7[0x290-0x1f1-sizeof(struct setup_header)]; > > __u32 edd_mbr_sig_buffer[EDD_MBR_SIG_MAX]; /* 0x290 */ > > struct boot_e820_entry e820_table[E820_MAX_ENTRIES_ZEROPAGE]; /* 0x2d0 */ > > - __u8 _pad8[48]; /* 0xcd0 */ > > + __u8 _pad8[40]; /* 0xcd0 */ > > + __u64 possible_mem_addr; /* 0xcf8 */ > > Where in the example I gave you with add_e820ext() do you see members > being added to struct boot_params? > > Take a look at it again pls. Ah, sorry, I misunderstood your suggetion... In parse_setup_data(), the data is picked up from boot_params, so I thought it is also useful for this sitiation. I'll try again, thanks! - Masa
On Tue, Oct 16, 2018 at 03:54:30PM -0400, Masayoshi Mizuma wrote: > Ah, sorry, I misunderstood your suggetion... > In parse_setup_data(), the data is picked up from boot_params, Yes, this is the pointer to the setup_data linked list head. See Documentation/ABI/testing/sysfs-kernel-boot_params Documentation/x86/boot.txt for more information and basically grep the tree for examples.
Hi Boris, On Tue, Oct 16, 2018 at 09:59:02PM +0200, Borislav Petkov wrote: > On Tue, Oct 16, 2018 at 03:54:30PM -0400, Masayoshi Mizuma wrote: > > Ah, sorry, I misunderstood your suggetion... > > In parse_setup_data(), the data is picked up from boot_params, > > Yes, this is the pointer to the setup_data linked list head. See > > Documentation/ABI/testing/sysfs-kernel-boot_params > Documentation/x86/boot.txt > > for more information and basically grep the tree for examples. I'm trying to store the SRAT info and pass it to kernel_randomize_memory(), looks like add_e820ext()/parse_setup_data(). Is the approach useful only EFI environment? I'm not sure how I allocate memory to store the data in legacy bios environment... On EFI, I can use efi_call_early(allocate_pool, EFI_LOADER_DATA, ...). Am I missing something? I would appreciate if you could help my understanding. Following patch is a prototype for EFI enviromnent. --- arch/x86/boot/compressed/acpitb.c | 23 ++++++++++- arch/x86/boot/compressed/eboot.c | 36 +++++++++++++++++ arch/x86/include/uapi/asm/bootparam.h | 1 + arch/x86/mm/kaslr.c | 58 ++++++++++++++++++++++++++- 4 files changed, 116 insertions(+), 2 deletions(-) diff --git a/arch/x86/boot/compressed/acpitb.c b/arch/x86/boot/compressed/acpitb.c index d119663..b79560c 100644 --- a/arch/x86/boot/compressed/acpitb.c +++ b/arch/x86/boot/compressed/acpitb.c @@ -309,6 +309,20 @@ static struct acpi_table_header *get_acpi_srat_table(void) return NULL; } +static void store_possible_addr(unsigned long long possible) +{ + struct setup_data *data; + + data = (struct setup_data *)(unsigned long)boot_params->hdr.setup_data; + while (data) { + if (data->type == SETUP_KASLR) { + *(unsigned long long *)data->data = possible; + return; + } + data = (struct setup_data *)(unsigned long)data->next; + } +} + /* * According to ACPI table, filter the immvoable memory regions * and store them in immovable_mem[]. @@ -319,6 +333,7 @@ void get_immovable_mem(void) struct acpi_subtable_header *table; struct acpi_srat_mem_affinity *ma; unsigned long table_end; + unsigned long long possible_addr, max_possible_addr = 0; int i = 0; if (!cmdline_find_option_bool("movable_node") || @@ -338,7 +353,12 @@ void get_immovable_mem(void) sizeof(struct acpi_subtable_header) < table_end) { if (table->type == ACPI_SRAT_TYPE_MEMORY_AFFINITY) { ma = (struct acpi_srat_mem_affinity *)table; - if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) { + + if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) { + possible_addr = ma->base_address + ma->length; + if (possible_addr > max_possible_addr) + max_possible_addr = possible_addr; + } else { immovable_mem[i].start = ma->base_address; immovable_mem[i].size = ma->length; i++; @@ -351,4 +371,5 @@ void get_immovable_mem(void) ((unsigned long)table + table->length); } num_immovable_mem = i; + store_possible_addr(max_possible_addr); } diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c index 1458b17..9b95fba 100644 --- a/arch/x86/boot/compressed/eboot.c +++ b/arch/x86/boot/compressed/eboot.c @@ -192,6 +192,40 @@ static void setup_efi_pci(struct boot_params *params) efi_call_early(free_pool, pci_handle); } +#ifdef CONFIG_RANDOMIZE_MEMORY +static void setup_kaslr(struct boot_params *params) +{ + struct setup_data *kaslr_data = NULL; + struct setup_data *data; + unsigned long size; + efi_status_t status; + + size = sizeof(struct setup_data) + sizeof(unsigned long long); + + status = efi_call_early(allocate_pool, EFI_LOADER_DATA, + size, (void **)&kaslr_data); + if (status != EFI_SUCCESS) { + efi_printk(sys_table, "Failed to allocate memory for 'kaslr_data'\n"); + return; + } + + kaslr_data->type = SETUP_KASLR; + kaslr_data->next = 0; + kaslr_data->len = size; + + data = (struct setup_data *)(unsigned long)params->hdr.setup_data; + if (data) + data->next = (unsigned long)kaslr_data; + else { + while (data->next) + data = (struct setup_data *)(unsigned long)data->next; + data->next = (unsigned long)kaslr_data; + } +} +#else +static void setup_kaslr(struct boot_params *params) {} +#endif + static void retrieve_apple_device_properties(struct boot_params *boot_params) { efi_guid_t guid = APPLE_PROPERTIES_PROTOCOL_GUID; @@ -770,6 +804,8 @@ efi_main(struct efi_config *c, struct boot_params *boot_params) setup_efi_pci(boot_params); + setup_kaslr(boot_params); + setup_quirks(boot_params); status = efi_call_early(allocate_pool, EFI_LOADER_DATA, diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h index a06cbf0..0a44d83 100644 --- a/arch/x86/include/uapi/asm/bootparam.h +++ b/arch/x86/include/uapi/asm/bootparam.h @@ -10,6 +10,7 @@ #define SETUP_EFI 4 #define SETUP_APPLE_PROPERTIES 5 #define SETUP_JAILHOUSE 6 +#define SETUP_KASLR 7 /* ram_size flags */ #define RAMDISK_IMAGE_START_MASK 0x07FF diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c index 61db77b..6f91cf4 100644 --- a/arch/x86/mm/kaslr.c +++ b/arch/x86/mm/kaslr.c @@ -28,6 +28,7 @@ #include <asm/pgtable.h> #include <asm/setup.h> #include <asm/kaslr.h> +#include <linux/io.h> #include "mm_internal.h" @@ -69,6 +70,61 @@ static inline bool kaslr_memory_enabled(void) return kaslr_enabled() && !IS_ENABLED(CONFIG_KASAN); } +#ifdef CONFIG_MEMORY_HOTPLUG +static unsigned long long __init get_max_possible_addr(void) +{ + struct setup_data *data; + u64 pa_data; + unsigned long long max = 0; + + pa_data = boot_params.hdr.setup_data; + while (pa_data) { + data = early_memremap(pa_data, sizeof(*data)); + if (!data) + return 0; + + if (data->type == SETUP_KASLR) { + max = *(unsigned long long *)data->data; + early_memunmap(data, sizeof(*data)); + return max; + } + pa_data = data->next; + early_memunmap(data, sizeof(*data)); + } + + return max; +} + +static unsigned int __init kaslr_padding(void) +{ + unsigned long long max_possible_addr; + unsigned long long max_possible_phys, max_actual_phys, threshold; + unsigned int rand_mem_physical_padding = + CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING; + + max_possible_addr = get_max_possible_addr(); + if (!max_possible_addr) + goto out; + + max_actual_phys = roundup(PFN_PHYS(max_pfn), 1ULL << TB_SHIFT); + max_possible_phys = roundup(max_possible_addr, + 1ULL << TB_SHIFT); + threshold = max_actual_phys + + ((unsigned long long)rand_mem_physical_padding << TB_SHIFT); + + if (max_possible_phys > threshold) + rand_mem_physical_padding = + (max_possible_phys - max_actual_phys) >> TB_SHIFT; +out: + return rand_mem_physical_padding; +} +#else +static unsigned int __init kaslr_padding(void) +{ + return CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING; +} +#endif + /* Initialize base and padding for each memory region randomized with KASLR */ void __init kernel_randomize_memory(void) { @@ -102,7 +158,7 @@ void __init kernel_randomize_memory(void) */ BUG_ON(kaslr_regions[0].base != &page_offset_base); memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) + - CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING; + kaslr_padding(); /* Adapt phyiscal memory region size based on available memory */ if (memory_tb < kaslr_regions[0].size_tb) -- Thanks! Masa
On Mon, Oct 22, 2018 at 11:42:05AM -0400, Masayoshi Mizuma wrote: >Hi Boris, Hi Mizuma-san, I have several questions: >+static void store_possible_addr(unsigned long long possible) >+{ >+ struct setup_data *data; >+ >+ data = (struct setup_data *)(unsigned long)boot_params->hdr.setup_data; I suggest you add check: if (!data) { debug_putstr("No setup_data found.\n"); return; } >+ while (data) { >+ if (data->type == SETUP_KASLR) { >+ *(unsigned long long *)data->data = possible; >+ return; >+ } >+ data = (struct setup_data *)(unsigned long)data->next; >+ } >+} >+ > /* > * According to ACPI table, filter the immvoable memory regions > * and store them in immovable_mem[]. >@@ -319,6 +333,7 @@ void get_immovable_mem(void) > struct acpi_subtable_header *table; > struct acpi_srat_mem_affinity *ma; > unsigned long table_end; >+ unsigned long long possible_addr, max_possible_addr = 0; > int i = 0; > > if (!cmdline_find_option_bool("movable_node") || >@@ -338,7 +353,12 @@ void get_immovable_mem(void) > sizeof(struct acpi_subtable_header) < table_end) { > if (table->type == ACPI_SRAT_TYPE_MEMORY_AFFINITY) { > ma = (struct acpi_srat_mem_affinity *)table; >- if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) { >+ >+ if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) { >+ possible_addr = ma->base_address + ma->length; >+ if (possible_addr > max_possible_addr) >+ max_possible_addr = possible_addr; >+ } else { > immovable_mem[i].start = ma->base_address; > immovable_mem[i].size = ma->length; > i++; >@@ -351,4 +371,5 @@ void get_immovable_mem(void) > ((unsigned long)table + table->length); > } > num_immovable_mem = i; >+ store_possible_addr(max_possible_addr); > } >diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c >index 1458b17..9b95fba 100644 >--- a/arch/x86/boot/compressed/eboot.c >+++ b/arch/x86/boot/compressed/eboot.c >@@ -192,6 +192,40 @@ static void setup_efi_pci(struct boot_params *params) > efi_call_early(free_pool, pci_handle); > } > >+#ifdef CONFIG_RANDOMIZE_MEMORY >+static void setup_kaslr(struct boot_params *params) >+{ >+ struct setup_data *kaslr_data = NULL; >+ struct setup_data *data; >+ unsigned long size; >+ efi_status_t status; >+ >+ size = sizeof(struct setup_data) + sizeof(unsigned long long); >+ >+ status = efi_call_early(allocate_pool, EFI_LOADER_DATA, >+ size, (void **)&kaslr_data); >+ if (status != EFI_SUCCESS) { >+ efi_printk(sys_table, "Failed to allocate memory for 'kaslr_data'\n"); >+ return; >+ } >+ >+ kaslr_data->type = SETUP_KASLR; >+ kaslr_data->next = 0; >+ kaslr_data->len = size; >+ >+ data = (struct setup_data *)(unsigned long)params->hdr.setup_data; >+ if (data) >+ data->next = (unsigned long)kaslr_data; Why just put the kaslr_data in data->next. You can't make sure data->next was NULL. >+ else { If data is NULL, go to this else{}, so these two lines below work? >+ while (data->next) >+ data = (struct setup_data *)(unsigned long)data->next; >+ data->next = (unsigned long)kaslr_data; >+ } If my understanding is not wrong, it should be: data = (struct setup_data *)(unsigned long)params->hdr.setup_data; if (!data) params->hdr.setup_data = (unsigned long)kaslr_data; else { while (data->next) data = (struct setup_data *)(unsigned long)data->next; data->next = (unsigned long)kaslr_data; } If I misunderstand something, please tell me. Thanks, Chao Fan
On Tue, Oct 23, 2018 at 10:48:02AM +0800, Chao Fan wrote: > On Mon, Oct 22, 2018 at 11:42:05AM -0400, Masayoshi Mizuma wrote: > >Hi Boris, > > Hi Mizuma-san, > > I have several questions: Thank you for your comments! I think your suggestions are right. However, the prototype patch works EFI environment only. The memory hot-plug affinity in SRAT and KASLR are also available on legacy BIOS environment, so I need to get the patch useful for legacy BIOS as well, but I have no idea to add such things... If you have ideas, could you let me know? Probably I should have another idea, for example, add the SRAT parsing code, looks like you are adding to arch/x86/boot/compressed/acpitb.c, to arch/x86/mm/kaslr.c. Thanks, Masa > > >+static void store_possible_addr(unsigned long long possible) > >+{ > >+ struct setup_data *data; > >+ > >+ data = (struct setup_data *)(unsigned long)boot_params->hdr.setup_data; > I suggest you add check: > > if (!data) { > debug_putstr("No setup_data found.\n"); > return; > } > > >+ while (data) { > >+ if (data->type == SETUP_KASLR) { > >+ *(unsigned long long *)data->data = possible; > >+ return; > >+ } > >+ data = (struct setup_data *)(unsigned long)data->next; > >+ } > >+} > >+ > > /* > > * According to ACPI table, filter the immvoable memory regions > > * and store them in immovable_mem[]. > >@@ -319,6 +333,7 @@ void get_immovable_mem(void) > > struct acpi_subtable_header *table; > > struct acpi_srat_mem_affinity *ma; > > unsigned long table_end; > >+ unsigned long long possible_addr, max_possible_addr = 0; > > int i = 0; > > > > if (!cmdline_find_option_bool("movable_node") || > >@@ -338,7 +353,12 @@ void get_immovable_mem(void) > > sizeof(struct acpi_subtable_header) < table_end) { > > if (table->type == ACPI_SRAT_TYPE_MEMORY_AFFINITY) { > > ma = (struct acpi_srat_mem_affinity *)table; > >- if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) { > >+ > >+ if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) { > >+ possible_addr = ma->base_address + ma->length; > >+ if (possible_addr > max_possible_addr) > >+ max_possible_addr = possible_addr; > >+ } else { > > immovable_mem[i].start = ma->base_address; > > immovable_mem[i].size = ma->length; > > i++; > >@@ -351,4 +371,5 @@ void get_immovable_mem(void) > > ((unsigned long)table + table->length); > > } > > num_immovable_mem = i; > >+ store_possible_addr(max_possible_addr); > > } > >diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c > >index 1458b17..9b95fba 100644 > >--- a/arch/x86/boot/compressed/eboot.c > >+++ b/arch/x86/boot/compressed/eboot.c > >@@ -192,6 +192,40 @@ static void setup_efi_pci(struct boot_params *params) > > efi_call_early(free_pool, pci_handle); > > } > > > >+#ifdef CONFIG_RANDOMIZE_MEMORY > >+static void setup_kaslr(struct boot_params *params) > >+{ > >+ struct setup_data *kaslr_data = NULL; > >+ struct setup_data *data; > >+ unsigned long size; > >+ efi_status_t status; > >+ > >+ size = sizeof(struct setup_data) + sizeof(unsigned long long); > >+ > >+ status = efi_call_early(allocate_pool, EFI_LOADER_DATA, > >+ size, (void **)&kaslr_data); > >+ if (status != EFI_SUCCESS) { > >+ efi_printk(sys_table, "Failed to allocate memory for 'kaslr_data'\n"); > >+ return; > >+ } > >+ > >+ kaslr_data->type = SETUP_KASLR; > >+ kaslr_data->next = 0; > >+ kaslr_data->len = size; > >+ > >+ data = (struct setup_data *)(unsigned long)params->hdr.setup_data; > >+ if (data) > >+ data->next = (unsigned long)kaslr_data; > Why just put the kaslr_data in data->next. You can't make sure > data->next was NULL. > >+ else { > If data is NULL, go to this else{}, so these two lines below work? > >+ while (data->next) > >+ data = (struct setup_data *)(unsigned long)data->next; > >+ data->next = (unsigned long)kaslr_data; > >+ } > If my understanding is not wrong, it should be: > > data = (struct setup_data *)(unsigned long)params->hdr.setup_data; > if (!data) > params->hdr.setup_data = (unsigned long)kaslr_data; > else { > while (data->next) > data = (struct setup_data *)(unsigned long)data->next; > data->next = (unsigned long)kaslr_data; > } > > If I misunderstand something, please tell me. > > Thanks, > Chao Fan > >
On Wed, Oct 24, 2018 at 03:21:36PM -0400, Masayoshi Mizuma wrote: >On Tue, Oct 23, 2018 at 10:48:02AM +0800, Chao Fan wrote: >> On Mon, Oct 22, 2018 at 11:42:05AM -0400, Masayoshi Mizuma wrote: >> >Hi Boris, >> >> Hi Mizuma-san, >> >> I have several questions: > >Thank you for your comments! I think your suggestions are >right. >However, the prototype patch works EFI environment only. Yes, I agree. But I think this method is much better than adding code to arch/x86/mm/kaslr.c. >The memory hot-plug affinity in SRAT and KASLR are also available >on legacy BIOS environment, so I need to get the patch useful >for legacy BIOS as well, but I have no idea to add such things... >If you have ideas, could you let me know? I have no idea. I will work on it, try to help figure out the BIOS issue. Thanks, Chao Fan > >Probably I should have another idea, for example, >add the SRAT parsing code, looks like you are adding to >arch/x86/boot/compressed/acpitb.c, to arch/x86/mm/kaslr.c. > >Thanks, >Masa > >> >> >+static void store_possible_addr(unsigned long long possible) >> >+{ >> >+ struct setup_data *data; >> >+ >> >+ data = (struct setup_data *)(unsigned long)boot_params->hdr.setup_data; >> I suggest you add check: >> >> if (!data) { >> debug_putstr("No setup_data found.\n"); >> return; >> } >> >> >+ while (data) { >> >+ if (data->type == SETUP_KASLR) { >> >+ *(unsigned long long *)data->data = possible; >> >+ return; >> >+ } >> >+ data = (struct setup_data *)(unsigned long)data->next; >> >+ } >> >+} >> >+ >> > /* >> > * According to ACPI table, filter the immvoable memory regions >> > * and store them in immovable_mem[]. >> >@@ -319,6 +333,7 @@ void get_immovable_mem(void) >> > struct acpi_subtable_header *table; >> > struct acpi_srat_mem_affinity *ma; >> > unsigned long table_end; >> >+ unsigned long long possible_addr, max_possible_addr = 0; >> > int i = 0; >> > >> > if (!cmdline_find_option_bool("movable_node") || >> >@@ -338,7 +353,12 @@ void get_immovable_mem(void) >> > sizeof(struct acpi_subtable_header) < table_end) { >> > if (table->type == ACPI_SRAT_TYPE_MEMORY_AFFINITY) { >> > ma = (struct acpi_srat_mem_affinity *)table; >> >- if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) { >> >+ >> >+ if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) { >> >+ possible_addr = ma->base_address + ma->length; >> >+ if (possible_addr > max_possible_addr) >> >+ max_possible_addr = possible_addr; >> >+ } else { >> > immovable_mem[i].start = ma->base_address; >> > immovable_mem[i].size = ma->length; >> > i++; >> >@@ -351,4 +371,5 @@ void get_immovable_mem(void) >> > ((unsigned long)table + table->length); >> > } >> > num_immovable_mem = i; >> >+ store_possible_addr(max_possible_addr); >> > } >> >diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c >> >index 1458b17..9b95fba 100644 >> >--- a/arch/x86/boot/compressed/eboot.c >> >+++ b/arch/x86/boot/compressed/eboot.c >> >@@ -192,6 +192,40 @@ static void setup_efi_pci(struct boot_params *params) >> > efi_call_early(free_pool, pci_handle); >> > } >> > >> >+#ifdef CONFIG_RANDOMIZE_MEMORY >> >+static void setup_kaslr(struct boot_params *params) >> >+{ >> >+ struct setup_data *kaslr_data = NULL; >> >+ struct setup_data *data; >> >+ unsigned long size; >> >+ efi_status_t status; >> >+ >> >+ size = sizeof(struct setup_data) + sizeof(unsigned long long); >> >+ >> >+ status = efi_call_early(allocate_pool, EFI_LOADER_DATA, >> >+ size, (void **)&kaslr_data); >> >+ if (status != EFI_SUCCESS) { >> >+ efi_printk(sys_table, "Failed to allocate memory for 'kaslr_data'\n"); >> >+ return; >> >+ } >> >+ >> >+ kaslr_data->type = SETUP_KASLR; >> >+ kaslr_data->next = 0; >> >+ kaslr_data->len = size; >> >+ >> >+ data = (struct setup_data *)(unsigned long)params->hdr.setup_data; >> >+ if (data) >> >+ data->next = (unsigned long)kaslr_data; >> Why just put the kaslr_data in data->next. You can't make sure >> data->next was NULL. >> >+ else { >> If data is NULL, go to this else{}, so these two lines below work? >> >+ while (data->next) >> >+ data = (struct setup_data *)(unsigned long)data->next; >> >+ data->next = (unsigned long)kaslr_data; >> >+ } >> If my understanding is not wrong, it should be: >> >> data = (struct setup_data *)(unsigned long)params->hdr.setup_data; >> if (!data) >> params->hdr.setup_data = (unsigned long)kaslr_data; >> else { >> while (data->next) >> data = (struct setup_data *)(unsigned long)data->next; >> data->next = (unsigned long)kaslr_data; >> } >> >> If I misunderstand something, please tell me. >> >> Thanks, >> Chao Fan >> >> > >
On Mon, Oct 22, 2018 at 11:42:05AM -0400, Masayoshi Mizuma wrote: > I'm trying to store the SRAT info and pass it to kernel_randomize_memory(), > looks like add_e820ext()/parse_setup_data(). > > Is the approach useful only EFI environment? I'm not sure how I Does it matter for non-EFI even? I mean, you want this code only for your use case - when you have movable memory and you're doing kexec, yes? And those machines are all EFI boxes I'd assume...
On Thu, Oct 25, 2018 at 12:33:45PM +0200, Borislav Petkov wrote: > On Mon, Oct 22, 2018 at 11:42:05AM -0400, Masayoshi Mizuma wrote: > > I'm trying to store the SRAT info and pass it to kernel_randomize_memory(), > > looks like add_e820ext()/parse_setup_data(). > > > > Is the approach useful only EFI environment? I'm not sure how I > > Does it matter for non-EFI even? > > I mean, you want this code only for your use case - when you have > movable memory and you're doing kexec, yes? > > And those machines are all EFI boxes I'd assume... My actual use case is for EFI boxes, however, I think it's better to useful for legacy BIOS as well because memory hot-plug affinity in SRAT and KASLR are available on legacy BIOS. Actually, we can create such environment in qemu. I have another idea to solve this issue. Adding a SRAT parsing code to arch/x86/mm/kaslr.c. It is useful for both EFI and BIOS and also we don't need a new kernel parameter... Dose the idea make sense? Thanks, Masa
On Thu, Oct 25, 2018 at 09:40:51AM -0400, Masayoshi Mizuma wrote: > My actual use case is for EFI boxes, however, I think it's better to useful > for legacy BIOS as well because memory hot-plug affinity in SRAT and KASLR > are available on legacy BIOS. > Actually, we can create such environment in qemu. Ah, right, qemu. :) > I have another idea to solve this issue. Adding a SRAT parsing code > to arch/x86/mm/kaslr.c. It is useful for both EFI and BIOS and > also we don't need a new kernel parameter... > Dose the idea make sense? The more automatic stuff we do and we don't have to involve the user, the better. However, lemme look at Chao's current patchset first - we should not go nuts by putting SRAT parsing everywhere :)
On 11/06/18 at 01:10pm, Borislav Petkov wrote: > > I have another idea to solve this issue. Adding a SRAT parsing code > > to arch/x86/mm/kaslr.c. It is useful for both EFI and BIOS and > > also we don't need a new kernel parameter... > > Dose the idea make sense? > > The more automatic stuff we do and we don't have to involve the user, > the better. > > However, lemme look at Chao's current patchset first - we should not go > nuts by putting SRAT parsing everywhere :) arch/x86/mm/ident_map.c is a good example, it's shared between arch/x86/boot/compressed and arch/x86/mm/init_64.c
On Thu, Oct 25, 2018 at 09:40:51AM -0400, Masayoshi Mizuma wrote: > I have another idea to solve this issue. Adding a SRAT parsing code > to arch/x86/mm/kaslr.c. It is useful for both EFI and BIOS and > also we don't need a new kernel parameter... > Dose the idea make sense? Ok, having swapped the whole thing back into my brain, forget what I said earlier today. Didn't we talk about passing info with setup_data to the later kernel stage? You even had a patch: https://lkml.kernel.org/r/20181022154204.kagmdb55jtoez4ca@gabell So what is that "idea" again about adding SRAT parsing code to arch/x86/mm/kaslr.c?!?! The intent of passing info with setup_data is to *avoid* parsing SRAT yet another time and duplicating that code one more time. IOW: * The first place that needs SRAT parsing, does the parsing - i.e., the compressed stage. * Later stages get information passed to them with setup_data. No second parsing. Ok?
On Tue, Nov 06, 2018 at 07:45:19PM +0100, Borislav Petkov wrote: > Ok, having swapped the whole thing back into my brain, forget what I > said earlier today. > > Didn't we talk about passing info with setup_data to the later kernel > stage? You even had a patch: > > https://lkml.kernel.org/r/20181022154204.kagmdb55jtoez4ca@gabell > > So what is that "idea" again about adding SRAT parsing code to > arch/x86/mm/kaslr.c?!?! > > The intent of passing info with setup_data is to *avoid* parsing SRAT > yet another time and duplicating that code one more time. > > IOW: > > * The first place that needs SRAT parsing, does the parsing - i.e., the > compressed stage. > > * Later stages get information passed to them with setup_data. No second > parsing. > > Ok? Yes, I think I can see what you are saying. However, I'm not sure how I use the setup_data in legacy BIOS environment. So, I said the another idea which adding the SRAT parsing code to arch/x86/mm/kaslr.c as well. Yes, as you said, that is not so good... I would appreciate if you could help to use setup_data or something to pass the information to later kernel stage in BIOS environment. Thanks, Masa
On Tue, Nov 06, 2018 at 02:36:38PM -0500, Masayoshi Mizuma wrote: > Yes, I think I can see what you are saying. However, I'm not sure how > I use the setup_data in legacy BIOS environment. What is "legacy BIOS environment" and what does that have to do with setup_data?
On Tue, Nov 06, 2018 at 09:45:11PM +0100, Borislav Petkov wrote: > On Tue, Nov 06, 2018 at 02:36:38PM -0500, Masayoshi Mizuma wrote: > > Yes, I think I can see what you are saying. However, I'm not sure how > > I use the setup_data in legacy BIOS environment. > > What is "legacy BIOS environment" and what does that have to do with > setup_data? My proposed patch [1] is useful only for EFI environment. The patch allocates a setup_date structure by efi_call_early() and store the KASLR data into the memory area. +static void setup_kaslr(struct boot_params *params) +{ + struct setup_data *kaslr_data = NULL; + struct setup_data *data; + unsigned long size; + efi_status_t status; + + size = sizeof(struct setup_data) + sizeof(unsigned long long); + + status = efi_call_early(allocate_pool, EFI_LOADER_DATA, + size, (void **)&kaslr_data); I'm not sure how I allocate such memory on no EFI environment. Am I missing something...? [1] https://lkml.kernel.org/r/20181022154204.kagmdb55jtoez4ca@gabell Thanks, Masa
On Tue, Nov 06, 2018 at 10:07:31PM +0800, Baoquan He wrote: >On 11/06/18 at 01:10pm, Borislav Petkov wrote: >> > I have another idea to solve this issue. Adding a SRAT parsing code >> > to arch/x86/mm/kaslr.c. It is useful for both EFI and BIOS and >> > also we don't need a new kernel parameter... >> > Dose the idea make sense? >> >> The more automatic stuff we do and we don't have to involve the user, >> the better. >> >> However, lemme look at Chao's current patchset first - we should not go >> nuts by putting SRAT parsing everywhere :) > >arch/x86/mm/ident_map.c is a good example, it's shared between >arch/x86/boot/compressed and arch/x86/mm/init_64.c Thanks to Baoquan, I think we can try this idea. How about you, Masa? Thanks, Chao Fan > >
On Tue, Nov 06, 2018 at 05:21:34PM -0500, Masayoshi Mizuma wrote: > On Tue, Nov 06, 2018 at 09:45:11PM +0100, Borislav Petkov wrote: > > On Tue, Nov 06, 2018 at 02:36:38PM -0500, Masayoshi Mizuma wrote: > > > Yes, I think I can see what you are saying. However, I'm not sure how > > > I use the setup_data in legacy BIOS environment. > > > > What is "legacy BIOS environment" and what does that have to do with > > setup_data? > > My proposed patch [1] is useful only for EFI environment. The patch > allocates a setup_date structure by efi_call_early() and store the > KASLR data into the memory area. > > +static void setup_kaslr(struct boot_params *params) > +{ > + struct setup_data *kaslr_data = NULL; > + struct setup_data *data; > + unsigned long size; > + efi_status_t status; > + > + size = sizeof(struct setup_data) + sizeof(unsigned long long); > + > + status = efi_call_early(allocate_pool, EFI_LOADER_DATA, > + size, (void **)&kaslr_data); > > I'm not sure how I allocate such memory on no EFI environment. A global definition which doesn't need allocation? Maybe hpa would have another, better idea...
On Thu, Nov 08, 2018 at 11:51:29AM +0100, Borislav Petkov wrote: > A global definition which doesn't need allocation? > > Maybe hpa would have another, better idea... ...and he has: just put that address in a new field in struct boot_params by converting one of the padding arrays there. Don't forget to document it in Documentation/x86/zero-page.txt This way you don't need any of the allocation fun or to use setup_data at all. HTH.
On Sat, Nov 10, 2018 at 11:54:22AM +0100, Borislav Petkov wrote: > On Thu, Nov 08, 2018 at 11:51:29AM +0100, Borislav Petkov wrote: > > A global definition which doesn't need allocation? > > > > Maybe hpa would have another, better idea... > > ...and he has: just put that address in a new field in struct > boot_params by converting one of the padding arrays there. > > Don't forget to document it in Documentation/x86/zero-page.txt > > This way you don't need any of the allocation fun or to use setup_data > at all. Thanks! I have the prototype patch to use boot_params [1]. I will try to brush up it. [1] https://lore.kernel.org/lkml/20181016151353.punyk7exekut2543@gabell Thanks, Masa
Hi Boris and all, On Sun, Nov 11, 2018 at 08:45:57AM -0500, Masayoshi Mizuma wrote: > On Sat, Nov 10, 2018 at 11:54:22AM +0100, Borislav Petkov wrote: > > On Thu, Nov 08, 2018 at 11:51:29AM +0100, Borislav Petkov wrote: > > > A global definition which doesn't need allocation? > > > > > > Maybe hpa would have another, better idea... > > > > ...and he has: just put that address in a new field in struct > > boot_params by converting one of the padding arrays there. > > > > Don't forget to document it in Documentation/x86/zero-page.txt > > > > This way you don't need any of the allocation fun or to use setup_data > > at all. > > Thanks! > I have the prototype patch to use boot_params [1]. > I will try to brush up it. > > [1] https://lore.kernel.org/lkml/20181016151353.punyk7exekut2543@gabell Chao's patches are included in the tip tree, so I modified the patch. Could you review the following patch? From: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Date: Tue, 5 Feb 2019 10:00:59 -0500 Subject: [PATCH] x86/mm: Introduce adjustment the padding size for KASLR If the physical memory layout has huge space for hotplug, the padding used for the physical memory mapping section is not enough. So, such system may crash while memory hot-adding on KASLR enabled system. For example, SRAT has the following layout, the maximum possible memory size is 32TB, and the memory is installed as 2TB actually, then the padding size should set 30TB (== possible memory size - actual memory size). SRAT: Node 3 PXM 7 [mem 0x1c0000000000-0x1fffffffffff] hotplug This patch introduces adjustment the padding size if the default padding size isn't enough. Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> --- Documentation/x86/zero-page.txt | 1 + arch/x86/boot/compressed/acpi.c | 19 +++++++++++++++---- arch/x86/include/uapi/asm/bootparam.h | 2 +- arch/x86/mm/kaslr.c | 26 +++++++++++++++++++++++++- 4 files changed, 42 insertions(+), 6 deletions(-) diff --git a/Documentation/x86/zero-page.txt b/Documentation/x86/zero-page.txt index 68aed077f..343fe1a90 100644 --- a/Documentation/x86/zero-page.txt +++ b/Documentation/x86/zero-page.txt @@ -15,6 +15,7 @@ Offset Proto Name Meaning 058/008 ALL tboot_addr Physical address of tboot shared page 060/010 ALL ist_info Intel SpeedStep (IST) BIOS support information (struct ist_info) +078/010 ALL possible_mem_addr The possible maximum physical memory address. 080/010 ALL hd0_info hd0 disk parameter, OBSOLETE!! 090/010 ALL hd1_info hd1 disk parameter, OBSOLETE!! 0A0/010 ALL sys_desc_table System description table (struct sys_desc_table), diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c index c5a949335..7dd61b943 100644 --- a/arch/x86/boot/compressed/acpi.c +++ b/arch/x86/boot/compressed/acpi.c @@ -288,6 +288,7 @@ int count_immovable_mem_regions(void) struct acpi_subtable_header *sub_table; struct acpi_table_header *table_header; char arg[MAX_ACPI_ARG_LENGTH]; + unsigned long long possible_addr, max_possible_addr = 0; int num = 0; if (cmdline_find_option("acpi", arg, sizeof(arg)) == 3 && @@ -308,10 +309,19 @@ int count_immovable_mem_regions(void) struct acpi_srat_mem_affinity *ma; ma = (struct acpi_srat_mem_affinity *)sub_table; - if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && ma->length) { - immovable_mem[num].start = ma->base_address; - immovable_mem[num].size = ma->length; - num++; + if (ma->length) { + if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) { + possible_addr = + ma->base_address + ma->length; + if (possible_addr > max_possible_addr) + max_possible_addr = + possible_addr; + } else { + immovable_mem[num].start = + ma->base_address; + immovable_mem[num].size = ma->length; + num++; + } } if (num >= MAX_NUMNODES*2) { @@ -320,6 +330,7 @@ int count_immovable_mem_regions(void) } } table += sub_table->length; + boot_params->possible_mem_addr = max_possible_addr; } return num; } diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h index 60733f137..5b64b606e 100644 --- a/arch/x86/include/uapi/asm/bootparam.h +++ b/arch/x86/include/uapi/asm/bootparam.h @@ -156,7 +156,7 @@ struct boot_params { __u64 tboot_addr; /* 0x058 */ struct ist_info ist_info; /* 0x060 */ __u64 acpi_rsdp_addr; /* 0x070 */ - __u8 _pad3[8]; /* 0x078 */ + __u64 possible_mem_addr; /* 0x078 */ __u8 hd0_info[16]; /* obsolete! */ /* 0x080 */ __u8 hd1_info[16]; /* obsolete! */ /* 0x090 */ struct sys_desc_table sys_desc_table; /* obsolete! */ /* 0x0a0 */ diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c index 3f452ffed..71fc28570 100644 --- a/arch/x86/mm/kaslr.c +++ b/arch/x86/mm/kaslr.c @@ -70,6 +70,30 @@ static inline bool kaslr_memory_enabled(void) return kaslr_enabled() && !IS_ENABLED(CONFIG_KASAN); } +static unsigned int __init kaslr_padding(void) +{ + unsigned int rand_mem_physical_padding = + CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING; +#ifdef CONFIG_MEMORY_HOTPLUG + unsigned long long max_possible_phys, max_actual_phys, threshold; + + if (!boot_params.possible_mem_addr) + goto out; + + max_actual_phys = roundup(PFN_PHYS(max_pfn), 1ULL << TB_SHIFT); + max_possible_phys = roundup(boot_params.possible_mem_addr, + 1ULL << TB_SHIFT); + threshold = max_actual_phys + + ((unsigned long long)rand_mem_physical_padding << TB_SHIFT); + + if (max_possible_phys > threshold) + rand_mem_physical_padding = + (max_possible_phys - max_actual_phys) >> TB_SHIFT; +out: +#endif + return rand_mem_physical_padding; +} + /* Initialize base and padding for each memory region randomized with KASLR */ void __init kernel_randomize_memory(void) { @@ -103,7 +127,7 @@ void __init kernel_randomize_memory(void) */ BUG_ON(kaslr_regions[0].base != &page_offset_base); memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) + - CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING; + kaslr_padding(); /* Adapt phyiscal memory region size based on available memory */ if (memory_tb < kaslr_regions[0].size_tb)
On Tue, Feb 05, 2019 at 10:05:16AM -0500, Masayoshi Mizuma wrote: > From: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> > Date: Tue, 5 Feb 2019 10:00:59 -0500 > Subject: [PATCH] x86/mm: Introduce adjustment the padding size for KASLR "Adjust the padding size for KASLR" > If the physical memory layout has huge space for hotplug, the padding > used for the physical memory mapping section is not enough. > So, such system may crash while memory hot-adding on KASLR enabled system. Crash why? Why is the padding not enough? > For example, SRAT has the following layout, the maximum possible memory > size is 32TB, and the memory is installed as 2TB actually, then the padding > size should set 30TB (== possible memory size - actual memory size). > > SRAT: Node 3 PXM 7 [mem 0x1c0000000000-0x1fffffffffff] hotplug What is that supposed to exemplify: that range is 3T not 2 and that range start is not at 2T but 28T. So I have absolutely no clue what you're trying to explain here. Please go back, take your time and structure your commit message like this: Problem is A. It happens because of B. Fix it by doing C. (Potentially do D). For more detailed info, see Documentation/process/submitting-patches.rst, Section "2) Describe your changes". > This patch introduces adjustment the padding size if the default Avoid having "This patch" or "This commit" in the commit message. It is tautologically useless. Also, do $ git grep 'This patch' Documentation/process for more details. > padding size isn't enough. > > Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> > --- > Documentation/x86/zero-page.txt | 1 + > arch/x86/boot/compressed/acpi.c | 19 +++++++++++++++---- > arch/x86/include/uapi/asm/bootparam.h | 2 +- > arch/x86/mm/kaslr.c | 26 +++++++++++++++++++++++++- > 4 files changed, 42 insertions(+), 6 deletions(-) > > diff --git a/Documentation/x86/zero-page.txt b/Documentation/x86/zero-page.txt > index 68aed077f..343fe1a90 100644 > --- a/Documentation/x86/zero-page.txt > +++ b/Documentation/x86/zero-page.txt > @@ -15,6 +15,7 @@ Offset Proto Name Meaning > 058/008 ALL tboot_addr Physical address of tboot shared page > 060/010 ALL ist_info Intel SpeedStep (IST) BIOS support information > (struct ist_info) > +078/010 ALL possible_mem_addr The possible maximum physical memory address. Why isn't this called max_phys_addr then? Also, please explain what it means at the end of this file. > 080/010 ALL hd0_info hd0 disk parameter, OBSOLETE!! > 090/010 ALL hd1_info hd1 disk parameter, OBSOLETE!! > 0A0/010 ALL sys_desc_table System description table (struct sys_desc_table), > diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c > index c5a949335..7dd61b943 100644 > --- a/arch/x86/boot/compressed/acpi.c > +++ b/arch/x86/boot/compressed/acpi.c > @@ -288,6 +288,7 @@ int count_immovable_mem_regions(void) > struct acpi_subtable_header *sub_table; > struct acpi_table_header *table_header; > char arg[MAX_ACPI_ARG_LENGTH]; > + unsigned long long possible_addr, max_possible_addr = 0; > int num = 0; > > if (cmdline_find_option("acpi", arg, sizeof(arg)) == 3 && > @@ -308,10 +309,19 @@ int count_immovable_mem_regions(void) > struct acpi_srat_mem_affinity *ma; > > ma = (struct acpi_srat_mem_affinity *)sub_table; > - if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && ma->length) { > - immovable_mem[num].start = ma->base_address; > - immovable_mem[num].size = ma->length; > - num++; > + if (ma->length) { > + if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) { > + possible_addr = > + ma->base_address + ma->length; > + if (possible_addr > max_possible_addr) > + max_possible_addr = > + possible_addr; > + } else { > + immovable_mem[num].start = > + ma->base_address; > + immovable_mem[num].size = ma->length; > + num++; > + } This piece has some ugly linebreaks which makes it impossible to read. Perhaps because the variable names you're adding are too long. You can carve it out in a separate function, for example. > if (num >= MAX_NUMNODES*2) { > @@ -320,6 +330,7 @@ int count_immovable_mem_regions(void) > } > } > table += sub_table->length; > + boot_params->possible_mem_addr = max_possible_addr; This assignment is inside the loop and happens potentially a bunch of times. Why? > } > return num; > } > diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h > index 60733f137..5b64b606e 100644 > --- a/arch/x86/include/uapi/asm/bootparam.h > +++ b/arch/x86/include/uapi/asm/bootparam.h > @@ -156,7 +156,7 @@ struct boot_params { > __u64 tboot_addr; /* 0x058 */ > struct ist_info ist_info; /* 0x060 */ > __u64 acpi_rsdp_addr; /* 0x070 */ > - __u8 _pad3[8]; /* 0x078 */ > + __u64 possible_mem_addr; /* 0x078 */ > __u8 hd0_info[16]; /* obsolete! */ /* 0x080 */ > __u8 hd1_info[16]; /* obsolete! */ /* 0x090 */ > struct sys_desc_table sys_desc_table; /* obsolete! */ /* 0x0a0 */ > diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c > index 3f452ffed..71fc28570 100644 > --- a/arch/x86/mm/kaslr.c > +++ b/arch/x86/mm/kaslr.c > @@ -70,6 +70,30 @@ static inline bool kaslr_memory_enabled(void) > return kaslr_enabled() && !IS_ENABLED(CONFIG_KASAN); > } > > +static unsigned int __init kaslr_padding(void) > +{ > + unsigned int rand_mem_physical_padding = > + CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING; > +#ifdef CONFIG_MEMORY_HOTPLUG > + unsigned long long max_possible_phys, max_actual_phys, threshold; > + > + if (!boot_params.possible_mem_addr) > + goto out; > + > + max_actual_phys = roundup(PFN_PHYS(max_pfn), 1ULL << TB_SHIFT); > + max_possible_phys = roundup(boot_params.possible_mem_addr, > + 1ULL << TB_SHIFT); > + threshold = max_actual_phys + > + ((unsigned long long)rand_mem_physical_padding << TB_SHIFT); > + > + if (max_possible_phys > threshold) > + rand_mem_physical_padding = > + (max_possible_phys - max_actual_phys) >> TB_SHIFT; > +out: > +#endif > + return rand_mem_physical_padding; Same problem: local variables with unnecessarily long names make the code hard to read. Pls shorten. Also, the types in that function are a total mess. An unsigned int which you cast to unsigned long long and return an unsigned int again to add into a sum with unsigned longs. Are you selecting the types randomly? Why aren't you using plain unsigned longs like the rest of the file does with memory addresses? Also, this function could have a comment above it explaining what all that threshold and actual and possible address arithmetic is supposed to do. Thx.
Hi Boris, Thank you for your review. On Fri, Feb 08, 2019 at 07:26:00PM +0100, Borislav Petkov wrote: > On Tue, Feb 05, 2019 at 10:05:16AM -0500, Masayoshi Mizuma wrote: > > From: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> > > Date: Tue, 5 Feb 2019 10:00:59 -0500 > > Subject: [PATCH] x86/mm: Introduce adjustment the padding size for KASLR > > "Adjust the padding size for KASLR" > > > If the physical memory layout has huge space for hotplug, the padding > > used for the physical memory mapping section is not enough. > > So, such system may crash while memory hot-adding on KASLR enabled system. > > Crash why? > > Why is the padding not enough? > > > For example, SRAT has the following layout, the maximum possible memory > > size is 32TB, and the memory is installed as 2TB actually, then the padding > > size should set 30TB (== possible memory size - actual memory size). > > > > SRAT: Node 3 PXM 7 [mem 0x1c0000000000-0x1fffffffffff] hotplug > > What is that supposed to exemplify: that range is 3T not 2 and that > range start is not at 2T but 28T. So I have absolutely no clue what > you're trying to explain here. > > Please go back, take your time and structure your commit message like > this: > > Problem is A. > > It happens because of B. > > Fix it by doing C. > > (Potentially do D). > > For more detailed info, see > Documentation/process/submitting-patches.rst, Section "2) Describe your > changes". Got it, thanks. > > > This patch introduces adjustment the padding size if the default > > Avoid having "This patch" or "This commit" in the commit message. It is > tautologically useless. > > Also, do > > $ git grep 'This patch' Documentation/process > > for more details. Thanks. I see. > > > padding size isn't enough. > > > > Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> > > --- > > Documentation/x86/zero-page.txt | 1 + > > arch/x86/boot/compressed/acpi.c | 19 +++++++++++++++---- > > arch/x86/include/uapi/asm/bootparam.h | 2 +- > > arch/x86/mm/kaslr.c | 26 +++++++++++++++++++++++++- > > 4 files changed, 42 insertions(+), 6 deletions(-) > > > > diff --git a/Documentation/x86/zero-page.txt b/Documentation/x86/zero-page.txt > > index 68aed077f..343fe1a90 100644 > > --- a/Documentation/x86/zero-page.txt > > +++ b/Documentation/x86/zero-page.txt > > @@ -15,6 +15,7 @@ Offset Proto Name Meaning > > 058/008 ALL tboot_addr Physical address of tboot shared page > > 060/010 ALL ist_info Intel SpeedStep (IST) BIOS support information > > (struct ist_info) > > +078/010 ALL possible_mem_addr The possible maximum physical memory address. > > Why isn't this called max_phys_addr then? > > Also, please explain what it means at the end of this file. > > > 080/010 ALL hd0_info hd0 disk parameter, OBSOLETE!! > > 090/010 ALL hd1_info hd1 disk parameter, OBSOLETE!! > > 0A0/010 ALL sys_desc_table System description table (struct sys_desc_table), > > diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c > > index c5a949335..7dd61b943 100644 > > --- a/arch/x86/boot/compressed/acpi.c > > +++ b/arch/x86/boot/compressed/acpi.c > > @@ -288,6 +288,7 @@ int count_immovable_mem_regions(void) > > struct acpi_subtable_header *sub_table; > > struct acpi_table_header *table_header; > > char arg[MAX_ACPI_ARG_LENGTH]; > > + unsigned long long possible_addr, max_possible_addr = 0; > > int num = 0; > > > > if (cmdline_find_option("acpi", arg, sizeof(arg)) == 3 && > > @@ -308,10 +309,19 @@ int count_immovable_mem_regions(void) > > struct acpi_srat_mem_affinity *ma; > > > > ma = (struct acpi_srat_mem_affinity *)sub_table; > > - if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && ma->length) { > > - immovable_mem[num].start = ma->base_address; > > - immovable_mem[num].size = ma->length; > > - num++; > > + if (ma->length) { > > + if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) { > > + possible_addr = > > + ma->base_address + ma->length; > > + if (possible_addr > max_possible_addr) > > + max_possible_addr = > > + possible_addr; > > + } else { > > + immovable_mem[num].start = > > + ma->base_address; > > + immovable_mem[num].size = ma->length; > > + num++; > > + } > > This piece has some ugly linebreaks which makes it impossible to read. > Perhaps because the variable names you're adding are too long. > > You can carve it out in a separate function, for example. Thanks. I will add a separate function like as: static void subtable_parse(struct acpi_subtable_header *sub_table, int *num, unsigned long *max_addr) { struct acpi_srat_mem_affinity *ma; unsigned long addr; ma = (struct acpi_srat_mem_affinity *)sub_table; if (ma->length) { if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) { addr = ma->base_address + ma->length; if (addr > *max_addr) *max_addr = addr; } else { immovable_mem[*num].start = ma->base_address; immovable_mem[*num].size = ma->length; (*num)++; } } } > > > if (num >= MAX_NUMNODES*2) { > > @@ -320,6 +330,7 @@ int count_immovable_mem_regions(void) > > } > > } > > table += sub_table->length; > > + boot_params->possible_mem_addr = max_possible_addr; > > This assignment is inside the loop and happens potentially a bunch of > times. Why? I will move the assigment out of the loop, thanks. > > > } > > return num; > > } > > diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h > > index 60733f137..5b64b606e 100644 > > --- a/arch/x86/include/uapi/asm/bootparam.h > > +++ b/arch/x86/include/uapi/asm/bootparam.h > > @@ -156,7 +156,7 @@ struct boot_params { > > __u64 tboot_addr; /* 0x058 */ > > struct ist_info ist_info; /* 0x060 */ > > __u64 acpi_rsdp_addr; /* 0x070 */ > > - __u8 _pad3[8]; /* 0x078 */ > > + __u64 possible_mem_addr; /* 0x078 */ > > __u8 hd0_info[16]; /* obsolete! */ /* 0x080 */ > > __u8 hd1_info[16]; /* obsolete! */ /* 0x090 */ > > struct sys_desc_table sys_desc_table; /* obsolete! */ /* 0x0a0 */ > > diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c > > index 3f452ffed..71fc28570 100644 > > --- a/arch/x86/mm/kaslr.c > > +++ b/arch/x86/mm/kaslr.c > > @@ -70,6 +70,30 @@ static inline bool kaslr_memory_enabled(void) > > return kaslr_enabled() && !IS_ENABLED(CONFIG_KASAN); > > } > > > > +static unsigned int __init kaslr_padding(void) > > +{ > > + unsigned int rand_mem_physical_padding = > > + CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING; > > +#ifdef CONFIG_MEMORY_HOTPLUG > > + unsigned long long max_possible_phys, max_actual_phys, threshold; > > + > > + if (!boot_params.possible_mem_addr) > > + goto out; > > + > > + max_actual_phys = roundup(PFN_PHYS(max_pfn), 1ULL << TB_SHIFT); > > + max_possible_phys = roundup(boot_params.possible_mem_addr, > > + 1ULL << TB_SHIFT); > > + threshold = max_actual_phys + > > + ((unsigned long long)rand_mem_physical_padding << TB_SHIFT); > > + > > + if (max_possible_phys > threshold) > > + rand_mem_physical_padding = > > + (max_possible_phys - max_actual_phys) >> TB_SHIFT; > > +out: > > +#endif > > + return rand_mem_physical_padding; > > Same problem: local variables with unnecessarily long names make the > code hard to read. Pls shorten. > > Also, the types in that function are a total mess. An unsigned int which > you cast to unsigned long long and return an unsigned int again to add > into a sum with unsigned longs. Are you selecting the types randomly? > Why aren't you using plain unsigned longs like the rest of the file does > with memory addresses? Sorry. I will unify the type as unsinged long. > > Also, this function could have a comment above it explaining what all > that threshold and actual and possible address arithmetic is supposed to > do. Got it. Thanks! Masa
On Tue, Feb 05, 2019 at 10:05:16AM -0500, Masayoshi Mizuma wrote: [...] Hi Masa, Sorry for delay, since last days were Chinese holiday. >diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c >index c5a949335..7dd61b943 100644 >--- a/arch/x86/boot/compressed/acpi.c >+++ b/arch/x86/boot/compressed/acpi.c >@@ -288,6 +288,7 @@ int count_immovable_mem_regions(void) > struct acpi_subtable_header *sub_table; > struct acpi_table_header *table_header; > char arg[MAX_ACPI_ARG_LENGTH]; >+ unsigned long long possible_addr, max_possible_addr = 0; This line is so long that it should be added in first line. > int num = 0; > > if (cmdline_find_option("acpi", arg, sizeof(arg)) == 3 && >@@ -308,10 +309,19 @@ int count_immovable_mem_regions(void) > struct acpi_srat_mem_affinity *ma; > > ma = (struct acpi_srat_mem_affinity *)sub_table; >- if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && ma->length) { >- immovable_mem[num].start = ma->base_address; >- immovable_mem[num].size = ma->length; >- num++; >+ if (ma->length) { >+ if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) { >+ possible_addr = >+ ma->base_address + ma->length; >+ if (possible_addr > max_possible_addr) >+ max_possible_addr = >+ possible_addr; >+ } else { >+ immovable_mem[num].start = >+ ma->base_address; >+ immovable_mem[num].size = ma->length; >+ num++; >+ } > } It looks better in another mail where you add a new function. Thanks, Chao Fan > > if (num >= MAX_NUMNODES*2) { >@@ -320,6 +330,7 @@ int count_immovable_mem_regions(void) > } > } > table += sub_table->length; >+ boot_params->possible_mem_addr = max_possible_addr; > } > return num; > } >diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h >index 60733f137..5b64b606e 100644 >--- a/arch/x86/include/uapi/asm/bootparam.h >+++ b/arch/x86/include/uapi/asm/bootparam.h >@@ -156,7 +156,7 @@ struct boot_params { > __u64 tboot_addr; /* 0x058 */ > struct ist_info ist_info; /* 0x060 */ > __u64 acpi_rsdp_addr; /* 0x070 */ >- __u8 _pad3[8]; /* 0x078 */ >+ __u64 possible_mem_addr; /* 0x078 */ > __u8 hd0_info[16]; /* obsolete! */ /* 0x080 */ > __u8 hd1_info[16]; /* obsolete! */ /* 0x090 */ > struct sys_desc_table sys_desc_table; /* obsolete! */ /* 0x0a0 */ >diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c >index 3f452ffed..71fc28570 100644 >--- a/arch/x86/mm/kaslr.c >+++ b/arch/x86/mm/kaslr.c >@@ -70,6 +70,30 @@ static inline bool kaslr_memory_enabled(void) > return kaslr_enabled() && !IS_ENABLED(CONFIG_KASAN); > } > >+static unsigned int __init kaslr_padding(void) >+{ >+ unsigned int rand_mem_physical_padding = >+ CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING; >+#ifdef CONFIG_MEMORY_HOTPLUG >+ unsigned long long max_possible_phys, max_actual_phys, threshold; >+ >+ if (!boot_params.possible_mem_addr) >+ goto out; >+ >+ max_actual_phys = roundup(PFN_PHYS(max_pfn), 1ULL << TB_SHIFT); >+ max_possible_phys = roundup(boot_params.possible_mem_addr, >+ 1ULL << TB_SHIFT); >+ threshold = max_actual_phys + >+ ((unsigned long long)rand_mem_physical_padding << TB_SHIFT); >+ >+ if (max_possible_phys > threshold) >+ rand_mem_physical_padding = >+ (max_possible_phys - max_actual_phys) >> TB_SHIFT; >+out: >+#endif >+ return rand_mem_physical_padding; >+} >+ > /* Initialize base and padding for each memory region randomized with KASLR */ > void __init kernel_randomize_memory(void) > { >@@ -103,7 +127,7 @@ void __init kernel_randomize_memory(void) > */ > BUG_ON(kaslr_regions[0].base != &page_offset_base); > memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) + >- CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING; >+ kaslr_padding(); > > /* Adapt phyiscal memory region size based on available memory */ > if (memory_tb < kaslr_regions[0].size_tb) >-- >2.20.1 > >Thanks, >Masa > >
Hi Chao, Thank you for your review. On Mon, Feb 11, 2019 at 09:46:05AM +0800, Chao Fan wrote: > On Tue, Feb 05, 2019 at 10:05:16AM -0500, Masayoshi Mizuma wrote: > [...] > > Hi Masa, > > Sorry for delay, since last days were Chinese holiday. > > >diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c > >index c5a949335..7dd61b943 100644 > >--- a/arch/x86/boot/compressed/acpi.c > >+++ b/arch/x86/boot/compressed/acpi.c > >@@ -288,6 +288,7 @@ int count_immovable_mem_regions(void) > > struct acpi_subtable_header *sub_table; > > struct acpi_table_header *table_header; > > char arg[MAX_ACPI_ARG_LENGTH]; > >+ unsigned long long possible_addr, max_possible_addr = 0; > > This line is so long that it should be added in first line. Thanks. I will simplify around the local variables. > > > int num = 0; > > > > if (cmdline_find_option("acpi", arg, sizeof(arg)) == 3 && > >@@ -308,10 +309,19 @@ int count_immovable_mem_regions(void) > > struct acpi_srat_mem_affinity *ma; > > > > ma = (struct acpi_srat_mem_affinity *)sub_table; > >- if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && ma->length) { > >- immovable_mem[num].start = ma->base_address; > >- immovable_mem[num].size = ma->length; > >- num++; > >+ if (ma->length) { > >+ if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) { > >+ possible_addr = > >+ ma->base_address + ma->length; > >+ if (possible_addr > max_possible_addr) > >+ max_possible_addr = > >+ possible_addr; > >+ } else { > >+ immovable_mem[num].start = > >+ ma->base_address; > >+ immovable_mem[num].size = ma->length; > >+ num++; > >+ } > > } > > It looks better in another mail where you add a new function. Thanks! - Masa