mbox series

[v8,0/3] x86/boot/KASLR: Parse ACPI table and limit kaslr in immovable memory

Message ID 20181010084119.17539-1-fanc.fnst@cn.fujitsu.com (mailing list archive)
Headers show
Series x86/boot/KASLR: Parse ACPI table and limit kaslr in immovable memory | expand

Message

Chao Fan Oct. 10, 2018, 8:41 a.m. UTC
***Background:
People reported that kaslr may randomly chooses some positions
which are located in movable memory regions. This will break memory
hotplug feature and make the movable memory chosen by KASLR can't be
removed.

***Solutions:
There should be a method to limit kaslr to choosing immovable memory
regions, so there are 2 solutions:
1) Add a kernel parameter to specify the memory regions.
2) Get the information of memory hot-remove, then kaslr will know the
   right regions.
In method 2, information about memory hot-remove is in ACPI
tables, which will be parsed after start_kernel(), kaslr can't get
the information.
In method 1, users should know the regions address and specify in
kernel parameter.

In the earliest time, I tried to dig ACPI tabls to solve this problem.
But I didn't splite the code in 'compressed/' and ACPI code, so the patch
is hard to follow so refused by community.
Somebody suggest to add a kernel parameter to specify the
immovable memory so that limit kaslr in these regions. Then I make
a new patchset. After several versions, Ingo gave a suggestion:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1634024.html
Follow Ingo's suggestion, imitate the ACPI code to parse the acpi
tables, so that the kaslr can get necessary memory information in
ACPI tables.
I think ACPI code is an independent part, so copy the codes
and functions to 'compressed/' directory, so that kaslr won't
influence the initialization of ACPI.

PATCH 1/3 Add acpitb.c to provide functions to parse ACPI code.
PATCH 2/3 If CONFIG_MEMORY_HOTREMOVE enabled, walk all nodes and
          store the information of immovable memory regions.
PATCH 3/3 According to the immovable memory regions, filter the
          immovable regions which KASLR can choose.

***Test results:
 - I did a very simple test, and it can get the memory information in
   bios and efi KVM guest machine, and put it by early printk. But no
   more tests, so it's with RFC tag.

v1->v2:
 -  Simplify some code.
Follow Baoquan He's suggestion:
 - Reuse the head file of acpi code.

v2->v3:
 - Test in more conditions, so remove the 'RFC' tag.
 - Change some comments.

v3->v4:
Follow Thomas Gleixner's suggetsion:
 - Put the whole efi related function into #define CONFIG_EFI and return
   false in the other stub.
 - Simplify two functions in head file.

v4->v5:
Follow Dou Liyang's suggestion:
 - Add more comments about some functions based on kernel code.
 - Change some typo in comments.
 - Clean useless variable.
 - Add check for the boundary of array.
 - Add check for 'movable_node' parameter

v5->v6:
Follow Baoquan He's suggestion:
 - Change some log.
 - Add the check for acpi_rsdp
 - Change some code logical to make code clear

v6->v7:
Follow Rafael's suggestion:
 - Add more comments and patch log.
Follow test robot's suggestion:
 - Add "static" tag for function

v7-v8:
Follow Kees Cook's suggestion:
 - Use mem_overlaps() to check memory region.
 - Use #ifdef in the definition of function.

Any comments will be welcome.

Chao Fan (3):
  x86/boot: Add acpitb.c to parse acpi tables
  x86/boot/KASLR: Walk srat tables to filter immovable memory
  x86/boot/KASLR: Limit kaslr to choosing the immovable memory

 arch/x86/boot/compressed/Makefile |   2 +
 arch/x86/boot/compressed/acpitb.c | 405 ++++++++++++++++++++++++++++++
 arch/x86/boot/compressed/kaslr.c  |  75 +++++-
 arch/x86/boot/compressed/misc.h   |   8 +
 4 files changed, 479 insertions(+), 11 deletions(-)
 create mode 100644 arch/x86/boot/compressed/acpitb.c

Comments

Borislav Petkov Oct. 10, 2018, 8:59 a.m. UTC | #1
On Wed, Oct 10, 2018 at 04:41:16PM +0800, Chao Fan wrote:
> ***Background:
> People reported that kaslr may randomly chooses some positions
> which are located in movable memory regions. This will break memory
> hotplug feature and make the movable memory chosen by KASLR can't be
> removed.
> 
> ***Solutions:
> There should be a method to limit kaslr to choosing immovable memory
> regions, so there are 2 solutions:
> 1) Add a kernel parameter to specify the memory regions.
> 2) Get the information of memory hot-remove, then kaslr will know the
>    right regions.
> In method 2, information about memory hot-remove is in ACPI
> tables, which will be parsed after start_kernel(), kaslr can't get
> the information.
> In method 1, users should know the regions address and specify in
> kernel parameter.
> 
> In the earliest time, I tried to dig ACPI tabls to solve this problem.
> But I didn't splite the code in 'compressed/' and ACPI code, so the patch
> is hard to follow so refused by community.
> Somebody suggest to add a kernel parameter to specify the
> immovable memory so that limit kaslr in these regions. Then I make
> a new patchset. After several versions, Ingo gave a suggestion:
> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1634024.html
> Follow Ingo's suggestion, imitate the ACPI code to parse the acpi
> tables, so that the kaslr can get necessary memory information in
> ACPI tables.
> I think ACPI code is an independent part, so copy the codes
> and functions to 'compressed/' directory, so that kaslr won't
> influence the initialization of ACPI.

... and we just picked up

https://lkml.kernel.org/r/20181001140843.26137-1-msys.mizuma@gmail.com

and without having looked at the rest of your stuff, if people accept
your solution, we don't need the silly parameter anymore, right?

Which means, we should not rush the whole thing yet until the whole
KASLR vs movable memory gets solved properly.

Ingo, Thomas?
Baoquan He Oct. 10, 2018, 9:06 a.m. UTC | #2
Hi Boris,

On 10/10/18 at 10:59am, Borislav Petkov wrote:
> ... and we just picked up
> 
> https://lkml.kernel.org/r/20181001140843.26137-1-msys.mizuma@gmail.com
> 
> and without having looked at the rest of your stuff, if people accept
> your solution, we don't need the silly parameter anymore, right?
> 
> Which means, we should not rush the whole thing yet until the whole
> KASLR vs movable memory gets solved properly.

Masa's patches solves the problem in memory region KASLR which later hot
added memory may be big than the default padding 10 TB.

Chao's patches is trying to fix a conflict between 'movable_node' and
kernel text KASLR. If 'movable_node' specified, we rely on SRAT to get
which memory region is movable or immovable, and movable region can be
hot removed. But if kernel is randomized into movable memory, it can't
be hot removed any more, this is a regression after KASLR introduced.
So this is a different issue than Masa's.

Thanks
Baoquan
Chao Fan Oct. 10, 2018, 9:12 a.m. UTC | #3
On Wed, Oct 10, 2018 at 05:06:20PM +0800, Baoquan He wrote:
>Hi Boris,
>
>On 10/10/18 at 10:59am, Borislav Petkov wrote:
>> ... and we just picked up
>> 
>> https://lkml.kernel.org/r/20181001140843.26137-1-msys.mizuma@gmail.com
>> 
>> and without having looked at the rest of your stuff, if people accept
>> your solution, we don't need the silly parameter anymore, right?
>> 
>> Which means, we should not rush the whole thing yet until the whole
>> KASLR vs movable memory gets solved properly.
>
>Masa's patches solves the problem in memory region KASLR which later hot
>added memory may be big than the default padding 10 TB.
>
>Chao's patches is trying to fix a conflict between 'movable_node' and
>kernel text KASLR. If 'movable_node' specified, we rely on SRAT to get
>which memory region is movable or immovable, and movable region can be
>hot removed. But if kernel is randomized into movable memory, it can't
>be hot removed any more, this is a regression after KASLR introduced.
>So this is a different issue than Masa's.

Yes, they are two issues.
But if we can get more memory information by the function in
the new file acpi.c, semms it's helfpul to Masa's issue.

Thanks,
Chao Fan

>
>Thanks
>Baoquan
>
>
Thomas Gleixner Oct. 10, 2018, 9:14 a.m. UTC | #4
On Wed, 10 Oct 2018, Baoquan He wrote:

> Hi Boris,
> 
> On 10/10/18 at 10:59am, Borislav Petkov wrote:
> > ... and we just picked up
> > 
> > https://lkml.kernel.org/r/20181001140843.26137-1-msys.mizuma@gmail.com
> > 
> > and without having looked at the rest of your stuff, if people accept
> > your solution, we don't need the silly parameter anymore, right?
> > 
> > Which means, we should not rush the whole thing yet until the whole
> > KASLR vs movable memory gets solved properly.
> 
> Masa's patches solves the problem in memory region KASLR which later hot
> added memory may be big than the default padding 10 TB.
> 
> Chao's patches is trying to fix a conflict between 'movable_node' and
> kernel text KASLR. If 'movable_node' specified, we rely on SRAT to get
> which memory region is movable or immovable, and movable region can be
> hot removed. But if kernel is randomized into movable memory, it can't
> be hot removed any more, this is a regression after KASLR introduced.
> So this is a different issue than Masa's.

Yes, it's different, but if the SRAT information is available early, then
the command line parameter can go away because then the required
information for Masa's problem is available as well.

Thanks,

	tglx
Borislav Petkov Oct. 10, 2018, 9:19 a.m. UTC | #5
On Wed, Oct 10, 2018 at 11:14:26AM +0200, Thomas Gleixner wrote:
> Yes, it's different, but if the SRAT information is available early, then
> the command line parameter can go away because then the required
> information for Masa's problem is available as well.

Exactly. And I'd prefer we delayed the command line parameter until we
figure out we really need it and not expose it to upstream and then
remove it shortly after.

So I'd suggest we move Masa's patches to a separate branch and not send
it up this round.
Baoquan He Oct. 10, 2018, 9:21 a.m. UTC | #6
On 10/10/18 at 05:12pm, Chao Fan wrote:
> On Wed, Oct 10, 2018 at 05:06:20PM +0800, Baoquan He wrote:
> >Hi Boris,
> >
> >On 10/10/18 at 10:59am, Borislav Petkov wrote:
> >> ... and we just picked up
> >> 
> >> https://lkml.kernel.org/r/20181001140843.26137-1-msys.mizuma@gmail.com
> >> 
> >> and without having looked at the rest of your stuff, if people accept
> >> your solution, we don't need the silly parameter anymore, right?
> >> 
> >> Which means, we should not rush the whole thing yet until the whole
> >> KASLR vs movable memory gets solved properly.
> >
> >Masa's patches solves the problem in memory region KASLR which later hot
> >added memory may be big than the default padding 10 TB.
> >
> >Chao's patches is trying to fix a conflict between 'movable_node' and
> >kernel text KASLR. If 'movable_node' specified, we rely on SRAT to get
> >which memory region is movable or immovable, and movable region can be
> >hot removed. But if kernel is randomized into movable memory, it can't
> >be hot removed any more, this is a regression after KASLR introduced.
> >So this is a different issue than Masa's.
> 
> Yes, they are two issues.
> But if we can get more memory information by the function in
> the new file acpi.c, semms it's helfpul to Masa's issue.

Hmm, reading SRAT three times during x86 kernel boot? Maybe we try this
after the function has run a time and proved very stable?

Thanks
Baoquan
Baoquan He Oct. 10, 2018, 9:30 a.m. UTC | #7
On 10/10/18 at 11:19am, Borislav Petkov wrote:
> On Wed, Oct 10, 2018 at 11:14:26AM +0200, Thomas Gleixner wrote:
> > Yes, it's different, but if the SRAT information is available early, then
> > the command line parameter can go away because then the required
> > information for Masa's problem is available as well.
> 
> Exactly. And I'd prefer we delayed the command line parameter until we
> figure out we really need it and not expose it to upstream and then
> remove it shortly after.
> 
> So I'd suggest we move Masa's patches to a separate branch and not send
> it up this round.

Yes, sounds more reasonable if we can reuse functions in Chao's patch 1/3
to solve the padding issue.

Thanks
Baoquan
Borislav Petkov Oct. 10, 2018, 5:16 p.m. UTC | #8
On Wed, Oct 10, 2018 at 04:41:16PM +0800, Chao Fan wrote:
> In the earliest time, I tried to dig ACPI tabls to solve this problem.
> But I didn't splite the code in 'compressed/' and ACPI code, so the patch
> is hard to follow so refused by community.
> Somebody suggest to add a kernel parameter to specify the
> immovable memory so that limit kaslr in these regions. Then I make
> a new patchset. After several versions, Ingo gave a suggestion:
> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1634024.html
> Follow Ingo's suggestion, imitate the ACPI code to parse the acpi
> tables, so that the kaslr can get necessary memory information in
> ACPI tables.
> I think ACPI code is an independent part, so copy the codes
> and functions to 'compressed/' directory, so that kaslr won't
> influence the initialization of ACPI.

You say "copy". I'm still about to look at the code but can those
functions be carved out in a separate compilation unit which ACPI *and*
KASLR can both link with so that there's no duplication?
Masayoshi Mizuma Oct. 10, 2018, 7:44 p.m. UTC | #9
On Wed, Oct 10, 2018 at 05:30:57PM +0800, Baoquan He wrote:
> On 10/10/18 at 11:19am, Borislav Petkov wrote:
> > On Wed, Oct 10, 2018 at 11:14:26AM +0200, Thomas Gleixner wrote:
> > > Yes, it's different, but if the SRAT information is available early, then
> > > the command line parameter can go away because then the required
> > > information for Masa's problem is available as well.
> > 
> > Exactly. And I'd prefer we delayed the command line parameter until we
> > figure out we really need it and not expose it to upstream and then
> > remove it shortly after.
> > 
> > So I'd suggest we move Masa's patches to a separate branch and not send
> > it up this round.
> 
> Yes, sounds more reasonable if we can reuse functions in Chao's patch 1/3
> to solve the padding issue.

Thanks for your comments! Yes, immovable_mem[num_immovable_mem] in Chao's
patch may be useful for calculating the padding size. If so, we don't
need the new kernel parameter. It's nice!

Do you happen to have ideas how we access immovable_mem[num_immovable_mem]
from arch/x86/mm/kaslr.c ? It is located to arch/x86/boot/compressed/*, so
I suppose it is not easy to access it... 
I would appreciate if you could give some advice.

Thanks!
Masa
Baoquan He Oct. 11, 2018, 12:29 a.m. UTC | #10
On 10/10/18 at 03:44pm, Masayoshi Mizuma wrote:
> On Wed, Oct 10, 2018 at 05:30:57PM +0800, Baoquan He wrote:
> > On 10/10/18 at 11:19am, Borislav Petkov wrote:
> > > On Wed, Oct 10, 2018 at 11:14:26AM +0200, Thomas Gleixner wrote:
> > > > Yes, it's different, but if the SRAT information is available early, then
> > > > the command line parameter can go away because then the required
> > > > information for Masa's problem is available as well.
> > > 
> > > Exactly. And I'd prefer we delayed the command line parameter until we
> > > figure out we really need it and not expose it to upstream and then
> > > remove it shortly after.
> > > 
> > > So I'd suggest we move Masa's patches to a separate branch and not send
> > > it up this round.
> > 
> > Yes, sounds more reasonable if we can reuse functions in Chao's patch 1/3
> > to solve the padding issue.
> 
> Thanks for your comments! Yes, immovable_mem[num_immovable_mem] in Chao's
> patch may be useful for calculating the padding size. If so, we don't
> need the new kernel parameter. It's nice!
> 
> Do you happen to have ideas how we access immovable_mem[num_immovable_mem]
> from arch/x86/mm/kaslr.c ? It is located to arch/x86/boot/compressed/*, so
> I suppose it is not easy to access it... 
> I would appreciate if you could give some advice.

Hmm, they are living in different life cycle and space. So we can only
reuse the code from Chao's patch, but not the variable. Means we need
go through the SRAT accessing again in arch/x86/mm/kaslr.c and fill
immovable_mem[num_immovable_mem] for mm/kaslr.c usage, if we decide to
do like that.

Thanks
Baoquan
Chao Fan Oct. 11, 2018, 1:30 a.m. UTC | #11
On Wed, Oct 10, 2018 at 07:16:16PM +0200, Borislav Petkov wrote:
>On Wed, Oct 10, 2018 at 04:41:16PM +0800, Chao Fan wrote:
>> In the earliest time, I tried to dig ACPI tabls to solve this problem.
>> But I didn't splite the code in 'compressed/' and ACPI code, so the patch
>> is hard to follow so refused by community.
>> Somebody suggest to add a kernel parameter to specify the
>> immovable memory so that limit kaslr in these regions. Then I make
>> a new patchset. After several versions, Ingo gave a suggestion:
>> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1634024.html
>> Follow Ingo's suggestion, imitate the ACPI code to parse the acpi
>> tables, so that the kaslr can get necessary memory information in
>> ACPI tables.
>> I think ACPI code is an independent part, so copy the codes
>> and functions to 'compressed/' directory, so that kaslr won't
>> influence the initialization of ACPI.
>
>You say "copy". I'm still about to look at the code but can those
>functions be carved out in a separate compilation unit which ACPI *and*
>KASLR can both link with so that there's no duplication?

Sorry for my poor English, I used 'copy' but they are not same.
Maybe 'imitate' is better.
Just like I said in my log, The ACPI part need to handle the map
between physical address and virtual address.
But in KASLR part, I remove these operations.
So my code is simplified version.

Thanks,
Chao Fan

>
>-- 
>Regards/Gruss,
>    Boris.
>
>Good mailing practices for 400: avoid top-posting and trim the reply.
>
>
Chao Fan Oct. 11, 2018, 5:51 a.m. UTC | #12
On Thu, Oct 11, 2018 at 08:29:55AM +0800, Baoquan He wrote:
>On 10/10/18 at 03:44pm, Masayoshi Mizuma wrote:
>> On Wed, Oct 10, 2018 at 05:30:57PM +0800, Baoquan He wrote:
>> > On 10/10/18 at 11:19am, Borislav Petkov wrote:
>> > > On Wed, Oct 10, 2018 at 11:14:26AM +0200, Thomas Gleixner wrote:
>> > > > Yes, it's different, but if the SRAT information is available early, then
>> > > > the command line parameter can go away because then the required
>> > > > information for Masa's problem is available as well.
>> > > 
>> > > Exactly. And I'd prefer we delayed the command line parameter until we
>> > > figure out we really need it and not expose it to upstream and then
>> > > remove it shortly after.
>> > > 
>> > > So I'd suggest we move Masa's patches to a separate branch and not send
>> > > it up this round.
>> > 
>> > Yes, sounds more reasonable if we can reuse functions in Chao's patch 1/3
>> > to solve the padding issue.
>> 
>> Thanks for your comments! Yes, immovable_mem[num_immovable_mem] in Chao's
>> patch may be useful for calculating the padding size. If so, we don't
>> need the new kernel parameter. It's nice!
>> 
>> Do you happen to have ideas how we access immovable_mem[num_immovable_mem]
>> from arch/x86/mm/kaslr.c ? It is located to arch/x86/boot/compressed/*, so
>> I suppose it is not easy to access it... 
>> I would appreciate if you could give some advice.
>
>Hmm, they are living in different life cycle and space. So we can only
>reuse the code from Chao's patch, but not the variable. Means we need
>go through the SRAT accessing again in arch/x86/mm/kaslr.c and fill
>immovable_mem[num_immovable_mem] for mm/kaslr.c usage, if we decide to
>do like that.

Reading three times is redundant, but reading two times is needed.
Becasue the ACPI code run very stable, but we need more information
before that.
As for Masa's issue, I am wondering whether we can tranfer the
information or only the address of SRAT table from compressed period
to the period after start_kernel().

Thanks,
Chao Fan

>
>Thanks
>Baoquan
>
>
Masayoshi Mizuma Oct. 13, 2018, 8:19 p.m. UTC | #13
Baoquan and Chao, thank you for your comments!

Boris, the patches has been merged to linux-next.
Should I create a new patch for linux-next?
I'm trying to make the padding size set automatically.

Thanks,
Masa

On Thu, Oct 11, 2018 at 01:51:54PM +0800, Chao Fan wrote:
> On Thu, Oct 11, 2018 at 08:29:55AM +0800, Baoquan He wrote:
> >On 10/10/18 at 03:44pm, Masayoshi Mizuma wrote:
> >> On Wed, Oct 10, 2018 at 05:30:57PM +0800, Baoquan He wrote:
> >> > On 10/10/18 at 11:19am, Borislav Petkov wrote:
> >> > > On Wed, Oct 10, 2018 at 11:14:26AM +0200, Thomas Gleixner wrote:
> >> > > > Yes, it's different, but if the SRAT information is available early, then
> >> > > > the command line parameter can go away because then the required
> >> > > > information for Masa's problem is available as well.
> >> > > 
> >> > > Exactly. And I'd prefer we delayed the command line parameter until we
> >> > > figure out we really need it and not expose it to upstream and then
> >> > > remove it shortly after.
> >> > > 
> >> > > So I'd suggest we move Masa's patches to a separate branch and not send
> >> > > it up this round.
> >> > 
> >> > Yes, sounds more reasonable if we can reuse functions in Chao's patch 1/3
> >> > to solve the padding issue.
> >> 
> >> Thanks for your comments! Yes, immovable_mem[num_immovable_mem] in Chao's
> >> patch may be useful for calculating the padding size. If so, we don't
> >> need the new kernel parameter. It's nice!
> >> 
> >> Do you happen to have ideas how we access immovable_mem[num_immovable_mem]
> >> from arch/x86/mm/kaslr.c ? It is located to arch/x86/boot/compressed/*, so
> >> I suppose it is not easy to access it... 
> >> I would appreciate if you could give some advice.
> >
> >Hmm, they are living in different life cycle and space. So we can only
> >reuse the code from Chao's patch, but not the variable. Means we need
> >go through the SRAT accessing again in arch/x86/mm/kaslr.c and fill
> >immovable_mem[num_immovable_mem] for mm/kaslr.c usage, if we decide to
> >do like that.
> 
> Reading three times is redundant, but reading two times is needed.
> Becasue the ACPI code run very stable, but we need more information
> before that.
> As for Masa's issue, I am wondering whether we can tranfer the
> information or only the address of SRAT table from compressed period
> to the period after start_kernel().
> 
> Thanks,
> Chao Fan
> 
> >
> >Thanks
> >Baoquan
> >
> >
> 
>
Borislav Petkov Oct. 13, 2018, 8:34 p.m. UTC | #14
On Sat, Oct 13, 2018 at 04:19:59PM -0400, Masayoshi Mizuma wrote:
> Baoquan and Chao, thank you for your comments!
> 
> Boris, the patches has been merged to linux-next.

First of all, please do not top-post.

Even if they're in linux-next, that doesn't mean they should go upstream
while there are still issues to be hammered out.

In this particular case, if Chao Fan's patchset shapes up in a decent
form to be upstreamed, we don't need the cmdline parameter anymore, do
we?

Because we'll have the SRAT parsing and proper KASLR limits ready and
working automatically, I'd say.
Masayoshi Mizuma Oct. 13, 2018, 9:45 p.m. UTC | #15
On Sat, Oct 13, 2018 at 10:34:29PM +0200, Borislav Petkov wrote:
> On Sat, Oct 13, 2018 at 04:19:59PM -0400, Masayoshi Mizuma wrote:
> > Baoquan and Chao, thank you for your comments!
> > 
> > Boris, the patches has been merged to linux-next.
> 
> First of all, please do not top-post.
> 
> Even if they're in linux-next, that doesn't mean they should go upstream
> while there are still issues to be hammered out.

Got it, thanks.

> 
> In this particular case, if Chao Fan's patchset shapes up in a decent
> form to be upstreamed, we don't need the cmdline parameter anymore, do
> we?
> 
> Because we'll have the SRAT parsing and proper KASLR limits ready and
> working automatically, I'd say.

Yes, but I think we need to arrange the Chao's SRAT parsing to be used
from kernel_randomize_memory() because Chao's approach needs the SRAT
parsing before extract kernel and the padding size calculation needs
the parsing in start_kernel(), so they are living in different life
cycle and space.

Thanks,
Masa
Borislav Petkov Oct. 13, 2018, 10:05 p.m. UTC | #16
On Sat, Oct 13, 2018 at 05:45:51PM -0400, Masayoshi Mizuma wrote:
> Yes, but I think we need to arrange the Chao's SRAT parsing to be used
> from kernel_randomize_memory() because Chao's approach needs the SRAT
> parsing before extract kernel and the padding size calculation needs
> the parsing in start_kernel(), so they are living in different life
> cycle and space.

Or you would like to pass some info from the compressed kernel to kernel
proper?

For example, something like the passing in add_e820ext() and the consumtion
in parse_setup_data():

                switch (data_type) {
                case SETUP_E820_EXT:
                        e820__memory_setup_extended(pa_data, data_len);

So info you need for your padding gets collected in
arch/x86/boot/compressed/acpitb.c and you consume it in
kernel_randomize_memory()?

Would that work?

:-)
Masayoshi Mizuma Oct. 15, 2018, 12:50 a.m. UTC | #17
On Sun, Oct 14, 2018 at 12:05:41AM +0200, Borislav Petkov wrote:
> On Sat, Oct 13, 2018 at 05:45:51PM -0400, Masayoshi Mizuma wrote:
> > Yes, but I think we need to arrange the Chao's SRAT parsing to be used
> > from kernel_randomize_memory() because Chao's approach needs the SRAT
> > parsing before extract kernel and the padding size calculation needs
> > the parsing in start_kernel(), so they are living in different life
> > cycle and space.
> 
> Or you would like to pass some info from the compressed kernel to kernel
> proper?
> 
> For example, something like the passing in add_e820ext() and the consumtion
> in parse_setup_data():
> 
>                 switch (data_type) {
>                 case SETUP_E820_EXT:
>                         e820__memory_setup_extended(pa_data, data_len);
> 
> So info you need for your padding gets collected in
> arch/x86/boot/compressed/acpitb.c and you consume it in
> kernel_randomize_memory()?
> 
> Would that work?

It's nice idea! Thank you so much!

- Masa
Masayoshi Mizuma Oct. 16, 2018, 3:13 p.m. UTC | #18
Hi Boris, Baoquan and all,

I have created a RFC patch for adjust KASLR padding size.
This patch is based on Can's v8 patch [1/3], and the Can's patch
will be changed in the future, so this patch is just RFC.

Welcome to any comments and suggestions. Thanks!

---
 arch/x86/boot/compressed/acpitb.c     |  8 ++++++-
 arch/x86/include/uapi/asm/bootparam.h |  3 ++-
 arch/x86/mm/kaslr.c                   | 31 ++++++++++++++++++++++++++-
 3 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/acpitb.c b/arch/x86/boot/compressed/acpitb.c
index 6b869e3f9780..1504b46f2a04 100644
--- a/arch/x86/boot/compressed/acpitb.c
+++ b/arch/x86/boot/compressed/acpitb.c
@@ -367,6 +367,7 @@ void get_immovable_mem(void)
 	struct acpi_subtable_header *table;
 	struct acpi_srat_mem_affinity *ma;
 	unsigned long table_end;
+	unsigned long long possible_addr, max_possible_addr = 0;
 	int i = 0;
 
 	if (!strstr(args, "movable_node") || strstr(args, "acpi=off"))
@@ -384,7 +385,11 @@ void get_immovable_mem(void)
 	while (((unsigned long)table) + table->length < table_end) {
 		if (table->type == 1) {
 			ma = (struct acpi_srat_mem_affinity *)table;
-			if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) {
+			if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
+				possible_addr = ma->base_address + ma->length;
+				if (possible_addr > max_possible_addr)
+					max_possible_addr = possible_addr;
+			} else {
 				immovable_mem[i].start = ma->base_address;
 				immovable_mem[i].size = ma->length;
 				i++;
@@ -397,6 +402,7 @@ void get_immovable_mem(void)
 			((unsigned long)table + table->length);
 	}
 	num_immovable_mem = i;
+	boot_params->possible_mem_addr = max_possible_addr;
 }
 #else
 void get_immovable_mem(void)
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index a06cbf019744..c987c725e93a 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -191,7 +191,8 @@ struct boot_params {
 	__u8  _pad7[0x290-0x1f1-sizeof(struct setup_header)];
 	__u32 edd_mbr_sig_buffer[EDD_MBR_SIG_MAX];	/* 0x290 */
 	struct boot_e820_entry e820_table[E820_MAX_ENTRIES_ZEROPAGE]; /* 0x2d0 */
-	__u8  _pad8[48];				/* 0xcd0 */
+	__u8  _pad8[40];				/* 0xcd0 */
+	__u64 possible_mem_addr;			/* 0xcf8 */
 	struct edd_info eddbuf[EDDMAXNR];		/* 0xd00 */
 	__u8  _pad9[276];				/* 0xeec */
 } __attribute__((packed));
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 61db77b0eda9..8c5aca792b51 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -69,6 +69,35 @@ static inline bool kaslr_memory_enabled(void)
 	return kaslr_enabled() && !IS_ENABLED(CONFIG_KASAN);
 }
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+static unsigned int __init kaslr_padding(void)
+{
+	unsigned long long max_possible_phys, max_actual_phys, threshold;
+	unsigned int rand_mem_physical_padding =
+				CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
+
+	if (!boot_params.possible_mem_addr)
+		goto out;
+
+	max_actual_phys = roundup(PFN_PHYS(max_pfn), 1ULL << TB_SHIFT);
+	max_possible_phys = roundup(boot_params.possible_mem_addr,
+					1ULL << TB_SHIFT);
+	threshold = max_actual_phys +
+		((unsigned long long)rand_mem_physical_padding << TB_SHIFT);
+
+	if (max_possible_phys > threshold)
+		rand_mem_physical_padding =
+			(max_possible_phys - max_actual_phys) >> TB_SHIFT;
+out:
+	return rand_mem_physical_padding;
+}
+#else
+static unsigned int __init kaslr_padding(void)
+{
+	return CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
+}
+#endif
+
 /* Initialize base and padding for each memory region randomized with KASLR */
 void __init kernel_randomize_memory(void)
 {
@@ -102,7 +131,7 @@ void __init kernel_randomize_memory(void)
 	 */
 	BUG_ON(kaslr_regions[0].base != &page_offset_base);
 	memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
-		CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
+		kaslr_padding();
 
 	/* Adapt phyiscal memory region size based on available memory */
 	if (memory_tb < kaslr_regions[0].size_tb)
Borislav Petkov Oct. 16, 2018, 7:11 p.m. UTC | #19
On Tue, Oct 16, 2018 at 11:13:54AM -0400, Masayoshi Mizuma wrote:
> Hi Boris, Baoquan and all,
> 
> I have created a RFC patch for adjust KASLR padding size.
> This patch is based on Can's v8 patch [1/3], and the Can's patch
> will be changed in the future, so this patch is just RFC.
> 
> Welcome to any comments and suggestions. Thanks!

...

> diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
> index a06cbf019744..c987c725e93a 100644
> --- a/arch/x86/include/uapi/asm/bootparam.h
> +++ b/arch/x86/include/uapi/asm/bootparam.h
> @@ -191,7 +191,8 @@ struct boot_params {
>  	__u8  _pad7[0x290-0x1f1-sizeof(struct setup_header)];
>  	__u32 edd_mbr_sig_buffer[EDD_MBR_SIG_MAX];	/* 0x290 */
>  	struct boot_e820_entry e820_table[E820_MAX_ENTRIES_ZEROPAGE]; /* 0x2d0 */
> -	__u8  _pad8[48];				/* 0xcd0 */
> +	__u8  _pad8[40];				/* 0xcd0 */
> +	__u64 possible_mem_addr;			/* 0xcf8 */

Where in the example I gave you with add_e820ext() do you see members
being added to struct boot_params?

Take a look at it again pls.
Masayoshi Mizuma Oct. 16, 2018, 7:54 p.m. UTC | #20
On Tue, Oct 16, 2018 at 09:11:14PM +0200, Borislav Petkov wrote:
> On Tue, Oct 16, 2018 at 11:13:54AM -0400, Masayoshi Mizuma wrote:
> > Hi Boris, Baoquan and all,
> > 
> > I have created a RFC patch for adjust KASLR padding size.
> > This patch is based on Can's v8 patch [1/3], and the Can's patch
> > will be changed in the future, so this patch is just RFC.
> > 
> > Welcome to any comments and suggestions. Thanks!
> 
> ...
> 
> > diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
> > index a06cbf019744..c987c725e93a 100644
> > --- a/arch/x86/include/uapi/asm/bootparam.h
> > +++ b/arch/x86/include/uapi/asm/bootparam.h
> > @@ -191,7 +191,8 @@ struct boot_params {
> >  	__u8  _pad7[0x290-0x1f1-sizeof(struct setup_header)];
> >  	__u32 edd_mbr_sig_buffer[EDD_MBR_SIG_MAX];	/* 0x290 */
> >  	struct boot_e820_entry e820_table[E820_MAX_ENTRIES_ZEROPAGE]; /* 0x2d0 */
> > -	__u8  _pad8[48];				/* 0xcd0 */
> > +	__u8  _pad8[40];				/* 0xcd0 */
> > +	__u64 possible_mem_addr;			/* 0xcf8 */
> 
> Where in the example I gave you with add_e820ext() do you see members
> being added to struct boot_params?
> 
> Take a look at it again pls.

Ah, sorry, I misunderstood your suggetion...
In parse_setup_data(), the data is picked up from boot_params,
so I thought it is also useful for this sitiation.
I'll try again, thanks!

- Masa
Borislav Petkov Oct. 16, 2018, 7:59 p.m. UTC | #21
On Tue, Oct 16, 2018 at 03:54:30PM -0400, Masayoshi Mizuma wrote:
> Ah, sorry, I misunderstood your suggetion...
> In parse_setup_data(), the data is picked up from boot_params,

Yes, this is the pointer to the setup_data linked list head. See

Documentation/ABI/testing/sysfs-kernel-boot_params
Documentation/x86/boot.txt

for more information and basically grep the tree for examples.
Masayoshi Mizuma Oct. 22, 2018, 3:42 p.m. UTC | #22
Hi Boris,

On Tue, Oct 16, 2018 at 09:59:02PM +0200, Borislav Petkov wrote:
> On Tue, Oct 16, 2018 at 03:54:30PM -0400, Masayoshi Mizuma wrote:
> > Ah, sorry, I misunderstood your suggetion...
> > In parse_setup_data(), the data is picked up from boot_params,
> 
> Yes, this is the pointer to the setup_data linked list head. See
> 
> Documentation/ABI/testing/sysfs-kernel-boot_params
> Documentation/x86/boot.txt
> 
> for more information and basically grep the tree for examples.

I'm trying to store the SRAT info and pass it to kernel_randomize_memory(),
looks like add_e820ext()/parse_setup_data().

Is the approach useful only EFI environment? I'm not sure how I
allocate memory to store the data in legacy bios environment...
On EFI, I can use efi_call_early(allocate_pool, EFI_LOADER_DATA, ...).

Am I missing something? I would appreciate if you could help my
understanding.

Following patch is a prototype for EFI enviromnent.
---
 arch/x86/boot/compressed/acpitb.c     | 23 ++++++++++-
 arch/x86/boot/compressed/eboot.c      | 36 +++++++++++++++++
 arch/x86/include/uapi/asm/bootparam.h |  1 +
 arch/x86/mm/kaslr.c                   | 58 ++++++++++++++++++++++++++-
 4 files changed, 116 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/compressed/acpitb.c b/arch/x86/boot/compressed/acpitb.c
index d119663..b79560c 100644
--- a/arch/x86/boot/compressed/acpitb.c
+++ b/arch/x86/boot/compressed/acpitb.c
@@ -309,6 +309,20 @@ static struct acpi_table_header *get_acpi_srat_table(void)
 	return NULL;
 }

+static void store_possible_addr(unsigned long long possible)
+{
+	struct setup_data *data;
+
+	data = (struct setup_data *)(unsigned long)boot_params->hdr.setup_data;
+	while (data) {
+		if (data->type == SETUP_KASLR) {
+			*(unsigned long long *)data->data = possible;
+			return;
+		}
+		data = (struct setup_data *)(unsigned long)data->next;
+	}
+}
+
 /*
  * According to ACPI table, filter the immvoable memory regions
  * and store them in immovable_mem[].
@@ -319,6 +333,7 @@ void get_immovable_mem(void)
 	struct acpi_subtable_header *table;
 	struct acpi_srat_mem_affinity *ma;
 	unsigned long table_end;
+	unsigned long long possible_addr, max_possible_addr = 0;
 	int i = 0;

 	if (!cmdline_find_option_bool("movable_node") ||
@@ -338,7 +353,12 @@ void get_immovable_mem(void)
 		       sizeof(struct acpi_subtable_header) < table_end) {
 		if (table->type == ACPI_SRAT_TYPE_MEMORY_AFFINITY) {
 			ma = (struct acpi_srat_mem_affinity *)table;
-			if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) {
+
+			if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
+				possible_addr = ma->base_address + ma->length;
+				if (possible_addr > max_possible_addr)
+					max_possible_addr = possible_addr;
+			} else {
 				immovable_mem[i].start = ma->base_address;
 				immovable_mem[i].size = ma->length;
 				i++;
@@ -351,4 +371,5 @@ void get_immovable_mem(void)
 			((unsigned long)table + table->length);
 	}
 	num_immovable_mem = i;
+	store_possible_addr(max_possible_addr);
 }
diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index 1458b17..9b95fba 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -192,6 +192,40 @@ static void setup_efi_pci(struct boot_params *params)
 	efi_call_early(free_pool, pci_handle);
 }

+#ifdef CONFIG_RANDOMIZE_MEMORY
+static void setup_kaslr(struct boot_params *params)
+{
+	struct setup_data *kaslr_data = NULL;
+	struct setup_data *data;
+	unsigned long size;
+	efi_status_t status;
+
+	size = sizeof(struct setup_data) + sizeof(unsigned long long);
+
+	status = efi_call_early(allocate_pool, EFI_LOADER_DATA,
+			size, (void **)&kaslr_data);
+	if (status != EFI_SUCCESS) {
+		efi_printk(sys_table, "Failed to allocate memory for 'kaslr_data'\n");
+		return;
+	}
+
+	kaslr_data->type = SETUP_KASLR;
+	kaslr_data->next = 0;
+	kaslr_data->len = size;
+
+	data = (struct setup_data *)(unsigned long)params->hdr.setup_data;
+	if (data)
+		data->next = (unsigned long)kaslr_data;
+	else {
+		while (data->next)
+			data = (struct setup_data *)(unsigned long)data->next;
+		data->next = (unsigned long)kaslr_data;
+	}
+}
+#else
+static void setup_kaslr(struct boot_params *params) {}
+#endif
+
 static void retrieve_apple_device_properties(struct boot_params *boot_params)
 {
 	efi_guid_t guid = APPLE_PROPERTIES_PROTOCOL_GUID;
@@ -770,6 +804,8 @@ efi_main(struct efi_config *c, struct boot_params *boot_params)

 	setup_efi_pci(boot_params);

+	setup_kaslr(boot_params);
+
 	setup_quirks(boot_params);

 	status = efi_call_early(allocate_pool, EFI_LOADER_DATA,
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index a06cbf0..0a44d83 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -10,6 +10,7 @@
 #define SETUP_EFI			4
 #define SETUP_APPLE_PROPERTIES		5
 #define SETUP_JAILHOUSE			6
+#define SETUP_KASLR			7

 /* ram_size flags */
 #define RAMDISK_IMAGE_START_MASK	0x07FF
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 61db77b..6f91cf4 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -28,6 +28,7 @@
 #include <asm/pgtable.h>
 #include <asm/setup.h>
 #include <asm/kaslr.h>
+#include <linux/io.h>

 #include "mm_internal.h"

@@ -69,6 +70,61 @@ static inline bool kaslr_memory_enabled(void)
 	return kaslr_enabled() && !IS_ENABLED(CONFIG_KASAN);
 }

+#ifdef CONFIG_MEMORY_HOTPLUG
+static unsigned long long __init get_max_possible_addr(void)
+{
+	struct setup_data *data;
+	u64 pa_data;
+	unsigned long long max = 0;
+
+	pa_data = boot_params.hdr.setup_data;
+	while (pa_data) {
+		data = early_memremap(pa_data, sizeof(*data));
+		if (!data)
+			return 0;
+
+		if (data->type == SETUP_KASLR) {
+			max = *(unsigned long long *)data->data;
+			early_memunmap(data, sizeof(*data));
+			return max;
+		}
+		pa_data = data->next;
+		early_memunmap(data, sizeof(*data));
+	}
+
+	return max;
+}
+
+static unsigned int __init kaslr_padding(void)
+{
+	unsigned long long max_possible_addr;
+	unsigned long long max_possible_phys, max_actual_phys, threshold;
+	unsigned int rand_mem_physical_padding =
+				CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
+
+	max_possible_addr = get_max_possible_addr();
+	if (!max_possible_addr)
+		goto out;
+
+	max_actual_phys = roundup(PFN_PHYS(max_pfn), 1ULL << TB_SHIFT);
+	max_possible_phys = roundup(max_possible_addr,
+					1ULL << TB_SHIFT);
+	threshold = max_actual_phys +
+		((unsigned long long)rand_mem_physical_padding << TB_SHIFT);
+
+	if (max_possible_phys > threshold)
+		rand_mem_physical_padding =
+			(max_possible_phys - max_actual_phys) >> TB_SHIFT;
+out:
+	return rand_mem_physical_padding;
+}
+#else
+static unsigned int __init kaslr_padding(void)
+{
+	return CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
+}
+#endif
+
 /* Initialize base and padding for each memory region randomized with KASLR */
 void __init kernel_randomize_memory(void)
 {
@@ -102,7 +158,7 @@ void __init kernel_randomize_memory(void)
 	 */
 	BUG_ON(kaslr_regions[0].base != &page_offset_base);
 	memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
-		CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
+		kaslr_padding();

 	/* Adapt phyiscal memory region size based on available memory */
 	if (memory_tb < kaslr_regions[0].size_tb)
--

Thanks!
Masa
Chao Fan Oct. 23, 2018, 2:48 a.m. UTC | #23
On Mon, Oct 22, 2018 at 11:42:05AM -0400, Masayoshi Mizuma wrote:
>Hi Boris,

Hi Mizuma-san,

I have several questions:

>+static void store_possible_addr(unsigned long long possible)
>+{
>+	struct setup_data *data;
>+
>+	data = (struct setup_data *)(unsigned long)boot_params->hdr.setup_data;
I suggest you add check:

	if (!data) {
		debug_putstr("No setup_data found.\n");
		return;
	}

>+	while (data) {
>+		if (data->type == SETUP_KASLR) {
>+			*(unsigned long long *)data->data = possible;
>+			return;
>+		}
>+		data = (struct setup_data *)(unsigned long)data->next;
>+	}
>+}
>+
> /*
>  * According to ACPI table, filter the immvoable memory regions
>  * and store them in immovable_mem[].
>@@ -319,6 +333,7 @@ void get_immovable_mem(void)
> 	struct acpi_subtable_header *table;
> 	struct acpi_srat_mem_affinity *ma;
> 	unsigned long table_end;
>+	unsigned long long possible_addr, max_possible_addr = 0;
> 	int i = 0;
>
> 	if (!cmdline_find_option_bool("movable_node") ||
>@@ -338,7 +353,12 @@ void get_immovable_mem(void)
> 		       sizeof(struct acpi_subtable_header) < table_end) {
> 		if (table->type == ACPI_SRAT_TYPE_MEMORY_AFFINITY) {
> 			ma = (struct acpi_srat_mem_affinity *)table;
>-			if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) {
>+
>+			if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
>+				possible_addr = ma->base_address + ma->length;
>+				if (possible_addr > max_possible_addr)
>+					max_possible_addr = possible_addr;
>+			} else {
> 				immovable_mem[i].start = ma->base_address;
> 				immovable_mem[i].size = ma->length;
> 				i++;
>@@ -351,4 +371,5 @@ void get_immovable_mem(void)
> 			((unsigned long)table + table->length);
> 	}
> 	num_immovable_mem = i;
>+	store_possible_addr(max_possible_addr);
> }
>diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
>index 1458b17..9b95fba 100644
>--- a/arch/x86/boot/compressed/eboot.c
>+++ b/arch/x86/boot/compressed/eboot.c
>@@ -192,6 +192,40 @@ static void setup_efi_pci(struct boot_params *params)
> 	efi_call_early(free_pool, pci_handle);
> }
>
>+#ifdef CONFIG_RANDOMIZE_MEMORY
>+static void setup_kaslr(struct boot_params *params)
>+{
>+	struct setup_data *kaslr_data = NULL;
>+	struct setup_data *data;
>+	unsigned long size;
>+	efi_status_t status;
>+
>+	size = sizeof(struct setup_data) + sizeof(unsigned long long);
>+
>+	status = efi_call_early(allocate_pool, EFI_LOADER_DATA,
>+			size, (void **)&kaslr_data);
>+	if (status != EFI_SUCCESS) {
>+		efi_printk(sys_table, "Failed to allocate memory for 'kaslr_data'\n");
>+		return;
>+	}
>+
>+	kaslr_data->type = SETUP_KASLR;
>+	kaslr_data->next = 0;
>+	kaslr_data->len = size;
>+
>+	data = (struct setup_data *)(unsigned long)params->hdr.setup_data;
>+	if (data)
>+		data->next = (unsigned long)kaslr_data;
Why just put the kaslr_data in data->next. You can't make sure
data->next was NULL.
>+	else {
If data is NULL, go to this else{}, so these two lines below work?
>+		while (data->next)
>+			data = (struct setup_data *)(unsigned long)data->next;
>+		data->next = (unsigned long)kaslr_data;
>+	}
If my understanding is not wrong, it should be:

	data = (struct setup_data *)(unsigned long)params->hdr.setup_data;
	if (!data)
		params->hdr.setup_data = (unsigned long)kaslr_data;
	else {
		while (data->next)
			data = (struct setup_data *)(unsigned long)data->next;
		data->next = (unsigned long)kaslr_data;
	}

If I misunderstand something, please tell me.

Thanks,
Chao Fan
Masayoshi Mizuma Oct. 24, 2018, 7:21 p.m. UTC | #24
On Tue, Oct 23, 2018 at 10:48:02AM +0800, Chao Fan wrote:
> On Mon, Oct 22, 2018 at 11:42:05AM -0400, Masayoshi Mizuma wrote:
> >Hi Boris,
> 
> Hi Mizuma-san,
> 
> I have several questions:

Thank you for your comments! I think your suggestions are
right.
However, the prototype patch works EFI environment only.
The memory hot-plug affinity in SRAT and KASLR are also available
on legacy BIOS environment, so I need to get the patch useful
for legacy BIOS as well, but I have no idea to add such things...
If you have ideas, could you let me know?

Probably I should have another idea, for example,
add the SRAT parsing code, looks like you are adding to
arch/x86/boot/compressed/acpitb.c, to arch/x86/mm/kaslr.c.

Thanks,
Masa

> 
> >+static void store_possible_addr(unsigned long long possible)
> >+{
> >+	struct setup_data *data;
> >+
> >+	data = (struct setup_data *)(unsigned long)boot_params->hdr.setup_data;
> I suggest you add check:
> 
> 	if (!data) {
> 		debug_putstr("No setup_data found.\n");
> 		return;
> 	}
> 
> >+	while (data) {
> >+		if (data->type == SETUP_KASLR) {
> >+			*(unsigned long long *)data->data = possible;
> >+			return;
> >+		}
> >+		data = (struct setup_data *)(unsigned long)data->next;
> >+	}
> >+}
> >+
> > /*
> >  * According to ACPI table, filter the immvoable memory regions
> >  * and store them in immovable_mem[].
> >@@ -319,6 +333,7 @@ void get_immovable_mem(void)
> > 	struct acpi_subtable_header *table;
> > 	struct acpi_srat_mem_affinity *ma;
> > 	unsigned long table_end;
> >+	unsigned long long possible_addr, max_possible_addr = 0;
> > 	int i = 0;
> >
> > 	if (!cmdline_find_option_bool("movable_node") ||
> >@@ -338,7 +353,12 @@ void get_immovable_mem(void)
> > 		       sizeof(struct acpi_subtable_header) < table_end) {
> > 		if (table->type == ACPI_SRAT_TYPE_MEMORY_AFFINITY) {
> > 			ma = (struct acpi_srat_mem_affinity *)table;
> >-			if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) {
> >+
> >+			if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
> >+				possible_addr = ma->base_address + ma->length;
> >+				if (possible_addr > max_possible_addr)
> >+					max_possible_addr = possible_addr;
> >+			} else {
> > 				immovable_mem[i].start = ma->base_address;
> > 				immovable_mem[i].size = ma->length;
> > 				i++;
> >@@ -351,4 +371,5 @@ void get_immovable_mem(void)
> > 			((unsigned long)table + table->length);
> > 	}
> > 	num_immovable_mem = i;
> >+	store_possible_addr(max_possible_addr);
> > }
> >diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
> >index 1458b17..9b95fba 100644
> >--- a/arch/x86/boot/compressed/eboot.c
> >+++ b/arch/x86/boot/compressed/eboot.c
> >@@ -192,6 +192,40 @@ static void setup_efi_pci(struct boot_params *params)
> > 	efi_call_early(free_pool, pci_handle);
> > }
> >
> >+#ifdef CONFIG_RANDOMIZE_MEMORY
> >+static void setup_kaslr(struct boot_params *params)
> >+{
> >+	struct setup_data *kaslr_data = NULL;
> >+	struct setup_data *data;
> >+	unsigned long size;
> >+	efi_status_t status;
> >+
> >+	size = sizeof(struct setup_data) + sizeof(unsigned long long);
> >+
> >+	status = efi_call_early(allocate_pool, EFI_LOADER_DATA,
> >+			size, (void **)&kaslr_data);
> >+	if (status != EFI_SUCCESS) {
> >+		efi_printk(sys_table, "Failed to allocate memory for 'kaslr_data'\n");
> >+		return;
> >+	}
> >+
> >+	kaslr_data->type = SETUP_KASLR;
> >+	kaslr_data->next = 0;
> >+	kaslr_data->len = size;
> >+
> >+	data = (struct setup_data *)(unsigned long)params->hdr.setup_data;
> >+	if (data)
> >+		data->next = (unsigned long)kaslr_data;
> Why just put the kaslr_data in data->next. You can't make sure
> data->next was NULL.
> >+	else {
> If data is NULL, go to this else{}, so these two lines below work?
> >+		while (data->next)
> >+			data = (struct setup_data *)(unsigned long)data->next;
> >+		data->next = (unsigned long)kaslr_data;
> >+	}
> If my understanding is not wrong, it should be:
> 
> 	data = (struct setup_data *)(unsigned long)params->hdr.setup_data;
> 	if (!data)
> 		params->hdr.setup_data = (unsigned long)kaslr_data;
> 	else {
> 		while (data->next)
> 			data = (struct setup_data *)(unsigned long)data->next;
> 		data->next = (unsigned long)kaslr_data;
> 	}
> 
> If I misunderstand something, please tell me.
> 
> Thanks,
> Chao Fan
> 
>
Chao Fan Oct. 25, 2018, 1:22 a.m. UTC | #25
On Wed, Oct 24, 2018 at 03:21:36PM -0400, Masayoshi Mizuma wrote:
>On Tue, Oct 23, 2018 at 10:48:02AM +0800, Chao Fan wrote:
>> On Mon, Oct 22, 2018 at 11:42:05AM -0400, Masayoshi Mizuma wrote:
>> >Hi Boris,
>> 
>> Hi Mizuma-san,
>> 
>> I have several questions:
>
>Thank you for your comments! I think your suggestions are
>right.
>However, the prototype patch works EFI environment only.

Yes, I agree. But I think this method is much better than
adding code to arch/x86/mm/kaslr.c.

>The memory hot-plug affinity in SRAT and KASLR are also available
>on legacy BIOS environment, so I need to get the patch useful
>for legacy BIOS as well, but I have no idea to add such things...
>If you have ideas, could you let me know?

I have no idea. I will work on it, try to help figure out the BIOS issue.

Thanks,
Chao Fan

>
>Probably I should have another idea, for example,
>add the SRAT parsing code, looks like you are adding to
>arch/x86/boot/compressed/acpitb.c, to arch/x86/mm/kaslr.c.
>
>Thanks,
>Masa
>
>> 
>> >+static void store_possible_addr(unsigned long long possible)
>> >+{
>> >+	struct setup_data *data;
>> >+
>> >+	data = (struct setup_data *)(unsigned long)boot_params->hdr.setup_data;
>> I suggest you add check:
>> 
>> 	if (!data) {
>> 		debug_putstr("No setup_data found.\n");
>> 		return;
>> 	}
>> 
>> >+	while (data) {
>> >+		if (data->type == SETUP_KASLR) {
>> >+			*(unsigned long long *)data->data = possible;
>> >+			return;
>> >+		}
>> >+		data = (struct setup_data *)(unsigned long)data->next;
>> >+	}
>> >+}
>> >+
>> > /*
>> >  * According to ACPI table, filter the immvoable memory regions
>> >  * and store them in immovable_mem[].
>> >@@ -319,6 +333,7 @@ void get_immovable_mem(void)
>> > 	struct acpi_subtable_header *table;
>> > 	struct acpi_srat_mem_affinity *ma;
>> > 	unsigned long table_end;
>> >+	unsigned long long possible_addr, max_possible_addr = 0;
>> > 	int i = 0;
>> >
>> > 	if (!cmdline_find_option_bool("movable_node") ||
>> >@@ -338,7 +353,12 @@ void get_immovable_mem(void)
>> > 		       sizeof(struct acpi_subtable_header) < table_end) {
>> > 		if (table->type == ACPI_SRAT_TYPE_MEMORY_AFFINITY) {
>> > 			ma = (struct acpi_srat_mem_affinity *)table;
>> >-			if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) {
>> >+
>> >+			if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
>> >+				possible_addr = ma->base_address + ma->length;
>> >+				if (possible_addr > max_possible_addr)
>> >+					max_possible_addr = possible_addr;
>> >+			} else {
>> > 				immovable_mem[i].start = ma->base_address;
>> > 				immovable_mem[i].size = ma->length;
>> > 				i++;
>> >@@ -351,4 +371,5 @@ void get_immovable_mem(void)
>> > 			((unsigned long)table + table->length);
>> > 	}
>> > 	num_immovable_mem = i;
>> >+	store_possible_addr(max_possible_addr);
>> > }
>> >diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
>> >index 1458b17..9b95fba 100644
>> >--- a/arch/x86/boot/compressed/eboot.c
>> >+++ b/arch/x86/boot/compressed/eboot.c
>> >@@ -192,6 +192,40 @@ static void setup_efi_pci(struct boot_params *params)
>> > 	efi_call_early(free_pool, pci_handle);
>> > }
>> >
>> >+#ifdef CONFIG_RANDOMIZE_MEMORY
>> >+static void setup_kaslr(struct boot_params *params)
>> >+{
>> >+	struct setup_data *kaslr_data = NULL;
>> >+	struct setup_data *data;
>> >+	unsigned long size;
>> >+	efi_status_t status;
>> >+
>> >+	size = sizeof(struct setup_data) + sizeof(unsigned long long);
>> >+
>> >+	status = efi_call_early(allocate_pool, EFI_LOADER_DATA,
>> >+			size, (void **)&kaslr_data);
>> >+	if (status != EFI_SUCCESS) {
>> >+		efi_printk(sys_table, "Failed to allocate memory for 'kaslr_data'\n");
>> >+		return;
>> >+	}
>> >+
>> >+	kaslr_data->type = SETUP_KASLR;
>> >+	kaslr_data->next = 0;
>> >+	kaslr_data->len = size;
>> >+
>> >+	data = (struct setup_data *)(unsigned long)params->hdr.setup_data;
>> >+	if (data)
>> >+		data->next = (unsigned long)kaslr_data;
>> Why just put the kaslr_data in data->next. You can't make sure
>> data->next was NULL.
>> >+	else {
>> If data is NULL, go to this else{}, so these two lines below work?
>> >+		while (data->next)
>> >+			data = (struct setup_data *)(unsigned long)data->next;
>> >+		data->next = (unsigned long)kaslr_data;
>> >+	}
>> If my understanding is not wrong, it should be:
>> 
>> 	data = (struct setup_data *)(unsigned long)params->hdr.setup_data;
>> 	if (!data)
>> 		params->hdr.setup_data = (unsigned long)kaslr_data;
>> 	else {
>> 		while (data->next)
>> 			data = (struct setup_data *)(unsigned long)data->next;
>> 		data->next = (unsigned long)kaslr_data;
>> 	}
>> 
>> If I misunderstand something, please tell me.
>> 
>> Thanks,
>> Chao Fan
>> 
>> 
>
>
Borislav Petkov Oct. 25, 2018, 10:33 a.m. UTC | #26
On Mon, Oct 22, 2018 at 11:42:05AM -0400, Masayoshi Mizuma wrote:
> I'm trying to store the SRAT info and pass it to kernel_randomize_memory(),
> looks like add_e820ext()/parse_setup_data().
> 
> Is the approach useful only EFI environment? I'm not sure how I

Does it matter for non-EFI even?

I mean, you want this code only for your use case - when you have
movable memory and you're doing kexec, yes?

And those machines are all EFI boxes I'd assume...
Masayoshi Mizuma Oct. 25, 2018, 1:40 p.m. UTC | #27
On Thu, Oct 25, 2018 at 12:33:45PM +0200, Borislav Petkov wrote:
> On Mon, Oct 22, 2018 at 11:42:05AM -0400, Masayoshi Mizuma wrote:
> > I'm trying to store the SRAT info and pass it to kernel_randomize_memory(),
> > looks like add_e820ext()/parse_setup_data().
> > 
> > Is the approach useful only EFI environment? I'm not sure how I
> 
> Does it matter for non-EFI even?
> 
> I mean, you want this code only for your use case - when you have
> movable memory and you're doing kexec, yes?
> 
> And those machines are all EFI boxes I'd assume...

My actual use case is for EFI boxes, however, I think it's better to useful
for legacy BIOS as well because memory hot-plug affinity in SRAT and KASLR
are available on legacy BIOS.
Actually, we can create such environment in qemu.

I have another idea to solve this issue. Adding a SRAT parsing code
to arch/x86/mm/kaslr.c. It is useful for both EFI and BIOS and
also we don't need a new kernel parameter...
Dose the idea make sense?

Thanks,
Masa
Borislav Petkov Nov. 6, 2018, 12:10 p.m. UTC | #28
On Thu, Oct 25, 2018 at 09:40:51AM -0400, Masayoshi Mizuma wrote:
> My actual use case is for EFI boxes, however, I think it's better to useful
> for legacy BIOS as well because memory hot-plug affinity in SRAT and KASLR
> are available on legacy BIOS.
> Actually, we can create such environment in qemu.

Ah, right, qemu. :)

> I have another idea to solve this issue. Adding a SRAT parsing code
> to arch/x86/mm/kaslr.c. It is useful for both EFI and BIOS and
> also we don't need a new kernel parameter...
> Dose the idea make sense?

The more automatic stuff we do and we don't have to involve the user,
the better.

However, lemme look at Chao's current patchset first - we should not go
nuts by putting SRAT parsing everywhere :)
Baoquan He Nov. 6, 2018, 2:07 p.m. UTC | #29
On 11/06/18 at 01:10pm, Borislav Petkov wrote:
> > I have another idea to solve this issue. Adding a SRAT parsing code
> > to arch/x86/mm/kaslr.c. It is useful for both EFI and BIOS and
> > also we don't need a new kernel parameter...
> > Dose the idea make sense?
> 
> The more automatic stuff we do and we don't have to involve the user,
> the better.
> 
> However, lemme look at Chao's current patchset first - we should not go
> nuts by putting SRAT parsing everywhere :)

arch/x86/mm/ident_map.c is a good example, it's shared between
arch/x86/boot/compressed and arch/x86/mm/init_64.c
Borislav Petkov Nov. 6, 2018, 6:45 p.m. UTC | #30
On Thu, Oct 25, 2018 at 09:40:51AM -0400, Masayoshi Mizuma wrote:
> I have another idea to solve this issue. Adding a SRAT parsing code
> to arch/x86/mm/kaslr.c. It is useful for both EFI and BIOS and
> also we don't need a new kernel parameter...
> Dose the idea make sense?

Ok, having swapped the whole thing back into my brain, forget what I
said earlier today.

Didn't we talk about passing info with setup_data to the later kernel
stage? You even had a patch:

https://lkml.kernel.org/r/20181022154204.kagmdb55jtoez4ca@gabell

So what is that "idea" again about adding SRAT parsing code to
arch/x86/mm/kaslr.c?!?!

The intent of passing info with setup_data is to *avoid* parsing SRAT
yet another time and duplicating that code one more time.

IOW:

* The first place that needs SRAT parsing, does the parsing - i.e., the
compressed stage.

* Later stages get information passed to them with setup_data. No second
parsing.

Ok?
Masayoshi Mizuma Nov. 6, 2018, 7:36 p.m. UTC | #31
On Tue, Nov 06, 2018 at 07:45:19PM +0100, Borislav Petkov wrote:
> Ok, having swapped the whole thing back into my brain, forget what I
> said earlier today.
> 
> Didn't we talk about passing info with setup_data to the later kernel
> stage? You even had a patch:
> 
> https://lkml.kernel.org/r/20181022154204.kagmdb55jtoez4ca@gabell
> 
> So what is that "idea" again about adding SRAT parsing code to
> arch/x86/mm/kaslr.c?!?!
> 
> The intent of passing info with setup_data is to *avoid* parsing SRAT
> yet another time and duplicating that code one more time.
> 
> IOW:
> 
> * The first place that needs SRAT parsing, does the parsing - i.e., the
> compressed stage.
> 
> * Later stages get information passed to them with setup_data. No second
> parsing.
> 
> Ok?

Yes, I think I can see what you are saying. However, I'm not sure how
I use the setup_data in legacy BIOS environment. So, I said the another
idea which adding the SRAT parsing code to arch/x86/mm/kaslr.c as well.
Yes, as you said, that is not so good...

I would appreciate if you could help to use setup_data or something to
pass the information to later kernel stage in BIOS environment.

Thanks,
Masa
Borislav Petkov Nov. 6, 2018, 8:45 p.m. UTC | #32
On Tue, Nov 06, 2018 at 02:36:38PM -0500, Masayoshi Mizuma wrote:
> Yes, I think I can see what you are saying. However, I'm not sure how
> I use the setup_data in legacy BIOS environment.

What is "legacy BIOS environment" and what does that have to do with
setup_data?
Masayoshi Mizuma Nov. 6, 2018, 10:21 p.m. UTC | #33
On Tue, Nov 06, 2018 at 09:45:11PM +0100, Borislav Petkov wrote:
> On Tue, Nov 06, 2018 at 02:36:38PM -0500, Masayoshi Mizuma wrote:
> > Yes, I think I can see what you are saying. However, I'm not sure how
> > I use the setup_data in legacy BIOS environment.
> 
> What is "legacy BIOS environment" and what does that have to do with
> setup_data?

My proposed patch [1] is useful only for EFI environment. The patch
allocates a setup_date structure by efi_call_early() and store the
KASLR data into the memory area.

+static void setup_kaslr(struct boot_params *params)
+{
+	struct setup_data *kaslr_data = NULL;
+	struct setup_data *data;
+	unsigned long size;
+	efi_status_t status;
+
+	size = sizeof(struct setup_data) + sizeof(unsigned long long);
+
+	status = efi_call_early(allocate_pool, EFI_LOADER_DATA,
+			size, (void **)&kaslr_data);

I'm not sure how I allocate such memory on no EFI environment.
Am I missing something...?

[1] https://lkml.kernel.org/r/20181022154204.kagmdb55jtoez4ca@gabell

Thanks,
Masa
Chao Fan Nov. 7, 2018, 1:21 a.m. UTC | #34
On Tue, Nov 06, 2018 at 10:07:31PM +0800, Baoquan He wrote:
>On 11/06/18 at 01:10pm, Borislav Petkov wrote:
>> > I have another idea to solve this issue. Adding a SRAT parsing code
>> > to arch/x86/mm/kaslr.c. It is useful for both EFI and BIOS and
>> > also we don't need a new kernel parameter...
>> > Dose the idea make sense?
>> 
>> The more automatic stuff we do and we don't have to involve the user,
>> the better.
>> 
>> However, lemme look at Chao's current patchset first - we should not go
>> nuts by putting SRAT parsing everywhere :)
>
>arch/x86/mm/ident_map.c is a good example, it's shared between
>arch/x86/boot/compressed and arch/x86/mm/init_64.c

Thanks to Baoquan, I think we can try this idea.
How about you, Masa?

Thanks,
Chao Fan

>
>
Borislav Petkov Nov. 8, 2018, 10:51 a.m. UTC | #35
On Tue, Nov 06, 2018 at 05:21:34PM -0500, Masayoshi Mizuma wrote:
> On Tue, Nov 06, 2018 at 09:45:11PM +0100, Borislav Petkov wrote:
> > On Tue, Nov 06, 2018 at 02:36:38PM -0500, Masayoshi Mizuma wrote:
> > > Yes, I think I can see what you are saying. However, I'm not sure how
> > > I use the setup_data in legacy BIOS environment.
> > 
> > What is "legacy BIOS environment" and what does that have to do with
> > setup_data?
> 
> My proposed patch [1] is useful only for EFI environment. The patch
> allocates a setup_date structure by efi_call_early() and store the
> KASLR data into the memory area.
> 
> +static void setup_kaslr(struct boot_params *params)
> +{
> +	struct setup_data *kaslr_data = NULL;
> +	struct setup_data *data;
> +	unsigned long size;
> +	efi_status_t status;
> +
> +	size = sizeof(struct setup_data) + sizeof(unsigned long long);
> +
> +	status = efi_call_early(allocate_pool, EFI_LOADER_DATA,
> +			size, (void **)&kaslr_data);
> 
> I'm not sure how I allocate such memory on no EFI environment.

A global definition which doesn't need allocation?

Maybe hpa would have another, better idea...
Borislav Petkov Nov. 10, 2018, 10:54 a.m. UTC | #36
On Thu, Nov 08, 2018 at 11:51:29AM +0100, Borislav Petkov wrote:
> A global definition which doesn't need allocation?
> 
> Maybe hpa would have another, better idea...

...and he has: just put that address in a new field in struct
boot_params by converting one of the padding arrays there.

Don't forget to document it in Documentation/x86/zero-page.txt

This way you don't need any of the allocation fun or to use setup_data
at all.

HTH.
Masayoshi Mizuma Nov. 11, 2018, 1:45 p.m. UTC | #37
On Sat, Nov 10, 2018 at 11:54:22AM +0100, Borislav Petkov wrote:
> On Thu, Nov 08, 2018 at 11:51:29AM +0100, Borislav Petkov wrote:
> > A global definition which doesn't need allocation?
> > 
> > Maybe hpa would have another, better idea...
> 
> ...and he has: just put that address in a new field in struct
> boot_params by converting one of the padding arrays there.
> 
> Don't forget to document it in Documentation/x86/zero-page.txt
> 
> This way you don't need any of the allocation fun or to use setup_data
> at all.

Thanks!
I have the prototype patch to use boot_params [1].
I will try to brush up it.

[1] https://lore.kernel.org/lkml/20181016151353.punyk7exekut2543@gabell

Thanks,
Masa
Masayoshi Mizuma Feb. 5, 2019, 3:05 p.m. UTC | #38
Hi Boris and all,

On Sun, Nov 11, 2018 at 08:45:57AM -0500, Masayoshi Mizuma wrote:
> On Sat, Nov 10, 2018 at 11:54:22AM +0100, Borislav Petkov wrote:
> > On Thu, Nov 08, 2018 at 11:51:29AM +0100, Borislav Petkov wrote:
> > > A global definition which doesn't need allocation?
> > > 
> > > Maybe hpa would have another, better idea...
> > 
> > ...and he has: just put that address in a new field in struct
> > boot_params by converting one of the padding arrays there.
> > 
> > Don't forget to document it in Documentation/x86/zero-page.txt
> > 
> > This way you don't need any of the allocation fun or to use setup_data
> > at all.
> 
> Thanks!
> I have the prototype patch to use boot_params [1].
> I will try to brush up it.
> 
> [1] https://lore.kernel.org/lkml/20181016151353.punyk7exekut2543@gabell

Chao's patches are included in the tip tree, so I modified the patch.

Could you review the following patch?

From: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Date: Tue, 5 Feb 2019 10:00:59 -0500
Subject: [PATCH] x86/mm: Introduce adjustment the padding size for KASLR

If the physical memory layout has huge space for hotplug, the padding
used for the physical memory mapping section is not enough.
So, such system may crash while memory hot-adding on KASLR enabled system.

For example, SRAT has the following layout, the maximum possible memory
size is 32TB, and the memory is installed as 2TB actually, then the padding
size should set 30TB (== possible memory size - actual memory size).

  SRAT: Node 3 PXM 7 [mem 0x1c0000000000-0x1fffffffffff] hotplug

This patch introduces adjustment the padding size if the default
padding size isn't enough.

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
---
 Documentation/x86/zero-page.txt       |  1 +
 arch/x86/boot/compressed/acpi.c       | 19 +++++++++++++++----
 arch/x86/include/uapi/asm/bootparam.h |  2 +-
 arch/x86/mm/kaslr.c                   | 26 +++++++++++++++++++++++++-
 4 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/Documentation/x86/zero-page.txt b/Documentation/x86/zero-page.txt
index 68aed077f..343fe1a90 100644
--- a/Documentation/x86/zero-page.txt
+++ b/Documentation/x86/zero-page.txt
@@ -15,6 +15,7 @@ Offset	Proto	Name		Meaning
 058/008	ALL	tboot_addr      Physical address of tboot shared page
 060/010	ALL	ist_info	Intel SpeedStep (IST) BIOS support information
 				(struct ist_info)
+078/010	ALL	possible_mem_addr The possible maximum physical memory address.
 080/010	ALL	hd0_info	hd0 disk parameter, OBSOLETE!!
 090/010	ALL	hd1_info	hd1 disk parameter, OBSOLETE!!
 0A0/010	ALL	sys_desc_table	System description table (struct sys_desc_table),
diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
index c5a949335..7dd61b943 100644
--- a/arch/x86/boot/compressed/acpi.c
+++ b/arch/x86/boot/compressed/acpi.c
@@ -288,6 +288,7 @@ int count_immovable_mem_regions(void)
 	struct acpi_subtable_header *sub_table;
 	struct acpi_table_header *table_header;
 	char arg[MAX_ACPI_ARG_LENGTH];
+	unsigned long long possible_addr, max_possible_addr = 0;
 	int num = 0;
 
 	if (cmdline_find_option("acpi", arg, sizeof(arg)) == 3 &&
@@ -308,10 +309,19 @@ int count_immovable_mem_regions(void)
 			struct acpi_srat_mem_affinity *ma;
 
 			ma = (struct acpi_srat_mem_affinity *)sub_table;
-			if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && ma->length) {
-				immovable_mem[num].start = ma->base_address;
-				immovable_mem[num].size = ma->length;
-				num++;
+			if (ma->length) {
+				if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
+					possible_addr =
+						ma->base_address + ma->length;
+					if (possible_addr > max_possible_addr)
+						max_possible_addr =
+							possible_addr;
+				} else {
+					immovable_mem[num].start =
+						ma->base_address;
+					immovable_mem[num].size = ma->length;
+					num++;
+				}
 			}
 
 			if (num >= MAX_NUMNODES*2) {
@@ -320,6 +330,7 @@ int count_immovable_mem_regions(void)
 			}
 		}
 		table += sub_table->length;
+		boot_params->possible_mem_addr = max_possible_addr;
 	}
 	return num;
 }
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index 60733f137..5b64b606e 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -156,7 +156,7 @@ struct boot_params {
 	__u64  tboot_addr;				/* 0x058 */
 	struct ist_info ist_info;			/* 0x060 */
 	__u64 acpi_rsdp_addr;				/* 0x070 */
-	__u8  _pad3[8];					/* 0x078 */
+	__u64  possible_mem_addr;			/* 0x078 */
 	__u8  hd0_info[16];	/* obsolete! */		/* 0x080 */
 	__u8  hd1_info[16];	/* obsolete! */		/* 0x090 */
 	struct sys_desc_table sys_desc_table; /* obsolete! */	/* 0x0a0 */
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 3f452ffed..71fc28570 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -70,6 +70,30 @@ static inline bool kaslr_memory_enabled(void)
 	return kaslr_enabled() && !IS_ENABLED(CONFIG_KASAN);
 }
 
+static unsigned int __init kaslr_padding(void)
+{
+	unsigned int rand_mem_physical_padding =
+				CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
+#ifdef CONFIG_MEMORY_HOTPLUG
+	unsigned long long max_possible_phys, max_actual_phys, threshold;
+
+	if (!boot_params.possible_mem_addr)
+		goto out;
+
+	max_actual_phys = roundup(PFN_PHYS(max_pfn), 1ULL << TB_SHIFT);
+	max_possible_phys = roundup(boot_params.possible_mem_addr,
+					1ULL << TB_SHIFT);
+	threshold = max_actual_phys +
+		((unsigned long long)rand_mem_physical_padding << TB_SHIFT);
+
+	if (max_possible_phys > threshold)
+		rand_mem_physical_padding =
+			(max_possible_phys - max_actual_phys) >> TB_SHIFT;
+out:
+#endif
+	return rand_mem_physical_padding;
+}
+
 /* Initialize base and padding for each memory region randomized with KASLR */
 void __init kernel_randomize_memory(void)
 {
@@ -103,7 +127,7 @@ void __init kernel_randomize_memory(void)
 	 */
 	BUG_ON(kaslr_regions[0].base != &page_offset_base);
 	memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
-		CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
+			kaslr_padding();
 
 	/* Adapt phyiscal memory region size based on available memory */
 	if (memory_tb < kaslr_regions[0].size_tb)
Borislav Petkov Feb. 8, 2019, 6:26 p.m. UTC | #39
On Tue, Feb 05, 2019 at 10:05:16AM -0500, Masayoshi Mizuma wrote:
> From: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
> Date: Tue, 5 Feb 2019 10:00:59 -0500
> Subject: [PATCH] x86/mm: Introduce adjustment the padding size for KASLR

			"Adjust the padding size for KASLR"

> If the physical memory layout has huge space for hotplug, the padding
> used for the physical memory mapping section is not enough.
> So, such system may crash while memory hot-adding on KASLR enabled system.

Crash why?

Why is the padding not enough?

> For example, SRAT has the following layout, the maximum possible memory
> size is 32TB, and the memory is installed as 2TB actually, then the padding
> size should set 30TB (== possible memory size - actual memory size).
> 
>   SRAT: Node 3 PXM 7 [mem 0x1c0000000000-0x1fffffffffff] hotplug

What is that supposed to exemplify: that range is 3T not 2 and that
range start is not at 2T but 28T. So I have absolutely no clue what
you're trying to explain here.

Please go back, take your time and structure your commit message like
this:

Problem is A.

It happens because of B.

Fix it by doing C.

(Potentially do D).

For more detailed info, see
Documentation/process/submitting-patches.rst, Section "2) Describe your
changes".

> This patch introduces adjustment the padding size if the default

Avoid having "This patch" or "This commit" in the commit message. It is
tautologically useless.

Also, do

$ git grep 'This patch' Documentation/process

for more details.

> padding size isn't enough.
> 
> Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
> ---
>  Documentation/x86/zero-page.txt       |  1 +
>  arch/x86/boot/compressed/acpi.c       | 19 +++++++++++++++----
>  arch/x86/include/uapi/asm/bootparam.h |  2 +-
>  arch/x86/mm/kaslr.c                   | 26 +++++++++++++++++++++++++-
>  4 files changed, 42 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/x86/zero-page.txt b/Documentation/x86/zero-page.txt
> index 68aed077f..343fe1a90 100644
> --- a/Documentation/x86/zero-page.txt
> +++ b/Documentation/x86/zero-page.txt
> @@ -15,6 +15,7 @@ Offset	Proto	Name		Meaning
>  058/008	ALL	tboot_addr      Physical address of tboot shared page
>  060/010	ALL	ist_info	Intel SpeedStep (IST) BIOS support information
>  				(struct ist_info)
> +078/010	ALL	possible_mem_addr The possible maximum physical memory address.

Why isn't this called max_phys_addr then?

Also, please explain what it means at the end of this file.

>  080/010	ALL	hd0_info	hd0 disk parameter, OBSOLETE!!
>  090/010	ALL	hd1_info	hd1 disk parameter, OBSOLETE!!
>  0A0/010	ALL	sys_desc_table	System description table (struct sys_desc_table),
> diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
> index c5a949335..7dd61b943 100644
> --- a/arch/x86/boot/compressed/acpi.c
> +++ b/arch/x86/boot/compressed/acpi.c
> @@ -288,6 +288,7 @@ int count_immovable_mem_regions(void)
>  	struct acpi_subtable_header *sub_table;
>  	struct acpi_table_header *table_header;
>  	char arg[MAX_ACPI_ARG_LENGTH];
> +	unsigned long long possible_addr, max_possible_addr = 0;
>  	int num = 0;
>  
>  	if (cmdline_find_option("acpi", arg, sizeof(arg)) == 3 &&
> @@ -308,10 +309,19 @@ int count_immovable_mem_regions(void)
>  			struct acpi_srat_mem_affinity *ma;
>  
>  			ma = (struct acpi_srat_mem_affinity *)sub_table;
> -			if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && ma->length) {
> -				immovable_mem[num].start = ma->base_address;
> -				immovable_mem[num].size = ma->length;
> -				num++;
> +			if (ma->length) {
> +				if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
> +					possible_addr =
> +						ma->base_address + ma->length;
> +					if (possible_addr > max_possible_addr)
> +						max_possible_addr =
> +							possible_addr;
> +				} else {
> +					immovable_mem[num].start =
> +						ma->base_address;
> +					immovable_mem[num].size = ma->length;
> +					num++;
> +				}

This piece has some ugly linebreaks which makes it impossible to read.
Perhaps because the variable names you're adding are too long.

You can carve it out in a separate function, for example.

>  			if (num >= MAX_NUMNODES*2) {
> @@ -320,6 +330,7 @@ int count_immovable_mem_regions(void)
>  			}
>  		}
>  		table += sub_table->length;
> +		boot_params->possible_mem_addr = max_possible_addr;

This assignment is inside the loop and happens potentially a bunch of
times. Why?

>  	}
>  	return num;
>  }
> diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
> index 60733f137..5b64b606e 100644
> --- a/arch/x86/include/uapi/asm/bootparam.h
> +++ b/arch/x86/include/uapi/asm/bootparam.h
> @@ -156,7 +156,7 @@ struct boot_params {
>  	__u64  tboot_addr;				/* 0x058 */
>  	struct ist_info ist_info;			/* 0x060 */
>  	__u64 acpi_rsdp_addr;				/* 0x070 */
> -	__u8  _pad3[8];					/* 0x078 */
> +	__u64  possible_mem_addr;			/* 0x078 */
>  	__u8  hd0_info[16];	/* obsolete! */		/* 0x080 */
>  	__u8  hd1_info[16];	/* obsolete! */		/* 0x090 */
>  	struct sys_desc_table sys_desc_table; /* obsolete! */	/* 0x0a0 */
> diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> index 3f452ffed..71fc28570 100644
> --- a/arch/x86/mm/kaslr.c
> +++ b/arch/x86/mm/kaslr.c
> @@ -70,6 +70,30 @@ static inline bool kaslr_memory_enabled(void)
>  	return kaslr_enabled() && !IS_ENABLED(CONFIG_KASAN);
>  }
>  
> +static unsigned int __init kaslr_padding(void)
> +{
> +	unsigned int rand_mem_physical_padding =
> +				CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +	unsigned long long max_possible_phys, max_actual_phys, threshold;
> +
> +	if (!boot_params.possible_mem_addr)
> +		goto out;
> +
> +	max_actual_phys = roundup(PFN_PHYS(max_pfn), 1ULL << TB_SHIFT);
> +	max_possible_phys = roundup(boot_params.possible_mem_addr,
> +					1ULL << TB_SHIFT);
> +	threshold = max_actual_phys +
> +		((unsigned long long)rand_mem_physical_padding << TB_SHIFT);
> +
> +	if (max_possible_phys > threshold)
> +		rand_mem_physical_padding =
> +			(max_possible_phys - max_actual_phys) >> TB_SHIFT;
> +out:
> +#endif
> +	return rand_mem_physical_padding;

Same problem: local variables with unnecessarily long names make the
code hard to read. Pls shorten.

Also, the types in that function are a total mess. An unsigned int which
you cast to unsigned long long and return an unsigned int again to add
into a sum with unsigned longs. Are you selecting the types randomly?
Why aren't you using plain unsigned longs like the rest of the file does
with memory addresses?

Also, this function could have a comment above it explaining what all
that threshold and actual and possible address arithmetic is supposed to
do.

Thx.
Masayoshi Mizuma Feb. 9, 2019, 12:24 a.m. UTC | #40
Hi Boris,

Thank you for your review.

On Fri, Feb 08, 2019 at 07:26:00PM +0100, Borislav Petkov wrote:
> On Tue, Feb 05, 2019 at 10:05:16AM -0500, Masayoshi Mizuma wrote:
> > From: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
> > Date: Tue, 5 Feb 2019 10:00:59 -0500
> > Subject: [PATCH] x86/mm: Introduce adjustment the padding size for KASLR
> 
> 			"Adjust the padding size for KASLR"
> 
> > If the physical memory layout has huge space for hotplug, the padding
> > used for the physical memory mapping section is not enough.
> > So, such system may crash while memory hot-adding on KASLR enabled system.
> 
> Crash why?
> 
> Why is the padding not enough?
> 
> > For example, SRAT has the following layout, the maximum possible memory
> > size is 32TB, and the memory is installed as 2TB actually, then the padding
> > size should set 30TB (== possible memory size - actual memory size).
> > 
> >   SRAT: Node 3 PXM 7 [mem 0x1c0000000000-0x1fffffffffff] hotplug
> 
> What is that supposed to exemplify: that range is 3T not 2 and that
> range start is not at 2T but 28T. So I have absolutely no clue what
> you're trying to explain here.
> 
> Please go back, take your time and structure your commit message like
> this:
> 
> Problem is A.
> 
> It happens because of B.
> 
> Fix it by doing C.
> 
> (Potentially do D).
> 
> For more detailed info, see
> Documentation/process/submitting-patches.rst, Section "2) Describe your
> changes".

Got it, thanks.

> 
> > This patch introduces adjustment the padding size if the default
> 
> Avoid having "This patch" or "This commit" in the commit message. It is
> tautologically useless.
> 
> Also, do
> 
> $ git grep 'This patch' Documentation/process
> 
> for more details.

Thanks. I see.

> 
> > padding size isn't enough.
> > 
> > Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
> > ---
> >  Documentation/x86/zero-page.txt       |  1 +
> >  arch/x86/boot/compressed/acpi.c       | 19 +++++++++++++++----
> >  arch/x86/include/uapi/asm/bootparam.h |  2 +-
> >  arch/x86/mm/kaslr.c                   | 26 +++++++++++++++++++++++++-
> >  4 files changed, 42 insertions(+), 6 deletions(-)
> > 
> > diff --git a/Documentation/x86/zero-page.txt b/Documentation/x86/zero-page.txt
> > index 68aed077f..343fe1a90 100644
> > --- a/Documentation/x86/zero-page.txt
> > +++ b/Documentation/x86/zero-page.txt
> > @@ -15,6 +15,7 @@ Offset	Proto	Name		Meaning
> >  058/008	ALL	tboot_addr      Physical address of tboot shared page
> >  060/010	ALL	ist_info	Intel SpeedStep (IST) BIOS support information
> >  				(struct ist_info)
> > +078/010	ALL	possible_mem_addr The possible maximum physical memory address.
> 
> Why isn't this called max_phys_addr then?
> 
> Also, please explain what it means at the end of this file.
> 
> >  080/010	ALL	hd0_info	hd0 disk parameter, OBSOLETE!!
> >  090/010	ALL	hd1_info	hd1 disk parameter, OBSOLETE!!
> >  0A0/010	ALL	sys_desc_table	System description table (struct sys_desc_table),
> > diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
> > index c5a949335..7dd61b943 100644
> > --- a/arch/x86/boot/compressed/acpi.c
> > +++ b/arch/x86/boot/compressed/acpi.c
> > @@ -288,6 +288,7 @@ int count_immovable_mem_regions(void)
> >  	struct acpi_subtable_header *sub_table;
> >  	struct acpi_table_header *table_header;
> >  	char arg[MAX_ACPI_ARG_LENGTH];
> > +	unsigned long long possible_addr, max_possible_addr = 0;
> >  	int num = 0;
> >  
> >  	if (cmdline_find_option("acpi", arg, sizeof(arg)) == 3 &&
> > @@ -308,10 +309,19 @@ int count_immovable_mem_regions(void)
> >  			struct acpi_srat_mem_affinity *ma;
> >  
> >  			ma = (struct acpi_srat_mem_affinity *)sub_table;
> > -			if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && ma->length) {
> > -				immovable_mem[num].start = ma->base_address;
> > -				immovable_mem[num].size = ma->length;
> > -				num++;
> > +			if (ma->length) {
> > +				if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
> > +					possible_addr =
> > +						ma->base_address + ma->length;
> > +					if (possible_addr > max_possible_addr)
> > +						max_possible_addr =
> > +							possible_addr;
> > +				} else {
> > +					immovable_mem[num].start =
> > +						ma->base_address;
> > +					immovable_mem[num].size = ma->length;
> > +					num++;
> > +				}
> 
> This piece has some ugly linebreaks which makes it impossible to read.
> Perhaps because the variable names you're adding are too long.
> 
> You can carve it out in a separate function, for example.

Thanks. I will add a separate function like as:

static void subtable_parse(struct acpi_subtable_header *sub_table, int *num,
		unsigned long *max_addr)
{
	struct acpi_srat_mem_affinity *ma;
	unsigned long addr;

	ma = (struct acpi_srat_mem_affinity *)sub_table;
	if (ma->length) {
		if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
			addr = ma->base_address + ma->length;
			if (addr > *max_addr)
				*max_addr = addr;
		} else {
			immovable_mem[*num].start = ma->base_address;
			immovable_mem[*num].size = ma->length;
			(*num)++;
		}
	}
}

> 
> >  			if (num >= MAX_NUMNODES*2) {
> > @@ -320,6 +330,7 @@ int count_immovable_mem_regions(void)
> >  			}
> >  		}
> >  		table += sub_table->length;
> > +		boot_params->possible_mem_addr = max_possible_addr;
> 
> This assignment is inside the loop and happens potentially a bunch of
> times. Why?


I will move the assigment out of the loop, thanks.

> 
> >  	}
> >  	return num;
> >  }
> > diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
> > index 60733f137..5b64b606e 100644
> > --- a/arch/x86/include/uapi/asm/bootparam.h
> > +++ b/arch/x86/include/uapi/asm/bootparam.h
> > @@ -156,7 +156,7 @@ struct boot_params {
> >  	__u64  tboot_addr;				/* 0x058 */
> >  	struct ist_info ist_info;			/* 0x060 */
> >  	__u64 acpi_rsdp_addr;				/* 0x070 */
> > -	__u8  _pad3[8];					/* 0x078 */
> > +	__u64  possible_mem_addr;			/* 0x078 */
> >  	__u8  hd0_info[16];	/* obsolete! */		/* 0x080 */
> >  	__u8  hd1_info[16];	/* obsolete! */		/* 0x090 */
> >  	struct sys_desc_table sys_desc_table; /* obsolete! */	/* 0x0a0 */
> > diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> > index 3f452ffed..71fc28570 100644
> > --- a/arch/x86/mm/kaslr.c
> > +++ b/arch/x86/mm/kaslr.c
> > @@ -70,6 +70,30 @@ static inline bool kaslr_memory_enabled(void)
> >  	return kaslr_enabled() && !IS_ENABLED(CONFIG_KASAN);
> >  }
> >  
> > +static unsigned int __init kaslr_padding(void)
> > +{
> > +	unsigned int rand_mem_physical_padding =
> > +				CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
> > +#ifdef CONFIG_MEMORY_HOTPLUG
> > +	unsigned long long max_possible_phys, max_actual_phys, threshold;
> > +
> > +	if (!boot_params.possible_mem_addr)
> > +		goto out;
> > +
> > +	max_actual_phys = roundup(PFN_PHYS(max_pfn), 1ULL << TB_SHIFT);
> > +	max_possible_phys = roundup(boot_params.possible_mem_addr,
> > +					1ULL << TB_SHIFT);
> > +	threshold = max_actual_phys +
> > +		((unsigned long long)rand_mem_physical_padding << TB_SHIFT);
> > +
> > +	if (max_possible_phys > threshold)
> > +		rand_mem_physical_padding =
> > +			(max_possible_phys - max_actual_phys) >> TB_SHIFT;
> > +out:
> > +#endif
> > +	return rand_mem_physical_padding;
> 
> Same problem: local variables with unnecessarily long names make the
> code hard to read. Pls shorten.
> 
> Also, the types in that function are a total mess. An unsigned int which
> you cast to unsigned long long and return an unsigned int again to add
> into a sum with unsigned longs. Are you selecting the types randomly?
> Why aren't you using plain unsigned longs like the rest of the file does
> with memory addresses?

Sorry. I will unify the type as unsinged long.

> 
> Also, this function could have a comment above it explaining what all
> that threshold and actual and possible address arithmetic is supposed to
> do.

Got it.

Thanks!
Masa
Chao Fan Feb. 11, 2019, 1:46 a.m. UTC | #41
On Tue, Feb 05, 2019 at 10:05:16AM -0500, Masayoshi Mizuma wrote:
[...]

Hi Masa,

Sorry for delay, since last days were Chinese holiday.

>diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
>index c5a949335..7dd61b943 100644
>--- a/arch/x86/boot/compressed/acpi.c
>+++ b/arch/x86/boot/compressed/acpi.c
>@@ -288,6 +288,7 @@ int count_immovable_mem_regions(void)
> 	struct acpi_subtable_header *sub_table;
> 	struct acpi_table_header *table_header;
> 	char arg[MAX_ACPI_ARG_LENGTH];
>+	unsigned long long possible_addr, max_possible_addr = 0;

This line is so long that it should be added in first line.

> 	int num = 0;
> 
> 	if (cmdline_find_option("acpi", arg, sizeof(arg)) == 3 &&
>@@ -308,10 +309,19 @@ int count_immovable_mem_regions(void)
> 			struct acpi_srat_mem_affinity *ma;
> 
> 			ma = (struct acpi_srat_mem_affinity *)sub_table;
>-			if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && ma->length) {
>-				immovable_mem[num].start = ma->base_address;
>-				immovable_mem[num].size = ma->length;
>-				num++;
>+			if (ma->length) {
>+				if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
>+					possible_addr =
>+						ma->base_address + ma->length;
>+					if (possible_addr > max_possible_addr)
>+						max_possible_addr =
>+							possible_addr;
>+				} else {
>+					immovable_mem[num].start =
>+						ma->base_address;
>+					immovable_mem[num].size = ma->length;
>+					num++;
>+				}
> 			}

It looks better in another mail where you add a new function.

Thanks,
Chao Fan

> 
> 			if (num >= MAX_NUMNODES*2) {
>@@ -320,6 +330,7 @@ int count_immovable_mem_regions(void)
> 			}
> 		}
> 		table += sub_table->length;
>+		boot_params->possible_mem_addr = max_possible_addr;
> 	}
> 	return num;
> }
>diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
>index 60733f137..5b64b606e 100644
>--- a/arch/x86/include/uapi/asm/bootparam.h
>+++ b/arch/x86/include/uapi/asm/bootparam.h
>@@ -156,7 +156,7 @@ struct boot_params {
> 	__u64  tboot_addr;				/* 0x058 */
> 	struct ist_info ist_info;			/* 0x060 */
> 	__u64 acpi_rsdp_addr;				/* 0x070 */
>-	__u8  _pad3[8];					/* 0x078 */
>+	__u64  possible_mem_addr;			/* 0x078 */
> 	__u8  hd0_info[16];	/* obsolete! */		/* 0x080 */
> 	__u8  hd1_info[16];	/* obsolete! */		/* 0x090 */
> 	struct sys_desc_table sys_desc_table; /* obsolete! */	/* 0x0a0 */
>diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
>index 3f452ffed..71fc28570 100644
>--- a/arch/x86/mm/kaslr.c
>+++ b/arch/x86/mm/kaslr.c
>@@ -70,6 +70,30 @@ static inline bool kaslr_memory_enabled(void)
> 	return kaslr_enabled() && !IS_ENABLED(CONFIG_KASAN);
> }
> 
>+static unsigned int __init kaslr_padding(void)
>+{
>+	unsigned int rand_mem_physical_padding =
>+				CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
>+#ifdef CONFIG_MEMORY_HOTPLUG
>+	unsigned long long max_possible_phys, max_actual_phys, threshold;
>+
>+	if (!boot_params.possible_mem_addr)
>+		goto out;
>+
>+	max_actual_phys = roundup(PFN_PHYS(max_pfn), 1ULL << TB_SHIFT);
>+	max_possible_phys = roundup(boot_params.possible_mem_addr,
>+					1ULL << TB_SHIFT);
>+	threshold = max_actual_phys +
>+		((unsigned long long)rand_mem_physical_padding << TB_SHIFT);
>+
>+	if (max_possible_phys > threshold)
>+		rand_mem_physical_padding =
>+			(max_possible_phys - max_actual_phys) >> TB_SHIFT;
>+out:
>+#endif
>+	return rand_mem_physical_padding;
>+}
>+
> /* Initialize base and padding for each memory region randomized with KASLR */
> void __init kernel_randomize_memory(void)
> {
>@@ -103,7 +127,7 @@ void __init kernel_randomize_memory(void)
> 	 */
> 	BUG_ON(kaslr_regions[0].base != &page_offset_base);
> 	memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
>-		CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
>+			kaslr_padding();
> 
> 	/* Adapt phyiscal memory region size based on available memory */
> 	if (memory_tb < kaslr_regions[0].size_tb)
>-- 
>2.20.1
>
>Thanks,
>Masa
>
>
Masayoshi Mizuma Feb. 11, 2019, 8:11 p.m. UTC | #42
Hi Chao,

Thank you for your review.

On Mon, Feb 11, 2019 at 09:46:05AM +0800, Chao Fan wrote:
> On Tue, Feb 05, 2019 at 10:05:16AM -0500, Masayoshi Mizuma wrote:
> [...]
> 
> Hi Masa,
> 
> Sorry for delay, since last days were Chinese holiday.
> 
> >diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
> >index c5a949335..7dd61b943 100644
> >--- a/arch/x86/boot/compressed/acpi.c
> >+++ b/arch/x86/boot/compressed/acpi.c
> >@@ -288,6 +288,7 @@ int count_immovable_mem_regions(void)
> > 	struct acpi_subtable_header *sub_table;
> > 	struct acpi_table_header *table_header;
> > 	char arg[MAX_ACPI_ARG_LENGTH];
> >+	unsigned long long possible_addr, max_possible_addr = 0;
> 
> This line is so long that it should be added in first line.

Thanks. I will simplify around the local variables.

> 
> > 	int num = 0;
> > 
> > 	if (cmdline_find_option("acpi", arg, sizeof(arg)) == 3 &&
> >@@ -308,10 +309,19 @@ int count_immovable_mem_regions(void)
> > 			struct acpi_srat_mem_affinity *ma;
> > 
> > 			ma = (struct acpi_srat_mem_affinity *)sub_table;
> >-			if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && ma->length) {
> >-				immovable_mem[num].start = ma->base_address;
> >-				immovable_mem[num].size = ma->length;
> >-				num++;
> >+			if (ma->length) {
> >+				if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
> >+					possible_addr =
> >+						ma->base_address + ma->length;
> >+					if (possible_addr > max_possible_addr)
> >+						max_possible_addr =
> >+							possible_addr;
> >+				} else {
> >+					immovable_mem[num].start =
> >+						ma->base_address;
> >+					immovable_mem[num].size = ma->length;
> >+					num++;
> >+				}
> > 			}
> 
> It looks better in another mail where you add a new function.

Thanks!

- Masa