[v6,kspp-next,00/22] Function Granular KASLR

Message ID 20210831144114.154-1-alexandr.lobakin@intel.com (mailing list archive)

Alexander Lobakin Aug. 31, 2021, 2:40 p.m. UTC
This is a massive rework and a respin of Kristen Accardi's marvellous
FG-KASLR series (v5).

The major differences since v5 [0]:
* You can now tune the number of functions per section to achieve
  the preferred vmlinux size or protection level. The default is
  still one section per function.
  This can be handy for storage-constrained systems. 4-8 functions
  per section (fps) are still strong, but reduce the size of the
  final vmlinu{x,z} significantly;
* I don't use orphan sections anymore. It's not reliable at all /
  may differ from linker to linker, and also conflicts with
  CONFIG_LD_ORPHAN_WARN which is great for catching random bugs ->
* All the .text.* sections are now being described explicitly in the
  linker script. A Perl script is used to take the original LDS, the
  original object file, read a list of input sections from it and
  generate the resulting LDS.
  This costs a bit of linking time as LD tends to think hard when
  processing scripts > 1 Mb. It adds about 40-60 seconds to the
  whole linking process (BTF step, 2-3 kallsyms steps and the final
  step), but "better safe than sorry".
  In addition, that approach allows reserving some space at the end
  and adding some link assertions ->
* The input .text section must now be empty, otherwise the link is
  aborted. This is implemented by a size assertion in the resulting
  LD script and is designed to plug the potential layout leakage.
  This also means that ->
* "Regular" ASM functions are now being placed into unique separate
  sections the same way the compiler does this for C functions. This
  is achieved by introducing and using several new macros which take
  the symbol name as a base for its new section name.
  This gives a better opportunity to both DCE and FG-KASLR, as ASM
  code now can also be randomized or garbage-collected;
* It's now fully compatible with ClangLTO, ClangCFI,
  CONFIG_LD_ORPHAN_WARN and some more stuff landed since the last
  revision was published;
* Includes several fixes: relocations inside .altinstr_replacement
  code and minor issues found and/or suggested by LKP robot.
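
To make the "functions per section" knob and the generated linker script
more concrete, here is a minimal sketch in Python (the series itself uses
a Perl script, scripts/generate_text_sections.pl, whose logic is not
reproduced here). The helper names, the name-sorted grouping and the
choice of naming each output section after its first member are all
assumptions made for illustration only.

```python
from typing import Iterable, List


def group_text_sections(names: Iterable[str], per_section: int = 1) -> List[List[str]]:
    """Split input .text.* section names into groups of `per_section`."""
    if per_section < 1:
        raise ValueError("per_section must be >= 1")
    # Sort for a deterministic result and drop duplicate names.
    ordered = sorted(set(names))
    return [ordered[i:i + per_section] for i in range(0, len(ordered), per_section)]


def emit_lds(groups: List[List[str]]) -> str:
    """Render one explicit output section per group (hypothetical naming)."""
    lines = []
    for group in groups:
        inputs = " ".join(f"*({name})" for name in group)
        # The output section is named after the first input section in it.
        lines.append(f"\t{group[0]} : {{ {inputs} }}")
    return "\n".join(lines)
```

With per_section=1 every function keeps its own output section; larger
values produce fewer, bigger sections, which is why fewer section headers
end up in the final image.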

The series was compile-time and runtime tested on the following
setups with no issues:
- x86_64, GCC 11, Binutils 2.35;
- x86_64, Clang/LLVM 12, ClangLTO + ClangCFI (from Sami's tree).

The first 4 patches are from the linux-kbuild tree and are included
to avoid merge conflicts and their non-intuitive resolution.

The series is also available here: [1]

[0] https://lore.kernel.org/kernel-hardening/20200923173905.11219-1-kristen@linux.intel.com
[1] https://github.com/alobakin/linux/pull/3

The original v5 cover letter:

Function Granular Kernel Address Space Layout Randomization (fgkaslr)
---------------------------------------------------------------------

This patch set is an implementation of finer grained kernel address space
randomization. It rearranges your kernel code at load time 
on a per-function level granularity, with only around a second added to
boot time.

Changes in v5:
--------------
* fixed a bug in the code which increases boot heap size for
  CONFIG_FG_KASLR which prevented the boot heap from being increased
  for CONFIG_FG_KASLR when using bzip2 compression. Thanks to Andy Lavr
  for finding the problem and identifying the solution.
* changed the adjustment of the orc_unwind_ip table at boot time to
  disregard relocs associated with this table, and instead inspect the
  entries separately. Relocs cannot be used since they are
  no longer correct once the table is re-sorted at build time.
* changed how orc_unwind_ip addresses in randomized sections are identified
  to include the byte immediately after the end of the section.
* updated module code to use kvmalloc/kvfree based on suggestions from
  Evgenii Shatokhin <eshatokhin@virtuozzo.com>.
* changed kernel commandline to disable fgkaslr to simply "nofgkaslr" to
  match the nokaslr option. fgkaslr="X" can be added at a later date
  if it is needed.
* Added a patch to force livepatch to require symbols to be unique
  while using fgkaslr, either for core or modules.

Changes in v4:
-------------
* dropped the patch to split out change to STATIC definition in
  x86/boot/compressed/misc.c and replaced with a patch authored
  by Kees Cook to avoid the duplicate malloc definitions
* Added a section to Documentation/admin-guide/kernel-parameters.txt
  to document the fgkaslr boot option.
* redesigned the patch to hide the new layout when reading
  /proc/kallsyms. The previous implementation utilized a dynamically
  allocated linked list to display the kernel and module symbols
  in alphabetical order. The new implementation uses a randomly
  shuffled index array to display the kernel and module symbols
  in a random order.

Changes in v3:
-------------
* Makefile changes to accommodate CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
* removal of extraneous ALIGN_PAGE from _etext changes
* changed variable names in x86/tools/relocs to be less confusing
* split out change to STATIC definition in x86/boot/compressed/misc.c
* Updates to Documentation to make it more clear what is preserved in .text
* much more detailed commit message for function granular KASLR patch
* minor tweaks and changes that make for more readable code
* this cover letter updated slightly to add additional details

Changes in v2:
--------------
* Fix to address i386 build failure
* Allow module reordering patch to be configured separately so that
  arm (or other non-x86_64 arches) can take advantage of module function
  reordering. This support has not been tested by me, but smoke tested by
  Ard Biesheuvel <ardb@kernel.org> on arm.
* Fix build issue when building on arm as reported by
  Ard Biesheuvel <ardb@kernel.org> 

Patches to objtool are included because they are dependencies for this
patchset, however they have been submitted by their maintainer separately.

Background
----------
KASLR was merged into the kernel with the objective of increasing the
difficulty of code reuse attacks. Code reuse attacks reuse existing code
snippets to get around existing memory protections. They exploit software bugs
which expose addresses of useful code snippets to control the flow of
execution for their own nefarious purposes. KASLR moves the entire kernel
code text as a unit at boot time in order to make addresses less predictable.
The order of the code within the segment is unchanged - only the base address
is shifted. There are a few shortcomings to this algorithm.

1. Low Entropy - there are only so many locations the kernel can fit in. This
   means an attacker could guess without too much trouble.
2. Knowledge of a single address can reveal the offset of the base address,
   exposing all other locations for a published/known kernel image.
3. Info leaks abound.
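
Shortcoming 2 can be shown with a few lines of arithmetic. The sketch
below is illustrative only: the symbol names are real kernel functions,
but every address is made up, and the table stands in for whatever an
attacker can read from a published kernel image.

```python
# Link-time addresses from a published image (made-up values).
LINK_ADDRS = {
    "_text": 0xffffffff81000000,
    "commit_creds": 0xffffffff810a1230,
    "prepare_kernel_cred": 0xffffffff810a1560,
}


def slide_from_leak(symbol: str, leaked_addr: int) -> int:
    """One leaked run-time address reveals the KASLR slide."""
    return leaked_addr - LINK_ADDRS[symbol]


def resolve_all(slide: int) -> dict:
    """With base-only KASLR, the same slide applies to every symbol."""
    return {name: addr + slide for name, addr in LINK_ADDRS.items()}
```

Once the order of functions is shuffled per boot, a single slide no longer
maps every link-time address to its run-time location.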

Finer grained ASLR has been proposed as a way to make ASLR more resistant
to info leaks. It is not a new concept at all, and there are many variations
possible. Function reordering is an implementation of finer grained ASLR
which randomizes the layout of an address space on a function level
granularity. We use the term "fgkaslr" in this document to refer to the
technique of function reordering when used with KASLR, as well as finer grained
KASLR in general.

Proposed Improvement
--------------------
This patch set proposes adding function reordering on top of the existing
KASLR base address randomization. The over-arching objective is incremental
improvement over what we already have. It is designed to work in combination
with the existing solution. The implementation is really pretty simple, and
there are two main areas where changes occur:

* Build time

GCC has had an option to place functions into individual .text sections for
many years now. This option can be used to implement function reordering at
load time. The final compiled vmlinux retains all the section headers, which
can be used to help find the address ranges of each function. Using this
information and an expanded table of relocation addresses, individual text
sections can be shuffled immediately after decompression. Some data tables
inside the kernel that have assumptions about order require re-sorting
after being updated when applying relocations. In order to modify these tables,
a few key symbols are excluded from the objcopy symbol stripping process for
use after shuffling the text segments.

Some highlights from the build time changes to look for:

The top level kernel Makefile was modified to add the gcc
-ffunction-sections flag if it is supported. Currently, I am applying this
flag to everything it is possible to randomize. Anything that is written in
C and not present in a special input section is randomized. The final binary
segment 0 retains a consolidated .text section, as well as all the
individual .text.* sections. Future work could turn off this flag for
selected files or even entire subsystems, although obviously at the cost of
security.

The relocs tool is updated to add relative relocations. This information
previously wasn't included because it wasn't necessary when moving the
entire .text segment as a unit. 

A new file was created to contain a list of symbols that objcopy should
keep. We use those symbols at load time as described below.

* Load time

The boot kernel was modified to parse the vmlinux elf file after
decompression to check for our interesting symbols that we kept, and to
look for any .text.* sections to randomize. The consolidated .text section
is skipped and not moved. The sections are shuffled randomly, and copied
into memory following the .text section in a new random order. The existing
code which updated relocation addresses was modified to account for
not just a fixed delta from the load address, but the offset that the function
section was moved to. This requires inspection of each address to see if
it was impacted by a randomization. We use a bsearch to make this less
horrible on performance. Any tables that need to be modified with new
addresses or resorted are updated using the symbol addresses parsed from the
elf symbol table.
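
The shuffle and the per-address lookup described above can be sketched as
follows. This is a simplified model in Python, not the code from
arch/x86/boot/compressed/fgkaslr.c: the Section type and both helpers are
hypothetical, alignment padding is ignored, and ELF parsing is left out.

```python
import bisect
import random
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    old_start: int      # link-time start address
    size: int
    new_start: int = 0  # filled in by shuffle_layout()


def shuffle_layout(sections, base, rng):
    """Place sections back to back from `base` in a random order.

    Returns the sections sorted by old_start, ready for binary search.
    """
    order = list(sections)
    rng.shuffle(order)
    cursor = base
    for sec in order:
        sec.new_start = cursor
        cursor += sec.size
    return sorted(sections, key=lambda s: s.old_start)


def adjust_address(addr, by_old_start, starts, kaslr_delta):
    """Map a link-time address to its run-time address."""
    # Find the last section whose old_start is <= addr.
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0:
        sec = by_old_start[i]
        if addr < sec.old_start + sec.size:
            # Inside a moved section: apply its own offset plus the slide.
            return addr - sec.old_start + sec.new_start + kaslr_delta
    # Not in any randomized section: only the base slide applies.
    return addr + kaslr_delta
```

Keeping the table sorted by the original start address is what allows each
relocation to be resolved in O(log n) instead of scanning every section.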

In order to hide our new layout, symbols reported through /proc/kallsyms
will be displayed in a random order.
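
The v4 changelog above mentions a randomly shuffled index array for this.
A minimal sketch of that idea, assuming a Fisher-Yates shuffle and using
Python's random module in place of the kernel's own random number source
(both function names are hypothetical):

```python
import random


def shuffled_indices(count, rng):
    """Return the indices 0..count-1 in a uniformly random order."""
    idx = list(range(count))
    # Fisher-Yates: swap each slot with a random slot at or below it.
    for i in range(count - 1, 0, -1):
        j = rng.randrange(i + 1)
        idx[i], idx[j] = idx[j], idx[i]
    return idx


def display_order(symbols, rng):
    """Walk the symbol table through the shuffled index array."""
    return [symbols[i] for i in shuffled_indices(len(symbols), rng)]
```

The symbol table itself stays untouched; only the order in which entries
are printed changes, so the output no longer mirrors the memory layout.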

Security Considerations
-----------------------
The objective of this patch set is to improve a technology that is already
merged into the kernel (KASLR). This code will not prevent all attacks,
but should instead be considered as one of several tools that can be used.
In particular, this code is meant to make KASLR more effective in the presence
of info leaks.

How much entropy we are adding to the existing entropy of standard KASLR will
depend on a few variables. Firstly and most obviously, the number of functions
that are randomized matters. This implementation keeps the existing .text
section for code that cannot be randomized - for example, because it was
assembly code. The fewer sections to randomize, the less entropy. In addition,
due to alignment (16 bytes for x86_64), the number of bits in an address that
the attacker needs to guess is reduced, as the lower bits are identical.
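
As a rough back-of-the-envelope illustration of the two factors above (not
a measurement from this series): if n sections are shuffled uniformly
there are n! possible orderings, so log2(n!) bits is an upper bound on the
entropy of the layout as a whole, and a power-of-two alignment fixes the
low log2(alignment) bits of every section start. Both helper names are
hypothetical.

```python
import math


def ordering_entropy_bits(num_sections: int) -> float:
    """log2(n!): bits needed to name one of n! equally likely orderings."""
    return math.lgamma(num_sections + 1) / math.log(2)


def aligned_low_bits(alignment: int) -> int:
    """Number of low address bits fixed by a power-of-two alignment."""
    if alignment < 1 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    return alignment.bit_length() - 1
```

This bound describes the whole layout; how hard it is to guess the address
of one particular function is a different and smaller quantity.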

Performance Impact
------------------
There are two areas where function reordering can impact performance: boot
time latency, and run time performance.

* Boot time latency
This implementation of finer grained KASLR impacts the boot time of the kernel
in several places. It requires additional parsing of the kernel ELF file to
obtain the section headers of the sections to be randomized. It calls the
random number generator for each section to be randomized to determine that
section's new memory location. It copies the decompressed kernel into a new
area of memory to avoid corruption when laying out the newly randomized
sections. It increases the number of relocations the kernel has to perform at
boot time vs. standard KASLR, and it also requires a lookup on each address
that needs to be relocated to see if it was in a randomized section and needs
to be adjusted by a new offset. Finally, it re-sorts a few data tables that
are required to be sorted by address.

Booting a test VM on a modern, well appointed system showed an increase in
latency of approximately 1 second.

* Run time
The performance impact at run-time of function reordering varies by workload.
Using kcbench, a kernel compilation benchmark, the performance of a kernel
build with finer grained KASLR was about 1% slower than a kernel with standard
KASLR. Analysis with perf showed a slightly higher percentage of 
L1-icache-load-misses. Other workloads were examined as well, with varied
results. Some workloads performed significantly worse under FGKASLR, while
others stayed the same or were mysteriously better. In general, it will
depend on the code flow whether or not finer grained KASLR will impact
your workload, and how the underlying code was designed. Because the layout
changes per boot, each time a system is rebooted the performance of a workload
may change.

Future work could identify hot areas that may not be randomized and either
leave them in the .text section or group them together into a single section
that may be randomized. If grouping things together helps, one other thing to
consider is that if we could identify text blobs that should be grouped together
to benefit a particular code flow, it could be interesting to explore
whether this security feature could also be used as a performance
feature if you are interested in optimizing your kernel layout for a
particular workload at boot time. Optimizing function layout for a particular
workload has been researched and proven effective - for more information
read the Facebook paper "Optimizing Function Placement for Large-Scale
Data-Center Applications" (see references section below).

Image Size
----------
Adding additional section headers as a result of compiling with
-ffunction-sections will increase the size of the vmlinux ELF file.
With a standard distro config, the resulting vmlinux was increased by
about 3%. The compressed image is also increased due to the section headers,
as well as the extra relocations that must be added. You can expect fgkaslr
to increase the size of the compressed image by about 15%.

Memory Usage
------------
fgkaslr increases the amount of heap that is required at boot time,
although this extra memory is released when the kernel has finished
decompression. As a result, it may not be appropriate to use this feature on
systems without much memory.

Building
--------
To enable fine grained KASLR, you need to have the following config options
set (including all the ones you would use to build normal KASLR)

CONFIG_FG_KASLR=y

In addition, fgkaslr is only supported for the X86_64 architecture.

Modules
-------
Modules are randomized similarly to the rest of the kernel by shuffling
the sections at load time prior to moving them into memory. The module must
also have been built with the -ffunction-sections compiler option.

Although fgkaslr for the kernel is only supported for the X86_64 architecture,
it is possible to use fgkaslr with modules on other architectures. To enable
this feature, select

CONFIG_MODULE_FG_KASLR=y

This option is selected automatically for X86_64 when CONFIG_FG_KASLR is set.

Disabling
---------
Disabling normal KASLR using the nokaslr command line option also disables
fgkaslr. It is also possible to disable fgkaslr separately by booting with
nofgkaslr on the commandline.

References
----------
There are a lot of academic papers which explore finer grained ASLR.
This paper in particular contributed the most to my implementation design
as well as my overall understanding of the problem space:

Selfrando: Securing the Tor Browser against De-anonymization Exploits,
M. Conti, S. Crane, T. Frassetto, et al.

For more information on how function layout impacts performance, see:

Optimizing Function Placement for Large-Scale Data-Center Applications,
G. Ottoni, B. Maher

Alexander Lobakin (7):
  linkage: add macros for putting ASM functions into own sections
  x86: conditionally place regular ASM functions into separate sections
  FG-KASLR: use a scripted approach to handle .text.* sections
  x86/boot: allow FG-KASLR to be selected
  arm64/crypto: conditionally place ASM functions into separate sections
  module: use a scripted approach for FG-KASLR
  maintainers: add MAINTAINERS entry for FG-KASLR

Kees Cook (2):
  x86/boot: Allow a "silent" kaslr random byte fetch
  x86/boot/compressed: Avoid duplicate malloc() implementations

Kristen Carlson Accardi (9):
  x86: tools/relocs: Support >64K section headers
  x86: Makefile: Add build and config option for CONFIG_FG_KASLR
  Make sure ORC lookup covers the entire _etext - _stext
  x86/tools: Add relative relocs for randomized functions
  x86: Add support for function granular KASLR
  kallsyms: Hide layout
  livepatch: only match unique symbols when using fgkaslr
  module: Reorder functions
  Documentation: add a documentation for FG-KASLR

Masahiro Yamada (3):
  kbuild: merge vmlinux_link() between the ordinary link and Clang LTO
  kbuild: do not remove 'linux' link in scripts/link-vmlinux.sh
  kbuild: merge vmlinux_link() between ARCH=um and other architectures

Sami Tolvanen (1):
  kbuild: Fix TRIM_UNUSED_KSYMS with LTO_CLANG

 .../admin-guide/kernel-parameters.txt         |   6 +
 Documentation/security/fgkaslr.rst            | 172 ++++
 Documentation/security/index.rst              |   1 +
 MAINTAINERS                                   |  12 +
 Makefile                                      |  17 +-
 arch/Kconfig                                  |   3 +
 arch/arm64/crypto/aes-ce-ccm-core.S           |  16 +-
 arch/arm64/crypto/aes-ce-core.S               |  16 +-
 arch/arm64/crypto/aes-ce.S                    |   4 +-
 arch/arm64/crypto/aes-cipher-core.S           |   8 +-
 arch/arm64/crypto/aes-modes.S                 |  16 +-
 arch/arm64/crypto/aes-neon.S                  |   4 +-
 arch/arm64/crypto/aes-neonbs-core.S           |  38 +-
 arch/arm64/crypto/chacha-neon-core.S          |  18 +-
 arch/arm64/crypto/crct10dif-ce-core.S         |  14 +-
 arch/arm64/crypto/ghash-ce-core.S             |  24 +-
 arch/arm64/crypto/nh-neon-core.S              |   4 +-
 arch/arm64/crypto/poly1305-armv8.pl           |  17 +
 arch/arm64/crypto/sha1-ce-core.S              |   4 +-
 arch/arm64/crypto/sha2-ce-core.S              |   4 +-
 arch/arm64/crypto/sha3-ce-core.S              |   4 +-
 arch/arm64/crypto/sha512-armv8.pl             |  11 +
 arch/arm64/crypto/sha512-ce-core.S            |   4 +-
 arch/arm64/crypto/sm3-ce-core.S               |   4 +-
 arch/arm64/crypto/sm4-ce-core.S               |   4 +-
 arch/x86/Kconfig                              |   1 +
 arch/x86/boot/compressed/Makefile             |   9 +-
 arch/x86/boot/compressed/fgkaslr.c            | 905 ++++++++++++++++++
 arch/x86/boot/compressed/kaslr.c              |   4 -
 arch/x86/boot/compressed/misc.c               | 157 ++-
 arch/x86/boot/compressed/misc.h               |  30 +
 arch/x86/boot/compressed/utils.c              |  13 +
 arch/x86/boot/compressed/vmlinux.symbols      |  19 +
 arch/x86/crypto/aegis128-aesni-asm.S          |  36 +-
 arch/x86/crypto/aes_ctrby8_avx-x86_64.S       |  12 +-
 arch/x86/crypto/aesni-intel_asm.S             | 116 ++-
 arch/x86/crypto/aesni-intel_avx-x86_64.S      |  32 +-
 arch/x86/crypto/blake2s-core.S                |   8 +-
 arch/x86/crypto/blowfish-x86_64-asm_64.S      |  16 +-
 arch/x86/crypto/camellia-aesni-avx-asm_64.S   |  28 +-
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S  |  28 +-
 arch/x86/crypto/camellia-x86_64-asm_64.S      |  16 +-
 arch/x86/crypto/cast5-avx-x86_64-asm_64.S     |  24 +-
 arch/x86/crypto/cast6-avx-x86_64-asm_64.S     |  20 +-
 arch/x86/crypto/chacha-avx2-x86_64.S          |  12 +-
 arch/x86/crypto/chacha-avx512vl-x86_64.S      |  12 +-
 arch/x86/crypto/chacha-ssse3-x86_64.S         |  16 +-
 arch/x86/crypto/crc32-pclmul_asm.S            |   4 +-
 arch/x86/crypto/crc32c-pcl-intel-asm_64.S     |   4 +-
 arch/x86/crypto/crct10dif-pcl-asm_64.S        |   4 +-
 arch/x86/crypto/des3_ede-asm_64.S             |   8 +-
 arch/x86/crypto/ghash-clmulni-intel_asm.S     |  12 +-
 arch/x86/crypto/nh-avx2-x86_64.S              |   4 +-
 arch/x86/crypto/nh-sse2-x86_64.S              |   4 +-
 arch/x86/crypto/poly1305-x86_64-cryptogams.pl |   8 +-
 arch/x86/crypto/serpent-avx-x86_64-asm_64.S   |  20 +-
 arch/x86/crypto/serpent-avx2-asm_64.S         |  20 +-
 arch/x86/crypto/serpent-sse2-i586-asm_32.S    |   8 +-
 arch/x86/crypto/serpent-sse2-x86_64-asm_64.S  |   8 +-
 arch/x86/crypto/sha1_avx2_x86_64_asm.S        |   4 +-
 arch/x86/crypto/sha1_ni_asm.S                 |   4 +-
 arch/x86/crypto/sha1_ssse3_asm.S              |   4 +-
 arch/x86/crypto/sha256-avx-asm.S              |   4 +-
 arch/x86/crypto/sha256-avx2-asm.S             |   4 +-
 arch/x86/crypto/sha256-ssse3-asm.S            |   4 +-
 arch/x86/crypto/sha256_ni_asm.S               |   4 +-
 arch/x86/crypto/sha512-avx-asm.S              |   4 +-
 arch/x86/crypto/sha512-avx2-asm.S             |   4 +-
 arch/x86/crypto/sha512-ssse3-asm.S            |   4 +-
 arch/x86/crypto/twofish-avx-x86_64-asm_64.S   |  20 +-
 arch/x86/crypto/twofish-i586-asm_32.S         |   8 +-
 arch/x86/crypto/twofish-x86_64-asm_64-3way.S  |   8 +-
 arch/x86/crypto/twofish-x86_64-asm_64.S       |   8 +-
 arch/x86/entry/entry_32.S                     |  24 +-
 arch/x86/entry/entry_64.S                     |  18 +-
 arch/x86/entry/thunk_32.S                     |   4 +-
 arch/x86/entry/thunk_64.S                     |   8 +-
 arch/x86/include/asm/boot.h                   |  13 +-
 arch/x86/include/asm/paravirt.h               |   2 +-
 arch/x86/include/asm/qspinlock_paravirt.h     |   2 +-
 arch/x86/kernel/acpi/wakeup_32.S              |   9 +-
 arch/x86/kernel/acpi/wakeup_64.S              |  10 +-
 arch/x86/kernel/ftrace_32.S                   |  19 +-
 arch/x86/kernel/ftrace_64.S                   |  28 +-
 arch/x86/kernel/irqflags.S                    |   4 +-
 arch/x86/kernel/kprobes/core.c                |   3 +-
 arch/x86/kernel/kvm.c                         |   2 +-
 arch/x86/kernel/relocate_kernel_32.S          |   2 +
 arch/x86/kernel/relocate_kernel_64.S          |   2 +
 arch/x86/kernel/vmlinux.lds.S                 |   6 +-
 arch/x86/kvm/emulate.c                        |   2 +-
 arch/x86/kvm/vmx/vmenter.S                    |   8 +-
 arch/x86/lib/clear_page_64.S                  |  12 +-
 arch/x86/lib/cmpxchg16b_emu.S                 |   4 +-
 arch/x86/lib/copy_mc_64.S                     |   8 +-
 arch/x86/lib/copy_page_64.S                   |   7 +-
 arch/x86/lib/copy_user_64.S                   |  18 +-
 arch/x86/lib/csum-copy_64.S                   |   4 +-
 arch/x86/lib/error-inject.c                   |   3 +-
 arch/x86/lib/getuser.S                        |  37 +-
 arch/x86/lib/hweight.S                        |   9 +-
 arch/x86/lib/iomap_copy_64.S                  |   4 +-
 arch/x86/lib/kaslr.c                          |  18 +-
 arch/x86/lib/memmove_64.S                     |   4 +-
 arch/x86/lib/memset_64.S                      |  12 +-
 arch/x86/lib/msr-reg.S                        |   8 +-
 arch/x86/lib/putuser.S                        |  18 +-
 arch/x86/mm/mem_encrypt_boot.S                |   8 +-
 arch/x86/platform/efi/efi_stub_64.S           |   4 +-
 arch/x86/platform/efi/efi_thunk_64.S          |   4 +-
 arch/x86/power/hibernate_asm_32.S             |  14 +-
 arch/x86/power/hibernate_asm_64.S             |  14 +-
 arch/x86/tools/relocs.c                       | 135 ++-
 arch/x86/tools/relocs.h                       |   4 +-
 arch/x86/tools/relocs_common.c                |  15 +-
 arch/x86/xen/xen-asm.S                        |  49 +-
 arch/x86/xen/xen-head.S                       |  10 +-
 include/asm-generic/vmlinux.lds.h             |  41 +-
 include/linux/decompress/mm.h                 |  12 +-
 include/linux/linkage.h                       |  76 ++
 include/uapi/linux/elf.h                      |   1 +
 init/Kconfig                                  |  51 +
 kernel/kallsyms.c                             | 158 ++-
 kernel/livepatch/core.c                       |  11 +
 kernel/module.c                               |  91 +-
 scripts/Makefile.build                        |  27 +-
 scripts/Makefile.lib                          |   7 +
 scripts/Makefile.modfinal                     |  36 +-
 scripts/Makefile.modpost                      |  22 +-
 scripts/gen_autoksyms.sh                      |  12 -
 scripts/generate_text_sections.pl             | 149 +++
 scripts/link-vmlinux.sh                       | 104 +-
 scripts/module.lds.S                          |  14 +-
 133 files changed, 2771 insertions(+), 757 deletions(-)
 create mode 100644 Documentation/security/fgkaslr.rst
 create mode 100644 arch/x86/boot/compressed/fgkaslr.c
 create mode 100644 arch/x86/boot/compressed/utils.c
 create mode 100644 arch/x86/boot/compressed/vmlinux.symbols
 create mode 100755 scripts/generate_text_sections.pl

Comments

Kees Cook Aug. 31, 2021, 5:27 p.m. UTC | #1
On Tue, Aug 31, 2021 at 04:40:52PM +0200, Alexander Lobakin wrote:
> This is a massive rework and a respin of Kristen Accardi's marvellous
> FG-KASLR series (v5).

Thanks for working on this! I know Marios has been looking at some of
this as well. I think he tracked down a kretprobes bug and has a fix
prepared.

> The major differences since v5 [0]:
> * You can now tune the number of functions per each section to
>   achieve the preferable vmlinux size or protection level. Default
>   is still as one section per each function.
>   This can be handy for storage-constrained systems. 4-8 fps are
>   still strong, but reduce the size of the final vmlinu{x,z}
>   significantly;

Interesting, but I'm not sure what the size issue is. v5's on-disk
image size issues were related to the large relocation table that was
used during decompress and layout, but would get discarded. The final
in-core image size was roughly the same size as a non-FGKASLR kernel
(since functions were already aligned even without -ffunction-sections).
How does the functions-per-section knob change image size?

> * I don't use orphan sections anymore. It's not reliable at all /
>   may differ from linker to linker, and also conflicts with
>   CONFIG_LD_ORPHAN_WARN which is great for catching random bugs ->
> * All the .text.* sections are now being described explicitly in the
>   linker script. A Perl script is used to take the original LDS, the
>   original object file, read a list of input sections from it and
>   generate the resulting LDS.
>   This costs a bit of linking time as LD tends to think hard when
>   processing scripts > 1 Mb. It adds about 40-60 seconds to the
>   whole linking process (BTF step, 2-3 kallsyms steps and the final
>   step), but "better safe than sorry".
>   In addition, that approach allows to reserve some space at the end
>   and add some link assertions ->

Yeah, this "hope that orphan handling does it right" bugged me too, but my
attempts to solve it looked much like yours: creating a linker file that
named all the sections. I found this to be prohibitively expensive at link
time (and that seems backed by your own measurements of an extra minute
or so at link time). If that's still the result of using a generated
linker file, we just need to depend on orphan handling. LD_ORPHAN_WARN
will still exist for non-FGKASLR builds, so the benefits will continue
to exist -- I think the correct solution is to have the linker grow a
"pass through" special target like "DISCARD", which just maps given
input section patterns into same-named output sections.

> * Input .text section now must be empty, otherwise the linkage will
>   be stopped. This is implemented by the size assertion in the
>   resulting LD script and is designed to plug the potentional layout
>   leakage. This also means that ->

I worry this will create unexpected problems for named sections that
weren't originally being randomized with the v5 FGKASLR.

> * "Regular" ASM functions are now being placed into unique separate
>   functions the same way compiler does this for C functions. This is
>   achieved by introducing and using several new macros which take
>   the symbol name as a base for its new section name.
>   This gives a better opportunity to both DCE and FG-KASLR, as ASM
>   code now can also be randomized or garbage-collected;

This is interesting! I think it'd be a good evolutionary step on top of
"basic FGKASLR".

> * It's now fully compatible with ClangLTO, ClangCFI,
>   CONFIG_LD_ORPHAN_WARN and some more stuff landed since the last
>   revision was published;

FWIW, v5 was too. :) I didn't have to do anything to v5 to make it
work with ClangLTO and ClangCFI.

> * Includes several fixes: relocations inside .altinstr_replacement
>   code and minor issues found and/or suggested by LKP robot.

Excellent!

> The series was compile-time and runtime tested on the following
> setups with no issues:
> - x86_64, GCC 11, Binutils 2.35;
> - x86_64, Clang/LLVM 12, ClangLTO + ClangCFI (from Sami's tree).

Great, this is a good start. One place we saw problems in the past was
with i386 build gotchas, so that'll need testing too.

> The first 4 patches are from the linux-kbuild tree and included
> to avoid merge conflicts and non-intuitive resolving of them.

Sounds good. It might be easier to base the series on linux-next, so a
smaller series. Though given the merge window just opened, it might make
more sense for a v7 to be based on v5.15-rc2 in three weeks.

> The series is also available here: [1]
> 
> [0] https://lore.kernel.org/kernel-hardening/20200923173905.11219-1-kristen@linux.intel.com
> [1] https://github.com/alobakin/linux/pull/3
> 
> The original v5 cover letter:

More notes below...

> 
> Function Granular Kernel Address Space Layout Randomization (fgkaslr)
> ---------------------------------------------------------------------
> 
> This patch set is an implementation of finer grained kernel address space
> randomization. It rearranges your kernel code at load time 
> on a per-function level granularity, with only around a second added to
> boot time.
> 
> Changes in v5:
> --------------
> * fixed a bug in the code which increases boot heap size for
>   CONFIG_FG_KASLR which prevented the boot heap from being increased
>   for CONFIG_FG_KASLR when using bzip2 compression. Thanks to Andy Lavr
>   for finding the problem and identifying the solution.
> * changed the adjustment of the orc_unwind_ip table at boot time to
>   disregard relocs associated with this table, and instead inspect the
>   entries separately. Relocs are not able to be used since they are
>   no longer correct once the table is resorted at buildtime.
> * changed how orc_unwind_ip addresses in randomized sections are identified
>   to include the byte immediately after the end of the section.
> * updated module code to use kvmalloc/kvfree based on suggestions from
>   Evgenii Shatokhin <eshatokhin@virtuozzo.com>.
> * changed kernel commandline to disable fgkaslr to simply "nofgkaslr" to
>   match the nokaslr option. fgkaslr="X" can be added at a later date
>   if it is needed.
> * Added a patch to force livepatch to require symbols to be unique if
>   using while fgkaslr either for core or modules.
> 
> Changes in v4:
> -------------
> * dropped the patch to split out change to STATIC definition in
>   x86/boot/compressed/misc.c and replaced with a patch authored
>   by Kees Cook to avoid the duplicate malloc definitions
> * Added a section to Documentation/admin-guide/kernel-parameters.txt
>   to document the fgkaslr boot option.
> * redesigned the patch to hide the new layout when reading
>   /proc/kallsyms. The previous implementation utilized a dynamically
>   allocated linked list to display the kernel and module symbols
>   in alphabetical order. The new implementation uses a randomly
>   shuffled index array to display the kernel and module symbols
>   in a random order.
> 
> Changes in v3:
> -------------
> * Makefile changes to accommodate CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
> * removal of extraneous ALIGN_PAGE from _etext changes
> * changed variable names in x86/tools/relocs to be less confusing
> * split out change to STATIC definition in x86/boot/compressed/misc.c
> * Updates to Documentation to make it more clear what is preserved in .text
> * much more detailed commit message for function granular KASLR patch
> * minor tweaks and changes that make for more readable code
> * this cover letter updated slightly to add additional details
> 
> Changes in v2:
> --------------
> * Fix to address i386 build failure
> * Allow module reordering patch to be configured separately so that
>   arm (or other non-x86_64 arches) can take advantage of module function
>   reordering. This support has not been tested by me, but was smoke tested by
>   Ard Biesheuvel <ardb@kernel.org> on arm.
> * Fix build issue when building on arm as reported by
>   Ard Biesheuvel <ardb@kernel.org> 
> 
> Patches to objtool are included because they are dependencies for this
> patchset, however they have been submitted by their maintainer separately.
> 
> Background
> ----------
> KASLR was merged into the kernel with the objective of increasing the
> difficulty of code reuse attacks. Code reuse attacks reuse existing code
> snippets to get around existing memory protections. They exploit software bugs
> which expose addresses of useful code snippets to control the flow of
> execution for their own nefarious purposes. KASLR moves the entire kernel
> code text as a unit at boot time in order to make addresses less predictable.
> The order of the code within the segment is unchanged - only the base address
> is shifted. There are a few shortcomings to this algorithm.
> 
> 1. Low Entropy - there are only so many locations the kernel can fit in. This
>    means an attacker could guess without too much trouble.
> 2. Knowledge of a single address can reveal the offset of the base address,
>    exposing all other locations for a published/known kernel image.
> 3. Info leaks abound.
> 
> Finer grained ASLR has been proposed as a way to make ASLR more resistant
> to info leaks. It is not a new concept at all, and there are many variations
> possible. Function reordering is an implementation of finer grained ASLR
> which randomizes the layout of an address space on a function level
> granularity. We use the term "fgkaslr" in this document to refer to the
> technique of function reordering when used with KASLR, as well as finer grained
> KASLR in general.
> 
> Proposed Improvement
> --------------------
> This patch set proposes adding function reordering on top of the existing
> KASLR base address randomization. The over-arching objective is incremental
> improvement over what we already have. It is designed to work in combination
> with the existing solution. The implementation is really pretty simple, and
> there are two main areas where changes occur:
> 
> * Build time
> 
> GCC has had an option to place functions into individual .text sections for
> many years now. This option can be used to implement function reordering at
> load time. The final compiled vmlinux retains all the section headers, which
> can be used to help find the address ranges of each function. Using this
> information and an expanded table of relocation addresses, individual text
> sections can be shuffled immediately after decompression. Some data tables
> inside the kernel that have assumptions about order require re-sorting
> after being updated when applying relocations. In order to modify these tables,
> a few key symbols are excluded from the objcopy symbol stripping process for
> use after shuffling the text segments.
> 
> Some highlights from the build time changes to look for:
> 
> The top level kernel Makefile was modified to add the gcc flag if it
> is supported. Currently, I am applying this flag to everything it is
> possible to randomize. Anything that is written in C and not present in a
> special input section is randomized. The final binary segment 0 retains a
> consolidated .text section, as well as all the individual .text.* sections.
> Future work could turn off this flag for selected files or even entire
> subsystems, although obviously at the cost of security.
> 
> The relocs tool is updated to add relative relocations. This information
> previously wasn't included because it wasn't necessary when moving the
> entire .text segment as a unit. 
> 
> A new file was created to contain a list of symbols that objcopy should
> keep. We use those symbols at load time as described below.
> 
> * Load time
> 
> The boot kernel was modified to parse the vmlinux elf file after
> decompression to check for our interesting symbols that we kept, and to
> look for any .text.* sections to randomize. The consolidated .text section
> is skipped and not moved. The sections are shuffled randomly, and copied
> into memory following the .text section in a new random order. The existing
> code which updated relocation addresses was modified to account for
> not just a fixed delta from the load address, but the offset that the function
> section was moved to. This requires inspection of each address to see if
> it was impacted by a randomization. We use a bsearch to make this less
> horrible on performance. Any tables that need to be modified with new
> addresses or resorted are updated using the symbol addresses parsed from the
> elf symbol table.
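
The per-address lookup described above can be modelled in a few lines. This is only a Python sketch of the idea, not the boot code itself: the `Section` record, the `adjust_address` helper, and all addresses are made up for illustration, and it assumes the per-section displacement simply adds to the base KASLR delta.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Section:
    start: int   # link-time start address of the .text.* section
    size: int    # section size in bytes
    offset: int  # displacement applied when the section was shuffled


def adjust_address(addr, sections, starts, base_delta):
    """Map a link-time address to its address after randomization.

    `sections` must be sorted by start address and `starts` must hold the
    matching start addresses, so a binary search finds the candidate section.
    """
    # Index of the last section whose start is <= addr (or -1 if none).
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0:
        sec = sections[i]
        if sec.start <= addr < sec.start + sec.size:
            # Address lies in a moved section: apply both displacements.
            return addr + sec.offset + base_delta
    # Address is outside every randomized section: base delta only.
    return addr + base_delta


# Hypothetical layout: three adjacent sections that were moved around.
sections = sorted(
    [
        Section(start=0x1000, size=0x100, offset=0x300),
        Section(start=0x1100, size=0x080, offset=-0x100),
        Section(start=0x1180, size=0x200, offset=0x040),
    ],
    key=lambda s: s.start,
)
starts = [s.start for s in sections]
```

With tens of thousands of sections, the binary search keeps each lookup logarithmic in the section count rather than linear, which is the point of the bsearch mentioned above.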
> 
> In order to hide our new layout, symbols reported through /proc/kallsyms
> will be displayed in a random order.
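
A shuffled index array of the kind the v4 changelog mentions can be sketched as follows. This is a Python illustration under assumptions, not the kallsyms code: `shuffled_indices` and the symbol tuples are invented for the example.

```python
import random


def shuffled_indices(count, rng):
    """Return the indices 0..count-1 in Fisher-Yates shuffled order."""
    order = list(range(count))
    for i in range(count - 1, 0, -1):
        j = rng.randrange(i + 1)  # uniform pick with 0 <= j <= i
        order[i], order[j] = order[j], order[i]
    return order


# Made-up symbol table, stored in address order.
symbols = [
    (0x1000, "T", "func_a"),
    (0x1100, "T", "func_b"),
    (0x1180, "T", "func_c"),
    (0x1380, "T", "func_d"),
]

rng = random.SystemRandom()  # OS-provided randomness
order = shuffled_indices(len(symbols), rng)

# Walk the table through the shuffled indices, so the order in which the
# symbols are listed says nothing about their order in memory.
listing = [symbols[i] for i in order]
```

The table itself stays sorted for lookups; only the presentation order goes through the index array.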
> 
> Security Considerations
> -----------------------
> The objective of this patch set is to improve a technology that is already
> merged into the kernel (KASLR). This code will not prevent all attacks,
> but should instead be considered as one of several tools that can be used.
> In particular, this code is meant to make KASLR more effective in the presence
> of info leaks.
> 
> How much entropy we are adding to the existing entropy of standard KASLR will
> depend on a few variables. Firstly and most obviously, the number of functions
> that are randomized matters. This implementation keeps the existing .text
> section for code that cannot be randomized - for example, because it was
> assembly code. The fewer sections there are to randomize, the less entropy.
> In addition, due to alignment (16 bytes for x86_64), the number of bits in an
> address that
> the attacker needs to guess is reduced, as the lower bits are identical.
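
As a rough way to put numbers on the two factors above, the sketch below computes an upper bound on the entropy of the section ordering (log2 of the number of permutations) and the number of low address bits fixed by alignment. It is a back-of-the-envelope model in Python, not something taken from the patches. The function names are invented, and the bound assumes every ordering is equally likely; it does not measure how hard it is to guess one particular function's address.

```python
import math


def permutation_entropy_bits(n_sections):
    """Upper bound, in bits, on the entropy of ordering n_sections sections.

    Uses log2(n!) = lgamma(n + 1) / ln(2) to avoid computing huge factorials.
    """
    return math.lgamma(n_sections + 1) / math.log(2)


def known_low_bits(alignment):
    """Number of low address bits fixed by a power-of-two alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return alignment.bit_length() - 1


# 16-byte alignment means the low 4 bits of every function address are zero.
low_bits = known_low_bits(16)

# Fewer randomized sections means a smaller bound on layout entropy.
small = permutation_entropy_bits(100)
large = permutation_entropy_bits(1000)
```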
> 
> Performance Impact
> ------------------
> There are two areas where function reordering can impact performance: boot
> time latency, and run time performance.
> 
> * Boot time latency
> This implementation of finer grained KASLR impacts the boot time of the kernel
> in several places. It requires additional parsing of the kernel ELF file to
> obtain the section headers of the sections to be randomized. It calls the
> random number generator for each section to be randomized to determine that
> section's new memory location. It copies the decompressed kernel into a new
> area of memory to avoid corruption when laying out the newly randomized
> sections. It increases the number of relocations the kernel has to perform at
> boot time vs. standard KASLR, and it also requires a lookup on each address
> that needs to be relocated to see if it was in a randomized section and needs
> to be adjusted by a new offset. Finally, it re-sorts a few data tables that
> are required to be sorted by address.
> 
> Booting a test VM on a modern, well appointed system showed an increase in
> latency of approximately 1 second.
> 
> * Run time
> The performance impact at run-time of function reordering varies by workload.
> Using kcbench, a kernel compilation benchmark, the performance of a kernel
> build with finer grained KASLR was about 1% slower than a kernel with standard
> KASLR. Analysis with perf showed a slightly higher percentage of 
> L1-icache-load-misses. Other workloads were examined as well, with varied
> results. Some workloads performed significantly worse under FGKASLR, while
> others stayed the same or were mysteriously better. In general, it will
> depend on the code flow whether or not finer grained KASLR will impact
> your workload, and how the underlying code was designed. Because the layout
> changes per boot, each time a system is rebooted the performance of a workload
> may change.
> 
> Future work could identify hot areas that may not be randomized and either
> leave them in the .text section or group them together into a single section
> that may be randomized. If grouping things together helps, one other thing to
> consider is that if we could identify text blobs that should be grouped together
> to benefit a particular code flow, it could be interesting to explore
> whether this security feature could also be used as a performance
> feature if you are interested in optimizing your kernel layout for a
> particular workload at boot time. Optimizing function layout for a particular
> workload has been researched and proven effective - for more information
> read the Facebook paper "Optimizing Function Placement for Large-Scale
> Data-Center Applications" (see references section below).
> 
> Image Size
> ----------
> Adding additional section headers as a result of compiling with
> -ffunction-sections will increase the size of the vmlinux ELF file.
> With a standard distro config, the resulting vmlinux was increased by
> about 3%. The compressed image is also increased due to the section headers,
> as well as the extra relocations that must be added. You can expect fgkaslr
> to increase the size of the compressed image by about 15%.
> 
> Memory Usage
> ------------
> fgkaslr increases the amount of heap that is required at boot time,
> although this extra memory is released when the kernel has finished
> decompression. As a result, it may not be appropriate to use this feature on
> systems without much memory.
> 
> Building
> --------
> To enable fine grained KASLR, you need to have the following config options
> set (including all the ones you would use to build normal KASLR)
> 
> CONFIG_FG_KASLR=y
> 
> In addition, fgkaslr is only supported for the X86_64 architecture.
> 
> Modules
> -------
> Modules are randomized similarly to the rest of the kernel by shuffling
> the sections at load time prior to moving them into memory. The module must
> also have been built with the -ffunction-sections compiler option.
> 
> Although fgkaslr for the kernel is only supported for the X86_64 architecture,
> it is possible to use fgkaslr with modules on other architectures. To enable
> this feature, select
> 
> CONFIG_MODULE_FG_KASLR=y
> 
> This option is selected automatically for X86_64 when CONFIG_FG_KASLR is set.
> 
> Disabling
> ---------
> Disabling normal KASLR using the nokaslr command line option also disables
> fgkaslr. It is also possible to disable fgkaslr separately by booting with
> nofgkaslr on the commandline.
> 
> References
> ----------
> There are a lot of academic papers which explore finer grained ASLR.
> This paper in particular contributed the most to my implementation design
> as well as my overall understanding of the problem space:
> 
> Selfrando: Securing the Tor Browser against De-anonymization Exploits,
> M. Conti, S. Crane, T. Frassetto, et al.
> 
> For more information on how function layout impacts performance, see:
> 
> Optimizing Function Placement for Large-Scale Data-Center Applications,
> G. Ottoni, B. Maher
> 
> Alexander Lobakin (7):
>   linkage: add macros for putting ASM functions into own sections
>   x86: conditionally place regular ASM functions into separate sections
>   FG-KASLR: use a scripted approach to handle .text.* sections
>   x86/boot: allow FG-KASLR to be selected
>   arm64/crypto: conditionally place ASM functions into separate sections
>   module: use a scripted approach for FG-KASLR
>   maintainers: add MAINTAINERS entry for FG-KASLR
> 
> Kees Cook (2):
>   x86/boot: Allow a "silent" kaslr random byte fetch
>   x86/boot/compressed: Avoid duplicate malloc() implementations

These two can get landed right away -- they're standalone fixes that
can safely go in -tip.

> 
> Kristen Carlson Accardi (9):
>   x86: tools/relocs: Support >64K section headers

Same for this.

>   x86: Makefile: Add build and config option for CONFIG_FG_KASLR
>   Make sure ORC lookup covers the entire _etext - _stext
>   x86/tools: Add relative relocs for randomized functions
>   x86: Add support for function granular KASLR
>   kallsyms: Hide layout
>   livepatch: only match unique symbols when using fgkaslr
>   module: Reorder functions
>   Documentation: add a documentation for FG-KASLR

I suspect it'll still be easier to review this series as a rebase v5
followed by the evolutionary improvements, since the "basic FGKASLR" has
been reviewed in the past, and is fairly noninvasive. The changes for
ASM, new .text rules, etc, make a lot more changes that I think would be
nice to have separate so reasonable a/b testing can be done.

I'll try to go through the individual patches soon, though I'm currently
pretty swamped. :)

I'm looking forward to having this feature finally landed; it's a nice
complement to future eXecute-Only memory work too.

-Kees

> 
> Masahiro Yamada (3):
>   kbuild: merge vmlinux_link() between the ordinary link and Clang LTO
>   kbuild: do not remove 'linux' link in scripts/link-vmlinux.sh
>   kbuild: merge vmlinux_link() between ARCH=um and other architectures
> 
> Sami Tolvanen (1):
>   kbuild: Fix TRIM_UNUSED_KSYMS with LTO_CLANG
> 
>  .../admin-guide/kernel-parameters.txt         |   6 +
>  Documentation/security/fgkaslr.rst            | 172 ++++
>  Documentation/security/index.rst              |   1 +
>  MAINTAINERS                                   |  12 +
>  Makefile                                      |  17 +-
>  arch/Kconfig                                  |   3 +
>  arch/arm64/crypto/aes-ce-ccm-core.S           |  16 +-
>  arch/arm64/crypto/aes-ce-core.S               |  16 +-
>  arch/arm64/crypto/aes-ce.S                    |   4 +-
>  arch/arm64/crypto/aes-cipher-core.S           |   8 +-
>  arch/arm64/crypto/aes-modes.S                 |  16 +-
>  arch/arm64/crypto/aes-neon.S                  |   4 +-
>  arch/arm64/crypto/aes-neonbs-core.S           |  38 +-
>  arch/arm64/crypto/chacha-neon-core.S          |  18 +-
>  arch/arm64/crypto/crct10dif-ce-core.S         |  14 +-
>  arch/arm64/crypto/ghash-ce-core.S             |  24 +-
>  arch/arm64/crypto/nh-neon-core.S              |   4 +-
>  arch/arm64/crypto/poly1305-armv8.pl           |  17 +
>  arch/arm64/crypto/sha1-ce-core.S              |   4 +-
>  arch/arm64/crypto/sha2-ce-core.S              |   4 +-
>  arch/arm64/crypto/sha3-ce-core.S              |   4 +-
>  arch/arm64/crypto/sha512-armv8.pl             |  11 +
>  arch/arm64/crypto/sha512-ce-core.S            |   4 +-
>  arch/arm64/crypto/sm3-ce-core.S               |   4 +-
>  arch/arm64/crypto/sm4-ce-core.S               |   4 +-
>  arch/x86/Kconfig                              |   1 +
>  arch/x86/boot/compressed/Makefile             |   9 +-
>  arch/x86/boot/compressed/fgkaslr.c            | 905 ++++++++++++++++++
>  arch/x86/boot/compressed/kaslr.c              |   4 -
>  arch/x86/boot/compressed/misc.c               | 157 ++-
>  arch/x86/boot/compressed/misc.h               |  30 +
>  arch/x86/boot/compressed/utils.c              |  13 +
>  arch/x86/boot/compressed/vmlinux.symbols      |  19 +
>  arch/x86/crypto/aegis128-aesni-asm.S          |  36 +-
>  arch/x86/crypto/aes_ctrby8_avx-x86_64.S       |  12 +-
>  arch/x86/crypto/aesni-intel_asm.S             | 116 ++-
>  arch/x86/crypto/aesni-intel_avx-x86_64.S      |  32 +-
>  arch/x86/crypto/blake2s-core.S                |   8 +-
>  arch/x86/crypto/blowfish-x86_64-asm_64.S      |  16 +-
>  arch/x86/crypto/camellia-aesni-avx-asm_64.S   |  28 +-
>  arch/x86/crypto/camellia-aesni-avx2-asm_64.S  |  28 +-
>  arch/x86/crypto/camellia-x86_64-asm_64.S      |  16 +-
>  arch/x86/crypto/cast5-avx-x86_64-asm_64.S     |  24 +-
>  arch/x86/crypto/cast6-avx-x86_64-asm_64.S     |  20 +-
>  arch/x86/crypto/chacha-avx2-x86_64.S          |  12 +-
>  arch/x86/crypto/chacha-avx512vl-x86_64.S      |  12 +-
>  arch/x86/crypto/chacha-ssse3-x86_64.S         |  16 +-
>  arch/x86/crypto/crc32-pclmul_asm.S            |   4 +-
>  arch/x86/crypto/crc32c-pcl-intel-asm_64.S     |   4 +-
>  arch/x86/crypto/crct10dif-pcl-asm_64.S        |   4 +-
>  arch/x86/crypto/des3_ede-asm_64.S             |   8 +-
>  arch/x86/crypto/ghash-clmulni-intel_asm.S     |  12 +-
>  arch/x86/crypto/nh-avx2-x86_64.S              |   4 +-
>  arch/x86/crypto/nh-sse2-x86_64.S              |   4 +-
>  arch/x86/crypto/poly1305-x86_64-cryptogams.pl |   8 +-
>  arch/x86/crypto/serpent-avx-x86_64-asm_64.S   |  20 +-
>  arch/x86/crypto/serpent-avx2-asm_64.S         |  20 +-
>  arch/x86/crypto/serpent-sse2-i586-asm_32.S    |   8 +-
>  arch/x86/crypto/serpent-sse2-x86_64-asm_64.S  |   8 +-
>  arch/x86/crypto/sha1_avx2_x86_64_asm.S        |   4 +-
>  arch/x86/crypto/sha1_ni_asm.S                 |   4 +-
>  arch/x86/crypto/sha1_ssse3_asm.S              |   4 +-
>  arch/x86/crypto/sha256-avx-asm.S              |   4 +-
>  arch/x86/crypto/sha256-avx2-asm.S             |   4 +-
>  arch/x86/crypto/sha256-ssse3-asm.S            |   4 +-
>  arch/x86/crypto/sha256_ni_asm.S               |   4 +-
>  arch/x86/crypto/sha512-avx-asm.S              |   4 +-
>  arch/x86/crypto/sha512-avx2-asm.S             |   4 +-
>  arch/x86/crypto/sha512-ssse3-asm.S            |   4 +-
>  arch/x86/crypto/twofish-avx-x86_64-asm_64.S   |  20 +-
>  arch/x86/crypto/twofish-i586-asm_32.S         |   8 +-
>  arch/x86/crypto/twofish-x86_64-asm_64-3way.S  |   8 +-
>  arch/x86/crypto/twofish-x86_64-asm_64.S       |   8 +-
>  arch/x86/entry/entry_32.S                     |  24 +-
>  arch/x86/entry/entry_64.S                     |  18 +-
>  arch/x86/entry/thunk_32.S                     |   4 +-
>  arch/x86/entry/thunk_64.S                     |   8 +-
>  arch/x86/include/asm/boot.h                   |  13 +-
>  arch/x86/include/asm/paravirt.h               |   2 +-
>  arch/x86/include/asm/qspinlock_paravirt.h     |   2 +-
>  arch/x86/kernel/acpi/wakeup_32.S              |   9 +-
>  arch/x86/kernel/acpi/wakeup_64.S              |  10 +-
>  arch/x86/kernel/ftrace_32.S                   |  19 +-
>  arch/x86/kernel/ftrace_64.S                   |  28 +-
>  arch/x86/kernel/irqflags.S                    |   4 +-
>  arch/x86/kernel/kprobes/core.c                |   3 +-
>  arch/x86/kernel/kvm.c                         |   2 +-
>  arch/x86/kernel/relocate_kernel_32.S          |   2 +
>  arch/x86/kernel/relocate_kernel_64.S          |   2 +
>  arch/x86/kernel/vmlinux.lds.S                 |   6 +-
>  arch/x86/kvm/emulate.c                        |   2 +-
>  arch/x86/kvm/vmx/vmenter.S                    |   8 +-
>  arch/x86/lib/clear_page_64.S                  |  12 +-
>  arch/x86/lib/cmpxchg16b_emu.S                 |   4 +-
>  arch/x86/lib/copy_mc_64.S                     |   8 +-
>  arch/x86/lib/copy_page_64.S                   |   7 +-
>  arch/x86/lib/copy_user_64.S                   |  18 +-
>  arch/x86/lib/csum-copy_64.S                   |   4 +-
>  arch/x86/lib/error-inject.c                   |   3 +-
>  arch/x86/lib/getuser.S                        |  37 +-
>  arch/x86/lib/hweight.S                        |   9 +-
>  arch/x86/lib/iomap_copy_64.S                  |   4 +-
>  arch/x86/lib/kaslr.c                          |  18 +-
>  arch/x86/lib/memmove_64.S                     |   4 +-
>  arch/x86/lib/memset_64.S                      |  12 +-
>  arch/x86/lib/msr-reg.S                        |   8 +-
>  arch/x86/lib/putuser.S                        |  18 +-
>  arch/x86/mm/mem_encrypt_boot.S                |   8 +-
>  arch/x86/platform/efi/efi_stub_64.S           |   4 +-
>  arch/x86/platform/efi/efi_thunk_64.S          |   4 +-
>  arch/x86/power/hibernate_asm_32.S             |  14 +-
>  arch/x86/power/hibernate_asm_64.S             |  14 +-
>  arch/x86/tools/relocs.c                       | 135 ++-
>  arch/x86/tools/relocs.h                       |   4 +-
>  arch/x86/tools/relocs_common.c                |  15 +-
>  arch/x86/xen/xen-asm.S                        |  49 +-
>  arch/x86/xen/xen-head.S                       |  10 +-
>  include/asm-generic/vmlinux.lds.h             |  41 +-
>  include/linux/decompress/mm.h                 |  12 +-
>  include/linux/linkage.h                       |  76 ++
>  include/uapi/linux/elf.h                      |   1 +
>  init/Kconfig                                  |  51 +
>  kernel/kallsyms.c                             | 158 ++-
>  kernel/livepatch/core.c                       |  11 +
>  kernel/module.c                               |  91 +-
>  scripts/Makefile.build                        |  27 +-
>  scripts/Makefile.lib                          |   7 +
>  scripts/Makefile.modfinal                     |  36 +-
>  scripts/Makefile.modpost                      |  22 +-
>  scripts/gen_autoksyms.sh                      |  12 -
>  scripts/generate_text_sections.pl             | 149 +++
>  scripts/link-vmlinux.sh                       | 104 +-
>  scripts/module.lds.S                          |  14 +-
>  133 files changed, 2771 insertions(+), 757 deletions(-)
>  create mode 100644 Documentation/security/fgkaslr.rst
>  create mode 100644 arch/x86/boot/compressed/fgkaslr.c
>  create mode 100644 arch/x86/boot/compressed/utils.c
>  create mode 100644 arch/x86/boot/compressed/vmlinux.symbols
>  create mode 100755 scripts/generate_text_sections.pl
> 
> -- 
> 2.31.1
>
Alexander Lobakin Sept. 1, 2021, 10:36 a.m. UTC | #2
From: Kees Cook <keescook@chromium.org>
Date: Tue, 31 Aug 2021 10:27:45 -0700

> On Tue, Aug 31, 2021 at 04:40:52PM +0200, Alexander Lobakin wrote:
> > This is a massive rework and a respin of Kristen Accardi's marvellous
> > FG-KASLR series (v5).
> 
> Thanks for working on this! I know Marios has been looking at some of
> this as well. I think he tracked down a kretprobes bug and has a fix
> prepared.

I was waiting for the fix to be landed in our discussion, but it
hasn't appeared there, so I queued the series without it. Will be
glad to finally see the fix and include it in v7.

> > The major differences since v5 [0]:
> > * You can now tune the number of functions per each section to
> >   achieve the preferable vmlinux size or protection level. Default
> >   is still as one section per each function.
> >   This can be handy for storage-constrained systems. 4-8 fps are
> >   still strong, but reduce the size of the final vmlinu{x,z}
> >   significantly;
> 
> Interesting, but I'm not sure what the size issue is. v5's on-disk
> image size issues were related to the large relocation table that was
> used during decompress and layout, but would get discarded. The final
> in-core image size was roughly the same size as a non-FGKASLR kernel
> (since functions were already aligned even without -ffunction-sections).
> How does the functions-per-section knob change image size?

Without FG-KASLR, we have only one .text section, and the total
section number is relatively small.
With FG-KASLR enabled, we have 40K+ separate text sections (I have
40K on a setup with ClangLTO and ClangCFI and about 48K on a
"regular" one) and each of them has its own entry in the ELF section
header table. Plus a separate .rela.text section for every single one
of them. That's the main reason for the size increase.
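
To give a feel for the scale, here is a small Python estimate of the section header table overhead alone. It is a sketch under assumptions: the helper names are invented, and it counts only the 64-byte `Elf64_Shdr` entries (one per text section plus one per matching .rela section), not section names or the relocation entries themselves.

```python
ELF64_SHDR_SIZE = 64  # sizeof(Elf64_Shdr) in bytes


def sections_needed(n_functions, funcs_per_section):
    """Number of text sections when grouping functions (ceiling division)."""
    return -(-n_functions // funcs_per_section)


def section_header_overhead(n_text_sections, with_rela=True):
    """Bytes of section header table used by the per-function sections.

    With with_rela=True, each text section is assumed to come with one
    .rela.text.* section that needs a header of its own.
    """
    headers = n_text_sections * (2 if with_rela else 1)
    return headers * ELF64_SHDR_SIZE


# One section per function versus 8 functions per section, for 48K functions.
per_function = section_header_overhead(sections_needed(48_000, 1))
grouped = section_header_overhead(sections_needed(48_000, 8))
```

Grouping 8 functions per section cuts this particular overhead by a factor of 8, which is the direction the functions-per-section knob pushes in.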

> > * I don't use orphan sections anymore. It's not reliable at all /
> >   may differ from linker to linker, and also conflicts with
> >   CONFIG_LD_ORPHAN_WARN which is great for catching random bugs ->
> > * All the .text.* sections are now being described explicitly in the
> >   linker script. A Perl script is used to take the original LDS, the
> >   original object file, read a list of input sections from it and
> >   generate the resulting LDS.
> >   This costs a bit of linking time as LD tends to think hard when
> >   processing scripts > 1 Mb. It adds about 40-60 seconds to the
> >   whole linking process (BTF step, 2-3 kallsyms steps and the final
> >   step), but "better safe than sorry".
> >   In addition, that approach allows to reserve some space at the end
> >   and add some link assertions ->
> 
> Yeah, this "hope that orphan handling does it right" bugged me too, but my
> attempts to solve it looked much like yours: creating a linker file that
> named all the sections. I found this to be prohibitively expensive at link
> time (and that seems backed by your own measurements of an extra minute
> or so at link time). If that's still the result of using a generated
> linker file, we just need to depend on orphan handling. LD_ORPHAN_WARN
> will still exist for non-FGKASLR builds, so the benefits will continue
> to exist -- I think the correct solution is to have the linker grow a
> "pass through" special target like "DISCARD", which just maps given
> input section patterns into same-named output sections.

We still have LD_ORPHAN_WARN on non-FG-KASLR builds, but we also
have a rather different set of sections with FG-KASLR enabled. For
example, I noticed the appearance of the .symtab_shndx section only
by virtue of LD_ORPHAN_WARN. So it's kinda not the same.
I don't see a problem in this extra minute. FG-KASLR is all about
security, and you often pay something for this. We already have a
size increase, and a small delay while booting, and we can't get
rid of them. With orphan sections you leave room for potential
flaws in the code, linker and/or linker script, which is really
unwanted in case of a security feature.
After all, ClangLTO increases the linking time a lot, and
TRIM_UNUSED_KSYMS builds almost the entire kernel two times in a
row, but nobody complains about this as there's nothing we can do
with it and it's the price you pay for the optimizations, so again,
I don't see a problem here.
I'll be glad to see approaches with no link time penalties and still
without "grey zones" like orphans and stuff, but I could only come up
with this. There is room here for future patches and optimizations.

> > * Input .text section now must be empty, otherwise the linkage will
> >   be stopped. This is implemented by the size assertion in the
> >   resulting LD script and is designed to plug the potential layout
> >   leakage. This also means that ->
> 
> I worry this will create unexpected problems for named sections that
> weren't originally being randomized with the v5 FGKASLR.

1. Input .text just contained a bunch of ASM functions (described
   below), none of them required any kind of special handling.
2. This was tested a lot.
3. We have plenty of time to test on a wide variety of setups since
   we missed the 5.15 window.

> > * "Regular" ASM functions are now being placed into unique separate
> >   sections the same way the compiler does this for C functions. This is
> >   achieved by introducing and using several new macros which take
> >   the symbol name as a base for its new section name.
> >   This gives a better opportunity to both DCE and FG-KASLR, as ASM
> >   code now can also be randomized or garbage-collected;
> 
> This is interesting! I think it'd be a good evolutionary step on top of
> "basic FGKASLR".

I still don't get why you're trying to split this series into two.
It's been almost a year since v5 was published, I doubt you can get
"basic FG-KASLR" accepted quickly just because it was reviewed back
then.
I prefer to provide a full picture of what I'm trying to bring, so
the community could review it all and throw much more ideas and
stuff.

> > * It's now fully compatible with ClangLTO, ClangCFI,
> >   CONFIG_LD_ORPHAN_WARN and some more stuff landed since the last
> >   revision was published;
> 
> FWIW, v5 was too. :) I didn't have to do anything to v5 to make it
> work with ClangLTO and ClangCFI.

Once again, repeating the thing I wrote earlier in our discussion:
ClangCFI, at least shadowed implementation, requires the first text
section of the module to be page-aligned and contain __cfi_check()
at the very beginning of this section. With FG-KASLR and without
special handling, this section gets randomized along with the
others, and ClangCFI either rejects almost all modules or panics
the kernel.

> > * Includes several fixes: relocations inside .altinstr_replacement
> >   code and minor issues found and/or suggested by LKP robot.
> 
> Excellent!
> 
> > The series was compile-time and runtime tested on the following
> > setups with no issues:
> > - x86_64, GCC 11, Binutils 2.35;
> > - x86_64, Clang/LLVM 12, ClangLTO + ClangCFI (from Sami's tree).
> 
> Great, this is a good start. One place we saw problems in the past was
> with i386 build gotchas, so that'll need testing too.

For now, FG_KASLR for x86 depends on X86_64. We might relax this
dependency later after enough testing or whatever (like it's been
done for ClangLTO).

> > The first 4 patches are from the linux-kbuild tree and included
> > to avoid merge conflicts and non-intuitive resolving of them.
> 
> Sounds good. It might be easier to base the series on linux-next, so a
> smaller series. Though given the merge window just opened, it might make
> more sense for a v7 to be based on v5.15-rc2 in three weeks.

I don't usually base any series on linux-next, because it contains
all the changes from all "for-next" branches and repos, while the
series finally gets accepted to the specific repo based on just
v5.x-rc1 (sometimes on -rc2). This may bring additional apply/merge
problems.

> > The series is also available here: [1]
> > 
> > [0] https://lore.kernel.org/kernel-hardening/20200923173905.11219-1-kristen@linux.intel.com
> > [1] https://github.com/alobakin/linux/pull/3
> > 
> > The original v5 cover letter:
> 
> More notes below...
> 
> > 
> > Function Granular Kernel Address Space Layout Randomization (fgkaslr)
> > ---------------------------------------------------------------------
> > 
> > This patch set is an implementation of finer grained kernel address space
> > randomization. It rearranges your kernel code at load time 
> > on a per-function level granularity, with only around a second added to
> > boot time.
> > 
> > Changes in v5:
> > --------------
> > * fixed a bug in the code which increases boot heap size for
> >   CONFIG_FG_KASLR which prevented the boot heap from being increased
> >   for CONFIG_FG_KASLR when using bzip2 compression. Thanks to Andy Lavr
> >   for finding the problem and identifying the solution.
> > * changed the adjustment of the orc_unwind_ip table at boot time to
> >   disregard relocs associated with this table, and instead inspect the
> >   entries separately. Relocs are not able to be used since they are
> >   no longer correct once the table is resorted at buildtime.
> > * changed how orc_unwind_ip addresses in randomized sections are identified
> >   to include the byte immediately after the end of the section.
> > * updated module code to use kvmalloc/kvfree based on suggestions from
> >   Evgenii Shatokhin <eshatokhin@virtuozzo.com>.
> > * changed kernel commandline to disable fgkaslr to simply "nofgkaslr" to
> >   match the nokaslr option. fgkaslr="X" can be added at a later date
> >   if it is needed.
> > * Added a patch to force livepatch to require symbols to be unique if
> >   using fgkaslr for either core or modules.
> > 
> > Changes in v4:
> > -------------
> > * dropped the patch to split out change to STATIC definition in
> >   x86/boot/compressed/misc.c and replaced with a patch authored
> >   by Kees Cook to avoid the duplicate malloc definitions
> > * Added a section to Documentation/admin-guide/kernel-parameters.txt
> >   to document the fgkaslr boot option.
> > * redesigned the patch to hide the new layout when reading
> >   /proc/kallsyms. The previous implementation utilized a dynamically
> >   allocated linked list to display the kernel and module symbols
> >   in alphabetical order. The new implementation uses a randomly
> >   shuffled index array to display the kernel and module symbols
> >   in a random order.
> > 
> > Changes in v3:
> > -------------
> > * Makefile changes to accommodate CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
> > * removal of extraneous ALIGN_PAGE from _etext changes
> > * changed variable names in x86/tools/relocs to be less confusing
> > * split out change to STATIC definition in x86/boot/compressed/misc.c
> > * Updates to Documentation to make it more clear what is preserved in .text
> > * much more detailed commit message for function granular KASLR patch
> > * minor tweaks and changes that make for more readable code
> > * this cover letter updated slightly to add additional details
> > 
> > Changes in v2:
> > --------------
> > * Fix to address i386 build failure
> > * Allow module reordering patch to be configured separately so that
> >   arm (or other non-x86_64 arches) can take advantage of module function
> >   reordering. This support has not been tested by me, but smoke tested by
> >   Ard Biesheuvel <ardb@kernel.org> on arm.
> > * Fix build issue when building on arm as reported by
> >   Ard Biesheuvel <ardb@kernel.org> 
> > 
> > Patches to objtool are included because they are dependencies for this
> > patchset, however they have been submitted by their maintainer separately.
> > 
> > Background
> > ----------
> > KASLR was merged into the kernel with the objective of increasing the
> > difficulty of code reuse attacks. Code reuse attacks reuse existing code
> > snippets to get around existing memory protections. They exploit software bugs
> > which expose addresses of useful code snippets to control the flow of
> > execution for their own nefarious purposes. KASLR moves the entire kernel
> > code text as a unit at boot time in order to make addresses less predictable.
> > The order of the code within the segment is unchanged - only the base address
> > is shifted. There are a few shortcomings to this algorithm.
> > 
> > 1. Low Entropy - there are only so many locations the kernel can fit in. This
> >    means an attacker could guess without too much trouble.
> > 2. Knowledge of a single address can reveal the offset of the base address,
> >    exposing all other locations for a published/known kernel image.
> > 3. Info leaks abound.
> > 
> > Finer grained ASLR has been proposed as a way to make ASLR more resistant
> > to info leaks. It is not a new concept at all, and there are many variations
> > possible. Function reordering is an implementation of finer grained ASLR
> > which randomizes the layout of an address space on a function level
> > granularity. We use the term "fgkaslr" in this document to refer to the
> > technique of function reordering when used with KASLR, as well as finer grained
> > KASLR in general.
> > 
> > Proposed Improvement
> > --------------------
> > This patch set proposes adding function reordering on top of the existing
> > KASLR base address randomization. The over-arching objective is incremental
> > improvement over what we already have. It is designed to work in combination
> > with the existing solution. The implementation is really pretty simple, and
> > there are 2 main areas where changes occur:
> > 
> > * Build time
> > 
> > GCC has had an option to place functions into individual .text sections for
> > many years now. This option can be used to implement function reordering at
> > load time. The final compiled vmlinux retains all the section headers, which
> > can be used to help find the address ranges of each function. Using this
> > information and an expanded table of relocation addresses, individual text
> > sections can be shuffled immediately after decompression. Some data tables
> > inside the kernel that have assumptions about order require re-sorting
> > after being updated when applying relocations. In order to modify these tables,
> > a few key symbols are excluded from the objcopy symbol stripping process for
> > use after shuffling the text segments.
> > 
> > Some highlights from the build time changes to look for:
> > 
> > The top level kernel Makefile was modified to add the gcc flag if it
> > is supported. Currently, I am applying this flag to everything it is
> > possible to randomize. Anything that is written in C and not present in a
> > special input section is randomized. The final binary segment 0 retains a
> > consolidated .text section, as well as all the individual .text.* sections.
> > Future work could turn off this flag for selected files or even entire
> > subsystems, although obviously at the cost of security.
> > 
> > The relocs tool is updated to add relative relocations. This information
> > previously wasn't included because it wasn't necessary when moving the
> > entire .text segment as a unit. 
> > 
> > A new file was created to contain a list of symbols that objcopy should
> > keep. We use those symbols at load time as described below.
> > 
> > * Load time
> > 
> > The boot kernel was modified to parse the vmlinux elf file after
> > decompression to check for our interesting symbols that we kept, and to
> > look for any .text.* sections to randomize. The consolidated .text section
> > is skipped and not moved. The sections are shuffled randomly, and copied
> > into memory following the .text section in a new random order. The existing
> > code which updated relocation addresses was modified to account for
> > not just a fixed delta from the load address, but the offset that the function
> > section was moved to. This requires inspection of each address to see if
> > it was impacted by a randomization. We use a bsearch to make this less
> > horrible on performance. Any tables that need to be modified with new
> > addresses or resorted are updated using the symbol addresses parsed from the
> > elf symbol table.
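
The load-time steps quoted above (shuffle the .text.* sections, then fix up
each relocated address with a binary search) can be modelled roughly as
follows. This is only an illustrative Python sketch, not the code from
arch/x86/boot/compressed/fgkaslr.c: the names shuffle_sections() and
adjust_address() are hypothetical, section alignment is ignored, and
Python's random module stands in for the boot-time RNG.

```python
import bisect
import random


def shuffle_sections(sections, base, rng):
    """Lay out sections in a random order starting at `base`.

    `sections` is a list of (name, old_start, size) tuples.
    Returns a list of (old_start, old_end, delta) tuples sorted by
    old_start, where delta is how far that section moved.
    Hypothetical helper: alignment padding is not modelled.
    """
    order = list(sections)
    rng.shuffle(order)
    moved = []
    cursor = base
    for _name, old_start, size in order:
        moved.append((old_start, old_start + size, cursor - old_start))
        cursor += size
    moved.sort()
    return moved


def adjust_address(addr, starts, moved, kaslr_delta):
    """Apply the base KASLR delta plus the per-section delta, if any.

    `starts` is the precomputed sorted list of old section starts, so
    each lookup is a binary search instead of a linear scan.
    """
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0:
        old_start, old_end, delta = moved[i]
        if old_start <= addr < old_end:
            return addr + kaslr_delta + delta
    # Not inside any randomized section: only the base offset applies.
    return addr + kaslr_delta
```

An address that falls outside every randomized section (for instance, in
the consolidated .text) only receives the base KASLR delta, which matches
the "is skipped and not moved" wording in the cover letter.
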
> > 
> > In order to hide our new layout, symbols reported through /proc/kallsyms
> > will be displayed in a random order.
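
The "randomly shuffled index array" approach mentioned for /proc/kallsyms
can be pictured with a small Fisher-Yates shuffle. Again this is only a
sketch under assumptions: shuffled_index() and kallsyms_view() are
hypothetical names, and Python's random module is not a security-grade RNG.

```python
import random


def shuffled_index(n, rng):
    """Return the indices 0..n-1 in a random order (Fisher-Yates)."""
    idx = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive on both ends
        idx[i], idx[j] = idx[j], idx[i]
    return idx


def kallsyms_view(symbols, rng):
    """Present every symbol exactly once, in shuffled order.

    The symbol table itself is left untouched; only the order in
    which entries are displayed changes.
    """
    return [symbols[i] for i in shuffled_index(len(symbols), rng)]
```

The point of the index array is that no sorting or list allocation per
symbol is needed: the table stays as it is and only the iteration order is
randomized.
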
> > 
> > Security Considerations
> > -----------------------
> > The objective of this patch set is to improve a technology that is already
> > merged into the kernel (KASLR). This code will not prevent all attacks,
> > but should instead be considered as one of several tools that can be used.
> > In particular, this code is meant to make KASLR more effective in the presence
> > of info leaks.
> > 
> > How much entropy we are adding to the existing entropy of standard KASLR will
> > depend on a few variables. Firstly and most obviously, the number of functions
> > that are randomized matters. This implementation keeps the existing .text
> > section for code that cannot be randomized - for example, because it was
> > assembly code. The fewer sections to randomize, the less entropy. In addition,
> > due to alignment (16 bytes for x86_64), the number of bits in an address that
> > the attacker needs to guess is reduced, as the lower bits are identical.
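
As a back-of-the-envelope illustration of the two factors named above
(number of shuffled sections, and alignment), the sketch below computes
log2(n!) for n sections and the number of low address bits fixed by a
power-of-two alignment. Both function names are hypothetical. Note that
log2(n!) is only an upper bound on the entropy of the ordering, assuming a
uniformly random permutation; it says nothing about how hard a specific
attack is.

```python
import math


def permutation_entropy_bits(n_sections):
    """Upper bound on ordering entropy: log2(n!) for n shuffled sections."""
    # lgamma(n + 1) == ln(n!), which avoids computing a huge factorial.
    return math.lgamma(n_sections + 1) / math.log(2)


def aligned_low_bits(alignment):
    """Number of low address bits that are always zero for a
    power-of-two alignment (e.g. 16-byte alignment fixes 4 bits)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return alignment.bit_length() - 1
```
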
> > 
> > Performance Impact
> > ------------------
> > There are two areas where function reordering can impact performance: boot
> > time latency, and run time performance.
> > 
> > * Boot time latency
> > This implementation of finer grained KASLR impacts the boot time of the kernel
> > in several places. It requires additional parsing of the kernel ELF file to
> > obtain the section headers of the sections to be randomized. It calls the
> > random number generator for each section to be randomized to determine that
> > section's new memory location. It copies the decompressed kernel into a new
> > area of memory to avoid corruption when laying out the newly randomized
> > sections. It increases the number of relocations the kernel has to perform at
> > boot time vs. standard KASLR, and it also requires a lookup on each address
> > that needs to be relocated to see if it was in a randomized section and needs
> > to be adjusted by a new offset. Finally, it re-sorts a few data tables that
> > are required to be sorted by address.
> > 
> > Booting a test VM on a modern, well appointed system showed an increase in
> > latency of approximately 1 second.
> > 
> > * Run time
> > The performance impact at run-time of function reordering varies by workload.
> > Using kcbench, a kernel compilation benchmark, the performance of a kernel
> > build with finer grained KASLR was about 1% slower than a kernel with standard
> > KASLR. Analysis with perf showed a slightly higher percentage of 
> > L1-icache-load-misses. Other workloads were examined as well, with varied
> > results. Some workloads performed significantly worse under FGKASLR, while
> > others stayed the same or were mysteriously better. In general, it will
> > depend on the code flow whether or not finer grained KASLR will impact
> > your workload, and how the underlying code was designed. Because the layout
> > changes per boot, each time a system is rebooted the performance of a workload
> > may change.
> > 
> > Future work could identify hot areas that may not be randomized and either
> > leave them in the .text section or group them together into a single section
> > that may be randomized. If grouping things together helps, one other thing to
> > consider is that if we could identify text blobs that should be grouped together
> > to benefit a particular code flow, it could be interesting to explore
> > whether this security feature could also be used as a performance
> > feature if you are interested in optimizing your kernel layout for a
> > particular workload at boot time. Optimizing function layout for a particular
> > workload has been researched and proven effective - for more information
> > read the Facebook paper "Optimizing Function Placement for Large-Scale
> > Data-Center Applications" (see references section below).
> > 
> > Image Size
> > ----------
> > Adding additional section headers as a result of compiling with
> > -ffunction-sections will increase the size of the vmlinux ELF file.
> > With a standard distro config, the resulting vmlinux was increased by
> > about 3%. The compressed image is also increased due to the header files,
> > as well as the extra relocations that must be added. You can expect fgkaslr
> > to increase the size of the compressed image by about 15%.
> > 
> > Memory Usage
> > ------------
> > fgkaslr increases the amount of heap that is required at boot time,
> > although this extra memory is released when the kernel has finished
> > decompression. As a result, it may not be appropriate to use this feature on
> > systems without much memory.
> > 
> > Building
> > --------
> > To enable fine grained KASLR, you need to have the following config options
> > set (including all the ones you would use to build normal KASLR)
> > 
> > CONFIG_FG_KASLR=y
> > 
> > In addition, fgkaslr is only supported for the X86_64 architecture.
> > 
> > Modules
> > -------
> > Modules are randomized similarly to the rest of the kernel by shuffling
> > the sections at load time prior to moving them into memory. The module must
> > also have been built with the -ffunction-sections compiler option.
> > 
> > Although fgkaslr for the kernel is only supported for the X86_64 architecture,
> > it is possible to use fgkaslr with modules on other architectures. To enable
> > this feature, select
> > 
> > CONFIG_MODULE_FG_KASLR=y
> > 
> > This option is selected automatically for X86_64 when CONFIG_FG_KASLR is set.
> > 
> > Disabling
> > ---------
> > Disabling normal KASLR using the nokaslr command line option also disables
> > fgkaslr. It is also possible to disable fgkaslr separately by booting with
> > nofgkaslr on the commandline.
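
The relationship between the two options described above can be written
down as a tiny truth table. This is a sketch only: kaslr_modes() is a
hypothetical helper, it assumes a kernel built with CONFIG_FG_KASLR=y, and
it does not reflect how the kernel actually parses its command line.

```python
def kaslr_modes(cmdline):
    """Return (kaslr_enabled, fgkaslr_enabled) for a boot command line."""
    args = cmdline.split()
    kaslr = "nokaslr" not in args
    # fgkaslr builds on top of base KASLR, so nokaslr disables both.
    fgkaslr = kaslr and "nofgkaslr" not in args
    return kaslr, fgkaslr
```
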
> > 
> > References
> > ----------
> > There are a lot of academic papers which explore finer grained ASLR.
> > This paper in particular contributed the most to my implementation design
> > as well as my overall understanding of the problem space:
> > 
> > Selfrando: Securing the Tor Browser against De-anonymization Exploits,
> > M. Conti, S. Crane, T. Frassetto, et al.
> > 
> > For more information on how function layout impacts performance, see:
> > 
> > Optimizing Function Placement for Large-Scale Data-Center Applications,
> > G. Ottoni, B. Maher
> > 
> > Alexander Lobakin (7):
> >   linkage: add macros for putting ASM functions into own sections
> >   x86: conditionally place regular ASM functions into separate sections
> >   FG-KASLR: use a scripted approach to handle .text.* sections
> >   x86/boot: allow FG-KASLR to be selected
> >   arm64/crypto: conditionally place ASM functions into separate sections
> >   module: use a scripted approach for FG-KASLR
> >   maintainers: add MAINTAINERS entry for FG-KASLR
> > 
> > Kees Cook (2):
> >   x86/boot: Allow a "silent" kaslr random byte fetch
> >   x86/boot/compressed: Avoid duplicate malloc() implementations
> 
> These two can get landed right away -- they're standalone fixes that
> can safely go in -tip.
> 
> > 
> > Kristen Carlson Accardi (9):
> >   x86: tools/relocs: Support >64K section headers
> 
> Same for this.

They make little to no sense for non-FG-KASLR systems. And none of
them are "pure" fixes.
The same could be said about e.g. ORC lookup patch, but again, it
makes no sense right now.

> >   x86: Makefile: Add build and config option for CONFIG_FG_KASLR
> >   Make sure ORC lookup covers the entire _etext - _stext
> >   x86/tools: Add relative relocs for randomized functions
> >   x86: Add support for function granular KASLR
> >   kallsyms: Hide layout
> >   livepatch: only match unique symbols when using fgkaslr
> >   module: Reorder functions
> >   Documentation: add a documentation for FG-KASLR
> 
> I suspect it'll still be easier to review this series as a rebase v5
> followed by the evolutionary improvements, since the "basic FGKASLR" has
> been reviewed in the past, and is fairly noninvasive. The changes for
> ASM, new .text rules, etc, make a lot more changes that I think would be
> nice to have separate so reasonable a/b testing can be done.

I don't see a point in testing it two times instead of just one, as
well as in delivering this feature in two halves. It sounds like
"let's introduce ClangLTO, but firstly only for modules, as LTO for
vmlinux requires changes in objtool code and a special handling for
the initcalls".
The changes you mentioned only seem invasive, in fact, they can
carry way less harm than the "basic FG-KASLR" itself.

> I'll try to go through the individual patches soon, though I'm currently
> pretty swamped. :)
> 
> I'm looking forward to having this feature finally landed; it's a nice
> complement to future eXecute-Only memory work too.
> 
> -Kees
> 
> > 
> > Masahiro Yamada (3):
> >   kbuild: merge vmlinux_link() between the ordinary link and Clang LTO
> >   kbuild: do not remove 'linux' link in scripts/link-vmlinux.sh
> >   kbuild: merge vmlinux_link() between ARCH=um and other architectures
> > 
> > Sami Tolvanen (1):
> >   kbuild: Fix TRIM_UNUSED_KSYMS with LTO_CLANG
> > 
> >  .../admin-guide/kernel-parameters.txt         |   6 +
> >  Documentation/security/fgkaslr.rst            | 172 ++++
> >  Documentation/security/index.rst              |   1 +
> >  MAINTAINERS                                   |  12 +
> >  Makefile                                      |  17 +-
> >  arch/Kconfig                                  |   3 +
> >  arch/arm64/crypto/aes-ce-ccm-core.S           |  16 +-
> >  arch/arm64/crypto/aes-ce-core.S               |  16 +-
> >  arch/arm64/crypto/aes-ce.S                    |   4 +-
> >  arch/arm64/crypto/aes-cipher-core.S           |   8 +-
> >  arch/arm64/crypto/aes-modes.S                 |  16 +-
> >  arch/arm64/crypto/aes-neon.S                  |   4 +-
> >  arch/arm64/crypto/aes-neonbs-core.S           |  38 +-
> >  arch/arm64/crypto/chacha-neon-core.S          |  18 +-
> >  arch/arm64/crypto/crct10dif-ce-core.S         |  14 +-
> >  arch/arm64/crypto/ghash-ce-core.S             |  24 +-
> >  arch/arm64/crypto/nh-neon-core.S              |   4 +-
> >  arch/arm64/crypto/poly1305-armv8.pl           |  17 +
> >  arch/arm64/crypto/sha1-ce-core.S              |   4 +-
> >  arch/arm64/crypto/sha2-ce-core.S              |   4 +-
> >  arch/arm64/crypto/sha3-ce-core.S              |   4 +-
> >  arch/arm64/crypto/sha512-armv8.pl             |  11 +
> >  arch/arm64/crypto/sha512-ce-core.S            |   4 +-
> >  arch/arm64/crypto/sm3-ce-core.S               |   4 +-
> >  arch/arm64/crypto/sm4-ce-core.S               |   4 +-
> >  arch/x86/Kconfig                              |   1 +
> >  arch/x86/boot/compressed/Makefile             |   9 +-
> >  arch/x86/boot/compressed/fgkaslr.c            | 905 ++++++++++++++++++
> >  arch/x86/boot/compressed/kaslr.c              |   4 -
> >  arch/x86/boot/compressed/misc.c               | 157 ++-
> >  arch/x86/boot/compressed/misc.h               |  30 +
> >  arch/x86/boot/compressed/utils.c              |  13 +
> >  arch/x86/boot/compressed/vmlinux.symbols      |  19 +
> >  arch/x86/crypto/aegis128-aesni-asm.S          |  36 +-
> >  arch/x86/crypto/aes_ctrby8_avx-x86_64.S       |  12 +-
> >  arch/x86/crypto/aesni-intel_asm.S             | 116 ++-
> >  arch/x86/crypto/aesni-intel_avx-x86_64.S      |  32 +-
> >  arch/x86/crypto/blake2s-core.S                |   8 +-
> >  arch/x86/crypto/blowfish-x86_64-asm_64.S      |  16 +-
> >  arch/x86/crypto/camellia-aesni-avx-asm_64.S   |  28 +-
> >  arch/x86/crypto/camellia-aesni-avx2-asm_64.S  |  28 +-
> >  arch/x86/crypto/camellia-x86_64-asm_64.S      |  16 +-
> >  arch/x86/crypto/cast5-avx-x86_64-asm_64.S     |  24 +-
> >  arch/x86/crypto/cast6-avx-x86_64-asm_64.S     |  20 +-
> >  arch/x86/crypto/chacha-avx2-x86_64.S          |  12 +-
> >  arch/x86/crypto/chacha-avx512vl-x86_64.S      |  12 +-
> >  arch/x86/crypto/chacha-ssse3-x86_64.S         |  16 +-
> >  arch/x86/crypto/crc32-pclmul_asm.S            |   4 +-
> >  arch/x86/crypto/crc32c-pcl-intel-asm_64.S     |   4 +-
> >  arch/x86/crypto/crct10dif-pcl-asm_64.S        |   4 +-
> >  arch/x86/crypto/des3_ede-asm_64.S             |   8 +-
> >  arch/x86/crypto/ghash-clmulni-intel_asm.S     |  12 +-
> >  arch/x86/crypto/nh-avx2-x86_64.S              |   4 +-
> >  arch/x86/crypto/nh-sse2-x86_64.S              |   4 +-
> >  arch/x86/crypto/poly1305-x86_64-cryptogams.pl |   8 +-
> >  arch/x86/crypto/serpent-avx-x86_64-asm_64.S   |  20 +-
> >  arch/x86/crypto/serpent-avx2-asm_64.S         |  20 +-
> >  arch/x86/crypto/serpent-sse2-i586-asm_32.S    |   8 +-
> >  arch/x86/crypto/serpent-sse2-x86_64-asm_64.S  |   8 +-
> >  arch/x86/crypto/sha1_avx2_x86_64_asm.S        |   4 +-
> >  arch/x86/crypto/sha1_ni_asm.S                 |   4 +-
> >  arch/x86/crypto/sha1_ssse3_asm.S              |   4 +-
> >  arch/x86/crypto/sha256-avx-asm.S              |   4 +-
> >  arch/x86/crypto/sha256-avx2-asm.S             |   4 +-
> >  arch/x86/crypto/sha256-ssse3-asm.S            |   4 +-
> >  arch/x86/crypto/sha256_ni_asm.S               |   4 +-
> >  arch/x86/crypto/sha512-avx-asm.S              |   4 +-
> >  arch/x86/crypto/sha512-avx2-asm.S             |   4 +-
> >  arch/x86/crypto/sha512-ssse3-asm.S            |   4 +-
> >  arch/x86/crypto/twofish-avx-x86_64-asm_64.S   |  20 +-
> >  arch/x86/crypto/twofish-i586-asm_32.S         |   8 +-
> >  arch/x86/crypto/twofish-x86_64-asm_64-3way.S  |   8 +-
> >  arch/x86/crypto/twofish-x86_64-asm_64.S       |   8 +-
> >  arch/x86/entry/entry_32.S                     |  24 +-
> >  arch/x86/entry/entry_64.S                     |  18 +-
> >  arch/x86/entry/thunk_32.S                     |   4 +-
> >  arch/x86/entry/thunk_64.S                     |   8 +-
> >  arch/x86/include/asm/boot.h                   |  13 +-
> >  arch/x86/include/asm/paravirt.h               |   2 +-
> >  arch/x86/include/asm/qspinlock_paravirt.h     |   2 +-
> >  arch/x86/kernel/acpi/wakeup_32.S              |   9 +-
> >  arch/x86/kernel/acpi/wakeup_64.S              |  10 +-
> >  arch/x86/kernel/ftrace_32.S                   |  19 +-
> >  arch/x86/kernel/ftrace_64.S                   |  28 +-
> >  arch/x86/kernel/irqflags.S                    |   4 +-
> >  arch/x86/kernel/kprobes/core.c                |   3 +-
> >  arch/x86/kernel/kvm.c                         |   2 +-
> >  arch/x86/kernel/relocate_kernel_32.S          |   2 +
> >  arch/x86/kernel/relocate_kernel_64.S          |   2 +
> >  arch/x86/kernel/vmlinux.lds.S                 |   6 +-
> >  arch/x86/kvm/emulate.c                        |   2 +-
> >  arch/x86/kvm/vmx/vmenter.S                    |   8 +-
> >  arch/x86/lib/clear_page_64.S                  |  12 +-
> >  arch/x86/lib/cmpxchg16b_emu.S                 |   4 +-
> >  arch/x86/lib/copy_mc_64.S                     |   8 +-
> >  arch/x86/lib/copy_page_64.S                   |   7 +-
> >  arch/x86/lib/copy_user_64.S                   |  18 +-
> >  arch/x86/lib/csum-copy_64.S                   |   4 +-
> >  arch/x86/lib/error-inject.c                   |   3 +-
> >  arch/x86/lib/getuser.S                        |  37 +-
> >  arch/x86/lib/hweight.S                        |   9 +-
> >  arch/x86/lib/iomap_copy_64.S                  |   4 +-
> >  arch/x86/lib/kaslr.c                          |  18 +-
> >  arch/x86/lib/memmove_64.S                     |   4 +-
> >  arch/x86/lib/memset_64.S                      |  12 +-
> >  arch/x86/lib/msr-reg.S                        |   8 +-
> >  arch/x86/lib/putuser.S                        |  18 +-
> >  arch/x86/mm/mem_encrypt_boot.S                |   8 +-
> >  arch/x86/platform/efi/efi_stub_64.S           |   4 +-
> >  arch/x86/platform/efi/efi_thunk_64.S          |   4 +-
> >  arch/x86/power/hibernate_asm_32.S             |  14 +-
> >  arch/x86/power/hibernate_asm_64.S             |  14 +-
> >  arch/x86/tools/relocs.c                       | 135 ++-
> >  arch/x86/tools/relocs.h                       |   4 +-
> >  arch/x86/tools/relocs_common.c                |  15 +-
> >  arch/x86/xen/xen-asm.S                        |  49 +-
> >  arch/x86/xen/xen-head.S                       |  10 +-
> >  include/asm-generic/vmlinux.lds.h             |  41 +-
> >  include/linux/decompress/mm.h                 |  12 +-
> >  include/linux/linkage.h                       |  76 ++
> >  include/uapi/linux/elf.h                      |   1 +
> >  init/Kconfig                                  |  51 +
> >  kernel/kallsyms.c                             | 158 ++-
> >  kernel/livepatch/core.c                       |  11 +
> >  kernel/module.c                               |  91 +-
> >  scripts/Makefile.build                        |  27 +-
> >  scripts/Makefile.lib                          |   7 +
> >  scripts/Makefile.modfinal                     |  36 +-
> >  scripts/Makefile.modpost                      |  22 +-
> >  scripts/gen_autoksyms.sh                      |  12 -
> >  scripts/generate_text_sections.pl             | 149 +++
> >  scripts/link-vmlinux.sh                       | 104 +-
> >  scripts/module.lds.S                          |  14 +-
> >  133 files changed, 2771 insertions(+), 757 deletions(-)
> >  create mode 100644 Documentation/security/fgkaslr.rst
> >  create mode 100644 arch/x86/boot/compressed/fgkaslr.c
> >  create mode 100644 arch/x86/boot/compressed/utils.c
> >  create mode 100644 arch/x86/boot/compressed/vmlinux.symbols
> >  create mode 100755 scripts/generate_text_sections.pl
> > 
> > -- 
> > 2.31.1
> > 
> 
> -- 
> Kees Cook

Thanks,
Al
Kees Cook Sept. 2, 2021, 1:36 a.m. UTC | #3
On Wed, Sep 01, 2021 at 12:36:58PM +0200, Alexander Lobakin wrote:
> Without FG-KASLR, we have only one .text section, and the total
> section number is relatively small.
> With FG-KASLR enabled, we have 40K+ separate text sections (I have
> 40K on a setup with ClangLTO and ClangCFI and about 48K on a
> "regular" one) and each of them is described in the ELF header. Plus
> a separate .rela.text section for every single one of them. That's the
> main reason for the size increases.

If you have the size comparisons handy, I'd love to see them. My memory
from v5 was that none of that ends up in-core. And in that case, why
limit the entropy of the resulting layout?

> We still have LD_ORPHAN_WARN on non-FG-KASLR builds, but we also
> have a rather different set of sections with FG-KASLR enabled. For
> example, I noticed the appearance of the .symtab_shndx section only
> by virtue of LD_ORPHAN_WARN. So it's kinda not the same.

Agreed: I'd rather have LD_ORPHAN_WARN always enabled.

> I don't see a problem in this extra minute. FG-KASLR is all about

But not at this cost. Maybe the x86 maintainers will disagree, but I see
this as a prohibitive cost to doing development work under FGKASLR, and
if we expect this to become the default in distros, no one is going to
be happy with that change. Link time dominates the partial rebuild time,
so my opinion is that it should not be so inflated if not absolutely
needed. Perhaps once the link time bugs in ld.bfd and ld.lld get fixed,
but not now.

> security, and you often pay something for this. We already have a
> size increase, and a small delay while booting, and we can't get
> rid of them. With orphan sections you leave a space for potential

There's a difference between development time costs and run time costs.
I don't think the LD_ORPHAN_WARN coverage is worth it in this case.

Either way, we need to fix the linker.

> flaws of the code, linker and/or linker script, which is really
> unwanted in case of a security feature.
> After all, ClangLTO increases the linking time a lot, and
> TRIM_UNUSED_KSYMS builds almost the entire kernel two times in a
> row, but nobody complains about this as there's nothing we can do
> with it and it's the price you pay for the optimizations, so again,
> I don't see a problem here.

I get what you mean with regard to getting the perfect situation, but
the kernel went 29 years without LD_ORPHAN_WARN. :) Anyway, we'll see
what other folks think, I guess.

> I still don't get why you're trying to split this series into two.
> It's been almost a year since v5 was published, I doubt you can get
> "basic FG-KASLR" accepted quickly just because it was reviewed back
> then.

Well, because it was blocked then by a single bug, and everything else
you've described are distinct improvements on v5, so to me it makes
sense to have it separated into those phases. I don't mean split the
series, I mean rearrange the series so that a rebased v5 is at the
start, and the improvements follow.

> I prefer to provide a full picture of what I'm trying to bring, so
> the community could review it all and throw much more ideas and
> stuff.

Understood. I am suggesting some ideas about how it might help with
review. :)

> > > * It's now fully compatible with ClangLTO, ClangCFI,
> > >   CONFIG_LD_ORPHAN_WARN and some more stuff landed since the last
> > >   revision was published;
> > 
> > FWIW, v5 was too. :) I didn't have to do anything to v5 to make it
> > work with ClangLTO and ClangCFI.
> 
> Once again, repeating the thing I wrote earlier in our discussion:
> ClangCFI, at least shadowed implementation, requires the first text
> section of the module to be page-aligned and contain __cfi_check()
> at the very beginning of this section. With FG-KASLR and without
> special handling, this section gets randomized along with the
> others, and ClangCFI either rejects almost all modules or panics
> the kernel.

Ah-ha, thanks. I must have missed your answer to this earlier. I had
probably done my initial v5 testing without modules.

> > Great, this is a good start. One place we saw problems in the past was
> > with i386 build gotchas, so that'll need testing too.
> 
> For now, FG_KASLR for x86 depends on X86_64. We might relax this
> dependency later after enough testing or whatsoever (like it's been
> done for ClangLTO).

Yes, but we've had a history of making big patches that don't _intend_ to
break the i386 build, but they do anyway. Hence my question.

> > Sounds good. It might be easier to base the series on linux-next, so a
> > smaller series. Though given the merge window just opened, it might make
> > more sense for a v7 to be based on v5.15-rc2 in three weeks.
> 
> I don't usually base any series on linux-next, because it contains
> all the changes from all "for-next" branches and repos, while the
> series finally gets accepted to the specific repo based on just
> v5.x-rc1 (sometimes on -rc2). This may bring additional apply/merge
> problems.

Understood. I just find it confusing to include patches on lkml that
already exist in a -next branch. Perhaps base on kbuild -next?

> > > Kees Cook (2):
> > >   x86/boot: Allow a "silent" kaslr random byte fetch
> > >   x86/boot/compressed: Avoid duplicate malloc() implementations
> > 
> > These two can get landed right away -- they're standalone fixes that
> > can safely go in -tip.
> > 
> > > 
> > > Kristen Carlson Accardi (9):
> > >   x86: tools/relocs: Support >64K section headers
> > 
> > Same for this.
> 
> They make little to no sense for non-FG-KASLR systems. And none of
> them are "pure" fixes.
> The same could be said about e.g. ORC lookup patch, but again, it
> makes no sense right now.

*shrug* They're trivial changes that have been reviewed before, so it
seems like we can avoid resending them every time.

> > I suspect it'll still be easier to review this series as a rebase v5
> > followed by the evolutionary improvements, since the "basic FGKASLR" has
> > been reviewed in the past, and is fairly noninvasive. The changes for
> > ASM, new .text rules, etc, make a lot more changes that I think would be
> > nice to have separate so reasonable a/b testing can be done.
> 
> I don't see a point in testing it two times instead of just one, as
> well as in delivering this feature in two halves. It sounds like
> "let's introduce ClangLTO, but firstly only for modules, as LTO for
> vmlinux requires changes in objtool code and a special handling for
> the initcalls".
> The changes you mentioned only seem invasive, in fact, they can
> carry way less harm than the "basic FG-KASLR" itself.

Mostly it's a question of building on prior testing (v5 worked), so that
new changes can be debugged if they cause problems. Regardless, it's
been so long, perhaps it won't matter to other reviewers and they'll
want to just start over from scratch.

-Kees
Alexander Lobakin Sept. 3, 2021, 11:19 a.m. UTC | #4
> From: Kees Cook <keescook@chromium.org>
> Date: Wed, 1 Sep 2021 18:36:59 -0700
> 
> On Wed, Sep 01, 2021 at 12:36:58PM +0200, Alexander Lobakin wrote:
> > Without FG-KASLR, we have only one .text section, and the total
> > section number is relatively small.
> > With FG-KASLR enabled, we have 40K+ separate text sections (I have
> > 40K on a setup with ClangLTO and ClangCFI and about 48K on a
> > "regular" one) and each of them is described in the ELF header. Plus
> > a separate .rela.text section for every single one of them. That's the
> > main reason for the size increases.
> 
> If you have the size comparisons handy, I'd love to see them. My memory
> from v5 was that none of that ends up in-core. And in that case, why
> limit the entropy of the resulting layout?

My testing machine is down for now, but I could send a size
comparison later. It's something like 10 Mb of difference in the
uncompressed kernel between 1 and 4 fps or so.

> > We still have LD_ORPHAN_WARN on non-FG-KASLR builds, but we also
> > have a rather different set of sections with FG-KASLR enabled. For
> > example, I noticed the appearance of the .symtab_shndx section only
> > by virtue of LD_ORPHAN_WARN. So it's kinda not the same.
> 
> Agreed: I'd rather have LD_ORPHAN_WARN always enabled.
> 
> > I don't see a problem in this extra minute. FG-KASLR is all about
> 
> But not at this cost. Maybe the x86 maintainers will disagree, but I see
> this as a prohibitive cost to doing development work under FGKASLR, and
> if we expect this to become the default in distros, no one is going to
> be happy with that change. Link time dominates the partial rebuild time,
> so my opinion is that it should not be so inflated if not absolutely
> needed. Perhaps once the link time bugs in ld.bfd and ld.lld get fixed,
> but not now.

I don't think FG-KASLR will be enabled by default in distros. Apart
from linking time, it also increases cache misses a lot, and when
it comes to performance-critical use cases like high-speed servers
and datacenters, I don't believe their maintainers would consider
FG-KASLR.
Speaking about distros, almost no build systems to my knowledge use
partial building, so this is only a downside for developers.

> > security, and you often pay something for this. We already have a
> > size increase, and a small delay while booting, and we can't get
> > rid of them. With orphan sections you leave a space for potential
> 
> There's a difference between development time costs and run time costs.
> I don't think the LD_ORPHAN_WARN coverage is worth it in this case.
> 
> Either way, we need to fix the linker.

I agree on that, I was surprised both BFD and LLD choke on big LD
scripts.

> > flaws of the code, linker and/or linker script, which is really
> > unwanted in case of a security feature.
> > After all, ClangLTO increases the linking time a lot, and
> > TRIM_UNUSED_KSYMS builds almost the entire kernel two times in a
> > row, but nobody complains about this as there's nothing we can do
> > about it and it's the price you pay for the optimizations, so again,
> > I don't see a problem here.
> 
> I get what you mean with regard to getting the perfect situation, but
> the kernel went 29 years without LD_ORPHAN_WARN. :) Anyway, we'll see
> what other folks think, I guess.

Also agree, let's wait for more opinions on that, I'm open to
everything.

> > I still don't get why you're trying to split this series into two.
> > It's been almost a year since v5 was published, I doubt you can get
> > "basic FG-KASLR" accepted quickly just because it was reviewed back
> > then.
> 
> Well, because it was blocked then by a single bug, and everything else
> you've described are distinct improvements on v5, so to me it makes
> sense to have it separated into those phases. I don't mean split the
> series, I mean rearrange the series so that a rebased v5 is at the
> start, and the improvements follow.
> 
> > I prefer to provide a full picture of what I'm trying to bring, so
> > the community could review it all and throw in many more ideas and
> > stuff.
> 
> Understood. I am suggesting some ideas about how it might help with
> review. :)
> 
> > > > * It's now fully compatible with ClangLTO, ClangCFI,
> > > >   CONFIG_LD_ORPHAN_WARN and some more stuff landed since the last
> > > >   revision was published;
> > > 
> > > FWIW, v5 was too. :) I didn't have to do anything to v5 to make it
> > > work with ClangLTO and ClangCFI.
> > 
> > Once again, repeating the thing I wrote earlier in our discussion:
> > ClangCFI, at least shadowed implementation, requires the first text
> > section of the module to be page-aligned and contain __cfi_check()
> > at the very beginning of this section. With FG-KASLR and without
> > special handling, this section gets randomized along with the
> > others, and ClangCFI either rejects almost all modules or panics
> > the kernel.
> 
> Ah-ha, thanks. I must have missed your answer to this earlier. I had
> probably done my initial v5 testing without modules.
> 
> > > Great, this is a good start. One place we saw problems in the past was
> > > with i386 build gotchas, so that'll need testing too.
> > 
> > For now, FG_KASLR for x86 depends on X86_64. We might relax this
> > dependency later after enough testing or whatsoever (like it's been
> > done for ClangLTO).
> 
> Yes, but we've had a history of making big patches that don't _intend_ to
> break the i386 build, but they do anyway. Hence my question.
> 
> > > Sounds good. It might be easier to base the series on linux-next, so a
> > > smaller series. Though given the merge window just opened, it might make
> > > more sense for a v7 to be based on v5.15-rc2 in three weeks.
> > 
> > I don't usually base any series on linux-next, because it contains
> > all the changes from all "for-next" branches and repos, while the
> > series finally gets accepted to the specific repo based on just
> > v5.x-rc1 (sometimes on -rc2). This may bring additional apply/merge
> > problems.
> 
> Understood. I just find it confusing to include patches on lkml that
> already exist in a -next branch. Perhaps base on kbuild -next?

That's not a problem anymore I believe, since the series won't make
the 5.15 window, so the rebased v7 will be on top of 5.15-rc1 which
will already contain those Kbuild fixes.

> > > > Kees Cook (2):
> > > >   x86/boot: Allow a "silent" kaslr random byte fetch
> > > >   x86/boot/compressed: Avoid duplicate malloc() implementations
> > > 
> > > These two can get landed right away -- they're standalone fixes that
> > > can safely go in -tip.
> > > 
> > > > 
> > > > Kristen Carlson Accardi (9):
> > > >   x86: tools/relocs: Support >64K section headers
> > > 
> > > Same for this.
> > 
> > They make little to no sense for non-FG-KASLR systems. And none of
> > them are "pure" fixes.
> > The same could be said about e.g. ORC lookup patch, but again, it
> > makes no sense right now.
> 
> *shrug* They're trivial changes that have been reviewed before, so it
> seems like we can avoid resending them every time.
> 
> > > I suspect it'll still be easier to review this series as a rebased v5
> > > followed by the evolutionary improvements, since the "basic FGKASLR" has
> > > been reviewed in the past, and is fairly noninvasive. The changes for
> > > ASM, new .text rules, etc, make a lot more changes that I think would be
> > > nice to have separate so reasonable a/b testing can be done.
> > 
> > I don't see a point in testing it two times instead of just one, as
> > well as in delivering this feature in two halves. It sounds like
> > "let's introduce ClangLTO, but firstly only for modules, as LTO for
> > vmlinux requires changes in objtool code and a special handling for
> > the initcalls".
> > The changes you mentioned only seem invasive, in fact, they can
> > carry way less harm than the "basic FG-KASLR" itself.
> 
> Mostly it's a question of building on prior testing (v5 worked), so that
> new changes can be debugged if they cause problems. Regardless, it's
> been so long, perhaps it won't matter to other reviewers and they'll
> want to just start over from scratch.
> 
> -Kees
> 
> -- 
> Kees Cook

Thanks,
Al