mbox series

[v4,00/10] klp-convert livepatch build tooling

Message ID 20190509143859.9050-1-joe.lawrence@redhat.com (mailing list archive)
Headers show
Series klp-convert livepatch build tooling | expand

Message

Joe Lawrence May 9, 2019, 2:38 p.m. UTC
Hi all,

Thanks for the feedback to v3, I've incorporated a few more bug fixes
and style/spelling nitpicks for the series.  v4 is a bit of a resting
place to collect loose ends before considering some heavier future work,
namely arch-specific special section support.  See the TODO section
below for more details.


Previous versions
-----------------

RFC:
  https://lore.kernel.org/lkml/cover.1477578530.git.jpoimboe@redhat.com/
v2:
  https://lore.kernel.org/lkml/f52d29f7-7d1b-ad3d-050b-a9fa8878faf2@redhat.com/
v3:
  https://lore.kernel.org/lkml/20190410155058.9437-1-joe.lawrence@redhat.com/


Summary
-------

Livepatches may use symbols which are not contained in its own scope,
and, because of that, may end up compiled with relocations that will
only be resolved during module load. Yet, when the referenced symbols are
not exported, solving this relocation requires information on the object
that holds the symbol (either vmlinux or modules) and its position inside
the object, as an object may contain multiple symbols with the same name.
Providing such information must be done accordingly to what is specified
in Documentation/livepatch/module-elf-format.txt.

Currently, there is no trivial way to embed the required information as
requested in the final livepatch elf object. klp-convert solves this
problem in two different forms: (i) by relying on a symbol map, which is
built during kernel compilation, to automatically infer the relocation
targeted symbol, and, when such inference is not possible (ii) by using
annotations in the elf object to convert the relocation accordingly to
the specification, enabling it to be handled by the livepatch loader.

Given the above, add support for symbol mapping in the form of
Symbols.list file; add klp-convert tool; integrate klp-convert tool into
kbuild; make livepatch modules discernible during kernel compilation
pipeline; add data-structure and macros to enable users to annotate
livepatch source code; make modpost stage compatible with livepatches;
update livepatch-sample and update documentation.

The patch was tested under three use-cases:

use-case 1: There is a relocation in the lp that can be automatically
resolved by klp-convert.  For example. see the saved_command_line
variable in lib/livepatch/test_klp_convert2.c.

use-case 2: There is a relocation in the lp that cannot be automatically
resolved, as the name of the respective symbol appears in multiple
objects. The livepatch contains an annotation to enable a correct
relocation.  See the KLP_MODULE_RELOC / KLP_SYMPOS annotation sections
in lib/livepatch/test_klp_convert{1,2}.c.

use-case 3: There is a relocation in the lp that cannot be automatically
resolved similarly as 2, but no annotation was provided in the
livepatch, triggering an error during compilation.  Reproducible by
removing the KLP_MODULE_RELOC / KLP_SYMPOS annotation sections in
lib/livepatch/test_klp_convert{1,2}.c.


Branches
--------

For review purposes, I posted two branches up to github:

  1 - an expanded branch with changes separate from the original
  https://github.com/joe-lawrence/linux/tree/klp-convert-v4-expanded

  2 - a squashed branch of (1) that comprises v4:
  https://github.com/joe-lawrence/linux/tree/klp-convert-v4

Non-trivial commits in the expanded branch have some extra commentary
and details for debugging in the commit message that were dropped when
squashing into their respective parent commits.


TODO
----

Summarized from the v3 thread, thanks to Miroslav, Joao and Josh for
feedback and parsing my long braindumps.

- Special (architecture specific) section support:

  .altinstructions, .altinst_replacement
  .parainstructions
   __jump_table

  We want to apply livepatch relocations *before* these sections are
  processed.  Or more precisely, the special section data structures
  entries which directly or indirectly involve livepatch relocations.
  Those need to be extracted into "klp.arch" sections, a corresponding
  rela section generated, and the kernel needs supporting code to handle
  deferred processing.

  Note that there is already x86-arch code present for handling
  .altinstruction and .parainstructions sections in
  arch/x86/kernel/livepatch.c.

- Support for multiple homonym <object, name> symbols with unique
  <position> values.  Consider a target module with symbol table that
  looks like:

     29: ... FILE    LOCAL  DEFAULT      ABS test_klp_convert_mod_a.c
     32: ... FUNC    LOCAL  DEFAULT        3 get_homonym_string
     33: ... OBJECT  LOCAL  DEFAULT        5 homonym_string
     37: ... FILE    LOCAL  DEFAULT      ABS test_klp_convert_mod_b.c
     38: ... FUNC    LOCAL  DEFAULT        3 get_homonym_string
     39: ... OBJECT  LOCAL  DEFAULT       11 homonym_string

- The BFD library is incapable of handling two rela sections with
  identical sh_info values *as relocation sections*.  This affects
  binutils and related programs like gdb, objdump, crash utility, etc.
  which fail to process klp-converted .ko files.

  https://sourceware.org/bugzilla/show_bug.cgi?id=24456
  https://sourceware.org/ml/binutils/2019-04/msg00194.html


Changelogs
----------

livepatch: Create and include UAPI headers
  v2:
  - jmoreira: split up and changelog
  v3:
  - joe: convert to SPDX license tags
  v4:
  - joe: from Masahiro, update UAPI headers with "GPL-2.0 WITH
    Linux-syscall-note"
  - joe: from Masahiro, types.h isn't needed by UAPI header

kbuild: Support for Symbols.list creation
  v3:
  - jmoreira: adjust for multiobject livepatch
  - joe: add klpclean to PHONY
  - joe: align KLP prefix
  - joe: update all in-tree livepatches with LIVEPATCH_* modinfo flag
  v4:
  - joe: from Miroslav, update the samples and self-test Makefiles with
    the LIVEPATCH_ build prefix.
  - joe: from Artem, use $(SLIST) in klpclean and $(call cmd,livepatch)
    instead of $(call cmd_livepatch)

livepatch: Add klp-convert tool
  v2:
  - khlebnikov: use HOSTLOADLIBES_ instead of HOSTLDFLAGS: -lelf must be
    at the end
  - jmoreira: add support to automatic relocation conversion in
    klp-convert.c, changelog
  v3:
  - joe: convert to SPDX license tags
  - jmoreira: add rela symbol name to WARNs
  - jmoreira: ignore relocations to .TOC and symbols with index 0
  - joe: separate and fix valid_sympos() sympos=0 and sympos=1..n checks
  - joe: fix symbol use-after-frees
  - joe: fix remaining valgrind leak complaints (non-error paths only)
  - joe: checkpatch nits
  v4:
  - joe: spelling nits s/Insuficient/Insufficient and s/clasic/classic
  - joe: from Miroslav, tweak klp-convert usage msg
  - joe: don't move multiple list elements in convert_rela() 
  - joe: relax duplicate user symbol check

livepatch: Add klp-convert annotation helpers
  v2:
  - jmoreira: split up: move KLP_MODULE_RELOC from previous patch to
    here, add KLP_SYMPOS, move macros from include/uapi/livepatch.h to
    include/linux/livepatch.h
  v3:
  - joe: from Josh, KLP_MODULE_RELOC macro should 4-byte align
    klp_module_reloc structures
  v4:
  - joe: remove the ',' struct array delimiter from KLP_SYMPOS

modpost: Integrate klp-convert
  v2:
  - khlebnikov: save cmd_ld_ko_o into .module.cmd, if_changed_rule
    doesn't do that.f
  - khlebnikov: fix bashisms for debian where /bin/sh is a symlink to
    /bin/dash
  - khlebnikov: rename rule_link_module to rule_ld_ko_o, otherwise
    arg-check inside if_changed_rule compares cmd_link_module and
    cmd_ld_ko_o.
  - khlebnikov: check modinfo -F livepatch only if CONFIG_LIVEPATCH is
    true.
  - mbenes: remove modinfo call. LIVEPATCH_ in Makefile
  - jmoreira: split up: move the .livepatch file-based scheme for
    identifying livepatches to a previous patch, as it was required for
    correctly building Symbols.list there.
  v3:
  - joe: adjust rule_ld_ko_o to call echo-cmd
  - joe: rebase for v5.1
  - joe: align KLP prefix
  v4:
  - joe: rule_ld_ko_o should include $(Q) to honor build verbosity

modpost: Add modinfo flag to livepatch modules
  v2:
  - jmoreira: fix modpost.c (add_livepatch_flag) to update module
    structure with livepatch flag and prevent modpost from breaking due to
    unresolved symbols
  v3:
  - joe: adjust modpost.c::get_modinfo() call for v5.0 version
  - joe: from Miroslav: remove MODULE_INFO(livepatch, "Y") from samples

livepatch: Add sample livepatch module
  v2:
  - jmoreira: update module to use KLP_SYMPOS
  - jmoreira: Comments on symbol resolution scheme
  - jmoreira: Update Makefile
  - jmoreira: Remove MODULE_INFO statement
  - jmoreira: Changelog
  v3:
  - joe: rebase for v5.1
  - joe: code shuffle to better match original source file

documentation: Update on livepatch elf format
  v3:
  - joe: clarify sympos=0 and sympos=1..n

livepatch/selftests: add klp-convert
  v3 (new)
  v4:
  - joe: Add a ',' struct array delimiter after KLP_SYMPOS

livepatch/klp-convert: abort on special sections
  v4 (new)


Git boilerplate
---------------

Joao Moreira (2):
  kbuild: Support for Symbols.list creation
  documentation: Update on livepatch elf format

Joe Lawrence (2):
  livepatch/selftests: add klp-convert
  livepatch/klp-convert: abort on special sections

Josh Poimboeuf (5):
  livepatch: Create and include UAPI headers
  livepatch: Add klp-convert tool
  livepatch: Add klp-convert annotation helpers
  modpost: Integrate klp-convert
  livepatch: Add sample livepatch module

Miroslav Benes (1):
  modpost: Add modinfo flag to livepatch modules

 .gitignore                                    |   1 +
 Documentation/livepatch/livepatch.txt         |   3 +
 Documentation/livepatch/module-elf-format.txt |  50 +-
 MAINTAINERS                                   |   2 +
 Makefile                                      |  30 +-
 include/linux/livepatch.h                     |  13 +
 include/uapi/linux/livepatch.h                |  20 +
 kernel/livepatch/core.c                       |   4 +-
 lib/livepatch/Makefile                        |  15 +
 lib/livepatch/test_klp_atomic_replace.c       |   1 -
 lib/livepatch/test_klp_callbacks_demo.c       |   1 -
 lib/livepatch/test_klp_callbacks_demo2.c      |   1 -
 lib/livepatch/test_klp_convert1.c             | 106 +++
 lib/livepatch/test_klp_convert2.c             | 103 +++
 lib/livepatch/test_klp_convert_mod_a.c        |  25 +
 lib/livepatch/test_klp_convert_mod_b.c        |  13 +
 lib/livepatch/test_klp_livepatch.c            |   1 -
 samples/livepatch/Makefile                    |   6 +
 .../livepatch/livepatch-annotated-sample.c    | 102 +++
 samples/livepatch/livepatch-callbacks-demo.c  |   1 -
 samples/livepatch/livepatch-sample.c          |   1 -
 samples/livepatch/livepatch-shadow-fix1.c     |   1 -
 samples/livepatch/livepatch-shadow-fix2.c     |   1 -
 scripts/Kbuild.include                        |   4 +-
 scripts/Makefile                              |   1 +
 scripts/Makefile.build                        |   7 +
 scripts/Makefile.modpost                      |  24 +-
 scripts/livepatch/.gitignore                  |   1 +
 scripts/livepatch/Makefile                    |   7 +
 scripts/livepatch/elf.c                       | 753 ++++++++++++++++++
 scripts/livepatch/elf.h                       |  73 ++
 scripts/livepatch/klp-convert.c               | 731 +++++++++++++++++
 scripts/livepatch/klp-convert.h               |  39 +
 scripts/livepatch/list.h                      | 391 +++++++++
 scripts/mod/modpost.c                         |  82 +-
 scripts/mod/modpost.h                         |   1 +
 .../selftests/livepatch/test-livepatch.sh     |  64 ++
 37 files changed, 2653 insertions(+), 26 deletions(-)
 create mode 100644 include/uapi/linux/livepatch.h
 create mode 100644 lib/livepatch/test_klp_convert1.c
 create mode 100644 lib/livepatch/test_klp_convert2.c
 create mode 100644 lib/livepatch/test_klp_convert_mod_a.c
 create mode 100644 lib/livepatch/test_klp_convert_mod_b.c
 create mode 100644 samples/livepatch/livepatch-annotated-sample.c
 create mode 100644 scripts/livepatch/.gitignore
 create mode 100644 scripts/livepatch/Makefile
 create mode 100644 scripts/livepatch/elf.c
 create mode 100644 scripts/livepatch/elf.h
 create mode 100644 scripts/livepatch/klp-convert.c
 create mode 100644 scripts/livepatch/klp-convert.h
 create mode 100644 scripts/livepatch/list.h

Comments

Miroslav Benes June 13, 2019, 1 p.m. UTC | #1
Hi Joe,

first, I'm sorry for the lack of response so far.

Maybe you've already noticed but the selftests fail. Well, at least in 
my VM. When test_klp_convert1.ko is loaded, the process is killed with

[  518.041826] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  518.042816] #PF: supervisor read access in kernel mode
[  518.043393] #PF: error_code(0x0000) - not-present page
[  518.043981] PGD 0 P4D 0 
[  518.044185] Oops: 0000 [#1] SMP PTI
[  518.044518] CPU: 2 PID: 2255 Comm: insmod Tainted: G           O  K   5.1.0-klp_convert_v4-193435-g67748576637e #2
[  518.045784] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
[  518.046940] RIP: 0010:test_klp_convert_init+0x1c/0x40 [test_klp_convert1]
[  518.047611] Code: 1b a0 48 89 c6 e9 a8 c0 f4 e0 0f 1f 40 00 0f 1f 44 00 00 53 48 c7 c7 00 30 1b a0 e8 5e 33 f6 e0 85 c0 89 c3 74 04 89 d8 5b c3 <48> 8b 35 5d ef e4 5f 48 c7 c7 28 20 1b a0 e8 75 c0 f4 e0 e8 6c ff
[  518.049779] RSP: 0018:ffffc90000f37cc8 EFLAGS: 00010246
[  518.050243] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000027de0
[  518.050922] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff88807ab54f40
[  518.051619] RBP: ffffffffa01b1080 R08: 0000000096efde7a R09: 0000000000000001
[  518.052332] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffff
[  518.053012] R13: 0000000000000000 R14: ffff888078b55000 R15: ffffc90000f37ea0
[  518.053714] FS:  00007febece1fb80(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
[  518.054514] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  518.055078] CR2: 0000000000000000 CR3: 000000007a56a000 CR4: 00000000000006e0
[  518.055818] Call Trace:
[  518.056007]  do_one_initcall+0x6a/0x2da
[  518.056340]  ? do_init_module+0x22/0x230
[  518.056702]  ? rcu_read_lock_sched_held+0x96/0xa0
[  518.057125]  ? kmem_cache_alloc_trace+0x284/0x2e0
[  518.057493]  do_init_module+0x5a/0x230
[  518.057900]  load_module+0x17bc/0x1f50
[  518.058214]  ? __symbol_put+0x40/0x40
[  518.058499]  ? vfs_read+0x12d/0x160
[  518.058766]  __do_sys_finit_module+0x83/0xc0
[  518.059122]  do_syscall_64+0x57/0x190
[  518.059407]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
...

It crashes right in test_klp_convert_init() when print_*() using 
supposed-to-be-converted symbols are called. I'll debug it next week. Can 
you reproduce it too?

Regards,
Miroslav

PS: it is probably not a coincidence that I come across selftests failures 
right before I leave for a holiday...
Joe Lawrence June 13, 2019, 1:15 p.m. UTC | #2
On 6/13/19 9:00 AM, Miroslav Benes wrote:
> Hi Joe,
> 
> first, I'm sorry for the lack of response so far.
> 
> Maybe you've already noticed but the selftests fail. Well, at least in
> my VM. When test_klp_convert1.ko is loaded, the process is killed with
> 
> [  518.041826] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [  518.042816] #PF: supervisor read access in kernel mode
> [  518.043393] #PF: error_code(0x0000) - not-present page
> [  518.043981] PGD 0 P4D 0
> [  518.044185] Oops: 0000 [#1] SMP PTI
> [  518.044518] CPU: 2 PID: 2255 Comm: insmod Tainted: G           O  K   5.1.0-klp_convert_v4-193435-g67748576637e #2
> [  518.045784] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
> [  518.046940] RIP: 0010:test_klp_convert_init+0x1c/0x40 [test_klp_convert1]
> [  518.047611] Code: 1b a0 48 89 c6 e9 a8 c0 f4 e0 0f 1f 40 00 0f 1f 44 00 00 53 48 c7 c7 00 30 1b a0 e8 5e 33 f6 e0 85 c0 89 c3 74 04 89 d8 5b c3 <48> 8b 35 5d ef e4 5f 48 c7 c7 28 20 1b a0 e8 75 c0 f4 e0 e8 6c ff
> [  518.049779] RSP: 0018:ffffc90000f37cc8 EFLAGS: 00010246
> [  518.050243] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000027de0
> [  518.050922] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff88807ab54f40
> [  518.051619] RBP: ffffffffa01b1080 R08: 0000000096efde7a R09: 0000000000000001
> [  518.052332] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffff
> [  518.053012] R13: 0000000000000000 R14: ffff888078b55000 R15: ffffc90000f37ea0
> [  518.053714] FS:  00007febece1fb80(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
> [  518.054514] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  518.055078] CR2: 0000000000000000 CR3: 000000007a56a000 CR4: 00000000000006e0
> [  518.055818] Call Trace:
> [  518.056007]  do_one_initcall+0x6a/0x2da
> [  518.056340]  ? do_init_module+0x22/0x230
> [  518.056702]  ? rcu_read_lock_sched_held+0x96/0xa0
> [  518.057125]  ? kmem_cache_alloc_trace+0x284/0x2e0
> [  518.057493]  do_init_module+0x5a/0x230
> [  518.057900]  load_module+0x17bc/0x1f50
> [  518.058214]  ? __symbol_put+0x40/0x40
> [  518.058499]  ? vfs_read+0x12d/0x160
> [  518.058766]  __do_sys_finit_module+0x83/0xc0
> [  518.059122]  do_syscall_64+0x57/0x190
> [  518.059407]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> ...
> 
> It crashes right in test_klp_convert_init() when print_*() using
> supposed-to-be-converted symbols are called. I'll debug it next week. Can
> you reproduce it too?

Hey, thanks for the report..

I don't recall the tests crashing, but I had put this patchset on the 
side for a few weeks now.  I'll try to fire up a VM and see what happens 
today.

> Regards,
> Miroslav
> 
> PS: it is probably not a coincidence that I come across selftests failures
> right before I leave for a holiday...
> 

Are you volunteering to go on holidays for each patchset until all of 
the selftests pass? :)

-- Joe
Joe Lawrence June 13, 2019, 8:48 p.m. UTC | #3
On 6/13/19 9:15 AM, Joe Lawrence wrote:
> On 6/13/19 9:00 AM, Miroslav Benes wrote:
>> Hi Joe,
>>
>> first, I'm sorry for the lack of response so far.
>>
>> Maybe you've already noticed but the selftests fail. Well, at least in
>> my VM. When test_klp_convert1.ko is loaded, the process is killed with
>>
>> [  518.041826] BUG: kernel NULL pointer dereference, address: 0000000000000000
>> [  518.042816] #PF: supervisor read access in kernel mode
>> [  518.043393] #PF: error_code(0x0000) - not-present page
>> [  518.043981] PGD 0 P4D 0
>> [  518.044185] Oops: 0000 [#1] SMP PTI
>> [  518.044518] CPU: 2 PID: 2255 Comm: insmod Tainted: G           O  K   5.1.0-klp_convert_v4-193435-g67748576637e #2
>> [  518.045784] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
>> [  518.046940] RIP: 0010:test_klp_convert_init+0x1c/0x40 [test_klp_convert1]
>> [  518.047611] Code: 1b a0 48 89 c6 e9 a8 c0 f4 e0 0f 1f 40 00 0f 1f 44 00 00 53 48 c7 c7 00 30 1b a0 e8 5e 33 f6 e0 85 c0 89 c3 74 04 89 d8 5b c3 <48> 8b 35 5d ef e4 5f 48 c7 c7 28 20 1b a0 e8 75 c0 f4 e0 e8 6c ff
>> [  518.049779] RSP: 0018:ffffc90000f37cc8 EFLAGS: 00010246
>> [  518.050243] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000027de0
>> [  518.050922] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff88807ab54f40
>> [  518.051619] RBP: ffffffffa01b1080 R08: 0000000096efde7a R09: 0000000000000001
>> [  518.052332] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffff
>> [  518.053012] R13: 0000000000000000 R14: ffff888078b55000 R15: ffffc90000f37ea0
>> [  518.053714] FS:  00007febece1fb80(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
>> [  518.054514] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  518.055078] CR2: 0000000000000000 CR3: 000000007a56a000 CR4: 00000000000006e0
>> [  518.055818] Call Trace:
>> [  518.056007]  do_one_initcall+0x6a/0x2da
>> [  518.056340]  ? do_init_module+0x22/0x230
>> [  518.056702]  ? rcu_read_lock_sched_held+0x96/0xa0
>> [  518.057125]  ? kmem_cache_alloc_trace+0x284/0x2e0
>> [  518.057493]  do_init_module+0x5a/0x230
>> [  518.057900]  load_module+0x17bc/0x1f50
>> [  518.058214]  ? __symbol_put+0x40/0x40
>> [  518.058499]  ? vfs_read+0x12d/0x160
>> [  518.058766]  __do_sys_finit_module+0x83/0xc0
>> [  518.059122]  do_syscall_64+0x57/0x190
>> [  518.059407]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> ...
>>
>> It crashes right in test_klp_convert_init() when print_*() using
>> supposed-to-be-converted symbols are called. I'll debug it next week. Can
>> you reproduce it too?
> 
> Hey, thanks for the report..
> 
> I don't recall the tests crashing, but I had put this patchset on the
> side for a few weeks now.  I'll try to fire up a VM and see what happens
> today.
> 

Hmm, I haven't been able to reproduce using my original base (Linux 5.1-rc6)
or when rebased ontop of livepatching.git/master + 997a55f3fb6d("stacktrace: Unbreak stack_trace_save_tsk_reliable()")


FWIW, my test_klp_convert1.ko has these klp-converted relocations:

Relocation section [36] '.klp.rela.vmlinux..text.unlikely' for section [ 5] '.text.unlikely' at offset 0x4a6b8 contains 1 entry:                                   
  Offset              Type            Value               Addend Name                                                                                              
  0x0000000000000003  X86_64_PC32     000000000000000000      -4 .klp.sym.vmlinux.saved_command_line,0 

Relocation section [37] '.klp.rela.test_klp_convert_mod..text.unlikely' for section [ 5] '.text.unlikely' at offset 0x4a6d0 contains 4 entries:                    
  Offset              Type            Value               Addend Name                                                                                              
  0x000000000000004e  X86_64_PC32     000000000000000000      -4 .klp.sym.test_klp_convert_mod.get_homonym_string,1                                                
  0x000000000000003d  X86_64_32S      000000000000000000      +0 .klp.sym.test_klp_convert_mod.homonym_string,1                                                    
  0x0000000000000027  X86_64_PC32     000000000000000000      -4 .klp.sym.test_klp_convert_mod.get_driver_name,0                                                   
  0x0000000000000016  X86_64_32S      000000000000000000      +0 .klp.sym.test_klp_convert_mod.driver_name,0

-- Joe
Petr Mladek June 14, 2019, 8:34 a.m. UTC | #4
On Thu 2019-06-13 16:48:02, Joe Lawrence wrote:
> On 6/13/19 9:15 AM, Joe Lawrence wrote:
> > On 6/13/19 9:00 AM, Miroslav Benes wrote:
> >> Hi Joe,
> >>
> >> first, I'm sorry for the lack of response so far.
> >>
> >> Maybe you've already noticed but the selftests fail. Well, at least in
> >> my VM. When test_klp_convert1.ko is loaded, the process is killed with
> >>
> >> [  518.041826] BUG: kernel NULL pointer dereference, address: 0000000000000000
> >> [  518.042816] #PF: supervisor read access in kernel mode
> >> [  518.043393] #PF: error_code(0x0000) - not-present page
> >> [  518.043981] PGD 0 P4D 0
> >> [  518.044185] Oops: 0000 [#1] SMP PTI
> >> [  518.044518] CPU: 2 PID: 2255 Comm: insmod Tainted: G           O  K   5.1.0-klp_convert_v4-193435-g67748576637e #2
> >> [  518.045784] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
> >> [  518.046940] RIP: 0010:test_klp_convert_init+0x1c/0x40 [test_klp_convert1]
> >> [  518.047611] Code: 1b a0 48 89 c6 e9 a8 c0 f4 e0 0f 1f 40 00 0f 1f 44 00 00 53 48 c7 c7 00 30 1b a0 e8 5e 33 f6 e0 85 c0 89 c3 74 04 89 d8 5b c3 <48> 8b 35 5d ef e4 5f 48 c7 c7 28 20 1b a0 e8 75 c0 f4 e0 e8 6c ff
> >> [  518.049779] RSP: 0018:ffffc90000f37cc8 EFLAGS: 00010246
> >> [  518.050243] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000027de0
> >> [  518.050922] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff88807ab54f40
> >> [  518.051619] RBP: ffffffffa01b1080 R08: 0000000096efde7a R09: 0000000000000001
> >> [  518.052332] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffff
> >> [  518.053012] R13: 0000000000000000 R14: ffff888078b55000 R15: ffffc90000f37ea0
> >> [  518.053714] FS:  00007febece1fb80(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
> >> [  518.054514] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [  518.055078] CR2: 0000000000000000 CR3: 000000007a56a000 CR4: 00000000000006e0
> >> [  518.055818] Call Trace:
> >> [  518.056007]  do_one_initcall+0x6a/0x2da
> >> [  518.056340]  ? do_init_module+0x22/0x230
> >> [  518.056702]  ? rcu_read_lock_sched_held+0x96/0xa0
> >> [  518.057125]  ? kmem_cache_alloc_trace+0x284/0x2e0
> >> [  518.057493]  do_init_module+0x5a/0x230
> >> [  518.057900]  load_module+0x17bc/0x1f50
> >> [  518.058214]  ? __symbol_put+0x40/0x40
> >> [  518.058499]  ? vfs_read+0x12d/0x160
> >> [  518.058766]  __do_sys_finit_module+0x83/0xc0
> >> [  518.059122]  do_syscall_64+0x57/0x190
> >> [  518.059407]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >> ...
> >>
> >> It crashes right in test_klp_convert_init() when print_*() using
> >> supposed-to-be-converted symbols are called. I'll debug it next week. Can
> >> you reproduce it too?
> > 
> > Hey, thanks for the report..
> > 
> > I don't recall the tests crashing, but I had put this patchset on the
> > side for a few weeks now.  I'll try to fire up a VM and see what happens
> > today.
> > 
> 
> Hmm, I haven't been able to reproduce using my original base (Linux 5.1-rc6)
> or when rebased ontop of livepatching.git/master + 997a55f3fb6d("stacktrace: Unbreak stack_trace_save_tsk_reliable()")

I stared into the code a bit but I did not find any bug. Let's hope
that it was just some pre-vacation last minute mistake (system
inconsistency or so ;-)

Anyway, I am curious about one thing. I saw:

function __load_mod() {
	local mod="$1"; shift

	local msg="% modprobe $mod $*"
	log "${msg%% }"
	ret=$(modprobe "$mod" "$@" 2>&1)
	if [[ "$ret" != "" ]]; then
		die "$ret"
	fi

	# Wait for module in sysfs ...
	loop_until '[[ -e "/sys/module/$mod" ]]' ||
		die "failed to load module $mod"
}

Is the waiting for sysfs really necessary here?

Note that it is /sys/module and not /sys/kernel/livepatch/.

My understanding is that modprobe waits until the module succesfully
loaded. mod_sysfs_setup() is called before the module init callback.
Therefore the sysfs interface should be read before modprobe returns.
Do I miss something?

If it works different way then there might be some races because
mod_sysfs_setup() is called before the module is alive.

Best Regards,
Petr
Joe Lawrence June 14, 2019, 2:20 p.m. UTC | #5
On 6/14/19 4:34 AM, Petr Mladek wrote:
> On Thu 2019-06-13 16:48:02, Joe Lawrence wrote:
>> On 6/13/19 9:15 AM, Joe Lawrence wrote:
>>> On 6/13/19 9:00 AM, Miroslav Benes wrote:
>>>> Hi Joe,
>>>>
>>>> first, I'm sorry for the lack of response so far.
>>>>
>>>> Maybe you've already noticed but the selftests fail. Well, at least in
>>>> my VM. When test_klp_convert1.ko is loaded, the process is killed with
>>>>
>>>> [  518.041826] BUG: kernel NULL pointer dereference, address: 0000000000000000
>>>> [  518.042816] #PF: supervisor read access in kernel mode
>>>> [  518.043393] #PF: error_code(0x0000) - not-present page
>>>> [  518.043981] PGD 0 P4D 0
>>>> [  518.044185] Oops: 0000 [#1] SMP PTI
>>>> [  518.044518] CPU: 2 PID: 2255 Comm: insmod Tainted: G           O  K   5.1.0-klp_convert_v4-193435-g67748576637e #2
>>>> [  518.045784] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
>>>> [  518.046940] RIP: 0010:test_klp_convert_init+0x1c/0x40 [test_klp_convert1]
>>>> [  518.047611] Code: 1b a0 48 89 c6 e9 a8 c0 f4 e0 0f 1f 40 00 0f 1f 44 00 00 53 48 c7 c7 00 30 1b a0 e8 5e 33 f6 e0 85 c0 89 c3 74 04 89 d8 5b c3 <48> 8b 35 5d ef e4 5f 48 c7 c7 28 20 1b a0 e8 75 c0 f4 e0 e8 6c ff
>>>> [  518.049779] RSP: 0018:ffffc90000f37cc8 EFLAGS: 00010246
>>>> [  518.050243] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000027de0
>>>> [  518.050922] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff88807ab54f40
>>>> [  518.051619] RBP: ffffffffa01b1080 R08: 0000000096efde7a R09: 0000000000000001
>>>> [  518.052332] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffff
>>>> [  518.053012] R13: 0000000000000000 R14: ffff888078b55000 R15: ffffc90000f37ea0
>>>> [  518.053714] FS:  00007febece1fb80(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
>>>> [  518.054514] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [  518.055078] CR2: 0000000000000000 CR3: 000000007a56a000 CR4: 00000000000006e0
>>>> [  518.055818] Call Trace:
>>>> [  518.056007]  do_one_initcall+0x6a/0x2da
>>>> [  518.056340]  ? do_init_module+0x22/0x230
>>>> [  518.056702]  ? rcu_read_lock_sched_held+0x96/0xa0
>>>> [  518.057125]  ? kmem_cache_alloc_trace+0x284/0x2e0
>>>> [  518.057493]  do_init_module+0x5a/0x230
>>>> [  518.057900]  load_module+0x17bc/0x1f50
>>>> [  518.058214]  ? __symbol_put+0x40/0x40
>>>> [  518.058499]  ? vfs_read+0x12d/0x160
>>>> [  518.058766]  __do_sys_finit_module+0x83/0xc0
>>>> [  518.059122]  do_syscall_64+0x57/0x190
>>>> [  518.059407]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>> ...
>>>>
>>>> It crashes right in test_klp_convert_init() when print_*() using
>>>> supposed-to-be-converted symbols are called. I'll debug it next week. Can
>>>> you reproduce it too?
>>>
>>> Hey, thanks for the report..
>>>
>>> I don't recall the tests crashing, but I had put this patchset on the
>>> side for a few weeks now.  I'll try to fire up a VM and see what happens
>>> today.
>>>
>>
>> Hmm, I haven't been able to reproduce using my original base (Linux 5.1-rc6)
>> or when rebased ontop of livepatching.git/master + 997a55f3fb6d("stacktrace: Unbreak stack_trace_save_tsk_reliable()")
> 
> I stared into the code a bit but I did not find any bug. Let's hope
> that it was just some pre-vacation last minute mistake (system
> inconsistency or so ;-)
> 
> Anyway, I am curious about one thing. I saw:
> 
> function __load_mod() {
> 	local mod="$1"; shift
> 
> 	local msg="% modprobe $mod $*"
> 	log "${msg%% }"
> 	ret=$(modprobe "$mod" "$@" 2>&1)
> 	if [[ "$ret" != "" ]]; then
> 		die "$ret"
> 	fi
> 
> 	# Wait for module in sysfs ...
> 	loop_until '[[ -e "/sys/module/$mod" ]]' ||
> 		die "failed to load module $mod"
> }
> 
> Is the waiting for sysfs really necessary here?
> 
> Note that it is /sys/module and not /sys/kernel/livepatch/.

I can't remember if that was just paranoid-protective-bash coding or 
actually required.  Libor provided great feedback on the initial patch 
series that introduced the self-tests, perhaps he remembers.

> My understanding is that modprobe waits until the module succesfully
> loaded. mod_sysfs_setup() is called before the module init callback.
> Therefore the sysfs interface should be read before modprobe returns.
> Do I miss something?
 >
> If it works different way then there might be some races because
> mod_sysfs_setup() is called before the module is alive.

All of this is called from a single bash script function, so in a call 
stack fashion, something like this would occur when loading a livepatch 
module:

   [ mod_sysfs_setup() ]
   modprobe waits for:         .init complete, MODULE_STATE_LIVE
   __load_mod() waits for:     /sys/module/$mod
   load_lp_nowait() waits for: /sys/kernel/livepatch/$mod
   load_lp() waits for:        /sys/kernel/livepatch/$mod/transition = 0
   test-script.sh

So I would think that by calling modprobe, we ensure that the module 
code is ready to go.  The /sys/module/$mod check might be redundant as 
you say, but because modprobe completed, we should be safe, no?

The only "nowait" function we have is load_lp_nowait(), which would let 
us march onward before the livepatch transition may have completed.

-- Joe
Libor Pechacek June 14, 2019, 4:36 p.m. UTC | #6
On Fri 14-06-19 10:20:09, Joe Lawrence wrote:
> On 6/14/19 4:34 AM, Petr Mladek wrote:
[...]
> > Anyway, I am curious about one thing. I saw:
> > 
> > function __load_mod() {
> > 	local mod="$1"; shift
> > 
> > 	local msg="% modprobe $mod $*"
> > 	log "${msg%% }"
> > 	ret=$(modprobe "$mod" "$@" 2>&1)
> > 	if [[ "$ret" != "" ]]; then
> > 		die "$ret"
> > 	fi
> > 
> > 	# Wait for module in sysfs ...
> > 	loop_until '[[ -e "/sys/module/$mod" ]]' ||
> > 		die "failed to load module $mod"
> > }
> > 
> > Is the waiting for sysfs really necessary here?
> > 
> > Note that it is /sys/module and not /sys/kernel/livepatch/.
> 
> I can't remember if that was just paranoid-protective-bash coding or
> actually required.  Libor provided great feedback on the initial patch
> series that introduced the self-tests, perhaps he remembers.

I don't recall analyzing this spot in detail but looking at it now I don't see
anything wrong with it. While the check is likely superfluous, I'm not against
keeping it in place.

> > My understanding is that modprobe waits until the module succesfully
> > loaded. mod_sysfs_setup() is called before the module init callback.
> > Therefore the sysfs interface should be read before modprobe returns.
> > Do I miss something?
> >
> > If it works different way then there might be some races because
> > mod_sysfs_setup() is called before the module is alive.
> 
> All of this is called from a single bash script function, so in a call stack
> fashion, something like this would occur when loading a livepatch module:
> 
>   [ mod_sysfs_setup() ]
>   modprobe waits for:         .init complete, MODULE_STATE_LIVE
>   __load_mod() waits for:     /sys/module/$mod
>   load_lp_nowait() waits for: /sys/kernel/livepatch/$mod
>   load_lp() waits for:        /sys/kernel/livepatch/$mod/transition = 0
>   test-script.sh
> 
> So I would think that by calling modprobe, we ensure that the module code is
> ready to go.  The /sys/module/$mod check might be redundant as you say, but
> because modprobe completed, we should be safe, no?
> 
> The only "nowait" function we have is load_lp_nowait(), which would let us
> march onward before the livepatch transition may have completed.

And even that one is waiting for the live patch module name appear under
/sys/kernel/livepatch/. This is IMHO acceptable level of paranoia.

Libor
Miroslav Benes June 25, 2019, 11:36 a.m. UTC | #7
On Fri, 14 Jun 2019, Petr Mladek wrote:

> On Thu 2019-06-13 16:48:02, Joe Lawrence wrote:
> > On 6/13/19 9:15 AM, Joe Lawrence wrote:
> > > On 6/13/19 9:00 AM, Miroslav Benes wrote:
> > >> Hi Joe,
> > >>
> > >> first, I'm sorry for the lack of response so far.
> > >>
> > >> Maybe you've already noticed but the selftests fail. Well, at least in
> > >> my VM. When test_klp_convert1.ko is loaded, the process is killed with
> > >>
> > >> [  518.041826] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > >> [  518.042816] #PF: supervisor read access in kernel mode
> > >> [  518.043393] #PF: error_code(0x0000) - not-present page
> > >> [  518.043981] PGD 0 P4D 0
> > >> [  518.044185] Oops: 0000 [#1] SMP PTI
> > >> [  518.044518] CPU: 2 PID: 2255 Comm: insmod Tainted: G           O  K   5.1.0-klp_convert_v4-193435-g67748576637e #2
> > >> [  518.045784] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
> > >> [  518.046940] RIP: 0010:test_klp_convert_init+0x1c/0x40 [test_klp_convert1]
> > >> [  518.047611] Code: 1b a0 48 89 c6 e9 a8 c0 f4 e0 0f 1f 40 00 0f 1f 44 00 00 53 48 c7 c7 00 30 1b a0 e8 5e 33 f6 e0 85 c0 89 c3 74 04 89 d8 5b c3 <48> 8b 35 5d ef e4 5f 48 c7 c7 28 20 1b a0 e8 75 c0 f4 e0 e8 6c ff
> > >> [  518.049779] RSP: 0018:ffffc90000f37cc8 EFLAGS: 00010246
> > >> [  518.050243] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000027de0
> > >> [  518.050922] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff88807ab54f40
> > >> [  518.051619] RBP: ffffffffa01b1080 R08: 0000000096efde7a R09: 0000000000000001
> > >> [  518.052332] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffff
> > >> [  518.053012] R13: 0000000000000000 R14: ffff888078b55000 R15: ffffc90000f37ea0
> > >> [  518.053714] FS:  00007febece1fb80(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
> > >> [  518.054514] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >> [  518.055078] CR2: 0000000000000000 CR3: 000000007a56a000 CR4: 00000000000006e0
> > >> [  518.055818] Call Trace:
> > >> [  518.056007]  do_one_initcall+0x6a/0x2da
> > >> [  518.056340]  ? do_init_module+0x22/0x230
> > >> [  518.056702]  ? rcu_read_lock_sched_held+0x96/0xa0
> > >> [  518.057125]  ? kmem_cache_alloc_trace+0x284/0x2e0
> > >> [  518.057493]  do_init_module+0x5a/0x230
> > >> [  518.057900]  load_module+0x17bc/0x1f50
> > >> [  518.058214]  ? __symbol_put+0x40/0x40
> > >> [  518.058499]  ? vfs_read+0x12d/0x160
> > >> [  518.058766]  __do_sys_finit_module+0x83/0xc0
> > >> [  518.059122]  do_syscall_64+0x57/0x190
> > >> [  518.059407]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > >> ...
> > >>
> > >> It crashes right in test_klp_convert_init() when print_*() using
> > >> supposed-to-be-converted symbols are called. I'll debug it next week. Can
> > >> you reproduce it too?
> > > 
> > > Hey, thanks for the report..
> > > 
> > > I don't recall the tests crashing, but I had put this patchset on the
> > > side for a few weeks now.  I'll try to fire up a VM and see what happens
> > > today.
> > > 
> > 
> > Hmm, I haven't been able to reproduce using my original base (Linux 5.1-rc6)
> > or when rebased ontop of livepatching.git/master + 997a55f3fb6d("stacktrace: Unbreak stack_trace_save_tsk_reliable()")
> 
> I stared into the code a bit but I did not find any bug. Let's hope
> that it was just some pre-vacation last minute mistake (system
> inconsistency or so ;-)

It was not and I do not understand it much, so there is a brain dump here.

I'll take test_klp_convert1.c as an example. When you compile it, 
test_klp_convert1.klp.o (not-yet-converted kernel module) has these 
relevant relocations.

Relocation section '.rela.text' at offset 0x1328 contains 27 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000000008  0000002800000002 R_X86_64_PC32          0000000000000000 saved_command_line - 4
000000000000009f  0000002800000002 R_X86_64_PC32          0000000000000000 saved_command_line - 4

When converted, test_klp_convert1.ko has

Relocation section '.rela.text' at offset 0x138 contains 22 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
000000000000009f  0000002600000002 R_X86_64_PC32          0000000000000000 .klp.sym.vmlinux.saved_command_line,0 - 4

and

Relocation section '.klp.rela.vmlinux..text' at offset 0x1968 contains 1 
entry:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000000008  0000002600000002 R_X86_64_PC32          0000000000000000 .klp.sym.vmlinux.saved_command_line,0 - 4

See? One of the relocations was not moved to the correct .klp.rela 
section. The final module should have

Relocation section '.klp.rela.vmlinux..text' at offset 0x1928 contains 2 
entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
000000000000009f  0000002600000002 R_X86_64_PC32          0000000000000000 .klp.sym.vmlinux.saved_command_line,0 - 4
0000000000000008  0000002600000002 R_X86_64_PC32          0000000000000000 .klp.sym.vmlinux.saved_command_line,0 - 4

and no saved_command_line relocation in .rela.text.

It thus makes sense that there is a NULL pointer dereference. The code 
accesses non-relocated symbol and boom.

So I made a couple of experiments and found that GCC is somehow involved. 
If klp-convert (from scripts/livepatch/) is compiled with our GCC 4.8.5 
from SLE12, the output is incorrect. If I compile it with GCC 7.4.0 from 
openSUSE Leap 15.1, the output is correct.

If I revert commit d59cadc0a8f8 ("[squash] klp-convert: make 
convert_rela() list-safe") (from Joe's expanded github tree), the problem 
disappears.

I haven't spotted any problem in the code and I cannot explain a 
dependency on GCC version. Any ideas?

Miroslav
Joe Lawrence June 25, 2019, 1:24 p.m. UTC | #8
On 6/25/19 7:36 AM, Miroslav Benes wrote:
> 
 > [ ... snip ... ]
 >
> So I made a couple of experiments and found that GCC is somehow involved.
> If klp-convert (from scripts/livepatch/) is compiled with our GCC 4.8.5
> from SLE12, the output is incorrect. If I compile it with GCC 7.4.0 from
> openSUSE Leap 15.1, the output is correct.
> 
> If I revert commit d59cadc0a8f8 ("[squash] klp-convert: make
> convert_rela() list-safe") (from Joe's expanded github tree), the problem
> disappears.
> 
> I haven't spotted any problem in the code and I cannot explain a
> dependency on GCC version. Any ideas?
> 

Thanks for revisiting and debugging this.  Narrowing it down to my "fix" 
to convert_rela() should be helpful.

In my case, I was probably testing with RHEL-8, which has gcc 8.2 vs 
RHEL-7 which has gcc 4.8.  I'll have to make sure to try with a few 
different versions for the next round.

-- Joe
Joe Lawrence June 25, 2019, 7:08 p.m. UTC | #9
On Tue, Jun 25, 2019 at 01:36:37PM +0200, Miroslav Benes wrote:
>
> [ ... snip ... ]
>
> If I revert commit d59cadc0a8f8 ("[squash] klp-convert: make
> convert_rela() list-safe") (from Joe's expanded github tree), the problem
> disappears.
>
> I haven't spotted any problem in the code and I cannot explain a
> dependency on GCC version. Any ideas?
>

I can confirm that test_klp_convert1.ko crashes with RHEL-7 and its
older gcc.  I added some debugging printf's to klp-convert and see:

  % ./scripts/livepatch/klp-convert \
          ./Symbols.list \
          lib/livepatch/test_klp_convert1.klp.o \
          lib/livepatch/test_klp_convert1.ko | \
          grep saved_command_line

  convert_rela: oldsec: .rela.text rela @ 0x1279670 rela->sym @ 0x12791f0 (.klp.sym.vmlinux.saved_command_line,0) offset: 0x3
  convert_rela: oldsec: .rela.text rela @ 0x1279cd0 rela->sym @ 0x12791f0 (.klp.sym.vmlinux.saved_command_line,0) offset: 0x9a
  move_rela: rela @ 0x1279670 rela->sym @ 0x12791f0 (.klp.sym.vmlinux.saved_command_line,0) offset: 0x3
  main: skipping rela @ 0x1279cd0 rela->sym @ 0x12791f0 (.klp.sym.vmlinux.saved_command_line,0) (!must_convert)

I think the problem is:

- Relas at different offsets, but for the same symbol may share symbol
  storage.  Note the same rela->sym value above.

- Before d59cadc0a8f8 ("[squash] klp-convert: make convert_rela()
  list-safe"), convert_rela() iterated through the entire section's
  relas, moving any of the same name.  This was determined not to be
  list safe when moving consecutive relas in the linked list.

- After d59cadc0a8f8 ("[squash] klp-convert: make convert_rela()
  list-safe"), convert_rela() still iterates through the section relas,
  but only updates r1->sym->klp_rela_sec instead of moving them.
  move_rela() was added to be called by the for-each-rela loop in
  main().

  - Bug 1: klp_rela_sec probably belongs in struct rela and not struct
    symbol

  - Bug 2: the main loop skips over second, third, etc. matching relas
    anyway as the shared symbol name will have already been converted

The following fix might not be elegant, but I can't think of a clever
way to handle the original issue d59cadc0a8f8 ("[squash] klp-convert:
make convert_rela() list-safe") as well as these resulting regressions.
So I broke out the moving of relas to a seperate loop.  That is probably
worth a comment and at the same time we might be able to drop some of
these other "safe" loop traversals for ordinary list_for_each_entry.

-- Joe

-->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8-- -->8--

diff --git a/scripts/livepatch/elf.h b/scripts/livepatch/elf.h
index 7551409..57a8242 100644
--- a/scripts/livepatch/elf.h
+++ b/scripts/livepatch/elf.h
@@ -33,7 +33,6 @@ struct symbol {
 	struct list_head list;
 	GElf_Sym sym;
 	struct section *sec;
-	struct section *klp_rela_sec;
 	char *name;
 	unsigned int idx;
 	unsigned char bind, type;
@@ -45,6 +44,7 @@ struct rela {
 	struct list_head list;
 	GElf_Rela rela;
 	struct symbol *sym;
+	struct section *klp_rela_sec;
 	unsigned int type;
 	unsigned long offset;
 	int addend;
diff --git a/scripts/livepatch/klp-convert.c b/scripts/livepatch/klp-convert.c
index b5873ab..50d6471 100644
--- a/scripts/livepatch/klp-convert.c
+++ b/scripts/livepatch/klp-convert.c
@@ -525,7 +525,7 @@ static bool convert_rela(struct section *oldsec, struct rela *r,
 
 	list_for_each_entry_safe(r1, r2, &oldsec->relas, list) {
 		if (r1->sym->name == r->sym->name) {
-			r1->sym->klp_rela_sec = sec;
+			r1->klp_rela_sec = sec;
 		}
 	}
 	return true;
@@ -535,7 +535,7 @@ static void move_rela(struct rela *r)
 {
 	/* Move the converted rela to klp rela section */
 	list_del(&r->list);
-	list_add(&r->list, &r->sym->klp_rela_sec->relas);
+	list_add(&r->list, &r->klp_rela_sec->relas);
 }
 
 /* Checks if given symbol name matches a symbol in exp_symbols */
@@ -687,8 +687,11 @@ int main(int argc, const char **argv)
 					return -1;
 				}
 			}
+		}
 
-			move_rela(rela);
+		list_for_each_entry_safe(rela, tmprela, &sec->relas, list) {
+			if (is_converted(rela->sym->name))
+				move_rela(rela);
 		}
 	}
Miroslav Benes June 26, 2019, 10:27 a.m. UTC | #10
On Tue, 25 Jun 2019, Joe Lawrence wrote:

> On Tue, Jun 25, 2019 at 01:36:37PM +0200, Miroslav Benes wrote:
> >
> > [ ... snip ... ]
> >
> > If I revert commit d59cadc0a8f8 ("[squash] klp-convert: make
> > convert_rela() list-safe") (from Joe's expanded github tree), the problem
> > disappears.
> >
> > I haven't spotted any problem in the code and I cannot explain a
> > dependency on GCC version. Any ideas?
> >
> 
> I can confirm that test_klp_convert1.ko crashes with RHEL-7 and its
> older gcc.  I added some debugging printf's to klp-convert and see:
> 
>   % ./scripts/livepatch/klp-convert \
>           ./Symbols.list \
>           lib/livepatch/test_klp_convert1.klp.o \
>           lib/livepatch/test_klp_convert1.ko | \
>           grep saved_command_line
> 
>   convert_rela: oldsec: .rela.text rela @ 0x1279670 rela->sym @ 0x12791f0 (.klp.sym.vmlinux.saved_command_line,0) offset: 0x3
>   convert_rela: oldsec: .rela.text rela @ 0x1279cd0 rela->sym @ 0x12791f0 (.klp.sym.vmlinux.saved_command_line,0) offset: 0x9a
>   move_rela: rela @ 0x1279670 rela->sym @ 0x12791f0 (.klp.sym.vmlinux.saved_command_line,0) offset: 0x3
>   main: skipping rela @ 0x1279cd0 rela->sym @ 0x12791f0 (.klp.sym.vmlinux.saved_command_line,0) (!must_convert)
> 
> I think the problem is:
> 
> - Relas at different offsets, but for the same symbol may share symbol
>   storage.  Note the same rela->sym value above.
> 
> - Before d59cadc0a8f8 ("[squash] klp-convert: make convert_rela()
>   list-safe"), convert_rela() iterated through the entire section's
>   relas, moving any of the same name.  This was determined not to be
>   list safe when moving consecutive relas in the linked list.
> 
> - After d59cadc0a8f8 ("[squash] klp-convert: make convert_rela()
>   list-safe"), convert_rela() still iterates through the section relas,
>   but only updates r1->sym->klp_rela_sec instead of moving them.
>   move_rela() was added to be called by the for-each-rela loop in
>   main().
> 
>   - Bug 1: klp_rela_sec probably belongs in struct rela and not struct
>     symbol
> 
>   - Bug 2: the main loop skips over second, third, etc. matching relas
>     anyway as the shared symbol name will have already been converted

Yes, it explains the issue.
 
> The following fix might not be elegant, but I can't think of a clever
> way to handle the original issue d59cadc0a8f8 ("[squash] klp-convert:
> make convert_rela() list-safe") as well as these resulting regressions.
> So I broke out the moving of relas to a seperate loop.

It works. Thanks Joe.

> That is probably
> worth a comment and at the same time we might be able to drop some of
> these other "safe" loop traversals for ordinary list_for_each_entry.

I think _safe from list_for_each_entry_safe(rela, tmprela, &sec->relas, 
list) in the main loop could be dropped, because convert_rela() only marks 
relas and does not move them anywhere. 
Similarly, list_for_each_entry_safe(r1, r2, &oldsec->relas, list) in 
convert_rela() itself.

Miroslav