diff mbox series

[v6,1/5] x86/boot: create a C bundle for 32 bit boot code and use it

Message ID 20241017133123.1946204-2-frediano.ziglio@cloud.com (mailing list archive)
State Superseded
Headers show
Series Reuse 32 bit C code more safely | expand

Commit Message

Frediano Ziglio Oct. 17, 2024, 1:31 p.m. UTC
The current method to include 32 bit C boot code is:
- compile each function we want to use into a separate object file;
- each function is compiled with -fpic option;
- convert these object files to binary files. This operation removes GOP
  which we don't want in the executable;
- a small assembly part in each file add the entry point;
- code can't have external references, all possible variables are passed
  by value or pointer;
- include these binary files in head.S.

There are currently some limitations:
- code is compiled separately, it's not possible to share a function
  (like memcpy) between different functions to use;
- although code is compiled with -fpic there's no certainty there are
  no relocations, specifically data ones. This can lead into hard to
  find bugs;
- it's hard to add a simple function;
- having to pass external variables makes hard to do multiple things
  otherwise functions would require a lot of parameters so code would
  have to be split into multiple functions which is not easy.

Current change extends the current process:
- all object files are linked together before getting converted making
  possible to share code between the function we want to call;
- a single object file is generated with all functions to use and
  exported symbols to easily call;
- variables to use are declared in linker script and easily used inside
  C code. Declaring them manually could be annoying but makes also
  easier to check them. Using external pointers can be still an issue if
  they are not fixed. If an external symbol is not declared this gives a
  link error.

Some details of the implementation:
- C code is compiled with -fpic flags (as before);
- object files from C code are linked together;
- the single bundled object file is linked with 2 slightly different
  script files to generate 2 bundled object files;
- the 2 bundled object files are converted to binary removing the need
  for global offset tables;
- a Python script is used to generate assembly source from the 2
  binaries;
- the single assembly file is compiled to generate final bundled object
  file;
- to detect possible unwanted relocation in data/code code is generated
  with different addresses. This is enforced starting .text section at
  different positions and adding a fixed "gap" at the beginning.
  This makes sure code and data is position independent;
- to detect used symbols in data/code symbols are placed in .text
  section at different offsets (based on the line in the linker script).
  This is needed as potentially a reference to a symbol is converted to
  a reference to the containing section so multiple symbols could be
  converted to reference to same symbol (section name) and we need to
  distinguish them;
- --orphan-handling=error option to linker is used to make sure we
  account for all possible sections from C code;

Current limitations:
- the main one is the lack of support for 64 bit code. It would make
  sure that even the code used for 64 bit (at the moment EFI code) is
  code and data position independent. We cannot assume that code that
  came from code compiled for 32 bit and compiled for 64 bit is code and
  data position independent, different compiler options lead to
  different code/data.

Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>
---
Changes since v2:
- removed W^X limitation, allowing data;
- added some comments to python script;
- added extension to python script;
- added header to generated assembly code from python script;
- added starting symbol to generated assembly code from python script
  to make disassembly more clear;
- other minor style changes to python script.

Changes since v4:
- add build32.final.lds build32.other.lds to targets macro;
- place some comments over a rule, not inside;
- simplified linking and producing binary rule;
- renamed built_in_32 to built-in-32, coding style;
- fix minor indentation;
- put magic numbers in Makefile and propagate them;
- minor variable cleanups in Python script;
- add dependency to Python script.

Changes since v5:
- renamed "other" and "final" phases to "base" and "offset";
- use if_changed macro to generate built-in-32.S.
---
 xen/arch/x86/boot/.gitignore                  |   5 +-
 xen/arch/x86/boot/Makefile                    |  47 +++-
 .../x86/boot/{build32.lds => build32.lds.S}   |  35 ++-
 xen/arch/x86/boot/cmdline.c                   |  12 -
 xen/arch/x86/boot/head.S                      |  12 -
 xen/arch/x86/boot/reloc.c                     |  14 --
 xen/tools/combine_two_binaries.py             | 220 ++++++++++++++++++
 7 files changed, 292 insertions(+), 53 deletions(-)
 rename xen/arch/x86/boot/{build32.lds => build32.lds.S} (70%)
 create mode 100755 xen/tools/combine_two_binaries.py

Comments

Anthony PERARD Oct. 17, 2024, 4 p.m. UTC | #1
On Thu, Oct 17, 2024 at 02:31:19PM +0100, Frediano Ziglio wrote:
> +$(obj)/build32.base.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff)
> +$(obj)/build32.offset.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff) -DFINAL

I was somehow expecting "base" and "offset" to be the other way around,
but that's fine. And by the way, -DFINAL cancel changes to GAP and
TEXT_DIFF ;-).

But overall, the changes looks nice as is,
Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>

Thanks,
Andrew Cooper Oct. 17, 2024, 5:13 p.m. UTC | #2
On 17/10/2024 2:31 pm, Frediano Ziglio wrote:
> The current method to include 32 bit C boot code is:
> - compile each function we want to use into a separate object file;
> - each function is compiled with -fpic option;
> - convert these object files to binary files. This operation removes GOP
>   which we don't want in the executable;
> - a small assembly part in each file add the entry point;
> - code can't have external references, all possible variables are passed
>   by value or pointer;
> - include these binary files in head.S.
>
> There are currently some limitations:
> - code is compiled separately, it's not possible to share a function
>   (like memcpy) between different functions to use;
> - although code is compiled with -fpic there's no certainty there are
>   no relocations, specifically data ones. This can lead into hard to
>   find bugs;
> - it's hard to add a simple function;
> - having to pass external variables makes hard to do multiple things
>   otherwise functions would require a lot of parameters so code would
>   have to be split into multiple functions which is not easy.
>
> Current change extends the current process:
> - all object files are linked together before getting converted making
>   possible to share code between the function we want to call;
> - a single object file is generated with all functions to use and
>   exported symbols to easily call;
> - variables to use are declared in linker script and easily used inside
>   C code. Declaring them manually could be annoying but makes also
>   easier to check them. Using external pointers can be still an issue if
>   they are not fixed. If an external symbol is not declared this gives a
>   link error.
>
> Some details of the implementation:
> - C code is compiled with -fpic flags (as before);
> - object files from C code are linked together;
> - the single bundled object file is linked with 2 slightly different
>   script files to generate 2 bundled object files;
> - the 2 bundled object files are converted to binary removing the need
>   for global offset tables;
> - a Python script is used to generate assembly source from the 2
>   binaries;
> - the single assembly file is compiled to generate final bundled object
>   file;
> - to detect possible unwanted relocation in data/code code is generated
>   with different addresses. This is enforced starting .text section at
>   different positions and adding a fixed "gap" at the beginning.
>   This makes sure code and data is position independent;
> - to detect used symbols in data/code symbols are placed in .text
>   section at different offsets (based on the line in the linker script).
>   This is needed as potentially a reference to a symbol is converted to
>   a reference to the containing section so multiple symbols could be
>   converted to reference to same symbol (section name) and we need to
>   distinguish them;
> - --orphan-handling=error option to linker is used to make sure we
>   account for all possible sections from C code;
>
> Current limitations:
> - the main one is the lack of support for 64 bit code. It would make
>   sure that even the code used for 64 bit (at the moment EFI code) is
>   code and data position independent. We cannot assume that code that
>   came from code compiled for 32 bit and compiled for 64 bit is code and
>   data position independent, different compiler options lead to
>   different code/data.
>
> Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>

This commit message is not particularly easy to follow.  Can I recommend
the following:

---%<---
x86/boot: Rework how 32bit C is linked/included for early boot

Right now, the two functions which were really too complicated to write
in asm are compiled as 32bit PIC, linked to a blob and included
directly, using global asm() to arrange for them to have function semantics.

This is limiting and fragile; the use of data relocations will compile
fine but malfunction when used, creating hard-to-debug bugs.

Furthermore, we would like to increase the amount of C, to
deduplicate/unify Xen's boot logic, as well as making it easier to
follow.  Therefore, rework how the 32bit objects are included.

Link all 32bit objects together first.  This allows for sharing of logic
between translation units.  Use differential linking and explicit
imports/exports to confirm that we only have the expected relocations,
and write the object back out as an assembly file so it can be linked
again as if it were 64bit, to integrate with the rest of Xen.

This allows for the use of external references (e.g. access to global
variables) with reasonable assurance of doing so safely.

No functional change.
---%<---

which I think is an accurate and more concise summary of what's changing?

> diff --git a/xen/arch/x86/boot/.gitignore b/xen/arch/x86/boot/.gitignore
> index a379db7988..7e85549751 100644
> --- a/xen/arch/x86/boot/.gitignore
> +++ b/xen/arch/x86/boot/.gitignore
> @@ -1,3 +1,4 @@
>  /mkelf32
> -/*.bin
> -/*.lnk
> +/build32.*.lds
> +/built-in-32.*.bin
> +/built-in-32.*.map

/built-in-32.S too

And from a glance at the file, this adjustment in the combine script too:

-print('\n\t.section\t.note.GNU-stack,"",@progbits', file=out)
+print('\n\t.section .note.GNU-stack, "", @progbits', file=out)

to have both .section's formatted in the same way.


> diff --git a/xen/arch/x86/boot/build32.lds b/xen/arch/x86/boot/build32.lds.S
> similarity index 70%
> rename from xen/arch/x86/boot/build32.lds
> rename to xen/arch/x86/boot/build32.lds.S
> index 56edaa727b..e3f5e55261 100644
> --- a/xen/arch/x86/boot/build32.lds
> +++ b/xen/arch/x86/boot/build32.lds.S
> <snip>
>          *(.text)
>          *(.text.*)
> -        *(.data)
> -        *(.data.*)
>          *(.rodata)
>          *(.rodata.*)
> +        *(.data)
> +        *(.data.*)

Reordering .data and .rodata really isn't necessary.

I'd just drop this part of the diff.  I have some different follow-up
for it anyway, which I've been holding off until after this first change
is sorted.

Everything here I'm happy to fix up on commit, if you're ok with me
doing so.

~Andrew
Frediano Ziglio Oct. 18, 2024, 8:42 a.m. UTC | #3
On Thu, Oct 17, 2024 at 6:13 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 17/10/2024 2:31 pm, Frediano Ziglio wrote:
> > The current method to include 32 bit C boot code is:
> > - compile each function we want to use into a separate object file;
> > - each function is compiled with -fpic option;
> > - convert these object files to binary files. This operation removes GOP
> >   which we don't want in the executable;
> > - a small assembly part in each file add the entry point;
> > - code can't have external references, all possible variables are passed
> >   by value or pointer;
> > - include these binary files in head.S.
> >
> > There are currently some limitations:
> > - code is compiled separately, it's not possible to share a function
> >   (like memcpy) between different functions to use;
> > - although code is compiled with -fpic there's no certainty there are
> >   no relocations, specifically data ones. This can lead into hard to
> >   find bugs;
> > - it's hard to add a simple function;
> > - having to pass external variables makes hard to do multiple things
> >   otherwise functions would require a lot of parameters so code would
> >   have to be split into multiple functions which is not easy.
> >
> > Current change extends the current process:
> > - all object files are linked together before getting converted making
> >   possible to share code between the function we want to call;
> > - a single object file is generated with all functions to use and
> >   exported symbols to easily call;
> > - variables to use are declared in linker script and easily used inside
> >   C code. Declaring them manually could be annoying but makes also
> >   easier to check them. Using external pointers can be still an issue if
> >   they are not fixed. If an external symbol is not declared this gives a
> >   link error.
> >
> > Some details of the implementation:
> > - C code is compiled with -fpic flags (as before);
> > - object files from C code are linked together;
> > - the single bundled object file is linked with 2 slightly different
> >   script files to generate 2 bundled object files;
> > - the 2 bundled object files are converted to binary removing the need
> >   for global offset tables;
> > - a Python script is used to generate assembly source from the 2
> >   binaries;
> > - the single assembly file is compiled to generate final bundled object
> >   file;
> > - to detect possible unwanted relocation in data/code code is generated
> >   with different addresses. This is enforced starting .text section at
> >   different positions and adding a fixed "gap" at the beginning.
> >   This makes sure code and data is position independent;
> > - to detect used symbols in data/code symbols are placed in .text
> >   section at different offsets (based on the line in the linker script).
> >   This is needed as potentially a reference to a symbol is converted to
> >   a reference to the containing section so multiple symbols could be
> >   converted to reference to same symbol (section name) and we need to
> >   distinguish them;
> > - --orphan-handling=error option to linker is used to make sure we
> >   account for all possible sections from C code;
> >
> > Current limitations:
> > - the main one is the lack of support for 64 bit code. It would make
> >   sure that even the code used for 64 bit (at the moment EFI code) is
> >   code and data position independent. We cannot assume that code that
> >   came from code compiled for 32 bit and compiled for 64 bit is code and
> >   data position independent, different compiler options lead to
> >   different code/data.
> >
> > Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>
>
> This commit message is not particularly easy to follow.  Can I recommend
> the following:
>
> ---%<---
> x86/boot: Rework how 32bit C is linked/included for early boot
>
> Right now, the two functions which were really too complicated to write
> in asm are compiled as 32bit PIC, linked to a blob and included
> directly, using global asm() to arrange for them to have function semantics.
>
> This is limiting and fragile; the use of data relocations will compile
> fine but malfunction when used, creating hard-to-debug bugs.
>
> Furthermore, we would like to increase the amount of C, to
> deduplicate/unify Xen's boot logic, as well as making it easier to
> follow.  Therefore, rework how the 32bit objects are included.
>
> Link all 32bit objects together first.  This allows for sharing of logic
> between translation units.  Use differential linking and explicit
> imports/exports to confirm that we only have the expected relocations,
> and write the object back out as an assembly file so it can be linked
> again as if it were 64bit, to integrate with the rest of Xen.
>
> This allows for the use of external references (e.g. access to global
> variables) with reasonable assurance of doing so safely.
>
> No functional change.
> ---%<---
>
> which I think is an accurate and more concise summary of what's changing?
>

You cut half of the explanation, replacing with nothing.
Why is a script needed? Why 2 linking? How the new method detects
unwanted relocations?
Why wasn't possible to share functions?
Why using --orphan-handling option?

The description has been there for about 2 months without many objections.

> > diff --git a/xen/arch/x86/boot/.gitignore b/xen/arch/x86/boot/.gitignore
> > index a379db7988..7e85549751 100644
> > --- a/xen/arch/x86/boot/.gitignore
> > +++ b/xen/arch/x86/boot/.gitignore
> > @@ -1,3 +1,4 @@
> >  /mkelf32
> > -/*.bin
> > -/*.lnk
> > +/build32.*.lds
> > +/built-in-32.*.bin
> > +/built-in-32.*.map
>
> /built-in-32.S too
>

Sure

> And from a glance at the file, this adjustment in the combine script too:
>
> -print('\n\t.section\t.note.GNU-stack,"",@progbits', file=out)
> +print('\n\t.section .note.GNU-stack, "", @progbits', file=out)
>
> to have both .section's formatted in the same way.
>

Fine

>
> > diff --git a/xen/arch/x86/boot/build32.lds b/xen/arch/x86/boot/build32.lds.S
> > similarity index 70%
> > rename from xen/arch/x86/boot/build32.lds
> > rename to xen/arch/x86/boot/build32.lds.S
> > index 56edaa727b..e3f5e55261 100644
> > --- a/xen/arch/x86/boot/build32.lds
> > +++ b/xen/arch/x86/boot/build32.lds.S
> > <snip>
> >          *(.text)
> >          *(.text.*)
> > -        *(.data)
> > -        *(.data.*)
> >          *(.rodata)
> >          *(.rodata.*)
> > +        *(.data)
> > +        *(.data.*)
>
> Reordering .data and .rodata really isn't necessary.
>

Yes, I asked in some comment. No problem, can be removed.

I'll write another commit. Not anyway strong, this is the general
order of sections. Here won't make much difference, usually you want
this order to minimize page changes (both text and rodata are
read-only).


> I'd just drop this part of the diff.  I have some different follow-up
> for it anyway, which I've been holding off until after this first change
> is sorted.
>
> Everything here I'm happy to fix up on commit, if you're ok with me
> doing so.
>
> ~Andrew

Frediano
Roger Pau Monné Oct. 18, 2024, 11:41 a.m. UTC | #4
On Thu, Oct 17, 2024 at 02:31:19PM +0100, Frediano Ziglio wrote:
> The current method to include 32 bit C boot code is:
> - compile each function we want to use into a separate object file;
> - each function is compiled with -fpic option;
> - convert these object files to binary files. This operation removes GOP
>   which we don't want in the executable;
> - a small assembly part in each file add the entry point;
> - code can't have external references, all possible variables are passed
>   by value or pointer;
> - include these binary files in head.S.
> 
> There are currently some limitations:
> - code is compiled separately, it's not possible to share a function
>   (like memcpy) between different functions to use;
> - although code is compiled with -fpic there's no certainty there are
>   no relocations, specifically data ones. This can lead into hard to
>   find bugs;
> - it's hard to add a simple function;
> - having to pass external variables makes hard to do multiple things
>   otherwise functions would require a lot of parameters so code would
>   have to be split into multiple functions which is not easy.
> 
> Current change extends the current process:
> - all object files are linked together before getting converted making
>   possible to share code between the function we want to call;
> - a single object file is generated with all functions to use and
>   exported symbols to easily call;
> - variables to use are declared in linker script and easily used inside
>   C code. Declaring them manually could be annoying but makes also
>   easier to check them. Using external pointers can be still an issue if
>   they are not fixed. If an external symbol is not declared this gives a
>   link error.
> 
> Some details of the implementation:
> - C code is compiled with -fpic flags (as before);
> - object files from C code are linked together;
> - the single bundled object file is linked with 2 slightly different
>   script files to generate 2 bundled object files;
> - the 2 bundled object files are converted to binary removing the need
>   for global offset tables;
> - a Python script is used to generate assembly source from the 2
>   binaries;
> - the single assembly file is compiled to generate final bundled object
>   file;
> - to detect possible unwanted relocation in data/code code is generated
>   with different addresses. This is enforced starting .text section at
>   different positions and adding a fixed "gap" at the beginning.
>   This makes sure code and data is position independent;
> - to detect used symbols in data/code symbols are placed in .text
>   section at different offsets (based on the line in the linker script).
>   This is needed as potentially a reference to a symbol is converted to
>   a reference to the containing section so multiple symbols could be
>   converted to reference to same symbol (section name) and we need to
>   distinguish them;
> - --orphan-handling=error option to linker is used to make sure we
>   account for all possible sections from C code;
> 
> Current limitations:
> - the main one is the lack of support for 64 bit code. It would make
>   sure that even the code used for 64 bit (at the moment EFI code) is
>   code and data position independent. We cannot assume that code that
>   came from code compiled for 32 bit and compiled for 64 bit is code and
>   data position independent, different compiler options lead to
>   different code/data.
> 
> Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>
> ---
> Changes since v2:
> - removed W^X limitation, allowing data;
> - added some comments to python script;
> - added extension to python script;
> - added header to generated assembly code from python script;
> - added starting symbol to generated assembly code from python script
>   to make disassembly more clear;
> - other minor style changes to python script.
> 
> Changes since v4:
> - add build32.final.lds build32.other.lds to targets macro;
> - place some comments over a rule, not inside;
> - simplified linking and producing binary rule;
> - renamed built_in_32 to built-in-32, coding style;
> - fix minor indentation;
> - put magic numbers in Makefile and propagate them;
> - minor variable cleanups in Python script;
> - add dependency to Python script.
> 
> Changes since v5:
> - renamed "other" and "final" phases to "base" and "offset";
> - use if_changed macro to generate built-in-32.S.
> ---
>  xen/arch/x86/boot/.gitignore                  |   5 +-
>  xen/arch/x86/boot/Makefile                    |  47 +++-
>  .../x86/boot/{build32.lds => build32.lds.S}   |  35 ++-
>  xen/arch/x86/boot/cmdline.c                   |  12 -
>  xen/arch/x86/boot/head.S                      |  12 -
>  xen/arch/x86/boot/reloc.c                     |  14 --
>  xen/tools/combine_two_binaries.py             | 220 ++++++++++++++++++
>  7 files changed, 292 insertions(+), 53 deletions(-)
>  rename xen/arch/x86/boot/{build32.lds => build32.lds.S} (70%)
>  create mode 100755 xen/tools/combine_two_binaries.py
> 
> diff --git a/xen/arch/x86/boot/.gitignore b/xen/arch/x86/boot/.gitignore
> index a379db7988..7e85549751 100644
> --- a/xen/arch/x86/boot/.gitignore
> +++ b/xen/arch/x86/boot/.gitignore
> @@ -1,3 +1,4 @@
>  /mkelf32
> -/*.bin
> -/*.lnk
> +/build32.*.lds
> +/built-in-32.*.bin
> +/built-in-32.*.map
> diff --git a/xen/arch/x86/boot/Makefile b/xen/arch/x86/boot/Makefile
> index 1199291d2b..5da19501be 100644
> --- a/xen/arch/x86/boot/Makefile
> +++ b/xen/arch/x86/boot/Makefile
> @@ -1,4 +1,5 @@
>  obj-bin-y += head.o
> +obj-bin-y += built-in-32.o
>  
>  obj32 := cmdline.32.o
>  obj32 += reloc.32.o
> @@ -9,9 +10,6 @@ targets   += $(obj32)
>  
>  obj32 := $(addprefix $(obj)/,$(obj32))
>  
> -$(obj)/head.o: AFLAGS-y += -Wa$(comma)-I$(obj)
> -$(obj)/head.o: $(obj32:.32.o=.bin)
> -
>  CFLAGS_x86_32 := $(subst -m64,-m32 -march=i686,$(XEN_TREEWIDE_CFLAGS))
>  $(call cc-options-add,CFLAGS_x86_32,CC,$(EMBEDDED_EXTRA_CFLAGS))
>  CFLAGS_x86_32 += -Werror -fno-builtin -g0 -msoft-float -mregparm=3
> @@ -25,14 +23,47 @@ $(obj32): XEN_CFLAGS := $(CFLAGS_x86_32) -fpic
>  $(obj)/%.32.o: $(src)/%.c FORCE
>  	$(call if_changed_rule,cc_o_c)
>  
> +orphan-handling-$(call ld-option,--orphan-handling=error) := --orphan-handling=error
>  LDFLAGS_DIRECT-$(call ld-option,--warn-rwx-segments) := --no-warn-rwx-segments
>  LDFLAGS_DIRECT += $(LDFLAGS_DIRECT-y)
>  LD32 := $(LD) $(subst x86_64,i386,$(LDFLAGS_DIRECT))
>  
> -%.bin: %.lnk
> -	$(OBJCOPY) -j .text -O binary $< $@
> +text_gap := 0x010200
> +text_diff := 0x408020
> +
> +$(obj)/build32.base.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff)
> +$(obj)/build32.offset.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff) -DFINAL
> +$(obj)/build32.base.lds $(obj)/build32.offset.lds: $(src)/build32.lds.S FORCE
> +	$(call if_changed_dep,cpp_lds_S)
> +
> +targets += build32.offset.lds build32.base.lds
> +
> +# link all 32bit objects together
> +$(obj)/built-in-32.tmp.o: $(obj32)
> +	$(LD32) -r -o $@ $^
> +
> +# link bundle with a given layout and extract a binary from it
> +$(obj)/built-in-32.%.bin: $(obj)/build32.%.lds $(obj)/built-in-32.tmp.o
> +	$(LD32) $(orphan-handling-y) -N -T $< -Map $(@:bin=map) -o $(@:bin=o) $(filter %.o,$^)
> +	$(OBJCOPY) -j .text -O binary $(@:bin=o) $@
> +	rm -f $(@:bin=o)
> +
> +quiet_cmd_combine = GEN     $@
> +cmd_combine = \
> +	$(PYTHON) $(srctree)/tools/combine_two_binaries.py \
> +		--gap=$(text_gap) --text-diff=$(text_diff) \
> +		--script $(obj)/build32.offset.lds \
> +		--bin1 $(obj)/built-in-32.base.bin \
> +		--bin2 $(obj)/built-in-32.offset.bin \
> +		--map $(obj)/built-in-32.offset.map \
> +		--exports cmdline_parse_early,reloc \
> +		--output $@

See xen/Rules.mk, for consistency the indentation should be done with
spaces when defining variables.  That would also allow to align the
options.

> +
> +targets += built-in-32.S
>  
> -%.lnk: %.32.o $(src)/build32.lds
> -	$(LD32) -N -T $(filter %.lds,$^) -o $@ $<
> +# generate final object file combining and checking above binaries
> +$(obj)/built-in-32.S: $(obj)/built-in-32.base.bin $(obj)/built-in-32.offset.bin \
> +		$(srctree)/tools/combine_two_binaries.py FORCE

Can you indent this using spaces also, so it's on the same column as
the ':'?

> +	$(call if_changed,combine)
>  
> -clean-files := *.lnk *.bin
> +clean-files := built-in-32.*.bin built-in-32.*.map build32.*.lds
> diff --git a/xen/arch/x86/boot/build32.lds b/xen/arch/x86/boot/build32.lds.S
> similarity index 70%
> rename from xen/arch/x86/boot/build32.lds
> rename to xen/arch/x86/boot/build32.lds.S
> index 56edaa727b..e3f5e55261 100644
> --- a/xen/arch/x86/boot/build32.lds
> +++ b/xen/arch/x86/boot/build32.lds.S
> @@ -15,22 +15,47 @@
>   * with this program.  If not, see <http://www.gnu.org/licenses/>.
>   */
>  
> -ENTRY(_start)
> +#ifdef FINAL
> +#  undef GAP
> +#  define GAP 0
> +#  define MULT 0
> +#  define TEXT_START
> +#else
> +#  define MULT 1
> +#  define TEXT_START TEXT_DIFF
> +#endif

In other places we use a single space between the hash and the define.

> +#define DECLARE_IMPORT(name) name = . + (__LINE__ * MULT)
> +
> +ENTRY(dummy_start)
>  
>  SECTIONS
>  {
>    /* Merge code and data into one section. */
> -  .text : {
> +  .text TEXT_START : {
> +        /* Silence linker warning, we are not going to use it */
> +        dummy_start = .;
> +
> +        /* Declare below any symbol name needed.
> +         * Each symbol should be on its own line.
> +         * It looks like a tedious work but we make sure the things we use.
> +         * Potentially they should be all variables. */

The style is wrong for the opening and closing comment delimiters.

I think it would be best if this was written in a more natural style.

/*
 * Any symbols used should be declared below, this ensures which
 * symbols are visible to the 32bit C boot code.
 */

I don't think you need to mention that each symbol should be on it's
own line.

Thanks, Roger.
Roger Pau Monné Oct. 18, 2024, 11:49 a.m. UTC | #5
On Fri, Oct 18, 2024 at 09:42:48AM +0100, Frediano Ziglio wrote:
> On Thu, Oct 17, 2024 at 6:13 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> >
> > On 17/10/2024 2:31 pm, Frediano Ziglio wrote:
> > > The current method to include 32 bit C boot code is:
> > > - compile each function we want to use into a separate object file;
> > > - each function is compiled with -fpic option;
> > > - convert these object files to binary files. This operation removes GOP
> > >   which we don't want in the executable;
> > > - a small assembly part in each file add the entry point;
> > > - code can't have external references, all possible variables are passed
> > >   by value or pointer;
> > > - include these binary files in head.S.
> > >
> > > There are currently some limitations:
> > > - code is compiled separately, it's not possible to share a function
> > >   (like memcpy) between different functions to use;
> > > - although code is compiled with -fpic there's no certainty there are
> > >   no relocations, specifically data ones. This can lead into hard to
> > >   find bugs;
> > > - it's hard to add a simple function;
> > > - having to pass external variables makes hard to do multiple things
> > >   otherwise functions would require a lot of parameters so code would
> > >   have to be split into multiple functions which is not easy.
> > >
> > > Current change extends the current process:
> > > - all object files are linked together before getting converted making
> > >   possible to share code between the function we want to call;
> > > - a single object file is generated with all functions to use and
> > >   exported symbols to easily call;
> > > - variables to use are declared in linker script and easily used inside
> > >   C code. Declaring them manually could be annoying but makes also
> > >   easier to check them. Using external pointers can be still an issue if
> > >   they are not fixed. If an external symbol is not declared this gives a
> > >   link error.
> > >
> > > Some details of the implementation:
> > > - C code is compiled with -fpic flags (as before);
> > > - object files from C code are linked together;
> > > - the single bundled object file is linked with 2 slightly different
> > >   script files to generate 2 bundled object files;
> > > - the 2 bundled object files are converted to binary removing the need
> > >   for global offset tables;
> > > - a Python script is used to generate assembly source from the 2
> > >   binaries;
> > > - the single assembly file is compiled to generate final bundled object
> > >   file;
> > > - to detect possible unwanted relocation in data/code code is generated
> > >   with different addresses. This is enforced starting .text section at
> > >   different positions and adding a fixed "gap" at the beginning.
> > >   This makes sure code and data is position independent;
> > > - to detect used symbols in data/code symbols are placed in .text
> > >   section at different offsets (based on the line in the linker script).
> > >   This is needed as potentially a reference to a symbol is converted to
> > >   a reference to the containing section so multiple symbols could be
> > >   converted to reference to same symbol (section name) and we need to
> > >   distinguish them;
> > > - --orphan-handling=error option to linker is used to make sure we
> > >   account for all possible sections from C code;
> > >
> > > Current limitations:
> > > - the main one is the lack of support for 64 bit code. It would make
> > >   sure that even the code used for 64 bit (at the moment EFI code) is
> > >   code and data position independent. We cannot assume that code that
> > >   came from code compiled for 32 bit and compiled for 64 bit is code and
> > >   data position independent, different compiler options lead to
> > >   different code/data.
> > >
> > > Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>
> >
> > This commit message is not particularly easy to follow.  Can I recommend
> > the following:
> >
> > ---%<---
> > x86/boot: Rework how 32bit C is linked/included for early boot
> >
> > Right now, the two functions which were really too complicated to write
> > in asm are compiled as 32bit PIC, linked to a blob and included
> > directly, using global asm() to arrange for them to have function semantics.
> >
> > This is limiting and fragile; the use of data relocations will compile
> > fine but malfunction when used, creating hard-to-debug bugs.
> >
> > Furthermore, we would like to increase the amount of C, to
> > deduplicate/unify Xen's boot logic, as well as making it easier to
> > follow.  Therefore, rework how the 32bit objects are included.
> >
> > Link all 32bit objects together first.  This allows for sharing of logic
> > between translation units.  Use differential linking and explicit
> > imports/exports to confirm that we only have the expected relocations,
> > and write the object back out as an assembly file so it can be linked
> > again as if it were 64bit, to integrate with the rest of Xen.
> >
> > This allows for the use of external references (e.g. access to global
> > variables) with reasonable assurance of doing so safely.
> >
> > No functional change.
> > ---%<---
> >
> > which I think is an accurate and more concise summary of what's changing?
> >
> 
> You cut half of the explanation, replacing with nothing.
> Why is a script needed? Why 2 linking? How the new method detects
> unwanted relocations?

TBH this is not clear to me even with your original commit message.

> Why wasn't possible to share functions?
> Why using --orphan-handling option?
> 
> The description has been there for about 2 months without many objections.

IMO it's fine to use lists to describe specific points, but using
lists exclusively to write a commit message makes the items feel
disconnected between them.

The format of the commit message by Andrew is clearer to undertsand
for me.  Could you add what you think it's missing to the proposed
message by Andrew?

Thanks, Roger.
Jan Beulich Oct. 18, 2024, 12:04 p.m. UTC | #6
On 18.10.2024 13:41, Roger Pau Monné wrote:
> On Thu, Oct 17, 2024 at 02:31:19PM +0100, Frediano Ziglio wrote:
>> @@ -25,14 +23,47 @@ $(obj32): XEN_CFLAGS := $(CFLAGS_x86_32) -fpic
>>  $(obj)/%.32.o: $(src)/%.c FORCE
>>  	$(call if_changed_rule,cc_o_c)
>>  
>> +orphan-handling-$(call ld-option,--orphan-handling=error) := --orphan-handling=error
>>  LDFLAGS_DIRECT-$(call ld-option,--warn-rwx-segments) := --no-warn-rwx-segments
>>  LDFLAGS_DIRECT += $(LDFLAGS_DIRECT-y)
>>  LD32 := $(LD) $(subst x86_64,i386,$(LDFLAGS_DIRECT))
>>  
>> -%.bin: %.lnk
>> -	$(OBJCOPY) -j .text -O binary $< $@
>> +text_gap := 0x010200
>> +text_diff := 0x408020
>> +
>> +$(obj)/build32.base.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff)
>> +$(obj)/build32.offset.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff) -DFINAL
>> +$(obj)/build32.base.lds $(obj)/build32.offset.lds: $(src)/build32.lds.S FORCE
>> +	$(call if_changed_dep,cpp_lds_S)
>> +
>> +targets += build32.offset.lds build32.base.lds
>> +
>> +# link all 32bit objects together
>> +$(obj)/built-in-32.tmp.o: $(obj32)
>> +	$(LD32) -r -o $@ $^
>> +
>> +# link bundle with a given layout and extract a binary from it
>> +$(obj)/built-in-32.%.bin: $(obj)/build32.%.lds $(obj)/built-in-32.tmp.o
>> +	$(LD32) $(orphan-handling-y) -N -T $< -Map $(@:bin=map) -o $(@:bin=o) $(filter %.o,$^)
>> +	$(OBJCOPY) -j .text -O binary $(@:bin=o) $@
>> +	rm -f $(@:bin=o)
>> +
>> +quiet_cmd_combine = GEN     $@
>> +cmd_combine = \
>> +	$(PYTHON) $(srctree)/tools/combine_two_binaries.py \
>> +		--gap=$(text_gap) --text-diff=$(text_diff) \
>> +		--script $(obj)/build32.offset.lds \
>> +		--bin1 $(obj)/built-in-32.base.bin \
>> +		--bin2 $(obj)/built-in-32.offset.bin \
>> +		--map $(obj)/built-in-32.offset.map \
>> +		--exports cmdline_parse_early,reloc \
>> +		--output $@
> 
> See xen/Rules.mk, for consistency the indentation should be done with
> spaces when defining variables.  That would also allow to align the
> options.

And ideally also such that the options align with the first program
argument.

>> +
>> +targets += built-in-32.S
>>  
>> -%.lnk: %.32.o $(src)/build32.lds
>> -	$(LD32) -N -T $(filter %.lds,$^) -o $@ $<
>> +# generate final object file combining and checking above binaries
>> +$(obj)/built-in-32.S: $(obj)/built-in-32.base.bin $(obj)/built-in-32.offset.bin \
>> +		$(srctree)/tools/combine_two_binaries.py FORCE
> 
> Can you indent this using spaces also, so it's on the same column as
> the ':'?

The first $(obj) you mean, I think / hope?

Jan
Frediano Ziglio Oct. 18, 2024, 12:42 p.m. UTC | #7
On Thu, Oct 17, 2024 at 5:00 PM Anthony PERARD
<anthony.perard@vates.tech> wrote:
>
> On Thu, Oct 17, 2024 at 02:31:19PM +0100, Frediano Ziglio wrote:
> > +$(obj)/build32.base.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff)
> > +$(obj)/build32.offset.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff) -DFINAL
>
> I was somehow expecting "base" and "offset" to be the other way around,

That's weird. I mean, the "base" uses a GAP and TEXT_START of 0 while
for "offset" they are not zero adding some offset to the code/data.

> but that's fine. And by the way, -DFINAL cancel changes to GAP and
> TEXT_DIFF ;-).
>

Yes, and potentially the first like added above can be removed. On the
other hand, they are values used for the algorithm despite them being
used in some cases or not.

> But overall, the changes looks nice as is,
> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
>
> Thanks,
>
Thanks,

Frediano
Frediano Ziglio Oct. 18, 2024, 12:48 p.m. UTC | #8
On Fri, Oct 18, 2024 at 12:41 PM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Thu, Oct 17, 2024 at 02:31:19PM +0100, Frediano Ziglio wrote:
> > The current method to include 32 bit C boot code is:
> > - compile each function we want to use into a separate object file;
> > - each function is compiled with -fpic option;
> > - convert these object files to binary files. This operation removes GOP
> >   which we don't want in the executable;
> > - a small assembly part in each file add the entry point;
> > - code can't have external references, all possible variables are passed
> >   by value or pointer;
> > - include these binary files in head.S.
> >
> > There are currently some limitations:
> > - code is compiled separately, it's not possible to share a function
> >   (like memcpy) between different functions to use;
> > - although code is compiled with -fpic there's no certainty there are
> >   no relocations, specifically data ones. This can lead into hard to
> >   find bugs;
> > - it's hard to add a simple function;
> > - having to pass external variables makes hard to do multiple things
> >   otherwise functions would require a lot of parameters so code would
> >   have to be split into multiple functions which is not easy.
> >
> > Current change extends the current process:
> > - all object files are linked together before getting converted making
> >   possible to share code between the function we want to call;
> > - a single object file is generated with all functions to use and
> >   exported symbols to easily call;
> > - variables to use are declared in linker script and easily used inside
> >   C code. Declaring them manually could be annoying but makes also
> >   easier to check them. Using external pointers can be still an issue if
> >   they are not fixed. If an external symbol is not declared this gives a
> >   link error.
> >
> > Some details of the implementation:
> > - C code is compiled with -fpic flags (as before);
> > - object files from C code are linked together;
> > - the single bundled object file is linked with 2 slightly different
> >   script files to generate 2 bundled object files;
> > - the 2 bundled object files are converted to binary removing the need
> >   for global offset tables;
> > - a Python script is used to generate assembly source from the 2
> >   binaries;
> > - the single assembly file is compiled to generate final bundled object
> >   file;
> > - to detect possible unwanted relocation in data/code code is generated
> >   with different addresses. This is enforced starting .text section at
> >   different positions and adding a fixed "gap" at the beginning.
> >   This makes sure code and data is position independent;
> > - to detect used symbols in data/code symbols are placed in .text
> >   section at different offsets (based on the line in the linker script).
> >   This is needed as potentially a reference to a symbol is converted to
> >   a reference to the containing section so multiple symbols could be
> >   converted to reference to same symbol (section name) and we need to
> >   distinguish them;
> > - --orphan-handling=error option to linker is used to make sure we
> >   account for all possible sections from C code;
> >
> > Current limitations:
> > - the main one is the lack of support for 64 bit code. It would make
> >   sure that even the code used for 64 bit (at the moment EFI code) is
> >   code and data position independent. We cannot assume that code that
> >   came from code compiled for 32 bit and compiled for 64 bit is code and
> >   data position independent, different compiler options lead to
> >   different code/data.
> >
> > Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>
> > ---
> > Changes since v2:
> > - removed W^X limitation, allowing data;
> > - added some comments to python script;
> > - added extension to python script;
> > - added header to generated assembly code from python script;
> > - added starting symbol to generated assembly code from python script
> >   to make disassembly more clear;
> > - other minor style changes to python script.
> >
> > Changes since v4:
> > - add build32.final.lds build32.other.lds to targets macro;
> > - place some comments over a rule, not inside;
> > - simplified linking and producing binary rule;
> > - renamed built_in_32 to built-in-32, coding style;
> > - fix minor indentation;
> > - put magic numbers in Makefile and propagate them;
> > - minor variable cleanups in Python script;
> > - add dependency to Python script.
> >
> > Changes since v5:
> > - renamed "other" and "final" phases to "base" and "offset";
> > - use if_changed macro to generate built-in-32.S.
> > ---
> >  xen/arch/x86/boot/.gitignore                  |   5 +-
> >  xen/arch/x86/boot/Makefile                    |  47 +++-
> >  .../x86/boot/{build32.lds => build32.lds.S}   |  35 ++-
> >  xen/arch/x86/boot/cmdline.c                   |  12 -
> >  xen/arch/x86/boot/head.S                      |  12 -
> >  xen/arch/x86/boot/reloc.c                     |  14 --
> >  xen/tools/combine_two_binaries.py             | 220 ++++++++++++++++++
> >  7 files changed, 292 insertions(+), 53 deletions(-)
> >  rename xen/arch/x86/boot/{build32.lds => build32.lds.S} (70%)
> >  create mode 100755 xen/tools/combine_two_binaries.py
> >
> > diff --git a/xen/arch/x86/boot/.gitignore b/xen/arch/x86/boot/.gitignore
> > index a379db7988..7e85549751 100644
> > --- a/xen/arch/x86/boot/.gitignore
> > +++ b/xen/arch/x86/boot/.gitignore
> > @@ -1,3 +1,4 @@
> >  /mkelf32
> > -/*.bin
> > -/*.lnk
> > +/build32.*.lds
> > +/built-in-32.*.bin
> > +/built-in-32.*.map
> > diff --git a/xen/arch/x86/boot/Makefile b/xen/arch/x86/boot/Makefile
> > index 1199291d2b..5da19501be 100644
> > --- a/xen/arch/x86/boot/Makefile
> > +++ b/xen/arch/x86/boot/Makefile
> > @@ -1,4 +1,5 @@
> >  obj-bin-y += head.o
> > +obj-bin-y += built-in-32.o
> >
> >  obj32 := cmdline.32.o
> >  obj32 += reloc.32.o
> > @@ -9,9 +10,6 @@ targets   += $(obj32)
> >
> >  obj32 := $(addprefix $(obj)/,$(obj32))
> >
> > -$(obj)/head.o: AFLAGS-y += -Wa$(comma)-I$(obj)
> > -$(obj)/head.o: $(obj32:.32.o=.bin)
> > -
> >  CFLAGS_x86_32 := $(subst -m64,-m32 -march=i686,$(XEN_TREEWIDE_CFLAGS))
> >  $(call cc-options-add,CFLAGS_x86_32,CC,$(EMBEDDED_EXTRA_CFLAGS))
> >  CFLAGS_x86_32 += -Werror -fno-builtin -g0 -msoft-float -mregparm=3
> > @@ -25,14 +23,47 @@ $(obj32): XEN_CFLAGS := $(CFLAGS_x86_32) -fpic
> >  $(obj)/%.32.o: $(src)/%.c FORCE
> >       $(call if_changed_rule,cc_o_c)
> >
> > +orphan-handling-$(call ld-option,--orphan-handling=error) := --orphan-handling=error
> >  LDFLAGS_DIRECT-$(call ld-option,--warn-rwx-segments) := --no-warn-rwx-segments
> >  LDFLAGS_DIRECT += $(LDFLAGS_DIRECT-y)
> >  LD32 := $(LD) $(subst x86_64,i386,$(LDFLAGS_DIRECT))
> >
> > -%.bin: %.lnk
> > -     $(OBJCOPY) -j .text -O binary $< $@
> > +text_gap := 0x010200
> > +text_diff := 0x408020
> > +
> > +$(obj)/build32.base.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff)
> > +$(obj)/build32.offset.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff) -DFINAL
> > +$(obj)/build32.base.lds $(obj)/build32.offset.lds: $(src)/build32.lds.S FORCE
> > +     $(call if_changed_dep,cpp_lds_S)
> > +
> > +targets += build32.offset.lds build32.base.lds
> > +
> > +# link all 32bit objects together
> > +$(obj)/built-in-32.tmp.o: $(obj32)
> > +     $(LD32) -r -o $@ $^
> > +
> > +# link bundle with a given layout and extract a binary from it
> > +$(obj)/built-in-32.%.bin: $(obj)/build32.%.lds $(obj)/built-in-32.tmp.o
> > +     $(LD32) $(orphan-handling-y) -N -T $< -Map $(@:bin=map) -o $(@:bin=o) $(filter %.o,$^)
> > +     $(OBJCOPY) -j .text -O binary $(@:bin=o) $@
> > +     rm -f $(@:bin=o)
> > +
> > +quiet_cmd_combine = GEN     $@
> > +cmd_combine = \
> > +     $(PYTHON) $(srctree)/tools/combine_two_binaries.py \
> > +             --gap=$(text_gap) --text-diff=$(text_diff) \
> > +             --script $(obj)/build32.offset.lds \
> > +             --bin1 $(obj)/built-in-32.base.bin \
> > +             --bin2 $(obj)/built-in-32.offset.bin \
> > +             --map $(obj)/built-in-32.offset.map \
> > +             --exports cmdline_parse_early,reloc \
> > +             --output $@
>
> See xen/Rules.mk, for consistency the indentation should be done with
> spaces when defining variables.  That would also allow to align the
> options.
>

Changed.

Is there a reason why these variables (I think the correct make
terminology is macros) use "=" and not ":=" ?

> > +
> > +targets += built-in-32.S
> >
> > -%.lnk: %.32.o $(src)/build32.lds
> > -     $(LD32) -N -T $(filter %.lds,$^) -o $@ $<
> > +# generate final object file combining and checking above binaries
> > +$(obj)/built-in-32.S: $(obj)/built-in-32.base.bin $(obj)/built-in-32.offset.bin \
> > +             $(srctree)/tools/combine_two_binaries.py FORCE
>
> Can you indent this using spaces also, so it's on the same column as
> the ':'?
>

Changed.

> > +     $(call if_changed,combine)
> >
> > -clean-files := *.lnk *.bin
> > +clean-files := built-in-32.*.bin built-in-32.*.map build32.*.lds
> > diff --git a/xen/arch/x86/boot/build32.lds b/xen/arch/x86/boot/build32.lds.S
> > similarity index 70%
> > rename from xen/arch/x86/boot/build32.lds
> > rename to xen/arch/x86/boot/build32.lds.S
> > index 56edaa727b..e3f5e55261 100644
> > --- a/xen/arch/x86/boot/build32.lds
> > +++ b/xen/arch/x86/boot/build32.lds.S
> > @@ -15,22 +15,47 @@
> >   * with this program.  If not, see <http://www.gnu.org/licenses/>.
> >   */
> >
> > -ENTRY(_start)
> > +#ifdef FINAL
> > +#  undef GAP
> > +#  define GAP 0
> > +#  define MULT 0
> > +#  define TEXT_START
> > +#else
> > +#  define MULT 1
> > +#  define TEXT_START TEXT_DIFF
> > +#endif
>
> In other places we use a single space between the hash and the define.
>

Changed.
This file has very weird indentation rules.

> > +#define DECLARE_IMPORT(name) name = . + (__LINE__ * MULT)
> > +
> > +ENTRY(dummy_start)
> >
> >  SECTIONS
> >  {
> >    /* Merge code and data into one section. */
> > -  .text : {
> > +  .text TEXT_START : {
> > +        /* Silence linker warning, we are not going to use it */
> > +        dummy_start = .;
> > +
> > +        /* Declare below any symbol name needed.
> > +         * Each symbol should be on its own line.
> > +         * It looks like a tedious work but we make sure the things we use.
> > +         * Potentially they should be all variables. */
>
> The style is wrong for the opening and closing comment delimiters.
>
> I think it would be best if this was written in a more natural style.
>
> /*
>  * Any symbols used should be declared below, this ensures which
>  * symbols are visible to the 32bit C boot code.
>  */
>

But why to remove the "Potentially they should be all variables.".
Surely something not written is more clear than something written, but
on the other way it carries no information.

> I don't think you need to mention that each symbol should be on it's
> own line.
>

Yes, this is also enforced by Python script, so you can't do that
mistake in any case.

> Thanks, Roger.

Frediano
Roger Pau Monné Oct. 18, 2024, 12:55 p.m. UTC | #9
On Fri, Oct 18, 2024 at 02:04:22PM +0200, Jan Beulich wrote:
> On 18.10.2024 13:41, Roger Pau Monné wrote:
> > On Thu, Oct 17, 2024 at 02:31:19PM +0100, Frediano Ziglio wrote:
> >> @@ -25,14 +23,47 @@ $(obj32): XEN_CFLAGS := $(CFLAGS_x86_32) -fpic
> >>  $(obj)/%.32.o: $(src)/%.c FORCE
> >>  	$(call if_changed_rule,cc_o_c)
> >>  
> >> +orphan-handling-$(call ld-option,--orphan-handling=error) := --orphan-handling=error
> >>  LDFLAGS_DIRECT-$(call ld-option,--warn-rwx-segments) := --no-warn-rwx-segments
> >>  LDFLAGS_DIRECT += $(LDFLAGS_DIRECT-y)
> >>  LD32 := $(LD) $(subst x86_64,i386,$(LDFLAGS_DIRECT))
> >>  
> >> -%.bin: %.lnk
> >> -	$(OBJCOPY) -j .text -O binary $< $@
> >> +text_gap := 0x010200
> >> +text_diff := 0x408020
> >> +
> >> +$(obj)/build32.base.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff)
> >> +$(obj)/build32.offset.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff) -DFINAL
> >> +$(obj)/build32.base.lds $(obj)/build32.offset.lds: $(src)/build32.lds.S FORCE
> >> +	$(call if_changed_dep,cpp_lds_S)
> >> +
> >> +targets += build32.offset.lds build32.base.lds
> >> +
> >> +# link all 32bit objects together
> >> +$(obj)/built-in-32.tmp.o: $(obj32)
> >> +	$(LD32) -r -o $@ $^
> >> +
> >> +# link bundle with a given layout and extract a binary from it
> >> +$(obj)/built-in-32.%.bin: $(obj)/build32.%.lds $(obj)/built-in-32.tmp.o
> >> +	$(LD32) $(orphan-handling-y) -N -T $< -Map $(@:bin=map) -o $(@:bin=o) $(filter %.o,$^)
> >> +	$(OBJCOPY) -j .text -O binary $(@:bin=o) $@
> >> +	rm -f $(@:bin=o)
> >> +
> >> +quiet_cmd_combine = GEN     $@
> >> +cmd_combine = \
> >> +	$(PYTHON) $(srctree)/tools/combine_two_binaries.py \
> >> +		--gap=$(text_gap) --text-diff=$(text_diff) \
> >> +		--script $(obj)/build32.offset.lds \
> >> +		--bin1 $(obj)/built-in-32.base.bin \
> >> +		--bin2 $(obj)/built-in-32.offset.bin \
> >> +		--map $(obj)/built-in-32.offset.map \
> >> +		--exports cmdline_parse_early,reloc \
> >> +		--output $@
> > 
> > See xen/Rules.mk, for consistency the indentation should be done with
> > spaces when defining variables.  That would also allow to align the
> > options.
> 
> And ideally also such that the options align with the first program
> argument.

Yup, that's what I was attempting to suggest, sorry if it wasn't
clear.

> >> +
> >> +targets += built-in-32.S
> >>  
> >> -%.lnk: %.32.o $(src)/build32.lds
> >> -	$(LD32) -N -T $(filter %.lds,$^) -o $@ $<
> >> +# generate final object file combining and checking above binaries
> >> +$(obj)/built-in-32.S: $(obj)/built-in-32.base.bin $(obj)/built-in-32.offset.bin \
> >> +		$(srctree)/tools/combine_two_binaries.py FORCE
> > 
> > Can you indent this using spaces also, so it's on the same column as
> > the ':'?
> 
> The first $(obj) you mean, I think / hope?

Indeed, it's one space after the ':':

# Generate final object file combining and checking above binaries
$(obj)/built-in-32.S: $(obj)/built-in-32.base.bin $(obj)/built-in-32.offset.bin \
                      $(srctree)/tools/combine_two_binaries.py FORCE

Preferably comments should also start with an uppercase letter.

Note this already exceeds the 80 char maximum, as the longest line is
81.  I think we have been a bit lax with Makefiles however.

Thanks, Roger.
Roger Pau Monné Oct. 18, 2024, 12:59 p.m. UTC | #10
On Fri, Oct 18, 2024 at 01:48:27PM +0100, Frediano Ziglio wrote:
> On Fri, Oct 18, 2024 at 12:41 PM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > On Thu, Oct 17, 2024 at 02:31:19PM +0100, Frediano Ziglio wrote:
> > > +#define DECLARE_IMPORT(name) name = . + (__LINE__ * MULT)
> > > +
> > > +ENTRY(dummy_start)
> > >
> > >  SECTIONS
> > >  {
> > >    /* Merge code and data into one section. */
> > > -  .text : {
> > > +  .text TEXT_START : {
> > > +        /* Silence linker warning, we are not going to use it */
> > > +        dummy_start = .;
> > > +
> > > +        /* Declare below any symbol name needed.
> > > +         * Each symbol should be on its own line.
> > > +         * It looks like a tedious work but we make sure the things we use.
> > > +         * Potentially they should be all variables. */
> >
> > The style is wrong for the opening and closing comment delimiters.
> >
> > I think it would be best if this was written in a more natural style.
> >
> > /*
> >  * Any symbols used should be declared below, this ensures which
> >  * symbols are visible to the 32bit C boot code.
> >  */
> >
> 
> But why to remove the "Potentially they should be all variables.".
> Surely something not written is more clear than something written, but
> on the other way it carries no information.

I'm not sure I understand why this is helpful: either they are
mandated to be only variables, and hence the "potentially" is wrong, or
they are not, in which case I don't see why spelling a desire for they
to be only variables is helpful if it's not a strict requirement.

Thanks, Roger.
Frediano Ziglio Oct. 18, 2024, 1:28 p.m. UTC | #11
On Fri, Oct 18, 2024 at 12:49 PM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Fri, Oct 18, 2024 at 09:42:48AM +0100, Frediano Ziglio wrote:
> > On Thu, Oct 17, 2024 at 6:13 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> > >
> > > On 17/10/2024 2:31 pm, Frediano Ziglio wrote:
> > > > The current method to include 32 bit C boot code is:
> > > > - compile each function we want to use into a separate object file;
> > > > - each function is compiled with -fpic option;
> > > > - convert these object files to binary files. This operation removes GOP
> > > >   which we don't want in the executable;
> > > > - a small assembly part in each file add the entry point;
> > > > - code can't have external references, all possible variables are passed
> > > >   by value or pointer;
> > > > - include these binary files in head.S.
> > > >
> > > > There are currently some limitations:
> > > > - code is compiled separately, it's not possible to share a function
> > > >   (like memcpy) between different functions to use;
> > > > - although code is compiled with -fpic there's no certainty there are
> > > >   no relocations, specifically data ones. This can lead into hard to
> > > >   find bugs;
> > > > - it's hard to add a simple function;
> > > > - having to pass external variables makes hard to do multiple things
> > > >   otherwise functions would require a lot of parameters so code would
> > > >   have to be split into multiple functions which is not easy.
> > > >
> > > > Current change extends the current process:
> > > > - all object files are linked together before getting converted making
> > > >   possible to share code between the function we want to call;
> > > > - a single object file is generated with all functions to use and
> > > >   exported symbols to easily call;
> > > > - variables to use are declared in linker script and easily used inside
> > > >   C code. Declaring them manually could be annoying but makes also
> > > >   easier to check them. Using external pointers can be still an issue if
> > > >   they are not fixed. If an external symbol is not declared this gives a
> > > >   link error.
> > > >
> > > > Some details of the implementation:
> > > > - C code is compiled with -fpic flags (as before);
> > > > - object files from C code are linked together;
> > > > - the single bundled object file is linked with 2 slightly different
> > > >   script files to generate 2 bundled object files;
> > > > - the 2 bundled object files are converted to binary removing the need
> > > >   for global offset tables;
> > > > - a Python script is used to generate assembly source from the 2
> > > >   binaries;
> > > > - the single assembly file is compiled to generate final bundled object
> > > >   file;
> > > > - to detect possible unwanted relocation in data/code code is generated
> > > >   with different addresses. This is enforced starting .text section at
> > > >   different positions and adding a fixed "gap" at the beginning.
> > > >   This makes sure code and data is position independent;
> > > > - to detect used symbols in data/code symbols are placed in .text
> > > >   section at different offsets (based on the line in the linker script).
> > > >   This is needed as potentially a reference to a symbol is converted to
> > > >   a reference to the containing section so multiple symbols could be
> > > >   converted to reference to same symbol (section name) and we need to
> > > >   distinguish them;
> > > > - --orphan-handling=error option to linker is used to make sure we
> > > >   account for all possible sections from C code;
> > > >
> > > > Current limitations:
> > > > - the main one is the lack of support for 64 bit code. It would make
> > > >   sure that even the code used for 64 bit (at the moment EFI code) is
> > > >   code and data position independent. We cannot assume that code that
> > > >   came from code compiled for 32 bit and compiled for 64 bit is code and
> > > >   data position independent, different compiler options lead to
> > > >   different code/data.
> > > >
> > > > Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>
> > >
> > > This commit message is not particularly easy to follow.  Can I recommend
> > > the following:
> > >
> > > ---%<---
> > > x86/boot: Rework how 32bit C is linked/included for early boot
> > >
> > > Right now, the two functions which were really too complicated to write
> > > in asm are compiled as 32bit PIC, linked to a blob and included
> > > directly, using global asm() to arrange for them to have function semantics.
> > >
> > > This is limiting and fragile; the use of data relocations will compile
> > > fine but malfunction when used, creating hard-to-debug bugs.
> > >
> > > Furthermore, we would like to increase the amount of C, to
> > > deduplicate/unify Xen's boot logic, as well as making it easier to
> > > follow.  Therefore, rework how the 32bit objects are included.
> > >
> > > Link all 32bit objects together first.  This allows for sharing of logic
> > > between translation units.  Use differential linking and explicit
> > > imports/exports to confirm that we only have the expected relocations,
> > > and write the object back out as an assembly file so it can be linked
> > > again as if it were 64bit, to integrate with the rest of Xen.
> > >
> > > This allows for the use of external references (e.g. access to global
> > > variables) with reasonable assurance of doing so safely.
> > >
> > > No functional change.
> > > ---%<---
> > >
> > > which I think is an accurate and more concise summary of what's changing?
> > >
> >
> > You cut half of the explanation, replacing with nothing.
> > Why is a script needed? Why 2 linking? How the new method detects
> > unwanted relocations?
>
> TBH this is not clear to me even with your original commit message.
>

I'm starting to think that either me or Andrew are not the right
persons to write this, there's a lot of assumptions we assume for
granted.

From what I see, in my message:
----
- to detect possible unwanted relocation in data/code code is generated
  with different addresses. This is enforced starting .text section at
  different positions and adding a fixed "gap" at the beginning.
  This makes sure code and data is position independent;
- to detect used symbols in data/code symbols are placed in .text
  section at different offsets (based on the line in the linker script).
  This is needed as potentially a reference to a symbol is converted to
  a reference to the containing section so multiple symbols could be
  converted to reference to same symbol (section name) and we need to
  distinguish them;
----

in Andrew message:
----
Use differential linking and explicit imports/exports to confirm that
we only have the expected relocations,
----

probably both are cryptic, but getting from "differential linking" you
really need to know in advance what you are explaining.

> > Why wasn't possible to share functions?

mine:
----
- code is compiled separately, it's not possible to share a function
  (like memcpy) between different functions to use;
----

in Andrew message:
----
Link all 32bit objects together first.  This allows for sharing of
logic between translation units.
----

I would agree Andrew comment is clearer here.

> > Why using --orphan-handling option?

mine:
----
- --orphan-handling=error option to linker is used to make sure we
  account for all possible sections from C code;
---

in Andrew message:
----
----

still, Andrew more clear, but not carrying any information.

> >
> > The description has been there for about 2 months without many objections.
>
> IMO it's fine to use lists to describe specific points, but using
> lists exclusively to write a commit message makes the items feel
> disconnected between them.
>
> The format of the commit message by Andrew is clearer to undertsand
> for me.  Could you add what you think it's missing to the proposed
> message by Andrew?
>
> Thanks, Roger.

Probably, as said above, we (me and Andrew) have too many assumptions.
This commit is doing quite some magic that's not easy to grasp.
I can try to explain, and possibly you can suggest something that
makes sense also to people not too involved in this.

There are quite some challenges here. This code is executed during the
boot process and the environment is pretty uncommon. Simply writing a
message is not that easy. We are not sure if we have VGA, serial, BIOS
or UEFI. We are not executing in the location code was compiled/linked
to run so assuming it is wrong and causes issue; in other word the
code/data should be position independent. This code is meant to be
compiled and run in 32 mode, however Xen is 64 bit, so compiler output
can't be linked to the final executable. And obviously you cannot call
64 bit code from 32 bit code. Even if code would be compiled and run
in 64 bit mode, calling functions like printk would probably crash (it
does, we discovered recently) as Xen code assumes some environment
settings (in case of printk, just as example, it was missing per-cpu
info leading to pointer to garbage). In 32 bit mode, you can get code
position independence with -fpic, but this requires linker to generate
GOT table but 64 bit linker would generate that table in a position
not compatible with 32 bit code (and that's not the only issue of
using 64 bit linker on 32 bit code). So, to solve this code/data is
linked to a 32 bit object and converted to binary (note: this is an
invariant, it was done like that before and after this commit). Solved
the code with -fpic, what about data? Say we have something like
"const char *message = "hello";"... a pointer is a pointer, which is
the location of the data pointed. But as said before, we are in an
uncommon environment where code/data could be run at a different
location than compiled/linked! There's no magic option for doing that,
so instead of hoping for the best (as we are doing at the moment) we
check to not have pointers. How? We link code+data at different
locations which will generate different binary in that case and we
compare the 2 binaries to make sure there are no such differences
(well, this is a simplification of the process).


So... somebody willing to translate the above, surely poor and
unclear, explanation is somethig digestible by people less involved in
it?

Frediano
Frediano Ziglio Oct. 18, 2024, 1:45 p.m. UTC | #12
On Fri, Oct 18, 2024 at 1:59 PM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Fri, Oct 18, 2024 at 01:48:27PM +0100, Frediano Ziglio wrote:
> > On Fri, Oct 18, 2024 at 12:41 PM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > >
> > > On Thu, Oct 17, 2024 at 02:31:19PM +0100, Frediano Ziglio wrote:
> > > > +#define DECLARE_IMPORT(name) name = . + (__LINE__ * MULT)
> > > > +
> > > > +ENTRY(dummy_start)
> > > >
> > > >  SECTIONS
> > > >  {
> > > >    /* Merge code and data into one section. */
> > > > -  .text : {
> > > > +  .text TEXT_START : {
> > > > +        /* Silence linker warning, we are not going to use it */
> > > > +        dummy_start = .;
> > > > +
> > > > +        /* Declare below any symbol name needed.
> > > > +         * Each symbol should be on its own line.
> > > > +         * It looks like a tedious work but we make sure the things we use.
> > > > +         * Potentially they should be all variables. */
> > >
> > > The style is wrong for the opening and closing comment delimiters.
> > >
> > > I think it would be best if this was written in a more natural style.
> > >
> > > /*
> > >  * Any symbols used should be declared below, this ensures which
> > >  * symbols are visible to the 32bit C boot code.
> > >  */
> > >
> >
> > But why to remove the "Potentially they should be all variables.".
> > Surely something not written is more clear than something written, but
> > on the other way it carries no information.
>
> I'm not sure I understand why this is helpful: either they are
> mandated to be only variables, and hence the "potentially" is wrong, or
> they are not, in which case I don't see why spelling a desire for they
> to be only variables is helpful if it's not a strict requirement.
>

As normal, rules often have exceptions. Most of the functions (so
code) in Xen is 64 bit, so you don't want to use them. However, saying
you have a function in head.S written in assembly for 32 bit (or any
other functions written for 32 bit), you want the possibility to call
it. For instance you could export from head.S the function to output
to serial in the future.

About variables... are all variables fine to be accessed from these
functions? Probably yes if they have no pointers in them. If they have
pointers... that's another matter. Does the pointer have relocation?
Is it going to be used at the final defined program location or only
during initialization? To make an example, you could override a NULL
pointer (that is, without relocation) to a current symbol, if this
pointer is used after Xen is moved into its final position it will
become invalid. If, on the other hand, the pointer had relocation
potentially it will be automatically be relocated.

> Thanks, Roger.

Frediano
Roger Pau Monné Oct. 21, 2024, 8:01 a.m. UTC | #13
On Fri, Oct 18, 2024 at 02:45:59PM +0100, Frediano Ziglio wrote:
> On Fri, Oct 18, 2024 at 1:59 PM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > On Fri, Oct 18, 2024 at 01:48:27PM +0100, Frediano Ziglio wrote:
> > > On Fri, Oct 18, 2024 at 12:41 PM Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > >
> > > > On Thu, Oct 17, 2024 at 02:31:19PM +0100, Frediano Ziglio wrote:
> > > > > +#define DECLARE_IMPORT(name) name = . + (__LINE__ * MULT)
> > > > > +
> > > > > +ENTRY(dummy_start)
> > > > >
> > > > >  SECTIONS
> > > > >  {
> > > > >    /* Merge code and data into one section. */
> > > > > -  .text : {
> > > > > +  .text TEXT_START : {
> > > > > +        /* Silence linker warning, we are not going to use it */
> > > > > +        dummy_start = .;
> > > > > +
> > > > > +        /* Declare below any symbol name needed.
> > > > > +         * Each symbol should be on its own line.
> > > > > +         * It looks like a tedious work but we make sure the things we use.
> > > > > +         * Potentially they should be all variables. */
> > > >
> > > > The style is wrong for the opening and closing comment delimiters.
> > > >
> > > > I think it would be best if this was written in a more natural style.
> > > >
> > > > /*
> > > >  * Any symbols used should be declared below, this ensures which
> > > >  * symbols are visible to the 32bit C boot code.
> > > >  */
> > > >
> > >
> > > But why to remove the "Potentially they should be all variables.".
> > > Surely something not written is more clear than something written, but
> > > on the other way it carries no information.
> >
> > I'm not sure I understand why this is helpful: either they are
> > mandated to be only variables, and hence the "potentially" is wrong, or
> > they are not, in which case I don't see why spelling a desire for they
> > to be only variables is helpful if it's not a strict requirement.
> >
> 
> As normal, rules often have exceptions. Most of the functions (so
> code) in Xen is 64 bit, so you don't want to use them. However, saying
> you have a function in head.S written in assembly for 32 bit (or any
> other functions written for 32 bit), you want the possibility to call
> it. For instance you could export from head.S the function to output
> to serial in the future.
> 
> About variables... are all variables fine to be accessed from these
> functions? Probably yes if they have no pointers in them. If they have
> pointers... that's another matter. Does the pointer have relocation?
> Is it going to be used at the final defined program location or only
> during initialization? To make an example, you could override a NULL
> pointer (that is, without relocation) to a current symbol, if this
> pointer is used after Xen is moved into its final position it will
> become invalid. If, on the other hand, the pointer had relocation
> potentially it will be automatically be relocated.

IMO comments are meant to clarify parts of the code.  A comment that
uses a conditional like "Potentially" introduces more ambiguity than
it removes, unless the restriction is stated in the comment itself.

I think you either you expand the comment to mention exactly which
kind of symbols can be declared, or you make the comment a more
restrictive one to avoid the ambiguity: "Symbols declared below should
all be variables."

Thanks, Roger.
Roger Pau Monné Oct. 21, 2024, 8:52 a.m. UTC | #14
On Fri, Oct 18, 2024 at 02:28:05PM +0100, Frediano Ziglio wrote:
> On Fri, Oct 18, 2024 at 12:49 PM Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > On Fri, Oct 18, 2024 at 09:42:48AM +0100, Frediano Ziglio wrote:
> > > On Thu, Oct 17, 2024 at 6:13 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> > > >
> > > > On 17/10/2024 2:31 pm, Frediano Ziglio wrote:
> > > > > The current method to include 32 bit C boot code is:
> > > > > - compile each function we want to use into a separate object file;
> > > > > - each function is compiled with -fpic option;
> > > > > - convert these object files to binary files. This operation removes GOP
> > > > >   which we don't want in the executable;
> > > > > - a small assembly part in each file add the entry point;
> > > > > - code can't have external references, all possible variables are passed
> > > > >   by value or pointer;
> > > > > - include these binary files in head.S.
> > > > >
> > > > > There are currently some limitations:
> > > > > - code is compiled separately, it's not possible to share a function
> > > > >   (like memcpy) between different functions to use;
> > > > > - although code is compiled with -fpic there's no certainty there are
> > > > >   no relocations, specifically data ones. This can lead into hard to
> > > > >   find bugs;
> > > > > - it's hard to add a simple function;
> > > > > - having to pass external variables makes hard to do multiple things
> > > > >   otherwise functions would require a lot of parameters so code would
> > > > >   have to be split into multiple functions which is not easy.
> > > > >
> > > > > Current change extends the current process:
> > > > > - all object files are linked together before getting converted making
> > > > >   possible to share code between the function we want to call;
> > > > > - a single object file is generated with all functions to use and
> > > > >   exported symbols to easily call;
> > > > > - variables to use are declared in linker script and easily used inside
> > > > >   C code. Declaring them manually could be annoying but makes also
> > > > >   easier to check them. Using external pointers can be still an issue if
> > > > >   they are not fixed. If an external symbol is not declared this gives a
> > > > >   link error.
> > > > >
> > > > > Some details of the implementation:
> > > > > - C code is compiled with -fpic flags (as before);
> > > > > - object files from C code are linked together;
> > > > > - the single bundled object file is linked with 2 slightly different
> > > > >   script files to generate 2 bundled object files;
> > > > > - the 2 bundled object files are converted to binary removing the need
> > > > >   for global offset tables;
> > > > > - a Python script is used to generate assembly source from the 2
> > > > >   binaries;
> > > > > - the single assembly file is compiled to generate final bundled object
> > > > >   file;
> > > > > - to detect possible unwanted relocation in data/code code is generated
> > > > >   with different addresses. This is enforced starting .text section at
> > > > >   different positions and adding a fixed "gap" at the beginning.
> > > > >   This makes sure code and data is position independent;
> > > > > - to detect used symbols in data/code symbols are placed in .text
> > > > >   section at different offsets (based on the line in the linker script).
> > > > >   This is needed as potentially a reference to a symbol is converted to
> > > > >   a reference to the containing section so multiple symbols could be
> > > > >   converted to reference to same symbol (section name) and we need to
> > > > >   distinguish them;
> > > > > - --orphan-handling=error option to linker is used to make sure we
> > > > >   account for all possible sections from C code;
> > > > >
> > > > > Current limitations:
> > > > > - the main one is the lack of support for 64 bit code. It would make
> > > > >   sure that even the code used for 64 bit (at the moment EFI code) is
> > > > >   code and data position independent. We cannot assume that code that
> > > > >   came from code compiled for 32 bit and compiled for 64 bit is code and
> > > > >   data position independent, different compiler options lead to
> > > > >   different code/data.
> > > > >
> > > > > Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>
> > > >
> > > > This commit message is not particularly easy to follow.  Can I recommend
> > > > the following:
> > > >
> > > > ---%<---
> > > > x86/boot: Rework how 32bit C is linked/included for early boot
> > > >
> > > > Right now, the two functions which were really too complicated to write
> > > > in asm are compiled as 32bit PIC, linked to a blob and included
> > > > directly, using global asm() to arrange for them to have function semantics.
> > > >
> > > > This is limiting and fragile; the use of data relocations will compile
> > > > fine but malfunction when used, creating hard-to-debug bugs.
> > > >
> > > > Furthermore, we would like to increase the amount of C, to
> > > > deduplicate/unify Xen's boot logic, as well as making it easier to
> > > > follow.  Therefore, rework how the 32bit objects are included.
> > > >
> > > > Link all 32bit objects together first.  This allows for sharing of logic
> > > > between translation units.  Use differential linking and explicit
> > > > imports/exports to confirm that we only have the expected relocations,
> > > > and write the object back out as an assembly file so it can be linked
> > > > again as if it were 64bit, to integrate with the rest of Xen.
> > > >
> > > > This allows for the use of external references (e.g. access to global
> > > > variables) with reasonable assurance of doing so safely.
> > > >
> > > > No functional change.
> > > > ---%<---
> > > >
> > > > which I think is an accurate and more concise summary of what's changing?
> > > >
> > >
> > > You cut half of the explanation, replacing with nothing.
> > > Why is a script needed? Why 2 linking? How the new method detects
> > > unwanted relocations?
> >
> > TBH this is not clear to me even with your original commit message.
> >
> 
> I'm starting to think that either me or Andrew are not the right
> persons to write this, there's a lot of assumptions we assume for
> granted.
> 
> From what I see, in my message:
> ----
> - to detect possible unwanted relocation in data/code code is generated
>   with different addresses. This is enforced starting .text section at
>   different positions and adding a fixed "gap" at the beginning.
>   This makes sure code and data is position independent;
> - to detect used symbols in data/code symbols are placed in .text
>   section at different offsets (based on the line in the linker script).
>   This is needed as potentially a reference to a symbol is converted to
>   a reference to the containing section so multiple symbols could be
>   converted to reference to same symbol (section name) and we need to
>   distinguish them;
> ----
> 
> in Andrew message:
> ----
> Use differential linking and explicit imports/exports to confirm that
> we only have the expected relocations,
> ----
> 
> probably both are cryptic, but getting from "differential linking" you
> really need to know in advance what you are explaining.
> 
> > > Why wasn't possible to share functions?
> 
> mine:
> ----
> - code is compiled separately, it's not possible to share a function
>   (like memcpy) between different functions to use;
> ----
> 
> in Andrew message:
> ----
> Link all 32bit objects together first.  This allows for sharing of
> logic between translation units.
> ----
> 
> I would agree Andrew comment is clearer here.
> 
> > > Why using --orphan-handling option?
> 
> mine:
> ----
> - --orphan-handling=error option to linker is used to make sure we
>   account for all possible sections from C code;
> ---
> 
> in Andrew message:
> ----
> ----
> 
> still, Andrew more clear, but not carrying any information.

Maybe the issue is that some of the information you currently have in
the commit message would be better added as inline comments.  For
example the reasoning about why 2 linker passes are need should better
be added to xen/arch/x86/boot/Makefile IMO.

In any case the code in xen/arch/x86/boot/Makefile needs some
comments.  For example what are the seemingly random values in
text_{gap,diff}?  Could those values be different?

> 
> > >
> > > The description has been there for about 2 months without many objections.
> >
> > IMO it's fine to use lists to describe specific points, but using
> > lists exclusively to write a commit message makes the items feel
> > disconnected between them.
> >
> > The format of the commit message by Andrew is clearer to undertsand
> > for me.  Could you add what you think it's missing to the proposed
> > message by Andrew?
> >
> > Thanks, Roger.
> 
> Probably, as said above, we (me and Andrew) have too many assumptions.

Well, you are the author of the code, so it's expected from you to
provide a suitable commit message that goes together with the change,
as you ultimately knows exactly the reasoning behind the commit code
changes.

Andrew didn't think the provided message was fully suitable and as a
courtesy he suggested an adjusted wording.  He however had no
requirement to do such suggestion, neither you should feel his
verbatim wording is what should be used.

My bias is towards Andrew suggested message (or something along those
lines), because it can be read and parsed as a natural text rather
than unconnected bullet points.  It gives the reader enough context to
understand the intention of the code change.

I agree your commit message contains more details, but as said above,
it's in my opinion better for those implementation details to be added
as comments to the code.

> This commit is doing quite some magic that's not easy to grasp.
> I can try to explain, and possibly you can suggest something that
> makes sense also to people not too involved in this.
> 
> There are quite some challenges here. This code is executed during the
> boot process and the environment is pretty uncommon. Simply writing a
> message is not that easy. We are not sure if we have VGA, serial, BIOS
> or UEFI. We are not executing in the location code was compiled/linked
> to run so assuming it is wrong and causes issue; in other word the
> code/data should be position independent. This code is meant to be
> compiled and run in 32 mode, however Xen is 64 bit, so compiler output
> can't be linked to the final executable.

I've attempted to do something similar in FreeBSD for the PVH entry
point, so that the initial page-tables could be built in C rather than
asm:

https://reviews.freebsd.org/D44451

However there I didn't made the code position independent yet, and I
was using objcopy to convert the object from elf32-i386 to
elf64-x86-64 (sadly such conversion makes FreeBSD elftoolchain objcopy
explode).  I need to find some time to try and fix elftoolchain
objcopy so it can do the conversion.

The above however is much simpler, as FreeBSD PVH entry point is not
(yet) relocatable.

> And obviously you cannot call
> 64 bit code from 32 bit code. Even if code would be compiled and run
> in 64 bit mode, calling functions like printk would probably crash (it
> does, we discovered recently) as Xen code assumes some environment
> settings (in case of printk, just as example, it was missing per-cpu
> info leading to pointer to garbage). In 32 bit mode, you can get code
> position independence with -fpic, but this requires linker to generate
> GOT table but 64 bit linker would generate that table in a position
> not compatible with 32 bit code (and that's not the only issue of
> using 64 bit linker on 32 bit code). So, to solve this code/data is
> linked to a 32 bit object and converted to binary (note: this is an
> invariant, it was done like that before and after this commit). Solved
> the code with -fpic, what about data? Say we have something like
> "const char *message = "hello";"... a pointer is a pointer, which is
> the location of the data pointed. But as said before, we are in an
> uncommon environment where code/data could be run at a different
> location than compiled/linked! There's no magic option for doing that,
> so instead of hoping for the best (as we are doing at the moment) we
> check to not have pointers. How? We link code+data at different
> locations which will generate different binary in that case and we
> compare the 2 binaries to make sure there are no such differences
> (well, this is a simplification of the process).
> 
> 
> So... somebody willing to translate the above, surely poor and
> unclear, explanation is somethig digestible by people less involved in
> it?

I think it would be easier if you could attempt to convert the above
explanation as more concise inline comments.

It would also help if you could add a comment at the top of the
introduced script (xen/tools/combine_two_binaries.py) to describe it's
intended purpose.

Thanks, Roger.
diff mbox series

Patch

diff --git a/xen/arch/x86/boot/.gitignore b/xen/arch/x86/boot/.gitignore
index a379db7988..7e85549751 100644
--- a/xen/arch/x86/boot/.gitignore
+++ b/xen/arch/x86/boot/.gitignore
@@ -1,3 +1,4 @@ 
 /mkelf32
-/*.bin
-/*.lnk
+/build32.*.lds
+/built-in-32.*.bin
+/built-in-32.*.map
diff --git a/xen/arch/x86/boot/Makefile b/xen/arch/x86/boot/Makefile
index 1199291d2b..5da19501be 100644
--- a/xen/arch/x86/boot/Makefile
+++ b/xen/arch/x86/boot/Makefile
@@ -1,4 +1,5 @@ 
 obj-bin-y += head.o
+obj-bin-y += built-in-32.o
 
 obj32 := cmdline.32.o
 obj32 += reloc.32.o
@@ -9,9 +10,6 @@  targets   += $(obj32)
 
 obj32 := $(addprefix $(obj)/,$(obj32))
 
-$(obj)/head.o: AFLAGS-y += -Wa$(comma)-I$(obj)
-$(obj)/head.o: $(obj32:.32.o=.bin)
-
 CFLAGS_x86_32 := $(subst -m64,-m32 -march=i686,$(XEN_TREEWIDE_CFLAGS))
 $(call cc-options-add,CFLAGS_x86_32,CC,$(EMBEDDED_EXTRA_CFLAGS))
 CFLAGS_x86_32 += -Werror -fno-builtin -g0 -msoft-float -mregparm=3
@@ -25,14 +23,47 @@  $(obj32): XEN_CFLAGS := $(CFLAGS_x86_32) -fpic
 $(obj)/%.32.o: $(src)/%.c FORCE
 	$(call if_changed_rule,cc_o_c)
 
+orphan-handling-$(call ld-option,--orphan-handling=error) := --orphan-handling=error
 LDFLAGS_DIRECT-$(call ld-option,--warn-rwx-segments) := --no-warn-rwx-segments
 LDFLAGS_DIRECT += $(LDFLAGS_DIRECT-y)
 LD32 := $(LD) $(subst x86_64,i386,$(LDFLAGS_DIRECT))
 
-%.bin: %.lnk
-	$(OBJCOPY) -j .text -O binary $< $@
+text_gap := 0x010200
+text_diff := 0x408020
+
+$(obj)/build32.base.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff)
+$(obj)/build32.offset.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff) -DFINAL
+$(obj)/build32.base.lds $(obj)/build32.offset.lds: $(src)/build32.lds.S FORCE
+	$(call if_changed_dep,cpp_lds_S)
+
+targets += build32.offset.lds build32.base.lds
+
+# link all 32bit objects together
+$(obj)/built-in-32.tmp.o: $(obj32)
+	$(LD32) -r -o $@ $^
+
+# link bundle with a given layout and extract a binary from it
+$(obj)/built-in-32.%.bin: $(obj)/build32.%.lds $(obj)/built-in-32.tmp.o
+	$(LD32) $(orphan-handling-y) -N -T $< -Map $(@:bin=map) -o $(@:bin=o) $(filter %.o,$^)
+	$(OBJCOPY) -j .text -O binary $(@:bin=o) $@
+	rm -f $(@:bin=o)
+
+quiet_cmd_combine = GEN     $@
+cmd_combine = \
+	$(PYTHON) $(srctree)/tools/combine_two_binaries.py \
+		--gap=$(text_gap) --text-diff=$(text_diff) \
+		--script $(obj)/build32.offset.lds \
+		--bin1 $(obj)/built-in-32.base.bin \
+		--bin2 $(obj)/built-in-32.offset.bin \
+		--map $(obj)/built-in-32.offset.map \
+		--exports cmdline_parse_early,reloc \
+		--output $@
+
+targets += built-in-32.S
 
-%.lnk: %.32.o $(src)/build32.lds
-	$(LD32) -N -T $(filter %.lds,$^) -o $@ $<
+# generate final object file combining and checking above binaries
+$(obj)/built-in-32.S: $(obj)/built-in-32.base.bin $(obj)/built-in-32.offset.bin \
+		$(srctree)/tools/combine_two_binaries.py FORCE
+	$(call if_changed,combine)
 
-clean-files := *.lnk *.bin
+clean-files := built-in-32.*.bin built-in-32.*.map build32.*.lds
diff --git a/xen/arch/x86/boot/build32.lds b/xen/arch/x86/boot/build32.lds.S
similarity index 70%
rename from xen/arch/x86/boot/build32.lds
rename to xen/arch/x86/boot/build32.lds.S
index 56edaa727b..e3f5e55261 100644
--- a/xen/arch/x86/boot/build32.lds
+++ b/xen/arch/x86/boot/build32.lds.S
@@ -15,22 +15,47 @@ 
  * with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
-ENTRY(_start)
+#ifdef FINAL
+#  undef GAP
+#  define GAP 0
+#  define MULT 0
+#  define TEXT_START
+#else
+#  define MULT 1
+#  define TEXT_START TEXT_DIFF
+#endif
+#define DECLARE_IMPORT(name) name = . + (__LINE__ * MULT)
+
+ENTRY(dummy_start)
 
 SECTIONS
 {
   /* Merge code and data into one section. */
-  .text : {
+  .text TEXT_START : {
+        /* Silence linker warning, we are not going to use it */
+        dummy_start = .;
+
+        /* Declare below any symbol name needed.
+         * Each symbol should be on its own line.
+         * It looks like a tedious work but we make sure the things we use.
+         * Potentially they should be all variables. */
+        DECLARE_IMPORT(__base_relocs_start);
+        DECLARE_IMPORT(__base_relocs_end);
+        . = . + GAP;
         *(.text)
         *(.text.*)
-        *(.data)
-        *(.data.*)
         *(.rodata)
         *(.rodata.*)
+        *(.data)
+        *(.data.*)
         *(.bss)
         *(.bss.*)
   }
-
+  /DISCARD/ : {
+       *(.comment)
+       *(.comment.*)
+       *(.note.*)
+  }
   /* Dynamic linkage sections.  Collected simply so we can check they're empty. */
   .got : {
         *(.got)
diff --git a/xen/arch/x86/boot/cmdline.c b/xen/arch/x86/boot/cmdline.c
index fc9241ede9..196c580e91 100644
--- a/xen/arch/x86/boot/cmdline.c
+++ b/xen/arch/x86/boot/cmdline.c
@@ -18,18 +18,6 @@ 
  * Linux kernel source (linux/lib/string.c).
  */
 
-/*
- * This entry point is entered from xen/arch/x86/boot/head.S with:
- *   - %eax      = &cmdline,
- *   - %edx      = &early_boot_opts.
- */
-asm (
-    "    .text                         \n"
-    "    .globl _start                 \n"
-    "_start:                           \n"
-    "    jmp  cmdline_parse_early      \n"
-    );
-
 #include <xen/compiler.h>
 #include <xen/kconfig.h>
 #include <xen/macros.h>
diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index c4de1dfab5..e0776e3896 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -759,18 +759,6 @@  trampoline_setup:
         /* Jump into the relocated trampoline. */
         lret
 
-        /*
-         * cmdline and reloc are written in C, and linked to be 32bit PIC with
-         * entrypoints at 0 and using the fastcall convention.
-         */
-FUNC_LOCAL(cmdline_parse_early)
-        .incbin "cmdline.bin"
-END(cmdline_parse_early)
-
-FUNC_LOCAL(reloc)
-        .incbin "reloc.bin"
-END(reloc)
-
 ENTRY(trampoline_start)
 #include "trampoline.S"
 ENTRY(trampoline_end)
diff --git a/xen/arch/x86/boot/reloc.c b/xen/arch/x86/boot/reloc.c
index 8c58affcd9..94b078d7b1 100644
--- a/xen/arch/x86/boot/reloc.c
+++ b/xen/arch/x86/boot/reloc.c
@@ -12,20 +12,6 @@ 
  *    Daniel Kiper <daniel.kiper@oracle.com>
  */
 
-/*
- * This entry point is entered from xen/arch/x86/boot/head.S with:
- *   - %eax       = MAGIC,
- *   - %edx       = INFORMATION_ADDRESS,
- *   - %ecx       = TOPMOST_LOW_MEMORY_STACK_ADDRESS.
- *   - 0x04(%esp) = BOOT_VIDEO_INFO_ADDRESS.
- */
-asm (
-    "    .text                         \n"
-    "    .globl _start                 \n"
-    "_start:                           \n"
-    "    jmp  reloc                    \n"
-    );
-
 #include <xen/compiler.h>
 #include <xen/macros.h>
 #include <xen/types.h>
diff --git a/xen/tools/combine_two_binaries.py b/xen/tools/combine_two_binaries.py
new file mode 100755
index 0000000000..264be77274
--- /dev/null
+++ b/xen/tools/combine_two_binaries.py
@@ -0,0 +1,220 @@ 
+#!/usr/bin/env python3
+
+from __future__ import print_function
+import argparse
+import functools
+import re
+import struct
+import sys
+
+parser = argparse.ArgumentParser(description='Generate assembly file to merge into other code.')
+auto_int = functools.update_wrapper(lambda x: int(x, 0), int) # allows hex
+parser.add_argument('--script', dest='script',
+                    required=True,
+                    help='Linker script for extracting symbols')
+parser.add_argument('--bin1', dest='bin1',
+                    required=True,
+                    help='First binary')
+parser.add_argument('--bin2', dest='bin2',
+                    required=True,
+                    help='Second binary')
+parser.add_argument('--gap', dest='gap',
+                    required=True,
+                    type=auto_int,
+                    help='Gap inserted at the start of code section')
+parser.add_argument('--text-diff', dest='text_diff',
+                    required=True,
+                    type=auto_int,
+                    help='Difference between code section start')
+parser.add_argument('--output', dest='output',
+                    help='Output file')
+parser.add_argument('--map', dest='mapfile',
+                    help='Map file to read for symbols to export')
+parser.add_argument('--exports', dest='exports',
+                    help='Symbols to export')
+parser.add_argument('--section-header', dest='section_header',
+                    default='.section .init.text, "ax", @progbits',
+                    help='Section header declaration')
+parser.add_argument('-v', '--verbose',
+                    action='store_true')
+args = parser.parse_args()
+
+gap = args.gap
+text_diff = args.text_diff
+
+# Parse linker script for external symbols to use.
+# Next regex matches expanded DECLARE_IMPORT lines in linker script.
+symbol_re = re.compile(r'\s+(\S+)\s*=\s*\.\s*\+\s*\((\d+)\s*\*\s*0\s*\)\s*;')
+symbols = {}
+lines = {}
+for line in open(args.script):
+    m = symbol_re.match(line)
+    if not m:
+        continue
+    (name, line_num) = (m.group(1), int(m.group(2)))
+    if line_num == 0:
+        raise Exception("Invalid line number found:\n\t" + line)
+    if line_num in symbols:
+        raise Exception("Symbol with this line already present:\n\t" + line)
+    if name in lines:
+        raise Exception("Symbol with this name already present:\n\t" + name)
+    symbols[line_num] = name
+    lines[name] = line_num
+
+exports = []
+if args.exports is not None:
+    exports = dict([(name, None) for name in args.exports.split(',')])
+
+# Parse mapfile, look for ther symbols we want to export.
+if args.mapfile is not None:
+    symbol_re = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')
+    for line in open(args.mapfile):
+        m = symbol_re.match(line)
+        if not m or m.group(2) not in exports:
+            continue
+        addr = int(m.group(1), 16)
+        exports[m.group(2)] = addr
+for (name, addr) in exports.items():
+    if addr is None:
+        raise Exception("Required export symbols %s not found" % name)
+
+file1 = open(args.bin1, 'rb')
+file2 = open(args.bin2, 'rb')
+file1.seek(0, 2)
+size1 = file1.tell()
+file2.seek(0, 2)
+size2 = file2.tell()
+if size1 > size2:
+    file1, file2 = file2, file1
+    size1, size2 = size2, size1
+if size2 != size1 + gap:
+    raise Exception('File sizes do not match')
+del size2
+
+file1.seek(0, 0)
+data1 = file1.read(size1)
+del file1
+file2.seek(gap, 0)
+data2 = file2.read(size1)
+del file2
+
+max_line = max(symbols.keys())
+
+def to_int32(n):
+    '''Convert a number to signed 32 bit integer truncating if needed'''
+    mask = (1 << 32) - 1
+    h = 1 << 31
+    return (n & mask) ^ h - h
+
+i = 0
+references = {}
+internals = 0
+while i <= size1 - 4:
+    n1 = struct.unpack('<I', data1[i:i+4])[0]
+    n2 = struct.unpack('<I', data2[i:i+4])[0]
+    i += 1
+    # The two numbers are the same, no problems
+    if n1 == n2:
+        continue
+    # Try to understand why they are different
+    diff = to_int32(n1 - n2)
+    if diff == -gap: # this is an internal relocation
+        pos = i - 1
+        print("Internal relocation found at position %#x "
+              "n1=%#x n2=%#x diff=%#x" % (pos, n1, n2, diff),
+              file=sys.stderr)
+        i += 3
+        internals += 1
+        if internals >= 10:
+            break
+        continue
+    # This is a relative relocation to a symbol, accepted, code/data is
+    # relocatable.
+    if diff < gap and diff >= gap - max_line:
+        n = gap - diff
+        symbol = symbols.get(n)
+        # check we have a symbol
+        if symbol is None:
+            raise Exception("Cannot find symbol for line %d" % n)
+        pos = i - 1
+        if args.verbose:
+            print('Position %#x %d %s' % (pos, n, symbol), file=sys.stderr)
+        i += 3
+        references[pos] = symbol
+        continue
+    # First byte is the same, move to next byte
+    if diff & 0xff == 0 and i <= size1 - 4:
+       continue
+    # Probably a type of relocation we don't want or support
+    pos = i - 1
+    suggestion = ''
+    symbol = symbols.get(-diff - text_diff)
+    if symbol is not None:
+        suggestion = " Maybe %s is not defined as hidden?" % symbol
+    raise Exception("Unexpected difference found at %#x "
+                    "n1=%#x n2=%#x diff=%#x gap=%#x.%s" % \
+                    (pos, n1, n2, diff, gap, suggestion))
+if internals != 0:
+    raise Exception("Previous relocations found")
+
+def line_bytes(buf, out):
+    '''Output an assembly line with all bytes in "buf"'''
+    # Python 2 compatibility
+    if type(buf) == str:
+        print("\t.byte " + ','.join([str(ord(c)) for c in buf]), file=out)
+    else:
+        print("\t.byte " + ','.join([str(n) for n in buf]), file=out)
+
+def part(start, end, out):
+    '''Output bytes of "data" from "start" to "end"'''
+    while start < end:
+        e = min(start + 16, end)
+        line_bytes(data1[start:e], out)
+        start = e
+
+def reference(pos, out):
+    name = references[pos]
+    n = struct.unpack('<I', data1[pos:pos+4])[0]
+    sign = '+'
+    if n >= (1 << 31):
+        n -= (1 << 32)
+    n += pos
+    if n < 0:
+        n = -n
+        sign = '-'
+    print("\t.hidden %s\n"
+          "\t.long %s %s %#x - ." % (name, name, sign, n),
+          file=out)
+
+def output(out):
+    prev = 0
+    exports_by_addr = {}
+    for (sym, addr) in exports.items():
+        exports_by_addr.setdefault(addr, []).append(sym)
+    positions = list(references.keys())
+    positions += list(exports_by_addr.keys())
+    for pos in sorted(positions):
+        part(prev, pos, out)
+        prev = pos
+        if pos in references:
+            reference(pos, out)
+            prev = pos + 4
+        if pos in exports_by_addr:
+            for sym in exports_by_addr[pos]:
+                print("\t.global %s\n"
+                      "\t.hidden %s\n"
+                      "%s:" % (sym, sym, sym),
+                      file=out)
+    part(prev, size1, out)
+
+out = sys.stdout
+if args.output is not None:
+    out = open(args.output, 'w')
+print('''/*
+ * File autogenerated by combine_two_binaries.py DO NOT EDIT
+ */''', file=out)
+print('\t' + args.section_header, file=out)
+print('obj32_start:', file=out)
+output(out)
+print('\n\t.section\t.note.GNU-stack,"",@progbits', file=out)
+out.flush()