Message ID | 20241017133123.1946204-2-frediano.ziglio@cloud.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Reuse 32 bit C code more safely | expand |
On Thu, Oct 17, 2024 at 02:31:19PM +0100, Frediano Ziglio wrote: > +$(obj)/build32.base.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff) > +$(obj)/build32.offset.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff) -DFINAL I was somehow expecting "base" and "offset" to be the other way around, but that's fine. And by the way, -DFINAL cancel changes to GAP and TEXT_DIFF ;-). But overall, the changes looks nice as is, Reviewed-by: Anthony PERARD <anthony.perard@vates.tech> Thanks,
On 17/10/2024 2:31 pm, Frediano Ziglio wrote: > The current method to include 32 bit C boot code is: > - compile each function we want to use into a separate object file; > - each function is compiled with -fpic option; > - convert these object files to binary files. This operation removes GOP > which we don't want in the executable; > - a small assembly part in each file add the entry point; > - code can't have external references, all possible variables are passed > by value or pointer; > - include these binary files in head.S. > > There are currently some limitations: > - code is compiled separately, it's not possible to share a function > (like memcpy) between different functions to use; > - although code is compiled with -fpic there's no certainty there are > no relocations, specifically data ones. This can lead into hard to > find bugs; > - it's hard to add a simple function; > - having to pass external variables makes hard to do multiple things > otherwise functions would require a lot of parameters so code would > have to be split into multiple functions which is not easy. > > Current change extends the current process: > - all object files are linked together before getting converted making > possible to share code between the function we want to call; > - a single object file is generated with all functions to use and > exported symbols to easily call; > - variables to use are declared in linker script and easily used inside > C code. Declaring them manually could be annoying but makes also > easier to check them. Using external pointers can be still an issue if > they are not fixed. If an external symbol is not declared this gives a > link error. > > Some details of the implementation: > - C code is compiled with -fpic flags (as before); > - object files from C code are linked together; > - the single bundled object file is linked with 2 slightly different > script files to generate 2 bundled object files; > - the 2 bundled object files are converted to binary removing the need > for global offset tables; > - a Python script is used to generate assembly source from the 2 > binaries; > - the single assembly file is compiled to generate final bundled object > file; > - to detect possible unwanted relocation in data/code code is generated > with different addresses. This is enforced starting .text section at > different positions and adding a fixed "gap" at the beginning. > This makes sure code and data is position independent; > - to detect used symbols in data/code symbols are placed in .text > section at different offsets (based on the line in the linker script). > This is needed as potentially a reference to a symbol is converted to > a reference to the containing section so multiple symbols could be > converted to reference to same symbol (section name) and we need to > distinguish them; > - --orphan-handling=error option to linker is used to make sure we > account for all possible sections from C code; > > Current limitations: > - the main one is the lack of support for 64 bit code. It would make > sure that even the code used for 64 bit (at the moment EFI code) is > code and data position independent. We cannot assume that code that > came from code compiled for 32 bit and compiled for 64 bit is code and > data position independent, different compiler options lead to > different code/data. > > Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com> This commit message is not particularly easy to follow. Can I recommend the following: ---%<--- x86/boot: Rework how 32bit C is linked/included for early boot Right now, the two functions which were really too complicated to write in asm are compiled as 32bit PIC, linked to a blob and included directly, using global asm() to arrange for them to have function semantics. This is limiting and fragile; the use of data relocations will compile fine but malfunction when used, creating hard-to-debug bugs. Furthermore, we would like to increase the amount of C, to deduplicate/unify Xen's boot logic, as well as making it easier to follow. Therefore, rework how the 32bit objects are included. Link all 32bit objects together first. This allows for sharing of logic between translation units. Use differential linking and explicit imports/exports to confirm that we only have the expected relocations, and write the object back out as an assembly file so it can be linked again as if it were 64bit, to integrate with the rest of Xen. This allows for the use of external references (e.g. access to global variables) with reasonable assurance of doing so safely. No functional change. ---%<--- which I think is an accurate and more concise summary of what's changing? > diff --git a/xen/arch/x86/boot/.gitignore b/xen/arch/x86/boot/.gitignore > index a379db7988..7e85549751 100644 > --- a/xen/arch/x86/boot/.gitignore > +++ b/xen/arch/x86/boot/.gitignore > @@ -1,3 +1,4 @@ > /mkelf32 > -/*.bin > -/*.lnk > +/build32.*.lds > +/built-in-32.*.bin > +/built-in-32.*.map /built-in-32.S too And from a glance at the file, this adjustment in the combine script too: -print('\n\t.section\t.note.GNU-stack,"",@progbits', file=out) +print('\n\t.section .note.GNU-stack, "", @progbits', file=out) to have both .section's formatted in the same way. > diff --git a/xen/arch/x86/boot/build32.lds b/xen/arch/x86/boot/build32.lds.S > similarity index 70% > rename from xen/arch/x86/boot/build32.lds > rename to xen/arch/x86/boot/build32.lds.S > index 56edaa727b..e3f5e55261 100644 > --- a/xen/arch/x86/boot/build32.lds > +++ b/xen/arch/x86/boot/build32.lds.S > <snip> > *(.text) > *(.text.*) > - *(.data) > - *(.data.*) > *(.rodata) > *(.rodata.*) > + *(.data) > + *(.data.*) Reordering .data and .rodata really isn't necessary. I'd just drop this part of the diff. I have some different follow-up for it anyway, which I've been holding off until after this first change is sorted. Everything here I'm happy to fix up on commit, if you're ok with me doing so. ~Andrew
On Thu, Oct 17, 2024 at 6:13 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote: > > On 17/10/2024 2:31 pm, Frediano Ziglio wrote: > > The current method to include 32 bit C boot code is: > > - compile each function we want to use into a separate object file; > > - each function is compiled with -fpic option; > > - convert these object files to binary files. This operation removes GOP > > which we don't want in the executable; > > - a small assembly part in each file add the entry point; > > - code can't have external references, all possible variables are passed > > by value or pointer; > > - include these binary files in head.S. > > > > There are currently some limitations: > > - code is compiled separately, it's not possible to share a function > > (like memcpy) between different functions to use; > > - although code is compiled with -fpic there's no certainty there are > > no relocations, specifically data ones. This can lead into hard to > > find bugs; > > - it's hard to add a simple function; > > - having to pass external variables makes hard to do multiple things > > otherwise functions would require a lot of parameters so code would > > have to be split into multiple functions which is not easy. > > > > Current change extends the current process: > > - all object files are linked together before getting converted making > > possible to share code between the function we want to call; > > - a single object file is generated with all functions to use and > > exported symbols to easily call; > > - variables to use are declared in linker script and easily used inside > > C code. Declaring them manually could be annoying but makes also > > easier to check them. Using external pointers can be still an issue if > > they are not fixed. If an external symbol is not declared this gives a > > link error. > > > > Some details of the implementation: > > - C code is compiled with -fpic flags (as before); > > - object files from C code are linked together; > > - the single bundled object file is linked with 2 slightly different > > script files to generate 2 bundled object files; > > - the 2 bundled object files are converted to binary removing the need > > for global offset tables; > > - a Python script is used to generate assembly source from the 2 > > binaries; > > - the single assembly file is compiled to generate final bundled object > > file; > > - to detect possible unwanted relocation in data/code code is generated > > with different addresses. This is enforced starting .text section at > > different positions and adding a fixed "gap" at the beginning. > > This makes sure code and data is position independent; > > - to detect used symbols in data/code symbols are placed in .text > > section at different offsets (based on the line in the linker script). > > This is needed as potentially a reference to a symbol is converted to > > a reference to the containing section so multiple symbols could be > > converted to reference to same symbol (section name) and we need to > > distinguish them; > > - --orphan-handling=error option to linker is used to make sure we > > account for all possible sections from C code; > > > > Current limitations: > > - the main one is the lack of support for 64 bit code. It would make > > sure that even the code used for 64 bit (at the moment EFI code) is > > code and data position independent. We cannot assume that code that > > came from code compiled for 32 bit and compiled for 64 bit is code and > > data position independent, different compiler options lead to > > different code/data. > > > > Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com> > > This commit message is not particularly easy to follow. Can I recommend > the following: > > ---%<--- > x86/boot: Rework how 32bit C is linked/included for early boot > > Right now, the two functions which were really too complicated to write > in asm are compiled as 32bit PIC, linked to a blob and included > directly, using global asm() to arrange for them to have function semantics. > > This is limiting and fragile; the use of data relocations will compile > fine but malfunction when used, creating hard-to-debug bugs. > > Furthermore, we would like to increase the amount of C, to > deduplicate/unify Xen's boot logic, as well as making it easier to > follow. Therefore, rework how the 32bit objects are included. > > Link all 32bit objects together first. This allows for sharing of logic > between translation units. Use differential linking and explicit > imports/exports to confirm that we only have the expected relocations, > and write the object back out as an assembly file so it can be linked > again as if it were 64bit, to integrate with the rest of Xen. > > This allows for the use of external references (e.g. access to global > variables) with reasonable assurance of doing so safely. > > No functional change. > ---%<--- > > which I think is an accurate and more concise summary of what's changing? > You cut half of the explanation, replacing with nothing. Why is a script needed? Why 2 linking? How the new method detects unwanted relocations? Why wasn't possible to share functions? Why using --orphan-handling option? The description has been there for about 2 months without many objections. > > diff --git a/xen/arch/x86/boot/.gitignore b/xen/arch/x86/boot/.gitignore > > index a379db7988..7e85549751 100644 > > --- a/xen/arch/x86/boot/.gitignore > > +++ b/xen/arch/x86/boot/.gitignore > > @@ -1,3 +1,4 @@ > > /mkelf32 > > -/*.bin > > -/*.lnk > > +/build32.*.lds > > +/built-in-32.*.bin > > +/built-in-32.*.map > > /built-in-32.S too > Sure > And from a glance at the file, this adjustment in the combine script too: > > -print('\n\t.section\t.note.GNU-stack,"",@progbits', file=out) > +print('\n\t.section .note.GNU-stack, "", @progbits', file=out) > > to have both .section's formatted in the same way. > Fine > > > diff --git a/xen/arch/x86/boot/build32.lds b/xen/arch/x86/boot/build32.lds.S > > similarity index 70% > > rename from xen/arch/x86/boot/build32.lds > > rename to xen/arch/x86/boot/build32.lds.S > > index 56edaa727b..e3f5e55261 100644 > > --- a/xen/arch/x86/boot/build32.lds > > +++ b/xen/arch/x86/boot/build32.lds.S > > <snip> > > *(.text) > > *(.text.*) > > - *(.data) > > - *(.data.*) > > *(.rodata) > > *(.rodata.*) > > + *(.data) > > + *(.data.*) > > Reordering .data and .rodata really isn't necessary. > Yes, I asked in some comment. No problem, can be removed. I'll write another commit. Not anyway strong, this is the general order of sections. Here won't make much difference, usually you want this order to minimize page changes (both text and rodata are read-only). > I'd just drop this part of the diff. I have some different follow-up > for it anyway, which I've been holding off until after this first change > is sorted. > > Everything here I'm happy to fix up on commit, if you're ok with me > doing so. > > ~Andrew Frediano
On Thu, Oct 17, 2024 at 02:31:19PM +0100, Frediano Ziglio wrote: > The current method to include 32 bit C boot code is: > - compile each function we want to use into a separate object file; > - each function is compiled with -fpic option; > - convert these object files to binary files. This operation removes GOP > which we don't want in the executable; > - a small assembly part in each file add the entry point; > - code can't have external references, all possible variables are passed > by value or pointer; > - include these binary files in head.S. > > There are currently some limitations: > - code is compiled separately, it's not possible to share a function > (like memcpy) between different functions to use; > - although code is compiled with -fpic there's no certainty there are > no relocations, specifically data ones. This can lead into hard to > find bugs; > - it's hard to add a simple function; > - having to pass external variables makes hard to do multiple things > otherwise functions would require a lot of parameters so code would > have to be split into multiple functions which is not easy. > > Current change extends the current process: > - all object files are linked together before getting converted making > possible to share code between the function we want to call; > - a single object file is generated with all functions to use and > exported symbols to easily call; > - variables to use are declared in linker script and easily used inside > C code. Declaring them manually could be annoying but makes also > easier to check them. Using external pointers can be still an issue if > they are not fixed. If an external symbol is not declared this gives a > link error. > > Some details of the implementation: > - C code is compiled with -fpic flags (as before); > - object files from C code are linked together; > - the single bundled object file is linked with 2 slightly different > script files to generate 2 bundled object files; > - the 2 bundled object files are converted to binary removing the need > for global offset tables; > - a Python script is used to generate assembly source from the 2 > binaries; > - the single assembly file is compiled to generate final bundled object > file; > - to detect possible unwanted relocation in data/code code is generated > with different addresses. This is enforced starting .text section at > different positions and adding a fixed "gap" at the beginning. > This makes sure code and data is position independent; > - to detect used symbols in data/code symbols are placed in .text > section at different offsets (based on the line in the linker script). > This is needed as potentially a reference to a symbol is converted to > a reference to the containing section so multiple symbols could be > converted to reference to same symbol (section name) and we need to > distinguish them; > - --orphan-handling=error option to linker is used to make sure we > account for all possible sections from C code; > > Current limitations: > - the main one is the lack of support for 64 bit code. It would make > sure that even the code used for 64 bit (at the moment EFI code) is > code and data position independent. We cannot assume that code that > came from code compiled for 32 bit and compiled for 64 bit is code and > data position independent, different compiler options lead to > different code/data. > > Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com> > --- > Changes since v2: > - removed W^X limitation, allowing data; > - added some comments to python script; > - added extension to python script; > - added header to generated assembly code from python script; > - added starting symbol to generated assembly code from python script > to make disassembly more clear; > - other minor style changes to python script. > > Changes since v4: > - add build32.final.lds build32.other.lds to targets macro; > - place some comments over a rule, not inside; > - simplified linking and producing binary rule; > - renamed built_in_32 to built-in-32, coding style; > - fix minor indentation; > - put magic numbers in Makefile and propagate them; > - minor variable cleanups in Python script; > - add dependency to Python script. > > Changes since v5: > - renamed "other" and "final" phases to "base" and "offset"; > - use if_changed macro to generate built-in-32.S. > --- > xen/arch/x86/boot/.gitignore | 5 +- > xen/arch/x86/boot/Makefile | 47 +++- > .../x86/boot/{build32.lds => build32.lds.S} | 35 ++- > xen/arch/x86/boot/cmdline.c | 12 - > xen/arch/x86/boot/head.S | 12 - > xen/arch/x86/boot/reloc.c | 14 -- > xen/tools/combine_two_binaries.py | 220 ++++++++++++++++++ > 7 files changed, 292 insertions(+), 53 deletions(-) > rename xen/arch/x86/boot/{build32.lds => build32.lds.S} (70%) > create mode 100755 xen/tools/combine_two_binaries.py > > diff --git a/xen/arch/x86/boot/.gitignore b/xen/arch/x86/boot/.gitignore > index a379db7988..7e85549751 100644 > --- a/xen/arch/x86/boot/.gitignore > +++ b/xen/arch/x86/boot/.gitignore > @@ -1,3 +1,4 @@ > /mkelf32 > -/*.bin > -/*.lnk > +/build32.*.lds > +/built-in-32.*.bin > +/built-in-32.*.map > diff --git a/xen/arch/x86/boot/Makefile b/xen/arch/x86/boot/Makefile > index 1199291d2b..5da19501be 100644 > --- a/xen/arch/x86/boot/Makefile > +++ b/xen/arch/x86/boot/Makefile > @@ -1,4 +1,5 @@ > obj-bin-y += head.o > +obj-bin-y += built-in-32.o > > obj32 := cmdline.32.o > obj32 += reloc.32.o > @@ -9,9 +10,6 @@ targets += $(obj32) > > obj32 := $(addprefix $(obj)/,$(obj32)) > > -$(obj)/head.o: AFLAGS-y += -Wa$(comma)-I$(obj) > -$(obj)/head.o: $(obj32:.32.o=.bin) > - > CFLAGS_x86_32 := $(subst -m64,-m32 -march=i686,$(XEN_TREEWIDE_CFLAGS)) > $(call cc-options-add,CFLAGS_x86_32,CC,$(EMBEDDED_EXTRA_CFLAGS)) > CFLAGS_x86_32 += -Werror -fno-builtin -g0 -msoft-float -mregparm=3 > @@ -25,14 +23,47 @@ $(obj32): XEN_CFLAGS := $(CFLAGS_x86_32) -fpic > $(obj)/%.32.o: $(src)/%.c FORCE > $(call if_changed_rule,cc_o_c) > > +orphan-handling-$(call ld-option,--orphan-handling=error) := --orphan-handling=error > LDFLAGS_DIRECT-$(call ld-option,--warn-rwx-segments) := --no-warn-rwx-segments > LDFLAGS_DIRECT += $(LDFLAGS_DIRECT-y) > LD32 := $(LD) $(subst x86_64,i386,$(LDFLAGS_DIRECT)) > > -%.bin: %.lnk > - $(OBJCOPY) -j .text -O binary $< $@ > +text_gap := 0x010200 > +text_diff := 0x408020 > + > +$(obj)/build32.base.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff) > +$(obj)/build32.offset.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff) -DFINAL > +$(obj)/build32.base.lds $(obj)/build32.offset.lds: $(src)/build32.lds.S FORCE > + $(call if_changed_dep,cpp_lds_S) > + > +targets += build32.offset.lds build32.base.lds > + > +# link all 32bit objects together > +$(obj)/built-in-32.tmp.o: $(obj32) > + $(LD32) -r -o $@ $^ > + > +# link bundle with a given layout and extract a binary from it > +$(obj)/built-in-32.%.bin: $(obj)/build32.%.lds $(obj)/built-in-32.tmp.o > + $(LD32) $(orphan-handling-y) -N -T $< -Map $(@:bin=map) -o $(@:bin=o) $(filter %.o,$^) > + $(OBJCOPY) -j .text -O binary $(@:bin=o) $@ > + rm -f $(@:bin=o) > + > +quiet_cmd_combine = GEN $@ > +cmd_combine = \ > + $(PYTHON) $(srctree)/tools/combine_two_binaries.py \ > + --gap=$(text_gap) --text-diff=$(text_diff) \ > + --script $(obj)/build32.offset.lds \ > + --bin1 $(obj)/built-in-32.base.bin \ > + --bin2 $(obj)/built-in-32.offset.bin \ > + --map $(obj)/built-in-32.offset.map \ > + --exports cmdline_parse_early,reloc \ > + --output $@ See xen/Rules.mk, for consistency the indentation should be done with spaces when defining variables. That would also allow to align the options. > + > +targets += built-in-32.S > > -%.lnk: %.32.o $(src)/build32.lds > - $(LD32) -N -T $(filter %.lds,$^) -o $@ $< > +# generate final object file combining and checking above binaries > +$(obj)/built-in-32.S: $(obj)/built-in-32.base.bin $(obj)/built-in-32.offset.bin \ > + $(srctree)/tools/combine_two_binaries.py FORCE Can you indent this using spaces also, so it's on the same column as the ':'? > + $(call if_changed,combine) > > -clean-files := *.lnk *.bin > +clean-files := built-in-32.*.bin built-in-32.*.map build32.*.lds > diff --git a/xen/arch/x86/boot/build32.lds b/xen/arch/x86/boot/build32.lds.S > similarity index 70% > rename from xen/arch/x86/boot/build32.lds > rename to xen/arch/x86/boot/build32.lds.S > index 56edaa727b..e3f5e55261 100644 > --- a/xen/arch/x86/boot/build32.lds > +++ b/xen/arch/x86/boot/build32.lds.S > @@ -15,22 +15,47 @@ > * with this program. If not, see <http://www.gnu.org/licenses/>. > */ > > -ENTRY(_start) > +#ifdef FINAL > +# undef GAP > +# define GAP 0 > +# define MULT 0 > +# define TEXT_START > +#else > +# define MULT 1 > +# define TEXT_START TEXT_DIFF > +#endif In other places we use a single space between the hash and the define. > +#define DECLARE_IMPORT(name) name = . + (__LINE__ * MULT) > + > +ENTRY(dummy_start) > > SECTIONS > { > /* Merge code and data into one section. */ > - .text : { > + .text TEXT_START : { > + /* Silence linker warning, we are not going to use it */ > + dummy_start = .; > + > + /* Declare below any symbol name needed. > + * Each symbol should be on its own line. > + * It looks like a tedious work but we make sure the things we use. > + * Potentially they should be all variables. */ The style is wrong for the opening and closing comment delimiters. I think it would be best if this was written in a more natural style. /* * Any symbols used should be declared below, this ensures which * symbols are visible to the 32bit C boot code. */ I don't think you need to mention that each symbol should be on it's own line. Thanks, Roger.
On Fri, Oct 18, 2024 at 09:42:48AM +0100, Frediano Ziglio wrote: > On Thu, Oct 17, 2024 at 6:13 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote: > > > > On 17/10/2024 2:31 pm, Frediano Ziglio wrote: > > > The current method to include 32 bit C boot code is: > > > - compile each function we want to use into a separate object file; > > > - each function is compiled with -fpic option; > > > - convert these object files to binary files. This operation removes GOP > > > which we don't want in the executable; > > > - a small assembly part in each file add the entry point; > > > - code can't have external references, all possible variables are passed > > > by value or pointer; > > > - include these binary files in head.S. > > > > > > There are currently some limitations: > > > - code is compiled separately, it's not possible to share a function > > > (like memcpy) between different functions to use; > > > - although code is compiled with -fpic there's no certainty there are > > > no relocations, specifically data ones. This can lead into hard to > > > find bugs; > > > - it's hard to add a simple function; > > > - having to pass external variables makes hard to do multiple things > > > otherwise functions would require a lot of parameters so code would > > > have to be split into multiple functions which is not easy. > > > > > > Current change extends the current process: > > > - all object files are linked together before getting converted making > > > possible to share code between the function we want to call; > > > - a single object file is generated with all functions to use and > > > exported symbols to easily call; > > > - variables to use are declared in linker script and easily used inside > > > C code. Declaring them manually could be annoying but makes also > > > easier to check them. Using external pointers can be still an issue if > > > they are not fixed. If an external symbol is not declared this gives a > > > link error. > > > > > > Some details of the implementation: > > > - C code is compiled with -fpic flags (as before); > > > - object files from C code are linked together; > > > - the single bundled object file is linked with 2 slightly different > > > script files to generate 2 bundled object files; > > > - the 2 bundled object files are converted to binary removing the need > > > for global offset tables; > > > - a Python script is used to generate assembly source from the 2 > > > binaries; > > > - the single assembly file is compiled to generate final bundled object > > > file; > > > - to detect possible unwanted relocation in data/code code is generated > > > with different addresses. This is enforced starting .text section at > > > different positions and adding a fixed "gap" at the beginning. > > > This makes sure code and data is position independent; > > > - to detect used symbols in data/code symbols are placed in .text > > > section at different offsets (based on the line in the linker script). > > > This is needed as potentially a reference to a symbol is converted to > > > a reference to the containing section so multiple symbols could be > > > converted to reference to same symbol (section name) and we need to > > > distinguish them; > > > - --orphan-handling=error option to linker is used to make sure we > > > account for all possible sections from C code; > > > > > > Current limitations: > > > - the main one is the lack of support for 64 bit code. It would make > > > sure that even the code used for 64 bit (at the moment EFI code) is > > > code and data position independent. We cannot assume that code that > > > came from code compiled for 32 bit and compiled for 64 bit is code and > > > data position independent, different compiler options lead to > > > different code/data. > > > > > > Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com> > > > > This commit message is not particularly easy to follow. Can I recommend > > the following: > > > > ---%<--- > > x86/boot: Rework how 32bit C is linked/included for early boot > > > > Right now, the two functions which were really too complicated to write > > in asm are compiled as 32bit PIC, linked to a blob and included > > directly, using global asm() to arrange for them to have function semantics. > > > > This is limiting and fragile; the use of data relocations will compile > > fine but malfunction when used, creating hard-to-debug bugs. > > > > Furthermore, we would like to increase the amount of C, to > > deduplicate/unify Xen's boot logic, as well as making it easier to > > follow. Therefore, rework how the 32bit objects are included. > > > > Link all 32bit objects together first. This allows for sharing of logic > > between translation units. Use differential linking and explicit > > imports/exports to confirm that we only have the expected relocations, > > and write the object back out as an assembly file so it can be linked > > again as if it were 64bit, to integrate with the rest of Xen. > > > > This allows for the use of external references (e.g. access to global > > variables) with reasonable assurance of doing so safely. > > > > No functional change. > > ---%<--- > > > > which I think is an accurate and more concise summary of what's changing? > > > > You cut half of the explanation, replacing with nothing. > Why is a script needed? Why 2 linking? How the new method detects > unwanted relocations? TBH this is not clear to me even with your original commit message. > Why wasn't possible to share functions? > Why using --orphan-handling option? > > The description has been there for about 2 months without many objections. IMO it's fine to use lists to describe specific points, but using lists exclusively to write a commit message makes the items feel disconnected between them. The format of the commit message by Andrew is clearer to undertsand for me. Could you add what you think it's missing to the proposed message by Andrew? Thanks, Roger.
On 18.10.2024 13:41, Roger Pau Monné wrote: > On Thu, Oct 17, 2024 at 02:31:19PM +0100, Frediano Ziglio wrote: >> @@ -25,14 +23,47 @@ $(obj32): XEN_CFLAGS := $(CFLAGS_x86_32) -fpic >> $(obj)/%.32.o: $(src)/%.c FORCE >> $(call if_changed_rule,cc_o_c) >> >> +orphan-handling-$(call ld-option,--orphan-handling=error) := --orphan-handling=error >> LDFLAGS_DIRECT-$(call ld-option,--warn-rwx-segments) := --no-warn-rwx-segments >> LDFLAGS_DIRECT += $(LDFLAGS_DIRECT-y) >> LD32 := $(LD) $(subst x86_64,i386,$(LDFLAGS_DIRECT)) >> >> -%.bin: %.lnk >> - $(OBJCOPY) -j .text -O binary $< $@ >> +text_gap := 0x010200 >> +text_diff := 0x408020 >> + >> +$(obj)/build32.base.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff) >> +$(obj)/build32.offset.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff) -DFINAL >> +$(obj)/build32.base.lds $(obj)/build32.offset.lds: $(src)/build32.lds.S FORCE >> + $(call if_changed_dep,cpp_lds_S) >> + >> +targets += build32.offset.lds build32.base.lds >> + >> +# link all 32bit objects together >> +$(obj)/built-in-32.tmp.o: $(obj32) >> + $(LD32) -r -o $@ $^ >> + >> +# link bundle with a given layout and extract a binary from it >> +$(obj)/built-in-32.%.bin: $(obj)/build32.%.lds $(obj)/built-in-32.tmp.o >> + $(LD32) $(orphan-handling-y) -N -T $< -Map $(@:bin=map) -o $(@:bin=o) $(filter %.o,$^) >> + $(OBJCOPY) -j .text -O binary $(@:bin=o) $@ >> + rm -f $(@:bin=o) >> + >> +quiet_cmd_combine = GEN $@ >> +cmd_combine = \ >> + $(PYTHON) $(srctree)/tools/combine_two_binaries.py \ >> + --gap=$(text_gap) --text-diff=$(text_diff) \ >> + --script $(obj)/build32.offset.lds \ >> + --bin1 $(obj)/built-in-32.base.bin \ >> + --bin2 $(obj)/built-in-32.offset.bin \ >> + --map $(obj)/built-in-32.offset.map \ >> + --exports cmdline_parse_early,reloc \ >> + --output $@ > > See xen/Rules.mk, for consistency the indentation should be done with > spaces when defining variables. That would also allow to align the > options. And ideally also such that the options align with the first program argument. >> + >> +targets += built-in-32.S >> >> -%.lnk: %.32.o $(src)/build32.lds >> - $(LD32) -N -T $(filter %.lds,$^) -o $@ $< >> +# generate final object file combining and checking above binaries >> +$(obj)/built-in-32.S: $(obj)/built-in-32.base.bin $(obj)/built-in-32.offset.bin \ >> + $(srctree)/tools/combine_two_binaries.py FORCE > > Can you indent this using spaces also, so it's on the same column as > the ':'? The first $(obj) you mean, I think / hope? Jan
On Thu, Oct 17, 2024 at 5:00 PM Anthony PERARD <anthony.perard@vates.tech> wrote: > > On Thu, Oct 17, 2024 at 02:31:19PM +0100, Frediano Ziglio wrote: > > +$(obj)/build32.base.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff) > > +$(obj)/build32.offset.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff) -DFINAL > > I was somehow expecting "base" and "offset" to be the other way around, That's weird. I mean, the "base" uses a GAP and TEXT_START of 0 while for "offset" they are not zero adding some offset to the code/data. > but that's fine. And by the way, -DFINAL cancel changes to GAP and > TEXT_DIFF ;-). > Yes, and potentially the first like added above can be removed. On the other hand, they are values used for the algorithm despite them being used in some cases or not. > But overall, the changes looks nice as is, > Reviewed-by: Anthony PERARD <anthony.perard@vates.tech> > > Thanks, > Thanks, Frediano
On Fri, Oct 18, 2024 at 12:41 PM Roger Pau Monné <roger.pau@citrix.com> wrote: > > On Thu, Oct 17, 2024 at 02:31:19PM +0100, Frediano Ziglio wrote: > > The current method to include 32 bit C boot code is: > > - compile each function we want to use into a separate object file; > > - each function is compiled with -fpic option; > > - convert these object files to binary files. This operation removes GOP > > which we don't want in the executable; > > - a small assembly part in each file add the entry point; > > - code can't have external references, all possible variables are passed > > by value or pointer; > > - include these binary files in head.S. > > > > There are currently some limitations: > > - code is compiled separately, it's not possible to share a function > > (like memcpy) between different functions to use; > > - although code is compiled with -fpic there's no certainty there are > > no relocations, specifically data ones. This can lead into hard to > > find bugs; > > - it's hard to add a simple function; > > - having to pass external variables makes hard to do multiple things > > otherwise functions would require a lot of parameters so code would > > have to be split into multiple functions which is not easy. > > > > Current change extends the current process: > > - all object files are linked together before getting converted making > > possible to share code between the function we want to call; > > - a single object file is generated with all functions to use and > > exported symbols to easily call; > > - variables to use are declared in linker script and easily used inside > > C code. Declaring them manually could be annoying but makes also > > easier to check them. Using external pointers can be still an issue if > > they are not fixed. If an external symbol is not declared this gives a > > link error. > > > > Some details of the implementation: > > - C code is compiled with -fpic flags (as before); > > - object files from C code are linked together; > > - the single bundled object file is linked with 2 slightly different > > script files to generate 2 bundled object files; > > - the 2 bundled object files are converted to binary removing the need > > for global offset tables; > > - a Python script is used to generate assembly source from the 2 > > binaries; > > - the single assembly file is compiled to generate final bundled object > > file; > > - to detect possible unwanted relocation in data/code code is generated > > with different addresses. This is enforced starting .text section at > > different positions and adding a fixed "gap" at the beginning. > > This makes sure code and data is position independent; > > - to detect used symbols in data/code symbols are placed in .text > > section at different offsets (based on the line in the linker script). > > This is needed as potentially a reference to a symbol is converted to > > a reference to the containing section so multiple symbols could be > > converted to reference to same symbol (section name) and we need to > > distinguish them; > > - --orphan-handling=error option to linker is used to make sure we > > account for all possible sections from C code; > > > > Current limitations: > > - the main one is the lack of support for 64 bit code. It would make > > sure that even the code used for 64 bit (at the moment EFI code) is > > code and data position independent. We cannot assume that code that > > came from code compiled for 32 bit and compiled for 64 bit is code and > > data position independent, different compiler options lead to > > different code/data. > > > > Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com> > > --- > > Changes since v2: > > - removed W^X limitation, allowing data; > > - added some comments to python script; > > - added extension to python script; > > - added header to generated assembly code from python script; > > - added starting symbol to generated assembly code from python script > > to make disassembly more clear; > > - other minor style changes to python script. > > > > Changes since v4: > > - add build32.final.lds build32.other.lds to targets macro; > > - place some comments over a rule, not inside; > > - simplified linking and producing binary rule; > > - renamed built_in_32 to built-in-32, coding style; > > - fix minor indentation; > > - put magic numbers in Makefile and propagate them; > > - minor variable cleanups in Python script; > > - add dependency to Python script. > > > > Changes since v5: > > - renamed "other" and "final" phases to "base" and "offset"; > > - use if_changed macro to generate built-in-32.S. > > --- > > xen/arch/x86/boot/.gitignore | 5 +- > > xen/arch/x86/boot/Makefile | 47 +++- > > .../x86/boot/{build32.lds => build32.lds.S} | 35 ++- > > xen/arch/x86/boot/cmdline.c | 12 - > > xen/arch/x86/boot/head.S | 12 - > > xen/arch/x86/boot/reloc.c | 14 -- > > xen/tools/combine_two_binaries.py | 220 ++++++++++++++++++ > > 7 files changed, 292 insertions(+), 53 deletions(-) > > rename xen/arch/x86/boot/{build32.lds => build32.lds.S} (70%) > > create mode 100755 xen/tools/combine_two_binaries.py > > > > diff --git a/xen/arch/x86/boot/.gitignore b/xen/arch/x86/boot/.gitignore > > index a379db7988..7e85549751 100644 > > --- a/xen/arch/x86/boot/.gitignore > > +++ b/xen/arch/x86/boot/.gitignore > > @@ -1,3 +1,4 @@ > > /mkelf32 > > -/*.bin > > -/*.lnk > > +/build32.*.lds > > +/built-in-32.*.bin > > +/built-in-32.*.map > > diff --git a/xen/arch/x86/boot/Makefile b/xen/arch/x86/boot/Makefile > > index 1199291d2b..5da19501be 100644 > > --- a/xen/arch/x86/boot/Makefile > > +++ b/xen/arch/x86/boot/Makefile > > @@ -1,4 +1,5 @@ > > obj-bin-y += head.o > > +obj-bin-y += built-in-32.o > > > > obj32 := cmdline.32.o > > obj32 += reloc.32.o > > @@ -9,9 +10,6 @@ targets += $(obj32) > > > > obj32 := $(addprefix $(obj)/,$(obj32)) > > > > -$(obj)/head.o: AFLAGS-y += -Wa$(comma)-I$(obj) > > -$(obj)/head.o: $(obj32:.32.o=.bin) > > - > > CFLAGS_x86_32 := $(subst -m64,-m32 -march=i686,$(XEN_TREEWIDE_CFLAGS)) > > $(call cc-options-add,CFLAGS_x86_32,CC,$(EMBEDDED_EXTRA_CFLAGS)) > > CFLAGS_x86_32 += -Werror -fno-builtin -g0 -msoft-float -mregparm=3 > > @@ -25,14 +23,47 @@ $(obj32): XEN_CFLAGS := $(CFLAGS_x86_32) -fpic > > $(obj)/%.32.o: $(src)/%.c FORCE > > $(call if_changed_rule,cc_o_c) > > > > +orphan-handling-$(call ld-option,--orphan-handling=error) := --orphan-handling=error > > LDFLAGS_DIRECT-$(call ld-option,--warn-rwx-segments) := --no-warn-rwx-segments > > LDFLAGS_DIRECT += $(LDFLAGS_DIRECT-y) > > LD32 := $(LD) $(subst x86_64,i386,$(LDFLAGS_DIRECT)) > > > > -%.bin: %.lnk > > - $(OBJCOPY) -j .text -O binary $< $@ > > +text_gap := 0x010200 > > +text_diff := 0x408020 > > + > > +$(obj)/build32.base.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff) > > +$(obj)/build32.offset.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff) -DFINAL > > +$(obj)/build32.base.lds $(obj)/build32.offset.lds: $(src)/build32.lds.S FORCE > > + $(call if_changed_dep,cpp_lds_S) > > + > > +targets += build32.offset.lds build32.base.lds > > + > > +# link all 32bit objects together > > +$(obj)/built-in-32.tmp.o: $(obj32) > > + $(LD32) -r -o $@ $^ > > + > > +# link bundle with a given layout and extract a binary from it > > +$(obj)/built-in-32.%.bin: $(obj)/build32.%.lds $(obj)/built-in-32.tmp.o > > + $(LD32) $(orphan-handling-y) -N -T $< -Map $(@:bin=map) -o $(@:bin=o) $(filter %.o,$^) > > + $(OBJCOPY) -j .text -O binary $(@:bin=o) $@ > > + rm -f $(@:bin=o) > > + > > +quiet_cmd_combine = GEN $@ > > +cmd_combine = \ > > + $(PYTHON) $(srctree)/tools/combine_two_binaries.py \ > > + --gap=$(text_gap) --text-diff=$(text_diff) \ > > + --script $(obj)/build32.offset.lds \ > > + --bin1 $(obj)/built-in-32.base.bin \ > > + --bin2 $(obj)/built-in-32.offset.bin \ > > + --map $(obj)/built-in-32.offset.map \ > > + --exports cmdline_parse_early,reloc \ > > + --output $@ > > See xen/Rules.mk, for consistency the indentation should be done with > spaces when defining variables. That would also allow to align the > options. > Changed. Is there a reason why these variables (I think the correct make terminology is macros) use "=" and not ":=" ? > > + > > +targets += built-in-32.S > > > > -%.lnk: %.32.o $(src)/build32.lds > > - $(LD32) -N -T $(filter %.lds,$^) -o $@ $< > > +# generate final object file combining and checking above binaries > > +$(obj)/built-in-32.S: $(obj)/built-in-32.base.bin $(obj)/built-in-32.offset.bin \ > > + $(srctree)/tools/combine_two_binaries.py FORCE > > Can you indent this using spaces also, so it's on the same column as > the ':'? > Changed. > > + $(call if_changed,combine) > > > > -clean-files := *.lnk *.bin > > +clean-files := built-in-32.*.bin built-in-32.*.map build32.*.lds > > diff --git a/xen/arch/x86/boot/build32.lds b/xen/arch/x86/boot/build32.lds.S > > similarity index 70% > > rename from xen/arch/x86/boot/build32.lds > > rename to xen/arch/x86/boot/build32.lds.S > > index 56edaa727b..e3f5e55261 100644 > > --- a/xen/arch/x86/boot/build32.lds > > +++ b/xen/arch/x86/boot/build32.lds.S > > @@ -15,22 +15,47 @@ > > * with this program. If not, see <http://www.gnu.org/licenses/>. > > */ > > > > -ENTRY(_start) > > +#ifdef FINAL > > +# undef GAP > > +# define GAP 0 > > +# define MULT 0 > > +# define TEXT_START > > +#else > > +# define MULT 1 > > +# define TEXT_START TEXT_DIFF > > +#endif > > In other places we use a single space between the hash and the define. > Changed. This file has very weird indentation rules. > > +#define DECLARE_IMPORT(name) name = . + (__LINE__ * MULT) > > + > > +ENTRY(dummy_start) > > > > SECTIONS > > { > > /* Merge code and data into one section. */ > > - .text : { > > + .text TEXT_START : { > > + /* Silence linker warning, we are not going to use it */ > > + dummy_start = .; > > + > > + /* Declare below any symbol name needed. > > + * Each symbol should be on its own line. > > + * It looks like a tedious work but we make sure the things we use. > > + * Potentially they should be all variables. */ > > The style is wrong for the opening and closing comment delimiters. > > I think it would be best if this was written in a more natural style. > > /* > * Any symbols used should be declared below, this ensures which > * symbols are visible to the 32bit C boot code. > */ > But why to remove the "Potentially they should be all variables.". Surely something not written is more clear than something written, but on the other way it carries no information. > I don't think you need to mention that each symbol should be on it's > own line. > Yes, this is also enforced by Python script, so you can't do that mistake in any case. > Thanks, Roger. Frediano
On Fri, Oct 18, 2024 at 02:04:22PM +0200, Jan Beulich wrote: > On 18.10.2024 13:41, Roger Pau Monné wrote: > > On Thu, Oct 17, 2024 at 02:31:19PM +0100, Frediano Ziglio wrote: > >> @@ -25,14 +23,47 @@ $(obj32): XEN_CFLAGS := $(CFLAGS_x86_32) -fpic > >> $(obj)/%.32.o: $(src)/%.c FORCE > >> $(call if_changed_rule,cc_o_c) > >> > >> +orphan-handling-$(call ld-option,--orphan-handling=error) := --orphan-handling=error > >> LDFLAGS_DIRECT-$(call ld-option,--warn-rwx-segments) := --no-warn-rwx-segments > >> LDFLAGS_DIRECT += $(LDFLAGS_DIRECT-y) > >> LD32 := $(LD) $(subst x86_64,i386,$(LDFLAGS_DIRECT)) > >> > >> -%.bin: %.lnk > >> - $(OBJCOPY) -j .text -O binary $< $@ > >> +text_gap := 0x010200 > >> +text_diff := 0x408020 > >> + > >> +$(obj)/build32.base.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff) > >> +$(obj)/build32.offset.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff) -DFINAL > >> +$(obj)/build32.base.lds $(obj)/build32.offset.lds: $(src)/build32.lds.S FORCE > >> + $(call if_changed_dep,cpp_lds_S) > >> + > >> +targets += build32.offset.lds build32.base.lds > >> + > >> +# link all 32bit objects together > >> +$(obj)/built-in-32.tmp.o: $(obj32) > >> + $(LD32) -r -o $@ $^ > >> + > >> +# link bundle with a given layout and extract a binary from it > >> +$(obj)/built-in-32.%.bin: $(obj)/build32.%.lds $(obj)/built-in-32.tmp.o > >> + $(LD32) $(orphan-handling-y) -N -T $< -Map $(@:bin=map) -o $(@:bin=o) $(filter %.o,$^) > >> + $(OBJCOPY) -j .text -O binary $(@:bin=o) $@ > >> + rm -f $(@:bin=o) > >> + > >> +quiet_cmd_combine = GEN $@ > >> +cmd_combine = \ > >> + $(PYTHON) $(srctree)/tools/combine_two_binaries.py \ > >> + --gap=$(text_gap) --text-diff=$(text_diff) \ > >> + --script $(obj)/build32.offset.lds \ > >> + --bin1 $(obj)/built-in-32.base.bin \ > >> + --bin2 $(obj)/built-in-32.offset.bin \ > >> + --map $(obj)/built-in-32.offset.map \ > >> + --exports cmdline_parse_early,reloc \ > >> + --output $@ > > > > See xen/Rules.mk, for consistency the indentation should be done with > > spaces when defining variables. That would also allow to align the > > options. > > And ideally also such that the options align with the first program > argument. Yup, that's what I was attempting to suggest, sorry if it wasn't clear. > >> + > >> +targets += built-in-32.S > >> > >> -%.lnk: %.32.o $(src)/build32.lds > >> - $(LD32) -N -T $(filter %.lds,$^) -o $@ $< > >> +# generate final object file combining and checking above binaries > >> +$(obj)/built-in-32.S: $(obj)/built-in-32.base.bin $(obj)/built-in-32.offset.bin \ > >> + $(srctree)/tools/combine_two_binaries.py FORCE > > > > Can you indent this using spaces also, so it's on the same column as > > the ':'? > > The first $(obj) you mean, I think / hope? Indeed, it's one space after the ':': # Generate final object file combining and checking above binaries $(obj)/built-in-32.S: $(obj)/built-in-32.base.bin $(obj)/built-in-32.offset.bin \ $(srctree)/tools/combine_two_binaries.py FORCE Preferably comments should also start with an uppercase letter. Note this already exceeds the 80 char maximum, as the longest line is 81. I think we have been a bit lax with Makefiles however. Thanks, Roger.
On Fri, Oct 18, 2024 at 01:48:27PM +0100, Frediano Ziglio wrote: > On Fri, Oct 18, 2024 at 12:41 PM Roger Pau Monné <roger.pau@citrix.com> wrote: > > > > On Thu, Oct 17, 2024 at 02:31:19PM +0100, Frediano Ziglio wrote: > > > +#define DECLARE_IMPORT(name) name = . + (__LINE__ * MULT) > > > + > > > +ENTRY(dummy_start) > > > > > > SECTIONS > > > { > > > /* Merge code and data into one section. */ > > > - .text : { > > > + .text TEXT_START : { > > > + /* Silence linker warning, we are not going to use it */ > > > + dummy_start = .; > > > + > > > + /* Declare below any symbol name needed. > > > + * Each symbol should be on its own line. > > > + * It looks like a tedious work but we make sure the things we use. > > > + * Potentially they should be all variables. */ > > > > The style is wrong for the opening and closing comment delimiters. > > > > I think it would be best if this was written in a more natural style. > > > > /* > > * Any symbols used should be declared below, this ensures which > > * symbols are visible to the 32bit C boot code. > > */ > > > > But why to remove the "Potentially they should be all variables.". > Surely something not written is more clear than something written, but > on the other way it carries no information. I'm not sure I understand why this is helpful: either they are mandated to be only variables, and hence the "potentially" is wrong, or they are not, in which case I don't see why spelling a desire for they to be only variables is helpful if it's not a strict requirement. Thanks, Roger.
On Fri, Oct 18, 2024 at 12:49 PM Roger Pau Monné <roger.pau@citrix.com> wrote: > > On Fri, Oct 18, 2024 at 09:42:48AM +0100, Frediano Ziglio wrote: > > On Thu, Oct 17, 2024 at 6:13 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote: > > > > > > On 17/10/2024 2:31 pm, Frediano Ziglio wrote: > > > > The current method to include 32 bit C boot code is: > > > > - compile each function we want to use into a separate object file; > > > > - each function is compiled with -fpic option; > > > > - convert these object files to binary files. This operation removes GOP > > > > which we don't want in the executable; > > > > - a small assembly part in each file add the entry point; > > > > - code can't have external references, all possible variables are passed > > > > by value or pointer; > > > > - include these binary files in head.S. > > > > > > > > There are currently some limitations: > > > > - code is compiled separately, it's not possible to share a function > > > > (like memcpy) between different functions to use; > > > > - although code is compiled with -fpic there's no certainty there are > > > > no relocations, specifically data ones. This can lead into hard to > > > > find bugs; > > > > - it's hard to add a simple function; > > > > - having to pass external variables makes hard to do multiple things > > > > otherwise functions would require a lot of parameters so code would > > > > have to be split into multiple functions which is not easy. > > > > > > > > Current change extends the current process: > > > > - all object files are linked together before getting converted making > > > > possible to share code between the function we want to call; > > > > - a single object file is generated with all functions to use and > > > > exported symbols to easily call; > > > > - variables to use are declared in linker script and easily used inside > > > > C code. Declaring them manually could be annoying but makes also > > > > easier to check them. Using external pointers can be still an issue if > > > > they are not fixed. If an external symbol is not declared this gives a > > > > link error. > > > > > > > > Some details of the implementation: > > > > - C code is compiled with -fpic flags (as before); > > > > - object files from C code are linked together; > > > > - the single bundled object file is linked with 2 slightly different > > > > script files to generate 2 bundled object files; > > > > - the 2 bundled object files are converted to binary removing the need > > > > for global offset tables; > > > > - a Python script is used to generate assembly source from the 2 > > > > binaries; > > > > - the single assembly file is compiled to generate final bundled object > > > > file; > > > > - to detect possible unwanted relocation in data/code code is generated > > > > with different addresses. This is enforced starting .text section at > > > > different positions and adding a fixed "gap" at the beginning. > > > > This makes sure code and data is position independent; > > > > - to detect used symbols in data/code symbols are placed in .text > > > > section at different offsets (based on the line in the linker script). > > > > This is needed as potentially a reference to a symbol is converted to > > > > a reference to the containing section so multiple symbols could be > > > > converted to reference to same symbol (section name) and we need to > > > > distinguish them; > > > > - --orphan-handling=error option to linker is used to make sure we > > > > account for all possible sections from C code; > > > > > > > > Current limitations: > > > > - the main one is the lack of support for 64 bit code. It would make > > > > sure that even the code used for 64 bit (at the moment EFI code) is > > > > code and data position independent. We cannot assume that code that > > > > came from code compiled for 32 bit and compiled for 64 bit is code and > > > > data position independent, different compiler options lead to > > > > different code/data. > > > > > > > > Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com> > > > > > > This commit message is not particularly easy to follow. Can I recommend > > > the following: > > > > > > ---%<--- > > > x86/boot: Rework how 32bit C is linked/included for early boot > > > > > > Right now, the two functions which were really too complicated to write > > > in asm are compiled as 32bit PIC, linked to a blob and included > > > directly, using global asm() to arrange for them to have function semantics. > > > > > > This is limiting and fragile; the use of data relocations will compile > > > fine but malfunction when used, creating hard-to-debug bugs. > > > > > > Furthermore, we would like to increase the amount of C, to > > > deduplicate/unify Xen's boot logic, as well as making it easier to > > > follow. Therefore, rework how the 32bit objects are included. > > > > > > Link all 32bit objects together first. This allows for sharing of logic > > > between translation units. Use differential linking and explicit > > > imports/exports to confirm that we only have the expected relocations, > > > and write the object back out as an assembly file so it can be linked > > > again as if it were 64bit, to integrate with the rest of Xen. > > > > > > This allows for the use of external references (e.g. access to global > > > variables) with reasonable assurance of doing so safely. > > > > > > No functional change. > > > ---%<--- > > > > > > which I think is an accurate and more concise summary of what's changing? > > > > > > > You cut half of the explanation, replacing with nothing. > > Why is a script needed? Why 2 linking? How the new method detects > > unwanted relocations? > > TBH this is not clear to me even with your original commit message. > I'm starting to think that either me or Andrew are not the right persons to write this, there's a lot of assumptions we assume for granted. From what I see, in my message: ---- - to detect possible unwanted relocation in data/code code is generated with different addresses. This is enforced starting .text section at different positions and adding a fixed "gap" at the beginning. This makes sure code and data is position independent; - to detect used symbols in data/code symbols are placed in .text section at different offsets (based on the line in the linker script). This is needed as potentially a reference to a symbol is converted to a reference to the containing section so multiple symbols could be converted to reference to same symbol (section name) and we need to distinguish them; ---- in Andrew message: ---- Use differential linking and explicit imports/exports to confirm that we only have the expected relocations, ---- probably both are cryptic, but getting from "differential linking" you really need to know in advance what you are explaining. > > Why wasn't possible to share functions? mine: ---- - code is compiled separately, it's not possible to share a function (like memcpy) between different functions to use; ---- in Andrew message: ---- Link all 32bit objects together first. This allows for sharing of logic between translation units. ---- I would agree Andrew comment is clearer here. > > Why using --orphan-handling option? mine: ---- - --orphan-handling=error option to linker is used to make sure we account for all possible sections from C code; --- in Andrew message: ---- ---- still, Andrew more clear, but not carrying any information. > > > > The description has been there for about 2 months without many objections. > > IMO it's fine to use lists to describe specific points, but using > lists exclusively to write a commit message makes the items feel > disconnected between them. > > The format of the commit message by Andrew is clearer to undertsand > for me. Could you add what you think it's missing to the proposed > message by Andrew? > > Thanks, Roger. Probably, as said above, we (me and Andrew) have too many assumptions. This commit is doing quite some magic that's not easy to grasp. I can try to explain, and possibly you can suggest something that makes sense also to people not too involved in this. There are quite some challenges here. This code is executed during the boot process and the environment is pretty uncommon. Simply writing a message is not that easy. We are not sure if we have VGA, serial, BIOS or UEFI. We are not executing in the location code was compiled/linked to run so assuming it is wrong and causes issue; in other word the code/data should be position independent. This code is meant to be compiled and run in 32 mode, however Xen is 64 bit, so compiler output can't be linked to the final executable. And obviously you cannot call 64 bit code from 32 bit code. Even if code would be compiled and run in 64 bit mode, calling functions like printk would probably crash (it does, we discovered recently) as Xen code assumes some environment settings (in case of printk, just as example, it was missing per-cpu info leading to pointer to garbage). In 32 bit mode, you can get code position independence with -fpic, but this requires linker to generate GOT table but 64 bit linker would generate that table in a position not compatible with 32 bit code (and that's not the only issue of using 64 bit linker on 32 bit code). So, to solve this code/data is linked to a 32 bit object and converted to binary (note: this is an invariant, it was done like that before and after this commit). Solved the code with -fpic, what about data? Say we have something like "const char *message = "hello";"... a pointer is a pointer, which is the location of the data pointed. But as said before, we are in an uncommon environment where code/data could be run at a different location than compiled/linked! There's no magic option for doing that, so instead of hoping for the best (as we are doing at the moment) we check to not have pointers. How? We link code+data at different locations which will generate different binary in that case and we compare the 2 binaries to make sure there are no such differences (well, this is a simplification of the process). So... somebody willing to translate the above, surely poor and unclear, explanation is somethig digestible by people less involved in it? Frediano
On Fri, Oct 18, 2024 at 1:59 PM Roger Pau Monné <roger.pau@citrix.com> wrote: > > On Fri, Oct 18, 2024 at 01:48:27PM +0100, Frediano Ziglio wrote: > > On Fri, Oct 18, 2024 at 12:41 PM Roger Pau Monné <roger.pau@citrix.com> wrote: > > > > > > On Thu, Oct 17, 2024 at 02:31:19PM +0100, Frediano Ziglio wrote: > > > > +#define DECLARE_IMPORT(name) name = . + (__LINE__ * MULT) > > > > + > > > > +ENTRY(dummy_start) > > > > > > > > SECTIONS > > > > { > > > > /* Merge code and data into one section. */ > > > > - .text : { > > > > + .text TEXT_START : { > > > > + /* Silence linker warning, we are not going to use it */ > > > > + dummy_start = .; > > > > + > > > > + /* Declare below any symbol name needed. > > > > + * Each symbol should be on its own line. > > > > + * It looks like a tedious work but we make sure the things we use. > > > > + * Potentially they should be all variables. */ > > > > > > The style is wrong for the opening and closing comment delimiters. > > > > > > I think it would be best if this was written in a more natural style. > > > > > > /* > > > * Any symbols used should be declared below, this ensures which > > > * symbols are visible to the 32bit C boot code. > > > */ > > > > > > > But why to remove the "Potentially they should be all variables.". > > Surely something not written is more clear than something written, but > > on the other way it carries no information. > > I'm not sure I understand why this is helpful: either they are > mandated to be only variables, and hence the "potentially" is wrong, or > they are not, in which case I don't see why spelling a desire for they > to be only variables is helpful if it's not a strict requirement. > As normal, rules often have exceptions. Most of the functions (so code) in Xen is 64 bit, so you don't want to use them. However, saying you have a function in head.S written in assembly for 32 bit (or any other functions written for 32 bit), you want the possibility to call it. For instance you could export from head.S the function to output to serial in the future. About variables... are all variables fine to be accessed from these functions? Probably yes if they have no pointers in them. If they have pointers... that's another matter. Does the pointer have relocation? Is it going to be used at the final defined program location or only during initialization? To make an example, you could override a NULL pointer (that is, without relocation) to a current symbol, if this pointer is used after Xen is moved into its final position it will become invalid. If, on the other hand, the pointer had relocation potentially it will be automatically be relocated. > Thanks, Roger. Frediano
On Fri, Oct 18, 2024 at 02:45:59PM +0100, Frediano Ziglio wrote: > On Fri, Oct 18, 2024 at 1:59 PM Roger Pau Monné <roger.pau@citrix.com> wrote: > > > > On Fri, Oct 18, 2024 at 01:48:27PM +0100, Frediano Ziglio wrote: > > > On Fri, Oct 18, 2024 at 12:41 PM Roger Pau Monné <roger.pau@citrix.com> wrote: > > > > > > > > On Thu, Oct 17, 2024 at 02:31:19PM +0100, Frediano Ziglio wrote: > > > > > +#define DECLARE_IMPORT(name) name = . + (__LINE__ * MULT) > > > > > + > > > > > +ENTRY(dummy_start) > > > > > > > > > > SECTIONS > > > > > { > > > > > /* Merge code and data into one section. */ > > > > > - .text : { > > > > > + .text TEXT_START : { > > > > > + /* Silence linker warning, we are not going to use it */ > > > > > + dummy_start = .; > > > > > + > > > > > + /* Declare below any symbol name needed. > > > > > + * Each symbol should be on its own line. > > > > > + * It looks like a tedious work but we make sure the things we use. > > > > > + * Potentially they should be all variables. */ > > > > > > > > The style is wrong for the opening and closing comment delimiters. > > > > > > > > I think it would be best if this was written in a more natural style. > > > > > > > > /* > > > > * Any symbols used should be declared below, this ensures which > > > > * symbols are visible to the 32bit C boot code. > > > > */ > > > > > > > > > > But why to remove the "Potentially they should be all variables.". > > > Surely something not written is more clear than something written, but > > > on the other way it carries no information. > > > > I'm not sure I understand why this is helpful: either they are > > mandated to be only variables, and hence the "potentially" is wrong, or > > they are not, in which case I don't see why spelling a desire for they > > to be only variables is helpful if it's not a strict requirement. > > > > As normal, rules often have exceptions. Most of the functions (so > code) in Xen is 64 bit, so you don't want to use them. However, saying > you have a function in head.S written in assembly for 32 bit (or any > other functions written for 32 bit), you want the possibility to call > it. For instance you could export from head.S the function to output > to serial in the future. > > About variables... are all variables fine to be accessed from these > functions? Probably yes if they have no pointers in them. If they have > pointers... that's another matter. Does the pointer have relocation? > Is it going to be used at the final defined program location or only > during initialization? To make an example, you could override a NULL > pointer (that is, without relocation) to a current symbol, if this > pointer is used after Xen is moved into its final position it will > become invalid. If, on the other hand, the pointer had relocation > potentially it will be automatically be relocated. IMO comments are meant to clarify parts of the code. A comment that uses a conditional like "Potentially" introduces more ambiguity than it removes, unless the restriction is stated in the comment itself. I think you either you expand the comment to mention exactly which kind of symbols can be declared, or you make the comment a more restrictive one to avoid the ambiguity: "Symbols declared below should all be variables." Thanks, Roger.
On Fri, Oct 18, 2024 at 02:28:05PM +0100, Frediano Ziglio wrote: > On Fri, Oct 18, 2024 at 12:49 PM Roger Pau Monné <roger.pau@citrix.com> wrote: > > > > On Fri, Oct 18, 2024 at 09:42:48AM +0100, Frediano Ziglio wrote: > > > On Thu, Oct 17, 2024 at 6:13 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote: > > > > > > > > On 17/10/2024 2:31 pm, Frediano Ziglio wrote: > > > > > The current method to include 32 bit C boot code is: > > > > > - compile each function we want to use into a separate object file; > > > > > - each function is compiled with -fpic option; > > > > > - convert these object files to binary files. This operation removes GOP > > > > > which we don't want in the executable; > > > > > - a small assembly part in each file add the entry point; > > > > > - code can't have external references, all possible variables are passed > > > > > by value or pointer; > > > > > - include these binary files in head.S. > > > > > > > > > > There are currently some limitations: > > > > > - code is compiled separately, it's not possible to share a function > > > > > (like memcpy) between different functions to use; > > > > > - although code is compiled with -fpic there's no certainty there are > > > > > no relocations, specifically data ones. This can lead into hard to > > > > > find bugs; > > > > > - it's hard to add a simple function; > > > > > - having to pass external variables makes hard to do multiple things > > > > > otherwise functions would require a lot of parameters so code would > > > > > have to be split into multiple functions which is not easy. > > > > > > > > > > Current change extends the current process: > > > > > - all object files are linked together before getting converted making > > > > > possible to share code between the function we want to call; > > > > > - a single object file is generated with all functions to use and > > > > > exported symbols to easily call; > > > > > - variables to use are declared in linker script and easily used inside > > > > > C code. Declaring them manually could be annoying but makes also > > > > > easier to check them. Using external pointers can be still an issue if > > > > > they are not fixed. If an external symbol is not declared this gives a > > > > > link error. > > > > > > > > > > Some details of the implementation: > > > > > - C code is compiled with -fpic flags (as before); > > > > > - object files from C code are linked together; > > > > > - the single bundled object file is linked with 2 slightly different > > > > > script files to generate 2 bundled object files; > > > > > - the 2 bundled object files are converted to binary removing the need > > > > > for global offset tables; > > > > > - a Python script is used to generate assembly source from the 2 > > > > > binaries; > > > > > - the single assembly file is compiled to generate final bundled object > > > > > file; > > > > > - to detect possible unwanted relocation in data/code code is generated > > > > > with different addresses. This is enforced starting .text section at > > > > > different positions and adding a fixed "gap" at the beginning. > > > > > This makes sure code and data is position independent; > > > > > - to detect used symbols in data/code symbols are placed in .text > > > > > section at different offsets (based on the line in the linker script). > > > > > This is needed as potentially a reference to a symbol is converted to > > > > > a reference to the containing section so multiple symbols could be > > > > > converted to reference to same symbol (section name) and we need to > > > > > distinguish them; > > > > > - --orphan-handling=error option to linker is used to make sure we > > > > > account for all possible sections from C code; > > > > > > > > > > Current limitations: > > > > > - the main one is the lack of support for 64 bit code. It would make > > > > > sure that even the code used for 64 bit (at the moment EFI code) is > > > > > code and data position independent. We cannot assume that code that > > > > > came from code compiled for 32 bit and compiled for 64 bit is code and > > > > > data position independent, different compiler options lead to > > > > > different code/data. > > > > > > > > > > Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com> > > > > > > > > This commit message is not particularly easy to follow. Can I recommend > > > > the following: > > > > > > > > ---%<--- > > > > x86/boot: Rework how 32bit C is linked/included for early boot > > > > > > > > Right now, the two functions which were really too complicated to write > > > > in asm are compiled as 32bit PIC, linked to a blob and included > > > > directly, using global asm() to arrange for them to have function semantics. > > > > > > > > This is limiting and fragile; the use of data relocations will compile > > > > fine but malfunction when used, creating hard-to-debug bugs. > > > > > > > > Furthermore, we would like to increase the amount of C, to > > > > deduplicate/unify Xen's boot logic, as well as making it easier to > > > > follow. Therefore, rework how the 32bit objects are included. > > > > > > > > Link all 32bit objects together first. This allows for sharing of logic > > > > between translation units. Use differential linking and explicit > > > > imports/exports to confirm that we only have the expected relocations, > > > > and write the object back out as an assembly file so it can be linked > > > > again as if it were 64bit, to integrate with the rest of Xen. > > > > > > > > This allows for the use of external references (e.g. access to global > > > > variables) with reasonable assurance of doing so safely. > > > > > > > > No functional change. > > > > ---%<--- > > > > > > > > which I think is an accurate and more concise summary of what's changing? > > > > > > > > > > You cut half of the explanation, replacing with nothing. > > > Why is a script needed? Why 2 linking? How the new method detects > > > unwanted relocations? > > > > TBH this is not clear to me even with your original commit message. > > > > I'm starting to think that either me or Andrew are not the right > persons to write this, there's a lot of assumptions we assume for > granted. > > From what I see, in my message: > ---- > - to detect possible unwanted relocation in data/code code is generated > with different addresses. This is enforced starting .text section at > different positions and adding a fixed "gap" at the beginning. > This makes sure code and data is position independent; > - to detect used symbols in data/code symbols are placed in .text > section at different offsets (based on the line in the linker script). > This is needed as potentially a reference to a symbol is converted to > a reference to the containing section so multiple symbols could be > converted to reference to same symbol (section name) and we need to > distinguish them; > ---- > > in Andrew message: > ---- > Use differential linking and explicit imports/exports to confirm that > we only have the expected relocations, > ---- > > probably both are cryptic, but getting from "differential linking" you > really need to know in advance what you are explaining. > > > > Why wasn't possible to share functions? > > mine: > ---- > - code is compiled separately, it's not possible to share a function > (like memcpy) between different functions to use; > ---- > > in Andrew message: > ---- > Link all 32bit objects together first. This allows for sharing of > logic between translation units. > ---- > > I would agree Andrew comment is clearer here. > > > > Why using --orphan-handling option? > > mine: > ---- > - --orphan-handling=error option to linker is used to make sure we > account for all possible sections from C code; > --- > > in Andrew message: > ---- > ---- > > still, Andrew more clear, but not carrying any information. Maybe the issue is that some of the information you currently have in the commit message would be better added as inline comments. For example the reasoning about why 2 linker passes are need should better be added to xen/arch/x86/boot/Makefile IMO. In any case the code in xen/arch/x86/boot/Makefile needs some comments. For example what are the seemingly random values in text_{gap,diff}? Could those values be different? > > > > > > > The description has been there for about 2 months without many objections. > > > > IMO it's fine to use lists to describe specific points, but using > > lists exclusively to write a commit message makes the items feel > > disconnected between them. > > > > The format of the commit message by Andrew is clearer to undertsand > > for me. Could you add what you think it's missing to the proposed > > message by Andrew? > > > > Thanks, Roger. > > Probably, as said above, we (me and Andrew) have too many assumptions. Well, you are the author of the code, so it's expected from you to provide a suitable commit message that goes together with the change, as you ultimately knows exactly the reasoning behind the commit code changes. Andrew didn't think the provided message was fully suitable and as a courtesy he suggested an adjusted wording. He however had no requirement to do such suggestion, neither you should feel his verbatim wording is what should be used. My bias is towards Andrew suggested message (or something along those lines), because it can be read and parsed as a natural text rather than unconnected bullet points. It gives the reader enough context to understand the intention of the code change. I agree your commit message contains more details, but as said above, it's in my opinion better for those implementation details to be added as comments to the code. > This commit is doing quite some magic that's not easy to grasp. > I can try to explain, and possibly you can suggest something that > makes sense also to people not too involved in this. > > There are quite some challenges here. This code is executed during the > boot process and the environment is pretty uncommon. Simply writing a > message is not that easy. We are not sure if we have VGA, serial, BIOS > or UEFI. We are not executing in the location code was compiled/linked > to run so assuming it is wrong and causes issue; in other word the > code/data should be position independent. This code is meant to be > compiled and run in 32 mode, however Xen is 64 bit, so compiler output > can't be linked to the final executable. I've attempted to do something similar in FreeBSD for the PVH entry point, so that the initial page-tables could be built in C rather than asm: https://reviews.freebsd.org/D44451 However there I didn't made the code position independent yet, and I was using objcopy to convert the object from elf32-i386 to elf64-x86-64 (sadly such conversion makes FreeBSD elftoolchain objcopy explode). I need to find some time to try and fix elftoolchain objcopy so it can do the conversion. The above however is much simpler, as FreeBSD PVH entry point is not (yet) relocatable. > And obviously you cannot call > 64 bit code from 32 bit code. Even if code would be compiled and run > in 64 bit mode, calling functions like printk would probably crash (it > does, we discovered recently) as Xen code assumes some environment > settings (in case of printk, just as example, it was missing per-cpu > info leading to pointer to garbage). In 32 bit mode, you can get code > position independence with -fpic, but this requires linker to generate > GOT table but 64 bit linker would generate that table in a position > not compatible with 32 bit code (and that's not the only issue of > using 64 bit linker on 32 bit code). So, to solve this code/data is > linked to a 32 bit object and converted to binary (note: this is an > invariant, it was done like that before and after this commit). Solved > the code with -fpic, what about data? Say we have something like > "const char *message = "hello";"... a pointer is a pointer, which is > the location of the data pointed. But as said before, we are in an > uncommon environment where code/data could be run at a different > location than compiled/linked! There's no magic option for doing that, > so instead of hoping for the best (as we are doing at the moment) we > check to not have pointers. How? We link code+data at different > locations which will generate different binary in that case and we > compare the 2 binaries to make sure there are no such differences > (well, this is a simplification of the process). > > > So... somebody willing to translate the above, surely poor and > unclear, explanation is somethig digestible by people less involved in > it? I think it would be easier if you could attempt to convert the above explanation as more concise inline comments. It would also help if you could add a comment at the top of the introduced script (xen/tools/combine_two_binaries.py) to describe it's intended purpose. Thanks, Roger.
diff --git a/xen/arch/x86/boot/.gitignore b/xen/arch/x86/boot/.gitignore index a379db7988..7e85549751 100644 --- a/xen/arch/x86/boot/.gitignore +++ b/xen/arch/x86/boot/.gitignore @@ -1,3 +1,4 @@ /mkelf32 -/*.bin -/*.lnk +/build32.*.lds +/built-in-32.*.bin +/built-in-32.*.map diff --git a/xen/arch/x86/boot/Makefile b/xen/arch/x86/boot/Makefile index 1199291d2b..5da19501be 100644 --- a/xen/arch/x86/boot/Makefile +++ b/xen/arch/x86/boot/Makefile @@ -1,4 +1,5 @@ obj-bin-y += head.o +obj-bin-y += built-in-32.o obj32 := cmdline.32.o obj32 += reloc.32.o @@ -9,9 +10,6 @@ targets += $(obj32) obj32 := $(addprefix $(obj)/,$(obj32)) -$(obj)/head.o: AFLAGS-y += -Wa$(comma)-I$(obj) -$(obj)/head.o: $(obj32:.32.o=.bin) - CFLAGS_x86_32 := $(subst -m64,-m32 -march=i686,$(XEN_TREEWIDE_CFLAGS)) $(call cc-options-add,CFLAGS_x86_32,CC,$(EMBEDDED_EXTRA_CFLAGS)) CFLAGS_x86_32 += -Werror -fno-builtin -g0 -msoft-float -mregparm=3 @@ -25,14 +23,47 @@ $(obj32): XEN_CFLAGS := $(CFLAGS_x86_32) -fpic $(obj)/%.32.o: $(src)/%.c FORCE $(call if_changed_rule,cc_o_c) +orphan-handling-$(call ld-option,--orphan-handling=error) := --orphan-handling=error LDFLAGS_DIRECT-$(call ld-option,--warn-rwx-segments) := --no-warn-rwx-segments LDFLAGS_DIRECT += $(LDFLAGS_DIRECT-y) LD32 := $(LD) $(subst x86_64,i386,$(LDFLAGS_DIRECT)) -%.bin: %.lnk - $(OBJCOPY) -j .text -O binary $< $@ +text_gap := 0x010200 +text_diff := 0x408020 + +$(obj)/build32.base.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff) +$(obj)/build32.offset.lds: AFLAGS-y += -DGAP=$(text_gap) -DTEXT_DIFF=$(text_diff) -DFINAL +$(obj)/build32.base.lds $(obj)/build32.offset.lds: $(src)/build32.lds.S FORCE + $(call if_changed_dep,cpp_lds_S) + +targets += build32.offset.lds build32.base.lds + +# link all 32bit objects together +$(obj)/built-in-32.tmp.o: $(obj32) + $(LD32) -r -o $@ $^ + +# link bundle with a given layout and extract a binary from it +$(obj)/built-in-32.%.bin: $(obj)/build32.%.lds $(obj)/built-in-32.tmp.o + $(LD32) $(orphan-handling-y) -N -T $< -Map $(@:bin=map) -o $(@:bin=o) $(filter %.o,$^) + $(OBJCOPY) -j .text -O binary $(@:bin=o) $@ + rm -f $(@:bin=o) + +quiet_cmd_combine = GEN $@ +cmd_combine = \ + $(PYTHON) $(srctree)/tools/combine_two_binaries.py \ + --gap=$(text_gap) --text-diff=$(text_diff) \ + --script $(obj)/build32.offset.lds \ + --bin1 $(obj)/built-in-32.base.bin \ + --bin2 $(obj)/built-in-32.offset.bin \ + --map $(obj)/built-in-32.offset.map \ + --exports cmdline_parse_early,reloc \ + --output $@ + +targets += built-in-32.S -%.lnk: %.32.o $(src)/build32.lds - $(LD32) -N -T $(filter %.lds,$^) -o $@ $< +# generate final object file combining and checking above binaries +$(obj)/built-in-32.S: $(obj)/built-in-32.base.bin $(obj)/built-in-32.offset.bin \ + $(srctree)/tools/combine_two_binaries.py FORCE + $(call if_changed,combine) -clean-files := *.lnk *.bin +clean-files := built-in-32.*.bin built-in-32.*.map build32.*.lds diff --git a/xen/arch/x86/boot/build32.lds b/xen/arch/x86/boot/build32.lds.S similarity index 70% rename from xen/arch/x86/boot/build32.lds rename to xen/arch/x86/boot/build32.lds.S index 56edaa727b..e3f5e55261 100644 --- a/xen/arch/x86/boot/build32.lds +++ b/xen/arch/x86/boot/build32.lds.S @@ -15,22 +15,47 @@ * with this program. If not, see <http://www.gnu.org/licenses/>. */ -ENTRY(_start) +#ifdef FINAL +# undef GAP +# define GAP 0 +# define MULT 0 +# define TEXT_START +#else +# define MULT 1 +# define TEXT_START TEXT_DIFF +#endif +#define DECLARE_IMPORT(name) name = . + (__LINE__ * MULT) + +ENTRY(dummy_start) SECTIONS { /* Merge code and data into one section. */ - .text : { + .text TEXT_START : { + /* Silence linker warning, we are not going to use it */ + dummy_start = .; + + /* Declare below any symbol name needed. + * Each symbol should be on its own line. + * It looks like a tedious work but we make sure the things we use. + * Potentially they should be all variables. */ + DECLARE_IMPORT(__base_relocs_start); + DECLARE_IMPORT(__base_relocs_end); + . = . + GAP; *(.text) *(.text.*) - *(.data) - *(.data.*) *(.rodata) *(.rodata.*) + *(.data) + *(.data.*) *(.bss) *(.bss.*) } - + /DISCARD/ : { + *(.comment) + *(.comment.*) + *(.note.*) + } /* Dynamic linkage sections. Collected simply so we can check they're empty. */ .got : { *(.got) diff --git a/xen/arch/x86/boot/cmdline.c b/xen/arch/x86/boot/cmdline.c index fc9241ede9..196c580e91 100644 --- a/xen/arch/x86/boot/cmdline.c +++ b/xen/arch/x86/boot/cmdline.c @@ -18,18 +18,6 @@ * Linux kernel source (linux/lib/string.c). */ -/* - * This entry point is entered from xen/arch/x86/boot/head.S with: - * - %eax = &cmdline, - * - %edx = &early_boot_opts. - */ -asm ( - " .text \n" - " .globl _start \n" - "_start: \n" - " jmp cmdline_parse_early \n" - ); - #include <xen/compiler.h> #include <xen/kconfig.h> #include <xen/macros.h> diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S index c4de1dfab5..e0776e3896 100644 --- a/xen/arch/x86/boot/head.S +++ b/xen/arch/x86/boot/head.S @@ -759,18 +759,6 @@ trampoline_setup: /* Jump into the relocated trampoline. */ lret - /* - * cmdline and reloc are written in C, and linked to be 32bit PIC with - * entrypoints at 0 and using the fastcall convention. - */ -FUNC_LOCAL(cmdline_parse_early) - .incbin "cmdline.bin" -END(cmdline_parse_early) - -FUNC_LOCAL(reloc) - .incbin "reloc.bin" -END(reloc) - ENTRY(trampoline_start) #include "trampoline.S" ENTRY(trampoline_end) diff --git a/xen/arch/x86/boot/reloc.c b/xen/arch/x86/boot/reloc.c index 8c58affcd9..94b078d7b1 100644 --- a/xen/arch/x86/boot/reloc.c +++ b/xen/arch/x86/boot/reloc.c @@ -12,20 +12,6 @@ * Daniel Kiper <daniel.kiper@oracle.com> */ -/* - * This entry point is entered from xen/arch/x86/boot/head.S with: - * - %eax = MAGIC, - * - %edx = INFORMATION_ADDRESS, - * - %ecx = TOPMOST_LOW_MEMORY_STACK_ADDRESS. - * - 0x04(%esp) = BOOT_VIDEO_INFO_ADDRESS. - */ -asm ( - " .text \n" - " .globl _start \n" - "_start: \n" - " jmp reloc \n" - ); - #include <xen/compiler.h> #include <xen/macros.h> #include <xen/types.h> diff --git a/xen/tools/combine_two_binaries.py b/xen/tools/combine_two_binaries.py new file mode 100755 index 0000000000..264be77274 --- /dev/null +++ b/xen/tools/combine_two_binaries.py @@ -0,0 +1,220 @@ +#!/usr/bin/env python3 + +from __future__ import print_function +import argparse +import functools +import re +import struct +import sys + +parser = argparse.ArgumentParser(description='Generate assembly file to merge into other code.') +auto_int = functools.update_wrapper(lambda x: int(x, 0), int) # allows hex +parser.add_argument('--script', dest='script', + required=True, + help='Linker script for extracting symbols') +parser.add_argument('--bin1', dest='bin1', + required=True, + help='First binary') +parser.add_argument('--bin2', dest='bin2', + required=True, + help='Second binary') +parser.add_argument('--gap', dest='gap', + required=True, + type=auto_int, + help='Gap inserted at the start of code section') +parser.add_argument('--text-diff', dest='text_diff', + required=True, + type=auto_int, + help='Difference between code section start') +parser.add_argument('--output', dest='output', + help='Output file') +parser.add_argument('--map', dest='mapfile', + help='Map file to read for symbols to export') +parser.add_argument('--exports', dest='exports', + help='Symbols to export') +parser.add_argument('--section-header', dest='section_header', + default='.section .init.text, "ax", @progbits', + help='Section header declaration') +parser.add_argument('-v', '--verbose', + action='store_true') +args = parser.parse_args() + +gap = args.gap +text_diff = args.text_diff + +# Parse linker script for external symbols to use. +# Next regex matches expanded DECLARE_IMPORT lines in linker script. +symbol_re = re.compile(r'\s+(\S+)\s*=\s*\.\s*\+\s*\((\d+)\s*\*\s*0\s*\)\s*;') +symbols = {} +lines = {} +for line in open(args.script): + m = symbol_re.match(line) + if not m: + continue + (name, line_num) = (m.group(1), int(m.group(2))) + if line_num == 0: + raise Exception("Invalid line number found:\n\t" + line) + if line_num in symbols: + raise Exception("Symbol with this line already present:\n\t" + line) + if name in lines: + raise Exception("Symbol with this name already present:\n\t" + name) + symbols[line_num] = name + lines[name] = line_num + +exports = [] +if args.exports is not None: + exports = dict([(name, None) for name in args.exports.split(',')]) + +# Parse mapfile, look for ther symbols we want to export. +if args.mapfile is not None: + symbol_re = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n') + for line in open(args.mapfile): + m = symbol_re.match(line) + if not m or m.group(2) not in exports: + continue + addr = int(m.group(1), 16) + exports[m.group(2)] = addr +for (name, addr) in exports.items(): + if addr is None: + raise Exception("Required export symbols %s not found" % name) + +file1 = open(args.bin1, 'rb') +file2 = open(args.bin2, 'rb') +file1.seek(0, 2) +size1 = file1.tell() +file2.seek(0, 2) +size2 = file2.tell() +if size1 > size2: + file1, file2 = file2, file1 + size1, size2 = size2, size1 +if size2 != size1 + gap: + raise Exception('File sizes do not match') +del size2 + +file1.seek(0, 0) +data1 = file1.read(size1) +del file1 +file2.seek(gap, 0) +data2 = file2.read(size1) +del file2 + +max_line = max(symbols.keys()) + +def to_int32(n): + '''Convert a number to signed 32 bit integer truncating if needed''' + mask = (1 << 32) - 1 + h = 1 << 31 + return (n & mask) ^ h - h + +i = 0 +references = {} +internals = 0 +while i <= size1 - 4: + n1 = struct.unpack('<I', data1[i:i+4])[0] + n2 = struct.unpack('<I', data2[i:i+4])[0] + i += 1 + # The two numbers are the same, no problems + if n1 == n2: + continue + # Try to understand why they are different + diff = to_int32(n1 - n2) + if diff == -gap: # this is an internal relocation + pos = i - 1 + print("Internal relocation found at position %#x " + "n1=%#x n2=%#x diff=%#x" % (pos, n1, n2, diff), + file=sys.stderr) + i += 3 + internals += 1 + if internals >= 10: + break + continue + # This is a relative relocation to a symbol, accepted, code/data is + # relocatable. + if diff < gap and diff >= gap - max_line: + n = gap - diff + symbol = symbols.get(n) + # check we have a symbol + if symbol is None: + raise Exception("Cannot find symbol for line %d" % n) + pos = i - 1 + if args.verbose: + print('Position %#x %d %s' % (pos, n, symbol), file=sys.stderr) + i += 3 + references[pos] = symbol + continue + # First byte is the same, move to next byte + if diff & 0xff == 0 and i <= size1 - 4: + continue + # Probably a type of relocation we don't want or support + pos = i - 1 + suggestion = '' + symbol = symbols.get(-diff - text_diff) + if symbol is not None: + suggestion = " Maybe %s is not defined as hidden?" % symbol + raise Exception("Unexpected difference found at %#x " + "n1=%#x n2=%#x diff=%#x gap=%#x.%s" % \ + (pos, n1, n2, diff, gap, suggestion)) +if internals != 0: + raise Exception("Previous relocations found") + +def line_bytes(buf, out): + '''Output an assembly line with all bytes in "buf"''' + # Python 2 compatibility + if type(buf) == str: + print("\t.byte " + ','.join([str(ord(c)) for c in buf]), file=out) + else: + print("\t.byte " + ','.join([str(n) for n in buf]), file=out) + +def part(start, end, out): + '''Output bytes of "data" from "start" to "end"''' + while start < end: + e = min(start + 16, end) + line_bytes(data1[start:e], out) + start = e + +def reference(pos, out): + name = references[pos] + n = struct.unpack('<I', data1[pos:pos+4])[0] + sign = '+' + if n >= (1 << 31): + n -= (1 << 32) + n += pos + if n < 0: + n = -n + sign = '-' + print("\t.hidden %s\n" + "\t.long %s %s %#x - ." % (name, name, sign, n), + file=out) + +def output(out): + prev = 0 + exports_by_addr = {} + for (sym, addr) in exports.items(): + exports_by_addr.setdefault(addr, []).append(sym) + positions = list(references.keys()) + positions += list(exports_by_addr.keys()) + for pos in sorted(positions): + part(prev, pos, out) + prev = pos + if pos in references: + reference(pos, out) + prev = pos + 4 + if pos in exports_by_addr: + for sym in exports_by_addr[pos]: + print("\t.global %s\n" + "\t.hidden %s\n" + "%s:" % (sym, sym, sym), + file=out) + part(prev, size1, out) + +out = sys.stdout +if args.output is not None: + out = open(args.output, 'w') +print('''/* + * File autogenerated by combine_two_binaries.py DO NOT EDIT + */''', file=out) +print('\t' + args.section_header, file=out) +print('obj32_start:', file=out) +output(out) +print('\n\t.section\t.note.GNU-stack,"",@progbits', file=out) +out.flush()
The current method to include 32 bit C boot code is: - compile each function we want to use into a separate object file; - each function is compiled with -fpic option; - convert these object files to binary files. This operation removes GOP which we don't want in the executable; - a small assembly part in each file add the entry point; - code can't have external references, all possible variables are passed by value or pointer; - include these binary files in head.S. There are currently some limitations: - code is compiled separately, it's not possible to share a function (like memcpy) between different functions to use; - although code is compiled with -fpic there's no certainty there are no relocations, specifically data ones. This can lead into hard to find bugs; - it's hard to add a simple function; - having to pass external variables makes hard to do multiple things otherwise functions would require a lot of parameters so code would have to be split into multiple functions which is not easy. Current change extends the current process: - all object files are linked together before getting converted making possible to share code between the function we want to call; - a single object file is generated with all functions to use and exported symbols to easily call; - variables to use are declared in linker script and easily used inside C code. Declaring them manually could be annoying but makes also easier to check them. Using external pointers can be still an issue if they are not fixed. If an external symbol is not declared this gives a link error. Some details of the implementation: - C code is compiled with -fpic flags (as before); - object files from C code are linked together; - the single bundled object file is linked with 2 slightly different script files to generate 2 bundled object files; - the 2 bundled object files are converted to binary removing the need for global offset tables; - a Python script is used to generate assembly source from the 2 binaries; - the single assembly file is compiled to generate final bundled object file; - to detect possible unwanted relocation in data/code code is generated with different addresses. This is enforced starting .text section at different positions and adding a fixed "gap" at the beginning. This makes sure code and data is position independent; - to detect used symbols in data/code symbols are placed in .text section at different offsets (based on the line in the linker script). This is needed as potentially a reference to a symbol is converted to a reference to the containing section so multiple symbols could be converted to reference to same symbol (section name) and we need to distinguish them; - --orphan-handling=error option to linker is used to make sure we account for all possible sections from C code; Current limitations: - the main one is the lack of support for 64 bit code. It would make sure that even the code used for 64 bit (at the moment EFI code) is code and data position independent. We cannot assume that code that came from code compiled for 32 bit and compiled for 64 bit is code and data position independent, different compiler options lead to different code/data. Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com> --- Changes since v2: - removed W^X limitation, allowing data; - added some comments to python script; - added extension to python script; - added header to generated assembly code from python script; - added starting symbol to generated assembly code from python script to make disassembly more clear; - other minor style changes to python script. Changes since v4: - add build32.final.lds build32.other.lds to targets macro; - place some comments over a rule, not inside; - simplified linking and producing binary rule; - renamed built_in_32 to built-in-32, coding style; - fix minor indentation; - put magic numbers in Makefile and propagate them; - minor variable cleanups in Python script; - add dependency to Python script. Changes since v5: - renamed "other" and "final" phases to "base" and "offset"; - use if_changed macro to generate built-in-32.S. --- xen/arch/x86/boot/.gitignore | 5 +- xen/arch/x86/boot/Makefile | 47 +++- .../x86/boot/{build32.lds => build32.lds.S} | 35 ++- xen/arch/x86/boot/cmdline.c | 12 - xen/arch/x86/boot/head.S | 12 - xen/arch/x86/boot/reloc.c | 14 -- xen/tools/combine_two_binaries.py | 220 ++++++++++++++++++ 7 files changed, 292 insertions(+), 53 deletions(-) rename xen/arch/x86/boot/{build32.lds => build32.lds.S} (70%) create mode 100755 xen/tools/combine_two_binaries.py