diff mbox series

x86/boot: Fix build with LLVM toolchain

Message ID 20241105145507.613981-1-frediano.ziglio@cloud.com (mailing list archive)
State Superseded
Headers show
Series x86/boot: Fix build with LLVM toolchain | expand

Commit Message

Frediano Ziglio Nov. 5, 2024, 2:55 p.m. UTC
This toolchain generates different object and map files.
Account for these changes.
Added sections need to have special type so we put them in
separate sections as linker will copy type from input sections.

Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com>
---
 xen/arch/x86/boot/build32.lds.S   |  9 +++++++++
 xen/tools/combine_two_binaries.py | 17 +++++++++++++----
 2 files changed, 22 insertions(+), 4 deletions(-)

Comments

Jan Beulich Nov. 5, 2024, 3:32 p.m. UTC | #1
On 05.11.2024 15:55, Frediano Ziglio wrote:
> This toolchain generates different object and map files.
> Account for these changes.

At least briefly mentioning what exactly the differences are would be
quite nice, imo.

> --- a/xen/tools/combine_two_binaries.py
> +++ b/xen/tools/combine_two_binaries.py
> @@ -67,13 +67,22 @@ if args.exports is not None:
>  
>  # Parse mapfile, look for ther symbols we want to export.
>  if args.mapfile is not None:
> -    symbol_re = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')
> +    symbol_re_clang = \

Is "clang" really appropriate to use here? Even without the alternative
being named "gnu", I'd expect "llvm" to be more to the point, as at
the linking stage the original language (encoded in "clang") shouldn't
matter much anymore.

> +        re.compile(r'\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s{15,}(\S+)\n')

Just wondering:
- How stable is their map file format?
- Do they not have an option to write GNU-compatible map files?
- Why do you declare 5 groups, when you're only after the 1st and last,
which would then allow to ...

> +    symbol_re_gnu = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')
>      for line in open(args.mapfile):
> -        m = symbol_re.match(line)
> -        if not m or m.group(2) not in exports:
> +        name = None
> +        m = symbol_re_clang.match(line)
> +        if m:
> +            name = m.group(5)
> +        else:
> +            m = symbol_re_gnu.match(line)
> +            if m:
> +                name = m.group(2)

... uniformly use m.group(2) here (outside of the conditional)?

Jan

> +        if name is None or name not in exports:
>              continue
>          addr = int(m.group(1), 16)
> -        exports[m.group(2)] = addr
> +        exports[name] = addr
>  for (name, addr) in exports.items():
>      if addr is None:
>          raise Exception("Required export symbols %s not found" % name)
Frediano Ziglio Nov. 5, 2024, 4:35 p.m. UTC | #2
On Tue, Nov 5, 2024 at 3:32 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 05.11.2024 15:55, Frediano Ziglio wrote:
> > This toolchain generates different object and map files.
> > Account for these changes.
>
> At least briefly mentioning what exactly the differences are would be
> quite nice, imo.
>

What about.

Object have 3 additional sections which must be handled by the linker script.
Map file has 7 columns: VMA, LMA, size, alignment, output section,
definition, symbols, for our symbols output section and definition are
empty.

> > --- a/xen/tools/combine_two_binaries.py
> > +++ b/xen/tools/combine_two_binaries.py
> > @@ -67,13 +67,22 @@ if args.exports is not None:
> >
> >  # Parse mapfile, look for ther symbols we want to export.
> >  if args.mapfile is not None:
> > -    symbol_re = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')
> > +    symbol_re_clang = \
>
> Is "clang" really appropriate to use here? Even without the alternative
> being named "gnu", I'd expect "llvm" to be more to the point, as at
> the linking stage the original language (encoded in "clang") shouldn't
> matter much anymore.
>

I'll change to "llvm".

> > +        re.compile(r'\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s{15,}(\S+)\n')
>
> Just wondering:
> - How stable is their map file format?

No idea, unfortunately, map files are not really standard.

> - Do they not have an option to write GNU-compatible map files?

I try to search, but I could not find such an option.

> - Why do you declare 5 groups, when you're only after the 1st and last,
> which would then allow to ...
>
> > +    symbol_re_gnu = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')
> >      for line in open(args.mapfile):
> > -        m = symbol_re.match(line)
> > -        if not m or m.group(2) not in exports:
> > +        name = None
> > +        m = symbol_re_clang.match(line)
> > +        if m:
> > +            name = m.group(5)
> > +        else:
> > +            m = symbol_re_gnu.match(line)
> > +            if m:
> > +                name = m.group(2)
>
> ... uniformly use m.group(2) here (outside of the conditional)?
>

It could be done. The initial idea was testing that VMA and LMA fields
should be identical, which gives a bit more certainty about the format
used.
About map file format, both formats have some headers. Would it be
sensible to detect such headers? BTW, probably a mis-detection would
cause symbols not to be detected.

> Jan
>
> > +        if name is None or name not in exports:
> >              continue
> >          addr = int(m.group(1), 16)
> > -        exports[m.group(2)] = addr
> > +        exports[name] = addr
> >  for (name, addr) in exports.items():
> >      if addr is None:
> >          raise Exception("Required export symbols %s not found" % name)
>

Frediano
Jan Beulich Nov. 5, 2024, 5:06 p.m. UTC | #3
On 05.11.2024 17:35, Frediano Ziglio wrote:
> On Tue, Nov 5, 2024 at 3:32 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 05.11.2024 15:55, Frediano Ziglio wrote:
>>> This toolchain generates different object and map files.
>>> Account for these changes.
>>
>> At least briefly mentioning what exactly the differences are would be
>> quite nice, imo.
>>
> 
> What about.
> 
> Object have 3 additional sections which must be handled by the linker script.

I expect these sections are there in both cases. The difference, I assume,
is that for the GNU linker they don't need mentioning in the linker script.
Maybe that's what you mean to say, but to me at least the sentence can also
be interpreted differently.

>>> +    symbol_re_gnu = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')
>>>      for line in open(args.mapfile):
>>> -        m = symbol_re.match(line)
>>> -        if not m or m.group(2) not in exports:
>>> +        name = None
>>> +        m = symbol_re_clang.match(line)
>>> +        if m:
>>> +            name = m.group(5)
>>> +        else:
>>> +            m = symbol_re_gnu.match(line)
>>> +            if m:
>>> +                name = m.group(2)
>>
>> ... uniformly use m.group(2) here (outside of the conditional)?
>>
> 
> It could be done. The initial idea was testing that VMA and LMA fields
> should be identical, which gives a bit more certainty about the format
> used.
> About map file format, both formats have some headers. Would it be
> sensible to detect such headers? BTW, probably a mis-detection would
> cause symbols not to be detected.

Not sure either way, to be honest.

Jan
Andrew Cooper Nov. 5, 2024, 7:23 p.m. UTC | #4
On 05/11/2024 2:55 pm, Frediano Ziglio wrote:
> diff --git a/xen/tools/combine_two_binaries.py b/xen/tools/combine_two_binaries.py
> index 447c0d3bdb..79ae8900b1 100755
> --- a/xen/tools/combine_two_binaries.py
> +++ b/xen/tools/combine_two_binaries.py
> @@ -67,13 +67,22 @@ if args.exports is not None:
>  
>  # Parse mapfile, look for ther symbols we want to export.
>  if args.mapfile is not None:
> -    symbol_re = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')
> +    symbol_re_clang = \
> +        re.compile(r'\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s{15,}(\S+)\n')
> +    symbol_re_gnu = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')

These are specific to the linker, not the compiler, so really should be
_ld and _lld rather than _gnu and _clang.

The build environment does have CONFIG_LD_IS_GNU which is used for one
LD vs LLD distinction.  It seems reasonable to re-use it here (so you're
not trying both regexes in the hope that one works), although you'll
need to plumb it into the script somehow because it's not exported into
the environment currently.


But, regexes are utterly opaque, and given that neither Binutils nor
LLVM document their map format, you really need to provide at least one
example of what a relevant mapfile looks like in a comment.

Looking at built-in-32.offset.map in it's gnu form, it's presumably
looking for the lines that are just a hex address and a name with no
other punctuation.

But the lld form is clearly different.

~Andrew
Frediano Ziglio Nov. 6, 2024, 6:56 a.m. UTC | #5
On Tue, Nov 5, 2024 at 5:06 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 05.11.2024 17:35, Frediano Ziglio wrote:
> > On Tue, Nov 5, 2024 at 3:32 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 05.11.2024 15:55, Frediano Ziglio wrote:
> >>> This toolchain generates different object and map files.
> >>> Account for these changes.
> >>
> >> At least briefly mentioning what exactly the differences are would be
> >> quite nice, imo.
> >>
> >
> > What about.
> >
> > Object have 3 additional sections which must be handled by the linker script.
>
> I expect these sections are there in both cases. The difference, I assume,
> is that for the GNU linker they don't need mentioning in the linker script.
> Maybe that's what you mean to say, but to me at least the sentence can also
> be interpreted differently.
>

Why do you expect such sections? They are used for dynamic symbols in
shared objects, we don't use shared objects here. Normal object
symbols are not handled by these sections. GNU compiler+linker (we
link multiple objects together) do not generate these sections. So the
comment looks correct to me.

> >>> +    symbol_re_gnu = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')
> >>>      for line in open(args.mapfile):
> >>> -        m = symbol_re.match(line)
> >>> -        if not m or m.group(2) not in exports:
> >>> +        name = None
> >>> +        m = symbol_re_clang.match(line)
> >>> +        if m:
> >>> +            name = m.group(5)
> >>> +        else:
> >>> +            m = symbol_re_gnu.match(line)
> >>> +            if m:
> >>> +                name = m.group(2)
> >>
> >> ... uniformly use m.group(2) here (outside of the conditional)?
> >>
> >
> > It could be done. The initial idea was testing that VMA and LMA fields
> > should be identical, which gives a bit more certainty about the format
> > used.
> > About map file format, both formats have some headers. Would it be
> > sensible to detect such headers? BTW, probably a mis-detection would
> > cause symbols not to be detected.
>
> Not sure either way, to be honest.
>
> Jan

Mumble... I'm looking for an alternative, maybe I can avoid using map files.

Frediano
Alejandro Vallejo Nov. 6, 2024, 10:32 a.m. UTC | #6
On Tue Nov 5, 2024 at 7:23 PM GMT, Andrew Cooper wrote:
> On 05/11/2024 2:55 pm, Frediano Ziglio wrote:
> > diff --git a/xen/tools/combine_two_binaries.py b/xen/tools/combine_two_binaries.py
> > index 447c0d3bdb..79ae8900b1 100755
> > --- a/xen/tools/combine_two_binaries.py
> > +++ b/xen/tools/combine_two_binaries.py
> > @@ -67,13 +67,22 @@ if args.exports is not None:
> >  
> >  # Parse mapfile, look for ther symbols we want to export.
> >  if args.mapfile is not None:
> > -    symbol_re = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')
> > +    symbol_re_clang = \
> > +        re.compile(r'\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s{15,}(\S+)\n')
> > +    symbol_re_gnu = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')
>
> These are specific to the linker, not the compiler, so really should be
> _ld and _lld rather than _gnu and _clang.

GNU binutils ships not one, but two linkers. _ld does not reflect the fact that
it's not the map format of a linker, but a family of them. Probably shared by
mold too. That's a good reason for _gnu to stay _gnu (or if changing, to
_binutils; but that's hardly relevant in this patch).

Otherwise _clang could become either _llvm or _lld. Given there's a single
linker shipped by LLVM the difference is less than cosmetic.

>
> The build environment does have CONFIG_LD_IS_GNU which is used for one
> LD vs LLD distinction.  It seems reasonable to re-use it here (so you're
> not trying both regexes in the hope that one works), although you'll
> need to plumb it into the script somehow because it's not exported into
> the environment currently.
>
>
> But, regexes are utterly opaque, and given that neither Binutils nor
> LLVM document their map format, you really need to provide at least one
> example of what a relevant mapfile looks like in a comment.
>
> Looking at built-in-32.offset.map in it's gnu form, it's presumably
> looking for the lines that are just a hex address and a name with no
> other punctuation.
>
> But the lld form is clearly different.
>
> ~Andrew

Cheers,
Alejandro
Andrew Cooper Nov. 6, 2024, 10:50 a.m. UTC | #7
On 06/11/2024 10:32 am, Alejandro Vallejo wrote:
> On Tue Nov 5, 2024 at 7:23 PM GMT, Andrew Cooper wrote:
>> On 05/11/2024 2:55 pm, Frediano Ziglio wrote:
>>> diff --git a/xen/tools/combine_two_binaries.py b/xen/tools/combine_two_binaries.py
>>> index 447c0d3bdb..79ae8900b1 100755
>>> --- a/xen/tools/combine_two_binaries.py
>>> +++ b/xen/tools/combine_two_binaries.py
>>> @@ -67,13 +67,22 @@ if args.exports is not None:
>>>  
>>>  # Parse mapfile, look for ther symbols we want to export.
>>>  if args.mapfile is not None:
>>> -    symbol_re = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')
>>> +    symbol_re_clang = \
>>> +        re.compile(r'\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s{15,}(\S+)\n')
>>> +    symbol_re_gnu = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')
>> These are specific to the linker, not the compiler, so really should be
>> _ld and _lld rather than _gnu and _clang.
> GNU binutils ships not one, but two linkers.

Yes that's technically true.

But while only one of them is remotely capable of linking a kernel, it's
also not very relevant.

Neither Gold, or indeed Mold which you mention too, are liable to be LD
within Xen for the foreseeable future.

~Andrew
Jan Beulich Nov. 6, 2024, 10:59 a.m. UTC | #8
On 06.11.2024 07:56, Frediano Ziglio wrote:
> On Tue, Nov 5, 2024 at 5:06 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 05.11.2024 17:35, Frediano Ziglio wrote:
>>> On Tue, Nov 5, 2024 at 3:32 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 05.11.2024 15:55, Frediano Ziglio wrote:
>>>>> This toolchain generates different object and map files.
>>>>> Account for these changes.
>>>>
>>>> At least briefly mentioning what exactly the differences are would be
>>>> quite nice, imo.
>>>>
>>>
>>> What about.
>>>
>>> Object have 3 additional sections which must be handled by the linker script.
>>
>> I expect these sections are there in both cases. The difference, I assume,
>> is that for the GNU linker they don't need mentioning in the linker script.
>> Maybe that's what you mean to say, but to me at least the sentence can also
>> be interpreted differently.
> 
> Why do you expect such sections? They are used for dynamic symbols in
> shared objects, we don't use shared objects here. Normal object
> symbols are not handled by these sections. GNU compiler+linker (we
> link multiple objects together) do not generate these sections. So the
> comment looks correct to me.

About every ELF object will have .symtab and .strtab, and many also a
separate .shstrtab. There's nothing "dynamic" about them. IOW - I'm
confused by your reply.

Jan
Frediano Ziglio Nov. 6, 2024, 11:34 a.m. UTC | #9
On Wed, Nov 6, 2024 at 10:59 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 06.11.2024 07:56, Frediano Ziglio wrote:
> > On Tue, Nov 5, 2024 at 5:06 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 05.11.2024 17:35, Frediano Ziglio wrote:
> >>> On Tue, Nov 5, 2024 at 3:32 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>
> >>>> On 05.11.2024 15:55, Frediano Ziglio wrote:
> >>>>> This toolchain generates different object and map files.
> >>>>> Account for these changes.
> >>>>
> >>>> At least briefly mentioning what exactly the differences are would be
> >>>> quite nice, imo.
> >>>>
> >>>
> >>> What about.
> >>>
> >>> Object have 3 additional sections which must be handled by the linker script.
> >>
> >> I expect these sections are there in both cases. The difference, I assume,
> >> is that for the GNU linker they don't need mentioning in the linker script.
> >> Maybe that's what you mean to say, but to me at least the sentence can also
> >> be interpreted differently.
> >
> > Why do you expect such sections? They are used for dynamic symbols in
> > shared objects, we don't use shared objects here. Normal object
> > symbols are not handled by these sections. GNU compiler+linker (we
> > link multiple objects together) do not generate these sections. So the
> > comment looks correct to me.
>
> About every ELF object will have .symtab and .strtab, and many also a
> separate .shstrtab. There's nothing "dynamic" about them. IOW - I'm
> confused by your reply.
>
> Jan

I checked the object files and there are no such sections using GNU toolchain.

Frediano
Jan Beulich Nov. 6, 2024, 11:45 a.m. UTC | #10
On 06.11.2024 12:34, Frediano Ziglio wrote:
> On Wed, Nov 6, 2024 at 10:59 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 06.11.2024 07:56, Frediano Ziglio wrote:
>>> On Tue, Nov 5, 2024 at 5:06 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 05.11.2024 17:35, Frediano Ziglio wrote:
>>>>> On Tue, Nov 5, 2024 at 3:32 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>
>>>>>> On 05.11.2024 15:55, Frediano Ziglio wrote:
>>>>>>> This toolchain generates different object and map files.
>>>>>>> Account for these changes.
>>>>>>
>>>>>> At least briefly mentioning what exactly the differences are would be
>>>>>> quite nice, imo.
>>>>>>
>>>>>
>>>>> What about.
>>>>>
>>>>> Object have 3 additional sections which must be handled by the linker script.
>>>>
>>>> I expect these sections are there in both cases. The difference, I assume,
>>>> is that for the GNU linker they don't need mentioning in the linker script.
>>>> Maybe that's what you mean to say, but to me at least the sentence can also
>>>> be interpreted differently.
>>>
>>> Why do you expect such sections? They are used for dynamic symbols in
>>> shared objects, we don't use shared objects here. Normal object
>>> symbols are not handled by these sections. GNU compiler+linker (we
>>> link multiple objects together) do not generate these sections. So the
>>> comment looks correct to me.
>>
>> About every ELF object will have .symtab and .strtab, and many also a
>> separate .shstrtab. There's nothing "dynamic" about them. IOW - I'm
>> confused by your reply.
> 
> I checked the object files and there are no such sections using GNU toolchain.

I think I checked every *.o that's under boot/, and they all have these three
sections. Can you clarify which one(s) specifically you checked?

Jan
Frediano Ziglio Nov. 6, 2024, 11:58 a.m. UTC | #11
On Wed, Nov 6, 2024 at 11:45 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 06.11.2024 12:34, Frediano Ziglio wrote:
> > On Wed, Nov 6, 2024 at 10:59 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 06.11.2024 07:56, Frediano Ziglio wrote:
> >>> On Tue, Nov 5, 2024 at 5:06 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>
> >>>> On 05.11.2024 17:35, Frediano Ziglio wrote:
> >>>>> On Tue, Nov 5, 2024 at 3:32 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>>>
> >>>>>> On 05.11.2024 15:55, Frediano Ziglio wrote:
> >>>>>>> This toolchain generates different object and map files.
> >>>>>>> Account for these changes.
> >>>>>>
> >>>>>> At least briefly mentioning what exactly the differences are would be
> >>>>>> quite nice, imo.
> >>>>>>
> >>>>>
> >>>>> What about.
> >>>>>
> >>>>> Object have 3 additional sections which must be handled by the linker script.
> >>>>
> >>>> I expect these sections are there in both cases. The difference, I assume,
> >>>> is that for the GNU linker they don't need mentioning in the linker script.
> >>>> Maybe that's what you mean to say, but to me at least the sentence can also
> >>>> be interpreted differently.
> >>>
> >>> Why do you expect such sections? They are used for dynamic symbols in
> >>> shared objects, we don't use shared objects here. Normal object
> >>> symbols are not handled by these sections. GNU compiler+linker (we
> >>> link multiple objects together) do not generate these sections. So the
> >>> comment looks correct to me.
> >>
> >> About every ELF object will have .symtab and .strtab, and many also a
> >> separate .shstrtab. There's nothing "dynamic" about them. IOW - I'm
> >> confused by your reply.
> >
> > I checked the object files and there are no such sections using GNU toolchain.
>
> I think I checked every *.o that's under boot/, and they all have these three
> sections. Can you clarify which one(s) specifically you checked?
>
> Jan

$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ld --version
GNU ld (GNU Binutils for Ubuntu) 2.38
Copyright (C) 2022 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public Licence version 3 or (at your option) a later version.
This program has absolutely no warranty.

$ find xen/normal/ xen/pvh/ -name \*.o | xargs -ifilename sh -c
'objdump -x filename' | grep -e \\.
shstrtab -e \\.strtab -e \\.symtab

(xen/normal and xen/pvh are the build directory, with different configurations)

I'm saying that's possibly why the linker scripts didn't need to
specify these sections.

Frediano
Roger Pau Monné Nov. 6, 2024, 12:37 p.m. UTC | #12
On Wed, Nov 06, 2024 at 11:58:19AM +0000, Frediano Ziglio wrote:
> On Wed, Nov 6, 2024 at 11:45 AM Jan Beulich <jbeulich@suse.com> wrote:
> >
> > On 06.11.2024 12:34, Frediano Ziglio wrote:
> > > On Wed, Nov 6, 2024 at 10:59 AM Jan Beulich <jbeulich@suse.com> wrote:
> > >>
> > >> On 06.11.2024 07:56, Frediano Ziglio wrote:
> > >>> On Tue, Nov 5, 2024 at 5:06 PM Jan Beulich <jbeulich@suse.com> wrote:
> > >>>>
> > >>>> On 05.11.2024 17:35, Frediano Ziglio wrote:
> > >>>>> On Tue, Nov 5, 2024 at 3:32 PM Jan Beulich <jbeulich@suse.com> wrote:
> > >>>>>>
> > >>>>>> On 05.11.2024 15:55, Frediano Ziglio wrote:
> > >>>>>>> This toolchain generates different object and map files.
> > >>>>>>> Account for these changes.
> > >>>>>>
> > >>>>>> At least briefly mentioning what exactly the differences are would be
> > >>>>>> quite nice, imo.
> > >>>>>>
> > >>>>>
> > >>>>> What about.
> > >>>>>
> > >>>>> Object have 3 additional sections which must be handled by the linker script.
> > >>>>
> > >>>> I expect these sections are there in both cases. The difference, I assume,
> > >>>> is that for the GNU linker they don't need mentioning in the linker script.
> > >>>> Maybe that's what you mean to say, but to me at least the sentence can also
> > >>>> be interpreted differently.
> > >>>
> > >>> Why do you expect such sections? They are used for dynamic symbols in
> > >>> shared objects, we don't use shared objects here. Normal object
> > >>> symbols are not handled by these sections. GNU compiler+linker (we
> > >>> link multiple objects together) do not generate these sections. So the
> > >>> comment looks correct to me.
> > >>
> > >> About every ELF object will have .symtab and .strtab, and many also a
> > >> separate .shstrtab. There's nothing "dynamic" about them. IOW - I'm
> > >> confused by your reply.
> > >
> > > I checked the object files and there are no such sections using GNU toolchain.
> >
> > I think I checked every *.o that's under boot/, and they all have these three
> > sections. Can you clarify which one(s) specifically you checked?
> >
> > Jan
> 
> $ gcc --version
> gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
> Copyright (C) 2021 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> 
> $ ld --version
> GNU ld (GNU Binutils for Ubuntu) 2.38
> Copyright (C) 2022 Free Software Foundation, Inc.
> This program is free software; you may redistribute it under the terms of
> the GNU General Public Licence version 3 or (at your option) a later version.
> This program has absolutely no warranty.
> 
> $ find xen/normal/ xen/pvh/ -name \*.o | xargs -ifilename sh -c
> 'objdump -x filename' | grep -e \\.
> shstrtab -e \\.strtab -e \\.symtab
> 
> (xen/normal and xen/pvh are the build directory, with different configurations)
> 
> I'm saying that's possibly why the linker scripts didn't need to
> specify these sections.

objdump -x doesn't seem to list .symtab, .strtab or .shstrtab, but
readelf -S does:

# readelf -SW xen/arch/x86/boot/reloc.o
[...]
  [11] .symtab           SYMTAB          00000000 0004b0 000190 10     12  21  4
  [12] .strtab           STRTAB          00000000 000640 000092 00      0   0  1
  [13] .shstrtab         STRTAB          00000000 000734 000078 00      0   0  1

# objdump -x xen/arch/x86/boot/reloc.o
[...]
Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .group        00000008  00000000  00000000  00000034  2**2
                  CONTENTS, READONLY, GROUP, LINK_ONCE_DISCARD
  1 .text         0000041d  00000000  00000000  00000040  2**4
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  2 .data         00000000  00000000  00000000  0000045d  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  3 .bss          00000000  00000000  00000000  0000045d  2**0
                  ALLOC
  4 .rodata       00000024  00000000  00000000  00000460  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
  5 .text.__x86.get_pc_thunk.bx 00000004  00000000  00000000  00000484  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  6 .comment      00000028  00000000  00000000  00000488  2**0
                  CONTENTS, READONLY
  7 .note.GNU-stack 00000000  00000000  00000000  000004b0  2**0
                  CONTENTS, READONLY

It also seems to skip .rel sections.  It doesn't seem reliable to use
objdump to get ELF sections.

Regards, Roger.
Jan Beulich Nov. 7, 2024, 9:46 a.m. UTC | #13
On 06.11.2024 12:58, Frediano Ziglio wrote:
> On Wed, Nov 6, 2024 at 11:45 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 06.11.2024 12:34, Frediano Ziglio wrote:
>>> On Wed, Nov 6, 2024 at 10:59 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 06.11.2024 07:56, Frediano Ziglio wrote:
>>>>> On Tue, Nov 5, 2024 at 5:06 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>
>>>>>> On 05.11.2024 17:35, Frediano Ziglio wrote:
>>>>>>> On Tue, Nov 5, 2024 at 3:32 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>
>>>>>>>> On 05.11.2024 15:55, Frediano Ziglio wrote:
>>>>>>>>> This toolchain generates different object and map files.
>>>>>>>>> Account for these changes.
>>>>>>>>
>>>>>>>> At least briefly mentioning what exactly the differences are would be
>>>>>>>> quite nice, imo.
>>>>>>>>
>>>>>>>
>>>>>>> What about.
>>>>>>>
>>>>>>> Object have 3 additional sections which must be handled by the linker script.
>>>>>>
>>>>>> I expect these sections are there in both cases. The difference, I assume,
>>>>>> is that for the GNU linker they don't need mentioning in the linker script.
>>>>>> Maybe that's what you mean to say, but to me at least the sentence can also
>>>>>> be interpreted differently.
>>>>>
>>>>> Why do you expect such sections? They are used for dynamic symbols in
>>>>> shared objects, we don't use shared objects here. Normal object
>>>>> symbols are not handled by these sections. GNU compiler+linker (we
>>>>> link multiple objects together) do not generate these sections. So the
>>>>> comment looks correct to me.
>>>>
>>>> About every ELF object will have .symtab and .strtab, and many also a
>>>> separate .shstrtab. There's nothing "dynamic" about them. IOW - I'm
>>>> confused by your reply.
>>>
>>> I checked the object files and there are no such sections using GNU toolchain.
>>
>> I think I checked every *.o that's under boot/, and they all have these three
>> sections. Can you clarify which one(s) specifically you checked?
> 
> $ gcc --version
> gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
> Copyright (C) 2021 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> 
> $ ld --version
> GNU ld (GNU Binutils for Ubuntu) 2.38
> Copyright (C) 2022 Free Software Foundation, Inc.
> This program is free software; you may redistribute it under the terms of
> the GNU General Public Licence version 3 or (at your option) a later version.
> This program has absolutely no warranty.
> 
> $ find xen/normal/ xen/pvh/ -name \*.o | xargs -ifilename sh -c
> 'objdump -x filename' | grep -e \\.
> shstrtab -e \\.strtab -e \\.symtab
> 
> (xen/normal and xen/pvh are the build directory, with different configurations)
> 
> I'm saying that's possibly why the linker scripts didn't need to
> specify these sections.

Just to mention it here as well - objdump's -x option doesn't include "control"
sections. Considering the help text for -x this feels like a bug. However, as
documentation has it:

"Display all available header information, including the symbol table and
 relocation entries."

the symbol table and possible relocations _are_ being displayed, just not as
part of "Sections:".

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/boot/build32.lds.S b/xen/arch/x86/boot/build32.lds.S
index f20fc18977..2e565180d5 100644
--- a/xen/arch/x86/boot/build32.lds.S
+++ b/xen/arch/x86/boot/build32.lds.S
@@ -66,6 +66,15 @@  SECTIONS
        *(.comment.*)
        *(.note.*)
   }
+  .shstrtab : {
+       *(.shstrtab)
+  }
+  .strtab : {
+       *(.strtab)
+  }
+  .symtab : {
+       *(.symtab)
+  }
   /* Dynamic linkage sections.  Collected simply so we can check they're empty. */
   .got : {
         *(.got)
diff --git a/xen/tools/combine_two_binaries.py b/xen/tools/combine_two_binaries.py
index 447c0d3bdb..79ae8900b1 100755
--- a/xen/tools/combine_two_binaries.py
+++ b/xen/tools/combine_two_binaries.py
@@ -67,13 +67,22 @@  if args.exports is not None:
 
 # Parse mapfile, look for ther symbols we want to export.
 if args.mapfile is not None:
-    symbol_re = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')
+    symbol_re_clang = \
+        re.compile(r'\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s{15,}(\S+)\n')
+    symbol_re_gnu = re.compile(r'\s{15,}0x([0-9a-f]+)\s+(\S+)\n')
     for line in open(args.mapfile):
-        m = symbol_re.match(line)
-        if not m or m.group(2) not in exports:
+        name = None
+        m = symbol_re_clang.match(line)
+        if m:
+            name = m.group(5)
+        else:
+            m = symbol_re_gnu.match(line)
+            if m:
+                name = m.group(2)
+        if name is None or name not in exports:
             continue
         addr = int(m.group(1), 16)
-        exports[m.group(2)] = addr
+        exports[name] = addr
 for (name, addr) in exports.items():
     if addr is None:
         raise Exception("Required export symbols %s not found" % name)