mbox series

[v2,0/5] MODVERSIONS + RUST Redux

Message ID 20231118025748.2778044-1-mmaurer@google.com (mailing list archive)
Headers show
Series MODVERSIONS + RUST Redux | expand

Message

Matthew Maurer Nov. 18, 2023, 2:54 a.m. UTC
The goal of this patch series is to allow MODVERSIONS and RUST to be
enabled simultaneously. The primary issue with doing this at the moment
is that Rust uses some extremely long symbol names - for those
unfamiliar with Rust, it may be helpful to think of some of the mangled
C++ names you may have seen in binaries in the past.

Previously, Gary Guo attempted to accomplish this by modifying the
existing modversion format [1] to support variable-length symbol names.
This was unfortunately considered to be a potential userspace break
because kmod tools inspect this kernel module metadata. Masahiro Yamada
suggested [2] that this could instead be done with a section per-field.
This gives us the ability to be more flexible with this format in the
future, as a new field or additional information will be in a new
section which userspace tools will not yet attempt to read.

In the previous version of this patchset, Luis Chamberlain suggested [3]
I move validation out of the version checking and into the elf validity
checker, and also add kernel-docs over there. I found
elf_validity_cached_copy to be fairly dense and difficult to directly
describe, so I refactored it into easier to explain pieces. In the
process, I found a few missing checks and added those as well. See
[PATCH 2/5] for more details. If this is too much, I'm more than happy
to drop this patch from the series in favor of just adding the
kernel-doc to the original code, but figured I'd offer it up in case the
added clarity and checks were valuable.

[1] https://lore.kernel.org/lkml/20230111161155.1349375-1-gary@garyguo.net/
[2] https://lore.kernel.org/lkml/CAK7LNATsuszFR7JB5ZkqVS1W=hWr9=E7bTf+MvgJ+NXT3aZNwg@mail.gmail.com/
[3] https://lore.kernel.org/lkml/ZVZNh%2FPA5HiVRkeb@bombadil.infradead.org/

Matthew Maurer (5):
  export_report: Rehabilitate script
  modules: Refactor + kdoc elf_validity_cached_copy
  modpost: Extended modversion support
  rust: Allow MODVERSIONS
  export_report: Use new version info format

 arch/powerpc/kernel/module_64.c |  25 +-
 init/Kconfig                    |   1 -
 kernel/module/internal.h        |  18 +-
 kernel/module/main.c            | 663 +++++++++++++++++++++++++-------
 kernel/module/version.c         |  43 +++
 scripts/export_report.pl        |  17 +-
 scripts/mod/modpost.c           |  37 +-
 7 files changed, 642 insertions(+), 162 deletions(-)

Comments

Masahiro Yamada Nov. 22, 2023, 3:49 p.m. UTC | #1
On Sat, Nov 18, 2023 at 11:58 AM Matthew Maurer <mmaurer@google.com> wrote:
>
> The goal of this patch series is to allow MODVERSIONS and RUST to be
> enabled simultaneously. The primary issue with doing this at the moment
> is that Rust uses some extremely long symbol names - for those
> unfamiliar with Rust, it may be helpful to think of some of the mangled
> C++ names you may have seen in binaries in the past.
>
> Previously, Gary Guo attempted to accomplish this by modifying the
> existing modversion format [1] to support variable-length symbol names.
> This was unfortunately considered to be a potential userspace break
> because kmod tools inspect this kernel module metadata. Masahiro Yamada
> suggested [2] that this could instead be done with a section per-field.
> This gives us the ability to be more flexible with this format in the
> future, as a new field or additional information will be in a new
> section which userspace tools will not yet attempt to read.
>
> In the previous version of this patchset, Luis Chamberlain suggested [3]
> I move validation out of the version checking and into the elf validity
> checker, and also add kernel-docs over there. I found
> elf_validity_cached_copy to be fairly dense and difficult to directly
> describe, so I refactored it into easier to explain pieces. In the
> process, I found a few missing checks and added those as well. See
> [PATCH 2/5] for more details. If this is too much, I'm more than happy
> to drop this patch from the series in favor of just adding the
> kernel-doc to the original code, but figured I'd offer it up in case the
> added clarity and checks were valuable.
>
> [1] https://lore.kernel.org/lkml/20230111161155.1349375-1-gary@garyguo.net/
> [2] https://lore.kernel.org/lkml/CAK7LNATsuszFR7JB5ZkqVS1W=hWr9=E7bTf+MvgJ+NXT3aZNwg@mail.gmail.com/
> [3] https://lore.kernel.org/lkml/ZVZNh%2FPA5HiVRkeb@bombadil.infradead.org/





I want to know why this is useful.


To clarify my question, let me first explain
what the modversion is.



In C, a function callee and callers must agree
with the interface of the function.


This is usually done by having a function prototype
in a header file.


Say, we have a function declaration

    int foo(int x, int y);

in include/linux/foo.h


Then, the C file that defines foo() and all C files
that call it must include <linux/foo.h> so that
argument mismatch can be detected.




Same for EXPORT_SYMBOL; the symbol provider and consumers
must agree with the interface of exported symbols.

In the kernel, however, there is no promise for in-kernel ABI
compatibility across different kernel versions.
The kernel only promises the compatibility of the userspace interface.


To load modules, by principle, vmlinux and modules must have
the same version.

To slightly loosen the limitation, CONFIG_MODVERSIONS was
introduced; when it is enabled, you can load a module
as long as all the prototypes of exported symbols match.

To do this, we need to encode information about prototypes.


This is done by a tool called genksyms (scripts/genksyms/genksyms).



Say, we have this code:


int foo(int x, int y)
{
     // genksyms does not care about
     // the function body.
}
EXPORT_SYMBOL(foo);


Genksyms parses the code and computes a CRC value for 'foo'.
Genksyms is only interested in the function name and its prototype.

It sees

   int foo(int, int)

and it transforms it into a CRC.


Any change to the prototype results in a
different CRC, so the module subsystem
can check the interface compatibility
before loading a module.


It is obvious that this is impossible for Rust source
because scripts/genksyms/genksyms is only able to
parse C code.


Then, what is happening here?

See rust/exports.c


  #define EXPORT_SYMBOL_RUST_GPL(sym) extern int sym; EXPORT_SYMBOL_GPL(sym)


The global scope symbols in Rust (i.e. 'pub) are automatically
exported, and all of them are visible as 'int' variables
from C world.


Genksyms will see this code:

  extern int foo;
  EXPORT_SYMBOL_GPL(foo);

Of course, this is not a true prototype.
The real signature on the Rust side might be:

  fn foo(x: i32, y: i32) -> i32


So, even if you enable CONFIG_MODVERSIONS,
nothing is checked for Rust.
Genksyms computes a CRC from "int foo", and
the module subsystem confirms it is a "int"
variable.

We know this check always succeeds.

Why is this useful?






> Matthew Maurer (5):
>   export_report: Rehabilitate script
>   modules: Refactor + kdoc elf_validity_cached_copy
>   modpost: Extended modversion support
>   rust: Allow MODVERSIONS
>   export_report: Use new version info format
>
>  arch/powerpc/kernel/module_64.c |  25 +-
>  init/Kconfig                    |   1 -
>  kernel/module/internal.h        |  18 +-
>  kernel/module/main.c            | 663 +++++++++++++++++++++++++-------
>  kernel/module/version.c         |  43 +++
>  scripts/export_report.pl        |  17 +-
>  scripts/mod/modpost.c           |  37 +-
>  7 files changed, 642 insertions(+), 162 deletions(-)
>
> --
> 2.43.0.rc0.421.g78406f8d94-goog
>


--
Best Regards





Masahiro Yamada
Matthew Maurer Nov. 22, 2023, 9:04 p.m. UTC | #2
> So, even if you enable CONFIG_MODVERSIONS,
> nothing is checked for Rust.
> Genksyms computes a CRC from "int foo", and
> the module subsystem confirms it is a "int"
> variable.
>
> We know this check always succeeds.
>
> Why is this useful?
The reason this is immediately useful is that it allows us to have Rust
in use with a kernel where C modules are able to benefit from MODVERSIONS
checking. The check would effectively be a no-op for now, as you have correctly
determined, but we could refine it to make it more restrictive later.
Since the
existing C approach errs on the side of "it could work" rather than "it will
work", I thought being more permissive was the correct initial solution.

If we want to err on the other side (modversions passes, so we're pretty sure
it will work), I could add to the last patch support for using .rmeta files as
the CRC source for Rust symbols. This would essentially say that the interface
for the entire compilation unit has to stay the same rather than just that one
function. We could potentially loosen this requirement in the future.

With regards to future directions that likely won't work for loosening it:
Unfortunately, the .rmeta format itself is not stable, so I wouldn't want to
teach genksyms to open it up and split out the pieces for specific functions.
Extending genksyms to parse Rust would also not solve the situation -
layouts are allowed to differ across compiler versions or even (in rare
cases) seemingly unrelated code changes.

Future directions that might work for loosening it:
* Generating crcs from debuginfo + compiler + flags
* Adding a feature to the rust compiler to dump this information. This
is likely to
  get pushback because Rust's current stance is that there is no ability to load
  object code built against a different library.

Would setting up Rust symbols so that they have a crc built out of .rmeta be
sufficient for you to consider this useful? If not, can you help me understand
what level of precision would be required?
Greg Kroah-Hartman Nov. 23, 2023, 9:05 a.m. UTC | #3
On Wed, Nov 22, 2023 at 01:04:09PM -0800, Matthew Maurer wrote:
> > So, even if you enable CONFIG_MODVERSIONS,
> > nothing is checked for Rust.
> > Genksyms computes a CRC from "int foo", and
> > the module subsystem confirms it is a "int"
> > variable.
> >
> > We know this check always succeeds.
> >
> > Why is this useful?
> The reason this is immediately useful is that it allows us to have Rust
> in use with a kernel where C modules are able to benefit from MODVERSIONS
> checking. The check would effectively be a no-op for now, as you have correctly
> determined, but we could refine it to make it more restrictive later.
> Since the
> existing C approach errs on the side of "it could work" rather than "it will
> work", I thought being more permissive was the correct initial solution.

But it's just providing "fake" information to the CRC checker, which
means that the guarantee of a ABI check is not true at all.

So the ask for the user of "ensure that the ABI checking is correct" is
being circumvented here, and any change in the rust side can not be
detected at all.

The kernel is a "whole", either an option works for it, or it doesn't,
and you are splitting that guarantee here by saying "modversions will
only work for a portion of the kernel, not the whole thing" which is
going to cause problems for when people expect it to actually work
properly.

So, I'd strongly recommend fixing this for the rust code if you wish to
allow modversions to be enabled at all.

> With regards to future directions that likely won't work for loosening it:
> Unfortunately, the .rmeta format itself is not stable, so I wouldn't want to
> teach genksyms to open it up and split out the pieces for specific functions.
> Extending genksyms to parse Rust would also not solve the situation -
> layouts are allowed to differ across compiler versions or even (in rare
> cases) seemingly unrelated code changes.

What do you mean by "layout" here?  Yes, the crcs can be different
across compiler versions and seemingly unrelated code changes (genksyms
is VERY fragile) but that's ok, that's not what you are checking here.
You want to know if the rust function signature changes or not from the
last time you built the code, with the same compiler and options, that's
all you are verifying.

> Future directions that might work for loosening it:
> * Generating crcs from debuginfo + compiler + flags
> * Adding a feature to the rust compiler to dump this information. This
> is likely to
>   get pushback because Rust's current stance is that there is no ability to load
>   object code built against a different library.

Why not parse the function signature like we do for C?

> Would setting up Rust symbols so that they have a crc built out of .rmeta be
> sufficient for you to consider this useful? If not, can you help me understand
> what level of precision would be required?

What exactly does .rmeta have to do with the function signature?  That's
all you care about here.

thanks,

greg k-h
Masahiro Yamada Nov. 23, 2023, 11:38 a.m. UTC | #4
On Thu, Nov 23, 2023 at 6:05 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Wed, Nov 22, 2023 at 01:04:09PM -0800, Matthew Maurer wrote:
> > > So, even if you enable CONFIG_MODVERSIONS,
> > > nothing is checked for Rust.
> > > Genksyms computes a CRC from "int foo", and
> > > the module subsystem confirms it is a "int"
> > > variable.
> > >
> > > We know this check always succeeds.
> > >
> > > Why is this useful?
> > The reason this is immediately useful is that it allows us to have Rust
> > in use with a kernel where C modules are able to benefit from MODVERSIONS
> > checking. The check would effectively be a no-op for now, as you have correctly
> > determined, but we could refine it to make it more restrictive later.
> > Since the
> > existing C approach errs on the side of "it could work" rather than "it will
> > work", I thought being more permissive was the correct initial solution.
>
> But it's just providing "fake" information to the CRC checker, which
> means that the guarantee of a ABI check is not true at all.
>
> So the ask for the user of "ensure that the ABI checking is correct" is
> being circumvented here, and any change in the rust side can not be
> detected at all.
>
> The kernel is a "whole", either an option works for it, or it doesn't,
> and you are splitting that guarantee here by saying "modversions will
> only work for a portion of the kernel, not the whole thing" which is
> going to cause problems for when people expect it to actually work
> properly.
>
> So, I'd strongly recommend fixing this for the rust code if you wish to
> allow modversions to be enabled at all.
>
> > With regards to future directions that likely won't work for loosening it:
> > Unfortunately, the .rmeta format itself is not stable, so I wouldn't want to
> > teach genksyms to open it up and split out the pieces for specific functions.
> > Extending genksyms to parse Rust would also not solve the situation -
> > layouts are allowed to differ across compiler versions or even (in rare
> > cases) seemingly unrelated code changes.
>
> What do you mean by "layout" here?  Yes, the crcs can be different
> across compiler versions and seemingly unrelated code changes (genksyms
> is VERY fragile) but that's ok, that's not what you are checking here.
> You want to know if the rust function signature changes or not from the
> last time you built the code, with the same compiler and options, that's
> all you are verifying.
>
> > Future directions that might work for loosening it:
> > * Generating crcs from debuginfo + compiler + flags
> > * Adding a feature to the rust compiler to dump this information. This
> > is likely to
> >   get pushback because Rust's current stance is that there is no ability to load
> >   object code built against a different library.
>
> Why not parse the function signature like we do for C?
>
> > Would setting up Rust symbols so that they have a crc built out of .rmeta be
> > sufficient for you to consider this useful? If not, can you help me understand
> > what level of precision would be required?
>
> What exactly does .rmeta have to do with the function signature?  That's
> all you care about here.




rmeta is generated per crate.

CRC is computed per symbol.

They have different granularity.
It is weird to refuse a module for incompatibility
of a symbol that it is not using at all.




> thanks,
>
> greg k-h




--
Best Regards
Masahiro Yamada
Greg Kroah-Hartman Nov. 23, 2023, 12:12 p.m. UTC | #5
On Thu, Nov 23, 2023 at 08:38:45PM +0900, Masahiro Yamada wrote:
> On Thu, Nov 23, 2023 at 6:05 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Wed, Nov 22, 2023 at 01:04:09PM -0800, Matthew Maurer wrote:
> > > > So, even if you enable CONFIG_MODVERSIONS,
> > > > nothing is checked for Rust.
> > > > Genksyms computes a CRC from "int foo", and
> > > > the module subsystem confirms it is a "int"
> > > > variable.
> > > >
> > > > We know this check always succeeds.
> > > >
> > > > Why is this useful?
> > > The reason this is immediately useful is that it allows us to have Rust
> > > in use with a kernel where C modules are able to benefit from MODVERSIONS
> > > checking. The check would effectively be a no-op for now, as you have correctly
> > > determined, but we could refine it to make it more restrictive later.
> > > Since the
> > > existing C approach errs on the side of "it could work" rather than "it will
> > > work", I thought being more permissive was the correct initial solution.
> >
> > But it's just providing "fake" information to the CRC checker, which
> > means that the guarantee of a ABI check is not true at all.
> >
> > So the ask for the user of "ensure that the ABI checking is correct" is
> > being circumvented here, and any change in the rust side can not be
> > detected at all.
> >
> > The kernel is a "whole", either an option works for it, or it doesn't,
> > and you are splitting that guarantee here by saying "modversions will
> > only work for a portion of the kernel, not the whole thing" which is
> > going to cause problems for when people expect it to actually work
> > properly.
> >
> > So, I'd strongly recommend fixing this for the rust code if you wish to
> > allow modversions to be enabled at all.
> >
> > > With regards to future directions that likely won't work for loosening it:
> > > Unfortunately, the .rmeta format itself is not stable, so I wouldn't want to
> > > teach genksyms to open it up and split out the pieces for specific functions.
> > > Extending genksyms to parse Rust would also not solve the situation -
> > > layouts are allowed to differ across compiler versions or even (in rare
> > > cases) seemingly unrelated code changes.
> >
> > What do you mean by "layout" here?  Yes, the crcs can be different
> > across compiler versions and seemingly unrelated code changes (genksyms
> > is VERY fragile) but that's ok, that's not what you are checking here.
> > You want to know if the rust function signature changes or not from the
> > last time you built the code, with the same compiler and options, that's
> > all you are verifying.
> >
> > > Future directions that might work for loosening it:
> > > * Generating crcs from debuginfo + compiler + flags
> > > * Adding a feature to the rust compiler to dump this information. This
> > > is likely to
> > >   get pushback because Rust's current stance is that there is no ability to load
> > >   object code built against a different library.
> >
> > Why not parse the function signature like we do for C?
> >
> > > Would setting up Rust symbols so that they have a crc built out of .rmeta be
> > > sufficient for you to consider this useful? If not, can you help me understand
> > > what level of precision would be required?
> >
> > What exactly does .rmeta have to do with the function signature?  That's
> > all you care about here.
> 
> 
> 
> 
> rmeta is generated per crate.
> 
> CRC is computed per symbol.
> 
> They have different granularity.
> It is weird to refuse a module for incompatibility
> of a symbol that it is not using at all.

I agree, this should be on a per-symbol basis, so the Rust
infrastructure in the kernel needs to be fixed up to support this
properly, not just ignored like this patchset does.

thanks,

greg k-h
Matthew Maurer Nov. 27, 2023, 7:27 p.m. UTC | #6
> > >
> > > > With regards to future directions that likely won't work for loosening it:
> > > > Unfortunately, the .rmeta format itself is not stable, so I wouldn't want to
> > > > teach genksyms to open it up and split out the pieces for specific functions.
> > > > Extending genksyms to parse Rust would also not solve the situation -
> > > > layouts are allowed to differ across compiler versions or even (in rare
> > > > cases) seemingly unrelated code changes.
> > >
> > > What do you mean by "layout" here?  Yes, the crcs can be different
> > > across compiler versions and seemingly unrelated code changes (genksyms
> > > is VERY fragile) but that's ok, that's not what you are checking here.
> > > You want to know if the rust function signature changes or not from the
> > > last time you built the code, with the same compiler and options, that's
> > > all you are verifying.
What I mean by layout here is that if you write in Rust:
struct Foo {
  x: i32,
  y: i32,
}
it is not guaranteed to have the same layout across different compilations, even
within the same compiler. See
https://doc.rust-lang.org/reference/type-layout.html#the-rust-representation
Specifically, the compiler is allowed to arbitrarily insert padding,
reorder fields, etc.
on the same code as long as the overall alignment of the struct and individual
alignment of the fields remains correct and non-overlapping.

This means the compiler is *explicitly* allowed to, for example, permute x and y
as an optimization. In the above example this is unlikely, but if you
instead consider
struct Bar {
  x: i8,
  y: i64,
  z: i8,
}
It's easy to see why the compiler might decide to structure this as
y,x,z to reduce the
size of the struct. Those optimization decisions may be affected by
any other part of
the code, PGO, etc.
> > >
> > > > Future directions that might work for loosening it:
> > > > * Generating crcs from debuginfo + compiler + flags
> > > > * Adding a feature to the rust compiler to dump this information. This
> > > > is likely to
> > > >   get pushback because Rust's current stance is that there is no ability to load
> > > >   object code built against a different library.
> > >
> > > Why not parse the function signature like we do for C?
Because the function signature is insufficient to check the ABI, see above.
> > >
> > > > Would setting up Rust symbols so that they have a crc built out of .rmeta be
> > > > sufficient for you to consider this useful? If not, can you help me understand
> > > > what level of precision would be required?
> > >
> > > What exactly does .rmeta have to do with the function signature?  That's
> > > all you care about here.
The .rmeta file contains the decisions the compiler made about layout
in the crate
you're interfacing with. For example, the choice to encode Bar
with a yxz field order would be written into the .rmeta file.
> >
> >
> >
> >
> > rmeta is generated per crate.
> >
> > CRC is computed per symbol.
> >
> > They have different granularity.
> > It is weird to refuse a module for incompatibility
> > of a symbol that it is not using at all.
>
> I agree, this should be on a per-symbol basis, so the Rust
> infrastructure in the kernel needs to be fixed up to support this
> properly, not just ignored like this patchset does.
I agree there is a divergence here, I tried to point it out so that it
wouldn't be
a surprise later. The .rmeta file itself (which is the only way we
could know that
the ABI actually matches, because layout decisions are in there) is an unstable
format, which is why I would be reluctant to try to parse it and find only the
relevant portions to hash. This isn't just a "technically unstable"
format, but one
in which the compiler essentially just serializes out relevant internal data
structures, so any parser for it will involve significant alterations
on compiler
updates, which doesn't seem like a good plan.
>
> thanks,
>
> greg k-h
Given the above additional information, would you be interested in a patchset
which either:

A. Computes the CRC off the Rust type signature, knowing the compiler is
allowed to change the ABI based on information not contained in the CRC.
B. Uses the CRC of the .rmeta file, knowing, as was pointed out, that this
effectively contains the ABI of every symbol in the compilation unit, as well
as inline functions and polymorphic functions.

If neither of these works, we likely can't turn on MODVERSIONS+RUST until
further work is done upstream in the compiler to export some of this data in
an at least semi-stable fashion.
Greg Kroah-Hartman Nov. 28, 2023, 8:05 a.m. UTC | #7
On Mon, Nov 27, 2023 at 11:27:07AM -0800, Matthew Maurer wrote:
> > > >
> > > > > With regards to future directions that likely won't work for loosening it:
> > > > > Unfortunately, the .rmeta format itself is not stable, so I wouldn't want to
> > > > > teach genksyms to open it up and split out the pieces for specific functions.
> > > > > Extending genksyms to parse Rust would also not solve the situation -
> > > > > layouts are allowed to differ across compiler versions or even (in rare
> > > > > cases) seemingly unrelated code changes.
> > > >
> > > > What do you mean by "layout" here?  Yes, the crcs can be different
> > > > across compiler versions and seemingly unrelated code changes (genksyms
> > > > is VERY fragile) but that's ok, that's not what you are checking here.
> > > > You want to know if the rust function signature changes or not from the
> > > > last time you built the code, with the same compiler and options, that's
> > > > all you are verifying.
> What I mean by layout here is that if you write in Rust:
> struct Foo {
>   x: i32,
>   y: i32,
> }
> it is not guaranteed to have the same layout across different compilations, even
> within the same compiler. See
> https://doc.rust-lang.org/reference/type-layout.html#the-rust-representation

Then you are going to have big problems, sorry.

> Specifically, the compiler is allowed to arbitrarily insert padding,
> reorder fields, etc.
> on the same code as long as the overall alignment of the struct and individual
> alignment of the fields remains correct and non-overlapping.
> 
> This means the compiler is *explicitly* allowed to, for example, permute x and y
> as an optimization. In the above example this is unlikely, but if you
> instead consider
> struct Bar {
>   x: i8,
>   y: i64,
>   z: i8,
> }
> It's easy to see why the compiler might decide to structure this as
> y,x,z to reduce the
> size of the struct. Those optimization decisions may be affected by
> any other part of
> the code, PGO, etc.

Then you all need to figure out some way to determine how the compiler
layed out the structure after it compiled/optimized it and be able to
compare it to previous builds (or just generate a crc based on the
layout it chose.)

> > > > > Future directions that might work for loosening it:
> > > > > * Generating crcs from debuginfo + compiler + flags
> > > > > * Adding a feature to the rust compiler to dump this information. This
> > > > > is likely to
> > > > >   get pushback because Rust's current stance is that there is no ability to load
> > > > >   object code built against a different library.
> > > >
> > > > Why not parse the function signature like we do for C?
> Because the function signature is insufficient to check the ABI, see above.
> > > >
> > > > > Would setting up Rust symbols so that they have a crc built out of .rmeta be
> > > > > sufficient for you to consider this useful? If not, can you help me understand
> > > > > what level of precision would be required?
> > > >
> > > > What exactly does .rmeta have to do with the function signature?  That's
> > > > all you care about here.
> The .rmeta file contains the decisions the compiler made about layout
> in the crate
> you're interfacing with. For example, the choice to encode Bar
> with a yxz field order would be written into the .rmeta file.

Ok, then yes, can you parse the .rmeta file to get that information?

> > > rmeta is generated per crate.
> > >
> > > CRC is computed per symbol.
> > >
> > > They have different granularity.
> > > It is weird to refuse a module for incompatibility
> > > of a symbol that it is not using at all.
> >
> > I agree, this should be on a per-symbol basis, so the Rust
> > infrastructure in the kernel needs to be fixed up to support this
> > properly, not just ignored like this patchset does.
> I agree there is a divergence here, I tried to point it out so that it
> wouldn't be
> a surprise later. The .rmeta file itself (which is the only way we
> could know that
> the ABI actually matches, because layout decisions are in there) is an unstable
> format, which is why I would be reluctant to try to parse it and find only the
> relevant portions to hash. This isn't just a "technically unstable"
> format, but one
> in which the compiler essentially just serializes out relevant internal data
> structures, so any parser for it will involve significant alterations
> on compiler
> updates, which doesn't seem like a good plan.
> >
> > thanks,
> >
> > greg k-h
> Given the above additional information, would you be interested in a patchset
> which either:
> 
> A. Computes the CRC off the Rust type signature, knowing the compiler is
> allowed to change the ABI based on information not contained in the CRC.

No.

> B. Uses the CRC of the .rmeta file, knowing, as was pointed out, that this
> effectively contains the ABI of every symbol in the compilation unit, as well
> as inline functions and polymorphic functions.

No.

> If neither of these works, we likely can't turn on MODVERSIONS+RUST until
> further work is done upstream in the compiler to export some of this data in
> an at least semi-stable fashion.

Looks like you need something a bit more fine-grained, as pointed out
above.  why not parse the structure/function information in the .rmeta
file?  Is the format of that file not stable?

thanks,

greg k-h
Greg Kroah-Hartman Nov. 28, 2023, 8:44 a.m. UTC | #8
On Tue, Nov 28, 2023 at 08:05:26AM +0000, Greg KH wrote:
> On Mon, Nov 27, 2023 at 11:27:07AM -0800, Matthew Maurer wrote:
> > > > >
> > > > > > With regards to future directions that likely won't work for loosening it:
> > > > > > Unfortunately, the .rmeta format itself is not stable, so I wouldn't want to
> > > > > > teach genksyms to open it up and split out the pieces for specific functions.
> > > > > > Extending genksyms to parse Rust would also not solve the situation -
> > > > > > layouts are allowed to differ across compiler versions or even (in rare
> > > > > > cases) seemingly unrelated code changes.
> > > > >
> > > > > What do you mean by "layout" here?  Yes, the crcs can be different
> > > > > across compiler versions and seemingly unrelated code changes (genksyms
> > > > > is VERY fragile) but that's ok, that's not what you are checking here.
> > > > > You want to know if the rust function signature changes or not from the
> > > > > last time you built the code, with the same compiler and options, that's
> > > > > all you are verifying.
> > What I mean by layout here is that if you write in Rust:
> > struct Foo {
> >   x: i32,
> >   y: i32,
> > }
> > it is not guaranteed to have the same layout across different compilations, even
> > within the same compiler. See
> > https://doc.rust-lang.org/reference/type-layout.html#the-rust-representation
> 
> Then you are going to have big problems, sorry.
> 
> > Specifically, the compiler is allowed to arbitrarily insert padding,
> > reorder fields, etc.
> > on the same code as long as the overall alignment of the struct and individual
> > alignment of the fields remains correct and non-overlapping.
> > 
> > This means the compiler is *explicitly* allowed to, for example, permute x and y
> > as an optimization. In the above example this is unlikely, but if you
> > instead consider
> > struct Bar {
> >   x: i8,
> >   y: i64,
> >   z: i8,
> > }
> > It's easy to see why the compiler might decide to structure this as
> > y,x,z to reduce the
> > size of the struct. Those optimization decisions may be affected by
> > any other part of
> > the code, PGO, etc.
> 
> Then you all need to figure out some way to determine how the compiler
> layed out the structure after it compiled/optimized it and be able to
> compare it to previous builds (or just generate a crc based on the
> layout it chose.)
> 
> > > > > > Future directions that might work for loosening it:
> > > > > > * Generating crcs from debuginfo + compiler + flags
> > > > > > * Adding a feature to the rust compiler to dump this information. This
> > > > > > is likely to
> > > > > >   get pushback because Rust's current stance is that there is no ability to load
> > > > > >   object code built against a different library.
> > > > >
> > > > > Why not parse the function signature like we do for C?
> > Because the function signature is insufficient to check the ABI, see above.
> > > > >
> > > > > > Would setting up Rust symbols so that they have a crc built out of .rmeta be
> > > > > > sufficient for you to consider this useful? If not, can you help me understand
> > > > > > what level of precision would be required?
> > > > >
> > > > > What exactly does .rmeta have to do with the function signature?  That's
> > > > > all you care about here.
> > The .rmeta file contains the decisions the compiler made about layout
> > in the crate
> > you're interfacing with. For example, the choice to encode Bar
> > with a yxz field order would be written into the .rmeta file.
> 
> Ok, then yes, can you parse the .rmeta file to get that information?
> 
> > > > rmeta is generated per crate.
> > > >
> > > > CRC is computed per symbol.
> > > >
> > > > They have different granularity.
> > > > It is weird to refuse a module for incompatibility
> > > > of a symbol that it is not using at all.
> > >
> > > I agree, this should be on a per-symbol basis, so the Rust
> > > infrastructure in the kernel needs to be fixed up to support this
> > > properly, not just ignored like this patchset does.
> > I agree there is a divergence here, I tried to point it out so that it
> > wouldn't be
> > a surprise later. The .rmeta file itself (which is the only way we
> > could know that
> > the ABI actually matches, because layout decisions are in there) is an unstable
> > format, which is why I would be reluctant to try to parse it and find only the
> > relevant portions to hash. This isn't just a "technically unstable"
> > format, but one
> > in which the compiler essentially just serializes out relevant internal data
> > structures, so any parser for it will involve significant alterations
> > on compiler
> > updates, which doesn't seem like a good plan.
> > >
> > > thanks,
> > >
> > > greg k-h
> > Given the above additional information, would you be interested in a patchset
> > which either:
> > 
> > A. Computes the CRC off the Rust type signature, knowing the compiler is
> > allowed to change the ABI based on information not contained in the CRC.
> 
> No.
> 
> > B. Uses the CRC of the .rmeta file, knowing, as was pointed out, that this
> > effectively contains the ABI of every symbol in the compilation unit, as well
> > as inline functions and polymorphic functions.
> 
> No.
> 
> > If neither of these works, we likely can't turn on MODVERSIONS+RUST until
> > further work is done upstream in the compiler to export some of this data in
> > an at least semi-stable fashion.
> 
> Looks like you need something a bit more fine-grained, as pointed out
> above.  why not parse the structure/function information in the .rmeta
> file?  Is the format of that file not stable?

Or, step back and figure something else out that can detect the
structure and function signatures of rust code and determine a way to
notice when they change.  That's the goal here, you need to notice when
the code changes, perhaps just use libabigail as that will work on the
dwarf output?

Note, libabigail will miss things that the crc checker does not miss
(and the opposite is true as well) which is why some groups (i.e.
Android) use both on their kernel to ensure that nothing slips by.

thanks,

greg k-h