mbox series

[0/2] userspace: Implement new format of filename trans rules

Message ID 20200327152107.95915-1-omosnace@redhat.com (mailing list archive)
Headers show
Series userspace: Implement new format of filename trans rules | expand

Message

Ondrej Mosnacek March 27, 2020, 3:21 p.m. UTC
These patches are the userspace side of the kernel change posted at [1].

The first patch changes libsepol's internal representation of filename
transition rules in a way similar to kernel commit c3a276111ea2
("selinux: optimize storage of filename transitions") [2].

The second patch then builds upon that and implements reading and
writing of a new binary policy format that uses this representation also
in the data layout.

See individual patches for more details.

NOTE: This series unfortunately breaks the build of setools. Moreover,
when an existing build of setools dynamically links against the new
libsepol, it segfaults. Sadly, there doesn't seem to be a nice way of
handling this, since setools relies on non-public libsepol policydb
API/ABI.

[1] https://lore.kernel.org/selinux/20200327151941.95619-1-omosnace@redhat.com/T/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux.git/commit/?id=c3a276111ea2572399281988b3129683e2a6b60b

Ondrej Mosnacek (2):
  libsepol,checkpolicy: optimize storage of filename transitions
  libsepol: implement POLICYDB_VERSION_COMP_FTRANS

 checkpolicy/policy_define.c                |  52 ++--
 checkpolicy/test/dispol.c                  |  27 +-
 libsepol/cil/src/cil_binary.c              |  29 +-
 libsepol/include/sepol/policydb/policydb.h |  18 +-
 libsepol/src/expand.c                      |  60 +---
 libsepol/src/kernel_to_cil.c               |  24 +-
 libsepol/src/kernel_to_conf.c              |  24 +-
 libsepol/src/policydb.c                    | 313 ++++++++++++++++-----
 libsepol/src/write.c                       | 100 +++++--
 9 files changed, 440 insertions(+), 207 deletions(-)

Comments

Stephen Smalley March 27, 2020, 7:21 p.m. UTC | #1
On 3/27/20 11:21 AM, Ondrej Mosnacek wrote:
> These patches are the userspace side of the kernel change posted at [1].
> 
> The first patch changes libsepol's internal representation of filename
> transition rules in a way similar to kernel commit c3a276111ea2
> ("selinux: optimize storage of filename transitions") [2].
> 
> The second patch then builds upon that and implements reading and
> writing of a new binary policy format that uses this representation also
> in the data layout.
> 
> See individual patches for more details.
> 
> NOTE: This series unfortunately breaks the build of setools. Moreover,
> when an existing build of setools dynamically links against the new
> libsepol, it segfaults. Sadly, there doesn't seem to be a nice way of
> handling this, since setools relies on non-public libsepol policydb
> API/ABI.

I think this has happened before a few years ago when we made a 
different change to those structures, and required updates on the 
setools side.

Maybe we need to figure out what setools needs to be encapsulated and 
exported as part of the libsepol public ABI/API, and then stop having it 
peer into libsepol internals?
Chris PeBenito March 30, 2020, 1:05 p.m. UTC | #2
On 3/27/20 3:21 PM, Stephen Smalley wrote:
> On 3/27/20 11:21 AM, Ondrej Mosnacek wrote:
>> These patches are the userspace side of the kernel change posted at [1].
>>
>> The first patch changes libsepol's internal representation of filename
>> transition rules in a way similar to kernel commit c3a276111ea2
>> ("selinux: optimize storage of filename transitions") [2].
>>
>> The second patch then builds upon that and implements reading and
>> writing of a new binary policy format that uses this representation also
>> in the data layout.
>>
>> See individual patches for more details.
>>
>> NOTE: This series unfortunately breaks the build of setools. Moreover,
>> when an existing build of setools dynamically links against the new
>> libsepol, it segfaults. Sadly, there doesn't seem to be a nice way of
>> handling this, since setools relies on non-public libsepol policydb
>> API/ABI.
> 
> I think this has happened before a few years ago when we made a 
> different change to those structures, and required updates on the 
> setools side.
> 
> Maybe we need to figure out what setools needs to be encapsulated and 
> exported as part of the libsepol public ABI/API, and then stop having it 
> peer into libsepol internals?

I'd be fine with that :)

Ultimately SETools does 2 thing for the policy access:
* iterate over the entire policy, reading data out of the various objects
* use linkages between objects in the policy, e.g. get the 
type/attributes from an AV rule, get types from an attribute, etc. using 
the stuff like class_val_to_struct as needed.

So if sufficient functionality to do dispol was exported, that would get 
close to all that was needed.  There are some optimizations I made by 
digging at the structs a bit more than the above, but that could 
potentially be dropped if libsepol has sufficient support.  See 
<https://github.com/SELinuxProject/setools/blob/master/setools/policyrep/selinuxpolicy.pxi#L673> 
for the policy loading code.
James Carter April 29, 2020, 7 p.m. UTC | #3
On Mon, Mar 30, 2020 at 9:06 AM Chris PeBenito <pebenito@ieee.org> wrote:
>
> On 3/27/20 3:21 PM, Stephen Smalley wrote:
> > On 3/27/20 11:21 AM, Ondrej Mosnacek wrote:
> >> These patches are the userspace side of the kernel change posted at [1].
> >>
> >> The first patch changes libsepol's internal representation of filename
> >> transition rules in a way similar to kernel commit c3a276111ea2
> >> ("selinux: optimize storage of filename transitions") [2].
> >>
> >> The second patch then builds upon that and implements reading and
> >> writing of a new binary policy format that uses this representation also
> >> in the data layout.
> >>
> >> See individual patches for more details.
> >>
> >> NOTE: This series unfortunately breaks the build of setools. Moreover,
> >> when an existing build of setools dynamically links against the new
> >> libsepol, it segfaults. Sadly, there doesn't seem to be a nice way of
> >> handling this, since setools relies on non-public libsepol policydb
> >> API/ABI.
> >
> > I think this has happened before a few years ago when we made a
> > different change to those structures, and required updates on the
> > setools side.
> >
> > Maybe we need to figure out what setools needs to be encapsulated and
> > exported as part of the libsepol public ABI/API, and then stop having it
> > peer into libsepol internals?
>
> I'd be fine with that :)
>

I am a little confused. I would consider peering into the libsepol
internals to be something like assuming the memory layout of
structures and pulling things out based on those assumptions, but
setools is just using public header files and functions. It seems to
me to be using the public API. Now I think that too much of the
internals are being exposed, but I am not sure we can do anything
about that now.

It doesn't seem like it would be too hard to update setools to work
with this change, unless there is a requirement that it still work
with older versions of libsepol. I am not very familiar with cython
and I don't know how to make it work differently depending on what
version of libsepol is on the system.

Speaking of versions, since we are actually modifying structs in the
header files instead of just adding new things, don't we need change
the version of libsepol or something?

> Ultimately SETools does 2 thing for the policy access:
> * iterate over the entire policy, reading data out of the various objects
> * use linkages between objects in the policy, e.g. get the
> type/attributes from an AV rule, get types from an attribute, etc. using
> the stuff like class_val_to_struct as needed.
>
> So if sufficient functionality to do dispol was exported, that would get
> close to all that was needed.  There are some optimizations I made by
> digging at the structs a bit more than the above, but that could
> potentially be dropped if libsepol has sufficient support.  See
> <https://github.com/SELinuxProject/setools/blob/master/setools/policyrep/selinuxpolicy.pxi#L673>
> for the policy loading code.
>

I think the fact that the CIL, kernel to CIL, kernel to conf, and
module to CIL code is all in libsepol speaks to the fact of how
tightly integrated they are to the rest of libsepol. One argument that
could be made is that the policyrep stuff in setools really belongs in
libsepol.

Thinking about how libsepol could be encapsulated leads me to a couple
of possibilities. One way would be functions that could return lists
of rules. The policy module code gives us avrule_t, role_trans_rule_t,
role_allow_t, filename_trans_rule_t, range_trans_rule_t, and others.
Those structures are probably unlikely to change and, at least in this
case, creating a function that walks the filename_trans hashtable and
returns a list of filename_trans_rule_t certainly seems like it
wouldn't be too hard. Another possible way to encapsulate would be
create a way to walk the various hashtables element by element (I
think hashtab_map() requires too much knowledge of the internal
structures), returning an opaque structure to track where you are in
the hashtable and functions that allow you to get each part of the
rule being stored. There are other ways that it could be done, but I
could rewrite kernel to and module to stuff with either of those. CIL
itself would require some functions to insert rules into the policydb
which probably wouldn't be too hard. None of this would be too hard,
but it would take some time. The real question is would it really be
valuable?

Jim
Stephen Smalley April 29, 2020, 7:26 p.m. UTC | #4
On Wed, Apr 29, 2020 at 3:01 PM James Carter <jwcart2@gmail.com> wrote:
>
> On Mon, Mar 30, 2020 at 9:06 AM Chris PeBenito <pebenito@ieee.org> wrote:
> >
> > On 3/27/20 3:21 PM, Stephen Smalley wrote:
> > > On 3/27/20 11:21 AM, Ondrej Mosnacek wrote:
> > >> These patches are the userspace side of the kernel change posted at [1].
> > >>
> > >> The first patch changes libsepol's internal representation of filename
> > >> transition rules in a way similar to kernel commit c3a276111ea2
> > >> ("selinux: optimize storage of filename transitions") [2].
> > >>
> > >> The second patch then builds upon that and implements reading and
> > >> writing of a new binary policy format that uses this representation also
> > >> in the data layout.
> > >>
> > >> See individual patches for more details.
> > >>
> > >> NOTE: This series unfortunately breaks the build of setools. Moreover,
> > >> when an existing build of setools dynamically links against the new
> > >> libsepol, it segfaults. Sadly, there doesn't seem to be a nice way of
> > >> handling this, since setools relies on non-public libsepol policydb
> > >> API/ABI.
> > >
> > > I think this has happened before a few years ago when we made a
> > > different change to those structures, and required updates on the
> > > setools side.
> > >
> > > Maybe we need to figure out what setools needs to be encapsulated and
> > > exported as part of the libsepol public ABI/API, and then stop having it
> > > peer into libsepol internals?
> >
> > I'd be fine with that :)
> >
>
> I am a little confused. I would consider peering into the libsepol
> internals to be something like assuming the memory layout of
> structures and pulling things out based on those assumptions, but
> setools is just using public header files and functions. It seems to
> me to be using the public API. Now I think that too much of the
> internals are being exposed, but I am not sure we can do anything
> about that now.

The sepol/policydb/*.h files are considered private and are only for
static users of libsepol
like checkpolicy.  Only the top-level sepol/*.h headers are part of
the shared library API/ABI.

> It doesn't seem like it would be too hard to update setools to work
> with this change, unless there is a requirement that it still work
> with older versions of libsepol. I am not very familiar with cython
> and I don't know how to make it work differently depending on what
> version of libsepol is on the system.
>
> Speaking of versions, since we are actually modifying structs in the
> header files instead of just adding new things, don't we need change
> the version of libsepol or something?
>
> > Ultimately SETools does 2 thing for the policy access:
> > * iterate over the entire policy, reading data out of the various objects
> > * use linkages between objects in the policy, e.g. get the
> > type/attributes from an AV rule, get types from an attribute, etc. using
> > the stuff like class_val_to_struct as needed.
> >
> > So if sufficient functionality to do dispol was exported, that would get
> > close to all that was needed.  There are some optimizations I made by
> > digging at the structs a bit more than the above, but that could
> > potentially be dropped if libsepol has sufficient support.  See
> > <https://github.com/SELinuxProject/setools/blob/master/setools/policyrep/selinuxpolicy.pxi#L673>
> > for the policy loading code.
> >
>
> I think the fact that the CIL, kernel to CIL, kernel to conf, and
> module to CIL code is all in libsepol speaks to the fact of how
> tightly integrated they are to the rest of libsepol. One argument that
> could be made is that the policyrep stuff in setools really belongs in
> libsepol.
>
> Thinking about how libsepol could be encapsulated leads me to a couple
> of possibilities. One way would be functions that could return lists
> of rules. The policy module code gives us avrule_t, role_trans_rule_t,
> role_allow_t, filename_trans_rule_t, range_trans_rule_t, and others.
> Those structures are probably unlikely to change and, at least in this
> case, creating a function that walks the filename_trans hashtable and
> returns a list of filename_trans_rule_t certainly seems like it
> wouldn't be too hard. Another possible way to encapsulate would be
> create a way to walk the various hashtables element by element (I
> think hashtab_map() requires too much knowledge of the internal
> structures), returning an opaque structure to track where you are in
> the hashtable and functions that allow you to get each part of the
> rule being stored. There are other ways that it could be done, but I
> could rewrite kernel to and module to stuff with either of those. CIL
> itself would require some functions to insert rules into the policydb
> which probably wouldn't be too hard. None of this would be too hard,
> but it would take some time. The real question is would it really be
> valuable?
Stephen Smalley April 30, 2020, 1:22 p.m. UTC | #5
On Wed, Apr 29, 2020 at 3:01 PM James Carter <jwcart2@gmail.com> wrote:
> I think the fact that the CIL, kernel to CIL, kernel to conf, and
> module to CIL code is all in libsepol speaks to the fact of how
> tightly integrated they are to the rest of libsepol. One argument that
> could be made is that the policyrep stuff in setools really belongs in
> libsepol.
>
> Thinking about how libsepol could be encapsulated leads me to a couple
> of possibilities. One way would be functions that could return lists
> of rules. The policy module code gives us avrule_t, role_trans_rule_t,
> role_allow_t, filename_trans_rule_t, range_trans_rule_t, and others.
> Those structures are probably unlikely to change and, at least in this
> case, creating a function that walks the filename_trans hashtable and
> returns a list of filename_trans_rule_t certainly seems like it
> wouldn't be too hard. Another possible way to encapsulate would be
> create a way to walk the various hashtables element by element (I
> think hashtab_map() requires too much knowledge of the internal
> structures), returning an opaque structure to track where you are in
> the hashtable and functions that allow you to get each part of the
> rule being stored. There are other ways that it could be done, but I
> could rewrite kernel to and module to stuff with either of those. CIL
> itself would require some functions to insert rules into the policydb
> which probably wouldn't be too hard. None of this would be too hard,
> but it would take some time. The real question is would it really be
> valuable?

I don't think we want to directly expose the existing data structures
from include/sepol/policydb/*.h (or at least not without a careful
audit) since those are often tightly coupled to policy compiler
internals and/or the kernel or module policy formats. Creating an
abstraction for each with a proper API in new definitions in
include/sepol/*.h would be preferable albeit more work. There was a
proposal a long time ago from the setools developers to create an
iterator interface and accessor functions for each data type, see
https://lore.kernel.org/selinux/200603212246.k2LMkRNq028071@gotham.columbia.tresys.com/.
Ondrej Mosnacek April 30, 2020, 2:20 p.m. UTC | #6
On Thu, Apr 30, 2020 at 3:23 PM Stephen Smalley
<stephen.smalley.work@gmail.com> wrote:
> On Wed, Apr 29, 2020 at 3:01 PM James Carter <jwcart2@gmail.com> wrote:
> > I think the fact that the CIL, kernel to CIL, kernel to conf, and
> > module to CIL code is all in libsepol speaks to the fact of how
> > tightly integrated they are to the rest of libsepol. One argument that
> > could be made is that the policyrep stuff in setools really belongs in
> > libsepol.
> >
> > Thinking about how libsepol could be encapsulated leads me to a couple
> > of possibilities. One way would be functions that could return lists
> > of rules. The policy module code gives us avrule_t, role_trans_rule_t,
> > role_allow_t, filename_trans_rule_t, range_trans_rule_t, and others.
> > Those structures are probably unlikely to change and, at least in this
> > case, creating a function that walks the filename_trans hashtable and
> > returns a list of filename_trans_rule_t certainly seems like it
> > wouldn't be too hard. Another possible way to encapsulate would be
> > create a way to walk the various hashtables element by element (I
> > think hashtab_map() requires too much knowledge of the internal
> > structures), returning an opaque structure to track where you are in
> > the hashtable and functions that allow you to get each part of the
> > rule being stored. There are other ways that it could be done, but I
> > could rewrite kernel to and module to stuff with either of those. CIL
> > itself would require some functions to insert rules into the policydb
> > which probably wouldn't be too hard. None of this would be too hard,
> > but it would take some time. The real question is would it really be
> > valuable?
>
> I don't think we want to directly expose the existing data structures
> from include/sepol/policydb/*.h (or at least not without a careful
> audit) since those are often tightly coupled to policy compiler
> internals and/or the kernel or module policy formats. Creating an
> abstraction for each with a proper API in new definitions in
> include/sepol/*.h would be preferable albeit more work. There was a
> proposal a long time ago from the setools developers to create an
> iterator interface and accessor functions for each data type, see
> https://lore.kernel.org/selinux/200603212246.k2LMkRNq028071@gotham.columbia.tresys.com/.

We could also partially mitigate the problem by linking libsepol
statically into setools at build time. I suggested this in [1] for
Fedora at least, but in general there is the problem that you need to
know the exact path to libsepol.a in setup.py [2]. I don't have enough
experience with python libraries to know if this can be implemented
directly in the upstream setup.py script in a reasonably generic way.

If linked statically, at least it wouldn't segfault after an update to
an incompatible libsepol.so. It would still fail to build with the new
headers, but at least this turns from a user-visible bug to only
packager's & maintainer's problem.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1826000
[2] https://stackoverflow.com/questions/4597228/how-to-statically-link-a-library-when-compiling-a-python-module-extension/49139257#49139257
Chris PeBenito April 30, 2020, 2:24 p.m. UTC | #7
On 4/30/20 9:22 AM, Stephen Smalley wrote:
> On Wed, Apr 29, 2020 at 3:01 PM James Carter <jwcart2@gmail.com> wrote:
>> I think the fact that the CIL, kernel to CIL, kernel to conf, and
>> module to CIL code is all in libsepol speaks to the fact of how
>> tightly integrated they are to the rest of libsepol. One argument that
>> could be made is that the policyrep stuff in setools really belongs in
>> libsepol.
>>
>> Thinking about how libsepol could be encapsulated leads me to a couple
>> of possibilities. One way would be functions that could return lists
>> of rules. The policy module code gives us avrule_t, role_trans_rule_t,
>> role_allow_t, filename_trans_rule_t, range_trans_rule_t, and others.
>> Those structures are probably unlikely to change and, at least in this
>> case, creating a function that walks the filename_trans hashtable and
>> returns a list of filename_trans_rule_t certainly seems like it
>> wouldn't be too hard. Another possible way to encapsulate would be
>> create a way to walk the various hashtables element by element (I
>> think hashtab_map() requires too much knowledge of the internal
>> structures), returning an opaque structure to track where you are in
>> the hashtable and functions that allow you to get each part of the
>> rule being stored. There are other ways that it could be done, but I
>> could rewrite kernel to and module to stuff with either of those. CIL
>> itself would require some functions to insert rules into the policydb
>> which probably wouldn't be too hard. None of this would be too hard,
>> but it would take some time. The real question is would it really be
>> valuable?
> 
> I don't think we want to directly expose the existing data structures
> from include/sepol/policydb/*.h (or at least not without a careful
> audit) since those are often tightly coupled to policy compiler
> internals and/or the kernel or module policy formats. Creating an
> abstraction for each with a proper API in new definitions in
> include/sepol/*.h would be preferable albeit more work. There was a
> proposal a long time ago from the setools developers to create an
> iterator interface and accessor functions for each data type, see
> https://lore.kernel.org/selinux/200603212246.k2LMkRNq028071@gotham.columbia.tresys.com/.

I agree.  The hardest thing with writing the policyrep in setools was stuff like 
the value_to_datum indirections, type_attr_map, etc. and knowing when to use 
value vs value-1.  An API that has a new set of structs would be ideal.

Unfortunately, since setools policyrep is now written in Cython, we can't simply 
move the code over to libsepol.  My guess is dispol has the most useful building 
blocks for making a new API.
Ondrej Mosnacek April 30, 2020, 2:34 p.m. UTC | #8
On Thu, Apr 30, 2020 at 4:24 PM Chris PeBenito <pebenito@ieee.org> wrote:
> On 4/30/20 9:22 AM, Stephen Smalley wrote:
> > On Wed, Apr 29, 2020 at 3:01 PM James Carter <jwcart2@gmail.com> wrote:
> >> I think the fact that the CIL, kernel to CIL, kernel to conf, and
> >> module to CIL code is all in libsepol speaks to the fact of how
> >> tightly integrated they are to the rest of libsepol. One argument that
> >> could be made is that the policyrep stuff in setools really belongs in
> >> libsepol.
> >>
> >> Thinking about how libsepol could be encapsulated leads me to a couple
> >> of possibilities. One way would be functions that could return lists
> >> of rules. The policy module code gives us avrule_t, role_trans_rule_t,
> >> role_allow_t, filename_trans_rule_t, range_trans_rule_t, and others.
> >> Those structures are probably unlikely to change and, at least in this
> >> case, creating a function that walks the filename_trans hashtable and
> >> returns a list of filename_trans_rule_t certainly seems like it
> >> wouldn't be too hard. Another possible way to encapsulate would be
> >> create a way to walk the various hashtables element by element (I
> >> think hashtab_map() requires too much knowledge of the internal
> >> structures), returning an opaque structure to track where you are in
> >> the hashtable and functions that allow you to get each part of the
> >> rule being stored. There are other ways that it could be done, but I
> >> could rewrite kernel to and module to stuff with either of those. CIL
> >> itself would require some functions to insert rules into the policydb
> >> which probably wouldn't be too hard. None of this would be too hard,
> >> but it would take some time. The real question is would it really be
> >> valuable?
> >
> > I don't think we want to directly expose the existing data structures
> > from include/sepol/policydb/*.h (or at least not without a careful
> > audit) since those are often tightly coupled to policy compiler
> > internals and/or the kernel or module policy formats. Creating an
> > abstraction for each with a proper API in new definitions in
> > include/sepol/*.h would be preferable albeit more work. There was a
> > proposal a long time ago from the setools developers to create an
> > iterator interface and accessor functions for each data type, see
> > https://lore.kernel.org/selinux/200603212246.k2LMkRNq028071@gotham.columbia.tresys.com/.
>
> I agree.  The hardest thing with writing the policyrep in setools was stuff like
> the value_to_datum indirections, type_attr_map, etc. and knowing when to use
> value vs value-1.  An API that has a new set of structs would be ideal.
>
> Unfortunately, since setools policyrep is now written in Cython, we can't simply
> move the code over to libsepol.  My guess is dispol has the most useful building
> blocks for making a new API.

Since you mention dispol... I also had the idea that setools could
just use the existing public interface to convert the whole policydb
to CIL and simply parse that as a string (this should be pretty
straightforward even in pure Python). However, based on my experiments
this would likely make setools a lot slower...
Chris PeBenito April 30, 2020, 2:58 p.m. UTC | #9
On 4/30/20 10:20 AM, Ondrej Mosnacek wrote:
> On Thu, Apr 30, 2020 at 3:23 PM Stephen Smalley
> <stephen.smalley.work@gmail.com> wrote:
>> On Wed, Apr 29, 2020 at 3:01 PM James Carter <jwcart2@gmail.com> wrote:
>>> I think the fact that the CIL, kernel to CIL, kernel to conf, and
>>> module to CIL code is all in libsepol speaks to the fact of how
>>> tightly integrated they are to the rest of libsepol. One argument that
>>> could be made is that the policyrep stuff in setools really belongs in
>>> libsepol.
>>>
>>> Thinking about how libsepol could be encapsulated leads me to a couple
>>> of possibilities. One way would be functions that could return lists
>>> of rules. The policy module code gives us avrule_t, role_trans_rule_t,
>>> role_allow_t, filename_trans_rule_t, range_trans_rule_t, and others.
>>> Those structures are probably unlikely to change and, at least in this
>>> case, creating a function that walks the filename_trans hashtable and
>>> returns a list of filename_trans_rule_t certainly seems like it
>>> wouldn't be too hard. Another possible way to encapsulate would be
>>> create a way to walk the various hashtables element by element (I
>>> think hashtab_map() requires too much knowledge of the internal
>>> structures), returning an opaque structure to track where you are in
>>> the hashtable and functions that allow you to get each part of the
>>> rule being stored. There are other ways that it could be done, but I
>>> could rewrite kernel to and module to stuff with either of those. CIL
>>> itself would require some functions to insert rules into the policydb
>>> which probably wouldn't be too hard. None of this would be too hard,
>>> but it would take some time. The real question is would it really be
>>> valuable?
>>
>> I don't think we want to directly expose the existing data structures
>> from include/sepol/policydb/*.h (or at least not without a careful
>> audit) since those are often tightly coupled to policy compiler
>> internals and/or the kernel or module policy formats. Creating an
>> abstraction for each with a proper API in new definitions in
>> include/sepol/*.h would be preferable albeit more work. There was a
>> proposal a long time ago from the setools developers to create an
>> iterator interface and accessor functions for each data type, see
>> https://lore.kernel.org/selinux/200603212246.k2LMkRNq028071@gotham.columbia.tresys.com/.
> 
> We could also partially mitigate the problem by linking libsepol
> statically into setools at build time. I suggested this in [1] for
> Fedora at least, but in general there is the problem that you need to
> know the exact path to libsepol.a in setup.py [2]. I don't have enough
> experience with python libraries to know if this can be implemented
> directly in the upstream setup.py script in a reasonably generic way.
> 
> If linked statically, at least it wouldn't segfault after an update to
> an incompatible libsepol.so. It would still fail to build with the new
> headers, but at least this turns from a user-visible bug to only
> packager's & maintainer's problem.

It used to be static and people complained about that.  I moved it to dynamic 
knowing this would be an issue, but an uncommon one.
Chris PeBenito April 30, 2020, 3:20 p.m. UTC | #10
On 4/30/20 10:34 AM, Ondrej Mosnacek wrote:
> On Thu, Apr 30, 2020 at 4:24 PM Chris PeBenito <pebenito@ieee.org> wrote:
>> On 4/30/20 9:22 AM, Stephen Smalley wrote:
>>> On Wed, Apr 29, 2020 at 3:01 PM James Carter <jwcart2@gmail.com> wrote:
>>>> I think the fact that the CIL, kernel to CIL, kernel to conf, and
>>>> module to CIL code is all in libsepol speaks to the fact of how
>>>> tightly integrated they are to the rest of libsepol. One argument that
>>>> could be made is that the policyrep stuff in setools really belongs in
>>>> libsepol.
>>>>
>>>> Thinking about how libsepol could be encapsulated leads me to a couple
>>>> of possibilities. One way would be functions that could return lists
>>>> of rules. The policy module code gives us avrule_t, role_trans_rule_t,
>>>> role_allow_t, filename_trans_rule_t, range_trans_rule_t, and others.
>>>> Those structures are probably unlikely to change and, at least in this
>>>> case, creating a function that walks the filename_trans hashtable and
>>>> returns a list of filename_trans_rule_t certainly seems like it
>>>> wouldn't be too hard. Another possible way to encapsulate would be
>>>> create a way to walk the various hashtables element by element (I
>>>> think hashtab_map() requires too much knowledge of the internal
>>>> structures), returning an opaque structure to track where you are in
>>>> the hashtable and functions that allow you to get each part of the
>>>> rule being stored. There are other ways that it could be done, but I
>>>> could rewrite kernel to and module to stuff with either of those. CIL
>>>> itself would require some functions to insert rules into the policydb
>>>> which probably wouldn't be too hard. None of this would be too hard,
>>>> but it would take some time. The real question is would it really be
>>>> valuable?
>>>
>>> I don't think we want to directly expose the existing data structures
>>> from include/sepol/policydb/*.h (or at least not without a careful
>>> audit) since those are often tightly coupled to policy compiler
>>> internals and/or the kernel or module policy formats. Creating an
>>> abstraction for each with a proper API in new definitions in
>>> include/sepol/*.h would be preferable albeit more work. There was a
>>> proposal a long time ago from the setools developers to create an
>>> iterator interface and accessor functions for each data type, see
>>> https://lore.kernel.org/selinux/200603212246.k2LMkRNq028071@gotham.columbia.tresys.com/.
>>
>> I agree.  The hardest thing with writing the policyrep in setools was stuff like
>> the value_to_datum indirections, type_attr_map, etc. and knowing when to use
>> value vs value-1.  An API that has a new set of structs would be ideal.
>>
>> Unfortunately, since setools policyrep is now written in Cython, we can't simply
>> move the code over to libsepol.  My guess is dispol has the most useful building
>> blocks for making a new API.
> 
> Since you mention dispol... I also had the idea that setools could
> just use the existing public interface to convert the whole policydb
> to CIL and simply parse that as a string (this should be pretty
> straightforward even in pure Python). However, based on my experiments
> this would likely make setools a lot slower...

This is an interesting idea.  I'm not familiar with the CIL API; secilc.c looks 
like it uses public API.  Can I use the existing CIL library functions to parse 
it, or does the resultant db lack public accessor functions?
James Carter April 30, 2020, 3:21 p.m. UTC | #11
On Thu, Apr 30, 2020 at 9:23 AM Stephen Smalley
<stephen.smalley.work@gmail.com> wrote:
>
> On Wed, Apr 29, 2020 at 3:01 PM James Carter <jwcart2@gmail.com> wrote:
> > I think the fact that the CIL, kernel to CIL, kernel to conf, and
> > module to CIL code is all in libsepol speaks to the fact of how
> > tightly integrated they are to the rest of libsepol. One argument that
> > could be made is that the policyrep stuff in setools really belongs in
> > libsepol.
> >
> > Thinking about how libsepol could be encapsulated leads me to a couple
> > of possibilities. One way would be functions that could return lists
> > of rules. The policy module code gives us avrule_t, role_trans_rule_t,
> > role_allow_t, filename_trans_rule_t, range_trans_rule_t, and others.
> > Those structures are probably unlikely to change and, at least in this
> > case, creating a function that walks the filename_trans hashtable and
> > returns a list of filename_trans_rule_t certainly seems like it
> > wouldn't be too hard. Another possible way to encapsulate would be
> > create a way to walk the various hashtables element by element (I
> > think hashtab_map() requires too much knowledge of the internal
> > structures), returning an opaque structure to track where you are in
> > the hashtable and functions that allow you to get each part of the
> > rule being stored. There are other ways that it could be done, but I
> > could rewrite kernel to and module to stuff with either of those. CIL
> > itself would require some functions to insert rules into the policydb
> > which probably wouldn't be too hard. None of this would be too hard,
> > but it would take some time. The real question is would it really be
> > valuable?
>
> I don't think we want to directly expose the existing data structures
> from include/sepol/policydb/*.h (or at least not without a careful
> audit) since those are often tightly coupled to policy compiler
> internals and/or the kernel or module policy formats. Creating an
> abstraction for each with a proper API in new definitions in
> include/sepol/*.h would be preferable albeit more work. There was a
> proposal a long time ago from the setools developers to create an
> iterator interface and accessor functions for each data type, see
> https://lore.kernel.org/selinux/200603212246.k2LMkRNq028071@gotham.columbia.tresys.com/.

I am so used to everything just using the stuff in
include/sepol/policydb/ that I forget that it exists. There are a
number of iterators there (for users, ports, nodes, etc), so it seems
like creating an iterator and other functions for filename transition
rules and converting setools to use that would be the first step. Over
time other iterators can be created and setools converted to use those
as well.

Jim
James Carter April 30, 2020, 3:27 p.m. UTC | #12
On Thu, Apr 30, 2020 at 11:20 AM Chris PeBenito <pebenito@ieee.org> wrote:
>
> On 4/30/20 10:34 AM, Ondrej Mosnacek wrote:
> > On Thu, Apr 30, 2020 at 4:24 PM Chris PeBenito <pebenito@ieee.org> wrote:
> >> On 4/30/20 9:22 AM, Stephen Smalley wrote:
> >>> On Wed, Apr 29, 2020 at 3:01 PM James Carter <jwcart2@gmail.com> wrote:
> >>>> I think the fact that the CIL, kernel to CIL, kernel to conf, and
> >>>> module to CIL code is all in libsepol speaks to the fact of how
> >>>> tightly integrated they are to the rest of libsepol. One argument that
> >>>> could be made is that the policyrep stuff in setools really belongs in
> >>>> libsepol.
> >>>>
> >>>> Thinking about how libsepol could be encapsulated leads me to a couple
> >>>> of possibilities. One way would be functions that could return lists
> >>>> of rules. The policy module code gives us avrule_t, role_trans_rule_t,
> >>>> role_allow_t, filename_trans_rule_t, range_trans_rule_t, and others.
> >>>> Those structures are probably unlikely to change and, at least in this
> >>>> case, creating a function that walks the filename_trans hashtable and
> >>>> returns a list of filename_trans_rule_t certainly seems like it
> >>>> wouldn't be too hard. Another possible way to encapsulate would be
> >>>> create a way to walk the various hashtables element by element (I
> >>>> think hashtab_map() requires too much knowledge of the internal
> >>>> structures), returning an opaque structure to track where you are in
> >>>> the hashtable and functions that allow you to get each part of the
> >>>> rule being stored. There are other ways that it could be done, but I
> >>>> could rewrite kernel to and module to stuff with either of those. CIL
> >>>> itself would require some functions to insert rules into the policydb
> >>>> which probably wouldn't be too hard. None of this would be too hard,
> >>>> but it would take some time. The real question is would it really be
> >>>> valuable?
> >>>
> >>> I don't think we want to directly expose the existing data structures
> >>> from include/sepol/policydb/*.h (or at least not without a careful
> >>> audit) since those are often tightly coupled to policy compiler
> >>> internals and/or the kernel or module policy formats. Creating an
> >>> abstraction for each with a proper API in new definitions in
> >>> include/sepol/*.h would be preferable albeit more work. There was a
> >>> proposal a long time ago from the setools developers to create an
> >>> iterator interface and accessor functions for each data type, see
> >>> https://lore.kernel.org/selinux/200603212246.k2LMkRNq028071@gotham.columbia.tresys.com/.
> >>
> >> I agree.  The hardest thing with writing the policyrep in setools was stuff like
> >> the value_to_datum indirections, type_attr_map, etc. and knowing when to use
> >> value vs value-1.  An API that has a new set of structs would be ideal.
> >>
> >> Unfortunately, since setools policyrep is now written in Cython, we can't simply
> >> move the code over to libsepol.  My guess is dispol has the most useful building
> >> blocks for making a new API.
> >
> > Since you mention dispol... I also had the idea that setools could
> > just use the existing public interface to convert the whole policydb
> > to CIL and simply parse that as a string (this should be pretty
> > straightforward even in pure Python). However, based on my experiments
> > this would likely make setools a lot slower...
>
> This is an interesting idea.  I'm not familiar with the CIL API; secilc.c looks
> like it uses public API.  Can I use the existing CIL library functions to parse
> it, or does the resultant db lack public accessor functions?
>

The resultant db does, in fact, lack public assessor functions. They
could be created, but since there is already a way to convert the
cil_db to a policydb, the general solution would be to create the
functions for the policydb.

Jim
Ondrej Mosnacek April 30, 2020, 3:34 p.m. UTC | #13
On Thu, Apr 30, 2020 at 5:20 PM Chris PeBenito <pebenito@ieee.org> wrote:
> On 4/30/20 10:34 AM, Ondrej Mosnacek wrote:
> > On Thu, Apr 30, 2020 at 4:24 PM Chris PeBenito <pebenito@ieee.org> wrote:
> >> On 4/30/20 9:22 AM, Stephen Smalley wrote:
> >>> On Wed, Apr 29, 2020 at 3:01 PM James Carter <jwcart2@gmail.com> wrote:
> >>>> I think the fact that the CIL, kernel to CIL, kernel to conf, and
> >>>> module to CIL code is all in libsepol speaks to the fact of how
> >>>> tightly integrated they are to the rest of libsepol. One argument that
> >>>> could be made is that the policyrep stuff in setools really belongs in
> >>>> libsepol.
> >>>>
> >>>> Thinking about how libsepol could be encapsulated leads me to a couple
> >>>> of possibilities. One way would be functions that could return lists
> >>>> of rules. The policy module code gives us avrule_t, role_trans_rule_t,
> >>>> role_allow_t, filename_trans_rule_t, range_trans_rule_t, and others.
> >>>> Those structures are probably unlikely to change and, at least in this
> >>>> case, creating a function that walks the filename_trans hashtable and
> >>>> returns a list of filename_trans_rule_t certainly seems like it
> >>>> wouldn't be too hard. Another possible way to encapsulate would be
> >>>> create a way to walk the various hashtables element by element (I
> >>>> think hashtab_map() requires too much knowledge of the internal
> >>>> structures), returning an opaque structure to track where you are in
> >>>> the hashtable and functions that allow you to get each part of the
> >>>> rule being stored. There are other ways that it could be done, but I
> >>>> could rewrite kernel to and module to stuff with either of those. CIL
> >>>> itself would require some functions to insert rules into the policydb
> >>>> which probably wouldn't be too hard. None of this would be too hard,
> >>>> but it would take some time. The real question is would it really be
> >>>> valuable?
> >>>
> >>> I don't think we want to directly expose the existing data structures
> >>> from include/sepol/policydb/*.h (or at least not without a careful
> >>> audit) since those are often tightly coupled to policy compiler
> >>> internals and/or the kernel or module policy formats. Creating an
> >>> abstraction for each with a proper API in new definitions in
> >>> include/sepol/*.h would be preferable albeit more work. There was a
> >>> proposal a long time ago from the setools developers to create an
> >>> iterator interface and accessor functions for each data type, see
> >>> https://lore.kernel.org/selinux/200603212246.k2LMkRNq028071@gotham.columbia.tresys.com/.
> >>
> >> I agree.  The hardest thing with writing the policyrep in setools was stuff like
> >> the value_to_datum indirections, type_attr_map, etc. and knowing when to use
> >> value vs value-1.  An API that has a new set of structs would be ideal.
> >>
> >> Unfortunately, since setools policyrep is now written in Cython, we can't simply
> >> move the code over to libsepol.  My guess is dispol has the most useful building
> >> blocks for making a new API.
> >
> > Since you mention dispol... I also had the idea that setools could
> > just use the existing public interface to convert the whole policydb
> > to CIL and simply parse that as a string (this should be pretty
> > straightforward even in pure Python). However, based on my experiments
> > this would likely make setools a lot slower...
>
> This is an interesting idea.  I'm not familiar with the CIL API; secilc.c looks
> like it uses public API.  Can I use the existing CIL library functions to parse
> it, or does the resultant db lack public accessor functions?

What I had in mind was actually to just use
sepol_kernel_policydb_to_cil() to dump the textual CIL into a
temporary file (maybe using tmpfile(3)), and then parse the contents
in Python. The CIL format is really easy to parse (especially in
Python) so a lack of existing functions for that wouldn't be much of
an issue. Yes, this would be a little dirty, but you'd avoid the
trouble of maintaining a stable binary interface between libsepol and
setools.