
[v5.4,0/3] Update SELinuxfs out of tree and then swapover

Message ID 20201015192956.1797021-1-dburgener@linux.microsoft.com

Message

Daniel Burgener Oct. 15, 2020, 7:29 p.m. UTC
This is a backport for stable of my series to fix a race condition in
selinuxfs during policy load:

selinux: Create function for selinuxfs directory cleanup
https://lore.kernel.org/selinux/20200819195935.1720168-2-dburgener@linux.microsoft.com/

selinux: Refactor selinuxfs directory populating functions
https://lore.kernel.org/selinux/20200819195935.1720168-3-dburgener@linux.microsoft.com/

selinux: Standardize string literal usage for selinuxfs directory names
https://lore.kernel.org/selinux/20200819195935.1720168-4-dburgener@linux.microsoft.com/

selinux: Create new booleans and class dirs out of tree
https://lore.kernel.org/selinux/20200819195935.1720168-5-dburgener@linux.microsoft.com/

Several changes were necessary to backport.  They are detailed in the
commit message for the third commit in the series.  I also dropped the
original third commit from this because it was only a style change.

The bulk of the original cover letter is reproduced below.

In the current implementation, /sys/fs/selinux is updated on policy load
by deleting the previous contents of /sys/fs/selinux/{class,booleans}
and then recreating them.  This means there is a window during which the
contents of these directories do not exist, which can cause race
conditions, since userspace relies on them for information about the
policy.  It also makes error recovery in the event of failure
challenging.

This patch series follows the design outlined by Al Viro in a previous
e-mail to the list[1].  This approach is to first create the new
directory structures out of tree, then to perform the swapover, and
finally to delete the old directories.  Not handled in this series is
error recovery in the event of failure.

Error recovery in the selinuxfs recreation is unhandled in the current
code, so this series will not cause any regression in this regard.
Handling directory recreation in this manner is a prerequisite to make
proper error handling possible.
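
To illustrate the general shape of the approach (this is not the
selinuxfs code itself), here is a rough userspace analogy: build a
replacement directory next to the live one, then atomically exchange
the two with renameat2(RENAME_EXCHANGE) so readers never see an empty
or half-populated directory.  All names and paths below are invented
for illustration, and the renameat2() call assumes Linux 3.15+ and
glibc 2.28+.

/* Userspace analogy only, NOT the selinuxfs implementation. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
	const char *live = "booleans";          /* what consumers read */
	const char *staging = ".booleans.new";  /* built "out of tree" */

	/* 1. Build the replacement while the old tree stays visible. */
	if (mkdir(staging, 0755) && errno != EEXIST) {
		perror("mkdir");
		return 1;
	}
	/* ... populate the staging directory here ... */

	/* 2. Atomic swapover: the live name always resolves to a
	 *    fully populated directory, old or new. */
	if (renameat2(AT_FDCWD, staging, AT_FDCWD, live, RENAME_EXCHANGE)) {
		perror("renameat2");
		return 1;
	}

	/* 3. Only now delete the old contents, which the exchange
	 *    left under the staging name. */
	return 0;
}

The in-kernel series follows the same build, swap, delete shape, just
with selinuxfs dentries rather than directories on a regular filesystem.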

In order to demonstrate the race condition that this series fixes, you
can use the following commands:

while true; do cat /sys/fs/selinux/class/service/perms/status >/dev/null; done &
while true; do load_policy; done;

In the existing code, this will display errors fairly often as the class
lookup fails.  (In normal operation from systemd, this would result in a
permission check which would be allowed or denied based on policy settings
around unknown object classes.) After applying this patch series you
should expect to no longer see such error messages.

Daniel Burgener (3):
  selinux: Create function for selinuxfs directory cleanup
  selinux: Refactor selinuxfs directory populating functions
  selinux: Create new booleans and class dirs out of tree

 security/selinux/selinuxfs.c | 160 +++++++++++++++++++++++++++--------
 1 file changed, 123 insertions(+), 37 deletions(-)

Comments

Greg KH Oct. 16, 2020, 5 a.m. UTC | #1
On Thu, Oct 15, 2020 at 03:29:53PM -0400, Daniel Burgener wrote:
> This is a backport for stable of my series to fix a race condition in
> selinuxfs during policy load:
> 
> selinux: Create function for selinuxfs directory cleanup
> https://lore.kernel.org/selinux/20200819195935.1720168-2-dburgener@linux.microsoft.com/
> 
> selinux: Refactor selinuxfs directory populating functions
> https://lore.kernel.org/selinux/20200819195935.1720168-3-dburgener@linux.microsoft.com/
> 
> selinux: Standardize string literal usage for selinuxfs directory names
> https://lore.kernel.org/selinux/20200819195935.1720168-4-dburgener@linux.microsoft.com/
> 
> selinux: Create new booleans and class dirs out of tree
> https://lore.kernel.org/selinux/20200819195935.1720168-5-dburgener@linux.microsoft.com/
> 
> Several changes were necessary to backport.  They are detailed in the
> commit message for the third commit in the series.  I also dropped the
> original third commit from this because it was only a style change.

As Sasha said, we want to take the original commits as much as possible
to reduce the delta.  It is ok to take style changes if other patches
depend on them, because every time we do something that is not upstream,
we create bugs.

So can you redo this series, and backport the original commits, and
provide the ids for them as well?

thanks,

greg k-h
Daniel Burgener Oct. 16, 2020, 1:05 p.m. UTC | #2
On 10/16/20 1:00 AM, Greg KH wrote:
> On Thu, Oct 15, 2020 at 03:29:53PM -0400, Daniel Burgener wrote:
>> This is a backport for stable of my series to fix a race condition in
>> selinuxfs during policy load:
>>
>> selinux: Create function for selinuxfs directory cleanup
>> https://lore.kernel.org/selinux/20200819195935.1720168-2-dburgener@linux.microsoft.com/
>>
>> selinux: Refactor selinuxfs directory populating functions
>> https://lore.kernel.org/selinux/20200819195935.1720168-3-dburgener@linux.microsoft.com/
>>
>> selinux: Standardize string literal usage for selinuxfs directory names
>> https://lore.kernel.org/selinux/20200819195935.1720168-4-dburgener@linux.microsoft.com/
>>
>> selinux: Create new booleans and class dirs out of tree
>> https://lore.kernel.org/selinux/20200819195935.1720168-5-dburgener@linux.microsoft.com/
>>
>> Several changes were necessary to backport.  They are detailed in the
>> commit message for the third commit in the series.  I also dropped the
>> original third commit from this because it was only a style change.
> As Sasha said, we want to take the original commits as much as possible
> to reduce the delta.  It is ok to take style changes if other patches
> depend on them, because every time we do something that is not upstream,
> we create bugs.
>
> So can you redo this series, and backport the original commits, and
> provide the ids for them as well?
>
> thanks,
>
> greg k-h

Yes, thank you.  I will fix up the series with the third commit 
included, and add commit ids.  Thanks.


-Daniel
Paul Moore Oct. 16, 2020, 1:55 p.m. UTC | #3
On Fri, Oct 16, 2020 at 9:05 AM Daniel Burgener
<dburgener@linux.microsoft.com> wrote:
> Yes, thank you.  I will fix up the series with the third commit
> included, and add commit ids.  Thanks.

Greg and I have different opinions on what is classified as a good
candidate for the -stable trees, but in my opinion this patch series
doesn't qualify.  There are a lot of dependencies, it is intertwined
with a lot of code, and the issue that this patchset fixes has been
around for a *long* time.  I personally feel the risk of backporting
this to -stable does not outweigh the potential wins.

My current opinion is that backporting this patchset is not a good
idea; it gets a NACK from me.

Daniel, in the future if this is something you want to see backported
please bring this issue up on the SELinux mailing list when the
patchset is originally posted so we can have a discussion about it and
plan accordingly.
Daniel Burgener Oct. 16, 2020, 2:02 p.m. UTC | #4
On 10/16/20 9:55 AM, Paul Moore wrote:
> On Fri, Oct 16, 2020 at 9:05 AM Daniel Burgener
> <dburgener@linux.microsoft.com> wrote:
>> Yes, thank you.  I will fix up the series with the third commit
>> included, and add commit ids.  Thanks.
> Greg and I have different opinions on what is classified as a good
> candidate for the -stable trees, but in my opinion this patch series
> doesn't qualify.  There are a lot of dependencies, it is intertwined
> with a lot of code, and the issue that this patchset fixes has been
> around for a *long* time.  I personally feel the risk of backporting
> this to -stable does not outweigh the potential wins.
>
> My current opinion is that backporting this patchset is not a good
> idea; it gets a NACK from me.
>
> Daniel, in the future if this is something you want to see backported
> please bring this issue up on the SELinux mailing list when the
> patchset is originally posted so we can have a discussion about it and
> plan accordingly.
>
Noted.  Thanks for the feedback.  I will make sure to bring such things 
up with the selinux list in the future.

-Daniel
Sasha Levin Oct. 16, 2020, 2:22 p.m. UTC | #5
On Fri, Oct 16, 2020 at 09:55:25AM -0400, Paul Moore wrote:
>On Fri, Oct 16, 2020 at 9:05 AM Daniel Burgener
><dburgener@linux.microsoft.com> wrote:
>> Yes, thank you.  I will fix up the series with the third commit
>> included, and add commit ids.  Thanks.
>
>Greg and I have different opinions on what is classified as a good
>candidate for the -stable trees, but in my opinion this patch series
>doesn't qualify.  There are a lot of dependencies, it is intertwined
>with a lot of code, and the issue that this patchset fixes has been
>around for a *long* time.  I personally feel the risk of backporting
>this to -stable does not outweigh the potential wins.

My understanding is that while the issue Daniel is fixing here has been
around for a while, it's also very real - the reports suggest a failure
rate of 1-2% on boot.

I do understand your concerns around this series, but given it was just
fixed upstream we don't have a better story than "sit tight for the
next LTS" to tell to users affected by this issue.

Is there a scenario where you'd feel safer with the series? I suspect
that if it doesn't go into upstream stable Daniel will end up carrying
it out of tree anyway, so maybe we can ask Daniel to do targeted
testing for the next week or two and report back?
Daniel Burgener Oct. 16, 2020, 2:36 p.m. UTC | #6
On 10/16/20 10:22 AM, Sasha Levin wrote:
> On Fri, Oct 16, 2020 at 09:55:25AM -0400, Paul Moore wrote:
>> On Fri, Oct 16, 2020 at 9:05 AM Daniel Burgener
>> <dburgener@linux.microsoft.com> wrote:
>>> Yes, thank you.  I will fix up the series with the third commit
>>> included, and add commit ids.  Thanks.
>>
>> Greg and I have different opinions on what is classified as a good
>> candidate for the -stable trees, but in my opinion this patch series
>> doesn't qualify.  There are a lot of dependencies, it is intertwined
>> with a lot of code, and the issue that this patchset fixes has been
>> around for a *long* time.  I personally feel the risk of backporting
>> this to -stable does not outweigh the potential wins.
>
> My understanding is that while the issue Daniel is fixing here has been
> around for a while, it's also very real - the reports suggest a failure
> rate of 1-2% on boot.

As a point of clarity, I think the issue occurs much less frequently on
boot than it does with a policy load during ordinary operation, since
there is a much higher volume of userspace policy manager lookups during
a policy load once the system is up.  I think 1-2% is roughly accurate
for what we're seeing in the environment I'm working in for a policy
load during normal, steady-state operation.  I don't have hard numbers
for policy loads during boot, but I would expect the rate to be quite a
bit lower.  We have seen it there, but it's not the common case.

>
> I do understand your concerns around this series, but given it was just
> fixed upstream we don't have a better story than "sit tight for the
> next LTS" to tell to users affected by this issue.
>
> Is there a scenario where you'd feel safer with the series? I suspect
> that if it doesn't go into upstream stable Daniel will end up carrying
> it out of tree anyway, so maybe we can ask Daniel to do targetted
> testing for the next week or two and report back?
>
I believe my team intends to carry this out of tree, yes.  If
additional data from that would be helpful, I'd be happy to provide it.

-Daniel