diff mbox

[v3,2/2] Protected O_CREAT open in sticky directories

Message ID 1511337706-8297-3-git-send-email-s.mesoraca16@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Salvatore Mesoraca Nov. 22, 2017, 8:01 a.m. UTC
Disallows O_CREAT open missing the O_EXCL flag, in world or
group writable directories, even if the file doesn't exist yet.
With few exceptions (e.g. shared lock files based on flock())
if a program tries to open a file, in a sticky directory,
with the O_CREAT flag and without the O_EXCL, it probably has a bug.
This feature allows to detect and potentially block programs that
act this way, it can be used to find vulnerabilities (like those
prevented by patch #1) and to do policy enforcement.

Suggested-by: Solar Designer <solar@openwall.com>
Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
---
 Documentation/sysctl/fs.txt | 30 ++++++++++++++++++++++++
 fs/namei.c                  | 56 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h          |  1 +
 kernel/sysctl.c             |  9 ++++++++
 4 files changed, 96 insertions(+)

Comments

Pavel Vasilyev Nov. 22, 2017, 12:03 p.m. UTC | #1
22.11.2017 11:01, Salvatore Mesoraca пишет:
> Disallows O_CREAT open missing the O_EXCL flag, in world or
> group writable directories, even if the file doesn't exist yet.
> With few exceptions (e.g. shared lock files based on flock())
> if a program tries to open a file, in a sticky directory,
> with the O_CREAT flag and without the O_EXCL, it probably has a bug.
> This feature allows to detect and potentially block programs that
> act this way, it can be used to find vulnerabilities (like those
> prevented by patch #1) and to do policy enforcement.
> 
> Suggested-by: Solar Designer <solar@openwall.com>
> Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
> ---
>  Documentation/sysctl/fs.txt | 30 ++++++++++++++++++++++++
>  fs/namei.c                  | 56 +++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/fs.h          |  1 +
>  kernel/sysctl.c             |  9 ++++++++
>  4 files changed, 96 insertions(+)
> 
> diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
> index f3cf2cd..7f24b4f 100644
> --- a/Documentation/sysctl/fs.txt
> +++ b/Documentation/sysctl/fs.txt
> @@ -37,6 +37,7 @@ Currently, these files are in /proc/sys/fs:
>  - protected_fifos
>  - protected_hardlinks
>  - protected_regular
> +- protected_sticky_child_create
>  - protected_symlinks
>  - suid_dumpable
>  - super-max
> @@ -238,6 +239,35 @@ When set to "2" it also applies to group writable sticky directories.
>  
>  ==============================================================
>  
> +protected_sticky_child_create:
> +
> +An O_CREAT open missing the O_EXCL flag in a sticky directory is,
> +often, a bug or a synthom of the fact that the program is not
> +using appropriate procedures to access sticky directories.
> +This protection allow to detect and possibly block these unsafe
> +open invocations, even if the files don't exist yet.
> +Though should be noted that, sometimes, it's OK to open a file
> +with O_CREAT and without O_EXCL (e.g. shared lock files based
> +on flock()), for this reason values above 2 should be set
> +with care.
> +
> +When set to "0" the protection is disabled.
> +
> +When set to "1", notify about O_CREAT open missing the O_EXCL flag
> +in world writable sticky directories.
> +
> +When set to "2", notify about O_CREAT open missing the O_EXCL flag
> +in world or group writable sticky directories.
> +
> +When set to "3", block O_CREAT open missing the O_EXCL flag
> +in world writable sticky directories and notify (but don't block)
> +in group writable sticky directories.
> +
> +When set to "4", block O_CREAT open missing the O_EXCL flag
> +in world writable and group writable sticky directories.

May be add:

When set to "X", notify O_CREAT open missing the O_EXCL flag
in world writable sticky directories and notify in group
writable sticky directories.
Matthew Wilcox (Oracle) Nov. 22, 2017, 1:22 p.m. UTC | #2
On Wed, Nov 22, 2017 at 09:01:46AM +0100, Salvatore Mesoraca wrote:
> +An O_CREAT open missing the O_EXCL flag in a sticky directory is,
> +often, a bug or a synthom of the fact that the program is not
> +using appropriate procedures to access sticky directories.
> +This protection allow to detect and possibly block these unsafe
> +open invocations, even if the files don't exist yet.
> +Though should be noted that, sometimes, it's OK to open a file
> +with O_CREAT and without O_EXCL (e.g. shared lock files based
> +on flock()), for this reason values above 2 should be set
> +with care.
> +
> +When set to "0" the protection is disabled.
> +
> +When set to "1", notify about O_CREAT open missing the O_EXCL flag
> +in world writable sticky directories.
> +
> +When set to "2", notify about O_CREAT open missing the O_EXCL flag
> +in world or group writable sticky directories.
> +
> +When set to "3", block O_CREAT open missing the O_EXCL flag
> +in world writable sticky directories and notify (but don't block)
> +in group writable sticky directories.
> +
> +When set to "4", block O_CREAT open missing the O_EXCL flag
> +in world writable and group writable sticky directories.

This seems insufficiently flexible.  For example, there is no way for me
to specify that I want to block O_CREAT without O_EXCL in world-writable,
but not be notified about O_CREAT without O_EXCL in group-writable.

And maybe I want to be notified that blocking has happened?

Why not make it bits?  So:

0 => notify in world
1 => block in world
2 => notify in group
3 => block in group

So you'd have the following meaningful values:

 0 - permit all (your option 0)
 1 - notify world; permit group (your option 1)
 2 - block world; permit group
 3 - block,notify world; permit group
 4 - permit world; notify group (?)
 5 - notify world; notify group (your option 2)
 6 - block world; notify group (your option 3)
 7 - block,notify world; notify group
 8 - permit world; block group (?)
 9 - notify world; block group (?)
10 - block world; block group (your option 4)
11 - block,notify world; block group
12 - permit world; block, notify group (?)
13 - notify world; block, notify group (?)
14 - block world; block, notify group
15 - block, notify world; block, notify group

Some of these don't make a lot of sense (marked with ?), but I don't see
the harm in permitting a sysadmin to do something that seems nonsensical
to me.
Pavel Vasilyev Nov. 22, 2017, 2:38 p.m. UTC | #3
22.11.2017 16:22, Matthew Wilcox пишет:

> 
> So you'd have the following meaningful values:
> 
>  0 - permit all (your option 0)
>  1 - notify world; permit group (your option 1)
>  2 - block world; permit group
>  3 - block,notify world; permit group
>  4 - permit world; notify group (?)
>  5 - notify world; notify group (your option 2)
>  6 - block world; notify group (your option 3)
>  7 - block,notify world; notify group
>  8 - permit world; block group (?)
>  9 - notify world; block group (?)
> 10 - block world; block group (your option 4)
> 11 - block,notify world; block group
> 12 - permit world; block, notify group (?)
> 13 - notify world; block, notify group (?)
> 14 - block world; block, notify group
> 15 - block, notify world; block, notify group
> 
> Some of these don't make a lot of sense (marked with ?), but I don't see
> the harm in permitting a sysadmin to do something that seems nonsensical
> to me.
> 

I think that notification in block mode by default.
Alan Cox Nov. 22, 2017, 4:51 p.m. UTC | #4
On Wed, 22 Nov 2017 09:01:46 +0100
Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote:

> Disallows O_CREAT open missing the O_EXCL flag, in world or
> group writable directories, even if the file doesn't exist yet.
> With few exceptions (e.g. shared lock files based on flock())

Enough exceptions to make it a bad idea.

Firstly if you care this much *stop* having shared writable directories.
We have namespaces, you don't need them. You can give every user their
own /tmp etc.

The rest of this only make sense on a per application and directory basis
because there are valid use cases, and that means it wants to be part of
an existing LSM security module where you've got the context required and
you can attach it to a specific directory and/or process.

Alan
Tobin Harding Nov. 23, 2017, 10:57 p.m. UTC | #5
On Wed, Nov 22, 2017 at 09:01:46AM +0100, Salvatore Mesoraca wrote:

Same caveat about this being English language comments only as for patch
1/2. Please ignore if this is too trivial. My grammar is a long way from
perfect, especially please feel free to ignore my placement of commas,
they are often wrong. 

> Disallows O_CREAT open missing the O_EXCL flag, in world or
> group writable directories, even if the file doesn't exist yet.
> With few exceptions (e.g. shared lock files based on flock())
> if a program tries to open a file, in a sticky directory,
> with the O_CREAT flag and without the O_EXCL, it probably has a bug.
> This feature allows to detect and potentially block programs that
> act this way, it can be used to find vulnerabilities (like those
> prevented by patch #1) and to do policy enforcement.
> 
> Suggested-by: Solar Designer <solar@openwall.com>
> Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
> ---
>  Documentation/sysctl/fs.txt | 30 ++++++++++++++++++++++++
>  fs/namei.c                  | 56 +++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/fs.h          |  1 +
>  kernel/sysctl.c             |  9 ++++++++
>  4 files changed, 96 insertions(+)
> 
> diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
> index f3cf2cd..7f24b4f 100644
> --- a/Documentation/sysctl/fs.txt
> +++ b/Documentation/sysctl/fs.txt
> @@ -37,6 +37,7 @@ Currently, these files are in /proc/sys/fs:
>  - protected_fifos
>  - protected_hardlinks
>  - protected_regular
> +- protected_sticky_child_create
>  - protected_symlinks
>  - suid_dumpable
>  - super-max
> @@ -238,6 +239,35 @@ When set to "2" it also applies to group writable sticky directories.
>  
>  ==============================================================
>  
> +protected_sticky_child_create:
> +
> +An O_CREAT open missing the O_EXCL flag in a sticky directory is,
> +often, a bug or a synthom of the fact that the program is not

s/synthom/symptom

> +using appropriate procedures to access sticky directories.
> +This protection allow to detect and possibly block these unsafe

Perhaps

 This protection allows us to detect, and possibly block, these unsafe

> +open invocations, even if the files don't exist yet.
> +Though should be noted that, sometimes, it's OK to open a file

Perhaps

 +Although it should be noted, sometimes it's OK to open a file

(I looked up 'although' vs 'though' and am not quite sure on this one,
it seems to read better with 'although'. Again, apologies if this is
overly trivial.)


Hope this helps,
Tobin.
Salvatore Mesoraca Nov. 24, 2017, 8:28 a.m. UTC | #6
2017-11-22 13:03 GMT+01:00 Pavel Vasilyev <dixlor@gmail.com>:
> ...
>> +
>> +When set to "2", notify about O_CREAT open missing the O_EXCL flag
>> +in world or group writable sticky directories.
>> +
>> +When set to "3", block O_CREAT open missing the O_EXCL flag
>> +in world writable sticky directories and notify (but don't block)
>> +in group writable sticky directories.
>> +
>> +When set to "4", block O_CREAT open missing the O_EXCL flag
>> +in world writable and group writable sticky directories.
>
> May be add:
>
> When set to "X", notify O_CREAT open missing the O_EXCL flag
> in world writable sticky directories and notify in group
> writable sticky directories.

Doesn't the "2" option already do this?
Did I misunderstood something?

Salvatore
Salvatore Mesoraca Nov. 24, 2017, 8:29 a.m. UTC | #7
2017-11-22 14:22 GMT+01:00 Matthew Wilcox <willy@infradead.org>:
> On Wed, Nov 22, 2017 at 09:01:46AM +0100, Salvatore Mesoraca wrote:
>> +An O_CREAT open missing the O_EXCL flag in a sticky directory is,
>> +often, a bug or a synthom of the fact that the program is not
>> +using appropriate procedures to access sticky directories.
>> +This protection allow to detect and possibly block these unsafe
>> +open invocations, even if the files don't exist yet.
>> +Though should be noted that, sometimes, it's OK to open a file
>> +with O_CREAT and without O_EXCL (e.g. shared lock files based
>> +on flock()), for this reason values above 2 should be set
>> +with care.
>> +
>> +When set to "0" the protection is disabled.
>> +
>> +When set to "1", notify about O_CREAT open missing the O_EXCL flag
>> +in world writable sticky directories.
>> +
>> +When set to "2", notify about O_CREAT open missing the O_EXCL flag
>> +in world or group writable sticky directories.
>> +
>> +When set to "3", block O_CREAT open missing the O_EXCL flag
>> +in world writable sticky directories and notify (but don't block)
>> +in group writable sticky directories.
>> +
>> +When set to "4", block O_CREAT open missing the O_EXCL flag
>> +in world writable and group writable sticky directories.
>
> This seems insufficiently flexible.  For example, there is no way for me
> to specify that I want to block O_CREAT without O_EXCL in world-writable,
> but not be notified about O_CREAT without O_EXCL in group-writable.

I understand your concern, I did it like this because I wanted to keep the
interface as simple as possible. But, maybe, more flexibility it's better.

> And maybe I want to be notified that blocking has happened?

I didn't write it explicitly in the doc, but you will always be notified
that blocking has happened. On the other hand you don't have any way to
suppress those notification.

> Why not make it bits?  So:
>
> 0 => notify in world
> 1 => block in world
> 2 => notify in group
> 3 => block in group
>
> So you'd have the following meaningful values:
>
>  0 - permit all (your option 0)
>  1 - notify world; permit group (your option 1)
>  2 - block world; permit group
>  3 - block,notify world; permit group
>  4 - permit world; notify group (?)
>  5 - notify world; notify group (your option 2)
>  6 - block world; notify group (your option 3)
>  7 - block,notify world; notify group
>  8 - permit world; block group (?)
>  9 - notify world; block group (?)
> 10 - block world; block group (your option 4)
> 11 - block,notify world; block group
> 12 - permit world; block, notify group (?)
> 13 - notify world; block, notify group (?)
> 14 - block world; block, notify group
> 15 - block, notify world; block, notify group
>
> Some of these don't make a lot of sense (marked with ?), but I don't see
> the harm in permitting a sysadmin to do something that seems nonsensical
> to me.

I like your idea of using "bits" this way, even if it will allow sysadmins
to set values that don't make too much sense.
Thank you very much for your suggestions,

Salvatore
Salvatore Mesoraca Nov. 24, 2017, 8:31 a.m. UTC | #8
2017-11-22 17:51 GMT+01:00 Alan Cox <gnomes@lxorguk.ukuu.org.uk>:
> On Wed, 22 Nov 2017 09:01:46 +0100
> Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote:
>
>> Disallows O_CREAT open missing the O_EXCL flag, in world or
>> group writable directories, even if the file doesn't exist yet.
>> With few exceptions (e.g. shared lock files based on flock())
>
> Enough exceptions to make it a bad idea.
>
> Firstly if you care this much *stop* having shared writable directories.
> We have namespaces, you don't need them. You can give every user their
> own /tmp etc.
>
> The rest of this only make sense on a per application and directory basis
> because there are valid use cases, and that means it wants to be part of
> an existing LSM security module where you've got the context required and
> you can attach it to a specific directory and/or process.

I think that this feature should be intended more as a "debugging" feature
than as a "security" one.
When the feature implemented in the first patch is enabled, this restriction
doesn't improve security at all and it's not supposed to do it.
The first patch blocks attacks that exploit some unsafe usage of sticky
directories.
This patch, instead, doesn't block actual attacks: it detects (and maybe
blocks) the bad code that can be exploited for the attacks blocked by #1,
even if no one is attacking you in that moment.
This looks like a useful feature to me, even if you already use more
sophisticated security apparatus like LSMs or namespaces, because it makes
easy to find real vulnerabilities in software: the commit message of
patch #1 has a short list of some CVEs that this feature can detect.
Also, being just a sysctl away, it's within anyone's reach.
Probably the "debugging" goal wasn't clear from my previous message,
I'm sorry for the misunderstanding.
Thank you very much for your time,

Salvatore
Salvatore Mesoraca Nov. 24, 2017, 8:34 a.m. UTC | #9
2017-11-23 23:57 GMT+01:00 Tobin C. Harding <me@tobin.cc>:
> On Wed, Nov 22, 2017 at 09:01:46AM +0100, Salvatore Mesoraca wrote:
>
> Same caveat about this being English language comments only as for patch
> 1/2. Please ignore if this is too trivial. My grammar is a long way from
> perfect, especially please feel free to ignore my placement of commas,
> they are often wrong.

As I wrote before: any help is always welcome.
Thank you,

Salvatore
David Laight Nov. 24, 2017, 10:53 a.m. UTC | #10
From: Alan Cox
> Sent: 22 November 2017 16:52
> 
> On Wed, 22 Nov 2017 09:01:46 +0100
> Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote:
> 
> > Disallows O_CREAT open missing the O_EXCL flag, in world or
> > group writable directories, even if the file doesn't exist yet.
> > With few exceptions (e.g. shared lock files based on flock())
> 
> Enough exceptions to make it a bad idea.
> 
> Firstly if you care this much *stop* having shared writable directories.
> We have namespaces, you don't need them. You can give every user their
> own /tmp etc.

Looks like a very bad idea to me as well.

Doesn't this stop all shell redirects into a shared /tmp ?
I'm pretty sure most programs use O_CREAT | O_TRUNC for output
files - they'll all stop working.

If there are some directories where you need to force O_EXCL you
need to make it a property of the directory, not the kernel.

	David
Salvatore Mesoraca Nov. 24, 2017, 11:43 a.m. UTC | #11
2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>:
> From: Alan Cox
>> Sent: 22 November 2017 16:52
>>
>> On Wed, 22 Nov 2017 09:01:46 +0100
>> Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote:
>>
>> > Disallows O_CREAT open missing the O_EXCL flag, in world or
>> > group writable directories, even if the file doesn't exist yet.
>> > With few exceptions (e.g. shared lock files based on flock())
>>
>> Enough exceptions to make it a bad idea.
>>
>> Firstly if you care this much *stop* having shared writable directories.
>> We have namespaces, you don't need them. You can give every user their
>> own /tmp etc.
>
> Looks like a very bad idea to me as well.
>
> Doesn't this stop all shell redirects into a shared /tmp ?
> I'm pretty sure most programs use O_CREAT | O_TRUNC for output
> files - they'll all stop working.

If some program does such a thing, that's a potential vulnerability.
With "protected_hardlinks" you are, in most cases, safe.
But, still, that program has a bug and having this feature enabled will
help you notice it soon.
For that matter, I'm using this patch on my system and I don't have any
program behaving like this.

> If there are some directories where you need to force O_EXCL you
> need to make it a property of the directory, not the kernel.
>
>         David
>
David Laight Nov. 24, 2017, 11:53 a.m. UTC | #12
From: Salvatore Mesoraca [mailto:s.mesoraca16@gmail.com]

> Sent: 24 November 2017 11:44

> 

> 2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>:

> > From: Alan Cox

> >> Sent: 22 November 2017 16:52

> >>

> >> On Wed, 22 Nov 2017 09:01:46 +0100

> >> Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote:

> >>

> >> > Disallows O_CREAT open missing the O_EXCL flag, in world or

> >> > group writable directories, even if the file doesn't exist yet.

> >> > With few exceptions (e.g. shared lock files based on flock())

> >>

> >> Enough exceptions to make it a bad idea.

> >>

> >> Firstly if you care this much *stop* having shared writable directories.

> >> We have namespaces, you don't need them. You can give every user their

> >> own /tmp etc.

> >

> > Looks like a very bad idea to me as well.

> >

> > Doesn't this stop all shell redirects into a shared /tmp ?

> > I'm pretty sure most programs use O_CREAT | O_TRUNC for output

> > files - they'll all stop working.

> 

> If some program does such a thing, that's a potential vulnerability.

> With "protected_hardlinks" you are, in most cases, safe.

> But, still, that program has a bug and having this feature enabled will

> help you notice it soon.

> For that matter, I'm using this patch on my system and I don't have any

> program behaving like this.


Hmmm.... a quick strace shows cp and vi doing stat("/tmp/foo") and then
open(O_WRONLY|O_TRUNC) if it exists and O_CREATE|O_EXCL if it doesn't.
I can't help feeling that is just hiding a race.

	David
Salvatore Mesoraca Nov. 26, 2017, 11:29 a.m. UTC | #13
2017-11-24 12:53 GMT+01:00 David Laight <David.Laight@aculab.com>:
> From: Salvatore Mesoraca [mailto:s.mesoraca16@gmail.com]
>> Sent: 24 November 2017 11:44
>>
>> 2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>:
>> > From: Alan Cox
>> >> Sent: 22 November 2017 16:52
>> >>
>> >> On Wed, 22 Nov 2017 09:01:46 +0100
>> >> Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote:
>> >>
>> >> > Disallows O_CREAT open missing the O_EXCL flag, in world or
>> >> > group writable directories, even if the file doesn't exist yet.
>> >> > With few exceptions (e.g. shared lock files based on flock())
>> >>
>> >> Enough exceptions to make it a bad idea.
>> >>
>> >> Firstly if you care this much *stop* having shared writable directories.
>> >> We have namespaces, you don't need them. You can give every user their
>> >> own /tmp etc.
>> >
>> > Looks like a very bad idea to me as well.
>> >
>> > Doesn't this stop all shell redirects into a shared /tmp ?
>> > I'm pretty sure most programs use O_CREAT | O_TRUNC for output
>> > files - they'll all stop working.
>>
>> If some program does such a thing, that's a potential vulnerability.
>> With "protected_hardlinks" you are, in most cases, safe.
>> But, still, that program has a bug and having this feature enabled will
>> help you notice it soon.
>> For that matter, I'm using this patch on my system and I don't have any
>> program behaving like this.
>
> Hmmm.... a quick strace shows cp and vi doing stat("/tmp/foo") and then
> open(O_WRONLY|O_TRUNC) if it exists and O_CREATE|O_EXCL if it doesn't.
> I can't help feeling that is just hiding a race.

Yes, unfortunately, doing something like "cp somefile /tmp/" is a bad
practice that
in most cases will go unnoticed by this feature. Nevertheless there
are many other
real world vulnerability that it would have been able to detect.
Thank you very much for taking the time to do some experiments.

Salvatore
Solar Designer Nov. 27, 2017, 12:26 a.m. UTC | #14
On Fri, Nov 24, 2017 at 12:43:47PM +0100, Salvatore Mesoraca wrote:
> 2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>:
> > From: Alan Cox
> >> Sent: 22 November 2017 16:52
> >>
> >> On Wed, 22 Nov 2017 09:01:46 +0100 Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote:
> >>
> >> > Disallows O_CREAT open missing the O_EXCL flag, in world or
> >> > group writable directories, even if the file doesn't exist yet.
> >> > With few exceptions (e.g. shared lock files based on flock())

Why would "shared lock files based on flock()" need O_CREAT without
O_EXCL?  Where specifically are they currently used that way?  My guess
would be a group-writable mail spool (that would be fcntl() locks on
Linux now, but it's the same thing for the purpose of this discussion),
but even then I don't see a need for such open flags.  If a program does
that, we could want to find out and revise it (if O_CREAT|O_EXCL fails,
retry without these to open the existing file, then flock() either way).

> >> Enough exceptions to make it a bad idea.

Maybe, but as Salvatore points out we're considering this for rather
limited use - easy opt-in policy enforcement during software development
and auditing (years ago, I was using an LKM for this purpose, which
caught some issues in Postfix that were promptly patched), and maybe on
specific production systems where the sysadmins know what they're doing.

> >> Firstly if you care this much *stop* having shared writable directories.
> >> We have namespaces, you don't need them. You can give every user their
> >> own /tmp etc.

This is good advice (and I've been doing similar with pam_mktemp, prior
to namespaces), but it's not exactly for the same use case.

Also, there are shared writable directories that have to remain shared
unless more things are reworked.  For example, mail spools and crontab
directories, which could be avoided by having the corresponding files in
user homes instead (and it's been done), but that's also a related risk
(a privileged process accessing a directory writable by a user, even if
accessing only for reading, lowering the risk impact).

> > Looks like a very bad idea to me as well.
> >
> > Doesn't this stop all shell redirects into a shared /tmp ?

The above sounds like it'd be something bad - but it's actually just
right for an opt-in policy like this.  A command like "program >
/tmp/file" is generally unsafe (unless the file was pre-created safely,
see below) and should be avoided.  The fact that a lot of documentation
gives commands of this kind only emphasizes the need for easy policy
enforcement against such misuses of /tmp.

Ideally, we'd stop most "shell redirects into a shared /tmp" - all but
those into files correctly pre-created with mktemp(1) and the like.

Technically, we could achieve similar by allowing O_CREAT without O_EXCL
if the file exists _and_ has the same owner as our current fsuid or the
owner is root.  Most scripts that use mktemp(1) then use that file
without switching privileges, so they'll continue working.  If there's
an exception to that on a system where I'd configure a policy like
what's discussed here, I'd rather learn of it and be interested in
looking at that unusual script.  That script would be taking a risk by
making one (pseudo-)user (or root) write into a file in /tmp created
(even if initially safely) by another pseudo-user (not root), thus
letting the latter pseudo-user attack the former (or root).

Ditto for programs using mkstemp(3), but then somehow going through the
pathname rather than the fd.  That's not ideal, but it's sometimes safe
and sometimes makes sense (such as when they need to exec other programs
only accepting a pathname), so the same exception and approach applies.

> > I'm pretty sure most programs use O_CREAT | O_TRUNC for output
> > files - they'll all stop working.

That would be great.  Unfortunately (in this context), you've identified
exceptions, where programs try to be smart but are not necessarily safe.

It's beneficial to be able to easily stop at least some misuses of /tmp
and the like, even if we can't stop all.

> If some program does such a thing, that's a potential vulnerability.

Exactly.

> With "protected_hardlinks" you are, in most cases, safe.

And "protected_symlinks", and not in terms of data spoofing attacks via
FIFOs and regular files (the initial focus of this patchset).

> But, still, that program has a bug and having this feature enabled will
> help you notice it soon.
> For that matter, I'm using this patch on my system and I don't have any
> program behaving like this.
> 
> > If there are some directories where you need to force O_EXCL you
> > need to make it a property of the directory, not the kernel.
> >
> >         David

Good idea, but this would need to be checked and enforced by the kernel
anyhow, so it's a matter of coarse vs. fine policy.  Coarse is much
easier to implement and to start using, albeit perhaps mostly not by
general-purpose distros, but by a software developer/auditor, an
adventurous sysadmin, or a sysadmin team or a developer of specific
systems where a simple policy like this is known to be valid (e.g.,
where there's no expectation of arbitrary software being added, so a
simple rule like this can be introduced system-wide).

Alexander
Salvatore Mesoraca Nov. 30, 2017, 2:39 p.m. UTC | #15
2017-11-27 1:26 GMT+01:00 Solar Designer <solar@openwall.com>:
> On Fri, Nov 24, 2017 at 12:43:47PM +0100, Salvatore Mesoraca wrote:
> > 2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>:
> > > From: Alan Cox
> > >> Sent: 22 November 2017 16:52
> > >>
> > >> On Wed, 22 Nov 2017 09:01:46 +0100 Salvatore Mesoraca <s.mesoraca16@gmail.com> wrote:
> > >>
> > >> > Disallows O_CREAT open missing the O_EXCL flag, in world or
> > >> > group writable directories, even if the file doesn't exist yet.
> > >> > With few exceptions (e.g. shared lock files based on flock())
>
> Why would "shared lock files based on flock()" need O_CREAT without
> O_EXCL?  Where specifically are they currently used that way?

I don't think that they *need* to act like this, but this is how
util-linux's flock(1) currently works.
And it doesn't seem an unreasonable behavior from their perspective,
so maybe other programs do that too.
I was citing that case just to make it clear that, if someone gets
a warning because of flock(1), they shouldn't be worried about it.
That behavior can be certainly avoided, but of course it isn't a
security problem per se.

> If a program does
> that, we could want to find out and revise it (if O_CREAT|O_EXCL fails,
> retry without these to open the existing file, then flock() either way).

Yes, this would probably be the best thing to do, good idea.
Thanks again for your time,

Salvatore
Ian Campbell Nov. 30, 2017, 2:57 p.m. UTC | #16
On Thu, 2017-11-30 at 15:39 +0100, Salvatore Mesoraca wrote:
> 2017-11-27 1:26 GMT+01:00 Solar Designer <solar@openwall.com>:
> > On Fri, Nov 24, 2017 at 12:43:47PM +0100, Salvatore Mesoraca wrote:
> > > 2017-11-24 11:53 GMT+01:00 David Laight <David.Laight@aculab.com>
> > > :
> > > > From: Alan Cox
> > > > > Sent: 22 November 2017 16:52
> > > > > 
> > > > > On Wed, 22 Nov 2017 09:01:46 +0100 Salvatore Mesoraca <s.meso
> > > > > raca16@gmail.com> wrote:
> > > > > 
> > > > > > Disallows O_CREAT open missing the O_EXCL flag, in world or
> > > > > > group writable directories, even if the file doesn't exist
> > > > > > yet.
> > > > > > With few exceptions (e.g. shared lock files based on
> > > > > > flock())
> > 
> > Why would "shared lock files based on flock()" need O_CREAT without
> > O_EXCL?  Where specifically are they currently used that way?
> 
> I don't think that they *need* to act like this, but this is how
> util-linux's flock(1) currently works.
> And it doesn't seem an unreasonable behavior from their perspective,

I thought that too, specifically I reasoned that using O_EXCL would
defeat the purpose of the _shared_ flock(), wouldn't it?

Ian.
Solar Designer Nov. 30, 2017, 4:30 p.m. UTC | #17
Replying to Salvatore and Ian at once, and CC'ing H. Peter Anvin and
Karel Zak for util-linux flock(1).

On Thu, Nov 30, 2017 at 02:57:06PM +0000, Ian Campbell wrote:
> On Thu, 2017-11-30 at 15:39 +0100, Salvatore Mesoraca wrote:
> > 2017-11-27 1:26 GMT+01:00 Solar Designer <solar@openwall.com>:
> > > Why would "shared lock files based on flock()" need O_CREAT without
> > > O_EXCL?  Where specifically are they currently used that way?
> > 
> > I don't think that they *need* to act like this, but this is how
> > util-linux's flock(1) currently works.

Oh, but you never referred to flock(1) in there, so I read flock() as
implying flock(2).  I think util-linux's flock(1) is relatively obscure,
and as far as I can tell it isn't documented anywhere whether it may be
used on untrusted lock file directories or not.  A revision of the
flock(1) man page seen on RHEL7 gives really weird examples using /tmp.
The current util-linux-2.31/sys-utils/flock.1 is similar.  strace'ing
those examples, I see flock(1) literally uses the /tmp directory itself
(not a file inside) as the lock, which surprisingly appears to work.
Checking the util-linux tree, it looks like this was added as a feature.
I suppose the good news is this doesn't appear to allow for worse than a
DoS against the script using flock(1) (since a different user can also
take that lock), unless any of the upper level directories are also
untrusted.  The man page as seen at https://linux.die.net/man/1/flock
currently only lists a more complex example with /var/lock/mylockfile,
where flock(1) itself is asked to access it via a pre-opened fd.
Permissions on /var/lock vary between distros (e.g., a RHEL6'ish system
I just checked has it as 775 root:lock, whereas a RHEL7'ish has it as
symlink to ../run/lock, which is 755 root:root).  Anyway, these are just
examples.  The reality is people will use flock(1) in various directories,
some of them trusted and some not.  If our proposed policy can detect and
optionally break some uses in untrusted directories, that's good since
such uses are currently unsafe (see below).

> > And it doesn't seem an unreasonable behavior from their perspective,
> 
> I thought that too, specifically I reasoned that using O_EXCL would
> defeat the purpose of the _shared_ flock(), wouldn't it?

No.  O_EXCL means the attempt to O_CREAT will fail, at which point the
program can retry without both of these flags.  (The actual lock is
obtained later via flock(2) or fcntl(2) anyway.)  In fact, flock(1)
already does something similar, as tested using the very first example
from the man page on a RHEL7'ish system:

$ strace flock /tmp -c cat
[...]
open("/tmp", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = -1 EISDIR (Is a directory)
open("/tmp", O_RDONLY|O_NOCTTY)         = 3
flock(3, LOCK_EX)                       = 0

As you can see, when O_CREAT failed, the program retried without that
flag.  It could just as easily have tried O_CREAT|O_EXCL, then dropped
both at once.  Testing with a lock file (rather than lock directory):

$ strace flock /tmp/lockfile -c cat
[...]
open("/tmp/lockfile", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = 3
flock(3, LOCK_EX)                       = 0

This use of flock(1) would be a worse vulnerability, not limited to DoS
against the script itself but also allowing for privilege escalation
unless (sym)link restrictions are enabled in the kernel.  Adding O_EXCL
would help (reduce the impact back to DoS against itself), and would
require that the retry logic (like what is seen in the lock directory
example above) be present.

On Thu, Nov 30, 2017 at 03:39:48PM +0100, Salvatore Mesoraca wrote:
> so maybe other programs do that too.

Maybe, and they would be similarly vulnerable if they do that in
untrusted directories, and they would also need to be fixed.

A subtle case is when something like this is technically done in a
directory writable by someone else, but is not necessarily crossing
what the program's author or the sysadmin recognize as a privilege
boundary.  Maybe they have accepted that this one pseudo-user can
attack that other one anyway, such as because under a certain daemon's
privsep model the would-be attacking pseudo-user is necessarily more
privileged anyway.  This is why we shouldn't by default extend this
policy to all directories writable by someone else, but instead we
consider introducing a sysctl with varying policy levels - initially
focusing on sticky directories writable by someone else and only later
optionally extending to non-sticky directories writable by someone else.
The latter mode would have even greater focus on development and
debugging rather than on production use, although I imagine that
specific systems could eventually afford it in production as well.

> I was citing that case just to make it clear that, if someone gets
> a warning because of flock(1), they shouldn't be worried about it.

Checking the util-linux-2.31/sys-utils/flock.c, I see that it in fact
doesn't retry without O_CREAT in the non-directory case.  That would
need to be corrected, along with introduction of O_EXCL.

> That behavior can be certainly avoided, but of course it isn't a
> security problem per se.

I think it is a security problem per se, except in the "subtle case"
above, and it's great that our proposed policy would catch it.

Perhaps flock(1) should be extended as follows:

> > If a program does
> > that, we could want to find out and revise it (if O_CREAT|O_EXCL fails,
> > retry without these to open the existing file, then flock() either way).
> 
> Yes, this would probably be the best thing to do, good idea.

util-linux-2.31/sys-utils/flock.c currently has:

static int open_file(const char *filename, int *flags)
{

	int fd;
	int fl = *flags == 0 ? O_RDONLY : *flags;

	errno = 0;
	fl |= O_NOCTTY | O_CREAT;
	fd = open(filename, fl, 0666);

	/* Linux doesn't like O_CREAT on a directory, even though it
	 * should be a no-op; POSIX doesn't allow O_RDWR or O_WRONLY
	 */
	if (fd < 0 && errno == EISDIR) {
		fl = O_RDONLY | O_NOCTTY;
		fd = open(filename, fl);
	}

I think this should be revised to:

	fl |= O_NOCTTY | O_CREAT | O_EXCL;
	fd = open(filename, fl, 0666);

	/* Linux doesn't like O_CREAT on a directory, even though it
	 * should be a no-op; POSIX doesn't allow O_RDWR or O_WRONLY
	 */
	if (fd < 0) {
		if (errno == EISDIR)
			fl = O_RDONLY | O_NOCTTY;
		else
			fl &= ~(O_CREAT | O_EXCL);
		fd = open(filename, fl);
	}

This adds O_EXCL and retry without O_CREAT|O_EXCL for non-directories.

[ The pre-umask permissions of 0666 are also a potential concern since
many users have a relaxed umask as distro default, but changing these to
0600 would break some otherwise valid uses where a lock file may be
shared between different users on purpose.  Maybe a future major version
of flock(1) could default to 0600 and/or accept a "-m" option to set the
lock file's mode in case a new file gets created.  mktemp(1) may be
viewed as precedent: it uses pre-umask permissions of 0600.  Lock files
are arguably somewhat similar to temporary files.  This is beyond scope
of this thread and LKML, though, and should be discussed elsewhere. ]

Alexander
David Laight Nov. 30, 2017, 4:53 p.m. UTC | #18
From: Salvatore Mesoraca
> Sent: 22 November 2017 08:02
> 
> Disallows O_CREAT open missing the O_EXCL flag, in world or
> group writable directories, even if the file doesn't exist yet.
> With few exceptions (e.g. shared lock files based on flock())
> if a program tries to open a file, in a sticky directory,
> with the O_CREAT flag and without the O_EXCL, it probably has a bug.
> This feature allows to detect and potentially block programs that
> act this way, it can be used to find vulnerabilities (like those
> prevented by patch #1) and to do policy enforcement.

(Going back to the original post)

I presume the 'vulnerabilities' are related to symlinks being created
just before the open?

Trouble is this change breaks a lot of general use of /tmp.
I always assumed that code that cared would use O_EXCL and
everything else wasn't worth subverting.

I found code in vi (and elsewhere) that subverted these checks
by opening with O_WRONLY if stat() showed the file existed and
O_CREAT|O_EXCL if it didn't.

I'm pretty sure that traditionally a lot of these opens were done
with O_CREAT|O_TRUNC.
Implementing that as unlink() followed by a create would stop
'random' (ok all) symlinks being followed.

Overall I'm pretty sure this change will break things badly somewhere.

	David
Solar Designer Nov. 30, 2017, 5:51 p.m. UTC | #19
On Thu, Nov 30, 2017 at 04:53:06PM +0000, David Laight wrote:
> From: Salvatore Mesoraca
> > if a program tries to open a file, in a sticky directory,
> > with the O_CREAT flag and without the O_EXCL, it probably has a bug.
> > This feature allows to detect and potentially block programs that
> > act this way, it can be used to find vulnerabilities (like those
> > prevented by patch #1) and to do policy enforcement.

> I presume the 'vulnerabilities' are related to symlinks being created
> just before the open?

That's one way to exploit them.  We already have a sysctl to try and
block that specific attack, and we're considering adding more, for other
attacks.  But we'd also like an easy way to learn of the vulnerabilities
without anyone trying to exploit them yet, hence this patch.

> Trouble is this change breaks a lot of general use of /tmp.

That's general misuse of /tmp.  Things like "command > /tmp/file"
without having pre-created the file with O_EXCL e.g. by mktemp(1).

> I always assumed that code that cared would use O_EXCL and
> everything else wasn't worth subverting.

Opinions will vary on whether it's worth subverting or not, and someone
may reasonably want to configure this differently on different systems.

> I found code in vi (and elsewhere) that subverted these checks
> by opening with O_WRONLY if stat() showed the file existed and
> O_CREAT|O_EXCL if it didn't.

Yes, such misuses of /tmp and the like by use of those programs - where
the potential consequences would often be less severe (if your example
above is correct, it sounds like it'd be data spoofing but not
overwriting another file over a link) - could remain unnoticed.

> I'm pretty sure that traditionally a lot of these opens were done
> with O_CREAT|O_TRUNC.
> Implementing that as unlink() followed by a create would stop
> 'random' (ok all) symlinks being followed.

That's racy.  We have O_EXCL, and it must be used along with O_CREAT
when the directory is untrusted.  (If it were only about symlinks, we
also already have O_NOFOLLOW.)

> Overall I'm pretty sure this change will break things badly somewhere.

Sure.  We want it to visibly break what was subtly broken, so that the
breakage can be seen and corrected without an attempted attack.

Alexander
David Laight Dec. 1, 2017, 9:46 a.m. UTC | #20
From: Solar Designer
> Sent: 30 November 2017 17:52
> 
> On Thu, Nov 30, 2017 at 04:53:06PM +0000, David Laight wrote:
> > From: Salvatore Mesoraca
> > > if a program tries to open a file, in a sticky directory,
> > > with the O_CREAT flag and without the O_EXCL, it probably has a bug.
> > > This feature allows to detect and potentially block programs that
> > > act this way, it can be used to find vulnerabilities (like those
> > > prevented by patch #1) and to do policy enforcement.
> 
> > I presume the 'vulnerabilities' are related to symlinks being created
> > just before the open?
> 
> That's one way to exploit them.  We already have a sysctl to try and
> block that specific attack, and we're considering adding more, for other
> attacks.  But we'd also like an easy way to learn of the vulnerabilities
> without anyone trying to exploit them yet, hence this patch.
> 
> > Trouble is this change breaks a lot of general use of /tmp.
> 
> That's general misuse of /tmp.  Things like "command > /tmp/file"
> without having pre-created the file with O_EXCL e.g. by mktemp(1).

I'm sorry, I've been using Unix for over 30 years.
/tmp is a place that temporary files were created - nothing special.
Traditionally it was emptied on every boot.
There was never anything that required files be created in any
specific way.

> > I always assumed that code that cared would use O_EXCL and
> > everything else wasn't worth subverting.
> 
> Opinions will vary on whether it's worth subverting or not, and someone
> may reasonably want to configure this differently on different systems.
> 
> > I found code in vi (and elsewhere) that subverted these checks
> > by opening with O_WRONLY if stat() showed the file existed and
> > O_CREAT|O_EXCL if it didn't.
> 
> Yes, such misuses of /tmp and the like by use of those programs - where
> the potential consequences would often be less severe (if your example
> above is correct, it sounds like it'd be data spoofing but not
> overwriting another file over a link) - could remain unnoticed.

It seemed to me that code had been added to avoid issues caused by
this policing of opens.
But the code added is itself racy - so makes the whole thing pointless.

> > I'm pretty sure that traditionally a lot of these opens were done
> > with O_CREAT|O_TRUNC.
> > Implementing that as unlink() followed by a create would stop
> > 'random' (ok all) symlinks being followed.
> 
> That's racy.  We have O_EXCL, and it must be used along with O_CREAT
> when the directory is untrusted.  (If it were only about symlinks, we
> also already have O_NOFOLLOW.)

Not racy if done in the kernel itself.

> > Overall I'm pretty sure this change will break things badly somewhere.
> 
> Sure.  We want it to visibly break what was subtly broken, so that the
> breakage can be seen and corrected without an attempted attack.

Maybe, but it will break some user system when they do a kernel update.
It won't necessarily break a developer's system first.

And, AFAICT, there is already code that subverts any tests by doing
a check for the file existing and then selecting between two different
types on open.

	David
Alan Cox Dec. 1, 2017, 3:52 p.m. UTC | #21
> > That's general misuse of /tmp.  Things like "command > /tmp/file"
> > without having pre-created the file with O_EXCL e.g. by mktemp(1).  
> 
> I'm sorry, I've been using Unix for over 30 years.
> /tmp is a place that temporary files were created - nothing special.
> Traditionally it was emptied on every boot.
> There was never anything that required files be created in any
> specific way.

And in 1978 you had to boot single user and use nckeck and icheck to fix
the filesystem up by hand, you had no networking, no systemd, no
sysvinit, no ANSI C. no X11 ... (shall I go on...)

There are reasons it all changed. The origin of /tmp is a compromise of
security and disk performance made in the 1970s about an OS that was
quite different, running on a machine with typically 256K of RAM, no RAM
disks, a single very expensive fixed head drive and a larger moving head
one.

The existence of /tmp in that form today is a bizarre historic quirk.
Fortunately if you want a perfectly safe /tmp/ use namespaces and every
user can have their own private /tmp.

Alan
Salvatore Mesoraca Dec. 5, 2017, 10:21 a.m. UTC | #22
2017-11-30 17:30 GMT+01:00 Solar Designer <solar@openwall.com>:
> Replying to Salvatore and Ian at once, and CC'ing H. Peter Anvin and
> Karel Zak for util-linux flock(1).
>
> On Thu, Nov 30, 2017 at 02:57:06PM +0000, Ian Campbell wrote:
> > On Thu, 2017-11-30 at 15:39 +0100, Salvatore Mesoraca wrote:
> > > 2017-11-27 1:26 GMT+01:00 Solar Designer <solar@openwall.com>:
> > > > Why would "shared lock files based on flock()" need O_CREAT without
> > > > O_EXCL?  Where specifically are they currently used that way?
> > >
> > > I don't think that they *need* to act like this, but this is how
> > > util-linux's flock(1) currently works.
>
> Oh, but you never referred to flock(1) in there, so I read flock() as
> implying flock(2).

That's true, I observed that behavior in flock(1), but I didn't specify
it was flock(1) because I thought that there may be some other users of
flock(2) that works that way.

> I think util-linux's flock(1) is relatively obscure,
> and as far as I can tell it isn't documented anywhere whether it may be
> used on untrusted lock file directories or not.  A revision of the
> flock(1) man page seen on RHEL7 gives really weird examples using /tmp.
> The current util-linux-2.31/sys-utils/flock.1 is similar.  strace'ing
> those examples, I see flock(1) literally uses the /tmp directory itself
> (not a file inside) as the lock, which surprisingly appears to work.
> Checking the util-linux tree, it looks like this was added as a feature.
> I suppose the good news is this doesn't appear to allow for worse than a
> DoS against the script using flock(1) (since a different user can also
> take that lock), unless any of the upper level directories are also
> untrusted.  The man page as seen at https://linux.die.net/man/1/flock
> currently only lists a more complex example with /var/lock/mylockfile,
> where flock(1) itself is asked to access it via a pre-opened fd.
> Permissions on /var/lock vary between distros (e.g., a RHEL6'ish system
> I just checked has it as 775 root:lock, whereas a RHEL7'ish has it as
> symlink to ../run/lock, which is 755 root:root).  Anyway, these are just
> examples.  The reality is people will use flock(1) in various directories,
> some of them trusted and some not.  If our proposed policy can detect and
> optionally break some uses in untrusted directories, that's good since
> such uses are currently unsafe (see below).
>
> > > And it doesn't seem an unreasonable behavior from their perspective,
> >
> > I thought that too, specifically I reasoned that using O_EXCL would
> > defeat the purpose of the _shared_ flock(), wouldn't it?
>
> No.  O_EXCL means the attempt to O_CREAT will fail, at which point the
> program can retry without both of these flags.  (The actual lock is
> obtained later via flock(2) or fcntl(2) anyway.)  In fact, flock(1)
> already does something similar, as tested using the very first example
> from the man page on a RHEL7'ish system:
>
> $ strace flock /tmp -c cat
> [...]
> open("/tmp", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = -1 EISDIR (Is a directory)
> open("/tmp", O_RDONLY|O_NOCTTY)         = 3
> flock(3, LOCK_EX)                       = 0
>
> As you can see, when O_CREAT failed, the program retried without that
> flag.  It could just as easily have tried O_CREAT|O_EXCL, then dropped
> both at once.  Testing with a lock file (rather than lock directory):
>
> $ strace flock /tmp/lockfile -c cat
> [...]
> open("/tmp/lockfile", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = 3
> flock(3, LOCK_EX)                       = 0
>
> This use of flock(1) would be a worse vulnerability, not limited to DoS
> against the script itself but also allowing for privilege escalation
> unless (sym)link restrictions are enabled in the kernel.  Adding O_EXCL
> would help (reduce the impact back to DoS against itself), and would
> require that the retry logic (like what is seen in the lock directory
> example above) be present.
>
> On Thu, Nov 30, 2017 at 03:39:48PM +0100, Salvatore Mesoraca wrote:
> > so maybe other programs do that too.
>
> Maybe, and they would be similarly vulnerable if they do that in
> untrusted directories, and they would also need to be fixed.
>
> A subtle case is when something like this is technically done in a
> directory writable by someone else, but is not necessarily crossing
> what the program's author or the sysadmin recognize as a privilege
> boundary.  Maybe they have accepted that this one pseudo-user can
> attack that other one anyway, such as because under a certain daemon's
> privsep model the would-be attacking pseudo-user is necessarily more
> privileged anyway.  This is why we shouldn't by default extend this
> policy to all directories writable by someone else, but instead we
> consider introducing a sysctl with varying policy levels - initially
> focusing on sticky directories writable by someone else and only later
> optionally extending to non-sticky directories writable by someone else.
> The latter mode would have even greater focus on development and
> debugging rather than on production use, although I imagine that
> specific systems could eventually afford it in production as well.
>
> > I was citing that case just to make it clear that, if someone gets
> > a warning because of flock(1), they shouldn't be worried about it.
>
> Checking the util-linux-2.31/sys-utils/flock.c, I see that it in fact
> doesn't retry without O_CREAT in the non-directory case.  That would
> need to be corrected, along with introduction of O_EXCL.
>
> > That behavior can be certainly avoided, but of course it isn't a
> > security problem per se.
>
> I think it is a security problem per se, except in the "subtle case"
> above, and it's great that our proposed policy would catch it.

I agree on the DoS, though, at first, I didn't consider it a "bug" because
there isn't any open mode that can prevent the DoS in this case.
If you want to avoid it, you must avoid other-users-writable directories
at all. So, It think that, if you are using a sticky directory, it's
intended behavior to let someone else "lock" you.
But maybe many flock(1) users are not aware of the issue and so, sometimes,
it can be unintended.
I didn't consider privilege escalation as an issue because I always
looked at flock(1) under the assumption that the lockfile is never actually
read or modified in any way and so it shouldn't make too much difference if
it's an already existing regular file or a symlink etc.
Am I missing something?

Salvatore
Solar Designer Dec. 7, 2017, 9:47 p.m. UTC | #23
On Tue, Dec 05, 2017 at 11:21:00AM +0100, Salvatore Mesoraca wrote:
> 2017-11-30 17:30 GMT+01:00 Solar Designer <solar@openwall.com>:
> > $ strace flock /tmp/lockfile -c cat
> > [...]
> > open("/tmp/lockfile", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = 3
> > flock(3, LOCK_EX)                       = 0
> >
> > This use of flock(1) would be a worse vulnerability, not limited to DoS
> > against the script itself but also allowing for privilege escalation
> > unless (sym)link restrictions are enabled in the kernel.  Adding O_EXCL
> > would help (reduce the impact back to DoS against itself), and would
> > require that the retry logic (like what is seen in the lock directory
> > example above) be present.

> > > That behavior can be certainly avoided, but of course it isn't a
> > > security problem per se.
> >
> > I think it is a security problem per se, except in the "subtle case"
> > above, and it's great that our proposed policy would catch it.
> 
> I agree on the DoS, though, at first, I didn't consider it a "bug" because
> there isn't any open mode that can prevent the DoS in this case.
> If you want to avoid it, you must avoid other-users-writable directories
> at all. So, It think that, if you are using a sticky directory, it's
> intended behavior to let someone else "lock" you.

Right.  There's a worse DoS I had overlooked, though: flock(1) can also
be made to create and/or lock another file (maybe in another directory).
Perhaps adding O_NOFOLLOW (alongside flock(1)'s existing use of
O_NOCTTY) would be a good idea, even though uses in a directory writable
by someone else are inherently risky anyway.

> But maybe many flock(1) users are not aware of the issue and so, sometimes,
> it can be unintended.
> I didn't consider privilege escalation as an issue because I always
> looked at flock(1) under the assumption that the lockfile is never actually
> read or modified in any way and so it shouldn't make too much difference if
> it's an already existing regular file or a symlink etc.
> Am I missing something?

You made a good point, but yes: O_CREAT will follow a dangling symlink
and there are cases where creating an empty file of an attacker-chosen
pathname may allow for privilege escalation.  For example, crontab(1)
man page on RHEL7 says regarding /etc/cron.allow and /etc/cron.deny:

"If neither of these files exists, only the super user is allowed to
use cron."

In that configuration, simply creating empty /etc/cron.deny grants
access to crontab(1) to all users.  As user:

$ crontab -l
You (solar) are not allowed to use this program (crontab)
See crontab(1) for more information
$ ln -s /etc/cron.deny /tmp/lockfile

As root:

# sysctl -w fs.protected_symlinks=0
fs.protected_symlinks = 0
# flock /tmp/lockfile -c echo

As user:

$ crontab -l
no crontab for solar

There may also be side-effects on open of device files (the best known
example is rewinding a tape), and we won't avoid those by retrying
without O_CREAT|O_EXCL.  O_NOFOLLOW will help against symlinks to device
files, but not against hard links (if on same device).  The kernel's
symlink and hardlink protections help, but in this sub-thread we were
discussing detecting userspace software issues without waiting for an
attack.  Things like this fit David Laight's point well: programs trying
to make risky (mis)uses less risky sometimes also avoid being detected
by our proposed policy.  That's life.

Alexander
Salvatore Mesoraca Dec. 11, 2017, 12:08 p.m. UTC | #24
2017-12-07 22:47 GMT+01:00 Solar Designer <solar@openwall.com>:
> On Tue, Dec 05, 2017 at 11:21:00AM +0100, Salvatore Mesoraca wrote:
> > 2017-11-30 17:30 GMT+01:00 Solar Designer <solar@openwall.com>:
> > > $ strace flock /tmp/lockfile -c cat
> > > [...]
> > > open("/tmp/lockfile", O_RDONLY|O_CREAT|O_NOCTTY, 0666) = 3
> > > flock(3, LOCK_EX)                       = 0
> > >
> > > This use of flock(1) would be a worse vulnerability, not limited to DoS
> > > against the script itself but also allowing for privilege escalation
> > > unless (sym)link restrictions are enabled in the kernel.  Adding O_EXCL
> > > would help (reduce the impact back to DoS against itself), and would
> > > require that the retry logic (like what is seen in the lock directory
> > > example above) be present.
>
> > > > That behavior can be certainly avoided, but of course it isn't a
> > > > security problem per se.
> > >
> > > I think it is a security problem per se, except in the "subtle case"
> > > above, and it's great that our proposed policy would catch it.
> >
> > I agree on the DoS, though, at first, I didn't consider it a "bug" because
> > there isn't any open mode that can prevent the DoS in this case.
> > If you want to avoid it, you must avoid other-users-writable directories
> > at all. So, It think that, if you are using a sticky directory, it's
> > intended behavior to let someone else "lock" you.
>
> Right.  There's a worse DoS I had overlooked, though: flock(1) can also
> be made to create and/or lock another file (maybe in another directory).
> Perhaps adding O_NOFOLLOW (alongside flock(1)'s existing use of
> O_NOCTTY) would be a good idea, even though uses in a directory writable
> by someone else are inherently risky anyway.
>
> > But maybe many flock(1) users are not aware of the issue and so, sometimes,
> > it can be unintended.
> > I didn't consider privilege escalation as an issue because I always
> > looked at flock(1) under the assumption that the lockfile is never actually
> > read or modified in any way and so it shouldn't make too much difference if
> > it's an already existing regular file or a symlink etc.
> > Am I missing something?
>
> You made a good point, but yes: O_CREAT will follow a dangling symlink
> and there are cases where creating an empty file of an attacker-chosen
> pathname may allow for privilege escalation.  For example, crontab(1)
> man page on RHEL7 says regarding /etc/cron.allow and /etc/cron.deny:
>
> "If neither of these files exists, only the super user is allowed to
> use cron."
>
> In that configuration, simply creating empty /etc/cron.deny grants
> access to crontab(1) to all users.  As user:
>
> $ crontab -l
> You (solar) are not allowed to use this program (crontab)
> See crontab(1) for more information
> $ ln -s /etc/cron.deny /tmp/lockfile
>
> As root:
>
> # sysctl -w fs.protected_symlinks=0
> fs.protected_symlinks = 0
> # flock /tmp/lockfile -c echo
>
> As user:
>
> $ crontab -l
> no crontab for solar

Ah, right. I didn't think of it.

> There may also be side-effects on open of device files (the best known
> example is rewinding a tape), and we won't avoid those by retrying
> without O_CREAT|O_EXCL.  O_NOFOLLOW will help against symlinks to device
> files, but not against hard links (if on same device).  The kernel's
> symlink and hardlink protections help, but in this sub-thread we were
> discussing detecting userspace software issues without waiting for an
> attack.  Things like this fit David Laight's point well: programs trying
> to make risky (mis)uses less risky sometimes also avoid being detected
> by our proposed policy.  That's life.

Yes, unfortunately some bad behaviors will go unnoticed. But many others won't,
so I thinks this is still a useful feature to have.

Salvatore
diff mbox

Patch

diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index f3cf2cd..7f24b4f 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -37,6 +37,7 @@  Currently, these files are in /proc/sys/fs:
 - protected_fifos
 - protected_hardlinks
 - protected_regular
+- protected_sticky_child_create
 - protected_symlinks
 - suid_dumpable
 - super-max
@@ -238,6 +239,35 @@  When set to "2" it also applies to group writable sticky directories.
 
 ==============================================================
 
+protected_sticky_child_create:
+
+An O_CREAT open missing the O_EXCL flag in a sticky directory is,
+often, a bug or a synthom of the fact that the program is not
+using appropriate procedures to access sticky directories.
+This protection allow to detect and possibly block these unsafe
+open invocations, even if the files don't exist yet.
+Though should be noted that, sometimes, it's OK to open a file
+with O_CREAT and without O_EXCL (e.g. shared lock files based
+on flock()), for this reason values above 2 should be set
+with care.
+
+When set to "0" the protection is disabled.
+
+When set to "1", notify about O_CREAT open missing the O_EXCL flag
+in world writable sticky directories.
+
+When set to "2", notify about O_CREAT open missing the O_EXCL flag
+in world or group writable sticky directories.
+
+When set to "3", block O_CREAT open missing the O_EXCL flag
+in world writable sticky directories and notify (but don't block)
+in group writable sticky directories.
+
+When set to "4", block O_CREAT open missing the O_EXCL flag
+in world writable and group writable sticky directories.
+
+==============================================================
+
 protected_symlinks:
 
 A long-standing class of security issues is the symlink-based
diff --git a/fs/namei.c b/fs/namei.c
index 92992ad..fcee423 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -904,6 +904,7 @@  static inline void put_link(struct nameidata *nd)
 int sysctl_protected_hardlinks __read_mostly = 0;
 int sysctl_protected_fifos __read_mostly;
 int sysctl_protected_regular __read_mostly;
+int sysctl_protected_sticky_child_create __read_mostly;
 
 /**
  * may_follow_link - Check symlink following for unsafe situations
@@ -1065,6 +1066,53 @@  static int may_create_in_sticky(struct dentry * const dir,
 	return 0;
 }
 
+/**
+ * may_create_no_excl - Detect and possibly block unsafe O_CREAT open
+ *			without O_EXCL.
+ * @dir: the stick parent directory
+ * @name: the file name
+ * @inode: the inode of the file to open (can be NULL to skip uid checks)
+ *
+ * When sysctl_protected_sticky_child_create is set to "0" the
+ * protection is disabled.
+ * When it's set to "1", notify about O_CREAT open missing the O_EXCL flag
+ * in world writable sticky directories.
+ * When it's set to "2", notify about O_CREAT open missing the O_EXCL flag
+ * in group writable sticky directories.
+ * When it's set to "3", block O_CREAT open missing the O_EXCL flag
+ * in world writable sticky directories and notify (but don't block)
+ * in group writable sticky directories.
+ * When it's set to "4", block O_CREAT open missing the O_EXCL flag
+ * in world writable and group writable sticky directories.
+ *
+ * Returns 0 if the open is allowed, -ve on error.
+ */
+static int may_create_no_excl(struct dentry * const dir,
+			      const unsigned char * const name,
+			      struct inode * const inode)
+{
+	umode_t mode = dir->d_inode->i_mode;
+
+	if (likely(!(mode & S_ISVTX)))
+		return 0;
+	if (inode && uid_eq(current_fsuid(), inode->i_uid))
+		return 0;
+
+	if ((sysctl_protected_sticky_child_create && likely(mode & 0002)) ||
+	    (sysctl_protected_sticky_child_create >= 2 && mode & 0020)) {
+		pr_notice_ratelimited("unsafe O_CREAT open (missing O_EXCL) of '%s' in a sticky directory by UID %u, EUID %u, process %s:%d.\n",
+				      name,
+				      from_kuid(&init_user_ns, current_uid()),
+				      from_kuid(&init_user_ns, current_euid()),
+				      current->comm, current->pid);
+		if (sysctl_protected_sticky_child_create >= 4 ||
+		    (sysctl_protected_sticky_child_create == 3 &&
+		     likely(mode & 0002)))
+			return -EACCES;
+	}
+	return 0;
+}
+
 static __always_inline
 const char *get_link(struct nameidata *nd)
 {
@@ -3256,6 +3304,11 @@  static int lookup_open(struct nameidata *nd, struct path *path,
 			error = -EACCES;
 			goto out_dput;
 		}
+		if (!(open_flag & O_EXCL)) {
+			error = may_create_no_excl(dir, nd->last.name, NULL);
+			if (unlikely(error))
+				goto out_dput;
+		}
 		error = dir_inode->i_op->create(dir_inode, dentry, mode,
 						open_flag & O_EXCL);
 		if (error)
@@ -3422,6 +3475,9 @@  static int do_last(struct nameidata *nd,
 		error = may_create_in_sticky(dir, nd->last.name, inode);
 		if (unlikely(error))
 			goto out;
+		error = may_create_no_excl(dir, nd->last.name, inode);
+		if (unlikely(error))
+			goto out;
 	}
 	error = -ENOTDIR;
 	if ((nd->flags & LOOKUP_DIRECTORY) && !d_can_lookup(nd->path.dentry))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6fb45a52..3ab37e0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -74,6 +74,7 @@ 
 extern int sysctl_protected_hardlinks;
 extern int sysctl_protected_fifos;
 extern int sysctl_protected_regular;
+extern int sysctl_protected_sticky_child_create;
 
 typedef __kernel_rwf_t rwf_t;
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 590fbc9..012c739 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1817,6 +1817,15 @@  static int sysrq_sysctl_handler(struct ctl_table *table, int write,
 		.extra2		= &two,
 	},
 	{
+		.procname	= "protected_sticky_child_create",
+		.data		= &sysctl_protected_sticky_child_create,
+		.maxlen		= sizeof(int),
+		.mode		= 0600,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &four,
+	},
+	{
 		.procname	= "suid_dumpable",
 		.data		= &suid_dumpable,
 		.maxlen		= sizeof(int),