diff mbox

fanotify read returns with errno == EOPENSTALE

Message ID CAOQ4uxjA8Z7wnwuHFCKeKT1xz0Gh-qS40Y6cfPi3fAM0MKsRuQ@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Amir Goldstein April 20, 2017, 11:33 a.m. UTC
On Thu, Apr 20, 2017 at 2:06 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, Apr 19, 2017 at 4:46 PM, Marko Rauhamaa
> <marko.rauhamaa@f-secure.com> wrote:
>> Amir Goldstein <amir73il@gmail.com>:
>>
>>> On Thu, Mar 23, 2017 at 8:43 AM, Marko Rauhamaa
>>> <marko.rauhamaa@f-secure.com> wrote:
>>>> Jeff Layton <jlayton@poochiereds.net>:
>>>>
>>>>> It was definitely not the intention to leak this error code to
>>>>> userland. EOPENSTALE is not a POSIX sanctioned error code, so
>>>>> applications generally don't know anything about it and will be
>>>>> confused.
>>>>
>>>> Got it. I will try to work on a reproduction and make a proper bug
>>>> report.
>>>
>>> Try this:
>>>
>>> - watch a single file for permissions events (so you will only have
>>> one event in the queue)
>>> - open file from client to generate single event (don't read event yet)
>>> - remove file from server (to make it stale)
>>> - read event (with stale file)
>>
>> I did that and reproduced the problem on a recent development kernel.
>> Happens every time.
>>
>> Just take the example program listed under "man fanotify" ("fantest")
>> and follow these steps:
>>
>>     ==============================================================
>>     NFS Server    NFS Client(1)     NFS Client(2)
>>     ==============================================================
>>     # echo foo >/nfsshare/bar.txt
>>                   # cat /nfsshare/bar.txt
>>                   foo
>>                                     # ./fantest /nfsshare
>>                                     Press enter key to terminate.
>>                                     Listening for events.
>>     # rm -f /nfsshare/bar.txt
>>                   # cat /nfsshare/bar.txt
>>                                     read: Unknown error 518
>>                   cat: /nfsshare/bar.txt: Operation not permitted
>>     ==============================================================
>>
>> where NFS Client (1) and (2) are two terminal sessions on a single NFS
>> Client machine.
>>
>
> Thanks for the reproducer.
> I'll try it myself when I get to it.
>
>> So what do we conclude? Is this a kernel bug or works as designed?
>>
>
> Exposing EOPENSTALE to userspace is definitely a kernel bug.
>
>
>>> Oh my. I completely misread your report before. I though you were
>>> trying to read from the event->fd. Now I understand that you mean read
>>> from fanotify fd. That will definitely return the error, but only in
>>> the special case where open error happened on the first event being
>>> read to the buffer. If error happens after adding some events to the
>>> buffer, fanotify process will not know about this. Regular event will
>>> be silently dropped and permission event will be denied.
>>>
>>> [...]
>>>
>>> You do NOT need to call fanotify_init() again, the next read will read
>>> the next event.
>>
>> It does appear that reading the fanotify fd again does the trick.
>>
>> However, the client gets an EPERM instead of ENOENT, which is a bit
>> weird.
>>
>
> Why would the client get ENOENT? That EOPENSTALE event is already
> consumed, the client reads the next event in the queue.

Sorry, I keep confusing when you refer to read of file and read of fanotify fd
when kernel fails to get response from fanotify daemon it will deny access
to file. That's the default.

>
>>> The fix seems trivial and I can post it once you have the test:
>>> - return EAGAIN for read in case of a single event in queue without fd
>>>   so apps getting the read error will have a good idea what to do
>>> - in case of non single event, maybe copy event with error on event->fd
>>>   to the buffer for specific errors that make sense to report (EMFILE)
>>>   so a watcher checks the values of negative event->fd can maybe do
>>>   something about it (e.g. provide a smaller buffer).
>>
>> EAGAIN would be perfect for me since I'm using fanotify in a nonblocking
>> mode. It might be a bit surprising in the blocking case.
>>
>>
>
> Can you please try this patch?
> Can you please try it with blocking and non-blocking
> Can you please try to add to reproducer the non empty queue case:
> - Add another mark on another mount without PERM events in the mask
> - Populate other mount with some files
> - Before reading from nfsshare, read from other mount to fill the
>    event queue, e.g.:
> # cat /tmp/foo* /nfsshare/bar.txt /tmp/bar*
>
> This should result (depending on number of files) with
>>= 2 buffer reads - first with /tmp/foo* files access
> last with /tmp/bar* files access
>
>

Sorry I messed up the previous patch. please try this one:

                 * events can be destroyed now.
@@ -305,7 +315,7 @@ static ssize_t fanotify_read(struct file *file,
char __user *buf,
                                break;
                } else {
 #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
-                       if (ret < 0) {
+                       if (ret <= 0) {
                                FANOTIFY_PE(kevent)->response = FAN_DENY;
                                wake_up(&group->fanotify_data.access_waitq);
                                break;

Comments

Marko Rauhamaa April 20, 2017, 12:43 p.m. UTC | #1
Amir Goldstein <amir73il@gmail.com>:

> Sorry I messed up the previous patch. please try this one:

I will try it.

> +                        * do the right thing if there are no more events to
> +                        * read (i.e. return bytes read, -EAGAIN or wait).

EAGAIN is the right thing to do when FAN_NONBLOCK has been specified.
Without FAN_NONBLOCK, EAGAIN is bound to confuse the application. That
could be documented, of course.

More importantly, does EAGAIN here still guarantee EPOLLET semantics of
epoll(7)? IOW, if I get EAGAIN, I shouldn't have to try read(2)ing the
fanotify fd again before calling epoll_wait(2).


Marko
Amir Goldstein April 20, 2017, 1:34 p.m. UTC | #2
On Thu, Apr 20, 2017 at 3:43 PM, Marko Rauhamaa
<marko.rauhamaa@f-secure.com> wrote:
> Amir Goldstein <amir73il@gmail.com>:
>
>> Sorry I messed up the previous patch. please try this one:
>
> I will try it.
>
>> +                        * do the right thing if there are no more events to
>> +                        * read (i.e. return bytes read, -EAGAIN or wait).
>
> EAGAIN is the right thing to do when FAN_NONBLOCK has been specified.
> Without FAN_NONBLOCK, EAGAIN is bound to confuse the application. That
> could be documented, of course.
>

My comment says "do the right thing ... -EAGAIN or wait", meaning depending
FAN_NONBLOCK. The same code that checks for FAN_NONBLOCK will
take care of that. My patch only takes care of dropping the stale event and
continue to next event. If there is no next event, code will "do the
right thing".

> More importantly, does EAGAIN here still guarantee EPOLLET semantics of
> epoll(7)? IOW, if I get EAGAIN, I shouldn't have to try read(2)ing the
> fanotify fd again before calling epoll_wait(2).
>

Yes, if you get EAGAIN it means there are no more events in the queue,
so shouldn't have to try read again.

Amir.
Jan Kara April 20, 2017, 2:20 p.m. UTC | #3
On Thu 20-04-17 14:33:04, Amir Goldstein wrote:
> 
> Sorry I messed up the previous patch. please try this one:
> 
> diff --git a/fs/notify/fanotify/fanotify_user.c
> b/fs/notify/fanotify/fanotify_user.c
> index 2b37f27..7864354 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -295,6 +295,16 @@ static ssize_t fanotify_read(struct file *file,
> char __user *buf,
>                 }
> 
>                 ret = copy_event_to_user(group, kevent, buf);
> +               if (unlikely(ret == -EOPENSTALE)) {
> +                       /*
> +                        * We cannot report events with stale fd so drop it.
> +                        * Setting ret to 0 will continue the event loop and
> +                        * do the right thing if there are no more events to
> +                        * read (i.e. return bytes read, -EAGAIN or wait).
> +                        */
> +                       ret = 0;
> +               }
> +
>                 /*
>                  * Permission events get queued to wait for response.  Other
>                  * events can be destroyed now.
> @@ -305,7 +315,7 @@ static ssize_t fanotify_read(struct file *file,
> char __user *buf,
>                                 break;
>                 } else {
>  #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
> -                       if (ret < 0) {
> +                       if (ret <= 0) {
>                                 FANOTIFY_PE(kevent)->response = FAN_DENY;
>                                 wake_up(&group->fanotify_data.access_waitq);
>                                 break;

I don't think you want to break out of the reading loop when ret == 0 and
the code might be more readable as:

               if (!(kevent->mask & FAN_ALL_PERM_EVENTS)) {
                        fsnotify_destroy_event(group, kevent);
               } else {
#ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
                        if (ret <= 0) {
                                FANOTIFY_PE(kevent)->response = FAN_DENY;
                                wake_up(&group->fanotify_data.access_waitq);
                        } else {
                                spin_lock(&group->notification_lock);
                                list_add_tail(&kevent->list,
                                              &group->fanotify_data.access_list);
                                spin_unlock(&group->notification_lock);
                        }
#endif
                }
                if (ret < 0)
                        break;

Hmm?

								Honza
Amir Goldstein April 20, 2017, 3:06 p.m. UTC | #4
On Thu, Apr 20, 2017 at 5:20 PM, Jan Kara <jack@suse.cz> wrote:
> On Thu 20-04-17 14:33:04, Amir Goldstein wrote:
>>
>> Sorry I messed up the previous patch. please try this one:
>>
>> diff --git a/fs/notify/fanotify/fanotify_user.c
>> b/fs/notify/fanotify/fanotify_user.c
>> index 2b37f27..7864354 100644
>> --- a/fs/notify/fanotify/fanotify_user.c
>> +++ b/fs/notify/fanotify/fanotify_user.c
>> @@ -295,6 +295,16 @@ static ssize_t fanotify_read(struct file *file,
>> char __user *buf,
>>                 }
>>
>>                 ret = copy_event_to_user(group, kevent, buf);
>> +               if (unlikely(ret == -EOPENSTALE)) {
>> +                       /*
>> +                        * We cannot report events with stale fd so drop it.
>> +                        * Setting ret to 0 will continue the event loop and
>> +                        * do the right thing if there are no more events to
>> +                        * read (i.e. return bytes read, -EAGAIN or wait).
>> +                        */
>> +                       ret = 0;
>> +               }
>> +
>>                 /*
>>                  * Permission events get queued to wait for response.  Other
>>                  * events can be destroyed now.
>> @@ -305,7 +315,7 @@ static ssize_t fanotify_read(struct file *file,
>> char __user *buf,
>>                                 break;
>>                 } else {
>>  #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
>> -                       if (ret < 0) {
>> +                       if (ret <= 0) {
>>                                 FANOTIFY_PE(kevent)->response = FAN_DENY;
>>                                 wake_up(&group->fanotify_data.access_waitq);
>>                                 break;
>
> I don't think you want to break out of the reading loop when ret == 0 and
> the code might be more readable as:
>
>                if (!(kevent->mask & FAN_ALL_PERM_EVENTS)) {
>                         fsnotify_destroy_event(group, kevent);
>                } else {
> #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
>                         if (ret <= 0) {
>                                 FANOTIFY_PE(kevent)->response = FAN_DENY;
>                                 wake_up(&group->fanotify_data.access_waitq);
>                         } else {
>                                 spin_lock(&group->notification_lock);
>                                 list_add_tail(&kevent->list,
>                                               &group->fanotify_data.access_list);
>                                 spin_unlock(&group->notification_lock);
>                         }
> #endif
>                 }
>                 if (ret < 0)
>                         break;
>
> Hmm?
>

Right, I missed that.
Thanks.
Marko Rauhamaa April 21, 2017, 1:13 p.m. UTC | #5
Amir Goldstein <amir73il@gmail.com>:

> On Thu, Apr 20, 2017 at 3:43 PM, Marko Rauhamaa
> <marko.rauhamaa@f-secure.com> wrote:
>> Amir Goldstein <amir73il@gmail.com>:
>>
>>> Sorry I messed up the previous patch. please try this one:
>>
>> I will try it.

Tried it. Superficially, it seems to be working, but...

>> More importantly, does EAGAIN here still guarantee EPOLLET semantics
>> of epoll(7)? IOW, if I get EAGAIN, I shouldn't have to try read(2)ing
>> the fanotify fd again before calling epoll_wait(2).
>
> Yes, if you get EAGAIN it means there are no more events in the queue,
> so shouldn't have to try read again.

I ran into a system hang with our real product that suggests there might
be a problem left. It would be explained if your patch generated an
EAGAIN while there were other events waiting in the queue.

I will have to investigate further.


Marko
Amir Goldstein April 22, 2017, 7:22 a.m. UTC | #6
On Thu, Apr 20, 2017 at 6:06 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Thu, Apr 20, 2017 at 5:20 PM, Jan Kara <jack@suse.cz> wrote:
>> On Thu 20-04-17 14:33:04, Amir Goldstein wrote:
>>>
>>> Sorry I messed up the previous patch. please try this one:
>>>
>>> diff --git a/fs/notify/fanotify/fanotify_user.c
>>> b/fs/notify/fanotify/fanotify_user.c
>>> index 2b37f27..7864354 100644
>>> --- a/fs/notify/fanotify/fanotify_user.c
>>> +++ b/fs/notify/fanotify/fanotify_user.c
>>> @@ -295,6 +295,16 @@ static ssize_t fanotify_read(struct file *file,
>>> char __user *buf,
>>>                 }
>>>
>>>                 ret = copy_event_to_user(group, kevent, buf);
>>> +               if (unlikely(ret == -EOPENSTALE)) {
>>> +                       /*
>>> +                        * We cannot report events with stale fd so drop it.
>>> +                        * Setting ret to 0 will continue the event loop and
>>> +                        * do the right thing if there are no more events to
>>> +                        * read (i.e. return bytes read, -EAGAIN or wait).
>>> +                        */
>>> +                       ret = 0;
>>> +               }
>>> +
>>>                 /*
>>>                  * Permission events get queued to wait for response.  Other
>>>                  * events can be destroyed now.
>>> @@ -305,7 +315,7 @@ static ssize_t fanotify_read(struct file *file,
>>> char __user *buf,
>>>                                 break;
>>>                 } else {
>>>  #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
>>> -                       if (ret < 0) {
>>> +                       if (ret <= 0) {
>>>                                 FANOTIFY_PE(kevent)->response = FAN_DENY;
>>>                                 wake_up(&group->fanotify_data.access_waitq);
>>>                                 break;
>>
>> I don't think you want to break out of the reading loop when ret == 0 and
>> the code might be more readable as:
>>
>>                if (!(kevent->mask & FAN_ALL_PERM_EVENTS)) {
>>                         fsnotify_destroy_event(group, kevent);
>>                } else {
>> #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
>>                         if (ret <= 0) {
>>                                 FANOTIFY_PE(kevent)->response = FAN_DENY;
>>                                 wake_up(&group->fanotify_data.access_waitq);
>>                         } else {
>>                                 spin_lock(&group->notification_lock);
>>                                 list_add_tail(&kevent->list,
>>                                               &group->fanotify_data.access_list);
>>                                 spin_unlock(&group->notification_lock);
>>                         }
>> #endif
>>                 }
>>                 if (ret < 0)
>>                         break;
>>
>> Hmm?
>>
>

On Fri, Apr 21, 2017 at 5:27 PM, Marko Rauhamaa
<marko.rauhamaa@f-secure.com> wrote:
> Amir Goldstein <amir73il@gmail.com>:
>
>> Did you notice Jan's comments on my patch? I had a bug that broke out
>> of the loop. Without his corrections read will return even if there
>> are more events in the queue.
>
> Yes, I now tried Jan's fix, and it did the trick.
>
> It will now take months or years before distros have a proper fix. In
> the interim, I will absorb EOPENSTALE and schedule a reread in that
> situation.
>
> Thank you both for your attention.
>
>

Marko,

Were you able to verify that both blocking and non-blocking mode work correctly?
May I add your Tested-by and Reported-by tags?

Thanks!
Amir.
Marko Rauhamaa April 24, 2017, 7:40 a.m. UTC | #7
Amir Goldstein <amir73il@gmail.com>:

> Were you able to verify that both blocking and non-blocking mode work
> correctly? May I add your Tested-by and Reported-by tags?

I have only verified the nonblocking case in my tests (which are by no
means extensive). The reported issue can no longer be reproduced with
the patched kernel, and the regular OPEN_PERM function is still
operational.

Feel free to add me to the tags.


Marko
diff mbox

Patch

diff --git a/fs/notify/fanotify/fanotify_user.c
b/fs/notify/fanotify/fanotify_user.c
index 2b37f27..7864354 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -295,6 +295,16 @@  static ssize_t fanotify_read(struct file *file,
char __user *buf,
                }

                ret = copy_event_to_user(group, kevent, buf);
+               if (unlikely(ret == -EOPENSTALE)) {
+                       /*
+                        * We cannot report events with stale fd so drop it.
+                        * Setting ret to 0 will continue the event loop and
+                        * do the right thing if there are no more events to
+                        * read (i.e. return bytes read, -EAGAIN or wait).
+                        */
+                       ret = 0;
+               }
+
                /*
                 * Permission events get queued to wait for response.  Other