diff mbox series

Possible FS race condition between vfs_rename and do_linkat (fs/namei.c)

Message ID CAE9vp3JsD1KVqLXnLWpNrAtDSmm6gUa0KC_degOECDsGntvUXw@mail.gmail.com (mailing list archive)
State New, archived
Headers show
Series Possible FS race condition between vfs_rename and do_linkat (fs/namei.c) | expand

Commit Message

Xavier Roche Aug. 24, 2019, 7:24 a.m. UTC
Dear distinguished filesystem contributors,

There seem to be a race condition between vfs_rename and do_linkat,
when those operations are done in parallel:

1. Moving a file to a target file (eg. mv file target)
2. Creating a link from the target file (eg. ln target link)

My understanding is that as the target file is never erased on client
side, but just replaced, the link should never fail.

But maybe this is something the filesystem can not guarantee at all
(w.r.t POSIX typically) ?

To demonstrate this issue, just run the following script (it will in
loop move "file" to "target", and in parallel link "target" to "link")
:

========== Cut here ==========
#!/bin/bash
#

rm -f link file target
touch target

# Link target -> link in loop
while ln target link && rm link; do :; done &

# Overwrite file -> target in loop
while touch file && mv file target; do :; done &

wait
========== Cut here ==========

Running the script will yield:
./bug.sh
ln: failed to create hard link 'link' => 'target': No such file or directory

The issue seem to lie inside vfs_link (fs/namei.c):
       inode_lock(inode);
       /* Make sure we don't allow creating hardlink to an unlinked file */
       if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
               error =  -ENOENT;

The possible answer is that the inode refcount is zero because the
file has just been replaced concurrently, old file being erased, and
as such, the link operation is failing.

Patching with this very naive fix "solves" the issue (but this is
probably not something we want):

                error = try_break_deleg(inode, delegated_inode);

Kudos to Xavier Grand from Algolia for spotting the issue with a
reproducible case.

Comments

Xavier Roche Aug. 24, 2019, 10:01 a.m. UTC | #1
Just a small addition:

> But maybe this is something the filesystem can not guarantee at all
> (w.r.t POSIX typically) ?

The POSIX standard does not seem to directly address this case, but my
understanding is that rename() atomicity should guarantee correct
behavior w.r.t rename()/link() concurrent operations:

See IEEE Std 1003.1,

(1) https://pubs.opengroup.org/onlinepubs/009695399/functions/rename.html

"If the link named by the new argument exists, it shall be removed and
old renamed to new. In this case, a link named new shall remain
visible to other processes throughout the renaming operation and refer
either to the file referred to by new or old before the operation
began."

"This rename() function is equivalent for regular files to that
defined by the ISO C standard. Its inclusion here expands that
definition to include actions on directories and specifies behavior
when the new parameter names a file that already exists. That
specification requires that the action of the function be atomic."

(2) https://pubs.opengroup.org/onlinepubs/009695399/functions/link.html
[ENOENT]
    "A component of either path prefix does not exist; the file named
by path1 does not exist; or path1 or path2 points to an empty string."

The only erroneous [ENOENT] case is when the file does not "exist";
but the renaming should guarantee that the file always "exists" ("a
link named new shall remain visible to other processes throughout the
renaming operation and refer either to the file referred to by new or
old before the operation began")

Cheers,
Xavier Roche Aug. 26, 2019, 9:36 a.m. UTC | #2
One more addition:

This issue seems to be a regression introduced by an old fix
(aae8a97d3ec30) in fs/namei.c preventing to hardlink a deleted file
(with i_nlink == 0):

Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date:   Sat Jan 29 18:43:27 2011 +0530

    fs: Don't allow to create hardlink for deleted file

    Add inode->i_nlink == 0 check in VFS. Some of the file systems
    do this internally. A followup patch will remove those instance.
    This is needed to ensure that with link by handle we don't allow
    to create hardlink of an unlinked file. The check also prevent a race
    between unlink and link

    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

diff --git a/fs/namei.c b/fs/namei.c
index 83e92bab79a6..33be51a2ddb7 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2906,7 +2906,11 @@ int vfs_link(struct dentry *old_dentry, struct
inode *dir, struct dentry *new_de
                return error;

        mutex_lock(&inode->i_mutex);
-       error = dir->i_op->link(old_dentry, dir, new_dentry);
+       /* Make sure we don't allow creating hardlink to an unlinked file */
+       if (inode->i_nlink == 0)
+               error =  -ENOENT;
+       else
+               error = dir->i_op->link(old_dentry, dir, new_dentry);
        mutex_unlock(&inode->i_mutex);
        if (!error)
                fsnotify_link(dir, inode, new_dentry);


Regards,
diff mbox series

Patch

diff --git a/fs/namei.c b/fs/namei.c
index 209c51a5226c..befb15f4b865 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4231,9 +4231,10 @@  int vfs_link(struct dentry *old_dentry, struct
inode *dir, struct dentry *new_de

        inode_lock(inode);
        /* Make sure we don't allow creating hardlink to an unlinked file */
-       if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
-               error =  -ENOENT;
-       else if (max_links && inode->i_nlink >= max_links)
+       //if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
+       //      error =  -ENOENT;
+       // else
+       if (max_links && inode->i_nlink >= max_links)
                error = -EMLINK;
        else {