
vfs: fix link vs. rename race

Message ID: 20220218153249.406028-1-mszeredi@redhat.com
State: New, archived
Series: vfs: fix link vs. rename race

Commit Message

Miklos Szeredi Feb. 18, 2022, 3:32 p.m. UTC
There has been a longstanding race condition between rename(2) and link(2)
when the following operations are performed in parallel:

1. Moving a file over an existing target file (e.g. mv file target)
2. Creating a hard link to the target file under a third name (e.g. ln target
   link)

By the time vfs_link() locks the target's inode, rename may already have
unlinked it.  vfs_link() then returns -ENOENT, because it refuses to create
links to already-unlinked files.  This check was introduced in v2.6.39 by
commit aae8a97d3ec3 ("fs: Don't allow to create hardlink for deleted file").

This breaks the apparent atomicity of rename(2), which is described in the
standards and in the man page:

    "If newpath already exists, it will be atomically replaced, so that
     there is no point at which another process attempting to access
     newpath will find it missing."

The simplest fix is to exclude renames for the entire duration of the link
operation.

This patch introduces a global rw_semaphore that is taken for read in rename
and for write in link.  To avoid excessive contention, link does not take the
lock on the first try; only if the source of the link turns out to have been
unlinked is the operation retried with the lock held.
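The locking scheme can be modelled in userspace with a POSIX read-write lock standing in for link_rwsem. This is a sketch under assumptions: rename_model(), link_model() and the flaky_op test callback are hypothetical names, and the callbacks return a negative errno the way kernel helpers do. As in the patch's do_linkat(), the locked retry happens at most once.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Stand-in for the patch's link_rwsem. */
static pthread_rwlock_t link_lock = PTHREAD_RWLOCK_INITIALIZER;

/* An operation returning 0 or a negative errno. */
typedef int (*op_fn)(void *arg);

/* Rename side: always shared, like down_read() in lock_rename(). */
static int rename_model(op_fn do_rename, void *arg)
{
	pthread_rwlock_rdlock(&link_lock);
	int err = do_rename(arg);
	pthread_rwlock_unlock(&link_lock);
	return err;
}

/*
 * Link side: the first attempt runs unlocked; if it fails with -ENOENT
 * it is retried once with the lock held exclusively, so no rename can
 * unlink the source between lookup and the nlink check.
 */
static int link_model(op_fn do_link, void *arg)
{
	int err = do_link(arg);            /* fast path: no lock */

	if (err != -ENOENT)
		return err;

	pthread_rwlock_wrlock(&link_lock);
	err = do_link(arg);                /* slow path: renames excluded */
	pthread_rwlock_unlock(&link_lock);
	return err;
}

/* Hypothetical test operation: fails with -ENOENT for the first `failures` calls. */
struct flaky_op {
	int calls;
	int failures;
};

static int flaky_run(void *arg)
{
	struct flaky_op *op = arg;

	op->calls++;
	return op->calls <= op->failures ? -ENOENT : 0;
}
```

The fast path keeps the common case (no concurrent rename) free of the global lock; the cost is that every rename, on any filesystem, takes the lock shared.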

A reproducer can be found at:

  https://lore.kernel.org/all/20220216131814.GA2463301@xavier-xps/
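For illustration only, a minimal sketch of the same race (not the linked reproducer): a child process keeps running the equivalent of "mv file target" while the caller repeatedly runs "ln target link". touch_at() and count_link_enoent() are hypothetical names; the sketch assumes a Linux system with fork(2).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create (or truncate) an empty regular file `name` in directory dfd. */
static int touch_at(int dfd, const char *name)
{
	int fd = openat(dfd, name, O_CREAT | O_WRONLY | O_TRUNC, 0644);

	if (fd < 0)
		return -1;
	close(fd);
	return 0;
}

/*
 * Race rename("file" -> "target") in a child against
 * link("target" -> "link") in the caller, inside directory `dir`.
 * "target" always exists (rename replaces it atomically), so every
 * ENOENT from linkat() comes from the unlinked-inode check.
 * Returns the number of ENOENT failures, or -1 on setup error.
 */
static long count_link_enoent(const char *dir, long iterations)
{
	int dfd = open(dir, O_RDONLY | O_DIRECTORY);

	if (dfd < 0)
		return -1;
	if (touch_at(dfd, "target") != 0) {
		close(dfd);
		return -1;
	}

	pid_t child = fork();
	if (child < 0) {
		close(dfd);
		return -1;
	}
	if (child == 0) {
		/* Child: keep atomically replacing "target" with a fresh file. */
		for (;;) {
			if (touch_at(dfd, "file") == 0)
				renameat(dfd, "file", dfd, "target");
		}
	}

	long enoent = 0;
	for (long i = 0; i < iterations; i++) {
		if (linkat(dfd, "target", dfd, "link", 0) == 0)
			unlinkat(dfd, "link", 0);
		else if (errno == ENOENT)
			enoent++;
	}

	kill(child, SIGKILL);
	waitpid(child, NULL, 0);

	/* Remove whatever the run left behind. */
	unlinkat(dfd, "file", 0);
	unlinkat(dfd, "target", 0);
	close(dfd);
	return enoent;
}
```

A nonzero count means link(2) observed the race; a zero count proves nothing, since the outcome depends on timing.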

Reported-by: Xavier Roche <xavier.roche@algolia.com>
Link: https://lore.kernel.org/all/20220214210708.GA2167841@xavier-xps/
Fixes: aae8a97d3ec3 ("fs: Don't allow to create hardlink for deleted file")
Tested-by: Xavier Roche <xavier.roche@algolia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/namei.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

Comments

Xavier Roche Feb. 18, 2022, 6:43 p.m. UTC | #1
On Fri, Feb 18, 2022 at 04:32:49PM +0100, Miklos Szeredi wrote:
> Reported-by: Xavier Roche <xavier.roche@algolia.com>

Just one minor detail for the record: this was tested by me, but reported by
another Xavier. Not a big deal, though.

Reported-by: Xavier Grand <xavier.grand@algolia.com>
Tested-by: Xavier Roche <xavier.roche@algolia.com>

Al Viro Feb. 18, 2022, 7:31 p.m. UTC | #2
On Fri, Feb 18, 2022 at 04:32:49PM +0100, Miklos Szeredi wrote:
> This patch introduces a global rw_semaphore that is locked for read in
> rename and for write in link.  To prevent excessive contention, do not take
> the lock in link on the first try.  If the source of the link was found to
> be unlinked, then retry with the lock held.

AFAICS, that deadlocks if lock_rename() is taken in ecryptfs_rename() (with
lock_rename() already taken by its caller) after another thread has blocked
trying to take your link_rwsem exclusively.
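The step that makes this a deadlock is that a read-write semaphore which is fair to writers does not admit a new reader while a writer is queued, roughly what the kernel's rw_semaphore does to keep writers from starving when the lock is reader-owned. Assuming that behaviour, the interleaving can be replayed with a toy, single-threaded model; toy_rwsem and its functions are hypothetical and are not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a writer-fair reader/writer semaphore (not kernel code). */
struct toy_rwsem {
	int readers;         /* current read holders */
	bool writer;         /* a writer holds the lock */
	int writers_waiting; /* writers queued for the lock */
};

/* True if down_read() would acquire now; false if it would sleep. */
static bool toy_down_read(struct toy_rwsem *s)
{
	if (s->writer || s->writers_waiting > 0)
		return false;    /* new readers queue behind a waiting writer */
	s->readers++;
	return true;
}

/* True if down_write() would acquire now; otherwise the writer is queued. */
static bool toy_down_write(struct toy_rwsem *s)
{
	if (s->writer || s->readers > 0) {
		s->writers_waiting++;
		return false;
	}
	s->writer = true;
	return true;
}

/*
 * Replay the interleaving from the review: thread A takes the lock
 * shared in the outer lock_rename(); thread B's link retry queues for
 * it exclusively; A's nested lock_rename() inside ecryptfs_rename()
 * asks for it shared again. Returns true if A and B now wait on each
 * other.
 */
static bool nested_read_deadlocks(void)
{
	struct toy_rwsem s = { 0 };

	bool a_outer = toy_down_read(&s);   /* A: outer lock_rename() */
	bool b_write = toy_down_write(&s);  /* B: do_linkat() slow path */
	bool a_inner = toy_down_read(&s);   /* A: nested lock_rename() */

	/* A holds the lock, B waits for A, A's second request waits for B. */
	return a_outer && !b_write && !a_inner;
}
```

Without a queued writer the nested read acquisition succeeds, which is why the problem only shows up when a link retry arrives between the two lock_rename() calls.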

Patch

diff --git a/fs/namei.c b/fs/namei.c
index 3f1829b3ab5b..dd6908cee49d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -122,6 +122,8 @@ 
  * PATH_MAX includes the nul terminator --RR.
  */
 
+static DECLARE_RWSEM(link_rwsem);
+
 #define EMBEDDED_NAME_MAX	(PATH_MAX - offsetof(struct filename, iname))
 
 struct filename *
@@ -2961,6 +2963,8 @@  struct dentry *lock_rename(struct dentry *p1, struct dentry *p2)
 {
 	struct dentry *p;
 
+	down_read(&link_rwsem);
+
 	if (p1 == p2) {
 		inode_lock_nested(p1->d_inode, I_MUTEX_PARENT);
 		return NULL;
@@ -2995,6 +2999,8 @@  void unlock_rename(struct dentry *p1, struct dentry *p2)
 		inode_unlock(p2->d_inode);
 		mutex_unlock(&p1->d_sb->s_vfs_rename_mutex);
 	}
+
+	up_read(&link_rwsem);
 }
 EXPORT_SYMBOL(unlock_rename);
 
@@ -4456,6 +4462,7 @@  int do_linkat(int olddfd, struct filename *old, int newdfd,
 	struct path old_path, new_path;
 	struct inode *delegated_inode = NULL;
 	int how = 0;
+	bool lock = false;
 	int error;
 
 	if ((flags & ~(AT_SYMLINK_FOLLOW | AT_EMPTY_PATH)) != 0) {
@@ -4474,10 +4481,13 @@  int do_linkat(int olddfd, struct filename *old, int newdfd,
 
 	if (flags & AT_SYMLINK_FOLLOW)
 		how |= LOOKUP_FOLLOW;
+retry_lock:
+	if (lock)
+		down_write(&link_rwsem);
 retry:
 	error = filename_lookup(olddfd, old, how, &old_path, NULL);
 	if (error)
-		goto out_putnames;
+		goto out_unlock_link;
 
 	new_dentry = filename_create(newdfd, new, &new_path,
 					(how & LOOKUP_REVAL));
@@ -4511,8 +4521,16 @@  int do_linkat(int olddfd, struct filename *old, int newdfd,
 		how |= LOOKUP_REVAL;
 		goto retry;
 	}
+	if (!lock && error == -ENOENT) {
+		path_put(&old_path);
+		lock = true;
+		goto retry_lock;
+	}
 out_putpath:
 	path_put(&old_path);
+out_unlock_link:
+	if (lock)
+		up_write(&link_rwsem);
 out_putnames:
 	putname(old);
 	putname(new);