From patchwork Tue Apr 24 06:19:41 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Omar Sandoval
X-Patchwork-Id: 10358689
From: Omar Sandoval
To: Al Viro, linux-fsdevel@vger.kernel.org
Cc: Linus Torvalds, linux-api@vger.kernel.org, kernel-team@fb.com, Xi Wang
Subject: [RFC PATCH v3 1/2] fs: add AT_REPLACE flag for linkat() which replaces the target
Date: Mon, 23 Apr 2018 23:19:41 -0700
X-Mailer: git-send-email 2.17.0
Sender: linux-fsdevel-owner@vger.kernel.org
X-Mailing-List: linux-fsdevel@vger.kernel.org

From: Omar Sandoval

One of the most common uses of temporary files is the classic atomic
replacement pattern, i.e.,

- write temporary file
- fsync temporary file
- rename temporary file over real file
- fsync parent directory

Now, we have O_TMPFILE, which gives us a much better way to create
temporary files, but it's not possible to use it for this pattern.

This patch introduces an AT_REPLACE flag which allows linkat() to
replace the target file. Now, the temporary file in the pattern above
can be a proper O_TMPFILE. Even without O_TMPFILE, this is a new
primitive which might be useful in other contexts.

The implementation on the VFS side mimics sys_renameat2().

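For illustration, the O_TMPFILE variant of the pattern above might look
roughly like this from userspace. This is only a sketch under stated
assumptions: replace_file() is a hypothetical helper, AT_REPLACE is defined
by hand with the value this patch proposes (released headers do not carry
it), and a kernel without this patch is expected to reject the linkat()
call with EINVAL. The /proc/self/fd indirection is the documented way to
link an O_TMPFILE descriptor without CAP_DAC_READ_SEARCH.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Assumption: value proposed by this RFC, not part of released headers. */
#ifndef AT_REPLACE
#define AT_REPLACE 0x2000
#endif

/*
 * Hypothetical helper: atomically replace dir/name with len bytes from buf.
 * Returns 0 on success or a negative errno value.
 */
static int replace_file(const char *dir, const char *name,
			const void *buf, size_t len)
{
	char proc[64];
	ssize_t n;
	int dfd, fd, ret = 0;

	assert(dir && name && buf);

	dfd = open(dir, O_RDONLY | O_DIRECTORY);
	if (dfd < 0)
		return -errno;

	/* Unnamed temporary file on the same filesystem as the target. */
	fd = open(dir, O_TMPFILE | O_WRONLY, 0644);
	if (fd < 0) {
		ret = -errno;
		goto out_dir;
	}

	/* Write and fsync the temporary file. */
	n = write(fd, buf, len);
	if (n < 0)
		ret = -errno;
	else if ((size_t)n != len)
		ret = -EIO;
	if (!ret && fsync(fd) < 0)
		ret = -errno;
	if (ret)
		goto out;

	/* Link it over the real file; this replaces the rename step. */
	snprintf(proc, sizeof(proc), "/proc/self/fd/%d", fd);
	if (linkat(AT_FDCWD, proc, dfd, name,
		   AT_SYMLINK_FOLLOW | AT_REPLACE) < 0) {
		ret = -errno;
		goto out;
	}

	/* Persist the new directory entry. */
	if (fsync(dfd) < 0)
		ret = -errno;
out:
	close(fd);
out_dir:
	close(dfd);
	return ret;
}
```

A caller would use it as replace_file("/etc", "foo.conf", data, size) and
fall back to the classic rename() pattern when it returns -EINVAL or
-EOPNOTSUPP.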
Cc: Xi Wang
Signed-off-by: Omar Sandoval
---
 fs/ecryptfs/inode.c        |   2 +-
 fs/namei.c                 | 181 +++++++++++++++++++++++++++++--------
 fs/nfsd/vfs.c              |   2 +-
 fs/overlayfs/overlayfs.h   |   2 +-
 include/linux/fs.h         |   3 +-
 include/uapi/linux/fcntl.h |   1 +
 6 files changed, 150 insertions(+), 41 deletions(-)

diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 97d17eaeba07..d3d29bf5d6b7 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -437,7 +437,7 @@ static int ecryptfs_link(struct dentry *old_dentry, struct inode *dir,
 	dget(lower_new_dentry);
 	lower_dir_dentry = lock_parent(lower_new_dentry);
 	rc = vfs_link(lower_old_dentry, d_inode(lower_dir_dentry),
-		      lower_new_dentry, NULL);
+		      lower_new_dentry, NULL, 0);
 	if (rc || d_really_is_negative(lower_new_dentry))
 		goto out_lock;
 	rc = ecryptfs_interpose(lower_new_dentry, new_dentry, dir->i_sb);
diff --git a/fs/namei.c b/fs/namei.c
index 186bd2464fd5..2cc2b1deaa12 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4149,6 +4149,7 @@ SYSCALL_DEFINE2(symlink, const char __user *, oldname, const char __user *, newn
  * @dir:	new parent
  * @new_dentry:	where to create the new link
  * @delegated_inode: returns inode needing a delegation break
+ * @flags:	link flags
  *
  * The caller must hold dir->i_mutex
  *
@@ -4162,16 +4163,26 @@ SYSCALL_DEFINE2(symlink, const char __user *, oldname, const char __user *, newn
  * be appropriate for callers that expect the underlying filesystem not
  * to be NFS exported.
  */
-int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry, struct inode **delegated_inode)
+int vfs_link(struct dentry *old_dentry, struct inode *dir,
+	     struct dentry *new_dentry, struct inode **delegated_inode,
+	     unsigned int flags)
 {
 	struct inode *inode = old_dentry->d_inode;
+	struct inode *target = new_dentry->d_inode;
 	unsigned max_links = dir->i_sb->s_max_links;
 	int error;
 
 	if (!inode)
 		return -ENOENT;
 
-	error = may_create(dir, new_dentry);
+	if (target) {
+		if (flags & AT_REPLACE)
+			error = may_delete(dir, new_dentry, d_is_dir(old_dentry));
+		else
+			error = -EEXIST;
+	} else {
+		error = may_create(dir, new_dentry);
+	}
 	if (error)
 		return error;
 
@@ -4190,8 +4201,10 @@ int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_de
 	 */
 	if (HAS_UNMAPPED_ID(inode))
 		return -EPERM;
-	if (!dir->i_op->link)
+	if (!dir->i_op->link && !dir->i_op->link2)
 		return -EPERM;
+	if (flags && !dir->i_op->link2)
+		return -EINVAL;
 	if (S_ISDIR(inode->i_mode))
 		return -EPERM;
 
@@ -4199,26 +4212,58 @@ int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_de
 	if (error)
 		return error;
 
-	inode_lock(inode);
+	dget(new_dentry);
+	lock_two_nondirectories(inode, target);
+
+	if (is_local_mountpoint(new_dentry)) {
+		error = -EBUSY;
+		goto out;
+	}
+
 	/* Make sure we don't allow creating hardlink to an unlinked file */
-	if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
+	if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE)) {
 		error = -ENOENT;
-	else if (max_links && inode->i_nlink >= max_links)
+		goto out;
+	}
+	if (max_links && inode->i_nlink >= max_links) {
 		error = -EMLINK;
-	else {
-		error = try_break_deleg(inode, delegated_inode);
-		if (!error)
-			error = dir->i_op->link(old_dentry, dir, new_dentry);
+		goto out;
+	}
+
+	error = try_break_deleg(inode, delegated_inode);
+	if (error)
+		goto out;
+	if (target) {
+		error = try_break_deleg(target, delegated_inode);
+		if (error)
+			goto out;
+	}
+
+	if (dir->i_op->link)
+		error = dir->i_op->link(old_dentry, dir, new_dentry);
+	else
+		error = dir->i_op->link2(old_dentry, dir, new_dentry, flags);
+	if (error)
+		goto out;
+
+	if (target) {
+		dont_mount(new_dentry);
+		detach_mounts(new_dentry);
 	}
 
-	if (!error && (inode->i_state & I_LINKABLE)) {
+	if (inode->i_state & I_LINKABLE) {
 		spin_lock(&inode->i_lock);
 		inode->i_state &= ~I_LINKABLE;
 		spin_unlock(&inode->i_lock);
 	}
-	inode_unlock(inode);
-	if (!error)
+out:
+	unlock_two_nondirectories(inode, target);
+	dput(new_dentry);
+	if (!error) {
+		if (target)
+			fsnotify_link_count(target);
 		fsnotify_link(dir, inode, new_dentry);
+	}
 	return error;
 }
 EXPORT_SYMBOL(vfs_link);
@@ -4237,11 +4282,15 @@ int do_linkat(int olddfd, const char __user *oldname, int newdfd,
 {
 	struct dentry *new_dentry;
 	struct path old_path, new_path;
+	struct qstr new_last;
+	int new_type;
 	struct inode *delegated_inode = NULL;
-	int how = 0;
+	struct filename *to;
+	unsigned int how = 0, target_flags;
+	bool should_retry = false;
 	int error;
 
-	if ((flags & ~(AT_SYMLINK_FOLLOW | AT_EMPTY_PATH)) != 0)
+	if ((flags & ~(AT_SYMLINK_FOLLOW | AT_EMPTY_PATH | AT_REPLACE)) != 0)
 		return -EINVAL;
 	/*
 	 * To use null names we require CAP_DAC_READ_SEARCH
@@ -4256,44 +4305,102 @@ int do_linkat(int olddfd, const char __user *oldname, int newdfd,
 
 	if (flags & AT_SYMLINK_FOLLOW)
 		how |= LOOKUP_FOLLOW;
+
+	if (flags & AT_REPLACE)
+		target_flags = LOOKUP_RENAME_TARGET;
+	else
+		target_flags = LOOKUP_CREATE | LOOKUP_EXCL;
 retry:
 	error = user_path_at(olddfd, oldname, how, &old_path);
 	if (error)
 		return error;
 
-	new_dentry = user_path_create(newdfd, newname, &new_path,
-					(how & LOOKUP_REVAL));
-	error = PTR_ERR(new_dentry);
-	if (IS_ERR(new_dentry))
-		goto out;
+	to = filename_parentat(newdfd, getname(newname), how & LOOKUP_REVAL,
+			       &new_path, &new_last, &new_type);
+	if (IS_ERR(to)) {
+		error = PTR_ERR(to);
+		goto exit1;
+	}
+
+	if (old_path.mnt != new_path.mnt) {
+		error = -EXDEV;
+		goto exit2;
+	}
+
+	if (new_type != LAST_NORM) {
+		if (flags & AT_REPLACE)
+			error = -EBUSY;
+		else
+			error = -EEXIST;
+		goto exit2;
+	}
+
+	error = mnt_want_write(old_path.mnt);
+	if (error)
+		goto exit2;
+
+retry_deleg:
+	inode_lock_nested(new_path.dentry->d_inode, I_MUTEX_PARENT);
+
+	new_dentry = __lookup_hash(&new_last, new_path.dentry,
+				   (how & LOOKUP_REVAL) | target_flags);
+	if (IS_ERR(new_dentry)) {
+		error = PTR_ERR(new_dentry);
+		goto exit3;
+	}
+	if (!(flags & AT_REPLACE) && d_is_positive(new_dentry)) {
+		error = -EEXIST;
+		goto exit4;
+	}
+	if (new_last.name[new_last.len]) {
+		/* trailing slash on negative dentry gives -ENOENT */
+		if (d_is_negative(new_dentry)) {
+			error = -ENOENT;
+			goto exit4;
+		}
+
+		/*
+		 * unless the source is a directory, trailing slash gives
+		 * -ENOTDIR (this can only happen in the AT_REPLACE case, so we
+		 * make this consistent with sys_renameat2() even though a
+		 * source directory will fail later with -EPERM)
+		 */
+		if (!d_is_dir(old_path.dentry)) {
+			error = -ENOTDIR;
+			goto exit4;
+		}
+	}
 
-	error = -EXDEV;
-	if (old_path.mnt != new_path.mnt)
-		goto out_dput;
 	error = may_linkat(&old_path);
 	if (unlikely(error))
-		goto out_dput;
+		goto exit4;
 	error = security_path_link(old_path.dentry, &new_path, new_dentry);
 	if (error)
-		goto out_dput;
-	error = vfs_link(old_path.dentry, new_path.dentry->d_inode, new_dentry, &delegated_inode);
-out_dput:
-	done_path_create(&new_path, new_dentry);
+		goto exit4;
+	error = vfs_link(old_path.dentry, new_path.dentry->d_inode, new_dentry,
+			 &delegated_inode, flags & AT_REPLACE);
+exit4:
+	dput(new_dentry);
+exit3:
+	inode_unlock(new_path.dentry->d_inode);
 	if (delegated_inode) {
 		error = break_deleg_wait(&delegated_inode);
-		if (!error) {
-			path_put(&old_path);
-			goto retry;
-		}
+		if (!error)
+			goto retry_deleg;
 	}
-	if (retry_estale(error, how)) {
-		path_put(&old_path);
+	mnt_drop_write(old_path.mnt);
+exit2:
+	if (retry_estale(error, how))
+		should_retry = true;
+	path_put(&new_path);
+	putname(to);
+exit1:
+	path_put(&old_path);
+	if (should_retry) {
+		should_retry = false;
 		how |= LOOKUP_REVAL;
 		goto retry;
 	}
-out:
-	path_put(&old_path);
-
 	return error;
 }
 
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 2410b093a2e6..541a78a6d684 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1594,7 +1594,7 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
 	err = nfserr_noent;
 	if (d_really_is_negative(dold))
 		goto out_dput;
-	host_err = vfs_link(dold, dirp, dnew, NULL);
+	host_err = vfs_link(dold, dirp, dnew, NULL, 0);
 	if (!host_err) {
 		err = nfserrno(commit_metadata(ffhp));
 		if (!err)
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index e0b7de799f6b..906d0ee1de74 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -100,7 +100,7 @@ static inline int ovl_do_unlink(struct inode *dir, struct dentry *dentry)
 static inline int ovl_do_link(struct dentry *old_dentry, struct inode *dir,
 			      struct dentry *new_dentry, bool debug)
 {
-	int err = vfs_link(old_dentry, dir, new_dentry, NULL);
+	int err = vfs_link(old_dentry, dir, new_dentry, NULL, 0);
 	if (debug) {
 		pr_debug("link(%pd2, %pd2) = %i\n",
 			 old_dentry, new_dentry, err);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 760d8da1b6c7..8e44b12023c2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1607,7 +1607,7 @@ extern int vfs_create(struct inode *, struct dentry *, umode_t, bool);
 extern int vfs_mkdir(struct inode *, struct dentry *, umode_t);
 extern int vfs_mknod(struct inode *, struct dentry *, umode_t, dev_t);
 extern int vfs_symlink(struct inode *, struct dentry *, const char *);
-extern int vfs_link(struct dentry *, struct inode *, struct dentry *, struct inode **);
+extern int vfs_link(struct dentry *, struct inode *, struct dentry *, struct inode **, unsigned int);
 extern int vfs_rmdir(struct inode *, struct dentry *);
 extern int vfs_unlink(struct inode *, struct dentry *, struct inode **);
 extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *, struct inode **, unsigned int);
@@ -1752,6 +1752,7 @@ struct inode_operations {
 
 	int (*create) (struct inode *,struct dentry *, umode_t, bool);
 	int (*link) (struct dentry *,struct inode *,struct dentry *);
+	int (*link2) (struct dentry *,struct inode *,struct dentry *,unsigned int);
 	int (*unlink) (struct inode *,struct dentry *);
 	int (*symlink) (struct inode *,struct dentry *,const char *);
 	int (*mkdir) (struct inode *,struct dentry *,umode_t);
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 6448cdd9a350..b601ad36e726 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -84,6 +84,7 @@
 #define AT_SYMLINK_FOLLOW	0x400   /* Follow symbolic links.  */
 #define AT_NO_AUTOMOUNT		0x800	/* Suppress terminal automount traversal */
 #define AT_EMPTY_PATH		0x1000	/* Allow empty relative pathname */
+#define AT_REPLACE		0x2000	/* Replace new path */
 
 #define AT_STATX_SYNC_TYPE	0x6000	/* Type of synchronisation required from statx() */
 #define AT_STATX_SYNC_AS_STAT	0x0000	/* - Do whatever stat() does */
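
Since do_linkat() validates the flag mask before any path lookup, userspace
could probably detect syscall-level support by calling linkat() on a source
that does not exist: EINVAL would mean the flag is unknown, ENOENT would
mean the mask check passed. The sketch below assumes exactly that ordering;
linkat_flags_ok() and probe_at_replace() are hypothetical names, the probe
paths are made up, and AT_REPLACE is again defined by hand. Note that a
positive probe says nothing about the filesystem: vfs_link() in this patch
still returns -EINVAL when the directory has no ->link2.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Assumption: value proposed by this RFC, not part of released headers. */
#ifndef AT_REPLACE
#define AT_REPLACE 0x2000
#endif

/* The new bit must not collide with the flags linkat() already accepts. */
static_assert((AT_REPLACE & (AT_SYMLINK_FOLLOW | AT_EMPTY_PATH)) == 0,
	      "AT_REPLACE overlaps an existing linkat() flag");

/* Userspace mirror of the mask check at the top of do_linkat(). */
static int linkat_flags_ok(unsigned int flags)
{
	return (flags & ~(AT_SYMLINK_FOLLOW | AT_EMPTY_PATH | AT_REPLACE)) == 0;
}

/*
 * Returns 1 if the syscall accepted AT_REPLACE (lookup failed with ENOENT),
 * 0 if it rejected the flag with EINVAL, and -1 on anything unexpected.
 */
static int probe_at_replace(void)
{
	/* Hypothetical paths that are assumed not to exist. */
	if (linkat(AT_FDCWD, "/nonexistent-at-replace-src",
		   AT_FDCWD, "/nonexistent-at-replace-dst", AT_REPLACE) == 0)
		return -1;
	if (errno == EINVAL)
		return 0;
	if (errno == ENOENT)
		return 1;
	return -1;
}
```

A caller that gets 0 from probe_at_replace() would stick to the classic
rename() pattern; a caller that gets 1 still has to handle -EINVAL from the
real call on filesystems without ->link2.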