From: Miklos Szeredi
To: Al Viro
Cc: linux-unionfs@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 4/8] ovl: add infrastructure for intercepting file ops
Date: Thu, 1 Dec 2016 11:14:59 +0100
Message-Id: <1480587303-25088-5-git-send-email-mszeredi@redhat.com>

Overlayfs needs to intercept a few file operations in order to properly
handle the following corner case:

 - file X is in overlayfs;
 - X resides on a lower (read-only) layer; the lower file is L;
 - X is opened with O_RDONLY -> L is opened -> 'rofd';
 - X is opened with O_WRONLY -> L is copied up to the upper layer file U,
   and U is opened -> 'rwfd';
 - a write to 'rwfd' modifies U;
 - a read from 'rofd' still returns data from the unmodified L.

While this is a rare occurrence, it has been known to cause problems.

To prevent such inconsistencies, allow intercepting read operations and
fix them up in the unlikely case that it's needed.
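To make the inconsistency concrete, here is a toy userspace model of the scenario above. It is not overlayfs code: the toy_* types and functions are hypothetical, and each "file" is just an in-memory buffer. The one property it models is that an open descriptor stays bound to whichever underlying copy (L or U) was current when it was opened.

```c
#include <assert.h>
#include <string.h>

/* A "file" in this toy model is just a small in-memory buffer. */
struct toy_file {
	char data[32];
};

/* Overlay file X: lower copy L always exists; upper copy U after copy-up. */
struct toy_overlay {
	struct toy_file lower;  /* L, on the read-only layer */
	struct toy_file upper;  /* U, valid only once copied_up is set */
	int copied_up;
};

/* An open descriptor is bound to one underlying file for its lifetime. */
struct toy_fd {
	struct toy_file *real;
};

/* Open X; opening for write triggers copy-up if it hasn't happened yet. */
static struct toy_fd toy_open(struct toy_overlay *x, int for_write)
{
	struct toy_fd fd;

	if (for_write && !x->copied_up) {
		x->upper = x->lower;    /* copy up: U starts as a copy of L */
		x->copied_up = 1;
	}
	fd.real = x->copied_up ? &x->upper : &x->lower;
	return fd;
}

/* Writes modify only the file the descriptor is bound to. */
static void toy_write(struct toy_fd fd, const char *s)
{
	assert(fd.real != NULL);
	strncpy(fd.real->data, s, sizeof(fd.real->data) - 1);
	fd.real->data[sizeof(fd.real->data) - 1] = '\0';
}

/* Reads go to whatever file the descriptor was bound to at open time. */
static const char *toy_read(struct toy_fd fd)
{
	assert(fd.real != NULL);
	return fd.real->data;
}
```

With X's lower copy initially holding "old", opening 'rofd', then 'rwfd', and writing "new" through 'rwfd' leaves toy_read(rofd) returning "old". Reads through the stale descriptor are what this patch prepares to intercept.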
This patch adds an ovl_fops structure that is going to contain the pieces
necessary for intercepting file operations:

 a) a new file_operations structure, based on the original but changing
    those operations that need to be intercepted;

 b) a pointer to the original f_op.  Intercepted operations will normally
    just call the original, unless the file was copied up;

 c) a hash table entry to hook this up for reuse in subsequent opens.

The hash table is small (32 buckets) since there will only be a few
different file operations used (basically the number of different
filesystems used as lower layer of overlay).  The key to the hash table is
the original fops pointer.

Insertion is done under a global mutex.  There will only be one insertion
event for each unique underlying file_operations structure, so this is
going to be rare.

The hash table is accessed using RCU, so this will add minimal overhead to
opens.  The override fops are removed only at module exit time, so we
don't even have to worry about read-side RCU locking.

There are a few assumptions this scheme makes about file operation
structures supplied by filesystems:

 1) The lifetime of the structures is equal to the lifetime of the
    filesystem module; ensure this by holding a ref to
    sb->s_type->owner (normal filesystems don't fill in fops->owner).
    This means that once a filesystem has been used as lower layer of
    overlayfs, its module cannot be removed until the overlay module
    itself is removed.  I don't believe this will be a problem.

 2) Filesystems should not make use of f_op in any way except possibly
    replacing it in ->open.  Overlayfs itself breaks this assumption, but
    recursion is handled by ->d_real, so this is not an issue.  Add a
    ->magic field to the override fops structure to verify that the
    filesystem didn't mess with file->f_op.
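The lookup-or-insert scheme described above can be sketched in userspace C. This is an illustration under stated assumptions, not the kernel code: struct ops stands in for struct file_operations, the cache_* and wrapped_ops names are hypothetical, and C11 release/acquire atomics stand in for the RCU publish/lookup done by hash_add_rcu() and hash_for_each_possible_rcu(). As in the patch, entries are never removed before cleanup, so the lockless fast path needs no reclamation protocol.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_BITS 5                    /* 2^5 = 32 buckets, as in the patch */
#define CACHE_SIZE (1u << CACHE_BITS)

/* Stand-in for struct file_operations (hypothetical). */
struct ops {
	int (*read)(int arg);
};

/* Stand-in for struct ovl_fops (hypothetical). */
struct wrapped_ops {
	struct ops ops;                 /* whole-struct copy; the patch copies selected members */
	const struct ops *orig;         /* hash key: the original ops pointer */
	struct wrapped_ops *next;       /* never changed after publication */
};

static _Atomic(struct wrapped_ops *) buckets[CACHE_SIZE];
static pthread_mutex_t cache_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hash the key pointer into a bucket index (the kernel uses hash_min()). */
static unsigned int bucket_of(const struct ops *orig)
{
	uintptr_t v = (uintptr_t)orig;

	return (unsigned int)((v >> 4) ^ (v >> (4 + CACHE_BITS))) & (CACHE_SIZE - 1);
}

/* Lockless lookup: the acquire load pairs with the release store below. */
static struct wrapped_ops *cache_find(const struct ops *orig)
{
	struct wrapped_ops *w =
		atomic_load_explicit(&buckets[bucket_of(orig)], memory_order_acquire);

	for (; w != NULL; w = w->next)
		if (w->orig == orig)
			return w;
	return NULL;
}

/* Find the wrapper for orig, creating it on first use. */
static struct wrapped_ops *cache_get(const struct ops *orig)
{
	struct wrapped_ops *w;
	unsigned int b;

	assert(orig != NULL);
	w = cache_find(orig);           /* fast path: no lock taken */
	if (w != NULL)
		return w;

	pthread_mutex_lock(&cache_mutex);
	w = cache_find(orig);           /* another thread may have inserted it */
	if (w == NULL) {
		w = calloc(1, sizeof(*w));
		if (w != NULL) {
			b = bucket_of(orig);
			w->ops = *orig;
			w->orig = orig;
			w->next = atomic_load_explicit(&buckets[b], memory_order_relaxed);
			/* Publish only after every field is initialized. */
			atomic_store_explicit(&buckets[b], w, memory_order_release);
		}
	}
	pthread_mutex_unlock(&cache_mutex);
	return w;                       /* NULL only if allocation failed */
}

/* Free everything; the caller must guarantee no concurrent lookups. */
static void cache_cleanup(void)
{
	unsigned int b;

	for (b = 0; b < CACHE_SIZE; b++) {
		struct wrapped_ops *w = atomic_exchange(&buckets[b], NULL);

		while (w != NULL) {
			struct wrapped_ops *next = w->next;

			free(w);
			w = next;
		}
	}
}
```

Calling cache_get() twice with the same original pointer returns the same wrapper; only the first call takes the mutex and allocates, which mirrors the "one insertion per unique file_operations" argument above.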
Signed-off-by: Miklos Szeredi
---
 fs/overlayfs/inode.c     | 146 ++++++++++++++++++++++++++++++++++++++++++++++-
 fs/overlayfs/overlayfs.h |   2 +
 fs/overlayfs/super.c     |   1 +
 3 files changed, 148 insertions(+), 1 deletion(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 1981a5514f51..e98e3323c330 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 #include "overlayfs.h"
 
 static int ovl_copy_up_truncate(struct dentry *dentry)
@@ -334,16 +335,159 @@ static const struct inode_operations ovl_symlink_inode_operations = {
 	.update_time = ovl_update_time,
 };
 
+static DEFINE_READ_MOSTLY_HASHTABLE(ovl_fops_htable, 5);
+static DEFINE_MUTEX(ovl_fops_mutex);
+
+#define OVL_FOPS_MAGIC 0x73706f462a6c764f
+
+struct ovl_fops {
+	struct module *owner;
+	struct file_operations fops;
+	u64 magic;
+	const struct file_operations *orig_fops;
+	struct hlist_node entry;
+};
+
+void ovl_cleanup_fops_htable(void)
+{
+	int bkt;
+	struct hlist_node *tmp;
+	struct ovl_fops *ofop;
+
+	hash_for_each_safe(ovl_fops_htable, bkt, tmp, ofop, entry) {
+		module_put(ofop->owner);
+		fops_put(ofop->orig_fops);
+		kfree(ofop);
+	}
+}
+
+#define OVL_CALL_REAL_FOP(file, call)					\
+	({ struct ovl_fops *__ofop =					\
+			container_of(file->f_op, struct ovl_fops, fops); \
+		WARN_ON(__ofop->magic != OVL_FOPS_MAGIC) ? -EIO :	\
+			__ofop->orig_fops->call;			\
+	})
+
+static struct ovl_fops *ovl_fops_find(const struct file_operations *orig)
+{
+	struct ovl_fops *ofop;
+
+	hash_for_each_possible_rcu(ovl_fops_htable, ofop, entry, (long) orig) {
+		if (ofop->orig_fops == orig)
+			return ofop;
+	}
+	return NULL;
+}
+
+static struct ovl_fops *ovl_fops_get(struct file *file)
+{
+	const struct file_operations *orig = file->f_op;
+	struct ovl_fops *ofop = ovl_fops_find(orig);
+
+	if (likely(ofop))
+		return ofop;
+
+	mutex_lock(&ovl_fops_mutex);
+	ofop = ovl_fops_find(orig);
+	if (ofop)
+		goto out_unlock;
+
+	ofop = kzalloc(sizeof(struct ovl_fops), GFP_KERNEL);
+	if (!ofop)
+		goto out_unlock;
+
+	/*
+	 * FS don't usually fill in fops->owner, so grab ref to filesystem's
+	 * module as well.
+	 */
+	ofop->owner = file_inode(file)->i_sb->s_type->owner;
+	__module_get(ofop->owner);
+
+	ofop->magic = OVL_FOPS_MAGIC;
+	ofop->orig_fops = fops_get(orig);
+
+	/* These will need to be intercepted: */
+	ofop->fops.read_iter = orig->read_iter;
+	ofop->fops.mmap = orig->mmap;
+	ofop->fops.fsync = orig->fsync;
+
+	/*
+	 * These should be intercepted, but they are very unlikely to be
+	 * a problem in practice. Leave them alone for now.
+	 */
+	ofop->fops.copy_file_range = orig->copy_file_range;
+	ofop->fops.clone_file_range = orig->clone_file_range;
+	ofop->fops.dedupe_file_range = orig->dedupe_file_range;
+
+	/* Don't intercept these: */
+	ofop->fops.llseek = orig->llseek;
+	ofop->fops.unlocked_ioctl = orig->unlocked_ioctl;
+	ofop->fops.compat_ioctl = orig->compat_ioctl;
+	ofop->fops.flush = orig->flush;
+	ofop->fops.release = orig->release;
+	ofop->fops.get_unmapped_area = orig->get_unmapped_area;
+	ofop->fops.check_flags = orig->check_flags;
+
+	/* splice_read should be generic_file_splice_read */
+	WARN_ON(orig->splice_read != generic_file_splice_read);
+	ofop->fops.splice_read = generic_file_splice_read;
+
+	/* These make no sense for "normal" files: */
+	WARN_ON(orig->read);
+	WARN_ON(orig->iterate);
+	WARN_ON(orig->iterate_shared);
+	WARN_ON(orig->poll);
+	WARN_ON(orig->fasync);
+	WARN_ON(orig->show_fdinfo);
+
+	/*
+	 * Don't add those which are unneeded for O_RDONLY:
+	 *
+	 * write
+	 * write_iter
+	 * splice_write
+	 * sendpage
+	 * fallocate
+	 *
+	 * Locking operations are already intercepted by vfs for ovl:
+	 *
+	 * lock
+	 * flock
+	 * setlease
+	 */
+
+	hash_add_rcu(ovl_fops_htable, &ofop->entry, (long) orig);
+
+out_unlock:
+	mutex_unlock(&ovl_fops_mutex);
+
+	return ofop;
+}
+
 static int ovl_open(struct inode *inode, struct file *file)
 {
 	int ret = 0;
+	struct ovl_fops *ofop;
+	bool isupper = OVL_TYPE_UPPER(ovl_path_type(file->f_path.dentry));
 
 	/* Want fops from real inode */
 	replace_fops(file, inode->i_fop);
 	if (file->f_op->open)
 		ret = file->f_op->open(inode, file);
 
-	return ret;
+	/* No need to override fops for upper */
+	if (isupper || ret)
+		return ret;
+
+	ofop = ovl_fops_get(file);
+	if (unlikely(!ofop)) {
+		if (file->f_op->release)
+			file->f_op->release(inode, file);
+		return -ENOMEM;
+	}
+	replace_fops(file, &ofop->fops);
+
+	return 0;
 }
 
 static const struct file_operations ovl_file_operations = {
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index a83de5d5b8a0..a305083112f9 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -209,6 +209,8 @@ static inline void ovl_copyattr(struct inode *from, struct inode *to)
 	to->i_ctime = from->i_ctime;
 }
 
+void ovl_cleanup_fops_htable(void);
+
 /* dir.c */
 extern const struct inode_operations ovl_dir_inode_operations;
 struct dentry *ovl_lookup_temp(struct dentry *workdir, struct dentry *dentry);
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 4bd1e9c7246f..f9a2021c9abb 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -963,6 +963,7 @@ static int __init ovl_init(void)
 static void __exit ovl_exit(void)
 {
 	unregister_filesystem(&ovl_fs_type);
+	ovl_cleanup_fops_htable();
 }
 
 module_init(ovl_init);
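The dispatch performed by the OVL_CALL_REAL_FOP macro (recover the wrapper from file->f_op with container_of(), check the magic value, then forward to the original operation) can be sketched in plain C. The struct names, call_real_read() and real_read() are hypothetical stand-ins, and a const-preserving container_of() is defined locally because standard C has none; the magic constant is the patch's OVL_FOPS_MAGIC. Unlike the macro, which relies on WARN_ON() and a GNU statement expression, this sketch simply returns -EIO on a mismatch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Const-preserving userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((const type *)((const char *)(ptr) - offsetof(type, member)))

/* Same value as OVL_FOPS_MAGIC in the patch. */
#define WRAPPER_MAGIC 0x73706f462a6c764fULL

/* Stand-in for struct file_operations, reduced to one operation. */
struct ops {
	int (*read)(int arg);
};

/* Stand-in for struct ovl_fops; field order follows the patch. */
struct wrapper {
	void *owner;             /* stand-in for struct module *owner; unused */
	struct ops ops;          /* file->f_op points at this member */
	uint64_t magic;          /* WRAPPER_MAGIC while the wrapper is intact */
	const struct ops *orig;  /* the filesystem's original operations */
};

/* Stand-in for struct file. */
struct file {
	const struct ops *f_op;
};

/* Forward a read to the original operation, like OVL_CALL_REAL_FOP. */
static int call_real_read(struct file *file, int arg)
{
	const struct wrapper *w;

	assert(file != NULL && file->f_op != NULL);
	w = container_of(file->f_op, struct wrapper, ops);

	/* A bad magic means f_op no longer points into an intact wrapper. */
	if (w->magic != WRAPPER_MAGIC)
		return -EIO;
	return w->orig->read(arg);
}

/* Example "filesystem" read operation. */
static int real_read(int arg)
{
	return arg * 2;
}
```

Because ops is not the first member of struct wrapper, container_of() really does subtract a non-zero offset here, as it does for struct ovl_fops, where owner precedes fops.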