diff mbox

[2/2] ovl: allow distributed fs as lower layer

Message ID 1433424586-7771-3-git-send-email-miklos@szeredi.hu (mailing list archive)
State New, archived
Headers show

Commit Message

Miklos Szeredi June 4, 2015, 1:29 p.m. UTC
From: Miklos Szeredi <mszeredi@suse.cz>

Allow filesystems with .d_revalidate as lower layer(s), but not as upper
layer.

For local filesystems the rule was that modifications on the layers
directly while being part of the overlay results in undefined behavior.

This can easily be extended to distributed filesystems: we assume the tree
used as lower layer is static, which means ->d_revalidate() should always
return "1".  If that is not the case, return -ESTALE, don't try to work
around the modification.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/overlayfs/super.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 68 insertions(+), 5 deletions(-)

Comments

Al Viro June 5, 2015, 12:07 a.m. UTC | #1
On Thu, Jun 04, 2015 at 03:29:46PM +0200, Miklos Szeredi wrote:
> From: Miklos Szeredi <mszeredi@suse.cz>
> 
> Allow filesystems with .d_revalidate as lower layer(s), but not as upper
> layer.
> 
> For local filesystems the rule was that modifications on the layers
> directly while being part of the overlay results in undefined behavior.
> 
> This can easily be extended to distributed filesystems: we assume the tree
> used as lower layer is static, which means ->d_revalidate() should always
> return "1".  If that is not the case, return -ESTALE, don't try to work
> around the modification.

Umm...  Cosmetical point is that this

> +static bool ovl_remote(struct dentry *root)
> +{
> +	const struct dentry_operations *dop = root->d_op;
> +
> +	return dop && (dop->d_revalidate || dop->d_weak_revalidate);
> +}

is better done as
	root->d_flags & (DCACHE_OP_REVALIDATE | DCACHE_OP_WEAK_REVALIDATE)

More interesting question is whether anything in the system relies on
existing behaviour that follows ->d_revalidate() returning 0.  Have you
tried to mount e.g. procfs as underlying layer and torture it for a while?
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Miklos Szeredi June 5, 2015, 3:37 p.m. UTC | #2
On Fri, Jun 05, 2015 at 01:07:15AM +0100, Al Viro wrote:

> Umm...  Cosmetical point is that this
> 
> > +static bool ovl_remote(struct dentry *root)
> > +{
> > +	const struct dentry_operations *dop = root->d_op;
> > +
> > +	return dop && (dop->d_revalidate || dop->d_weak_revalidate);
> > +}
> 
> is better done as
> 	root->d_flags & (DCACHE_OP_REVALIDATE | DCACHE_OP_WEAK_REVALIDATE)

Okay.

> 
> More interesting question is whether anything in the system relies on
> existing behaviour that follows ->d_revalidate() returning 0.

Hmm, d_invalidate() almost always follows ->d_revalidate().  Almost, becuase RCU
lookup can get aborted at that point.  We can easily stick d_invalidate() in
there for the non-RCU case.

Regular lookup also almost always follows ->d_revalidate().  Except if
allocation of new dentry fails.  So relying on this would be buggy (which is not
to say nobody does it).

>  Have you tried to mount e.g. procfs as underlying layer and torture it for a
> while?

I did try now.  Nothing bad happened during the test (parallel stat(1) of the
whole overlayed proc tree).

My laptop froze while trying to write this mail.  But it's 8 years old and when
the fan starts to make noises and the weather is hot, it does this sometimes.  I
don't think that has anything to do with overlayfs, but will do more testing...

Thanks,
Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman June 7, 2015, 1:02 a.m. UTC | #3
Miklos Szeredi <miklos@szeredi.hu> writes:

> On Fri, Jun 05, 2015 at 01:07:15AM +0100, Al Viro wrote:
>
>> Umm...  Cosmetical point is that this
>> 
>> > +static bool ovl_remote(struct dentry *root)
>> > +{
>> > +	const struct dentry_operations *dop = root->d_op;
>> > +
>> > +	return dop && (dop->d_revalidate || dop->d_weak_revalidate);
>> > +}
>> 
>> is better done as
>> 	root->d_flags & (DCACHE_OP_REVALIDATE | DCACHE_OP_WEAK_REVALIDATE)
>
> Okay.
>
>> 
>> More interesting question is whether anything in the system relies on
>> existing behaviour that follows ->d_revalidate() returning 0.
>
> Hmm, d_invalidate() almost always follows ->d_revalidate().  Almost, becuase RCU
> lookup can get aborted at that point.  We can easily stick d_invalidate() in
> there for the non-RCU case.
>
> Regular lookup also almost always follows ->d_revalidate().  Except if
> allocation of new dentry fails.  So relying on this would be buggy (which is not
> to say nobody does it).
>
>>  Have you tried to mount e.g. procfs as underlying layer and torture it for a
>> while?
>
> I did try now.  Nothing bad happened during the test (parallel stat(1) of the
> whole overlayed proc tree).
>
> My laptop froze while trying to write this mail.  But it's 8 years old and when
> the fan starts to make noises and the weather is hot, it does this sometimes.  I
> don't think that has anything to do with overlayfs, but will do more
> testing...

A nasty corner case to be aware of (and I think this is part of what Al
was warning about).  /proc/sys/net is different depending upon which
current->nsproxy->net_ns.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Miklos Szeredi June 9, 2015, 12:44 p.m. UTC | #4
On Sun, Jun 7, 2015 at 3:02 AM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> A nasty corner case to be aware of (and I think this is part of what Al
> was warning about).  /proc/sys/net is different depending upon which
> current->nsproxy->net_ns.

Ah,  I'm beginning to grasp what's going on there: mulitple dentries
with the same name but belonging to different namespaces,
->d_compare() being used to select the right one.  Is that it?

Overlayfs checks for d_compare() on the root of the lower and upper
trees, but here it only set on a subdirectory of proc, not on every
dentry.  So overlayfs should be careful and check for
DCACHE_OP_COMPARE | DCACHE_OP_HASH and reject going down such a
dentry.


Thanks,
Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman June 9, 2015, 12:54 p.m. UTC | #5
Miklos Szeredi <miklos@szeredi.hu> writes:

> On Sun, Jun 7, 2015 at 3:02 AM, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> A nasty corner case to be aware of (and I think this is part of what Al
>> was warning about).  /proc/sys/net is different depending upon which
>> current->nsproxy->net_ns.
>
> Ah,  I'm beginning to grasp what's going on there: mulitple dentries
> with the same name but belonging to different namespaces,
> ->d_compare() being used to select the right one.  Is that it?

Yes.  The whole sysctl_is_seen magic.

I am not proud of it, and I keep thinking I should create
/proc/<pid>/sys/...  making /proc/sys a symlink to /proc/<pid>/sys/

so that case could go away.  Although at this point tomoyo or apparmor
probably has rules that would make that impossible (despite no
applications actually caring). *sigh*

> Overlayfs checks for d_compare() on the root of the lower and upper
> trees, but here it only set on a subdirectory of proc, not on every
> dentry.  So overlayfs should be careful and check for
> DCACHE_OP_COMPARE | DCACHE_OP_HASH and reject going down such a
> dentry.

That sound about right.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index de9d2ee68ccf..822e6bbe7bf7 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -273,10 +273,52 @@  static void ovl_dentry_release(struct dentry *dentry)
 	}
 }
 
+static int ovl_dentry_revalidate(struct dentry *dentry, unsigned int flags)
+{
+	struct ovl_entry *oe = dentry->d_fsdata;
+	unsigned int i;
+	int ret = 1;
+
+	for (i = 0; i < oe->numlower; i++) {
+		struct dentry *d = oe->lowerstack[i].dentry;
+
+		if (d->d_flags & DCACHE_OP_REVALIDATE) {
+			ret = d->d_op->d_revalidate(d, flags);
+			if (ret <= 0)
+				return ret ? ret : -ESTALE;
+		}
+	}
+	return 1;
+}
+
+static int ovl_dentry_weak_revalidate(struct dentry *dentry, unsigned int flags)
+{
+	struct ovl_entry *oe = dentry->d_fsdata;
+	unsigned int i;
+	int ret = 1;
+
+	for (i = 0; i < oe->numlower; i++) {
+		struct dentry *d = oe->lowerstack[i].dentry;
+
+		if (d->d_flags & DCACHE_OP_WEAK_REVALIDATE) {
+			ret = d->d_op->d_weak_revalidate(d, flags);
+			if (ret <= 0)
+				return ret ? ret : -ESTALE;
+		}
+	}
+	return 1;
+}
+
 static const struct dentry_operations ovl_dentry_operations = {
 	.d_release = ovl_dentry_release,
 };
 
+static const struct dentry_operations ovl_reval_dentry_operations = {
+	.d_release = ovl_dentry_release,
+	.d_revalidate = ovl_dentry_revalidate,
+	.d_weak_revalidate = ovl_dentry_weak_revalidate,
+};
+
 static struct ovl_entry *ovl_alloc_entry(unsigned int numlower)
 {
 	size_t size = offsetof(struct ovl_entry, lowerstack[numlower]);
@@ -705,18 +747,23 @@  static bool ovl_is_allowed_fs_type(struct dentry *root)
 	/*
 	 * We don't support:
 	 *  - autofs
-	 *  - filesystems with revalidate (FIXME for lower layer)
 	 *  - filesystems with case insensitive names
 	 */
 	if (dop &&
 	    (dop->d_manage ||
-	     dop->d_revalidate || dop->d_weak_revalidate ||
 	     dop->d_compare || dop->d_hash)) {
 		return false;
 	}
 	return true;
 }
 
+static bool ovl_remote(struct dentry *root)
+{
+	const struct dentry_operations *dop = root->d_op;
+
+	return dop && (dop->d_revalidate || dop->d_weak_revalidate);
+}
+
 static int ovl_mount_dir_noesc(const char *name, struct path *path)
 {
 	int err = -EINVAL;
@@ -755,13 +802,21 @@  static int ovl_mount_dir(const char *name, struct path *path)
 	if (tmp) {
 		ovl_unescape(tmp);
 		err = ovl_mount_dir_noesc(tmp, path);
+
+		if (!err)
+			if (ovl_remote(path->dentry)) {
+				pr_err("overlayfs: filesystem on '%s' not supported as upperdir\n",
+				       tmp);
+				path_put(path);
+				err = -EINVAL;
+			}
 		kfree(tmp);
 	}
 	return err;
 }
 
 static int ovl_lower_dir(const char *name, struct path *path, long *namelen,
-			 int *stack_depth)
+			 int *stack_depth, bool *remote)
 {
 	int err;
 	struct kstatfs statfs;
@@ -778,6 +833,9 @@  static int ovl_lower_dir(const char *name, struct path *path, long *namelen,
 	*namelen = max(*namelen, statfs.f_namelen);
 	*stack_depth = max(*stack_depth, path->mnt->mnt_sb->s_stack_depth);
 
+	if (ovl_remote(path->dentry))
+		*remote = true;
+
 	return 0;
 
 out_put:
@@ -831,6 +889,7 @@  static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	unsigned int numlower;
 	unsigned int stacklen = 0;
 	unsigned int i;
+	bool remote = false;
 	int err;
 
 	err = -ENOMEM;
@@ -904,7 +963,8 @@  static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	lower = lowertmp;
 	for (numlower = 0; numlower < stacklen; numlower++) {
 		err = ovl_lower_dir(lower, &stack[numlower],
-				    &ufs->lower_namelen, &sb->s_stack_depth);
+				    &ufs->lower_namelen, &sb->s_stack_depth,
+				    &remote);
 		if (err)
 			goto out_put_lowerpath;
 
@@ -962,7 +1022,10 @@  static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 	if (!ufs->upper_mnt)
 		sb->s_flags |= MS_RDONLY;
 
-	sb->s_d_op = &ovl_dentry_operations;
+	if (remote)
+		sb->s_d_op = &ovl_reval_dentry_operations;
+	else
+		sb->s_d_op = &ovl_dentry_operations;
 
 	err = -ENOMEM;
 	oe = ovl_alloc_entry(numlower);