diff mbox

dcache: make d_splice_alias use d_materialise_unique

Message ID 20140123212700.GA30466@fieldses.org (mailing list archive)
State New, archived
Headers show

Commit Message

J. Bruce Fields Jan. 23, 2014, 9:27 p.m. UTC
From: "J. Bruce Fields" <bfields@redhat.com>

d_splice_alias can create duplicate directory aliases (in the !new
case), or (in the new case) d_move directories without holding
appropriate locks.

d_materialise_unique deals with both of these problems.  (The latter
seems to be dealt by trylocks (see __d_unalias), which look like they
could cause spurious lookup failures--but that's at least better than
corrupting the dcache.)

We have to fix up a couple minor differences between d_splice_alias and
d_materialise_unique:
	- d_splice_alias also handles IS_ERR(inode)
	- d_splice_alias allows already-hashed dentries in one special
	  case.

We keep the d_splice_alias name and calling convention and deprecate
d_materialise_unique, which has fewer callers.

Also add some documentation.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/dcache.c |   96 ++++++++++++++++++++++-------------------------------------
 1 file changed, 35 insertions(+), 61 deletions(-)

Here's a revised patch.

If it looks reasonable then there are some further minor simplifications
possible.  (See git://linux-nfs.org/~bfields/linux-topics.git for-viro.)

Comments

Al Viro Jan. 31, 2014, 6:42 p.m. UTC | #1
On Thu, Jan 23, 2014 at 04:27:00PM -0500, J. Bruce Fields wrote:
> From: "J. Bruce Fields" <bfields@redhat.com>
> 
> d_splice_alias can create duplicate directory aliases (in the !new
> case), or (in the new case) d_move directories without holding
> appropriate locks.

Details, please.  In the new case, we have IS_ROOT() alias found;
what locks would that need?  Note that d_materialise_unique() won't
bother with __d_unalias() in such case - it does what d_move() would've
done, without taking any mutex.

In the !new case, we'd need a preexisting dentry alias, complete with
parent.  IOW, that's the case when directory already in the tree
has been found during lookup from another parent.  In which case
we shouldn't be using d_splice_alias() at all, as it is (and it
certainly can't happen for any local fs).

Now, I agree that merging that with d_materialise_unique() might be
a good idea, but commit message is wrong as it, AFAICS.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
J. Bruce Fields Jan. 31, 2014, 7:47 p.m. UTC | #2
On Fri, Jan 31, 2014 at 06:42:58PM +0000, Al Viro wrote:
> On Thu, Jan 23, 2014 at 04:27:00PM -0500, J. Bruce Fields wrote:
> > From: "J. Bruce Fields" <bfields@redhat.com>
> > 
> > d_splice_alias can create duplicate directory aliases (in the !new
> > case), or (in the new case) d_move directories without holding
> > appropriate locks.
> 
> Details, please.  In the new case, we have IS_ROOT() alias found;
> what locks would that need?  Note that d_materialise_unique() won't
> bother with __d_unalias() in such case - it does what d_move() would've
> done, without taking any mutex.

Of course you're right, and Miklos had pointed this out already and I
forgot to update the changelog.  Apologies!

> In the !new case, we'd need a preexisting dentry alias, complete with
> parent.  IOW, that's the case when directory already in the tree
> has been found during lookup from another parent.  In which case
> we shouldn't be using d_splice_alias() at all, as it is (and it
> certainly can't happen for any local fs).

Yes, except: won't a local filesystem will still hit this case on a
filesystem that's corrupted to link a directory into multiple parents?

Though in that case arguably the right behavior might be, say, WARN and
return -EIO.

> Now, I agree that merging that with d_materialise_unique() might be
> a good idea, but commit message is wrong as it, AFAICS.

Agreed, I'll fix and resend.

Though now I wonder whether it's worth keeping two different interfaces,
one for the case when finding a parent in a different directory is an
error and one for the case when it's normal and you'd just like it fixed
up.

(Then one remaining thing I don't understand is how to make that fixing
up reliable.  Or is there some reason nobody hits the _EBUSY case of
__d_unalias?)

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
J. Bruce Fields Feb. 6, 2014, 5:03 p.m. UTC | #3
On Fri, Jan 31, 2014 at 02:47:58PM -0500, J. Bruce Fields wrote:
> (Then one remaining thing I don't understand is how to make that fixing
> up reliable.  Or is there some reason nobody hits the _EBUSY case of
> __d_unalias?)

In fact, a reproducer found thanks to Hideshi Yamaoka:

On server:

	while true; do
		mv /exports/DIR /exports/TO/DIR
		mv /exports/TO/DIR /exports/DIR
	done

On client:

	mount -olookupcache=pos /mnt

	while true; do
		ls /mnt/TO;
	done

Also on client:

	while true; do
		strace -e open cat /mnt/DIR/test.txt 2>&1 | grep EBUSY
	done

Once all three of those loops are running I hit

	open("/mnt/DIR/test.txt", O_RDONLY)	= -1 EBUSY

very quickly.

The "lookupcache=pos" isn't really necessary but makes the reproducer
more reliable.

(Originally this was seen on a single client: the client itself was
doing the renames but also continually killing the second mv.  I suspect
that means the client sends the RENAME but then fails to update its
dcache, the result being again that the client's dcache is out of sync
with the server's tree and hence lookup is stuck trying to grab a dentry
from another directory.)

Is there some solution short of making ->lookup callers drop the i_mutex
and retry???

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/dcache.c b/fs/dcache.c
index 4bdb300..ec43194 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1765,7 +1765,7 @@  EXPORT_SYMBOL(d_instantiate_unique);
  * @inode: inode to attach to this dentry
  *
  * Fill in inode information in the entry.  If a directory alias is found, then
- * return an error (and drop inode).  Together with d_materialise_unique() this
+ * return an error (and drop inode).  Together with d_splice_alias() this
  * guarantees that a directory inode may never have more than one alias.
  */
 int d_instantiate_no_diralias(struct dentry *entry, struct inode *inode)
@@ -1905,58 +1905,6 @@  struct dentry *d_obtain_alias(struct inode *inode)
 EXPORT_SYMBOL(d_obtain_alias);
 
 /**
- * d_splice_alias - splice a disconnected dentry into the tree if one exists
- * @inode:  the inode which may have a disconnected dentry
- * @dentry: a negative dentry which we want to point to the inode.
- *
- * If inode is a directory and has a 'disconnected' dentry (i.e. IS_ROOT and
- * DCACHE_DISCONNECTED), then d_move that in place of the given dentry
- * and return it, else simply d_add the inode to the dentry and return NULL.
- *
- * This is needed in the lookup routine of any filesystem that is exportable
- * (via knfsd) so that we can build dcache paths to directories effectively.
- *
- * If a dentry was found and moved, then it is returned.  Otherwise NULL
- * is returned.  This matches the expected return value of ->lookup.
- *
- * Cluster filesystems may call this function with a negative, hashed dentry.
- * In that case, we know that the inode will be a regular file, and also this
- * will only occur during atomic_open. So we need to check for the dentry
- * being already hashed only in the final case.
- */
-struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
-{
-	struct dentry *new = NULL;
-
-	if (IS_ERR(inode))
-		return ERR_CAST(inode);
-
-	if (inode && S_ISDIR(inode->i_mode)) {
-		spin_lock(&inode->i_lock);
-		new = __d_find_alias(inode, 1);
-		if (new) {
-			BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED));
-			spin_unlock(&inode->i_lock);
-			security_d_instantiate(new, inode);
-			d_move(new, dentry);
-			iput(inode);
-		} else {
-			/* already taking inode->i_lock, so d_add() by hand */
-			__d_instantiate(dentry, inode);
-			spin_unlock(&inode->i_lock);
-			security_d_instantiate(dentry, inode);
-			d_rehash(dentry);
-		}
-	} else {
-		d_instantiate(dentry, inode);
-		if (d_unhashed(dentry))
-			d_rehash(dentry);
-	}
-	return new;
-}
-EXPORT_SYMBOL(d_splice_alias);
-
-/**
  * d_add_ci - lookup or allocate new dentry with case-exact name
  * @inode:  the inode case-insensitive lookup has found
  * @dentry: the negative dentry that was passed to the parent's lookup func
@@ -2716,19 +2664,37 @@  static void __d_materialise_dentry(struct dentry *dentry, struct dentry *anon)
 }
 
 /**
- * d_materialise_unique - introduce an inode into the tree
- * @dentry: candidate dentry
+ * d_splice_alias - introduce an inode into the tree
  * @inode: inode to bind to the dentry, to which aliases may be attached
+ * @dentry: candidate dentry
+ *
+ * Introduces a dentry into the tree, using either the provided dentry
+ * or, if appropriate, a preexisting alias for the same inode.  Caller must
+ * hold the i_mutex of the parent directory.
+ *
+ * The inode may also be an error or NULL; in the former case the error is
+ * just passed through, in the latter we hash and instantiate the negative
+ * dentry.  This allows filesystems to use d_splice_alias as the
+ * unconditional last step of their lookup method.
+ *
+ * d_splice_alias guarantees that directories will continue to have at
+ * most one alias, by moving an existing alias if necessary.  If doing
+ * so would create a directory loop, it will fail with -ELOOP.
+ *
+ * d_splice_alias makes no such guarantee for files, but may still
+ * use a preexisting alias when it's convenient.
  *
- * Introduces an dentry into the tree, substituting an extant disconnected
- * root directory alias in its place if there is one. Caller must hold the
- * i_mutex of the parent directory.
+ * Note that d_splice_alias normally expects an unhashed dentry, the single
+ * exception being that cluster filesystems may call this function
+ * during atomic_open with a negative hashed dentry, and with inode a
+ * regular file.
  */
-struct dentry *d_materialise_unique(struct dentry *dentry, struct inode *inode)
+struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
 {
 	struct dentry *actual;
 
-	BUG_ON(!d_unhashed(dentry));
+	if (IS_ERR(inode))
+		return ERR_CAST(inode);
 
 	if (!inode) {
 		actual = dentry;
@@ -2788,7 +2754,8 @@  struct dentry *d_materialise_unique(struct dentry *dentry, struct inode *inode)
 
 	spin_lock(&actual->d_lock);
 found:
-	_d_rehash(actual);
+	if (d_unhashed(actual))
+		_d_rehash(actual);
 	spin_unlock(&actual->d_lock);
 	spin_unlock(&inode->i_lock);
 out_nolock:
@@ -2800,6 +2767,13 @@  out_nolock:
 	iput(inode);
 	return actual;
 }
+EXPORT_SYMBOL(d_splice_alias);
+
+/* deprecated synonym for d_splice_alias(inode, dentry): */
+struct dentry *d_materialise_unique(struct dentry *dentry, struct inode *inode)
+{
+	return d_splice_alias(inode, dentry);
+}
 EXPORT_SYMBOL_GPL(d_materialise_unique);
 
 static int prepend(char **buffer, int *buflen, const char *str, int namelen)