
ceph: is_root_ceph_dentry() cleanup

Message ID 20250128011023.55012-1-slava@dubeyko.com (mailing list archive)
State New

Commit Message

Viacheslav Dubeyko Jan. 28, 2025, 1:10 a.m. UTC
From: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>

This patch introduces CEPH_HIDDEN_DIR_NAME. It
declares the name of the hidden directory .ceph
in include/linux/ceph/ceph_fs.h instead of hiding
it in dir.c. The hardcoded length of the name is
also replaced with strlen(CEPH_HIDDEN_DIR_NAME).

Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
---
 fs/ceph/dir.c                | 10 ++++++++--
 include/linux/ceph/ceph_fs.h |  2 ++
 2 files changed, 10 insertions(+), 2 deletions(-)

Comments

Al Viro Jan. 28, 2025, 3:07 a.m. UTC | #1
On Mon, Jan 27, 2025 at 05:10:23PM -0800, Viacheslav Dubeyko wrote:
> From: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
> 
> This patch introduces CEPH_HIDDEN_DIR_NAME. It
> declares the name of the hidden directory .ceph
> in include/linux/ceph/ceph_fs.h instead of hiding
> it in dir.c. The hardcoded length of the name is
> also replaced with strlen(CEPH_HIDDEN_DIR_NAME).

Hmm...

Speaking of that area
	* how the hell could ceph_lookup() ever be called with dentry
that is *NOT* negative?  VFS certainly won't do that; I'm not sure about
ceph_handle_notrace_create(), but it doesn't look like that's possible
without server being malicious (if it's possible at all).

	* speaking of malicious servers, what happens if
it gets CEPH_MDS_OP_LOOKUP and it returns a normal reply to positive
lookup, but with cpu_to_le32(-ENOENT) shoved into head->result?
	AFAICS, ceph_handle_snapdir() will be called with dentry
that is already made positive; results will not be pretty...
Viacheslav Dubeyko Jan. 28, 2025, 11:27 p.m. UTC | #2
On Tue, 2025-01-28 at 03:07 +0000, Al Viro wrote:
> On Mon, Jan 27, 2025 at 05:10:23PM -0800, Viacheslav Dubeyko wrote:
> > From: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
> > 
> > This patch introduces CEPH_HIDDEN_DIR_NAME. It
> > declares the name of the hidden directory .ceph
> > in include/linux/ceph/ceph_fs.h instead of hiding
> > it in dir.c. The hardcoded length of the name is
> > also replaced with strlen(CEPH_HIDDEN_DIR_NAME).
> 
> Hmm...
> 
> Speaking of that area
> 	* how the hell could ceph_lookup() ever be called with dentry
> that is *NOT* negative?  VFS certainly won't do that; I'm not sure about
> ceph_handle_notrace_create(), but it doesn't look like that's possible
> without server being malicious (if it's possible at all).
> 
> 	* speaking of malicious servers, what happens if
> it gets CEPH_MDS_OP_LOOKUP and it returns a normal reply to positive
> lookup, but with cpu_to_le32(-ENOENT) shoved into head->result?
> 	AFAICS, ceph_handle_snapdir() will be called with dentry
> that is already made positive; results will not be pretty...

I assume that you imply this code:

	/* can we conclude ENOENT locally? */
	if (d_really_is_negative(dentry)) {
		struct ceph_inode_info *ci = ceph_inode(dir);
		struct ceph_dentry_info *di = ceph_dentry(dentry);

		spin_lock(&ci->i_ceph_lock);
		doutc(cl, " dir %llx.%llx flags are 0x%lx\n",
		      ceph_vinop(dir), ci->i_ceph_flags);
		if (strncmp(dentry->d_name.name,
			    fsc->mount_options->snapdir_name,
			    dentry->d_name.len) &&
		    !is_root_ceph_dentry(dir, dentry) &&
		    ceph_test_mount_opt(fsc, DCACHE) &&
		    __ceph_dir_is_complete(ci) &&
		    __ceph_caps_issued_mask_metric(ci, CEPH_CAP_FILE_SHARED, 1)) {
			__ceph_touch_fmode(ci, mdsc, CEPH_FILE_MODE_RD);
			spin_unlock(&ci->i_ceph_lock);
			doutc(cl, " dir %llx.%llx complete, -ENOENT\n",
			      ceph_vinop(dir));
			d_add(dentry, NULL);
			di->lease_shared_gen = atomic_read(&ci->i_shared_gen);
			return NULL;
		}
		spin_unlock(&ci->i_ceph_lock);
	}

Am I correct? So, how can we rework this code if it's wrong? What is your
vision? Do you mean that it's dead code? How can we check it?

Thanks,
Slava.
Al Viro Jan. 29, 2025, 1:12 a.m. UTC | #3
On Tue, Jan 28, 2025 at 11:27:05PM +0000, Viacheslav Dubeyko wrote:

> I assume that you imply this code:
> 
> 	/* can we conclude ENOENT locally? */
> 	if (d_really_is_negative(dentry)) {
> 		struct ceph_inode_info *ci = ceph_inode(dir);
> 		struct ceph_dentry_info *di = ceph_dentry(dentry);
> 
> 		spin_lock(&ci->i_ceph_lock);
> 		doutc(cl, " dir %llx.%llx flags are 0x%lx\n",
> 		      ceph_vinop(dir), ci->i_ceph_flags);
> 		if (strncmp(dentry->d_name.name,
> 			    fsc->mount_options->snapdir_name,
> 			    dentry->d_name.len) &&
> 		    !is_root_ceph_dentry(dir, dentry) &&
> 		    ceph_test_mount_opt(fsc, DCACHE) &&
> 		    __ceph_dir_is_complete(ci) &&
> 		    __ceph_caps_issued_mask_metric(ci, CEPH_CAP_FILE_SHARED, 1)) {
> 			__ceph_touch_fmode(ci, mdsc, CEPH_FILE_MODE_RD);
> 			spin_unlock(&ci->i_ceph_lock);
> 			doutc(cl, " dir %llx.%llx complete, -ENOENT\n",
> 			      ceph_vinop(dir));
> 			d_add(dentry, NULL);
> 			di->lease_shared_gen = atomic_read(&ci->i_shared_gen);
> 			return NULL;
> 		}
> 		spin_unlock(&ci->i_ceph_lock);
> 	}
> 
> Am I correct? So, how can we rework this code if it's wrong? What is your
> vision? Do you mean that it's dead code? How can we check it?

I mean that ->lookup() is called *ONLY* for negative unhashed dentries.
In other words, on a call from VFS that condition will always be true.
That part is easily provable; what is harder to reason about is the
direct call of ceph_lookup() from ceph_handle_notrace_create().

The callers of that thing (ceph_mknod(), ceph_symlink() and ceph_mkdir())
are all guaranteed that dentry will be negative when they are called.
The hard-to-reason-about part is the call of ceph_mdsc_do_request()
directly preceding the calls of ceph_handle_notrace_create().

Can ceph_mdsc_do_request() return 0, with req->r_reply_info.head->is_dentry
being false *AND* a call of splice_dentry() made by ceph_fill_trace() called
by ceph_mdsc_do_request()?

AFAICS, there are 3 calls of splice_dentry(); two of them happen under
explicit check for ->is_dentry and thus are not interesting for our
purposes.  The third one, though, could be hit with ->is_dentry being
false and ->r_op being CEPH_MDS_OP_MKSNAP.  That is not impossible
from ceph_mkdir(), as far as I can tell, and I don't understand the
details well enough to tell whether it can actually happen.

Is it actually possible to hit ceph_handle_notrace_create() with
a positive dentry?
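
The invariant under discussion -- the VFS calls ->lookup() only for negative unhashed dentries -- could be made explicit with a defensive guard at the top of ceph_lookup(). The following is a userspace sketch with stub types (the struct definitions and the helper name ceph_lookup_entry_check are illustrative, not the kernel's), just to make the guard concrete:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel structures (not the real definitions). */
struct inode { unsigned long i_ino; };
struct dentry { struct inode *d_inode; };

/* Mirrors d_really_is_negative(): a negative dentry has no inode. */
static bool d_really_is_negative(const struct dentry *dentry)
{
	return dentry->d_inode == NULL;
}

/*
 * Defensive entry check ceph_lookup() could perform: a positive dentry
 * here would indicate a broken caller (or a server reply that already
 * made the dentry positive), so bail out instead of proceeding.  In the
 * kernel this would likely be WARN_ON_ONCE() plus an error return.
 */
static int ceph_lookup_entry_check(const struct dentry *dentry)
{
	if (!d_really_is_negative(dentry))
		return -EINVAL;
	return 0;
}
```

Such a guard would turn the hard-to-reason-about ceph_handle_notrace_create() path into a loud failure rather than silent misbehavior.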

Patch

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 0bf388e07a02..5151c614b5cb 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -782,10 +782,16 @@  struct dentry *ceph_finish_lookup(struct ceph_mds_request *req,
 	return dentry;
 }
 
+static inline
+bool is_hidden_ceph_dir(struct dentry *dentry)
+{
+	size_t len = strlen(CEPH_HIDDEN_DIR_NAME);
+	return strncmp(dentry->d_name.name, CEPH_HIDDEN_DIR_NAME, len) == 0;
+}
+
 static bool is_root_ceph_dentry(struct inode *inode, struct dentry *dentry)
 {
-	return ceph_ino(inode) == CEPH_INO_ROOT &&
-		strncmp(dentry->d_name.name, ".ceph", 5) == 0;
+	return ceph_ino(inode) == CEPH_INO_ROOT && is_hidden_ceph_dir(dentry);
 }
 
 /*
diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
index 2d7d86f0290d..84a1391aab29 100644
--- a/include/linux/ceph/ceph_fs.h
+++ b/include/linux/ceph/ceph_fs.h
@@ -31,6 +31,8 @@ 
 #define CEPH_INO_CEPH   2            /* hidden .ceph dir */
 #define CEPH_INO_GLOBAL_SNAPREALM  3 /* global dummy snaprealm */
 
+#define CEPH_HIDDEN_DIR_NAME	".ceph"
+
 /* arbitrary limit on max # of monitors (cluster of 3 is typical) */
 #define CEPH_MAX_MON   31
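
One note on the helper itself: strncmp() bounded by strlen(CEPH_HIDDEN_DIR_NAME) is a prefix match, so a longer name beginning with ".ceph" would also satisfy it; comparing dentry->d_name.len as well would make the match exact. Whether longer names can actually reach this check is a separate question. A userspace sketch of the difference (the function names here are illustrative, not from the patch):

```c
#include <stdbool.h>
#include <string.h>

#define CEPH_HIDDEN_DIR_NAME ".ceph"

/* Prefix match, as in is_hidden_ceph_dir() in the patch: true for
 * ".ceph" but also for any longer name that begins with ".ceph". */
static bool hidden_dir_prefix_match(const char *name)
{
	return strncmp(name, CEPH_HIDDEN_DIR_NAME,
		       strlen(CEPH_HIDDEN_DIR_NAME)) == 0;
}

/* Exact match: also requires the lengths to agree, which is what
 * checking dentry->d_name.len would buy in the kernel version. */
static bool hidden_dir_exact_match(const char *name, size_t len)
{
	return len == strlen(CEPH_HIDDEN_DIR_NAME) &&
	       memcmp(name, CEPH_HIDDEN_DIR_NAME, len) == 0;
}
```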