Message ID | 20250128011023.55012-1-slava@dubeyko.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | ceph: is_root_ceph_dentry() cleanup | expand |
On Mon, Jan 27, 2025 at 05:10:23PM -0800, Viacheslav Dubeyko wrote: > From: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> > > This patch introduces CEPH_HIDDEN_DIR_NAME. It > declares name of the hidden directory .ceph in > the include/linux/ceph/ceph_fs.h instead of hiding > it in dir.c file. Also hardcoded length of the name > is changed on strlen(CEPH_HIDDEN_DIR_NAME). Hmm... Speaking of that area * how the hell could ceph_lookup() ever be called with dentry that is *NOT* negative? VFS certainly won't do that; I'm not sure about ceph_handle_notrace_create(), but it doesn't look like that's possible without server being malicious (if it's possible at all). * speaking of malicious servers, what happens if it gets CEPH_MDS_OP_LOOKUP and it returns a normal reply to positive lookup, but with cpu_to_le32(-ENOENT) shoved into head->result? AFAICS, ceph_handle_snapdir() will be called with dentry that is already made positive; results will not be pretty...
On Tue, 2025-01-28 at 03:07 +0000, Al Viro wrote: > On Mon, Jan 27, 2025 at 05:10:23PM -0800, Viacheslav Dubeyko wrote: > > From: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> > > > > This patch introduces CEPH_HIDDEN_DIR_NAME. It > > declares name of the hidden directory .ceph in > > the include/linux/ceph/ceph_fs.h instead of hiding > > it in dir.c file. Also hardcoded length of the name > > is changed on strlen(CEPH_HIDDEN_DIR_NAME). > > Hmm... > > Speaking of that area > * how the hell could ceph_lookup() ever be called with dentry > that is *NOT* negative? VFS certainly won't do that; I'm not sure about > ceph_handle_notrace_create(), but it doesn't look like that's possible > without server being malicious (if it's possible at all). > > * speaking of malicious servers, what happens if > it gets CEPH_MDS_OP_LOOKUP and it returns a normal reply to positive > lookup, but with cpu_to_le32(-ENOENT) shoved into head->result? > AFAICS, ceph_handle_snapdir() will be called with dentry > that is already made positive; results will not be pretty... I assume that you imply this code: /* can we conclude ENOENT locally? */ if (d_really_is_negative(dentry)) { struct ceph_inode_info *ci = ceph_inode(dir); struct ceph_dentry_info *di = ceph_dentry(dentry); spin_lock(&ci->i_ceph_lock); doutc(cl, " dir %llx.%llx flags are 0x%lx\n", ceph_vinop(dir), ci->i_ceph_flags); if (strncmp(dentry->d_name.name, fsc->mount_options->snapdir_name, dentry->d_name.len) && !is_root_ceph_dentry(dir, dentry) && ceph_test_mount_opt(fsc, DCACHE) && __ceph_dir_is_complete(ci) && __ceph_caps_issued_mask_metric(ci, CEPH_CAP_FILE_SHARED, 1)) { __ceph_touch_fmode(ci, mdsc, CEPH_FILE_MODE_RD); spin_unlock(&ci->i_ceph_lock); doutc(cl, " dir %llx.%llx complete, -ENOENT\n", ceph_vinop(dir)); d_add(dentry, NULL); di->lease_shared_gen = atomic_read(&ci->i_shared_gen); return NULL; } spin_unlock(&ci->i_ceph_lock); } Am I correct? So, how can we rework this code if it's wrong? What is your vision? Do you mean that it's dead code? How can we check it? Thanks, Slava.
On Tue, Jan 28, 2025 at 11:27:05PM +0000, Viacheslav Dubeyko wrote: > I assume that you imply this code: > > /* can we conclude ENOENT locally? */ > if (d_really_is_negative(dentry)) { > struct ceph_inode_info *ci = ceph_inode(dir); > struct ceph_dentry_info *di = ceph_dentry(dentry); > > spin_lock(&ci->i_ceph_lock); > doutc(cl, " dir %llx.%llx flags are 0x%lx\n", > ceph_vinop(dir), ci->i_ceph_flags); > if (strncmp(dentry->d_name.name, > fsc->mount_options->snapdir_name, > dentry->d_name.len) && > !is_root_ceph_dentry(dir, dentry) && > ceph_test_mount_opt(fsc, DCACHE) && > __ceph_dir_is_complete(ci) && > __ceph_caps_issued_mask_metric(ci, CEPH_CAP_FILE_SHARED, > 1)) { > __ceph_touch_fmode(ci, mdsc, CEPH_FILE_MODE_RD); > spin_unlock(&ci->i_ceph_lock); > doutc(cl, " dir %llx.%llx complete, -ENOENT\n", > ceph_vinop(dir)); > d_add(dentry, NULL); > di->lease_shared_gen = atomic_read(&ci->i_shared_gen); > return NULL; > } > spin_unlock(&ci->i_ceph_lock); > } > > Am I correct? So, how can we rework this code if it's wrong? What is your > vision? Do you mean that it's dead code? How can we check it? I mean that ->lookup() is called *ONLY* for a negative unhashed dentries. In other words, on a call from VFS that condition will always be true. That part is easily provable; what is harder to reason about is the direct call of ceph_lookup() from ceph_handle_notrace_create(). The callers of that thing (ceph_mknod(), ceph_symlink() and ceph_mkdir()) are all guaranteed that dentry will be negative when they are called. The hard-to-reason-about part is the call of ceph_mdsc_do_request() directly preceding the calls of ceph_handle_notrace_create(). Can ceph_mdsc_do_request() return 0, with req->r_reply_info.head->is_dentry being false *AND* a call of splice_dentry() made by ceph_fill_trace() called by ceph_mdsc_do_request()? AFAICS, there are 3 calls of splice_dentry(); two of them happen under explicit check for ->is_dentry and thus are not interesting for our purposes. The third one, though, could be hit with ->is_dentry being false and ->r_op being CEPH_MDS_OP_MKSNAP. That is not impossible from ceph_mkdir(), as far as I can tell, and I don't understand the details well enough to tell whether it can actually happen. Is it actually possible to hit ceph_handle_notrace_create() with a positive dentry?
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c index 0bf388e07a02..5151c614b5cb 100644 --- a/fs/ceph/dir.c +++ b/fs/ceph/dir.c @@ -782,10 +782,16 @@ struct dentry *ceph_finish_lookup(struct ceph_mds_request *req, return dentry; } +static inline +bool is_hidden_ceph_dir(struct dentry *dentry) +{ + size_t len = strlen(CEPH_HIDDEN_DIR_NAME); + return strncmp(dentry->d_name.name, CEPH_HIDDEN_DIR_NAME, len) == 0; +} + static bool is_root_ceph_dentry(struct inode *inode, struct dentry *dentry) { - return ceph_ino(inode) == CEPH_INO_ROOT && - strncmp(dentry->d_name.name, ".ceph", 5) == 0; + return ceph_ino(inode) == CEPH_INO_ROOT && is_hidden_ceph_dir(dentry); } /* diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h index 2d7d86f0290d..84a1391aab29 100644 --- a/include/linux/ceph/ceph_fs.h +++ b/include/linux/ceph/ceph_fs.h @@ -31,6 +31,8 @@ #define CEPH_INO_CEPH 2 /* hidden .ceph dir */ #define CEPH_INO_GLOBAL_SNAPREALM 3 /* global dummy snaprealm */ +#define CEPH_HIDDEN_DIR_NAME ".ceph" + /* arbitrary limit on max # of monitors (cluster of 3 is typical) */ #define CEPH_MAX_MON 31