
[RFC] client: don't use special inode for /..

Message ID 1470861005.2694.12.camel@redhat.com (mailing list archive)
State New, archived

Commit Message

Jeff Layton Aug. 10, 2016, 8:30 p.m. UTC
On Wed, 2016-08-10 at 16:08 -0400, Patrick Donnelly wrote:
> On Wed, Aug 10, 2016 at 12:30 PM, Jeff Layton <jlayton@redhat.com>
> wrote:
> > 
> > The CEPH_INO_DOTDOT thing is quite strange. Under most OSes (Linux
> > included), the parent of the root is itself. IOW, at the root, '.'
> > and '..' refer to the same inode.
> > 
> > Change the ceph client to do the same, as this allows users to get
> > valid stat info for '..', as well as eliminating some
> > special-casing.
> > 
> > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> 
> Don't forget Client::_lookup:
> 
>   if (dname == "..") {
>     if (dir->dn_set.empty())
>       r = -ENOENT;
>     else
>       *target = dir->get_first_parent()->dir->parent_inode; //dirs can't be hard-linked
>     goto done;
>   }
> 
> Otherwise LGTM.
> 


Ahh, thanks. So will dir->dn_set.empty() be true at the root? If so,
then something like the patch below?

Note that this patch is not strictly necessary, but it does simplify
some other changes that I have queued up:

Comments

Patrick Donnelly Aug. 10, 2016, 8:46 p.m. UTC | #1
On Wed, Aug 10, 2016 at 4:30 PM, Jeff Layton <jlayton@redhat.com> wrote:
> On Wed, 2016-08-10 at 16:08 -0400, Patrick Donnelly wrote:
>> On Wed, Aug 10, 2016 at 12:30 PM, Jeff Layton <jlayton@redhat.com>
>> wrote:
>> >
>> > The CEPH_INO_DOTDOT thing is quite strange. Under most OSes (Linux
>> > included), the parent of the root is itself. IOW, at the root, '.'
>> > and '..' refer to the same inode.
>> >
>> > Change the ceph client to do the same, as this allows users to get
>> > valid stat info for '..', as well as eliminating some
>> > special-casing.
>> >
>> > Signed-off-by: Jeff Layton <jlayton@redhat.com>
>>
>> Don't forget Client::_lookup:
>>
>>   if (dname == "..") {
>>     if (dir->dn_set.empty())
>>       r = -ENOENT;
>>     else
>>       *target = dir->get_first_parent()->dir->parent_inode; //dirs can't be hard-linked
>>     goto done;
>>   }
>>
>> Otherwise LGTM.
>>
>
>
> Ahh, thanks. So will dir->dn_set.empty() be true at the root? If so,
> then something like the patch below?

Well, that's tricky actually. My understanding is that if dn_set is
empty then either the inode is unlinked or it is the root inode (from
the client's perspective). So the below patch is probably not quite
right? I think if the directory is unlinked but not the root, its ".."
should still refer to its first parent? The ENOENT error is probably
wrong.

> Note that this patch is not strictly necessary, but it does simplify
> some other changes that I have queued up:

I think the patch is a good change, but there may be some other code
paths that need to be fixed. This change needs some simple tests.
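Patrick's concern can be made concrete with a small, self-contained model of the patched ".." branch. The Inode and Dentry types below are hypothetical simplifications, not the real client classes: the real code reaches the parent through get_first_parent()->dir->parent_inode, which is collapsed here into a single dir_inode pointer, and only the fields the branch reads are modelled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the client's Inode/Dentry types.
// The real code goes through get_first_parent()->dir->parent_inode;
// here that chain is collapsed into one dir_inode pointer.
struct Inode;

struct Dentry {
  std::string name;
  Inode *dir_inode;  // directory that contains this dentry
};

struct Inode {
  unsigned long ino;
  std::vector<Dentry *> dn_set;  // dentries linking to this inode
};

// ".." as resolved by the RFC patch: an empty dn_set is read as "root".
Inode *dotdot_patched(Inode *dir) {
  if (dir->dn_set.empty())
    return dir;                           // treated as its own parent
  return dir->dn_set.front()->dir_inode;  // dirs can't be hard-linked
}

// Each scenario returns the inode number that ".." resolves to.
unsigned long dotdot_at_root() {
  Inode root{1, {}};
  return dotdot_patched(&root)->ino;
}

unsigned long dotdot_of_linked_dir() {
  Inode parent{100, {}};
  Inode child{200, {}};
  Dentry dn{"child", &parent};
  child.dn_set.push_back(&dn);
  return dotdot_patched(&child)->ino;
}

unsigned long dotdot_of_unlinked_dir() {
  Inode child{200, {}};  // e.g. removed while still referenced: no dentries
  return dotdot_patched(&child)->ino;
}
```

Under these assumptions the root resolves to itself (1) and a linked directory resolves to its parent (100), but the unlinked directory also resolves to itself (200). That last result is the case Patrick flags: an empty dn_set alone cannot distinguish "root" from "unlinked".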
Jeff Layton Aug. 10, 2016, 9:06 p.m. UTC | #2
On Wed, 2016-08-10 at 16:46 -0400, Patrick Donnelly wrote:
> On Wed, Aug 10, 2016 at 4:30 PM, Jeff Layton <jlayton@redhat.com> wrote:
> > 
> > On Wed, 2016-08-10 at 16:08 -0400, Patrick Donnelly wrote:
> > > 
> > > On Wed, Aug 10, 2016 at 12:30 PM, Jeff Layton <jlayton@redhat.com>
> > > wrote:
> > > > 
> > > > The CEPH_INO_DOTDOT thing is quite strange. Under most OSes (Linux
> > > > included), the parent of the root is itself. IOW, at the root, '.'
> > > > and '..' refer to the same inode.
> > > > 
> > > > Change the ceph client to do the same, as this allows users to get
> > > > valid stat info for '..', as well as eliminating some
> > > > special-casing.
> > > > 
> > > > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > > 
> > > Don't forget Client::_lookup:
> > > 
> > >   if (dname == "..") {
> > >     if (dir->dn_set.empty())
> > >       r = -ENOENT;
> > >     else
> > >       *target = dir->get_first_parent()->dir->parent_inode; //dirs can't be hard-linked
> > >     goto done;
> > >   }
> > > 
> > > Otherwise LGTM.
> > > 
> > 
> > 
> > Ahh, thanks. So will dir->dn_set.empty() be true at the root? If so,
> > then something like the patch below?
> 
> Well, that's tricky actually. My understanding is that if dn_set is
> empty then either the inode is unlinked or it is the root inode (from
> the client's perspective). So the below patch is probably not quite
> right? I think if the directory is unlinked but not the root, its ".."
> should still refer to its first parent? The ENOENT error is probably
> wrong.
> 

Ok, so is there some way to reliably tell whether it's the root? Should
we instead check whether its inode number is CEPH_INO_ROOT?

> > 
> > Note that this patch is not strictly necessary, but it does simplify
> > some other changes that I have queued up:
> 
> I think the patch is a good change, but there may be some other code
> paths that need to be fixed. This change needs some simple tests.
> 

Yeah, agreed. I'll plan to add some if this patch is reasonable. I just
wanted to float the patch out here as an RFC first, in case I was
missing some reason that we needed to keep CEPH_INO_DOTDOT.

Thanks for having a look!
Patrick Donnelly Aug. 10, 2016, 9:15 p.m. UTC | #3
On Wed, Aug 10, 2016 at 5:06 PM, Jeff Layton <jlayton@redhat.com> wrote:
> On Wed, 2016-08-10 at 16:46 -0400, Patrick Donnelly wrote:
>> On Wed, Aug 10, 2016 at 4:30 PM, Jeff Layton <jlayton@redhat.com> wrote:
>> >
>> > On Wed, 2016-08-10 at 16:08 -0400, Patrick Donnelly wrote:
>> > >
>> > > On Wed, Aug 10, 2016 at 12:30 PM, Jeff Layton <jlayton@redhat.com>
>> > > wrote:
>> > > >
>> > > > The CEPH_INO_DOTDOT thing is quite strange. Under most OSes (Linux
>> > > > included), the parent of the root is itself. IOW, at the root, '.'
>> > > > and '..' refer to the same inode.
>> > > >
>> > > > Change the ceph client to do the same, as this allows users to get
>> > > > valid stat info for '..', as well as eliminating some
>> > > > special-casing.
>> > > >
>> > > > Signed-off-by: Jeff Layton <jlayton@redhat.com>
>> > >
>> > > Don't forget Client::_lookup:
>> > >
>> > >   if (dname == "..") {
>> > >     if (dir->dn_set.empty())
>> > >       r = -ENOENT;
>> > >     else
>> > >       *target = dir->get_first_parent()->dir->parent_inode; //dirs can't be hard-linked
>> > >     goto done;
>> > >   }
>> > >
>> > > Otherwise LGTM.
>> > >
>> >
>> >
>> > Ahh, thanks. So will dir->dn_set.empty() be true at the root? If so,
>> > then something like the patch below?
>>
>> Well, that's tricky actually. My understanding is that if dn_set is
>> empty then either the inode is unlinked or it is the root inode (from
>> the client's perspective). So the below patch is probably not quite
>> right? I think if the directory is unlinked but not the root, its ".."
>> should still refer to its first parent? The ENOENT error is probably
>> wrong.
>>
>
> Ok, so is there some way to reliably tell whether it's the root? Should
> we instead check whether its inode number is CEPH_INO_ROOT?

Inode::is_root should work. By the way, I see now that the readdir
code is also wrong. It should not need to check dn_set at all (just
is_root()). (It doesn't matter if the directory inode is unlinked, the
parent is still visible.)
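Patrick's suggestion, that lookup and readdir should both decide on is_root() first instead of inferring the root from an empty dn_set, can be sketched with one shared helper. The types are hypothetical simplifications, not the real client classes, and the root flag simply stands in for whatever the real Inode::is_root() checks. What ".." should be for an unlinked, non-root directory is still open in the thread, so the -ENOENT there is only a placeholder.

```cpp
#include <cassert>
#include <cerrno>
#include <vector>

// Hypothetical, simplified model; not the real Client/Inode classes.
struct Inode;

struct Dentry {
  Inode *dir_inode;  // directory that contains this dentry
};

struct Inode {
  unsigned long ino;
  bool root;                     // stands in for whatever Inode::is_root() checks
  std::vector<Dentry *> dn_set;  // dentries linking to this inode
  bool is_root() const { return root; }
};

// One helper for both lookup("..") and readdir's ".." entry: check
// is_root() first, and only consult dn_set for non-root directories.
// Returns 0 and sets *target, or a negative errno.
int resolve_dotdot(Inode *dir, Inode **target) {
  if (dir->is_root()) {
    *target = dir;  // at the root, ".." is the same inode as "."
    return 0;
  }
  if (dir->dn_set.empty())
    return -ENOENT;  // placeholder: the right answer here is still open
  *target = dir->dn_set.front()->dir_inode;  // dirs can't be hard-linked
  return 0;
}

// Inode number that ".." resolves to, or the negative errno.
long dotdot_of(Inode *dir) {
  Inode *target = nullptr;
  int r = resolve_dotdot(dir, &target);
  return r < 0 ? r : static_cast<long>(target->ino);
}

long demo_root() {
  Inode root{1, true, {}};
  return dotdot_of(&root);
}

long demo_nested_dir() {
  Inode parent{50, false, {}};
  Inode child{100, false, {}};
  Dentry dn{&parent};
  child.dn_set.push_back(&dn);
  return dotdot_of(&child);
}

long demo_unlinked_dir() {
  Inode orphan{200, false, {}};  // not the root, and no dentries left
  return dotdot_of(&orphan);
}
```

The design point is the ordering: the root case no longer depends on dn_set at all, so an unlinked directory can never be mistaken for the root, and readdir can fill in its ".." entry from the same helper.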
Sage Weil Aug. 10, 2016, 9:23 p.m. UTC | #4
On Wed, 10 Aug 2016, Jeff Layton wrote:
> On Wed, 2016-08-10 at 16:08 -0400, Patrick Donnelly wrote:
> > On Wed, Aug 10, 2016 at 12:30 PM, Jeff Layton <jlayton@redhat.com>
> > wrote:
> > > 
> > > The CEPH_INO_DOTDOT thing is quite strange. Under most OSes (Linux
> > > included), the parent of the root is itself. IOW, at the root, '.'
> > > and '..' refer to the same inode.
> > > 
> > > Change the ceph client to do the same, as this allows users to get
> > > valid stat info for '..', as well as eliminating some
> > > special-casing.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > 
> > Don't forget Client::_lookup:
> > 
> >   if (dname == "..") {
> >     if (dir->dn_set.empty())
> >       r = -ENOENT;
> >     else
> >       *target = dir->get_first_parent()->dir->parent_inode; //dirs can't be hard-linked
> >     goto done;
> >   }
> > 
> > Otherwise LGTM.
> > 
> 
> 
> Ahh, thanks. So will dir->dn_set.empty() be true at the root? If so,
> then something like the patch below?
> 
> Note that this patch is not strictly necessary, but it does simplify
> some other changes that I have queued up:
> 
> diff --git a/src/client/Client.cc b/src/client/Client.cc
> index 5ab0ace4d3df..287baaf20536 100644
> --- a/src/client/Client.cc
> +++ b/src/client/Client.cc
> @@ -5924,7 +5924,7 @@ int Client::_lookup(Inode *dir, const string& dname, int mask,
>  
>    if (dname == "..") {
>      if (dir->dn_set.empty())
> -      r = -ENOENT;
> +      *target = dir;
>      else
>        *target = dir->get_first_parent()->dir->parent_inode; //dirs can't be hard-linked
>      goto done;

IIRC I did the dotdot thing originally because otherwise the '..' entry at 
the mount point in ls -al didn't point to the parent directory.  Having 
the fs explicitly do .. at all seems pretty weird to me... it seems like 
the VFS should be doing this.  But in any case, I'd just verify that it 
behaves the same way a real mount does after this change.

sage
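Sage's suggestion to compare against a real mount can be checked with plain POSIX stat(2): two paths name the same inode when both st_dev and st_ino match. This is a minimal sketch for a local filesystem, not a CephFS test.

```cpp
#include <cassert>
#include <sys/stat.h>

// True if both paths resolve to the same inode on the same device.
// Returns false if either stat(2) call fails.
bool same_inode(const char *a, const char *b) {
  struct stat sa, sb;
  if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
    return false;
  return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

// At the root of the namespace, ".." resolves to the root itself.
bool root_is_own_parent() {
  return same_inode("/", "/..");
}
```

On Linux, root_is_own_parent() returns true. Pointing same_inode() at a mount point and its ".." (for example a hypothetical /mnt/cephfs and /mnt/cephfs/..) shows the other half of Sage's concern: the VFS steps back into the parent mount, so the two paths name different inodes.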
Jeff Layton Aug. 10, 2016, 9:52 p.m. UTC | #5
On Wed, 2016-08-10 at 21:23 +0000, Sage Weil wrote:
> On Wed, 10 Aug 2016, Jeff Layton wrote:
> > On Wed, 2016-08-10 at 16:08 -0400, Patrick Donnelly wrote:
> > > On Wed, Aug 10, 2016 at 12:30 PM, Jeff Layton <jlayton@redhat.com>
> > > wrote:
> > > > 
> > > > The CEPH_INO_DOTDOT thing is quite strange. Under most OSes (Linux
> > > > included), the parent of the root is itself. IOW, at the root, '.'
> > > > and '..' refer to the same inode.
> > > > 
> > > > Change the ceph client to do the same, as this allows users to get
> > > > valid stat info for '..', as well as eliminating some
> > > > special-casing.
> > > > 
> > > > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > > 
> > > Don't forget Client::_lookup:
> > > 
> > >   if (dname == "..") {
> > >     if (dir->dn_set.empty())
> > >       r = -ENOENT;
> > >     else
> > >       *target = dir->get_first_parent()->dir->parent_inode; //dirs can't be hard-linked
> > >     goto done;
> > >   }
> > > 
> > > Otherwise LGTM.
> > > 
> > 
> > 
> > Ahh, thanks. So will dir->dn_set.empty() be true at the root? If so,
> > then something like the patch below?
> > 
> > Note that this patch is not strictly necessary, but it does simplify
> > some other changes that I have queued up:
> > 
> > diff --git a/src/client/Client.cc b/src/client/Client.cc
> > index 5ab0ace4d3df..287baaf20536 100644
> > --- a/src/client/Client.cc
> > +++ b/src/client/Client.cc
> > @@ -5924,7 +5924,7 @@ int Client::_lookup(Inode *dir, const string& dname, int mask,
> >  
> >    if (dname == "..") {
> >      if (dir->dn_set.empty())
> > -      r = -ENOENT;
> > +      *target = dir;
> >      else
> >        *target = dir->get_first_parent()->dir->parent_inode; //dirs can't be hard-linked
> >      goto done;
> 
> IIRC I did the dotdot thing originally because otherwise the '..' entry at 
> the mount point in ls -al didn't point to the parent directory.  Having 
> the fs explicitly do .. at all seems pretty weird to me... it seems like 
> the VFS should be doing this.  But in any case, I'd just verify that it 
> behaves the same way a real mount does after this change.
> 
> sage


The Linux VFS will definitely already handle ".." correctly (as you end
up doing a transition to a different vfsmount). So this shouldn't
affect ceph-fuse, AFAICT.

I think this change would primarily be noticed by those using libcephfs
directly, either in ceph_readdir/ceph_lookup (and related) calls or
during a pathwalk.

That said, Patrick's suggestion to add some tests around this makes
sense. I'll plan to spin that up so we can be clear on how the behavior
changes.

Thanks,
Jeff Layton Aug. 10, 2016, 10:21 p.m. UTC | #6
On Wed, 2016-08-10 at 17:15 -0400, Patrick Donnelly wrote:
> On Wed, Aug 10, 2016 at 5:06 PM, Jeff Layton <jlayton@redhat.com> wrote:
> > 
> > On Wed, 2016-08-10 at 16:46 -0400, Patrick Donnelly wrote:
> > > 
> > > On Wed, Aug 10, 2016 at 4:30 PM, Jeff Layton <jlayton@redhat.com> wrote:
> > > > 
> > > > On Wed, 2016-08-10 at 16:08 -0400, Patrick Donnelly wrote:
> > > > > 
> > > > > On Wed, Aug 10, 2016 at 12:30 PM, Jeff Layton <jlayton@redhat.com>
> > > > > wrote:
> > > > > > 
> > > > > > The CEPH_INO_DOTDOT thing is quite strange. Under most OSes (Linux
> > > > > > included), the parent of the root is itself. IOW, at the root, '.'
> > > > > > and '..' refer to the same inode.
> > > > > > 
> > > > > > Change the ceph client to do the same, as this allows users to get
> > > > > > valid stat info for '..', as well as eliminating some
> > > > > > special-casing.
> > > > > > 
> > > > > > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > > > > 
> > > > > Don't forget Client::_lookup:
> > > > > 
> > > > >   if (dname == "..") {
> > > > >     if (dir->dn_set.empty())
> > > > >       r = -ENOENT;
> > > > >     else
> > > > >       *target = dir->get_first_parent()->dir->parent_inode; //dirs can't be hard-linked
> > > > >     goto done;
> > > > >   }
> > > > > 
> > > > > Otherwise LGTM.
> > > > > 
> > > > 
> > > > 
> > > > Ahh, thanks. So will dir->dn_set.empty() be true at the root? If so,
> > > > then something like the patch below?
> > > 
> > > Well, that's tricky actually. My understanding is that if dn_set is
> > > empty then either the inode is unlinked or it is the root inode (from
> > > the client's perspective). So the below patch is probably not quite
> > > right? I think if the directory is unlinked but not the root, its ".."
> > > should still refer to its first parent? The ENOENT error is probably
> > > wrong.
> > > 
> > 
> > Ok, so is there some way to reliably tell whether it's the root? Should
> > we instead check whether its inode number is CEPH_INO_ROOT?
> 
> Inode::is_root should work. By the way, I see now that the readdir
> code is also wrong. It should not need to check dn_set at all (just
> is_root()). (It doesn't matter if the directory inode is unlinked, the
> parent is still visible.)
> 


Ahh thanks. I'll see about fixing that up while I'm in there too.

Your point about is_root is valid, but I think we should step back a
min and consider how we expect this to work when we mount a subtree of
the MDS root.

Suppose I do:

    ceph_mount(cmount, "/foo/bar/baz");
    ceph_stat(cmount, "/..", &st);

...what should we ultimately end up stat'ing there? Should I get back
the info for "bar" or "baz" ?

Thanks,
Jeff Layton Aug. 11, 2016, 1:31 a.m. UTC | #7
On Wed, 2016-08-10 at 18:21 -0400, Jeff Layton wrote:
> On Wed, 2016-08-10 at 17:15 -0400, Patrick Donnelly wrote:
> > 
> > On Wed, Aug 10, 2016 at 5:06 PM, Jeff Layton <jlayton@redhat.com> wrote:
> > 
> > > On Wed, 2016-08-10 at 16:46 -0400, Patrick Donnelly wrote:
> > > > 
> > > > On Wed, Aug 10, 2016 at 4:30 PM, Jeff Layton <jlayton@redhat.com> wrote:
> > > > 
> > > > > On Wed, 2016-08-10 at 16:08 -0400, Patrick Donnelly wrote:
> > > > > > 
> > > > > > On Wed, Aug 10, 2016 at 12:30 PM, Jeff Layton <jlayton@redhat.com>
> > > > > > wrote:
> > > > > > > 
> > > > > > > The CEPH_INO_DOTDOT thing is quite strange. Under most OSes (Linux
> > > > > > > included), the parent of the root is itself. IOW, at the root, '.'
> > > > > > > and '..' refer to the same inode.
> > > > > > > 
> > > > > > > Change the ceph client to do the same, as this allows users to get
> > > > > > > valid stat info for '..', as well as eliminating some
> > > > > > > special-casing.
> > > > > > > 
> > > > > > > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > > > > > 
> > > > > > Don't forget Client::_lookup:
> > > > > > 
> > > > > >   if (dname == "..") {
> > > > > >     if (dir->dn_set.empty())
> > > > > >       r = -ENOENT;
> > > > > >     else
> > > > > >       *target = dir->get_first_parent()->dir->parent_inode; //dirs can't be hard-linked
> > > > > >     goto done;
> > > > > >   }
> > > > > > 
> > > > > > Otherwise LGTM.
> > > > > > 
> > > > > 
> > > > > 
> > > > > Ahh, thanks. So will dir->dn_set.empty() be true at the root? If so,
> > > > > then something like the patch below?
> > > > 
> > > > Well, that's tricky actually. My understanding is that if dn_set is
> > > > empty then either the inode is unlinked or it is the root inode (from
> > > > the client's perspective). So the below patch is probably not quite
> > > > right? I think if the directory is unlinked but not the root, its ".."
> > > > should still refer to its first parent? The ENOENT error is probably
> > > > wrong.
> > > > 
> > > 
> > > Ok, so is there some way to reliably tell whether it's the root? Should
> > > we instead check whether its inode number is CEPH_INO_ROOT?
> > 
> > Inode::is_root should work. By the way, I see now that the readdir
> > code is also wrong. It should not need to check dn_set at all (just
> > is_root()). (It doesn't matter if the directory inode is unlinked, the
> > parent is still visible.)
> > 
> 
> 
> Ahh thanks. I'll see about fixing that up while I'm in there too.
> 
> Your point about is_root is valid, but I think we should step back a
> min and consider how we expect this to work when we mount a subtree
> of the MDS root.
> 
> Suppose I do:
> 
>     ceph_mount(cmount, "/foo/bar/baz");
>     ceph_stat(cmount, "/..", &st);
> 
> ...what should we ultimately end up stat'ing there? Should I get back
> the info for "bar" or "baz" ?


Thinking out loud...

I think we'd want that to return the info for "baz" since anything else
would mean escaping from cmount. So, looking at the code...maybe we
should alter Inode->is_root() to something like:

  bool is_root() { return ino == client->get_root_ino(); }

I don't think there are any callers of Inode->is_root currently, so I
wouldn't think this would break anything. Then we could call that to
see if we're at the root when doing a lookup or readdir of "..".
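Jeff's proposal can be sketched as a small model in which is_root() compares against the client's mount root rather than the MDS-wide root. Client, get_root_ino(), and the simplified Inode/Dentry below are hypothetical stand-ins, not the real classes; the scenario mirrors the /foo/bar/baz example, with "bar" as ino 11 and "baz" as ino 12 (both assumed).

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a client mounted on a subtree; not the real classes.
struct Client {
  unsigned long root_ino;  // inode of the directory passed to ceph_mount()
  unsigned long get_root_ino() const { return root_ino; }
};

struct Inode;

struct Dentry {
  Inode *dir_inode;  // directory that contains this dentry
};

struct Inode {
  unsigned long ino;
  Client *client;
  std::vector<Dentry *> dn_set;
  // "Root" means the client's mount root, not the MDS-wide root.
  bool is_root() const { return ino == client->get_root_ino(); }
};

// ".." never climbs above the mount root. Returns nullptr when the parent
// cannot be determined in this model (non-root with no cached dentry).
Inode *lookup_dotdot(Inode *dir) {
  if (dir->is_root())
    return dir;
  if (dir->dn_set.empty())
    return nullptr;
  return dir->dn_set.front()->dir_inode;
}

// Returns the ino that "baz/.." resolves to for a client whose mount
// root is mount_root_ino, or 0 if it cannot be resolved.
unsigned long dotdot_of_baz(unsigned long mount_root_ino) {
  Client client{mount_root_ino};
  Inode bar{11, &client, {}};
  Inode baz{12, &client, {}};
  Dentry baz_dn{&bar};  // "baz" is linked into "bar"
  baz.dn_set.push_back(&baz_dn);
  Inode *target = lookup_dotdot(&baz);
  return target ? target->ino : 0;
}
```

With the client mounted at "baz" (dotdot_of_baz(12)), ".." stays at "baz" itself, so nothing escapes the mount; with a client mounted higher up (an assumed root ino of 1), the same lookup returns "bar" (11).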

Patch

diff --git a/src/client/Client.cc b/src/client/Client.cc
index 5ab0ace4d3df..287baaf20536 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -5924,7 +5924,7 @@  int Client::_lookup(Inode *dir, const string& dname, int mask,
 
   if (dname == "..") {
     if (dir->dn_set.empty())
-      r = -ENOENT;
+      *target = dir;
     else
       *target = dir->get_first_parent()->dir->parent_inode; //dirs can't be hard-linked
     goto done;