Message ID | 20191211012940.18128-1-xiubli@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v3] ceph: check availability of mds cluster on mount after wait timeout |

On Tue, 2019-12-10 at 20:29 -0500, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
>
> If all the MDS daemons are down for some reasons and for the first
> time to do the mount, it will fail with IO error after the mount
> request timed out.
>
> Or if the cluster becomes laggy suddenly, and just before the kclient
> getting the new mdsmap and the mount request is fired off, it also
> will fail with IO error.
>
> This will add some useful hint message by checking the cluster state
> before the fail the mount operation.
>
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>
> V3:
> - Rebase to the new mount API version.
>
>  fs/ceph/mds_client.c | 3 +--
>  fs/ceph/super.c      | 5 +++++
>  2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 7d3ec051f179..bf507120659e 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -2576,8 +2576,7 @@ static void __do_request(struct ceph_mds_client *mdsc,
>  		if (!(mdsc->fsc->mount_options->flags &
>  		      CEPH_MOUNT_OPT_MOUNTWAIT) &&
>  		    !ceph_mdsmap_is_cluster_available(mdsc->mdsmap)) {
> -			err = -ENOENT;
> -			pr_info("probably no mds server is up\n");
> +			err = -EHOSTUNREACH;
>  			goto finish;
>  		}
>  	}
> diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> index 9c9a7c68eea3..6f33a265ccf1 100644
> --- a/fs/ceph/super.c
> +++ b/fs/ceph/super.c
> @@ -1068,6 +1068,11 @@ static int ceph_get_tree(struct fs_context *fc)
>  	return 0;
>
>  out_splat:
> +	if (!ceph_mdsmap_is_cluster_available(fsc->mdsc->mdsmap)) {
> +		pr_info("No mds server is up or the cluster is laggy\n");
> +		err = -EHOSTUNREACH;
> +	}
> +
>  	ceph_mdsc_close_sessions(fsc->mdsc);
>  	deactivate_locked_super(sb);
>  	goto out_final;

Looks reasonable. Merged into testing branch with a revamped changelog.
Please have a look at the testing branch and make sure the changelog is
OK with you.

Thanks,

On 2019/12/11 21:17, Jeff Layton wrote:
> On Tue, 2019-12-10 at 20:29 -0500, xiubli@redhat.com wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> If all the MDS daemons are down for some reasons and for the first
>> time to do the mount, it will fail with IO error after the mount
>> request timed out.
>>
>> Or if the cluster becomes laggy suddenly, and just before the kclient
>> getting the new mdsmap and the mount request is fired off, it also
>> will fail with IO error.
>>
>> This will add some useful hint message by checking the cluster state
>> before the fail the mount operation.
>>
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>
>> V3:
>> - Rebase to the new mount API version.
>>
>>  fs/ceph/mds_client.c | 3 +--
>>  fs/ceph/super.c      | 5 +++++
>>  2 files changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>> index 7d3ec051f179..bf507120659e 100644
>> --- a/fs/ceph/mds_client.c
>> +++ b/fs/ceph/mds_client.c
>> @@ -2576,8 +2576,7 @@ static void __do_request(struct ceph_mds_client *mdsc,
>>  		if (!(mdsc->fsc->mount_options->flags &
>>  		      CEPH_MOUNT_OPT_MOUNTWAIT) &&
>>  		    !ceph_mdsmap_is_cluster_available(mdsc->mdsmap)) {
>> -			err = -ENOENT;
>> -			pr_info("probably no mds server is up\n");
>> +			err = -EHOSTUNREACH;
>>  			goto finish;
>>  		}
>>  	}
>> diff --git a/fs/ceph/super.c b/fs/ceph/super.c
>> index 9c9a7c68eea3..6f33a265ccf1 100644
>> --- a/fs/ceph/super.c
>> +++ b/fs/ceph/super.c
>> @@ -1068,6 +1068,11 @@ static int ceph_get_tree(struct fs_context *fc)
>>  	return 0;
>>
>>  out_splat:
>> +	if (!ceph_mdsmap_is_cluster_available(fsc->mdsc->mdsmap)) {
>> +		pr_info("No mds server is up or the cluster is laggy\n");
>> +		err = -EHOSTUNREACH;
>> +	}
>> +
>>  	ceph_mdsc_close_sessions(fsc->mdsc);
>>  	deactivate_locked_super(sb);
>>  	goto out_final;
> Looks reasonable. Merged into testing branch with a revamped changelog.
> Please have a look at the testing branch and make sure the changelog is
> OK with you.

Yeah, that looks good to me. Thanks.

>
> Thanks,

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 7d3ec051f179..bf507120659e 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2576,8 +2576,7 @@ static void __do_request(struct ceph_mds_client *mdsc,
 		if (!(mdsc->fsc->mount_options->flags &
 		      CEPH_MOUNT_OPT_MOUNTWAIT) &&
 		    !ceph_mdsmap_is_cluster_available(mdsc->mdsmap)) {
-			err = -ENOENT;
-			pr_info("probably no mds server is up\n");
+			err = -EHOSTUNREACH;
 			goto finish;
 		}
 	}
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index 9c9a7c68eea3..6f33a265ccf1 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -1068,6 +1068,11 @@ static int ceph_get_tree(struct fs_context *fc)
 	return 0;
 
 out_splat:
+	if (!ceph_mdsmap_is_cluster_available(fsc->mdsc->mdsmap)) {
+		pr_info("No mds server is up or the cluster is laggy\n");
+		err = -EHOSTUNREACH;
+	}
+
 	ceph_mdsc_close_sessions(fsc->mdsc);
 	deactivate_locked_super(sb);
 	goto out_final;
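
For illustration only, and not part of the patch: with this change a mount that fails because the MDS cluster is unavailable ends with err = -EHOSTUNREACH in ceph_get_tree(), so a userspace caller could, in principle, tell that case apart from a generic IO error. The sketch below assumes the error reaches the caller unchanged as errno; the helper name ceph_mount_hint() and its message strings are hypothetical.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical userspace helper (not part of the patch): translate the
 * errno left behind by a failed CephFS mount into a short hint.
 * Assumes the kernel's -EHOSTUNREACH is passed through to the caller.
 */
static const char *ceph_mount_hint(int err)
{
	switch (err) {
	case EHOSTUNREACH:
		/* Mirrors the pr_info() text added in ceph_get_tree(). */
		return "no MDS is up or the cluster is laggy";
	case EIO:
		/* The generic failure described in the changelog. */
		return "mount request failed with an I/O error";
	default:
		/* Anything else: fall back to the standard description. */
		return strerror(err);
	}
}
```

A caller would typically invoke ceph_mount_hint(errno) right after mount(2) returns -1 and print the result next to the mount point, rather than relying on the user to check the kernel log for the pr_info() message.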