Message ID | 20191119130440.19384-1-xiubli@redhat.com (mailing list archive)
---|---
State | New, archived
Series | ceph: check availability of mds cluster on mount after wait timeout
On Tue, 2019-11-19 at 08:04 -0500, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
>
> If all the MDS daemons are down for some reason, and the mount
> request is fired off just before the kclient gets the new mdsmap,
> the request wait will time out with -EIO.
>
> After the timeout, check the MDS cluster availability to give a
> friendly hint letting the users check the MDS cluster status.
>
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/mds_client.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index a5163296d9d9..82a929084671 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -2712,6 +2712,9 @@ static int ceph_mdsc_wait_request(struct ceph_mds_client *mdsc,
>  	if (test_bit(CEPH_MDS_R_GOT_RESULT, &req->r_req_flags)) {
>  		err = le32_to_cpu(req->r_reply_info.head->result);
>  	} else if (err < 0) {
> +		if (!ceph_mdsmap_is_cluster_available(mdsc->mdsmap))
> +			pr_info("probably no mds server is up\n");
> +
>  		dout("aborted request %lld with %d\n", req->r_tid, err);
>
>  		/*

Probably? If they're all unavailable then definitely.

Also, this is a pr_info message, so you probably need to prefix it
with "ceph: ".

Beyond that though, do we want to do this in what amounts to low-level
infrastructure for MDS requests?

I wonder if a warning like this would be better suited in
open_root_dentry(). If ceph_mdsc_do_request returns -EIO [1], maybe
have open_root_dentry do the check and pr_info?

[1]: Why does it use -EIO here anyway? Wouldn't -ETIMEDOUT or something
be better? Maybe the worry was that that error could bubble up to
userland?
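A rough sketch of what that suggestion might look like (hypothetical,
not a tested patch; the body of open_root_dentry() in fs/ceph/super.c
is paraphrased and elided with comments, so the surrounding code may
not match mainline exactly):

/*
 * Sketch: leave ceph_mdsc_wait_request() alone and do the
 * availability check at mount time instead, where an aborted
 * request means the mount itself is failing.
 */
static struct dentry *open_root_dentry(struct ceph_fs_client *fsc,
				       const char *path,
				       unsigned long started)
{
	struct ceph_mds_client *mdsc = fsc->mdsc;
	struct ceph_mds_request *req;
	struct dentry *root;
	int err;

	/* ... build a CEPH_MDS_OP_GETATTR request for 'path' ... */

	err = ceph_mdsc_do_request(mdsc, NULL, req);
	if (err < 0) {
		/* mount-time hint: the wait timed out and no MDS looks usable */
		if (err == -EIO &&
		    !ceph_mdsmap_is_cluster_available(mdsc->mdsmap))
			pr_info("probably no mds server is up\n");
		root = ERR_PTR(err);
	} else {
		/* ... instantiate root from req->r_target_inode ... */
	}
	/* ... put the request and return root ... */
}

This would keep the hint out of the generic request path, so ordinary
metadata requests that are interrupted or time out would not trigger it.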
On 2019/11/20 1:28, Jeff Layton wrote:
> On Tue, 2019-11-19 at 08:04 -0500, xiubli@redhat.com wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> If all the MDS daemons are down for some reason, and the mount
>> request is fired off just before the kclient gets the new mdsmap,
>> the request wait will time out with -EIO.
>>
>> After the timeout, check the MDS cluster availability to give a
>> friendly hint letting the users check the MDS cluster status.
>>
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>  fs/ceph/mds_client.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>> index a5163296d9d9..82a929084671 100644
>> --- a/fs/ceph/mds_client.c
>> +++ b/fs/ceph/mds_client.c
>> @@ -2712,6 +2712,9 @@ static int ceph_mdsc_wait_request(struct ceph_mds_client *mdsc,
>>  	if (test_bit(CEPH_MDS_R_GOT_RESULT, &req->r_req_flags)) {
>>  		err = le32_to_cpu(req->r_reply_info.head->result);
>>  	} else if (err < 0) {
>> +		if (!ceph_mdsmap_is_cluster_available(mdsc->mdsmap))
>> +			pr_info("probably no mds server is up\n");
>> +
>>  		dout("aborted request %lld with %d\n", req->r_tid, err);
>>
>>  		/*
> Probably? If they're all unavailable then definitely.

Currently ceph_mdsmap_is_cluster_available() is a bit buggy, and my
commit message is not quite correct or detailed enough either. Consider
this case:

# ceph fs dump
[...]
max_mds	3
in	0,1,2
up	{0=5139,1=4837,2=4985}
failed
damaged
stopped
data_pools	[2]
metadata_pool	1
inline_data	disabled
balancer
standby_count_wanted	1
[mds.a{0:5139} state up:active seq 7 laggy since 2019-11-20T01:04:13.040701-0500 addr v1:192.168.195.165:6813/2514516359]
[mds.b{1:4837} state up:active seq 6 addr v1:192.168.195.165:6815/1921459709]
[mds.f{2:4985} state up:active seq 6 laggy since 2019-11-20T01:04:13.040685-0500 addr v1:192.168.195.165:6814/3730607184]

Here m->m_num_laggy == 2, but there is still one MDS in the
(up:active & !laggy) state. If the mount request chooses mds.a, it
will still hit IO errors and fail; a better choice would be mds.b
instead. Currently ceph_mdsmap_is_cluster_available() just returns
false if any MDS is laggy. I will fix it.

But even after fixing it there is a corner case: the monitor may take
a while to update the laggy state in the mdsmap. During that window,
even though mds.a and mds.f have already crashed, their state is still
up:active without laggy, so a mount may still choose mds.a and fail
anyway. That does not mean the MDS cluster is totally unavailable,
though. The "probably" is there for this corner case. Does that make
sense?

> Also, this is a
> pr_info message, so you probably need to prefix it with "ceph: ".

For the pr_info message, the module name is added as a prefix
automatically (via the pr_fmt() definition pulled in through
ceph_debug.h):

"<6>[23167.778366] ceph: probably no mds server is up"

This should be enough.

> Beyond that though, do we want to do this in what amounts to low-level
> infrastructure for MDS requests?
>
> I wonder if a warning like this would be better suited in
> open_root_dentry(). If ceph_mdsc_do_request returns -EIO [1], maybe
> have open_root_dentry do the check and pr_info?

Yeah, I was also thinking of bubbling it up to the mount.ceph daemon
in userland, but I am still not sure which errno it should be, just
-ETIMEDOUT or something else.

> [1]: Why does it use -EIO here anyway? Wouldn't -ETIMEDOUT or something
> be better? Maybe the worry was that that error could bubble up to
> userland?
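For reference, a minimal sketch of the kind of fix described above,
assuming the struct ceph_mdsmap / struct ceph_mds_info fields used by
fs/ceph/mdsmap.c at the time (m_epoch, m_damaged, m_num_mds,
m_info[i].state, m_info[i].laggy); instead of bailing out when any
rank is laggy, the cluster is treated as available as long as at
least one rank is up:active and not laggy:

/*
 * Sketch only (untested): consider the cluster available if at
 * least one rank is up:active and not laggy, rather than returning
 * false as soon as m->m_num_laggy > 0.
 */
bool ceph_mdsmap_is_cluster_available(struct ceph_mdsmap *m)
{
	int i, nr_active = 0;

	if (m->m_epoch == 0)
		return false;
	if (m->m_damaged)
		return false;
	for (i = 0; i < m->m_num_mds; i++) {
		/* count ranks that are up:active AND not laggy */
		if (m->m_info[i].state == CEPH_MDS_STATE_ACTIVE &&
		    !m->m_info[i].laggy)
			nr_active++;
	}
	return nr_active > 0;
}

With this, the example mdsmap above (mds.b up:active and not laggy)
would count as available, so the "probably no mds server is up" hint
would not fire while a usable rank remains.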
Yeah, I have the same doubt. This is also the general metadata IO path
for other operations, such as create/lookup/..., and in the mount
operation the error really will bubble up to mount.ceph in userland.

Thanks
BRs
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index a5163296d9d9..82a929084671 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2712,6 +2712,9 @@ static int ceph_mdsc_wait_request(struct ceph_mds_client *mdsc,
 	if (test_bit(CEPH_MDS_R_GOT_RESULT, &req->r_req_flags)) {
 		err = le32_to_cpu(req->r_reply_info.head->result);
 	} else if (err < 0) {
+		if (!ceph_mdsmap_is_cluster_available(mdsc->mdsmap))
+			pr_info("probably no mds server is up\n");
+
 		dout("aborted request %lld with %d\n", req->r_tid, err);
 
 		/*