Message ID | 20190821120724.23614-1-idryomov@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | libceph: fix PG split vs OSD (re)connect race | expand |
On Wed, 2019-08-21 at 14:07 +0200, Ilya Dryomov wrote: > We can't rely on ->peer_features in calc_target() because it may be > called both when the OSD session is established and open and when it's > not. ->peer_features is not valid unless the OSD session is open. If > this happens on a PG split (pg_num increase), that could mean we don't > resend a request that should have been resent, hanging the client > indefinitely. > > In userspace this was fixed by looking at require_osd_release and > get_xinfo[osd].features fields of the osdmap. However these fields > belong to the OSD section of the osdmap, which the kernel doesn't > decode (only the client section is decoded). > > Instead, let's drop this feature check. It effectively checks for > luminous, so only pre-luminous OSDs would be affected in that on a PG > split the kernel might resend a request that should not have been > resent. Duplicates can occur in other scenarios, so both sides should > already be prepared for them: see dup/replay logic on the OSD side and > retry_attempt check on the client side. > > Cc: stable@vger.kernel.org > Fixes: 7de030d6b10a ("libceph: resend on PG splits if OSD has RESEND_ON_SPLIT") > Reported-by: Jerry Lee <leisurelysw24@gmail.com> > Signed-off-by: Ilya Dryomov <idryomov@gmail.com> > --- > net/ceph/osd_client.c | 9 ++++----- > 1 file changed, 4 insertions(+), 5 deletions(-) > > diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c > index fed6b0334609..4e78d1ddd441 100644 > --- a/net/ceph/osd_client.c > +++ b/net/ceph/osd_client.c > @@ -1514,7 +1514,7 @@ static enum calc_target_result calc_target(struct ceph_osd_client *osdc, > struct ceph_osds up, acting; > bool force_resend = false; > bool unpaused = false; > - bool legacy_change; > + bool legacy_change = false; > bool split = false; > bool sort_bitwise = ceph_osdmap_flag(osdc, CEPH_OSDMAP_SORTBITWISE); > bool recovery_deletes = ceph_osdmap_flag(osdc, > @@ -1602,15 +1602,14 @@ static enum calc_target_result calc_target(struct ceph_osd_client *osdc, > t->osd = acting.primary; > } > > - if (unpaused || legacy_change || force_resend || > - (split && con && CEPH_HAVE_FEATURE(con->peer_features, > - RESEND_ON_SPLIT))) > + if (unpaused || legacy_change || force_resend || split) > ct_res = CALC_TARGET_NEED_RESEND; > else > ct_res = CALC_TARGET_NO_ACTION; > > out: > - dout("%s t %p -> ct_res %d osd %d\n", __func__, t, ct_res, t->osd); > + dout("%s t %p -> %d%d%d%d ct_res %d osd%d\n", __func__, t, unpaused, > + legacy_change, force_resend, split, ct_res, t->osd); > return ct_res; > } > Reviewed-by: Jeff Layton <jlayton@kernel.org>
With this patch, the issue isn't encountered in my environment (more than 20 runs of tests). Tested-by: Jerry Lee <leisurelysw24@gmail.com> Thanks! On Wed, 21 Aug 2019 at 22:56, Jeff Layton <jlayton@kernel.org> wrote: > > On Wed, 2019-08-21 at 14:07 +0200, Ilya Dryomov wrote: > > We can't rely on ->peer_features in calc_target() because it may be > > called both when the OSD session is established and open and when it's > > not. ->peer_features is not valid unless the OSD session is open. If > > this happens on a PG split (pg_num increase), that could mean we don't > > resend a request that should have been resent, hanging the client > > indefinitely. > > > > In userspace this was fixed by looking at require_osd_release and > > get_xinfo[osd].features fields of the osdmap. However these fields > > belong to the OSD section of the osdmap, which the kernel doesn't > > decode (only the client section is decoded). > > > > Instead, let's drop this feature check. It effectively checks for > > luminous, so only pre-luminous OSDs would be affected in that on a PG > > split the kernel might resend a request that should not have been > > resent. Duplicates can occur in other scenarios, so both sides should > > already be prepared for them: see dup/replay logic on the OSD side and > > retry_attempt check on the client side. > > > > Cc: stable@vger.kernel.org > > Fixes: 7de030d6b10a ("libceph: resend on PG splits if OSD has RESEND_ON_SPLIT") > > Reported-by: Jerry Lee <leisurelysw24@gmail.com> > > Signed-off-by: Ilya Dryomov <idryomov@gmail.com> > > --- > > net/ceph/osd_client.c | 9 ++++----- > > 1 file changed, 4 insertions(+), 5 deletions(-) > > > > diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c > > index fed6b0334609..4e78d1ddd441 100644 > > --- a/net/ceph/osd_client.c > > +++ b/net/ceph/osd_client.c > > @@ -1514,7 +1514,7 @@ static enum calc_target_result calc_target(struct ceph_osd_client *osdc, > > struct ceph_osds up, acting; > > bool force_resend = false; > > bool unpaused = false; > > - bool legacy_change; > > + bool legacy_change = false; > > bool split = false; > > bool sort_bitwise = ceph_osdmap_flag(osdc, CEPH_OSDMAP_SORTBITWISE); > > bool recovery_deletes = ceph_osdmap_flag(osdc, > > @@ -1602,15 +1602,14 @@ static enum calc_target_result calc_target(struct ceph_osd_client *osdc, > > t->osd = acting.primary; > > } > > > > - if (unpaused || legacy_change || force_resend || > > - (split && con && CEPH_HAVE_FEATURE(con->peer_features, > > - RESEND_ON_SPLIT))) > > + if (unpaused || legacy_change || force_resend || split) > > ct_res = CALC_TARGET_NEED_RESEND; > > else > > ct_res = CALC_TARGET_NO_ACTION; > > > > out: > > - dout("%s t %p -> ct_res %d osd %d\n", __func__, t, ct_res, t->osd); > > + dout("%s t %p -> %d%d%d%d ct_res %d osd%d\n", __func__, t, unpaused, > > + legacy_change, force_resend, split, ct_res, t->osd); > > return ct_res; > > } > > > > Reviewed-by: Jeff Layton <jlayton@kernel.org> >
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c index fed6b0334609..4e78d1ddd441 100644 --- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -1514,7 +1514,7 @@ static enum calc_target_result calc_target(struct ceph_osd_client *osdc, struct ceph_osds up, acting; bool force_resend = false; bool unpaused = false; - bool legacy_change; + bool legacy_change = false; bool split = false; bool sort_bitwise = ceph_osdmap_flag(osdc, CEPH_OSDMAP_SORTBITWISE); bool recovery_deletes = ceph_osdmap_flag(osdc, @@ -1602,15 +1602,14 @@ static enum calc_target_result calc_target(struct ceph_osd_client *osdc, t->osd = acting.primary; } - if (unpaused || legacy_change || force_resend || - (split && con && CEPH_HAVE_FEATURE(con->peer_features, - RESEND_ON_SPLIT))) + if (unpaused || legacy_change || force_resend || split) ct_res = CALC_TARGET_NEED_RESEND; else ct_res = CALC_TARGET_NO_ACTION; out: - dout("%s t %p -> ct_res %d osd %d\n", __func__, t, ct_res, t->osd); + dout("%s t %p -> %d%d%d%d ct_res %d osd%d\n", __func__, t, unpaused, + legacy_change, force_resend, split, ct_res, t->osd); return ct_res; }
We can't rely on ->peer_features in calc_target() because it may be called both when the OSD session is established and open and when it's not. ->peer_features is not valid unless the OSD session is open. If this happens on a PG split (pg_num increase), that could mean we don't resend a request that should have been resent, hanging the client indefinitely. In userspace this was fixed by looking at require_osd_release and get_xinfo[osd].features fields of the osdmap. However these fields belong to the OSD section of the osdmap, which the kernel doesn't decode (only the client section is decoded). Instead, let's drop this feature check. It effectively checks for luminous, so only pre-luminous OSDs would be affected in that on a PG split the kernel might resend a request that should not have been resent. Duplicates can occur in other scenarios, so both sides should already be prepared for them: see dup/replay logic on the OSD side and retry_attempt check on the client side. Cc: stable@vger.kernel.org Fixes: 7de030d6b10a ("libceph: resend on PG splits if OSD has RESEND_ON_SPLIT") Reported-by: Jerry Lee <leisurelysw24@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> --- net/ceph/osd_client.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)