From patchwork Mon Nov 25 11:08:26 2019
X-Patchwork-Submitter: Xiubo Li
X-Patchwork-Id: 11260235
From: xiubli@redhat.com
To: jlayton@kernel.org
Cc: sage@redhat.com, idryomov@gmail.com, zyan@redhat.com,
    pdonnell@redhat.com, ceph-devel@vger.kernel.org,
    Xiubo Li <xiubli@redhat.com>
Subject: [PATCH v2 2/3] mdsmap: fix mdsmap cluster available check based on laggy number
Date: Mon, 25 Nov 2019 06:08:26 -0500
Message-Id: <20191125110827.12827-3-xiubli@redhat.com>
In-Reply-To: <20191125110827.12827-1-xiubli@redhat.com>
References: <20191125110827.12827-1-xiubli@redhat.com>
X-Mailing-List: ceph-devel@vger.kernel.org

From: Xiubo Li <xiubli@redhat.com>

If max_mds > 1 in an MDS cluster, there are no standby MDSs, and all of
the max_mds MDSs are in the up:active state, then when one up:active
MDS dies, m->m_num_laggy in the kclient becomes 1 and the mount fails
without considering the other healthy MDSs. There may also be MDSs that
are still "in" the cluster but not in the up:active state; ignore
those. Only when all of the up:active MDSs in the cluster are laggy
should the cluster be treated as unavailable.
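For illustration only (this is not kernel code; the struct and function
names below are invented for the example), a minimal userspace sketch
contrasting the old and new checks when a cluster has two up:active
MDSs and one of them goes laggy:

    #include <stdbool.h>
    #include <stdio.h>

    /* Illustrative stand-in for the relevant ceph_mdsmap fields. */
    struct mdsmap_sketch {
            int m_num_laggy;        /* up:active MDSs currently laggy */
            int m_num_active_mds;   /* total up:active MDSs */
    };

    /* Old check: any laggy MDS marks the whole cluster unavailable. */
    static bool available_old(const struct mdsmap_sketch *m)
    {
            return m->m_num_laggy == 0;
    }

    /* New check: unavailable only when every up:active MDS is laggy. */
    static bool available_new(const struct mdsmap_sketch *m)
    {
            return m->m_num_laggy != m->m_num_active_mds;
    }

    int main(void)
    {
            /* max_mds = 2, both up:active, then one dies and goes laggy */
            struct mdsmap_sketch m = { .m_num_laggy = 1, .m_num_active_mds = 2 };

            printf("old: %d new: %d\n", available_old(&m), available_new(&m));
            /* old: 0 (mount fails)  new: 1 (the healthy MDS is still usable) */
            return 0;
    }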
When max_mds is decreased, the cluster does not stop the extra
up:active MDSs immediately, so for a while the number of up:active MDSs
is larger than max_mds and the m_info array would always end up being
reallocated. Instead, pick out the up:active MDS count as m_num_mds and
allocate the needed memory once.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/mdsmap.c            | 32 ++++++++++----------------------
 include/linux/ceph/mdsmap.h |  5 +++--
 2 files changed, 13 insertions(+), 24 deletions(-)

diff --git a/fs/ceph/mdsmap.c b/fs/ceph/mdsmap.c
index 471bac335fae..cc9ec959fe46 100644
--- a/fs/ceph/mdsmap.c
+++ b/fs/ceph/mdsmap.c
@@ -138,14 +138,21 @@ struct ceph_mdsmap *ceph_mdsmap_decode(void **p, void *end)
 	m->m_session_autoclose = ceph_decode_32(p);
 	m->m_max_file_size = ceph_decode_64(p);
 	m->m_max_mds = ceph_decode_32(p);
-	m->m_num_mds = m->m_max_mds;
+
+	/*
+	 * Pick out the active nodes as the m_num_mds; m_num_mds
+	 * may be larger than m_max_mds when the max_mds is being
+	 * decreased on the cluster side, otherwise it should be
+	 * less than or equal to m_max_mds.
+	 */
+	m->m_num_mds = n = ceph_decode_32(p);
+	m->m_num_active_mds = m->m_num_mds;
 
 	m->m_info = kcalloc(m->m_num_mds, sizeof(*m->m_info), GFP_NOFS);
 	if (!m->m_info)
 		goto nomem;
 
 	/* pick out active nodes from mds_info (state > 0) */
-	n = ceph_decode_32(p);
 	for (i = 0; i < n; i++) {
 		u64 global_id;
 		u32 namelen;
@@ -218,17 +225,6 @@ struct ceph_mdsmap *ceph_mdsmap_decode(void **p, void *end)
 		if (mds < 0 || state <= 0)
 			continue;
 
-		if (mds >= m->m_num_mds) {
-			int new_num = max(mds + 1, m->m_num_mds * 2);
-			void *new_m_info = krealloc(m->m_info,
-						    new_num * sizeof(*m->m_info),
-						    GFP_NOFS | __GFP_ZERO);
-			if (!new_m_info)
-				goto nomem;
-			m->m_info = new_m_info;
-			m->m_num_mds = new_num;
-		}
-
 		info = &m->m_info[mds];
 		info->global_id = global_id;
 		info->state = state;
@@ -247,14 +243,6 @@ struct ceph_mdsmap *ceph_mdsmap_decode(void **p, void *end)
 			info->export_targets = NULL;
 		}
 	}
-	if (m->m_num_mds > m->m_max_mds) {
-		/* find max up mds */
-		for (i = m->m_num_mds; i >= m->m_max_mds; i--) {
-			if (i == 0 || m->m_info[i-1].state > 0)
-				break;
-		}
-		m->m_num_mds = i;
-	}
 
 	/* pg_pools */
 	ceph_decode_32_safe(p, end, n, bad);
@@ -396,7 +384,7 @@ bool ceph_mdsmap_is_cluster_available(struct ceph_mdsmap *m)
 		return false;
 	if (m->m_damaged)
 		return false;
-	if (m->m_num_laggy > 0)
+	if (m->m_num_laggy == m->m_num_active_mds)
 		return false;
 	for (i = 0; i < m->m_num_mds; i++) {
 		if (m->m_info[i].state == CEPH_MDS_STATE_ACTIVE)
diff --git a/include/linux/ceph/mdsmap.h b/include/linux/ceph/mdsmap.h
index 0067d767c9ae..3a66f4f926ce 100644
--- a/include/linux/ceph/mdsmap.h
+++ b/include/linux/ceph/mdsmap.h
@@ -25,8 +25,9 @@ struct ceph_mdsmap {
 	u32 m_session_timeout;          /* seconds */
 	u32 m_session_autoclose;        /* seconds */
 	u64 m_max_file_size;
-	u32 m_max_mds;                  /* size of m_addr, m_state arrays */
-	int m_num_mds;
+	u32 m_max_mds;                  /* expected up:active mds number */
+	int m_num_active_mds;           /* actual up:active mds number */
+	int m_num_mds;                  /* size of m_info array */
 	struct ceph_mds_info *m_info;
 
 	/* which object pools file data can be stored in */
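For illustration only, a userspace approximation of the allocate-once
decode described above (calloc standing in for the kernel's kcalloc;
all names here are invented for the sketch, not kernel API). The entry
count is read before the array is allocated, so a single allocation
suffices even while max_mds is being decreased and the count
temporarily exceeds max_mds:

    #include <stdio.h>
    #include <stdlib.h>

    /* Illustrative stand-in for one ceph_mds_info slot. */
    struct mds_info_sketch {
            int state;      /* > 0 means the rank is up */
    };

    /*
     * New flow: size the array once from the decoded count n. The old
     * flow sized it by max_mds and had to reallocate whenever a rank
     * beyond the array showed up.
     */
    static struct mds_info_sketch *decode_once(int n, const int *ranks,
                                               const int *states)
    {
            struct mds_info_sketch *info = calloc(n, sizeof(*info));
            int i;

            if (!info)
                    return NULL;
            for (i = 0; i < n; i++) {
                    /* skip out-of-range ranks and non-up entries */
                    if (ranks[i] < 0 || ranks[i] >= n || states[i] <= 0)
                            continue;
                    info[ranks[i]].state = states[i];
            }
            return info;
    }

    int main(void)
    {
            /* max_mds was dropped to 1, but ranks 0 and 1 are still up */
            int ranks[]  = { 0, 1 };
            int states[] = { 1, 1 };
            struct mds_info_sketch *info = decode_once(2, ranks, states);

            if (info) {
                    printf("rank0 state=%d rank1 state=%d\n",
                           info[0].state, info[1].state);
                    free(info);
            }
            return 0;
    }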