From patchwork Mon Apr 23 17:04:25 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ilya Dryomov
X-Patchwork-Id: 10357685
From: Ilya Dryomov
To: ceph-devel@vger.kernel.org
Subject: [PATCH 2/2] libceph: reschedule a tick in finish_hunting()
Date: Mon, 23 Apr 2018 19:04:25 +0200
Message-Id: <1524503065-24623-3-git-send-email-idryomov@gmail.com>
X-Mailer: git-send-email 2.4.3
In-Reply-To: <1524503065-24623-1-git-send-email-idryomov@gmail.com>
References: <1524503065-24623-1-git-send-email-idryomov@gmail.com>
Sender: ceph-devel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: ceph-devel@vger.kernel.org

If we go without an established session for a while, backoff delay will
climb to 30 seconds.  The keepalive timeout is also 30 seconds, so it's
pretty easily hit after a prolonged hunting for a monitor: we don't get
a chance to send out a keepalive in time, which means we never get back
a keepalive ack in time, cutting an established session and attempting
to connect to a different monitor every 30 seconds:

  [Sun Apr 1 23:37:05 2018] libceph: mon0 10.80.20.99:6789 session established
  [Sun Apr 1 23:37:36 2018] libceph: mon0 10.80.20.99:6789 session lost, hunting for new mon
  [Sun Apr 1 23:37:36 2018] libceph: mon2 10.80.20.103:6789 session established
  [Sun Apr 1 23:38:07 2018] libceph: mon2 10.80.20.103:6789 session lost, hunting for new mon
  [Sun Apr 1 23:38:07 2018] libceph: mon1 10.80.20.100:6789 session established
  [Sun Apr 1 23:38:37 2018] libceph: mon1 10.80.20.100:6789 session lost, hunting for new mon
  [Sun Apr 1 23:38:37 2018] libceph: mon2 10.80.20.103:6789 session established
  [Sun Apr 1 23:39:08 2018] libceph: mon2 10.80.20.103:6789 session lost, hunting for new mon

The regular keepalive interval is 10 seconds.  After ->hunting is
cleared in finish_hunting(), call __schedule_delayed() to ensure we
send out a keepalive after 10 seconds.

Cc: stable@vger.kernel.org # 4.7+
Link: http://tracker.ceph.com/issues/23537
Signed-off-by: Ilya Dryomov
---
 net/ceph/mon_client.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ceph/mon_client.c b/net/ceph/mon_client.c
index 02c441c12c38..21ac6e3b96bb 100644
--- a/net/ceph/mon_client.c
+++ b/net/ceph/mon_client.c
@@ -1133,6 +1133,7 @@ static void finish_hunting(struct ceph_mon_client *monc)
 		monc->hunting = false;
 		monc->had_a_connection = true;
 		un_backoff(monc);
+		__schedule_delayed(monc);
 	}
 }