From patchwork Wed Feb 18 13:25:40 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ilya Dryomov X-Patchwork-Id: 5843991 Return-Path: X-Original-To: patchwork-ceph-devel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id D3B4C9F373 for ; Wed, 18 Feb 2015 13:25:52 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id DC3672026C for ; Wed, 18 Feb 2015 13:25:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id E79B22025A for ; Wed, 18 Feb 2015 13:25:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750851AbbBRNZt (ORCPT ); Wed, 18 Feb 2015 08:25:49 -0500 Received: from mail-lb0-f169.google.com ([209.85.217.169]:40955 "EHLO mail-lb0-f169.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750772AbbBRNZs (ORCPT ); Wed, 18 Feb 2015 08:25:48 -0500 Received: by lbdu10 with SMTP id u10so1024785lbd.7 for ; Wed, 18 Feb 2015 05:25:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id; bh=DwEuLD7nLUD2Z/ZjDROYI6KeENMk9sV2eVF5pFHRXvc=; b=M+PRX/kmvmA5BF+9VkAbRxZVTPIkK0MolkSPhWPwmbwmz7eCfakjWFH51xOJ4OPzRC A6tIL87imAg6y6Nb7duur+hZ0yl8v7ROfVr24al/nN4teqW51kalC28z9K60THeyqh/5 aHKa46O6Shw0JUOQqcBkYC/sG8jpGvC7EwrdsNq9OSn/gfdqGGF10uqlPzkjXA8NmWbU 0fssMbgHujHGE/1LGYe5xGXnw4BwEoZu5Z+kh5QHeg/2fyZHiZoIqLmkcgJ1WNpC1/7g OPmc7EqEBhqvTgW1+lG8g4InCWqt1sAznbHS2nu8AUEdFTz0PzbvVhsbrSTTQcjh8Bj3 QW1g== X-Received: by 10.153.11.130 with SMTP id ei2mr25065603lad.53.1424265946621; Wed, 18 Feb 2015 05:25:46 -0800 (PST) Received: from localhost.localdomain ([185.16.31.201]) by mx.google.com with ESMTPSA id oq8sm4100097lbb.8.2015.02.18.05.25.45 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 18 Feb 2015 05:25:45 -0800 (PST) From: Ilya Dryomov To: ceph-devel@vger.kernel.org Cc: Sage Weil Subject: [PATCH] libceph: fix double __remove_osd() problem Date: Wed, 18 Feb 2015 16:25:40 +0300 Message-Id: <1424265940-40771-1-git-send-email-idryomov@gmail.com> X-Mailer: git-send-email 1.9.3 Sender: ceph-devel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org X-Spam-Status: No, score=-6.8 required=5.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED, FREEMAIL_FROM, RCVD_IN_DNSWL_HI, T_DKIM_INVALID, T_RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP It turns out it's possible to get __remove_osd() called twice on the same OSD. That doesn't sit well with rb_erase() - depending on the shape of the tree we can get a NULL dereference, a soft lockup or a random crash at some point in the future as we end up touching freed memory. One scenario that I was able to reproduce is as follows: con_fault_finish() osd_reset() ceph_osdc_handle_map() kick_requests() reset_changed_osds() __reset_osd() __remove_osd() __kick_osd_requests() __reset_osd() __remove_osd() <-- !!! A case can be made that osd refcounting is imperfect and reworking it would be a proper resolution, but for now Sage and I decided to fix this by adding a safe guard around __remove_osd(). Fixes: http://tracker.ceph.com/issues/8087 Cc: Sage Weil Cc: stable@vger.kernel.org # 3.9+: 7c6e6fc53e73: libceph: assert both regular and lingering lists in __remove_osd() Cc: stable@vger.kernel.org # 3.9+: cc9f1f518cec: libceph: change from BUG to WARN for __remove_osd() asserts Cc: stable@vger.kernel.org # 3.9+ Signed-off-by: Ilya Dryomov Reviewed-by: Sage Weil Reviewed-by: Alex Elder --- net/ceph/osd_client.c | 26 ++++++++++++++++++-------- 1 file changed, 18 insertions(+), 8 deletions(-) diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c index 53299c7b0ca4..f693a2f8ac86 100644 --- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -1048,14 +1048,24 @@ static void put_osd(struct ceph_osd *osd) */ static void __remove_osd(struct ceph_osd_client *osdc, struct ceph_osd *osd) { - dout("__remove_osd %p\n", osd); + dout("%s %p osd%d\n", __func__, osd, osd->o_osd); WARN_ON(!list_empty(&osd->o_requests)); WARN_ON(!list_empty(&osd->o_linger_requests)); - rb_erase(&osd->o_node, &osdc->osds); list_del_init(&osd->o_osd_lru); - ceph_con_close(&osd->o_con); - put_osd(osd); + rb_erase(&osd->o_node, &osdc->osds); + RB_CLEAR_NODE(&osd->o_node); +} + +static void remove_osd(struct ceph_osd_client *osdc, struct ceph_osd *osd) +{ + dout("%s %p osd%d\n", __func__, osd, osd->o_osd); + + if (!RB_EMPTY_NODE(&osd->o_node)) { + ceph_con_close(&osd->o_con); + __remove_osd(osdc, osd); + put_osd(osd); + } } static void remove_all_osds(struct ceph_osd_client *osdc) @@ -1065,7 +1075,7 @@ static void remove_all_osds(struct ceph_osd_client *osdc) while (!RB_EMPTY_ROOT(&osdc->osds)) { struct ceph_osd *osd = rb_entry(rb_first(&osdc->osds), struct ceph_osd, o_node); - __remove_osd(osdc, osd); + remove_osd(osdc, osd); } mutex_unlock(&osdc->request_mutex); } @@ -1106,7 +1116,7 @@ static void remove_old_osds(struct ceph_osd_client *osdc) list_for_each_entry_safe(osd, nosd, &osdc->osd_lru, o_osd_lru) { if (time_before(jiffies, osd->lru_ttl)) break; - __remove_osd(osdc, osd); + remove_osd(osdc, osd); } mutex_unlock(&osdc->request_mutex); } @@ -1121,8 +1131,7 @@ static int __reset_osd(struct ceph_osd_client *osdc, struct ceph_osd *osd) dout("__reset_osd %p osd%d\n", osd, osd->o_osd); if (list_empty(&osd->o_requests) && list_empty(&osd->o_linger_requests)) { - __remove_osd(osdc, osd); - + remove_osd(osdc, osd); return -ENODEV; } @@ -1926,6 +1935,7 @@ static void reset_changed_osds(struct ceph_osd_client *osdc) { struct rb_node *p, *n; + dout("%s %p\n", __func__, osdc); for (p = rb_first(&osdc->osds); p; p = n) { struct ceph_osd *osd = rb_entry(p, struct ceph_osd, o_node);