From patchwork Wed May 15 21:35:09 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Elder X-Patchwork-Id: 2574501 Return-Path: X-Original-To: patchwork-ceph-devel@patchwork.kernel.org Delivered-To: patchwork-process-083081@patchwork1.kernel.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by patchwork1.kernel.org (Postfix) with ESMTP id 5C1093FDBC for ; Wed, 15 May 2013 21:35:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751368Ab3EOVfN (ORCPT ); Wed, 15 May 2013 17:35:13 -0400 Received: from mail-oa0-f49.google.com ([209.85.219.49]:54169 "EHLO mail-oa0-f49.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751345Ab3EOVfM (ORCPT ); Wed, 15 May 2013 17:35:12 -0400 Received: by mail-oa0-f49.google.com with SMTP id k14so2796066oag.8 for ; Wed, 15 May 2013 14:35:11 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-received:message-id:date:from:user-agent:mime-version:to:subject :content-type:content-transfer-encoding:x-gm-message-state; bh=qLLFbwexdaguDOeSPsUYtGeqFvxBvhFSXI88Emch4Tc=; b=egyQu5nN2PdX6tT7t2LGzJdunIfw1fsXYAAA2Zk0aJLjXsM+SRjiDFzLh1eFk1uDpG IKE5qCnGi1HpHni6SN0J7pd9rA9PvpayJXonP3YJEvMo+LBkyte/skZwjspVJhWeQFYJ MB8syT9iqVXgHGCjL0S1pK4XaYRdJvgSuU7qr+07XBHTxLdv0UvK/aPDfw3CE373aBqV LC/iD3bdSwvJl1gbBUgDDK/rP8UhbI2lkAeWJlAdnBUf4vWGDEB62WspEslq3ImhfHp0 xhyZvq6225ZnvzYm9X4S/W4uNpqGSxy7ZFm7MS/CYsO/1ECMLqR7Indr1Atck5BvReoX 3jhA== X-Received: by 10.182.134.231 with SMTP id pn7mr17974577obb.11.1368653711586; Wed, 15 May 2013 14:35:11 -0700 (PDT) Received: from [172.22.22.4] (c-71-195-31-37.hsd1.mn.comcast.net. [71.195.31.37]) by mx.google.com with ESMTPSA id qj8sm4892439oeb.2.2013.05.15.14.35.10 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 15 May 2013 14:35:10 -0700 (PDT) Message-ID: <5193FF8D.2070403@inktank.com> Date: Wed, 15 May 2013 16:35:09 -0500 From: Alex Elder User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130329 Thunderbird/17.0.5 MIME-Version: 1.0 To: ceph-devel@vger.kernel.org Subject: [PATCH] libceph: must hold mutex for reset_changed_osds() X-Gm-Message-State: ALoCoQn022m8P5j0rWN2DQtGamxc8nRrXKGDlQi6DzQOoB37ZM9cRsp+U/DszdOPsEQZzCB5qbPE Sender: ceph-devel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org An osd client has a red-black tree describing its osds, and occasionally we would get crashes due to one of these trees tree becoming corrupt somehow. The problem turned out to be that reset_changed_osds() was being called without protection of the osd client request mutex. That function would call __reset_osd() for any osd that had changed, and __reset_osd() would call __remove_osd() for any osd with no outstanding requests, and finally __remove_osd() would remove the corresponding entry from the red-black tree. Thus, the tree was getting modified without having any lock protection, and was vulnerable to problems due to concurrent updates. This appears to be the only osd tree updating path that has this problem. It can be fairly easily fixed by moving the call up a few lines, to just before the request mutex gets dropped in kick_requests(). This resolves: http://tracker.ceph.com/issues/5043 Signed-off-by: Alex Elder Reviewed-by: Sage Weil --- net/ceph/osd_client.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c index d5953b8..3a246a6 100644 --- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -1675,13 +1675,13 @@ static void kick_requests(struct ceph_osd_client *osdc, int force_resend) __register_request(osdc, req); __unregister_linger_request(osdc, req); } + reset_changed_osds(osdc); mutex_unlock(&osdc->request_mutex); if (needmap) { dout("%d requests for down osds, need new map\n", needmap); ceph_monc_request_next_osdmap(&osdc->client->monc); } - reset_changed_osds(osdc); }