From patchwork Thu Jul 28 22:21:45 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Somnath Roy
X-Patchwork-Id: 9251769
From: Somnath Roy
To: "Ma, Jianpeng"
CC: ceph-devel
Subject: RE: BLueStore Deadlock
Date: Thu, 28 Jul 2016 22:21:45 +0000
References: <6AA21C22F0A5DA478922644AD2EC308C373B92BC@SHSMSX101.ccr.corp.intel.com>
X-Mailing-List: ceph-devel@vger.kernel.org

Jianpeng,

I thought this through, and it seems there is one possible deadlock scenario:

tp_osd_tp           --> waiting in onode->flush() for the previous txc to finish, while holding Wlock(coll)
aio_complete_thread --> waiting for RLock(coll)

No other thread is blocked here.

We add the previous txc to the flush_txns list in _txc_write_nodes(), which runs before aio_complete_thread calls _txc_state_proc(). So if IO arrives on the same collection within that window, it waits on the unfinished txc while holding the collection write lock, and aio_complete_thread cannot take the read lock it needs to move that txc forward.

The solution to this could be the following:

root@emsnode5:~/ceph-master/src# git diff

I am not able to reproduce this in my setup, so it would be helpful if you could apply the changes from this diff in your environment and see whether you still hit the issue.

Thanks & Regards
Somnath

-----Original Message-----
From: Somnath Roy
Sent: Thursday, July 28, 2016 8:45 AM
To: 'Ma, Jianpeng'
Cc: ceph-devel
Subject: RE: BLueStore Deadlock

Hi Jianpeng,
Are you trying with the latest master and still hitting the issue (it seems so, but I want to confirm)?
The following scenario should not create a deadlock, for this reason: onode->flush() waits on flush_lock, and _txc_finish() releases that lock before taking osr->qlock. Am I missing anything?

I hit a deadlock in this path with one of my earlier changes. It is described in detail in the following pull request, which has been fixed and merged.

https://github.com/ceph/ceph/pull/10220

If my theory is right, we may be hitting the deadlock for some other reason. It seems you are doing WAL writes; could you please describe the steps to reproduce?

Thanks & Regards
Somnath

From: Ma, Jianpeng [mailto:jianpeng.ma@intel.com]
Sent: Thursday, July 28, 2016 1:46 AM
To: Somnath Roy
Cc: ceph-devel; Ma, Jianpeng
Subject: BLueStore Deadlock

Hi Roy:
When doing seqwrite with rbd+librbd, I hit a deadlock in BlueStore. It reproduces 100% of the time (based on 98602ae6c67637dbadddd549bd9a0035e5a2717). By adding debug messages, I found that this bug is caused by bf70bcb6c54e4d6404533bc91781a5ef77d62033. Consider this case:

tp_osd_tp                    aio_complete_thread                     kv_sync_thread
Rwlock(coll)                 txc_finish_io                           _txc_finish
do_write                     lock(osr->qlock)                        lock(osr->qlock)
do_read                      RLock(coll)                             need osr->qlock to continue
onode->flush()               need coll onode->readlock to continue
need previous txc complete

But currently I don't know how to fix this.

Thanks!

---
diff --git a/src/os/bluestore/BlueStore.cc b/src/os/bluestore/BlueStore.cc
index e8548b1..575a234 100644
--- a/src/os/bluestore/BlueStore.cc
+++ b/src/os/bluestore/BlueStore.cc
@@ -4606,6 +4606,11 @@ void BlueStore::_txc_state_proc(TransContext *txc)
         (txc->first_collection)->lock.get_read();
       }
       for (auto& o : txc->onodes) {
+        {
+          std::lock_guard<std::mutex> l(o->flush_lock);
+          o->flush_txns.insert(txc);
+        }
+
         for (auto& p : o->blob_map.blob_map) {
           p.bc.finish_write(txc->seq);
         }
@@ -4733,8 +4738,8 @@ void BlueStore::_txc_write_nodes(TransContext *txc, KeyValueDB::Transaction t)
     dout(20) << "  onode " << (*p)->oid << " is " << bl.length() << dendl;
     t->set(PREFIX_OBJ, (*p)->key, bl);
-    std::lock_guard<std::mutex> l((*p)->flush_lock);
-    (*p)->flush_txns.insert(txc);
+    /*std::lock_guard<std::mutex> l((*p)->flush_lock);
+    (*p)->flush_txns.insert(txc);*/
   }
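
The window discussed in this thread can be modelled as a wait-for graph. The following is a minimal single-threaded C++ sketch, not BlueStore code. The node names mirror the threads named in the thread, while `WaitGraph`, `build_graph`, `has_deadlock`, `self_check`, and the `register_before_rlock` flag are hypothetical names introduced only for illustration. It assumes the state at the moment tp_osd_tp holds Wlock(coll) and calls onode->flush(), and compares registering the txc in flush_txns before versus after RLock(coll) is taken.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Wait-for graph: an edge A -> B means "A cannot make progress until B does".
using WaitGraph = std::map<std::string, std::set<std::string>>;

// Depth-first search; returns true if a cycle is reachable from `node`.
static bool has_cycle_from(const WaitGraph& g, const std::string& node,
                           std::set<std::string>& on_path,
                           std::set<std::string>& done) {
  if (on_path.count(node)) return true;  // back edge: we looped onto our own path
  if (done.count(node)) return false;    // already explored, no cycle through it
  on_path.insert(node);
  auto it = g.find(node);
  if (it != g.end()) {
    for (const auto& next : it->second) {
      if (has_cycle_from(g, next, on_path, done)) return true;
    }
  }
  on_path.erase(node);
  done.insert(node);
  return false;
}

// A cycle in the wait-for graph means no participant can ever proceed.
bool has_deadlock(const WaitGraph& g) {
  std::set<std::string> done;
  for (const auto& kv : g) {
    std::set<std::string> on_path;
    if (has_cycle_from(g, kv.first, on_path, done)) return true;
  }
  return false;
}

// Assumed snapshot: tp_osd_tp holds Wlock(coll) and calls onode->flush().
// register_before_rlock == true  models flush_txns.insert() in _txc_write_nodes().
// register_before_rlock == false models the insert moved after get_read().
WaitGraph build_graph(bool register_before_rlock) {
  WaitGraph g;
  // aio_complete_thread needs RLock(coll), which tp_osd_tp holds as writer.
  g["aio_complete_thread"].insert("tp_osd_tp");
  if (register_before_rlock) {
    // The previous txc is already in flush_txns, so flush() waits for it,
    // and that txc can only advance through aio_complete_thread.
    g["tp_osd_tp"].insert("prev_txc");
    g["prev_txc"].insert("aio_complete_thread");
  }
  // Otherwise the txc is not yet registered, so in this model flush() has
  // nothing to wait for and tp_osd_tp goes on to release the write lock.
  return g;
}

// Exercises both orderings of the model.
void self_check() {
  assert(has_deadlock(build_graph(true)));
  assert(!has_deadlock(build_graph(false)));
}
```

Calling `self_check()` from a test or from `main()` exercises both orderings. The model only captures who waits on whom in this one window; it does not show whether flush() still gives callers the ordering they expect once registration is delayed.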