From patchwork Fri Jun 9 05:21:52 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Nicholas A. Bellinger"
X-Patchwork-Id: 9777181
Message-ID: <1496985712.28997.13.camel@haakon3.risingtidesystems.com>
Subject: Re: ESXi snapshot I/O error after upgrade to 4.9.30
From: "Nicholas A. Bellinger"
To: Martin Svec
Cc: target-devel
Date: Thu, 08 Jun 2017 22:21:52 -0700
X-Mailing-List: target-devel@vger.kernel.org

Hi Martin,

On Mon, 2017-06-05 at 18:05 +0200, Martin Svec wrote:
> Hello Nic,
>
> Today, three of our vSphere VMs running on iSCSI LIO 4.9.30 failed to create a backup snapshot and
> hung with errors like "Create virtual machine snapshot xxxxx. Unable to close the
> '/vmfs/volumes/.../xxxxx-000001-ctk.vmdk' file: 5 (Input/output error)." or other more general I/O
> errors. It always happened during snapshot creation and there were multiple "Detected MISCOMPARE +
> Target/iblock: Send MISCOMPARE check condition and sense" in target log at the same time.
> Subsequently, virtual machines lost access to their virtual disks and required VM reset. The
> failures seem to be independent of each other and VMs ran on different hosts.
>

So nothing else in the target logs of interest..?

I assume the MISCOMPARE warnings occur at the normal rate..?

> The storage was upgraded to 4.9.30 only two days ago. However, we have an identical iSCSI LIO
> storage running 4.9.27 more than three weeks without any issue in the same vSphere cluster. So I'm
> wondering if this could be caused by a stable target patch between 4.9.27 and 4.9.30. Quick look
> into changelog shows "target: Fix compare_and_write_callback handling for non GOOD status" as the
> only fix related to CAW since 4.9.27. What do you think?
>
> We have ESXi 5.5.0 rev. 5230635 on all ESXi nodes.

Note the 'target: Fix compare_and_write_callback handling for non GOOD
status' change only affects COMPARE_AND_WRITE related I/Os that actually fail.
That is, unless the underlying backend target device was actually
generating hard I/O errors, the CAW change above in v4.9.30 won't have
any effect. Such errors would look something like the following, where
'sdc' is your target backend device:

Buffer I/O error on dev sdc, logical block 0, async page read
blk_update_request: I/O error, dev sdc, sector 2097144
blk_update_request: I/O error, dev sdc, sector 2097144
Buffer I/O error on dev sdc, logical block 262143, async page read
blk_update_request: I/O error, dev sdc, sector 0
Buffer I/O error on dev sdc, logical block 0, async page read
blk_update_request: I/O error, dev sdc, sector 0

If the issue is reproducible, you can verify by re-enabling the debug
message for a hard I/O error in compare_and_write_callback(), using the
patch at the end of this message.

That said, if you can confirm the backend device is not generating hard
I/O errors for COMPARE_AND_WRITE I/O up to target-core, I'd wager the
ESX host failures observed aren't specific to the change.

---
diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c
index ca42fba..a0de5ab 100644
--- a/drivers/target/target_core_sbc.c
+++ b/drivers/target/target_core_sbc.c
@@ -479,7 +479,7 @@ static sense_reason_t compare_and_write_callback(struct se_cmd *cmd, bool succes
 	 * been failed with a non-zero SCSI status.
 	 */
 	if (cmd->scsi_status) {
-		pr_debug("compare_and_write_callback: non zero scsi_status:"
+		printk_ratelimited("compare_and_write_callback: non zero scsi_status:"
 			" 0x%02x\n", cmd->scsi_status);
 		*post_ret = 1;
 		if (cmd->scsi_status == SAM_STAT_CHECK_CONDITION)
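
To check whether the backend device is reporting hard I/O errors, one option is to scan the saved kernel log for the two message shapes quoted above. The following is only a rough sketch, not part of the patch: the function name find_hard_io_errors and the SAMPLE text are made up for illustration, and the regular expressions assume the exact wording shown in this thread, which other kernel versions may not use.

```python
import re

# Patterns modeled on the two kernel message shapes quoted in this thread.
# Assumption: other kernel versions may word these messages differently.
BLK_ERR = re.compile(r"blk_update_request: I/O error, dev (\S+), sector (\d+)")
BUF_ERR = re.compile(r"Buffer I/O error on dev (\S+), logical block (\d+)")


def find_hard_io_errors(log_text, device):
    """Return (kind, number) tuples for I/O errors reported against `device`."""
    hits = []
    for line in log_text.splitlines():
        m = BLK_ERR.search(line)
        if m and m.group(1) == device:
            hits.append(("sector", int(m.group(2))))
            continue
        m = BUF_ERR.search(line)
        if m and m.group(1) == device:
            hits.append(("logical block", int(m.group(2))))
    return hits


# Illustrative input only; the 'sdd' line is hypothetical and is there to
# show that errors on other devices are filtered out.
SAMPLE = """\
Buffer I/O error on dev sdc, logical block 0, async page read
blk_update_request: I/O error, dev sdc, sector 2097144
Buffer I/O error on dev sdd, logical block 5, async page read
"""

if __name__ == "__main__":
    for kind, number in find_hard_io_errors(SAMPLE, "sdc"):
        print(f"sdc: hard I/O error at {kind} {number}")
```

In practice the log text would come from saved dmesg output rather than SAMPLE. An empty result for the backend device over the window of the snapshot failures would suggest that no hard I/O errors reached target-core, at least in the form matched here.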